
Compiler Design Full PDF

The document provides an introduction to compilers including their phases and passes. It discusses the six main phases of compilation: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation. It also describes compiler writing tools that can help automate the compiler development process, such as parser generators, scanner generators, and syntax directed translation engines. Bootstrapping is defined as writing a compiler in the source language it intends to compile, allowing it to be self-hosting. Cross compilers are compilers that can create executable code for a platform other than the one they are running on.

Uploaded by

works8606

UNIT 1
Introduction to Compiler (CS/IT-Sem-5)

CONTENTS
Part-1 : Introduction to Compiler: Phases and Passes
Part-2 : Bootstrapping
Part-3 : Finite State Machines and Regular Expressions and their Application to Lexical Analysis, Optimization of DFA Based Pattern Matchers
Part-4 : Implementation of Lexical Analyzers, Lexical Analyzer Generator, LEX Compiler
Part-5 : Formal Grammars and their Application to Syntax Analysis, BNF Notation
Part-6 : Ambiguity, YACC
Part-7 : The Syntactic Specification of Programming Languages: Context Free Grammar (CFG), Derivation and Parse Trees, Capabilities of CFG

PART-1
Introduction to Compiler, Phases and Passes.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 1.1. Explain in detail the process of compilation. Illustrate the output of each phase of compilation of the input "a = (b + c)*(b + c)*2". AKTU 2016-17, Marks 10


Answer
A compiler contains 6 phases which are as follows:

i. Phase 1 (Lexical analyzer):
a. The lexical analyzer is also called the scanner.
b. The lexical analyzer phase takes the source program as input and separates the characters of the source language into groups of strings called tokens.
c. These tokens may be keywords, identifiers, operator symbols and punctuation symbols.
ii. Phase 2 (Syntax analyzer):
a. The syntax analyzer phase is also called the parsing phase.
b. The syntax analyzer groups tokens together into syntactic structures.
c. The output of this phase is a parse tree.
iii. Phase 3 (Semantic analyzer):
a. The semantic analyzer phase checks the source program for semantic errors and gathers type information for the subsequent code generation phase.
b. It uses the parse tree and the symbol table to check whether the given program is semantically consistent with the language definition.
c. The output of this phase is an annotated syntax tree.
iv. Phase 4 (Intermediate code generation):
a. The intermediate code generator takes the syntax tree as input from the semantic phase and generates intermediate code.
b. It generates a variety of code forms such as three address code, quadruples and triples.

Fig. 1.1.1. Phases of a compiler: source program -> lexical analyzer -> syntax analyzer -> semantic analyzer -> intermediate code generator -> code optimizer -> code generator -> target program, with the symbol table manager and the error handler shown alongside the phases.

v. Phase 5 (Code optimization): This phase is designed to improve the intermediate code so that the ultimate object program runs faster and takes less space.
vi. Phase 6 (Code generation):
a. It is the final phase of the compiler.
b. It generates the assembly code as target language.
c. In this phase, the address in the binary code is translated from the logical address.

Symbol table / table management: A symbol table is a data structure containing a record for each identifier. It allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly.

Error handler: The error handler is invoked when a flaw in the source program is detected.

Compilation of "a = (b + c)*(b + c)*2":

Input:
    a = (b + c)*(b + c)*2

Lexical analyzer, output (token stream):
    id1 = (id2 + id3) * (id2 + id3) * 2

Syntax analyzer, output: parse tree.

Semantic analyzer, output: annotated syntax tree, in which the integer constant 2 is converted to real (int_to_real).

Intermediate code generation, output (intermediate code):
    t1 = b + c
    t2 = t1 * t1
    t3 = int_to_real(2)
    t4 = t2 * t3
    a = t4

Code optimization, output (optimized code):
    t1 = b + c
    t2 = t1 * t1
    id1 = t2 * 2.0

Code generation, output (machine code):
    MOV R1, b
    ADD R1, R1, c
    MUL R2, R1, R1
    MUL R2, R2, #2.0
    ST id1, R2
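
The intermediate code step for this input can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it borrows Python's own `ast` module as the parser (the expression happens to be valid Python), the function name `three_address_code` is hypothetical, and it skips the int_to_real conversion that the semantic phase would insert. It reuses a temporary when the same subexpression appears twice, which is why (b + c) is computed once.

```python
import ast

def three_address_code(statement):
    """Return a list of three-address instructions for one `name = expr` statement."""
    tree = ast.parse(statement).body[0]          # a single assignment is assumed
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    code, cache = [], {}

    def emit(node):
        # Leaves (identifiers and constants) need no instruction of their own.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        # Interior node: translate both operands first, then the operator.
        left, right = emit(node.left), emit(node.right)
        key = (left, ops[type(node.op)], right)
        if key not in cache:                     # reuse the temporary of a repeated subexpression
            cache[key] = f"t{len(cache) + 1}"
            code.append(f"{cache[key]} = {left} {key[1]} {right}")
        return cache[key]

    result = emit(tree.value)
    code.append(f"{tree.targets[0].id} = {result}")
    return code

for line in three_address_code("a = (b + c) * (b + c) * 2"):
    print(line)
```

Each printed line has at most one operator on the right-hand side, which is the defining property of three address code.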

Que 1.2. What are the types of passes in compiler ?

Answer
Types of passes:
1. Single-pass compiler:
a. In a single-pass compiler, when a line of source is processed, it is scanned and the tokens are extracted.
b. Then the syntax of the line is analyzed, and the tree structure and some tables containing information about each token are built.
2. Multi-pass compiler: A multi-pass compiler scans the input source once and produces a first modified form, then scans the modified form and produces a second modified form, and so on, until the object form is produced.

Que 1.3. Discuss the role of compiler writing tools. Describe various compiler writing tools.

Answer
Role of compiler writing tools:
1. Compiler writing tools are used for automatic design of compiler components.
2. Every tool uses a specialized language.
3. Writing tools are used as debuggers and version managers.

Various compiler construction/writing tools are:
1. Parser generator: It produces a syntax analyzer, normally from input that is based on a context free grammar.
2. Scanner generator: It automatically generates a lexical analyzer, normally from a specification based on regular expressions.
3. Syntax directed translation engine:
a. It produces a collection of routines that are used in the parse tree.
b. These translations are associated with each node of the parse tree, and each translation is defined in terms of translations at its neighbour nodes in the tree.
4. Automatic code generator: These tools take a collection of rules that define the translation of each operation of the intermediate language into the machine language for the target machine.
5. Data flow engine: The data flow engine is used to optimize the code. It gathers information about how values are transmitted from one part of the program to another.

PART-2
Bootstrapping

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.4. Define bootstrapping with the help of an example.
OR
What is a cross compiler ? How is bootstrapping of a compiler done to a second machine ?

Answer
Cross compiler: A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running.

Bootstrapping:
1. Bootstrapping is the process of writing a compiler (or assembler) in the
source programming language that it intends to compile.
2. Bootstrapping leads to a self-hosting compiler.
3. An initial minimal core version of the compiler is generated in a different
language.
4. A compiler is characterized by three languages:
a. Source language (S)
b. Target language (T)
c. Implementation language (I)
5. The notation $^{S}C_{I}^{T}$ represents a compiler for source S, target T, implemented in I. The T-diagram shown in Fig. 1.4.1 is also used to depict the same compiler.

Fig. 1.4.1.

6. To create a new language, L, for machine A:
a. Create $^{S}C_{A}^{A}$, a compiler for a subset, S, of the desired language, L, using language A, which runs on machine A. (Language A may be assembly language.)

Fig. 1.4.2.

b. Create $^{L}C_{S}^{A}$, a compiler for language L written in a subset of L.

Fig. 1.4.3.

c. Compile $^{L}C_{S}^{A}$ using $^{S}C_{A}^{A}$ to obtain $^{L}C_{A}^{A}$, a compiler for language L, which runs on machine A and produces code for machine A.

$^{L}C_{S}^{A} \rightarrow {}^{S}C_{A}^{A} \rightarrow {}^{L}C_{A}^{A}$

The process illustrated by the T-diagrams is called bootstrapping and can be summarized by the equation:

$L_{S}A + S_{A}A = L_{A}A$

Fig. 1.4.4.

PART-3
Finite State Machines and Regular Expressions and their Application to Lexical Analysis, Optimization of DFA Based Pattern Matchers

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 1.5. What do you mean by regular expression ? Write the formal recursive definition of a regular expression.

Answer
1. Regular expression is a formula in a special language that is used for specifying simple classes of strings.
2. A string is a sequence of symbols; for the purpose of most text-based search techniques, a string is any sequence of alphanumeric characters (letters, numbers, spaces, tabs, and punctuation).

Formal recursive definition of regular expression:
Formally, a regular expression is an algebraic notation for characterizing a set of strings.
1. Any terminals, i.e., the symbols belonging to $\Sigma$, are regular expressions. Null string ($\Lambda$, $\varepsilon$) and null set ($\phi$) are also regular expressions.
2. If P and Q are two regular expressions then the union of the two regular expressions, denoted by P + Q, is also a regular expression.
3. If P and Q are two regular expressions then their concatenation, denoted by PQ, is also a regular expression.
4. If P is a regular expression then the iteration (repetition or closure), denoted by P*, is also a regular expression.
5. If P is a regular expression then (P) is a regular expression.
6. The expressions got by repeated application of the rules from (1) to (5) over $\Sigma$ are also regular expressions.

Que 1.6. Define and differentiate between DFA and NFA with an example.

Answer
DFA:
1. A finite automaton is said to be deterministic if we have only one transition on the same input symbol from some state.
2. A DFA is a set of five tuples and represented as:
$M = (Q, \Sigma, \delta, q_0, F)$
where,
$Q$ = A set of non-empty finite states
$\Sigma$ = A set of non-empty finite input symbols
$q_0$ = Initial state of DFA
$F$ = A non-empty finite set of final states
$\delta : Q \times \Sigma \to Q$.

NFA:
1. A finite automaton is said to be non-deterministic if we have more than one possible transition on the same input symbol from some state.
2. A non-deterministic finite automaton is a set of five tuples and represented as:
$M = (Q, \Sigma, \delta, q_0, F)$
where,
$Q$ = A set of non-empty finite states
$\Sigma$ = A set of non-empty finite input symbols
$q_0$ = Initial state of NFA and member of $Q$
$F$ = A non-empty finite set of final states and subset of $Q$
$\delta$ = It is a transition function that takes a state from $Q$ and an input symbol from $\Sigma$ and returns a subset of $Q$. The $\delta$ is represented as:
$\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^{Q}$

Fig. 1.6.1.


Difference between DFA and NFA:

S. No. | DFA | NFA
1. | It stands for deterministic finite automata. | It stands for non-deterministic finite automata.
2. | Only one transition is possible from one state to another on the same input symbol. | More than one transition is possible from one state to another on the same input symbol.
3. | Transition function is written as $\delta : Q \times \Sigma \to Q$. | Transition function is written as $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^{Q}$.
4. | In DFA, $\varepsilon$-transition is not possible. | In NFA, $\varepsilon$-transition is possible.
5. | Every DFA is already an NFA, so no conversion is needed. | NFA can be converted into an equivalent DFA.

Example: DFA for the language that contains the strings ending with 0 over $\Sigma = \{0, 1\}$.

Fig. 1.6.2.

NFA for the language L which accepts all the strings in which the third symbol from the right end is always a over $\Sigma = \{a, b\}$.

Fig. 1.6.3.
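
The difference between the two machines shows up directly when they are simulated. The sketch below is an illustration only: the transition tables are assumed encodings of the two example languages (the state names q0 to q3 are not taken from the figures), and `run_dfa` / `run_nfa` are hypothetical helper names. A DFA follows exactly one state; an NFA has to carry the set of all states it could be in.

```python
def run_dfa(delta, start, finals, text):
    """Simulate a DFA: exactly one next state for each (state, symbol) pair."""
    state = start
    for symbol in text:
        state = delta[(state, symbol)]
    return state in finals

def run_nfa(delta, start, finals, text):
    """Simulate an NFA without epsilon moves by tracking every reachable state."""
    current = {start}
    for symbol in text:
        current = {nxt for s in current for nxt in delta.get((s, symbol), set())}
    return bool(current & finals)

# Assumed DFA for "strings over {0, 1} ending with 0"; q1 is the final state.
ends_with_0 = {("q0", "0"): "q1", ("q0", "1"): "q0",
               ("q1", "0"): "q1", ("q1", "1"): "q0"}

# Assumed NFA for "third symbol from the right end is a" over {a, b}; q3 is final.
third_from_right_a = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
                      ("q1", "a"): {"q2"}, ("q1", "b"): {"q2"},
                      ("q2", "a"): {"q3"}, ("q2", "b"): {"q3"}}

print(run_dfa(ends_with_0, "q0", {"q1"}, "1010"))
print(run_nfa(third_from_right_a, "q0", {"q3"}, "babb"))
```

In the NFA, state q0 has two possible moves on a, which is exactly the non-determinism described in the table above.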

Que 1.7. Explain Thompson's construction with example.

Answer
Thompson's construction:
1. It is an algorithm for transforming a regular expression to an equivalent NFA.
2. Following rules are defined for a regular expression as a basis for the construction:
i. The NFA representing the empty string is: [diagram]
ii. If the regular expression is just a character, thus a can be represented as: [diagram]
iii. The union operator is represented by a choice of transitions from a node, thus a | b can be represented as: [diagram]
iv. Concatenation simply involves connecting one NFA to the other, thus ab can be represented as: [diagram]
v. The Kleene closure must allow for taking zero or more instances of the letter from the input; thus a* looks like: [diagram]

For example:
Construct NFA for r = (a | b)*a
For r1 = a: [diagram]
For r2 = b: [diagram]
For r3 = a | b: [diagram]
The NFA for r4 = r3*: [diagram]
Finally, NFA for r5 = r4 r1 = (a | b)*a: [diagram]
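
The construction rules can also be written as small functions that build NFA fragments. The following is a minimal sketch, not a reference implementation: every function name is hypothetical, an NFA fragment is represented as a triple (start, accept, edges), `None` stands for an $\varepsilon$-move, and concatenation links the two fragments with an $\varepsilon$-move instead of merging states, which is a simplification of the textbook diagram.

```python
import itertools

_ids = itertools.count()          # source of fresh state numbers

def symbol(ch):
    """Fragment for one character: start --ch--> accept."""
    s, f = next(_ids), next(_ids)
    return s, f, [(s, ch, f)]

def union(a, b):
    """Fragment for a | b: a new start chooses either branch via epsilon moves."""
    s, f = next(_ids), next(_ids)
    extra = [(s, None, a[0]), (s, None, b[0]), (a[1], None, f), (b[1], None, f)]
    return s, f, a[2] + b[2] + extra

def concat(a, b):
    """Fragment for ab: accept of a is connected to start of b."""
    return a[0], b[1], a[2] + b[2] + [(a[1], None, b[0])]

def star(a):
    """Fragment for a*: allows skipping a entirely or repeating it."""
    s, f = next(_ids), next(_ids)
    extra = [(s, None, a[0]), (s, None, f), (a[1], None, a[0]), (a[1], None, f)]
    return s, f, a[2] + extra

def accepts(nfa, text):
    """Run the NFA on text using epsilon-closures."""
    start, accept, edges = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(moved)
    return accept in current

# r = (a | b)*a, built bottom-up in the same order as the worked example.
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("a"))
print(accepts(nfa, "abba"), accepts(nfa, "ab"))
```

Building the fragments bottom-up mirrors the steps r1 to r5 of the example: characters first, then union, closure and finally concatenation.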

Que 1.8. Construct the NFA for the regular expression a | abb | a*b* by using Thompson's construction methodology. AKTU 2017-18, Marks 10

Answer
Given regular expression: a + abb + a*b*

Step 1: [diagram]

Step 2: [diagram]

Step 3: [diagram]

Que 1.9. Draw NFA for the regular expression ab* | ab. AKTU 2016-17, Marks 10

Answer
Step 1: a [diagram]
Step 2: b* [diagram]
Step 3: b [diagram]
Step 4: ab* [diagram]
Step 5: ab [diagram]
Step 6: ab* | ab [diagram]

Fig. 1.9.1. NFA of ab* | ab.


Que 1.10. Discuss conversion of NFA into a DFA. Also give the algorithm used in this conversion. AKTU 2017-18, Marks 10

Answer
Conversion from NFA to DFA:
Suppose there is an NFA $N = \langle Q, \Sigma, q_0, \delta, F \rangle$ which recognizes a language L. Then the DFA $D = \langle Q', \Sigma, q_0, \delta', F' \rangle$ can be constructed for language L as:

Step 1: Initially $Q' = \phi$.

Step 2: Add $q_0$ to $Q'$.

Step 3: For each state in $Q'$, find the possible set of states for each input symbol using the transition function of the NFA. If this set of states is not in $Q'$, add it to $Q'$.

Step 4: Final states of the DFA will be all states which contain at least one state of $F$ (final states of NFA).
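
Steps 1 to 4 can be sketched as a short worklist algorithm. This is an illustration under assumptions: the NFA has no $\varepsilon$-moves, DFA states are represented as frozensets of NFA states, the function name `nfa_to_dfa` is hypothetical, and the sample NFA is an assumed four-state machine for (0 + 1)*(0 + 1)10.

```python
from collections import deque

def nfa_to_dfa(nfa_delta, start, nfa_finals, alphabet):
    """Subset construction for an NFA without epsilon moves."""
    start_set = frozenset([start])
    dfa_states = {start_set}                       # Q' after Step 2
    dfa_delta, queue = {}, deque([start_set])
    while queue:                                   # Step 3: process each state of Q'
        current = queue.popleft()
        for symbol in alphabet:
            target = frozenset(n for q in current
                               for n in nfa_delta.get((q, symbol), ()))
            dfa_delta[(current, symbol)] = target
            if target not in dfa_states:           # a new set of states joins Q'
                dfa_states.add(target)
                queue.append(target)
    # Step 4: a DFA state is final if it contains any final state of the NFA.
    dfa_finals = {s for s in dfa_states if s & nfa_finals}
    return dfa_states, dfa_delta, start_set, dfa_finals

# Assumed NFA: q1 loops on 0 and 1, then q1 -> q2 on 0 or 1, q2 -> q3 on 1, q3 -> q4 on 0.
nfa = {("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q1", "q2"},
       ("q2", "1"): {"q3"}, ("q3", "0"): {"q4"}}
states, delta, start, finals = nfa_to_dfa(nfa, "q1", {"q4"}, "01")
print(len(states), "DFA states,", len(finals), "final")
```

Only the subsets that are actually reachable from the start state are created, so the result is usually far smaller than the $2^{|Q|}$ worst case. If some symbol has no move from any state of a subset, the empty frozenset appears as a dead state.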

Que 1.11. Construct the minimized DFA for the regular expression (0 + 1)*(0 + 1)10. AKTU 2016-17, Marks 10

Answer
Given regular expression: (0 + 1)*(0 + 1)10
NFA for given regular expression: [diagram]

If we remove $\varepsilon$ we get: [diagram]

Now, we convert the above NFA into DFA:

Transition table for NFA:

$\delta/\Sigma$ | 0 | 1
$q_1$ | $q_1, q_2$ | $q_1, q_2$
$q_2$ | $\phi$ | $q_3$
$q_3$ | $q_4$ | $\phi$
$q_4$ | $\phi$ | $\phi$

Transition table for DFA:

$\delta/\Sigma$ | 0 | 1 | Let
$q_1$ | $q_1 q_2$ | $q_1 q_2$ | $q_1$ as A
$q_1 q_2$ | $q_1 q_2$ | $q_1 q_2 q_3$ | $q_1 q_2$ as B
$q_1 q_2 q_3$ | $q_1 q_2 q_4$ | $q_1 q_2 q_3$ | $q_1 q_2 q_3$ as C
$q_1 q_2 q_4$ | $q_1 q_2$ | $q_1 q_2 q_3$ | $q_1 q_2 q_4$ as D

Transition diagram for DFA:

State | 0 | 1
A (start) | B | B
B | B | C
C | D | C
D (final) | B | C

For minimization divide the rows of the transition table into 2 sets, as

Set-1: It consists of non-final state rows.

A | B | B
B | B | C
C | D | C

Set-2: It consists of final state rows.

D | B | C

No two rows within a set are similar.

So, the DFA is already minimized.

Que 1.12. How are finite automata useful for lexical analysis ?

Answer
1. Lexical analysis is the process of reading the source text of a program and converting it into a sequence of tokens.
2. The lexical structure of every programming language can be specified by a regular language; a common way to implement a lexical analyzer is to:
a. Specify regular expressions for all of the kinds of tokens in the language.
b. The disjunction of all of the regular expressions thus describes any possible token in the language.
c. Convert the overall regular expression specifying all possible tokens into a Deterministic Finite Automaton (DFA).
d. Translate the DFA into a program that simulates the DFA. This program is the lexical analyzer.
3. This approach is so useful that programs called lexical analyzer generators exist to automate the entire process.
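
Steps (a) and (b) can be tried out with an ordinary regex library. The sketch below is illustrative only: the token names and patterns are assumptions chosen for a tiny expression language, and Python's `re` module is a backtracking matcher rather than a DFA, so it stands in for steps (c) and (d) without implementing them.

```python
import re

# (a) One regular expression per kind of token (names and patterns are assumed).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]

# (b) The disjunction of all token expressions, with one named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token name, lexeme) pairs for the source text."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":        # white space is recognized but not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("a = (b + c) * 2"))
```

A generator such as LEX performs the same bookkeeping, but compiles the combined expression into a table-driven DFA instead of interpreting it.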

Que 1.13. Write down the regular expression for
1. The set of all strings over {a, b} such that the fifth symbol from the right is a.
2. The set of all strings over {0, 1} such that every block of four consecutive symbols contains at least two zeros.
AKTU 2017-18, Marks 10

Answer
1. Regular expression for all strings over {a, b} such that the fifth symbol from the right is a:

(a + b)* a (a + b) (a + b) (a + b) (a + b)

2. Regular expression:

[00(0 + 1)(0 + 1) + 0(0 + 1)0(0 + 1) + 0(0 + 1)(0 + 1)0 + (0 + 1)00(0 + 1) + (0 + 1)0(0 + 1)0 + (0 + 1)(0 + 1)00]

Each of the six terms fixes two of the four positions to 0, so the expression describes one block of four symbols containing at least two zeros.
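
An expression like the first one is easy to check mechanically by comparing it with the plain-language definition on every short string. The sketch below assumes Python's regex syntax, where (a + b) is written `[ab]`; the names `FIFTH_FROM_RIGHT_A` and `fifth_from_right_is_a` are hypothetical.

```python
import re
from itertools import product

# (a + b)* a (a + b)(a + b)(a + b)(a + b), written in Python's regex syntax.
FIFTH_FROM_RIGHT_A = re.compile(r"[ab]*a[ab]{4}")

def fifth_from_right_is_a(s):
    """Direct statement of the language, used as the reference answer."""
    return len(s) >= 5 and s[-5] == "a"

# Exhaustively compare the regex with the definition on all strings up to length 8.
for n in range(0, 9):
    for letters in product("ab", repeat=n):
        s = "".join(letters)
        assert bool(FIFTH_FROM_RIGHT_A.fullmatch(s)) == fifth_from_right_is_a(s)

print("regex agrees with the definition on all strings up to length 8")
```

An exhaustive comparison on short strings is not a proof, but it catches most slips such as a miscounted number of trailing (a + b) factors.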

Que 1.14. Convert following NFA to equivalent DFA and hence minimize the number of states in the DFA.

Fig. 1.14.1.

AKTU 2018-19, Marks 07

Answer
Transition table for $\varepsilon$-NFA: [table]

$\varepsilon$-closure of $\{q_0\}$ = $\{q_0, q_1, q_2\}$

$\varepsilon$-closure of $\{q_1\}$ = $\{q_1\}$

$\varepsilon$-closure of $\{q_2\}$ = $\{q_2\}$

Transition table for NFA: [table]

Let $q_0 q_1 q_2$ = A, with B and C naming the remaining state sets.

Transition table for NFA in terms of A, B and C: [table]

Transition table for DFA: [table]

So, DFA is given by

Fig. 1.14.2. DFA, including a dead state that loops on a, b, c.

PART-4
Implementation of Lexical Analyzers, Lexical Analyzer Generator, LEX Compiler.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.15. Explain the implementation of lexical analyzer.

Answer
Lexical analyzer can be implemented in following steps:
1. Input to the lexical analyzer is a source program.
2. By using an input buffering scheme, it scans the source program.
3. Regular expressions are used to represent the input patterns.
4. Now this input pattern is converted into an NFA by using finite automata.

Fig. 1.15.1. Implementation of lexical analyzer: the input program goes to the lexical analyzer, which is built from regular expressions and finite automata, works with the symbol table, and produces the tokenized output file.

5. These NFAs are then converted into DFAs, and the DFAs are minimized by using different methods of minimization.
6. The minimized DFAs are used to recognize the patterns and to break the input into lexemes.
7. Each minimized DFA is associated with a phrase in a programming language which will evaluate the lexemes that match the regular expression.
8. The tool then constructs a state table for the appropriate finite state machine and creates program code which contains the table, the evaluation phrases, and a routine which uses them appropriately.

Que 1.16. Write short notes on lexical analyzer generator.

Answer
1. For efficient design of compiler, various tools are used to automate the phases of compiler. The lexical analysis phase can be automated using a tool called LEX.
2. LEX is a Unix utility which generates a lexical analyzer.
3. The lexical analyzer is generated with the help of regular expressions.
4. A lexer generated by LEX is very fast in finding the tokens as compared to a handwritten lexical analyzer program in C.
5. LEX scans the source program in order to get the stream of tokens, and these tokens can be related together so that various programming structures such as expressions, block statements, control structures and procedures can be recognized.

Que 1.17. Explain the automatic generation of lexical analyzer.

Answer
1. Automatic generation of lexical analyzer is done using LEX programming language.
2. The LEX specification file can be denoted using the extension .l (often pronounced as dot L).
3. For example, let us consider the specification file x.l.
4. This x.l file is then given to the LEX compiler to produce lex.yy.c as shown in Fig. 1.17.1. This lex.yy.c is a C program which is actually a lexical analyzer program.

Fig. 1.17.1. Lex specification x.l -> LEX compiler -> lex.yy.c (lexical analyzer program).

5. The LEX specification file stores the regular expressions for the tokens, and the lex.yy.c file consists of the tabular representation of the transition diagrams constructed for the regular expressions.
6. In the specification file, LEX actions are associated with every regular expression.
7. These actions are simply the pieces of C code that are directly carried over to the lex.yy.c.
8. Finally, the C compiler compiles this generated lex.yy.c and produces an object program a.out as shown in Fig. 1.17.2.
9. When some input stream is given to a.out then a sequence of tokens gets generated. The described scenario is shown in Fig. 1.17.2.

Fig. 1.17.2. Generation of lexical analyzer using LEX: lex.yy.c -> C compiler -> a.out (executable program); input strings from source program -> a.out -> stream of tokens.

Que 1.18. Explain different parts of LEX program.

Answer
The LEX program consists of three parts:

Declaration section

Rule section

Auxiliary procedure section

1. Declaration section:
a. In the declaration section, declaration of variables and constants can be done.
b. Some regular definitions can also be written in this section.
c. The regular definitions are basically components of regular expressions.
2. Rule section:
a. The rule section consists of regular expressions with associated actions. These translation rules can be given in the form:
R1 {action1}
R2 {action2}
...
Rn {actionn}
where each Ri is a regular expression and each actioni is a program fragment describing what action is to be taken for the corresponding regular expression.
b. These actions can be specified by a piece of C code.
3. Auxiliary procedure section:
a. In this section, all the procedures are defined which are required by the actions in the rule section.
b. This section consists of two functions:
i. main() function
ii. yywrap() function

Que 1.19. Write a LEX program to identify few reserved words of C language.

Answer
%{
/* program to recognize the keywords */
#include <stdio.h>
int count = 0;
%}
%%
[ \t]+    ;   /* "+" indicates one or more; this pattern is used for ignoring the white spaces */
auto|double|if|static|break|else|int|struct|case|enum|long|switch|char|extern|near|typedef|const|float|register|union|unsigned|void|while|default  { count++; printf("C keyword(%d):\t %s", count, yytext); }
[a-zA-Z]+ { printf("%s: is not the keyword\n", yytext); }
%%
int main()
{
    yylex();
    return 0;
}
int yywrap() { return 1; }

Que 1.20. What are the various LEX actions that are used in LEX programming ?

Answer
There are following LEX actions that can be used for ease of programming using LEX tool:
1. BEGIN: It indicates the start state. The lexical analyzer starts at state 0.
2. ECHO: It emits the input as it is.
3. yytext:
a. yytext is a null terminated string that stores the lexeme when the lexer recognizes a token from the input.
b. When a new token is found the contents of yytext are replaced by the new token.
4. yylex(): This is an important function. The function yylex() is called when the scanner starts scanning the source program.
5. yywrap():
a. The function yywrap() is called when the scanner encounters end of file.
b. If yywrap() returns 0 then the scanner continues scanning.
c. If yywrap() returns 1 that means end of file is encountered.
6. yyin: It is the standard input file that stores the input source program.
7. yyleng: yyleng stores the length, i.e., the number of characters, of the lexeme currently held in yytext.

Que 1.21. Explain the terms token, lexeme and pattern.

Answer
Token:
1. A token is a pair consisting of a token name and an optional attribute value.
2. The token name is an abstract symbol representing a kind of lexical unit.
3. Tokens can be identifiers, keywords, constants, operators and punctuation symbols such as commas and parentheses.
Lexeme:
1. A lexeme is a sequence of characters in the source program that matches the pattern for a token.
2. Lexeme is identified by the lexical analyzer as an instance of that token.
Pattern:
1. A pattern is a description of the form that the lexemes of a token may take.
2. Regular expressions play an important role for specifying patterns.
3. If a keyword is considered as a token, the pattern is just the sequence of characters that form the keyword.

PART-5
Formal Grammars and their Application to Syntax Analysis, BNF Notation.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.22. Describe grammar.

Answer
A grammar or phrase structured grammar is a combination of four tuples and can be represented as $G = (V, T, P, S)$, where,
1. $V$ is a finite non-empty set of variables/non-terminals. Generally non-terminals are represented by capital letters like A, B, C, ..., X, Y, Z.
2. $T$ is a finite non-empty set of terminals, sometimes also represented by $\Sigma$ or $V_T$. Generally terminals are represented by a, b, c, x, y, z, $\alpha$, $\beta$, $\gamma$ etc.
3. $P$ is a finite set whose elements are of the form $\alpha \to \beta$, where $\alpha$ and $\beta$ are strings made up by combination of $V$ and $T$, i.e., $(V \cup T)^{*}$. $\alpha$ has at least one symbol from $V$. Elements of $P$ are called productions or production rules or rewriting rules.
4. $S$ is a special variable/non-terminal known as the starting symbol.

While writing a grammar, it should be noted that $V \cap T = \phi$, i.e., no terminal can belong to the set of non-terminals and no non-terminal can belong to the set of terminals.

Que 1.23. What is Context Free Grammar (CFG) ? Explain.

Answer
Context free grammar:
1. A CFG describes a language by recursive rules called productions.
2. A CFG can be described as a combination of four tuples and represented by $G(V, T, P, S)$,
where,
$V$ : set of variables or non-terminals represented by A, B, ..., Y, Z.
$T$ : set of terminals represented by a, b, c, ..., x, y, z, +, -, *, (, ) etc.
$S$ : starting symbol.
$P$ : set of productions.
3. The production used in CFG must be in the form of $A \to \alpha$, where $A$ is a variable and $\alpha$ is a string of symbols in $(V \cup T)^{*}$.
4. The example of CFG is:
$G = (V, T, P, S)$
where $V = \{E\}$, $T = \{+, *, (, ), id\}$, $S = \{E\}$ and production $P$ is given as:
$E \to E + E$
$E \to E * E$
$E \to (E)$
$E \to id$

Que 1.24. Explain formal grammar and its application to syntax analyzer.

Answer
1. Formal grammar represents the specification of a programming language with the use of production rules.
2. The syntax analyzer basically checks the syntax of the language.
3. A syntax analyzer takes the tokens from the lexical analyzer and groups them in such a way that some programming structure can be recognized.
4. After grouping the tokens, if any syntax cannot be recognized then a syntactic error will be generated.
5. This overall process is called syntax checking of the language.
6. This syntax can be checked in the compiler by writing the specifications.
7. Specification tells the compiler how the syntax of the programming language should be.

Que 1.25. Write short note on BNF notation.

Answer
BNF notation:
1. The BNF (Backus-Naur Form) is a notation technique for context free grammar. This notation is useful for specifying the syntax of the language.
2. The BNF specification is as:
<symbol> ::= Exp1 | Exp2 | Exp3 ...
where <symbol> is a non-terminal, and Exp1, Exp2, ... are sequences of symbols. These symbols can be a combination of terminals or non-terminals.
3. For example:
<Address> ::= <fullname> "," <street> "," <zip code>
<fullname> ::= <firstname> " " <middle name> " " <surname>
<street> ::= <street name> "," <city>
We can specify first name, middle name, surname, street name, city and zip code by valid strings.
4. The BNF notation is more often non-formal and in human readable form. But commonly used notations in BNF are:
a. Optional symbols are written with square brackets.
b. For repeating the symbol for 0 or more number of times, asterisk can be used.
For example: {name}*
c. For repeating the symbols for at least one or more number of times, + is used.
For example: {name}+
d. The alternative rules are separated by vertical bar.
e. The group of items must be enclosed within brackets.

PART-6
Ambiguity, YACC

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.26. What is an ambiguous grammar ? Is the following grammar ambiguous ? Prove.
$E \to EE+ \mid E(E) \mid id$
AKTU 2016-17, Marks 10

Answer
Ambiguous grammar: A context free grammar G is ambiguous if there is at least one string in L(G) having two or more distinct derivation trees.
Proof: Let the production rules be given as:
$E \to EE+$
$E \to E(E)$
$E \to id$
Parse tree for id(id)id+ is: [diagram]

Only one parse tree is possible for id(id)id+, so the given grammar is unambiguous.

Que 1.27. Write short note on:
i. Context free grammar
ii. YACC parser generator
OR
Write a short note on YACC parser generator.
AKTU 2017-18, Marks 05

Answer
i. Context free grammar: Refer Q. 1.23.
ii. YACC parser generator:
1. YACC (Yet Another Compiler-Compiler) is the standard parser generator for the Unix operating system.
2. An open source program, YACC generates code for the parser in the C programming language.
3. It is a Look Ahead Left-to-Right (LALR) parser generator, generating a parser, the part of a compiler that tries to make syntactic sense of the source code.

Que 1.28. Consider the grammar G given as follows:
$S \to AB \mid aaB$
$A \to a \mid Aa$
$B \to b$
Determine whether the grammar G is ambiguous or not. If G is ambiguous then construct an unambiguous grammar equivalent to G.

Answer
Given:
$S \to AB \mid aaB$
$A \to a \mid Aa$
$B \to b$

Let us generate string aab from the given grammar. Parse trees for generating string aab are as follows: one tree starts with $S \to AB$ (with $A \to Aa \to aa$), the other starts with $S \to aaB$.

Fig. 1.28.1.

Here for the same string, we are getting more than one parse tree. Hence, the grammar is an ambiguous grammar.
The grammar
$S \to AB$
$A \to Aa \mid a$
$B \to b$
is an unambiguous grammar equivalent to G. Now this grammar has only one parse tree for string aab.

Fig. 1.28.2.
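
The claim about the number of parse trees can be checked by counting them. The sketch below is an illustration with a hypothetical helper, `count_parse_trees`, which assumes single-character terminals and a grammar without $\varepsilon$-productions or unit productions between non-terminals (both grammars here satisfy that). It counts, for each non-terminal and each substring, the number of distinct trees that derive it.

```python
from functools import lru_cache

def count_parse_trees(grammar, start, text):
    """Count distinct parse trees of text; assumes no epsilon or unit productions."""

    @lru_cache(maxsize=None)
    def count_symbol(sym, i, j):
        if sym not in grammar:                         # terminal symbol
            return 1 if text[i:j] == sym else 0
        return sum(count_seq(tuple(rhs), i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        total = 0
        # The first symbol derives text[i:k]; each remaining symbol needs at least one character.
        for k in range(i + 1, j - len(rhs) + 2):
            first = count_symbol(rhs[0], i, k)
            if first:
                total += first * count_seq(rhs[1:], k, j)
        return total

    return count_symbol(start, 0, len(text))

G = {"S": [["A", "B"], ["a", "a", "B"]], "A": [["a"], ["A", "a"]], "B": [["b"]]}
G_unambiguous = {"S": [["A", "B"]], "A": [["A", "a"], ["a"]], "B": [["b"]]}

print(count_parse_trees(G, "S", "aab"))
print(count_parse_trees(G_unambiguous, "S", "aab"))
```

A count greater than one for any string is enough to show ambiguity; a count of one for a single string, as for the second grammar, only confirms that this particular string has a unique tree.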

PART-7
The Syntactic Specification of Programming Languages: Context Free Grammar (CFG), Derivation and Parse Trees, Capabilities of CFG.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 1.29. Define parse tree. What are the conditions for constructing a parse tree from a CFG ?

Answer
Parse tree:
1. A parse tree is an ordered tree in which the left hand side of a production represents a parent node and children nodes are represented by the production's right hand side.
2. Parse tree is the tree representation of deriving a Context Free Language (CFL) from a given Context Free Grammar (CFG). These types of trees are sometimes called derivation trees.

Conditions for constructing a parse tree from a CFG:
i. Each vertex of the tree must have a label. The label is a non-terminal or terminal or null ($\varepsilon$).
ii. The root of the tree is the start symbol, i.e., S.
iii. The labels of the internal vertices are non-terminal symbols $\in V_N$.
iv. If there is a production $A \to X_1 X_2 \ldots X_K$, then for a vertex labelled $A$, the children of that node will be $X_1, X_2, \ldots, X_K$.
v. A vertex n is called a leaf of the parse tree if its label is a terminal symbol $\in \Sigma$ or null ($\varepsilon$).

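Que 1.30. How is derivation defined in CFG ?

Answer
1. A derivation is a sequence of production rule applications that is used to find out whether a string is a valid statement of the grammar or not.
2. We can define the notations to represent a derivation.
3. First, we define two notations $\Rightarrow_G$ and $\Rightarrow_G^{*}$.
4. If $\alpha \to \beta$ is a production of $P$ in CFG and $a$ and $b$ are strings in $(V_N \cup V_T)^{*}$, then
$a\alpha b \Rightarrow_G a\beta b$.
5. We say that the production $\alpha \to \beta$ is applied to the string $a\alpha b$ to obtain $a\beta b$, or we say that $a\alpha b$ directly derives $a\beta b$.
6. Now suppose $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_m$ are strings in $(V_N \cup V_T)^{*}$, $m \geq 1$ and
$\alpha_1 \Rightarrow_G \alpha_2,\ \alpha_2 \Rightarrow_G \alpha_3,\ \ldots,\ \alpha_{m-1} \Rightarrow_G \alpha_m$.
Then we say that $\alpha_1 \Rightarrow_G^{*} \alpha_m$, i.e., we say $\alpha_1$ derives $\alpha_m$ in grammar G. If $\alpha$ derives $\beta$ by exactly $i$ steps, we say $\alpha \Rightarrow_G^{i} \beta$.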

Que 1.31. What do you mean by left most derivation and right
most derivation with example ?

Answer
Left most derivation : The derivation S ⇒* s is called a left most derivation
if the production is applied only to the left most variable (non-terminal) at
every step.
Example : Let us consider a grammar G that consists of production rules
E → E + E | E * E | id.
Firstly take the production
E ⇒ E + E
  ⇒ E * E + E (Replace E → E * E)
  ⇒ id * E + E (Replace E → id)
  ⇒ id * id + E (Replace E → id)
  ⇒ id * id + id (Replace E → id)

Right most derivation : A derivation S ⇒* s is called a right most derivation
if the production is applied only to the right most variable (non-terminal) at
every step.
Example : Let us consider a grammar G having productions
E → E + E | E * E | id.
Start with production
E ⇒ E * E
  ⇒ E * E + E (Replace E → E + E)
  ⇒ E * E + id (Replace E → id)
  ⇒ E * id + id (Replace E → id)
  ⇒ id * id + id (Replace E → id)
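
The two derivations can be replayed mechanically. The following is a minimal sketch, assuming the expression grammar E → E + E | E * E | id; the helper name `derive` and the way productions are passed in as strings are illustrative choices, not part of the notes.

```python
def derive(start, bodies, leftmost=True):
    """Rewrite the leftmost (or rightmost) E with each body in turn."""
    form = [start]
    history = [" ".join(form)]
    for body in bodies:
        # Positions of the only non-terminal of this grammar, E.
        positions = [i for i, sym in enumerate(form) if sym == "E"]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = body.split()
        history.append(" ".join(form))
    return history


left = derive("E", ["E + E", "E * E", "id", "id", "id"], leftmost=True)
right = derive("E", ["E * E", "E + E", "id", "id", "id"], leftmost=False)

for step in left:
    print(step)
```

Both lists end in the same sentence id * id + id; only the order in which the non-terminals are replaced differs.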

Que 1.32. Describe the capabilities of CFG.

Answer
Various capabilities of CFG are :
1. Context free grammar is useful to describe most of the programming
languages.
2. If the grammar is properly designed then an efficient parser can be
constructed automatically.
3. Using the features of associativity and precedence information,
grammars for expressions can be constructed.
4. Context free grammar is capable of describing nested structures like
balanced parentheses, matching begin-end, corresponding if-then-else's
and so on.

VERY IMPORTANT QUESTIONS

Following questions are very important. These questions
may be asked in your SESSIONALS as well as
UNIVERSITY EXAMINATION.

Q. 1. Explain in detail the process of compilation. Illustrate the
output of each phase of compilation of the input
"a = (b + c)*(b + c)* 2".
Ans. Refer Q. 1.1.

Q. 2. Define and differentiate between DFA and NFA with an
example.
Ans. Refer Q. 1.6.

Q. 3. Construct the minimized DFA for the regular expression
(0 + 1)*(0 + 1)10.
Ans. Refer Q. 1.11.

Q. 4. Explain the implementation of lexical analyzer.
Ans. Refer Q. 1.15.

Q. 5. Convert following NFA to equivalent DFA and hence
minimize the number of states in the DFA.

UNIT 2
Basic Parsing Techniques

CONTENTS
Part-1 : Basic Parsing Techniques : Parsers, Shift Reduce Parsing
Part-2 : Operator Precedence Parsing
Part-3 : Top-down Parsing, Predictive Parsers
Part-4 : Automatic Generation of Efficient Parsers : LR Parsers,
         The Canonical Collections of LR(0) Items
Part-5 : Constructing SLR Parsing Tables
Part-6 : Constructing Canonical LR Parsing Tables
Part-7 : Constructing LALR Parsing Tables Using Ambiguous Grammars,
         An Automatic Parser Generator, Implementation of LR Parsing Tables

PART-1
Basic Parsing Techniques : Parsers, Shift Reduce Parsing.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 2.1. What is parser ? Write the role of parser. What are the
most popular parsing techniques ?
OR
Explain about basic parsing techniques. What is top-down parsing ?
Explain in detail.

Answer
A parser for any grammar is a program that takes as input a string w and
produces as output a parse tree for w.
Role of parser :
1. The role of parsing is to determine the syntactic validity of a source
string.
2. Parser helps to report any syntax errors and recover from those errors.
3. Parser helps to construct the parse tree and passes it to the rest of the
phases of the compiler.
There are basically two types of parsing techniques :
1. Top-down parsing :
a. Top-down parsing attempts to find the left-most derivation for an
input string w, starting from the root (or start symbol) and
creating the nodes of the parse tree in preorder.
b. In top-down parsing, the input string w is scanned by the parser
from left to right, one symbol/token at a time.
c. The left-most derivation generates the leaves of the parse tree in left
to right order, which matches the input scan order.
d. In top-down parsing, parsing decisions are based on the
lookahead symbol (or sequence of symbols).
2. Bottom-up parsing :
a. Bottom-up parsing can be defined as an attempt to reduce the input
string w to the start symbol of a grammar by finding out the right
most derivation of w in reverse.
b. Parsing involves searching for the substring that matches the right
side of any of the productions of the grammar.
c. This substring is replaced by the left hand side non-terminal of the
production.
d. The process of replacing the right side of the production by the left side
non-terminal is called "reduction".

Que 2.2. Discuss bottom-up parsing. What are bottom-up
parsing techniques ?

Answer
Bottom-up parsing : Refer Q. 2.1, Page 2-2C, Unit-2.
Bottom-up parsing techniques are :
1. Shift-reduce parser :
a. Shift-reduce parser attempts to construct the parse tree from leaves
to root and uses a stack to hold grammar symbols.
b. A parser goes on shifting the input symbols onto the stack until a
handle comes on the top of the stack.
c. When a handle appears on the top of the stack, it performs
reduction.

Fig. 2.2.1. Shift-reduce parser (an input tape holding id1 + id2 * id3 $ with a
read/write head, the shift-reduce parser seeking a handle on the stack top,
and the stack).

d. This parser performs following basic operations :
i. Shift
ii. Reduce
iii. Accept
iv. Error
2. LR parser : LR parser is the most efficient method of bottom-up
parsing which can be used to parse a large class of context free
grammars. This method is called LR(k) parsing. Here
a. L stands for left to right scanning.
b. R stands for right-most derivation in reverse.
c. k is the number of lookahead input symbols. When k is omitted it is
assumed to be 1.

Que 2.3. What are the common conflicts that can be encountered
in shift-reduce parser ?

Answer
There are two most common conflicts encountered in shift-reduce parser :
1. Shift-reduce conflict :
a. The shift-reduce conflict is the most common type of conflict found
in grammars.
b. This conflict occurs when, for a particular token, the parser can
either shift the token or reduce by some production rule of the
grammar, and cannot decide between the two.
c. This error is often caused by recursive grammar definitions where
the system cannot determine when one rule is complete and
another is just started.
2. Reduce-reduce conflict :
a. A reduce-reduce conflict is caused when a grammar allows two or
more different rules to be reduced at the same time, for the same
token.
b. When this happens, the grammar becomes ambiguous since a
program can be interpreted in more than one way.
c. This error can be caused when the same rule is reached by more
than one path.

PART-2
Operator Precedence Parsing.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 2.4. Explain operator precedence parsing with example.

Answer
1. A grammar G is said to be an operator precedence grammar if it possesses
the following properties :
a. No production has ε on the right side.
b. There should not be any production rule possessing two adjacent
non-terminals at the right hand side.
2. In operator precedence parsing, we will first define precedence relations
<·, ≐ and ·> between pairs of terminals. The meanings of these relations
are :
p <· q : p yields precedence to q (q has higher precedence than p).
p ≐ q : p has the same precedence as q.
p ·> q : p takes precedence over q.
For example :
Consider the grammar for arithmetic expressions
E → E A E | (E) | - E | id
A → + | - | * | / | ^
Substituting each alternative of A into E → E A E removes the adjacent
non-terminals and gives an operator grammar.
1. Now consider the string id + id * id.
2. We will insert $ symbols at the start and end of the input string. We will
also insert precedence operators by referring to the precedence relation
table.
$ <· id ·> + <· id ·> * <· id ·> $
3. We will follow the following steps to parse the given string :
a. Scan the input from left to right until the first ·> is encountered.
b. Scan backwards over ≐ until <· is encountered.
c. The handle is the string between <· and ·>.
4. The parsing can be done as follows :

$ <· id ·> + <· id ·> * <· id ·> $
    Handle id is obtained between <· ·>. Reduce this by E → id.
E + <· id ·> * <· id ·> $
    Handle id is obtained between <· ·>. Reduce this by E → id.
E + E * <· id ·> $
    Handle id is obtained between <· ·>. Reduce this by E → id.
E + E * E
    Remove all the non-terminals.
+ *
    Insert $ at the beginning and at the end. Also insert the precedence
    operators.
$ <· + <· * ·> $
    The * operator is surrounded by <· ·>. This indicates that * becomes
    handle. That means, we have to reduce the E * E operation first.
$ <· + ·> $
    Now + becomes handle. Hence, we evaluate E + E.
$ $
    Parsing is done.

Que 2.5. Give the algorithm for computing precedence function.
Consider the following operator precedence matrix, draw precedence
graph and compute the precedence function :
a

AKTU 2015-16, Marks 10


Answer
Algorithm for computing precedence function :
Input : An operator precedence matrix.
Output : Precedence functions representing the input matrix or an indication
that none exist.
Method :
1. Create symbols fa and ga for each a that is a terminal or $.
2. Partition the created symbols into as many groups as possible, in such a
way that if a ≐ b, then fa and gb are in the same group.
3. Create a directed graph whose nodes are the groups found in step 2. For
any a and b, if a <· b, place an edge from the group of gb to the group
of fa. If a ·> b, place an edge from the group of fa to that of gb.
4. If the graph constructed in step 3 has a cycle, then no precedence
functions exist. If there are no cycles, let f(a) be the length of the longest
path beginning at the group of fa; let g(b) be the length of the longest path
beginning at the group of gb. Then there exists a precedence function.
Precedence graph for above matrix is :

Fig. 2.6.1.

From the precedence graph, the precedence functions are calculated using the
algorithm as follows :

3
022 3 0

Que 2.6. Give operator precedence parsing algorithm. Consider
the following grammar and build up operator precedence table. Also
parse the input string (id + (id * id))
E → E + T | T, T → T * F | F, F → (E) | id
AKTU 2017-18, Marks 10


Answer
Operator precedence parsing algorithm :
Let the input string be a1, a2, ..., an $. Initially, the stack contains $.
1. Set p to point to the first symbol of w$.
2. Repeat : Let a be the topmost terminal symbol on the stack and let b be
the current input symbol.
i. If only $ is on the stack and only $ is the input then accept and
break.
else
begin
ii. If a <· b or a ≐ b then shift b onto the stack and increment p to the
next input symbol.
iii. else if a ·> b then reduce :
iv. Repeat :
pop the stack
v. Until the top stack terminal is related by <· to the terminal most
recently popped.
else
vi. Call the error correcting routine
end
Operator precedence table :

     |  +  |  *  |  (  |  )  | id  |  $
  +  | ·>  | <·  | <·  | ·>  | <·  | ·>
  *  | ·>  | ·>  | <·  | ·>  | <·  | ·>
  (  | <·  | <·  | <·  | ≐   | <·  |
  )  | ·>  | ·>  |     | ·>  |     | ·>
 id  | ·>  | ·>  |     | ·>  |     | ·>
  $  | <·  | <·  | <·  |     | <·  |

Parsing :
$ ( <· id ·> + ( <· id ·> * <· id ·> ) ) $
    Handle id is obtained between <· ·>. Reduce this by F → id.
( F + ( <· id ·> * <· id ·> ) ) $
    Handle id is obtained between <· ·>. Reduce this by F → id.
( F + ( F * <· id ·> ) ) $
    Handle id is obtained between <· ·>. Reduce this by F → id.
( F + ( F * F ) )
    Remove all the non-terminals.
( + ( * ) )
    Insert $ at the beginning and at the end.
    Also insert the precedence operators.
$ <· ( <· + <· ( <· * ·> ) ·> ) ·> $
    The * operator is surrounded by <· ·>. This indicates that * becomes
    handle. That means we have to reduce the T * F operation first.
$ <· + ·> $
    Once the parenthesised handles are reduced, + becomes handle. Hence we
    evaluate E + T.
$ $
    Parsing is done.

PART-3
Top-down Parsing, Predictive Parsers

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 2.7. What are the problems with top-down parsing ?

Answer
Problems with top-down parsing are :
1. Backtracking :
a. Backtracking is a technique in which, for expansion of a non-terminal
symbol, we choose one alternative and if some mismatch occurs then
we try another alternative, if any.
b. If for a non-terminal there are multiple production rules beginning
with the same input symbol then, to get the correct derivation, we
need to try all these alternatives.
c. Secondly, in backtracking, we need to move some levels upward in
order to check the possibilities. This adds a lot of overhead to the
implementation of parsing.
d. Hence, it becomes necessary to eliminate the backtracking by
modifying the grammar.
2. Left recursion :
a. A left recursive grammar is represented as
A ⇒+ Aα
b. Here ⇒+ means deriving the input in one or more steps.
c. Here, A is a non-terminal and α denotes some input string.
d. If left recursion is present in the grammar then a top-down parser
can enter into an infinite loop.

Fig. 2.7.1. Left recursion.

e. This causes a major problem in top-down parsing and therefore
elimination of left recursion is a must.
3. Left factoring :
a. Left factoring is needed when it is not clear which of the two
alternatives should be used to expand the non-terminal.
b. If the grammar is not left factored then it becomes difficult for the
parser to make decisions.

Que 2.8. What do you understand by left factoring and left
recursion and how are they eliminated ?

Answer
Left factoring and left recursion : Refer Q. 2.7, Page 2-8C, Unit-2.
Left factoring can be eliminated by the following scheme :
a. In general if
A → αβ1 | αβ2 | ... | αβn | γ1 | ... | γm
is a production then it is not possible for the parser to take a decision
about which of the rules beginning with α to choose.
b. In such a situation, the given grammar can be left factored as
A → αA' | γ1 | ... | γm
A' → β1 | β2 | ... | βn
Left recursion can be eliminated by the following scheme :
a. In general if
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
where no βi begins with an A.
b. In such a situation we replace the A-productions by
A → β1A' | β2A' | ... | βnA'
A' → α1A' | α2A' | ... | αmA' | ε
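
The replacement scheme for immediate left recursion is short enough to sketch in code. This is an illustrative helper under stated assumptions: the function name is hypothetical, each production body is a list of symbols, an empty list stands for ε, and the new non-terminal is named by appending a prime.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # the alpha_i
    betas = [b for b in bodies if not b or b[0] != head]     # the beta_j
    if not alphas:
        return {head: bodies}          # nothing to do
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],     # [] is epsilon
    }


result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(result)
```

Applied to E → E + T | T this gives E → T E' and E' → + T E' | ε. Indirect left recursion (through other non-terminals) needs the productions to be substituted into one another first, which this sketch does not attempt.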
Que 2.9. Eliminate left recursion from the following grammar
S → AB, A → BS | b, B → SA | a
AKTU 2017-18, Marks 10

Answer
S → AB
A → BS | b
B → SA | a

S → AB
S → BSB | bB
S → SASB | aSB | bB
S → aSBS' | bBS'
S' → ASBS' | ε
B → ABA | a
B → BABBA | bBA | a
B → bBAB' | aB'
B' → ABBAB' | ε
A → BS | b
A → SAS | aS | b
A → ABAAB | aAB | b
A → aABA' | bA'
A' → BAABA' | ε
The productions after eliminating left recursion are :
S → aSBS' | bBS'
S' → ASBS' | ε
A → aABA' | bA'
A' → BAABA' | ε
B → bBAB' | aB'
B' → ABBAB' | ε
Que 2.10. Write short notes on top-down parsing. What are top-
down parsing techniques ?

Answer
Top-down parsing : Refer Q. 2.1, Page 2-2C, Unit-2.
Top-down parsing techniques are :
1. Recursive-descent parsing :
i. A top-down parser that executes a set of recursive procedures to
process the input without backtracking is called a recursive-descent
parser and the parsing is called recursive-descent parsing.
ii. The recursive procedures can be easy to write and fairly efficient
if written in a language that implements the procedure call
efficiently.
2. Predictive parsing :
i. Predictive parsing is an efficient way of implementing recursive-
descent parsing by handling the stack of activation records
explicitly.
ii. The predictive parser has an input, a stack, a parsing table, and an
output. The input contains the string to be parsed, followed by $,
the right end-marker.

Fig. 2.10.1. Model of a predictive parser (input a + b $, stack X Y Z $,
predictive parsing program, parsing table and output).

iii. The stack contains a sequence of grammar symbols with $
indicating the bottom of the stack. Initially, the stack contains the
start symbol of the grammar preceded by $.
iv. The parsing table is a two-dimensional array M[A, a] where 'A' is
a non-terminal and 'a' is a terminal or the symbol $.
v. The parser is controlled by a program that behaves as follows :
The program determines X, the symbol on top of the stack, and 'a', the
current input symbol. These two symbols determine the action of
the parser.
vi. Following are the possibilities :
a. If X = a = $, the parser halts and announces successful
completion of parsing.
b. If X = a ≠ $, the parser pops X off the stack and advances the
input pointer to the next input symbol.
c. If X is a non-terminal, the program consults entry M[X, a] of
the parsing table M. This entry will be either an X-production
of the grammar or an error entry.
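
The three cases of the driver can be sketched as a short loop. The example below assumes a small balanced-parentheses grammar S → ( S ) S | ε, chosen only because its table is easy to check by hand; the function name `predictive_parse` and the dictionary layout of M are illustrative assumptions.

```python
def predictive_parse(tokens, table, start):
    """Table-driven predictive parser; returns True if the input is accepted."""
    nonterminals = {nt for nt, _ in table}
    stack = ["$", start]                 # $ marks the bottom of the stack
    tokens = list(tokens) + ["$"]        # $ is the right end-marker
    pos = 0
    while True:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":
            return True                  # case a: X = a = $
        if X not in nonterminals:
            if X != a:
                return False             # terminal mismatch
            stack.pop()                  # case b: X = a, pop and advance
            pos += 1
        else:
            body = table.get((X, a))     # case c: consult M[X, a]
            if body is None:
                return False             # error entry
            stack.pop()
            stack.extend(reversed(body)) # push the body, leftmost symbol on top


# M for the assumed grammar S -> ( S ) S | epsilon; [] stands for epsilon.
M = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],
    ("S", "$"): [],
}
print(predictive_parse("(()())", M, "S"))
```

The body of the chosen production is pushed in reverse so that its leftmost symbol ends up on top of the stack, which is what makes the parser trace a left-most derivation.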

Que 2.11. Differentiate between top-down and bottom-up parser.
Under which conditions predictive parsing can be constructed for a
grammar ?

Answer

S. No. | Top-down parser | Bottom-up parser
1. | In top-down parser left-most derivation is done. | In bottom-up parser right-most derivation (in reverse) is done.
2. | Backtracking is possible. | Backtracking is not possible.
3. | In this, input tokens are popped off the stack. | In this, input tokens are pushed on the stack.
4. | First and follow are defined in top-down parser. | First and follow are used in bottom-up parser.
5. | Predictive parser and recursive descent parser are top-down parsing techniques. | Shift-reduce parser, operator precedence parser, and LR parser are bottom-up parsing techniques.

Predictive parsing can be constructed if the following conditions
hold :
1. The grammar must not be left recursive.
2. The grammar must be left factored.

Que 2.12. What are the problems with top-down parsing ? Write
the algorithm for FIRST and FOLLOW. AKTU 2018-19, Marks 07

Answer
Problems with top-down parsing : Refer Q. 2.7, Page 2-8C, Unit-2.
Algorithm for FIRST and FOLLOW :
1. FIRST function :
i. FIRST(X) is the set of terminal symbols that are the first symbols
appearing at R.H.S. in derivations of X.
ii. Following are the rules used to compute the FIRST function :
a. If X is a terminal symbol 'a' then FIRST(X) = {a}.
b. If there is a rule X → ε then FIRST(X) contains {ε}.
c. If X is a non-terminal and X → Y1 Y2 ... Yk is a production and
if ε is in all of FIRST(Y1) ... FIRST(Yk) then
FIRST(X) = FIRST(Y1) ∪ FIRST(Y2) ∪ ... ∪ FIRST(Yk).
In general, FIRST(Yi) is added to FIRST(X) only when ε is in
every one of FIRST(Y1) ... FIRST(Yi-1).
2. FOLLOW function :
i. FOLLOW(A) is defined as the set of terminal symbols that appear
immediately to the right of A.
ii. FOLLOW(A) = {a | S ⇒* αAaβ where α and β are some grammar
symbols, may be terminal or non-terminal}.
iii. The rules for computing the FOLLOW function are as follows :
a. For the start symbol S place $ in FOLLOW(S).
b. If there is a production A → αBβ then everything in FIRST(β)
without ε is to be placed in FOLLOW(B).
c. If there is a production A → αBβ or A → αB and FIRST(β)
contains ε then everything in FOLLOW(A) is in FOLLOW(B).
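
These rules are usually applied repeatedly until no set grows any further. The sketch below does exactly that; it assumes a grammar stored as a dictionary from non-terminal to a list of bodies (an empty list standing for ε), and the sample expression grammar and the function names are illustrative choices rather than anything fixed by the notes.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)                      # every symbol can vanish
    return result


def compute_first_follow(grammar, start):
    terminals = {s for bodies in grammar.values() for b in bodies for s in b}
    terminals -= set(grammar)
    first = {t: {t} for t in terminals}              # FIRST rule a
    first.update({nt: set() for nt in grammar})
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                           # FOLLOW rule a
    changed = True
    while changed:                                   # iterate to a fixed point
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                f = first_of_string(body, first)     # FIRST rules b and c
                if not f <= first[head]:
                    first[head] |= f
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue
                    rest = first_of_string(body[i + 1:], first)
                    add = rest - {EPS}               # FOLLOW rule b
                    if EPS in rest:
                        add |= follow[head]          # FOLLOW rule c
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return first, follow


G = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
first, follow = compute_first_follow(G, "E")
print(first["E"], follow["F"])
```

For this sample grammar FIRST(E) = {(, id} and FOLLOW(F) = {+, *, ), $}.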

Que 2.13. Differentiate between recursive descent parsing and
predictive parsing.

Answer

S. No. | Recursive descent parsing | Predictive parsing
1. | CFG is used to build recursive routine. | Recursive routine is not built.
2. | RHS of production rule is converted into program. | Production rule is not converted into program.
3. | Parsing table is not constructed. | Parsing table is constructed.
4. | First and follow is not used. | First and follow is used to construct parsing table.

Que 2.14. Explain non-recursive predictive parsing. Consider the
following grammar and construct the predictive parsing table
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → F* | a | b
AKTU 2017-18, Marks 10

Answer
Non-recursive predictive parsing : Refer Q. 2.10,
Page 2-11C, Unit-2.
Numerical :
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → F* | a | b
First we remove left recursion from
F → F* | a | b
which gives
F → aF' | bF'
F' → *F' | ε
FIRST(E) = FIRST(T) = FIRST(F) = {a, b}
FIRST(E') = {+, ε}, FIRST(F') = {*, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = {$}
FOLLOW(E') = {$}
FOLLOW(T) = {+, $}
FOLLOW(T') = {+, $}
FOLLOW(F) = {*, +, $}
FOLLOW(F') = {*, +, $}


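Predictive parsing table :

Non-terminal | + | * | a | b | $
E  |            |                    | E → TE' | E → TE' |
E' | E' → +TE'  |                    |         |         | E' → ε
T  |            |                    | T → FT' | T → FT' |
T' | T' → ε     | T' → *FT'          |         |         | T' → ε
F  |            |                    | F → aF' | F → bF' |
F' | F' → ε     | F' → *F' , F' → ε  |         |         | F' → ε

Since * is in both FIRST(*F') and FOLLOW(F'), the entry M[F', *] holds two
productions.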
PART-4
Automatic Generation of Efficient Parsers : LR Parsers, The
Canonical Collections of LR(0) Items

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 2.15. Discuss working of LR parser with its block diagram.
Why is it the most commonly used parser ?

Answer
Working of LR parser :
1. The working of LR parser can be understood by using the block diagram
shown in Fig. 2.15.1.

Fig. 2.15.1. Model of an LR parser (input tape a1 ... ai ... an $, stack with
Sm Xm on top, LR parsing program, output, and parsing table with Action
and Goto parts).

2. An LR parser has an input buffer for storing the input string, a stack for
storing the grammar symbols, an output, and a parsing table comprised of
two parts, namely action and goto.
3. There is a driver program that reads the input symbols one at a time from
the input buffer. This program is the same for all LR parsers.
4. It reads the input string one symbol at a time and maintains a stack.
5. The stack always maintains the following form :
S0 X1 S1 X2 S2 ... Xm Sm
where each Xi is a grammar symbol, each Si is a state and the state Sm is
on top of the stack.
6. The action of the driver program depends on action[Sm, ai] where ai is
the current input symbol.
7. Following actions are possible for input ai ai+1 ... an $ :
a. Shift : If action[Sm, ai] = shift S, the parser shifts the input symbol
ai onto the stack and then pushes the state S. Now the current input
symbol becomes ai+1.
Stack : S0 X1 S1 X2 ... Xm Sm ai S        Input : ai+1 ai+2 ... an $
b. Reduce : If action[Sm, ai] = reduce A → β, the parser executes a
reduce move using the A → β production of the grammar. If A → β
has r grammar symbols, first 2r symbols are popped off the stack
(r state symbols and r grammar symbols). So, the top of the stack
now becomes Sm-r; then A is pushed on the stack, and then the state
goto[Sm-r, A] is pushed on the stack. The current input symbol is
still ai.
Stack : S0 X1 S1 X2 ... Xm-r Sm-r A S        Input : ai ai+1 ... an $
where S = goto[Sm-r, A]
c. Accept : If action[Sm, ai] = accept, parsing is completed.
d. Error : If action[Sm, ai] = error, the parser has discovered a syntax
error.

LR parser is widely used for the following reasons :
1. LR parsers can be constructed to recognize most of the programming
language constructs for which a context free grammar can be written.
2. The class of grammars that can be parsed by LR parsers is a superset of
the class of grammars that can be parsed using predictive parsers.
3. LR parser works using a non-backtracking shift-reduce technique.
4. LR parser is an efficient parser as it detects syntactic errors very quickly.
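
The driver described above is the same for every LR table, so it can be sketched once and fed any table. The following example is a simplified illustration: it keeps only the states on the stack (the grammar symbols are implied by the states), and the ACTION and GOTO entries were worked out by hand for an assumed toy grammar 1. S → ( S ), 2. S → a. All names are hypothetical.

```python
# Hand-built SLR table for the assumed grammar  1. S -> ( S )   2. S -> a
ACTION = {
    (0, "("): ("shift", 2), (0, "a"): ("shift", 3),
    (1, "$"): ("accept",),
    (2, "("): ("shift", 2), (2, "a"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}
# production number -> (head, number of symbols r on the right side)
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}


def lr_parse(tokens):
    """LR driver: returns True on accept, False on a syntax error."""
    stack = [0]                          # state stack, initial state 0
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            return False                 # blank entry means error
        if act[0] == "shift":
            stack.append(act[1])         # push the new state
            pos += 1                     # advance the input
        elif act[0] == "reduce":
            head, r = PRODUCTIONS[act[1]]
            del stack[len(stack) - r:]   # pop r states
            stack.append(GOTO[(stack[-1], head)])
        else:
            return True                  # accept


print(lr_parse("((a))"))
```

Because the states alone are stacked, a reduce pops r entries instead of the 2r symbols mentioned in the text; the behaviour is otherwise the same.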

Que 2.16. Write short notes on the following :
1. LR(0) items
2. Augmented grammar
3. Kernel and non-kernel items
4. Functions closure and goto

Answer
1. LR(0) items : An LR(0) item for grammar G is a production rule in
which the symbol • is inserted at some position in the R.H.S. of the rule.
For example, the production S → ABC generates the items
S → •ABC
S → A•BC
S → AB•C
S → ABC•
The production S → ε generates only one item S → •.
2. Augmented grammar : If a grammar G has start symbol S then the
augmented grammar G' is G with a new start symbol S' and the production
S' → S. The purpose of this grammar is to indicate the acceptance of
input. That is, when the parser is about to reduce by S' → S, it reaches the
acceptance state.
3. Kernel items : It is the collection of the item S' → •S and all the items
whose dots are not at the left most end of the R.H.S. of the rule.
Non-kernel items : The collection of all the items in which • is at the
left end of the R.H.S. of the rule.
4. Functions closure and goto : These are two important functions
required to create the canonical collection of sets of LR(0) items.
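
Closure and goto are small enough to write out directly. The sketch below is one possible formulation under stated assumptions: an item is a tuple (head, body, dot position), the grammar is a dictionary of tuples, and the toy grammar S → ( S ) | a is used only for illustration.

```python
def closure(items, grammar):
    """Add B -> .gamma for every item with the dot in front of non-terminal B."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in grammar:
                for prod in grammar[body[dot]]:
                    item = (body[dot], prod, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return frozenset(items)


def goto(items, symbol, grammar):
    """Move the dot over `symbol` and take the closure of the result."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved, grammar)


def canonical_collection(grammar, start):
    """Canonical collection of sets of LR(0) items for the augmented grammar."""
    states = [closure({(start + "'", (start,), 0)}, grammar)]
    for state in states:                 # the list grows while we iterate
        symbols = {b[d] for _, b, d in state if d < len(b)}
        for x in sorted(symbols):
            nxt = goto(state, x, grammar)
            if nxt not in states:
                states.append(nxt)
    return states


G = {"S": [("(", "S", ")"), ("a",)]}     # assumed toy grammar
states = canonical_collection(G, "S")
print(len(states))
```

The items added by closure are exactly the non-kernel items; the items produced by moving the dot in goto, together with S' → •S, are the kernel items.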

PART-5
Constructing SLR Parsing Tables
Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 2.17. Explain SLR parsing techniques. Also write the
algorithm to construct SLR parsing table.

Answer
The SLR parsing can be done as :

Fig. 2.17.1. Working of SLR parser (context free grammar → construction of
canonical set of items → construction of SLR parsing table → parsing of
input string, which takes the input string and produces the output).

Algorithm for construction of an SLR parsing table :
Input : C, the canonical collection of sets of items for an augmented grammar
G'.
Output : If possible, an LR parsing table consisting of a parsing action function
ACTION and a goto function GOTO.
Method :
Let C = {I0, I1, ..., In}. The states of the parser are 0, 1, ..., n, state i being
constructed from Ii.
The parsing actions for state i are determined as follows :
1. If [A → α•aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to
"shift j". Here a is a terminal.
2. If [A → α•] is in Ii, then set ACTION[i, a] to "reduce A → α" for all 'a' in
FOLLOW(A).
3. If [S' → S•] is in Ii, then set ACTION[i, $] to "accept".
The goto transitions for state i are constructed using the rule :
4. If GOTO(Ii, A) = Ij, then GOTO[i, A] = j.
5. All entries not defined by rules (1) through (4) are made "error".
6. The initial state of the parser is the one constructed from the set of
items containing [S' → •S].
The parsing table consisting of the parsing ACTION and GOTO functions
determined by this algorithm is called the SLR parsing table for G. An
LR parser using the SLR parsing table for G is called the SLR parser
for G and a grammar having an SLR parsing table is said to be SLR(1).

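Que 2.18. Construct an SLR(1) parsing table for the following
grammar
S → A)
S → A, P | (P, P
P → {num, num}
AKTU 2016-16, Marks 10

Answer
The augmented grammar G' for the above grammar G is :
S' → S
S → A)
S → A, P
S → (P, P
P → {num, num}
The canonical collection of sets of LR(0) items for the grammar is as follows :
I0 : S' → •S
     S → •A)
     S → •A, P
     S → •(P, P
I1 = GOTO(I0, S) : S' → S•
I2 = GOTO(I0, A) : S → A•)
     S → A•, P
I3 = GOTO(I0, ( ) : S → (•P, P
     P → •{num, num}
I4 = GOTO(I3, { ) : P → {•num, num}
I5 = GOTO(I2, ) ) : S → A)•
I6 = GOTO(I2, , ) : S → A,•P
     P → •{num, num}
I7 = GOTO(I3, P) : S → (P•, P
I8 = GOTO(I4, num) : P → {num•, num}
I9 = GOTO(I6, P) : S → A, P•
I10 = GOTO(I7, , ) : S → (P,•P
     P → •{num, num}
I11 = GOTO(I8, , ) : P → {num,•num}
I12 = GOTO(I10, P) : S → (P, P•
I13 = GOTO(I11, num) : P → {num, num•}
I14 = GOTO(I13, } ) : P → {num, num}•
GOTO(I6, { ) = GOTO(I10, { ) = I4.
Let us number the production rules as :
1. S → A)
2. S → A, P
3. S → (P, P
4. P → {num, num}
FOLLOW(S) = {$}
FOLLOW(P) = { , , $}

State | Action : )  ,  (  {  num  }  $ | Goto : S  A  P
0  | ( : s3 | S : 1, A : 2
1  | $ : accept |
2  | ) : s5, , : s6 |
3  | { : s4 | P : 7
4  | num : s8 |
5  | $ : r1 |
6  | { : s4 | P : 9
7  | , : s10 |
8  | , : s11 |
9  | $ : r2 |
10 | { : s4 | P : 12
11 | num : s13 |
12 | $ : r3 |
13 | } : s14 |
14 | , : r4, $ : r4 |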

Que 2.19. Consider the following grammar
S → AS | b
A → SA | a
Construct the SLR parse table for the grammar. Show the actions
of the parser for the input string "abab". AKTU 2016-17, Marks 15

Answer
The augmented grammar is :
S' → S
S → AS | b
A → SA | a
The canonical collection of LR(0) items is :
I0 : S' → •S
     S → •AS | •b
     A → •SA | •a
I1 = GOTO(I0, S) : S' → S•
     S → •AS | •b
     A → S•A
     A → •SA | •a
I2 = GOTO(I0, A) : S → A•S
     S → •AS | •b
     A → •SA | •a
I3 = GOTO(I0, b) : S → b•
I4 = GOTO(I0, a) : A → a•
I5 = GOTO(I1, A) : A → SA•
I6 = GOTO(I1, S) = I1
I7 = GOTO(I1, a) = I4
I8 = GOTO(I2, S) : S → AS•
I9 = GOTO(I2, A) = I2
I10 = GOTO(I2, b) = I3
Let us number the production rules in the grammar as :
1. S → AS
2. S → b
3. A → SA
4. A → a
FIRST(S) = FIRST(A) = {a, b}
FOLLOW(S) = {$, a, b}
FOLLOW(A) = {a, b}

Table 2.19.1. SLR parsing table.

States | Action : a | b | $ | Goto : S | A
0 | s4 | s3 |        | 1 | 2
1 | s4 | s3 | accept | 1 | 5
2 | s4 | s3 |        | 8 | 2
3 | r2 | r2 | r2     |   |
4 | r4 | r4 |        |   |
5 | r3 | r3 |        |   |
8 | r1 | r1 | r1     |   |

Fig. 2.19.1. DFA for set of items.

Table 2.19.2. Parse the input abab using parse table.

Stack | Input buffer | Action
$0 | abab$ | Shift
$0a4 | bab$ | Reduce A → a
$0A2 | bab$ | Shift
$0A2b3 | ab$ | Reduce S → b
$0A2S8 | ab$ | Reduce S → AS
$0S1 | ab$ | Shift
$0S1a4 | b$ | Reduce A → a
$0S1A5 | b$ | Reduce A → SA
$0A2 | b$ | Shift
$0A2b3 | $ | Reduce S → b
$0A2S8 | $ | Reduce S → AS
$0S1 | $ | Accept

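Que 2.20. Consider the following grammar E → E + E | E * E | (E) | id.
Construct the SLR parsing table and suggest your final parsing
table. AKTU 2017-18, Marks 10

Answer
The augmented grammar is as :
E' → E
E → E + E
E → E * E
E → (E)
E → id
The set of LR(0) items is as follows :
I0 : E' → •E
     E → •E + E
     E → •E * E
     E → •(E)
     E → •id
I1 = GOTO(I0, E) : E' → E•
     E → E• + E
     E → E• * E
I2 = GOTO(I0, ( ) : E → (•E)
     E → •E + E
     E → •E * E
     E → •(E)
     E → •id
I3 = GOTO(I0, id) : E → id•
I4 = GOTO(I1, +) : E → E + •E
     E → •E + E
     E → •E * E
     E → •(E)
     E → •id
I5 = GOTO(I1, *) : E → E * •E
     E → •E + E
     E → •E * E
     E → •(E)
     E → •id
I6 = GOTO(I2, E) : E → (E•)
     E → E• + E
     E → E• * E
I7 = GOTO(I4, E) : E → E + E•
     E → E• + E
     E → E• * E
I8 = GOTO(I5, E) : E → E * E•
     E → E• + E
     E → E• * E
I9 = GOTO(I6, ) ) : E → (E)•
Let us number the productions as 1. E → E + E, 2. E → E * E, 3. E → (E),
4. E → id. FOLLOW(E) = {+, *, ), $}, so in states 7 and 8 the SLR
construction gives shift-reduce conflicts on + and *. They are resolved by
giving * higher precedence than + and making both operators left
associative, which gives the final parsing table :

State | Action : id | + | * | ( | ) | $ | Goto : E
0 | s3 |    |    | s2 |    |        | 1
1 |    | s4 | s5 |    |    | accept |
2 | s3 |    |    | s2 |    |        | 6
3 |    | r4 | r4 |    | r4 | r4     |
4 | s3 |    |    | s2 |    |        | 7
5 | s3 |    |    | s2 |    |        | 8
6 |    | s4 | s5 |    | s9 |        |
7 |    | r1 | s5 |    | r1 | r1     |
8 |    | r2 | r2 |    | r2 | r2     |
9 |    | r3 | r3 |    | r3 | r3     |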
Que 2.21. Perform shift reduce parsing for the given input strings
using the grammar S → (L) | a, L → L, S | S
i. (a, (a, a))
ii. (a, a) AKTU 2018-19, Marks 07

Answer
i.
Stack contents | Input string | Actions
$ | (a, (a, a))$ | Shift (
$( | a, (a, a))$ | Shift a
$(a | , (a, a))$ | Reduce S → a
$(S | , (a, a))$ | Reduce L → S
$(L | , (a, a))$ | Shift ,
$(L, | (a, a))$ | Shift (
$(L, ( | a, a))$ | Shift a
$(L, (a | , a))$ | Reduce S → a
$(L, (S | , a))$ | Reduce L → S
$(L, (L | , a))$ | Shift ,
$(L, (L, | a))$ | Shift a
$(L, (L, a | ))$ | Reduce S → a
$(L, (L, S | ))$ | Reduce L → L, S
$(L, (L | ))$ | Shift )
$(L, (L) | )$ | Reduce S → (L)
$(L, S | )$ | Reduce L → L, S
$(L | )$ | Shift )
$(L) | $ | Reduce S → (L)
$S | $ | Accept

ii.
Stack contents | Input string | Actions
$ | (a, a)$ | Shift (
$( | a, a)$ | Shift a
$(a | , a)$ | Reduce S → a
$(S | , a)$ | Reduce L → S
$(L | , a)$ | Shift ,
$(L, | a)$ | Shift a
$(L, a | )$ | Reduce S → a
$(L, S | )$ | Reduce L → L, S
$(L | )$ | Shift )
$(L) | $ | Reduce S → (L)
$S | $ | Accept

Que 2.22. Construct LR(0) parsing table for the following
grammar
S → cB | ccA
A → cA | a
B → ccB | b AKTU 2018-19, Marks 07

Answer
The augmented grammar is :
S' → S
S → cB | ccA
A → cA | a
B → ccB | b
The canonical collection of LR(0) items is :
I0 : S' → •S
     S → •cB | •ccA
     A → •cA | •a
     B → •ccB | •b
I1 = GOTO(I0, S) : S' → S•
I2 = GOTO(I0, c) : S → c•B
     S → c•cA
     A → c•A
     B → c•cB
     A → •cA | •a
     B → •ccB | •b
I3 = GOTO(I0, a) : A → a•
I4 = GOTO(I0, b) : B → b•
I5 = GOTO(I2, B) : S → cB•
I6 = GOTO(I2, A) : A → cA•
I7 = GOTO(I2, c) : S → cc•A
     B → cc•B
     A → c•A
     B → c•cB
     A → •cA | •a
     B → •ccB | •b
I8 = GOTO(I7, A) : S → ccA•
     A → cA•
I9 = GOTO(I7, B) : B → ccB•
I10 = GOTO(I7, c) : B → cc•B
     A → c•A
     B → c•cB
     B → •ccB | •b
     A → •cA | •a
I11 = GOTO(I10, A) : A → cA•
DFA for set of items :

Fig. 2.22.1.

Let us number the production rules in the grammar as :
1. S → cB
2. S → ccA
3. A → cA
4. A → a
5. B → ccB
6. B → b
Action GOTO

States a s

S S S
Accept

SaS S, 6

sS, Sto 8

s s s s
SS S10

PART-6
Constructing Canonical LR Parsing Tables.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 2.23. Give the algorithm for construction of canonical LR
parsing table.

Answer
Algorithm for construction of canonical LR parsing table :
Input : An augmented grammar G'.
Output : The canonical LR parsing table functions ACTION and GOTO for G'.
Method : Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items of
G'. State i of the parser is constructed from Ii.
1. The parsing actions for state i are determined as follows :
a. If [A → α•aβ, b] is in Ii and GOTO(Ii, a) = Ij, then set
ACTION[i, a] to "shift j". Here, a is required to be a
terminal.
b. If [A → α•, a] is in Ii, A ≠ S', then set ACTION[i, a] to "reduce
A → α".
c. If [S' → S•, $] is in Ii, then set ACTION[i, $] to "accept".
The goto transitions for state i are determined as follows :
2. If GOTO(Ii, A) = Ij, then GOTO[i, A] = j.
3. All entries not defined by rules (1) and (2) are made "error".
4. The initial state of the parser is the one constructed from the set containing
the item [S' → •S, $].
If the parsing action function has no multiple entries then the grammar is
said to be LR(1) or LR.

PART-7
Constructing LALR Parsing Tables Using Ambiguous Grammars, An
Automatic Parser Generator, Implementation of LR Parsing Tables.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 2.24. Give the algorithm for construction of LALR parsing
table.

Answer
The algorithm for construction of LALR parsing table is as :
Input : An augmented grammar G'.
Output : The LALR parsing table functions ACTION and GOTO for G'.
Method :
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items.
2. For each core present among the sets of LR(1) items, find all sets having
that core, and replace these sets by their union.
3. Let C' = {J0, J1, ..., Jm} be the resulting sets of LR(1) items. The
parsing actions for state i are constructed from Ji. If there is a parsing
action conflict, the algorithm fails to produce a parser and the
grammar is said not to be LALR(1).
4. The goto table is constructed as follows. If J is the union of one or more
sets of LR(1) items, i.e., J = I1 ∪ I2 ∪ ... ∪ Ik, then the cores of
GOTO(I1, X), GOTO(I2, X), ..., GOTO(Ik, X) are the same, since
I1, I2, ..., Ik all have the same core. Let K be the union of all sets of
items having the same core as GOTO(I1, X). Then GOTO(J, X) = K.
5. The table produced by this algorithm is called the LALR parsing table for
grammar G. If there are no parsing action conflicts, then the given
grammar is said to be an LALR(1) grammar.
The collection of sets of items constructed in step 3 of this algorithm is
called the LALR(1) collection.
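
Step 2, merging the sets of LR(1) items that share a core, can be sketched in a few lines. This is an illustration under stated assumptions: an LR(1) item is a tuple (head, body, dot position, lookahead), and the two sample states are hypothetical states with the same core but swapped lookaheads.

```python
def merge_by_core(lr1_states):
    """Union all sets of LR(1) items that have the same core."""
    merged = {}
    for state in lr1_states:
        # The core of a state is its set of items with lookaheads dropped.
        core = frozenset((h, b, d) for h, b, d, _ in state)
        merged.setdefault(core, set()).update(state)
    return [frozenset(items) for items in merged.values()]


def reduce_reduce_conflicts(state):
    """Lookaheads on which more than one completed item asks for a reduce."""
    by_lookahead = {}
    for head, body, dot, la in state:
        if dot == len(body):
            by_lookahead.setdefault(la, set()).add((head, body))
    return {la for la, prods in by_lookahead.items() if len(prods) > 1}


# Two hypothetical LR(1) states with the same core { A -> d. , B -> d. }.
state_x = frozenset({("A", ("d",), 1, "a"), ("B", ("d",), 1, "c")})
state_y = frozenset({("A", ("d",), 1, "c"), ("B", ("d",), 1, "a")})

lalr_states = merge_by_core([state_x, state_y])
print(len(lalr_states), reduce_reduce_conflicts(lalr_states[0]))
```

Neither sample state has a conflict on its own, but their union asks for both reductions on a and on c. This is how a grammar can be LR(1) and still fail to be LALR(1).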

Que 2.25.For the grammar S a A d |bBd |aBe | bAe A >


f, B f
Construct LR(1) parsing table. Also draw the LALR able from the

derived LR) parsing table. AKTU 2017-18, Marks 10


Answer

Augmented grammar:
S' → S
S → aAd | bBd | aBe | bAe
A → f
B → f
Canonical collection of sets of LR(1) items:

I0: S' → •S, $
    S → •aAd, $
    S → •bBd, $
    S → •aBe, $
    S → •bAe, $
I1 = GOTO(I0, S):
    S' → S•, $
I2 = GOTO(I0, a):
    S → a•Ad, $
    S → a•Be, $
    A → •f, d
    B → •f, e
I3 = GOTO(I0, b):
    S → b•Bd, $
    S → b•Ae, $
    A → •f, e
    B → •f, d
I4 = GOTO(I2, A):
    S → aA•d, $
I5 = GOTO(I2, B):
    S → aB•e, $
I6 = GOTO(I2, f):
    A → f•, d
    B → f•, e
I7 = GOTO(I3, B):
    S → bB•d, $
I8 = GOTO(I3, A):
    S → bA•e, $
I9 = GOTO(I3, f):
    A → f•, e
    B → f•, d
I10 = GOTO(I4, d):
    S → aAd•, $
I11 = GOTO(I5, e):
    S → aBe•, $
I12 = GOTO(I7, d):
    S → bBd•, $
I13 = GOTO(I8, e):
    S → bAe•, $
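The productions are numbered (1) S → aAd, (2) S → bBd, (3) S → aBe,
(4) S → bAe, (5) A → f, (6) B → f.

State |            Action               |   Goto
      |  a    b    d    e    f    $     |  S   A   B
  0   |  s2   s3                        |  1
  1   |                      Accept     |
  2   |                 s6              |      4   5
  3   |                 s9              |      8   7
  4   |       s10                       |
  5   |            s11                  |
  6   |       r5   r6                   |
  7   |       s12                       |
  8   |            s13                  |
  9   |       r6   r5                   |
 10   |                      r1         |
 11   |                      r3         |
 12   |                      r2         |
 13   |                      r4         |

(In rows 4 to 9 the entries shown lie under d and e; in rows 2 and 3 the shift is on f.)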

Que 2.26. Show that the following grammar
S → Aa | bAc | Bc | bBa
A → d
B → d
is LR(1) but not LALR(1).
AKTU 2015-16, Marks 10
Answer
Augmented grammar G' for the given grammar:
S' → S
S → Aa
S → bAc
S → Bc
S → bBa
A → d
B → d
Canonical collection of sets of LR(1) items for the grammar is as follows:

I0: S' → •S, $
    S → •Aa, $
    S → •bAc, $
    S → •Bc, $
    S → •bBa, $
    A → •d, a
    B → •d, c
I1 = GOTO(I0, S):
    S' → S•, $
I2 = GOTO(I0, A):
    S → A•a, $
I3 = GOTO(I0, b):
    S → b•Ac, $
    S → b•Ba, $
    A → •d, c
    B → •d, a
I4 = GOTO(I0, B):
    S → B•c, $
I5 = GOTO(I0, d):
    A → d•, a
    B → d•, c
I6 = GOTO(I2, a):
    S → Aa•, $
I7 = GOTO(I3, A):
    S → bA•c, $
I8 = GOTO(I3, B):
    S → bB•a, $
I9 = GOTO(I3, d):
    A → d•, c
    B → d•, a
I10 = GOTO(I4, c):
    S → Bc•, $
I11 = GOTO(I7, c):
    S → bAc•, $
I12 = GOTO(I8, a):
    S → bBa•, $
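The action/goto table will be designed as follows, with the productions
numbered (1) S → Aa, (2) S → bAc, (3) S → Bc, (4) S → bBa, (5) A → d,
(6) B → d:

Table 2.26.1.

State | Action: a | b   | c   | d   | $      | Goto: S | A | B
  0   |           | s3  |     | s5  |        |   1     | 2 | 4
  1   |           |     |     |     | accept |         |   |
  2   | s6        |     |     |     |        |         |   |
  3   |           |     |     | s9  |        |         | 7 | 8
  4   |           |     | s10 |     |        |         |   |
  5   | r5        |     | r6  |     |        |         |   |
  6   |           |     |     |     | r1     |         |   |
  7   |           |     | s11 |     |        |         |   |
  8   | s12       |     |     |     |        |         |   |
  9   | r6        |     | r5  |     |        |         |   |
 10   |           |     |     |     | r3     |         |   |
 11   |           |     |     |     | r2     |         |   |
 12   |           |     |     |     | r4     |         |   |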

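Since the table does not have any conflict, the grammar is LR(1).
For the LALR(1) table, item set 5 and item set 9 have the same core. Thus we
merge both item sets (I5, I9) into item set I59, which contains
A → d•, a/c and B → d•, a/c. Now, the resultant parsing table becomes:

Table 2.26.2.

State | Action: a | b   | c     | d   | $      | Goto: S | A | B
  0   |           | s3  |       | s59 |        |   1     | 2 | 4
  1   |           |     |       |     | accept |         |   |
  2   | s6        |     |       |     |        |         |   |
  3   |           |     |       | s59 |        |         | 7 | 8
  4   |           |     | s10   |     |        |         |   |
 59   | r5/r6     |     | r5/r6 |     |        |         |   |
  6   |           |     |       |     | r1     |         |   |
  7   |           |     | s11   |     |        |         |   |
  8   | s12       |     |       |     |        |         |   |
 10   |           |     |       |     | r3     |         |   |
 11   |           |     |       |     | r2     |         |   |
 12   |           |     |       |     | r4     |         |   |

Since the table contains a reduce-reduce conflict in state 59, the grammar is not LALR(1).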

Que 2.27. Construct the LALR parsing table for the following grammar:
S → AA
A → aA
A → b
AKTU 2015-16, Marks 10
Answer
The given grammar is:
S → AA
A → aA | b
The augmented grammar will be:
S' → S
S → AA
A → aA | b
The LR(1) items will be:

I0: S' → •S, $
    S → •AA, $
    A → •aA, a/b
    A → •b, a/b
I1 = GOTO(I0, S):
    S' → S•, $
I2 = GOTO(I0, A):
    S → A•A, $
    A → •aA, $
    A → •b, $
I3 = GOTO(I0, a):
    A → a•A, a/b
    A → •aA, a/b
    A → •b, a/b
I4 = GOTO(I0, b):
    A → b•, a/b
I5 = GOTO(I2, A):
    S → AA•, $
I6 = GOTO(I2, a):
    A → a•A, $
    A → •aA, $
    A → •b, $
I7 = GOTO(I2, b):
    A → b•, $
I8 = GOTO(I3, A):
    A → aA•, a/b
I9 = GOTO(I6, A):
    A → aA•, $
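The productions are numbered (1) S → AA, (2) A → aA, (3) A → b.

Table 2.27.1.

State | Action: a | b   | $      | Goto: S | A
  0   | s3        | s4  |        |   1     | 2
  1   |           |     | accept |         |
  2   | s6        | s7  |        |         | 5
  3   | s3        | s4  |        |         | 8
  4   | r3        | r3  |        |         |
  5   |           |     | r1     |         |
  6   | s6        | s7  |        |         | 9
  7   |           |     | r3     |         |
  8   | r2        | r2  |        |         |
  9   |           |     | r2     |         |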
Since the table does not contain any conflict, the grammar is LR(1).
For the LALR table, I3 and I6 will be unioned, I4 and I7 will
be unioned, and I8 and I9 will be unioned.

So, I36: A → a•A, a/b/$
         A → •aA, a/b/$
         A → •b, a/b/$
    I47: A → b•, a/b/$
    I89: A → aA•, a/b/$
and the LALR table will be:

Table 2.27.2.

State | Action: a | b   | $      | Goto: S | A
  0   | s36       | s47 |        |   1     | 2
  1   |           |     | accept |         |
  2   | s36       | s47 |        |         | 5
 36   | s36       | s47 |        |         | 89
 47   | r3        | r3  | r3     |         |
  5   |           |     | r1     |         |
 89   | r2        | r2  | r2     |         |

Since the LALR table does not contain any conflict, the grammar is also LALR(1).

[Figure: DFA of the LALR(1) item sets]
Fig. 2.27.1.

VERY IMPORTANT QUESTIONS

Following questions are very important. These questions
may be asked in your SESSIONALS as well as
UNIVERSITY EXAMINATION.

Q.1. What is parser? Write the role of parser. What are the
most popular parsing techniques?
Ans. Refer Q. 2.1.

Q.2. Explain operator precedence parsing with example.
Ans. Refer Q. 2.4.

Q.3. What are the problems with top-down parsing?
Ans. Refer Q. 2.7.

Q.4. What do you understand by left factoring and left recursion
and how it is eliminated?
Ans. Refer Q. 2.8.

Q.5. Eliminate left recursion from the following grammar:
S → AB, A → BS | b, B → SA | a
Ans. Refer Q. 2.9.

Q.6. What are the problems with top-down parsing? Write the
algorithm for FIRST and FOLLOW.
Ans. Refer Q. 2.12.
UNIT 3
Syntax-Directed Translations

CONTENTS
Part-1 : Syntax-Directed Translation, Syntax-Directed Translation Schemes, Implementation of Syntax-Directed Translators ... 3-2C to 3-5C
Part-2 : Intermediate Code, Postfix Notation, Parse Trees and Syntax Trees ... 3-6C to 3-9C
Part-3 : Three Address Code, Quadruples and Triples ... 3-9C to 3-13C
Part-4 : Translation of Assignment Statements ... 3-13C to 3-18C
Part-5 : Boolean Expressions, Statements that Alter the Flow of Control ... 3-18C to 3-21C
Part-6 : Postfix Translation: Array References in Arithmetic Expressions ... 3-21C to 3-23C
Part-7 : Procedures Call ... 3-23C to 3-25C
Part-8 : Declarations Statements ... 3-25C to 3-26C

PART-1
Syntax-Directed Translation, Syntax-Directed Translation
Schemes, Implementation of Syntax-Directed Translators.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.1. Define syntax directed translation. Construct an
annotated parse tree for the expression (4 * 7 + 1) * 2, using the
simple desk calculator grammar. AKTU 2018-19, Marks 07


Answer
1. Syntax directed definition/translation is a generalization of context
free grammar in which each grammar production X → α is associated
with a set of semantic rules of the form a := f(b1, b2, ..., bk), where a is
an attribute obtained from the function f.
2. Syntax directed translation is a kind of abstract specification.
3. It is done for static analysis of the language.
4. It allows subroutines or semantic actions to be attached to the
productions of a context free grammar. These subroutines generate
intermediate code when called at the appropriate time by a parser for that
grammar.
5. The attributes of a syntax directed translation are partitioned into two
subsets called the synthesized and inherited attributes of the grammar.

[Figure: Lexical analysis → token stream → syntax analysis → parse tree →
semantic analysis → dependency graph → evaluation order for semantic rules →
translation of constructs. Syntax directed translation covers the steps from
the parse tree onwards.]

Fig. 3.1.1

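Annotated parse tree for the expression (4 * 7 + 1) * 2:

[Figure: annotated parse tree. The root carries L.val = 58. The parenthesized
subexpression has E.val = 29, built from T.val = 28 (from T.val = 4,
id.lexval = 4 and F.val = 7, id.lexval = 7) and F.val = 1. The right operand
of the outer multiplication has F.val = 2, id.lexval = 2.]

Fig. 3.1.2.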
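
The bottom-up computation of the val attribute can be imitated with a small recursive-descent evaluator. This is a sketch, assuming the usual desk calculator grammar E → E + T | T, T → T * F | F, F → (E) | digit rewritten with loops in place of left recursion; the class name DeskCalculator and the function evaluate are illustrative only.

```python
import re

class DeskCalculator:
    """Each method returns the synthesized attribute val of its non-terminal."""

    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[()+*]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):                      # E.val := E1.val + T.val
        val = self.term()
        while self.peek() == "+":
            self.eat("+")
            val += self.term()
        return val

    def term(self):                      # T.val := T1.val * F.val
        val = self.factor()
        while self.peek() == "*":
            self.eat("*")
            val *= self.factor()
        return val

    def factor(self):                    # F.val := E.val  or  digit.lexval
        if self.peek() == "(":
            self.eat("(")
            val = self.expr()
            self.eat(")")
            return val
        return int(self.eat())

def evaluate(text):
    calc = DeskCalculator(text)
    val = calc.expr()
    if calc.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return val

print(evaluate("(4*7+1)*2"))   # value at the root of the annotated tree
```

Every return value corresponds to one val annotation in the tree: 28 for 4 * 7, 29 for the parenthesized sum, and 58 at the root.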

Que 3.2. What is syntax directed translation? How are semantic
actions attached to the production? Explain with an example.

Answer
Syntax directed translation: Refer Q. 3.1, Page 3-2C, Unit-3.
Semantic actions are attached to the productions of the grammar; the values
they compute are shown at every node of the annotated parse tree.
Example: A parse tree along with the values of the attributes at nodes
(called an "annotated parse tree") for an expression 2 + 3 * 5 with synthesized
attributes is shown in the Fig. 3.2.1.

[Figure: annotated parse tree. Root E.val = 17 with children E.val = 2 and
T.val = 15. The left subtree has T.val = 2, F.val = 2, digit.lexval = 2. The
right subtree has T.val = 3, F.val = 3, digit.lexval = 3 and F.val = 5,
digit.lexval = 5.]
Fig. 3.2.1.

Que 3.3. Explain attributes. What are synthesized and inherited
attributes?

Answer
Attributes:
1. Attributes are information associated with a language construct by
attaching them to grammar symbols representing that construct.
2. Attributes are associated with the grammar symbols that are the labels
of parse tree nodes.
3. An attribute can represent anything (reasonable) such as a string, a
number, a type, a memory location, a code fragment etc.
4. The value of an attribute at a parse tree node is defined by a semantic rule
associated with the production used at that node.
Synthesized attribute:
1. An attribute at a node is said to be synthesized if its value is computed
from the attribute values of the children of that node in the parse tree.
2. A syntax directed definition that uses synthesized attributes
exclusively is said to be an S-attributed definition.
3. Thus, a parse tree for an S-attributed definition can always be annotated
by evaluating the semantic rules for the attributes at each node from
leaves to root.
4. If the translations are specified using S-attributed definitions, then the
semantic rules can be conveniently evaluated by the parser itself during
the parsing.
For example: A parse tree along with the values of the attributes at
nodes (called an "annotated parse tree") for an expression 2 + 3 * 5 with
synthesized attributes is shown in the Fig. 3.3.1.

[Figure: annotated parse tree. Root E.val = 17 with children E.val = 2 and
T.val = 15. The left subtree has T.val = 2, F.val = 2, digit.lexval = 2. The
right subtree has T.val = 3, F.val = 3, digit.lexval = 3 and F.val = 5,
digit.lexval = 5.]
Fig. 3.3.1. An annotated parse tree for expression 2 + 3 * 5.

Inherited attribute:
1. An inherited attribute is one whose value at a node in a parse tree is
defined in terms of attributes at the parent and/or siblings of that node.
2. Inherited attributes are convenient for expressing the dependence of a
programming language construct on the context in which it appears.
For example: A syntax directed definition that uses inherited attributes
is given as:

Production        Semantic rules
D → T L           L.type := T.type
T → int           T.type := integer
T → real          T.type := real
L → L1, id        L1.type := L.type
                  enter(id.ptr, L.type)
L → id            enter(id.ptr, L.type)

The parse tree, along with the attribute values at the parse tree nodes, for an
input string int id1, id2, id3 is shown in the Fig. 3.3.2.

[Figure: parse tree with T.type = int at the T node and L.type = int at every
L node.]
Fig. 3.3.2. Parse tree with inherited attributes for the string int id1, id2, id3.
Que 3.4. What is the difference between S-attributed and
L-attributed definitions?
Answer

S.No. | S-attributed definition                              | L-attributed definition
1.    | It uses synthesized attributes.                      | It uses synthesized and inherited attributes.
2.    | Semantic actions are placed at right end of production. | Semantic actions are placed anywhere on RHS.
3.    | S-attributes can be evaluated during parsing.        | L-attributes are evaluated by traversing the parse tree in depth first, left to right order.

PART-2
Intermediate Code, Postfix Notation, Parse Trees and Syntax Trees.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.5. What is intermediate code generation? Discuss the
benefits of intermediate code.

Answer
Intermediate code generation is the fourth phase of the compiler, which takes
the parse tree as input from the semantic phase and generates intermediate
code as output.
The benefits of intermediate code are:
1. Intermediate code is machine independent, which makes it easy to
retarget the compiler to generate code for newer and different processors.
2. Intermediate code is nearer to the target machine as compared to the
source language, so it is easier to generate the object code.
3. The intermediate code allows machine independent optimization of
the code by using specialized techniques.
4. Syntax directed translation implements the intermediate code
generation; thus by augmenting the parser, it can be folded into the
parsing.
Que 3.6. What is postfix translation? Explain it with suitable
example.
Answer
Postfix (reverse polish) translation: It is the type of translation in which
the operator symbol is placed after its operands.
For example:
Consider the expression: (20 + (-5) * 6 + 12)
Postfix for the above expression can be calculated as:
t1 = 5 -
(20 + t1 * 6 + 12)
t2 = t1 6 *
20 + t2 + 12
t3 = 20 t2 +
t3 + 12
t4 = t3 12 +

Now putting values of t4, t3, t2, t1:
t4 = t3 12 +
   = 20 t2 + 12 +
   = 20 t1 6 * + 12 +
   = 20 5 - 6 * + 12 +
Que 3.7. Define parse tree. Why is parse tree construction only
possible for CFG?
Answer
Parse tree: A parse tree is an ordered tree in which the left hand side of a
production represents a parent node and the children nodes are represented by
the production's right hand side.
Conditions for constructing a parse tree from a CFG are:
i. Each vertex of the tree must have a label. The label is a non-terminal or
terminal or null (ε).
ii. The root of the tree is the start symbol, i.e., S.
iii. The labels of the internal vertices are non-terminal symbols ∈ VN.
iv. If there is a production A → X1 X2 ... Xk, then for a vertex with label A, the
children nodes will be X1, X2, ..., Xk.
v. A vertex n is called a leaf of the parse tree if its label is a terminal
symbol ∈ Σ or null (ε).
Parse tree construction is only possible for CFG. This is because the properties
of a tree match the properties of a CFG.

Que 3.8. What is syntax tree? What are the rules to construct
syntax tree for an expression?

Answer
1. A syntax tree is a tree that shows the syntactic structure of a program
while omitting irrelevant details present in a parse tree.
2. Syntax tree is a condensed form of the parse tree.
3. The operator and keyword nodes of a parse tree are moved to their
parent, and a chain of single productions is replaced by a single link.
Rules for constructing a syntax tree for an expression:
1. Each node in a syntax tree can be implemented as a record with several
fields.
2. In the node for an operator, one field identifies the operator and the
remaining fields contain pointers to the nodes for the operands.
3. The operator is often called the label of the node.
4. The following functions are used to create the nodes of syntax trees for
expressions with binary operators. Each function returns a pointer to a
newly created node.
a. mknode(op, left, right): It creates an operator node with label op
and two fields containing pointers to left and right.
b. mkleaf(id, entry): It creates an identifier node with label id and
a field containing entry, a pointer to the symbol table entry for
the identifier.
c. mkleaf(num, val): It creates a number node with label num and
a field containing val, the value of the number.
For example: Construct a syntax tree for an expression a - 4 + c. In
this sequence, p1, p2, ..., p5 are pointers to nodes, and entry a and
entry c are pointers to the symbol table entries for identifiers 'a' and 'c'
respectively.
p1 := mkleaf(id, entry a);
p2 := mkleaf(num, 4);
p3 := mknode('-', p1, p2);
p4 := mkleaf(id, entry c);
p5 := mknode('+', p3, p4);
The tree is constructed in bottom-up fashion. The function calls mkleaf
(id, entry a) and mkleaf(num, 4) construct the leaves for a and 4. The
pointers to these nodes are saved using p1 and p2. The call mknode
('-', p1, p2) then constructs the interior node with the leaves for a and 4
as children. The syntax tree will be:

[Figure: root '+' with left child '-' and right child id (to entry for c);
the '-' node has children id (to entry for a) and num 4.]
Fig. 3.8.1. The syntax tree for a - 4 + c
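
The node-building functions can be tried out directly. The following Python sketch is one possible rendering, with assumptions stated up front: a node is a small dataclass, the symbol table entry is stood in for by the identifier's name, and to_postfix is an extra helper added here only to inspect the finished tree.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Node:
    label: str                              # operator, 'id' or 'num'
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Union[str, int, None] = None     # entry (for id) or val (for num)

def mknode(op, left, right):
    """Operator node with label op and pointers to the two operand nodes."""
    return Node(op, left, right)

def mkleaf(label, value):
    """Leaf node: label is 'id' or 'num'; value is the entry or the number."""
    return Node(label, value=value)

def to_postfix(node):
    """Post-order walk, used here only to check the shape of the tree."""
    if node.left is None and node.right is None:
        return str(node.value)
    return f"{to_postfix(node.left)} {to_postfix(node.right)} {node.label}"

# Same call sequence as in the example for a - 4 + c.
p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)

print(to_postfix(p5))
```

Because the tree is built bottom-up, each call only needs pointers that already exist, which is why the same calls can be issued as semantic actions during a bottom-up parse.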

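Que 3.9. Draw syntax tree for the arithmetic expression:
a * (b + c) - d/2. Also write the given expression in postfix notation.

Answer
Syntax tree for given expression: a * (b + c) - d/2

[Figure: root '-' with left child '*' (children a and '+', where '+' has
children b and c) and right child '/' (children d and 2).]
Fig. 3.9.1.

Postfix notation for a * (b + c) - d/2:
t1 = b c +        (a * t1 - d/2)
t2 = a t1 *       (t2 - d/2)
t3 = d 2 /        (t2 - t3)
t4 = t2 t3 -
Put values of t1, t2, t3:
t4 = t2 t3 -
   = t2 d 2 / -
   = a t1 * d 2 / -
   = a b c + * d 2 / -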

PART-3
Three Address Code, Quadruples and Triples.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.10. Explain three address code with examples.

Answer
1. Three address code is an abstract form of intermediate code that can be
implemented as a record with the address fields.
2. The general form of three address code representation is:
a := b op c
where a, b and c are operands that can be names or constants, and op
represents the operator.
3. The operator can be a fixed or floating point arithmetic operator or a logical
operator on boolean valued data. Only a single operation at the right side of
the expression is allowed at a time.
4. At most three addresses are allowed (two for operands and
one for result). Hence, the name of this representation is three address
code.
For example: The three address code for the expression a = b + c + d
will be:
t1 := b + c
t2 := t1 + d
a := t2
Here t1 and t2 are the temporary names generated by the compiler.

Que 3.11. What are different ways to write three address code?

Answer
Different ways to write three address code are:
1. Quadruple representation:
a. The quadruple is a structure with at most four fields such as op,
arg1, arg2, result.
b. The op field is used to represent the internal code for the operator, the
arg1 and arg2 fields represent the two operands used and the result field is
used to store the result of an expression.
For example: Consider the input statement x = -a * b + -a * b
The three address code is:
t1 := uminus a
t2 := t1 * b
t3 := uminus a
t4 := t3 * b
t5 := t2 + t4
x := t5

Number | Op     | Arg1 | Arg2 | Result
(0)    | uminus | a    |      | t1
(1)    | *      | t1   | b    | t2
(2)    | uminus | a    |      | t3
(3)    | *      | t3   | b    | t4
(4)    | +      | t2   | t4   | t5
(5)    | :=     | t5   |      | x

2. Triples representation: In the triple representation, the use of
temporary variables is avoided by referring to the position of the triple
that computes the value.

For example: x = -a * b + -a * b
The triple representation is:

Number | Op     | Arg1 | Arg2
(0)    | uminus | a    |
(1)    | *      | (0)  | b
(2)    | uminus | a    |
(3)    | *      | (2)  | b
(4)    | +      | (1)  | (3)
(5)    | :=     | x    | (4)

3. Indirect triples representation: In the indirect triple representation,
a separate listing of pointers to triples is kept, and these pointers are used
instead of the triples' own positions to order the statements.

For example: x = -a * b + -a * b
The indirect triples representation is:

Number | Op     | Arg1 | Arg2 |  Location | Statement
(0)    | uminus | a    |      |  (0)      | (11)
(1)    | *      | (11) | b    |  (1)      | (12)
(2)    | uminus | a    |      |  (2)      | (13)
(3)    | *      | (13) | b    |  (3)      | (14)
(4)    | +      | (12) | (14) |  (4)      | (15)
(5)    | :=     | x    | (15) |  (5)      | (16)

Three address code of given statement is:

1. If A > C and B < D goto 2
2. If A = 1 goto 6
3. If A <= D goto 6
4. t1 = A + 2
5. A = t1
6. t2 = C + 1
7. C = t2

Que 3.12. Write the quadruples, triples and indirect triples for the
following expression:
(x + y) * (y + z) + (x + y + z)

AKTU 2018-19, Marks 07

Answer
The three address code for the given expression:
t1 := x + y
t2 := y + z
t3 := t1 * t2
t4 := t1 + z
t5 := t3 + t4

i. The quadruple representation:

Location | Operator | Operand 1 | Operand 2 | Result
(1)      | +        | x         | y         | t1
(2)      | +        | y         | z         | t2
(3)      | *        | t1        | t2        | t3
(4)      | +        | t1        | z         | t4
(5)      | +        | t3        | t4        | t5

ii. The triple representation:

Location | Operator | Operand 1 | Operand 2
(1)      | +        | x         | y
(2)      | +        | y         | z
(3)      | *        | (1)       | (2)
(4)      | +        | (1)       | z
(5)      | +        | (3)       | (4)

iii. The indirect triple representation:

Location | Operator | Operand 1 | Operand 2 |  Location | Statement
(1)      | +        | x         | y         |  (1)      | (11)
(2)      | +        | y         | z         |  (2)      | (12)
(3)      | *        | (11)      | (12)      |  (3)      | (13)
(4)      | +        | (11)      | z         |  (4)      | (14)
(5)      | +        | (13)      | (14)      |  (5)      | (15)
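
The step from quadruples to triples is mechanical: each temporary is replaced by the position of the triple that computes it. A minimal sketch, assuming quadruples are Python tuples (op, arg1, arg2, result) and positions are counted from 0 rather than from (1) as in the tables; to_triples is a hypothetical helper name.

```python
def to_triples(quads):
    """Convert quadruples (op, arg1, arg2, result) into triples (op, arg1, arg2).

    A reference to a temporary becomes the integer index of the triple
    that produced it; names and constants stay as strings.
    """
    position = {}                       # temporary name -> index of its triple
    triples = []
    for op, arg1, arg2, result in quads:
        triples.append((op, position.get(arg1, arg1), position.get(arg2, arg2)))
        position[result] = len(triples) - 1
    return triples

# Quadruples for (x + y) * (y + z) + (x + y + z)
quads = [
    ("+", "x", "y", "t1"),
    ("+", "y", "z", "t2"),
    ("*", "t1", "t2", "t3"),
    ("+", "t1", "z", "t4"),
    ("+", "t3", "t4", "t5"),
]

triples = to_triples(quads)
for index, triple in enumerate(triples):
    print(index, triple)
```

An indirect triple listing would add one more list holding the order in which these indices are to be executed, so that statements can be reordered without renumbering the references inside the triples.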

Que 3.13. Generate three address code for the following code:
switch a + b
{
case 1: x = x + 1
case 2: y = y + 2
case 3: z = z + 3
default: c = c - 1
}

AKTU 2015-16, Marks 10

Answer

101: t1 = a + b
102: goto 103
103: if t1 = 1 goto 105
104: goto 107
105: t2 = x + 1
106: x = t2
107: if t1 = 2 goto 109
108: goto 111
109: t3 = y + 2
110: y = t3
111: if t1 = 3 goto 113
112: goto 115
113: t4 = z + 3
114: z = t4
115: t5 = c - 1
116: c = t5
117: Next statement

Que 3.14. Generate three address code for
C[A[i, j]] = B[i, j] + C[A[i, j]] + D[i, j] (You can assume any data for
solving question, if needed). Assume that all array elements are
integers. Let A and B be 10 × 20 arrays with low1 = low2 = 1.

AKTU 2017-18, Marks 10


Answer
Given: low1 = 1 and low2 = 1, n1 = 10, n2 = 20, w = 4.

B[i, j] = ((i × n2) + j) × w + (base - ((low1 × n2) + low2) × w)
B[i, j] = ((i × 20) + j) × 4 + (base - ((1 × 20) + 1) × 4)
B[i, j] = 4 × (20i + j) + (base - 84)
Similarly, A[i, j] = 4 × (20i + j) + (base - 84)
and, D[i, j] = 4 × (20i + j) + (base - 84)
Hence, C[A[i, j]] = 4 × (20i + j) + (base - 84) + 4 × (20i + j) +
(base - 84) + 4 × (20i + j) + (base - 84)
= [4 × (20i + j) + (base - 84)] × [1 + 1 + 1]
= 4 × 3 × (20i + j) + (base - 84) × 3
= 12 × (20i + j) + (base - 84) × 3

Therefore, three address code will be:
t1 = 20 × i
t2 = t1 + j
t3 = base - 84
t4 = 12 × t2
t5 = t3 × 3
t6 = t4 + t5

PART-4
Translation of Assignment Statements.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.15. How would you convert the following into intermediate
code? Give a suitable example.
i. Assignment statements
ii. Case statements AKTU 2016-17, Marks 15


Answer
i. Assignment statements:

Production rule    Semantic actions
S → id := E        id_entry := look_up(id.name);
                   if id_entry ≠ nil then
                       append(id_entry ':=' E.place)
                   else error; /* id not declared */
E → E1 + E2        E.place := newtemp();
                   append(E.place ':=' E1.place '+' E2.place)
E → E1 * E2        E.place := newtemp();
                   append(E.place ':=' E1.place '*' E2.place)
E → - E1           E.place := newtemp();
                   append(E.place ':=' 'minus' E1.place)
E → (E1)           E.place := E1.place
E → id             id_entry := look_up(id.name);
                   if id_entry ≠ nil then
                       E.place := id_entry
                   else error; /* id not declared */

1. The function look_up returns the entry for id.name in the symbol table if it
exists there. Otherwise, an error will be reported.
2. The function append is used for appending the three address code to the
output file.
3. newtemp() is the function used for generating new temporary variables.
4. E.place is used to hold the name that holds the value of E.

Example: x := (a + b) * (c + d)
We will assume all these identifiers are of the same type. Let us use the
bottom-up parsing method:

Production rule | Semantic action (attribute evaluation) | Output
E → id          | E.place := a                           |
E → id          | E.place := b                           |
E → E1 + E2     | E.place := t1                          | t1 := a + b
E → id          | E.place := c                           |
E → id          | E.place := d                           |
E → E1 + E2     | E.place := t2                          | t2 := c + d
E → E1 * E2     | E.place := t3                          | t3 := t1 * t2
S → id := E     |                                        | x := t3
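
The semantic actions for assignment statements can be exercised on an expression tree instead of a live parser. The sketch below is an illustration under assumptions: the parse is already done and given as nested tuples, the symbol table is a plain set of declared names, and the class name TacGenerator is hypothetical. The methods look_up, newtemp and append play the roles described in the notes.

```python
class TacGenerator:
    def __init__(self, declared_names):
        self.symbols = set(declared_names)   # stand-in for the symbol table
        self.code = []                       # emitted three address statements
        self.temp_count = 0

    def newtemp(self):
        self.temp_count += 1
        return f"t{self.temp_count}"

    def look_up(self, name):
        return name if name in self.symbols else None

    def append(self, statement):
        self.code.append(statement)

    def expr(self, node):
        """Return E.place. node is a name, ('minus', e) or (op, e1, e2)."""
        if isinstance(node, str):                      # E -> id
            entry = self.look_up(node)
            if entry is None:
                raise NameError(f"{node} not declared")
            return entry
        if node[0] == "minus":                         # E -> - E1
            place1 = self.expr(node[1])
            place = self.newtemp()
            self.append(f"{place} := minus {place1}")
            return place
        op, left, right = node                         # E -> E1 op E2
        place1 = self.expr(left)
        place2 = self.expr(right)
        place = self.newtemp()
        self.append(f"{place} := {place1} {op} {place2}")
        return place

    def assign(self, target, node):                    # S -> id := E
        place = self.expr(node)
        entry = self.look_up(target)
        if entry is None:
            raise NameError(f"{target} not declared")
        self.append(f"{entry} := {place}")

gen = TacGenerator({"x", "a", "b", "c", "d"})
gen.assign("x", ("*", ("+", "a", "b"), ("+", "c", "d")))
print("\n".join(gen.code))
```

The order of the emitted statements follows the order of reductions in a bottom-up parse: both operands are translated before the operator that combines them.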

ii. Case statements:
The general form of the statement is:

switch expression
{
case value : statement
case value : statement
...
case value : statement
default : statement
}

Production rule      Semantic action
switch E             Evaluate E into t such that t = E
{                    goto check
case V1 : S1         L1: code for S1
                     goto last
case V2 : S2         L2: code for S2
                     goto last
...
case Vn-1 : Sn-1     Ln-1: code for Sn-1
                     goto last
default : Sn         Ln: code for Sn
}                    goto last
                     check: if t = V1 goto L1
                            if t = V2 goto L2
                            ...
                            if t = Vn-1 goto Ln-1
                            goto Ln
                     last:

Example:
switch(ch)
{
case 1: c = a + b;
break;
case 2: c = a - b;
break;
}

The three address code can be:

if ch = 1 goto L1
if ch = 2 goto L2
goto last
L1: t1 := a + b
c := t1
goto last
L2: t2 := a - b
c := t2
goto last
last:
Que 3.16. Write down the translation procedure for control
statement and switch statement. AKTU 2018-19, Marks 07
Answer
1. Boolean expressions are used along with if-then, if-then-else,
while-do, do-while statement constructs.
2. S → if E then S1 | if E then S1 else S2 | while E do S1 | do S1 while E.
3. In all these statements 'E' corresponds to a boolean expression evaluation.
4. This expression E should be converted to three address code.
5. This is then integrated in the context of the control statement.
Translation procedure for if-then and if-then-else statement:
1. Consider a grammar for if-else:
S → if E then S1 | if E then S1 else S2
2. Syntax directed translation scheme for if-then is given as follows:

S → if E then S1
E.true := new_label()
E.false := S.next
S1.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code

3. In the given translation scheme || is used to concatenate the strings.
4. The function gen_code is used to evaluate the non-quoted arguments
passed to it and to concatenate the complete string.
5. The S.code is the important rule which ultimately generates the three
address code.
S → if E then S1 else S2
E.true := new_label()
E.false := new_label()
S1.next := S.next
S2.next := S.next
S.code := E.code || gen_code(E.true ':') || S1.code ||
          gen_code('goto', S.next) ||
          gen_code(E.false ':') || S2.code
For example: Consider the statement if a < b then a = a + 5 else
a = a + 7

The three address code for if-else is:

100: if a < b goto 102
101: goto 104
102: L1: a := a + 5     /* E.true */
103: goto 105           /* S.next */
104: L2: a := a + 7     /* E.false */
105: ...
Here, E.code is "if a < b", L1 denotes E.true and L2 denotes E.false. The end
of the true branch is shown by jumping to line 105 (i.e., S.next).

Translation procedure for while-do statement:

Production            Semantic rules
S → while E do S1     S.begin := newlabel
                      E.true := newlabel
                      E.false := S.next
                      S1.next := S.begin
                      S.code := gen(S.begin ':') || E.code ||
                                gen(E.true ':') || S1.code ||
                                gen('goto' S.begin)

Translation procedure for do-while statement:

Production            Semantic rules
S → do S1 while E     S.begin := newlabel
                      E.true := S.begin
                      E.false := S.next
                      S.code := gen(S.begin ':') || S1.code || E.code
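
The way labels are created and threaded through S.code can be seen in a short Python sketch. It is only an illustration, assuming the condition is a simple relational test that is turned into a conditional jump followed by a jump to the false exit; label_maker, gen_if_else and gen_while are hypothetical names.

```python
import itertools

def label_maker():
    """Yields L1, L2, ... ; plays the role of new_label()."""
    return (f"L{i}" for i in itertools.count(1))

def gen_if_else(cond, then_code, else_code, s_next, labels):
    """S -> if E then S1 else S2, with E.code as a conditional jump pair."""
    e_true, e_false = next(labels), next(labels)
    return ([f"if {cond} goto {e_true}", f"goto {e_false}", f"{e_true}:"]
            + then_code
            + [f"goto {s_next}", f"{e_false}:"]
            + else_code)

def gen_while(cond, body_code, s_next, labels):
    """S -> while E do S1, with S1.next = S.begin."""
    s_begin, e_true = next(labels), next(labels)
    return ([f"{s_begin}:", f"if {cond} goto {e_true}", f"goto {s_next}",
             f"{e_true}:"]
            + body_code
            + [f"goto {s_begin}"])

if_code = gen_if_else("a < b", ["a := a + 5"], ["a := a + 7"], "Lnext",
                      label_maker())
while_code = gen_while("i < n", ["i := i + 1"], "Lnext", label_maker())
print("\n".join(if_code))
print("\n".join(while_code))
```

Symbolic labels are used here instead of statement numbers; a later pass (or backpatching) would replace them by addresses.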

PART-5
Boolean Expressions, Statements that alter the Flow of Control.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.17. Define backpatching and semantic rules for boolean
expression. Derive the three address code for the following
expression: P < Q or R < S and T < U.
AKTU 2015-16, Marks 10
OR
Write short notes on backpatching.

Answer
1. Backpatching is the activity of filling up unspecified information of labels
using appropriate semantic actions during the code generation process.
2. Backpatching refers to the process of resolving forward branches that
have been used in the code, when the value of the target becomes
known.
3. Backpatching is done to overcome the problem of processing
incomplete information in one pass.
4. Backpatching can be used to generate code for boolean expressions and
flow of control statements in one pass.
To generate code using backpatching the following functions are used:
1. makelist(i): It creates a new list containing only i, where i is an
index into the array of instructions.
2. merge(p1, p2): It concatenates the lists pointed to
by p1 and p2, and returns a pointer to the concatenated list.
3. backpatch(p, i): It inserts i as the target label for each of the instructions
on the list pointed to by p.
Backpatching in boolean expressions:
1. The solution is to generate a sequence of branching statements where
the addresses of the jumps are temporarily left unspecified.
2. For each boolean expression E we maintain two lists:
a. E.truelist, which is the list of the (addresses of the) jump statements
appearing in the translation of E and forwarding to E.true.
b. E.falselist, which is the list of the (addresses of the) jump statements
appearing in the translation of E and forwarding to E.false.
3. When the label E.true (resp. E.false) is eventually defined we can walk
down the list, patching in the value of its address.
4. In the translation scheme below:
a. We use emit to generate code that contains placeholders to be filled
in later by the backpatch procedure.
b. The attributes E.truelist, E.falselist are synthesized.
c. When the code for E is generated, addresses of jumps corresponding
to the values true and false are left unspecified and put on the lists
E.truelist and E.falselist, respectively.
5. A marker non-terminal M is used to capture the numerical address of a
statement.
6. nextinstr is a global variable that stores the number of the next statement
to be generated.

The grammar is as follows:

B → B1 || M B2 | B1 AND M B2 | ! B1 | (B1) | E1 rel E2 | true | false
M → ε

The translation scheme is as follows:

i. B → B1 || M B2 { backpatch(B1.falselist, M.instr);
                    B.truelist = merge(B1.truelist, B2.truelist);
                    B.falselist = B2.falselist; }
ii. B → B1 AND M B2 { backpatch(B1.truelist, M.instr);
                      B.truelist = B2.truelist;
                      B.falselist = merge(B1.falselist, B2.falselist); }
iii. B → ! B1 { B.truelist = B1.falselist;
                B.falselist = B1.truelist; }
iv. B → (B1) { B.truelist = B1.truelist;
               B.falselist = B1.falselist; }
v. B → E1 rel E2 { B.truelist = makelist(nextinstr);
                   B.falselist = makelist(nextinstr + 1);
                   append('if' E1.addr rel.op E2.addr 'goto _');
                   append('goto _'); }
vi. B → true { B.truelist = makelist(nextinstr);
               append('goto _'); }
vii. B → false { B.falselist = makelist(nextinstr);
                 append('goto _'); }
viii. M → ε { M.instr = nextinstr; }
Three address code:
100: if P < Q goto _
101: goto 102
102: if R < S goto 104
103: goto _
104: if T < U goto _
105: goto _
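
The three list functions and the rules for ||, AND and relational tests can be run by hand in Python. The sketch below is an illustration under assumptions: the calls are made in the order a bottom-up parser would perform the reductions for P < Q or R < S and T < U, instruction numbering starts at 100, and the class name Backpatcher is hypothetical.

```python
class Backpatcher:
    def __init__(self, start=100):
        self.start = start
        self.code = []                         # entries: [text, target or None]

    @property
    def nextinstr(self):
        return self.start + len(self.code)

    def emit(self, text):
        self.code.append([text, None])         # jump target left unspecified

    def makelist(self, i):
        return [i]

    def merge(self, p1, p2):
        return p1 + p2

    def backpatch(self, plist, target):
        for i in plist:
            self.code[i - self.start][1] = target

    def rel(self, left, op, right):            # B -> E1 rel E2
        truelist = self.makelist(self.nextinstr)
        falselist = self.makelist(self.nextinstr + 1)
        self.emit(f"if {left} {op} {right} goto")
        self.emit("goto")
        return truelist, falselist

    def or_(self, b1, m_instr, b2):            # B -> B1 || M B2
        self.backpatch(b1[1], m_instr)
        return self.merge(b1[0], b2[0]), b2[1]

    def and_(self, b1, m_instr, b2):           # B -> B1 AND M B2
        self.backpatch(b1[0], m_instr)
        return b2[0], self.merge(b1[1], b2[1])

    def listing(self):
        return [f"{self.start + n}: {text} {'_' if target is None else target}"
                for n, (text, target) in enumerate(self.code)]

bp = Backpatcher()
b1 = bp.rel("P", "<", "Q")
m1 = bp.nextinstr                  # M.instr before R < S
b2 = bp.rel("R", "<", "S")
m2 = bp.nextinstr                  # M.instr before T < U
b3 = bp.rel("T", "<", "U")
b = bp.or_(b1, m1, bp.and_(b2, m2, b3))
print("\n".join(bp.listing()))
```

The jumps still marked with an underscore are exactly those on the final truelist and falselist; they are filled in once the enclosing statement knows where control should go.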

Que 3.18. Explain translation scheme for boolean expression.

Answer
The translation scheme for boolean expressions can be understood by the following
example.
Consider the boolean expressions generated by the following grammar:
E → E OR E
E → E AND E
E → NOT E
E → (E)
E → id relop id
E → TRUE
E → FALSE
Here the relop is denoted by ≤, ≥, ≠, <, >. The OR and AND are left associative.
The highest precedence is NOT, then AND and lastly OR.

The translation scheme for boolean expressions having numerical
representation is as given below:

Production rule     Semantic rule
E → E1 OR E2        E.place := newtemp()
                    append(E.place ':=' E1.place 'OR' E2.place)
E → E1 AND E2       E.place := newtemp()
                    append(E.place ':=' E1.place 'AND' E2.place)
E → NOT E1          E.place := newtemp()
                    append(E.place ':=' 'NOT' E1.place)
E → (E1)            E.place := E1.place
E → id1 relop id2   E.place := newtemp()
                    append('if' id1.place relop.op id2.place 'goto'
                           nextstate + 3);
                    append(E.place ':=' '0');
                    append('goto' nextstate + 2);
                    append(E.place ':=' '1')
E → TRUE            E.place := newtemp();
                    append(E.place ':=' '1')
E → FALSE           E.place := newtemp()
                    append(E.place ':=' '0')

PART-6
Postfix Translation: Array References in Arithmetic Expressions.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.19. Write a short note on postfix translation.

Answer
1. In a production A → α, the translation rule of A.CODE consists of the
concatenation of the CODE translations of the non-terminals in α in the
same order as the non-terminals appear in α.
2. A production can be factored to achieve postfix form.
Postfix translation of while statement:
Production: S → while M1 E do M2 S1
can be factored as:
1. S → C S1
2. C → W E do
3. W → while
A suitable translation scheme is given as:

Production rule     Semantic action
W → while           W.QUAD = NEXTQUAD
C → W E do          BACKPATCH(E.TRUE, NEXTQUAD)
                    C.FALSE = E.FALSE
                    C.QUAD = W.QUAD
S → C S1            BACKPATCH(S1.NEXT, C.QUAD)
                    S.NEXT = C.FALSE
                    GEN(goto C.QUAD)

Que 3.20. What is postfix notation? Translate (C + D) * (E + Y) into
postfix using Syntax Directed Translation Scheme (SDTS).
AKTU 2017-18, Marks 10
Answer
Postfix notation: Refer Q. 3.6, Page 3-6C, Unit-3.
Numerical: The syntax directed translation scheme to specify the translation of
an expression into postfix notation is as follows:

Production     Scheme
E → E1 + T     E.code = E1.code || T.code || '+'
E → T          E.code = T.code
T → T1 * F     T.code = T1.code || F.code || '*'
T → F          T.code = F.code
F → (E)        F.code = E.code
F → id         F.code = id.code
where the '||' sign is used for concatenation.
Applying these rules to (C + D) * (E + Y) gives the postfix form C D + E Y + *.
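
The scheme can be checked with a short translator in Python. This is a sketch under assumptions: the left-recursive productions are rewritten as loops so that a recursive-descent parser can be used, the code attribute is kept as a list of symbols, and to_postfix is an illustrative name.

```python
import re

def to_postfix(text):
    """Translate an infix expression over +, * and identifiers to postfix."""
    tokens = re.findall(r"[A-Za-z_]\w*|[()+*]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                    # E.code = E1.code || T.code || '+'
        code = term()
        while peek() == "+":
            advance()
            code = code + term() + ["+"]
        return code

    def term():                    # T.code = T1.code || F.code || '*'
        code = factor()
        while peek() == "*":
            advance()
            code = code + factor() + ["*"]
        return code

    def factor():                  # F.code = E.code  or  id.code
        if peek() == "(":
            advance()
            code = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return code
        if peek() is None or peek() in {"(", ")", "+", "*"}:
            raise SyntaxError("expected identifier")
        return [advance()]

    code = expr()
    if peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return " ".join(code)

print(to_postfix("(C + D) * (E + Y)"))
```

Each function returns the code attribute of its non-terminal, and the operator is always appended after the code of both operands, which is what makes the output postfix.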

PART-7
Procedures Call.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.21. Explain procedure call with example.


Answer
Procedures call:
1. Procedure is an important and frequently used programming construct
for a compiler.
2. The compiler has to generate code for procedure calls and returns.
3. A queue is used to store the list of parameters in the procedure call.
4. The translation for a call includes a sequence of actions taken on entry
and exit from each procedure. The following actions take place in a calling
sequence:
a. When a procedure call occurs, space is allocated for the activation
record.
b. Evaluate the arguments of the called procedure.
c. Establish the environment pointers to enable the called procedure
to access data in enclosing blocks.
d. Save the state of the calling procedure so that it can resume execution
after the call.
e. Also save the return address. It is the address of the location to
which the called routine must transfer after it is finished.
f. Finally generate a jump to the beginning of the code for the called
procedure.

For example: Let us consider a grammar for a simple procedure call
statement:
1. S → call id(Elist)
2. Elist → Elist, E
3. Elist → E
A suitable translation scheme for procedure call would be:

Production rule       Semantic action
S → call id(Elist)    for each item p on QUEUE do
                          GEN(param p)
                      GEN(call id.PLACE)
Elist → Elist, E      append E.PLACE to the end of QUEUE
Elist → E             initialize QUEUE to contain only
                      E.PLACE

Que 3.22. Explain the concept of array references in arithmetic
expressions.
Answer
1. An array is a collection of elements of similar data type. Here, we assume
the static allocation of an array, whose subscripts range from one to some
limit known at compile time.
2. If the width of each array element is 'w' then the ith element of array A
begins in location,
base + (i - low) × w
where low is the lower bound on the subscript and base is the relative
address of the storage allocated for the array, i.e., base is the relative
address of A[low].
3. A two dimensional array is normally stored in one of two forms, either
row-major (row by row) or column-major (column by column).
4. The layouts for row-major and column-major are given in Fig. 3.22.1:

Row-major:    A[1, 1], A[1, 2], A[1, 3] (first row), A[2, 1], A[2, 2], A[2, 3] (second row)
Column-major: A[1, 1], A[2, 1] (first column), A[1, 2], A[2, 2] (second column),
              A[1, 3], A[2, 3] (third column)
Fig. 3.22.1.
5. In case of a two dimensional array stored in row-major form, the relative
address of A[i1, i2] can be calculated by the formula,
base + ((i1 - low1) × n2 + i2 - low2) × w
where low1 and low2 are lower bounds on the values of i1 and i2, and n2
is the number of values that i2 can take.
6. That is, if high2 is the upper bound on the value of i2 then n2 = high2 -
low2 + 1.
7. Assuming that i1 and i2 are the only values that are not known at compile
time, we can rewrite the above expression as:
((i1 × n2) + i2) × w + (base - ((low1 × n2) + low2) × w)
8. The generalized form of row-major will be,
((...((i1 × n2 + i2) × n3 + i3)...) × nk + ik) × w + base -
((...((low1 × n2 + low2) × n3 + low3)...) × nk + lowk) × w
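
That the rewritten expression in point 7 gives the same address as the formula in point 5 can be checked numerically. A minimal sketch, with function names and sample values (base 1000, a 10 × 20 array, width 4) chosen here only for illustration:

```python
def address_direct(base, i1, i2, low1, low2, n2, w):
    """Row-major address: base + ((i1 - low1) * n2 + (i2 - low2)) * w."""
    return base + ((i1 - low1) * n2 + (i2 - low2)) * w

def address_split(base, i1, i2, low1, low2, n2, w):
    """Same address, split into a run-time part and a compile-time constant."""
    constant = base - (low1 * n2 + low2) * w    # computable at compile time
    return (i1 * n2 + i2) * w + constant

base, low1, low2, n1, n2, w = 1000, 1, 1, 10, 20, 4

for i1 in range(low1, low1 + n1):
    for i2 in range(low2, low2 + n2):
        assert (address_direct(base, i1, i2, low1, low2, n2, w)
                == address_split(base, i1, i2, low1, low2, n2, w))

print(address_split(base, 2, 3, low1, low2, n2, w))
```

With these values the compile-time constant is base - 84, so only the term (i1 × 20 + i2) × 4 has to be computed by the generated three address code.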

PART-8
Declarations Statements.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 3.23. Explain declarative statements with example.

Answer
In the declarative statements the data items along with their data types are declared.
For example:

S → D                    {offset := 0}
D → id : T               {enter_tab(id.name, T.type, offset);
                          offset := offset + T.width}
T → integer              {T.type := integer;
                          T.width := 4}
T → real                 {T.type := real;
                          T.width := 8}
T → array[num] of T_1    {T.type := array(num.val, T_1.type);
                          T.width := num.val × T_1.width}
T → *T_1                 {T.type := pointer(T_1.type);
                          T.width := 4}
1. Initially, the value of offset is set to zero. The offset is then updated using the formula offset = offset + width.
2. In the above translation scheme, T.type and T.width are the synthesized attributes. The type indicates the data type of the corresponding identifier and width is used to indicate the memory units associated with an identifier of the corresponding type. For instance, integer has width 4 and real has 8.
3. The rule D → id : T is a declarative statement for id declaration. The enter_tab is a function used for creating the symbol table entry for an identifier along with its type and offset.
4. The width of an array is obtained by multiplying the width of each element by the number of elements in the array.
5. The width of pointer types is supposed to be 4.
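
The offset computation described above can be sketched in C as follows. This is a minimal illustration under assumptions: the struct layout, the fixed table size and the helper array_width are hypothetical, and only the widths (integer 4, real 8, pointer 4) come from the text.

```c
#include <string.h>

#define MAX_SYMS 16

struct entry { char name[32]; int width; int offset; };

struct table {
    struct entry e[MAX_SYMS];
    int count;
    int offset;              /* next free relative address */
};

void table_init(struct table *t) { t->count = 0; t->offset = 0; }

/* Mirrors the action of D -> id : T. Records the name at the current
   offset, then advances offset by the width of the type.
   Returns the offset given to the name, or -1 if the table is full. */
int enter_tab(struct table *t, const char *name, int width)
{
    struct entry *e;
    if (t->count >= MAX_SYMS)
        return -1;
    e = &t->e[t->count++];
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
    e->width = width;
    e->offset = t->offset;
    t->offset += width;
    return e->offset;
}

/* Width of array[num] of T1 is num.val * T1.width. */
int array_width(int num, int elem_width) { return num * elem_width; }
```

Declaring an integer, a real, an array of 10 integers and a pointer in that order gives the offsets 0, 4, 12 and 52.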

VERY IMPORTANT QUESTIONS

Following questions are very important. These questions may be asked in your SESSIONALS as well as UNIVERSITY EXAMINATION.

Q. 1. Define syntax directed translation. Construct an annotated parse tree for the expression (4 * 7 + 1) * 2, using the simple desk calculator grammar.
Ans. Refer Q. 3.1.

Q. 2. Explain attributes. What are synthesized and inherited attribute ?
Ans. Refer Q. 3.3.

Q. 3. What is postfix translation ? Explain it with suitable example.
Ans. Refer Q. 3.3.

Q. 4. What is syntax tree ? What are the rules to construct syntax tree for an expression ?
Ans. Refer Q. 3.8.

Q. 5. What are different ways to write three address code ?
Ans. Refer Q. 3.11.

Q. 6. Write the quadruples, triple and indirect triple for the following expression :
(x + y) * (y + z) + (x + y + z)
Ans. Refer Q. 3.12.
UNIT 4
Symbol Tables

CONTENTS
Part-1 : Symbol Tables: Data Structure for Symbol Tables
Part-2 : Representing Scope Information
Part-3 : Run-Time Administration: Implementation of Simple Stack Allocation Scheme
Part-4 : Storage Allocation in Block Structured Language
Part-5 : Error Detection and Recovery: Lexical Phase Errors, Syntactic Phase Errors, Semantic Errors

PART-1
Symbol Tables: Data Structure for Symbol Tables.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 4.1. Discuss symbol table with its capabilities ?

Answer
1. A symbol table is a data structure used by a compiler to keep track of scope, life and binding information about names.
2. This information is used in the source program to identify the various program elements, like variables, constants, procedures, and the labels of statements.
3. A symbol table must have the following capabilities:
a. Lookup : To determine whether a given name is in the table.
b. Insert : To add a new name (a new entry) to the table.
c. Access : To access the information related with the given name.
d. Modify : To add new information about a known name.
e. Delete : To delete a name or group of names from the table.
Que 4.2. What are the symbol table requirements ? What are the demerits in the uniform structure of symbol table ?

Answer
The basic requirements of a symbol table are as follows:
1. Structural flexibility : Based on the usage of identifier, the symbol table entries must contain all the necessary information.
2. Fast lookup/search : The table lookup/search depends on the implementation of the symbol table and the speed of the search should be as fast as possible.
3. Efficient utilization of space : The symbol table must be able to grow or shrink dynamically for an efficient usage of space.
4. Ability to handle language characteristics : The characteristics of a language such as scoping and implicit declaration need to be handled.

Demerits in uniform structure of symbol table:
1. The uniform structure cannot handle a name whose length exceeds the upper bound or limit of the name field.
2. If the length of a name is small, then the remaining space is wasted.

Que 4.3. How names can be looked up in the symbol table ? Discuss.
AKTU 2016-17, Marks 10

Answer
1. The symbol table is searched (looked up) every time a name is encountered in the source text.
2. When a new name or new information about an existing name is discovered, the content of the symbol table changes.
3. Therefore, a symbol table must have an efficient mechanism for accessing the information held in the table as well as for adding new entries to the symbol table.
4. In any case, the symbol table is a useful abstraction to aid the compiler to ascertain and verify the semantics, or meaning, of a piece of code.
5. It makes the compiler more efficient, since the file does not need to be re-parsed to discover previously processed information.
For example: Consider the following outline of a C function:
void scopes()
{
    int a, b, c;            /* level 1 */
    {
        int a, b;           /* level 2 */
    }
    {
        float c, d;         /* level 3 */
        {
            ...             /* level 4 */
        }
    }
}

The symbol table could be represented by an upwards growing stack as:
i. Initially the symbol table is empty.

ii. After the first three declarations, the symbol table will be:

    c    int
    b    int
    a    int

iii. After the second declaration of Level 2:

    b    int
    a    int
    c    int
    b    int
    a    int

iv. As the control comes out from Level 2:

    c    int
    b    int
    a    int

v. When control enters into Level 3:

    d    float
    c    float
    c    int
    b    int
    a    int

vi. After entering into Level 4:

    (level 4 name)    int
    d    float
    c    float
    c    int
    b    int
    a    int

vii. On leaving the control from Level 4:

    d    float
    c    float
    c    int
    b    int
    a    int

viii. On leaving the control from Level 3:

    c    int
    b    int
    a    int

ix. On leaving the function entirely, the symbol table will be again empty.
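
The stack behaviour traced above can be sketched in C. This is only an illustration under assumptions: the names scope_stack, ss_enter_scope, ss_leave_scope, ss_declare and ss_lookup are hypothetical, and the table has a fixed capacity for brevity.

```c
#include <string.h>

#define MAX_ENTRIES 64
#define MAX_DEPTH   16

struct sym { const char *name; const char *type; };

struct scope_stack {
    struct sym entries[MAX_ENTRIES];
    int top;                 /* number of entries currently on the stack */
    int marks[MAX_DEPTH];    /* value of top when each open scope began */
    int depth;               /* number of open scopes */
};

void ss_init(struct scope_stack *s) { s->top = 0; s->depth = 0; }

/* Entering a block remembers where its declarations start. */
int ss_enter_scope(struct scope_stack *s)
{
    if (s->depth >= MAX_DEPTH)
        return -1;
    s->marks[s->depth++] = s->top;
    return 0;
}

/* Leaving a block pops every name declared inside it. */
void ss_leave_scope(struct scope_stack *s)
{
    if (s->depth > 0)
        s->top = s->marks[--s->depth];
}

/* A declaration pushes one entry on top of the stack. */
int ss_declare(struct scope_stack *s, const char *name, const char *type)
{
    if (s->top >= MAX_ENTRIES)
        return -1;
    s->entries[s->top].name = name;
    s->entries[s->top].type = type;
    s->top++;
    return 0;
}

/* Searching from the top finds the innermost declaration first. */
const char *ss_lookup(const struct scope_stack *s, const char *name)
{
    int i;
    for (i = s->top - 1; i >= 0; i--)
        if (strcmp(s->entries[i].name, name) == 0)
            return s->entries[i].type;
    return NULL;
}
```

With this sketch, a lookup of c inside Level 3 returns float, and after leaving Level 3 the same lookup returns int again.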

Que 4.4. What is the role of symbol table ? Discuss different data structures used for symbol table.
OR
Discuss the various data structures used for symbol table with suitable example.
Answer
Role of symbol table:
1. It keeps the track of semantics of variables.
2. It stores information about scope.
3. It helps to achieve compile time efficiency.
Different data structures used in implementing symbol table are:
1. Unordered list:
a. It is the simplest way to implement a symbol table.
b. It is implemented as an array or a linked list.
c. A linked list can grow dynamically, which eliminates the problem of a fixed size array.
d. Insertion of a variable takes O(1) time, but lookup is slow for large tables, i.e., O(n).
2. Ordered list:
a. If an array is sorted, it can be searched using binary search in O(log2 n).
b. Insertion into a sorted array is expensive; it takes O(n) time on average.
c. Ordered list is useful when the set of names is known in advance, e.g., a table of reserved words.
3. Search tree:
a. Search tree operations and lookup are done in logarithmic time.
b. Search tree is kept balanced by using the algorithms of AVL and Red-black trees.
4. Hash tables and hash functions:
a. The hash function translates a name into a value in a fixed range, called the hash value, and this value is used as an index into the hash table.
b. Hash table can be used to minimize the movement of elements in the symbol table.
c. The hash function helps in uniform distribution of names in symbol table.
For example : Consider a part of C program
int x, y;
msg();

1. Unordered list :

S. No.    Name    Type
1         x       int
2         msg     function
3         y       int

2. Ordered list :

Id      Name    Type
Id1     x       int
Id2     y       int
Id3     msg     function

3. Search tree :
[Figure: search tree holding the names of the program.]

4. Hash table :
[Figure: each hash table entry points into the storage table, whose rows hold (Name1, Data1, Link1), (Name2, Data2, Link2) and (Name3, Data3, Link3).]
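
The ordered list case, where the set of names is fixed in advance, can be sketched as a binary search over a sorted table of reserved words. The word list and the function name find_reserved are assumptions chosen for illustration.

```c
#include <string.h>

/* Reserved words kept in sorted (strcmp) order so that binary search works. */
static const char *const reserved[] = {
    "break", "else", "for", "if", "int", "return", "while"
};

/* Returns the index of name in the table, or -1 if it is not a reserved word.
   Each step halves the search range, so lookup takes O(log2 n) comparisons. */
int find_reserved(const char *name)
{
    int lo = 0;
    int hi = (int)(sizeof reserved / sizeof reserved[0]) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, reserved[mid]);
        if (cmp == 0)
            return mid;
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}
```

An ordinary identifier such as msg is not found in this table, so the scanner would treat it as a user-defined name.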

Que 4.5. Describe symbol table and its entries. Also, discuss various data structure used for symbol table.
AKTU 2015-16, Marks 10

Answer
Symbol table : Refer Q. 4.1, Page 4-2C, Unit-4.
Entries in the symbol table are as follows:
1. Variables:
a. Variables are identifiers whose value may change between executions and during a single execution of a program.
b. They represent the contents of some memory location.
c. The symbol table needs to record both the variable name as well as its allocated storage space at runtime.
2. Constants:
a. Constants are identifiers that represent a fixed value that can never be changed.
b. Unlike variables or procedures, no runtime location needs to be stored for constants.
c. These are typically placed right into the code stream by the compiler at compilation time.
3. Types (user defined):
a. A user defined type is a combination of one or more existing types.
b. Types are accessed by name and reference a type definition structure.
4. Classes:
a. Classes are abstract data types which restrict access to their members and provide convenient language level polymorphism.
b. The entry for a class includes the location of the default constructor and destructor, and the address of the virtual function table.
5. Records:
a. Records represent a collection of possibly heterogeneous members which can be accessed by name.
b. The symbol table probably needs to record each of the record's members.
Various data structure used for symbol table : Refer Q. 4.4, Page 4-4C, Unit-4.

PART-2
Representing Scope Information.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 4.6. Discuss how the scope information is represented in a symbol table.

Answer
1. Scope information characterizes the declaration of identifiers and the portions of the program where it is allowed to use each identifier.
2. Different languages have different scopes for declarations. For example, in FORTRAN, the scope of a name is a single subroutine, whereas in ALGOL, the scope of a name is the section or procedure in which it is declared.
3. Thus, the same identifier may be declared several times as distinct names, with different attributes, and with different intended storage locations.
4. The symbol table is thus responsible for keeping different declarations of the same identifier distinct.
5. To make distinction among the declarations, a unique number is assigned to each program element that in return may have its own local data.
6. Semantic rules associated with productions that can recognize the beginning and ending of a subprogram are used to compute the number of currently active subprograms.
7. There are mainly two semantic rules regarding the scope of an identifier:
a. Each identifier can only be used within its scope.
b. Two or more identifiers with the same name and of the same kind cannot be declared within the same lexical scope.
8. The scope declaration of variables, functions, labels and objects within a program is shown below:
Scope of variables in statement blocks:
{
    int x;              /* scope of variable x : from here to the end of this block */
    ...
    {
        int y;          /* scope of variable y : from here to the end of the inner block */
        ...
    }
    ...
}

Scope of formal arguments of functions:
int mul(int n)
{
    ...                 /* scope of argument n : the function body */
}

Scope of labels:
void jumper()
{
    ...
    goto sim;
    ...
sim:                    /* scope of label sim : the whole function body */
    ...
    goto sim;
    ...
}

Que 4.7. Write a short note on scoping.

Answer
1. Scoping is a method of keeping variables in different parts of a program distinct from one another.
2. Scoping is generally divided into two classes:
a. Static scoping : Static scoping is also called lexical scoping. In this scoping a variable always refers to its top level environment.
b. Dynamic scoping : In dynamic scoping, a global identifier refers to the identifier associated with the most recent environment.

Que 4.8. Differentiate between lexical (or static) scope and dynamic scope.

Answer

S. No. | Lexical scope | Dynamic scope
1. | The binding of name occurrences to declarations is done statically at compile time. | The binding of name occurrences to declarations is done dynamically at run-time.
2. | The structure of the program defines the binding of variables. | The binding of variables is defined by the flow of control at run-time.
3. | A free variable in a procedure gets its value from the environment in which the procedure is defined. | A free variable gets its value from where the procedure is called.

Que 4.9. Distinguish between static scope and dynamic scope. Briefly explain access to non-local names in static scope.
AKTU 2018-19, Marks 07
Answer
Difference : Refer Q. 4.8, Page 4-9C, Unit-4.
Access to non-local names in static scope:
1. Static chain is the mechanism to implement non-local names (variable) access in static scope.
2. A static chain is a chain of static links that connects certain activation record instances in the stack.
3. The static link (static scope pointer) in an activation record instance for subprogram A points to one of the activation record instances of A's static parent.
4. When a subroutine at nesting level j has a reference to an object declared in a static parent at the surrounding scope nested at level k, then j - k static links form a static chain that is traversed to get to the frame containing the object.
5. The compiler generates code to make these traversals over frames to reach non-local names.
For example : Subroutine A is at nesting level 1 and C at nesting level 3. When C accesses an object of A, 2 static links are traversed to get to A's frame that contains that object.
[Figure: static frames on the stack connected by static links, for the call sequence: A calls E, E calls B, B calls D, D calls C.]
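
The traversal of j - k static links can be sketched in C as below. The struct frame layout and the function frame_at_level are hypothetical names used only for illustration; a real compiler emits this walk as inline code rather than calling a function.

```c
#include <stddef.h>

struct frame {
    int level;                   /* static nesting level of the subprogram */
    struct frame *static_link;   /* frame of the static parent */
    struct frame *control_link;  /* frame of the caller */
    int locals[4];               /* local data of this activation */
};

/* Starting from the frame of a subroutine at nesting level j,
   follow j - k static links to reach the frame at level k. */
struct frame *frame_at_level(struct frame *current, int k)
{
    struct frame *f = current;
    int hops;
    if (f == NULL)
        return NULL;
    hops = f->level - k;
    while (hops > 0 && f != NULL) {
        f = f->static_link;
        hops--;
    }
    return f;
}
```

If C is at level 3 and its static chain leads through a level 2 frame to A at level 1, then frame_at_level called on C's frame with k = 1 follows two links and returns A's frame.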

PART-3
Run-Time Administration: Implementation of Simple Stack
Allocation Scheme

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 4.10. Draw the format of activation record in stack allocation and explain each field in it.
AKTU 2018-19, Marks 07
Answer
1. Activation record is used to manage the information needed by a single execution of a procedure.
2. An activation record is pushed onto the stack when a procedure is called and it is popped when the control returns to the caller function.
Format of activation record in stack allocation:

Return value
Actual parameters
Control link
Access link
Saved machine status
Local data
Temporaries

Fields of activation record are:
1. Return value : It is used by the called procedure to return a value to the calling procedure.
2. Actual parameters : It is used by the calling procedure to supply parameters to the called procedure.
3. Control link : It points to the activation record of the caller.
4. Access link : It is used to refer to non-local data held in other activation records.
5. Saved machine status : It holds the information about the status of the machine before the procedure is called.
6. Local data : It holds the data that is local to the execution of the procedure.
7. Temporaries : It stores the values that arise in the evaluation of an expression.
Que 4.11. How to sub-divide a run-time memory into code and data areas ? Explain.
AKTU 2016-17, Marks 10
Answer
Sub-division of run-time memory into code and data areas is shown in Fig. 4.11.1.

Code
Static
Heap
Free Memory
Stack

Fig. 4.11.1.
1. Code : It stores the executable target code, which is of fixed size and does not change during execution.
2. Static allocation:
a. The static allocation is for all the data objects at compile time.
b. The size of the data objects is known at compile time.
c. The names of these objects are bound to storage at compile time only and such an allocation of data objects is done by static allocation.
d. In static allocation, the compiler can determine the amount of storage required by each data object. Therefore, it becomes easy for a compiler to find the address of these data in the activation record.
e. At compile time, the compiler can fill in the addresses at which the target code can find the data on which it operates.
3. Heap allocation : There are two methods used for heap management:
a. Garbage collection method :
i. When all access paths to an object are destroyed but the data object continues to exist, such objects are said to be garbage.
ii. The garbage collection is a technique which is used to reuse that object space.
iii. In garbage collection, all the elements whose garbage collection bit is 'on' are garbage and are returned to the free space list.
b. Reference counter :
i. Reference counter attempts to reclaim each element of heap storage immediately after it can no longer be accessed.
ii. Each memory cell on the heap has a reference counter associated with it that contains a count of the number of values that point to it.
iii. The count is incremented each time a new value points to the cell and decremented each time a value ceases to point to it.

4. Stack allocation:
a. Stack allocation is used to store the data structure called activation record.
b. The activation records are pushed and popped as activations begin and end respectively.
c. Storage for the locals in each call of the procedure is contained in the activation record for that call. Thus, locals are bound to fresh storage in each activation, because a new activation record is pushed onto the stack when a call is made.
d. These values of locals are deleted when the activation ends.
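
The reference counter method of heap management can be sketched in C as follows. The names cell_new, cell_retain and cell_release, and the global live_cells used to observe reclamation, are assumptions made for this illustration.

```c
#include <stdlib.h>

struct cell {
    int refcount;   /* number of values that point to this cell */
    int value;
};

int live_cells = 0; /* cells currently allocated, kept only for illustration */

/* A new cell starts with one reference: the pointer returned here. */
struct cell *cell_new(int value)
{
    struct cell *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->refcount = 1;
    c->value = value;
    live_cells++;
    return c;
}

/* Called each time a new value points to the cell. */
struct cell *cell_retain(struct cell *c)
{
    if (c != NULL)
        c->refcount++;
    return c;
}

/* Called each time a value ceases to point to the cell.
   The cell is reclaimed immediately when the count reaches zero. */
void cell_release(struct cell *c)
{
    if (c == NULL)
        return;
    if (--c->refcount == 0) {
        free(c);
        live_cells--;
    }
}
```

A cell that is retained once and released twice is freed on the second release, which is the point at which it can no longer be accessed.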

Que 4.12. Why run-time storage management is required ? How simple stack implementation is implemented ?
Answer
Run-time storage management is required because:
1. A program needs memory resources to execute instructions.
2. The storage management must connect to the data objects of programs.
3. It takes care of memory allocation and deallocation while the program is being executed.
Simple stack implementation is implemented as:
1. In stack allocation strategy, the storage is organized as a stack. This stack is also called the control stack.
2. As an activation begins, the activation record is pushed onto the stack, and on completion of this activation the corresponding activation record is popped.
3. The locals are stored in each activation record. Hence, locals are bound to the corresponding activation record on each fresh activation.
4. The data structures can be created dynamically for stack allocation.

Que 4.13. Discuss the following parameter passing techniques with suitable example.
i. Call by name
ii. Call by reference
OR
Explain the various parameter passing mechanisms of a high level language.
AKTU 2018-19, Marks 07
Answer
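i. Call by name:
1. In call by name, the actual parameters are substituted for formals in all the places where formals occur in the procedure.
2. It is also referred to as lazy evaluation because evaluation is done on parameters only when needed.

For example: C itself passes arguments by value, so the substitution of actuals for formals is imitated here with a macro.
#include <stdio.h>

/* The actuals n1 and n2 are textually substituted for the formals c and d. */
#define SWAP(c, d) do { int t; t = c; c = d; d = t; \
        printf("n1: %d, n2: %d\n", n1, n2); } while (0)

int main(void)
{
    int n1 = 10, n2 = 20;
    printf("n1: %d, n2: %d\n", n1, n2);
    SWAP(n1, n2);
    printf("n1: %d, n2: %d\n", n1, n2);
    return 0;
}

Output:
n1: 10, n2: 20
n1: 20, n2: 10
n1: 20, n2: 10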

ii. Call by reference:
1. In call by reference, the location (address) of actual arguments is passed to formal arguments of the called function. This means by accessing the addresses of actual arguments we can alter them within the called function.
2. In call by reference, alteration to actual arguments is possible within the called function; therefore the code must handle arguments carefully, else we get unexpected results.

For example:
#include <stdio.h>

void swapByReference(int *, int *);    /* Prototype */

int main(void)                         /* Main function */
{
    int n1 = 10, n2 = 20;

    /* actual arguments will be altered */
    swapByReference(&n1, &n2);
    printf("n1: %d, n2: %d\n", n1, n2);
    return 0;
}

void swapByReference(int *a, int *b)
{
    int t;

    t = *a; *a = *b; *b = t;
}

Output: n1: 20, n2: 10

PART-4
Storage Allocation in Block Structured Language.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 4.14. Explain symbol table organization using hash tables. With an example show the symbol table organization for block structured language.

Answer
1. Hashing is an important technique used to search the records of symbol table. This method is superior to list organization.
2. In hashing scheme, a hash table and a symbol table are maintained.
3. The hash table consists of k entries from 0, 1 to k - 1. These entries are basically pointers to the symbol table, pointing to the names of the symbol table.
4. To determine whether the 'Name' is in the symbol table, we use a hash function 'h' such that h(name) will result in an integer between 0 and k - 1. We can search any name by position = h(name).
5. Using this position, we can obtain the exact location of the name in the symbol table.
6. The hash table and symbol table are shown in Fig. 4.14.1.

[Figure: hash table entries point to rows of the symbol table; each row has the fields Name, Info and Hash link, with sum and avg as example names.]
Fig. 4.14.1.

7. The hash function should result in uniform distribution of names in symbol table.
8. The hash function should have minimum number of collisions.
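
The hashing scheme above can be sketched in C with a table of k bucket pointers and a hash link chaining names that collide. The value K = 11, the particular hash function and the names ht_lookup and ht_insert are assumptions made for illustration; the text does not prescribe them.

```c
#include <stdlib.h>
#include <string.h>

#define K 11                       /* number of hash table entries, 0 to K - 1 */

struct symbol {
    const char *name;
    const char *info;
    struct symbol *hash_link;      /* next symbol with the same hash value */
};

static struct symbol *hash_table[K];   /* all entries start as NULL */

/* h(name) yields an integer between 0 and K - 1. */
unsigned h(const char *name)
{
    unsigned sum = 0;
    while (*name != '\0')
        sum = sum * 31u + (unsigned char)*name++;
    return sum % K;
}

/* Search only the chain at position h(name). */
struct symbol *ht_lookup(const char *name)
{
    struct symbol *s;
    for (s = hash_table[h(name)]; s != NULL; s = s->hash_link)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Insert name if absent and return its entry (NULL if out of memory).
   The strings are not copied, so they must outlive the table. */
struct symbol *ht_insert(const char *name, const char *info)
{
    unsigned pos = h(name);
    struct symbol *s = ht_lookup(name);
    if (s != NULL)
        return s;
    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->name = name;
    s->info = info;
    s->hash_link = hash_table[pos];
    hash_table[pos] = s;
    return s;
}
```

After inserting sum and avg, a lookup of either name inspects a single chain instead of the whole table, which is why hashing is preferred to list organization.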

PART-5
Error Detection and Recovery: Lexical Phase Errors, Syntactic Phase Errors, Semantic Errors.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 4.15. Define error recovery. What are the properties of error message ? Discuss the goals of error handling.

Answer
Error recovery : Error recovery is an important feature of any compiler, through which the compiler can continue to read and process the complete program even if it has some errors.
Properties of error message are as follows:
1. Message should report the errors in terms of the original source program rather than in terms of some internal representation of the source program.
2. Error message should not be complicated.
3. Error message should be very specific and should fix the errors at correct positions.
4. There should be no duplicacy of error messages, i.e., the same error should not be reported again and again.
Goals of error handling are as follows:
1. Detect the presence of errors and produce "meaningful" diagnostics.
2. To recover quickly enough to be able to detect subsequent errors.
3. Error handling components should not significantly slow down the compilation of syntactically correct programs.
Que 4.16. What are lexical phase errors, syntactic phase errors and semantic phase errors ? Explain with suitable example.
AKTU 2015-16, Marks 10
Answer
1. Lexical phase error:
a. A lexical phase error is a sequence of characters that does not match the pattern of any token, i.e., while scanning the source program, the compiler may not generate a valid token from the source program.
b. Reasons due to which errors are found in lexical phase are:
i. The addition of an extraneous character.
ii. The removal of a character that should be present.
iii. The replacement of a character with an incorrect character.
iv. The transposition of two characters.
For example:
i. In Fortran, an identifier with more than 7 characters is a lexical error.
ii. In a Pascal program, the occurrence of the characters ~, & and @ is a lexical error.
2. Syntactic phase errors (syntax error):
a. Syntactic errors are those errors which occur due to the mistake done by the programmer during the coding process.
b. Reasons due to which errors are found in syntactic phase are:
i. Missing of semicolon
ii. Unbalanced parenthesis and punctuation
For example: Let us consider the following piece of code :
int x;
int y        // Syntax error
In example, syntactic error occurred because of absence of semicolon.
3. Semantic phase errors :
a. Semantic phase errors are those errors which occur in declaration and scope in a program.
b. Reasons due to which errors are found :
i. Undeclared names
ii. Type incompatibilities
iii. Mismatching of actual arguments with the formal arguments.
For example: Let us consider the following piece of code:
scanf("%f%f", a, b);
In example, a and b are semantic errors because scanf uses the addresses of the variables, as &a and &b.

4. Logical errors :
a. Logical errors are the logical mistakes found in the program which are not handled by the compiler.
b. In these types of errors, the program is syntactically correct but does not operate as desired.
For example:
Let us consider the following piece of code :
x = 4;
y = 5;
average = x + y/2;
The given code does not give the average of x and y because the BODMAS rule is not applied properly: y/2 is evaluated before the addition.

Que 4.17. What do you understand by lexical error and syntactic error ? Also, suggest methods for recovery of errors.
OR
Explain logical phase error and syntactic phase error. Also suggest methods for recovery of error.
AKTU 2017-18, Marks 10
Answer
Lexical and syntactic error : Refer Q. 4.16, Page 4-17C, Unit-4.
Various error recovery methods are:
1. Panic mode recovery:
a. This is the simplest method to implement and is used by most of the parsing methods.
b. When the parser detects an error, it discards the input symbols one at a time until one of the designated set of synchronizing tokens is found.
c. Panic mode correction often skips a considerable amount of input without checking it for additional errors. It is guaranteed not to go into an infinite loop.
For example:
Let us consider a piece of code:
a = b + c;
d = e + f
By using panic mode, it skips a = b + c without checking the error in the code.

2. Phrase-level recovery :
a. When the parser detects an error, it may perform local correction on the remaining input.
b. It may replace a prefix of the remaining input by some string that allows the parser to continue.
c. A typical local correction would replace a comma by a semicolon, delete an extraneous semicolon or insert a missing semicolon.
For example:
Let us consider a piece of code:
while (x > 0) y = a + b;
In this code, local correction is done by phrase-level recovery by adding 'do', and parsing is continued.
3. Error production : If error production is used by the parser, we can generate an appropriate error message and parsing is continued.
For example :
Let us consider a grammar:
E → +E | -E | *A | /A
A → E
When error production encounters *A, it sends an error message to the user asking whether to use '*' as unary or not.

4. Global correction:
a. Global correction is a theoretical concept.
b. This method increases time and space requirement during parsing.
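
Panic mode recovery, the first of the methods listed above, can be sketched in C as a loop that discards tokens until a synchronizing token is reached. The token kinds, the choice of ';' and '}' as synchronizing tokens, and the function name panic_mode_skip are assumptions for this illustration.

```c
#include <stddef.h>

enum token { TOK_ID, TOK_ASSIGN, TOK_PLUS, TOK_SEMI, TOK_RBRACE, TOK_EOF };

/* Designated set of synchronizing tokens (assumed here: ';', '}' and end of input). */
static int is_sync(enum token t)
{
    return t == TOK_SEMI || t == TOK_RBRACE || t == TOK_EOF;
}

/* Called at position pos where the parser detected an error.
   Discards input symbols one at a time until a synchronizing token is found,
   and returns the position where parsing should resume.
   The token array must end with TOK_EOF, which guarantees termination. */
size_t panic_mode_skip(const enum token *toks, size_t pos)
{
    while (!is_sync(toks[pos]))
        pos++;
    if (toks[pos] != TOK_EOF)
        pos++;                    /* consume the synchronizing token itself */
    return pos;
}
```

Because every skipped token brings the parser closer to the end of input, the loop cannot run forever, at the cost of not checking the skipped tokens for further errors.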

Que 4.18. Explain in detail the error recovery process in operator precedence parsing method.
AKTU 2018-19, Marks 07
Answer
Error recovery in operator precedence parsing:
1. There are two points in the parsing process at which an operator precedence parser can discover syntactic error :
a. If no precedence relation holds between the terminal on top of the stack and the current input.
b. If a handle has been found, but there is no production with this handle as a right side.
2. The error checker checks for the following errors :
a. Missing operand
b. Missing operator
c. No expression between parentheses
d. These error diagnostics are issued at handling of errors during reduction.
3. During handling of shift/reduce errors, the diagnostics issued are:
a. Missing operand
b. Unbalanced right parenthesis
c. Missing right parenthesis
d. Missing operators

VERY IMPORTANT QUESTIONS

Following questions are very important. These questions may be asked in your SESSIONALS as well as UNIVERSITY EXAMINATION.

Q. 1. What are the symbol table requirements ? What are the demerits in the uniform structure of symbol table ?
Ans. Refer Q. 4.2.

Q. 2. What is the role of symbol table ? Discuss different data structures used for symbol table.
Ans. Refer Q. 4.4.

Q. 3. Describe symbol table and its entries. Also, discuss various data structure used for symbol table.
Ans. Refer Q. 4.5.

Q. 4. Distinguish between static scope and dynamic scope. Briefly explain access to non-local names in static scope.
Ans. Refer Q. 4.9.

Q. 5. Draw the format of activation record in stack allocation and explain each field in it.
Ans. Refer Q. 4.10.

Q. 6. Explain the various parameter passing mechanisms of a high level language.
Ans. Refer Q. 4.13.

Q. 7. Define error recovery. What are the properties of error message ? Discuss the goals of error handling.
Ans. Refer Q. 4.15.

Q. 8. What are lexical phase errors, syntactic phase errors and semantic phase errors ? Explain with suitable example.
Ans. Refer Q. 4.16.
UNIT 5
Code Generation

CONTENTS
Part-1 : Code Generation: Design Issues
Part-2 : The Target Language, Address in Target Code
Part-3 : Basic Blocks and Flow Graphs, Optimization of Basic Blocks, Code Generator
Part-4 : Machine Independent Optimizations, Loop Optimization
Part-5 : DAG Representation of Basic Blocks
Part-6 : Value Numbers and Algebraic Laws, Global Data Flow Analysis

PART-1
Code Generation: Design Issues.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 5.1. What is code generation ? Discuss the design issues of code generation.

Answer
1. Code generation is the final phase of compiler.
2. It takes as input the Intermediate Representation (IR) produced by the front end of the compiler, along with relevant symbol table information, and produces as output a semantically equivalent target program as shown in Fig. 5.1.1.

Front end → Intermediate code → Code optimizer → Intermediate code → Code generator → Target program

Fig. 5.1.1. Position of code generator.

Design issues of code generator are :
1. Input to the code generator:
a. The input to the code generator is the intermediate representation of the source program produced by the front end, along with information in the symbol table.
b. IR includes three address representations and graphical representations.
2. The target program:
a. The instruction set architecture of the target machine has a significant impact on the difficulty of constructing a good code generator that produces high quality machine code.
b. The most common target machine architectures are RISC (Reduced Instruction Set Computer), CISC (Complex Instruction Set Computer), and stack based.
3. Instruction selection:
a. The code generator must map the IR program into a code sequence that can be executed by the target machine.
b. If the IR is high level, the code generator may translate each IR statement into a sequence of machine instructions using code templates.
4. Register allocation :
a. A key problem in code generation is deciding what values to hold in which registers, because the registers on the target machine do not have enough space to hold all values.
b. Values that are not held in registers need to reside in memory. Instructions involving register operands are invariably shorter and faster than those involving operands in memory, so efficient utilization of registers is particularly important.
c. The use of registers is often subdivided into two subproblems:
i. Register allocation, during which we select the set of variables that will reside in registers at each point in the program.
ii. Register assignment, during which we pick the specific register that a variable will reside in.
5. Evaluation order:
a. The order in which computations are performed can affect the efficiency of the target code.
b. Some computation orders require fewer registers to hold intermediate results than others.

PART-2
The Target Language, Address in Target Code.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 5.2. Discuss addresses in the target code.

Answer
1. Addresses in the target code show how names in the IR can be converted into addresses in the target code by looking at code generation for simple procedure calls and returns using static and stack allocation.
2. An executing program runs in its own logical address space, which is partitioned into four code and data areas:
a. A statically determined area code that holds the executable target code. The size of the target code can be determined at compile time.
b. A statically determined data area static for holding global constants and other data generated by the compiler. The size of the global constants and compiler data can also be determined at compile time.
c. A dynamically managed area heap for holding data objects that are allocated and freed during program execution. The size of the heap cannot be determined at compile time.
d. A dynamically managed area stack for holding activation records as they are created and destroyed during procedure calls and returns. Like the heap, the size of the stack cannot be determined at compile time.

PART-3
Basic Blocks and Flow Graphs, Optimization of Basic
Blocks, Code Generator.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 5.3. Write an algorithm to partition a sequence of three address statements into basic blocks.


AKTU2016-17, Marks 10
Answer

The algorithm for construction of basic blocks is as follows:
Input: A sequence of three address statements.
Output: A list of basic blocks with each three address statement in exactly one block.
Method:
1. We first determine the set of leaders, the first statements of basic blocks. The rules we use are:
a. The first statement is a leader.
b. Any statement which is the target of a conditional or unconditional goto is a leader.
c. Any statement which immediately follows a conditional or unconditional goto is a leader.
2. For each leader, construct its basic block, which consists of the leader and all statements up to but not including the next leader or the end of the program. Any statement not placed in a block can never be executed and may now be removed, if desired.
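The leader rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statement encoding (a text plus an optional 1-based jump target) and the function name `partition_into_blocks` are hypothetical, and rule c is applied after every jump, conditional or not.

```python
def partition_into_blocks(stmts):
    """Split three address statements into basic blocks.

    stmts is a list of (text, target) pairs (an assumed encoding): target is
    the 1-based number of the statement jumped to, or None for a non-jump.
    Returns a list of blocks, each a list of 1-based statement numbers.
    """
    n = len(stmts)
    if n == 0:
        return []
    leaders = {1}                          # rule a: the first statement
    for i, (_, target) in enumerate(stmts, start=1):
        if target is not None:
            leaders.add(target)            # rule b: the target of a goto
            if i + 1 <= n:
                leaders.add(i + 1)         # rule c: the statement after a goto
    ordered = sorted(leaders)
    blocks = []
    for k, start in enumerate(ordered):
        # a block runs up to, but not including, the next leader
        end = ordered[k + 1] - 1 if k + 1 < len(ordered) else n
        blocks.append(list(range(start, end + 1)))
    return blocks


code = [
    ("prod := 0", None), ("i := 1", None), ("t1 := 4 * i", None),
    ("t2 := a[t1]", None), ("t3 := 4 * i", None), ("t4 := b[t3]", None),
    ("t5 := t2 * t4", None), ("t6 := prod + t5", None), ("prod := t6", None),
    ("t7 := i + 1", None), ("i := t7", None), ("if i <= 10 goto (3)", 3),
]
for block in partition_into_blocks(code):
    print(block)
```

For this twelve-statement fragment the leaders are statements 1 and 3, so the sketch yields two blocks: statements 1 to 2 and statements 3 to 12.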

Que 5.4. Explain flow graph with example.

Answer
1. A flow graph is a directed graph in which the flow control information is added to the basic blocks.
2. The nodes of the flow graph are represented by basic blocks.
3. The block whose leader is the first statement is called the initial block.
4. There is a directed edge from block B(i-1) to block B(i) if B(i) immediately follows B(i-1) in the given sequence. There is also an edge from a block to the block whose leader is the target of its final jump. We can say that B(i-1) is a predecessor of B(i).
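For example: Consider the three address code as:
1. prod := 0
2. i := 1
3. t1 := 4 * i
4. t2 := a[t1] /* computation of a[i] */
5. t3 := 4 * i
6. t4 := b[t3] /* computation of b[i] */
7. t5 := t2 * t4
8. t6 := prod + t5
9. prod := t6
10. t7 := i + 1
11. i := t7
12. if i <= 10 goto (3)
The flow graph for the given code can be drawn as follows:
Block B1 (the initial block):
prod := 0
i := 1
Block B2:
t1 := 4 * i
t2 := a[t1]
t3 := 4 * i
t4 := b[t3]
t5 := t2 * t4
t6 := prod + t5
prod := t6
t7 := i + 1
i := t7
if i <= 10 goto (3)
There is an edge from B1 to B2 and an edge from B2 back to itself.
Fig. 5.4.1. Flow graph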


Que 5.5. What is loop? Explain what constitutes a loop in a flow graph.

Answer

Loop is a collection of nodes in the flow graph such that:
1. All such nodes are strongly connected, i.e., there is always a path from any node to any other node within that loop.
2. The collection of nodes has a unique entry. That means the only way to reach a node inside the loop from a node outside the loop is through this entry node.
3. The loop that contains no other loop is called an inner loop.
Following terms constitute a loop in a flow graph:

1. Dominators :
a. In control flow graphs, a node d dominates a node n if every path from the entry node to n must go through d. This is denoted as d dom n.
b. By definition, every node dominates itself.
c. A node d strictly dominates a node n if d dominates n and d is not equal to n.
d. The immediate dominator (or idom) of a node n is the unique node that strictly dominates n but does not strictly dominate any other node that strictly dominates n. Every node, except the entry node, has an immediate dominator.
e. A dominator tree is a tree where each node's children are those nodes it immediately dominates. Because the immediate dominator is unique, it is a tree. The start node is the root of the tree.

2. Natural loops:
a. The natural loop can be defined by a back edge n → d such that there exists a collection of all the nodes that can reach n without going through d, and at the same time d can also be added to this collection.
b. Loop in a flow graph can be denoted by n → d such that d dom n.
c. These edges are called back edges and for a loop there can be more than one back edge.
d. If there is an edge p → q, then q is a head and p is a tail, and in a back edge the head dominates the tail.

3. Pre-header:
a. The pre-header is a new block created such that the successor of this block is the header block.
b. All the computations that can be made before the header block can be moved to the pre-header block.
Fig. 5.5.1. Pre-header (a new block placed immediately before the header of loop L)


4. Reducible flow graph:
a. A flow graph G is reducible if and only if we can partition the edges into two disjoint groups, i.e., forward edges and backward edges.
b. These edges have following properties:
i. The forward edges form an acyclic graph.
ii. The backward edges are such edges whose heads dominate their tails.
c. The program structure in which there is exclusive use of structured statements such as if-then and while-do generates a flow graph which is always reducible. Unrestricted use of goto statements can produce a flow graph that is not reducible.
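The definitions of dominators, back edges and natural loops can be tried out on a small graph. The following Python sketch is an illustration only: the six-node graph and the function names are assumptions made for this example, and the dominator sets are computed by the simple iterative method (intersecting the dominator sets of all predecessors until nothing changes).

```python
def dominators(succ, entry):
    """succ maps every node to the list of its successors."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, outs in succ.items():
        for m in outs:
            pred[m].add(n)
    dom = {n: set(nodes) for n in nodes}     # start from "every node dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, pred


def immediate_dominator(dom, n):
    strict = dom[n] - {n}
    # the strict dominators of n form a chain; the closest has the most dominators
    return max(strict, key=lambda d: len(dom[d])) if strict else None


def natural_loop(pred, n, d):
    """Nodes of the natural loop of the back edge n -> d."""
    loop = {d, n}
    stack = [n] if n != d else []
    while stack:
        m = stack.pop()
        for p in pred[m]:                    # walk backwards, never past d
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop


# Hypothetical flow graph: 1 -> 2, 2 -> 3, 2 -> 4, 3 -> 5, 4 -> 5, 5 -> 2, 5 -> 6
succ = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [2, 6], 6: []}
dom, pred = dominators(succ, entry=1)
back_edges = [(n, d) for n in succ for d in succ[n] if d in dom[n]]
for n, d in back_edges:
    print("back edge", n, "->", d, "natural loop", sorted(natural_loop(pred, n, d)))
```

In this graph the only edge whose head dominates its tail is 5 → 2, and its natural loop consists of nodes 2, 3, 4 and 5.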
Que 5.6. Discuss in detail the process of optimization of basic blocks. Give an example.
AKTU 2016-17, Marks 10
OR
What are different issues in code optimization? Explain it with proper example.

Answer
Different issues in code optimization are:
1. Function preserving transformation: The function preserving transformations are basically divided into following types:
a. Common sub-expression elimination:
i. A common sub-expression is nothing but an expression which is already computed and the same expression is used again and again in the program.
ii. If the result of the expression is not changed, then we eliminate the computation of the same expression again and again.

For example:
Before common sub-expression elimination:
a = t * 4 - b + c;
......
m = t * 4 - b + c;
......
n = t * 4 - b + c;
After common sub-expression elimination:
temp = t * 4 - b + c;
a = temp;
......
m = temp;
......
n = temp;
iii. In the given example, the expression t * 4 - b + c occurs several times. So the repeated computation is eliminated by storing the value of the expression in the variable temp.
b. Dead code elimination:
i. Dead code means the code which can be omitted from the program and still there will be no change in result.
ii. A variable is live at a point only if its value is used later in the program. Otherwise, it is declared as dead, because its value is never used again and so it is useless.
iii. The dead code occurring in the program is not introduced intentionally by the programmer.
For example:
#define False 0
if (False)
{
......
}
iv. If False is guaranteed to be zero, then the code in the 'if' statement will never be executed. So, there is no need to generate or write code for this statement because it is dead code.

c. Copy propagation:
i. Copy propagation is the concept where, after a copy statement x = y, we use y in place of x in the statements that follow.
ii. In this technique the value of the variable is replaced and computation of an expression is done at the compilation time. When the copied value is a constant, the technique is also called constant propagation.
For example:
pi = 3.14;
r = 5;
Area = pi * r * r;
Here at the compilation time the value of pi is replaced by 3.14 and r by 5.
d. Constant folding (compile time evaluation):
i. Constant folding is defined as replacement of an expression whose operands are constants by its equivalent constant value at the compile time.
ii. In constant folding all operands in an operation are constant. The original evaluation can therefore be replaced by its result, which is also constant.
For example: a = 3.14157/2 can be replaced by a = 1.570785, thereby eliminating a division operation.
2. Algebraic simplification:
a. Peephole optimization is an effective technique for algebraic simplification.
b. The statements such as
x := x + 0
or
x := x * 1
can be eliminated by peephole optimization.
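Constant folding and algebraic simplification can both be expressed as a small rewrite on a single operation. The Python sketch below is only an illustration: the function name `simplify` and the encoding of operands (Python numbers for constants, strings for variable names) are assumptions made here, not part of any particular compiler.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def simplify(op, x, y):
    """Return a simpler operand for `x op y` if one exists, else (op, x, y)."""
    def is_const(v):
        return isinstance(v, (int, float))

    # Constant folding: both operands are known at compile time.
    if is_const(x) and is_const(y) and not (op == "/" and y == 0):
        return OPS[op](x, y)
    # Algebraic simplification: identities such as x + 0 and x * 1.
    if op == "+" and y == 0:
        return x
    if op == "+" and x == 0:
        return y
    if op == "-" and y == 0:
        return x
    if op == "*" and y == 1:
        return x
    if op == "*" and x == 1:
        return y
    if op == "/" and y == 1:
        return x
    return (op, x, y)


print(simplify("/", 3.14157, 2))   # folded to a single constant
print(simplify("+", "x", 0))       # x + 0 becomes x
print(simplify("*", "x", 1))       # x * 1 becomes x
print(simplify("+", "x", "y"))     # nothing to simplify
```

A statement such as x := x + 0 then reduces to x := x, which a later pass can delete. Division by a constant zero is deliberately left unfolded so that the error still occurs at run time.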

Que 5.7. Write a short note on transformation of basic blocks.

Answer
Transformation:
1. A number of transformations can be applied to a basic block without changing the set of expressions computed by the block.
2. Transformation helps us in improving quality of code and acts as optimizer.
3. There are two important classes of local transformation that can be applied to the basic block:
a. Structure preserving transformation: They are as follows:
i. Common sub-expression elimination: Refer Q. 5.6, Unit-5.
ii. Dead code elimination: Refer Q. 5.6, Unit-5.
iii. Interchange of statement: Suppose we have a block with the two adjacent statements,
temp1 = a + b
temp2 = m + n
Then we can interchange the two statements without affecting the value of the block if and only if neither m nor n is temporary variable temp1 and neither a nor b is temporary variable temp2. From the given statements we can conclude that a normal form basic block allows us to interchange all the statements wherever possible.
b. Algebraic transformation: Refer Q. 5.6, Unit-5.

PART-4
Machine Independent Optimizations, Loop Optimization.

Questions-Answers

Long Answer Type and Medium Answer Type Questions

Que 5.8. What is code optimization? Discuss the classification of code optimization.

Answer
Code optimization:
1. The code optimization refers to the techniques used by the compiler to improve the execution efficiency of generated object code.
2. It involves a complex analysis of intermediate code and performs various transformations, but every optimizing transformation must also preserve the semantics of the program.

Classification of code optimization:
Fig. 5.8.1. Classification of code optimization (code optimization is divided into machine dependent and machine independent optimization).
1. Machine dependent: The machine dependent optimization can be achieved using following criteria:
a. Allocation of sufficient number of resources to improve the execution efficiency of the program.
b. Using immediate instructions wherever necessary.
c. The use of intermix instructions along with the data increases the speed of execution.
2. Machine independent: The machine independent optimization can be achieved using following criteria:
a. The code should be analyzed completely and use alternative equivalent sequence of source code that will produce a minimum amount of target code.
b. Use appropriate program structure in order to improve the efficiency of target code.
c. By eliminating the unreachable code from the source program.
d. Move two or more identical computations at one place and make use of the result instead of each time computing the expressions.

Que 5.9. Explain local optimization.


Answer

1. Local optimization is a kind of optimization in which both the analysis and the transformations are localized to a basic block.
2. The transformations in local optimization are called local transformations.
3. The name of a transformation is usually prefixed with 'local' while referring to the local transformation.
4. There are "local" transformations that can be applied to a program to attempt an improvement.
For example: The elimination of common sub-expression. Provided A is not an alias for B or C, the assignments
A := B + C + D
E := B + C + F
might be evaluated as
T1 := B + C
A := T1 + D
E := T1 + F
In the given example, B + C is stored in T1, which acts as local optimization of common sub-expression.
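A local common sub-expression pass over one basic block can be sketched as follows. This is a minimal illustration, assuming statements are encoded as (dest, op, x, y) tuples and that there is no aliasing; the name `eliminate_common_subexpressions` and the "copy" pseudo-operator are hypothetical choices made for this example.

```python
def eliminate_common_subexpressions(block):
    """Replace recomputed expressions in one basic block by copies.

    block is a list of (dest, op, x, y) tuples (an assumed encoding).
    """
    available = {}          # (op, x, y) -> variable currently holding its value
    out = []
    for dest, op, x, y in block:
        key = (op, x, y)
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            out.append((dest, op, x, y))
        # dest gets a new value: forget every expression that uses dest as an
        # operand or whose value was being held in dest
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        if dest not in (x, y):
            available.setdefault(key, dest)
    return out


block = [
    ("a", "+", "b", "c"),
    ("d", "+", "b", "c"),   # same expression, operands unchanged: becomes a copy
    ("b", "-", "b", "d"),   # redefines b, so b + c is no longer available
    ("e", "+", "b", "c"),   # must be recomputed
]
for stmt in eliminate_common_subexpressions(block):
    print(stmt)
```

Only the second statement is rewritten (to d := a). The last statement is kept because b is assigned a new value in between, which is the same "not subsequently killed" condition used for available expressions.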

Que 5.10. Explain what constitute a loop in flow graph and how will you do loop optimizations in code optimization of a compiler.

AKTU 2018-19, Marks 07


OR
Write a short note on loop optimization.

AKTU 2017-18, Marks065



Answer
Following term constitute a loop in flow graph: Refer Q. 5.5, Unit-5.
Loop optimization is a process of increasing execution speed and reducing the overhead associated with loops.
The loop optimization is carried out by following methods:
1. Code motion:
a. Code motion is a technique which moves the code outside the loop.
b. If there is some expression in the loop whose result remains unchanged even after executing the loop several times, then such an expression should be placed just before the loop (i.e., outside the loop).
c. Code motion is done to reduce the execution time of the program.
2. Induction variables:
a. A variable x is called an induction variable of loop L if the value of the variable gets changed every time the loop is executed.
b. It is either decremented or incremented by some constant.
3. Reduction in strength:
a. In strength reduction technique the higher strength operators can be replaced by lower strength operators.
b. The strength of certain operators is higher than others; for example, the strength of * is higher than +.
c. The strength reduction is not applied to the floating point expressions because it may yield different results.
4. Loop invariant method: In loop invariant method, a computation whose value does not change inside the loop is not repeated inside the loop, and thereby the computation overhead at run time is reduced.
5. Loop unrolling: In this method, the number of jumps and tests can be reduced by writing the code two times.

For example:
int i = 1;
while (i <= 100)
{
a[i] = b[i];
i++;
}
can be written as
int i = 1;
while (i <= 100)
{
a[i] = b[i];
i++;
a[i] = b[i];
i++;
}

6. Loop fusion or loop jamming: In loop fusion method, several loops are merged to one loop.
For example:
for i := 1 to n do
for j := 1 to m do
a[i, j] := 10
can be written as
for i := 1 to n*m do
a[i] := 10
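Three of the methods above (code motion, reduction in strength and loop unrolling) can be seen together in one small before/after pair. The Python sketch below is purely illustrative: the function names and the computation `values[i] + a * b + 4 * i` are made up for this example, and in practice such rewrites are done by the compiler on intermediate code, not by hand on source code.

```python
def before(values, a, b):
    out = []
    i = 0
    while i < len(values):
        k = a * b                          # loop invariant, recomputed every time
        out.append(values[i] + k + 4 * i)  # multiplication in every iteration
        i += 1
    return out


def after(values, a, b):
    k = a * b                              # code motion: computed once before the loop
    t = 0                                  # reduction in strength: t always equals 4 * i
    out = []
    i = 0
    n = len(values)
    while i + 1 < n:                       # loop unrolling: two iterations per test
        out.append(values[i] + k + t)
        out.append(values[i + 1] + k + t + 4)
        t += 8
        i += 2
    if i < n:                              # leftover iteration when n is odd
        out.append(values[i] + k + t)
    return out


data = [7, 1, 4, 9, 2]
print(before(data, 3, 5) == after(data, 3, 5))
```

The unrolled loop needs the extra test after the loop because the number of iterations is not always a multiple of two; the fragment in the text avoids this only because its trip count, 100, is even.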

Que 5.11. Consider the following three address code segments:
1. PROD := 0
2. I := 1
3. T1 := 4 * I
4. T2 := addr(A) - 4
5. T3 := T2[T1]
6. T4 := addr(B) - 4
7. T5 := T4[T1]
8. T6 := T3 * T5
9. PROD := PROD + T6
10. I := I + 1
11. If I <= 20 goto (3)
a. Find the basic blocks and flow graph of above sequence.
b. Optimize the code sequence by applying function preserving transformation optimization technique.

AKTU 2017-18, Marks 10


OR
Consider the following sequence of three address codes:
1. Prod := 0
2. I := 1
3. T1 := 4 * I
4. T2 := addr(A) - 4
5. T3 := T2[T1]
6. T4 := addr(B) - 4
7. T5 := T4[T1]
8. T6 := T3 * T5
9. Prod := Prod + T6
10. I := I + 1
11. If I <= 20 goto (3)
Perform loop optimization. AKTU 2016-16, Marks 10


Answer
a. Basic blocks and flow graph:
1. As the first statement of the program is a leader statement, PROD = 0 is a leader. T1 = 4 * I is also a leader, because it is the target of the goto.
2. Fragmented code represented by two blocks is shown below:
B1:
PROD = 0
I = 1
B2:
T1 = 4 * I
T2 = addr(A) - 4
T3 = T2[T1]
T4 = addr(B) - 4
T5 = T4[T1]
T6 = T3 * T5
PROD = PROD + T6
I = I + 1
If I <= 20 goto B2
Fig. 5.11.1.
b. Function preserving transformation:
1. Common sub-expression elimination: No block has any sub-expression which is computed two times. So, no change in flow graph.
2. Copy propagation: No instruction in block B2 is a direct assignment, i.e., of the form x = y. So, no change in flow graph and basic block.
3. Dead code elimination: No instruction in block B2 is dead. So, no change in flow graph and basic block.
4. Constant folding: No constant expression is present in the basic block. So, no change in flow graph and basic block.

Loop optimization:
1. Code motion: In block B2 we can see that the values of T2 and T4 are calculated every time the loop is executed, although they never change. So, we can move these two instructions outside the loop and put them in block B1 as shown in Fig. 5.11.2.
B1:
PROD = 0
I = 1
T2 = addr(A) - 4
T4 = addr(B) - 4
B2:
T1 = 4 * I
T3 = T2[T1]
T5 = T4[T1]
T6 = T3 * T5
PROD = PROD + T6
I = I + 1
If I <= 20 goto B2
Fig. 5.11.2.
2. Induction variable: The variables I and T1 are called induction variables of loop L because every time the variable I changes, the value of T1 also changes. To remove one of these variables we use another method, called reduction in strength.
3. Reduction in strength: The value of I varies from 1 to 20 and the value of T1 varies over (4, 8, ..., 80), so T1 increases by 4 on every iteration. The multiplication T1 = 4 * I in block B2 is therefore replaced by the addition
T1 = T1 + 4
with T1 initialized to 0 in block B1. I is then needed only for the loop test, so the test I <= 20 is rewritten in terms of T1 (T1 < 80) and I is removed.
Now final flow graph is given as:
B1:
PROD = 0
T1 = 0
T2 = addr(A) - 4
T4 = addr(B) - 4
B2:
T1 = T1 + 4
T3 = T2[T1]
T5 = T4[T1]
T6 = T3 * T5
PROD = PROD + T6
if T1 < 80 goto B2
Fig. 5.11.3.
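One way to check a loop optimization is to simulate the code before and after and compare the results. The Python sketch below does this for the dot-product loop under stated assumptions: arrays hold 4-byte elements, `load(arr, t1)` is a hypothetical helper that models the indexed access (addr(arr) - 4)[T1], and the optimized version starts T1 at 0 and tests T1 < 80 so that both versions run exactly 20 iterations.

```python
def load(arr, t1):
    """Model (addr(arr) - 4)[t1] for 4-byte elements: offset 4 is the first element."""
    return arr[(t1 - 4) // 4]


def original(A, B):
    prod = 0
    i = 1
    while True:
        t1 = 4 * i                 # multiplication in every iteration
        t6 = load(A, t1) * load(B, t1)
        prod = prod + t6
        i = i + 1
        if not i <= 20:
            return prod


def optimized(A, B):
    prod = 0
    t1 = 0                         # I has been eliminated
    while True:
        t1 = t1 + 4                # reduction in strength: addition replaces 4 * I
        t6 = load(A, t1) * load(B, t1)
        prod = prod + t6
        if not t1 < 80:
            return prod


A = list(range(1, 21))
B = list(range(21, 41))
print(original(A, B) == optimized(A, B))
```

If T1 were instead initialized to 4 in B1 and the test were T1 <= 80, the first element would be skipped and the loop would run one offset past the end, which is why the initial value and the comparison have to be adjusted together.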

Que 5.12. Write short notes on the following with the help of example:
i. Loop unrolling
ii. Loop jamming
iii. Dominators
iv. Viable prefix
AKTU 2018-19, Marks 07
Answer
i. Loop unrolling: Refer Q. 5.10, Unit-5.
ii. Loop jamming: Refer Q. 5.10, Unit-5.
iii. Dominators: Refer Q. 5.5, Unit-5.
For example: In the flow graph,
Fig. 5.12.1. (a) Flow graph (b) Dominator tree
The initial node, node 1, dominates every node.
Node 2 dominates itself. Node 3 dominates all but 1 and 2. Node 4 dominates all but 1, 2 and 3.
Nodes 5 and 6 dominate only themselves, since flow of control can skip around either by going through the other. Node 7 dominates 7, 8, 9 and 10.
Node 8 dominates 8, 9 and 10.
Nodes 9 and 10 dominate only themselves.
iv. Viable prefix: Viable prefixes are the prefixes of right sentential forms that can appear on the stack of a shift-reduce parser.
For example:
Let:SI¥FF4
A 12
Let w =
T¥3
SLR parse trace:

STACK INPUT
$
z3
z*3

3
$AX3

AS we see, x,7¥, Will never appear on the stack. So, it is not a viable
prefix.

PART-5
DAG Representation of Basic Blocks.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 5.13. What is DAG? What are its advantages in context of optimization?
OR
Write a short note on direct acyclic graph.
AKTU 2017-18, Marks 05
Answer
DAG:
1. The abbreviation DAG stands for Directed Acyclic Graph.
2. DAGs are useful data structures for implementing transformations on basic blocks.
3. A DAG gives a picture of how the value computed by each statement in the basic block is used in the subsequent statements of the block.
4. Constructing a DAG from three address statements is a good way of determining common sub-expressions within a block.
5. A DAG for a basic block has following properties:
a. Leaves are labeled by unique identifiers, either variable names or constants.
b. Interior nodes are labeled by an operator symbol.
c. Nodes are also optionally given a sequence of identifiers for labels.
6. DAG is used in code optimization; the output of code optimization is turned into machine code, and machine code uses registers to store the variables used in the source program.
Advantages of DAG:
1. We automatically detect common sub-expressions with the help of the DAG algorithm.
2. We can determine which identifiers have their values used in the block.
3. We can determine which statements compute values which could be used outside the block.

Que 5.14. What is DAG? How DAG is created from three address code? Write algorithm for it and explain it with a relevant example.

Answer
DAG: Refer Q. 5.13, Unit-5.
Algorithm:
Input: A basic block.
Output: A DAG with label for each node (identifier).
Method:
1. Create nodes with one or two children (left and right).
2. Create a linked list of attached identifiers for each node.
3. Maintain all identifiers for which a node is associated.
4. Node(identifier) represents the value that the identifier has at the current point in the DAG construction process. The symbol table stores the value of node(identifier).
5. If there is an expression of the form x = y op z, then the DAG contains "op" as a parent node with node(y) as a left child and node(z) as a right child. If a node with the same operator and the same children already exists, it is reused, and x is attached to that node.
For example:
Given expression: a * (b - c) + (b - c) * d
The construction of DAG with three address code will be as follows:
Step 1: t1 = b - c
Step 2: t2 = (b - c) * d
Step 3: t3 = a * (b - c)
Step 4: t4 = a * (b - c) + (b - c) * d
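The method above can be sketched as a small Python class. This is an illustration under assumptions: the class name `DagBuilder`, its fields and the tuple encoding of nodes are hypothetical, and only binary statements of the form x = y op z are handled. A node is looked up by its operator and children before it is created, which is how the common sub-expression b - c ends up as a single shared node.

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []     # node id -> ("leaf", name, None) or (op, left id, right id)
        self.index = {}     # the same tuple -> node id, used to find existing nodes
        self.current = {}   # identifier -> node holding its current value
        self.labels = {}    # node id -> identifiers attached to that node

    def _node(self, key):
        if key not in self.index:          # create the node only if it is new
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def _operand(self, name):
        if name not in self.current:       # first use: a leaf for the initial value
            self.current[name] = self._node(("leaf", name, None))
        return self.current[name]

    def assign(self, x, op, y, z):
        """Process the statement x = y op z and return the node for x."""
        n = self._node((op, self._operand(y), self._operand(z)))
        old = self.current.get(x)
        if old is not None and x in self.labels.get(old, []):
            self.labels[old].remove(x)     # x no longer names its old node
        self.labels.setdefault(n, []).append(x)
        self.current[x] = n
        return n


# a * (b - c) + (b - c) * d written as three address statements
dag = DagBuilder()
n1 = dag.assign("t1", "-", "b", "c")
n2 = dag.assign("t2", "*", "a", "t1")
n3 = dag.assign("t3", "-", "b", "c")      # finds the node built for t1
n4 = dag.assign("t4", "*", "t3", "d")
n5 = dag.assign("t5", "+", "t2", "t4")
print(n1 == n3, len(dag.nodes), dag.labels[n1])
```

The resulting DAG has eight nodes (four leaves and four operator nodes) instead of the nine a tree would need, and the shared node carries both t1 and t3 in its list of attached identifiers.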

Que 5.15. How DAG is different from syntax tree? Construct the DAG for the following basic blocks:
a := b + c
b := b - d
c := c + d
e := b + c
Also, explain the key application of DAG. AKTU 2015-16, Marks 15


Answer
DAG v/s Syntax tree:
1. Directed Acyclic Graph is a data structure for transformations on the basic block, while a syntax tree is an abstract representation of the language constructs.
2. DAG is constructed from three address statements while syntax tree is constructed directly from the expression.
DAG for the given code is:
Fig. 5.15.1.
1. The two occurrences of the sub-expression b + c compute the same value, because (b - d) + (c + d) = b + c, even though b and c are both redefined between the two occurrences.
2. Value computed by a and e are same.
Applications of DAG:
1. Scheduling: Directed acyclic graph representations of partial orderings have many applications in scheduling for systems of tasks.
2. Data processing networks: A directed acyclic graph may be used to represent a network of processing elements.
3. Data compression: Directed acyclic graphs may also be used as a compact representation of a collection of sequences. In this type of application, one finds a DAG in which the paths form the sequences.
4. It helps in finding statements that can be reordered.

Que 5.16. Define a directed acyclic graph. Construct a DAG and write the sequence of instructions for the expression:
a + a * (b - c) + (b - c) * d. AKTU 2016-17, Marks 15

Answer
Directed acyclic graph: Refer Q. 5.13, Unit-5.
Numerical:
Given expression: a + a * (b - c) + (b - c) * d
The construction of DAG with three address code will be as follows:
Step 1: t1 = b - c
Step 2: t2 = (b - c) * d
Step 3: t3 = a * (b - c)
Step 4: t4 = a * (b - c) + (b - c) * d
Step 5: t5 = a + a * (b - c) + (b - c) * d

Que 5.17. How would you represent the following equation using DAG?
a = b * -c + b * -c AKTU 2018-10, Marks 07

Answer
Code representation using DAG of equation: a = b * -c + b * -c
Step 1: t1 = -c
Step 2: t2 = b * t1
Step 3: t3 = t2 + t2
Step 4: a = t3

Que 5.18. Give the algorithm for the elimination of local and global common sub-expressions with the help of example.
AKTU 2017-18, Marks 10
Answer
Algorithm for elimination of local common sub-expression: DAG algorithm is used to eliminate local common sub-expression.
DAG: Refer Q. 5.13, Unit-5.

Algorithm for elimination of global common sub-expressions:
1. An expression is defined at the point where it is assigned a value and killed when one of its operands is subsequently assigned a new value.
2. An expression is available at some point p in a flow graph if every path leading to p contains a prior definition of that expression which is not subsequently killed.
3. Following sets are used:
a. avail[B] = set of expressions available on entry to block B
b. exit[B] = set of expressions available on exit from B
c. killed[B] = set of expressions killed in B
d. defined[B] = set of expressions defined in B
e. exit[B] = avail[B] - killed[B] + defined[B]
f. avail[B] = intersection of exit[P] over all predecessors P of B; avail is empty for the initial block.
Algorithm:
1. First, compute defined and killed sets for each basic block.
2. Iteratively compute the avail and exit sets for each block until we get a fixed point.
3. Then eliminate the common sub-expressions as follows:
a. Identify each statement s of the form a = b op c in some block B such that b op c is available at the entry to B and neither b nor c is redefined in B prior to s.
b. Follow flow of control backwards in the graph, passing back to but not through each block that defines b op c. The last computation of b op c in such a block reaches s.
c. After each computation d = b op c identified in step 3(b), add statement t = d to that block (where t is a new temp).
d. Replace s by a = t.
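The iterative computation of the avail and exit sets can be sketched as follows. This is an illustration under assumptions: expressions are represented as plain strings, the function name `available_expressions` and the four-block flow graph are hypothetical, and defined[B] is taken to contain only expressions defined in B and not killed again later in B.

```python
def available_expressions(blocks, pred, defined, killed, entry):
    """Return (avail, exit_) for every block, iterating to a fixed point."""
    universe = set().union(*defined.values())
    # optimistic start: assume everything is available on exit, then shrink
    exit_ = {b: set(universe) for b in blocks}
    avail = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry or not pred[b]:
                avail[b] = set()           # nothing is available at the start
            else:
                # available on entry only if available on exit of every predecessor
                avail[b] = set.intersection(*(exit_[p] for p in pred[b]))
            new_exit = (avail[b] - killed[b]) | defined[b]
            if new_exit != exit_[b]:
                exit_[b] = new_exit
                changed = True
    return avail, exit_


# Hypothetical flow graph: B1 -> B2, B1 -> B3, B2 -> B4, B3 -> B4
blocks = ["B1", "B2", "B3", "B4"]
pred = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
defined = {"B1": {"b+c"}, "B2": set(), "B3": {"a*d"}, "B4": set()}
killed = {"B1": set(), "B2": {"b+c"}, "B3": set(), "B4": set()}   # B2 assigns to b

avail, exit_ = available_expressions(blocks, pred, defined, killed, "B1")
for b in blocks:
    print(b, sorted(avail[b]), sorted(exit_[b]))
```

Here b+c is available on entry to B2 and B3, but not on entry to B4, because the path through B2 kills it. A statement in B4 that computes b+c is therefore not a global common sub-expression and must be left alone.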

PART-6
Value Numbers and Algebraic Laws, Global Data Flow Analysis.

Questions-Answers
Long Answer Type and Medium Answer Type Questions

Que 5.19. Write a short note on data flow analysis.
OR
What is data flow analysis? How does it use in code optimization?

Answer
1. Data flow analysis is a process in which the values are computed using data flow properties.
2. The analysis gathers information about how data (the values assigned to variables) flows along the paths of the program.
3. A program's Control Flow Graph (CFG) is used to determine those parts of a program to which a particular value assigned to a variable might propagate.
4. A simple way to perform data flow analysis of programs is to set up data flow equations for each node of the control flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes, i.e., it reaches a fixed point.
5. Reaching definitions is used by data flow analysis in code optimization.
Reaching definitions:
1. A definition D reaches a point p if there is a path from D to p along which D is not killed.
B1: d1: y = 2
B2: d2: x = y + 2
2. A definition D of variable x is killed when there is a redefinition of x.
B1: d1: y = 2
B2: d2: y = y + 2
B3: d3: x = y + 2
3. The definition d1 is said to be a reaching definition for block B2. But the definition d1 is not a reaching definition in block B3, because it is killed by definition d2 in block B2.
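The three-block example can be solved mechanically. The Python sketch below is an illustration, assuming the usual equations for reaching definitions: the definitions reaching the entry of a block are the union of those leaving its predecessors, and the definitions leaving a block are those generated in it plus those that enter and are not killed. The function name and the dictionary encoding are choices made for this example.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Return (IN, OUT): definitions reaching the entry and exit of each block."""
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:                         # repeat until a fixed point is reached
        changed = False
        for b in blocks:
            IN[b] = set().union(*(OUT[p] for p in pred[b]))
            new_out = (IN[b] - kill[b]) | gen[b]
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT


# B1: d1: y = 2      B2: d2: y = y + 2      B3: d3: x = y + 2
blocks = ["B1", "B2", "B3"]
pred = {"B1": [], "B2": ["B1"], "B3": ["B2"]}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": {"d3"}}
kill = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}   # d1 and d2 both define y

IN, OUT = reaching_definitions(blocks, pred, gen, kill)
for b in blocks:
    print(b, sorted(IN[b]), sorted(OUT[b]))
```

The result agrees with the text: d1 reaches the entry of B2, but only d2 reaches the entry of B3, because d2 redefines y and so kills d1.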

Que 5.20. Write short notes (any two):
i. Global data flow analysis
ii. Loop unrolling
iii. Loop jamming
AKTU 2015-16, Marks 15
OR
Write short note on global data flow analysis.
AKTU 2017-18, Marks 06
Answer

i. Global data flow analysis:
1. Global data flow analysis collects the information about the entire program and distributes it to each block in the flow graph.
2. Data flow information can be collected in various blocks by setting up and solving a system of equations.
3. A data flow equation is given as:
OUT(B) = (IN(B) - KILL(B)) ∪ GEN(B)
OUT(B): Definitions that reach the exit of block B.
GEN(B): Definitions within block B that reach the end of B.
IN(B): Definitions that reach the entry of block B.
KILL(B): Definitions that never reach the end of block B.
ii. Loop unrolling: Refer Q. 5.10, Unit-5.
iii. Loop fusion or loop jamming: Refer Q. 5.10, Unit-5.

Que 5.21. Discuss the role of macros in programming language.

Answer

Role of macros in programming language are:
1. It is used to define words that are used most of the time in a program.
2. It automates complex tasks.
3. It helps to reduce the use of complex statements in a program.
4. It makes the program run faster.

VERY IMPORTANT QUESTIONS

Following questions are very important. These questions may be asked in your SESSIONALS as well as UNIVERSITY EXAMINATION.

Q. 1. What is code generation? Discuss the design issues of code generation.
Ans. Refer Q. 5.1.
