CD Important Questions With Answers

The document discusses the phases of a compiler and provides details about each phase. It also defines regular expressions and explains their properties and operations. The main phases of a compiler are lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization and code generation. Regular expressions use operators like concatenation, union and Kleene closure to define languages and satisfy closure properties.

1. Explain the various phases of a compiler in detail. Also write down the output for the
following expression after each phase: a := b * c + d.

Answer:

PHASES OF COMPILER
A Compiler operates in phases, each of which transforms the source program from one
representation into another. The following are the phases of the compiler:

Main phases:
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
Sub-Phases:
1) Symbol table management
2) Error handling

Lexical Analysis

The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads the
stream of characters making up the source program and groups the characters into meaningful
sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of
the form (token-name, attribute-value) that it passes on to the subsequent phase, syntax analysis.
In the token, the first component token-name is an abstract symbol that is used during syntax
analysis, and the second component attribute-value points to an entry in the symbol table for this
token. Information from the symbol-table entry is needed for semantic analysis and code
generation.
For example, suppose a source program contains the assignment statement

position = initial + rate * 60

After lexical analysis, this statement is passed on to the syntax analyzer as the sequence of tokens

(id,1) (=) (id,2) (+) (id,3) (*) (60)

where (id,1) refers to the symbol-table entry for position, (id,2) to initial, and (id,3) to rate.

Syntax Analysis
The second phase of the compiler is syntax analysis or parsing. A typical representation is a
syntax tree in which each interior node represents an operation and the children of the node
represent the arguments of the operation. A syntax tree for the token stream above is shown as the
output of the syntactic analyzer in Fig. 1.7. This tree shows the order in which the operations in
the assignment

position = initial + rate * 60

are to be performed. The tree has an interior node labeled * with (id, 3) as its left child and the
integer 60 as its right child. The node (id, 3) represents the identifier rate. The node labeled *
makes it explicit that we must first multiply the value of rate by 60. The node labeled +
indicates that we must add the result of this multiplication to the value of initial. The root of
the tree, labeled =, indicates that we must store the result of this addition into the location for the
identifier position.

 It is the second phase of the compiler. It is also known as parser. It gets the token stream
as input from the lexical analyser of the compiler and generates syntax tree as the output.
 Syntax tree: It is a tree in which interior nodes are operators and exterior nodes are
operands.

SEMANTIC ANALYSIS:

The semantic analyzer uses the syntax tree and the information in the symbol table to check the
source program for semantic consistency with the language definition.

An important part of semantic analysis is type checking, where the compiler checks that each
operator has matching operands. For example, many programming language definitions require
an array index to be an integer; the compiler must report an error if a floating-point number is
used to index an array.

The language specification may permit some type conversions called coercions. For example, a
binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point
numbers. If the operator is applied to a floating-point number and an integer, the compiler may
convert or coerce the integer into a floating-point number.

 It is the third phase of the compiler.


 It gets input from the syntax analysis phase in the form of a parse tree and checks whether
the program is semantically consistent with the language definition.
 It performs type checking and applies the permitted type conversions (coercions), for
example converting an integer operand to floating point where the language allows it.

Intermediate Code Generation


In the process of translating a source program into target code, a compiler may construct one or
more intermediate representations, which can have a variety of forms. Syntax trees are a form of
intermediate representation; they are commonly used during syntax and semantic analysis.
We consider an intermediate form called three-address code, which consists of a sequence of
assembly-like instructions with three operands per instruction. Each operand can act like a
register. The output of the intermediate code generator in Fig. 1.7 consists of the three-address
code sequence

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Code Optimization
The machine-independent code-optimization phase attempts to improve the intermediate code so
that better target code will result. Usually better means faster, but other objectives may be
desired, such as shorter code, or target code that consumes less power.

A simple intermediate code generation algorithm followed by code optimization is a reasonable


way to generate good target code. The optimizer can deduce that the conversion of 60 from
integer to floating point can be done once and for all at compile time, so the inttofloat operation
can be eliminated by replacing the integer 60 by the floating-point number 60.0.
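After this optimization the three-address code shrinks to:

t1 = id3 * 60.0
id1 = id2 + t1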

Code Generation
The code generator takes as input an intermediate representation of the source program and maps
it into the target language. If the target language is machine code, registers or memory locations
are selected for each of the variables used by the program. Then, the intermediate instructions are
translated into sequences of machine instructions that perform the same task. A crucial aspect of
code generation is the judicious assignment of registers to hold variables. For example, using
registers R1 and R2, the intermediate code in (1.4) might get translated into the machine code

LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1

SYMBOL TABLE MANAGEMENT:


 Symbol table is used to store all the information about identifiers used in the program.
 It is a data structure containing a record for each identifier, with fields for the attributes of
the identifier.
 It allows the compiler to find the record for each identifier quickly and to store or retrieve
data from that record.
 Whenever an identifier is detected in any of the phases, it is stored in the symbol table.

ERROR HANDLING:
Each phase can encounter errors. After detecting an error, a phase must handle the error so that
compilation can proceed.
 In lexical analysis, errors occur in separation of tokens.
 In syntax analysis, errors occur during construction of syntax tree.
 In semantic analysis, errors occur when the compiler detects constructs with right
syntactic structure but no meaning and during type conversion.
 In code optimization, errors occur when the result is affected by the optimization.
 In code generation, errors occur when, for example, the target code cannot be produced
correctly because some required code or operand information is missing.

2. Define Regular Expression. Explain about the Properties of Regular Expressions


Answer:

Regular expressions :
 It is a way of representing regular languages.
 The algebraic description for regular languages is done using regular expressions.
 They can define the same language that various forms of finite automata can
describe.
 Regular expressions offer something that finite automata do not: a declarative way to
express the strings that we want to accept. They act as input for many systems and are
used for string matching in many systems (Java, Python, etc.).
 For example, lexical-analyzer generators, such as Lex or Flex, take regular expressions
as input.
 The widely used operators in regular expressions are Kleene closure (*),
concatenation (.) and union (+).
Rules for regular expressions :
 The set of regular expressions is defined by the following rules.
 Every letter of Σ is a regular expression, and the null string ε itself is a regular
expression.
 If r1 and r2 are regular expressions, then (r1), r1.r2, r1+r2, r1* and r1+ are also regular
expressions.
Example – Σ = {a, b} and r is a regular expression of the language made using these symbols:

Regular expression      Regular set

∅                       { }
ε                       {ε}
a*                      {ε, a, aa, aaa, ...}
a + b                   {a, b}
a.b                     {ab}
a* + ba                 {ε, a, aa, aaa, ..., ba}

Operations performed on regular expressions :


1. Union –
The union of two regular languages L1 and L2, written L1 ∪ L2, is also regular and
represents the set of strings that are either in L1 or in L2 or in both.
Example
L1 = (1+0).(1+0) = {00, 10, 11, 01} and
L2 = {ε, 100},
then L1 ∪ L2 = {ε, 00, 10, 11, 01, 100}.

2. Concatenation –
The concatenation of two regular languages L1 and L2, written L1.L2, is also regular and
represents the set of strings formed by taking any string in L1 and concatenating it with any
string in L2.
Example –
L1 = { 0,1 } and L2 = { 00, 11} then L1.L2 = {000, 011, 100, 111}.
3. Kleene closure –
If L1 is a regular language, then its Kleene closure L1* is also regular. It represents the set of
strings formed by taking any number of strings from L1 (with repetition allowed) and
concatenating them.
Example –
If L1 = {0, 1}, then L1* = {ε, 0, 1, 00, 01, 10, 11, ...}, i.e. all strings possible over the
symbols 0 and 1, including the null string.
Algebraic properties of regular expressions :
Kleene closure is a unary operator, while union (+) and the concatenation operator (.) are binary
operators.
1. Closure –
If r1 and r2 are regular expressions(RE), then
 r1* is a RE
 r1+r2 is a RE
 r1.r2 is a RE
2. Closure laws –
 (r*)* = r*, closing an expression that is already closed does not change the language.
 ∅* = ε, since the only string obtainable by concatenating zero or more strings from the
empty language is the empty string.
 r+ = r.r* = r*.r, since r* = ε + r + rr + rrr ... and r.r* = r + rr + rrr ...
 r* = r* + ε
3. Associativity –
If r1, r2, r3 are RE, then
i.) r1+ (r2+r3) = (r1+r2) +r3
 For example : r1 = a , r2 = b , r3 = c, then
 The resultant regular expression on the LHS becomes a + (b + c), and the regular set for the
corresponding RE is {a, b, c}.
 The RE on the RHS becomes (a + b) + c, and the regular set for this RE is {a, b, c}, which is
the same in both cases. Therefore, the associativity property holds for the union operator.
ii.) r1.(r2.r3) = (r1.r2).r3
 For example – r1 = a , r2 = b , r3 = c
 Then the string accepted by the RE a.(b.c) is only abc.
 The string accepted by the RE on the RHS, (a.b).c, is also only abc, which is the same in
both cases. Therefore, the associativity property holds for the concatenation operator.
The associativity property does not hold for Kleene closure (*) because it is a unary operator.
4. Identity –
In the case of the union operator,
if r + x = r, then x = ∅, as r ∪ ∅ = r; therefore ∅ is the identity for +.
Therefore, ∅ is the identity element for the union operator.
In the case of the concatenation operator,
if r.x = r, then x = ε, as
r.ε = r ⇒ ε is the identity element for the concatenation operator (.).
5. Annihilator –
 For union there is no annihilator, i.e. there is no x such that r + x = x for every r.
 In the case of the concatenation operator, r.x = x when x = ∅, since r.∅ = ∅; therefore ∅ is the
annihilator for the (.) operator. For example, {a, aa, ab}.{ } = { }.
6. Commutative property –
If r1, r2 are RE, then
 r1+r2 = r2+r1. For example, for r1 =a and r2 =b, then RE a+ b and b+ a are equal.
 r1.r2 ≠ r2.r1. For example, for r1 = a and r2 = b, then RE a.b is not equal to b.a.
7. Distributive property –
If r1, r2, r3 are regular expressions, then
 (r1+r2).r3 = r1.r3 + r2.r3 i.e. Right distribution
 r1.(r2+ r3) = r1.r2 + r1.r3 i.e. left distribution
 (r1.r2) +r3 ≠ (r1+r3)(r2+r3)
8. Idempotent law –
 r1 + r1 = r1 ⇒ r1 ∪ r1 = r1 , therefore the union operator satisfies idempotent property.
 r.r ≠ r ⇒ concatenation operator does not satisfy idempotent property.
9. Identities for regular expressions –
There are many identities for regular expressions. Let p, q and r be regular expressions.
 ∅ + r = r
 ∅.r = r.∅ = ∅
 ε.r = r.ε = r
 ε* = ε and ∅* = ε
 r + r = r
 r*.r* = r*
 r.r* = r*.r = r+
 (r*)* = r*
 ε + r.r* = r* = ε + r*.r
 (p.q)*.p = p.(q.p)*
 (p + q)* = (p*.q*)* = (p* + q*)*
 (p + q).r = p.r + q.r and r.(p + q) = r.p + r.q

3. Explain various error recovery strategies in lexical analysis.

Answer:

In this phase of compilation, all possible errors made by the user are detected and reported to
the user in the form of error messages. This process of locating errors and reporting them to the
user is called the error handling process.
Functions of Error handler
 Detection
 Reporting
 Recovery

Classification of Errors

Compile time errors are of three types:-


Lexical phase errors
These errors are detected during the lexical analysis phase. Typical lexical errors are
 Exceeding length of identifier or numeric constants.
 Appearance of illegal characters
 Unmatched string
 Example 1 : printf("Geeksforgeeks");$
This is a lexical error since an illegal character $ appears at the end of the statement.

 Example 2 : This is a comment */
This is a lexical error since the end-of-comment marker is present but the beginning is not.
Error recovery:
Panic Mode Recovery
 In this method, successive characters from the input are removed one at a time until a
designated set of synchronizing tokens is found. Synchronizing tokens are delimiters such
as ; or }
 The advantage is that it is easy to implement and it is guaranteed not to go into an infinite loop
 The disadvantage is that a considerable amount of input is skipped without checking it for
additional errors
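A rough sketch in C of this idea (the function name and the choice of synchronizing delimiters
are illustrative assumptions):

/* Panic-mode recovery: discard characters until a synchronizing
   delimiter is seen, so that scanning can resume from a clean point. */
#include <stdio.h>

void panic_mode_recover(FILE *src) {
    int c;
    while ((c = fgetc(src)) != EOF) {
        if (c == ';' || c == '}')     /* the synchronizing tokens */
            break;                    /* resume normal scanning here */
    }
}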
Syntactic phase errors
These errors are detected during syntax analysis phase. Typical syntax errors are
 Errors in structure
 Missing operator
 Misspelled keywords
 Unbalanced parenthesis
Example : swicth(ch)
{
.......
.......
}
The keyword switch is incorrectly written as swicth. Hence, “Unidentified
keyword/identifier” error occurs.
Error recovery:
1. Panic Mode Recovery
 In this method, successive characters from the input are removed one at a time until a
designated set of synchronizing tokens is found. Synchronizing tokens are delimiters
such as ; or }
 The advantage is that it is easy to implement and it is guaranteed not to go into an infinite loop
 The disadvantage is that a considerable amount of input is skipped without checking it for
additional errors
2. Statement Mode recovery
 In this method, when a parser encounters an error, it performs the necessary correction on the
remaining input so that the rest of the input statement allows the parser to parse ahead.
 The correction can be deletion of extra semicolons, replacing a comma by a semicolon or
inserting a missing semicolon.
 While performing correction, utmost care should be taken not to go into an infinite loop.
 The disadvantage is that it is difficult to handle situations where the actual error occurred
before the point of detection.
3. Error production
 If the user has knowledge of common errors that can be encountered, then these errors can
be incorporated by augmenting the grammar with error productions that generate the
erroneous constructs.
 If this is used, then during parsing appropriate error messages can be generated and
parsing can be continued.
 The disadvantage is that it is difficult to maintain.
4. Global Correction
 The parser examines the whole program and tries to find the closest match for it
which is error free.
 The closest match is the program that requires the fewest insertions, deletions and changes of
tokens to recover from the erroneous input.
 Due to its high time and space complexity, this method is not implemented practically.
Semantic errors
These errors are detected during semantic analysis phase. Typical semantic errors are
 Incompatible type of operands
 Undeclared variables
 Not matching of actual arguments with formal one
Example : int a[10], b;
.......
.......
a = b;
It generates a semantic error because the types of a and b are incompatible.
Error recovery
 If error “Undeclared Identifier” is encountered then, to recover from this a symbol table
entry for corresponding identifier is made.
 If data types of two operands are incompatible then, automatic type conversion is done by
the compiler.

4. Explain in detail about lex tool.

The Lexical-Analyzer Generator Lex

Introduction: In this section, we introduce a tool called Lex, or in a more recent implementation
Flex, that allows one to specify a lexical analyzer by specifying regular expressions to describe
patterns for tokens. The input notation for the Lex tool is referred to as the Lex language and the
tool itself is the Lex compiler. Behind the scenes, the Lex compiler transforms the input patterns
into a transition diagram and generates code, in a file called lex.yy.c, that simulates this
transition diagram. The mechanics of how this translation from regular expressions to transition
diagrams occurs is the subject of the next sections; here we only learn the Lex language.

Use of Lex: Figure 3.22 suggests how Lex is used. An input file, which we call lex.l, is written
in the Lex language and describes the lexical analyzer to be generated. The Lex compiler
transforms lex.l to a C program, in a file that is always named lex.yy.c. The latter file is
compiled by the C compiler into a file called a.out, as always. The C-compiler output is a
working lexical analyzer that can take a stream of input characters and produce a stream of
tokens. The normal use of the compiled C program, referred to as a.out in Fig. 3.22, is as a
subroutine of the parser. It is a C function that returns an integer, which is a code for one of the
possible token names. The attribute value, whether it be another numeric code, a pointer to the
symbol table, or nothing, is placed in a global variable yylval, which is shared between the
lexical analyzer and parser, thereby making it simple to return both the name and an attribute
value of a token.
Structure of Lex Programs:

A Lex program has the following form:

declarations

%%

translation rules

%%

auxiliary functions

The declarations section includes declarations of variables, manifest constants (identifiers


declared to stand for a constant, e.g., the name of a token), and regular definitions.

The translation rules each have the form

Pattern {Action}

Each pattern is a regular expression, which may use the regular definitions of the declaration
section. The actions are fragments of code, typically written in C, although many variants of Lex
using other languages have been created. The third section holds whatever additional functions
are used in the actions. Alternatively, these functions can be compiled separately and loaded with
the lexical analyzer.
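For illustration, a small Lex specification in this form might look as follows (the patterns and
actions are chosen just for the example; the actions here print, whereas in a real analyzer they
would return token codes to the parser):

%{
#include <stdio.h>   /* declarations section: C code copied verbatim */
%}
digit   [0-9]
id      [a-zA-Z][a-zA-Z0-9]*
%%
[ \t\n]+    { /* skip whitespace: no token is produced */ }
"if"        { printf("IF\n"); }
{digit}+    { printf("NUMBER(%s)\n", yytext); }
{id}        { printf("ID(%s)\n", yytext); }
.           { printf("illegal character: %s\n", yytext); }
%%
/* auxiliary functions section */
int yywrap(void) { return 1; }
int main(void)   { yylex(); return 0; }

Note that the rule for the keyword "if" is listed before the {id} rule: when two patterns match a
lexeme of the same length, Lex picks the rule that appears first.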

The lexical analyzer created by Lex behaves in concert with the parser as follows. When called
by the parser, the lexical analyzer begins reading its remaining input, one character at a time,
until it finds the longest prefix of the input that matches one of the patterns Pi. It then executes
the associated action Ai. Typically, Ai will return to the parser, but if it does not (e.g., because Pi
describes whitespace or comments), then the lexical analyzer proceeds to find additional
lexemes, until one of the corresponding actions causes a return to the parser. The lexical analyzer
returns a single value, the token name, to the parser, but uses the shared, integer variable
yylval to pass additional information about the lexeme found, if needed.
Unit-2
1. Consider the grammar. E → E + T E → T T → T * F T → F F → (E) / id Construct SLR
parsing table for the above grammar. Give the moves of the SLR parser on id * id + id.

SLR Grammar

(Note: the construction below drops the production F → (E) from the question's grammar for
brevity; handling the parentheses is analogous.)

E' → E
E → E + T | T
T → T * F | F
F → id

Add the augmented production and insert the '•' symbol at the first position of every production in G:

E' → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id

I0 State:

Add Augment production to the I0 State and Compute the Closure

I0 = Closure (E' → •E)

Add all productions starting with E to the I0 state, because "•" is followed by the non-terminal E.
So the I0 state becomes

I0 = E' → •E
E → •E + T
E → •T

Add all productions starting with T and F to the modified I0 state, because "•" is followed by those
non-terminals. So the I0 state becomes

I0 = E' → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id

I1 = Goto (I0, E) = Closure (E' → E•, E → E• + T)

I2 = Goto (I0, T) = Closure (E → T•, T → T• * F)
I3 = Goto (I0, F) = Closure (T → F•) = T → F•
I4 = Goto (I0, id) = Closure (F → id•) = F → id•
I5 = Goto (I1, +) = Closure (E → E + •T)
Add all productions starting with T and F to the I5 state because "•" is followed by the non-terminal.
So the I5 state becomes

I5 = E → E + •T
T → •T * F
T → •F
F → •id

Goto (I5, F) = Closure (T → F•) = (same as I3)

Goto (I5, id) = Closure (F → id•) = (same as I4)

I6 = Goto (I2, *) = Closure (T → T * •F)

Add all productions starting with F to the I6 state because "•" is followed by the non-terminal. So
the I6 state becomes

I6 = T → T * •F
F → •id

Goto (I6, id) = Closure (F → id•) = (same as I4)

I7 = Goto (I5, T) = Closure (E → E + T•, T → T• * F), and Goto (I7, *) = I6

I8 = Goto (I6, F) = Closure (T → T * F•) = T → T * F•

SLR (1) Table

First (E) = First (E + T) ∪ First (T)


First (T) = First (T * F) ∪ First (F)
First (F) = {id}
First (T) = {id}
First (E) = {id}
Follow (E') = {$}
Follow (E) = {+, $}
Follow (T) = {*} ∪ Follow (E) = {*, +, $}
Follow (F) = Follow (T) = {*, +, $}

o I1 contains the final item E' → E•, and Follow (E') = {$}, so action {I1, $} = Accept
o I2 contains the final item E → T•, and Follow (E) = {+, $}, so action {I2, +} = R2,
action {I2, $} = R2; the item T → T• * F gives action {I2, *} = shift 6
o I3 contains the final item T → F•, and Follow (T) = {+, *, $}, so action {I3, +} = R4,
action {I3, *} = R4, action {I3, $} = R4
o I4 contains the final item F → id•, and Follow (F) = {+, *, $}, so action {I4, +} = R5,
action {I4, *} = R5, action {I4, $} = R5
o I7 contains the final item E → E + T•, and Follow (E) = {+, $}, so action {I7, +} = R1,
action {I7, $} = R1; the item T → T• * F gives action {I7, *} = shift 6
o I8 contains the final item T → T * F•, and Follow (T) = {+, *, $}, so action {I8, +} = R3,
action {I8, *} = R3, action {I8, $} = R3.
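Collecting the shift entries (from the Goto transitions on terminals) and the reduce/accept entries
above gives the SLR(1) parsing table, with productions numbered 1: E → E + T, 2: E → T,
3: T → T * F, 4: T → F, 5: F → id:

State    id    +     *     $        E    T    F
0        s4                         1    2    3
1              s5          acc
2              r2    s6    r2
3              r4    r4    r4
4              r5    r5    r5
5        s4                              7    3
6        s4                                   8
7              r1    s6    r1
8              r3    r3    r3

Moves of the SLR parser on id * id + id:

Stack              Input            Action
0                  id * id + id $   shift 4
0 id 4             * id + id $      reduce by F → id; goto(0, F) = 3
0 F 3              * id + id $      reduce by T → F; goto(0, T) = 2
0 T 2              * id + id $      shift 6
0 T 2 * 6          id + id $        shift 4
0 T 2 * 6 id 4     + id $           reduce by F → id; goto(6, F) = 8
0 T 2 * 6 F 8      + id $           reduce by T → T * F; goto(0, T) = 2
0 T 2              + id $           reduce by E → T; goto(0, E) = 1
0 E 1              + id $           shift 5
0 E 1 + 5          id $             shift 4
0 E 1 + 5 id 4     $                reduce by F → id; goto(5, F) = 3
0 E 1 + 5 F 3      $                reduce by T → F; goto(5, T) = 7
0 E 1 + 5 T 7      $                reduce by E → E + T; goto(0, E) = 1
0 E 1              $                accept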

2. Construct a Predictive parsing table for the Grammar [4+6] E→E+T/T, T→T*F/F, F→
(E)/id?
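Answer (a sketch of the standard construction):

The given grammar is left recursive, so we first eliminate the left recursion:

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id

Next compute the FIRST and FOLLOW sets:

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}          FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {), $}
FOLLOW(T) = FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

The predictive (LL(1)) parsing table is then:

        id          +            *            (           )          $
E       E → TE'                               E → TE'
E'                  E' → +TE'                             E' → ε     E' → ε
T       T → FT'                               T → FT'
T'                  T' → ε       T' → *FT'                T' → ε     T' → ε
F       F → id                                F → (E)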
3. Define Ambiguous Grammar? Check whether the grammar S → SS, S → a, S → b Is
Ambiguous or not?
Grammar Ambiguity-
 There exists no algorithm to check whether an arbitrary grammar is ambiguous or not.
 This general decision problem – "whether a grammar is ambiguous or not" – is undecidable.
 This is because it can be shown that this problem is equivalent to the Post Correspondence
Problem.
General Approach To Check Grammar Ambiguity-
To check whether a given grammar is ambiguous or not, we follow the following steps-
Step-01:
We try finding a string from the Language of Grammar such that for the string there exists more
than one-
 parse tree
 or derivation tree
 or syntax tree
 or leftmost derivation
 or rightmost derivation
Step-02:
If there exists at least one such string, then the grammar is ambiguous otherwise unambiguous.
Check whether the given grammar is ambiguous or not-
S → SS
S→a
S→b
Solution – Let us consider a string w generated by the given grammar: w = abba.
Now let us look for two different parse trees (leftmost derivations) for this string w.
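The string w = abba has two different leftmost derivations:

Derivation 1 (grouping a (b (b a))):
S ⇒ SS ⇒ aS ⇒ aSS ⇒ abS ⇒ abSS ⇒ abbS ⇒ abba

Derivation 2 (grouping (a b) (b a)):
S ⇒ SS ⇒ SSS ⇒ aSS ⇒ abS ⇒ abSS ⇒ abbS ⇒ abba

Since one string of the language has more than one leftmost derivation (equivalently, more than
one parse tree), the grammar S → SS, S → a, S → b is ambiguous.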

5. Construct CLR Parsing table for the grammar S->CC , C->cC, C->d

Answer:

We form the augmented grammar by introducing the new start symbol S' (adding the production
S' → S) and form the set of LR(1) items; the item sets are worked out below.
The LR(1) items can also be represented as a DFA, similar to the LR(0) items, where the
states correspond to the nodes and the edges correspond to the grammar symbols.
CLR Parsing Table

The CLR parsing table also has two divisions: action and goto. The action fields are indexed
by the terminals and hold the shift, reduce, accept and error actions. The goto fields are
indexed by the non-terminals and contain the state numbers that result from the goto()
function. The procedure for the CLR parsing table construction is given in Algorithm 17.4.
Step 1 of Algorithm 17.4 calls for the construction of the LR(1) items. Step 2 has four actions
which are used to construct the action field of the CLR parsing table. The first one is a shift
action, which is the same as the SLR table's shift action: if goto(Ii, a) = Ij then at the intersection
of [i, a] we set the action "sj" to indicate "shift j". Step 2 (ii) of the algorithm is for the reduce
action, where the kernel items are considered: at the entry for the kernel item's state number and
its look-ahead symbol we add the action "reduce by the production indicated by the kernel item".
The state that contains the final kernel item [S' → S•, $] is used to indicate the accept action, as
in the SLR parsing table. All other entries are considered errors in the action field of the CLR
parsing table. The goto field is the same as the SLR table's goto field: if goto(Ii, A) = Ij then
goto(i, A) = j is added to the CLR parsing table.
CLR parsing
The stack is initialized with state 0. The stack contains, alternately, state numbers and grammar
symbols, with a state number on top of the stack. The input is appended with $. The current state
and input symbol are looked up in the table and the stack is manipulated accordingly. The CLR
parsing procedure is given in Algorithm 17.5.
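For the given grammar the construction works out as follows (productions numbered
1: S → CC, 2: C → cC, 3: C → d; each LR(1) item is written with its look-ahead after the comma):

I0: S' → •S, $    S → •CC, $    C → •cC, c/d    C → •d, c/d
I1 = Goto (I0, S): S' → S•, $
I2 = Goto (I0, C): S → C•C, $    C → •cC, $    C → •d, $
I3 = Goto (I0, c): C → c•C, c/d    C → •cC, c/d    C → •d, c/d
I4 = Goto (I0, d): C → d•, c/d
I5 = Goto (I2, C): S → CC•, $
I6 = Goto (I2, c): C → c•C, $    C → •cC, $    C → •d, $
I7 = Goto (I2, d): C → d•, $
I8 = Goto (I3, C): C → cC•, c/d
I9 = Goto (I6, C): C → cC•, $

CLR(1) parsing table:

State    c     d     $        S    C
0        s3    s4             1    2
1                    acc
2        s6    s7                  5
3        s3    s4                  8
4        r3    r3
5                    r1
6        s6    s7                  9
7                    r3
8        r2    r2
9                    r2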
6. Differentiate between Top down and bottom up parsing techniques
Answer:
There are 2 types of Parsing Technique present in parsing, first one is Top-down parsing and
second one is Bottom-up parsing.
Top-down Parsing is a parsing technique that first looks at the highest level of the parse tree
and works down the parse tree by using the rules of grammar while Bottom-up Parsing is a
parsing technique that first looks at the lowest level of the parse tree and works up the parse
tree by using the rules of grammar.
There are some differences present to differentiate these two parsing techniques, which are
given below:

1. Top-down parsing is a strategy that first looks at the highest level of the parse tree and works
down the parse tree by using the rules of grammar, while bottom-up parsing first looks at the
lowest level of the parse tree and works up the parse tree by using the rules of grammar.
2. Top-down parsing attempts to find the leftmost derivation for an input string, whereas
bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
3. In top-down parsing we start parsing from the top (the start symbol of the parse tree) and move
down to the leaf nodes; in bottom-up parsing we start from the bottom (the leaf nodes) and move
up to the start symbol.
4. Top-down parsing uses leftmost derivation; bottom-up parsing uses rightmost derivation
(traced out in reverse).
5. The main decision in top-down parsing is which production rule to use in order to construct
the string; the main decision in bottom-up parsing is when to use a production rule to reduce the
string to reach the start symbol.

Unit-3
1. What is a three address code? Mention its types. How would you implement the three
address statements? Explain with examples.

Answer:

• Three-address code is a type of intermediate code which is easy to generate and can be
easily converted to machine code.
• It makes use of at most three addresses and one operator to represent an expression, and
the value computed at each instruction is stored in a temporary variable generated by the
compiler.
• The compiler decides the order of operations given by the three-address code.

General representation –
a = b op c, where a, b and c represent operands like names, constants or compiler-generated
temporaries, and op represents the operator.

Example-1: Convert the expression a * – (b + c) into three address code.
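The conversion works out as:

t1 = b + c
t2 = uminus t1
t3 = a * t2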

There are 3 representations of three address code namely

• Quadruple
• Triples
• Indirect Triples

1. Quadruple: It is a structure that consists of four fields, namely op, arg1, arg2 and result. op
denotes the operator, arg1 and arg2 denote the two operands, and result is used to store the
result of the expression.
Consider expression a = b * – c + b * – c.
The three address code is:

t1 = uminus c

t2 = b * t1

t3 = uminus c

t4 = b * t3

t5 = t2 + t4
a = t5
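In quadruple form this sequence is:

       op        arg1    arg2    result
(0)    uminus    c               t1
(1)    *         b       t1      t2
(2)    uminus    c               t3
(3)    *         b       t3      t4
(4)    +         t2      t4      t5
(5)    =         t5              a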

Advantage –
Easy to rearrange code for global optimization.
One can quickly access the value of temporary variables using the symbol table.
Disadvantage –
Contains a lot of temporaries.
Temporary variable creation increases time and space complexity.
2. Triples –
This representation doesn't make use of an extra temporary variable to represent a single
operation; instead, when a reference to another triple's value is needed, a pointer to that triple
is used. So it consists of only three fields, namely op, arg1 and arg2.
o It is difficult to optimize because optimization involves moving intermediate code: when
a triple is moved, any other triple referring to it must be updated also. With the help of a
pointer one can directly access the symbol table entry.

Consider expression a = b * – c + b * – c.
The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
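The same sequence in triple form, where an operand can refer to an earlier triple by its number:

       op        arg1    arg2
(0)    uminus    c
(1)    *         b       (0)
(2)    uminus    c
(3)    *         b       (2)
(4)    +         (1)     (3)
(5)    =         a       (4)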

Disadvantage –
Temporaries are implicit and difficult to rearrange code.

Indirect Triples –

This representation makes use of pointers to a listing of all references to computations, which is
made separately and stored. It is similar in utility to the quadruple representation but requires
less space. Temporaries are implicit and it is easier to rearrange code.

Example – Consider expression a = b * – c + b * – c.


The three address code is:

t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
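In indirect triple form, a separate statement list points to the triples, so the code can be
reordered by rearranging the list alone (the starting statement number 35 is arbitrary):

Statement list           Triples
35: (0)                  (0)    uminus    c
36: (1)                  (1)    *         b      (0)
37: (2)                  (2)    uminus    c
38: (3)                  (3)    *         b      (2)
39: (4)                  (4)    +         (1)    (3)
40: (5)                  (5)    =         a      (4)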
2. Explain various storage allocation strategies with its merits and demerits
STORAGE ALLOCATION TECHNIQUES

 Static Storage Allocation


 Stack Storage Allocation
 Heap Storage Allocation

Run-time storage can be subdivided to hold: the target code (the program code, which is static
since its size can be determined at compile time), static data objects, dynamic data objects
(heap), and automatic data objects (stack).

Static Storage Allocation

 For any program, if we create memory at compile time, the memory will be created in
the static area.
 If memory is created at compile time only, it is created only once.
 It doesn't support dynamic data structures, i.e. memory is created at compile time and
de-allocated after program completion.
 One drawback of static storage allocation is that recursion is not supported.
 Another drawback is that the size of the data must be known at compile time.
E.g. FORTRAN was designed to permit static storage allocation.

In static allocation, names are bound to storage as the program is compiled, so there is no
need for a run-time support package. Since the bindings do not change at run time, every time a
procedure is activated, its names are bound to the same storage locations. Therefore values of
local names are retained across activations of a procedure.
That is, when control returns to a procedure the values of the locals are the same as they
were when control left the last time. From the type of a name, the compiler decides the amount
of storage for the name and decides where the activation records go. At compile time, we can fill
in the addresses at which the target code can find the data it operates on.

Stack Storage Allocation


 Storage is organized as a stack, and activation records are pushed and popped as
activations begin and end respectively.
 Locals are contained in activation records so they are bound to fresh storage in each
activation.
 Recursion is supported in stack allocation

All compilers for languages that use procedures, functions or methods as units of user-defined
actions manage at least part of their run-time memory as a stack. Each time a procedure
is called, space for its local variables is pushed onto the stack, and when the procedure terminates,
that space is popped off the stack.

Calling sequences:
Procedure calls are implemented by what is called a calling sequence, which consists of
code that allocates an activation record on the stack and enters information into its fields. A
return sequence is similar code that restores the state of the machine so the calling procedure can
continue its execution after the call. The code in a calling sequence is often divided between the
calling procedure (the caller) and the procedure it calls (the callee).
When designing calling sequences and the layout of activation records, the following principles
are helpful:
 Values communicated between caller and callee are generally placed at the beginning of
the callee’s activation record, so they are as close as possible to the caller’s activation
record.
 Fixed-length items are generally placed in the middle. Such items include the control link,
the access link, and the machine status fields.
 Items whose size may not be known early enough are placed at the end of the activation
record. The most common example is dynamically sized array, where the value of one of
the callee’s parameters determines the length of the array.
 We must locate the top-of-stack pointer judiciously. A common approach is to have it
point to the end of the fixed-length fields in the activation record. Fixed-length data can
then be accessed by fixed offsets, known to the intermediate-code generator, relative to
the top-of-stack pointer.
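As a rough illustration of these layout principles, an activation record can be sketched as a C
struct (all field names and sizes here are illustrative assumptions, not a prescribed layout):

/* Illustrative activation-record layout following the principles above. */
struct activation_record {
    /* values communicated between caller and callee come first,
       closest to the caller's activation record */
    int   actual_params[4];          /* actual parameters (size arbitrary) */
    int   return_value;

    /* fixed-length items are placed in the middle */
    void *control_link;              /* link to the caller's activation record */
    void *access_link;               /* for access to non-local data */
    long  saved_machine_status[8];   /* return address and saved registers */

    /* local data, and finally items whose size is not known early
       (e.g. dynamically sized arrays), go at the end */
    int   locals[8];
};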
The calling sequence and its division between caller and callee are as follows:
 The caller evaluates the actual parameters.
 The caller stores a return address and the old value of top_sp into the callee’s activation
record. The caller then increments the top_sp to the respective positions.
 The callee saves the register values and other status information.
 The callee initializes its local data and begins execution.
A suitable, corresponding return sequence is:
 The callee places the return value next to the parameters.
 Using the information in the machine-status field, the callee restores top_sp and other
registers, and then branches to the return address that the caller placed in the status field.
 Although top_sp has been decremented, the caller knows where the return value is,
relative to the current value of top_sp; the caller therefore may use that value.
Heap Storage Allocation

 Memory allocation and deallocation can be done at any time and at any place depending
on the requirement of the user.
 Heap allocation is used to dynamically allocate memory to the variables and claim it back
when the variables are no more required.
 Recursion is supported.

Stack allocation strategy cannot be used if either of the following is possible :


1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.

Heap allocation parcels out pieces of contiguous storage, as needed for activation records or
other objects. Pieces may be deallocated in any order, so over time the heap will consist of
alternating areas that are free and in use.

3. A) a) Define Three Address Code

Three-address code is a type of intermediate code which is easy to generate and can be easily converted to
machine code.

It makes use of at most three addresses and one operator to represent an expression, and the value
computed at each instruction is stored in a temporary variable generated by the compiler.

The compiler decides the order of operations given by the three-address code.

a = b op c, where a, b and c represent operands like names, constants or compiler-generated
temporaries, and op represents the operator.

Example-1: Convert the expression a * – (b + c) into three address code (see the three-address
code given under Question 1 of this unit).


b) Differentiate Parse Trees & Syntax Trees?

Parse Tree:
 It is a graphical representation of the replacement process in a derivation.
 Each interior node represents a grammar rule.
 Each leaf node represents a terminal.
 Parse trees provide every characteristic piece of information from the real syntax.
 Parse trees are comparatively less dense than syntax trees.

Syntax Tree:
 It is the compact form of a parse tree.
 Each interior node represents an operator.
 Each leaf node represents an operand.
 Syntax trees do not provide every characteristic piece of information from the real syntax.
 Syntax trees are comparatively more dense than parse trees.

c)Define Semantic Rule

In syntax directed translation, along with the grammar we associate some informal notations,
and these notations are called semantic rules.

So we can say that

1. Grammar + semantic rules = SDT (syntax directed translation)

2. In syntax directed translation, every non-terminal can get one or more than one attribute,
or sometimes 0 attributes, depending on the type of the attribute. The values of these
attributes are evaluated by the semantic rules associated with the production rule.
3. In the semantic rules below, the attribute is val; an attribute may hold anything, like a
string, a number, a memory location or a complex record.
4. In syntax directed translation, whenever a construct is encountered in the programming
language, it is translated according to the semantic rules defined for that particular
programming language.

Production        Semantic Rules

E → E + T         E.val := E.val + T.val

E → T             E.val := T.val

T → T * F         T.val := T.val * F.val

T → F             T.val := F.val

F → (E)           F.val := E.val

F → num           F.val := num.lexval

Example

E.val is one of the attributes of E.

num.lexval is the attribute returned by the lexical analyzer.

4. Construct an annotated parse tree for real id1, id2, id3


Answer:
An annotated parse tree is a parse tree in which each node is a record with a field for each attribute (e.g., X.a
indicates the attribute a of the grammar symbol X). The value of an attribute of a grammar symbol at a
given parse-tree node is defined by a semantic rule associated with the production used at that node.

There are two kinds of attributes:


1. Synthesized Attributes: They are computed from the values of the attributes of the children nodes.
2. Inherited Attributes: They are computed from the values of the attributes of both the siblings and the
parent nodes.
Example: Let us consider the grammar for arithmetic expressions (DESK CALCULATOR). The syntax
directed definition associates with each non-terminal a synthesized attribute called val.
Definition: An S-attributed definition is a syntax directed definition that uses only synthesized
attributes.
Evaluation Order: Semantic rules in an S-attributed definition can be evaluated by a bottom-up,
or post-order, traversal of the parse tree.
The annotated parse tree for the input 3*5+4n is:
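Drawn here in indented form with the attribute values annotated (using the usual desk-calculator
grammar L → E n, E → E + T | T, T → T * F | F, F → digit):

L (prints 19)
  E.val = 19
    E.val = 15
      T.val = 15
        T.val = 3
          F.val = 3
            digit.lexval = 3
        *
        F.val = 5
          digit.lexval = 5
    +
    T.val = 4
      F.val = 4
        digit.lexval = 4
  n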

Implementation of SDT for the desk calculator: use extra fields in the parser stack entries corresponding
to the grammar symbols. These fields hold the values of the corresponding translations.

The grammar and the corresponding semantic actions are shown below:

5. Explain in brief about equivalence of type expressions.

Type Equivalence

TYPE CHECKING RULES usually have the form: if two type expressions are equivalent,
then return a given type, else return type_error.
KEY IDEAS. The central issue is then that we have to define when two given type
expressions are equivalent.

The main difficulty arises from the fact that most modern languages allow the naming of
user-defined types.
For instance, in C and C++ this is achieved by the typedef statement.
When checking equivalence of named types, we have two possibilities.
Name equivalence.
Treat named types as basic types. Therefore two type expressions are name equivalent if and
only if they are identical, that is if they can be represented by the same syntax tree, with the
same labels.
Structural equivalence.
Replace the named types by their definitions and recursively check the substituted trees.

STRUCTURAL EQUIVALENCE. If type expressions are built from basic types and
constructors (without type names, that is in our example, when using products instead of
records), structural equivalence of types can be decided easily.

For instance, to check whether the constructed types array(n1, T1) and array(n2, T2) are
equivalent, we can check that the integer values n1 and n2 are equal and recursively check that
T1 and T2 are equivalent, or we can be less restrictive and check only that T1 and T2 are equivalent.
Compilers use representations for type expressions (trees or dags) that allow type
equivalence to be tested quickly.
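A minimal sketch in C of such a structural-equivalence test over type trees (the tagged-tree
representation and all names here are illustrative assumptions):

#include <stdbool.h>

/* Illustrative tagged-tree representation of type expressions. */
typedef enum { T_INT, T_REAL, T_ARRAY, T_POINTER, T_PRODUCT } TypeTag;

typedef struct Type {
    TypeTag tag;
    int size;                     /* number of elements, used for T_ARRAY */
    struct Type *left, *right;    /* children of constructed types */
} Type;

/* Structural equivalence: basic types must match exactly; constructed
   types must use the same constructor on structurally equivalent parts. */
bool sequiv(const Type *s, const Type *t) {
    if (s->tag != t->tag) return false;
    switch (s->tag) {
    case T_INT:
    case T_REAL:    return true;                       /* same basic type */
    case T_ARRAY:   return s->size == t->size && sequiv(s->left, t->left);
    case T_POINTER: return sequiv(s->left, t->left);
    case T_PRODUCT: return sequiv(s->left, t->left) && sequiv(s->right, t->right);
    }
    return false;
}

Note that on cyclic type graphs, such as the recursive types discussed next, this naive recursion
would not terminate; that is why compilers mark visited nodes.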
RECURSIVE TYPES. In PASCAL a linked list is usually defined as follows.

type link = ^cell;
     cell = record
              info: integer;
              next: link
            end;

The corresponding type graph has a cycle. So, to decide structural equivalence of two types
represented by graphs, PASCAL compilers put a mark on each visited node (in order not to
visit a node twice). In C, a linked list is usually defined as follows.

struct cell {
int info;
struct cell *next;
};
To avoid cyclic graphs, C compilers:
 require type names to be declared before they are used, except for pointers to records;
 use structural equivalence, except for records, for which they use name equivalence.

6. What is symbol table? Discuss various ways to organizing symbol table

A symbol table is an important data structure created and maintained by the compiler in order to
keep track of the semantics of variables, i.e. it stores information about the scope and binding of
names, and about instances of various entities such as variable and function names,
classes, objects, etc.
 It is built in lexical and syntax analysis phases.
 The information is collected by the analysis phases of compiler and is used by synthesis
phases of compiler to generate code.
 It is used by compiler to achieve compile time efficiency.
 It is used by various phases of compiler as follows :-
1. Lexical Analysis: Creates new entries in the table, for example entries for tokens.
2. Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of
reference, use, etc in the table.
3. Semantic Analysis: Uses available information in the table to check for semantics i.e. to
verify that expressions and assignments are semantically correct(type checking) and update
it accordingly.
4. Intermediate Code generation: Refers symbol table for knowing how much and what type
of run-time is allocated and table helps in adding temporary variable information.
5. Code Optimization: Uses information present in symbol table for machine dependent
optimization.
6. Target Code generation: Generates code by using address information of identifier present
in the table.
Symbol Table entries – Each entry in symbol table is associated with attributes that support
compiler in different phases.
Items stored in Symbol table:
 Variable names and constants
 Procedure and function names
 Literal constants and strings
 Compiler generated temporaries
 Labels in source languages
Information used by compiler from Symbol table:
 Data type and name
 Declaring procedures
 Offset in storage
 If structure or record then, pointer to structure table.
 For parameters, whether parameter passing by value or by reference
 Number and type of arguments passed to function
 Base Address
Operations of Symbol table – The basic operations defined on a symbol table include insert (add
a name together with its attributes), lookup (search for a name and return its record) and, in
block-structured languages, operations for entering and leaving a scope.

Implementation of Symbol table –


Following are commonly used data structure for implementing symbol table :-
1. List –
 In this method, an array is used to store names and associated information.
 A pointer "available" is maintained at the end of all stored records, and new names are added
in the order in which they arrive
 To search for a name we start from the beginning of the list and search up to the "available"
pointer; if the name is not found, we get the error "use of undeclared name"
 While inserting a new name we must ensure that it is not already present, otherwise the
error "multiply defined name" occurs
 Insertion is fast O(1), but lookup is slow for large tables – O(n) on average
 Advantage is that it takes minimum amount of space.
2. Linked List –
 This implementation uses a linked list: a link field is added to each record.
 Searching for names is done in the order indicated by the link fields.
 A pointer “First” is maintained to point to first record of symbol table.
 Insertion is fast O(1), but lookup is slow for large tables – O(n) on average
3. Hash Table –
 In the hashing scheme two tables are maintained – a hash table and a symbol table – and this
is the most commonly used method to implement symbol tables.
 A hash table is an array with index range 0 to tablesize – 1. Its entries are pointers
pointing to the names in the symbol table.
 To search for a name we use a hash function that maps the name to an integer between 0
and tablesize – 1.
 Insertion and lookup can be made very fast – O(1).
 Advantage is quick search is possible and disadvantage is that hashing is complicated to
implement.
4. Binary Search Tree –
 Another approach to implementing the symbol table is to use a binary search tree, i.e. we add
two link fields, the left and right children.
 All names are inserted as descendants of the root node in a way that always preserves the
binary search tree property.
 Insertion and lookup are O(log2 n) on average.
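A minimal C sketch of the chained hash-table organization (the table size, record fields and
helper names are illustrative assumptions):

#include <stdlib.h>
#include <string.h>

#define TABLESIZE 211                 /* a prime table size, chosen arbitrarily */

/* One symbol-table record; the attribute fields are illustrative. */
typedef struct Symbol {
    char *name;
    char *type;                       /* e.g. "int", "float" */
    int   offset;                     /* offset in storage */
    struct Symbol *next;              /* chain for names that collide */
} Symbol;

static Symbol *bucket[TABLESIZE];

/* Map a name to an index in 0 .. TABLESIZE-1. */
static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % TABLESIZE;
}

/* lookup: return the record for a name, or NULL if it is undeclared. */
Symbol *lookup(const char *name) {
    for (Symbol *p = bucket[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* insert: add a new name; returns NULL for a multiply defined name. */
Symbol *insert(const char *name, const char *type, int offset) {
    if (lookup(name) != NULL) return NULL;
    Symbol *p = malloc(sizeof *p);
    p->name   = strdup(name);
    p->type   = strdup(type);
    p->offset = offset;
    unsigned h = hash(name);
    p->next   = bucket[h];            /* O(1) insertion at the chain head */
    bucket[h] = p;
    return p;
}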
7. Write an SDT to convert infix to postfix expression

Answer :
The idea is to specify the translation of a source language construct in terms of attributes of its
syntactic components. The basic idea is use the productions to specify a (typically recursive)
procedure for translation. For example, consider the production

stmt-list → stmt-list ; stmt


To process the left stmt-list, we

1. Call ourselves recursively to process the right stmt-list (which is smaller). This will, say,
generate code for all the statements in the right stmt-list.
2. Call the procedure for stmt, generating code for stmt.
3. Process the left stmt-list by combining the results for the first two steps as well as what is
needed for the semicolon (a terminal, so we do not further delegate its actions). In this
case we probably concatenate the code for the right stmt-list and stmt.

To avoid having to say the right stmt-list we write the production as

stmt-list → stmt-list1 ; stmt

where the subscript is used to distinguish the two instances of stmt-list.

2.3.1: Postfix notation

This notation is called postfix because the rule is operator after operand(s). Parentheses are not
needed. The notation we normally use is called infix. If you start with an infix expression, the
following algorithm will give you the equivalent postfix expression.

 Variables and constants are left alone.


 E op F becomes E' F' op, where E' and F' are the postfix of E and F respectively.
 ( E ) becomes E', where E' is the postfix of E.
One question is, given say 1+2-3, what are E, F and op? Do we have E=1+2, F=3, and op=-? Or do
we have E=1, F=2-3 and op=+? This is the issue of precedence mentioned above. To simplify the
present discussion we will start with fully parenthesized infix expressions.

Example: 1+2/3-4*5

1. Start with 1+2/3-4*5


2. Parenthesize (using standard precedence) to get (1+(2/3))-(4*5)
3. Apply the above rules to calculate P{(1+(2/3))-(4*5)}, where P{X} means “convert the
infix expression X to postfix”.
A. P{(1+(2/3))-(4*5)}
B. P{(1+(2/3))} P{(4*5)} -
C. P{1+(2/3)} P{4*5} -
D. P{1} P{2/3} + P{4} P{5} * -
E. 1 P{2} P{3} / + 4 5 * -
F. 1 2 3 / + 4 5 * -
Example: Now do (1+2)/3-4*5

1. Parenthesize to get ((1+2)/3)-(4*5)


2. Calculate P{((1+2)/3)-(4*5)}
A. P{((1+2)/3)} P{(4*5)} -
B. P{(1+2)/3} P{4*5} -
C. P{(1+2)} P{3} / P{4} P{5} * -
D. P{1+2} 3 / 4 5 * -
E. P{1} P{2} + 3 / 4 5 * -
F. 1 2 + 3 / 4 5 * -
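The translation scheme itself attaches a print action after the operands of each operator, so an
operator is emitted only after its operands have been emitted (the productions below are chosen
to match the operators used in the examples above):

E → E + T   { print('+') }
E → E - T   { print('-') }
E → T
T → T * F   { print('*') }
T → T / F   { print('/') }
T → F
F → ( E )
F → num     { print(num.value) }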
