CD Guess Paper
B. Tech. V Semester
5CS-04/ Compiler Design
Branch-AIDS/CS/IT
GUESS PAPER
Attempt all questions
Schematic diagrams must be shown wherever necessary. Any data you feel is missing may suitably be
assumed and stated clearly. No supplementary sheet shall be issued in any case.
Part A (All questions are compulsory)
Answers should be given in up to 25 words only
S->cC
C->cC
C->c/d [CO2]
Q. 5 Solve the input id+id*id, using operator precedence parser for the following grammar:
T->T+T/T*T/id [CO2]
Q.6 Construct DFA for following regular expression without constructing NFA.
(a | b) * a [CO1]
Q.7 Demonstrate Input buffering techniques. [CO1]
Q.8 Design the FIRST SET and FOLLOW SET for the following grammar. [CO2]
S→ Bb/Cd
B→ aB/Ɛ
C→ cC/Ɛ
Q.9 Justify that the following grammar is LL (1).
S→ AaAb | BbBa
A→ ϵ
B→ϵ [CO2]
Q.10 Describe the process of bootstrapping in detail. Also write a short note on error recovery
strategies. [CO1]
Q.11 A) Write a regular expression for identifiers and reserved words. Design the
transition diagrams for them. [CO1]
B) Explain the three general approaches for the implementation of a Lexical
analyzer. [CO2]
Q.12 Construct the predictive parser for the following grammar. [CO2]
S -> (L) | a
L ->L,S | S
Q.13 Translate the assignment x := A[y,z] into three address statement. [CO5]
Q.14 Write the quadruple, triple, indirect triple for the expression. [CO3]
-(a*b) + (c+d)-(a+b+c+d)
Q.15 Write regular expressions for the set of words having a,e,i,o,u appearing in that order,
although not necessarily consecutively. [CO4]
ANSWERS
Q.1 Differentiate between Top-down Parsing and Bottom-up Parsing.
Ans.
Top-down parsing: the top-down approach starts evaluating the parse tree from the top (the root) and moves downwards, parsing the remaining nodes.
Bottom-up parsing: the bottom-up approach starts evaluating the parse tree from the lowest level of the tree (the leaves) and moves upwards, parsing towards the root node.
Q.3 Classify leftmost derivation and rightmost derivation. Show an example for each.
Ans. Leftmost derivation: A leftmost derivation is obtained by applying a production to the leftmost variable in each successive step.
Example:
Consider the grammar G with productions:
S→aSS (Rule 1)
S→b (Rule 2)
Compute the string w = ‘aababbb’ with leftmost derivation.
S⇒aSS (Rule:1)
S⇒aaSSS (Rule:1)
S⇒aabSS (Rule:2)
S⇒aabaSSS (Rule:1)
S⇒aababSS (Rule:2)
S⇒aababbS (Rule:2)
S⇒aababbb (Rule: 2)
To obtain the string w, the leftmost derivation applies the rules in the sequence “1121222”.
Rightmost derivation: A rightmost derivation is obtained by applying production to
the rightmost variable in each step.
Example:
Consider the grammar G with production:
S→aSS (Rule 1)
S→b (Rule 2)
Compute the string w = ‘aababbb’ with rightmost derivation.
S⇒aSS (Rule:1)
S⇒aSb (Rule:2)
S⇒aaSSb (Rule:1)
S⇒aaSaSSb (Rule:1)
S⇒aaSaSbb (Rule:2)
S⇒aaSabbb (Rule:2)
S⇒aababbb (Rule: 2)
To obtain the string w, the rightmost derivation applies the rules in the sequence “1211222”.
Grammar A -> Aa | a:
This grammar is not ambiguous. It generates the language of one or more 'a's, and every such
string has exactly one leftmost derivation.
Grammar B -> Bb | b:
This grammar is not ambiguous. It generates the language of one or more 'b's, and every such
string has exactly one leftmost derivation.
Part-B
Q.1 Construct DFA for following regular expression without constructing NFA.
(a|b)*a.
Ans. Given regular expression: (a|b)*a
Using the direct (syntax tree / followpos) method, augment the expression as (a|b)*a #, and number the positions: 1 = a, 2 = b (inside the star), 3 = the final a, 4 = #.
firstpos(root) = {1, 2, 3}
followpos(1) = {1, 2, 3}, followpos(2) = {1, 2, 3}, followpos(3) = {4}
Subset construction over the positions:
Start state A = {1, 2, 3}
State A --(a)--> {1, 2, 3, 4} = B (accepting, since it contains the position of #)
State A --(b)--> {1, 2, 3} = A
State B --(a)--> B
State B --(b)--> A
So the DFA has two states, A (start) and B (final); it accepts exactly the strings over {a, b} that end in a.
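A minimal sketch (not part of the standard answer) simulating this two-state DFA in Python; the state names A and B and the helper accepts() are illustrative assumptions:

    # Transition table of the DFA for (a|b)*a derived above.
    DFA = {("A", "a"): "B", ("A", "b"): "A",
           ("B", "a"): "B", ("B", "b"): "A"}
    START, ACCEPTING = "A", {"B"}

    def accepts(word):
        # Run the DFA over the word; reject on any symbol outside {a, b}.
        state = START
        for ch in word:
            if (state, ch) not in DFA:
                return False
            state = DFA[(state, ch)]
        return state in ACCEPTING

    # Example: strings ending in 'a' are accepted.
    print(accepts("abba"), accepts("ab"))   # True False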
Q.3 Design the FIRST SET and FOLLOW SET for the following grammar.
S->Bb/Cd B->aB/ε C->cC/ε
Ans.
The construction of these sets is based on the production rules of the grammar. Let's
explain the process for each set:
FIRST set:
For each non-terminal, add every terminal that can appear first in a string derived from any of its production rules.
If ε (the empty string) is in the FIRST set of a symbol at the start of a production, also add the FIRST set of the next symbol in that production (and ε itself if every symbol of the right-hand side can derive ε).
FOLLOW set:
Initialize the FOLLOW set of the start symbol with $.
For every production A→αBβ, add FIRST(β) (excluding ε) to the FOLLOW set of B.
If ε is in FIRST(β), or B is the last symbol of the production, also add FOLLOW(A) to the FOLLOW set of B.
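Applying these rules to the grammar above gives the following worked result:
FIRST(B) = { a, ε }, FIRST(C) = { c, ε }, FIRST(S) = { a, b, c, d }
FOLLOW(S) = { $ }, FOLLOW(B) = { b }, FOLLOW(C) = { d }
(Since B and C are nullable, b and d from S→Bb and S→Cd also enter FIRST(S).)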
Q.5 Describe the process of bootstrapping in detail. Also write short note on
error recovery strategies.
Ans. Bootstrapping - When a computer is first turned on or restarted, a special type
of absolute loader, called a bootstrap loader, is executed. This bootstrap loads the first
program to be run by the computer, usually an operating system. The bootstrap itself
begins at address 0 in the memory of the machine. It loads the operating system (or
some other program) starting at address 80. After all of the object code from the device
has been loaded, the bootstrap program jumps to address 80, which begins the execution
of the program that was loaded. Such loaders can be used to run stand-alone programs
independent of the operating system or the system loader. They can also be used to load
the operating system or the loader itself into memory.
Loaders are of two types:
Linking loader.
Linkage editor.
Linking loaders perform all linking and relocation at load time.
Linkage editors perform linking prior to load time; in dynamic linking, the linking
function is performed at execution time.
Error recovery strategies -
panic mode, statement mode, error productions, global correction
Panic mode
When a parser encounters an error anywhere in a statement, it ignores the rest of the
statement: it stops processing input from the erroneous token until it reaches a delimiter,
such as a semicolon. This is the easiest way of error recovery, and it also prevents the
parser from entering an infinite loop.
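A minimal sketch (an illustrative assumption, not part of the original answer) of panic-mode recovery over a token list, using ';' as the synchronizing delimiter:

    # On an error, discard tokens until a synchronizing token is found,
    # then resume parsing just after it.
    SYNC_TOKENS = {";"}

    def panic_mode_recover(tokens, pos):
        while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
            pos += 1                                   # skip the erroneous input
        return pos + 1 if pos < len(tokens) else pos   # resume after the delimiter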
Part-C
Q.1 Elaborate left recursion? State the rules to remove left recursion from the
grammar. Eliminate left recursion from the following grammar.
S->L=R/R L->*R/id R->L
Ans. Left Recursion: Left recursion occurs in a grammar when a non-terminal A can
derive a string that starts with itself, directly or indirectly. It can lead to infinite
recursion during parsing and needs to be eliminated for the grammar to be suitable for
predictive parsing.
In the given grammar there is no direct left recursion; the obstacle to predictive parsing
is the common prefix that appears once R is substituted into S→R, and it is removed as shown below.
1. Left Factoring:
If a non-terminal A has multiple productions, factor out the common prefix
among the right-hand sides of those productions.
This lets the parser choose a production with a single symbol of lookahead.
2. Introduce New Non-Terminals:
For each left-recursive production A→Aα ∣ β, introduce a new non-
terminal A' and rewrite the productions as:
A→βA′
A′→αA′ ∣ ϵ
where ϵ represents the empty string.
S→L=R ∣ R
L→∗R ∣ id
R→L
1. Substitute and Left Factor:
Replacing R by its only production in S→R gives S→L=R ∣ L. The common
prefix is L, so factor it out:
S→LS′
S′→=R ∣ ϵ
2. Resulting Grammar:
S→LS′
S′→=R ∣ ϵ
L→∗R ∣ id
R→L
This grammar now has no common prefixes and no left recursion, so it can be used as the basis for predictive parsing.
E→E+T ∣ T
T→T∗F ∣ F
F→(E) ∣ id
Modified grammar:
E→TE′
E′→+TE′ ∣ ϵ
T→FT'
T′→∗FT′ ∣ ϵ
F→(E) ∣ id
To construct the predictive parsing table, we need to determine the First and Follow
sets for each non-terminal.
First Sets:
FIRST(E) = { (, id }
FIRST(E′) = { +, ϵ }
FIRST(T) = { (, id }
FIRST(T′) = { ∗, ϵ }
FIRST(F) = { (, id }
Follow Sets:
FOLLOW(E) = { ), $ }
FOLLOW(E′) = { ), $ }
FOLLOW(T) = { +, ), $ }
FOLLOW(T′) = { +, ), $ }
FOLLOW(F) = { +, ∗, ), $ }
Predictive Parsing Table:
Non-terminal |   id    |   (     |   )    |    +     |    ∗      |   $
E            | E→TE′   | E→TE′   |        |          |           |
E′           |         |         | E′→ϵ   | E′→+TE′  |           | E′→ϵ
T            | T→FT′   | T→FT′   |        |          |           |
T′           |         |         | T′→ϵ   | T′→ϵ     | T′→∗FT′   | T′→ϵ
F            | F→id    | F→(E)   |        |          |           |
Design Stack Implementation:
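A minimal sketch of the table-driven stack implementation, assuming the parsing table above; the token list for id+id*id and the helper names are illustrative:

    # Predictive (LL(1)) parser driven by the table for the modified grammar.
    EPS = "eps"
    TABLE = {
        ("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
        ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [EPS], ("E'", "$"): [EPS],
        ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
        ("T'", "+"): [EPS], ("T'", "*"): ["*", "F", "T'"],
        ("T'", ")"): [EPS], ("T'", "$"): [EPS],
        ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"],
    }
    NONTERMINALS = {"E", "E'", "T", "T'", "F"}

    def parse(tokens):
        tokens = tokens + ["$"]
        stack, i = ["$", "E"], 0            # $ at the bottom, start symbol on top
        while stack:
            top = stack.pop()
            if top == EPS:
                continue
            if top not in NONTERMINALS:     # terminal: must match the lookahead
                if top != tokens[i]:
                    return False
                i += 1
                continue
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:                 # empty table cell: syntax error
                return False
            stack.extend(reversed(rhs))     # push the right-hand side, rightmost symbol first
        return i == len(tokens)

    print(parse(["id", "+", "id", "*", "id"]))   # True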
Q.3 Classify different phases of compiler ? Explain each phase in detail. Also
give each phase result for given statement.
Ans.
A compiler is a complex software system that translates high-level programming languages
into machine code or an intermediate code. The compilation process is divided into several
phases, each performing a specific set of tasks. The main phases of a compiler are:
1. Lexical analysis - reads the source characters and groups them into tokens.
2. Syntax analysis (parsing) - groups tokens into a parse tree / syntax tree.
3. Semantic analysis - checks the tree for semantic consistency, e.g. type checking.
4. Intermediate code generation - produces a machine-independent form such as three-address code.
5. Code optimization - improves the intermediate code.
6. Code generation - produces the target machine or assembly code.
Symbol-table management and error handling operate alongside all of these phases.
These phases together ensure the correct translation of high-level source code into
executable machine code while performing necessary optimizations and error checks.
The exact details and sub-stages within each phase can vary based on the specific
compiler and language being used.
Part-A
Q.1 Explain the term Grammar.
Ans. It is a finite set of formal rules for generating syntactically correct sentences or
meaningful correct sentences.
Any Grammar can be represented by 4 tuples – <N, T, P, S>
N – Finite Non-Empty Set of Non-Terminal Symbols.
T – Finite Set of Terminal Symbols.
P – Finite Non-Empty Set of Production Rules.
S – Start Symbol (Symbol from where we start producing our sentences or
strings).
E→TE′
E′→+TE′ ∣ ∗TE′ ∣ ϵ
T→a ∣ b
Ans. An activation code refers to a code used for user authentication. An activation code
can be included with the software or sent to the user’s email address or device. Activation
codes can be used by software publishers to confirm the purchase, and to unlock product
functionality.
Part-B
Q.1 Construct the DAG and generate the code for the given block:
T1=a+b T4=T1-T3
T2=a-b T5=T4+T3
T3=T1*T2
Ans. Taking, for illustration, a = 10 and b = 5:
T1 = a + b // T1 = 10 + 5 = 15
T2 = a - b // T2 = 10 - 5 = 5
T3 = T1 * T2 // T3 = 15 * 5 = 75
T4 = T1 - T3 // T4 = 15 - 75 = -60
T5 = T4 + T3 // T5 = -60 + 75 = 15
This is a simple representation of the DAG and the corresponding code for the given
block. Each operation in the DAG is represented as an assignment statement in the
code. Note that the DAG helps in identifying common subexpressions and optimizing
the code by reusing intermediate results. In this example, T1 and T3 are computed
only once and reused in subsequent operations.
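A small sketch (an assumption, not part of the original answer) of how a DAG-building pass keys each node by (operator, left operand, right operand) so that a repeated subexpression reuses the existing node instead of creating a new one:

    # Each statement is (dest, op, arg1, arg2); nodes maps an expression key
    # to the temporary that already holds its value.
    def build_dag(block):
        nodes, dag = {}, []
        for dest, op, a1, a2 in block:
            key = (op, a1, a2)
            if key in nodes:
                print(dest, "reuses the node of", nodes[key])   # common subexpression
            else:
                nodes[key] = dest
                dag.append((dest, op, a1, a2))
        return dag

    block = [("T1", "+", "a", "b"), ("T2", "-", "a", "b"), ("T3", "*", "T1", "T2"),
             ("T4", "-", "T1", "T3"), ("T5", "+", "T4", "T3")]
    build_dag(block)

In this particular block no expression repeats, so every statement creates a new node; if the block later contained, say, a hypothetical T6 = a + b, that statement would simply reuse the node already built for T1.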
Q.2 What do you mean by basic block? Consider the following program
segment:
for r from 1 to 10 do
for c from 1 to 10 do
a[r,c]=0.0;
for r from 1 to 10 do
a[r,c]=1.0;
Find the basic block and construct the flow graph.
Ans. 1) i=1 // Leader 1 (first statement)
2) j=1 // Leader 2 (target of statement 11)
3) t1 = 10 * i // Leader 3 (target of statement 9)
4) t2 = t1 + j
5) t3 = 8 * t2
6) t4 = t3 - 88
7) a[t4] = 0.0
8) j = j + 1
9) if j <= 10 goto (3)
10) i = i + 1 // Leader 4 (immediately follows the conditional goto in statement 9)
11) if i <= 10 goto (2)
12) i = 1 // Leader 5 (immediately follows the conditional goto in statement 11)
13) t5 = i - 1 // Leader 6 (target of statement 17)
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)
There are six basic blocks for the above-given code, which are:
B1 for statement 1
B2 for statement 2
B3 for statements 3-9
B4 for statements 10-11
B5 for statement 12
B6 for statements 13-17.
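A minimal sketch (helper name and encoding are assumptions) of how the leaders above can be computed from the statement count and the goto targets:

    # Leaders: the first statement, every jump target, and every statement
    # immediately following a (conditional) jump.
    def find_leaders(n_stmts, jumps):
        leaders = {1}
        for src, target in jumps.items():
            leaders.add(target)
            if src + 1 <= n_stmts:
                leaders.add(src + 1)
        return sorted(leaders)

    print(find_leaders(17, {9: 3, 11: 2, 17: 13}))   # [1, 2, 3, 10, 12, 13]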
a) Syntax Tree:
            +
          /   \
         *     +
        / \   / \
       a   b +   c
            / \
           a   b
Q.4 What is an LALR(1) grammar? Construct LALR parsing table for the
following grammar:
S->cC, C->cC, C->c/d
Ans. LALR(1) stands for "Look-Ahead LR(1)" and is a type of parsing table used in
compiler design. LALR(1) parsers are an extension of LR(1) parsers, where "LR"
stands for "Left-to-right, Rightmost derivation." LALR parsers are capable of han-
dling a broader class of grammars compared to SLR parsers.
Construct LR (1) Set of items. First of all, all the LR (1) set of items should be
generated.
In these states, states I3 and I6 can be merged because they have the same core or first
component but a different second component of Look Ahead.
Q.5 Solve the input id+id*id, using operator precedence parser for the following
grammar:
T->T+T/T*T/id
Ans. We construct the operator precedence relation table (taking * to have higher precedence than +, and both to be left-associative) as:

       id     +      *      $
id            .>     .>     .>
+      <.     .>     <.     .>
*      <.     .>     .>     .>
$      <.     <.     <.
Part-C
Q.1 Generate code for the following C statements for the simple/target machine,
assuming all variables are static and three registers are available.
t:=a-b
u:=a-c
v:=t+u
d:=v+u
Ans.
Statement     Code generated      Register descriptor              Address descriptor
(initially)                       Registers empty
t := a - b    MOV a, R0           R0 contains t                    t in R0
              SUB b, R0
u := a - c    MOV a, R1           R0 contains t, R1 contains u     t in R0, u in R1
              SUB c, R1
v := t + u    ADD R1, R0          R0 contains v, R1 contains u     v in R0, u in R1
d := v + u    ADD R1, R0          R0 contains d                    d in R0 and in memory
              MOV R0, d
Register Descriptor:
R0: Contains the value of 't' after the operation t := a - b. Later used for storing
'v' and 'd'.
R1: Contains the value of 'u' after the operation u := a - c.
Address Descriptor:
t: Represents the memory location or register where the variable 't' is stored. In
this case, it is in register R0.
u: Represents the memory location or register where the variable 'u' is stored.
It is initially in register R1 after the operation u := a - c.
v: Represents the memory location or register where the variable 'v' is stored.
In this case, it is in register R0 after the operation v := t + u.
d: Represents the memory location or register where the variable 'd' is stored.
It is in register R0 after the operation d := v + u.
Code Generation: The generated assembly code performs arithmetic operations and
uses registers to store intermediate values. The operations are sequenced to achieve
the desired results for each assignment statement.
The LR(0) parse table is constructed based on the LR(0) items for each production in
the grammar. The items indicate the current position of the parser within a
production. For the given grammar:
Grammar:
D -> type tlist;
tlist ->tlist , id / id
type -> int / float
Augmented Grammar:
S' -> D
D -> type tlist;
tlist ->tlist , id / id
type -> int / float
LR(0) Items:
S' -> .D (initial item)
D -> .type tlist ;
tlist -> .tlist , id
tlist -> .id
type -> .int
type -> .float
LR(0) Parse Table:
+-------+------+------+------+------+------+------+-------+
| State | type |  id  |  ,   |  ;   |  $   |  D   | tlist |
+-------+------+------+------+------+------+------+-------+
|   0   |  s2  |      |      |      |      |  1   |       |
|   1   |      |  s3  |      |      |      |      |       |
|   2   |      |      |      |      | acc  |      |       |
|   3   |  s2  |      |      |      |      |      |       |
|   4   |      |  s3  |      |      |      |      |       |
|   5   |      |      |  s6  |      |      |      |       |
|   6   |  s2  |      |      |      |      |      |       |
|   7   |      |      |      |  s8  |      |      |       |
|   8   |  s2  |      |      |      |      |      |       |
|   9   |      |      |      |      |  r3  |      |  r3   |
|  10   |      |      |      |      |      |      |       |
|  11   |      |      |      |      |      |  12  |       |
|  12   |      |      |      |      |  r1  |  r1  |  r1   |
+-------+------+------+------+------+------+------+-------+
sX: Shift to state X.
rX: Reduce by production X.
acc: Accept.
The LR(0) parse table contains a reduce-reduce conflict. Specifically, in state 12,
there is a conflict between reducing by production 1 (S' -> D) and production 3 (type
-> int / float) when the next symbol is $.
1. Hash Tables: Efficient for quick lookups. The identifier is hashed, and the
hash value is used to index into the table.
2. Binary Search Trees (BST): Sorted structure where identifiers are arranged in
a tree. Allows for efficient searching.
3. Linked Lists: Simple structure where each entry points to the next one. Easy
to implement but may not be as efficient for large symbol tables.
4. Arrays: For small symbol tables, an array may be used with identifiers indexed
directly. The drawback is that it may waste space for unused entries.
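As an illustration of the hash-table option (the class and field names are assumptions, not part of any standard), a dictionary-based symbol table with average constant-time insert and lookup might look like:

    # Python's dict hashes the identifier, so insert/lookup are average-case O(1).
    class SymbolTable:
        def __init__(self):
            self.table = {}

        def insert(self, name, **attributes):      # e.g. type, scope, offset
            self.table[name] = attributes

        def lookup(self, name):
            return self.table.get(name)            # None if the identifier is undeclared

    st = SymbolTable()
    st.insert("count", type="int", scope="global")
    print(st.lookup("count"))                      # {'type': 'int', 'scope': 'global'}
    print(st.lookup("missing"))                    # None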
Static vs. Dynamic Storage Allocation:
Static Storage Allocation:
Memory is allocated for variables at compile-time.
The size and layout of data structures are known at compile-time.
Examples include global variables and statically declared arrays.
The main advantage is efficiency, as memory is allocated once during
compilation.
The main drawback is inflexibility; it may not support dynamic memory
needs.
Dynamic Storage Allocation:
Memory is allocated at run-time during program execution.
Examples include dynamically allocated memory using malloc in C or
new in C++.
Provides flexibility, but comes with the cost of runtime overhead.
Dynamic allocation allows for structures like linked lists and dynamic
arrays.
b) CFG (Context-Free Grammar), Distinction Between Regular and Context-Free Grammar,
and Limitations of Context-Free Grammar:
Preprocessors are programs that process the source code before compilation. Several steps are
involved between writing a program and executing it in C, and preprocessing is the first of
these steps.
2. Discuss about the Syntax Error Handling.
The tasks of the Error Handling process are to detect each error, report it to the user, and then
devise a recovery strategy and implement it to handle the error. This whole process should not
significantly slow down the processing of the program.
Functions of Error Handler:
Error Detection
Error Report
Error Recovery
3. Differentiate between shift-reduce and Operator Precedence Parsers.
Operator precedence parsing is a kind of shift-reduce parsing method. It is applied to a small class
of operator grammars.
Operator precedence relations can only be established between the terminals of the grammar; the non-
terminals are ignored.
Easier to implement: Intermediate code generation can simplify the code generation process by
reducing the complexity of the input code, making it easier to implement.
Facilitates code optimization: Intermediate code generation can enable the use of various code
optimization techniques, leading to improved performance and efficiency of the generated code.
Platform independence: Intermediate code is platform-independent, meaning that it can be
translated into machine code or bytecode for any platform.
Code reuse: Intermediate code can be reused in the future to generate code for other platforms or
languages.
Easier debugging: Intermediate code can be easier to debug than machine code or bytecode, as it
is closer to the original source code.
A symbol table is an important data structure created and maintained by compilers in order to store
information about the occurrence of various entities such as variable names, function names, objects,
classes, interfaces, etc. The symbol table is used by both the analysis and the synthesis parts of a
compiler.
A symbol table may serve the following purposes depending upon the language in hand:
This is a lexical error since the ending of the comment “*/” is not present but the beginning is
present.
4. Spelling Error
5. Replacing a character with an incorrect character.
8. What are the functions used to create the nodes of syntax trees?
A syntax tree’s nodes can all be implemented as records (data objects) with several fields. One field
of an operator node identifies the operator, while the remaining fields contain pointers to the operand
nodes. The operator is also known as the node’s label. The nodes of the syntax tree for expressions
with binary operators are created using the following functions. Each function returns a reference
to the node that was most recently created.
1. mknode (op, left, right): It creates an operator node with the label op and two fields containing
pointers to the left and right operands.
2. mkleaf (id, entry): It creates an identifier node with the label id and the entry field, which is a
reference to the identifier’s symbol table entry.
3. mkleaf (num, val): It creates a number node with the label num and a field containing the
number’s value, val. For example, to make a syntax tree for the expression a - 4 + c, the functions
are called in sequence, and p1, p2, …, p5 are the pointers they return; the leaves for the identifiers
‘a’ and ‘c’ hold references to their symbol table entries.
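A minimal sketch of these constructor functions (the node layout and the separate mkleaf helpers are assumptions), building the tree for a - 4 + c in the order described:

    # Every constructor returns a reference (pointer) to the node it creates.
    def mknode(op, left, right):
        return {"label": op, "left": left, "right": right}

    def mkleaf_id(entry):
        return {"label": "id", "entry": entry}      # entry: symbol-table reference

    def mkleaf_num(val):
        return {"label": "num", "val": val}

    p1 = mkleaf_id("a")         # leaf for identifier a
    p2 = mkleaf_num(4)          # leaf for the number 4
    p3 = mknode("-", p1, p2)    # node for a - 4
    p4 = mkleaf_id("c")         # leaf for identifier c
    p5 = mknode("+", p3, p4)    # root node for (a - 4) + c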
Semantic Analysis is the third phase of the compiler. Semantic Analysis makes sure that the declarations
and statements of the program are semantically correct. It is a collection of procedures which are called
by the parser as and when required by the grammar. Both the syntax tree of the previous phase and the
symbol table are used to check the consistency of the given code. Type checking is an important part of
semantic analysis, where the compiler makes sure that each operator has matching operands.
Semantic Analyzer:
It uses the syntax tree and the symbol table to check whether the given program is semantically
consistent with the language definition. It gathers type information and stores it in either the syntax
tree or the symbol table. This type information is subsequently used by the compiler during
intermediate-code generation.
Semantic Errors:
Errors recognized by semantic analyzer are as follows:
Type mismatch
Undeclared variables
Reserved identifier misuse
Q.11 A) Write a regular expression for identifiers and reserved words. Design the
transition diagrams for them.
B) Explain the three general approaches for the implementation of a Lexical
analyzer.
A) The lexical analyzer needs to scan and identify only a finite set of valid string/token/lexeme
that belongs to the language in hand. It searches for the pattern defined by the language
rules.
Regular expressions have the capability to express finite languages by defining a pattern for
finite strings of symbols. The grammar defined by regular expressions is known as regular
grammar. The language defined by regular grammar is known as regular language.
Operations
The various operations on languages are:
Union of two languages L and M is written as
L U M = {s | s is in L or s is in M}
Concatenation of two languages L and M is written as
LM = {st | s is in L and t is in M}
The Kleene Closure of a language L is written as
L* = Zero or more occurrence of language L.
Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
Union : (r)|(s) is a regular expression denoting L(r) U L(s)
Concatenation : (r)(s) is a regular expression denoting L(r)L(s)
Kleene closure : (r)* is a regular expression denoting (L(r))*
(r) is a regular expression denoting L(r)
Precedence and Associativity
*, concatenation (.), and | (pipe sign) are left associative
* has the highest precedence
Concatenation (.) has the second highest precedence.
| (pipe sign) has the lowest precedence of all.
Representing valid tokens of a language in regular expression
If x is a regular expression, then:
x* means zero or more occurrence of x.
i.e., it can generate { e, x, xx, xxx, xxxx, … }
x+ means one or more occurrence of x.
i.e., it can generate { x, xx, xxx, xxxx … } or x.x*
x? means at most one occurrence of x
i.e., it can generate either {x} or {e}.
[a-z] is all lower-case alphabets of English language.
[A-Z] is all upper-case alphabets of English language.
[0-9] is all natural digits used in mathematics.
Representing occurrence of symbols using regular expressions
letter = [a-z] or [A-Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
sign = [ + | - ]
Representing language tokens using regular expressions
Decimal = (sign)?(digit)+
Identifier = (letter)(letter | digit)*
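A small sketch (the patterns mirror the definitions above; the test lexemes are made up) showing these token definitions checked with a regular-expression library:

    import re

    DECIMAL    = re.compile(r"[+-]?[0-9]+")            # (sign)?(digit)+
    IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # (letter)(letter|digit)*

    for lexeme in ["count", "x1", "-42", "9lives"]:
        if IDENTIFIER.fullmatch(lexeme):
            print(lexeme, "-> identifier")
        elif DECIMAL.fullmatch(lexeme):
            print(lexeme, "-> decimal")
        else:
            print(lexeme, "-> no match")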
The only problem left with the lexical analyzer is how to verify the validity of a regular expression
used in specifying the patterns of keywords of a language. A well-accepted solution is to use finite
automata for verification.
B) Lexical Analysis is the first step of the compiler which reads the source code one character at a time
and transforms it into an array of tokens. The token is a meaningful collection of characters in a
program. These tokens can be keywords including do, if, while etc. and identifiers including x,
num, count, etc. and operator symbols including >,>=, +, etc., and punctuation symbols including
parenthesis or commas. The output of the lexical analyzer phase passes to the next phase, called the
syntax analyzer or parser.
The syntax analyser or parser is also known as parsing phase. It takes tokens as input from lexical
analyser phase. The syntax analyser groups tokens together into syntactic structures. The output of
this phase is parse tree.
Function of Lexical Analysis
The main functions of lexical analysis are as follows −
It can separate tokens from the program and return those tokens to the parser as requested by
it.
It can eliminate comments, whitespaces, newline characters, etc. from the string.
It can insert the token into the symbol table.
Lexical Analysis will return an integer number for each token to the parser.
Stripping out the comments and whitespace (tab, newline, blank, and other characters that
are used to separate tokens in the input).
It correlates the error messages produced by the compiler during lexical analysis
with the source program.
It can implement the expansion of macros when macro preprocessors are used in
the source code.
LEX generates a Lexical Analyzer as its output by taking the LEX program as its input. A LEX
program is a collection of patterns (Regular Expressions) and their corresponding Actions.
Patterns represent the tokens to be recognized by the lexical analyzer to be generated. For each
pattern, a corresponding NFA will be designed.
There can be n number of NFAs for n number of patterns.
Example − If the patterns are:
P1 { }
P2 { }
…
Pn { }
Then NFA’s for corresponding patterns will be −
A start state is taken and, using ϵ-transitions, all these NFAs can be connected to make a combined
NFA −
The final state of each NFA shows that it has found its token Pi.
It converts the combined NFA to DFA as it is always easy to simulate the behavior of DFA with a
program.
The final state shows which token we have found. If none of the states of DFA includes any final
states of NFA then control returns to an error condition.
If the final state of the DFA includes more than one final state of the NFA, then the final state for
the pattern coming first in the translation rules has priority.