
Mid Sem Solution (CPTT)


Subject: Compilers: Principles, Techniques, and Tools (CSE 3739)

Solution of Mid Semester Examination (April 2025)

Q. 1 (a) Consider the following fragment of ‘C’ code:


float a, b;
a = a * -50 + b + 3;
Write the output of 1st four phases of the compiler for the above arithmetic expression
in the ‘C’ code.
Solution:
1. Lexical Analysis: In this phase, the compiler breaks the source code into tokens, the smallest meaningful units such as keywords, identifiers, operators, constants, and punctuation.

Tokens:

1. float : keyword
2. a : identifier
3. , : punctuation
4. b : identifier
5. ; : punctuation
6. a : identifier
7. = : assignment operator
8. a : identifier
9. * : arithmetic operator
10. -50 : constant (negative number)
11. + : arithmetic operator
12. b : identifier
13. + : arithmetic operator
14. 3 : constant
15. ; : punctuation

Note: A real C lexer emits the unary minus and 50 as two separate tokens (16 tokens in all) and leaves it to the parser to combine them. Here -50 is treated as a single constant to keep the later phases short.

2. Syntax Analysis (Parsing): This phase checks whether the tokens follow correct grammar
rules. It creates a parse tree or an Abstract Syntax Tree (AST).
        =
       / \
      a   +
         / \
        +   3
       / \
      *   b
     / \
    a  -50

3. Semantic Analysis: Checks for type correctness, declarations, scope, etc. a and b are
declared as float. -50 and 3 are implicitly converted to float. All operands in expressions are of
type float. The expression is valid, and a can be assigned the result. So, no semantic errors here.
        =
       / \
      a   +
         / \
        +   3.0
       / \
      *   b
     / \
    a  -50.0

4. Intermediate Code Generation: The compiler generates an intermediate representation, often in three-address code (TAC).
t1 = a
t2 = b
t3 = -50.0
t4 = t1 * t3
t5 = t4 + t2
t6 = t5 + 3.0
a = t6
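
The translation from the annotated syntax tree to three-address code can be sketched as a post-order walk. This is a minimal illustration, not the method of any particular compiler: the tuple encoding of the tree and the function name gen_tac are assumptions made for the example. It uses operands directly instead of first copying a and b into temporaries, so it produces a shorter but equivalent listing.

```python
from itertools import count

def gen_tac(node, code, temps):
    """Append TAC lines for node to code; return the name holding its value."""
    if not isinstance(node, tuple):          # leaf: identifier or constant
        return str(node)
    op, left, right = node
    if op == '=':                            # assignment: evaluate the right side, then store
        rhs = gen_tac(right, code, temps)
        code.append(f"{left} = {rhs}")
        return left
    l = gen_tac(left, code, temps)           # post-order: operands before the operator
    r = gen_tac(right, code, temps)
    t = f"t{next(temps)}"                    # fresh temporary for this result
    code.append(f"{t} = {l} {op} {r}")
    return t

# Tree from the semantic-analysis step: a = ((a * -50.0) + b) + 3.0
ast = ('=', 'a', ('+', ('+', ('*', 'a', -50.0), 'b'), 3.0))
tac = []
gen_tac(ast, tac, count(1))
print("\n".join(tac))
```

Each interior node of the tree yields exactly one instruction, which is why the tree with three operators and one assignment gives four lines.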

(b) Write short notes on:
(i) Symbol Table (ii) Input Buffering
Answer:
(i) Symbol Table: A compiler uses a symbol table to keep track of the variables, functions, and other identifiers in a program. For each identifier it stores information such as the name, type, scope, and memory location. The table is built during the early (analysis) phases of compilation and supports error checking, scope management, and code optimization. It plays a crucial role in ensuring that identifiers are used according to the rules of the language.
Role of symbol table in compiler phases: The symbol table acts as a bridge between the analysis and synthesis phases of the compiler. Information is collected during the analysis phases and used during the synthesis phases to generate efficient code.
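
One common way to organize such a table is as a stack of per-scope dictionaries. The sketch below is only an illustration under that assumption; the class name SymbolTable, its method names, and the stored fields are hypothetical and are not taken from any specific compiler.

```python
class SymbolTable:
    """Scoped symbol table: a stack of dictionaries, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]                    # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:                     # same name twice in one scope is an error
            raise NameError(f"redeclaration of {name}")
        scope[name] = {"type": type_, "depth": len(self.scopes) - 1}

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost declaration wins
            if name in scope:
                return scope[name]
        return None                           # undeclared identifier

table = SymbolTable()
table.declare("a", "float")
table.enter_scope()
table.declare("a", "int")                     # shadows the outer a
inner = table.lookup("a")
table.exit_scope()
outer = table.lookup("a")
```

Searching from the innermost scope outwards is what implements the usual shadowing rule of block-structured languages.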
(ii) Input buffering: Input buffering lets the lexical analyzer read the source in large blocks rather than one character at a time, which reduces I/O overhead. A block of input is read into a buffer and processed before the next block is read. The buffer size depends on the needs of the compiler and the characteristics of the source being compiled.
Without buffering, the lexical analyzer would have to access secondary storage for every character it examines, which is slow and costly. The input is therefore stored in a buffer and scanned from there, left to right, one character at a time. Two pointers are used to delimit tokens:
Begin Pointer (bp) − It points to the beginning of the lexeme currently being read.
Forward Pointer (fp) − It moves ahead to search for the end of the lexeme.
Initially both pointers point to the first character of the input string.

The forward pointer moves ahead until it finds the end of the lexeme; a blank space, for instance, marks the end of a lexeme. For example, if the input begins with “int” followed by a blank, the lexeme “int” is identified as soon as fp reaches that blank. fp then skips over the white space, and both bp and fp are set to the first character of the next token. Because each block is read from secondary storage only once and is then scanned by the lexical analyzer in memory, the cost of reading is paid per block rather than per character.
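
The two-pointer scan over a block buffer can be sketched as follows. This is a simplified illustration that assumes lexemes are separated by white space; the function name scan_lexemes and the block size are made up for the example, and a real lexer would use a pair of fixed buffers with sentinels rather than string concatenation.

```python
import io

def scan_lexemes(stream, block_size=16):
    """Yield whitespace-delimited lexemes, reading the stream one block at a time."""
    buffer = ""
    bp = 0   # begin pointer: start of the current lexeme
    fp = 0   # forward pointer: scans ahead for the end of the lexeme
    while True:
        if fp == len(buffer):                 # buffer exhausted: fetch the next block
            block = stream.read(block_size)   # one bulk read instead of one per character
            if not block:
                break
            buffer = buffer[bp:] + block      # keep the unfinished lexeme, drop consumed text
            fp -= bp
            bp = 0
        if buffer[fp].isspace():
            if fp > bp:
                yield buffer[bp:fp]           # lexeme lies between bp and fp
            fp += 1
            bp = fp                           # both pointers move to the next lexeme
        else:
            fp += 1
    if fp > bp:                               # last lexeme before end of input
        yield buffer[bp:fp]

lexemes = list(scan_lexemes(io.StringIO("int a , b ;"), block_size=4))
```

With block_size=4 the source is read in three blocks, yet the caller still sees the lexemes int, a, the comma, b, and the semicolon in order.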
(c) Explain the language processing system for the modern computer programming
languages.
Answer: A computer is a combination of software and hardware. Hardware is simply a piece of mechanical equipment whose functions are controlled by the relevant software. The hardware understands instructions in the form of electronic charge, which corresponds to binary language in software programming. Binary language has only 0s and 1s. To instruct the hardware directly, code would have to be written in binary format, that is, as a series of 0s and 1s. Writing such code would be inconvenient and complicated for programmers, so we write programs in a high-level language, which is easier for us to comprehend and remember. These programs are then fed into a series of tools and operating system (OS) components to obtain code that the machine can execute. This chain is known as a language processing system.

Figure: Language Processing System


Components of Language processing system:
Preprocessor: It includes the header files and expands macros. (A macro is a piece of code that is given a name; wherever the name is used, it is replaced by the contents of the macro. Macros are used either to automate frequently used sequences or to enable more powerful abstraction.) It takes source code as input and produces modified source code as output. The preprocessor is also known as a macro evaluator. Preprocessing is optional: a language that does not support #include or macros does not need it.
Compiler: The compiler takes the modified code as input and produces the target code as
output.

Figure: Compiler Input-Output


Assembler: The assembler takes the target (assembly) code as input and produces relocatable machine code as output.
Linker: The linker, or link editor, is a program that takes a collection of object files (created by assemblers and compilers) and combines them into an executable program.
Loader: The loader places the linked program in main memory for execution.
Executable Code: It is low-level, machine-specific code that the machine can execute directly. Once the linker and loader have done their work, the object code has been converted into executable code.

Q. 2 (a) Write the regular expression for the floating point number with exponent. The
example of some floating point numbers with exponent are some strings such as 6.5897E4,
1.8569E-4, 2.54e15 or 3.54e-15. [Hint: At least one digit should be before and after the dot
(.)]
Solution:
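digit → [0-9]
digits → digit+
Exponent → E | e
number → digits . digits Exponent (+|-)? digits
Here the dot is a literal character. Both the fractional part and the exponent are mandatory, as the hint and the examples require. If they are made optional, number → digits (. digits)? (Exponent (+|-)? digits)?, the pattern describes general unsigned numbers and also accepts plain integers such as 65.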
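
The definition can be checked quickly against the example strings with Python's re module. The sketch assumes the strict reading of the question, in which the fractional part and the exponent are both required; the names FLOAT_EXP and is_float_with_exponent are illustrative.

```python
import re

# digits . digits (E|e) (+|-)? digits, with fraction and exponent both required
FLOAT_EXP = re.compile(r'[0-9]+\.[0-9]+[Ee][+-]?[0-9]+')

def is_float_with_exponent(text):
    """True if the whole string is a floating-point number with an exponent."""
    return FLOAT_EXP.fullmatch(text) is not None

for sample in ["6.5897E4", "1.8569E-4", "2.54e15", "3.54e-15", "65", ".5e3", "5.e3"]:
    print(sample, is_float_with_exponent(sample))
```

fullmatch is used rather than match so that trailing characters after a valid prefix cause rejection.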
(b) Construct a transition diagram (finite state automaton) that represents the lexical
recognition of a floating-point number with an exponent.

Figure: Transition diagram for unsigned numbers

(c) Find the number of tokens in the given C statement.
printf("pt = %d, &pt = %x", pt, &pt);
Solution:
1. printf : identifier (function name)
2. ( : punctuation (opening parenthesis)
3. "pt = %d, &pt = %x" : string literal (whole string is 1 token)
4. , : punctuation (comma separating arguments)
5. pt : identifier
6. , : punctuation
7. & : operator (address-of)
8. pt : identifier
9. ) : punctuation (closing parenthesis)
10. ; : punctuation (statement terminator)
Total number of tokens: 10
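
The count can be reproduced with a small regular-expression tokenizer. This is only a rough sketch: the pattern covers string literals, identifiers, integer constants, and single-character operators or punctuation, which is enough for this statement but far from a complete C lexer. The name tokenize is illustrative.

```python
import re

TOKEN = re.compile(r'''
    "(?:\\.|[^"\\])*"     # string literal, kept as a single token
  | [A-Za-z_]\w*          # identifier or keyword
  | \d+                   # integer constant
  | [^\s\w]               # any single operator or punctuation character
''', re.VERBOSE)

def tokenize(source):
    """Return the list of lexemes; white space between tokens is skipped."""
    return TOKEN.findall(source)

tokens = tokenize('printf("pt = %d, &pt = %x", pt, &pt);')
print(len(tokens), tokens)
```

The string-literal alternative is listed first so that the commas, the & and the % characters inside the quotes are not split off as separate tokens.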

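Q. 3 (a) Consider the grammar G with the production rules given below:
G: S → if E then S | if E then S else S | a
E → b
Where if, then, else, a, b, c are the terminals. Determine whether the given grammar G is ambiguous.
Answer: Yes, G is ambiguous; this is the dangling-else ambiguity. The string
if b then if b then a else a
has two distinct parse trees: the else can be attached to the inner if, giving if b then (if b then a else a), or to the outer if, giving if b then (if b then a) else a. The common prefix “if E then S” shared by the first two alternatives is not itself the cause of the ambiguity, but it does mean that the grammar must be left factored before predictive parsing.
Note: Students can also show the ambiguity by giving two different leftmost derivations (LMD) or rightmost derivations (RMD) for the same input string.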

(b) Determine whether the given grammar G in question 3(a) is LL (1) grammar.
Solution:
First, the grammar is left factored:
S → if E then S S’ | a
S’ → else S | ɛ
E → b
Next, we construct the LL (1) parsing table to check whether the grammar is LL (1). For this, we need FIRST and FOLLOW of each nonterminal in the left-factored grammar.
FIRST                        FOLLOW
FIRST (S) = {if, a}          FOLLOW (S) = {else, $}
FIRST (S’) = {else, ɛ}       FOLLOW (S’) = {else, $}
FIRST (E) = {b}              FOLLOW (E) = {then}

Parsing table (rows: nonterminals, columns: input symbols):

Nonterminal | a     | b     | else                | if                 | then | $
S           | S → a |       |                     | S → if E then S S’ |      |
S’          |       |       | S’ → else S, S’ → ɛ |                    |      | S’ → ɛ
E           |       | E → b |                     |                    |      |

S’ → ɛ is entered under every symbol in FOLLOW (S’) = {else, $}, so it lands in the same cell as S’ → else S. The parsing table therefore has a conflict: the cell (S’, else) holds two entries. Hence the given grammar is not LL (1).
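
The table-filling rule can be sketched in a few lines. The sketch assumes the left-factored grammar S → if E then S S’ | a, S’ → else S | ɛ, E → b, and takes the FIRST set of each right-hand side and the FOLLOW sets as given data rather than computing them; build_ll1_table is a hypothetical helper name.

```python
from collections import defaultdict

EPS = 'ɛ'
# (head, body, FIRST of the body) for the left-factored grammar
PRODUCTIONS = [
    ("S",  ["if", "E", "then", "S", "S'"], {"if"}),
    ("S",  ["a"],                          {"a"}),
    ("S'", ["else", "S"],                  {"else"}),
    ("S'", [EPS],                          {EPS}),
    ("E",  ["b"],                          {"b"}),
]
FOLLOW = {"S": {"else", "$"}, "S'": {"else", "$"}, "E": {"then"}}

def build_ll1_table(productions, follow):
    """Map (nonterminal, terminal) to the list of production bodies entered there."""
    table = defaultdict(list)
    for head, body, first in productions:
        lookaheads = first - {EPS}
        if EPS in first:                      # nullable body: use FOLLOW of the head
            lookaheads |= follow[head]
        for terminal in lookaheads:
            table[head, terminal].append(" ".join(body))
    return table

table = build_ll1_table(PRODUCTIONS, FOLLOW)
conflicts = {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}
print(conflicts)
```

Any cell with more than one body is a conflict, and a grammar is LL (1) exactly when no such cell exists.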

(c) Consider the following context-free grammar where the start symbol is S and the set
of terminals is {a, b, c, d}.
S → AaAb | BbBa
A → cS | ɛ
B → dS | ɛ
The following is a partially-filled LL (1) parsing table.

   | a        | b        | c      | d      | $
S  | S → AaAb | S → BbBa | (1)    | (2)    |
A  | A → ɛ    | (3)      | A → cS |        |
B  | (4)      | B → ɛ    |        | B → dS |

Write down the CORRECT productions for the numbered cells in the parsing table.
Solution:
First, we find FIRST and FOLLOW of each nonterminal of the grammar.
FIRST                          FOLLOW
FIRST (S) = {a, b, c, d}       FOLLOW (S) = {a, b, $}
FIRST (A) = {c, ɛ}             FOLLOW (A) = {a, b}
FIRST (B) = {d, ɛ}             FOLLOW (B) = {a, b}

The correct parsing table is:

   | a        | b        | c        | d        | $
S  | S → AaAb | S → BbBa | S → AaAb | S → BbBa |
A  | A → ɛ    | A → ɛ    | A → cS   |          |
B  | B → ɛ    | B → ɛ    |          | B → dS   |

Therefore the numbered cells are: (1) S → AaAb, (2) S → BbBa, (3) A → ɛ, (4) B → ɛ.

Q. 4 (a) Compute the FIRST( ) and FOLLOW( ) of the given grammar:


S → Aba | bCA
A → cBCD | ɛ
B → CdA | ad
C → eC | ɛ
D → bsf | a
Solution:
FIRST FOLLOW
FIRST (S) = {b, c} FOLLOW (S) = {$}
FIRST (A) = {c, ɛ} FOLLOW (A) = {a, b, e, $}
FIRST (B) = {a, d, e} FOLLOW (B) = {a, b, e}
FIRST (C) = {e, ɛ} FOLLOW (C) = {a, b, c, d, $}
FIRST (D) = {a, b} FOLLOW (D) = {a, b, e, $}
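
These sets can be cross-checked with the usual fixed-point computation. The sketch below is a plain, unoptimized version of that algorithm applied to the grammar of this question; the dictionary encoding of the grammar and the function names first_of and first_follow are choices made for the example.

```python
EPS, END = 'ɛ', '$'
GRAMMAR = {
    'S': [['A', 'b', 'a'], ['b', 'C', 'A']],
    'A': [['c', 'B', 'C', 'D'], [EPS]],
    'B': [['C', 'd', 'A'], ['a', 'd']],
    'C': [['e', 'C'], [EPS]],
    'D': [['b', 's', 'f'], ['a']],
}

def first_of(symbols, first):
    """FIRST of a symbol string, using the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in first else {sym}   # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                          # every symbol was nullable (or the string is empty)
    return result

def first_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:                           # repeat until no set grows any further
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new_first = first_of(body, first)
                if not new_first <= first[head]:
                    first[head] |= new_first
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in grammar:   # FOLLOW is defined for nonterminals only
                        continue
                    tail = first_of(body[i + 1:], first)
                    new_follow = tail - {EPS}
                    if EPS in tail:          # rest can vanish: inherit FOLLOW of the head
                        new_follow |= follow[head]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow

FIRST, FOLLOW = first_follow(GRAMMAR, 'S')
```

Because every update only adds symbols, the loop is guaranteed to terminate once all sets have stabilized.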

(b) The attributes of three arithmetic operators in some programming language are given below.
Operator   Precedence   Associativity   Arity
+          High         Left            Binary
-          Medium       Right           Binary
*          Low          Left            Binary
Compute the value of the expression 2-5+1-7*3 and create the parse tree for this
expression.
Solution:
Given expression is:
2 – 5 + 1 – 7 * 3
= 2 – 6 – 7 * 3 (‘+’ has the highest precedence: 5 + 1 = 6)
= 2 – (–1) * 3 (‘–’ comes next and is right associative, so 6 – 7 = –1 is evaluated first)
= 3 * 3 (still ‘–’: 2 – (–1) = 3)
= 9 (‘*’ has the lowest precedence and is applied last)
Parse tree for the given expression is provided below:
          *
         / \
        –   3
       / \
      2   –
         / \
        +   7
       / \
      5   1
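
The result can be confirmed with a small precedence-climbing evaluator that takes the operator table of this question as data. It is a sketch for this exercise only: it handles non-negative integer operands and the three binary operators, and the names OPS and evaluate are illustrative.

```python
import operator
import re

# operator -> (precedence, associativity, function); a larger number binds tighter
OPS = {
    '+': (3, 'left', operator.add),
    '-': (2, 'right', operator.sub),
    '*': (1, 'left', operator.mul),
}

def evaluate(text):
    tokens = re.findall(r'\d+|[-+*]', text)
    pos = 0

    def parse(min_prec):
        """Parse an expression whose operators all have precedence >= min_prec."""
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        while pos < len(tokens) and OPS[tokens[pos]][0] >= min_prec:
            prec, assoc, fn = OPS[tokens[pos]]
            pos += 1
            # left-associative: the right operand may only contain tighter operators;
            # right-associative: it may contain the same operator again
            next_min = prec + 1 if assoc == 'left' else prec
            value = fn(value, parse(next_min))
        return value

    return parse(1)

print(evaluate("2-5+1-7*3"))
```

Changing the table entries back to the conventional precedences would make the same function evaluate ordinary arithmetic, which is a convenient way to see how much the answer depends on the stated attributes.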

(c) Consider the following parse tree for the expression a#b$c$d#e#f, involving two
binary operators $ and #.

        #
       / \
      a   #
         / \
        /   \
       $     #
      / \   / \
     $   d e   f
    / \
   b   c
Determine which operator has the high precedence and what is the associativity of the
two operators $ and #.
Solution:
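The operator with the highest precedence appears at the lowest level of the expression tree, so that it is evaluated first.
For an unambiguous grammar, precedence and associativity can be read directly from the productions or from the expression tree.
A left-associative operator corresponds to left-recursive productions; in the expression tree, repeated occurrences of the operator nest in the left child. For a right-associative operator they nest in the right child.
Here $ is at the lowest level, so it has the higher precedence. The chain b$c$d is grouped as (b$c)$d, nesting to the left, so $ is left associative. Each # has the remaining # operators in its right subtree, so # is right associative.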
Q. 5 (a) Consider the following grammar with production rules as:
E→T+E|T
T → id
Compute the canonical collection of LR (0) items of the grammar.
Solution:
Augmented Grammar
E’ → E
E→T+E
E→T
T → id

I0:
    E’ → .E
    E → .T + E
    E → .T
    T → .id
    GOTO (I0, E) → I1, GOTO (I0, T) → I2, GOTO (I0, id) → I3

I1: GOTO (I0, E)
    E’ → E.

I2: GOTO (I0, T)
    E → T. + E
    E → T.
    GOTO (I2, +) → I4

I3: GOTO (I0, id)
    T → id.

I4: GOTO (I2, +)
    E → T +. E
    E → .T + E
    E → .T
    T → .id
    GOTO (I4, E) → I5, GOTO (I4, T) → I2, GOTO (I4, id) → I3

I5: GOTO (I4, E)
    E → T + E.
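
The closure and GOTO operations behind this collection can be sketched as follows. Items are represented as (production index, dot position) pairs, which is an encoding chosen for the example; symbols are explored in sorted order so that the state numbers happen to agree with I0 to I5 above.

```python
GRAMMAR = [                      # augmented grammar; production 0 is E' → E
    ("E'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    """Add X → .γ for every nonterminal X that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def canonical_collection():
    states = [closure({(0, 0)})]
    transitions = {}
    for index, state in enumerate(states):        # the list grows while it is scanned
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            transitions[index, symbol] = states.index(target)
    return states, transitions

states, transitions = canonical_collection()
```

State 2 comes out as the pair of items E → T. + E and E → T., and state 3 contains only T → id.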
(b) Justify that the grammar given in question 5(a) is not LR (0) grammar.

Answer:
A grammar is LR (0) if in every item set
(i) no shift/reduce conflict occurs
(ii) no reduce/reduce conflict occurs
I2 contains both E → T. + E, which calls for a shift on +, and the complete item E → T., which calls for a reduction. This is a shift-reduce conflict. Therefore, the grammar is not LR (0).
OR
Alternatively, students can solve it by constructing the LR (0) parsing table.
(1) E → T + E
(2) E → T
(3) T → id

      |        Action         |  GOTO
State | id  | +     | $       | E | T
0     | S3  |       |         | 1 | 2
1     |     |       | Accept  |   |
2     | R2  | S4/R2 | R2      |   |
3     | R3  | R3    | R3      |   |
4     | S3  |       |         | 5 | 2
5     | R1  | R1    | R1      |   |

It can be observed from the table that state 2 has a shift-reduce conflict on input +. Therefore, the grammar is not LR (0).
(c) Construct the SLR (1) parsing table for the grammar given in question 5(a).
Solution:
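In an SLR (1) table, a reduction by A → α is entered only under the terminals in FOLLOW (A). Here FOLLOW (E) = {$} and FOLLOW (T) = {+, $}.

      |        Action         |  GOTO
State | id  | +     | $       | E | T
0     | S3  |       |         | 1 | 2
1     |     |       | Accept  |   |
2     |     | S4    | R2      |   |
3     |     | R3    | R3      |   |
4     | S3  |       |         | 5 | 2
5     |     |       | R1      |   |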
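
The difference between the LR (0) and SLR (1) action tables comes down to one test on the lookahead. The sketch below hard-codes the shifts, the complete items, and the FOLLOW sets of this grammar as input data (they are read off the item sets I0 to I5, not recomputed), and the helper name action_table is illustrative.

```python
# Facts taken from the item sets I0 to I5 of Q. 5(a)
SHIFTS = {0: {"id": 3}, 2: {"+": 4}, 4: {"id": 3}}     # state -> {terminal: next state}
REDUCES = {2: (2, "E"), 3: (3, "T"), 5: (1, "E")}      # state -> (production number, head)
FOLLOW = {"E": {"$"}, "T": {"+", "$"}}
TERMINALS = ["id", "+", "$"]

def action_table(slr):
    """Build the ACTION part; with slr=True, reduce only on FOLLOW of the head."""
    table = {}
    for state in range(6):
        for t in TERMINALS:
            entries = []
            if t in SHIFTS.get(state, {}):
                entries.append(f"S{SHIFTS[state][t]}")
            if state in REDUCES:
                prod, head = REDUCES[state]
                if not slr or t in FOLLOW[head]:        # LR(0) reduces on every terminal
                    entries.append(f"R{prod}")
            if state == 1 and t == "$":
                entries.append("Accept")
            if entries:
                table[state, t] = "/".join(entries)
    return table

lr0 = action_table(slr=False)
slr1 = action_table(slr=True)
print(lr0[2, "+"], slr1[2, "+"])
```

Since + is not in FOLLOW (E), the reduction by E → T is no longer entered under + in state 2, which removes the conflict and shows that the grammar is SLR (1).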
