Compiler Construction 2
1. Constant Folding
2. Common Subexpression Elimination (CSE)
3. Dead Code Elimination
4. Loop Optimization
5. Strength Reduction
6. Peephole Optimization
b) What is Sentinel?
A sentinel is a special value placed at the end of a data structure to simplify processing and avoid
extra boundary checks.
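As a minimal sketch of the idea, the linear search below stores the key in one spare slot after the data so the loop needs no index test. The function name sentinel_search and the spare-slot convention are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Linear search using a sentinel.
 * buf must have room for n + 1 elements: slot buf[n] is reserved
 * for the sentinel. Returns the index of key, or -1 if absent. */
int sentinel_search(int *buf, size_t n, int key)
{
    assert(buf != NULL);
    buf[n] = key;              /* place the sentinel after the last real element */
    size_t i = 0;
    while (buf[i] != key)      /* no "i < n" test: the sentinel always stops the loop */
        i++;
    return (i < n) ? (int)i : -1;
}
```

A caller would declare the array one element larger than the data it holds, for example int data[6] for five values, and pass n = 5.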
c) Define Handle.
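A handle is a substring of a right-sentential form that matches the right-hand side of a
production and whose reduction to the left-hand side is one step in the reverse of a rightmost
derivation. A shift-reduce parser reduces the handle when it appears on top of the stack.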
d) Define Bootstrapping.
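Bootstrapping is the process of writing a compiler in the language it compiles, so that the compiler
can compile itself. It is usually done in stages, starting from a small compiler written in another language.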
e) LEX is a scanner provided by the Linux operating system. State True or False. Justify.
False. LEX is not itself a scanner. It is a lexical analyzer generator available on Unix/Linux
systems: it reads a specification of patterns and generates a scanner (lex.yy.c) that tokenizes input.
LALR (Look-Ahead LR) parsing is preferred because its parsing table has the same number of states
as an SLR table, far fewer than a CLR table, while still handling most programming language constructs.
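A basic block is a sequence of consecutive instructions with one entry point (the first instruction)
and one exit point (the last instruction). Control flows through it sequentially, with no jump into
the middle and no jump or branch except possibly at the end.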
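Basic blocks are usually found by marking leaders, the instructions that start a block. The sketch below applies the three standard leader rules to an array of instructions. The struct instr layout and the function name find_leaders are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct instr {
    bool is_jump;   /* conditional or unconditional jump */
    int target;     /* index of the jump target, ignored when is_jump is false */
};

/* Set leader[i] = true for every instruction that starts a basic block.
 * Returns the number of basic blocks. */
int find_leaders(const struct instr *code, size_t n, bool *leader)
{
    assert(code != NULL && leader != NULL);
    for (size_t i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true;                          /* rule 1: first instruction */
    for (size_t i = 0; i < n; i++) {
        if (!code[i].is_jump)
            continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            leader[code[i].target] = true;         /* rule 2: target of a jump */
        if (i + 1 < n)
            leader[i + 1] = true;                  /* rule 3: instruction after a jump */
    }
    int blocks = 0;
    for (size_t i = 0; i < n; i++)
        blocks += leader[i];
    return blocks;
}
```

Each basic block then runs from a leader up to, but not including, the next leader.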
Synthesized attribute
• Definition: An attribute that is computed from child nodes and passed upward to the parent in
the parse tree.
• Example: In an expression grammar: E → E1 + T, E.val = E1.val + T.val (computed from
children).
Inherited attribute
• Definition: An attribute that is assigned from parent or sibling nodes and moves downward or
sideways in the parse tree.
• Example: In a grammar rule: T → id, T.type = E.type (inherits type from parent E).
i) What is a Parser?
A parser is a component of a compiler that analyzes the syntax of source code and generates a parse
tree.
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation
7. Symbol Table Management
8. Error Handling
• A lookahead pointer is used in lexical analysis to check upcoming characters in the input
stream without consuming them. It helps in deciding the correct token, especially in cases
where multiple patterns match the input.
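The sketch below shows one way a lookahead pointer and a retract step can separate "<" from "<=". The token codes, struct scanner, and the helper names are assumptions made for this illustration, not part of any lex library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch. */
enum token { TOK_LT, TOK_LE, TOK_OTHER, TOK_END };

struct scanner {
    const char *input;   /* NUL-terminated source text */
    size_t forward;      /* lookahead pointer: index of the next unread character */
};

/* Read one character and advance the lookahead pointer. */
static char next_char(struct scanner *s)
{
    char c = s->input[s->forward];
    if (c != '\0')
        s->forward++;
    return c;
}

/* Move the lookahead pointer back over one character that was read too far. */
static void retract(struct scanner *s)
{
    assert(s->forward > 0);
    s->forward--;
}

enum token next_token(struct scanner *s)
{
    char c = next_char(s);
    if (c == '\0')
        return TOK_END;
    if (c == '<') {
        char la = next_char(s);      /* look one character ahead */
        if (la == '=')
            return TOK_LE;           /* "<=" uses both characters */
        if (la != '\0')
            retract(s);              /* la belongs to the next token */
        return TOK_LT;
    }
    return TOK_OTHER;
}
```

On the input "<=<a" this scanner is expected to return TOK_LE, TOK_LT, TOK_OTHER and then TOK_END.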
b) State true or false: "Target code is generated in the analysis phase of the compiler".
• False. Target code is generated in the synthesis phase, not in the analysis phase.
• The output of a LEX program is a C program (lex.yy.c) that contains a lexical analyzer, which
can recognize patterns in input text.
d) Terminals can have synthesized attributes, but not inherited attributes. State true or false.
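• True. The attributes of a terminal are lexical values supplied by the lexical analyzer, so they
are treated as synthesized. An SDD has no semantic rules that compute an attribute of a terminal,
so a terminal cannot have inherited attributes.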
• Operand descriptors store information about where an operand is currently located (register,
memory, or stack) during code generation.
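f) State True or False: The yywrap() lex library function by default always returns 1.
• True. The default yywrap() supplied by the lex library returns 1, which tells the scanner that
there is no more input.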
• Analysis (Front-end) – Breaks down the source code (Lexical, Syntax, and Semantic
Analysis).
• Synthesis (Back-end) – Generates target code (Intermediate Code Generation, Optimization,
and Code Generation).
h) List the different types of conflicts that occur in LR parser.
• Shift-Reduce Conflict
• Reduce-Reduce Conflict
• Constant Folding
• Common Subexpression Elimination
• Dead Code Elimination
• Loop Optimization (Loop Unrolling, Loop Invariant Code Motion)
• Peephole Optimization
a) Define cross-compiler.
• A cross-compiler is a compiler that runs on one machine but generates code for a different
machine or platform.
c) What are sentinels?
• Sentinels are special values placed at the end of an array or list to avoid boundary checking
and improve efficiency in searching or scanning operations.
• The retract() function moves the lookahead pointer backward in lexical analysis to
reconsider a character that was read ahead.
• Augmenting the grammar adds a new start symbol S' and the production S' → S to the original
grammar. The parser accepts exactly when it reduces by S' → S, which gives it a single,
unambiguous point at which to stop.
• A synthesized attribute is an attribute whose value is computed from its child nodes in a
parse tree.
j) Define DAG.
• DAG (Directed Acyclic Graph) is a data structure used in code optimization to represent
expressions efficiently by eliminating common subexpressions and redundant
computations.
• YACC is a parser generator, not a compiler. It is used to generate parsers for processing
structured input.
• 0[xX][0-9a-fA-F]+
c) Define cross-compiler.
• A cross-compiler is a compiler that runs on one platform but generates executable code for a
different platform.
e) What are sentinels?
• Sentinels are special values placed at the end of an array to eliminate the need for boundary
checking, improving efficiency.
• An annotated parse tree is a parse tree where each node contains attribute values that help
in semantic analysis.
g) Name the types of LR parser.
• SLR (Simple LR)
• CLR (Canonical LR)
• LALR (Look-Ahead LR)
• A basic block is a sequence of statements that always execute sequentially without branching
(except at the end).
• The retract() function moves the lookahead pointer backward in lexical analysis when an
extra character is read ahead.
LR(1) items:
1. S' → · S, $
2. S' → S ·, $
3. S → ·, $ (since S → ε, the right-hand side is empty and the dot is already at the end)
The LR(1) item for S → ε simply means the parser has recognized an empty production and will
reduce S → ε when encountering $ (end of input).
• Dead code refers to code that never gets executed or does not affect the program output,
making it unnecessary and removable during optimization.
• Shift-Reduce Conflict
• Reduce-Reduce Conflict
e) State one difference between an Annotated Parse Tree and a Dependency Graph.
• Annotated Parse Tree shows attribute values at each node in the parse tree.
• Dependency Graph shows dependencies between attributes to determine evaluation order.
f) List the techniques used in code optimization.
• Constant Folding
• Dead Code Elimination
• Common Subexpression Elimination
• Loop Optimization (Loop Unrolling, Loop Invariant Code Motion)
• Peephole Optimization
• Augmenting the grammar introduces a new start symbol S' with the production S' → S, so that
the parser has one unique reduction that signals acceptance of the input.
• The output of lexical analysis is a stream of tokens, which are used as input for the syntax
analyzer (parser).
• True. LR parsers do not have Shift-Shift conflicts. The automaton of item sets is deterministic,
so each state has at most one transition on a given input symbol and two different shifts can
never compete.
Q2
a) Write a short note on s-attributed grammar.
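Key Features:
• Uses only synthesized attributes.
• Each attribute is computed from the attributes of child nodes, so evaluation proceeds bottom-up
and can be done during LR parsing.
• Example: postfix expression evaluation.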
An identifier in LEX is a letter (A-Z, a-z) or underscore (_) followed by zero or more letters,
digits (0-9), or underscores. It must not start with a digit. As a pattern: [A-Za-z_][A-Za-z0-9_]*
• An Annotated Parse Tree is a parse tree where each node is associated with attribute values
that help in semantic analysis. These attributes can be synthesized or inherited and are used to
store information such as data types, values, or symbol table entries.
Example:
Consider the expression E → E1 + T, where E1 and T have synthesized attributes for value
computation.
            E (val = 5)
          /     |     \
E1 (val = 3)    +    T (val = 2)
b) List and explain in short any two LEX library functions (2 marks each).
1. yylex()
o This function is the main lexical analyzer function generated by LEX.
o It reads the input stream, matches patterns, and returns the corresponding tokens to the
parser.
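2. yytext
o yytext is a global variable (a character array in classic lex, a character pointer by default
in flex) that holds the text of the most recently matched token. Strictly it is a variable, not a
function.
o Example: If 123 is matched as a number, yytext will store "123".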
c) Give 2 differences between synthesized and inherited attributes.
Given grammar:
1. S → a | ε | (R)
2. T → S, T | S
3. R → T
FIRST sets (the comma in T → S, T is a terminal):
• FIRST(S) = {a, (, ε}
• FIRST(T) = {a, (, ',', ε} (the comma is included because S can derive ε in T → S, T)
• FIRST(R) = FIRST(T) = {a, (, ',', ε}
FOLLOW sets (taking S as the start symbol):
• FOLLOW(S) = { $, ',', ) }
• FOLLOW(T) = { ) }
• FOLLOW(R) = { ) }
Given grammar:
1. E → E + T | T
2. T → T * F | F
3. F → (E) | id
• LEADING(E) = { +, *, (, id }
• LEADING(T) = { *, (, id }
• LEADING(F) = { (, id }
• TRAILING(E) = { +, *, ), id }
• TRAILING(T) = { *, ), id }
• TRAILING(F) = { ), id }
a) Construct the DAG for the expression:
Expression: b * (a + c) + (a + c) * d
DAG Representation:
          (+)
         /   \
      (*)     (*)
     /   \   /   \
    b     (+)     d
         /   \
        a     c
The common subexpression a + c is a single (+) node shared by both (*) nodes.
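A DAG like this can be built by looking up each (operator, left, right) combination before creating a node, so a repeated subexpression maps to the node that already exists. The sketch below does this with a small linear table. The struct layout and the names dag_intern, dag_leaf, dag_op and build_example are hypothetical.

```c
#include <assert.h>

#define MAX_NODES 64

struct dag_node {
    char op;          /* operator, or 0 for a leaf */
    char name;        /* variable name for a leaf, else 0 */
    int left, right;  /* child indices, -1 for a leaf */
};

struct dag {
    struct dag_node nodes[MAX_NODES];
    int count;
};

/* Return the index of an existing identical node, or create a new one. */
static int dag_intern(struct dag *g, char op, char name, int left, int right)
{
    for (int i = 0; i < g->count; i++) {
        const struct dag_node *n = &g->nodes[i];
        if (n->op == op && n->name == name && n->left == left && n->right == right)
            return i;
    }
    assert(g->count < MAX_NODES);
    g->nodes[g->count] = (struct dag_node){ op, name, left, right };
    return g->count++;
}

int dag_leaf(struct dag *g, char name) { return dag_intern(g, 0, name, -1, -1); }
int dag_op(struct dag *g, char op, int l, int r) { return dag_intern(g, op, 0, l, r); }

/* Build the DAG for b * (a + c) + (a + c) * d and return the root index. */
int build_example(struct dag *g)
{
    g->count = 0;
    int a = dag_leaf(g, 'a');
    int b = dag_leaf(g, 'b');
    int c = dag_leaf(g, 'c');
    int d = dag_leaf(g, 'd');
    int left  = dag_op(g, '*', b, dag_op(g, '+', a, c));
    int right = dag_op(g, '*', dag_op(g, '+', a, c), d);  /* a + c is found, not rebuilt */
    return dag_op(g, '+', left, right);
}
```

The resulting DAG has 8 nodes (4 leaves and 4 operators), where the corresponding expression tree would have 11.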
Basic Tasks:
Auxiliary Tasks:
• S-Attributed Grammar:
o Uses only synthesized attributes, which are evaluated from child nodes to parent in
the parse tree.
o Example: Used in postfix expression evaluation.
• L-Attributed Grammar:
o Uses both synthesized and inherited attributes. Each inherited attribute may depend
only on the parent and on siblings to its left, so evaluation follows a left-to-right traversal.
o Example: Used in type checking in a compiler.
• Synthesized Attribute:
o An attribute computed from child nodes and passed up in the parse tree.
o Example: Expression evaluation (E.val = E1.val + T.val).
• Inherited Attribute:
o An attribute derived from parent or sibling nodes and passed down or sideways in
the parse tree.
o Example: Type checking in variable declarations.
c)Directed Acyclic Graph (DAG) Construction for the Given Block
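Step 1: Identify Common Subexpressions
• a[i] appears in both b = a[i] and e = a[i], but it cannot be shared: the assignment a[j] = d
lies between the two uses and changes a[i] whenever i = j.
• a[j] = d therefore kills the node built for the first a[i], and e = a[i] gets a new node.
DAG Representation
   =[] (b, killed)        []=            =[] (e)
     /    \             /  |  \          /    \
    a      i           a   j   d        a      i
(The leaves a and i are single shared nodes; they are repeated only for layout.)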
Explanation:
Definition:
• Convert A → Aα | β into:
o A → βA'
o A' → αA' | ε
• Example:
o Given A → A + T | T,
o Convert to A → T A' and A' → + T A' | ε.
This transformation ensures that recursion happens at the rightmost position instead of the left,
making the grammar suitable for top-down parsing.
a) Define SDD and SDT. State the task performed by SDT.
Syntax Directed Definition (SDD) is a formalism used in compilers where semantic rules are
associated with grammar productions. These rules define how attributes are computed based on
syntax structure.
Syntax Directed Translation (SDT) is a method where semantic actions (code snippets) are
embedded within the grammar productions to perform translations during parsing.
Example:
E → E1 + T { print('+'); }
Here, { print('+'); } is an SDT action that prints + when the rule is applied.
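To make the effect of such an action concrete, the sketch below translates a sum of single digits to postfix by running the action of E → E1 + T after T has been parsed. The rule T → digit with a digit-emitting action, the output buffer, and the name to_postfix are assumptions added for this example. The left-recursive rule is implemented as a loop.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Translate a sum of single digits to postfix.
 * Scheme assumed for this sketch:
 *   E -> E1 + T   { emit '+' }
 *   E -> T
 *   T -> digit    { emit digit }
 * Returns 0 on success, -1 on a syntax error or if out is too small. */
int to_postfix(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;

    if (!isdigit((unsigned char)*in) || out_size < 2)
        return -1;
    out[n++] = *in++;                 /* action of T -> digit */

    while (*in == '+') {              /* each iteration is one use of E -> E1 + T */
        in++;
        if (!isdigit((unsigned char)*in) || n + 3 > out_size)
            return -1;
        out[n++] = *in++;             /* action of T -> digit */
        out[n++] = '+';               /* action of E -> E1 + T, run after T is parsed */
    }
    if (*in != '\0')
        return -1;
    out[n] = '\0';
    return 0;
}
```

For the input "1+2+3" the buffer receives "12+3+", the postfix form of (1 + 2) + 3.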
• Synthesized attribute: Computed from child nodes and passed up.
• Inherited attribute: Derived from parent or sibling nodes and passed down/sideways.
%{
#include <stdio.h>
%}
%%
[0-9]+     { printf("Number detected: %s\n", yytext); }
[a-zA-Z]+  { printf("Word detected: %s\n", yytext); }
\n         { /* Ignore newlines */ }
.          { printf("Symbol detected: %s\n", yytext); }
%%
int main(void) {
    printf("Enter text: ");
    yylex();               /* scan standard input until end of file */
    return 0;
}
int yywrap(void) { return 1; }   /* 1 = no further input */
a) Write a LEX program to find factorial of a given number
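%{
#include <stdio.h>
#include <stdlib.h>

/* Iterative factorial; the result fits in 64 bits only for n <= 20. */
unsigned long long factorial(int n) {
    unsigned long long result = 1;
    for (int i = 2; i <= n; i++)
        result *= i;
    return result;
}
%}
%%
[0-9]+  {
            /* strtol saturates on very long digit strings,
               whereas atoi has undefined behavior on overflow. */
            long num = strtol(yytext, NULL, 10);
            if (num > 20)
                printf("Factorial of %s overflows 64 bits\n", yytext);
            else
                printf("Factorial of %ld is %llu\n", num, factorial((int)num));
        }
.|\n    { /* Ignore other characters */ }
%%
int main(void) {
    printf("Enter a number: ");
    yylex();
    return 0;
}
int yywrap(void) { return 1; }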
b) What is multi-pass compiler? Explain diagrammatically with its
advantages and disadvantages.
[Figure: multi-pass compiler. Labels recovered from the diagram: Source code; Pass n; Object
code; Grammatical rules with data structure; Error routine table; Variable, constant, symbol and
literal.]
Multi-Pass Compiler
A multi-pass compiler processes the source code in multiple passes, where each pass analyzes and
transforms the program before passing it to the next phase.
Code optimization improves the performance of the compiled code by making it faster and more
efficient. Some common techniques are:
Common Subexpression Elimination
Before Optimization
a = b * c + d;
x = b * c + e;
After Optimization
temp = b * c;
a = temp + d;
x = temp + e;
1. yytext
• Definition:
yytext is a character array that stores the matched token from the input.
• Usage:
o Helps in identifying token values like keywords, numbers, or identifiers.
o Used for further processing like symbol table insertion.
• Example:
[0-9]+ { printf("Number: %s\n", yytext); }
o If input is "123", the output will be:
Number: 123
2. yywrap()
• Definition:
This function is called when the input file ends.
• Default Behavior:
o Returns 1 to signal the end of lexical analysis.
• Usage:
o Can be overridden to process multiple input files.
• Example:
int yywrap() { return 1; } // Ends processing
o If overridden to continue, yywrap() must first point yyin at the next input file and then
return 0:
int yywrap() { return 0; } // Scanning continues from the new yyin
These functions help in token recognition and input handling in Lex programs.
a) Write the steps of creation of lexical analyzer on lex. Explain the lex
library functions associated with lex.
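A lexical analyzer (lexer) reads the input source code and converts it into tokens. The steps to create
a lexical analyzer using Lex are:
1. Write the Lex specification file (for example scanner.l, a hypothetical name) with its three
sections, definitions, rules, and user subroutines, separated by %%.
2. Run lex on the specification (lex scanner.l) to generate the C source file lex.yy.c.
3. Compile lex.yy.c with a C compiler and link the lex library (cc lex.yy.c -ll) to produce an
executable.
4. Run the executable on the input; it calls yylex() to recognize the tokens.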
• yylex(): The main function that scans and processes input based on defined patterns.
These functions help in scanning, processing, and handling tokens in a Lex-based lexical analyzer.
a) Define Annotated Parse tree. Give an example.
An Annotated Parse Tree is a parse tree where each node is associated with attribute values that
provide semantic information. These attributes help in type checking, intermediate code
generation, and semantic analysis during compilation.
Example
E → E1 + T
E1 → T
T → id
            E (val = x + y)
          /       |       \
  E1 (val = x)    +    T (val = y)
       |                    |
  T (val = x)            id (y)
       |
    id (x)
Each node carries computed values or attributes, which are later used for semantic analysis or
code generation.
d) Give 2 differences between synthesized and inherited attributes.
Example
E → T + E1
E.val = T.val + E1.val // Synthesized: computed from the children T and E1
T → id
T.type = E.type // Inherited: type passed down from parent node E
Synthesized attributes are the more common kind in syntax-directed translation, while inherited
attributes carry context, such as a declared type, down or across the parse tree.