CD Important Questions
Front-end: The front-end of the compiler deals with the syntax and semantics
of the source code. It is responsible for reading the source code, checking for
syntax and semantic errors, and producing an intermediate representation
(IR) of the program.
3. Semantic Analysis
Input: The syntax tree (AST).
Output: A modified AST, annotated with type information and error
reports.
Actions:
The semantic analyzer checks for logical errors, like type
mismatches, undeclared variables, etc.
It ensures that operations are semantically correct (e.g., adding two
integers).
It may annotate the AST with additional information (like variable
types).
5. Optimization
Input: Intermediate code.
Output: Optimized intermediate code.
Actions:
The optimizer improves the intermediate code to reduce resource
usage (like time or memory) without changing the program's
functionality.
Common optimizations include constant folding, loop unrolling,
dead code elimination, and more.
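One of the optimizations named above, constant folding, can be sketched over a toy expression tree. The `Expr` structure and the `+`/`*`-only operator set are hypothetical simplifications for illustration, not any particular compiler's IR:

```cpp
#include <cassert>
#include <memory>

// A tiny expression node: either a constant leaf or a binary '+'/'*'
// over two children. A made-up structure for illustration only.
struct Expr {
    char op;          // '+', '*', or 'c' for a constant leaf
    long value;       // meaningful only when op == 'c'
    std::shared_ptr<Expr> lhs, rhs;
};

std::shared_ptr<Expr> leaf(long v) {
    return std::make_shared<Expr>(Expr{'c', v, nullptr, nullptr});
}
std::shared_ptr<Expr> node(char op, std::shared_ptr<Expr> l,
                           std::shared_ptr<Expr> r) {
    return std::make_shared<Expr>(Expr{op, 0, l, r});
}

// Constant folding: if both operands of an operator are constants,
// replace the operator node with the computed constant.
std::shared_ptr<Expr> fold(std::shared_ptr<Expr> e) {
    if (e->op == 'c') return e;
    auto l = fold(e->lhs), r = fold(e->rhs);
    if (l->op == 'c' && r->op == 'c')
        return leaf(e->op == '+' ? l->value + r->value : l->value * r->value);
    return node(e->op, l, r);
}
```

A real optimizer would repeat such rewrites over its intermediate representation until no more apply.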
6. Code Generation
Input: Optimized intermediate code.
Output: Target machine code or assembly language.
Actions:
The code generator translates the intermediate code into machine
code that can be executed by the CPU or into assembly language,
which can later be assembled into machine code.
It maps variables and operations to specific CPU instructions.
3. What is lexical analysis? What tasks are performed by the lexical analyzer?
Explain the role of the lexical analyzer in the compilation process.
Lexical Analysis is the first phase of a compiler. It takes as input source code
written in a high-level language, reads it, and breaks it down into meaningful
elements called tokens. These tokens become the building blocks for the
later phases of compilation. The lexical analyzer is also known as a scanner.
What is a Lexeme?
The sequence of characters matched by a pattern to form the corresponding
token, i.e., the sequence of input characters that comprises a single token, is
called a lexeme.
Disadvantages
1) Limited Context: Lexical analysis operates on individual tokens and
does not consider the overall context of the code.
2) Overhead: Although lexical analysis is necessary for the compilation or
interpretation process, it adds an extra layer of overhead.
3) Debugging Challenges: Lexical errors detected during the analysis phase
may not always provide clear indications of their origins in the original
source code.
After lexical analysis, tokens are passed on to the syntax analysis phase,
which checks whether the code structure complies with the grammar of the
chosen programming language.
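The token-producing role described above can be sketched as a minimal scanner. The token kinds here (`identifier`, `number`, `punctuation`) are a made-up simplification; a real lexer would also distinguish keywords and handle comments, strings, and multi-character operators:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// A token is a (kind, lexeme) pair.
struct Token { std::string kind, lexeme; };

// Minimal scanner sketch: splits input into identifier, number, and
// single-character punctuation tokens, skipping whitespace.
std::vector<Token> scan(const std::string& src) {
    std::vector<Token> tokens;
    size_t i = 0;
    while (i < src.size()) {
        if (std::isspace((unsigned char)src[i])) { ++i; continue; }
        if (std::isalpha((unsigned char)src[i])) {          // identifier
            size_t j = i;
            while (j < src.size() && std::isalnum((unsigned char)src[j])) ++j;
            tokens.push_back({"identifier", src.substr(i, j - i)});
            i = j;
        } else if (std::isdigit((unsigned char)src[i])) {   // number
            size_t j = i;
            while (j < src.size() && std::isdigit((unsigned char)src[j])) ++j;
            tokens.push_back({"number", src.substr(i, j - i)});
            i = j;
        } else {                                            // punctuation
            tokens.push_back({"punctuation", std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

For the input `a = 10 ;`, this yields four tokens: an identifier, a punctuation symbol, a number, and a punctuation symbol.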
4. Explain error recovery strategies used in a compiler.
Errors may occur at various levels of compilation, so error handling is
important for the correct execution of code. There are mainly five error
recovery strategies, which are as follows:
1. Panic mode
2. Phrase level recovery
3. Error production
4. Global correction
5. Symbol table
Panic Mode
This strategy is used by most parsing methods. On discovering an error, the
parser discards input symbols one at a time. This process is continued until
one of a designated set of synchronizing tokens is found. These tokens
indicate the end of the input statement.
Advantages:
1. It is easy to use.
2. The parser never falls into an infinite loop.
Disadvantage:
1. This technique may let semantic or runtime errors slip through to
later stages.
Phrase Level Recovery
On discovering an error, the parser performs local correction on the
remaining input, for example replacing an erroneous prefix with a string
that allows parsing to continue.
Disadvantages:
While doing the replacement, the parser must be prevented from
falling into an infinite loop.
Error Production
If error productions are used during parsing, we can generate an appropriate
error message to indicate the error that has been recognized in the input.
Advantages:
1. Syntactic phase errors are generally recovered by error
productions.
Disadvantages:
1. The method is very difficult for developers to maintain, because if
we change the grammar, it becomes necessary to change the
corresponding error productions as well.
Global Correction
Global correction methods increase the time and space requirements at
parsing time. This is simply a theoretical concept.
Advantages:
It makes very few changes in processing an incorrect input string.
Disadvantages:
It is simply a theoretical concept and is unimplementable in practice.
Symbol Table:
Semantic errors are recovered by consulting the symbol table for the
corresponding identifier; if the data types of two operands are not
compatible, type conversion is performed automatically by the compiler.
Advantages:
It allows basic type conversion, which we generally do in real-life
calculations.
Disadvantages:
Only implicit type conversion is possible.
5. Define lexeme, token, and pattern. Provide examples.
Lexeme
A lexeme is a sequence of source-code characters that matches one of the
predefined patterns and thereby forms a valid token. Lexemes must follow
the rules of the language in order to be recognized as valid tokens.
Example:
main is a lexeme of type identifier (token)
(, ), {, } are lexemes of type punctuation (token)
Token
In programming, a token is the smallest unit of meaningful data; it may be an
identifier, keyword, operator, or symbol. A token represents a sequence of
characters that cannot be decomposed further.
Example
int a = 10; //Input source code
Tokens
int (keyword), a (identifier), = (operator), 10 (constant) and
; (punctuation-semicolon)
Pattern
A pattern is a rule or syntax that designates how tokens are identified in a
programming language. It specifies the sequences of characters or symbols
that make up valid tokens, and tells the scanner how to recognize them
correctly.
Example
For a keyword to be identified as a valid token, the pattern is the exact
sequence of characters that makes up the keyword.
For an identifier to be identified as a valid token, the pattern is the
predefined rule that it must start with a letter, followed by letters
or digits.
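The identifier pattern just described can be expressed as a small predicate. This is a sketch of the rule as stated above; most real languages also allow underscores, which is omitted here:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Checks the identifier pattern: a letter followed by letters or digits.
bool is_identifier(const std::string& s) {
    if (s.empty() || !std::isalpha((unsigned char)s[0])) return false;
    for (char c : s)
        if (!std::isalnum((unsigned char)c)) return false;  // reject symbols
    return true;
}
```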
6. Explain synthesized and inherited attributes with examples.
Synthesized Attributes & Inherited Attributes
7. What are conflicts in an LR parser? What are their types? Explain with an
example.
Conflicts in an LR parser occur when the parser encounters an ambiguity in
the parsing table, meaning it has more than one possible action for a given
input symbol and stack configuration. These conflicts arise because the
parser cannot decide unambiguously whether to shift (read more input) or
reduce (apply a grammar rule) at certain points in the parsing process.
Shift-Reduce Conflict
A shift-reduce conflict occurs when the parser cannot decide whether to
shift the next input symbol or to reduce by a completed production.
Example:
Imagine we have a math expression like id + id * id:
When the parser sees + followed by *, it might be unclear whether to:
Shift the *, because multiplication (*) usually takes precedence over
addition (+), or
Reduce using the +, if the grammar does not specify that * takes priority.
The parser is stuck between two actions, shifting or reducing, causing a
shift-reduce conflict.
Reduce-Reduce Conflict
A reduce-reduce conflict occurs when a parser reaches a state in which
more than one reduction is possible, i.e., multiple production rules can be
applied to the same grammar symbols in that state.
Example:
Consider the grammar:
S → A | B
A → id
B → id
After reading id, the parser could reduce by A → id or by B → id. Both
reductions are legal in the same state, so the parsing-table entry holds two
reduce actions, causing a reduce-reduce conflict.
(Note that the dangling-else ambiguity in
if condition then if condition then statement else statement
gives rise to a shift-reduce conflict instead: the parser must choose between
shifting the else and reducing the inner if statement.)
8. Differentiate between top-down and bottom-up parsing.
Top-down & Bottom-up parsing
Example
For the expression id + id * id (parsed top-down):
1. The parser first looks for a term (T), which is the first id.
2. It then checks for the + symbol, indicating it needs another expression.
3. It looks for another term (id), then checks for * (multiplication).
4. Finally, it looks for the last term (id).
The parser successfully matches all parts of the expression, breaking it down
into smaller terms and operations. Each step follows the grammar rules,
which helps the parser understand the structure of the input.
10. Describe the role of the lexical analyzer in the compilation process.
The lexical analyzer (also called a lexer or scanner) plays a crucial role in the
compilation process. Its main job is to break down the source code into a
sequence of tokens, which are the meaningful components of the code.
Debugging difficulties: Debugging generated code can be more difficult than
debugging hand-written code, as the generated code may not always be easy
to read or understand. This can make it harder to identify and fix issues that
arise during development.
Learning curve: Code generators can have a steep learning curve, as they
typically require a deep understanding of the underlying code generation
framework and the programming languages being used. This can make it
more difficult to onboard new developers onto a project that uses a code
generator.
2) Linker
Function: The linker is responsible for combining multiple object files
generated by the compiler into a single executable program.
Tasks:
o Resolves references between different object files (e.g., function
calls, variable references).
o Combines libraries (static or dynamic) with the object files.
o Arranges the memory addresses for the program's code and
data.
Output: The output of the linker is an executable file or a library.
3) Loader
Function: The loader is part of the operating system and is responsible
for loading the executable file into memory for execution.
Tasks:
o Allocates memory space for the program.
o Loads the executable code and data into memory.
o Initializes the program's runtime environment (setting up the
stack, heap, etc.).
Output: Once the loader finishes its work, the program is ready to be
executed by the CPU.
13. What is left recursion? How is it eliminated in context-free grammars (CFG)?
Left Recursion
Left recursion is a common problem that occurs in a grammar during parsing
in the syntax analysis part of compilation. Left recursion occurs when a
grammar rule refers to itself at the beginning, making it difficult for a parser
to proceed without looping indefinitely.
Example of Left Recursion
Consider the rule:
A → Aα | β
Here, A on the left side is called again immediately on the right side (Aα),
which is left recursion. This can cause an infinite loop in recursive descent
parsers, as the parser would continuously try to apply A → Aα.
Left recursion is eliminated by rewriting the rule as:
A → βA'
A' → αA' | ε
For example, E → E + T | T becomes:
E → T E'
E' → + T E' | ε
Now:
E starts with T, followed by E'.
E' recursively adds + T without immediately calling E again, removing
the left recursion.
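With the left recursion removed, a recursive-descent parser can follow the rules directly. Below is a minimal recognizer sketch for E → T E', E' → + T E' | ε, using the single character `i` to stand in for an id token (a simplifying assumption for illustration):

```cpp
#include <cassert>
#include <string>

// Recursive-descent recognizer for the left-recursion-free grammar
//   E  -> T E'
//   E' -> + T E' | epsilon
// with T simplified to the single token 'i' (an id).
struct Parser {
    const std::string& in;
    size_t pos = 0;
    bool parseT() {                      // T -> i
        if (pos < in.size() && in[pos] == 'i') { ++pos; return true; }
        return false;
    }
    bool parseEprime() {                 // E' -> + T E' | epsilon
        if (pos < in.size() && in[pos] == '+') {
            ++pos;
            return parseT() && parseEprime();
        }
        return true;                     // epsilon: no '+' ahead
    }
    bool parseE() { return parseT() && parseEprime(); }  // E -> T E'
};

// Accept only if the whole input is consumed.
bool accepts(const std::string& s) {
    Parser p{s};
    return p.parseE() && p.pos == s.size();
}
```

Because `parseEprime` always consumes a `+` before recursing, the parser makes progress on every call and can no longer loop forever the way it would on the left-recursive rule E → E + T.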
14. What is left factoring in CFG? Provide an example.
Left Factoring
Left factoring in context-free grammars (CFG) is a transformation technique
used to remove ambiguity by factoring out common prefixes in productions.
This makes the grammar more suitable for top-down parsers, which need a
clear decision point when selecting which rule to apply.
Definition
Left factoring is the process of rewriting a grammar to ensure that if two or
more productions for a non-terminal share a common prefix, that prefix is
factored out. This avoids situations where the parser has to make arbitrary
choices between rules that start the same way.
Purpose
The purpose of left factoring is to make a grammar more deterministic and
avoid ambiguity, which is essential for parsers like recursive descent parsers
that process input in a top-down manner.
Example
Suppose the grammar is of the form:
A ⇒ αβ1 | αβ2 | αβ3 | …… | αβn | γ
We separate the productions with a common prefix and add a new
production rule in which the newly introduced non-terminal derives the
suffixes of those productions:
A ⇒ αA' | γ
A' ⇒ β1 | β2 | β3 | …… | βn
The top-down parser can easily parse this grammar to derive a given string.
This is how left factoring in compiler design is performed on a given
grammar.
15. What are synthesized attributes? How are they used in syntax-directed
translation?
Synthesized Attributes
Synthesized attributes are properties associated with nodes in a parse tree
and are used in syntax-directed translation (SDT) to pass information up the
tree. These attributes are computed based on the values of child nodes in the
parse tree. In other words, a synthesized attribute at a particular node is
derived from the attributes of its children.
Example
Consider a simple arithmetic grammar for expressions:
E → E + T
  | T
T → T * F
  | F
F → ( E )
  | id
Suppose we want to compute the value of an arithmetic expression using
synthesized attributes. Here is how the attributes could work in this grammar:
1. Define a synthesized attribute val for each node in the parse tree to store
the computed value.
2. For each production rule, specify how to compute val based on the child
nodes.
For instance:
In the production E → E + T, the value of E.val would be computed as:
E.val = E.val (from left child) + T.val (from right child)
In the production T → T * F, T.val could be computed as:
T.val = T.val (from left child) * F.val (from right child)
3. This process allows the computed values to propagate up from the leaf
nodes to the root, ultimately producing the value of the entire expression.
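The val rules above can be sketched as a syntax-directed evaluator in which each parse function returns the synthesized val of its nonterminal. Two simplifying assumptions: operands are single digits standing in for id, and the left-recursive rules are evaluated by iteration (well-formed input is assumed):

```cpp
#include <cassert>
#include <string>

// Each parse function returns the synthesized attribute `val` of its
// nonterminal, computed from the vals of its children, for the grammar
//   E -> E + T | T,  T -> T * F | F,  F -> ( E ) | digit.
struct Evaluator {
    const std::string& in;
    size_t pos = 0;
    long parseF() {                      // F.val
        if (in[pos] == '(') {
            ++pos;                       // consume '('
            long v = parseE();
            ++pos;                       // consume ')'
            return v;
        }
        return in[pos++] - '0';          // single-digit operand
    }
    long parseT() {                      // T.val = T.val * F.val
        long v = parseF();
        while (pos < in.size() && in[pos] == '*') { ++pos; v *= parseF(); }
        return v;
    }
    long parseE() {                      // E.val = E.val + T.val
        long v = parseT();
        while (pos < in.size() && in[pos] == '+') { ++pos; v += parseT(); }
        return v;
    }
};

long eval(const std::string& s) { Evaluator e{s}; return e.parseE(); }
```

The value flows strictly upward: `parseF` supplies leaf vals, and each caller combines its children's vals, exactly the synthesized-attribute pattern described above.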
16. Discuss various storage allocation strategies in a compiler.
In a compiler, storage allocation strategies determine how memory is
allocated and managed for variables, data structures, and control
information. These strategies are essential for using memory efficiently
during program execution. The main storage allocation strategies in a
compiler are static allocation, stack allocation, and heap allocation.
Static Allocation
In static allocation, memory is assigned to variables, constants, and fixed-size
data structures at compile time. The allocated memory locations remain fixed
throughout program execution.
For example:
int number = 1;
static int digit = 1;
Heap Allocation
In heap allocation, memory is allocated and freed at run time, in any order,
typically through operators such as new/delete or malloc/free.
For example:
int* ans = new int[5];
Stack Allocation
Stack allocation is a form of dynamic allocation, meaning memory is
allocated at run time. The stack is a data structure that follows the LIFO
principle: activation records are pushed onto the stack as activations begin
and popped as they end. Local variables are bound to new storage on each
call, because storage is allocated at run time every time a procedure or
function call is made.
For example:
void sum(int a, int b) { int ans = a + b; cout << ans; }
1) First Fit
The allocator scans the list of free blocks and allocates the first block that
is large enough.
Steps:
Traverse the list of free blocks in memory.
Allocate the first block that fits the requested size.
Split the block if it is larger than required, leaving the remainder as a
free block.
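The first-fit steps above can be sketched over a free list of (offset, size) blocks. The `Block` layout and the `demo_alloc` helper with its initial free list are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A free block: starting offset and size, both in bytes.
struct Block { std::size_t offset, size; };

// First fit: return the offset of the first block big enough for the
// request (splitting it if larger), or -1 if no block fits.
long first_fit(std::vector<Block>& free_list, std::size_t request) {
    for (std::size_t i = 0; i < free_list.size(); ++i) {
        if (free_list[i].size >= request) {
            long result = (long)free_list[i].offset;
            if (free_list[i].size == request) {
                free_list.erase(free_list.begin() + i);   // exact fit
            } else {                                      // split the block
                free_list[i].offset += request;
                free_list[i].size -= request;
            }
            return result;
        }
    }
    return -1;                                            // nothing fits
}

// Demo helper: allocate once from a fixed initial free list.
long demo_alloc(std::size_t request) {
    std::vector<Block> fl = {{0, 8}, {100, 32}};
    return first_fit(fl, request);
}
```

Best fit and worst fit differ only in the selection rule: instead of taking the first block that fits, they scan the whole list for the smallest (or largest) sufficient block.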
2) Best Fit
The allocator searches for the smallest available free block that is large
enough to satisfy the allocation request.
Steps:
Traverse the list of free blocks.
Select the smallest block that fits the requested size to minimize
leftover space.
Allocate this block and, if necessary, split it.
Advantages: Reduces wasted memory because it leaves the smallest
amount of unused space.
Disadvantages: Can be slower because it requires searching the entire list
of free blocks, and may lead to fragmentation as small leftover blocks
become unusable.
3) Worst Fit
The allocator finds the largest available free block and uses it to satisfy the
allocation request.
Steps:
Traverse the list of free blocks.
Allocate the largest block found, so that the leftover fragment remains
large enough to be usable for future allocations.
4) Buddy System
This technique splits memory into blocks whose sizes are powers of two
(e.g., 4KB, 8KB, 16KB). When memory is requested, the allocator finds the
smallest block that can fit the request and splits it into two "buddies" if
necessary.
Steps:
Find a block of the required size, or split a larger block to create it.
When blocks are freed, they are combined (if possible) with their
"buddy" to recreate larger blocks.
5) Memory Pool
The allocator pre-allocates a pool of fixed-size blocks and hands them out
for objects of a specific type.
Steps:
Allocate a pool of fixed-size blocks for specific data structures or
objects.
When an object of that type is needed, the allocator provides a block
from the pool.
When the object is no longer needed, the block is returned to the pool
for reuse.
A dangling pointer points to a memory location that no longer contains the
content it used to hold.
Suppose we deallocate memory using the free or delete methods. This
returns the memory to the allocator, which can then assign it again, either
to some other variable or to a completely different program. However, the
problem is that we still have pointers that point to that exact memory
location. These pointers can later access the location and corrupt data.
We call such pointers dangling pointers.
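A common defensive idiom against dangling pointers is to null the pointer immediately after deallocation, as in this sketch:

```cpp
#include <cassert>

// After delete, the pointer variable still holds the old address: it
// "dangles". Setting it to nullptr right away makes accidental reuse
// detectable instead of silently corrupting data.
bool null_after_delete() {
    int* p = new int(42);
    delete p;          // memory returned to the allocator; *p is now invalid
    p = nullptr;       // p no longer dangles
    return p == nullptr;
}
```

Any later dereference of a null pointer fails immediately and visibly, whereas a dangling pointer may appear to work while reading or corrupting reused memory.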
19. Explain various parameter passing methods (Call-by-value, Call-by-
reference, Copy-Restore, Call-by-Name).
Parameter passing methods determine how arguments are passed to
functions in programming languages. Each method has different implications
for memory use, function execution, and the way arguments behave within
the function.
1) Call-by-Value
In call-by-value, a copy of the argument's value is passed to the function.
The function works on this copy, so any changes made to the parameter
within the function do not affect the original argument.
Advantages:
Prevents unintended side effects, as changes in the function do not alter
the original variable.
Disadvantages:
Can be memory-intensive for large data structures, as a copy is created
for each argument.
2) Call-by-Reference
In call-by-reference, a reference (or address) of the argument is passed to
the function. The function can directly modify the original variable, as
both the function parameter and the argument refer to the same memory
location.
Example: In C++, you can pass parameters by reference using &, while
languages like Python use references for mutable objects.
Advantages:
Saves memory and time since there is no copying involved.
Allows the function to modify the original argument, which can be
useful in cases like updating data structures.
Disadvantages:
Risk of unintended side effects, as changes within the function affect
the original variable.
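The difference between these two methods can be seen in a short C++ sketch (the function and variable names are made up for illustration):

```cpp
#include <cassert>

// Call-by-value: the function receives a copy, so the caller's
// variable is unchanged by the increment.
void inc_by_value(int x) { x = x + 1; }

// Call-by-reference (C++ '&'): the parameter aliases the caller's
// variable, so the increment is visible to the caller.
void inc_by_reference(int& x) { x = x + 1; }

int demo_value()     { int a = 5; inc_by_value(a);     return a; }
int demo_reference() { int a = 5; inc_by_reference(a); return a; }
```

`demo_value` still returns the original 5, while `demo_reference` observes the modification and returns 6.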
3) Copy-Restore (Call-by-Value-Result)
This method combines aspects of call-by-value and call-by-reference. The
argument's value is copied into the function (like call-by-value), but when
the function returns, the modified value is copied back to the original
argument (like call-by-reference).
Advantages:
Allows the function to work with a copy, preventing side effects during
execution, but still passes back modifications.
Disadvantages:
Can be memory-intensive, as two copies (initial and final) are created.
4) Call-by-Name
In call-by-name, the argument expression is not evaluated immediately.
Instead, it is substituted directly into the function, and each time it is
accessed within the function, it is re-evaluated. This means that any
changes to variables used in the argument are reflected immediately.
Advantages:
Provides flexibility and allows for constructs like lazy evaluation.
Disadvantages:
Can lead to inefficient performance if the argument is complex, as it
gets evaluated multiple times.
20. What is a DAG? What are its advantages in the context of optimization?
How does it help in eliminating common subexpressions?
A Directed Acyclic Graph (DAG) is a data structure used to represent
expressions or computations in a way that avoids duplication of common
subexpressions. In compiler design, DAGs are often applied during the
optimization phase to improve the efficiency of code by recognizing and
eliminating redundant calculations.
Only three operations (two additions and one multiplication) are now needed
to compute e, even though the original expression had duplicate calculations.
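The sharing a DAG provides can be sketched with value numbering: each (operator, operand, operand) triple is remembered, and a repeated triple reuses the existing node instead of creating a new one. The source's original expression for e is not reproduced in these notes, so the sketch uses the made-up expression e = (a + b) + (a + b) * c, which likewise needs only two additions and one multiplication once a + b is shared:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// DAG construction via value numbering: identical (op, left, right)
// triples map to the same node id, so common subexpressions share
// one node and are computed only once.
struct Dag {
    std::map<std::tuple<char, int, int>, int> seen;   // (op,l,r) -> node id
    std::map<std::string, int> leaves;                // variable -> node id
    int nodes = 0;

    int leaf(const std::string& name) {
        auto it = leaves.find(name);
        if (it != leaves.end()) return it->second;    // reuse the leaf
        return leaves[name] = nodes++;
    }
    int op(char o, int l, int r) {
        auto key = std::make_tuple(o, l, r);
        auto it = seen.find(key);
        if (it != seen.end()) return it->second;      // reuse shared node
        return seen[key] = nodes++;
    }
};

// Build e = (a + b) + (a + b) * c; the a + b node is created once.
int demo_ops() {
    Dag d;
    int ab = d.op('+', d.leaf("a"), d.leaf("b"));
    d.op('+', ab, d.op('*', ab, d.leaf("c")));
    return (int)d.seen.size();   // distinct operator nodes: +, *, + = 3
}
```

A tree for the same expression would build a + b twice; the DAG's triple lookup is exactly what eliminates the common subexpression.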
21. Explain the language-dependent and machine-independent phases of the
compiler. Also, list major functions done by the compiler.
In the compilation process, a compiler undergoes two main types of phases:
language-dependent and machine-independent phases. Each phase serves
distinct purposes, either focusing on understanding the source-code
language or preparing the code for the target machine's architecture.
Language-Dependent Phases
Language-dependent phases focus on understanding and analyzing the
source code, typically in a specific programming language. These phases do
not consider the hardware specifications of the target machine.
Phases:
Lexical Analysis:
Scans the source code, breaks it into tokens (keywords, operators,
identifiers), and removes whitespace and comments.
Syntax Analysis:
Parses the tokens according to the grammar rules of the source language,
constructing a parse tree that represents the code's structure.
Semantic Analysis:
Examines the parse tree for semantic correctness (e.g., type checking,
scope rules), ensuring that the code makes logical sense according to
the language's rules.
Machine-Independent Phases
Machine-independent phases focus on optimizing and preparing the code for
the target machine without being tied to any specific hardware. These phases
are designed to enhance code efficiency across any machine.
Phases:
Intermediate Code Generation:
Transforms the language-dependent structure into an intermediate
form (such as three-address code or abstract syntax trees) that is not
specific to any hardware.
Code Optimization:
Refines the intermediate code by eliminating redundancies, optimizing
control flow, and performing loop optimizations, among other
techniques, to enhance performance.
Components of an Activation Record
A typical activation record consists of the following sections:
1. Return Address:
Stores the address where the function should resume execution after it
completes. When a function call is finished, the program uses this address
to know where to return in the calling function.
4. Parameters:
Stores the values of parameters passed to the function. These may be
passed by value, reference, or other parameter-passing methods. They
are required for the function's execution and act as local variables within
the function.
5. Local Variables:
Allocates space for the function's local variables, which are only accessible
within the function's scope. These variables are destroyed when the
function exits.
6. Temporary Values:
Stores temporary values or intermediate results generated during the
function's execution. This area is often used for expression evaluation.
7. Saved Registers:
Holds copies of register values from the calling function that the current
function may modify. Saving these values ensures that the calling
function's register values are restored after the function call.
3) Support for Recursion: Since each function call has its own activation
record, recursive functions can safely call themselves multiple times
without interfering with previous calls.
Step-by-step Process:
Step-1: Grammar Explanation:
The start symbol S can be replaced by A.
The non-terminal A can be replaced either by aA or by b.
Step-2: LL(1) Parsing Table:
Construct a parsing table based on the first terminal that each non-
terminal can generate.
The table looks something like this:
Non-Terminal | a | b
S | S → A | S → A
A | A → aA | A → b
The parser uses the input symbol to decide which production to
apply. For example:
o If the current symbol is a, the parser uses A → aA.
o If the current symbol is b, the parser uses A → b.
Step-3: Parsing the Input aab:
S ⇒ A ⇒ aA ⇒ aaA ⇒ aab
Step-4: Result:
The parser successfully parses the input aab and constructs the
corresponding parse tree.
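The table-driven choice above (see a and use A → aA, see b and use A → b) maps directly onto a recursive-descent recognizer sketch:

```cpp
#include <cassert>
#include <string>

// Recognizer for the grammar S -> A, A -> aA | b, i.e. strings a...ab.
// The if-chain mirrors the LL(1) table: on 'a' apply A -> aA,
// on 'b' apply A -> b, otherwise reject.
bool parseA(const std::string& s, size_t& pos) {
    if (pos < s.size() && s[pos] == 'a') { ++pos; return parseA(s, pos); } // A -> aA
    if (pos < s.size() && s[pos] == 'b') { ++pos; return true; }           // A -> b
    return false;
}

bool parseS(const std::string& s) {      // S -> A, must consume all input
    size_t pos = 0;
    return parseA(s, pos) && pos == s.size();
}
```

`parseS("aab")` succeeds, following the same derivation S ⇒ A ⇒ aA ⇒ aaA ⇒ aab that the table predicts.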
24. Explain the pass structure of an assembler.
An assembler is a program that translates assembly language code into
machine code (or object code). The translation typically involves multiple
phases or passes. Each pass performs specific tasks and contributes to
converting the assembly code into the final machine-code output.
Pass 1:
Goal: Analyze the source code, resolve labels, and generate the symbol table.
Pass 2:
Goal: Generate the final machine code by replacing placeholders with actual
addresses and creating the executable code.
The small set of instructions, or small part of the code, on which peephole
optimization is performed is known as the peephole or window.
The compiler's optimization process should meet the following objectives:
The optimization must be correct; it must not, in any way, change the
meaning of the program.
Optimization should increase the speed and performance of the program.
The compilation time must be kept reasonable.
The optimization process should not delay the overall compiling process.
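As an illustration of a peephole window, here is a sketch that removes one classic redundancy: a load of a value that was just stored to the same location. The toy textual instruction format (`store x`, `load x`) is made up for the example:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Peephole sketch over a two-instruction window: "store x" followed
// immediately by "load x" is redundant, because the value is already
// in the register; the load is dropped (the store is kept, since x
// may be read later).
std::vector<std::string> peephole(const std::vector<std::string>& code) {
    std::vector<std::string> out;
    for (size_t i = 0; i < code.size(); ++i) {
        if (i + 1 < code.size() &&
            code[i].rfind("store ", 0) == 0 &&                  // starts with "store "
            code[i + 1] == "load " + code[i].substr(6)) {       // same location
            out.push_back(code[i]);  // keep the store
            ++i;                     // skip the redundant load
        } else {
            out.push_back(code[i]);
        }
    }
    return out;
}
```

Real peephole optimizers slide many such pattern windows over the instruction stream, also handling jumps to jumps, algebraic simplifications, and dead stores.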