
CD Important Questions

1. What is a compiler? What are the front-end and back-end of a compiler?

• Compiler:
A compiler is a special program that translates a high-level programming language (source code) into machine code or an intermediate code that can be executed by a computer.

Front-end: The front-end of the compiler deals with the syntax and semantics of the source code. It is responsible for reading the source code, checking for syntax and semantic errors, and producing an intermediate representation (IR) of the program.

The front-end includes the following phases:


• Lexical Analysis:
It is also called a scanner. It takes the output of the preprocessor (which performs file inclusion and macro expansion) as the input, which is in a pure high-level language. It reads the characters from the source program and groups them into lexemes.
• Syntax Analysis:
It is sometimes called a parser. It constructs the parse tree. It takes all the tokens one by one and uses Context-Free Grammar to construct the parse tree.
• Semantic Analysis:
It verifies the parse tree, checking whether it is meaningful or not, and produces a verified parse tree. It also performs type checking, label checking, and flow-control checking.
• Intermediate Code Generation:
It generates intermediate code, a machine-independent form that can be readily translated into machine code. Intermediate code is converted to machine language by the last two phases, which are platform dependent.

Back-end: The back-end of the compiler focuses on optimization and code generation. It is responsible for converting the intermediate representation into machine code or a lower-level intermediate representation. The back-end includes the following phases:
• Code Optimization:
The fifth phase of a compiler is optimization. This phase applies various optimization techniques to the intermediate code to improve the performance of the generated machine code.
• Code Generation:
The final phase of a compiler is code generation. This phase takes the optimized intermediate code and generates the actual machine code that can be executed by the target hardware.
2. Explain the input, output, and actions performed by each phase of a compiler with an example.
• Phases of a Compiler:

1. Lexical Analysis (Scanner)

Input: The source code (as a sequence of characters).
Output: A sequence of tokens (meaningful units such as keywords, identifiers, operators, etc.).
Actions:
• The lexical analyzer scans the source code character by character.
• It groups characters into tokens (e.g., keywords, operators, variable names).
• It removes comments and whitespace.

2. Syntax Analysis (Parser)

Input: Sequence of tokens generated by the lexical analyzer.
Output: A syntax tree (or abstract syntax tree, AST), which represents the grammatical structure of the program.
Actions:
• The parser checks if the sequence of tokens follows the syntax rules of the programming language.
• It creates a tree structure (AST) that reflects the syntactic structure of the program.

3. Semantic Analysis
Input: The syntax tree (AST).
Output: A modified AST, annotated with type information and error reports.
Actions:
• The semantic analyzer checks for logical errors, like type mismatches, undeclared variables, etc.
• It ensures that operations are semantically correct (e.g., adding two integers).
• It may annotate the AST with additional information (like variable types).

4. Intermediate Code Generation

Input: Annotated AST (with semantic checks).
Output: Intermediate code (often in a low-level representation such as three-address code).
Actions:
• The compiler translates the AST into an intermediate form that is easier to optimize and translate into machine code.
• Intermediate code is usually not machine-specific and helps in making the final machine code generation more efficient.

5. Optimization
Input: Intermediate code.
Output: Optimized intermediate code.
Actions:
• The optimizer improves the intermediate code to reduce resource usage (like time or memory) without changing the program's functionality.
• Common optimizations include constant folding, loop unrolling, dead code elimination, and more.

6. Code Generation
Input: Optimized intermediate code.
Output: Target machine code or assembly language.
Actions:
• The code generator translates the intermediate code into machine code that can be executed by the CPU, or into assembly language, which can later be assembled into machine code.
• It maps variables and operations to specific CPU instructions.
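As a small end-to-end illustration (a hypothetical statement, not taken from any particular compiler), here is what each phase might produce for the statement a = b + c * 2;

Lexical analysis (tokens):      id(a)  =  id(b)  +  id(c)  *  num(2)  ;
Syntax analysis (AST):          =( a, +( b, *( c, 2 ) ) )
Semantic analysis:              checks that a, b, c are declared and that the operand types are compatible
Intermediate code (TAC):        t1 = c * 2
                                t2 = b + t1
                                a = t2
Optimization:                   if c were a known constant, t1 could be folded at compile time
Code generation (illustrative assembly):
                                MOV R1, c
                                MUL R1, #2
                                ADD R1, b
                                MOV a, R1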
3. What is lexical analysis? What tasks are performed by the lexical analyzer? Explain the role of the lexical analyzer in the compilation process.
• Lexical analysis is the first phase of a compiler. It takes as input the source code written in a high-level language. The purpose of lexical analysis is to read the input code and break it down into meaningful elements called tokens, which become the building blocks for the other phases of compilation. The lexical analyzer is also known as a scanner.

What is a Lexeme?
The sequence of characters matched by a pattern to form the corresponding token, i.e., a sequence of input characters that comprises a single token, is called a lexeme.

Tasks Performed by the Lexical Analyzer:

• Input preprocessing: This stage involves cleaning up the input text and preparing it for lexical analysis.
• Tokenization: This is the process of breaking the input text into a sequence of tokens.
• Token classification: In this stage, the lexical analyzer determines the type of each token.
• Token validation: In this stage, the lexical analyzer checks that each token is valid according to the rules of the programming language.
• Output generation: In this final stage, the lexical analyzer generates the output of the lexical analysis process, which is typically a list of tokens.
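As an illustration of these tasks, here is a minimal tokenizer sketch in C++ (a toy under stated assumptions, not a real scanner: it recognizes only identifiers, integer constants, and single-character operators, and a real scanner would also consult a keyword table):

#include <cctype>
#include <iostream>
#include <string>
#include <vector>

struct Token { std::string type; std::string lexeme; };

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) { ++i; continue; }            // skip whitespace
        if (std::isalpha(c)) {                             // identifier (or keyword)
            size_t j = i;
            while (j < src.size() && std::isalnum(static_cast<unsigned char>(src[j]))) ++j;
            tokens.push_back({"identifier", src.substr(i, j - i)});
            i = j;
        } else if (std::isdigit(c)) {                      // integer constant
            size_t j = i;
            while (j < src.size() && std::isdigit(static_cast<unsigned char>(src[j]))) ++j;
            tokens.push_back({"constant", src.substr(i, j - i)});
            i = j;
        } else {                                           // operator / punctuation
            tokens.push_back({"operator", std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}

int main() {
    for (const Token& t : tokenize("int a = 10;"))
        std::cout << t.type << ": " << t.lexeme << '\n';
}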
Advantages
1) Simplifies Parsing: Breaking down the source code into tokens makes it easier for computers to understand and work with the code.
2) Error Detection: Lexical analysis will detect lexical errors such as misspelled keywords or undefined symbols early in the compilation process.
3) Efficiency: Once the source code is converted into tokens, subsequent phases of compilation or interpretation can operate more efficiently.

Disadvantages
1) Limited Context: Lexical analysis operates on individual tokens and does not consider the overall context of the code.
2) Overhead: Although lexical analysis is necessary for the compilation or interpretation process, it adds an extra layer of overhead.
3) Debugging Challenges: Lexical errors detected during the analysis phase may not always provide clear indications of their origins in the original source code.

After lexical analysis, tokens are passed on to the syntax analysis phase. Syntax analysis checks whether the code structure complies with the grammar of the chosen programming language.
4. Explain error recovery strategies used in a compiler.
• Errors may occur at various levels of compilation, so error handling is important for the correct execution of code. There are mainly five error recovery strategies, which are as follows:
1. Panic mode
2. Phrase level recovery
3. Error production
4. Global correction
5. Symbol table

Panic Mode
This strategy is used by most parsing methods. On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. These tokens (typically statement delimiters) indicate the end of the input statement.
Advantages:
1. It is easy to use.
2. The parser never falls into an infinite loop.
Disadvantage:
1. This technique may let semantic or run-time errors slip through to further stages.

Phrase Level Recovery

In this strategy, on discovering an error, the parser performs a local correction on the remaining input. It can replace a prefix of the remaining input with some string, which helps the parser continue its job.
Advantages:
This method is used in many error-repairing compilers.

Disadvantages:
While doing the replacement, the program must be prevented from falling into an infinite loop.
Error Production
If error productions are used during parsing, we can generate an appropriate error message to indicate the error that has been recognized in the input.
Advantages:
1. Syntactic phase errors are generally recovered by error productions.
Disadvantages:
1. The method is very difficult for developers to maintain: if we change the grammar, it becomes necessary to change the corresponding productions.

Global Correction
Global correction methods increase time and space requirements at parsing time. This is simply a theoretical concept.
Advantages:
It makes very few changes in processing an incorrect input string.
Disadvantages:
It is simply a theoretical concept, which is unimplementable in practice.

Symbol Table:
Semantic errors are recovered using the symbol table for the corresponding identifier: if the data types of two operands are not compatible, type conversion is done automatically by the compiler.
Advantages:
It allows basic type conversion, which we generally do in real-life calculations.
Disadvantages:
Only implicit type conversion is possible.
5. Define lexeme, token, and pattern. Provide examples.
• Lexeme
A lexeme is a sequence of source code that matches one of the predefined patterns and thereby forms a valid token. These lexemes follow the rules of the language in order for them to be recognized as valid tokens.

Example:
main is a lexeme of type identifier (token)
(, ), {, } are lexemes of type punctuation (token)

Token
In programming, a token is the smallest unit of meaningful data; it may be an identifier, keyword, operator, or symbol. A token represents a series or sequence of characters that cannot be decomposed further.

Example
int a = 10; // input source code

Tokens:
int (keyword), a (identifier), = (operator), 10 (constant) and ; (punctuation - semicolon)

Pattern
A pattern is a rule or syntax that designates how tokens are identified in a programming language. It specifies the sequences of characters or symbols that make up valid tokens, and provides guidelines for the scanner to identify them correctly.

Example
For a keyword to be identified as a valid token, the pattern is the sequence of characters that make up the keyword.

For an identifier to be identified as a valid token, the pattern is the predefined rule that it must start with a letter, followed by letters or digits.
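Patterns are commonly written as regular expressions. A small C++ sketch of the identifier pattern described above (illustrative only):

#include <iostream>
#include <regex>
#include <string>

int main() {
    // Pattern for an identifier: a letter, followed by letters or digits.
    std::regex identifier("[A-Za-z][A-Za-z0-9]*");
    for (std::string s : {"main", "x1", "9lives"})
        std::cout << s << (std::regex_match(s, identifier)
                               ? " : identifier\n"
                               : " : not an identifier\n");
}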
6. Explain synthesized and inherited attributes with examples.
• Synthesized Attributes vs. Inherited Attributes

1. Synthesized: the value at a parse-tree node is determined by the attribute values at its child nodes.
   Inherited: the value at a parse-tree node is determined by the attribute values at its parent and/or sibling nodes.
2. Synthesized: the production must have a non-terminal as its head.
   Inherited: the production must have a non-terminal as a symbol in its body.
3. Synthesized: an attribute at node n is defined only in terms of attribute values at the children of n and at n itself.
   Inherited: an attribute at node n is defined only in terms of attribute values of n's parent, n itself, and n's siblings.
4. Synthesized: can be evaluated during a single bottom-up traversal of the parse tree.
   Inherited: can be evaluated during a single top-down or sideways traversal of the parse tree.
5. Synthesized: can be attached to both terminals and non-terminals.
   Inherited: cannot be attached to terminals; only to non-terminals.
6. Synthesized: used by both S-attributed and L-attributed syntax-directed translations (SDT).
   Inherited: used only by L-attributed SDT.
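Examples: in the production E → E1 + T with the rule E.val = E1.val + T.val, val is a synthesized attribute, computed from the children of E. In a declaration grammar D → T L with the rule L.inh = T.type, inh is an inherited attribute: the type synthesized for T is passed sideways and down to the identifier list L.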
7. What are conflicts in an LR parser? What are their types? Explain with an example.
• Conflicts in an LR parser occur when the parser encounters an ambiguity in the parsing table, meaning it has more than one possible action for a given input symbol and stack configuration. These conflicts arise because the parser cannot decide unambiguously whether to shift (read more input) or reduce (apply a grammar rule) at certain points in the parsing process.

Types of Conflicts in an LR Parser

Shift-Reduce Conflict
A shift-reduce conflict occurs when a parser is uncertain about whether to shift or reduce a grammar symbol in a given state. In other words, the parser faces a choice between applying a shift action or a reduce action based on the next input symbol. This conflict can arise due to the presence of a rule in the grammar that allows multiple possible actions for a given state and input symbol.

Example:
Imagine we have a math expression like id + id * id:
When the parser sees + followed by *, it might be unclear whether to:
• Shift the *, because multiplication (*) usually takes precedence over addition (+), or
• Reduce using the +, if it doesn't recognize the * as taking priority.
The parser is stuck between two actions, shifting or reducing, causing a shift-reduce conflict.

Reduce-Reduce Conflict
A reduce-reduce conflict occurs when a parser encounters a state where more than one reduction is possible for a particular grammar symbol. This conflict arises when multiple production rules can be applied to the same grammar symbol in a given state.
Example:
Consider a grammar in which two productions share the same right-hand side:
S → A | B
A → id
B → id
After shifting id, the parser can reduce either by A → id or by B → id, and the lookahead does not tell it which. Because the parser could reduce in two different ways, this causes a reduce-reduce conflict. (The classic dangling-else situation, by contrast, produces a shift-reduce conflict: the parser must choose between shifting the else and reducing the inner if.)
8. Differentiate between top-down and bottom-up parsing.
• Top-Down vs. Bottom-Up Parsing

1. Top-down: a parsing strategy that first looks at the highest level of the parse tree and works down the parse tree by using the rules of grammar.
   Bottom-up: a parsing strategy that first looks at the lowest level of the parse tree and works up the parse tree by using the rules of grammar.
2. Top-down parsing attempts to find the leftmost derivation for an input string.
   Bottom-up parsing attempts to reduce the input string to the start symbol of the grammar.
3. Top-down parsing starts from the top (the start symbol of the parse tree) and works down to the leaf nodes.
   Bottom-up parsing starts from the bottom (the leaf nodes of the parse tree) and works up to the start symbol.
4. Top-down parsing uses a leftmost derivation.
   Bottom-up parsing traces a rightmost derivation in reverse.
5. Top-down: the main decision is to select which production rule to use to construct the string.
   Bottom-up: the main decision is to determine when to use a production rule to reduce the string towards the start symbol.
6. Example of top-down: Recursive Descent Parser.
   Example of bottom-up: Shift-Reduce Parser.
9. Explain the recursive descent parsing technique with a suitable example.
• Recursive Descent Parser
Recursive descent parsing is a top-down parsing technique that uses a set of recursive procedures to process the input. Each procedure in the parser corresponds to a non-terminal symbol in the grammar, and these procedures call each other recursively to match the input string with the grammar rules. It is a straightforward parsing technique often used for simple grammars, such as those without left recursion or ambiguous rules.

How Recursive Descent Parsing Works

In recursive descent parsing:
1. The parser starts with the start symbol of the grammar.
2. It attempts to match the input string by recursively expanding non-terminals according to the grammar rules.
3. If it encounters a terminal symbol that matches the input, it consumes the symbol and continues.
4. If a rule does not match, the parser backtracks and tries alternative rules (if any).

Example
Consider the grammar E → T + E | T, T → F * T | F, F → id. For the expression id + id * id:
1. The parser first looks for a term (T), which matches the first id.
2. It then checks for the + symbol, indicating it needs another expression.
3. It looks for another factor (id), then checks for * (multiplication).
4. Finally, it looks for the last factor (id).

The parser successfully matches all parts of the expression, breaking it down into smaller terms and operations. Each step follows the grammar rules, which helps the parser understand the structure of the input. A code sketch of such a parser follows.
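A minimal recursive descent parser for this grammar, sketched in C++ (illustrative: it only accepts or rejects a token sequence and omits error recovery and tree building):

#include <iostream>
#include <string>
#include <vector>

// Token stream for the grammar  E -> T + E | T,  T -> F * T | F,  F -> id
std::vector<std::string> tokens;
size_t pos = 0;

bool match(const std::string& t) {
    if (pos < tokens.size() && tokens[pos] == t) { ++pos; return true; }
    return false;
}

bool F() { return match("id"); }   // F -> id
bool T() {                         // T -> F * T | F
    if (!F()) return false;
    if (match("*")) return T();
    return true;
}
bool E() {                         // E -> T + E | T
    if (!T()) return false;
    if (match("+")) return E();
    return true;
}

int main() {
    tokens = {"id", "+", "id", "*", "id"};
    bool ok = E() && pos == tokens.size();   // must consume all input
    std::cout << (ok ? "accepted" : "rejected") << '\n';
}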
10. Describe the role of the lexical analyzer in the compilation process.
• The lexical analyzer (also called a lexer or scanner) plays a crucial role in the compilation process. Its main job is to break down the source code into a sequence of tokens, which are the meaningful components of the code.

Role of the Lexical Analyzer:

1. Tokenization:
• The lexical analyzer reads the raw source code (a stream of characters) and groups the characters into tokens. Tokens are the smallest units that have meaning in the programming language.

2. Removing Whitespace and Comments:

• The lexer ignores unnecessary characters like spaces, tabs, and comments. These do not affect the logic of the program but can make the code more readable for humans.

3. Error Detection:

• It helps detect simple errors like unrecognized characters or incorrect symbols early on. If it encounters something that doesn't match any valid token (like an invalid symbol), it can report a lexical error.

4. Passing Tokens to the Parser:

• After the source code is broken into tokens, the lexical analyzer passes these tokens to the parser, which uses them to build a syntax tree or other structures needed to understand the program's logic.
11. What are the issues in the design of a code generator?
• Limited flexibility: Code generators are typically designed to produce a specific type of code, and as a result, they may not be flexible enough to handle a wide range of inputs or generate code for different target platforms. This can limit the usefulness of the code generator in certain situations.

Maintenance overhead: Code generators can add a significant maintenance overhead to a project, as they need to be maintained and updated alongside the code they generate. This can lead to additional complexity and potential errors.

Debugging difficulties: Debugging generated code can be more difficult than debugging hand-written code, as the generated code may not always be easy to read or understand. This can make it harder to identify and fix issues that arise during development.

Performance issues: Depending on the complexity of the code being generated, a code generator may not be able to generate optimal code that is as performant as hand-written code. This can be a concern in applications where performance is critical.

Learning curve: Code generators can have a steep learning curve, as they typically require a deep understanding of the underlying code generation framework and the programming languages being used. This can make it more difficult to onboard new developers onto a project that uses a code generator.

Over-reliance: It's important to ensure that the use of a code generator doesn't lead to over-reliance on generated code, to the point where developers are no longer able to write code manually when necessary. This can limit the flexibility and creativity of a development team, and may also result in lower quality code overall.
12. Explain the role of the linker, loader, and preprocessor in the process of compilation.

1) Preprocessor
• Function: The preprocessor is the first step in the compilation process. It processes source code before it is compiled.
• Tasks:
o Handles directives (e.g., #include, #define in C/C++).
o Expands macros and includes header files.
o Removes comments and performs conditional compilation.
• Output: The output of the preprocessor is a modified source code file that is then passed to the compiler.

2) Linker
• Function: The linker is responsible for combining multiple object files generated by the compiler into a single executable program.
• Tasks:
o Resolves references between different object files (e.g., function calls, variable references).
o Combines libraries (static or dynamic) with the object files.
o Arranges the memory addresses for the program's code and data.
• Output: The output of the linker is an executable file or a library.

3) Loader
• Function: The loader is part of the operating system and is responsible for loading the executable file into memory for execution.
• Tasks:
o Allocates memory space for the program.
o Loads the executable code and data into memory.
o Initializes the program's run-time environment (setting up the stack, heap, etc.).
• Output: Once the loader finishes its work, the program is ready to be executed by the CPU.
13. What is left recursion? How is it eliminated in context-free grammars (CFG)?
• Left Recursion
Left recursion is a common problem that arises in grammars during parsing, in the syntax analysis part of compilation. Left recursion occurs when a grammar rule refers to itself at the beginning of its right-hand side, making it difficult for a parser to proceed without looping indefinitely.

Example of Left Recursion
Consider the rule:
A → Aα | β

Here, A on the left side is called again immediately on the right side (Aα), which is left recursion. This can cause an infinite loop in recursive descent parsers, as the parser would continuously try to apply A → Aα.

Eliminating Left Recursion in a CFG

To remove left recursion, we can rewrite the rules in a way that avoids direct self-calls at the beginning. The general approach is to convert the left-recursive rule into a right-recursive one.

For a rule of the form:

A → Aα | β

where α and β are sequences of grammar symbols:

1. Rewrite A to separate the recursive and non-recursive cases.
2. Create a new non-terminal (e.g., A') to handle the recursion in a non-left-recursive way.
This becomes:
A → βA'
A' → αA' | ε
where ε represents the empty string.
Example of Eliminating Left Recursion
Consider the expression grammar:
E → E + T | T

This can be rewritten as:

E → T E'
E' → + T E' | ε

Now:
• E starts with T, followed by E'.
• E' recursively adds + T without immediately calling E again, removing the left recursion. The sketch below shows the corresponding parsing procedures.
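To see the difference in code terms, a minimal C++ sketch (illustrative, with T simplified to a single id and no error handling): a procedure for the left-recursive E → E + T would call itself before consuming any input and never terminate, while the transformed rules consume a token before each recursive call:

#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> toks = {"id", "+", "id"};
size_t p = 0;
bool match(const std::string& t) {
    if (p < toks.size() && toks[p] == t) { ++p; return true; }
    return false;
}

// A left-recursive procedure for E -> E + T would begin E() { E(); ... }
// and recurse forever. After elimination (E -> T E', E' -> + T E' | ε),
// every recursive call is preceded by consuming input:
void T() { match("id"); }      // T -> id (simplified)
void Eprime() {                // E' -> + T E' | ε
    if (match("+")) { T(); Eprime(); }
    // else: ε, consume nothing
}
void E() { T(); Eprime(); }    // E -> T E'

int main() { E(); std::cout << (p == toks.size() ? "parsed\n" : "error\n"); }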
14. What is left factoring in CFG? Provide an example.
• Left Factoring
Left factoring in context-free grammars (CFG) is a transformation technique used to remove nondeterminism by factoring out common prefixes in productions. This makes the grammar more suitable for top-down parsers, which need a clear decision point when selecting which rule to apply.

Definition
Left factoring is the process of rewriting a grammar to ensure that if two or more productions for a non-terminal share a common prefix, that prefix is factored out. This avoids situations where the parser has to make arbitrary choices between rules that start the same way.

Purpose
The purpose of left factoring is to make a grammar more deterministic, which is essential for parsers like recursive descent parsers that process input in a top-down manner.

Example
Suppose the grammar is of the form:
A → αβ1 | αβ2 | αβ3 | … | αβn | γ

where A is a non-terminal and α is the common prefix.

We separate the productions with the common prefix and add a new production rule in which the newly introduced non-terminal derives the suffixes of those productions:

A → αA' | γ
A' → β1 | β2 | β3 | … | βn

The top-down parser can easily parse this grammar to derive a given string. This is how left factoring is performed on a given grammar.
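For instance, in the classic conditional-statement grammar

S → iEtS | iEtSeS | a
E → b

the two S-productions share the common prefix iEtS. Left factoring gives:

S → iEtSS' | a
S' → eS | ε
E → b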
15. What are synthesized attributes? How are they used in syntax-directed translation?
• Synthesized Attributes
Synthesized attributes are properties associated with nodes in a parse tree and are used in syntax-directed translation (SDT) to pass information up the tree. These attributes are computed based on the values of child nodes in the parse tree. In other words, a synthesized attribute at a particular node is derived from the attributes of its children.

Role in Syntax-Directed Translation (SDT)

In syntax-directed translation, synthesized attributes help to define and perform semantic actions based on the grammar of the language. These actions are carried out during parsing and are often used to build an intermediate representation of the code, perform type checking, or compute values.

Example
Consider a simple arithmetic grammar for expressions:
E → E + T | T
T → T * F | F
F → ( E ) | id
Suppose we want to compute the value of an arithmetic expression using synthesized attributes. Here is how the attributes could work in this grammar:
1. Define a synthesized attribute val for each node in the parse tree to store the computed value.
2. For each production rule, specify how to compute val based on the child nodes.
For instance:
• In the production E → E + T, the value of E.val is computed as:
E.val = E.val (from the left child) + T.val (from the right child)
• In the production T → T * F, T.val is computed as:
T.val = T.val (from the left child) * F.val (from the right child)

3. This process allows the computed values to propagate up from the leaf nodes to the root, ultimately producing the value of the entire expression.
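A compact C++ sketch of this idea (assumptions: the equivalent right-looping form of the grammar to suit recursive descent, single-digit operands, and no error handling; each function returns its node's synthesized val attribute):

#include <iostream>
#include <string>

std::string in;   // input expression, e.g. "2+3*4"
size_t p = 0;

int E();          // forward declaration for F's "( E )" case

int F() {                 // F -> ( E ) | digit ; F.val is synthesized
    if (in[p] == '(') { ++p; int v = E(); ++p; /* skip ')' */ return v; }
    return in[p++] - '0';
}
int T() {                 // T -> F { * F } ; T.val = product of child vals
    int v = F();
    while (p < in.size() && in[p] == '*') { ++p; v *= F(); }
    return v;
}
int E() {                 // E -> T { + T } ; E.val = sum of child vals
    int v = T();
    while (p < in.size() && in[p] == '+') { ++p; v += T(); }
    return v;
}

int main() { in = "2+3*4"; std::cout << E() << '\n'; }   // prints 14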
16. Discuss various storage allocation strategies in a compiler.
• In a compiler, storage allocation strategies determine how memory is allocated and managed for variables, data structures, and control information. These strategies are essential for using memory efficiently during program execution. The main storage allocation strategies in a compiler are static allocation, stack allocation, and heap allocation.

There are mainly three types of storage allocation strategies:

1. Static Allocation
2. Heap Allocation
3. Stack Allocation

Static Allocation
In static allocation, memory is assigned to variables, constants, and fixed-size data structures at compile time. The allocated memory locations remain fixed throughout program execution.

For example:
int number = 1;
static int digit = 1;

Advantages of Static Allocation

• It is easy to understand.
• Memory is allocated once, at compile time, and remains the same throughout the program's execution.
• All allocation decisions are made before the program starts running.

Disadvantages of Static Allocation

• Not highly scalable.
• Static storage allocation is not very efficient.
• The size of the data must be known at compile time.
Heap Allocation
Heap allocation is used where stack allocation falls short: when we want to retain the values of local variables after the activation record ends, which we cannot do in stack allocation because its LIFO scheme ties allocation and deallocation to activation records. The heap is the most flexible storage allocation strategy: we can dynamically allocate and deallocate variables whenever needed, according to the program's requirements at run time.

For example:
int* ans = new int[5];

Advantages of Heap Allocation

• Heap allocation is useful when we have data whose size is not fixed and can change during run time.
• We can retain the values of variables even after the activation record ends.
• Heap allocation is the most flexible allocation scheme.
Disadvantages of Heap Allocation
• Heap allocation is slower than stack allocation.
• There is a chance of memory leaks.

Stack Allocation
Stack allocation is a form of dynamic allocation, meaning memory is allocated at run time. The stack is a data structure that follows the LIFO principle, so as multiple activation records are created, they are pushed and popped as activations begin and end. Local variables are bound to fresh storage each time an activation record begins, because storage is allocated at run time on every procedure or function call.

For example:
void sum(int a, int b) { int ans = a + b; cout << ans; }
17. What is dynamic storage allocation? Explain its techniques.

• Dynamic storage allocation is the process of allocating memory at run time rather than at compile time. This type of allocation allows a program to request memory as needed, making it ideal for data structures whose sizes or lifetimes aren't known until the program is running, such as arrays, linked lists, and other complex structures. In dynamic storage allocation, memory is managed in the heap, a region of memory set aside for dynamically allocated data. Techniques for dynamic storage allocation focus on efficient and safe use of this heap memory.

Key Techniques of Dynamic Storage Allocation

1) First Fit
The allocator searches for the first available free memory block that is large enough to fulfill the requested allocation (see the sketch after this technique).

Steps:
• Traverse the list of free blocks in memory.
• Allocate the first block that fits the size requested.
• Split the block if it is larger than required, leaving the remainder as a free block.

Advantages: Simple and fast, as it finds a suitable block quickly without searching the entire list.
Disadvantages: May lead to fragmentation, where many small unused blocks accumulate, wasting space.
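A toy first-fit search over a free list, sketched in C++ (assumptions: a singly linked free list of sized blocks; splitting, coalescing, and alignment are omitted):

#include <cstddef>
#include <iostream>

struct FreeBlock { std::size_t size; FreeBlock* next; };

// Return the first block large enough for `request` (splitting omitted).
FreeBlock* first_fit(FreeBlock* head, std::size_t request) {
    for (FreeBlock* b = head; b != nullptr; b = b->next)
        if (b->size >= request) return b;   // first block that fits wins
    return nullptr;                         // no block is large enough
}

int main() {
    FreeBlock c{64, nullptr}, b{16, &c}, a{32, &b};   // free list: 32 -> 16 -> 64
    FreeBlock* hit = first_fit(&a, 20);
    std::cout << (hit ? hit->size : 0) << '\n';       // prints 32
}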

2) Best Fit
The allocator searches for the smallest available free block that is large enough to satisfy the allocation request.

Steps:
• Traverse the list of free blocks.
• Select the smallest block that fits the requested size to minimize leftover space.
• Allocate this block and, if necessary, split it.
Advantages: Reduces wasted memory because it leaves the smallest amount of unused space.
Disadvantages: Can be slower because it requires searching the entire list of free blocks, and may lead to fragmentation as smaller blocks become unusable.

3) Worst Fit
The allocator finds the largest available free block and uses it to satisfy the allocation request.

Steps:
• Traverse the list of free blocks.
• Allocate the largest block found, leaving smaller, more usable fragments for future allocations.

Advantages: May reduce fragmentation by breaking larger blocks, making it easier to find large blocks when needed.
Disadvantages: Can result in inefficient use of memory, as large blocks might be divided into sizes not suitable for future requests.

4) Buddy System
This technique splits memory into blocks of sizes that are powers of two (e.g., 4 KB, 8 KB, 16 KB). When memory is requested, the allocator finds the smallest block that can fit the request and splits it into two "buddies" if necessary.

Steps:
• Find a block of the required size, or split a larger block to create it.
• When blocks are freed, they are combined (if possible) with their "buddy" to create larger blocks again.

Advantages: Efficiently minimizes fragmentation and speeds up allocation/deallocation by using standardized block sizes.
Disadvantages: May waste memory if the request size does not exactly match a power of two, leading to internal fragmentation.
5) Memory Pool (or Slab Allocation)
A memory pool pre-allocates blocks of a specific size, typically for objects that are frequently allocated and deallocated. Each pool contains multiple blocks of the same size.

Steps:
• Allocate a pool of fixed-size blocks for specific data structures or objects.
• When an object of that type is needed, the allocator provides a block from the pool.
• When the object is no longer needed, the block is returned to the pool for reuse.

Advantages: Reduces fragmentation and speeds up allocation for frequently used fixed-size structures.
Disadvantages: Not flexible for objects of varying sizes, and requires more memory-management setup.
18. What do you mean by dangling references?
• A dangling reference is a situation in compiler design where a pointer or reference points to a memory location that has been deallocated or released. This can happen when an object is deleted or goes out of scope, but the reference to it still exists and is later accessed.

A dangling pointer points to a memory location that no longer contains the same content it used to hold.

In the C programming language, pointers are used to allocate memory using functions like malloc and calloc.
• These functions allocate memory dynamically to the pointers. After using the pointers, we free the allocated memory using the free function.
• This deallocation makes the memory available for reallocation, either to the same program or to a different program.

We deallocate memory using the free or delete methods. This returns the memory to the run-time system, which can allocate it again to some other variable or to a completely different program. The problem is that we still have pointers that point to the same memory location; these pointers can later access that location and corrupt data. We call such pointers dangling pointers.
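A minimal C++ illustration of how a dangling pointer arises, and a common defensive fix:

#include <iostream>

int main() {
    int* p = new int(42);       // p points to dynamically allocated memory
    std::cout << *p << '\n';    // fine: the memory is still allocated
    delete p;                   // memory is returned to the allocator
    // p still holds the old address: it is now a dangling pointer.
    // *p = 7;                  // undefined behavior if executed
    p = nullptr;                // defensive fix: null the pointer after delete
}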
19. Explain various parameter passing methods (call-by-value, call-by-reference, copy-restore, call-by-name).
• Parameter passing methods determine how arguments are passed to functions in programming languages. Each method has different implications for memory use, function execution, and the way arguments behave within the function.

1) Call-by-Value
In call-by-value, a copy of the argument's value is passed to the function. The function works on this copy, so any changes made to the parameter within the function do not affect the original argument.

Example: In languages like C, call-by-value is standard for basic types like int, float, etc.

Advantages:
• Prevents unintended side effects, as changes in the function don't alter the original variable.
Disadvantages:
• Can be memory-intensive for large data structures, as a copy is created for each argument.

2) Call-by-Reference
In call-by-reference, a reference (or address) of the argument is passed to the function. The function can directly modify the original variable, as both the function parameter and the argument refer to the same memory location.

Example: In C++, you can pass parameters by reference using &, while languages like Python use references for mutable objects.

Advantages:
• Saves memory and time since there is no copying involved.
• Allows the function to modify the original argument, which can be useful in cases like updating data structures.
Disadvantages:
• Risk of unintended side effects, as changes within the function affect the original variable.
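A short C++ contrast of these first two methods (call-by-value copies the argument; call-by-reference aliases it):

#include <iostream>

void byValue(int x)      { x = 100; }   // modifies a copy only
void byReference(int& x) { x = 100; }   // modifies the caller's variable

int main() {
    int a = 1, b = 1;
    byValue(a);
    byReference(b);
    std::cout << a << ' ' << b << '\n';  // prints "1 100"
}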

3) Copy-Restore (Call-by-Value-Result)
This method combines aspects of call-by-value and call-by-reference. The argument's value is copied into the function (like call-by-value), but when the function returns, the modified value is copied back to the original argument (like call-by-reference).

Example: Some Fortran implementations use this method.

Advantages:
• Allows the function to work with a copy, preventing side effects during execution, while still passing back modifications.
Disadvantages:
• Can be memory-intensive, as two copies (initial and final) are created.

4) Call-by-Name
In call-by-name, the argument expression is not evaluated immediately. Instead, it is substituted directly into the function, and each time it is accessed within the function, it is re-evaluated. This means that any variable changes in the function or in the argument itself are reflected immediately.

Example: Call-by-name was famously used in the programming language Algol.

Advantages:
• Provides flexibility and allows for constructs like lazy evaluation.
Disadvantages:
• Can lead to inefficient performance if the argument is complex, as it gets evaluated multiple times.
20. What is a DAG? What are its advantages in the context of optimization? How does it help in eliminating common subexpressions?
• A Directed Acyclic Graph (DAG) is a data structure used to represent expressions or computations in a way that avoids duplication of common subexpressions. In compiler design, DAGs are often applied during the optimization phase to improve the efficiency of code by recognizing and eliminating redundant calculations.

Advantages of DAGs in Optimization

1) Elimination of Common Subexpressions:

When creating a DAG, if the same operation (e.g., a + b) is encountered multiple times, it is represented by a single node rather than being duplicated. Each occurrence of the subexpression points to the same node, ensuring it is computed only once.

2) Efficient Use of Storage:

Since nodes are reused for repeated subexpressions, DAGs help reduce the memory required to store intermediate values, particularly in cases where large expressions are repeated.

3) Optimized Register Allocation:

DAGs identify the minimum number of operations needed, allowing the compiler to use fewer registers to store intermediate results.

4) Constant Propagation:

If a node in the DAG represents a constant value, the compiler can propagate this constant to other nodes in the graph where it is used, potentially reducing further operations.

Example of a DAG Eliminating Common Subexpressions:

Consider the expression:
e = (a + b) * (c + d) + (a + b) * (c + d)
Step 1: Construct the DAG
Identify and create a node for each unique subexpression:
• First, create nodes for a + b and c + d.
• Next, create a node for (a + b) * (c + d).
• Finally, add a node that adds the two occurrences of (a + b) * (c + d).
Avoid duplicating any subexpressions. Since both (a + b) and (c + d) are reused, each has only one corresponding node in the DAG.

Step 2: Optimize Using the DAG

The DAG reveals that (a + b) * (c + d) is computed once and reused, removing the need to recompute it. This effectively removes redundant operations from the expression, leading to a more efficient representation like this:
temp1 = a + b
temp2 = c + d
temp3 = temp1 * temp2
e = temp3 + temp3

Only four operations (three additions and one multiplication) are now needed to compute e, compared with the seven in the original expression with its duplicate calculations.
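One common way to build such a DAG is hash-consing (value numbering): every (operator, left, right) triple is looked up in a table, so a repeated subexpression returns the existing node instead of creating a new one. A minimal C++ sketch under those assumptions (binary operators, single-name operands):

#include <iostream>
#include <map>
#include <string>
#include <tuple>

using Key = std::tuple<char, int, int>;   // (operator, left node, right node)
std::map<std::string, int> leaves;        // variable name -> node id
std::map<Key, int> interior;              // (op, l, r) -> node id
int next_id = 0;

int leaf(const std::string& name) {
    auto it = leaves.find(name);
    if (it != leaves.end()) return it->second;
    return leaves[name] = next_id++;
}
int node(char op, int l, int r) {
    Key k{op, l, r};
    auto it = interior.find(k);
    if (it != interior.end()) return it->second;   // reuse: common subexpression
    return interior[k] = next_id++;
}

int main() {
    int ab = node('+', leaf("a"), leaf("b"));
    int cd = node('+', leaf("c"), leaf("d"));
    int m1 = node('*', ab, cd);
    int m2 = node('*', node('+', leaf("a"), leaf("b")),   // same nodes come back
                       node('+', leaf("c"), leaf("d")));
    std::cout << (m1 == m2 ? "shared\n" : "distinct\n");  // prints "shared"
}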
21. Explain the language-dependent and machine-independent phases of the compiler. Also, list the major functions done by the compiler.
• In the compilation process, a compiler undergoes two main types of phases: language-dependent and machine-independent phases. Each phase serves a distinct purpose, either focusing on understanding the source code language or preparing the code for the target machine's architecture.

Language-Dependent Phases
Language-dependent phases focus on understanding and analyzing the source code, typically in a specific programming language. These phases do not consider the hardware specifications of the target machine.

Phases:
Lexical Analysis:
• Scans the source code, breaks it into tokens (keywords, operators, identifiers), and removes whitespace and comments.
Syntax Analysis:
• Parses the tokens according to the grammar rules of the source language, constructing a parse tree that represents the code's structure.
Semantic Analysis:
• Examines the parse tree for semantic correctness (e.g., type checking, scope rules), ensuring that the code makes logical sense according to the language's rules.

Machine-Independent Phases
Machine-independent phases focus on optimizing and preparing the code for the target machine without being tied to any specific hardware. These phases are designed to enhance code efficiency on any machine.
Phases:
Intermediate Code Generation:
• Transforms the language-dependent structure into an intermediate form (such as three-address code or abstract syntax trees) that is not specific to any hardware.
Code Optimization:
• Refines the intermediate code by eliminating redundancies, optimizing control flow, and performing loop optimizations, among other techniques, to enhance performance.

Major Functions of a Compiler

1) Lexical Analysis: Tokenizes source code into meaningful symbols.
2) Syntax Analysis: Builds a syntax tree based on grammatical rules.
3) Semantic Analysis: Ensures logical correctness through type and scope checks.
4) Intermediate Code Generation: Creates an intermediate representation for machine independence.
5) Code Optimization: Refines intermediate code to improve efficiency.
6) Code Generation: Produces machine-level code instructions.
7) Error Handling: Detects and reports errors throughout the compilation process.
8) Symbol Table Management: Manages variable names and types for consistent use throughout the program.
22. Explain the activation record in detail.
• An activation record (also known as a stack frame) is a data structure that stores the information needed to manage a single execution of a function. Every time a function is called, an activation record is created and pushed onto the program's call stack, and it is removed when the function returns. Activation records play a crucial role in managing the function's local variables, parameters, return addresses, and more, ensuring proper function execution and memory management.

Components of an Activation Record
A typical activation record consists of the following sections:

1. Return Address:
Stores the address where the function should resume execution after it completes. When a function call is finished, the program uses this address to know where to return in the calling function.

2. Control Link (or Dynamic Link):

Points to the activation record of the calling (parent) function. This link helps maintain the stack's call hierarchy and allows the program to return to the calling function's context.

3. Access Link (or Static Link):

Points to the activation record of the nearest lexically enclosing function, which is particularly useful in languages with nested functions or blocks. This link allows functions to access non-local variables in the parent scope.

4. Parameters:
Stores the values of parameters passed to the function. These may be passed by value, reference, or other parameter-passing methods. They are required for the function's execution and act as local variables within the function.

5. Local Variables:
Allocates space for the function's local variables, which are only accessible within the function's scope. These variables are destroyed when the function exits.

6. Temporary Values:
Stores temporary values or intermediate results generated during the function's execution. This area is often used for expression evaluation.

7. Saved Registers:
Holds copies of register values from the calling function that the current function may modify. Saving these values ensures that the calling function's register values are restored after the function call.
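The layout can be pictured as a struct (purely conceptual: real activation records are laid out by the compiler and the platform ABI, not declared in source code, and the field sizes vary per call):

#include <iostream>

// A conceptual picture of an activation record, not a real frame layout.
struct ActivationRecord {
    void* return_address;        // where to resume in the caller
    void* control_link;          // caller's activation record (dynamic link)
    void* access_link;           // enclosing scope's record (static link)
    int   parameters[2];         // actual arguments (count is call-specific)
    int   locals[4];             // local variables
    int   temporaries[2];        // intermediate results of expressions
    long  saved_registers[4];    // caller's register values to restore
};

int main() { std::cout << sizeof(ActivationRecord) << " bytes\n"; }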

Purpose and Benefits of Activation Records

1) Memory Management: Activation records enable dynamic memory management on the stack, where each function's resources are neatly allocated and deallocated as needed.

2) Scope and Lifetime Management: Each function's variables have a defined scope and lifetime. The activation record gives each function isolated memory for its execution.

3) Support for Recursion: Since each function call has its own activation record, recursive functions can safely call themselves multiple times without interfering with previous calls.

4) Efficient Return Handling: The return address in the activation record ensures that each function knows where to return after completion, preserving the call flow.
23. What is the LL(1) parsing technique? Explain with a suitable example.
• LL(1) Parsing Technique
LL(1) parsing is a top-down parsing method used in the syntax analysis phase of compiler design. An LL(1) parser uses a predictive parsing strategy. It constructs the parse tree from the top (start symbol) to the bottom (terminal symbols) by expanding non-terminals using a set of grammar rules. The "1" in LL(1) indicates that the parser only looks one symbol ahead in the input to decide which production rule to apply.

Key Properties of LL(1) Grammars:

1. Non-ambiguity: There is no ambiguity in choosing the correct rule from the set of possible rules.
2. No left recursion: LL(1) parsers cannot handle grammars that are left-recursive.
3. Predictive parsing: Based on the current input symbol, the parser can determine which rule to apply.

Example of LL(1) Parsing

Let's consider a simple grammar:
S → A
A → aA | b

Step-by-step Process:
Step 1: Grammar Explanation:
• The start symbol S can be replaced by A.
• The non-terminal A can be replaced either by aA or by b.
Step 2: LL(1) Parsing Table:
• Construct a parsing table based on the first terminal that each non-terminal can generate. The columns are the lookahead terminals a and b:

Non-Terminal     a           b
S                S → A       S → A
A                A → aA      A → b

• The parser uses the input symbol to decide which production to apply. For example:
o If the current symbol is a, the parser uses A → aA.
o If the current symbol is b, the parser uses A → b.

Step 3: Parsing Process:

• Assume the input string is aab. The parser starts with the start symbol S and uses the following steps:
1. S → A (from the parsing table)
2. Now the current symbol is a. Look at the parsing table for A with a. The rule is A → aA.
3. Apply A → aA, so the sentential form becomes aA.
4. The next symbol is again a, so apply A → aA once more.
5. The sentential form becomes aaA, and the current symbol is b.
6. Look at the parsing table for A with b. The rule is A → b.
7. Apply A → b, and the final string is aab.

Step 4: Result:
• The parser successfully parses the input aab and constructs the corresponding parse tree.
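A compact table-driven LL(1) parser for this grammar, sketched in C++ (assumptions: single-character symbols, a hard-coded parsing table, and '$' as the end-of-input marker):

#include <iostream>
#include <map>
#include <stack>
#include <string>
#include <utility>

int main() {
    // Parsing table: (non-terminal, lookahead) -> production body.
    std::map<std::pair<char, char>, std::string> table = {
        {{'S', 'a'}, "A"},  {{'S', 'b'}, "A"},
        {{'A', 'a'}, "aA"}, {{'A', 'b'}, "b"},
    };

    std::string input = "aab$";
    size_t ip = 0;
    std::stack<char> st;
    st.push('$'); st.push('S');                      // start symbol on top

    while (st.top() != '$') {
        char top = st.top(), look = input[ip];
        if (top == look) { st.pop(); ++ip; }         // match a terminal
        else if (table.count({top, look})) {         // expand a non-terminal
            st.pop();
            const std::string& body = table[{top, look}];
            for (auto it = body.rbegin(); it != body.rend(); ++it)
                st.push(*it);                        // push body in reverse
        } else { std::cout << "error\n"; return 1; }
    }
    std::cout << (input[ip] == '$' ? "accepted\n" : "error\n");
}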
24. Explain the pass structure of an assembler.
• An assembler is a program that translates assembly language code into machine code (or object code). The translation process typically involves multiple phases or passes. Each pass of an assembler performs specific tasks and contributes to converting the assembly code into the final machine code output.

Pass Structure of an Assembler

An assembler generally works in two passes, although more sophisticated assemblers might have additional passes. The two-pass assembler structure is the most common and can be broken down into the following phases:

Pass 1:
Goal: Analyze the source code, resolve labels, and generate the symbol table.

Steps Involved in Pass 1:

1) Read the source code.
2) Create the symbol table.
3) Allocate memory.
4) Resolve forward references.
5) Generate intermediate code.
6) Pass 1 output: the output of Pass 1 typically includes the symbol table and an intermediate representation of the program.

Pass 2:
Goal: Generate the final machine code by replacing placeholders with actual addresses and creating the executable code.

Steps Involved in Pass 2:

1) Read the source code (again).
2) Resolve all addresses.
3) Generate machine code.
4) Output the final code: the machine code is output as an object code file or executable file that can be loaded and run on the target machine.
Single-Pass Assemblers: In some cases, a single-pass assembler is designed to handle both steps (symbol table creation and code generation) in one go, but this usually requires special techniques to handle forward references.
25. Explain peephole optimization with an example.
• Peephole optimization is a type of code optimization performed on a small part of the code, i.e., on a very small set of instructions in a segment of code.

The small set of instructions or small part of the code on which peephole optimization is performed is known as the peephole or window.

The objectives of peephole optimization are as follows:

1. To improve performance
2. To reduce the memory footprint
3. To reduce code size

Peephole Optimization Techniques

1) Redundant load and store elimination: Redundant loads and stores (e.g., storing a value and immediately reloading it) are eliminated.
2) Constant folding: Expressions whose operands are all known at compile time are evaluated then, so the computation is not repeated at run time.
3) Strength reduction: Operators that consume more execution time are replaced by operators consuming less execution time (e.g., a multiplication by a power of two replaced by a shift).
4) Null sequences / simplifying algebraic expressions: Useless operations (such as adding zero or multiplying by one) are deleted.
5) Combine operations: Several operations are replaced by a single equivalent operation.
6) Dead code elimination: Dead code refers to portions of the program that are never executed or do not affect the program's observable behavior. Eliminating dead code helps improve the efficiency and performance of the compiled program by reducing unnecessary computations and memory usage.
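A small before/after illustration in C-style code (conceptual: a real peephole pass works on the generated instructions, not on the source):

#include <iostream>

int main() {
    int y = 5, b = 7;

    // Before peephole optimization, the generated code might contain:
    //   x = y * 2;      (strength-reduction candidate)
    //   a = b + 0;      (algebraic identity / null sequence)
    //   t = 4 * 8;      (constant-folding candidate)

    // After peephole optimization:
    int x = y << 1;   // multiplication by 2 replaced with a cheaper shift
    int a = b;        // "+ 0" eliminated
    int t = 32;       // 4 * 8 folded at compile time

    std::cout << x << ' ' << a << ' ' << t << '\n';   // prints "10 7 32"
}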
26. Explain syntax-directed definition to produce three-address code for flow-of-control statements with a suitable example.
• Syntax-Directed Definitions (SDD)
A Syntax-Directed Definition (SDD) is a formal way of associating a grammar with semantics, specifically for translation purposes. In an SDD, attributes are attached to the grammar's productions, and semantic rules define how these attributes are computed during the parsing process.

Three-address code is an intermediate representation that breaks down a complex expression or statement into simpler operations, typically involving at most one operator and two operands per instruction.

Flow-of-Control Statements in Three-Address Code

The flow-control statements generally include:
• Conditional statements (if-else)
• Loops (while, for)

Steps Involved in the Process (an example follows the list):

1) Parse the source code.
2) Evaluate the attributes.
3) Apply the semantic rules.
4) Generate three-address code (TAC).
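For example, for the statement if (a < b) x = 1; else x = 0; the semantic rules allocate labels for the true branch, the false branch, and the statement's exit, and emit three-address code such as:

     if a < b goto L1
     goto L2
L1:  x = 1
     goto L3
L2:  x = 0
L3:  (next statement)

Here L1 and L2 are the labels attached to the condition's true and false attributes, and L3 is the inherited "next" label of the whole statement.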

Advantages of Three-Address Code for Flow Control:

• Simplicity: TAC breaks down complex operations into simple statements involving at most one operation and two operands.
• Optimization: It allows easy optimization by simplifying operations and eliminating redundant expressions (e.g., common subexpression elimination).
• Machine independence: Since it is not tied to a particular machine's assembly language, TAC can be further optimized for different target architectures.
27. Describe various code optimization techniques with examples.
• Code Optimization
Code optimization is a crucial phase in compiler design aimed at enhancing the performance and efficiency of the executable code. By improving the quality of the generated machine code, optimizations can reduce execution time, minimize resource usage, and improve overall system performance. This process involves various techniques and strategies applied during compilation to produce more efficient code without altering the program's functionality.

The optimizing process of a compiler should meet the following objectives:
• The optimization must be correct; it must not, in any way, change the meaning of the program.
• Optimization should increase the speed and performance of the program.
• The compilation time must be kept reasonable.
• The optimization process should not delay the overall compiling process.

Types of Code Optimization

The optimization process can be broadly classified into two types:
• Machine-Independent Optimization: This phase attempts to improve the intermediate code to obtain a better target code as the output.
• Machine-Dependent Optimization: Machine-dependent optimization is done after the target code has been generated, when the code is transformed according to the target machine architecture.

Code Optimization Techniques (a small illustration follows the list)

1. Compile-time evaluation
2. Variable propagation
3. Constant propagation: If the value of a variable is known to be a constant, replace the variable with the constant (this must be verified, since the variable may not always be a constant).
4. Constant folding: Expressions whose values can be computed at compile time are evaluated then.
5. Copy propagation: Reduces run-time work by eliminating unnecessary copies.
6. Common subexpression elimination
7. Dead code elimination: To find dead variables, a data-flow analysis must be done.
8. Unreachable code elimination: After constant propagation and constant folding, unreachable branches can be eliminated.
9. Function inlining
10. Function cloning
11. Induction variable analysis and strength reduction
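A tiny C++ illustration of two of these techniques, dead code and unreachable code elimination (a hypothetical fragment; an optimizer would remove the constructs noted in the comments):

#include <iostream>

int f(int x) {
    int unused = x * 10;                 // dead code: value is never used
    if (false) {
        std::cout << "never printed\n";  // unreachable: branch never taken
    }
    return x + 1;                        // only this line affects the result
}

int main() { std::cout << f(4) << '\n'; }   // prints 5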

Advantages of Code Optimization

• Improved performance: Optimized code can execute faster and use fewer resources, leading to improved performance.
• Reduction in code size: Code optimization can help reduce the size of the generated code, making it easier to distribute and deploy.
• Improved maintainability: Code optimization can result in code that is easier to understand and maintain, reducing the cost of software maintenance.

Disadvantages of Code Optimization

• Increased compilation time: Code optimization can significantly increase compilation time, which can be a significant drawback when developing large software systems.
• Increased complexity: Code optimization can result in more complex code, making it harder to understand and debug.
• Potential for introducing bugs: Code optimization can introduce bugs if not done carefully, leading to unexpected behavior and errors.
