Compiler Design solved question paper
1. (a) Discuss in detail about the operations of compiler which transforms the source
program from one representation into another. Illustrate the output for the input:
A=B*C+2 10 Marks (CO1)
Ans: A compiler is a computer program that transforms source code written in a high-level
programming language into executable machine code that a computer can understand
and execute. The process of compilation is divided into several stages, each of which
performs a specific task.
1: Lexical analysis: This stage involves breaking up the source code into tokens, which are
the basic building blocks of the programming language. The compiler uses a tokenizer to
scan the source code and identify keywords, identifiers, operators, and literals. For example,
in the input "A=B*C+2", the tokens are "A", "=", "B", "*", "C", "+", and "2".
2: Syntax analysis: In this stage, the compiler checks the syntax of the program to ensure
that it conforms to the rules of the programming language. The compiler uses a parser to
analyze the structure of the source code and construct a parse tree. The parse tree
represents the hierarchical structure of the program and shows how the various
components of the program relate to each other.
3: Semantic analysis: This stage involves checking the meaning of the program to ensure
that it is logically sound. The compiler uses a semantic analyzer to check for type errors,
undeclared variables, and other semantic errors. For example, in the input "A=B*C+2", the
compiler checks that the variables "A", "B", and "C" have been declared and that they have
the correct types.
4: Code generation: In this stage, the compiler generates machine code that can be
executed by a computer. The compiler uses a code generator to translate the parse tree into
machine code. The code generator takes into account the target architecture of the
computer and optimizes the code for performance.
5: Code optimization: This stage involves improving the efficiency of the generated code.
The compiler uses a code optimizer to analyze the generated code and make improvements
such as eliminating redundant operations and reordering instructions.
The assembly-language code that a compiler may generate for the input "A=B*C+2" loads
the value of B into a register, multiplies it by the value of C, adds 2 to the result, and stores
the final value in the variable A. The specific instructions generated may vary depending on
the target architecture and the optimizations applied by the compiler.
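As a rough illustration (a minimal sketch in Python; the tokenizer pattern and the
register-machine instruction names LOAD/MUL/ADD/STORE are assumptions made for this
example, not the output of any particular compiler), the translation might proceed as follows:

import re

def tokenize(src):
    # Identifiers, integer literals, and single-character operators.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", src)

def translate(src):
    # Assumes the fixed statement shape: target = left * right + literal.
    target, _, left, _, right, _, lit = tokenize(src)
    return [
        "LOAD  R1, " + left,          # load the value of B into register R1
        "MUL   R1, " + right,         # multiply it by the value of C
        "ADD   R1, #" + lit,          # add the constant 2
        "STORE " + target + ", R1",   # store the final value in A
    ]

for instruction in translate("A=B*C+2"):
    print(instruction)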
(b) Identify whether the following grammars are suitable for Top-Down parsing. If not,
apply an appropriate technique to make them suitable for Top-Down parsing.
10 Marks (CO1)
Modified Grammar:
A -> aA'
A' -> BAd | aA' | ε
B -> bB'
B' -> eB' | ε
In the modified grammar, the left recursion has been eliminated by introducing the new
non-terminals A' and B'.
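This follows the standard transformation for immediate left recursion: productions of the form
A -> Aα | β
are rewritten as
A -> βA'
A' -> αA' | ε
which generates the same language without the left recursion.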
(ii) The given grammar is suitable for Top-Down parsing because it is not a left-recursive
grammar.
(iii) The given grammar is not suitable for Top-Down parsing because it is ambiguous. The
ambiguity arises because the same string can be derived by multiple paths in the grammar.
To make it suitable for Top-Down parsing, we need to remove the ambiguity.
Modified Grammar:
S -> aSaSbS | T
T -> abb | b
In the modified grammar, we have removed the ambiguity by introducing a new non-
terminal T and rewriting the productions. Now, each string can be derived in only one way,
making the grammar suitable for Top-Down parsing.
The input buffering strategy used in lexical analysis involves reading the source code from
the input file into the input buffer in blocks of fixed size. The lexical analyzer then scans the
input buffer, recognizing the tokens in the source code and storing them in the lookahead
buffer. When the lookahead buffer is empty, the lexical analyzer reads another block of the
source code into the input buffer and continues scanning.
By using a two-buffer scheme and input buffering strategy, the lexical analyzer can
efficiently scan the source code without having to read the input file one character at a
time. This speeds up the compilation process and reduces the number of I/O operations
needed.
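As a rough sketch (in Python; the buffer size, the input string, and the use of "\0" as the
sentinel character are illustrative assumptions), the sentinel-based two-buffer scheme can be
pictured as follows; a real scanner would also reload each half from the file as it is exhausted:

BUF_SIZE = 8
SENTINEL = "\0"                       # sentinel marking the end of each buffer half

source = "A = B * C + 2 ;"
halves = [source[:BUF_SIZE] + SENTINEL,   # first buffer half
          source[BUF_SIZE:] + SENTINEL]   # second buffer half

half, forward, scanned = 0, 0, []
while True:
    ch = halves[half][forward]
    if ch == SENTINEL:
        if half == 0:                 # end of first half: switch to the second
            half, forward = 1, 0
            continue
        break                         # end of second half: real end of input
    scanned.append(ch)                # a real scanner cuts tokens here
    forward += 1
print("scanned:", "".join(scanned))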
1: Assemblers: Assemblers are programs that translate assembly language code into
machine code. They are similar to compilers, but they operate at a lower level of
abstraction.
2: Interpreters: Interpreters are programs that execute code directly, without translating it
into machine code first. They interpret the source code line by line, executing each line as it
is encountered.
3: Linkers: Linkers are programs that combine object files generated by compilers into a
single executable file. They resolve references between different object files and generate
the final executable code.
4: Loaders: Loaders are programs that load executable code into memory and prepare it for
execution. They perform tasks such as relocating the code to the correct memory addresses
and setting up the program's initial state.
5: Debuggers: Debuggers are programs that allow developers to debug their code by setting
breakpoints, examining variables and memory contents, and stepping through the code line
by line. They are used during development to find and fix errors in the code.
Each of these programs plays an important role in the software development process, and
they work together to translate, execute, and debug code.
Ques: What are the different types of parsing? Explain the shift-reduce parsing. Write the
different operations that are performed in shift-reduce parsing.
Shift-reduce parsing is a process of reducing a string to the start symbol of a grammar. It uses a stack
to hold grammar symbols and an input buffer to hold the string to be parsed.
Shift-reduce parsing performs two main actions: shift and reduce; this is why it is known as shift-reduce
parsing. In a shift action, the current symbol of the input string is pushed onto the stack. In each reduction,
the symbols matching the right side of a production are replaced by the non-terminal on its left side.
Operations:
Shift: This involves moving symbols from the input buffer onto the stack.
Reduce: This involves reduction of the handle by using appropriate production rules.
Accept: If only the start symbol is present in the stack and the input buffer is empty then, the parsing action
is called accept.
Error: This is the situation in which the parser can perform neither a shift nor a reduce action, and
cannot accept.
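For illustration (the grammar here is chosen for this answer, not given in the question), a
shift-reduce parse of the input id + id with the grammar E -> E + E | id proceeds as follows:

Stack        Input        Action
$            id + id $    shift
$ id         + id $       reduce by E -> id
$ E          + id $       shift
$ E +        id $         shift
$ E + id     $            reduce by E -> id
$ E + E      $            reduce by E -> E + E
$ E          $            accept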
Ques: What is Cross Compiler? Suppose there are two machines A and B, show the
bootstrap arrangement for machine B.
A cross compiler is a compiler capable of creating executable code for a platform other than the one
on which the compiler is running. For example, a compiler that runs on a Windows 7 PC but
generates code that runs on an Android smartphone is a cross compiler.
A cross compiler is necessary to compile for multiple platforms from one machine. A platform could be
infeasible for a compiler to run on, such as the micro-controller of an embedded system, because such
systems contain no operating system. In para-virtualization, one machine runs many operating systems, and
a cross compiler can generate an executable for each of them from one main source.
We can create compilers in many different forms. For example, consider a compiler that takes the C
language as input and generates assembly language as output, given a machine on which assembly
language is already available:
Step-1: First, we write a compiler for a small subset of C (call it C0) in assembly language.
Step-2: Then, using C0 as the implementation language, a compiler for the full source language C
is written.
Step-3: Finally, we compile the second compiler: using compiler 1, compiler 2 is compiled.
Step-4: Thus we get a compiler written in assembly language which compiles C and generates code in
assembly language.
Parser Generator – It produces syntax analyzers (parsers) from input that is based on a
grammatical description of a programming language, i.e. a context-free grammar. It is
useful because the syntax-analysis phase is highly complex and otherwise consumes a
large amount of manual effort and compilation time.
Line buffering: In this technique, the compiler reads one line of source code at a
time and stores it in a buffer. This is a simple and efficient method for processing
source code, but it can lead to problems with error reporting when errors occur on
long lines.
Block buffering: This technique involves reading a block of source code into a
buffer and then processing it. This can be more efficient than line buffering, as it
reduces the number of system calls required. However, it can be less flexible in
terms of handling input errors.
Token buffering: In this technique, the compiler reads the source code one token
(such as a keyword or identifier) at a time and stores it in a buffer. This can be
more efficient than line or block buffering, as it allows the compiler to work on
smaller chunks of code at once. However, it can be more complex to implement
and may require additional processing steps.
In the context of compilers, a pass is a complete traversal of the source code by the
compiler. During each pass, the compiler performs a specific set of tasks on the
source code, such as lexical analysis, syntax analysis, semantic analysis, code
generation, and optimization. The output of one pass becomes the input of the next
pass, and so on until the final executable code is generated.
Factors which influence the number of passes to be used in a particular compiler include the
following:
1. Available memory
2. Speed and size of compiler
3. Speed and size of object program
4. Debugging features required
5. Error-detection and -recovery techniques desired
6. Number of people and time required to accomplish the compiler writing project
Handles and viable prefixes are important in bottom-up parsing because they allow
the parser to efficiently build a parse tree from the bottom up.
By identifying handles and pruning them, that is, replacing each handle with the non-terminal
on the left side of the corresponding production, the parser reconstructs a rightmost
derivation in reverse.
By recognizing viable prefixes, the parser guarantees that the symbols on its stack can always
be extended to a right-sentential form, so errors are detected as soon as the input can no
longer lead to a valid parse.
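For example (an illustrative grammar, not one from the question), with the grammar
E -> E + E | id and the right-sentential form E + id, the handle is the rightmost id, since
reducing it by E -> id undoes the last step of a rightmost derivation; and ε, E, E +, and
E + id are all viable prefixes, because each can appear on the stack of a shift-reduce parser
without extending past the right end of the handle.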
Ques: Explain the different issues that are considered in selecting the
intermediate codes. Discuss the different types of Intermediate Codes.
Intermediate code is a representation of the source code that is generated by a compiler
during the compilation process. It is used as an intermediate step between the source code
and the target code. The selection of the intermediate code is an important task for the
compiler designer, as it affects the efficiency and performance of the compiler. The different
issues that are considered in selecting the intermediate code include: how easy it is to
generate from the source program; how easy it is to translate into target machine code; its
level of abstraction, i.e. how close it is to the source language versus the target machine;
and how well it supports machine-independent optimization.
There are several types of intermediate codes that are commonly used in compiler design,
including:
Three-address code: Three-address code represents each operation with at most three
addresses, typically two operands and a result. It is easy to generate and manipulate, and is
commonly used in optimizing compilers.
Quadruples: Quadruples are a sequence of four items (operator, operand 1, operand 2, and
result) that represent an operation. They are used to simplify the generation of machine
code.
Triples: Triples are a sequence of three items (operator, operand 1, and operand 2) that
represent an operation; the result is referred to by the position of the triple itself, avoiding
explicit temporary names, and they are used to simplify the optimization of machine code.
Control flow graphs: Control flow graphs represent the control flow of the program as a
directed graph. They are used to simplify the analysis and optimization of control flow in the
program.
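For illustration, the statement A = B * C + 2 from question 1 can be written in each of the
linear forms (the temporary names t1 and t2 are conventional):

Three-address code:
t1 = B * C
t2 = t1 + 2
A = t2

Quadruples:
op   arg1   arg2   result
*    B      C      t1
+    t1     2      t2
=    t2            A

Triples:
     op   arg1   arg2
(0)  *    B      C
(1)  +    (0)    2
(2)  =    A      (1)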
A translator is a program that converts source code into object code. Generally, there are three
types of translator: compilers, interpreters, and assemblers.
Phases of Compiler :
Lexical Analysis - The first phase of the compiler works as a text scanner. This phase scans the
source code as a stream of characters and converts it into meaningful lexemes. The lexical
analyzer represents these lexemes in the form of tokens as:
<token-name, attribute-value>
Syntax Analysis - The next phase is called the syntax analysis or parsing. It takes the
token produced by lexical analysis as input and generates a parse tree (or syntax tree). In
this phase, token arrangements are checked against the source code grammar, i.e. the parser
checks if the expression made by the tokens is syntactically correct.
Semantic Analysis - Semantic analysis checks whether the parse tree constructed follows the
rules of the language. For example, it checks that values are assigned between compatible
data types and reports errors such as adding a string to an integer. The semantic analyzer
also keeps track of identifiers, their types and expressions, and whether identifiers are
declared before use or not, etc. The semantic analyzer produces an annotated syntax tree
as an output.
Intermediate Code Generation - After semantic analysis the compiler generates an
intermediate code of the source code for the target machine. It represents a program for
some abstract machine. It is in between the high-level language and the machine language.
This intermediate code should be generated in such a way that it makes it easier to be
translated into the target machine code.
Code Optimization – The next phase does code optimization of the intermediate code.
Optimization can be assumed as something that removes unnecessary code lines, and
arranges the sequence of statements in order to speed up the program execution without
wasting resources (CPU, memory).
Code Generation - In this phase, the code generator takes the optimized representation
of the intermediate code and maps it to the target machine language. The code generator
translates the intermediate code into a sequence of (generally) relocatable machine code.
Sequence of instructions of machine code performs the task as the intermediate code would
do.
Ques: What is Shift-Reduce and Reduce-Reduce conflict? How can
these be resolved? Explain error recovery in LR parsing.
Shift-reduce and reduce-reduce conflicts are two types of parsing conflicts that can
occur in a bottom-up parser like an LR parser.
In a shift-reduce conflict, the parser reaches a state in which it can either reduce by a
grammar rule or shift another token onto the stack. This ambiguity can result in
multiple possible parse trees for the input.
In a reduce-reduce conflict, the parser encounters a state where there are two or
more possible reductions for the same lookahead token. This ambiguity can also
result in multiple possible parse trees for the input.
To resolve these conflicts, the parser generator typically uses a set of rules called
precedence rules, associativity rules, and/or default actions that specify how to
resolve the ambiguity. For example, a higher precedence operator may be given
priority over a lower precedence operator in a shift-reduce conflict, or an associativity
rule may be used to determine how to group operators of the same precedence in a
reduce-reduce conflict.
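For example, the classic dangling-else grammar S -> if E then S | if E then S else S | other
produces a shift-reduce conflict: after the parser has seen "if E then S" and the next token is
"else", it can either reduce by S -> if E then S or shift the "else". The conflict is conventionally
resolved in favour of shifting, which matches the usual language rule that an "else" attaches
to the nearest unmatched "then".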
When there is no valid continuation for the input scanned thus far, LR parsers report an error.
Before announcing an error, a canonical LR (CLR) parser never performs even a single reduction;
an SLR or LALR parser may perform several reductions, but neither will ever shift an erroneous
input symbol onto the stack.
In LR parsing, an error is detected when the parser consults the action table and finds that the
relevant entry is empty; goto entries are never consulted to detect errors. A common recovery
strategy is panic mode: the parser pops states from the stack until it finds a state with a goto
on some non-terminal A, discards input symbols until it reaches one that can legitimately
follow A, and then resumes parsing.
The FIRST and FOLLOW sets are used to build the parsing table, which maps the
current non-terminal symbol and the current input token to the next action to take in
the parsing process. Predictive parsers use this table to make parsing decisions based
on the current symbol and the current input token.
Here are the rules for finding the FIRST set of a grammar symbol:
1. If X is a terminal symbol, then FIRST(X) = {X}.
2. If X -> ε is a production, then add ε to FIRST(X).
3. If X -> Y1 Y2 ... Yk is a production, add everything in FIRST(Y1) except ε to FIRST(X);
if ε is in FIRST(Y1), also add FIRST(Y2) except ε, and so on. If ε is in FIRST(Yi) for
every i, add ε to FIRST(X).
Here are the rules for finding the FOLLOW set of a grammar symbol:
1. The FOLLOW set of the start symbol always contains $, the end-of-input
marker.
2. For each production A -> αBβ, add everything in FIRST(β) except ε to the
FOLLOW set of B.
3. For each production A -> αB, or A -> αBβ where ε is in the FIRST set of β,
add the FOLLOW set of A to the FOLLOW set of B.
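As a rough sketch (in Python; the grammar encoding, the name "eps" for ε, and the choice of
the left-factored expression grammar, picked so that ε-productions appear, are assumptions
made for this example), the FIRST rules above can be computed by iterating to a fixed point:

EPSILON = "eps"

# Symbols that do not appear as keys are treated as terminals.
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F":  [["(", "E", ")"], ["id"]],
}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    def first_of(sym):
        return first[sym] if sym in grammar else {sym}   # rule 1: terminals
    changed = True
    while changed:                                       # repeat until stable
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                for sym in prod:                         # rule 3: scan Y1 Y2 ... Yk
                    new = first_of(sym) - {EPSILON}
                    if not new <= first[nt]:
                        first[nt] |= new
                        changed = True
                    if EPSILON not in first_of(sym):
                        break
                else:                                    # the whole production derives ε
                    if EPSILON not in first[nt]:
                        first[nt].add(EPSILON)
                        changed = True
    return first

for nt, f in sorted(compute_first(grammar).items()):
    print("FIRST(" + nt + ") =", sorted(f))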
1. Type checking: The semantic analyzer checks that all variables and expressions
are being used with the correct data types, and performs implicit type
conversions where necessary.
2. Scope resolution: The semantic analyzer determines the scope of variables and
ensures that variables are used within their scope.
3. Syntax tree transformation: The semantic analyzer can modify the syntax tree
generated by the parser to simplify the code or optimize its execution.
4. Error checking: The semantic analyzer detects and reports semantic errors that
cannot be detected by the lexical and syntactic analysis phases.
5. Memory management: The semantic analyzer can allocate and deallocate
memory for variables and data structures.
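As a small illustrative sketch of the type-checking task (a minimal example in Python; the
function and type names are assumptions, not part of any real compiler):

def check_binary_op(op, left_type, right_type):
    # Allow '+' on numeric types, widening int to float (an implicit conversion).
    numeric = {"int", "float"}
    if op == "+" and left_type in numeric and right_type in numeric:
        return "float" if "float" in (left_type, right_type) else "int"
    raise TypeError("cannot apply '%s' to %s and %s" % (op, left_type, right_type))

print(check_binary_op("+", "int", "float"))   # float: int is widened to float
check_binary_op("+", "string", "int")         # raises TypeError (a semantic error)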
Now, let's discuss two benefits of generating intermediate code in a compiler. First, it makes
retargeting easier: a front-end for a source language can be combined with back-ends for
different target machines, so a compiler for a new machine can be built by writing only a new
back-end. Second, it allows machine-independent code optimization to be performed on the
intermediate form before the target code is generated.
Ambiguity in a grammar can cause problems for parsers, as it can make it difficult or
impossible to determine the correct parse tree for a given input string. This can result
in incorrect or unpredictable behaviour when the parser tries to parse the input.
To remove ambiguity from a grammar, two techniques can be used: rewriting the grammar so
that precedence and associativity are built into the productions, or applying disambiguating
rules (such as operator-precedence and associativity declarations) during parsing.
For example, the grammar E -> E + E | E * E | id is ambiguous, since a string such as
id + id * id has two parse trees. It can be disambiguated by rewriting it as
E -> E + T | T, T -> T * F | F, F -> id, which gives * higher precedence than + and makes
both operators left-associative.
Ques: Write the algorithm for finding the LR(0) items in bottom up parsing.
Define the CLOSURE and GOTO functions with examples.
Algorithm for finding the LR(0) items in bottom-up parsing:
1. Start with the augmented grammar, adding the production S' → S, and create the initial
item set as the closure of [S' → .S].
2. Repeat until no new item sets can be added:
a. For each item [A → α.Bβ] in a set, add the item [B → .γ] for every production B → γ
(the CLOSURE step).
b. For each item set I and each grammar symbol X, form GOTO(I, X): take every item
[A → α.Xβ] in I, advance the dot to obtain [A → αX.β], and take the closure of the
resulting set.
3. The canonical collection of LR(0) item sets is the set of all item sets generated in step 2.
CLOSURE function:
The CLOSURE function takes a set of LR(0) items and returns the set containing the original items
together with all items that can be derived from them: whenever an item [A → α.Bβ] with the dot
immediately before a non-terminal B is in the set, the item [B → .γ] is added for every production
B → γ in the grammar, and this is repeated until no new items can be added.
Example:
S→E
E→E+T|T
T→T*F|F
F → ( E ) | id
If we apply the closure function to the LR(0) item [E → E+.T], where the dot stands immediately
before the non-terminal T, we get the set of items:
[E → E+.T]
[T → .T*F]
[T → .F]
[F → .(E)]
[F → .id]
GOTO function:
The GOTO function takes a set of LR(0) items I and a grammar symbol X, and returns the set of
items reached by shifting X. Specifically, GOTO(I, X) is the closure of the set of all items
[A → αX.β] such that [A → α.Xβ] is in I.
Example:
Continuing from the previous example, if we apply the GOTO function to the set of items
generated by the closure function above, with X = T, we get the set of items:
[E → E+T.]
[T → T.*F]
This set of items represents the LR(0) items that can be reached from the closure of
[E → E+.T] by shifting T.
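As a rough sketch (in Python; representing an item as a (head, body, dot-position) tuple is an
assumption made for this example), CLOSURE and GOTO for the grammar above can be written as:

# Grammar from the example above; an item (head, body, dot) means the
# production head -> body with the dot at index 'dot' in body.
grammar = {
    "S": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in grammar:   # dot before a non-terminal
                for prod in grammar[body[dot]]:
                    item = (body[dot], prod, 0)            # add [B -> .γ]
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, symbol):
    # Advance the dot over 'symbol' in every item where it applies,
    # then take the closure of the result.
    moved = {(h, b, d + 1) for (h, b, d) in items
             if d < len(b) and b[d] == symbol}
    return closure(moved)

I = closure({("E", ("E", "+", "T"), 2)})        # closure of [E -> E+.T]
for head, body, dot in sorted(goto(I, "T")):
    print(head, "->", " ".join(body[:dot]) + " . " + " ".join(body[dot:]))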
In some cases, optimizing the code may not be worthwhile, especially for small
programs or programs that are not performance-critical. Additionally, optimizing code
can be computationally expensive, and may require significant resources or long
compile times.
If we were designing a code optimizer for a compiler, some design objectives that we
would consider include:
1. Correctness: The optimizer should not introduce any bugs or errors into the
code, and should preserve the semantics of the original program.
2. Efficiency: The optimizer should be efficient and fast, so that it does not
significantly increase the compile time of the program.
3. Effectiveness: The optimizer should be effective at improving the performance
of the generated code, and should produce significant improvements in
execution time or memory usage.
4. Portability: The optimizer should be designed to work with a wide range of input
programs and architectures, and should not rely on specific hardware or
software features.
5. Maintainability: The optimizer should be designed to be maintainable and easy
to modify or extend, so that it can be updated as new optimizations or features
are developed.
The front-end of a compiler is responsible for analyzing the source code and creating
an intermediate representation of the program that can be used by the back-end. This
includes tasks such as lexical analysis, syntax analysis, semantic analysis, and code
generation for the intermediate language. The front-end typically performs most of
the error checking and type checking for the program.
The back-end of a compiler is responsible for generating the target machine code or
assembly language from the intermediate representation. This includes tasks such as
instruction selection, register allocation, and code optimization. The back-end is
typically heavily dependent on the specific architecture of the target machine, and
may have multiple variants to support different hardware configurations.
The front-end and back-end are important in the design of different compilers because
they can be designed and optimized independently. This allows compiler designers to
reuse front-end components across multiple languages or platforms, while
customizing the back-end to take advantage of specific hardware features or
optimizations.
1. No two productions for a nonterminal A should begin with the same terminal
symbol. In other words, if A → α and A → β are two productions for
nonterminal A, then FIRST(α) ∩ FIRST(β) = Ø.
2. If A → β is a production and FIRST(β) contains ε, then FIRST(α) ∩ FOLLOW(A) = Ø
for every other production A → α of the same nonterminal.
For example, the following grammar satisfies both conditions, since FIRST(bA) = {b} and
FIRST(c) = {c} are disjoint and no production derives ε:
S -> a A
A -> b A | c
Ques: List the names of four cousins of the compiler and explain the working
of these cousins of the compiler. Write the names of the different files that are
generated when we compile a C-language program (sample.c).
Four cousins of the compiler are the preprocessor, the assembler, the linker, and the loader.
The preprocessor performs macro expansion and file inclusion before compilation proper
begins; the assembler translates the assembly code produced by the compiler into machine
code; the linker combines the resulting object files and libraries into a single executable;
and the loader places the executable into memory and prepares it for execution.
When we compile a C-language program (sample.c), the following files are generated:
1. Preprocessed file (sample.i) - This file contains the preprocessed output of the
source code, with all preprocessor directives expanded and included files
inserted.
2. Assembly file (sample.s) - This file contains the assembly language code
generated by the compiler from the preprocessed source code.
3. Object file (sample.o) - This file contains the machine code generated by the
assembler from the assembly language code.
4. Executable file (a.out on Unix-like systems by default, or sample.exe on
Windows) - This file is generated by the linker and contains the complete
program that can be executed by the computer's processor.
The bootstrap arrangement for the second machine (assuming two machines A and B) involves
using machine A to create a simple version of the compiler for machine B. The simple compiler is
then used to create a more complex version of the compiler for machine B, which can in turn be
used to compile still more complete versions of itself.
The bootstrap arrangement for the second machine can be summarized as follows:
1. On machine A, write a compiler that generates code for machine B (a cross compiler).
2. Run this cross compiler on machine A to compile the compiler's own source code,
producing a compiler that runs natively on machine B.
3. On machine B, recompile the compiler with itself to obtain and verify the final native version.
This process can be repeated iteratively until the final version of the compiler is produced for both
machines A and B. The benefit of bootstrapping is that it allows for the creation of a powerful and
efficient compiler from a simpler and less powerful one.