CD Final
11. What is front-end and back-end of compiler?
➢ Front end:- The front end consists of those phases that depend primarily on the source language and are largely
independent of the target machine. The front end includes lexical analysis, syntax analysis, semantic analysis, intermediate
code generation and creation of the symbol table. A certain amount of code optimization can also be done by the front end.
➢ Back end:- The back end consists of those phases that depend on the target machine and not on the source language.
It includes the code optimization and code generation phases, with the necessary error handling and symbol table operations.
12. List the cousins of the compiler and explain the role of each. / What is linker and loader. / Explain the roles of linker,
loader and preprocessor.
➢ In addition to a compiler, several other programs may be required to create an
executable target program.
• Preprocessor:- A preprocessor produces the input to the compiler. Its tasks include:
Macro processing: A preprocessor may allow the user to define macros that are
shorthand for longer constructs.
File inclusion: A preprocessor may include header files into the program text.
Rational preprocessor: Such a preprocessor provides the user with built-in macros for
constructs like while statements or if statements.
Language extensions: These processors attempt to add capabilities to the language by
what amounts to built-in macros.
• Assembler:- An assembler is a translator that takes an assembly program as input and
generates machine code as output. Assembly code is a mnemonic version of
machine code, in which names are used instead of binary codes for operations.
• Linker:- A linker allows us to make a single program from several files of relocatable
machine code. These files may have been the result of several different compilations,
and one or more may be library files of routines provided by the system.
• Loader:- The process of loading consists of taking relocatable machine code,
altering the relocatable addresses and placing the altered instructions and data in memory at the proper locations.
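The macro-processing task described above can be sketched in a few lines. This is a minimal, hypothetical model of an object-like macro expander (no function-like macros, no conditional compilation); the macro names and source text are illustrative.

```python
import re

def expand_macros(source, macros):
    """Sketch of a preprocessor's macro-processing step: replace each
    user-defined macro name (as a whole word) with its longer definition
    before the text is handed to the compiler proper."""
    for name, body in macros.items():
        source = re.sub(rf"\b{re.escape(name)}\b", body, source)
    return source

# A macro acting as shorthand for a longer construct:
expanded = expand_macros("int a = MAX;", {"MAX": "100"})
# → "int a = 100;"
```

A real preprocessor (such as the C preprocessor) also handles file inclusion and conditional compilation, which this sketch omits.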
13. Describe code generator design issues. / Explain various issues in the design of code generator.
➢ Input to code generator:- The input to the code generator is the intermediate code generated by the front end,
along with information in the symbol table that determines the run-time addresses of the data objects denoted by
the names in the intermediate representation. The intermediate code may be represented as quadruples,
triples, indirect triples, postfix notation, syntax trees, DAGs, etc.
➢ Target program: The target program is the output of the code generator. The output may be:-
Assembly language: It allows subprogram to be separately compiled.
Relocatable machine language: It makes the process of code generation easier.
Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.
➢ Memory management:- Mapping names in the source program to addresses of data objects in run time memory is
done cooperatively by the front end and the code generator.
We assume that a name in a three-address statement refers to a symbol-table entry for that name.
➢ Instruction selection:- The instruction set of the target machine should be complete and uniform. When considering
the efficiency of the target machine, instruction speed and machine idioms are important factors. The
quality of the generated code is determined by its speed and size.
➢ Register allocation issues:- Computations using registers are faster than those using memory, so
efficient utilization of registers is important. The use of registers is subdivided into two subproblems:
During register allocation, we select the set of variables that will reside in registers at each point in the program.
During a subsequent register assignment phase, the specific register is picked for each such variable.
➢ Choice of evaluation order:- The order in which computations are performed can affect the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others. Picking a best order is
another difficult, NP-complete problem.
➢ Approaches to code generation:- The most important criterion for a code generator is that it produces correct code.
Correctness takes on special significance because of the number of special cases that a code generator must face.
Given the premium on correctness, designing a code generator so it can be easily implemented, tested, and
maintained is an important design goal.
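The issues above (three-address input, instruction selection, register use) can be tied together with a small sketch. This is a hypothetical, naive code generator: it uses a single register, a made-up two-address instruction set (MOV/ADD/SUB/MUL), and does no real register allocation.

```python
def gen_code(tac):
    """Translate three-address statements (dst, op, src1, src2) into a
    hypothetical two-address assembly: load one operand into register R0,
    apply the operation, then store the result."""
    ops = {"+": "ADD", "-": "SUB", "*": "MUL"}
    asm = []
    for dst, op, a, b in tac:
        asm.append(f"MOV R0, {a}")        # instruction selection: load first operand
        asm.append(f"{ops[op]} R0, {b}")  # apply the operator to R0 and the second operand
        asm.append(f"MOV {dst}, R0")      # store the result back
    return asm

# t1 = b + c ; a = t1 - d
code = gen_code([("t1", "+", "b", "c"), ("a", "-", "t1", "d")])
```

A real code generator would keep values in registers across statements instead of storing after every operation; this naive store-everything strategy is exactly what register allocation improves upon.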
14. Explain: Error recovery strategies in compiler. / Explain all error recovery strategies used by parser.
➢ There are mainly four error recovery strategies :
• Panic Mode:- In this method, on discovering an error, the parser discards input symbols one at a time. This process
continues until one of a designated set of synchronizing tokens is found. Synchronizing tokens are delimiters such as
the semicolon or end, which indicate the end of a statement.
If there are few errors in the same statement, this strategy is a good choice.
• Phrase Level Recovery:- In this method, on discovering an error, the parser performs local correction on the remaining input.
The local correction can be replacing a comma by a semicolon, deleting an extraneous semicolon or inserting a missing semicolon.
This type of local correction is decided by the compiler designer.
This method is used in many error-repairing compilers.
• Error Production:- If we have good knowledge of the common errors that might be encountered, we can augment
the grammar of the corresponding language with error productions that generate the erroneous constructs.
We then use the grammar augmented by these error productions to construct a parser. If an error production is used
during parsing, an appropriate error message can be generated and parsing can continue.
• Global Correction:- Given an incorrect input string x and a grammar G, the algorithm finds a parse tree for a related
string y, such that the number of insertions, deletions and changes of tokens required to transform x into y is as small as
possible. Such methods increase the time and space requirements at parsing time.
Global correction is thus mainly a theoretical concept.
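Panic-mode recovery, the simplest of the four strategies, can be sketched directly. The token list and the synchronizing set here are illustrative assumptions, not part of any real parser.

```python
def panic_mode_skip(tokens, i, sync=("end", ";")):
    """On an error at position i, discard input tokens one at a time until
    a synchronizing token (a statement delimiter) is found, then resume
    parsing just past it."""
    while i < len(tokens) and tokens[i] not in sync:
        i += 1           # discard one symbol at a time
    return i + 1         # position immediately after the synchronizing token

# Error detected at the stray '@' in: x = @ junk ; y ...
tokens = ["x", "=", "@", "junk", ";", "y"]
resume = panic_mode_skip(tokens, 2)   # skips '@' and 'junk', lands after ';'
```

Because whole phrases are discarded, panic mode is guaranteed not to loop, which is why it suits statements with few errors.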
15. List the functions of lexical analyzer.
➢ Tokenization: The lexical analyzer breaks the input source code into meaningful units called tokens, such as identifiers,
keywords, operators, literals, and punctuation symbols. Each token represents a specific lexical element in the
programming language.
• Removing Comments and Whitespace: The lexical analyzer filters out comments and whitespace from the source
code, as they are not relevant to the compilation process but may aid readability for humans.
• Error Handling: It identifies and reports lexical errors, such as invalid characters or tokens, to the compiler or the
programmer. It ensures that the compiler can provide meaningful feedback to the user about syntax issues.
• Symbol Table Management: In some implementations, the lexical analyzer may also interact with the symbol table to
manage identifiers and their associated attributes. It may perform tasks like symbol table lookup and insertion for
identifiers encountered in the source code.
• Handling Preprocessor Directives: In languages with preprocessor directives (e.g., C/C++), the lexical analyzer may
process these directives before passing the modified source code to the parser. This includes tasks such as file
inclusion, macro expansion, and conditional compilation.
• Generating Output for the Parser: Finally, the lexical analyzer produces an output stream of tokens that serves as input
for the parser. This stream provides the parser with the necessary information to analyze the syntax of the source
code and construct a parse tree or AST.
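Several of the functions listed above (tokenization, removing comments and whitespace, error reporting, output for the parser) can be sketched with a regex-driven lexer. The token specification below is a hypothetical example, not a real language's lexical grammar.

```python
import re

# Hypothetical token specification; production lexers are often generated
# from such regular expressions (e.g. by a tool like lex/flex).
TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),           # filtered out, not passed to the parser
    ("COMMENT",    r"//[^\n]*"),      # filtered out
    ("NUMBER",     r"\d+"),
    ("ID",         r"[A-Za-z_]\w*"),  # keywords could be split off via a table lookup
    ("OP",         r"==|\+\+|[=+<>;(){}]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    """Break source text into (token-type, lexeme) pairs, skipping
    whitespace and comments, and report invalid characters as errors."""
    tokens, pos = [], 0
    while pos < len(code):
        m = MASTER.match(code, pos)
        if not m:  # lexical error handling
            raise SyntaxError(f"invalid character {code[pos]!r} at position {pos}")
        if m.lastgroup not in ("WHITESPACE", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens  # the token stream handed to the parser
```

For example, `tokenize("a = 10; // init")` yields the pairs for `a`, `=`, `10` and `;`, with the comment discarded.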
16. Explain Storage allocation strategies. / List and explain various storage allocation strategies.
➢ Static allocation: lays out storage for all data objects at compile time.
In static allocation, names are bound to storage as the program is compiled, so there is no need for a run-time
support package. Since the bindings do not change at run time, every time a procedure is activated, its names are
bound to the same storage locations.
Therefore the values of local names are retained across activations of a procedure. That is, when control returns to a
procedure, the values of its locals are the same as they were when control left it the last time.
➢ Stack allocation: manages the run-time storage as a stack.
All compilers for languages that use procedures, functions or methods as units of user-defined actions manage at least
part of their run-time memory as a stack. Each time a procedure is called, space for its local variables is pushed onto the
stack, and when the procedure terminates, that space is popped off the stack.
➢ Heap allocation: allocates and de-allocates storage as needed at run time from a data area known as the heap.
The stack allocation strategy cannot be used if either of the following is possible: the values of local names must be
retained when an activation ends; a called activation outlives its caller.
Heap allocation parcels out pieces of contiguous storage as needed for activation records or other objects.
Pieces may be de-allocated in any order, so over time the heap will consist of alternating areas that are free and in
use. The record for an activation of a procedure can be retained even when the activation ends.
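The stack discipline described above can be modeled in a few lines. This is a toy model with invented names (`RuntimeStack`, `call`, `ret`), intended only to show that locals live exactly as long as their activation record.

```python
class RuntimeStack:
    """Toy model of stack allocation: each call pushes an activation
    record holding the procedure's locals; returning pops it, so locals
    do not survive across activations (unlike static allocation)."""

    def __init__(self):
        self.records = []

    def call(self, proc, **locals_):
        # Push an activation record for the called procedure.
        self.records.append({"proc": proc, "locals": locals_})

    def ret(self):
        # Pop the record: the procedure's locals cease to exist.
        return self.records.pop()

stack = RuntimeStack()
stack.call("main", x=1)
stack.call("f", y=2)   # f's locals live only while f is active
stack.ret()            # f returns: its record (and y) disappear
```

Heap allocation differs precisely in that a record may outlive its activation, which is why a stack cannot model it.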
17. Explain various code optimization techniques. / Explain any three code-optimization technique in detail.
➢ Compile time evaluation:- Compile time evaluation means shifting computations from run time to
compile time. There are two methods used to obtain compile time evaluation.
Folding:- In the folding technique, the computation of constants is done at compile time instead of run time.
Constant propagation:- In this technique, the value of a variable is replaced and the computation of an expression is done
at compile time.
• Common sub-expression elimination:- A common sub-expression is an expression that appears repeatedly in the
program and has been computed previously. If the operands of this sub-expression do not change, the previously
computed result is used instead of re-computing it each time.
• Variable propagation:- Variable propagation means use of one variable instead of another.
• Code movement:- There are two basic goals of code movement: (i) to reduce the size of the code, and (ii) to reduce the
frequency of execution of code.
Loop invariant computation:- Loop invariant optimization is obtained by moving code that does not change within the
loop to a point just before the loop is entered. This method is also called code motion.
• Strength reduction:- The strength of certain operators is higher than that of others; for instance, the strength of * is
higher than that of +. In this technique, higher-strength operators are replaced by lower-strength operators.
• Dead code elimination:- A variable is said to be live at a point in a program if its value is used subsequently. Conversely,
a variable is said to be dead at a point in a program if its value is never used afterwards. Code that only computes such
dead variables is called dead code, and an optimization can be performed by eliminating it.
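Folding, the first technique above, can be sketched on three-address statements. The `(dst, op, a, b)` tuple format is an assumption of this sketch, not a standard representation.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(tac):
    """Constant folding: for each three-address statement (dst, op, a, b),
    when both operands are integer constants, evaluate the expression at
    'compile time' and replace the statement with a plain copy."""
    out = []
    for dst, op, a, b in tac:
        if isinstance(a, int) and isinstance(b, int):
            out.append((dst, "=", OPS[op](a, b), None))  # computed now, not at run time
        else:
            out.append((dst, op, a, b))                  # left for run time
    return out

# x = 2 * 3 is folded to x = 6; y = x + 1 cannot be folded
folded = fold_constants([("x", "*", 2, 3), ("y", "+", "x", 1)])
```

Constant propagation would then substitute 6 for `x` in the second statement, possibly enabling further folding; the two techniques are usually iterated together.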
18. Explain LALR parser in detail. Support your answer with an example.
➢ An LALR parser is a lookahead LR parser. It can handle a large class of grammars while keeping the parsing table small:
the size of the CLR parsing table is quite large compared to other parsing tables, and LALR reduces this size. LALR
works similarly to CLR; the only difference is that it merges the similar states of the CLR parsing table into single states.
The general form of an item is [A → α.B, a], where A → α.B is a production and a is a terminal or the right end marker $;
that is, LR(1) items = LR(0) items + lookahead.
• Example:-
S→CC
C→aC|d
Augmented grammar: S′ → S
Starting from the item [S′ → .S, $] and taking Closure(I), we build the canonical collection of LR(1) item sets.
[LR(1) item-set table omitted]
Now we merge the states with identical cores: 3 with 6, 4 with 7,
and 8 with 9.
I36 : C→ a.C , a | d | $
C→ .a C , a | d | $
C→ .d , a | d | $
I47 : C→ d. , a | d | $
I89: C→ aC. ,a | d | $
Parsing table: [LALR parsing table omitted]
19. Explain different types of intermediate code.
➢ There are three types of intermediate representation:-
• Abstract syntax tree:- A syntax tree depicts the
natural hierarchical structure of a source program. A
DAG (Directed Acyclic Graph) gives the same
information but in a more compact way because
common sub-expressions are identified. A syntax tree
and DAG for the assignment statement a = b*-c + b*-c
are shown in the figure. [Figure of syntax tree and DAG (assign, uminus nodes) omitted]
• Postfix notation:- Postfix notation is a linearization of a syntax tree. In postfix notation the operands occur first,
followed by the operators. The postfix notation for the syntax tree above is: a b c uminus * b c uminus * + assign.
• Three address code:- Three address code is a sequence of statements of the general form a := b op c,
where a, b and c are operands that can be names or constants, and op stands for any operator.
An expression like a = b + c + d might be translated into the sequence:
t1 = b + c
t2 = t1 + d
a = t2
Here t1 and t2 are temporary names generated by the compiler.
At most three addresses are allowed (two for the operands and one for the result).
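The translation above can be sketched as a small generator. The function name and string format are assumptions of this sketch; it handles only a left-to-right chain of one repeated operator.

```python
import itertools

def gen_tac(target, operands, op="+"):
    """Emit three-address statements for a left-to-right chain such as
    a = b + c + d: each statement holds at most one operator, with
    compiler-generated temporaries t1, t2, ... for intermediate results."""
    temps = (f"t{i}" for i in itertools.count(1))
    code, acc = [], operands[0]
    for x in operands[1:]:
        t = next(temps)
        code.append(f"{t} = {acc} {op} {x}")  # at most three addresses per statement
        acc = t
    code.append(f"{target} = {acc}")          # final copy into the target name
    return code

print(gen_tac("a", ["b", "c", "d"]))
# → ['t1 = b + c', 't2 = t1 + d', 'a = t2']
```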
20. Explain different representation of three address code.
➢ There are 3 representations of three address code namely:-
• Quadruple:- A quadruple is a structure with four fields: op, arg1, arg2 and result. op denotes the operator,
arg1 and arg2 denote the two operands, and result stores the result of the expression.
Example:- Consider expression a = b * – c + b * – c. The
three address code is:
t1 = uminus c (Unary minus operation on c)
t2 = b * t1
t3 = uminus c (Another unary minus operation on c)
t4 = b * t3
t5 = t2 + t4
a = t5 (Assignment of t5 to a)
Quadruple table:
op      arg1   arg2   result
uminus  c             t1
*       b      t1     t2
uminus  c             t3
*       b      t3     t4
+       t2     t4     t5
=       t5            a
• Triples:- This representation doesn't use an extra
temporary variable to name the result of a single operation; instead,
when a reference to another triple's value is needed, a pointer (the
index) of that triple is used. So it consists of only three fields: op,
arg1 and arg2.
Example:- Consider the expression a = b * – c + b * – c. Triple table:
(0) uminus  c
(1) *       b    (0)
(2) uminus  c
(3) *       b    (2)
(4) +       (1)  (3)
(5) =       a    (4)
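The relationship between the two representations can be shown mechanically. This sketch (function name and `'(i)'` notation are assumptions) converts quadruples into triples by replacing temporary names with the index of the row that defined them.

```python
def quads_to_triples(quads):
    """Convert quadruples (op, arg1, arg2, result) into triples: each
    reference to a temporary becomes '(i)', the index of the defining row,
    so the explicit temporary names disappear."""
    defined = {}   # temporary name -> index of the row that computed it
    triples = []
    for i, (op, a1, a2, res) in enumerate(quads):
        a1 = f"({defined[a1]})" if a1 in defined else a1
        a2 = f"({defined[a2]})" if a2 in defined else a2
        triples.append((op, a1, a2))
        defined[res] = i
    return triples

quads = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    ("=",      "t5", None, "a"),
]
# Note: this sketch emits the final assignment as ('=', '(4)', None);
# textbooks often write the assignment triple as ('=', 'a', '(4)').
triples = quads_to_triples(quads)
```

Note that the sketch does not identify the repeated sub-expression `b * -c`; a DAG-based builder would map both occurrences to the same row.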
65. Define token, lexeme and pattern. Identify the lexemes that make up the tokens for the following code
const p = 10; if( a < p) { a++ ; if(a== 5) continue ; }
➢ const: Keyword token, lexeme = const
p: Identifier token, lexeme = p
=: Operator token, lexeme = =
10: Number token, lexeme = 10
;: Separator token, lexeme = ;
if: Keyword token, lexeme = if
(: Separator token, lexeme = (
a: Identifier token, lexeme = a
<: Operator token, lexeme = <
p: Identifier token, lexeme = p
): Separator token, lexeme = )
{: Separator token, lexeme = {
a: Identifier token, lexeme = a
++: Operator token, lexeme = ++
;: Separator token, lexeme = ;
if: Keyword token, lexeme = if
(: Separator token, lexeme = (
a: Identifier token, lexeme = a
==: Operator token, lexeme = ==
5: Number token, lexeme = 5
): Separator token, lexeme = )
continue: Keyword token, lexeme = continue
;: Separator token, lexeme = ;
}: Separator token, lexeme = }
66. Construct deterministic finite automata without constructing NFA for following regular expression. (a/b)*abb*
67. Generate the SLR parsing table for following grammar
S → Aa | bAc | bBa
A→d
B→d
68. Show the following grammar is LR(1) but not LALR(1).
S → Aa | bAc | Bc | bBa
A → d
B → d
69. Construct SLR parsing table for the following grammar: S → (L) | a L→ L,S | S
70. Translate following arithmetic expression into (i) Quadruples (ii) Triple (iii) Indirect Triple. (a*b)+(c+d)-(a+b+c+d)
71. Construct CLR parsing table for following grammar.
S → aSA | ε
A → bS | c