COMPILER LAB VIVA QUESTIONS - Docx-1
1. What is a compiler?
A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language). The compiler also reports to its user the presence of errors in the source program.
In short, it is a tool or application which converts a high-level language program into a low-level language program (machine language).
2. What are the different language processing tools?
Editor
An editor is a tool used to write a program, e.g. gedit, Notepad, Wordpad, MS Word.
Preprocessor
The preprocessor expands the program. It replaces preprocessor directives with their definitions wherever they are used, e.g. header files, constant definitions, macros.
Compiler
It is the key application. It converts a high-level language program into a low-level language program (assembly language/machine language). It works in six phases, described later in these notes.
Assembler
It is a tool which accepts assembly language code and converts it into machine language (executable) code. This executable code has relocatable logical addresses.
Linker
It is the tool which links the object code with library routines and other object modules, resolving references to external symbols.
Loader
The loader loads the program from secondary storage (hard disk, USB storage, etc.) into main memory (RAM) so that it can be executed by the processor. The loader resolves the relocatable logical addresses and places the executable code at actual addresses in main memory.
3. What is the difference between compiler and interpreter?
Compiler:
● Works on the complete program at once, i.e. it takes the complete program as input.
● Generates intermediate code, i.e. object code.
● A program needs to be compiled only once and can then be run any number of times.
● Takes more memory, as the intermediate code it generates has to be saved in memory.
Interpreter:
● Works line by line, i.e. it takes one line as input at a time.
● Does not generate intermediate code.
● Has to translate the same program every time it is run.
● Is memory efficient, as it does not generate intermediate object code.
Code Generator
The code generator produces assembly code, in terms of registers, from the input it receives from the earlier phases.
For the above example (supposing id1, id2, id3 are assigned to registers AX, BX, CX respectively), the assembly code will be:
MUL BX, CX
ADD AX, CX
5. What is a cross compiler?
A cross compiler is a compiler that runs on one machine architecture but generates target code for a different architecture.
6. What is front end and back end of a compiler?
In compilers, the frontend translates the source code into an intermediate representation, and the backend works with the intermediate representation to produce code in the target language. The backend usually performs optimizations to produce code that runs faster.
7. What is a symbol table and its use?
The symbol table is an important data structure created and maintained by the compiler in order to keep track of the semantics of variables, i.e. it stores information about the scope and binding of names, and about instances of various entities such as variables, function names, classes, objects, etc.
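The idea can be sketched as a small data structure. The following is a minimal illustration in Python (the class name SymbolTable and its methods are made up for this sketch, not taken from any particular compiler); it keeps a stack of scopes and resolves a name from the innermost scope outwards:

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its attributes."""

    def __init__(self):
        self.scopes = [{}]            # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, **attrs):
        # Bind the name in the innermost (current) scope.
        self.scopes[-1][name] = attrs

    def lookup(self, name):
        # Resolve a name from the innermost scope outwards,
        # the way a compiler resolves nested-scope references.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

For example, inserting x as an int globally, then as a float in an inner scope, makes lookup("x") return the inner binding until the scope is exited.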
8.What is lexeme in compiler?
A lexeme is a sequence of characters in the source program that matches the pattern for a token. The term is used both in the study of language and in the lexical analysis of computer program compilation. In the context of computer programming, lexemes are part of the input stream from which tokens are identified.
9. What is a lexeme?
The sequence of characters matched by a pattern to form the corresponding token
or a sequence of input characters that comprises a single token is called a lexeme. eg- “float”,
“abs_zero_Kelvin”, “=”, “-”, “273”, “;” .
10. What is a token?
A lexical token is a sequence of characters that can be treated as a unit in the grammar of the programming language. Examples of tokens:
Keywords; examples: for, while, if, etc.
Identifiers; examples: variable names, function names, etc.
Operators; examples: '+', '++', '-', etc.
Separators; examples: ',', ';', etc.
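The relationship between patterns, lexemes and tokens can be illustrated with a toy scanner. Below is a minimal sketch in Python (the token classes and patterns are illustrative, not a complete lexical specification):

```python
import re

# One regular-expression pattern per token class; order matters
# (keywords must be tried before the more general identifier pattern).
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|while|for)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"\+\+|[+\-*/=<>]"),
    ("SEPARATOR",  r"[,;()]"),
    ("SKIP",       r"\s+"),           # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    """Yield (token class, lexeme) pairs for the input string."""
    for m in MASTER.finditer(code):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
```

For the input "if x1 = 42;" this yields the pairs (KEYWORD, "if"), (IDENTIFIER, "x1"), (OPERATOR, "="), (NUMBER, "42"), (SEPARATOR, ";") — each lexeme classified as a token.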
11. What are the three parts of a Lex/YACC program?
In the first section, we can write C language code (header file inclusions, global variable/constant definitions and declarations) between the symbols %{ and %}. Tokens are also defined in the first section, along with the associativity of operators (left or right) and their precedence.
In the second section, we write the grammar productions and the action for each production.
The third section consists of the subroutines. We have to call yyparse() to initiate the parsing process. The yyerror() function is called when no production of the grammar in the second section matches the input statement.
13.What is the use of yyparse()?
We have to call yyparse() to initiate the parsing process, i.e. the process of checking syntax (matching grammar productions).
14. What are yytext, yylval and yyleng?
Lex keeps the matched string at the address pointed to by the pointer yytext. The matched string's length is kept in yyleng, while the value of the token is kept in the variable yylval.
15.What is an ambiguous grammar?
A CFG is said to be ambiguous if there exists more than one derivation tree for a given input string, i.e. more than one LeftMost Derivation Tree (LMDT) or RightMost Derivation Tree (RMDT).
Consider the grammar E -> E+E | id. We can create two parse trees from this grammar for the string id+id+id; both are generated by leftmost derivations.
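That the grammar E -> E+E | id is ambiguous can also be seen by counting parse trees. A small Python sketch (the function name count_trees is made up for this illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(n):
    """Number of distinct parse trees for id + id + ... + id (n ids)
    under the ambiguous grammar E -> E + E | id."""
    if n == 1:
        return 1                      # a single id: E -> id
    # The root production E -> E + E splits the ids into a left part
    # of k ids and a right part of n - k ids.
    return sum(count_trees(k) * count_trees(n - k) for k in range(1, n))
```

count_trees(3) is 2, matching the two parse trees of id+id+id; the count grows as the Catalan numbers, so ambiguity multiplies quickly.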
(e.g:) E -> I
E -> E + E
E -> E * E
E -> (E)
I -> ε | 0 | 1 | … | 9
From the above grammar, the string 3*2+5 can be derived in two ways:
First leftmost derivation (grouping as (3*2)+5):
E ⇒ E + E ⇒ E * E + E ⇒ I * E + E ⇒ 3 * E + E ⇒ 3 * I + E ⇒ 3 * 2 + E ⇒ 3 * 2 + I ⇒ 3 * 2 + 5
Second leftmost derivation (grouping as 3*(2+5)):
E ⇒ E * E ⇒ I * E ⇒ 3 * E ⇒ 3 * E + E ⇒ 3 * I + E ⇒ 3 * 2 + E ⇒ 3 * 2 + I ⇒ 3 * 2 + 5
For example, the three-address code for the expression a = b + c * d is:
t1=c
t2=d
t3=t1*t2
t4=b
t5=t4+t3
a=t5
17. What are quadruples, triples and indirect triples?
Quadruples –
It is a structure with four fields: op, arg1, arg2 and result. op denotes the operator, arg1 and arg2 denote the two operands, and result is used to store the result of the expression.
In the quadruple representation, each instruction is split into the following four fields:
op, arg1, arg2, result
Here-
● The op field is used for storing the internal code of the operator.
● The arg1 and arg2 fields are used for storing the two operands used.
● The result field is used for storing the result of the expression.
Triples –
This representation doesn’t use an extra temporary variable to represent a single operation; instead, when a reference to another triple’s value is needed, a pointer to that triple is used. So it consists of only three fields: op, arg1 and arg2.
Indirect Triples –
This representation uses pointers to a separately stored listing of all references to computations. It is similar in utility to the quadruple representation but requires less space. Temporaries are implicit, and it is easier to rearrange code.
Question – Write the quadruples, triples and indirect triples for the following expression: (x + y) * (y + z) + (x + y + z)
Explanation – The three-address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
Quadruples:
(0) ( +, x, y, t1 )
(1) ( +, y, z, t2 )
(2) ( *, t1, t2, t3 )
(3) ( +, t1, z, t4 )
(4) ( +, t3, t4, t5 )
Triples:
(0) ( +, x, y )
(1) ( +, y, z )
(2) ( *, (0), (1) )
(3) ( +, (0), z )
(4) ( +, (2), (3) )
Indirect triples: the same triples as above, accessed through a separate statement list of pointers, e.g. (35)→(0), (36)→(1), (37)→(2), (38)→(3), (39)→(4).
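The conversion from three-address code to quadruples and triples can be done mechanically. A minimal Python illustration (the function names and the 'result = arg1 op arg2' string format are assumptions of this sketch):

```python
def to_quadruples(tac):
    """Split TAC lines of the form 'result = arg1 op arg2' into
    4-field quadruples (op, arg1, arg2, result)."""
    quads = []
    for line in tac:
        result, rhs = [s.strip() for s in line.split("=")]
        arg1, op, arg2 = rhs.split()
        quads.append((op, arg1, arg2, result))
    return quads

def to_triples(tac):
    """Triples drop the result field: a temporary that names an
    earlier computation is replaced by the index of that triple."""
    index_of = {}                       # temporary name -> triple index
    triples = []
    for i, line in enumerate(tac):
        result, rhs = [s.strip() for s in line.split("=")]
        arg1, op, arg2 = rhs.split()
        triples.append((op, index_of.get(arg1, arg1),
                            index_of.get(arg2, arg2)))
        index_of[result] = i
    return triples
```

Running it on the five TAC lines above reproduces the quadruple and triple tables, with temporaries in the triples replaced by indices of earlier triples.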
18. What is three address code?
Three-address code is a type of intermediate code which is easy to generate and can be easily converted to machine code. It uses at most three addresses and one operator to represent an expression, and the value computed at each instruction is stored in a temporary variable generated by the compiler. The compiler decides the order of operations given by the three-address code.
General representation –
a = b op c
Where a, b or c represents operands like names, constants or compiler generated temporaries
and op represents the operator.
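Generating three-address code from an expression tree can be sketched as a recursive walk. A minimal Python illustration (the tuple-based tree format and the class name TACGenerator are assumptions of this sketch); note that it does not copy leaf operands into temporaries, so it emits fewer instructions than some textbook listings:

```python
class TACGenerator:
    """Emit three-address code of the form a = b op c from an
    expression tree. A leaf is a name (str); an interior node is a
    tuple (op, left, right)."""

    def __init__(self):
        self.temp_count = 0
        self.code = []          # emitted three-address instructions

    def new_temp(self):
        # Compiler-generated temporaries t1, t2, ...
        self.temp_count += 1
        return f"t{self.temp_count}"

    def gen(self, node):
        # Leaves are used directly; each operator gets a fresh temporary.
        if isinstance(node, str):
            return node
        op, left, right = node
        l = self.gen(left)
        r = self.gen(right)
        t = self.new_temp()
        self.code.append(f"{t} = {l} {op} {r}")
        return t
```

For a = b + c * d, calling gen on the tree ("+", "b", ("*", "c", "d")) emits t1 = c * d and t2 = b + t1, after which a = t2 completes the assignment.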
19. What are the different types of parsers?
Broadly, parsers can be classified into two major categories:
1. Top Down Parser
2. Bottom Up Parser
In a top-down parser, we start with the starting non-terminal and use the grammar productions to try to derive the input string.
Bottom-up parsing is the opposite of top-down parsing. Here we start with the string, reduce it using the grammar productions, and try to reach the start symbol.
20. What is the difference between LR(0) and SLR(1) parsing?
In LR(0), for a state (closure) containing an item of the form A -> BCD. (i.e. an item with the dot at the end), we add a Reduce action for all terminals. That is why it is said to have zero lookahead symbols.
In SLR(1), for a state containing an item of the form A -> BCD., we add a Reduce action only for the terminals in FOLLOW(A). That is the only difference between LR(0) and SLR(1).
21. What is the difference between top-down and bottom up parsing?
In a top-down parser, we start with the starting non-terminal and use the grammar productions to try to derive the input string.
Bottom-up parsing is the opposite of top-down parsing. Here we start with the string, reduce it using the grammar productions, and try to reach the start symbol.
22. What is difference between LR(0) item and LR(1) item?
In LR(0) and SLR(1), the items we use are called LR(0) items, which are of the form A -> .BCD or A -> B.CD, etc. In CLR(1) and LALR(1), the items we use are called LR(1) items. These are of the form (A -> .BCD, α), where α is a lookahead symbol representing the terminals that can follow A.
23. What are the FIRST and FOLLOW sets of a grammar?
FIRST of a grammar:
A non-terminal can generate sequences of terminals (non-empty strings) or the empty string. The collection of the initial terminals of all these strings is called the FIRST set of that non-terminal.
How to find the set FIRST(X):
For all productions whose LHS is X,
1. If the RHS starts with a terminal, then add that terminal to the set FIRST(X).
2. If the RHS is ϵ, then add ϵ to the set FIRST(X).
3. If the RHS starts with a non-terminal (say Y), then add FIRST(Y) to the set FIRST(X). If FIRST(Y) includes ϵ, then also add FIRST(RHS except Y) to the set FIRST(X).
Examples:
1. Grammar:
S->xyz/aBC
B->c/cd
C->eg/df
FIRST(S)={x,a}
FIRST(B)={c}
FIRST(C)={e,d}
FOLLOW(S)={$}
FOLLOW(B)={e,d}
FOLLOW(C)={$}
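The fixed-point computation of FIRST sets can be sketched in a few lines. A minimal Python illustration for the example grammar above (the dictionary-based grammar encoding and the "eps" marker for ϵ are conventions of this sketch):

```python
def compute_first(grammar):
    """Compute FIRST sets by iterating to a fixed point.

    grammar: dict mapping each non-terminal to its list of productions;
    each production is a list of symbols. Symbols that are not keys of
    the dict are terminals; "eps" marks the empty string."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                for sym in prod:
                    # Terminals contribute themselves; non-terminals
                    # contribute their FIRST set minus eps.
                    add = first[sym] - {"eps"} if sym in grammar else {sym}
                    if not add <= first[nt]:
                        first[nt] |= add
                        changed = True
                    # Move to the next symbol only if this one can
                    # derive the empty string.
                    if sym in grammar and "eps" in first[sym]:
                        continue
                    break
                else:
                    # Every symbol in the production was nullable.
                    if "eps" not in first[nt]:
                        first[nt].add("eps")
                        changed = True
    return first
```

For the example grammar S->xyz|aBC, B->c|cd, C->eg|df, this reproduces FIRST(S)={x,a}, FIRST(B)={c}, FIRST(C)={e,d}.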
24. What is operator precedence parsing?
Operator precedence can only be established between the terminals of the grammar; it ignores the non-terminals.
a ⋗ b means that terminal "a" has higher precedence than terminal "b".
a ⋖ b means that terminal "a" has lower precedence than terminal "b".
a ≐ b means that terminals "a" and "b" both have the same precedence.
Example
Grammar:
E → E+T | T
T → T*F | F
F → id
Given string:
w = id + id * id
For this grammar we get the following operator precedence table (rows: terminal on top of the stack, columns: current input terminal):

         +      *      id     $
  +      ⋗      ⋖      ⋖      ⋗
  *      ⋗      ⋗      ⋖      ⋗
  id     ⋗      ⋗             ⋗
  $      ⋖      ⋖      ⋖      accept

Parsing action: while the relation between the stack top and the input symbol is ⋖ or ≐, shift the input symbol onto the stack; when the relation is ⋗, reduce the handle on top of the stack; repeat until the input is $ and the stack is reduced, then accept. The string id + id * id can now be processed with the help of this precedence table.
25. What is a Recursive Descent Parser?
A Recursive Descent Parser uses the technique of top-down parsing without backtracking. It can be defined as a parser that uses a set of recursive procedures to process the input string, with no backtracking. It can be implemented simply in any language supporting recursion. The first symbol of the string of the R.H.S. of a production uniquely determines the correct alternative to choose.
The major approach of recursive-descent parsing is to relate each non-terminal with a
procedure. The objective of each procedure is to read a sequence of input characters that can
be produced by the corresponding non-terminal, and return a pointer to the root of the parse
tree for the non-terminal. The structure of the procedure is prescribed by the productions for
the equivalent non-terminal.
The recursive procedures are simple to write and adequately efficient if written in a language that implements procedure calls efficiently. There is one procedure for each non-terminal in the grammar. A global variable lookahead holds the current input token, and a procedure match(expectedToken) performs the action of recognizing the next token and advancing the input stream pointer, so that lookahead points to the next token to be parsed. match() is effectively a call to the lexical analyzer to get the next token.
For example, suppose the input stream is a + b$:
lookahead == a
match()
lookahead == +
match()
lookahead == b
Example − Write down the algorithm using Recursive procedures to implement the
following Grammar.
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
One major drawback of recursive-descent parsing is that it can be implemented only in languages which support recursive procedure calls, and it suffers from the problem of left recursion.
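A recursive descent parser for the non-left-recursive grammar above can be written directly, with one procedure per non-terminal. A minimal Python sketch (tokens are assumed to be pre-split into a list; match() raises SyntaxError on a mismatch):

```python
class Parser:
    """Recursive-descent parser for:
         E  -> T E'        E' -> + T E' | eps
         T  -> F T'        T' -> * F T' | eps
         F  -> ( E ) | id
    Input is a pre-tokenized list such as ["id", "+", "id"]."""

    def __init__(self, tokens):
        self.tokens = tokens + ["$"]   # end-of-input marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Recognize the expected token and advance the input pointer.
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")
        return True

    def E(self):                      # E -> T E'
        self.T(); self.E_prime()

    def E_prime(self):                # E' -> + T E' | eps
        if self.lookahead == "+":
            self.match("+"); self.T(); self.E_prime()

    def T(self):                      # T -> F T'
        self.F(); self.T_prime()

    def T_prime(self):                # T' -> * F T' | eps
        if self.lookahead == "*":
            self.match("*"); self.F(); self.T_prime()

    def F(self):                      # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")
```

Parser(["id", "+", "id", "*", "id"]).parse() succeeds, while an incomplete input such as ["id", "+"] raises SyntaxError. Note how each ε-alternative appears as simply falling through when the lookahead does not match.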
26. Explain the steps in Shift reduce parsing?
Shift Reduce parser attempts for the construction of parse in a similar manner as
done in bottom-up parsing i.e. the parse tree is constructed from leaves(bottom) to the
root(up). A more general form of the shift-reduce parser is the LR parser.
This parser requires some data structures, i.e.
A stack for holding the grammar symbols (handles) being parsed.
An input buffer for storing the input string.
27. How is left recursion eliminated?
Comparing E → E+T | T with the left-recursive template A → Aα | β:
E → E + T | T
A → A α | β
∴ A = E, α = +T, β = T
∴ A → Aα | β is changed to A → βA′ and A′ → αA′ | ε
∴ A → βA′ means E → TE′
A′ → αA′ | ε means E′ → +TE′ | ε
Comparing T → T*F | F with A → Aα | β:
T → T * F | F
A → A α | β
∴ A = T, α = *F, β = F
∴ A → βA′ means T → FT′
A′ → αA′ | ε means T′ → *FT′ | ε
The production F → (E) | id does not have any left recursion.
∴ Combining the productions, we get
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
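The elimination step for immediate left recursion can be sketched as a small function. A minimal Python illustration (productions are lists of symbols; the "eps" marker and the function name are conventions of this sketch):

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a1 | ... | b1 | ... as
         A  -> b1 A' | ...
         A' -> a1 A' | ... | eps
    Productions are lists of symbols; "eps" marks the empty string."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]  # the alphas
    others    = [p for p in productions if not p or p[0] != nt]   # the betas
    if not recursive:
        return {nt: productions}      # nothing to eliminate
    new_nt = nt + "'"                 # the fresh non-terminal A'
    return {
        nt:     [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [["eps"]],
    }
```

Applied to E → E+T | T it yields E → TE' and E' → +TE' | eps, matching the hand derivation above.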
28. What is DAG?
A Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks, helps to see the flow of values among the basic blocks, and supports optimization. A DAG provides easy transformation of basic blocks. A DAG can be understood as follows:
● Leaf nodes represent identifiers, names or constants.
● Interior nodes represent operators.
● Interior nodes also represent the results of expressions, or the identifiers/names where the values are to be stored or assigned.
Example:
t0 = a + b
t1 = t0 + c
d = t0 + t1
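Building a DAG from three-address code, with sharing of repeated computations, can be sketched as follows. A minimal Python illustration (the function name and the returned node table are conventions of this sketch):

```python
def build_dag(tac):
    """Build DAG nodes from TAC lines 'result = arg1 op arg2'.
    Returns a dict mapping each distinct leaf name or
    (op, left_node, right_node) computation to a node id; a repeated
    computation maps to the same node (common subexpression sharing)."""
    nodes = {}     # leaf name or (op, node, node) -> node id
    defined = {}   # variable -> node id currently holding its value

    def node_for(key):
        if key not in nodes:
            nodes[key] = len(nodes)
        return nodes[key]

    def operand(name):
        # A name defined earlier refers to its node; otherwise it is a leaf.
        return defined[name] if name in defined else node_for(name)

    for line in tac:
        result, rhs = [s.strip() for s in line.split("=")]
        arg1, op, arg2 = rhs.split()
        defined[result] = node_for((op, operand(arg1), operand(arg2)))
    return nodes
```

For the example block above, the DAG has six nodes (leaves a, b, c and three + nodes); appending a repeated computation such as e = a + b adds no new node, since it is shared with the node for t0.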