Course File: Compiler Design
COURSE FILE
Program : B.E.
Semester : VII
Course Code :
1. Scheme
2. Syllabus
3. Time Table
4. Lecture Plan
5. List of Books
8. Tutorial Questions
9. Assignment Questions
Unit I: Introduction: Alphabets, Strings and Languages; Automata and Grammars, Deterministic
finite Automata (DFA)-Formal Definition, Simplified notation: State transition graph, Transition
table, Language of DFA, Nondeterministic finite Automata (NFA), Equivalence of NFA and DFA,
Minimization of Finite Automata, Regular Expressions, Arden's theorem.
Unit II: Compiler Structure: Compilers and Translators, Various Phases of Compiler, Pass
Structure of Compiler, Bootstrapping of Compiler. Lexical Analysis: The role of Lexical Analyzer,
A simple approach to the design of Lexical Analyzer, Implementation of Lexical Analyzer. The
Syntactic Specification of Programming Languages: CFG, Derivation and Parse tree, Ambiguity,
Capabilities of CFG. Basic Parsing Techniques: Top-Down parsers with backtracking, Recursive
Descent Parsers, Predictive Parsers.
Unit III: Bottom-up Parsers, Shift-Reduce Parsing, Operator Precedence Parsers, LR parsers
(SLR, Canonical LR, LALR). Syntax Analyzer Generator: YACC. Intermediate Code Generation:
Different Intermediate forms: three address code, Quadruples & Triples. Syntax Directed
translation mechanism and attributed definition. Translation of Declaration, Assignment, Control
flow, Boolean expression, Array References in arithmetic expressions, procedure calls, case
statements, postfix translation.
Unit IV: Run Time Memory Management: Static and Dynamic storage allocation, stack based
memory allocation schemes, Symbol Table management. Error Detection and Recovery: Lexical
phase errors, Syntactic phase errors, Semantic errors.
Unit V: Code Optimization and Code Generation: Local optimization, Loop optimization,
Peephole optimization, Basic blocks and flow graphs, DAG, Data flow analyzer, Machine Model,
Order of evaluation, Register allocation and code selection
References:
(B) TIME SCHEDULE : Total expected periods:___, Extra periods (if required)_____
Lecture Plan
Day              Mon   Tue   Wed   Thu   Fri   Sat   Max. available
No. of Periods   01    01    01    01    01    01
Lecture No.   Topics to be covered   Planned Date of Completion   Remarks
UNIT-I
1   Alphabets, Strings and Languages            R1:1, R2:1
2   Automata and Grammars                       R1:3, R2:1
3   Deterministic finite Automata (DFA)         R1:10, R2:5
4   State transition graph, Transition table    R1:84,88; R2:74
5   Nondeterministic finite Automata (NFA)      R1:105, R2:103
6   Equivalence of NFA and DFA                  R2:20
Unit-II
27 LR parsers NOTES
Unit IV
Unit V
50 DAG R2-408
R1: Compilers: Principles, Techniques, and Tools by Aho, Ullman & Sethi, Pearson Education
R2: Principles of Compiler Design by Aho & Ullman, Narosa Publishing House
Websites:
1. www.cs.uccs.edu/~abudjen/classsnotes.doc
2. www.os.iitb.ac.in/~sri/notes/lexical.pdf
3. www.iitb.ac.in/~sri/notes/compiler/regex.pdf
List of Books
COLLEGE NAME, Bhopal
Department of Information Technology
Assignment-1
Unit-1
1. Give the reasons for the separation of the scanner and the parser into separate phases of a compiler.
2. Describe the role of the lexical analyzer in recognizing tokens.
3. Explain the concept of input buffering.
4. Explain how tokens are recognized.
5. What is the simple approach to the design of a lexical analyzer for an identifier?
6. What is LEX? Describe auxiliary definitions and translation rules for LEX with a suitable example.
7. What are the tasks performed by the compiler in the lexical and syntax analysis phases? Explain with the help of examples.
8. Explain the role of the symbol table in various phases of a compiler.
COLLEGE NAME, Bhopal
Department of Information Technology
Assignment-2
Unit-2
L → E
E → E + T | T
T → T * F | F
F → ( E ) | digit
8. Write a brief note on syntax tree.
9. For the following grammar, find FIRST and FOLLOW sets for each nonterminal -
S → aAB | bA | ε
A → aAb | ε
B → bB | ε
10. What are Shift-Reduce and Reduce-Reduce conflicts? How can these be resolved? With
examples, explain in which conditions S-R and R-R conflicts can occur in SLR, Canonical LR
and LALR parsers. (Make use of LR(0) and LR(1) items.)
COLLEGE NAME, Bhopal
Department of Information Technology
Assignment-3
Unit-3
1. What do you mean by heap allocation? Explain the following terms related to heap
allocation-
(i) Fragmentation
(ii) Free list
(iii) Reference counts
5. Explain, with a suitable example, the mechanisms used by the compiler to handle procedure
parameters.
8. Explain various data structures used for implementing the symbol table and compare them.
Assignment-4
Unit-4
1. Define Leaders.
2. Explain DAG construction.
3. What are applications of DAGs?
4. Write the advantages of DAGs.
5. Write a short note on the application of DAGs in code generation.
6. Discuss the various methods of translating Boolean expressions.
7. Construct the DAG of the basic block after converting the following code into 3-address representation:
begin
    prod := 0;
    i := 1;
    do
    begin
        prod := prod + a[i] * b[i];
        i := i + 1;
    end
    while i <= 20
end
8. Translate the following expression into quadruples, triples and indirect triples.
COLLEGE NAME, Bhopal
Department of Information Technology
Assignment-5
Unit-5
1. What is global data flow analysis? What is its use in code optimization?
2. Describe global data flow analysis.
3. Write the criteria for code improving transformations. Explain the principal sources of
optimization.
4. Define dominators and write short note on loops in flow graph.
COLLEGE NAME, Bhopal
Department of Information Technology
Tutorial-1
Tutorial-2
Unit-2
Unit-3
Tutorial-4
Unit-4
9. Write a translation scheme to generate intermediate code for an assignment statement with
array references.
10. Write a syntax-directed definition to translate a switch statement. With a suitable example,
show the translation of a source-language switch statement.
11. Write a short note on backpatching.
12. Write a short note on code generation.
13. What are the general issues in designing a code generator?
14. Explain the code generation algorithm.
15. What is a basic block? With a suitable example, discuss various transformations on basic
blocks.
16. What are the different types of intermediate codes? Explain in brief.
COLLEGE NAME, Bhopal
Department of Information Technology
Tutorial-5
Unit-5
UNIT I
(i) The lexical phase (scanner) groups characters into lexical units or tokens. The input to the
lexical phase is a character stream. The output is a stream of tokens. Regular expressions are
used to define the tokens recognized by a scanner (or lexical analyzer). The scanner is
implemented as a finite state machine.
(ii) The parser groups tokens into syntactical units. The output of the parser is a parse tree
representation of the program. Context-free grammars are used to define the program structure
recognized by a parser. The parser is implemented as a push-down automaton.
(iii) The contextual analysis phase analyzes the parse tree for context-sensitive information,
often called the static semantics. The output of the contextual analysis phase is an annotated
parse tree. Attribute grammars are used to describe the static semantics of a program.
(iv) The optimizer applies semantics-preserving transformations to the annotated parse tree to
simplify the structure of the tree and to facilitate the generation of more efficient code.
(v) The code generator transforms the simplified annotated parse tree into object code, using
rules which denote the semantics of the source language.
(vi) The peephole optimizer examines the object code, a few instructions at a time, and
attempts to do machine-dependent code improvements.
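As a small illustration of these phases (the statement, the temporaries and the target instructions below are invented for this example and do not refer to any particular machine), consider the assignment position = initial + rate * 60:
Scanner output (tokens): id1 = id2 + id3 * num(60)
Parser output: a parse tree for id1 = id2 + (id3 * num(60))
Contextual analysis: the tree annotated with types (e.g. the constant 60 converted to a real value)
Intermediate form after simplification (three-address code):
    t1 = id3 * 60.0
    t2 = id2 + t1
    id1 = t2
Object code for a hypothetical register machine:
    LOAD  R1, id3
    MUL   R1, #60.0
    ADD   R1, id2
    STORE id1, R1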
Types of Compiler-
1. One-pass compiler
2. Multi-pass Compiler
3. Load & Go Compiler
4. Optimized Compiler
A one-pass compiler reads the program only once, and translates it at the same time as it is
reading. A multi-pass compiler reads the program several times, each time transforming it
into a different form and usually into a different data structure.
Cross Compiler - A cross compiler generates code for a target machine different from the
machine on which it runs.
Bootstrapping - Bootstrapping describes the techniques involved in writing a compiler (or
assembler) in the target programming language which it is intended to compile.
LEX - LEX is a program generator designed for lexical processing of character input streams.
It accepts a high-level, problem-oriented specification for character string matching, and
produces a program in a general-purpose language which recognizes regular expressions.
The regular expressions are specified by the user in the source specification given to LEX.
The LEX-generated code recognizes these expressions in an input stream and partitions the
input stream into strings matching the expressions.
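A minimal LEX specification is sketched below purely as an illustration; the token classes echo the Keyword/Identifier/Constant/Operator classes used later in this file, but the patterns are assumptions and the actions simply print the token class instead of returning a code to a parser.
%{
#include <stdio.h>
%}
digit    [0-9]
letter   [a-zA-Z]
%%
"if"|"else"|"while"            { printf("KEYWORD %s\n", yytext); }
{letter}({letter}|{digit})*    { printf("ID %s\n", yytext); }
{digit}+                       { printf("CONST %s\n", yytext); }
"+"|"-"|"*"|"/"|"="            { printf("OP %s\n", yytext); }
[ \t\n]                        { /* skip white space */ }
%%
int yywrap(void) { return 1; }
int main(void)   { yylex(); return 0; }
Running LEX (or flex) on such a specification produces a C scanner in which these regular expressions have been compiled into a finite automaton, exactly as described above.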
UNIT II
Parsing is the process of finding a parse tree for a string of tokens. Equivalently, it is the
process of determining whether a string of tokens can be generated by a grammar. There are
two types of parsing: top-down parsing and bottom-up parsing.
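For example, for the deliberately tiny grammar E → E + E | id (used here only as an illustration) and the input id + id:
Top-down parsing builds the parse tree from the root, following a leftmost derivation:
    E  ⇒  E + E  ⇒  id + E  ⇒  id + id
Bottom-up parsing builds the tree from the leaves, each step reducing a handle, tracing a rightmost derivation in reverse:
    id + id  →  E + id  →  E + E  →  E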
Yacc - YACC stands for Yet Another Compiler-Compiler; this is because this kind of
analysis of text files is normally associated with writing compilers. Yacc provides a general
tool for imposing structure on the input to a computer program. The Yacc user prepares a
specification of the input process; this includes rules describing the input structure, code to be
invoked when these rules are recognized, and a low-level routine to do the basic input. Yacc
then generates a function to control the input process.
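A minimal Yacc specification for arithmetic expressions is sketched below, again only as an illustration; the token NUMBER and its value are assumed to be supplied by a companion LEX scanner through yylval, and the grammar and actions are not prescribed by the syllabus.
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%%
line : expr '\n'       { printf("result = %d\n", $1); }
     ;
expr : expr '+' expr   { $$ = $1 + $3; }
     | expr '-' expr   { $$ = $1 - $3; }
     | expr '*' expr   { $$ = $1 * $3; }
     | expr '/' expr   { $$ = $1 / $3; }
     | '(' expr ')'    { $$ = $2; }
     | NUMBER          { $$ = $1; }
     ;
%%
int main(void) { return yyparse(); }
The %left declarations fix the precedence and associativity of the operators, so Yacc can resolve the ambiguity of this expression grammar and generate an LALR parser without conflicts.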
UNIT III
A syntax-directed translation is used to define the translation of a sequence of tokens to some
other value, based on a CFG for the input. A syntax-directed translation is defined by
associating a translation rule with each grammar rule. A translation rule defines the
translation of the left-hand-side nonterminal as a function of the right-hand-side nonterminals'
translations and the values of the right-hand-side terminals. To compute the translation of a
string, build the parse tree and use the translation rules to compute the translation of each
nonterminal in the tree, bottom-up; the translation of the string is the translation of the root
nonterminal. There is no restriction on the type of a translation. To perform the translation
during parsing, the translation rules are converted to actions that manipulate the
parser's semantic stack. Each action must pop all right-hand-side nonterminals' translations
from the semantic stack, then compute and push the left-hand-side nonterminal's
translation. Next, the actions are incorporated (as action numbers) into the grammar rules.
Finally, the grammar is converted to LL(1) form (treating the action numbers just like
terminal or nonterminal symbols).
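For instance, a syntax-directed definition that computes the value of an arithmetic expression can attach a synthesized attribute val to each nonterminal of the expression grammar used in Assignment-2; the attribute name and the rules below are illustrative.
E → E1 + T     { E.val = E1.val + T.val }
E → T          { E.val = T.val }
T → T1 * F     { T.val = T1.val * F.val }
T → F          { T.val = F.val }
F → ( E )      { F.val = E.val }
F → digit      { F.val = numeric value of the digit }
Evaluating these rules bottom-up over the parse tree of 2 + 3 * 4 gives T.val = 12 for the subtree 3 * 4 and E.val = 14 at the root.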
Intermediate Code - The semantic phase of a compiler first translates parse trees into an
intermediate representation (IR), which is independent of the underlying computer
architecture, and then generates machine code from the IR. This makes the task of
retargeting the compiler to another computer architecture easier to handle.
1. High-Level Intermediate Languages
Dependence graphs.
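As an example of a low-level intermediate form (the temporaries t1 and t2 and the exact table layout are illustrative; textbooks differ slightly in notation), the assignment a = b + c * d can be written as:
Three-address code:
    t1 = c * d
    t2 = b + t1
    a  = t2
Quadruples (op, arg1, arg2, result):
    (0)   *    c    d    t1
    (1)   +    b    t1   t2
    (2)   =    t2         a
Triples name intermediate results by their entry number instead of by explicit temporaries:
    (0)   *    c    d
    (1)   +    b    (0)
    (2)   =    a    (1)
Indirect triples add a separate list of pointers to these entries, so that statements can be reordered without renumbering the triples.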
UNIT IV
Activation records
Created every time a procedure is called
Must be accessible to both the caller and the callee
Allocates space for
o Parameters
o Local variables
o Return address
o Other links and pointers to provide access to non-local data
Other issues
o Initializing local variables
o Stack vs. heap allocated
o Optimizing activation records by coalescing
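One possible layout of such a record is sketched below as a C structure, purely for illustration; the field names, their order and the fixed array sizes are assumptions, since a real compiler lays these fields out on the run-time stack rather than as a C type.
/* Illustrative sketch of one activation record. */
struct activation_record {
    struct activation_record *control_link;  /* caller's record (dynamic link)          */
    struct activation_record *access_link;   /* enclosing scope's record (static link),
                                                 used to reach non-local data            */
    void *return_address;                    /* where execution resumes in the caller    */
    int   parameters[4];                     /* actual parameters passed by the caller   */
    int   locals[8];                         /* local variables of the callee            */
    int   temporaries[4];                    /* compiler-generated temporaries           */
};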
Symbol Table
keeps track of scope and other attributes of named program entities
key operations
o void insert(symbol s);
o symbol lookup(string name);
o void enter_scope(void);
o void exit_scope(void);
implementations
o list
o hash table
o stack of tables
for some languages, the symbol table must handle overloading
o each identifier contains a list of symbols
o when entering new scope, chain symbols with same name in previous scope
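A minimal sketch of the "stack of tables" idea in C is given below; the function names follow the key operations listed above, but the list-per-scope representation, the symbol fields and the omission of error handling and memory freeing are simplifying assumptions (a production compiler would typically hash within each scope and handle overloading as noted).
#include <stdlib.h>
#include <string.h>

typedef struct symbol {
    char          *name;      /* identifier spelling                    */
    int            kind;      /* e.g. variable, procedure, type         */
    struct symbol *next;      /* next symbol declared in the same scope */
} symbol;

typedef struct scope {
    symbol       *symbols;    /* symbols declared in this scope */
    struct scope *enclosing;  /* the surrounding (outer) scope  */
} scope;

static scope *current = NULL; /* innermost open scope */

void enter_scope(void) {      /* called at procedure or block entry */
    scope *s = malloc(sizeof *s);
    s->symbols   = NULL;
    s->enclosing = current;
    current      = s;
}

void exit_scope(void) {       /* called when the scope closes */
    current = current->enclosing;
}

void insert(symbol *s) {      /* declare s in the current scope */
    s->next          = current->symbols;
    current->symbols = s;
}

symbol *lookup(const char *name) {   /* innermost declaration wins */
    for (scope *sc = current; sc != NULL; sc = sc->enclosing)
        for (symbol *sym = sc->symbols; sym != NULL; sym = sym->next)
            if (strcmp(sym->name, name) == 0)
                return sym;
    return NULL;              /* not declared in any enclosing scope */
}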
UNIT V
Code optimization together with code generation form the back end of the
compiler. In compilers with very extensive optimization, the optimization phase
is distinguished as a middle end.
The goal of the compiler's optimizer is to transform the IR program created by the
front end into an IR program that computes the same results in a better way. Here
"better" can take on many meanings. It usually implies faster code, but it might also
imply more compact code, code that consumes less power when it runs, or code that
costs less to run under some cost model.
Ideally, compilers should produce target code that is as good as can be written by
hand. The reality is that this goal is achieved only in limited cases, and with
difficulty. However, the code produced by straightforward compiling algorithms can
often be made to run faster or take less space, or both. This improvement is achieved
by program transformations that are traditionally called optimizations, although the
term "optimization" is a misnomer because there is rarely a guarantee that the resulting
code is the best possible. Most compilers involve some optimization.
Compilers that apply code-improving transformations are called optimizing
compilers.
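A small invented example of such a code-improving transformation on a basic block (the names and the three-address form are chosen only for illustration):
Before:
    t1 = b + c
    a  = t1
    t2 = b + c        (recomputes b + c)
    x  = t2 * 2
After common-subexpression elimination and strength reduction:
    t1 = b + c
    a  = t1
    x  = t1 + t1      (reuses t1; the multiplication by 2 becomes an addition)
Both versions compute the same values for a and x provided b and c are not modified in between, which is exactly the kind of guarantee the optimizer must establish before applying the transformation.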
Code generation is the final phase in compilation. It takes as input an intermediate
representation of the source program and produces as output an equivalent target
program. The code generation techniques can be used whether or not an
optimization phase occurs before code generation.
The requirements traditionally imposed on a code generator are severe. The output
code must be correct and of high quality, meaning that it should make effective use
of the resources of the target machine. Moreover, the code generator itself should run
efficiently.
Mathematically, the problem of generating optimal target code is
undecidable!
In practice, we must be content with heuristic techniques that generate good, but not
necessarily optimal, code. The choice of heuristics is important, because a carefully
designed code generation algorithm can easily produce code that is several times
faster than that produced by ad hoc code generation techniques.
As code generation begins, the program exists in IR form. The code generator must
convert the IR program (perhaps already optimized) into code that can run on the
target machine.
Code generation is typically performed as a sequence of three tasks - instruction selection,
instruction scheduling and register allocation (a small example follows the list below):
o Instruction selection - selecting a sequence of target-machine operations that
implement the IR operations.
o Instruction scheduling - choosing an order in which the operations should
execute.
o Register allocation - deciding which values should reside in registers at each
point in the program
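For example, for the single three-address statement x = y + z, the three tasks might proceed as follows; the target instructions belong to a hypothetical load/store machine and are not drawn from any particular instruction set.
Instruction selection:
    LD   r1, y        ; load y from memory
    LD   r2, z        ; load z from memory
    ADD  r3, r1, r2   ; r3 = r1 + r2
    ST   x, r3        ; store the result into x
Instruction scheduling: the two loads are independent, so they may be issued in either order (or overlapped) to hide memory latency.
Register allocation: decide that y, z and the result live in r1, r2 and r3 at this point; if no register is free, some value must be spilled to memory and reloaded later.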
Most compilers handle each of these three processes separately. The term code
generation is often used to refer to instruction selection only.
When the level of abstraction of the IR and the target machine differ significantly,
or the underlying computation models differ, instruction selection can play a critical
role in bridging the gap. The extent to which instruction selection can map the
computation in the IR program efficiently onto the target machine will often determine
the efficiency of the generated code. For example, consider three scenarios for
generating code from an IR -
o A simple, scalar RISC machine - the mapping from IR to assembly is
straightforward. The code generator might consider only one or two assembly-
language sequences for each IR operation.
o A CISC processor - to make effective use of a CISC's instruction set, the
compiler may need to aggregate several IR operations into a single target-
machine operation.
o A stack machine - the code generator must translate from the register-to-register
computational style of the IR to a stack-based style with its implicit names and, in
some cases, destructive operations.
As the gap in abstraction between the IR and the target ISA grows, so does the need
for tools to help build code generators.
While instruction selection can play an important role in determining code quality,
the compiler writer must keep in mind the enormous size of the search space that
the instruction selector might explore. As we shall see, even moderately sized
instruction sets can produce search spaces that contain hundreds of millions of
states. Clearly, the compiler cannot afford to explore such spaces in either a careless
or an exhaustive way.
Previous Univ. Exam Paper
CS 701
B.E. (Seventh Semester ) EXAMINATION , June, 2009
(Common for CS & IT Engg.)
COMPILER DESIGN
Time - Three Hours
Maximum Marks - 100
Minimum Pass Marks - 35
Auxiliary Definitions
(none)
Translation Rules
Token Pattern Action
Keyword        {Return KW}
Identifier     {Return id}
Constant       {Return const}
Operator       {Return OP}
(b) Describe the role of a lexical analyzer and also explain the concept of Input Buffering.
OR
2. (a) Consider a finite state automaton in which -
State 0 1
q0 q2 q1
q1 q3 q0
q2 q0 q3
q3 q1 q2
a) Give the entire sequence of states for the input string 101101.
b) Find out which of the following strings are accepted by the given finite-state
automaton -
(i) 101101
(ii) 11111
(iii) 000000
3. (a) What do you mean by ambiguous and unambiguous grammar? Explain with example.
(b) Construct the parsing table for the following grammar.
S → aXYb
X → c | ε
Y → d | ε
OR
4. (a) Consider the grammar -
S → ACB | CbB | Ba
A → da | BC
B → g | ε
C → h | ε
(b) Show that the following grammar is LL(1), by constructing its parse table -
S → aAB | bA | ε
A → aAb | ε
B → bB | ε
If a > b then x = a + b
else x = a - b
- (x + y) * (z + c) (x + y + z)
OR
10. (a) Explain the role of symbol table in various phases of compiler.
(b) What do you mean by heap allocation? Explain the following terms related to heap
allocation-
(i) Fragmentation
(ii) Free list
(iii) Bit map
(iv) Reference counts
OR
15. (a) Explain various storage allocation strategies. Which storage allocation model is to be
used if a language permits recursion?
16. (a) Describe the necessary and sufficient conditions for performing constant propagation
and dead code elimination.
OR
17. Explain the following -
***********************************************************************
Model Paper
CS-701
COMPILER DESIGN
Time - 3 Hrs
Max. Marks-100
Minimum Pass Marks - 35
Note - Solve any five questions. All questions carry equal marks.
1. (a) What is the basic task of scanning? What are the difficulties faced in delimiter-oriented
scanning? How can these be removed?
2. (a) What errors can be encountered by virtually all the phases of a compiler?
(b) Explain the concept of transition diagram and its use in building a lexical analyzer.
3. (a) Why are multiple passes required in a compiler? Describe strategies for reducing the number
of passes.
(b) What data structures are used for symbol table management? Explain each of them in brief.
5. (a) Write the quadruples, triples and indirect triples for the expression -
A B + C*D | E
(b) Why is the dynamic storage allocation strategy particularly amenable to the run-time usage
requirements of block-structured languages? Write the procedure for run-time address calculation in
your dynamic storage allocation model.
6. (a) Describe Global Data Flow analysis. How is it useful in code optimization?
(b) Define the ambiguity of a grammar. Suggest some possible solutions to remove the ambiguity of a
grammar.