Compiler Design: Objectives
OBJECTIVES:
Understand the basic concepts of compiler design and its different phases, which will be helpful in constructing new tools like LEX, YACC, etc.
UNIT – I
Introduction: Language processing, structure of a compiler, the evolution of programming languages, the science of building a compiler, applications of compiler technology, programming language basics.
Lexical Analysis: The role of the lexical analyzer, input buffering, specification of tokens, recognition of tokens, the lexical-analyzer generator Lex.
UNIT –II
Syntax Analysis: The role of the parser, context-free grammars, writing a grammar, top-down parsing, bottom-up parsing, introduction to LR parsers.
UNIT –III
More powerful LR parsers (LR(1), LALR), using ambiguous grammars, error recovery in LR parsing.
Syntax-Directed Translation: Definitions, evaluation orders for SDDs, applications of syntax-directed translation, syntax-directed translation schemes.
UNIT – IV
Intermediate-Code Generation: Variants of syntax trees, three-address code, types and declarations, translation of expressions, type checking, control flow, backpatching.
UNIT – V
Runtime Environments: Stack allocation of space, access to non-local data on the stack, heap management.
Code Generation: Issues in the design of a code generator, the target language, addresses in the target code, basic blocks and flow graphs, a simple code generator.
UNIT –VI
Machine-Independent Optimization: The principal sources of optimization, peephole optimization, introduction to data-flow analysis.
OUTCOMES:
• Acquire knowledge of the different phases and passes of a compiler, specify the different types of tokens recognized by the lexical analyzer, and use compiler tools like LEX, YACC, etc.
• Understand the parser and its types, i.e., top-down and bottom-up parsers.
• Construct LL, SLR, CLR and LALR parse tables.
• Syntax directed translation, synthesized and inherited attributes.
• Techniques for code optimization.
TEXT BOOKS:
1. Compilers: Principles, Techniques and Tools, Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, 2nd edition, Pearson, 2007.
2. Compiler Design, K. Muneeswaran, Oxford.
3. Principles of Compiler Design, 2nd edition, Nandhini Prasad, Elsevier.
REFERENCE BOOKS:
1. Compiler Construction: Principles and Practice, Kenneth C. Louden, Cengage.
2. Implementations of Compiler: A New Approach to Compilers Including the Algebraic Method, Yunlin Su, Springer.
UNIT – I
Introduction: Language processing, structure of a compiler, the evolution of programming languages, the science of building a compiler, applications of compiler technology, programming language basics.
Lexical Analysis: The role of the lexical analyzer, input buffering, specification of tokens, recognition of tokens, the lexical-analyzer generator Lex.
Preprocessor
A preprocessor produces input to compilers. It may perform the following functions:
1. Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
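For illustration, the short C fragment below is a sketch of both functions (the macro name is an assumption, not from the notes): the #include line pulls the header text into the program, and the #define macro is expanded by the preprocessor before compilation proper begins.

#include <stdio.h>              /* file inclusion: the header text is inserted here */
#define SQUARE(x) ((x) * (x))   /* macro processing: shorthand for a longer construct */

int main(void) {
    printf("%d\n", SQUARE(5));  /* expanded by the preprocessor to ((5) * (5)) */
    return 0;
}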
COMPILER
A compiler is a translator program that takes a program written in a high-level language (HLL), the source program, and translates it into an equivalent program in a machine-level language (MLL), the target program. An important part of a compiler's job is reporting errors in the source program to the programmer.
(Source program → Compiler → Target program, with error messages reported back to the programmer.)
ASSEMBLER: Programmers found it difficult to write or read programs in machine language. They began to use mnemonics (symbols) for each machine instruction, which they would subsequently translate into machine language. Such a mnemonic machine language is now called an assembly language. Programs known as assemblers were written to automate the translation of assembly language into machine language. The input to an assembler is called the source program; the output is a machine language translation (object program).
LOADER: “A loader is a program that places programs into memory and prepares them for execution.” It would be more efficient if subroutines could be translated into object form that the loader could “relocate” directly behind the user’s program. The task of adjusting programs so they may be placed in arbitrary core locations is called relocation. Relocating loaders perform four functions.
Lexical Analysis:-
The lexical analyzer (LA), or scanner, reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens.
Syntax Analysis:-
The second stage of translation is called Syntax analysis or parsing. In this
phase expressions, statements, declarations etc… are identified by using the results of lexical
analysis. Syntax analysis is aided by using techniques based on formal grammar of the
programming language.
Intermediate Code Generations:-
An intermediate representation of the final machine language code is produced.
This phase bridges the analysis and synthesis phases of translation.
Code Optimization :-
This is an optional phase designed to improve the intermediate code so that the output runs faster and takes less space.
Code Generation:-
The last phase of translation is code generation. A number of optimizations to
reduce the length of machine language program are carried out during this phase. The
output of the code generator is the machine language program of the specified computer.
Symbol Table Management
This is the portion that keeps track of the names used by the program and records essential information about each. The data structure used to record this information is called a ‘Symbol Table’.
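As a minimal sketch (the field names and the linear search are assumptions, not a prescribed layout), a symbol-table entry and lookup in C might look like:

#include <string.h>

#define MAXSYM 256

struct symbol {
    char name[32];     /* identifier as written in the source program */
    char type[16];     /* e.g. "int" or "float" */
    int  offset;       /* storage location assigned to the name */
};

static struct symbol symtab[MAXSYM];
static int nsym;

/* Return the index of name in the table, or -1 if it is not present. */
int lookup(const char *name) {
    for (int i = 0; i < nsym; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return i;
    return -1;
}

Real compilers typically use a hash table instead of a linear scan, but the information recorded per name is the same.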
Error Handling :-
One of the most important functions of a compiler is the detection and
reporting of errors in the source program. The error messages should allow the programmer to determine exactly where the errors have occurred. Errors may occur in any of the phases of a compiler.
Whenever a phase of the compiler discovers an error, it must report it to the error handler, which issues an appropriate diagnostic message. Both the table-management and error-handling routines interact with all phases of the compiler.
LEXICAL ANALYSIS
OVERVIEW OF LEXICAL ANALYSIS
o To identify the tokens we need some method of describing the possible tokens that can appear in the input stream. For this purpose we introduce regular expressions, a notation that can be used to describe essentially all the tokens of a programming language.
o Secondly, having decided what the tokens are, we need some mechanism to recognize these in the input stream. This is done by the token recognizers, which are designed using transition diagrams and finite automata.
The LA may also perform certain secondary tasks at the user interface. One such task is stripping out from the source program comments and white space in the form of blank, tab and newline characters. Another is correlating error messages from the compiler with the source program.
TOKEN, LEXEME, PATTERN:
Token: Token is a sequence of characters that can be treated as a single logical entity.
Typical tokens are,
1) Identifiers 2) keywords 3) operators 4) special symbols 5)constants
Pattern: A set of strings in the input for which the same token is produced as output. This set
of strings is described by a rule called a pattern associated with the token.
Lexeme: A lexeme is a sequence of characters in the source program that is matched by the
pattern for a token.
Example:
Token        Sample lexemes          Informal description of pattern
if           if                      if
relation     <, <=, =, <>, >=, >     < or <= or = or <> or >= or >
id           pi                      letter followed by letters and digits
num          (a numeric literal)     any numeric constant
A pattern is a rule describing the set of lexemes that can represent a particular token in the source program.
Lex specifications:
A Lex program (the .l file) consists of three parts:
declarations
%%
translation rules
%%
auxiliary procedures
The translation rules are statements of the form
p1 {action 1}
p2 {action 2}
p3 {action 3}
… …
where each p is a regular expression and each action is a program fragment describing
what action the lexical analyzer should take when a pattern p matches a lexeme. In Lex
the actions are written in C.
The third section holds whatever auxiliary procedures are needed by the actions. Alternatively, these procedures can be compiled separately and loaded with the lexical analyzer.
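A minimal Lex specification in the three-part layout above might look as follows (the token names and actions are illustrative assumptions, not part of the notes):

%{
#include <stdio.h>
%}
digit   [0-9]
letter  [a-zA-Z]
%%
"if"                          { printf("KEYWORD if\n"); }
{letter}({letter}|{digit})*   { printf("ID %s\n", yytext); }
{digit}+                      { printf("NUM %s\n", yytext); }
[ \t\n]                       ;  /* strip white space */
.                             { printf("UNKNOWN %s\n", yytext); }
%%
int yywrap(void) { return 1; }
int main(void)   { yylex(); return 0; }

Running lex (or flex) on this file produces lex.yy.c, which is then compiled with a C compiler to obtain the lexical analyzer.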
INPUT BUFFERING
The LA scans the characters of the source program one at a time to discover tokens. Because a large amount of time can be consumed scanning characters, specialized buffering techniques have been developed to reduce the amount of overhead required to process an input character. (For a worked example, please refer to the class notes; a sketch of the two-buffer scheme follows.)
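The sketch below illustrates the commonly used two-buffer scheme with sentinels (the buffer size and names are assumptions). Each half of the buffer ends with a sentinel character, so the common case needs only one test per character read; the tests against the half boundaries run only when a sentinel is seen.

#include <stdio.h>

#define N 16                /* size of each buffer half */
#define SENTINEL '\0'       /* assumes the source text contains no '\0' */

static char buf[2 * N + 2]; /* two halves, each followed by a sentinel slot */
static char *forward;       /* scanning pointer */

static void load_half(char *half) {
    size_t n = fread(half, 1, N, stdin);
    half[n] = SENTINEL;     /* an early sentinel marks the real end of input */
}

static int next_char(void) {
    while (*forward == SENTINEL) {
        if (forward == buf + N) {                /* end of the first half */
            load_half(buf + N + 1);
            forward = buf + N + 1;
        } else if (forward == buf + 2 * N + 1) { /* end of the second half */
            load_half(buf);
            forward = buf;
        } else {
            return EOF;                          /* sentinel inside a half: end of input */
        }
    }
    return (unsigned char)*forward++;
}

int main(void) {
    load_half(buf);
    forward = buf;
    int c;
    while ((c = next_char()) != EOF)
        putchar(c);                              /* a real scanner would form tokens here */
    return 0;
}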
UNIT –II
Syntax Analysis: The role of the parser, context-free grammars, writing a grammar, top-down parsing, bottom-up parsing, introduction to LR parsers.
SYNTAX ANALYSIS
TOP-DOWN PARSING
A program that performs syntax analysis is called a parser. A syntax analyzer takes tokens as input and outputs error messages if the program syntax is wrong. The parser uses symbol lookahead and an approach called top-down parsing without backtracking. Top-down parsers check to see if a string can be generated by a grammar by creating a parse tree starting from the initial symbol and working down.
BOTTOM-UP PARSING
Bottom-up parsers, however, check to see whether a string can be generated from a grammar by creating a parse tree from the leaves and working up.
Context-Free Grammar
A context-free grammar (CFG) consisting of a finite set of grammar rules is a quadruple (N, T, P, S), where N is a set of non-terminal symbols, T is a set of terminals (with N ∩ T = ∅), P is a set of production rules of the form A → α with A ∈ N and α ∈ (N ∪ T)*, and S ∈ N is the start symbol.
Syntax analyzers follow production rules defined by means of context-free grammar. The way the production rules
are implemented (derivation) divides parsing into two types : top-down parsing and bottom-up parsing.
Top-down Parsing
When the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol
to the input, it is called top-down parsing.
• Recursive descent parsing: It is a common form of top-down parsing. It is called recursive as it uses recursive procedures to process the input. Recursive descent parsing suffers from backtracking (a small sketch in C follows this list).
• Backtracking : It means, if one derivation of a production fails, the syntax analyzer restarts the process using
different rules of same production. This technique may process the input string more than once to determine
the right production.
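The following is a small runnable sketch of recursive-descent parsing (the grammar here is an illustrative expression grammar, not the S/E/T grammar used later in these notes):

/* expr   -> term   { '+' term }
   term   -> factor { '*' factor }
   factor -> DIGIT | '(' expr ')'              */
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

static const char *p;                          /* current input position */

static void expr(void);

static void error(void)   { printf("syntax error at '%c'\n", *p); exit(1); }
static void match(char c) { if (*p == c) p++; else error(); }

static void factor(void) {
    if (isdigit((unsigned char)*p)) p++;
    else if (*p == '(') { match('('); expr(); match(')'); }
    else error();
}

static void term(void) {
    factor();
    while (*p == '*') { match('*'); factor(); }
}

static void expr(void) {
    term();
    while (*p == '+') { match('+'); term(); }
}

int main(void) {
    p = "2+3*(4+5)";
    expr();
    printf(*p == '\0' ? "accepted\n" : "rejected\n");
    return 0;
}

Because every alternative here can be chosen by looking at a single input symbol, this particular grammar needs no backtracking; a grammar whose alternatives cannot be distinguished by one lookahead symbol would force the parser to back up and retry.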
Bottom-up Parsing
As the name suggests, bottom-up parsing starts with the input symbols and tries to construct the parse tree up to the
start symbol.
Example:
Input string : a + b * c
Production rules:
S → E
E → E + T
E → E * T
E → T
T → id
a + b * c
Read the input and check if any production matches with the input:
a + b * c
T + b * c
E + b * c
E + T * c
E * c
E * T
E
S
LR PARSER
LR PARSING INTRODUCTION
The "L" is for left-to-right scanning of the input and the "R" is for constructing a rightmost
derivation in reverse.
WHY LR PARSING:
✓ LR parsers can be constructed to recognize virtually all programming-language
constructs for which context-free grammars can be written.
✓ The LR parsing method is the most general non-backtracking shift-reduce parsing
method known, yet it can be implemented as efficiently as other shift-reduce
methods.
✓ The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
✓ An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-
right scan of the input.
The disadvantage is that it takes too much work to construct an LR parser by hand for a typical programming-language grammar. But there are many LR parser generators available to make this task easy.
MODELS OF LR PARSERS
The schematic form of an LR parser is shown below.
The program uses a stack to store a string of the form s0 X1 s1 X2 ... Xm sm, where sm is on top. Each Xi is a grammar symbol and each si is a symbol representing a state. Each state symbol summarizes the information contained in the stack below it. The combination of the state symbol on top of the stack and the current input symbol is used to index the parsing table and determine the shift-reduce parsing decision. The parsing table consists of two parts: a parsing action function action and a goto function goto. The program driving the LR parser behaves as follows: it determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai], which can have one of four values:
▪ shift s, where s is a state
▪ reduce by a grammar production A -> b
▪ accept
▪ error
UNIT –III
More powerful LR parsers (LR(1), LALR), using ambiguous grammars, error recovery in LR parsing. Syntax-Directed Translation: Definitions, evaluation orders for SDDs, applications of syntax-directed translation, syntax-directed translation schemes.
6. Then goto(J, X) = K.
Consider the above example,
I3 and I6 can be replaced by their union:
I36: C -> c.C, c/d/$
     C -> .cC, c/d/$
     C -> .d,  c/d/$
I47: C -> d.,  c/d/$
I89: C -> cC., c/d/$
Parsing Table
state     c      d      $         S     C
0         s36    s47              1     2
1                       accept
2         s36    s47                    5
36        s36    s47                    89
47        r3     r3     r3
5                       r1
89        r2     r2     r2
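To make the table concrete, the following is a runnable sketch of the table-driven LR parsing loop for the grammar (1) S -> CC, (2) C -> cC, (3) C -> d, using the merged table above. Mapping the state names 36, 47 and 89 to array indices 3, 4 and 6 is an implementation choice, not part of the method.

#include <stdio.h>

enum { ERR, SH, RE, ACC };                     /* kinds of parsing actions */
typedef struct { int kind, arg; } Act;

/* terminals: 0 = c, 1 = d, 2 = $   states: 0, 1, 2, 3(=36), 4(=47), 5, 6(=89) */
static Act action_tab[7][3] = {
    /* 0  */ {{SH,3},{SH,4},{ERR,0}},
    /* 1  */ {{ERR,0},{ERR,0},{ACC,0}},
    /* 2  */ {{SH,3},{SH,4},{ERR,0}},
    /* 36 */ {{SH,3},{SH,4},{ERR,0}},
    /* 47 */ {{RE,3},{RE,3},{RE,3}},
    /* 5  */ {{ERR,0},{ERR,0},{RE,1}},
    /* 89 */ {{RE,2},{RE,2},{RE,2}},
};
/* goto on non-terminals: 0 = S, 1 = C (-1 means no entry) */
static int goto_tab[7][2] = {{1,2},{-1,-1},{-1,5},{-1,6},{-1,-1},{-1,-1},{-1,-1}};
static int rhs_len[4] = {0, 2, 2, 1};          /* lengths of productions (1)-(3) */
static int lhs_sym[4] = {0, 0, 1, 1};          /* left-hand sides: S, C, C */

static int tok(char c) { return c == 'c' ? 0 : c == 'd' ? 1 : 2; }

static int parse(const char *w) {
    int stack[100], top = 0;
    stack[top] = 0;                            /* start in state 0 */
    for (;;) {
        Act a = action_tab[stack[top]][tok(*w)];
        if (a.kind == SH) { stack[++top] = a.arg; w++; }   /* shift */
        else if (a.kind == RE) {                           /* reduce */
            top -= rhs_len[a.arg];             /* pop one state per RHS symbol */
            stack[top + 1] = goto_tab[stack[top]][lhs_sym[a.arg]];
            top++;
        }
        else return a.kind == ACC;             /* accept or report an error */
    }
}

int main(void) {
    printf("ccdd: %s\n", parse("ccdd") ? "accepted" : "error");
    printf("cdc : %s\n", parse("cdc")  ? "accepted" : "error");
    return 0;
}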
HANDLING ERRORS
The LALR parser may continue to do reductions after the LR parser would have spotted an
error, but the LALR parser will never do a shift after the point the LR parser would have
discovered the error and will eventually find the error.
DANGLING ELSE
The dangling else is a problem in computer programming in which an optional else clause in
an If–then(–else) statement results in nested conditionals being ambiguous. Formally,
the context-free grammar of the language is ambiguous, meaning there is more than one
correct parse tree.
In many programming languages one may write conditionally executed code in two forms:
the if-then form, and the if-then-else form – the else clause is optional:
E ::= E . * E    [lookaheads: + * $]
Here we have a shift-reduce conflict. Consider the first two items in I3. If we have a*b+c and
we parsed a*b, do we reduce using E ::= E * E or do we shift more symbols? In the former
case we get a parse tree (a*b)+c; in the latter case we get a*(b+c). To resolve this conflict, we
can specify that * has higher precedence than +. The precedence of a grammar production is
equal to the precedence of the rightmost token at the rhs of the production. For example, the
precedence of the production E ::= E * E is equal to the precedence of the operator *, the
precedence of the production E ::= ( E ) is equal to the precedence of the token ), and the
precedence of the production E ::= if E then E else E is equal to the precedence of the token
else. The idea is that if the look ahead has higher precedence than the production currently
used, we shift. For example, if we are parsing E + E using the production rule E ::= E + E
and the lookahead is *, we shift *. If the lookahead has the same precedence as that of the current production and is left associative, we reduce, otherwise we shift. The above grammar is valid if we define the precedence and associativity of all the operators. Thus, it is very important when you write a parser using CUP or any other LALR(1) parser generator to specify associativities and precedences for most tokens (especially for those used as operators). Note: you can explicitly define the precedence of a rule in CUP using the %prec directive:
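For instance, in an illustrative CUP fragment (an assumed example, not taken from these notes), a unary-minus rule can borrow the precedence of a token declared just for that purpose:

precedence left PLUS, MINUS;
precedence left TIMES, DIVIDE;
precedence right UMINUS;

expr ::= MINUS expr %prec UMINUS ;

Here UMINUS never appears in the input; it exists only so that the rule for unary minus gets a higher precedence than the binary operators.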
LR ERROR RECOVERY
An LR parser will detect an error when it consults the parsing action table and finds a blank or error entry. Errors are never detected by consulting the goto table. An LR parser will detect an error as soon as there is no valid continuation for the portion of the input thus far scanned. A canonical LR parser will not make even a single reduction before announcing an error.
PHRASE-LEVEL RECOVERY
The actions may include insertion or deletion of symbols from the stack or the input or both, or alteration and transposition of input symbols. We must make our choices so that the LR parser will not get into an infinite loop. A safe strategy will assure that at least one input symbol will be removed or shifted eventually, or that the stack will eventually shrink if the end of the input has been reached. Popping a stack state that covers a non-terminal should be avoided, because this modification eliminates from the stack a construct that has already been successfully parsed.
SEMANTIC ANALYSIS
➢ Semantic Analysis computes additional information related to the meaning of the
program once the syntactic structure is known.
➢ In typed languages such as C, semantic analysis involves adding information to the symbol table and performing type checking.
➢ The information to be computed is beyond the capabilities of standard parsing
techniques, therefore it is not regarded as syntax.
➢ As for Lexical and Syntax analysis, also for Semantic Analysis we need both a
Representation Formalism and an Implementation Mechanism.
➢ As a representation formalism, this lecture illustrates what are called Syntax Directed Translations.
SYNTAX DIRECTED TRANSLATION
➢ The Principle of Syntax Directed Translation states that the meaning of an input
sentence is related to its syntactic structure, i.e., to its Parse-Tree.
➢ By Syntax Directed Translations we indicate those formalisms for specifying
translations for programming language constructs guided by context-free
grammars.
o We associate Attributes to the grammar symbols representing the language
constructs.
o Values for attributes are computed by Semantic Rules associated with
grammar productions.
➢ Evaluation of Semantic Rules may:
o Generate Code;
o Insert information into the Symbol Table;
o Perform Semantic Check;
o Issue error messages;
o etc.
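As a small sketch of how such semantic rules are evaluated, the C fragment below computes a synthesized attribute val bottom-up over a syntax tree, in the spirit of rules like E -> E1 + T { E.val = E1.val + T.val } (the node layout and names are assumptions for illustration):

#include <stdio.h>

typedef struct node {
    char op;                    /* '+' or '*' for interior nodes, 0 for leaves */
    int  val;                   /* the synthesized attribute */
    struct node *left, *right;
} node;

/* Post-order walk: a node's attribute is computed from its children's values. */
static int eval(node *n) {
    if (n->op == 0) return n->val;             /* leaf: the token's value */
    int l = eval(n->left), r = eval(n->right);
    n->val = (n->op == '+') ? l + r : l * r;
    return n->val;
}

int main(void) {
    node three = {0, 3, NULL, NULL}, four = {0, 4, NULL, NULL}, five = {0, 5, NULL, NULL};
    node mul = {'*', 0, &four, &five};
    node add = {'+', 0, &three, &mul};         /* syntax tree for 3 + 4 * 5 */
    printf("%d\n", eval(&add));                /* prints 23 */
    return 0;
}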
A compiler front end is organized as in figure above, where parsing, static checking,
and intermediate-code generation are done sequentially; sometimes they can be combined
and folded into parsing. All schemes can be implemented by creating a syntax tree and then
walking the tree.
Static Checking
This includes type checking, which ensures that operators are applied to compatible operands. It also includes any syntactic checks that remain after parsing, such as:
o Flow-of-control checks (e.g., a break statement must occur within a loop construct)
o Uniqueness checks (e.g., labels in case statements must be distinct)
o Name-related checks
Intermediate Representations
We could translate the source program directly into the target language. However, there
are benefits to having an intermediate, machine-independent representation.
IR can be either an actual language or a group of internal data structures that are shared by the phases of the compiler. C is sometimes used as an intermediate language, as it is flexible, compiles into efficient machine code, and its compilers are widely available. In all cases, the intermediate
code is a linearization of the syntax tree produced during syntax and semantic analysis. It is
formed by breaking down the tree structure into sequential instructions, each of which is
equivalent to a single, or small number of machine instructions. Machine code can then be
generated (access might be required to symbol tables etc). TAC can range from high- to low-
level, depending on the choice of operators. In general, it is a statement containing at most 3
addresses or operands.
The general form is x := y op z, where “op” is an operator, x is the result, and y and z are
operands. x, y, z are variables, constants, or “temporaries”. A three-address instruction may also be a jump:
Unconditional jump: goto L
Creates a label L and generates the three-address code ‘goto L’.
Conditional jump (if-statement): creates a label L and generates code for the expression exp; if exp evaluates to true, control goes to the statement labelled L; if exp evaluates to false, control goes to the statement immediately following the if-statement.
Function call
For a function fun with n arguments a1, a2, a3, ..., an, i.e. fun(a1, a2, a3, ..., an), the call is translated as
param a1
param a2
...
param an
call fun, n
where param defines the arguments to the function.
Example: a = -b * d + c + (-b) * d
t1 = - b
t2 = t1 * d
t3 = t2 + c
t4 = - b
t5 = t4 * d
t6 = t3 + t5
a = t6
Quadruples for the above example are as follows:
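One possible quadruple layout for this code (reconstructed from the three-address statements above; the usual op/arg1/arg2/result fields are assumed) is:

        op      arg1    arg2    result
(0)     uminus  b               t1
(1)     *       t1      d       t2
(2)     +       t2      c       t3
(3)     uminus  b               t4
(4)     *       t4      d       t5
(5)     +       t3      t5      t6
(6)     =       t6              a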
TRIPLES
Triples use only three fields in the record structure: one field for the operator and two fields for operands, named arg1 and arg2. The value of a temporary variable can be accessed by the position of the statement that computes it, rather than by location as in quadruples.
Example: a = -b * d + c + (-b) * d
Triples for the above example are as follows:
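A corresponding triple listing (reconstructed from the same three-address code; parenthesized numbers refer to earlier triples) is:

        op      arg1    arg2
(0)     uminus  b
(1)     *       (0)     d
(2)     +       (1)     c
(3)     uminus  b
(4)     *       (3)     d
(5)     +       (2)     (4)
(6)     =       a       (5)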
Arg1 and arg2 may be pointers to symbol table for program variables or literal table for
constant or pointers into triple structure for intermediate results.
Example: Triples for the statement x[i] = y, which generates two records, are as follows:
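A plausible pair of records for this statement (the operator names []= and assign are the conventional ones, assumed here) is:

        op      arg1    arg2
(0)     []=     x       i
(1)     assign  (0)     y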
Triples are an alternative way of representing a syntax tree or directed acyclic graph for program-defined names.
Indirect Triples
Indirect triples are used to achieve indirection in the listing of pointers: the instruction list holds pointers to triples rather than the triples themselves.
Example: a = -b * d + c + (-b) * d
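An indirect triple representation (a sketch; the listing numbers 35-41 are arbitrary) keeps a separate list of pointers into the triple structure:

        listing                 triples
(35)    (0)             (0)     uminus  b
(36)    (1)             (1)     *       (0)     d
(37)    (2)             (2)     +       (1)     c
(38)    (3)             (3)     uminus  b
(39)    (4)             (4)     *       (3)     d
(40)    (5)             (5)     +       (2)     (4)
(41)    (6)             (6)     =       a       (5)

Reordering the code now means reordering only the pointer listing, without renumbering the triples themselves.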
A three-address instruction has an operator and operands; representations include quadruples, triples and indirect triples.
• The problem in generating three address codes in a single pass is that we may not know the labels that
control must go to at the time jump statements are generated.
• So to get around this problem a series of branching statements with the targets of the jumps temporarily left
unspecified is generated.
• Backpatching is putting in the address in place of the label once the proper target is determined.
1) makelist (i) – creates a new list containing only i, an index into the array of quadruples and returns pointer
to the list it has made.
2) Merge (i, j) – concatenates the lists pointed to by i and j, and returns a pointer to the concatenated list.
3) Backpatch (p, i) – inserts i as the target label for each of the statements on the list pointed to by p.
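A minimal sketch of these three helpers in C (the data layout, an array of quadruples with an unfilled jump-target field, is an assumption for illustration):

#include <stdio.h>
#include <stdlib.h>

#define MAXQUAD 100

typedef struct node { int quad; struct node *next; } node;   /* list of quad indices */

static int target[MAXQUAD];     /* jump-target field of each quadruple (0 = unfilled) */

/* makelist(i): a new list containing only quad index i. */
static node *makelist(int i) {
    node *p = malloc(sizeof *p);
    p->quad = i; p->next = NULL;
    return p;
}

/* merge(i, j): concatenate the two lists and return the result. */
static node *merge(node *i, node *j) {
    if (i == NULL) return j;
    node *p = i;
    while (p->next) p = p->next;
    p->next = j;
    return i;
}

/* backpatch(p, i): fill in i as the target label of every quad on list p. */
static void backpatch(node *p, int i) {
    for (; p; p = p->next)
        target[p->quad] = i;
}

int main(void) {
    node *truelist = makelist(1);              /* quad 1: a jump whose target is unknown */
    truelist = merge(truelist, makelist(3));   /* quad 3: another pending jump */
    backpatch(truelist, 7);                    /* both jumps now go to quad 7 */
    printf("target[1]=%d target[3]=%d\n", target[1], target[3]);
    return 0;
}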
UNIT – V
Runtime Environments: Stack allocation of space, access to non-local data on the stack, heap management.
Code Generation: Issues in the design of a code generator, the target language, addresses in the target code, basic blocks and flow graphs, a simple code generator.
RUNTIME ENVIRONMENT
➢ Runtime organization of different storage locations
➢ Representation of scopes and extents during program execution.
➢ Components of executing program reside in blocks of memory (supplied by OS).
➢ Three kinds of entities that need to be managed at runtime:
o Generated code for various procedures and programs: forms the text or code segment of your program; size known at compile time.
o Data objects: global variables/constants, size known at compile time.
o The run-time stack and heap: the stack manages activations of procedures at runtime, and the heap holds variables created dynamically.
A typical layout of run-time memory:
Code: program instructions
Stack: manages activation of procedures at runtime
Heap: holds variables created dynamically
STORAGE ORGANIZATION
Fixed-size objects can be placed in predefined locations; the remaining storage is organized as a run-time stack and heap.
CODE GENERATION
The most important criterion for a code generator is that it produce correct code. Correctness takes on special significance because of the number of special cases that a code generator must face. Given the premium on correctness, designing a code generator so it can be easily implemented, tested, and maintained is an important design goal.
Reference Counting Garbage Collection
The difficulty in garbage collection is not the actual process of collecting the garbage -- it is the problem of finding the garbage in the first place. An object is considered to be garbage when no references to that object exist. But how can we tell when no references to an object exist? A simple expedient is to keep track in each object of the total number of references to that object. That is, we add a special field to each object called a reference count. The idea is that the reference count field is not accessible to the Java program. Instead, the reference count field is updated by the Java virtual machine itself.
Consider the statement
Object p = new Integer (57);
which creates a new instance of the Integer class. Only a single variable, p, refers to the object. Thus, its reference count should be one.
In general, every time one reference variable is assigned to another, it may be necessary to
update several reference counts. Suppose p and q are both reference variables. The
assignment
p = q;
would be implemented by the Java virtual machine as follows:
if (p != q)
{
if (p != null)
--p.refCount;
p = q;
if (p != null)
++p.refCount;
}
For example suppose p and q are initialized as follows:
Object p = new Integer (57);
Object q = new Integer (99);
As shown in Figure (a), two Integer objects are created, each with a reference count of
one. Now, suppose we assign q to p using the code sequence given above. Figure (b)
shows that after the assignment, both p and q refer to the same object--its reference count is
two. And the reference count on Integer(57) has gone to zero which indicates that it is
garbage.
Figure: Reference counts before and after the assignment p = q.
The costs of using reference counts are twofold: First, every object requires the special
reference count field. Typically, this means an extra word of storage must be allocated in
each object. Second, every time one reference is assigned to another, the reference counts
must be adjusted as above. This increases significantly the time taken by assignment
statements.
The advantage of using reference counts is that garbage is easily identified. When it becomes necessary to reclaim the storage from unused objects, the garbage collector needs only to examine the reference count fields of all the objects that have been created. If the reference count is zero, the object is garbage.
It is not necessary to wait until there is insufficient memory before initiating the garbage
collection process. We can reclaim memory used by an object immediately when its
reference goes to zero. Consider what happens if we implement the Java assignment p = q in
the Java virtual machine as follows:
if (p != q)
{
if (p != null)
if (--p.refCount == 0)
heap.release (p);
p = q;
if (p != null)
++p.refCount;
}
Notice that the release method is invoked immediately when the reference count of an object goes
to zero, i.e., when it becomes garbage. In this way, garbage may be collected incrementally as it is
created.
BASIC BLOCKS
A basic block is a sequence of consecutive statements in which flow of control
enters at the beginning and leaves at the end without halt or possibility of branching except at
the end. The following sequence of three-address statements forms a basic block:
t1 := a*a
t2 := a*b
t3 := 2*t2
t4 := t1+t3
t5 := b*b
t6 := t4+t5
A three-address statement x := y+z is said to define x and to use y and z. A name in a basic block is said to be live at a given point if its value is used after that point in the program, perhaps in another basic block.
The following algorithm can be used to partition a sequence of three-address statements into
basic blocks.
Algorithm 1: Partition into basic blocks.
Input: A sequence of three-address statements.
Output: A list of basic blocks with each three-address statement in exactly one block.
Method:
1. We first determine the set of leaders, the first statements of basic
blocks. The rules we use are the following:
I) The first statement is a leader.
II) Any statement that is the target of a conditional or unconditional goto is a leader.
III) Any statement that immediately follows a goto or conditional goto statement
is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but
not including the next leader or the end of the program.
Example 3: Consider the fragment of source code shown in fig. 7; it computes the dot
product of two vectors a and b of length 20. A list of three-address statements performing
this computation on our target machine is shown in fig. 8.
Begin
prod := 0;
i := 1;
do begin
prod := prod + a[i] * b[i];
i := i+1;
end
while i<= 20
end
Let us apply Algorithm 1 to the three-address code in fig. 8 to determine its basic blocks. Statement (1) is a leader by rule (I) and statement (3) is a leader by rule (II), since the last statement can jump to it. By rule (III) the statement following (12) is a leader. Therefore, statements (1) and (2) form a basic block. The remainder of the program beginning with statement (3) forms a second basic block.
(1) prod := 0
(2) i := 1
(3) t1 := 4*i
(4) t2 := a [ t1 ]
(5) t3 := 4*i
(6) t4 :=b [ t3 ]
(7) t5 := t2*t4
(8) t6 := prod +t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=20 goto (3)
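A small sketch of step 1 of Algorithm 1 applied to this listing (the jump target is written in by hand here; a real implementation would read it from the intermediate code):

#include <stdio.h>

int main(void) {
    int n = 12;                 /* statements (1)..(12) in the listing above */
    int leader[14] = {0};

    leader[1]  = 1;             /* rule (I): the first statement */
    leader[3]  = 1;             /* rule (II): (12) is "if i<=20 goto (3)" */
    leader[13] = 1;             /* rule (III): the statement following the goto */

    int start = 1;
    for (int i = 2; i <= n + 1; i++)
        if (leader[i]) {
            printf("basic block: statements (%d)-(%d)\n", start, i - 1);
            start = i;
        }
    return 0;
}

This prints the two blocks identified above: statements (1)-(2) and statements (3)-(12).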
UNIT –VI
Machine-Independent Optimization: The principal sources of optimization, peephole optimization, introduction to data-flow analysis.
Example: the figure above shows the result of eliminating both global and local common subexpressions from blocks B5 and B6 in the flow graph of the figure. We first discuss the transformation of B5 and then mention some subtleties involving arrays.
After local common subexpressions are eliminated, B5 still evaluates 4*i and 4*j, as shown in the earlier figure. Both are common subexpressions; in particular, the three statements t8 := 4*j; t9 := a[t8]; a[t8] := x in B5 can be replaced by t9 := a[t4]; a[t4] := x, using t4 computed in block B3. In the figure, observe that as control passes from the evaluation of 4*j in B3 to B5, there is no change in j, so t4 can be used if 4*j is needed.
Another common subexpression comes to light in B5 after t4 replaces t8. The new expression a[t4] corresponds to the value of a[j] at the source level. Not only does j retain its value as control leaves B3 and then enters B5, but a[j], a value computed into a temporary t5, does too, because there are no assignments to elements of the array a in the interim. The statements t9 := a[t4]; a[t6] := t9 in B5 can therefore be replaced by a[t6] := t5.
The expression a[t1] in blocks B1 and B6 is not considered a common subexpression, although t1 can be used in both places. After control leaves B1 and before it reaches B6, it can go through B5, where there are assignments to a. Hence, a[t1] may not have the same value on reaching B6 as it did on leaving B1, and it is not safe to treat a[t1] as a common subexpression.
Copy Propagation
Block B5 in the figure can be further improved by eliminating x, using two new transformations. One concerns assignments of the form f := g, called copy statements, or copies for short. Had we gone into more detail in Example 10.2, copies would have arisen much sooner, because the algorithm for eliminating common subexpressions introduces them, as do several other algorithms. For example, when the common subexpression in c := d+e is eliminated in the figure, the algorithm uses a new variable t to hold the value of d+e. Since control may reach c := d+e either after the assignment to a or after the assignment to b, it would be incorrect to replace c := d+e by either c := a or by c := b. The idea behind the copy-propagation transformation is to use g for f, wherever possible after the copy statement f := g. For example, the assignment x := t3 in block B5 of the figure is a copy. Copy propagation applied to B5 yields:
x := t3
a[t2] := t5
a[t4] := t3
goto B2
This may not appear to be an improvement, but as we shall see, it gives us the opportunity to eliminate the assignment to x.
PEEPHOLE OPTIMIZATION
The peephole is a small, moving window on the target program. The code in the peephole need not be contiguous, although some implementations do require this. We shall give the following examples of program transformations that are characteristic of peephole optimizations:
• Redundant-instructions elimination
• Flow-of-control optimizations
• Algebraic simplifications
• Use of machine idioms
REDUNDANT LOADS AND STORES
If we see the instruction sequence
(1) MOV R0, a
(2) MOV a, R0
we can delete instruction (2), because whenever (2) is executed, (1) will have ensured that the value of a is already in register R0. If (2) had a label we could not be sure that (1) was always executed immediately before (2), and so we could not remove (2).
UNREACHABLE CODE
Another opportunity for peephole optimizations is the removal of unreachable
instructions. An unlabeled instruction immediately following an
unconditional jump may be removed. This operation can be repeated to
eliminate a sequence of instructions. For example, for debugging
purposes, a large program may have within it certain segments that are
executed only if a variable debug is 1.In C, the source code might look
like:
#define debug 0
….
If ( debug ) {
Print debugging information
}
In the intermediate representation the if-statement may be translated as:
if debug = 1 goto L1
goto L2
L1: print debugging information
L2: …………………………(a)
One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter what the value of debug, (a) can be replaced by:
if debug ≠ 1 goto L2
print debugging information
L2: ……………………………(b)
If debug is set to 0 at the beginning of the program, constant propagation replaces (b) by:
if 0 ≠ 1 goto L2
print debugging information
L2: ……………………………(c)
As the argument of the first statement of (c) evaluates to a constant true, it can be replaced by goto L2. Then all the statements that print debugging aids are manifestly unreachable and can be eliminated one at a time.
Compiler Structure
• Data flow analysis operates on the control flow graph (and other intermediate representations). Consider the following fragment:
x := a + b;
y := a * b;
while (y > a) {
    a := a + 1;
    x := a + b
}
Available Expressions
• An expression e is available at program point p if
■ e is computed on every path to p, and
■ the value of e has not changed since the last time e was computed on each path to p
• Optimization
■ If an expression is available, need not be recomputed
- (At least, if it’s still in a register somewhere)
Data Flow Facts
• Is expression e available?
• Facts:
■ a + b is available
■ a * b is available
■ a + 1 is available
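A minimal sketch of the iterative analysis for the loop above, with the blocks, expression numbering and gen/kill sets worked out by hand (an assumption for illustration): bit 0 = a+b, bit 1 = a*b, bit 2 = a+1; B0 is x:=a+b; y:=a*b, B1 is the loop test y > a, and B2 is a:=a+1; x:=a+b.

#include <stdio.h>

#define NB  3
#define ALL 0x7u

int main(void) {
    unsigned gen[NB]  = {0x3, 0x0, 0x1};   /* B2 recomputes a+b after redefining a */
    unsigned kill[NB] = {0x0, 0x0, 0x6};   /* redefining a kills a*b and a+1 in B2 */
    int npred[NB]     = {0, 2, 1};         /* B1 is reached from B0 and from B2 */
    int preds[NB][2]  = {{0, 0}, {0, 2}, {1, 0}};

    unsigned in[NB], out[NB];
    for (int b = 0; b < NB; b++) { in[b] = ALL; out[b] = ALL; }
    in[0] = 0; out[0] = gen[0];            /* entry block */

    int changed = 1;
    while (changed) {                      /* iterate to a fixed point */
        changed = 0;
        for (int b = 1; b < NB; b++) {
            unsigned i = ALL;
            for (int k = 0; k < npred[b]; k++) i &= out[preds[b][k]];
            unsigned o = gen[b] | (i & ~kill[b]);
            if (i != in[b] || o != out[b]) changed = 1;
            in[b] = i; out[b] = o;
        }
    }
    for (int b = 0; b < NB; b++)
        printf("B%d: IN=%x OUT=%x\n", b, in[b], out[b]);
    return 0;
}

At the fixed point, IN[B1] = {a+b}: a+b is available at the loop test (it reaches it along both the entry edge and the back edge), while a*b is not, because a is redefined in the loop body.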
UNIT-I
1. a) What are the different analysis phases of a compiler? Explain the reasons for separating lexical analysis from syntax analysis.
b) Write a lexical analyzer program to identify Strings, Sequences, Comments,
Reserved words and identifiers.
4. a) What are the cousins of a compiler? Explain their operations in processing a high-level language.
5. a) What do you mean by the front end in compiler design? Show the output
7. What is the relationship between the lexical analyzer, regular expressions and transition diagrams? Give an example.
4. Prove that the given grammar is ambiguous and eliminate the ambiguity in it.
S → iEtSeS | iEtS | a
E → b | c | d
UNIT-III
1. What is syntax-directed translation? How is it different from translation schemes? Explain with an example.
3. a) Explain the type system in a type checker. Write the syntax-directed definition for a type checker for the grammar:
D → T L
T → int | real
L → L , id | id
b) What is the role of the type system in a type checker? Write the syntax-directed definition for a type checker.
UNIT IV
5) a) Explain the type system in a type checker. Write the syntax-directed definition for a type checker.
UNIT V
b) Generate target code from a sequence of three-address statements using the simple code generation algorithm.
UNIT VI
1) For the code given in Q.1(d), generate the basic blocks and write the rules.
dependent optimizations.
b) Explain how code motion and frequency reduction are used for loop optimizations.