CD All Units
[Fig.1.1. A Compiler: source program in, target program out, with error reports produced during compilation]
Major functions performed by a compiler:
A compiler converts a program from one form to another.
A compiler should convert the source program into target machine code in such a way
that the generated target code is easy to understand.
A compiler should preserve the meaning of the source code.
A compiler should report errors that occur during the compilation process.
The compilation must be done efficiently.
[Figure: syntax tree for the assignment a = a + b * c * 2]
Semantic analysis:
The semantic analyzer determines the meaning of a source string.
For example: matching of parentheses in an expression, matching of an if..else
statement, checking that arithmetic operations are performed on type-compatible
operands, or checking the scope of an operation.
[Figure: annotated syntax tree for a = a + b * c * 2, with an int-to-float conversion inserted for the integer constant 2]
Synthesis phase: the synthesis part is divided into three sub-parts:
I. Intermediate code generation
II. Code optimization
III. Code generation
Intermediate code generation:
The intermediate representation should have two important properties: it should be
easy to produce and easy to translate into the target program.
We consider an intermediate form called "three-address code".
Three-address code consists of a sequence of instructions, each of which has at most
three operands.
The source program might appear in three-address code as:
t1 = inttoreal(2)
t2 = id3 * t1
t3 = t2 * id2
t4 = t3 + id1
id1 = t4
Code optimization:
The code optimization phase attempts to improve the intermediate code.
This is necessary to obtain faster-executing code or code that consumes less memory.
Thus, by optimizing the code, the overall running time of the target program can be
improved.
t1 = id3 * 2.0
t2 = id2 * t1
id1 = id1 + t2
Code generation:
In the code generation phase the target code is generated. The intermediate code
instructions are translated into a sequence of machine instructions.
MOV id3, R1
MUL #2.0, R1
MOV id2, R2
MUL R2, R1
MOV id1, R2
ADD R2, R1
MOV R1, id1
Symbol Table
A symbol table is a data structure used by a language translator such as a compiler or
interpreter.
It is used to store names encountered in the source program, along with the relevant
attributes for those names.
Information about the following entities is stored in the symbol table:
Variable/Identifier
Procedure/function
Keyword
Constant
Class name
Label name
[Figure: phases of a compiler — source program, lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, code generation, target program; the symbol table and the error detection and recovery routines interact with all phases]
[Figure: a language processing system — skeletal source, preprocessor, source program, compiler, target assembly, assembler, linker/loader]
LEXICAL ANALYSIS:
→ As the first phase of a compiler, the main task of the lexical analyzer is to read the input
characters of the source program, group them into lexemes, and produce as output tokens for each
lexeme in the source program. This stream of tokens is sent to the parser for syntax analysis. It is
common for the lexical analyzer to interact with the symbol table as well.
→When the lexical analyzer discovers a lexeme constituting an identifier, it needs to enter that
lexeme into the symbol table. This process is shown in the following figure.
→When the lexical analyzer identifies the first token, it sends it to the parser; the parser
receives the token and calls the lexical analyzer for the next token by issuing the
getNextToken() command. This process continues until the lexical analyzer has identified
all the tokens. During this process the lexical analyzer neglects or discards the white
space and comment lines.
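The hand-off described above can be sketched in C. This is a minimal illustration only, not the interface of any real compiler: the token codes, the input string, and the body of getNextToken() are all invented for the example.

#include <ctype.h>
#include <stdio.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_ID = 256, TOK_NUM, TOK_EOF };

static const char *src = "count 42 rate";  /* stand-in source program */
static char lexeme[64];

/* getNextToken: groups input characters into the next lexeme and
   returns its token code, exactly as the parser would request. */
int getNextToken(void) {
    while (*src == ' ' || *src == '\t' || *src == '\n')
        src++;                                /* white space is discarded */
    if (*src == '\0')
        return TOK_EOF;
    int i = 0;
    if (isalpha((unsigned char)*src)) {       /* identifier lexeme */
        while (isalnum((unsigned char)*src))
            lexeme[i++] = *src++;
        lexeme[i] = '\0';
        return TOK_ID;                        /* an identifier would also be
                                                 entered into the symbol table */
    }
    if (isdigit((unsigned char)*src)) {       /* number lexeme */
        while (isdigit((unsigned char)*src))
            lexeme[i++] = *src++;
        lexeme[i] = '\0';
        return TOK_NUM;
    }
    lexeme[0] = *src++; lexeme[1] = '\0';
    return lexeme[0];                         /* single-character token */
}

int main(void) {
    /* The parser's loop: keep asking for tokens until end of input. */
    int tok;
    while ((tok = getNextToken()) != TOK_EOF)
        printf("token %d, lexeme \"%s\"\n", tok, lexeme);
    return 0;
}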
LEXICAL ANALYSIS Vs PARSING:
There are a number of reasons why the analysis portion of a compiler is normally separated into
lexical analysis and parsing (syntax analysis) phases.
1. Simplicity of design is the most important consideration. The separation of Lexical
and Syntactic analysis often allows us to simplify at least one of these tasks. For
example, a parser that had to deal with comments and whitespace as syntactic units
would be considerably more complex than one that can assume comments and
whitespace have already been removed by the lexical analyzer.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply
specialized techniques that serve only the lexical task, not the job of parsing. In addition,
specialized buffering techniques for reading input characters can speed up the compiler
significantly.
3. Compiler portability is enhanced: Input-device-specific peculiarities can be
restricted to the lexical analyzer.
INPUT BUFFERING:
We now examine some ways in which the simple but important task of reading the source program can be sped up.
This task is made difficult by the fact that we often have to look one or more characters beyond
the next lexeme before we can be sure we have the right lexeme. There are many situations
where we need to look at least one additional character ahead. For instance, we cannot be sure
we've seen the end of an identifier until we see a character that is not a letter or digit, and
therefore is not part of the lexeme for id. In C, single-character operators like -, =, or <
could also be the beginning of a two-character operator like ->, ==, or <=. Thus, we shall
introduce a two-buffer scheme that handles large lookaheads safely. We then consider an
improvement involving "sentinels" that saves time checking for the ends of buffers.
Buffer Pairs
Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096
bytes. Using one system read command we can read N characters into a buffer, rather than
using one system call per character. If fewer than N characters remain in the input file, then a
special character, represented by eof, marks the end of the source file and is different from any
possible character of the source program.
Two pointers to the input are maintained:
1. The pointer lexemeBegin marks the beginning of the current lexeme, whose extent
we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found; the exact strategy
whereby this determination is made will be covered in the balance of this chapter.
→Once the next lexeme is determined, forward is set to the character at its right end. Then,
after the lexeme is recorded as an attribute value of a token returned to the parser, lexemeBegin
is set to the character immediately after the lexeme just found. In the figure, we see forward has passed
the end of the next lexeme, ** (the FORTRAN exponentiation operator), and must be retracted
one position to its left.
Advancing forward requires that we first test whether we have reached the end of one
of the buffers, and if so, we must reload the other buffer from the input, and move forward to
the beginning of the newly loaded buffer. As long as we never need to look so far ahead of the
actual lexeme that the sum of the lexeme's length plus the distance we look ahead is greater
than N, we shall never overwrite the lexeme in its buffer before determining it.
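The buffer-switching logic for advancing forward can be sketched as follows. This is a minimal illustration assuming two N-character buffers, each followed by a sentinel slot, and assuming the eof character never occurs inside the source itself; the names fillBuffer and nextChar are inventions of this sketch, and lexemeBegin handling is omitted.

#include <stdio.h>

#define N 4096              /* buffer size: typically one disk block */
#define EOF_CHAR '\0'       /* stands in for the special eof character */

static char buf[2][N + 1];  /* two buffers, each with one sentinel slot */
static char *forward;       /* the scanning pointer */
static int  cur;            /* which buffer 'forward' is currently in */

/* Read up to N characters with one system call; plant the sentinel. */
static void fillBuffer(int b) {
    size_t n = fread(buf[b], 1, N, stdin);
    buf[b][n] = EOF_CHAR;   /* marks end of buffer or end of input */
}

static void initBuffers(void) {
    cur = 0;
    fillBuffer(0);
    forward = buf[0];
}

/* Advance 'forward'; reload the other buffer when we run off the end. */
static int nextChar(void) {
    char c = *forward++;
    if (c == EOF_CHAR) {
        if (forward == &buf[cur][N + 1]) {  /* sentinel at buffer end */
            cur = 1 - cur;                  /* switch to the other buffer */
            fillBuffer(cur);
            forward = buf[cur];
            c = *forward++;
        }
        if (c == EOF_CHAR)                  /* sentinel mid-buffer: real eof */
            return EOF;
    }
    return c;
}

int main(void) {
    initBuffers();
    int c;
    while ((c = nextChar()) != EOF)
        putchar(c);         /* echo the input to demonstrate the scheme */
    return 0;
}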
SYNTAX ANALYSIS (PARSING):
→The parser builds a parse tree and passes it to the rest of the compiler for further processing.
→During the process of parsing it may encounter some errors and presents the error information back
to the user.
Syntactic errors include misplaced semicolons or extra or missing braces; that is,
"{" or "}". As another example, in C or Java, the appearance of a case statement without
an enclosing switch is a syntactic error (however, this situation is usually allowed by the
parser and caught later in the processing, as the compiler attempts to generate code).
→Based on the way/order in which the parse tree is constructed, parsing is classified into the
following two types:
1. Top-Down Parsing: parse tree construction starts at the root node and moves towards the
children nodes (i.e., top-down order).
2. Bottom-Up Parsing: parse tree construction begins from the leaf nodes and proceeds
towards the root node (i.e., bottom-up order).
→In addition to these translators, programs like interpreters, text formatters, etc., may be used in
a language processing system. To translate a program in a high-level language to an
executable one, the compiler performs by default the compile and linking functions.
→Normally the steps in a language processing system include preprocessing the skeletal source
program, which produces an extended or expanded source program (a ready-to-compile unit of
the source program), followed by compiling the resultant, then linking/loading, and finally
producing its equivalent executable code.
→As said earlier, not all these steps are mandatory. In some cases, the compiler only performs
the linking and loading functions implicitly.
UNIT-II
Context Free Grammar (CFG):
→A CFG is used to describe or denote the syntax of programming language constructs. A
CFG is denoted as G and is defined using a four-tuple notation.
→Let G be CFG, then G is written as, G= (V, T, P, S)
Where,
V is a finite set of Non-terminals; non-terminals are syntactic variables that denote sets of
strings. The sets of strings denoted by non terminals help define the language generated
by the grammar. Non terminals impose a hierarchical structure on the language that
is key to syntax analysis and translation.
T is a finite set of Terminals; terminals are the basic symbols from which strings are
formed. The term "token name" is a synonym for "terminal", and frequently we will use
the word "token" for terminal when it is clear that we are talking about just the token
name. We assume that the terminals are the first components of the tokens output by the
lexical analyzer.
S is the Starting Symbol of the grammar; one non-terminal is distinguished as the start
symbol, and the set of strings it denotes is the language generated by the grammar.
P is a finite set of Productions; the productions of a grammar specify the manner in which the
terminals and non-terminals can be combined to form strings. Each production is of the form
α -> β, where α is a single non-terminal and β ∈ (V ∪ T)*. Each production consists of:
(a) A non terminal called the head or left side of the production; this
production defines some of the strings denoted by the head.
(b) The symbol ->. Sometimes ::= has been used in place of the arrow.
(c) A body or right side consisting of zero or more terminals and non-terminals.
The components of the body describe one way in which strings of the non-terminal
at the head can be constructed.
Conventionally, the productions for the start symbol are listed first.
Example: a Context-Free Grammar to accept arithmetic expressions.
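For instance, the following grammar (also used for the shift-reduce example later in these notes) generates arithmetic expressions over id using +, * and parentheses:
E → E + T | T
T → T * F | F
F → (E) | id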
BACK TRACKING:
This parsing method uses the technique called the Brute Force method
during the parse tree construction process. It allows the process to go back (backtrack)
and redo the steps by undoing the work done so far at that point of processing.
→Brute force method: it is a top-down parsing technique, used when there is more
than one alternative in the productions to be tried while parsing the input string. It selects
the alternatives in the order they appear, and when it realizes that something has gone wrong
it tries the next alternative.
→For example, consider the grammar below.
S → cAd
A → ab | a
To generate the input string "cad", initially the first parse tree given below is generated.
As the string generated is not "cad", the input pointer is backtracked to position "A", to
examine the next alternative of "A". Now a match to the input string occurs.
1. Top-Down Predictive Parsing
Eg. Consider the grammar:
S -> A B
A -> a
B -> b
1. Construct the predictive parsing table (First(A) = {a}, First(B) = {b}):

            a            b
S           S -> A B
A           A -> a
B                        B -> b

2. Initialize the stack and input buffer:
Stack: $ S
Input Buffer: ab $
3. Apply the top-down predictive parsing algorithm:
Stack        Input Buffer    Action
$S           ab$             expand S -> A B
$BA          ab$             expand A -> a
$Ba          ab$             match a
$B           b$              expand B -> b
$b           b$              match b
$            $               Accept
2. Bottom up Parsing
In bottom-up parsing, the parser begins with the input string and attempts to reduce it to
the start symbol of the grammar by applying production rules in reverse, replacing a matched
right-hand side by its left-hand side. It builds the parse tree from the bottom up, hence the
name "bottom-up."
Eg. Consider the grammar:
S -> A B
A -> a
B -> b
Let's parse the input string "ab" using bottom-up parsing.
1. Initialize the stack and input buffer.
Stack: $
Input Buffer: ab $
2. Apply shift and reduce actions:
Stack        Input        Action
$            ab$          shift
$a           b$           reduce A -> a
$A           b$           shift
$Ab          $            reduce B -> b
$AB          $            reduce S -> A B
$S           $            Accept
For example, if the parser is in state S (row) and the next input symbol is "a" (column), the
corresponding cell entry "S->aAB" instructs the parser to apply the production rule "S ->
aAB".
The parsing table is a critical component of predictive parsing as it systematically guides
the parser's decisions and actions. By consulting the table, the parser can determine the
appropriate production or action based on the current state and the lookahead input
symbol, enabling it to construct the parse tree or detect syntax errors during the parsing
process.
S -> A B
A -> a
B -> b | ε
4. Table entries based on the algorithm:
•From the production S -> A B:
For each terminal 'a' in First(A), add S -> A B to the entry [S, 'a'].
•From the production A -> a:
Add A -> a to the entry [A, 'a'].
•From the production B -> b:
Add B -> b to the entry [B, 'b'].
•From the production B -> ε:
For each symbol in Follow(B), add B -> ε to the corresponding entry; here Follow(B) = { $ },
so add B -> ε to the entry [B, '$']. (A table-driven parser using these entries is sketched below.)
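To make the table-driven mechanics concrete, here is a small C sketch of a predictive parser for this same grammar (S -> AB, A -> a, B -> b | ε). The encoding of symbols as single characters and the table() helper are inventions of this sketch, not part of the notes:

#include <stdio.h>
#include <string.h>

/* Grammar: S -> A B,  A -> a,  B -> b | epsilon.
   Nonterminals S, A, B; terminals a, b, $.
   table(X, t) returns the right-hand side for entry [X, t]. */
static const char *table(char X, char t) {
    if (X == 'S' && t == 'a') return "AB";  /* S -> A B   */
    if (X == 'A' && t == 'a') return "a";   /* A -> a     */
    if (X == 'B' && t == 'b') return "b";   /* B -> b     */
    if (X == 'B' && t == '$') return "";    /* B -> epsilon */
    return NULL;                            /* error entry */
}

int parse(const char *input) {              /* input must end with '$' */
    char stack[64];
    int top = 0;
    stack[top++] = '$';
    stack[top++] = 'S';
    const char *ip = input;
    while (top > 0) {
        char X = stack[--top];
        if (X == '$' || X == 'a' || X == 'b') {   /* terminal: match it */
            if (X != *ip) return 0;
            ip++;
        } else {                                   /* nonterminal: expand */
            const char *rhs = table(X, *ip);
            if (!rhs) return 0;
            for (int i = (int)strlen(rhs) - 1; i >= 0; i--)
                stack[top++] = rhs[i];             /* push RHS reversed */
        }
    }
    return *(ip - 1) == '$';                       /* consumed through $ */
}

int main(void) {
    printf("ab$ : %s\n", parse("ab$") ? "Accept" : "Reject");
    printf("a$  : %s\n", parse("a$")  ? "Accept" : "Reject");
    return 0;
}

Pushing the right-hand side in reverse keeps its leftmost symbol on top of the stack, which is what makes each expansion mirror a step of a leftmost derivation.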
Q. What is a parser?
A parser is a component of a compiler or interpreter that divides data into smaller chunks
for easier translation into another language. A parser breaks down input in the form of a
sequence of tokens, interactive commands, or computer instructions into portions that can
be used by other programming components.
Q. What is a recursive descent parser?
It can be defined as a Parser that processes the input string using numerous recursive
procedures with no backtracking. Recursive Descent Parser employs Top-Down Parsing
without the need for retracing.
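As an illustration of the idea (not an example from the notes), here is a tiny recursive-descent parser in C for the invented grammar E -> T { '+' T }, T -> 'i', where 'i' stands for an id token. Each nonterminal gets one procedure, and no backtracking is needed because every choice is decided by the next input character alone:

#include <stdio.h>

static const char *ip;          /* input pointer */

static int T(void) {            /* T -> i */
    if (*ip == 'i') { ip++; return 1; }
    return 0;
}

static int E(void) {            /* E -> T ('+' T)* */
    if (!T()) return 0;
    while (*ip == '+') {
        ip++;
        if (!T()) return 0;
    }
    return 1;
}

int main(void) {
    ip = "i+i+i";
    if (E() && *ip == '\0') printf("Accepted\n");
    else                    printf("Rejected\n");
    return 0;
}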
Q. What is the main purpose of parser?
Parsers are employed to abstractly represent input data, such as source code, as a
structured format. This facilitates syntax validation by verifying adherence to language
rules. Parsing is utilized in various coding languages and technologies for this purpose.
Q. What is the difference between predictive and recursive
parsing?
Predictive parsing uses lookahead tokens to make parsing decisions without
backtracking, suitable for LL grammars. Recursive parsing involves functions
calling themselves to process input, adaptable but potentially less efficient due to
backtracking in some cases.
SHIFT-REDUCE PARSING:
→Shift-reduce parsing is a form of bottom-up parsing in which a stack holds grammar
symbols and an input buffer holds the rest of the string to be parsed. We use $ to mark the
bottom of the stack and also the right end of the input. The parser makes use of shift and
reduce actions to accept the input string.
→Here, the parse tree is constructed bottom-up, from the leaf nodes towards the root node.
While parsing the given input string, if a match occurs the parser takes a reduce action,
otherwise it takes a shift action. Shift-reduce parsing can accept ambiguous grammars
also.
→For example, consider the grammar below and the input string "id * id", using an S-R
parser:
E→ E+T|T
T→ T*F | F
F →(E)|id
→Actions of the Shift-reduce parser using Stack implementation
STACK      INPUT      ACTION
$          id*id$     Shift
$id        *id$       Reduce with F → id
$F         *id$       Reduce with T → F
$T         *id$       Shift
$T*        id$        Shift
$T*id      $          Reduce with F → id
$T*F       $          Reduce with T → T*F
$T         $          Reduce with E → T
$E         $          Accept
Ambiguous Grammar
Introduction
Before heading towards ambiguous grammar, let's see about Context-Free Grammar. A
context-free grammar is a formal grammar used to generate all the possible patterns of
strings.
A context-free grammar is classified based on:
•Number of Derivation trees
•Number of Strings
→Based on the number of derivation trees, grammars are further classified into:
•Ambiguous grammar
•Unambiguous grammar.
Ambiguity in Grammar
A grammar or a Context-Free Grammar(CFG) is said to be ambiguous if there exists more
than one leftmost derivation(LMDT) or more than one rightmost derivation(RMDT), or
more than one parse tree for a given input string.
→Technically, we can say that a context-free grammar (CFG) represented by G = (N, T, P,
S) is ambiguous if there exists at least one string in L(G) for which more than one parse
tree can be produced. Otherwise, the grammar is unambiguous.
→One thing that should be transparent is that ambiguity is a property of grammar and not
a language.
→Since Ambiguous Grammar can produce two Parse trees for the same expression, it's
often confusing for a compiler to find out which one among all available Parse Trees is the
correct one according to the context of the work. This is the reason ambiguity is not
suitable for compiler construction.
Example 1
Let the production rules be given as:
S -> AB|aaB
A -> a|Aa
B -> b
Let us generate the string aab from the given grammar. The parse trees for generating the
string aab are as follows:
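The parse-tree figure is not reproduced here; the two leftmost derivations that give rise to the two trees are:
Derivation 1: S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab
Derivation 2: S ⇒ aaB ⇒ aab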
Here, for the same string, we are getting more than one parse tree. Hence, the grammar is
ambiguous.
Example 2
Let the production rules be given as:
E -> EE+
E -> E(E)
E -> id
Parse tree for id(id)id + is:
Only one parse tree is possible for id(id)id+, so the given grammar is unambiguous.
Example 3
Check whether the given productions are ambiguous or not:
S → aSb | SS
S→ε
For the string "aabb" the above grammar can generate two parse trees.
Since there are two parse trees for a single string, "aabb", the grammar G is ambiguous.
YACC
YACC stands for Yet Another Compiler Compiler.
It is used to produce the source code of the syntactic analyzer of a language
described by an LALR(1) grammar.
The input of YACC is the rule or grammar and the output is a C program.
The typical workflow: a YACC specification file is run through YACC to produce a C
file (conventionally y.tab.c) containing the parser, which is then compiled by a C
compiler into the executable parser.
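As an illustrative sketch (not taken from the notes), a minimal YACC specification for evaluating single-digit additions might look like this; the hand-written yylex() in the third section stands in for a LEX-generated scanner:

%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token DIGIT
%%
line : expr '\n'     { printf("%d\n", $1); }
     ;
expr : expr '+' term { $$ = $1 + $3; }
     | term
     ;
term : DIGIT
     ;
%%
int yylex(void) {
    int c = getchar();
    if (c == EOF) return 0;                 /* 0 signals end of input */
    if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
    return c;                               /* '+', '\n' pass through */
}
int main(void) { return yyparse(); }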
To parse this JSON data, we can define a grammar using a parser generator tool, such as
ANTLR (ANother Tool for Language Recognition). The grammar might look something like
this:
jsonObject : '{' (pair (',' pair)*)? '}' ;
pair       : STRING ':' value ;
value      : STRING | NUMBER | jsonObject ;
STRING     : '"' ~["]* '"' ;
NUMBER     : '-'? [0-9]+ ('.' [0-9]+)? ;
Using the grammar specification, the automatic parser generator can generate code in a
target programming language, such as Java or C++. This generated code will include the
necessary logic to parse JSON data according to the defined grammar rules.
With the generated parser code, we can now parse the JSON data and extract specific
elements. For example, we can extract the name, age, and email from the JSON object and
use them in our application logic.
Advantages of Using an Automatic Parser Generator
Saves Development Time and Effort
Developing a parser from scratch can be a complex and time-consuming task. Automatic
parser generators automate much of the process, generating the necessary code based on a
given grammar specification. This saves developers significant time and effort, allowing them
to focus on other aspects of their project.
Ensures Correctness and Consistency
Automatic parser generators generate code that is based on the specified grammar rules. This
ensures that the generated parser will correctly interpret input data according to the defined
syntax and structure. It helps in avoiding manual errors and inconsistencies that can arise
when implementing a parser by hand.
Flexibility and Maintainability
Automatic parser generators provide flexibility in terms of the target programming language.
Developers can choose the programming language in which they want the generated parser
code to be written. This allows them to integrate the parser into their existing codebase
seamlessly. Additionally, if the grammar specification needs to be modified or updated, the
parser generator can regenerate the code, making it easier to maintain and adapt to changing
requirements.
INTERMEDIATE CODE GENERATION
In intermediate code generation we use syntax-directed methods to translate the source
program into an intermediate form, covering programming language constructs such as
declarations, assignments and flow-of-control statements.
In the next few slides we will see how abstract syntax trees can be constructed from syntax-
directed definitions. Abstract syntax trees are a condensed form of parse trees. Normally
operators and keywords appear as leaves, but in an abstract syntax tree they are associated
with the interior nodes that would be the parent of those leaves in the parse tree. This is
clearly indicated by the examples in these slides. Chains of single productions may be
collapsed into one node, with the operators moving up to become the node.
Advantages
Here are some advantages of polish notation in compiler design:
1. No need for parentheses: in polish notation, there is no need for parentheses
when writing arithmetic expressions, as the operators come before the operands.
2. Efficient evaluation: the evaluation of an expression is easier in polish notation
because a stack can be used for the evaluation (see the sketch after this list).
3. Easy parsing: in polish notation, parsing can be done easily compared to
infix notation.
4. Less scanning: the compiler needs fewer scans as parentheses are not used
in polish notation, and the compiler does not need to scan operators and
operands differently.
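As a small illustration of point 2, here is a stack-based evaluator in C for an expression in reverse-polish (postfix) form, where the operators follow their operands; the single-digit input format is an assumption of this sketch:

#include <stdio.h>
#include <ctype.h>

/* Evaluate a postfix expression of single-digit numbers, e.g. "23*4+".
   Operands are pushed; each operator pops two values, pushes the result. */
int evalPostfix(const char *s) {
    int stack[64], top = 0;
    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            stack[top++] = *s - '0';
        } else {
            int b = stack[--top];    /* right operand */
            int a = stack[--top];    /* left operand  */
            switch (*s) {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            case '/': stack[top++] = a / b; break;
            }
        }
    }
    return stack[0];                 /* the single value left is the result */
}

int main(void) {
    printf("%d\n", evalPostfix("23*4+"));   /* (2*3)+4 = 10 */
    return 0;
}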
Disadvantages
Here are some disadvantages of polish notation in compiler design:
1. Vague: someone who sees polish notation for the first time, without knowing about
it, will find it very hard to understand how to evaluate the expression.
2. Not used commonly: polish notation is not commonly used in day-to-day life;
it is mostly used for scientific purposes.
3. Difficult for programmers: it is difficult for programmers who are not familiar
with polish notation to read the expression.
→In the semantic rule, the attribute is VAL, and an attribute may hold anything: a
string, a number, a memory location or a complex record.
Example
Production      Semantic Rule
E → E1 + T      E.val := E1.val + T.val
Symbol Table
→The symbol table is an important data structure used in a compiler.
→The symbol table is used to store information about the occurrence of various entities
such as objects, classes, variable names, interfaces, function names, etc. It is used by both
the analysis and synthesis phases.
It is used to store the names of all entities in a structured form in one place.
Hash table
Operations
The symbol table provides the following operations:
Insert()
The insert() operation is used more frequently in the analysis phase, when the tokens
are identified and names are stored in the table.
The insert() operation is used to insert information into the symbol table, such as a
unique name occurring in the source code. For example, the declaration
int x;
should be processed by the compiler as:
insert(x, int)
lookup()
In the symbol table, the lookup() operation is used to search for a name. It is used to
determine whether the name is present in the table. Its basic format is:
lookup(symbol)
This format varies according to the programming language.
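A minimal hash-table symbol table offering these two operations might look as follows in C; the bucket count, the Symbol fields and the hash function are assumptions of this sketch:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211           /* a small prime for the bucket count */

typedef struct Symbol {
    char *name;                  /* e.g. "x"   */
    char *type;                  /* e.g. "int" */
    struct Symbol *next;         /* chaining for collisions */
} Symbol;

static Symbol *table[TABLE_SIZE];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* insert(name, type): add a new entry, as when 'int x;' is seen. */
void insert(const char *name, const char *type) {
    unsigned h = hash(name);
    Symbol *sym = malloc(sizeof(Symbol));
    sym->name = strdup(name);
    sym->type = strdup(type);
    sym->next = table[h];        /* push onto the bucket's chain */
    table[h] = sym;
}

/* lookup(name): search for a name; NULL means "not declared". */
Symbol *lookup(const char *name) {
    for (Symbol *s = table[hash(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

int main(void) {
    insert("x", "int");          /* int x;  ->  insert(x, int) */
    Symbol *s = lookup("x");
    printf("%s : %s\n", s->name, s->type);
    return 0;
}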
ORGANIZATION FOR BLOCK STRUCTURES:
A block is any sequence of operations or instructions that is used to perform a [sub]task. In
any programming language:
Blocks contain their own local data structures.
Blocks can be nested, and their starts and ends are marked by delimiters.
Two blocks are either independent of each other or nested one within the other. That is,
it is not possible for two blocks B1 and B2 to overlap in such a way that first block B1
begins, then B2, but B1 ends before B2.
This nesting property is called block structure. The scope of a declaration in a block-
structured language is given by the most closely nested rule:
1. The scope of a declaration in a block B includes B.
2. If a name X is not declared in a block B, then an occurrence of X in B is in the scope
of a declaration of X in an enclosing block B' such that B' has a declaration of X, and B'
is more closely nested around B than any other block with a declaration of X.
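The most closely nested rule can be seen in a small C fragment (an illustration, not from the notes):

#include <stdio.h>

int main(void) {
    int x = 1;                 /* declaration in the outer block B' */
    {
        /* inner block B: x is not declared here, so this use of x
           is in the scope of the declaration in the enclosing block */
        printf("%d\n", x);     /* prints 1 */
        {
            int x = 2;         /* a more closely nested declaration... */
            printf("%d\n", x); /* ...shadows the outer one: prints 2 */
        }
        printf("%d\n", x);     /* the inner declaration's scope ended: 1 */
    }
    return 0;
}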
Storage Allocation
The different ways to allocate memory are static allocation, stack allocation and heap
allocation.
Static allocation:
If memory is allocated at compile time, then the memory is created in the static
area and only once.
Static allocation does not support dynamic data structures: memory is created
only at compile time and deallocated after program completion.
The drawback of static storage allocation is that the size and position of data
objects must be known at compile time.
Stack allocation:
An activation record is pushed onto the stack when an activation begins, and it is
popped when the activation ends.
The activation record contains the locals, so that they are bound to fresh storage in
each activation. The values of locals are deleted when the activation ends.
It works on a last-in-first-out (LIFO) basis, and this allocation supports the
recursion process.
Heap allocation:
Allocation and deallocation of memory can be done at any time and at any place,
depending upon the user's requirement.
Heap allocation is used to allocate memory to variables dynamically and to claim
it back when the variables are no longer used.
Example:
int fact(int n)
{
    if (n <= 1)
        return 1;
    else
        return n * fact(n - 1);
}
fact(6);
The dynamic allocation is as follows:
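The activation-record figure is not reproduced in these notes. As a text sketch: calling fact(6) pushes an activation record for fact(6), which calls fact(5), pushing another record, and so on down to fact(1):
fact(6) -> fact(5) -> fact(4) -> fact(3) -> fact(2) -> fact(1)   (stack at its deepest)
fact(1) returns 1 and its record is popped; each remaining activation computes its product and is popped in turn, until fact(6) returns 720 and the stack is empty again.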
UNIT-IV
Considerations for optimization :
→The code produced by straightforward compiling
algorithms can often be made to run faster or take less space, or both. This
improvement is achieved by program transformations that are traditionally
called optimizations. Machine-independent optimizations are program
transformations that improve the target code without taking into consideration
any properties of the target machine.
→Machine-dependent optimizations are based on register allocation and
utilization of special machine-instruction sequences.
Criteria for code improvement transformations
- Simply stated, the best program transformations are those that yield the most
benefit for the least effort.
- First, the transformation must preserve the meaning of programs. That is, the
optimization must not change the output produced by a program for a given
input, or cause an error.
- Second, a transformation must, on the average, speed up programs by a
measurable amount.
- Third, the transformation must be worth the effort.
Some transformations can only be applied after detailed, often time-consuming
analysis of the source program, so there is little point in applying them to
programs that will be run only a few times.
OBJECTIVES OF OPTIMIZATION:
The main objectives of the optimization techniques are
as follows
1. Exploit the fast path in case of multiple paths for a given situation.
2. Reduce redundant instructions.
3. Produce minimum code for maximum work.
4. Trade off between the size of the code and the speed with which it gets
executed.
5. Place code and data together whenever required, to avoid unnecessary
searching of data/code.
During code transformation in the process of optimization, the basic
requirements are as follows:
1. Retain the semantics of the source code.
2. Reduce time and/ or space.
3. Reduce the overhead involved in the optimization process.
Scope of Optimization:
Control-Flow Analysis
Consider all that has happened up to this point in the compiling process: lexical
analysis, syntactic analysis, semantic analysis and finally intermediate-code
generation.
→The compiler has done an enormous amount of analysis, but it still doesn't
really know how the program does what it does. In control-flow analysis, the
compiler figures out even more information about how the program does its
work, only now it can assume that there are no
syntactic or semantic errors in the code.
→Control-flow analysis begins by constructing a control-flow graph, which is a
graph of the different possible paths program flow could take through a
function.
→To build the graph, we first divide the code into basic blocks. A basic block is
a segment of the code that a program must enter at the beginning and exit only
at the end.
→This means that only the first statement can be reached from outside the
block (there are no branches into the middle of the block) and all statements are
executed consecutively after the first one is (no branches or halts until the exit).
→Thus a basic block has exactly one entry point and one exit point. If a
program executes the first instruction in a basic block, it must execute every
instruction in the block sequentially after it.
→A basic block begins in one of several ways:
• The entry point into the function
• The target of a branch (in our example, any label)
• The instruction immediately following a branch or a return
A basic block ends in any of the following ways (see the example after this list):
• A jump statement
• A conditional or unconditional branch
• A return statement
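For instance, in the following three-address fragment (an invented example), the block boundaries fall as marked:

    i = 0              <- entry point: leader of block B1
L1: t1 = i * 4         <- branch target: leader of block B2
    t2 = a[t1]
    if t2 > 0 goto L2  <- conditional branch: ends B2
    i = i + 1          <- follows a branch: leader of block B3
    goto L1            <- jump: ends B3
L2: return t2          <- branch target: leader of block B4; the return ends it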
Now we can construct the control-flow graph between the blocks. Each basic
block is a node in the graph, and the possible different routes a program might
take are the connections; i.e., if a block ends with a branch, there will be a path
leading from that block to the branch target.
→The blocks that can follow a block are called its successors. There may be
multiple successors or just one. Similarly the block may have many, one, or no
predecessors.
Exercise: connect up the flow graph for the Fibonacci basic blocks given above.
What does an if-then-else look like in a flow graph?
LOCAL OPTIMIZATIONS
Optimizations performed exclusively within a basic block are called "local
optimizations". These are typically the easiest to perform since we do not
consider any control flow information; we just work with the statements within
the block.
Many of the local optimizations we will discuss have corresponding global
optimizations that operate on the same principle, but require additional analysis
to perform. We'll consider some of the more common local optimizations as
examples.
FUNCTION PRESERVING TRANSFORMATIONS
Common sub expression elimination
Constant folding
Variable propagation
Dead Code Elimination
Code motion
Strength Reduction
1. Common Sub Expression Elimination:
Two operations are common if they produce the same result. In such a case, it is
likely more efficient to compute the result once and reference it the second time
rather than re-evaluate it. An expression is alive if the operands used to compute
the expression have not been changed. An expression that is no longer alive is
dead.
Example :
a=b*c;
d=b*c+x-y;
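Here b*c is computed twice. Completing the example, after elimination the code becomes:
t1 = b * c;
a = t1;
d = t1 + x - y;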
2. Variable Propagation:
Let us consider the following code:
c=a*b;
x=a;
d=x*b+4;
If we replace x by a in the last statement, we can identify a*b and x*b as
common sub-expressions.
→This technique is called variable propagation, where the use of one variable is
replaced by another variable if it has been assigned the value of the same.
Compile-Time Evaluation:
a= 2*(22.0/7.0)*r;
Here, we can perform the computation 2*(22.0/7.0) at compile time itself.
3. Dead Code Elimination:
→If the value contained in the variable at a point is not used anywhere in the
program subsequently, the variable is said to be dead at that place. If an
assignment is made to a dead variable, then that assignment is a dead
assignment and it can be safely removed from the program.
→Similarly, a piece of code is said to be dead, which computes value that are
never used anywhere
in the program.
c=a*b;
x=a;
d=x*b+4;
Using variable propagation, the code can be written as follows:
c=a*b;
x=a;
d=a*b+4;
Using Common Sub expression elimination, the code can be written as follows:
t1= a*b;
c=t1;
x=a;
d=t1+4;
Here, x=a will be considered dead code; hence it is eliminated:
t1= a*b;
c=t1;
d=t1+4;
We can evaluate an expression whose operands are constants at compile time and
replace that expression by a single value. This is called folding. Consider the
following statement:
a= 2*(22.0/7.0)*r;
Here, we can perform the computation 2*(22.0/7.0) at compile time itself.
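Since 2*(22.0/7.0) evaluates to 44.0/7.0 ≈ 6.285714, the folded statement becomes:
a = 6.285714 * r;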
4. Code Motion:
The motivation for performing code motion in a program is to improve the
execution time of the program by reducing the evaluation frequency of
expressions.
This can be done by moving the evaluation of an expression to other parts of the
program. Let us consider the code below:
If(a<10)
{
b=x^2-y^2;
}
else
{
b=5;
a=( x^2-y^2)*10;
}
Whatever the outcome of the condition a<10, x^2-y^2 is evaluated, and it is
written twice in the program. So we can optimize the code by moving this
computation outside the blocks as follows:
t= x^2-y^2;
If(a<10)
{
b=t;
}
else
{
b=5;
a=t*10;
}
5. Strength Reduction:
In the frequency reduction transformation we tried to reduce the execution
frequency of expressions by moving code. There is another class of
transformations which perform equivalent actions indicated in the source
program by reducing the strength of operators.
→By strength reduction, we mean replacing a high-strength operator by a
low-strength operator without affecting the meaning of the program.
Flow Graph
Flow graph is a directed graph. It contains the flow of control information for the set of
basic block.
A control flow graph is used to depict that how the program control is being parsed
among the blocks. It is useful in the loop optimization.
→Block B1 is the initial node. Block B2 immediately follows B1, so there is an
edge from B1 to B2.
→The target of the jump from the last statement of B1 is the first statement of B2,
so there is an edge from B1 to B2.
If we decrease the number of instructions in an inner loop, then the running time of a
program may be improved even if we increase the amount of code outside that loop.
Three techniques are important for loop optimization:
1. Code motion
2. Induction-variable elimination
3. Strength reduction
1.Code Motion:
Code motion is used to decrease the amount of code in loop. This transformation takes a
statement or expression which can be moved outside the loop body without affecting the
semantics of the program.
For example, in a while statement whose condition is of the form
while (i <= limit - 2)
(where limit is not changed by the loop), the expression limit - 2 is loop-invariant;
code motion results in:
t = limit - 2;
while (i <= t)
2. Induction-Variable Elimination:
It can reduce the number of additions in a loop. It improves both code space and run-time
performance.
In the figure (not reproduced here), we can replace the assignment t4:=4*j by t4:=t4-4. The
only problem that arises is that t4 does not have a value when we enter block B2 for the
first time, so we place the assignment t4:=4*j on entry to the block B2.
3.Reduction in Strength
Strength reduction is used to replace an expensive operation by a cheaper one
on the target machine.
Example:
while (i < 10)
{
    j = 3 * i + 1;
    a[j] = a[j] - 2;
    i = i + 2;
}
After strength reduction the code will be:
s = 3 * i + 1;
while (i < 10)
{
    j = s;
    a[j] = a[j] - 2;
    i = i + 2;
    s = s + 6;
}
In the above code, it is cheaper to compute s = s + 6 than j = 3 * i + 1.
Frequency Reduction
Frequency reduction is a machine-independent loop optimization. In frequency
reduction, code inside a loop is optimized to improve the running time of the
program. Frequency reduction is used to decrease the amount of code in a loop:
a statement or expression which can be moved outside the loop body without
affecting the semantics of the program is moved outside the loop. Frequency
reduction is also called code motion.
Objective of Frequency Reduction:
The objective of frequency reduction is:
•To reduce the evaluation frequency of expression.
•To bring loop invariant statements out of the loop.
Below is the example of Frequency Reduction:
Program 1
// This program does not use frequency reduction.
#include <bits/stdc++.h>
int main()
{
    int a = 2, b = 3, c, i = 0;
    while (i < 5) {
        // c is calculated 5 times
        c = pow(a, b) + pow(b, a);
        printf("%d\n", c);
        i++;
    }
    return 0;
}
Output:
17
17
17
17
17
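Program 2, the frequency-reduced version referenced in the explanation below, is missing from these notes; a sketch consistent with that explanation is:

Program 2
// This program uses frequency reduction.
#include <bits/stdc++.h>
int main()
{
    int a = 2, b = 3, c, i = 0;
    // c is calculated only once, outside the loop
    c = pow(a, b) + pow(b, a);
    while (i < 5) {
        printf("%d\n", c);
        i++;
    }
    return 0;
}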
Explanation:
Program 2 is more efficient than Program 1: in Program 1 the value of c is
calculated each time the while loop executes, whereas in Program 2 the value of c
is calculated outside the loop only once, which also reduces the amount of code in
the loop.
1. The leaves of the graph are labeled by unique identifiers, which can be
variable names or constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also given a sequence of identifiers for labels, to store the computed
values; each node contains a list of attached identifiers that hold the computed values.
For a statement x = y OP z (case (i)) or x = OP y (case (ii)), the construction proceeds as follows:
Step 1: If node(y) is undefined, create a leaf labeled y; likewise for node(z) in case (i).
Step 2: For case (i), find or create a node(OP) whose right child is node(z) and left child is node(y).
For case (ii), check whether there is a node(OP) with one child node(y).
Output: For node(x), delete x from the list of identifiers. Append x to the attached-identifiers list
for the node n found in Step 2. Finally, set node(x) to n.
•Greater control: machine-dependent code gives you more control over how the
code will be executed. You can make use of specific hardware features or take
advantage of system-level APIs that may not be available to more portable code.
•Reduced portability: One of the main drawbacks of machine-dependent code is
that it is not portable. It can only be run on the specific machine or environment it
was written for, which can be a limitation if you need to run the code on multiple
platforms.
•Higher maintenance costs: machine-dependent code can be more difficult to
maintain and update, as it may require specific knowledge of the hardware and
software environment it was written for. This can lead to higher maintenance costs.
→Each register has a register descriptor containing the list of variables currently stored in
this register. At the start of the basic block, all register descriptors are empty. A register
descriptor keeps track of the recent/current variable in each register and is updated
whenever a new register is needed.
→Each variable has an address descriptor containing the list of locations where this
variable is currently stored. The possibilities are its memory location and one or more
registers. The memory location might be in the static area, the stack, or possibly the heap.
The register descriptors can be computed from the address descriptors. For each name in
the block, an address descriptor is maintained that keeps track of the location where the
current value of the name can be found at run time.
There are basically three aspects to be considered in code generation:
Choosing registers
Generating instructions
Managing descriptors
Minimize the number of registers used:
→When a register holds the value of a program variable and all subsequent
uses of this value are preceded by a redefinition, we can reuse this register.
But to know about all subsequent uses, one may require live/dead-on-exit
knowledge.
Assume a, b, c and d are program variables and t, u, v are compiler-generated
temporaries. The code generated for the different three-address statements is
given below:
t = a - b
LD R1, a
LD R2, b
SUB R1, R2
u = a - c
LD R3, a
LD R2, c
SUB R3, R2
v = t + u
ADD R1, R3
a = d
LD R2, d
ST a, R2
d = v + u
ADD R1, R3
ST d, R1
exit
DAG for Register allocation:
DAG (Directed Acyclic Graphs) are useful data structures for implementing
transformations on basic blocks. A DAG gives a picture of how the value
computed by a statement in a basic block is used in subsequent statements of
the block.
→Constructing a DAG
from three-address statements is a good way of determining common sub-
expressions within a block, determining which names are used inside
the block but evaluated outside the block, and determining which statements of
the block could have their computed value used outside the block.
→A DAG for a basic block is a directed acyclic graph with the following labels
on nodes:
1. Leaves are labeled by unique identifiers, either variable names or constants.
From the operator applied to a name we determine whether the l-value or
r-value of the name is needed; most leaves represent r-values.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels. The
intention is that interior nodes represent computed values, and the identifiers
labeling a node are deemed to have that value.
DAG representation Example:
For example, the slide shows a three-address code. The corresponding DAG is
shown. We observe that each node of the DAG represents a formula in terms of
the leaves, that is, the values possessed by variables and constants upon
entering the block
For example, the node labeled t4 represents the formula b[4*i],
that is, the value of the word whose address is 4*i bytes offset from address b,
which is the intended value of t4.