CD Notes Final
Compiler:
A compiler is a translator that takes a high-level language program as input and
produces an equivalent low-level (machine or assembly language) program as output.
A compiler is more intelligent than an assembler: it checks all kinds of limits,
ranges, errors, etc.
However, its translation takes more time and occupies a larger part of memory. It
is slow because a compiler goes through the entire program and then translates
the whole program into machine code.
Relocatable Machine Code: code that can be loaded at any memory address and then
run. The addresses within the program are arranged in such a way that the code
still works after the program is moved (relocated) in memory.
Linker
A linker is a computer program that links and merges various object files together
in order to make an executable file. These files might have been assembled by
separate assemblers. The major task of a linker is to search for and locate referenced
modules/routines in a program and to determine the memory locations where these
codes will be loaded, so that the program instructions have absolute references.
Loader
The loader is a part of the operating system and is responsible for loading executable
files into memory and executing them. It calculates the size of a program
(instructions and data) and creates memory space for it. It initializes various
registers to initiate execution.
Phases of a Compiler:
There are two major phases of compilation, which in turn have many parts.
Each of them takes input from the output of the previous level and works in a
coordinated way.
Analysis Phase: Known as the front end of the compiler, the analysis phase of
the compiler reads the source program, divides it into core parts and then checks
for lexical, grammar and syntax errors. The analysis phase generates an
intermediate representation of the source program and the symbol table, which
are fed to the synthesis phase as input. The intermediate representation is
created from the given source code by:
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
Lexical analyzer divides the program into “tokens”, the Syntax analyzer
recognizes “sentences” in the program using the syntax of the language and
the Semantic analyzer checks the static semantics of each construct.
Intermediate Code Generator generates “abstract” code.
Synthesis Phase:
Known as the back-end of the compiler, the synthesis phase generates the target
program with the help of intermediate source code representation and symbol
table. Equivalent target program is created from the intermediate
representation. It has two parts:
1. Code Optimizer
2. Code Generator
Code Optimizer optimizes the abstract code, and the final Code Generator
translates abstract intermediate code into specific machine instructions.
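As a rough end-to-end illustration (using the classic textbook assignment
position = initial + rate * 60, where rate is assumed to be a float), the phases
might produce roughly the following:
Lexical analysis: the token stream id1 = id2 + id3 * 60
Syntax analysis: a syntax tree with = at the root, id1 as its left child and the expression id2 + (id3 * 60) as its right child
Semantic analysis: the integer constant 60 is converted with inttofloat because rate is a float
Intermediate code generation: t1 = inttofloat(60), t2 = id3 * t1, t3 = id2 + t2, id1 = t3
Code optimization: t1 = id3 * 60.0, id1 = id2 + t1
Code generation: target machine instructions that load id3, multiply by 60.0, add id2 and store the result into id1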
Cross Compiler
A simple compiler generates code only for the machine it runs on. When we need to
produce code for a different platform, a cross compiler is used.
A cross compiler is a compiler capable of creating executable code for a
platform other than the one on which the compiler is running. For example, a
cross compiler executes on machine X and produces machine code for
machine Y.
Source-to-source Compiler
A compiler that takes the source code of one programming language and
translates it into the source code of another programming language is called a
source-to-source compiler
Phases of a Compiler
We basically have two phases of compilers, namely the Analysis phase and
Synthesis phase. The analysis phase creates an intermediate representation
from the given source code. The synthesis phase creates an equivalent target
program from the intermediate representation.
Tokens
A lexeme is a sequence of (alphanumeric) characters in the source program that is
matched as a token. There are predefined rules for every lexeme to be identified as
a valid token. These rules are defined by grammar rules, by means of a pattern. A
pattern explains what can be a token, and these patterns are defined by means of
regular expressions.
In a programming language, keywords, constants, identifiers, strings, numbers,
operators and punctuation symbols can be considered tokens.
For example, in C language, the variable declaration line
int value = 100;
contains the tokens:
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).
Specifications of Tokens
Let us understand how language theory treats the following terms:
Alphabets
Any finite set of symbols is an alphabet: {0,1} is the binary alphabet,
{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal alphabet, and {a-z, A-Z}
is the set of English-language letters.
Strings
Any finite sequence of alphabet symbols (characters) is called a string. The length
of a string is the total number of occurrences of symbols in it, e.g., the length of
the string tutorialspoint is 14 and is denoted by |tutorialspoint| = 14. A string
having no symbols, i.e. a string of zero length, is known as the empty string and is
denoted by ε (epsilon).
Language
A language is a set of strings over some finite set of alphabet symbols.
Computer languages are sets of strings, and mathematical set operations can be
performed on them. Finite languages can be described by means of regular
expressions.
Criteria: Token / Lexeme / Pattern
Keyword — Token: all the reserved keywords of the language (main, printf, etc.). Lexeme: int, goto. Pattern: the sequence of characters that make up the keyword.
Identifier — Token: the name of a variable, function, etc. Lexeme: main, a. Pattern: it must start with a letter, followed by letters or digits.
Punctuation — Token: each kind of punctuation is considered a token, e.g. semicolon, bracket, comma, etc. Lexeme: (, ), {, }. Pattern: (, ), {, }.
Literal — Token: a grammar rule or boolean literal. Lexeme: "Welcome to GeeksforGeeks!". Pattern: any string of characters (except ' ') between " and ".
How Lexical Analyzer works-
1. Input preprocessing: This stage involves cleaning up the input text
and preparing it for lexical analysis. This may include removing
comments, whitespace, and other non-essential characters from the
input text.
2. Tokenization: This is the process of breaking the input text into a
sequence of tokens. This is usually done by matching the characters
in the input text against a set of patterns or regular expressions that
define the different types of tokens.
3. Token classification: In this stage, the lexer determines the type of
each token. For example, in a programming language, the lexer
might classify keywords, identifiers, operators, and punctuation
symbols as separate token types.
4. Token validation: In this stage, the lexer checks that each token is
valid according to the rules of the programming language. For
example, it might check that a variable name is a valid identifier, or
that an operator has the correct syntax.
5. Output generation: In this final stage, the lexer generates the output
of the lexical analysis process, which is typically a list of tokens.
This list of tokens can then be passed to the next stage of compilation
or interpretation.
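As a rough illustration of stages 2 and 3 (tokenization and classification), here is a
minimal sketch in C. The token classes, the keyword list and the classify() helper are
assumptions made only for this example and are far simpler than a real lexer:
C
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token classes used only for this sketch. */
enum TokenType { KEYWORD, IDENTIFIER, NUMBER, OPERATOR, PUNCTUATION, UNKNOWN };

static const char *keywords[] = { "int", "if", "else", "while", "return" };

/* Stage 3: classify a single lexeme. */
enum TokenType classify(const char *lexeme) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return KEYWORD;                      /* rule priority: keywords win */
    if (isalpha((unsigned char)lexeme[0]))
        return IDENTIFIER;
    if (isdigit((unsigned char)lexeme[0]))
        return NUMBER;
    if (strchr("+-*/=", lexeme[0]))
        return OPERATOR;
    if (strchr(";,(){}", lexeme[0]))
        return PUNCTUATION;
    return UNKNOWN;
}

int main(void) {
    /* Stage 2: break the input into lexemes on whitespace (very simplified). */
    char input[] = "int value = 100 ;";
    const char *names[] = { "KEYWORD", "IDENTIFIER", "NUMBER",
                            "OPERATOR", "PUNCTUATION", "UNKNOWN" };
    for (char *lexeme = strtok(input, " "); lexeme; lexeme = strtok(NULL, " "))
        printf("%-10s -> %s\n", lexeme, names[classify(lexeme)]);
    return 0;
}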
The lexical analyzer identifies errors with the help of the automaton and the
grammar of the given language (C, C++, etc.) on which it is based, and reports the
row number and column number of the error.
The lexical analyzer also follows rule priority, where a reserved word, e.g., a
keyword, of the language is given priority over user input. That is, if the lexical
analyzer finds a lexeme that matches an existing reserved word, it classifies it as
that keyword rather than as an identifier, so using a reserved word as a
user-defined name is reported as an error.
The output of the Lexical Analysis Phase:
The output of the lexical analyzer serves as input to the syntax analyzer as a
sequence of tokens, not as a series of lexemes, because during the syntax analysis
phase the individual lexeme is not important; what matters is the category or
class to which it belongs.
Regular Expressions
The lexical analyzer needs to scan and identify only a finite set of valid
string/token/lexeme that belong to the language in hand. It searches for the pattern
defined by the language rules.
Regular expressions have the capability to express finite languages by defining a
pattern for finite strings of symbols. The grammar defined by regular expressions
is known as regular grammar. The language defined by regular grammar is
known as regular language.
Regular expression is an important notation for specifying patterns. Each pattern
matches a set of strings, so regular expressions serve as names for a set of strings.
Programming language tokens can be described by regular languages. The
specification of regular expressions is an example of a recursive definition.
Regular languages are easy to understand and have efficient implementation.
There are a number of algebraic laws that are obeyed by regular expressions,
which can be used to manipulate regular expressions into equivalent forms.
Operations
The various operations on languages are:
Union of two languages L and M is written as
L U M = {s | s is in L or s is in M}
Concatenation of two languages L and M is written as
LM = {st | s is in L and t is in M}
The Kleene Closure of a language L is written as
L* = Zero or more occurrence of language L.
Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
Union : (r)|(s) is a regular expression denoting L(r) U L(s)
Concatenation : (r)(s) is a regular expression denoting L(r)L(s)
Kleene closure : (r)* is a regular expression denoting (L(r))*
(r) is a regular expression denoting L(r)
Precedence and Associativity
*, concatenation (.), and | (pipe sign) are left associative
* has the highest precedence
Concatenation (.) has the second highest precedence.
| (pipe sign) has the lowest precedence of all.
Representing valid tokens of a language in regular expression
If x is a regular expression, then:
x* means zero or more occurrences of x,
i.e., it can generate { ε, x, xx, xxx, xxxx, … }
x+ means one or more occurrences of x,
i.e., it can generate { x, xx, xxx, xxxx, … }; equivalently, x.x*
x? means at most one occurrence of x,
i.e., it can generate either {x} or {ε}.
[a-z] is all lower-case alphabets of English language.
[A-Z] is all upper-case alphabets of English language.
[0-9] is all natural digits used in mathematics.
Representing occurrences of symbols using regular expressions
letter = [a-z] or [A-Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
sign = [+ | -]
Representation of language tokens using regular expressions
Decimal = (sign)?(digit)+
Identifier = (letter)(letter | digit)*
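These two patterns can also be checked directly in code. A minimal sketch in C
(the function names is_identifier and is_decimal are invented for this illustration):
C
#include <ctype.h>
#include <stdio.h>

/* Identifier = (letter)(letter | digit)*   with letter = [a-z] or [A-Z] */
int is_identifier(const char *s) {
    if (!isalpha((unsigned char)s[0])) return 0;
    for (int i = 1; s[i]; i++)
        if (!isalnum((unsigned char)s[i])) return 0;
    return 1;
}

/* Decimal = (sign)?(digit)+   with sign = [+ | -] and digit = [0-9] */
int is_decimal(const char *s) {
    int i = (s[0] == '+' || s[0] == '-') ? 1 : 0;
    if (!s[i]) return 0;                   /* at least one digit is required */
    for (; s[i]; i++)
        if (!isdigit((unsigned char)s[i])) return 0;
    return 1;
}

int main(void) {
    printf("%d %d %d\n", is_identifier("value1"), is_decimal("-42"),
           is_identifier("1abc"));         /* prints: 1 1 0 */
    return 0;
}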
The only problem left with the lexical analyzer is how to verify the validity of a
regular expression used in specifying the patterns of keywords of a language. A
well-accepted solution is to use finite automata for verification.
Finite Automata
A finite automaton is a state machine that takes a string of symbols as input and
changes its state accordingly. Finite automata are recognizers for regular
expressions. When an input string is fed into a finite automaton, it changes its
state for each symbol. If the input string is successfully processed and the
automaton reaches a final state, the string is accepted, i.e., the string just read
is a valid token of the language in hand.
The mathematical model of finite automata consists of:
Finite set of states (Q)
Finite set of input symbols (Σ)
One Start state (q0)
Set of final states (qf)
Transition function (δ)
The transition function (δ) maps a state and an input symbol to a state:
δ : Q × Σ → Q
Finite Automata Construction
Let L(r) be a regular language recognized by some finite automata (FA).
States: States of the FA are represented by circles. The name of the
state is written inside the circle.
Start state: The state from which the automaton starts is known as the
start state. The start state has an arrow pointing towards it.
Intermediate states: Every intermediate state has at least two arrows;
one pointing to it and another pointing out from it.
Final state: If the input string is successfully parsed, the automaton
is expected to be in this state. A final state is represented by double
circles.
Transition: The transition from one state to another happens when a
desired symbol is found in the input. Upon transition, the automaton
can either move to the next state or stay in the same state. Movement
from one state to another is shown as a directed arrow, where the
arrow points to the destination state. If the automaton stays in the
same state, an arrow pointing from the state to itself is drawn.
Example: Assume an FA that accepts binary strings ending in the digit 1:
FA = ( Q = {q0, qf}, Σ = {0, 1}, start state q0, final state qf, δ )
Similarly, a DFA with Σ = {0, 1} can be built that accepts all strings ending with 0.
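A possible encoding of such a DFA in C, using a transition table for δ (two states
are assumed: q0, the start state, meaning "the last symbol was not 0", and q1, the
final state, meaning "the last symbol was 0"):
C
#include <stdio.h>

/* States: 0 = q0 (start, non-final), 1 = q1 (final: string ends with '0'). */
/* delta[state][symbol] with symbol index 0 for '0' and 1 for '1'.          */
static const int delta[2][2] = {
    /* q0 */ { 1, 0 },
    /* q1 */ { 1, 0 },
};

int accepts(const char *w) {
    int state = 0;                                /* start in q0 */
    for (size_t i = 0; w[i]; i++) {
        if (w[i] != '0' && w[i] != '1') return 0; /* symbol not in Σ = {0,1} */
        state = delta[state][w[i] - '0'];
    }
    return state == 1;                            /* accept iff we end in the final state */
}

int main(void) {
    printf("%d %d %d\n", accepts("1010"), accepts("111"), accepts("0"));
    return 0;  /* prints: 1 0 1 */
}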
The lexical analyzer scans the input from left to right one character at a time. It
uses two pointers, the begin pointer (bp) and the forward pointer (fp), to keep
track of the portion of the input scanned. Initially both pointers point to the
first character of the input string.
The forward pointer moves ahead searching for the end of the lexeme. As soon as a
blank space is encountered, it indicates the end of the lexeme; for example, when
fp encounters a blank space after the characters i, n, t, the lexeme "int" is
identified. When fp encounters white space it ignores it and moves ahead, and then
both the begin pointer (bp) and the forward pointer (fp) are set to the next token.
The input characters are read from secondary storage, but reading one character at
a time from secondary storage is costly, hence a buffering technique is used: a
block of data is first read into a buffer and then scanned by the lexical analyzer.
There are two methods used in this context: the One Buffer Scheme and the Two
Buffer Scheme. These are explained below.
1. One Buffer Scheme: In this scheme, only one buffer is used to store
the input string. The problem with this scheme is that if a lexeme is
very long it crosses the buffer boundary; to scan the rest of the
lexeme the buffer has to be refilled, which overwrites the first part
of the lexeme.
2. Two Buffer Scheme: To overcome the problem of the one buffer scheme,
two buffers are used to store the input string, and they are scanned
alternately: when the end of the current buffer is reached, the other
buffer is filled. The only remaining problem is that if a lexeme is
longer than a buffer, the input cannot be scanned completely. Initially
both bp and fp point to the first character of the first buffer. Then
fp moves to the right in search of the end of the lexeme; as soon as a
blank character is recognized, the string between bp and fp is
identified as the corresponding token. To identify the boundary of the
first buffer, an end-of-buffer character is placed at the end of the
first buffer; similarly, the end of the second buffer is recognized by
the end-of-buffer mark at its end. When fp encounters the first
end-of-buffer character, the end of the first buffer is recognized and
filling of the second buffer begins; in the same way, the second
end-of-buffer character indicates the end of the second buffer. The two
buffers are filled alternately until the end of the input program, and
the stream of tokens is identified. This end-of-buffer character
introduced at the end is called a sentinel and is used to identify the
end of the buffer.
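A rough C sketch of the two-buffer scheme with sentinels; the buffer size, the
EOF_CHAR sentinel value, the fill_half() helper and the file name input.txt are
all assumptions made for this illustration, not part of any particular compiler:
C
#include <stdio.h>

#define BUF_SIZE 16            /* each half holds BUF_SIZE-1 characters + a sentinel */
#define EOF_CHAR '\0'          /* sentinel; a real scanner uses a value that cannot  */
                               /* appear in the source text                          */

static char buf[2 * BUF_SIZE]; /* the two buffer halves laid out back to back */
static char *fp;               /* forward pointer                              */
static FILE *src;

/* Fill one half (0 or 1) from the source and terminate it with the sentinel. */
static void fill_half(int half) {
    char *base = buf + half * BUF_SIZE;
    size_t n = fread(base, 1, BUF_SIZE - 1, src);
    base[n] = EOF_CHAR;        /* sentinel: end of this half, or true end of input */
}

/* Advance fp by one character, switching halves when a sentinel is reached. */
static int next_char(void) {
    char c = *fp++;
    if (c != EOF_CHAR)
        return c;
    if (fp == buf + BUF_SIZE) {            /* sentinel at the end of the first half  */
        fill_half(1);
        fp = buf + BUF_SIZE;
        return next_char();
    } else if (fp == buf + 2 * BUF_SIZE) { /* sentinel at the end of the second half */
        fill_half(0);
        fp = buf;
        return next_char();
    }
    return EOF;                            /* sentinel inside a half: real end of input */
}

int main(void) {
    src = fopen("input.txt", "r");         /* hypothetical source file */
    if (!src) return 1;
    fill_half(0);
    fp = buf;
    for (int c; (c = next_char()) != EOF; )
        putchar(c);                        /* a real scanner would build lexemes here */
    fclose(src);
    return 0;
}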
Syntax Analysis
Syntax analysis or parsing is the second phase of a compiler.
We have seen that a lexical analyzer can identify tokens with the help of regular
expressions and pattern rules. But a lexical analyzer cannot check the syntax of a
given sentence due to the limitations of regular expressions. Regular expressions
cannot check balanced tokens, such as parentheses. Therefore, this phase uses
context-free grammar (CFG), which is recognized by push-down automata.
CFG, on the other hand, is a superset of regular grammar.
This implies that every regular grammar is also context-free, but there exist
constructs that are beyond the scope of regular grammar. CFG is a helpful tool
for describing the syntax of programming languages.
Context-Free Grammar
In this section, we will first see the definition of context-free grammar and
introduce terminologies used in parsing technology.
A context-free grammar has four components:
A set of non-terminals (V). Non-terminals are syntactic variables
that denote sets of strings. The non-terminals define sets of strings
that help define the language generated by the grammar.
A set of tokens, known as terminal symbols (Σ). Terminals are the
basic symbols from which strings are formed.
A set of productions (P). The productions of a grammar specify the
manner in which the terminals and non-terminals can be combined
to form strings. Each production consists of a non-terminal called
the left side of the production, an arrow, and a sequence of tokens
and/or non-terminals, called the right side of the production.
One of the non-terminals is designated as the start symbol (S); from
where the production begins.
The strings are derived from the start symbol by repeatedly replacing a non-
terminal (initially the start symbol) by the right side of a production, for that non-
terminal.
Example
We take the problem of palindrome language, which cannot be described by
means of Regular Expression. That is, L = { w | w = wR } is not a regular language.
But it can be described by means of CFG, as illustrated below:
G = ( V, Σ, P, S )
Where:
V = { Q, Z, N }
Σ = { 0, 1 }
P = { Q → Z | Q → N | Q → 0 | Q → 1 | Q → ℇ | Z → 0Q0 | N → 1Q1 }
S = { Q }
This grammar describes the palindrome language, generating strings such as 1001,
11100111, 00100, 1010101, 11111, etc.
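For instance, the string 1001 can be derived as:
Q ⇒ N ⇒ 1Q1 ⇒ 1Z1 ⇒ 10Q01 ⇒ 10ℇ01 = 1001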
Syntax Analyzers
A syntax analyzer or parser takes the input from a lexical analyzer in the form of
token streams. The parser analyzes the source code (token stream) against the
production rules to detect any errors in the code. The output of this phase is
a parse tree.
This way, the parser accomplishes two tasks, i.e., parsing the code, looking for
errors and generating a parse tree as the output of the phase.
Parsers are expected to parse the whole code even if some errors exist in the
program. Parsers use error recovery strategies, which we will learn later in this
chapter.
Derivation
A derivation is basically a sequence of production rules, in order to get the input
string. During parsing, we take two decisions for some sentential form of input:
Deciding the non-terminal which is to be replaced.
Deciding the production rule, by which, the non-terminal will be
replaced.
To decide which non-terminal to be replaced with production rule, we can have
two options.
Left-most Derivation
If the sentential form of an input is scanned and replaced from left to right, it is
called left-most derivation. The sentential form derived by the left-most
derivation is called the left-sentential form.
Right-most Derivation
If we scan and replace the input with production rules, from right to left, it is
known as right-most derivation. The sentential form derived from the right-most
derivation is called the right-sentential form.
Example
Production rules:
E→E+E
E→E*E
E → id
Input string: id + id * id
The left-most derivation is:
E→E*E
E→E+E*E
E → id + E * E
E → id + id * E
E → id + id * id
Notice that the left-most side non-terminal is always processed first.
The right-most derivation is:
E→E+E
E→E+E*E
E → E + E * id
E → E + id * id
E → id + id * id
Parse Tree
A parse tree is a graphical depiction of a derivation. It is convenient to see how
strings are derived from the start symbol. The start symbol of the derivation
becomes the root of the parse tree. Let us see this by an example from the last
topic.
We take the left-most derivation of a + b * c (writing each operand as id)
The left-most derivation is:
E→E*E
E→E+E*E
E → id + E * E
E → id + id * E
E → id + id * id
Step 1:
E→E*E
Step 2:
E→E+E*E
Step 3:
E → id + E * E
Step 4:
E → id + id * E
Step 5:
E → id + id * id
In a parse tree:
All leaf nodes are terminals.
All interior nodes are non-terminals.
In-order traversal gives original input string.
A parse tree depicts associativity and precedence of operators. The deepest sub-
tree is traversed first, therefore the operator in that sub-tree gets precedence over
the operator which is in the parent nodes.
Ambiguity
A grammar G is said to be ambiguous if it has more than one parse tree (left or
right derivation) for at least one string.
Example
E→E+E
E→E–E
E → id
For the string id + id – id, the above grammar generates two parse trees,
corresponding to the two groupings (id + id) – id and id + (id – id).
Left Recursion
A grammar becomes left-recursive if it has any non-terminal A whose derivation
contains A itself as the left-most symbol. A left-recursive grammar is a
problematic situation for top-down parsers.
(1) A => Aα | β
(2) S => Aα | β
A => Sd
(1) is an example of immediate left recursion, where A is any non-terminal
symbol and α represents a string of terminals and non-terminals.
(2) is an example of indirect left recursion.
A top-down parser will first parse A, which in turn will yield a string
consisting of A itself, and the parser may go into a loop forever.
Removal of Left Recursion
One way to remove left recursion is to use the following technique:
The production
A => Aα | β
is converted into following productions
A => βA'
A'=> αA' | ε
This does not impact the strings derived from the grammar, but it removes
immediate left recursion.
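As a quick worked example, applying this transformation to the left-recursive
expression grammar
E → E + T | T
(here A = E, α = + T and β = T) gives
E → T E'
E' → + T E' | ε
Both grammars generate the same strings T, T + T, T + T + T, …, but the second
one can be parsed top-down without looping.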
The second method is to use the following algorithm, which eliminates all
direct and indirect left recursion:
START
Arrange the non-terminals in some order A1, A2, ..., An
for each i from 1 to n
    for each j from 1 to i-1
        replace each production of the form Ai → Ajγ
        with Ai → δ1γ | δ2γ | ... | δkγ,
        where Aj → δ1 | δ2 | ... | δk are the current Aj productions
    eliminate immediate left recursion among the Ai productions
END
Follow Set
We also calculate which terminal symbols can immediately follow a non-terminal α
in the production rules. We do not consider what the non-terminal can generate,
but instead look at what the next terminal symbol would be after the occurrences
of that non-terminal in the productions.
Algorithm for calculating the Follow set:
if α is the start symbol, then $ is in FOLLOW(α)
if α is a non-terminal and has a production α → AB, then everything in FIRST(B)
except ℇ is in FOLLOW(A)
if α is a non-terminal and has a production α → AB, where B can derive ℇ (or A is
the last symbol of the production), then FOLLOW(α) is in FOLLOW(A)
The Follow set can be defined as: FOLLOW(A) = { t | S ⇒* αAtβ for some strings α and β }
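A small worked example of these rules, for a grammar chosen just for illustration:
S → A B
A → a | ℇ
B → b
FIRST(A) = { a, ℇ } and FIRST(B) = { b }.
FOLLOW(S) = { $ } (S is the start symbol).
FOLLOW(A) = FIRST(B) = { b } (from S → A B).
FOLLOW(B) = FOLLOW(S) = { $ } (nothing follows B in S → A B).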
Limitations of Syntax Analyzers
Syntax analyzers receive their inputs, in the form of tokens, from lexical
analyzers. Lexical analyzers are responsible for the validity of the tokens
supplied to the syntax analyzer. Syntax analyzers have the following drawbacks -
it cannot determine if a token is valid,
it cannot determine if a token is declared before it is being used,
it cannot determine if a token is initialized before it is being used,
it cannot determine if an operation performed on a token type is
valid or not.
These tasks are accomplished by the semantic analyzer, which we shall study in
Semantic Analysis.
Parser
A parser is the phase of the compiler that is used to break the data coming from
the lexical analysis phase into smaller elements for further processing.
A parser takes input in the form of sequence of tokens and produces output in the
form of parse tree.
Example
Production
1. E→T
2. T→T*F
3. T → id
4. F→T
5. F → id
Bottom-up parsing can be done in the following ways:
1. Shift-Reduce Parsing
2. Operator Precedence Parsing
3. Table Driven LR Parsing
a. LR( 1 )
b. SLR( 1 )
c. CLR ( 1 )
d. LALR( 1 )
Top-down parser: It starts from the start symbol (the root) and works down the
parse tree by using the rules of grammar. This parsing technique uses left-most
derivation.
Bottom-up parser: It starts from the input tokens (the leaves), builds the parse
tree and works up the parse tree by using the rules of grammar. This parsing
technique uses right-most derivation (in reverse).
LR Parser
LR parsing is one type of bottom-up parsing. It is used to parse a large class of
grammars.
In LR(K) parsing, "K" is the number of look-ahead input symbols used to make
parsing decisions.
LR parsing is divided into four parts: LR (0) parsing, SLR parsing, CLR parsing
and LALR parsing.
LR algorithm:
The LR algorithm requires a stack, input, output and a parsing table. In all types
of LR parsing the input, output and stack are the same, but the parsing table is different.
Fig: Block diagram of LR parser
The input buffer contains the string to be parsed followed by a $ symbol, which is
used to indicate the end of the input.
The parsing table is a two-dimensional array. It contains two parts: the Action
part and the Goto part.
LR (1) Parsing
Augmented Grammar
If G is a grammar with start symbol S, the augmented grammar G' is G with a new
start symbol S' and the extra production S' → S. This added production tells the
parser when to stop parsing and announce acceptance of the input.
Example
Given grammar:
1. S → AA
2. A → aA | b
The augmented grammar is:
1. S' → S
2. S → AA
3. A → aA | b
S. no. — SLR Parsers / Canonical LR (CLR) Parsers / LALR Parsers
4. SLR parsers are cost-effective to construct in terms of time and space; CLR parsers are expensive to construct in terms of time and space; the cost of constructing LALR parsers is intermediate between SLR and CLR parsers.
8. Every SLR(1) grammar is an LR(1) grammar and an LALR(1) grammar; every LR(1) grammar may not be an SLR(1) grammar; every LALR(1) grammar may not be SLR(1), but every LALR(1) grammar is an LR(1) grammar.
9. A shift-reduce or reduce-reduce conflict may arise in an SLR parsing table; in a CLR parsing table such conflicts may arise, but the chances are less than in SLR parsing tables; in an LALR parsing table a shift-reduce conflict cannot arise, but a reduce-reduce conflict may arise.
If we look closely, we find that most of the leaf nodes are single children of
their parent nodes. This information can be eliminated before feeding the tree to
the next phase; by hiding the extra information we obtain a condensed (abstract
syntax) tree. Now consider the following grammar:
E -> E+T | T
T -> T*F | F
F -> INTLIT
This is a grammar to syntactically validate an expression having additions and
multiplications in it. Now, to carry out semantic analysis we will augment SDT
rules to this grammar, in order to pass some information up the parse tree and
check for semantic errors, if any. In this example, we will focus on the
evaluation of the given expression, as we don’t have any semantic assertions to
check in this very basic example.
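One possible set of SDT rules for evaluating the expression, attached to the
grammar above (E.val, T.val and F.val are synthesized attributes assumed for this
sketch):
E -> E1 + T   { E.val = E1.val + T.val }
E -> T        { E.val = T.val }
T -> T1 * F   { T.val = T1.val * F.val }
T -> F        { T.val = F.val }
F -> INTLIT   { F.val = INTLIT.lexval }
With these rules, parsing 2 + 3 * 4 sets F.val to 2, 3 and 4 at the leaves,
computes T.val = 12 for 3 * 4, and finally E.val = 14 at the root.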
Type Checking
Dynamic type checking is type checking done at run time. In dynamic type
checking, types are associated with values, not with variables. In
implementations of dynamically type-checked languages, runtime objects
generally carry a type tag, i.e., a reference to a type containing the
object's type information. Dynamic typing is more flexible: a static type
system always restricts what can be conveniently expressed. Dynamic typing
also results in more compact programs, since it does not require types to be
spelled out. Programming with a static type system often requires more
design and implementation effort.
Languages like Pascal and C have static type checking. Type checking is used
to check the correctness of the program before its execution. The main
purpose of type checking is to verify that data type assignments and type
casts are correct and valid before execution.
Static type checking is also used to determine the amount of memory needed
to store a variable.
The design of the type-checker depends on:
1. Syntactic Structure of language constructs.
2. The Expressions of languages.
3. The rules for assigning types to constructs (semantic rules).
The token stream from the lexical analyzer is passed to the PARSER, which
generates a syntax tree. Once the program (source code) has been converted
into a syntax tree, the type checker plays a crucial role: by examining the
syntax tree it can tell whether each data type is being handled by the
correct variables and operations. The type checker checks the tree and,
where needed and permitted, inserts the required modifications (such as
implicit type conversions). It produces an annotated syntax tree, after
which INTERMEDIATE CODE generation is done.
Intermediate code links the source program to the machine program. Intermediate
code is generated because the compiler cannot generate machine code directly in
one pass; therefore it first converts the source program into intermediate code,
from which efficient machine code is then generated. The intermediate code can be
represented in the form of postfix notation, a syntax tree, a directed acyclic
graph, three-address code, quadruples and triples.
If we divide the compiler stages into two parts, the front end and the back end,
then this phase comes in between.
Question – Write the quadruples, triples and indirect triples for the following
expression: (x + y) * (y + z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
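Based on this three-address code, the quadruple and triple representations would
look roughly as follows (using the usual op / arg1 / arg2 / result layout):
Quadruples:
(0)  +   x    y    t1
(1)  +   y    z    t2
(2)  *   t1   t2   t3
(3)  +   t1   z    t4
(4)  +   t3   t4   t5
Triples (results are referred to by their position instead of by temporaries):
(0)  +   x    y
(1)  +   y    z
(2)  *   (0)  (1)
(3)  +   (0)  z
(4)  +   (2)  (3)
Indirect triples keep the same triples plus a separate list of pointers to them
(e.g., 35 → (0), 36 → (1), …), so that statements can be reordered without
rewriting the triples themselves.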
Intermediate Code Generation in Compiler Design
In the analysis-synthesis model of a compiler, the front end of a compiler
translates a source program into an independent intermediate code, then the
back end of the compiler uses this intermediate code to generate the target
code (which can be understood by the machine). The benefits of using
machine-independent intermediate code are:
Because of the machine-independent intermediate code, portability is
enhanced. For example, if a compiler translated the source language
directly to its target machine language without the option of generating
intermediate code, then for each new machine a full native compiler would
be required, because the compiler itself would have to be modified
according to the machine's specifications.
Retargeting is facilitated.
It is easier to apply machine-independent code optimizations, improving
performance by optimizing the intermediate code.
If we generated machine code directly from source code, then for n target
machines we would need n optimizers and n code generators, but with a
machine-independent intermediate code we need only one optimizer.
Intermediate code can be either language-specific (e.g., bytecode for Java) or
language-independent (three-address code). The following are commonly used
intermediate code representations:
1. Postfix Notation: Also known as reverse Polish notation or suffix
notation. The ordinary (infix) way of writing the sum of a and b is
with an operator in the middle: a + b The postfix notation for the
same expression places the operator at the right end as ab +. In
general, if e1 and e2 are any postfix expressions and + is any binary
operator, the result of applying + to the values denoted by e1 and e2
is written in postfix notation as e1e2+. No parentheses are needed in
postfix notation because the position and arity (number of arguments)
of the operators permit only one way to decode a postfix expression.
In postfix notation, the operator follows the operands.
Example 1: The postfix representation of the expression (a + b) * c
is : ab + c *
Example 2: The postfix representation of the expression (a - b) * (c + d) + (a - b)
is: ab- cd+ * ab- +
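To make the "no parentheses needed" point concrete, here is a small C sketch that
evaluates a postfix expression over single-digit operands using a stack; the
function name eval_postfix and the fixed stack size are assumptions of this
illustration:
C
#include <ctype.h>
#include <stdio.h>

/* Evaluate a postfix expression of single-digit operands, e.g. "23*4+". */
int eval_postfix(const char *expr) {
    int stack[64], top = -1;                 /* small fixed-size operand stack */
    for (int i = 0; expr[i]; i++) {
        char c = expr[i];
        if (isdigit((unsigned char)c)) {
            stack[++top] = c - '0';          /* operand: push its value */
        } else {
            int b = stack[top--];            /* operator: pop two operands ... */
            int a = stack[top--];
            switch (c) {
                case '+': stack[++top] = a + b; break;
                case '-': stack[++top] = a - b; break;
                case '*': stack[++top] = a * b; break;
                case '/': stack[++top] = a / b; break;
            }                                /* ... and push the result */
        }
    }
    return stack[top];                       /* final value left on the stack */
}

int main(void) {
    printf("%d\n", eval_postfix("23*4+"));   /* (2*3)+4 = 10 */
    return 0;
}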
When to Optimize?
Why Optimize?
1. Compile Time Evaluation: expressions whose operands are known at compile
time are evaluated by the compiler itself. Example:
(i) A = 2*(22.0/7.0)*r
Perform 2*(22.0/7.0) at compile time.
(ii) x = 12.4
y = x/2.3
Evaluate x/2.3 as 12.4/2.3 at compile time.
2. Variable Propagation:
C
//Before Optimization
c = a * b
x = a
till
d = x * b + 4
//After Optimization
c = a * b
x = a
till
d = a * b + 4
3. Constant Propagation:
If the value of a variable is a constant, then replace the variable
with the constant. The variable may not always be a constant.
Example:
C
(i) A = 2*(22.0/7.0)*r
Performs 2*(22.0/7.0) at compile time.
(ii) x = 12.4
y = x/2.3
Evaluates x/2.3 as 12.4/2.3 at compile time.
(iii) int k=2;
if(k) go to L3;
It is evaluated as :
go to L3 ( Because k = 2 which implies condition is always true)
4. Constant Folding:
C
#define k 5
x = 2 * k
y = k + 5
This can be computed at compile time and the values of x and y are :
x = 10
y = 10
5. Copy Propagation:
C
//Before Optimization
c = a * b
x = a
till
d = x * b + 4
//After Optimization
c = a * b
x = a
till
d = a * b + 4
7. Dead Code Elimination: statements whose results are never used afterwards can
be removed; after copy propagation, the assignment x = a becomes dead and can be
eliminated. Example:
C
c = a * b
x = a
till
d = a * b + 4
//After elimination :
c = a * b
till
d = a * b + 4
8. Unreachable Code Elimination: code that can never be executed (for example,
statements after a return) is removed. Example:
C++
#include <iostream>
using namespace std;
int main() {
int num;
num=10;
cout << "GFG!";
return 0;
cout << num; //unreachable code
}
//after elimination of unreachable code
int main() {
int num;
num=10;
cout << "GFG!";
return 0;
}
9. Function Inlining:
Here, the body of a called function is substituted directly at the call site,
which removes the function-call overhead.
Induction Variable and Strength Reduction:
Expensive operations are replaced by equivalent cheaper ones, often using loop
induction variables.
Example 1:
Multiplication by a power of 2 can be replaced by the shift-left operator, which
is less expensive than multiplication.
C
a=a*16
// Can be modified as :
a = a<<4
Example 2:
i = 1;
while (i<10)
{
    y = i * 4;
    i = i + 1;
}
//After Reduction
t = 4;
while (t<40)
{
    y = t;
    t = t + 4;
}
Loop Optimization techniques include the following.
1. Frequency Reduction (Code Motion):
A computation that produces the same result in every iteration of a loop
(loop-invariant code) is moved outside the loop. Example:
C
a = 200;
while(a>0)
{
    b = x + y;
    if (a % b == 0)
        printf("%d", a);
    a = a - 1;
}
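After frequency reduction, the invariant computation is moved out of the loop. A
sketch of the optimized form (assuming x and y do not change inside the loop):
C
a = 200;
b = x + y;                 /* computed once, outside the loop */
while(a>0)
{
    if (a % b == 0)
        printf("%d", a);
    a = a - 1;
}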
2. Loop Jamming:
Two or more loops over the same index range are combined into a single loop.
This reduces the loop-control overhead, since only one set of loop instructions
is executed.
Example:
C
for(int k=0;k<10;k++)
{
y = k+3;
}
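A complete before/after sketch of loop jamming; the body of the first loop
(x = k * 2) is an assumption added here purely for illustration:
C
#include <stdio.h>

int main(void) {
    int x = 0, y = 0;

    /* Before loop jamming: two separate loops over the same range. */
    for (int k = 0; k < 10; k++)
        x = k * 2;                /* assumed first loop, for illustration */
    for (int k = 0; k < 10; k++)
        y = k + 3;

    /* After loop jamming: one combined loop does the same work with half the loop overhead. */
    for (int k = 0; k < 10; k++) {
        x = k * 2;
        y = k + 3;
    }

    printf("%d %d\n", x, y);      /* prints 18 12 in both versions */
    return 0;
}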
3. Loop Unrolling:
It helps in optimizing the execution time of the program by replicating the loop
body so that fewer iterations are needed.
It increases the program's speed by eliminating loop-control and loop-test
instructions.
Example:
C
printf("Hello");
printf("Hello");
Now that we have learned the need for optimization and its two types, let us
see where to apply these optimizations.
Source program: Optimizing the source program involves making
changes to the algorithm or changing the loop structures. The user
is the actor here.
Intermediate Code: Optimizing the intermediate code involves
changing the address calculations and transforming the procedure
calls involved. Here compiler is the actor.
Target Code: Optimizing the target code is done by the compiler.
Register usage and the selection and movement of instructions are part
of the optimization involved in the target code.
Based on the scope of the transformations, optimization can be classified as follows.
Local Optimization: Transformations are applied to small basic
blocks of statements. Techniques followed are Local Value
Numbering and Tree Height Balancing.
Regional Optimization: Transformations are applied to Extended
Basic Blocks. Techniques followed are Super Local Value
Numbering and Loop Unrolling.
Global Optimization: Transformations are applied to large
program segments that include functions, procedures, and loops.
Techniques followed are Live Variable Analysis and Global Code
Replacement.
Interprocedural Optimization: As the name indicates, the
optimizations are applied inter procedurally. Techniques followed
are Inline Substitution and Procedure Placement.