Compiler Design

The document discusses the Analysis-Synthesis model of compilation. The front-end analysis phases include lexical, syntax and semantic analysis. The back-end phases, such as code generation and optimization, are called the synthesis phase because they synthesize the target language from the intermediate representation created by the front end. The model has well-defined phases, each handling a logical activity, and allows code reuse. Lexical analysis is the first phase; it breaks the input into tokens. Lexical analyzers are generated using tools like Lex and Flex, which take regular expressions as input and generate a DFA to recognize tokens.


• Advantages of the model

Also known as the Analysis-Synthesis model of compilation

- Front-end phases are known as analysis phases
- Back-end phases are known as synthesis phases
- Each phase has well-defined work
- Each phase handles a logical activity in the process of compilation

The Analysis-Synthesis model:

• The front-end phases are lexical, syntax and semantic analysis. These form the "analysis phase", since each of them performs some kind of analysis. The back-end phases are called the "synthesis phase" because they synthesize the intermediate and the target language, and hence the program, from the representation created by the front-end phases. The advantages are that not only can a lot of code be reused, but also, since the compiler is well structured, it is easy to maintain and debug.
How does Lex work?
• Regular expressions describe the languages that can be recognized by finite automata.
• Translate each token's regular expression into a non-deterministic finite automaton (NFA).
• Convert the NFA into an equivalent DFA.
• Minimize the DFA to reduce the number of states.
• Emit code driven by the DFA tables.


Lexical Analysis
Difficulties in the Implementation of
Lexical analysers
• Lexemes in a fixed position: fixed-format vs. free-format languages
• Handling of blanks
  – in Pascal, blanks separate identifiers
  – in Fortran, blanks are important only in literal strings; for example, the variable counter is the same as count er
• Another example
  DO 10 I = 1.25   is read as   DO10I=1.25   (an assignment to DO10I)
  DO 10 I = 1,25   is read as   DO10I=1,25   (a DO loop)
Recognition of reserved keywords and identifiers
• To reduce the number of states, enter keywords into the symbol table as if they were identifiers.
• When the LA consults the symbol table to find the correct lexical value to return, it discovers that this identifier is really a keyword, and the symbol table entry has the proper token code to return.
Transition diagram for unsigned
numbers
Implementation of Transition Diagram
Another transition diagram for
Unsigned Numbers

• A more complex transition diagram is difficult to implement and may give rise to errors during coding; however, there are ways to implement it better.
Lexical Analyzer Generators

• The process of constructing a lexical analyzer can be automated.
• Input to the generator:
  – List of regular expressions in priority order
  – Associated actions for each regular expression
• Output of the generator:
  – A program that reads the input character stream and breaks it into tokens
  – Reports lexical errors (unexpected characters), if any
• Two popular lexical analyzer generators are:
  – Flex: generates a lexical analyzer in C or C++. It is a more modern version of the original Lex tool that was part of the AT&T Bell Labs version of Unix.
    • An open-source implementation of the original UNIX lex utility
  – JLex: written in Java; generates a lexical analyzer in Java
• Lex:
  – It is a lexical analyzer generator.
  – The input notation for the Lex tool is referred to as the Lex language.
  – The tool itself is the Lex compiler.
  – Behind the scenes, the Lex compiler transforms the input patterns into a transition diagram and generates code, in a file called lex.yy.c, that simulates this transition diagram.
• An input file, which we call lex.l, is written in the Lex language and describes the lexical analyzer to be generated.
• The Lex compiler transforms lex.l into a C program, in a file that is always named lex.yy.c.
• That file is compiled by the C compiler into a file called a.out, as always.
• The C compiler's output is a working lexical analyzer that can take a stream of input characters and produce a stream of tokens.
• The compiled output, referred to as a.out, is normally used as a subroutine of the parser.
Fig: Creating a lexical analyzer with Lex
Structure of Lex programs:
• A Lex program has the following form:

  declarations
  %%
  translation rules
  %%
  auxiliary functions (optional)

• The declaration section includes declarations of
  – variables, manifest constants (identifiers declared to stand for a constant), and
  – regular definitions.
• The translation rules each have the form
  pattern { action }
  – Actions contain C code: a single statement or a code block.
Lex Pattern Examples

  abc             Matches the string "abc"
  [a-z A-Z]       Matches any lowercase or uppercase letter
  Dog.*cat        Matches any string starting with Dog and ending with cat
  (ab)+           Matches one or more occurrences of "ab" concatenated
  [^a-z]+         Matches any string of one or more characters that does not include lowercase a-z
  [ + -]? [0-9]+  Matches any string of one or more digits with an optional prefix of + or -
Lex input example
• Filename: example1.l

  %%
  "HI"    printf("Hello World");
  .       ;
  %%

  (The second rule's action is an empty C statement; it does nothing for any other character encountered in the input.)
Executing the Lex File
• lex example1.l
  (Processes the Lex file to generate a scanner, which gets saved as lex.yy.c)
• cc lex.yy.c -ll
  (Compiles the scanner and grabs main() from the Lex library, -ll)
• ./a.out
  (Runs the scanner, taking input from standard input)
Fig: Lex program for the relational operators as
tokens
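The figure itself is not reproduced in these notes; a minimal sketch of such a specification might look like the following (the token codes LT, LE, EQ, NE, GT, GE and the header tokens.h are assumptions made for this sketch):

  %{
  #include "tokens.h"   /* hypothetical header defining LT, LE, EQ, NE, GT, GE */
  %}
  %%
  "<"     { return LT; }
  "<="    { return LE; }
  "="     { return EQ; }
  "<>"    { return NE; }
  ">"     { return GT; }
  ">="    { return GE; }
  %%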
• For a trivial example, consider a program to delete from the input all blanks or tabs at the ends of lines.

  %%
  [ \t]+$   ;

  is all that is required.
  – The program contains a %% delimiter to mark the beginning of the rules, and one rule.
  – This rule contains a regular expression which matches one or more instances of the characters blank or tab (written \t for visibility, in accordance with the C language convention) just prior to the end of a line.
• To change any remaining string of blanks or tabs to a single blank, add another rule:

  %%
  [ \t]+$   ;
  [ \t]+    printf(" ");

  – The first rule matches all strings of blanks or tabs at the end of lines, and
  – the second rule matches all remaining strings of blanks or tabs.
• Lex can be used alone for simple transformations, or for analysis and statistics gathering on a lexical level.
• It is particularly easy to interface Lex and Yacc.
  – Lex programs recognize only regular expressions;
  – Yacc writes parsers that accept a large class of context-free grammars, but they require a lower-level analyzer to recognize input tokens.
  – Thus, a combination of Lex and Yacc is often appropriate.
  – When used as a preprocessor for a later parser generator, Lex is used to partition the input stream, and the parser generator assigns structure to the resulting pieces.
• The general format of Lex source is:
{definitions}
%%
{rules}
%%
{user subroutines}

• The definitions and the user subroutines are often omitted. The second %% is optional, but the first is required to mark the beginning of the rules. The absolute minimum Lex program is thus

  %%

  (no definitions, no rules), which translates into a program that copies the input to the output unchanged.
• If the action is merely a single C expression, it
can just be given on the right side of the line; if it
is compound, or takes more than a line, it should
be enclosed in braces.

• The operator characters are

"\[]^-?.*+|()$/{}%<>
and if they are to be used as text characters, an
escape should be used.
• The quotation mark operator (") indicates that
whatever is contained between a pair of quotes is to
be taken as text characters. Thus
xyz"++"
matches the string xyz++ when it appears.
– Note that a part of a string may be quoted. It is harmless
but unnecessary to quote an ordinary text character; the
expression
"xyz++"
is the same as the one above.
• An operator character may also be turned into
a text character by preceding it with \ as in
xyz\+\+
which is another, less readable, equivalent of
the given expressions.
• In character classes, the ^ operator must appear as
the first character after the left bracket; it indicates
that the resulting string is to be complemented with
respect to the computer character set. Thus

[^abc]
matches all characters except a, b, or c, including all
special or control characters; or
[^a-zA-Z]
is any character which is not a letter.
• Lex Actions:
  – When a specified expression is matched, Lex executes the corresponding action.
  – Note that there is a default action, which consists of copying the input to the output.
    • This is performed on all strings not otherwise matched.
    • Thus the Lex user who wishes to absorb the entire input, without producing any output, must provide rules to match everything.
• In more complex actions, the user will often want to know the actual text that matched some expression like [a-z]+. Lex leaves this text in an external character array named yytext. Thus, to print the name found, a rule like

  [a-z]+   printf("%s", yytext);

  will print the string in yytext.
• Ambiguous Source Rules
  – Lex can handle ambiguous specifications. When more than one expression can match the current input, Lex chooses as follows:
    1) The longest match is preferred.
    2) Among rules which matched the same number of characters, the rule given first is preferred.
• Thus, suppose the rules

  integer   keyword action ...;
  [a-z]+    identifier action ...;

  are given in that order.
  – If the input is integers, it is taken as an identifier, because [a-z]+ matches 8 characters while integer matches only 7.
  – If the input is integer, both rules match 7 characters, and the keyword rule is selected because it was given first.
  – Anything shorter (e.g. int) will not match the expression integer, and so the identifier interpretation is used.
• Remember that Lex is turning the rules into a
program. Any source not intercepted by Lex is
copied into the generated program.
Parsing
Syntax Analysis: What does it do?
• Error reporting and recovery
• Modelling using context-free grammars
• Recognition using push-down automata / table-driven parsers
What a syntax analyser cannot do:
• Check whether variables are of types on which the operations are allowed
• Check whether a variable has been declared before use
• Check whether a variable has been initialized
• These issues will be handled in semantic analysis
Limitations of Regular Languages
• Can regular expressions be used to describe language syntax precisely and conveniently?
• Many languages are not regular; for example, strings of balanced parentheses: "for every opening parenthesis there must be a closing parenthesis" cannot be described using a regular expression.
  - (((( … ))))
  - { (^i )^i | i ≥ 0 }
  - There is no regular expression for this language.
  - A finite automaton may repeat states; however, it cannot remember the number of times it has been to a particular state.
• Many programming languages have an
inherently recursive structure that can be
defined by Context Free Grammars (CFG)
rather intuitively.
Syntax Definition:
• Context-free grammars
  - a set of tokens (terminal symbols)
  - a set of non-terminal symbols
  - a set of productions of the form: non-terminal → string of terminals & non-terminals
  - a start symbol
  <T, N, P, S>
• A grammar derives strings by beginning with a start symbol and repeatedly replacing a non-terminal by the right-hand side of a production for that non-terminal.

• The strings that can be derived from the start symbol of a grammar G form the language L(G) defined by the grammar.
Examples
• String of balanced parentheses:
  S → ( S ) S | ε

• Grammar for a string of digits separated by + or -:
  list → list + digit | list - digit | digit
  digit → 0 | 1 | … | 9

• A derivation of 9-5+2:
  list ⇒ list + digit
       ⇒ list - digit + digit
       ⇒ digit - digit + digit
       ⇒ 9 - digit + digit
       ⇒ 9 - 5 + digit
       ⇒ 9 - 5 + 2

  Therefore, the string 9-5+2 belongs to the language specified by the grammar.

It is interesting to know that the name "context-free grammar" comes from the fact that the use of a production X → γ does not depend on the context of X.
• The simple reason is that non-terminals appear by themselves to the left of the arrow in context-free rules:
  A → γ
• The rule A → γ says that A may be replaced by γ anywhere, regardless of where A occurs.
• On the other hand, we could define a context as a pair of strings (α, β) such that a rule would apply only if α occurs before and β occurs after the non-terminal A. We would write this as
  α A β → α γ β

Derivations
• The construction of a parse tree can be made
precise by taking a derivational view, in which
productions are treated as rewriting rules.
• At each step, we choose a non-terminal to
replace. Different choices can lead to different
derivations.

• Two derivations are of interest:
  1. Leftmost: replace the leftmost non-terminal (NT) at each step
  2. Rightmost: replace the rightmost NT at each step
• For an ambiguous grammar, different derivations can produce different parse trees, and different parse trees imply different evaluation orders!
• Parse Trees
  – The derivations can be represented in a tree-like fashion.
  – The interior nodes contain the non-terminals used during the derivation.
Context free grammars Versus Regular
Expressions
• Every construct that can be described by a
regular expression can be described by a
grammar, but not vice versa.
• Alternatively, every regular language is a
context free language, but not vice-versa.
• We can mechanically construct a grammar that recognizes the same language as a non-deterministic finite automaton (NFA).
• It uses the following construction:
  – For each state i of the NFA, create a non-terminal Ai.
  – If state i has a transition to state j on input a, add the production Ai → a Aj. If state i goes to state j on input ε, add the production Ai → Aj.
  – If i is an accepting state, add Ai → ε.
  – If i is the start state, make Ai the start symbol of the grammar.
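For example, for a small NFA of our own choosing with states 0 and 1, transitions 0 → 0 on a, 0 → 0 on b, 0 → 1 on a, start state 0 and accepting state 1 (it accepts strings of a's and b's ending in a), the construction gives:

  A0 → a A0 | b A0 | a A1
  A1 → ε

with A0 as the start symbol.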
Parsing Techniques
• There are two primary parsing techniques:
– Top-down
– Bottom-up
Top-down parsers:
• A top-down parser starts at the root of the parse tree and grows towards the leaves.
• At each node, the parser picks a production and tries to match the input.
• However, the parser may pick the wrong production, in which case it will need to backtrack.
• Some grammars are backtrack-free.
Bottom-up parsers:
• A bottom-up parser starts at the leaves and grows toward the root of the parse tree.
• As input is consumed, the parser encodes possibilities in an internal state.
• The bottom-up parser starts in a state valid for legal first tokens.
LL(1) Parser
• The first L in LL(1) stands for scanning the
input from left to right
• The second L stands for producing a leftmost
derivation
• The “1” stands for using one input symbol of
lookahead at each step to make parsing action
decisions.
Recursive Descent Parsing
• It consists of a set of procedures, one for each
non terminal.
• Execution begins with the procedure for the
start symbol, which halts and announces
success if it scans the entire input string.
void A( )
{
   choose an A-production, A → X1 X2 … Xk;
   for ( i = 1 to k ) {
      if ( Xi is a non-terminal )
         call procedure Xi( );
      else if ( Xi equals the current input symbol a )
         advance the input to the next symbol;
      else /* an error has occurred */ ;
   }
}

Fig.: A typical procedure for a non-terminal in a top-down parser


• Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1).
• A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the following conditions hold:
  1. For no terminal a do both α and β derive strings beginning with a. (Equivalently, FIRST(α) and FIRST(β) are disjoint sets.)
  2. At most one of α and β can derive the empty string.
  3. If ε is in FIRST(β), then FIRST(α) and FOLLOW(A) are disjoint sets, and likewise if ε is in FIRST(α).
• Consider the grammar
  E  → i E'
  E' → + i E' | ε

• A recursive-descent parser for this grammar in C (E' is written Eprime):

  #include <stdio.h>

  char l;                      /* lookahead symbol */

  void E(void);
  void Eprime(void);           /* E' in the grammar */
  void match(char t);

  void E(void)
  {
      if (l == 'i') {
          match('i');
          Eprime();
      }
  }

  void Eprime(void)
  {
      if (l == '+') {
          match('+');
          match('i');
          Eprime();
      } else
          return;
  }

  void match(char t)
  {
      if (l == t)
          l = getchar();
      else
          printf("error");
  }

  int main(void)
  {
      l = getchar();           /* read the first input symbol */
      E();
      if (l == '$')
          printf("parsing success");
      return 0;
  }
Non-recursive predictive parsing
• A non-recursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls.
• The table-driven parser has an input buffer, a
stack containing a sequence of grammar
symbols, a parsing table constructed by a parsing
algorithm and an output stream.
• The input buffer contains the string to be parsed
followed by the endmarker $.
• We reuse the symbol $ to mark the bottom of the
stack, which initially contains the start symbol of
the grammar on top.
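As a concrete illustration, here is a minimal sketch of such a table-driven parser in C for the example grammar E → iE', E' → +iE' | ε used earlier (the fixed-size stack, the sample input string, and the use of the letter 'Q' to stand for E' are assumptions made for this sketch; the parse-table entries are hard-coded as if/else tests):

  #include <stdio.h>

  /* Non-recursive predictive parser for:  E -> i E'   E' -> + i E' | epsilon
     Symbols: 'E', 'Q' (standing for E'); terminals 'i', '+', endmarker '$'. */

  static char stack[100];
  static int top = -1;

  static void push(char c) { stack[++top] = c; }
  static void pop(void)    { top--; }

  int main(void)
  {
      const char *input = "i+i+i$";    /* sample input; '$' marks the end */
      int ip = 0;

      push('$');                       /* bottom-of-stack marker */
      push('E');                       /* start symbol on top */

      while (stack[top] != '$') {
          char X = stack[top];
          char a = input[ip];

          if (X == 'i' || X == '+') {              /* terminal: must match input */
              if (X == a) { pop(); ip++; }
              else { printf("error\n"); return 1; }
          } else if (X == 'E') {                   /* M[E, i] = E -> i E' */
              if (a == 'i') { pop(); push('Q'); push('i'); }
              else { printf("error\n"); return 1; }
          } else {                                 /* X == 'Q', i.e. E' */
              if (a == '+') { pop(); push('Q'); push('i'); push('+'); }  /* E' -> + i E' */
              else pop();                          /* take E' -> epsilon; errors surface at the final check */
          }
      }

      if (input[ip] == '$') printf("parsing success\n");
      else printf("error\n");
      return 0;
  }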
Bottom Up Parsing
• We can think of bottom-up parsing as the process
of “reducing” a string w to the start symbol of the
grammar.

• At each reduction step, a specific substring matching the body of a production is replaced by the non-terminal at the head of that production.
• The key decisions during bottom-up parsing are about when to reduce and about what production to apply, as the parse proceeds.
• Handle Pruning: A “handle” is a substring that
matches the body of a production, and whose
reduction represents one step along the
reverse of a rightmost derivation.

• Shift-reduce parsing: It is a form of bottom-up parsing in which a stack holds grammar symbols and an input buffer holds the rest of the string to be parsed.
• While the primary operations are shift and
reduce, there are actually four possible
actions a shift-reduce parser can make:
1. Shift- Shift the next input symbol onto the top of
the stack.
2. Reduce- The right end of the string to be reduced must be at the top of the stack. Locate the left end of the string within the stack and decide with what non-terminal to replace the string.
3. Accept- Announce successful completion of
parsing.
4. Error-Discover a syntax error and call an error
recovery routine.
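As an illustration (the grammar and input are chosen here for the example), a shift-reduce parse of id + id with the grammar E → E + E | id could proceed as follows, showing the stack, the remaining input and the action taken:

  $            id + id $     shift
  $ id         + id $        reduce by E → id
  $ E          + id $        shift
  $ E +        id $          shift
  $ E + id     $             reduce by E → id
  $ E + E      $             reduce by E → E + E
  $ E          $             accept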
Parser Generators
• Parser generators exist for LL(1) and LALR(1) grammars. For example:
  – LALR(1): YACC, Bison, CUP
  – LL(1): ANTLR
  – Recursive descent: JavaCC
• The structure of a yacc file is
  declarations
  %%
  translation rules
  %%
  supporting C/C++ functions
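A minimal sketch of a yacc file in this format (the grammar, the hand-written yylex(), and the single-digit DIGIT token are assumptions chosen for illustration):

  %{
  #include <stdio.h>
  #include <ctype.h>
  int yylex(void);
  int yyparse(void);
  void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
  %}
  %token DIGIT
  %%
  line : expr '\n'        { printf("= %d\n", $1); }
       ;
  expr : expr '+' term    { $$ = $1 + $3; }
       | term
       ;
  term : DIGIT
       ;
  %%
  int yylex(void) {                  /* trivial scanner: digits and literal characters */
      int c = getchar();
      if (c == EOF) return 0;        /* 0 signals end of input to yacc */
      if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
      return c;
  }
  int main(void) { return yyparse(); }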
Semantic Analysis
Syntax Directed Translation
• It is a systematic process of assigning meanings to programs, which can also be viewed as the computation of some special information (attributes) associated with the symbols of the grammar.
• To accomplish the task of syntax-directed translation, there are two general approaches:
  – Syntax-Directed Definitions (SDD)
  – Syntax-Directed Translation (SDT) Schemes
• The conceptual view of syntax-directed translation can be presented as
  Input string → Parse tree → Dependency graph → Evaluation order for semantic rules
Syntax Analyzer as translator

Fig.: Syntax-Directed Translation — the parser takes a continuous stream of tokens, together with syntax and translation rules, and produces an abstract syntax tree, a syntax tree, intermediate code, etc.
Syntax-Directed Translation
• The grammar symbols are associated with
attributes to associate information with the
programming language constructs that they
represent.
• Values of these attributes are evaluated by the
semantic rules associated with the grammar
productions.
• Any attribute may hold almost any information
– It can hold a string, a number, a memory location, etc
• Evaluation of these semantic rules:
  – May put information into the symbol table
  – May perform type checking
  – May issue error messages
  – May perform some other activities
Syntax-Directed Definitions and Translation Schemes
• When we associate semantic rules with
productions, we use two notations:

– Syntax-Directed Definitions
– Translation Schemes
• Syntax-Directed Definitions:
  – Give high-level specifications for translations
  – Hide many implementation details, such as the order of evaluation of semantic actions
  – We associate a production rule with a set of semantic actions, and we do not say when they will be evaluated.
• Syntax-Directed Translation Schemes:
  – Indicate the order of evaluation of the semantic actions associated with a production rule.
  – In other words, translation schemes give a little more information about implementation details.
Syntax Directed Definitions
• An SDD is a generalization of a context-free grammar in which:
  – Each grammar symbol is associated with a set of attributes.
  – This set of attributes for a grammar symbol can be of the following categories:
    • Synthesized attributes
    • Inherited attributes
  – Each production rule is associated with a set of semantic rules.
• In an SDD, each production A → α is associated with a set of semantic rules of the form
  b = f(c1, c2, …, cn)
  where f is a function and b can be one of the following:
  – b is a synthesized attribute of A and c1, c2, …, cn are attributes of the grammar symbols in the production (A → α), or
  – b is an inherited attribute of one of the grammar symbols in α (on the right side of the production), and c1, c2, …, cn are attributes of the grammar symbols in the production (A → α).
• Terminals have only synthesized attributes, whose values are provided by the scanner (the lexical analyzer).
• The start non-terminal typically has no inherited attributes.
• We may also allow function calls as semantic rules; they are called "side effects".
Annotated Parse Tree
• A parse tree showing the values of attributes at each node is called an annotated parse tree.
• The process of computing the attribute values at the nodes is called annotation (or decoration) of the parse tree.
• The order of these computations depends on the dependency graph induced by the semantic rules.
• Values of attributes in the nodes of an annotated parse tree are either
  – initialized to constant values by the lexical analyzer, or
  – determined by the semantic rules.
Evaluating Attributes
• If a syntax-directed definition employs only synthesized attributes, the evaluation of the attributes can be done in a bottom-up fashion.
• Inherited attributes would require more arbitrary "traversals" of the annotated parse tree.
• A dependency graph suggests possible evaluation orders for an annotated parse tree.
Attribute Grammar
• An attribute grammar is a formal way to define
attributes for the productions of a formal grammar,
associating these attributes with values.
Example of a Syntax-Directed Definition
• Here,
  – The SDD is based on the grammar for arithmetic expressions, which evaluates expressions terminated by an endmarker n.
  – Grammar symbols: L, E, T, F, n, +, *, (, ), digit
  – Non-terminals E, T, F have an attribute called val.
  – Terminal digit has an attribute called lexval, whose value is provided by the lexical analyzer.

  PRODUCTION        SEMANTIC RULE
  L → E n           print(E.val)
  E → E1 + T        E.val = E1.val + T.val
  E → T             E.val = T.val
  T → T1 * F        T.val = T1.val * F.val
  T → F             T.val = F.val
  F → ( E )         F.val = E.val
  F → digit         F.val = digit.lexval
Synthesized and Inherited attributes
• Synthesized attributes are computed in a bottom-up fashion, from the leaves upwards.
• Inherited attributes flow down from the parent or siblings to the node in question.
Evaluation Order for SDDs
• “Dependency graphs” are a useful tool for
determining an evaluation order for the
attribute instances in a given parse tree.
• While an annotated parse tree shows the
values of attributes, a dependency graph
helps us determine how those values can be
computed.
Dependency graphs
• A dependency graph depicts the flow of information
among the attributes instances in a particular parse
tree; an edge from one attribute instance to
another means that the value of the first is needed
to compute the second.
• Edges express constraints implied by the semantic
rules.
  – Consider the following production and rule:
    PRODUCTION       SEMANTIC RULE
    E → E1 + T       E.val = E1.val + T.val
• Here, val is a synthesized attribute.
• As a convention, we show the parse tree edges
as dotted lines, while the edges of the
dependency graph are solid.
Ordering the Evaluation of Attributes
• The dependency graph characterizes the possible orders in which we can evaluate the attributes at the various nodes of a parse tree.
• If the dependency graph has an edge from node M to node N, then the attribute corresponding to M must be evaluated before the attribute of N.
• A topological sort of a directed graph is a linear ordering of its vertices such that for each directed edge x → y from a vertex x to a vertex y, x comes before y in the ordering.
• (For the dependency graph in the accompanying figure, not reproduced here, there are other topological sorts as well, such as 1,3,5,2,4,6,7,8,9.)
• There are two important classes of SDDs:
  1. S-attributed definitions: An SDD is S-attributed if every attribute is synthesized.
  2. L-attributed definitions: Each attribute must be either
     a. synthesized, or
     b. inherited, but with the rules limited as follows.
     Suppose there is a production A → X1 X2 … Xn, and that there is an inherited attribute Xi.a computed by a rule associated with this production. Then the rule may use only:
     • inherited attributes associated with the head A;
     • either inherited or synthesized attributes associated with the occurrences of the symbols X1, X2, …, Xi-1 located to the left of Xi;
     • inherited or synthesized attributes associated with this occurrence of Xi itself, but only in such a way that there are no cycles in the dependency graph formed by the attributes of this Xi.
– Given the syntax-directed definition below with the synthesized attribute val, draw the annotated parse tree for the expression (3+4)*(5+6).

  PRODUCTION        SEMANTIC RULE
  L → E n           print(E.val)
  E → E1 + T        E.val = E1.val + T.val
  E → T             E.val = T.val
  T → T1 * F        T.val = T1.val * F.val
  T → F             T.val = F.val
  F → ( E )         F.val = E.val
  F → digit         F.val = digit.lexval
Applications of Syntax-Directed
Translation
• One important application of SDT is the construction of syntax trees.
• Since some compilers use syntax trees as an intermediate representation, a common form of SDD turns its input string into a tree.
• To complete the translation to intermediate code, the compiler may then walk the syntax tree, using another set of rules that are in effect an SDD on the syntax tree rather than the parse tree.
• We consider two SDDs for constructing syntax trees for expressions:
  – an S-attributed definition, which is suitable for use during bottom-up parsing, and
  – an L-attributed definition, which is suitable for use during top-down parsing.
Construction of syntax trees
• We implement the nodes of a syntax tree by
objects with a suitable number of fields. Each
object will have an op field that is the label of the
node.
• The objects will have additional fields as follows:
  – If the node is a leaf, an additional field holds the lexical value for the leaf. The construction function Leaf(op, val) creates a leaf object. Alternatively, if the nodes are viewed as records, Leaf returns a pointer to a new record for a leaf.
  – If the node is an interior node, there are as many additional fields as the node has children in the syntax tree. A construction function Node takes two or more arguments: Node(op, c1, c2, …, ck) creates an object with first field op and k additional fields for the k children c1, …, ck.
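A sketch of the semantic rules that build such a tree for a simple expression grammar, using the Leaf and Node constructors just described (the exact grammar below is an assumption, following the usual textbook formulation):

  E → E1 + T    { E.node = new Node('+', E1.node, T.node); }
  E → E1 - T    { E.node = new Node('-', E1.node, T.node); }
  E → T         { E.node = T.node; }
  T → ( E )     { T.node = E.node; }
  T → id        { T.node = new Leaf(id, id.entry); }
  T → num       { T.node = new Leaf(num, num.val); }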
Intermediate Code Generation
• In the analysis-synthesis model of a compiler,
the front end analyzes a source program and
creates an intermediate representation, from
which the back-end generates target code.
Why intermediate code?
• While generating machine code directly from source code is possible, it entails two problems:
  – With m languages and n target machines, we would need to write m front ends, m*n optimizers, and m*n code generators.
  – The code optimizer, which is one of the largest and most difficult-to-write components of a compiler, could not be reused.
• By converting source code to an intermediate code, a machine-independent code optimizer may be written.
• This means just m front ends, n code generators, and 1 optimizer.
• Directed Acyclic Graphs for Expressions
  – Like the syntax tree for an expression, a DAG has leaves corresponding to atomic operands and interior nodes corresponding to operators.
  – The difference is that a node N in a DAG has more than one parent if N represents a common subexpression.
    • In a syntax tree, the tree for the common subexpression would be replicated as many times as the subexpression appears in the original expression.
  – A DAG not only represents expressions more succinctly, it gives the compiler important clues regarding the generation of efficient code to evaluate the expressions.
The Value-Number method for
construction of a DAG
Three Address Code
• In three-address code, there is at most one operator on the right side of an instruction; that is, no built-up arithmetic expressions are permitted.
• Thus a source-language expression like x + y*z might be translated into the sequence of three-address instructions

  t1 = y * z
  t2 = x + t1

  where t1 and t2 are compiler-generated temporary names.
• This unraveling of multi-operator arithmetic expressions and of nested flow-of-control statements makes three-address code desirable for target-code generation and optimization.
• Three address code can be implemented using
records with fields for the addresses;
Quadruples, triples and indirect triples.
• Quadruples: A quadruple has four fields,
which we call op, arg1, arg2 and result.
– The op field contains an internal code for the
operator.
– For instance, the three-address instruction x = y +z
is represented by placing + in op, y in arg1, z in
arg2, and x in result
• There are some exceptions to this rule:
  – Instructions with unary operators like x = minus y or x = y do not use arg2. Note that for a copy statement like x = y, op is =, while for most other operations, the assignment operator is implied.

– Conditional and unconditional jumps put the


target label in result.
• Triples: A triple has only three fields, which we
call op, arg1, and arg2.
– Using triples, we refer to the result of an
operation x op y by its position, rather than by an
explicit temporary name.

– Thus, instead of the temporary t1 in Fig. 6.10 (b),


a triple representation would refer to position (0).
Parenthesized numbers represent pointers into
the triple structure itself.
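For instance, the two instructions t1 = y * z and t2 = x + t1 from the earlier x + y*z example would look as follows in the two representations (the column layout is illustrative):

  Quadruples:                        Triples:
     op   arg1  arg2  result            op   arg1  arg2
  0  *    y     z     t1            0   *    y     z
  1  +    x     t1    t2            1   +    x     (0)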
• A benefit of quadruples over triples can be
seen in an optimizing compiler, where
instructions are often moved around.
• With quadruples, if we move an instruction
that computes a temporary t, then the
instructions that use t require no change.
• With triples, the result of an operation is
referred to by its position, so moving an
instruction may require us to change all
references to that result.
• Indirect triples : Indirect triples consist of a listing
of pointers to triples, rather than a listing of
triples themselves

• With indirect triples, an optimizing compiler can


move an instruction by reordering the instruction
list, without affecting the triples themselves.
• Static checking includes type checking, which
ensures that operators are applied to
compatible operands.

• It also includes any syntactic checks that


remain after parsing.

• For example, static checking assures that a


break-statement in C is enclosed within a
while-, for-, or switch-statement; an error is
reported if such an enclosing statement does
not exist
Types
• The applications of types can be grouped under
checking and translation:

  – Type checking uses logical rules to reason about the behavior of a program at run time.
    • Specifically, it ensures that the types of the operands match the types expected by an operator. For example, the && operator in Java expects its two operands to be booleans; the result is also of type boolean.
  – Translation applications: From the type of a name, a compiler can determine the storage that will be needed for that name at run time.
Storage layout for local names
• From the type of a name, we can determine
the amount of storage that will be needed for
the name at run time.
Type Conversions
• Consider expressions like x + i, where x is of type
float and i is of type integer.

• Since the representation of integers and floating-


point numbers is different within a computer and
different machine instructions are used for
operations on integers and floats, the compiler
may need to convert one of the operands of + to
ensure that both operands are of the same type
when the addition occurs.
• Suppose that integers are converted to floats
when necessary, using a unary operator
(float)
• For example, the integer 2 is converted to a
float in the code for the expression 2 * 3 .14:
t1 = (float) 2;
t2 = t1 * 3.14;
• We introduce another attribute E.type, whose value is either integer or float.
• The rule associated with E → E1 + E2 builds on the pseudocode

  if ( E1.type = integer and E2.type = integer )
      E.type = integer;
  else if ( E1.type = float and E2.type = integer )
      …
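The slide elides the remaining cases; assuming integer and float are the only types involved, the rule presumably continues in the same style:

  else if ( E1.type = integer and E2.type = float )   E.type = float;
  else if ( E1.type = float   and E2.type = float )   E.type = float;
  else                                                E.type = type_error;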
• Type conversion rules vary from language to
language.
• The rules for Java in Fig. 6.25 distinguish
between widening conversions, which are
intended to preserve information, and
narrowing conversions, which can lose
information.
• The widening rules are given by the hierarchy
in Fig. 6.25(a): any type lower in the hierarchy
can be widened to a higher type.
• Thus, a char can be widened to an int or to a
float, but a char cannot be widened to a short.

• The narrowing rules are illustrated by the graph in Fig. 6.25(b): a type s can be narrowed to a type t if there is a path from s to t.
• Note that char, short, and byte are pairwise convertible to each other.
• Conversion from one type to another is said to
be implicit if it is done automatically by the
compiler.
• Implicit type conversions, also called
coercions, are limited in many languages to
widening conversions.
• Conversion is said to be explicit if the
programmer must write something to cause
the conversion.
• Explicit conversions are also called casts.
• The semantic action for checking E → E1 + E2 uses two functions:
  1. max(t1, t2) takes two types t1 and t2 and returns the maximum of the two types in the widening hierarchy. It declares an error if either t1 or t2 is not in the hierarchy.
  2. widen(a, t, w) generates type conversions if needed to widen an address a of type t into a value of type w. It returns a itself if t and w are the same type. Otherwise, it generates an instruction to do the conversion, places the result in a temporary, and returns that temporary as the result.
• Pseudocode for widen, assuming that the only types are integer and float, appears in Fig. 6.26.
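Fig. 6.26 is not reproduced in these notes; a sketch of widen along the lines just described (textbook-style pseudocode, with Temp and gen taken as given helper routines) is:

  Addr widen(Addr a, Type t, Type w)
  {
      if ( t = w ) return a;                 /* no conversion needed */
      else if ( t = integer and w = float ) {
          temp = new Temp();
          gen(temp '=' '(float)' a);         /* emit: temp = (float) a */
          return temp;
      }
      else error;
  }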
Code Optimization Techniques
1. Constant Folding
   – It refers to the technique of evaluating, at compile time itself, expressions whose operands are known to be constant.
   – E.g. a = (22/7) * d
2. Constant Propagation
   – In constant propagation, if a variable is assigned a constant value, then subsequent uses of that variable can be replaced by the constant as long as no intervening assignment has changed the value of the variable.
   – E.g. pi = 3.14
          r = 5
          area = pi * r * r
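To make the effect concrete (numbers worked out here for illustration): in the folding example, 22/7 is evaluated once at compile time, so no division remains in the generated code (note that with integer operands 22/7 would fold to 3; a floating-point constant such as 3.142857 is presumably intended). In the propagation example, the uses of pi and r are replaced by their constant values, giving area = 3.14 * 5 * 5, which constant folding can then reduce to area = 78.5.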
3. Common Sub-expression Elimination
   – This technique eliminates the recomputation of a redundant expression, reusing the previously computed value each time the expression is encountered again.
   – Example:
     Before optimization:        After optimization:
       T1 = 4*i                    T1 = 4*i
       T2 = a[T1]                  T2 = a[T1]
       T3 = 4*j                    T3 = 4*j
       T4 = 4*i                    T5 = n
       T5 = n                      T6 = b[T1] + T5
       T6 = b[T4] + T5
4. Code Movement
   – It is a technique of moving a block of code outside a loop when doing so makes no difference to the result.
   – Example:
     Before optimization:          After optimization:
       for(int i=0;i<n;i++)          x = y + z;
       {                             for(int i=0;i<n;i++)
           x = y + z;                {
           a[i] = 6*i;                   a[i] = 6*i;
       }                             }
5. Dead Code Elimination
   – This method eliminates code statements which are never executed, are unreachable, or whose output is never used.
   – Example:
     Before optimization:          After optimization:
       i = 0;                        i = 0;
       if (i == 1)
       {
           a = x + 5;
       }
6. Strength Reduction
   – It is the replacement of expensive expressions with cheaper and simpler ones.
   – Example:
     Before optimization:   B = A * 2;
     After optimization:    B = A + A;
Basic Blocks and Flow Graphs
• Helps in the identification of loops in the code for the code-optimization process.
• The first job is to partition a sequence of three-address instructions into basic blocks.
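A small illustration (the fragment is made up for this example): in the three-address sequence

  (1) i = 1
  (2) t1 = 4 * i
  (3) a[t1] = 0
  (4) i = i + 1
  (5) if i <= 10 goto (2)
  (6) ...

instruction (1) is a leader (the first instruction), (2) is a leader (the target of the jump in (5)), and (6) is a leader (it immediately follows a jump). The basic blocks are therefore {1}, {2,3,4,5} and {6,…}, and the edge from block {2…5} back to itself in the flow graph identifies the loop.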
Code Generation
Code Generation
• Three primary tasks of a code generator:
  – Instruction selection: choosing appropriate target-machine instructions to implement the IR statements.
  – Register allocation and assignment: deciding what values to keep in which registers.
  – Instruction ordering: deciding in what order to schedule the execution of instructions.
A simple Target Machine Model
• Our target computer models a three-address machine with load and store operations, computation operations, jump operations, and conditional jumps.
• The underlying computer is a byte-addressable machine with n general-purpose registers R0, R1, …, Rn-1.
• We assume the following kinds of instructions are available:
  – Load operations: The instruction LD dst, addr loads the value in location addr into location dst.
  – Store operations: The instruction ST x, r stores the value in register r into the location x.
  – Computation operations of the form OP dst, src1, src2, where OP is an operator like ADD or SUB, and dst, src1, and src2 are locations.
  – Unconditional jumps: The instruction BR L causes control to branch to the machine instruction with label L. (BR stands for branch.)
  – Conditional jumps of the form Bcond r, L, where r is a register, L is a label, and cond stands for any of the common tests on values in the register r.
    • For example, BLTZ r, L causes a jump to label L if the value in register r is less than zero, and allows control to pass to the next machine instruction if not.
Addressing Modes
• We assume our target machine has a variety of addressing modes:
  – In instructions, a location can be a variable name x referring to the memory location that is reserved for x.
  – A location can also be an indexed address of the form a(r), where a is a variable and r is a register.
    • For example, the instruction LD R1, a(R2) has the effect of setting R1 = contents(a + contents(R2)), where contents(x) denotes the contents of the register or memory location represented by x.
  – A memory location can be an integer indexed by a register. For example, LD R1, 100(R2) has the effect of setting R1 = contents(100 + contents(R2)), that is, of loading into R1 the value in the memory location obtained by adding 100 to the contents of register R2.
  – Two indirect addressing modes: *r means the memory location found in the location represented by the contents of register r, and *100(r) means the memory location found in the location obtained by adding 100 to the contents of r.
    • For example, LD R1, *100(R2) has the effect of setting R1 = contents(contents(100 + contents(R2))), that is, of loading into R1 the value in the memory location stored in the memory location obtained by adding 100 to the contents of register R2.
  – Immediate constant addressing mode: The constant is prefixed by #. The instruction LD R1, #100 loads the integer 100 into register R1, and ADD R1, R1, #100 adds the integer 100 into register R1.
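For instance (the register choices are made up for this sketch), the three-address statement x = y + z could be implemented on this machine as:

  LD  R1, y          // R1 = y
  LD  R2, z          // R2 = z
  ADD R1, R1, R2     // R1 = R1 + R2
  ST  x, R1          // x = R1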
Program and Instruction Costs
• Cost of an instruction = 1 + the costs associated with the addressing modes of the operands.
• Addressing modes involving registers have zero additional cost, while those involving a memory location or a constant have an additional cost of one.
• We assume the cost of a target-language program on a given input is the sum of the costs of the individual instructions executed when the program is run on that input.
Example:
• The instruction LD R0, R1 copies the contents of register R1 into register R0. This instruction has a cost of one because no additional memory words are required.
• The instruction LD R0, M loads the contents of memory location M into register R0. The cost is two.
• The instruction LD R1, *100(R2) loads into register R1 the value given by contents(contents(100 + contents(R2))). The cost is two.
• (Total costs of the three alternative code sequences in the accompanying example, not reproduced here: 2 + 2 + 1 + 2 = 7, 2 + 2 + 2 + 2 = 8, and 2 + 2 + 2 + 2 = 8.)