Unit 4
Example
If X is a symbol and a is one of its attributes, we write X.a to denote the value of a at
a particular parse-tree node labeled X. Attributes may be numbers, types, table
references or strings.
E.g.: E → E1 + T { print '+' }
A parser builds parse trees in the syntax analysis phase. A plain parse tree is of no
use to a compiler, as it carries no information about how to evaluate the tree. The
productions of a context-free grammar define the structure of the language, but they do
not say how that structure should be interpreted.
Example: E → E +T
The above CFG production has no semantic rule associated with it, so on its own it cannot
give the production any meaning.
Semantics
The semantics of a language give meaning to its constructs, such as tokens and syntax
structures. Semantics help interpret symbols, their types and their relations with each
other. Semantic analysis judges whether the syntax structure constructed from the source
program derives any meaning or not.
A grammar with semantic rules is called a Syntax Directed Definition.
Syntax Directed Definition = CFG + Semantic rules
For example: int a = "value";
should not cause an error in the lexical or syntax analysis phase, as it is lexically and
structurally correct, but semantic analysis should generate a semantic error because the
type of the assigned value differs from the declared type. These rules are set by the
grammar of the language and evaluated in semantic analysis. The following tasks should be
performed in semantic analysis:
Semantic Errors
We have mentioned some of the semantic errors that the semantic analyser is expected to
recognize:
Type mismatch
Undeclared variable
Reserved identifier misuse.
Multiple declaration of variable in a scope.
Accessing an out of scope variable.
Actual and formal parameter mismatch.
The SDD is a kind of abstract specification. A Syntax Directed Definition is an augmented
context-free grammar: a set of attributes is associated with each terminal and
non-terminal. An attribute may be a number, string, memory location or a type.
Attribute Grammar
An attribute grammar is a special form of context-free grammar in which additional
information (attributes) is attached to one or more of its non-terminals in order to
provide context-sensitive information. Each attribute has a well-defined domain of values,
such as integer, float, character, string, and expressions.
An attribute grammar is a medium for giving semantics to a context-free grammar, and it can
help specify both the syntax and the semantics of a programming language. An attribute
grammar can pass values or information among the nodes of a tree.
Example:
E → E + T { E.value = E.value + T.value }
The right part of the CFG contains the semantic rule that specifies how the grammar
should be interpreted. Here, the values of the non-terminals E and T are added together
and the result is copied to the non-terminal E.
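The rule above can be sketched in code. This is a minimal illustration, not a full attribute evaluator: the `Node` class and the `evaluate` function are hypothetical names introduced here, and leaf values are assumed to be supplied by the lexer.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """Parse-tree node; `value` holds the attribute X.value."""
    symbol: str                                   # grammar symbol, e.g. "E", "T", "+"
    children: List["Node"] = field(default_factory=list)
    value: Optional[int] = None                   # attribute slot

def evaluate(node: Node) -> int:
    """Apply E -> E + T { E.value = E.value + T.value } bottom-up."""
    if not node.children:                         # leaf: value assumed set by the lexer
        return node.value
    if len(node.children) == 3 and node.children[1].symbol == "+":
        left, _, right = node.children
        node.value = evaluate(left) + evaluate(right)
    else:                                         # unit production such as E -> T
        node.value = evaluate(node.children[0])
    return node.value

# Parse tree for 2 + 3 built from E -> E + T and E -> T
tree = Node("E", [
    Node("E", [Node("T", value=2)]),
    Node("+"),
    Node("T", value=3),
])
result = evaluate(tree)
```

After the call, both `result` and `tree.value` hold the sum computed at the root, which is how the attribute value ends up "copied" to the parent E.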
Semantic attributes may be assigned values from their domain at the time of parsing and
evaluated at the time of assignment or conditions. Based on the way the attributes get
their values, they can be broadly divided into two categories: synthesized attributes and
inherited attributes.
Types of Attributes
Synthesized attributes
A synthesized attribute of a parent node gets its value from the attribute values of its
child nodes. To illustrate, assume the following production:
S → ABC
If S takes values from its child nodes (A, B, C), then it is said to be a synthesized
attribute, as the values of A, B and C are synthesized to S.
Likewise, in E → E + T the parent node E gets its value from its child nodes. Synthesized
attributes never take values from their parent nodes or any sibling nodes.
Construct a Syntax Directed Translation for the given grammar:
E → E + T
E → T
T → T * F
T → F
F → num
Solution
Step 1: A context-free grammar together with semantic actions is called a Syntax Directed
Translation.
Semantic actions are written between curly braces.
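As an illustration of Step 1, one plausible set of semantic actions for this grammar computes a numeric attribute `val` for every non-terminal. The actions shown in the comments and the names `tokenize`, `Parser` and `evaluate` are assumptions made for this sketch. The left-recursive rules are handled with loops, which is a common way to write them by hand.

```python
import re

def tokenize(text):
    """Split the input into integer literals and the operators + and *."""
    return re.findall(r"\d+|[+*]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_E(self):
        # E -> E1 + T { E.val = E1.val + T.val } | T { E.val = T.val }
        val = self.parse_T()
        while self.peek() == "+":
            self.pos += 1
            val = val + self.parse_T()
        return val

    def parse_T(self):
        # T -> T1 * F { T.val = T1.val * F.val } | F { T.val = F.val }
        val = self.parse_F()
        while self.peek() == "*":
            self.pos += 1
            val = val * self.parse_F()
        return val

    def parse_F(self):
        # F -> num { F.val = num.lexval }
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        self.pos += 1
        return int(tok)

def evaluate(text):
    parser = Parser(tokenize(text))
    val = parser.parse_E()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return val
```

Because T sits below E in the grammar, `evaluate("2+3*4")` multiplies before it adds, so operator precedence comes from the grammar rather than from the actions.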
Step 2: Abstract Syntax Tree
An abstract syntax tree is a tree built without the non-terminal symbols of the grammar.
Step 4: Annotated Parse Tree / Decorated Parse Tree: The syntax directed definition is
written with a suitable semantic action for the corresponding production rule of the given
grammar.
Productions      Semantic Rules
S → T L          L.in = T.type
T → char         T.type = char
L → L1 , id      L1.in = L.in
L → id           id.entry = L.in
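The flow of the inherited attribute L.in in this table can be sketched as follows. This is a minimal illustration under stated assumptions: the declaration is taken to be something like `char a, b, c`, the symbol table is a plain dictionary, and `declare` and `visit_L` are hypothetical names.

```python
def declare(type_name, identifiers):
    """Simulate S -> T L for a declaration such as `char a, b, c`."""
    symbol_table = {}

    def visit_L(ids, inherited_type):
        # L -> L1 , id : L1.in = L.in, then record the type for id
        # L -> id      : record the type for id
        if len(ids) > 1:
            visit_L(ids[:-1], inherited_type)    # pass L.in down to L1
        symbol_table[ids[-1]] = inherited_type   # id.entry receives L.in

    t_type = type_name             # T -> char : T.type = char
    visit_L(identifiers, t_type)   # S -> T L  : L.in = T.type
    return symbol_table

table = declare("char", ["a", "b", "c"])
```

The type travels from T to L and then down the chain of L nodes, so every identifier ends up with the same type even though the type appears only once in the source text.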
Expansion: When a non-terminal is expanded to terminals as per a grammatical rule.
Dependency Graph
The basic idea behind dependency graphs is for the compiler to look for various kinds of
dependence among statements to prevent their execution in the wrong order, i.e. an order
that changes the meaning of the program. This also helps it identify the parallelizable
components of the program.
For a rule X → YZ with the semantic action X.x = f(Y.y, Z.z), X.x is a synthesized
attribute, and X.x depends upon the attributes Y.y and Z.z.
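A valid evaluation order is any topological order of the dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) on the dependencies of the rule above; the dictionary layout is an assumption chosen for this example.

```python
from graphlib import TopologicalSorter

# Each key depends on the attributes in its set.
# X -> Y Z with X.x = f(Y.y, Z.z) means X.x depends on Y.y and Z.z.
dependencies = {
    "X.x": {"Y.y", "Z.z"},
    "Y.y": set(),
    "Z.z": set(),
}

# static_order() yields every attribute after the ones it depends on.
order = list(TopologicalSorter(dependencies).static_order())
```

Any order in which `X.x` comes after both `Y.y` and `Z.z` is acceptable. If the graph contained a cycle, `TopologicalSorter` would raise `CycleError`, which corresponds to a definition whose attributes cannot be evaluated.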
S-attributed SDT
If an SDT uses only synthesized attributes, it is called an S-attributed SDT. These
attributes are evaluated using S-attributed SDTs that have their semantic actions written
after the production.
An S-attributed definition is one such class of syntax directed definition, with
synthesized attributes only.
Synthesized attributes are evaluated in bottom-up fashion.
A stack is maintained to keep track of the values of the synthesized attributes associated
with the grammar symbols on the parser's stack. This stack is often termed the parser
stack.
As in the above diagram, attributes in S-attributed SDTs are evaluated in bottom-up parsing,
as the values of the parent nodes depend upon the values of the child nodes.
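The role of the stack can be sketched by replaying the actions of a bottom-up parser. The action sequence below is written by hand for the input 2 + 3 * 4 and stands in for a real LR parser, which is an assumption of this sketch; `run_value_stack` is a hypothetical name.

```python
def run_value_stack(actions):
    """Replay shift/reduce actions, keeping synthesized values on a stack."""
    stack = []
    for action in actions:
        if action[0] == "shift":
            stack.append(action[1])       # lexical value (None for operators)
        elif action[0] == "reduce":
            production = action[1]
            if production == "E -> E + T":
                t_val = stack.pop()
                stack.pop()               # discard the slot of '+'
                e_val = stack.pop()
                stack.append(e_val + t_val)
            elif production == "T -> T * F":
                f_val = stack.pop()
                stack.pop()               # discard the slot of '*'
                t_val = stack.pop()
                stack.append(t_val * f_val)
            # Unit productions (E -> T, T -> F, F -> num) copy the value,
            # so the top of the stack is already correct.
    return stack

actions = [
    ("shift", 2), ("reduce", "F -> num"), ("reduce", "T -> F"), ("reduce", "E -> T"),
    ("shift", None),                      # '+'
    ("shift", 3), ("reduce", "F -> num"), ("reduce", "T -> F"),
    ("shift", None),                      # '*'
    ("shift", 4), ("reduce", "F -> num"),
    ("reduce", "T -> T * F"),
    ("reduce", "E -> E + T"),
]
final_stack = run_value_stack(actions)
```

Each reduction pops the values of the right-hand side and pushes the value of the left-hand side, so a semantic action placed at the end of a production always finds its operands on top of the stack.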
L-attributed SDT
This form of SDT uses both synthesized and inherited attributes, with the restriction that
an attribute may not take values from right siblings.
In L-attributed SDTs, a non-terminal can get values from its parent, child, and sibling
nodes, as in the following production.
Example: S → ABC
S can take values from A, B, and C (synthesized). A can take values from S only. B can
take values from S and A. C can get values from S, A, and B. No non-terminal can get
values from the sibling to its right.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right manner.
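The restriction on right siblings can be checked mechanically. The sketch below is a simplified check for a single production: it only looks at which symbols an inherited attribute reads, and the function name `is_l_attributed` and the `rules` layout are assumptions made for this example.

```python
def is_l_attributed(head, body, rules):
    """Check the L-attributed restriction for one production.

    head  : left-hand-side symbol, e.g. "S"
    body  : right-hand-side symbols, e.g. ["A", "B", "C"]
    rules : maps a body symbol to the set of symbols whose attributes
            its inherited attribute reads
    """
    for symbol, sources in rules.items():
        position = body.index(symbol)
        allowed = {head} | set(body[:position])   # parent and left siblings only
        if not set(sources) <= allowed:
            return False
    return True

body = ["A", "B", "C"]
ok = is_l_attributed("S", body, {"A": {"S"}, "B": {"S", "A"}, "C": {"S", "A", "B"}})
bad = is_l_attributed("S", body, {"A": {"S", "B"}})   # A reads its right sibling B
```

The first call matches the S → ABC example: every symbol reads only from S or from symbols to its left. The second call fails because A would need a value from B before B has been visited.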
We may conclude that if a definition is S-attributed, then it is also L-attributed, as the
class of L-attributed definitions encloses the S-attributed definitions.
Intermediate Code: Compilers generate an easy-to-represent form of the source language,
called the intermediate language.
Types of Intermediate code
Intermediate code is broadly divided into two forms: linear form and tree form.
Linear form is further divided into two types: postfix form and three-address code.
Tree form is also divided into two types: Abstract Syntax Tree and Directed Acyclic
Graph (DAG).
Postfix or Reverse Polish Notation: In this form each operator is associated with its
corresponding operands. This is the most natural representation for expression
evaluation. In this notation the operands are placed first and the operator follows them.
Example: (a + b) * (c + d)
a b + c d + *
Convert the given expression into postfix notation / reverse Polish notation
Expression: (a + (b * c)) ^ d - e / (f + g)
t1 = b c *
t2 = a t1 +
t3 = t2 d ^
t4 = f g +
t5 = e t4 /
t6 = t3 t5 -
Result: a b c * + d ^ e f g + / -
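The conversion can be automated with an operator stack. The sketch below follows the well-known shunting-yard approach, which is named here as a substitute technique and not as the method used in these notes. It assumes single-character operands, the ASCII operators + - * / ^, and that ^ is right-associative with the highest precedence.

```python
PRECEDENCE = {"^": 3, "*": 2, "/": 2, "+": 1, "-": 1}
RIGHT_ASSOCIATIVE = {"^"}

def to_postfix(expression):
    """Convert an infix expression with single-character operands to postfix."""
    output, operators = [], []
    for ch in expression.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)                     # operands go straight to the output
        elif ch == "(":
            operators.append(ch)
        elif ch == ")":
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()                       # drop the "("
        else:
            # Pop operators that bind at least as tightly (strictly tighter
            # when the incoming operator is right-associative).
            while operators and operators[-1] != "(" and (
                PRECEDENCE[operators[-1]] > PRECEDENCE[ch]
                or (PRECEDENCE[operators[-1]] == PRECEDENCE[ch]
                    and ch not in RIGHT_ASSOCIATIVE)
            ):
                output.append(operators.pop())
            operators.append(ch)
    while operators:
        output.append(operators.pop())
    return " ".join(output)

first = to_postfix("(a + b) * (c + d)")
second = to_postfix("(a + (b * c)) ^ d - e / (f + g)")
```

Here `first` is `a b + c d + *` and `second` is `a b c * + d ^ e f g + / -`, matching the two worked examples.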
Three-address code is a type of intermediate code that is easy to generate and can be
easily converted to machine code. It uses at most three addresses and one operator to
represent an expression, and the value computed by each instruction is stored in a
temporary variable generated by the compiler. The order of operations is fixed by the
sequence of three-address instructions that the compiler emits.
Each instruction should have at most three addresses.
Each instruction should have only one operator on its right side.
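A minimal sketch of how such instructions could be generated from an expression tree is given below. The nested-tuple tree format, the `uminus` operator name for unary minus, and the function name `generate_tac` are assumptions made for this example, which uses the expression a = b * -c + b * -c.

```python
import itertools

def generate_tac(expr, code, temps):
    """Emit three-address instructions for `expr`; return the name holding its value.

    expr is a variable name (str), ("uminus", operand), or (op, left, right).
    """
    if isinstance(expr, str):
        return expr                               # a variable needs no instruction
    if len(expr) == 2:                            # unary operator
        op, operand = expr
        arg = generate_tac(operand, code, temps)
        target = f"t{next(temps)}"                # fresh compiler temporary
        code.append(f"{target} = {op} {arg}")
        return target
    op, left, right = expr
    arg1 = generate_tac(left, code, temps)
    arg2 = generate_tac(right, code, temps)
    target = f"t{next(temps)}"
    code.append(f"{target} = {arg1} {op} {arg2}")
    return target

# a = b * -c + b * -c
expr = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
code = []
result = generate_tac(expr, code, itertools.count(1))
code.append(f"a = {result}")
```

Every emitted instruction has one operator and at most three names, and the post-order walk of the tree determines the order in which the temporaries t1 to t5 are created.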
Implementation of Three Address Code
There are 3 representations of three address code.
1. Quadruple
2. Triples
3. Indirect Triples
Quadruple
It is a structure that consists of 4 fields, namely op, arg1, arg2 and result. op denotes
the operator, arg1 and arg2 denote the two operands, and result is used to store the
result of the expression.
Advantages
Easy to rearrange code for global optimization.
One can quickly access value of temporary variables using symbol table.
Disadvantages
Contains a lot of temporaries.
Temporary variable creation increases time and space complexity.
Example: Consider the expression a = b * -c + b * -c.
The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
#     op       arg1   arg2   result
(0)   uminus   c             t1
(1)   *        b      t1     t2
(2)   uminus   c             t3
(3)   *        b      t3     t4
(4)   +        t2     t4     t5
(5)   =        t5            a
Quadruple Representation
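The quadruple table maps directly onto a list of four-field records. The sketch below stores the table with a `namedtuple` and interprets it over sample values. The `Quad` type, the `run` function and the values chosen for b and c are assumptions made for this illustration.

```python
from collections import namedtuple

Quad = namedtuple("Quad", ["op", "arg1", "arg2", "result"])

quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]

def run(quads, env):
    """Interpret quadruples over a name -> value environment."""
    env = dict(env)                       # work on a copy of the caller's values
    for q in quads:
        if q.op == "uminus":
            env[q.result] = -env[q.arg1]
        elif q.op == "=":
            env[q.result] = env[q.arg1]
        elif q.op == "*":
            env[q.result] = env[q.arg1] * env[q.arg2]
        elif q.op == "+":
            env[q.result] = env[q.arg1] + env[q.arg2]
    return env

final = run(quads, {"b": 3, "c": 2})
```

Because every quadruple names its own result, the instructions can be reordered freely as long as each temporary is still computed before it is used.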
Triples
This representation does not use an extra temporary variable for each operation; instead,
when a reference to another triple's value is needed, a pointer to that triple is used.
So it consists of only three fields, namely op, arg1 and arg2.
Disadvantages
Temporaries are implicit, and it is difficult to rearrange code.
It is difficult to optimize because optimization involves moving intermediate code.
When a triple is moved, any other triple referring to it must also be updated. With the
help of a pointer, one can directly access a symbol table entry.
Example: Consider the expression a = b * -c + b * -c
#     op       arg1   arg2
(0)   uminus   c
(1)   *        b      (0)
(2)   uminus   c
(3)   *        b      (2)
(4)   +        (1)    (3)
(5)   =        a      (4)
Triples Representation
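A sketch of how triples refer to one another by position is given below. The encoding is an assumption made for this example: an integer argument stands for the result of the triple at that position, and a string names a variable. The `run_triples` function and the sample values are also hypothetical.

```python
# Each triple is (op, arg1, arg2).
triples = [
    ("uminus", "c", None),   # (0)
    ("*", "b", 0),           # (1)
    ("uminus", "c", None),   # (2)
    ("*", "b", 2),           # (3)
    ("+", 1, 3),             # (4)
    ("=", "a", 4),           # (5)
]

def run_triples(triples, env):
    """Interpret triples; results[i] holds the value computed by triple (i)."""
    env = dict(env)
    results = []

    def value(arg):
        return results[arg] if isinstance(arg, int) else env[arg]

    for op, arg1, arg2 in triples:
        if op == "uminus":
            results.append(-value(arg1))
        elif op == "*":
            results.append(value(arg1) * value(arg2))
        elif op == "+":
            results.append(value(arg1) + value(arg2))
        elif op == "=":
            env[arg1] = value(arg2)       # arg1 is the assignment target
            results.append(env[arg1])
    return env

final = run_triples(triples, {"b": 3, "c": 2})
```

The positions play the role of the temporaries, which is why moving a triple forces every reference to it to be renumbered.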
Indirect Triples
This representation uses a separately stored list of pointers to the triples, which gives
the order of the computations. It is similar in utility to the quadruple representation
but requires less space.
Temporaries are implicit, and it is easier to rearrange code.
Example: Consider the expression a = b * -c + b * -c
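A minimal sketch of the idea for this expression follows. Here the pointers are list indices into a triple table, which is an assumption of the sketch, as are the names `instruction_list`, `reordered` and `listing`.

```python
# Triple table: (op, arg1, arg2); an int argument refers to another triple.
triples = [
    ("uminus", "c", None),   # (0)
    ("*", "b", 0),           # (1)
    ("uminus", "c", None),   # (2)
    ("*", "b", 2),           # (3)
    ("+", 1, 3),             # (4)
    ("=", "a", 4),           # (5)
]

# The instruction list holds pointers (here, indices) into the triple table.
instruction_list = [0, 1, 2, 3, 4, 5]

# Moving the second uminus ahead of the first multiplication only
# permutes the pointer list; no triple has to be rewritten.
reordered = [0, 2, 1, 3, 4, 5]

def listing(order, triples):
    """Return the triples in execution order."""
    return [triples[i] for i in order]

original_program = listing(instruction_list, triples)
reordered_program = listing(reordered, triples)
```

The references inside the triples, such as `("*", "b", 2)`, stay valid after reordering because they point into the table and not into the execution order.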
Abstract Syntax Tree: It is a condensed / compact version of the parse tree. In this tree
no symbols are present except terminals.
Construct an Abstract Syntax Tree for the given expression
Input string: x = -a * b + -a * b
Abstract Syntax Tree
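The tree for this input string can also be built in code. The sketch below is a small hand-written parser that returns the tree as nested tuples. The tuple format, the `uminus` label for unary minus and the function name `parse_assignment` are assumptions made for this illustration.

```python
import re

def parse_assignment(text):
    """Build an AST (nested tuples) for `id = expr` with +, * and unary -."""
    tokens = re.findall(r"[A-Za-z_]\w*|[-+*=]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        if peek() == "-":
            advance()
            return ("uminus", factor())   # unary minus binds tightest
        return advance()                  # identifier leaf

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if advance() != "=":
        raise SyntaxError("expected '='")
    return ("=", target, expr())

ast = parse_assignment("x = -a*b+-a* b")
```

Only operators and identifiers appear in the result. The non-terminals used while parsing (expr, term, factor) leave no trace in the tree, which is what makes it more compact than the parse tree.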
integers to integers. Then, the assignment
The first two lines compute the value of the expression a[i] into temporary t2.
D → define T id ( F ) { S }
F → ε | T id , F
S → return E ;
E → id ( A )
A → ε | E , A
Adding functions to the source language
The productions allow function definitions and function calls. Non-terminals D and T
generate declarations and types, respectively. A function definition generated by D
consists of the keyword define, a return type, the function name, formal parameters in
parentheses and a function body consisting of a statement.
Directed Acyclic Graph :
The Directed Acyclic Graph (DAG) is used to represent the structure of basic blocks, to visualize the flow of
values between basic blocks, and to support optimization within a basic block. To apply an
optimization technique to a basic block, a DAG is built from the three-address code generated by
intermediate code generation.
Directed acyclic graphs are a type of data structure, and they are used to apply transformations to basic
blocks.
The Directed Acyclic Graph (DAG) facilitates the transformation of basic blocks.
DAG is an efficient method for identifying common sub-expressions.
It demonstrates how the statement’s computed value is used in subsequent statements.
Examples of directed acyclic graph :
T0 = a + b      (Expression 1)
T1 = T0 + c     (Expression 2)
d = T0 + T1     (Expression 3)
Expression 1 : T0 = a + b
Expression 3 : d = T0 + T1
Application of Directed Acyclic Graph:
Directed acyclic graph determines the subexpressions that are commonly used.
Directed acyclic graph determines the names used within the block as well as the names computed outside
the block.
Determines which statements in the block may have their computed value used outside the block.
Code can be represented by a Directed acyclic graph that describes the inputs and outputs of each of the
arithmetic operations performed within the code; this representation allows the compiler to perform
common subexpression elimination efficiently.
Several programming languages describe value systems that are linked together by a directed acyclic graph.
When one value changes, its successors are recalculated; each value in the DAG is evaluated as a function
of its predecessors.
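The use of a DAG for finding common subexpressions can be sketched as follows. The idea is to create a node only when no node with the same operator and the same children exists yet. The nested-tuple expression format and the name `build_dag` are assumptions made for this example, which uses the expression b * -c + b * -c.

```python
def build_dag(expr, nodes):
    """Return the node id for `expr`, reusing an existing node when one matches.

    expr  : variable name (str) or a tuple (op, operand, ...)
    nodes : dict mapping a node's signature to its id
    """
    if isinstance(expr, str):
        key = ("leaf", expr)
    else:
        op, *operands = expr
        key = (op, *(build_dag(operand, nodes) for operand in operands))
    if key not in nodes:              # first occurrence: create the node
        nodes[key] = len(nodes)
    return nodes[key]                 # repeated occurrence: share the node

# b * -c + b * -c : both operands of + are the same subexpression
expr = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
nodes = {}
root = build_dag(expr, nodes)
```

The syntax tree for this expression has nine nodes, while the DAG has five: both operands of + resolve to the same node, so b * -c needs to be computed only once.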