Compiler Design Unit-3
Let us take the input string int a, c to illustrate how inherited attributes
are computed. In the annotated parse tree for this string, the value at each L
node is obtained from T.type of its sibling, which is the lexical value of the
type keyword: int, float or double. Each L node then gives this type to the
identifiers a and c. The type is computed top-down, in a preorder traversal.
The function Enter_type inserts the type of the identifiers a and c into the
symbol table at the corresponding id.entry.
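The top-down flow of the type can be sketched in a few lines of Python. This is
only an illustration under assumptions: the productions D -> T L and
L -> L1 , id | id, the attribute name L.in, and the dictionary used as a symbol
table are not given in the text above and are chosen here for the example.

```python
# Hypothetical symbol table: maps an identifier entry to its type.
symbol_table = {}

def enter_type(entry, type_name):
    """Stand-in for Enter_type: store the type at the identifier's entry."""
    symbol_table[entry] = type_name

def visit_L(ids, inherited_type):
    """Assumed production L -> L1 , id | id; L.in is passed down unchanged."""
    if len(ids) > 1:
        visit_L(ids[:-1], inherited_type)   # L1.in = L.in
    enter_type(ids[-1], inherited_type)     # Enter_type(id.entry, L.in)

def visit_D(type_keyword, ids):
    """Assumed production D -> T L; L.in = T.type (taken from the sibling T)."""
    t_type = type_keyword                   # T.type is the lexical value
    visit_L(ids, t_type)

visit_D("int", ["a", "c"])
print(symbol_table)
```

Both identifiers receive the type int because the same inherited value is
handed to every L node on the way down.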
Evaluation Orders for SDD's
The evaluation order of an SDD (Syntax Directed Definition) describes how the
SDD is evaluated with the help of attributes, dependency graphs, semantic
rules, and S-attributed and L-attributed definitions. SDDs support semantic
analysis in the compiler, so it is important to know how they are evaluated and
in what order. This section assumes basic knowledge of grammars, productions,
parse trees, annotated parse trees, and synthesized and inherited attributes.
Terminologies:
Parse Tree: A parse tree is a tree that represents the syntactic structure
of the input hierarchically, according to the productions of the grammar.
Annotated Parse Tree: An annotated parse tree shows the attribute values
at each node.
Synthesized Attributes: An attribute of a node is synthesized when its value
is computed from the attributes of the node's children.
Inherited Attributes: An attribute of a node is inherited when its value is
computed from the attributes of the node's parent or siblings.
Dependency Graphs:
A dependency graph determines the evaluation order of the attributes at the
nodes of a parse tree. An edge from a first node to a second node means that
the attribute at the second node depends on the attribute at the first node, so
the first must be evaluated before the second. Any linear order of the nodes
that respects every edge is called a topological order.
If the graph contains a cycle, no topological order exists, and there is no way
to evaluate the SDD on that parse tree.
Production Table
No.   Production   Semantic Rules
                   A1.inh = A.syn
3.    A1 ⇢ B       A1.syn = B.syn
The node numbers in the graph give the order in which the associated
attributes are evaluated. An edge in the graph means that the value at its
second node depends on the value at its first node.
Table-1
Node   Attribute
1      digit.lexval
2      digit.lexval
3      digit.lexval
4      B.syn
5      B.syn
6      B.syn
7      A1.syn
8      A.syn
9      A1.inh
10     S.val
Table-2
Edge (from  to)   Semantic rule
1     4           B.syn = digit.lexval
2     5           B.syn = digit.lexval
3     6           B.syn = digit.lexval
4     7           A1.syn = B.syn
8     9           A1.inh = A.syn
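As a small sketch of how a topological order can be computed, the edges listed
in Table-2 can be fed to Python's standard graphlib module (Python 3.9 or
later). Only the five edges that appear in the table are used, so the result is
one admissible order for those edges, not necessarily the numbering of the
original figure. The function name evaluation_order is chosen for this example.

```python
from graphlib import TopologicalSorter, CycleError

nodes = range(1, 11)                                 # nodes 1..10 of Table-1
edges = [(1, 4), (2, 5), (3, 6), (4, 7), (8, 9)]     # (from, to) pairs of Table-2

def evaluation_order(nodes, edges):
    """Return one order in which every node comes after the nodes it depends on."""
    sorter = TopologicalSorter()
    for node in nodes:
        sorter.add(node)
    for first, second in edges:
        sorter.add(second, first)    # second depends on first
    return list(sorter.static_order())

order = evaluation_order(nodes, edges)
print(order)

# A cyclic dependency has no topological order, so evaluation is impossible.
try:
    evaluation_order([1, 2], [(1, 2), (2, 1)])
except CycleError:
    print("cycle found: no evaluation order exists")
```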
S-Attributed Definitions:
An S-attributed SDD has only synthesized attributes. In this type of
definition the semantic rules are placed at the end of the production only, and
evaluation follows bottom-up parsing.
Example: S ⇢ AB { S.x = f(A.x | B.x) }
L-Attributed Definitions:
An L-attributed SDD can have both synthesized and inherited attributes, but the
inherited attributes are restricted: an inherited attribute of a symbol can
take values only from the parent or from the siblings to its left. In this type
of definition, semantic rules can be placed anywhere in the RHS of the
production. Evaluation follows a depth-first, left-to-right traversal of the
parse tree, which is one topological order of the dependency graph.
Example: S ⇢ AB { A.x = S.x + 2 } or S ⇢ AB { B.x = f(A.x | B.x) } or
S ⇢ AB { S.x = f(A.x | B.x) }
Note:
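Every S-attributed grammar is also L-attributed.
L-attributed definitions are evaluated in a depth-first, left-to-right
traversal of the annotated parse tree.
S-attributed definitions are evaluated in the order of the reductions of a
bottom-up parse, that is, the reverse of a rightmost derivation.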
Side effects are program fragments contained within semantic rules. Side
effects in an SDD can be controlled in two ways: permit incidental side effects
that do not constrain attribute evaluation, or constrain the admissible
evaluation orders so that every admissible order produces the same translation.
Applications of Syntax-Directed Translation
Syntax Directed Translation:
SDT is used for semantic analysis. It combines a grammar with semantic actions,
so that the actions are carried out as the parse tree is constructed. The
grammar decides the structure of the input, including which operator has the
highest priority and is therefore applied first, and the semantic action
decides what is done when a production of the grammar is used.
Example:
SDT = Grammar + Semantic Action
Grammar = E -> E1 + E2
Semantic action = if (E1.type != E2.type) then print "type mismatching"
Application of Syntax Directed Translation :
SDT is used for Executing Arithmetic Expression.
In the conversion from infix to postfix expression.
In the conversion from infix to prefix expression.
It is also used for Binary to decimal conversion.
In counting number of Reduction.
In creating a Syntax tree.
SDT is used to generate intermediate code.
In storing information into symbol table.
SDT is commonly used for type checking also.
Example:
The following example shows how SDT is applied. Consider an arithmetic
expression and how an SDT is constructed for it.
Input: 2+3*4
Output: 14
SDT for the above example.
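Translation for Postfix Notation
Production          Semantic Action
E → E(1) + E(2)     E.CODE = E(1).CODE || E(2).CODE || '+'
E → E(1) ∗ E(2)     E.CODE = E(1).CODE || E(2).CODE || '∗'
E → (E(1))          E.CODE = E(1).CODE
E → id              E.CODE = id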
Here, E.CODE represents an attribute, or translation, of grammar symbol E; in
this scheme it holds the postfix form of the expression E. The translation of
the nonterminal on the left of each production is the concatenation (||) of the
translations of the nonterminals on the right, followed by the operator.
In the first production E → E(1) + E(2), the value of the translation E.CODE is
the concatenation of the two translations E(1).CODE and E(2).CODE and the
symbol '+'.
In the second production E → E(1) ∗ E(2), the value of the translation E.CODE
is the concatenation of the two translations E(1).CODE and E(2).CODE and the
symbol '∗'.
Here, concatenation is represented by the symbol (||).
In the third production E → (E(1)), the translation of the parenthesized
expression is the same as that of the unparenthesized expression.
In the fourth production E → id, the translation of any identifier is the
identifier itself.
Following are various attributes or translations for grammar symbol E.
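As one illustration, the attribute E.CODE and a value attribute can both be
computed bottom-up over a small expression tree. The classes Id and BinOp and
the functions code and val are hypothetical names chosen for this sketch; the
tree is built by hand for the input 2+3*4 instead of being produced by a
parser, and the numbers play the role of id.

```python
from dataclasses import dataclass

@dataclass
class Id:
    name: str          # lexical value of the leaf

@dataclass
class BinOp:
    op: str            # '+' or '*'
    left: object       # E(1)
    right: object      # E(2)

def code(node):
    """Synthesized attribute E.CODE: the postfix form of the expression."""
    if isinstance(node, Id):
        return node.name                                  # E -> id
    # E -> E(1) op E(2): E.CODE = E(1).CODE || E(2).CODE || op
    return code(node.left) + " " + code(node.right) + " " + node.op

def val(node):
    """Synthesized value attribute, computed in the same bottom-up order."""
    if isinstance(node, Id):
        return int(node.name)
    left, right = val(node.left), val(node.right)
    return left + right if node.op == "+" else left * right

# 2+3*4: '*' binds tighter than '+', so it sits lower in the tree.
tree = BinOp("+", Id("2"), BinOp("*", Id("3"), Id("4")))
print(code(tree))   # 2 3 4 * +
print(val(tree))    # 14
```

A parenthesized expression adds no node of its own, which mirrors the rule
E.CODE = E(1).CODE for E → (E(1)).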
A syntax tree basically has two variants which are described below:
1. Directed Acyclic Graphs for Expressions (DAG)
2. The Value-Number Method for Constructing DAGs
Expression 2: T1 = T0 +c
Nodes in this array are referred to by the integer index of the record for that
node within the array. Historically, this integer has been called the value
number of the node, or of the expression represented by the node. For example,
the node labeled + has value number 3, while its left and right children have
value numbers 1 and 2, respectively. In practice we may use pointers to records
or references to objects instead of integer indexes, but the reference to a
node would still be called its “value number.” Value numbers help us construct
expression DAGs efficiently if they are stored in an appropriate data
structure.
Algorithm: The value-number method for constructing the nodes of a
Directed Acyclic Graph.
INPUT: Label op, node l, and node r.
OUTPUT: The value number of a node in the array with signature (op, l, r).
METHOD: Search the array for a node M with label op, left child l, and right
child r. If there is such a node, return the value number of M. If not, create
in the array a new node N with label op, left child l, and right child r, and
return its value number.
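A minimal sketch of this method in Python follows. It assumes that a node is
stored as the tuple (op, l, r), that leaves use None for the missing child, and
that value numbers start at 0 (the list index); the function name value_number
is chosen for the example.

```python
nodes = []   # the array of node records; a node's index is its value number

def value_number(op, l, r):
    """Return the value number of the node with signature (op, l, r)."""
    signature = (op, l, r)
    # Search the whole array for an existing node M with this signature.
    for number, existing in enumerate(nodes):
        if existing == signature:
            return number
    # No such node: create a new node N and return its value number.
    nodes.append(signature)
    return len(nodes) - 1

# Building a + a: the leaf for a is created once and then reused.
a1 = value_number("id", "a", None)
a2 = value_number("id", "a", None)
plus = value_number("+", a1, a2)
print(a1, a2, plus)   # 0 0 1
```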
While the algorithm above produces the intended result, searching the full
array every time a node is requested is time-consuming, especially if the array
contains expressions from an entire program. A hash table, in which the nodes
are divided into “buckets,” each of which generally contains only a few nodes,
is a more efficient method. The hash table is one of several data structures
that support dictionaries efficiently. A dictionary is a data type that allows
us to add and remove elements from a set, as well as to detect whether a
particular element is present in the set. A good dictionary data structure,
such as a hash table, executes each of these operations in a constant or
near-constant amount of time, regardless of the size of the set.
To build a hash table for the nodes of a DAG, we require a hash function h that
computes the bucket index for a signature (op, l, r) in such a manner that the
signatures are distributed across buckets and no one bucket gets more than a
fair portion of the nodes. The bucket index h(op, l, r) is computed
deterministically from op, l, and r, so that we can repeat the calculation and
always arrive at the same bucket index for node (op, l, r).
The buckets can be implemented as linked lists, as in the given figure. The
bucket headers are stored in an array indexed by the hash value, and each
header points to the first cell of a list. Each cell in a bucket’s linked list
contains the value number of one of the nodes that hash to that bucket. That
is, node (op, l, r) can be found on the list whose header is at index
h(op, l, r) of the array.
Data structure for searching buckets
Given the input op, l, and r, we calculate the bucket index h(op, l, r) and
search the list of cells in this bucket for the specified input node. There are
usually enough buckets that no list has more than a few cells. However, we may
need to examine all of the cells in a bucket, and for each value number v
found in a cell, we must check whether the signature (op, l, r) of the input
node matches the node with value number v (as in the figure above). If a match
is found, we return v. If we find no match, we create a new cell, add it to the
list of cells for bucket index h(op, l, r), and return the value number in that
new cell.
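The bucket search can be sketched as follows. This is an illustration under
assumptions: the number of buckets (8), the use of Python's built-in hash as
the function h, and Python lists standing in for the linked lists of cells are
all choices made for the example, and value numbers again start at 0.

```python
NUM_BUCKETS = 8                                  # assumed table size
nodes = []                                       # node records, indexed by value number
buckets = [[] for _ in range(NUM_BUCKETS)]       # each bucket holds value numbers

def h(op, l, r):
    """Bucket index for a signature; the same signature gives the same index."""
    return hash((op, l, r)) % NUM_BUCKETS

def value_number(op, l, r):
    """Look up (op, l, r) in its bucket only, creating the node if it is absent."""
    bucket = buckets[h(op, l, r)]
    for v in bucket:
        if nodes[v] == (op, l, r):   # verify the signature of value number v
            return v
    nodes.append((op, l, r))         # no match: create a new node and cell
    v = len(nodes) - 1
    bucket.append(v)
    return v

# DAG for i = i + 10
i = value_number("id", "i", None)
ten = value_number("num", 10, None)
plus = value_number("+", i, ten)
assign = value_number("=", i, plus)
print(i, ten, plus, assign)          # 0 1 2 3
print(value_number("+", i, ten))     # 2, the existing node is reused
```

Only the cells of one bucket are examined per request, so the cost no longer
grows with the total number of nodes as long as the buckets stay short.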
Three-Address Code
Three-address code is a type of intermediate code that is easy to generate and
easy to convert to machine code. Each instruction uses at most three addresses
and one operator to represent an expression, and the value computed by each
instruction is stored in a temporary variable generated by the compiler. The
compiler decides the order of operations, and that order is the one given by
the three-address code.
Conversion
Conversion from one type to another is said to be implicit if it is done
automatically by the compiler. Implicit type conversions are also called
coercions, and many languages limit coercions to conversions that lose no
information.
Example: An integer may be converted to a real, but a real is not converted to
an integer.
Conversion is said to be explicit if the programmer must write something to
cause the conversion.
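The difference between the two kinds of conversion can be sketched with a tiny
type rule. The type names int and real, the table RANK, and the functions
result_type and coerce are assumptions made for this illustration, not part of
any particular compiler.

```python
RANK = {"int": 0, "real": 1}   # assumed widening order: int may widen to real

def result_type(t1, t2):
    """Type of E1 + E2 when the narrower operand is coerced to the wider one."""
    return t1 if RANK[t1] >= RANK[t2] else t2

def coerce(expr_type, target_type):
    """Name of the implicit conversion to insert, or None if none is needed."""
    if expr_type == target_type:
        return None
    if RANK[expr_type] < RANK[target_type]:
        return f"{expr_type}to{target_type}"       # widening is done implicitly
    # Narrowing is not done automatically; the programmer must ask for it.
    raise TypeError(f"cannot implicitly convert {expr_type} to {target_type}")

print(result_type("int", "real"))   # real
print(coerce("int", "real"))        # inttoreal
```

A call such as coerce("real", "int") raises TypeError in this sketch, which
corresponds to a language that accepts that conversion only when written
explicitly.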
Tasks:
1. The type checker has to enforce rules such as “indexing is allowed only on
an array”.
2. It has to check the range of the data types used.
3. INTEGER (int) has a range of -32,768 to +32,767 when stored in 2 bytes.
4. FLOAT has a range of about 1.2E-38 to 3.4E+38 in magnitude.
Dynamic type checking is type checking that is done at run time. In dynamic
type checking, types are associated with values, not variables. In
implementations of dynamically type-checked languages, each runtime object is
generally associated with a type tag, which is a reference to a type containing
its type information. Dynamic typing is more flexible, whereas a static type
system always restricts what can be conveniently expressed. Dynamic typing
results in more compact programs since it is more flexible and does not require
types to be spelled out. Programming with a static type system often requires
more design and implementation effort.
Languages like Pascal and C have static type checking. Static type checking
checks the correctness of the program before it is executed. Its main purpose
is to check that the data types used in assignments, operations and type casts
are correct before the program runs.
Static type checking is also used to determine the amount of memory needed to
store a variable.
The design of the type-checker depends on:
1. Syntactic Structure of language constructs.
2. The Expressions of languages.
3. The rules for assigning types to constructs (semantic rules).
The token stream from the lexical analyzer is passed to the parser, which
generates a syntax tree. Once the program (source code) has been converted into
a syntax tree, the type checker plays a crucial role: by examining the syntax
tree it can tell whether each variable is used with the correct data type. The
type checker checks the tree and, where a modification is needed, modifies it.
It produces a checked syntax tree, and after that intermediate code generation
is done.
Overloading:
Control flow
The control flow is the order in which the computer executes statements in a script.
Code is run in order from the first line in the file to the last line, unless the computer
runs across the (extremely frequent) structures that change the control flow, such as
conditionals and loops.
For example, imagine a script used to validate user data from a webpage form. The
script submits validated data, but if the user, say, leaves a required field empty, the
script prompts them to fill it in. To do this, the script uses a conditional structure
or if...else, so that different code executes depending on whether the form is complete or
not:
if (isEmpty(field)) {
  promptUser();
} else {
  submitForm();
}
A typical script in JavaScript or PHP (and the like) includes many control structures,
including conditionals, loops and functions. Parts of a script may also be set to execute
when events occur.
For example, the above excerpt might be inside a function that runs when the user clicks
the Submit button for the form. The function could also include a loop, which iterates
through all of the fields in the form, checking each one in turn. Looking back at the code
in the if and else sections, the lines promptUser and submitForm could also be calls to other
functions in the script. As you can see, control structures can dictate complex flows of
processing even with only a few lines of code.
Control flow means that when you read a script, you must not only read from start to
finish but also look at program structure and how it affects order of execution.
Three-Address Code
The intermediate code generator receives input from its predecessor phase, the
semantic analyzer, in the form of an annotated syntax tree. That syntax tree can
then be converted into a linear representation, e.g., postfix notation.
Intermediate code tends to be machine-independent code. Therefore, the code
generator assumes an unlimited number of storage locations (registers) when
generating code.
For example:
a = b + c * d;
The intermediate code generator will try to divide this expression into sub-expressions
and then generate the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2
Here r1 and r2 are used as registers in the target program.
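The division into sub-expressions can be sketched with a short recursive
function. This is an illustration under assumptions: the expression is given as
nested tuples (op, left, right) instead of a real syntax tree, and the function
name generate_tac and the temporary names r1, r2, ... are chosen to match the
example above.

```python
import itertools

def generate_tac(target, expr):
    """Return three-address statements for: target = expr."""
    statements = []
    counter = itertools.count(1)          # numbering for temporaries r1, r2, ...

    def emit(node):
        if isinstance(node, str):         # a plain name needs no code
            return node
        op, left, right = node
        left_place = emit(left)           # code for the sub-expressions first
        right_place = emit(right)
        temp = f"r{next(counter)}"
        statements.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    result = emit(expr)
    statements.append(f"{target} = {result}")
    return statements

# a = b + c * d
for line in generate_tac("a", ("+", "b", ("*", "c", "d"))):
    print(line)
# r1 = c * d
# r2 = b + r1
# a = r2
```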
A three-address instruction uses at most three address locations to compute an
expression. Three-address code can be represented in two main forms, quadruples
and triples; indirect triples are a variant of triples.
Quadruples
Each instruction in the quadruples representation is divided into four fields:
operator, arg1, arg2, and result. The above example is represented below in
quadruples format:
Op   arg1   arg2   result
*    c      d      r1
+    b      r1     r2
=    r2            a
Triples
Each instruction in the triples representation has three fields: op, arg1, and
arg2. The result of each sub-expression is denoted by the position of the
instruction that computes it. Triples resemble DAGs and syntax trees, and are
equivalent to a DAG when representing expressions.
      Op   arg1   arg2
(0)   *    c      d
(1)   +    b      (0)
(2)   =    a      (1)
Triples face the problem of code immovability during optimization: the results
are positional, so changing the order or position of an instruction may break
the references to it.
Indirect Triples
This representation is an enhancement over the triples representation. It keeps
a separate list of pointers to the triples, and that list, instead of the
position of a triple in the table, gives the order of the instructions. This
enables the optimizer to freely re-position sub-expressions to produce
optimized code.
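A small sketch shows why the extra level of indirection helps. It assumes that
an integer argument in a triple means "the result of the triple at that index"
and that a string is a name; the second statement x = y - z and the helper
names fmt and listing are hypothetical and added only to have something to
move.

```python
# Triple table for: a = b + c * d ; x = y - z
triples = [
    ("*", "c", "d"),   # (0)
    ("+", "b", 0),     # (1) uses the result of triple (0)
    ("=", "a", 1),     # (2)
    ("-", "y", "z"),   # (3)
    ("=", "x", 3),     # (4)
]

# Indirect triples: the execution order is a separate list of indices.
order = [0, 1, 2, 3, 4]

def fmt(arg):
    """Show a reference to a triple as (n) and a name as itself."""
    return f"({arg})" if isinstance(arg, int) else arg

def listing(order):
    """Render the triples in the order given by the pointer list."""
    return [f"({i}) {triples[i][0]} {fmt(triples[i][1])} {fmt(triples[i][2])}"
            for i in order]

# Moving x = y - z to the front only permutes the list; the triple table,
# and therefore every positional reference inside it, stays untouched.
reordered = [3, 4, 0, 1, 2]
for line in listing(reordered):
    print(line)
```

With plain triples the same move would shift the positions of the
instructions, so every reference such as (0) or (1) would have to be
renumbered.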
Declarations
A variable or procedure has to be declared before it can be used. Declaration
involves allocation of space in memory and entry of the type and name in the
symbol table. A program may be coded and designed keeping the target machine
structure in mind, but it may not always be possible to accurately convert
source code to its target language.
Taking the whole program as a collection of procedures and sub-procedures, it
becomes possible to declare all the names local to the procedure. Memory
allocation is done in a consecutive manner, and names are allocated memory in
the sequence in which they are declared in the program. We use a variable
offset, initially set to zero {offset = 0}, that denotes the base address.
The source programming language and the target machine architecture may vary in
the way names are stored, so relative addressing is used. The first name is
allocated memory starting from memory location 0 {offset = 0}, and the next
name declared is allocated memory immediately after the first one.
Example:
We take the example of C programming language where an integer variable is assigned
2 bytes of memory and a float variable is assigned 4 bytes of memory.
int a;
float b;
Allocation process:
{offset = 0}
int a;
id.type = int
id.width = 2
offset = offset + id.width
{offset = 2}
float b;
id.type = float
id.width = 4
offset = offset + id.width
{offset = 6}
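The allocation process can be sketched as a loop that hands out consecutive
relative addresses. The widths are the ones assumed in the example above
(2 bytes for int, 4 bytes for float); the function name allocate and the
dictionary layout of the symbol table are choices made for this illustration.

```python
WIDTH = {"int": 2, "float": 4}   # widths taken from the example above

def allocate(declarations):
    """Assign each declared name a relative address, in declaration order."""
    offset = 0                    # {offset = 0}: start at the base address
    table = {}
    for type_name, name in declarations:
        width = WIDTH[type_name]
        table[name] = {"type": type_name, "width": width, "offset": offset}
        offset += width           # the next name is placed right after this one
    return table, offset

table, total = allocate([("int", "a"), ("float", "b")])
print(table["a"]["offset"], table["b"]["offset"], total)   # 0 2 6
```

The final offset, 6, is the total storage needed for the two declarations.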