CD Unit 3


UNIT 3

Syntax Directed Definition


Syllabus: Syntax-Directed Translation: Syntax-Directed Definitions, Evaluation Orders for SDD's,
Applications of Syntax-Directed Translation, Syntax-Directed Translation Schemes, and Implementing
L-Attributed SDD's. Intermediate-Code Generation: Variants of Syntax Trees, Three-Address Code, Types and
Declarations, Type Checking, Control Flow, Backpatching, Switch-Statements, Intermediate Code for
Procedures.

========================================================================

Syntax Directed Definition


A syntax-directed definition (SDD) is a context-free grammar with attributes and rules.
Attributes are associated with grammar symbols and rules are associated with
productions.

Notation
• If X is a symbol and a is one of its attributes, we write X.a to denote the value of a at
a particular parse-tree node labelled X. Attributes may be numbers, types, table
references or strings.

Syntax Directed Definition: It is a kind of abstract specification. Productions together
with semantic rules are known as a Syntax Directed Definition.

Eg: E → E1 + T        E.value = E1.value || T.value || '+'
(here || denotes string concatenation, so E.value holds the postfix form of the expression)

Syntax Directed Translations: Productions with semantic actions embedded within
production bodies.

Eg: E → E1 + T { print '+' }

A parser builds parse trees in the syntax analysis phase. A plain parse tree is of little
use to a compiler on its own, as it carries no information about how to evaluate the tree. The
productions of the context-free grammar that define the rules of the language do not
say how to interpret them.

Example: E → E + T

The above CFG production has no semantic rule associated with it, so by itself it gives
no meaning to the construct it describes.

Semantics
The semantics of a language give meaning to its constructs, such as tokens and syntax
structures. Semantics help interpret symbols, their types and their relations with each
other. Semantic analysis judges whether the syntax structure constructed in the source
program derives any meaning or not.
A grammar with semantic rules is called a Syntax Directed Definition.

Syntax Directed Definitions = CFG + Semantic rules
For example: int a = “value”;
should not raise an error in the lexical or syntax analysis phase, as it is lexically and
structurally correct, but semantic analysis should report a semantic error because the type
of the assigned value differs from the declared type. These rules are set by the grammar of
the language and evaluated in semantic analysis.
Semantic Errors
Some of the semantic errors that the semantic analyser is expected
to recognize are:

• Type mismatch
• Undeclared variable
• Reserved identifier misuse.
• Multiple declaration of variable in a scope.
• Accessing an out of scope variable.
• Actual and formal parameter mismatch.

The SDD is a kind of abstract specification. A Syntax Directed Definition is an augmented
context-free grammar: a set of attributes is associated with each
terminal and non-terminal. An attribute may be a number, string, memory location or
a type.

The conceptual view of syntax directed translation is: input string → parse tree →
dependency graph → evaluation order for semantic rules.

Attribute Grammar
An attribute grammar is a special form of context-free grammar in which additional
information (attributes) is attached to one or more of its non-terminals in
order to provide context-sensitive information. Each attribute has a well-defined domain
of values, such as integer, float, character, string, and expressions.
An attribute grammar is a medium for providing semantics to a context-free grammar and it
can help to specify the syntax and semantics of a programming language. An attribute
grammar can pass values or information among the nodes of a tree.
Example:
E → E + T { E.value = E.value + T.value }
The right part of the line contains the semantic rule that specifies how the production
should be interpreted. Here, the values of non-terminals E and T are added together
and the result is copied to the non-terminal E.
Semantic attributes may be assigned values from their domain at the time of
parsing and evaluated at the time of assignment or conditions. Based on the way the
attributes get their values, they can be broadly divided into two categories: synthesized
attributes and inherited attributes.

Synthesized attributes
A synthesized attribute of a parent node gets its value from the attribute values of its child nodes. To
illustrate, assume the following production:
S → A B C
If S takes values from its child nodes (A, B, C), then it is said to be a synthesized
attribute, as the values of A, B and C are synthesized to S.
In E → E + T, the parent node E gets its value from its child nodes. Synthesized attributes
never take values from their parent nodes or any sibling nodes.
Construct a Syntax Directed Translation for the given grammar
E → E + T
E → T
T → T * F
T → F
F → num

Solution
Step 1: A context-free grammar together with semantic actions is called a Syntax Directed
Translation.
Semantic actions are written between curly braces.
Context Free Grammar     Semantic Actions
E → E + T                { E.value = E.value + T.value }
E → T                    { E.value = T.value }
T → T * F                { T.value = T.value * F.value }
T → F                    { T.value = F.value }
F → num                  { F.value = num.lexicalvalue }
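The semantic actions above can be tried out with a small evaluator. The following is only a sketch: the tuple encoding of the parse tree and the names evaluate and tree are assumptions made for illustration, and the unit productions E → T and T → F are folded away because they only copy a value upwards.

```python
# A parse-tree node is a tuple: ("num", n), ("add", left, right) or ("mul", left, right).
# This encoding is an assumption made for this sketch.

def evaluate(node):
    """Return the synthesized attribute `value` of a parse-tree node."""
    kind = node[0]
    if kind == "num":   # F -> num      { F.value = num.lexicalvalue }
        return node[1]
    if kind == "add":   # E -> E + T    { E.value = E.value + T.value }
        return evaluate(node[1]) + evaluate(node[2])
    if kind == "mul":   # T -> T * F    { T.value = T.value * F.value }
        return evaluate(node[1]) * evaluate(node[2])
    raise ValueError(f"unknown node kind: {kind}")

# Parse tree for 2 + 3 * 4; children are evaluated before their parent.
tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
result = evaluate(tree)
print(result)
```

Each call returns only after its children have been evaluated, which is the bottom-up order that synthesized attributes require.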

Step 2: Abstract Syntax Tree
An abstract syntax tree is built without the non-terminal symbols: interior nodes are
operators and leaves are operands.

Figure: Abstract Syntax Tree

Step 3: Constructing the parse tree for the above grammar

Step 4: Annotated Parse Tree / Decorated Parse Tree: The parse tree is annotated by applying
the semantic action of the corresponding production rule of the
given grammar at each node, so that every node shows its attribute values.

The annotated tree is also called a decorated parse tree.

Figure: Annotated Parse Tree

Inherited attributes
Unlike synthesized attributes, inherited attributes can take values from the parent and/or
siblings. As in the following production,
S → A B C
A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can
take values from S, A, and B.
Annotate the parse tree for the computation of inherited attributes for the given string
int a, b, c.
Solution
Step 1: The Syntax Directed Definition for the given grammar.

Productions        Semantic Rules
S → T L            L.in = T.type
T → int            T.type = int
T → float          T.type = float
T → char           T.type = char
T → double         T.type = double
L → L1 , id        L1.in = L.in ; id.entry = L.in
L → id             id.entry = L.in
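To see how the inherited attribute travels down the tree for int a, b, c, the rules can be mimicked with a short recursive function. This is a sketch under assumptions: the names declare and visit_list and the use of a dictionary as the symbol table are illustrative choices, not part of the grammar.

```python
def declare(type_name, identifiers):
    """Mimic S -> T L: pass T.type down the identifier list as the inherited attribute L.in."""
    symbol_table = {}

    def visit_list(ids, inherited_type):
        # L -> L1 , id : L1 inherits the same type, then the last id is entered.
        if len(ids) > 1:
            visit_list(ids[:-1], inherited_type)
        # L -> id (and the id in L -> L1 , id): record the inherited type for id.
        symbol_table[ids[-1]] = inherited_type

    visit_list(identifiers, type_name)
    return symbol_table

table = declare("int", ["a", "b", "c"])
print(table)
```

The type is computed once at T and then flows sideways and downwards, so every identifier receives it without being re-derived.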

Expansion: When a non-terminal is expanded to terminals as per a grammatical rule.

Reduction: When a terminal is reduced to its corresponding non-terminal according to
grammar rules. Syntax trees are parsed top-down and left to right. Whenever a reduction
occurs, we apply its corresponding semantic rules.
Semantic analysis uses Syntax Directed Translations to perform the above tasks.
The semantic analyser receives the AST (Abstract Syntax Tree) from its previous stage, syntax
analysis.
The semantic analyser attaches attribute information to the AST, which is then called an
attributed AST.
Attributes are two-tuple values, <attribute name, attribute value>. For example, for
int value = 5;
the attributes are
<type, “integer”>
<presentvalue, “5”>
For every production, we attach a semantic rule.

Dependency Graph
The basic idea behind dependency graphs is for the compiler to look for the various kinds of
dependence among statements, so that they are not executed in a wrong order, i.e. an order
that changes the meaning of the program. This also helps it to identify the parallelisable
components in the program.
In an SDD, the dependency graph records which attribute values must be computed before
others. If, for the rule X → Y Z, the semantic action is X.x = f(Y.y, Z.z), then X.x is a
synthesized attribute and X.x depends upon the attributes Y.y and Z.z, so Y.y and Z.z must be
evaluated first.
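An evaluation order that respects such dependencies is a topological order of the graph. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the dictionary layout, which maps each attribute to the set of attributes it depends on, and the name evaluation_order are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter

def evaluation_order(dependencies):
    """Return the attributes in an order where each one follows everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())

# X -> Y Z with X.x = f(Y.y, Z.z): X.x depends on Y.y and Z.z.
dependencies = {"X.x": {"Y.y", "Z.z"}, "Y.y": set(), "Z.z": set()}
order = evaluation_order(dependencies)
print(order)

# A circular dependency has no valid evaluation order.
try:
    evaluation_order({"A.a": {"B.b"}, "B.b": {"A.a"}})
    has_cycle = False
except CycleError:
    has_cycle = True
```

The relative order of Y.y and Z.z is not fixed, since neither depends on the other; only their position before X.x is guaranteed.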

S-attributed SDT
If an SDT uses only synthesized attributes, it is called an S-attributed SDT. S-attributed
SDTs have their semantic actions written at the end of the production (after the right-hand side).
• An S-attributed definition is a syntax directed definition that uses
synthesized attributes only.
• Synthesized attributes are evaluated in bottom-up fashion.
• A stack is maintained to keep track of the values of the synthesized attributes associated
with the grammar symbols on the parser's stack. This stack is often termed the parser
stack.

Attributes in S-attributed SDTs are evaluated during bottom-up
parsing, as the values of the parent nodes depend upon the values of the child nodes.

L-attributed SDT
This form of SDT uses both synthesized and inherited attributes, with the restriction that
inherited attributes do not take values from right siblings.
In L-attributed SDTs, a non-terminal can get values from its parent, its children, and its left
sibling nodes. As in the following production
Example S → A B C
S can take values from A, B, and C (synthesized). A can take values from S only. B can
take values from S and A. C can get values from S, A, and B. No non-terminal can get
values from a sibling to its right.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right
manner.
We may conclude that if a definition is S-attributed, then it is also L-attributed, as the
class of L-attributed definitions encloses the S-attributed definitions.
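The restriction can be checked mechanically for a single production. The following is a simplified sketch: the function name is_l_attributed and the rule format (target symbol, list of source symbols) are assumptions, attributes are tracked per symbol rather than per attribute name, and the body symbols are assumed to be distinct.

```python
def is_l_attributed(head, body, rules):
    """Check the L-attributed restriction for one production head -> body.

    rules is a list of (target, sources) pairs: the symbol whose attribute is
    defined and the symbols whose attributes the rule reads.
    """
    for target, sources in rules:
        if target == head:
            continue  # synthesized attribute of the head: any child may contribute
        position = body.index(target)
        allowed = {head, *body[:position]}  # the parent and the left siblings only
        if not set(sources) <= allowed:
            return False
    return True

# S -> A B C: B reads S and A (allowed); A reading B would use a right sibling.
ok = is_l_attributed("S", ["A", "B", "C"], [("B", ["S", "A"]), ("S", ["A", "B", "C"])])
bad = is_l_attributed("S", ["A", "B", "C"], [("A", ["B"])])
print(ok, bad)
```

A definition with only head targets passes trivially, which reflects the statement that every S-attributed definition is also L-attributed.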
Intermediate Code: Compilers translate the source program into a form that is easy to
represent; this form is called the intermediate language.

Types of Intermediate code

Intermediate code is broadly divided into two forms: linear form and tree form.
The linear form is again divided into two types: postfix form and three-address code.
The tree form is also divided into two types: Abstract Syntax Tree and Directed Acyclic
Graph (DAG).

Postfix or Reverse Polish Notation: In this form each operator is associated with its
corresponding operands. It is a natural representation for expression
evaluation, since no parentheses are needed. In this notation the operands are written first
and the operator is placed after them.
Example: (a + b) * (c + d)
a b + c d + *
Convert the given expression into postfix notation
Expression: (a + (b * c)) ^ d - e / (f + g)
t1 = b c *
t2 = a t1 +
t3 = t2 d ^
t4 = f g +
t5 = e t4 /
t6 = t3 t5 -
Substituting the temporaries gives the postfix form: a b c * + d ^ e f g + / -
Three address code is a type of intermediate code which is easy to generate and
can be easily converted to machine code. It makes use of at most three addresses and
one operator to represent an expression, and the value computed by each instruction is
stored in a temporary variable generated by the compiler. The sequence of three-address
instructions fixes the order in which the operations are performed.
• Each instruction should have at most three addresses.
• Each instruction should have at most one operator on its right side.
Implementation of Three Address Code
There are 3 representations of three address code.
1. Quadruple
2. Triples
3. Indirect Triples

Quadruple
It is a structure which consists of 4 fields, namely op, arg1, arg2 and result. op denotes the
operator, arg1 and arg2 denote the two operands, and result is used to store the
result of the expression.
Advantages
• Easy to rearrange code for global optimization.
• One can quickly access the value of temporary variables using the symbol table.
Disadvantages
• Contains a lot of temporaries.
• Temporary variable creation increases time and space complexity.
Example: Consider the expression a = b * - c + b * - c.
The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5

#     op       arg1   arg2   result
(0)   uminus   c             t1
(1)   *        b      t1     t2
(2)   uminus   c             t3
(3)   *        b      t3     t4
(4)   +        t2     t4     t5
(5)   =        t5            a
Quadruple Representation
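Quadruples like these can be produced by a post-order walk over the expression tree, creating one temporary per operator. The sketch below is illustrative only: the nested-tuple tree encoding and the name generate_quadruples are assumptions, and an empty string stands for an unused arg2.

```python
from itertools import count

def generate_quadruples(target, tree):
    """Emit (op, arg1, arg2, result) quadruples for `target = tree`."""
    quads = []
    temporaries = count(1)

    def emit(node):
        if isinstance(node, str):            # a name is its own address
            return node
        op, *operands = node
        args = [emit(child) for child in operands]   # operands first (post-order)
        result = f"t{next(temporaries)}"
        quads.append((op, args[0], args[1] if len(args) > 1 else "", result))
        return result

    value = emit(tree)
    quads.append(("=", value, "", target))
    return quads

# a = b * -c + b * -c
tree = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
quads = generate_quadruples("a", tree)
for index, quad in enumerate(quads):
    print(index, quad)
```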
Triples
This representation does not use an extra temporary variable to represent a single
operation; instead, when a reference to another triple's value is needed, a pointer to that
triple is used. So it consists of only three fields, namely op, arg1 and arg2.
Disadvantages
• Temporaries are implicit, which makes it difficult to rearrange code.
• It is difficult to optimize because optimization involves moving intermediate code.
When a triple is moved, any other triple referring to it must be updated as well.
With the help of a pointer, one can directly access a symbol table entry.
Example: Consider the expression a = b * - c + b * - c

#     op       arg1   arg2
(0)   uminus   c
(1)   *        b      (0)
(2)   uminus   c
(3)   *        b      (2)
(4)   +        (1)    (3)
(5)   =        a      (4)
Triples Representation
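A triple table can be generated by a post-order walk in which the position of a triple stands for its value. This is a sketch under assumptions: the nested-tuple tree encoding and the name generate_triples are illustrative, integer entries denote references to earlier triples, and None marks an unused arg2.

```python
def generate_triples(target, tree):
    """Emit (op, arg1, arg2) triples for `target = tree`."""
    triples = []

    def emit(node):
        if isinstance(node, str):            # a name refers to the symbol table
            return node
        op, *operands = node
        args = [emit(child) for child in operands]
        triples.append((op, args[0], args[1] if len(args) > 1 else None))
        return len(triples) - 1              # the triple's position is its value

    value = emit(tree)
    triples.append(("=", target, value))
    return triples

# a = b * -c + b * -c
tree = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
triples = generate_triples("a", tree)
for index, triple in enumerate(triples):
    print(index, triple)
```

Because operands are stored as positions, moving a triple would require renumbering every reference to it, which is the rearrangement difficulty noted above.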

Indirect Triples
This representation keeps a separate list of pointers to the triples, stored apart
from the triples themselves. It is similar in utility
to the quadruple representation but requires less space.
Temporaries are implicit, and code is easier to rearrange than with plain triples because
only the pointer list has to be reordered.
Example: Consider the expression a = b * - c + b * - c

List of pointers to the table
Pointer   Triple
100       (1)
200       (2)
300       (3)
400       (4)
500       (5)
600       (6)

#     op       arg1   arg2
(1)   uminus   c
(2)   *        b      (1)
(3)   uminus   c
(4)   *        b      (3)
(5)   +        (2)    (4)
(6)   =        a      (5)

Abstract Syntax Tree: It is a condensed / compact version of the parse tree. In this tree no
non-terminal symbols are present: interior nodes are operators and leaves are operands.
Construct an Abstract Syntax Tree for the given expression
Input string: x = -a * b + -a * b
Figure: Abstract Syntax Tree

Directed Acyclic Graph: It is a more compact version of the Abstract Syntax Tree. It
does not repeat common
parts of the expression; instead it points to the node that already exists.
A directed acyclic graph is used to apply transformations on a basic block.
A DAG can be constructed with the following types of labels on nodes.
1. Leaf nodes are labelled by identifiers, variable names or constants. Generally
leaves represent r-values.
2. Internal nodes store operator values.
The DAG and the flow graph are two different representations. Each node of the flow graph
can be represented by a DAG because each node of the flow graph is a basic block.
DAG for the expression (((a * b) + ( c – d ) * ( a * ))) + b
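Sharing of repeated subexpressions can be sketched by looking up each node's signature before creating it. The example below deliberately uses the simpler expression -a * b + -a * b rather than the one above; the class name DagBuilder and the tuple encoding are assumptions made for illustration.

```python
class DagBuilder:
    """Build a DAG by reusing a node whenever the same (label, children) was seen before."""

    def __init__(self):
        self.nodes = []      # node id -> (label, child ids...)
        self._index = {}     # (label, child ids...) -> node id

    def node(self, label, *children):
        key = (label, *children)
        if key not in self._index:           # create the node only on first sight
            self._index[key] = len(self.nodes)
            self.nodes.append(key)
        return self._index[key]

    def add(self, tree):
        if isinstance(tree, str):            # leaf: identifier or constant
            return self.node(tree)
        op, *operands = tree
        return self.node(op, *(self.add(child) for child in operands))

# -a * b + -a * b: the syntax tree has 9 nodes, the DAG only 5.
tree = ("+", ("*", ("uminus", "a"), "b"), ("*", ("uminus", "a"), "b"))
dag = DagBuilder()
root = dag.add(tree)
for node_id, node in enumerate(dag.nodes):
    print(node_id, node)
```

Both operands of + resolve to the same node id, which is how the DAG exposes the common subexpression.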

Type Checking
• Type checking is used for catching errors in programs. In principle, any check can
be done dynamically, if the target code carries the type of an element along with
the value of the element.
• A sound type system eliminates the need for dynamic checking for type errors,
because it allows us to determine statically that these errors cannot occur when
the target program runs.
• An implementation of a language is strongly typed if a compiler guarantees that the
programs it accepts will run without type errors.
Rules for Type Checking
Type checking is of two forms: 1. Synthesis and 2. Inference.
Type synthesis builds up the type of an expression from the types of its subexpressions. It
requires names to be declared before they are used. The type of E1 + E2 is defined in terms
of the types of E1 and E2. A typical rule for type synthesis has the form

if f has type s -> t and x has type s, then expression f(x) has type t

Here, f and x denote expressions and s -> t denotes a function from s to t. This rule for
functions with one argument carries over to functions with several arguments. The
rule can be adapted for E1 + E2 by viewing it as a function application add(E1, E2).

Type inference determines the type of a language construct from the way it is used. Let
null be a function that tests whether a list is empty. Then, from the usage null(x), we
can tell that x must be a list. The type of the elements of x is not known; all we know is
that x must be a list of elements of some type that is presently unknown.

A typical rule for type inference has the form

if f(x) is an expression, then for some α and β, f has type α -> β and x has type α
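Type synthesis for a function application can be sketched as a small recursive checker. Everything in the encoding is an assumption made for illustration: names are strings looked up in env, an application f(x) is the pair (f, x), and a function type from s to t is the tuple ("fun", s, t).

```python
def synthesize(expr, env):
    """Return the type of expr, built up from the types of its subexpressions."""
    if isinstance(expr, str):                       # a name must already be declared
        if expr not in env:
            raise TypeError(f"undeclared name: {expr}")
        return env[expr]
    function, argument = expr                       # application f(x)
    function_type = synthesize(function, env)
    argument_type = synthesize(argument, env)
    if not (isinstance(function_type, tuple) and function_type[0] == "fun"):
        raise TypeError("applied expression is not a function")
    _, parameter_type, result_type = function_type
    if argument_type != parameter_type:             # x must have type s
        raise TypeError(f"expected {parameter_type}, got {argument_type}")
    return result_type                              # f(x) has type t

env = {"f": ("fun", "integer", "real"), "x": "integer", "y": "real"}
type_of_call = synthesize(("f", "x"), env)
print(type_of_call)
```

Calling synthesize(("f", "y"), env) raises TypeError, since y has type real while f expects an integer.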

Control Flow
The translation of statements such as if-else-statements and while-statements is tied to the
translation of Boolean expressions. In programming languages, Boolean expressions are
often used to
• Alter the flow of control: Boolean expressions are used as conditional expressions in
statements that alter the flow of control. The value of such Boolean expressions is
implicit in a position reached in a program. For example, in if (E) S, the
expression E must be true if statement S is reached.

• Compute logical values: A Boolean expression can represent true or false as values.
Such Boolean expressions can be evaluated in analogy to arithmetic expressions, using
three-address instructions with logical operators.
Boolean Expressions: Boolean expressions are composed of the boolean operators
&&, || and !, using the C convention for the operators AND, OR, and NOT, respectively.
Relational expressions are of the form E1 rel E2, where E1 and E2 are arithmetic
expressions. Consider boolean expressions generated by the following grammar:

B → B || B | B && B | ! B | ( B ) | E rel E | true | false

We use the attribute rel.op to indicate which of the six comparison operators <, <=, =, !=, >, or
>= is represented by rel. Assume that || and && are left-associative, and that || has lowest
precedence, then &&, then !.
Flow-of-Control Statements
Consider the translation of boolean expressions into three-address code in the context
of statements such as those generated by the following grammar:

S → if ( B ) S1
S → if ( B ) S1 else S2
S → while ( B ) S1

Both B and S have a synthesized attribute code, which gives the translation into
three-address instructions. For simplicity, the translations B.code and S.code are built up as
strings, using syntax-directed definitions. The semantic rules defining the code attributes
could instead be implemented by building up syntax trees.
The translation of if (B) S1 consists of B.code followed by S1.code as indicated in fig. a.
Within B.code are jumps based on the value of B. If B is true, control flows to the first
instruction of S1.code, and if B is false, control flows to the instruction immediately
following S1.code.

The labels for the jumps in B.code and S.code are managed using inherited attributes.
With a boolean expression B, we associate two labels: B.true, the label to which control
flows if B is true, and B.false, the label to which control flows if B is false. With a
statement S, we associate an inherited attribute S.next denoting a label for the
instruction immediately after the code for S. In some cases, the instruction immediately
following S.code is a jump to some label L. A jump to a jump to L from within S.code is
avoided using S.next.
The syntax-directed definition produces three-address code for boolean expressions in
the context of if-, if-else-, and while-statements.
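The role of the labels B.true, B.false and S.next can be sketched with string-building functions for the if- and while-statements. This is a simplified illustration: the function names, the restriction of B to a single relational test, and the textual form of the instructions are assumptions.

```python
from itertools import count

def make_label_source():
    """Return a function that hands out fresh labels L1, L2, ..."""
    numbers = count(1)
    return lambda: f"L{next(numbers)}"

def gen_bool(condition, true_label, false_label):
    # B -> E1 rel E2: jump to B.true if the test holds, otherwise to B.false.
    left, op, right = condition
    return [f"if {left} {op} {right} goto {true_label}", f"goto {false_label}"]

def gen_if(condition, body, next_label, new_label):
    # S -> if (B) S1: B.true is a fresh label, B.false = S.next.
    true_label = new_label()
    return gen_bool(condition, true_label, next_label) + [f"{true_label}:"] + body

def gen_while(condition, body, next_label, new_label):
    # S -> while (B) S1: re-test B after the body; B.false = S.next.
    begin, true_label = new_label(), new_label()
    return ([f"{begin}:"] + gen_bool(condition, true_label, next_label)
            + [f"{true_label}:"] + body + [f"goto {begin}"])

if_code = gen_if(("a", "<", "b"), ["x = 1"], "Lnext", make_label_source())
while_code = gen_while(("i", "<", "n"), ["i = i + 1"], "Lnext", make_label_source())
print("\n".join(if_code))
print("\n".join(while_code))
```

The caller passes S.next down as an argument, which mirrors its status as an inherited attribute.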

Backpatching
The problem in generating three-address code in a single pass is that we may not know
the labels that control must go to at the time the jump statements are generated. To get
around this problem, a series of branching statements is generated with the targets of the
jumps temporarily left unspecified.
Backpatching is filling in the address in place of the label once the proper target is
determined. Leaving the targets empty and filling them in later is called backpatching.
Convert the following expression into three-address code using backpatching.
Example 1: if (a < b) then t = 1 else t = 0
(i)      if a < b goto (i + 3)
(i + 1)  t = 0
(i + 2)  goto (i + 4)
(i + 3)  t = 1
(i + 4)
Example 2: Convert the following expression into three-address code using backpatching.
Expression: a < b and c < d or e < f
100: if (a < b) goto 103
101: t1 = 0
102: goto 104
103: t1 = 1
104: if (c < d) goto 107
105: t2 = 0
106: goto 108
107: t2 = 1
108: if (e < f) goto 111
109: t3 = 0
110: goto 112
111: t3 = 1
112: t4 = t1 and t2
113: t5 = t4 or t3
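The mechanics of leaving a jump target open and filling it in later can be sketched as follows, using the statement if (a < b) then t = 1 else t = 0. The class name Code, the placeholder "_" for an unfilled target and the starting address 100 are assumptions made for this illustration.

```python
class Code:
    """Instruction buffer whose jumps may be emitted with the target '_' and patched later."""

    def __init__(self, start=100):
        self.start = start
        self.lines = []

    def next_address(self):
        return self.start + len(self.lines)

    def emit(self, text):
        """Append an instruction and return its address."""
        self.lines.append(text)
        return self.next_address() - 1

    def backpatch(self, addresses, target):
        """Fill `target` into every unfilled jump listed in `addresses`."""
        for address in addresses:
            line = self.lines[address - self.start]
            if not line.endswith("goto _"):
                raise ValueError(f"no unfilled jump at {address}")
            self.lines[address - self.start] = line[:-1] + str(target)

code = Code(100)
true_list = [code.emit("if a < b goto _")]       # target unknown when emitted
code.emit("t = 0")
exit_list = [code.emit("goto _")]                # must skip the true branch
code.backpatch(true_list, code.next_address())   # the true branch starts here
code.emit("t = 1")
code.backpatch(exit_list, code.next_address())   # first address after the statement
for offset, line in enumerate(code.lines):
    print(f"{code.start + offset}: {line}")
```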

Switch Statements
Convert the switch statement into three-address code
switch (i + j)
{
    case 1: a = b + c;
            break;
    case 2: P = Q + R;
            break;
    case 3: x = y + z;
            break;
}
Three Address Code
        t = i + j
        goto Test
L1:     t1 = b + c
        a = t1
        goto Last
L2:     t2 = Q + R
        P = t2
        goto Last
L3:     t3 = y + z
        x = t3
        goto Last
Test:   if (t == 1) goto L1
        if (t == 2) goto L2
        if (t == 3) goto L3
Last:

Intermediate Code for Procedures


We use the term function for a procedure that returns a value. In three-address code, a function
call is unraveled into the evaluation of parameters in preparation for a call, followed by
the call itself.

Example: Suppose that a is an array of integers and that f is a function from integers to
integers. Then, the assignment n = f(a[i]) might translate into the following three-address
code (assuming each integer occupies 4 bytes):

1) t1 = i * 4
2) t2 = a[t1]
3) param t2
4) t3 = call f, 1
5) n = t3

The first two lines compute the value of the expression a[i] into temporary t2.

Line 3 makes t2 an actual parameter for the call.

Line 4 calls f with one parameter and assigns the value returned by the function call to t3.

Line 5 assigns the returned value to n.

D → define T id ( F ) { S }
F → ε | T id , F
S → return E ;
E → id ( A )
A → ε | E , A
Adding functions to the source language

The productions allow function definitions and function calls. Nonterminals D and T
generate declarations and types, respectively. A function definition generated by D
consists of the keyword define, a return type, the function name, formal parameters in
parentheses and a function body consisting of a statement.
Nonterminal F generates zero or more formal parameters, where a formal parameter
consists of a type followed by an identifier. Nonterminals S and E generate statements
and expressions, respectively.

The production for S adds a statement that returns the value of an expression. The
production for E adds function calls, with actual parameters generated by A. An actual
parameter is an expression.

Function types
• The type of a function must encode the return type and the types of the formal
parameters. Let void be a special type that represents no parameter or no return
type.
• The type of a function pop ( ) that returns an integer is "function from void to
integer."
• Function types can be represented by using a constructor fun applied to the
return type and an ordered list of types for the parameters.
Symbol tables
• Let s be the top symbol table when the function definition is reached. The function
name is entered into s for use in the rest of the program.
• The formal parameters of a function can be handled with field names in a record.
• In the production for D, after seeing define and the function name, we push s and
set up a new symbol table:
Env.push(top); top = new Env(top);
Call the new symbol table t. Note that top is passed as a parameter in new Env(top),
so the new symbol table t can be linked to the previous one, s. The new table t is used to
translate the function body. We revert to the previous symbol table s after the function
body is translated.
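The chaining of symbol tables can be sketched with a small Env class. Only the names Env and top come from the text; the methods put and get, the list named saved that plays the role of the stack, and the stored type strings are assumptions for illustration.

```python
class Env:
    """Symbol table linked to the table of the enclosing scope."""

    def __init__(self, previous=None):
        self.table = {}
        self.previous = previous

    def put(self, name, info):
        self.table[name] = info

    def get(self, name):
        env = self
        while env is not None:          # search this table, then the enclosing ones
            if name in env.table:
                return env.table[name]
            env = env.previous
        return None

top = Env()                              # s: the table in force at the definition
top.put("f", "fun(integer) -> integer")  # the function name is entered into s

saved = [top]                            # push s
top = Env(top)                           # t: new table linked to s
top.put("x", "integer")                  # a formal parameter lives in t
inner_sees_f = top.get("f")              # found through the link to s
inner_sees_x = top.get("x")

top = saved.pop()                        # revert to s after the body is translated
outer_sees_x = top.get("x")
print(inner_sees_f, inner_sees_x, outer_sees_x)
```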
Type checking
• Within expressions, a function is treated like any other operator.
• The discussion of type checking carries over, including the rules for coercions.
For example, if f is a function with a parameter of type real, then the integer 2 is
coerced to a real in the call f(2).

Function calls
• When generating three-address instructions for a function call id(E, E,... , E), it is
sufficient to generate the three-address instructions for evaluating or reducing the
parameters E to addresses, followed by a param instruction for each parameter.
• If we do not want to mix the parameter-evaluating instructions with the param
instructions, the attribute E.addr for each expression E can be saved in a data
structure such as a queue.
• Once all the expressions are translated, the param instructions can be generated
as the queue is emptied.
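The queue-based scheme can be sketched as below. The function name gen_call, the input format (one pair of instructions and address per actual parameter) and the textual form "call f, n" of the call instruction are assumptions made for illustration.

```python
from collections import deque

def gen_call(function, arguments):
    """Emit code for function(E, ..., E).

    arguments holds one (instructions, addr) pair per actual parameter: the code
    that evaluates E and the address E.addr holding its value.
    """
    code = []
    pending = deque()
    for instructions, addr in arguments:
        code.extend(instructions)        # first evaluate every parameter
        pending.append(addr)             # remember E.addr in the queue
    parameter_count = len(pending)
    while pending:                       # then emit the param instructions in order
        code.append(f"param {pending.popleft()}")
    code.append(f"call {function}, {parameter_count}")
    return code

call_code = gen_call("f", [(["t1 = i * 4", "t2 = a[t1]"], "t2"), ([], "x")])
print("\n".join(call_code))
```

Keeping the addresses in a queue preserves the left-to-right order of the parameters while keeping all param instructions together just before the call.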
The procedure is such an important and frequently used programming construct that it is
imperative for a compiler to generate good code for procedure calls and returns. The run-time
routines that handle procedure parameter passing, calls, and returns are part of the run-time
support package.