Compiler Design Unit-3

Syntax directed definitions are a generalization of context-free grammars that associate semantic rules with productions. The document discusses synthesized and inherited attributes, annotated parse trees, and evaluation orders for syntax-directed definitions, including bottom-up and top-down approaches.

Syntax directed definitions

A Syntax Directed Definition (SDD) is a kind of abstract specification. It is a generalization of a context-free grammar in which each grammar production X --> a is associated with a set of semantic rules of the form s = f(b1, b2, ..., bk), where s is an attribute obtained from the function f. An attribute can be a string, a number, a type, or a memory location. Semantic rules are fragments of code which are usually embedded at the end of a production and enclosed in curly braces ({ }).
Example:
E --> E1 + T { E.val = E1.val + T.val }
Annotated Parse Tree – The parse tree containing the values of attributes at
each node for a given input string is called an annotated or decorated parse tree.
Features –
 High level specification
 Hides implementation details
 Explicit order of evaluation is not specified
Types of attributes – There are two types of attributes:
1. Synthesized Attributes – These are attributes that derive their values from
their child nodes, i.e. the value of a synthesized attribute at a node is
computed from the values of the attributes at its children in the parse tree.
Example:
E --> E1 + T { E.val = E1.val + T.val }
Here, E.val derives its value from E1.val and T.val.
Computation of Synthesized Attributes –
 Write the SDD using appropriate semantic rules for each production in given
grammar.
 The annotated parse tree is generated and attribute values are computed in
bottom up manner.
 The value obtained at root node is the final output.
Example: Consider the following grammar
S --> E
E --> E1 + T
E --> T
T --> T1 * F
T --> F
F --> digit
The SDD for the above grammar can be written as follows.
Let us assume the input string 4 * 5 + 6 for computing synthesized attributes.
The annotated parse tree for the input string is shown below.
For the computation of attributes, we start from the leftmost bottom node. The rule F -->
digit is used to reduce digit to F, and the value of digit is obtained from the lexical
analyzer, which becomes the value of F via the semantic action F.val = digit.lexval.
Hence, F.val = 4, and since T is the parent node of F, we get T.val = 4 from the
semantic action T.val = F.val. Then, for the production T --> T1 * F, the
corresponding semantic action is T.val = T1.val * F.val. Hence, T.val = 4 * 5 = 20.
Similarly, the combination E1.val + T.val becomes E.val, i.e. E.val = E1.val + T.val
= 20 + 6 = 26. Then, the production S --> E is applied and the semantic
action associated with it prints the result E.val. Hence, the output will be 26.
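The bottom-up computation just described can be sketched in Python. The Node class and rule dispatch below are illustrative (not from any particular compiler framework); they evaluate the synthesized attribute val in postorder, exactly as the SDD prescribes.

```python
# Minimal sketch of bottom-up (postorder) evaluation of the synthesized
# attribute "val" for the grammar S -> E, E -> E1 + T | T, T -> T1 * F | F,
# F -> digit. Node and the layout of children are assumptions for this sketch.
class Node:
    def __init__(self, label, children=(), lexval=None):
        self.label = label
        self.children = list(children)
        self.lexval = lexval          # supplied by the lexical analyzer for digits
        self.val = None               # the synthesized attribute

def evaluate(node):
    for child in node.children:       # postorder: children first (bottom-up)
        evaluate(child)
    if node.label == "F":             # F -> digit  { F.val = digit.lexval }
        node.val = node.lexval
    elif node.label == "T":
        if len(node.children) == 1:   # T -> F       { T.val = F.val }
            node.val = node.children[0].val
        else:                         # T -> T1 * F  { T.val = T1.val * F.val }
            node.val = node.children[0].val * node.children[2].val
    elif node.label == "E":
        if len(node.children) == 1:   # E -> T       { E.val = T.val }
            node.val = node.children[0].val
        else:                         # E -> E1 + T  { E.val = E1.val + T.val }
            node.val = node.children[0].val + node.children[2].val
    elif node.label == "S":           # S -> E       (print E.val)
        node.val = node.children[0].val
    return node.val

# Parse tree for "4 * 5 + 6"
star, plus = Node("*"), Node("+")
f4, f5, f6 = (Node("F", lexval=v) for v in (4, 5, 6))
t45 = Node("T", [Node("T", [f4]), star, f5])
e = Node("E", [Node("E", [t45]), plus, Node("T", [f6])])
print(evaluate(Node("S", [e])))       # 26
```

The value 26 appears only at the root, after every child's attribute has been computed, which is why synthesized attributes fit bottom-up parsing so naturally.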
2. Inherited Attributes – These are attributes that derive their values from
their parent or sibling nodes, i.e. the values of inherited attributes are computed
from the values of parent or sibling nodes.
Example:
A --> BCD { C.in = A.in, C.type = B.type }
Computation of Inherited Attributes –
 Construct the SDD using semantic actions.
 The annotated parse tree is generated and attribute values are computed in
top down manner.
Example: Consider the following grammar
S --> T L
T --> int
T --> float
T --> double
L --> L1, id
L --> id
The SDD for the above grammar can be written as follows.

Let us assume the input string int a, c for computing inherited attributes. The
annotated parse tree for the input string is shown below.
The value of the L nodes is obtained from T.type (a sibling), which is the lexical
value obtained as int, float, or double. Each L node then gives the type of the
identifiers a and c. The computation of types is done in a top-down manner
(preorder traversal). Using the function Enter_type, the type of the identifiers a
and c is inserted into the symbol table at the corresponding id.entry.
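The top-down propagation of the inherited attribute can be sketched in Python. The symbol-table dict and enter_type function below are stand-ins for Enter_type and id.entry, which this document does not define concretely.

```python
# Sketch of top-down (preorder) propagation of the inherited attribute L.in
# for the declaration grammar S -> T L, L -> L1 , id | id.
# symbol_table and enter_type are illustrative stand-ins for the document's
# symbol table and Enter_type function.
symbol_table = {}

def enter_type(name, typ):
    symbol_table[name] = typ           # insert type at the id's entry

def declare(type_keyword, id_list):
    # T.type is synthesized from the keyword (int / float / double),
    # then handed down to L as the inherited attribute L.in.
    l_in = type_keyword                # L.in = T.type
    for name in id_list:               # L -> L1 , id | id: each id inherits L.in
        enter_type(name, l_in)

declare("int", ["a", "c"])             # input: int a, c
print(symbol_table)                    # {'a': 'int', 'c': 'int'}
```

Note that the type flows sideways (from the sibling T) and then downward through the list of identifiers, which is exactly what makes it inherited rather than synthesized.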
Evaluation Orders for SDDs
The evaluation order for an SDD (Syntax Directed Definition) describes how the
SDD is evaluated with the help of attributes, dependency graphs, semantic rules,
and S-attributed and L-attributed definitions. SDDs support semantic analysis in
the compiler, so it is important to know how SDDs are evaluated and in what
order. This section provides detailed information about SDD evaluation. It
requires some basic knowledge of grammars, productions, parse trees, annotated
parse trees, and synthesized and inherited attributes.

Terminologies:

 Parse Tree: A parse tree is a tree that represents the syntax of the
production hierarchically.
 Annotated Parse Tree: Annotated Parse tree contains the values and
attributes at each node.
 Synthesized Attributes: When the evaluation of a node’s attribute is
based on its children.
 Inherited Attributes: When the evaluation of a node’s attribute is based
on its parent or siblings.

Dependency Graphs:

A dependency graph provides information about the order of evaluation of


attributes with the help of edges. It is used to determine the order of evaluation
of attributes according to the semantic rules of the production. An edge from the
attribute of a first node to the attribute of a second node indicates that the first
node's attribute must be evaluated before the second node's attribute. Edges
represent the semantic rules of the corresponding production.
Dependency Graph Rules: A node in the dependency graph corresponds to a
node of the parse tree for each attribute. An edge (from a first node to a second
node) in the dependency graph means that the attribute of the first node is
evaluated before the attribute of the second node.

Ordering the Evaluation of Attributes:

The dependency graph provides the evaluation order for the attributes of the
nodes of the parse tree. An edge (from a first node to a second node) in the
dependency graph means that the attribute of the second node depends on the
attribute of the first node for its evaluation. This order of evaluation gives a
linear order called a topological order.
There is no way to evaluate an SDD on a parse tree when there is a cycle in the
graph, because no topological order exists in that case.
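Topological ordering of a dependency graph can be sketched with Kahn's algorithm. The edge list below is taken from the dependency graph for 1#2&3 discussed in this section (node numbers match the tables); the function itself is a generic sketch, not compiler-specific code.

```python
# Sketch: topological ordering of attribute nodes via Kahn's algorithm.
# An edge (a, b) means "a must be evaluated before b"; if a cycle exists,
# no topological order exists and the SDD cannot be evaluated.
from collections import defaultdict, deque

def topological_order(nodes, edges):
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: no topological order exists")
    return order

# Edges of the dependency graph for 1#2&3 (node numbers as in the tables).
edges = [(1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (6, 10),
         (7, 8), (8, 10), (8, 9)]
print(topological_order(range(1, 11), edges))
```

Any order the function returns is a valid evaluation order: every attribute appears only after all attributes it depends on.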

Production Table

S.No.   Production      Semantic Rules
1.      S ⇢ A & B       S.val = A.syn + B.syn
2.      A ⇢ A1 # B      A.syn = A1.syn * B.syn
                        A1.inh = A.syn
3.      A1 ⇢ B          A1.syn = B.syn
4.      B ⇢ digit       B.syn = digit.lexval

Annotated Parse Tree For 1#2&3


Dependency Graph For 1#2&3

Explanation of dependency graph:

Node number in the graph represents the order of the evaluation of the
associated attribute. Edges in the graph represent that the second value is
dependent on the first value.

Table-1 represents the attributes corresponding to each node.


Table-2 represents the semantic rules corresponding to each edge.

Table-1

Node    Attribute
1       digit.lexval
2       digit.lexval
3       digit.lexval
4       B.syn
5       B.syn
6       B.syn
7       A1.syn
8       A.syn
9       A1.inh
10      S.val

Table-2

Edge (From -> To)   Corresponding Semantic Rule (from the production table)
1 -> 4              B.syn = digit.lexval
2 -> 5              B.syn = digit.lexval
3 -> 6              B.syn = digit.lexval
4 -> 7              A1.syn = B.syn
5 -> 8              A.syn = A1.syn * B.syn
6 -> 10             S.val = A.syn + B.syn
7 -> 8              A.syn = A1.syn * B.syn
8 -> 10             S.val = A.syn + B.syn
8 -> 9              A1.inh = A.syn
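Walking the semantic rules above in topological order, the SDD can be hand-evaluated on the string 1#2&3. The sketch below follows that order directly (it is not a parser; the node numbers in the comments refer to the dependency-graph tables).

```python
# Hand-evaluating the SDD on 1#2&3 in topological order (a sketch, not a
# parser). The string parses as S -> A & B with A -> A1 # B, A1 -> B -> 1,
# B -> 2, and the final B -> 3.
d1, d2, d3 = 1, 2, 3          # digit.lexval for the three digits

B1 = d1                       # B.syn = digit.lexval    (node 4)
B2 = d2                       # B.syn = digit.lexval    (node 5)
B3 = d3                       # B.syn = digit.lexval    (node 6)
A1_syn = B1                   # A1.syn = B.syn          (node 7)
A_syn = A1_syn * B2           # A.syn = A1.syn * B.syn  (node 8)
A1_inh = A_syn                # A1.inh = A.syn          (node 9)
S_val = A_syn + B3            # S.val = A.syn + B.syn   (node 10)
print(S_val)                  # 5
```

So # acts as multiplication and & as addition here: 1#2&3 evaluates to (1 * 2) + 3 = 5.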

S-Attributed Definitions:

An S-attributed SDD can have only synthesized attributes. In this type of definition,
semantic rules are placed at the end of the production only. Its evaluation is
based on bottom-up parsing.
Example: S ⇢ AB { S.x = f(A.x | B.x) }
L-Attributed Definitions:

An L-attributed SDD can have both synthesized and inherited attributes (restricted
inherited, as attributes can only be taken from the parent or left siblings). In this
type of definition, semantic rules can be placed anywhere in the RHS of the
production. Its evaluation is based on a depth-first, left-to-right traversal.
Example: S ⇢ AB { A.x = S.x + 2 } or S ⇢ AB { B.x = f(A.x | B.x) } or S ⇢ AB
{ S.x = f(A.x | B.x) }
Note:
 Every S-attributed grammar is also L-attributed.
 For L-attributed evaluation, an in-order traversal of the annotated parse tree is used.
 For S-attributed evaluation, the reverse of the rightmost derivation is used.

Semantic Rules with controlled side-effects:

Side effects are program fragments contained within semantic rules. Side
effects in an SDD can be controlled in two ways: permit incidental side effects,
or constrain the admissible evaluation orders so that they all yield the same
translation as any admissible order.
Applications of Syntax-Directed Translation
Syntax Directed Translation :
SDT is used for semantic analysis; it is basically used to construct the parse
tree with a grammar and semantic actions. The grammar decides which operation
has the highest priority and is performed first, and the semantic action decides
what action the grammar performs.
Example :
SDT = Grammar + Semantic Action
Grammar: E -> E1 + E2
Semantic action: if (E1.type != E2.type) then print "type
mismatching"
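The semantic action above can be sketched as a small Python function: when the parser reduces E -> E1 + E2, it compares the operands' type attributes. The function name and type strings are illustrative.

```python
# Sketch of the semantic action for E -> E1 + E2: compare the two operands'
# synthesized "type" attributes when the production is reduced.
def check_add(e1_type, e2_type):
    if e1_type != e2_type:
        return "type mismatching"      # the action's error message
    return e1_type                     # otherwise E.type = E1.type

print(check_add("int", "int"))         # int
print(check_add("int", "float"))       # type mismatching
```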
Application of Syntax Directed Translation :
 SDT is used for Executing Arithmetic Expression.
 In the conversion from infix to postfix expression.
 In the conversion from infix to prefix expression.
 It is also used for Binary to decimal conversion.
 In counting number of Reduction.
 In creating a Syntax tree.
 SDT is used to generate intermediate code.
 In storing information into symbol table.
 SDT is commonly used for type checking also.
Example :
Here we cover an example of an application of SDT, to better understand its
uses. Consider the arithmetic expression below and see how an SDT is
constructed for it.
Input: 2+3*4
Output: 14
SDT for the above example.

SDT for 2+3*4

The semantic actions are given as follows.


E -> E+T { E.val = E.val + T.val then print (E.val)}
|T { E.val = T.val}
T -> T*F { T.val = T.val * F.val}
|F { T.val = F.val}
F -> id { F.val = id.lexval }

Syntax-Directed Translation Schemes


It is a kind of notation in which each production of a context-free grammar is
associated with a set of semantic rules or actions, and each grammar symbol is
associated with a set of attributes. Thus, the grammar and the group of semantic
actions combine to make syntax-directed definitions.
The translation may be the generation of intermediate code or object code, or adding
information to the symbol table about a construct's type. Modern compilers use
syntax-directed translation, which makes the user's life easy by hiding many
implementation details and freeing the user from having to specify explicitly the order
in which semantic rules are to be evaluated.
Semantic Actions − A semantic action is executed whenever the parser recognizes
an input string generated by the context-free grammar.
For example: A → BC {Semantic Action}
A semantic action is written in curly braces attached to a production.
In a top-down parser, the semantic action is taken when A is expanded to derive
BC, which further derives the string w.
In a bottom-up parser, the semantic action is taken when BC is reduced to A.
Syntax Directed Translation Scheme for Postfix Code
In postfix notation, the operator appears after the operands, i.e., the operator between
operands is taken out & is attached after operands.
For example,
Consider Grammar
E → E(1) + E(2)
E → E(1) ∗ E(2)
E → (E(1))
E → id
The following table shows the Postfix Translation for the grammar.

Production           Semantic Action
E → E(1) + E(2)      E.CODE = E(1).CODE || E(2).CODE || '+'
E → E(1) ∗ E(2)      E.CODE = E(1).CODE || E(2).CODE || '∗'
E → (E(1))           E.CODE = E(1).CODE
E → id               E.CODE = id
Translation for Postfix Notation
Here, E.CODE represents an attribute or translation of the grammar symbol E: the
sequence of three-address statements evaluating E. The translation of the
nonterminal on the left of each production is the concatenation (||) of the
translations of the nonterminals on the right, followed by the operator.
In the first production, E → E(1) + E(2), the value of the translation E.CODE is the
concatenation of the two translations E(1).CODE and E(2).CODE and the symbol '+'.
In the second production, E → E(1) ∗ E(2), the value of the translation E.CODE is the
concatenation of the two translations E(1).CODE and E(2).CODE and the symbol '∗'.
In the third production, E → (E(1)), the translation of a parenthesized expression is
the same as that of the unparenthesized expression.
In the fourth production, E → id, the translation of an identifier is the identifier itself.
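The postfix translation scheme can be sketched in a few lines of Python. Expressions are represented as (op, left, right) tuples or plain identifier strings; this representation is an assumption of the sketch, not part of the document's notation. Parentheses need no node at all, mirroring E → (E(1)) copying E(1).CODE unchanged.

```python
# Sketch of the postfix translation: E.CODE is the concatenation of the
# sub-expressions' CODE followed by the operator.
def code(e):
    if isinstance(e, str):             # E -> id : E.CODE = id
        return e
    op, left, right = e                # E -> E1 op E2
    return code(left) + " " + code(right) + " " + op

# (a + b) * c  -- the parentheses vanish, since E -> (E1) copies E1.CODE
expr = ("*", ("+", "a", "b"), "c")
print(code(expr))                      # a b + c *
```

The operator always lands after its two operands' translations, which is exactly the postfix property.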
Following are various attributes or translations for the grammar symbol E.

 E.VAL → the value of E.
 E.PLACE → the name that will hold the value of the expression.
 E.CODE → the sequence of three-address statements computing the expression.
 E.MODE → the data type of E.
Implementing L-Attributed SDDs
Before coming to S-attributed and L-attributed SDTs, here is a brief recap of the
two types of attributes: synthesized and inherited.
1. Synthesized attributes – A synthesized attribute is an attribute of the non-
terminal on the left-hand side of a production. Synthesized attributes
represent information that is passed up the parse tree. The attribute can
take values only from its children (symbols in the RHS of the production).
The non-terminal concerned must be in the head (LHS) of the production.
For example, if A -> BC is a production of a grammar and A's attribute
depends on B's attributes or C's attributes, then it is a synthesized attribute.
2. Inherited attributes – An attribute of a nonterminal on the right-hand side of
a production is called an inherited attribute. The attribute can take values
either from its parent or from its siblings (symbols in the LHS or RHS of the
production). The non-terminal concerned must be in the body (RHS) of the
production. For example, if A -> BC is a production of a grammar and
B's attribute depends on A's attributes or C's attributes, then it is an
inherited attribute, because A is the parent here and C is a sibling.
Now, let’s discuss S-attributed and L-attributed SDTs.
1. S-attributed SDT :
 If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
 S-attributed SDTs are evaluated in bottom-up parsing, since the values of
the parent nodes depend upon the values of the child nodes.
 Semantic actions are placed at the rightmost place of the RHS.
2. L-attributed SDT:
 If an SDT uses both synthesized attributes and inherited attributes, with
the restriction that an inherited attribute can take values only from the
parent or left siblings, it is called an L-attributed SDT.
 Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-
right parsing manner.
 Semantic actions can be placed anywhere in the RHS.
 Example : S -> ABC. Here attribute B can obtain its value only from
its parent S or its left sibling A, but it can’t inherit from its right sibling C.
Similarly, A can get its value only from its parent, while C can get its
value from S, A, and B, because C is the rightmost symbol in the
given production.

Variants of syntax trees:

Two topics related to variants of syntax trees are described below:
1. Directed Acyclic Graphs for Expressions (DAGs)
2. The Value-Number Method for Constructing DAGs

Directed Acyclic Graphs for Expressions (DAG)

A DAG, like an expression’s syntax tree, has leaves that correspond to


atomic operands and interior nodes that correspond to operators. The difference
is that a node N in a DAG has more than one parent if N denotes a common
subexpression; in a syntax tree, the tree for the common subexpression would be
duplicated as many times as the subexpression appears in the original expression.
As a result, a DAG not only encodes expressions more concisely but also gives
the compiler essential information about how to generate efficient code to
evaluate the expressions.
The Directed Acyclic Graph (DAG) is a tool that shows the structure of
basic blocks, allows you to examine the flow of values between them,
and also allows you to optimize them. A DAG allows simple transformations of
basic blocks.
Properties of DAG are:
1. Leaf nodes represent identifiers, names, or constants.
2. Interior nodes represent operators.
3. Interior nodes also represent the results of expressions or the
identifiers/name where the values are to be stored or assigned.
Examples:
T0 = a+b --- Expression 1
T1 = T0 +c --- Expression 2
Expression 1: T0 = a+b

Syntax tree for expression 1

Expression 2: T1 = T0 +c

Syntax tree for expression 2


The Value-Number Method for Constructing DAGs:
An array of records is used to hold the nodes of a syntax tree or DAG. Each row
of the array corresponds to a single record, and hence a single node. The first
field in each record is an operation code, which indicates the node’s label. In
the figure below, interior nodes have two additional fields denoting the left
and right children, while leaves have one additional field that stores the lexical
value (either a symbol-table pointer or a constant, in this instance).

Nodes of a DAG for i = i + 10 allocated in an array

The integer index of a node's record within the array is used to refer to the
node. This integer has historically been called the node’s value number, or the
value number of the expression represented by the node. For instance, the value
number of the node labeled + is 3, while the value numbers of its left and right
children are 1 and 2, respectively. In practice we may use pointers to records or
references to objects instead of integer indexes, but the reference to a node
would still be referred to as its “value number.” Value numbers can assist us in
constructing expressions if they are stored in an appropriate data structure.
 Algorithm: The value-number method for constructing the nodes of a
Directed Acyclic Graph.
 INPUT: Label op, node l, and node r.
 OUTPUT: The value number of a node in the array with signature (op, l, r).
 METHOD: Search the array for a node M with label op, left child l, and right
child r. If there is such a node, return the value number of M. If not, create in
the array a new node N with label op, left child l, and right child r, and return
its value number.
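The METHOD above can be sketched directly in Python. Note that this sketch numbers records from 0, whereas the text's figure numbers them from 1; the tuple representation of records is also an assumption of the sketch.

```python
# Sketch of the value-number method: nodes live in an array of records
# (op, l, r); a linear search returns an existing node's value number,
# otherwise a new record is appended.
nodes = []                                   # the array of records

def value_number(op, l, r):
    for i, rec in enumerate(nodes):          # search for node M with this signature
        if rec == (op, l, r):
            return i                         # found: return M's value number
    nodes.append((op, l, r))                 # not found: create node N
    return len(nodes) - 1

# Building the DAG for i = i + 10 (0-indexed value numbers)
i_leaf = value_number("id", "i", None)       # value number 0
c_leaf = value_number("num", 10, None)       # value number 1
plus   = value_number("+", i_leaf, c_leaf)   # value number 2
assign = value_number("=", i_leaf, plus)     # value number 3
print(plus, value_number("+", i_leaf, c_leaf))   # 2 2  (shared, not duplicated)
```

Asking for the same signature twice returns the same value number, which is precisely how common subexpressions get shared in the DAG.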
While the algorithm produces the intended result, searching the entire array every
time one node is requested is time-consuming, especially if the array contains
expressions from an entire program. A hash table, in which the nodes are
distributed into “buckets,” each of which typically contains only a few nodes, is a
more efficient method. The hash table is one of several data structures that can
efficiently support dictionaries. A dictionary is an abstract data type that allows us
to add and remove elements of a set, and to test whether a particular
element is present in the set. A good dictionary data structure, such as a hash
table, executes each of these operations in a constant or near-constant amount
of time, regardless of the size of the set.
To build a hash table for the nodes of a DAG, we require a hash function h that
computes the bucket index for a signature (op, l, r) in such a manner that the
signatures are distributed across buckets and no one bucket gets more than its
fair share of the nodes. The bucket index h(op, l, r) is deterministically
computed from op, l, and r, allowing us to repeat the calculation and always
arrive at the same bucket index for a given node (op, l, r).
The buckets can be implemented as linked lists, as in the given figure. The
bucket headers are stored in an array indexed by the hash value; each header
corresponds to the first cell of a list. Each cell in a bucket’s linked list
contains the value number of one of the nodes that hash to that bucket. That is,
node (op, l, r) may be located on the list whose header is at index
h(op, l, r) of the array.
Data structure for searching buckets

Given the input nodes op, l, and r, we calculate the bucket index h(op, l, r) and
search the list of cells in this bucket for the specified input node. There are
usually enough buckets that no list has more than a few cells. However, we
may need to examine all of the cells in a bucket, and for each value number v
discovered in a cell, we must verify that the input node’s signature (op, l, r)
matches the node with value number v in the list of cells (as in the figure above).
If a match is found, we return v. If we find no match, we build a new cell, add it to
the list of cells for bucket index h(op, l, r), and return the value number in that
new cell.
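The bucketed variant can be sketched with a dict from bucket index to a list of value numbers. The bucket count (64) and the use of Python's built-in hash as h are assumptions of this sketch.

```python
# Sketch of the hash-bucket value-number method: buckets maps a bucket
# index h(op, l, r) to the list ("bucket") of value numbers whose
# signatures hash there, so only a few cells are ever scanned.
nodes = []        # records, indexed by value number
buckets = {}      # bucket index -> list of value numbers (the "cells")

def value_number(op, l, r):
    h = hash((op, l, r)) % 64            # deterministic bucket index
    for v in buckets.get(h, []):         # scan only this bucket's cells
        if nodes[v] == (op, l, r):       # verify the signature matches
            return v
    nodes.append((op, l, r))             # no match: create a new node,
    v = len(nodes) - 1
    buckets.setdefault(h, []).append(v)  # add a cell to this bucket's list
    return v

a  = value_number("id", "a", None)
n1 = value_number("+", a, value_number("num", 1, None))
print(n1 == value_number("+", a, value_number("num", 1, None)))  # True
```

The behavior is identical to the linear-search version; only the lookup cost changes, from scanning the whole array to scanning one short bucket.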
Three-Address Code
Three-address code is a type of intermediate code which is easy to generate
and can be easily converted to machine code. It makes use of at most three
addresses and one operator to represent an expression, and the value
computed at each instruction is stored in a temporary variable generated by the
compiler. The compiler decides the order of operations given by the three-address
code.

Three-address code is used in several compiler applications:


Optimization: Three address code is often used as an intermediate
representation of code during optimization phases of the compilation process.
The three address code allows the compiler to analyze the code and perform
optimizations that can improve the performance of the generated code.
Code generation: Three address code can also be used as an intermediate
representation of code during the code generation phase of the compilation
process. The three address code allows the compiler to generate code that is
specific to the target platform, while also ensuring that the generated code is
correct and efficient.
Debugging: Three address code can be helpful in debugging the code
generated by the compiler. Since three address code is a low-level language, it
is often easier to read and understand than the final generated code.
Developers can use the three address code to trace the execution of the
program and identify errors or issues that may be present.
Language translation: Three address code can also be used to translate code from one
programming language to another. By translating code to a common intermediate
representation, it becomes easier to translate the code to multiple target languages.
General representation –
a = b op c
where a, b, or c represent operands such as names, constants, or compiler-generated
temporaries, and op represents the operator.
Example-1: Convert the expression a * -(b + c) into three-address code.

Example-2: Write three-address code for the following code


for(i = 1; i<=10; i++)
{
a[i] = x * 5;
}
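The figure with the answer is not reproduced here; a plausible three-address translation of this loop (label names and the 4-byte element width are illustrative assumptions) is:

```
      i = 1
L1:   if i > 10 goto L2
      t1 = x * 5
      t2 = i * 4          ; byte offset of a[i], assuming 4-byte elements
      a[t2] = t1
      i = i + 1
      goto L1
L2:
```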
Implementation of Three-Address Code –
There are three representations of three-address code, namely:
1. Quadruples
2. Triples
3. Indirect Triples
1. Quadruples – A quadruple is a structure which consists of four fields: op, arg1,
arg2, and result. op denotes the operator, arg1 and arg2 denote the two operands,
and result is used to store the result of the expression.
Advantages –
 It is easy to rearrange code for global optimization.
 One can quickly access the values of temporary variables using the symbol table.
Disadvantages –
 It contains a lot of temporaries.
 Temporary variable creation increases time and space complexity.
Example – Consider expression a = b * – c + b * – c. The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5

2. Triples – This representation doesn’t use an extra temporary variable to


represent a single operation; instead, when a reference to another triple’s value is
needed, a pointer to that triple is used. So it consists of only three fields: op, arg1,
and arg2.
Disadvantages –
 Temporaries are implicit, and it is difficult to rearrange code.
 It is difficult to optimize, because optimization involves moving intermediate code.
When a triple is moved, any other triple referring to it must be updated as well. With
the help of a pointer, one can directly access a symbol table entry.
Example – Consider expression a = b * – c + b * – c
3. Indirect Triples – This representation uses pointers to a separately made and
stored listing of all references to computations. It is similar in utility to the quadruple
representation but requires less space. Temporaries are implicit, and it is easier to
rearrange code.
Example – Consider the expression a = b * - c + b * - c
Question – Write the quadruples, triples, and indirect triples for the following
expression: (x + y) * (y + z) + (x + y + z)
Explanation – The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
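The three representations of this code can be sketched as Python data. The tuple layouts and the listing of pointers are illustrative choices for this sketch.

```python
# Sketch: (x+y) * (y+z) + (x+y+z) in the three representations.

# Quadruples: an explicit result field names the compiler temporary.
quadruples = [
    ("+", "x",  "y",  "t1"),
    ("+", "y",  "z",  "t2"),
    ("*", "t1", "t2", "t3"),
    ("+", "t1", "z",  "t4"),
    ("+", "t3", "t4", "t5"),
]

# Triples: no result field; an integer operand is a pointer to another
# triple's row, so temporaries are implicit.
triples = [
    ("+", "x", "y"),
    ("+", "y", "z"),
    ("*", 0, 1),        # (0) * (1)
    ("+", 0, "z"),      # (0) + z
    ("+", 2, 3),        # (2) + (3)
]

# Indirect triples: a separate instruction listing points at the triples,
# so the listing can be reordered without rewriting operand references.
listing = [0, 1, 2, 3, 4]
for idx in listing:
    print(triples[idx])
```

Notice why triples are hard to reorder: moving row 0 would invalidate the integer operands in rows 2 and 3, whereas with indirect triples only the listing changes.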
Types and Declarations, Type Checking
Type checking is the process of verifying and enforcing constraints on the types
of values. A compiler must check that the source program follows both the
syntactic and semantic conventions of the source language, and it must also
check the type rules of the language. Type checking allows the programmer to
limit what types may be used in certain circumstances and assigns types to
values. The type checker determines whether these values are used
appropriately or not.
It checks the types of objects and reports a type error in the case of a violation;
in some cases incorrect types are corrected (coerced). Whatever compiler we
use, while it is compiling the program it has to follow the type rules of the
language. Every language has its own set of type rules. We know that the
information about data types is maintained and computed by the compiler.
The information about data types like INTEGER, FLOAT, CHARACTER, and all
the other data types is maintained and computed by the compiler. The type
checker is the module of the compiler whose task is type checking.

Conversion
Conversion from one type to another type is known as implicit if it is done
automatically by the compiler. Implicit type conversions are also
called coercions, and coercion is limited in many languages.
Example: An integer may be converted to a real, but a real is not converted to an
integer.
A conversion is said to be explicit if the programmer writes something to perform
the conversion.
Tasks:
1. It has to enforce that "indexing is only on an array".
2. It has to check the ranges of the data types used; for example,
a 16-bit INTEGER (int) has a range of -32,768 to +32,767, and
a single-precision FLOAT has a range of about 1.2E-38 to 3.4E+38.

Types of Type Checking:

There are two kinds of type checking:


1. Static Type Checking.
2. Dynamic Type Checking.

Static Type Checking:

Static type checking is defined as type checking performed at compile time. It


checks the types of variables at compile time, which means the type of each
variable is known at compile time. It generally examines the program text during
the translation of the program. Using the type rules of the system, a compiler can
infer from the source text that a function fun will be applied to an operand a
of the right type each time the expression fun(a) is evaluated.
Examples of Static checks include:
 Type checks: A compiler should report an error if an operator is applied to
an incompatible operand, for example, if an array variable and a function
variable are added together.
 Flow-of-control checks: Statements that cause the flow of control to
leave a construct must have someplace to which to transfer the flow of
control. For example, a break statement in C causes control to leave the
smallest enclosing while, for, or switch statement; an error occurs if such an
enclosing statement does not exist.
 Uniqueness checks: There are situations in which an object must be
defined exactly once. For example, in Pascal an identifier must be declared
uniquely, labels in a case statement must be distinct, and elements of a
scalar type must not be repeated.
 Name-related checks: Sometimes the same name must appear two or more
times. For example, in Ada a loop may have a name that appears at the
beginning and end of the construct. The compiler must check that the same
name is used at both places.
The Benefits of Static Type Checking:
1. Protection from runtime errors.
2. It catches syntactic errors like spurious words or extra punctuation.
3. It catches wrong names, such as misspelled identifiers and predefined names.
4. It detects incorrect argument types.
5. It catches a wrong number of arguments.
6. It catches wrong return types, like returning "70" from a function that’s declared
to return an int.

Dynamic Type Checking:

Dynamic type checking is defined as type checking done at run time.
In dynamic type checking, types are associated with values, not variables.
In implementations of dynamically type-checked languages, runtime objects
generally carry a type tag, i.e. a reference to a type containing the type
information. Dynamic typing is more flexible: a static type system always
restricts what can be conveniently expressed. Dynamic typing also results in
more compact programs, since it is more flexible and does not require types
to be spelled out. Programming with a static type system often requires more
design and implementation effort.
Languages like Pascal and C have static type checking. Type checking is used
to check the correctness of the program before its execution. The main purpose
of type checking is to verify the correctness of data type assignments and
type casts, i.e. whether they are correct or not before execution.
Static Type-Checking is also used to determine the amount of memory needed
to store the variable.
The design of the type-checker depends on:
1. Syntactic Structure of language constructs.
2. The Expressions of languages.
3. The rules for assigning types to constructs (semantic rules).
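A type checker built on these three ingredients (expression structure plus semantic rules) can be sketched in Python. The expression representation, the operator set, and the type names below are all illustrative assumptions; types are inferred bottom-up and a violation is reported "at compile time", i.e. before the expression would ever be executed.

```python
# Minimal sketch of a static type checker for expressions: infer types
# bottom-up and raise an error when an operator meets incompatible operands.
def typeof(e):
    if isinstance(e, bool):                 # check bool before int: in Python,
        return "bool"                       # True is also an instance of int
    if isinstance(e, int):
        return "int"
    op, a, b = e                            # ('+', lhs, rhs) etc.
    ta, tb = typeof(a), typeof(b)
    if op in ("+", "*") and ta == tb == "int":
        return "int"                        # arithmetic: int x int -> int
    if op == "<" and ta == tb == "int":
        return "bool"                       # comparison: int x int -> bool
    raise TypeError(f"operator {op!r} applied to {ta} and {tb}")

print(typeof(("+", 1, ("*", 2, 3))))        # int
print(typeof(("<", 1, 2)))                  # bool
# typeof(("+", 1, True)) would raise: operator '+' applied to int and bool
```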

The Position of the Type checker in the Compiler:


Type checking in Compiler

The token stream from the lexical analyzer is passed to the PARSER. The
PARSER generates a syntax tree. When the program (source code) is
converted into a syntax tree, the type checker plays a crucial role: by
examining the syntax tree, it can tell whether each data type is handled by the
correct variable or not. The type checker checks the tree and, if any
modifications are needed, makes them. It produces a syntax tree, after which
INTERMEDIATE CODE generation is done.

Overloading:

An overloaded symbol is one that has different meanings depending on its


context.
Overloading is of two types:
1. Operator Overloading
2. Function Overloading
Operator Overloading: In mathematics, the addition operator ‘+’ in the
arithmetic expression “x+y” is overloaded because ‘+’ in “x+y” denotes different
operations when ‘x’ and ‘y’ are integers, complex numbers, reals, or matrices.
Example: In Ada, the parentheses ‘()’ are overloaded: the expression A(i) can
mean the ith element of an array A, a call to a function ‘A’ with argument ‘i’, or
an explicit conversion of the expression i to the type ‘A’. In most languages the
arithmetic operators are overloaded.
Function Overloading: The type checker resolves function overloading
based on the types and number of the arguments.
Example:
E --> E1(E2)
{
  E.type := if E2.type = s and
               E1.type = s --> t then t
            else type_error
}
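The rule above can be sketched in Python: a call E1(E2) type-checks only if the argument type matches the parameter type s of E1's function type s → t, and overloading is resolved by trying each candidate function type. The (s, t) pair encoding and function names are assumptions of this sketch.

```python
# Sketch of the rule E -> E1(E2): a function type s -> t is represented as
# the pair (s, t); the call has type t only when the argument has type s.
def call_type(fun_type, arg_type):
    s, t = fun_type
    return t if arg_type == s else "type_error"

# An overloaded name carries a set of candidate function types; the checker
# picks the one whose parameter type matches the argument's type.
def resolve(overloads, arg_type):
    for ft in overloads:
        result = call_type(ft, arg_type)
        if result != "type_error":
            return result
    return "type_error"

print(resolve([("int", "int"), ("float", "float")], "float"))   # float
print(resolve([("int", "int")], "bool"))                        # type_error
```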

Control flow
The control flow is the order in which the computer executes statements in a script.

Code is run in order from the first line in the file to the last line, unless the computer
runs across the (extremely frequent) structures that change the control flow, such as
conditionals and loops.

For example, imagine a script used to validate user data from a webpage form. The
script submits validated data, but if the user, say, leaves a required field empty, the
script prompts them to fill it in. To do this, the script uses a conditional structure
or if...else, so that different code executes depending on whether the form is complete or
not:

if (isEmpty(field)) {
promptUser();
} else {
submitForm();
}

A typical script in JavaScript or PHP (and the like) includes many control structures,
including conditionals, loops and functions. Parts of a script may also be set to execute
when events occur.

For example, the above excerpt might be inside a function that runs when the user clicks
the Submit button for the form. The function could also include a loop, which iterates
through all of the fields in the form, checking each one in turn. Looking back at the code
in the if and else sections, the lines promptUser and submitForm could also be calls to other
functions in the script. As you can see, control structures can dictate complex flows of
processing even with only a few lines of code.

Control flow means that when you read a script, you must not only read from start to
finish but also look at program structure and how it affects order of execution.

Intermediate Code for Procedures.


If source code can be translated directly into its target machine code, why do
we need to translate it first into an intermediate code that is then translated
into the target code? Let us see the reasons why we need an intermediate code.
 If a compiler translates the source language to its target machine language
without having the option for generating intermediate code, then for each new
machine, a full native compiler is required.
 Intermediate code eliminates the need for a new full compiler for every
unique machine by keeping the analysis portion the same for all compilers.
 The second part of the compiler, synthesis, is changed according to the
target machine.
 It becomes easier to apply the source code modifications to improve code
performance by applying code optimization techniques on the intermediate code.
Intermediate Representation
Intermediate codes can be represented in a variety of ways and they have their own
benefits.
 High Level IR - High-level intermediate code representation is very close to the
source language itself. They can be easily generated from the source code and
we can easily apply code modifications to enhance performance. But for target
machine optimization, it is less preferred.
 Low Level IR - This one is close to the target machine, which makes it suitable
for register and memory allocation, instruction set selection, etc. It is good for
machine-dependent optimizations.
Intermediate code can be either language specific (e.g., Byte Code for Java) or
language independent (three-address code).

Three-Address Code
The intermediate code generator receives input from its predecessor phase, the
semantic analyzer, in the form of an annotated syntax tree. That syntax tree
can then be converted into a linear representation, e.g., postfix notation.
Intermediate code tends to be machine-independent, so the code generator
assumes an unlimited number of memory locations (registers) is available.
For example:
a = b + c * d;
The intermediate code generator will try to divide this expression into sub-expressions
and then generate the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2
with r1 and r2 being temporary registers used in the target program.
A three-address code has at most three address locations to calculate the expression. A
three-address code can be represented in two forms: quadruples and triples.
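The splitting into sub-expressions can be sketched as a small recursive generator. The following is an illustrative toy (not the book's algorithm): it walks an expression tree for a = b + c * d, allocates fresh temporaries r1, r2, ..., and emits one three-address instruction per operator:

```python
# Illustrative sketch (not the book's algorithm): generating three-address
# code for a = b + c * d with an unlimited supply of temporaries.

counter = 0
code = []

def new_temp():
    """Return a fresh temporary register name r1, r2, ..."""
    global counter
    counter += 1
    return f"r{counter}"

def gen(node):
    """Emit code for an expression tree; return the place holding its value."""
    if isinstance(node, str):              # a leaf is just a variable name
        return node
    op, left, right = node
    l, r = gen(left), gen(right)           # generate operands first
    t = new_temp()
    code.append((op, l, r, t))             # "t = l op r"
    return t

# a = b + c * d
result = gen(("+", "b", ("*", "c", "d")))
code.append(("=", result, None, "a"))

for instr in code:
    print(instr)
# ('*', 'c', 'd', 'r1')
# ('+', 'b', 'r1', 'r2')
# ('=', 'r2', None, 'a')
```

Because operands are generated before a temporary is allocated for the operator, the innermost sub-expression c * d gets r1 first, reproducing the order shown in the example above.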
Quadruples
Each instruction in quadruples presentation is divided into four fields: operator, arg1,
arg2, and result. The above example is represented below in quadruples format:

Op    arg1    arg2    result

*     c       d       r1

+     b       r1      r2

=     r2              a

Triples
Each instruction in triples presentation has three fields: op, arg1, and arg2.
The result of each sub-expression is denoted by its position in the list.
Triples are similar to a DAG and a syntax tree; they are equivalent to a DAG
while representing expressions.

Op    arg1    arg2

*     c       d

+     b       (0)

=     a       (1)

Triples face the problem of code immovability during optimization: since
results are referred to by position, changing the order or position of an
expression may cause problems.
Indirect Triples
This representation is an enhancement over triples representation. It uses pointers
instead of position to store results. This enables the optimizers to freely re-position the
sub-expression to produce an optimized code.
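Under the assumption that a = b + c * d compiles to r1 = c * d; r2 = b + r1; a = r2, the three representations can be sketched as plain Python lists (the field layouts here are illustrative). The extra `order` list in the indirect-triples version plays the role of the pointer table that lets an optimizer reorder instructions without renumbering operands:

```python
# Hedged sketch: the same code a = b + c * d in the three forms discussed
# above (field layouts are illustrative).

# Quadruples: explicit result field.
quads = [
    ("*", "c",  "d",  "r1"),
    ("+", "b",  "r1", "r2"),
    ("=", "r2", None, "a"),
]

# Triples: the result of a row is referred to by its position, e.g. "(0)".
triples = [
    ("*", "c", "d"),     # row (0)
    ("+", "b", "(0)"),   # row (1)
    ("=", "a", "(1)"),   # row (2)
]

# Indirect triples: a separate list of "pointers" (row indices) fixes the
# execution order, so rows can be reordered without rewriting any operands.
order = [0, 1, 2]
for i in order:
    print(triples[i])
```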

Declarations
A variable or procedure has to be declared before it can be used. Declaration involves
allocation of space in memory and entry of type and name in the symbol table. A
program may be coded and designed keeping the target machine structure in mind, but
it may not always be possible to accurately convert a source code to its target language.
Taking the whole program as a collection of procedures and sub-procedures, it
becomes possible to declare all names local to a procedure. Memory allocation
is done in a consecutive manner, and names are bound to memory in the sequence
in which they are declared in the program. We use an offset variable,
initialized to zero {offset = 0}, to denote the next free relative address in
the data area.
The source programming language and the target machine architecture may vary
in the way names are stored, so relative addressing is used. The first name is
allocated memory starting at relative location 0 {offset = 0}, and each name
declared later is allocated memory next to the previous one.
Example:
We take the example of the C programming language on a machine where an
integer variable occupies 2 bytes of memory and a float variable occupies
4 bytes of memory.
int a;
float b;

Allocation process:
{offset = 0}

int a;
id.type = int
id.width = 2

offset = offset + id.width


{offset = 2}

float b;
id.type = float
id.width = 4

offset = offset + id.width


{offset = 6}
To enter these details into the symbol table, a procedure enter can be used. It
may have the following signature:
enter(name, type, offset)
This procedure creates an entry in the symbol table for the variable name,
setting its type to type and its relative address in its data area to offset.
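A minimal sketch of this enter procedure, assuming the widths from the example above (int = 2 bytes, float = 4 bytes) and using a Python dict as the symbol table:

```python
# Minimal sketch of enter(name, type, offset), assuming the widths from the
# example above (int = 2 bytes, float = 4 bytes) and a dict symbol table.

width = {"int": 2, "float": 4}
symbol_table = {}
offset = 0                       # next free relative address (base = 0)

def enter(name, type_, off):
    """Create a symbol-table entry with the name's type and relative address."""
    symbol_table[name] = {"type": type_, "offset": off}

for name, type_ in [("a", "int"), ("b", "float")]:   # int a; float b;
    enter(name, type_, offset)
    offset += width[type_]       # the next name starts right after this one

print(symbol_table)  # {'a': {'type': 'int', 'offset': 0}, 'b': {'type': 'float', 'offset': 2}}
print(offset)        # 6
```

The final offset of 6 matches the allocation trace above: a at relative address 0, b at 2, and 6 as the next free location.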
