Chapter 4 Semantic Analysis
Compiler Design
What is Semantic analysis?
Type checking : The process of verifying and enforcing the constraints of types is called
type checking.
This may occur either at compile-time (a static check) or run-time (a dynamic check).
Static type checking is a primary task of the semantic analysis carried out by a compiler.
Uniqueness checking : Verifying that each variable name is unique within its scope.
Name checks : Verifying that no variable has a name which is not allowed.
Beyond syntax analysis
☞ Parser cannot catch all the program errors
☞ There is a level of correctness that is deeper than syntax analysis
☞ Some language features cannot be modeled using context free grammar formalism
A parser is limited in the program errors it can catch: errors related to semantics lie
deeper than syntax analysis.
Typical features checked by semantic analysis cannot be modeled using the context-free
grammar formalism.
If one tries to incorporate those features into the definition of a language, the language
is no longer context-free.
Example 1
string x; int y; y = x + 3; The use of x is a type error (a string cannot be added to an integer).
int a, b; a = b + c; Here, c is not declared.
int x; char x; The identifier x is declared with two different data types, so the declarations conflict.
A variable declared within one function cannot be used within the scope of another
function unless it is declared there separately.
These examples show the kind of checking a compiler has to do beyond syntax analysis.
There are many more errors of this kind that syntax analysis cannot catch.
What does the compiler need to know?
Has a variable been declared, and what is its type?
Is a variable a scalar, an array, or a function?
Which declaration of the variable does each reference use?
Is an expression type consistent?
How many arguments does a function take?
Are all invocations of a function consistent with its declaration?
If an operator/function is overloaded, which function is being invoked?
What are the inheritance relationships?
If the compiler has the answers to these and similar questions, it can carry out
semantic analysis on the parse tree generated by the parser.
How to answer these questions?
To answer the previous questions, the compiler has to keep information about
the types of variables, the number of parameters of each function, the kinds of
inheritance used, and so on.
It has to do some computation in order to gather this information.
Most compilers keep a structure called a symbol table to store this information.
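As an illustration of the idea, here is a minimal sketch of a scoped symbol table. The class and method names (SymbolTable, declare, lookup, enter_scope, exit_scope) are assumptions made for this example; real compilers store much more per entry (parameter lists, offsets, and so on).

```python
class SymbolTable:
    """Scoped symbol table: a stack of dictionaries, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        """Record a declaration; reject a second one in the same scope."""
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"'{name}' is already declared in this scope")
        scope[name] = type_

    def lookup(self, name):
        """Find the innermost declaration of name, or report it as undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"'{name}' is not declared")


table = SymbolTable()
table.declare("a", "int")
table.declare("b", "int")
print(table.lookup("a"))  # int
```

With this table, looking up c in "a = b + c" fails because c was never declared, and declaring x twice in one scope ("int x; char x;") is rejected as a conflict.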
But how?
In syntax analysis we used a context-free grammar (CFG).
Here we attach attributes to the grammar so that it can carry context-sensitive
information; the result is an attribute grammar.
An attribute grammar is a CFG in which attributes are attached to the terminal
and non-terminal symbols.
Example of attribute grammar:
E → E + T { E.value = E.value + T.value }
CFG production: E → E + T
Semantic rule: { E.value = E.value + T.value }
The semantic rules specify how the grammar should be interpreted.
Here, the values of the non-terminals E and T on the right-hand side are added together
and the result is stored in the non-terminal E on the left-hand side.
Semantic attributes may be assigned values from their domain at the time of parsing
and evaluated when assignments or conditions are processed.
Based on the way the attributes get their values, they can be broadly divided into two
categories: synthesized attributes and inherited attributes.
Synthesized attributes:
These attributes get values from the attribute values of their child nodes.
To illustrate, assume the following production: S → ABC
If S takes values from its child nodes (A, B, C), then the attribute is said to be
synthesized, as the values of A, B, and C are synthesized into S.
As in our previous example (E → E + T), the parent node E gets its value from
its child nodes.
Synthesized attributes never take values from their parent nodes or any
sibling nodes.
Inherited attributes:
In contrast to synthesized attributes, inherited attributes can take values from
parent and/or siblings.
Consider the production S → ABC:
A can get values from S, B, and C. B can take values from S, A, and C. Likewise, C can take
values from S, A, and B.
Syntax-directed translation (SDT) involves passing information bottom-up and/or top-down
through the parse tree in the form of attributes attached to the nodes.
S-attributed SDT
L-attributed SDT
A. S-attributed SDT
If an SDT uses only synthesized attributes, it is called an S-attributed SDT.
The semantic actions of an S-attributed SDT are written after the production
(at the end of the right-hand side).
S-attributed SDTs are evaluated in bottom-up parsing, as the values of the parent
nodes depend upon the values of the child nodes.
B. L-attributed SDT
In L-attributed SDTs, a non-terminal can get values from its parent, its children, and the
siblings to its left. Consider the production S → ABC:
S can take values from A, B, and C (synthesized). A can take values from S only. B can
take values from S and A. C can get values from S, A, and B.
No non-terminal can get values from a sibling to its right.
Attributes in L-attributed SDTs are evaluated in a depth-first, left-to-right traversal.
Every S-attributed SDT is also an L-attributed SDT.
Terminals are assumed to have only synthesized attributes, whose values are supplied by
the lexical analyzer.
The start symbol has no parent, hence no inherited attributes.
Example: parse tree for 3*4+5n, where n is the newline character
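The annotated tree can be reproduced with a short sketch that computes a synthesized attribute val bottom-up. It assumes the usual expression grammar (E → E + T | T, T → T * F | F, F → digit), which is not spelled out on the slide, and leaves out the top-level newline production. The Node class and the helper names are made up for this example.

```python
class Node:
    def __init__(self, symbol, children=(), lexval=None):
        self.symbol = symbol
        self.children = list(children)
        self.lexval = lexval  # supplied by the lexer for digit leaves only
        self.val = None       # synthesized attribute


def evaluate(node):
    """Post-order walk: evaluate the children first, then apply the node's rule."""
    for child in node.children:
        evaluate(child)
    kids = node.children
    if not kids:                     # leaf: digits carry lexval, operators carry None
        node.val = node.lexval
    elif len(kids) == 1:             # E -> T, T -> F, F -> digit: copy rule
        node.val = kids[0].val
    elif kids[1].symbol == "+":      # E -> E + T
        node.val = kids[0].val + kids[2].val
    elif kids[1].symbol == "*":      # T -> T * F
        node.val = kids[0].val * kids[2].val
    return node.val


def F(n):
    """Build the subtree F -> digit for the digit n."""
    return Node("F", [Node("digit", lexval=n)])


# Parse tree for 3*4+5
t34 = Node("T", [Node("T", [F(3)]), Node("*"), F(4)])
tree = Node("E", [Node("E", [t34]), Node("+"), Node("T", [F(5)])])
print(evaluate(tree))  # 17
```

Every val is computed only from the children of the node, which is why a single bottom-up pass is enough.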
Inherited attributes help to find the context (type, scope etc.) of a token.
Here the function addtype(id.entry, L.in) adds the type L.in to the symbol table entry
of the identifier id.
Parse tree for real x, y, z
Dependence of attributes in an inherited attribute system: the value of in (an inherited attribute) at the
three L nodes gives the type of the three identifiers x, y and z. These values are determined by computing
the attribute T.type at the left child of the root and then evaluating L.in top-down at the three L nodes in the
right subtree of the root.
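The top-down flow of L.in can be sketched in a few lines. The sketch assumes the declaration grammar D → T L, L → L1 , id | id with the rules L.in = T.type and L1.in = L.in; only addtype, T.type and L.in come from the slide, and the other names (declare, visit_L, the dictionary used as symbol table) are made up for this example.

```python
symbol_table = {}


def addtype(entry, type_):
    """Record the type of an identifier in its symbol table entry."""
    symbol_table[entry] = type_


def visit_L(ids, l_in):
    """L -> L1 , id | id : pass L.in down to L1, then record the rightmost id."""
    if len(ids) > 1:
        visit_L(ids[:-1], l_in)  # L1.in = L.in
    addtype(ids[-1], l_in)       # addtype(id.entry, L.in)


def declare(type_name, ids):
    """D -> T L : T.type is synthesized, then inherited by L as L.in."""
    t_type = type_name           # T.type
    visit_L(ids, t_type)         # L.in = T.type


declare("real", ["x", "y", "z"])
print(symbol_table)  # {'x': 'real', 'y': 'real', 'z': 'real'}
```

The type is computed once at T and then handed down the chain of L nodes, so each identifier receives it from its parent rather than from its children.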
Dependency Graph
A dependency graph is a directed graph indicating the interdependencies among the
synthesized and inherited attributes of the various nodes in a parse tree.
If an attribute b depends on an attribute c, then the semantic rule for b must be
evaluated after the semantic rule for c.
An algorithm to construct the dependency graph: make one node for every attribute of
every node of the parse tree; then, for each attribute, add an edge to it from every
attribute on which it depends.
The semantic rule A.a = f(X.x, Y.y) for the production A → XY defines the synthesized
attribute a of A to be dependent on the attribute x of X and the attribute y of Y,
so there is an edge from X.x to A.a and an edge from Y.y to A.a.
Similarly, for the semantic rule X.x = g(A.a, Y.y) for the same production, which defines
an inherited attribute, there will be an edge from A.a to X.x and an edge from Y.y to X.x.
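An evaluation order that respects the dependencies is a topological order of the graph. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the dictionary layout, mapping each attribute to the attributes it depends on, and the function name evaluation_order are choices made for this example.

```python
from graphlib import TopologicalSorter


def evaluation_order(depends_on):
    """Return the attributes in an order where each comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


# A.a = f(X.x, Y.y): edges X.x -> A.a and Y.y -> A.a
synthesized = {"A.a": {"X.x", "Y.y"}}

# X.x = g(A.a, Y.y): edges A.a -> X.x and Y.y -> X.x
inherited = {"X.x": {"A.a", "Y.y"}}

print(evaluation_order(synthesized)[-1])  # A.a
print(evaluation_order(inherited)[-1])    # X.x
```

The two rules are sorted separately here. Taken together they would make A.a and X.x depend on each other, and a graph with such a cycle has no valid evaluation order.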
Part 2
What is Next?
Type System
Type Checking
Type Conversions
Construction of Abstract Syntax Trees
An Abstract Syntax Tree (syntax tree) is a tree in which each leaf node
represents an operand, while each interior node represents an operator.
The syntax is "abstract": it does not represent every detail of the real syntax, but
rather the structural or content-related details.
It is a condensed form of the parse tree.
Example 1: id * id + id
Example 2: Abstract syntax tree for a+b*c-d
(a) parse tree (b) abstract syntax tree
Rules for constructing a syntax tree
Each node in a syntax tree can be implemented as a record with multiple fields.
In the node for an operator, one field identifies the operator and the remaining fields
contain pointers to the nodes for the operands.
The following functions are used to create the nodes of the syntax tree for expressions
with binary operators.
mknode (op, left, right) − It generates an operator node with label op and two fields containing
pointers to left and right.
mkunode (op, entry) − It generates a unary operator node with label op.
mkleaf (id, entry) − It generates an identifier node with label id and a field containing
entry, a pointer to the symbol table entry for the identifier.
mkleaf (num, val) − It generates a number node with label num and a field containing val,
the value of the number.
For example, consider the following SDT:
E → E + T { E.ptr = mknode('+', E.ptr, T.ptr); }
E → T { E.ptr = T.ptr; }
T → T * F { T.ptr = mknode('*', T.ptr, F.ptr); }
T → F { T.ptr = F.ptr; }
F → id { F.ptr = mkleaf(id, id.entry); }
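The calls made while parsing a * b + c bottom-up can be sketched as follows. The Node record, the helper show and the identifier names a, b, c are assumptions for this example, and a plain string stands in for the pointer to the symbol table entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    entry: Optional[str] = None  # stands in for the symbol table pointer


def mknode(op, left, right):
    """Operator node with pointers to its two operand nodes."""
    return Node(op, left, right)


def mkleaf(label, entry):
    """Leaf node for an identifier."""
    return Node(label, entry=entry)


def show(n):
    """Print the tree fully parenthesized, to make its shape visible."""
    if n.left is None and n.right is None:
        return n.entry
    return f"({show(n.left)} {n.label} {show(n.right)})"


# Order of calls for a * b + c
p1 = mkleaf("id", "a")    # F -> id
p2 = mkleaf("id", "b")    # F -> id
p3 = mknode("*", p1, p2)  # T -> T * F
p4 = mkleaf("id", "c")    # F -> id
p5 = mknode("+", p3, p4)  # E -> E + T
print(show(p5))  # ((a * b) + c)
```

The copy rules E → T and T → F create no nodes; they only pass the pointer upward, which is why the syntax tree is smaller than the parse tree.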
Dynamic type checking means that variables are checked against types only when the program is executing.
Programming languages such as Lisp, PHP, and JavaScript use dynamic type checking.
Type Expressions
☞ Read from your handout
Type Conversion