Week 3-4
OUTLINE OF THE TOPICS TO BE COVERED TODAY
Language definition
Syntax and semantic specification
Using BNF notation to define syntax of a language
SYNTAX AND SEMANTIC SPECIFICATION
For specifying syntax, we present a widely used notation called context-free grammars (CFGs), or BNF (Backus-Naur Form).
The terms context-free grammar and grammar are used interchangeably here.
For example: if ( expression ) statement else statement
(An if-else statement is the concatenation of the keyword if, an opening parenthesis, an expression, a closing parenthesis, a statement, the keyword else, and another statement.)
Using the variable expr to denote an expression and the variable stmt to denote a statement, this structuring rule can be expressed as
stmt → if ( expr ) stmt else stmt
The arrow may be read as “can have the form.” Such a rule is called a production.
In a production, lexical elements like the keyword if and the parentheses are called terminals.
Variables like expr and stmt represent sequences of terminals and are called non-terminals.
Definition of Grammars:
A context-free grammar has four components:
1. A set of terminal symbols, sometimes referred to as “tokens." The terminals are the elementary symbols of the
language defined by the grammar.
2. A set of non-terminals, sometimes called “syntactic variables." Each non-terminal represents a set of strings of
terminals, in a manner we shall describe.
3. A set of productions, where each production consists of a nonterminal, called the head or left side of the
production, an arrow, and a sequence of terminals and/or non-terminals, called the body or right side of the
production. The intuitive intent of a production is to specify one of the written forms of a construct; if the head
nonterminal represents a construct, then the body represents a written form of the construct.
4. A designation of one of the non-terminals as the start symbol.
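As a rough illustration, the four components can be written down as plain data. The sketch below encodes only the if-else production from earlier; the names `Grammar`, `if_grammar`, and `is_well_formed` are invented here for illustration, and no productions are given for expr because the slides do not define any.

```python
from dataclasses import dataclass

@dataclass
class Grammar:
    terminals: set       # 1. terminal symbols ("tokens")
    nonterminals: set    # 2. non-terminals ("syntactic variables")
    productions: list    # 3. (head, body) pairs; body is a tuple of symbols
    start: str           # 4. the start symbol

# Hypothetical encoding of: stmt -> if ( expr ) stmt else stmt
if_grammar = Grammar(
    terminals={"if", "else", "(", ")"},
    nonterminals={"stmt", "expr"},
    productions=[("stmt", ("if", "(", "expr", ")", "stmt", "else", "stmt"))],
    start="stmt",
)

def is_well_formed(g):
    """Check the structural constraints stated in the definition."""
    symbols = g.terminals | g.nonterminals
    return (
        g.start in g.nonterminals
        and not (g.terminals & g.nonterminals)      # the two sets are disjoint
        and all(
            head in g.nonterminals and all(s in symbols for s in body)
            for head, body in g.productions
        )
    )

print(is_well_formed(if_grammar))
```

The head of each production is a single non-terminal, which is what makes the grammar context-free.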
A Few Conventions for Specifying Productions
We specify grammars by listing their productions, with the productions for the start symbol listed first.
We assume that digits, signs such as < and <=, and boldface strings such as while are terminals.
An italicized name is a nonterminal, and any non-italicized name or symbol may be assumed to be a terminal.
For notational convenience, productions with the same nonterminal as the head can have their bodies grouped, with the alternative bodies separated by the symbol |, which we read as “or.”
Example:
Several examples here use expressions consisting of digits and plus
and minus signs; e.g., expressions such as 9-5+2, 3-1, or 7.
Since a plus or minus sign must appear between two digits, we refer
to such expressions as “lists of digits separated by plus or minus
signs."
The following grammar describes the syntax of these expressions.
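The productions are:
(1) list → list + digit
(2) list → list - digit
(3) list → digit
(4) digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9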
The bodies of the three productions with nonterminal list as head can equivalently be grouped:
list → list + digit | list - digit | digit
According to our conventions, the terminals of the grammar are the symbols
+ - 0 1 2 3 4 5 6 7 8 9
The non-terminals are the italicized names list and digit, with list being the start symbol because its productions are given first.
We say a production is for a nonterminal if the nonterminal is the head of the production.
A string of terminals is a sequence of zero or more terminals. The string of zero terminals, written as ε, is called the empty string.
Derivations:
Productions (1) to (4) are all we need to define the desired language
For example, we can deduce that 9-5+2 is a list as follows
a. 9 is a list by production (3), since 9 is a digit
b. 9-5 is a list by production (2), since 9 is a list and 5 is a digit
c. 9-5+2 is a list by production (1), since 9-5 is a list and 2 is a digit
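The deduction in steps a to c can be mechanized. The following is a minimal sketch, assuming single-character digits and no whitespace; the function name `is_list` is made up for this example. It starts from a digit and repeatedly extends the list by a sign followed by a digit.

```python
DIGITS = set("0123456789")

def is_list(s):
    """Return True if s can be deduced to be a list, in the manner of steps a-c."""
    # Step a: a single digit is a list.
    if not s or s[0] not in DIGITS:
        return False
    i = 1
    while i < len(s):
        # Steps b and c: a list followed by + or - and a digit is again a list.
        if s[i] in "+-" and i + 1 < len(s) and s[i + 1] in DIGITS:
            i += 2
        else:
            return False
    return True

print(is_list("9-5+2"))   # True
print(is_list("9-"))      # False
```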
Example 2:
call → id ( optparams )
optparams → params | ε
params → params , param | param
Parsing:
Parsing is the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar.
If the string cannot be derived from the start symbol of the grammar, the parser reports syntax errors within the string.
A parse tree pictorially shows how the start symbol of a grammar derives a string in the language. If nonterminal A has a production A → XYZ, then a parse tree may have an interior node labeled A with three children labeled X, Y, and Z, from left to right.
Formally, given a context-free grammar, a parse tree according to the grammar is a tree with the following properties:
The root is labeled by the start symbol.
Each leaf is labeled by a terminal or by ε.
Each interior node is labeled by a non-terminal.
Tree Terminology:
A tree consists of one or more nodes
Nodes may have labels, which in this book typically will be grammar symbols
When we draw a tree, we often represent the nodes by these labels only
Exactly one node is the root
All nodes except the root have a unique parent; the root has no parent
When we draw trees, we place the parent of a node above that node and draw an edge between them. The root is then the highest (top)
node.
If node N is the parent of node M, then M is a child of N. The children of one node are called siblings. They have an order, from the left,
and when we draw trees, we order the children of a given node in this manner.
A node with no children is called a leaf. Other nodes, those with one or more children, are interior nodes.
A descendant of a node N is either N itself, a child of N, a child of a child of N, and so on, for any number of levels.
We say node N is an ancestor of node M if M is a descendant of N.
In the parse tree for 9-5+2, each interior node and its children correspond to a production: the interior node corresponds to the head of the production, the children to the body.
The children of the root are labeled, from left to right, list, +, and digit.
Note that the root corresponds to the production list → list + digit; the left child of the root is similar to the root, with a child labeled - instead of +.
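From left to right, the leaves of a parse tree form the yield of the tree, which is the string generated or derived from the nonterminal at the root.
In the figure, the yield is 9-5+2; for convenience, all the leaves are shown at the bottom level.
Another definition of the language generated by a grammar is as the set of strings that can be generated by some parse tree.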
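To make the idea of a yield concrete, here is a small sketch that builds the parse tree for 9-5+2 by hand and reads its leaves from left to right. The `Node` class and the helpers `tree_yield` and `digit` are illustrative names, not part of the slides, and ε leaves are not modeled.

```python
class Node:
    """A tree node labeled by a grammar symbol; children are ordered left to right."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def tree_yield(node):
    """Concatenate the labels of the leaves, read from left to right."""
    if not node.children:          # a leaf: labeled by a terminal
        return node.label
    return "".join(tree_yield(child) for child in node.children)

def digit(d):
    """Interior node for the production digit -> d."""
    return Node("digit", [Node(d)])

# list -> list + digit at the root, list -> list - digit below it.
tree = Node("list", [
    Node("list", [
        Node("list", [digit("9")]),
        Node("-"),
        digit("5"),
    ]),
    Node("+"),
    digit("2"),
])

print(tree_yield(tree))   # 9-5+2
```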
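Ambiguity
A grammar can have more than one parse tree generating a given string of terminals; such a grammar is said to be ambiguous.
To show that a grammar is ambiguous, all we need to do is find a terminal string that is the yield of more than one parse tree.
Since a string with more than one parse tree usually has more than one meaning, we need to design unambiguous grammars, or use ambiguous grammars with additional rules to resolve the ambiguities.
Example: Suppose we use a single non-terminal string and do not distinguish between digits and lists:
string → string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9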
For example:
With the single non-terminal string, 9-5+2 has more than one parse tree.
The two parse trees correspond to the two ways of parenthesizing the expression: (9-5)+2 and 9-(5+2). Evaluate the result in each case.
The previous grammar, with the separate non-terminals list and digit, does not allow the second interpretation.
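The two parenthesizations can be compared directly by writing each parse as a nested structure and evaluating it. This is only a sketch: the tuple encoding `(left, op, right)` and the name `evaluate` are assumptions made for the example.

```python
def evaluate(tree):
    """Evaluate a tree given as an int or a (left, op, right) tuple."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b

left_grouped = ((9, "-", 5), "+", 2)    # (9-5)+2
right_grouped = (9, "-", (5, "+", 2))   # 9-(5+2)

print(evaluate(left_grouped))    # 6
print(evaluate(right_grouped))   # 2
```

The same string of terminals gives two different values, which is why the ambiguity matters.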
Associativity of Operators
By convention, the expression
9+5+2 is equivalent to (9+5)+2 and
9-5-2 is equivalent to (9-5)-2
How? When an operand like 5 has operators on both its left and right sides, conventions are needed for deciding which operator applies to that operand.
We say that the operator + associates to the left, because an operand with plus signs on both sides of it belongs to the operator to its left.
In most programming languages, the four arithmetic operators +, -, * and / are left-associative.
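The effect of the convention shows up as soon as the operator is subtraction. Below is a minimal sketch that evaluates a flat token list twice, once grouping from the left and once from the right; `eval_left`, `eval_right`, and the token-list format are hypothetical helpers for this illustration.

```python
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b}

def eval_left(tokens):
    """Left-associative grouping: a op b op c means (a op b) op c."""
    acc = tokens[0]
    for i in range(1, len(tokens), 2):
        acc = OPS[tokens[i]](acc, tokens[i + 1])
    return acc

def eval_right(tokens):
    """Right-associative grouping: a op b op c means a op (b op c)."""
    if len(tokens) == 1:
        return tokens[0]
    return OPS[tokens[1]](tokens[0], eval_right(tokens[2:]))

print(eval_left([9, "-", 5, "-", 2]))    # 2, the conventional (9-5)-2
print(eval_right([9, "-", 5, "-", 2]))   # 6, i.e. 9-(5-2)
```

For 9+5+2 both groupings give 16, so addition alone does not reveal which convention is in use.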
Some common operators, such as exponentiation, are right-associative.
The assignment operator = in C and its descendants is right-associative;
for example, the expression a=b=c is treated as a=(b=c).
Strings like a=b=c with a right-associative operator are generated by the following grammar:
right → letter = right | letter
letter → a | b | ... | z
Figure: Parse trees for the left and right associative operators
Precedence of Operators
Consider the expression 9+5*2.
It has two possible interpretations: (9+5)*2 and 9+(5*2). Is there any way to resolve this?
Associativity applies only to occurrences of the same operator, so while + and * lie on the same level it cannot decide between the two. Which new rules are required?
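One common answer is to give * and / higher precedence than + and - by using one non-terminal per precedence level. The sketch below assumes the usual layered grammar (expr → expr + term | expr - term | term, term → term * factor | term / factor | factor, factor → digit | ( expr )), which is not spelled out in the text above, and evaluates single-digit expressions with it. The function names are illustrative, and the left-recursive productions are implemented as loops.

```python
def parse_expr(s, i=0):
    """expr -> term { (+|-) term }: the lowest precedence level."""
    value, i = parse_term(s, i)
    while i < len(s) and s[i] in "+-":
        op = s[i]
        rhs, i = parse_term(s, i + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, i

def parse_term(s, i):
    """term -> factor { (*|/) factor }: binds tighter than + and -."""
    value, i = parse_factor(s, i)
    while i < len(s) and s[i] in "*/":
        op = s[i]
        rhs, i = parse_factor(s, i + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, i

def parse_factor(s, i):
    """factor -> digit | ( expr )."""
    if i < len(s) and s[i] == "(":
        value, i = parse_expr(s, i + 1)
        if i >= len(s) or s[i] != ")":
            raise SyntaxError("expected ')'")
        return value, i + 1
    if i < len(s) and s[i] in "0123456789":
        return int(s[i]), i + 1
    raise SyntaxError(f"unexpected input at position {i}")

def evaluate_infix(s):
    value, i = parse_expr(s)
    if i != len(s):
        raise SyntaxError(f"unexpected input at position {i}")
    return value

print(evaluate_infix("9+5*2"))     # 19
print(evaluate_infix("(9+5)*2"))   # 28
```

Because each loop folds its operands from the left, the same sketch also treats all four operators as left-associative.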
PRACTICE TASK # 2
Syntax-Directed Translation
Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar.
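Syntax-directed translation is used to translate infix expressions into postfix notation, to evaluate expressions, and to build syntax trees for programming constructs.
Postfix Notation
The postfix notation for an expression E can be defined inductively as follows:
1. If E is a variable or constant, then the postfix notation for E is E itself.
2. If E is an expression of the form E1 op E2, where op is any binary operator, then the postfix notation for E is E1' E2' op, where E1' and E2' are the postfix notations for E1 and E2, respectively.
3. If E is a parenthesized expression of the form (E1), then the postfix notation for E is the same as the postfix notation for E1.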
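For example:
The postfix notation for (9-5)+2 is 95-2+.
The translations of the constants 9, 5, and 2 are the constants themselves, by rule 1.
Then 9-5 is translated as 95- by rule 2.
The parenthesized expression (9-5) has the same translation, 95-, by rule 3. Applying rule 2 once more to the whole expression, with (9-5) as the left operand and 2 as the right operand, gives 95-2+.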
Examples:
How do we convert 9-(5+2) into postfix?
Evaluate the result.
No parentheses are needed in postfix notation; scan the string from left to right.
How do we convert 952+-3* into infix form?
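The left-to-right scan can be sketched with a stack: operands are pushed, and each operator pops its two operands and pushes the result. The code below assumes single-digit operands and the operators + - * /; `eval_postfix` and `postfix_to_infix` are names chosen for this example, and the infix output is fully parenthesized rather than minimal.

```python
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}

def eval_postfix(s):
    """Evaluate a postfix string by scanning it from left to right."""
    stack = []
    for ch in s:
        if ch in "0123456789":
            stack.append(int(ch))
        else:
            right = stack.pop()    # the most recent operand is the right one
            left = stack.pop()
            stack.append(OPS[ch](left, right))
    return stack.pop()

def postfix_to_infix(s):
    """Rebuild a fully parenthesized infix string using the same scan."""
    stack = []
    for ch in s:
        if ch in "0123456789":
            stack.append(ch)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left}{ch}{right})")
    return stack.pop()

print(eval_postfix("952+-"))          # 2
print(eval_postfix("952+-3*"))        # 6
print(postfix_to_infix("952+-3*"))    # ((9-(5+2))*3)
```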
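Synthesized attributes:
The idea is to associate quantities with programming constructs, for example values and types with expressions.
Attributes are associated with terminals and non-terminals.
Rules are attached to productions; they describe how the attributes are computed at the nodes of the parse tree where the production relates a node to its child nodes.
An attribute is synthesized if its value at a parse-tree node is determined from the attribute values at the node's children and at the node itself.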
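As a sketch of how a synthesized attribute might be computed, the code below attaches a string-valued attribute `t` (the postfix form) to every node of a parse tree for the list/digit grammar and fills it in bottom-up, children first. The attribute name `t`, the `Node` class, and the rule for three-child nodes (left part, then right part, then operator) are assumptions chosen for this illustration.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.t = None              # synthesized attribute: postfix string

def annotate(node):
    """Compute node.t from the attributes of the node's children."""
    for child in node.children:
        annotate(child)            # children are evaluated before their parent
    kids = node.children
    if not kids:                   # terminal leaf: its own symbol
        node.t = node.label
    elif len(kids) == 1:           # list -> digit, digit -> 0 | ... | 9
        node.t = kids[0].t
    else:                          # list -> list op digit
        node.t = kids[0].t + kids[2].t + kids[1].t
    return node.t

def digit(d):
    return Node("digit", [Node(d)])

# Parse tree for 9-5+2.
tree = Node("list", [
    Node("list", [Node("list", [digit("9")]), Node("-"), digit("5")]),
    Node("+"),
    digit("2"),
])

print(annotate(tree))   # 95-2+
```

Since every rule reads only the children's attributes, a single post-order walk of the tree is enough to evaluate them all.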
Formal Definition
Thanks