Lecture02 Single Slide Handout
Describing Syntax
Chapter 3
ISBN 0-321-49362-1
Lecture 2 Topics:
Introduction
The General Problem of Describing Syntax:
o Language Recognizers
o Language Generators
Formal Methods of Describing Syntax:
o Backus-Naur Form (BNF):
Extended BNF (EBNF)
o Context-Free Grammars (CFG):
Grammars
Derivations
Parse Trees
Ambiguity
Attribute Grammars:
o Static Semantics
1-2
Introduction
The study of programming languages, like
the study of natural languages, can be
divided into:
Examinations of syntax.
Examinations of semantics.
Syntax: the form or structure of the
expressions, statements, and program units.
Semantics: the meaning of the expressions,
statements, and program units.
Syntax and semantics provide a language’s
definition.
1-3
Introduction: Example
Example:
The syntax of a Java “while” statement is:
while (boolean_expr) statement
The semantics of the same statement is:
When the current value of the Boolean
expression is true, the embedded statement is
executed.
Then control implicitly returns to the Boolean
expression to repeat the process.
If the Boolean expression is false, control
transfers to the statement following the while
construct.
1-4
Introduction: Language Users
1-5
The General Problem of Describing
Syntax: Terminology
A sentence (statement) is a string of
characters over some alphabet.
A language is a set of sentences
(statements).
A lexeme is the lowest-level syntactic unit of
a language (e.g., *, sum, begin). Lexemes include:
The language's operators.
The language's special words.
The language's numeric literals.
Etc.
A token is a category of lexemes (e.g.,
identifier).
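The lexeme/token distinction can be sketched in a few lines of Python. This is a minimal illustration, not a real lexer: the token names (int_literal, identifier, and so on), the regular expressions, and the sample statement are all assumptions chosen for the example.

```python
import re

# Hypothetical token categories for a tiny language (names are illustrative).
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return (lexeme, token) pairs; raise ValueError on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":          # whitespace separates lexemes
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:8} {token}")
```

Here index and count are different lexemes that fall into the same token category, identifier.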
1-6
Example
Recognizers:
A recognition device reads input strings over the alphabet of
the language and decides whether the input strings belong
to the language (accept or reject the given input strings).
Example: syntax analysis (parsing) part of a compiler.
The syntax analyzer determines whether the given programs
are syntactically correct.
Detailed discussion of syntax analysis appears in Chapter 4.
Generators:
A device that generates sentences of a language.
One can determine whether a particular sentence is
syntactically correct by comparing it to the structure of the
generator.
1-8
BNF and Context-Free Grammars
Context-Free Grammars:
Developed by Noam Chomsky in the mid-1950s.
Language generators, meant to describe the
syntax of natural languages.
Define a class of languages called context-free
languages.
Backus-Naur Form (1959):
Invented by John Backus to describe the syntax of
Algol 58.
BNF is equivalent to context-free grammars.
1-9
BNF Fundamentals
1-11
<assign> → <var> = <expression>
1-13
BNF Fundamentals (continued)
1-14
BNF Rules
An abstraction (or nonterminal symbol) can
have more than one RHS:
<stmt> → <single_stmt>
<stmt> → begin <stmt_list> end
1-15
BNF Rules: More Examples
1-16
Describing Lists
Example of a list:
a list of identifiers appearing on a data
declaration statement.
<ident_list> → ident
| ident, <ident_list>
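The recursive list rule can be read either as a recognizer or as a generator. The sketch below assumes the rule <ident_list> → ident | ident, <ident_list>, that the input is already split into a list of strings, and that an ident looks like a typical identifier. The function names are illustrative.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")   # assumed shape of an `ident` lexeme

def is_ident_list(tokens):
    """Recognizer: accept token lists derivable from <ident_list>."""
    if not tokens or not IDENT.fullmatch(tokens[0]):
        return False
    if len(tokens) == 1:                       # <ident_list> -> ident
        return True
    # <ident_list> -> ident , <ident_list>
    return tokens[1] == "," and is_ident_list(tokens[2:])

def gen_ident_list(names):
    """Generator: use the recursive rule for every name except the last."""
    if not names:
        raise ValueError("an <ident_list> contains at least one ident")
    if len(names) == 1:
        return [names[0]]                      # base rule ends the recursion
    return [names[0], ","] + gen_ident_list(names[1:])

print(gen_ident_list(["a", "b", "c"]))
print(is_ident_list(["a", ",", "b"]), is_ident_list(["a", ","]))
```

The recognizer mirrors the rule directly: one branch per alternative, with the recursive alternative calling the function again on the rest of the list.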
1-18
An Example Grammar
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
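A leftmost derivation with this grammar can be traced mechanically. The sketch below stores the six rules in a dictionary and, at each step, replaces the leftmost nonterminal with a chosen alternative. The dictionary layout, the function name, and the list of choices are assumptions made for the illustration.

```python
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>":   [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>":    [["<var>", "=", "<expr>"]],
    "<var>":     [["a"], ["b"], ["c"], ["d"]],
    "<expr>":    [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>":    [["<var>"], ["const"]],
}

def leftmost_derivation(choices, start="<program>"):
    """Expand the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the RHS alternative to apply at that step.
    Returns every sentential form, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:index] + GRAMMAR[form[index]][choice] + form[index + 1:]
        steps.append(" ".join(form))
    return steps

# Alternatives chosen so that the derivation ends in: a = b + const
for step in leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", step)
```

Every printed line except the last is a sentential form that still contains nonterminals. The last line contains only terminals, so it is a sentence of the language.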
1-22
Derivations
1-23
Yet Another Example Grammar
A = B * ( A + C )
1-24
Yet Another Example Derivation
The statement A = B * ( A + C ) is generated by the
following leftmost derivation:
1-25
Parse Tree
A hierarchical representation of a
derivation.
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
1-26
Parse Tree
1-27
Ambiguity in Grammars
A grammar is ambiguous if and only if it
generates a sentential form that has two or more
distinct parse trees.
An ambiguous expression grammar of this type allows the parse
tree of an expression to grow on both the left and the right.
In such cases, it should allow the tree to grow on the right
only.
How can this be a problem?
Compilers often base code generation on the parse tree
of a construct.
If a construct has more than one parse tree, its meaning
cannot be determined uniquely.
1-28
An Ambiguous Expression Grammar
<op> → / | -
[Figure: parse trees rooted at <expr>]
1-29
Another Ambiguous Grammar
1-30
Two distinct parse trees for the same sentence, A = B + C * A
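To see why two parse trees are a problem, the sketch below enumerates every parse tree for the expression part of the sentence and evaluates each one. It assumes an ambiguous grammar of the form <expr> → <expr> + <expr> | <expr> * <expr> | id, a well-formed token list (operands and operators alternating), and arbitrary sample values for A, B, and C.

```python
def parse_trees(tokens):
    """All parse trees under <expr> -> <expr> + <expr> | <expr> * <expr> | id."""
    if len(tokens) == 1:
        return [tokens[0]]                      # <expr> -> id
    trees = []
    for i in range(1, len(tokens), 2):          # any operator may be the root
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees

def evaluate(tree, env):
    """Evaluate a tree bottom-up: lower operators are applied first."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b

env = {"A": 4, "B": 2, "C": 3}                  # arbitrary sample values
for tree in parse_trees(["B", "+", "C", "*", "A"]):
    print(tree, "=>", evaluate(tree, env))
```

The two trees give different values for the same sentence, which is exactly the sense in which the meaning is not determined uniquely.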
1-31
An Unambiguous Expression Grammar
[Figure: parse tree rooted at <expr>, expanding to <expr> - <term>, with const leaves]
1-32
Operator Precedence
1-34
Operator Precedence
For example, using the grammar in slide 24, try to
sketch the parse trees for these two expressions:
A + B * C
A * B + C
What have you noticed?
You will see that:
For A + B * C, the (*) operator is the lowest in the tree
(indicating it is to be done first), which leads to a correct
evaluation.
For A * B + C, however, the (+) operator is the lowest,
which leads to an incorrect evaluation.
So, the grammar (slide 24) is sensitive to the order
of the operators in the expressions: it places operators in the
tree by their position, not by their precedence.
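The effect can be reproduced with a small sketch. It assumes a right-recursive expression grammar with a single nonterminal, <expr> → id + <expr> | id * <expr> | id, which stands in here for the slide's grammar. The function names are illustrative.

```python
def parse_right_recursive(tokens):
    """Parse with <expr> -> id + <expr> | id * <expr> | id (no precedence levels)."""
    head = tokens[0]
    if len(tokens) == 1:
        return head                              # <expr> -> id
    # The first operator becomes the root; the rest is its right operand.
    return (head, tokens[1], parse_right_recursive(tokens[2:]))

def lowest_operator(tree):
    """Return the deepest operator, i.e. the one applied first.

    Assumes the tree contains at least one operator.
    """
    while isinstance(tree[2], tuple):
        tree = tree[2]
    return tree[1]

for text in ("A + B * C", "A * B + C"):
    tree = parse_right_recursive(text.split())
    print(text, "->", tree, "lowest:", lowest_operator(tree))
```

In both cases the rightmost operator ends up lowest, regardless of whether it is + or *.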
1-35
Operator Precedence
So, how can this problem be solved?
Take operator precedence into consideration when
designing the grammar:
Use separate nonterminal symbols to represent the
operands of operators that have different
precedence.
This requires additional nonterminals and some new
rules.
For example, to correct the grammar in slide 24,
we could use three nonterminals to represent
operands, which allows the grammar to force
different operators to different levels in the parse
tree.
But, how? See the next slide!
1-36
Operator Precedence
If <expr> is the root symbol for expressions, +
can be forced to the top of the parse tree by
having <expr> directly generate only +
operators, using the new nonterminal, <term>,
as the right operand of +.
Next, we can define <term> to generate *
operators, using <term> as the left operand and
a new nonterminal, <factor>, as its right
operand.
Now, * will always be lower in the parse tree,
simply because it is farther from the start symbol
than + in every derivation.
1-37
This grammar generates the
same language as the above
grammar. It is unambiguous
and it specifies the usual
precedence order of
multiplication and addition
operators.
1-38
Operator Precedence: Example
(Leftmost Derivation)
A = B + C * A
1-40
Associativity of Operators
When an expression includes two operators that
have the same precedence (as * and / usually
have), for example “A / B * C”, a semantic
rule is required to specify which should have
precedence.
This rule is named associativity.
A grammar for expressions may correctly imply
operator associativity.
Consider the following example of an assignment
statement:
A = B + C + A
When this statement is derived using the grammar in the
next slide, its parse tree
looks like:
1-41
A parse tree for
A = B + C + A
illustrating the
associativity of addition
1-42
Associativity of Operators
The previous parse tree shows the left addition
operator lower than the right addition operator.
This is the correct order if addition is meant to be left
associative, which is typical.
In most cases, the associativity of addition in a
computer is irrelevant.
In mathematics, addition is associative, which means
that left and right associative orders of evaluation
mean the same thing.
That is, (A + B) + C = A + (B + C)
Subtraction and division are not associative, whether
in mathematics or in a computer.
Therefore, correct associativity may be essential for an
expression that contains either of them.
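A quick numeric check makes the point concrete. The helper names and sample values below are assumptions chosen for illustration. The last lines also show one case where addition in a computer is not associative: floating-point values, where rounding can make the two groupings differ.

```python
import operator

def left_assoc(op, x, y, z):
    """Group as (x op y) op z."""
    return op(op(x, y), z)

def right_assoc(op, x, y, z):
    """Group as x op (y op z)."""
    return op(x, op(y, z))

# Integer addition: both groupings agree.
print(left_assoc(operator.add, 1, 2, 3), right_assoc(operator.add, 1, 2, 3))

# Subtraction and division: the grouping changes the result.
print(left_assoc(operator.sub, 10, 4, 3), right_assoc(operator.sub, 10, 4, 3))
print(left_assoc(operator.truediv, 8, 4, 2), right_assoc(operator.truediv, 8, 4, 2))

# Floating-point addition: rounding makes the two groupings differ slightly.
print(left_assoc(operator.add, 0.1, 0.2, 0.3) == right_assoc(operator.add, 0.1, 0.2, 0.3))
```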
1-43
Associativity of Operators
When a grammar rule has its LHS also appearing at
the beginning of its RHS, the rule is said to be left
recursive.
This left recursion specifies left associativity.
For example, the left recursion of the rules of the
grammar below causes it to make both addition and
multiplication left associative.
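How left recursion produces left associativity can be sketched as follows. The example assumes a left-recursive rule of the form <expr> → <expr> + id | id and a well-formed token list. A hand-written parser cannot call itself first without looping forever, so the sketch uses a loop that builds the same tree the left-recursive rule describes.

```python
def parse_left_recursive(tokens):
    """Build the tree for <expr> -> <expr> + id | id.

    Each pass wraps the tree built so far as the LEFT operand of the
    next operator, so earlier operators end up lower in the tree.
    """
    tree = tokens[0]                              # <expr> -> id
    for i in range(1, len(tokens), 2):
        tree = (tree, tokens[i], tokens[i + 1])   # <expr> -> <expr> + id
    return tree

print(parse_left_recursive(["B", "+", "C", "+", "A"]))
```

The left addition sits lower in the resulting tree than the right one, so B + C is computed first.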
1-44
Associativity of Operators
The exponentiation operator is right associative in
most languages that provide it.
To indicate right associativity, right recursion can be
used.
A grammar rule is right recursive if the LHS appears
at the right end of the RHS.
Rules such as:
1-46
BNF and EBNF
BNF:
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
EBNF:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
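One practical benefit of the EBNF form is that each pair of braces maps directly onto a loop. The evaluator below is a minimal sketch under stated assumptions: <factor> is taken to be a numeric literal only, the input is already split into tokens, and the function name is illustrative.

```python
def evaluate(tokens):
    """Evaluate a token list using
         <expr> -> <term> {(+ | -) <term>}
         <term> -> <factor> {(* | /) <factor>}
       with <factor> assumed to be a numeric literal.
    """
    pos = 0

    def factor():
        nonlocal pos
        value = float(tokens[pos])
        pos += 1
        return value

    def term():
        nonlocal pos
        value = factor()
        # {(* | /) <factor>}: repeat zero or more times
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():
        nonlocal pos
        value = term()
        # {(+ | -) <term>}: repeat zero or more times
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result

print(evaluate("2 + 3 * 4".split()))
print(evaluate("10 - 4 - 3".split()))
```

Because each loop folds its result into the running value from the left, the evaluator keeps the left associativity that the left-recursive BNF rules express, and because term() is called from expr(), * and / bind tighter than + and -.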
1-47
Recent Variations in EBNF
1-48
Any Questions?