CSC 461 - Chapter 03
DESCRIBING SYNTAX
AND SEMANTICS
SLIDES COURTESY OF:
“CONCEPTS OF PROGRAMMING LANGUAGES” –BY ROBERT W.
SEBESTA.
PUBLISHED BY PEARSON EDUCATION, INC. USA. ELEVENTH
EDITION. 2016
Language = the set of all C programs; each such string (program) is a sentence.
For the Java programming language, each Java program is a sentence.
THE GENERAL PROBLEM OF
DESCRIBING SYNTAX
Formal descriptions of the syntax of programming
languages often do not include descriptions of the
lowest-level syntactic units. These small units are
called lexemes.
The lexemes of a programming language include
its numeric literals, operators, and special words.
Lexemes are partitioned into groups—for
example, the names of variables, methods, and
classes.
Each group of lexemes is represented by a name, or token.
So, a token of a language is a category of its
lexemes.
An identifier is a token that can have lexemes or
instances, such as sum and total.
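For example, for a statement such as index = 2 * count + 17; one possible breakdown into lexemes and tokens is:
Lexemes: index, =, 2, *, count, +, 17, ;
Tokens: identifier, equal_sign, int_literal, mult_op, identifier, plus_op, int_literal, semicolon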
CONTEXT-FREE GRAMMARS
In the middle to late 1950s, two men, Noam
Chomsky and John Backus, developed the same
syntax description formalism in unrelated
research efforts. Subsequently, it became the
most widely used method for describing programming
language syntax.
In the mid-1950s, Noam Chomsky described
four classes of generative devices or grammars
that define four classes of languages (Chomsky,
1956, 1959).
Two of these grammar classes, named context-
free and regular, turned out to be useful for
describing the syntax of programming
languages.
CONTEXT-FREE GRAMMARS
Regular grammars can describe the forms of
the tokens of programming languages.
The syntax of whole programming
languages, with minor exceptions, can be
described by context-free grammars.
ORIGINS OF BACKUS-NAUR FORM
John Backus’s paper introduced a new formal
notation for specifying programming
language syntax.
The new notation was later modified slightly
by Peter Naur for the description of ALGOL
60 (Naur, 1960). This revised method of
syntax description became known as the
Backus-Naur Form, or simply BNF.
BNF is nearly identical to Chomsky’s
generative devices for context-free
languages, called context-free grammars.
FUNDAMENTALS
A metalanguage is a language that is used to
describe another language. BNF is a metalanguage
for programming languages.
BNF uses abstractions for syntactic structures. A
simple Java assignment statement, for example,
might be represented by abstraction <assign>. The
actual definition of <assign> can be given by
<assign> → <var> = <expression>
The text to the left of the arrow is called the
left-hand side (LHS). The text to the right of the
arrow is the definition of the LHS. It is called the right-
hand side (RHS) and consists of some mixture of
tokens, lexemes, and references to other abstractions.
Altogether, the definition is called a rule, or production.
FUNDAMENTALS
The abstractions in a BNF description or
grammar are often called nonterminal
symbols or simply nonterminals.
The lexemes and tokens of the rules are
called terminal symbols or simply terminals.
A BNF description, or grammar, is a
collection of rules.
Nonterminal symbols can have two or more
distinct definitions, representing two or more
possible syntactic forms in the language.
GRAMMARS AND DERIVATIONS
Recursion is used to describe variable-length lists in BNF.
A rule is recursive if its LHS appears in its RHS. The
following rules illustrate how recursion is used to
describe lists:
<ident_list> → identifier | identifier, <ident_list>
A grammar is a generative device for defining
languages.
The sentences of the language are generated through
a sequence of applications of the rules, beginning
with a special nonterminal of the grammar called the
start symbol.
This sequence of rule applications is called a
derivation.
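For example, using a small illustrative grammar (the rules below are chosen here for illustration, not taken from the text):
<stmt> → <var> = <expr>
<var> → a | b
<expr> → <var> + <var> | <var>
one derivation of the sentence a = b + a, starting from the start symbol <stmt>, is:
<stmt> => <var> = <expr>
       => a = <expr>
       => a = <var> + <var>
       => a = b + <var>
       => a = b + a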
GRAMMARS AND DERIVATIONS
Each of the strings in a derivation is called a
sentential form.
When the replaced nonterminal is always the
leftmost nonterminal in the previous
sentential form, that order of replacement is
called a leftmost derivation.
When the replaced nonterminal is always the
rightmost nonterminal in the previous
sentential form, that order of replacement is
called a rightmost derivation.
PARSE TREES
One of the most attractive features of grammars is
that they naturally describe the hierarchical syntactic
structure of the sentences of the languages they
define. These hierarchical structures are called
parse trees.
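For the illustrative grammar and sentence a = b + a used above, the parse tree (sketched in text form) is:
              <stmt>
             /   |   \
         <var>   =   <expr>
           |        /  |   \
           a    <var>  +  <var>
                  |          |
                  b          a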
AMBIGUOUS GRAMMAR
A grammar that generates a sentential form for
which there are two or more distinct parse trees is
said to be ambiguous.
Ambiguity is related to the grammar, not the
language.
Removing Ambiguity
When an expression includes two operators with the
same precedence (as * and / usually have), for
example A / B * C, a semantic rule must specify which
should have precedence. This rule is called associativity.
When an expression includes two different operators,
for example x + y * z, it is important to assign
different precedence levels to the operators. In an
unambiguous grammar, the multiplication operator is
generated lower in the parse tree, which indicates that
it has precedence over the addition operator in the expression.
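For example, the illustrative grammar
<expr> → <expr> + <expr> | <expr> * <expr> | a | b | c
is ambiguous: the sentence a + b * c has two distinct parse trees, one grouping a + b first and one grouping b * c first. Rewriting it with one nonterminal per precedence level,
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → a | b | c
removes the ambiguity: * is always generated lower in the tree than +, so it has higher precedence, and both operators become left associative.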
EXTENDED BNF
Because of a few minor inconveniences in BNF, it
has been extended in several ways. Most extended
versions are called Extended BNF, or simply EBNF,
even though they are not all exactly the same.
The extensions do not enhance the descriptive
power of BNF; they only increase its readability and
writability.
Three extensions are commonly included in the
various versions of EBNF.
The first of these denotes an optional part of an
RHS, which is delimited by brackets. For example,
a C if-else statement can be described as:
<if_stmt> → if (<expression>) <statement> [else <statement>]
EXTENDED BNF
Without the use of the brackets, the syntactic
description of this statement would require the
following two rules:
<if_stmt> → if (<expression>) <statement>
          | if (<expression>) <statement> else <statement>
The second extension is the use of braces in an
RHS to indicate that the enclosed part can be
repeated indefinitely or left out altogether.
The third common extension deals with multiple-
choice options. When a single element must be
chosen from a group, the options are placed in
parentheses and separated by the OR operator, |.
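For example (an illustrative rule, not taken from the text), braces for repetition and a parenthesized multiple-choice group can be combined:
<expr> → <term> {(+ | -) <term>}
which describes the same language as the recursive BNF rules
<expr> → <expr> + <term> | <expr> - <term> | <term>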
ATTRIBUTE GRAMMARS
An attribute grammar is a device used to
describe more of the structure of a
programming language than can be described
with a context-free grammar.
It is an extension of context-free grammar.
Static Semantics
Some characteristics of programming languages are
difficult to describe with BNF and EBNF, and
some are impossible.
In Java, for example, a floating-point value cannot
be assigned to an integer type variable, although
the opposite is legal.
Although this restriction can be specified in BNF, it
requires additional nonterminal symbols and rules.
STATIC SEMANTICS
If all of the typing rules of Java were specified in
BNF, the grammar would become too large to be
helpful because the size of the grammar
determines the size of the syntax analyzer.
These problems illustrate the categories of
language rules called static semantics rules.
The static semantics of a language is only
indirectly related to the meaning of programs
during execution; rather, it has to do with the legal
forms of programs (syntax rather than semantics).
Many static semantic rules of a language state its
type constraints. Static semantics is so named
because the analysis required to check these
specifications can be done at compile time.
STATIC SEMANTICS
Attribute grammars are context-free grammars to
which attributes, attribute computation functions,
and predicate functions have been added.
Attributes, which are associated with grammar
symbols (the terminal and nonterminal symbols),
are similar to variables in the sense that they can
have values assigned to them.
Attribute computation functions, sometimes called
semantic functions, are associated with grammar
rules. They are used to specify how attribute
values are computed.
Predicate functions, which state the static
semantic rules of the language, are associated
with grammar rules.
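For example (the attribute names here follow common usage and are illustrative), for the rule <assign> → <var> = <expr>, a semantic function might copy the variable's type into the expression's expected type, <expr>.expected_type ← <var>.actual_type, and a predicate might then require <expr>.actual_type == <expr>.expected_type.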
TWO TYPES OF ATTRIBUTES
Synthesized attributes: the value of a synthesized attribute at a parse-tree node P is computed from the attribute values of P's children c1, c2, c3, c4:
Synthesized(P) = f(c1, c2, c3, c4)
Synthesized attributes are used to pass semantic information up a parse tree.
Inherited attributes: the value of an inherited attribute at a node is computed from attribute values of the node's parent and siblings. Inherited attributes are used to pass semantic information down and across a parse tree.
EXAMPLE
The syntax and static semantics of a simple assignment statement are as
follows:
1. The only variable names are A, B, and C.
2. The right side of an assignment can be either a variable or an expression in the form of a variable added to another variable.
3. The variables can be one of two types: int or real.
4. When there are two variables on the right side of an assignment, they need not be the same type.
5. When the types of the two operands are not the same, the expression type is always real.
6. When they are the same, the expression type is that of the operands.
7. The type of the left side of the assignment must match the type of the right side.
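A minimal Python sketch of how an attribute evaluator could check these rules for an assignment of the form var = var + var; the variable-to-type table and function names are illustrative assumptions, not taken from the text:

var_types = {"A": "int", "B": "int", "C": "real"}   # assumed declared types (rule 3)

def expr_actual_type(v1, v2):
    # Semantic function: same operand types -> that type; otherwise real (rules 5-6)
    return var_types[v1] if var_types[v1] == var_types[v2] else "real"

def assignment_is_legal(target, v1, v2):
    expected = var_types[target]              # expected_type comes from the left side
    actual = expr_actual_type(v1, v2)         # actual_type is synthesized from the operands
    return actual == expected                 # predicate: rule 7

print(assignment_is_legal("A", "A", "B"))     # True:  int = int + int
print(assignment_is_legal("A", "A", "C"))     # False: int = real expression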
DYNAMIC SEMANTICS
Operational Semantics
Operational semantics describes the meaning of
a program by defining the execution steps
needed to evaluate it. It gives a detailed, step-
by-step explanation of how each statement or
expression changes the program’s state.
Example: For an assignment statement x = x +
1, operational semantics would describe the
exact process of looking up the current value of
x, adding 1, and then updating x with the new
value.
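A rough Python sketch of this step-by-step reading of x = x + 1; representing the program state as a dictionary is an assumption for illustration, not the book's notation:

state = {"x": 3}           # the program state before the statement

temp = state["x"]          # step 1: look up the current value of x
temp = temp + 1            # step 2: add 1
state["x"] = temp          # step 3: store the new value back into x

print(state)               # {'x': 4}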
DYNAMIC SEMANTICS
Denotational Semantics
Denotational semantics defines the meaning of a
program by mapping its elements to
mathematical objects, such as functions or
domains. Each program construct is mapped to a
mathematical object that represents its
"meaning" without specifying how it executes
step-by-step.
Example: In denotational semantics, the
assignment x = x + 1 might be represented as a
function that takes an initial state (a mapping of
variables to values) and produces a new state
with x updated.
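A small Python sketch of the same statement viewed denotationally, as a function from states to states; the state-as-dictionary representation is again an illustrative assumption:

def meaning_x_gets_x_plus_1(state):
    # The statement denotes a mapping from an initial state to a new state;
    # nothing here says how the machine executes it step by step.
    new_state = dict(state)
    new_state["x"] = state["x"] + 1
    return new_state

before = {"x": 3, "y": 7}
after = meaning_x_gets_x_plus_1(before)
print(before, after)       # {'x': 3, 'y': 7} {'x': 4, 'y': 7}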
DYNAMIC SEMANTICS
Axiomatic Semantics
Axiomatic semantics focuses on using logical
assertions to specify a program's behavior. It
uses preconditions and postconditions to
describe what must be true before and after
each statement's execution. This approach is
useful for proving programs' correctness.
Example: For x = x + 1, we could specify a
precondition P: x = 3 and a postcondition Q: x =
4, indicating that if x starts as 3, it will end as 4
after this statement.
DYNAMIC SEMANTICS
Originally
Based on formal logic (predicate calculus)
For formal program verification
Form
Assertions: The logical expressions used in axiomatic
semantics are called predicates or assertions.
Pre-, post- conditions: {P} statement {Q}
{b > 0}
a = b + 1
{a > 1}
Axioms: logical statements that are assumed to be true.
Inference rules: a method of inferring the truth of one assertion based on the values of other assertions; they state how one triple can be inferred from others, for example:
{P} S {Q},  P' => P,  Q => Q'
-----------------------------
        {P'} S {Q'}
WEAKEST PRECONDITIONS
The weakest precondition is the least
restrictive precondition that will guarantee
the validity of the associated postcondition.
An inference rule is a method of inferring the
truth of one assertion on the basis of the
values of other assertions. The general form
of an inference rule is as follows:
S1, S2, . . . , Sn
------------------
        S
This rule states that if S1, S2, . . . , and Sn
are true, then the truth of S can be inferred.
The top part of an inference rule is called its
antecedent; the bottom part is called its
consequent.
ASSIGNMENT STATEMENTS
The precondition and postcondition of an
assignment statement together define its
meaning.
To define the meaning of an assignment
statement, there must be a way to compute its
precondition from its postcondition.
Let x = E be a general assignment statement,
and Q be its postcondition. Then, its weakest
precondition, P, is defined by the axiom
P = Q_(x -> E)
which means that P is computed as Q with all
instances of x replaced by E. For example, if
we have the assignment statement and
postcondition
ASSIGNMENT STATEMENTS
a = b / 2 - 1 {a < 10}, the weakest
precondition is computed by substituting b / 2
- 1 for ‘a’ in the postcondition {a < 10}, as
follows:
b / 2 - 1 < 10
b < 22
Thus, the weakest precondition for the given
assignment statement and postcondition is {b
< 22}
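For another illustration of the same axiom, consider x = x + 1 with postcondition {x > 1}: substituting x + 1 for x gives x + 1 > 1, that is, x > 0, so the weakest precondition is {x > 0}.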
The usual notation for specifying the
axiomatic semantics of a given statement
form is: {P} S {Q}
A given assignment statement with both a
precondition and a postcondition can be considered a theorem.
THE END