CSC 461 - Chapter 03

Chapter 3 discusses the complexities of describing programming languages, focusing on syntax and semantics. It explains the importance of formal grammar, particularly context-free grammars and Backus-Naur Form (BNF), in defining language structures. Additionally, it covers static and dynamic semantics, including operational, denotational, and axiomatic semantics, which help in understanding program behavior and correctness.

CHAPTER 3

DESCRIBING SYNTAX
AND SEMANTICS
SLIDES COURTESY OF:
"CONCEPTS OF PROGRAMMING LANGUAGES" BY ROBERT W. SEBESTA.
PUBLISHED BY PEARSON EDUCATION, INC., USA. ELEVENTH EDITION, 2016.

Md. Rawnak Saif Adib


Lecturer
Department of Computer
Science and Engineering
 The task of providing a concise yet
understandable description of a programming
language is difficult but essential to the
language’s success.
 One of the problems in describing a language is
the diversity of the people who must understand
the description.
 Programming language implementors must be
able to determine how the expressions,
statements, and program units of a language are
formed.
 Language users must be able to determine how
to encode software solutions by referring to a
language reference manual.
 The study of programming languages, like the
study of natural languages, can be divided into
examinations of syntax and semantics.
 The syntax of a programming language is the
form of its expressions, statements, and program
units.
 Its semantics is the meaning of those
expressions, statements, and program units.
 For example, the syntax of a Java while statement
is
while (boolean_expr) statement
 The semantics of this statement form is that
when the current value of the Boolean expression
is true, the embedded statement is executed.
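To make this concrete, here is a minimal, illustrative Java program (my own sketch, not from the slides; the variable name count is hypothetical) that uses this statement form:

public class WhileDemo {
    public static void main(String[] args) {
        int count = 3;                   // hypothetical loop variable
        while (count > 0) {              // boolean_expr
            System.out.println(count);   // embedded statement (a block here)
            count = count - 1;
        }
        // The embedded statement executes repeatedly as long as count > 0 is true.
    }
}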
THE GENERAL PROBLEM OF
DESCRIBING SYNTAX
 A language, whether natural (such as
English) or artificial (such as Java), is a set of
strings of characters from some alphabet.
The strings of a language are called
sentences or statements.
 Let Σ be a set of characters. A language over Σ is
a set of strings of characters drawn from Σ.
 Alphabet = English characters
Language = English sentences
 Alphabet = ASCII
Language = C programs
 Each string over Σ is a sentence.
 For the Java programming language, a Java
program is a sentence.
THE GENERAL PROBLEM OF
DESCRIBING SYNTAX
 Formal descriptions of the syntax of programming
languages often do not include descriptions of the
lowest-level syntactic units. These small units are
called lexemes.
 The lexemes of a programming language include
its numeric literals, operators, and special words.
 Lexemes are partitioned into groups—for
example, the names of variables, methods, and
classes form a group called identifiers.
 Each lexeme group is represented by a name, or token.
 So, a token of a language is a category of its
lexemes.
 An identifier is a token that can have lexemes, or
instances, such as sum and total.
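As an illustrative sketch in the style of the textbook's example, the lexemes and tokens of the Java statement index = 2 * count + 17; could be grouped as follows:

Lexeme    Token
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
17        int_literal
;         semicolon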
CONTEXT-FREE GRAMMARS
 In the middle to late 1950s, two men, Noam
Chomsky and John Backus, developed the same
syntax description formalism in unrelated
research efforts. Subsequently, it became the
most widely used method for describing programming
language syntax.
 In the mid-1950s, Noam Chomsky described
four classes of generative devices or grammars
that define four classes of languages (Chomsky,
1956, 1959).
 Two of these grammar classes, named context-
free and regular, turned out to be useful for
describing the syntax of programming
languages.
CONTEXT-FREE GRAMMARS
 Regular grammars can describe the forms of
the tokens of programming languages.
 The syntax of whole programming
languages, with minor exceptions, can be
described by context-free grammars.
ORIGINS OF BACKUS-NAUR FORM
 John Backus’s paper introduced a new formal
notation for specifying programming
language syntax.
 The new notation was later modified slightly
by Peter Naur for the description of ALGOL
60 (Naur, 1960). This revised method of
syntax description became known as the
Backus-Naur Form, or simply BNF.
 BNF is nearly identical to Chomsky’s
generative devices for context-free
languages, called context-free grammars.
FUNDAMENTALS
 A metalanguage is a language that is used to
describe another language. BNF is a metalanguage
for programming languages.
 BNF uses abstractions for syntactic structures. A
simple Java assignment statement, for example,
might be represented by abstraction <assign>. The
actual definition of <assign> can be given by
<assign> → <var> = <expression>
 The text to the left of the arrow is called the
left-hand side (LHS). The text to the right of the
arrow is the definition of the LHS. It is called the right-
hand side (RHS) and consists of some mixture of
tokens, lexemes, and references to other abstractions.
 Altogether, the definition is called a rule or
production.
FUNDAMENTALS
 The abstractions in a BNF description or
grammar are often called nonterminal
symbols or simply nonterminals.
 The lexemes and tokens of the rules are
called terminal symbols or simply terminals.
 A BNF description, or grammar, is a
collection of rules.
 Nonterminal symbols can have two or more
distinct definitions, representing two or more
possible syntactic forms in the language.
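As a small illustrative sketch (the rule below is hypothetical, not from the slides), a nonterminal with two syntactic forms can be written either as two separate rules or as a single rule whose RHSs are separated by the symbol | (meaning "or"):
<stmt> → <single_stmt>
<stmt> → begin <stmt_list> end
or, equivalently,
<stmt> → <single_stmt> | begin <stmt_list> end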
GRAMMARS AND DERIVATIONS
 Recursion is used to describe variable-length lists in
BNF.
 A rule is recursive if its LHS appears in its RHS. The
following rules illustrate how recursion is used to
describe lists:
<ident_list> → identifier | identifier, <ident_list>
 A grammar is a generative device for defining
languages.
 The sentences of the language are generated through
a sequence of applications of the rules, beginning
with a special nonterminal of the grammar called the
start symbol.
 This sequence of rule applications is called a
derivation.
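For example (a sketch, assuming identifier stands for a concrete name), repeatedly applying the recursive rule above generates a list of three identifiers; each => marks one rule application:
<ident_list> => identifier, <ident_list>
             => identifier, identifier, <ident_list>
             => identifier, identifier, identifier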
GRAMMARS AND DERIVATIONS
 Each of the strings in a derivation is called a
sentential form.
 When the replaced nonterminal is always the
leftmost nonterminal in the previous
sentential form, that order of replacement is
called leftmost derivation.
 When the replaced nonterminal is always the
rightmost nonterminal in the previous
sentential form, that order of replacement is
called rightmost derivation.
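As an illustrative sketch, consider the small grammar used in the example later in these slides:
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
A leftmost derivation of the sentence A = B + C replaces the leftmost nonterminal at every step:
<assign> => <var> = <expr>
         => A = <expr>
         => A = <var> + <var>
         => A = B + <var>
         => A = B + C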
PARSE TREES
 One of the most attractive features of grammars is
that they naturally describe the hierarchical syntactic
structure of the sentences of the languages they
define. These hierarchical structures are called
parse trees.
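For example (a sketch using the same small grammar shown above), the parse tree for the sentence A = B + C makes the hierarchical structure explicit:

            <assign>
           /    |    \
       <var>    =    <expr>
         |          /  |  \
         A     <var>   +   <var>
                 |           |
                 B           C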
AMBIGUOUS GRAMMAR
 A grammar that generates a sentential form for
which there are two or more distinct parse trees is
said to be ambiguous.
 Ambiguity is related to the grammar, not the
language.
 Removing Ambiguity
 When an expression includes two operators with the
same precedence (as * and / usually have)—for
example, A / B * C—a semantic rule must specify which
should have precedence. This rule is called associativity.
 When an expression includes two different operators,
for example, x + y * z, assigning different precedence
levels to the operators is important. For example, the
multiplication operator is generated lower in the tree,
indicating that it has precedence over the
addition operator in the expression.
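As an illustrative sketch in the style of the textbook's expression grammars (not copied from these slides), the following grammar is ambiguous, because a sentence such as const + const * const has two distinct parse trees:
<expr> → <expr> + <expr> | <expr> * <expr> | const
An unambiguous rewrite that gives * higher precedence than + (so * is generated lower in the tree) and makes both operators left associative:
<expr> → <expr> + <term> | <term>
<term> → <term> * const | const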
EXTENDED BNF
 Because of a few minor inconveniences in BNF, it
has been extended in several ways. Most extended
versions are called Extended BNF, or simply EBNF,
even though they are not all exactly the same.
 The extensions do not enhance the descriptive
power of BNF; they only increase its readability and
writability.
 Three extensions are commonly included in the
various versions of EBNF.
 The first of these denotes an optional part of an
RHS, which is delimited by brackets. For example,
a C if-else statement can be described as:
 <if_stmt> → if (<expression>) <statement> [else <statement>]
EXTENDED BNF
 Without the use of the brackets, the syntactic
description of this statement would require the
following two rules:
 <if_stmt> → if (<expression>) <statement>
           | if (<expression>) <statement> else <statement>
 The second extension is the use of braces in an
RHS to indicate that the enclosed part can be
repeated indefinitely or left out altogether.
 The third common extension deals with multiple-
choice options. When a single element must be
chosen from a group, the options are placed in
parentheses and separated by the OR operator, |.
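For example (an illustrative sketch), the three extensions look like this:
Optional part, in brackets:
<if_stmt> → if (<expression>) <statement> [else <statement>]
Repetition, in braces (replacing the recursive BNF rule for identifier lists):
<ident_list> → identifier {, identifier}
Multiple choice, in parentheses separated by |:
<term> → <term> (* | / | %) <factor>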
ATTRIBUTE GRAMMARS
 An attribute grammar is a device used to
describe more of the structure of a
programming language than can be described
with a context-free grammar.
 It is an extension of a context-free grammar.
 Static Semantics
 Some characteristics of programming languages are
difficult to describe with BNF or EBNF, and
some are impossible.
 In Java, for example, a floating-point value cannot
be assigned to an integer type variable, although
the opposite is legal.
 Although this restriction can be specified in BNF, it
requires additional nonterminal symbols and rules.
STATIC SEMANTICS
 If all of the typing rules of Java were specified in
BNF, the grammar would become too large to be
helpful because the size of the grammar
determines the size of the syntax analyzer.
 These problems illustrate the categories of
language rules called static semantics rules.
 The static semantics of a language is only
indirectly related to the meaning of programs
during execution; rather, it has to do with the legal
forms of programs (syntax rather than semantics).
 Many static semantic rules of a language state its
type constraints. Static semantics is so named
because the analysis required to check these
specifications can be done at compile time.
STATIC SEMANTICS
 Attribute grammars are context-free grammars to
which attributes, attribute computation functions,
and predicate functions have been added.
 Attributes, which are associated with grammar
symbols (the terminal and nonterminal symbols),
are similar to variables in the sense that they can
have values assigned to them.
 Attribute computation functions, sometimes called
semantic functions, are associated with grammar
rules. They are used to specify how attribute
values are computed.
 Predicate functions, which state the static
semantic rules of the language, are associated
with grammar rules.
STATIC SEMANTICS
 A context-free grammar (left column) with its semantic
actions (right column); val is an attribute of the
grammar symbols:
E → T + E1    {E.val = T.val + E1.val}
E → T         {E.val = T.val}
T → int * T1  {T.val = int.val * T1.val}
T → int       {T.val = int.val}
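As a worked sketch (the input string is my own), the synthesized val attributes for the string 2 * 3 + 4 are computed bottom-up; the subscripts only distinguish the two occurrences of the same nonterminal, as in the rules above:
T1 → int       gives T1.val = 3            (the lexeme 3)
T  → int * T1  gives T.val = 2 * 3 = 6     (the subexpression 2 * 3)
T  → int       gives T.val = 4             (the lexeme 4)
E1 → T         gives E1.val = 4
E  → T + E1    gives E.val = 6 + 4 = 10 at the root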
TWO TYPES OF ATTRIBUTES
 Synthesized attributes: values are computed from the
attributes of a node's children. For a node P with
children c1, c2, c3, c4:
synthesized attribute of P = f(c1, c2, c3, c4)
Synthesized attributes are used to pass semantic
information up a parse tree.
 Inherited attributes: values are computed from the
attributes of the siblings and parent of the node. For a
node P with children S1, S2, S3, S4:
inherited attribute of S4 = f(P, S1, S2, S3)
Inherited attributes pass semantic information down
and across a tree.
TWO TYPES OF ATTRIBUTES
 For a production A → D E F:
a synthesized attribute b of A is computed from the
attributes (ci's) of D, E, and F;
an inherited attribute b of D is computed from the
attributes (ci's) of A, E, and F.
 Terminal symbols have synthesized attributes only.
 The start symbol is assumed not to have any inherited
attributes.
 Synthesized and inherited attributes are naturally
computed bottom-up and top-down, respectively.
THE ATTRIBUTES FOR THE
NONTERMINALS
 actual_type
 A synthesized attribute associated with the
nonterminals <var> and <expr>. In the case of an
expression, it is determined from the actual
types of its child nodes.
 expected_type
 An inherited attribute associated with the
nonterminal <expr>. It is determined by the type of
the variable on the left side of the assignment.
EXAMPLE
The syntax and static semantics of this assignment statement are as
follows:
 1. The only variable names are A, B, and C.
 2. The right side of the assignments can be either a variable or an
expression in the form of a variable added to another variable.
 3. The variables can be one of two types: int or real.
 4. When two variables are on the right side of an assignment, they
need not be the same type.
 5. The type of the expression when the operand types are not the same
is always real.
 6. When they are the same, the expression type is that of the
operands.
 7. The type of the left side of the assignment must match the type
of the right side.
 The syntax portion of our example attribute grammar is
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
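A sketch of the accompanying semantic rules and predicates, following the style of Sebesta's example (the look-up function is an assumption here, returning the declared type of a variable; the bracketed numbers just distinguish the two occurrences of <var>):
1. Syntax rule:   <assign> → <var> = <expr>
   Semantic rule: <expr>.expected_type ← <var>.actual_type
2. Syntax rule:   <expr> → <var>[2] + <var>[3]
   Semantic rule: <expr>.actual_type ←
                  if (<var>[2].actual_type = int) and
                     (<var>[3].actual_type = int)
                  then int else real
   Predicate:     <expr>.actual_type == <expr>.expected_type
3. Syntax rule:   <expr> → <var>
   Semantic rule: <expr>.actual_type ← <var>.actual_type
   Predicate:     <expr>.actual_type == <expr>.expected_type
4. Syntax rule:   <var> → A | B | C
   Semantic rule: <var>.actual_type ← look-up(<var>.string)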
THE FLOW OF ATTRIBUTES
IN THE TREE
(Figure: parse tree showing the flow of attributes; not reproduced here.)
DYNAMIC SEMANTICS
 Operational Semantics
 Operational semantics describes the meaning of
a program by defining the execution steps
needed to evaluate it. It gives a detailed, step-
by-step explanation of how each statement or
expression changes the program’s state.
 Example: For an assignment statement x = x +
1, operational semantics would describe the
exact process of looking up the current value of
x, adding 1, and then updating x with the new
value.
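A minimal, illustrative Java sketch (my own, not from the slides; the state map and variable name are assumptions) that makes these execution steps explicit:

import java.util.HashMap;
import java.util.Map;

public class OperationalSketch {
    public static void main(String[] args) {
        Map<String, Integer> state = new HashMap<>();
        state.put("x", 3);                 // initial state: x = 3

        // Operational view of x = x + 1 as explicit steps on the state:
        int current = state.get("x");      // 1. look up the current value of x
        int result = current + 1;          // 2. add 1
        state.put("x", result);            // 3. update x with the new value

        System.out.println(state);         // prints {x=4}
    }
}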
DYNAMIC SEMANTICS
 Denotational Semantics
 Denotational semantics defines the meaning of a
program by mapping its elements to
mathematical objects, such as functions or
domains. Each program construct is mapped to a
mathematical object that represents its
"meaning" without specifying how it executes
step-by-step.
 Example: In denotational semantics, the
assignment x = x + 1 might be represented as a
function that takes an initial state (a mapping of
variables to values) and produces a new state
with x updated.
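A minimal, illustrative Java sketch (my own, not from the slides) in which the meaning of x = x + 1 is modeled as a function that maps an initial state to a new state:

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class DenotationalSketch {
    // The denotation of "x = x + 1": a function from states to states
    static Function<Map<String, Integer>, Map<String, Integer>> incrementX =
        state -> {
            Map<String, Integer> next = new HashMap<>(state); // the input state is left unchanged
            next.put("x", state.get("x") + 1);
            return next;
        };

    public static void main(String[] args) {
        Map<String, Integer> initial = Map.of("x", 3);
        System.out.println(incrementX.apply(initial));        // prints {x=4}
    }
}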
DYNAMIC SEMANTICS
 Axiomatic Semantics
 Axiomatic semantics focuses on using logical
assertions to specify a program's behavior. It
uses preconditions and postconditions to
describe what must be true before and after
each statement's execution. This approach is
useful for proving programs' correctness.
 Example: For x = x + 1, we could specify a
precondition P: x = 3 and a postcondition Q: x =
4, indicating that if x starts as 3, it will end as 4
after this statement.
DYNAMIC SEMANTICS
 Originally
 Based on formal logic (predicate calculus)
 For formal program verification
 Form
 Assertions: The logical expressions used in axiomatic
semantics are called predicates or assertions.
 Pre-, post- conditions: {P} statement {Q}
{b > 0}
a = b + 1
{a > 1}
 Axioms: logical statements that are assumed to be true
 Inference rules: methods of inferring the truth of one
assertion based on the values of other assertions,
for example the rule of consequence:
{P} S {Q},   P' ⇒ P,   Q ⇒ Q'
-----------------------------
{P'} S {Q'}
WEAKEST PRECONDITIONS
 The weakest precondition is the least
restrictive precondition that will guarantee
the validity of the associated postcondition.
 An inference rule is a method of inferring the
truth of one assertion on the basis of the
values of other assertions. The general form
of an inference rule is as follows:
S1, S2, . . . , Sn
------------------
S
 This rule states that if S1, S2, . . . , and Sn
are true, then the truth of S can be inferred.
The top part of an inference rule is called its
antecedent; the bottom part is called its
consequent.
ASSIGNMENT STATEMENTS
 The precondition and postcondition of an
assignment statement together define its
meaning.
 To define the meaning of an assignment
statement, there must be a way to compute its
precondition from its postcondition.
 Let x = E be a general assignment statement,
and Q be its postcondition. Then, its weakest
precondition, P, is defined by the axiom
P = Q_{x→E},
which means that P is computed as Q with all
instances of x replaced by E.
 For example, if we have the assignment statement and
postcondition
ASSIGNMENT STATEMENTS
 a = b / 2 - 1 {a < 10}, the weakest
precondition is computed by substituting b / 2 - 1
for a in the postcondition {a < 10}, as follows:
b / 2 - 1 < 10
b < 22
 Thus, the weakest precondition for the given
assignment statement and postcondition is {b < 22}.
 The usual notation for specifying the
axiomatic semantics of a given statement
form is: {P} S {Q}
 A given assignment statement with both a
precondition and a postcondition can be
considered a logical statement.
ASSIGNMENT STATEMENTS
 For example, consider the following logical
statement:
{x > 3} x = x - 3 {x > 0}
 Using the assignment axiom on the statement
and its postcondition, we substitute x - 3 for x in
{x > 0}, producing {x - 3 > 0}, that is, {x > 3},
which is the given precondition. Therefore, we have
proven the example logical statement.
THE END