
Lecture 3

Chapter 3 focuses on describing the syntax and semantics of programming languages, introducing key terminology and formal methods such as grammars and Backus-Naur Form (BNF). It discusses the distinction between terminals and nonterminals, and how grammars generate valid sentences in a language through derivations. The chapter also addresses issues of ambiguity in grammar and methods for handling it, including operator precedence and associativity.


Chapter 3

Describing Syntax
and Semantics
Tentative Course Outline

1. Preliminaries - Introduction to Programming Languages
2. Evolution of the Major Programming Languages
3. Describing Syntax and Semantics
4. Names, Bindings, and Scopes
5. Data Types
6. Expressions and Assignment Statements
7. Statement-Level Control Structures
8. Subprograms
9. Implementing Subprograms
10. Abstract Data Types and Encapsulation Concepts
11. Support for Object-Oriented Programming
12. Concurrency
13. Exception Handling and Event Handling
14. Functional Programming Languages
Chapter 3 Topics

• Introduction
• Describing Syntax: Terminology
• Formal Methods of Describing Syntax
• Additional Notes on Terminals and Nonterminals
• Grammars and Derivations

Copyright © 2017 Pearson Education, Ltd. All rights reserved. 1-3


Introduction

(SYNTAX) (SEMANTICS)



Introduction

Example: Syntax and Semantics



Describing Syntax: Terminology

Alphabet: Σ, All strings: Σ*

• A sentence is a string of characters over some alphabet
• A language is a set of sentences, L ⊆ Σ*
– Natural languages: English, Turkish, …
– Programming languages: C, Fortran, Java, …
– Formal languages: a*b*, 0ⁿ1ⁿ
• Strings of the language:
– Sentences
– Program statements
– Words (aaaaabb, 000111)
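To make L ⊆ Σ* concrete, the sketch below enumerates a finite slice of Σ* and keeps only the strings that belong to each of the two formal languages named above. The function names (all_strings, in_a_star_b_star, in_0n1n) are illustrative choices, not part of the lecture.

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len (a finite slice of Σ*)."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def in_a_star_b_star(s):
    """Membership test for the language a*b*."""
    return re.fullmatch(r"a*b*", s) is not None

def in_0n1n(s):
    """Membership test for the language { 0^n 1^n : n >= 0 }."""
    n = len(s) // 2
    return s == "0" * n + "1" * n

# The sentences of each language among the short strings over its alphabet.
short_ab = [s for s in all_strings("ab", 3) if in_a_star_b_star(s)]
short_01 = [s for s in all_strings("01", 4) if in_0n1n(s)]
print(short_ab)
print(short_01)
```

Both example words from the slide pass their tests: in_a_star_b_star("aaaaabb") and in_0n1n("000111") are both true.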



Describing Syntax: Terminology

Example in Java Language



Describing Syntax: Terminology

• Syntax rules specify which strings from Σ* are in the language
• Examples: organization of the program, loop structures, assignment, expressions, subprogram definitions, and calls.

Elements of Syntax



Formal Methods of Describing Syntax

Grammars: formal language-generation mechanisms.

Chomsky described four classes of grammars that define four classes of languages.
– Two of these grammar classes, context-free and regular, turned out to be useful for describing the syntax of programming languages.
– Regular grammars: describe the forms of the tokens of programming languages
– Context-free grammars: describe the syntax of whole programming languages



Formal Methods of Describing Syntax
Regular Languages

• Tokens can be generated using three formal rules
– Concatenation
– Alternation (|)
– Kleene closure (*): repetition an arbitrary number of times
• Any set of strings that can be defined by these three rules is called a regular set or a regular language

| means OR; * means zero or more repetitions
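As a rough illustration of how the three rules combine to describe tokens, here is a sketch using Python's re module. The token shapes chosen (identifiers and unsigned integers) and the names IDENTIFIER and INTEGER are assumptions for the example; real languages define their own token forms. A character class such as [0-9] is shorthand for an alternation of single characters.

```python
import re

# Concatenation: one pattern followed by another.
# Alternation (|): a choice between patterns.
# Kleene closure (*): zero or more repetitions of a pattern.

# A letter or underscore, concatenated with zero or more letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Either the single digit 0, or a nonzero digit followed by zero or more digits.
INTEGER = re.compile(r"0|[1-9][0-9]*")

def is_identifier(s):
    return IDENTIFIER.fullmatch(s) is not None

def is_integer(s):
    return INTEGER.fullmatch(s) is not None

print(is_identifier("sub1"), is_identifier("1sub"))
print(is_integer("42"), is_integer("007"))
```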



Formal Methods of Describing Syntax

Regular Language vs CFL

• Tokens can be generated using three formal rules
– Concatenation
– Alternation (|)
– Kleene closure (*): repetition an arbitrary number of times
• Any set of strings that can be defined by these three rules is called a regular set or a regular language
• Any set of strings that can be defined if we add recursion to these rules is called a context-free language (CFL).
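The effect of adding recursion can be seen with the language 0ⁿ1ⁿ, which no regular expression (in the formal sense) can describe but which the recursive rule S → 0 S 1 | ε generates. The recognizer below is a minimal sketch that mirrors that rule; the rule and the name matches_s are chosen here for illustration.

```python
def matches_s(s):
    """Recognize the language generated by S → 0 S 1 | ε."""
    if s == "":
        return True                      # S → ε
    if len(s) >= 2 and s[0] == "0" and s[-1] == "1":
        return matches_s(s[1:-1])        # S → 0 S 1: peel one 0 and one 1, recurse
    return False

print(matches_s("000111"))
print(matches_s("00111"))
```

Each recursive call corresponds to one application of S → 0 S 1, which is how the grammar keeps the number of 0s and 1s equal.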

Context-Free Grammars

• Language generators, originally meant to describe the syntax of natural languages
• Define the class of context-free languages
• The syntax of programming languages falls, with minor exceptions, within the class of CFLs



Formal Methods of Describing Syntax

Backus-Naur Form (BNF) (1959)


• Invented by John Backus to describe the syntax of Algol 58
• A notation for describing the syntax of programming languages
• Named after
– John Backus – Algol 58
– Peter Naur – Algol 60
• A metalanguage is a language used to describe another language
• BNF is a metalanguage used to describe programming languages (PLs)
• BNF and context-free grammars are equivalent metalanguages, well suited for describing the syntax of programming languages



Formal Methods of Describing Syntax

BNF Fundamentals

• BNF uses abstractions for syntactic structures.
<LHS> → <RHS>
• LHS: abstraction being defined
• RHS: definition
• “→” means “can have the form”
• Sometimes ::= is used for →

Example: a Java assignment statement can be represented by the abstraction <assign>

<assign> → <var> = <expression>

• This is a rule, or production
• Here, <var> and <expression> must also be defined.
• Example instances of this abstraction:
total = sub1 + sub2
myVar = 4



Formal Methods of Describing Syntax

BNF Fundamentals
• These abstractions are called Variables or Nonterminals of a Grammar.
• Lexemes and tokens are the Terminals of a grammar.
• Nonterminals are often enclosed in angle brackets

Examples of BNF rules:


<ident_list> → identifier | identifier, <ident_list>
<if_stmt> → if <logic_expr> then <stmt>

• A formal definition of a rule:
• A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals
<LHS> → <RHS>

Example rule: <assign> → <var> = <expression>
Example sentence: total = subtotal1 + subtotal2
Java statement example: <if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>
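One possible way to hold such rules in a program is sketched below: each rule stores its LHS nonterminal and its RHS as a sequence of symbols. The Rule class and the helper names are hypothetical, and the convention that a symbol wrapped in angle brackets is a nonterminal is taken from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: str      # a single nonterminal, e.g. "<assign>"
    rhs: tuple    # a sequence of terminals and/or nonterminals

def is_nonterminal(symbol):
    """Treat a symbol enclosed in angle brackets as a nonterminal."""
    return len(symbol) > 2 and symbol.startswith("<") and symbol.endswith(">")

def nonterminals(rule):
    return [s for s in rule.rhs if is_nonterminal(s)]

def terminals(rule):
    return [s for s in rule.rhs if not is_nonterminal(s)]

assign = Rule("<assign>", ("<var>", "=", "<expression>"))
if_stmt = Rule("<if_stmt>",
               ("if", "(", "<logic_expr>", ")", "<stmt>", "else", "<stmt>"))

print(nonterminals(assign))   # the abstractions that still need definitions
print(terminals(if_stmt))     # the lexemes that appear literally in the statement
```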

Additional Notes on Terminals and Nonterminals
Terminals
Terminals are the smallest building blocks we consider in our grammars. Some typical terminals:
• identifiers: the names used for variables, classes, functions, methods, and so on
• keywords: almost every language uses keywords. They are exact strings that indicate the start of a definition, a modifier (public, private, static, final, etc.), or a control flow structure (while, for, until, etc.)
• literals: these let us define values in our languages. We can have string literals, numeric literals, char literals, boolean literals (which could also be considered keywords), array literals, map literals, and more, depending on the language
• separators and delimiters: colons, semicolons, commas, parentheses, brackets
• whitespace: spaces, tabs, newlines
• comments
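The categories above can be sketched as a small tokenizer. Everything specific here is an assumption made for the example: the keyword set, the // comment style, the literal forms, and the extra "operator" category (which the list above does not mention) are not taken from any particular language.

```python
import re

# Hypothetical keyword set for the example.
KEYWORDS = {"public", "private", "static", "final", "while", "for"}

# Order matters: comments must be tried before the "/" operator.
TOKEN_SPEC = [
    ("comment",    r"//[^\n]*"),
    ("whitespace", r"\s+"),
    ("literal",    r'\d+|"[^"\n]*"'),
    ("identifier", r"[A-Za-z_]\w*"),
    ("separator",  r"[:;,(){}\[\]]"),
    ("operator",   r"[=+\-*/<>]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Split `source` into (category, lexeme) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"             # keywords are exact strings
        if kind not in ("whitespace", "comment"):
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

print(tokenize("while (count < 10) total = total + 1; // loop"))
```

Whitespace and comments are recognized but discarded, which is a common design choice because the grammar of statements usually does not need them.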
Additional Notes on Terminals and Nonterminals

Nonterminals

• Examples of nonterminals are:
– program/document: represents the entire file
– modules/classes: group several declarations together
– functions/methods: group statements together
– statements: the single instructions. Some of them can contain other statements, for example loops
– expressions: typically used within statements and can be composed in various ways



Additional Notes on Terminals and Nonterminals

An initial example Alternations

Infinite Number of Sentences



Additional Notes on Terminals and Nonterminals

How can you describe simple arithmetic?



Additional Notes on Terminals and Nonterminals

Identifiers

<identifier> → <letter> | <identifier><letter> | <identifier><digit>
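Read from right to left, the rule says that an identifier is either a single letter, or a shorter identifier followed by one more letter or digit. The sketch below follows that reading directly. It assumes <letter> means an ASCII letter and <digit> means 0 to 9, since the slide does not define them.

```python
import string

LETTERS = set(string.ascii_letters)   # assumed definition of <letter>
DIGITS = set(string.digits)           # assumed definition of <digit>

def is_identifier(s):
    """Check `s` against <identifier> → <letter> | <identifier><letter> | <identifier><digit>."""
    if len(s) == 0:
        return False
    if len(s) == 1:
        return s in LETTERS                                   # <identifier> → <letter>
    last_ok = s[-1] in LETTERS or s[-1] in DIGITS
    return last_ok and is_identifier(s[:-1])                  # <identifier><letter> | <identifier><digit>

print(is_identifier("sub1"))
print(is_identifier("1sub"))
```

Because the recursion always ends at the first character, the rule forces every identifier to start with a letter.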



Grammars and Derivations

• A grammar is a generative device for defining languages
• The sentences of the language are generated through a sequence of applications of the rules, starting from a special nonterminal called the start symbol.
• Such a generation is called a derivation.
• The start symbol represents a complete program, so it is usually named <program>.

<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> := <expression>
<var> → A | B | C
<expression> → <var> | <var> <arith_op> <var>
<arith_op> → + | - | * | /

<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
Grammars and Derivations

• In order to check if a given string represents a valid program in the language, we try to derive it in the grammar.
• Derivation starts from the start symbol <program>.
• At each step we replace a nonterminal with its definition (RHS of the rule).
• Every string of symbols in a derivation is a sentential form
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded

<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> := <expression>
<var> → A | B | C
<expression> → <var> | <var> <arith_op> <var>
<arith_op> → + | - | * | /
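The leftmost-derivation procedure can be sketched with the grammar above stored as a dictionary. The function leftmost_derivation and the idea of passing the chosen alternatives as a list of indices are illustrative conventions, not part of the lecture.

```python
# The grammar above: each nonterminal maps to its list of alternatives (RHS options).
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>":       [["<var>", ":=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>"], ["<var>", "<arith_op>", "<var>"]],
    "<arith_op>":   [["+"], ["-"], ["*"], ["/"]],
}

def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the alternative to apply at that step.
    Returns every sentential form, beginning with the start symbol.
    """
    form = [start]
    forms = [form]
    for choice in choices:
        # Position of the leftmost symbol that is still a nonterminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return forms

# Derive the sentence: begin A := B + C end
steps = leftmost_derivation("<program>", [0, 0, 0, 0, 1, 1, 0, 2])
for sentential_form in steps:
    print("=> " + " ".join(sentential_form))
```

Every printed line is a sentential form; the last one contains only terminals, so it is a sentence of the language.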



Grammars and Derivations
Parse Tree: A hierarchical representation of a derivation


Copyright © 2012 Addison-Wesley. All rights reserved. 1-32




Grammars and Derivations

Ambiguity in Grammars

• A grammar is ambiguous if and only if it generates a sentential form that has two or more
distinct parse trees

Given the following grammar

The sentence A = B + C * A is ambiguous. It can be grouped as
A = (B + C) * A or
A = B + (C * A)
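The two groupings can be found mechanically. The sketch below enumerates every parse of an expression under the grammar <expr> → <expr> + <expr> | <expr> * <expr> | <id>. This grammar is an assumption made for the example (a typical ambiguous expression grammar), since the grammar shown on the slide is not reproduced in the text. Input is assumed to be a well-formed list alternating identifiers and operators.

```python
def parse_trees(tokens):
    """Return every parse of `tokens` under E → E + E | E * E | id, as parenthesized strings."""
    if len(tokens) == 1:
        return [tokens[0]]                         # E → id
    trees = []
    for i in range(1, len(tokens), 2):             # operators sit at odd positions
        op = tokens[i]
        # Choose this operator as the root: E → E op E
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append(f"({left} {op} {right})")
    return trees

for tree in parse_trees(["B", "+", "C", "*", "A"]):
    print(tree)
```

Two distinct trees for the same right-hand side B + C * A is exactly what makes the grammar ambiguous.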
Grammars and Derivations

Two leftmost derivations for A = B + C * A



Grammars and Derivations

Two rightmost derivations for A = B + C * A



Grammars and Derivations

If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity

An Ambiguous Expression Grammar

An Unambiguous Expression Grammar



Grammars and Derivations

Handling Ambiguity

• The grammar of a programming language must not be ambiguous
• There are solutions for correcting the ambiguity
– Operator precedence
– Associativity rules



Grammars and Derivations

Handling Ambiguity

Operator Precedence
In mathematics, the * operation has higher precedence than +.
This can be implemented in a grammar with extra nonterminals.

Unique Parse Tree for A = B + C * A
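A sketch of how extra nonterminals encode precedence: the parser below has one function per nonterminal of the grammar <expr> → <expr> + <term> | <term>, <term> → <term> * <factor> | <factor>, <factor> → ( <expr> ) | <id>. This is a commonly used textbook grammar and is assumed here, because the grammar on the slide is not reproduced in the text. The left-recursive rules are implemented as loops.

```python
def parse(tokens):
    """Parse a token list into a nested tuple (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                          # <expr> → <expr> + <term> | <term>
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():                          # <term> → <term> * <factor> | <factor>
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():                        # <factor> → ( <expr> ) | <id>
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        tok = peek()
        if tok is None or not tok.isalnum():
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return advance()

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse(["B", "+", "C", "*", "A"]))
```

Because * is only reachable through <term>, it always ends up lower in the tree than +, so B + C * A has a single parse with C * A grouped first.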


Grammars and Derivations

Handling Ambiguity

Associativity of Operators

• What about operators of equal precedence?
• In mathematics, addition and multiplication are associative
A + B + C = (A + B) + C = A + (B + C)
• Subtraction and division are not associative
A / B / C / D = ? In general, ((A / B) / C) / D ≠ A / (B / (C / D))
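A quick numeric check of these claims, using exact fractions so that rounding plays no role. The values 64, 8, 4, and 2 are arbitrary sample numbers.

```python
from fractions import Fraction

A, B, C, D = (Fraction(n) for n in (64, 8, 4, 2))

# Addition: both groupings give the same value.
left_add = (A + B) + C
right_add = A + (B + C)

# Division: the grouping changes the result.
left_div = ((A / B) / C) / D      # ((64 / 8) / 4) / 2
right_div = A / (B / (C / D))     # 64 / (8 / (4 / 2))

print(left_add == right_add)
print(left_div, right_div)
```

Since the two groupings of the division disagree, a language has to fix one of them, which is what an associativity rule does.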



Grammars and Derivations
Operator associativity can also be indicated by a grammar



Grammars and Derivations

Associativity

• In a BNF rule, if the LHS appears at the beginning of the RHS, the rule is said to be left recursive
• Left recursion specifies left associativity

<expr> ::= <expr> + <term> | <term>

• Similarly, right recursion specifies right associativity
• In most languages that provide it, exponentiation is defined as a right-associative operation

<factor> ::= <exp> ** <factor> | <exp>
<exp> ::= ( <expr> ) | <id>
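The link between the direction of recursion and the shape of the tree can be sketched as follows. The helpers group_left, group_right, and evaluate are hypothetical names; they build the trees that a left-recursive or right-recursive rule would produce for a chain of operands joined by one operator.

```python
import operator

OPS = {"-": operator.sub, "**": operator.pow}

def group_left(operands, op):
    """Tree produced by a left-recursive rule: grouping starts from the left."""
    tree = operands[0]
    for operand in operands[1:]:
        tree = (op, tree, operand)
    return tree

def group_right(operands, op):
    """Tree produced by a right-recursive rule: grouping starts from the right."""
    if len(operands) == 1:
        return operands[0]
    return (op, operands[0], group_right(operands[1:], op))

def evaluate(tree):
    """Evaluate a nested (operator, left, right) tuple."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

print(group_left([10, 4, 3], "-"), evaluate(group_left([10, 4, 3], "-")))
print(group_right([2, 3, 2], "**"), evaluate(group_right([2, 3, 2], "**")))
```

Python itself follows the same conventions: 10 - 4 - 3 groups from the left and 2 ** 3 ** 2 groups from the right.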



Grammars and Derivations

Associativity
A parse tree for A = B + C + A illustrating the associativity of addition

Left associativity



Grammars and Derivations
Is this ambiguous?



Grammars and Derivations
An Unambiguous grammar for “if then else”



Grammars and Derivations

Draw the parse tree
