Lec 03 TPL

TPL Programming Language Concepts

Uploaded by

Lilian Voss

Lecture 03

Describing Syntax
Topics

• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
– BNF
– Context-Free Grammars
– Derivation
– Parse Trees
– Syntax Diagrams

Copyright © 2015 Pearson. All rights reserved. 1-2


Introduction
• Syntax: the form or structure of the expressions, statements, and program units

• Semantics: the meaning of the expressions, statements, and program units

• Syntax and semantics together provide a language’s definition
– Users of a language definition
• Other language designers
• Implementers
• Programmers (the users of the language)

The General Problem of Describing Syntax:
Terminology
• A sentence is a string of characters over some alphabet

• A language is a set of sentences

• A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin)

• A token is a category of lexemes (e.g., identifier)

• A pattern describes how characters may be combined to form the lexemes of a token (e.g., as a regular expression: [A-Za-z0-9]+)

Exercise: for the statement
index = 2 * count + 17;
identify the sentence, the language, the lexemes, the tokens, and the patterns.
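
The lexeme/token distinction can be sketched with a small tokenizer for the example statement. This is an illustrative sketch only: the token names (int_literal, equal_sign, and so on) and the exact patterns are assumptions, since the slides name only "identifier" as an example token.

```python
import re

# Token categories and their patterns. The category names are
# hypothetical; a real language definition fixes its own set.
TOKEN_SPEC = [
    ("int_literal", r"[0-9]+"),
    ("identifier",  r"[A-Za-z][A-Za-z0-9]*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),  # whitespace carries no meaning and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token) pairs for the given sentence."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.group(), m.lastgroup))
        pos = m.end()
    return pairs

pairs = tokenize("index = 2 * count + 17;")
for lexeme, token in pairs:
    print(f"{lexeme:6} {token}")
```

Each printed row pairs one lexeme (left) with the token it belongs to (right); the regular expressions in TOKEN_SPEC play the role of the patterns.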
The General Problem of Describing Syntax:
Terminology

How to categorize tokens?

• Token categories depend heavily on the language.

• Typically:
– Give each keyword its own token.
– Give each punctuation symbol its own token.
– Group lexemes representing identifiers, numeric constants, strings, etc. into their own token categories.
– Discard irrelevant information (whitespace, comments).
Formal Methods of Describing Syntax

• Recognizers
– A recognition device reads input strings over the
alphabet of the language and decides whether
the input strings belong to the language
– Example: syntax analysis part of a compiler

• Generators
– A device that generates sentences of a language
– One can determine if the syntax of a particular
sentence is syntactically correct by comparing it
to the structure of the generator

Formal Methods of Describing Syntax

• A lexical analyzer can identify token patterns using regular expressions, but it cannot check syntax.

• Context-free grammars (CFGs) are a helpful tool for describing the syntax of programming languages (the job of the syntax analyzer).

https://www.tutorialspoint.com/compiler_design/compiler_design_syntax_analysis.htm
BNF and Context-Free Grammars

• Context-Free Grammars (CFG)
– Developed by Noam Chomsky in the mid-1950s
– Language generators, meant to describe the syntax of natural languages
– Define a class of languages called context-free languages

• Backus-Naur Form (BNF)
– Invented by John Backus in 1959 to describe the syntax of Algol 58, then slightly modified by Peter Naur
– BNF is a notation for writing CFGs
Difference between CFG and BNF

– CFG
expression → identifier | number | - expression
| ( expression )
| expression operator expression
operator → + | - | * | /

– BNF (nonterminals enclosed in angle brackets)
<expression> → identifier | number | - <expression>
| ( <expression> )
| <expression> <operator> <expression>
<operator> → + | - | * | /

BNF Fundamentals

A grammar is defined by four components:

• NT: a set of nonterminal symbols
• T: a set of terminal symbols (NT ∩ T = ∅)
• R: a set of rules, each mapping a nonterminal to a string over NT ∪ T (R: NT → (NT ∪ T)*)
• S: a start symbol (S ∈ NT)
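
As a rough sketch, the four components can be written down as plain data and checked mechanically. The representation below (sets of strings, a dict of alternatives) and the helper name is_well_formed are assumptions made for illustration; the rules are the expression grammar from the CFG/BNF comparison slide.

```python
# Components of the expression grammar as Python data (an assumed encoding).
NT = {"expression", "operator"}
T = {"identifier", "number", "+", "-", "*", "/", "(", ")"}
R = {
    "expression": [
        ["identifier"],
        ["number"],
        ["-", "expression"],
        ["(", "expression", ")"],
        ["expression", "operator", "expression"],
    ],
    "operator": [["+"], ["-"], ["*"], ["/"]],
}
S = "expression"

def is_well_formed(nt, t, r, s):
    """Check NT and T are disjoint, S is in NT, and every rule maps
    a nonterminal to a string of symbols drawn from NT or T."""
    if nt & t:
        return False
    if s not in nt:
        return False
    for lhs, alternatives in r.items():
        if lhs not in nt:
            return False
        for rhs in alternatives:
            if any(sym not in nt and sym not in t for sym in rhs):
                return False
    return True

print(is_well_formed(NT, T, R, S))
```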

BNF Fundamentals

• In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols, or just nonterminals)

• Terminals are lexemes or tokens

• A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals

BNF Fundamentals (continued)

• Nonterminals are often enclosed in angle brackets
– Examples of BNF rules:
<ident_list> → identifier | identifier, <ident_list>
<if_stmt> → if <logic_expr> then <stmt>

• Grammar: a finite non-empty set of rules

• A start symbol is a special element of the nonterminals of a grammar

BNF Rules

• A nonterminal symbol can have more than one RHS
<stmt> → <single_stmt>
<stmt> → begin <stmt_list> end

• Multiple RHSs for the same nonterminal can be combined into one rule, separated by |
<stmt> → <single_stmt>
| begin <stmt_list> end

Describing Lists

• Syntactic lists are described using recursion (the LHS appears in its own RHS)
<ident_list> → ident
| ident, <ident_list>

• A derivation
– is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
– shows how to generate a syntactically valid string
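
The recursive <ident_list> rule maps naturally onto a recursive function. The sketch below is a hypothetical recognizer (the function name and the list-of-tokens input are assumptions); it mirrors the two alternatives of the rule directly.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> ident | ident , <ident_list>.

    `tokens` is a list of token names such as ["ident", ",", "ident"].
    """
    # Both alternatives start with ident.
    if not tokens or tokens[0] != "ident":
        return False
    # First alternative: a single ident.
    if len(tokens) == 1:
        return True
    # Second alternative: ident , <ident_list> (the recursive case).
    return tokens[1] == "," and is_ident_list(tokens[2:])

print(is_ident_list(["ident", ",", "ident", ",", "ident"]))
print(is_ident_list(["ident", ","]))
```

The first call accepts a three-element list; the second rejects a list that ends with a comma, because the recursive call receives an empty token list.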

An Example Grammar & Derivation
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const

• How do we derive the sentence a = b + const from this grammar?
<program> => <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
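
The derivation can be replayed mechanically. In this sketch, the grammar is encoded as a dict of alternatives and each step expands the leftmost nonterminal using the alternative index taken from a `choices` list. Both the encoding and the `choices` list are assumptions made for illustration.

```python
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>":   [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>":    [["<var>", "=", "<expr>"]],
    "<var>":     [["a"], ["b"], ["c"], ["d"]],
    "<expr>":    [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>":    [["<var>"], ["const"]],
}

def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the alternative (RHS) to apply.
    Returns every sentential form, from the start symbol onward.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost nonterminal in the current form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation("<program>", [0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```

Every printed line is a sentential form; the last one contains only terminals, so it is a sentence.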
Derivations
• Every string of symbols in a derivation is a sentential form
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
• A derivation may be neither leftmost nor rightmost

Parse Tree

• A hierarchical representation of a derivation

<program>
└─ <stmts>
   └─ <stmt>
      ├─ <var>
      │  └─ a
      ├─ =
      └─ <expr>
         ├─ <term>
         │  └─ <var>
         │     └─ b
         ├─ +
         └─ <term>
            └─ const
Ambiguity in Grammars
• A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

<expr> → <expr> <op> <expr> | const
<op> → / | -

• The sentence const - const / const has two distinct parse trees under this grammar: one groups it as (const - const) / const, the other as const - (const / const)
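
One way to see the ambiguity concretely is to count parse trees. The sketch below is an illustrative helper, not part of the lecture: it counts the distinct trees the grammar <expr> → <expr> <op> <expr> | const assigns to a token list by trying every operator as the top-level <op>.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for <expr> -> <expr> <op> <expr> | const,
    with <op> -> / | -, over the given list of tokens."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from <expr>.
        if j - i == 1:
            return 1 if tokens[i] == "const" else 0
        total = 0
        # Try each operator position k as the root <op> of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("/", "-"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["const", "-", "const", "/", "const"]))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous.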


An Unambiguous Expression Grammar

• If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity

<expr> → <expr> - <term> | <term>
<term> → <term> / const | const

• The only parse tree for const - const / const:

<expr>
├─ <expr>
│  └─ <term>
│     └─ const
├─ -
└─ <term>
   ├─ <term>
   │  └─ const
   ├─ /
   └─ const

Associativity of Operators
• When an expression contains more than one operator of the same precedence, associativity determines the order in which they are applied: left-associative or right-associative
• Operator associativity can also be indicated by a grammar

<expr> -> <expr> + <expr> | const (ambiguous)
<expr> -> <expr> + const | const (unambiguous)

• Parse tree for const + const + const under the unambiguous grammar (left-associative):

<expr>
├─ <expr>
│  ├─ <expr>
│  │  └─ const
│  ├─ +
│  └─ const
├─ +
└─ const
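
The effect of the rule's shape on associativity can be sketched by building trees as nested tuples. The left-recursive rule is the one from the slide; the right-recursive rule <expr> -> const + <expr> | const is not in the lecture and is added here only as a hypothetical contrast. Function names are likewise assumptions.

```python
def parse_left_assoc(tokens):
    """Tree for <expr> -> <expr> + const | const (left-recursive).

    Left recursion is handled with a loop: each "+ const" wraps the
    tree built so far, so earlier operators end up deeper in the tree.
    """
    if not tokens or tokens[0] != "const":
        raise SyntaxError("expected const")
    tree = "const"
    pos = 1
    while pos < len(tokens):
        if tokens[pos] != "+" or pos + 1 >= len(tokens) or tokens[pos + 1] != "const":
            raise SyntaxError("expected '+ const'")
        tree = (tree, "+", "const")
        pos += 2
    return tree

def parse_right_assoc(tokens):
    """Tree for the hypothetical rule <expr> -> const + <expr> | const."""
    if not tokens or tokens[0] != "const":
        raise SyntaxError("expected const")
    if len(tokens) == 1:
        return "const"
    if tokens[1] != "+":
        raise SyntaxError("expected '+'")
    return ("const", "+", parse_right_assoc(tokens[2:]))

sentence = ["const", "+", "const", "+", "const"]
print(parse_left_assoc(sentence))
print(parse_right_assoc(sentence))
```

In the first tree the left operand is itself a subtree, which corresponds to left associativity; in the second the nesting is on the right.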

Unambiguous Grammar for Selector
• C++ if-then-else grammar
<if_stmt> -> if (<logic_expr>) <stmt>
| if (<logic_expr>) <stmt> else <stmt>
This grammar is ambiguous: in a nested if, the else can be attached to either if.

• An unambiguous grammar for if-then-else
<stmt> -> <matched> | <unmatched>
<matched> -> if (<logic_expr>) <matched> else <matched>
| any non-if statement
<unmatched> -> if (<logic_expr>) <stmt>
| if (<logic_expr>) <matched> else <unmatched>

Extended BNF
• Increases readability and writability

• Optional parts are placed in brackets [ ]
<proc_call> -> ident [(<expr_list>)]

• Alternative parts of RHSs are placed inside parentheses and separated by vertical bars
<term> → <term> (+|-) const

• Repetitions (0 or more) are placed inside braces { }
<ident> → letter {letter|digit}

• Repetitions (1 or more) use a plus (+) superscript
<ident> → letter {letter|digit}+

BNF and EBNF
• BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>

• EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<compound> → begin <stmt> {<stmt>} end
which can also be written as
<compound> → begin {<stmt>}+ end

Syntax Diagram
• A syntax diagram is a graphical form of BNF

• An expression can be broken down into a sequence of terms, separated by + or -
<expr> → <term> {(+ | -) <term>}

• Each term is broken down into a sequence of factors, separated by * or /

• Each factor is either a parenthesized expression or a number
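
The three statements above translate almost line for line into a recursive-descent parser: each { } repetition becomes a while loop and each nonterminal becomes a function. The following is a minimal sketch under stated assumptions: numbers are unsigned integers, the function name evaluate is hypothetical, and the tokenizer silently ignores any character it does not recognize.

```python
import re

def evaluate(text):
    """Parse and evaluate text using
    <expr> -> <term> {(+ | -) <term>}
    <term> -> <factor> {(* | /) <factor>}
    <factor> -> ( <expr> ) | number
    """
    # Simplified tokenizer: unrecognized characters are skipped.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        value = term()
        while peek() in ("+", "-"):      # {(+ | -) <term>}
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():
        value = factor()
        while peek() in ("*", "/"):      # {(* | /) <factor>}
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(evaluate("2 + 3 * 4"))
print(evaluate("8 - 3 - 2"))
```

Because term is called from inside expr, * and / bind more tightly than + and -, and because each loop folds its result into the value on the left, operators of equal precedence are applied left to right.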
