
cs212 Lect05 63 Inter

Here are the steps to scan the input "x := (y+10) * z1;": 1. Start at state q0 2. Read 'x' and stay in q0, accumulating characters into the token 3. Read ':' and transition to state q1, returning token "x" 4. Read '=' and stay in q1, returning token ":=" 5. Read '(' and transition to state q2, returning token "=" 6. Read 'y' and stay in q2, accumulating characters into the token 7. Read '+' and stay in q2 8. Read '1' and stay in q2 9. Read '0' and stay in q2

Uploaded by

Leng Hour leng

CS 212 LECTURE 05

PROGRAMMING LANGUAGES
BACKUS NAUR FORM

• Backus Naur Form (BNF): a standard notation for expressing syntax as a set of grammar rules.
• BNF was developed by John Backus and Peter Naur; it builds on Noam Chomsky's work on context-free grammars.
• First used to describe Algol.
• BNF can describe any context-free grammar.
• Fortunately, computer languages are mostly context-free.
BACKUS NAUR FORM
• Grammar Rules or Productions: define symbols.
    assignment_stmt ::= id = expression ;
  Left of ::= is the nonterminal symbol being defined; right of ::= is the definition (production).
• Nonterminal Symbols: anything that is defined on the left side of some production.
• Terminal Symbols: things that are not defined by productions. They can be literals, symbols, and other lexemes of the language defined by lexical rules.
BACKUS NAUR FORM
• Different notations (same meaning):
    assignment_stmt ::= id = expression + term
    <assignment-stmt> ::= <id> = <expr> + <term>
    <assignment-stmt> => <id> = <expr> + <term>
    AssignmentStmt → id = expression + term
  ::=, =>, → mean "consists of" or "defined as"
• Null symbol: ε or @
• Alternatives ( " | " ):
    <expression> ::= <expression> + <term>
                  | <expression> - <term>
                  | <term>
PROBLEMS WITH BNF NOTATION

• BNF notation is too long.
• Must use recursion to specify repeated occurrences.
• Must use a separate alternative for every option.
EXTENDED BNF NOTATION
• EBNF adds notation for repetition and optional elements.
{…} means the contents can occur 0 or more times.
Use it to replace a left-recursive rule, where the nonterminal being defined repeats itself on the left of the other symbols:
    <expr> ::= <expr> + <term> | <term>
becomes
    <expr> ::= <term> { + <term> }
EXTENDED BNF NOTATION
[…] encloses an optional part:
    <if-stmt> ::= if ( <expr> ) <stmt>                  (1)
               | if ( <expr> ) <stmt> else <stmt>      (2)
Without "else <stmt>", alternative (2) is the same as (1), so "else <stmt>" is the optional part.
Becomes
    <if-stmt> ::= if ( <expr> ) <stmt> [ else <stmt> ]
EXTENDED BNF NOTATION, CONTINUED
• ( a | b | ... ) is a list of choices. Choose exactly one.
    <expr> ::= <expr> + <term>     (1)
            | <expr> - <term>     (2)
            | <term>
  The only difference between (1) and (2) is the + or - sign, so + and - form the list of choices.
becomes
    <expr> ::= <term> { (+|-) <term> }
Another example:
    <term> ::= <factor> { (*|/|%) <factor> }
EBNF COMPARED TO BNF
BNF:
    <expression> ::= <expression> + <term>
                  | <expression> - <term>
                  | <term>
    <term> ::= <term> * <factor>
            | <term> / <factor>
            | <factor>
    <factor> ::= ( <expression> )
              | <id>
              | <number>

EBNF:
    <expression> ::= <term> { (+|-) <term> }
    <term> ::= <factor> { (*|/) <factor> }
    <factor> ::= '(' <expression> ')'
              | <id>
              | <number>
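As a rough illustration (not part of the original slides), the EBNF rules above map almost directly onto a recursive-descent parser: every { ... } becomes a while loop and every ( a | b ) becomes a test on the next token. The names tokenize, Parser and parse, the token kinds, and the nested-tuple tree shape are all assumptions made for this sketch.

```python
import re

# Assumed lexical rules: integers, identifiers, and single-character symbols.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(text):
    """Split text into (kind, lexeme) pairs with kinds 'number', 'id', 'sym'."""
    tokens = []
    for number, ident, sym in TOKEN_RE.findall(text):
        if number:
            tokens.append(("number", number))
        elif ident:
            tokens.append(("id", ident))
        else:
            tokens.append(("sym", sym))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it."""
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", "")

    def advance(self):
        """Consume and return the next token."""
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        # <expression> ::= <term> { (+|-) <term> }
        node = self.term()
        while self.peek() in (("sym", "+"), ("sym", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())
        return node

    def term(self):
        # <term> ::= <factor> { (*|/) <factor> }
        node = self.factor()
        while self.peek() in (("sym", "*"), ("sym", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        # <factor> ::= '(' <expression> ')' | <id> | <number>
        kind, lexeme = self.advance()
        if kind in ("id", "number"):
            return (kind, lexeme)
        if (kind, lexeme) == ("sym", "("):
            node = self.expression()
            if self.advance() != ("sym", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {lexeme!r}")


def parse(text):
    """Parse a whole expression and return its tree as nested tuples."""
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek()[0] != "eof":
        raise SyntaxError("unexpected input after expression")
    return tree
```

For example, parse("x - 2 * y") groups 2 * y below the subtraction, because <term> is parsed before the loop in expression looks for the next + or -.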
NOTES ON USE OF EBNF

• Do not start a rule with {…}:
    Right: <expr> ::= <term> { + <term> }
    Wrong: <expr> ::= { <term> + } <term>
• For right recursive rules use [ ... ] instead:
    BNF:  <expr> ::= <term> + <expr> | <term>
    EBNF: <expr> ::= <term> [ + <expr> ]
• Square brackets can be used anywhere:
    BNF:  <expr> ::= <expr> + <term> | <term> | - <term>
    EBNF: <expr> ::= [ - ] <term> { + <term> }
TRY THIS

• Rewrite this grammar using Extended BNF.


<sentence> ::= <noun-phrase> <verb-phrase> .
<noun-phrase> ::= <article><noun>
| <noun>
<article> ::= a | the
<noun> ::= boy | girl | cat | dog
<verb-phrase> ::= <verb><noun-phrase>
| <verb>
<verb> ::= sees | pets | bites
SYNTAX AND SEMANTICS
• The syntax of a language defines the valid symbols and
grammar.
• Syntax defines the structure of a program, i.e., the
form that each program unit and each statement must
use.
• The semantics defines the meaning of the grammar
elements.
• Lexical structure is the form of the lowest level syntactic units (words or tokens) of a grammar.
SYNTAX AND SEMANTICS COMPARED

• Syntax: in Java, an assignment statement is:
    identifier = expression { operator expression } ;
• Semantics: an assignment statement must use compatible types, e.g.
    int n1, n2;
    n1 = 20*1024; // OK, int_var = int_expression
    n2 = 3.50; // illegal, incompatible types
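To make the split concrete, here is a small sketch of a semantic check that runs after the syntax has already been accepted. It is only an illustration: the symbol table declared, the helper names, and the two simplified rules (a literal containing "." is a double; int may widen to double but not the reverse) are assumptions modeled loosely on Java, not a full type checker.

```python
# Hypothetical symbol table built from the declaration "int n1, n2;".
declared = {"n1": "int", "n2": "int"}


def literal_type(lexeme):
    """Assumed rule: a numeric literal containing '.' is a double, otherwise an int."""
    return "double" if "." in lexeme else "int"


def expression_type(operands):
    """Assumed rule: an expression is a double if any operand is a double."""
    if any(literal_type(op) == "double" for op in operands):
        return "double"
    return "int"


def check_assignment(name, operands):
    """Return True if the expression type is compatible with the variable's type."""
    target = declared[name]
    source = expression_type(operands)
    # int may be widened to double, but double may not be narrowed to int.
    return target == source or (target == "double" and source == "int")
```

Both statements on the slide are syntactically valid assignments; only the check above tells them apart: check_assignment("n1", ["20", "1024"]) is accepted, while check_assignment("n2", ["3.50"]) is rejected.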
• Lexical elements (Lexemes):
    "n2" "=" "3.50" ";"
HOW ARE THEY USED?
Parts of a Compiler / Interpreter:
    Program source code
      → Tokenizer (Lexical Analysis)        produces: token stream
      → Parser (Syntax Analysis)            produces: parse tree
      → Semantic Analysis                   produces: intermediate code
      → Optimization and Code Generation    produces: object code
SCANNING AND PARSING

source file:   sum = x1 + x2;
    ↓ input stream
Scanner
    ↓ tokens:   sum   =   x1   +   x2   ;
Parser
    ↓ parse tree:
          =
         / \
      sum   +
           / \
         x1   x2
SCANNERS

• Recognize regular expressions.
• Implemented as finite automata (finite state machines).
• Typically contain a loop that cycles through characters, building tokens and associated values by repeated operations.
• The scanner may be integrated as a function in the parser.
• The parser calls the scanner to get the next token.
LEXICAL STRUCTURE
• Lexemes are the smallest lexical units of a language, grouped according to syntactic usage. Some types of lexemes in computer languages are:
    identifiers: x, println, _INIT, ArrayList
    numeric constants: 0, 10000, 2.98E+6
    operators: =, +, -, ++, +=, *, /
    separators: [ ] ; : . , ( )
    string literals: "hello there"
LEXICAL STRUCTURE

• A token is a structure representing a lexeme that explicitly indicates its categorization for the purpose of parsing.
LEXICAL STRUCTURE

• Lexemes are recognized by the first phase of a translator, the scanner, which deals directly with the input. The scanner separates the input into tokens.
• Scanners are also called lexers.
TYPES OF LEXEMES
• Common Lexemes (classes of tokens)
    identifiers: x, println, _INIT, ArrayList
    numeric constants: 0, 10000, 2.98E+6
    assignment operators: =, +=, -=, *=, /=, %=
    arithmetic operators: *, /, +, -, %
    boolean operators: &&, ||, ^, !
    separators: [ ] ; : . , ( )
    string literals: "hello there"
• Reserved words: may be defined as a class of their own, or simply treated as identifiers at the lexical level.
TOKENS
• Tokens are the strings of syntactic units.
• Example: what are the tokens in this statement?
    result = (sum - average)/count;
• Lexemes and their tokens:
    Lexeme     Token
    result     identifier
    =          assignment operator
    (          expression delimiter
    sum        identifier
    -          arithmetic operator
    average    identifier
    )          expression delimiter
    /          arithmetic operator
    count      identifier
    ;          semi-colon (statement delimiter)
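The table above can be reproduced with a few regular expressions, one per token class. This is only a sketch: TOKEN_SPEC and tokenize are hypothetical names, and the patterns cover just the lexemes in this example rather than a complete language.

```python
import re

# One (token class, pattern) pair per row type in the table; order matters.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("assignment operator", r"="),
    ("arithmetic operator", r"[-+*/%]"),
    ("expression delimiter", r"[()]"),
    ("statement delimiter", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]


def tokenize(text):
    """Return a list of (lexeme, token class) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            if match:
                if name != "skip":
                    tokens.append((match.group(), name))
                pos = match.end()
                break
        else:
            # No pattern matched at this position.
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
    return tokens
```

Calling tokenize("result = (sum - average)/count;") yields the ten (lexeme, token) pairs listed in the table, in the same order.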
HERE IS AN FA THAT RECOGNIZES A SUBSET
OF TOKENS IN THE PASCAL LANGUAGE:
[Figure: finite automaton diagram]
When scanning "temp := temp + 1", the first token should be temp.
From the start state, go to state q1 and loop in state q1 until ":" is read. The scanner then stops, returns the first token "temp", and starts again from the start state to get the next token.
Try scanning "x := (y+10) * z1;"
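The scanning loop described above can be sketched as a table-driven finite automaton. The transition table below is an assumption made for illustration: only "start" and "q1" (the identifier loop) come from the text, while the other state names, the token classes, and the set of single-character symbols are invented and may not match the diagram.

```python
def char_class(ch):
    """Map a character to the input symbol used in the transition table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return ch


# Hypothetical transition table: (state, input symbol) -> next state.
TRANSITIONS = {
    ("start", "letter"): "q1",  # identifiers: a letter, then letters or digits
    ("q1", "letter"): "q1",
    ("q1", "digit"): "q1",
    ("start", "digit"): "q2",   # numbers: one or more digits
    ("q2", "digit"): "q2",
    ("start", ":"): "q3",       # ":" must be followed by "=" to form ":="
    ("q3", "="): "q4",
}
for symbol in "+-*/();":
    TRANSITIONS[("start", symbol)] = "q5"  # single-character tokens

# Accepting states and the token class each one returns.
ACCEPTING = {"q1": "identifier", "q2": "number", "q4": "assign", "q5": "symbol"}


def scan(text):
    """Return (token class, lexeme) pairs, restarting the automaton for each token."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state = "start"
        begin = pos
        # Follow transitions for as long as the next character allows one.
        while pos < len(text) and (state, char_class(text[pos])) in TRANSITIONS:
            state = TRANSITIONS[(state, char_class(text[pos]))]
            pos += 1
        if state not in ACCEPTING:
            raise ValueError(f"cannot scan input at position {begin}")
        tokens.append((ACCEPTING[state], text[begin:pos]))
    return tokens
```

On "temp := temp + 1" the automaton loops in q1 over t, e, m, p, finds no transition for the following space, and returns the identifier "temp" before starting over for ":=".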
PARSING ALGORITHMS
• Broadly divided into LL and LR.
• LL algorithms match input directly to left-side symbols, then choose a right-side production that matches the tokens. This is top-down parsing.
• LR algorithms try to match tokens to the right-side productions, then replace groups of tokens with the left-side nonterminal. They continue until the entire input has been "reduced" to the start symbol. This is bottom-up parsing.
PARSING ALGORITHMS

• Look ahead: algorithms must look at the next token(s) to decide between alternate productions for the current tokens.
• LL(1) means LL with 1 token look-ahead.
• LL algorithms are simpler and easier to visualize.
• LR algorithms are more powerful: they can parse some grammars that LL cannot, such as left-recursive grammars.
TOP-DOWN PARSING EXAMPLE (LL)

• Grammar rules: [Figure: table of grammar rules numbered 1 to 10]
Input String : x – 2 * y
Tokens : id – number * id
[Figures: step-by-step parse of this input]
ELIMINATION OF LEFT RECURSION
Here is the grammar again:
    S ::= A | B
    A ::= ABc | AAdd | a | aa
    B ::= Bee | b
An equivalent right-recursive grammar:
    S ::= A | B
    A ::= aA′ | aaA′
    A′ ::= BcA′ | AddA′ | ε
    B ::= bB′
    B′ ::= eeB′ | ε
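The rewrite above follows a mechanical pattern: a rule A ::= Aα | β becomes A ::= βA′ and A′ ::= αA′ | ε. A minimal sketch of that transformation is shown below, assuming a grammar is stored as a dictionary from nonterminal to a list of alternatives (each a list of symbols, with the empty list standing for ε). The function name and this representation are choices made for the example, and only immediate left recursion is handled.

```python
def eliminate_left_recursion(grammar):
    """Remove immediate left recursion from every rule of the grammar."""
    result = {}
    for head, alternatives in grammar.items():
        # Alternatives of the form head α: keep only α.
        recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
        # All remaining alternatives β.
        others = [alt for alt in alternatives if not alt or alt[0] != head]
        if not recursive:
            result[head] = alternatives
            continue
        tail = head + "'"  # the new nonterminal, written A′ on the slide
        result[head] = [alt + [tail] for alt in others]
        result[tail] = [alt + [tail] for alt in recursive] + [[]]  # [] is ε
    return result


grammar = {
    "S": [["A"], ["B"]],
    "A": [["A", "B", "c"], ["A", "A", "d", "d"], ["a"], ["a", "a"]],
    "B": [["B", "e", "e"], ["b"]],
}
right_recursive = eliminate_left_recursion(grammar)
```

Applied to the grammar on this slide, right_recursive contains the same five rules as the right-recursive version shown above, with A' and B' in place of A′ and B′.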
LL PARSING EXAMPLE

Let's try the input string : x – 2 * y
Tokens : id – number * id
Choosing the + alternative for Expr′ does not match the input:

    Rule  Sentential Form                                  Input
    -     Goal                                             x – 2 * y
          Expr                                             x – 2 * y
          Term Expr′                                       x – 2 * y
          Factor Term′ Expr′                               x – 2 * y
          <id,x> Term′ Expr′                               x – 2 * y
          <id,x> ε Expr′                                   x – 2 * y
          <id,x> + Term Expr′                              x – 2 * y

Choosing the - alternative instead, the parse completes:

    Rule  Sentential Form                                  Input
    -     Goal                                             x – 2 * y
          Expr                                             x – 2 * y
          Term Expr′                                       x – 2 * y
          Factor Term′ Expr′                               x – 2 * y
          <id,x> Term′ Expr′                               x – 2 * y
          <id,x> ε Expr′                                   x – 2 * y
          <id,x> - Term Expr′                              x – 2 * y
          <id,x> - Factor Term′ Expr′                      x – 2 * y
          <id,x> - <number,2> Term′ Expr′                  x – 2 * y
          <id,x> - <number,2> * Factor Term′ Expr′         x – 2 * y
          <id,x> - <number,2> * <id,y> Term′ Expr′         x – 2 * y
          <id,x> - <number,2> * <id,y> ε Expr′             x – 2 * y
          <id,x> - <number,2> * <id,y> ε                   x – 2 * y
          <id,x> - <number,2> * <id,y>                     x – 2 * y
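A trace like the one above can be generated by a table-driven LL(1) parser. The sketch below assumes a right-recursive expression grammar (Goal, Expr, Expr', Term, Term', Factor) inferred from the sentential forms in the trace, since the rule table itself is not reproduced here; the parse table, the names TABLE and ll1_parse, and the "eof" marker are likewise assumptions.

```python
NONTERMINALS = {"Goal", "Expr", "Expr'", "Term", "Term'", "Factor"}

# Assumed LL(1) parse table: (nonterminal, look-ahead token) -> right-hand side.
# An empty list means the nonterminal is replaced by ε.
TABLE = {
    ("Goal", "id"): ["Expr"],
    ("Goal", "number"): ["Expr"],
    ("Expr", "id"): ["Term", "Expr'"],
    ("Expr", "number"): ["Term", "Expr'"],
    ("Expr'", "+"): ["+", "Term", "Expr'"],
    ("Expr'", "-"): ["-", "Term", "Expr'"],
    ("Expr'", "eof"): [],
    ("Term", "id"): ["Factor", "Term'"],
    ("Term", "number"): ["Factor", "Term'"],
    ("Term'", "*"): ["*", "Factor", "Term'"],
    ("Term'", "/"): ["/", "Factor", "Term'"],
    ("Term'", "+"): [],
    ("Term'", "-"): [],
    ("Term'", "eof"): [],
    ("Factor", "id"): ["id"],
    ("Factor", "number"): ["number"],
}


def ll1_parse(tokens):
    """Parse a list of token names and return the sentential forms that were derived."""
    tokens = tokens + ["eof"]
    stack = ["eof", "Goal"]  # top of the stack is the end of the list
    matched = []             # terminals already matched against the input
    forms = ["Goal"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]   # the single token of look-ahead
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no rule for {top} with look-ahead {look}")
            stack.extend(reversed(rhs))
            # Sentential form: matched terminals, then the stack from top to bottom.
            forms.append(" ".join(matched + stack[:0:-1]))
        elif top == look:
            if top != "eof":
                matched.append(top)
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return forms
```

For the token list ["id", "-", "number", "*", "id"], the returned forms run from "Goal" to "id - number * id". Because the table is indexed by the look-ahead token, the + alternative is never tried when the next token is -.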