Lec4 SyntaxAnalysis
Lec4 SyntaxAnalysis
Lec4 SyntaxAnalysis
• Parser overview
• Derivations
• Ambiguity
• Syntax errors
2
What is the job of Syntax Analysis?
• Syntax Analysis is also called Parsing or Hierarchical
Analysis.
• A Parser implements grammar of the language may it be java,
C, C++ etc
• The parser obtains a string of tokens from the lexical analyzer
and :
• verifies that the string can be generated by the
grammar for the source language
• Reports any syntax errors in the program
• Constructs a parse tree representation of the
program
• usually calls the lexical analyzer to supply a token to it
when necessary
• The grammar that a parser implements is called a Context
The Functionality of theParser
• If-then-else statement
if (x == y) then z =1; else z = 2;
• Parser input
IF (ID == ID) THEN ID = INT; ELSE ID =
INT;
• Possible parser output
IF-THEN-ELSE
== = =
I I I IN I IN
D D D T D T
5
The Role of the Parser
• Not all sequences of tokens are programs ...
• Parser must distinguish between valid and
invalid sequences of tokens
• The Role is:
1. To check syntax (= string recognizer)
And to report syntax errors accurately
2. To invoke semantic actions
For static semantics checking, e.g. type checking of
expressions, functions, etc.
• We need
– A language for describing valid sequences of
tokens
– A method for distinguishing valid from invalid
6
What is the difference between Syntax and Semantic?
8
CFGs (Cont.)
A CFG consists of
– A set of terminals T
– A set of non-terminals N
– A start symbol S (a non-terminal)
– A set of productions
10
Notational Conventions
Terminals: Lower case letters, operator symbols,
punctuation symbols, digits, boldface strings are all
terminals
10
Examples of CFGs
11
Examples of CFGs (cont.)
Grammar for simple arithmetic expressions:
E E*
E
|E+E
|(E)
|id
12
The Language of a CFG
Read productions as replacement rules:
X Y1 ... Yn
Means X can be replaced by Y1 ... Yn
X
Means X can be erased (replaced with empty string)
13
Key Idea
14
The Language of a CFG (Cont.)
More formally, we write
X1 . . X i . . X n X1 ..X i 1 Y 1 . . Y m Xi1 . . X n
if there is a production
Xi Y1 . . Y m
Write
X1 . . X n Y1 . . Y m
if
X1 . . X n .. ..
Y1 . . Y m
15
The Language of a CFG
Let G be a context-free grammar with start
symbol S. Then the language of G is:
S (S ) S 0(S)
or
S |
20
Example
18
Arithmetic Example
E E+E | E E | (E) |
id
Some elements
id of the id
language:
+ id
(id) id
(id) id id
(id)
19
Derivations and Parse Trees
20
Derivation Example
• Grammar
E E+E | E E | (E) | id
• String id id + id E
E
E+E
E + E
E E+E
id E + E E * id
id id + E E
id id + id id id
21
Derivation Example
• Grammar
E E+E | E E | (E) | id
• String id id + id
Notes on Derivations
• A parse tree has
– Terminals at the leaves
– Non-terminals at the interior nodes
23
Left-most and Right-most
Derivations
E • What was shown before
was a left-most derivation
E+E – At each step, replace the
left-most non-terminal
E+id
E E + id • There is an equivalent
E id + id notion of a right-
most derivation
id id + id – Shown on the right
24
Right-most Derivation in Detail
E E E
E E+E
E + E
E
E
E+E E + E
E+id
id
25
Right-most Derivation in Detail (2)
E E
E+E
E+id E + E
E E + id
E * id
E id + id E
id id + id id
• Note that right-most and left-most
id
derivations have the same parse tree
• The difference is just in the order in
which branches are added
26
Ambiguity
• Grammar:
EE+E|E* E | (E)|
int
28
Dealing with Ambiguity
ET+E|T
T int * T | int | ( E )
29
The Dangling Else: Example
Consider the following grammar S if C then S
C1 if S4 C1 if
C2 S3
S3 S4
C2
if if
C1 if C1 if S4
C2 C2
S3
S3
• Not valid because the
S4 then expression is
not a MIF
• A valid parse tree 32
Ambiguity
• No general techniques for handling ambiguity
33
Precedence and Associativity Declarations
• Instead of rewriting the grammar
– Use the more natural (ambiguous) grammar
– Along with disambiguating declarations
• Examples …
34
Associativity Declarations
• Consider the grammar EE+E|
int
• Ambiguous: two parse trees of int + int + int
E E
E + E + E
E
E + E int int E + E
35
Precedence Declarations
• Consider the grammar E E + E * E | int
| E And the string int + int * int
E E
* E + E
E
E + E int int E *
E
int int int
int
• Precedence declarations: %left
+
36
Error Handling
Purpose of the compiler is
– To detect non-valid programs
– To translate the valid ones
Many kinds of possible errors (e.g. in C)
Error kind Example Detected by …
Lexical …$… Lexer
Syntax … x *% … Parser
Semantic … int x; y = x(3); … Type checker
Correctness your favorite program Tester/User
37
Error Handling
A good compiler should assist in identifying and locating errors
◦ Lexical errors: important, compiler can easily recover and
continue such as misspelling an identifier, keyword etc.
◦ Syntax errors: most important for compiler, can almost
always recover such as arithmetic expression with unbalanced
parenthesis
◦ Static semantic errors: important, can sometimes recover
such as operator applied to incompatible operands
◦ Dynamic semantic errors: hard or impossible to detect at
compile time, runtime checks are required
◦ Logical errors: hard or impossible to detect such as infinite
recursive calls
38
Syntax Error Handling
Error handler should
– Report errors accurately and clearly
– Recover from an error quickly
– Not slow down compilation of valid code
39
Approaches to Syntax Error Recovery
Approaches from simple to complex
– Panic mode
– Error productions
– Automatic local or global correction
• Panic mode is the simplest, most popular method
• When an error is detected:
– Discard tokens until one with a clear role is
found
– Continue from there