Lec 4: Syntax Analysis


Introduction to Parsing

Ambiguity and Syntax Errors


Outline

• Parser overview

• Context-free grammars (CFG’s)

• Derivations

• Ambiguity

• Syntax errors

What is the job of Syntax Analysis?
• Syntax Analysis is also called Parsing or Hierarchical Analysis.
• A parser implements the grammar of a language, be it Java, C, C++, etc.
• The parser obtains a string of tokens from the lexical analyzer and:
  • verifies that the string can be generated by the grammar for the source language
  • reports any syntax errors in the program
  • constructs a parse tree representation of the program
  • usually calls the lexical analyzer to supply a token when necessary
• The grammar that a parser implements is called a Context-Free Grammar (CFG).
The Functionality of the Parser

• Input: sequence of tokens from lexer

• Output: parse tree of the program


Comparison with Lexical Analysis:

  Phase    Input                    Output
  Lexer    Sequence of characters   Sequence of tokens
  Parser   Sequence of tokens       Parse tree
Example

• If-then-else statement
  if (x == y) then z = 1; else z = 2;
• Parser input
  IF (ID == ID) THEN ID = INT; ELSE ID = INT;
• Possible parser output: a parse tree with IF-THEN-ELSE at the root, the == comparison of the two IDs as the condition, and the two assignments (ID = INT) as the then and else branches.
The Role of the Parser
• Not all sequences of tokens are programs ...
• The parser must distinguish between valid and invalid sequences of tokens
• The role is:
  1. To check syntax (= string recognizer)
     and to report syntax errors accurately
  2. To invoke semantic actions
     for static semantics checking, e.g. type checking of expressions, functions, etc.
• We need
  – a language for describing valid sequences of tokens
  – a method for distinguishing valid from invalid sequences
What is the difference between Syntax and Semantics?

• Syntax is the way in which we construct sentences by following principles and rules.

• Semantics is the interpretation of, and meaning derived from, a sentence; in other words, whether the sentence makes logical sense or not.
Context-Free Grammars
• Many programming language constructs have a
recursive structure

• A STMT is of the form
  if COND then STMT else STMT , or
  while COND do STMT , or ...

• Context-free grammars are a natural notation for this recursive structure
CFGs (Cont.)
• A CFG consists of
  – A set of terminals T
  – A set of non-terminals N
  – A start symbol S (a non-terminal)
  – A set of productions

  Assuming X ∈ N, the productions are of the form
    X → ε , or
    X → Y1 Y2 ... Yn   where each Yi ∈ N ∪ T
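As a concrete illustration (not part of the original slides), a CFG can be held in a small data structure. The Python sketch below stores the arithmetic expression grammar used in the example slides; the variable name and dictionary layout are illustrative assumptions, not a prescribed representation.

    # Illustrative sketch: a CFG as plain Python data.
    # Terminals and non-terminals are strings; each production is a list of symbols.
    grammar = {
        "terminals": {"+", "*", "(", ")", "id"},
        "nonterminals": {"E"},
        "start": "E",
        "productions": {
            "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]],
        },
    }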
Notational Conventions
• Terminals: lower-case letters, operator symbols, punctuation symbols, digits, and boldface strings are all terminals
• Non-terminals: upper-case letters and lower-case italic names are usually non-terminals
• Greek letters such as α, β, γ represent strings of grammar symbols. Thus a generic production can be written as A → α
• The start symbol is the left-hand side of the first production
Examples of CFGs

A fragment of our example language (simplified):

  STMT → if COND then STMT else STMT
       | while COND do STMT
       | id = int
Examples of CFGs (cont.)
Grammar for simple arithmetic expressions:

  E → E * E
    | E + E
    | ( E )
    | id
The Language of a CFG
Read productions as replacement rules:

  X → Y1 ... Yn
    means X can be replaced by Y1 ... Yn
  X → ε
    means X can be erased (replaced with the empty string)
Key Idea

(1) Begin with a string consisting of the start symbol “S”
(2) Replace any non-terminal X in the string by a right-hand side of some production X → Y1 ... Yn
(3) Repeat (2) until there are no non-terminals in the string
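This procedure can be sketched directly in code. The Python sketch below is an illustration (not from the slides) and assumes the dictionary layout shown after the CFG definition; it always expands the left-most non-terminal, which is the left-most derivation discussed later.

    import random

    def derive(grammar, max_steps=50):
        """Sketch: repeatedly replace the left-most non-terminal until none remain."""
        string = [grammar["start"]]                      # (1) start with the start symbol
        for _ in range(max_steps):
            idx = next((i for i, s in enumerate(string)
                        if s in grammar["nonterminals"]), None)
            if idx is None:                              # (3) no non-terminals left: done
                return string
            rhs = random.choice(grammar["productions"][string[idx]])
            string = string[:idx] + rhs + string[idx + 1:]   # (2) replace X by a right-hand side
        return string                                    # may stop early on a long derivation

    # Example use with a tiny grammar for S -> ( S ) | epsilon:
    parens = {"start": "S", "nonterminals": {"S"},
              "productions": {"S": [["(", "S", ")"], []]}}
    print("".join(derive(parens)))                       # e.g. "(())"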
The Language of a CFG (Cont.)
More formally, we write

  X1 ... Xi ... Xn → X1 ... Xi-1 Y1 ... Ym Xi+1 ... Xn

if there is a production

  Xi → Y1 ... Ym

Write

  X1 ... Xn →* Y1 ... Ym

if X1 ... Xn → ... → Y1 ... Ym in zero or more steps.
The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language of G is:

  L(G) = { a1 ... an | S →* a1 ... an and every ai is a terminal }

Terminals

• Terminals are called so because there are no rules for replacing them
• Once generated, terminals are permanent
• Terminals ought to be tokens of the language
Examples

L(G) is the language of the CFG G

Strings of balanced parentheses: { (^i )^i | i ≥ 0 }

Two grammars:

  S → ( S )
    | ε
            or
  S → ( S )
  S → ε
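As a hedged illustration (the function name is mine, not the slides'), a recognizer for this language follows the grammar S → ( S ) | ε directly: a string matches if it is empty or is a matching pair of parentheses around a smaller match.

    def matches(s: str) -> bool:
        """Sketch: recognize { '('*i + ')'*i | i >= 0 } via S -> ( S ) | epsilon."""
        if s == "":                                   # S -> epsilon
            return True
        # S -> ( S ): strip one outer matching pair and recurse
        return s.startswith("(") and s.endswith(")") and matches(s[1:-1])

    assert matches("((()))") and not matches("(()")   # quick sanity check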
Example

A fragment of our example language (simplified):

  STMT → if COND then STMT
       | if COND then STMT else STMT
       | while COND do STMT
       | id = int

  COND → ( id == id )
       | ( id != id )
Arithmetic Example

Simple arithmetic expressions:

  E → E + E | E * E | ( E ) | id

Some elements of the language:

  id          id + id        ( id )
  id * id     ( id ) * id    id * ( id )
Derivations and Parse Trees

A derivation is a sequence of productions

  S → ... → ... → ...

A derivation can be drawn as a tree
  – The start symbol is the tree’s root
  – For a production X → Y1 ... Yn, add children Y1 ... Yn to node X
Derivation Example
• Grammar
  E → E + E | E * E | ( E ) | id
• String
  id * id + id
• Derivation:
  E → E + E
    → E * E + E
    → id * E + E
    → id * id + E
    → id * id + id
  (The corresponding parse tree has + at the root, the subtree for id * id as its left child, and id as its right child.)
Notes on Derivations
• A parse tree has
  – Terminals at the leaves
  – Non-terminals at the interior nodes

• An in-order traversal of the leaves is the original input

• The parse tree shows the association of operations; the input string does not
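To make the point about leaves concrete, here is a hedged Python sketch (class and function names are mine) of a parse-tree node and a left-to-right leaf traversal; the tree built is the one for id * id + id from the derivation example.

    class Node:
        """Sketch of a parse-tree node: a label plus an ordered list of children."""
        def __init__(self, label, children=None):
            self.label = label
            self.children = children or []        # leaves have no children

    def leaves(node):
        """Collect leaf labels left to right (the frontier of the tree)."""
        if not node.children:
            return [node.label]
        out = []
        for child in node.children:
            out += leaves(child)
        return out

    # Parse tree for id * id + id, with + at the root:
    tree = Node("E", [
        Node("E", [Node("E", [Node("id")]), Node("*"), Node("E", [Node("id")])]),
        Node("+"),
        Node("E", [Node("id")]),
    ])
    assert leaves(tree) == ["id", "*", "id", "+", "id"]   # the leaves reproduce the input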
Left-most and Right-most Derivations

• What was shown before was a left-most derivation
  – At each step, replace the left-most non-terminal
• There is an equivalent notion of a right-most derivation
  – At each step, replace the right-most non-terminal
• The right-most derivation of id * id + id:
  E → E + E
    → E + id
    → E * E + id
    → E * id + id
    → id * id + id
Right-most Derivation in Detail

The derivation is built one step at a time, and each step also grows the partial parse tree (the production used adds children to the expanded non-terminal):

  E → E + E
    → E + id
Right-most Derivation in Detail (2)

  E → E + E
    → E + id
    → E * E + id
    → E * id + id
    → id * id + id

• Note that right-most and left-most derivations have the same parse tree
• The difference is just in the order in which branches are added
Ambiguity
• Grammar:
  E → E + E | E * E | ( E ) | int

• The string int * int + int has two parse trees: one with + at the root (int * int grouped as its left operand) and one with * at the root (int + int grouped as its right operand).
Ambiguity (Cont.)
• A grammar is ambiguous if it has more than one parse tree for some string
  – Equivalently, there is more than one right-most or left-most derivation for some string
• Ambiguity is bad
  – It leaves the meaning of some programs ill-defined
• Ambiguity is common in programming languages
  – Arithmetic expressions
  – IF-THEN-ELSE
Dealing with Ambiguity

• There are several ways to handle ambiguity

• The most direct method is to rewrite the grammar unambiguously:

  E → T + E | T
  T → int * T | int | ( E )

• This grammar enforces precedence of * over +
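As an illustration of how such an unambiguous grammar can be parsed, here is a hedged recursive-descent sketch in Python for E → T + E | T and T → int * T | int | ( E ). The function names and the token-list interface are assumptions made for this sketch, not something the slides prescribe.

    def parse_E(tokens, i=0):
        """E -> T + E | T. Returns the index just past the parsed expression."""
        i = parse_T(tokens, i)
        if i < len(tokens) and tokens[i] == "+":
            return parse_E(tokens, i + 1)
        return i

    def parse_T(tokens, i):
        """T -> int * T | int | ( E )."""
        if i < len(tokens) and tokens[i] == "int":
            if i + 1 < len(tokens) and tokens[i + 1] == "*":
                return parse_T(tokens, i + 2)
            return i + 1
        if i < len(tokens) and tokens[i] == "(":
            i = parse_E(tokens, i + 1)
            if i < len(tokens) and tokens[i] == ")":
                return i + 1
        raise SyntaxError(f"unexpected token at position {i}")

    # A token string is in the language if parsing consumes every token:
    tokens = ["int", "*", "int", "+", "int"]
    assert parse_E(tokens) == len(tokens)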
The Dangling Else: Example
Consider the following grammar:

  S → if C then S
    | if C then S else S
    | OTHER

• The expression
  if C1 then if C2 then S3 else S4
  has two parse trees: in one, else S4 attaches to the outer if (whose then-branch is if C2 then S3); in the other, else S4 attaches to the inner if (so the outer then-branch is the whole if C2 then S3 else S4).

• Typically we want the second form
The Dangling Else: A Fix
• else should match the closest unmatched then
• We can describe this in the grammar

  S → MIF      /* all then are matched */
    | UIF      /* some then are unmatched */

  MIF → if C then MIF else MIF
      | OTHER

  UIF → if C then S
      | if C then MIF else UIF

• Describes the same set of strings

The Dangling Else: Example Revisited
• The expression if C1 then if C2 then S3 else S4 now has exactly one parse tree: else S4 attaches to the inner if, giving if C2 then S3 else S4 as the then-branch of the outer if. That is the valid parse tree.
• The other tree, with else S4 attached to the outer if, is not valid, because its then expression (if C2 then S3) is not a MIF.
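As a worked check (reconstructed from the MIF/UIF rules above; the slide itself shows only the trees), the expression has a single derivation:

  S → UIF
    → if C1 then S
    → if C1 then MIF
    → if C1 then if C2 then MIF else MIF
    →* if C1 then if C2 then S3 else S4        (S3, S4 derived from OTHER)

The rejected tree would need the outer if to use an else production, whose then-part must be a MIF; but if C2 then S3 has no else and so can only be a UIF, which rules that tree out.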
Ambiguity
• There are no general techniques for handling ambiguity

• It is impossible to automatically convert an ambiguous grammar into an unambiguous one

• Used with care, ambiguity can simplify the grammar
  – It sometimes allows more natural definitions
  – But then we need disambiguation mechanisms
Precedence and Associativity Declarations
• Instead of rewriting the grammar
  – Use the more natural (ambiguous) grammar
  – Along with disambiguating declarations

• Most tools allow precedence and associativity declarations to disambiguate grammars

• Examples follow
Associativity Declarations
• Consider the grammar E → E + E | int
• Ambiguous: there are two parse trees of int + int + int, one grouping the left two ints first (left-associative) and one grouping the right two ints first (right-associative)
• Left associativity declaration: %left +
Precedence Declarations
• Consider the grammar E → E + E | E * E | int
  and the string int + int * int
• Two parse trees: one where int * int is grouped first (as the right operand of +), and one where int + int is grouped first (as the left operand of *)
• Precedence declarations: %left +
                           %left *
  (an operator declared later has higher precedence, so * binds tighter than +)
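The %left and precedence declarations are directives for a parser-generator tool; as a hedged sketch of the effect they produce, the Python precedence-climbing parser below builds left-associative trees and gives * a tighter binding than +. The precedence table and names are illustrative assumptions, not the syntax of any particular tool.

    # Illustrative precedence table: higher number binds tighter; both left-associative.
    PREC = {"+": 1, "*": 2}

    def parse_expr(tokens, i=0, min_prec=1):
        """Precedence climbing: returns (tree, next index)."""
        tree, i = tokens[i], i + 1                    # an operand token such as 'int'
        while i < len(tokens) and tokens[i] in PREC and PREC[tokens[i]] >= min_prec:
            op = tokens[i]
            # for a left-associative operator, the right operand may only use tighter operators
            rhs, i = parse_expr(tokens, i + 1, PREC[op] + 1)
            tree = (op, tree, rhs)
        return tree, i

    tree, _ = parse_expr(["int", "+", "int", "*", "int"])
    assert tree == ("+", "int", ("*", "int", "int"))       # * groups tighter than +

    tree, _ = parse_expr(["int", "+", "int", "+", "int"])
    assert tree == ("+", ("+", "int", "int"), "int")       # + associates to the left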
Error Handling
• The purpose of the compiler is
  – To detect invalid programs
  – To translate the valid ones
• Many kinds of errors are possible (e.g. in C):

  Error kind    Example                     Detected by ...
  Lexical       ... $ ...                   Lexer
  Syntax        ... x *% ...                Parser
  Semantic      ... int x; y = x(3); ...    Type checker
  Correctness   your favorite program       Tester / User
Error Handling
• A good compiler should assist in identifying and locating errors
  ◦ Lexical errors: important; the compiler can easily recover and continue (e.g. a misspelled identifier or keyword)
  ◦ Syntax errors: most important for the compiler; it can almost always recover (e.g. an arithmetic expression with unbalanced parentheses)
  ◦ Static semantic errors: important; the compiler can sometimes recover (e.g. an operator applied to incompatible operands)
  ◦ Dynamic semantic errors: hard or impossible to detect at compile time; runtime checks are required
  ◦ Logical errors: hard or impossible to detect (e.g. infinite recursive calls)
Syntax Error Handling
• The error handler should
  – Report errors accurately and clearly
  – Recover from an error quickly
  – Not slow down compilation of valid code

• Good error handling is not easy to achieve
Approaches to Syntax Error Recovery
• Approaches, from simple to complex:
  – Panic mode
  – Error productions
  – Automatic local or global correction
• Panic mode is the simplest and most popular method
• When an error is detected:
  – Discard tokens until one with a clear role is found
  – Continue from there
• Such tokens are called synchronizing tokens
  – Typically the statement or expression terminators
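A hedged Python sketch of panic-mode recovery follows. The toy statement parser, the token spellings, and the choice of ';' as the synchronizing token are illustrative assumptions; the point is only that after an error the parser discards tokens up to a synchronizing token and then continues.

    SYNC = {";"}                                    # synchronizing tokens: statement terminators

    def parse_statement(tokens, i):
        """Toy parser for one 'id = int ;' statement (the STMT -> id = int fragment)."""
        for want in ["id", "=", "int", ";"]:
            if i >= len(tokens) or tokens[i] != want:
                raise SyntaxError(f"expected {want!r} at position {i}")
            i += 1
        return i

    def parse_program(tokens):
        """Parse statements one after another, using panic-mode recovery on errors."""
        errors, i = [], 0
        while i < len(tokens):
            try:
                i = parse_statement(tokens, i)
            except SyntaxError as err:
                errors.append(str(err))
                while i < len(tokens) and tokens[i] not in SYNC:
                    i += 1                          # panic mode: discard tokens ...
                i += 1                              # ... up to and including the sync token
        return errors

    # One bad statement in the middle; parsing still reaches the third statement.
    toks = ["id", "=", "int", ";", "id", "*", "int", ";", "id", "=", "int", ";"]
    print(parse_program(toks))                      # reports one error and keeps going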
Questions???
