
4th - Syntax Analysis

Uploaded by

morsalin islam

Chapter 4

Syntax Analysis
Overview

Why are grammars that formally describe languages important?


1. Precise, easy-to-understand representations
2. Compiler-writing tools can take grammar and generate a compiler
3. Allow the language to evolve (new statements, changes to statements, etc.).
Languages are not static; they are constantly upgraded to add new features or fix "old" ones.
The Role of the Parser
• Syntax analysis is done by the parser.

Figure 4.1: Position of parser in compiler model
[Diagram: source program → lexical analyzer → parser → rest of front end → intermediate representation. The lexical analyzer passes a token to the parser on each "get next token" request, the parser passes a parse tree to the rest of the front end, and all three stages use the symbol table.]

Rest of front end:
1. Also technically part of parsing
2. Includes augmenting info on tokens in source, type checking, semantic analysis

Parser:
• uses a grammar to check the structure of tokens
• produces a parse tree
• handles syntactic errors and recovery
• recognizes correct syntax
• reports errors
The Role of the Parser
Syntax Error Handling:
1) Lexical errors:
• Include misspellings of identifiers, keywords, or operators
• Example: the use of an identifier elipseSize instead of ellipseSize, and missing quotes around text intended as a string.
2) Syntactic errors:
• Omission or wrong order of tokens
• Include misplaced semicolons or extra or missing braces, that is, "{" or "}".
3) Semantic errors:
• Incompatible types
• Include type mismatches between operators and operands.
• An example is a return statement that returns a value in a Java method with result type void.
4) Logical errors:
• Anything that results from incorrect reasoning on the part of the programmer.
• Infinite loop / infinite recursive call
The majority of error processing occurs during syntax analysis.
NOTE: Not all errors are identifiable.
The Role of the Parser
Error handler goals
• Report the presence of errors clearly and accurately
• Recover from each error quickly enough to detect subsequent errors
• Add minimal overhead to the processing of correct programs
Error-recovery strategies:
1) Panic mode recovery
• Discard input symbols one at a time until one of a designated set of synchronizing tokens is found
• The synchronizing tokens are usually delimiters, such as semicolon or }
• Advantages: simple; suited to one error per statement
• Problems: skipped input may contain a declaration, causing more errors later; errors inside the skipped material are missed
2) Phrase level recovery
• Replace a prefix of the remaining input by some string that allows the parser to continue
• A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.
• Not suited to all situations
• Used in conjunction with panic mode to allow less input to be skipped
The Role of the Parser

3) Error productions
• Augment the grammar with productions that generate the erroneous constructs
• Example: add a rule for := in C assignment statements; report the error but continue compiling
• Self correction + diagnostic messages
4) Global correction
• Choose a minimal sequence of changes to obtain a globally least-cost correction
• Changes are adding / deleting / replacing symbols
• Costly in time and space, which is the key issue
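Panic mode recovery can be sketched as follows. This is a minimal illustration, not a production parser: the toy statement form `id = num ;`, the function name `parse_statements`, and the synchronizing set `{";", "}"}` are all assumptions made for the example.

```python
SYNC_TOKENS = {";", "}"}  # assumed synchronizing set: statement delimiters


def parse_statements(tokens):
    """Parse statements of the toy form `id = num ;` with panic mode recovery.

    Returns (number of well-formed statements, list of error messages).
    """
    expected = ["id", "=", "num", ";"]
    pos, ok, errors = 0, 0, []
    while pos < len(tokens):
        if tokens[pos:pos + len(expected)] == expected:
            ok += 1
            pos += len(expected)
            continue
        errors.append(f"syntax error near token {pos}: {tokens[pos]!r}")
        # Panic mode: discard input one token at a time until a
        # synchronizing token is found, then resume just after it.
        while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
            pos += 1
        pos += 1  # step over the synchronizing token itself
    return ok, errors


tokens = ["id", "=", "num", ";", "id", "num", ";", "id", "=", "num", ";"]
ok, errors = parse_statements(tokens)
```

The second statement is missing `=`, so the parser reports one error, skips to the next `;`, and still recognizes the third statement. Anything between the error and the `;` is never checked, which is the "miss errors in skipped material" problem listed above.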
Context Free Grammars

Context Free Grammars:
• Basis of parsing
• Represent language constructs
• Consist of terminals, nonterminals, a start symbol, and productions.
Example production: stmt → if ( expr ) stmt else stmt
1) Terminals:
• tokens of the language
• in the example, the terminals are the keywords if and else and the symbols "(" and ")"
2) Nonterminals:
• denote sets of strings generated by the grammar and in the language
• stmt and expr are nonterminals
3) Start symbol:
• one nonterminal is distinguished as the start symbol
• the productions for the start symbol are listed first
4) Production rules:
• indicate how T and NT are combined to generate valid strings
Context Free Grammars
 Each production consists of:
(a) A nonterminal called the head or left side of the production; this
production defines some of the strings denoted by the head.
(b) The symbol →. Sometimes ::= has been used in place of the arrow.
(c) A body or right side consisting of zero or more terminals and
nonterminals. The components of the body describe one way in
which strings of the nonterminal at the head can be constructed.
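The four components of a grammar map naturally onto a small data structure. The sketch below is one possible encoding, not a standard API: the class names `Production` and `Grammar` are made up for illustration, and the terminals `other` and `cond` are hypothetical base cases added so that every nonterminal can derive a string of terminals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    head: str    # the nonterminal on the left side
    body: tuple  # zero or more terminals and nonterminals


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    nonterminals: frozenset
    start: str
    productions: tuple

    def bodies(self, head):
        """Return every body whose production has the given head."""
        return [p.body for p in self.productions if p.head == head]


# stmt -> if ( expr ) stmt else stmt, plus two hypothetical base cases.
g = Grammar(
    terminals=frozenset({"if", "else", "(", ")", "other", "cond"}),
    nonterminals=frozenset({"stmt", "expr"}),
    start="stmt",
    productions=(
        Production("stmt", ("if", "(", "expr", ")", "stmt", "else", "stmt")),
        Production("stmt", ("other",)),
        Production("expr", ("cond",)),
    ),
)
```

Listing the productions for `stmt` first follows the convention that the start symbol's productions come first.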
Example 4.2: The grammar with the following productions defines simple arithmetic expressions. In this grammar, the terminal symbols are
id + - * / ↑ ( )
Productions:
expression → expression op expression
expression → ( expression )
expression → - expression
expression → id
op → +
op → -
op → *
op → /
op → ↑
Context Free Grammars
Notational Conventions:
Terminals: a, b, c, +, -, punctuation, 0, 1, …, 9
Nonterminals: A, B, C, S
T or NT: X, Y, Z
Strings of terminals: u, v, …, z in T*
Strings of T / NT: α, β in (T ∪ NT)*
Alternatives of production rules: A → α1; A → α2; …; A → αk can be written as A → α1 | α2 | … | αk
The first NT on the LHS of the first production rule is designated as the start symbol!
Example 4.3:
Using these shorthands, the grammar of Example 4.2 can be rewritten concisely as
E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑
The notational conventions tell us that E and A are nonterminals, with E the start symbol. The remaining symbols are terminals.
Context Free Grammars

Define: parse tree, Derivation tree, syntax tree

Context Free Grammars
Derivations:
• A step in a derivation replaces one NT with the RHS of one of its production rules.
• Productions are treated as rewriting rules to generate a string
EXAMPLE: E ⇒ -E (the ⇒ means "derives" in one step) using the production rule E → -E
Leftmost: Replace the leftmost nonterminal symbol at each step (the sentential form is scanned and rewritten from left to right)
E ⇒ E A E ⇒ id A E ⇒ id * E ⇒ id * id
Rightmost: Replace the rightmost nonterminal symbol at each step (the sentential form is scanned and rewritten from right to left)
E ⇒ E A E ⇒ E A id ⇒ E * id ⇒ id * id
Example 4.4: The string -(id+id) is a sentence of the grammar E → E+E | E*E | (E) | -E | id because there is the derivation
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
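A leftmost derivation can be replayed mechanically: at each step, find the leftmost nonterminal and splice in the chosen production body. The sketch below assumes a single nonterminal E; the helper names `leftmost_step` and `derive` are illustrative only.

```python
NONTERMINALS = {"E"}


def leftmost_step(form, body):
    """Replace the leftmost nonterminal in `form` with `body`."""
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            return form[:i] + body + form[i + 1:]
    raise ValueError("no nonterminal left to expand")


def derive(start, bodies):
    """Apply each body in turn, returning every sentential form."""
    forms = [start]
    for body in bodies:
        forms.append(leftmost_step(forms[-1], body))
    return forms


# Leftmost derivation of -(id+id) with E -> E+E | (E) | -E | id.
steps = derive(
    ["E"],
    [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]],
)
for form in steps:
    print("".join(form))
```

The printed sentential forms are E, -E, -(E), -(E+E), -(id+E), -(id+id). A rightmost derivation would use the same idea but scan `form` from the right.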
Context Free Grammars
Example 4.5:
Build a parse tree from the leftmost derivation of -(id+id):
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Context Free Grammars
Consider the expression grammar:
E → E+E | E*E | (E) | -E | id
Leftmost derivation of id + id * id, with the parse tree after each step:
E ⇒ E + E               tree: E[ E + E ]
E + E ⇒ id + E          tree: E[ E[id] + E ]
id + E ⇒ id + E * E     tree: E[ E[id] + E[ E * E ] ]
Context Free Grammars
Consider the expression grammar:
E → E+E | E*E | (E) | -E | id
Rightmost derivation of id + id * id, with the parse tree after each step:
E ⇒ E * E               tree: E[ E * E ]
E * E ⇒ E * id          tree: E[ E * E[id] ]
E * id ⇒ E + E * id     tree: E[ E[ E + E ] * E[id] ]
Ambiguous Grammars
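Example 4.6:
The arithmetic expression grammar permits two distinct leftmost derivations for the sentence id + id * id:

Derivation 1                Derivation 2
E ⇒ E * E                   E ⇒ E + E
  ⇒ E + E * E                 ⇒ id + E
  ⇒ id + E * E                ⇒ id + E * E
  ⇒ id + id * E               ⇒ id + id * E
  ⇒ id + id * id              ⇒ id + id * id

Parse tree for derivation 1 (* at the root): E[ E[ E[id] + E[id] ] * E[id] ]
Parse tree for derivation 2 (+ at the root): E[ E[id] + E[ E[id] * E[id] ] ]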
Ambiguous Grammars
Ambiguity: A CFG is said to be ambiguous if, for some string, there exists more than one parse tree, or more than one leftmost derivation, or more than one rightmost derivation.
• Example: id+id*id has two parse trees:
   Tree 1 (+ at the root): E[ E[id] + E[ E[id] * E[id] ] ]
   Tree 2 (* at the root): E[ E[ E[id] + E[id] ] * E[id] ]
   For 1+5*3, tree 1 groups the input as 1+(5*3) and tree 2 groups it as (1+5)*3.
• The problem with ambiguity is that operator precedence can be violated.
• There is no general algorithm to check whether a grammar is ambiguous; in practice we check by trial and error (hit & trial method).
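For one specific string, ambiguity can be checked by brute force: enumerate every parse tree and count them. The sketch below does this for the grammar E → E+E | E*E | id, treating any non-operator token as id. The function names are illustrative, and the approach only shows ambiguity for the given input; it is not a general ambiguity test.

```python
from functools import lru_cache


def parse_trees(tokens):
    """Return every parse tree of `tokens` under E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        # All trees that derive tokens[lo:hi] from E.
        found = []
        if hi - lo == 1 and tokens[lo] not in ("+", "*"):
            found.append(tokens[lo])                      # E -> id
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):                   # E -> E op E
                for left in trees(lo, k):
                    for right in trees(k + 1, hi):
                        found.append((left, tokens[k], right))
        return tuple(found)

    return list(trees(0, len(tokens)))


def evaluate(tree):
    """Evaluate a tree whose leaves are integer literals."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


results = [evaluate(t) for t in parse_trees(["1", "+", "5", "*", "3"])]
```

The two trees for 1+5*3 evaluate to 16 and 18, so the choice of parse tree changes the meaning of the program, which is why the ambiguity has to be removed.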
How to remove ambiguity

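Consider the expression grammar:
E → E+E | E*E | (E) | -E | id
For the sentence id + id * id there are two parse trees:
Tree 1 (+ at the root): E[ E[id] + E[ E[id] * E[id] ] ]
Tree 2 (* at the root): E[ E[ E[id] + E[id] ] * E[id] ]

Precedence is violated in tree 2, where + is applied before *.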
How to remove ambiguity

Consider the expression grammar:
E → E+E | E*E | (E) | -E | id
For the sentence 1+2+3 there are two parse trees:
Tree for (1+2)+3: E[ E[ E[id] + E[id] ] + E[id] ]   ("+" is left associative here)
Tree for 1+(2+3): E[ E[id] + E[ E[id] + E[id] ] ]   ("+" is right associative here)

Both are valid parse trees, but the second one violates the left associativity of "+".
How to remove ambiguity (with Associativity)
We get both associativities in the previous grammar because the grammar is defined without any order.

To achieve left associativity, the grammar must grow in the left direction only.
For example, + and * are left-associative operators, so we have to maintain left associativity.
Consider the expression grammar:
E → E+E | E*E | id
which is rewritten as
E → E+id | E*id | id
Parse tree for the sentence (1+2)+3: E[ E[ E[id] + id ] + id ]
How to remove ambiguity (with Associativity)
We get both associativities in the previous grammar because the grammar is defined without any order.

To achieve right associativity, the grammar must grow in the right direction only.
For example, "^" is a right-associative operator, so we have to maintain right associativity.
Consider the expression grammar:
E → E ^ E | id
which is rewritten as
E → F ^ E | F
F → id
The tree should grow in the right direction only.
Parse tree for the sentence 2^3^2, which is grouped as 2^(3^2): E[ F[id] ^ E[ F[id] ^ E[ F[id] ] ] ]
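The right-recursive grammar E → F ^ E | F, F → id translates directly into a recursive procedure, because the recursive call sits at the right end of the body. A minimal sketch, assuming integer literals stand in for id and that `parse_power` and `evaluate` are names chosen for this example:

```python
def parse_power(tokens):
    """Parse E -> F ^ E | F, F -> id (here: an integer literal).

    Returns a nested tuple; recursing on the right operand makes the
    tree grow to the right.
    """
    if not tokens:
        raise SyntaxError("operand expected")
    left = int(tokens[0])                              # F -> id
    if len(tokens) == 1:
        return left                                    # E -> F
    if tokens[1] != "^" or len(tokens) < 3:
        raise SyntaxError("expected '^' followed by an operand")
    return (left, "^", parse_power(tokens[2:]))        # E -> F ^ E


def evaluate(tree):
    """Evaluate a tree produced by parse_power."""
    if isinstance(tree, int):
        return tree
    base, _, exponent = tree
    return base ** evaluate(exponent)


tree = parse_power(["2", "^", "3", "^", "2"])
```

Here `tree` is `(2, "^", (3, "^", 2))`, so the value is 2^(3^2) = 512 rather than (2^3)^2 = 64.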
How to remove ambiguity (With Precedence)
To maintain the order of precedence of operators, the highest-precedence operator should be at the lowest level of the grammar (farthest from the start symbol).

Consider the expression grammar:
E → E+E | E*E | id

After maintaining the precedence order:
E → E+T | T
T → T*F | F
F → id
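The layered grammar E → E+T | T, T → T*F | F, F → id fixes both precedence and left associativity. The sketch below builds the corresponding tree. One assumption to note: the left-recursive rules are implemented as loops, since a hand-written procedure cannot call itself first without consuming input. The function names are illustrative.

```python
def parse_expression(tokens):
    """Parse E -> E+T | T, T -> T*F | F, F -> id into a nested tuple."""
    pos = 0

    def factor():                          # F -> id
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("+", "*"):
            raise SyntaxError(f"operand expected at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def term():                            # T -> T*F | F
        nonlocal pos
        node = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = (node, "*", factor())   # tree so far becomes the left operand
        return node

    def expression():                      # E -> E+T | T
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = (node, "+", term())
        return node

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


precedence_tree = parse_expression(["1", "+", "5", "*", "3"])
associativity_tree = parse_expression(["1", "+", "2", "+", "3"])
```

`precedence_tree` is `("1", "+", ("5", "*", "3"))`, so * binds tighter because it sits lower in the grammar, and `associativity_tree` is `(("1", "+", "2"), "+", "3")`, the left-associative grouping.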
Left Recursion
Left recursion: A production of a grammar is said to be left recursive if the leftmost variable of its RHS is the same as the variable of its LHS.

Example:

Types of left recursion:

1. Direct left recursion: Example:

2. Indirect left recursion: Example:
Left Recursion
Why eliminate left recursion: In recursive descent parsing, a left-recursive grammar causes an infinite loop. How does the infinite loop arise? The procedure for a left-recursive nonterminal calls itself before consuming any input, so it never makes progress.
Example:
The language generated by the grammar is
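The infinite loop is easy to reproduce. The sketch below assumes a hypothetical left-recursive grammar E → E + id | id and a naive procedure that tries the left-recursive alternative first; Python stops the runaway recursion with a RecursionError.

```python
def parse_E(tokens, pos=0):
    """Naive procedure for the left-recursive production E -> E + id."""
    # The body starts with E, so the procedure calls itself first,
    # at the same input position, before any token is consumed.
    pos = parse_E(tokens, pos)
    if tokens[pos:pos + 2] != ["+", "id"]:
        raise SyntaxError("expected '+ id'")
    return pos + 2


try:
    parse_E(["id", "+", "id"])
    outcome = "parsed"
except RecursionError:
    outcome = "recursion never terminated"
```

No token is ever examined: every call has the same `pos`, so the recursion cannot reach a base case.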
Elimination of Direct Left Recursion
The Grammar

So that the grammar can generate the same strings (

For the following left recursive grammar

We get following non left recursive grammar
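The usual textbook transformation rewrites A → Aα | β as A → βA' and A' → αA' | ε. The sketch below applies it to one nonterminal. The function name, the primed naming of the new nonterminal, and the sample grammar E → E + T | T are assumptions made for illustration, not the grammar from the slide.

```python
def eliminate_direct_left_recursion(head, bodies):
    """Rewrite A -> A a | b as A -> b A' and A' -> a A' | epsilon.

    `bodies` is a list of symbol tuples; the empty tuple stands for
    epsilon. Returns a dict mapping each head to its list of bodies.
    """
    recursive = [body[1:] for body in bodies if body[:1] == (head,)]
    others = [body for body in bodies if body[:1] != (head,)]
    if not recursive:
        return {head: list(bodies)}        # nothing to eliminate
    new_head = head + "'"
    return {
        head: [body + (new_head,) for body in others],
        new_head: [tail + (new_head,) for tail in recursive] + [()],
    }


# Hypothetical input: E -> E + T | T
rewritten = eliminate_direct_left_recursion("E", [("E", "+", "T"), ("T",)])
```

For this input the result is E → T E' and E' → + T E' | ε, which generates the same strings (T followed by any number of "+ T") without a leftmost self-reference.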

Elimination of indirect Left Recursion
Consider the following indirect left-recursive grammar
Left Factoring
Consider the following grammar, whose alternatives share common prefixes:

If we have to generate a string then what will happen?

• Backtracking.
• Why does backtracking happen?
• Because the parser has to choose a production after seeing only the common prefix.

One or more productions on the RHS have something in common in their prefixes. This is called the common prefix problem, and such a grammar is a non-deterministic grammar.

Left factoring:
Sometimes it is not clear which production rule to choose to expand a nonterminal because multiple productions begin with the same terminal or nonterminal. This type of grammar is a non-deterministic grammar, or a grammar that needs left factoring.
How to left factor:
Postpone the decision: factor out the common prefix so that the choice is made after the prefix has been seen.
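One round of left factoring can be sketched as below: alternatives are grouped by their first symbol, and each group's longest common prefix is pulled out into a new nonterminal. The function names, the primed names for new nonterminals, and the sample grammar S → i E t S | i E t S e S | a are assumptions chosen for illustration.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol tuples."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(head, bodies):
    """One round of left factoring.

    A -> a b1 | a b2 | g becomes A -> a A' | g and A' -> b1 | b2,
    where the empty tuple stands for epsilon.
    """
    groups = {}
    for body in bodies:
        groups.setdefault(body[:1], []).append(body)   # group by first symbol
    result = {head: []}
    suffix = "'"
    for first, group in groups.items():
        if len(group) == 1 or first == ():
            result[head].extend(group)                 # nothing to factor
            continue
        prefix = group[0]
        for body in group[1:]:
            prefix = common_prefix(prefix, body)
        new_head = head + suffix
        suffix += "'"
        result[head].append(prefix + (new_head,))
        result[new_head] = [body[len(prefix):] for body in group]
    return result


# Hypothetical input: S -> i E t S | i E t S e S | a
factored = left_factor(
    "S", [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)]
)
```

The result is S → i E t S S' | a and S' → ε | e S. The decision between the two original alternatives is postponed until after the shared prefix has been read. If the new alternatives still share a prefix, the function would have to be applied again to the new nonterminal.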
Left Factoring

Class Work S
|aSaSb
|abb
|b
Types of Parser
Parser
• Top-down parser
   • Top-down with full backtracking
      • Brute force method
   • Top-down without backtracking
      • Recursive descent
      • Non-recursive descent: LL(1), the predictive parser
• Bottom-up parser (shift-reduce parser)
   • Operator precedence parser
   • LR parser (L → scan from left to right; R → reverse of RMD, the rightmost derivation)
      • LR(0)
      • SLR(1) (S → Simple)
      • LALR (LA → Look ahead)
      • CLR (C → Canonical)

LL(1): The first L stands for scanning the input from left to right, the second L stands for producing a leftmost derivation, and the 1 stands for using one input symbol of lookahead at each step to make parsing action decisions.
