
Syntax Analysis: Check Syntax and Construct Abstract Syntax Tree

The document discusses syntax analysis and parsing. It defines context-free grammars and how they are used to describe the syntax of programming languages. Grammars allow derivation of strings in a language from a start symbol using productions. Parse trees show how a string derives from the start symbol. Ambiguity in grammars can cause incorrect parsing and the document discusses techniques like precedence rules to resolve ambiguity.

Uploaded by Bhavya Nag

Syntax Analysis

• Check syntax and construct abstract syntax tree

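[Figure: abstract syntax tree for the statement "if (b == 0) a = b;". The root "if" has the children "==" (over b and 0), "=" (over a and b) and ";".]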

• Error reporting and recovery


• Model using context free grammars
• Recognize using push down automata / table driven parsers
Limitations of regular languages
• How can language syntax be described precisely and conveniently? Can regular expressions be used?
• Many languages are not regular, for example, strings of balanced parentheses
– ((((…))))
– { (^i )^i | i ≥ 0 }
– There is no regular expression for this language
• A finite automaton may repeat states; however, it cannot remember the number of times it has been to a particular state
• A more powerful formalism is needed to describe a valid string of tokens
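The counting argument can be illustrated with a minimal Python sketch. The function name is_nested is an assumption made for this example and does not come from the slides. It recognizes { (^i )^i | i ≥ 0 } with two integer counters that may grow without bound, which is the kind of memory a finite automaton with a fixed set of states does not have.

```python
def is_nested(s: str) -> bool:
    """Return True if s is i '(' characters followed by exactly i ')' characters."""
    i = 0
    opens = 0
    # Count the leading '(' characters; this counter has no upper bound.
    while i < len(s) and s[i] == "(":
        opens += 1
        i += 1
    closes = 0
    # Count the ')' characters that follow.
    while i < len(s) and s[i] == ")":
        closes += 1
        i += 1
    # Accept only if the whole input was consumed and the two counts agree.
    return i == len(s) and opens == closes
```

For example, is_nested("((()))") is True, while is_nested("(()") and is_nested("()()") are False.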
Syntax definition
• Context free grammars <T, N, P, S>
– T: a set of tokens (terminal symbols)
– N: a set of non terminal symbols
– P: a set of productions of the form
nonterminal → string of terminals and non terminals
– S: a start symbol
• A grammar derives strings by beginning with the start symbol and repeatedly replacing a non terminal by the right hand side of a production for that non terminal.
• The strings of tokens that can be derived from the start symbol of a grammar G form the language L(G) defined by the grammar.
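One possible way to hold the four components <T, N, P, S> in a program is sketched below in Python. The class name Grammar, its field names and the method is_well_formed are assumptions made for this illustration. The instance encodes the balanced-parentheses grammar S → ( S ) S | ε, with the empty tuple standing for ε.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    productions: tuple        # P: pairs (lhs, rhs), rhs is a tuple of symbols
    start: str                # S

    def is_well_formed(self) -> bool:
        """Check that T and N are disjoint and that P and S only use known symbols."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and self.terminals.isdisjoint(self.nonterminals)
            and all(
                lhs in self.nonterminals and all(sym in symbols for sym in rhs)
                for lhs, rhs in self.productions
            )
        )


# S -> ( S ) S | epsilon; the empty tuple () stands for epsilon.
PARENS = Grammar(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    productions=(("S", ("(", "S", ")", "S")), ("S", ())),
    start="S",
)
print(PARENS.is_well_formed())
```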
Examples
• String of balanced parentheses
S → ( S ) S | ε

• Grammar
list → list + digit
| list - digit
| digit
digit → 0 | 1 | … | 9

This grammar generates the language of lists of digits separated by + or -.
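The balanced-parentheses grammar can be turned into a recognizer almost directly. The Python sketch below is one way to do it by recursive descent; the names balanced and parse_S are assumptions made for this example. On "(" it uses S → ( S ) S, and otherwise it uses S → ε.

```python
def balanced(s: str) -> bool:
    """Recognize the language of S -> ( S ) S | epsilon."""

    def parse_S(i: int) -> int:
        # Returns the index just after the S that starts at i, or -1 on error.
        if i < len(s) and s[i] == "(":
            j = parse_S(i + 1)                 # the S inside the parentheses
            if j == -1 or j >= len(s) or s[j] != ")":
                return -1                      # missing closing parenthesis
            return parse_S(j + 1)              # the S that follows
        return i                               # S -> epsilon consumes nothing

    # The string is in the language only if S covers all of it.
    return parse_S(0) == len(s)
```

For example, balanced("()(())") is True and balanced("(()") is False.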
Derivation
list ⇒ list + digit
⇒ list - digit + digit
⇒ digit - digit + digit
⇒ 9 - digit + digit
⇒ 9 - 5 + digit
⇒ 9 - 5 + 2
Therefore, the string 9-5+2 belongs to the language specified by the grammar.
The name "context free" comes from the fact that the use of a production X → … does not depend on the context of X.
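The derivation of 9-5+2 can be replayed mechanically. The Python sketch below is an illustration only: the production names "plus", "minus" and "unit" and the function derive are assumptions introduced here. Each step rewrites the leftmost non terminal of the current sentential form.

```python
# Productions of the list grammar; the short names are assumptions for this sketch.
PRODUCTIONS = {
    "plus": ("list", ["list", "+", "digit"]),
    "minus": ("list", ["list", "-", "digit"]),
    "unit": ("list", ["digit"]),
}
NONTERMINALS = {"list", "digit"}


def derive(steps):
    """Apply each step to the leftmost non terminal; return all sentential forms."""
    form = ["list"]
    forms = [" ".join(form)]
    for step in steps:
        if step in PRODUCTIONS:
            lhs, rhs = PRODUCTIONS[step]
        elif len(step) == 1 and step in "0123456789":
            lhs, rhs = "digit", [step]         # the production digit -> step
        else:
            raise ValueError(f"unknown step {step!r}")
        pos = next((i for i, sym in enumerate(form) if sym in NONTERMINALS), None)
        if pos is None or form[pos] != lhs:
            raise ValueError(f"step {step!r} does not apply here")
        form[pos:pos + 1] = rhs                # replace the non terminal by the rhs
        forms.append(" ".join(form))
    return forms


for line in derive(["plus", "minus", "unit", "9", "5", "2"]):
    print(line)
```

The last line printed is 9 - 5 + 2, and the intermediate lines are the sentential forms of the derivation.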
Examples …
• Simplified grammar for a C block
block → '{' decls statements '}'
statements → stmt-list | ε
stmt-list → stmt-list stmt ';'
| stmt ';'
decls → decls declaration | ε
declaration → …
Syntax analyzers
• Testing whether w belongs to L(G) gives just a "yes" or "no" answer
• However, the syntax analyzer
– Must generate the parse tree
– Must handle errors gracefully if the string is not in the language
• The form of the grammar is important
– Many grammars generate the same language
– Tools are sensitive to the grammar
What syntax analysis cannot do!
• To check whether variables are of types on which operations are allowed
• To check whether a variable has been declared before use
• To check whether a variable has been initialized
• These issues will be handled in semantic analysis
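The gap between syntax and meaning can be observed with Python's own parser, used here only as a convenient stand-in for a syntax analyzer. The variable names in the sample source are made up for the illustration. Note that Python reports the problem at run time rather than in a separate semantic analysis phase; the point of the sketch is only that the parser accepts the text.

```python
import ast

source = "total = count + 1\n"   # 'count' is never defined anywhere

# Syntax analysis succeeds: the text is a well-formed assignment statement.
tree = ast.parse(source)
print(type(tree.body[0]).__name__)

# The missing definition only surfaces after parsing, here when the code runs.
try:
    exec(compile(tree, "<example>", "exec"), {})
except NameError as err:
    print("NameError:", err)
```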
Derivation
• If there is a production A → α then we say that A derives α, denoted A ⇒ α
• α A β ⇒ α γ β if A → γ is a production
• If α1 ⇒ α2 ⇒ … ⇒ αn then α1 ⇒+ αn
• Given a grammar G and a string w of terminals in L(G) we can write S ⇒+ w
• If S ⇒* α, where α is a string of terminals and non terminals of G, then we say that α is a sentential form of G
Derivation …
• If in every step only the leftmost non terminal of the sentential form is replaced, the derivation is a leftmost derivation
• Every leftmost step can be written as
wAγ ⇒lm wδγ
where w is a string of terminals and A → δ is a production
• Similarly, a rightmost derivation can be defined
• An ambiguous grammar is one that produces more than one leftmost (or more than one rightmost) derivation of some sentence
Parse tree
• Shows how the start symbol of a grammar derives a string in the language
• The root is labeled by the start symbol
• Leaf nodes are labeled by tokens
• Each internal node is labeled by a non terminal
• If A is the label of a node and x1, x2, …, xn are the labels of the children of that node, then A → x1 x2 … xn is a production in the grammar
Example
Parse tree for 9-5+2
(Children are indented under their parent and listed from left to right.)
list
    list
        list
            digit
                9
        -
        digit
            5
    +
    digit
        2
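The parse tree for 9-5+2 can also be built by a short program. The Python sketch below is one possible approach, with the function name parse_list and the nested-tuple tree format chosen as assumptions for this example. Because the grammar is left recursive, the sketch uses a loop instead of a recursive call: every "+ digit" or "- digit" wraps the tree built so far, which yields the left-leaning shape of the tree.

```python
def parse_list(text: str):
    """Build a parse tree for list -> list + digit | list - digit | digit."""
    digits = "0123456789"
    tokens = [ch for ch in text if not ch.isspace()]
    if not tokens or tokens[0] not in digits:
        raise SyntaxError("expected a digit")
    tree = ("list", ("digit", tokens[0]))      # list -> digit
    pos = 1
    while pos < len(tokens):
        op_ok = tokens[pos] in "+-"
        digit_ok = pos + 1 < len(tokens) and tokens[pos + 1] in digits
        if not (op_ok and digit_ok):
            raise SyntaxError(f"unexpected input at token {pos}")
        # list -> list op digit: the tree so far becomes the left child.
        tree = ("list", tree, tokens[pos], ("digit", tokens[pos + 1]))
        pos += 2
    return tree


print(parse_list("9-5+2"))
```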
Ambiguity
• A grammar can have more than one parse tree for a string
• Consider the grammar
list → list + list
| list - list
| 0 | 1 | … | 9

• The string 9-5+2 has two parse trees
Parse tree 1, grouping (9-5)+2:
list
    list
        list
            9
        -
        list
            5
    +
    list
        2

Parse tree 2, grouping 9-(5+2):
list
    list
        9
    -
    list
        list
            5
        +
        list
            2
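The number of parse trees that the ambiguous grammar allows for a string can be counted with a small dynamic program. The Python sketch below is an illustration; the function name count_parse_trees is an assumption made here. It tries every operator position as the root production list → list op list and multiplies the counts of the two sides.

```python
from functools import lru_cache


def count_parse_trees(text: str) -> int:
    """Count parse trees for list -> list + list | list - list | 0 | ... | 9."""
    tokens = tuple(ch for ch in text if not ch.isspace())

    @lru_cache(maxsize=None)
    def count(lo: int, hi: int) -> int:
        # Number of ways tokens[lo:hi] can be derived from 'list'.
        if hi - lo == 1:
            return 1 if tokens[lo] in "0123456789" else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in "+-":
                # Root production list -> list op list with the operator at 'mid'.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens)) if tokens else 0


print(count_parse_trees("9-5+2"))
```

For 9-5+2 the count is 2, matching the two trees. A string with a single operator, such as 9-5, has exactly one tree.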
Ambiguity …
• Ambiguity is problematic because the meaning of a program can be interpreted incorrectly
• Ambiguity can be handled in several ways
– Enforce associativity and precedence
– Rewrite the grammar (cleanest way)
• There is no algorithm that automatically converts every ambiguous grammar into an unambiguous grammar accepting the same language
• Worse, there are inherently ambiguous languages!
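Why ambiguity changes meaning can be seen by evaluating the two groupings that an ambiguous grammar allows for 9-5+2. The Python sketch below uses a hypothetical tree encoding chosen for this example: a digit string, or a tuple (left, operator, right).

```python
def evaluate(tree):
    """Evaluate a tree given as a digit string or a tuple (left, op, right)."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b


left_grouped = (("9", "-", "5"), "+", "2")     # (9-5)+2
right_grouped = ("9", "-", ("5", "+", "2"))    # 9-(5+2)
print(evaluate(left_grouped), evaluate(right_grouped))   # prints: 6 2
```

The same string yields 6 under one tree and 2 under the other, so a compiler has to commit to exactly one of them.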
Ambiguity in Programming Lang.
• Dangling else problem
stmt → if expr stmt
| if expr stmt else stmt
• For this grammar, the string
if e1 if e2 s1 else s2
has two parse trees
Parse tree 1 (else paired with the outer if):
stmt
    if
    expr
        e1
    stmt
        if
        expr
            e2
        stmt
            s1
    else
    stmt
        s2

Parse tree 2 (else paired with the inner if):
stmt
    if
    expr
        e1
    stmt
        if
        expr
            e2
        stmt
            s1
        else
        stmt
            s2
Resolving dangling else problem
• General rule: match each else with the closest previous unmatched if. The grammar can be rewritten as
stmt → matched-stmt
| unmatched-stmt
matched-stmt → if expr matched-stmt else matched-stmt
| others
unmatched-stmt → if expr stmt
| if expr matched-stmt else unmatched-stmt
Associativity
• If an operand has operators on both sides, the side of the operator that takes the operand determines the associativity of those operators
• In a+b+c, b is taken by the left +
• +, -, *, / are left associative
• ^, = are right associative
• Grammar to generate strings with a right associative operator
right → letter = right | letter
letter → a | b | … | z
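The right-recursive grammar translates naturally into a recursive function, and the recursion is what produces right associativity. The Python sketch below is an illustration under assumed names (parse_right and the tuple format ("=", letter, rest) are not from the slides).

```python
def parse_right(text: str):
    """Parse right -> letter = right | letter into nested ('=', letter, right) tuples."""
    tokens = [ch for ch in text if not ch.isspace()]

    def right(pos: int):
        if pos >= len(tokens) or not ("a" <= tokens[pos] <= "z"):
            raise SyntaxError(f"expected a letter at token {pos}")
        letter = tokens[pos]
        if pos + 1 < len(tokens) and tokens[pos + 1] == "=":
            # right -> letter = right: everything after '=' is parsed first.
            rest, end = right(pos + 2)
            return ("=", letter, rest), end
        return letter, pos + 1                 # right -> letter

    tree, end = right(0)
    if end != len(tokens):
        raise SyntaxError(f"unexpected input at token {end}")
    return tree


print(parse_right("a=b=c"))
```

The result groups the string as a = (b = c): the operand b is taken by the = on its right.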
Precedence
• The string a+5*2 has two possible interpretations because of two different parse trees, corresponding to (a+5)*2 and a+(5*2)
• Precedence determines the correct interpretation
• Next, an example of how precedence rules are encoded in a grammar
Precedence/Associativity in the Grammar for Arithmetic Expressions
• Ambiguous
E → E + E
| E * E
| ( E )
| num | id
Example strings: 3+2+5, 3+2*5
• Unambiguous, with precedence and associativity rules honored
E → E + T | T
T → T * F | F
F → ( E ) | num | id
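The layering E, T, F of the unambiguous grammar maps onto one function per non terminal. The Python sketch below is an illustrative evaluator, not a definitive implementation: the name evaluate_expr is an assumption, the alternative F → id is left out because identifiers have no numeric value here, and the left-recursive productions are written as loops, which keeps + and * left associative.

```python
import re


def evaluate_expr(text: str) -> int:
    """Evaluate text using E -> E + T | T, T -> T * F | F, F -> ( E ) | num."""
    tokens = re.findall(r"\d+|\S", text)       # numbers, or single other characters
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_E():
        value = parse_T()
        while peek() == "+":                   # E -> E + T, as a loop
            advance()
            value += parse_T()
        return value

    def parse_T():
        value = parse_F()
        while peek() == "*":                   # T -> T * F, as a loop
            advance()
            value *= parse_F()
        return value

    def parse_F():
        tok = peek()
        if tok == "(":                         # F -> ( E )
            advance()
            value = parse_E()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isascii() and tok.isdigit():
            advance()                          # F -> num
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate_expr("3+2*5"), evaluate_expr("3+2+5"), evaluate_expr("(3+2)*5"))
```

Because * is handled one level below +, 3+2*5 evaluates to 13 rather than 25.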
Parsing
• The process of determining whether a string can be generated by a grammar
• Parsing falls into two categories:
– Top-down parsing: construction of the parse tree starts at the root (the start symbol) and proceeds towards the leaves (tokens or terminals)
– Bottom-up parsing: construction of the parse tree starts from the leaf nodes (tokens or terminals of the grammar) and proceeds towards the root (the start symbol)
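To make the bottom-up direction concrete, the Python sketch below recognizes the grammar list → list + digit | list - digit | digit with an explicit stack. It is a hand-written illustration for this one grammar, not a general table driven parser, and the function name shift_reduce is an assumption. Tokens are shifted onto the stack and replaced by the left hand side of a production as soon as a right hand side appears on top.

```python
def shift_reduce(text: str):
    """Bottom-up recognition; returns the reductions in the order performed."""
    tokens = [ch for ch in text if not ch.isspace()]
    stack, reductions = [], []
    for tok in tokens:
        stack.append(tok)                      # shift
        if tok in "0123456789":
            stack[-1] = "digit"                # reduce by digit -> tok
            reductions.append(f"digit -> {tok}")
            if stack[-3:-1] in (["list", "+"], ["list", "-"]):
                op = stack[-2]
                stack[-3:] = ["list"]          # reduce by list -> list op digit
                reductions.append(f"list -> list {op} digit")
            elif stack == ["digit"]:
                stack[:] = ["list"]            # reduce by list -> digit
                reductions.append("list -> digit")
    if stack != ["list"]:
        raise SyntaxError("input is not in the language")
    return reductions


for step in shift_reduce("9-5+2"):
    print(step)
```

The reductions start at the leaves (digit → 9) and end with the production applied at the root (list → list + digit), the reverse of the order used by a top-down parser.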
