
Theme

The document discusses context-free grammars (CFGs) and parsing. It defines parsing as determining if a string is in the language generated by a CFG, typically by constructing a derivation tree. It describes top-down and bottom-up parsers, and how they construct derivation trees starting from the root or leaves, respectively. Examples of parse trees are provided to illustrate derivations for strings using different grammars. Practical parsers used for programming languages are discussed, including LL(k) and LR(k) parsers which enable deterministic parsing.

Uploaded by

AK Collection

CFG: Parsing

Recognition of strings in a language

• Generative aspect of CFG: By now it should be clear how, from a CFG G, you can derive strings w ∈ L(G).
• Analytical aspect: Given a CFG G and a string w, how do you decide whether w ∈ L(G) and, if so, how do you determine the derivation tree or the sequence of production rules that produces w? This is called the problem of parsing.

Parser

A program that determines if a string ω ∈ L(G) by constructing a derivation. Equivalently, it searches the graph of G.

– Top-down parsers
  • Construct the derivation tree from root to leaves.
  • Leftmost derivation.
– Bottom-up parsers
  • Construct the derivation tree from leaves to root.
  • Rightmost derivation in reverse.

Parse trees (= derivation trees)

A parse tree is a graphical representation of a derivation sequence of a sentential form.

Tree nodes represent symbols of the grammar (nonterminals or terminals) and tree edges represent derivation steps.

Parse Tree: Example

Given the following grammar:

E → E + E | E ∗ E | ( E ) | - E | id

Is the string -(id + id) a sentence in this grammar?

Yes, because there is the following derivation:

E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + id)

Parse Tree: Example 1

E → E + E | E ∗ E | ( E ) | - E | id

Let's examine this derivation:

E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + id)

[Figure: the parse tree after each derivation step, from the single node E to the complete parse tree of -(id + id).]

This is a top-down derivation because we start building the parse tree at the top.
Parse Tree: Example 2

S → SS | a | b
ab ∈ L(S)

Leftmost derivation: S ⇒ SS ⇒ aS ⇒ ab

[Figure: derivation trees for the leftmost derivation.]

Rightmost derivation: S ⇒ SS ⇒ Sb ⇒ ab

[Figure: derivation trees for the rightmost derivation, and the same trees read as a rightmost derivation in reverse.]

Example 3

Consider the CFG G:

S → A
A → T | A + T
T → b | (A)

Show that (b)+b ∈ L(G).

One derivation is S ⇒ A ⇒ A+T ⇒ T+T ⇒ (A)+T ⇒ (T)+T ⇒ (b)+T ⇒ (b)+b.

[Figure: the parse tree of (b)+b built up step by step.]

Practical Parsers

• Language/grammar designed to enable deterministic (directed and backtrack-free) searches.
– Top-down parsers: LL(k) languages
  • E.g., Pascal, Ada, etc.
  • Better error diagnosis and recovery.
– Bottom-up parsers: LALR(1), LR(k) languages
  • E.g., C/C++, Java, etc.
  • Handle left recursion in the grammar.
– Backtracking parsers
  • E.g., Prolog interpreter.
Top-down Exhaustive Parsing

• Exhaustive parsing is a form of top-down parsing where you start with S and systematically go through all possible (say leftmost) derivations until you produce the string w.
• (You can remove sentential forms that will not work.)
• Example: Can the CFG S → SS | aSb | bSa | λ produce the string w = aabb, and how?
• After one step: S ⇒ SS or aSb or bSa or λ.
• After two steps: S ⇒ SSS or aSbS or bSaS or S, or S ⇒ aSSb or aaSbb or abSab or ab.
• After three steps we see that: S ⇒ aSb ⇒ aaSbb ⇒ aabb.

Flaws of Top-down Exhaustive Parsing

• Obvious flaw: it will take a long time and a lot of memory, even for moderately long strings w. It is inefficient.
• For cases w ∉ L(G) exhaustive parsing may never end. This will especially happen if we have rules like A → λ that make the sentential forms 'shrink', so that we will never know if we went 'too far' with our parsing attempts.
• Similar problems occur if the parsing can get in a loop according to A ⇒ B ⇒ A ⇒ B…
• Fortunately, it is always possible to remove problematic rules like A → λ and A → B from a CFG G.
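The exhaustive search described above can be sketched as a breadth-first search over leftmost derivations. This is a minimal sketch, not a practical parser: the function name `exhaustive_parse`, the dictionary encoding of the grammar, and in particular the cap `max_len` on the length of sentential forms are assumptions added here so that the search terminates even with the rule S → λ. With a cap that is too small the search can miss valid derivations.

```python
from collections import deque

# Grammar from the slide: S -> SS | aSb | bSa | λ (λ is the empty string).
RULES = {"S": ["SS", "aSb", "bSa", ""]}


def exhaustive_parse(w, start="S", max_len=None):
    """Breadth-first search over leftmost derivations.

    Returns the list of sentential forms of one shortest derivation of w,
    or None if no derivation is found within the length cap.
    """
    if max_len is None:
        max_len = len(w) + 2           # assumed cap so the search terminates
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        form, path = queue.popleft()
        if form == w:
            return path
        # Position of the leftmost variable, if any.
        i = next((k for k, c in enumerate(form) if c in RULES), None)
        if i is None:
            continue                   # only terminals, but not w
        if form[:i] != w[:i]:
            continue                   # terminal prefix cannot match w: prune
        for rhs in RULES[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append((new, path + [new]))
    return None


print(exhaustive_parse("aabb"))
```

The two pruning tests correspond to "removing sentential forms that will not work"; the `seen` set avoids revisiting a form reached by a different route.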
Grammar Ambiguity

Definition: a string is derived ambiguously in a context-free grammar if it has two or more different parse trees.

Definition: a grammar is ambiguous if it generates some string ambiguously.

Definition
A string w ∈ L(G) is derived ambiguously if it has more than one derivation tree (or equivalently: if it has more than one leftmost (or rightmost) derivation).

A grammar is ambiguous if some strings are derived ambiguously.

Typical example: rule S → 0 | 1 | S+S | S×S

S ⇒ S+S ⇒ S×S+S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1
versus
S ⇒ S×S ⇒ 0×S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1
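To check a claim of ambiguity for the grammar S → 0 | 1 | S+S | S×S, one can count parse trees directly. The sketch below is specific to this grammar; the function name `count_parse_trees` is an assumption, and the recursion simply tries every operator position as the root of the tree.

```python
from functools import lru_cache


def count_parse_trees(s):
    """Number of parse trees of s under S -> 0 | 1 | S+S | S×S."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving the substring s[i:j] from S.
        if j - i == 1:
            return 1 if s[i] in "01" else 0
        total = 0
        for k in range(i + 1, j - 1):  # k is the position of the root operator
            if s[k] in "+×":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s)) if s else 0


print(count_parse_trees("0×1+1"))
print(count_parse_trees("0+1"))
```

A string is derived ambiguously exactly when this count is greater than one, regardless of how many different derivation orders lead to each tree.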

The ambiguity of 0×1+1 is shown by the two different parse trees:

[Figure: two parse trees for 0×1+1, one with + at the root and one with × at the root.]

Note that the two different derivations

S ⇒ S+S ⇒ 0+S ⇒ 0+1
and
S ⇒ S+S ⇒ S+1 ⇒ 0+1

do not make 0+1 an ambiguous string, as they have the same parse tree:

[Figure: parse tree with root S, children S + S, and leaves 0 + 1.]

Ambiguity causes trouble when trying to interpret strings like: “She likes men who love women who don't smoke.”

Solutions: use parentheses, or use precedence rules such as a+(b×c) = a+b×c ≠ (a+b)×c.

Example

<EXPR> → <EXPR> + <EXPR>
<EXPR> → <EXPR> * <EXPR>
<EXPR> → ( <EXPR> )
<EXPR> → a

Build a parse tree for a + a * a.

[Figure: two different parse trees for a + a * a.]

Inherently Ambiguous

• Languages that can only be generated by ambiguous grammars are inherently ambiguous.
• Example 5.13: L = {a^n b^n c^m} ∪ {a^n b^m c^m}, that is, L = {a^i b^j c^k | i = j ∨ j = k}.
• The way to make a CFG for this L somehow has to involve the step S → S1 | S2, where S1 produces the strings a^n b^n c^m and S2 the strings a^n b^m c^m.
• This will be ambiguous on strings a^n b^n c^n.
Example

E → E + E | E ∗ E | ( E ) | - E | id

Find a derivation for the expression: id + id ∗ id

[Figure: two derivation trees for id + id ∗ id, one with + at the root and one with ∗ at the root.]

Which derivation tree is correct?

According to the grammar, both are correct.

A grammar that produces more than one parse tree for some input sentence is said to be an ambiguous grammar.
One way to resolve ambiguity is to associate precedence to the operators.

Example
• * has precedence over +
  1 + 2 * 3 = 1 + (2 * 3)
  1 + 2 * 3 ≠ (1 + 2) * 3
• Associativity and precedence information is typically used to disambiguate non-fully parenthesized expressions containing unary prefix/postfix operators or binary infix operators.

Example Grammar:

stm → if expr then stm
    | if expr then stm else stm

Ambiguity: the string

if B1 then if B2 then S1 else S2

can be read as

if B1 then (if B2 then S1 else S2)
vs
if B1 then (if B2 then S1) else S2
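One common way to build precedence into a parser is to use one grammar level per precedence level, as in the unambiguous grammar expr → term | expr + term, term → factor | term * factor. The sketch below is a small recursive-descent parser for that layered grammar over integer operands; the function name `parse_expr`, the tuple encoding of the tree, and the restriction to integers are assumptions made for the example.

```python
import re


def parse_expr(text):
    """Parse text and return a nested-tuple tree in which * binds tighter than +."""
    tokens = re.findall(r"\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                        # expr -> term ('+' term)*
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():                        # term -> factor ('*' factor)*
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():                      # factor -> number | '(' expr ')'
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is not None and tok.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expr("1 + 2 * 3"))
print(parse_expr("(1 + 2) * 3"))
```

Because every string has exactly one tree under the layered grammar, the question "which derivation tree is correct?" no longer arises; the loops also make both operators left-associative.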

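Quiz 1

Is the following grammar ambiguous?

S → PC | AQ
P → aPb | λ
C → cC | λ
Q → bQc | λ
A → aA | λ

Yes: consider the string abc. It can be derived through S → PC (P gives ab, C gives c) and through S → AQ (A gives a, Q gives bc).

Quiz 2

Is the following grammar ambiguous? S → aS | Sb | ab

Yes: consider aabb, which has the two leftmost derivations S ⇒ aS ⇒ aSb ⇒ aabb and S ⇒ Sb ⇒ aSb ⇒ aabb. (The string ab itself has only the derivation S ⇒ ab.)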
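For small grammars such as the one in Quiz 2, ambiguity of a given string can be checked by counting its leftmost derivations, since each leftmost derivation corresponds to one parse tree. The sketch below assumes single-character symbols and a grammar without λ-rules and without unit rules, so that sentential forms never shrink and the length test is a safe pruning rule; the function name `count_leftmost_derivations` is hypothetical.

```python
def count_leftmost_derivations(rules, w, form="S"):
    """Count leftmost derivations of w starting from the sentential form `form`.

    Assumes no λ-rules and no unit rules, so a form longer than w
    can never derive w.
    """
    if len(form) > len(w):
        return 0
    # Position of the leftmost variable, if any.
    i = next((k for k, c in enumerate(form) if c in rules), None)
    if i is None:
        return 1 if form == w else 0
    if form[:i] != w[:i]:
        return 0                       # terminal prefix already disagrees with w
    return sum(
        count_leftmost_derivations(rules, w, form[:i] + rhs + form[i + 1:])
        for rhs in rules[form[i]]
    )


QUIZ2 = {"S": ["aS", "Sb", "ab"]}
print(count_leftmost_derivations(QUIZ2, "ab"))
print(count_leftmost_derivations(QUIZ2, "aabb"))
```

The grammar of Quiz 1 contains λ-rules, so it falls outside the assumptions of this sketch.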
Quiz

Is the following grammar ambiguous? S → SS | λ

Yes.

[Figure: cyclic structure S ⇒ SS ⇒ SSS ⇒ ..., with each S able to go to λ. Illustrates an ambiguous grammar with cycles.]

Simple Grammar

Definition
A CFG (V,T,S,P) is a simple grammar (s-grammar) if and only if all its productions are of the form A → ax with A ∈ V, a ∈ T, x ∈ V*, and any pair (A,a) occurs at most once.

• Note: for simple grammars a leftmost derivation of a string w ∈ L(G) is straightforward and requires time |w|.
• Example: Take the s-grammar S → aS | bSS | c with aabcc:
  S ⇒ aS ⇒ aaS ⇒ aabSS ⇒ aabcS ⇒ aabcc.

Quiz: is the grammar S → aS | bSS | aSS | c an s-grammar?
No. Why? The pair (S,a) occurs twice.
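The linear-time claim for s-grammars can be seen in a short sketch: since each pair (A, a) selects at most one production, the parser never has to choose. The encoding of the rules as a dictionary keyed by (variable, terminal) and the function name `parse_s_grammar` are assumptions for this example.

```python
def parse_s_grammar(rules, w, start="S"):
    """Decide w ∈ L(G) for an s-grammar in one left-to-right pass.

    rules maps (variable, terminal) to the string of variables x
    in the production A -> a x.
    """
    stack = [start]                    # variables still to be expanded, top at the end
    for ch in w:
        if not stack:
            return False               # input left over but nothing to expand
        var = stack.pop()
        tail = rules.get((var, ch))
        if tail is None:
            return False               # no production A -> a x for this pair
        stack.extend(reversed(tail))   # leftmost variable of x ends up on top
    return not stack                   # accept iff every variable was expanded


# The s-grammar S -> aS | bSS | c from the slide.
S_GRAMMAR = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(parse_s_grammar(S_GRAMMAR, "aabcc"))
```

Each input symbol triggers exactly one production, so the number of steps equals |w|, matching the derivation S ⇒ aS ⇒ aaS ⇒ aabSS ⇒ aabcS ⇒ aabcc.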

Normal Forms

Chomsky Normal Form
Greibach Normal Form

Chomsky Normal Form CNF

Even though we can't get every grammar into right-linear form, or in general even get rid of ambiguity, there is an especially simple form that general CFGs can be converted into.

Chomsky Normal Form

Definition 6.4: A CFG is in Chomsky normal form if and only if all production rules are of the form

A → BC
or A → x

with variables A, B, C ∈ V and x ∈ T.
(Sometimes the rule S → λ is also allowed.)

CFGs in CNF can be parsed in time O(|w|^3).

Named after Noam Chomsky, who in the 60s made seminal contributions to the field of theoretical linguistics (cf. Chomsky hierarchy of languages).

Chomsky Normal Form CNF

Noam Chomsky came up with an especially simple type of context free grammar which is able to capture all context free languages.

Chomsky's grammatical form is particularly useful when one wants to prove certain facts about context free languages. This is because assuming a much more restrictive kind of grammar can often make it easier to prove that the generated language has whatever property you are interested in.
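The O(|w|^3) parsing bound for grammars in CNF is usually realised by the CYK (Cocke-Younger-Kasami) dynamic-programming algorithm; the slides do not name the algorithm, so treat that attribution and the sketch below as an illustration rather than the lecture's own method. The example grammar is the CNF grammar for palindromes that these slides derive in "CFG → CNF: Example 1", with the start variable S' written as `S0`; the function name `cyk` and the rule encoding are assumptions.

```python
def cyk(unit_rules, pair_rules, w, start):
    """CYK membership test for a grammar in Chomsky normal form.

    unit_rules: list of (A, a) for rules A -> a
    pair_rules: list of (A, (B, C)) for rules A -> BC
    The optional rule S -> λ is not handled here; the empty string
    must be checked separately by the caller.
    """
    n = len(w)
    if n == 0:
        return False
    # table[i][l] = set of variables that derive the substring w[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = {a for a, t in unit_rules if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for k in range(1, length):             # split into w[i:i+k], w[i+k:i+length]
                left, right = table[i][k], table[i + k][length - k]
                for a, (b, c) in pair_rules:
                    if b in left and c in right:
                        table[i][length].add(a)
    return start in table[0][n]


# CNF grammar for palindromes over {a, b} (S0 stands for S').
UNIT = [("S0", "a"), ("S0", "b"), ("S", "a"), ("S", "b"), ("A", "a"), ("C", "b")]
PAIR = [(v, rhs) for v in ("S0", "S")
        for rhs in (("A", "B"), ("C", "D"), ("A", "A"), ("C", "C"))]
PAIR += [("B", ("S", "A")), ("D", ("S", "C"))]

print(cyk(UNIT, PAIR, "abba", "S0"))
```

The three nested loops over length, start position and split point give the cubic dependence on |w|; the inner loop over the rules contributes a factor that depends only on the grammar.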
Significance of CNF

• Length of a derivation of a string of length n in CNF = 2n − 1
  (cf. the number of nodes of a strictly binary tree with n leaves)
• Maximum depth of a parse tree = n
• Minimum depth of a parse tree = ⌈log2 n⌉ + 1

Chomsky Normal Form CNF

A CFG is said to be in Chomsky Normal Form if every rule in the grammar has one of the following forms:

A → BC (dyadic variable productions)
A → a (unit terminal productions)
S → λ (for the sake of the empty string only)

where B, C ∈ V − {S}, S is the start variable, A, B, C are variables and a is a terminal.

Thus λ may only appear on the right hand side of a rule for the start symbol, and every other right hand side is either two variables or a single terminal.

CFG → CNF

• Theorem: There is an algorithm to construct a grammar G' in CNF that is equivalent to a CFG G.

CFG → CNF: Construction

• Obtain an equivalent grammar that does not contain λ-rules, chain rules, and useless variables.
• Apply the following conversion to rules of the form A → bBcC:

A → PQ
P → b
Q → BR
R → WC
W → c
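The conversion of a long rule such as A → bBcC can be sketched as two passes: first replace each terminal by a fresh variable that produces it, then chain the resulting variables into rules with exactly two symbols on the right. The names `make_fresh` and `to_dyadic` and the generated variable names X1, X2, ... are hypothetical; the slide uses P, Q, R, W for the same purpose. The sketch assumes the right-hand side has at least two symbols.

```python
from itertools import count


def make_fresh(prefix="X"):
    """Return a function that yields fresh variable names X1, X2, ..."""
    counter = count(1)
    return lambda: f"{prefix}{next(counter)}"


def to_dyadic(lhs, rhs, terminals, fresh):
    """Split lhs -> rhs (len(rhs) >= 2) into rules of the forms A -> BC and A -> a."""
    rules, symbols = [], []
    # Pass 1: replace every terminal by a fresh variable that produces it.
    for sym in rhs:
        if sym in terminals:
            var = fresh()
            rules.append((var, [sym]))
            symbols.append(var)
        else:
            symbols.append(sym)
    # Pass 2: chain the variables so every right-hand side has two symbols.
    head = lhs
    while len(symbols) > 2:
        var = fresh()
        rules.append((head, [symbols[0], var]))
        head, symbols = var, symbols[1:]
    rules.append((head, symbols))
    return rules


for left, right in to_dyadic("A", ["b", "B", "c", "C"], {"b", "c"}, make_fresh()):
    print(left, "->", " ".join(right))
```

With X1, X3, X4, X2 read as P, Q, R, W, the output has the same shape as the rules A → PQ, P → b, Q → BR, R → WC, W → c on the slide.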

CFG → CNF: Construction

Converting a general grammar into Chomsky Normal Form works in four steps:

1. Ensure that the start variable doesn't appear on the right hand side of any rule.
2. Remove all λ-rules, except from the start variable.
3. Remove unit variable productions of the form A → B, where A and B are variables.
4. Add variables and dyadic variable rules to replace any longer non-dyadic or non-variable productions.

CFG → CNF: Example 1

Let's see how this works on the following example grammar:

S → λ | a | b | aSa | bSb
CFG → CNF: Example 1

1. Start Variable

Ensure that the start variable doesn't appear on the right hand side of any rule.

S' → S
S → λ | a | b | aSa | bSb

2. Remove λ-rules

Remove all epsilon productions, except from the start variable. The rule S → λ is dropped; S' → λ and the rules S → aa | bb are added.

S' → S | λ
S → a | b | aSa | bSb | aa | bb
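Step 2 can be sketched as follows: find the nullable variables (those that can derive λ), then for every rule add each variant obtained by deleting some of the nullable occurrences. This is a minimal sketch assuming single-character symbols; the name `remove_lambda_rules` is hypothetical, trivial rules of the form A → A are dropped, and re-adding λ for the start variable is left to the caller.

```python
from itertools import product


def remove_lambda_rules(rules):
    """rules: dict variable -> set of right-hand sides (strings, "" is λ).

    Returns (new_rules, nullable), where new_rules has no λ-rules.
    """
    # Fixed-point computation of the nullable variables.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, rhss in rules.items():
            if var not in nullable and any(
                all(sym in nullable for sym in rhs) for rhs in rhss
            ):
                nullable.add(var)
                changed = True
    # For each rule, keep or delete every nullable occurrence independently.
    new_rules = {}
    for var, rhss in rules.items():
        out = set()
        for rhs in rhss:
            options = [(sym, "") if sym in nullable else (sym,) for sym in rhs]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate and candidate != var:   # skip λ and trivial A -> A
                    out.add(candidate)
        new_rules[var] = out
    return new_rules, nullable


example1 = {"S": {"", "a", "b", "aSa", "bSb"}}
print(remove_lambda_rules(example1))
```

On the grammar S → λ | a | b | aSa | bSb this yields S → a | b | aSa | bSb | aa | bb with S nullable, which is why the new start variable then needs its own λ-rule.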

CFG → CNF: Example 1

3. Remove variable units

Remove unit variable productions of the form A → B. Here S' → S is replaced by the right hand sides of S.

S' → λ | a | b | aSa | bSb | aa | bb
S → a | b | aSa | bSb | aa | bb

4. Longer production rules

Add variables and dyadic variable rules to replace any longer productions. Here aSa, bSb, aa and bb are replaced by AB, CD, AA and CC.

S' → λ | a | b | AB | CD | AA | CC
S → a | b | AB | CD | AA | CC
A → a
B → SA
C → b
D → SC
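Step 3 can be sketched by computing, for every variable A, the set of variables reachable from A through unit rules alone, and then giving A all non-unit right-hand sides of those variables. The encoding of right-hand sides as tuples of symbols and the name `remove_unit_rules` are assumptions for this example.

```python
def remove_unit_rules(rules):
    """rules: dict variable -> set of right-hand sides, each a tuple of symbols.

    Returns an equivalent rule set without rules of the form A -> B.
    """
    variables = set(rules)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in variables

    new_rules = {}
    for a in variables:
        # Variables reachable from a using unit rules only (including a itself).
        reach, stack = {a}, [a]
        while stack:
            x = stack.pop()
            for rhs in rules[x]:
                if is_unit(rhs) and rhs[0] not in reach:
                    reach.add(rhs[0])
                    stack.append(rhs[0])
        # a inherits every non-unit right-hand side of the reachable variables.
        new_rules[a] = {
            rhs for b in reach for rhs in rules[b] if not is_unit(rhs)
        }
    return new_rules


S_RHS = {("a",), ("b",), ("a", "S", "a"), ("b", "S", "b"), ("a", "a"), ("b", "b")}
step2 = {"S'": {("S",), ()}, "S": set(S_RHS)}    # () encodes λ
print(remove_unit_rules(step2))
```

Applied to the grammar after step 2 of Example 1, S' receives every right-hand side of S and keeps its own λ-rule.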

CFG → CNF: Example 1

5. Result

CFG:
S → λ | a | b | aSa | bSb

CNF:
S' → λ | a | b | AB | CD | AA | CC
S → a | b | AB | CD | AA | CC
A → a
B → SA
C → b
D → SC

CFG → CNF: Example 2

1. Add a new start variable S0 and add the rule S0 → S.
2. Remove all A → λ rules (where A is not S0). For each occurrence of A on the right hand side of a rule, add a new rule with the occurrence deleted. If we have the rule B → A, add B → λ, unless we have previously removed B → λ.
3. Remove unit rules A → B. Whenever B → w appears, add the rule A → w, unless this was a unit rule previously removed.

Rules shown alongside these steps:

S0 → S
S → 0S1
S → T#T
S → T
T → λ
S → T#
S → #T
S → #
S → λ
S → 01
S0 → 0S1
S0 → λ
CFG → CNF: Example 2

4. Convert all remaining rules into the proper form.

Rules after steps 1 to 3:

S0 → λ
S0 → 0S1
S0 → T#T
S0 → T#
S0 → #T
S0 → #
S0 → 01
S → 0S1
S → T#T
S → T#
S → #T
S → #
S → 01

S0 → 0S1 becomes:

S0 → A1A2
A1 → 0
A2 → SA3
A3 → 1

S0 → T# becomes:

S0 → TA4
A4 → #

CFG → CNF: Example 2

Convert the following into Chomsky normal form:

A → BAB | B | λ
B → 00 | λ

With a new start variable:

S0 → A
A → BAB | B | λ
B → 00 | λ

After removing the λ-rules:

S0 → A | λ
A → BAB | B | BB | AB | BA
B → 00

After removing the unit rules:

S0 → BAB | 00 | BB | AB | BA | λ
A → BAB | 00 | BB | AB | BA
B → 00

Exercise

• Write into Chomsky Normal Form the CFG:

S → aA | aBB
A → aaA | λ
B → bC | bbC
C → B

Answer

• Answer (1): First you remove the λ-productions (A ⇒ λ):

S → aA | aBB | a
A → aaA | aa
B → bC | bbC
C → B

Answer

• Answer (2): Next you remove the unit-productions from:

S → aA | aBB | a
A → aaA | aa
B → bC | bbC
C → B

• Removing C → B, we have to include the C ⇒* B possibility, which can be done by substitution and gives:

S → aA | aBB | a
A → aaA | aa
B → bC | bbC
C → bC | bbC

Answer

• Answer (3): Next, we determine the useless variables in

S → aA | aBB | a
A → aaA | aa
B → bC | bbC
C → bC | bbC

The variables B and C cannot terminate and are therefore useless. So, removing B and C gives:

S → aA | a
A → aaA | aa
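Whether a variable "can terminate" (derive some string of terminals) can be decided by a fixed-point computation: a variable is generating if one of its right-hand sides consists only of terminals and variables already known to be generating. The sketch below assumes single-character symbols with uppercase letters as variables; the function name `generating_variables` is hypothetical.

```python
def generating_variables(rules):
    """Return the variables that can derive a string of terminals.

    rules: dict variable -> list of right-hand sides (strings);
    uppercase letters are variables, everything else is a terminal.
    """
    generating = set()
    changed = True
    while changed:
        changed = False
        for var, rhss in rules.items():
            if var in generating:
                continue
            if any(
                all(not sym.isupper() or sym in generating for sym in rhs)
                for rhs in rhss
            ):
                generating.add(var)
                changed = True
    return generating


EXERCISE = {
    "S": ["aA", "aBB", "a"],
    "A": ["aaA", "aa"],
    "B": ["bC", "bbC"],
    "C": ["bC", "bbC"],
}
print(sorted(generating_variables(EXERCISE)))
```

For the exercise grammar only S and A are generating, so every rule mentioning B or C can be dropped, leaving S → aA | a and A → aaA | aa.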
Answer

Answer (4): To make the CFG in Chomsky normal form, we have to introduce terminal producing variables for

S → aA | a
A → aaA | aa,

• which gives

S → XaA | a
A → XaXaA | XaXa
Xa → a.

Answer

Answer (5): Finally, we have to 'chain' the variables in

S → XaA | a
A → XaXaA | XaXa
Xa → a,

• which gives

S → XaA | a
A → XaA2 | XaXa
A2 → XaA
Xa → a.

Greibach Normal Form GNF

• A CFG is in Greibach Normal Form if each rule is of the form

A → aA1A2...An
A → a
S → λ

where Ai ∈ V − {S}.

Greibach Normal Form GNF

• The size of the equivalent GNF can be large compared to the original grammar.
• The next example CFG has 5 rules, but the corresponding GNF has 24 rules!
• Length of the derivation in GNF = length of the string.
• GNF is useful in relating CFGs (“generators”) to pushdown automata (“recognizers”/“acceptors”).

CFG → GNF

• Theorem: There is an algorithm to construct a grammar G' in GNF that is equivalent to a CFG G.

CFG → GNF: Example

A → BC
B → CA | b
C → AB | a

Substituting for A in the C rule:

A → BC
B → CA | b
C → BCB | a

Substituting for B in the C rule:

C → CACB | bCB | a

Removing the left recursion in C with a new variable R:

C → ( bCB | a ) R | bCB | a
R → ACBR | ACB

Substituting back:

C → bCBR | aR | bCB | a
B → bCBRA | aRA | bCBA | aA | b
A → bCBRAC | aRAC | bCBAC | aAC | bC
R → (bCBRAC | ... | bC)(CBR | CB)
Context Sensitive Grammar

An even more general form of grammars exists. In general, a non-context free grammar is one in which whole mixed variable/terminal substrings are replaced at a time. For example, with Σ = {a,b,c} consider:

S → λ | ASBC
A → a
CB → BC
aB → ab
bB → bb
bC → bc
cC → cc

For technical reasons, when the length of the LHS is always ≤ the length of the RHS, these general grammars are called context sensitive.

Context Sensitive Grammar (CSG)

Example
Find the language generated by the CSG:

S → λ | ASBC
A → a
CB → BC
aB → ab
bB → bb
bC → bc
cC → cc

Context Sensitive Grammar (CSG)

Example

Answer is {a^n b^n c^n}.

In a future class we'll see that this language is not context free. Thus perturbing context free-ness by allowing context sensitive productions expands the class.

Relations between Grammars

So far we studied 3 grammars:

1. Regular Grammars (RG)
2. Context Free Grammars (CFG)
3. Context Sensitive Grammars (CSG)

The relation between these 3 grammars is as follows:

[Figure: nested sets, RG inside CFG inside CSG.]
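The answer {a^n b^n c^n} can be checked experimentally for short strings by searching the rewriting system directly. Because every rule except S → λ keeps or increases the length of the sentential form, a breadth-first search that discards forms longer than |w| + 1 visits only finitely many forms. The sketch below relies on that bound; the function name `csg_generates` and the rule encoding are assumptions, and the search is exponential in general, so it is only meant for small inputs.

```python
from collections import deque

# The CSG from the slide; "" encodes λ.
CSG_RULES = [
    ("S", ""), ("S", "ASBC"), ("A", "a"), ("CB", "BC"),
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
]


def csg_generates(w, rules=CSG_RULES, start="S"):
    """Breadth-first search over sentential forms of length <= len(w) + 1."""
    limit = len(w) + 1                 # the single S can still vanish via S -> λ
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for lhs, rhs in rules:
            idx = form.find(lhs)
            while idx != -1:           # try the rule at every position
                new = form[:idx] + rhs + form[idx + len(lhs):]
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    queue.append(new)
                idx = form.find(lhs, idx + 1)
    return False


for word in ["", "abc", "aabbcc", "abcabc", "aabbc"]:
    print(repr(word), csg_generates(word))
```

The rule CB → BC sorts the B's in front of the C's, and the remaining rules only turn a variable into a terminal when its left neighbour is already the right terminal, which is what forces the shape a^n b^n c^n.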

Grammar Applications

Programming Languages

Programming languages are often defined as Context Free Grammars in Backus-Naur Form (BNF).

Example:
<if_statement> ::= IF <expression><then_clause><else_clause>
<expression> ::= <term> | <expression>+<term>
<term> ::= <factor>|<term>*<factor>

The variables are indicated by <a variable name>.
The arrow → is replaced by ::=.
Here, IF, + and * are terminals.

“Syntax checking” is checking whether a program is an element of the language generated by the CFG of the programming language.

Compiler Syntax Analysis

Compiler: Source Program → Scanner → Parser → Semantic Analy. → Inter. Code Gen. → Optimizer → Code Generation → Target Program

The Scanner and Parser are the part of the compiler that uses the grammar.
Applications of CFG

Parsing is where we use the theory of CFGs.

The theory is especially relevant when dealing with Extensible Markup Language (XML) files and their corresponding Document Type Definitions (DTDs).

Document Type Definitions define the grammar that the XML files have to adhere to. Validating an XML file equals parsing it against the grammar of the DTD.

The nondeterminism of NPDAs can make parsing slow.

What about deterministic PDAs?