
Week 10 (Part A) Context Free Grammar Trees Ambiguous Grammer Plus Additional Slides

The document discusses theory of automata including context free grammars, trees, Lukasiewicz notation, ambiguity, total language tree, semi-words, and regular grammars. Specifically, it provides examples of derivation trees to represent context free grammar productions and the use of trees to represent parsing and avoid ambiguity. It also describes how Lukasiewicz notation can be used to represent arithmetic expressions without parentheses by traversing the parse tree. Finally, it discusses how any finite automaton can be represented by a regular grammar in context free form and how regular grammars are a subset of context free grammars where productions are of the form non-terminal -> semi-word or non-terminal -> word.

Uploaded by

AbdulJaleelMemon
Copyright
© © All Rights Reserved

Theory of Automata

Context Free Grammars


Dr Waseem Shehzad, Mr. Hafiz Tayyeb Javed
Mehreen Alam
Week 10
Trees
• Consider the following CFG:
S → AA
A → AAA|bA|Ab|a
• The derivation of the word bbaaaab is as follows:

S ⇒ AA ⇒ bAAAA ⇒ bbAaaAb ⇒ bbaaaab

• We can use a tree diagram to show that derivation process:
We start with the symbol S. Every time we use a production to replace a non-
terminal by a string, we draw downward lines from the non-terminal to
EACH character in the string.

• Reading from left to right produces the word bbaaaab.


• Tree diagrams are also called syntax trees, parse trees, generation trees,
production trees, or derivation trees.

Lukasiewicz Notation - Example
• Also called Polish prefix notation.
• A parenthesis-free notation.
• Consider the following CFG for a simplified version of arithmetic
expressions:
S → S + S | S * S | number
where the only non-terminal is S, and the terminals are number together
with the symbols + and *.

• Obviously, the expression 3 + 4 * 5 is a word in the language defined by
this CFG; however, it is ambiguous since it is not clear whether it means (3
+ 4) * 5 (which is 35), or 3 + (4 * 5) (which is 23).
• To avoid ambiguity, we often need to use parentheses, or adopt the
convention of “hierarchy of operators” (i.e., * is to be executed before +).
• We now present a new notation that is unambiguous but does not rely on
operator hierarchy or on the use of parentheses.

New Notation: Lukasiewicz notation
• We can now construct a new notation for arithmetic
expressions:

– We walk around the tree and write down symbols, once each, as
we encounter them.

– We begin on the left side of the start symbol S and head south.

– As we walk around the tree, we always keep our left hand on the
tree.

• Using the algorithm above, the first derivation tree is converted into
the notation: + 3 * 4 5.
• The second derivation tree is converted into * + 3 4 5.
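
Writing each symbol the first time it is met while keeping the left hand on the tree amounts to a preorder traversal. The following is a minimal Python sketch of that walk, assuming the two derivation trees are simplified to operator trees stored as nested tuples; the tuple encoding and the name `to_prefix` are illustrative and do not come from the slides.

```python
def to_prefix(tree):
    """Preorder walk: write the operator first, then the left and right subtrees."""
    if not isinstance(tree, tuple):      # a leaf holds a number
        return [str(tree)]
    op, left, right = tree
    return [op] + to_prefix(left) + to_prefix(right)

# Tree for 3 + (4 * 5): + is applied last, so it is the root.
plus_at_root = ("+", 3, ("*", 4, 5))
# Tree for (3 + 4) * 5: * is applied last, so it is the root.
times_at_root = ("*", ("+", 3, 4), 5)

print(" ".join(to_prefix(plus_at_root)))   # + 3 * 4 5
print(" ".join(to_prefix(times_at_root)))  # * + 3 4 5
```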

Example
• Consider the expression: + 3 * 4 5:

• Consider the second expression: * + 3 4 5:

Example
• Convert the following arithmetic expression into operator prefix
notation:

((1 + 2) * (3 + 4) + 5) * 6.

• This normal notation is called operator infix notation, with which
we need parentheses to avoid ambiguity.

• Let us draw the derivation tree:

• Reading around the tree gives the equivalent prefix
notation expression:

* + * + 1 2 + 3 4 5 6.

Evaluate the String
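
A prefix string can be evaluated without parentheses because each operator is followed by exactly two operands. The sketch below is one possible way to do this in Python, assuming tokens are separated by spaces and only + and * occur; the function names are illustrative.

```python
def eval_prefix(tokens):
    """Evaluate a prefix expression over + and *, consuming tokens left to right."""
    token = next(tokens)
    if token == "+":
        return eval_prefix(tokens) + eval_prefix(tokens)
    if token == "*":
        return eval_prefix(tokens) * eval_prefix(tokens)
    return int(token)                     # anything else is taken to be a number

def evaluate(expression):
    return eval_prefix(iter(expression.split()))

print(evaluate("+ 3 * 4 5"))              # 23
print(evaluate("* + 3 4 5"))              # 35
print(evaluate("* + * + 1 2 + 3 4 5 6"))  # 156
```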

• This operator prefix notation was invented by Lukasiewicz (1878 -
1956) and is often called Polish notation.

• There is a similar operator postfix notation (also called reverse Polish
notation), in which the operation symbols (+, -, ...) come after the
operands. This can be derived by tracing around the tree on the
other side, keeping our right hand on the tree, and then reversing
the resultant string.
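
The right-hand walk can be sketched in the same style as a mirror-image preorder traversal whose output is then reversed. This is only an illustration under the assumption that the tree is stored as nested tuples; note that the list of symbols is reversed, not the characters, so multi-digit numbers would stay intact.

```python
def right_hand_walk(tree):
    """Mirror-image walk: write the operator, then the RIGHT subtree, then the left."""
    if not isinstance(tree, tuple):       # a leaf holds a number
        return [str(tree)]
    op, left, right = tree
    return [op] + right_hand_walk(right) + right_hand_walk(left)

def to_postfix(tree):
    """Reverse the right-hand walk to obtain postfix notation."""
    return list(reversed(right_hand_walk(tree)))

tree = ("+", 3, ("*", 4, 5))              # 3 + (4 * 5)
print(" ".join(right_hand_walk(tree)))    # + * 5 4 3
print(" ".join(to_postfix(tree)))         # 3 4 5 * +
```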

• Both these methods of notation are useful for computer science:
compilers often convert infix to prefix and then to assembler code.

Ambiguity- example
• Consider the language generated by the following CFG:
S → AB
A → a
B → b
• There are two derivations of the word ab:

S ⇒ AB ⇒ aB ⇒ ab
or
S ⇒ AB ⇒ Ab ⇒ ab

• However, these two derivations correspond to the same syntax
tree: S has the children A and B, A has the child a, and B has the child b.

• The word ab is therefore not ambiguous. In general, when all the
possible derivation trees are the same for a given word, then the
word is unambiguous.

Ambiguity - Definition

A CFG is called ambiguous if for at least one word in the language
that it generates, there are two possible derivations of the word
that correspond to different syntax trees. If a CFG is not ambiguous,
it is called unambiguous.

Example
• The following CFG defines the language of all non-null strings of a’s:
S → aS | Sa | a
• The word a^3 (that is, aaa) can be generated by 4 different trees:
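
The number of derivation trees can be checked by brute force. The sketch below counts trees by trying every way of splitting the word among the symbols of each right-hand side. It assumes a grammar with no Λ-productions and no productions of the form non-terminal → non-terminal (otherwise the recursion would not terminate); the dictionary encoding and the function names are illustrative.

```python
from functools import lru_cache

# Hypothetical encoding: each non-terminal maps to its list of right-hand sides.
GRAMMAR = {"S": ["aS", "Sa", "a"]}

def splits(word, parts):
    """Yield every way to cut `word` into `parts` non-empty pieces."""
    if parts == 1:
        if word:
            yield (word,)
        return
    for i in range(1, len(word) - parts + 2):
        for rest in splits(word[i:], parts - 1):
            yield (word[:i],) + rest

@lru_cache(maxsize=None)
def count_trees(symbol, word):
    """Number of distinct derivation trees with root `symbol` and yield `word`."""
    if symbol not in GRAMMAR:                  # a terminal matches only itself
        return 1 if symbol == word else 0
    total = 0
    for rhs in GRAMMAR[symbol]:
        for pieces in splits(word, len(rhs)):
            ways = 1
            for sym, piece in zip(rhs, pieces):
                ways *= count_trees(sym, piece)
            total += ways
    return total

print(count_trees("S", "aaa"))  # 4
```

Replacing the right-hand sides by `["aS", "a"]` in this sketch gives a count of 1 for the same word, in line with the unambiguous grammar S → aS | a.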

Example
• The CFG S → aS | a is not ambiguous, as neither the word aaa nor any
other word can be derived from more than one production tree.
The derivation tree for aaa is as follows: S ⇒ aS ⇒ aaS ⇒ aaa, where each S
has the children a and S, except the lowest S, whose only child is a.

The Total Language Tree
• It is possible to depict the generation of all the words in the
language of a CFG simultaneously in one big (possibly infinite) tree.
Definition:
• For a given CFG, we define a tree with the start symbol S as its root
and whose nodes are working strings of terminals and non-
terminals. The descendants of each node are all the possible results
of applying every applicable production to the working string, one
at a time. A string of all terminals is a terminal node in the tree. The
resultant tree is called the total language tree of the CFG.

Example
• Consider the CFG:
S → aa | bX | aXX
X → ab | b
• The total language tree is

• The above total language tree has only 7 different words.
• Four of its words (abb, aabb, abab, aabab) have two different
derivations because they appear as terminal nodes in two different
places.
• However, these words are NOT generated by two different derivation
trees: the two derivations differ only in which X of aXX is replaced first.
Hence, the CFG is unambiguous.
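
The counts above can be reproduced by expanding the total language tree mechanically. The following Python sketch assumes the tree is finite (it would not terminate otherwise) and that upper-case letters are the non-terminals; the encoding and the names `children` and `terminal_nodes` are illustrative.

```python
from collections import Counter

# Hypothetical encoding of S → aa | bX | aXX and X → ab | b.
GRAMMAR = {"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}

def children(working):
    """All strings obtained by applying one production to one non-terminal."""
    result = []
    for i, symbol in enumerate(working):
        for rhs in GRAMMAR.get(symbol, []):
            result.append(working[:i] + rhs + working[i + 1:])
    return result

def terminal_nodes(root="S"):
    """Walk the (finite) total language tree and count each terminal node."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children(node)
        if kids:
            stack.extend(kids)
        else:
            counts[node] += 1      # no non-terminal left: a word of the language
    return counts

counts = terminal_nodes()
print(sorted(counts))                                  # the 7 distinct words
print(sorted(w for w, n in counts.items() if n == 2))  # the words reached twice
```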

Example
• Consider the CFG:
S → aSb | bS | a
• The language of this CFG is infinite, and so is the total language tree:
the tree may get arbitrarily wide as well as infinitely long.
• Can you draw the beginning part of this total language tree?

Semi Word
• For a given CFG, a semi-word is a string of terminals (possibly none)
concatenated with exactly one non-terminal (on the right).

• In general, a semi-word has the shape

(terminal) (terminal) ... (terminal) (Non-Terminal)

e.g. aaaX, abcY, bbY

• A word is a string of terminals only (zero or more terminals).

Regular Grammar
Given an FA, there is a CFG that generates exactly the language
accepted by the FA.

– In other words, all regular languages are CFLs.

[Figure: the regular languages drawn as a region inside the CFLs]

Creating a CFG from an FA
Step-1 The non-terminals in the CFG will be all the names of the states in the
FA, with the start state renamed S.
Step-2 For every a-edge from state X to state Y, create the production
X → aY; for every a-loop at state X, create the production X → aX.
Do the same for b-edges.
Step-3 For every final state X, create the production
X → Λ

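Example
[Figure: FA with start state S (−), state M, and final state F (+); S goes to M on a and loops on b, M goes to F on a and back to S on b, F loops on a, b]

S → aM
S → bS
M → aF
M → bS
F → aF
F → bF
F → Λ

Note: It is not necessary that each CFG has a corresponding FA. But each FA
has an equivalent CFG.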
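
The three steps can be sketched in a few lines of Python. This is an illustration only: the FA is assumed to be given as a transition dictionary whose start state is already named S, and the name `fa_to_cfg` is hypothetical.

```python
def fa_to_cfg(transitions, final_states):
    """Build productions from an FA whose start state is already named S.

    transitions: dict mapping (state, letter) -> next state
    final_states: set of accepting states
    """
    productions = []
    for (state, letter), target in transitions.items():
        productions.append((state, letter + target))   # Step-2: X → aY
    for state in sorted(final_states):
        productions.append((state, "Λ"))                # Step-3: X → Λ
    return productions

# The example FA: S is the start state, F the only final state.
transitions = {
    ("S", "a"): "M", ("S", "b"): "S",
    ("M", "a"): "F", ("M", "b"): "S",
    ("F", "a"): "F", ("F", "b"): "F",
}
for left, right in fa_to_cfg(transitions, {"F"}):
    print(f"{left} → {right}")
```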
Regular Grammar
Theorem 22:
If all the productions in a given CFG fit one of the two forms:
Non-terminal → semiword
or Non-terminal → word
(where the word may be Λ or a string of terminals), then the language
generated by the CFG is regular.
Proof:
The proof is by constructing a TG from the given CFG.

Proof contd.
• Let us consider a general CFG in this form
N1 → w1N2        N7 → w10
N1 → w2N3        N18 → w23
N2 → w3N4        ...
...              ...
where the N’s are non-terminals, the w’s are strings of terminals, and the parts
wyNz are semiwords.
Let N1 = S. Draw a small circle for each N and one extra circle labelled +;
the circle for S we label −.

[Figure: circles labelled −S, N2, N3, ..., N13, ..., Nx, ... and +]

Proof contd.
• For each production of the form Nx → wyNz, draw a directed edge from
state Nx to Nz with label wy.
• If Nx = Nz, the path is a loop.
• For every production of the form Np → wq, draw a directed edge from
Np to + and label it with wq, even if wq = Λ.
• Any path in the TG from − to + corresponds to a word in the language of the TG
(by concatenating labels) and simultaneously corresponds to a
sequence of productions in the CFG generating that word.
• Conversely, every derivation of a word in the CFG:
S ⇒ wN ⇒ wwN ⇒ wwwN ⇒ ... ⇒ wwwww
corresponds to a path in this TG.

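
The construction in the proof can be sketched as follows. The code assumes that non-terminals are single characters, that Λ is written as the empty string, and that the start state is named S; `grammar_to_tg` and `accepts` are illustrative names. The grammar used is the one for EVEN-EVEN with productions S → aaS | bbS | abX | baX | Λ and X → aaX | bbX | abS | baS.

```python
def grammar_to_tg(productions):
    """Turn productions (left, right) into TG edges (from_state, label, to_state)."""
    non_terminals = {left for left, _ in productions}
    edges = []
    for left, right in productions:
        if right and right[-1] in non_terminals:      # Nx → wy Nz (semiword)
            edges.append((left, right[:-1], right[-1]))
        else:                                         # Np → wq (word, maybe Λ)
            edges.append((left, right, "+"))
    return edges

def accepts(edges, word, state="S", seen=None):
    """Search for a path from `state` to + whose labels spell out `word`."""
    if seen is None:
        seen = set()
    if (state, word) in seen:          # already explored: avoids endless Λ-loops
        return False
    seen.add((state, word))
    if state == "+":
        return word == ""
    return any(word.startswith(label) and accepts(edges, word[len(label):], target, seen)
               for source, label, target in edges if source == state)

productions = [("S", "aaS"), ("S", "bbS"), ("S", "abX"), ("S", "baX"), ("S", ""),
               ("X", "aaX"), ("X", "bbX"), ("X", "abS"), ("X", "baS")]
edges = grammar_to_tg(productions)
print(accepts(edges, "abba"))  # True
print(accepts(edges, "aab"))   # False
```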
Example
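• Consider the CFG S → aaS | bbS | Λ
[Figure: TG with a single state − that has loops labelled aa and bb, and a Λ-edge to +]

• The regular expression is given by (aa + bb)*.

• Consider the CFG
S → aaS | bbS | abX | baX | Λ
X → aaX | bbX | abS | baS
[Figure: TG with states − and X, each with loops labelled aa, bb; edges labelled ab, ba in both directions between − and X; and a Λ-edge from − to +]
• Language accepted? EVEN-EVEN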
Context-Free Languages
Additional Stuff

Prof. Busch - LSU


Context-Free Languages

[Figure: the regular languages, such as a*b* and (a + b)*, drawn as a region inside the context-free languages, such as {a^n b^n : n ≥ 0} and {ww^R}]



Context-Free Languages

Context-Free Grammars        Pushdown Automata

[Figure: a pushdown automaton, drawn as an automaton attached to a stack]



Context-Free Grammars



Another Example
Grammar: S → aSb
         S → λ

The left side of each production is a variable. The right side is a sequence of
terminals and variables, and may be λ.



Grammar: S → aSb
         S → λ

Derivation of string ab:
S ⇒ aSb ⇒ ab
(the first step uses S → aSb, the second uses S → λ)

Grammar: S → aSb
         S → λ

Derivation of string aabb:
S ⇒ aSb ⇒ aaSbb ⇒ aabb
(the first two steps use S → aSb, the last uses S → λ)

Grammar: S → aSb
         S → λ

Other derivations:

S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb

S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb
  ⇒ aaaaSbbbb ⇒ aaaabbbb

Grammar: S → aSb
         S → λ

Language of the grammar:

L = {a^n b^n : n ≥ 0}
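
Every derivation in this grammar applies S → aSb some number of times and then S → λ once. The short Python sketch below replays such a derivation for a chosen n; the function name `derive` is illustrative.

```python
def derive(n):
    """Apply S → aSb n times, then S → λ; return every sentential form."""
    steps = ["S"]
    for _ in range(n):
        steps.append(steps[-1].replace("S", "aSb"))
    steps.append(steps[-1].replace("S", ""))      # S → λ
    return steps

print(" ⇒ ".join(derive(3)))
# S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
```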


A Convenient Notation
We write: S ⇒* aaabbb
(⇒* stands for zero or more derivation steps)

Instead of:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb



In general we write: w1 ⇒* wn

if w1 ⇒ w2 ⇒ w3 ⇒ ... ⇒ wn
in zero or more derivation steps.

Trivially: w ⇒* w
Example Grammar: S → aSb
                 S → λ

Possible Derivations:
S ⇒* λ
S ⇒* ab
S ⇒* aaabbb
S ⇒* aaSbb ⇒* aaaaaSbbbbb



Another convenient notation:

S → aSb
S → λ
is written as
S → aSb | λ



Formal Definitions
Grammar: G = (V, T, S, P)

V: set of variables
T: set of terminal symbols
S: start variable
P: set of productions



Context-Free Grammar: G = (V, T, S, P)

All productions in P are of the form

A → s

where A is a variable and s is a string of variables and terminals.



Example of Context-Free Grammar
S → aSb | λ

G = (V, T, S, P)

V = {S} (variables)
T = {a, b} (terminals)
S is the start variable
P = {S → aSb, S → λ} (productions)
Language of a Grammar:

For a grammar G with start variable S:

L(G) = {w : S ⇒* w, w ∈ T*}

where w is a string of terminals or λ.



Example:

context-free grammar G: S → aSb | λ

L(G) = {a^n b^n : n ≥ 0}

since there is a derivation

S ⇒* a^n b^n for any n ≥ 0
Context-Free Language definition:
A language L is context-free
if there is a context-free grammar G
with L = L(G)



Example:

L = {a^n b^n : n ≥ 0}

is a context-free language
since the context-free grammar G:
S → aSb | λ

generates L(G) = L



Another Example

Context-free grammar G :
S → aSa | bSb | λ
Example derivations:
S ⇒ aSa ⇒ abSba ⇒ abba
S ⇒ aSa ⇒ abSba ⇒ abaSaba ⇒ abaaba

L(G) = {ww^R : w ∈ {a, b}*}

Palindromes of even length

Another Example
Context-free grammar G :
S → aSb | SS | λ
Example derivations:
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ ab
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abab

L(G) = {w : n_a(w) = n_b(w),
        and n_a(v) ≥ n_b(v) in any prefix v}

Describes matched parentheses: () ((( ))) (( )), with a = ( and b = )
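
The two conditions that define this language can be checked with a single running counter. The sketch below assumes the input contains only the letters a and b; the name `in_language` is illustrative.

```python
def in_language(w):
    """True if n_a(w) = n_b(w) and n_a(v) >= n_b(v) for every prefix v of w."""
    balance = 0                      # n_a minus n_b of the prefix read so far
    for ch in w:
        balance += 1 if ch == "a" else -1
        if balance < 0:              # some prefix has more b's than a's
            return False
    return balance == 0

print(in_language("abab"))    # True,  reads as ()()
print(in_language("aaabbb"))  # True,  reads as ((()))
print(in_language("abba"))    # False, reads as ())(
```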
Derivation Order
and
Derivation Trees



Derivation Order
Consider the following example grammar
with 5 productions:

1. S → AB    2. A → aaA    4. B → Bb
             3. A → λ      5. B → λ



Leftmost derivation order of string aab (the production used is shown in brackets):

S ⇒ AB [1] ⇒ aaAB [2] ⇒ aaB [3] ⇒ aaBb [4] ⇒ aab [5]

At each step, we substitute the leftmost variable.

Rightmost derivation order of string aab:

S ⇒ AB [1] ⇒ ABb [4] ⇒ Ab [5] ⇒ aaAb [2] ⇒ aab [3]

At each step, we substitute the rightmost variable.
Leftmost derivation of aab:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab    (productions 1, 2, 3, 4, 5)

Rightmost derivation of aab:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab    (productions 1, 4, 5, 2, 3)
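
Both derivation orders can be replayed mechanically. The Python sketch below assumes upper-case letters are variables and writes λ as the empty string; the table `PRODUCTIONS` mirrors the numbering of the five productions, and `derive` is an illustrative name.

```python
PRODUCTIONS = {1: ("S", "AB"), 2: ("A", "aaA"), 3: ("A", ""),
               4: ("B", "Bb"), 5: ("B", "")}

def derive(order, leftmost=True):
    """Apply the numbered productions in `order` to the leftmost (or rightmost) variable."""
    form, steps = "S", ["S"]
    for number in order:
        left, right = PRODUCTIONS[number]
        positions = [i for i, ch in enumerate(form) if ch.isupper()]
        i = positions[0] if leftmost else positions[-1]
        assert form[i] == left, f"production {number} does not apply here"
        form = form[:i] + right + form[i + 1:]
        steps.append(form)
    return steps

print(" ⇒ ".join(derive([1, 2, 3, 4, 5], leftmost=True)))
# S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
print(" ⇒ ".join(derive([1, 4, 5, 2, 3], leftmost=False)))
# S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
```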
Derivation Trees
Consider the same example grammar:

S → AB    A → aaA | λ    B → Bb | λ

And a derivation of aab:

S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab



S → AB    A → aaA | λ    B → Bb | λ

The derivation tree grows by one production at each derivation step:

S ⇒ AB: the root S gets the children A and B. Yield: AB
⇒ aaAB: A gets the children a, a, A. Yield: aaAB
⇒ aaABb: B gets the children B, b. Yield: aaABb
⇒ aaBb: the lower A gets the child λ. Yield: aaλBb = aaBb
⇒ aab: the lower B gets the child λ. Yield: aaλλb = aab

The final tree is the derivation tree (parse tree) of aab.
Sometimes, derivation order doesn’t matter
Leftmost derivation:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Rightmost derivation:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab

Both give the same derivation tree.



Ambiguity



Ambiguous Grammar:
A context-free grammar G is ambiguous
if there is a string w ∈ L(G) which has:

two different derivation trees
or
two different leftmost derivations

(Two different derivation trees give two
different leftmost derivations and vice-versa)
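Example: E → E + E | E * E | (E) | a

This grammar is ambiguous since the
string a + a * a has two derivation trees:

[Figure: in the first tree the root applies E → E + E, with the right-hand E expanded by E → E * E; in the second tree the root applies E → E * E, with the left-hand E expanded by E → E + E]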
E → E + E | E * E | (E) | a
This grammar is ambiguous also because the
string a + a * a has two leftmost derivations:
E ⇒ E + E ⇒ a + E ⇒ a + E * E
  ⇒ a + a * E ⇒ a + a * a

E ⇒ E * E ⇒ E + E * E ⇒ a + E * E
  ⇒ a + a * E ⇒ a + a * a
Another ambiguous grammar:

IF_STMT → if EXPR then STMT
        | if EXPR then STMT else STMT

Variables: IF_STMT, EXPR, STMT. Terminals: if, then, else.

Very common piece of grammar
in programming languages
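if expr1 then if expr2 then stmt1 else stmt2

Two derivation trees:

[Figure: in the first tree, IF_STMT is if expr1 then STMT, where STMT is if expr2 then stmt1 else stmt2, so the else belongs to the inner if]

[Figure: in the second tree, IF_STMT is if expr1 then STMT else stmt2, where STMT is if expr2 then stmt1, so the else belongs to the outer if]
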
In general, ambiguity is bad
and we want to remove it

Sometimes it is possible to find
a non-ambiguous grammar for a language

But, in general it is difficult to achieve this



A successful example:
Ambiguous Grammar        Equivalent Non-Ambiguous Grammar
E → E + E                E → E + T | T
E → E * E                T → T * F | F
E → (E)                  F → (E) | a
E → a

(generates the same language)
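
Because the non-ambiguous grammar separates E, T and F, a parser can follow it directly and will find exactly one tree per string. The sketch below is a hand-written recursive-descent parser in Python, not something taken from the slides. It assumes single-character tokens, replaces the left-recursive productions E → E + T and T → T * F by loops, and returns a simplified operator tree rather than the full derivation tree.

```python
def parse(text):
    """Parse with E → E + T | T, T → T * F | F, F → (E) | a."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {pos}")
        pos += 1

    def factor():                      # F → (E) | a
        if peek() == "(":
            expect("(")
            tree = expr()
            expect(")")
            return tree
        expect("a")
        return "a"

    def term():                        # T → T * F | F, written as a loop
        tree = factor()
        while peek() == "*":
            expect("*")
            tree = ("*", tree, factor())
        return tree

    def expr():                        # E → E + T | T, written as a loop
        tree = term()
        while peek() == "+":
            expect("+")
            tree = ("+", tree, term())
        return tree

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree

print(parse("a + a * a"))    # ('+', 'a', ('*', 'a', 'a'))
print(parse("(a + a) * a"))  # ('*', ('+', 'a', 'a'), 'a')
```

In this sketch the string a + a * a yields only the tree with + at the root, since * can appear only inside a T.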
The string a^n b^n c^n ∈ L
always has two different derivation trees
(for any grammar)

For example:
[Figure: two derivation trees with root S, one through S1 (with S1 and c below it) and one through S2 (with a and S2 below it)]
