Chapter 3: CFG
Hierarchy of languages (Revision)
Recursive Languages
Context-Free Languages
Regular Languages
Context Free Languages (CFL)
• The pumping lemma showed that there are languages that are not
regular
– There are many classes “larger” than the class of regular languages
– One of these classes is the class of “Context-Free” languages
• Described by Context-Free Grammars (CFG)
– Why named context-free?
– Property that we can substitute strings for variables regardless of
context (implies context sensitive languages exist)
• CFG’s are useful in many applications
– Describing syntax of programming languages
– Parsing
– Structure of documents, e.g., XML
• Analogy of the day:
– DFA : Regular Expression as Pushdown Automaton : CFG
CFG
• A CFG is a notation used to specify the syntax of a language.
Context-free grammars are used to design parsers.
Cont…
L(G) = { w : S ⇒* w }, where w is a string of terminals and S is the start symbol
Example
• The grammar ({A}, {a, b, c}, P, A), P: A → aA, A → abc.
• The grammar ({S}, {a, b}, P, S), P: S → aSa, S → bSb, S → ε.
• The grammar ({S, F}, {0, 1}, P, S), P: S → 00S | 11F, F → 00F | ε.
Examples
1. Write down a grammar for the language L = {a^n | n ≥ 1}
Solution
Let G=(V,Σ,P,S)
V = {S}
Σ={a}
P = { S→aS|a }
2. Construct a CFG for the language L = {w c w^R | w ∈ {a, b}*}.
The grammar could be:
S → aSa rule 1
S → bSb rule 2
S → c rule 3
Cont…
Construct a CFG for the language L = {a^n b^(2n) | n ≥ 1}.
Solution:
The strings in this language are {abb, aabbbb, aaabbbbbb, ...}.
The grammar could be:
S → aSbb | abb
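A grammar like this can be checked by mechanically expanding it. The following is a minimal sketch, not a standard library routine: the function name `generate` and the convention that uppercase letters are variables are assumptions made for illustration.

```python
from collections import deque

def generate(productions, start, limit):
    """Return the first `limit` terminal strings found by breadth-first
    expansion of sentential forms (uppercase letters are variables)."""
    results = []
    queue = deque([start])
    while queue and len(results) < limit:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is all terminals.
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:
            results.append(form)
            continue
        for rhs in productions[form[pos]]:
            queue.append(form[:pos] + rhs + form[pos + 1:])
    return results

grammar = {"S": ["aSbb", "abb"]}   # S → aSbb | abb
print(generate(grammar, "S", 3))   # ['abb', 'aabbbb', 'aaabbbbbb']
```

For an ambiguous grammar the same string may be listed more than once, because each derivation is explored separately.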
Sample CFG
Cont…
There are two different approaches to draw a derivation tree −
Top-down Approach −
• Starts with the starting symbol S
• Goes down to tree leaves using productions
Bottom-up Approach −
• Starts from tree leaves
• Proceeds upward to the root which is the starting symbol S
Derivation or Yield of a Tree
The derivation or the yield of a parse tree is the final string obtained by
concatenating the labels of the leaves of the tree from left to right, ignoring
the Nulls. However, if all the leaves are Null, the derivation is Null.
Let a CFG (N, T, P, S) be
N = {S}, T = {a, b}, Starting symbol = S, P = S → SS | aSb | ε
One derivation from the above CFG yields “abaabb”:
S → SS → aSbS → abS → abaSb → abaaSbb → abaabb
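The yield can be computed by a left-to-right walk over the leaves. This is a sketch under assumed conventions: a node is a pair (label, children), and a Null leaf is represented by the empty string. The helper names are hypothetical.

```python
def tree_yield(node):
    """Concatenate leaf labels from left to right; Null leaves contribute ''."""
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)

def S(*children):          # interior node labelled S
    return ("S", list(children))

def leaf(symbol):          # terminal leaf
    return (symbol, [])

NULL = ("", [])            # Null (ε) leaf

# Parse tree for the derivation S → SS → aSbS → abS → abaSb → abaaSbb → abaabb
tree = S(S(leaf("a"), S(NULL), leaf("b")),
         S(leaf("a"), S(leaf("a"), S(NULL), leaf("b")), leaf("b")))
print(tree_yield(tree))    # abaabb
```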
Cont…
Sentential Form and Partial Derivation Tree
Example
Let the set of production rules in a CFG be
X → X+X | X*X | X | a over the alphabet {a, +, *}.
The leftmost derivation for the string "a+a*a" may be −
X → X+X → a+X → a+X*X → a+a*X → a+a*a
The stepwise derivation of the above string is shown below (figure not reproduced).
Cont…
The rightmost derivation for the above string "a+a*a" may be −
X → X*X → X*a → X+X*a → X+a*a → a+a*a
The stepwise derivation of the above string is shown below (figure not reproduced).
Example 2
S → AB
A → Aa | a
B → b
Let’s draw the leftmost and rightmost derivations of the above grammar to get the string “aab”.
Leftmost Derivation: S → AB → AaB → aaB → aab
Rightmost Derivation: S → AB → Ab → Aab → aab
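The two derivation orders for “aab” can be replayed step by step. The sketch below assumes single-character symbols with uppercase variables; `derive` is a hypothetical helper name.

```python
def derive(start, steps, leftmost=True):
    """Apply the productions (variable, rhs) in order, always rewriting the
    leftmost (or rightmost) variable of the current sentential form."""
    form = start
    forms = [form]
    for var, rhs in steps:
        positions = [i for i, ch in enumerate(form) if ch.isupper()]
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != var:
            raise ValueError(f"expected a rule for {form[pos]}, got one for {var}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms

# Grammar: S → AB, A → Aa | a, B → b
left = derive("S", [("S", "AB"), ("A", "Aa"), ("A", "a"), ("B", "b")])
right = derive("S", [("S", "AB"), ("B", "b"), ("A", "Aa"), ("A", "a")],
               leftmost=False)
print(" => ".join(left))    # S => AB => AaB => aaB => aab
print(" => ".join(right))   # S => AB => Ab => Aab => aab
```

Both orders use the same four productions and correspond to the same parse tree; only the order of application differs.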
Example CFG for {0^k 1^k | k ≥ 0}:
G = ({S}, {0, 1}, P, S) // Remember: G = (V, T, P, S)
P:
(1) S –> 0S1
(2) S –> ε
or just simply S –> 0S1 | ε
Example Derivations:
S => ε (2)
S => 0S1 (1)
=> 01 (2)
S => 0S1 (1)
=> 00S11 (1)
=> 000S111 (1)
=> 000111 (2)
• Derivation of aabb
S ⇒ aSb ⇒ aaSbb ⇒ aabb
• Derivation tree
        S
      / | \
     a  S  b
      / | \
     a  S  b
        |
        ε
Examples
G = ({S, A, B}, {a, b},
{S → AB, A → aA | ε,
B → Bb | ε},
S)
L(G) = L(a*b*)
Leftmost Derivation:
S ⇒ AB ⇒ aAB ⇒ aB ⇒ aBb ⇒ ab
Rightmost Derivation:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aAb ⇒ ab
Derivation Tree (abstracts derivation)
        S
      /   \
     A     B
    / \   / \
   a   A B   b
       | |
       ε ε
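Parse Trees and Derivations…..Cont…
Leftmost derivation of id + id * id (preorder numbering of the tree nodes:
E1 is the root, E2 and E3 are its children, E4 and E5 are the children of E3):
E ⇒ E + E (1)
⇒ id + E (2)
⇒ id + E * E (3)
⇒ id + id * E (4)
⇒ id + id * id (5)
Rightmost derivation of id + id * id (reverse of postorder numbering:
E1 is the root, E5 and E2 are its children, E4 and E3 are the children of E2):
E ⇒ E + E (1)
⇒ E + E * E (2)
⇒ E + E * id (3)
⇒ E + id * id (4)
⇒ id + id * id (5)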
Examples
• C++ identifier names. Check whether _var2 is a valid identifier name.
Answer:
• <id> ::= <letter> <rest> | <underscore> <rest>
• <rest> ::= <letter> <rest> | <underscore> <rest> | <digit> <rest> | ε
• <letter> ::= a | b | c | … | z | A | B | … | Z
• <digit> ::= 0 | 1 | … | 9
• <underscore> ::= _
• Changing it to CFG:
• I → LR | UR
• R → LR | UR | DR | ε
• L → a | b | c | … | z | A | B | … | Z
• D → 0 | 1 | … | 9
• U → _
• _var2 is valid: I ⇒ UR ⇒ _R ⇒* _varR ⇒ _varDR ⇒ _var2R ⇒ _var2
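The identifier grammar is right-linear, so a simple left-to-right scan is enough to check membership. The sketch below mirrors the productions for I and R. The function name is an assumption, and reserved C++ keywords are deliberately not checked because the grammar does not mention them.

```python
import string

LETTERS = set(string.ascii_letters)   # L → a | ... | z | A | ... | Z
DIGITS = set(string.digits)           # D → 0 | ... | 9

def is_identifier(text):
    # I → LR | UR : the first symbol must be a letter or an underscore
    if not text or not (text[0] in LETTERS or text[0] == "_"):
        return False
    # R → LR | UR | DR | ε : the rest may mix letters, underscores and digits
    return all(ch in LETTERS or ch in DIGITS or ch == "_" for ch in text[1:])

print(is_identifier("_var2"))   # True
print(is_identifier("2var"))    # False
```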
Examples: CFGs and CFLs
Find the language generated by the grammar.
G = ({S}, {a, b}, {S → aSb, S → ab}, S)
Solution
• S → aSa | B
  B → bB | ε
  L(S) = {a^n b^m a^n | n ≥ 0, m ≥ 0}
• S → AB
  A → aA | a
  B → bB | ε
  or, equivalently,
  S → aS | aB
  B → bB | ε
  L(S) = {a^n b^m | m ≥ 0, n > 0} = L(a⁺b*)
• S → abScB | ε
  B → bB | b
  L(S) = {(ab)^n c b^(m_1) c b^(m_2) … c b^(m_n) | n ≥ 0 and m_i > 0 for all 1 ≤ i ≤ n}
• a*ba*ba*
  S → AbAbA
  A → aA | ε
  or, equivalently,
  S → aS | B
  B → bA
  A → aA | bC
  C → aC | ε
• L = {a^m b^n | m ≥ n}
  S → B | ε
  B → aBb | A
  A → aA | ε
Cont…
L = {w ∈ {a, b}* | length(w) is EVEN}
E → ε | aaE | abE | baE | bbE
or, equivalently,
E → ε | aO | bO
O → aE | bE
L2 = {a^n b^m c^k | n + k = m}
S → S1S2
S1 → aS1b
S1 → ε
S2 → bS2c
S2 → ε
Cont…
Left and Right Recursive Grammars
In a context-free grammar G, if there is a production of the
form X → Xa, where X is a non-terminal and ‘a’ is a string of
terminals, it is called a left recursive production. A
grammar having a left recursive production is called a left
recursive grammar.
Similarly, if there is a production of the form X → aX, where
X is a non-terminal and ‘a’ is a string of terminals, it is
called a right recursive production. A grammar having a
right recursive production is called a right recursive grammar.
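Direct left and right recursion can be spotted by looking at the first and last symbol of each right-hand side. This is only a sketch: the function name is hypothetical, right-hand sides are given as lists of symbols, and indirect recursion (through other variables) is not detected.

```python
def recursion_kinds(productions):
    """Map each variable to the kinds of direct recursion among its productions."""
    kinds = {}
    for var, alternatives in productions.items():
        found = set()
        for rhs in alternatives:
            if rhs and rhs[0] == var:     # X → X...
                found.add("left")
            if rhs and rhs[-1] == var:    # X → ...X
                found.add("right")
        kinds[var] = found
    return kinds

grammar = {
    "X": [["X", "a"], ["a"]],             # X → Xa | a
    "Y": [["a", "Y"], ["a"]],             # Y → aY | a
    "E": [["E", "+", "E"], ["id"]],       # E → E + E | id
}
print(recursion_kinds(grammar))
```

Here X is reported as left recursive, Y as right recursive, and E as both.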
Right/Left Recursive… (General)
• A grammar is left recursive if its production rules can generate a
derivation of the form A ⇒⁺ A X (one or more steps).
• Examples:
– E → E O id | (E) | id
– E → F + id | (E) | id
  F → E * id | id
• E ⇒ F + id ⇒ E * id + id
• A grammar is right recursive if its production rules can generate a
derivation of the form A ⇒⁺ X A (one or more steps).
• Examples:
– E → id O E | (E) | id
– E → id + F | (E) | id
  F → id * E | id
• E ⇒ id + F ⇒ id + id * E
Ambiguity in Context-Free Grammars
• Context-Free Grammars (CFGs) are classified based on:
• Number of derivation trees
• Number of strings
• Depending on the number of derivation trees, CFGs are subdivided into 2 types:
• Ambiguous grammars: some string has more than one derivation tree
• Unambiguous grammars: every string of the language has exactly one derivation tree
Exercise
Check whether the following grammars are ambiguous:
• S -> aS | Sa | ε
• E -> E + E | E * E | id
• A -> AA | (A) | a
• If left and right recursion are not both present in a grammar, is the
grammar then unambiguous? Explain with an example.
• Ans – No, the grammar can still be ambiguous. If the same non-terminal is
both left and right recursive, then the grammar is ambiguous, but the
converse is not always true.
Cont….
Example -
S -> aB | ab
A -> AB | a
B -> Abb | b
In the above example, left and right recursion are not both present, yet the
string ab has more than one parse tree: one uses S -> ab directly, the other
uses S -> aB followed by B -> b.
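Ambiguity of a particular string can be confirmed by counting its parse trees. The sketch below is one possible way to do this, not a general ambiguity test (ambiguity of CFGs is undecidable in general). It assumes the grammar has no ε-productions and no unit productions, so that every symbol covers at least one character and the recursion terminates even with left recursion. All names are hypothetical.

```python
from functools import lru_cache

def count_parse_trees(productions, start, text):
    """Count the parse trees of `text` rooted at `start`."""
    rules = {v: tuple(tuple(r) for r in alts) for v, alts in productions.items()}

    @lru_cache(maxsize=None)
    def count_symbol(sym, i, j):
        if sym not in rules:                        # terminal symbol
            return 1 if text[i:j] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in rules[sym])

    @lru_cache(maxsize=None)
    def count_seq(seq, i, j):
        if len(seq) == 1:
            return count_symbol(seq[0], i, j)
        total = 0
        # The first symbol covers text[i:k]; the remaining ones cover text[k:j].
        for k in range(i + 1, j - len(seq) + 2):
            first = count_symbol(seq[0], i, k)
            if first:
                total += first * count_seq(seq[1:], k, j)
        return total

    return count_symbol(start, 0, len(text))

grammar = {"S": ["aB", "ab"], "A": ["AB", "a"], "B": ["Abb", "b"]}
print(count_parse_trees(grammar, "S", "ab"))   # 2
```

A count greater than 1 for any string shows that the grammar is ambiguous. A count of 1 for one string proves nothing about other strings.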
Cont….
1. State whether the grammar is ambiguous or not.
S -> SAB | ε
A -> AaB | a
B -> AS | b
Ans – The grammar is ambiguous.
If we substitute B -> AS into S -> SAB, we get S -> SAAS, so S is both
left and right recursive. Hence the grammar is ambiguous.
Simplification of CFG
In a CFG, it may happen that not all the production rules and symbols
are needed for the derivation of strings. Besides, there may be
some null productions and unit productions. Elimination of these
productions and symbols is called simplification of CFGs.
Cont…
Simplification essentially comprises the following steps −
• Reduction of CFG
• Eliminate ambiguity.
• Eliminate “useless” variables.
• Eliminate ε-productions: A → ε.
• Eliminate unit productions: A → B.
• Eliminate redundant productions.
• Trade left- & right-recursion.
Reduction of CFG
CFGs are reduced in two phases −
Phase 1 − Derivation of an equivalent grammar, G’, from the
CFG, G, such that each variable derives some terminal string.
Cont…
Derivation Procedure −
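Step 1 − Let W1 be the set of all variables that directly derive some
terminal string, and initialize i = 1.
Step 2 − Let Wi+1 be Wi together with all variables that have a production
whose right side consists only of terminals and symbols of Wi.
Step 3 − Increment i and repeat Step 2, until Wi+1 = Wi.
Step 4 − Include all production rules whose symbols are all terminals or in Wi.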
Phase 2 − Derivation of an equivalent grammar, G”, from the
CFG, G’, such that each symbol appears in a sentential form.
Derivation Procedure −
Step 1 − Include the start symbol in Y1 and initialize i = 1.
Step 2 − Include all symbols, Yi+1, that can be derived
from Yi and include all production rules that have been applied.
Step 3 − Increment i and repeat Step 2, until Yi+1 = Yi.
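Both phases are fixed-point computations and can be sketched in a few lines. The code below is an illustration under assumed conventions (single-character symbols, right-hand sides as strings, hypothetical function names), applied to the grammar S → AC | B, A → a, C → c | BC, E → aA | e.

```python
def generating_symbols(productions, terminals):
    """Phase 1: variables that derive some terminal string."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for var, alternatives in productions.items():
            if var in generating:
                continue
            if any(all(s in terminals or s in generating for s in rhs)
                   for rhs in alternatives):
                generating.add(var)
                changed = True
    return generating

def reachable_symbols(productions, start):
    """Phase 2: symbols that appear in some sentential form."""
    reachable, frontier = {start}, [start]
    while frontier:
        for rhs in productions.get(frontier.pop(), []):
            for s in rhs:
                if s not in reachable:
                    reachable.add(s)
                    frontier.append(s)
    return reachable

def reduce_grammar(productions, terminals, start):
    gen = generating_symbols(productions, terminals)
    # Keep only rules whose symbols are all terminals or generating variables.
    phase1 = {v: [r for r in alts if all(s in terminals or s in gen for s in r)]
              for v, alts in productions.items() if v in gen}
    reach = reachable_symbols(phase1, start)
    return {v: alts for v, alts in phase1.items() if v in reach}

grammar = {"S": ["AC", "B"], "A": ["a"], "C": ["c", "BC"], "E": ["aA", "e"]}
print(reduce_grammar(grammar, {"a", "c", "e"}, "S"))
# {'S': ['AC'], 'A': ['a'], 'C': ['c']}
```

The order matters: Phase 1 must run before Phase 2, because removing non-generating symbols can make further symbols unreachable.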
Cont…
Example
Find a reduced grammar equivalent to the grammar G, having
production rules, P: S → AC | B, A → a, C → c | BC, E → aA | e
Solution
Phase 1 −
T = { a, c, e }
W1 = { A, C, E } from rules A → a, C → c and E → e
W2 = { A, C, E } U { S } from rule S → AC
W3 = { A, C, E, S } U ∅
Since W2 = W3, we can derive G’ as −
G’ = { { A, C, E, S }, { a, c, e }, P, {S}}
where P: S → AC, A → a, C → c , E → aA | e
Cont…
Phase 2 −
Y1 = { S }
Y2 = { S, A, C } from rule S → AC
Y3 = { S, A, C, a, c } from rules A → a and C → c
Y4 = { S, A, C, a, c }
Since Y3 = Y4, we can derive G” as − G” = { { A, C, S }, { a, c }, P, {S}},
where P: S → AC, A → a, C → c
Removal of Unit Productions
Any production rule of the form A → B, where A and B are non-terminals,
is called a unit production.
Removal Procedure −
1 − To remove A → B, add production A → x to the grammar
whenever B → x occurs in the grammar. [x is any right-hand side of B; x can be
Null]
2 − Delete A → B from the grammar.
3 − Repeat from step 1 until all unit productions are removed.
Cont….
Example
Remove unit production from the following −
S → XY, X → a, Y → Z | b, Z → M, M → N, N → a
Solution −
There are 3 unit productions in the grammar −
Y → Z, Z → M, and M → N
At first, we will remove M → N.
As N → a, we add M → a, and M → N is removed.
The production set becomes
S → XY, X → a, Y → Z | b, Z → M, M → a, N → a
Now we will remove Z → M.
As M → a, we add Z→ a, and Z → M is removed.
The production set becomes
S → XY, X → a, Y → Z | b, Z → a, M → a, N → a
Cont…
Now we will remove Y → Z.
As Z → a, we add Y→ a, and Y → Z is removed.
The production set becomes
S → XY, X → a, Y → a | b, Z → a, M → a, N → a
Now Z, M, and N are unreachable, hence we can remove those.
The final CFG is unit production free −
S → XY, X → a, Y → a | b
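The same result can be reached in one pass by following chains of unit productions. Note that this sketch uses the unit-closure formulation rather than the step-by-step deletion shown above. It assumes single-character symbols, and the function names are hypothetical.

```python
def remove_unit_productions(productions):
    """Give each variable the non-unit productions of every variable it can
    reach through a chain of unit productions."""
    variables = set(productions)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs in variables

    result = {}
    for var in productions:
        closure, stack = {var}, [var]
        while stack:
            for rhs in productions[stack.pop()]:
                if is_unit(rhs) and rhs not in closure:
                    closure.add(rhs)
                    stack.append(rhs)
        new_rhs = []
        for member in sorted(closure):
            for rhs in productions[member]:
                if not is_unit(rhs) and rhs not in new_rhs:
                    new_rhs.append(rhs)
        result[var] = new_rhs
    return result

def drop_unreachable(productions, start):
    reachable, stack = {start}, [start]
    while stack:
        for rhs in productions[stack.pop()]:
            for sym in rhs:
                if sym in productions and sym not in reachable:
                    reachable.add(sym)
                    stack.append(sym)
    return {v: alts for v, alts in productions.items() if v in reachable}

grammar = {"S": ["XY"], "X": ["a"], "Y": ["Z", "b"],
           "Z": ["M"], "M": ["N"], "N": ["a"]}
unit_free = remove_unit_productions(grammar)
print(drop_unreachable(unit_free, "S"))
# {'S': ['XY'], 'X': ['a'], 'Y': ['a', 'b']}
```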
Removal of Null Productions
In a CFG, a non-terminal symbol ‘A’ is a nullable variable if
there is a production A → ε or there is a derivation that starts
at A and finally ends up with ε: A → … → ε
Cont….
Removal Procedure
Step 1 − Find the nullable non-terminals, i.e. those which derive ε.
Step 2 − For each production A → α, construct all productions A
→ x where x is obtained from α by removing one or more of the
nullable non-terminals found in Step 1.
Step 3 − Combine the original productions with the result of Step 2
and remove the ε-productions.
Example
Remove null production from the following −
S → ASA | aB | b, A → B, B → b | ε
Solution −
There are two nullable variables
− A and B
At first, we will remove B → ε.
Cont….
After removing B → ε, the production set becomes −
S → ASA | aB | b | a, A → B | b | ε, B → b
Now we will remove A → ε.
After removing A → ε, the production set becomes −
S → ASA | aB | b | a | SA | AS | S, A → B | b, B → b
This is the final production set without null productions.
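The removal procedure can be sketched as follows, assuming single-character symbols and the empty string for ε (function names are hypothetical). For the grammar S → ASA | aB | b, A → B, B → b | ε it produces the same S and B productions as above. For A it yields only A → B, which already covers A → b.

```python
from itertools import product

def nullable_variables(productions):
    """Step 1: variables that derive ε (directly or through other variables)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, alternatives in productions.items():
            if var not in nullable and any(
                    all(s in nullable for s in rhs) for rhs in alternatives):
                nullable.add(var)
                changed = True
    return nullable

def remove_null_productions(productions):
    """Steps 2 and 3: add every variant with nullable symbols dropped, then
    discard the ε-productions. If the start symbol is nullable, ε is lost."""
    nullable = nullable_variables(productions)
    result = {}
    for var, alternatives in productions.items():
        new_rhs = []
        for rhs in alternatives:
            # Each nullable symbol may either be kept or dropped.
            choices = [(s, "") if s in nullable else (s,) for s in rhs]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate and candidate not in new_rhs:
                    new_rhs.append(candidate)
        result[var] = new_rhs
    return result

grammar = {"S": ["ASA", "aB", "b"], "A": ["B"], "B": ["b", ""]}
print(nullable_variables(grammar) == {"A", "B"})   # True
print(remove_null_productions(grammar))
```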
Remove Useless Productions
Productions that can never take part in the derivation of any
string are called useless productions. Similarly, a variable that
can never take part in the derivation of any string is called a useless
variable.
For example:
S -> abS | abA | abB
A -> cd
B -> aB
C -> dc
Here B -> aB and C -> dc are useless productions: B never derives a terminal
string, and C cannot be reached from S.
Chomsky Normal Form
A CFG is in Chomsky Normal Form if the productions are of the
following forms −
A → a
A → BC
S → ε
where A, B, and C are non-terminals and a is a terminal.
Cont…
Example: convert the CFG S → ASA | aB, A → B | S, B → b | ε into CNF.
Solution
(1) Since S appears on the R.H.S., we add a new start symbol S0, and S0 → S is
added to the production set, which becomes −
S0→S, S→ ASA | aB, A → B | S, B → b | ε
(2) Now we will remove the null productions −
B → ε and A → ε
After removing B → ε, the production set becomes −
S0→S, S→ ASA | aB | a, A → B | S | ε, B → b
After removing A → ε, the production set becomes −
S0→S, S→ ASA | aB | a | AS | SA | S, A → B | S, B → b
Cont…
(3) Now we will remove the unit productions.
After removing S → S, the production set becomes −
S0→S, S→ ASA | aB | a | AS | SA, A → B | S, B → b
After removing S0→ S, the production set becomes −
S0→ ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A → B | S, B → b
After removing A→ B, the production set becomes −
S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A→S|b
B→b
After removing A→ S, the production set becomes −
S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A → b |ASA | aB | a | AS | SA, B → b
Cont…
(4) Now we will find the productions with more than two variables on the R.H.S.
Here, S0 → ASA, S → ASA, and A → ASA violate the limit of two non-terminals
on the R.H.S.
Hence we introduce a new non-terminal X with X → SA and obtain the following
production set −
S0→ AX | aB | a | AS | SA
S→ AX | aB | a | AS | SA
A → b | AX | aB | a | AS | SA
B → b
X → SA
(5) We have to change the productions S0 → aB, S → aB, A → aB, which mix a
terminal with a non-terminal. Introducing Y → a, the final production set
becomes −
Cont…
S0→ AX | YB | a | AS | SA
S→ AX | YB | a | AS | SA
A → b | AX | YB | a | AS | SA
B → b
X → SA
Y → a
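Whether a production set has the CNF shape can be checked mechanically. The sketch below uses an assumed input format (alternatives separated by "|", symbols separated by spaces) and hypothetical helper names. It checks only the three allowed production shapes and applies them to the final production set above.

```python
def parse(rules):
    """Turn {'A': 'B C | a'} into {'A': [['B', 'C'], ['a']]}."""
    return {v: [alt.split() for alt in text.split("|")] for v, text in rules.items()}

def is_cnf(productions, start):
    variables = set(productions)
    for var, alternatives in productions.items():
        for rhs in alternatives:
            if len(rhs) == 0:
                ok = var == start                       # only S → ε is allowed
            elif len(rhs) == 1:
                ok = rhs[0] not in variables            # A → a
            elif len(rhs) == 2:
                ok = all(s in variables for s in rhs)   # A → BC
            else:
                ok = False
            if not ok:
                return False
    return True

final = parse({
    "S0": "A X | Y B | a | A S | S A",
    "S":  "A X | Y B | a | A S | S A",
    "A":  "b | A X | Y B | a | A S | S A",
    "B":  "b",
    "X":  "S A",
    "Y":  "a",
})
# A shortened, hypothetical grammar that still contains ASA and aB.
not_cnf = parse({"S0": "A S A | a B", "S": "a", "A": "b", "B": "b"})
print(is_cnf(final, "S0"), is_cnf(not_cnf, "S0"))   # True False
```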
Greibach Normal Form
A CFG is in Greibach Normal Form if the productions are of the
following forms −
A → b
A → bD1…Dn
S → ε
where A, D1,....,Dn are non-terminals and b is a terminal.
Algorithm to Convert a CFG into Greibach Normal Form
1. If the start symbol S occurs on some right side, create a new start
symbol S’ and a new production S’ → S.
2. Remove Null productions. (Using the Null production removal
algorithm discussed earlier)
3. Remove unit productions. (Using the Unit production removal
algorithm discussed earlier)
4. Remove all direct and indirect left-recursion.
5. Do proper substitutions of productions to convert it into the
proper form of GNF.
Cont…
Example
Convert the following CFG into GNF
S → XY | Xo | p
X → mX | m
Y → Xn | o
Solution
Here, S does not appear on the right side of any production and there are no
unit or null productions in the production rule set. So, we can skip Step 1
to Step 3.
Step 4
Now after replacing X in S → XY | Xo | p with mX | m we obtain
S → mXY | mY | mXo | mo | p.
And after replacing X in Y → Xn | o with the right side of X → mX | m we obtain
Y → mXn | mn | o.
Two new productions O → o and N → n are added to the production set, so that
every terminal after the leading one is replaced by a non-terminal, and we
arrive at the final GNF as follows −
S → mXY | mY | mXO | mO | p
X → mX | m
Y → mXN | mN | o
O → o
N → n
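The substitution used in this example (replacing a leading non-terminal by each of its right-hand sides) can be sketched as below. The function name and the single-character symbol convention are assumptions for illustration. It is shown on Y → Xn | o with X → mX | m.

```python
def substitute_leading(productions, var, target):
    """Replace a leading `target` in each production of `var` by every
    right-hand side of `target`; other productions are kept unchanged."""
    new_rhs = []
    for rhs in productions[var]:
        if rhs and rhs[0] == target:
            new_rhs.extend(alt + rhs[1:] for alt in productions[target])
        else:
            new_rhs.append(rhs)
    return new_rhs

grammar = {"X": ["mX", "m"], "Y": ["Xn", "o"]}
print(substitute_leading(grammar, "Y", "X"))   # ['mXn', 'mn', 'o']
```

After the substitution every production of Y starts with a terminal, which is the shape GNF requires. The remaining non-leading terminals still have to be replaced by non-terminals.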