
Chapter3 CFG

It describes how CFG works

Uploaded by

wubalem

Chapter 3

Context Free Languages


Contents
– Context free languages
– Parsing and ambiguity
– Sentential forms
– Derivation tree or parse tree
– Leftmost and rightmost derivations
– Simplification of context-free grammars
– Methods for transforming grammars

Hierarchy of languages (Revision)

Regular Languages ↔ Finite State Machines, Regular Expressions
Context-Free Languages ↔ Context-Free Grammars, Push-down Automata
Non-Recursively Enumerable Languages

Recursively Enumerable Languages

Recursive Languages

Context-Free Languages

Regular Languages

Context Free Languages (CFL)
• The pumping lemma showed there are languages that are not
regular
– There are many classes “larger” than that of regular languages
– One of these classes is called the “Context-Free” languages
• Described by Context-Free Grammars (CFG)
– Why named context-free?
– Property that we can substitute strings for variables regardless of
context (implies context sensitive languages exist)
• CFG’s are useful in many applications
– Describing syntax of programming languages
– Parsing
– Structure of documents, e.g., XML
• Analogy of the day:
– DFA:Regular Expression as Pushdown Automata : CFG
CFG
• A CFG is a notation used to specify the syntax of a language. Context-free grammars are used to design parsers.

• The lexical analyzer generates a string of tokens, which is given to the parser to construct a parse tree. Before the parse tree is built, the tokens are grouped so that each group forms a valid construct of the language. A precise, easy-to-understand notation is needed to specify these constructs; that notation is the context-free grammar.
Cont…

Definition. Given a context-free grammar G = (Σ, NT, R, S), the language generated or derived from G is the set:

L(G) = {w ∈ Σ* : S ⇒* w}

Definition. A language L is context-free if there is a context-free grammar G = (Σ, NT, R, S) such that L is generated from G.
Informal Comments
• A context-free grammar is a notation for describing languages.
• It is more powerful than finite automata or RE’s, but still cannot
define all possible languages.
• Useful for nested structures, e.g., parentheses in programming
languages.
• Basic idea is to use “variables” to stand for sets of strings (i.e.,
languages).
• These variables are defined recursively, in terms of one another.
• Recursive rules (“productions”) involve only concatenation.
• Alternative rules for a variable allow union.
CFG Formalism
• Terminals:- symbols of the alphabet of the language being
defined.
• Variables (non-terminals ) = a finite set of other symbols, each
of which represents a language.
• Start symbol:- the variable whose language is the one being
defined.
• A production has the form variable -> string of variables and
terminals.
• Convention:
– A, B, C,… are variables.
– a, b, c,… are terminals.
– …, X, Y, Z are either terminals or variables.
– …, w, x, y, z are strings of terminals only.
– α, β, γ,… are strings of terminals and/or variables.
Context Free Grammar
Notation
Definition − A context-free grammar (CFG) consisting
of a finite set of grammar rules is a quadruple (N, T, P,
S) where
• N is a set of non-terminal symbols.
• T is a set of terminals where N ∩ T = ∅.
• P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of a production rule is a single non-terminal and does not have any right context or left context.
• S is the start symbol

Example
• The grammar ({A}, {a, b, c}, P, A), P : A → aA, A → abc.
• The grammar ({S}, {a, b}, P, S), P: S → aSa, S → bSb, S → ε
• The grammar ({S, F}, {0, 1}, P, S), P: S → 00S | 11F, F → 00F | ε
Examples
1. Write down a grammar for the language L = {a^n | n ≥ 1}
Solution
Let G=(V,Σ,P,S)
V = {S}
Σ={a}
P = { S→aS|a }
2. Construct a CFG for the language L = {w c w^R | w ∈ {a, b}*}.
The grammar could be:
S → aSa rule 1
S → bSb rule 2
S→c rule 3

Cont…
Construct a CFG for the language L = {a^n b^(2n) | n ≥ 1}.
Solution:
The strings of the language are {abb, aabbbb, aaabbbbbb, ...}.
The grammar could be:
S → aSbb | abb
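
A small way to sanity-check a grammar like this is to enumerate the short strings it generates. The sketch below is illustrative only: the function name `generate` and the convention that uppercase letters are variables are assumptions made for this example, and it assumes the grammar has no ε-productions so that sentential forms never shrink.

```python
from collections import deque

def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Uppercase letters are variables, everything else is a terminal.
    Assumes no production has an empty right side, so forms never shrink.
    """
    seen, results = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost variable, if any.
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:
            results.add(form)          # no variables left: a string of the language
            continue
        for rhs in productions[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=len)

grammar = {"S": ["aSbb", "abb"]}
print(generate(grammar, "S", 9))  # ['abb', 'aabbbb', 'aaabbbbbb']
```

Every string printed has twice as many b's as a's, which matches the intended language for n = 1, 2, 3.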

Sample CFG

1. E → I // Expression is an identifier
2. E → E+E // Add two expressions
3. E → E*E // Multiply two expressions
4. E → (E) // Add parentheses
5. I → L // Identifier is a letter
6. I → ID // Identifier + digit
7. I → IL // Identifier + letter
8. D → 0|1|2|3|4|5|6|7|8|9 // Digits
9. L → a|b|c|…|A|B|…|Z // Letters

Note: Identifiers are regular; they could be described as (letter)(letter + digit)*
Derivations – Intuition
• We derive strings in the language of a CFG by starting with
the start symbol, and repeatedly replacing some variable A by
the right side of one of its productions.
– That is, the “productions for A” are those that have A on the
left side of the ->.
• We say αAβ => αγβ if A -> γ is a production.
• Example: S -> 01; S -> 0S1.
• S => 0S1 => 00S11 => 000111.
Definition. v is one-step derivable from u, written u ⇒ v, if:
• u = xαz
• v = xβz
• α → β in R
Definition. v is derivable from u, written u ⇒* v, if:
There is a chain of one-step derivations of the form:
u ⇒ u1 ⇒ u2 ⇒ … ⇒ v
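
The two definitions translate almost directly into code. The following is a minimal sketch, not a definitive implementation: `one_step` and `derives` are hypothetical names, symbols are single characters, and `derives` only explores chains up to a caller-supplied bound because derivability cannot be decided by blind search in general.

```python
def one_step(form, productions):
    """Return every v with form => v (one production applied at one position)."""
    results = set()
    for i, symbol in enumerate(form):
        for rhs in productions.get(symbol, []):
            results.add(form[:i] + rhs + form[i + 1:])
    return results

def derives(u, v, productions, max_steps):
    """Check u =>* v by exploring chains of at most max_steps one-step derivations."""
    frontier = {u}
    for _ in range(max_steps + 1):
        if v in frontier:
            return True
        frontier = {w for form in frontier for w in one_step(form, productions)}
    return False

rules = {"S": ["01", "0S1"]}          # the example S -> 01; S -> 0S1
print(sorted(one_step("0S1", rules)))  # ['0011', '00S11']
print(derives("S", "000111", rules, 3))  # True
```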
Example of Derivation
Grammar 1: S → SS | (S)S | () | ε
S
⇒ SS
⇒ (S)SS
⇒ (S)S(S)S
⇒ (S)S(())S
⇒ ((S)S)S(())S
⇒ ((S)())S(())S
⇒ ((())())S(())S
⇒ ((())())(())S
⇒ ((())())(())

Grammar 2: E → E O E | (E) | id, O → + | - | * | /
E
⇒ E O E
⇒ (E) O E
⇒ (E O E) O E
⇒* ((E O E) O E) O E
⇒ ((id O E) O E) O E
⇒ ((id + E) O E) O E
⇒ ((id + id) O E) O E
⇒* ((id + id) * id) + id
Generation of Derivation Tree
• A derivation tree or parse tree is an ordered rooted tree that graphically represents how a string is derived from a context-free grammar.
Representation Technique
• Root vertex − Must be labeled by the start symbol.
• Interior vertex − Labeled by a non-terminal symbol.
• Leaves − Labeled by a terminal symbol or ε.
If S → x1x2 …… xn is a production rule in a CFG, then the parse tree / derivation tree has a root labeled S whose children, from left to right, are x1, x2, ……, xn.
Cont…
There are two different approaches to draw a derivation tree −
Top-down Approach −
• Starts with the starting symbol S
• Goes down to tree leaves using productions
Bottom-up Approach −
• Starts from tree leaves
• Proceeds upward to the root which is the starting symbol S
Derivation or Yield of a Tree
The derivation or the yield of a parse tree is the final string obtained by concatenating the labels of the leaves of the tree from left to right, ignoring the Nulls. However, if all the leaves are Null, the derivation is Null.
Let a CFG {N,T,P,S} be
N = {S}, T = {a, b}, Starting symbol = S, P = S → SS | aSb | ε
One derivation from the above CFG yields “abaabb”:
S → SS → aSbS → abS → abaSb → abaaSbb → abaabb
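
To make the notion of yield concrete, here is a minimal sketch that stores a parse tree as nested tuples and concatenates its leaves. The representation (a `(label, children)` tuple for interior vertices, a plain string for leaves, and the empty string for an ε leaf) is an assumption chosen for this example.

```python
def tree_yield(node):
    """Concatenate leaf labels left to right; ε leaves are stored as ''."""
    if isinstance(node, str):
        return node
    _label, children = node
    return "".join(tree_yield(child) for child in children)

# Parse tree for S → SS → aSbS → abS → abaSb → abaaSbb → abaabb
eps = ("S", [""])                      # S → ε
tree = ("S", [
    ("S", ["a", eps, "b"]),                         # left S derives ab
    ("S", ["a", ("S", ["a", eps, "b"]), "b"]),      # right S derives aabb
])
print(tree_yield(tree))  # abaabb
```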
Cont…
Sentential Form and Partial Derivation Tree

Any string of variables and/or terminals derived from the start symbol is called a sentential form.
A partial derivation tree is a sub-tree of a derivation tree/parse tree such that, for every vertex in it, either all of that vertex's children are in the sub-tree or none of them are.
Example
If in any CFG the productions are −
S → AB, A → aaA | ε, B → Bb| ε
the partial derivation tree can be the
following −
Derivation Trees

S → A | AB
A → ε | a | Ab | AA
B → b | bc | Bc | bB
w = aabb
[Figure: three derivation trees for w = aabb]
Other derivation trees for this string? Infinitely many others are possible.
Cont…
Leftmost and Rightmost Derivation of a String
• Leftmost derivation − A leftmost derivation is obtained by
applying production to the leftmost variable in each step.
• Rightmost derivation − A rightmost derivation is obtained by
applying production to the rightmost variable in each step.

Example
Let any set of production rules in a CFG be
X → X+X | X*X | X | a over the terminal alphabet {a, +, *}.
The leftmost derivation for the string "a+a*a" may be −
X → X+X → a+X → a + X*X → a+a*X → a+a*a
The stepwise derivation of the above string is shown as below −

Cont…
The rightmost derivation for the above string "a+a*a" may be −
X → X*X → X*a → X+X*a → X+a*a → a+a*a
The stepwise derivation of the above string is shown as below −

Example 2
S → AB
A → Aa | a
B → b
Let’s draw leftmost and rightmost derivations of the above grammar to get the string “aab”.
Leftmost derivation: S → AB → AaB → aaB → aab
Rightmost derivation: S → AB → Ab → Aab → aab
Leftmost Derivation
• Each step of the derivation is a replacement of the leftmost nonterminal in a sentential form.
E
⇒ E O E
⇒ (E) O E
⇒ (E O E) O E
⇒ (id O E) O E
⇒ (id + E) O E
⇒ (id + id) O E
⇒ (id + id) * E
⇒ (id + id) * id

Rightmost Derivation
• Each step of the derivation is a replacement of the rightmost nonterminal in a sentential form.
E
⇒ E O E
⇒ E O id
⇒ E * id
⇒ (E) * id
⇒ (E O E) * id
⇒ (E O id) * id
⇒ (E + id) * id
⇒ (id + id) * id
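
Both derivation orders can be read off a single parse tree. The sketch below is one illustrative way to do that, assuming a tree stored as nested `(label, children)` tuples with terminal leaves as plain strings; the function name `derivation` is hypothetical. It is applied to the parse tree of "aab" under S → AB, A → Aa | a, B → b.

```python
def derivation(tree, leftmost=True):
    """List the sentential forms obtained by always expanding the leftmost
    (or rightmost) variable of the given parse tree."""
    form = [tree]                      # nodes: (label, children) or terminal str
    steps = [tree[0]]
    while True:
        indices = [i for i, n in enumerate(form) if not isinstance(n, str)]
        if not indices:
            return steps               # only terminals remain
        i = indices[0] if leftmost else indices[-1]
        form[i:i + 1] = form[i][1]     # replace the variable by its children
        steps.append(" ".join(n if isinstance(n, str) else n[0] for n in form))

tree = ("S", [("A", [("A", ["a"]), "a"]), ("B", ["b"])])
print(derivation(tree, leftmost=True))   # ['S', 'A B', 'A a B', 'a a B', 'a a b']
print(derivation(tree, leftmost=False))  # ['S', 'A B', 'A b', 'A a b', 'a a b']
```

The two lists differ only in the order of the steps; they describe the same tree.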
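Sample Parse / Derivation Tree
• Using a leftmost derivation generates the parse tree for a*(a+b1).
• Does using a rightmost derivation produce a different tree?
• The yield of the parse tree is the string that results when we concatenate the leaves from left to right (e.g., doing a leftmost depth-first search).
– The yield is always a string that is derived from the root and is guaranteed to be a string in the language L.
[Figure: parse tree for a*(a+b1). The root E has children E, *, E. The left E derives a through E → I → L → a. The right E expands to ( E ), whose inner E expands to E + E; the left operand derives a, and the right operand derives b1 through E → I → I D with I → L → b and D → 1.]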
Example
G = ({A, B, C, S}, {a, b, c}, P, S)
P:
(1) S –> ABC
(2) A –> aA
(3) A –> ε
(4) B –> bB
(5) B –> ε
(6) C –> cC
(7) C –> ε
or, more compactly: A –> aA | ε, B –> bB | ε, C –> cC | ε

Derivations:
S => ABC (1)
=> BC (3)
=> C (5)
=> ε (7)

S => ABC (1)
=> aABC (2)
=> aaABC (2)
=> aaBC (3)
=> aabBC (4)
=> aabC (5)
=> aabcC (6)
=> aabc (7)
Example CFG for {0^k 1^k | k ≥ 0}:
G = ({S}, {0, 1}, P, S) // Remember: G = (V, T, P, S)
P:
(1) S –> 0S1
(2) S –> ε
or just simply S –> 0S1 | ε
Example derivations:
S => ε (2)

S => 0S1 (1)
=> 01 (2)

S => 0S1 (1)
=> 00S11 (1)
=> 000S111 (1)
=> 000111 (2)

• Derivation of aabb (using S → aSb | ε)
S ⇒ aSb ⇒ aaSbb ⇒ aabb
• Derivation tree
[Figure: derivation tree for aabb. The root S has children a, S, b; the inner S has children a, S, b; the innermost S has the single child ε.]
Examples
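G = ({S, A, B}, {a, b}, {S → AB, A → aA | ε, B → Bb | ε}, S)
L(G) = L(a*b*)

Leftmost Derivation:
S ⇒ AB ⇒ aAB ⇒ aB ⇒ aBb ⇒ ab
Rightmost Derivation:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aAb ⇒ ab

Derivation Tree (abstracts derivation)
[Figure: derivation tree for ab. The root S has children A and B; A has children a and A, where the inner A has the child ε; B has children B and b, where the inner B has the child ε.]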
Parse Trees and Derivations…..Cont…
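Leftmost derivation (preorder numbering of the interior nodes):
E ⇒ E + E (1)
⇒ id + E (2)
⇒ id + E * E (3)
⇒ id + id * E (4)
⇒ id + id * id (5)

Rightmost derivation (reverse of postorder numbering of the interior nodes):
E ⇒ E + E (1)
⇒ E + E * E (2)
⇒ E + E * id (3)
⇒ E + id * id (4)
⇒ id + id * id (5)

[Figure: parse tree for id + id * id, with interior nodes E1 to E5 numbered in the order in which each derivation expands them.]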
Examples
• C++ identifier names. Check if _var2 is a valid identifier name.
Answer:
• <id> ::= <letter> <rest> | <underscore> <rest>
• <rest> ::= <letter> <rest> | <underscore> <rest> | <digits> <rest> | ε
• <letter> ::= a|b|c|…|z|A|B|…|Z
• <digits> ::= 0|1|…|9
• <underscore> ::= _
• Changing it to CFG:
• I → LR | UR
• R → LR | UR | DR | ε
• L → a | b | c | … | z | A | B | … | Z
• D → 0 | 1 | … | 9
• U → _
• _var2 is valid: I ⇒ UR ⇒ _R ⇒* _varR ⇒ _varDR ⇒ _var2R ⇒ _var2
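
The identifier grammar can be checked mechanically with one function per variable. This is a minimal sketch under stated assumptions: the function names are hypothetical, only ASCII letters are treated as letters, and C++ keywords or reserved identifiers are not considered.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

def is_rest(s):
    """R -> L R | U R | D R | ε"""
    if s == "":
        return True                       # R -> ε
    head, tail = s[0], s[1:]
    return (head in LETTERS or head == "_" or head in DIGITS) and is_rest(tail)

def is_identifier(s):
    """I -> L R | U R"""
    if s == "":
        return False                      # I has no ε-production
    head, tail = s[0], s[1:]
    return (head in LETTERS or head == "_") and is_rest(tail)

print(is_identifier("_var2"))  # True
print(is_identifier("2var"))   # False
```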
Examples: CFGs and CFLs
Find out the language generated by the grammar
G = ({S}, {a, b, c}, {S → aSa, S → bSb, S → c}, S)
Solution
Production rules:
S → aSa
S → bSb
S → c
Sample derivation:
S ⇒ aSa
⇒ abSba
⇒ abbSbba
⇒ abbcbba
L = {w c w^R | w ∈ {a, b}*}
S → aSa | aBa
B → bB | b
L(B) = {b^m | m > 0}
L(S) = {a^n b^m a^n | n > 0 ∧ m > 0}

S → aSa | B
B → bB | ε
L(S) = {a^n b^m a^n | n ≥ 0 ∧ m ≥ 0}

S → abSc | ε
L(S) = {(ab)^n c^n | n ≥ 0}

S → AB, A → aA | a, B → bB | ε
or equivalently
S → aS | aB, B → bB | ε
L(S) = {a^n b^m | m ≥ 0 ∧ n > 0} = L(a⁺b*)
S → abScB | ε
B → bB | b
L(S) = {(ab)^n c b^(m_1) c b^(m_2) … c b^(m_n) | n ≥ 0 ∧ (∀i : 1 ≤ i ≤ n → m_i > 0)}

a*ba*ba*:
S → AbAbA
A → aA | ε
or, with left-to-right generation of the string:
S → aS | B
B → bA
A → aA | bC
C → aC | ε

L = {a^m b^n | m ≥ n}:
S → B | ε
B → aBb | A
A → aA | ε
Cont…
L = {w ∈ {a, b}* | length(w) is EVEN}
E → ε | aO | bO
O → aE | bE
or
E → ε | aaE | abE | baE | bbE

L = {w ∈ {a, b}* | w has an EVEN number of b's}
E → ε | aE | bO
O → aO | bE

{a^m b^n c^(m+n) | m, n ≥ 0}
Rewrite as {a^m b^n c^n c^m | m, n ≥ 0}:
S → S’ | aSc
S’ → ε | bS’c
CFG for L = {a^n b^m | n ≤ m+3, n, m ≥ 0}:
S → aSb | A
A → ε | a | aa | aaa | B
B → bB | ε

L2 = {a^n b^m c^k | n + k = m}:
S → S1S2
S1 → aS1b
S1 → ε
S2 → bS2c
S2 → ε
Cont…
Left and Right Recursive Grammars
In a context-free grammar G, if there is a production of the form X → Xa, where X is a non-terminal and ‘a’ is a string of terminals, it is called a left recursive production. A grammar having a left recursive production is called a left recursive grammar.
If there is a production of the form X → aX, where X is a non-terminal and ‘a’ is a string of terminals, it is called a right recursive production. A grammar having a right recursive production is called a right recursive grammar.
Right/Left Recursive… (General)
• A grammar is left recursive if its production rules can generate a derivation of the form A ⇒+ A X (one or more steps).
• Examples:
– E → E O id | (E) | id
– E → F + id | (E) | id
F → E * id | id
• E ⇒ F + id ⇒ E * id + id

• A grammar is right recursive if its production rules can generate a derivation of the form A ⇒+ X A (one or more steps).
• Examples:
– E → id O E | (E) | id
– E → id + F | (E) | id
F → id * E | id
• E ⇒ id + F ⇒ id + id * E
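
The general (direct or indirect) form of recursion can be detected by following only the first, or only the last, symbol of each right side. The sketch below is illustrative: `recursive_variables` is a hypothetical name, right sides are lists of symbol strings, and the grammar is assumed to have no ε-productions, since otherwise a nullable prefix or suffix would also have to be skipped.

```python
def recursive_variables(productions, left=True):
    """Variables A with A ⇒+ A γ (left=True) or A ⇒+ γ A (left=False).

    Assumes no ε-productions, so only the first (or last) symbol matters.
    """
    # edge[A] = variables that can appear first (or last) after one step from A
    edge = {a: {rhs[0 if left else -1] for rhs in rhss if rhs} & set(productions)
            for a, rhss in productions.items()}
    found = set()
    for a in productions:
        seen, todo = set(), list(edge[a])
        while todo:                      # follow the edges transitively
            b = todo.pop()
            if b not in seen:
                seen.add(b)
                todo.extend(edge[b])
        if a in seen:
            found.add(a)
    return found

# E → F + id | (E) | id,  F → E * id | id  (indirectly left recursive)
G = {"E": [["F", "+", "id"], ["(", "E", ")"], ["id"]],
     "F": [["E", "*", "id"], ["id"]]}
print(recursive_variables(G, left=True))   # E and F are both left recursive
print(recursive_variables(G, left=False))  # no right recursive variable
```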
Ambiguity in Context-Free Grammars
• Context Free Grammars(CFGs) are classified based on:
• Number of Derivation trees
• Number of strings
• Depending on the Number of Derivation trees, CFGs are sub-
divided into 2 types:
• Ambiguous grammars
• Unambiguous grammars

Definition: A CFG G = (V, T, P, S) is said to be ambiguous if and only if there exists a string in T* that has more than one parse tree, where V is a finite set of variables, T is a finite set of terminals, P is a finite set of productions of the form A -> α, where A is a variable and α ∈ (V ∪ T)*, and S is a designated variable called the start symbol.
Ambiguity
CFG ambiguous ⇔ any of the following equivalent statements:
– ∃ string w with multiple derivation trees.
– ∃ string w with multiple leftmost derivations.
– ∃ string w with multiple rightmost derivations.
This defines ambiguity of a grammar, not of a language.
Ambiguity in Context-Free Grammars
If a context free grammar G has more than one derivation tree for
some string w ∈ L(G), it is called an ambiguous grammar. There
exist multiple right-most or left-most derivations for some string
generated from that grammar.
Problem
Check whether the grammar G with production rules −
X → X+X | X*X |X| a
is ambiguous or not.
Solution
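Let’s find out the derivation trees for the string "a+a*a". It has two leftmost derivations.
Derivation 1 − X → X+X → a+X → a+X*X → a+a*X → a+a*a
Parse tree 1 − the root expands to X+X and the right X expands to X*X, grouping the string as a+(a*a).
Cont…
Derivation 2 − X → X*X → X+X*X → a+X*X → a+a*X → a+a*a
Parse tree 2 − the root expands to X*X and the left X expands to X+X, grouping the string as (a+a)*a.

Since there are two parse trees for a single string "a+a*a", the grammar G is ambiguous.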
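
For this particular grammar the number of parse trees of a string can be counted directly, which gives a quick ambiguity check. The sketch below is specific to X → X+X | X*X | a and is an illustration only: the unit production X → X is deliberately ignored (it would add infinitely many trivial trees), and `count_parse_trees` is a hypothetical name.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Count parse trees of s under X -> X+X | X*X | a (X -> X is ignored)."""
    @lru_cache(maxsize=None)
    def count(i, j):                     # trees deriving s[i:j] from X
        if j - i == 1:
            return 1 if s[i] == "a" else 0
        total = 0
        for k in range(i + 1, j - 1):    # s[k] is the operator at the root
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))

print(count_parse_trees("a+a*a"))  # 2
```

A count greater than 1 for any string is enough to show that the grammar is ambiguous.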
Exercise
Check whether the following grammars are ambiguous or not:
• S-> aS |Sa| Є
• E-> E +E | E*E| id
• A -> AA | (A) | a

• S -> SS|AB , A -> Aa|a , B -> Bb|b


How to find out whether a grammar is ambiguous or not?
If the same non-terminal is, directly or indirectly, both left recursive and right recursive, then the grammar is ambiguous.

Example - S -> SaS | Є

In this grammar S is both left and right recursive, so the grammar is ambiguous.
We can make more than one parse tree/derivation tree for an input string (let's say aa).
Cont….

• If left and right recursion are not both present in a grammar, is the grammar then unambiguous? Explain with an example.
• Ans – No, the grammar can still be ambiguous. If a non-terminal is both left and right recursive, the grammar is ambiguous, but the converse does not hold.
Cont….
Example -
S -> aB | ab
A -> AB | a
B -> Abb | b
In the above example, left and right recursion are not both present, but for the string ab we can make more than one parse tree: one uses S -> ab directly, and the other uses S -> aB followed by B -> b.

We can see that even if left and right recursion are not both present in a grammar, the grammar can be ambiguous.
Cont….
1. State whether the grammar is ambiguous or not.
S -> SAB | Є
A -> AaB | a
B -> AS | b
Ans – The grammar is Ambiguous.
If we put
B -> AS in S -> SAB
Then we get S -> SAAS and the grammar clearly contains both
left and right recursion. Hence the grammar is ambiguous.

2. Is the following grammar ambiguous?
S → AS | ε
A → A1 | 0A1 | 01
Cont…
Example: Check whether the following grammar is ambiguous or not
S → AB / C
A → aAb / ab
B → cBd / cd
C → aCd / aDd
D → bDc / bc
Solution
The string w = aabbccdd has more than one parse tree:
Parse tree 1 − S → AB, with A → aAb → aabb and B → cBd → ccdd.
Parse tree 2 − S → C → aCd → aaDdd → aabDcdd → aabbccdd.
As the string w = aabbccdd can be derived through two different parse trees, the given grammar is ambiguous.
CFL Closure Property

Context-free languages are closed under −
• Union
• Concatenation
• Kleene Star operation
Union
Let L1 and L2 be two context free languages. Then L1 ∪ L2 is also context free.
Example
Let L1 = {a^n b^n, n > 0}. Corresponding grammar G1 will have P: S1 → aS1b | ab
Let L2 = {c^m d^m, m ≥ 0}. Corresponding grammar G2 will have P: S2 → cS2d | ε
Union of L1 and L2, L = L1 ∪ L2 = {a^n b^n} ∪ {c^m d^m}
The corresponding grammar G will have the additional production S → S1 | S2
Cont…
Concatenation
If L1 and L2 are context free languages, then L1L2 is also context free.
Example
Product of the languages L1 and L2, L = L1L2 = {a^n b^n c^m d^m}
The corresponding grammar G will have the additional production S → S1 S2
Kleene Star
If L is a context free language, then L* is also context free.
Example
Let L = {a^n b^n, n ≥ 0}. Corresponding grammar G will have P: S → aSb | ε
Kleene Star L1 = {a^n b^n}*
The corresponding grammar G1 will have additional productions S1 → SS1 | ε
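
The three closure constructions each add one new start symbol and one or two productions. The sketch below shows them on grammars stored as dictionaries; the function names, the list-of-symbols representation (with the empty list standing for ε), and the default names for the new start symbols are all assumptions made for illustration. The variable names of the two grammars are assumed to be disjoint.

```python
def union(g1, s1, g2, s2, new_start="S"):
    """Grammar for L1 ∪ L2: add S -> S1 | S2 (variable names assumed disjoint)."""
    return {**g1, **g2, new_start: [[s1], [s2]]}

def concatenation(g1, s1, g2, s2, new_start="S"):
    """Grammar for L1 L2: add S -> S1 S2."""
    return {**g1, **g2, new_start: [[s1, s2]]}

def kleene_star(g, s, new_start="S'"):
    """Grammar for L*: add S' -> S S' | ε (ε is the empty list)."""
    return {**g, new_start: [[s, new_start], []]}

G1 = {"S1": [["a", "S1", "b"], ["a", "b"]]}      # {a^n b^n | n > 0}
G2 = {"S2": [["c", "S2", "d"], []]}              # {c^m d^m | m >= 0}
print(union(G1, "S1", G2, "S2")["S"])            # [['S1'], ['S2']]
print(concatenation(G1, "S1", G2, "S2")["S"])    # [['S1', 'S2']]
print(kleene_star(G1, "S1")["S'"])               # [['S1', "S'"], []]
```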
Cont…
Context-free languages are not closed under −
• Intersection − If L1 and L2 are context free languages, then L1 ∩ L2 is not necessarily context free.
• Complement − If L1 is a context free language, then L1’ may not be context free.
Context-free languages are, however, closed under −
• Intersection with Regular Language − If L1 is a regular language and L2 is a context free language, then L1 ∩ L2 is a context free language.
Simplification of CFG

As we have seen, many languages can be represented by a context-free grammar. Not every grammar is optimized: a grammar may contain extra (non-terminal) symbols that unnecessarily increase its length. Simplification of a grammar means reducing it by removing useless symbols.

In a CFG, it may happen that not all the production rules and symbols are needed for the derivation of strings. Besides, there may be some null productions and unit productions. Elimination of these productions and symbols is called simplification of CFGs.
Cont…
Simplification essentially comprises the following steps −
• Reduction of CFG
• Eliminate ambiguity.
• Eliminate “useless” variables.
• Eliminate ε-productions: A → ε.
• Eliminate unit productions: A → B.
• Eliminate redundant productions.
• Trade left- & right-recursion.
Reduction of CFG
CFGs are reduced in two phases −
Phase 1 − Derivation of an equivalent grammar, G’, from the
CFG, G, such that each variable derives some terminal string.
Cont…
Derivation Procedure −
Step 1 − Include all symbols, W1, that derive some terminal
and initialize i=1.
Step 2 − Include all symbols, Wi+1, that derive Wi.
Step 3 − Increment i and repeat Step 2, until Wi+1 = Wi.
Step 4 − Include all production rules that have Wi in it.
Phase 2 − Derivation of an equivalent grammar, G”, from the
CFG, G’, such that each symbol appears in a sentential form.
Derivation Procedure −
Step 1 − Include the start symbol in Y1 and initialize i = 1.
Step 2 − Include all symbols, Yi+1, that can be derived
from Yi and include all production rules that have been applied.
Step 3 − Increment i and repeat Step 2, until Yi+1 = Yi.
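
The two phases can be sketched as two fixed-point computations. This is a minimal illustration, not the only way to implement the procedure: the function names are hypothetical, symbols are single characters with uppercase letters as variables, and the grammar used is the one from the example that follows (where e is a terminal).

```python
def generating(productions):
    """Phase 1: variables that derive some terminal string (the sets W1, W2, ...)."""
    w = set()
    while True:
        new = {a for a, rhss in productions.items()
               if any(all(not x.isupper() or x in w for x in rhs) for rhs in rhss)}
        if new == w:
            return w
        w = new

def reachable(productions, start):
    """Phase 2: variables that appear in some sentential form (Y1, Y2, ...)."""
    y, todo = {start}, [start]
    while todo:
        for rhs in productions.get(todo.pop(), []):
            for x in rhs:
                if x.isupper() and x not in y:
                    y.add(x)
                    todo.append(x)
    return y

def reduce_grammar(productions, start):
    w = generating(productions)
    # Keep only rules whose variables all derive terminal strings.
    g1 = {a: [r for r in rhss if all(not x.isupper() or x in w for x in r)]
          for a, rhss in productions.items() if a in w}
    y = reachable(g1, start)
    return {a: rhss for a, rhss in g1.items() if a in y}

P = {"S": ["AC", "B"], "A": ["a"], "C": ["c", "BC"], "E": ["aA", "e"]}
print(reduce_grammar(P, "S"))  # {'S': ['AC'], 'A': ['a'], 'C': ['c']}
```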
Cont…
Example
Find a reduced grammar equivalent to the grammar G, having
production rules, P: S → AC | B, A → a, C → c | BC, E → aA | e
Solution
Phase 1 −
T = { a, c, e }
W1 = { A, C, E } from rules A → a, C → c and E → aA
W2 = { A, C, E } U { S } from rule S → AC
W3 = { A, C, E, S } U ∅
Since W2 = W3, we can derive G’ as −
G’ = { { A, C, E, S }, { a, c, e }, P, {S}}
where P: S → AC, A → a, C → c , E → aA | e
Cont…
Phase 2 −
Y1 = { S }
Y2 = { S, A, C } from rule S → AC
Y3 = { S, A, C, a, c } from rules A → a and C → c
Y4 = { S, A, C, a, c }
Since Y3 = Y4, we can derive G” as − G” = { { A, C, S }, { a, c }, P, {S}}, where P: S → AC, A → a, C → c
Removal of Unit Productions
Any production rule in the form A → B where A, B ∈ Non-terminal is called a unit production.
Removal Procedure −
1 − To remove A → B, add production A → x to the grammar rule whenever B → x occurs in the grammar. [x is any right-hand side of B, x can be Null]
2 − Delete A → B from the grammar.
3 − Repeat from step 1 until all unit productions are removed.
Cont….
Example
Remove unit production from the following −
S → XY, X → a, Y → Z | b, Z → M, M → N, N → a
Solution −
There are 3 unit productions in the grammar −
Y → Z, Z → M, and M → N
At first, we will remove M → N.
As N → a, we add M → a, and M → N is removed.
The production set becomes
S → XY, X → a, Y → Z | b, Z → M, M → a, N → a
Now we will remove Z → M.
As M → a, we add Z→ a, and Z → M is removed.
The production set becomes
S → XY, X → a, Y → Z | b, Z → a, M → a, N → a
Cont…
Now we will remove Y → Z.
As Z → a, we add Y→ a, and Y → Z is removed.
The production set becomes
S → XY, X → a, Y → a | b, Z → a, M → a, N → a
Now Z, M, and N are unreachable, hence we can remove those.
The final CFG is unit production free −
S → XY, X → a, Y → a | b
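
The same result can be computed in one pass per variable. Note that the sketch below uses a closure-based variant rather than the step-by-step procedure above: for each variable it collects every variable reachable through unit productions alone and copies their non-unit rules. The function name and the single-character, uppercase-variable convention are assumptions for this example, and unreachable variables are left in place.

```python
def remove_unit_productions(productions):
    """Replace every unit production A -> B by the non-unit productions of B."""
    def is_unit(rhs):
        return len(rhs) == 1 and rhs.isupper()

    result = {}
    for a in productions:
        closure, todo = {a}, [a]          # variables reachable by unit steps only
        while todo:
            for rhs in productions.get(todo.pop(), []):
                if is_unit(rhs) and rhs not in closure:
                    closure.add(rhs)
                    todo.append(rhs)
        rules = []
        for b in sorted(closure):
            for rhs in productions.get(b, []):
                if not is_unit(rhs) and rhs not in rules:
                    rules.append(rhs)
        result[a] = rules
    return result

P = {"S": ["XY"], "X": ["a"], "Y": ["Z", "b"], "Z": ["M"], "M": ["N"], "N": ["a"]}
print(remove_unit_productions(P))
# {'S': ['XY'], 'X': ['a'], 'Y': ['a', 'b'], 'Z': ['a'], 'M': ['a'], 'N': ['a']}
```

Z, M, and N are still listed because this function does not prune unreachable variables; that is a separate reduction step.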
Removal of Null Productions
In a CFG, a non-terminal symbol ‘A’ is a nullable variable if
there is a production A → ε or there is a derivation that starts
at A and finally ends up with
ε: A → .......… → ε

Cont….
Removal Procedure
Step 1 − Find out the nullable non-terminal variables, i.e., those which derive ε.
Step 2 − For each production A → α, construct all productions A → x where x is obtained from α by removing one or more of the nullable non-terminals found in Step 1.
Step 3 − Combine the original productions with the result of step 2 and remove ε-productions.
Example
Remove null production from the following −
S → ASA | aB | b, A → B, B → b | ε
Solution −
There are two nullable variables − A and B
At first, we will remove B → ε.
Cont….
After removing B → ε, the production set becomes −
S → ASA | aB | b | a, A → B | b | ε, B → b
Now we will remove A → ε.
After removing A → ε, the production set becomes −
S → ASA | aB | b | a | SA | AS | S, A → B | b, B → b
This is the final production set without null productions.
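
The three steps of the removal procedure can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name is hypothetical, symbols are single characters, ε is the empty string, and all nullable variables are handled at once rather than one at a time. If the start symbol itself were nullable, ε would be dropped from the language, so that case would need separate treatment.

```python
from itertools import product

def remove_null_productions(productions):
    # Step 1: find the nullable variables.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, rhss in productions.items():
            if a not in nullable and any(all(x in nullable for x in rhs) for rhs in rhss):
                nullable.add(a)
                changed = True
    # Steps 2 and 3: drop nullable symbols in every combination, discard ε.
    result = {}
    for a, rhss in productions.items():
        rules = []
        for rhs in rhss:
            options = [(x, "") if x in nullable else (x,) for x in rhs]
            for choice in product(*options):
                new_rhs = "".join(choice)
                if new_rhs and new_rhs not in rules:
                    rules.append(new_rhs)
        result[a] = rules
    return result

P = {"S": ["ASA", "aB", "b"], "A": ["B"], "B": ["b", ""]}
print(remove_null_productions(P))
# {'S': ['ASA', 'AS', 'SA', 'S', 'aB', 'a', 'b'], 'A': ['B'], 'B': ['b']}
```

The output for S agrees with the production set above; A keeps only A → B here because A → b is implied by A → B and B → b.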
Remove Useless Productions
The productions that can never take part in the derivation of any string are called useless productions. Similarly, a variable that can never take part in the derivation of any string is called a useless variable.
For example, in the grammar
S -> abS | abA | abB
A -> cd
B -> aB
C -> dc
the useless productions are B -> aB (B never derives a terminal string, which also makes S -> abB useless) and C -> dc (C cannot be reached from S).
Chomsky Normal Form
A CFG is in Chomsky Normal Form if the Productions are in the following forms −
• A → a
• A → BC
• S → ε
where A, B, and C are non-terminals and a is a terminal.

Algorithm to Convert into Chomsky Normal Form −


Step 1 − If the start symbol S occurs on some right side, create
a new start symbol S’ and a new production S’→ S.
Step 2 − Remove Null productions. (Using the Null production
removal algorithm discussed earlier)
Step 3 − Remove unit productions. (Using the Unit production
removal algorithm discussed earlier)
Cont…
Step 4 − Replace each production A → B1…Bn where n > 2 with A → B1C where C → B2 …Bn. Repeat this step for all productions having more than two symbols on the right side.
Step 5 − If the right side of any production is in the form A
→ aB where a is a terminal and A, B are non-terminal, then
the production is replaced by A → XB and X → a. Repeat
this step for every production which is in the form A → aB.
Example
Convert the following CFG into CNF
S → ASA | aB, A → B | S, B → b | ε

Cont…
Solution
(1) Since S appears in R.H.S, we add a new start symbol S0 and S0→S is added to the production set and it becomes −
S0→S, S→ ASA | aB, A → B | S, B → b | ε
(2) Now we will remove the null productions −
B → ε and A → ε
After removing B → ε, the production set becomes −
S0→S, S→ ASA | aB | a, A → B | S | ε, B → b
After removing A → ε, the production set becomes −
S0→S, S→ ASA | aB | a | AS | SA | S, A → B | S, B → b

Cont…
(3) Now we will remove the unit productions.
After removing S → S, the production set becomes −
S0→S, S→ ASA | aB | a | AS | SA, A → B | S, B → b
After removing S0→ S, the production set becomes −
S0→ ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A → B | S, B → b
After removing A→ B, the production set becomes −
S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A→S|b
B→b
After removing A→ S, the production set becomes −
S0 → ASA | aB | a | AS | SA, S→ ASA | aB | a | AS | SA
A → b |ASA | aB | a | AS | SA, B → b
Cont…
(4) Now we will find the productions with more than two variables in the R.H.S.
Here, S0→ ASA, S → ASA, A→ ASA violate the limit of two non-terminals in the R.H.S.
Hence we will apply step 4 to get the following production set −
S0→ AX | aB | a | AS | SA
S→ AX | aB | a | AS | SA
A → b |AX | aB | a | AS | SA
B→b
X → SA
(5) We have to change the productions S0→ aB, S→ aB, A→ aB
And the final production set, which is in CNF, becomes −
Cont…
S0→ AX | YB | a | AS | SA
S→ AX | YB | a | AS | SA
A → b | AX | YB | a | AS | SA
B→b
X → SA
Y→a
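
A converted grammar can be checked against the three allowed CNF rule shapes. The sketch below is a checker only, not the conversion itself; `rules` and `is_cnf` are hypothetical helper names, and symbols are written space-separated so that multi-character names such as S0 work.

```python
def rules(text):
    """Parse 'A X | Y B | a' into [['A', 'X'], ['Y', 'B'], ['a']]."""
    return [alt.split() for alt in text.split("|")]

def is_cnf(productions, start):
    """Check that every rule is A -> a, A -> BC, or start -> ε."""
    variables = set(productions)
    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 0:
                if head != start:                 # only the start symbol may derive ε
                    return False
            elif len(body) == 1:
                if body[0] in variables:          # unit production
                    return False
            elif len(body) == 2:
                if not all(sym in variables for sym in body):
                    return False                  # terminal mixed into a pair
            else:
                return False                      # more than two symbols
    return True

common = "A X | Y B | a | A S | S A"
final = {
    "S0": rules(common),
    "S": rules(common),
    "A": rules("b | " + common),
    "B": rules("b"),
    "X": rules("S A"),
    "Y": rules("a"),
}
print(is_cnf(final, "S0"))                                   # True
print(is_cnf({"S": rules("a B"), "B": rules("b")}, "S"))     # False
```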

Greibach Normal Form
A CFG is in Greibach Normal Form if the Productions are in the
following forms −
A→b
A → bD1…Dn
S → ε,where A, D1,....,Dn are non-terminals and b is a terminal.
Algorithm to Convert a CFG into Greibach Normal Form
1. If the start symbol S occurs on some right side, create a new start
symbol S’ and a new production S’ → S.
2. Remove Null productions. (Using the Null production removal
algorithm discussed earlier)
3. Remove unit productions. (Using the Unit production removal
algorithm discussed earlier)
4. Remove all direct and indirect left-recursion.
5. Do proper substitutions of productions to convert it into the
proper form of GNF.
Cont…
Example
Convert the following CFG into GNF
S → XY | Xo | p
X → mX | m
Y → Xn | o
Solution
Here, S does not appear on the right side of any production and there are no unit or null productions in the production rule set. So, we can skip Step 1 to Step 3.
Step 4
Now after replacing X in S → XY | Xo | p with mX | m, we obtain
S → mXY | mY | mXo | mo | p.
And after replacing X in Y → Xn | o with the right side of X → mX | m, we obtain
Y → mXn | mn | o.
Two new productions O → o and N → n are added to the production set and then we come to the final GNF as the following −
S → mXY | mY | mXO | mO | p
X → mX | m
Y → mXN | mN | o
O → o
N → n