0% found this document useful (0 votes)
52 views18 pages

Flat Module 3

The document discusses context free grammar and its key concepts. Context free grammar is a formal grammar used to generate strings in a given language. It is defined by variables, terminals, production rules and a start symbol. Strings can be derived by replacing non-terminals using the rules until only terminals remain. Left and right derivations as well as parse trees are explained.

Uploaded by

Priya Rana
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
52 views18 pages

Flat Module 3

The document discusses context free grammar and its key concepts. Context free grammar is a formal grammar used to generate strings in a given language. It is defined by variables, terminals, production rules and a start symbol. Strings can be derived by replacing non-terminals using the rules until only terminals remain. Left and right derivations as well as parse trees are explained.

Uploaded by

Priya Rana
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 18

Module 3

Context free grammer

Context free grammar

Context free grammar is a formal grammar which is used to generate all possible strings in a
given formal language.

Context free grammar G can be defined by four tuples as:

G= (V, T, P, S)

Where,

G describes the grammar

T describes a finite set of terminal symbols.

V describes a finite set of non-terminal symbols

P describes a set of production rules

S is the start symbol.

In CFG, the start symbol is used to derive the string. You can derive the string by repeatedly
replacing a non-terminal by the right hand side of the production, until all non-terminal have
been replaced by terminal symbols.

Example:

L= {wcwR | w € (a, b)*}

Production rules:

1. S → aSa
2. S → bSb
3. S → c

Now check that abbcbba string can be derived from the given CFG.
1. S ⇒ aSa
2. S ⇒ abSba
3. S ⇒ abbSbba
4. S ⇒ abbcbba

By applying the production S → aSa, S → bSb recursively and finally applying the production S
→ c, we get the string abbcbba.

Capabilities of CFG

There are the various capabilities of CFG:

o Context free grammar is useful to describe most of the programming languages.


o If the grammar is properly designed then an efficientparser can be constructed
automatically.
o Using the features of associatively & precedence information, suitable grammars for
expressions can be constructed.
o Context free grammar is capable of describing nested structures like: balanced
parentheses, matching begin-end, corresponding if-then-else's & so on.

CFL

CFLis a language which is generated by a context-free grammar or Type 2


grammar(according to Chomsky classification) and gets accepted by a Pushdown
Automata.
o Some very much important properties of a context-free language is:
o Regularity- context-free languages are Non-Regular PDA language.
Derivation

Derivation is a sequence of production rules. It is used to get the input string through these
production rules. During parsing we have to take two decisions. These are as follows:

o We have to decide the non-terminal which is to be replaced.


o We have to decide the production rule by which the non-terminal will be replaced.

We have two options to decide which non-terminal to be replaced with production rule.
Left-most Derivation

In the left most derivation, the input is scanned and replaced with the production rule from left to
right. So in left most derivatives we read the input string from left to right.

Example:

Production rules:

1. S = S + S
2. S = S - S
3. S = a | b |c

Input:

a-b+c

The left-most derivation is:

1. S = S + S
2. S = S - S + S
3. S = a - S + S
4. S = a - b + S
5. S = a - b + c

Right-most Derivation

In the right most derivation, the input is scanned and replaced with the production rule from right
to left. So in right most derivatives we read the input string from right to left.

Example:
1. S = S + S
2. S = S - S
3. S = a | b |c

Input:

a-b+c

The right-most derivation is:

1. S = S - S
2. S = S - S + S
3. S = S - S + c
4. S = S - b + c
5. S = a - b + c
Parse tree

o Parse tree is the graphical representation of symbol. The symbol can be terminal or non-
terminal.
o In parsing, the string is derived using the start symbol. The root of the parse tree is that
start symbol.
o It is the graphical representation of symbol that can be terminals or non-terminals.
o Parse tree follows the precedence of operators. The deepest sub-tree traversed first. So,
the operator in the parent node has less precedence over the operator in the sub-tree.

The parse tree follows these points:


o All leaf nodes have to be terminals.
o All interior nodes have to be non-terminals.
o In-order traversal gives original input string.

Example:

Production rules:

T= T + T | T * T

1. T = a|b|c

Input:

a*b+c

Step 1:
Step 2:

Step 3:

Step 4:
Step 5:

Ambiguity

A grammar is said to be ambiguous if there exists more than one leftmost derivation or more
than one rightmost derivative or more than one parse tree for the given input string. If the
grammar is not ambiguous then it is called unambiguous.

Example:
1. S = aSb | SS
2. S = ∈

For the string aabb, the above grammar generates two parse trees:

If the grammar has ambiguity then it is not good for a compiler construction. No method can
automatically detect and remove the ambiguity but you can remove ambiguity by re-writing the
whole grammar without ambiguity.

Chomsky's Normal Form (CNF)


CNF stands for Chomsky normal form. A CFG(context free grammar) is in CNF(Chomsky
normal form) if all production rules satisfy one of the following conditions:

o Start symbol generating ε. For example, A → ε.


o A non-terminal generating two non-terminals. For example, S → AB.
o A non-terminal generating a terminal. For example, S → a.

For example:
1. G1 = {S → AB, S → c, A → a, B → b}
2. G2 = {S → aA, A → a, B → c}

The production rules of Grammar G1 satisfy the rules specified for CNF, so the grammar G1 is
in CNF. However, the production rule of Grammar G2 does not satisfy the rules specified for
CNF as S → aZ contains terminal followed by non-terminal. So the grammar G2 is not in CNF.

Steps for converting CFG into CNF

Step 1: Eliminate start symbol from the RHS. If the start symbol T is at the right-hand side of
any production, create a new production as:

1. S1 → S

Where S1 is the new start symbol.

Step 2: In the grammar, remove the null, unit and useless productions. You can refer to
the Simplification of CFG.

Step 3: Eliminate terminals from the RHS of the production if they exist with other non-
terminals or terminals. For example, production S → aA can be decomposed as:

1. S → RA
2. R → a

Step 4: Eliminate RHS with more than two non-terminals. For example, S → ASB can be
decomposed as:

1. S → RS
2. R → AS

Example:

Convert the given CFG to CNF. Consider the given grammar G1:
1. S → a | aA | B
2. A → aBB | ε
3. B → Aa | b

Solution:

Step 1: We will create a new production S1 → S, as the start symbol S appears on the RHS. The
grammar will be:

1. S1 → S
2. S → a | aA | B
3. A → aBB | ε
4. B → Aa | b

Step 2: As grammar G1 contains A → ε null production, its removal from the grammar yields:

1. S1 → S
2. S → a | aA | B
3. A → aBB
4. B → Aa | b | a

Now, as grammar G1 contains Unit production S → B, its removal yield:

1. S1 → S
2. S → a | aA | Aa | b
3. A → aBB
4. B → Aa | b | a

Also remove the unit production S1 → S, its removal from the grammar yields:

1. S0 → a | aA | Aa | b
2. S → a | aA | Aa | b
3. A → aBB
4. B → Aa | b | a

Step 3: In the production rule S0 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, terminal a
exists on RHS with non-terminals. So we will replace terminal a with X:

1. S0 → a | XA | AX | b
2. S → a | XA | AX | b
3. A → XBB
4. B → AX | b | a
5. X → a

Step 4: In the production rule A → XBB, RHS has more than two symbols, removing it from
grammar yield:

1. S0 → a | XA | AX | b
2. S → a | XA | AX | b
3. A → RB
4. B → AX | b | a
5. X → a
6. R → XB

Hence, for the given grammar, this is the required CNF.

Greibach Normal Form (GNF)

GNF stands for Greibach normal form. A CFG(context free grammar) is in GNF(Greibach
normal form) if all the production rules satisfy one of the following conditions:

o A start symbol generating ε. For example, S → ε.


o A non-terminal generating a terminal. For example, A → a.
o A non-terminal generating a terminal which is followed by any number of non-terminals.
For example, S → aASB.

For example:

1. G1 = {S → aAB | aB, A → aA| a, B → bB | b}


2. G2 = {S → aAB | aB, A → aA | ε, B → bB | ε}

The production rules of Grammar G1 satisfy the rules specified for GNF, so the grammar G1 is
in GNF. However, the production rule of Grammar G2 does not satisfy the rules specified for
GNF as A → ε and B → ε contains ε(only start symbol can generate ε). So the grammar G2 is
not in GNF.

Steps for converting CFG into GNF

Step 1: Convert the grammar into CNF.


If the given grammar is not in CNF, convert it into CNF. You can refer the following topic to
convert the CFG into CNF: Chomsky normal form

Step 2: If the grammar exists left recursion, eliminate it.

If the context free grammar contains left recursion, eliminate it. You can refer the following
topic to eliminate left recursion: Left Recursion

Step 3: In the grammar, convert the given production rule into GNF form.

If any production rule in the grammar is not in GNF form, convert it.

Example:
1. S → XB | AA
2. A → a | SA
3. B → b
4. X → a

Solution:

As the given grammar G is already in CNF and there is no left recursion, so we can skip step 1
and step 2 and directly go to step 3.

The production rule A → SA is not in GNF, so we substitute S → XB | AA in the production


rule A → SA as:

1. S → XB | AA
2. A → a | XBA | AAA
3. B → b
4. X → a

The production rule S → XB and B → XBA is not in GNF, so we substitute X → a in the


production rule S → XB and B → XBA as:

1. S → aB | AA
2. A → a | aBA | AAA
3. B → b
4. X → a

Now we will remove left recursion (A → AAA), we get:


1. S → aB | AA
2. A → aC | aBAC
3. C → AAC | ε
4. B → b
5. X → a

Now we will remove null production C → ε, we get:

1. S → aB | AA
2. A → aC | aBAC | a | aBA
3. C → AAC | AA
4. B → b
5. X → a

The production rule S → AA is not in GNF, so we substitute A → aC | aBAC | a | aBA in


production rule S → AA as:

1. S → aB | aCA | aBACA | aA | aBAA


2. A → aC | aBAC | a | aBA
3. C → AAC
4. C → aCA | aBACA | aA | aBAA
5. B → b
6. X → a

The production rule C → AAC is not in GNF, so we substitute A → aC | aBAC | a | aBA in


production rule C → AAC as:

1. S → aB | aCA | aBACA | aA | aBAA


2. A → aC | aBAC | a | aBA
3. C → aCAC | aBACAC | aAC | aBAAC
4. C → aCA | aBACA | aA | aBAA
5. B → b
6. X → a

Hence, this is the GNF form for the grammar G.

Non-deterministic Pushdown Automata


The non-deterministic pushdown automata is very much similar to NFA. We will discuss some
CFGs which accepts NPDA.

The CFG which accepts deterministic PDA accepts non-deterministic PDAs as well. Similarly,
there are some CFGs which can be accepted only by NPDA and not by DPDA. Thus NPDA is
more powerful than DPDA.

Example:

Design PDA for Palindrome strips.

Suppose the language consists of string L = {aba, aa, bb, bab, bbabb, aabaa, ......]. The string can
be odd palindrome or even palindrome. The logic for constructing PDA is that we will push a
symbol onto the stack till half of the string then we will read each symbol and then perform the
pop operation. We will compare to see whether the symbol which is popped is similar to the
symbol which is read. Whether we reach to end of the input, we expect the stack to be empty.

This PDA is a non-deterministic PDA because finding the mid for the given string and reading
the string from left and matching it with from right (reverse) direction leads to non-deterministic
moves. Here is the ID.

Simulation of abaaba
1. δ(q1, abaaba, Z) Apply rule 1
2. ⊢ δ(q1, baaba, aZ) Apply rule 5
3. ⊢ δ(q1, aaba, baZ) Apply rule 4
4. ⊢ δ(q1, aba, abaZ) Apply rule 7
5. ⊢ δ(q2, ba, baZ) Apply rule 8
6. ⊢ δ(q2, a, aZ) Apply rule 7
7. ⊢ δ(q2, ε, Z) Apply rule 11
8. ⊢ δ(q2, ε) Accept
CFG to PDA Conversion

The first symbol on R.H.S. production must be a terminal symbol. The following steps are used
to obtain PDA from CFG is:

Step 1: Convert the given productions of CFG into GNF.

Step 2: The PDA will only have one state {q}.

Step 3: The initial symbol of CFG will be the initial symbol in the PDA.

Step 4: For non-terminal symbol, add the following rule:

1. δ(q, ε, A) = (q, α)

Where the production rule is A → α

Step 5: For each terminal symbols, add the following rule:

1. δ(q, a, a) = (q, ε) for every terminal symbol

Example 1:

Convert the following grammar to a PDA that accepts the same language.

1. S → 0S1 | A
2. A → 1A0 | S | ε

Solution:

The CFG can be first simplified by eliminating unit productions:

1. S → 0S1 | 1S0 | ε
Now we will convert this CFG to GNF:

1. S → 0SX | 1SY | ε
2. X → 1
3. Y → 0

The PDA can be:

R1: δ(q, ε, S) = {(q, 0SX) | (q, 1SY) | (q, ε)}


R2: δ(q, ε, X) = {(q, 1)}
R3: δ(q, ε, Y) = {(q, 0)}
R4: δ(q, 0, 0) = {(q, ε)}
R5: δ(q, 1, 1) = {(q, ε)}

Example 2:

Construct PDA for the given CFG, and test whether 0104 is acceptable by this PDA.

1. S → 0BB
2. B → 0S | 1S | 0

Solution:

The PDA can be given as:

1. A = {(q), (0, 1), (S, B, 0, 1), δ, q, S, ?}

The production rule δ can be:

R1: δ(q, ε, S) = {(q, 0BB)}


R2: δ(q, ε, B) = {(q, 0S) | (q, 1S) | (q, 0)}
R3: δ(q, 0, 0) = {(q, ε)}
R4: δ(q, 1, 1) = {(q, ε)}

Testing 0104 i.e. 010000 against PDA:

1. δ(q, 010000, S) ⊢ δ(q, 010000, 0BB)


2. ⊢ δ(q, 10000, BB) R1
3. ⊢ δ(q, 10000,1SB) R3
4. ⊢ δ(q, 0000, SB) R2
5. ⊢ δ(q, 0000, 0BBB) R1
6. ⊢ δ(q, 000, BBB) R3
7. ⊢ δ(q, 000, 0BB) R2
8. ⊢ δ(q, 00, BB) R3
9. ⊢ δ(q, 00, 0B) R2
10. ⊢ δ(q, 0, B) R3
11. ⊢ δ(q, 0, 0) R2
12. ⊢ δ(q, ε) R3
13. ACCEPT

Thus 0104 is accepted by the PDA.

Example 3:

Draw a PDA for the CFG given below:

1. S → aSb
2. S → a | b | ε

Solution:

The PDA can be given as:

1. P = {(q), (a, b), (S, a, b, z0), δ, q, z0, q}

The mapping function δ will be:

R1: δ(q, ε, S) = {(q, aSb)}


R2: δ(q, ε, S) = {(q, a) | (q, b) | (q, ε)}
R3: δ(q, a, a) = {(q, ε)}
R4: δ(q, b, b) = {(q, ε)}
R5: δ(q, ε, z0) = {(q, ε)}

Simulation: Consider the string aaabb

1. δ(q, εaaabb, S) ⊢ δ(q, aaabb, aSb) R3


2. ⊢ δ(q, εaabb, Sb) R1
3. ⊢ δ(q, aabb, aSbb) R3
4. ⊢ δ(q, εabb, Sbb) R2
5. ⊢ δ(q, abb, abb) R3
6. ⊢ δ(q, bb, bb) R4
7. ⊢ δ(q, b, b) R4
8. ⊢ δ(q, ε, z0) R5
9. ⊢ δ(q, ε)
10. ACCEPT

Pumping lemma for context free language


Pumping lemma for context free language (CFL) is used to prove that a language is not a
Context free language
Assume L is context free language
Then there is a pumping length n such that any string w εL of length>=n can be written as
follows −
|w|>=n
We can break w into 5 strings, w=uvxyz, such as the ones given below

 |vxy| >=n
 |vy| # ε
 For all k>=0, the string uvkxyyz∈L
The steps to prove that the language is not a context free by using pumping lemma are explained
below −

 Assume that L is context free.


 The pumping length is n.
 All strings longer than n can be pumped |w|>=n.
 Now find a string 'w' in L such that |w|>=n.
 Divide w into uvxyz
 Show that uvkxykz ∉L for some k
 Then, consider the ways that w can be divided into uvxyz.
 Show that none of these can satisfy all the 3 pumping conditions at same time.
 w cannot be pumped (contradiction).
Example
Find out whether L={xnynzn|n>=1} is context free or not
Solution
 Let L be context free.
 L must satisfy pumping length, say n.
 Now we can take a string such that s=xnynzn
 We divide s into 5 strings uvxyz.
Let n=4 so, s=x4y4z4
Case 1:
v and y each contain only one type of symbol.
{we are considering only v and y because v and y has power uv2xy2z}
X xx x yyyyz z zz
=uvkxykz when k=2
=uv2xy2z
=xxxxxxyyyyzzzzz
=x6y4z5
(Number of x # number of y #number of z)
Therefore,The resultant string is not satisfying the condition
x6y4z5 ∉ L
If one case fails then no need to check another condition.
Case 2:
Either v or y has more than one kind of symbols
Xx xx yy y y zzzz
=uvkxykz (k=2)
=uv2xy2z
=xxxxyyxxyyyyyzzzz
=x4y2x2y5z2
This string is not following the pattern of our string xnynzn
Deterministic Pushdown Automata

The Deterministic Pushdown Automata is a variation of pushdown automata that accepts the
deterministic context-free languages.

A language L(A) is accepted by a deterministic pushdown automata if and only if there is a


single computation from the initial configuration until an accepting one for all strings belonging
to L(A). It is not as powerful as non-deterministic finite automata. That's why it is less in use and
used only where determinism is much easier to implement.

A PDA is said to be deterministic if its transition function δ(q,a,X) has at most one member for -

a ∈ Σ U {ε}
So, for a deterministic PDA, there is at most one transition possible in any combination of state,
input symbol and stack top.

Closure properties of CFL

Context-free languages are closed under −

 Union
 Concatenation
 Kleene Star operation
Union
Let L1 and L2 be two context free languages. Then L1 ∪ L2 is also context free.
Example
Let L1 = { anbn , n > 0}. Corresponding grammar G1 will have P: S1 → aAb|ab
Let L2 = { cmdm , m ≥ 0}. Corresponding grammar G2 will have P: S2 → cBb| ε
Union of L1 and L2, L = L1 ∪ L2 = { anbn } ∪ { cmdm }
The corresponding grammar G will have the additional production S → S1 | S2
Concatenation
If L1 and L2 are context free languages, then L1L2 is also context free.
Example
Union of the languages L1 and L2, L = L1L2 = { anbncmdm }
The corresponding grammar G will have the additional production S → S1 S2
Kleene Star
If L is a context free language, then L* is also context free.
Example
Let L = { anbn , n ≥ 0}. Corresponding grammar G will have P: S → aAb| ε
Kleene Star L1 = { anbn }*
The corresponding grammar G1 will have additional productions S1 → SS1 | ε
Context-free languages are not closed under −
 Intersection − If L1 and L2 are context free languages, then L1 ∩ L2 is not necessarily
context free.
 Intersection with Regular Language − If L1 is a regular language and L2 is a context
free language, then L1 ∩ L2 is a context free language.
 Complement − If L1 is a context free language, then L1’ may not be context free.

You might also like