The Pumping Lemma For Context Free Grammars
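A context free grammar is in Chomsky Normal Form (CNF) if every rule has the form $A \to BC$ or $A \to a$, where $a$ is any terminal and $A$, $B$, $C$ are any variables, except that $B$ and $C$ may not be the start variable. A rule of the first kind has exactly two variables on its right hand side; a rule of the second kind has exactly one terminal. Exception: the rule $S \to \varepsilon$ is permitted, where $S$ is the start variable.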
Theorem
Any context free language may be generated by a context free grammar in Chomsky Normal Form. To show how this is possible, we must be able to convert any CFG into CNF:
1. Eliminate all ε-rules of the form $A \to \varepsilon$.
2. Eliminate all unit rules of the form $A \to B$.
3. Convert any remaining rules into the form $A \to BC$ (or $A \to a$).
Proof
First add a new start symbol $S_0$ and the rule $S_0 \to S$, where $S$ was the original start symbol. This guarantees that the new start symbol is not on the right hand side (RHS) of any rule. Then remove all ε-rules.
Remove a rule $A \to \varepsilon$, where $A$ is not the start symbol. For each occurrence of $A$ on the RHS of a rule, add a new rule with that occurrence of $A$ deleted. Ex: for $R \to uAv$ we add $R \to uv$. This must be done for each occurrence of $A$, e.g.: $R \to uAvAw$ gains the rules $R \to uvAw \mid uAvw \mid uvw$. If we have the rule $R \to A$, we add $R \to \varepsilon$, unless that rule was previously removed. Repeat until all ε-rules are removed, not including the start variable.
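The ε-rule step can be sketched in Python. This is a minimal sketch under assumptions of our own: a grammar is a dict mapping each variable to a set of right hand sides (tuples of symbols), any symbol that is not a key is a terminal, and the empty tuple stands for ε. Rather than removing one ε-rule at a time, it first computes the nullable variables (those that can derive ε) and then adds every variant of each rule with some nullable occurrences deleted, which gives an equivalent grammar. The names `nullable_vars` and `remove_epsilon_rules` are hypothetical.

```python
from itertools import product

# Assumed representation: variable -> set of right hand sides; () means epsilon.
Grammar = dict[str, set[tuple[str, ...]]]


def nullable_vars(g: Grammar) -> set[str]:
    """Return the variables that can derive the empty string."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in g.items():
            # A variable is nullable if some body consists only of nullable variables.
            if head not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon_rules(g: Grammar, start: str) -> Grammar:
    """Drop A -> epsilon rules, adding every variant with nullable symbols deleted."""
    nullable = nullable_vars(g)
    new_g: Grammar = {head: set() for head in g}
    for head, bodies in g.items():
        for body in bodies:
            # Each nullable occurrence may be kept or deleted; others are always kept.
            options = [((sym,), ()) if sym in nullable else ((sym,),) for sym in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body:  # never re-create an epsilon rule here
                    new_g[head].add(new_body)
    if start in nullable:
        new_g[start].add(())  # only the start variable may keep S -> epsilon
    return new_g


# The example grammar of this section, after adding S0 -> S.
example = {
    "S0": {("S",)},
    "S": {("A", "S", "A"), ("a", "B")},
    "A": {("B",), ("S",)},
    "B": {("b",), ()},
}
print(sorted(remove_epsilon_rules(example, "S0")["S"]))
```

Here both `A` and `B` are nullable, so `S` gains the variants with one or both `A`s deleted and with `B` deleted, including the unit rule `S -> S` that the next step removes.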
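Next remove all unit rules of the form $A \to B$.
Remove such a rule; then whenever a rule $B \to u$ appears, add the rule $A \to u$, unless this is a unit rule that was previously removed. Here $u$ may be a string of variables and terminals. Repeat until all unit rules are eliminated.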
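The unit-rule step can be sketched the same way, again assuming a dict-of-sets grammar representation of our own choosing. Instead of adding rules one unit rule at a time, it computes, for each variable A, every variable B reachable from A through unit rules alone and copies the non-unit rules of each such B to A; this has the same effect as repeating the step until no unit rules remain. `is_unit` and `remove_unit_rules` are hypothetical names.

```python
# Assumed representation: variable -> set of right hand sides (tuples of symbols).
Grammar = dict[str, set[tuple[str, ...]]]


def is_unit(body: tuple[str, ...], g: Grammar) -> bool:
    """A unit rule has exactly one symbol on the right, and that symbol is a variable."""
    return len(body) == 1 and body[0] in g


def remove_unit_rules(g: Grammar) -> Grammar:
    new_g: Grammar = {}
    for head in g:
        # Find every variable B with head =>* B using unit rules only (head included).
        reach, stack = {head}, [head]
        while stack:
            current = stack.pop()
            for body in g[current]:
                if is_unit(body, g) and body[0] not in reach:
                    reach.add(body[0])
                    stack.append(body[0])
        # For each such B and each non-unit rule B -> u, add head -> u.
        new_g[head] = {body for var in reach for body in g[var] if not is_unit(body, g)}
    return new_g


# The example grammar of this section after its epsilon rules are removed.
example = {
    "S0": {("S",)},
    "S": {("A", "S", "A"), ("A", "S"), ("S", "A"), ("S",), ("a", "B"), ("a",)},
    "A": {("B",), ("S",)},
    "B": {("b",)},
}
for head, bodies in remove_unit_rules(example).items():
    print(head, "->", " | ".join("".join(body) for body in sorted(bodies)))
```

Because the self-loop `S -> S` is a unit rule whose target is already reachable, it simply disappears, as in the worked example.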
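Finally, convert all remaining rules into the proper form, with two variables on the right:
The rule $A \to u_1u_2\cdots u_k$, where $k \ge 3$, becomes $A \to u_1A_1$, $A_1 \to u_2A_2$, …, $A_{k-2} \to u_{k-1}u_k$,
where the $A_i$'s are new variables. Each $u_i$ may be a variable or a terminal. In a rule with two or more symbols on the right, a terminal $u_i$ must be replaced by a new variable $U_i$ together with the rule $U_i \to u_i$, since CNF does not allow a mixture of variables and terminals on the right hand side.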
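The last step can be sketched as follows, assuming the same dict-of-sets grammar representation (our own choice) and that ε-rules and unit rules are already gone. Terminals inside longer rules are replaced by new variables, and each rule with three or more symbols is split into the chain $A \to u_1A_1$, $A_1 \to u_2A_2$, …, $A_{k-2} \to u_{k-1}u_k$. The function name `to_binary_form` and the naming scheme `U1`, `A2`, … for new variables are hypothetical.

```python
# Assumed representation: variable -> set of right hand sides (tuples of symbols).
Grammar = dict[str, set[tuple[str, ...]]]


def to_binary_form(g: Grammar) -> Grammar:
    """Last CNF step; assumes epsilon rules and unit rules were already removed."""
    new_g: Grammar = {head: set() for head in g}
    counter = 0
    term_var: dict[str, str] = {}  # terminal -> new variable that derives only it

    def fresh(prefix: str) -> str:
        """Return an unused variable name such as A1 or U2 (hypothetical naming)."""
        nonlocal counter
        while True:
            counter += 1
            name = f"{prefix}{counter}"
            if name not in g and name not in new_g:
                return name

    def as_variable(sym: str) -> str:
        if sym in g:  # already a variable
            return sym
        if sym not in term_var:  # add U -> sym once per terminal
            term_var[sym] = fresh("U")
            new_g[term_var[sym]] = {(sym,)}
        return term_var[sym]

    for head, bodies in g.items():
        for body in bodies:
            if len(body) <= 1:  # A -> a (or S0 -> epsilon) is already allowed
                new_g[head].add(body)
                continue
            syms = [as_variable(sym) for sym in body]
            left = head
            # A -> u1 A1, A1 -> u2 A2, ..., A_{k-2} -> u_{k-1} u_k
            while len(syms) > 2:
                right = fresh("A")
                new_g[left].add((syms[0], right))
                new_g[right] = set()
                left, syms = right, syms[1:]
            new_g[left].add(tuple(syms))
    return new_g


result = to_binary_form({"S": {("a", "S", "b"), ("a", "b")}})
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(body) for body in bodies))
```

A hand conversion can share one new variable among identical rule tails; this sketch does not, so it may produce more variables than strictly necessary, but the grammar it returns is equivalent.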
Example
Convert the following grammar into CNF:
$S \to ASA \mid aB$
$A \to B \mid S$
$B \to b \mid \varepsilon$
First add a new start symbol $S_0$:
$S_0 \to S$
$S \to ASA \mid aB$
$A \to B \mid S$
$B \to b \mid \varepsilon$
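Next remove the ε-rule $B \to \varepsilon$. Deleting $B$ from $S \to aB$ adds $S \to a$, and because of the rule $A \to B$ we add $A \to \varepsilon$:
$S_0 \to S$
$S \to ASA \mid aB \mid a$
$A \to B \mid S \mid \varepsilon$
$B \to b$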
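Next remove the ε-rule $A \to \varepsilon$. Deleting one or both occurrences of $A$ in $S \to ASA$ adds $S \to AS \mid SA \mid S$. Then remove unit rules, starting with $S_0 \to S$; the rule $S \to S$ can simply be removed:
$S_0 \to ASA \mid aB \mid a \mid AS \mid SA$
$S \to ASA \mid aB \mid a \mid AS \mid SA$
$A \to B \mid S$
$B \to b$
Removing the unit rule $A \to B$ then gives:
$S_0 \to ASA \mid aB \mid a \mid AS \mid SA$
$S \to ASA \mid aB \mid a \mid AS \mid SA$
$A \to b \mid S$
$B \to b$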
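After also removing the unit rule $A \to S$, we have the grammar below. Finally, convert the remaining rules to the proper form by adding variables and rules wherever a right hand side has more than two symbols or mixes a terminal with a variable:
$S_0 \to ASA \mid aB \mid a \mid AS \mid SA$
$S \to ASA \mid aB \mid a \mid AS \mid SA$
$A \to b \mid ASA \mid aB \mid a \mid AS \mid SA$
$B \to b$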
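This becomes:
$S_0 \to AA_1 \mid A_2B \mid a \mid AS \mid SA$
$S \to AA_1 \mid A_2B \mid a \mid AS \mid SA$
$A \to b \mid AA_1 \mid A_2B \mid a \mid AS \mid SA$
$A_1 \to SA$
$A_2 \to a$
$B \to b$
We are done!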
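One way to sanity-check a converted grammar is to run a membership test on it. The sketch below uses the CYK algorithm, a standard membership algorithm for CNF grammars that is not part of the conversion itself, on the final grammar of this example, written out in full in `GRAMMAR` (with $A_1 \to SA$ and $A_2 \to a$ as the new variables). The function name `cyk` and the dict representation are our own assumptions. By hand, the original grammar derives a, ab, aa, ba, bab and abb, but not b, bb or the empty string (every derivation from $S$ produces an a), so the converted grammar should agree.

```python
# Final CNF grammar of the example: variable -> set of right hand sides.
GRAMMAR: dict[str, set[tuple[str, ...]]] = {
    "S0": {("A", "A1"), ("A2", "B"), ("a",), ("A", "S"), ("S", "A")},
    "S": {("A", "A1"), ("A2", "B"), ("a",), ("A", "S"), ("S", "A")},
    "A": {("b",), ("A", "A1"), ("A2", "B"), ("a",), ("A", "S"), ("S", "A")},
    "A1": {("S", "A")},
    "A2": {("a",)},
    "B": {("b",)},
}


def cyk(g: dict[str, set[tuple[str, ...]]], start: str, s: str) -> bool:
    """CYK membership test for a grammar in Chomsky Normal Form."""
    n = len(s)
    if n == 0:
        return () in g[start]  # only a start rule S0 -> epsilon derives the empty string
    # table[i][l - 1] holds the variables that derive s[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        table[i][0] = {head for head, bodies in g.items() if (ch,) in bodies}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, bodies in g.items():
                    if any(len(b) == 2 and b[0] in left and b[1] in right for b in bodies):
                        table[i][length - 1].add(head)
    return start in table[0][n - 1]


for word in ["a", "ab", "bab", "b", "bb", ""]:
    print(repr(word), cyk(GRAMMAR, "S0", word))
```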
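Lemma: in a parse tree of a CNF grammar, if the longest path from the root to a leaf has length n (counting edges), then the yield has length at most $2^{n-1}$. The proof is by induction on n. Base case n = 1: the tree is a single rule $A \to a$, whose yield has length $1 = 2^0$. Inductive step: suppose the longest path has length n, where n > 1. The root uses a production that must be of the form $A \to BC$, since we can't have a terminal from the root (that would give a path of length 1). The longest paths in the subtrees rooted at B and C have length at most $n-1$, since we used one of the edges from the root to reach these subtrees, so by induction their yields have length at most $2^{n-2}$. The yield of the entire tree is the concatenation of these two yields, which has length at most $2^{n-2} + 2^{n-2} = 2 \cdot 2^{n-2} = 2^{n-1}$.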
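The bound can be checked mechanically for small n. The sketch below (hypothetical function `possible_yield_lengths`) enumerates which yield lengths a CNF parse tree with longest path at most n can have, following the same case split as the induction: a single rule $A \to a$ when n = 1, otherwise a root rule $A \to BC$ whose two subtrees have longest paths of at most $n-1$. Checking small values of n only illustrates the lemma; it does not replace the inductive proof.

```python
def possible_yield_lengths(n: int) -> set[int]:
    """Yield lengths of CNF parse trees whose longest root-to-leaf path has at most n edges."""
    if n == 1:
        return {1}  # the tree is a single rule A -> a
    shorter = possible_yield_lengths(n - 1)
    # Either the tree already fits in n - 1 edges, or the root uses A -> BC and
    # the two subtrees (each with at most n - 1 edges) have their yields concatenated.
    return shorter | {left + right for left in shorter for right in shorter}


for n in range(1, 7):
    print(n, max(possible_yield_lengths(n)), 2 ** (n - 1))
```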
Informally
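The pumping lemma for CFLs states that for sufficiently long strings in a CFL, we can find two short, nearby substrings that we can pump in tandem, and the resulting string must also be in the language.
Formally: if L is a CFL, then there is a constant p such that every string z in L with $|z| \ge p$ can be written as $z = uvwxy$, where (1) $|vwx| \le p$, (2) $vx \ne \varepsilon$, and (3) for all $i \ge 0$, $uv^iwx^iy \in L$.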
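Let m be the number of variables in a CNF grammar for L, and choose $p = 2^m$. By the lemma, a parse tree whose longest path has length at most m has a yield of length at most $2^{m-1} < p$, while a longest path of length $m+1$ allows a yield of up to $2^m = p$. So if $|z| \ge p$, any parse tree that yields z must have a path of length at least $m+1$. This is illustrated in the following figure: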
[Figure: Parse Tree. A parse tree for $z = uvwxy$ with $|z| \ge p$, showing a longest path labeled $A_0, A_1, A_2, \ldots, A_k$.]
The variables on this path are $A_0, A_1, \ldots, A_k$, followed by a terminal leaf. If $k \ge m$, then at least two of these $k+1$ variables must be the same, since there are only m distinct variables.
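Suppose the variables are the same at $A_i = A_j$, where $k - m \le i < j \le k$. Such a pair exists because the last $m+1$ variables on the path, $A_{k-m}, \ldots, A_k$, must already contain a repeat.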
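[Figure: the parse tree rooted at $A_0$, with $A_j$ below $A_i$ on the path; the leaves read u, v, w, x, y from left to right.]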
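The pigeonhole step can be sketched as a small search. Assuming the path is given as a list of variable names $A_0, \ldots, A_k$ (terminal leaf omitted) and m is the number of variables in the grammar, the hypothetical helper `find_repeat` looks only at the last $m+1$ variables, $A_{k-m}, \ldots, A_k$, and returns the first repeated pair it meets, so the indices satisfy $k - m \le i < j \le k$.

```python
def find_repeat(path: list[str], m: int) -> tuple[int, int]:
    """Return (i, j) with i < j and path[i] == path[j], among A_{k-m}, ..., A_k."""
    k = len(path) - 1
    if k < m:
        raise ValueError("path too short: the pigeonhole argument needs k >= m")
    first_seen: dict[str, int] = {}
    # m + 1 positions but only m distinct variables, so some name must repeat.
    for idx in range(k - m, k + 1):
        if path[idx] in first_seen:
            return first_seen[path[idx]], idx
        first_seen[path[idx]] = idx
    raise ValueError("path uses more than m distinct variables")


print(find_repeat(["S", "A", "B", "S", "A"], 3))
```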
Pumping Lemma
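Condition 2: $vx \ne \varepsilon$. This follows since the production used at $A_i$ leads down to $A_j$, so it can't be a terminal rule, or there would be no $A_j$. Therefore it has the form $A_i \to BC$, with two variables; one of these must lead to $A_j$ and the other must lead to v or x or both. That other variable derives a nonempty string, since the only ε-rule a CNF grammar may have is for the start variable, which never appears on a right hand side. This means v and x cannot both be empty, but one of them might be.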
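Condition 1 stated that $|vwx| \le p$. The string vwx is the yield of the subtree rooted at $A_i$. Because the path is a longest path in the whole tree and $i \ge k - m$, the longest path in the subtree rooted at $A_i$ has length $k - i + 1 \le m + 1$ (the extra edge leads to the terminal leaf). So by the lemma, $|vwx| \le 2^{(m+1)-1} = 2^m = p$. ($A_i$ could be $A_0$, in which case vwx is the yield of the entire tree.)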
Condition 3 stated that for all $i \ge 0$, $uv^iwx^iy$ is also in L. We can show this by noting that the variables $A_i$ and $A_j$ are the same. This means we can substitute the subtree rooted at one for the subtree rooted at the other. Substituting the subtree rooted at $A_j$ for the one rooted at $A_i$ gives a parse tree for $uwy = uv^0wx^0y$, so this string must be in L.
Substituting the subtree rooted at $A_i$ for the one rooted at $A_j$, and repeating, gives parse trees for $uv^2wx^2y$, $uv^3wx^3y$, etc.; the original tree yields $uv^1wx^1y = z$.
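The cut-and-paste argument can be made concrete on a small tree. The sketch below assumes a CNF grammar of our own for $\{a^nb^n \mid n \ge 1\}$ ($S \to AT \mid AB$, $T \to SB$, $A \to a$, $B \to b$) and the parse tree of aabb, whose path S, T, S repeats S; here u and y are empty, v = a, w = ab and x = b. Trees are nested tuples `(label, child, ...)` with terminals as plain strings, and paths are lists of child indices; these conventions and the function names are hypothetical. `pump(tree, ai_path, aj_path, i)` puts the subtree rooted at $A_j$ in place of $A_i$ when i = 0, and otherwise nests the $A_i$ subtree inside itself until v and x appear i times.

```python
# A tree is (label, child, child, ...); a terminal leaf is a plain string.
def yield_of(tree) -> str:
    """Concatenate the leaves of a parse tree from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(yield_of(child) for child in tree[1:])


def subtree(tree, path: list[int]):
    """Follow a list of child indices from the root."""
    for k in path:
        tree = tree[1 + k]
    return tree


def replace(tree, path: list[int], new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    children = list(tree[1:])
    children[path[0]] = replace(children[path[0]], path[1:], new)
    return (tree[0], *children)


def pump(tree, ai_path: list[int], aj_path: list[int], i: int):
    """Build the parse tree for u v^i w x^i y, where A_i and A_j have the same label."""
    sub_i = subtree(tree, ai_path)
    rel = aj_path[len(ai_path):]  # path from A_i down to A_j
    sub_j = subtree(sub_i, rel)
    assert sub_i[0] == sub_j[0], "A_i and A_j must be the same variable"
    pumped = sub_j  # i = 0: the A_j subtree alone, yielding w
    for _ in range(i):  # each round wraps another copy of v ... x around it
        pumped = replace(sub_i, rel, pumped)
    return replace(tree, ai_path, pumped)


# Parse tree of "aabb" for S -> AT | AB, T -> SB, A -> a, B -> b.
tree = ("S", ("A", "a"), ("T", ("S", ("A", "a"), ("B", "b")), ("B", "b")))
for i in range(4):
    print(i, yield_of(pump(tree, [], [1, 0], i)))
```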
Pumping Lemma
We have now shown all conditions of the pumping lemma for context free languages. To show a language is not context free, we:
1. Pick a language L that we want to show is not a CFL, and assume that it is a CFL. Then some pumping length p must exist (for a CNF grammar with m variables, $p = 2^m$).
2. Pick a string z in L with $|z| \ge p$; we may use p as a parameter.
3. Consider every way to break z into uvwxy subject to the pumping lemma constraints: $|vwx| \le p$ and $|vx| > 0$.
4. We win by picking, for each such split, an i such that $uv^iwx^iy$ is not in L. This contradicts the lemma, therefore L is not context free.
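The game can be played by brute force for a fixed, small p. The sketch below enumerates every split $z = uvwxy$ with $|vwx| \le p$ and $|vx| > 0$ and asks whether each one leaves L for some i in a chosen set of pump values. The helpers `splits` and `every_split_fails`, the membership predicates, and the default pumps (0, 2) are our own assumptions. The lemma's p is unknown, so a check at a few fixed values of p is evidence, not a proof; the contrast with the context free language $\{0^n1^n \mid n \ge 1\}$ shows a split that survives.

```python
from typing import Callable, Iterator

Split = tuple[str, str, str, str, str]


def splits(z: str, p: int) -> Iterator[Split]:
    """Yield every z = uvwxy with |vwx| <= p and |vx| > 0."""
    n = len(z)
    for start in range(n + 1):
        for end in range(start, min(n, start + p) + 1):  # vwx = z[start:end]
            for a in range(start, end + 1):
                for b in range(a, end + 1):
                    v, w, x = z[start:a], z[a:b], z[b:end]
                    if v or x:
                        yield z[:start], v, w, x, z[end:]


def every_split_fails(in_lang: Callable[[str], bool], z: str, p: int,
                      pumps: tuple[int, ...] = (0, 2)) -> bool:
    """True if each legal split has some i in pumps with u v^i w x^i y outside L."""
    return all(
        any(not in_lang(u + v * i + w + x * i + y) for i in pumps)
        for u, v, w, x, y in splits(z, p)
    )


def in_012(s: str) -> bool:  # { 0^n 1^n 2^n | n >= 1 }
    n = len(s) // 3
    return n >= 1 and s == "0" * n + "1" * n + "2" * n


def in_01(s: str) -> bool:  # { 0^n 1^n | n >= 1 }, which is context free
    n = len(s) // 2
    return n >= 1 and s == "0" * n + "1" * n


print(every_split_fails(in_012, "0" * 4 + "1" * 4 + "2" * 4, 4))
print(every_split_fails(in_01, "0" * 4 + "1" * 4, 4))
```

For $0^41^4$ the split $u = 000$, $v = 0$, $w = \varepsilon$, $x = 1$, $y = 111$ keeps every pumped string in the language, so the second call reports that not every split fails.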
Example 1
Let L be the language $\{0^n1^n2^n \mid n \ge 1\}$. Show that this language is not a CFL. Suppose that L is a CFL. Then some integer p exists, and we pick $z = 0^p1^p2^p$. Since $z = uvwxy$ and $|vwx| \le p$, we know that the string vwx must consist of either:
- all 0s
- all 1s
- all 2s
- a combination of 0s and 1s
- a combination of 1s and 2s
The string vwx cannot contain 0s, 1s, and 2s, because it is not long enough to span all three symbols: to contain both a 0 and a 2 it would have to include all p 1s, giving $|vwx| \ge p + 2$. Now pump down with i = 0. This results in the string uwy, which can no longer contain an equal number of 0s, 1s, and 2s: v and x are not both empty and together contain at most two of the three symbols, so at least one symbol count drops while another stays at p. Therefore the result is not in L, and L is not a CFL.
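The two claims in this argument can be checked by brute force for small p: no legal vwx contains all three symbols, and pumping down always leaves L. The sketch below enumerates every split with $|vwx| \le p$ and $|vx| > 0$; the helper names `splits`, `in_L` and `example1_holds` are hypothetical, and checking p = 1 through 6 illustrates the argument rather than proving it for every p.

```python
from typing import Iterator


def splits(z: str, p: int) -> Iterator[tuple[str, str, str, str, str]]:
    """Yield every z = uvwxy with |vwx| <= p and |vx| > 0."""
    n = len(z)
    for start in range(n + 1):
        for end in range(start, min(n, start + p) + 1):
            for a in range(start, end + 1):
                for b in range(a, end + 1):
                    v, w, x = z[start:a], z[a:b], z[b:end]
                    if v or x:
                        yield z[:start], v, w, x, z[end:]


def in_L(s: str) -> bool:
    """Membership in { 0^n 1^n 2^n | n >= 1 }."""
    n = len(s) // 3
    return n >= 1 and s == "0" * n + "1" * n + "2" * n


def example1_holds(p: int) -> bool:
    z = "0" * p + "1" * p + "2" * p
    for u, v, w, x, y in splits(z, p):
        if set("012") <= set(v + w + x):  # vwx would span all three symbols
            return False
        if in_L(u + w + y):  # pumping down (i = 0) would stay in L
            return False
    return True


for p in range(1, 7):
    print(p, example1_holds(p))
```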
Example 2
Let L be the language $\{a^ib^jc^k \mid 0 \le i \le j \le k\}$. Show that this language is not a CFL. This language is similar to the previous one, except that proving it is not context free requires the examination of more cases. Suppose that L is a CFL. Pick $z = a^pb^pc^p$ as we did with the previous language. As before, the string vwx cannot contain all three of a's, b's, and c's. We then pump the string depending on the string vwx as follows:
- There are no a's. Then we pump down to obtain the string $uv^0wx^0y = uwy$. This contains the same number of a's, but fewer b's or fewer c's. Therefore it is not in L.
- There are no b's but there are a's. Then we pump up to obtain the string $uv^2wx^2y$, which has more a's than b's, so it is not in L.
- There are no b's but there are c's. Then we pump down to obtain the string uwy. This string contains the same number of b's but fewer c's, therefore it is not in L.
- There are no c's. Then we pump up to obtain the string $uv^2wx^2y$, which has more b's or more a's than c's, so it is not in L.
Since we can derive a contradiction in every case, this language is not a CFL.
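The case analysis can be checked by brute force for small p. The sketch below enumerates every split of $a^pb^pc^p$ with $|vwx| \le p$ and $|vx| > 0$, chooses i exactly as the cases do (pump down when vwx has no a's, which also covers "no b's but c's"; pump up otherwise), and confirms the result leaves L. The helpers `splits`, `in_L`, `chosen_i` and `example2_holds` are hypothetical names, and the check for p = 1 through 6 illustrates the argument without proving it for every p.

```python
import re
from typing import Iterator


def splits(z: str, p: int) -> Iterator[tuple[str, str, str, str, str]]:
    """Yield every z = uvwxy with |vwx| <= p and |vx| > 0."""
    n = len(z)
    for start in range(n + 1):
        for end in range(start, min(n, start + p) + 1):
            for a in range(start, end + 1):
                for b in range(a, end + 1):
                    v, w, x = z[start:a], z[a:b], z[b:end]
                    if v or x:
                        yield z[:start], v, w, x, z[end:]


def in_L(s: str) -> bool:
    """Membership in { a^i b^j c^k | 0 <= i <= j <= k }."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) <= len(m.group(2)) <= len(m.group(3))


def chosen_i(vwx: str) -> int:
    """The pump picked by the case analysis."""
    if "a" not in vwx:
        return 0  # "no a's" and "no b's but c's": pump down
    return 2  # "no b's but a's" and "no c's": pump up


def example2_holds(p: int) -> bool:
    z = "a" * p + "b" * p + "c" * p
    for u, v, w, x, y in splits(z, p):
        vwx = v + w + x
        if set("abc") <= set(vwx):
            return False  # vwx spanning all three letters would break the argument
        i = chosen_i(vwx)
        if in_L(u + v * i + w + x * i + y):
            return False  # the chosen pump would stay inside L
    return True


for p in range(1, 7):
    print(p, example2_holds(p))
```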
Example 3
Let L be the language $\{ww \mid w \in \{0,1\}^*\}$. Show that this language is not a CFL. As before, assume that L is context free and let p be the pumping length. This time choosing the string z is less obvious. One possibility is the string $0^p10^p1$. It is in L and has length greater than p, so it appears to be a good candidate. But this string can be pumped as follows, so it is not adequate for our purposes:
$u = 0^{p-1}$, $v = 0$, $w = 1$, $x = 0$, $y = 0^{p-1}1$.
Here $|vwx| = 3 \le p$ (assuming $p \ge 3$), and $uv^iwx^iy = 0^{p-1+i}10^{p-1+i}1$, which is in L for every $i \ge 0$.
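A quick check confirms that this split really does survive pumping. The sketch below assumes $p \ge 3$ so that $|vwx| \le p$; the helper names `in_ww` and `pumped` are hypothetical, and the membership test only compares the two halves (it does not check the alphabet).

```python
def in_ww(s: str) -> bool:
    """Membership in { ww }: even length and equal halves."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]


def pumped(p: int, i: int) -> str:
    """u v^i w x^i y for the split of 0^p 1 0^p 1 described above (needs p >= 3)."""
    u, v, w, x, y = "0" * (p - 1), "0", "1", "0", "0" * (p - 1) + "1"
    return u + v * i + w + x * i + y


for p in range(3, 7):
    print(p, all(in_ww(pumped(p, i)) for i in range(6)))
```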
This time let's try $z = 0^p1^p0^p1^p$ instead. We can show that this string cannot be pumped. We know that $|vwx| \le p$.
Let's say that vwx lies within the first p 0s. If we pump z up to $uv^2wx^2y$, then we have introduced more 0s into the first block without changing the others, and the result is not in L. We get a similar result if vwx consists of all 0s or all 1s in either the first or second half. More generally, if vwx lies anywhere in the first half of z, then pumping z up to $uv^2wx^2y$ moves a 1 into the first position of the second half, so the result cannot be of the form ww and is not in L. Similarly, if vwx lies in the second half of z, then pumping z up to $uv^2wx^2y$ moves a 0 into the last position of the first half, so it cannot be of the form ww either. This only leaves the possibility that vwx straddles the midpoint of z. But in this case we can pump the string down: $uv^0wx^0y = uwy$ has the form $0^p1^i0^j1^p$, where i and j cannot both equal p. This string is not of the form ww, so z cannot be pumped, and L is therefore not a CFL.
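The conclusion can be checked by brute force for small p. The sketch below enumerates every split of $0^p1^p0^p1^p$ with $|vwx| \le p$ and $|vx| > 0$ and confirms that pumping up ($i = 2$) or pumping down ($i = 0$) leaves L, as the case analysis claims. The helpers `splits`, `in_ww` and `cannot_be_pumped` are hypothetical names, and the check for p = 1 through 6 illustrates the argument without proving it for every p.

```python
from typing import Iterator


def splits(z: str, p: int) -> Iterator[tuple[str, str, str, str, str]]:
    """Yield every z = uvwxy with |vwx| <= p and |vx| > 0."""
    n = len(z)
    for start in range(n + 1):
        for end in range(start, min(n, start + p) + 1):
            for a in range(start, end + 1):
                for b in range(a, end + 1):
                    v, w, x = z[start:a], z[a:b], z[b:end]
                    if v or x:
                        yield z[:start], v, w, x, z[end:]


def in_ww(s: str) -> bool:
    """Membership in { ww }: even length and equal halves."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]


def cannot_be_pumped(p: int, pumps: tuple[int, ...] = (0, 2)) -> bool:
    """True if every legal split of 0^p 1^p 0^p 1^p leaves L for some i in pumps."""
    z = "0" * p + "1" * p + "0" * p + "1" * p
    return all(
        any(not in_ww(u + v * i + w + x * i + y) for i in pumps)
        for u, v, w, x, y in splits(z, p)
    )


for p in range(1, 7):
    print(p, cannot_be_pumped(p))
```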