19 ContextFreeHandout
Proof:
(1) There are a countably infinite number of context-free languages. This is true because every description of a context-free
language is of finite length, so there are a countably infinite number of such descriptions.
(2) There are an uncountable number of languages.
Thus there are more languages than there are context-free languages.
Example: {a^n b^n c^n}
Showing that a Language is Context-Free
Unfortunately, these are weaker than they are for regular languages.
Lecture Notes 19 Languages That Are and Are Not Context Free 1
The Context-Free Languages are Closed Under Kleene Star
Let L = L(G1)*
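One common way to realize this closure is to add a fresh start symbol that derives either ε or one more copy of the original language. The sketch below assumes that construction; the rule format (a list of (left-hand side, right-hand side tuple) pairs) and the name S_star are illustrative choices, not taken from the notes.

```python
def kleene_star(rules, start, new_start="S_star"):
    """Return (rules, start symbol) of a grammar for L(G1)*.

    rules: list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    new_start: hypothetical name for the fresh start symbol; it must not
    already occur in the grammar.
    """
    used = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    if new_start in used:
        raise ValueError("new start symbol must be fresh")
    star_rules = [
        (new_start, ()),                    # S_star -> epsilon
        (new_start, (start, new_start)),    # S_star -> S1 S_star
    ]
    return star_rules + list(rules), new_start


# G1 generates {a^n b^n : n >= 0}
g1 = [("S1", ("a", "S1", "b")), ("S1", ())]
g_star, s_star = kleene_star(g1, "S1")
```

The original rules are left untouched, so every derivation of G1 is still available below each copy of S1.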
L1 ∩ L2 = ¬(¬L1 ∪ ¬L2), where ¬L denotes the complement of L
We proved closure for regular languages two different ways. Can we use either of them here?
1. Given a deterministic automaton for L, construct an automaton for its complement. Argue that, if closed under complement
and union, must be closed under intersection.
2. Given automata for L1 and L2, construct a new automaton for L1 ∩ L2 by simulating the parallel operation of the two original
machines, using states that are the Cartesian product of the sets of states of the two original machines.
We construct a new PDA, M3, that accepts L ∩ R by simulating the parallel execution of M1 and M2.
Insert into ∆:
This works because: we can get away with only one stack.
Example
L = a^n b^n ∩ (aa)*(bb)*
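[Figure: state diagrams of a PDA with states A and B (edge labels a//a and b/a/) and an FSM with states 1, 2, 3, 4.]
Transitions of the PDA:
((A, a, ε), (A, a))
((A, b, a), (B, ε))
((B, b, a), (B, ε))
Transitions of the FSM:
(1, a, 2)
(1, b, 3)
(2, a, 1)
(3, b, 4)
(4, b, 3)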
A PDA for L:
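One possible way to build such a PDA is the parallel-simulation idea described earlier: pair each PDA state with an FSM state and let the single stack follow the PDA. The sketch below is only an illustration under stated assumptions: the start states (A and 1) and final states ({A, B} and {1, 4}) are not given in the text above and are assumed here, and acceptance is taken to mean "input consumed, stack empty, final state".

```python
EPS = ""  # stands for ε

# PDA transitions in the notes' format: ((state, input, pop), (new state, push))
PDA_DELTA = [
    (("A", "a", EPS), ("A", "a")),
    (("A", "b", "a"), ("B", EPS)),
    (("B", "b", "a"), ("B", EPS)),
]
PDA_START, PDA_FINALS = "A", {"A", "B"}      # assumed, not from the notes

# FSM transitions: (state, input, new state)
FSM_DELTA = [(1, "a", 2), (1, "b", 3), (2, "a", 1), (3, "b", 4), (4, "b", 3)]
FSM_STATES = {1, 2, 3, 4}
FSM_START, FSM_FINALS = 1, {1, 4}            # assumed, not from the notes


def product_pda(pda_delta, fsm_delta, fsm_states):
    """Transitions of a PDA whose states are (PDA state, FSM state) pairs."""
    delta = []
    for (q1, sym, pop), (p1, push) in pda_delta:
        if sym == EPS:
            # the PDA moves without reading input, so the FSM stays put
            for q2 in fsm_states:
                delta.append((((q1, q2), EPS, pop), ((p1, q2), push)))
        else:
            # both machines read the same symbol in lockstep
            for q2, c, p2 in fsm_delta:
                if c == sym:
                    delta.append((((q1, q2), sym, pop), ((p1, p2), push)))
    return delta


def accepts(delta, start, finals, w):
    """Search the configurations (state, input position, stack).

    The stack is a string with its top at index 0. The search terminates
    here because this example has no ε-moves that push.
    """
    frontier = [(start, 0, "")]
    seen = set(frontier)
    while frontier:
        state, i, stack = frontier.pop()
        if i == len(w) and stack == "" and state in finals:
            return True
        for (q, sym, pop), (p, push) in delta:
            if q != state or not stack.startswith(pop):
                continue
            if sym != EPS and (i >= len(w) or w[i] != sym):
                continue
            step = 0 if sym == EPS else 1
            nxt = (p, i + step, push + stack[len(pop):])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


DELTA3 = product_pda(PDA_DELTA, FSM_DELTA, FSM_STATES)
START3 = (PDA_START, FSM_START)
FINALS3 = {(q1, q2) for q1 in PDA_FINALS for q2 in FSM_FINALS}


def in_intersection(w):
    return accepts(DELTA3, START3, FINALS3, w)
```

Under these assumptions, in_intersection accepts exactly the strings with an even number of a's followed by the same number of b's, for example aabb but not ab.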
Don’t Try to Use Closure Backwards
L3 = L1 ∪ L2.
But what if L3 and L1 are context free? What can we say about L2?
L3 = L1 ∪ L2.
Example:
This time we use parse trees, not automata as the basis for our argument.
u v x y z
If L is a context-free language, and if w is a string in L where |w| > K, for some value of K, then w can be rewritten as uvxyz,
where |vy| > 0 and |vxy| ≤ M, for some value of M.
uxz, uvxyz, uvvxyyz, uvvvxyyyz, etc. (i.e., uv^n x y^n z, for n ≥ 0) are all in L.
Some Tree Basics
root
height
nodes
leaves
yield
Theorem: The length of the yield of any tree T with height H and branching factor (fanout) B is ≤ B^H.
Proof: By induction on H. If H is 1, then just a single rule applies. By definition of fanout, the longest yield is B.
Assume true for H = n.
Consider a tree with H = n + 1. It consists of a root, and some number of subtrees, each of which is of height ≤ n (so induction
hypothesis holds) and yield ≤ B^n. The number of subtrees ≤ B. So the yield must be ≤ B(B^n) or B^(n+1).
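The bound can be checked on concrete trees. The sketch below assumes a hypothetical representation that is not in the notes: an interior node is a non-empty Python list of children and a leaf is a string, so a root whose children are all leaves has height 1, matching the base case of the proof.

```python
def height(tree):
    """Number of edges on the longest root-to-leaf path (a leaf has height 0)."""
    if not isinstance(tree, list):
        return 0
    return 1 + max(height(child) for child in tree)


def fanout(tree):
    """Largest number of children of any node in the tree."""
    if not isinstance(tree, list):
        return 0
    return max(len(tree), max(fanout(child) for child in tree))


def yield_length(tree):
    """Number of leaves, i.e. the length of the yield."""
    if not isinstance(tree, list):
        return 1
    return sum(yield_length(child) for child in tree)


def full_tree(b, h):
    """Complete tree with fanout b and height h; it meets the bound exactly."""
    return "x" if h == 0 else [full_tree(b, h - 1) for _ in range(b)]


example = [["a", "b"], ["c", ["d", "e", "f"]]]
bound_holds = yield_length(example) <= fanout(example) ** height(example)
```

For the example tree the height is 3, the fanout is 3, and the yield has length 6, well under 3^3 = 27; full_tree(2, 3) has a yield of length exactly 2^3 = 8.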
What Is K?
S
u v x y z
So K = B^T, where T is the number of nonterminals in G and B is the branching factor (fanout).
What is M?
u v x y z
Assume that we are considering the bottommost two occurrences of some nonterminal. Then the yield of the upper one is at
most B^(T+1) (since only one nonterminal repeats).
So M = B^(T+1).
The Context-Free Pumping Lemma
Theorem: Let G = (V, Σ, R, S) be a context-free grammar with T nonterminal symbols and fanout B. Then any string w ∈ L(G)
where |w| > K (K = B^T) can be rewritten as w = uvxyz in such a way that:
• |vy| > 0,
• |vxy| ≤ M (M = B^(T+1)), (making this the "strong" form),
• for every n ≥ 0, uv^n x y^n z is in L(G).
Proof:
Let w be such a string and let P be the parse tree with root labeled S and with yield w that has the smallest number of leaves
among all parse trees with the same root and yield. (We write P for the tree to avoid a clash with T, the number of nonterminals.)
Since |w| > B^T, P has a path of length at least T+1, so some nonterminal repeats on that path; call the bottommost repeated
nonterminal A. Clearly v and y can be repeated any number of times (including 0). If |vy| = 0, then there would be a tree with
root S and yield w with fewer leaves than P. Finally, |vxy| ≤ B^(T+1).
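To make the constants concrete, the following sketch computes T, B, K = B^T and M = B^(T+1) for a small grammar. The rule format (pairs of a left-hand side and a tuple of right-hand-side symbols) is an assumption made for illustration, and T is taken to be the number of distinct left-hand sides.

```python
def pumping_constants(rules):
    """Return (T, B, K, M) for a grammar given as (lhs, rhs tuple) pairs.

    T: number of distinct nonterminals (here: distinct left-hand sides)
    B: fanout, the length of the longest right-hand side
    K = B**T and M = B**(T + 1), as in the theorem above
    """
    nonterminals = {lhs for lhs, _ in rules}
    t = len(nonterminals)
    b = max(len(rhs) for _, rhs in rules)
    return t, b, b ** t, b ** (t + 1)


# S -> a S b | ε
balanced = [("S", ("a", "S", "b")), ("S", ())]
constants = pumping_constants(balanced)
```

For this grammar T = 1 and B = 3, so K = 3 and M = 9. These values are upper bounds that make the argument work, not the smallest constants for which pumping is possible.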
An Example of Pumping
L = {a^n b^n c^n : n ≥ 0}
u v x y z
Unfortunately, we don't know where v and y fall. But there are two possibilities:
1. If vy contains all three symbols, then at least one of v or y must contain two of them. But then uvvxyyz contains at least one
out of order symbol.
2. If vy contains only one or two of the symbols, then uvvxyyz must contain unequal numbers of the symbols.
We need to pick w, then show that there are no values for uvxyz that satisfy all the above criteria. To do that, we just need to
focus on possible values for v and y, the pumpable parts. So we show that all possible picks for v and y violate at least one of
the criteria.
For each possibility for v and y (described in terms of the regions defined above), find some value n such that uv^n x y^n z is not in L.
Almost always, the easiest values are 0 (pumping out) or 2 (pumping in). Your value for n may differ for different cases.
Case | v | y | n | why the resulting string is not in L
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
Suppose L is context free. The context-free pumping lemma applies to L. Let M be the number from the pumping lemma.
Choose w = a^M b^M c^M. Now w ∈ L and |w| > M ≥ K. From the pumping lemma, for all strings w, where |w| > K, there exist u, v, x,
y, z such that w = uvxyz and |vy| > 0, and |vxy| ≤ M, and for all n ≥ 0, uv^n x y^n z is in L. There are two main cases:
1. Either v or y contains two or more different types of symbols ("a", "b" or "c"). In this case, uv^2 x y^2 z is not of the form
a*b*c* and hence uv^2 x y^2 z ∉ L.
2. Neither v nor y contains two or more different types of symbols. In this case, vy may contain at most two types of
symbols. Pumping out (n = 0) decreases the count of one or two types of symbols, but not the third, so uv^0 x y^0 z ∉ L.
Cases 1 and 2 cover all the possibilities. Therefore, regardless of how w is partitioned, there is some uv^n x y^n z that is not in L.
Contradiction. Therefore L is not context free.
Q. E. D.
Note: the underlined parts of the above proof are "boilerplate" that can be reused. A complete proof should have this text or
something equivalent.
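The case analysis can be confirmed by brute force for small values of M. The sketch below enumerates every split w = uvxyz of w = a^M b^M c^M with |vy| > 0 and |vxy| ≤ M and checks that pumping with n = 0 or n = 2 always leaves L. It is a finite sanity check for a chosen M, not a replacement for the proof; the function names are illustrative.

```python
def in_L(s):
    """Membership in L = {a^n b^n c^n : n >= 0}."""
    n, rest = divmod(len(s), 3)
    return rest == 0 and s == "a" * n + "b" * n + "c" * n


def every_split_fails(m):
    """True if no split of a^m b^m c^m survives pumping with n in {0, 2}."""
    w = "a" * m + "b" * m + "c" * m
    size = len(w)
    for i in range(size + 1):                          # v starts at i
        for end in range(i, min(i + m, size) + 1):     # vxy = w[i:end], |vxy| <= m
            for j in range(i, end + 1):                # v = w[i:j]
                for k in range(j, end + 1):            # x = w[j:k], y = w[k:end]
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:end], w[end:]
                    if not (v or y):
                        continue                       # the lemma requires |vy| > 0
                    if all(in_L(u + v * n + x + y * n + z) for n in (0, 2)):
                        return False                   # this split would survive
    return True
```

Calling every_split_fails(4), for instance, returns True, which agrees with cases 1 and 2 above.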
L = {a^n b^n}
L′ = {a^n a^n}
= {a^(2n)}
= {w ∈ {a}* : |w| is even}
L = {a^n b^m : n, m ≥ 0 and n ≠ m}
L′ = {a^n a^m : n, m ≥ 0 and n ≠ m}
=
Another Language That Is Not Context Free
L = {a^n : n ≥ 1 is prime}
2. |Σ_L| = 1. So if L were context free, it would also be regular. But we know that it is not. So it is not context free either.
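A direct pumping argument for this language also works, and its arithmetic is easy to check. If w = a^p with p prime and |vy| = k > 0, then uv^n x y^n z has length p + (n - 1)k, and choosing n = p + 1 gives p + pk = p(1 + k), which is composite. The sketch below verifies this for small primes; the function names are illustrative.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def pumped_length(p, k, n):
    """Length of u v^n x y^n z when |uvxyz| = p and |vy| = k."""
    return p + (n - 1) * k


def pumping_breaks_primality(limit):
    """Check that n = p + 1 yields a non-prime length for every prime p <= limit
    and every possible |vy| = k with 1 <= k <= p."""
    return all(
        not is_prime(pumped_length(p, k, p + 1))
        for p in range(2, limit + 1) if is_prime(p)
        for k in range(1, p + 1)
    )
```

Unlike the earlier examples, the pumping value here depends on the chosen string, since n = p + 1 grows with p.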
Now what?
t t
u v x y z
What if u is ε, v is w, x is ε, y is w, and z is ε?
L = {tt : t ∈ {a, b}* }
What if we let |w| > M, i.e. choose to pump the string a^M b a^M b:
t t
u v x y z
Suppose |v| = |y|. Now we have to show that repeating them makes the two copies of t different. But we can’t.
This time, we let |w| > 2M, and the number of both a's and b's in w >M:
1 2 3 4
aaaaaaaaaabbbbbbbbbbaaaaaaaaaabbbbbbbbbb
t t
u v x y z
First, notice that if either v or y contains both a's and b's, then we immediately violate the rules for L' when we pump.
So now we know that v and y must each fall completely in one of the four marked regions.
|w| > 2M, and the number of both a's and b's in w >M:
1 2 3 4
aaaaaaaaaabbbbbbbbbbaaaaaaaaaabbbbbbbbbb
t t
u v x y z
(1,1)
(2,2)
(3,3)
(4,4)
(1,2)
(2,3)
(3,4)
(1,3)
(2,4)
(1,4)
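The region pairs listed above can be explored mechanically. The sketch below takes w = a^M b^M a^M b^M, enumerates every split with |vy| > 0 and |vxy| ≤ M in which v and y each lie inside a single region, and records for each (region of v, region of y) pair whether pumping with n = 0 or n = 2 always produces a string outside L = {tt}. Treating an empty v or y as lying in the other one's region is a convention chosen here for illustration.

```python
def in_L(s):
    """Membership in L = {tt : t in {a, b}*}."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]


def region(i, j, m):
    """Region 1..4 containing w[i:j]; None if empty; 0 if it spans two regions."""
    if i == j:
        return None
    first, last = i // m + 1, (j - 1) // m + 1
    return first if first == last else 0


def classify(m):
    """Map each reachable (region of v, region of y) pair to True if every
    split in that pair is refuted by pumping with n = 0 or n = 2."""
    w = "a" * m + "b" * m + "a" * m + "b" * m
    size = len(w)
    table = {}
    for i in range(size + 1):
        for end in range(i, min(i + m, size) + 1):     # |vxy| <= m
            for j in range(i, end + 1):
                for k in range(j, end + 1):
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:end], w[end:]
                    if not (v or y):
                        continue
                    rv, ry = region(i, j, m), region(k, end, m)
                    if rv == 0 or ry == 0:
                        continue                       # v or y mixes a's and b's
                    pair = (rv or ry, ry or rv)
                    refuted = any(
                        not in_L(u + v * n + x + y * n + z) for n in (0, 2)
                    )
                    table[pair] = table.get(pair, True) and refuted
    return table


result = classify(3)
```

With M = 3 only the pairs (1,1), (2,2), (3,3), (4,4), (1,2), (2,3) and (3,4) occur. The pairs (1,3), (2,4) and (1,4) never appear, because x would have to cover a whole region of length M, which contradicts |vxy| ≤ M.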
The Context-Free Languages Are Not Closed Under Intersection
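Consider L = {a^n b^n c^n : n ≥ 0}
L is not context-free.
But L = L1 ∩ L2, where, for example, L1 = {a^n b^n c^m : n, m ≥ 0} and L2 = {a^m b^n c^n : n, m ≥ 0} are both context free.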
So, if the context-free languages were closed under intersection, L would have to be context-free. But it isn't.
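The identity L = L1 ∩ L2 can be checked exhaustively on short strings. The sketch below assumes the usual choice L1 = {a^n b^n c^m : n, m ≥ 0} and L2 = {a^m b^n c^n : n, m ≥ 0}; the membership tests use Python's re module and are illustrative only.

```python
import re
from itertools import product


def blocks(s):
    """Return (#a, #b, #c) if s has the form a*b*c*, else None."""
    match = re.fullmatch(r"(a*)(b*)(c*)", s)
    if match is None:
        return None
    return tuple(len(group) for group in match.groups())


def in_L1(s):
    """a^n b^n c^m: equal numbers of a's and b's."""
    counts = blocks(s)
    return counts is not None and counts[0] == counts[1]


def in_L2(s):
    """a^m b^n c^n: equal numbers of b's and c's."""
    counts = blocks(s)
    return counts is not None and counts[1] == counts[2]


def in_L(s):
    """a^n b^n c^n."""
    counts = blocks(s)
    return counts is not None and counts[0] == counts[1] == counts[2]


def identity_holds(max_len):
    """Compare L with L1 ∩ L2 on every string over {a, b, c} up to max_len."""
    return all(
        in_L(s) == (in_L1(s) and in_L2(s))
        for n in range(max_len + 1)
        for s in map("".join, product("abc", repeat=n))
    )
```

Each of L1 and L2 only has to compare two adjacent blocks, which a single stack can do; the intersection forces both comparisons at once.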
By definition:
L1 ∩ L2 = ¬(¬L1 ∪ ¬L2), where ¬L denotes the complement of L
Since the context-free languages are closed under union, if they were also closed under complementation, they would necessarily
be closed under intersection. But we just showed that they are not. Thus they are not closed under complementation.
Let L be a language such that L$ is accepted by the deterministic PDA M. We construct a deterministic PDA M' to accept (the
complement of L)$, just as we did for FSMs:
An Example of the Construction
[Figure: state diagram of the constructed deterministic PDA with states 1, 2, 3, 4; edge labels include a//a, b/a/, $/ε/, $/Z/, b/Z/, $/a/, and a//.]
Theorem: The class of deterministic context-free languages is a proper subset of the class of context-free languages.
Proof: Consider L = {a^n b^m c^p : m ≠ n or m ≠ p}. L is context free (we have shown a grammar for it).
But L is not deterministic. If it were, then its complement L1 would be deterministic context free, and thus certainly context free.
But then
L2 = L1 ∩ a*b*c* (a regular language)
would be context free. But
L2 = {a^n b^n c^n : n ≥ 0}, which we know is not context free.
Thus there exists at least one context-free language that is not deterministic context free.
Note that deterministic context-free languages are not closed under union, intersection, or difference.
Decision Procedures for CFLs & PDAs
Such decision procedures usually involve conversions to Chomsky Normal Form or Greibach Normal Form. Why?
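One reason normal forms help is that they bound the shape of every derivation. As an illustration, the CYK membership algorithm works only on grammars in Chomsky Normal Form, where every rule is A → BC or A → a. The sketch below is a minimal version; the rule encoding and the sample grammar for {a^n b^n : n ≥ 1} are assumptions made for this example, not taken from the notes.

```python
def cyk(word, unary, binary, start="S"):
    """Decide whether a CNF grammar derives word.

    unary:  list of (A, a) for rules A -> a
    binary: list of (A, (B, C)) for rules A -> B C
    The empty string is rejected, since plain CNF rules cannot derive it.
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][l - 1] holds the nonterminals that derive word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {a for a, c in unary if c == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for a, (b, c) in binary:
                    if b in left and c in right:
                        table[i][length - 1].add(a)
    return start in table[0][n - 1]


# Sample CNF grammar: S -> A B | A C, C -> S B, A -> a, B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "B")), ("S", ("A", "C")), ("C", ("S", "B"))]
```

With this grammar, cyk("aabb", UNARY, BINARY) is True and cyk("aab", UNARY, BINARY) is False. The three nested loops over substrings and split points give the familiar cubic running time in the length of the word.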
Theorem: For any context free grammar G, there exists a number n such that:
1. If L(G) ≠ ∅, then there exists a w ∈ L(G) such that |w| < n.
2. If L(G) is infinite, then there exists w ∈ L(G) such that n ≤ |w| < 2n.
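Part 1 of the theorem already yields a decision procedure for emptiness: test every string shorter than n. The sketch below uses a different, standard technique instead, marking the nonterminals that can derive some terminal string. It assumes, for illustration only, that nonterminals are single uppercase letters and that each right-hand side is a Python string.

```python
def productive_nonterminals(rules):
    """Nonterminals that derive at least one string of terminals.

    rules: list of (lhs, rhs) with rhs a string; uppercase characters are
    nonterminals, everything else is a terminal (an assumed convention).
    """
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in productive:
                continue
            # a rule is usable once all its nonterminals are known productive
            if all(not sym.isupper() or sym in productive for sym in rhs):
                productive.add(lhs)
                changed = True
    return productive


def is_empty(rules, start="S"):
    """True if the grammar generates no strings at all."""
    return start not in productive_nonterminals(rules)


# S -> AB, A -> a, B -> Bb   (B never terminates, so the language is empty)
stuck = [("S", "AB"), ("A", "a"), ("B", "Bb")]
# S -> aSb | ε
balanced = [("S", "aSb"), ("S", "")]
```

Here is_empty(stuck) is True and is_empty(balanced) is False. The loop adds at least one nonterminal per pass, so it stops after at most as many passes as there are nonterminals.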
If we could decide these problems, we could decide the halting problem. (More later.)
Convert M to its equivalent CFG and use the corresponding CFG decision procedure. Why avoid using PDAs directly?
If we could decide these problems, we could decide the halting problem. (More later.)
Comparing Regular and Context-Free Languages
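[Figure: nested language classes, from outermost to innermost: Recursively Enumerable Languages, Recursive Languages, Context-Free Languages, Regular Languages. The regular languages are labeled with FSMs; the context-free languages are labeled with PDAs, marked D and ND.]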