
Chapter 3

Pushdown Automata and Context Free Languages

As we have seen, Finite Automata are somewhat limited in the languages they can recognize. Pushdown Automata are another type of machine that can recognize a wider class of languages. Context Free Grammars, which we can think of as loosely analogous to regular expressions, provide a method for describing a class of languages called Context Free Languages. As we will see, the Context Free Languages are exactly the languages recognized by Pushdown Automata.
Pushdown Automata can be used to recognize certain types of structured input. In particular, they are an important part of the front end of compilers. They are used to determine the organization of the programs being processed by a compiler; tasks they handle include forming expression trees from input arithmetic expressions and determining the scope of variables.
Context Free Languages, and an elaboration called Probabilistic Context Free Languages, are widely used to help determine sentence organization in computer-based text understanding (e.g., what is the subject, the object, the verb, etc.).

3.1 Pushdown Automata


A Pushdown Automaton, or PDA for short, is simply an NFA equipped with a single stack.
As with an NFA, it moves from vertex to vertex as it reads its input, with the additional
possibility of also pushing to and popping from its stack as part of a move from one vertex
to another. As with an NFA, there may be several viable computation paths. In addition,
as with an NFA, to recognize an input string w, a PDA M needs to have a recognizing path,
from its start vertex to a final vertex, which it can traverse on input w.
Recall that a stack is an unbounded store which one can think of as holding the items it
stores in a tower (or stack) with new items being placed (written) at the top of the tower
and items being read and removed (in one operation) from the top of the tower. The first
operation is called a Push and the second a Pop. For example, if we perform the sequence


of operations Push(A), Push(B), Pop, Push(C), Pop, Pop, the 3 successive pops will read the items B, C, A, respectively. The successive states of the stack are shown in Figure 3.1.

[Figure 3.1: Stack Behavior — the successive states of the stack, with the three pops returning B, C, and A in turn.]
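In Python, for instance, a list serves as a stack, with append playing the role of Push and pop the role of Pop; the following minimal sketch replays the sequence of Figure 3.1.

stack = []
stack.append('A')     # Push(A)
stack.append('B')     # Push(B)
print(stack.pop())    # Pop: returns B
stack.append('C')     # Push(C)
print(stack.pop())    # Pop: returns C
print(stack.pop())    # Pop: returns A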

Let's see how a stack allows us to recognize the following language L1 = {a^i b^i | i ≥ 0}. We start by explaining how to process a string w = a^i b^i ∈ L1. As the PDA, M say, reads the initial string of a's in its input, it pushes a corresponding equal length string of A's onto its stack (one A for each a read). Then, as M reads the b's, it seeks to match them one by one against the A's on the stack (by popping one A for each b it reads). M recognizes its input exactly if the stack becomes empty on reading the last b.
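This matching idea can be captured in a few lines of code. The following is a minimal sketch (the function name is illustrative), written deterministically since no guessing is needed for this language: push an A per a, pop an A per b, and accept exactly if the input has the form a's-then-b's and the stack empties on the last b.

def recognizes_ab(w):
    stack = []
    i = 0
    while i < len(w) and w[i] == 'a':    # a-reading phase: push A's
        stack.append('A')
        i += 1
    while i < len(w) and w[i] == 'b':    # b-reading phase: pop one A per b
        if not stack:
            return False                 # more b's than a's
        stack.pop()
        i += 1
    # accept iff the whole input was read and every A was matched
    return i == len(w) and not stack

assert recognizes_ab('aabb') and not recognizes_ab('aab')
assert not recognizes_ab('abab')         # contains the substring ba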
In fact, PDAs are not allowed to use a Stack Empty test. We use a standard technique, which we call $-shielding, to simulate this test. Given a PDA M on which we want to perform Stack Empty tests, we create a new PDA M′ which is identical to M apart from the following small changes. M′ uses a new, additional symbol on its stack, which we name $. Then at the very start of the computation, M′ pushes a $ onto its stack. This will be the only occurrence of $ on its stack. Subsequently, M′ performs the same steps as M except that when M seeks to perform a Stack Empty test, M′ pops the stack and then immediately pushes the popped symbol back on its stack. The simulated stack is empty exactly if the popped symbol was a $.
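The shielding trick is easy to mimic in code. Here is a small sketch (the class name and interface are illustrative) of a stack preloaded with the shield symbol $, which simulates the Stack Empty test by a pop immediately followed by a push.

class ShieldedStack:
    def __init__(self):
        self.items = ['$']          # the shield, pushed at the very start

    def push(self, x):
        self.items.append(x)

    def pop(self):
        return self.items.pop()     # a PDA never pops past the shield

    def simulated_empty(self):
        top = self.pop()            # pop, then immediately push back
        self.push(top)
        return top == '$'           # empty exactly if the shield is on top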
Next, we explain what happens with strings outside the language L1 . We do this by
looking at several categories of strings in turn.

1. a^i b^h, h < i.
After the last b is read, there will still be one or more A's on the stack, indicating the input is not in L1.

2. a^i b^j, j > i.
On reading the (i + 1)st b, there is an attempt to pop the now empty stack to find a matching A; this attempt fails, and again this indicates the input is not in L1.

3. The only other possibility is that the input contains the substring ba; as already described, the processing consists of an a-reading phase, followed by a b-reading phase. The a in the substring ba is encountered in the b-reading phase, and once more this input is easily recognized as being outside L1.

As with an NFA, we can specify the computation using a directed graph, with the edge
labels indicating the actions to be performed when traversing the given edge. To recognize
an input w, the PDA needs to be able to follow a path from its start vertex to a final vertex

starting with an empty stack, where the path’s read labels spell out the input, and the
stack operations on the path are consistent with the stack’s ongoing contents as the path is
traversed.
A PDA M1 recognizing L1 is shown in Figure 3.2.

[Figure 3.2: PDA M1 recognizing L1 = {a^i b^i | i ≥ 0}. Vertices: p1 (λ read, stack empty), p2 (a^i read, i ≥ 0, stack contents $A^i), p3 (a^i b^h read, i ≥ h ≥ 0, stack contents $A^{i−h}), p4 (a^i b^i read, i ≥ 0, stack empty). Edges: (p1, p2) Push $; (p2, p2) Read a, Push A; (p2, p3) Read λ; (p3, p3) Read b, Pop A; (p3, p4) Pop $.]

Because the descriptions of the

vertices are quite long, we have given them the shorter names p1 –p4 . These descriptions
specify exactly the strings that can be read and the corresponding stack contents to reach
the vertex in question. We will verify the correctness of these specifications by looking at
what happens on the possible inputs. To do this effectively, we need to divide the strings
into appropriate categories.
An initial understanding of what M1 recognizes can be obtained by ignoring what the
stack does and viewing the machine as just an NFA (i.e. using the same graph but with just
the reads labeling the edges). See Figure 3.3 for the graph of the NFA N1 derived from M1 .
[Figure 3.3: The NFA N1 derived from M1.]

The significant point is that if M1 can reach vertex p on input w using computation path

P then so can N1 (for the same reads label P in both machines). It follows that any string
recognized by M1 is also recognized by N1 : L(M1 ) ⊆ L(N1 ).
It is not hard to see that N1 recognizes a∗b∗. It follows that M1 recognizes a subset of a∗b∗. So to explain the behavior of M1 in full it suffices to look at what happens on inputs of the form a^i b^j, i, j ≥ 0, which we do by examining five subcases that account for all such strings.

1. λ.
M1 starts at p1. On pushing $, p2 and p3 can be reached. Then, on popping the $, p4 can be reached. Note that the specification of p2 holds with i = 0, that of p3 with i = h = 0, and that of p4 with i = 0. Thus the specification at each vertex includes the case that the input is λ.

2. a^i, i ≥ 1.
To read a^i, M1 needs to push $, i.e. follow edge (p1, p2), and then follow edge (p2, p2) i times. This puts $A^i on the stack. Thus on input a^i, p2 can be reached and its specification is correct. In addition, the edge to p3 can be traversed without any additional reads or stack operations, and so the specification for p3 with h = 0 is correct for this input.

3. a^i b^h, 1 ≤ h < i.
The only place to read b is on edge (p3, p3). Thus, for this input, M1 reads a^i to bring it to p3 and then follows (p3, p3) h times. This leaves $A^{i−h} on the stack, and consequently the specification of p3 is correct for this input. Note that as h < i, edge (p3, p4) cannot be followed, as $ is not on the stack top.

4. a^i b^i, i ≥ 1.
After reading the i b's, M1 can be at vertex p3 as explained in (3). Now, in addition, edge (p3, p4) can be traversed, and this pops the $ from the stack, leaving it empty. So the specification of p4 is correct for this input.

5. a^i b^j, j > i.
On reading a^i b^i, M1 can reach p3 with the stack holding $ or reach p4 with an empty stack, as described in (4). From p3 the only available move is to p4, without reading anything further. At p4 there is no move, so the rest of the input cannot be read, and thus no vertex can be reached on the full input.

This is a very elaborate description which we certainly don’t wish to repeat for each
similar PDA. We can describe M1 ’s functioning more briefly as follows.

M1 checks that its input has the form a∗b∗ (i.e. all the a's precede all the b's)
using its underlying NFA (i.e. without using the stack). The underlying NFA is
often called its finite control. In tandem with this, M1 uses its $-shielded stack to
match the a's against the b's, first pushing the a's on the stack (it is understood
that in fact A's are being pushed) and then popping them off, one for one, as the
b's are read, confirming that the numbers of a's and b's are equal.

The detailed argument we gave above is understood, but not spelled out.

Now we are ready to define a PDA more precisely. As with an NFA, a PDA consists of a
directed graph with one vertex, start, designated as the start vertex, and a (possibly empty)
subset of vertices designated as the final set, F , of vertices. As before, in drawing the graph,
we show final vertices using double circles and indicate the start vertex with a double arrow.
Each edge is labeled with the actions the PDA performs on following that edge.
For example, the label on edge e might be: Pop A, read b, Push C, meaning that the
PDA pops the stack, reads the next input character, and if the pop returns an A and the
character read is a b, then the PDA can traverse e, which entails it pushing C onto the stack.
Some or all of these values may be λ: Pop λ means that no Pop is performed, read λ that
no read occurs, and Push λ that no Push happens. To avoid clutter, we usually omit the
λ-labeled terms; for example, instead of Pop λ, read λ, Push C, we write Push C. Also, to
avoid confusion in the figures, if there are multiple triples of actions that take the PDA from
a vertex u to a vertex v, we use multiple edges from u to v, one for each triple.
In sum, a label, which specifies the actions accompanying a move from vertex u to vertex
v, has up to three parts.
1. Pop the stack and check that the returned character has a specified value (in our
example this is the value A).
2. Read the next character of input and check that it has a specified value (in our example,
the value b).
3. Push a specified character onto the stack (in our example, the character C).
From an implementation perspective it may be helpful to think in terms of being able to
peek ahead, so that one can see the top item on the stack without actually popping it, and
one can see the next input character (or that one is at the end of the input) without actually
reading forward.
One further rule is that an empty stack may not be popped.
A PDA also comes with an input alphabet Σ and a stack alphabet Γ (these are the
symbols that can be written on the stack). It is customary for Σ and Γ to be disjoint, in
part to avoid confusion. To emphasize this disjointness, we write the characters of Σ using
lowercase letters and those of Γ using uppercase letters.
Because the stack contents make it difficult to describe the condition of the PDA after multiple moves, we use the transition function here to describe the possible out-edges from single vertices only. Accordingly, δ(p, A, b) = {(q1, C1), (q2, C2), · · · , (ql, Cl)} indicates that the edges exiting vertex p and having both Pop A and Read b in their label are the edges going to vertices q1, q2, · · · , ql, where the rest of the label for edge (p, qi) includes the action Push Ci, for 1 ≤ i ≤ l. That is, δ(p, A, b) specifies the possible moves out of vertex p on popping character A and reading b. (Recall that one or both of A and b might be λ.)
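As an illustration, the transition function of M1 can be written as a dictionary along the following lines (a sketch; the encoding, with '' standing for a λ action, is our own choice): the key is a (vertex, popped symbol, read character) triple, the value the set of (target vertex, pushed symbol) pairs.

# delta[(vertex, pop, read)] = set of (target, push) pairs; '' means lambda.
delta = {
    ('p1', '',  ''):  {('p2', '$')},   # Push $
    ('p2', '',  'a'): {('p2', 'A')},   # Read a, Push A
    ('p2', '',  ''):  {('p3', '')},    # lambda-move to p3
    ('p3', 'A', 'b'): {('p3', '')},    # Pop A, Read b
    ('p3', '$', ''):  {('p4', '')},    # Pop $
}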
In sum, a PDA M consists of a 6-tuple: M = (Σ, Γ, V, start, F, δ), where
1. Σ is the input alphabet,
2. Γ is the stack alphabet,

3. V is the vertex or state set,

4. F ⊆ V is the final vertex set,

5. start is the start vertex, and

6. δ is the transition function, which specifies the edges and their labels.

Recognition is defined as for an NFA, that is, PDA M recognizes input w if there is a
path that M can follow on input w that takes M from its start vertex to a final vertex.
We call this path a w-recognizing computation path to emphasize that stack operations may
occur in tandem with the reading of input w. More formally, M recognizes w if there is a
path start = p0, p1, · · · , pm, where pm is a final vertex, the label on edge (pi−1, pi) is (Read
ai, Pop Bi, Push Ci), for 1 ≤ i ≤ m, and the stack contents at vertex pi is σi, for 0 ≤ i ≤ m,
where

1. a1 a2 · · · am = w,

2. σ0 = λ,

3. and Pop Bi, Push Ci applied to σi−1 produces σi, for 1 ≤ i ≤ m.

We write L(M ) for the language, or set of strings, recognized by M .
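Recognition can be implemented directly as a search over configurations (current vertex, input position, stack contents). The following sketch does a breadth-first search; since λ-labeled push loops could make the search infinite, it caps the stack height, a crude bound that suffices for the machines in this chapter. The edge encoding matches the dictionary sketch above; all names are illustrative.

from collections import deque

def pda_accepts(edges, start, finals, w, max_stack=None):
    """Breadth-first search for a w-recognizing computation path.
    `edges` maps a vertex to a list of (pop, read, push, target) tuples,
    with '' standing for a lambda action."""
    if max_stack is None:
        max_stack = len(w) + 2            # crude cap to keep the search finite
    seen = set()
    queue = deque([(start, 0, ())])       # stack kept as a tuple, top last
    while queue:
        v, i, stack = queue.popleft()
        if (v, i, stack) in seen:
            continue
        seen.add((v, i, stack))
        if v in finals and i == len(w):
            return True                   # recognizing path found
        for pop, read, push, target in edges.get(v, []):
            if pop and (not stack or stack[-1] != pop):
                continue                  # wrong top symbol, or empty stack
            if read and (i >= len(w) or w[i] != read):
                continue                  # next input character does not match
            new_stack = stack[:-1] if pop else stack
            if push:
                new_stack = new_stack + (push,)
            if len(new_stack) <= max_stack:
                queue.append((target, i + len(read), new_stack))
    return False

# M1 recognizing {a^i b^i | i >= 0}, with '$' as the shield symbol.
M1 = {
    'p1': [('', '', '$', 'p2')],
    'p2': [('', 'a', 'A', 'p2'), ('', '', '', 'p3')],
    'p3': [('A', 'b', '', 'p3'), ('$', '', '', 'p4')],
}
assert pda_accepts(M1, 'p1', {'p4'}, 'aabb')
assert not pda_accepts(M1, 'p1', {'p4'}, 'aab')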


Next, we show some more examples of languages that can be recognized by PDAs.

Example 3.1.1. L2 = {a^i c b^i | i ≥ 0}.

PDA M2 recognizing L2 is shown in Figure 3.4.

[Figure 3.4: PDA M2 recognizing L2 = {a^i c b^i | i ≥ 0}.]

The processing by M2 is similar to that

of M1. M2 checks that its input has the form a∗cb∗ using its finite control. In tandem, M2 uses its $-shielded stack to match the a's against the b's: it first pushes the a's on the stack (actually A's are being pushed), then reads the c without touching the stack, and finally pops the a's off, one for one, as the b's are read, confirming that the numbers of a's and b's are equal.

Example 3.1.2. L3 = {w c w^R | w ∈ {a, b}∗}.

PDA M3 recognizing L3 is shown in Figure 3.5.

[Figure 3.5: PDA M3 recognizing L3 = {w c w^R | w ∈ {a, b}∗}. Z denotes string z in capital letters.]

M3 uses its $-shielded stack to match

the w against the w^R, as follows. It pushes w on the stack (the end of the substring w being indicated by reaching the c). At this point, the stack contents read from the top are w^R $, so popping down to the $ outputs the string w^R. These stack contents are readily compared to the string following the c. The input is recognized exactly if they match.

Example 3.1.3. L4 = {w w^R | w ∈ {a, b}∗}.


PDA M4 recognizing L4 is shown in Figure 3.6. This is similar to Example 3.1.2. The
one difference is that the PDA M4 can decide at any point to stop reading w and begin
reading wR . Of course there is only one correct switching location, at most, but as M4 does
not know where it is, M4 considers all possibilities by means of its nondeterminism.

Example 3.1.4. L5 = {a^i b^j c^k | i = j or i = k}.


PDA M5 recognizing L5 is shown in Figure 3.7.
This is the union of the languages L6 = {a^i b^i c^k | i, k ≥ 0} and L7 = {a^i b^j c^i | i, j ≥ 0}, each of which is similar to L2. M5, the PDA recognizing L5, uses submachines to recognize each of L6 and L7. M5's first move from its start vertex is to traverse (Push $)-edges to the start vertices of the submachines. The net effect is that M5 recognizes the union of the languages recognized by the submachines. As the submachines are similar to M2 they are not explained further.

[Figure 3.6: PDA M4 recognizing L4 = {w w^R | w ∈ {a, b}∗}. Z denotes string z in capital letters.]

[Figure 3.7: PDA M5 recognizing L5 = {a^i b^j c^k | i = j or i = k}.]

3.2 Closure Properties


Lemma 3.2.1. Let A and B be languages recognized by PDAs MA and MB, respectively. Then A ∪ B is also recognized by a PDA, called MA∪B.

Proof. The graph of MA∪B consists of the graphs of MA and MB plus a new start vertex startA∪B, which is joined by λ-edges to the start vertices startA and startB of MA and MB, respectively. Its final vertices are the final vertices of MA and MB. The graph is shown in Figure 3.8.

[Figure 3.8: PDA MA∪B.]

While it is clear that L(MA∪B) = L(MA) ∪ L(MB), we present the argument for completeness.
First, we show that L(MA∪B) ⊆ L(MA) ∪ L(MB). Suppose that w ∈ L(MA∪B). Then there is a w-recognizing computation path P from startA∪B to a final vertex f. If f lies in MA, then removing the first edge of P leaves a path P′ from startA to f. Further, at the start of P′, the stack is empty and nothing has been read, so P′ is a w-recognizing path in MA. That is, w ∈ L(MA). Similarly, if f lies in MB, then w ∈ L(MB). Either way, w ∈ L(MA) ∪ L(MB).
Second, we show that L(MA) ∪ L(MB) ⊆ L(MA∪B). Suppose that w ∈ L(MA). Then there is a w-recognizing computation path P′ from startA to a final vertex f in MA. Adding the λ-edge (startA∪B, startA) to the beginning of P′ creates a w-recognizing computation path in MA∪B, showing that L(MA) ⊆ L(MA∪B). Similarly, if w ∈ L(MB), then L(MB) ⊆ L(MA∪B).

Our next construction is simplified by the following technical lemma.



Lemma 3.2.2. Let PDA M recognize L. There is an L-recognizing PDA M′ with the following properties: M′ has only one final vertex, finalM′, and M′ will always have an empty stack when it reaches finalM′.

Proof. The idea is quite simple. M′ simulates M using a $-shielded stack. When M's computation is complete, M′ moves to a new stack-emptying vertex, stack-empty, at which M′ empties its stack of everything apart from the $-shield. To then move to finalM′, M′ pops the $, thus ensuring it has an empty stack when it reaches finalM′. M′ is illustrated in Figure 3.9.

[Figure 3.9: PDA M′ for Lemma 3.2.2.]

More precisely, M′ consists of the graph of M plus three new vertices: startM′, stack-

empty, and finalM′. The following edges are also added: (startM′, startM) labeled Push $; λ-edges from each of M's final vertices to stack-empty; self-loops (stack-empty, stack-empty) labeled Pop X for each X ∈ Γ with X ≠ $, where Γ is M's stack alphabet; and the edge (stack-empty, finalM′) labeled Pop $.
It is clear that L(M) = L(M′). Nonetheless, we present the argument for completeness.
First, we show that L(M′) ⊆ L(M). Let w ∈ L(M′). Let P′ be a w-recognizing path in M′ and let f be the final vertex of M preceding stack-empty on the path P′. Removing the first edge in P′ and every edge from (f, stack-empty) onward leaves a path P which is a w-recognizing path in M. Thus L(M′) ⊆ L(M).
Now we show L(M) ⊆ L(M′). Let w ∈ L(M) and let P be a w-recognizing path in M. Suppose that P ends with string s on the stack at final vertex f. We add the edges (startM′, startM), (f, stack-empty), |s| self-loops at stack-empty, and (stack-empty, finalM′) to P, yielding path P′ in M′. By choosing the self-loops to be labeled with the characters of s^R in this order, we make P′ a w-recognizing path in M′. Thus L(M) ⊆ L(M′).
Lemma 3.2.3. Let A and B be languages recognized by PDAs MA and MB, respectively. Then A ◦ B is also recognized by a PDA, called MA◦B.

Proof. By Lemma 3.2.2, we may assume that MA and MB each have just one final vertex, which can be reached only with an empty stack.
MA◦B consists of MA and MB plus one λ-edge (finalA, startB). Its start vertex is startA and its final vertex is finalB.

It is straightforward to see that L(MA◦B) = A ◦ B.

First, we show that L(MA◦B) ⊆ A ◦ B. So suppose that w ∈ L(MA◦B). Then there is a w-recognizing path P in MA◦B; P is formed from a path PA in MA going from startA to finalA (and which therefore ends with an empty stack), the λ-edge (finalA, startB), and a path PB in MB starting from startB with an empty stack and going to finalB. Let u be the sequence of reads labeling PA and v those labeling PB. It follows that PA is u-recognizing and PB is v-recognizing (see Figure 3.10), and thus u ∈ A and v ∈ B. In addition, w = uλv = uv, which implies that w = uv ∈ A ◦ B.

[Figure 3.10: PDA MA◦B.]


Next, we show that A ◦ B ⊆ L(MA◦B). So suppose that w = uv ∈ A ◦ B, where u ∈ A and v ∈ B. Then there is a u-recognizing path PA in MA and a v-recognizing path
PB in MB . We argue that the following path P is w-recognizing: PA followed by the λ-
edge (final A , start B ), followed by PB . Note that the stack is empty when at node final A by
construction and hence it is also empty at node start B . Consequently, computation path P
in MA◦B recognizes uλv = uv = w, as claimed. It follows that w = uv ∈ L(MA◦B ).

Lemma 3.2.4. Suppose that L is recognized by PDA ML and suppose that R is a regular
language. Then L ∩ R is recognized by a PDA called ML∩R .

Proof. Let ML = (Σ, ΓL , VL , start L , FL , δL ) and let R be recognized by DFA MR = (Σ, VR , start R , FR , δR ).
We will construct ML∩R . The vertices of ML∩R will be 2-tuples, the first component corre-
sponding to a vertex of ML and the second component to a vertex of MR . The computation
of ML∩R , when looking at the first components along with the stack will be exactly the
computation of ML , and when looking at the second components, but without the stack, it
will be exactly the computation of MR . This leads to the following edges in ML∩R .

1. If ML has an edge (uL , vL ) with label (Pop A, Read b, Push C) and MR has an edge
(uR , vR ) with label b, then ML∩R has an edge ((uL , uR ), (vL , vR )) with label (Pop A,
Read b, Push C).

2. If ML has an edge (uL , vL ) with label (Pop A, Read λ, Push C) then ML∩R has an
edge ((uL , uR ), (vL , uR )) with label (Pop A, Read λ, Push C) for every uR ∈ VR .

The start vertex for ML∩R is (start L , start R ) and its set of final vertices is FL × FR , the pairs
of final vertices, one from ML and one from MR , respectively.

Assertion. ML∩R can reach vertex (vL , vR ) on input w if and only if ML can reach vertex
vL and MR can reach vertex vR on input w.
Next, we argue that the assertion is true. Suppose that on input w, ML∩R can reach vertex (vL, vR) by computation path PL∩R. If we consider the first components of the vertices in PL∩R, we see that it is a computation path of ML on input w reaching vertex vL. Likewise, if we consider the second components of the vertices of PL∩R, we obtain a path P′R. The only difficulty is that this path may contain repetitions of a vertex uR, corresponding to reads of λ by ML∩R. Eliminating such repetitions creates a path PR in MR reaching vR and having the same label w as path P′R.
Conversely, suppose that ML can reach vL by computation path PL and MR can reach
vR by computation path PR . Combining these paths, with care, gives a computation path P
which reaches (vL , vR ) on input w. We proceed as follows. The first vertex is (startL , startR ).
Then we traverse PL and PR in tandem. Either the next edges in PL and PR are both labeled
by a Read b (simply a b on PR ) in which case we use Rule (1) above to give the edge to add
to P , or the next edge on PL is labeled by Read λ (together with a Pop and a Push possibly)
and then we use Rule (2) to give the edge to add to P . In the first case we advance one edge
on both PL and PR , in the second case we only advance on PL . Clearly, the path ends at
vertex (vL , vR ) on input w.
It is now easy to see that L(ML∩R ) = L ∩ R. For on input w, ML∩R can reach a final
vertex v ∈ F = FL × FR if and only if on input w, ML reaches a vertex vL ∈ FL and
MR reaches a vertex vR ∈ FR . That is, w ∈ L(ML∩R ) if and only if w ∈ L(ML ) = L and
w ∈ L(MR ) = R, or in other words w ∈ L(ML∩R ) if and only if w ∈ L ∩ R.
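The two edge rules translate directly into code. Here is a sketch of the product construction (all names are illustrative); the PDA edges use the (pop, read, push, target) encoding from the earlier sketch, and the DFA is given by its transition function.

def product_edges(pda, dfa, dfa_states):
    """Build the edges of ML∩R: pda[u] = [(pop, read, push, v)] with ''
    for lambda actions; dfa[(q, b)] = q' is the DFA transition function."""
    prod = {}
    for uL, moves in pda.items():
        for pop, read, push, vL in moves:
            for uR in dfa_states:
                if read:                         # Rule (1): synchronize on a read
                    vR = dfa.get((uR, read))
                    if vR is not None:
                        prod.setdefault((uL, uR), []).append(
                            (pop, read, push, (vL, vR)))
                else:                            # Rule (2): a lambda-read move
                    prod.setdefault((uL, uR), []).append(
                        (pop, '', push, (vL, uR)))
    return prod

The start vertex is (startL, startR) and the final vertices are the pairs in FL × FR, exactly as in the proof.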

Exercise. Show that if A is recognized by a PDA MA then there is a PDA MA∗ recognizing A∗.

3.3 Context Free Languages


Context Free Languages (CFLs) provide a way of specifying certain recursively defined languages. Let's begin by giving a recursive method for generating integers in decimal form. We will call this representation of integers decimal strings. A decimal string is defined to be either a single digit (one of 0–9) or a single digit followed by another decimal string. This can be expressed more succinctly as follows.


⟨decimal string⟩ → ⟨digit⟩ | ⟨digit⟩ ⟨decimal string⟩        (3.1)

⟨digit⟩ → 0 | 1 | 2 | · · · | 9

We can also view this as providing a way of generating decimal strings. To generate a
particular decimal string we perform a series of replacements or substitutions starting from the "variable" ⟨decimal string⟩. The sequence of replacements generating 147 is shown below.


⟨decimal string⟩ ⇒ ⟨digit⟩ ⟨decimal string⟩
⇒ 1 ⟨decimal string⟩
⇒ 1 ⟨digit⟩ ⟨decimal string⟩
⇒ 14 ⟨decimal string⟩
⇒ 14 ⟨digit⟩
⇒ 147

We write σ ⇒ τ if the string τ is the result of a single replacement in σ. The possible replacements are those given in (3.1). Each replacement takes one occurrence of an item on the lefthand side of an arrow and replaces it with one of the items on the righthand side; these are the items separated by vertical bars. Specifically, the possible replacements are to take one occurrence of one of:

• ⟨decimal string⟩, and replace it with one of ⟨digit⟩ or the sequence ⟨digit⟩ ⟨decimal string⟩.

• ⟨digit⟩, and replace it with one of 0–9.

An easier way to understand this is by viewing the generation using a tree, called a derivation or parse tree, as shown in Figure 3.11.

[Figure 3.11: Parse Tree Generating 147.]

Clearly, were we patient enough, in

principle we could generate any number.
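Generation by repeated replacement is easy to mechanize. The following sketch (the names RULES and generate are illustrative) encodes the rules (3.1) and expands variables recursively, making a random choice at each replacement.

import random

RULES = {
    '<decimal string>': [['<digit>'], ['<digit>', '<decimal string>']],
    '<digit>': [[d] for d in '0123456789'],
}

def generate(symbol):
    """Expand `symbol`; a symbol with no rule is a terminal and stands
    for itself."""
    if symbol not in RULES:
        return symbol
    replacement = random.choice(RULES[symbol])
    return ''.join(generate(s) for s in replacement)

print(generate('<decimal string>'))   # e.g. '147', or '9', or '30582'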


The above generation rules are relatively simple. Let's look at the slightly more elaborate example of arithmetic expressions such as x + x × x or (x + x) × x. For simplicity, we limit the expressions to those built from a single variable (x), the "+" and "×" operators, and

parentheses. We also would like the generation rules to follow the precedence order of the
operators “+” and “×”, in a sense that will become clear.
The generation rules, being recursive, generate arithmetic expressions top-down. Let’s
use the example x + x × x as motivation. To guide us, it is helpful to look at the expression
tree representation, shown in Figure 3.12a.

[Figure 3.12: Parse Tree Generating x + x × x. (a) The expression tree; (b) the derivation tree.]

The root of the tree holds the operator "+"; correspondingly, the first substitution we apply needs to create a "+"; the remaining parts of the expression will then be generated recursively. This is implemented with the following variables: ⟨expr⟩, which can generate any arithmetic expression; ⟨term⟩, which can generate any arithmetic expression whose top level operator is times (×) or matched parentheses ("(" and ")"); and ⟨factor⟩, which can generate any arithmetic expression whose top level operator is a pair of matched parentheses. This organization is used to enforce the usual operator precedence. This leads us to the following substitution rules:

• ⟨expr⟩ → ⟨expr⟩ + ⟨term⟩ | ⟨term⟩.
This rule implies that the top-level additions are generated in right to left order and hence performed in left to right order (for the expression generated by the left ⟨expr⟩ is evaluated before being added to the expression generated by the ⟨term⟩ to its right). So eventually the initial ⟨expr⟩ is replaced by ⟨term⟩ + ⟨term⟩ + · · · + ⟨term⟩, each ⟨term⟩ being an operand for the "+" operator. If there is no addition, ⟨expr⟩ is simply replaced by ⟨term⟩.

• ⟨term⟩ → ⟨term⟩ × ⟨factor⟩ | ⟨factor⟩.
Similarly, this rule implies that the multiplications are performed in left to right order. Eventually the initial ⟨term⟩ is replaced by ⟨factor⟩ × ⟨factor⟩ × · · · × ⟨factor⟩, each ⟨factor⟩ being an operand to the "×" operator. If there is no multiplication, ⟨term⟩ is simply replaced by ⟨factor⟩.

• ⟨factor⟩ → x | (⟨expr⟩).
Since "×" has the highest precedence, each of its operands must be either a simple variable (x) or a parenthesized expression, which is what we have here.

The derivation tree for the example of Figure 3.12a is shown in Figure 3.12b. Note that the left-to-right order of evaluation is an arbitrary choice for the "+" and "×" operators, but were we to introduce the "−" and "÷" operators it would cease to be arbitrary; left-to-right is then the correct rule.
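The three variables mirror the structure of a recursive-descent parser, with one function per variable. The sketch below (illustrative names; x is given the value 3) evaluates an expression while respecting the precedence the grammar encodes, with "*" standing in for "×".

def parse_expr(tokens, pos):
    # <expr> -> <expr> + <term> | <term>, evaluated left to right
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '+':
        rhs, pos = parse_term(tokens, pos + 1)
        value += rhs
    return value, pos

def parse_term(tokens, pos):
    # <term> -> <term> * <factor> | <factor>
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '*':
        rhs, pos = parse_factor(tokens, pos + 1)
        value *= rhs
    return value, pos

def parse_factor(tokens, pos, x_value=3):
    # <factor> -> x | ( <expr> )
    if tokens[pos] == 'x':
        return x_value, pos + 1
    assert tokens[pos] == '('
    value, pos = parse_expr(tokens, pos + 1)
    assert tokens[pos] == ')'
    return value, pos + 1

value, _ = parse_expr(list('x+x*x'), 0)
print(value)                          # 12, i.e. 3 + (3 * 3): * binds tighter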
Let’s look at one more example.

Example 3.3.1. L = {a^i b^i | i ≥ 0}. Here is a set of rules to generate the strings in L (to generate L, for short), starting from the variable ⟨Balanced⟩.

⟨Balanced⟩ → λ | a ⟨Balanced⟩ b

Notice how a string s ∈ L is generated: from the outside in. First the outermost a and b are created, together with a ⟨Balanced⟩ term between them; this ⟨Balanced⟩ term will be used to generate a^{i−1} b^{i−1}. Then the second outermost a and b are generated, etc. The bottom of the recursion, the base case, is the generation of the string λ.

Now we are ready to define Context Free Grammars (CFGs), G, (which have nothing to
do with graphs). A Context Free Grammar has four parts:

1. A set V of variables (such as ⟨factor⟩ or ⟨Balanced⟩); note that V is not a vertex set here.
The individual variables are usually written using single capital letters, often from the end of the alphabet, e.g. X, Y, Z; no angle brackets are used here. This has little mnemonic value, but it is easier to write. If you do want to use longer variable names, I advise using the angle brackets to delimit them.

2. An alphabet T of terminals: these are the characters used to write the strings being
generated. Usually, they are written with small letters.

3. S ∈ V is the start variable, the variable from which the string generation begins.

4. A set of rules R. Each rule has the form X → σ, where X ∈ V is a variable and σ ∈ (T ∪ V)∗ is a string of variables and terminals, which could be λ, the empty string. If we have several rules with the same lefthand side, for example X → σ1, X → σ2, · · · , X → σk, they can be written as X → σ1 | σ2 | · · · | σk for short. The meaning is that an occurrence of X in a generated string can be replaced by any one of σ1, σ2, · · · , σk. Different occurrences of X can be replaced by distinct σi, of course.

The generation of a string proceeds by a series of replacements which start from the
string s0 = S, and which, for 1 ≤ i ≤ k, obtain si from si−1 by replacing some variable X
in si−1 by one of the replacements σ1 , σ2 , · · · , σk for X, as provided by the rule collection R.
We will write this as

S = s0 ⇒ s1 ⇒ s2 ⇒ · · · ⇒ sk or S ⇒∗ sk for short.

The language generated by grammar G, L(G), is the set of strings of terminals that can be
generated from G’s start variable S:

L(G) = {w | S ⇒∗ w and w ∈ T ∗ }.

Example 3.3.2. Grammar G2 is the grammar generating the language of properly nested parentheses. It has:
Variable set: {S}.
Terminal set: {(, )}.
Rules: S → (S) | SS | λ.
Start variable: S.

Here are some example derivations.


S ⇒ SS ⇒ (S)S ⇒ ()S ⇒ ()(S) ⇒ ()((S)) ⇒ ()(()).
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()()).
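A grammar can also be run mechanically. The sketch below (names illustrative) enumerates the short strings of L(G2) by breadth-first search over sentential forms, always expanding the leftmost variable; the cap on the length of a sentential form is a crude bound that happens to suffice for this grammar.

from collections import deque

RULES = {'S': ['(S)', 'SS', '']}      # S -> (S) | SS | lambda

def language_up_to(rules, start, max_len):
    """Collect the terminal strings of length <= max_len derivable from
    `start`; characters with a rule are treated as variables."""
    found, seen = set(), {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        i = next((j for j, c in enumerate(s) if c in rules), None)
        if i is None:                  # fully terminal: s is in L(G)
            if len(s) <= max_len:
                found.add(s)
            continue
        for rhs in rules[s[i]]:        # expand the leftmost variable
            t = s[:i] + rhs + s[i + 1:]
            if len(t) <= max_len + 2 and t not in seen:
                seen.add(t)
                queue.append(t)
    return sorted(found, key=lambda x: (len(x), x))

print(language_up_to(RULES, 'S', 4))  # ['', '()', '(())', '()()']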

3.3.1 Closure Properties


Lemma 3.3.3. Let GA and GB be CFGs generating languages A and B, respectively. Then
there are CFGs generating A ∪ B, A ◦ B, A∗ .

Proof. Let GA = (VA, TA, RA, SA) and GB = (VB, TB, RB, SB). By renaming variables if needed, we can ensure that VA and VB are disjoint.
First, we show that A ∪ B is generated by the following grammar GA∪B .
GA∪B has variable set {SA∪B } ∪ VA ∪ VB , terminal set TA ∪ TB , start variable SA∪B , rules
RA ∪ RB plus the rules SA∪B → SA | SB .
To generate a string w ∈ A, GA∪B performs a derivation with first step SA∪B ⇒ SA ,
and then follows this with a derivation of w in GA : SA ⇒∗ w. So if w ∈ A, w ∈ L(GA∪B ).
Likewise, if w ∈ B, then w ∈ L(GA∪B ) also. Thus A ∪ B ⊆ L(GA∪B ).
To show L(GA∪B ) ⊆ A ∪ B is also straightforward. For if w ∈ L(GA∪B ), then there is
a derivation SA∪B ⇒∗ w. Its first step is either SA∪B ⇒ SA or SA∪B ⇒ SB . Suppose it
is SA∪B ⇒ SA . Then the remainder of the derivation is SA ⇒∗ w; this says that w ∈ A.
Similarly, if the first step is SA∪B ⇒ SB , then w ∈ B. Thus L(GA∪B ) ⊆ A ∪ B.
Next, we show that A ◦ B is generated by the following grammar GA◦B.
GA◦B has variable set {SA◦B} ∪ VA ∪ VB, terminal set TA ∪ TB, start variable SA◦B, and rules RA ∪ RB plus the rule SA◦B → SA SB.

If w ∈ A ◦ B, then w = uv for some u ∈ A and v ∈ B. So to generate w, GA◦B performs


the following derivation. The first step is SA◦B ⇒ SA SB ; this is followed by a derivation
SA ⇒∗ u, which yields the string uSB ; this is then followed by a derivation SB ⇒∗ v, which
yields the string uv = w. Thus A ◦ B ⊆ L(GA◦B ).
To show L(GA◦B ) ⊆ A ◦ B is also straightforward. For if w ∈ L(GA◦B ), then there is
a derivation SA◦B ⇒∗ w. The first step can only be SA◦B ⇒ SA SB . Let u be the string
of terminals derived from SA , and v the string of terminals derived from SB , in the full
derivation. So uv = w, u ∈ A and v ∈ B. Thus L(GA◦B ) ⊆ A ◦ B.
The fact that there is a grammar GA∗ generating A∗ we leave as an exercise for the
reader.
Lemma 3.3.4. {λ}, ∅, and {a} are all context free languages.
Proof. The grammar with the single rule S → λ generates {λ}, the grammar with no rules generates ∅, and the grammar with the single rule S → a generates {a}, where, in each case, S is the start variable.
Corollary 3.3.5. All regular languages have CFGs.
Proof. It suffices to show that for any language represented by a regular expression r there
is a CFG Gr generating the same language. This is done by means of a proof by induction
on the number of operators in r. As this is identical in structure to the proof of Lemma
2.4.1, the details are left to the reader.

3.4 Converting CFGs to Chomsky Normal Form (CNF)


A CNF grammar is a CFG with rules restricted as follows.

The right-hand side of a rule consists of:

i. either a single terminal, e.g. A → a;

ii. or two variables, e.g. A → BC;

iii. or the rule S → λ, if λ is in the language.

In addition:

iv. The start symbol S may appear only on the left-hand side of rules.

Given a CFG G, we show how to convert it to a CNF grammar G′ generating the same language.
We use the grammar G with the following rules as a running example.

S → ASA | aB;  A → B | S;  B → b | λ

We proceed in a series of steps which gradually enforce the above CNF criteria; each step
leaves the generated language unchanged.

Step 1 For each terminal a, we introduce a new variable, Ua say, add a rule Ua → a, and for each occurrence of a in a string of length 2 or more on the right-hand side of a rule, replace a by Ua. Clearly, the generated language is unchanged.
Example: If we have the rule A → Ba, it is replaced by Ua → a and A → BUa.
This ensures that terminals on the right-hand sides of rules obey criterion (i) above.
This step changes our example grammar G to have the rules:

S → ASA | Ua B;  A → B | S;  B → b | λ;  Ua → a

Step 2 For each rule with 3 or more variables on the right-hand side, we replace it with a new collection of rules obeying criterion (ii) above. Suppose there is a rule U → W1 W2 · · · Wk, for some k ≥ 3. Then we create new variables X2, X3, · · · , Xk−1, and replace the prior rule with the rules:

U → W1 X2;  X2 → W2 X3;  · · · ;  Xk−2 → Wk−2 Xk−1;  Xk−1 → Wk−1 Wk

Clearly, the use of the new rules one after another, which is the only way they can be used, has the same effect as using the old rule U → W1 W2 · · · Wk. Thus the generated language is unchanged.
This ensures, for criterion (ii) above, that no right-hand side has more than 2 variables. We have yet to eliminate right-hand sides consisting of one variable or of the form λ.
This step changes our example grammar G to have the rules:

S → AX | Ua B;  X → SA;  A → B | S;  B → b | λ;  Ua → a
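The splitting is mechanical, as the following sketch shows (the fresh variable names are illustrative). Each rule is given as a (head, body) pair, with the body a list of symbols.

def split_long_rules(rules):
    out, fresh = [], 0
    for head, body in rules:
        while len(body) >= 3:
            fresh += 1
            x = 'X%d' % fresh
            out.append((head, [body[0], x]))   # U -> W1 X2
            head, body = x, body[1:]           # continue with X2 -> W2 ...
        out.append((head, body))
    return out

print(split_long_rules([('S', ['A', 'S', 'A'])]))
# [('S', ['A', 'X1']), ('X1', ['S', 'A'])] -- i.e. S -> AX; X -> SA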

Step 3 We replace each occurrence of the start symbol S with a new variable S′ and add the rule S → S′. This ensures criterion (iv) above.
This step changes our example grammar G to have the rules:

S → S′;  S′ → AX | Ua B;  X → S′A;  A → B | S′;  B → b | λ;  Ua → a

Step 4 This step removes rules of the form A → λ.

To understand what needs to be done it is helpful to consider a derivation tree for a string w. If the tree uses a rule of the form A → λ, we label the resulting leaf with λ. We will be focusing on subtrees in which all the leaves have λ-labels; we call such subtrees λ-subtrees. Now imagine pruning all the λ-subtrees, creating a reduced derivation tree for w. Our goal is to create a modified grammar which can form the reduced derivation tree. A derivation tree and its reduced form are shown in Figure 3.13.
We need to change the grammar as follows. Whenever there is a rule A → BC and B can generate λ, we need to add the rule A → C to the grammar (note that this does not allow any new strings to be generated); similarly, if there is a rule A → DE and E can generate λ, we need to add the rule A → D; likewise, if there is a rule A → BB and B can generate λ, we need to add the rule A → B.

[Figure 3.13: (a) A Derivation Tree for string ab. (b) The Reduced Derivation Tree.]

Next, we remove all rules of the form A → λ. We argue that any previously generatable string w ≠ λ remains generatable. For given a derivation tree for w using the old rules, using the new rules we can create the reduced derivation tree, which is a derivation tree for w in the new grammar. To see this, consider a maximal λ-subtree (that is, a λ-subtree whose parent is not part of a λ-subtree). Its root v must have a sibling w and parent u (these are the names of nodes, not strings). Suppose that u has variable label A, v has label B, and w has label C. Then node v was generated by applying either the rule A → BC or the rule A → CB at node u (depending on whether v is the left or right child of u). In the reduced tree, applying the rule A → C generates w and omits v and its subtree.
Finally, we take care of the case that, under the old rules, S can generate λ. In this
situation, we simply add the rule S → λ, which then allows λ to be generated by the new
rules also.
To find the variables that can generate λ, we use an iterative rule reduction procedure.
First, we make a copy of all the rules. We then create reduced rules by removing from the
right-hand sides all instances of variables A for which there is a rule A → λ. We keep
iterating this procedure so long as it creates new reduced rules with λ on the right-hand
side.
For our example grammar we start with the rules

S → S′;  S′ → AX | Ua B;  X → S′A;  A → B | S′;  B → b | λ;  Ua → a

As B → λ is a rule, we obtain the reduced rules

S → S′;  S′ → AX | Ua B | Ua;  X → S′A;  A → B | λ | S′;  B → b | λ;  Ua → a

As A → λ is now a rule, we next obtain

S → S′;  S′ → AX | X | Ua B | Ua;  X → S′A | S′;  A → B | λ | S′;  B → b | λ;  Ua → a

There are no new rules with λ on the right-hand side. So the procedure is now complete and this yields the new collection of rules:

S → S′;  S′ → AX | X | Ua B | Ua;  X → S′A | S′;  A → B | S′;  B → b;  Ua → a

An efficient implementation keeps track of the lengths of each right-hand side, and a list
of the locations of each variable; the new rules with λ on the right-hand side are those which
have newly obtained length 0. It is not hard to have this procedure run in time linear in the
sum of the lengths of the rules.
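The procedure amounts to computing the set of nullable variables by iteration to a fixed point. A simple sketch (not the linear-time implementation just described; names are illustrative, with Sp standing for S′):

def nullable_variables(rules):
    """`rules` is a list of (head, body) pairs, body a tuple of symbols;
    the empty tuple () stands for a lambda right-hand side."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)   # every symbol of the body erases to lambda
                changed = True
    return nullable

G = [('S', ('Sp',)), ('Sp', ('A', 'X')), ('Sp', ('Ua', 'B')),
     ('X', ('Sp', 'A')), ('A', ('B',)), ('A', ('Sp',)),
     ('B', ('b',)), ('B', ()), ('Ua', ('a',))]
print(nullable_variables(G))         # {'B', 'A'}; S' and X are not nullable

As in the hand computation above, exactly B and A can generate λ.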

Step 5 This step removes rules of the form A → B, which we call unit rules.
What is needed is to replace derivations of the form A1 ⇒ A2 ⇒ · · · ⇒ Ak ⇒ BC with a one-step derivation A1 ⇒ BC; this is achieved with a new rule A1 → BC. Similarly, derivations of the form A1 ⇒ A2 ⇒ · · · ⇒ Ak ⇒ a need to be replaced with a one-step derivation A1 ⇒ a; this is achieved with a new rule A1 → a. We proceed in two substeps.
Substep 5.1. This substep identifies variables that are equivalent, i.e. collections B1 , B2 , · · · , Bl
such that for each pair Bi and Bj , 1 ≤ i < j ≤ l, Bi can generate Bj , and Bj can generate
Bi . We then replace all of B1 , B2 , · · · , Bl with a single variable, B1 say. Clearly, this does
not change the language that is generated.
To do this we form a directed graph based on the unit rules. For each variable, we create
a vertex in the graph, and for each unit rule A → B we create an edge (A, B). Figure 3.14(a)
shows the graph for our example grammar.

[Figure 3.14: (a) Graph showing the unit rules. (b) The reduced graph.]

The vertices in each strong component of the graph correspond to a collection of equivalent variables.


For the example grammar, the one non-trivial strong component contains the variables
{S , X}. We replace S  with X yielding the rules:


S → X; X → AX | X | Ua B | Ua ; X → XA | X; A → B | X; B → b; Ua → a

We can remove the useless rule X → X also.


Substep 5.2. In this substep, we add rules A → BC and A → a, as described above, so as
to shortcut derivations that were using unit rules.

To this end, we use the graph formed from the unit rules remaining after Substep 5.1,
which we call the reduced graph. It is readily seen that this is an acyclic graph.
In processing A → B, we will add appropriate non-unit rules that allow the shortcutting
of all uses of A → B, and hence allow the rule A → B to be discarded. If there are no unit
rules with B on the left-hand side it suffices to add a rule A → CD for each rule B → CD,
and a rule A → b for each rule B → b.
To be able to do this, we just have to process the unit rules in a suitable order. Recall
that each unit rule is associated with a distinct edge in the reduced graph. As this graph
will be used to determine the order in which to process the unit rules, it will be convenient
to write “processing an edge” when we mean “processing the associated rule”. It suffices to
ensure that each edge is processed only after any descendant edges have been processed. So
it suffices to start at vertices with no outedges and to work backward through the graph.
This is called a reverse topological traversal. (This traversal can be implemented via a depth
first search on the acyclic reduced graph.)
For each traversed edge (E, F ), which corresponds to a rule E → F , for each rule
F → CD, we add the rule E → CD, and for each rule F → f , we add the rule E → f ; then
we remove the rule E → F . Any derivation which had used the rules E → F followed by
F → CD or F → f can now use the rule E → CD or E → f instead. So the same strings
are derived with the new set of rules.
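A sketch of this processing order (names illustrative): unit rules are stored as a graph and processed recursively, descendants first, which realizes the reverse topological traversal on the acyclic reduced graph.

def eliminate_unit_rules(unit, nonunit):
    """`unit` maps E to the set of F with a rule E -> F; `nonunit` maps a
    variable to its set of non-unit right-hand sides (tuples)."""
    done = set()
    def process(e):
        if e in done:
            return
        done.add(e)
        for f in unit.get(e, ()):
            process(f)                        # handle descendant edges first
            nonunit.setdefault(e, set()).update(nonunit.get(f, set()))
        unit.pop(e, None)                     # all unit rules out of e removed
    for e in list(unit):
        process(e)
    return nonunit

unit = {'S': {'X'}, 'A': {'B', 'X'}, 'X': {'Ua'}}
nonunit = {'X': {('A', 'X'), ('X', 'A'), ('Ua', 'B')},
           'A': set(), 'B': {('b',)}, 'Ua': {('a',)}}
print(eliminate_unit_rules(unit, nonunit))
# A and S each acquire X's rules (plus, for A, B's rule b), matching the trace below.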
This step changes our example grammar G as follows (see Figure 3.14(b)):
First, we traverse edge (A, B). This changes the rules as follows:
Add A → b
Remove A → B.
Next, we traverse edge (X, Ua ). This changes the rules as follows:
Add X → a
Remove X → Ua .
Now, we traverse edge (A, X). This changes the rules as follows:
Add A → AX | XA | Ua B | a.
Remove A → X.
Finally, we traverse edge (S, X). This changes the rules as follows:
Add S → AX | XA | Ua B | a.
Remove S → X.
The net effect is that our grammar now has the rules

S → AX | Ua B | XA | a;  X → AX | Ua B | XA | a;  A → b | AX | Ua B | XA | a;  B → b;  Ua → a
Steps 4 and 5 complete the attainment of criterion (ii), and thereby create a CNF grammar generating the same language as the original grammar.

3.5 Showing Languages are not Context Free


We will do this with the help of a Pumping Lemma for Context Free Languages. To prove
this lemma we need two results relating the height of derivation trees and the length of the
derived strings, when using CNF grammars.


Lemma 3.5.1. Let T be a derivation tree of height h for string w ≠ λ using CNF grammar G. Then |w| ≤ 2^{h−1}.
Proof. The result is easily confirmed by strong induction on h. Recall that the height of the tree is the length, in edges, of the longest root to leaf path.
The base case, h = 1, occurs with a tree of two nodes, the root and a leaf child. Here, w is the one terminal character labeling the leaf, so |w| = 1 = 2^{h−1}; thus the claim is true in this case.
Suppose that the result is true for trees of height k or less. We show that it is also true for trees of height k + 1. To see this, note that the root of T has two children, each one being the root of a subtree of height k or less. Thus, by the inductive hypothesis, each subtree derives a string of length at most 2^{k−1}, yielding that T derives a string of length at most 2 · 2^{k−1} = 2^k. This shows the inductive claim for h = k + 1.
It follows that the result holds for all h ≥ 1.
Corollary 3.5.2. Let w be the string derived by derivation tree T using CNF grammar G. If |w| > 2^{h−1}, then T has height at least h + 1 and hence has a root to leaf path with at least h + 1 edges and at least h + 1 internal nodes.
Now let's consider the language L = {a^i b^i c^i | i ≥ 0}, which is not a CFL, as we shall proceed to show. Let's suppose for a contradiction that L were a CFL. Then it would have a CNF grammar G, with m variables say. Let p = 2^m.
Let's consider the string s = a^p b^p c^p ∈ L, and look at the derivation tree T for s. As |s| > 2^{m−1}, by Corollary 3.5.2, T has a root to leaf path with at least m + 1 internal nodes. Let P be a longest such path. Each internal node on P is labeled by a variable, and as P has at least m + 1 internal nodes, some variable must be used at least twice.
Working up from the bottom of P, let c be the first node to repeat a variable label. So on the portion of P below c each variable is used at most once. The derivation tree is shown in Figure 3.15. Let d be the descendant of c on P having the same variable label as c, A say. Let w be the substring derived by the subtree rooted at d. Let vwx be the substring derived by the subtree rooted at c (so v, for example, is derived by the subtrees hanging from P to its left side on the portion of P starting at c and ending at d's parent). Let uvwxy be the string derived by the whole tree.
Observation 1. The height of c is at most m + 1. Hence, by Lemma 3.5.1, |vwx| ≤ 2^m = p. This follows because P is a longest root to leaf path and because no variable label is repeated on the path below node c.
Observation 2. Either |v| ≥ 1 or |x| ≥ 1 (or both). We abbreviate this as |vx| ≥ 1.
For node c has two children, one on path P , and a second child, which we name e, that is
not on path P . This is illustrated in Figure 3.16. Suppose that e is c’s right child. Let x2
be the string derived by the subtree rooted at e. Then, as e is not a leaf, |x2 | ≥ 1. Clearly
x2 is the right end portion of x (it could be that x = x2 ); thus |x| ≥ 1. Similarly if e is c’s
left child, |v| ≥ 1.

[Figure 3.15: Derivation Tree for s ∈ L, showing path P, nodes c and d with label A, and the decomposition u v w x y.]

[Figure 3.16: Possible Locations of e, the Off-Path Child of node c.]



Let’s consider replicating the middle portion of the derivation tree, namely the wedge
W formed by taking the subtree C rooted at c and removing the subtree D rooted at d, to
create a derivation tree for a longer string, as shown in Figure 3.17.

[Figure 3.17: Duplicating wedge W.]

We can do this because

the root of the wedge is labeled by A and hence W plus a nested subtree D is a legitimate
replacement for subtree D. The resulting tree, with two copies of W , one nested inside the
other, is a derivation tree for uvvwxxy. Thus uvvwxxy ∈ L.
Clearly, we could duplicate W more than once, or remove it entirely, showing that all the strings u v^i w x^i y ∈ L, for any integer i ≥ 0.
Now let's see why uvvwxxy ∉ L. Note that we know that |vx| ≥ 1 and |vwx| ≤ p, by Observations 1 and 2. Further, recall that a^p b^p c^p = uvwxy. As |vwx| ≤ p, it is contained entirely in either one or two adjacent blocks of letters, as illustrated in Figure 3.18.

[Figure 3.18: Possible Locations of vwx in a^p b^p c^p.]

Therefore,

when v and x are duplicated, as |vx| ≥ 1, the number of occurrences of one or two of the characters increases, but not of all three. Consequently, in uvvwxxy there are not equal numbers of a's, b's, and c's, and so uvvwxxy ∉ L.
We have shown both that uvvwxxy ∈ L and uvvwxxy ∉ L. This contradiction means that the original assumption (that L is a CFL) is mistaken. We conclude that L is not a CFL.
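For a small p, the argument can even be checked exhaustively by machine. The following sketch tries every split s = uvwxy obeying conditions (1) and (2) and tests pumping up to i = 3 (a necessary condition), confirming that no split works; names are illustrative.

def in_L(t):                       # membership in {a^i b^i c^i | i >= 0}
    i = len(t) // 3
    return t == 'a' * i + 'b' * i + 'c' * i

def pumpable(s, p):
    n = len(s)
    for a in range(n + 1):                          # s = u v w x y
        for b in range(a, min(a + p, n) + 1):       # |vwx| <= p
            for c in range(b, min(a + p, n) + 1):
                for d in range(c, min(a + p, n) + 1):
                    u, v, w, x, y = s[:a], s[a:b], s[b:c], s[c:d], s[d:]
                    if len(v) + len(x) < 1:         # condition (1): |vx| >= 1
                        continue
                    if all(in_L(u + v * i + w + x * i + y) for i in range(4)):
                        return True
    return False

p = 2
print(pumpable('a' * p + 'b' * p + 'c' * p, p))     # False: no split pumps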
We are now ready to prove the Pumping Lemma for Context Free Languages, which
will provide a tool to show many languages are not Context Free, in the style of the above
argument.

Lemma 3.5.3. (Pumping Lemma for Context Free Languages.) Let L be a Context Free
Language. Then there is a constant p = pL such that if s ∈ L and |s| ≥ p then s is pumpable,
that is s can be written in the form s = uvwxy with

1. |vx| ≥ 1.

2. |vwx| ≤ p.

3. For every integer i, i ≥ 0, u v^i w x^i y ∈ L.

Proof. Let G be a CNF grammar for L, with m variables say. Let p = 2^m. Let s ∈ L, where |s| ≥ p, and let T be a derivation tree for s. As |s| > 2^{m−1}, by Corollary 3.5.2, T has a root to leaf path with at least m + 1 internal nodes. Let P be a longest such path. Each internal node on P is labeled by a variable, and as P has at least m + 1 internal nodes, some variable must be used at least twice.
Working up from the bottom of P, let c be the first node to repeat a variable label. So on the portion of P below c each variable is used at most once. Thus c has height at most m + 1. The derivation tree is shown in Figure 3.15. Let d be the descendant of c on P having the same variable label as c, A say. Let w be the substring derived by the subtree rooted at d. Let vwx be the substring derived by the subtree rooted at c. Let uvwxy be the string derived by the whole tree.
By Observation 1, c has height at most m + 1; hence, by Lemma 3.5.1, vwx, the string c derives, has length at most 2^m = p. This shows Property (2). Property (1) is shown in Observation 2, above.
Finally, we show property (3). Let’s replicate the middle portion of the derivation tree,
namely the wedge W formed by taking the subtree C rooted at c and removing the subtree
D rooted at d, to create a derivation tree for a longer string, as shown in Figure 3.17.
We can do this because the root of the wedge is labeled by A and hence W plus a nested
subtree D is a legitimate replacement for subtree D. The resulting tree, with two copies of
W , one nested inside the other, is a derivation tree for uvvwxxy. Thus uvvwxxy ∈ L.
Clearly, we could duplicate W more than once, or remove it entirely, showing that all the strings u v^i w x^i y ∈ L, for any integer i ≥ 0.

Next, we demonstrate by example how to use the Pumping Lemma to show languages are
not context free. The argument structure is identical to that used in applying the Pumping
Lemma to regular languages.

Example 3.5.4. J = {ww | w ∈ {a, b}∗}. We will show that J is not context free.
Step 1. Suppose, for a contradiction, that J were context free. Then, by the Pumping Lemma, there is a constant p = pJ such that for any s ∈ J with |s| ≥ p, s is pumpable.
Step 2. Choose s = a^{p+1} b^{p+1} a^{p+1} b^{p+1}. Clearly s ∈ J and |s| ≥ p, so s is pumpable.

Step 3. As s is pumpable, we can write s as s = uvwxy with |vwx| ≤ p, |vx| ≥ 1, and u v^i w x^i y ∈ J for all integers i ≥ 0. In particular, by condition (3) with i = 0, s′ = uwy ∈ J. We argue next that in fact s′ ∉ J. As |vwx| ≤ p, vwx can overlap one or two adjacent blocks of characters in s but no more. Now, to obtain s′ from s, v and x are removed. This takes away characters from one or two adjacent blocks in s, but at most p characters in all (as |vx| ≤ p). Thus s′ has four blocks of characters, with either one of the blocks of a's shorter than the other, or one of the blocks of b's shorter than the other, or possibly both of these. In every case s′ ∉ J. We have shown that both s′ ∈ J and s′ ∉ J. This is a contradiction.
Step 4. The contradiction shows that the initial assumption was mistaken. Consequently, J is not context free.

Comment. Suppose, by way of example, that vwx overlaps the first two blocks of characters.
It would be incorrect to assume that v is completely contained in the block of a’s and x in
the block of b’s. Further, it may be that v = λ or x = λ (but not both). All you know is
that |vwx| ≤ p and that one of |v| ≥ 1 or |x| ≥ 1. Don’t assume more than this.
Example 3.5.5. K = {a^i b^j c^k | i < j < k}. We show that K is not context free.
Step 1. Suppose, for a contradiction, that K were context free. Then, by the Pumping Lemma, there is a constant p = pK such that for any s ∈ K with |s| ≥ p, s is pumpable.
Step 2. Choose s = a^p b^{p+1} c^{p+2}. Clearly, s ∈ K and |s| ≥ p, so s is pumpable.
Step 3. As s is pumpable, we can write s as s = uvwxy with |vwx| ≤ p, |vx| ≥ 1, and u v^i w x^i y ∈ K for all integers i ≥ 0.
As |vwx| ≤ p, vwx can overlap one or two blocks of the characters in s, but not all three.
Our argument for obtaining a contradiction depends on the position of vwx.
Case 1. vx does not overlap the block of c's.
Then consider s′ = uvvwxxy. As s is pumpable, by Condition (3) with i = 2, s′ ∈ K. We argue next that in fact s′ ∉ K. As v and x have been duplicated in s′, the number of a's or the number of b's is larger than in s (or possibly both numbers are larger); but the number of c's does not change. If the number of b's has increased, then s′ has at least as many b's as c's, and then s′ ∉ K. Otherwise, the number of a's increases, and the number of b's is unchanged, so s′ has at least as many a's as b's, and again s′ ∉ K.
Case 2. vwx does not overlap the block of a's.
Then consider s′ = uwy. Again, as s is pumpable, by Condition (3) with i = 0, s′ ∈ K. Again, we show that in fact s′ ∉ K. To obtain s′ from s, the v and the x are removed. So in s′ either the number of c's is smaller than in s, or the number of b's is smaller (or both). But the number of a's is unchanged. If the number of b's is reduced, then s′ has at least as many a's as b's, and so s′ ∉ K. Otherwise, the number of c's decreases and the number of b's is unchanged; but then s′ has at least as many b's as c's, and again s′ ∉ K.

In either case, a pumped string s has been shown to be both in K and not in K. This
is a contradiction.
Step 4. The contradiction shows that the initial assumption was mistaken. Consequently,
K is not context free.

The following example uses the yet-to-be-proven result (Lemma 3.6.2) that if L is context
free and R is regular then L ∩ R is also context free.

Example 3.5.6. H = {w | w ∈ {a, b, c}∗ and w has equal numbers of a’s, b’s and c’s}.
Consider H ∩ a∗b∗c∗ = L = {a^i b^i c^i}. If H were context free, then L would be context free
too. But we have already seen that L is not context free. Consequently, H is not context
free either.
This could also be shown directly by pumping on the string s = a^p b^p c^p.

When applying the Pumping Lemma, it seems a nuisance to have to handle the cases
where one of v or x may be the empty string, and fortunately, we can prove a variant of the
Pumping Lemma in which both |v| ≥ 1 and |x| ≥ 1.

Lemma 3.5.7. (Variant of the Pumping Lemma for Context Free Languages.) Let L be a
Context Free Language. Then there is a constant p = pL such that if s ∈ L and |s| ≥ p then
s is pumpable, that is s can be written in the form s = uvwxy with

1. |v|, |x| ≥ 1.

2. |vwx| ≤ p.

3. For every integer i, i ≥ 0, uv^i wx^i y ∈ L.

Proof. Let p̃ = p̃L be the pumping constant for the standard pumping lemma applied to L. We
will choose pL = 2p̃.
Now let s ∈ L be any string of length at least p = pL .
We apply the standard pumping lemma to s and conclude that we can write s as s =
ũṽ w̃x̃ỹ with

1. |ṽx̃| ≥ 1.

2. |ṽ w̃x̃| ≤ p̃.

3. For every integer i, i ≥ 0, ũṽ^i w̃x̃^i ỹ ∈ L.

If both |ṽ| ≥ 1 and |x̃| ≥ 1 then the new result follows on setting u = ũ, v = ṽ, w = w̃,
x = x̃, y = ỹ.
Otherwise, if |ṽ| ≥ 1 and |x̃| = 0, then we set u = ũ, v = ṽ, w = λ, x = ṽ, y = w̃ỹ.
We observe that for all i, uv^i wx^i y = ũṽ^i λṽ^i w̃ỹ = ũṽ^{2i} w̃x̃^{2i} ỹ ∈ L; we also observe that
|vwx| = 2|ṽ| ≤ 2|ṽw̃x̃| ≤ 2p̃ = p. Likewise, if |ṽ| = 0 and |x̃| ≥ 1, then we set u = ũw̃, v = x̃,
w = λ, x = x̃, y = ỹ. Again, uv^i wx^i y ∈ L for all integers i ≥ 0 and |vwx| ≤ p.

3.6 PDAs Recognize exactly the Context Free Languages
Lemma 3.6.1. Let L be a CFL. Then there is a PDA ML that recognizes L.
Proof. Let L be generated by CNF grammar GL = (VL , T, RL , SL ). The corresponding ML
is illustrated in Figure 3.19. The computation centers on vertex Main.

[Figure 3.19 here shows ML: from its start vertex, ML pushes the $ shield and then S while reading nothing, arriving at vertex Main; a self-loop at Main labeled "Read a, Pop A" simulates a rule A → a; a pair of edges through an auxiliary vertex, labeled "Pop A, Push C" and "Push B", simulates a rule A → BC; and when only $ remains on the stack, ML may pop it and move to its final vertex.]

Figure 3.19: PDA ML simulating CNF grammar GL.

Each return visit to Main corresponds to the simulation of one step of a derivation in GL. Specifically:


Claim. Let s ∈ T∗ and σ ∈ V∗. Then
GL generates string sσ
exactly if
ML can reach vertex Main with data configuration (s, $σ^R).
In order to simulate the derivation's use of rule A → a, ML has a self-loop labeled (Pop
A, Read a). To simulate the use of rule A → BC, ML will execute the sequence Pop A,
Push C, Push B (remember, $σ^R is on the stack). To achieve this, ML has an additional
vertex called "Push B" and edges (Main, "Push B") and ("Push B", Main), labeled (Pop
A, Push C) and Push B, respectively. It will be helpful to refer to these actions, which take
ML from vertex Main back to itself, as supermoves. So each supermove of ML corresponds
to one derivation step in GL.
A derivation of a terminal string s is complete exactly when σ = λ. To allow this to be recognized, ML
uses a $-shielded stack. Then if ML is at vertex Main with data-configuration (s, $), it can
pop its stack and move to its final vertex. Thus if we can show the claim it is immediate
that S ⇒∗ w exactly if w ∈ L(ML).
We show the claim in two steps. First, suppose that S ⇒∗ w. Then there is a leftmost
derivation S = s1σ1 ⇒ s2σ2 ⇒ · · · ⇒ sk+1σk+1 = w. (In a leftmost derivation, at step
i, the leftmost variable in σi, for 1 ≤ i ≤ k, is always the one to be replaced using a

rule of the grammar. It corresponds to a depth first traversal of the derivation tree.) The
corresponding k-supermove computation by ML starts by moving to vertex Main with λ read
and S on its stack, i.e. it has data configuration C1 = (λ, $S) = (s1, $σ1^R). It then proceeds
through the following k data configurations at vertex Main: C2 = (s2, $σ2^R), · · · , Ck+1 =
(sk+1, $σk+1^R) = (w, $), and it goes from Ci to Ci+1, for 1 ≤ i ≤ k, by means of the supermove
corresponding to the application of the rule taking siσi to si+1σi+1. Finally ML moves from
vertex Main to its final vertex, popping $ from its stack, and thereby recognizing w; i.e.
w ∈ L(ML).
Next suppose that ML recognizes w. It does so by means of a computation using k
supermoves, for some k ≥ 0. The computation begins by moving to vertex Main, while
reading λ and pushing the $ shield and S on the stack, i.e. it is at configuration C1 = (λ, $S) = (s1, $σ1^R).
It then moves through the following series of data configurations at vertex Main: C2 =
(s2, $σ2^R), · · · , Ck+1 = (sk+1, $σk+1^R), where, for 1 ≤ i ≤ k, Ci+1 is reached from Ci by means
of a supermove. By definition, the ith supermove corresponds to the application of the
rule that takes string siσi to si+1σi+1. Thus, the following is a derivation in grammar GL:
S = s1σ1 ⇒ s2σ2 ⇒ · · · ⇒ sk+1σk+1. Following the kth supermove, ML moves to its final
vertex, which entails popping $ from its stack; this requires that σk+1 = λ. At this point, ML has data configuration
(sk+1, λ), and it accepts sk+1, the string read. By assumption, ML was recognizing w, thus
sk+1 = w. We conclude that S ⇒∗ sk+1σk+1 = sk+1 = w.
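
The two directions of the claim can be mirrored by a short simulation that explores ML's supermoves directly. The sketch below (Python; the grammar encoding and the example rules are our own illustrative assumptions) keeps a configuration (input position, stack of variables), with the $ shield left implicit: an empty tuple plays the role of the data configuration (s, $). The pruning step uses the fact that in a CNF grammar without λ-rules every stacked variable must still derive at least one input character.

    # A sketch of M_L from Lemma 3.6.1, simulated on its supermoves.
    # Hypothetical CNF grammar: S -> AB, A -> a, B -> b (generates {ab}).
    RULES = {"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}
    START = "S"

    def accepts(w):
        seen = set()                        # configurations already explored

        def step(pos, stack):               # stack: tuple, last entry on top
            if (pos, stack) in seen:
                return False
            seen.add((pos, stack))
            if not stack:                   # stack holds just $: pop it, accept
                return pos == len(w)
            if len(stack) > len(w) - pos:   # each variable derives >= 1 letter
                return False
            top, rest = stack[-1], stack[:-1]
            for body in RULES.get(top, []):
                if len(body) == 1:          # supermove for A -> a: Pop A, Read a
                    if pos < len(w) and w[pos] == body[0] and step(pos + 1, rest):
                        return True
                else:                       # supermove for A -> BC: Pop A, Push C, Push B
                    if step(pos, rest + (body[1], body[0])):
                        return True
            return False

        return step(0, (START,))

    print(accepts("ab"), accepts("ba"))     # True False
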

Lemma 3.6.2. If L is context-free and R is regular then L ∩ R is also context free, where
L, R ⊆ Σ∗ .

Proof. Let GL = (VL, Σ, RL, SL) be a CNF grammar generating L and let MR = (VR, start, FR, δR)
be a DFA recognizing R. We will build a grammar GL∩R = (VL∩R, Σ, RL∩R, SL∩R) to generate
L ∩ R. Let VR = {q1, q2, · · · , qm}. For each variable U ∈ VL, we create m^2 variables Uij
in VL∩R. The rules we create will ensure that:

Uij ⇒∗ w ∈ Σ∗
exactly if
U ⇒∗ w and there is a path labeled w in MR going from qi to qj.

Thus a variable in GL∩R records the name of the corresponding variable in GL and also
records a “start” and a “finish” vertex in MR . The constraint we are imposing is that Uij
can generate w exactly if both U can generate w and MR when started at qi will go to qj on
input w. It follows that if qf is a final vertex of MR and if q1 = start, then

(SL )1f ⇒∗ w for some qf ∈ FR


exactly if
w ∈ L ∩ R.

We create the following rules.

• If A → a is a rule in GL and δR(qi, a) = qj, then Aij → a is a rule in GL∩R.



• If A → BC is a rule in GL, then Aik → Bij Cjk is a rule in GL∩R, for all i, j, k,
1 ≤ i, j, k ≤ m.

• Finally, if SL → λ is a rule in GL and λ ∈ R (because q1 = start ∈ FR), then SL∩R → λ is a
rule in GL∩R.

In addition, we make SL∩R the start variable and add the rules SL∩R → (SL)1f for each qf ∈ FR.

To see why this works, we consider any non-empty string w ∈ L ∩ R and look at a
derivation tree T for w with respect to GL . At the same time we look at a w-recognizing
path P in MR .
We label each leaf of T with the names of two vertices in MR : the vertices that MR is
at right before and right after reading the input character corresponding to the character
labeling the leaf. If we read across the leaves from left to right, recording vertex and character
labels, we obtain a sequence p1w1p2, p2w2p3, · · · , pnwnpn+1, where p1 = start and pn+1 ∈ FR.
Next, we give the internal nodes in T vertex labels also. A node receives as its first label
the first label of its leftmost leaf and as its second label the second label of its rightmost
leaf. Suppose that A is the variable label at an internal node with children having variable
labels B and C (see Figure 3.20). Suppose further that B receives vertex labels p and q (in
that order), and C receives vertex labels r and s. Then q = r and A receives vertex labels p
and s.

[Figure 3.20 here shows an internal node A with children B and C: B carries vertex labels p and q, C carries r and s, with q = r, and A receives the labels p and s.]

Figure 3.20: Vertex Labels in Derivation Tree T: first label on left, second label on right.

To obtain the derivation in GL∩R, we simply replace A ⇒ BC by Aps ⇒ Bpq Cqs. In
addition, at the leaf level, we replace A ⇒ a by Apq ⇒ a where p and q are the vertex labels
on the leaf (and on its parent). Clearly, this is a derivation in GL∩R and further it derives
w. Thus if w ∈ L ∩ R, then SL∩R ⇒∗ w.
On the other hand, suppose that SL∩R ⇒∗ w. Then consider the derivation tree for
w in GL∩R. On replacing each variable Uij by U we obtain a derivation tree for w in GL.
Thus SL ⇒∗ w also. On looking at the leaf level, and labeling each leaf with the vertex
indices on the variable at its parent, we obtain a sequence p1w1p2, p2w2p3, · · · , pnwnpn+1,
where w = w1w2 · · · wn, p1 = start and pn+1 ∈ FR. As δR(pi, wi) = pi+1, for 1 ≤ i ≤ n, by the
first rule definition for GL∩R, we see that p1p2 · · · pn+1 is a w-recognizing path in MR, and
so w ∈ R. This shows that if SL∩R ⇒∗ w then w ∈ L ∩ R.
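
The construction is mechanical enough to write down directly. Here is a minimal sketch (Python; the encodings of the rules and of the DFA's transition function are our own conventions, not notation from the text) that builds the rule set of GL∩R, including the unit rules from the new start variable:

    # A sketch of Lemma 3.6.2: variables of the new grammar are triples
    # (U, i, j), meaning "U derives w and MR goes from state i to j on w".
    def intersect(rules, start_var, delta, start_state, finals, states):
        # rules: dict variable -> list of bodies, each ("a",) or ("B", "C")
        # delta: dict (state, terminal) -> state    (the DFA's delta_R)
        new_rules = {}

        def add(head, body):
            new_rules.setdefault(head, []).append(body)

        for A, bodies in rules.items():
            for body in bodies:
                if len(body) == 1:                   # rule A -> a
                    a = body[0]
                    for i in states:
                        if (i, a) in delta:
                            add((A, i, delta[(i, a)]), (a,))
                else:                                # rule A -> BC
                    B, C = body
                    for i in states:
                        for j in states:
                            for k in states:
                                add((A, i, k), ((B, i, j), (C, j, k)))
        for f in finals:                             # start rules: S0 -> (S, start, f)
            add("S0", ((start_var, start_state, f),))
        return new_rules
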

3.6.1 Constructing a CFG Generating a PDA-Recognized Language

Our final construction will show that if L is recognized by a pda, then there is a CFG
generating L. The first step in the construction is to represent the computation of the pda
in terms of matching pushes and pops to the stack.
A little more terminology will be helpful.
Definition 3.6.3. A computation of M that goes from vertex p to vertex q and begins and
ends with σ on the stack is called σ-preserving if throughout the computation σ remains on
the stack (possibly with other characters pushed on top some of the time); i.e. none of σ is
popped during this part of the computation.
Lemma 3.6.4. Let P be a w-computation path going from vertex p to vertex q that reads
input w and that is σ-preserving for some σ. Then the same computation is also τ -preserving,
for any τ .
Proof. The computation never seeks to pop any of the characters forming σ from the stack,
so it does not matter what is on this portion of the stack. All that matters is what is
pushed once the computation starts. Thus if the σ is replaced by τ it has no effect on the
computation.
In other words, the specific σ on the stack when a σ-preserving computation begins is
irrelevant. Accordingly, we also call a σ-preserving computation a stack-preserving compu-
tation.
Definition 3.6.5. (s, σ) is called a data-configuration of PDA M at vertex p if M can
end up at vertex p having σ on its stack and having read input string s (on starting the
computation at its start vertex with an empty stack).

The Trapezoidal Decomposition


Let ML = (V, T, Γ, F, s, δ) be a $-shielded pda recognizing L that has a single final vertex
f which is reachable only with an empty stack. Let us further suppose that on each move
ML does either a Pop or a Push but not both (so a move in which neither a Pop nor a Push
occurs can be replaced by two moves, the first being an unnecessary Push and the second
being a Pop; likewise, a move which has both a Pop and a Push can be replaced by two
moves: a Pop followed by a Push). Finally, we suppose that the first and last steps of ML's
computation must go from vertex s to s′ and from f′ to f, respectively, and do not do
any reads, but just push and pop the $ shield.
For this construction, it is helpful to view ML ’s computation in terms of a Stack Contents
Diagram, as shown in Figure 3.21. This shows the height of the stack evolving as the
computation proceeds. We associate each vertex in the computation path with a matching
successor or predecessor vertex (or possibly both), as follows.
Definition 3.6.6. Vertices p and r are matched if r is the first vertex following p for which
the computation path from p to r is σ-preserving.

[Figure 3.21 here plots stack height (vertical axis) against the computation path (horizontal axis), from a vertex p to its matched vertex t.]

Figure 3.21: Stack Contents Diagram.

If we draw the edges connecting matched vertices in the Stack Contents Diagram, this
naturally partitions the diagram into trapezoids (see Figure 3.22).

[Figure 3.22 here shows the Stack Contents Diagram partitioned into nested trapezoids; one trapezoid has lower corners p and t and upper corners q and r.]

Figure 3.22: Trapezoid Decomposition
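
Computing the matched vertices is just the familiar parenthesis-matching computation on the sequence of Push and Pop moves. A minimal sketch (Python; the computation is given as a balanced list of "push"/"pop" moves, with vertex names and read characters omitted for brevity):

    # Recover the matched pairs that bound each phase; the pairs nest,
    # which is exactly the trapezoid decomposition of Figure 3.22.
    def match_phases(moves):
        pending, pairs = [], []
        for i, m in enumerate(moves):
            if m == "push":
                pending.append(i)
            else:                    # a pop closes the most recent open push
                pairs.append((pending.pop(), i))
        return sorted(pairs)

    print(match_phases(["push", "push", "pop", "push", "pop", "pop"]))
    # [(0, 5), (1, 2), (3, 4)] -- (1, 2) and (3, 4) are subphases nested in (0, 5)
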

Consider the subcomputation of ML that goes from a vertex p to its subsequent matched
vertex t. We call the subcomputation a phase; let P denote this phase. Let A be the
character pushed onto the stack in P's first step; then this same A is popped in P's last
step. Suppose that the first step performs the operation "Read a, Push A" and takes ML
from vertex p to vertex q, and the last step performs the operation "Read b, Pop A" and
takes ML from vertex r to vertex t. (Note that a, b ∈ {λ} ∪ T.) Then this pair of actions
can be represented by the left and right edges of the following trapezoid: it has left edge
(p, q) and right edge (r, t). We name this trapezoid T^{Aab}_{pqrt}. We call trapezoid T^{Aab}_{pqrt} the base of
phase P. It may be that P is a 2-step computation, in which case the trapezoid is actually
a triangle, and q = r; we call such trapezoids triangles. But even if q = r it could be that P
lasts longer than 2 steps.
If the base of phase P is a non-triangular trapezoid, then P consists of a series of one or
more subphases. Each subphase is stack-preserving, maintaining the A on the stack. Again,
for each subphase P′, we can identify the trapezoid at its base; again, this trapezoid represents
the computation performed during the first and last steps of P′.

We can view a w-recognizing computation of ML as a nested collection of phases which


we represent using a trapezoidal tree. In a trapezoidal tree, each leaf node holds a triangle
and each internal node a non-triangular trapezoid. Each subtree corresponds to a phase,
which is a stack preserving computation of ML . In addition, the base of the phase is the
trapezoid stored at the root of the corresponding subtree.
The following additional terminology will be helpful in specifying which trapezoidal trees
could occur.
Definition 3.6.7. Trapezoid T^{Aab}_{pqrt} is realizable if ML has an edge from p to q labeled "Read
a, Push A" and an edge from r to t labeled "Read b, Pop A."
Only realizable trapezoids could occur in the Trapezoidal Decomposition of a computation
by ML .
Definition 3.6.8. Let Z be a tree in which each internal node stores a realizable trapezoid
and each leaf a realizable triangle. Z's label is determined recursively as follows. Suppose
that Z is rooted at internal node z and has subtrees Z1, Z2, · · · , Zk in left to right order,
where z stores trapezoid T^{Aab}_{pqrt}. Then Z has label:

label(Z) = a ◦ label(Z1) ◦ label(Z2) ◦ · · · ◦ label(Zk) ◦ b.

Also p is called Z's start vertex and t its end vertex.
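
Viewed as code, label(Z) is a one-line recursion. In the sketch below (Python; the tuple encoding of a tree node is our own assumption), each node carries the pair (a, b) read on its trapezoid's side edges, with λ written as the empty string:

    def label(node):
        a, b, children = node              # node = (a, b, [subtrees])
        return a + "".join(label(c) for c in children) + b

    # A triangle reading a then b, nested inside a trapezoid reading (λ, λ),
    # as at the root of the tree for a $-shielded computation:
    Z = ("", "", [("a", "b", [])])
    print(label(Z))                        # prints: ab
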


Then a w-recognizing computation can be represented by a trapezoidal tree Z(w), a tree
which satisfies the following properties.
Property 3.6.9. 1. Each leaf stores a realizable triangle.

2. Each internal node stores a realizable non-triangular trapezoid.

3. The root of the tree stores trapezoid T^{$λλ}_{ss′f′f}. Recall that s is ML's start vertex and f its
sole final vertex. Also the first and last steps of ML's computation go from vertex
s to s′ and from f′ to f, respectively, and do not perform any reads, but just push and
pop the $-shield.

4. If v is an internal node with children v1, v2, · · · , vk, for some k ≥ 1, if v stores trapezoid
T^{Aab}_{pqrt}, and if vi stores a trapezoid with bottom edge (qi, ri), for 1 ≤ i ≤ k, then ri = qi+1,
for 1 ≤ i ≤ k − 1, q = q1 and r = rk.
Lemma 3.6.10. Let Π be a stack-preserving computation of ML that begins at vertex p and
ends at vertex t and that reads input w, and let Z be the corresponding trapezoidal tree. Then
label(Z) = w.
Proof. We prove the result by induction on the height of Z.
Base case. Z comprises a single (leaf) node v.
Let T^{Aab}_{prrt} be the triangle stored at node v. Then the computation of ML comprises two steps,
the first one reading a and the second reading b (there is also a “Push A” and a “Pop A” in

the first and second steps, respectively); thus w = ab. And label(Z) = ab = w, proving the
claim in this case.
Inductive step. Suppose that the claim is true for all trapezoid trees of height h ≤ l. We
show that it is true for trees Z of height l + 1 also.
Let z be the root of Z and suppose it has subtrees Z1 , Z2 , · · · Zk in left to right order,
for some k ≥ 1. Let T^{Aab}_{pqrt} be the trapezoid stored at z. Then the computation of ML
represented by Z consists of a first step “Read a, Push A”, followed in turn by the stack-
preserving computations represented by Z1 , Z2 , · · · Zk , followed by a final step “Read b, Pop
A”. Suppose that ML reads string wi during the computation corresponding to Zi . By the
inductive hypothesis, label(Zi ) = wi . Thus ML ’s computation corresponding to tree Z reads
aw1 w2 · · · wk b = w. And label(Z) = a ◦ label(Z1 ) ◦ label(Z2 ) ◦ · · · ◦ label(Zk ) ◦ b = w.
We conclude that label(Z) = w for all trees Z.
We now show the converse.
Lemma 3.6.11. Let Z be a trapezoidal tree that observes Property 3.6.9. Suppose that
label(Z) = w. Then there is a stack-preserving computation of ML that reads w, starts at
Z’s start vertex and ends at Z’s end vertex.
Proof. We prove the result by induction on the height of Z.
Base case. Z comprises a single (leaf) node v.
Let T^{Aab}_{prrt} be the triangle stored at node v. Clearly, label(Z) = ab, so w = ab.
Now define a computation of ML comprising the following two steps: the first step
consists of "Read a, Push A" plus a move from p, Z's start vertex, to vertex r; the second
step consists of "Read b, Pop A" plus a move to vertex t, Z's end vertex. Clearly this
computation is stack-preserving, it reads ab = w, and it begins at Z's start vertex and ends
at Z's end vertex.
Inductive step. Suppose that the claim is true for all trapezoid trees Z of height h ≤ l. We
show that it is true for trees of height l + 1 also.
Let z be the root of Z and suppose it has subtrees Z1, Z2, · · · , Zk in left to right order, for
some k ≥ 1. Let T^{Aab}_{pqrt} be the trapezoid stored at z. Let the root of subtree Zi, for 1 ≤ i ≤ k,
store a trapezoid with bottom edge (qi, ri). By Property 3.6.9, ri = qi+1 for 1 ≤ i ≤ k − 1,
q1 = q, and rk = r. Let label(Zi) = wi, for 1 ≤ i ≤ k. Thus label(Z) = aw1w2 · · · wkb = w.
By the inductive hypothesis, there is a stack-preserving subcomputation of ML that goes
from vertex qi to ri and that reads wi , for 1 ≤ i ≤ k.
We define a computation of ML comprising the following steps: the first step consists of
“Read a, Push A” and a move from vertex p to vertex q = q1 . Then there are a series of
k stack-preserving computations, the ith one corresponding to subtree Zi , and going from
qi to ri = qi+1 , while reading wi . The final step consists of “Read b, Pop A” and a move
from vertex r = rk to t. Altogether, the computation goes from vertex p to vertex t, it reads
aw1 w2 · · · wk b = w, and it is stack preserving.
We conclude that for all Z there is a stack-preserving computation of ML that reads w,
starts at Z’s start vertex and ends at Z’s end vertex.

Lemmas 3.6.10 and 3.6.11 show that a w-recognizing computation of ML can be repre-
sented by a trapezoidal tree, and further that any trapezoidal tree observing Property 3.6.9
corresponds to a label(Z)-recognizing computation of ML .
To facilitate turning this tree into a derivation tree, we want the nodes in the trapezoidal
tree to have bounded degree. Let v be a node in the trapezoidal tree with k ≥ 1 children
v1 , v2 , · · · , vk . We achieve the bounded degree by introducing intermediate nodes, as follows.
We create intermediate nodes x1, x2, · · · , xk−1, and y1, y2, · · · , yk, together with the following tree
edges. If k = 1, y1 is the child of v; otherwise x1 is the child of v. For 1 ≤ i ≤ k, yi will be
the parent of vi; for 1 ≤ i ≤ k − 2, xi will have left child yi and right child xi+1; xk−1 will
have left child yk−1 and right child yk. Clearly, in the new tree, v still has descendants
v1, v2, · · · , vk in left to right order (see Figure 3.23). We use the term x-nodes to refer to
the nodes xi and y-nodes to refer to the yi.

[Figure 3.23 here shows the fragment for k = 3: x1 is the child of v, with left child y1 and right child x2; x2 has left child y2 and right child y3.]

Figure 3.23: The Tree Fragment with Intermediate Nodes.


Suppose that the sequence of bottom edges for the trapezoids associated with nodes
v1 , v2 , · · · , vk is (q0 , q1 ), (q1 , q2 ), · · · , (qk−1 , qk ). So edge (q0 , qk ) is the top edge for the trape-
zoid associated with v. We associate span (qi−1 , qi ) with node yi and span (qi−1 , qk ) with
node xi ; the latter is exactly the union of the spans for the nodes yi , · · · yk , and these nodes
comprise xi ’s descendants.
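
This transformation is easy to state as a program. The sketch below (Python; the tuple encoding of nodes is our own assumption) re-attaches children v1, . . . , vk below a right-leaning spine of x-nodes, each vi hanging from its own y-node:

    # Every node in the output has at most two children.
    def binarize(children):
        ys = [("y", c) for c in children]
        if len(ys) == 1:
            return ys[0]                   # k = 1: a single y-node
        spine = ("x", ys[-2], ys[-1])      # x_{k-1} has children y_{k-1}, y_k
        for y in reversed(ys[:-2]):        # x_i has children y_i, x_{i+1}
            spine = ("x", y, spine)
        return spine

    print(binarize(["v1", "v2", "v3"]))
    # ('x', ('y', 'v1'), ('x', ('y', 'v2'), ('y', 'v3')))
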
We make the trapezoidal tree a derivation tree for w by putting a suitable variable at each
node and adding appropriate leaf nodes for the terminals read during ML ’s computation; in
addition, we introduce suitable rules in the grammar so that the variable at each node can
be replaced by the terminals read on the side edges of the corresponding trapezoid and the
variables for its original children, if any.

The Context Free Grammar GL

Now we are ready to define the Context Free Grammar GL . We will create a variable for
each realizable trapezoid, together with variables to label the x- and y-nodes. In addition,
we create rules to enable the variable for a node at the root of a subtree Z to derive label(Z).
Thus we introduce a variable U^{Aab}_{pqrt} for each realizable trapezoid T^{Aab}_{pqrt}. We create variables
Xpq for all p, q ∈ V for the x-nodes and variables Ypq for all p, q ∈ V for the y-nodes.
We add the following rules:

• U^{Aab}_{pqrt} → a Xqr b | a Yqr b, for all realizable trapezoids T^{Aab}_{pqrt}.

• U^{Aab}_{prrt} → ab, for all realizable trapezoids T^{Aab}_{prrt} (these are triangles).

• Yqr → U^{A′a′b′}_{qq′r′r}, for all q, q′, r′, r ∈ V, A′ ∈ Γ, and a′, b′ ∈ {λ} ∪ T.

• Xqt → Yqr Xrt | Yqr Yrt, for all q, r, t ∈ V.

The start variable for GL is U^{$λλ}_{ss′f′f} (recall that s is the start vertex in ML and f the
single final vertex, ML is $-shielded, and its first and last steps perform no reads, going from
vertex s to s′ and from f′ to f, respectively).
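
Generating these rules from a list of realizable trapezoids is routine. A minimal sketch (Python; the tuple encodings of trapezoids, variables, and rules are our own conventions):

    # trapezoids: tuples (A, a, b, p, q, r, t); rules: pairs (head, body).
    def grammar_rules(trapezoids, vertices):
        rules = []
        for (A, a, b, p, q, r, t) in trapezoids:
            U = ("U", A, a, b, p, q, r, t)
            if q == r:
                rules.append((U, [a, b]))              # triangle rule: U -> ab
            rules.append((U, [a, ("X", q, r), b]))     # U -> a X_qr b
            rules.append((U, [a, ("Y", q, r), b]))     # U -> a Y_qr b
            rules.append((("Y", p, t), [U]))           # Y_pt -> U
        for q in vertices:
            for r in vertices:
                for t in vertices:
                    rules.append((("X", q, t), [("Y", q, r), ("X", r, t)]))
                    rules.append((("X", q, t), [("Y", q, r), ("Y", r, t)]))
        return rules
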

Lemma 3.6.12. Let Z be a trapezoidal tree obeying Property 3.6.9. Let w = label(Z). Then
w ∈ L(GL ).
Proof. To obtain a derivation tree in GL, we replace each trapezoid T^{Aab}_{pqrt} at a node v by the
variable U^{Aab}_{pqrt}. For each variable U^{Aab}_{pqrt} we add a left leaf child labeled by a and a right leaf
child labeled by b. We give variable Xpq to an x-node with span (p, q), and variable Ypq to a
y-node with span (p, q).
We use the following rules to enable the derivation of the string labeling the leaves of
this tree. For an x-node v with span (q, t) and children with spans (q, r) and (r, t) we use
one of the rules Xqt → Yqr Xrt | Yqr Yrt, according as the right child of v is an x-node or a
y-node. For a y-node with span (p, t), whose child has label variable U^{Aab}_{pqrt}, we use the rule
Ypt → U^{Aab}_{pqrt}. For a node v with label U^{Aab}_{pqrt} we use one of the rules U^{Aab}_{pqrt} → a Xqr b | a Yqr b,
according as v has an x-child or a y-child, while if v has no child (and so q = r) we use the
rule U^{Aab}_{prrt} → ab.
It is easy to see that this derivation derives label(Z) = w.

Lemma 3.6.13. Suppose that w ∈ L(GL ); then there is a trapezoidal tree Z obeying Prop-
erty 3.6.9 with w = label(Z).

Proof. Let TG(w) be a derivation tree for w. We will construct the trapezoidal tree Z as
follows. We simply replace each variable U^{Aab}_{pqrt} by trapezoid T^{Aab}_{pqrt}, remove the leaves (labeled
by terminals), and remove the variables labeling x- and y-nodes.
Clearly label(Z) = w.
It remains to show that Z obeys Property 3.6.9. Part 2 follows because the variables
U^{Aab}_{pqrt} correspond to realizable trapezoids. Part 3 follows because GL's start variable is U^{$λλ}_{ss′f′f},
so Z's root stores trapezoid T^{$λλ}_{ss′f′f}. Part 1 follows because each leaf in Z corresponds to a
node with two leaf children in the derivation tree, and such nodes have variables U^{Aab}_{prrt}, which
are replaced by realizable triangles T^{Aab}_{prrt}. Part 4 follows because if in the derivation tree
U^{Aab}_{pqrt} derives U^{A1a1b1}_{p1q1r1t1} U^{A2a2b2}_{p2q2r2t2} · · · U^{Akakbk}_{pkqkrktk} via intermediate X and Y variables, then q = p1,
pi+1 = ti for 1 ≤ i ≤ k − 1, and tk = r. But then the corresponding trapezoids T^{Aab}_{pqrt} and
T^{A1a1b1}_{p1q1r1t1}, T^{A2a2b2}_{p2q2r2t2}, · · · , T^{Akakbk}_{pkqkrktk} obey the same condition, which is part 4 of Property 3.6.9.
Lemmas 3.6.10–3.6.13 show that L(GL ) = L(ML ).

Exercises
1. For the pda in Figure 3.24 answer the following questions.

[Figure 3.24 here shows a pda with vertices p, q, r, and s; its edge labels include "Push $" and "Pop $" (the shield), "Pop $, Push $", and the moves "Read a, Push A", "Read b, Pop A", "Read b, Push B", and "Read a, Pop B".]

Figure 3.24: Figure for Problem 1

i. What is its start vertex?


ii. What are its final vertices?
iii. Give the sequence of vertices and associated data configurations the pda goes through
in an accepting computation on input abba. Are there any other computation paths
that can be followed on this input, and if yes, give the sequence of vertices such a
computation goes through.
iv. What is the language accepted by this pda?

2. Give PDAs to recognize the following languages. You are to give both an explanation in
English of what each PDA does plus a graph of the PDA; seek to label the graph vertices
accurately.

i. A = {w | w has odd length, w ∈ {a, b}∗ and the middle symbol of w is an a}.
ii. B = {w | w ∈ {a, b}∗ and w ≠ a^i b^i for any integer i}.
iii. C = {w | w ∈ {a, b}∗ and w contains equal numbers of a’s and b’s}.
iv. D = {ww^R x | w, x ∈ {a, b}∗}.
v. E = {wcw^R x | w, x ∈ {a, b}∗}.

3. Suppose that A is recognized by PDA M . Give a PDA to recognize A∗ .

4. Let C be a language over the alphabet {a, b} and let Suffix(C) = {w | there is a u ∈ {a, b}∗
with uw ∈ C}. Show that if C is recognized by a pda then so is Suffix(C).

5. i. Let A = {uav#xby | u, v, x, y ∈ {a, b}∗ and (|u| − |v|) = (|x| − |y|)}. Give a pda to
recognize A.
ii. Let B = {w#z | w, z ∈ {a, b}∗ and |w| ≠ |z|}. Give a pda to recognize B.
iii. Show that A ∪ B = {s#t | s, t ∈ {a, b}∗ and s ≠ t}.

6. Consider the pda in Figure 3.24. Add descriptors to the vertices. What language does
this pda recognize?

7. Consider the following context free grammar. S → (S) | SS | ( ) | [ ]

i. What are its terminals?


ii. What are its variables?
iii. What are its rules?
iv. Show the derivation of string ([ ] ( )).
v. Describe in English the set of strings generated by this grammar.

8. Give CFG’s to generate the following languages.

i. A = {w | w has odd length, w ∈ {a, b}∗ and the middle symbol of w is an a}.
ii. B = {w | w ∈ {a, b}∗ and w = w^R}. B is the language of palindromes, strings that
read the same forward and backward.
Hint: Be sure to handle strings of all possible lengths.
iii. C = {ww^R x | w, x ∈ {a, b}∗}.
iv. D = {w | w ∈ {a, b}∗ and w contains an equal number of a’s and b’s}.
Hint: suppose that the first character in w is an a. Let x be the shortest initial
substring of w having an equal number of a’s and b’s. If |x| < |w|, then w can be
written as w = xy; what can you say about y? Otherwise, x = w and w can be
written as w = azb; what can you say about z?
v. E = {w#x | w, x ∈ {a, b}∗ and w^R is an initial substring of x}.
Hint: x can be written as x = w^R y for some y ∈ {a, b}∗.

9. i. Let E = {a^i b^j | i < j}. Give a CFG to generate E.

ii. Let F = {a^i b^j | 2i > j}. Give a CFG to generate F.
iii. Let J = {a^i b^j | i < j < 2i}. Give a CFG to generate J.
Hint: Let i = h + l and j = h + 2l. What can you say about h and l?

10. i. Give a context free grammar to generate the following language: L1 = {a^i #b^{i+j} $a^j | i, j ≥
0}.
ii. Give a context free grammar to generate the following language: L2 = {w#x$y | w, x, y ∈
{a, b}∗ and |x| = |w| + |y|}.
100 CHAPTER 3. PUSHDOWN AUTOMATA AND CONTEXT FREE LANGUAGES

iii. Hence give a context free grammar to generate the following language: L3 = {uv | |u| =
|v| but u ≠ v}. Hint: think of the # and the $ from part (ii) as a pair of aligned yet
unequal characters in u and v; what is the relation among the lengths of the remaining
pieces of u and v?

11. Let A be a CFL generated by a CFG GA. Give a CFG GA∗, based on GA, to
generate A∗. Show that L(GA∗) = A∗.

12. Convert the following CFGs to CNF form.

i. G1 has start variable S, terminal set {a, b, c} and rules

S → SBS | BC; B → ab | λ; C → c | λ.

ii. G2 has start variable S, terminal set {a, b, c} and rules

S → AB | SX; A → a | λ; B → CA; C → c | λ; X → SAS.

13. Show that the following languages are not context free.

i. A = {a^i b^j c^k b^j c^k a^i | i, j, k ≥ 0}.
ii. B = {a^m b^n c^m d^n | m, n ≥ 0}.
iii. C = {w | w ∈ {a, b, c}∗ and the number of a’s, b’s and c’s in w are all equal}.
iv. D = {u#v#w | u, v, w ∈ {a, b}∗ , the number of a’s in u equals |v|,
and the number of b’s in v equals |w|}.
v. Let E = {w | w ∈ {a, b, c, d}∗ and w contains equal numbers of a's and b's, and equal
numbers of c's and d's}. Show that E is not context free.
vi. F = {a^{i^2} | i ≥ 1}.
Comment. Any CFL over a 1-character alphabet is a regular language. Give a proof
without using this fact.
vii. H = {a^{2^i} | i ≥ 0}.
Comment. Any CFL over a 1-character alphabet is a regular language. Give a proof
without using this fact.
viii. J = {x1#x2# · · · #xl | xh ∈ {a, b}∗, 1 ≤ h ≤ l, and for some i, j, k, 1 ≤ i < j < k ≤ l,
|xi| = |xj| = |xk|}.
ix. K = {x1 #x2 # · · · #xk | xh ∈ {a, b}∗ , 1 ≤ h ≤ k, and for some i, j, 1 ≤ i < j ≤ k,
xi = xj }.
x. Let L = {a^i b^j | i is an integer multiple of j}.
xi. Let M = {wxw^R | w, x ∈ {a, b}∗ and |w| = |x|}.
xii. Let N be the language consisting of all palindromes over the alphabet Σ = {a, b, c}
having equal numbers of a’s and b’s.

xiii. P = {a^i b^i c^j | j > i}.

xiv. Q = {a^i b^j c^k | k = max{i, j}}.

14. Consider the following CNF context free grammar.


S → AB, A → AA, A → a, B → b.
Show the pda generated by applying the construction of Lemma 3.6.1 to this grammar.
15. Consider the pda shown in Figure 3.2.
i. Draw the trapezoidal diagram for the computation recognizing input aabb.
ii. Give the CFG generated by applying the construction of Section 3.6.1 to this pda.
16. For each of the language transformations T defined in the parts below, answer the fol-
lowing two questions.
a. Suppose that L is a CFL. Show that T (L) is also a CFL by giving a CFG to generate
T (L). Remember to explain why your solution is correct.
b. Now suppose that L is recognized by a pda. Show that T (L) is also recognized by a
pda. Again, remember to explain why your solution is correct.
Comment: The two parts are equivalent; nonetheless, you are being asked for a separate
construction for each part.

i. Let w ∈ {a, b, c}∗ . Define Sbst(w, a, b) to be the string obtained by replacing all
instances of the character a in w with b. e.g. Sbst(ac, a, b) = bc, Sbst(cc, a, b) = cc,
Sbst(abc, a, b) = bbc, Sbst(acacac, a, b) = bcbcbc.
Let L be a language over the alphabet {a, b, c}. Define T (L) = Sbst(L, a, b) = {x | x =
Sbst(w, a, b) for some w ∈ L}.
ii. Let w ∈ {a, b, c}∗ . Define OneSubst(w, a, b), or OS(w, a, b) for short, to be the set of
strings obtained by replacing one instance of the character a from w with a b. e.g.
OS(acacac, a, b) = {bcacac, acbcac, acacbc}.
Let L be a language over the alphabet {a, b, c}.
Define T (L) = OS(L, a, b) = {x | x ∈ OS(w, a, b) for some w ∈ L}.
iii. Let w ∈ {a, b, c}∗ . Define Remove-c(w) to be the string obtained by deleting all
instances of the character c from w. e.g. Remove-c(ab) = ab, Remove-c(cc) = λ,
Remove-c(abc) = ab, Remove-c(acacac) = aaa.
Let L be a language over the alphabet {a, b, c}. Define T (L) = Remove-c(L) = {x | x =
Remove-c(w) for some w ∈ L}.
iv. Let w ∈ {a, b, c}∗ . Define Remove-One-c(w) to be the set of strings obtained by delet-
ing one instance of the character c from w. e.g. Remove-One-c(acacac) = {aacac, acaac, acaca}.
Let L be a language over the alphabet {a, b, c}.
Define T (L) = Remove-One-c(L) = {x | x ∈ Remove-One-c(w) for some w ∈ L}.

v. Let h be a mapping from Σ to Σ∗ , that is h maps each character in Σ to a string of


characters. Define h(s) for a string s = s1 s2 · · · sk to be the string h(s1 )h(s2 ) · · · h(sk ).
Define T (L) = {h(w) | w ∈ L}.
vi. Let h be a mapping from Σ to R where R is the set of regular expressions over alphabet
Σ. Define h(s) for a string s = s1 s2 · · · sk to be the string h(s1 )h(s2 ) · · · h(sk ).
Define T (L) = {h(w) | w ∈ L}.
