Chapter 3
Pushdown Automata and Context Free Languages
As we have seen, Finite Automata are somewhat limited in the languages they can recog-
nize. Pushdown Automata are another type of machine that can recognize a wider class of
languages. Context Free Grammars, which we can think of as loosely analogous to regular
expressions, provide a method for describing a class of languages called Context Free Lan-
guages. As we will see, the Context Free Languages are exactly the languages recognized by
Pushdown Automata.
Pushdown Automata can be used to recognize certain types of structured input. In
particular, they are an important part of the front end of compilers. They are used to
determine the organization of the programs being processed by a compiler; tasks they handle
include forming expression trees from input arithmetic expressions and determining the scope
of variables.
Context Free Languages, and an elaboration called Probabilistic Context Free Languages,
are widely used to help determine sentence organization in computer-based text understand-
ing (e.g., what is the subject, the object, the verb, etc.).
3.1 Pushdown Automata
of operations Push(A), Push(B), Pop, Push(C), Pop, Pop, the 3 successive pops will read
the items B, C, A, respectively. The successive states of the stack are shown in Figure 3.1.
Let’s see how a stack allows us to recognize the following language L1 = {ai bi | i ≥ 0}.
We start by explaining how to process a string w = ai bi ∈ L1 . As the PDA, call it M,
reads the initial string of a's in its input, it pushes a corresponding equal-length string of
A's onto its stack (one A for each a read). Then, as M reads the b's, it seeks to match them
one by one against the A's on the stack (by popping one A for each b it reads). M recognizes
its input exactly if the stack becomes empty on reading the last b.
In fact, PDAs are not allowed to use a Stack Empty test. We use a standard technique,
which we call $-shielding, to simulate this test. Given a PDA M on which we want to perform
Stack Empty tests, we create a new PDA M′ which is identical to M apart from the following
small changes. M′ uses a new, additional symbol on its stack, which we name $. Then, at the
very start of the computation, M′ pushes a $ onto its stack. This will be the only occurrence
of $ on its stack. Subsequently, M′ performs the same steps as M, except that when M seeks
to perform a Stack Empty test, M′ pops the stack and then immediately pushes the popped
symbol back on its stack. The simulated stack is empty exactly if the popped symbol was a
$.
Next, we explain what happens with strings outside the language L1 . We do this by
looking at several categories of strings in turn.
1. ai bh , h < i.
After the last b is read, there will still be one or more A’s on the stack, indicating the
input is not in L1 .
2. ai bj , j > i.
On reading the (i + 1)st b, there is an attempt to pop the now empty stack to find a
matching A; this attempt fails, and again this indicates the input is not in L1 .
3. The only other possibility for the input is that it contains the substring ba; as already
described, the processing consists of an a-reading phase, followed by a b-reading phase.
The a in the substring ba is then encountered in the b-reading phase, and once more
this input is easily recognized as being outside L1 .
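The accepting computation and the three failure cases above can be mirrored by a direct stack simulation. The following sketch is ours, not part of the formal development; for L1 a deterministic simulation suffices:

```python
def recognize_l1(w: str) -> bool:
    """Stack-based check that w is in L1 = {a^i b^i | i >= 0}:
    push an A for each a read, pop an A for each b read."""
    stack = []
    seen_b = False
    for ch in w:
        if ch == 'a':
            if seen_b:            # substring "ba": case 3, not in L1
                return False
            stack.append('A')
        elif ch == 'b':
            seen_b = True
            if not stack:         # more b's than a's: case 2
                return False
            stack.pop()
        else:
            return False          # character outside {a, b}
    return not stack              # leftover A's: case 1 (more a's than b's)
```

The three `return False` branches correspond exactly to cases 1–3 above; acceptance requires the stack to empty on the last b.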
As with an NFA, we can specify the computation using a directed graph, with the edge
labels indicating the actions to be performed when traversing the given edge. To recognize
an input w, the PDA needs to be able to follow a path from its start vertex to a final vertex
starting with an empty stack, where the path’s read labels spell out the input, and the
stack operations on the path are consistent with the stack’s ongoing contents as the path is
traversed.
A PDA M1 recognizing L1 is shown in Figure 3.2.

[Figure 3.2: PDA M1 . Vertices: p1 : λ read, stack empty; p2 : ai read, i ≥ 0, stack contents $Ai ; p3 : ai bh read, i ≥ h ≥ 0, stack contents $Ai−h ; p4 : ai bi read, i ≥ 0, stack empty. Edges: (p1 , p2 ): Push $; (p2 , p2 ): Read a, Push A; (p2 , p3 ): Read λ; (p3 , p3 ): Read b, Pop A; (p3 , p4 ): Pop $.]

Because the descriptions of the
vertices are quite long, we have given them the shorter names p1 –p4 . These descriptions
specify exactly the strings that can be read and the corresponding stack contents to reach
the vertex in question. We will verify the correctness of these specifications by looking at
what happens on the possible inputs. To do this effectively, we need to divide the strings
into appropriate categories.
An initial understanding of what M1 recognizes can be obtained by ignoring what the
stack does and viewing the machine as just an NFA (i.e. using the same graph but with just
the reads labeling the edges). See Figure 3.3 for the graph of the NFA N1 derived from M1 .
[Figure 3.3: The NFA N1 derived from M1 . Vertices: p1 : λ read; p2 : ai read, i ≥ 0; p3 : ai bh read, i, h ≥ 0; p4 : ai bh read, i, h ≥ 0. Edges: (p1 , p2 ): λ; (p2 , p2 ): a; (p2 , p3 ): λ; (p3 , p3 ): b; (p3 , p4 ): λ.]

The significant point is that if M1 can reach vertex p on input w using computation path
P, then so can N1 (for the same reads label P in both machines). It follows that any string
recognized by M1 is also recognized by N1 : L(M1 ) ⊆ L(N1 ).
It is not hard to see that N1 recognizes a∗ b∗ . It follows that M1 recognizes a subset of
a∗ b∗ . So to explain the behavior of M1 in full it suffices to look at what happens on inputs of
the form ai bj , i, j ≥ 0, which we do by examining five subcases that account for all such
strings.
1. λ.
M1 starts at p1 . On pushing $, p2 and p3 can be reached. Then, on popping the $,
p4 can be reached. Note that the specification of p2 holds with i = 0, that of p3 with
i = h = 0, and that of p4 with i = 0. Thus the specification at each vertex includes
the case that the input is λ.
2. ai , i ≥ 1.
To read ai , M1 needs to push $ (following edge (p1 , p2 )), and then follow edge (p2 , p2 )
i times. This puts $Ai on the stack. Thus on input ai , p2 can be reached and its
specification is correct. In addition, the edge to p3 can be traversed without any
additional reads or stack operations, and so the specification for p3 with h = 0 is
correct for this input.
3. ai bh , 1 ≤ h < i.
The only place to read b is on edge (p3 , p3 ). Thus, for this input, M1 reads ai to
bring it to p3 and then follows (p3 , p3 ) h times. This leaves $Ai−h on the stack, and
consequently the specification of p3 is correct for this input. Note that as h < i, edge
(p3 , p4 ) cannot be followed, as $ is not on the stack top.
4. ai bi , i ≥ 1.
After reading the i b's, M1 can be at vertex p3 as explained in (3). Now, in addition,
edge (p3 , p4 ) can be traversed and this pops the $ from the stack, leaving it empty. So
the specification of p4 is correct for this input.
5. ai bj , j > i.
On reading ai bi , M1 can reach p3 with the stack holding $ or reach p4 with an empty
stack, as described in (4). From p3 the only available move is to p4 , without reading
anything further. At p4 there is no move, so the rest of the input cannot be read, and
thus no vertex can be reached on this input.
This is a very elaborate description which we certainly don’t wish to repeat for each
similar PDA. We can describe M1 ’s functioning more briefly as follows.
M1 checks that its input has the form a∗ b∗ (i.e. all the a’s precede all the b’s)
using its underlying NFA (i.e. without using the stack). The underlying NFA is
often called its finite control. In tandem with this, M1 uses its $-shielded stack to
match the a’s against the b’s, first pushing the a’s on the stack (it is understood
that in fact A’s are being pushed) and then popping them off, one for one, as the
b’s are read, confirming that the numbers of a’s and b’s are equal.
The detailed argument we gave above is understood, but not spelled out.
Now we are ready to define a PDA more precisely. As with an NFA, a PDA consists of a
directed graph with one vertex, start, designated as the start vertex, and a (possibly empty)
subset of vertices designated as the final set, F , of vertices. As before, in drawing the graph,
we show final vertices using double circles and indicate the start vertex with a double arrow.
Each edge is labeled with the actions the PDA performs on following that edge.
For example, the label on edge e might be: Pop A, read b, Push C, meaning that the
PDA pops the stack, reads the next input character, and if the pop returns an A and the
character read is a b, then the PDA can traverse e, which entails it pushing C onto the stack.
Some or all of these values may be λ: Pop λ means that no Pop is performed, read λ that
no read occurs, and Push λ that no Push happens. To avoid clutter, we usually omit the
λ-labeled terms; for example, instead of Pop λ, read λ, Push C, we write Push C. Also, to
avoid confusion in the figures, if there are multiple triples of actions that take the PDA from
a vertex u to a vertex v, we use multiple edges from u to v, one for each triple.
In sum, a label, which specifies the actions accompanying a move from vertex u to vertex
v, has up to three parts.
1. Pop the stack and check that the returned character has a specified value (in our
example this is the value A).
2. Read the next character of input and check that it has a specified value (in our example,
the value b).
3. Push a specified character onto the stack (in our example, the character C).
From an implementation perspective it may be helpful to think in terms of being able to
peek ahead, so that one can see the top item on the stack without actually popping it, and
one can see the next input character (or that one is at the end of the input) without actually
reading forward.
One further rule is that an empty stack may not be popped.
A PDA also comes with an input alphabet Σ and a stack alphabet Γ (these are the
symbols that can be written on the stack). It is customary for Σ and Γ to be disjoint, in
part to avoid confusion. To emphasize this disjointness, we write the characters of Σ using
lowercase letters and those of Γ using uppercase letters.
Because the stack contents make it difficult to describe the condition of the PDA after
multiple moves, we use the transition function here to describe possible out edges from
single vertices only. Accordingly, δ(p, A, b) = {(q1 , C1 ), (q2 , C2 ), · · · , (ql , Cl )} indicates that
the edges exiting vertex p and having both Pop A and read b in their label are the edges
going to vertices q1 , q2 , · · · , ql where the rest of the label for edge (p, qi ) includes action Ci ,
for 1 ≤ i ≤ l. That is, δ(p, A, b) specifies the possible moves out of vertex p on popping
character A and reading b. (Recall that one or both of A and b might be λ.)
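The transition-function view lends itself to a direct simulation. Below is a sketch of ours (not from the text): a breadth-first search over configurations (vertex, input position, stack), with λ written as the empty string. The dictionary `delta_m1` encodes M1 of Figure 3.2. Note that for PDAs with λ-cycles that keep pushing, this search need not terminate.

```python
from collections import deque

def pda_accepts(delta, start, finals, w):
    """Simulate a nondeterministic PDA. delta maps (vertex, pop, read)
    to an iterable of (next_vertex, push); '' plays the role of λ.
    A configuration is (vertex, input position, stack); the stack top
    is the last element of the tuple. The PDA accepts if some
    computation path reads all of w and ends at a final vertex."""
    seen = set()
    queue = deque([(start, 0, ())])
    while queue:
        v, i, stack = queue.popleft()
        if (v, i, stack) in seen:
            continue
        seen.add((v, i, stack))
        if i == len(w) and v in finals:
            return True
        pops = {''} | ({stack[-1]} if stack else set())
        reads = {''} | ({w[i]} if i < len(w) else set())
        for pop in pops:
            for read in reads:
                for q, push in delta.get((v, pop, read), ()):
                    ns = stack[:-1] if pop else stack
                    if push:
                        ns = ns + (push,)
                    queue.append((q, i + len(read), ns))
    return False

# The PDA M1 of Figure 3.2, with the shield symbol written as '$':
delta_m1 = {
    ('p1', '', ''):   {('p2', '$')},   # edge (p1, p2): Push $
    ('p2', '', 'a'):  {('p2', 'A')},   # edge (p2, p2): Read a, Push A
    ('p2', '', ''):   {('p3', '')},    # edge (p2, p3): Read λ
    ('p3', 'A', 'b'): {('p3', '')},    # edge (p3, p3): Read b, Pop A
    ('p3', '$', ''):  {('p4', '')},    # edge (p3, p4): Pop $
}
```

With this encoding, `pda_accepts(delta_m1, 'p1', {'p4'}, 'aabb')` finds the accepting path p1, p2, p2, p2, p3, p3, p3, p4 traced in the case analysis above.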
In sum, a PDA M consists of a 6-tuple: M = (Σ, Γ, V, start, F, δ), where
1. Σ is the input alphabet,
2. Γ is the stack alphabet,
3. V is the set of vertices,
4. start ∈ V is the start vertex,
5. F ⊆ V is the set of final vertices, and
6. δ is the transition function, which specifies the edges and their labels.
Recognition is defined as for an NFA, that is, PDA M recognizes input w if there is a
path that M can follow on input w that takes M from its start vertex to a final vertex.
We call this path a w-recognizing computation path to emphasize that stack operations may
occur in tandem with the reading of input w. More formally, M recognizes w if there is a
path start = p0 , p1 , · · · , pm , where pm is a final vertex, the label on edge (pi−1 , pi ) is (Read
ai , Pop Bi , Push Ci ), for 1 ≤ i ≤ m, and the stack contents at vertex pi is σi , for 0 ≤ i ≤ m,
where
1. a1 a2 · · · am = w,
2. σ0 = λ, and
3. for 1 ≤ i ≤ m, σi is obtained from σi−1 by removing Bi from the top of the stack and
then adding Ci to the top (with no removal or addition for a λ value); in particular,
Bi must be the top symbol of σi−1 .
[Figure 3.4: PDA M2 recognizing L2 = {ai cbi | i ≥ 0}. Vertices: p1 : λ read, stack empty; p2 : ai read, i ≥ 0, stack contents $Ai ; p3 : ai cbh read, i ≥ h ≥ 0, stack contents $Ai−h ; p4 : ai cbi read, i ≥ 0, stack empty. Edges: (p1 , p2 ): Push $; (p2 , p2 ): Read a, Push A; (p2 , p3 ): Read c; (p3 , p3 ): Read b, Pop A; (p3 , p4 ): Pop $.]
M2 's operation is similar to that of M1 . M2 checks that its input has the form a∗ cb∗ using
its finite control. In tandem, M2 uses its $-shielded stack to match the a's against the b's,
first pushing the a's on the stack
(actually A’s are being pushed), then reads the c without touching the stack, and finally
pops the a’s off, one for one, as the b’s are read, confirming that the numbers of a’s and b’s
are equal.
[Figure 3.5, details: p1 : λ read, stack empty; p2 : x read, stack contents $X; p3 : yzcz R read, stack contents $Y ; p4 : wcwR read, stack empty. Edges: (p1 , p2 ): Push $; (p2 , p2 ): Read a, Push A and Read b, Push B; (p2 , p3 ): Read c; (p3 , p3 ): Read a, Pop A and Read b, Pop B; (p3 , p4 ): Pop $.]
Figure 3.5: PDA M3 recognizing L3 = {wcwR | w ∈ {a, b}∗ }. Z denotes string z in capital
letters.
M3 matches the w against the wR as follows. It pushes w on the stack (the end of the
substring w being indicated by reaching the c). At this point, the stack content read from
the top is wR $, so popping down to the $ outputs the string wR . These stack contents are
readily compared to the string following the c. The input is recognized exactly if they match.
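Because the c marks the middle of the input, M3 's pushing-and-matching strategy is easy to mirror deterministically. A sketch of ours (not the PDA itself):

```python
def recognize_l3(s: str) -> bool:
    """Check membership in L3 = {w c w^R | w in {a,b}*}: push the
    capitalized form of each character of w, then after the c pop one
    stack symbol per character read, requiring a match each time."""
    if s.count('c') != 1:
        return False
    w, tail = s.split('c')
    if any(ch not in 'ab' for ch in w + tail):
        return False
    stack = [ch.upper() for ch in w]       # stack contents $W, top at the right
    for ch in tail:
        if not stack or stack.pop() != ch.upper():
            return False
    return not stack                        # stack must be empty at the end
```

Popping returns the characters of w in reverse, which is exactly why the comparison against the suffix succeeds precisely when the suffix is wR.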
[Figure 3.6, details: p1 : λ read, stack empty; p2 : x read, stack contents $X; p3 : yzz R read, stack contents $Y ; p4 : wwR read, stack empty. Edges: (p1 , p2 ): Push $; (p2 , p2 ): Read a, Push A and Read b, Push B; (p2 , p3 ): Read λ; (p3 , p3 ): Read a, Pop A and Read b, Pop B; (p3 , p4 ): Pop $.]
Figure 3.6: PDA M4 recognizing L4 = {wwR | w ∈ {a, b}∗ }. Z denotes string z in capital
letters.
[Figure 3.7: A PDA recognizing {ai bi | i ≥ 0} ∪ {ai bj ci | i, j ≥ 0}. One branch, through vertices p12 (ai read, i ≥ 0, stack contents $Ai ), p13 (ai bh read, i ≥ h ≥ 0, stack contents $Ai−h ), and p14 (ai bi read, i ≥ 0, stack empty), pushes $ and then an A for each a, pops an A for each b, and finally pops the $. The other branch, ending at p24 (ai bj ci read, i, j ≥ 0, stack empty), pushes an A for each a, reads the b's without touching the stack, and pops an A for each c.]
Lemma 3.2.1. Let A and B be languages recognized by PDAs MA and MB , respectively.
Then A ∪ B is also recognized by a PDA, called MA∪B .
Proof. The graph of MA∪B consists of the graphs of MA and MB plus a new start vertex
start A∪B , which is joined by λ-edges to the start vertices start A and start B of MA and MB ,
respectively. Its final vertices are the final vertices of MA and MB . The graph is shown in
Figure 3.8.
[Figure 3.8: MA∪B consists of copies of MA and MB together with a new start vertex startA∪B and Read λ edges from startA∪B to startA and to startB .]
While it is clear that L(MA∪B ) = L(MA ) ∪ L(MB ), we present the argument for com-
pleteness.
First, we show that L(MA∪B ) ⊆ L(MA ) ∪ L(MB ). Suppose that w ∈ L(MA∪B ). Then
there is a w-recognizing computation path P from startA∪B to a final vertex f . If f lies in
MA , then removing the first edge of P leaves a path P′ from startA to f . Further, at the
start of P′ , the stack is empty and nothing has been read, so P′ is a w-recognizing path
in MA . That is, w ∈ L(MA ). Similarly, if f lies in MB , then w ∈ L(MB ). Either way,
w ∈ L(MA ) ∪ L(MB ).
Second, we show that L(MA ) ∪ L(MB ) ⊆ L(MA∪B ). Suppose that w ∈ L(MA ). Then
there is a w-recognizing computation path P from start A to a final vertex f in MA . Adding
the λ-edge (start A∪B , start A ) to the beginning of P creates a w-recognizing computation
path in MA∪B , showing that L(MA ) ⊆ L(MA∪B ). Similarly, if w ∈ L(MB ), then w ∈
L(MA∪B ), so L(MB ) ⊆ L(MA∪B ).
Lemma 3.2.2. Let PDA M recognize L. There is an L-recognizing PDA M′ with the
following properties: M′ has only one final vertex, finalM′ , and M′ will always have an
empty stack when it reaches finalM′ .
Proof. The idea is quite simple. M′ simulates M using a $-shielded stack. When M 's
computation is complete, M′ moves to a new stack-emptying vertex, stack-empty, at which
M′ empties its stack of everything apart from the $-shield. To then move to finalM′ , M′ pops
the $, thus ensuring it has an empty stack when it reaches finalM′ . M′ is illustrated in Figure
3.9. More precisely, M′ consists of the graph of M plus three new vertices: startM′ , stack-
[Figure 3.9: M′ , formed from M by adding the vertices startM′ , stack-empty, and finalM′ , with edges as described below.]
empty, and finalM′ . The following edges are also added: (startM′ , startM ) labeled Push $,
λ-edges from each of M 's final vertices to stack-empty, self-loops (stack-empty, stack-empty)
labeled Pop X for each X ∈ Γ with X ≠ $ (where Γ is M 's stack alphabet), and edge
(stack-empty, finalM′ ) labeled Pop $.
It is clear that L(M′ ) = L(M ). Nonetheless, we present the argument for completeness.
First, we show that L(M′ ) ⊆ L(M ). Let w ∈ L(M′ ). Let P′ be a w-recognizing path in
M′ and let f be the final vertex of M preceding stack-empty on the path P′ . Removing the
first edge in P′ and every edge including and after (f, stack-empty) leaves a path P which
is a w-recognizing path in M . Thus L(M′ ) ⊆ L(M ).
Now we show L(M ) ⊆ L(M′ ). Let w ∈ L(M ) and let P be a w-recognizing path in
M . Suppose that P ends with string s on the stack at final vertex f . We add the edges
(startM′ , startM ), (f, stack-empty), |s| self-loops at stack-empty, and (stack-empty, finalM′ )
to P , yielding path P′ in M′ . By choosing the self-loops to be labeled with the characters of
sR in this order, we cause P′ to be a w-recognizing path in M′ . Thus L(M ) ⊆ L(M′ ).
Lemma 3.2.3. Let A and B be languages recognized by PDAs MA and MB , respectively.
Then A ◦ B is also recognized by a PDA, called MA◦B .
Proof. Let MA and MB be PDAs recognizing A and B, respectively, where, by Lemma 3.2.2,
each has just one final vertex, which can be reached only with an empty stack.
MA◦B consists of MA , MB plus one λ-edge (final A , startB ). Its start vertex is start A and
its final vertex is final B .
[Figure 3.10: MA◦B , consisting of MA and MB joined by a Read λ edge from MA 's final vertex u = finalA to MB 's start vertex v = startB .]
Lemma 3.2.4. Suppose that L is recognized by PDA ML and suppose that R is a regular
language. Then L ∩ R is recognized by a PDA called ML∩R .
Proof. Let ML = (Σ, ΓL , VL , start L , FL , δL ) and let R be recognized by DFA MR = (Σ, VR , start R , FR , δR ).
We will construct ML∩R . The vertices of ML∩R will be 2-tuples, the first component corre-
sponding to a vertex of ML and the second component to a vertex of MR . The computation
of ML∩R , when looking at the first components along with the stack will be exactly the
computation of ML , and when looking at the second components, but without the stack, it
will be exactly the computation of MR . This leads to the following edges in ML∩R .
1. If ML has an edge (uL , vL ) with label (Pop A, Read b, Push C) and MR has an edge
(uR , vR ) with label b, then ML∩R has an edge ((uL , uR ), (vL , vR )) with label (Pop A,
Read b, Push C).
2. If ML has an edge (uL , vL ) with label (Pop A, Read λ, Push C) then ML∩R has an
edge ((uL , uR ), (vL , uR )) with label (Pop A, Read λ, Push C) for every uR ∈ VR .
The start vertex for ML∩R is (start L , start R ) and its set of final vertices is FL × FR , the pairs
of final vertices, one from ML and one from MR , respectively.
Assertion. ML∩R can reach vertex (vL , vR ) on input w if and only if ML can reach vertex
vL and MR can reach vertex vR on input w.
Next, we argue that the assertion is true. For suppose that on input w, ML∩R can reach
vertex (vL , vR ) by computation path PL∩R . If we consider the first components of the vertices
in PL∩R , we see that it is a computation path of ML on input w reaching vertex vL . Likewise,
if we consider the second components of the vertices of PL∩R , we obtain a vertex sequence
P′R . The only difficulty is that this sequence may contain repetitions of a vertex uR ,
corresponding to reads of λ by ML∩R . Eliminating such repetitions creates a path PR in MR
reaching vR and having the same read label w as P′R .
Conversely, suppose that ML can reach vL by computation path PL and MR can reach
vR by computation path PR . Combining these paths, with care, gives a computation path P
which reaches (vL , vR ) on input w. We proceed as follows. The first vertex is (startL , startR ).
Then we traverse PL and PR in tandem. Either the next edges in PL and PR are both labeled
by a Read b (simply a b on PR ) in which case we use Rule (1) above to give the edge to add
to P , or the next edge on PL is labeled by Read λ (together with a Pop and a Push possibly)
and then we use Rule (2) to give the edge to add to P . In the first case we advance one edge
on both PL and PR , in the second case we only advance on PL . Clearly, the path ends at
vertex (vL , vR ) on input w.
It is now easy to see that L(ML∩R ) = L ∩ R. For on input w, ML∩R can reach a final
vertex v ∈ F = FL × FR if and only if on input w, ML reaches a vertex vL ∈ FL and
MR reaches a vertex vR ∈ FR . That is, w ∈ L(ML∩R ) if and only if w ∈ L(ML ) = L and
w ∈ L(MR ) = R, or in other words w ∈ L(ML∩R ) if and only if w ∈ L ∩ R.
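Rules (1) and (2) translate directly into code. A sketch of ours, under the following representation assumptions: a PDA edge is a tuple (u, v, pop, read, push), a DFA transition function is a dict from (state, character) to state, and '' stands for λ:

```python
def product_edges(pda_edges, dfa_delta, dfa_states):
    """Construct the edges of M_{L∩R} from a PDA's edges and a DFA.

    pda_edges: list of (u_L, v_L, pop, read, push), with '' for λ.
    dfa_delta: dict mapping (u_R, b) -> v_R.
    dfa_states: the DFA's state set V_R.
    """
    edges = []
    for (uL, vL, pop, read, push) in pda_edges:
        if read:                # rule (1): both machines consume the read character
            for uR in dfa_states:
                if (uR, read) in dfa_delta:
                    vR = dfa_delta[(uR, read)]
                    edges.append(((uL, uR), (vL, vR), pop, read, push))
        else:                   # rule (2): λ-read, the DFA component stays put
            for uR in dfa_states:
                edges.append(((uL, uR), (vL, uR), pop, read, push))
    return edges
```

The start vertex and the final set FL × FR are formed exactly as in the proof; only the edge construction is shown here.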
Exercise. Show that if A is recognized by PDA MA then there is a PDA MA∗ recognizing A∗ .
3.3 Context Free Languages

⟨decimal string⟩ → ⟨digit⟩ | ⟨digit⟩⟨decimal string⟩        (3.1)
⟨digit⟩ → 0 | 1 | 2 | · · · | 9
We can also view this as providing a way of generating decimal strings. To generate a
particular decimal string we perform a series of replacements or substitutions starting from
the “variable” ⟨decimal string⟩. The sequence of replacements generating 147 is shown below.
⟨decimal string⟩ ⇒ ⟨digit⟩⟨decimal string⟩ ⇒ 1⟨decimal string⟩ ⇒ 1⟨digit⟩⟨decimal string⟩ ⇒ 14⟨decimal string⟩ ⇒ 14⟨digit⟩ ⇒ 147
Each replacement takes an occurrence of one of the two variables:
• ⟨decimal string⟩, and replaces it with either ⟨digit⟩ or the sequence ⟨digit⟩⟨decimal string⟩;
• ⟨digit⟩, and replaces it with one of 0–9.
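The replacement sequence for 147 can be reproduced mechanically. A sketch of ours that emits the successive sentential forms, writing the variables in angle brackets:

```python
def derive(decimal: str):
    """Return the sequence of sentential forms deriving `decimal` from
    <decimal string>, using grammar (3.1): at each step expand
    <decimal string> and then replace <digit> by the actual digit."""
    forms = ["<decimal string>"]
    generated = ""
    for k, d in enumerate(decimal):
        last = (k == len(decimal) - 1)
        # expand <decimal string> to <digit> (for the last digit)
        # or to <digit><decimal string> (otherwise)
        tail = "<digit>" if last else "<digit><decimal string>"
        forms.append(generated + tail)
        # replace <digit> by the digit itself
        generated += d
        forms.append(generated + ("" if last else "<decimal string>"))
    return forms
```

For example, `derive("147")` produces exactly the seven forms of the replacement sequence shown above.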
An easier way to understand this is by viewing the generation using a tree, called a
derivation or parse tree, as shown in Figure 3.11. Clearly, were we patient enough, in this
way we could generate any decimal string.

[Figure 3.11: The parse tree for 147: each ⟨decimal string⟩ node has children ⟨digit⟩ and ⟨decimal string⟩ (just ⟨digit⟩ at the bottom), and the digits 1, 4, 7 appear at the leaves.]
parentheses. We also would like the generation rules to follow the precedence order of the
operators “+” and “×”, in a sense that will become clear.
The generation rules, being recursive, generate arithmetic expressions top-down. Let’s
use the example x + x × x as motivation. To guide us, it is helpful to look at the expression
tree representation, shown in Figure 3.12a. The root of the tree holds the operator “+”;
[Figure 3.12: (a) The expression tree for x + x × x: the root holds “+”, its left operand is x, and its right operand is a “×” node with operands x and x. (b) The corresponding derivation tree, using the variables ⟨expr⟩, ⟨term⟩, and ⟨factor⟩.]
correspondingly, the first substitution we apply needs to create a “+”; the remaining parts
of the expression will then be generated recursively. This is implemented with the following
variables: ⟨expr⟩, which can generate any arithmetic expression; ⟨term⟩, which can generate
any arithmetic expression whose top level operator is times (×) or matched parentheses
(“(” and “)”); and ⟨factor⟩, which can generate any arithmetic expression whose top level
operator is a pair of matched parentheses. This organization is used to enforce the usual
operator precedence. This leads us to the following substitution rules:
• ⟨expr⟩ → ⟨expr⟩ + ⟨term⟩ | ⟨term⟩.
This rule implies that the top-level additions are generated in right to left order and
hence performed in left to right order (for the expression generated by the left ⟨expr⟩
is evaluated before being added to the expression generated by the ⟨term⟩ to its right).
So eventually the initial ⟨expr⟩ is replaced by ⟨term⟩ + ⟨term⟩ + · · · + ⟨term⟩, each
⟨term⟩ being an operand for the “+” operator. If there is no addition, ⟨expr⟩ is simply
replaced by ⟨term⟩.
• ⟨term⟩ → ⟨term⟩ × ⟨factor⟩ | ⟨factor⟩.
Similarly, this rule implies that the multiplications are performed in left to right order.
Eventually the initial ⟨term⟩ is replaced by ⟨factor⟩ × ⟨factor⟩ × · · · × ⟨factor⟩, each
⟨factor⟩ being an operand for the “×” operator.
• ⟨factor⟩ → x | (⟨expr⟩).
Since “×” has the highest precedence, each of its operands must be either a simple
variable (x) or a parenthesized expression, which is what we have here.
The derivation tree for the example of Figure 3.12a is shown in Figure 3.12b. Note that the
left-to-right order of evaluation is an arbitrary choice for the “+” and the “×” operators,
but were we to introduce the “−” and “÷” operators it ceases to be arbitrary; left-to-right
is then the correct rule.
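The three variables translate directly into a recursive-descent recognizer. A sketch of ours, with '*' standing in for “×”; the grammar's left recursion is realized by loops that read one operand and then repeatedly consume an operator and another operand, giving the same left-to-right grouping:

```python
def parse(s: str) -> bool:
    """Recognize strings of the grammar
         <expr>   -> <expr> + <term> | <term>
         <term>   -> <term> * <factor> | <factor>
         <factor> -> x | ( <expr> )
    by recursive descent over the input string s."""
    pos = 0

    def expr():
        nonlocal pos
        if not term():
            return False
        while pos < len(s) and s[pos] == '+':    # <expr> -> <expr> + <term>
            pos += 1
            if not term():
                return False
        return True

    def term():
        nonlocal pos
        if not factor():
            return False
        while pos < len(s) and s[pos] == '*':    # <term> -> <term> * <factor>
            pos += 1
            if not factor():
                return False
        return True

    def factor():
        nonlocal pos
        if pos < len(s) and s[pos] == 'x':       # <factor> -> x
            pos += 1
            return True
        if pos < len(s) and s[pos] == '(':       # <factor> -> ( <expr> )
            pos += 1
            if expr() and pos < len(s) and s[pos] == ')':
                pos += 1
                return True
        return False

    return expr() and pos == len(s)
```

On x+x*x the recognizer reads one ⟨term⟩ (which itself consumes x*x as ⟨factor⟩ * ⟨factor⟩) per “+” operand, mirroring the derivation tree of Figure 3.12b.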
Let’s look at one more example.
Example 3.3.1. L = {ai bi | i ≥ 0}. Here is a set of rules to generate the strings in L (to
generate L, for short), starting from the variable ⟨Balanced⟩:
⟨Balanced⟩ → λ | a⟨Balanced⟩b.
Notice how a string s ∈ L is generated: from the outside in. First the outermost a and b are
created, together with a ⟨Balanced⟩ term between them; this ⟨Balanced⟩ term will be used
to generate ai−1 bi−1 . Then the second outermost a and b are generated, etc. The bottom of
the recursion, the base case, is the generation of the string λ.
Now we are ready to define Context Free Grammars (CFGs), G, (which have nothing to
do with graphs). A Context Free Grammar has four parts:
1. An alphabet V of variables, also called non-terminals. Usually, they are written with
capital letters.
2. An alphabet T of terminals: these are the characters used to write the strings being
generated. Usually, they are written with small letters.
3. S ∈ V is the start variable, the variable from which the string generation begins.
4. A set of rules R. Each rule has the form X → σ, where X ∈ V is a variable and
σ ∈ (T ∪ V )∗ is a string of variables and terminals, which could be λ, the empty string.
If we have several rules with the same left-hand side, for example X → σ1 , X →
σ2 , · · · , X → σk , they can be written as X → σ1 | σ2 | · · · | σk for short. The
meaning is that an occurrence of X in a generated string can be replaced by any one
of σ1 , σ2 , · · · , σk . Different occurrences of X can be replaced by distinct σi , of course.
The generation of a string proceeds by a series of replacements which start from the
string s0 = S, and which, for 1 ≤ i ≤ k, obtain si from si−1 by replacing some variable X
in si−1 by one of the replacements for X provided by the rule collection R.
We will write this as
S = s0 ⇒ s1 ⇒ s2 ⇒ · · · ⇒ sk or S ⇒∗ sk for short.
The language generated by grammar G, L(G), is the set of strings of terminals that can be
generated from G’s start variable S:
L(G) = {w | S ⇒∗ w and w ∈ T ∗ }.
Example 3.3.2. Grammar G2 , the grammar generating the language of properly nested
parentheses. It has:
Variable set: {S}.
Terminal set: {(, )}.
Rules: S → (S) | SS | λ.
Start variable: S.
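Generation from G2 can also be run bottom-up: start from λ and repeatedly apply the rules S → (S) and S → SS to strings already known to be in the language. A sketch of ours enumerating L(G2 ) up to a length bound:

```python
def balanced_upto(max_len):
    """Compute all strings of length <= max_len generated by
    S -> (S) | SS | λ, by closing {λ} under the two non-λ rules."""
    lang = {""}
    changed = True
    while changed:
        changed = False
        for u in list(lang):
            cand = "(" + u + ")"              # rule S -> (S)
            if len(cand) <= max_len and cand not in lang:
                lang.add(cand)
                changed = True
            for v in list(lang):
                cand = u + v                  # rule S -> SS
                if len(cand) <= max_len and cand not in lang:
                    lang.add(cand)
                    changed = True
    return lang
```

For instance, `balanced_upto(4)` yields the four properly nested strings of length at most 4: λ, (), (()) and ()().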
3.4 Converting CFGs to Chomsky Normal Form (CNF)

iv. The start symbol S may appear only on the left-hand side of rules.
Given a CFG G, we show how to convert it to a CNF grammar G′ generating the same
language.
We use a grammar G with the following rules as a running example.
S → ASA | aB; A → B | S; B → b | λ
We proceed in a series of steps which gradually enforce the above CNF criteria; each step
leaves the generated language unchanged.
Step 1 For each terminal a, we introduce a new variable, Ua say, add a rule Ua → a, and
for each occurrence of a in a string of length 2 or more on the right-hand side of a rule,
replace a by Ua . Clearly, the generated language is unchanged.
Example: If we have the rule A → Ba, this is replaced by Ua → a, A → BUa .
This ensures that terminals on the right-hand sides of rules obey criteria (i) above.
This step changes our example grammar G to have the rules:
S → ASA | Ua B; A → B | S; B → b | λ; Ua → a
Step 2 For each rule with 3 or more variables on the right-hand side, we replace it with a
new collection of rules obeying criteria (ii) above. Suppose there is a rule U → W1 W2 · · · Wk ,
for some k ≥ 3. Then we create new variables X2 , X3 , · · · , Xk−1 , and replace the prior rule
with the rules: U → W1 X2 ; X2 → W2 X3 ; · · · ; Xk−1 → Wk−1 Wk .
This step changes our example grammar G to have the rules:
S → AX | Ua B; X → SA; A → B | S; B → b | λ; Ua → a
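Step 2 is easy to mechanize. A sketch of ours, in which rules are (left-hand side, right-hand side) pairs with the right-hand side a tuple of symbols; the fresh variables are numbered globally rather than per rule as in the text:

```python
def split_long_rules(rules):
    """CNF Step 2: replace each rule U -> W1 W2 ... Wk (k >= 3) by the
    chain U -> W1 X, X -> W2 X', ..., ending in a two-symbol rule,
    introducing a fresh variable at each link."""
    out = []
    fresh = 0
    for lhs, rhs in rules:
        while len(rhs) >= 3:
            fresh += 1
            x = f"X{fresh}"                 # fresh variable for this link
            out.append((lhs, (rhs[0], x)))  # peel off the first symbol
            lhs, rhs = x, rhs[1:]
        out.append((lhs, rhs))
    return out
```

Applied to S → ASA it produces S → AX1 and X1 → SA, matching the example grammar's new rules (with X1 in place of X).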
Step 3 We replace each occurrence of the start symbol S with a new variable S′ and add the
rule S → S′ . This ensures criterion (iv) above.
This step changes our example grammar G to have the rules:
S → S′ ; S′ → AX | Ua B; X → S′ A; A → B | S′ ; B → b | λ; Ua → a
Figure 3.13: (a) A Derivation Tree for string ab. (b) The Reduced Derivation Tree.
Step 4 Next, we remove all rules of the form A → λ. We argue that any previously generatable
string w = λ remains generatable. For given a derivation tree for w using the old rules,
using the new rules we can create the reduced derivation tree, which is a derivation tree for
w in the new grammar. To see this, consider a maximal sized λ-subtree (that is a λ-subtree
whose parent is not part of a λ-subtree). Then its root v must have a sibling w and parent
u (these are the names of nodes, not strings). Suppose that u has variable label A, v has
label B and w has label C. Then node v was generated by applying either the rule A → BC
or the rule A → CB at node u (depending on whether v is the left or right child of u). In
the reduced tree, applying the rule A → C generates w and omits v and its subtree.
Finally, we take care of the case that, under the old rules, S can generate λ. In this
situation, we simply add the rule S → λ, which then allows λ to be generated by the new
rules also.
To find the variables that can generate λ, we use an iterative rule reduction procedure.
First, we make a copy of all the rules. We then create reduced rules by removing from the
right-hand sides all instances of variables A for which there is a rule A → λ. We keep
iterating this procedure so long as it creates new reduced rules with λ on the right-hand
side.
For our example grammar we start with the rules
S → S′ ; S′ → AX | Ua B; X → S′ A; A → B | S′ ; B → b | λ; Ua → a
As B → λ is a rule, we obtain the reduced rules
S → S′ ; S′ → AX | Ua B | Ua ; X → S′ A; A → B | λ | S′ ; B → b | λ; Ua → a
As A → λ is now a rule, we next obtain
S → S′ ; S′ → AX | X | Ua B | Ua ; X → S′ A | S′ ; A → B | λ | S′ ; B → b | λ; Ua → a
There are no new rules with λ on the right-hand side. So the procedure is now complete and
this yields the new collection of rules:
S → S′ ; S′ → AX | X | Ua B | Ua ; X → S′ A | S′ ; A → B | S′ ; B → b; Ua → a
An efficient implementation keeps track of the lengths of each right-hand side, and a list
of the locations of each variable; the new rules with λ on the right-hand side are those which
have newly obtained length 0. It is not hard to have this procedure run in time linear in the
sum of the lengths of the rules.
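The iterative procedure can be sketched as follows. This version of ours rescans all the rules each round, for clarity, instead of maintaining the length counters and location lists of the efficient implementation:

```python
def nullable_variables(rules):
    """Find all variables that can generate λ. rules: list of
    (lhs, rhs) pairs with rhs a tuple of symbols. A variable is
    nullable once every symbol of some right-hand side for it is
    known nullable (an empty right-hand side means the rule A -> λ)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable
```

On the example grammar (writing Sp for S′), B is found nullable from B → λ and then A from A → B, and the iteration stops, just as in the two rounds of reduced rules above.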
Step 5 This step removes rules of the form A → B, which we call unit rules.
What is needed is to replace derivations of the form A1 ⇒ A2 ⇒ · · · ⇒ Ak ⇒ BC with
a new derivation of the form A1 ⇒ BC; this is achieved with a new rule A1 → BC. Similarly,
derivations of the form A1 ⇒ A2 ⇒ · · · ⇒ Ak ⇒ a need to be replaced with a new derivation
of the form A1 ⇒ a; this is achieved with a new rule A1 → a. We proceed in two substeps.
Substep 5.1. This substep identifies variables that are equivalent, i.e. collections B1 , B2 , · · · , Bl
such that for each pair Bi and Bj , 1 ≤ i < j ≤ l, Bi can generate Bj , and Bj can generate
Bi . We then replace all of B1 , B2 , · · · , Bl with a single variable, B1 say. Clearly, this does
not change the language that is generated.
To do this we form a directed graph based on the unit rules. For each variable, we create
a vertex in the graph, and for each unit rule A → B we create an edge (A, B). Figure 3.14(a)
shows the graph for our example grammar.

Figure 3.14: (a) Graph showing the unit rules. (b) The reduced graph.

The vertices in each strong component of the graph are equivalent, and so the variables in
a strong component can all be replaced by a single one of them. In our example, S′ and X
form a strong component, as we have both S′ → X and X → S′ ; replacing S′ by X yields
the rules:
S → X; X → AX | X | Ua B | Ua ; X → XA | X; A → B | X; B → b; Ua → a
Substep 5.2. This substep removes the remaining unit rules. To this end, we use the
graph formed from the unit rules remaining after Substep 5.1,
which we call the reduced graph. It is readily seen that this is an acyclic graph.
In processing A → B, we will add appropriate non-unit rules that allow the shortcutting
of all uses of A → B, and hence allow the rule A → B to be discarded. If there are no unit
rules with B on the left-hand side it suffices to add a rule A → CD for each rule B → CD,
and a rule A → b for each rule B → b.
To be able to do this, we just have to process the unit rules in a suitable order. Recall
that each unit rule is associated with a distinct edge in the reduced graph. As this graph
will be used to determine the order in which to process the unit rules, it will be convenient
to write “processing an edge” when we mean “processing the associated rule”. It suffices to
ensure that each edge is processed only after any descendant edges have been processed. So
it suffices to start at vertices with no outedges and to work backward through the graph.
This is called a reverse topological traversal. (This traversal can be implemented via a depth
first search on the acyclic reduced graph.)
For each traversed edge (E, F ), which corresponds to a rule E → F , for each rule
F → CD, we add the rule E → CD, and for each rule F → f , we add the rule E → f ; then
we remove the rule E → F . Any derivation which had used the rules E → F followed by
F → CD or F → f can now use the rule E → CD or E → f instead. So the same strings
are derived with the new set of rules.
This step changes our example grammar G as follows (see Figure 3.14(b)):
First, we traverse edge (A, B). This changes the rules as follows:
Add A → b
Remove A → B.
Next, we traverse edge (X, Ua ). This changes the rules as follows:
Add X → a
Remove X → Ua .
Now, we traverse edge (A, X). This changes the rules as follows:
Add A → AX | XA | Ua B | a.
Remove A → X.
Finally, we traverse edge (S, X). This changes the rules as follows:
Add S → AX | XA | Ua B | a.
Remove S → X.
The net effect is that our grammar now has the rules
S → AX | Ua B | XA | a; X → AX | Ua B | XA | a; A → b | AX | Ua B | XA | a; B → b; Ua → a
Steps 4 and 5 complete the attainment of criterion (ii), and thereby create a CNF grammar
generating the same language as the original grammar.
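The unit-rule elimination just described is mechanical enough to sketch in code. The following is an illustrative sketch, not the text's algorithm: the grammar is encoded as a dict from each variable to a set of right-hand-side tuples (our own encoding), a depth first search computes a post-order of the acyclic unit-rule graph, and each variable is then processed only after all of its descendants, exactly as the reverse topological traversal prescribes.

```python
def eliminate_unit_rules(rules):
    """rules: dict mapping each variable to a set of RHS tuples.
    A unit rule is a length-1 RHS whose sole symbol is a variable.
    Assumes the unit-rule graph is acyclic (strong components have
    already been collapsed)."""
    variables = set(rules)
    order = []   # post-order: a vertex appears after all its descendants
    seen = set()

    def dfs(v):
        if v in seen:
            return
        seen.add(v)
        for rhs in list(rules[v]):
            if len(rhs) == 1 and rhs[0] in variables:
                dfs(rhs[0])
        order.append(v)

    for v in variables:
        dfs(v)

    # Process each vertex only after its descendants: for a unit rule
    # A -> B, copy B's (already unit-free) rules up to A, then drop A -> B.
    for a in order:
        for rhs in list(rules[a]):
            if len(rhs) == 1 and rhs[0] in variables:
                rules[a].discard(rhs)
                rules[a] |= rules[rhs[0]]
    return rules

# The example grammar after Substep 5.1 (terminals are lowercase).
G = {"S": {("X",)},
     "X": {("A", "X"), ("X", "A"), ("Ua", "B"), ("Ua",)},
     "A": {("B",), ("X",)},
     "B": {("b",)},
     "Ua": {("a",)}}
eliminate_unit_rules(G)
```

Run on the example grammar, this reproduces the rule set displayed above: S, X and A each acquire the non-unit rules AX, XA, Ua B and a, and A additionally keeps b.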
Figure 3.15: Derivation tree T with root-to-leaf path P; nodes c and d on P are both labeled A. The subtree rooted at d derives w, the subtree rooted at c derives vwx, and the whole tree derives uvwxy.
Let’s consider replicating the middle portion of the derivation tree, namely the wedge
W formed by taking the subtree C rooted at c and removing the subtree D rooted at d, to
create a derivation tree for a longer string, as shown in Figure 3.17. We can do this because
the root of the wedge is labeled by A and hence W plus a nested subtree D is a legitimate
replacement for subtree D. The resulting tree, with two copies of W , one nested inside the
other, is a derivation tree for uvvwxxy. Thus uvvwxxy ∈ L.
Clearly, we could duplicate W more than once, or remove it entirely, showing that all the
strings uv^i wx^i y ∈ L, for any integer i ≥ 0.
Now let’s see why uvvwxxy ∉ L. Note that we know that |vx| ≥ 1 and |vwx| ≤ p, by
Observations 1 and 2. Further recall that a^p b^p c^p = uvwxy. As |vwx| ≤ p, it is contained
entirely in either one or two adjacent blocks of letters, as illustrated in Figure 3.18. Therefore,
when v and x are duplicated, as |vx| ≥ 1, the number of occurrences of one or two of the
characters increases, but not of all three. Consequently, in uvvwxxy there are not equal
numbers of a’s, b’s, and c’s, and so uvvwxxy ∉ L.
We have shown both that uvvwxxy ∈ L and uvvwxxy ∉ L. This contradiction means
that the original assumption (that L is a CFL) is mistaken. We conclude that L is not a
CFL.
86 CHAPTER 3. PUSHDOWN AUTOMATA AND CONTEXT FREE LANGUAGES
We are now ready to prove the Pumping Lemma for Context Free Languages, which
will provide a tool to show many languages are not Context Free, in the style of the above
argument.
Lemma 3.5.3. (Pumping Lemma for Context Free Languages.) Let L be a Context Free
Language. Then there is a constant p = pL such that if s ∈ L and |s| ≥ p then s is pumpable,
that is, s can be written in the form s = uvwxy with
1. |vx| ≥ 1.
2. |vwx| ≤ p.
3. uv^i wx^i y ∈ L for every integer i ≥ 0.
Proof. Let G be a CNF grammar for L, with m variables say. Let p = 2^m. Let s ∈ L, where
|s| ≥ p, and let T be a derivation tree for s. As |s| > 2^{m−1}, by Corollary 3.5.2, T has a
root-to-leaf path with at least m + 1 internal nodes. Let P be a longest such path. Each internal
node on P is labeled by a variable, and as P has at least m + 1 internal nodes, some variable
must be used at least twice.
Working up from the bottom of P, let c be the first node to repeat a variable label. So
on the portion of P below c each variable is used at most once. Thus c has height at most
m + 1. The derivation tree is shown in Figure 3.15. Let d be the descendant of c on P having
the same variable label as c, A say. Let w be the substring derived by the subtree rooted at
d. Let vwx be the substring derived by the subtree rooted at c. Let uvwxy be the string
derived by the whole tree.
By Observation 1, c has height at most m + 1; hence, by Lemma 3.5.1, vwx, the string
c derives, has length at most 2^m = p. This shows Property (2). Property (1) is shown in
Observation 2, above.
Finally, we show property (3). Let’s replicate the middle portion of the derivation tree,
namely the wedge W formed by taking the subtree C rooted at c and removing the subtree
D rooted at d, to create a derivation tree for a longer string, as shown in Figure 3.17.
We can do this because the root of the wedge is labeled by A and hence W plus a nested
subtree D is a legitimate replacement for subtree D. The resulting tree, with two copies of
W , one nested inside the other, is a derivation tree for uvvwxxy. Thus uvvwxxy ∈ L.
Clearly, we could duplicate W more than once, or remove it entirely, showing that all the
strings uv^i wx^i y ∈ L, for any integer i ≥ 0.
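To see the three conditions in action on a positive example, here is a small illustrative check (the language {a^n b^n} and the helper in_anbn are ours, not the text's): for s = a^p b^p, taking v to be the last a and x the first b satisfies |vx| ≥ 1 and |vwx| ≤ p, and every pumped string uv^i wx^i y remains in the language.

```python
def in_anbn(s):
    """Membership test for {a^n b^n | n >= 0} (hypothetical helper)."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

p = 8
s = "a" * p + "b" * p
# Decomposition s = uvwxy with |vx| >= 1 and |vwx| = 2 <= p:
u, v, w, x, y = "a" * (p - 1), "a", "", "b", "b" * (p - 1)
assert u + v + w + x + y == s
for i in range(5):
    # Pumping yields a^(p-1+i) b^(p-1+i), which stays in the language.
    assert in_anbn(u + v * i + w + x * i + y)
```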
Next, we demonstrate by example how to use the Pumping Lemma to show languages are
not context free. The argument structure is identical to that used in applying the Pumping
Lemma to regular languages.
Example 3.5.4. J = {ww | w ∈ {a, b}∗ }. We will show that J is not context free.
Step 1. Suppose, for a contradiction, that J were context free. Then, by the Pumping
Lemma, there is a constant p = pJ such that for any s ∈ J with |s| ≥ p, s is pumpable.
Step 2. Choose s = a^{p+1} b^{p+1} a^{p+1} b^{p+1}. Clearly s ∈ J and |s| ≥ p, so s is pumpable.
3.5. SHOWING LANGUAGES ARE NOT CONTEXT FREE 87
Comment. Suppose, by way of example, that vwx overlaps the first two blocks of characters.
It would be incorrect to assume that v is completely contained in the block of a’s and x in
the block of b’s. Further, it may be that v = λ or x = λ (but not both). All you know is
that |vwx| ≤ p and that one of |v| ≥ 1 or |x| ≥ 1. Don’t assume more than this.
Example 3.5.5. K = {a^i b^j c^k | i < j < k}. We show that K is not context free.
Step 1. Suppose, for a contradiction, that K were context free. Then, by the Pumping
Lemma, there is a constant p = pK such that for any s ∈ K with |s| ≥ p, s is pumpable.
Step 2. Choose s = a^p b^{p+1} c^{p+2}. Clearly, s ∈ K and |s| ≥ p, so s is pumpable.
Step 3. As s is pumpable we can write s as s = uvwxy with |vwx| ≤ p, |vx| ≥ 1 and
uv^i wx^i y ∈ K for all integers i ≥ 0.
As |vwx| ≤ p, vwx can overlap one or two blocks of the characters in s, but not all three.
Our argument for obtaining a contradiction depends on the position of vwx.
Case 1. vx does not overlap the block of c’s.
Then consider s′ = uvvwxxy. As s is pumpable, by Condition (3) with i = 2, s′ ∈ K. We
argue next that in fact s′ ∉ K. As v and x have been duplicated in s′, the number of a’s or
the number of b’s is larger than in s (or possibly both numbers are larger); but the number
of c’s does not change. If the number of b’s has increased, then s′ has at least as many b’s
as c’s, and then s′ ∉ K. Otherwise, the number of a’s increases, and the number of b’s is
unchanged, so s′ has at least as many a’s as b’s, and again s′ ∉ K.
Case 2. vwx does not overlap the block of a’s.
Then consider s′ = uwy. Again, as s is pumpable, by Condition (3) with i = 0, s′ ∈ K.
Again, we show that in fact s′ ∉ K. To obtain s′ from s, the v and the x are removed. So
in s′ either the number of c’s is smaller than in s, or the number of b’s is smaller (or both).
But the number of a’s is unchanged. If the number of b’s is reduced, then s′ has at least as
many a’s as b’s, and so s′ ∉ K. Otherwise, the number of c’s decreases and the number of
b’s is unchanged; but then s′ has at least as many b’s as c’s, and again s′ ∉ K.
In either case, a pumped string s′ has been shown to be both in K and not in K. This
is a contradiction.
Step 4. The contradiction shows that the initial assumption was mistaken. Consequently,
K is not context free.
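For a small concrete value of p, the case analysis can be confirmed by brute force. The sketch below is illustrative only (in_K and pumpable are our own helpers, and the true pumping constant is not known in advance): it tries every split s = uvwxy with |vwx| ≤ p and |vx| ≥ 1, and checks the pumped strings for i = 0 and i = 2, the two values used in the proof.

```python
def in_K(s):
    """Membership in K = {a^i b^j c^k | i < j < k}."""
    i = len(s) - len(s.lstrip("a"))
    rest = s[i:]
    j = len(rest) - len(rest.lstrip("b"))
    k = len(rest) - j
    return s == "a" * i + "b" * j + "c" * k and i < j < k

def pumpable(s, p, member):
    """Is there a split s = uvwxy with |vwx| <= p and |vx| >= 1 such that
    u v^i w x^i y stays in the language for i in {0, 2}?"""
    n = len(s)
    for a in range(n + 1):                         # u = s[:a]
        for b in range(a, min(a + p, n) + 1):      # v = s[a:b]
            for c in range(b, min(a + p, n) + 1):  # w = s[b:c]
                for d in range(c, min(a + p, n) + 1):  # x = s[c:d], so |vwx| <= p
                    if b == a and d == c:
                        continue                   # enforce |vx| >= 1
                    u, v, w, x, y = s[:a], s[a:b], s[b:c], s[c:d], s[d:]
                    if all(member(u + v * i + w + x * i + y) for i in (0, 2)):
                        return True
    return False

p = 4
s = "a" * p + "b" * (p + 1) + "c" * (p + 2)
assert in_K(s) and not pumpable(s, p, in_K)
```

Since no split survives both checks, the chosen s cannot be pumped with this p, mirroring Cases 1 and 2 above.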
The following example uses the yet to be proven result that if L is context free and R is
regular then L ∩ R is also context free.
Example 3.5.6. H = {w | w ∈ {a, b, c}∗ and w has equal numbers of a’s, b’s and c’s}.
Consider H ∩ a∗b∗c∗ = L = {a^i b^i c^i | i ≥ 0}. If H were context free, then L would be context free
too. But we have already seen that L is not context free. Consequently, H is not context
free either.
This could also be shown directly by pumping on the string s = a^p b^p c^p.
When applying the Pumping Lemma, it is a nuisance to have to handle the cases
where one of v or x may be the empty string. Fortunately, we can prove a variant of the
Pumping Lemma in which both |v| ≥ 1 and |x| ≥ 1.
Lemma 3.5.7. (Variant of the Pumping Lemma for Context Free Languages.) Let L be a
Context Free Language. Then there is a constant p = pL such that if s ∈ L and |s| ≥ p then
s is pumpable, that is, s can be written in the form s = uvwxy with
1. |v|, |x| ≥ 1.
2. |vwx| ≤ p.
3. uv^i wx^i y ∈ L for every integer i ≥ 0.
Proof. Let p̃L be the pumping constant for the standard Pumping Lemma applied to L. We
will choose pL = 2p̃L.
Now let s ∈ L be any string of length at least p = pL.
We apply the standard Pumping Lemma to s and conclude that we can write s as s =
ũṽw̃x̃ỹ with
1. |ṽx̃| ≥ 1.
2. |ṽw̃x̃| ≤ p̃L.
3. ũṽ^i w̃x̃^i ỹ ∈ L for every integer i ≥ 0.
If both |ṽ| ≥ 1 and |x̃| ≥ 1 then the new result follows on setting u = ũ, v = ṽ, w = w̃,
x = x̃, y = ỹ.
Otherwise, if |ṽ| ≥ 1 and |x̃| = 0, then we set u = ũ, v = ṽ, w = λ, x = ṽ, y = w̃ỹ.
We observe that for all i, uv^i wx^i y = ũṽ^i λṽ^i w̃ỹ = ũṽ^{2i} w̃x̃^{2i} ỹ ∈ L; we also observe that
|vwx| = 2|ṽ| ≤ 2|ṽw̃x̃| ≤ 2p̃L = p. Likewise, if |ṽ| = 0 and |x̃| ≥ 1, then we set u = ũw̃, v = x̃,
w = λ, x = x̃, y = ỹ. Again, uv^i wx^i y ∈ L for all integers i ≥ 0 and |vwx| ≤ p.
3.6. PDAS RECOGNIZE EXACTLY THE CONTEXT FREE LANGUAGES 89
Figure: The PDA ML. ML’s first move reads λ and pushes S, taking it to vertex Main with $S on the stack. At Main, each supermove simulates a grammar rule: for A → a it performs “Read a, Pop A”, and for A → BC it pops A and pushes C, then B. When the stack holds only $, ML pops it and moves to its final vertex.
rule of the grammar. It corresponds to a Depth First Traversal of the derivation tree.) The
corresponding k-supermove computation by ML starts by moving to vertex Main with λ read
and S on its stack, i.e. it has data configuration C1 = (λ, $S) = (s1, $σ1^R). It then proceeds
through the following k data configurations at vertex Main: C2 = (s2, $σ2^R), · · ·, Ck+1 =
(sk+1, $σk+1^R) = (w, $), and it goes from Ci to Ci+1, for 1 ≤ i ≤ k, by means of the supermove
corresponding to the application of the rule taking si σi to si+1 σi+1. Finally ML moves from
vertex Main to its final vertex, popping from its stack, and thereby recognizing w; i.e.
w ∈ L(ML).
Next suppose that ML recognizes w. It does so by means of a computation using k
supermoves, for some k ≥ 0. The computation begins by moving to vertex Main, while
reading λ and pushing S on the stack, i.e. it is at configuration C1 = (λ, $S) = (s1, $σ1^R).
It then moves through the following series of data configurations at vertex Main: C2 =
(s2, $σ2^R), · · ·, Ck+1 = (sk+1, $σk+1^R), where, for 1 ≤ i ≤ k, Ci+1 is reached from Ci by means
of a supermove. By definition, the ith supermove corresponds to the application of the
rule that takes string si σi to si+1 σi+1. Thus, the following is a derivation in grammar GL:
S = s1 σ1 ⇒ s2 σ2 ⇒ · · · ⇒ sk+1 σk+1. Following the kth supermove, ML moves to its final
vertex, which entails popping from its stack. At this point, ML has data configuration
(sk+1, λ), and it accepts sk+1, the string read. By assumption, ML was recognizing w, thus
sk+1 = w. We conclude that S ⇒∗ sk+1 σk+1 = w.
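The supermove computation lends itself to direct simulation. The sketch below is our own illustration, not the book's construction verbatim: a data configuration pairs the number of input characters read with the stack (top at the right, $ at the bottom); a supermove for a rule A → BC pops A and pushes C then B, a rule A → a reads a and pops A, and the input w is recognized when only $ remains and all of w has been read.

```python
from collections import deque

def pda_accepts(w, rules, start="S"):
    """Search over ML's data configurations (chars read, stack).
    `rules` maps each variable of a CNF grammar (no lambda-rules)
    to a set of RHS tuples; '$' is the stack-bottom marker."""
    frontier = deque([(0, ("$", start))])      # C1 = (lambda, $S)
    seen = set()
    while frontier:
        read, stack = frontier.popleft()
        if (read, stack) in seen:
            continue
        seen.add((read, stack))
        if stack == ("$",):
            if read == len(w):                 # pop $, move to final vertex
                return True
            continue
        if len(stack) > len(w) + 1:            # a CNF sentential form deriving w
            continue                           # never exceeds |w| symbols
        top = stack[-1]
        for rhs in rules.get(top, ()):
            if len(rhs) == 2:                  # supermove for a rule A -> BC:
                frontier.append((read, stack[:-1] + (rhs[1], rhs[0])))
            elif read < len(w) and rhs[0] == w[read]:  # rule A -> a: read a, pop A
                frontier.append((read + 1, stack[:-1]))
    return False

# CNF grammar for {a^n b^n | n >= 1}.
G = {"S": {("A", "T"), ("A", "B")}, "T": {("S", "B")},
     "A": {("a",)}, "B": {("b",)}}
assert pda_accepts("aabb", G)
assert not pda_accepts("aab", G)
```

The search is bounded because in a CNF grammar no sentential form in a derivation of w is longer than |w|, so stacks longer than |w| + 1 can be pruned.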
Lemma 3.6.2. If L is context-free and R is regular then L ∩ R is also context free, where
L, R ⊆ Σ∗ .
Proof. Let GL = (VL , Σ, RL , SL ) be a CNF grammar generating L and let MR = (VR , start, FR , δR )
be a DFA recognizing R. We will build a grammar GL∩R = (VL∩R , Σ, RL∩R , SL∩R ) to gener-
ate L ∩ R. Let VR = {q1 , q2 , · · · , qm }. For each variable U ∈ VL , we create m2 variables Uij
in VL∩R . The rules we create will ensure that:
Uij ⇒∗ w ∈ Σ∗
exactly if
U ⇒∗ w and there is a path labeled w in MR going from qi to qj .
Thus a variable in GL∩R records the name of the corresponding variable in GL and also
records a “start” and a “finish” vertex in MR . The constraint we are imposing is that Uij
can generate w exactly if both U can generate w and MR when started at qi will go to qj on
input w. It follows that if qf is a final vertex of MR and if q1 = start, then (SL)1f ⇒∗ w
exactly if w ∈ L and MR accepts w at qf . Accordingly, we give GL∩R the start rules
SL∩R → (SL)1f for each final vertex qf , so that SL∩R ⇒∗ w exactly if w ∈ L ∩ R.
To see why this works, we consider any non-empty string w ∈ L ∩ R and look at a
derivation tree T for w with respect to GL . At the same time we look at a w-recognizing
path P in MR .
We label each leaf of T with the names of two vertices in MR : the vertices that MR is
at right before and right after reading the input character corresponding to the character
labeling the leaf. If we read across the leaves from left to right, recording vertex and character
labels, we obtain a sequence p1 w1 p2 , p2 w2 p3 , · · · , pn wn pn+1 , where p1 = start and pn+1 ∈ FR .
Next, we give the internal nodes in T vertex labels also. A node receives as its first label
the first label of its leftmost leaf and as its second label the second label of its rightmost
leaf. Suppose that A is the variable label at an internal node with children having variable
labels B and C (see Figure 3.20). Suppose further that B receives vertex labels p and q (in
that order), and C receives vertex labels r and s. Then q = r and A receives vertex labels p
and s.
Figure 3.20: Vertex Labels in Derivation Tree T : first label on left, second label on right.
To obtain the derivation in GL∩R , we simply replace A ⇒ BC by Aps ⇒ Bpq Cqs . In
addition, at the leaf level, we replace A ⇒ a by Apq ⇒ a where p and q are the vertex labels
on the leaf (and on its parent). Clearly, this is a derivation in GL∩R and further it derives
w. Thus if w ∈ L ∩ R, then SL∩R ⇒∗ w.
On the other hand, suppose that SL∩R ⇒∗ w. Then consider the derivation tree for
w in GL∩R . On replacing each variable Uij by U we obtain a derivation tree for w in GL .
Thus SL ⇒∗ w also. On looking at the leaf level, and labeling each leaf with the vertex
indices on the variable at its parent, we obtain a sequence p1 w1 p2 , p2 w2 p3 , · · · , pn wn pn+1 ,
where w = w1 w2 · · · wn , p1 = start and pn+1 ∈ FR . As δR(pi , wi ) = pi+1 , for 1 ≤ i ≤ n, by the
leaf-level rule definition for GL∩R , we see that p1 p2 · · · pn+1 is a w-recognizing path in MR , and
so w ∈ R. This shows that if SL∩R ⇒∗ w then w ∈ L ∩ R.
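The construction in the proof can be sketched as follows (illustrative code; the tuple encoding (U, i, j) for the variable Uij and the function name are our own). It emits Aps → Bpq Cqs for all state triples, the leaf rules Apq → a whenever δR(p, a) = q, and start rules tying a fresh start variable to the final vertices. Note that the start rules are unit rules, so the output grammar generates L ∩ R but is no longer in CNF.

```python
from itertools import product

def intersect_cfg_dfa(rules, start, delta, q0, finals):
    """Build the rules of G_{L∩R} from a CNF grammar (`rules`: variable ->
    set of RHS tuples, start variable `start`) and a DFA
    (`delta`: (state, char) -> state, start state q0, final states `finals`)."""
    states = {q0} | {q for (q, _) in delta} | set(delta.values())
    new_rules = {}
    for U in rules:
        for p, s in product(states, repeat=2):
            new_rules[(U, p, s)] = set()
    for U, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) == 2:                  # U -> B C gives Ups -> Bpq Cqs
                B, C = rhs
                for p, q, s in product(states, repeat=3):
                    new_rules[(U, p, s)].add(((B, p, q), (C, q, s)))
            else:                              # U -> a gives Upq -> a
                a = rhs[0]                     # whenever delta(p, a) = q
                for p in states:
                    if (p, a) in delta:
                        new_rules[(U, p, delta[(p, a)])].add((a,))
    # Start rules: S' -> (start, q0, f) for each final state f.
    new_rules["S'"] = {((start, q0, f),) for f in finals}
    return new_rules

# Example: {a^n b^n | n >= 1} intersected with the even-length strings.
G = {"S": {("A", "T"), ("A", "B")}, "T": {("S", "B")},
     "A": {("a",)}, "B": {("b",)}}
delta = {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0}
R = intersect_cfg_dfa(G, "S", delta, 0, {0})
assert ("a",) in R[("A", 0, 1)]                # A -> a and delta(0, a) = 1
```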
If we draw the edges connecting matched vertices in the Stack Contents Diagram, this
naturally partitions the diagram into trapezoids (see Figure 3.22).
Consider the subcomputation of ML that goes from a vertex p to its subsequent matched
vertex r. We call the subcomputation a phase; let P denote this phase. Let A be the
character pushed onto the stack in P’s first step; then this same A is popped in P’s last
step. Suppose that the first step performs the operation “Read a, Push A” and takes ML
from vertex p to vertex q, and the last step performs the operation “Read b, Pop A” and
takes ML from vertex r to vertex t. (Note that a, b ∈ {λ} ∪ T.) Then this pair of actions
can be represented by the left and right edges of the following trapezoid: it has left edge
(p, q) and right edge (r, t). We name this trapezoid Tpqrt^Aab. We call trapezoid Tpqrt^Aab the base of
phase P. It may be that P is a 2-step computation, in which case the trapezoid is actually
a triangle, and q = r; we call such trapezoids triangles. But even if q = r it could be that P
lasts longer than 2 steps.
If the base of phase P is a non-triangular trapezoid, then P consists of a series of one or
more subphases. Each subphase is stack-preserving, maintaining the A on the stack. Again,
for each subphase P′, we can identify the trapezoid at its base, which represents
the computation performed during the first and last steps of P′.
sole final vertex. Also, the first and last steps of ML ’s computation go from vertex
s to s′ and from f′ to f, respectively, and do not perform any reads, but just push and
pop the ε-shield.
the first and second steps, respectively); thus w = ab. And label(Z) = ab = w, proving the
claim in this case.
Inductive step. Suppose that the claim is true for all trapezoid trees of height h ≤ l. We
show that it is true for trees Z of height l + 1 also.
Let z be the root of Z and suppose it has subtrees Z1 , Z2 , · · · Zk in left to right order,
for some k ≥ 1. Let Tpqrt^Aab be the trapezoid stored at z. Then the computation of ML
represented by Z consists of a first step “Read a, Push A”, followed in turn by the stack-
preserving computations represented by Z1 , Z2 , · · · Zk , followed by a final step “Read b, Pop
A”. Suppose that ML reads string wi during the computation corresponding to Zi . By the
inductive hypothesis, label(Zi ) = wi . Thus ML ’s computation corresponding to tree Z reads
aw1 w2 · · · wk b = w. And label(Z) = a ◦ label(Z1 ) ◦ label(Z2 ) ◦ · · · ◦ label(Zk ) ◦ b = w.
We conclude that label(Z) = w for all trees Z.
We now show the converse.
Lemma 3.6.11. Let Z be a trapezoidal tree that observes Property 3.6.9. Suppose that
label(Z) = w. Then there is a stack-preserving computation of ML that reads w, starts at
Z’s start vertex and ends at Z’s end vertex.
Proof. We prove the result by induction on the height of Z.
Base case. Z comprises a single (leaf) node v.
Let Tprrt^Aab be the triangle stored at node v. Clearly, label(Z) = ab, so w = ab.
Now define a computation of ML comprising the following two steps: the first step
consists of “Read a, Push A” plus a move from p, Z’s start vertex, to vertex r; the second
step consists of “Read b, Pop A” plus a move to vertex t, Z’s end vertex. Clearly this
computation is stack-preserving, it reads ab = w, and it begins at Z’s start vertex and ends
at Z’s end vertex.
Inductive step. Suppose that the claim is true for all trapezoid trees Z of height h ≤ l. We
show that it is true for trees of height l + 1 also.
Let z be the root of Z and suppose it has subtrees Z1 , Z2 , · · · Zk in left to right order, for
some k ≥ 1. Let Tpqrt^Aab be the trapezoid stored at z. Let the root of subtree Zi , for 1 ≤ i ≤ k,
store a trapezoid with bottom edge (qi , ri ). By Property 3.6.9, ri = qi+1 for 1 ≤ i ≤ k − 1,
q1 = q, and rk = r. Let label(Zi ) = wi , for 1 ≤ i ≤ k. Thus label(Z) = aw1 w2 · · · wk b = w.
By the inductive hypothesis, there is a stack-preserving subcomputation of ML that goes
from vertex qi to ri and that reads wi , for 1 ≤ i ≤ k.
We define a computation of ML comprising the following steps: the first step consists of
“Read a, Push A” and a move from vertex p to vertex q = q1 . Then there are a series of
k stack-preserving computations, the ith one corresponding to subtree Zi , and going from
qi to ri = qi+1 , while reading wi . The final step consists of “Read b, Pop A” and a move
from vertex r = rk to t. Altogether, the computation goes from vertex p to vertex t, it reads
aw1 w2 · · · wk b = w, and it is stack preserving.
We conclude that for all Z there is a stack-preserving computation of ML that reads w,
starts at Z’s start vertex and ends at Z’s end vertex.
Lemmas 3.6.10 and 3.6.11 show that a w-recognizing computation of ML can be repre-
sented by a trapezoidal tree, and further that any trapezoidal tree observing Property 3.6.9
corresponds to a label(Z)-recognizing computation of ML .
To facilitate turning this tree into a derivation tree, we want the nodes in the trapezoidal
tree to have bounded degree. Let v be a node in the trapezoidal tree with k ≥ 1 children
v1 , v2 , · · · , vk . We achieve the bounded degree by introducing intermediate nodes, as follows.
We create intermediate nodes x1 , x2 , · · · , xk−1 , and y1 , y2 , · · · , yk , together with the following tree
edges. If k = 1, y1 is the child of v; otherwise x1 is the child of v. For 1 ≤ i ≤ k, yi will be
the parent of vi ; for 1 ≤ i ≤ k − 2, xi will have left child yi and right child xi+1 ; xk−1 will
have left child yk−1 and right child yk . Clearly, in the new tree, v still has descendants
v1 , v2 , · · · , vk in left to right order (see Figure 3.23). We use the term x-nodes to refer to
the nodes x1 , x2 , · · · , xk−1 , and y-nodes for the nodes y1 , y2 , · · · , yk .
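The intermediate-node construction can be sketched with a small helper, using our own hypothetical tuple encoding: ('y', v) for a y-node wrapping child v, and ('x', left, right) for an x-node.

```python
def binarize(vs):
    """Given the children v1..vk of a node, build the x/y intermediate
    structure and return the single child to hang under the parent.
    Encoding: ('y', v) is a y-node, ('x', left, right) an x-node."""
    k = len(vs)
    ys = [("y", v) for v in vs]
    if k == 1:
        return ys[0]                       # k = 1: y1 is the child of v
    node = ("x", ys[-2], ys[-1])           # x_{k-1} has children y_{k-1}, y_k
    for i in range(k - 3, -1, -1):         # x_i has left child y_i,
        node = ("x", ys[i], node)          # right child x_{i+1}
    return node
```

Every node in the returned structure has at most two children, and reading the leaves left to right recovers v1, . . . , vk.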
Now we are ready to define the Context Free Grammar GL . We will create a variable for
each realizable trapezoid, together with variables to label the x- and y-nodes. In addition,
we create rules to enable the variable for a node at the root of a subtree Z to derive label(Z).
Thus we introduce a variable Upqrt^Aab for each realizable trapezoid Tpqrt^Aab. We create variables
Xpq for all p, q ∈ V for the x-nodes and variables Ypq for all p, q ∈ V for the y-nodes.
We add the following rules:
• Upqrt^Aab → aXqr b | aYqr b for all realizable trapezoids Tpqrt^Aab.
• Uprrt^Aab → ab for all realizable trapezoids Tprrt^Aab (these are triangles).
• Xqt → Yqr Xrt | Yqr Yrt for all q, r, t ∈ V.
• Yqr → Uqq′r′r^Aab for all q, q′, r′, r ∈ V , A ∈ Γ, and a, b ∈ {λ} ∪ T.
single final vertex, ML is ε-shielded, its first and last steps perform no reads, and go from
vertex s to s′ and from f′ to f, respectively).
Lemma 3.6.12. Let Z be a trapezoidal tree obeying Property 3.6.9. Let w = label(Z). Then
w ∈ L(GL ).
Proof. To obtain a derivation tree in GL , we replace each trapezoid Tpqrt^Aab at a node v by the
variable Upqrt^Aab. For each variable Upqrt^Aab we add a left leaf child labeled by a and a right leaf
child labeled by b. We give variable Xpq to an x-node with span (p, q), and variable Ypq to a
y-node with span (p, q).
We use the following rules to enable the derivation of the string labeling the leaves of
this tree. For an x-node v with span (q, t) and children with spans (q, r) and (r, t) we use
one of the rules Xqt → Yqr Xrt | Yqr Yrt , according as the right child of v is an x-node or a
y-node. For a y-node with span (p, t), whose child has label variable Upqrt^Aab, we use the rule
Ypt → Upqrt^Aab. For a node v with label Upqrt^Aab we use one of the rules Upqrt^Aab → aXqr b | aYqr b,
according as v has an x-child or a y-child, while if v has no child (and so q = r) we use the
rule Uprrt^Aab → ab.
It is easy to see that this derivation derives label(Z) = w.
Lemma 3.6.13. Suppose that w ∈ L(GL ); then there is a trapezoidal tree Z obeying Prop-
erty 3.6.9 with w = label(Z).
Proof. Let TG (w) be a derivation tree for w. We will construct the trapezoidal tree Z as
follows. We simply replace each variable Upqrt^Aab by trapezoid Tpqrt^Aab, remove the leaves (labeled
by terminals), and remove the variables labeling x- and y-nodes.
Clearly label(Z) = w.
It remains to show that Z obeys Property 3.6.9. Part 2 follows because the variables
Upqrt^Aab correspond to realizable trapezoids. Part 3 follows because GL ’s start variable is Uss′f′f^Σλλ,
so Z’s root stores trapezoid Tss′f′f^Σλλ. Part 1 follows because each leaf in Z corresponds to a
node with two leaf children in the derivation tree, and such nodes have variables Uprrt^Aab, which
are replaced by realizable triangles Tprrt^Aab. Part 4 follows because if in the derivation tree
Upqrt^Aab derives Up1q1r1t1^A1a1b1 Up2q2r2t2^A2a2b2 · · · Upkqkrktk^Akakbk via intermediate X and Y variables, then q = p1 ,
pi+1 = qi for 1 ≤ i ≤ k − 1, and qk = r. But then the corresponding trapezoids Tpqrt^Aab and
Tp1q1r1t1^A1a1b1, Tp2q2r2t2^A2a2b2, · · · , Tpkqkrktk^Akakbk obey the same condition, which is part 4 of Property 3.6.9.
Lemmas 3.6.10–3.6.13 show that L(GL ) = L(ML ).
Exercises
1. For the pda in Figure 3.24 answer the following questions.
Figure 3.24: A PDA with vertices p, q, r, s; its edge operations include “Read a, Push A”, “Read b, Pop A”, “Read a, Pop B”, “Read b, Push B”, and shield Push/Pop moves.
2. Give PDAs to recognize the following languages. You are to give both an explanation in
English of what each PDA does plus a graph of the PDA; seek to label the graph vertices
accurately.
i. A = {w | w has odd length, w ∈ {a, b}∗ and the middle symbol of w is an a}.
ii. B = {w | w ∈ {a, b}∗ and w ≠ a^i b^i for any integer i}.
iii. C = {w | w ∈ {a, b}∗ and w contains equal numbers of a’s and b’s}.
iv. D = {wwR x | w, x ∈ {a, b}∗ }.
v. E = {wcwR x | w, x ∈ {a, b}∗ }.
4. Let C be a language over the alphabet {a, b} and let Suffix(C) = {w | there is a u ∈ {a, b}∗
with uw ∈ C}. Show that if C is recognized by a pda then so is Suffix(C).
EXERCISES 99
5. i. Let A = {uav#xby | u, v, x, y ∈ {a, b}∗ and (|u| − |v|) = (|x| − |y|)}. Give a pda to
recognize A.
ii. Let B = {w#z | w, z ∈ {a, b}∗ and |w| ≠ |z|}. Give a pda to recognize B.
iii. Show that A ∪ B = {s#t | s, t ∈ {a, b}∗ and s ≠ t}.
6. Consider the pda in Figure 3.24. Add descriptors to the vertices. What language does
this pda recognize?
i. A = {w | w has odd length, w ∈ {a, b}∗ and the middle symbol of w is an a}.
ii. B = {w | w ∈ {a, b}∗ and w = wR }. B is the language of palindromes, strings that
read the same forward and backward.
Hint: Be sure to handle strings of all possible lengths.
iii. C = {wwR x | w, x ∈ {a, b}∗ }.
iv. D = {w | w ∈ {a, b}∗ and w contains an equal number of a’s and b’s}.
Hint: suppose that the first character in w is an a. Let x be the shortest initial
substring of w having an equal number of a’s and b’s. If |x| < |w|, then w can be
written as w = xy; what can you say about y? Otherwise, x = w and w can be
written as w = azb; what can you say about z?
v. E = {w#x | w, x ∈ {a, b}∗ and wR is an initial substring of x}.
Hint: x can be written as x = wR y for some y ∈ {a, b}∗ .
10. i. Give a context free grammar to generate the following language: L1 = {ai #bi+j $aj | i, j ≥
0}.
ii. Give a context free grammar to generate the following language: L2 = {w#x$y | w, x, y ∈
{a, b}∗ and |x| = |w| + |y|}.
iii. Hence give a context free grammar to generate the following language: L3 = {uv | |u| =
|v| but u ≠ v}. Hint: think of the # and the $ from part (ii) as a pair of aligned yet
unequal characters in u and v; what is the relation among the lengths of the remaining
pieces of u and v?
11. Let A be a CFL generated by a CFG GA . Give a CFG grammar GA∗ , based on GA , to
generate A∗ . Show that L(GA∗ ) = A∗ .
S → SBS | BC; B → ab | λ; C → c | λ.
13. Show that the following languages are not context free.
i. A = {a^i b^j c^k b^j c^k a^i | i, j, k ≥ 0}.
ii. B = {a^m b^n c^m d^n | m, n ≥ 0}.
iii. C = {w | w ∈ {a, b, c}∗ and the number of a’s, b’s and c’s in w are all equal}.
iv. D = {u#v#w | u, v, w ∈ {a, b}∗ , the number of a’s in u equals |v|,
and the number of b’s in v equals |w|}.
v. Let E = {w | w ∈ {a, b, c, d}∗ and w contains equal numbers of a’s and b’s, and equal
numbers of c’s and d’s}. Show that E is not context free.
vi. F = {a^{i^2} | i ≥ 1}.
Comment. Any CFL over a 1-character alphabet is a regular language. Give a proof
without using this fact.
vii. H = {a^{2^i} | i ≥ 0}.
Comment. Any CFL over a 1-character alphabet is a regular language. Give a proof
without using this fact.
viii. J = {x1 #x2 # · · · #xl | xh ∈ {a, b}∗ , 1 ≤ h ≤ l, and for some i, j, k, 1 ≤ i < j < k ≤ l,
|xi | = |xj | = |xk |}.
ix. K = {x1 #x2 # · · · #xk | xh ∈ {a, b}∗ , 1 ≤ h ≤ k, and for some i, j, 1 ≤ i < j ≤ k,
xi = xj }.
x. Let L = {ai bj | i is an integer multiple of j}.
xi. Let M = {wxwR | w, x ∈ {a, b}∗ and |w| = |x|}.
xii. Let N be the language consisting of all palindromes over the alphabet Σ = {a, b, c}
having equal numbers of a’s and b’s.
i. Let w ∈ {a, b, c}∗ . Define Sbst(w, a, b) to be the string obtained by replacing all
instances of the character a in w with b, e.g., Sbst(ac, a, b) = bc, Sbst(cc, a, b) = cc,
Sbst(abc, a, b) = bbc, Sbst(acacac, a, b) = bcbcbc.
Let L be a language over the alphabet {a, b, c}. Define T (L) = Sbst(L, a, b) = {x | x =
Sbst(w, a, b) for some w ∈ L}.
ii. Let w ∈ {a, b, c}∗ . Define OneSubst(w, a, b), or OS(w, a, b) for short, to be the set of
strings obtained by replacing one instance of the character a in w with a b, e.g.,
OS(acacac, a, b) = {bcacac, acbcac, acacbc}.
Let L be a language over the alphabet {a, b, c}.
Define T (L) = OS(L, a, b) = {x | x ∈ OS(w, a, b) for some w ∈ L}.
iii. Let w ∈ {a, b, c}∗ . Define Remove-c(w) to be the string obtained by deleting all
instances of the character c from w, e.g., Remove-c(ab) = ab, Remove-c(cc) = λ,
Remove-c(abc) = ab, Remove-c(acacac) = aaa.
Let L be a language over the alphabet {a, b, c}. Define T (L) = Remove-c(L) = {x | x =
Remove-c(w) for some w ∈ L}.
iv. Let w ∈ {a, b, c}∗ . Define Remove-One-c(w) to be the set of strings obtained by
deleting one instance of the character c from w, e.g., Remove-One-c(acacac) = {aacac, acaac, acaca}.
Let L be a language over the alphabet {a, b, c}.
Define T (L) = Remove-One-c(L) = {x | x ∈ Remove-One-c(w) for some w ∈ L}.