CMP3008
Formal Languages
and Automata Theory
Lecture Notes 5
Nonregular Languages, Pumping Lemma and
Context Free Grammars
Sources
https://eecs.wsu.edu/~ananth/CptS317/Lectures/index.htm
"Introduction to automata theory, languages and
computation" by JE Hopcroft, R Motwani and JD Ullman.
"An Introduction to Formal Languages and Automata Theory" by
Peter Linz
Content
• Non-Regular Languages
• Pumping Lemma
• Context-Free Grammars
• Ambiguity in CFG
• Chomsky Normal Form
Not all languages are regular
• So what happens to the languages which are not regular?
• Can we still come up with a language recognizer?
• i.e., something that will accept (or reject) strings that belong (or do not
belong) to the language?
Non-Regular Languages
• Question: What are the limitations of finite automata, i.e. DFAs (or
NFAs)?
• Can we find a DFA for the language B = {0^n 1^m | n ≥ 0, m ≥ 0}?
• What about B = {0^n 1^n | n ≥ 0}?
• The language B = {0^n 1^n | n ≥ 0} appears to be nonregular: because the
number of 0s is unlimited, a machine recognizing it would seemingly have to
keep track of an unlimited number of possibilities, which it cannot do
with any finite number of states.
Non-Regular Languages
• We need a proof to show that a given language is not regular
• Question: Doesn’t the argument already given prove nonregularity
because the number of 0s is unlimited?
• No
• A language that appears to require unbounded memory is not necessarily
nonregular
• Some languages seem to require keeping track of an unlimited number of
possibilities, yet they are actually regular
Non-Regular Languages
• For example, consider two languages over the alphabet Σ = {0,1}:
• C = {w| w has an equal number of 0s and 1s},
• D = {w| w has an equal number of occurrences of 01 and 10 as substrings}
• Can we design a DFA for C and/or D?
• For C, no
• But for D, yes! -> DFA or NFA?
• So, we need a proof!
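A brute-force experiment can build some confidence that D is regular before a DFA is drawn. The sketch below assumes a characterization that is not stated in these notes: a string has equally many 01 and 10 substrings exactly when it is empty or its first and last symbols agree, which a finite automaton can check by remembering only the first symbol and the most recent one. All function names are illustrative.

```python
from itertools import product


def count_sub(w: str, pat: str) -> int:
    """Count (possibly overlapping) occurrences of pat in w."""
    return sum(1 for i in range(len(w) - len(pat) + 1) if w[i:i + len(pat)] == pat)


def in_D(w: str) -> bool:
    """Membership in D straight from the definition."""
    return count_sub(w, "01") == count_sub(w, "10")


def finite_memory_check(w: str) -> bool:
    """Assumed characterization: empty, or first symbol equals last symbol."""
    return w == "" or w[0] == w[-1]


def agree_up_to(max_len: int) -> bool:
    """Compare both tests on every binary string of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            if in_D(w) != finite_memory_check(w):
                return False
    return True


print(agree_up_to(10))
```

Agreement on short strings is only evidence, not a proof; no comparably simple finite-memory test exists for C, which is what the pumping lemma is later used to establish.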
Pumping Lemma
• The pumping lemma states that all regular languages have a
special property.
• If we can show that a language does not have this property, we are
guaranteed that it is not regular.
• The property states that all strings in the language can be “pumped”
if they are at least as long as a certain special value, called the
pumping length.
• That means each such string contains a section that can be repeated
any number of times with the resulting string remaining in the
language
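Formal Definition of Pumping Lemma
• If A is a regular language, then there is a number p (the pumping
length) such that any string s in A of length at least p can be divided
into three pieces, s = xyz, satisfying the following conditions:
1. for each i ≥ 0, xy^i z ∈ A,
2. |y| > 0, and
3. |xy| ≤ p.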
Formal Definition of Pumping Lemma
• When s is divided into xyz, either x or z may be ε, but condition 2 says
that y ≠ ε.
• Observe that without condition 2 the theorem would be trivially true.
• Condition 3 states that the pieces x and y together have length at
most p.
• It is an extra technical condition that we occasionally find useful when
proving certain languages to be nonregular
How to use Pumping Lemma?
• To use the pumping lemma to prove that a language B is not regular,
• First assume that B is regular in order to obtain a contradiction.
• Then use the pumping lemma to guarantee the existence of a
pumping length p such that all strings of length p or greater in B
can be pumped.
• Next, find a string s in B that has length p or greater but that
cannot be pumped.
• Finally, demonstrate that s cannot be pumped by considering all
ways of dividing s into x, y, and z (taking condition 3 of the
pumping lemma into account if convenient) and, for each such
division, finding a value i where xy^i z is not a member of B.
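The last step of this recipe, checking every way of dividing s, can be tried out mechanically for a concrete stand-in value of p. The sketch below is only an experiment under stated assumptions: the helper name `find_pumpable_split`, the bound `max_i`, and the value p = 4 are all illustrative, and the true pumping length is never known in an actual proof. If the function returns None, every division allowed by conditions 2 and 3 fails for some i in the tested range.

```python
from typing import Callable, Optional, Tuple


def find_pumpable_split(
    s: str, p: int, in_lang: Callable[[str], bool], max_i: int = 3
) -> Optional[Tuple[str, str, str]]:
    """Return a split s = xyz with |y| > 0 and |xy| <= p such that
    x y^i z is in the language for every i in 0..max_i, or None if
    every such split fails for some i in that range."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                return (x, y, z)
    return None


def in_B(w: str) -> bool:
    """B = {0^n 1^n | n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n


p = 4  # stand-in value, chosen only for the experiment
print(find_pumpable_split("0" * p + "1" * p, p, in_B))  # None
```

A returned split would only show that the string survived the tested values of i, so this kind of check can support a nonregularity argument but cannot replace it.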
Example 1
• Let B = {0^n 1^n | n ≥ 0}. We use the pumping lemma to prove that B is
not regular. The proof is by contradiction.
• Assume to the contrary that B is regular. Let p be the pumping length
given by the pumping lemma. Choose s to be the string 0^p 1^p.
• Because s is a member of B and s has length more than p, the
pumping lemma guarantees that s can be split into three pieces, s =
xyz, where for any i ≥ 0 the string xy^i z is in B. We consider three cases
to show that this result is impossible.
Example 1 (cont’d)
• The string y consists only of 0s. In this case, the string xyyz has more
0s than 1s and so is not a member of B, violating condition 1 of the
pumping lemma. This case is a contradiction.
• The string y consists only of 1s. This case also gives a contradiction.
• The string y consists of both 0s and 1s. In this case, the string xyyz
may have the same number of 0s and 1s, but they will be out of order
with some 1s before 0s. Hence it is not a member of B, which is a
contradiction.
Example 2
• C = {w | w has an equal number of 0s and 1s}
• Assume C is regular
• Let s be the string 0^p 1^p.
• With s being a member of C and having length more than p, the
pumping lemma guarantees that s can be split into three pieces,
s = xyz, where for any i ≥ 0 the string xy^i z is in C.
• Let’s show that this is not possible!
Example 2
• If we let x and z be the empty string and y be the string 0^p 1^p, then xy^i z
always has an equal number of 0s and 1s and hence is in C. So it
seems that s can be pumped.
• But! Here condition 3 in the pumping lemma is useful.
• It stipulates that when pumping s, it must be divided so that |xy| ≤ p.
• If |xy| ≤ p, then y must consist only of 0s, so xyyz is not in C.
• Therefore, s cannot be pumped. That gives us the desired
contradiction.
Example 2
• Can we show the same for s = (01)^p, which is also a member of C?
• Can we pump it?
• Yes: take x = ε, y = 01, and z = (01)^(p−1). Then xy^i z ∈ C for every value of i,
so this s gives no contradiction. The string s must be chosen carefully.
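The difference between the two choices of s can be seen by enumerating the divisions that condition 3 allows. This is a small experiment rather than a proof: the value p = 5, the bound `max_i`, and the helper names are assumptions made for illustration.

```python
def in_C(w: str) -> bool:
    """C = strings with equally many 0s and 1s."""
    return w.count("0") == w.count("1")


def splits(s: str, p: int):
    """Yield every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]


def survives(x: str, y: str, z: str, max_i: int = 4) -> bool:
    """True if x y^i z stays in C for every i in 0..max_i."""
    return all(in_C(x + y * i + z) for i in range(max_i + 1))


p = 5  # stand-in value for the unknown pumping length
good_choice = "0" * p + "1" * p
poor_choice = "01" * p

# No allowed division of 0^p 1^p survives pumping.
print(any(survives(*t) for t in splits(good_choice, p)))  # False
# (01)^p has a division that survives, so it yields no contradiction.
print(next(t for t in splits(poor_choice, p) if survives(*t)))
```

With s = 0^p 1^p every allowed y lies inside the leading 0s, which is why condition 3 does the work in this example.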
Example 3
• F = {ww | w ∈ {0,1}*}
• Assume that F is regular
• s = 0^p 1^p 0^p 1^p
• For example, 00000111110000011111 (if p is 5)
• Because |xy| ≤ p, y must lie within the first p 0s, and no such y can be
pumped with the resulting string remaining in F.
• s = 0^p 1 0^p 1 is another good choice
• s = 0^p 0^p is not a good choice, because it can be pumped (e.g., with y = 00)
Example 4
• E = {0^i 1^j | i > j}
• Assume that E is regular
• s = 0^(p+1) 1^p
• 000000 11111 (if p is 5)
• By condition 3, y consists only of 0s, say y = 0^k with 1 ≤ k ≤ p. Removing y
(i.e., taking xy^0 z = xz) leaves at most p 0s but still p 1s, so the
resulting string is not in E, and we have a contradiction.
Example 5
• A nonregular unary language:
• D contains all strings of 1s whose length is a perfect square.
• Note the growing gap between successive members of this sequence.
Large members of this sequence cannot be near each other
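The growing-gap observation can be checked numerically. The sketch below assumes the choice s = 1^(p^2) (this choice is not spelled out in the surviving text of these notes) and verifies that pumping y once, which adds between 1 and p symbols, never lands on another perfect square. Function names are illustrative.

```python
from math import isqrt


def is_square(n: int) -> bool:
    """True if n is a perfect square."""
    r = isqrt(n)
    return r * r == n


def pumped_length_can_be_square(p: int) -> bool:
    """For s = 1^(p^2), xyyz has length p^2 + |y| with 1 <= |y| <= p.
    Return True if any such length is a perfect square."""
    return any(is_square(p * p + k) for k in range(1, p + 1))


# Gaps between consecutive squares n^2 and (n+1)^2 are 2n + 1.
gaps = [(n + 1) ** 2 - n ** 2 for n in range(6)]
print(gaps)  # [1, 3, 5, 7, 9, 11]

print(any(pumped_length_can_be_square(p) for p in range(1, 200)))  # False
```

The reason is the inequality p^2 < p^2 + |y| ≤ p^2 + p < (p + 1)^2, so the pumped length falls strictly between two consecutive squares.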
Not all languages are regular
• So what happens to the languages which are not regular?
• Can we still come up with a language recognizer?
• i.e., something that will accept (or reject) strings that belong (or do not
belong) to the language?
Context-Free Languages
• A language class larger than the class of regular languages
• Supports natural, recursive notation called “context-free grammar”
• Applications:
• Parse trees, compilers
• XML
[Figure: the class of regular languages (FA/RE) drawn inside the larger class of context-free languages (PDA/CFG)]
An Example
• A palindrome is a word that reads the same forwards and backwards
• E.g., madam, redivider, malayalam, 010010010
• Let L = { w | w is a binary palindrome}
• Is L regular?
• No.
• Proof:
• Let w = 0^p 1 0^p (where p is the pumping-lemma constant)
• By the pumping lemma, w can be rewritten as xyz, such that xy^i z is also in L (for any i ≥ 0)
• But |xy| ≤ p and y ≠ ε
• ==> y ∈ 0+ (y consists of one or more 0s from the first block)
• ==> xy^i z will NOT be in L for i = 0
• ==> Contradiction
But the language of palindromes…
is a CFL, because it supports recursive substitution (in the form of a
CFG)
• This is because we can construct a “grammar” like this:
Productions:
1. A ==> ε
2. A ==> 0
3. A ==> 1
4. A ==> 0A0
5. A ==> 1A1
Same as: A => 0A0 | 1A1 | 0 | 1 | ε
Here 0 and 1 are terminals, and A is a variable (non-terminal).
How does this grammar work?
How does the CFG for palindromes work?
An input string belongs to the language (i.e., accepted) iff it can be
generated by the CFG
G: A => 0A0 | 1A1 | 0 | 1 | ε
• Example: w=01110
• G can generate w as follows:
1. A => 0A0
2. => 01A10
3. => 01110
Generating a string from a grammar:
1. Pick and choose a sequence of productions that would allow us to
generate the string.
2. At every step, substitute one variable
with one of its productions.
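The generation procedure for the palindrome grammar can be mirrored directly in code. The following is a minimal sketch, assuming the grammar A => 0A0 | 1A1 | 0 | 1 | ε; the function names are illustrative and not part of the notes.

```python
from typing import List, Optional


def generated_by_G(w: str) -> bool:
    """Check membership by mirroring the productions of G."""
    if w in ("", "0", "1"):          # A => epsilon | 0 | 1
        return True
    if w[0] == w[-1] and w[0] in "01":  # A => 0A0 | 1A1
        return generated_by_G(w[1:-1])
    return False


def derivation(w: str) -> Optional[List[str]]:
    """Return the sentential forms of a derivation of w, or None."""
    if not generated_by_G(w):
        return None
    forms = ["A"]
    left, right, rest = "", "", w
    while len(rest) >= 2:
        # Peel one matching pair of outer symbols: A => 0A0 or A => 1A1.
        left, right, rest = left + rest[0], rest[-1] + right, rest[1:-1]
        forms.append(left + "A" + right)
    # Finish with A => 0, A => 1 or A => epsilon.
    forms.append(left + rest + right)
    return forms


print(derivation("01110"))  # ['A', '0A0', '01A10', '01110']
```

Each recursive call corresponds to one substitution step, which is why the recursion depth matches the number of productions used.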
Example
• Example context free grammar G1:
A → 0A1
A→B
B→#
• 3 Substitution rules (productions)
• Variables = {A, B}
• Terminals = {0, 1, #}
• Start variable = A
Derivation
• For example, grammar G1 generates the string 000#111.
A → 0A1
A→B
B →#
• The sequence of substitutions to obtain a string is called a derivation.
A derivation of string 000#111 in grammar G1 is
• A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111.
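A derivation is just a sequence of rule applications, which can be replayed mechanically. The sketch below assumes a numbering of the three rules of G1 (1: A → 0A1, 2: A → B, 3: B → #) that is introduced here purely for illustration, and it always rewrites the leftmost occurrence of the rule's variable.

```python
from typing import Dict, List, Tuple

# Rule numbers are an assumption made for this example.
RULES: Dict[int, Tuple[str, str]] = {
    1: ("A", "0A1"),
    2: ("A", "B"),
    3: ("B", "#"),
}


def derive(start: str, rule_numbers: List[int]) -> List[str]:
    """Apply the numbered rules in order and return every sentential form."""
    forms = [start]
    for r in rule_numbers:
        head, body = RULES[r]
        current = forms[-1]
        pos = current.find(head)
        if pos == -1:
            raise ValueError(f"rule {r} does not apply to {current!r}")
        forms.append(current[:pos] + body + current[pos + len(head):])
    return forms


print(" ⇒ ".join(derive("A", [1, 1, 1, 2, 3])))
# A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
```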
Parse Trees
• Each derivation in a CFG can be represented using a parse tree:
• Each internal node is labeled by a variable in V
• Each leaf is labeled by a terminal symbol (or ε)
• If an internal node labeled A has k children labeled X1, X2, …, Xk from
left to right, then A ==> X1X2…Xk must be a production
[Figure: parse tree for the production A ==> X1..Xi..Xk and all other subsequent productions: a root labeled A with children X1 … Xi … Xk]
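Examples
G: E => E+E | E*E | (E) | F
   F => aF | bF | 0F | 1F | 0 | 1 | a | b
[Figure: parse tree for a + 1. The root E has children E, +, E; each child E has a single child F; the left F yields a and the right F yields 1. Reading the tree bottom-up corresponds to recursive inference; reading it top-down corresponds to derivation.]
G: A => 0A0 | 1A1 | 0 | 1 | ε
[Figure: parse tree for 0110. The root A has children 0, A, 0; the inner A has children 1, A, 1; the innermost A yields ε.]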
Parse Tree
Examples
• Can the following strings be derived from G1 (A → 0A1, A → B, B → #)?
0#1: A ⇒ 0A1 ⇒ 0B1 ⇒ 0#1
0#11: Cannot be derived.
#: A ⇒ B ⇒ #
Language of the grammar.
• All strings generated in this way constitute the language of the
grammar. We write L(G1) for the language of grammar G1.
• Some experimentation with the grammar G1 (A → 0A1, A → B, B → #) shows us
that L(G1) is:
{0^n # 1^n | n ≥ 0}
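The "experimentation" mentioned above can be automated by exploring all sentential forms breadth-first. This is a minimal sketch under the assumption that every variable is a single character; the function name and the length bound are illustrative. Pruning by length is safe for G1 because none of its rules shortens a sentential form.

```python
from collections import deque
from typing import Dict, List, Set

RULES: Dict[str, List[str]] = {"A": ["0A1", "B"], "B": ["#"]}


def language_up_to(max_len: int, start: str = "A") -> Set[str]:
    """All terminal strings of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    words: Set[str] = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        var_pos = next((i for i, c in enumerate(form) if c in RULES), None)
        if var_pos is None:
            words.add(form)  # no variables left: a generated string
            continue
        for body in RULES[form[var_pos]]:
            new = form[:var_pos] + body + form[var_pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


print(sorted(language_up_to(7), key=len))  # ['#', '0#1', '00#11', '000#111']
```

The enumeration agrees with {0^n # 1^n | n ≥ 0} up to the chosen bound, which supports the claim without proving it.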
“|” symbol
For convenience when presenting a context-free grammar, we
abbreviate several rules with the same left-hand variable, such as
A → 0A1 and A → B
into a single line
A → 0A1 | B
using the symbol “|” as an “or”.
Grammar G2
Examples
Strings in L(G2) include:
• a boy sees
• the boy sees a flower
• a girl with a flower likes the boy
Derivation of “a boy sees”
FORMAL DEFINITION OF A CONTEXT-FREE
GRAMMAR
Example
Design a CFG for the following language:
L = {w | w is a string of properly nested parentheses}
(), (()), (()())(), ()()()() are in L
()), (()(), ))(( are not in L
G3 = ({S}, {(, )}, R, S). The set of rules, R, is
S → (S) | SS | ε
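One way to gain confidence in G3 is to compare it against the usual counter-based test for nested parentheses. The sketch below is an illustration with made-up function names: `derives` follows the three rules of R by recursion on substrings (the SS rule only tries nonempty halves, since an empty half adds nothing new), and `balanced` is the independent reference check.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(w: str) -> bool:
    """True if S derives w using S -> (S) | SS | epsilon."""
    if w == "":
        return True                                   # S -> epsilon
    if w[0] == "(" and w[-1] == ")" and derives(w[1:-1]):
        return True                                   # S -> (S)
    # S -> SS, splitting w into two nonempty parts
    return any(derives(w[:k]) and derives(w[k:]) for k in range(1, len(w)))


def balanced(w: str) -> bool:
    """Reference check: depth never goes negative and ends at zero."""
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def agree_up_to(max_len: int) -> bool:
    """Compare both tests on all strings over {(, )} up to max_len."""
    return all(
        derives("".join(t)) == balanced("".join(t))
        for n in range(max_len + 1)
        for t in product("()", repeat=n)
    )


print(derives("(()())()"), derives("(()()"), agree_up_to(8))
```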
Example
• A grammar for L = {0^m 1^n | m ≥ n}
• CFG?
G:
S => 0S1 | A
A => 0A | ε
How would you interpret the string “00000111”
using this grammar?
Examples
DESIGNING CONTEXT-FREE GRAMMARS
As with the design of finite automata, the design of context-free
grammars requires creativity.
But there are some useful techniques.
Technique I: Merging Grammars
Technique II: DFA to CFG
• You can convert any DFA into an equivalent CFG as follows.
• Make a variable R_i for each state q_i of the DFA.
• Add the rule R_i → aR_j to the CFG if δ(q_i, a) = q_j is a transition in the DFA.
• Add the rule R_i → ε if q_i is an accept state of the DFA.
• Make R_0 the start variable of the grammar, where q_0 is the start state of the
machine
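The conversion steps above can be sketched in a few lines. The example assumes a two-state DFA with states named E and O (read here as "even" and "odd" number of 1s seen so far, an interpretation chosen for illustration), start state E and accept state O; state names double as variable names, so the start variable is the start state's name.

```python
from typing import Dict, List, Set, Tuple


def dfa_to_cfg(
    delta: Dict[Tuple[str, str], str], accepting: Set[str]
) -> List[Tuple[str, str]]:
    """Return productions (head, body) for the given DFA.

    One rule q -> a r per transition delta(q, a) = r, plus
    q -> epsilon for every accept state q."""
    rules = [(q, a + r) for (q, a), r in sorted(delta.items())]
    rules += [(q, "ε") for q in sorted(accepting)]
    return rules


# Assumed example DFA over {0, 1}.
delta = {
    ("E", "0"): "E",
    ("E", "1"): "O",
    ("O", "0"): "O",
    ("O", "1"): "E",
}
rules = dfa_to_cfg(delta, accepting={"O"})
for head, body in rules:
    print(f"{head} → {body}")
```

Every rule has at most one variable, and it sits at the right end of the body, which is why a derivation in the resulting grammar traces a single path through the DFA.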
Technique II: DFA to CFG
E → 0E
E → 1O
O → 0O
O → 1E
O→ε
Example Derivation:
E ⇒ 0E ⇒ 00E ⇒ 001O ⇒ 0010O ⇒ 00101E ⇒ 001011O ⇒ 001011
Ambiguity
Ambiguity
Example Derivations
• E -> E + E | E x E | (E) | a
a+a
• E⇒E+E⇒a+E⇒a+a
((a + a) x a)
• E ⇒ (E) ⇒ (E x E) ⇒ ((E) x E) ⇒ ((E + E) x E) ⇒ ((a + E) x E) ⇒ ((a + a) x E) ⇒ ((a + a) x a)
a+axa
• E ⇒ E+E ⇒ a+E ⇒ a+ExE ⇒ a+axE ⇒ a+axa
• E ⇒ ExE ⇒ E+ExE ⇒ a+ExE ⇒ a+axE ⇒ a+axa
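Ambiguity can be detected for a particular string by counting its parse trees, which is the same as counting its leftmost derivations. The sketch below does this for the grammar E -> E + E | E x E | (E) | a by recursion over substrings; the function name is illustrative, and the input is assumed to be written without spaces, with x standing for multiplication as on the slide.

```python
from functools import lru_cache


def count_parse_trees(w: str) -> int:
    """Number of parse trees deriving w from E."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of parse trees deriving the substring w[i:j] from E.
        if j - i == 1:
            return 1 if w[i] == "a" else 0            # E -> a
        total = 0
        if j - i >= 3 and w[i] == "(" and w[j - 1] == ")":
            total += count(i + 1, j - 1)              # E -> (E)
        for k in range(i + 1, j - 1):
            if w[k] in "+x":                          # E -> E+E or E -> ExE
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(w)) if w else 0


print(count_parse_trees("a+a"))        # 1
print(count_parse_trees("((a+a)xa)"))  # 1
print(count_parse_trees("a+axa"))      # 2
```

A count above 1 for some string shows that the grammar is ambiguous; the two trees for a+axa correspond to the two leftmost derivations listed above.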
Ambiguity
the girl touches the boy with the flower
[Figure: panels (a) and (b)]
Leftmost derivation
• A derivation of a string w in a grammar G is a leftmost derivation if at
every step the leftmost remaining variable is the one replaced. The
derivation below is a leftmost derivation.
Ambiguity
Chomsky Normal Form
Theorem
Example
This change guarantees that the start variable
doesn’t occur on the right-hand side of a rule.
Example (cont’d)
• Second, we take care of all ε-rules.
• We remove an ε-rule A → ε, where A is not the start variable.
• Then for each occurrence of an A on the right-hand side of a rule, we add a new rule with that
occurrence deleted.
• In other words, if R → uAv is a rule in which u and v are strings of variables and terminals, we add
rule R → uv.
• We do so for each occurrence of an A, so the rule R → uAvAw causes us to add R → uvAw, R →
uAvw, and R → uvw.
• If we have the rule R → A, we add R → ε unless we had previously removed the rule R → ε.
• We repeat these steps until we eliminate all ε-rules not involving the start variable.
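The core of this step, generating the new right-hand sides for one rule, can be sketched as follows. The example assumes every symbol is a single character, and the helper name `drop_occurrences` is made up for illustration; bookkeeping such as remembering which ε-rules were already removed is left out.

```python
from itertools import combinations
from typing import Set


def drop_occurrences(body: str, var: str) -> Set[str]:
    """All right-hand sides obtained from `body` by deleting one or
    more occurrences of `var`; an empty result is written as ε."""
    positions = [i for i, c in enumerate(body) if c == var]
    results: Set[str] = set()
    for r in range(1, len(positions) + 1):
        for chosen in combinations(positions, r):
            new = "".join(c for i, c in enumerate(body) if i not in chosen)
            results.add(new if new else "ε")
    return results


# R -> uAvAw yields R -> uvAw, R -> uAvw and R -> uvw.
print(sorted(drop_occurrences("uAvAw", "A")))  # ['uAvw', 'uvAw', 'uvw']
# R -> A yields R -> ε.
print(drop_occurrences("A", "A"))  # {'ε'}
```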