E Automata Theory-1 Final
E Automata Theory-1 Final
In the literary sense of the term, grammars denote syntactical rules for conversation in natural
languages. Linguistics have attempted to define grammars since the inception of natural
languages like English, Sanskrit, Mandarin, etc.
The theory of formal languages finds its applicability extensively in the fields of Computer
Science. Noam Chomsky gave a mathematical model of grammar in 1956 which is effective for
writing computer languages.
Grammar
A grammar G can be formally written as a 4-tuple (N, T, S, P) where −
N or VN is a set of variables or non-terminal symbols.
→ β, where α and β are strings on VN ∪ ∑ and least one symbol of α belongs to VN.
P is Production rules for Terminals and Non-terminals. A production rule has the form α
Example
Grammar G1 −
({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})
Here,
S, A, and B are Non-terminal symbols;
Productions, P : S → AB, A → a, B → b
Example
Grammar G2 −
(({S, A}, {a, b}, S,{S → aAb, aA → aaAb, A → ε } )
Here,
S and A are Non-terminal symbols.
a and b are Terminal symbols.
x α y ⇒G x β y
written as −
Example
Let us consider the grammar −
G2 = ({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε } )
L(G)={W|W ∈ ∑*, S ⇒G W}
from that grammar. A language generated by a grammar G is a subset formally defined by
and a ∈ T (Terminal)
The rule S → ε is allowed if S does not appear on the right side of any rule.
Example
X→ε
X → a | aY
Y→b
Type - 2 Grammar
Type-2 grammars generate context-free languages.
where A ∈ N (Non-terminal)
αAβ→αγβ
and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The languages
generated by these grammars are recognized by a linear bounded automaton.
Example
AB → AbBc
A → bcA
B→b
Type - 0 Grammar
Type-0 grammars generate recursively enumerable languages. The productions have no
restrictions. They are any phase structure grammar including all formal grammars.
They generate the languages that are recognized by a Turing machine.
The productions can be in the form of α → β where α is a string of terminals and nonterminals
with at least one non-terminal and α cannot be null. β is a string of terminals and non-
terminals.
Example
S → ACaB
Bc → acB
CB → DB
aD → Db
Regular Expressions
A Regular Expression can be recursively defined as follows −
ε is a Regular Expression indicates the language containing an empty string. (L (ε) = {ε})
φ is a Regular Expression denoting an empty language. (L (φ) = { })
x is a Regular Expression where L = {x}
If X is a Regular Expression denoting the language L(X) and Y is a Regular Expression
(a+b)*abb Set of strings of a’s and b’s ending with the string abb. So L = {abb, aabb,
babb, aaabb, ababb, …………..}
(11)* Set consisting of even number of 1’s including empty string, So L= {ε, 11,
1111, 111111, ……….}
(aa)*(bb)*b Set of strings consisting of even number of a’s followed by odd number of
b’s , so L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …………..}
(aa + ab + ba + bb)* String of a’s and b’s of even length can be obtained by concatenating any
combination of the strings aa, ab, ba and bb including null, so L = {aa, ab,
ba, bb, aaab, aaba, …………..}
Regular Sets
Any set that represents the value of the Regular Expression is called a Regular Set.
Properties of Regular Sets
Property 1. The union of two regular set is regular.
Proof −
Let us take two regular expressions
RE1 = a(aa)* and RE2 = (aa)*
So, L1 = {a, aaa, aaaaa,.....} (Strings of odd length excluding Null)
Hence, proved.
Property 2. The intersection of two regular set is regular.
Proof −
Let us take two regular expressions
RE1 = a(a*) and RE2 = (aa)*
So, L1 = { a,aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding Null)
RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself.
Hence, proved.
Property 3. The complement of a regular set is regular.
Proof −
Let us take a regular expression −
RE = (aa)*
So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length including Null)
Complement of L is all the strings that is not in L.
So, L’ = {a, aaa, aaaaa, .....} (Strings of odd length excluding Null)
RE (L’) = a(aa)* which is a regular expression itself.
Hence, proved.
Property 4. The difference of two regular set is regular.
Proof −
Let us take two regular expressions −
RE1 = a (a*) and RE2 = (aa)*
So, L1 = {a, aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ....}
(Strings of all odd lengths excluding Null)
RE (L1 – L2) = a (aa)* which is a regular expression.
Hence, proved.
Property 5. The reversal of a regular set is regular.
Proof −
We have to prove LR is also regular if L is a regular set.
Let, L = {01, 10, 11, 10}
RE (L) = 01 + 10 + 11 + 10
LR = {10, 01, 11, 01}
RE (LR) = 01 + 10 + 11 + 10 which is regular
Hence, proved.
Property 6. The closure of a regular set is regular.
Proof −
If L = {a, aaa, aaaaa, .......} (Strings of odd length excluding Null)
i.e., RE (L) = a (aa)*
L* = {a, aa, aaa, aaaa , aaaaa,……………} (Strings of all lengths excluding Null)
RE (L*) = a (a)*
Hence, proved.
Property 7. The concatenation of two regular sets is regular.
Proof −
Let RE1 = (0+1)*0 and RE2 = 01(0+1)*
Here, L1 = {0, 00, 10, 000, 010, ......} (Set of strings ending in 0)
and L2 = {01, 010,011,.....} (Set of strings beginning with 01)
Then, L1 L2 = {001,0010,0011,0001,00010,00011,1001,10010,.............}
Set of strings containing 001 as a substring which can be represented by an RE − (0 + 1)*001(0
+ 1)*
Hence, proved.
Identities Related to Regular Expressions
∅* = ε
Given R, P, L, Q as regular expressions, the following identities hold −
ε* = ε
RR* = R*R
R*R* = R*
(R*)* = R*
RR* = R*R
(PQ)*P =P(QP)*
Rij represents the set of labels of edges from qi to qj, if no such edge exists, then Rij = ∅
qn = q1R1n + q2R2n + … + qnRnn
Step 2 − Solve these equations to get the equation for the final state in terms of Rij
Problem
Construct a regular expression corresponding to the automata given below −
Solution −
Here the initial state and final state is q1.
The equations for the three states q1, q2, and q3 are as follows −
q1 = q1a + q3a + ε (ε move is because q1 is the initial state0
q2 = q 1 b + q 2 b + q 3 b
q3 = q 2 a
Now, we will solve these three equations −
q2 = q 1 b + q 2 b + q 3 b
= q1b + q2b + (q2a)b (Substituting value of q3)
= q1b + q2(b + ab)
= q1b (b + ab)* (Applying Arden’s Theorem)
q1 = q 1 a + q 3 a + ε
= q1a + q2aa + ε (Substituting value of q3)
= q1a + q1b(b + ab*)aa + ε (Substituting value of q2)
= q1(a + b(b + ab)*aa) + ε
= ε (a+ b(b + ab)*aa)*
= (a + b(b + ab)*aa)*
Hence, the regular expression is (a + b(b + ab)*aa)*.
Problem
Construct a regular expression corresponding to the automata given below −
Solution −
Here the initial state is q1 and the final state is q2
Now we write down the equations −
q1 = q 1 0 + ε
q2 = q 1 1 + q 2 0
q3 = q 2 1 + q 3 0 + q 3 1
Now, we will solve these three equations −
q1 = ε0* [As, εR = R]
So, q1 = 0*
q2 = 0*1 + q20
So, q2 = 0*1(0)* [By Arden’s theorem]
Hence, the regular expression is 0*10*.
Construction of an FA from an RE
We can use Thompson's Construction to find out a Finite Automaton from a Regular
Expression. We will reduce the regular expression into smallest regular expressions and
converting these to NFA and finally to DFA.
Some basic RA expressions are the following −
Case 1 − For a regular expression ‘a’, we can construct the following FA −
Method
Step 1 Construct an NFA with Null moves from the given regular expression.
Step 2 Remove Null transition from the NFA and convert it into its equivalent DFA.
Problem
Convert the following RA into its equivalent DFA − 1 (0 + 1)* 0
Solution
We will concatenate three expressions "1", "(0 + 1)*" and "0"
Now we will remove the ε transitions. After we remove the ε transitions from the NDFA, we get
the following −
It is an NDFA corresponding to the RE − 1 (0 + 1)* 0. If you want to convert it into a DFA, simply
apply the method of converting NDFA to DFA discussed in Chapter 1.
Finite Automata with Null Moves (NFA-ε)
A Finite Automaton with null moves (FA-ε) does transit not only after giving input from the
alphabet set but also without any input symbol. This transition without input is called a null
move.
An NFA-ε is represented formally by a 5-tuple (Q, ∑, δ, q0, F), consisting of
Q − a finite set of states
q0 − an initial state q0 ∈ Q
F − a set of final state/states of Q (F⊆Q).
Solution
Step 1 −
Here the ε transition is between q1 and q2, so let q1 is X and qf is Y.
Here the outgoing edges from qf is to qf for inputs 0 and 1.
Step 2 −
Now we will Copy all these edges from q 1 without changing the edges from qf and get the
following FA −
Step 3 −
Here q1 is an initial state, so we make qf also an initial state.
So the FA becomes −
Step 4 −
Here qf is a final state, so we make q1 also a final state.
So the FA becomes −
P is a set of rules, P: N → (N ∪ T)*, i.e., the left-hand side of the production rule P does
T is a set of terminals where N ∩ T = NULL.
If a partial derivation tree contains the root S, it is called a sentential form. The above sub-tree
is also in sentential form.
Leftmost and Rightmost Derivation of a String
Leftmost derivation − A leftmost derivation is obtained by applying production to the
leftmost variable in each step.
Rightmost derivation − A rightmost derivation is obtained by applying production to
the rightmost variable in each step.
Example
Let any set of production rules in a CFG be
X → X+X | X*X |X| a
over an alphabet {a}.
The leftmost derivation for the string "a+a*a" may be −
X → X+X → a+X → a + X*X → a+a*X → a+a*a
The stepwise derivation of the above string is shown as below −
The rightmost derivation for the above string "a+a*a" may be −
X → X*X → X*a → X+X*a → X+a*a → a+a*a
The stepwise derivation of the above string is shown as below −
Since there are two parse trees for a single string "a+a*a", the grammar G is ambiguous.
CFL Closure Property
Context-free languages are closed under −
Union
Concatenation
Kleene Star operation
Let L1 and L2 be two context free languages. Then L1 ∪ L2 is also context free.
Union
Example
Let L1 = { anbn , n > 0}. Corresponding grammar G1 will have P: S1 → aAb|ab
W3 = { A, C, E, S } U ∅
W2 = { A, C, E } U { S } from rule S → AC
Any production rule in the form A → B where A, B ∈ Non-terminal is called unit production..
Removal of Unit Productions
Removal Procedure −
S → ASA | aB | b, A → B, B → b | ∈
Remove null production from the following −
Solution −
There are two nullable variables − A and B
At first, we will remove B → ε.
After removing B → ε, the production set becomes −
S→ASA | aB | b | a, A ε B| b | &epsilon, B → b
Now we will remove A → ε.
After removing A → ε, the production set becomes −
S→ASA | aB | b | a | SA | AS | S, A → B| b, B → b
This is the final production set without null transition.
Chomsky Normal Form
A CFG is in Chomsky Normal Form if the Productions are in the following forms −
A→a
A → BC
S→ε
where A, B, and C are non-terminals and a is terminal.
Algorithm to Convert into Chomsky Normal Form −
Step 1 − If the start symbol S occurs on some right side, create a new start symbol S’ and a new
production S’→ S.
Step 2 − Remove Null productions. (Using the Null production removal algorithm discussed
earlier)
Step 3 − Remove unit productions. (Using the Unit production removal algorithm discussed
earlier)
Step 4 − Replace each production A → B1…Bn where n > 2 with A → B1C where C → B2 …Bn.
Repeat this step for all productions having two or more symbols in the right side.
Step 5 − If the right side of any production is in the form A → aB where a is a terminal and A,
B are non-terminal, then the production is replaced by A → XB and X → a. Repeat this step for
every production which is in the form A → aB.
Problem
Convert the following CFG into CNF
S → ASA | aB, A → B | S, B → b | ε
Solution
(1) Since S appears in R.H.S, we add a new state S0 and S0→S is added to the production set and
B → ∈ and A → ∈
(2) Now we will remove the null productions −
S0→S, S→ ASA | aB | a, A → B | S | ∈, B → b
After removing B → ε, the production set becomes −
length ≥ p can be written as w = uvxyz, where vy ≠ ε, |vxy| ≤ p, and for all i ≥ 0, uvixyiz ∈ L.
Applications of Pumping Lemma
Pumping lemma is used to check whether a grammar is context free or not. Let us take an
example and show how it is checked.
Problem
Find out whether the language L = {xnynzn | n ≥ 1} is context free or not.
Solution
Let L is context free. Then, L must satisfy pumping lemma.
At first, choose a number n of the pumping lemma. Then, take z as 0n1n2n.
Break z into uvwxy, where
|vwx| ≤ n and vx ≠ ε.
Hence vwx cannot involve both 0s and 2s, since the last 0 and the first 2 are at least (n+1)
positions apart. There are two cases −
Case 1 − vwx has no 2s. Then vx has only 0s and 1s. Then uwy, which would have to be in L,
has n 2s, but fewer than n 0s or 1s.
Case 2 − vwx has no 0s.
Here contradiction occurs.
Hence, L is not a context-free language.
Pushdown Automata Introduction
Basic Structure of PDA
A pushdown automaton is a way to implement a context-free grammar in a similar way we
design DFA for a regular grammar. A DFA can remember a finite amount of information, but a
PDA can remember an infinite amount of information.
Basically a pushdown automaton is −
"Finite state machine" + "a stack"
A pushdown automaton has three components −
an input tape,
a control unit, and
a stack with infinite size.
The stack head scans the top symbol of the stack.
A stack does two operations −
Push − a new symbol is added at the top.
Pop − the top symbol is read and removed.
A PDA may or may not read an input symbol, but it has to read the top of the stack in every
transition.
A PDA can be formally described as a 7-tuple (Q, ∑, S, δ, q0, I, F) −
Q is the finite number of states
∑ is input alphabet
This means at state q1, if we encounter an input string ‘a’ and top symbol of the stack is ‘b’,
then we pop ‘b’, push ‘c’ on top of the stack and move to state q2.
Terminologies Related to PDA
Instantaneous Description
The instantaneous description (ID) of a PDA is represented by a triplet (q, w, s) where
q is the state
w is unconsumed input
s is the stack contents
Turnstile Notation
The "turnstile" notation is used for connecting pairs of ID's that represent one or many moves
of a PDA. The process of transition is denoted by the turnstile symbol "⊢".
Consider a PDA (Q, ∑, S, δ, q 0, I, F). A transition can be mathematically represented by the
This implies that while taking a transition from state p to state q, the input symbol ‘a’ is
consumed, and the top of the stack ‘T’ is replaced by a new string ‘α’.
Note − If we want zero or more moves of a PDA, we have to use the symbol (⊢*) for it.
Pushdown Automata Acceptance
There are two different ways to define PDA acceptability.
Final State Acceptability
In final state acceptability, a PDA accepts a string when, after reading the entire string, the PDA
is in a final state. From the starting state, we can make moves that end up in a final state with
any stack values. The stack values are irrelevant as long as we end up in a final state.
Example
Construct a PDA that accepts L = {0n 1n | n ≥ 0}
Solution
Step 2 − For every w, x, y, z ∈ Q, add the production rule Xwx → XwyXyx in grammar G.
contains (x, ε), add the production rule Xwx → a Xyzb in grammar G.
⊢(y*z, +SI) ⊢ (*z, y+SI) ⊢ (*z, Y+SI) ⊢ (*z, X+SI) ⊢ (z, *X+SI)
⊢ (ε, z*X+SI) ⊢ (ε, Y*X+SI) ⊢ (ε, X+SI) ⊢ (ε, SI)
Turing Machine Introduction
A Turing Machine is an accepting device which accepts the languages (recursively enumerable
set) generated by type 0 grammars. It was invented in 1936 by Alan Turing.
Definition
A Turing Machine (TM) is a mathematical model which consists of an infinite length tape
divided into cells on which input is given. It consists of a head which reads the input tape. A
state register stores the state of the Turing machine. After reading an input symbol, it is
replaced with another symbol, its internal state is changed, and it moves from one cell to the
right or left. If the TM reaches the final state, the input string is accepted, otherwise rejected.
A TM can be formally described as a 7-tuple (Q, X, ∑, δ, q0, B, F) where −
Q is a finite set of states
X is the tape alphabet
∑ is the input alphabet
δ is a transition function; δ : Q × X → Q × X × {Left_shift, Right_shift}.
q0 is the initial state
B is the blank symbol
F is the set of final states
Comparison with the previous automaton
The following table shows a comparison of how a Turing machine differs from Finite
Automaton and Pushdown Automaton.
Machine Stack Data Structure Deterministic?
It is a two-track tape −
Upper track − It represents the cells to the right of the initial head position.
Lower track − It represents the cells to the left of the initial head position in reverse
order.
The infinite length input string is initially written on the tape in contiguous tape cells.
The machine starts from the initial state q0 and the head scans from the left end marker ‘End’.
In each step, it reads the symbol on the tape under its head. It writes a new symbol on that
tape cell and then it moves the head either into left or right one tape cell. A transition function
determines the actions to be taken.
It has two special states called accept state and reject state. If at any point of time it enters
into the accepted state, the input is accepted and if it enters into the reject state, the input is
rejected by the TM. In some cases, it continues to run infinitely without being accepted or
rejected for some certain input symbols.
Note − Turing machines with semi-infinite tape are equivalent to standard Turing machines.
A linear bounded automaton is a multi-track non-deterministic Turing machine with a tape of
some bounded finite length.
Length = function (Length of the initial input string, constant c)
Here,
Memory information ≤ c × Input information
The computation is restricted to the constant bounded area. The input alphabet contains two
special symbols which serve as left end markers and right end markers which mean the
transitions neither move to the left of the left end marker nor to the right of the right end
marker of the tape.
A linear bounded automaton can be defined as an 8-tuple (Q, X, ∑, q 0, ML, MR, δ, F) where −
Q is a finite set of states
X is the tape alphabet
∑ is the input alphabet
q0 is the initial state
ML is the left end marker
MR is the right end marker where MR ≠ ML
δ is a transition function which maps each pair (state, tape symbol) to (state, tape
symbol, Constant ‘c’) where c can be 0 or +1 or -1
F is the set of final states
A deterministic linear bounded automaton is always context-sensitive and the linear bounded
automaton with empty language is undecidable.
Language Decidability
A language is called Decidable or Recursive if there is a Turing machine which accepts and halts
on every input string w. Every decidable language is Turing-Acceptable.
Example 1
Find out whether the following problem is decidable or not −
Is a number ‘m’ prime?
Solution
Prime numbers = {2, 3, 5, 7, 11, 13, …………..}
Divide the number ‘m’ by all the numbers between ‘2’ and ‘√m’ starting from ‘2’.
If any of these numbers produce a remainder zero, then it goes to the “Rejected state”,
otherwise it goes to the “Accepted state”. So, here the answer could be made by ‘Yes’ or ‘No’.
Hence, it is a decidable problem.
Solution
Take the DFA that accepts L and check if w is accepted
Note −
If a language L is decidable, then its complement L' is also decidable
If a language is decidable, then there is an enumerator for it.
Undecidable Languages
For an undecidable language, there is no Turing Machine which accepts the language and
makes a decision for every input string w (TM can make decision for some input string though).
A decision problem P is called “undecidable” if the language L of all yes instances to P is not
decidable. Undecidable languages are not recursive languages, but sometimes, they may be
recursively enumerable languages.
Example
The halting problem of Turing machine
The mortality problem
The mortal matrix problem
The Post correspondence problem, etc.
Turing Machine Halting Problem
Input − A Turing machine and an input string w.
Problem − Does the Turing machine finish computing of the string w in a finite number of
steps? The answer must be either yes or no.
Proof − At first, we will assume that such a Turing machine exists to solve this problem and
then we will show it is contradicting itself. We will call this Turing machine as a Halting
machine that produces a ‘yes’ or ‘no’ in a finite amount of time. If the halting machine finishes
in a finite amount of time, the output comes as ‘yes’, otherwise as ‘no’. The following is the
block diagram of a Halting machine −
Since, P is non-trivial, at least one language satisfies P, i.e., L(M0) ∈ P , ∋ Turing Machine M0.
Let, w be an input in a particular instant and N is a Turing Machine which follows −
On input x
Run M on w
If M does not accept (or doesn't halt), then do not accept x (or do not halt)
If M accepts w then run M0 on x. If M0 accepts x, then accept x.
If M accepts w and N accepts the same language as M0, Then L(M) = L(M0) ∈ p
A function that maps an instance ATM = {<M,w>| M accepts input w} to a N such that
M Abb aa aaa
N Bba aaa aa
Here,
x2x1x3 = ‘aaabbaaa’
and y2y1y3 = ‘aaabbaaa’
We can see that
x2x1x3 = y2y1y3
Hence, the solution is i = 2, j = 1, and k = 3.
Example 2
Find whether the lists M = (ab, bab, bbaaa) and N = (a, ba, bab) have a Post Correspondence
Solution?
Solution
x1 x2 x3
M ab bab bbaaa
N a ba bab
In this case, there is no solution because −
| x2x1x3 | ≠ | y2y1y3 | (Lengths are not same)
Hence, it can be said that this Post Correspondence Problem is undecidable.