
Automata Theory E-Content Document

This document provides an overview of theory of computation concepts including alphabets, languages, strings, grammars and automata. It defines alphabets as finite sets of symbols, languages as sets of strings over an alphabet, and strings as finite sequences of symbols. Operations on strings like concatenation and reversal are described. Formal languages and language operations including union, intersection and concatenation are introduced. The document also provides definitions for Kleene star and regular expressions. Finally, it gives a high-level description of automata as abstract computing devices that read input, have a finite state control unit, and can accept or produce output.


Theory of Computation

Prepared by
Sasmita Kumari Nayak
Assistant Professor, CSE Department
Centurion Institute of Technology
Centurion University of Technology and Management

Module 1
Chapter 1
1.1 Alphabet, languages and grammars
Languages:
A general definition of language must cover a variety of distinct categories: natural languages,
programming languages, mathematical languages, etc. The notion of natural languages like
English, Hindi, etc. is familiar to us. Informally, language can be defined as a system suitable for
expression of certain ideas, facts, or concepts, which includes a set of symbols and rules to
manipulate these. The languages we consider for our discussion are abstractions of natural
languages. That is, our focus here is on formal languages that need precise and formal
definitions. Programming languages belong to this category. We start with some basic concepts
and definitions required in this regard.

1.2 Symbols:
Symbols are indivisible objects or entities that cannot be defined further. That is, symbols are the
atoms of the world of languages. A symbol is any single object such as a, 0, 1, #, begin, or do.
Usually, only characters from a typical keyboard are used as symbols.

1.3 Alphabets:
An alphabet is a finite, nonempty set of symbols. The alphabet of a language is normally denoted
by Σ. When more than one alphabet is considered in a discussion, subscripts may be used
(e.g. Σ1, Σ2, etc.) or sometimes another symbol such as Γ may be introduced.
Example: Σ = { 0, 1 } is the binary alphabet.

1.4 Strings or Words over Alphabet:


A string or word over an alphabet Σ is a finite sequence of concatenated symbols of Σ.
Example: 0110, 11, 001 are three strings over the binary alphabet { 0, 1 } .
aab, abcb, b, cc are four strings over the alphabet { a, b, c }.
It is not the case that a string over some alphabet should contain all the symbols from the
alphabet. For example, the string cc over the alphabet { a, b, c } does not contain the symbols a

and b. Hence, it is true that a string over an alphabet is also a string over any superset of that
alphabet.
Convention: We will use lower-case letters towards the beginning of the English alphabet to
denote symbols of an alphabet and lower-case letters towards the end to denote strings over an
alphabet. That is,

a, b, c, ... are symbols and

u, v, w, x, y, z are strings.

1.5 Some String Operations:


The operations on strings are concatenation, length of a string, powers of strings, powers of
alphabets and reversal.

1.6 Concatenation:
Let x = a1 a2 ... an and y = b1 b2 ... bm be two strings. The concatenation of x and y, denoted
by xy, is the string a1 a2 ... an b1 b2 ... bm. That is, the concatenation
of x and y denoted by xy is the string that has a copy of x followed by a copy of y without any
intervening space between them.
Example: Concatenation of the strings 0110 and 11 is 011011 and concatenation of the strings
good and boy is goodboy.
Note that for any string w, we = ew = w, where e is the empty string. It is also obvious that if | x | = n and | y | = m,
then | xy | = n + m.
u is a prefix of v if v = ux for some string x.
u is a suffix of v if v = xu for some string x.
u is a substring of v if v = xuy for some strings x and y.

1.7 Length of a string:


The number of symbols in a string w is called its length, denoted by |w|.
Example: | 011 | = 3, | 11 | = 2, | b | = 1
It is convenient to introduce a notation e for the empty string, which contains no symbols at all.
The length of the empty string e is zero, i.e., | e | = 0.
Example: Consider the string 011 over the binary alphabet. All the prefixes, suffixes and
substrings of this string are listed below.
Prefixes: e, 0, 01, 011.
Suffixes: e, 1, 11, 011.
Substrings: e, 0, 1, 01, 11, 011.
Note that x is a prefix (suffix or substring) to x, for any string x and e is a prefix (suffix or
substring) to any string.
A string x is a proper prefix (suffix) of a string y if x is a prefix (suffix) of y and x ≠ y. In the above
example, all prefixes except 011 are proper prefixes.
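The string operations above can be sketched in Python. The helper names prefixes, suffixes and substrings are our own, not standard library functions:

```python
def prefixes(w):
    """All prefixes of w, including the empty string e and w itself."""
    return {w[:i] for i in range(len(w) + 1)}

def suffixes(w):
    """All suffixes of w, including e and w itself."""
    return {w[i:] for i in range(len(w) + 1)}

def substrings(w):
    """All substrings of w (every suffix of every prefix)."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

w = "011"
print(sorted(prefixes(w)))    # ['', '0', '01', '011']
print(sorted(suffixes(w)))    # ['', '011', '1', '11']
print(sorted(substrings(w)))  # ['', '0', '01', '011', '1', '11']
print("0110" + "11")          # concatenation of 0110 and 11: '011011'
print(len("011"))             # length |011| = 3
```

Here the empty string '' plays the role of e, so it appears in every prefix, suffix and substring set.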

1.8 Powers of Strings:


For any string x and integer n ≥ 0, we use x^n to denote the string formed by sequentially
concatenating n copies of x. We can also give an inductive definition of x^n as follows:

x^n = e, if n = 0; otherwise x^n = x^(n-1) x.

Example: If x = 011, then x^3 = 011011011, x^1 = 011 and x^0 = e.
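A minimal sketch of the inductive definition of x^n in Python (the function name power is our own):

```python
def power(x, n):
    """x^n: n copies of x concatenated; x^0 is the empty string e."""
    # Mirrors the inductive definition: x^0 = e, x^n = x^(n-1) x.
    return "" if n == 0 else power(x, n - 1) + x

print(power("011", 3))  # 011011011
print(power("011", 1))  # 011
print(repr(power("011", 0)))  # '' (the empty string e)
```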

2.1 Powers of Alphabets:


We write Σ^k (for some integer k) to denote the set of strings of length k with symbols from Σ.
In other words,
Σ^k = { w | w is a string over Σ and | w | = k }. Hence, for any alphabet, Σ^0 denotes the set of all
strings of length zero. That is, Σ^0 = { e }. For the binary alphabet { 0, 1 } we have, for example,
Σ^0 = { e }, Σ^1 = { 0, 1 }, Σ^2 = { 00, 01, 10, 11 }, and so on.

2.2 The set Σ*
The set of all strings over an alphabet Σ is denoted by Σ*. That is,

Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ...

The set Σ* contains all the strings that can be generated by iteratively concatenating symbols
from Σ any number of times.
Example: If Σ = { a, b }, then Σ* = { e, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, ... }.
Please note that if Σ = ∅, then Σ* = { e }, that is, ∅* = { e }. It may look odd that one can proceed from
the empty set to a non-empty set by iterated concatenation. But there is a reason for this and we
accept this convention.
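A small Python sketch that enumerates Σ^k and a finite slice of Σ*. Since Σ* is infinite, we bound the string length; the function names are our own:

```python
from itertools import product

def sigma_k(alphabet, k):
    """Sigma^k: all strings of length exactly k over the alphabet."""
    return {"".join(p) for p in product(alphabet, repeat=k)}

def sigma_star(alphabet, max_len):
    """Finite slice of Sigma^*: all strings of length <= max_len."""
    return set().union(*(sigma_k(alphabet, k) for k in range(max_len + 1)))

print(sorted(sigma_k("01", 0)))     # [''] -- Sigma^0 = {e}
print(sorted(sigma_k("01", 2)))     # ['00', '01', '10', '11']
print(sorted(sigma_star("ab", 2)))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```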

2.3 The set Σ+
The set of all nonempty strings over an alphabet Σ is denoted by Σ+. That is,

Σ+ = Σ* - { e }.

Note that Σ* is infinite. It contains no infinite strings, but strings of arbitrary lengths.

Reversal: For any string w = a1 a2 ... an, the reversal of the string is w^R = an ... a2 a1.

2.4 Languages:
A language over an alphabet is a set of strings over that alphabet. Therefore, a language L is any
subset of Σ*. That is, any L ⊆ Σ* is a language.
Example:
1. ∅ is the empty language.
2. Σ* is a language for any alphabet Σ.

2.5 Cont
3. { e } is a language for any alphabet Σ. Note that ∅ ≠ { e }, because the language ∅ does not
contain any string but { e } contains one string of length zero.
4. The set of all strings over { 0, 1 } containing an equal number of 0's and 1's.
5. The set of all strings over { a, b, c } that start with a.
Convention: Capital letters A, B, C, L, etc., with or without subscripts, are normally used to
denote languages.

2.6 Set operations on languages:


Since languages are sets of strings, we can apply set operations to languages. Here are some
simple examples (though there is nothing new in it).

Union: A string x ∈ A ∪ B iff x ∈ A or x ∈ B.

Example: { 0, 11, 01, 011 } ∪ { 1, 01, 110 } = { 0, 1, 11, 01, 011, 110 }

Intersection: A string x ∈ A ∩ B iff x ∈ A and x ∈ B.

Example: { 0, 11, 01, 011 } ∩ { 1, 01, 110 } = { 01 }

2.7 Complement :
Usually, Σ* is the universe that a complement is taken with respect to. Thus for a language L, the
complement is L' = { x ∈ Σ* | x ∉ L }.

Example: Let L = { x | |x| is even }. Then its complement is the language { x ∈ Σ* | |x| is odd }.
Similarly we can define other usual set operations on languages like relative complement,
symmetric difference, etc.

2.8 Reversal of a language:


The reversal of a language L, denoted as L^R, is defined as L^R = { w^R | w ∈ L }.

Example:
1. Let L = { 0, 11, 01, 011 }. Then L^R = { 0, 11, 10, 110 }.
2. Let L = { 0^n 1^n | n is an integer }. Then L^R = { 1^n 0^n | n is an integer }.

3.1 Language concatenation:


The concatenation of languages L1 and L2 is defined as

L1 L2 = { xy | x ∈ L1 and y ∈ L2 }.

Example: { a, ab }{ b, ba } = { ab, aba, abb, abba }.
Note that,
1. L{ e } = { e }L = L
2. L∅ = ∅L = ∅
3. L1 L2 ≠ L2 L1 in general.
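Representing languages as Python sets, language concatenation can be checked directly. The helper name concat is our own:

```python
def concat(L1, L2):
    """Concatenation L1 L2 = { xy | x in L1 and y in L2 }."""
    return {x + y for x in L1 for y in L2}

print(sorted(concat({"a", "ab"}, {"b", "ba"})))     # ['ab', 'aba', 'abb', 'abba']
print(concat({"a", "b"}, {""}))                     # L{e} = L
print(concat({"a", "b"}, set()))                    # L(empty language) = empty language
print(concat({"a"}, {"b"}) == concat({"b"}, {"a"})) # False: L1 L2 != L2 L1 in general
```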

3.2 Iterated concatenation of languages:


Since we can concatenate two languages, we can also repeat this to concatenate any number of
languages, or concatenate a language with itself any number of times. The
operation L^n denotes the concatenation of L with itself n times. This is defined formally as
follows:

L^0 = { e }; L^n = L^(n-1) L for n > 0.

3.3 Example:
Let L = { a, ab }. Then according to the definition, we have
L^0 = { e }, L^1 = { a, ab }, L^2 = { aa, aab, aba, abab }, and so on.

3.4 Kleene's Star operation:


The Kleene star operation on a language L, denoted as L*, is defined as follows:
L* = (Union n in N) L^n
   = L^0 ∪ L^1 ∪ L^2 ∪ ...
   = { x | x is the concatenation of zero or more strings from L }
Thus L* is the set of all strings derivable by any number of concatenations of strings in L. It is
also useful to define
L+ = (Union n in N and n > 0) L^n
   = L^1 ∪ L^2 ∪ L^3 ∪ ...,
i.e., all strings derivable by one or more concatenations of strings in L.

3.5 Example:
Let L = { a, ab }. Then we have,
L* = L^0 ∪ L^1 ∪ L^2 ∪ ...
   = { e } ∪ { a, ab } ∪ { aa, aab, aba, abab } ∪ ...
L+ = L^1 ∪ L^2 ∪ ...
   = { a, ab } ∪ { aa, aab, aba, abab } ∪ ...
Note: e is in L*, for every language L, including ∅.
The previously introduced definition of Σ* is an instance of Kleene star.
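A bounded sketch of L^n and L* in Python. L* is infinite, so we only build the union of L^n up to a chosen n; the helper names are our own:

```python
def concat(L1, L2):
    """Concatenation of two languages represented as sets of strings."""
    return {x + y for x in L1 for y in L2}

def lang_power(L, n):
    """L^n: L concatenated with itself n times; L^0 = {e}."""
    return {""} if n == 0 else concat(lang_power(L, n - 1), L)

def kleene_star(L, max_n):
    """Finite slice of L^*: the union of L^n for 0 <= n <= max_n."""
    return set().union(*(lang_power(L, n) for n in range(max_n + 1)))

L = {"a", "ab"}
print(sorted(lang_power(L, 2)))     # ['aa', 'aab', 'aba', 'abab']
print("" in kleene_star(set(), 3))  # True: e is in L* even for L = the empty language
```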

3.6 What is Automata?


An automaton is an abstract computing device (or machine). There are different varieties of such
abstract machines (also called models of computation) which can be defined mathematically.
Some of them are as powerful in principle as today's real computers, while the simpler ones are
less powerful. (Some models are considered even more powerful than any real computers as they
have infinite memory and are not subject to physical constraints on memory unlike in real
computers.) Studying the simpler machines is still worthwhile, as it makes it easier to introduce
some of the formalism used in the theory.

3.7 Features of Automaton:

Every automaton has some essential features found in real computers. It has a

mechanism for reading input. The input is assumed to be a sequence of symbols over a
given alphabet and is placed on an input tape (or written in an input file). The simpler
automata can only read the input one symbol at a time from left to right, and cannot change it.
More powerful versions can both read (from left to right or right to left) and change the input.
The automaton can produce output of some form. If the output in response to an input
string is binary (say, accept or reject), then it is called an accepter.

3.8 Cont

If it produces an output sequence in response to an input sequence, then it is called a

transducer (or automaton with output).
The automaton may have a temporary storage, consisting of an unlimited number of cells,
each capable of holding a symbol from an alphabet (which may be different from the
input alphabet). The automaton can both read and change the contents of the storage cells
in the temporary storage. The accessing capability of this storage varies depending on the
type of the storage.

The most important feature of the automaton is its control unit, which can be in any one
of a finite number of internal states at any point. It can change state in some defined
manner determined by a transition function.

Figure 1: The figure above shows a diagrammatic representation of a generic automaton.

4.1 Operation of Automata:


Operation of the automaton is defined as follows.
At any point of time the automaton is in some internal state and is reading a particular
symbol from the input tape using the mechanism for reading input. In the next time
step the automaton then moves to some other internal state (or remains in the same state) as
defined by the transition function. The transition function is based on the current state, the
input symbol read, and the content of the temporary storage. At the same time the content
of the storage may be changed and the input read may be modified. The automaton may
also produce some output during this transition. The internal state, input and the content
of storage at any point define the configuration of the automaton at that point. The
transition from one configuration to the next (as defined by the transition function) is
called a move. A finite state machine or finite automaton is the simplest type of abstract
machine we consider. Any system that is at any point of time in one of a finite number of
internal states, and moves among these states in a defined manner in response to some
input, can be modeled by a finite automaton. It does not have any temporary storage and
is hence a restricted model of computation.

Chapter 2
Finite Automata
4.2 Finite Automata
An automaton (plural: automata) is defined as a system where information is transmitted and
used for performing some functions without the direct participation of a human.

4.3 Description of Automata


A finite automaton is a mathematical model of a system with discrete inputs and outputs. The
system can be in any one of a finite number of internal configurations or states. The state of the system
summarizes the information concerning past inputs that is needed to determine the behavior of the
system on subsequent inputs. Transitions are changes of state that can occur spontaneously or in
response to inputs. Though transitions usually take time, we assume that state
transitions are instantaneous (which is an abstraction).

4.4 Cont
Some examples of state transition systems are: digital systems, vending machines, etc. A system
containing only a finite number of states and transitions among them is called a finite-state
transition system.
Finite-state transition systems can be modeled abstractly by a mathematical model called a finite
automaton.

4.5 Deterministic Finite (-state) Automata


Informally, a DFA (Deterministic Finite State Automaton) is a simple machine that reads an
input string, one symbol at a time and then, after the input has been completely read, decides
whether to accept or reject the input. As the symbols are read from the tape, the automaton can
change its state, to reflect how it reacts to what it has seen so far.

4.6 Formal Definition


A Deterministic Finite State Automaton (DFA) is a 5-tuple M = (Q, Σ, δ, q0, F), where
Q is a finite set of states,

Σ is a finite set of input symbols, or alphabet,

δ is the transition function (δ : Q × Σ → Q),

q0 ∈ Q is the start state, and

F ⊆ Q is the set of accept or final states.

4.7 Acceptance of Strings:


A DFA accepts a string w = a1 a2 ... an if there is a sequence of states r0, r1, ..., rn in Q such
that
1. r0 = q0 is the start state,

2. δ(ri, a(i+1)) = r(i+1) for all 0 ≤ i < n, and

3. rn ∈ F.

4.8 Language Accepted or Recognized by a DFA:


The language accepted or recognized by a DFA M is the set of all strings accepted by M, and is
denoted by L(M), i.e. L(M) = { w ∈ Σ* | M accepts w }.
The notion of acceptance can also be made more precise by extending the transition function δ.

5.1 Extended transition function:


Extend δ (which is a function on symbols, i.e. δ : Q × Σ → Q) to a function on strings,
δ̂ : Q × Σ* → Q.

That is, δ̂(q, w) is the state the automaton reaches when it starts from the state q and finishes
processing the string w.

5.2 Cont
Formally, we can give an inductive definition as follows:
δ̂(q, e) = q; δ̂(q, wa) = δ(δ̂(q, w), a).
The language of the DFA M is the set of strings that can take the start state to one of the
accepting states, i.e.
L(M) = { w ∈ Σ* | M accepts w }
     = { w ∈ Σ* | δ̂(q0, w) ∈ F }.

5.3 Example:
M = (Q, Σ, δ, q0, F), where Q = { q0, q1 }, Σ = { 0, 1 }, q0 is the start state, F = { q1 }, and δ is
given by δ(q0, 0) = q0, δ(q0, 1) = q1, δ(q1, 0) = q1, δ(q1, 1) = q1.

5.4 Cont
This is a formal description of a DFA, but it is hard to comprehend. For example, the language of
this DFA is the set of strings over { 0, 1 } having at least one 1.
We can describe the same DFA by a transition table or a state transition diagram as follows:

5.5 Cont
State transition diagram :

Transition Table :

It is easy to comprehend the transition diagram.


Explanation: We cannot reach the final state q1 without a 1 in the input string. There can be any
number of 0's at the beginning (the self-loop at q0 on label 0 indicates this). Similarly, there can
be any number of 0's and 1's in any order at the end of the string.
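This DFA can be sketched in Python. The transition table below is our reconstruction of the machine described above (states q0 and q1, with q1 accepting), and delta_hat mirrors the extended transition function, applied one symbol at a time:

```python
delta = {
    ("q0", "0"): "q0",  # self-loop: leading 0's keep us in the start state
    ("q0", "1"): "q1",  # the first 1 moves us to the accepting state
    ("q1", "0"): "q1",  # once a 1 has been seen, any symbol keeps us accepting
    ("q1", "1"): "q1",
}
start, accept = "q0", {"q1"}

def delta_hat(q, w):
    """Extended transition function: the state reached from q after reading w."""
    for a in w:
        q = delta[(q, a)]
    return q

def accepts(w):
    return delta_hat(start, w) in accept

print(accepts("000"))   # False: no 1 anywhere
print(accepts("0010"))  # True: contains at least one 1
```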

5.6 Transition table:


It is basically a tabular representation of the transition function that takes two arguments (a state
and a symbol) and returns a value (the next state).
Rows correspond to states,
Columns correspond to input symbols,

5.7 Cont

Entries correspond to next states


The start state is marked with an arrow.
The accept states are marked with either a star (*) or a double circle.


6.1 (State) Transition diagram:


A state transition diagram or simply a transition diagram is a directed graph which can be
constructed as follows:
1. For each state in Q there is a node.
2. There is a directed edge from node q to node p labeled a iff δ(q, a) = p. (If there are
several input symbols that cause a transition, the edge is labeled by the list of these
symbols.)

6.2 Cont
3. There is an arrow with no source into the start state.
4. Accepting states are indicated by double circle.

6.3 Cont
Here is an informal description of how a DFA operates. An input to a DFA can be any
string w ∈ Σ*. Put a pointer on the start state q0. Read the input string w from left to right, one
symbol at a time, moving the pointer according to the transition function δ. If the next symbol
of w is a and the pointer is on state p, move the pointer to δ(p, a).

6.4 Cont
When the end of the input string w is encountered, the pointer is on some state, r. The string is
said to be accepted by the DFA if r ∈ F and rejected if r ∉ F. Note that there is no formal
mechanism for moving the pointer.
A language L ⊆ Σ* is said to be regular if L = L(M) for some DFA M.

6.5 Example:
Design a DFA which accepts the language L1 = { x ∈ { a, b }* | x ends with aa }.

Figure 2: An FA accepting the strings ending with aa.

6.6 Acceptability of a string by a finite automata:


A string w is accepted by a finite automaton M = (Q, Σ, δ, q0, F) if δ̂(q0, w) = p for some p in F;
that is, the string is accepted when processing it ends in a final state.

6.7 Property:
For all states q and strings x and y:
δ̂(q, xy) = δ̂(δ̂(q, x), y).

6.8 Example:
Let us consider a transition table as given below. Now let us check whether the input string
1010 is accepted by this FA or not.

6.9 Cont
By using the above property, δ̂(q0, 1010) = δ̂(q1, 010) = δ̂(q1, 10) = δ̂(q1, 0) = q1.
Here q1 is a final state, so the given string is accepted by the finite automaton.

7.1 Nondeterministic Finite Automata (NFA)


An NFA is defined in the same way as the DFA but with the following two exceptions:
multiple next states, and
ε-transitions.
Multiple Next States:
In contrast to a DFA, the next state is not necessarily uniquely determined by the current
state and input symbol in the case of an NFA. (Recall that, in a DFA there is exactly one start
state and exactly one transition out of every state for each symbol in Σ.)

This means that - in a state q and with input symbol a - there could be one, more than one,
or zero next states to go to, i.e. the value of δ(q, a) is a subset of Q. Thus

δ(q, a) = { p1, p2, ..., pk },

which means that any one of p1, p2, ..., pk could be the next state.

The zero next state case is a special one giving δ(q, a) = ∅, which means that there is no
next state on this input symbol when the automaton is in state q. In such a case, we may think
that the automaton "hangs" and the input will be rejected.

7.2 ε-Transitions:

In an ε-transition, the tape head doesn't do anything - it does not read and it does not move.
However, the state of the automaton can be changed - that is, it can go to zero, one or more
states. This is written formally as δ(q, ε) = { p1, p2, ..., pk }, implying that the next state
could be any one of p1, p2, ..., pk, without consuming the next input symbol.

7.3 Formal definition of NFA:


Formally, an NFA is a 5-tuple M = (Q, Σ, δ, q0, F), where Q, Σ, q0 and F bear the same
meaning as for a DFA, but δ, the transition function, is redefined as follows:

δ : Q × Σ → P(Q)

where P(Q) is the power set of Q, i.e. the set of all subsets of Q.
Acceptance by an NFA:
A string is accepted if some complete path for it ends in a final state; otherwise it is not accepted.
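A minimal NFA simulator in Python, tracking the set of possible current states; a missing table entry stands for the empty set (the "hangs" case). The example table is a hypothetical encoding of an NFA for strings over { 0, 1 } with a 1 in the third position from the end, with state names of our own choosing:

```python
def nfa_accepts(delta, start, accept, w):
    """Accept iff some path through the NFA on input w ends in a final state."""
    current = {start}
    for a in w:
        # union of next-state sets over all states we could currently be in
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & accept)

delta = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},  # nondeterministic guess: this 1 is third-last
    ("q1", "0"): {"q2"}, ("q1", "1"): {"q2"},
    ("q2", "0"): {"q3"}, ("q2", "1"): {"q3"},
}
print(nfa_accepts(delta, "q0", {"q3"}, "0100"))  # True: 1 is third from the end
print(nfa_accepts(delta, "q0", {"q3"}, "0010"))  # False
```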

7.4 Example:
Obtain an NFA for a language consisting of all strings over {0, 1} containing a 1 in the third
position from the end.
Solution:
q1 is the initial state and q4 is the final state.

Please note that this is an NFA, with δ(q2, 0) = { q3 } and δ(q2, 1) = { q3 }.

7.5 Example:
Sketch the NFA state diagram for M = ({ q0, q1, q2, q3 }, { 0, 1 }, δ, q0, { q3 }) with the state table as
given below.

7.6 Solution:
The NFA states are q0, q1, q2 and q3.

The NFA is as shown below.

7.7 Equivalence of NFA and DFA


Theorem: Equivalence of DFA and NFA
Let L be a language accepted by a nondeterministic finite automaton. Then there exists a
deterministic finite automaton that accepts L.
Proof:
Let M = (Q, Σ, δ, q0, F) be an NFA accepting L.
We define a DFA M' = (Q', Σ, δ', q0', F'). The states of M' are all the subsets of the states
of M. That is, Q' = 2^Q.
M' will keep track in its states of all the states M could be in at any given time. F' is the set of all
states in Q' containing a final state of M.

7.8 Cont
An element of Q' will be denoted by [q1, q2, q3, ..., qn], where q1, q2, q3, ..., qn are in Q. Notice that
[q1, q2, q3, ..., qn] is a single state of the DFA corresponding to a set of states of the NFA.
Note that q0' = [q0].
We define
δ'([q1, q2, q3, ..., qn], a) = [p1, p2, p3, ..., pj] if and only if δ({q1, q2, q3, ..., qn}, a) = {p1, p2, p3, ...,
pj}.

8.1 Cont
That is, δ' applied to an element [q1, q2, q3, ..., qn] of Q' is computed by applying δ to each
state of Q represented by [q1, q2, q3, ..., qn].
On applying δ to each of q1, q2, q3, ..., qn and taking the union, we get some new set of states, p1,
p2, p3, ..., pj. This new set of states has a representative, [p1, p2, p3, ..., pj], in Q', and that element is the
value of δ'([q1, q2, q3, ..., qn], a).
By induction on the length of the input string x, one shows that δ̂'(q0', x) = [q1, q2, q3, ..., qn] if and
only if δ̂(q0, x) = { q1, q2, q3, ..., qn }.

8.2 Cont
The result is trivial for |x| = 0, since q0' = [q0] and x must be e (the empty string).
This proves that for every NFA there exists an equivalent DFA.
Procedure: For every NFA there exists a DFA which simulates its behavior, i.e. if L is a set
accepted by an NFA, then there exists a DFA which also accepts L.

8.3 Cont
Let M be an NFA, M = (Q, Σ, δ, q0, F).
Then the corresponding DFA is
M' = (Q', Σ, δ', q0', F')
where Q' = 2^Q,
i.e. if Q = { q0, q1 }
then Q' = { ∅, [q0], [q1], [q0, q1] },
Σ = the input set,
q0' = [q0] (the initial state),
F' = the set of states of Q' containing some qi ∈ F, and
δ' = defined only for those states that are reachable from the initial state.

8.4 Example:
For the following NFA find equivalent DFA

8.5 Solution:
The state table can be represented by the state diagram as given below

First we make the starting state of the NFA the starting state of the DFA. Apply 0 and 1 as inputs
to state q0 and record the resulting state sets in the 2nd and 3rd columns. Next, take the newly
generated states from the 2nd and 3rd columns, place them in the first column, and again apply
0 and 1 as inputs. Repeat the process until no new state appears.

8.6 Cont

Then draw the state diagram by using the above transition table of DFA.
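The procedure above (start from the NFA's start state, apply each input, keep any newly generated subsets, repeat until no new state appears) can be sketched in Python. The example NFA at the end is our own, not the one in the figure:

```python
def subset_construction(nfa_delta, start, accept, alphabet):
    """Build the reachable part of the equivalent DFA via the subset construction.

    DFA states are frozensets of NFA states; only subsets reachable from the
    start state are ever generated.
    """
    start_set = frozenset({start})
    dfa_delta, seen, todo = {}, {start_set}, [start_set]
    while todo:
        S = todo.pop()
        for a in alphabet:
            # union of the NFA next states over every state in the subset S
            T = frozenset(set().union(*(nfa_delta.get((q, a), set()) for q in S)))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accept = {S for S in seen if S & accept}
    return dfa_delta, start_set, dfa_accept

# Hypothetical NFA (accepts strings over {0,1} ending in 01):
nfa = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
delta, start, accept = subset_construction(nfa, "q0", {"q2"}, "01")
print(len({S for (S, _) in delta}))  # number of reachable DFA states
```

Note that only 3 of the 2^3 = 8 possible subsets are generated here, which is exactly the saving the repeat-until-no-new-state process provides.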

8.7 Example:
Determine a deterministic finite state automaton from the given NFA
M = ({ q0, q1 }, { a, b }, δ, q0, { q1 }) with the state table given below.

Solution:

Let M' = (Q', Σ, δ', q0', F') be the corresponding deterministic finite state automaton (DFA),

where
Q' = { [q0], [q1], [q0, q1], [ ] },
q0' = [q0]
and F' = { [q1], [q0, q1] }.
Please remember that [ ] denotes a single state. Let us now proceed to determine δ' to be defined
for the DFA.

8.8 Cont
It is to be noted that any subset containing q1 is a final state in the DFA. This is shown below.

9.1 NFA with -transition


It is an NFA that also allows transitions on the empty input ε (epsilon).

9.2 Formal definition:


An NFA with ε-moves is denoted by the 5-tuple M = (Q, Σ, δ, q0, F),
where Q is the set of states,
Σ is the finite set of input symbols,
q0 is the initial state,
F is the set of final states, and
δ is the transition function, which maps Q × (Σ ∪ { ε }) → 2^Q.
Basically, it is an NFA with a few transitions on the empty input ε.

9.3 -closure of a state:


ε-closure(q0), pronounced as the epsilon closure of q0, is defined as the set of all states p such that
there is a path from q0 to p labeled ε (possibly of length zero).
Consider a transition diagram of an NFA with ε-moves in which q0 goes to q1 on ε and q1 goes to q2 on ε. Then:
ε-closure(q0) = { q0, q1, q2 }
ε-closure(q1) = { q1, q2 }
ε-closure(q2) = { q2 }

9.4 Cont
Using ε-closure we can extend the transition function δ to strings. Define δ̂(q, e) = ε-closure(q),
and for a string w and a symbol a,
δ̂(q, wa) = ε-closure( δ( δ̂(q, w), a ) ),
where δ applied to a set of states is the union of δ over the states in that set.
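The ε-closure can be sketched as a simple reachability search over ε-labelled edges. The transitions below encode the example above (q0 to q1 on ε, q1 to q2 on ε); using the empty string as the label for ε is our own convention:

```python
EPS = ""  # we use the empty string to label epsilon-moves

delta = {("q0", EPS): {"q1"}, ("q1", EPS): {"q2"}}

def eps_closure(states, delta):
    """All states reachable from `states` via zero or more epsilon-moves."""
    closure, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for p in delta.get((q, EPS), set()):
            if p not in closure:
                closure.add(p)
                todo.append(p)
    return closure

print(sorted(eps_closure({"q0"}, delta)))  # ['q0', 'q1', 'q2']
print(sorted(eps_closure({"q1"}, delta)))  # ['q1', 'q2']
print(sorted(eps_closure({"q2"}, delta)))  # ['q2']
```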

9.5 Construction of NFA without -moves from NFA with -moves:


Theorem: Every DFA has an equivalent NFA.
Proof: A DFA is just a special type of an NFA. In a DFA, the transition function is defined
from Q × Σ → Q, whereas in the case of an NFA it is defined from Q × Σ → 2^Q.
Let D = (Q, Σ, δ_D, q0, F) be a DFA.

9.6 Cont
We construct an equivalent NFA N = (Q, Σ, δ_N, q0, F) as follows:

δ_N(q, a) = { δ_D(q, a) },

i.e. every single next state of D becomes a singleton set of next states in N.
All other elements of N are as in D.
If δ̂_D(q0, w) ∈ F for some w = a1 a2 ... an, then there is a sequence of states
r0 = q0, r1, ..., rn such that δ_D(ri, a(i+1)) = r(i+1) and rn ∈ F.

9.7 Cont
Then it is clear from the above construction of N that there is a sequence of states (in N)
r0 = q0, r1, ..., rn such that r(i+1) ∈ δ_N(ri, a(i+1)) and rn ∈ F, and hence w ∈ L(N).
Similarly we can show the converse.
Hence, L(D) = L(N).

9.8 Cont
Given any NFA we need to construct an equivalent DFA, i.e. the DFA needs to simulate the
behavior of the NFA. For this, the DFA has to keep track of all the states where the NFA could
be at every step during processing a given input string.
There are 2^n possible subsets of states for any NFA with n states. Every subset corresponds to
one of the possibilities that the equivalent DFA must keep track of. Thus, the
equivalent DFA will have at most 2^n states.

10.1 Cont
The formal construction of an equivalent DFA for any NFA is given below. We first consider
an NFA without ε-transitions and then we incorporate the effects of ε-transitions later.
Formal construction of an equivalent DFA for a given NFA without ε-transitions.

10.2 Cont
Given an NFA N = (Q, Σ, δ, q0, F) without ε-moves, we construct an equivalent DFA
D = (Q', Σ, δ', q0', F') as follows:

Q' = 2^Q,
q0' = { q0 },
F' = { S ⊆ Q | S ∩ F ≠ ∅ }

(i.e. every subset of Q which has an element in F is considered as a

final state in DFA D).

10.3 Cont
For all S ⊆ Q and a ∈ Σ,

δ'(S, a) = the union of δ(q, a) over all q ∈ S.

That is, δ'(S, a) collects every NFA state reachable from some state in S on input a.
To show that this construction works we need to show that L(D) = L(N), i.e. w ∈ L(D) iff w ∈ L(N).
Or, we will prove the following, which is a stronger statement than required:
δ̂'(q0', w) = δ̂(q0, w) for every w ∈ Σ*.

10.4 Proof :
We will show this by induction on |w|.
Basis: If |w| = 0, then w = e.

So, δ̂'(q0', e) = { q0 } = δ̂(q0, e), by definition.

Induction hypothesis: Assume inductively that the statement holds for all strings of length less
than or equal to n.
Inductive step:
Let w = xa, where |x| = n.

10.5 Cont
Now,
δ̂'(q0', xa) = δ'(δ̂'(q0', x), a)
            = δ'(δ̂(q0, x), a)    (by the induction hypothesis)
            = the union of δ(q, a) over all q ∈ δ̂(q0, x)
            = δ̂(q0, xa).

10.6 Cont
Now, given any NFA with ε-transitions, we can first construct an equivalent NFA without
ε-transitions and then use the above construction process to construct an equivalent DFA, thus
proving the equivalence of NFAs and DFAs.

10.7 Example:
Consider the NFA given below.
Transition table

10.8 Cont
Since there are 3 states in the NFA,
there will be 2^3 = 8 states (representing all possible subsets of states) in the equivalent DFA. The
transition table of the DFA constructed by using the subset construction process is produced
here.
The start state of the DFA is the ε-closure of the start state of the NFA.
The final states are all those subsets that contain a final state of the NFA.

11.1 Cont
Let us compute one entry of the table; it follows directly from the corresponding transitions of
the NFA. Similarly, all other transitions can be computed.

11.2 Cont
The corresponding transition diagram for the DFA is shown below:

11.3 Cont
Note that some states are not accessible and hence can be
removed. This gives us the following simplified DFA with only 3 states.

11.4 Cont
It is interesting to note that we can avoid encountering all those inaccessible or unnecessary
states in the equivalent DFA by performing the following two steps inductively.
1. If q0 is the start state of the NFA, then make ε-closure(q0) the start state of the
equivalent DFA. This is definitely the only initially accessible state.

2. If we have already computed a set S of states which are accessible, then
compute δ'(S, a) for each input symbol a,

because these sets of states will also be accessible.

11.5 Cont
Following these steps in the above example, we get the transition table given below

11.6 Minimization of FA:


One important result on finite automata, both theoretically and practically, is that for any regular
language there is a unique DFA having the smallest number of states that accepts it. Let
M = (Q, Σ, δ, q0, F) be a DFA that accepts a language L. Then the following algorithm produces the
DFA, denoted by M1, which has the smallest number of states among the DFAs that accept L.

11.7 Procedure:
The minimum DFA M1 is constructed from the final partition π_final as follows:
Select one state in each set of the partition π_final as the representative for that set. These
representatives are the states of the minimum DFA M1.
Let p and q be representatives, i.e. states of the minimum DFA M1. Let us also denote by p
and q the sets of states of the original DFA M represented by p and q, respectively. Let s
be a state in p and t a state in q. If a transition from s to t on symbol a exists in M, then
the minimum DFA M1 has a transition from p to q on symbol a.

11.8 Cont

The start state of M1 is the representative which contains the start state of M.
The accepting states of M1 are the representatives that are in A, the set of accepting states of M.
Note that the sets of π_final are either subsets of A or disjoint from A.
Remove from M1 the dead states and the states not reachable from the start state, if there are any.
Any transitions to a dead state become undefined. A state is a dead state if it is not an accepting
state and has no outgoing transitions except to itself.
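The partitioning step can be sketched in Python as naive partition refinement: start from the partition {accepting, non-accepting} and split any block whose members disagree on which block some input symbol leads to, until nothing changes. The five-state DFA at the end is a hypothetical machine built to resemble the worked example that follows (states 1 and 5 behave identically, state 4 is dead); the unspecified transition on a from state 2 is our own choice:

```python
def minimize_partition(states, alphabet, delta, accept):
    """Refine {accepting, non-accepting} until no block can be split further."""
    partition = [b for b in (set(accept), set(states) - set(accept)) if b]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # group members by the tuple of blocks their transitions lead to
            groups = {}
            for q in block:
                key = tuple(
                    next(i for i, b in enumerate(partition) if delta[(q, a)] in b)
                    for a in alphabet
                )
                groups.setdefault(key, set()).add(q)
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = new_partition
    return partition

delta = {(1, 'a'): 3, (1, 'b'): 2, (2, 'a'): 4, (2, 'b'): 1, (3, 'a'): 5,
         (3, 'b'): 4, (4, 'a'): 4, (4, 'b'): 4, (5, 'a'): 3, (5, 'b'): 2}
blocks = minimize_partition({1, 2, 3, 4, 5}, 'ab', delta, {1, 5})
print(sorted(sorted(b) for b in blocks))  # [[1, 5], [2], [3], [4]]
```

States 1 and 5 end up in one block because they make the same transitions; each remaining state is distinguished from the others within a pass or two.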

12.1 Example:
Let us try to minimize the number of states of the following DFA.
Initially π = { { 1, 5 }, { 2, 3, 4 } }.

12.2 Solution:
The partitioning step is now applied to π. Since on b state 2 goes to state 1, state 3 goes to state 4,
and 1 and 4 are in different sets in π, states 2 and 3 are going to be separated from each other in
the new π.
Also, since on a state 4 goes to state 4, state 3 goes to state 5, and 4 and 5 are in different sets in
π, states 3 and 4 are going to be separated from each other in the new π.

12.3 Cont
Further, since on b state 2 goes to 1, state 4 goes to 4, and 1 and 4 are in different sets in π,
2 and 4 are separated from each other in the new π. On the other hand, 1 and 5 make the same
transitions, so they are not going to be split.
Thus the new partition is { { 1, 5 }, { 2 }, { 3 }, { 4 } }. This becomes the π in the second
iteration.

12.4 Cont
When the partitioning step is applied to this new π, since 1 and 5 make the same transitions,
π remains unchanged.
Thus

π_final = { { 1, 5 }, { 2 }, { 3 }, { 4 } }.

Select 1 as the representative for {1, 5}. Since the rest are singletons, they have the obvious
representatives.
Note here that state 4 is a dead state because the only transition out of it is to itself. Thus the set
of states for the minimized DFA is {1, 2, 3}. For the transitions, since 1 goes to 3 on a, and to 2
on b in the original DFA, in the minimized DFA transitions are added from 1 to 3 on a, and 1 to
2 on b. Also since 2 goes to 1 on b, and 3 goes to 1 on a in the original DFA, in the minimized
DFA transitions are added from 2 to 1 on b, and from 3 to 1 on a. Since the rest of the states are
singletons, all transitions between them are inherited for the minimized DFA. Thus the
minimized DFA is as given in the following figure:

12.5 Equivalence of Two finite state machines:


Two finite automata over Σ are equivalent if they accept the same set of strings over Σ. When the
two finite automata are not equivalent, there is some string w over Σ satisfying the following:
one automaton reaches a final state on application of w,
whereas the other automaton reaches a non-final state.

12.6 Cont
Let there be two finite automata M and M' with the same input symbols. We test the equivalence
of the two finite automata over Σ by using the comparison method.

12.7 Comparison Method:


(i) Make a comparison table with n + 1 columns, where n is the number of input symbols.
(ii) The first column contains pairs of vertices (q, q'), where q is a state of M and q' is a state of M'. The first pair of vertices consists of the initial states of the two machines M and M'. The second column consists of pairs (qa, q'a), where qa is reached from q in M on the first input symbol and q'a is reached from q' in M' on the first input symbol. The remaining n - 1 columns consist of the corresponding pairs of vertices from M and M' for the other input symbols. We repeat the construction by considering the pairs in the second and subsequent columns which are not in the first column.

12.8 Cont
The row-wise construction is repeated. There are two cases:
Case 1: If we reach a pair (q, q') such that q is a final state of M, and q' is a non-final state of M'
or vice versa, we terminate the construction and conclude that M and M' are not equivalent.
Case 2: The construction is terminated when no new element appears in the second and
subsequent columns which are not in the first column. In this case we conclude that M and M'
are equivalent.
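The comparison method amounts to a breadth-first exploration of state pairs. Here is a hedged Python sketch; the two example machines over {c, d} are hypothetical ones (both accepting every string), not the machines of the following figure.

```python
from collections import deque

def equivalent(dfa1, dfa2, alphabet):
    """Comparison method: explore pairs (q, q') from the initial states.
    Case 1: a pair mixing a final and a non-final state -> not equivalent.
    Case 2: no new pairs appear -> equivalent."""
    (d1, s1, f1), (d2, s2, f2) = dfa1, dfa2
    seen = {(s1, s2)}
    queue = deque([(s1, s2)])
    while queue:
        q, qp = queue.popleft()
        if (q in f1) != (qp in f2):      # Case 1
            return False
        for a in alphabet:
            pair = (d1[(q, a)], d2[(qp, a)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True                          # Case 2

# Hypothetical machines over {c, d}: both accept every string.
m1 = ({('q1', 'c'): 'q1', ('q1', 'd'): 'q1'}, 'q1', {'q1'})
m2 = ({('q4', 'c'): 'q5', ('q4', 'd'): 'q5',
       ('q5', 'c'): 'q4', ('q5', 'd'): 'q4'}, 'q4', {'q4', 'q5'})
ok = equivalent(m1, m2, 'cd')            # True: the machines are equivalent
```

Changing m2's final-state set to {'q4'} alone would immediately produce a mismatched pair, and the function would return False.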

13.1Example:
Consider the following two DFAs M and M' over {c, d} given in the figure; find out whether M and M' are equivalent.

13.2 Solution:
The initial states in M and M' are q1 and q4 respectively. Hence the first element of the first column in the comparison table must be (q1, q4), the pair of initial states.
As we do not get a pair (q, q'), where q is a final state and q' is a non-final state or vice versa, in any row, M and M' are equivalent.
Comparison table

13.3 Finite Automata with output


Finite automata are primarily used in parsing for recognising languages. Input strings belonging to a given language should take an automaton to a final state, and all other input strings should take the automaton to states that are not final.
Finite automata that additionally generate output are called transducers. In order to define a
transducer, an output alphabet and output function should be specified in addition to the five
components outlined earlier.

13.4 Cont
The output function can be a function of the state alone or a function of both the state and the input symbol. If the output function depends on both the state and the input symbol, then the machine is called a Mealy automaton. If the output function depends only on the state, then it is called a Moore automaton.

13.5 Moore Machine


A Moore machine is a six-tuple M = (Q, Σ, Δ, δ, λ, q0), where
Q is a finite set of states,
Σ is a finite set of input symbols (the input alphabet),
Δ is the output alphabet,
δ is the transition function (δ : Q × Σ → Q),
λ is the output function mapping Q into Δ, and
q0 ∈ Q is the start or initial state.

13.6 Transition table of Moore machine


A transition table shows, for each state and each input letter, what state is reached next. An output table shows what character from Δ is printed by each state as it is entered.

13.8 Cont
The transition table of a Moore machine is shown below:

13.9 Mealy Machine


A Mealy machine is a six-tuple M = (Q, Σ, Δ, δ, λ, q0), where all the symbols except λ have the same meaning as in the Moore machine; λ is the output function mapping Q × Σ into Δ.
The transition table of a Mealy machine is shown below:

Mealy machine
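Since the tables themselves appear only as figures, here is a hedged Python sketch of how the two kinds of machine are simulated; the parity machine used as an example is hypothetical. Note that the Moore output is one symbol longer than the input, which is exactly the discrepancy handled in the conversions of section 14.5.

```python
def run_mealy(delta, lam, q0, w):
    """Mealy machine: emit lam(q, a) on each transition taken."""
    q, out = q0, []
    for a in w:
        out.append(lam[(q, a)])
        q = delta[(q, a)]
    return ''.join(out)

def run_moore(delta, lam, q0, w):
    """Moore machine: emit lam(q) for every state entered, including q0,
    so the output is one symbol longer than the input."""
    q, out = q0, [lam[q0]]
    for a in w:
        q = delta[(q, a)]
        out.append(lam[q])
    return ''.join(out)

# Hypothetical example: output tracks the parity of 1's read so far.
delta = {('E', '0'): 'E', ('E', '1'): 'O',
         ('O', '0'): 'O', ('O', '1'): 'E'}
moore_out = run_moore(delta, {'E': '0', 'O': '1'}, 'E', '1101')
# Equivalent Mealy machine: output the label of the state being entered.
mealy_lam = {(q, a): {'E': '0', 'O': '1'}[delta[(q, a)]]
             for (q, a) in delta}
mealy_out = run_mealy(delta, mealy_lam, 'E', '1101')
# moore_out equals '0' + mealy_out: same outputs except the first symbol.
```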

14.1 Example:
Design Mealy machine for the following table and also find the output for the string abbabaaa.

14.2 Solution:
Moore machine

Mealy machine

14.3 Output of above example


The output for the given string abbabaaa is shown below:

14.5 Conversion from Moore machine to Mealy machine


We develop procedures for transforming a Mealy machine into a Moore machine and vice versa
so that for a given input string the output strings are the same (except for the first symbol) in
both the machines.
Example: Design a Mealy machine which is equivalent to the Moore machine given in table.

14.6 Solution:
For every input symbol in the Moore machine, we form a pair consisting of the next state and the corresponding output, and reconstruct the table for the Mealy machine. For example, the states q0 and q1 in the next-state column should be associated with outputs 0 and 1 respectively.
The transition table for Mealy machine will be:

Mealy machine

14.7 Cont

14.8 Example:
Consider a Mealy machine described by the transition table. Construct a Moore machine which
is equivalent to the Mealy machine.

15.1 Solution:
We look into the next-state column for any state, say qi, and determine the number of different outputs associated with qi in that column.
We break qi into several different states, the number of such states being equal to the number of different outputs associated with qi. For example, q2 is associated with two outputs, 0 and 1, while state q3 is associated with only one output, 1. So we break state q2 into two parts, the first for output 0 and the second for output 1; q3 need not be broken down because it has only one output.
New Table:

15.2 Cont
Note: State q20 means state q2 with output 0, and state q21 means state q2 with output 1. The pairs of states and outputs in the next-state column can be arranged as in the table.

We observe that the initial state q1 is associated with output 1. This means that with input ε (null) we get an output 1 if the machine starts at state q1. Thus the Moore machine accepts a zero-length (null) sequence, which is not accepted by the Mealy machine. To overcome this situation, either we must neglect the response of the Moore machine to the input ε, or we must add a new starting state q0 whose state transitions are the same as those of q1 but whose output is zero (0).

Note: For an m-output, n-state Mealy machine, the corresponding m-output Moore machine has no more than mn + 1 states.

15.3 Cont

15.4 Cont

15.5 Example:
Consider a Mealy machine. Construct a Moore machine equivalent to the Mealy machine.

15.6 Solution:

15.7 Transition Table for Moore Machine:

15.8 Transition diagram of given Example

Chapter 3
16.1 Regular Expressions (RE)
Regular Expressions
A regular expression is a formula for representing a (complex) language in terms of "elementary" languages combined using the three operations union, concatenation and Kleene closure.

16.2 Formal Definition


A regular expression R over an alphabet Σ is of one of the following forms:
1. ∅
2. a, for some a ∈ Σ
3. ε
4. (R1 + R2), where R1 and R2 are regular expressions
5. R1R2, where R1 and R2 are regular expressions
6. (R1)*, where R1 is a regular expression
Note: it is an inductive definition - it is defined based on itself.
Note: the + symbol is used for union, as in (0+1)*.

16.3 Language described by REs :


Each RE describes a language (i.e. a language is associated with every RE). We will see later that REs describe exactly the regular languages.

16.4 Notation:
If r is a RE over some alphabet Σ, then L(r) is the language associated with r. We can define the language L(r) associated with (or described by) a RE as follows:
1. ∅ is the RE describing the empty language, i.e. L(∅) = ∅.
2. ε is a RE describing the language {ε}, i.e. L(ε) = {ε}.
3. For a ∈ Σ, a is a RE denoting the language {a}, i.e. L(a) = {a}.

16.5 Cont
4. If r1 and r2 are REs denoting the languages L(r1) and L(r2) respectively, then
i) r1 + r2 is a regular expression denoting the language L(r1 + r2) = L(r1) ∪ L(r2)
ii) r1r2 is a regular expression denoting the language L(r1r2) = L(r1)L(r2)
iii) r1* is a regular expression denoting the language L(r1*) = (L(r1))*
iv) (r1) is a regular expression denoting the language L((r1)) = L(r1)

Example: Consider the RE (0*(0+1)). The language denoted by the RE is
L(0*(0+1)) = L(0*)L(0+1) .......................by 4(ii)
= (L(0))*(L(0) ∪ L(1))
= {ε, 0, 00, 000, ...}({0} ∪ {1})
= {ε, 0, 00, 000, ...}{0, 1}
= {0, 00, 000, 0000, ..., 1, 01, 001, 0001, ...}
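This computation can be checked mechanically with Python's re module, where the textbook union + is written | :

```python
import re

# Textbook RE (0*(0+1)) written in Python regex syntax.
pattern = re.compile(r'0*(0|1)')

def in_language(w):
    return pattern.fullmatch(w) is not None

members = [w for w in ['', '0', '1', '00', '01', '10', '11', '001']
           if in_language(w)]
# members: runs of 0's followed by one final 0 or 1; '' and '10' are excluded.
```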

16.7 Precedence Rule


Consider the RE ab + c. By the rules (of languages described by REs) given already, the language described by this RE can be read either as L(a)L(b + c) or as L(ab) ∪ L(c). But these two readings represent two different languages, leading to ambiguity. To remove this ambiguity we can either
1) use fully parenthesized expressions (cumbersome), or
2) use a set of precedence rules to evaluate the parts of REs in some order, as in other algebras in mathematics.

16.8 Precedence Rule without parentheses


In absence of parentheses, we have the hierarchy of operations as follows: iteration (closure or
star operator), concatenation and union. For REs, the order of precedence for the operators is as
follows:
i) We perform iteration (closure) first, then concatenation, and finally union (+). This hierarchy is similar to that followed for arithmetic expressions (exponentiation, multiplication and addition).

17.1 Cont
ii) It is also important to note that the concatenation and union (+) operators are associative, and the union operation is commutative.
Using these precedence rules, we find that the RE ab + c represents the language L(ab) ∪ L(c), i.e. it should be grouped as ((ab) + c).
We can, of course, change the order of precedence by using parentheses. For example, the language represented by the RE a(b + c) is L(a)L(b + c).
Example: The RE ab* + b is grouped as ((a(b*)) + b), which describes the language L(a)(L(b))* ∪ L(b).
Example: The RE (ab)* + b represents the language (L(a)L(b))* ∪ L(b).

17.2 Example:
It is easy to see that the RE (0+1)*(0+11) represents the language of all strings over {0,1} which end in either 0 or 11.
Example: The regular expression r = (00)*(11)*1 denotes the set of all strings with an even number of 0's followed by an odd number of 1's.
Note: The notation r+ is used to represent the RE rr*. Similarly, r2 represents the RE rr, r1 denotes r, and so on.
An arbitrary string over Σ = {0,1} is denoted as (0+1)*.
Example: Give a RE r over {0,1} such that L(r) = { w | w has at least one pair of consecutive 1's }.
Solution: Every string in L(r) must contain 11 somewhere, but what comes before and what comes after is completely arbitrary. Considering these observations, we can write the RE as (0+1)*11(0+1)*.
Example: Considering the above example, it becomes clear that the RE (0+1)*11(0+1)* + (0+1)*00(0+1)* represents the set of strings over {0,1} that contain the substring 11 or 00.
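This last RE can likewise be checked against Python's re syntax (+ becomes |):

```python
import re

# Textbook RE (0+1)*11(0+1)* + (0+1)*00(0+1)* in Python syntax.
pattern = re.compile(r'(0|1)*11(0|1)*|(0|1)*00(0|1)*')

def has_pair(w):
    return pattern.fullmatch(w) is not None

# Strings containing the substring 11 or 00 are accepted...
accepted = all(has_pair(w) for w in ['11', '100', '0110', '1001'])
# ...while strictly alternating strings are not.
rejected = not any(has_pair(w) for w in ['', '10', '0101', '1010'])
```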

17.3 Identities for Regular Expression


Two regular expressions P and Q are equivalent if P and Q represent the same set of strings. The identities for regular expressions given below are useful for simplifying regular expressions. We write P = Q to mean that P and Q are equivalent.

17.4 Identity Rules

17.5 ARDEN'S THEOREM


Let P and Q be two regular expressions over Σ. If P does not contain ε, then the equation
R = Q + RP ----------------(1)
has a unique (one and only one) solution, R = QP*.

17.6 Proof:
We first point out the statements in Arden's theorem in general form:
(i) P and Q are two regular expressions.
(ii) P does not contain the symbol ε.
(iii) R = Q + RP has a solution, namely R = QP*.
(iv) This solution is the one and only one solution of the equation.

17.7 Cont
If R = QP* is a solution of the equation R = Q + RP, then by substituting this value of R on the R.H.S. of the equation we get
Q + (QP*)P = Q(ε + P*P)
= QP*          (since ε + R*R = R* by identity I9)
Hence equation (1) is satisfied when R = QP*; this means R = QP* is a solution of equation (1). To prove uniqueness, consider equation (1). Here, replacing R by Q + RP on the R.H.S., we get the equation

17.8 Cont

From equation (1), by repeatedly replacing R by Q + RP, we get
R = Q + QP + QP² + ⋯ + QPⁱ + RPⁱ⁺¹ ----------------(2)

18.1 Cont
We now show that any solution of equation (1) is equivalent to QP*. Suppose R satisfies equation (1); then it satisfies equation (2). Let w be a string of length i in the set R. Then w belongs to the set Q + QP + QP² + ⋯ + QPⁱ or to the set RPⁱ⁺¹. As P does not contain ε, RPⁱ⁺¹ has no string of length less than i + 1, and so w is not in the set RPⁱ⁺¹. This means w belongs to the set Q + QP + ⋯ + QPⁱ, and hence to QP*.
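Arden's theorem can be sanity-checked numerically by truncating all languages at a maximum string length. The instance below (P = {b}, Q = {a}) is a made-up example, not one from the text:

```python
def concat(L1, L2, maxlen):
    """All concatenations xy with len(xy) <= maxlen."""
    return {x + y for x in L1 for y in L2 if len(x + y) <= maxlen}

def star(L, maxlen):
    """Kleene star of L, truncated at maxlen."""
    result, frontier = {''}, {''}
    while frontier:
        frontier = concat(frontier, L, maxlen) - result
        result |= frontier
    return result

# Instance of R = Q + RP with P = {b} (epsilon-free) and Q = {a}.
maxlen = 6
P, Q = {'b'}, {'a'}
R = concat(Q, star(P, maxlen), maxlen)      # the claimed solution QP*
lhs = R
rhs = Q | concat(R, P, maxlen)              # Q + RP, truncated alike
# Up to the length bound, QP* indeed satisfies R = Q + RP.
```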

18.2 Example:
Solution:

18.3 Equivalence (of REs) with FA :


A language that is accepted by some FA is known as a regular language. The two concepts, REs and regular languages, are essentially the same: every regular language can be described by a RE, and every RE describes a regular language.
Theorem: A language is regular iff some RE describes it.

18.4 RE to FA:
Lemma: If L(r) is the language described by a RE r, then it is regular, i.e. there is a FA M such that L(M) = L(r).

18.5 Proof:
To prove the lemma, we apply structural induction on the expression r. First, we show how to construct FAs for the basis elements: ∅, ε and a for any a ∈ Σ. Then we show how to combine these finite automata into larger automata that accept the union, concatenation and Kleene closure of the languages accepted by the original smaller automata.
Use of NFAs is helpful in this case, i.e. we construct an NFA for every RE, represented by a transition diagram only.

18.6 Basis:
If R is an elementary regular expression, NFA NR is constructed as follows.

18.7 Induction:
If P and Q are regular expressions with corresponding finite automata MP and MQ, then we can
construct a finite automaton denoting P+Q in the following manner:

18.8 Cont
The ε-transitions at the end are needed to maintain a unique accepting state. If P and Q are regular expressions with corresponding finite automata MP and MQ, then we can construct a finite automaton denoting PQ in the following manner:

19.1 Cont
Finally, if P is a regular expression with corresponding finite automaton MP, then we can
construct a finite automaton denoting P* in the following manner:

Again, the extra ε-transitions are here to maintain a unique accepting state. It is clear that each finite automaton described above accepts exactly the set of strings described by the corresponding regular expression (assuming inductively that the submachines used in the construction accept exactly the sets of strings described by their corresponding regular expressions).

19.2Cont
Since, for each constructor of regular expressions, we have a corresponding constructor of finite automata, the induction step is proved and our proof is complete.
We have proved that for every regular expression, there exists an equivalent nondeterministic finite automaton with ε-transitions.
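The construction just proved (Thompson's construction) can be sketched in Python: one ε-NFA fragment per RE operator, each fragment with a unique start and a unique accepting state. This is an illustrative sketch, not the text's exact diagrams:

```python
EPS = None  # label used for epsilon-moves

class ThompsonBuilder:
    """Thompson's construction: one epsilon-NFA fragment per RE operator.
    A fragment is a pair (start, accept) of state numbers."""
    def __init__(self):
        self.edges = []            # (state, label, state) triples
        self.count = 0

    def _new(self):
        self.count += 1
        return self.count - 1

    def symbol(self, a):           # NFA for a single symbol a
        s, f = self._new(), self._new()
        self.edges.append((s, a, f))
        return s, f

    def union(self, p, q):         # NFA for P + Q
        s, f = self._new(), self._new()
        self.edges += [(s, EPS, p[0]), (s, EPS, q[0]),
                       (p[1], EPS, f), (q[1], EPS, f)]
        return s, f

    def concat(self, p, q):        # NFA for PQ
        self.edges.append((p[1], EPS, q[0]))
        return p[0], q[1]

    def star(self, p):             # NFA for P*
        s, f = self._new(), self._new()
        self.edges += [(s, EPS, p[0]), (p[1], EPS, f),
                       (s, EPS, f), (p[1], EPS, p[0])]
        return s, f

    def accepts(self, frag, w):
        """Simulate the epsilon-NFA by tracking a set of states."""
        def closure(states):
            stack, seen = list(states), set(states)
            while stack:
                u = stack.pop()
                for p, a, q in self.edges:
                    if p == u and a is EPS and q not in seen:
                        seen.add(q)
                        stack.append(q)
            return seen
        current = closure({frag[0]})
        for c in w:
            current = closure({q for p, a, q in self.edges
                               if p in current and a == c})
        return frag[1] in current

b = ThompsonBuilder()
frag = b.concat(b.star(b.union(b.symbol('0'), b.symbol('1'))),
                b.symbol('1'))     # fragment for the RE (0+1)*1
```

The accepting test uses the standard ε-closure subset simulation, mirroring the subset method described in the next section.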

19.3 Example:
Construct transition graph for regular expression R=(0(0|1)*)+
Solution: 0.(0|1)*

19.4 Cont
(0.(0|1)*)+

19.5 Construction of FA equivalent to a RE:


This method is called the subset method, which involves two steps:
Step 1: Construct a transition graph or diagram equivalent to the given RE using ε-moves.
Step 2: Construct the transition table for the transition graph obtained in step 1, and then construct the DFA. We reduce the number of states if possible.

19.6 Example:
Construct a FA equivalent to the regular expression,

19.7 Solution:

19.8 Cont

20.1 Cont
Note: There is a null move (ε) between q0 and q5, so we consider them as the same state. Therefore q0 = q5.

20.2 Transition table of NFA

20.3 Transition table of DFA

20.4 Final transition Table:


We can merge the states [q0, q6, qf] and [q0, q7, qf] because both are final states (each contains the final state qf) and both have the same transitions (next states) on the input symbols 0 and 1:
[q0, q6, qf] ≡ [q0, q7, qf]

20.5 Final transition Diagram

20.6 FA to RE:
Lemma: If a language is regular, then there is a RE that describes it, i.e. if L = L(M) for some DFA M, then there is a RE r such that L = L(r).
Proof: We need to construct a RE r such that L(r) = L(M). Since M is a DFA, it has a finite number of states. Let the set of states of M be Q = {1, 2, 3, ..., n} for some integer n. [Note: if the n states of M were denoted by some other symbols, we can always rename them as 1, 2, 3, ..., n.] The required RE is constructed inductively.

20.7 Cont
Notations:
R^(k)_ij is a RE denoting the language which is the set of all strings w such that w is the label of a path from state i to state j in M, and that path has no intermediate state whose number is greater than k. (i and j, the beginning and end points, are not considered to be "intermediate", so i and/or j can be greater than k.)
We now construct R^(k)_ij inductively, for all i, j ∈ Q, starting at k = 0 and finally reaching k = n.

20.8 Cont
Basis: k = 0, i.e. the paths must not have any intermediate state (since all states are numbered 1 or above). There are only two kinds of path meeting this condition:
1. A direct transition from state i to state j.
o R^(0)_ij = a if there is a transition from state i to state j on the single symbol a.
o R^(0)_ij = a1 + a2 + ... + ap if there are multiple transitions from state i to state j on the symbols a1, a2, ..., ap.
o R^(0)_ij = ∅ if there is no transition at all from state i to state j.

21.1 Cont
2. All paths consisting of only one node, i.e. when i = j. This gives the path of length 0 (i.e. the RE ε denoting the empty string) and all self loops. By simply adding ε to the various cases above we get the corresponding REs, i.e.
o R^(0)_ii = ε + a if there is a self loop on symbol a in state i.
o R^(0)_ii = ε + a1 + a2 + ... + ap if there are self loops in state i on the multiple symbols a1, a2, ..., ap.
o R^(0)_ii = ε if there is no self loop on state i.

21.2 Cont
Induction:
Assume that there exists a path from state i to state j such that there is no intermediate state
whose number is greater than k. The corresponding Re for the label of the path is

21.3 Cont
There are only two possible cases:

1. The path does not go through the state k at all, i.e. the numbers of all the intermediate states are less than k. So the label of the path from state i to state j is in the language described by the RE R^(k-1)_ij.
2. The path goes through the state k at least once; k may appear more than once. We can break the path into pieces as shown in the figure.

21.4 Cont

21.5 Cont

The first part runs from state i to the first occurrence of state k. In this part, all intermediate states are numbered less than k, and it starts at i and ends at k. So the RE R^(k-1)_ik denotes the language of the label of this part.
The last part runs from the last occurrence of state k in the path to state j. In this part also, no intermediate state is numbered greater than k - 1. Hence the RE R^(k-1)_kj denotes the language of the label of this part.

21.6 Cont

In the middle, the portion from the first occurrence of k to the last occurrence of k represents a loop which may be taken zero times, once, or any number of times, and all states between two consecutive occurrences of k are numbered less than k. Hence the label of this part of the path is denoted by the RE (R^(k-1)_kk)*.
The label of the path from state i to state j is the concatenation of these three parts, which is R^(k-1)_ik (R^(k-1)_kk)* R^(k-1)_kj.

21.7 Cont
Since either case 1 or case 2 may happen, the labels of all paths from state i to state j are denoted by the following RE:
R^(k)_ij = R^(k-1)_ij + R^(k-1)_ik (R^(k-1)_kk)* R^(k-1)_kj
We can construct R^(k)_ij for all i, j ∈ {1, 2, ..., n} in increasing order of k, starting with the basis k = 0 up to k = n, since R^(k)_ij depends only on expressions with a smaller superscript (which hence will be available).

21.8 Cont
Assume that state 1 is the start state and j1, j2, ..., jm are the m final states, where ji ∈ {1, 2, ..., n}. According to the convention used, the language of the automaton can be denoted by the RE
R^(n)_1j1 + R^(n)_1j2 + ... + R^(n)_1jm
since R^(n)_1ji is the set of all strings that start at the start state 1 and finish at the final state ji following the transitions of the FA with any intermediate states (numbered 1, 2, ..., n), and hence are accepted by the automaton.
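The R^(k)_ij construction can be sketched directly in Python, with REs held as strings (∅ represented by None, ε by 'ε'). The two-state DFA below, which accepts exactly the strings ending in a, is a made-up example:

```python
import re

def dfa_to_regex(n, delta, finals, alphabet):
    """Kleene's construction: R[i][j] holds a RE for all paths i -> j whose
    intermediate states are numbered at most k, for k = 0, 1, ..., n."""
    def union(r, s):
        if r is None: return s
        if s is None: return r
        return r if r == s else '(' + r + '+' + s + ')'

    def cat(r, s):
        if r is None or s is None: return None
        if r == 'ε': return s
        if s == 'ε': return r
        return r + s

    def star(r):
        return 'ε' if r in (None, 'ε') else '(' + r + ')*'

    # Basis k = 0: direct transitions, plus epsilon when i == j.
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        R[i][i] = 'ε'
        for a in alphabet:
            j = delta[(i, a)]
            R[i][j] = union(R[i][j], a)
    # Induction: R^(k)_ij = R^(k-1)_ij + R^(k-1)_ik (R^(k-1)_kk)* R^(k-1)_kj
    for k in range(1, n + 1):
        R = [[union(R[i][j], cat(R[i][k], cat(star(R[k][k]), R[k][j])))
              for j in range(n + 1)] for i in range(n + 1)]
    # Union over the final states, starting from state 1.
    result = None
    for f in finals:
        result = union(result, R[1][f])
    return result

# Hypothetical DFA over {a, b}: state 2 is reached iff the last symbol was a.
delta = {(1, 'a'): 2, (1, 'b'): 1, (2, 'a'): 2, (2, 'b'): 1}
r = dfa_to_regex(2, delta, {2}, 'ab')
# Check the result with Python's re module (+ becomes |, ε becomes empty).
py = r.replace('+', '|').replace('ε', '')
matches = lambda w: re.fullmatch(py, w) is not None
```

The resulting expression is unsimplified (the construction produces large REs), but it denotes exactly the strings ending in a.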

22.1 Finding a RE using Arden's Theorem (Algebraic Method using Arden's Theorem)


The following method is an extension of Arden's theorem; it is used to find the regular expression recognized by a transition system.
The following assumptions are made about the transition system:
The transition graph does not have ε-moves.
It has only one initial state, say V1.
Its vertices are V1, ..., Vk.
Vi denotes the regular expression representing the set of strings that take the system from the initial state to the vertex Vi (whether or not Vi is a final state).

22.2 Cont

αij denotes the regular expression representing the set of labels of edges from Vi to Vj; when there is no such edge, αij = ∅. Consequently, we can write the following set of equations for V1, ..., Vn:
V1 = V1 α11 + V2 α21 + ... + Vn αn1 + ε
Vi = V1 α1i + V2 α2i + ... + Vn αni      (for i ≠ 1)
These equations are solved for the Vi using Arden's theorem.

If there is more than one final state, then we take the union of the equations (solutions) of all the final states.

22.3 Example:
Consider the transition system, prove that the strings recognized are

22.4 Solution:
The three equations for q1, q2, and q3 can be written as

Substitute the value of q3 from equation (3) into equation (2):

22.5 Cont
Substitute q2 from equation (4) into equation (1):

22.6 Cont
Since q3 is a final state, the set of strings recognised by the graph is given by

Put the value of q1

22.7 Equivalence of Two Regular expressions


Let P and Q be two regular expressions. The regular expressions P and Q are equivalent iff they represent the same set.
To prove the equivalence of P and Q:
We prove that the sets P and Q are the same; for nonequivalence we find a string in one set but not in the other.
We use the identities to prove the equivalence of P and Q.
We construct the corresponding FAs M and M' and prove that M and M' are equivalent; for nonequivalence we prove that M and M' are not equivalent.
The method to be chosen depends on the problem.

22.8 Pumping Lemma for Regular Languages


We can prove that a certain language is not regular by using a theorem called the Pumping Lemma. According to this theorem, every regular language has a special property. If a language does not have this property, then it is guaranteed not to be regular. The idea behind this theorem is that whenever a FA processes a long string (longer than the number of states) and accepts it, there must be at least one state that is repeated, and the substring of the input string between the two occurrences of that repeated state can be repeated any number of times with the resulting string remaining in the language.

23.1 Theorem:
Let L be a regular language. Then the following property holds for L.
There exists a number k (called the pumping length) such that, if w is any string in L of length at least k, i.e. |w| ≥ k, then w may be divided into three substrings w = xyz satisfying the following conditions:
1. |y| > 0, i.e. y ≠ ε
2. |xy| ≤ k
3. xyiz ∈ L for all i ≥ 0

23.2 Proof:
Since L is regular, there exists a DFA M that recognizes it, i.e. L = L(M). Let the number of states in M be n.
Say w = a1a2...am, with m ≥ n, and let
δ(q0, a1a2...ai) = qi for i = 1, 2, ..., m;
i.e. Q = (q0, q1, ..., qm) is the sequence of states on the path with path value w = a1a2...am.
As there are only n distinct states, at least two states in Q must coincide. Among the various pairs of repeated states, we take the first pair. Let us take them as qj and qk (qj = qk). Then j and k satisfy the condition 0 ≤ j < k ≤ n.

23.3 Cont
The string w can be decomposed into 3 substrings a1a2...aj, aj+1...ak and ak+1...am. Let x, y, z denote these strings respectively. As k ≤ n, |xy| ≤ n and w = xyz.
The path with the path value w in the transition diagram of M is shown in the figure.

The automaton M starts from the initial state q0. On applying the string x, it reaches qj (= qk). On applying the string y, it comes back to qj (= qk). So after applying yi for each i ≥ 0, the automaton is in the same state qj. On applying z, it reaches qm, a final state. Hence xyiz ∈ L. As every state in Q is obtained by applying an input symbol, y ≠ ε.
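The proof's decomposition can be computed explicitly: run the DFA on w, locate the first repeated state, and split w there. A hedged sketch (the even-number-of-a's DFA is a made-up example):

```python
def pump_decomposition(delta, q0, w):
    """Split w = xyz at the first repeated state on the path, as in the
    proof of the pumping lemma (requires |w| >= number of states)."""
    path = [q0]
    for c in w:
        path.append(delta[(path[-1], c)])
    first_seen = {}
    for pos, q in enumerate(path):
        if q in first_seen:
            j, k = first_seen[q], pos      # q_j = q_k, with 0 <= j < k
            return w[:j], w[j:k], w[k:]
        first_seen[q] = pos
    raise ValueError('|w| must be at least the number of states')

def accepts(delta, q0, finals, w):
    q = q0
    for c in w:
        q = delta[(q, c)]
    return q in finals

# Hypothetical DFA accepting strings with an even number of a's.
delta = {('E', 'a'): 'O', ('O', 'a'): 'E'}
x, y, z = pump_decomposition(delta, 'E', 'aaaa')
# y is non-empty and every pumped string x y^i z stays in the language.
pumped_ok = bool(y) and all(accepts(delta, 'E', {'E'}, x + y * i + z)
                            for i in range(5))
```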

23.4 Application of Pumping Lemma


This theorem can be used to prove that certain sets are not regular. There are three steps needed
for proving that a given set is not regular.
Step 1: Assume that L is regular. Let n be the number of states in the corresponding finite
automaton.
Step 2: Choose a string w such that |w| ≥ n. Use the pumping lemma to write w = xyz, with |xy| ≤ n and |y| > 0.
Step 3: Find a suitable integer i such that xyiz ∉ L. This contradicts our assumption. Hence L is not regular.
The crucial part of the procedure is to find i such that xyiz ∉ L. In some cases we prove xyiz ∉ L by considering |xyiz|.

23.5 Example:
Show that L = {a^p | p is a prime} is not regular.
Solution: The prime numbers are 2, 3, 5, 7, 11, 13, 17, ...
L = {aa, aaa, aaaaa, aaaaaaa, ...}
Assume L is regular, and let n be the number of states of the corresponding finite automaton. Choose a prime p ≥ n and let w = a^p, so |w| ≥ n. By the pumping lemma, w = xyz with y = a^m for some m > 0, and xyiz = a^(p + (i-1)m) must be in L for every i ≥ 0.
Take i = p + 1. Then |xyiz| = p + pm = p(1 + m), which is not a prime, since it has the factor p with 1 + m > 1. Hence xyiz ∉ L, which is a contradiction.
Therefore L = {a^p | p is a prime} is not regular.

23.6 Closure properties of Regular sets


The closure properties of regular sets are as follows:
Regular sets are closed under union.
Regular sets are closed under concatenation.
Regular sets are closed under closure (iteration).
Regular sets are closed under transpose.
Regular sets are closed under intersection.
Regular sets are closed under complementation.

23.7 Theorem:
Regular languages are closed under the union operation and the Kleene star operation.
Proof:
If L1 and L2 are regular, then there are regular expressions r1 and r2 denoting the languages L1 and L2, respectively.
(r1 + r2) and (r1*) are regular expressions denoting the languages L1 ∪ L2 and L1*.
Therefore, L1 ∪ L2 and L1* are regular.

23.8 Theorem:
Regular languages are closed under the complement operation and the intersection operation.
Proof:
Suppose that L1 and L2 are regular over an alphabet Σ.
There is a DFA M = (Q, Σ, δ, q0, F) accepting L1.
Design a DFA M' = (Q, Σ, δ, q0, F'), where F' = Q - F.
Now we have that L(M') = Σ* - L1. Hence the complement of L1 is regular.
Let L1' = Σ* - L1 and L2' = Σ* - L2.
The complement of L1' ∪ L2' is regular and, by De Morgan's law, is equal to L1 ∩ L2. Hence L1 ∩ L2 is regular.
The closure of regular languages under the intersection operation can also be proved by constructing a new DFA that accepts the intersection. The construction is as follows.
Suppose that L1 and L2 are regular languages accepted by DFAs M1 = (Q1, Σ, δ1, q01, F1) and M2 = (Q2, Σ, δ2, q02, F2) such that L(M1) = L1 and L(M2) = L2.
Construct a DFA M = (Q, Σ, δ, q0, F), where Q = Q1 × Q2, q0 = (q01, q02), F = F1 × F2, and
δ((pi, qs), a) = (δ1(pi, a), δ2(qs, a)).
It is not difficult to show that L(M) = L1 ∩ L2.
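The product construction in this proof can be sketched as follows; the two component DFAs (even number of a's; last symbol is b) are made-up examples:

```python
def product_dfa(m1, m2, alphabet):
    """Product construction: a DFA accepting L(M1) ∩ L(M2).
    Each machine is a triple (delta, start, finals)."""
    (d1, s1, f1), (d2, s2, f2) = m1, m2
    start = (s1, s2)
    delta, states, stack = {}, {start}, [start]
    while stack:                         # explore only reachable pairs
        p, q = stack.pop()
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in states:
                states.add(nxt)
                stack.append(nxt)
    finals = {(p, q) for (p, q) in states if p in f1 and q in f2}
    return delta, start, finals

def accepts(machine, w):
    delta, q, finals = machine
    for c in w:
        q = delta[(q, c)]
    return q in finals

# M1: even number of a's.  M2: last symbol is b.  (Hypothetical examples.)
m1 = ({('E', 'a'): 'O', ('E', 'b'): 'E',
       ('O', 'a'): 'E', ('O', 'b'): 'O'}, 'E', {'E'})
m2 = ({('X', 'a'): 'X', ('X', 'b'): 'Y',
       ('Y', 'a'): 'X', ('Y', 'b'): 'Y'}, 'X', {'Y'})
m = product_dfa(m1, m2, 'ab')
# m accepts exactly the strings with an even number of a's that end in b.
```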

MODULE 2
Chapter 4
24.1 GRAMMAR
Introduction
A grammar is a mechanism used for describing languages. It is one of the simplest yet most powerful mechanisms. There are four important components in a grammatical description of a language.
A grammar G is defined as a quadruple G = (N, Σ, P, S).
24.2 Cont
Where,
N is a non-empty finite set of non-terminals or variables,
Σ is a non-empty finite set of terminal symbols such that N ∩ Σ = ∅,
S ∈ N is a special non-terminal (or variable) called the start symbol, and
P is a finite set of production rules.

24.3Context free Grammar (CFG)


A grammar G = (N, Σ, P, S) is a context-free grammar if every production is of the form A → α, where A is a single non-terminal and α is a finite string of terminals and non-terminals.

24.4Example:
is a grammar
Where,

24.5Notations used in CFG

If A is any set, then A* denotes the set of all strings over A, and A+ denotes A* - {ε}, where ε is the empty string.
A, B, C, A1, A2, ... denote the variables.
a, b, c, ... denote the terminals.
x, y, z, w, ... denote strings of terminals.
α, β, γ, ... denote the elements of (V ∪ Σ)*.

x^0 = ε for any symbol x in V ∪ Σ.

24.6Production rules and derivation languages

The binary relation defined by the set of production rules is denoted by ⇒, i.e. α ⇒ β iff α → β is a production in P. Here α contains at least one symbol from V. In other words, P is a finite set of production rules of the form α → β, where α, β ∈ (V ∪ Σ)* and α contains at least one variable.

24.7 Cont

The production rules specify how the grammar transforms one string into another.
In a grammar, no reverse substitution and no inverse operation is permitted.

24.8 Cont

A string α of terminals and variables is called a sentential form if S ⇒* α.
If the final string does not contain any variable, it is a sentence in the language.
If the final string contains a variable, it is a sentential form.
If A → α is a production where A ∈ V, then it is called an A-production.
If A → α1, A → α2, ..., A → αm are A-productions, these productions are written as A → α1 | α2 | ... | αm.

25.1 Example:

25.2 Example:

25.3 REGULAR GRAMMARS


A grammar G = (N, Σ, P, S) is right-linear if each production has one of the following three forms:
A → cB,
A → c,
A → ε
where A, B ∈ N (with A = B allowed) and c ∈ Σ.

25.4 Cont
A grammar G is left-linear if each production has one of the following three forms:
A → Bc,
A → c,
A → ε
A right-linear or left-linear grammar is called a regular grammar.

25.5 Derivation Trees


The derivations in a CFG can be represented using trees. Such trees are called derivation trees. A derivation tree is also called a parse tree.

25.6 Conditions of Derivation Trees


The conditions satisfied by a derivation tree of a CFG G = (N, Σ, P, S) are as follows:
Every vertex has a label which is a variable, a terminal, or ε.
The root has label S.

25.7 Cont

The label of an internal vertex is a variable.

If the vertices n1, n2, ..., nk, with labels X1, X2, ..., Xk, are the sons of vertex n with label A, then A → X1X2...Xk is a production in P.
A vertex n is a leaf if its label is a terminal a or ε; n is the only son of its father if its label is ε.

25.8 Left and Right Derivation Tree


The yield of a derivation tree is the concatenation of the labels of the leaves, without repetition, in the left-to-right ordering.
Leftmost derivation
If at each step in a derivation a production is applied to the leftmost variable, then the derivation is called a leftmost derivation.
Rightmost derivation
A derivation in which the rightmost variable is replaced at each step is said to be a rightmost derivation.

26.1 Example:
Let G be a grammar. For the string 00110101, find a) the leftmost derivation, b) the rightmost derivation and c) the derivation tree.
Solution:

26.2 Cont
(c) The derivation tree is shown in below which yields 00110101:

26.3 Ambiguity in CFG


A CFG G such that some string has two parse trees is said to be ambiguous. Equivalently, a grammar is ambiguous if there exist two or more leftmost derivations (or two or more rightmost derivations) of some string from the grammar.

26.4 Example:
If G is the grammar S → SbS | a, show that G is ambiguous.
Solution: Let us consider w = abababa. Then we get two derivation trees for w, as shown below:

26.5 Cont

Figure: Two derivation trees of abababa

26.6 Simplification of Context Free Grammars


A CFG can be simplified by eliminating some symbols and productions in G which are not useful for the derivation of sentences.

26.7 Cont
Here we give the construction to eliminate:
1. Variables not deriving terminal strings.
2. Symbols not appearing in any sentential form.
3. Null productions.
4. Productions of the form A → B.

26.8 Variables not deriving terminal strings:


A variable in a grammar is eliminated if there is no way of getting a terminal string from it.

Theorem: If G is a CFG such that L(G) ≠ ∅, we can find an equivalent grammar G' such that each variable in G' derives some terminal string.
Proof: Let G = (N, Σ, P, S); we define G' = (N', Σ, P', S) as follows:

27.1 Construction of N and P:


a) Construction of N':
We define Wi ⊆ N by recursion:
W1 = { A ∈ N | there exists a production A → w where w ∈ Σ* }. (If W1 = ∅, some variable will remain after the application of any production, and so L(G) = ∅.)
Wi+1 = Wi ∪ { A ∈ N | there exists some production A → α with α ∈ (Σ ∪ Wi)* }.

27.2 Cont
By the definition of Wi, Wi ⊆ Wi+1 for all i. As N has only a finite number of variables, Wk = Wk+1 for some k. Therefore, Wk = Wk+j for all j ≥ 1. We define N' = Wk.
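The fixpoint computation of W1 ⊆ W2 ⊆ ... above can be sketched as a small loop; the example grammar, in which A never derives a terminal string, is hypothetical:

```python
def generating_variables(terminals, productions):
    """Compute N' = W_k: the variables that derive some terminal string.
    productions: list of pairs (A, rhs), rhs a string over N ∪ Σ."""
    W = set()
    changed = True
    while changed:                       # stop when W_{i+1} = W_i
        changed = False
        for A, rhs in productions:
            # A enters W once every symbol of some A-production's R.H.S
            # is a terminal or an already-generating variable.
            if A not in W and all(s in terminals or s in W for s in rhs):
                W.add(A)
                changed = True
    return W

# Hypothetical grammar: S -> AB | a, A -> aA, B -> b.
prods = [('S', 'AB'), ('S', 'a'), ('A', 'aA'), ('B', 'b')]
W = generating_variables({'a', 'b'}, prods)
# A never terminates (its only production is A -> aA), so N' = {S, B}.
```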
b) Construction of P':

27.3 Example:

27.4 Cont

27.5 Eliminate useless symbol:


A variable in a grammar is said to be useless if there is no way of getting a terminal string from it or if it cannot be reached from the start symbol.
Theorem:

27.6 Cont

27.7 Example:
Consider G = ({S, A, B, E}, {a, b, c}, P, S), where P consists of S → AB, A → a, B → b, E → c.
Solution:

27.8 Elimination of NULL productions:


A context-free grammar may have productions of the form A → ε, called ε-productions (or null productions). A variable A for which A ⇒* ε is called nullable. To eliminate null productions, we find all nullable variables and, for each production, add new productions obtained by erasing the nullable variables on the right-hand side in all possible ways.

28.1 Example:
Reduce the grammar so that there are no ε-productions, where the productions are S → aS | AB, A → ε, B → ε, D → b.

28.2 Solution:
Step 1: Construction of the set W of all nullable variables:

Thus, W = W2 = {S, A, B}

28.3 Cont
Step 2: Construction of P:

Here we cannot erase both the nullable variables A and B in S → AB, as we would get S → ε in that case.
Hence the required grammar without null productions is G1 = ({S, A, B, D}, {a, b}, P', S),
where P' consists of D → b, S → aS, S → AB, S → a, S → A, S → B.

28.4 Elimination of Unit productions:


A production of the form A → B, with A, B ∈ VN, in a CFG is called a unit production. The unit productions are eliminated from the grammar in order to get the reduced CFG. To remove the unit productions from the CFG, the substitution method is used. Assume that there are no ε-productions.

28.5 Example:

28.6 Cont

28.7 Normal forms for CFGs


In a CFG, the R.H.S of a production can be any string of variables and terminals. When the
productions in G satisfy certain restrictions, G is said to be in a normal form. Here we will
discuss two normal forms: Chomsky normal form (CNF) and Greibach normal form (GNF).

28.8 Chomsky normal form (CNF)


A CFG G is in CNF if every production is of the form A → a or A → BC, with S → ε in G if
ε ∈ L(G); when ε ∈ L(G), we assume that S does not appear on the R.H.S of any production.
For example, consider G whose productions are S → AB | ε, A → a, and B → b. Then G is in CNF.

29.1 Reduction to CNF


Theorem: For every CFG, there is an equivalent grammar G2 in CNF.
Proof (construction of a grammar in CNF):
Step 1: Eliminate useless symbols, null productions and unit productions.
Step 2: Eliminate terminals on the right-hand side of productions as follows:

29.2 Cont
We define G1 = (VN′, Σ, P1, S), where P1 and VN′ are constructed as follows:
All the productions in P of the form A → a or A → BC are included in P1; all the variables
in VN are included in VN′.

29.3 Cont

Consider A → X1X2…Xn with some terminal on the R.H.S. If Xi is a terminal, say ai, add a
new variable Cai to VN′ and a production Cai → ai to P1. In the production A → X1X2…Xn, every
terminal on the R.H.S is replaced by the corresponding new variable and the variables on the
R.H.S are retained. The resulting production is added to P1. Thus, we get G1 = (VN′, Σ,
P1, S).

29.4 Cont
Step 3: Restricting the number of variables on the R.H.S:
For any production in P1, the R.H.S consists of either a single terminal (or ε in S → ε) or two or
more variables. We define G2 = (VN″, Σ, P2, S) as follows:
All productions in P1 are added to P2 if they are in the required form; all the variables in
VN′ are added to VN″.

29.5 Cont

Consider A → A1A2…Am, where m ≥ 3. We introduce new productions A → A1C1, C1 →
A2C2, …, Cm−2 → Am−1Am and new variables C1, C2, …, Cm−2. These are added to P2
and VN″, respectively.
Thus we get G2 in CNF.
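Step 3, breaking long right-hand sides into chains of two-variable rules, can be sketched as follows; the fresh variable names C1, C2, … are generated with a counter and are assumed not to clash with existing variables.

```python
import itertools

def break_long_rules(productions):
    """Step 3 of the CNF reduction: replace A -> A1 A2 ... Am (m >= 3)
    by A -> A1 C1, C1 -> A2 C2, ..., C_{m-2} -> A_{m-1} A_m.
    productions: list of (head, body) pairs; body is a tuple of variables."""
    counter = itertools.count(1)
    p2 = []
    for head, body in productions:
        while len(body) > 2:
            c = f"C{next(counter)}"          # fresh variable (assumed unused)
            p2.append((head, (body[0], c)))  # peel off the leftmost variable
            head, body = c, body[1:]
        p2.append((head, body))
    return p2

# A -> B C D E becomes A -> B C1, C1 -> C C2, C2 -> D E
for rule in break_long_rules([("A", ("B", "C", "D", "E"))]):
    print(rule)
```

Each pass through the while-loop peels one variable off the front, so a body of length m yields exactly m − 2 new variables, as in the construction above.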

29.6 Example:
Reduce the following grammar G to CNF. G is S → aAD, A → aB | bAB, B → b, D → d.
Solution:
Step 1: As there are no null productions or unit productions, we can proceed to step 2.

29.7 Cont

29.8 Cont

30.1 Greibach Normal form (GNF)


In GNF we put restrictions not on the length of the right-hand sides of productions, but on the
positions in which terminals and variables can appear.
A CFG is said to be in GNF if all the productions have the form A → aα, where a is a
terminal and α is a (possibly empty) string of nonterminals.
To convert a given grammar into GNF, the following two lemmas are used.
To convert given grammar into GNF the following two lemmas are considered.

30.2 Lemma 1:
Let G = (VN, Σ, P, S) be a CFG. Let A → αBγ be an A-production in P and let the B-productions be
B → β1 | β2 | … | βs.
Let G1 = (VN, Σ, P1, S) be obtained from G by deleting the production A → αBγ from P and
adding the productions A → αβ1γ | αβ2γ | … | αβsγ.
Then L(G) = L(G1).

30.3 Lemma 2:
Let G = (VN, Σ, P, S) be a CFG. Let the set of A-productions be A → Aα1 | Aα2 | … | Aαr | β1 | β2 | … | βs.
Note that the βi's do not start with A.
Let G1 = (VN ∪ {Z}, Σ, P1, S) be the CFG formed by adding the new variable Z to VN and replacing all the
A-productions by the following productions.

30.4 Cont
Where P1 is defined as follows:
(i) A → βi and A → βiZ for 1 ≤ i ≤ s;
(ii) Z → αi and Z → αiZ for 1 ≤ i ≤ r;
(iii) the productions for the other variables are as in P. Then G1 is a CFG and equivalent to G.

30.5 Reduction of CFG to GNF


The steps to construct a grammar in GNF are:
Step 1: Eliminate null and unit productions and convert the grammar into CNF.
Step 2: Rename all the variables with the same variable name and different subscripts. For example, if
S, A, B are the variables, they are renamed as A1, A2, A3.

30.6 Cont
Step 3: Choose a production in which the subscript of the left-hand-side variable is greater than
the subscript of the first variable on the right-hand side. Then apply Lemma 1 or Lemma 2 according to the production.
Step 4: Repeat applying lemma 1 or lemma 2 for all the productions till the grammar comes into
GNF.

30.7 Example:
Construct a grammar in GNF equivalent to the grammar

30.8 Solution:
The given grammar is in CNF. S and A are renamed as A1 and A2 respectively, so the
productions are as follows.
As the given grammar has no null productions and
is in CNF, we need not carry out step 1. So we proceed to step 2.

31.1 Cont

31.2 Cont

31.3 Cont

31.4 Cont
Hence the equivalent grammar is
Where P1 consists of

31.5 Closure Property of CFLs


The closure properties of CFLs are as follows:
CFLs are closed under union.
CFLs are closed under concatenation.
CFLs are closed under Kleene closure.
CFLs are not closed under intersection.
CFLs are not closed under complementation.
CFLs are closed under substitution.
CFLs are closed under homomorphism.

31.6 Theorem 1:
The CFLs are closed under union operation.

31.7 Cont
We assume that the sets V1 and V2 are disjoint, i.e. V1 ∩ V2 = ∅ (we can assume this because if
the sets are not disjoint we can make them so by renaming variables in one of the grammars).
Consider the following grammar:
G = (V1 ∪ V2 ∪ {S}, T1 ∪ T2, P1 ∪ P2 ∪ {S → S1 | S2}, S)
The above grammar is basically a combination of the grammars G1 and G2 in which we have
added the new start symbol S and a new production rule S → S1 | S2.

31.8 Theorem:
The CFLs are closed under the concatenation operation.
Let L = {ab}, s(a) = L1 and s(b) = L2. Then s(L) = L1L2.
To get a grammar for L1L2:
Add a new start symbol S and the rule S → S1S2.
We get G = (V, T, P, S) where
V = V1 ∪ V2 ∪ {S}, where S ∉ V1 ∪ V2
P = P1 ∪ P2 ∪ {S → S1S2}

32.1 Example:
L1 = { a^n b^n | n ≥ 0 } with G1: S1 → aS1b | ε
L2 = { b^n a^n | n ≥ 0 } with G2: S2 → bS2a | ε
L1L2 = { a^n b^{n+m} a^m | n, m ≥ 0 } with G = ({S, S1, S2}, {a, b}, {S → S1S2, S1 →
aS1b | ε, S2 → bS2a | ε}, S)
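The combined grammar for L1L2 can be checked experimentally by enumerating the short strings it generates; the brute-force generator below and the single-uppercase-letter encoding of S1, S2 as X, Y are illustrative assumptions.

```python
def derive(productions, start, max_len):
    """Enumerate all terminal strings of length <= max_len generated by a
    CFG, by exhaustively expanding the leftmost variable of each sentential
    form (a brute-force sketch, not an efficient parser)."""
    results, seen, frontier = set(), {(start,)}, [(start,)]
    while frontier:
        form = frontier.pop()
        for i, sym in enumerate(form):
            if sym.isupper():                     # convention: uppercase = variable
                for head, body in productions:
                    if head == sym:
                        new = form[:i] + body + form[i + 1:]
                        terminals = sum(1 for s in new if not s.isupper())
                        if terminals <= max_len and new not in seen:
                            seen.add(new)
                            frontier.append(new)
                break                             # only expand the leftmost variable
        else:                                     # no variable left: a terminal string
            results.add("".join(form))
    return results

# G for L1L2 with S -> S1 S2 encoded as S -> X Y (X = S1, Y = S2):
prods = [("S", ("X", "Y")), ("X", ("a", "X", "b")), ("X", ()),
         ("Y", ("b", "Y", "a")), ("Y", ())]
print(sorted(derive(prods, "S", 4)))  # ['', 'aabb', 'ab', 'abba', 'ba', 'bbaa']
```

Every generated string has the shape a^n b^{n+m} a^m, and strings like "aa" that break the pattern never appear.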

32.2 Theorem:
The CFLs are closed under the Kleene closure operation.
Use L = {a}* (or L = {a}+), s(a) = L1. Then s(L) = L1* (or s(L) = L1+).
To get a grammar for (L1)*:
Add a new start symbol S and the rules S → SS1 | ε.
We get G = (V, T, P, S) where

V = V1 ∪ {S}, where S ∉ V1
P = P1 ∪ {S → SS1 | ε}

32.3 Example:
L1 = { a^n b^n | n ≥ 0 }; (L1)* = { a^{n1}b^{n1} … a^{nk}b^{nk} | k ≥ 0 and ni ≥ 0 for all i }
L2 = { a^{n²} | n ≥ 1 }; (L2)* = a*

32.4 Theorem:
The CFLs are closed under homomorphism.
Suppose L is a CFL over an alphabet Σ and h is a homomorphism on Σ.
Let s be the substitution that replaces every a ∈ Σ by h(a), i.e. s(a) = {h(a)}.
Then h(L) = s(L).
h(L) = { h(a1)…h(ak) | a1…ak ∈ L, k ≥ 0 }.

32.5 Theorem:
The CFLs are closed under substitution.
If a substitution s assigns a CFL to every symbol in the alphabet of a CFL L, then s(L) is a CFL.
Proof:
Let G = (V, Σ, P, S) be a grammar for L.
Let Ga = (Va, Ta, Pa, Sa) be a grammar for s(a), for each a ∈ Σ, with V ∩ Va = ∅.
Let G′ = (V′, T′, P′, S) be the grammar for s(L).

32.6 Cont
Where,
V′ = V ∪ (∪a Va)
T′ = ∪a Ta for all a ∈ Σ
P′ consists of
all productions in any Pa for a ∈ Σ, and
the productions of P, with each terminal a replaced by Sa.
A detailed proof that this construction works is left to the reader.
Intuition: this replacement allows any string in La to take the place of any occurrence of a in any
string of L.

32.7 Example:
L = { 0^n 1^n | n ≥ 1 }, generated by the grammar S → 0S1 | 01,
s(0) = { a^n b^m | n ≥ m ≥ 1 }, generated by the grammar S → aSb | A; A → aA | ab,
s(1) = {ab, abc}, generated by the grammar S → abA, A → c | ε

Rename second and third Ss to S0 and S1 respectively. Rename second A to B.


Resulting grammars are:

32.8 Cont
S → 0S1 | 01
S0 → aS0b | A; A → aA | ab
S1 → abB; B → c | ε
In the first grammar replace 0 by S0 and 1 by S1. The combined grammar:
G = ({S, S0, S1, A, B}, {a, b, c}, P, S),
where P = { S → S0SS1 | S0S1, S0 → aS0b | A, A → aA | ab, S1 → abB, B → c | ε }

33.1 Application of Substitution:

Closure under union of CFLs L1 and L2.
Closure under concatenation of CFLs L1 and L2.
Closure under Kleene star (closure *) and positive closure (+) of a CFL L1.
Closure under homomorphism of a CFL L, applying h to every a ∈ Σ.

33.2 Pumping lemma for CFL


Theorem: Let L be a context-free language. Then we can find a natural number n such that every z ∈ L with |z| ≥ n can be written as z = uvwxy, where |vwx| ≤ n, |vx| ≥ 1, and uv^i wx^i y ∈ L for all i ≥ 0.

33.3 Example:
Show that L = { a^p | p is prime } is not a CFL.
Solution: We use the following property of L: if w ∈ L, then |w| is a prime.
Step 1: Suppose L = L(G) is context-free. Let n be the natural number obtained from the
pumping lemma.

33.4 Cont
Step 2: Let p be a prime number greater than n. Then z = a^p ∈ L. We write z = uvwxy.
Step 3: By the pumping lemma, uv^0 wx^0 y = uwy ∈ L. So |uwy| is a prime number, say q. Let |vx| = r.
Then |uv^q wx^q y| = q + qr = q(1 + r). As q(1 + r) is not a prime, uv^q wx^q y does not belong to L. This is a
contradiction. Therefore, L is not context-free.

33.5 CYK Algorithm to decide membership in CFL


We now present a cubic-time algorithm due to Cocke, Younger and Kasami. It uses the dynamic-programming
technique: it solves smaller sub-problems first and then builds up the solution by
combining smaller sub-solutions. It determines, for each substring y of the given string x, the set
of all nonterminals that generate y. This is done inductively on the length of y.
Let G = (V, Σ, P, S) be the given CFG in CNF. Consider the given string x and let x = a1a2…an.
Let x_ij be the substring of x that begins at position i (i.e. at the i-th symbol of x) and has length j.
Let N_ij be the set of all nonterminals A such that A ⇒* x_ij. We write x = x_1n.

For substrings of length 1, A ⇒* x_i1 iff A → ai is a production, where ai is a terminal symbol.
That is, N_i1 = { A | A → ai is in P }. Thus we construct the sets N_i1 for all 1 ≤ i ≤ n by
inspecting the grammar.

Considering substrings of length 2, it is clear that A ⇒* x_i2 in G iff there is a production
A → BC with B ⇒* x_i1 and C ⇒* x_{i+1,1}. That is, A ∈ N_i2 iff there is a production A → BC
with B ∈ N_i1 and C ∈ N_{i+1,1}. Thus we can construct the sets N_i2 from the already
constructed sets N_i1 by inspecting the grammar.

In general, considering substrings x_ij of length j, A ⇒* x_ij in G iff there is a production
A → BC and some k, 1 ≤ k < j, such that B ⇒* x_ik and C ⇒* x_{i+k,j−k}. That is, A ∈ N_ij iff
there is a production A → BC and some k, 1 ≤ k < j, such that B ∈ N_ik and C ∈ N_{i+k,j−k}.
The idea is to divide x_ij into smaller substrings in all possible ways (i.e. for the different
values of k), and construct N_ij from the already constructed sets for the smaller substrings
(i.e. N_ik and N_{i+k,j−k}) by inspecting the grammar.

These sets for longer substrings of x are constructed inductively until the set N_1n for the
string x = x_1n is constructed. It is clear from the construction that x ∈ L(G) iff S ∈ N_1n.
Hence, we can determine whether x ∈ L(G) by inspecting N_1n.
The CYK algorithm is presented next.

33.6 CYK-Algorithm
Input: A CFG G = (V, Σ, P, S) in CNF and a string x = a1a2…an.
Initialize: N_i1 := { A | A → ai is in P } for all i.

for j := 2 to n do /* determine N_ij for all i */
  for i := 1 to n − j + 1 do /* no sense in considering i, j with i + j − 1 > n */
    N_ij := ∅
    for k := 1 to j − 1 do /* try substrings of x_ij of length k */
      N_ij := N_ij ∪ { A | A → BC is in P, B ∈ N_ik, C ∈ N_{i+k,j−k} }
Accept x iff S ∈ N_1n.

33.7 Cont

The correctness of the algorithm can be proved by induction on j: whenever the outer loop
finishes for a particular j, the set N_ij contains exactly the nonterminals A that can derive
x_ij (for all i).

It is easy to conclude that the time complexity of this algorithm is O(n³), where n = |x|
and the grammar G is "fixed" in the sense that the size of the grammar is not considered as
input in measuring complexity.
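The algorithm translates almost line for line into Python; the dict-of-bodies grammar encoding below is an assumption made for illustration, and the example grammar is the one used in the next section.

```python
def cyk(grammar, start, x):
    """CYK membership test for a grammar in CNF.
    grammar: dict mapping each head to a list of bodies, where a body is
    either a 1-tuple (terminal) or a 2-tuple (variable, variable).
    Returns the table N[(i, j)] and whether x is in the language."""
    n = len(x)
    N = {}
    # base case: substrings of length 1
    for i in range(1, n + 1):
        N[i, 1] = {A for A, bodies in grammar.items()
                   for b in bodies if b == (x[i - 1],)}
    # inductive step: substrings of length j, split at every k
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            N[i, j] = set()
            for k in range(1, j):
                for A, bodies in grammar.items():
                    for b in bodies:
                        if len(b) == 2 and b[0] in N[i, k] and b[1] in N[i + k, j - k]:
                            N[i, j].add(A)
    member = n > 0 and start in N[1, n]
    return N, member

g = {"S": [("A", "B"), ("A", "C")],
     "A": [("B", "C"), ("a",)],
     "B": [("C", "B"), ("b",)],
     "C": [("A", "A"), ("b",)]}
table, member = cyk(g, "S", "baaaab")
print(member)               # True
print(sorted(table[1, 6]))  # ['B', 'S']
```

The three nested loops over j, i and k mirror the pseudocode above and give the O(n³) running time for a fixed grammar.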

33.8 Example :
Consider the CFG:
S → AB | AC
A → BC | a
B → CB | b
C → AA | b
Let us decide membership for the string x = baaaab using the CYK algorithm.
The table of the sets N_ij is shown below.

34.1 Cont
Word: b a a a a b

Cell (i, j) of the table contains N_ij, the set of nonterminals deriving the substring of length j
beginning at position i.

34.2 Cont
The top row (j = 1) is filled in by the first step of the algorithm, e.g. N_11 = {B, C}, because
B → b and C → b are productions. We can compute the contents of the second row by using the
contents of the first row (already done) and inspecting the grammar.

34.3 Cont
For example, to compute N_12 (i.e. the set of nonterminals that derive x_12 = ba) we check whether
there is a production A → BC with B ∈ N_11 and C ∈ N_21; since no such production exists,
N_12 = ∅.

34.4 Cont
Similarly, since C → AA is a production and A ∈ N_21, A ∈ N_31, we have N_22 = {C}.
Now, consider the first element of the third row, N_13. There are two ways to break up
x_13 = baa: b·aa (k = 1) and ba·a (k = 2).

34.5 Cont
Consider k = 1. Since B ∈ N_11, C ∈ N_22 and A → BC is a production, we put A into N_13.
(If we consider the other way, i.e. k = 2, we find that N_12 = ∅, and hence no more symbols
can be added to N_13.) Continuing this way we fill up the
whole table as given below.

34.6 Cont

  j\i    1       2      3      4      5      6
  1     {B,C}   {A}    {A}    {A}    {A}    {B,C}
  2     ∅       {C}    {C}    {C}    {S}
  3     {A}     {S}    {S}    {B}
  4     {C}     ∅      {S}
  5     {S}     {B}
  6     {S,B}

34.7 Cont
x = x_16 = baaaab
Since S ∈ N_16 = {S, B}, baaaab is a member of the language generated by the grammar.

Chapter 5
34.8 PUSHDOWN AUTOMATA (PDA)
A regular language can be characterized as a language accepted by a finite automaton. Similarly,
we can characterize the context-free languages as the languages accepted by a class of machines
called "pushdown automata" (PDA). A pushdown automaton is an extension of the NFA.

35.1 FA Vs. PDA


It is observed that the FA has limited capability (in the sense that the class of languages accepted or
characterized by them is small). This is due to the "finite memory" (number of states) and the absence
of "external memory". A PDA is simply an NFA augmented with an "external
stack memory". The addition of a stack provides the PDA with a last-in, first-out memory
management capability.

35.2 Cont
This "stack" or "pushdown store" can be used to record potentially unbounded information. It is
due to this memory-management capability that a PDA can overcome
the memory limitations that prevent an FA from accepting many interesting languages, such as
{ a^n b^n | n ≥ 0 }.

35.3 Function of PDA


A PDA can store an unbounded amount of information on the stack, but its access to the information
on the stack is limited. It can push an element onto the top of the stack and pop an element off
the top of the stack. To read down into the stack, the top elements must be popped off and
are lost. Due to this limited access to the information on the stack, a PDA still has some
limitations and cannot accept some other interesting languages.

35.4 PDA Model

(PDA model)

35.5 Components of PDA


As the figure shows, a PDA has three components:
an input tape with a read-only head,
a finite control, and
a pushdown store.

35.6 Functions of Each Component


The input head is read-only and may only move from left to right, one symbol (or cell) at a time.
In each step, the PDA pops the top symbol off the stack; based on this symbol, the input symbol
it is currently reading, and its present state, it can push a sequence of symbols onto the stack,
move its read-only head one cell (or symbol) to the right, and enter a new state, as defined by the
transition rules of the PDA.

35.7 Cont
PDAs are nondeterministic by default. That is, ε-transitions are also allowed, in which the PDA
can pop and push, and change state, without reading the next input symbol or moving its read-only head. Besides this, there may be multiple options for possible next moves.

35.8 Formal Definitions:


Formally, a PDA M is a 7-tuple M = (Q, Σ, Γ, δ, q0, z0, F)
where,
Q is a finite set of states,
Σ is a finite set of input symbols (the input alphabet),
Γ is a finite set of stack symbols (the stack alphabet),
δ is a transition function from Q × (Σ ∪ {ε}) × Γ to the finite subsets of Q × Γ*,
q0 ∈ Q is the start state,
z0 ∈ Γ is the initial stack symbol, and
F ⊆ Q is the set of final or accept states.

36.1 Explanation of the transition function, :


The transition function δ can be explained depending on the input a:
the transition function for a ∈ Σ, and
the transition function for a = ε.

36.2 Transition function if a ∈ Σ


If, for any a ∈ Σ, δ(q, a, z) = { (p1, γ1), (p2, γ2), …, (pm, γm) }, this means intuitively that
whenever the PDA is in state q, reading input symbol a with z on top of the stack, it can
nondeterministically, for any i (1 ≤ i ≤ m),

go to state pi,
pop z off the stack,

36.3 Cont

push γi onto the stack (the usual convention is that if γi = X1X2…Xk, then X1
will be at the top and Xk at the bottom), and
move the read head right one cell past the current symbol a.

36.4 Transition function if a =


If a = ε, then δ(q, ε, z) = { (p1, γ1), (p2, γ2), …, (pm, γm) } means intuitively that whenever the
PDA is in state q with z on top of the stack, regardless of the current input symbol, it can
nondeterministically, for any i (1 ≤ i ≤ m),

go to state pi,
pop z off the stack,
push γi onto the stack, and
leave its read-only head where it is.

36.5 State transition diagram:


A PDA can also be depicted by a state transition diagram. The labels on the arcs indicate both
the input and the stack operation. A transition (p, γ) ∈ δ(q, a, z) is depicted by an arc from q
to p labeled a, z/γ.
Final states are indicated by double circles and the start state is indicated by an arrow to it from
nowhere.

36.6 Configuration or Instantaneous Description (ID):


A configuration or an instantaneous description (ID) of a PDA at any moment during its
computation is an element of Q × Σ* × Γ* describing the current state, the portion of the input
remaining to be read (i.e. under and to the right of the read head), and the current stack contents.
Only these three elements can affect the computation from that point on and, hence, are parts of
the ID.

36.7 Cont
The start or initial configuration (or ID) on input w is (q0, w, z0). That is, the PDA always starts
in its start state q0, with its read head pointing to the leftmost input symbol and the stack
containing only the start/initial stack symbol z0.

36.8 Cont
Let M be a PDA. A move relation, denoted by ⊢, between IDs is defined as follows:
(q, aw, zα) ⊢ (p, w, γα) if (p, γ) ∈ δ(q, a, z), and
(q, w, zα) ⊢ (p, w, γα) if (p, γ) ∈ δ(q, ε, z).
We write ⊢* for the reflexive-transitive closure of ⊢.

37.1 Language accepted by a PDA M


There are two alternative definitions of acceptance, as given below.
Acceptance by final state
Acceptance by empty stack (or Null stack)

37.2 Acceptance by final state:


Consider the PDA M = (Q, Σ, Γ, δ, q0, z0, F). Informally, the PDA M is said to accept its input w
by final state if it enters any final state in zero or more moves after reading its entire input,
starting in the start configuration on input w.
Formally, we define L(M), the language accepted by final state, to be
{ w ∈ Σ* | (q0, w, z0) ⊢* (q, ε, γ) for some q ∈ F and γ ∈ Γ* }

37.3 Acceptance by empty stack (or Null stack):


The PDA M accepts its input w by empty stack if, starting in the start configuration on input w,
it empties its stack after reading the entire input.
Formally, we define N(M), the language accepted by empty stack, to be
{ w ∈ Σ* | (q0, w, z0) ⊢* (q, ε, ε) for some q ∈ Q }
Note that the set of final states F is irrelevant in this case, and we usually let F be the
empty set, i.e. F = ∅.

37.4 Example:
Here is a PDA that accepts the language { a^n b^n | n ≥ 1 } by final state:

37.5 Cont
M = ({q1, q2, q3, q4}, {a, b}, {a, z}, δ, q1, z, {q4}), where δ consists of the following
transitions (the state names are an assumed labeling, consistent with the description below):
1. δ(q1, a, z) = {(q2, az)}
2. δ(q2, a, a) = {(q2, aa)}
3. δ(q2, b, a) = {(q3, ε)}
4. δ(q3, b, a) = {(q3, ε)}
5. δ(q3, ε, z) = {(q4, z)}

37.6 Cont
The PDA can also be described by the adjacent transition diagram.

Informally, whenever the PDA M sees an input a in the start state q1 with the start symbol z on
the top of the stack, it pushes a onto the stack and changes state to q2 (to remember that it has
seen the first 'a'). In state q2, if it sees any more a's, it simply pushes them onto the stack. Note that
when M is in state q2, the symbol on the top of the stack can only be a. In state q2, if it sees the
first b with a on the top of the stack, then it needs to start comparing the numbers of a's and b's,
since all the a's at the beginning of the input have already been pushed onto the stack. It starts this
process by popping the a off the top of the stack and enters state q3 (to remember that the
comparison process has begun). In state q3, it expects only b's in the input (if it sees any more a's
in the input, the input is not of the form a^n b^n); hence no move is defined on
input a in state q3. In state q3, it pops an a off the top of the stack for every b in
the input. When it sees the last b in state q3, the last a is popped off the stack and
the start symbol z is exposed. This is the only case in which the PDA (on ε-input) can
move to state q4, which is an accept state.

We can show the computation of the PDA on a given input using the IDs and next move
relations. For example, following are the computation on two input strings.

37.7 Acceptance of string


Let the input be aabb. We start with the start configuration and proceed to the subsequent IDs
using the transition function defined above:
(q1, aabb, z) ⊢ (q2, abb, az) (using transition 1)
⊢ (q2, bb, aaz) (using transition 2)
⊢ (q3, b, az) (using transition 3)
⊢ (q3, ε, z) (using transition 4)
⊢ (q4, ε, z) (using transition 5)
q4 is a final state. Hence, accept. So the string aabb is rightly accepted by M.

37.8 Cont
i) Let the input be aabab.
(q1, aabab, z) ⊢ (q2, abab, az) ⊢ (q2, bab, aaz) ⊢ (q3, ab, az)
No further move is defined at this point.

Hence the PDA gets stuck and the string aabab is not accepted.
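The computations on aabb and aabab can be replayed with a small simulator; the state names q1–q4 are an assumed labeling (the original only names q3 explicitly), and the helper below is a sketch, not part of the original text.

```python
def run_anbn_pda(w):
    """Simulate a PDA for {a^n b^n | n >= 1}, acceptance by final state.
    States: q1 (start), q2 (pushing a's), q3 (matching b's), q4 (accept);
    z is the initial stack symbol. The labeling is an assumption."""
    state, stack = "q1", ["z"]
    for c in w:
        top = stack[-1]
        if state == "q1" and c == "a" and top == "z":
            stack.append("a"); state = "q2"   # transition 1: push the first a
        elif state == "q2" and c == "a" and top == "a":
            stack.append("a")                 # transition 2: push further a's
        elif state == "q2" and c == "b" and top == "a":
            stack.pop(); state = "q3"         # transition 3: start the comparison
        elif state == "q3" and c == "b" and top == "a":
            stack.pop()                       # transition 4: match one more b
        else:
            return False                      # no move defined: the PDA gets stuck
    if state == "q3" and stack[-1] == "z":
        state = "q4"                          # transition 5: epsilon-move to accept
    return state == "q4"

print(run_anbn_pda("aabb"))   # True
print(run_anbn_pda("aabab"))  # False
```

On aabab the simulator reaches state q3 with input symbol a, for which no move is defined, exactly as in the ID sequence above.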
Deterministic and non-deterministic PDA
Like the FA, the PDA also may be deterministic or nondeterministic. We will discuss these one by one
here.

38.1 Deterministic PDA (DPDA)


A PDA is said to be deterministic if, from every ID, at most one move is possible.
Example: Design a DPDA to accept the language L = { ww^R | w ∈ {a, b}* }.
Solution:
For w ∈ {a, b}*,
w = ε, a, b, aa, bb, ab, ba, aab, …
and the language L = { ε, aa, bb, abba, baab, aabbaa, … }.
As we know, a PDA is M = (Q, Σ, Γ, δ, q0, z0, F).

38.2 Cont

δ can be defined as:
1. δ(q0, a, z0) = (q0, az0)
2. δ(q0, b, z0) = (q0, bz0)
3. δ(q0, a, b) = (q0, ab)
4. δ(q0, b, a) = (q0, ba)
5. δ(q0, a, a) = (q1, ε)
6. δ(q0, b, b) = (q1, ε)
7. δ(q1, a, a) = (q1, ε)
8. δ(q1, b, b) = (q1, ε)
9. δ(q0, ε, z0) = (q2, z0)
10. δ(q1, ε, z0) = (q2, z0)

38.3 Cont
The PDA reaches the centre of the string only when the input symbol and the topmost symbol of the
pushdown stack are the same.
Let the input be w = abba.
(q0, abba, z0) ⊢ (q0, bba, az0) (by rule 1)
⊢ (q0, ba, baz0) (by rule 4)
⊢ (q1, a, az0) (by rule 6)
⊢ (q1, ε, z0) (by rule 7)
⊢ (q2, ε, z0) (by rule 10)
Hence the string is accepted by final state in the DPDA.

38.4 Non-Deterministic PDA (NPDA)


A PDA is called non-deterministic if some ID admits more than one possible move.
Example: Design an NPDA to accept the language L = { ww^R | w ∈ {a, b}* }.
Solution:
NPDA, M = (Q, Σ, Γ, δ, q0, z0, F)

38.5 Cont
δ can be defined as:
1. δ(q0, a, z0) = {(q0, az0)}
2. δ(q0, b, z0) = {(q0, bz0)}
3. δ(q0, a, b) = {(q0, ab)}
4. δ(q0, b, a) = {(q0, ba)}
5. δ(q0, a, a) = {(q0, aa), (q1, ε)}
6. δ(q0, b, b) = {(q0, bb), (q1, ε)}
7. δ(q1, a, a) = {(q1, ε)}
8. δ(q1, b, b) = {(q1, ε)}
9. δ(q1, ε, z0) = {(q2, z0)}

38.6 Cont
Let the input be w = abba.
(q0, abba, z0) ⊢ (q0, bba, az0) (by rule 1)
⊢ (q0, ba, baz0) (by rule 4)
⊢ (q1, a, az0) (by rule 6)
⊢ (q1, ε, z0) (by rule 7)
⊢ (q2, ε, z0) (by rule 9)
Hence the string is accepted by final state in the NPDA.
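Because rules 5 and 6 give two choices, simulating this NPDA means exploring every configuration reachable from the start ID; the exhaustive search below is an illustrative sketch of that, with configurations encoded as (state, input position, stack tuple with the top last).

```python
def accepts_wwr(w):
    """Exhaustively simulate the NPDA for {w w^R | w in {a,b}*} (rules 1-9).
    A configuration is (state, input position, stack as a tuple, top last)."""
    start = ("q0", 0, ("z0",))
    frontier, seen = [start], {start}
    while frontier:
        state, pos, stack = frontier.pop()
        if state == "q2" and pos == len(w):
            return True                                    # final state, input consumed
        top = stack[-1]
        moves = []
        if state == "q0" and pos < len(w):
            c = w[pos]
            moves.append(("q0", pos + 1, stack + (c,)))    # rules 1-6: the push option
            if c == top and top != "z0":
                moves.append(("q1", pos + 1, stack[:-1]))  # rules 5, 6: guess the middle
        if state == "q1" and pos < len(w) and w[pos] == top and top != "z0":
            moves.append(("q1", pos + 1, stack[:-1]))      # rules 7, 8: match and pop
        if state == "q1" and top == "z0":
            moves.append(("q2", pos, stack))               # rule 9: epsilon-move
        for m in moves:
            if m not in seen:
                seen.add(m)
                frontier.append(m)
    return False

print(accepts_wwr("abba"))  # True
print(accepts_wwr("abab"))  # False
```

The accepting run for abba is exactly the ID sequence shown above; for abab no branch of the search ever reaches q2 with the whole input read.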

38.7 Pushdown Automata and CFLs


In this section we prove that the sets accepted by PDAs, either by null store or by final state, are
precisely the CFLs.
Converting from CFG to PDA
Theorem: If L is a context-free language, then we can construct a PDA A accepting L by empty store, i.e.
L = N(A).
Let L = L(G), where G = (VN, Σ, P, S) is a CFG. We construct a PDA A as
A = ({q}, Σ, VN ∪ Σ, δ, q, S, ∅),
where δ is defined by the following rules:
R1: δ(q, ε, A) = { (q, α) | A → α is in P } for every variable A, and
R2: δ(q, a, a) = { (q, ε) } for every terminal a.

38.8 Cont
The above rules can be explained as follows:
The pushdown symbols in A are variables and terminals. If the PDA reads a variable A on the
top of the PDS (pushdown store, or stack), it makes an ε-move by pushing the R.H.S of any A-production (after erasing A). If the PDA reads a terminal a on the PDS and it matches the
current input symbol, then the PDA erases a. In all other cases the PDA halts.

39.1 Example:
Construct a PDA A equivalent to the following CFG: S → 0BB, B → 0S | 1S | 0. Test whether 010^4
is in N(A).
Solution:
Define PDA A as follows:

39.2 Cont
δ is defined by the following rules:
δ(q, ε, S) = {(q, 0BB)}
δ(q, ε, B) = {(q, 0S), (q, 1S), (q, 0)}
δ(q, 0, 0) = {(q, ε)}
δ(q, 1, 1) = {(q, ε)}

39.3 Cont

Thus 010^4 is in N(A).
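The empty-store PDA built from this grammar can be simulated by a recursive search over its configurations; the pruning rule below (discard any stack longer than the remaining input) is an assumption that is safe here because every symbol of this grammar derives at least one terminal.

```python
def accepts_by_empty_store(w, productions, start):
    """Simulate the one-state PDA A built from a CFG, acceptance by empty
    store. The stack is a tuple with the top at index 0. Pruning stacks
    longer than the remaining input keeps the search finite (an assumption
    that holds for grammars with no nullable or unit chains)."""
    def step(pos, stack):
        if not stack:
            return pos == len(w)              # stack emptied: accept iff input consumed
        if len(stack) > len(w) - pos:
            return False                      # cannot possibly match the rest
        top, rest = stack[0], stack[1:]
        if top.isupper():                     # variable on top: expand by an epsilon-move
            return any(step(pos, body + rest)
                       for head, body in productions if head == top)
        # terminal on top: must match the current input symbol, then pop
        return pos < len(w) and w[pos] == top and step(pos + 1, rest)
    return step(0, (start,))

prods = [("S", ("0", "B", "B")),
         ("B", ("0", "S")), ("B", ("1", "S")), ("B", ("0",))]
print(accepts_by_empty_store("010000", prods, "S"))  # True: 010^4 is in N(A)
print(accepts_by_empty_store("0101", prods, "S"))    # False
```

The accepting run mirrors the leftmost derivation S ⇒ 0BB ⇒ 01SB ⇒ 010BBB ⇒ 010000.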

MODULE 3
Chapter 6
39.4 Turing machine
The Turing machine (TM) is a simple mathematical model of a general-purpose computer. In
other words, the TM models the computing power of a computer; i.e., the TM is capable of
performing any calculation that can be performed by any computing machine.

39.5 Cont
The machine may take its own output as input for further operation; hence there is no distinction
between the input and output sets. The machine can choose the current location and also decide
whether to move left or right in the memory. The output of the machine may also be stored in the
memory and can sometimes be used as the input.

39.6 Model of TM
The model of Turing machine as shown in below:

39.7 Function of R/w Head


The TM can be thought of as a finite state automaton connected to a R/W (read/write) head. It
has one tape, which is divided into a number of cells. Each cell can store only one symbol. The
input to and output from the finite state automaton are effected by the R/W head, which can examine
one cell at a time.

39.8 Cont
In one move, the machine examines the present symbol under the R/W head on the tape and the
present state of the automaton to determine:
A new symbol to be written on the tape in the cell under the R/W head.
A motion of the R/W head along the tape, either the head moves one cell left (L) or one
cell Right (R).
The next state of the automaton.
Whether to halt or not.

40.1 Formal Definition of TM


The formal definition of a TM M is a 7-tuple (Q, Σ, Γ, δ, q0, B, F), where
Q is a finite nonempty set of states,
Γ is a finite non-empty set of tape symbols, called the tape alphabet of M,
Σ ⊆ Γ is a finite non-empty set of input symbols, called the input alphabet of M,

40.2 Cont

δ : Q × Γ → Q × Γ × {L, R} is the transition function of M,
q0 ∈ Q is the initial or start state,
B ∈ Γ − Σ is the blank symbol, and
F ⊆ Q is the set of final states.

40.3 Description
So, given the current state and tape symbol being read, the transition function describes the next
state, symbol to be written on the tape, and the direction in which to move the tape head ( L and
R denote left and right, respectively ).
Notes:
The blank symbol B may also be denoted by # or b.
The acceptability of a string is decided by the reachability from the initial state to some
final state. So the final states are also called the accepting states.
δ may not be defined for some elements of Q × Γ.

40.4 Representation of Turing machines


The TM can be represented by using
Instantaneous description using move relations.
Transition table.
Transition diagram or transition graph.

40.5 Representation by instantaneous descriptions


A Turing machine's ID is defined in terms of the entire input string and the current state.
Definition: An ID of a TM M is a string αqβ, where q is the present state of M. The entire input string
is split as αβ: the first symbol of β is the current symbol a under the R/W head and β has all the
subsequent symbols of the input string, while the string α is the substring of the input string
formed by all the symbols to the left of a.

40.6 ID of TM
The instantaneous description of TM is shown in below. Let us consider the given below TM.

(Turing Machine)

40.7 Description of TM ID
From the above figure, the present symbol under the R/W head is a1. The present state is q3. The
nonblank symbols to the left of a1 form the string a4a1a2a1a2a2, which is written to the left of q3.
The sequence of nonblank symbols to the right of a1 is a4a2.

40.8 Representation of ID
Thus the ID of the given TM is shown in below.

(Representation of ID)

41.1 Moves in a TM:


As in the case of a PDA, δ(q, x) induces a change in the ID of the TM; we call this change of ID a
move. Suppose δ(q, xi) = (p, y, L). The input string to be processed is x1x2…xn, and the present
symbol under the R/W head is xi. So the ID before processing xi is x1x2…xi−1 q xi xi+1…xn.
After processing xi, the resulting ID is x1x2…xi−2 p xi−1 y xi+1…xn.
This change of ID is represented by x1x2…xi−1 q xi…xn ⊢ x1x2…xi−2 p xi−1 y xi+1…xn.

41.2 Cont
If i = 1, the resulting ID is p y x2x3…xn.
If δ(q, xi) = (p, y, R), then the change of ID is represented by x1x2…xi−1 q xi…xn ⊢ x1x2…xi−1 y p xi+1…xn.

41.3 Representation by Transition Table


The transition function δ can be defined in the form of a table called the transition table. If δ(q, a)
= (p, y, D), we write (p, y, D) under the a-column and in the q-row. So if we get (p, y, D) in the
table, it means that y is written in the current cell, D gives the movement of the head (either L or
R) and p denotes the new state into which the TM enters.

41.4 Transition Table of TM


The transition table of TM M is shown in below:

Note: Here # denotes the blank symbol.

41.5 Representation by Transition Diagram


A transition diagram is a directed labeled graph in which each vertex or node represents a state
and the directed edges indicate the transitions between states; the edges are labeled with triples of
the form (x, y, D), where x and y are tape symbols and D is a direction, either L or R. In other
words, whenever δ(q, x) = (p, y, D), we find the label (x, y, D) on the arc from q to p. The initial
state is indicated by an incoming arrow and the final state is indicated by a double circle.

41.6 Example of Transition Diagram


The example of transition diagram of TM for the above transition table is shown in below:

41.7 The language of a Turing machine


To test the acceptability of a string by a TM, the input string is first placed on the tape and the
tape head is positioned at the leftmost input symbol. If the TM enters an accepting state, then the input
is accepted; otherwise it is not. The set of languages we can accept using TMs is called the
recursively enumerable languages, or RE languages.

41.8 Example:
Show the IDs of the TM given in the table above when the input tape contains: a) 01 and b) 1011010.

42.1 Solution:
a) Processing of the string 01# is as follows:
q001# ⊢ 0q01# ⊢ 01q1# ⊢ 011p
b) Processing of the string 1011010# is as follows:
q01011010# ⊢ 1q1011010# ⊢ 10q111010# ⊢ 101q01010# ⊢ 1011q1010#
⊢ 10110q110# ⊢ 101101q00# ⊢ 1011010q0# ⊢ 10110100p

42.2 Design of Turing machines


The objectives in designing a TM are as follows:
Scan the symbol under the R/W head to decide the next move; the machine remembers the
previously scanned symbol through its next state.
Minimize the number of states by changing state only when there is a change in the
movement of the R/W head or when there is a change in the written symbol.

42.3 Example: parity counter problem


Design a Turing machine that reads binary strings and counts the number of 1s in the sequence.
The output is 0 if the number of 1s in the string is even and 1 if the number of 1s is odd.

42.4 Solution:
The actions that should be carried out are:
Search for 1s to the right.
The 0s in the sequence are read but ignored.
Maintain a count of whether the number of 1s is odd or even.
After a blank (#) is found, write the output to the right of the input string.

42.5 Cont
The set of states is as follows:
q0 searches to the right for a 1, indicates the number of 1s so far is even.
q1 searches to the right for a 1, indicates the number of 1s so far is odd.

42.6 Cont
The Turing machine is represented by the following state diagram:

42.7 Cont
TM = ({q0, q1, p}, {0, 1}, {0, 1, #}, δ, q0, #, {p}), where δ is given by:
δ(q0, 0) = (q0, 0, R)   δ(q0, 1) = (q1, 1, R)   δ(q0, #) = (p, 0, R)
δ(q1, 0) = (q1, 0, R)   δ(q1, 1) = (q0, 1, R)   δ(q1, #) = (p, 1, R)
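The parity counter can be simulated directly from its transition function; the two blank-cell entries below (write the parity bit and halt in p) are inferred from the ID traces in section 42.1 and should be read as an assumed reconstruction.

```python
def parity_tm(tape_str):
    """Simulate the parity-counter TM: q0 = even number of 1s so far,
    q1 = odd. On reading the blank (#) the machine writes the parity bit
    and halts in the final state p. Returns the final tape contents."""
    delta = {
        ("q0", "0"): ("q0", "0", "R"), ("q0", "1"): ("q1", "1", "R"),
        ("q1", "0"): ("q1", "0", "R"), ("q1", "1"): ("q0", "1", "R"),
        ("q0", "#"): ("p", "0", "R"),  ("q1", "#"): ("p", "1", "R"),
    }
    tape, head, state = list(tape_str) + ["#"], 0, "q0"
    while state != "p":
        state, sym, move = delta[state, tape[head]]
        tape[head] = sym                       # write the new symbol
        head += 1 if move == "R" else -1       # move the R/W head
    return "".join(tape)

print(parity_tm("01"))       # '011'      (one 1: odd parity)
print(parity_tm("1011010"))  # '10110100' (four 1s: even parity)
```

The two printed tapes match the final IDs 011p and 10110100p computed in section 42.1.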

42.8 TM Languages: (Recursive and Recursive Enumerable


language)
The languages of Turing machines are:
Recursive language
Recursive enumerable language

43.1 Recursive Language


A language is said to be recursive if there exists a TM which, when presented with any input string
w ∈ Σ*, halts and accepts if the string is in the language, and otherwise halts and rejects
the string. Such a language is also called a Turing-decidable language.
A TM that always halts is called a decider or total TM. A recursive language is also called
a recursive set, or decidable.

43.2 Recursive Enumerable language


A language is said to be recursively enumerable if there exists a TM which will halt and accept
when presented with any input string w ∈ Σ* that is in the language; it may halt and reject, or loop
forever, when the string is not in the language. Such a language is also called a Turing-acceptable language.
A decision problem is called partially decidable (or semi-decidable) if the corresponding set of
"yes" instances is recursively enumerable. Such a language is also called a recursively enumerable set,
or semi-decidable.

43.3 Turing Computability


A function f with domain D is said to be Turing-computable if there exists some Turing
machine M such that q0w ⊢* qf f(w) for all w ∈ D, where qf is a final state.
1. Each elementary function is computable either because it is simple or by definition.
2. If a function f is computable and if another function g can be obtained by applying an
elementary operation to f, then g is also computable.
3. A function is computable if and only if it can be obtained using rule 1 and 2.

43.4 Variants of Turing machine


The variants of TMs are as follows:
Multi-tape TM
Multi head TM
Multi dimensional TM
Non-deterministic TM

43.5 Multi-tape Turing Machines


A multi-tape TM is a TM with more than one tape, each with its own independently
controlled R/W head. Each tape is divided into cells and each cell can hold one symbol.
The input appears on the first tape and all other tapes hold the blank symbol. Initially the head of
the first tape is at the left end of the input and all other heads can be placed at any cell.

43.6 Cont
The advantage of a multi-tape TM is that the design of some functions, like copying, reversing, or
verifying whether a string is a palindrome, can be carried out much more easily than with the
corresponding standard TMs. The formal notation of a multi-tape TM is quite similar to that of the
standard TM except for the transition function, which for k tapes is
δ : Q × Γ^k → Q × Γ^k × {L, R}^k.

43.7 Cont
The figure shows a multi-tape TM with 3 tapes as an example.

(Multi-tape TM)

43.8 Multi head TM


A multi-head TM has more than one head, i.e. it has some fixed number n of heads, numbered
from 1 to n. The move of the TM depends on the state and on the symbol scanned by each head;
in one move, each head may independently move left or right or remain stationary.

44.1 Multi-dimensional TM
A multi-dimensional TM is one in which the tape can be viewed as extending as a grid in more than one
dimension. Initially the input is along one axis and the head is at the left end of the input.
Depending on the state and the symbol scanned, the machine changes its state, writes a new symbol,
and moves its tape head in one of the 2k directions, either positively or negatively along one of the
k axes.

44.2 Cont
At any time, in any dimension, only a finite number of rows contain non-blank symbols, and each
of these rows has only a finite number of non-blank symbols. For a two-dimensional machine, the transition function can be
defined as:
δ: Q × Γ → Q × Γ × {L, R, D1, D2}
where L and R are moves along the first axis and D1, D2 are moves along the second axis.

44.3 Nondeterministic Turing Machines


A non-deterministic TM is a device with a finite control and a single tape. For a
given state and tape symbol scanned by the tape head, the TM has a finite set of choices for the
next move. Each choice consists of a new state, a new symbol, and a direction of head motion. The
transition function can be defined as: δ: Q × Γ → P( Q × Γ × {L, R} )

44.4 Linear Bounded Automata (LBA)


An LBA is a restricted form of non-deterministic TM in which the head is not permitted to move
off the portion of the tape containing the input. If the machine tries to move its head off either
end of the input, the head stays where it is. An LBA has a single tape whose length is not
infinite but bounded by a linear function of the length of the input string; it therefore works with a
limited amount of memory.

44.5 Cont
It can only solve problems whose memory requirements fit within the tape used for the input. If the
input length is n, then the amount of memory available is linear in n; hence the name of this model,
LBA. LBAs accept a smaller class of languages than the class of r.e. languages.

44.6 Cont
An LBA is similar to a TM except that on any input w
with |w| = n, it can use only
n + 2 cells of the input tape. The input string is always put between a left-end
marker, <, and a right-end marker, >, which are not part of the input string. The read-write head
cannot move to the left of the left-end marker or to the right of the right-end marker, and the two end
markers can never be overwritten.

44.7 Formal Definition


Formally, an LBA is a nondeterministic TM M = (Q, Σ, Γ, δ, q0, B, <, >, F) satisfying the
following conditions:
1. The input alphabet Σ must contain two special symbols < and >, the left and right end
markers respectively, which do not appear in any input string.

44.8 Cont
2. δ(q, <) can contain only elements of the form (p, <, R), and δ(q, >) can contain only
elements of the form (p, >, L), for any q, p ∈ Q.
[Note: all other elements are identical to the corresponding elements of a TM.]
The language accepted by an LBA is called a context-sensitive language (CSL).

45.1 Context Sensitive Language (CSL)


A language L is said to be context sensitive if there exists a context-sensitive grammar G such that
L = L(G) or L = L(G) ∪ {ε}.
A grammar G = (V, T, P, S) is said to be context sensitive if all productions are of the form
X → Y, where X, Y ∈ (V ∪ T)+ and |X| ≤ |Y|.
A context-sensitive grammar (CSG) can never generate a language containing the empty string
ε.

45.2 Example:
The language {a^n b^n c^n : n ≥ 1} is a CSL. The grammar is:
S → abc | aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa | aaA

45.3 Solution:
Derivation of the string a^3 b^3 c^3:
S ⇒ aAbc
⇒ abAc
⇒ abBbcc
⇒ aBbbcc
⇒ aaAbbcc
⇒ aabAbcc
⇒ aabbAcc
⇒ aabbBbccc
⇒ aabBbbccc
⇒ aaBbbbccc
⇒ aaabbbccc
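The derivation above can be checked mechanically. The sketch below encodes the productions from 45.2 and verifies that each consecutive pair of sentential forms differs by exactly one rule application:

```python
# Productions of the grammar from 45.2: left-hand side -> right-hand sides.
PRODUCTIONS = {"S": ["abc", "aAbc"], "Ab": ["bA"], "Ac": ["Bbcc"],
               "bB": ["Bb"], "aB": ["aa", "aaA"]}

def one_step(u, v):
    """True if v follows from u by one application of some production."""
    for lhs, rhss in PRODUCTIONS.items():
        for i in range(len(u) - len(lhs) + 1):
            if u[i:i + len(lhs)] == lhs:       # lhs occurs at position i
                for rhs in rhss:
                    if u[:i] + rhs + u[i + len(lhs):] == v:
                        return True
    return False

DERIVATION = ["S", "aAbc", "abAc", "abBbcc", "aBbbcc", "aaAbbcc",
              "aabAbcc", "aabbAcc", "aabbBbccc", "aabBbbccc",
              "aaBbbbccc", "aaabbbccc"]
assert all(one_step(u, v) for u, v in zip(DERIVATION, DERIVATION[1:]))
```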

45.4 Primitive recursive functions


A function (or map) f from a set X to a set Y is a rule which associates to every element x in X a
unique element in Y, denoted by f(x). The element f(x) is called the image of x under f.
The function is denoted by f : X → Y.
We construct primitive recursive functions over N and Σ*.

45.5 Initial function over N


The initial functions over N are:
Zero function, Z(x) = 0.
Successor function, S(x) = x + 1.
Projection function, U_i^n(x_1, ..., x_n) = x_i.
Example:
Zero function, Z(4) = 0.
Successor function, S(4) = 4 + 1 = 5.
Projection function, U_3^5(1, 7, 8, 2, 9) = 8.
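These initial functions translate directly into code; a short sketch, with the projection function U_i^n built as a higher-order helper:

```python
def zero(x):
    """Zero function Z(x) = 0."""
    return 0

def succ(x):
    """Successor function S(x) = x + 1."""
    return x + 1

def proj(i, n):
    """Projection function U_i^n: returns the i-th of n arguments."""
    def u(*args):
        assert len(args) == n
        return args[i - 1]
    return u

assert zero(4) == 0
assert succ(4) == 5
assert proj(3, 5)(1, 7, 8, 2, 9) == 8   # U_3^5 picks the third argument
```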

45.6 Initial functions over Σ*


The initial functions over Σ*, for Σ = {a, b}, are:
nil(x) = ε
cons_a(x) = ax
cons_b(x) = bx
cons_a(x) and cons_b(x) denote the concatenation of the constant string a with x and the
concatenation of the constant string b with x.

45.7 Cont
If f1, f2, ..., fk are partial functions of n variables and g is a partial function of k variables, then
the composition of g with f1, f2, ..., fk is the partial function of n variables defined by
h(x1, ..., xn) = g( f1(x1, ..., xn), ..., fk(x1, ..., xn) ).
For example, if f1, f2 and f3 are partial functions of two
variables and g is a partial function of three variables, then the composition of g with f1, f2, f3 is
given by
h(x1, x2) = g( f1(x1, x2), f2(x1, x2), f3(x1, x2) ).
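Composition can be expressed as a higher-order function. In the sketch below, f1, f2, f3 and g are arbitrary illustrative choices, not functions from the text:

```python
def compose(g, *fs):
    """h(x1..xn) = g(f1(x1..xn), ..., fk(x1..xn))."""
    return lambda *xs: g(*(f(*xs) for f in fs))

# Illustrative functions: three 2-ary functions composed with a 3-ary g.
f1 = lambda x, y: x + y
f2 = lambda x, y: x * y
f3 = lambda x, y: x - y
g = lambda u, v, w: u + v * w

h = compose(g, f1, f2, f3)
assert h(5, 3) == (5 + 3) + (5 * 3) * (5 - 3)   # 8 + 30 = 38
```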

46.1 Ackermann's function


A(x, y) can be computed for every (x, y), and hence A(x, y) is total. Ackermann's function is
recursive but not primitive recursive. Ackermann's function is defined by:
A(0, y) = y + 1
A(x, 0) = A(x - 1, 1), for x > 0
A(x, y) = A(x - 1, A(x, y - 1)), for x, y > 0

46.2 Example:
Compute A(1, 1), A(2, 1), A(1, 2), A(2, 2).
Solution:
A(1, 1) = A(0, A(1, 0)) = A(0, A(0, 1)) = A(0, 2) = 3
A(2, 1) = A(1, A(2, 0)) = A(1, A(1, 1)) = A(1, 3) = A(0, A(1, 2)) = A(0, 4) = 5
A(1, 2) = A(0, A(1, 1)) = A(0, 3) = 4
A(2, 2) = A(1, A(2, 1)) = A(1, 5) = A(0, A(1, 4)) = A(0, 6) = 7
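Assuming the standard three-clause definition of A(x, y), the values above can be reproduced with a short memoized implementation (the memoization is only an efficiency convenience, not part of the definition):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def A(x, y):
    """Ackermann's function: total, recursive, not primitive recursive."""
    if x == 0:
        return y + 1                 # A(0, y) = y + 1
    if y == 0:
        return A(x - 1, 1)           # A(x, 0) = A(x - 1, 1)
    return A(x - 1, A(x, y - 1))     # A(x, y) = A(x - 1, A(x, y - 1))

assert A(1, 1) == 3
assert A(2, 1) == 5
assert A(1, 2) == 4
assert A(2, 2) == 7
```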

46.5 Partial function


A partial function f from X to Y is a rule which assigns to every element of X at most one
element of Y.

46.6 Total function


A total function from X to Y is a rule which assigns to every element of X a unique element of
Y.
Example: If R denotes the set of all real numbers, the rule f from R to itself given by f(r) =
+√r is a partial function, since f(r) is not defined as a real number when r is negative. But g(r)
= 2r is a total function from R to itself.

46.7 μ-recursive functions


Every Turing computable function is partial μ-recursive, and every μ-recursive function is Turing
computable; that is, the set of μ-recursive functions is identical to the set of Turing computable
functions.

46.8 Church Turing hypothesis


The assumption that the notion of computable function can be identified with the class of
partial recursive functions is known as Church's hypothesis or the Church-Turing thesis.
While we cannot hope to prove Church's hypothesis as long as the notion of
"computable" remains informal, we can give evidence for its reasonableness.

47.1 Cont
As long as our notion of computable places no bound on the number of steps or the amount of
storage, it seems clear that the partial recursive functions are computable, although some would argue

that a function is not computable unless we can bound the computation in advance, or at least
establish whether or not the computation eventually terminates.

47.2 Cont
What is less clear is whether the class of partial recursive functions includes all computable
functions. Logicians have presented many other formalisms, such as the λ-calculus, Post systems and general
recursive functions.
In addition, abstract computer models, such as the random access machine (RAM), also give
rise to the partial recursive functions.

47.3 Cont
The RAM consists of an infinite number of memory words, numbered 0, 1, ..., each of which
can hold any integer, and a finite number of arithmetic registers capable of holding any integer.
Integers may be decoded into the usual sorts of computer instructions. It should be clear that if we
choose a suitable set of instructions, the RAM may simulate any existing computer.

47.4 Undecidable problems


Undecidability is the situation in which we cannot develop an algorithm that will correctly
give a solution to a given problem.
A problem whose language is recursive is said to be decidable; otherwise it is undecidable. If the
problem is undecidable, then there is no algorithm which takes the input and finds the answer, either
YES or NO.
Examples of undecidable problems are PCP and the membership problem of the universal TM.

47.5 Universal Turing machine (UTM or U)


The TMs considered so far were designed as special-purpose computers. Designing a general-purpose
TM is a more complex task. We must design a machine that can accept two inputs, i.e.
the input data and

47.6 Cont

a description of a computation; this is precisely what a general-purpose computer does. It

accepts data and a program (i.e. a description of the computation to be done).
A general-purpose TM is usually called a universal TM (UTM), which is powerful enough to simulate the
behavior of any computer, including any TM itself. More precisely, a UTM can simulate the
behavior of an arbitrary TM over any alphabet Σ.

47.7 Cont
It was Turing's great discovery that there is a single TM which is so versatile that, by
properly adjusting the input representation, we can cause it to compute any Turing computable
function, i.e. any function that can be computed by any other TM. A TM with this
capability is known as a UTM.

47.8 Cont
Turing also indicated how to construct the UTM, which has the property that for every TM
T there is a string of symbols dT (a description of T) such that, if a number x is
written in unary notation on a blank tape followed by the string dT, and the UTM is started in
state q0 on the leftmost symbol of dT, then when the machine stops the number f(x) will
appear on the tape,
where f(x) is the number that would have been computed if the TM T had been started with
only x on the tape.

48.1 Cont
Thus the UTM is capable of simulating any other TM when the following information is stored on the tape:
A description of the TM in terms of its finite automaton (this will be the operation area, or
the program area, of the tape).
The initial configuration of the TM, i.e. the starting or current state and the symbol
scanned (this will be the state area of the tape).
The data to be fed to the TM (this will be the data area of the tape).

48.2 Cont
The finite automaton, i.e. the description of the TM, is stored on the tape. This stored information is
nothing but DATA to the UTM; the only difference between the input string and this data is that
they are interpreted differently.
The UTM actually keeps a complete account of the TM which is to be simulated. It remembers
the current state of that TM, the input scanned, the action which will be taken by that TM, and so
on. The whole process follows a definite path.

48.3 Working of UTM


The working of the UTM has the following steps in general:
Step 1: Scan the square on the state area of the tape and read the symbol that the TM reads and the
initial state of the TM.
Step 2: Move the tape to the program area, which contains the finite automaton table. Find the
row which is headed by the symbol scanned in Step 1.

48.4 Cont
Step 3: Find the required column, i.e. the current state of the TM, and read the triplet from the table:
new symbol, new state and direction of movement of the tape.
Step 4: Move the tape to the appropriate square in the data area of the tape, replace the symbol,
move the tape in the given direction, read the next symbol, and replace the stored state
with the current state; then repeat from Step 1.
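Steps 1-4 amount to a table-driven interpreter: look up (state, scanned symbol) in the program area, apply (new symbol, move, new state), and repeat. A minimal Python sketch, using a hypothetical two-rule program that rewrites every 0 as 1 and halts at the first blank:

```python
def run_tm(delta, tape, state="q0", blank="B", max_steps=10_000):
    """Table-driven TM simulation: look up (state, symbol) in the
    program table `delta`, apply (write, move, new state), repeat."""
    tape = dict(enumerate(tape))          # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        sym = tape.get(head, blank)
        if (state, sym) not in delta:     # no applicable rule: halt
            return state, tape, head
        write, move, state = delta[(state, sym)]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("step bound exceeded")

# Hypothetical program: turn every 0 into 1, moving right until a blank.
delta = {("q0", "0"): ("1", "R", "q0"),
         ("q0", "1"): ("1", "R", "q0")}
state, tape, head = run_tm(delta, "0101")
assert "".join(tape[i] for i in sorted(tape)) == "1111"
```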

Chapter 9
48.5 Post's Correspondence Problem (PCP)
A Post correspondence system consists of a finite set of ordered pairs (x_i, y_i), i = 1, 2, ..., n,
where x_i, y_i ∈ Σ+ for some alphabet Σ.
Any sequence of numbers i_1, i_2, ..., i_k such that
x_i1 x_i2 ... x_ik = y_i1 y_i2 ... y_ik
is called a solution to the Post correspondence system.
Post's Correspondence Problem is the problem of determining
whether a Post correspondence system has a solution.

48.6 Example 1:
Consider a Post correspondence system with three pairs (x1, y1), (x2, y2), (x3, y3).
The list 1, 2, 1, 3 is a solution to it, because x1 x2 x1 x3 = y1 y2 y1 y3.

(A Post correspondence system is also denoted as an instance of the PCP)

48.7 Example 2:
The following PCP instance, with two pairs (x1, y1) and (x2, y2), has no solution.
48.8 Cont
This can be proved as follows. (x2, y2) cannot be chosen at the start, since then the LHS and
RHS would differ already in the first symbol. So we must start with (x1, y1).
The next pair must then be chosen so that the third symbol of the RHS becomes identical to that of the
LHS. After this step, the LHS and the RHS are still not matching.

49.1 Cont
If one of the pairs is selected next, there would be a mismatch in the 7th symbol; if the other
pair is selected instead, there will be no choice that matches both sides in the
next step.

49.2 Example 3:
The list 1, 3, 2, 3 is a solution to the following PCP instance.

i    xi     yi
1    1      101
2    10     00
3    011    11

Both the x-side and the y-side of the solution concatenate to 101110011.
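Checking a proposed solution of a PCP instance is straightforward (it is finding a solution that is undecidable in general). A sketch using the pairs of Example 3, with x1 taken to be 1:

```python
def is_pcp_solution(pairs, indices):
    """Check whether a 1-based index sequence solves a PCP instance:
    the x-concatenation must equal the y-concatenation."""
    top = "".join(pairs[i - 1][0] for i in indices)
    bottom = "".join(pairs[i - 1][1] for i in indices)
    return top == bottom

pairs = [("1", "101"), ("10", "00"), ("011", "11")]
assert is_pcp_solution(pairs, [1, 3, 2, 3])   # both sides: 101110011
assert not is_pcp_solution(pairs, [1, 2])
```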

49.3 Valid and invalid computations of Turing machines.


We can use CFGs to learn something about Turing machine computations (actually, it
usually works the other way around).
A valid computation of a Turing machine is a sequence of machine configurations such that the
first one is a starting configuration, the last is an accepting configuration, and each one is
reachable from the previous one in one computation step.
It turns out that the set of valid computations of any particular Turing machine is the intersection of two
CFLs, and the set of invalid computations (the complement of the valid computations) is a CFL.
From this, it is easy to show that the problem of recognizing CFGs with language Σ* must be
undecidable. Here is why: if we make a CFG that represents the invalid computations of a TM,
then that CFG generates Σ* exactly when the language accepted by the Turing machine is
empty. But we know that the problem of detecting Turing machines with empty languages is
undecidable, so recognizing CFGs with language Σ* must be
undecidable.
From this, it is also easy to show that the problem of recognizing equivalent CFGs must be
undecidable. If this were decidable, then we could use it to decide whether the language of
a CFG is Σ*, by comparing the given CFG with a fixed grammar that generates Σ*.

49.4 Chomsky Hierarchy of languages (or Chomsky Classification of languages)


Grammars are language generators. They consist of an alphabet of terminal symbols, an alphabet of
non-terminal symbols, a starting symbol and a set of rules. Each language generated by some grammar
can be recognized by some automaton.

49.5 Cont
Languages (and the corresponding grammars) can be classified according to the minimal
automaton sufficient to recognize them. Such classification, known as Chomsky Hierarchy, has
been defined by Noam Chomsky, a distinguished linguist with major contributions to linguistics.

49.6 Cont
The Chomsky Hierarchy comprises four types of languages and their associated grammars and
machines. The types of languages form a strict hierarchy:
regular languages ⊂ context-free languages ⊂ context-sensitive languages ⊂ recursive languages ⊂
recursively enumerable languages.

49.7 Cont
Language                            Grammar                           Machine                   Example

Type 3   Regular languages          Regular grammars                  Finite-state automata     a*
                                    (right-linear grammars,
                                    left-linear grammars)

Type 2   Context-free languages     Context-free grammars             Push-down automata        a^n b^n

Type 1   Context-sensitive          Context-sensitive grammars        Linear-bounded automata   a^n b^n c^n
         languages

Type 0   Recursive languages,       Unrestricted grammars             Turing machines           any computable
         recursively enumerable                                                                 function
         languages

49.8 Hierarchy of languages


The hierarchy of languages and its relationship with the machines is shown in the figure:

50.1 Difference between Languages


The distinction between languages can be seen by examining the structure of the grammar rules
of their grammars, or the nature of the automata which can be used to identify them.
Type 3 - Regular Languages
As we have discussed, a regular language is one which can be represented by a regular
grammar, described using a regular expression, or accepted using an FSA.
There are two kinds of regular grammar:

50.2 Cont

Right-linear (right-regular), with rules of the form


A → aB or A → a,
where A and B are single non-terminal symbols and a is a terminal symbol.
Parse trees with these grammars are right-branching.
Left-linear (left-regular), with rules of the form
A → Ba or A → a.
Parse trees with these grammars are left-branching.
Examples of regular languages are pattern-matching languages (regular expressions).
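The pattern-matching claim can be tried directly with a regular-expression engine; for instance, the Type 3 example language a*:

```python
import re

# a* denotes zero or more a's; fullmatch tests the whole string.
assert re.fullmatch(r"a*", "") is not None        # empty string is in a*
assert re.fullmatch(r"a*", "aaaa") is not None
assert re.fullmatch(r"a*", "aab") is None         # 'b' is not allowed
```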

50.3 Type 2 - Context-Free Languages

A Context-Free Grammar (CFG) is one whose production rules are of the form:
A → α
where A is any single non-terminal, and α is any combination of terminals and non-terminals.
The minimal automaton that recognizes context-free languages is a push-down
automaton. It uses a stack when expanding the non-terminal symbols with the right-hand
side of the corresponding grammar rule.
Examples of CFLs are some simple programming languages.
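The stack discipline described above can be sketched for the classic context-free language a^n b^n:

```python
def accepts_anbn(w):
    """PDA-style recognition of a^n b^n (n >= 0): push a marker for
    each 'a', pop one for each 'b', accept iff the stack empties."""
    stack = []
    seen_b = False
    for c in w:
        if c == "a":
            if seen_b:
                return False        # an 'a' after a 'b' -> reject
            stack.append("A")       # push one marker per 'a'
        elif c == "b":
            seen_b = True
            if not stack:
                return False        # more b's than a's
            stack.pop()
        else:
            return False            # symbol outside the alphabet
    return not stack                # accept iff counts balanced

assert accepts_anbn("aabb") and accepts_anbn("")
assert not accepts_anbn("aab") and not accepts_anbn("ba")
```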

50.4 Type 1 - Context-Sensitive Languages

Context-sensitive grammars may have more than one symbol on the left-hand side of
their grammar rules, provided that at least one of them is a non-terminal and the number
of symbols on the left-hand side does not exceed the number of symbols on the right-hand side.
Their rules have the form:
αAβ → αγβ
where A is a single non-terminal symbol, α and β are any (possibly empty) combinations of terminals and
non-terminals, and γ is a nonempty combination of terminals and non-terminals.

50.5 Cont
Since we allow more than one symbol on the left-hand side, we refer to the symbols
other than the one we are replacing as the context of the replacement.
The automaton which recognizes a context-sensitive language is called a linear-bounded
automaton: an FSA with a memory to store symbols in a list.

50.6 Cont
Since the number of symbols on the left-hand side is always smaller than or equal to the
number of symbols on the right-hand side, the length of the derived string never decreases
when a grammar rule is applied. The length of the store needed is therefore bounded by the length of the input
string, so a linear-bounded automaton always needs only a finite list as its store.
Examples of context-sensitive languages are most programming languages.
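The Type 1 example a^n b^n c^n can be recognized using memory linear in the input length, in the spirit of an LBA; a simple counting sketch (taking n ≥ 1, to match the grammar of 45.2):

```python
def accepts_anbncn(w):
    """Membership test for {a^n b^n c^n : n >= 1}, using only
    O(|w|) workspace, as a linear-bounded automaton would."""
    n = len(w) // 3
    # Lengths that are not a multiple of 3 fail the equality below,
    # since the reconstructed string would have a different length.
    return n >= 1 and w == "a" * n + "b" * n + "c" * n

assert accepts_anbncn("aaabbbccc")
assert not accepts_anbncn("aabbc")
```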

50.7 Type 0 - Unrestricted (Free) Languages

Unrestricted grammars have no restrictions on their grammar rules, except that there must
be at least one non-terminal on the left-hand side. The rules have the form
α → β
where α and β are arbitrary strings of terminals and non-terminals (α a nonempty string).
The type of automaton which can recognize such a language is a Turing machine, with an
infinitely long memory.
Examples of unrestricted languages are almost all natural languages.

50.8 Theorem:
A language is generated by an unrestricted grammar if and only if it is recursively enumerable.

50.9 Are all languages recursively enumerable?


The answer is no. We have shown that there are languages that are not recursively enumerable.
Such languages cannot be described by a formal grammar. Each formal grammar has a finite
description and can therefore be considered as a string; thus the set of all formal grammars is
countably infinite. The set of all languages over an alphabet is the power set of the set of all strings over
that alphabet, and we have shown that power sets of infinite sets are not countable. Therefore there is
no one-to-one match between grammars and languages.
