0% found this document useful (0 votes)
81 views74 pages

Module-1-Finite Automata

Uploaded by

Pramod Parit
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
81 views74 pages

Module-1-Finite Automata

Uploaded by

Pramod Parit
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 74

S. J. P. N.

TRUST’S
HIRASUGAR INSTITUTE OF TECHNOLOGY, NIDASOSHI
Accredited at 'A' Grade by NAAC
Programmes Accredited by NBA: CSE, ECE, EEE & ME.

Department of Computer Science & Engineering

Course: Theory of Computation(BCS503)

Module 1: Introduction to Finite


Automata
Prof. A. A. Daptardar
Asst. Prof. , Dept. of Computer Science & Engg.,
Hirasugar Institute of Technology, Nidasoshi
Module - 1
• Introduction to Finite Automata, Structural
Representations, Automata and Complexity.
The Central Concepts of Automata Theory.
Deterministic Finite Automata,
Nondeterministic Finite Automata, An
Application: Text Search, Finite Automata with
Epsilon-Transitions.
• TEXT BOOK: Sections 1.1, 1.5, 2.2,2.3,2.4,2.5

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 2


IT, Nidasoshi
Introduction
• Automata theor y is the study of
abstract computi ng devices or machines.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 3


IT, Nidasoshi
Why study Automata Theory?
• 1.1.1 - Introduction to Finite Automata
• 1.1.2 - Structural Representations
• 1.1.3 - Automata and Complexity

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 4


IT, Nidasoshi
1.1.1 Introduction to Finite Automata
F i n i te A u t o m a t a are useful mo de l fo r many impo rtant kinds o f hardware and

S o ft ware

• Soft ware fo r designing & c h e c k i n g t h e b e h a v i o r o f digital circuits.

• Lexical analyzer o f a t y p i c a l compiler i.e. the compiler component that

breaks the input text into logical units, such as identi fi ers, keywords and

punctuati on.
•Software for scanning large bodies of text , such as collections of web pages, to find

occurrences of words, phrases or other patterns.


•S o ft w a r e f o r v e r i f y i n g s y s t e m s o f a l l t y p e s t h a t h a v e a fi n i t e n u m b e r of

d i s ti n c t s t a t e s , s u c h a s c o m m u n i c a ti o n s p r o t o c o l s o r p r o t o c o l s f o r

s e c u r e e x c h a n g e o f i n f o r m a ti o n .

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 5


IT, Nidasoshi
• Finite automata are finite collections of states with
transition rules that take you from one state to
another.
• The purpose of a state is to remember the relevant
portion of the system's history.
• Since there are only a finite number of states, the
entire history generally cannot be remembered, so
the system must be designed carefully, to
remember what is important and forget what is
not.
• The advantage of having only a finite number of
states is that we can implement the system with a
fixed set of resources.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 6
IT, Nidasoshi
Example 1.1
• The simplest nontrivial finite automaton is an
on/off switch.
• The device remembers whether it is in the
"on" state or the "off" state, and it allows the
user to press a button whose effect is
different, depending on the state of the
switch.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 7


IT, Nidasoshi
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 8
IT, Nidasoshi
Representing FA
• Simplest representation is often a graph.
– Nodes(Circles) = states.
– Arcs indicate state transitions.
– Labels on arcs tell what causes the transition(Arcs between
states are labeled by "inputs," which represent external
influences on the system)
– One of the states is designated the "start state," the state in
which the system is placed initially.
– It is often necessary to indicate one or more states as "final" or
"accepting“ states. Entering one of these states after a sequence
of inputs indicates that the input sequence is good in some
way(It is conventional to designate accepting states by a double
circle)
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 9
IT, Nidasoshi
Example 1.2
• The part of a lexical analyzer. The job of this
automaton is to recognize the keyword then. It thus
needs five states, each of which represents a
different position in the word then that has been
reached so far. These positions correspond to the
prefixes of the word, ranging from the empty string
(i.e., nothing of the word has been seen so far) to the
complete word.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 10


IT, Nidasoshi
• In Fig. 1.2, the five states arc named by the prefix of
then seen so far.
• Inputs correspond to letters.
• We may imagine that the lexical analyzer examines one
character of the program that it is compiling at a time,
and the next character to be examined is the input to
the automaton.
• The start state corresponds to the empty string, and
each state has a transition on the next letter of then to
the state that corresponds to the next-larger prefix.
• The state named then is entered when the input has
spelled the word then. Since it is the job of this
automaton to recognize when then has been seen, we
could consider that state the lone accepting state.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 11
IT, Nidasoshi
1.1.2 Structural Representations
• There are two important notations that are not
automaton-like, but play an important role in the study
of automata and their applications.
– Grammars are useful models when designing software that processes
data with a recursive structure. The best-known example is a "parser,"
the component of a compiler that deals with the recursively nested
features of the typical programming language, such as expressions -
arithmetic, conditional, and so on. For example, a grammatical rule like
E => E + E states that an expression can be formed by taking any two
expressions and connecting them by a plus sign
– Regular Expressions also denote the structure of data, especially text
strings. The UNIX-style regular expression ‘[A-Z] [a-z] * [ ] [A-Z] [A-Z]’
represents capitalized words followed by a space and two capital
letters.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 12


IT, Nidasoshi
1.1.3 Automata and Complexity
• Automata are essential for the study of the limits
of computation. There are two important issues:
– 1. What can a computer do at all? This study is called
"decidability," and the problems that can be solved
by computer are called "decidable.“
– 2. What can a computer do efficiently? This study is
called "intractabil­ity," and the problems that can be
solved by a computer using no more time than some
slowly growing function of the size of the input are
called "tractable."

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 13


IT, Nidasoshi
1.5 The Central Concepts of Automata
Theory
• In this section we shall introduce the most
important definitions of terms that pervade
the theory of automata.
• These concepts include the "alphabet" (a set
of symbols), "strings" (a list of symbols from
an alphabet), and "language" (a set of
strings from the same alphabet).

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 14


IT, Nidasoshi
1.5.1 Alphabets
• An alphabet is a finite, nonempty set of symbols.
Conventionally, we use the symbol Ʃ for an alphabet.
• Common alphabets include:
1. Ʃ = {0,1}, the binary alphabet.
2. Ʃ = {a, b, ... , z}, the set of all lower-case letters.
3.The set of all ASCII characters, or the set of all
printable ASCII characters

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 15


IT, Nidasoshi
1.5.2 Strings
A string (or sometimes word) is a finite sequence of
symbols chosen from some alphabet.
For example, 01101 is a string from the binary
alphabet Ʃ = {0, 1}.
• The Empty String : The empty string is the string
with zero occurrences of symbols. This string,
denoted ϵ, is a string that may be chosen from any
alphabet whatsoever.
• Length of a String : It is the number of positions for
symbols in the string. For instance, 01101 has
length 5. The standard notation for the length of a
string w is lwl. For example, 10111 = 3 and l ϵ l = 0.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 16
IT, Nidasoshi
• Powers of an Alphabet : If Ʃ is an alphabet, we
can express the set of all strings of a certain
length from that alphabet by using an
exponential notation. We define Ʃk to be the
set of strings of length k, each of whose
symbols is in Ʃ.
• If Ʃ = {0, 1}, then Ʃ1 = {0, 1},
• Ʃ2 = {00,01, 10, 11},
• Ʃ3 = {000,001,010,011,100,101,110,111}
• Note: Ʃ0 = {ϵ} regardless of what alphabet Ʃ is.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 17


IT, Nidasoshi
• The set of all strings over an alphabet Ʃ is
conventionally denoted Ʃ*.
• For instance, {0, 1}* = {ϵ, 0, 1, 00, 01, 10, 11, 000,
••• }.
• Put another way,
• Ʃ* = Ʃ0 U Ʃ1 U Ʃ2 U · · ·
• Sometimes, we wish to exclude the empty string
from the set of strings. The set of nonempty strings
from alphabet Ʃ is denoted Ʃ+. Thus, two
appropriate equivalences are:
• Ʃ+ = Ʃ1 U Ʃ2 U Ʃ3 U · · ·
• Ʃ* = Ʃ+ U {ϵ}.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 18
IT, Nidasoshi
• Concatenation of Strings : Let x and y be
strings. Then xy denotes the concatenation of x
and y, that is, the string formed by making a
copy of x and following it by a copy of y. More
precisely, if x is the string composed of i
symbols x = a1 a2…..ai and y is the string
composed of j symbols y = b1 b2 .…bj, then xy is
the string of length
i + j : xy = a1 a2…..aib1 b2 .…bj
Example: Let x = 01101 and y = 110.
Then xy = 01101110 and yx = 11001101.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 19
IT, Nidasoshi
• 1.5.3 Languages
• A set of strings all of which are chosen from some
Ʃ* where Ʃ is a particular alphabet, is called a
language. If Ʃ is an alphabet, and L Ʃ*, then L is a
language over Ʃ.
• An example is English, where the collection of
legal English words is a set of strings over the
alphabet that consists of all the letters.
• Another example is C, or any other programming
language, where the legal programs are a subset
of the possible strings that can be formed from
the alphabet of the language.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 20
IT, Nidasoshi
• There are also many other languages that appear when
we study automata. Some are abstract examples, such
as:
– 1. The language of all strings consisting of n 0's followed by n
1's, for some n ≥ 0: {ϵ, 01, 0011, 000111, ... }.
– 2. The set of strings of 0's and 1's with an equal number of
each:
{ϵ, 01, 10, 0011, 0101, 1001,, .. }
– 3. The set of binary numbers whose value is a prime:
– {10, 11, 101, 111, 1011, .. }
– 4. Ʃ* is a language for any alphabet Ʃ.
– 5. ᴓ, the empty language, is a language over any alphabet.
– 6. {ϵ}, the language consisting of only the empty string, is also
a language over any alphabet. Notice that ᴓ ≠ {ϵ}; the former
has no strings and the latter has one string.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 21
IT, Nidasoshi
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 22
IT, Nidasoshi
• 1.5.4 Problems
• In automata theory, a problem is the question of
deciding whether a given string is a member of some
particular language.
• More precisely, if Ʃ is an alphabet, and L is a language
over Ʃ, then the problem L is:
• Given a string w in Ʃ*, decide whether or not w is in L.
• Example: The problem of testing primality can be
expressed by the language LP consisting of all binary
strings whose value as a binary number is a prime.
That is, given a string of 0's and 1's, say "yes" if the
string is the binary representation of a prime and say
"no" if not.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 23
IT, Nidasoshi
2.2 Deterministic Finite Automata
• The term "deterministic" refers to the fact that
on each input there is one and only one state
to which the automaton can transition from its
current state. In contrast, "nondeterministic"
finite automata, the subject can be in several
states at once.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 24


IT, Nidasoshi
Definition of Deterministic Finite Automata
(DFA)
• A Deterministic Finite Automaton consists of:
1. A finite set of states, often denoted Q.
2. A finite set of input symbols, often denoted Ʃ.
3. A transition function that takes as arguments a state and an
input symbol and returns a state. The transition function
will commonly be denoted . In our informal graph
representation of automata, was represented by arcs
between states and the labels on the arcs. If q is a state,
and a is an input symbol, then (q, a) is that state p such
that there is an arc labeled a from q to p.
4. A start state, one of the states in Q.
5. A set of final or accepting states F. The set F is a subset of Q.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 25
IT, Nidasoshi
• A deterministic finite automaton will often be
referred to by its acronym: DFA. The most
succinct representation of a DFA is a listing of
the five components above. In proofs we often
talk about a DFA in "five-tuple" notation:
A= (Q, Ʃ, ,q0,F)
• where A is the name of the DFA, Q is its set of
states, Ʃ its input symbols, its transition
function, q0 its start state, and F its set of
accepting states.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 26


IT, Nidasoshi
2.2.2 How a DFA Processes Strings
• Let us understand about a DFA is how the DFA decides whether or
not to "accept" a sequence of input symbols.
• The "language" of the DFA is the set of all strings that the DFA
accepts.
• Suppose a1a2 ···an is a sequence of input symbols.
• We start out with the DFA in its start state, q0.
• We consult the transition function , say (q0, a1) = q1 to find the
state that the DFA A enters after processing the first input symbol a1.
• We process the next input symbol, a2, by evaluating (q1, a2); let us
suppose this state is q2.
• We continue in this manner, finding states q3,q4,.....qn such that (qi-
1,ai) = qi for each i.

• If qn is a member of F, then the input a1a2 ···an is accepted, and if not


then it is "rejected." Prof. A.A.Daptardar, Department of CSE, HS
04/02/2025 27
IT, Nidasoshi
2.2.3 Simpler Notations for DFA's
• Specifying a DFA as a five-tuple with a detailed
description of the transition function is both tedious
and hard to read.
• There are two preferred notations for describing
automata:
1. A Transition Diagram which is represented using
circles, arrows, arcs with labels, double circles etc.
2. A Transition Table, which is a tabular listing of
the function, which by implication tells us the set of
states and the input alphabet.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 28
IT, Nidasoshi
Transition Diagrams
• A transition diagram for a DFA A= (Q, Ʃ, , q0,F) is a graph
defined as follows:
a) For each state in Q there is a node.
b) For each state q in Q and each input symbol a in Ʃ, let (q,a) = p.
Then the transition diagram has an arc from node q to node p,
labeled
a. If there are several input symbols that cause transitions from q to p,
then the transition diagram can have one arc, labeled by the list of
these symbols.
c) There is an arrow into the start state q0, labeled Start. This arrow
does not originate at any node.
d) Nodes corresponding to accepting states (those in F) are marked
by a double circle. States not in F have a single circle.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 29
IT, Nidasoshi
Transition Tables
• A transition table is a conventional, tabular
representation of a function like that takes
two arguments and returns a value.
• The rows of the table correspond to the
states, and the columns correspond to the
inputs.
• The entry for the row corresponding to state q
and the column corresponding to input a is
the state (q, a).
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 30
IT, Nidasoshi
• Construct DFA which accepts any number of
a’s followed by a string ba and followed by
string of a’s and b’s.
• A= (Q, Ʃ, ,q0,F)
• Q = {q0,q1,q2}
• Ʃ = {a, b}
• q0 is the start state
• F = {q2} – final state

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 31


IT, Nidasoshi
Transition Diagram
a,b
a
b a
q0 q1 q2

Transition Table
States a b
q0 q0 q1
q1 q2 -
*q2 q2 q2

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 32


IT, Nidasoshi
• Example 2.4 - Design a DFA to accept the
language
L = { w I w has both an even number of 0's and
an even number of 1's}
• This DFA is to count both the number of 0's
and the number of 1's, but count them
modulo 2. That is, the state is used to
remember whether the number of 0's seen so
far is even or odd, and also to remember
whether the number of 1 's seen so far is even
or odd. There are thus four states, which can
be given the following interpretations:
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 33
IT, Nidasoshi
• q0: Both the number of 0's seen so far and the number of
1's seen so far are even
• q1: The number of 0's seen so far is even, but the number
of 1 's seen so far is odd.
• q2: The number of 1's seen so far is even, but the number
of 0's seen so far is odd.
• q3: Both the number of 0's seen so far and the number of
1's seen so far are odd.
• State q0 is both the start state and the lone accepting state.
It is the start state, because before reading any inputs, the
numbers of 0's and 1 's seen so far are both zero, and zero
is even. It is the only accepting state, because it describes
exactly the condition for a sequence of 0's and 1 's to be in
language L.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 34
IT, Nidasoshi
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 35
IT, Nidasoshi
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 36
IT, Nidasoshi
2.2.4 Extending the Transition Function to
Strings
• we need to make the notion of the language of a DFA
precise.
• To do so, we define an extended transition function that
describes what happens when we start in any state and
follow any sequence of inputs.
• If δ is our transition function then the extended transition
function constructed from δ will be called ..
• The extended transition function is a function that takes
a state q and a string w and returns a state p - the state
that the automaton reaches when starting in state q and
processing the sequence of inputs w.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 37
IT, Nidasoshi
• We define by induction on the length of the input string,
as follows:
• BASIS: (q, ϵ ) = q. That is, if we are in state q and read no
inputs, then we are still in the state q.
• INDUCTION: Suppose w is a string of the form xa; that is, a is
the last symbol of w, and x is the string consisting of all but
the last symbol.
• For example, w = 1101 is broken into x = 110 and a= 1. Then
• (q,w) = (δ(q,x),a)
• To compute (q, w), first compute (q, x), the state that the
automaton is in after processing all but the last symbol of w.
• Suppose this state is p; that is, (q, x) = p.
• Then (q,w) is what we get by making a transition from
state p on input a, the last symbol of w.
• That is, (q,w} = δ(p,a).
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 38
IT, Nidasoshi
• Let us design a DFA to accept the language
• L = { w I w has both an even number of 0's and an
even number of 1's}
• Suppose the input is 110101. Since this string has
even numbers of 0's and 1's both, we expect it is
in the language. Thus, we expect that (q0,
110101) = q0, since q0 is the only accepting state.
Let us now verify that claim.
• The check involves computing (q0 ,w) for each
prefix w of 110101, starting at ϵ and going in
increasing size. The summary of this calculation is:
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 39
IT, Nidasoshi
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 40
IT, Nidasoshi
2.3 Nondeterministic Finite Automata
• A ''nondeterministic” finite automaton (NFA)
has the power to be in several states at once.
• This ability is often expressed as an ability to
"guess" something about its input.
• Before examining applications, we need to
define nondeterministic finite automata and
show that each one accepts a language that is
also accepted by some DFA.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 41


IT, Nidasoshi
Definition of NFA
• A Nondeterministic Finite Automaton consists of:
1. A finite set of states, often denoted Q.
2. A finite set of input symbols, often denoted Ʃ.
3. A δ is the transition function that takes as arguments a
state and an input symbol and returns a subset of states
which maps from Q X Ʃ into 2Q which is a power set of Q.
4. A start state q0, one of the states in Q.
5.A set of final or accepting states F. The set F is a subset of
Q.
• An NFA is represented essentially like a DFA:
A= (Q, Ʃ , δ, q0, F)
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 42
IT, Nidasoshi
• Example 2.6: Figure 2.9 shows a
nondeterministic finite automaton, whose job
is to accept all and only the strings of 0's and
1's that end in 01.

• The NFA can represented as

04/02/2025 43
Processing inputs by NFA

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 44


IT, Nidasoshi
• It starts in only its start state, q0. When the first 0 is read,
the NFA may go to either state q0 or state q1, so it does both.
• Then, the second 0 is read. State q0 may again go to both q0
and q1. However, state q1 has no transition on 0, so it "dies”.
• When the third input, a 1, occurs, we must consider
transitions from both q0 and q1. We find that q0 goes only to
q0 on 1, while q1 goes only to q2.
• Thus, after reading 001, the NFA is in states q 0 and q2. Since
q2 is an accepting state, the NFA accepts 001.
• However, the input is not finished. The fourth input, a 0,
causes q2's thread to die, while q0 goes to both q0 and q1.
• The last input, a 1, sends q0 to q0 and q1 to q2. Since we are
again in an accepting state, 00101 is accepted.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 45


IT, Nidasoshi
The Extended Transition Function

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 46


IT, Nidasoshi
• Let us use to describe the processing of
input 00101 by the NFA of Fig. 2.9. A summary
of the steps is:

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 47


IT, Nidasoshi
2.3.4 The Language of an NFA

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 48


IT, Nidasoshi
2.3.5 Equivalence of Deterministic and
Nondeterministic Finite Automata
• Although there are many languages for which an NFA is
easier to construct than a DFA such as the language
(Example 2.6) of strings that end in 01.
• It is a surprising fact that every language that can be
described by some NFA can also be described by some
DFA.
• Moreover, the DFA in practice has about as many states
as the NFA, although it often has more transitions.
• In the worst case, however, the smallest DFA can have 2 n
states while the smallest NFA for the same language has
only n states.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 49
IT, Nidasoshi
• The proof that DFA's can do whatever NFA's
can do involves an important "construction"
called the subset construction because it
involves constructing all subsets of the set of
states of the NFA.
• The subset construction starts from an NFA N =
(QN, Ʃ, δN, q0,FN). Its goal is the description of a
DFA D = (QD, Ʃ, δD, {q0}, FD)such that L(D) = L(N).
• Notice that the input alphabets of the two
automata are the same, and the start state of D
is the set containing only the start state of N.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 50


IT, Nidasoshi
• The other components of D are constructed as
follows.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 51


IT, Nidasoshi
Examples

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 52


IT, Nidasoshi
2.4 An Application: Text Search
• In this section, we shall see that the abstract
study of the previous section, where we
considered the "problem" of deciding whether
a sequence of bits ends in 01, is actually an
excellent model for several real problems that
appear in applications such as Web search and
extraction of information from text.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 53


IT, Nidasoshi
2.4.1 Finding Strings in Text
• A common problem in the age of the Web and other online
text repositories is the following.
• Given a set of words, find all documents that contain one (or
all) of those words.
• A search engine is a popular example of this process.
• The search engine uses a particular technology, called inverted
indexes, where for each word appearing on the Web (there are
100,000,000 different words), a list of all the places where that
word occurs is stored.
• Machines with very large amounts of main memory keep the
most common of these lists available, allowing many people to
search for documents at once.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 54
IT, Nidasoshi
• Inverted-index techniques do not make use of
finite automata, but they also take very large
amounts of time for crawlers to copy the Web
and set up the indexes.
• There are a number of related applications
that are unsuited for inverted indexes, but are
good applications for automaton-based
techniques.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 55


IT, Nidasoshi
• The characteristics that make an application suitable for searches
that use automata are:
• 1. The repository on which the search is conducted is rapidly
changing. For example:
– (a)Every day, news analysts want to search the day's online news
articles for relevant topics. For example, a financial analyst might search
for certain stock ticker symbols or names of companies.
– (b) A "shopping robot" wants to search for the current prices charged
for the items that its clients request. The robot will retrieve current
catalog pages from the Web and then search those pages for words that
suggest a price for a particular item.
• 2. The documents to be searched cannot be cataloged. For
example,
– Amazon.com does not make it easy for crawlers to find all the pages for
all the books that the company sells. Rather, these pages are generated
"on the fly" in response to queries. However, we could send a query for
books on a certain topic, say "finite automata," and then search the
pages retrieved for certain words, e.g., "excellent" in a review portion.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 56
IT, Nidasoshi
2.4.2 Nondeterministic Finite Automata for
Text Search
• Suppose we are given a set of words, which we
shall call the keywords, and we want to find
occurrences of any of these words.
• In applications such as these, a useful way to
proceed is to design a nondeterministic finite
automaton, which signals, by entering an accepting
state, that it has seen one of the keywords.
• The text of a document is fed, one character at a
time to this NFA, which then recognizes
occurrences of the keywords in this text.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 57
IT, Nidasoshi
• There is a simple form to an NFA that recognizes a
set of keywords.
– 1.There is a start state with a transition to itself on
every input symbol, e. g. every printable ASCII
character if we are examining text. Intuitively, the
start state represents a "guess" that we have not yet
begun to see one of the keywords, even if we have
seen some letters of one of these words.
– 2.For each keyword a1a2 · · · ak, there are k states, say
q1, q2, ........, qk, There is a transition from the start
state to q1 on symbol a1, a transition from q1 to q2
on symbol a2, and so on. The state qk is an accepting
state and indicates that the keyword a1,a2,....., ak has
been found.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 58
IT, Nidasoshi
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 59
IT, Nidasoshi
2.4.3 A DFA to Recognize a Set of Keywords

• We can apply the subset construction to any NFA.


However, when we apply that construction to an
NFA that was designed from a set of keywords, we
find that the number of states of the DFA is never
greater than the number of states of the NFA.
• Since in the worst case the number of states
exponentiates as we go to the DFA, this observation
is good news and explains why the method of
designing an NFA for keywords and then
constructing a DFA from it is used frequently.
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 60
IT, Nidasoshi
• The rules for constructing the set of DFA states
is as follows.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 61


IT, Nidasoshi
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 62
IT, Nidasoshi
2.5 Finite Automata With Epsilon-
Transitions
• We shall now introduce another extension of
the finite automaton.
• The new "feature" is that we allow a transition
on E, the empty string.
• In effect, an NFA is allowed to make a
transition spontaneously, without receiving an
input symbol.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 63


IT, Nidasoshi
2.5.1 Uses of ϵ -Transitions
• We shall begin with an informal treatment of ϵ -
NFA's, using transition diagrams with ϵ allowed as
a label.
• In the examples to follow, think of the automaton
as accepting those sequences of labels along
paths from the start state to an accepting state.
• However, each ϵ along a path is "invisible"; i.e., it
contributes nothing to the string along the path.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 64


IT, Nidasoshi
Example 2.16: In Fig. 2.18 is an ϵ - NFA that
accepts decimal numbers consisting of:
1. An optional + or - sign,
2. A string of digits,
3. A decimal point, and
4. Another string of digits. Either this string of digits, or
the string (2) can be empty, but at least one of the
two strings of digits must be nonempty.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 65


IT, Nidasoshi
2.5.2 The Formal Notation for an ϵ -NFA

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 66


IT, Nidasoshi
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 67
IT, Nidasoshi
2.5.3 Epsilon-Closures
• Definition: ECLOSE(q) is read as ϵ - CLOSURE of
q and is the set of all states which are
reachable from q on ϵ-transitions only.
• The recursive definition of ECLOSE(q) is:
– State q is in ECLOSE(q) i.e. ECLOSE(q) = q for each
q
– If state p is in ECLOSE(q), and there is a transition
from state p to state r labeled ϵ, then r is in
ECLOSE(q) i.e. if ECLOSE(q) = {p, r}

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 68


IT, Nidasoshi
Acceptance of Language
• A string of symbols obtained from Ʃ. The set of all strings
recognized by an automaton is called language.
• The Language L accepted by an ϵ - NFA E =(Q, Ʃ, δ, q0, A)
where Q is the set of finite states, Ʃ is the set of input
alphabets, δ is transition from Q X {Ʃ U ϵ) to 2Q, q0 is the
start state and A is the final or accepting state.
• The string w accepted by an NFA can be defined in
formal notation as :
• L(E) = { w | w ϵ Ʃ* and

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 69


IT, Nidasoshi
Acceptance of Language
• A string of symbols obtained from Ʃ. The set of all strings
recognized by an automaton is called language.
• The Language L accepted by an ϵ - NFA E =(Q, Ʃ, δ, q0, A)
where Q is the set of finite states, Ʃ is the set of input
alphabets, δ is transition from Q X {Ʃ U ϵ) to 2Q, q0 is the
start state and A is the final or accepting state.
• The string w accepted by an NFA can be defined in
formal notation as :
• L(E) = { w | w ϵ Ʃ* and

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 70


IT, Nidasoshi
Acceptance of Language
• A string of symbols obtained from Ʃ. The set of all
strings recognized by an automaton is called language.
• The Language L accepted by an ϵ - NFA E =(Q, Ʃ, δ, q0, F)
where Q is the set of finite states, Ʃ is the set of input
alphabets, δ is transition from Q X {Ʃ U ϵ) to 2Q, q0 is the
start state and F is the final or accepting state.
• The string w accepted by an NFA can be defined in
formal notation as :
• L(E) = { w | w ϵ Ʃ* and (q0, w) = P with at least one
component of P in F}
04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 71
IT, Nidasoshi
• Obtain an ϵ – NFA which accepts strings
consisting of zero or more a’s followed by zero
or more b’s followed by zero or more c’s.

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 72


IT, Nidasoshi
Conversion of ϵ-NFA to DFA
• Given any ϵ-NFA E, we can find a DFA D that accepts
the same language as E. The construction we use is
very close to the subset construction, as the states of
D are subsets of the states of E.
• Let E =(QE, Ʃ, δE, q0, FE). Then the equivalent DFA
D=(QD, Ʃ, δD, q0, FD) is defined as follows:
• QD is the set of subsets of QE · More precisely, we shall
find that all accessible states of D are ϵ -closed subsets
of QE, that is, sets S QE such that S = ECLOSE(S).

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 73


IT, Nidasoshi
• qD = ECLOSE(q0); that is, we get the start state
of D by closing the set consisting of only the
start state of E.
• FD is those sets of states that contain at least
one accepting state of E. That is, FD = {S l S is in
QD and S ∩ FE = ᴓ }.
• δD(S, a) is computed, for all a in Ʃ; and sets S in
QD by:

04/02/2025 Prof. A.A.Daptardar, Department of CSE, HS 74


IT, Nidasoshi

You might also like