TOA - Lecture 3
CS-312
Theory of Automata and Formal
Languages
Agenda
Regular Expression
Formal Definition of Regular Expression
Regular Languages
Languages associated with regular expressions
All finite languages are regular
Introduction to Finite Automata:
States
Transition
Acceptance or Rejection of a string
Representation of an FA by Transition Table
Regular Expression
Given Σ = {a, b}
a* = {Λ, a, aa, aaa, aaaa, aaaaa, …}
ab* = {a, ab,abb,abbb,abbbb, …}
a+b = {a,b}
(ab)* = {Λ, ab, abab, ababab, …}
(a+b)* = {Λ, any string of a’s and b’s}
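These examples can be tried out with Python's re module. The following is a minimal sketch, assuming that the slide's + (union) corresponds to | in Python and that Λ corresponds to the empty string; the helper names to_python and language are hypothetical and only serve this illustration.

```python
import re
from itertools import product

def to_python(regex: str) -> str:
    """Translate slide notation to Python re syntax (assumption: '+' is union, Λ is empty)."""
    return regex.replace(" ", "").replace("Λ", "").replace("+", "|")

def language(regex: str, max_len: int, alphabet: str = "ab") -> list:
    """List every string of length <= max_len generated by the regex, shortest first."""
    pattern = re.compile(to_python(regex))
    words = ("".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n))
    return [w for w in words if pattern.fullmatch(w)]

print(language("a*", 3))     # ['', 'a', 'aa', 'aaa']
print(language("ab*", 3))    # ['a', 'ab', 'abb']
print(language("a+b", 3))    # ['a', 'b']
print(language("(ab)*", 4))  # ['', 'ab', 'abab']
```

The empty string '' in the output plays the role of Λ.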
Formal Definition of Regular Expressions
The set of regular expressions is defined by the following rules:
1. Every letter of Σ, and Λ, is a regular expression
2. If r1 and r2 are regular expressions, then so are
r1r2
r1+r2
r1* (and r2*)
Example of RE
Now consider the language L of strings of even length, defined
over Σ = {a, b}. Its regular expression may be built up as follows:
(a+b) generates the strings of length 1
(a+b)(a+b) generates the strings of length 2, which is an even length
((a+b)(a+b))* generates all strings of even length
For example, two repetitions, ((a+b)(a+b))((a+b)(a+b)), generate the strings of length 4
Example of RE
Now consider the language L of strings of odd length,
defined over Σ = {a, b}. Its regular expression
may be
(a+b)((a+b)(a+b))* or
((a+b)(a+b))*(a+b)
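The even-length and odd-length expressions can be checked against the length of each string. This is a small sketch, assuming the translation of + to | in Python's re syntax; it tests only strings up to length 6, so it is evidence rather than a proof.

```python
import re
from itertools import product

EVEN = re.compile(r"((a|b)(a|b))*")         # ((a+b)(a+b))*
ODD = re.compile(r"(a|b)((a|b)(a|b))*")     # (a+b)((a+b)(a+b))*
ODD_ALT = re.compile(r"((a|b)(a|b))*(a|b)") # ((a+b)(a+b))*(a+b)

def disagreements(max_len: int = 6) -> list:
    """Collect strings on which a regex disagrees with the parity of the length."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            even_ok = bool(EVEN.fullmatch(w)) == (n % 2 == 0)
            odd_ok = bool(ODD.fullmatch(w)) == (n % 2 == 1)
            alt_ok = bool(ODD_ALT.fullmatch(w)) == (n % 2 == 1)
            if not (even_ok and odd_ok and alt_ok):
                bad.append(w)
    return bad

print(disagreements())  # []
```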
Remark
It may be noted that a language may be expressed
by more than one regular expression,
while a given regular expression generates exactly
one language.
Example of RE
Consider the language, defined over
Σ={a, b}, of words starting with double a and
ending in double b then its regular expression may
be aa(a+b)*bb
Consider the language, defined over
Σ={a, b} of words starting with a and ending in
b OR starting with b and ending in a, then its
regular expression may be a(a+b)*b+b(a+b)*a
Example of RE
Example:
Consider the language, defined over
Σ={a , b} of words having at least one a, may be
expressed by a regular expression
(a+b)*a(a+b)*.
Consider the language, defined over
Σ = {a, b} of words having at least one a and one
b, may be expressed by a regular expression
(a+b)*a(a+b)*b(a+b)*+ (a+b)*b(a+b)*a(a+b)*.
Example of RE
Consider the language, defined over
Σ={a, b} of words beginning with a, then its
regular expression may be a(a+b)*
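Consider the language, defined over
Σ={a, b}, of words beginning and ending in the
same letter. Its regular expression may be built from three cases:
a(a+b)*a for words that begin and end with a
b(a+b)*b for words that begin and end with b
(a+b) for the one-letter words a and b
Combining the three cases gives (a+b)+a(a+b)*a+b(a+b)*b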
Example of RE
Consider the language, defined over
Σ={a, b} of words ending in b, then its regular
expression may be (a+b)*b.
Consider the language, defined over
Σ={a, b} of words not ending in a, then its regular
expression may be (a+b)*b + Λ. It is to be noted that this
language may also be expressed by ((a+b)*b)*.
An important example
The Language EVEN-EVEN :
Language of strings, defined over Σ={a, b} having even
number of a’s and even number of b’s. i.e.
EVEN-EVEN = {Λ, aa, bb, aaaa,aabb,abab, abba, baab, baba,
bbaa, bbbb,…} ,
its regular expression can be written as
(aa + bb + (ab+ba)(aa+bb)*(ab+ba))*
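To understand the third term, pick one option from each factor, for example:
(ab)(aa)(ba) = abaaba, taking ab, then aa once, then ba
(ab)(bb)(ba) = abbbba, taking ab, then bb once, then ba
Both strings have an even number of a’s and an even number of b’s.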
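The EVEN-EVEN expression can be compared with a direct count of letters. The following is a sketch under the assumption that + translates to | in Python's re syntax; it checks only strings up to length 8, and the function name in_even_even is hypothetical.

```python
import re
from itertools import product

# (aa + bb + (ab+ba)(aa+bb)*(ab+ba))* in Python syntax
EVEN_EVEN = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")

def in_even_even(word: str) -> bool:
    """Membership by definition: even number of a's and even number of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0

mismatches = [
    w
    for n in range(9)
    for w in map("".join, product("ab", repeat=n))
    if bool(EVEN_EVEN.fullmatch(w)) != in_even_even(w)
]
print(mismatches)  # []
```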
Note
It is important to be clear about the difference between
the following regular expressions
r1=a*+b*
r2=(a+b)*
Here r1 generates only strings made up entirely of a’s or entirely of b’s,
so it does not generate any string that mixes a and b (such as ab),
while r2 generates such strings.
Equivalent Regular Expressions
Definition:
Two regular expressions are said to be equivalent if they
generate the same language.
Example:
Consider the following regular expressions
r1= (a + b)* (aa + bb)
r2= (a + b)*aa + ( a + b)*bb then
both regular expressions define the language of strings
ending in aa or bb.
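Equivalence can be explored by brute force: generate all short strings and look for one that the two expressions treat differently. This is only a sketch (the helper first_difference is hypothetical, and + is written as | for Python's re). Finding no difference up to a bound is evidence of equivalence, not a proof.

```python
import re
from itertools import product

def first_difference(p1: str, p2: str, max_len: int = 7, alphabet: str = "ab"):
    """Return the first string (shortest first) matched by exactly one pattern, else None."""
    r1, r2 = re.compile(p1), re.compile(p2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return w
    return None

# r1 = (a + b)*(aa + bb) and r2 = (a + b)*aa + (a + b)*bb
print(first_difference(r"(a|b)*(aa|bb)", r"(a|b)*aa|(a|b)*bb"))  # None

# a*+b* and (a+b)* are not equivalent: ab separates them
print(first_difference(r"a*|b*", r"(a|b)*"))  # ab
```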
Regular Languages
Regular Languages Definition
The language generated by any regular expression is
called a regular language.
Note
It is to be noted that if L1 and L2 are expressed by r1 and r2,
respectively, then the language expressed by
1) r1+r2 is the language L1 + L2, or L1 U L2
2) r1r2 is the language L1L2, of strings obtained by following
every string of L1 with every string of L2
3) r1* is the language L1*, of strings obtained by concatenating any number of
strings of L1, including the null string.
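The three operations can be sketched on finite sets of strings. The function names below are hypothetical, and because L* is infinite whenever L contains a non-empty string, the star is cut off at a maximum length chosen for illustration.

```python
def union(l1: set, l2: set) -> set:
    """L1 + L2: every string that is in L1 or in L2."""
    return l1 | l2

def concat(l1: set, l2: set) -> set:
    """L1L2: every string of L1 followed by every string of L2."""
    return {u + v for u in l1 for v in l2}

def star(l: set, max_len: int) -> set:
    """L*, truncated to strings of length <= max_len; '' stands for the null string."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more string of L, keeping only unseen ones.
        frontier = {u + v for u in frontier for v in l if len(u + v) <= max_len} - result
        result |= frontier
    return result

L1, L2 = {"aa", "bb"}, {"a", "b"}
print(sorted(union(L1, L2)))   # ['a', 'aa', 'b', 'bb']
print(sorted(concat(L1, L2)))  # ['aaa', 'aab', 'bba', 'bbb']
print(sorted(star(L1, 4)))     # ['', 'aa', 'aaaa', 'aabb', 'bb', 'bbaa', 'bbbb']
```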
Example
If r1=(aa+bb) and r2=(a+b) then the language of strings generated
by r1+r2, is also a regular language, expressed by (aa+bb)+(a+b)
If r1=(aa+bb) and r2=(a+b) then the language of strings
generated by r1r2, is also a regular language, expressed by
(aa+bb)(a+b)
If r=(aa+bb) then the language of strings generated by r*, is also a
regular language, expressed by (aa+bb)*
All finite languages are regular
Example:
Consider the language L, defined over Σ={a,b}, of strings of
length 2, starting with a, then
L={aa, ab}, may be expressed by the regular expression
aa+ab. Hence L, by definition, is a regular language.
Note
It may be noted that even if a language contains a thousand
words, its RE may be expressed by placing ‘ + ’ between all the
words.
Here the special structure of the RE is not important.
Consider the language
L={aaa, aab, aba, abb, baa, bab, bba, bbb},
that may be expressed by a
RE=aaa+aab+aba+abb+baa+bab+bba+bbb,
which is equivalent to
(a+b)(a+b)(a+b).
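The construction for finite languages is mechanical, which a few lines of Python can show. This is a sketch only: the helper finite_language_regex is hypothetical, and it uses | where the slides use +.

```python
import re
from itertools import product

def finite_language_regex(words) -> str:
    """Join the words of a finite language with '|' (the slides' '+')."""
    return "|".join(re.escape(w) for w in sorted(words))

L = {"aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb"}
listed = finite_language_regex(L)
print(listed)  # aaa|aab|aba|abb|baa|bab|bba|bbb

# Compare with the compact form (a+b)(a+b)(a+b) on all strings up to length 4.
compact = re.compile(r"(a|b)(a|b)(a|b)")
listed_re = re.compile(listed)
agree = all(
    bool(listed_re.fullmatch(w)) == bool(compact.fullmatch(w))
    for n in range(5)
    for w in map("".join, product("ab", repeat=n))
)
print(agree)  # True
```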
What are the Regular Languages of the
following Regular Expressions?
Are the following REs? If so, what languages do they
generate?
a (b + a)*
bb(a+b)
(a+b)(a+b)(a+b)
(a+b)*ba
(a+b)*a(a+b)*
(a+b)*aa(a+b)*
What are the Regular Expressions of the
following Languages?
Write REs for the following languages over Σ = {a, b}.
All words ending with b
All words that start with a
All words that start with a double letter
All words that contain at least one double letter
All words that start and end with a double letter
All words of length >=3
All words that contain exactly one a or exactly one b
All words that do not end in b
Finite Automata
Introduction to Defining Languages
Finite Automata
Language Recognizers
Machines embedded with grammatical rules that recognize a
language
REs define a language, while FAs accept (or reject) input strings and so recognize a language
Finite Automata
An FA can be drawn as a sort of graph consisting of nodes called states and edges
called transitions, each edge labeled with the character read
Finite Automata
A finite automaton is a collection of the following:
A finite set of states
Exactly one initial state (start state)
Zero or more final states that mark the
acceptance of a word
Intermediate states that are neither start nor final states
An alphabet of possible input letters
How Does a Finite Automaton
work?
The start state marks the beginning of reading every
input
Reading a character triggers a transition from that
state which may transfer control to some other state and
the reading mechanism advances to next character and
the process continues
When the input terminates, if the control is left
with a final or accepting state, the input string is
accepted otherwise it is rejected and the FA resets
control to the initial state for next input
How Does a Finite Automaton
work?
Each transition specifies the state to go to next on reading a letter of the input
Finite Automata Visual Representation
States are represented by circles, labeled to identify each distinctly (for example, 1 or x)
The initial state is marked with a - sign and the final states with a + sign
Transitions are directed edges labeled with the characters of the input alphabet
(for example, an edge labeled a from the - state to the + state)
Example of Finite Automata
Consider the following FA:
The input alphabet has only the two letters a and b. (We usually use
this alphabet throughout the chapter.)
There are only three states, x, y and z, where x is the start state and
z is the final state.
Transition table (rows are states, columns are input letters):
State       a   b
x (start)   y   z
y           x   z
z (final)   z   z
Example of Finite Automata
Let us examine what happens when the input string aaa is
presented to this FA.
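The run on aaa can be traced by following the transition table letter by letter. The sketch below encodes the table for states x, y and z as a Python dictionary; the names TRANSITIONS, trace and accepts are hypothetical.

```python
# Transition table of the example FA: (state, letter) -> next state
TRANSITIONS = {
    ("x", "a"): "y", ("x", "b"): "z",
    ("y", "a"): "x", ("y", "b"): "z",
    ("z", "a"): "z", ("z", "b"): "z",
}
START, FINAL = "x", {"z"}

def trace(word: str) -> list:
    """Return the sequence of states visited while reading the word."""
    state = START
    path = [state]
    for letter in word:
        state = TRANSITIONS[(state, letter)]
        path.append(state)
    return path

def accepts(word: str) -> bool:
    """A word is accepted when its run ends in a final state."""
    return trace(word)[-1] in FINAL

print(trace("aaa"))    # ['x', 'y', 'x', 'y']
print(accepts("aaa"))  # False
print(accepts("aab"))  # True
```

The run on aaa ends in state y, which is not a final state.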
34
Example of Finite Automata
The set of all strings that lead to a final state is called the
language defined by the finite automaton.
Thus, the string aaa is not in the language defined by
this FA.
We may also say that the string aaa is not accepted by this
FA, or the string aaa is rejected by this FA.
The set of all strings accepted is also called the language
associated with the FA.
We also say, “This FA accepts the language L”, or “L is the
language accepted by this FA”, or “L is the language of
the FA”, by which we mean that all the words in L are
accepted, and all the inputs accepted are words in L
Example of Finite Automata
It is not difficult to find the language accepted by this FA.
If an input string is made up of only letter a’s then the action
of the FA will be to jump back and forth between state x and
state y.
To get to state z, it is necessary for the string to have the letter
b in it. As soon as a b is encountered, the FA jumps to state z.
Once in state z, it is impossible to leave. When the input string
runs out, the FA will be in the final state z.
This FA will accept all strings that have the letter b in them.
Hence, the language accepted by this FA is defined by the
regular expression
(a + b)*b(a + b)*
Abstract definition of FA
1. FA = (Q, Σ, q0, F, δ)
2. A finite set of states Q = {q0, q1, q2, q3, …, qn}, where n is
finite
3. An alphabet Σ = {x1, x2, x3, …}
4. q0 is the start state
5. F ⊆ Q is the set of final states; F may be empty
6. A transition function δ associating each pair of state and
letter with a state:
δ(qi, xj) = qk
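The five components map naturally onto a small data structure. The following is one possible sketch, not a standard library type: the class name FiniteAutomaton and its field names are assumptions chosen for illustration, and δ is stored as a dictionary keyed by (state, letter).

```python
from dataclasses import dataclass

@dataclass
class FiniteAutomaton:
    states: set    # Q
    alphabet: set  # Σ
    start: str     # q0
    finals: set    # F, a subset of Q (may be empty)
    delta: dict    # δ: (state, letter) -> state

    def __post_init__(self):
        # Check the conditions stated in the definition.
        if self.start not in self.states:
            raise ValueError("q0 must belong to Q")
        if not self.finals <= self.states:
            raise ValueError("F must be a subset of Q")
        for q in self.states:
            for x in self.alphabet:
                if self.delta.get((q, x)) not in self.states:
                    raise ValueError(f"delta({q}, {x}) must be a state in Q")

    def accepts(self, word: str) -> bool:
        q = self.start
        for x in word:
            q = self.delta[(q, x)]
        return q in self.finals

# The three-state example with states x, y, z (x is the start state, z is final).
fa = FiniteAutomaton(
    states={"x", "y", "z"},
    alphabet={"a", "b"},
    start="x",
    finals={"z"},
    delta={("x", "a"): "y", ("x", "b"): "z",
           ("y", "a"): "x", ("y", "b"): "z",
           ("z", "a"): "z", ("z", "b"): "z"},
)
print(fa.accepts("aab"))  # True
print(fa.accepts("aaa"))  # False
```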
Transition Diagrams
Pictorial representation of an FA gives us more of a
feeling for the motion.
We represent each state by a small circle.
We draw arrows showing to which other states the
different input letters will lead us. We label these
arrows with the corresponding input letters.
If a certain letter makes a state go back to itself, we
indicate this by a loop.
We indicate the start state by a minus sign, or by
labeling it with the word start.
We indicate the final states by plus signs, or by labeling
them with the word final.
Sometimes, a start state is indicated by an arrow, and a
final state is indicated by drawing another circle around its circle.
FA with Minus and Plus Sign
FA with start and final label
FA with arrow and double circle
Transition Diagram
When we depict an FA as circles and arrows, we say that
we have drawn a directed graph.
We borrow from Graph Theory the name directed
edge, or simply edge, for the arrow between states.
Every state has as many outgoing edges as there
are letters in the alphabet.
It is possible for a state to have no incoming
edges or to have many.
Finite Automata Example
By convention, we say that the null string starts in the
start state and ends also in the start state for all FAs.
Consider this FA:
Here, the ± means that the same state is both a start and a
final state.
The language for this machine is
(a + b)*
Finite Automata Example
The second type of FA that accepts no words includes FAs whose final states cannot be
reached from the start state.
Finite Automata Example
Or it is because the final state has no incoming edges, as
shown below:
FA and their Languages
We will study FA from two different angles:
Example
Let us build a machine that accepts the language of all words over
the alphabet Σ = {a, b} with an even number of letters.
A mathematician could approach this problem by counting the total
number of letters from left to right. A computer scientist would
solve the problem differently since it is not necessary to do all the
counting:
Use a Boolean flag, named E, initialized with the value TRUE. Every
time we read a letter, we reverse the value of E until we have
exhausted the input string. We then check the value of E. If it is
TRUE, then the input string is in the language; if FALSE, it is not.
The FA for this language should require only 2 states:
– State 1: E is TRUE. This is the start and also final state.
– State 2: E is FALSE.
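The flag idea and the two-state machine can be written side by side. This is a minimal sketch: the function names are hypothetical, and the transition table assumes that every letter moves the machine from state 1 to state 2 and back, as the description of E implies.

```python
def even_length_by_flag(word: str) -> bool:
    """Flip the Boolean flag E once per letter; the word is in the language if E ends TRUE."""
    e = True
    for _ in word:
        e = not e
    return e

# Two-state FA: state 1 means E is TRUE (start and final), state 2 means E is FALSE.
DELTA = {(1, "a"): 2, (1, "b"): 2, (2, "a"): 1, (2, "b"): 1}

def even_length_by_fa(word: str) -> bool:
    state = 1
    for letter in word:
        state = DELTA[(state, letter)]
    return state == 1

for w in ["", "a", "ab", "bab", "abba"]:
    print(repr(w), even_length_by_flag(w), even_length_by_fa(w))
```

Neither version counts the letters; both remember only one bit of information.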
Example Contd.
So the FA is pictured as follows:
Example
Let us build an FA for the language of all strings that begin with the
letter a.
a(a + b)*
Example
The machine looks like this:
Example
Let’s build a machine that accepts all words containing a
triple letter, either aaa or bbb, and only those words.
From the start state, the FA must have a path of three edges,
with no loop, to accept the word aaa. So, we begin our FA
with the following:
Example Contd.
For a similar reason, there must be a path for bbb that has no loop
and uses entirely different states. If the b-path shared any states
with the a-path, we could mix a’s and b’s to get to the final state.
However, the final state can be shared.
Example Contd.
If we are moving along the a-path and we read a b before the
third a, we need to jump to the b-path in progress, and
vice versa. The final FA then looks like this:
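One machine that matches this construction can be written as a transition table and tested. The state names below (start, a1, a2, b1, b2, final) are hypothetical labels chosen for this sketch; a1 and a2 lie on the a-path, b1 and b2 on the b-path, and the final state is shared and loops on both letters.

```python
from itertools import product

DELTA = {
    ("start", "a"): "a1", ("start", "b"): "b1",
    ("a1", "a"): "a2",    ("a1", "b"): "b1",   # a b on the a-path jumps to the b-path
    ("a2", "a"): "final", ("a2", "b"): "b1",
    ("b1", "a"): "a1",    ("b1", "b"): "b2",   # an a on the b-path jumps to the a-path
    ("b2", "a"): "a1",    ("b2", "b"): "final",
    ("final", "a"): "final", ("final", "b"): "final",
}

def accepts(word: str) -> bool:
    state = "start"
    for letter in word:
        state = DELTA[(state, letter)]
    return state == "final"

# Compare with the definition of the language on all strings up to length 8.
mismatches = [
    w
    for n in range(9)
    for w in map("".join, product("ab", repeat=n))
    if accepts(w) != ("aaa" in w or "bbb" in w)
]
print(mismatches)  # []
```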
Example
Consider the FA below. We would like to examine what language
this machine accepts.
Example
There are only two ways to get to the final state 4 in this FA: One is from state 2
and the other is from state 3.
The only way to get to state 2 is by reading an a while in either state 1 or state 3.
If we read another a we will go to the final state 4.
Similarly, to get to state 3, we need to read the input letter b while in either state
1 or state 2. Once in state 3, if we read another b, we will go to the final state 4.
Thus, the words accepted by this machine are exactly those strings that have a
double letter aa or bb in them. This language is defined by the regular expression
(a + b)*(aa + bb)(a + b)*
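The transitions described in this analysis can be collected into a table and checked. The sketch assumes that state 1 is the start state and that the final state 4 loops back to itself on both letters, since the figure is not reproduced here.

```python
from itertools import product

DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 4, (2, "b"): 3,  # a second a in a row reaches the final state
    (3, "a"): 2, (3, "b"): 4,  # a second b in a row reaches the final state
    (4, "a"): 4, (4, "b"): 4,  # assumption: state 4 cannot be left
}

def accepts(word: str) -> bool:
    state = 1
    for letter in word:
        state = DELTA[(state, letter)]
    return state == 4

# Compare with "contains aa or bb" on all strings up to length 8.
mismatches = [
    w
    for n in range(9)
    for w in map("".join, product("ab", repeat=n))
    if accepts(w) != ("aa" in w or "bb" in w)
]
print(mismatches)  # []
```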
Example
Consider the FA below. What is the language accepted by this
machine?
Example
Starting at state 1, if we read a word beginning with an a, we will
go straight to the final state 3. We will stay in state 3 as long as we
continue to read only a’s. Hence, all words of the form aa* are
accepted by this FA.
What if we began with some a’s that take us to state 3 and then we
read a b? This will bring us to state 2. To get back to the final state 3,
we must proceed to state 4 and then state 3. This trip requires two
more b’s.
Notice that in states 2, 3, and 4, all a’s that are read are ignored; and
only b’s cause a change of state.
Summarizing what we know: If an input string starts with an a
followed by some b’s, then it must have 3 b’s to return to the final
state 3, or 6 b’s to make the trip twice, or 9 b’s, or 12 b’s and so on.
In other words, an input string starting with an a and having a total
number of b’s divisible by 3 will be accepted. If an input string
starts with an a but has a total number of b’s not divisible by 3, then
it is rejected because its path will end at either state 2 or 4.
Example Contd.
What happens to an input string that begins with a b?
Such an input string will lead us to state 2. It then needs two more b’s to
get to the final state 3. These b’s can be separated by any number of a’s.
Once in state 3, it needs no more b’s, or 3 more b’s, or 6 more b’s and so
on.
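Putting both cases together suggests that the machine accepts exactly the non-empty strings whose number of b’s is divisible by 3. The sketch below rebuilds the transition table from the description (the figure is not reproduced, so the table is an assumption inferred from the text) and checks that claim on short strings.

```python
from itertools import product

DELTA = {
    (1, "a"): 3, (1, "b"): 2,  # state 1 is the start state
    (2, "a"): 2, (2, "b"): 4,  # a's are ignored in states 2, 3 and 4
    (3, "a"): 3, (3, "b"): 2,  # state 3 is the final state
    (4, "a"): 4, (4, "b"): 3,
}

def accepts(word: str) -> bool:
    state = 1
    for letter in word:
        state = DELTA[(state, letter)]
    return state == 3

# Compare with "non-empty and number of b's divisible by 3" up to length 8.
mismatches = [
    w
    for n in range(9)
    for w in map("".join, product("ab", repeat=n))
    if accepts(w) != (len(w) > 0 and w.count("b") % 3 == 0)
]
print(mismatches)  # []
```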
Example EVEN-EVEN revisited
Consider the FA below.
Example EVEN-EVEN revisited Contd.
There are 4 edges labeled a. All the a-edges go either from one of the
upper two states (states 1 and 2) to one of the lower two states (states 3
and 4), or else from one of the lower two states to one of the upper two
states.
If a string gets accepted by this FA, we can say that the string must have
had an even number of a’s in it. Every a that took us south was balanced by
some a that took us back north.
So, every word in the language of this FA has an even number of a’s in it.
Also, we can say that every input string with an even number of a’s will
finish its path in the north (i.e., state 1 or state 2).
Example EVEN-EVEN revisited Contd.
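The same reasoning applies to the b-edges, which go between states 1 and 3 on
one side and states 2 and 4 on the other, so every accepted word also has an
even number of b’s.
Therefore, all the words in the language accepted by this FA must
have an even number of a’s and an even number of b’s. So, they are
in the language EVEN-EVEN.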
Notice that all input strings that end in state 2 have an even number
of a’s but an odd number of b’s. All strings that end in state 3 have an
even number of b’s but an odd number of a’s. All strings that end in
state 4 have an odd number of a’s and an odd number of b’s. Thus,
every word in the language EVEN - EVEN must end in state 1 and
therefore be accepted.
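The role of each state can be confirmed by simulation. The transition table below is inferred from the description (a-edges switch between the upper states 1, 2 and the lower states 3, 4; b-edges switch between states 1, 3 and states 2, 4), so it is an assumption about the figure, which is not reproduced here.

```python
from itertools import product

DELTA = {
    (1, "a"): 3, (1, "b"): 2,
    (2, "a"): 4, (2, "b"): 1,
    (3, "a"): 1, (3, "b"): 4,
    (4, "a"): 2, (4, "b"): 3,
}

# What each state records: (number of a's mod 2, number of b's mod 2)
PARITY = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}

def end_state(word: str) -> int:
    state = 1  # state 1 is both the start state and the final state
    for letter in word:
        state = DELTA[(state, letter)]
    return state

# Every string up to length 8 ends in the state that matches its letter parities.
mismatches = [
    w
    for n in range(9)
    for w in map("".join, product("ab", repeat=n))
    if PARITY[end_state(w)] != (w.count("a") % 2, w.count("b") % 2)
]
print(mismatches)         # []
print(end_state("abab"))  # 1
```

Since only state 1 records an even count of both letters, accepting in state 1 is the same as membership in EVEN-EVEN, at least for the strings tested.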