
HN ATC Notes Module 2

This document contains lecture notes on automata theory and computability. It discusses regular expressions and their relationship to finite state machines. The notes begin with examples of problems that involve matching patterns, such as lexical analysis in compilers. It then defines regular expressions recursively and provides examples. Kleene's theorem is discussed, which states that regular expressions and finite state machines have equivalent expressive power. The notes provide constructions for building a finite state machine from a regular expression and vice versa. Applications of regular expressions are mentioned.

Lecture Notes

15CS54
Automata Theory and
Computability
(CBCS Scheme)

Prepared by

Mr. Harivinod N
Dept. of Computer Science and Engineering,
Vivekananda College of Engineering and Technology, Puttur

Module-2: Regular Expression, Regular Grammar, Regular Language

Contents

1. Regular Expression
2. Kleene’s theorem
3. Applications of RE
4. Regular Grammars
5. Regular and Non-regular languages

Course website:
www.techjourney.in
Lecture Notes | 15CS54 – ATC | Module 1: Finite State Machines

1. Regular Expression
Let's now take a different approach to categorizing problems. Instead of focusing on the power
of a computing device, let's look at the task that we need to perform. In particular, let's consider
problems in which our goal is to match finite or repeating patterns.
• The first step of compiling a program: This step is called lexical analysis. Its job is to
break the source code into meaningful units such as keywords, variables, and numbers.
• Filtering email for spam
• Searching a complex directory structure by specifying patterns that are known to occur
in the file we want.
Definition: A regular expression (RE) is defined recursively. The regular expressions over
an alphabet Σ are all and only the strings that can be obtained as follows:
1. ∅ is a regular expression.
2. ε is a regular expression.
3. Every element of Σ is a regular expression.
4. If α, β are regular expressions, then so is αβ.
5. If α, β are regular expressions, then so is α ∪ β.
6. If α is a regular expression, then so is α*.
7. If α is a regular expression, then so is α+.
8. If α is a regular expression, then so is (α).
So, if we let Σ = {a, b}, the following strings are regular expressions:
∅, ε, a, b, (a ∪ b)*, abba ∪ ε.
Semantic interpretation
• L(∅) = ∅. The language that contains no strings
• L(ε) = {ε}. The language that contains just the empty string.
• L(c) = {c}, where c ∈ Σ. The language that contains the single one-character string c.
• L(αβ) = L(α) L(β). Concatenation of REs corresponds to concatenation of their languages.
• L(α ∪ β) = L(α) ∪ L(β). Union of REs corresponds to the union of the two constituent
languages.
• L(α*) = (L(α))*. Here * is the Kleene star operation. It defines the language that is formed by
concatenating together zero or more strings drawn from L(α).
• L(α+) = L(αα*) = L(α) (L(α))*. If L(α) is equal to ∅, then L(α+) is also equal to ∅.
Otherwise L(α+) is the language formed by concatenating together one or more strings
drawn from L(α).
• L((α)) = L(α).

If the meaning of a regular expression α is the language L, then we say that α defines or
describes L.
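
The semantic rules above can be turned into a small evaluator. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical encoding of regular expressions as nested tuples, and it lists only the strings of L(α) up to a chosen length, since L(α) may be infinite.

```python
# Hypothetical encoding (an assumption for this sketch):
#   ("empty",)  ("eps",)  ("sym", "a")  ("cat", r, s)  ("union", r, s)
#   ("star", r)  ("plus", r)

def lang(regex, max_len):
    """Return the strings of L(regex) whose length is at most max_len."""
    kind = regex[0]
    if kind == "empty":                      # L(∅) = ∅
        return set()
    if kind == "eps":                        # L(ε) = {ε}
        return {""}
    if kind == "sym":                        # L(c) = {c}
        return {regex[1]} if max_len >= 1 else set()
    if kind == "union":                      # L(α ∪ β) = L(α) ∪ L(β)
        return lang(regex[1], max_len) | lang(regex[2], max_len)
    if kind == "cat":                        # L(αβ) = L(α) L(β)
        left, right = lang(regex[1], max_len), lang(regex[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                       # zero or more strings from L(α)
        base = lang(regex[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    if kind == "plus":                       # L(α+) = L(αα*)
        return lang(("cat", regex[1], ("star", regex[1])), max_len)
    raise ValueError(kind)

A_OR_B = ("union", ("sym", "a"), ("sym", "b"))
SHORT_STRINGS = lang(("star", A_OR_B), 2)    # all strings over {a, b} of length <= 2
```

Note that `lang(("plus", ("empty",)), 3)` is the empty set while `lang(("star", ("empty",)), 3)` is `{""}`, which matches the remark about L(α+) when L(α) = ∅.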

Prepared by Harivinod N www.techjourney.in Page| 2



Example-1

So the meaning of the regular expression (a ∪ b)*b is the set of all strings over the alphabet {a,
b} that end in b.

Example-2

So the meaning of the regular expression ((a ∪ b)(a ∪ b))a(a ∪ b)* is:
{xay : x and y are strings of a's and b's and |x| = 2}.
Alternatively, it is the language that contains all strings of a's and b's such that there exists a
third character and it is an a.
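
The two readings above can be checked by brute force. This is a small sketch, assuming Python's `re` syntax, where `|` plays the role of ∪. The helper `all_strings` and the length bound 6 are choices made for this illustration only.

```python
import re
from itertools import product

ends_in_b = re.compile(r"(a|b)*b")                 # (a ∪ b)*b
third_is_a = re.compile(r"((a|b)(a|b))a(a|b)*")    # ((a ∪ b)(a ∪ b))a(a ∪ b)*

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# Collect every short string on which a regex and its plain-language
# description disagree; the list should stay empty.
mismatches = [
    w for w in all_strings("ab", 6)
    if bool(ends_in_b.fullmatch(w)) != w.endswith("b")
    or bool(third_is_a.fullmatch(w)) != (len(w) >= 3 and w[2] == "a")
]
```

A check like this is not a proof, since it only covers strings up to the chosen length, but it is a quick way to catch a wrong regular expression.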

Example-3

Example-4

Priority
The regular expression language that we have just defined provides three operators. We will
assign the following precedence order to them (from highest to lowest): 1. Kleene star, 2.
concatenation, and 3. union.
So the expression (a ∪ bb*a) will be interpreted as (a ∪ (b(b*a))).
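
Python's `re` module happens to use the same precedence order, so the interpretation can be checked mechanically. The sketch below is an illustration under that assumption. The pattern `wrong` shows what the expression would mean if union bound tighter than concatenation.

```python
import re
from itertools import product

unparenthesized = re.compile(r"a|bb*a")
intended = re.compile(r"a|(b((b*)a))")   # star first, then concatenation, then union
wrong = re.compile(r"(a|b)b*a")          # a different language

words = ["".join(p) for n in range(6) for p in product("ab", repeat=n)]

# The unparenthesized form and the fully parenthesized form agree everywhere.
same_as_intended = all(
    bool(unparenthesized.fullmatch(w)) == bool(intended.fullmatch(w)) for w in words
)
# Strings on which the wrong grouping gives a different answer.
differs_from_wrong = [
    w for w in words
    if bool(unparenthesized.fullmatch(w)) != bool(wrong.fullmatch(w))
]
```

For instance, the string a belongs to L(a ∪ bb*a) but not to L((a ∪ b)b*a), and aa is the other way round.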


Refer Class notes for more examples on RE of languages

2. Kleene’s theorem
Statement: Finite state machines and regular expressions define the same class of languages,
i.e., they are equivalent (equally powerful).
To prove this, we must show:
• Theorem: Any language that can be defined with a regular expression can be accepted
by some FSM and so is regular. (RE to FSM)
• Theorem: Every regular language (i.e., every language that can be accepted by some
DFSM) can be defined with a regular expression. (FSM to RE)

2.1. Building FSM from Regular expression


Theorem: For every regular expression there is an equivalent FSM, i.e., any language that
can be defined with a regular expression can be accepted by some FSM and so is regular.
Proof: The proof is by construction. We will show that given a regular expression α, we can
construct an FSM M such that L(α) = L (M).
We first show that there exists an FSM that corresponds to each primitive regular expression:

Figure 6.1: FSMs for primitive regular expressions


Based on the constructions that have just been described, we can define the following algorithm
to construct, given a regular expression α , a corresponding (usually nondeterministic) FSM:
regextofsm(α: regular expression) =
Beginning with the primitive subexpressions of α and working outwards until
an FSM for all of α has been built do:
Construct an FSM as described above.
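
One way to realize regextofsm in code is sketched below. It is an illustration under stated assumptions, not necessarily the exact constructions drawn in the notes: regular expressions are encoded as nested tuples (a hypothetical encoding), the machines are nondeterministic with ε-transitions (written as the empty string), and the helper names `closure` and `accepts` are made up for this sketch.

```python
from itertools import count

_ids = count()   # fresh state names 0, 1, 2, ...

def regextofsm(regex):
    """Return (start, accepting, transitions); transitions are (p, symbol, q) triples."""
    kind = regex[0]
    start = next(_ids)
    if kind == "empty":                       # no accepting state: accepts nothing
        return start, set(), set()
    if kind == "eps":                         # start state is accepting
        return start, {start}, set()
    if kind == "sym":
        end = next(_ids)
        return start, {end}, {(start, regex[1], end)}
    if kind == "union":                       # new start with ε-moves to both machines
        s1, a1, t1 = regextofsm(regex[1])
        s2, a2, t2 = regextofsm(regex[2])
        return start, a1 | a2, t1 | t2 | {(start, "", s1), (start, "", s2)}
    if kind == "cat":                         # ε-moves from M1's accepting states to M2's start
        s1, a1, t1 = regextofsm(regex[1])
        s2, a2, t2 = regextofsm(regex[2])
        return s1, a2, t1 | t2 | {(q, "", s2) for q in a1}
    if kind == "star":                        # new accepting start, ε-loops back to old start
        s1, a1, t1 = regextofsm(regex[1])
        loop = {(q, "", s1) for q in a1}
        return start, a1 | {start}, t1 | loop | {(start, "", s1)}
    raise ValueError(kind)

def closure(states, transitions):
    """All states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for (p, sym, r) in transitions:
            if p == q and sym == "" and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(machine, w):
    start, accepting, transitions = machine
    current = closure({start}, transitions)
    for ch in w:
        moved = {r for (p, sym, r) in transitions if p in current and sym == ch}
        current = closure(moved, transitions)
    return bool(current & accepting)

# (b ∪ ab)* in the tuple encoding
EXAMPLE = ("star", ("union", ("sym", "b"), ("cat", ("sym", "a"), ("sym", "b"))))
MACHINE = regextofsm(EXAMPLE)
```

The machine built this way usually has more states and ε-transitions than a hand-drawn one, which is the price of a purely mechanical construction.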

Example: Consider the regular expression (b ∪ ab)*. We use regextofsm to build an FSM that
accepts the language defined by this regular expression:

Refer Class notes for more examples


2.2. Building Regular expression from FSM


Theorem: Every regular language (i.e., every language that can be accepted by some FSM) can
be defined with a regular expression.
Proof: The proof is by construction. Given an FSM M = (K, Σ, δ, s, A), we can construct a
regular expression α such that L(M) = L(α).
Before starting the method, do the following modification.

Figure 6.2: Collapsing multiple transitions into one.

Figure 6.3. Adding all the required transitions.


Note: The above step is optional. We can assume ∅ whenever it is required.
Removing a state: Now suppose that we select a state rip and remove it and the transitions
into and out of it. Then we must modify every remaining transition so that M's function stays
the same. So, suppose that we remove some state that we will call rip. How should the
remaining transitions be changed? Consider any pair of states p and q. Once we remove rip,
how can M get from p to q?
• It can still take the transition that went directly from p to q, or
• It can take the transition from p to rip. Then, it can take the transition from rip back to
itself zero or more times. Then it can take the transition from rip to q.
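
Read together, the two cases above say that the new label from p to q is R(p, q) ∪ R(p, rip) R(rip, rip)* R(rip, q). A minimal sketch of this single step follows. It assumes transitions are stored in a dictionary from state pairs to regex strings in Python `re` syntax (`|` for ∪), with a missing pair standing for ∅. The function name `rip_state` and the tiny example machine are made up for illustration.

```python
import re

def rip_state(R, rip):
    """Remove state `rip`; R maps (p, q) to a regex string, missing pairs mean ∅."""
    states = {s for pair in R for s in pair} - {rip}
    loop = R.get((rip, rip))
    loop_part = f"({loop})*" if loop is not None else ""
    new_R = {}
    for p in states:
        for q in states:
            direct = R.get((p, q))
            into, out = R.get((p, rip)), R.get((rip, q))
            # Path through rip exists only if both of its halves exist.
            via = None
            if into is not None and out is not None:
                via = f"({into}){loop_part}({out})"
            if direct is not None and via is not None:
                new_R[(p, q)] = f"{direct}|{via}"
            elif direct is not None:
                new_R[(p, q)] = direct
            elif via is not None:
                new_R[(p, q)] = via
    return new_R

# Hypothetical machine: 1 -a-> 2, 2 -b-> 2, 2 -c-> 3, 1 -d-> 3; we rip out state 2.
R = {(1, 2): "a", (2, 2): "b", (2, 3): "c", (1, 3): "d"}
AFTER = rip_state(R, 2)          # {(1, 3): "d|(a)(b)*(c)"}
LABEL = re.compile(AFTER[(1, 3)])
```

Repeating this step until only the start state and one accepting state remain leaves a single label, which is the regular expression for the machine. That outer loop is not shown here.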


Refer Class notes for more examples


Regular Expression, Regular Languages and Kleene’s Theorem

Kleene's Theorem tells us that there is no difference between the formal power of regular
expressions and finite state machines. But, as some of the examples that we just considered
suggest, there is a practical difference in their effectiveness as problem solving tools:
• The regular expression language is a pattern language. In particular, regular expressions
must specify the order in which a sequence of symbols must occur. This is useful when
we want to describe patterns such as phone numbers (it matters that the area code
comes first) or email addresses (it matters that the user name comes before the
domain).
• But there are some applications where order doesn't matter. The vending machine example
is an instance of this class of problem: the order in which the coins were entered
doesn't matter. Parity checking is another: only the total number of 1 bits matters, not
where they occur in the string. Finite state machines can be very effective in solving
problems such as this. But the regular expressions that correspond to those FSMs may
be too complex to be useful.
The bottom line is that sometimes it is easy to write a finite state machine to describe a
language. For other problems, it may be easier to write a regular expression.
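
Parity checking gives a feel for this difference. The sketch below is an illustration only, assuming odd parity over the alphabet {0, 1}. It places a two-state machine next to one regular expression for the same language, written in Python `re` syntax. The names `odd_parity_dfsm` and `odd_parity_re` are made up for this sketch.

```python
import re
from itertools import product

def odd_parity_dfsm(w):
    """Two-state DFSM: the state records whether an odd number of 1s has been read."""
    state = "even"
    for bit in w:
        if bit == "1":
            state = "odd" if state == "even" else "even"
    return state == "odd"

# One regular expression for the same language: a first 1, then any mix of
# 0s and further pairs of 1s.
odd_parity_re = re.compile(r"0*1(0|10*1)*")

words = ["".join(p) for n in range(9) for p in product("01", repeat=n)]
agree = all(odd_parity_dfsm(w) == bool(odd_parity_re.fullmatch(w)) for w in words)
```

The machine follows directly from the problem statement, while the regular expression takes some thought to write and to read.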

3. Applications of RE
Because patterns are everywhere, applications of regular expressions are everywhere. The term
regular expression is used in the modern computing world in a much more general way than
we have defined it here. Many programming languages and scripting systems provide support
for regular expression matching. Each of them has its own syntax.
• The programming language Perl, for example, supports regular expression matching.
• Decimal Numbers: The following regular expression matches decimal encodings of
numbers:
-? ([0-9]+(\.[0-9]*)? | \.[0-9]+)
• Biology: Meaningful words in protein sequences are called motifs. They can be
described with regular expressions. Given a protein or DNA sequence, the task is to find
others that are likely to be evolutionarily close to it.
ESGHDTTTYYNKNRYPAGWNNHHDQMFFWV
To achieve this, we need to build a DFSM that can examine thousands of other sequences and
find those that match any of the selected patterns.
• IP Addresses: The following regular expression searches for Internet (IP) addresses:
([0-9]{1,3}(\.[0-9]{1,3}){3})
• In XML, regular expressions are one way to define parts of new document types.


• Legal Passwords: Consider the problem of determining whether a string is a legal
password. Suppose that we require that all passwords meet the following requirements:
  • A password must begin with a letter.
  • A password may contain only letters, numbers, and the underscore character.
  • A password must contain at least four characters and no more than eight
  characters.
The following regular expression describes the language of legal passwords. The line
breaks have no significance. We have used them just to make the expression easier to
read.
((a-z) U (A-Z))
((a-z) U (A-Z) U (0-9) U _)
((a-z) U (A-Z) U (0-9) U _)
((a-z) U (A-Z) U (0-9) U _)
((a-z) U (A-Z) U (0-9) U _ U ɛ)
((a-z) U (A-Z) U (0-9) U _ U ɛ)
((a-z) U (A-Z) U (0-9) U _ U ɛ)
((a-z) U (A-Z) U (0-9) U _ U ɛ).
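
Several of the patterns listed above can be tried directly in Python's `re` module. The sketch below is an illustration under a few assumptions: the spaces in the notes' patterns are taken to be for readability only and are dropped, character classes with bounded repetition replace the eight-line password expression, and the variable names are made up.

```python
import re

# Decimal numbers: optional sign, then digits with optional fraction, or a bare fraction.
decimal = re.compile(r"-?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

# IP addresses: four groups of 1 to 3 digits separated by dots.
# The pattern does not check that each group is at most 255.
ip_address = re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")

# Legal passwords: one letter, then 3 to 7 letters, digits or underscores,
# which gives a total length of 4 to 8 characters.
password = re.compile(r"[A-Za-z][A-Za-z0-9_]{3,7}")

def is_legal_password(s):
    return password.fullmatch(s) is not None
```

The bounded repetition `{3,7}` expresses the same idea as the three mandatory and four optional (∪ ɛ) character positions in the expression above.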


5. Regular Grammar
So far, we have considered two equivalent ways to describe exactly the class of regular
languages: Finite state machines, Regular expressions. We now introduce a third: Regular
grammars (sometimes also called right linear grammars).
Definition: A regular grammar G is a quadruple (V, Σ, R, S), where:
V is the rule alphabet, which contains non-terminals and terminals,
Σ (the set of terminals) is a subset of V,
R (the set of rules) is a finite set of rules of the form: X → Y,
S (the start symbol) is a nonterminal.
In a regular grammar, all rules in R must:
o have a LHS that is a single nonterminal
o have a RHS that is:
ε, or
a single terminal, or
a single terminal followed by a single nonterminal.
Example: Legal: S → a, S → ε, and T → aS.
Example: Not legal: S → aSa and aSa → T.
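
The restrictions on rules are easy to check mechanically. The following sketch assumes that every symbol is a single character and that the empty string stands for ε. The function name `is_regular_rule` is hypothetical.

```python
def is_regular_rule(lhs, rhs, nonterminals, terminals):
    """Check one rule lhs → rhs against the regular grammar restrictions."""
    if lhs not in nonterminals:          # LHS must be a single nonterminal
        return False
    if rhs == "":                        # RHS is ε
        return True
    if len(rhs) == 1:                    # RHS is a single terminal
        return rhs in terminals
    if len(rhs) == 2:                    # single terminal followed by single nonterminal
        return rhs[0] in terminals and rhs[1] in nonterminals
    return False

NONTERMINALS = {"S", "T"}
TERMINALS = {"a", "b"}
```

With these sets, the legal examples S → a, S → ε and T → aS pass the check, while S → aSa and aSa → T fail it.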

Example for Grammar: Consider the language: L = {w ∈ {a, b}* : |w| is even}. The following
DFSM M accepts L:

The corresponding RE is ((aa) ∪ (ab) ∪ (ba) ∪ (bb))*. The following regular grammar G also
defines L:
S → ε      T → a      T → aS
S → aT     T → b      T → bS
S → bT
In G, the job of the non-terminal S is to generate an even length string. It does this either by
generating the empty string or by generating a single character and then creating T. The job of
T is to generate an odd length string. It does this by generating a single character and then
creating S. S generates ε, the shortest possible even length string. So, if T can be shown to
generate all and only the odd length strings, we can show that S generates all and only the
remaining even length strings. T generates every string whose length is one greater than the
length of some string S generates. So, if S generates all and only the even length strings, then
T generates all and only the other odd length strings.
Notice the clear correspondence between M and G, which we have highlighted by naming M's
states S and T. Even length strings drive M to state S. Even length strings are generated by G
starting with S. Odd length strings drive M to state T. Odd length strings are generated by G
starting with T.
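
The claim about S and T can be checked for short strings by enumerating derivations. This is a minimal sketch, assuming the grammar G is stored as a dictionary from nonterminals to right-hand sides, with the empty string standing for ε. The function name `generate` is made up.

```python
from collections import deque

RULES = {"S": ["", "aT", "bT"], "T": ["a", "b", "aS", "bS"]}

def generate(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    done, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # In a regular grammar a sentential form has at most one nonterminal,
        # and it sits at the right end.
        if form == "" or form[-1] not in rules:
            done.add(form)
            continue
        prefix, nonterminal = form[:-1], form[-1]
        for rhs in rules[nonterminal]:
            new = prefix + rhs
            n_terminals = len(new) - (1 if new and new[-1] in rules else 0)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return done

FROM_S = generate(RULES, "S", 4)
FROM_T = generate(RULES, "T", 3)
```

Up to length 4, S yields exactly the 21 even length strings over {a, b}, and up to length 3, T yields exactly the 10 odd length ones.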


Theorem: The class of languages that can be defined with regular grammars is exactly the
regular languages.
Proof: We first show that any language that can be defined with a regular grammar can be
accepted by some FSM and so is regular. Then we must show that every regular language (i.e.,
every language that can be accepted by some FSM) can be defined with a regular grammar.
Both proofs are by construction.

5.1. Regular grammar to FSM


grammartofsm(G = (V, Σ, R, S)) =
1. Create in M a separate state for each nonterminal in V.
2. Start state is the state corresponding to S .
3. If there are any rules in R of the form X → w, for some w∈Σ, create a new state labeled
# (Final).
4. For each rule of the form X → wY, add a transition from X to Y labeled w.
5. For each rule of the form X → w, add a transition from X to # labeled w.
6. For each rule of the form X → ε, mark state X as accepting.
7. Mark state # as accepting.
8. If M is incomplete (i.e. there are some (state, input) pairs for which no transition is
defined), M requires a dead state. Add a new state D. For every (q, i) pair for which no
transition has already been defined, create a transition from q to D labeled i. For every i
in Σ, create a transition from D to D labeled i.
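
The steps of grammartofsm can be sketched in Python as follows. This is an illustration under stated assumptions: symbols are single characters, the grammar is a dictionary from nonterminals to right-hand sides with the empty string for ε, and the dead state of step 8 is left out because the simulation simply rejects when no transition is defined. The state # is always created here, even if no rule needs it, which does no harm since it is then unreachable.

```python
def grammartofsm(rules, start):
    """Build an NFSM from a regular grammar (steps 1 to 7 of the algorithm).

    Returns (start_state, accepting_states, delta), where delta maps
    (state, symbol) to the set of possible next states.
    """
    FINAL = "#"
    accepting = {FINAL}                                      # step 7
    delta = {}
    for X, right_sides in rules.items():                     # step 1: one state per nonterminal
        for rhs in right_sides:
            if rhs == "":                                    # step 6: X → ε
                accepting.add(X)
            elif len(rhs) == 1:                              # step 5: X → w
                delta.setdefault((X, rhs), set()).add(FINAL)
            else:                                            # step 4: X → wY
                delta.setdefault((X, rhs[0]), set()).add(rhs[1])
    return start, accepting, delta                           # step 2: start state is S

def accepts(machine, w):
    start, accepting, delta = machine
    current = {start}
    for ch in w:
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & accepting)

# The even-length grammar from the earlier example.
EVEN = {"S": ["", "aT", "bT"], "T": ["a", "b", "aS", "bS"]}
M = grammartofsm(EVEN, "S")
```

The resulting machine is nondeterministic: from T, the symbol a leads both to # (rule T → a) and to S (rule T → aS).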

5.2. FSM to regular grammar


The construction is effectively the reverse of the one we just did.
fsm-to-Grammar (M) =
1. Remove the dead states.
2. For each state in M, create a nonterminal, except for # (the accepting state that does not have
outgoing transitions).
3. The nonterminal corresponding to the start state is the start symbol S.
4. For each transition from X to # (final state without outgoing transitions) labeled w, add a rule of
the form X → w.
5. For each transition from X to Y labeled w, add a rule of the form X → wY.
6. For each accepting state X, add a rule of the form X → ε.
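
A sketch of the reverse construction follows. It assumes a DFSM whose dead states have already been removed (step 1 is not implemented), single-character state names that double as nonterminals, and a transition dictionary from (state, symbol) to the next state. The function name `fsm_to_grammar` is hypothetical.

```python
def fsm_to_grammar(start, accepting, delta):
    """Return a list of rules (lhs, rhs); the empty string stands for ε."""
    has_outgoing = {p for (p, _symbol) in delta}
    # Accepting states without outgoing transitions play the role of '#'.
    sinks = {q for q in accepting if q not in has_outgoing and q != start}
    rules = []
    for (X, w), Y in sorted(delta.items()):
        if Y in sinks:
            rules.append((X, w))          # step 4: X → w
        else:
            rules.append((X, w + Y))      # step 5: X → wY
    for X in sorted(accepting - sinks):
        rules.append((X, ""))             # step 6: X → ε
    return rules

# DFSM for even-length strings over {a, b}: states S (start, accepting) and T.
DELTA = {("S", "a"): "T", ("S", "b"): "T", ("T", "a"): "S", ("T", "b"): "S"}
GRAMMAR = fsm_to_grammar("S", {"S"}, DELTA)
```

For this machine the result is S → aT, S → bT, T → aS, T → bS, S → ε. It has no rules T → a or T → b, because the machine has no # state, yet it defines the same language as the grammar G given earlier.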

5.3. Examples
Example: Strings that end with aaaa


Example: Missing Letter language

Let Σ = {a, b, c}. Let L be L_Missing = {w : there is a symbol a_i ∈ Σ not appearing in w}.
A regular grammar for L:
S → ε      A → bA     C → aC
S → aB     A → cA     C → bC
S → aC     A → ε      C → ε
S → bA     B → aB
S → bC     B → cB
S → cA     B → ε
S → cB


Example:


6. Regular and Non-Regular Languages


Theorem: There is a countably infinite number of regular languages.
Proof: We can lexicographically enumerate all the syntactically legal DFSMs with input
alphabet ∑. Every regular language is accepted by at least one of them. So there cannot be more
regular languages than there are DFSMs. Thus there are at most a countably infinite number of
regular languages. There is not a one-to-one relationship between regular languages and
DFSMs since there is an infinite number of machines that accept any given language. But the
number of regular languages is infinite because it includes the following infinite set of
languages:
{a}, {aa}, {aaa}, {aaaa}, {aaaaa}, {aaaaaa}, …

Theorem: Every finite language is regular.

Proof: If L is the empty set, then it is defined by the regular expression ∅ and so is regular. If
it is any finite language composed of the strings s1, s2, ... sn for some positive integer n, then it
is defined by the regular expression:
s1 ∪ s2 ∪ … ∪ sn
So it too is regular.
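
The construction in the proof is a union of the individual strings, which is straightforward to write down in code. The sketch below assumes Python `re` syntax, where `|` stands for ∪. The function name `finite_language_regex` is made up, and it returns None for the empty language because ∅ has no direct counterpart in that syntax.

```python
import re

def finite_language_regex(strings):
    """Return a regex string for a finite set of strings, or None for the empty language."""
    if not strings:
        return None
    # Escape each string so that it is matched literally, then take the union.
    return "|".join(re.escape(s) for s in sorted(strings))

PATTERN = finite_language_regex({"ab", "ba", "aaa"})   # "aaa|ab|ba"
```

A string w is in the language exactly when `re.fullmatch(PATTERN, w)` succeeds.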

Example: The Intersection of Two Infinite Languages

Let L = L1 ∩ L2, where L1 = {aⁿbⁿ : n ≥ 0} and L2 = {bⁿaⁿ : n ≥ 0}. As we will soon be able to
prove, neither L1 nor L2 is regular. But L is. L = {ε}, which is finite.

Example: A Finite Language We May Not Be Able to Write Down

Let L = {w ∈ {0-9}* : w is the social security number of a living US resident}.
L is regular because it is finite. It doesn't matter that no individual or organization happens, at
any given instant, to know what strings are in L.

Note: Not every regular language is computationally tractable (manageable). An example is the
Towers of Hanoi language.
But, of course, most interesting regular languages are infinite. So far, we've developed four
techniques for showing that a (finite or infinite) language L is regular:
• Exhibit a regular expression for L.
• Exhibit an FSM for L.
• Show that the number of equivalence classes of ≈L is finite.
• Exhibit a regular grammar for L


6.1 Closure Properties of Regular Languages


Theorem: The regular languages are closed under union, concatenation, and Kleene star.
Proof: By the same constructions that were used in the proof of Kleene's theorem
Theorem: The regular languages are closed under complement, intersection, difference,
reverse, and letter substitution.
Proof:
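
Two of these closure properties have short standard constructions, sketched here as an illustration rather than as the proof given in class: complement by swapping accepting and non-accepting states of a complete DFSM, and intersection by running two complete DFSMs in parallel (the product construction). The tuple layout (states, alphabet, delta, start, accepting) and all names are assumptions of this sketch.

```python
def complement(dfsm):
    """Swap accepting and non-accepting states of a complete DFSM."""
    states, alphabet, delta, start, accepting = dfsm
    return states, alphabet, delta, start, states - accepting

def intersection(m1, m2):
    """Product construction: track the states of both machines at once."""
    states1, alphabet, delta1, start1, acc1 = m1
    states2, _, delta2, start2, acc2 = m2
    states = {(p, q) for p in states1 for q in states2}
    delta = {((p, q), c): (delta1[(p, c)], delta2[(q, c)])
             for (p, q) in states for c in alphabet}
    accepting = {(p, q) for (p, q) in states if p in acc1 and q in acc2}
    return states, alphabet, delta, (start1, start2), accepting

def accepts(dfsm, w):
    _, _, delta, state, accepting = dfsm
    for c in w:
        state = delta[(state, c)]
    return state in accepting

# Even-length strings over {a, b}.
EVEN = ({"E", "O"}, {"a", "b"},
        {("E", "a"): "O", ("E", "b"): "O", ("O", "a"): "E", ("O", "b"): "E"},
        "E", {"E"})
# Strings over {a, b} that end in b.
ENDS_B = ({"n", "y"}, {"a", "b"},
          {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
          "n", {"y"})

BOTH = intersection(EVEN, ENDS_B)
ODD = complement(EVEN)
```

Difference then follows from these two, since L1 − L2 = L1 ∩ ¬L2.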


6.2 Showing That a Language is Not Regular


The Pumping Theorem for Regular languages


aⁿbⁿ is not regular

The Balanced Parenthesis Language is Not Regular

The Even Palindrome language is Not Regular


The Language with More a's than b's is Not Regular

The Prime Number of a's Language is Not Regular

When we do a Pumping theorem proof that a language L is not regular, we have two choices
to make: a value for w and a value for q. As we have just seen, there are some useful heuristics
that can guide our choices:
To choose w:
• Choose a w that is in the part of L that makes it not regular.
• Choose a w that is only barely in L.
• Choose a w with as homogeneous as possible an initial region of length at least k.
To choose q:
• Try letting q be either 0 or 2.
• If that doesn't work, analyze L to see if there is some other specific value that will work.
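
These heuristics can be tried out on the language aⁿbⁿ. The sketch below is an illustration, not a proof: a real proof must work for every k, while the code fixes k = 5 for demonstration. It picks w = aᵏbᵏ, whose first k characters are all a's, and checks that every split w = xyz with |xy| ≤ k and y ≠ ε leaves the language when pumped with the chosen q. The function names are made up.

```python
def in_anbn(w):
    """Membership test for {a^n b^n : n >= 0}."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n

def pumping_fails_everywhere(w, k, q, in_language):
    """True if x y^q z is outside the language for every split with |xy| <= k, y != ε."""
    for i in range(0, k + 1):            # i = |x|
        for j in range(i + 1, k + 1):    # j = |xy|, so y = w[i:j] is non-empty
            x, y, z = w[:i], w[i:j], w[j:]
            if in_language(x + y * q + z):
                return False
    return True

K = 5
W = "a" * K + "b" * K
```

Both q = 2 and q = 0 work here, because y can only contain a's, so pumping changes the number of a's but not the number of b's. With q = 1 the string is unchanged and stays in the language.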
*****
