HN ATC Notes Module 2
15CS54
Automata Theory and
Computability
(CBCS Scheme)
Prepared by
Mr. Harivinod N
Dept. of Computer Science and Engineering,
Vivekananda College of Engineering and Technology, Puttur
Module-2: Regular Expression, Regular Grammar, Regular Language
Course website:
www.techjourney.in
Lecture Notes | 15CS54 – ATC | Module 1: Finite State Machines
1. Regular Expression
Let's now take a different approach to categorizing problems. Instead of focusing on the power
of a computing device, let's look at the task that we need to perform. In particular, let's consider
problems in which our goal is to match finite or repeating patterns. For example:
• The first step of compiling a program: This step is called lexical analysis. Its job is to
break the source code into meaningful units such as keywords, variables, and numbers.
• Filtering email for spam
• Searching a complex directory structure by specifying patterns that are known to occur
in the file we want.
Definition: A regular expression (RE) is defined recursively. The regular expressions over
an alphabet Σ are all and only the strings that can be obtained as follows:
1. ∅ is a regular expression.
2. ε is a regular expression.
3. Every element of Σ is a regular expression.
4. If α, β are regular expressions, then so is αβ.
5. If α, β are regular expressions, then so is α ∪ β.
6. If α is a regular expression, then so is α*.
7. If α is a regular expression, then so is α+.
8. If α is a regular expression, then so is (α).
So, if we let Σ = {a, b}, the following strings are regular expressions:
∅, ε, a, b, (a ∪ b)*, abba ∪ ε.
Semantic interpretation
• L(∅) = ∅. The language that contains no strings
• L(ε) = {ε}. The language that contains just the empty string.
• L(c) = {c}, where c ∈ Σ. The language that contains the single one-character string c.
• L(αβ) = L(α) L(β). Concatenation of REs corresponds to concatenation of their languages.
• L(α ∪ β) = L(α) ∪ L(β). Union of REs corresponds to union of the two constituent
languages.
• L(α*) = (L(α))*, where * is the Kleene star operation. It defines the language that is formed by
concatenating together zero or more strings drawn from L(α).
• L(α+) = L(αα*) = L(α) (L(α))*. If L(α) is equal to ∅, then L(α+) is also equal to ∅.
Otherwise L(α+) is language formed by concatenating together one or more strings
drawn from L(α).
• L((α)) = L(α).
If the meaning of a regular expression α is the language L, then we say that α defines or
describes L.
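The semantic rules above can be turned into a small recursive evaluator. The following is a minimal sketch, not part of the original notes: it assumes regular expressions are encoded as nested Python tuples (the tag names "empty", "eps", "sym", "cat", "union", "star", "plus" are made up for this illustration), and because L(α) may be infinite it only computes the strings of length at most n.

```python
# Assumed encoding (hypothetical, for illustration only):
#   ("empty",)  ("eps",)  ("sym", c)  ("cat", a, b)  ("union", a, b)
#   ("star", a)  ("plus", a)

def concat(A, B, n):
    """Concatenation of two finite languages, keeping strings of length <= n."""
    return {x + y for x in A for y in B if len(x + y) <= n}

def star(A, n):
    """Kleene star of A, restricted to strings of length <= n."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more element of A.
        frontier = concat(frontier, A, n) - result
        result |= frontier
    return result

def lang(alpha, n):
    """Return the strings of L(alpha) whose length is at most n."""
    kind = alpha[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {alpha[1]} if n >= 1 else set()
    if kind == "cat":
        return concat(lang(alpha[1], n), lang(alpha[2], n), n)
    if kind == "union":
        return lang(alpha[1], n) | lang(alpha[2], n)
    if kind == "star":
        return star(lang(alpha[1], n), n)
    if kind == "plus":
        base = lang(alpha[1], n)
        return concat(base, star(base, n), n)   # L(a+) = L(a) (L(a))*
    raise ValueError(f"unknown node: {kind}")

# (a U b)*b, written in the assumed tuple encoding
a_or_b = ("union", ("sym", "a"), ("sym", "b"))
ends_in_b = ("cat", ("star", a_or_b), ("sym", "b"))
```

For instance, lang(ends_in_b, 2) yields the three strings b, ab and bb, and lang(("plus", ("empty",)), 3) is the empty set, in line with the remark that L(α+) = ∅ whenever L(α) = ∅.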
Example-1
The meaning of the regular expression (a ∪ b)*b is the set of all strings over the alphabet {a,
b} that end in b.
Example-2
The meaning of the regular expression ((a ∪ b)(a ∪ b))a(a ∪ b)* is:
{xay : x and y are strings of a's and b's and |x| = 2}.
Alternatively, it is the language that contains all strings of a's and b's such that there exists a
third character and it is an a.
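The two readings given in Example-1 and Example-2 can be checked by brute force. The sketch below is an illustration only: it uses Python's re module, which writes union as "|" rather than ∪, and the helper names all_strings and agrees are made up here. It compares each regular expression with its plain-language description on every string over {a, b} up to a small length.

```python
import re
from itertools import product

# Python's re syntax uses "|" where the notes write the union symbol.
ENDS_IN_B = re.compile(r"(a|b)*b")
THIRD_IS_A = re.compile(r"((a|b)(a|b))a(a|b)*")

def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def agrees(pattern, predicate, max_len=6):
    """True if pattern.fullmatch and predicate agree on all short strings."""
    return all((pattern.fullmatch(w) is not None) == predicate(w)
               for w in all_strings("ab", max_len))

example1_ok = agrees(ENDS_IN_B, lambda w: w.endswith("b"))
example2_ok = agrees(THIRD_IS_A, lambda w: len(w) >= 3 and w[2] == "a")
```

A check of this kind is evidence, not a proof: it only covers strings up to the chosen length.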
Example-3
Example-4
Priority
The regular expression language that we have just defined provides three operators. We will
assign the following precedence order to them (from highest to lowest): 1. Kleene star,
2. concatenation, and 3. union.
So the expression (a ∪ bb*a) will be interpreted as (a ∪ (b(b*a))).
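The effect of the precedence order can be observed with a pattern matcher that follows the same convention. Python's re module also binds star most tightly, then concatenation, then alternation (written "|"). The sketch below, with a helper name language chosen for this illustration, compares the unparenthesized expression with its fully parenthesized reading and with a different grouping.

```python
import re
from itertools import product

def language(pattern, max_len=5):
    """Strings over {a, b} of length <= max_len that fully match pattern."""
    rx = re.compile(pattern)
    return {"".join(cs)
            for n in range(max_len + 1)
            for cs in product("ab", repeat=n)
            if rx.fullmatch("".join(cs))}

implicit = language(r"a|bb*a")         # no parentheses
explicit = language(r"(a|(b(b*a)))")   # the interpretation stated in the notes
wrong = language(r"(a|b)b*a")          # union applied first: a different language
```

Here implicit and explicit are the same set, while wrong contains strings such as aa that the original expression does not describe.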
2. Kleene’s theorem
Statement: Finite state machines and regular expressions define the same class of languages,
i.e., they are equivalent (equally powerful).
To prove this, we must show:
• Theorem: Any language that can be defined with a regular expression can be accepted
by some FSM and so is regular. (RE to FSM)
• Theorem: Every regular language (i.e., every language that can be accepted by some
DFSM) can be defined with a regular expression. (FSM to RE)
Based on the constructions that have just been described, we can define the following algorithm
to construct, given a regular expression α, a corresponding (usually nondeterministic) FSM:
regextofsm(α: regular expression) =
   Beginning with the primitive subexpressions of α and working outwards until
   an FSM for all of α has been built do:
      Construct an FSM as described above.
Example: Consider the regular expression (b ∪ ab)*. We use regextofsm to build an FSM that
accepts the language defined by this regular expression:
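The idea of building the machine from the inside out can be sketched in code. The following is an illustration under stated assumptions, not the construction from the notes: it uses a standard Thompson-style construction with ε-transitions, which may differ in detail from the FSMs intended here, and all names (NFA, symbol, union, concat, star, accepts) are made up for this sketch.

```python
import itertools

class NFA:
    """An epsilon-NFA fragment with one start state and one accepting state."""
    _ids = itertools.count()          # source of fresh state numbers

    def __init__(self):
        self.start = next(NFA._ids)
        self.accept = next(NFA._ids)
        self.edges = []               # (source, symbol or None for epsilon, target)

def symbol(c):
    m = NFA()
    m.edges.append((m.start, c, m.accept))
    return m

def union(a, b):
    m = NFA()
    m.edges = a.edges + b.edges + [
        (m.start, None, a.start), (m.start, None, b.start),
        (a.accept, None, m.accept), (b.accept, None, m.accept)]
    return m

def concat(a, b):
    m = NFA()
    m.edges = a.edges + b.edges + [
        (m.start, None, a.start), (a.accept, None, b.start),
        (b.accept, None, m.accept)]
    return m

def star(a):
    m = NFA()
    m.edges = a.edges + [
        (m.start, None, m.accept),    # zero repetitions
        (m.start, None, a.start),
        (a.accept, None, a.start),    # go around again
        (a.accept, None, m.accept)]
    return m

def eps_closure(m, states):
    """All states reachable from states using only epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, sym, dst in m.edges:
            if src == s and sym is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, w):
    """Simulate the NFA on w by tracking the set of possible states."""
    current = eps_closure(m, {m.start})
    for c in w:
        moved = {dst for src, sym, dst in m.edges if src in current and sym == c}
        current = eps_closure(m, moved)
    return m.accept in current

# (b U ab)*: primitive machines for b and a first, then concat, union, star
machine = star(union(symbol("b"), concat(symbol("a"), symbol("b"))))
```

With this machine, accepts(machine, "babb") holds, while accepts(machine, "ba") does not, since every a must be followed by a b. The result has many ε-transitions and is far from minimal, which is typical of this style of construction.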
Kleene's Theorem tells us that there is no difference between the formal power of regular
expressions and finite state machines. But, as some of the examples that we just considered
suggest, there is a practical difference in their effectiveness as problem solving tools:
• The regular expression language is a pattern language. In particular, regular expressions
must specify the order in which a sequence of symbols must occur. This is useful when
we want to describe patterns such as phone numbers (it matters that the area code
comes first) or email addresses (it matters that the user name comes before the
domain).
• But there are some applications where order doesn't matter. The vending machine example
is an instance of this class of problem: the order in which the coins were entered
doesn't matter. Parity checking is another: only the total number of 1 bits matters, not
where they occur in the string. Finite state machines can be very effective in solving
problems such as this. But the regular expressions that correspond to those FSMs may
be too complex to be useful.
The bottom line is that sometimes it is easy to write a finite state machine to describe a
language. For other problems, it may be easier to write a regular expression.
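The parity example illustrates the contrast. The sketch below is an illustration, not taken from the notes: it assumes even parity (an even number of 1 bits) over the alphabet {0, 1}, and the regular expression (0*10*1)*0* is one possible expression for that language, written in Python's re syntax. The two-state machine only has to remember whether the count so far is even, whereas the regular expression has to spell out how the 1s pair up.

```python
import re
from itertools import product

def even_ones_dfsm(w):
    """Two-state DFSM: the state records whether an even number of 1s was read."""
    even = True                # start state, which is also the accepting state
    for c in w:
        if c == "1":
            even = not even    # a 1 flips the state; a 0 leaves it unchanged
    return even

# One possible RE for the same language: pairs of 1s separated by any 0s.
EVEN_ONES_RE = re.compile(r"(0*10*1)*0*")

def same_language(max_len=8):
    """Compare the DFSM and the RE on all binary strings up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if even_ones_dfsm(w) != (EVEN_ONES_RE.fullmatch(w) is not None):
                return False
    return True
```

The two descriptions agree on every string tested, but the machine is arguably the more direct statement of the requirement.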
3. Applications of RE
Because patterns are everywhere, applications of regular expressions are everywhere. The term
regular expression is used in the modern computing world in a much more general way than
we have defined it here. Many programming languages and scripting systems provide support
for regular expression matching. Each of them has its own syntax.
• The programming language Perl, for example, supports regular expression matching.
• Decimal Numbers: The following regular expression matches decimal encodings of
numbers:
-? ([0-9]+(\.[0-9]*)? | \.[0-9]+)
• Biology: Meaningful words in protein sequences are called motifs. They can be
described with regular expressions. Given a protein or DNA sequence, the task is to find
others that are likely to be evolutionarily close to it.
ESGHDTTTYYNKNRYPAGWNNHHDQMFFWV
To achieve this, we need to build a DFSM that can examine thousands of other sequences and
find those that match any of the selected patterns.
• IP Addresses: The following regular expression searches for Internet (IP) addresses:
([0-9]{1,3}(\.[0-9]{1,3}){3})
• In XML regular expressions are one way to define parts of new document types.
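The decimal-number and IP-address patterns above can be tried out directly. The sketch below uses Python's re module as one example of a language with regular expression support. The patterns are written without the spaces shown in the notes, since in Python's default mode a space in a pattern matches a literal space; the constant names DECIMAL and IP are chosen for this illustration.

```python
import re

# Optional minus sign, then either digits with an optional fractional part,
# or a leading dot followed by at least one digit.
DECIMAL = re.compile(r"-?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

# Four groups of one to three digits separated by dots.
IP = re.compile(r"([0-9]{1,3}(\.[0-9]{1,3}){3})")

def is_decimal(text):
    """True if the whole of text is a decimal number in the sense of DECIMAL."""
    return DECIMAL.fullmatch(text) is not None

def find_ip(text):
    """Return the first substring of text that looks like an IP address, or None."""
    m = IP.search(text)
    return m.group(0) if m else None
```

For example, is_decimal accepts -3.14, 42, .5 and 7. but rejects a lone dot, and find_ip("host 192.168.0.1 up") returns 192.168.0.1. Note that the IP pattern checks only the shape of the address: it also matches 999.999.999.999, because it does not restrict each group to the range 0 to 255.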
6. Regular Grammar
So far, we have considered two equivalent ways to describe exactly the class of regular
languages: Finite state machines, Regular expressions. We now introduce a third: Regular
grammars (sometimes also called right linear grammars).
Definition: A regular grammar G is a quadruple (V, Σ, R, S), where:
V is the rule alphabet, which contains non-terminals and terminals,
Σ (the set of terminals) is a subset of V,
R (the set of rules) is a finite set of rules of the form: X → Y,
S (the start symbol) is a nonterminal.
In a regular grammar, all rules in R must:
o have a LHS that is a single nonterminal
o have a RHS that is:
ε, or
a single terminal, or
a single terminal followed by a single nonterminal.
Example: Legal: S → a, S → ε, and T → aS.
Example: Not legal: S → aSa and aSa → T
Example for Grammar: Consider the language: L = {w ∈ {a, b}* : |w| is even}. The following
DFSM M accepts L:
The corresponding RE is ((aa) ∪ (ab) ∪ (ba) ∪ (bb))*. The following regular grammar G also
defines L:
S → ε
S → aT
S → bT
T → a
T → b
T → aS
T → bS
In G, the job of the nonterminal S is to generate an even length string. It does this either by
generating the empty string or by generating a single character and then creating T. The job of
T is to generate an odd length string. It does this either by generating a single character alone
or by generating a single character and then creating S. S generates ε, the shortest possible
even length string. So, if T can be shown to
generate all and only the odd length strings, we can show that S generates all and only the
remaining even length strings. T generates every string whose length is one greater than the
length of some string S generates. So, if S generates all and only the even length strings, then
T generates all and only the other odd length strings.
Notice the clear correspondence between M and G, which we have highlighted by naming M's
states S and T. Even length strings drive M to state S. Even length strings are generated by G
starting with S. Odd length strings drive M to state T. Odd length strings are generated by G
starting with T.
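The claim about what S and T generate can be checked for short strings by enumerating derivations. The sketch below is an illustration: it assumes the grammar is stored as a Python dictionary from each nonterminal to its right-hand sides (with "" standing for ε), and the names RULES and generate are made up here. Since every sentential form of a regular grammar is a string of terminals followed by at most one nonterminal, it is enough to track that pair.

```python
# Grammar G from the notes; "" stands for the empty string (epsilon).
RULES = {
    "S": ["", "aT", "bT"],
    "T": ["a", "b", "aS", "bS"],
}

def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    done = set()
    frontier = [("", start)]           # (terminals so far, pending nonterminal)
    while frontier:
        prefix, nonterminal = frontier.pop()
        for rhs in rules[nonterminal]:
            if rhs and rhs[-1] in rules:
                # Rule of the form X -> aY: emit a, continue with Y.
                new_prefix = prefix + rhs[:-1]
                if len(new_prefix) <= max_len:
                    frontier.append((new_prefix, rhs[-1]))
            else:
                # Rule of the form X -> a or X -> epsilon: derivation ends.
                word = prefix + rhs
                if len(word) <= max_len:
                    done.add(word)
    return done

from_S = generate(RULES, "S", 4)
from_T = generate(RULES, "T", 4)
```

Up to length 4, from_S contains exactly the strings of length 0, 2 and 4 over {a, b}, and from_T exactly those of length 1 and 3.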
Theorem: The class of languages that can be defined with regular grammars is exactly the
regular languages.
Proof: We first show that any language that can be defined with a regular grammar can be
accepted by some FSM and so is regular. Then we must show that every regular language (i.e.,
every language that can be accepted by some FSM) can be defined with a regular grammar.
Both proofs are by construction.
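One direction of the proof can be sketched in code. The following is a minimal sketch of a standard grammar-to-FSM construction, which may differ in detail from the one intended in the notes: each nonterminal becomes a state, a rule X → aY becomes a transition from X to Y on a, a rule X → a becomes a transition on a to one extra accepting state, and a rule X → ε makes X accepting. The dictionary encoding of the rules and the names grammar_to_nfa, nfa_accepts and "#" are assumptions made for this illustration.

```python
RULES = {                              # grammar G; "" stands for epsilon
    "S": ["", "aT", "bT"],
    "T": ["a", "b", "aS", "bS"],
}

def grammar_to_nfa(rules, start):
    """Build (delta, start, accepting) for a regular grammar."""
    final = "#"                        # extra accepting state (name assumed)
    delta = {}                         # (state, symbol) -> set of next states
    accepting = {final}
    for lhs, bodies in rules.items():
        for rhs in bodies:
            if rhs == "":              # X -> epsilon: X is accepting
                accepting.add(lhs)
            elif len(rhs) == 1:        # X -> a: move to the extra final state
                delta.setdefault((lhs, rhs), set()).add(final)
            else:                      # X -> aY: move to state Y on a
                delta.setdefault((lhs, rhs[0]), set()).add(rhs[1])
    return delta, start, accepting

def nfa_accepts(nfa, w):
    """Simulate the nondeterministic machine by tracking a set of states."""
    delta, start, accepting = nfa
    current = {start}
    for c in w:
        current = set().union(*(delta.get((s, c), set()) for s in current))
    return bool(current & accepting)

machine = grammar_to_nfa(RULES, "S")
```

For the grammar G above, nfa_accepts(machine, w) holds exactly for the even length strings w over {a, b}, matching the DFSM M.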
5.3. Examples
Example: Strings that end with aaaa
Example:
So it too is regular.
When we do a Pumping theorem proof that a language L is not regular, we have two choices
to make: a value for w and a value for q. As we have just seen, there are some useful heuristics
that can guide our choices:
To choose w:
• Choose a w that is in the part of L that makes it not regular.
• Choose a w that is only barely in L.
• Choose a w with as homogeneous as possible an initial region of length at least k.
To choose q:
• Try letting q be either 0 or 2.
• If that doesn't work, analyze L to see if there is some other specific value that will work.
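The heuristics can be illustrated on a concrete case. The sketch below is an illustration only: the language L = {aⁿbⁿ : n ≥ 0} is an assumption chosen here and is not one of the examples in this section, and the function names are made up. It takes w = aᵏbᵏ, which has a homogeneous initial region of length k, tries every way of writing w = xyz with |xy| ≤ k and y ≠ ε, and checks that pumping with the chosen q always produces a string outside L.

```python
def in_L(s):
    """Membership test for the assumed language L = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def pumping_refutes(k, q):
    """True if, for w = a^k b^k, every legal split xyz gives x y^q z outside L."""
    w = "a" * k + "b" * k
    for xy_len in range(1, k + 1):             # |xy| <= k
        for y_len in range(1, xy_len + 1):     # y is not empty
            x = w[:xy_len - y_len]
            y = w[xy_len - y_len:xy_len]
            z = w[xy_len:]
            if in_L(x + y * q + z):
                return False                   # this split survives pumping
    return True
```

For k = 5, both q = 0 and q = 2 work: every split leaves the region of a's, so pumping changes the number of a's but not the number of b's. With q = 1 nothing changes, so pumping_refutes(5, 1) is False. A check for one fixed k is only an illustration; the actual proof must argue for every k.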
*****