Regex - Regular Expression

1) The document summarizes a lecture on regular expressions and finite state automata. It discusses precedence rules for regular expressions, equivalence of regular expressions, and introduces nondeterministic finite state automata (NFA). 2) An NFA is presented that recognizes strings ending in "babb" using fewer states than a deterministic finite automaton. The subset construction algorithm is described to convert an NFA to an equivalent deterministic finite automaton (DFA). 3) Closure properties of regular languages are discussed. An example finite state automaton is constructed that recognizes strings containing "aaa" and an even number of b's.

Uploaded by

Sakib Jobaid
CSC 236 H1F Lecture Summary for Week 11 Fall 2015

Regular Expressions (Continued)


Precedence Rules: The following conventions allow us to simplify our regexps considerably without
introducing ambiguity:

• We leave out the outermost pair of parentheses. E.g., (0 + 1)(11)∗ is an abbreviation of ((0 + 1)(11)∗).

• The star operator has precedence over all other operators. E.g., RS∗ is an abbreviation of R(S∗).

• Concatenation has precedence over union. E.g., RS∗ + T is an abbreviation of ((RS∗) + T).

• When the same binary operator is applied several times in a row, we can leave out the parentheses and
assume the grouping is to the right. E.g., 11 + 01 + 10 + 11 is an abbreviation of (11 + (01 + (10 + 11))).
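
As a rough illustration, the same precedence conventions can be observed in Python's re module. This is only a sketch: it assumes that union is written | instead of +, and that re.fullmatch is used to test whether a whole string belongs to the language. The helper name matches is hypothetical.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """Return True iff the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

# Star binds tighter than concatenation: ab* means a(b*), not (ab)*.
star_examples = [matches("ab*", "abbb"), matches("ab*", "abab")]

# Concatenation binds tighter than union: ab*|c means (ab*)|c, not a(b*|c).
union_examples = [matches("ab*|c", "c"), matches("ab*|c", "ac")]

print(star_examples)   # [True, False]
print(union_examples)  # [True, False]
```

If star did not bind tightest, abab would match ab*; if union bound tighter than concatenation, ac would match ab*|c.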

Equivalence of Regexps: Regexps R and S are equivalent (denoted R ≡ S) iff they represent the same
language (i.e., L(R) = L(S)), e.g., b∗ a(a + b)∗ ≡ (a + b)∗ ab∗ .
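
One way to gain confidence in an equivalence such as the one above is to compare the two languages on all short strings. The sketch below does this with Python's re module (union written as |). The helper language_up_to and the length bound 8 are assumptions made for illustration, and a bounded check like this is evidence, not a proof.

```python
import re
from itertools import product

def language_up_to(pattern: str, alphabet: str, max_len: int) -> set[str]:
    """All strings over alphabet of length <= max_len that fully match pattern."""
    found = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if re.fullmatch(pattern, s):
                found.add(s)
    return found

# Python writes union as | where the lecture writes +.
R = "b*a(a|b)*"
S = "(a|b)*ab*"
agree = language_up_to(R, "ab", 8) == language_up_to(S, "ab", 8)
print(agree)  # True
```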

Theorem 1. For all regexps R, S and T, the following equivalences hold:

• Commutativity of union: R + S ≡ S + R

• Associativity of union: (R + S) + T ≡ R + (S + T)

• Associativity of concatenation: (RS)T ≡ R(ST)

• Left distributivity: R(S + T) ≡ RS + RT

• Right distributivity: (S + T)R ≡ SR + TR

• Identity for union: R + ∅ ≡ R

• Identity for concatenation: Rε ≡ R ≡ εR

• Annihilator for concatenation: ∅R ≡ ∅ ≡ R∅

• Idempotence of Kleene star: (R∗)∗ ≡ R∗
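
These laws can be spot-checked on concrete languages by working directly with sets of strings. The sketch below is a sanity check on one instance, not a proof. It assumes languages are compared only on strings of length at most MAX_LEN; the sample languages R, S, T and all function names are hypothetical.

```python
MAX_LEN = 4  # assumption: languages are compared only on strings up to this length

def union(r: set[str], s: set[str]) -> set[str]:
    return r | s

def concat(r: set[str], s: set[str]) -> set[str]:
    return {x + y for x in r for y in s if len(x + y) <= MAX_LEN}

def star(r: set[str]) -> set[str]:
    result, frontier = {""}, {""}
    while frontier:                      # stops once no new short string appears
        frontier = concat(frontier, r) - result
        result |= frontier
    return result

EMPTY: set[str] = set()   # language of the regexp for the empty set
EPSILON = {""}            # language of the regexp for the empty string

R, S, T = {"a", "ab"}, {"", "b"}, {"ba"}   # hypothetical sample languages

laws = {
    "commutativity of union": union(R, S) == union(S, R),
    "associativity of union": union(union(R, S), T) == union(R, union(S, T)),
    "associativity of concatenation": concat(concat(R, S), T) == concat(R, concat(S, T)),
    "left distributivity": concat(R, union(S, T)) == union(concat(R, S), concat(R, T)),
    "right distributivity": concat(union(S, T), R) == union(concat(S, R), concat(T, R)),
    "identity for union": union(R, EMPTY) == R,
    "identity for concatenation": concat(R, EPSILON) == R == concat(EPSILON, R),
    "annihilator for concatenation": concat(EMPTY, R) == EMPTY == concat(R, EMPTY),
    "idempotence of star": star(star(R)) == star(R),
}
print(all(laws.values()))  # True
```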

Example 1. We prove that L(b∗a(a + b)∗) = L, where L = {all strings of a’s and b’s that contain at least one a}, by
showing double inclusion (the standard technique for proving set equality).
Intuition: Stating L(b∗ a(a + b)∗ ) = L amounts to making two separate claims.

1. Every string in L(b∗ a(a + b)∗ ) has at least one a (i.e., RE pattern does not include bad strings)

2. Every string with at least one a belongs to L(b∗ a(a + b)∗ ) (i.e., RE pattern includes every good string).

Proof. Now, let’s prove both parts.


1. (L(b∗ a(a + b)∗ ) subset of L): Let s be an arbitrary string in L(b∗ a(a + b)∗ ). This means s = t ◦ u ◦ v for some
strings t ∈ L(b∗ ), u ∈ L(a), and v ∈ L((a + b)∗ ). Since there is only one string a ∈ L(a), u = a so s = t ◦ a ◦ v and
s is a string that contains at least one a, so s ∈ L.
2. (L subset of L(b∗ a(a + b)∗ )): Let s be an arbitrary string in L. This means that s contains at least one a, so
it contains a first occurrence of a and can be broken up into three substrings: s = r ◦ a ◦ t, where r is some string
that contains no a (maybe empty), a is the first occurrence of a in s, and t is some string of a’s and b’s. But then,
r ∈ L(b∗ ), a ∈ L(a), and t ∈ L((a + b)∗ ) so by definition, s = r ◦ a ◦ t is in L(b∗ a(a + b)∗ ).
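
The decomposition used in part 2 of the proof can be sketched in code. The function name split_at_first_a is hypothetical, and Python's re module (with | for union) is used only to check that each piece lies in the claimed language.

```python
import re

def split_at_first_a(s: str) -> tuple[str, str, str]:
    """Split s as r + 'a' + t, where r contains no a (part 2 of the proof)."""
    i = s.index("a")          # raises ValueError if s contains no a
    return s[:i], s[i], s[i + 1:]

r, a, t = split_at_first_a("bbabab")
pieces_ok = (
    re.fullmatch("b*", r) is not None
    and a == "a"
    and re.fullmatch("(a|b)*", t) is not None
)
print((r, a, t), pieces_ok)  # ('bb', 'a', 'bab') True
```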

Remark. See textbook for other detailed examples.

Dept. of Computer Science, University of Toronto, St. George Campus Page 1 of 4


Nondeterministic Finite State Automata (NFA or NFSA)


Assume that you want to construct a DFA that accepts the following language

L = {s ∈ {a, b}∗ : s ends with babb}


A straightforward DFA for this language remembers the last 4 symbols processed. As we saw in our tutorial, this requires
a state for every possible combination of the last 4 characters (16 of them). We have to consider all of these combinations
because in a DFA, a given state and the current input symbol uniquely determine the next state of the automaton. It
is for this reason that such automata are called deterministic. But if we remove the determinism constraint, the
following FSA accepts L:

Figure 1: NFA for L (states q0 through q4, start state q0, self-loop on q0 labeled a, b; the transitions q0 → q1 → q2 → q3 → q4 spell babb)

Remark. Note that if a string does not end with babb, then every attempt to follow the path out of q0 towards q4 (in
Figure 1) fails (one of the transitions won’t work), so no run ends in q4.
Notice the simplicity of the FSA in Figure 1. Such properties have led to the definition of a variant of finite state
automata, called nondeterministic finite state automata (NFA or NFSA). In these FSAs, given the current state,
when the automaton reads an input symbol a, there may be several states to which it may go next (hence the
nondeterminism).
NFA or NFSA: A nondeterministic finite state automaton is a quintuple (Q, Σ, q0, F, δ), where Q is a fixed,
finite, non-empty set of states, Σ is a fixed (finite, non-empty) alphabet (Q ∩ Σ = ∅), q0 ∈ Q is the initial state,
F ⊆ Q is the set of accepting (“final”) states, and δ : Q × (Σ ∪ {ε}) → P(Q) is a transition function (i.e., δ(q, a) is the
set of next states of the NFA when processing symbol a from state q).
Note: P(Q) is the power set of Q.
We can see that the definition of the NFA contains transitions like δ(q, ε). These transitions are called spontaneous
state transitions or ε-transitions, in which the NFA makes a transition from the current state to the next
state without reading any input symbol. The NFA can be defined without the introduction of ε-transitions by
extending the definition of the initial state to a set of states rather than a single state. However, ε-transitions will allow us
to simplify our notation and arguments in some cases (e.g., when we talk about closure properties).
Remark. The power of an NFA is that, by definition, it accepts a string iff the set of states reached at the end of the
input contains at least one accepting state. It is as if the NFA had unlimited parallelism.
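
The acceptance condition can be sketched by tracking the set of states the NFA could be in after each symbol. The code below assumes the NFA for strings ending in babb has a self-loop on q0 for a and b and a chain q0 → q1 → q2 → q3 → q4 reading b, a, b, b, with q4 accepting. It has no spontaneous transitions, so none are handled here. The names DELTA, reachable_states and accepts are hypothetical.

```python
# Transition relation of the assumed NFA for strings ending in babb.
# A missing (state, symbol) pair means "no next state".
DELTA = {
    ("q0", "a"): {"q0"},
    ("q0", "b"): {"q0", "q1"},   # nondeterministic choice: stay, or start matching babb
    ("q1", "a"): {"q2"},
    ("q2", "b"): {"q3"},
    ("q3", "b"): {"q4"},
}
START, ACCEPTING = "q0", {"q4"}

def reachable_states(s: str) -> set[str]:
    """Set of all states the NFA can be in after reading s."""
    current = {START}
    for symbol in s:
        current = {nxt for q in current for nxt in DELTA.get((q, symbol), set())}
    return current

def accepts(s: str) -> bool:
    """Accept iff at least one reachable state is accepting."""
    return bool(reachable_states(s) & ACCEPTING)

print(sorted(reachable_states("babb")))     # ['q0', 'q1', 'q4']
print(accepts("abbabb"), accepts("babba"))  # True False
```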
Subset Construction: Given an NFA M = (Q, Σ, q0, F, δ), we can construct a DFA M′ = (Q′, Σ, q0′, F′, δ′) that
accepts the same language as M as follows:
• Q′ = P(Q)
• q0′ = E(q0) (i.e., the set of all states reachable from the initial state of the given NFSA via ε-transitions only)
• F′ = {q′ ∈ Q′ : q′ ∩ F ≠ ∅} (i.e., all states that contain an accepting state of the given NFSA)
• For any q′ ∈ Q′ and a ∈ Σ, δ′(q′, a) = ⋃_{qx ∈ q′} ⋃_{qy ∈ δ(qx, a)} E(qy), where E(qy) is the set of states reachable from
qy by following any number of ε-transitions (including none).

This construction is called the subset construction, because each state of M′ is a set of states of M.
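
A minimal sketch of the construction follows. It deviates from the definition in one respect: instead of taking all of P(Q) as the state set, it builds only the subsets reachable from the start set, which accepts the same language. The encoding of ε as the empty string, the dictionary representation of δ, and all function names are assumptions made for illustration. The input is the assumed NFA for strings ending in babb (self-loop on q0, then a chain reading b, a, b, b).

```python
from collections import deque

EPS = ""  # assumption: the empty string labels spontaneous (epsilon) transitions

def eps_closure(states, delta):
    """E(S): states reachable from S using zero or more epsilon transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for nxt in delta.get((q, EPS), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)

def subset_construction(alphabet, delta, start, accepting):
    """Build the reachable part of the DFA from an NFA given by delta."""
    dfa_start = eps_closure({start}, delta)
    dfa_delta, seen, queue = {}, {dfa_start}, deque([dfa_start])
    while queue:
        subset = queue.popleft()
        for a in alphabet:
            # Union of delta(qx, a) over all qx in the subset, then close under epsilon.
            moved = {qy for qx in subset for qy in delta.get((qx, a), set())}
            target = eps_closure(moved, delta)
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & set(accepting)}
    return seen, dfa_delta, dfa_start, dfa_accepting

NFA_DELTA = {
    ("q0", "a"): {"q0"}, ("q0", "b"): {"q0", "q1"},
    ("q1", "a"): {"q2"}, ("q2", "b"): {"q3"}, ("q3", "b"): {"q4"},
}
states, dfa_delta, dfa_start, dfa_accepting = subset_construction(
    "ab", NFA_DELTA, "q0", {"q4"}
)
print(len(states))                           # 5
print([sorted(s) for s in dfa_accepting])    # [['q0', 'q1', 'q4']]
```

On this input only 5 of the 32 possible subsets are reachable, so building subsets on demand keeps the resulting DFA small.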


Example 2. Consider the following NFA:

Figure 2: NFA corresponding to regexp a∗b∗ (states q0 and q1, start state q0; self-loops labeled a and b, and an edge labeled ε, a)

The corresponding DFA using the subset construction is:

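Figure 3: Corresponding DFA of the NFA in Figure 2 (start state {q0, q1} with a self-loop labeled a, and a transition labeled b to state {q1}, which has a self-loop labeled b)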

Remark. Although an NFA behaves as if it had unlimited parallelism, it is not a practical model!

Closure properties

Let’s construct an FSA that accepts the language

L = {s ∈ {a, b}∗ : s contains three a’s in a row and an even number of b’s }

Another way to express L is to say

L = {s : s contains aaa} ∩ {s : s contains even many b’s}

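Each of these sub-languages corresponds to an FSA as follows:

Figure 4: FSA for {s : s contains aaa} (FSA1) (states q0 through q3, start state q0; a-transitions q0 → q1 → q2 → q3, b-transitions back to q0 including a self-loop on q0, and a self-loop on q3 labeled a, b)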


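Figure 5: FSA for {s : s contains even many b’s} (FSA2) (states q0 and q1, start state q0; each state has a self-loop labeled a, and b-transitions lead from q0 to q1 and back)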

Now, let’s try to combine the states in FSA1 and FSA2 so that the resulting states can track the states in
both of the aforementioned FSAs at the same time. The resulting FSA looks as follows (qxy is a state that
represents state qx in FSA1 and state qy in FSA2):

Figure 6: FSA for L, combining FSA1 and FSA2 (states q00 (start), q10, q20, q30 in the top row and q01, q11, q21, q31 in the bottom row; a-transitions advance along each row and b-transitions switch between the rows)

It should be clear now that the only accepting state in this FSA is q30, in which we have seen three
a’s in a row and an even number of b’s.
The example above demonstrates a powerful design technique: we can combine FSAs that accept given
languages to obtain an FSA that accepts a combination of those languages.
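
The combination can be sketched by running both machines in lockstep on pairs of states. The code below assumes FSA1 tracks the current run of a’s (with state 3 meaning aaa has been seen) and FSA2 tracks the parity of the number of b’s. The function names and the bounded comparison against the direct definition of L are illustrative assumptions.

```python
from itertools import product

def step1(x: int, c: str) -> int:
    """FSA1: x counts the current run of a's; 3 means aaa has been seen."""
    if x == 3:
        return 3
    return x + 1 if c == "a" else 0

def step2(y: int, c: str) -> int:
    """FSA2: y is the parity of the number of b's read so far."""
    return 1 - y if c == "b" else y

def product_accepts(s: str) -> bool:
    x, y = 0, 0                            # start in q00
    for c in s:
        x, y = step1(x, c), step2(y, c)    # both components advance together
    return (x, y) == (3, 0)                # q30 is the only accepting state

def in_L(s: str) -> bool:
    """Direct definition of L, used only as a reference check."""
    return "aaa" in s and s.count("b") % 2 == 0

all_agree = all(
    product_accepts("".join(w)) == in_L("".join(w))
    for n in range(9)
    for w in product("ab", repeat=n)
)
print(all_agree)  # True
```

Accepting exactly the pairs where both components accept gives the intersection; accepting pairs where at least one component accepts would give the union instead.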

Closure Property: Let R and S represent two languages that are accepted by FSA_R and FSA_S respectively. If
an operation applied to any such R and S always results in a language T for which there exists an FSA (FSA_T) that accepts
T, we say that the class of languages accepted by FSAs is closed under this operation.

Theorem 2. The class of languages that are accepted by FSAs is closed under complementation, union,
intersection, concatenation and the Kleene star operation. In other words, if L and L′ are languages that are
accepted by FSAs, then so are all of the following: L̄, L ∩ L′, L ∪ L′, L ◦ L′ and L∗.
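
Closure under complementation has a particularly short construction for a deterministic FSA whose transition function is total: keep the states and transitions and swap accepting and non-accepting states. The sketch below applies this to an assumed DFA for "contains aaa"; the names run, accepts and co_accepts are hypothetical. The trick does not carry over directly to NFAs.

```python
from itertools import product

def run(delta, start, s):
    """Follow the unique path of a complete DFA on input s."""
    q = start
    for c in s:
        q = delta[(q, c)]
    return q

# Complete DFA for "contains aaa": state = current run of a's, 3 is absorbing.
STATES = {0, 1, 2, 3}
DELTA = {(q, c): (3 if q == 3 else (q + 1 if c == "a" else 0))
         for q in STATES for c in "ab"}
ACCEPTING = {3}

# Complementation: same states and transitions, accepting set flipped.
CO_ACCEPTING = STATES - ACCEPTING

def accepts(s: str) -> bool:
    return run(DELTA, 0, s) in ACCEPTING

def co_accepts(s: str) -> bool:
    return run(DELTA, 0, s) in CO_ACCEPTING

complement_ok = all(
    co_accepts("".join(w)) == ("aaa" not in "".join(w))
    for n in range(9)
    for w in product("ab", repeat=n)
)
print(complement_ok)  # True
```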

Regular Languages
Theorem 3. Let L be a language. The following statements are equivalent:

1. L = L(A) for some NFA A

2. L = L(A′) for some DFA A′

3. L = L(R) for some regexp R

We are not going to prove this theorem. However, we are going to talk about the main ideas of the proof. You
can look at Sections 7.4.2 and 7.6 in the textbook for a formal treatment of this theorem.
