Regex - Regular Expression
• We leave out the outermost pair of parentheses. E.g., (0 + 1)(11)∗ is an abbreviation of ((0 + 1)(11)∗).
• The star operator has precedence over all other operators. E.g., RS∗ is an abbreviation of R(S∗).
• When the same binary operator is applied several times in a row, we can leave out the parentheses and
assume the grouping is to the right. E.g., 11 + 01 + 10 + 11 is an abbreviation of (11 + (01 + (10 + 11))).
Equivalence of Regexps: Regexps R and S are equivalent (denoted R ≡ S) iff they represent the same
language (i.e., L(R) = L(S)), e.g., b∗a(a + b)∗ ≡ (a + b)∗ab∗.
• Commutativity of union: R + S ≡ S + R
• Associativity of union: (R + S) + T ≡ R + (S + T)
• Right distributivity: (S + T)R ≡ SR + TR
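Equivalences like these can be spot-checked by brute force. The sketch below (the helper names `strings_up_to` and `find_counterexample` are our own) compares two regexps on every string over {a, b} up to a fixed length using Python's `re.fullmatch`, writing the union R + S as `R|S`. Agreement up to a bounded length is only evidence, not a proof of R ≡ S; a returned string, however, is a genuine counterexample.

```python
import re
from itertools import product


def strings_up_to(alphabet, n):
    """Yield every string over `alphabet` of length 0, 1, ..., n (shortest first)."""
    for length in range(n + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def find_counterexample(r, s, alphabet="ab", n=8):
    """Return the first string (shortest first) matched by exactly one of the
    regexps r and s, or None if they agree on all strings up to length n."""
    for w in strings_up_to(alphabet, n):
        if (re.fullmatch(r, w) is None) != (re.fullmatch(s, w) is None):
            return w
    return None


# b*a(a + b)* versus (a + b)*ab*: no disagreement is found.
print(find_counterexample(r"b*a(a|b)*", r"(a|b)*ab*"))  # None
# Right distributivity with S = a, T = bb, R = ab*: (S + T)R versus SR + TR.
print(find_counterexample(r"(a|bb)ab*", r"aab*|bbab*"))  # None
# A non-equivalent pair: b*a and ab* first disagree on "ab".
print(find_counterexample(r"b*a", r"ab*"))  # ab
```

Because strings are generated shortest first, any counterexample returned is a shortest one.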
Example 1. We prove that L(b∗a(a + b)∗) = L = {all strings of a’s and b’s that contain at least one a} by
showing double inclusion (the standard technique for proving set equality).
Intuition: Stating L(b∗a(a + b)∗) = L amounts to making two separate claims.
1. Every string in L(b∗a(a + b)∗) has at least one a (i.e., the regex does not match any bad string).
2. Every string with at least one a belongs to L(b∗a(a + b)∗) (i.e., the regex matches every good string).
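The two claims can also be tested separately on short strings. The hypothetical helper `split_at_first_a` mirrors the usual argument for claim 2: everything before the first a is a run of b’s, matching b∗, and everything after it matches (a + b)∗. This is a bounded sanity check under our own encoding (`b*a(a|b)*` for b∗a(a + b)∗), not a substitute for the proof.

```python
import re
from itertools import product

PATTERN = r"b*a(a|b)*"  # b*a(a + b)* in Python regex syntax


def in_L(w):
    """Membership in L: strings of a's and b's with at least one a."""
    return "a" in w


def split_at_first_a(w):
    """For w in L, return (prefix, 'a', suffix) with prefix in b* and suffix in (a + b)*."""
    i = w.index("a")
    return w[:i], w[i], w[i + 1:]


def check_claims(n=10):
    """Return (claim1_holds, claim2_holds) over all strings of length <= n."""
    claim1, claim2 = True, True
    for length in range(n + 1):
        for chars in product("ab", repeat=length):
            w = "".join(chars)
            matched = re.fullmatch(PATTERN, w) is not None
            if matched and not in_L(w):
                claim1 = False  # the regex matched a bad string
            if in_L(w) and not matched:
                claim2 = False  # the regex missed a good string
    return claim1, claim2


print(check_claims())  # (True, True)
print(split_at_first_a("bbaba"))  # ('bb', 'a', 'ba')
```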
Figure 1: NFA with start state q0 (self-loop on a, b) and a chain of transitions q0 → q1 → q2 → q3 → q4.
Remark. Note that if a string does not end with babb, then every attempt to leave q0 along the chain in Figure 1
fails to reach q4 at the end of the input: the path either ends up in the empty set of states (a needed transition
does not exist) or stops in a non-accepting state.
Notice how simple the FSA in Figure 1 is. Properties like this have led to the definition of a variant of finite state
automata, called nondeterministic finite state automata (NFA or NFSA). In these FSAs, given the current state,
when the automaton reads an input symbol a, there may be several states to which it may go next (hence the
nondeterminism).
NFA or NFSA: A nondeterministic finite state automaton is a quintuple (Q, Σ, q0, F, δ), where Q is a fixed,
finite, non-empty set of states; Σ is a fixed (finite, non-empty) alphabet (Q ∩ Σ = ∅); q0 ∈ Q is the initial state;
F ⊆ Q is the set of accepting (“final”) states; and δ : Q × (Σ ∪ {ε}) → P(Q) is the transition function (i.e., δ(q, a) is
the set of next states of the NFA when processing symbol a from state q).
Note: P(Q) is the power set of Q.
We can see that the definition of the NFA contains transitions like δ(q, ε). These transitions are called spontaneous
transitions or ε-transitions: the NFA moves from the current state to the next state without reading any input
symbol. The NFA can be defined without ε-transitions by extending the definition of the initial state to a set of
states rather than a single state. However, ε-transitions allow us to simplify our notation and arguments in some
cases (e.g., when we talk about closure properties).
Remark. The power of an NFA comes from its acceptance condition: by definition, an NFA accepts a string iff the set
of states reached at the end of the input contains at least one accepting state. It is as if the NFA had unlimited
parallelism, following every possible path at once.
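This “unlimited parallelism” can be simulated deterministically by tracking the whole set of current states. The sketch below assumes a dictionary encoding of δ (keys `(state, symbol)`, with `""` standing for ε) and, as a hypothetical reading of Figure 1, a chain q0 -b-> q1 -a-> q2 -b-> q3 -b-> q4 with q4 accepting, so that the NFA accepts exactly the strings ending in babb. The names `DELTA`, `eclose` and `nfa_accepts` are our own.

```python
EPS = ""  # symbol used for ε-transitions in this encoding

# Hypothetical encoding of the Figure 1 NFA (strings ending in babb):
# q0 loops on a and b; the chain q0 -b-> q1 -a-> q2 -b-> q3 -b-> q4 is assumed.
DELTA = {
    ("q0", "a"): {"q0"},
    ("q0", "b"): {"q0", "q1"},
    ("q1", "a"): {"q2"},
    ("q2", "b"): {"q3"},
    ("q3", "b"): {"q4"},
}
START = "q0"
ACCEPTING = {"q4"}


def eclose(states, delta):
    """E(S): every state reachable from a state in S using ε-transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def nfa_accepts(w, delta=DELTA, start=START, accepting=ACCEPTING):
    """Accept iff the set of states reached after reading w contains an accepting state."""
    current = eclose({start}, delta)
    for symbol in w:
        step = set()
        for q in current:
            step |= delta.get((q, symbol), set())  # missing entry = no transition
        current = eclose(step, delta)  # may be the empty set of states
    return bool(current & accepting)


for w in ["babb", "ababb", "bab", "babba"]:
    print(w, nfa_accepts(w))
```

The set `current` never holds more than |Q| states, so each input symbol costs time polynomial in |Q| even though the number of paths can grow exponentially.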
Subset Construction: Given an NFA M = (Q, Σ, q0, F, δ), we can construct a DFA M′ = (Q′, Σ, q′0, F′, δ′) that
accepts the same language as M as follows:
• Q′ = P(Q)
• q′0 = E(q0) (i.e., the set of all states reachable from the initial state of the given NFSA via ε-transitions only)
• F′ = {q′ ∈ Q′ : q′ ∩ F ≠ ∅} (i.e., all states that contain an accepting state of the given NFSA)
• For any q′ ∈ Q′ and a ∈ Σ, δ′(q′, a) = ⋃_{qx ∈ q′} ⋃_{qy ∈ δ(qx, a)} E(qy), where E(qy) is the set of states reachable from
qy via ε-transitions only (including qy itself)
[Figure: example automata over {a, b} with start state q0 and a second state q1.]
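A sketch of the subset construction under the same kind of dictionary encoding (`""` for ε; the function and variable names are our own). One deliberate deviation from the definition: instead of enumerating all of Q′ = P(Q), it builds only the subsets reachable from q′0 = E(q0) by breadth-first search. Unreachable subsets can never be entered, so the accepted language is unchanged, and the result is often much smaller than 2^|Q| states.

```python
from collections import deque

EPS = ""  # ε-transitions are stored under the empty-string symbol


def eclose(states, delta):
    """E(S): every state reachable from S using ε-transitions only (S included)."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def subset_construction(delta, start, accepting, alphabet):
    """Return (states, start, accepting, transitions) of an equivalent DFA whose
    states are frozensets of NFA states (only those reachable from E(start))."""
    q0 = eclose({start}, delta)
    dfa_delta, seen, queue = {}, {q0}, deque([q0])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            # δ'(s, a) = union over qx in s and qy in δ(qx, a) of E(qy)
            t = set()
            for qx in s:
                for qy in delta.get((qx, a), set()):
                    t |= eclose({qy}, delta)
            t = frozenset(t)
            dfa_delta[(s, a)] = t
            if t not in seen:
                seen.add(t)
                queue.append(t)
    dfa_accepting = {s for s in seen if s & accepting}  # s ∩ F ≠ ∅
    return seen, q0, dfa_accepting, dfa_delta


def dfa_accepts(w, q0, accepting, delta):
    """Run the constructed DFA: exactly one next state per symbol."""
    state = q0
    for a in w:
        state = delta[(state, a)]
    return state in accepting


# Hypothetical NFA for strings ending in babb (chain q0 -b-> q1 -a-> q2 -b-> q3 -b-> q4).
NFA = {("q0", "a"): {"q0"}, ("q0", "b"): {"q0", "q1"}, ("q1", "a"): {"q2"},
       ("q2", "b"): {"q3"}, ("q3", "b"): {"q4"}}
states, q0, acc, delta = subset_construction(NFA, "q0", {"q4"}, "ab")
print(len(states))  # 5
print(dfa_accepts("abbabb", q0, acc, delta), dfa_accepts("babba", q0, acc, delta))  # True False
```

Here only 5 of the 2^5 = 32 possible subsets are reachable. The empty set of states shows up as an ordinary (dead) DFA state whenever some input leaves the NFA with no moves.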
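Remark. Although an NFA can be viewed as having unlimited parallelism, it is not a practical model of a real machine:
in practice it is simulated by tracking the set of current states, or converted to a DFA by the subset construction,
which may need up to 2^|Q| states.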
Closure properties
Consider the language L = {s ∈ {a, b}∗ : s contains three a’s in a row and an even number of b’s}.
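[Figures: FSA1 (states q0, q1, q2, q3; start state q0) accepts strings that contain three a’s in a row; FSA2 (states q0, q1;
start state q0) accepts strings with an even number of b’s.]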
Now, let’s combine the states of FSA1 and FSA2 so that each resulting state tracks the states of both FSAs at the
same time. The resulting FSA looks as follows (qxy is a state that represents state qx in FSA1 and state qy in
FSA2):
It should now be clear that the only accepting state in this FSA is q30, in which we have seen three a’s and an
even number of b’s.
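The product construction for this example can be sketched as follows. Because the diagrams of FSA1 and FSA2 are not reproduced here, their transitions are our assumption: FSA1’s state counts the current run of a’s (a b resets it to q0, and q3 is a trap once aaa has been seen), and FSA2’s state is the number of b’s read so far modulo 2. A product state (x, y) plays the role of qxy; the names `step1`, `step2` and `accepts` are hypothetical.

```python
from itertools import product


def step1(x, c):
    """Assumed FSA1: x = length of the current run of a's; 3 means aaa was seen (trap)."""
    if x == 3:
        return 3
    return x + 1 if c == "a" else 0


def step2(y, c):
    """Assumed FSA2: y = number of b's read so far, modulo 2."""
    return 1 - y if c == "b" else y


# Product FSA: state (x, y) stands for q_xy; it runs FSA1 and FSA2 in lockstep.
STATES = list(product(range(4), range(2)))
DELTA = {((x, y), c): (step1(x, c), step2(y, c)) for (x, y) in STATES for c in "ab"}
START, ACCEPTING = (0, 0), {(3, 0)}  # q00 is the start state, q30 the only accepting one


def accepts(w):
    q = START
    for c in w:
        q = DELTA[(q, c)]
    return q in ACCEPTING


for w in ["aaa", "aaab", "baaab", "abab"]:
    print(w, accepts(w))
```

The product has 4 × 2 = 8 states, and the only design decision specific to this language is the choice of {(3, 0)} as the accepting set.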
The example above demonstrates a powerful design technique: we can combine FSAs that accept given languages
to obtain an FSA that accepts the language resulting from combining them (here, their intersection).
Closure Property: Let R and S be two languages that are accepted by FSA_R and FSA_S, respectively. If, for every
such R and S, applying an operation to R and S yields a language T that is accepted by some FSA (FSA_T), we say
that the class of languages accepted by FSAs is closed under this operation.
Theorem 2. The class of languages that are accepted by FSAs is closed under complementation, union, intersection,
concatenation and the Kleene star operation. In other words, if L and L′ are languages that are accepted by FSAs,
then so are all of the following: L̄, L ∩ L′, L ∪ L′, L ◦ L′ and L∗.
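Three of these closure properties have especially short constructions on DFAs, sketched below with a tuple encoding (states, start, accepting, transitions) of our own choosing. Complementation swaps accepting and non-accepting states; this requires a complete DFA (a transition for every state and symbol) and does not work directly on an NFA. Union and intersection reuse the product construction and differ only in which product states accept. The two example DFAs, `EVEN_B` and `ENDS_WITH_A`, are hypothetical.

```python
from itertools import product


def complement(dfa):
    """L̄: same complete DFA, with accepting and non-accepting states swapped."""
    states, start, accepting, delta = dfa
    return states, start, set(states) - set(accepting), delta


def combine(d1, d2, alphabet, keep):
    """Product DFA; keep(in1, in2) decides acceptance: `and` gives ∩, `or` gives ∪."""
    s1, start1, f1, t1 = d1
    s2, start2, f2, t2 = d2
    states = set(product(s1, s2))
    delta = {((x, y), c): (t1[(x, c)], t2[(y, c)]) for (x, y) in states for c in alphabet}
    accepting = {(x, y) for (x, y) in states if keep(x in f1, y in f2)}
    return states, (start1, start2), accepting, delta


def run(dfa, w):
    _, state, accepting, delta = dfa
    for c in w:
        state = delta[(state, c)]
    return state in accepting


# Hypothetical complete DFAs over {a, b}.
EVEN_B = ({0, 1}, 0, {0}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0})
ENDS_WITH_A = ({0, 1}, 0, {1}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0})

ODD_B = complement(EVEN_B)
UNION = combine(EVEN_B, ENDS_WITH_A, "ab", lambda p, q: p or q)
INTERSECTION = combine(EVEN_B, ENDS_WITH_A, "ab", lambda p, q: p and q)
print(run(ODD_B, "ab"), run(UNION, "ba"), run(INTERSECTION, "ba"))  # True True False
```

Concatenation and Kleene star are usually handled with NFAs instead, where ε-transitions make it easy to glue automata together.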
Regular Languages
Theorem 3. Let L be a language. The following statements are equivalent:
We are not going to prove this theorem. However, we are going to talk about the main ideas of the proof. You
can look at Sections 7.4.2 and 7.6 in the textbook for a formal treatment of this theorem.