Regular Expressions: Reading: Chapter 3
Regular Expressions vs. Finite
Automata
A regular expression offers a declarative way to express the pattern of the
strings we want to accept.
E.g., 01* + 10*
[Figure: Regular Languages among the formal language classes]
RE’s: Introduction
• Regular expressions are an algebraic
way to describe languages.
• They describe exactly the regular
languages.
• If E is a regular expression, then L(E) is
the language it defines.
• We’ll describe RE’s and their
languages recursively.
RE’s: Definition
• Basis 1: If a is any symbol, then a is a
RE, and L(a) = {a}.
• Note: {a} is the language containing one
string, and that string is of length 1.
• Basis 2: ε is a RE, and L(ε) = {ε}.
• Basis 3: ∅ is a RE, and L(∅) = ∅.
RE’s: Definition – (2)
• Induction 1: If E1 and E2 are regular
expressions, then E1+E2 is a regular
expression, and L(E1+E2) = L(E1) ∪ L(E2).
• Induction 2: If E1 and E2 are regular
expressions, then E1E2 is a regular
expression, and L(E1E2) = L(E1)L(E2).
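The recursive definition can be read as a small evaluator. The following is a minimal sketch, not part of the original slides: the tuple encoding ("sym", "eps", "empty", "union", "concat") and the function name `lang` are hypothetical, and only the star-free rules listed above are covered so that every language stays finite.

```python
def lang(e):
    """Return L(e) as a Python set of strings (star-free expressions only)."""
    kind = e[0]
    if kind == "sym":      # Basis 1: L(a) = {a}
        return {e[1]}
    if kind == "eps":      # Basis 2: L(ε) = {ε}; the empty string is ""
        return {""}
    if kind == "empty":    # Basis 3: L(∅) = ∅
        return set()
    if kind == "union":    # Induction 1: union of the two languages
        return lang(e[1]) | lang(e[2])
    if kind == "concat":   # Induction 2: every x from L(E1) followed by every y from L(E2)
        return {x + y for x in lang(e[1]) for y in lang(e[2])}
    raise ValueError(f"unknown expression kind: {kind}")


zero, one = ("sym", "0"), ("sym", "1")
e = ("union", ("concat", zero, one), zero)   # the expression 01 + 0
print(sorted(lang(e)))
```

Concatenating with ("empty",) yields the empty language, while concatenating with ("eps",) leaves a language unchanged.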
Language Operators
Union of two languages:
L ∪ M = all strings that are in either L or M
Note: A union of two languages produces a third
language
Kleene Closure (special notes)
L* is an infinite set iff |L| ≥ 1 and L ≠ {ε}. Why?
If L = {ε}, then L* = {ε}. Why?
If L = Φ, then L* = {ε}. Why?
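The special cases can be explored on a bounded portion of L*. This is a rough sketch under stated assumptions: the helper name `star_up_to` and the length bound are hypothetical, and the empty string "" stands for ε.

```python
def star_up_to(L, max_len):
    """Return the strings of L* whose length is at most max_len."""
    result = {""}        # L^0 = {ε}, even when L itself is empty
    frontier = {""}
    while frontier:
        # Extend every newly found string by one more string from L.
        frontier = {x + y for x in frontier for y in L
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result


print(star_up_to(set(), 5))          # closure of the empty language
print(star_up_to({""}, 5))           # closure of {ε}
print(len(star_up_to({"0"}, 5)), len(star_up_to({"0"}, 10)))
```

For the empty language and for {ε} the result never grows past {ε}, whereas for {0} the number of strings keeps growing with the bound, which is the finite shadow of L* being infinite.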
Example: how to use these regular
expression properties and language
operators?
L = { w | w is a binary string that does not contain two consecutive 0s or two
consecutive 1s anywhere }
E.g., w = 01010101 is in L, while w = 10010 is not in L
Goal: Build a regular expression for L
Four cases for w:
Case A: w starts with 0 and |w| is even
Case B: w starts with 1 and |w| is even
Case C: w starts with 0 and |w| is odd
Case D: w starts with 1 and |w| is odd
Regular expression for the four cases:
Case A: (01)*
Case B: (10)*
Case C: 0(10)*
Case D: 1(01)*
Since L is the union of all 4 cases:
Reg Exp for L = (01)* + (10)* + 0(10)* + 1(01)*
If we introduce ε, then the regular expression can be simplified to:
Reg Exp for L = (ε+1)(01)*(ε+0)
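Both expressions for L can be checked by brute force against the definition. This is a sketch, assuming Python's `re` syntax as a stand-in for the slide notation ("+" is written "|", and ε+1 is written "1?"); the helper names `alternating` and `all_binary` are hypothetical.

```python
import re
from itertools import product


def alternating(w):
    """True if w never repeats a symbol in two adjacent positions."""
    return all(a != b for a, b in zip(w, w[1:]))


four_cases = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
simplified = re.compile(r"1?(01)*0?")      # (ε+1)(01)*(ε+0)


def all_binary(max_len):
    """Yield every binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)


# Collect any string on which the definition and the two expressions disagree.
mismatches = [w for w in all_binary(10)
              if not (alternating(w)
                      == bool(four_cases.fullmatch(w))
                      == bool(simplified.fullmatch(w)))]
print(mismatches)
```

An empty list only shows agreement up to length 10; it is evidence for the simplification, not a proof.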
Precedence of Operators
• Parentheses may be used wherever
needed to influence the grouping of
operators.
• Order of precedence is * (highest), then
concatenation, then + (lowest).
Precedence of Operators
Highest to lowest
* operator (star)
. (concatenation)
+ operator
Example:
01* + 1 = ( 0 . ((1)*) ) + 1
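Python's `re` module happens to use the same precedence order (star, then concatenation, then alternation), so the grouping can be observed directly. A small sketch, with "+" written as "|"; the pattern named `other` is a deliberately different grouping added only for contrast.

```python
import re

implicit = re.compile(r"01*|1")        # slide notation: 01* + 1
explicit = re.compile(r"(0(1*))|1")    # fully parenthesized reading
other = re.compile(r"(01)*|1")         # a different grouping, for contrast

for w in ["", "0", "1", "011", "0101"]:
    print(repr(w),
          bool(implicit.fullmatch(w)),
          bool(explicit.fullmatch(w)),
          bool(other.fullmatch(w)))
```

The first two columns agree on every sample, while the third differs on strings such as 011 and 0101.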
Examples: RE’s
• L(01) = {01}.
• L(01+0) = {01, 0}.
• L(0(1+0)) = {01, 00}.
• Note order of precedence of operators.
• L(0*) = {ε, 0, 00, 000,… }.
• L((0+10)*(ε+1)) = all strings of 0’s and
1’s without two consecutive 1’s.
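The last example can be sanity-checked by counting members of each length. A sketch only, assuming Python `re` syntax ("+" as "|", ε+1 as "1?"); `count_members` is a hypothetical helper.

```python
import re
from itertools import product

pattern = re.compile(r"(?:0|10)*1?")   # (0+10)*(ε+1)


def count_members(n):
    """Number of binary strings of length n matched by the pattern."""
    return sum(1 for t in product("01", repeat=n)
               if pattern.fullmatch("".join(t)))


counts = [count_members(n) for n in range(6)]

# Compare against the plain-English description on all strings up to length 10.
agree = all(bool(pattern.fullmatch("".join(t))) == ("11" not in "".join(t))
            for n in range(11) for t in product("01", repeat=n))
print(counts, agree)
```

The counts follow the Fibonacci pattern that is typical of strings with no two consecutive 1's.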
More Examples: RE’s
• L((0+1)*101(0+1)*) = all strings of 0’s
and 1’s having 101 as a substring.
• L((0+1)*1(0+1)*0(0+1)*1(0+1)*) = all
strings of 0’s and 1’s having 101 as a
subsequence.
• L(1*(1*01*01*01*)*1*) = all strings of 0’s
and 1’s having a number of 0’s that is a
multiple of 3.
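The difference between substring and subsequence, and the zero-counting example, can be tested against direct Python predicates. A sketch, assuming Python `re` syntax for the three expressions; `has_subsequence` and `disagreements` are hypothetical helper names.

```python
import re
from itertools import product

substring = re.compile(r"(0|1)*101(0|1)*")
subsequence = re.compile(r"(0|1)*1(0|1)*0(0|1)*1(0|1)*")
zeros_mod3 = re.compile(r"1*(1*01*01*01*)*1*")


def has_subsequence(w, pat):
    """True if the symbols of pat occur in w in order, gaps allowed."""
    it = iter(w)
    return all(c in it for c in pat)


def disagreements(regex, predicate, max_len=9):
    """Binary strings up to max_len where regex and predicate differ."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in product("01", repeat=n))
    return [w for w in words if bool(regex.fullmatch(w)) != predicate(w)]


print(disagreements(substring, lambda w: "101" in w))
print(disagreements(subsequence, lambda w: has_subsequence(w, "101")))
print(disagreements(zeros_mod3, lambda w: w.count("0") % 3 == 0))
```

The string 1001 separates the first two languages: it has 101 as a subsequence but not as a substring.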
Finite Automata (FA) & Regular
Expressions (Reg Ex)
To show that they are interchangeable,
consider the following theorems:
Theorem 1: For every DFA A there exists a regular expression R such that L(R) = L(A)
Theorem 2: For every regular expression R there exists an ε-NFA E such that L(E) = L(R)
(Proofs are in the book.)
[Figure: Kleene Theorem diagram relating ε-NFA, NFA, DFA, and Reg Ex; Theorem 1 labels DFA to Reg Ex, Theorem 2 labels Reg Ex to ε-NFA]
DFA → Reg Ex (Theorem 1)
DFA to RE construction
Informally, trace all distinct paths (traversing cycles only once)
from the start state to each of the final states and enumerate all
the expressions along the way
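Example: a DFA with states q0, q1, q2. q0 loops on 1 and goes to q1 on 0;
q1 loops on 0 and goes to q2 on 1; q2 loops on 0,1.
Pieces along the path: 1*, 00*, 1, (0+1)*
Result: 1*00*1(0+1)*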
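The informal result can be compared with a direct simulation of the automaton. This is a sketch under an assumption: the transition table below is a reading of the example in which q0 loops on 1, q1 loops on 0, q2 loops on 0 and 1, q0 is the start state and q2 the only final state. The names `delta` and `dfa_accepts` are hypothetical.

```python
import re
from itertools import product

# Assumed transition function of the example DFA.
delta = {
    ("q0", "1"): "q0", ("q0", "0"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}


def dfa_accepts(w, start="q0", finals=frozenset({"q2"})):
    """Run the DFA on w and report whether it ends in a final state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals


regex = re.compile(r"1*00*1(0|1)*")    # 1*00*1(0+1)* with "+" written as "|"

differences = [w for n in range(11)
               for w in ("".join(t) for t in product("01", repeat=n))
               if dfa_accepts(w) != bool(regex.fullmatch(w))]
print(differences)
```

Under this reading, both the automaton and the expression describe the binary strings that contain 01 somewhere.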
DFA-to-RE
• A strange sort of induction.
• States of the DFA are assumed to be
1,2,…,n.
• We construct RE’s for the labels of
restricted sets of paths.
• Basis: single arcs or no arc at all.
• Induction: paths that are also allowed to
traverse the next state in the ordering.
k-Paths
• A k-path is a path through the graph of
the DFA that goes through no state
numbered higher than k.
• Endpoints are not restricted; they can
be any state.
k-Path Induction
• Let $R_{ij}^{k}$ be the regular expression for the
set of labels of k-paths from state i to
state j.
• Basis: k = 0. $R_{ij}^{0}$ = sum of the labels of the arcs
from i to j.
• ∅ if there is no such arc.
• But add ε if i = j.
k-Path Inductive Case
• A k-path from i to j either:
1. Never goes through state k, or
2. Goes through k one or more times.
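$R_{ij}^{k} = R_{ij}^{k-1} + R_{ik}^{k-1} (R_{kk}^{k-1})^{*} R_{kj}^{k-1}$
• $R_{ij}^{k-1}$: doesn’t go through k.
• $R_{ik}^{k-1}$: goes from i to k the first time.
• $(R_{kk}^{k-1})^{*}$: zero or more times from k to k.
• $R_{kj}^{k-1}$: then, from k to j.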
Illustration of Induction
[Figure: paths from i to j. Either a path not going through k, or a path to k, then from k to k several times, then from k to j; all intermediate states are numbered below k]
Final Step
• The RE with the same language as
the DFA is the sum (union) of the $R_{ij}^{n}$,
where:
1. n is the number of states; i.e., paths are
unconstrained.
2. i is the start state.
3. j is one of the final states.
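The whole k-path construction fits in a short program. The following is a minimal sketch, not the textbook's code: expressions are built as Python `re` strings ("+" becomes "|"), `None` stands for ∅, the empty string stands for ε, and the names `union`, `concat`, `star` and `dfa_to_regex` are hypothetical. The sample DFA is an assumed three-state automaton accepting the strings that contain 01.

```python
import re
from itertools import product


def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else f"(?:{a}|{b})"


def concat(a, b):
    # ∅ annihilates; every operand is a sequence of atoms, so joining is safe.
    return None if a is None or b is None else a + b


def star(a):
    return "" if a is None or a == "" else f"(?:{a})*"


def dfa_to_regex(n, arcs, start, finals):
    """States are 1..n; arcs maps (i, j) to the set of symbols on that arc."""
    # Basis (k = 0): direct arcs, plus ε on the diagonal.
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r = None
            for symbol in sorted(arcs.get((i, j), ())):
                r = union(r, symbol)
            R[i][j] = union(r, "") if i == j else r
    # Induction: allow state k as an intermediate state.
    for k in range(1, n + 1):
        loop = star(R[k][k])
        new = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                through_k = concat(concat(R[i][k], loop), R[k][j])
                new[i][j] = union(R[i][j], through_k)
        R = new
    # Final step: union over all final states j of the n-path expressions.
    result = None
    for j in finals:
        result = union(result, R[start][j])
    return result


arcs = {(1, 1): {"1"}, (1, 2): {"0"}, (2, 2): {"0"},
        (2, 3): {"1"}, (3, 3): {"0", "1"}}
expr = dfa_to_regex(3, arcs, start=1, finals=[3])
print(expr)
```

The expression produced this way is correct but far from minimal, which is one motivation for simplifying with algebraic laws or for using state elimination instead.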
DFA to RE Example - I
DFA to RE Example - II
DFA to RE Example - III
DFA to RE – A simpler approach by State Elimination
- Eliminate all states that are neither the start state nor an accepting state
DFA to RE – A simpler approach by State Elimination
- We can use 0+1 as the start state, and concatenate 1 and 0+1 simply, as
there is no other arc from A to C, and no loop at B
DFA to RE – A simpler approach by State Elimination
Finally we concatenate both languages (this above and one from last slide)
and get:
RE to NFA - I
RE to NFA - II
[Figures: ε-NFA constructions for R+S, RS, and R*]
Example: (0 + 1)* 1 (0 + 1)
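One common way to realize the three constructions is to glue ε-NFA fragments that each have a single start and a single accepting state. The sketch below follows that idea under stated assumptions: the slide figures are not available here, so the exact placement of ε-moves is the standard textbook-style layout rather than a transcription, and the names `Frag`, `symbol`, `union`, `concat`, `star`, `eclose` and `accepts` are hypothetical.

```python
from itertools import count

_ids = count()   # source of fresh state numbers


class Frag:
    """An ε-NFA fragment; edges are (source, label, target), label None = ε."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def symbol(a):
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, a, f)])


def union(r, s):      # R+S: new start branches to both, both join a new accept
    q, f = next(_ids), next(_ids)
    return Frag(q, f, r.edges + s.edges + [
        (q, None, r.start), (q, None, s.start),
        (r.accept, None, f), (s.accept, None, f)])


def concat(r, s):     # RS: ε-move from the accept of R to the start of S
    return Frag(r.start, s.accept, r.edges + s.edges + [(r.accept, None, s.start)])


def star(r):          # R*: allow skipping R or repeating it
    q, f = next(_ids), next(_ids)
    return Frag(q, f, r.edges + [
        (q, None, r.start), (r.accept, None, r.start),
        (q, None, f), (r.accept, None, f)])


def eclose(states, edges):
    """All states reachable from `states` using ε-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for src, label, dst in edges:
            if src == p and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(frag, w):
    current = eclose({frag.start}, frag.edges)
    for c in w:
        moved = {dst for src, label, dst in frag.edges
                 if src in current and label == c}
        current = eclose(moved, frag.edges)
    return frag.accept in current


# (0 + 1)* 1 (0 + 1); each symbol() call creates fresh states.
nfa = concat(concat(star(union(symbol("0"), symbol("1"))), symbol("1")),
             union(symbol("0"), symbol("1")))
print(accepts(nfa, "0011"), accepts(nfa, "01"))
```

The example expression describes the binary strings whose second-to-last symbol is 1, which gives an easy independent check of the simulation.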
Algebraic Laws of Regular
Expressions
Commutative:
E+F = F+E
Associative:
(E+F)+G = E+(F+G)
(EF)G = E(FG)
Identity:
E+Φ = E
εE = Eε = E
Annihilator:
ΦE = EΦ = Φ
Algebraic Laws…
Distributive:
E(F+G) = EF + EG
(F+G)E = FE+GE
Idempotent: E + E = E
Involving Kleene closures:
(E*)* = E*
Φ* = ε
ε* = ε
E⁺ = EE*
E? = ε + E
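Several of these laws can be spot-checked on concrete expressions by comparing languages up to a length bound. This is a sketch only: the sample expressions E, F, G and the helper `language` are hypothetical, Python `re` syntax is used ("+" as "|"), and the laws involving Φ are left out because Φ has no direct counterpart in that syntax. Agreement on bounded strings is evidence for a law, not a proof.

```python
import re
from itertools import product


def language(pattern, max_len=6):
    """Set of binary strings up to max_len that the pattern matches in full."""
    rx = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1)
             for t in product("01", repeat=n))
    return {w for w in words if rx.fullmatch(w)}


# Sample expressions, each wrapped in a group so it can be combined safely.
E, F, G = "(?:01)", "(?:0)", "(?:1*0)"

laws = {
    "E+F = F+E":        (f"{E}|{F}", f"{F}|{E}"),
    "E(F+G) = EF+EG":   (f"{E}(?:{F}|{G})", f"{E}{F}|{E}{G}"),
    "(F+G)E = FE+GE":   (f"(?:{F}|{G}){E}", f"{F}{E}|{G}{E}"),
    "E+E = E":          (f"{E}|{E}", E),
    "(E*)* = E*":       (f"(?:{E}*)*", f"{E}*"),
    "E+ = EE* (plus as closure)": (f"{E}+", f"{E}{E}*"),
    "E? = ε+E":         (f"{E}?", f"(?:|{E})"),
}

results = {name: language(left) == language(right)
           for name, (left, right) in laws.items()}
for name, ok in results.items():
    print(name, ok)
```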
True or False?
Let R and S be two regular expressions. Then:
1. ((R*)*)* = R* ?
2. (R+S)* = R* + S* ?
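One way to approach such questions is to pick simple expressions for R and S and search for a string that separates the two sides. A sketch, assuming R = 0 and S = 1 and Python `re` syntax ("+" as "|"); the helper `language` and the bound of 4 are hypothetical choices. A separating string settles a claim as false, while the absence of one on short strings only suggests it may be true.

```python
import re
from itertools import product


def language(pattern, max_len=4):
    """Set of binary strings up to max_len that the pattern matches in full."""
    rx = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1)
             for t in product("01", repeat=n))
    return {w for w in words if rx.fullmatch(w)}


R, S = "0", "1"

triple_star = language(f"(?:(?:(?:{R})*)*)*")   # ((R*)*)*
single_star = language(f"(?:{R})*")             # R*
star_of_sum = language(f"(?:{R}|{S})*")         # (R+S)*
sum_of_stars = language(f"(?:{R})*|(?:{S})*")   # R* + S*

# Strings in (R+S)* but not in R* + S*, shortest first.
witnesses = sorted(star_of_sum - sum_of_stars, key=lambda w: (len(w), w))
print(triple_star == single_star, witnesses[:3])
```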
Summary
Regular expressions
Equivalence to finite automata
DFA to regular expression conversion
Regular expression to ε-NFA conversion
Algebraic laws of regular expressions
Unix regular expressions and Lexical
Analyzer