Formal Languages Part 1 Including Regular Expressions: Basic Concepts For Symbols, Strings, and Languages
Peter Fritzson
IDA, Linköpings universitet, 2011.
TDDD55/B44, P Fritzson, IDA, LIU, 2011. 2.2
Operations on Languages

Closure
∑* denotes the set of all strings which can be constructed from the alphabet ∑.
Example: ∑ = {0,1}
∑* = {ϵ,0,1,00,01,...,111,101,...}
∑+ = ∑* – {ϵ} = {0,1,00,01,...}

Concatenation
L, M are languages.
Example: L = {ab,cd}, M = {uv,yz}
gives us: LM = {abuv,abyz,cduv,cdyz}
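The closure and concatenation examples above can be reproduced with a small Python sketch. Since ∑* is infinite, the sketch only enumerates strings up to a chosen length; the function names `strings_up_to` and `concat` are hypothetical helpers introduced for illustration, not part of the course material.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len, shortest first.

    This is a finite prefix of the (infinite) closure of the alphabet.
    """
    result = []
    for n in range(max_len + 1):
        for symbols in product(sorted(alphabet), repeat=n):
            result.append("".join(symbols))  # n == 0 yields the empty string
    return result

def concat(L, M):
    """Concatenation LM: every string of L followed by every string of M."""
    return {x + y for x in L for y in M}

sigma = {"0", "1"}
star_prefix = strings_up_to(sigma, 2)              # finite part of sigma*
plus_prefix = [s for s in star_prefix if s != ""]  # sigma+ = sigma* minus {epsilon}
LM = concat({"ab", "cd"}, {"uv", "yz"})

print(star_prefix)
print(sorted(LM))
```

The empty Python string `""` plays the role of ϵ here.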
Closure of a language L
L^0 = {ϵ}
L^1 = L
L^2 = L•L
L^n = L•L^(n-1), n >= 1
L* = L^0 ∪ L^1 ∪ L^2 ∪ ...

Positive closure
L+ = L^1 ∪ L^2 ∪ ...
L+ = LL*
L+ = L* – {ϵ}, if ϵ is not in L
L* = {ϵ} ∪ L+
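As a rough illustration of the definitions above, the powers of a finite language can be computed directly, and the closures approximated by a finite union. This is only a sketch: `power`, `star_up_to` and `plus_up_to` are hypothetical names, and the cut-off `max_power` is an assumption needed because the closure itself is infinite.

```python
def concat(L, M):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in L for y in M}

def power(L, n):
    """L^0 = {epsilon}; L^n = L . L^(n-1) for n >= 1."""
    result = {""}
    for _ in range(n):
        result = concat(L, result)
    return result

def star_up_to(L, max_power):
    """L^0 union L^1 union ... union L^max_power (finite part of the closure)."""
    result = set()
    for n in range(max_power + 1):
        result |= power(L, n)
    return result

def plus_up_to(L, max_power):
    """L^1 union ... union L^max_power (finite part of the positive closure)."""
    result = set()
    for n in range(1, max_power + 1):
        result |= power(L, n)
    return result

L = {"ab", "cd"}
print(sorted(power(L, 2)))
print(sorted(star_up_to(L, 2)))
```

Because ϵ is not in this L, the truncated positive closure equals the truncated closure with `""` removed, mirroring the rule for L+.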
Example: A = {a,b}
A* = {ϵ,a,b,aa,ab,ba,bb,...}
= All possible sequences of a and b.
A language over A is always a subset of A*.

Union of languages
L, M are languages.
L ∪ M = {x | x ∈ L or x ∈ M}
Example: L = {ab,cd}, M = {uv,yz}
gives us: L ∪ M = {ab,cd,uv,yz}
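With finite languages represented as Python sets, union is the built-in set union, and "a language over A is a subset of A*" can be checked symbol by symbol. The helper name `is_language_over` is hypothetical and used only for this sketch.

```python
def is_language_over(L, alphabet):
    """True if every string in L uses only symbols from `alphabet`,
    i.e. L is a subset of the closure of the alphabet."""
    return all(ch in alphabet for s in L for ch in s)

L = {"ab", "cd"}
M = {"uv", "yz"}
union = L | M  # {x | x in L or x in M}

A = {"a", "b"}
print(sorted(union))
print(is_language_over({"", "ab", "ba", "bb"}, A))
print(is_language_over({"abc"}, A))
```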
Regular expressions
Rules for constructing regular expressions
∑ is an alphabet, the regular expression r describes the language Lr, the regular expression s corresponds to the language Ls, etc.
Each symbol a in the alphabet ∑ is a regular expression which denotes {a}.
* = repetition, zero or more times.
+ = repetition, one or more times.
. = concatenation, can be left out.

Regular expression r       Language Lr
ϵ                          {ϵ}
a, a ∈ ∑                   {a}
union: (s) | (t)           Ls ∪ Lt
concatenation: (s).(t)     Ls.Lt
repetition: (s)*           Ls*
repetition: (s)+           Ls+

Priorities
Highest   * +
          .
Lowest    |

Regular Expression Language Examples
Examples: ∑ = {a,b}
1. r=a         Lr={a}
2. r=a*        Lr={ϵ,a,aa,aaa,...}={a}*
3. r=a|b       Lr={a,b}={a}∪{b}
4. r=(a|b)*    Lr={a,b}*={ϵ,a,b,aa,ab,ba,bb,aaa,aab,...}
5. r=(a*b*)*   Lr={a,b}*={ϵ,a,b,aa,ab,ba,bb,aaa,aab,...}
6. r=a|ba*     Lr={a,b,ba,baa,baaa,...}={a or ba^i | i>=0}

NB! {a^n b^n | n>=0} cannot be described with regular expressions.
r=a*b* gives us Lr={a^i b^j | i,j>=0}, which does not work: it also contains strings with i different from j.
r=(ab)* gives us Lr={(ab)^i | i>=0}={ϵ,ab,abab,...}, which does not work: abab is not of the form a^n b^n.
Regular expressions cannot "count" (have no memory).
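The examples above can be tried out with Python's `re` module, used here only as a convenient stand-in: its `|`, `*`, `+` and implicit concatenation follow the same priorities as the rules on the slide. The helper `in_language` is a hypothetical name; it uses `re.fullmatch` so that the whole string, not just a prefix, must belong to Lr.

```python
import re
from itertools import product

def in_language(regex, s):
    """True if the whole string s is in the language described by regex."""
    return re.fullmatch(regex, s) is not None

# Examples 2, 3 and 6 from the slide.
print(in_language("a*", ""))         # epsilon is in {a}*
print(in_language("a|b", "b"))
print(in_language("a|ba*", "baaa"))  # parsed as a | (b(a*)), since | is lowest
print(in_language("a|ba*", "ab"))

# Examples 4 and 5 describe the same language: compare on all short strings.
short_strings = ["".join(p) for n in range(5) for p in product("ab", repeat=n)]
same = all(in_language("(a|b)*", s) == in_language("(a*b*)*", s)
           for s in short_strings)

# Neither attempt captures {a^n b^n}: each one gets some string wrong.
too_many = in_language("a*b*", "aab")   # accepted, but not of the form a^n b^n
missed = in_language("(ab)*", "aabb")   # a^2 b^2 is rejected
```

The comparison over `short_strings` is only a spot check up to length 4, not a proof that the two expressions are equivalent.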
Representation of State Diagrams by Transition Tables
The previous graph is a DFA (Deterministic Finite Automaton).
It is deterministic because at each step there is exactly one state to go to and there is no transition marked "ϵ".
A regular expression denotes a regular set and corresponds to an NFA (Nondeterministic Finite Automaton).

Transition Table (suitable for computer representation)
State   Accept   Found    Next state a   Next state b
0       no       ϵ        9              1
1       no       b        2              9
2       no       ba+      2              3
3       yes      ba+b+    9              3
9       no       -        9              9

NFA and Transition Tables
Example: NFA for (b|a)*ab
[State diagram for (b|a)*ab: start state 0 with loops on a and b; 0 goes to 1 on a; 1 goes to 2 on b; 2 is accepting]

Transition table for (b|a)*ab
State   a       b      Accept
0       {0,1}   {0}    no
1       -       {2}    no
2       -       -      yes
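To show why a transition table is suitable for computer representation, here is a minimal table-driven sketch of both automata. The dictionary layout and the function names `dfa_accepts` and `nfa_accepts` are assumptions made for illustration; the row for state 9 assumes it is an error state that loops to itself on every symbol.

```python
# DFA table from the slide: state -> {symbol: next state}.
DFA_TABLE = {
    0: {"a": 9, "b": 1},
    1: {"a": 2, "b": 9},
    2: {"a": 2, "b": 3},
    3: {"a": 9, "b": 3},
    9: {"a": 9, "b": 9},  # assumed: error state, never left again
}
DFA_ACCEPT = {3}

def dfa_accepts(table, accepting, start, s):
    """Follow exactly one transition per input symbol."""
    state = start
    for ch in s:
        state = table[state][ch]
    return state in accepting

# NFA table for (b|a)*ab: state -> {symbol: set of possible next states}.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"a": set(), "b": {2}},
    2: {"a": set(), "b": set()},
}
NFA_ACCEPT = {2}

def nfa_accepts(table, accepting, start, s):
    """Track the set of all states the NFA could be in after each symbol."""
    current = {start}
    for ch in s:
        current = set().union(*(table[q][ch] for q in current))
    return bool(current & accepting)

print(dfa_accepts(DFA_TABLE, DFA_ACCEPT, 0, "baabbb"))
print(nfa_accepts(NFA_TABLE, NFA_ACCEPT, 0, "bbaab"))
```

The NFA sketch handles nondeterminism by keeping a set of current states; it does not handle ϵ-transitions, which this particular NFA does not have.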
DFA for (b|a)*ab
[State diagram: start state 0, followed by states 1 and 2; edges labeled a and b]
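One standard way to obtain a DFA from the NFA for (b|a)*ab is the subset construction, where each DFA state is a set of NFA states. The slides shown here do not name the method, so the following is a sketch of that swapped-in technique under stated assumptions: the NFA table is copied from the earlier slide, the function name `subset_construction` is hypothetical, and ϵ-transitions are not handled because this NFA has none.

```python
# NFA table for (b|a)*ab: state -> {symbol: set of possible next states}.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"a": set(), "b": {2}},
    2: {"a": set(), "b": set()},
}
NFA_ACCEPT = {2}

def subset_construction(nfa_table, nfa_accepting, start, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset({start})
    dfa_table = {}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        if current in dfa_table:
            continue  # already expanded
        dfa_table[current] = {}
        for ch in alphabet:
            # Union of everything reachable on ch from any state in the set.
            target = frozenset().union(*(nfa_table[q][ch] for q in current))
            dfa_table[current][ch] = target
            if target not in dfa_table:
                worklist.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    accepting = {S for S in dfa_table if S & nfa_accepting}
    return dfa_table, accepting, start_set

dfa_table, dfa_accepting, dfa_start = subset_construction(
    NFA_TABLE, NFA_ACCEPT, 0, "ab")

for state, row in dfa_table.items():
    print(sorted(state), {ch: sorted(t) for ch, t in row.items()})
```

For this NFA the construction reaches three sets, {0}, {0,1} and {0,2}, which agrees with a three-state DFA such as the one labeled 0, 1, 2 in the diagram.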