
Formal Languages Part 1 Including Regular Expressions: Basic Concepts For Symbols, Strings, and Languages

The document discusses basic concepts of formal languages including: 1) Alphabets, strings, and languages where alphabets are sets of symbols, strings are sequences of symbols, and languages are sets of strings. 2) Properties of strings such as length, empty strings, concatenation, and substrings. 3) Operations on languages including concatenation, exponents, union, closure, and regular expressions which are used to describe simple languages.

Uploaded by

Zubair Rahim
Copyright
© © All Rights Reserved

TDDD55 Compilers and interpreters
TDDB44 Compiler Construction

Formal Languages Part 1
Including Regular Expressions

Peter Fritzson
IDA, Linköpings universitet, 2011.

Basic Concepts for Symbols, Strings, and Languages
- Alphabet
  A finite set of symbols.
  Example:
    ∑b = { 0,1 } binary alphabet
    ∑s = { A,B,C,...,Z,Å,Ä,Ö } Swedish characters
    ∑r = { WHILE,IF,BEGIN,... } reserved words
- String
  A finite sequence of symbols from an alphabet.
  Example:
    10011 from ∑b
    KALLE from ∑s
    WHILE DO BEGIN from ∑r
TDDD55/B44, P Fritzson, IDA, LIU, 2011. 2.2

Properties of Strings in Formal Languages
String Length, Empty String
- Length of a string
  - Number of symbols in the string.
  - Example:
    - x arbitrary string, |x| length of the string x
    - |10011| = 5 according to ∑b
    - |WHILE| = 5 according to ∑s
    - |WHILE| = 1 according to ∑r
- Empty string
  - The empty string is denoted ϵ, |ϵ| = 0

Properties of Strings in Formal Languages
Concatenation, Exponentiation
- Concatenation
  - Two strings x and y are joined together: x•y = xy
  - Example: x = AB, y = CDE produce x•y = ABCDE
  - |xy| = |x| + |y|
  - xy ≠ yx (not commutative)
  - ϵx = xϵ = x
- String exponentiation
  - x^0 = ϵ
  - x^1 = x
  - x^2 = xx
  - x^n = x•x^(n-1), n >= 1
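The string properties above can be tried out with a small Python sketch. As an assumption made only for this illustration, a string is modelled as a tuple of symbols, so that a reserved word such as WHILE can count as a single symbol over ∑r. The names `EPSILON`, `concat` and `power` are hypothetical helpers, not part of the course material.

```python
# A string is modelled as a tuple of symbols (an assumption of this sketch),
# so a reserved word can be one single symbol.
EPSILON = ()  # the empty string ϵ


def concat(x, y):
    """Concatenation x•y: the symbols of x followed by the symbols of y."""
    return x + y


def power(x, n):
    """String exponentiation: x^0 = ϵ and x^n = x•x^(n-1) for n >= 1."""
    if n == 0:
        return EPSILON
    return concat(x, power(x, n - 1))


# The length of WHILE depends on the alphabet it is read over.
while_over_s = tuple("WHILE")  # five symbols from ∑s
while_over_r = ("WHILE",)      # one symbol from ∑r

x = tuple("AB")
y = tuple("CDE")
xy = concat(x, y)              # the symbols A, B, C, D, E
```

With this representation `len` plays the role of |x|, and comparing `concat(x, y)` with `concat(y, x)` shows that concatenation is not commutative.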

Substrings: Prefix, Suffix
- Example: x = abc
- Prefix: Substring at the beginning.
  - Prefix of x: abc (improper as the prefix equals x), ab, a, ϵ
- Suffix: Substring at the end.
  - Suffix of x: abc (improper as the suffix equals x), bc, c, ϵ

Languages
- A Language = A finite or infinite set of strings which can be constructed from a given alphabet.
- Alternatively: a subset of all the strings which can be constructed from an alphabet.
- ∅ = the empty language. NB! {ϵ} ≠ ∅.
- Example: ∑ = {0,1}
  - L1 = {00,01,10,11} all strings of length 2
  - L2 = {1,01,11,001,...,111, ...} all strings which finish on 1
  - L3 = ∅ all strings of length 1 which finish on 01
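The prefix, suffix and language examples can be checked with a short Python sketch. It assumes ordinary Python strings over single-character symbols, with `""` standing for ϵ and a Python `set` standing for a language. The function names are hypothetical.

```python
from itertools import product


def prefixes(x):
    """All prefixes of x, longest first; x itself is the improper prefix."""
    return [x[:i] for i in range(len(x), -1, -1)]


def suffixes(x):
    """All suffixes of x, longest first; x itself is the improper suffix."""
    return [x[i:] for i in range(len(x) + 1)]


SIGMA = ("0", "1")

# L1: all strings of length 2 over {0,1}.
L1 = {"".join(p) for p in product(SIGMA, repeat=2)}

# L3: strings of length 1 that end in 01. No such string exists.
L3 = {s for s in SIGMA if s.endswith("01")}

# The empty language and the language containing only ϵ are different sets.
EMPTY_LANGUAGE = set()
ONLY_EPSILON = {""}
```

Here `L3` comes out as the empty set, which is not the same as `ONLY_EPSILON`: the first contains no string at all, the second contains exactly one string of length 0.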

Closure
- ∑* denotes the set of all strings which can be constructed from the alphabet
- Closure types:
  - * = closure, Kleene closure
  - + = positive closure
- Example: ∑ = {0,1}
  - ∑* = {ϵ, 0,1,00,01,...,111,101,...}
  - ∑+ = ∑* – {ϵ} = {0,1,00,01,...}

Operations on Languages
Concatenation
- L, M are languages.
- Concatenation operation • (or nothing) between languages
  - L•M = LM = {xy | x ∈ L and y ∈ M}
  - L{ϵ} = {ϵ}L = L
  - L∅ = ∅L = ∅
- Example:
  - L = {ab,cd}, M = {uv,yz}
  - gives us: LM = {abuv,abyz,cduv,cdyz}


Exponents and Union of Languages
- Exponents of languages
  - L^0 = {ϵ}
  - L^1 = L
  - L^2 = L•L
  - L^n = L•L^(n-1), n >= 1
- Union of languages
  - L, M are languages.
  - L ∪ M = {x | x ∈ L or x ∈ M}
  - Example: L = {ab,cd}, M = {uv,yz}
  - gives us: L ∪ M = {ab,cd,uv,yz}

Closure of Languages
- Closure
  - L* = L^0 ∪ L^1 ∪ ... ∪ L^∞
- Positive closure
  - L+ = L^1 ∪ L^2 ∪ ... ∪ L^∞ = LL*
  - L+ = L* – {ϵ}, if ϵ not in L
  - L* = {ϵ} ∪ L+
- Example: A = {a,b}
  - A* = {ϵ,a,b,aa,ab,ba,bb,...} = All possible sequences of a and b.
- A language over A is always a subset of A*.
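The exponent and closure definitions translate almost directly into code. The Python sketch below is a finite approximation: it computes L^0 ∪ L^1 ∪ ... ∪ L^k for a chosen bound k, because the full closure L* is infinite for any L that contains a non-empty string. The function names are hypothetical.

```python
def concat_languages(L, M):
    """L•M = {xy | x ∈ L and y ∈ M}."""
    return {x + y for x in L for y in M}


def language_power(L, n):
    """L^0 = {ϵ} and L^n = L•L^(n-1) for n >= 1."""
    result = {""}
    for _ in range(n):
        result = concat_languages(L, result)
    return result


def closure_up_to(L, max_power):
    """L^0 ∪ L^1 ∪ ... ∪ L^max_power, a finite approximation of L*."""
    result = set()
    for n in range(max_power + 1):
        result |= language_power(L, n)
    return result


A = {"a", "b"}
A_star_2 = closure_up_to(A, 2)   # ϵ, a, b, aa, ab, ba, bb
A_plus_2 = A_star_2 - {""}       # valid as positive closure since ϵ is not in A

# Union of languages is plain set union.
L_union_M = {"ab", "cd"} | {"uv", "yz"}
```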


Regular expressions
- Regular expressions are used to describe simple languages, e.g. basic symbols, tokens.
  - Example: identifier = letter • (letter | digit)*
- Regular expressions over an alphabet ∑ denote a language (regular set).

Small Language Exercise

Rules for constructing regular expressions
- ∑ is an alphabet, the regular expression r describes the language Lr, the regular expression s corresponds to the language Ls, etc.
- Each symbol a in the alphabet ∑ is a regular expression which denotes {a}.
- * = repetition, zero or more times.
- + = repetition, one or more times.
- . concatenation can be left out

Regular expression r        Language Lr
ϵ                           {ϵ}
a, where a ∈ ∑              {a}
union: (s) | (t)            Ls ∪ Lt
concatenation: (s).(t)      Ls.Lt
repetition: (s)*            Ls*
repetition: (s)+            Ls+

Priorities
Highest   * +
          .
Lowest    |

Regular Expression Language Examples
- Examples: ∑ = {a,b}
  - 1. r=a        Lr={a}
  - 2. r=a*       Lr={ϵ,a,aa,aaa, ...} = {a}*
  - 3. r=a|b      Lr={a,b}={a} ∪ {b}
  - 4. r=(a|b)*   Lr={a,b}*={ϵ,a,b,aa,ab,ba,bb,aaa,aab,...}
  - 5. r=(a*b*)*  Lr={a,b}*={ϵ,a,b,aa,ab,ba,bb,aaa,aab,...}
  - 6. r=a|ba*    Lr={a,b,ba,baa,baaa,...}={a or ba^i | i >= 0}
- NB! {a^n b^n | n >= 0} cannot be described with regular expressions.
  - r=a*b* gives us Lr={a^i b^j | i,j >= 0}, which does not work.
  - r=(ab)* gives us Lr={(ab)^i | i >= 0}={ϵ,ab,abab, ... }, which does not work.
  - Regular expressions cannot "count" (have no memory).
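The regular expression examples can be explored with Python's `re` module, whose syntax is richer than the notation on the slides. This sketch uses only `|`, `*`, concatenation and parentheses, and `fullmatch` so that a pattern must describe the entire string. The helper `language_of`, the length bounds, and reading "letter" and "digit" as ASCII letters and digits are assumptions of this sketch.

```python
import re
from itertools import product


def language_of(pattern, sigma, max_len):
    """Strings over sigma of length <= max_len matched in full by pattern."""
    regex = re.compile(pattern)
    words = (
        "".join(p)
        for n in range(max_len + 1)
        for p in product(sigma, repeat=n)
    )
    return {w for w in words if regex.fullmatch(w)}


SIGMA = "ab"

# Examples 4 and 5 describe the same language {a,b}* (checked up to length 4).
same = language_of(r"(a|b)*", SIGMA, 4) == language_of(r"(a*b*)*", SIGMA, 4)

# Example 6: | has the lowest priority, so this reads a | (b(a*)).
example6 = language_of(r"a|ba*", SIGMA, 3)

# a*b* also accepts strings with unequal counts, so it is not {a^n b^n}.
too_large = language_of(r"a*b*", SIGMA, 2)

# identifier = letter • (letter | digit)*
# Assumption: letter = ASCII letter, digit = ASCII digit.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")
```

The bounded check for examples 4 and 5 is evidence, not a proof, that the two expressions denote the same language. The result for `a*b*` shows concretely why it is too large: it contains `a`, which has one a and no b.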

Finite state Automata and Diagrams
- (Finite automaton)
- Assume:
  - regular expression RU = ba+b+ = baa ... abb ... b
  - L(RU) = { ba^n b^m | n, m >= 1 }
- Recognizer
  - A program which takes a string x and answers yes/no depending on whether x is included in the language.
- The first step in constructing a recognizer for the language L(RU) is to draw a state diagram (transition diagram).

State Transition Diagram
- state diagram (DFA) for ba^n b^m

[Figure: state diagram with start state 0, states 1 and 2, accepting state 3, and error state 9. Edges: 0 -b-> 1, 1 -a-> 2, 2 -a-> 2, 2 -b-> 3, 3 -b-> 3; 0 on a, 1 on b, and 3 on a lead to the error state 9, which loops on a, b.]


Interpret a State Transition Diagram
- Start in the starting node 0.
- Repeat until there is no more input:
  - Read input.
  - Follow a suitable edge.
- When there is no more input:
  - Check whether we are in a final state. In this case accept the string.
- There is an error in the input if there is no suitable edge to follow.
  - Add one or several error nodes.

Input and State Transitions
- Example of input: baab
- Then accept when there is no more input and state 3 is an accepting state.

Step   Current State   Input
1      0               baab
2      1               aab
3      2               ab
4      2               b
5      3               ϵ

[Figure: state diagram for ba^n b^m with start state 0, states 1 and 2, accepting state 3, and error state 9.]
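The interpretation steps above can be sketched as a small table-driven recognizer in Python. The transitions are those of the DFA for ba^n b^m from the slides, with 9 as the error state and 3 as the accepting state. The names `DELTA`, `trace` and `accepts` are hypothetical, and the sketch assumes the input contains only the symbols a and b.

```python
# Transitions of the DFA for b a^n b^m (n, m >= 1).
# State 9 is the error state, state 3 the only accepting state.
DELTA = {
    (0, "a"): 9, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 9,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 9, (3, "b"): 3,
    (9, "a"): 9, (9, "b"): 9,
}
START = 0
ACCEPTING = {3}


def trace(x):
    """Return the (state, remaining input) pairs visited while reading x."""
    state = START
    steps = [(state, x)]
    for i, symbol in enumerate(x):
        state = DELTA[(state, symbol)]   # follow the suitable edge
        steps.append((state, x[i + 1:]))
    return steps


def accepts(x):
    """Accept if the state reached when the input is used up is accepting."""
    final_state, _ = trace(x)[-1]
    return final_state in ACCEPTING
```

For the input baab, `trace` visits the states 0, 1, 2, 2, 3, the same sequence as in the step table, and the string is accepted. A string such as bb ends in the error state and is rejected.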

Representation of State Diagrams by Transition Tables
- The previous graph is a DFA (Deterministic Finite Automaton).
- It is deterministic because at each step there is exactly one state to go to and there is no transition marked "ϵ".
- A regular expression denotes a regular set and corresponds to an NFA (Nondeterministic Finite Automaton).

Transition Table (Suitable for computer representation).
State   Accept   Found    Next state a   Next state b
0       no       ϵ        9              1
1       no       b        2              9
2       no       ba+      2              3
3       yes      ba+b+    9              3
9       no       -        9              9

NFA and Transition Tables
Example: NFA for (b|a)* ab

[Figure: state diagram for (b|a)*ab with start state 0, which loops on a and b, then 0 -a-> 1 and 1 -b-> 2, where 2 is the accepting state.]

Transition table for (b|a)*ab
State   a       b      Accept
0       {0,1}   {0}    no
1       ∅       {2}    no
2       ∅       ∅      yes

It requires more calculations to simulate an NFA with a computer program, e.g. for input ab, compared to a DFA.
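One way to see the extra work of simulating an NFA is to track the set of all states it could be in after each input symbol. The Python sketch below does this for the NFA for (b|a)*ab, using the transition table from the slide. The names are hypothetical, and a missing table entry is taken to mean that there is no transition.

```python
# NFA for (b|a)*ab. A missing entry means "no transition".
NFA_DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
NFA_START = 0
NFA_ACCEPTING = {2}


def nfa_accepts(x):
    """Simulate the NFA by tracking every state it could currently be in."""
    current = {NFA_START}
    for symbol in x:
        current = {
            target
            for state in current
            for target in NFA_DELTA.get((state, symbol), set())
        }
    return bool(current & NFA_ACCEPTING)
```

For the input ab the set of possible states goes from {0} to {0,1} to {0,2}. A DFA would update a single state per symbol, whereas here every state in the set has to be processed at each step.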


Transforming NFA to DFA
- Theorem
  Any NFA can be transformed to a corresponding DFA.
- When generating a recognizer automatically, the following is done:
  - regular expression → NFA.
  - NFA → DFA.
  - DFA → minimal DFA.
  - DFA → corresponding program code or table.

[Figure: DFA for (b|a)*ab with start state 0 and states 1 and 2; 0 -a-> 1, 1 -b-> 2, plus further edges labeled a and b.]

Small Regular Expression and Transition Diagram/Table Exercise
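The NFA → DFA step can be illustrated with the subset construction, a standard technique in which each DFA state is a set of NFA states. The slides state the theorem but do not show an algorithm, so the Python sketch below is only one possible illustration. It assumes an NFA without ϵ-transitions and reuses the NFA for (b|a)*ab. All names are hypothetical.

```python
from collections import deque

# NFA for (b|a)*ab. A missing entry means "no transition".
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
NFA_START = 0
NFA_ACCEPTING = {2}
SIGMA = "ab"


def subset_construction():
    """Build a DFA whose states are frozensets of NFA states."""
    start = frozenset({NFA_START})
    dfa_delta = {}
    accepting = set()
    seen = {start}
    queue = deque([start])
    while queue:
        states = queue.popleft()
        # A DFA state accepts if it contains an accepting NFA state.
        if states & NFA_ACCEPTING:
            accepting.add(states)
        for symbol in SIGMA:
            target = frozenset(
                t for s in states for t in NFA_DELTA.get((s, symbol), set())
            )
            dfa_delta[(states, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return start, dfa_delta, accepting


def dfa_accepts(x):
    """Run the constructed DFA on x (assumed to be over the alphabet {a,b})."""
    state, dfa_delta, accepting = subset_construction()
    for symbol in x:
        state = dfa_delta[(state, symbol)]
    return state in accepting
```

For this NFA the construction reaches three DFA states, {0}, {0,1} and {0,2}, of which {0,2} is accepting. That is consistent with a three-state DFA for (b|a)*ab.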
