Module - 2 (Compiler)
Principles of Compiler Design
MODULE – 1
Dr. WI. Sureshkumar
Associate Professor
School of Computer Science and Engineering (SCOPE)
VIT Vellore
SJT413A34
Regular Expression
Let Σ be a given alphabet. Then
1) ∅, ε, and any a ∈ Σ are all regular expressions. These are called
primitive regular expressions.
2) If r1 and r2 are regular expressions, then
- r1 / r2 is a regular expression.
- r1 . r2 is a regular expression.
- r1* is a regular expression.
- (r1) is a regular expression.
3) A string is a regular expression if and only if it can be derived from the
primitive regular expressions by a finite number of applications of the rules in
(2).
Regular Expression
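Check whether the given string is a regular expression:
Σ = {a, b, c}, the string (a / b . c)* . (c / ∅)
r1 = a, r2 = b, r3 = c
r4 = r2 . r3 = b . c
r5 = r1 / r4 = a / b . c
r6 = (r5) = (a / b . c)
r7 = r6* = (a / b . c)*
r8 = ∅
r9 = r3 / r8 = c / ∅
r10 = (r9) = (c / ∅)
r11 = r7 . r10 = (a / b . c)* . (c / ∅)
Every step applies a rule in (2) to expressions already derived, so the string is a regular expression.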
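The bottom-up check above can also be done mechanically. The following is a minimal sketch of a recursive-descent recognizer, not part of the original slides. It assumes that concatenation is always written with an explicit ".", that "*" binds tighter than "." which binds tighter than "/", and that "∅" and "ε" are written literally. The function name `is_regex` is hypothetical.

```python
def is_regex(s, alphabet):
    """Return True if s is a well-formed regular expression over alphabet.

    Assumed grammar (an illustration, not taken from the slides):
        alt  := cat ('/' cat)*
        cat  := star ('.' star)*
        star := atom '*'*
        atom := symbol | '∅' | 'ε' | '(' alt ')'
    """
    s = s.replace(" ", "")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def atom():
        nonlocal pos
        ch = peek()
        if ch is not None and (ch in alphabet or ch in "∅ε"):
            pos += 1                      # primitive regular expression
            return True
        if ch == "(":
            pos += 1
            if not alt() or peek() != ")":
                return False
            pos += 1                      # rule: (r1) is a regular expression
            return True
        return False

    def star():
        nonlocal pos
        if not atom():
            return False
        while peek() == "*":              # rule: r1* is a regular expression
            pos += 1
        return True

    def cat():
        nonlocal pos
        if not star():
            return False
        while peek() == ".":              # rule: r1 . r2 is a regular expression
            pos += 1
            if not star():
                return False
        return True

    def alt():
        nonlocal pos
        if not cat():
            return False
        while peek() == "/":              # rule: r1 / r2 is a regular expression
            pos += 1
            if not cat():
                return False
        return True

    return alt() and pos == len(s)
```

For instance, `is_regex("(a/b . c)* . (c / ∅)", {"a", "b", "c"})` accepts the string from the example, while a string such as `"(a/b./"` is rejected because the final "." has no right operand.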
Languages associated with regular
expressions
The language L(r) denoted by a regular expression r is defined by the
following rules:
1) ∅ is a regular expression denoting the empty set, L(∅) = {}
2) ε is a regular expression denoting the set that contains only the empty string, L(ε) = {ε}
3) For any a ∈ Σ, a is a regular expression denoting the set L(a) = {a}
If r1 and r2 are regular expressions, then
4) L(r1 / r2) = L(r1) ∪ L(r2)
5) L(r1 . r2) = L(r1) . L(r2)
6) L((r1)) = L(r1)
7) L(r1*) = (L(r1))*
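The seven rules translate almost line by line into a recursive function. The sketch below is an illustration only: the node classes (`Empty`, `Eps`, `Sym`, `Alt`, `Cat`, `Star`) and the function `lang` are hypothetical names, and because L(r) can be infinite the function returns only the strings of length at most `max_len`. Rule 6 needs no case of its own, since parentheses only shape the tree.

```python
from dataclasses import dataclass

# Hypothetical syntax-tree node types for this sketch.
@dataclass(frozen=True)
class Empty:            # the regular expression ∅
    pass

@dataclass(frozen=True)
class Eps:              # the regular expression ε
    pass

@dataclass(frozen=True)
class Sym:              # a single symbol a ∈ Σ
    ch: str

@dataclass(frozen=True)
class Alt:              # r1 / r2
    left: object
    right: object

@dataclass(frozen=True)
class Cat:              # r1 . r2
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # r1*
    inner: object

def lang(r, max_len):
    """Return the strings of L(r) whose length is at most max_len."""
    if isinstance(r, Empty):                       # rule 1
        return set()
    if isinstance(r, Eps):                         # rule 2
        return {""}
    if isinstance(r, Sym):                         # rule 3
        return {r.ch} if max_len >= 1 else set()
    if isinstance(r, Alt):                         # rule 4: union
        return lang(r.left, max_len) | lang(r.right, max_len)
    if isinstance(r, Cat):                         # rule 5: concatenation
        return {x + y
                for x in lang(r.left, max_len)
                for y in lang(r.right, max_len)
                if len(x + y) <= max_len}
    if isinstance(r, Star):                        # rule 7: star closure
        base = lang(r.inner, max_len)
        result, frontier = {""}, {""}
        while frontier:
            # Extend the newest strings by one more factor from base.
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression node: {r!r}")

# a* . (a / b), truncated to strings of length at most 3
example = Cat(Star(Sym("a")), Alt(Sym("a"), Sym("b")))
short_strings = sorted(lang(example, 3), key=lambda s: (len(s), s))
# short_strings == ['a', 'b', 'aa', 'ab', 'aaa', 'aab']
```

The length bound is a design choice that keeps the result finite; it truncates the language rather than changing the rules.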
Languages associated with regular
expressions
Exhibit the language L(a* . (a / b)) in set notation
L(a* . (a / b)) = L(a*) . L((a / b))
= (L(a))* . L(a / b)
= (L(a))* . (L(a) ∪ L(b))
= {a}* . ({a} ∪ {b})
= {ε, a, aa, aaa, . . .} . {a, b}
= {a, aa, aaa, aaaa, . . . , b, ab, aab, aaab, . . .}
Languages associated with regular
expressions
r = (a / b)* . (a / bb), find L(r)
L(r) = L((a / b)* . (a / bb))
= L((a / b)*) . L(a / bb)
= (L(a / b))* . L(a / bb)
= (L(a) ∪ L(b))* . (L(a) ∪ L(bb))
= {a, b}* . ({a} ∪ {b}.{b})
= {a, b}* . ({a} ∪ {bb})
= {a, b}* . {a, bb}
= {ε, a, b, aa, ab, ba, bb, . . .} . {a, bb}
= {a, aa, ba, aaa, aba, baa, bba, . . . , bb, abb, bbb, aabb, abbb, . . .}
Examples
Describe the following sets by regular expressions
1) L1 = the set of all strings of 0’s and 1’s ending with 00
2) L2 = the set of all strings of 0’s and 1’s and beginning with 0 and ending
with 1
3) L3 = {ε, aa, aaaa, . . . }
4) The set of all strings of 0’s and 1’s with at least two consecutive zeros
5) The set of all strings of a’s and b’s whose length is divisible by 6
6) The set of all strings of a’s and b’s whose 5th last symbol is b
7) The expression r = (aa)* (bb)*b denotes the set of strings with an even
number of a’s followed by an odd number of b’s
8) L4 = {a^n b^m | n ≥ 4, m ≤ 3}
Examples
1) (0 / 1)*00
2) 0(0 / 1)*1
3) (aa)*
4) (0 / 1)*00(0 / 1)*
5) [(a / b)^6]*
6) (a / b)*b(a / b)^4
7) L(r) = {a^(2n) b^(2m+1) | n ≥ 0, m ≥ 0}
8) aaaaa*(ε / b / bb / bbb)
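The answers above can be sanity-checked by brute force. The sketch below is an illustration under stated assumptions: each answer is transcribed into Python's `re` syntax (`|` instead of `/`, `{k}` for a k-fold power, an empty alternative for ε), and each set description is written as a small predicate. The names `strings_up_to`, `is_l4`, `CASES` and `mismatches` are hypothetical. A bounded search cannot prove the answers correct; it can only fail to find a counterexample.

```python
import re
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def leading_as(s):
    """Number of a's at the start of s."""
    return len(s) - len(s.lstrip("a"))

def is_l4(s):
    """Membership test for L4 = {a^n b^m | n >= 4, m <= 3}."""
    n = leading_as(s)
    rest = s[n:]
    return n >= 4 and set(rest) <= {"b"} and len(rest) <= 3

def even_as_odd_bs(s):
    """Even number of a's followed by an odd number of b's."""
    n = leading_as(s)
    rest = s[n:]
    return n % 2 == 0 and set(rest) <= {"b"} and len(rest) % 2 == 1

# (pattern in Python syntax, alphabet, membership predicate)
CASES = [
    (r"(0|1)*00", "01", lambda s: s.endswith("00")),
    (r"0(0|1)*1", "01", lambda s: s.startswith("0") and s.endswith("1")),
    (r"(aa)*", "a", lambda s: len(s) % 2 == 0),
    (r"(0|1)*00(0|1)*", "01", lambda s: "00" in s),
    (r"((a|b){6})*", "ab", lambda s: len(s) % 6 == 0),
    (r"(a|b)*b(a|b){4}", "ab", lambda s: len(s) >= 5 and s[-5] == "b"),
    (r"(aa)*(bb)*b", "ab", even_as_odd_bs),
    (r"aaaaa*(|b|bb|bbb)", "ab", is_l4),
]

def mismatches(max_len=7):
    """Return (pattern, string) pairs where regex and predicate disagree."""
    bad = []
    for pattern, alphabet, member in CASES:
        compiled = re.compile(pattern)
        for s in strings_up_to(alphabet, max_len):
            if bool(compiled.fullmatch(s)) != member(s):
                bad.append((pattern, s))
    return bad
```

An empty list from `mismatches()` means no disagreement was found among strings of length up to 7.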
Regular expression to ε-NFA
Theorem: Let r be a regular expression. Then there exists some NFA
with ε-moves that accepts L(r). Consequently, L(r) is a regular language.
Proof: We begin with automata that accept the languages of the primitive
regular expressions ∅, ε, and any a ∈ Σ.
i) a) NFA that accepts ∅: states q0 and qf with no transition between them
b) NFA that accepts {ε}: q0 --ε--> qf
c) NFA that accepts {a}: q0 --a--> qf
Regular expression to ε-NFA
ii) NFA that accepts L(r), drawn schematically as a block M(r)
[Figure: block diagrams labelled M(r) and M(r2)]
Regular expression to ε-NFA
iii) NFA that accepts L(r1 . r2)
[Figure: M(r1) followed by M(r2); a further diagram is labelled M(r1)]
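The construction in the proof can be sketched in code. The version below follows the common Thompson-style construction, which is an assumption: the diagrams on the original slides may wire the union and star cases slightly differently. The class `NFA` and the helper names are hypothetical, a label of `None` stands for an ε-move, and each fragment should be used only once because fragments share state numbers.

```python
import itertools

_ids = itertools.count()

def _new_state():
    return next(_ids)

class NFA:
    """ε-NFA fragment with a single start state and a single final state."""
    def __init__(self, start, final, edges):
        self.start = start
        self.final = final
        self.edges = edges      # list of (source, label, target); None = ε

def empty():                    # accepts ∅: no path from start to final
    s, f = _new_state(), _new_state()
    return NFA(s, f, [])

def epsilon():                  # accepts {ε}
    s, f = _new_state(), _new_state()
    return NFA(s, f, [(s, None, f)])

def symbol(a):                  # accepts {a}
    s, f = _new_state(), _new_state()
    return NFA(s, f, [(s, a, f)])

def union(m1, m2):              # accepts L(r1 / r2)
    s, f = _new_state(), _new_state()
    extra = [(s, None, m1.start), (s, None, m2.start),
             (m1.final, None, f), (m2.final, None, f)]
    return NFA(s, f, m1.edges + m2.edges + extra)

def concat(m1, m2):             # accepts L(r1 . r2)
    return NFA(m1.start, m2.final,
               m1.edges + m2.edges + [(m1.final, None, m2.start)])

def star(m):                    # accepts L(r1*)
    s, f = _new_state(), _new_state()
    extra = [(s, None, m.start), (m.final, None, f),
             (s, None, f), (m.final, None, m.start)]
    return NFA(s, f, m.edges + extra)

def eps_closure(nfa, states):
    """All states reachable from states using ε-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    """Simulate the ε-NFA on text by tracking the set of current states."""
    current = eps_closure(nfa, {nfa.start})
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = eps_closure(nfa, moved)
    return nfa.final in current

# ε-NFA for (a / b)* . b
m = concat(star(union(symbol("a"), symbol("b"))), symbol("b"))
```

With this `m`, `accepts(m, "ab")` and `accepts(m, "b")` hold, while `accepts(m, "ba")` and `accepts(m, "")` do not.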
Examples
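(Here + denotes alternation, written / in the earlier slides.)
1) The set of integers
(1+2+. . . +9)(0+1+. . . +9)*
2) The set of decimal numbers (the "." below is the literal decimal point)
(1+2+. . . +9)(0+1+. . . +9)* . (0+1+. . . +9)*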
3) Strings over {a, b} and length multiple of 3
[(a + b)(a + b)(a + b)]*
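For comparison, the three expressions can be transcribed into Python's `re` syntax. This is a sketch, and the constant names are hypothetical. Character classes such as `[1-9]` abbreviate the alternations 1+2+...+9, and the decimal point is escaped because a bare `.` matches any character in `re`.

```python
import re

# (1+2+...+9)(0+1+...+9)*
INTEGER = re.compile(r"[1-9][0-9]*")

# (1+2+...+9)(0+1+...+9)* . (0+1+...+9)*   with a literal decimal point
DECIMAL = re.compile(r"[1-9][0-9]*\.[0-9]*")

# [(a + b)(a + b)(a + b)]*
TRIPLES = re.compile(r"((a|b)(a|b)(a|b))*")

def matches(pattern, text):
    """True if the whole of text is in the language of pattern."""
    return pattern.fullmatch(text) is not None
```

As written on the slide, the integer expression excludes 0 and numbers with leading zeros, and the decimal expression accepts a string such as "3." because the fractional part may be empty.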
Examples
1) (0 + 01)*
2) (a + b)*b(a + b)*
3) (a+b)*abb
4) aa* + ab a*b*
5) (abab)* + (aa*+ b)*
6) ((00*)*1)*
7) (01)* + 1(01)* + (01)*0 + 1(01)*0
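1) (0 / 01)*
[Figure: step-by-step ε-NFA construction, with diagrams for 0, 1, 01, 0 / 01, and (0 / 01)*]
2) (a / b)*b(a / b)*
[Figure: ε-NFA construction, with diagrams for (a / b)* and (a / b)*b(a / b)*]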
Direct Method – (RE to DFA)
• Let r be the regular expression. Then the augmented regular
expression is denoted by r#. For example,
r = (a/b)*bb gives r# = (a/b)*bb#
• Construct a syntax tree for r#
• Traverse the tree to construct the following functions
nullable() , firstpos() , lastpos() , followpos()
• Construct DFA by using subset construction method.
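The tree traversal in the third step can be sketched as one recursive function. This is an illustration under stated assumptions: the syntax tree is encoded as nested tuples (a hypothetical encoding), the function name `analyze` is made up for this sketch, ε leaves are not handled, and the followpos updates use the standard rules for concatenation and star nodes. The example tree is the one for (a/b)*bb# from the first bullet, with positions a = 1, b = 2, b = 3, b = 4, # = 5.

```python
def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) of node; fill followpos in place.

    Node shapes assumed here: ("leaf", symbol, position),
    ("or", left, right), ("cat", left, right), ("star", child).
    """
    kind = node[0]
    if kind == "leaf":
        pos = node[2]
        followpos.setdefault(pos, set())
        return False, {pos}, {pos}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                       # firstpos(right) follows lastpos(left)
            followpos[i] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, first, last = analyze(node[1], followpos)
        for i in last:                     # firstpos(n) follows lastpos(n)
            followpos[i] |= first
        return True, first, last
    raise ValueError(f"unknown node kind: {kind!r}")

def leaf(symbol, position):
    return ("leaf", symbol, position)

# Syntax tree of (a/b)*bb#
tree = ("cat",
        ("cat",
         ("cat",
          ("star", ("or", leaf("a", 1), leaf("b", 2))),
          leaf("b", 3)),
         leaf("b", 4)),
        leaf("#", 5))

followpos = {}
nullable, firstpos, lastpos = analyze(tree, followpos)
# firstpos == {1, 2, 3}, lastpos == {5}
# followpos == {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: set()}
```

Here `firstpos` of the root is the start state of the DFA, and `followpos` drives the transitions.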
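Given (a/b)*abb, the augmented r.e. is (a/b)*abb#
[Figure: syntax tree of (a/b)*abb# with leaf positions a = 1, b = 2, a = 3, b = 4, b = 5, # = 6]
firstpos and lastpos annotated on the tree:
- leaf at position i: firstpos = {i}, lastpos = {i}
- | node of (a/b): firstpos = {1, 2}, lastpos = {1, 2}
- * node of (a/b)*: firstpos = {1, 2}, lastpos = {1, 2}
- root: firstpos = {1, 2, 3}, lastpos = {6}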
Computing followpos()
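The function followpos(i) tells us which positions can follow position i
in the syntax tree. To find followpos(i), we need 2 rules:
1) If n is a concatenation node with left child c1 and right child c2, then for every position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
2) If n is a star node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).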
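For (a/b)*abb#, followpos is
followpos(1) = {1, 2, 3}, followpos(2) = {1, 2, 3}, followpos(3) = {4},
followpos(4) = {5}, followpos(5) = {6}, followpos(6) = {}
position(a) = 1, 3    position(b) = 2, 4, 5
A = firstpos(root) = {1, 2, 3}
Dtran[A, a] = followpos(1) ∪ followpos(3)
= {1, 2, 3} ∪ {4} = {1, 2, 3, 4} = B
Dtran[A, b] = followpos(2)
= {1, 2, 3} = A
B = {1, 2, 3, 4}
Dtran[B, a] = followpos(1) ∪ followpos(3)
= {1, 2, 3} ∪ {4} = {1, 2, 3, 4} = B
Dtran[B, b] = followpos(2) ∪ followpos(4)
= {1, 2, 3} ∪ {5} = {1, 2, 3, 5} = C
C = {1, 2, 3, 5}
Dtran[C, a] = followpos(1) ∪ followpos(3)
= {1, 2, 3} ∪ {4} = {1, 2, 3, 4} = B
Dtran[C, b] = followpos(2) ∪ followpos(5)
= {1, 2, 3} ∪ {6} = {1, 2, 3, 6} = D
D = {1, 2, 3, 6}
Dtran[D, a] = followpos(1) ∪ followpos(3)
= {1, 2, 3} ∪ {4} = {1, 2, 3, 4} = B
Dtran[D, b] = followpos(2)
= {1, 2, 3} = A
Resulting DFA (start state A; D is accepting because it contains position 6, the position of #):
State   a   b
A       B   A
B       B   C
C       B   D
D       B   A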
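The Dtran computation can be automated with a worklist. The sketch below is an illustration, not the slides' own code: the function names `build_dfa` and `run` are hypothetical, and the followpos values and the symbol at each position are entered by hand for (a/b)*abb#, using the values that the Dtran lines of this example rely on.

```python
def build_dfa(start, followpos, symbol_at, end_pos):
    """Build Dtran from followpos; DFA states are frozensets of positions."""
    start = frozenset(start)
    # The end marker is not an input symbol.
    alphabet = sorted(set(symbol_at.values()) - {symbol_at[end_pos]})
    dtran, pending, seen = {}, [start], {start}
    while pending:
        state = pending.pop()
        for sym in alphabet:
            # Union of followpos(p) over positions p in state labelled sym.
            target = set()
            for p in state:
                if symbol_at[p] == sym:
                    target |= followpos[p]
            target = frozenset(target)
            dtran[state, sym] = target
            if target not in seen:
                seen.add(target)
                pending.append(target)
    # A state is accepting if it contains the position of #.
    accepting = {s for s in seen if end_pos in s}
    return dtran, accepting

def run(dtran, accepting, start, text):
    """Run the DFA on text and report acceptance."""
    state = frozenset(start)
    for ch in text:
        state = dtran[state, ch]
    return state in accepting

# Data for (a/b)*abb#, entered by hand from the worked example.
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
START = {1, 2, 3}

dtran, accepting = build_dfa(START, FOLLOWPOS, SYMBOL_AT, end_pos=6)
```

The four states produced are the sets called A, B, C and D in the example, and `run(dtran, accepting, START, "aabb")` accepts because the string ends in abb.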
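Given (a/b)*a(a/b), the augmented r.e. is (a/b)*a(a/b)#
[Figure: syntax tree of (a/b)*a(a/b)# with leaf positions a = 1, b = 2, a = 3, a = 4, b = 5, # = 6]
firstpos and lastpos annotated on the tree:
- leaf at position i: firstpos = {i}, lastpos = {i}
- | node of the first (a/b): firstpos = {1, 2}, lastpos = {1, 2}
- * node of (a/b)*: firstpos = {1, 2}, lastpos = {1, 2}
- concatenation (a/b)*a: firstpos = {1, 2, 3}, lastpos = {3}
- | node of the second (a/b): firstpos = {4, 5}, lastpos = {4, 5}
- concatenation (a/b)*a(a/b): firstpos = {1, 2, 3}, lastpos = {4, 5}
- root: firstpos = {1, 2, 3}, lastpos = {6}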
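followpos(1) = {1, 2, 3}, followpos(2) = {1, 2, 3}, followpos(3) = {4, 5},
followpos(4) = {6}, followpos(5) = {6}, followpos(6) = {}
position(a) = 1, 3, 4    position(b) = 2, 5
A = {1, 2, 3}
Dtran[A, a] = followpos(1) ∪ followpos(3)
= {1, 2, 3} ∪ {4, 5} = {1, 2, 3, 4, 5} = B
Dtran[A, b] = followpos(2)
= {1, 2, 3} = A
B = {1, 2, 3, 4, 5}
Dtran[B, a] = followpos(1) ∪ followpos(3) ∪ followpos(4)
= {1, 2, 3} ∪ {4, 5} ∪ {6} = {1, 2, 3, 4, 5, 6} = C
Dtran[B, b] = followpos(2) ∪ followpos(5)
= {1, 2, 3} ∪ {6} = {1, 2, 3, 6} = D
C = {1, 2, 3, 4, 5, 6}
Dtran[C, a] = followpos(1) ∪ followpos(3) ∪ followpos(4)
= {1, 2, 3} ∪ {4, 5} ∪ {6} = {1, 2, 3, 4, 5, 6} = C
Dtran[C, b] = followpos(2) ∪ followpos(5)
= {1, 2, 3} ∪ {6} = {1, 2, 3, 6} = D
D = {1, 2, 3, 6}
Dtran[D, a] = followpos(1) ∪ followpos(3)
= {1, 2, 3} ∪ {4, 5} = {1, 2, 3, 4, 5} = B
Dtran[D, b] = followpos(2)
= {1, 2, 3} = A
Resulting DFA (start state A; C and D are accepting because they contain position 6, the position of #):
State   a   b
A       B   A
B       C   D
C       C   D
D       B   A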
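A quick way to gain confidence in a constructed DFA is to run it against a direct description of the language. The sketch below hard-codes the transitions of the DFA for (a/b)*a(a/b). The rows for B, C and D come from the Dtran lines of this example; the row for A and the choice of C and D as accepting states are derived here by the same followpos rules, which is an assumption worth checking by hand. The reference predicate says that the second-last symbol is a.

```python
from itertools import product

# Transition table for the DFA of (a/b)*a(a/b); A is the start state.
DTRAN = {
    ("A", "a"): "B", ("A", "b"): "A",
    ("B", "a"): "C", ("B", "b"): "D",
    ("C", "a"): "C", ("C", "b"): "D",
    ("D", "a"): "B", ("D", "b"): "A",
}
ACCEPTING = {"C", "D"}      # the states that contain position 6 (#)

def dfa_accepts(text):
    """Run the table-driven DFA on text."""
    state = "A"
    for ch in text:
        state = DTRAN[state, ch]
    return state in ACCEPTING

def second_last_is_a(text):
    """Direct description of L((a/b)*a(a/b))."""
    return len(text) >= 2 and text[-2] == "a"

def disagreements(max_len=8):
    """Strings over {a, b} on which the DFA and the description differ."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            if dfa_accepts(s) != second_last_is_a(s):
                bad.append(s)
    return bad
```

An empty result from `disagreements()` shows agreement on all strings up to the chosen length, which is evidence for the table but not a proof.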
Construct DFA for the following regular expressions:
1) (a/b)*a(a/b)(a/b)
2) (a/b)*abb(a/b)*
3) (a/b)*a(a/b)(a/b)(a/b)
4) (a/b)*a(a/b)*