
Unit 1 Part 2 - Compiler

The document discusses regular expressions, their properties, and methods for converting them to NFA and DFA, including Thompson's construction and direct methods. It outlines the closure properties of regular languages, such as union, intersection, concatenation, and Kleene closure, and provides examples of regular expressions for both finite and infinite languages. Additionally, it details the steps involved in converting regular expressions to DFA using syntax trees and the functions nullable, firstpos, lastpos, and followpos.

Uploaded by

vy6837

Unit I – Part II

1. Regular Expressions
2. Conversion of regular expression to NFA – Thompson’s construction
3. Converting Regular expression directly to DFA
2. Regular Expressions
Regular expression
• A regular expression is a notation for representing a regular language.
• Example: L = {ε, a, aa, aaa,…} = a*
• R is a regular expression over alphabet ∑ if it is built by the following rules:
1. ε is a regular expression denoting the set {ε}.
2. ∅ is a regular expression denoting the empty set {}.
• (∅ (phi) means no string is accepted, i.e. there is no final state. ε (epsilon) means the string of length 0 is accepted, i.e. there is a final state.)
3. For each symbol a belonging to ∑, ‘a’ is a regular expression denoting the set {a}.
4. The union of two regular expressions is also a regular expression.
5. The concatenation of two regular expressions is also a regular expression.
6. The Kleene closure of a regular expression is also a regular expression.
7. If R is a regular expression, then (R) is also a regular expression.

Nothing else is a regular expression; rules 1 to 7 are applied recursively.

Ex: (a + b)*
Regular expression
1. Regular expression for finite language
2. Regular expression for infinite language

Regular expression for finite language:

∑ = {a, b}
• No string {} :- ∅
• Length 0 {ε} :- ε
• Length 1 {a, b} :- (a + b)
• Length 2 {aa, ab, ba, bb} :- (aa + ab + ba + bb) = a(a+b) + b(a+b) = (a+b)(a+b)
• Length 3 {aaa, bbb, aab, aba,…} :- (a+b)(a+b)(a+b)
• Length at most 1 (length 0 or 1) {ε, a, b} :- (ε + a + b)
• Length at most 2 :- (ε + a + b)(ε + a + b)
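The finite-language expressions above can be checked by brute force. The following is a minimal sketch, assuming Python's `re` module, where the union operator `+` of these notes is written `|` and `(ε + a + b)` is written `(a|b)?`. The helper name `language` is hypothetical.

```python
import re
from itertools import product

SIGMA = "ab"  # the alphabet {a, b} used in the notes

def language(pattern, max_len=4):
    """Return every string over SIGMA of length <= max_len that fully matches pattern."""
    matched = set()
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            s = "".join(letters)
            if re.fullmatch(pattern, s):
                matched.add(s)
    return matched

# (a+b)(a+b): all strings of length exactly 2
length_two = language(r"(a|b)(a|b)")

# (ε+a+b)(ε+a+b): all strings of length at most 2
at_most_two = language(r"(a|b)?(a|b)?")
```

Here `length_two` contains exactly the four strings of length 2, and `at_most_two` contains the empty string, the two strings of length 1 and the four strings of length 2.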
Regular expression for infinite language
∑ = {a, b}

1. All strings having exactly one ‘b’ :- a* b a*
2. All strings having at least one ‘b’ :- (a + b)* b (a + b)*
3. All strings having ‘bbbb’ as substring :- (a + b)* bbbb (a + b)*
4. All strings ending with ‘ab’ :- (a + b)* ab
5. All strings starting with ‘ba’ :- ba (a + b)*
6. All strings beginning and ending with ‘a’ :- a + a (a + b)* a (the single string ‘a’ also begins and ends with ‘a’)
7. All strings containing ‘a’ :- (a + b)* a (a + b)*
8. All strings starting and ending with different symbols :- a (a + b)* b + b (a + b)* a
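Some of these expressions can be cross-checked against a plain-language predicate. The sketch below is an illustration only: it assumes Python's `re` syntax (`|` for union) and compares each pattern with a hand-written predicate on all strings up to a small length. The names `CASES` and `mismatches` are hypothetical.

```python
import re
from itertools import product

def strings(max_len):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

# description -> (regular expression, predicate stating the same property)
CASES = {
    "exactly one b": (r"a*ba*", lambda s: s.count("b") == 1),
    "at least one b": (r"(a|b)*b(a|b)*", lambda s: "b" in s),
    "contains bbbb": (r"(a|b)*bbbb(a|b)*", lambda s: "bbbb" in s),
    "ends with ab": (r"(a|b)*ab", lambda s: s.endswith("ab")),
    "starts with ba": (r"ba(a|b)*", lambda s: s.startswith("ba")),
    "different first and last symbol": (
        r"a(a|b)*b|b(a|b)*a",
        lambda s: len(s) >= 2 and s[0] != s[-1],
    ),
}

def mismatches(pattern, predicate, max_len=6):
    """Return the strings on which the pattern and the predicate disagree."""
    return [
        s for s in strings(max_len)
        if bool(re.fullmatch(pattern, s)) != predicate(s)
    ]
```

An empty list from `mismatches` means the expression and the description agree on every string up to the tested length. This is evidence of agreement, not a proof.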
Closure Properties of Regular Languages
• Union : If L1 and L2 are two regular languages, their union L1 ∪ L2 will also be regular. For example,
L1 = {a^n | n ≥ 0} and L2 = {b^n | n ≥ 0}
L3 = L1 ∪ L2 = {a^n ∪ b^n | n ≥ 0} is also regular.
• Intersection : If L1 and L2 are two regular languages, their intersection L1 ∩ L2 will also be regular. For example,
L1 = {a^m b^n | n ≥ 0 and m ≥ 0} and L2 = {a^m b^n ∪ b^n a^m | n ≥ 0 and m ≥ 0}
L3 = L1 ∩ L2 = {a^m b^n | n ≥ 0 and m ≥ 0} is also regular.
• Concatenation : If L1 and L2 are two regular languages, their concatenation L1.L2 will also be regular. For example,
L1 = {a^n | n ≥ 0} and L2 = {b^n | n ≥ 0}
L3 = L1.L2 = {a^m . b^n | m ≥ 0 and n ≥ 0} is also regular.
• Kleene Closure : If L1 is a regular language, its Kleene closure L1* will also be regular. For example,
L1 = (a ∪ b)
L1* = (a ∪ b)*
• Complement : If L(G) is a regular language, its complement L’(G) will also be regular. The complement of a language is found by removing the strings that are in L(G) from the set of all possible strings. For example, over ∑ = {a},
L(G) = {a^n | n > 3}
L’(G) = {a^n | n ≤ 3}
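The complement example can be illustrated with a small DFA. This is a sketch under stated assumptions: the alphabet is ∑ = {a}, and the five-state DFA (states 0 to 4 counting the a's read so far, saturating at 4) is a hypothetical encoding, not taken from the slides. Complementing a complete DFA by swapping accepting and non-accepting states is a standard technique.

```python
def run(delta, start, accepting, word):
    """Run a complete DFA on word and report whether it ends in an accepting state."""
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting

# Assumed DFA for L(G) = {a^n | n > 3}: state q means "q a's read", capped at 4.
STATES = range(5)
DELTA = {(q, "a"): min(q + 1, 4) for q in STATES}
ACCEPTING = {4}

# The complement keeps the same states and transitions and flips the accepting set.
COMPLEMENT_ACCEPTING = set(STATES) - ACCEPTING
```

With these definitions, `run(DELTA, 0, ACCEPTING, "aaaa")` accepts, while `run(DELTA, 0, COMPLEMENT_ACCEPTING, "aaa")` accepts on the complement side.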
Converting Regular expression to DFA
1. b a* b = {bb, bab, baab, baaab,…}
2. (a + b) c = {ac, bc}
3. a(bc)* = {a, abc, abcbc, abcbcbc,…}
4. a⁺ = {a, aa, aaa,…}
5. a* = {ε, a, aa, aaa,…}
Converting Regular expression to DFA

• (a + b) * (abb + a+b)
3. Conversion of regular expression to NFA
A regular expression can be converted into a DFA by the following methods:
(i) Thompson’s construction followed by subset construction
• The given regular expression is converted into an NFA using Thompson’s construction.
• The resultant NFA is converted into a DFA.
(ii) Direct Method
• In the direct method, the given regular expression is converted directly into a DFA.
Thompson’s Construction for Conversion of Regular Expression to NFA

• Union: r=a+b

• Concatenation: r = r1 r2
• Closure: r = a*
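The three rules can be sketched in code. This is a minimal illustration, not the slides' own diagrams: an NFA fragment is assumed to be a triple (start, final, edges), states are integers from a counter, and `EPS = None` marks an Ɛ-transition. All function names are hypothetical.

```python
from itertools import count

EPS = None          # assumed label for an Ɛ-transition
_ids = count()      # source of fresh state numbers

def new_state():
    return next(_ids)

def symbol(a):
    """NFA for a single symbol a: start --a--> final."""
    s, f = new_state(), new_state()
    return s, f, [(s, a, f)]

def union(n1, n2):
    """r = r1 + r2: new start and final states joined to both operands by Ɛ-edges."""
    (s1, f1, e1), (s2, f2, e2) = n1, n2
    s, f = new_state(), new_state()
    return s, f, e1 + e2 + [(s, EPS, s1), (s, EPS, s2), (f1, EPS, f), (f2, EPS, f)]

def concat(n1, n2):
    """r = r1 r2: final state of r1 linked to start state of r2 by an Ɛ-edge."""
    (s1, f1, e1), (s2, f2, e2) = n1, n2
    return s1, f2, e1 + e2 + [(f1, EPS, s2)]

def star(n):
    """r = r1*: Ɛ-edges allow skipping r1 or repeating it."""
    s1, f1, e1 = n
    s, f = new_state(), new_state()
    return s, f, e1 + [(s, EPS, s1), (s, EPS, f), (f1, EPS, s1), (f1, EPS, f)]

def accepts(nfa, word):
    """Simulate the Ɛ-NFA on word by tracking the set of reachable states."""
    start, final, edges = nfa

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for p, label, r in edges:
                if p == q and label is EPS and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for ch in word:
        current = closure({r for p, label, r in edges if p in current and label == ch})
    return final in current

# (a + b)*a built bottom-up from the rules above
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("a"))
```

Some presentations of Thompson's construction merge the two states at a concatenation instead of linking them with an Ɛ-edge. The Ɛ-edge variant is used here because it is simpler to code.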

• Ɛ-closure: The Ɛ-closure of a state is the set of states reachable from that state using only Ɛ-transitions, i.e. without consuming any input symbol. It always includes the state itself.
• Ɛ-closure(q0) = {q0, q1, q2}
• Ɛ-closure(q1) = {q1, q2}
• Ɛ-closure(q2) = {q2}
Example:
• Ɛ -closure (1) = {1, 2, 3, 4, 6}
• Ɛ-closure (2) = {2, 3, 6}
• Ɛ-closure (3) = {3, 6}
• Ɛ-closure (4) = {4}
• Ɛ-closure (5) = {5, 7}
• Ɛ -closure (6) = {6}
• Ɛ-closure (7) = {7}
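Computing an Ɛ-closure is a plain graph reachability search. The sketch below assumes the Ɛ-edges q0 → q1 and q1 → q2, which is one set of edges consistent with the q0, q1, q2 closures listed earlier; the NFA diagram itself is not available here. The function name `epsilon_closure` is hypothetical.

```python
def epsilon_closure(state, eps_edges):
    """Return the set of states reachable from state using only Ɛ-transitions."""
    closure = {state}          # a state is always in its own closure
    stack = [state]
    while stack:
        q = stack.pop()
        for r in eps_edges.get(q, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

# Assumed Ɛ-edges: q0 --Ɛ--> q1 --Ɛ--> q2
EPS_EDGES = {"q0": ["q1"], "q1": ["q2"]}
```

The `closure` set doubles as the visited set, so the search terminates even when the Ɛ-edges form a cycle.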

Sub-set Construction
• The given regular expression is converted into an NFA.
• Then, the NFA is converted into a DFA.
Steps:
1. Convert the regular expression into an Ɛ-NFA using the above rules for the operators (union, concatenation and closure) and their precedence.
2. Find the Ɛ-closure of all states.
3. Start with the Ɛ-closure of the start state of the Ɛ-NFA.
4. Apply each input symbol and find the Ɛ-closure of the result:
Dtran[state, input symbol] = Ɛ-closure(move(state, input symbol))
where Dtran is the transition function of the DFA.
5. Check whether the resulting state is a new state.
6. If a new state is found, repeat steps 4 and 5 until no more new states are found.
7. Construct the transition table for the Dtran function.
8. Draw the transition diagram. The start state is Ɛ-closure(start state of NFA), and every DFA state that contains the final state of the NFA is a final state.
Ques: (a + b)*a
Conversion:
Step 2: Find Ɛ -closure of all states.
Ɛ -closure (A): {A, B, C, F, H, I}
Ɛ -closure (B): {B, C, F}
Ɛ -closure (C): {C}
Ɛ -closure (D): {D, E, F, B, C, H, I}
Ɛ -closure (E): {E, B, F, C, H, I}
Ɛ -closure (F): {F}
Ɛ -closure (G): {G, E, B, F, C, H, I}
Ɛ -closure (H): {H, I}
Ɛ -closure (I): {I}
Ɛ -closure (J): {J}
Step 3: Start with the Ɛ-closure of the start state of the NFA, i.e. Ɛ-closure(A) = {A, B, C, F, H, I}, and give a name to this state (here we have named it 1).

Step 4: Apply the input symbols and find the Ɛ-closure of the result.
Dtran[state, input symbol] = Ɛ-closure(move(state, input symbol))

Dtran[1, a] = Ɛ-closure(move(1, a)) = Ɛ-closure(D, J) = {D, E, F, B, C, H, I, J} (this will be termed state 2 in the new table).

Dtran[1, b] = Ɛ-closure(move(1, b)) = Ɛ-closure(G) = {G, E, B, F, C, H, I} (this will be termed state 3 in the new table, as this state has not been seen before).

Dtran[2, a] = Ɛ-closure(move(2, a)) = Ɛ-closure(D, J) = {D, E, F, B, C, H, I, J} (same state as 2).

Dtran[2, b] = Ɛ-closure(move(2, b)) = Ɛ-closure(G) = {G, E, B, F, C, H, I} (same state as 3).

Dtran[3, a] = Ɛ-closure(move(3, a)) = Ɛ-closure(D, J) = {D, E, F, B, C, H, I, J} (same state as 2).

Dtran[3, b] = Ɛ-closure(move(3, b)) = Ɛ-closure(G) = {G, E, B, F, C, H, I} (same state as 3).

No new states appear, so the construction stops with three DFA states.

NFA States                    DFA State   a   b
{A, B, C, F, H, I}            1           2   3    (initial state, as it contains A)
{D, E, F, B, C, H, I, J}      2           2   3    (final state, as it contains J)
{G, E, B, F, C, H, I}         3           2   3
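The worked example can be replayed in code. The NFA diagram is not reproduced in these notes, so the edges below are an assumption: they are reconstructed so that they yield exactly the Ɛ-closures and move() results listed in Steps 2 to 4. The names `EPS_EDGES`, `SYM_EDGES` and `subset_construction` are hypothetical.

```python
# Assumed Ɛ-NFA for (a + b)*a, reconstructed from the listed closures.
EPS_EDGES = {
    "A": ["B", "H"], "B": ["C", "F"], "D": ["E"],
    "G": ["E"], "E": ["B", "H"], "H": ["I"],
}
SYM_EDGES = {("C", "a"): ["D"], ("F", "b"): ["G"], ("I", "a"): ["J"]}
START, FINAL = "A", "J"

def eps_closure(states):
    """Ɛ-closure of a set of NFA states, returned as a hashable frozenset."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in EPS_EDGES.get(q, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def move(states, sym):
    """NFA states reachable from any state in states on input sym."""
    return {r for q in states for r in SYM_EDGES.get((q, sym), ())}

def subset_construction(alphabet="ab"):
    """Return (names, dtran, finals) for the DFA, numbering states from 1."""
    start = eps_closure({START})
    names = {start: 1}              # NFA state set -> DFA state number
    dtran = {}
    worklist = [start]
    while worklist:
        current = worklist.pop(0)
        for sym in alphabet:
            target = eps_closure(move(current, sym))
            if not target:
                continue            # no transition on this symbol
            if target not in names:  # step 5: is this a new state?
                names[target] = len(names) + 1
                worklist.append(target)
            dtran[(names[current], sym)] = names[target]
    finals = {n for subset, n in names.items() if FINAL in subset}
    return names, dtran, finals

names, dtran, finals = subset_construction()
```

With the assumed edges, `dtran` reproduces the three rows of the transition table above and `finals` is `{2}`, the only DFA state that contains J.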
4. Converting Regular expression directly to DFA
Direct Method

• The direct method is used to convert a given regular expression directly into a DFA.
• It uses the augmented regular expression r#.
• Important states of the NFA correspond to positions in the regular expression that hold symbols of the alphabet.
• The regular expression is represented as a syntax tree, where interior nodes correspond to operators representing union, concatenation and closure operations.
• Leaf nodes correspond to the input symbols.
• Construct the DFA directly from the regular expression by computing the functions nullable(n), firstpos(n), lastpos(n) and followpos(i) from the syntax tree.
• nullable(n): True if the subtree at node n generates a language that includes the empty string.
• firstpos(n): Set of positions that can match the first symbol of a string generated by the subtree at node n.
• lastpos(n): Set of positions that can match the last symbol of a string generated by the subtree at node n.
• followpos(i): The set of positions that can follow position i in some string generated by the augmented regular expression.
Steps:
• Step 1: Construct a syntax tree for r#.
• Step 2: Traverse the tree to compute nullable, firstpos and lastpos for each node.
• Step 3: Compute followpos for each position.
• Step 4: Construct the DFA from firstpos(root) and followpos.
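• Rules for computing nullable, firstpos, and lastpos:
1. Leaf labeled ε: nullable(n) = true; firstpos(n) = ∅; lastpos(n) = ∅.
2. Leaf with position i: nullable(n) = false; firstpos(n) = {i}; lastpos(n) = {i}.
3. Or-node n = c1 | c2: nullable(n) = nullable(c1) or nullable(c2); firstpos(n) = firstpos(c1) ∪ firstpos(c2); lastpos(n) = lastpos(c1) ∪ lastpos(c2).
4. Cat-node n = c1 c2: nullable(n) = nullable(c1) and nullable(c2); firstpos(n) = firstpos(c1) ∪ firstpos(c2) if nullable(c1), else firstpos(c1); lastpos(n) = lastpos(c1) ∪ lastpos(c2) if nullable(c2), else lastpos(c2).
5. Star-node n = c1*: nullable(n) = true; firstpos(n) = firstpos(c1); lastpos(n) = lastpos(c1).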
Rules for computing followpos:
1. If n is a cat-node with left child c1 and right child c2 and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).

Now that we have seen the rules for computing firstpos and lastpos, we
now proceed to calculate the values of the same for the syntax tree of the
given regular expression (a|b)*abb#.
Example: r = (a|b)*abb
Step 1: Firstly, we construct the augmented regular expression for the
given expression. By concatenating a unique right-end marker ‘#’ to a
regular expression r, we give the accepting state for r a transition on ‘#’
making it an important state of the NFA for r#.
So, r' = (a|b)*abb#
Step 2: Then we construct the syntax tree for r#.
Step 3: Next we need to evaluate four functions nullable, firstpos, lastpos, and
followpos.
• nullable(n) is true for a syntax tree node n if and only if the regular expression represented by n has ε in its language.
• firstpos(n) gives the set of positions that can match the first symbol of a
string generated by the subexpression rooted at n.
• lastpos(n) gives the set of positions that can match the last symbol of a string
generated by the subexpression rooted at n.
We refer to an interior node as a cat-node, or-node, or star-node if it is labeled
by a concatenation, | or * operator, respectively.
[Figure: firstpos and lastpos for nodes in the syntax tree for (a|b)*abb#]

NODE    followpos
1       {1, 2, 3}
2       {1, 2, 3}
3       {4}
4       {5}
5       {6}
6       ∅
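The followpos values for (a|b)*abb# can be recomputed with one post-order traversal of the syntax tree. This is a minimal sketch: the tuple encoding of tree nodes and the helper names (`leaf`, `cat`, `analyze`) are assumptions made for illustration, and the nullable, firstpos and lastpos rules used are the standard ones for leaf, or, cat and star nodes.

```python
from functools import reduce

def leaf(sym, pos):
    return ("leaf", sym, pos)

def cat(*nodes):
    """Left-associative concatenation of two or more subtrees."""
    return reduce(lambda left, right: ("cat", left, right), nodes)

def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) of node; record followpos as a side effect."""
    kind = node[0]
    if kind == "leaf":
        pos = node[2]
        followpos.setdefault(pos, set())
        return False, {pos}, {pos}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                       # followpos rule 1 (cat-node)
            followpos[i] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f, l = analyze(node[1], followpos)
        for i in l:                        # followpos rule 2 (star-node)
            followpos[i] |= f
        return True, f, l
    raise ValueError(f"unknown node kind: {kind}")

# Syntax tree for (a|b)*abb# with positions 1..6
tree = cat(
    ("star", ("or", leaf("a", 1), leaf("b", 2))),
    leaf("a", 3), leaf("b", 4), leaf("b", 5), leaf("#", 6),
)

followpos = {}
root_nullable, root_firstpos, root_lastpos = analyze(tree, followpos)
```

After the traversal, `followpos` agrees with the followpos table above, and `root_firstpos` is {1, 2, 3}, which becomes the start state of the DFA.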
Step 4: Now we construct Dstates, the set of states of DFA D, and Dtran, the transition table for D. The start state of D is firstpos(root), and the accepting states are all those containing the position associated with the endmarker symbol #.

• In our example, firstpos of the root is {1, 2, 3}. Let this state be A and consider the input symbol a. Positions 1 and 3 are for a, so let B = followpos(1) ∪ followpos(3) = {1, 2, 3, 4}. Since this set has not yet been seen, we add it to Dstates and set Dtran[A, a] := B.

• When we consider input b, only position 2 in A is associated with b, so we consider the set followpos(2) = {1, 2, 3}. Since this set has already been seen (it is A), we do not add it to Dstates, but we add the transition Dtran[A, b] := A.

• Continuing like this with the rest of the states, we arrive at the transition table below.
• Here, A is the start state and D = {1, 2, 3, 6} is the accepting state, since it contains position 6 of the endmarker #.
Step 5: Finally, we draw the DFA for the above transition table.
The final DFA will be:

        Input
State   a   b
⇢A      B   A
B       B   C
C       B   D
D       B   A
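The table can be rebuilt mechanically from the followpos values. The sketch below hardcodes the followpos table and the symbol at each position for (a|b)*abb#, and names the states A, B, C, ... in order of discovery. The names `FOLLOWPOS`, `SYMBOL_AT`, `build_dfa` and `accepts` are hypothetical.

```python
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
START = frozenset({1, 2, 3})    # firstpos(root)
END_POS = 6                     # position of the endmarker #

def build_dfa(alphabet="ab"):
    """Return (names, dtran, accepting) for the DFA built from followpos."""
    names = {START: "A"}        # set of positions -> DFA state letter
    dtran = {}
    unmarked = [START]
    while unmarked:
        current = unmarked.pop(0)
        for sym in alphabet:
            # union of followpos(i) over positions i in current that hold sym
            target = frozenset(
                p for i in current if SYMBOL_AT[i] == sym for p in FOLLOWPOS[i]
            )
            if not target:
                continue
            if target not in names:
                names[target] = chr(ord("A") + len(names))
                unmarked.append(target)
            dtran[(names[current], sym)] = names[target]
    accepting = {n for subset, n in names.items() if END_POS in subset}
    return names, dtran, accepting

names, dtran, accepting = build_dfa()

def accepts(word):
    """Run the DFA on word; every state has a transition on both a and b."""
    state = "A"
    for ch in word:
        state = dtran[(state, ch)]
    return state in accepting
```

For instance, `accepts("abb")` and `accepts("babb")` hold, while `accepts("ab")` does not, which matches the language of strings over {a, b} ending in abb.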
