Unit 1 Part 2 - Compiler
1. Regular Expressions
2. Conversion of regular expression to NFA – Thompson’s construction
3. Converting Regular expression directly to DFA
2. Regular Expressions
Regular expression
• A regular expression is a method of representing a language.
• L = {ε, a, aa, aaa,…} = a*
• R is a regular expression over the alphabet ∑ if R is one of the following:
1. ε is a regular expression denoting the set {ε}.
2. ∅ is a regular expression denoting the empty set {}.
Ex: (a + b)*
Regular expression
1. Regular expression for finite language
2. Regular expression for infinite language
2. (a + b) c = {ac, bc}
5. a* = {ε, a, aa, aaa, …}
• (a + b)*(abb + a + b)
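The set notation above can be checked mechanically. The following is a minimal sketch, assuming Python's `re` module, where union is written `|` instead of `+`. The helper `language` is a hypothetical name introduced here; it lists every string up to a length bound that the pattern matches in full.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """Return all strings over `alphabet` up to length `max_len` that `pattern` matches in full."""
    compiled = re.compile(pattern)
    matches = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                matches.append(s)
    return matches

# (a + b)c in the notation above is (a|b)c in Python syntax: a finite language.
finite = language(r"(a|b)c", "abc", 3)

# a* is an infinite language, so only its strings up to length 3 are listed.
star = language(r"a*", "a", 3)

print(finite)
print(star)
```

The length bound is needed only because an infinite language such as a* cannot be listed in full.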
3. Conversion of regular expression to NFA
A regular expression can be converted into a DFA by the following methods:
(i) Thompson’s construction followed by subset construction
• The given regular expression is converted into an NFA using Thompson’s construction.
• The resultant NFA is converted into a DFA using subset construction.
(ii) Direct method
• In the direct method, the given regular expression is converted directly into a DFA.
Thompson’s Construction for Conversion of Regular Expression to NFA
• Union: r=a+b
• Concatenation: r = r1 r2
• Closure: r = a*
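The three rules above (union, concatenation, closure) can be sketched in code. This is an illustrative sketch, not the construction exactly as drawn on the slides: the class `Thompson`, its method names, and the integer state numbering are all assumptions made here. Concatenation is done by linking the two fragments with an Ɛ-edge, which is one common variant; another variant merges the two states instead.

```python
from itertools import count

EPS = None  # label used for Ɛ-transitions (an assumption of this sketch)

class Thompson:
    """Builds Ɛ-NFA fragments; each fragment is a (start, accept) pair of state numbers."""

    def __init__(self):
        self._ids = count()
        self.edges = []  # list of (source, label, target)

    def _new_state(self):
        return next(self._ids)

    def symbol(self, a):
        # Base case: a single edge labelled with the input symbol.
        s, f = self._new_state(), self._new_state()
        self.edges.append((s, a, f))
        return s, f

    def union(self, x, y):
        # New start branches into both fragments; both accepts join a new accept.
        s, f = self._new_state(), self._new_state()
        for frag in (x, y):
            self.edges.append((s, EPS, frag[0]))
            self.edges.append((frag[1], EPS, f))
        return s, f

    def concat(self, x, y):
        # Link the accept state of x to the start state of y with an Ɛ-edge.
        self.edges.append((x[1], EPS, y[0]))
        return x[0], y[1]

    def star(self, x):
        # Allow skipping the fragment entirely and looping back to repeat it.
        s, f = self._new_state(), self._new_state()
        self.edges += [(s, EPS, x[0]), (s, EPS, f),
                       (x[1], EPS, x[0]), (x[1], EPS, f)]
        return s, f

# Build the Ɛ-NFA for (a + b)*a.
t = Thompson()
nfa = t.concat(t.star(t.union(t.symbol("a"), t.symbol("b"))), t.symbol("a"))
eps_edges = [e for e in t.edges if e[1] is EPS]
print(nfa, len(t.edges), len(eps_edges))
```

Each operator adds at most two new states, so the size of the resulting NFA grows linearly with the length of the regular expression.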
• Ɛ-closure: The Ɛ-closure of a state is the set of states reachable from that state using only Ɛ-transitions, that is, without consuming any input symbol. The state itself is always included.
• Ɛ-closure(q0) = {q0, q1, q2}
• Ɛ-closure(q1) = {q1, q2}
• Ɛ-closure(q2) = {q2}
Example:
• Ɛ-closure(1) = {1, 2, 3, 4, 6}
• Ɛ-closure(2) = {2, 3, 6}
• Ɛ-closure(3) = {3, 6}
• Ɛ-closure(4) = {4}
• Ɛ-closure(5) = {5, 7}
• Ɛ-closure(6) = {6}
• Ɛ-closure(7) = {7}
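The closures in this example can be computed with a simple graph search. The sketch below assumes a set of Ɛ-edges (1→2, 1→4, 2→3, 3→6, 5→7) chosen to be consistent with the closures listed above; the actual NFA diagram is not reproduced in these notes, so the edge set is an assumption. The function name `epsilon_closure` is also introduced here for illustration.

```python
def epsilon_closure(state, eps_edges):
    """Return every state reachable from `state` using only Ɛ-transitions, including itself."""
    closure = {state}
    stack = [state]
    while stack:
        q = stack.pop()
        for nxt in eps_edges.get(q, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

# Assumed Ɛ-edges, chosen to be consistent with the closures listed above.
eps_edges = {1: [2, 4], 2: [3], 3: [6], 5: [7]}

closures = {q: epsilon_closure(q, eps_edges) for q in range(1, 8)}
for q, c in closures.items():
    print(q, sorted(c))
```

Transitions on input symbols are deliberately left out of `eps_edges`, since they play no part in the Ɛ-closure.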
Subset Construction
• The given regular expression is converted into an NFA.
• Then, the NFA is converted into a DFA.
Steps:
1. Convert the regular expression into an Ɛ-NFA using the above rules for the operators (union, concatenation and closure), respecting operator precedence.
2. Find the Ɛ-closure of all states.
3. Start with the Ɛ-closure of the start state of the Ɛ-NFA.
4. Apply each input symbol to the current state and find the Ɛ-closure of the result:
Dtran[state, input symbol] = Ɛ-closure(move(state, input symbol))
where Dtran is the transition function of the DFA.
5. Check whether the resulting state is a new state.
6. If a new state is found, repeat steps 4 and 5 until no more new states are found.
7. Construct the transition table for the Dtran function.
8. Draw the transition diagram. The start state is Ɛ-closure(start state of the NFA), and every state that contains a final state of the NFA is a final state.
Ques: (a + b)*a
Conversion:
Step 2: Find Ɛ -closure of all states.
Ɛ -closure (A): {A, B, C, F, H, I}
Ɛ -closure (B): {B, C, F}
Ɛ -closure (C): {C}
Ɛ -closure (D): {D, E, F, B, C, H, I}
Ɛ -closure (E): {E, B, F, C, H, I}
Ɛ -closure (F): {F}
Ɛ -closure (G): {G, E, B, F, C, H, I}
Ɛ -closure (H): {H, I}
Ɛ -closure (I): {I}
Ɛ -closure (J): {J}
Step 3: Start with the Ɛ-closure of the start state of the NFA, i.e. Ɛ-closure(A) = {A, B, C, F, H, I}, and give this state a name (here, 1).
Step 4: Apply the input symbols and find the Ɛ-closure of each result.
Dtran[state, input symbol] = Ɛ-closure(move(state, input symbol))
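Steps 3 to 6 can be carried through for this example in code. The sketch below assumes an Ɛ-NFA for (a + b)*a whose edges are reconstructed from the Ɛ-closures listed in Step 2, with a-transitions C→D and I→J, a b-transition F→G, and J as the final state. The NFA diagram is not reproduced in these notes, so this edge set is an assumption, as are all function and variable names.

```python
from collections import deque

# Assumed Ɛ-NFA for (a + b)*a, reconstructed from the Ɛ-closures listed in Step 2.
EPS_EDGES = {"A": ["B", "H"], "B": ["C", "F"], "D": ["E"],
             "G": ["E"], "E": ["B", "H"], "H": ["I"]}
SYM_EDGES = {("C", "a"): ["D"], ("F", "b"): ["G"], ("I", "a"): ["J"]}
START, FINAL, ALPHABET = "A", "J", "ab"

def eps_closure(states):
    """Ɛ-closure of a set of NFA states."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for nxt in EPS_EDGES.get(q, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)

def move(states, symbol):
    """NFA states reachable from `states` on one `symbol` transition."""
    return {t for q in states for t in SYM_EDGES.get((q, symbol), ())}

def subset_construction():
    start = eps_closure({START})
    names = {start: 1}            # DFA states are numbered in order of discovery
    dtran = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for symbol in ALPHABET:
            target = eps_closure(move(current, symbol))
            if not target:
                continue          # no transition on this symbol
            if target not in names:
                names[target] = len(names) + 1
                queue.append(target)
            dtran[(names[current], symbol)] = names[target]
    accepting = {n for subset, n in names.items() if FINAL in subset}
    return names, dtran, accepting

names, dtran, accepting = subset_construction()
for subset, n in names.items():
    print(n, sorted(subset))
print(dtran, accepting)
```

Under these assumptions the construction yields three DFA states, and only the state containing J is accepting.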
Direct Method
• The direct method converts a given regular expression directly into a DFA.
• It uses the augmented regular expression r#.
• Important states of the NFA correspond to positions in the regular expression that hold symbols of the alphabet.
• The regular expression is represented as a syntax tree in which interior nodes correspond to the operators for union, concatenation and closure.
• Leaf nodes correspond to the input symbols.
• The DFA is constructed directly from the regular expression by computing the functions nullable(n), firstpos(n), lastpos(n) and followpos(i) from the syntax tree.
• nullable (n): True if the subtree at node n generates a language that includes the empty string.
• firstpos (n): Set of positions that can match the first symbol of a string
generated by the subtree at node n.
• lastpos (n): Set of positions that can match the last symbol of a string
generated by the subtree at node n.
• followpos (i): The set of positions that can follow position i in the tree.
Steps:
• Step 1: Construct a syntax tree for r#.
• Step 2: Traverse the tree to compute the functions nullable, firstpos and lastpos.
• Step 3: Compute followpos.
• Step 4: Construct the DFA from these functions.
• Rules for computing nullable, firstpos, and lastpos:
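Rules for computing followpos:
1. If n is a cat-node with left child c1 and right child c2 and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).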
Having seen the rules for computing firstpos and lastpos, we now calculate their values for the syntax tree of the given regular expression (a|b)*abb#.
Example: r = (a|b)*abb
Step 1: First, we construct the augmented regular expression for the given expression. Concatenating a unique right-end marker ‘#’ to a regular expression r gives the accepting state for r a transition on ‘#’, making it an important state of the NFA for r#.
So, r# = (a|b)*abb#
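Each symbol occurrence in the augmented expression, including ‘#’, is a position, and positions are numbered from left to right. A minimal sketch of that numbering follows; the helper name `number_positions` and the assumption that every character other than `|`, `*`, `(` and `)` is a symbol are introduced here for illustration.

```python
def number_positions(augmented_regex, operators="|*()"):
    """Map position number -> symbol for every non-operator character, left to right."""
    positions = {}
    for ch in augmented_regex:
        if ch not in operators:
            positions[len(positions) + 1] = ch
    return positions

positions = number_positions("(a|b)*abb#")
for number, symbol in positions.items():
    print(number, symbol)
```

With this numbering, positions 1 and 3 hold the symbol a, and position 6 holds the end marker ‘#’.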
Step 2: Then we construct the syntax tree for r#.
Step 3: Next we need to evaluate four functions nullable, firstpos, lastpos, and
followpos.
• nullable(n) is true for a syntax tree node n if and only if the regular expression represented by n has ε in its language.
• firstpos(n) gives the set of positions that can match the first symbol of a
string generated by the subexpression rooted at n.
• lastpos(n) gives the set of positions that can match the last symbol of a string
generated by the subexpression rooted at n.
We refer to an interior node as a cat-node, or-node, or star-node if it is labeled
by a concatenation, | or * operator, respectively.
NODE    followpos
1       {1, 2, 3}
2       {1, 2, 3}
3       {4}
4       {5}
5       {6}
6       ∅
Figure: firstpos and lastpos for nodes in the syntax tree for (a|b)*abb#
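The followpos values above can be reproduced with one recursive pass over the syntax tree. The sketch below encodes the tree as nested tuples, which is an assumption of this example, as are the function and helper names. It applies the standard rules for or-nodes, cat-nodes and star-nodes, and it does not handle ε-leaves.

```python
def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) for `node`, filling `followpos` as a side effect."""
    kind = node[0]
    if kind == "leaf":
        _, _symbol, pos = node
        followpos.setdefault(pos, set())
        return False, {pos}, {pos}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                  # cat rule: firstpos(c2) follows lastpos(c1)
            followpos[i] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f, l = analyze(node[1], followpos)
        for i in l:                   # star rule: firstpos(n) follows lastpos(n)
            followpos[i] |= f
        return True, f, l
    raise ValueError(f"unknown node kind: {kind}")

def leaf(symbol, pos):
    return ("leaf", symbol, pos)

def cat(left, right):
    return ("cat", left, right)

# Syntax tree for (a|b)*abb# with positions 1..6 numbered left to right.
tree = cat(cat(cat(cat(("star", ("or", leaf("a", 1), leaf("b", 2))),
                       leaf("a", 3)), leaf("b", 4)), leaf("b", 5)), leaf("#", 6))

followpos = {}
root_nullable, root_first, root_last = analyze(tree, followpos)
for pos in sorted(followpos):
    print(pos, sorted(followpos[pos]))
```

The firstpos of the root computed here is the set used as the start state of the DFA in the next step.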
Step 4: Now we construct Dstates, the set of states of DFA D, and Dtran, the transition table for D. The start state of DFA D is firstpos(root), and the accepting states are all those containing the position associated with the endmarker symbol #.
• According to our example, the firstpos of the root is {1, 2, 3}. Let this state be A
and consider the input symbol a. Positions 1 and 3 are for a, so let B = followpos(1)
∪ followpos(3) = {1, 2, 3, 4}. Since this set has not yet been seen, we set Dtran[A,
a] := B.
• Continuing like this with the rest of the states, we arrive at the transition table shown below.
• Here, A is the start state and D is the accepting state.
Step 5: Finally, we draw the DFA from the transition table.
The final DFA will be:
         Input
State    a    b
⇢A       B    A
 B       B    C
 C       B    D
 D       B    A
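The construction of Dstates and Dtran can be sketched as a worklist loop over sets of positions. The sketch below copies the position symbols and the followpos values from the worked example; the function names, and the naming of states A, B, C, … in order of discovery, are assumptions made here.

```python
from collections import deque
from string import ascii_uppercase

# Values copied from the worked example: position -> symbol, and the followpos table.
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
FIRSTPOS_ROOT = frozenset({1, 2, 3})
END_POSITION = 6

def build_dfa(alphabet="ab"):
    names = {FIRSTPOS_ROOT: "A"}      # Dstates, named in order of discovery
    dtran = {}
    queue = deque([FIRSTPOS_ROOT])
    while queue:
        state = queue.popleft()
        for symbol in alphabet:
            # Union of followpos(p) over all positions p in the state that hold `symbol`.
            target = set()
            for p in state:
                if SYMBOL_AT[p] == symbol:
                    target |= FOLLOWPOS[p]
            target = frozenset(target)
            if not target:
                continue
            if target not in names:
                names[target] = ascii_uppercase[len(names)]
                queue.append(target)
            dtran[(names[state], symbol)] = names[target]
    accepting = {name for s, name in names.items() if END_POSITION in s}
    return dtran, accepting

def accepts(dtran, accepting, text, start="A"):
    """Run the DFA on `text` and report whether it ends in an accepting state."""
    state = start
    for ch in text:
        state = dtran.get((state, ch))
        if state is None:
            return False
    return state in accepting

dtran, accepting = build_dfa()
print(dtran, accepting)
print(accepts(dtran, accepting, "aabb"), accepts(dtran, accepting, "ab"))
```

The `accepts` helper is only there to try the result on sample strings: a string is accepted exactly when it ends in abb.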