
03 Regular

The document covers the concepts of regular languages and grammars, focusing on regular expressions, their formal definitions, and the connection between regular expressions and regular languages. It includes examples of regular expressions, languages associated with them, and the construction of nondeterministic finite automata (NFA) for these expressions. Additionally, it discusses generalized transition graphs and procedures for converting NFAs to regular expressions.

Uploaded by ghmpersonal
Copyright © All Rights Reserved

Regular Languages and Grammars
Week 3

Formal Language and Automata Theory, SeoulTech

Topics

• Regular Expressions
• Connection between Regular Expressions and Regular Languages
• Regular Grammars

Regular Expressions

Regular Expression Formal Definition
Definition 3.1

• Let Σ be a given alphabet. Then


1. Ø, λ, and a ∈ Σ are all regular expressions (primitive regular expressions).

2. If r1 and r2 are regular expressions, so are r1 + r2, r1 ⋅ r2, r1*, and (r1).

3. A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in (2).

• + (union), ⋅ (concatenation), * (star-closure).
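Because Definition 3.1 is recursive, it maps naturally onto a recursive data type. The following is a minimal Python sketch, assuming a hypothetical class per rule (Empty, Lam, Sym, Union, Concat, Star); show() prints an expression with every union and concatenation parenthesized. Since Union always takes exactly two operands, an expression like (a + b+) simply has no representation.

```python
from dataclasses import dataclass

# Hypothetical AST for Definition 3.1: one class per way of forming an expression.
# Every value is a finite tree, matching rule 3 (finitely many rule applications).
@dataclass(frozen=True)
class Empty:            # rule 1: Ø
    pass

@dataclass(frozen=True)
class Lam:              # rule 1: λ
    pass

@dataclass(frozen=True)
class Sym:              # rule 1: a ∈ Σ
    a: str

@dataclass(frozen=True)
class Union:            # rule 2: r1 + r2 (exactly two operands)
    r1: object
    r2: object

@dataclass(frozen=True)
class Concat:           # rule 2: r1 ⋅ r2
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:             # rule 2: r1*
    r1: object

def show(r) -> str:
    """Render r with explicit parentheses around every union and concatenation."""
    if isinstance(r, Empty):
        return "Ø"
    if isinstance(r, Lam):
        return "λ"
    if isinstance(r, Sym):
        return r.a
    if isinstance(r, Union):
        return f"({show(r.r1)} + {show(r.r2)})"
    if isinstance(r, Concat):
        return f"({show(r.r1)} ⋅ {show(r.r2)})"
    if isinstance(r, Star):
        return f"{show(r.r1)}*"
    raise TypeError(f"not a regular expression: {r!r}")

# Example 3.1: (a + b ⋅ c)* ⋅ (c + Ø)
example = Concat(Star(Union(Sym("a"), Concat(Sym("b"), Sym("c")))),
                 Union(Sym("c"), Empty()))
print(show(example))   # ((a + (b ⋅ c))* ⋅ (c + Ø))
```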

Regular Expressions
Example 3.1

• For Σ = {a, b, c}, the string


• (a + b ⋅ c)* ⋅ (c + Ø)
• is a regular expression.
• On the other hand,
• (a + b+)
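• is not a regular expression, since it cannot be derived from the primitive regular expressions by the rules of Definition 3.1 (+ is a binary operator, and b+ has no right operand).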

Languages Associated w/ RegExp
Definition 3.2

• The language L(r) denoted by any regular expression r is defined by the following rules.

1. Ø is a regular expression denoting the empty set,


2. λ is a regular expression denoting {λ},
3. For every a ∈ Σ, a is a regular expression denoting {a}.

Languages Associated w/ RegExp
Definition 3.2

• If r1 and r2 are regular expressions, then


4. L(r1 + r2) = L(r1) ∪ L(r2),
5. L(r1 ⋅ r2) = L(r1)L(r2),
6. L((r1)) = L(r1),
7. L(r1*) = (L(r1))*.

• These rules recursively reduce the language of any compound expression to the languages of its subexpressions.
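Rules 1 to 7 can be read directly as a recursive evaluator. Since L(r) is usually infinite, this sketch (an illustration, not the textbook's method) computes only the strings of length at most n; the tuple encoding and the function name lang are assumptions.

```python
# Regular expressions as nested tuples (a hypothetical encoding):
#   ("empty",) = Ø, ("lam",) = λ, ("sym", a) = a,
#   ("union", r1, r2) = r1 + r2, ("cat", r1, r2) = r1 ⋅ r2, ("star", r1) = r1*
# Rule 6, L((r1)) = L(r1), is implicit: grouping is the nesting itself.
def lang(r, n):
    """Return the strings of L(r) whose length is at most n."""
    kind = r[0]
    if kind == "empty":                       # rule 1: the empty set
        return set()
    if kind == "lam":                         # rule 2: {λ}
        return {""}
    if kind == "sym":                         # rule 3: {a}
        return {r[1]} if n >= 1 else set()
    if kind == "union":                       # rule 4: L(r1) ∪ L(r2)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                         # rule 5: L(r1)L(r2)
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x) + len(y) <= n}
    if kind == "star":                        # rule 7: λ, L, LL, ... until nothing new fits
        base = lang(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

a, b = ("sym", "a"), ("sym", "b")
r = ("cat", ("star", a), ("union", a, b))    # Example 3.2: a* ⋅ (a + b)
print(sorted(lang(r, 3), key=lambda w: (len(w), w)))   # ['a', 'b', 'aa', 'ab', 'aaa', 'aab']
```

For Example 3.2 this lists the shortest members of {a, aa, aaa, ⋯, b, ab, aab, ⋯}.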
Languages Associated w/ RegExp
Example 3.2

• Exhibit the language L(a* ⋅ (a + b)) in set notation.


• L(a* ⋅ (a + b))
• = L(a*)L(a + b)
• = (L(a))*(L(a) ∪ L(b))
• = {λ, a, aa, aaa, ⋯}{a, b}
• = {a, aa, aaa, ⋯, b, ab, aab, ⋯}.
Ambiguity in the Rules

• Rules 4~7.
• L(r1 + r2) = L(r1) ∪ L(r2) , L(r1 ⋅ r2) = L(r1)L(r2)
• L((r1)) = L(r1), L(r1*) = (L(r1))*

• Consider a ⋅ b + c.
• r1 = a ⋅ b, r2 = c ➞ L(a ⋅ b + c) = {ab, c}.
• r1 = a, r2 = b + c ➞ L(a ⋅ b + c) = {ab, ac}.
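• Precedence rules: * > ⋅ > + (star-closure binds most tightly, then concatenation, then union), so a ⋅ b + c means (a ⋅ b) + c and L(a ⋅ b + c) = {ab, c}.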
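Practical regex libraries follow the same precedence. A quick check with Python's standard re module, which writes + as | and uses re.fullmatch for whole-string matching (the helper name matches is our own):

```python
import re

def matches(pattern: str, w: str) -> bool:
    """True if the whole string w is in the language of pattern."""
    return re.fullmatch(pattern, w) is not None

candidates = ["ab", "ac", "c"]
# a ⋅ b + c is read as (a ⋅ b) + c, denoting {ab, c}.
print([w for w in candidates if matches("ab|c", w)])     # ['ab', 'c']
# Explicit parentheses give the other reading, a ⋅ (b + c).
print([w for w in candidates if matches("a(b|c)", w)])   # ['ab', 'ac']
# Star binds tighter than concatenation: ab* means a(b*), not (ab)*.
print(matches("ab*", "abbb"), matches("ab*", "abab"))    # True False
```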
Ambiguity in the Rules
Example 3.3

• For Σ = {a, b}, the expression


• r = (a + b)*(a + bb)
• is regular.
• It denotes the language
• L(r) = {a, bb, aa, abb, ba, bbb, ⋯}.

Ambiguity in the Rules
Example 3.4

• The expression
• r = (aa)*(bb)*b
• denotes the set of all strings with an even number of a's followed by an odd
number of b's.
• $L(r) = \{a^{2n} b^{2m+1} : n \ge 0, m \ge 0\}$.
• Get used to these notations.
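A finite check can help confirm a reading such as Example 3.4. This sketch compares Python's re (where (aa)*(bb)*b is written unchanged) with a direct test for $a^{2n}b^{2m+1}$ on all strings over {a, b} up to length 8; the helper names are hypothetical, and a bounded check is evidence, not a proof.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """All strings over alphabet of length 0..max_len."""
    for k in range(max_len + 1):
        for t in product(alphabet, repeat=k):
            yield "".join(t)

def in_L(w):
    """w = a^(2n) b^(2m+1): an even block of a's followed by an odd block of b's."""
    i = len(w) - len(w.lstrip("a"))   # length of the leading block of a's
    rest = w[i:]
    return set(rest) <= {"b"} and i % 2 == 0 and len(rest) % 2 == 1

pattern = re.compile("(aa)*(bb)*b")   # r = (aa)*(bb)*b
assert all((pattern.fullmatch(w) is not None) == in_L(w) for w in strings("ab", 8))
```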

Identify Regular Expressions
Example 3.5

• For Σ = {0,1}, give a regular expression r such that


• L(r) = {w ∈ Σ* : w has at least one pair of consecutive zeros}.
• In other words, every string in L(r) must contain 00 as a substring.
• Arbitrary strings on {0,1} can be denoted by (0 + 1)*.
• For the given language, we can place arbitrary strings before and after 00.
• r = (0 + 1)*00(0 + 1)*.
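The answer can be sanity-checked by brute force. This sketch translates r into Python's re syntax (+ becomes |) and compares it with a plain substring test on every binary string up to length 10, an arbitrary bound:

```python
import re
from itertools import product

r = re.compile("(0|1)*00(0|1)*")   # r = (0 + 1)*00(0 + 1)*

def strings(max_len):
    """All strings over {0, 1} of length 0..max_len."""
    for k in range(max_len + 1):
        for t in product("01", repeat=k):
            yield "".join(t)

mismatches = [w for w in strings(10) if (r.fullmatch(w) is not None) != ("00" in w)]
print(mismatches)   # []
```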

Identify Regular Expressions
Example 3.6

• Find a regular expression for the language


• L = {w ∈ {0,1}* : w has no pair of consecutive zeros}.
• This language is actually the complement of the language in Example 3.5.
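• In programming-language regex dialects you may exploit negation features such as [^...] (a negated character class), but Definition 3.1 has no complement operator, so here we must build r directly.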
• No consecutive zeros ➞ every 0, except possibly a final one, must be immediately followed by a 1.
• (1*011*)*
• This misses two cases: nonempty strings consisting only of 1s, and strings ending in 0.
• r = (1*011*)* + 1* + (1*011*)*0 + 1*0 = (1*011*)*(0 + λ) + 1*(0 + λ).
Identify Regular Expressions
Example 3.6

• If we consider L as the repetition of the strings 1 and 01,


• r = (1 + 01)*(0 + λ).
• One language can be denoted by many regular expressions.
• In fact, there are always infinitely many of them; for example, r, r + Ø, and (r + Ø) + Ø all denote the same language.
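Both expressions for this language can be checked against its definition in the same way. In Python's re, + becomes | and (0 + λ) becomes (0|), an empty alternative; the bound of length 10 is an arbitrary choice, so this is evidence rather than a proof.

```python
import re
from itertools import product

r_long = re.compile("(1*011*)*(0|)|1*(0|)")   # (1*011*)*(0 + λ) + 1*(0 + λ)
r_short = re.compile("(1|01)*(0|)")          # (1 + 01)*(0 + λ)

def no_consecutive_zeros(w):
    """Membership test for L, the complement of the language in Example 3.5."""
    return "00" not in w

for k in range(11):
    for t in product("01", repeat=k):
        w = "".join(t)
        expected = no_consecutive_zeros(w)
        assert (r_long.fullmatch(w) is not None) == expected, w
        assert (r_short.fullmatch(w) is not None) == expected, w
print("both expressions agree with L up to length 10")
```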

Identifying Regular Expressions
Exercise 6

• Show that r = (1 + 01)*(0 + 1*) also denotes the language in Example 3.6.
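• Proof. It's actually quite simple.
• Note that 1* denotes strings of zero or more 1s, so it includes λ; hence every string of (1 + 01)*(0 + λ) is also in (1 + 01)*(0 + 1*).
• Conversely, any extra 1s produced by the second factor 1* can be produced by the first factor (1 + 01)* instead, so no new strings are added.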
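The argument of Exercise 6 can also be sanity-checked on all binary strings up to length 12 (an arbitrary bound); this only supplements the proof.

```python
import re
from itertools import product

r1 = re.compile("(1|01)*(0|)")    # (1 + 01)*(0 + λ), from Example 3.6
r2 = re.compile("(1|01)*(0|1*)")  # (1 + 01)*(0 + 1*), Exercise 6

words = ("".join(t) for k in range(13) for t in product("01", repeat=k))
differences = [w for w in words if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w))]
print(differences)   # []
```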

• Can you find a RegExp for a language with exactly one 00? (Exercise 18).

Connection between Regular Expressions and Languages

RegExps Denote Regular Languages
Theorem 3.1

• Let r be a regular expression.


• Then there exists some NFA that accepts L(r).
• Consequently, L(r) is a regular language.
• Proof: We prove this by constructing, for every regular expression r, an NFA that accepts L(r).

RegExps Denote Regular Languages
Figure 3.1
Theorem 3.1 Proof

• NFAs that accept the languages of the primitive regular expressions Ø, λ, and a ∈ Σ are shown in Figure 3.1.


• Assume that we have automata M(r1) and M(r2) accepting languages
denoted by regular expressions r1 and r2 respectively.

• Here we use the result (Exercise 9, Section 2.3) that every NFA has an equivalent NFA with a single final state.

• Schematic representation.

Figure 3.2

RegExps Denote Regular Languages
Theorem 3.1 Proof

• With M(r1) and M(r2), we can construct automata for the regular expressions r1 + r2, r1r2, and r1*.

Figure 3.4

Figure 3.5
Figure 3.3
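The constructions of Figures 3.1 and 3.3 to 3.5 can be sketched in Python. This is a Thompson-style variant under our own assumptions (tuple-encoded expressions, λ-edges labeled None, fresh initial and final states for every fragment), so its state layout may differ from the textbook figures; accepts() simulates the NFA using λ-closures.

```python
import re
from itertools import count, product

# Regular expressions as nested tuples (hypothetical encoding):
# ("empty",), ("lam",), ("sym", a), ("union", r1, r2), ("cat", r1, r2), ("star", r1)

def build_nfa(r):
    """Return (start, final, edges); edges[q] lists (label, target), label None = λ."""
    fresh = count()
    edges = {}

    def state():
        q = next(fresh)
        edges[q] = []
        return q

    def frag(r):
        s, f = state(), state()          # each fragment has one initial, one final state
        kind = r[0]
        if kind == "sym":                # Figure 3.1: a single a-edge
            edges[s].append((r[1], f))
        elif kind == "lam":              # Figure 3.1: a single λ-edge
            edges[s].append((None, f))
        elif kind == "union":            # s -λ-> M(r1), M(r2); both finals -λ-> f
            for sub in (r[1], r[2]):
                s1, f1 = frag(sub)
                edges[s].append((None, s1))
                edges[f1].append((None, f))
        elif kind == "cat":              # final of M(r1) -λ-> initial of M(r2)
            s1, f1 = frag(r[1])
            s2, f2 = frag(r[2])
            edges[s].append((None, s1))
            edges[f1].append((None, s2))
            edges[f2].append((None, f))
        elif kind == "star":             # skip M(r1) entirely, or loop back to repeat it
            s1, f1 = frag(r[1])
            edges[s].extend([(None, s1), (None, f)])
            edges[f1].extend([(None, s1), (None, f)])
        elif kind != "empty":            # Ø: no path at all from s to f
            raise ValueError(f"unknown node {kind!r}")
        return s, f

    start, final = frag(r)
    return start, final, edges

def lam_closure(states, edges):
    """All states reachable from states using λ-edges only."""
    stack, seen = list(states), set(states)
    while stack:
        for label, t in edges[stack.pop()]:
            if label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, w):
    start, final, edges = nfa
    current = lam_closure({start}, edges)
    for ch in w:
        step = {t for q in current for label, t in edges[q] if label == ch}
        current = lam_closure(step, edges)
    return final in current

# Example 3.7: r = (a + bb)*(ba* + λ)
a, b = ("sym", "a"), ("sym", "b")
r = ("cat", ("star", ("union", a, ("cat", b, b))),
            ("union", ("cat", b, ("star", a)), ("lam",)))
m = build_nfa(r)
print(accepts(m, "abbba"), accepts(m, "bab"))   # True False

# Cross-check against Python's re on all strings over {a, b} up to length 7.
words = ("".join(t) for k in range(8) for t in product("ab", repeat=k))
assert all(accepts(m, w) == (re.fullmatch("(a|bb)*(ba*|)", w) is not None) for w in words)
```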

RegExps Denote Regular Languages
Example 3.7

• Find an NFA that accepts L(r), where


• r = (a + bb)*(ba* + λ).
• Here are automata for (a + bb) and (ba* + λ).
• Figure 3.6

RegExps Denote Regular Languages
Example 3.7

• Applying the star-closure construction to the first automaton and then concatenating the result with the second gives the final solution.

• r = (a + bb)*(ba* + λ).

Figure 3.7

Generalized Transition Graphs

• A Generalized Transition Graph (GTG) is a transition graph whose edges are labeled with
regular expressions.

• Other than that, it is the same as the usual transition graph.


• The label of any walk from the initial state to a final state is the concatenation of several regular expressions, and hence is itself a regular expression.

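• The strings denoted by such an expression form a subset of the language accepted by the GTG.
• The language accepted by the GTG is the union of these sets over all such walks.
• A complete GTG is a GTG in which all edges are present; a missing edge can be added with the label Ø without changing the language.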
• A graph of an NFA can be considered as a generalized transition graph too.
Generalized Transition Graphs

• The graph of any nondeterministic finite accepter can be considered as a generalized transition graph too.

• Hence for every regular language, there exists a GTG that accepts it.
• Conversely, every language accepted by a GTG is regular.
• Every walk in a GTG is labeled by a regular expression.
• By Theorem 3.1, the language denoted by such a regular expression is regular.

Generalized Transition Graphs
Example 3.8

• Figure 3.8 represents a generalized transition graph.

• The language accepted by it is


• L(a* + a*(a + b)c*).
• The edge (q0, q0) labeled a is a loop that can be traversed any number of times, generating L(a*).
• Labeling this edge a* will not change the language accepted by the graph.

Complete GTGs
Example 3.9

Figure 3.9

Complete GTGs
Example 3.10

• When a GTG has more than two states, we can find an equivalent graph by removing one state at a time.

• Consider the complete GTG in Figure 3.11.


• To remove q2, we create
• an edge (q1, q1), labeled e + af*b,
• an edge (q3, q3), labeled g + df*c,
• an edge (q1, q3), labeled h + af*c,
• an edge (q3, q1), labeled i + df*b.
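The removal step is mechanical. This sketch (remove_state and wrap are hypothetical helpers) builds each new label r_pq + r_pk r_kk* r_kq as a string, using the edge labels of Figure 3.11 as read off from the four formulas above; the Ø-simplification rules are omitted here.

```python
def wrap(x):
    """Parenthesize a label unless it is a single symbol."""
    return x if len(x) == 1 else f"({x})"

def remove_state(labels, k, others):
    """New labels r_pq + r_pk r_kk* r_kq for every pair p, q of remaining states."""
    loop = wrap(labels[(k, k)]) + "*"
    return {
        (p, q): f"{labels[(p, q)]} + {wrap(labels[(p, k)])}{loop}{wrap(labels[(k, q)])}"
        for p in others for q in others
    }

# Edge labels of Figure 3.11, read off from the formulas for removing q2.
labels = {
    (1, 1): "e", (1, 2): "a", (1, 3): "h",
    (2, 1): "b", (2, 2): "f", (2, 3): "c",
    (3, 1): "i", (3, 2): "d", (3, 3): "g",
}
new = remove_state(labels, 2, [1, 3])
for (p, q), r in sorted(new.items()):
    print(f"(q{p}, q{q}): {r}")
# (q1, q1): e + af*b
# (q1, q3): h + af*c
# (q3, q1): i + df*b
# (q3, q3): g + df*c
```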


Complete GTGs
Example 3.10

• Once all edges are created, we can remove q2 and its incident edges.
• This yields the equivalent graph in Figure 3.12.

Figure 3.11 Figure 3.12

Complete GTGs
Procedure: nfa-to-rex

• Consider the simple two-state complete GTG in Figure 3.10.


• The regular expression
• $r = r_1^* r_2 (r_4 + r_3 r_1^* r_2)^*$
• covers all possible walks from the initial state to the final state (Figure 3.10).


• For arbitrary GTGs, we remove one state at a time until only two states are
left.

• Then we apply the above expression to get the final regular expression.
Complete GTGs
Procedure: nfa-to-rex

1. Start with an NFA with states q0, q1, ⋯, qn, and a single final state, distinct from its initial state.

2. Convert the NFA into a complete GTG. Let rij stand for the label of (qi, qj).

3. If the GTG has only two states, with qi as its initial state and qj its final state, its associated regular expression is

• $r = r_{ii}^* r_{ij} (r_{jj} + r_{ji} r_{ii}^* r_{ij})^*$ (3.2)

Complete GTGs
Procedure: nfa-to-rex

4. If the GTG has three states, with initial state qi, final state qj, and third state qk, introduce new edges, labeled

• $r_{pq} + r_{pk} r_{kk}^* r_{kq}$ (3.3)

• for p = i, j, q = i, j. When this is done, remove vertex qk and its associated edges.
5. If the GTG has four or more states, pick a state qk to be removed.

• Apply rule 4 for all pairs of states (qi, qj), i ≠ k, j ≠ k.


• At each step, apply the simplifying rules
• r + Ø = r, rØ = Ø, Ø* = λ wherever possible, then remove state qk.
• Try Example 3.11 by yourself.
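Procedure nfa-to-rex can be sketched end to end. Assumptions: labels are built as strings in Python re syntax (so + becomes | and groups are written (?:...)), None stands for Ø and "" for λ, the simplification rules r + Ø = r, rØ = Ø, Ø* = λ are applied while labels are built, and the NFA input format (a dict from state pairs to symbol sets) is hypothetical.

```python
import re
from itertools import product

# Labels are Python regex strings; None stands for Ø and "" for λ.
def union(x, y):
    if x is None:                        # Ø + r = r
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"

def concat(*parts):
    if any(p is None for p in parts):    # rØ = Ø
        return None
    return "".join(f"(?:{p})" for p in parts if p)   # λ factors vanish

def star(x):
    if not x:                            # Ø* = λ (and λ* = λ)
        return ""
    return f"(?:{x})*"

def nfa_to_rex(states, start, final, delta):
    """delta maps (p, q) to the set of symbols on edges p -> q ("" for a λ-edge).
    Assumes a single final state distinct from the start state (step 1)."""
    # Step 2: complete GTG; a missing edge gets the label Ø (None).
    label = {}
    for p in states:
        for q in states:
            r = None
            for a in sorted(delta.get((p, q), ())):
                r = union(r, re.escape(a))
            label[(p, q)] = r
    # Steps 4-5: remove every other state k, relabeling with (3.3).
    remaining = list(states)
    for k in states:
        if k in (start, final):
            continue
        remaining.remove(k)
        loop = star(label[(k, k)])
        for p in remaining:
            for q in remaining:
                label[(p, q)] = union(label[(p, q)],
                                      concat(label[(p, k)], loop, label[(k, q)]))
    # Step 3: two states are left, so apply (3.2).
    i, j = start, final
    rii, rij, rjj, rji = label[(i, i)], label[(i, j)], label[(j, j)], label[(j, i)]
    return concat(star(rii), rij, star(union(rjj, concat(rji, star(rii), rij))))

# An NFA for Example 3.5: loop on q0, q0 -0-> q1 -0-> q2, loop on q2.
delta = {(0, 0): {"0", "1"}, (0, 1): {"0"}, (1, 2): {"0"}, (2, 2): {"0", "1"}}
rex = nfa_to_rex([0, 1, 2], 0, 2, delta)
words = ("".join(t) for k in range(9) for t in product("01", repeat=k))
assert all((re.fullmatch(rex, w) is not None) == ("00" in w) for w in words)
```

The resulting string is verbose because every factor is grouped; the check confirms it denotes the same language as (0 + 1)*00(0 + 1)* on all strings up to length 8.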
Regular Expression and Regular Language
Theorem 3.2

• Let L be a regular language.


• Then there exists a regular expression r such that L = L(r).
• Proof: If L is regular, there exists an NFA for it. We can assume that this NFA has a single final state, distinct from its initial state.

• We can convert this NFA to a complete GTG and apply procedure nfa-to-rex,
obtaining the required regular expression.

Regular Expression for Simple Patterns
Exercise 17
• Regular expression for C real numbers.
• e.g., -2.5e-3 = -0.0025
Figure (Exercise 17): finite automaton for C real numbers, with states q0 to q8 and edges labeled <sign>, <digit>, <p> (decimal point), and e.
• We are considering Σ = {+, -, ., e, 0, 1, ..., 9}.


Regular Expression for Simple Patterns
Exercise 17
• Applying nfa-to-rex.
Figure (Exercise 17): the same automaton, with edge labels abbreviated to s, d, e, p.
• Let s, d, e, p denote <sign>, <digit>, e, and <p>, respectively.
• Then the regular expression
• r = (d + sd)d* + (p + sp + (d + sd)d*p)dd*
+ (((d + sd)d*e + (p + sp + (d + sd)d*p)e + (p + sp + (d + sd)d*p)dd*e)d
+ ((d + sd)d*e + (p + sp + (d + sd)d*p)e + (p + sp + (d + sd)d*p)dd*e)sd)d*
• denotes C real numbers.
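To try the expression, it can be translated into Python's re syntax, assuming s = [+-], d = [0-9], and p = \. (our translation of the abbreviations above). This checks only the expression as written; C features outside Σ, such as suffixes, are not covered.

```python
import re

# Assumed symbol classes for the abbreviations s, d, p, e.
s, d, p, e = "[+-]", "[0-9]", r"\.", "e"

m_int = f"(?:{d}|{s}{d}){d}*"                         # (d + sd)d*
m_point = f"(?:{p}|{s}{p}|{m_int}{p})"                # (p + sp + (d + sd)d*p)
x = f"(?:{m_int}{e}|{m_point}{e}|{m_point}{d}{d}*{e})"  # mantissa followed by e

# r = (d + sd)d* + (p + sp + (d + sd)d*p)dd* + ((x)d + (x)sd)d*
r = re.compile(f"{m_int}|{m_point}{d}{d}*|(?:{x}{d}|{x}{s}{d}){d}*")

for w in ["-2.5e-3", "3e10", "12", ".5", "1.e5", "e5", "+", "2.5e"]:
    print(w, r.fullmatch(w) is not None)
```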


Regular Grammars

Right- and Left-Linear Grammars
Definition 3.3

• A grammar G = (V, T, S, P) is said to be right-linear if all productions are of the form


• A → xB,
A → x,
• where A, B ∈ V, and x ∈ T*.
• A grammar is said to be left-linear if all productions are of the form
• A → Bx,
• A → x.
• A regular grammar is either right-linear or left-linear.
Regular Grammars
Example 3.13

• The grammar G1 = ({S}, {a, b}, S, P1), with P1 given as


• S → abS | a
• is right-linear.
• The grammar G2 = ({S, S1, S2}, {a, b}, S, P2), with productions
• S → S1ab,
S1 → S1ab | S2,
S2 → a,
• is left-linear.
Regular Grammars
Example 3.13

• G1: S → abS | a
• S ⇒ abS ⇒ ababS ⇒ ababa is a derivation with G1.
• L(G1) is the language denoted by the regular expression r = (ab)*a.
• G2: S → S1ab, S1 → S1ab | S2, S2 → a
• S ⇒ S1ab ⇒ S1abab ⇒ S2abab ⇒ aabab
• L(G2) is the regular language L(aab(ab)*).
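Derivations like these can be enumerated mechanically. This sketch (the generate function and the dict-of-tuples grammar encoding are assumptions) repeatedly rewrites the leftmost variable and keeps only sentential forms with at most n terminals. It is meant for linear grammars, where each form has at most one variable, so the search is finite.

```python
import re
from itertools import product

def generate(productions, start, n):
    """Terminal strings of length <= n derivable from start.
    A symbol is a variable iff it is a key of productions; right sides are tuples."""
    results, seen, frontier = set(), {(start,)}, [(start,)]
    while frontier:
        form = frontier.pop()
        pos = next((i for i, sym in enumerate(form) if sym in productions), None)
        if pos is None:                      # no variable left: a sentence of L(G)
            results.add("".join(form))
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(sym not in productions for sym in new)
            if terminals <= n and new not in seen:   # terminals are never erased
                seen.add(new)
                frontier.append(new)
    return results

G1 = {"S": [("a", "b", "S"), ("a",)]}                       # S → abS | a
G2 = {"S": [("S1", "a", "b")],                              # S → S1ab
      "S1": [("S1", "a", "b"), ("S2",)],                    # S1 → S1ab | S2
      "S2": [("a",)]}                                       # S2 → a

words = {"".join(t) for k in range(8) for t in product("ab", repeat=k)}
assert generate(G1, "S", 7) == {w for w in words if re.fullmatch("(ab)*a", w)}
assert generate(G2, "S", 7) == {w for w in words if re.fullmatch("aab(ab)*", w)}
print(sorted(generate(G1, "S", 5), key=len))   # ['a', 'aba', 'ababa']
```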
Regular Grammars
Example 3.14

• The grammar G = ({S, A, B}, {a, b}, S, P) with productions


• S → A,
A → aB | λ,
B → Ab,
• is not regular (A → aB is right-linear while B → Ab is left-linear, and a regular grammar may not mix the two), but it is a linear grammar.
• A linear grammar is a grammar in which at most one variable can appear on
the right side of any production, without restriction on the position.

Regular Grammars and Regular Languages

• Now we know what a regular grammar is.

• Next we show that regular grammars correspond exactly to regular languages:

• Every regular language can be generated by a regular grammar.

• And every language generated by a regular grammar is regular.

Right-Linear Grammars generate Regular Languages
Theorem 3.3

• Let G = (V, T, S, P) be a right-linear grammar. Then L(G) is a regular language.

• The key idea of the proof is as follows.

• Consider a derivation ab⋯cD ⇒ ab⋯cdE, by using D → dE.
• We can construct an NFA that mimics this step, moving from state D to state E when the symbol d is read, i.e., along an edge labeled d.

Right-Linear Grammars generate Regular Languages
Theorem 3.3

• Proof: Assume that V = {V0, V1, ⋯}, S = V0, and we have productions of
the form V0 → v1Vi, Vi → v2Vj, ⋯ or Vn → vl, ⋯.

• If w is a string in L(G), then, because of the form of the productions, its derivation must look like
• V0 ⇒ v1Vi ⇒ v1v2Vj ⇒* v1v2⋯vkVn ⇒ v1v2⋯vkvl = w. (3.4)
• The automaton to be constructed will reproduce the derivation by consuming the strings v1, v2, ⋯ in turn.

Right-Linear Grammars generate Regular Languages
Theorem 3.3

• With initial state V0, there will be a non-final state labeled Vi for each variable Vi.
• For each production Vi → a1a2⋯amVj, we add transitions (through new intermediate states if m > 1) connecting Vi to Vj, so that
• δ*(Vi, a1a2⋯am) = Vj.
• Similarly, for each production Vi → a1a2⋯am,
• δ*(Vi, a1a2⋯am) = Vf, where Vf is a final state.

Figure 3.16
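The construction can be sketched in Python. Assumptions: productions are (A, x, B) triples with B = None for A → x; intermediate states get hypothetical names t0, t1, ⋯ and the final state is named Vf; a production with x = λ becomes a λ-move. The usage builds the automaton for the grammar of Example 3.15.

```python
import re
from itertools import count, product

FINAL = "Vf"   # the single final state (name chosen for illustration)

def grammar_to_nfa(productions):
    """productions: list of (A, x, B) for A -> xB, with B = None for A -> x.
    Returns edges: dict (state, symbol) -> set of states; symbol "" is a λ-move."""
    edges = {}
    fresh = count()
    for head, x, body in productions:
        target = FINAL if body is None else body
        if not x:                                  # A -> B or A -> λ
            edges.setdefault((head, ""), set()).add(target)
            continue
        state = head
        for i, sym in enumerate(x):                # chain states so δ*(head, x) reaches target
            nxt = target if i == len(x) - 1 else f"t{next(fresh)}"
            edges.setdefault((state, sym), set()).add(nxt)
            state = nxt
    return edges

def closure(states, edges):
    """States reachable through λ-moves."""
    stack, seen = list(states), set(states)
    while stack:
        for t in edges.get((stack.pop(), ""), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(edges, start, w):
    current = closure({start}, edges)
    for ch in w:
        step = {t for q in current for t in edges.get((q, ch), ())}
        current = closure(step, edges)
    return FINAL in current

# Example 3.15: V0 → aV1, V1 → abV0 | b
G = [("V0", "a", "V1"), ("V1", "ab", "V0"), ("V1", "b", None)]
nfa = grammar_to_nfa(G)
print(accepts(nfa, "V0", "aabab"), accepts(nfa, "V0", "aab"))   # True False

# The accepted language should be L((aab)*ab); compare up to length 8.
words = ("".join(t) for k in range(9) for t in product("ab", repeat=k))
assert all(accepts(nfa, "V0", w) == (re.fullmatch("(aab)*ab", w) is not None) for w in words)
```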

Right-Linear Grammars generate Regular Languages
Theorem 3.3

• Suppose that w ∈ L(G) so that (3.4) is satisfied.


• V0 ⇒ v1Vi ⇒ v1v2Vj ⇒* v1v2⋯vkVn ⇒ v1v2⋯vkvl = w (3.4)
• In the NFA, there is a path from V0 to Vi labeled v1, a path from Vi to Vj
labeled v2, ⋯. Then

• Vf ∈ δ*(V0, w), and w is accepted by M.

Right-Linear Grammars generate Regular Languages
Theorem 3.3

• Conversely, assume that w is accepted by M.


• Then the automaton has to traverse states V0, Vi, ⋯ to Vf, using paths
labeled v1, v2, ⋯, which indicates that

• w = v1v2⋯vkvl, and the derivation


• V0 ⇒ v1Vi ⇒* v1v2⋯vkVn ⇒ v1v2⋯vkvl is possible.
• Hence w is in L(G), which proves the theorem.

Right-Linear Grammars generate Regular Languages
Example 3.15

• Construct a finite automaton that accepts the language generated by the grammar (Figure 3.17)

• V0 → aV1,
• V1 → abV0 | b, where V0 is the start variable.
• The language generated by the grammar and accepted by
the automaton is the regular language L((aab)*ab).

Right-Linear Grammars for Regular Languages

• Now we know that right-linear grammars generate regular languages.


• In other words, languages generated by right-linear grammars are always
regular.

• Then how about the opposite direction?


• Can we always find a right-linear grammar for a given regular language?

Right-Linear Grammars for Regular Languages
Theorem 3.4

• If L is a regular language on the alphabet Σ, then there exists a right-linear grammar G = (V, Σ, S, P) such that L = L(G).

• Proof: Let M = (Q, Σ, δ, q0, F) be a DFA that accepts L, and assume that
Q = {q0, q1, ⋯, qn} and Σ = {a1, a2, ⋯, am}.

• Construct the right-linear grammar G with V = {q0, q1, ⋯, qn} and S = q0.
• For each transition δ(qi, aj) = qk of M, we put the production qi → ajqk in P.
• If qk is in F, we add qk → λ to P.
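The construction is a direct translation of transitions into productions. In this sketch, the DFA for the language of Example 3.6 (start state A, state B after reading a 0, dead state D) is a hypothetical example, and generate() enumerates the strings of length at most n that the grammar derives so they can be compared with the DFA.

```python
from itertools import product

def dfa_to_grammar(delta, finals):
    """delta: dict (qi, a) -> qk. Returns productions (head, x, body):
    qi -> a qk for each transition, and qk -> λ (x = "", body None) for each final qk."""
    prods = [(qi, a, qk) for (qi, a), qk in sorted(delta.items())]
    prods += [(qk, "", None) for qk in sorted(finals)]
    return prods

def generate(prods, start, n):
    """All terminal strings of length <= n derivable from start."""
    results, seen, frontier = set(), {("", start)}, [("", start)]
    while frontier:
        w, var = frontier.pop()
        for head, x, body in prods:
            if head != var or len(w + x) > n:
                continue
            if body is None:                       # λ-production ends the derivation
                results.add(w + x)
            elif (w + x, body) not in seen:
                seen.add((w + x, body))
                frontier.append((w + x, body))
    return results

def run(delta, start, w):
    """Final state of the DFA after reading w."""
    q = start
    for ch in w:
        q = delta[(q, ch)]
    return q

delta = {("A", "0"): "B", ("A", "1"): "A", ("B", "0"): "D",
         ("B", "1"): "A", ("D", "0"): "D", ("D", "1"): "D"}
finals = {"A", "B"}
prods = dfa_to_grammar(delta, finals)
for head, x, body in prods:
    print(f"{head} → {x or 'λ'}{body or ''}")

words = ["".join(t) for k in range(9) for t in product("01", repeat=k)]
assert generate(prods, "A", 8) == {w for w in words if run(delta, "A", w) in finals}
```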
Right-Linear Grammars for Regular Languages
Theorem 3.4

• The first step of the proof is showing that G defined in this way can generate every string in L.

• Consider w ∈ L, w = aiaj⋯akal.
• Since M accepts w, there must be transitions of the following form.
• δ(q0, ai) = qp, δ(qp, aj) = qr, ⋯, δ(qt, al) = qf ∈ F.

Right-Linear Grammars for Regular Languages
Theorem 3.4

• By construction, G will have one production for each transition.


• Hence we can make the derivation
• q0 ⇒ aiqp ⇒ aiajqr ⇒* aiaj⋯akalqf ⇒ aiaj⋯akal (3.7)
• with G, and w ∈ L(G).
• Conversely, if w ∈ L(G), then its derivation must have the form (3.7).
• This implies that δ*(q0, aiaj⋯akal) = qf, which completes the proof.
Equivalence of Regular Languages/Grammars
Theorem 3.5 and 3.6

• Theorem 3.5: A language L is regular if and only if there exists a left-linear grammar G such that L = L(G).

• Proof: See the textbook for details. The idea is simple: reversing the right side of every production of a left-linear grammar gives a right-linear grammar that generates the reverse of L, and the reverse of a regular language is again regular.
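The reversal idea can be illustrated on G2 from Example 3.13. Reversing each right side turns A → Bx into A → x^R B; this sketch (reverse_grammar and generate are hypothetical helpers) checks that the reversed grammar derives exactly the reversed strings, up to length 9.

```python
def generate(productions, start, n):
    """Terminal strings of length <= n derivable from start (linear grammars).
    A symbol is a variable iff it is a key of productions; right sides are tuples."""
    results, seen, frontier = set(), {(start,)}, [(start,)]
    while frontier:
        form = frontier.pop()
        pos = next((i for i, sym in enumerate(form) if sym in productions), None)
        if pos is None:
            results.add("".join(form))
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if sum(sym not in productions for sym in new) <= n and new not in seen:
                seen.add(new)
                frontier.append(new)
    return results

def reverse_grammar(productions):
    """Reverse every right side: A -> Bx becomes A -> x^R B (right-linear)."""
    return {head: [tuple(reversed(rhs)) for rhs in rhss]
            for head, rhss in productions.items()}

# G2 from Example 3.13 (left-linear): S → S1ab, S1 → S1ab | S2, S2 → a
G2 = {"S": [("S1", "a", "b")], "S1": [("S1", "a", "b"), ("S2",)], "S2": [("a",)]}
G2_rev = reverse_grammar(G2)          # S → baS1, S1 → baS1 | S2, S2 → a
L = generate(G2, "S", 9)
L_rev = generate(G2_rev, "S", 9)
print(L_rev == {w[::-1] for w in L})   # True
```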

• Theorem 3.6: A language L is regular if and only if there exists a regular grammar G such that L = L(G).

• Proof: This follows directly from Theorems 3.3, 3.4, and 3.5.


Describing Regular Languages

Figure 3.19 (four panels): Regular Expression to NFA; Regular Language to Regular Expression; Regular Grammar to DFA; Regular Language to Regular Grammar.

Summary

• You need to remember what a regular expression is and its notation.
• For a given regular expression, find the language it denotes, and vice versa.
• The relation between regular expressions and regular languages.
• What are a GTG and a complete GTG?
• How can we obtain a regular expression from an NFA?
• How do we write a regular expression for a simple pattern?
• Practice the conversions between regular expressions, grammars, DFAs/NFAs and languages.
