Lecture 6


Formal Languages and Compiler

Regular Expression, Regular Languages, Implementing a Recognizer of RE

Ziaurahman Hikmat

ziaurahman.hikmat1@gmail.com
Nangarhar University Computer Science Faculty

31 October 2023

1 NANGARHAR UNIVERSITY COMPUTER SCIENCE FACULTY


LECTURE OVERVIEW
֎ Regular Expression
֎ Regular grammar Overview
֎ Introduction
֎ Regular Definition
֎ RE vs RG
֎ Implementing a Recognizer of RE’s: Automata
֎ DFA
֎ NFA
֎ From Regular Expression to NFA



TYPE 3 (REGULAR GRAMMAR)
Regular Grammars, also called Type 3 Grammars, are formal Grammars, G= (VT,VN,S,P),
such that all productions in P respect the following condition:
A → aB, or A → a
with A,B ∈ VN and a ∈ VT.
Furthermore, a rule of the form:
S → ε is allowed if S does not appear on the right side of any rule.
֎ The productions above define the Right-Regular Grammars. The following productions:
A → Ba, or A → a
define Left-Regular Grammars.
֎ Right-Regular and Left-Regular Grammars define the same set of Languages.
֎ Regular Grammars are commonly used to define the lexical structure of
programming languages



REGULAR EXPRESSION
Each Regular Expression, say R, denotes a Language, L(R). The following
are the rules to build them over an alphabet V:
֎ If a ∈ V ∪ {𝜀} then a is a Regular Expression denoting the language {a};
֎ If R,S are Regular Expressions denoting the Languages L(R) and L(S)
then:
֎ R | S is a Regular Expression denoting L(R) ∪ L(S);
֎ R·S is a Regular Expression denoting the concatenation L(R) · L(S), i.e.,
L(R)·L(S) = {r·s | r ∈ L(R) and s ∈ L(S)};
֎ R∗ (Kleene closure) is a Regular Expression denoting L(R)∗, zero or more
concatenations of L(R), i.e., L(R)∗ = ⋃_{i=0}^{∞} L(R)^i, where L(R)^0 = {𝜀};
֎ (R) is a Regular Expression denoting L(R).
Precedence of Operators: ∗ > · > |
E | F·G∗ = E | (F·(G∗))
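The rules above can be tried out directly on small finite languages. The following is a minimal Python sketch, not part of the original slides: the helper names `union`, `concat` and `kleene` are illustrative, and because L(R)∗ is infinite in general, `kleene` only returns the strings up to an assumed length bound `max_len`.

```python
from itertools import product

def union(l1, l2):
    """L(R | S) = L(R) ∪ L(S)."""
    return l1 | l2

def concat(l1, l2):
    """L(R·S) = {r·s | r ∈ L(R) and s ∈ L(S)}."""
    return {r + s for r, s in product(l1, l2)}

def kleene(lang, max_len):
    """Strings of L(R)* whose length is at most max_len (assumed cut-off)."""
    result = {""}            # L(R)^0 = {ε}
    frontier = {""}
    while frontier:
        # One more concatenation with L(R), keeping only new, short-enough strings.
        frontier = {w for w in concat(frontier, lang) if len(w) <= max_len} - result
        result |= frontier
    return result

E, F, G = {"e"}, {"f"}, {"g"}
# E | F·G* is read as E | (F·(G*)) because of the precedence * > · > |
lang = union(E, concat(F, kleene(G, 2)))
print(sorted(lang))  # ['e', 'f', 'fg', 'fgg']
```

Grouping the expression differently, for example (E | F)·G∗, would give a different language, which is why the precedence rule matters.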



EXAMPLE
֎ Let V = {a,b}
֎ The Regular Expression a | b denotes the Language {a,b}.
֎ The Regular Expression (a | b)(a | b) denotes the Language {aa,ab,ba,bb}.
֎ The Regular Expression a∗ denotes the Language of all strings of zero
or more a’s, {𝜀,a,aa,aaa,...}.
֎ The Regular Expression (a | b)∗ denotes the Language of all strings of
a’s and b’s.
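These examples can be checked mechanically. The sketch below uses Python's standard `re` module as a stand-in for the notation of the slides (an assumption: `re` supports a superset of these operators and is not necessarily implemented with finite automata). The helpers `strings_up_to` and `language` are hypothetical names introduced here; they list the members of each language among all strings of length at most 3.

```python
import re
from itertools import product

def strings_up_to(alphabet, n):
    """All strings over `alphabet` of length 0..n, shortest first."""
    return ["".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)]

def language(pattern, alphabet="ab", n=3):
    """Members of L(pattern) among the strings of length <= n."""
    return [w for w in strings_up_to(alphabet, n) if re.fullmatch(pattern, w)]

print(language(r"a|b"))          # ['a', 'b']
print(language(r"(a|b)(a|b)"))   # ['aa', 'ab', 'ba', 'bb']
print(language(r"a*"))           # ['', 'a', 'aa', 'aaa']
print(len(language(r"(a|b)*")))  # 15, i.e. 1 + 2 + 4 + 8 strings
```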



REGULAR EXPRESSION SHORTHANDS
֎ Notational shorthands are introduced for frequently used
constructors.
֎ +: One or more instances. If R is a Regular Expression then R+ ≡ RR∗.
֎ ?: Zero or one instance. If R is a Regular Expression then R? ≡ 𝜀 | R.
֎ Character Classes. If a,b,...,z ∈ V then [abc] ≡ a | b | c, and
[a−z] ≡ a | b | ... | z.
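The equivalences behind these shorthands can be spot-checked by comparing two patterns on every short string. This is only a sketch with Python's `re` module: `same_language` is a hypothetical helper, the alphabet {a, b, c, d} and the length bound 4 are arbitrary choices, and agreement on a finite sample is evidence, not a proof. Python writes character classes without separators, e.g. `[abc]`.

```python
import re
from itertools import product

# Every string over {a, b, c, d} of length 0..4 (341 strings).
WORDS = ["".join(p) for k in range(5) for p in product("abcd", repeat=k)]

def same_language(p1, p2, words=WORDS):
    """True if both patterns accept exactly the same strings among `words`."""
    return all(
        (re.fullmatch(p1, w) is None) == (re.fullmatch(p2, w) is None)
        for w in words
    )

print(same_language(r"(ab)+", r"(ab)(ab)*"))  # R+ ≡ RR*     -> True
print(same_language(r"(ab)?", r"(|ab)"))      # R? ≡ ε | R   -> True
print(same_language(r"[abc]", r"a|b|c"))      # class        -> True
print(same_language(r"[a-c]", r"a|b|c"))      # range        -> True
print(same_language(r"a+", r"a*"))            # differ on ε  -> False
```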



REGULAR DEFINITIONS
֎ Regular Definitions are used to give names to regular Expressions
and then to re-use these names to build new Regular Expressions.
֎ A Regular Definition is a sequence of definitions of the form:
D1 → R1
D2 → R2
...
Dn → Rn
֎ Where each Di is a distinct name and each Ri is a Regular Expression
over the extended alphabet V ∪ {D1,D2,...,Di−1}.
֎ Note: Such names for Regular Expressions will often be the Tokens
returned by the Lexical Analyzer. As a convention, names are printed
in boldface.



REGULAR DEFINITION EXAMPLE
Example 1.
Identifiers are usually strings of letters and digits beginning with a letter:
letter → A | B |...| Z | a | b | ... | z
digit → 0 | 1 |···| 9
id → letter(letter | digit)∗
Using Character Classes we can define identifiers as:
id → [A−Za−z][A−Za−z0−9]∗
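A minimal sketch of how this definition could be tried with Python's `re` module follows. The variable names `LETTER`, `DIGIT`, `ID`, `ID_CLASSES` and the function `is_identifier` are illustrative and not from the slides. Building `ID` by string substitution mirrors how a regular definition reuses earlier names.

```python
import re

LETTER = "[A-Za-z]"                     # letter → A | B | ... | Z | a | ... | z
DIGIT = "[0-9]"                         # digit  → 0 | 1 | ... | 9
ID = f"{LETTER}({LETTER}|{DIGIT})*"     # id     → letter(letter | digit)*
ID_CLASSES = "[A-Za-z][A-Za-z0-9]*"     # the same language with character classes

def is_identifier(text, pattern=ID):
    """True if the whole of `text` matches the identifier pattern."""
    return re.fullmatch(pattern, text) is not None

for word in ["x", "count1", "A9z", "9lives", "", "a_b"]:
    # Both definitions should always agree.
    print(repr(word), is_identifier(word), is_identifier(word, ID_CLASSES))
```

The last three inputs are rejected: the first starts with a digit, the second is empty, and the third contains an underscore, which this definition does not allow.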



REGULAR DEFINITION EXAMPLE
Example 2.
Numbers are usually strings such as 5230, 3.14, 6.45E4, 1.84E-4.
digit → 0 | 1 |···| 9
digits → digit+
optional-fraction → (.digits)?
optional-exponent → (E(+ |−)?digits)?
num → digits optional-fraction optional-exponent
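The same chain of definitions can be written down almost line by line in Python. This is a sketch under stated assumptions: the underscore names replace the hyphenated ones from the slide, `is_num` is a hypothetical helper, and the ASCII characters `+` and `-` are used for the signs of the exponent.

```python
import re

digit = "[0-9]"
digits = f"{digit}+"
optional_fraction = rf"(\.{digits})?"        # the dot must be escaped in `re`
optional_exponent = rf"(E[+-]?{digits})?"
num = f"{digits}{optional_fraction}{optional_exponent}"

def is_num(text):
    """True if the whole of `text` is a number according to `num`."""
    return re.fullmatch(num, text) is not None

for s in ["5230", "3.14", "6.45E4", "1.84E-4"]:
    print(s, is_num(s))   # each of these prints True
for s in ["3.", ".5", "1E", "E4"]:
    print(s, is_num(s))   # each of these prints False
```

The rejected strings show what the definition requires: at least one digit before an optional fraction, and at least one digit after a dot or an E.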



RE VS RG
֎ Every Language captured by a Regular Expression can also be captured by
a Regular Grammar (Type 3 Grammar).
֎ Regular Expressions are a notational variant of Regular Grammars:
Usually they give a more compact representation.
֎ Example. The Regular Expression for numbers can be captured by
a Regular Grammar with the following Productions (num is the
start symbol and digit is a terminal symbol):
num → digit | digit Z
Z → digit | digit Z | . Frac-Exp | E Exp-Num
Frac-Exp → digit | digit Frac-Exp | digit Exp
Exp → E Exp-Num
Exp-Num → +Digits |−Digits | digit | digit Digits
Digits → digit | digit Digits
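To see that this grammar accepts the same strings as the regular definition of num, the productions can be followed symbol by symbol. The sketch below is an illustration and not the lecturer's method. It stores each production A → aB as the pair `(a, "B")` and each A → a as `(a, None)`, writes `"d"` for the terminal digit, and uses the ASCII `-` for the minus sign. The names `GRAMMAR`, `terminal_of` and `derives` are hypothetical.

```python
# Right-regular productions: A → aB is (a, "B"), A → a is (a, None).
# "d" stands for the terminal `digit` (any of 0..9).
GRAMMAR = {
    "num":      [("d", None), ("d", "Z")],
    "Z":        [("d", None), ("d", "Z"), (".", "Frac-Exp"), ("E", "Exp-Num")],
    "Frac-Exp": [("d", None), ("d", "Frac-Exp"), ("d", "Exp")],
    "Exp":      [("E", "Exp-Num")],
    "Exp-Num":  [("+", "Digits"), ("-", "Digits"), ("d", None), ("d", "Digits")],
    "Digits":   [("d", None), ("d", "Digits")],
}

def terminal_of(ch):
    """Map an input character to the terminal symbol used in GRAMMAR."""
    return "d" if ch in "0123456789" else ch

def derives(text, start="num"):
    """True if `start` derives `text`.

    `current` holds every non-terminal that could be waiting to be expanded
    after the characters read so far; None marks a finished derivation.
    """
    current = {start}
    for ch in text:
        symbol = terminal_of(ch)
        current = {
            follower
            for nt in current if nt is not None
            for terminal, follower in GRAMMAR[nt]
            if terminal == symbol
        }
    return None in current

for s in ["5230", "3.14", "6.45E4", "1.84E-4", "3.", "E4"]:
    print(s, derives(s))   # True, True, True, True, False, False
```

Tracking a set of non-terminals is needed because several productions of the same non-terminal start with digit, so the derivation cannot be chosen deterministically from one character.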



FINITE AUTOMATA
֎ We need a mechanism to recognize the Languages denoted by Regular Expressions.
֎ While Regular Expressions are a specification language, Finite
Automata are their implementation.
֎ Given an input string, x, and a Regular Language, L, they answer
“yes” if x ∈ L and “no” otherwise.



DETERMINISTIC FINITE AUTOMATA
A Deterministic Finite Automaton, DFA for short, is a tuple:
A = (S, V, δ, s0, F):
֎ S is a finite non empty set of states;
֎ V is the input symbol alphabet;
֎ δ : S × V → S is a total function called the Transition Function;
֎ s0 ∈ S is the initial state;
֎ F ⊆ S is the set of final states.



TRANSITION GRAPH
֎ A DFA can be represented by Transition Graphs where the nodes
are the states and each labeled edge represents the transition
function.
֎ The initial state has an input arc marked start. Final states are
indicated by double circles.
֎ Example: DFA that accepts strings in the Language L((a|b)∗abb)



TRANSITION TABLE
֎ Transition Tables implement transition graphs, and thus Automata.
֎ A Transition Table has a row for each state and a column for each
input symbol.
֎ The value of the cell (si,aj) is the state that can be reached from state
si with input aj.
֎ Example: The table implementing the previous transition graph will
have 4 rows and 2 columns, let us call the table δ, then:
δ(0,a) = 1 δ(0,b) = 0
δ(1,a) = 1 δ(1,b) = 2
δ(2,a) = 1 δ(2,b) = 3
δ(3,a) = 1 δ(3,b) = 0
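The table above is enough to run the automaton. The following sketch stores δ as a Python dictionary. Taking state 0 as the initial state and state 3 as the only final state is an assumption, since the transition graph is not reproduced in this text, but it is the choice that makes the table accept L((a|b)∗abb). The names `DELTA`, `START`, `FINAL` and `dfa_accepts` are illustrative.

```python
# Transition table δ for the DFA of (a|b)*abb, copied from the slide.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START = 0
FINAL = {3}   # assumption: state 3 is the only final state

def dfa_accepts(text):
    """Run the DFA on `text`: one table lookup per input symbol."""
    state = START
    for ch in text:
        if (state, ch) not in DELTA:
            return False          # symbol outside the alphabet {a, b}
        state = DELTA[(state, ch)]
    return state in FINAL

for s in ["abb", "aabb", "babb", "ab", "abba", ""]:
    print(repr(s), dfa_accepts(s))   # True, True, True, False, False, False
```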



NONDETERMINISTIC FINITE AUTOMATA
A Nondeterministic Finite Automaton, NFA for short, is a tuple:
A = (S, V, δ, s0, F):
֎ S is a finite non empty set of states;
֎ V is the input symbol alphabet;
֎ δ : S × (V ∪ {ε}) → 2^S is a total function called the Transition Function;
it maps a state and an input symbol (or ε) to a set of possible next states;
֎ s0 ∈ S is the initial state;
֎ F ⊆ S is the set of final states.



NFA EXAMPLE
Given an input string and an NFA there will be, in general, more than one
path that can be followed: An NFA accepts an input string if there is at
least one path ending in a final state.
Example. NFA that accepts strings in the Language L((a | b)∗abb).
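"At least one path ending in a final state" can be checked without trying the paths one by one: it is enough to keep the set of all states reachable after each input symbol. The sketch below assumes the usual four-state NFA for (a|b)∗abb (a loop on state 0 for both symbols, then a, b, b leading to state 3). The figure is not reproduced in this text, so the exact transitions are an assumption, and the names `NFA_DELTA` and `nfa_accepts` are illustrative.

```python
# Assumed NFA for (a|b)*abb: δ maps (state, symbol) to a SET of states.
NFA_DELTA = {
    (0, "a"): {0, 1},   # nondeterministic choice: stay in 0 or start matching abb
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
FINAL = {3}

def nfa_accepts(text):
    """Follow all paths at once by tracking the set of reachable states."""
    current = {START}
    for ch in text:
        # Missing table entries mean "no transition", i.e. the empty set.
        current = {t for s in current for t in NFA_DELTA.get((s, ch), set())}
    return bool(current & FINAL)

for s in ["abb", "aabb", "ababb", "ab", "abba", ""]:
    print(repr(s), nfa_accepts(s))   # True, True, True, False, False, False
```

This NFA has no ε-transitions. An automaton with ε-edges would additionally need an ε-closure step after each move.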



DFA VS NFA
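֎ DFAs and NFAs recognize exactly the same class of Languages, the Regular
Languages, i.e., the Languages denoted by Regular Expressions:
֎ L(NFA) = L(DFA)
֎ The main difference is a Space vs. Time tradeoff:
֎ A DFA is faster: it follows a single transition per input symbol, whereas
simulating an NFA must track a set of states;
֎ A DFA can be bigger: in the worst case exponentially larger than an
equivalent NFA.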



FROM RE TO NFA
֎ To convert a regular expression to an NFA, use Thompson’s construction.
֎ Given an RE, say r, Thompson’s construction generates an NFA
accepting L(r).
֎ Thompson’s construction is a recursive procedure guided by the
structure of the regular expression.



FROM RE TO NFA
֎ The NFA resulting from Thompson’s construction has important
properties:
֎ It is an 𝜀-NFA: the automaton can make a transition without
consuming an input symbol, i.e., it can change state non-deterministically.
֎ It has exactly one final state.
֎ No edge enters the start state.
֎ No edge leaves the final state.



FROM RE TO NFA
֎ The algorithm for converting a regular expression to an NFA is:
֎ Input: A regular expression R
֎ Output: NFA accepting language denoted by R
Method:
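For ε, the NFA is:
[Figure: a start state joined to a final state by a single edge labelled ε]

For a, the NFA is:
[Figure: a start state joined to a final state by a single edge labelled a]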



FROM RE TO NFA
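For a+b (also written a|b), the NFA is:
[Figure: ε-edges lead from a new start state to an a-edge and to a b-edge; ε-edges from both lead to a new final state]

For ab, the NFA is:
[Figure: an a-edge followed by a b-edge]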



FROM RE TO NFA
For a*, the NFA is:
[Figure: NFA for a*; edge labels recovered from the slide: ε, ε, a, ε]
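The cases of the construction translate naturally into a recursive function. The following Python sketch rests on several assumptions that are not in the slides. Regular expressions are given as nested tuples (`("sym", "a")`, `("eps",)`, `("cat", r, s)`, `("alt", r, s)`, `("star", r)`). The edge layout follows the standard textbook version of Thompson's construction, in which the closure uses four ε-edges. Acceptance is tested with a simple ε-closure simulation. All function names are illustrative.

```python
from itertools import count

EPS = None  # label of an ε-edge

def thompson(regex):
    """Build an ε-NFA for `regex`; returns (start, final, edges)."""
    fresh = count()          # generator of new state numbers
    edges = []               # list of (source, label, target)

    def build(node, start):
        """Attach the NFA for `node` at state `start`; return its final state."""
        kind = node[0]
        if kind in ("eps", "sym"):                 # base cases: a single edge
            final = next(fresh)
            edges.append((start, EPS if kind == "eps" else node[1], final))
            return final
        if kind == "cat":                          # final of r is the start of s
            return build(node[2], build(node[1], start))
        if kind == "alt":                          # ε-edges fan out and rejoin
            branch_finals = []
            for sub in node[1:]:
                sub_start = next(fresh)
                edges.append((start, EPS, sub_start))
                branch_finals.append(build(sub, sub_start))
            final = next(fresh)
            edges.extend((f, EPS, final) for f in branch_finals)
            return final
        if kind == "star":                         # enter, repeat, leave, skip
            sub_start = next(fresh)
            sub_final = build(node[1], sub_start)
            final = next(fresh)
            edges.extend([(start, EPS, sub_start), (sub_final, EPS, sub_start),
                          (sub_final, EPS, final), (start, EPS, final)])
            return final
        raise ValueError(f"unknown node kind: {kind!r}")

    start = next(fresh)
    return start, build(regex, start), edges

def eps_closure(states, edges):
    """All states reachable from `states` using ε-edges only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def accepts(nfa, text):
    """Simulate the ε-NFA on `text` by tracking the set of current states."""
    start, final, edges = nfa
    current = eps_closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = eps_closure(moved, edges)
    return final in current

# ((a.b)|c)* written as a nested tuple
example = ("star", ("alt", ("cat", ("sym", "a"), ("sym", "b")), ("sym", "c")))
nfa = thompson(example)
print([w for w in ["", "ab", "c", "abc", "cab", "a", "ba", "ac"] if accepts(nfa, w)])
# ['', 'ab', 'c', 'abc', 'cab']
```

Passing the start state into `build` is a design choice that keeps concatenation free of extra ε-edges: the final state of the first sub-automaton simply becomes the start state of the second.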



EXAMPLE
Construct NFA for the regular expression ((a.b)|c)*
Step 1: construct NFA for r1

[Figure: the expression ((a.b)|c)* with the subexpression a marked r1, and the NFA for r1: a single edge labelled a]



EXAMPLE
Construct NFA for the regular expression ((a.b)|c)*
Step 2: construct NFA for r2

[Figure: the subexpressions a and b of ((a.b)|c)* marked r1 and r2; the NFA for r2 is a single edge labelled b]



EXAMPLE
Construct NFA for the regular expression ((a.b)|c)*
Step 3: construct NFA for r3

[Figure: the subexpression a.b of ((a.b)|c)* marked r3, and the NFA for r3: an a-edge followed by a b-edge]



EXAMPLE
Construct NFA for the regular expression ((a.b)|c)*
Step 4: construct NFA for r4

[Figure: the subexpressions a.b and c of ((a.b)|c)* marked r3 and r4; the NFA for r4 is a single edge labelled c]



EXAMPLE
Construct NFA for the regular expression ((a.b)|c)*
Step 5: construct NFA for r5

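[Figure: the subexpression (a.b)|c marked r5, and the NFA for r5: ε-edges branch to the path a, b and to the c-edge, and ε-edges rejoin them]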



EXAMPLE
Construct NFA for the regular expression ((a.b)|c)*
Step 6: construct NFA for r5*
[Figure: the NFA for r5*, which adds the ε-edges of the closure around the NFA for r5]



EXAMPLE
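Construct NFA for the regular expression a(a+b)*bb
(here + denotes alternation, i.e., a+b is a|b)

[Figure: NFA for a(a+b)*bb: an a-edge, then the ε-connected closure of a|b, then two b-edges]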






THIS IS ENOUGH!
Any questions?
Suggestions?
