Implementation of The Regular Expression

- Regular expressions can be used to specify lexical structures and partition input into tokens.
- Finite automata are used to implement regular expressions and recognize regular languages. They consist of states, transitions between states based on input symbols, a start state, and accepting states.
- Regular expressions are first converted to non-deterministic finite automata (NFAs), which are then converted to deterministic finite automata (DFAs) for the implementation of a lexical analyzer.

Uploaded by

Param Ahir


Implementation of Lexical Analysis

Outline

• Specifying lexical structure using regular expressions

• Finite automata
– Deterministic Finite Automata (DFAs)
– Non-deterministic Finite Automata (NFAs)

• Implementation of regular expressions

RegExp ⇒ NFA ⇒ DFA ⇒ Tables

Compiler Design 1 (2011) 2

Notation

• For convenience, we use a variation of regular expression notation (allowing user-defined abbreviations)

• Union: A + B ≡ A|B
• Option: A + ε ≡ A?
• Range: ‘a’ + ‘b’ + … + ‘z’ ≡ [a-z]
• Excluded range: complement of [a-z] ≡ [^a-z]

Regular Expressions in Lexical Specification

• Last lecture: a specification for the predicate s ∈ L(R)
• But a yes/no answer is not enough!
• Instead: partition the input into tokens
• We will adapt regular expressions to this goal
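The abbreviations on the Notation slide map directly onto the syntax of most regex libraries. The sketch below uses Python's `re` module to check membership s ∈ L(R) for each abbreviation; the helper name `in_language` is an assumption made for illustration and is not part of the slides.

```python
import re

def in_language(pattern: str, s: str) -> bool:
    """Return True if the whole string s is in L(pattern)."""
    return re.fullmatch(pattern, s) is not None

# Union:          A + B  is written  A|B
union_ok = in_language(r"a|b", "b")

# Option:         A + ε  is written  A?
option_ok = in_language(r"a?", "") and in_language(r"a?", "a")

# Range:          'a' + 'b' + ... + 'z'  is written  [a-z]
range_ok = in_language(r"[a-z]", "q")

# Excluded range: complement of [a-z]  is written  [^a-z]
excluded_ok = in_language(r"[^a-z]", "Q") and not in_language(r"[^a-z]", "q")
```

`re.fullmatch` is used rather than `re.match` because the predicate s ∈ L(R) concerns the whole string, not just a prefix.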

Regular Expressions ⇒ Lexical Spec. (1)

1. Select a set of tokens
• Integer, Keyword, Identifier, OpenPar, ...

2. Write a regular expression (pattern) for the lexemes of each token
• Integer = digit⁺
• Keyword = ‘if’ + ‘else’ + …
• Identifier = letter (letter + digit)*
• OpenPar = ‘(’
• …

Regular Expressions ⇒ Lexical Spec. (2)

3. Construct R, matching all lexemes for all tokens

R = Keyword + Identifier + Integer + …
= R1 + R2 + R3 + …

Facts: If s ∈ L(R) then s is a lexeme
– Furthermore s ∈ L(Ri) for some “i”
– This “i” determines the token that is reported
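Steps 1 to 3 can be sketched in Python as follows. The slides elide the full keyword list and the remaining tokens, so this sketch assumes only the keywords ‘if’ and ‘else’ and ASCII letters and digits; the names `SPEC`, `R` and `token_of` are hypothetical.

```python
import re

# Steps 1 and 2: one pattern per token, in the order used to build R.
SPEC = [
    ("Keyword", r"if|else"),                  # assumed keyword list
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),  # letter (letter + digit)*
    ("Integer", r"[0-9]+"),                   # one or more digits
    ("OpenPar", r"\("),
]

# Step 3: R = R1 + R2 + R3 + ...
R = re.compile("|".join(f"(?:{pattern})" for _, pattern in SPEC))

def token_of(s: str):
    """If s is in L(R), return the name of the first Ri with s in L(Ri)."""
    if R.fullmatch(s) is None:
        return None  # s is not a lexeme
    for name, pattern in SPEC:
        if re.fullmatch(pattern, s):
            return name
    return None
```

For example, `token_of("x1")` reports `"Identifier"`, while `token_of("4x")` reports `None` because no single Ri matches the whole string.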

Regular Expressions ⇒ Lexical Spec. (3)

4. Let input be x1…xn
• (x1 ... xn are characters)
• For 1 ≤ i ≤ n check
x1…xi ∈ L(R) ?

5. It must be that
x1…xi ∈ L(Rj) for some j
(if there is a choice, pick the smallest such j)

6. Remove x1…xi from the input and go back to step 4

How to Handle Spaces and Comments?

1. We could create a token Whitespace
Whitespace = (‘ ’ + ‘\n’ + ‘\t’)⁺
– We could also add comments in there
– An input “ \t\n 5555 ” is transformed into Whitespace Integer Whitespace

2. Lexer skips spaces (preferred)
• Modify step 5 from before as follows:
It must be that xk ... xi ∈ L(Rj) for some j such that x1 ... xk-1 ∈ L(Whitespace)
• Parser is not bothered with spaces
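The two ways of handling whitespace can be compared with a small sketch. It assumes a specification containing only Whitespace and Integer, and the function names are hypothetical; a real lexer would carry the full rule list.

```python
import re

WHITESPACE = re.compile(r"[ \n\t]+")
INTEGER = re.compile(r"[0-9]+")

def lex_with_whitespace_token(text):
    """Option 1: Whitespace is reported like any other token."""
    rules = [("Whitespace", WHITESPACE), ("Integer", INTEGER)]
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m:
                tokens.append(name)
                pos = m.end()  # remove the lexeme and continue
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens

def lex_skipping_whitespace(text):
    """Option 2: skip a Whitespace prefix before matching each token."""
    tokens, pos = [], 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()  # x1...xk-1 is whitespace, so drop it
            continue
        m = INTEGER.match(text, pos)
        if m is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(("Integer", m.group()))
        pos = m.end()
    return tokens
```

With the input from the slide, the first function yields the sequence Whitespace, Integer, Whitespace, while the second yields a single Integer token, so the parser never sees the spaces.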

Ambiguities (1)

• There are ambiguities in the algorithm

• How much input is used? What if
• x1…xi ∈ L(R) and also
• x1…xK ∈ L(R)
– Rule: Pick the longest possible substring
– The “maximal munch”

Ambiguities (2)

• Which token is used? What if
• x1…xi ∈ L(Rj) and also
• x1…xi ∈ L(Rk)
– Rule: use rule listed first (j if j < k)

• Example:
– R1 = Keyword and R2 = Identifier
– “if” matches both
– Treats “if” as a keyword not an identifier
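Both disambiguation rules can be seen in a direct, deliberately naive sketch that tests every prefix against every rule. The rule list and the name `next_token` are assumptions for illustration; this is quadratic and is not how generated lexers work, but it follows the wording of the rules closely.

```python
import re

# Rules in priority order: R1 = Keyword is listed before R2 = Identifier.
RULES = [
    ("Keyword", re.compile(r"if|else")),
    ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("Integer", re.compile(r"[0-9]+")),
]

def next_token(text, pos=0):
    """Return (token, lexeme) for the longest matching prefix, or None."""
    best = None
    for end in range(pos + 1, len(text) + 1):  # every prefix x1...xi
        for name, pattern in RULES:
            if pattern.fullmatch(text, pos, end):
                # A longer prefix replaces a shorter one (maximal munch);
                # the break makes the first listed rule win at equal length.
                best = (name, text[pos:end])
                break
    return best
```

Here `next_token("if")` reports a Keyword, whereas `next_token("iffy")` reports the Identifier “iffy” because the longer match takes precedence over the keyword prefix.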

Error Handling

• What if no rule matches a prefix of input?
• Problem: Can’t just get stuck …
• Solution:
– Write a rule matching all “bad” strings
– Put it last
• Lexer tools allow the writing of:
R = R1 + ... + Rn + Error
– Token Error matches if nothing else matches

Summary

• Regular expressions provide a concise notation for string patterns
• Use in lexical analysis requires small extensions
– To resolve ambiguities
– To handle errors
• Good algorithms known (next)
– Require only single pass over the input
– Few operations per character (table lookup)
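A catch-all Error rule placed last might look like the sketch below. It assumes a small hypothetical rule set and models Error as “any single character”. Note that Python's `|` picks the first alternative that matches rather than the longest one, which is adequate for these particular rules but is not maximal munch in general.

```python
import re

RULES = [
    ("Integer", r"[0-9]+"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Whitespace", r"[ \t\n]+"),
    ("Error", r"."),  # listed last: matches only if nothing else does
]

# R = R1 + ... + Rn + Error, with one named group per rule.
R = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES),
    re.DOTALL,
)

def tokenize(text):
    """Return (token, lexeme) pairs, dropping whitespace."""
    tokens = []
    for m in R.finditer(text):
        if m.lastgroup != "Whitespace":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

Because Error accepts any character, the lexer never gets stuck: every input position is consumed by some rule, and the bad characters are reported as Error tokens for later diagnosis.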

Regular Languages & Finite Automata

Basic formal language theory result:
Regular expressions and finite automata both define the class of regular languages.

Thus, we are going to use:
• Regular expressions for specification
• Finite automata for implementation (automatic generation of lexical analyzers)

Finite Automata

A finite automaton is a recognizer for the strings of a regular language

A finite automaton consists of
– A finite input alphabet Σ
– A set of states S
– A start state n
– A set of accepting states F ⊆ S
– A set of transitions state →input state
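The five components of a finite automaton can be written down as a small record type. This is a minimal sketch: the class and field names are assumptions, transitions are stored as (state, input, state) triples, and ε-moves are not modelled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FiniteAutomaton:
    alphabet: frozenset     # Σ, the finite input alphabet
    states: frozenset       # S
    start: str              # n, the start state
    accepting: frozenset    # F, which must be a subset of S
    transitions: frozenset  # triples (state, input, state)

    def __post_init__(self):
        # Check the side conditions stated in the definition.
        if self.start not in self.states:
            raise ValueError("start state must be in S")
        if not self.accepting <= self.states:
            raise ValueError("F must be a subset of S")
        for src, symbol, dst in self.transitions:
            if src not in self.states or dst not in self.states:
                raise ValueError("transition uses an unknown state")
            if symbol not in self.alphabet:
                raise ValueError("transition uses a symbol outside Σ")

# Hypothetical two-state automaton over {0, 1} with one transition on 1.
example = FiniteAutomaton(
    alphabet=frozenset({"0", "1"}),
    states=frozenset({"A", "B"}),
    start="A",
    accepting=frozenset({"B"}),
    transitions=frozenset({("A", "1", "B")}),
)
```

Storing transitions as a set of triples, rather than a function, keeps the same record usable for both deterministic and non-deterministic automata.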

Finite Automata

• Transition
s1 →a s2
• Is read
In state s1 on input “a” go to state s2

• If end of input (or no transition possible)
– If in accepting state ⇒ accept
– Otherwise ⇒ reject

Finite Automata State Graphs

• A state
• The start state
• An accepting state
• A transition (labeled with its input, e.g. a)

[Figure: the graphical symbol for each of the four elements above]

A Simple Example

• A finite automaton that accepts only “1”

[Figure: state graph with a transition on 1]

Another Simple Example

• A finite automaton accepting any number of 1’s followed by a single 0
• Alphabet: {0,1}

[Figure: state graph with a transition on 1]
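The two example automata can be executed with a few lines of Python. The state names A and B are hypothetical, since the state graphs did not survive extraction. The sketch also makes one assumption explicit: when the automaton gets stuck before the end of the input, the string is rejected, which is the usual convention for recognizing a whole string.

```python
def accepts(transitions, start, accepting, text):
    """Run a DFA given as a dict mapping (state, symbol) to a state."""
    state = start
    for ch in text:
        nxt = transitions.get((state, ch))
        if nxt is None:
            return False  # stuck before end of input: reject (assumption)
        state = nxt
    return state in accepting  # end of input: accept iff in accepting state

# Accepts only "1".
ONLY_ONE = ({("A", "1"): "B"}, "A", {"B"})

# Accepts any number of 1's followed by a single 0.
ONES_THEN_ZERO = ({("A", "1"): "A", ("A", "0"): "B"}, "A", {"B"})
```

For instance, `accepts(*ONES_THEN_ZERO, "110")` holds, while `accepts(*ONES_THEN_ZERO, "1100")` does not, because no transition leaves B.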

And Another Example

• Alphabet {0,1}
• What language does this recognize?

[Figure: state graph with transitions on 0 and 1]

And Another Example

• Alphabet still { 0, 1 }

[Figure: state graph with two transitions on 1 leaving the same state]

• The operation of the automaton is not completely defined by the input
– On input “11” the automaton could be in either state

Epsilon Moves

• Another kind of transition: ε-moves

A →ε B

• Machine can move from state A to state B without reading input

Deterministic and Non-Deterministic Automata

• Deterministic Finite Automata (DFA)
– One transition per input per state
– No ε-moves
• Non-deterministic Finite Automata (NFA)
– Can have multiple transitions for one input in a given state
– Can have ε-moves
• Finite automata have finite memory
– Enough to only encode the current state
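Because ε-moves consume no input, an implementation usually needs the set of states reachable through ε-moves alone, often called the ε-closure (a term the slides do not use). A minimal sketch, with a hypothetical `eps_moves` mapping from each state to its ε-successors:

```python
def epsilon_closure(states, eps_moves):
    """Return every state reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for nxt in eps_moves.get(q, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

# Hypothetical chain A →ε B →ε C.
EPS_MOVES = {"A": {"B"}, "B": {"C"}}
reachable_from_a = epsilon_closure({"A"}, EPS_MOVES)
```

The closure always contains the starting states themselves, since taking zero ε-moves is allowed.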

Execution of Finite Automata

• A DFA can take only one path through the state graph
– Completely determined by input

• NFAs can choose
– Whether to make ε-moves
– Which of multiple transitions for a single input to take

Acceptance of NFAs

• An NFA can get into multiple states

[Figure: NFA state graph with transitions on 0 and 1]

• Input: 1 0 1

• Rule: NFA accepts an input if it can get in a final state
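The acceptance rule can be implemented by tracking the set of states the NFA could be in after each symbol. The NFA below is hypothetical (the one on the slide was a figure that did not survive): it accepts strings over {0,1} that end in 1, and it has no ε-moves, which keeps the sketch short.

```python
# (state, symbol) maps to the set of possible next states.
DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},  # two choices on input 1
}

def nfa_accepts(delta, start, accepting, text):
    """Accept if at least one sequence of choices ends in a final state."""
    current = {start}
    for ch in text:
        current = {dst for q in current for dst in delta.get((q, ch), ())}
    return bool(current & accepting)
```

On the input 1 0 1 the set of possible states evolves as {q0}, {q0, q1}, {q0}, {q0, q1}; since q1 is accepting, the input is accepted.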
NFA vs. DFA (1)

• NFAs and DFAs recognize the same set of languages (regular languages)

• DFAs are easier to implement
– There are no choices to consider

NFA vs. DFA (2)

• For a given language the NFA can be simpler than the DFA

[Figure: an NFA and a DFA for the same language, with transitions on 0 and 1]

• DFA can be exponentially larger than NFA

Regular Expressions to Finite Automata

• High-level sketch

[Figure: pipeline with the stages Lexical Specification, Regular expressions, NFA, DFA, Table-driven Implementation of DFA]

Regular Expressions to NFA (1)

• For each kind of reg. expr, define an NFA
– Notation: NFA for regular expression M

[Figure: box labeled M]

• For ε

[Figure: a single ε-transition]

• For input a

[Figure: a single transition on a]

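Regular Expressions to NFA (2)

• For AB

[Figure: the NFA for A joined to the NFA for B by an ε-move]

• For A + B

[Figure: the NFAs for A and B placed in parallel and connected by ε-moves]

Regular Expressions to NFA (3)

• For A*

[Figure: the NFA for A with additional ε-moves]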
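The constructions for ε, a single input, AB, A + B and A* can be sketched as functions that combine NFA fragments. This follows one common variant of the construction; the exact placement of ε-moves in the slide figures may differ, and all names here (`Frag`, `symbol`, `concat`, `union`, `star`) are hypothetical.

```python
from itertools import count

EPS = None        # label used for ε-moves in this sketch
_fresh = count()  # generator of fresh state numbers

class Frag:
    """An NFA fragment with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def epsilon():
    s, f = next(_fresh), next(_fresh)
    return Frag(s, f, [(s, EPS, f)])

def symbol(a):
    s, f = next(_fresh), next(_fresh)
    return Frag(s, f, [(s, a, f)])

def concat(A, B):  # AB
    return Frag(A.start, B.accept,
                A.edges + B.edges + [(A.accept, EPS, B.start)])

def union(A, B):   # A + B
    s, f = next(_fresh), next(_fresh)
    return Frag(s, f, A.edges + B.edges + [
        (s, EPS, A.start), (s, EPS, B.start),
        (A.accept, EPS, f), (B.accept, EPS, f)])

def star(A):       # A*
    s, f = next(_fresh), next(_fresh)
    return Frag(s, f, A.edges + [
        (s, EPS, A.start), (s, EPS, f),
        (A.accept, EPS, A.start), (A.accept, EPS, f)])

def closure(states, edges):
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    current = closure({nfa.start}, nfa.edges)
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = closure(moved, nfa.edges)
    return nfa.accept in current

# The regular expression (1+0)*1
nfa = concat(star(union(symbol("1"), symbol("0"))), symbol("1"))
```

Each combinator adds at most two states, so the NFA size stays linear in the size of the regular expression.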

Example of Regular Expression → NFA conversion

• Consider the regular expression
(1+0)*1
• The NFA is

[Figure: NFA with states A through J, transitions on 0 and 1, and ε-moves]

NFA to DFA. The Trick

• Simulate the NFA
• Each state of DFA
= a non-empty subset of states of the NFA
• Start state
= the set of NFA states reachable through ε-moves from NFA start state
• Add a transition S →a S’ to DFA iff
– S’ is the set of NFA states reachable from any state in S after seeing the input a
• considering ε-moves as well

NFA to DFA. Remark

• An NFA may be in many states at any time

• How many different states ?

• If there are N states, the NFA must be in some subset of those N states

• How many subsets are there?
– 2^N − 1 = finitely many

NFA to DFA Example

[Figure: the NFA for (1+0)*1 with states A through J, and the resulting DFA with states ABCDHI, FGABCDHI and EJGABCDHI, with transitions on 0 and 1]
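The construction from “NFA to DFA. The Trick” can be sketched as a worklist algorithm. The NFA figure did not survive extraction, so the edge list below is a reconstruction: it is an assumption chosen to be consistent with the state names A through J and with the DFA state labels ABCDHI, FGABCDHI and EJGABCDHI that appear in the example.

```python
from collections import deque

EPS = "ε"
# Assumed edges of the NFA for (1+0)*1 (reconstructed, see lead-in).
NFA_EDGES = [
    ("A", EPS, "B"), ("A", EPS, "H"), ("B", EPS, "C"), ("B", EPS, "D"),
    ("C", "1", "E"), ("D", "0", "F"), ("E", EPS, "G"), ("F", EPS, "G"),
    ("G", EPS, "A"), ("G", EPS, "H"), ("H", EPS, "I"), ("I", "1", "J"),
]

def closure(states):
    """All NFA states reachable from `states` through ε-moves."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in NFA_EDGES:
            if src == q and label == EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)

def move(states, a):
    """NFA states reachable from `states` by one transition on input a."""
    return {dst for src, label, dst in NFA_EDGES
            if src in states and label == a}

def subset_construction(nfa_start, alphabet):
    start = closure({nfa_start})
    dfa, queue = {}, deque([start])
    while queue:
        S = queue.popleft()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            T = closure(move(S, a))
            if T:  # DFA states are non-empty subsets only
                dfa[S][a] = T
                queue.append(T)
    return start, dfa

start, dfa = subset_construction("A", ["0", "1"])
```

Only subsets that are actually reachable are built, so this example produces three DFA states even though an NFA with ten states has 2^10 − 1 non-empty subsets.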

Implementation

• A DFA can be implemented by a 2D table T
– One dimension is “states”
– Other dimension is “input symbols”
– For every transition Si →a Sk define T[i,a] = k

• DFA “execution”
– If in state Si and input a, read T[i,a] = k and skip to state Sk
– Very efficient

Table Implementation of a DFA

[Figure: DFA with states S, T and U and transitions on 0 and 1]

      0   1
  S   T   U
  T   T   U
  U   T   U
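The table for the DFA with states S, T and U translates directly into a 2D array. The extracted text does not say which states are the start and accepting states; this sketch assumes S is the start state and U is the only accepting state, which would make the table a DFA for (1+0)*1.

```python
STATES = ["S", "T", "U"]  # row index i stands for state STATES[i]

# T[i][a] = k for every transition Si →a Sk; columns are inputs 0 and 1.
T = [
    [1, 2],  # S: on 0 go to T, on 1 go to U
    [1, 2],  # T: on 0 go to T, on 1 go to U
    [1, 2],  # U: on 0 go to T, on 1 go to U
]

START = 0        # assumption: S is the start state
ACCEPTING = {2}  # assumption: U is the only accepting state

def run(text):
    """Execute the DFA: one table lookup per input character."""
    state = START
    for ch in text:
        state = T[state][int(ch)]  # input symbols are the characters 0 and 1
    return state in ACCEPTING
```

Each input character costs a single indexed lookup, which is why the table representation is described as very efficient.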

Implementation (Cont.)

• NFA → DFA conversion is at the heart of tools such as lex, ML-Lex or flex

• But, DFAs can be huge

• In practice, lex/ML-Lex/flex-like tools trade off speed for space in the choice of NFA and DFA representations

Theory vs. Practice

Two differences:

• DFAs recognize lexemes. A lexer must return a type of acceptance (token type) rather than simply an accept/reject indication.

• DFAs consume the complete string and accept or reject it. A lexer must find the end of the lexeme in the input stream and then find the next one, etc.
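Both differences can be handled by a small loop around the DFA: label each accepting state with a token type, remember the last accepting position while scanning, and restart from there. The DFA, the character classes and the function names below are hypothetical and cover only Integer and Identifier; whitespace is skipped between tokens.

```python
def char_class(ch):
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    return "other"

# Hypothetical DFA; state 0 is the start state.
TRANSITIONS = {
    (0, "digit"): 1, (1, "digit"): 1,
    (0, "letter"): 2, (2, "letter"): 2, (2, "digit"): 2,
}
TOKEN_OF = {1: "Integer", 2: "Identifier"}  # token type per accepting state

def tokens(text):
    out, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, i, last = 0, pos, None
        while i < len(text):
            state = TRANSITIONS.get((state, char_class(text[i])))
            if state is None:
                break  # stuck: fall back to the last accepting position
            i += 1
            if state in TOKEN_OF:
                last = (i, TOKEN_OF[state])  # longest lexeme seen so far
        if last is None:
            raise ValueError(f"no token at position {pos}")
        end, kind = last
        out.append((kind, text[pos:end]))
        pos = end  # find the next lexeme from the end of this one
    return out
```

On the input `42abc` this sketch reports Integer “42” followed by Identifier “abc”, because the DFA gets stuck on the letter and falls back to the last accepting position.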
