0% found this document useful (0 votes)
58 views11 pages

Two Issues in Lexical Analysis

This document discusses lexical analysis and regular expressions. It covers two main topics: 1) Recognizing tokens specified by regular expressions using non-deterministic finite automata (NFA). An NFA can be constructed from a regular expression and used to recognize strings in the language in O(|S|^2|X|) time, where |S| is the number of NFA states and |X| is the length of the input string. 2) Converting an NFA to a deterministic finite automaton (DFA) to improve the token recognition time complexity to O(|X|). The conversion algorithm marks states in the DFA as it is constructed from the NFA. For an NFA with

Uploaded by

deThirdKind
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
58 views11 pages

Two Issues in Lexical Analysis

This document discusses lexical analysis and regular expressions. It covers two main topics: 1) Recognizing tokens specified by regular expressions using non-deterministic finite automata (NFA). An NFA can be constructed from a regular expression and used to recognize strings in the language in O(|S|^2|X|) time, where |S| is the number of NFA states and |X| is the length of the input string. 2) Converting an NFA to a deterministic finite automaton (DFA) to improve the token recognition time complexity to O(|X|). The conversion algorithm marks states in the DFA as it is constructed from the NFA. For an NFA with

Uploaded by

deThirdKind
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 11

Two issues in lexical analysis

Specifying tokens (regular expression)


Identifying tokens specified by regular
expression.
How to recognize tokens specified by
regular expressions?
A recognizer for a language is a program that takes a
string x as input and answers yes if x is a sentence of
the language and no otherwise.
In the context of lexical analysis, given a string and a regular
expression, a recognizer of the language specified by the
regular expression answer yes if the string is in the
language.
A regular expression can be compiled into a recognizer
(automatically) by constructing a finite automata which can be
deterministic or non-deterministic.
Non-deterministic finite automata
(NFA)
A non-deterministic finite automata (NFA) is a mathematical
model that consists of: (a 5-tuple (Q, , , q 0, F )
a set of states Q
a set of input symbols
a transition function that maps state-symbol pairs to sets of
states.
A state q0 that is distinguished as the start (initial) state
A set of states F distinguished as accepting (final) states.
An NFA accepts an input string x if and only if there is some path
in the transition graph from the start state to some accepting state.
Show an NFA example (page 116, Figure 3.21).
An NFA is non-deterministic in that (1) same
character can label two or more transitions out of
one state (2) empty string can label transitions.
For example, here is an NFA that recognizes the
language ???. a

0 1 2 3
a b b
b
An NFA can easily implemented using a transition
table.
State a b
0 {0, 1} {0}
1 - {2}
2 - {3}
The algorithm that recognizes the language
accepted by NFA.
Input: an NFA (transition table) and a string x (terminated by eof).
output yes if accepted, no otherwise.

S = e-closure({s0});
a = nextchar;
while a != eof do begin
S = e-closure(move(S, a));
a := next char;
end
if (intersect (S, F) != empty) then return yes
else return no

Note: e-closure({S}) are the state that can be reached from states in S
through transitions labeled by the empty string.
Example: recognizing ababb from previous NFA
Example2: Use the example in Fig. 3.27 for recognizing ababb

Space complexity O(|S|), time complexity O(|S|^2|x|)??


Construct an NFA from a regular expression:
Input: A regular expression r over an alphabet
Output: An NFA N accepting L( r )
Algorithm (3.3, pages 122):
For , construct the NFA
For a in , construct the NFA
a
Let N(s) and N(t) be NFAs for regular s and t:
for s|t, construct the NFA N(s|t): N(s)
For st, construct the NFA N(st):
N(s) N(t) N(t)

For s*, construct the NFA N(s*):




N(s)

Example: r = (a|b)*abb.

Example: using algorithm 3.3 to construct


N( r ) for r = (ab | a)*b* | b.
Using NFA, we can recognize a token in O(|
S|^2|X|) time, we can improve the time
complexity by using deterministic finite
automaton instead of NFA.
An NFA is deterministic (a DFA) if
no transitions on empty-string
for each state S and an input symbol a, there is at
most one edge labeled a leaving S.
What is the time complexity to recognize a
token when a DFA is used?
Algorithm to convert an NFA to a DFA that accepts the
same language (algorithm 3.2, page 118)

initially e-closure(s0) is the only state in Dstates and it is unmarked


while there is an unmarked state T in Dstates do begin
mark T;
for each input symbol a do begin
U := e-closure(move(T, a));
if (U is not in Dstates) then
add U as an unmarked state to Dstates;
Dtran[T, a] := U;
end
end;

Initial state = e-closure(s0), Final state = ?


Example: page 120, fig 3.27.

Question:
for a NFA with |S| states, at most how many states can its
corresponding DFA have?

You might also like