Chapter 2 - 1 Lexical Analysis
[Figure: the scanner's output is a stream of “tokens”.]
Developing a Scanner
public class Scanner {
    ...
        case 'a': case 'b': ... case 'z':
        case 'A': case 'B': ... case 'Z':
            // scan Letter (Letter | Digit)*
            acceptIt();
            while (isLetter(currentChar) || isDigit(currentChar))
                acceptIt();
            return Token.IDENTIFIER;
        case '0': ... case '9':
            ...
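The identifier rule above can be tried out in a self-contained form. The following is a minimal sketch, not the Triangle scanner: the class name `MiniScanner`, the token kinds `EOT` and `ERROR`, the `END` sentinel and the helper `skipIt` are assumptions made for this illustration, and an if-chain stands in for the `switch` on the slide. Only `acceptIt`, `isLetter`, `isDigit` and `currentChar` are names taken from the slide.

```java
// Hypothetical, simplified scanner for identifiers and integer literals.
class MiniScanner {
    static final byte IDENTIFIER = 0, INTLITERAL = 1, EOT = 2, ERROR = 3;
    private static final char END = '\u0000'; // assumed sentinel for end of input

    private final String source;
    private int pos = 0;
    private char currentChar;
    private final StringBuilder spelling = new StringBuilder();

    MiniScanner(String source) {
        this.source = source;
        this.currentChar = charAt(0);
    }

    private char charAt(int i) {
        return i < source.length() ? source.charAt(i) : END;
    }

    // Advance to the next character without recording the current one.
    private void skipIt() {
        pos++;
        currentChar = charAt(pos);
    }

    // Record the current character in the token spelling, then advance.
    private void acceptIt() {
        spelling.append(currentChar);
        skipIt();
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Scan one token and return its kind; its text is available via lastSpelling().
    byte scanToken() {
        while (currentChar == ' ') skipIt();
        spelling.setLength(0);
        if (isLetter(currentChar)) {
            // scan Letter (Letter | Digit)*
            acceptIt();
            while (isLetter(currentChar) || isDigit(currentChar)) acceptIt();
            return IDENTIFIER;
        }
        if (isDigit(currentChar)) {
            // scan Digit Digit*
            acceptIt();
            while (isDigit(currentChar)) acceptIt();
            return INTLITERAL;
        }
        if (currentChar == END) return EOT;
        acceptIt(); // consume the unexpected character so scanning can continue
        return ERROR;
    }

    String lastSpelling() {
        return spelling.toString();
    }
}
```

Calling `scanToken()` repeatedly on `new MiniScanner("x1 42")` yields an identifier, then an integer literal, then `EOT`.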
The scanner will return instances of Token:
public class Token {
    byte kind; String spelling;
    final static byte
        IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2,
        BEGIN = 3, CONST = 4, ...
    ...
}
The scanner will return instances of Token. The implementation below
is the one in the Triangle source code.
[Figure: example FSM with transitions labelled M, r and s; the legend distinguishes the initial state, final states and non-final states.]
Deterministic, and non-deterministic FA
• An FA is called deterministic (acronym: DFA) if for
every state and every possible input symbol, there
is only one possible transition to choose from.
Otherwise it is called non-deterministic (NDFA or
NFA).
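The definition can be checked mechanically. The sketch below assumes a hypothetical encoding in which each transition is a `{fromState, inputSymbol, toState}` triple of strings that do not contain `/`; the class and method names are made up for this illustration. It only detects a state with two transitions on the same symbol (a repeated identical triple counts as well); it does not check that every state has a transition for every symbol.

```java
import java.util.HashSet;
import java.util.Set;

class DeterminismCheck {
    // Returns false as soon as some (state, symbol) pair has a second transition.
    static boolean isDeterministic(String[][] transitions) {
        Set<String> seen = new HashSet<>();
        for (String[] t : transitions) {
            String key = t[0] + "/" + t[1]; // (fromState, inputSymbol)
            if (!seen.add(key)) {
                return false; // a choice between two transitions exists
            }
        }
        return true;
    }
}
```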
Q: Is this FSM deterministic or non-deterministic?
[Figure: the example FSM with transitions labelled M, r and s.]
• Theorem: every NDFA can be converted into an
equivalent DFA.
[Figure: the example NDFA (transitions labelled M, r and s) next to the question of which DFA is equivalent to it.]
Algorithm:
The basic idea: the DFA is defined as a machine that does a “parallel
simulation” of the NDFA.
• The states of the DFA are subsets of the states of the NDFA
(i.e. every state of the DFA is a set of states of the NDFA).
=> Such a state can be interpreted as meaning “the simulated
NDFA is now in one of these states”.
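The “parallel simulation” idea is commonly implemented as a worklist algorithm (the subset construction). The following is a minimal sketch under assumptions of this illustration: states are integers, the NDFA has no ε-moves, transitions are stored as `state -> symbol -> set of successor states`, and the class and method names are hypothetical.

```java
import java.util.*;

class SubsetConstruction {
    // Builds the reachable part of the DFA; each DFA state is a set of NDFA states.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> toDfa(
            Map<Integer, Map<Character, Set<Integer>>> nfa, int start) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new HashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> startSet = new TreeSet<>(List.of(start));
        dfa.put(startSet, new HashMap<>());
        work.add(startSet);
        while (!work.isEmpty()) {
            Set<Integer> current = work.remove();
            // For each symbol, collect every NDFA state reachable from any member.
            Map<Character, Set<Integer>> moves = new HashMap<>();
            for (int s : current) {
                for (Map.Entry<Character, Set<Integer>> e
                        : nfa.getOrDefault(s, Map.of()).entrySet()) {
                    moves.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                         .addAll(e.getValue());
                }
            }
            for (Map.Entry<Character, Set<Integer>> e : moves.entrySet()) {
                Set<Integer> target = e.getValue();
                dfa.get(current).put(e.getKey(), target);
                if (!dfa.containsKey(target)) { // newly discovered DFA state
                    dfa.put(target, new HashMap<>());
                    work.add(target);
                }
            }
        }
        return dfa;
    }

    // A DFA state is final if it contains at least one final NDFA state.
    static boolean isFinal(Set<Integer> dfaState, Set<Integer> nfaFinals) {
        for (int s : dfaState) {
            if (nfaFinals.contains(s)) return true;
        }
        return false;
    }
}
```

For a made-up NDFA with transitions 1 -a-> {1, 2} and 2 -b-> {3}, starting in state 1, the sketch produces the DFA states {1}, {1, 2} and {3}. Only subsets that are actually reachable are created, which in practice is usually far fewer than all possible subsets.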
Conversion algorithm example:
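[Figure: example NDFA with states numbered 1 to 4 and transitions labelled M, r and s, together with the DFA constructed from it.]
{3,4} is a final state because 3 is a final state.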
FA with ε-moves
Theorem: every (N)DFA-ε can be converted into an equivalent
NDFA (without ε-moves).
[Figure: example FA containing ε-moves, with transitions labelled M and r.]
Algorithm:
1) Converting states into final states:
if a final state can be reached from
a state S using an ε-transition,
convert S into a final state.
Repeat this rule until no more states can be converted.
For example:
[Figure: a chain of ε-transitions leading to a final state; the states along the chain are converted into final states in turn.]
2) Adding transitions (repeat until no more can be added):
a) for every transition t followed by an ε-transition,
add a transition t that leads directly to the state the ε-transition reaches.
[Figure: a transition t followed by an ε-transition, with the added transition t.]
3) Delete all ε-transitions.
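Removing ε-moves can also be sketched in code. The version below does not apply the rules one transition at a time; it uses the ε-closure formulation, a standard alternative that reaches the same goal: a state becomes final if its ε-closure contains a final state, and it receives every non-ε transition of every state in its closure. The integer state encoding, the `EPS` marker and all names are assumptions of this sketch.

```java
import java.util.*;

class EpsilonRemoval {
    static final char EPS = '\u03B5'; // assumed label for ε-transitions

    // All states reachable from s using only ε-transitions, including s itself.
    static Set<Integer> closure(Map<Integer, Map<Character, Set<Integer>>> fa, int s) {
        Set<Integer> seen = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        seen.add(s);
        work.add(s);
        while (!work.isEmpty()) {
            int q = work.remove();
            for (int r : fa.getOrDefault(q, Map.of()).getOrDefault(EPS, Set.of())) {
                if (seen.add(r)) work.add(r);
            }
        }
        return seen;
    }

    // Returns an ε-free transition map; the set finals is extended in place.
    static Map<Integer, Map<Character, Set<Integer>>> remove(
            Map<Integer, Map<Character, Set<Integer>>> fa,
            Set<Integer> states, Set<Integer> finals) {
        Map<Integer, Map<Character, Set<Integer>>> out = new HashMap<>();
        for (int p : states) {
            Map<Character, Set<Integer>> row = new HashMap<>();
            for (int q : closure(fa, p)) {
                // A final state is reachable via ε-moves only: p becomes final.
                if (finals.contains(q)) finals.add(p);
                // Copy every non-ε transition of q to p; ε-transitions are dropped.
                for (Map.Entry<Character, Set<Integer>> e
                        : fa.getOrDefault(q, Map.of()).entrySet()) {
                    if (e.getKey() != EPS) {
                        row.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                           .addAll(e.getValue());
                    }
                }
            }
            out.put(p, row);
        }
        return out;
    }
}
```

The resulting automaton may contain states that are no longer reachable; they are harmless and can be pruned afterwards.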
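Converting a RE into an NDFA-ε
RE: ε
FA: [Figure: a single ε-transition.]
RE: t
FA: [Figure: a single transition labelled t.]
RE: XY
FA: [Figure: the FA for X connected to the FA for Y by an ε-transition.]
RE: X|Y
FA: [Figure: two parallel branches, ε X ε and ε Y ε.]
RE: X*
FA: [Figure: the FA for X inside a loop built from ε-transitions.]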
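The five cases can be written as one small builder method each. The sketch below follows the usual Thompson-style construction; the exact placement of ε-transitions, especially for X*, may differ from the diagrams on the slides. The class `Thompson`, the fragment type `Frag`, the `EPS` marker and the `accepts` helper (which simulates the NDFA-ε directly) are all assumptions of this illustration, and each fragment is meant to be used only once.

```java
import java.util.*;

class Thompson {
    static final char EPS = '\u03B5'; // assumed label for ε-transitions

    // A partial automaton with one entry state and one exit state.
    static final class Frag {
        final int start, end;
        Frag(int start, int end) { this.start = start; this.end = end; }
    }

    final List<int[]> edges = new ArrayList<>(); // each edge is {from, label, to}
    private int nextState = 0;

    private int newState() { return nextState++; }
    private void edge(int from, char label, int to) {
        edges.add(new int[] {from, label, to});
    }

    Frag symbol(char t) {                 // RE: t
        int s = newState(), e = newState();
        edge(s, t, e);
        return new Frag(s, e);
    }

    Frag epsilon() { return symbol(EPS); } // RE: ε

    Frag concat(Frag x, Frag y) {         // RE: XY
        edge(x.end, EPS, y.start);
        return new Frag(x.start, y.end);
    }

    Frag alt(Frag x, Frag y) {            // RE: X|Y
        int s = newState(), e = newState();
        edge(s, EPS, x.start); edge(x.end, EPS, e);
        edge(s, EPS, y.start); edge(y.end, EPS, e);
        return new Frag(s, e);
    }

    Frag star(Frag x) {                   // RE: X*
        int s = newState(), e = newState();
        edge(s, EPS, x.start);     // enter X
        edge(x.end, EPS, x.start); // repeat X
        edge(x.end, EPS, e);       // leave after at least one X
        edge(s, EPS, e);           // zero repetitions
        return new Frag(s, e);
    }

    // Extends a set of states with everything reachable through ε-transitions.
    private Set<Integer> closure(Set<Integer> states) {
        Deque<Integer> work = new ArrayDeque<>(states);
        Set<Integer> seen = new HashSet<>(states);
        while (!work.isEmpty()) {
            int q = work.remove();
            for (int[] e : edges) {
                if (e[0] == q && e[1] == EPS && seen.add(e[2])) work.add(e[2]);
            }
        }
        return seen;
    }

    // Simulates the NDFA-ε on input (which must not contain the EPS character).
    boolean accepts(Frag f, String input) {
        Set<Integer> current = closure(Set.of(f.start));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int[] e : edges) {
                if (e[1] == c && current.contains(e[0])) next.add(e[2]);
            }
            current = closure(next);
        }
        return current.contains(f.end);
    }
}
```

With `a` standing in for any letter and `0` for any digit, the identifier pattern Letter (Letter | Digit)* is built as `t.concat(t.symbol('a'), t.star(t.alt(t.symbol('a'), t.symbol('0'))))`.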
FA and the implementation of Scanners
• Regular expressions, (N)DFA-ε, NDFAs and DFAs
are all equivalent formalisms in terms of what
languages can be defined with them.
• Regular expressions are a convenient notation for
describing the “tokens” of programming
languages.
• Regular expressions can be converted into FAs
(the algorithm for conversion into NDFA-ε is
straightforward).
• DFAs can be easily implemented as computer
programs.
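One common way to implement a DFA as a program is a transition table indexed by state and character class. The sketch below recognises identifiers of the form Letter (Letter | Digit)*; the class name, the state numbering and the explicit dead state are assumptions made for this illustration.

```java
class IdentifierDfa {
    // States: 0 = start, 1 = inside an identifier (final), 2 = dead state.
    // Columns: 0 = letter, 1 = digit, 2 = any other character.
    private static final int[][] NEXT = {
        {1, 2, 2}, // from start: a letter begins an identifier
        {1, 1, 2}, // inside identifier: letters and digits continue it
        {2, 2, 2}, // dead state: no way back
    };

    private static int column(char c) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 0;
        if (c >= '0' && c <= '9') return 1;
        return 2;
    }

    // Runs the DFA over the whole input; exactly one table lookup per character.
    static boolean accepts(String input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            state = NEXT[state][column(input.charAt(i))];
        }
        return state == 1;
    }
}
```

Because the automaton is deterministic, the loop never has to backtrack or track several states at once, which is what makes the implementation this short.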
FA and the implementation of Scanners