Chapter 3 - Lexical Analysis and Lexical Analyzer Generators
[Figure: diagram with error outputs and a shared Symbol Table]
Attributes of Tokens
<id, “y”> <assign, > <num, 31> <+, > <num, 28> <*, > <id, “x”>
The lexical analyzer passes each token, together with its tokenval (the token attribute), to the parser.
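The token/attribute pairs above can be produced by a small scanner. The following is a minimal sketch in Python, not the implementation behind the slides: the input statement `y := 31 + 28*x`, the patterns in `TOKEN_SPEC`, and the function name `tokenize` are all assumptions chosen to reproduce the token stream shown.

```python
import re

# Hypothetical token patterns (assumed for illustration only).
TOKEN_SPEC = [
    ("num",    re.compile(r"[0-9]+")),
    ("id",     re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("assign", re.compile(r":=")),
    ("+",      re.compile(r"\+")),
    ("*",      re.compile(r"\*")),
    ("skip",   re.compile(r"\s+")),
]

def tokenize(text):
    """Return a list of (token, tokenval) pairs for the input text."""
    pos, tokens = 0, []
    while pos < len(text):
        for name, pattern in TOKEN_SPEC:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        if name == "skip":
            continue                      # whitespace produces no token
        if name == "num":
            tokens.append((name, int(m.group())))   # attribute: numeric value
        elif name == "id":
            tokens.append((name, m.group()))        # attribute: the lexeme
        else:
            tokens.append((name, None))             # operators carry no attribute
    return tokens

pairs = tokenize("y := 31 + 28*x")
```

Here `pairs` holds one entry per token, with `None` standing in for the empty attribute of `assign`, `+`, and `*`.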
letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter ( letter | digit )*
• Regular definitions cannot be recursive: a definition that refers to itself is illegal
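Because no definition is recursive, each name can be expanded by plain textual substitution into a single regular expression. A minimal sketch, assuming Python's `re` syntax as a stand-in for the notation on the slide (the names `ident`, `ID_RE`, and `is_id` are hypothetical):

```python
import re

# Each definition only uses names defined before it, so substitution terminates.
letter = "[A-Za-z]"                       # letter -> A | B | ... | Z | a | ... | z
digit = "[0-9]"                           # digit  -> 0 | 1 | ... | 9
ident = f"{letter}({letter}|{digit})*"    # id     -> letter ( letter | digit )*

ID_RE = re.compile(ident)

def is_id(s):
    """True if the whole string s is matched by the expanded id definition."""
    return ID_RE.fullmatch(s) is not None
```

For example, `is_id("x1")` holds while `is_id("1x")` does not, since an id must start with a letter.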
lex.yy.c → C compiler → a.out
input stream → a.out → sequence of tokens
Lex Specification
• A lex specification consists of three parts:
regular definitions, C declarations in %{ %}
%%
translation rules
%%
user-defined auxiliary procedures
• The translation rules are of the form:
p1 { action1 }
p2 { action2 }
…
pn { actionn }
lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l
regular expressions → NFA → DFA (optional)
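Nondeterministic Finite Automata
• Definition: an NFA is a 5-tuple (S, Σ, δ, s0, F)
where S is a finite set of states, Σ is the input alphabet, δ is the transition function mapping state-symbol pairs to sets of states, s0 ∈ S is the start state, and F ⊆ S is the set of accepting states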
Transition Graph
• An NFA can be diagrammatically
represented by a labeled directed graph
called a transition graph
[Figure: transition graph with states 0, 1, 2, 3; start state 0; edges 0 →a 0, 0 →b 0, 0 →a 1, 1 →b 2, 2 →b 3]
S = {0,1,2,3}
Σ = {a,b}
s0 = 0
F = {3}
Transition Table
• The mapping δ of an NFA can be represented in a transition table
δ(0,a) = {0,1}
δ(0,b) = {0}
δ(1,b) = {2}
δ(2,b) = {3}

State   Input a   Input b
0       {0,1}     {0}
1                 {2}
2                 {3}
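The transition table translates directly into a lookup structure. The sketch below, in Python, stores the table as a dictionary and simulates the NFA by tracking the set of states it could be in; the names `DELTA` and `nfa_accepts` are hypothetical, and the simulation assumes an NFA without ε-moves, as in this example.

```python
# Transition table of the example NFA: (state, symbol) -> set of next states.
# Missing entries mean "no move" (the empty set).
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0          # s0 = 0
ACCEPTING = {3}    # F = {3}

def nfa_accepts(word):
    """Simulate the NFA on word by following all possible moves at once."""
    current = {START}
    for symbol in word:
        current = {t for s in current for t in DELTA.get((s, symbol), set())}
    return bool(current & ACCEPTING)
```

With this table, `nfa_accepts("aabb")` is true and `nfa_accepts("abba")` is false: the automaton accepts exactly the strings over {a,b} that end in abb.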
Subset construction (optional): NFA → DFA
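To make the subset construction concrete, here is a minimal Python sketch applied to the four-state example NFA with δ(0,a) = {0,1}, δ(0,b) = {0}, δ(1,b) = {2}, δ(2,b) = {3}. It assumes an NFA without ε-transitions; the general algorithm additionally takes ε-closures at each step. The function and variable names are hypothetical.

```python
from collections import deque

def subset_construction(delta, start, accepting, alphabet):
    """Build a DFA whose states are sets of NFA states (no ε-moves assumed)."""
    start_set = frozenset({start})
    dstates, dtran = {start_set}, {}
    work = deque([start_set])                 # DFA states still to be processed
    while work:
        T = work.popleft()
        for a in alphabet:
            # All NFA states reachable from some state in T on symbol a.
            U = frozenset(t for s in T for t in delta.get((s, a), ()))
            if not U:
                continue                      # no move: leave the entry undefined
            dtran[T, a] = U
            if U not in dstates:
                dstates.add(U)
                work.append(U)
    daccept = {T for T in dstates if T & accepting}
    return dstates, dtran, start_set, daccept

NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
dstates, dtran, d_start, d_accept = subset_construction(NFA_DELTA, 0, {3}, "ab")
```

For this NFA the construction yields four DFA states, {0}, {0,1}, {0,2}, and {0,3}, of which only {0,3} is accepting.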
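[Figure: NFA fragments, each with start state i and accepting state f]
a        i →a f
r1 | r2  from i to f through either N(r1) or N(r2)
r1r2     from i through N(r1), then N(r2), to f
r*       from i through N(r) to f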
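The four cases above can be sketched as functions that each return an NFA fragment with one start state i and one accepting state f. This Python sketch is a simplified variant, not necessarily the exact construction in the figures: concatenation joins the two fragments with an ε-edge instead of merging states, and the regular expression (a|b)*abb used at the end is an assumed example. All names are hypothetical.

```python
import itertools

_ids = itertools.count()   # fresh state numbers
EPS = None                 # label used for ε-transitions

def symbol(a):
    i, f = next(_ids), next(_ids)
    return i, f, [(i, a, f)]

def union(n1, n2):         # r1 | r2
    (i1, f1, e1), (i2, f2, e2) = n1, n2
    i, f = next(_ids), next(_ids)
    return i, f, e1 + e2 + [(i, EPS, i1), (i, EPS, i2), (f1, EPS, f), (f2, EPS, f)]

def concat(n1, n2):        # r1 r2, linked by an ε-edge (simplification)
    (i1, f1, e1), (i2, f2, e2) = n1, n2
    return i1, f2, e1 + e2 + [(f1, EPS, i2)]

def star(n):               # r*
    i1, f1, e1 = n
    i, f = next(_ids), next(_ids)
    return i, f, e1 + [(i, EPS, i1), (f1, EPS, f), (f1, EPS, i1), (i, EPS, f)]

def eps_closure(states, edges):
    """All states reachable from the given states using only ε-edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def accepts(nfa, word):
    i, f, edges = nfa
    current = eps_closure({i}, edges)
    for ch in word:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = eps_closure(moved, edges)
    return f in current

# (a|b)*abb, built bottom-up from the four cases.
nfa = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                           symbol("a")), symbol("b")), symbol("b"))
```

Each operator adds at most two new states, so the size of the resulting NFA grows linearly with the length of the regular expression.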
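a     { action1 }
abb   { action2 }
a*b+  { action3 }
[Figure: one NFA per pattern: 1 →a 2 for a; 3 →a 4 →b 5 →b 6 for abb; 7 →b 8 for a*b+, with a loop on a at state 7 and a loop on b at state 8]
[Figure: combined NFA with a new start state 0 that has ε-transitions to states 1, 3, and 7]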
Input:       a a b a
State sets:  {0,1,3,7} →a {2,4,7} →a {7} →b {8} →a none
Result:      action3 (state 8 is accepting)
Must find the longest match:
Continue until no further moves are possible
When last state is accepting: execute action
Input:       a b b a
State sets:  {0,1,3,7} →a {2,4,7} →b {5,8} →b {6,8} →a none
Result:      state 6 gives action2, state 8 gives action3; action2 is executed
When two or more accepting states are reached, the first action given in the Lex specification is executed
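Both rules, longest match and first-listed action on a tie, can be sketched by simulating the combined NFA for the three patterns a, abb, and a*b+. This Python sketch makes a few assumptions: the start set {0,1,3,7} is taken to be the ε-closure of state 0, the scanner remembers the last position at which some accepting state was reached, and the names `MOVES`, `ACTION`, and `longest_match` are hypothetical.

```python
# Moves of the combined NFA on input symbols (ε-moves are folded into START_SET).
MOVES = {
    (1, "a"): {2},                                   # pattern a
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},     # pattern abb
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},     # pattern a*b+
}
START_SET = {0, 1, 3, 7}
ACTION = {2: 1, 6: 2, 8: 3}    # accepting state -> rule number in the specification

def longest_match(text):
    """Return (lexeme length, rule number) for the longest match, or None."""
    current, best = set(START_SET), None
    for pos, ch in enumerate(text):
        current = {t for s in current for t in MOVES.get((s, ch), ())}
        if not current:
            break                                    # no further moves are possible
        rules = [ACTION[s] for s in current if s in ACTION]
        if rules:
            best = (pos + 1, min(rules))             # earliest rule wins a tie
    return best
```

On input `aaba` this returns `(3, 3)`, that is the lexeme aab with action3; on `abba` it returns `(3, 2)`, because states 6 and 8 are both accepting after abb and action2 is listed first.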
Example DFA
[Figure: DFA with states 0, 1, 2, 3 and start state 0; path 0 →a 1 →b 2 →b 3, plus further transitions on a and b]
[Figure: a DFA with states A, B, C, D, E (left) and a DFA with states A, B, D, E (right); both have start state A and the path A →a B →b D →b E]
Directly: Algorithm
s0 := firstpos(root) where root is the root of the syntax tree
Dstates := {s0} and is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a do
        let U be the set of positions that are in followpos(p)
            for some position p in T,
            such that the symbol at position p is a
        if U is not empty and not in Dstates then
            add U as an unmarked state to Dstates
        end if
        Dtran[T,a] := U
    end do
end do
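The algorithm above can be sketched in Python as follows. The position labels and followpos sets are assumptions: they are the usual textbook values for the augmented expression (a|b)*abb#, taken here to be the example behind these slides. The sketch also leaves Dtran undefined when U is empty rather than storing the empty set, and all names are hypothetical.

```python
def build_dfa(firstpos_root, followpos, symbol_at, alphabet):
    """Construct DFA states and transitions from firstpos/followpos sets."""
    s0 = frozenset(firstpos_root)
    dstates, unmarked, dtran = {s0}, [s0], {}
    while unmarked:
        T = unmarked.pop()                    # "mark T"
        for a in alphabet:
            # Union of followpos(p) over positions p in T labeled with a.
            U = frozenset(q for p in T if symbol_at[p] == a for q in followpos[p])
            if not U:
                continue                      # empty set: no transition recorded
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[T, a] = U
    return s0, dstates, dtran

# Assumed example: positions 1..6 of (a|b)*abb#.
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}

s0, dstates, dtran = build_dfa({1, 2, 3}, FOLLOWPOS, SYMBOL_AT, "ab")
```

Under these assumptions the result has four states, {1,2,3}, {1,2,3,4}, {1,2,3,5}, and {1,2,3,6}; the last one contains position 6 (the end marker #) and is therefore the accepting state.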
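[Figure: resulting DFA with states {1,2,3} (start), {1,2,3,4}, {1,2,3,5}, {1,2,3,6}; path {1,2,3} →a {1,2,3,4} →b {1,2,3,5} →b {1,2,3,6}, plus further transitions on a and b]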
Time-Space Tradeoffs
Automaton   Space (worst case)   Time (worst case)
NFA         O(|r|)               O(|r| × |x|)
DFA         O(2^|r|)             O(|x|)