Lec 2
Lexical Analysis
OUTLINE
▪ Role of lexical analyzer
▪ Specification of tokens
▪ Recognition of tokens
▪ Lexical analyzer generator
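THE ROLE OF THE LEXICAL ANALYZER
[Figure: source program → Lexical Analyzer → (token) → Parser → to semantic analysis. The parser requests each token with getNextToken; both the lexical analyzer and the parser use the symbol table.]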
WHY SEPARATE LEXICAL ANALYSIS AND PARSING
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Lexical analyzer: reads input characters and
produces a sequence of tokens as output
(nexttoken()).
LEXICAL ANALYZER
Token 1: (const, -)
Token 2: (identifier, ‘pi’)
Token 3: (=, -)
Token 4: (realnumber, 3.14159)
Token 5: (;, -)
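The five (token, attribute) pairs above can be produced by a very small scanner. The sketch below assumes the input statement was something like const pi = 3.14159; (the slide does not show it). The names token_type, struct token and next_token are hypothetical, and the number rule is deliberately simplified.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes for this example only. */
enum token_type {
    TOK_CONST, TOK_IDENTIFIER, TOK_ASSIGN, TOK_REALNUMBER,
    TOK_SEMICOLON, TOK_END, TOK_ERROR
};

struct token {
    enum token_type type;
    char lexeme[32];   /* attribute value; empty when the token needs none ("-") */
};

/* Read one token starting at *src and advance *src past it. */
struct token next_token(const char **src)
{
    struct token t = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p))      /* skip blanks, tabs, newlines */
        p++;

    if (*p == '\0') {
        t.type = TOK_END;
    } else if (*p == '=') {
        t.type = TOK_ASSIGN;
        p++;
    } else if (*p == ';') {
        t.type = TOK_SEMICOLON;
        p++;
    } else if (isalpha((unsigned char)*p)) {
        /* letter (letter | digit)*, truncated to fit the lexeme buffer */
        while (isalnum((unsigned char)*p) && n < sizeof t.lexeme - 1)
            t.lexeme[n++] = *p++;
        t.lexeme[n] = '\0';
        if (strcmp(t.lexeme, "const") == 0) {
            t.type = TOK_CONST;             /* keyword: no attribute */
            t.lexeme[0] = '\0';
        } else {
            t.type = TOK_IDENTIFIER;
        }
    } else if (isdigit((unsigned char)*p)) {
        /* simplified: digits and dots, without checking for a single dot */
        while ((isdigit((unsigned char)*p) || *p == '.') && n < sizeof t.lexeme - 1)
            t.lexeme[n++] = *p++;
        t.lexeme[n] = '\0';
        t.type = TOK_REALNUMBER;
    } else {
        t.type = TOK_ERROR;                 /* unknown character */
        p++;
    }

    *src = p;
    return t;
}
```

Calling next_token repeatedly on a pointer to "const pi = 3.14159;" yields the five tokens in order, followed by TOK_END.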
EXAMPLE
Token        Informal description                Sample lexemes
if           Characters i, f                     if
else         Characters e, l, s, e               else
comparison   < or > or <= or >= or == or !=      <=, !=
LEXICAL ANALYSIS
SECTION 1 THE ROLE OF THE LEXICAL ANALYZER
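[Figure: transition diagram for relop, '>' branch. Start state 0 goes to state 6 on '>'. State 6 goes to accepting state 7 on '=', which does return(relop, GE), and to accepting state 8 on any other character, which does return(relop, GT). The * on state 8 indicates that the input must be retracted by one character.]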
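The '>' branch of this diagram (states 0, 6, 7 and 8) can be coded directly, one case per state. This is a minimal sketch rather than the course's implementation: the enum relop_attr and the function scan_gt are hypothetical names, and the input is assumed to be a NUL-terminated string.

```c
/* Hypothetical attribute codes; the slides only name GE and GT. */
enum relop_attr { RELOP_NONE, RELOP_GT, RELOP_GE };

/*
 * Follow the diagram on the string s.
 * *len receives the lexeme length. For state 8 the lookahead
 * character is inspected but not consumed, which is what the
 * "*" (retract) on that state asks for.
 */
enum relop_attr scan_gt(const char *s, int *len)
{
    int state = 0;
    int pos = 0;

    for (;;) {
        switch (state) {
        case 0:                             /* start state */
            if (s[pos] == '>') {
                pos++;
                state = 6;
            } else {
                *len = 0;
                return RELOP_NONE;          /* not a '>' operator */
            }
            break;
        case 6:                             /* seen '>' */
            if (s[pos] == '=') {
                pos++;
                state = 7;
            } else {
                state = 8;                  /* "other": leave it unread */
            }
            break;
        case 7:                             /* accepting: ">=" */
            *len = pos;
            return RELOP_GE;
        case 8:                             /* accepting: ">" */
            *len = pos;
            return RELOP_GT;
        }
    }
}
```

For example, scan_gt(">=1", &n) returns RELOP_GE with n set to 2, while scan_gt(">x", &n) returns RELOP_GT with n set to 1, so the x is still available for the next token.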
SECTION 3 RECOGNITION OF TOKENS
4. Implementing a Transition Diagram
▪ Each state gets a segment of code
▪ If there are edges leaving a state, then its code reads a character
and selects an edge to follow, if possible
▪ Use nextchar() to read next character from the input buffer
while (1) {
    switch (state) {
    case 0:
        c = nextchar();
        if (c == blank || c == tab || c == newline) {
            /* skip whitespace: stay in state 0 and move the lexeme start forward */
            state = 0;
            lexeme_beginning++;
        }
        else if (c == '<') state = 1;
        else if (c == '=') state = 5;
        else if (c == '>') state = 6;
        else state = fail();
        break;
    case 9:
        c = nextchar();
        if (isletter(c)) state = 10;
        else state = fail();
        break;
    …
    }
}
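The fragment relies on nextchar() and lexeme_beginning without defining them. One possible way to back them with an in-memory buffer is sketched below. Everything here is an assumption made for illustration: the variable forward, the helper scanner_init, and retract() (the operation needed by states marked with *) do not appear in the slides.

```c
#include <string.h>

/* Assumed scanner state over a NUL-terminated input string. */
static const char *input_buffer = "";
static size_t input_length = 0;
static size_t lexeme_beginning = 0;  /* index where the current lexeme starts */
static size_t forward = 0;           /* index of the next character to read */

/* Point the scanner at a new input string (hypothetical helper). */
void scanner_init(const char *text)
{
    input_buffer = text;
    input_length = strlen(text);
    lexeme_beginning = 0;
    forward = 0;
}

/*
 * Return the next character and advance forward.
 * Past the end of the input this keeps returning '\0'; forward is
 * still advanced so that a following retract() stays balanced.
 */
int nextchar(void)
{
    int c = (forward < input_length)
                ? (unsigned char)input_buffer[forward]
                : '\0';
    forward++;
    return c;
}

/* Undo the most recent nextchar(), never moving before the lexeme start. */
void retract(void)
{
    if (forward > lexeme_beginning)
        forward--;
}
```

With this layout, the current lexeme is always the slice of input_buffer from lexeme_beginning up to (but not including) forward.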
5. A generalized transition diagram
Finite Automaton (FA)
▪ Deterministic or non-deterministic FA
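e.g. The FA for identifiers is:
[Figure: transition graph with start state 1 and accepting state 2; an edge labeled letter from state 1 to state 2, and loops labeled letter and digit on state 2.]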
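The two-state automaton for identifiers can be simulated with a loop and a switch on the current state. This is a sketch under the assumption that "letter" and "digit" mean what isalpha and isdigit accept in the C locale; the function name is_identifier is hypothetical.

```c
#include <ctype.h>

/* Return 1 if s is a letter followed by any number of letters or digits. */
int is_identifier(const char *s)
{
    int state = 1;                          /* start state */

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        switch (state) {
        case 1:                             /* nothing read yet */
            if (isalpha(c)) state = 2;
            else return 0;                  /* no edge: reject */
            break;
        case 2:                             /* at least one letter read */
            if (isalpha(c) || isdigit(c)) state = 2;
            else return 0;                  /* no edge: reject */
            break;
        }
    }
    return state == 2;                      /* accept only in state 2 */
}
```

Under these assumptions is_identifier("x1") accepts, while is_identifier("1x") and the empty string are rejected.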
SECTION 4 FINITE AUTOMATA
1. Usage of FA
▪ Precisely recognize regular sets
▪ A regular set is the set of strings (sentences) denoted by a regular expression
2. Kinds of FA
▪ Deterministic FA
▪ Non-deterministic FA
3. Deterministic FA (DFA)
A DFA is a quintuple, M = (S, Σ, move, s0, F)
▪ S: a set of states
▪ Σ: the input symbol alphabet
▪ move: a transition function, mapping from S × Σ to S, move(s, a) = s'
▪ s0: the start state, s0 ∈ S
▪ F: the set of states distinguished as accepting states, F ⊆ S
Note: 1) In a DFA, no state has an ε-transition;
2) In a DFA, for each state s and input symbol a, there is at most one edge labeled a leaving s;
3) To describe an FA, we use a transition graph or a transition table;
4) A DFA accepts an input string x if and only if there is a path in the transition graph from the start state to some accepting state whose edge labels spell out x.
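The quintuple and the acceptance condition in note 4 map naturally onto a small data structure and a loop. The sketch below is one possible encoding, not a prescribed one: struct dfa, MAX_SYMBOLS, NO_EDGE and dfa_accepts are hypothetical names, states are assumed to be numbered from 0, and each alphabet symbol is assumed to be a single character.

```c
#include <string.h>

#define MAX_SYMBOLS 8       /* assumed upper bound on the alphabet size */
#define NO_EDGE (-1)        /* marks a (state, symbol) pair with no edge */

/* M = (S, alphabet, move, s0, F), with states numbered 0 .. n_states-1. */
struct dfa {
    int n_states;                       /* size of S */
    const char *alphabet;               /* input symbols, one char each */
    const int (*move)[MAX_SYMBOLS];     /* move[s][i]: target on alphabet[i] */
    int start;                          /* s0 */
    const int *accepting;               /* accepting[s] != 0 iff s is in F */
};

/* Return 1 if the path spelled by x from s0 ends in an accepting state. */
int dfa_accepts(const struct dfa *m, const char *x)
{
    int s = m->start;

    for (; *x != '\0'; x++) {
        const char *hit = strchr(m->alphabet, *x);
        if (hit == NULL)
            return 0;                   /* symbol outside the alphabet */
        s = m->move[s][hit - m->alphabet];
        if (s == NO_EDGE)
            return 0;                   /* no edge labeled *x leaves s */
    }
    return m->accepting[s] != 0;
}
```

Because there is at most one edge per state and symbol, the loop never has to choose between alternatives, which is what makes the automaton deterministic.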
e.g. DFA M = ({0,1,2,3}, {a,b}, move, 0, {3})
move: move(0,a)=1  move(0,b)=2  move(1,a)=3  move(1,b)=2
      move(2,a)=1  move(2,b)=3  move(3,a)=3  move(3,b)=3
Transition table
state   a   b
0       1   2
1       3   2
2       1   3
3       3   3
[Figure: transition graph of M, with start state 0, accepting state 3, and one edge per entry of the transition table.]
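The transition table of M can be stored as a two-dimensional array and used to trace an input. This is a minimal sketch; the names m_table, run_m and m_accepts are hypothetical, and returning -1 for a symbol outside {a, b} is a choice made here, not something the slides specify.

```c
/* Rows are states 0..3; column 0 is input a, column 1 is input b. */
static const int m_table[4][2] = {
    {1, 2},   /* state 0 (start) */
    {3, 2},   /* state 1 */
    {1, 3},   /* state 2 */
    {3, 3},   /* state 3 (accepting) */
};

/* Run M on x and return the final state, or -1 on a symbol outside {a, b}. */
int run_m(const char *x)
{
    int state = 0;

    for (; *x != '\0'; x++) {
        if (*x != 'a' && *x != 'b')
            return -1;
        state = m_table[state][*x == 'b'];  /* 0 selects column a, 1 column b */
    }
    return state;
}

/* M accepts x exactly when the run ends in state 3. */
int m_accepts(const char *x)
{
    return run_m(x) == 3;
}
```

Tracing "abb" gives the state sequence 0, 1, 2, 3, so the string is accepted; "abab" ends in state 2 and is rejected. Reading the table this way suggests that M accepts the strings over {a, b} that contain aa or bb.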
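e.g. Construct a DFA M that accepts the strings that begin with a or b, or that begin with c and contain at most one a.
So, the DFA is
M = ({0,1,2,3}, {a,b,c}, move, 0, {1,2,3})
move: move(0,a)=1  move(0,b)=1  move(0,c)=2
      move(1,a)=1  move(1,b)=1  move(1,c)=1
      move(2,a)=3  move(2,b)=2  move(2,c)=2
      move(3,b)=3  move(3,c)=3
move(3,a) is left undefined, so a string that begins with c and contains a second a is rejected.
[Figure: transition graph of M. State 0 goes to state 1 on a or b and to state 2 on c; state 1 loops on a, b, c; state 2 loops on b, c and goes to state 3 on a; state 3 loops on b, c.]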
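Since this DFA has no edge for (3, a), a simulation has to treat a missing edge as rejection. The sketch below encodes the move function as a table with an explicit marker for the missing entry; DEAD, abc_table, abc_accepting and accepts_abc are hypothetical names introduced for this example.

```c
#define DEAD (-1)   /* marks the undefined transition move(3,a) */

/* Rows are states 0..3; columns are inputs a, b, c. */
static const int abc_table[4][3] = {
    {1,    1, 2},   /* state 0 (start) */
    {1,    1, 1},   /* state 1: began with a or b */
    {3,    2, 2},   /* state 2: began with c, no a yet */
    {DEAD, 3, 3},   /* state 3: began with c, one a seen */
};

/* States 1, 2 and 3 are accepting; the start state 0 is not. */
static const int abc_accepting[4] = {0, 1, 1, 1};

/* Return 1 if x is accepted by the DFA above, 0 otherwise. */
int accepts_abc(const char *x)
{
    int state = 0;

    for (; *x != '\0'; x++) {
        if (*x < 'a' || *x > 'c')
            return 0;                       /* symbol outside {a, b, c} */
        state = abc_table[state][*x - 'a'];
        if (state == DEAD)
            return 0;                       /* second a after a leading c */
    }
    return abc_accepting[state];
}
```

For instance, "cab" follows 0, 2, 3, 3 and is accepted, while "caa" reaches the missing edge and is rejected. The empty string is rejected as well, because state 0 is not accepting.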