6. Implementing a Lexical Analyzer Using Finite Automata
LEXICAL ANALYZER USING FINITE AUTOMATA
We are given the following regular definition:
if -> if
then -> then
else -> else
relop -> < | <= | = | <> | > | >=
id -> letter (letter | digit)*
num -> digit+ (. digit+)? (E (+ | -)? digit+)?
letter -> [a-z] | [A-Z]
digit -> [0-9]
The analyzer must recognize the keywords if, then, and else, and the lexemes denoted by relop, id, and num.
delim -> blank|tab|newline
ws -> delim+
If a match for ws is found, the lexical analyzer does not return a token to the parser. It proceeds to find the token following the white space and returns that to the parser.
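The whitespace rule can be sketched in C as follows. This is only a minimal illustration, assuming the input is a NUL-terminated string; the names is_delim and skip_ws are hypothetical and do not come from the definition above.

```c
#include <assert.h>
#include <stddef.h>

/* delim -> blank | tab | newline */
int is_delim(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* ws -> delim+ : return a pointer to the first character after any
   run of delimiters. No token is produced for the skipped text. */
const char *skip_ws(const char *p)
{
    assert(p != NULL);
    while (is_delim(*p))
        p++;
    return p;
}
```

A caller would start scanning the next token at the pointer that skip_ws returns, so white space never reaches the parser.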
TRANSITION DIAGRAMS
A transition diagram depicts the actions that take place when the lexical analyzer is called by the parser to get the next token.
A transition diagram (TD) keeps track of information about the characters seen as the forward pointer scans the input.
Positions in a TD are drawn as circles, called states.
States are connected by arrows, called edges.
Edges leaving a state s have labels indicating the input characters that can appear next after the TD has reached state s.
Start state: the state where control resides when we begin to recognize a token.
If there is no valid transition on the next input character, the diagram fails.
Accepting state: a state in which a token has been found.
A * marks an accepting state at which the forward pointer must be retracted.
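Example transition diagram (a letter followed by letters or digits, ended by a delimiter):
start -> (0) --letter--> (1) --delimiter--> (2)*
State 1 loops back to itself on letter/digit.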
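The three-state diagram can be walked in code roughly as follows. This is a sketch under stated assumptions: the function name scan_id is hypothetical, the input is a NUL-terminated string, and any character that is not a letter or digit is treated as the delimiter that leads to state 2.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Walk the three-state diagram over the text at s. Returns the length
   of the lexeme, or 0 on failure (no valid transition out of the
   start state). */
size_t scan_id(const char *s)
{
    size_t forward = 0;   /* offset of the forward pointer */
    int state = 0;
    unsigned char c;

    assert(s != NULL);
    for (;;) {
        switch (state) {
        case 0:                       /* start state: a letter is required */
            c = (unsigned char)s[forward++];
            if (isalpha(c)) state = 1;
            else return 0;            /* failure */
            break;
        case 1:                       /* loop on letter/digit */
            c = (unsigned char)s[forward++];
            if (!isalnum(c)) state = 2;
            break;
        case 2:                       /* accepting state marked with * */
            forward--;                /* retract: the delimiter is not part of the lexeme */
            return forward;
        }
    }
}
```

For the input "count1 = 0" the walk ends in state 2 after reading the blank, retracts one character, and reports a lexeme of length 6.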
There may be several transition diagrams.
If failure occurs while following one transition diagram, retract the forward pointer to where it was in the start state of that diagram and activate the next transition diagram.
If failure occurs in all transition diagrams, a lexical error has been detected and an error-recovery routine is invoked.
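e.g. in Fortran, DO 5 I=1.25 assigns 1.25 to the variable DO5I, while DO 5 I=1,25 begins a DO loop. Because Fortran ignores blanks, the two cannot be told apart until the . or , is seen.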
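The idea of trying one diagram after another from the same starting position can be sketched as below. This is an illustration only: the three td_ functions are simplified stand-ins for real transition diagrams, and all names here (Diagram, td_relop, td_id, td_digits, next_lexeme) are hypothetical. Retraction is implicit because every diagram is restarted at s.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* A diagram returns the length of the lexeme it accepts, or 0 on failure. */
typedef size_t (*Diagram)(const char *);

size_t td_relop(const char *s)
{
    if (s[0] == '<') return (s[1] == '=' || s[1] == '>') ? 2 : 1;
    if (s[0] == '>') return (s[1] == '=') ? 2 : 1;
    return s[0] == '=' ? 1 : 0;
}

size_t td_id(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

size_t td_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Try each diagram in turn. Returns the index of the first diagram
   that accepts and stores the lexeme length in *len; returns -1
   (lexical error) if every diagram fails. */
int next_lexeme(const char *s, size_t *len)
{
    static const Diagram diagrams[] = { td_relop, td_id, td_digits };
    size_t i;

    assert(s != NULL && len != NULL);
    for (i = 0; i < sizeof diagrams / sizeof diagrams[0]; i++) {
        size_t n = diagrams[i](s);    /* each attempt restarts at s */
        if (n > 0) {
            *len = n;
            return (int)i;
        }
    }
    *len = 0;
    return -1;                        /* caller invokes error recovery */
}
```

A return value of -1 corresponds to the case where all diagrams fail and an error-recovery routine has to take over.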
RECOGNITION OF RESERVED WORDS
Initialize the symbol table, in which information about identifiers is stored, appropriately.
Enter the reserved words into the symbol table before any characters in the input are seen.
Record in the symbol table the token to be returned when each keyword is recognized.
The return statement next to the accepting state uses gettoken() and install_id() to obtain the token and the attribute value.
When a lexeme is identified, the symbol table is checked:
If the lexeme is found as a keyword, install_id() returns 0.
If it is an identifier, a pointer to its symbol table entry is returned.
gettoken() returns the corresponding token.
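One possible shape for this scheme is sketched below. It is not the implementation the slides have in mind: the fixed-size array, the linear search, the helper names insert, lookup, and init_symtab, the token values, and the decision to pass the lexeme as a parameter (the listing later calls install_id() and gettoken() without arguments) are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

enum { IF = 256, THEN, ELSE, ID };    /* token codes; the values are arbitrary */

#define MAX_SYMS 64
#define MAX_LEN  32

struct entry { char lexeme[MAX_LEN]; int token; };

struct entry symtab[MAX_SYMS];
int nsyms = 0;

/* Add a lexeme and its token; returns the index, or -1 if it does not fit. */
int insert(const char *lexeme, int token)
{
    assert(lexeme != NULL);
    if (nsyms == MAX_SYMS || strlen(lexeme) >= MAX_LEN) return -1;
    strcpy(symtab[nsyms].lexeme, lexeme);
    symtab[nsyms].token = token;
    return nsyms++;
}

/* Linear search; returns the index of the lexeme, or -1 if absent. */
int lookup(const char *lexeme)
{
    int i;
    for (i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0) return i;
    return -1;
}

/* Enter the reserved words before any input is read. */
void init_symtab(void)
{
    nsyms = 0;
    insert("if", IF);
    insert("then", THEN);
    insert("else", ELSE);
}

/* Returns 0 (NULL) for a keyword, otherwise a pointer to the entry for
   the identifier, creating the entry on first sight. NULL is also
   returned if the table is full; a real analyzer would report that
   case separately. */
struct entry *install_id(const char *lexeme)
{
    int i = lookup(lexeme);
    if (i < 0) i = insert(lexeme, ID);
    if (i < 0) return NULL;
    return symtab[i].token == ID ? &symtab[i] : NULL;
}

/* Returns the token recorded for the lexeme, or ID if it is not in the table. */
int gettoken(const char *lexeme)
{
    int i = lookup(lexeme);
    return i < 0 ? ID : symtab[i].token;
}
```

Because the keywords are entered first with their own tokens, the identifier diagram can accept "then" like any other lexeme and still hand THEN to the parser.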
RECOGNITION OF NUMBERS
When the accepting state is reached, call a procedure install_num() that enters the lexeme into the table of numbers and returns a pointer to the created entry.
The token NUM is returned.
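To make the num definition concrete, the following sketch finds the longest prefix of a string that matches digit+ (. digit+)? (E (+ | -)? digit+)?. It is written as straight-line code rather than as the numbered states of a transition diagram, and the names skip_digits and scan_num are hypothetical. An analyzer would call something like install_num() on the lexeme once its length is known.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Advance over a run of digits starting at s[i]; returns the new
   offset, which equals i if s[i] is not a digit. */
size_t skip_digits(const char *s, size_t i)
{
    while (isdigit((unsigned char)s[i])) i++;
    return i;
}

/* Returns the length of the longest prefix of s that is a num, or 0
   if s does not start with a digit. */
size_t scan_num(const char *s)
{
    size_t end, j, k;

    assert(s != NULL);
    end = skip_digits(s, 0);
    if (end == 0) return 0;               /* digit+ is mandatory */

    if (s[end] == '.') {                  /* optional fraction */
        j = skip_digits(s, end + 1);
        if (j > end + 1) end = j;         /* keep it only if a digit follows '.' */
    }
    if (s[end] == 'E') {                  /* optional exponent */
        j = end + 1;
        if (s[j] == '+' || s[j] == '-') j++;
        k = skip_digits(s, j);
        if (k > j) end = k;               /* keep it only if a digit follows */
    }
    return end;
}
```

When an optional part starts but cannot be completed, as in "7." or "2E+", the function falls back to the shorter match. This plays the same role as retraction in a transition diagram.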
IMPLEMENTING LEXICAL ANALYZER
Token nexttoken()
{
    while (1) {
        switch (state) {
        case 0:
            c = nextchar();
            if (c == blank || c == tab || c == newline) {
                /* skip white space and move the lexeme start along */
                state = 0;
                lexeme_beginning++;
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        case 1:
            c = nextchar();
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2:
            token.attribute = LE;
            token.name = relop;
            return token;
        /* cases 3-7 (the remaining relop states) are not shown */
        case 8:
            retract(1);
            token.attribute = GT;
            token.name = relop;
            return token;
        case 9:
            c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10:
            c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        case 11:
            retract(1);
            entry = install_id();    /* 0 for a keyword, else symbol table entry */
            name = gettoken();
            token.name = name;
            token.attribute = entry;
            return token;
        /* cases 12-24 here for numbers */
        case 25:
            c = nextchar();
            if (isdigit(c)) state = 26;
            else state = fail();
            break;
        case 26:
            c = nextchar();
            if (isdigit(c)) state = 26;
            else state = 27;
            break;
        case 27:
            retract(1);
            /* return a Token, as in case 11, rather than the bare code NUM */
            token.attribute = install_num();
            token.name = NUM;
            return token;
        }
    }
}
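The listing jumps from case 2 to case 8, so the relop diagram is only partly visible. The sketch below fills in states 3-7 in a self-contained form. The numbering of those states is inferred from the transitions that are shown (state 1 goes to 2, 3, or 4; state 0 goes to 5 on = and to 6 on >) and should be read as an assumption, as should the names scan_relop and NONE and the attribute names other than LE and GT. White space skipping and fail() are left out.

```c
#include <assert.h>
#include <stddef.h>

enum relop { NONE, LT, LE, EQ, NE, GT, GE };

/* Recognize a relop at the start of s using states 0-8. Stores the
   lexeme length in *len and returns the attribute, or NONE on failure. */
enum relop scan_relop(const char *s, size_t *len)
{
    size_t forward = 0;   /* offset of the forward pointer */
    int state = 0;
    char c;

    assert(s != NULL && len != NULL);
    for (;;) {
        switch (state) {
        case 0:
            c = s[forward++];
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else { *len = 0; return NONE; }
            break;
        case 1:
            c = s[forward++];
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: *len = forward; return LE;
        case 3: *len = forward; return NE;
        case 4: *len = forward - 1; return LT;   /* retract(1) */
        case 5: *len = forward; return EQ;
        case 6:
            c = s[forward++];
            state = (c == '=') ? 7 : 8;
            break;
        case 7: *len = forward; return GE;
        case 8: *len = forward - 1; return GT;   /* retract(1) */
        }
    }
}
```

States 4 and 8 are the ones that read one character too many, which is why they give that character back before returning.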
CODE FOR NEXT STATE
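int state = 0, start = 0;
int lexical_value;

int fail()
{
    /* retract the forward pointer to the start of the lexeme */
    forward = token_beginning;
    /* pick the start state of the next transition diagram */
    switch (start) {
    case 0:  start = 9;  break;
    case 9:  start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover();  break;   /* every diagram has failed */
    default: /* compiler error */ break;
    }
    return start;
}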