Lexical Analysis: Programming Languages Translators
Lexical Analysis
Tasks of a Scanner
program confusing;
const true = false;
begin
if (a<b) = true then
f(a)
else
Figure: the scanner discards whitespace and passes a stream of tokens to the parser, which builds a parse tree according to a context-free grammar.
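As a rough illustration of these scanner tasks (discarding whitespace, grouping characters into lexemes, and handing the parser a stream of <token, lexeme> pairs), here is a minimal C sketch. The token codes, the `Token` type, `next_token` and the keyword table are hypothetical, and the sketch is case-sensitive (an assumption; Pascal is not). The keyword table deliberately omits `true`: in Pascal, `true` is a predefined identifier rather than a reserved word, which is why `const true = false;` in the program above is legal and must be scanned as an identifier.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
typedef enum { TOK_EOF, TOK_KEYWORD, TOK_ID, TOK_NUM, TOK_PUNCT } TokenKind;

typedef struct {
    TokenKind kind;
    char lexeme[64];   /* actual characters matched (truncated if longer) */
} Token;

/* A few Pascal reserved words; "true" is intentionally absent. */
static const char *keywords[] = {
    "program", "const", "begin", "end", "if", "then", "else"
};

static int is_keyword(const char *s) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0) return 1;
    return 0;
}

/* Skip whitespace, scan one token starting at *p, and advance *p past it. */
Token next_token(const char **p) {
    Token t = { TOK_EOF, "" };
    size_t n = 0;
    while (isspace((unsigned char)**p)) (*p)++;      /* discard whitespace */
    if (**p == '\0') return t;                       /* end of input */
    if (isalpha((unsigned char)**p)) {               /* keyword or identifier */
        while (isalnum((unsigned char)**p)) {
            if (n < sizeof t.lexeme - 1) t.lexeme[n++] = **p;
            (*p)++;
        }
        t.lexeme[n] = '\0';
        t.kind = is_keyword(t.lexeme) ? TOK_KEYWORD : TOK_ID;
    } else if (isdigit((unsigned char)**p)) {        /* unsigned integer */
        while (isdigit((unsigned char)**p)) {
            if (n < sizeof t.lexeme - 1) t.lexeme[n++] = **p;
            (*p)++;
        }
        t.lexeme[n] = '\0';
        t.kind = TOK_NUM;
    } else {                                         /* single-char punctuation */
        t.lexeme[0] = *(*p)++;
        t.lexeme[1] = '\0';
        t.kind = TOK_PUNCT;
    }
    return t;
}
```

Scanning `const true = false;` with this sketch yields the keyword `const`, the identifier `true`, `=`, the identifier `false`, `;`, and then end of input.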
Sentinels
switch (*forward++) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
/* cases for the other characters */
}
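The sentinel scheme above can be made concrete as a small C sketch. Assumptions not taken from the slides: the sentinel is the NUL character (so the input must contain no NUL bytes), each buffer half holds only `HALF` = 4 bytes so that reloads happen often, and input comes from a hypothetical in-memory `Source` instead of a file. The names `Buffer`, `reload`, `buffer_init` and `next_char` are illustrative.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define HALF 4            /* tiny buffer half, to force frequent reloads */
#define SENTINEL '\0'     /* assumption: the input contains no NUL bytes */

/* Hypothetical input source: a string read HALF bytes at a time. */
typedef struct {
    const char *text;
    size_t pos, len;
} Source;

typedef struct {
    char buf[2 * (HALF + 1)];  /* [first half][sentinel][second half][sentinel] */
    char *forward;
    Source *src;
} Buffer;

/* Fill one half from the source and terminate it with the sentinel.
   A short read places the sentinel early, marking the real end of input. */
static void reload(Buffer *b, char *half) {
    size_t n = b->src->len - b->src->pos;
    if (n > HALF) n = HALF;
    memcpy(half, b->src->text + b->src->pos, n);
    b->src->pos += n;
    half[n] = SENTINEL;
}

void buffer_init(Buffer *b, Source *src) {
    b->src = src;
    reload(b, b->buf);
    b->forward = b->buf;
}

/* Return the next character, or EOF at end of input. In the common case
   only one test (against the sentinel) is made per character. */
int next_char(Buffer *b) {
    for (;;) {
        char c = *b->forward++;
        if (c != SENTINEL) return (unsigned char)c;
        if (b->forward == b->buf + HALF + 1) {               /* end of first half */
            reload(b, b->buf + HALF + 1);
            b->forward = b->buf + HALF + 1;
        } else if (b->forward == b->buf + 2 * (HALF + 1)) {  /* end of second half */
            reload(b, b->buf);
            b->forward = b->buf;
        } else {
            b->forward--;   /* sentinel inside a half: stay on it, end of input */
            return EOF;
        }
    }
}
```

A full scanner would also keep a lexeme-begin pointer and must avoid reloading a half that still holds part of the current lexeme.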
Transition diagrams
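#include <stdio.h>
#include <stdbool.h>

/* Recognize (abc)+ at the start of stdin with a transition diagram
   (function name is illustrative). A character with no outgoing edge
   from the current state means the input does not match. */
bool recognize(void)
{
    int state = 0;
    int c;    /* int, not char, so that EOF can be detected */
    while ( (c = getchar()) != EOF ) {
        switch (state) {
        case 0: if ( c == 'a' ) state = 1; else return false;
                break;
        case 1: if ( c == 'b' ) state = 2; else return false;
                break;
        case 2: if ( c == 'c' ) state = 3; else return false;
                break;
        case 3: if ( c == 'a' ) state = 1;
                else { ungetc(c, stdin); return true; }  /* retract the lookahead */
                break;
        }
    }
    return state == 3;
}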
Finite Automata for the Lexical Tokens
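Figure: finite automata for the lexical tokens IF (the characters i, f), ID (a letter a-z followed by any letters a-z or digits 0-9), NUM (one or more digits 0-9), real-number literals (digits 0-9 with a decimal point), comments ("--" followed by any characters but \n, ended by \n) and white space (blank, etc.).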
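The figure gives one automaton per token. A scanner typically merges such automata into a single DFA and applies the longest-match rule, with IF taking priority over ID when both match the same lexeme. Below is a minimal table-driven sketch of that idea, restricted (as an assumption) to lowercase IF, ID and NUM; the character classes and the names `delta`, `accepting` and `longest_match` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical token codes; T_NONE means "no token recognized". */
typedef enum { T_NONE, T_IF, T_ID, T_NUM } Tok;

/* Character classes used as table columns. */
enum { C_I, C_F, C_LETTER, C_DIGIT, C_OTHER, NCLASS };

static int char_class(char c) {
    if (c == 'i') return C_I;
    if (c == 'f') return C_F;
    if (c >= 'a' && c <= 'z') return C_LETTER;
    if (c >= '0' && c <= '9') return C_DIGIT;
    return C_OTHER;
}

/* Combined DFA for IF, ID and NUM; -1 is the dead state.
   State 1 = "i", 2 = "if", 3 = any other identifier, 4 = number. */
static const int delta[5][NCLASS] = {
    /*           i   f  a-z  0-9 other */
    /* 0 */   {  1,  3,  3,  4, -1 },
    /* 1 */   {  3,  2,  3,  3, -1 },
    /* 2 */   {  3,  3,  3,  3, -1 },
    /* 3 */   {  3,  3,  3,  3, -1 },
    /* 4 */   { -1, -1, -1,  4, -1 },
};

/* Token accepted in each state; state 2 encodes IF's priority over ID. */
static const Tok accepting[5] = { T_NONE, T_ID, T_IF, T_ID, T_NUM };

/* Longest match: run the DFA until it dies, remembering the last
   accepting state seen. Returns the token and stores its length in *len. */
Tok longest_match(const char *s, size_t *len) {
    int state = 0;
    Tok last = T_NONE;
    *len = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = delta[state][char_class(s[i])];
        if (state < 0) break;
        if (accepting[state] != T_NONE) {
            last = accepting[state];
            *len = i + 1;
        }
    }
    return last;
}
```

For example, `if` is returned as IF with length 2, while `if8` keeps going and is returned as a 3-character ID. Likewise `42+x` yields NUM with length 2, because the DFA dies on `+`.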
Lexical Errors
Deleting an extraneous character
Inserting a missing character
Replacing an incorrect character by a correct character
Transposing two adjacent characters (such as fi => if)
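The four repairs listed above each change the lexeme by one character. The following minimal sketch decides which of them, if any, turns an erroneous lexeme into a valid word such as a keyword. It assumes the caller supplies the candidate word. The names `Repair` and `single_repair` are hypothetical.

```c
#include <string.h>

/* Hypothetical names for the single-character repairs listed above. */
typedef enum { REPAIR_NONE, REPAIR_DELETE, REPAIR_INSERT,
               REPAIR_REPLACE, REPAIR_TRANSPOSE } Repair;

/* Which single repair (if any) turns the erroneous lexeme `bad` into the
   valid word `good`? Identical strings need no repair: REPAIR_NONE. */
Repair single_repair(const char *bad, const char *good) {
    size_t m = strlen(bad), n = strlen(good), i = 0;
    while (i < m && i < n && bad[i] == good[i]) i++;   /* common prefix */
    if (m == n + 1 && strcmp(bad + i + 1, good + i) == 0)
        return REPAIR_DELETE;          /* delete the extraneous bad[i] */
    if (n == m + 1 && strcmp(bad + i, good + i + 1) == 0)
        return REPAIR_INSERT;          /* insert the missing good[i] */
    if (m == n && i < m) {
        if (strcmp(bad + i + 1, good + i + 1) == 0)
            return REPAIR_REPLACE;     /* replace bad[i] by good[i] */
        if (i + 1 < m && bad[i] == good[i + 1] && bad[i + 1] == good[i]
            && strcmp(bad + i + 2, good + i + 2) == 0)
            return REPAIR_TRANSPOSE;   /* swap bad[i] and bad[i + 1] */
    }
    return REPAIR_NONE;
}
```

Checking only the first mismatch position is enough. A single edit leaves everything before that position unchanged, so any one-character repair must take effect there. A scanner could call this sketch once per keyword and propose the first repair found.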
Pre-scanning
Tokens / Patterns / Regular Expressions
Lexical analysis searches the input for lexemes that match a token's pattern.
The lexical analyzer returns a pair: <actual lexeme, symbolic identifier of token>.
REs --(algorithm)--> NFA --(algorithm)--> DFA (program for simulation)
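The NFA-to-DFA step of this pipeline is commonly done by subset construction. The minimal sketch below applies it to a Thompson-style NFA for (a|b)*abb, written out by hand. That NFA is a standard textbook example, not taken from these slides, and here the RE-to-NFA step is done manually. Sets of NFA states are stored as bitmasks, which assumes at most 32 NFA states. All names are illustrative.

```c
#include <stdint.h>

/* Hand-built NFA for (a|b)*abb: states 0..10, start 0, accepting 10. */
#define NSTATES 11
#define ACCEPT (1u << 10)
typedef uint32_t Set;   /* bit s set => NFA state s is in the set */

static const Set eps[NSTATES] = {   /* epsilon successors of each state */
    [0] = 1u << 1 | 1u << 7, [1] = 1u << 2 | 1u << 4,
    [3] = 1u << 6, [5] = 1u << 6, [6] = 1u << 1 | 1u << 7,
};
static const Set on_a[NSTATES] = { [2] = 1u << 3, [7] = 1u << 8 };
static const Set on_b[NSTATES] = { [4] = 1u << 5, [8] = 1u << 9, [9] = 1u << 10 };

/* epsilon-closure(T): all states reachable from T by epsilon moves. */
static Set closure(Set t) {
    Set prev;
    do {
        prev = t;
        for (int s = 0; s < NSTATES; s++)
            if (t >> s & 1u) t |= eps[s];
    } while (t != prev);
    return t;
}

/* move(T, x): states reachable from T on one symbol x (table `on`). */
static Set move(Set t, const Set *on) {
    Set r = 0;
    for (int s = 0; s < NSTATES; s++)
        if (t >> s & 1u) r |= on[s];
    return r;
}

#define MAXD 64         /* ample for this example */
Set dstates[MAXD];      /* DFA state i = a set of NFA states */
int dtran[MAXD][2];     /* column 0 = 'a', column 1 = 'b' */
int ndstates;

static int find_or_add(Set t) {
    for (int i = 0; i < ndstates; i++)
        if (dstates[i] == t) return i;
    dstates[ndstates] = t;
    return ndstates++;
}

/* Subset construction: DFA states with index >= i are still unmarked. */
void subset_construction(void) {
    ndstates = 0;
    find_or_add(closure(1u << 0));
    for (int i = 0; i < ndstates; i++) {
        dtran[i][0] = find_or_add(closure(move(dstates[i], on_a)));
        dtran[i][1] = find_or_add(closure(move(dstates[i], on_b)));
    }
}

/* Simulate the DFA on a string assumed to contain only 'a' and 'b'. */
int dfa_accepts(const char *s) {
    int d = 0;
    for (; *s; s++) d = dtran[d][*s == 'b'];
    return (dstates[d] & ACCEPT) != 0;
}
```

For this NFA the construction yields five DFA states. The resulting DFA accepts exactly the strings over {a, b} that end in `abb`.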