Module-3 Lexical Analysis: System Software 15CS63
MODULE-3
Lexical Analysis
Role of lexical analyzer
Specification of tokens
Recognition of tokens
Finite automata
1. Simplicity of design
A pattern is a description of the form that the lexemes of a token may take
A lexeme is a sequence of characters in the source program that matches the pattern for a
token
Example
Lexical errors
Some errors are beyond the power of the lexical analyzer to recognize:
o fi (a == f(x)) …
However, it may be able to recognize errors like:
o d = 2r
Such errors are recognized when no pattern for tokens matches a
character sequence
Error recovery
1. Panic mode: successive characters are ignored until a well-formed token is reached
2. Delete one character from the remaining input
3. Insert a missing character into the remaining input
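Panic mode can be sketched as a loop that discards characters until one that could begin a token is found. The helper can_start_token and its set of starting characters are assumptions made for illustration; a real lexer would derive that set from its own token patterns.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: can character c begin some token?
   Assumes tokens start with a letter, underscore, digit, or one of < > =. */
static int can_start_token(char c)
{
    return isalpha((unsigned char)c) || isdigit((unsigned char)c) ||
           c == '_' || c == '<' || c == '>' || c == '=';
}

/* Panic mode: starting at input[pos], skip characters until one that can
   begin a token (or the end of the string). Returns the new position. */
size_t panic_mode_skip(const char *input, size_t pos)
{
    while (input[pos] != '\0' && !can_start_token(input[pos]))
        pos++;
    return pos;
}
```

After the skip, the lexer would resume normal token recognition at the returned position.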
Input buffering
Sentinels
Specification of tokens
1. In the theory of compilation, regular expressions are used to formalize the specification
of tokens
3. Example:
i. letter_ (letter_ | digit)*
Regular expressions
1. Ɛ is a regular expression, L(Ɛ) = {Ɛ}
Regular definitions
1. d1 -> r1
2. d2 -> r2
3. …
4. dn -> rn
5. Example:
6. letter_ -> A | B | … | Z | a | b | … | z | _
7. digit -> 0 | 1 | … | 9
8. id -> letter_ (letter_ | digit)*
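The regular definition for id translates almost directly into code. The following is a minimal sketch, not a generated scanner; the function names are made up for illustration, and isalpha is assumed to run in the default "C" locale so that it matches exactly A-Z and a-z.

```c
#include <ctype.h>

/* letter_ -> A | ... | Z | a | ... | z | _ */
static int is_letter_(char c)
{
    return isalpha((unsigned char)c) || c == '_';
}

/* Returns the length of the longest prefix of s that matches
   id -> letter_ (letter_ | digit)*, or 0 if no prefix matches. */
int match_id(const char *s)
{
    int n;
    if (!is_letter_(s[0]))
        return 0;                   /* an id must begin with letter_ */
    n = 1;
    while (is_letter_(s[n]) || isdigit((unsigned char)s[n]))
        n++;                        /* (letter_ | digit)* */
    return n;
}
```

For the input "count_1 = 0" the matched lexeme is count_1, so match_id returns 7.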
Extensions
One or more instances: (r)+
Example:
id -> letter_ (letter_ | digit)*
Recognition of tokens
Starting point is the language grammar to understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt
|Ɛ
expr -> term relop term
| term
term -> id
| number
Recognition of tokens (cont.)
The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter|digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
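As an illustration, the number pattern, read here as digits (. digits)? (E [+-]? digits)?, can be recognized by hand with one helper per definition. This is a sketch under that reading; the function names are hypothetical.

```c
#include <ctype.h>

/* digits -> digit+ : returns how many leading digits s has (0 if none). */
static int match_digits(const char *s)
{
    int n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* number -> digits (. digits)? (E [+-]? digits)?
   Returns the length of the longest matching prefix of s, or 0 if none. */
int match_number(const char *s)
{
    int n = match_digits(s);
    int k, d;

    if (n == 0)
        return 0;                   /* a number must start with digits */

    if (s[n] == '.') {              /* optional fraction: (. digits)? */
        d = match_digits(s + n + 1);
        if (d > 0)
            n += 1 + d;
    }
    if (s[n] == 'E') {              /* optional exponent: (E [+-]? digits)? */
        k = n + 1;
        if (s[k] == '+' || s[k] == '-')
            k++;
        d = match_digits(s + k);
        if (d > 0)
            n = k + d;
    }
    return n;
}
```

An optional part is consumed only when it matches completely, so for "12." the lexeme is just 12 and the dot is left for the next token.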
We also need to handle whitespace:
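A minimal sketch of whitespace handling, assuming whitespace is defined as one or more blanks, tabs, or newlines (that definition is an assumption, not taken from the notes). The lexer skips such a run and returns no token for it.

```c
#include <stddef.h>

/* Assumed definition: ws -> (blank | tab | newline)+
   Skips whitespace starting at input[pos] and returns the position
   of the first non-whitespace character. */
size_t skip_ws(const char *input, size_t pos)
{
    while (input[pos] == ' ' || input[pos] == '\t' || input[pos] == '\n')
        pos++;
    return pos;
}
```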
Transition diagrams
TOKEN getRelop()
{
    switch (state) {
        case 0: c = nextchar();
                if (c == '<') state = 1;
                break;
        case 1: …
        case 8: retract();              /* give back the extra character read */
                retToken.attribute = GT;
                return (retToken);
    }
}
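For comparison, here is a self-contained sketch of a relop recognizer written in the same switch-on-state style. It is not the code from the notes: the state numbering is simplified and does not correspond to states 0 to 8 of the diagram, the input is a string with an explicit position instead of nextchar() and retract(), and every attribute name except GT is an assumption.

```c
#include <stddef.h>

/* Attribute values for the relop token (illustrative names, apart from GT). */
typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } RelopAttr;

/* Recognizes relop -> < | > | <= | >= | = | <> at input[*pos].
   On success advances *pos past the lexeme and returns its attribute;
   on failure leaves *pos unchanged and returns NOT_RELOP. */
RelopAttr get_relop(const char *input, size_t *pos)
{
    size_t p = *pos;
    int state = 0;
    char c;

    for (;;) {
        switch (state) {
        case 0:                         /* start state */
            c = input[p++];
            if (c == '<') state = 1;
            else if (c == '=') { *pos = p; return EQ; }
            else if (c == '>') state = 2;
            else return NOT_RELOP;      /* lexeme is not a relop */
            break;
        case 1:                         /* seen '<' */
            c = input[p++];
            if (c == '=') { *pos = p; return LE; }
            if (c == '>') { *pos = p; return NE; }
            p--;                        /* retract: c is not part of the lexeme */
            *pos = p;
            return LT;
        case 2:                         /* seen '>' */
            c = input[p++];
            if (c == '=') { *pos = p; return GE; }
            p--;                        /* retract */
            *pos = p;
            return GT;
        }
    }
}
```

The retract steps matter because deciding between < and <= (or > and >=) requires reading one character beyond the shorter lexeme.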
Finite Automata
o A set of states S
o A start state n
Transition
s1 -a-> s2
is read: in state s1, on input a, go to state s2
If end of input: accept if in an accepting state, otherwise reject
Example
Alphabet still { 0, 1 }
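A deterministic finite automaton over the alphabet { 0, 1 } can be simulated with a transition table. The automaton below is a hypothetical example chosen for illustration (it accepts exactly the strings that end in 1); it is not the automaton from the notes.

```c
#include <stddef.h>

#define NUM_STATES 2

/* Hypothetical DFA over the alphabet {0, 1}: accepts strings ending in 1.
   State 0 is the start state; state 1 is the only accepting state. */
static const int transition[NUM_STATES][2] = {
    /* on '0', on '1' */
    { 0, 1 },                       /* from state 0 */
    { 0, 1 },                       /* from state 1 */
};
static const int accepting[NUM_STATES] = { 0, 1 };

/* Returns 1 if the DFA accepts input, 0 otherwise. */
int dfa_accepts(const char *input)
{
    int state = 0;                  /* begin in the start state */
    size_t i;

    for (i = 0; input[i] != '\0'; i++) {
        if (input[i] != '0' && input[i] != '1')
            return 0;               /* symbol outside the alphabet: reject */
        state = transition[state][input[i] - '0'];
    }
    return accepting[state];        /* end of input: accept iff state is accepting */
}
```

With this table, "0101" is accepted, while "10" and the empty string are rejected.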
MODULE-4