Chapter 2 - Lexical Analyser
Instructor: Mohammed O.
Email: [email protected]
Samara University
Chapter Two
This Chapter Covers:
Role of lexical analyser
Token Specification and Recognition
NFA to DFA
Lexical Analyzer
The lexical analyzer reads the source program character by
character to produce tokens.
Normally, a lexical analyzer doesn't return a list of tokens
all at once; it returns one token each time the parser asks
for one.
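The on-demand behaviour described above can be sketched with a Python generator. This is a minimal illustration, not a full lexer: the function name tokens and the very simple token rules (runs of letters/digits, or single characters) are assumptions made only for this example.

```python
from typing import Iterator


def tokens(source: str) -> Iterator[str]:
    """Yield one token at a time, reading the input character by character."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():              # skip blanks between tokens
            i += 1
        elif ch.isalnum():            # a run of letters/digits forms one token
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            yield source[start:i]
        else:                         # any other character is a token by itself
            yield ch
            i += 1


# The "parser" side: ask for tokens one by one instead of receiving a full list.
lexer = tokens("x = y + 42")
first = next(lexer)    # the lexer runs only far enough to produce this token
second = next(lexer)   # it resumes where it stopped and produces the next one
```

Each call to next() plays the role of the parser asking for a token; nothing beyond the requested token is scanned until the next request.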
Tokens/Patterns/Lexemes/Attributes
A token is a sequence of characters that represents a unit
of information in the source program.
The parser repeatedly calls the scanner until all the tokens
have been read from the input stream or an error (such as a
syntax error) is detected.
A typical scanner:
recognises the keywords of the language (these are the
reserved words that have a special meaning in the language,
such as the word class in Java);
recognises and processes special directives (such as the
#include "file" directive in C);
recognises special characters, such as parentheses ( and ),
or groups of special characters, such as := (equal by
definition) and ==;
recognises identifiers, integers, reals, decimals, strings, etc.;
ignores whitespace and comments.
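The duties listed above can be sketched as a small regular-expression-based scanner. Everything named here (KEYWORDS, TOKEN_SPEC, scan, the token names, and the choice of // for comments) is an assumption made for illustration; a real language would define its own token classes.

```python
import re

KEYWORDS = {"class", "if", "else", "while"}   # illustrative subset only

# Order matters: the first alternative that matches wins.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),           # line comment, discarded
    ("SKIP",    r"\s+"),                # whitespace, discarded
    ("REAL",    r"\d+\.\d+"),           # must come before INT
    ("INT",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),       # identifiers and keywords
    ("STRING",  r'"[^"\n]*"'),
    ("OP",      r":=|==|[()=+\-*/;]"),  # two-character operators listed first
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Return a list of (token_name, lexeme) pairs, skipping blanks and comments."""
    result = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind in ("SKIP", "COMMENT"):
            continue                    # ignored, never reaches the parser
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"            # reserved words look like identifiers
        result.append((kind, lexeme))
    return result
```

Keywords are recognised by first matching an identifier and then looking it up in a table, which keeps the patterns short; this is one common design choice, not the only one.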
Hand Implementation
There are two approaches to implementing a lexical analyser by hand:
Input buffer approach
Transition diagram approach
Input Buffering
The lexical analyser scans the characters of the source
program one at a time to discover tokens.
Often, many characters beyond the next token may have to be
examined before the next token itself can be determined.
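A small sketch of why lookahead is needed: after reading =, the scanner cannot tell whether the token is = or == until it has examined the following character. The function name next_operator and the set of operators handled are assumptions made for this example only.

```python
def next_operator(text, pos):
    """Return (lexeme, new_pos) for the operator starting at text[pos].

    One character of lookahead decides between '=' and '==', '<' and '<=', etc.
    """
    ch = text[pos]
    # Peek at the next character without consuming it ("" at end of input).
    lookahead = text[pos + 1] if pos + 1 < len(text) else ""
    if ch in "=<>!" and lookahead == "=":
        return ch + "=", pos + 2     # two-character operator: consume both
    return ch, pos + 1               # the lookahead character is left unread
```

For example, next_operator("a<=b", 1) consumes two characters, while next_operator("a<b", 1) consumes one and leaves b for the next token.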
Example
L1 = {a,b,c,d}   L2 = {1,2}
L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}   (concatenation)
L1 ∪ L2 = {a,b,c,d,1,2}   (union)
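The two operations in this example can be checked with Python sets of strings. The variable names concatenation and union are chosen here for illustration.

```python
L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

# Concatenation: every string of L1 followed by every string of L2.
concatenation = {x + y for x in L1 for y in L2}

# Union: every string that is in L1 or in L2.
union = L1 | L2
```

Concatenation yields |L1| x |L2| = 8 strings here because no two pairs produce the same string; the union has 6 because the two languages share no strings.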
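S0 = ε-closure({0}) = {0,1,2,4,7}
mark S0
ε-closure(move(S0,a)) = ε-closure({3,8}) = S1
ε-closure(move(S0,b)) = ε-closure({5}) = S2
transfunc[S0,a] ← S1    transfunc[S0,b] ← S2
mark S1
ε-closure(move(S1,a)) = ε-closure({3,8}) = S1
ε-closure(move(S1,b)) = ε-closure({5}) = S2
transfunc[S1,a] ← S1    transfunc[S1,b] ← S2
mark S2
ε-closure(move(S2,a)) = ε-closure({3,8}) = S1
ε-closure(move(S2,b)) = ε-closure({5}) = S2
transfunc[S2,a] ← S1    transfunc[S2,b] ← S2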
S0 is the start state of the DFA since 0 is a member of
S0 = {0,1,2,4,7}.
S1 is an accepting state of the DFA since 8 is a member
of S1 = ε-closure({3,8}).
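The construction traced above can be sketched in Python. The NFA itself is not listed in this section, so the tables EPS and MOVES below are an assumption: a Thompson-style NFA for (a|b)*a, numbered so that it agrees with the sets shown in the trace (start state 0, accepting state 8). The function names follow the trace where possible; the rest are illustrative.

```python
# Assumed NFA for (a|b)*a: epsilon edges and labelled edges.
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
MOVES = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}}
ACCEPT = 8


def eps_closure(states):
    """All NFA states reachable from `states` using epsilon edges only."""
    stack = list(states)
    closure = set(states)
    while stack:
        s = stack.pop()
        for t in EPS.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` on one edge labelled `symbol`."""
    out = set()
    for s in states:
        out |= MOVES.get((s, symbol), set())
    return out


def subset_construction(start=0, alphabet="ab"):
    s0 = eps_closure({start})
    dstates = [s0]            # DFA states in order of discovery: S0, S1, ...
    transfunc = {}
    unmarked = [s0]
    while unmarked:
        S = unmarked.pop(0)   # "mark S"
        for c in alphabet:
            T = eps_closure(move(S, c))
            if T not in dstates:
                dstates.append(T)
                unmarked.append(T)
            transfunc[(S, c)] = T
    return dstates, transfunc


dstates, transfunc = subset_construction()
accepting = [S for S in dstates if ACCEPT in S]
```

Under the assumed NFA, the loop discovers exactly three DFA states, and only the one reached on a contains state 8, matching the conclusion that S1 is the accepting state.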
Converting RE Directly to DFAs
We may convert a regular expression into a DFA (without
creating an NFA first).
First we augment (enlarge) the given regular expression by
concatenating it with a special symbol #.
r → (r)#   (augmented regular expression)
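Syntax tree of (a|b)*a#
•
├── •
│   ├── *
│   │   └── |
│   │       ├── a  (position 1)
│   │       └── b  (position 2)
│   └── a  (position 3)
└── #  (position 4)
Interior nodes: • (concatenation), * (closure), | (alternation)
Each symbol is numbered (its position).
Each symbol is at a leaf.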
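The numbering of positions can be sketched as a left-to-right walk over the syntax tree. The Node class, the helper leaf, and the operator names "cat", "or" and "star" are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    op: str                          # "cat", "or", "star", or "leaf"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    symbol: str = ""                 # used by leaves only
    position: int = 0                # used by leaves only, assigned below


def leaf(symbol):
    return Node("leaf", symbol=symbol)


def number_leaves(root):
    """Assign positions 1, 2, 3, ... to the leaves from left to right."""
    positions = {}

    def visit(node):
        if node is None:
            return
        if node.op == "leaf":
            node.position = len(positions) + 1
            positions[node.position] = node.symbol
        else:
            visit(node.left)
            visit(node.right)

    visit(root)
    return positions


# Syntax tree of the augmented expression (a|b)*a#
tree = Node("cat",
            Node("cat",
                 Node("star", Node("or", leaf("a"), leaf("b"))),
                 leaf("a")),
            leaf("#"))
positions = number_leaves(tree)
```

The resulting dictionary maps each position to its symbol, so the two occurrences of a stay distinguishable (positions 1 and 3).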
G1 = {2}
G2 = {1,3}
State   a   b
1       2   3
2       2   3
3       4   3