0% found this document useful (0 votes)
99 views8 pages

Module-3 Lexical Analysis: System Software 15CS63

The document discusses lexical analysis in compiler design. It describes the role of a lexical analyzer in separating lexical analysis from parsing. Tokens, patterns, and lexemes are defined. Finite automata and regular expressions are used to formally specify tokens and recognize them. Transition diagrams are used to design the lexical analyzer. Errors in lexical analysis and strategies for error recovery are also covered.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
99 views8 pages

Module-3 Lexical Analysis: System Software 15CS63

The document discusses lexical analysis in compiler design. It describes the role of a lexical analyzer in separating lexical analysis from parsing. Tokens, patterns, and lexemes are defined. Finite automata and regular expressions are used to formally specify tokens and recognize them. Transition diagrams are used to design the lexical analyzer. Errors in lexical analysis and strategies for error recovery are also covered.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

System Software 15CS63

MODULE-3
Lexical Analysis
 Role of lexical analyzer

 Specification of tokens

 Recognition of tokens

 Lexical analyzer generator

 Finite automata

 Design of lexical analyzer generator

The role of lexical analyzer

Why to separate Lexical analysis and parsing

1. Simplicity of design

2. Improving compiler efficiency

3. Enhancing compiler portability

Tokens, Patterns and Lexemes

 A token is a pair a token name and an optional token value

 A pattern is a description of the form that the lexemes of a token may take

 A lexeme is a sequence of characters in the source program that matches the pattern for a
token

Example

46 GMIT, Davangere Deepak D J


System Software 15CS63

 Attributes for tokens


E = M * C ** 2
<id, pointer to symbol table entry for E>
<assign-op>
<id, pointer to symbol table entry for M>
<mult-op>
<id, pointer to symbol table entry for C>
<exp-op>
<number, integer value 2>

 Lexical errors
Some errors are out of power of lexical analyzer to recognize:
o fi (a == f(x)) …
However it may be able to recognize errors like:
o d = 2r
Such errors are recognized when no pattern for tokens matches a
character sequence

 Error recovery
1. Panic mode: successive characters are ignored until we reach to a well formed token
2. Delete one character from the remaining input
3. Insert a missing character into the remaining input

47 GMIT, Davangere Deepak D J


System Software 15CS63

4. Replace a character by another character


5. Transpose two adjacent characters

 Input buffering

Sentinels

48 GMIT, Davangere Deepak D J


System Software 15CS63

 Specification of tokens
1. In theory of compilation regular expressions are used to formalize the specification
of tokens

2. Regular expressions are means for specifying regular languages

3. Example:

i. Letter_(letter_ | digit)*

4. Each regular expression is a pattern specifying the form of strings

 Regular expressions
1. Ɛ is a regular expression, L(Ɛ) = {Ɛ}

2. If a is a symbol in ∑then a is a regular expression, L(a) = {a}

3. (r) | (s) is a regular expression denoting the language L(r) L(s)

4. (r)(s) is a regular expression denoting the language L(r)L(s)

5. (r)* is a regular expression denoting (L(r))*

49 GMIT, Davangere Deepak D J


System Software 15CS63

6. (r) is a regular expression denoting L(r)

 Regular definitions
1. d1 -> r1
2. d2 -> r2
3. …
4. dn -> rn

5. Example:

6. letter_ -> A | B | … | Z | a | b | … | Z | _
7. digit -> 0 | 1 | … | 9
8. id -> letter_ (letter_ | digit)*
 Extensions
One or more instances: (r)+

Zero of one instances: r?

Character classes: [abc]

Example:

letter_ -> [A-Za-z_]

digit -> [0-9]

id -> letter_(letter|digit)*

 Recognition of tokens
Starting point is the language grammar to understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt

expr -> term relop term
| term
term -> id
| number
 Recognition of tokens (cont.)
The next step is to formalize the patterns:
digit -> [0-9]
Digits -> digit+
number -> digit(.digits)? (E[+-]? Digit)?
letter -> [A-Za-z_]
id -> letter (letter|digit)*
If -> if
Then -> then
Else -> else
Relop -> < | > | <= | >= | = | <>
We also need to handle whitespaces:

50 GMIT, Davangere Deepak D J


System Software 15CS63

ws -> (blank | tab | newline)+

 Transition diagrams

 Transition diagrams (cont.)

 Transition diagram for whitespace

 Transition diagram for unsigned numbers

51 GMIT, Davangere Deepak D J


System Software 15CS63

Architecture of a transition-diagram-based lexical analyzer

TOKEN getRelop()

TOKEN retToken = new (RELOP)

while (1) { /* repeat character processing until a

return or failure occurs */

switch(state) {

case 0: c= nextchar();

if (c == ‘<‘) state = 1;

else if (c == ‘=‘) state = 5;

else if (c == ‘>’) state = 6;

else fail(); /* lexeme is not a relop */

break;

case 1: …

case 8: retract();

retToken.attribute = GT;

return(retToken);

 Finite Automata

 Regular expressions = specification


52 GMIT, Davangere Deepak D J
System Software 15CS63

 Finite automata = implementation

 A finite automaton consists of


o An input alphabet

o A set of states S

o A start state n

o A set of accepting states F S

o A set of transitions state input state

 Transition

s1 a s2

 Is read

In state s1 on input “a” go to state s2

 If end of input

 If in accepting state => accept, othewise => reject

 If no transition possible => reject

Example

 Alphabet still { 0, 1 }

The operation of the automaton is not completely defined by the input

On input “11” the automaton could be in either state

MODULE-4

53 GMIT, Davangere Deepak D J

You might also like