Lexical Analysis 3

The document discusses lexical analysis in compiler design. It covers topics like the role of a lexical analyzer, tokens, patterns and lexemes, attributes of tokens, lexical errors, input buffering, specification of tokens using regular expressions, recognition of tokens using transition diagrams, and the architecture of a lexical analyzer.

Uploaded by

shrutika jori

COMPILER DESIGN

Topic: Lexical Analysis

Soma Ghosh

The role of lexical analyzer

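[Figure: the source program feeds the lexical analyzer; the parser requests each token with getNextToken and the lexical analyzer returns it; the parser's output goes on to semantic analysis; both the lexical analyzer and the parser consult the symbol table.]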
Why separate lexical analysis and parsing
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Tokens, Patterns and Lexemes

• A token is a pair consisting of a token name and an optional token value
• A pattern is a description of the form that the lexemes
of a token may take
• A lexeme is a sequence of characters in the source
program that matches the pattern for a token
Example

Token        Informal description                      Sample lexemes
if           Characters i, f                           if
else         Characters e, l, s, e                     else
comparison   < or > or <= or >= or == or !=            <=, !=
id           Letter followed by letters and digits     pi, score, D2
number       Any numeric constant                      3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "s          "core dumped"

printf("total = %d\n", score);

Attributes for tokens

• E = M * C ** 2
• <id, pointer to symbol table entry for E>
• <assign-op>
• <id, pointer to symbol table entry for M>
• <mult-op>
• <id, pointer to symbol table entry for C>
• <exp-op>
• <number, integer value 2>
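The token stream above can be sketched in C as a small struct holding a token name and one attribute. This is only an illustration: the enum names, the fixed-size symbol table, and the helpers install_id and example_stream are assumptions, not part of the slides.

```c
#include <string.h>

/* Token names from the slide; the enum spelling is an illustrative assumption. */
typedef enum { TOK_ID, TOK_ASSIGN_OP, TOK_MULT_OP, TOK_EXP_OP, TOK_NUMBER } TokenName;

typedef struct {
    TokenName name;
    int attr;  /* symbol-table index for id, integer value for number, 0 otherwise */
} Token;

#define MAX_SYMBOLS 16
static char symtab[MAX_SYMBOLS][32];  /* hypothetical, very small symbol table */
static int symcount = 0;

/* Return the symbol-table index of lexeme, inserting it if it is new.
   Returns -1 if the table is full. */
int install_id(const char *lexeme) {
    for (int i = 0; i < symcount; i++)
        if (strcmp(symtab[i], lexeme) == 0) return i;
    if (symcount == MAX_SYMBOLS) return -1;
    strncpy(symtab[symcount], lexeme, sizeof symtab[0] - 1);
    symtab[symcount][sizeof symtab[0] - 1] = '\0';
    return symcount++;
}

/* Fill out with the tokens for "E = M * C ** 2" and return how many were written. */
int example_stream(Token out[8]) {
    int n = 0;
    out[n++] = (Token){ TOK_ID, install_id("E") };
    out[n++] = (Token){ TOK_ASSIGN_OP, 0 };
    out[n++] = (Token){ TOK_ID, install_id("M") };
    out[n++] = (Token){ TOK_MULT_OP, 0 };
    out[n++] = (Token){ TOK_ID, install_id("C") };
    out[n++] = (Token){ TOK_EXP_OP, 0 };
    out[n++] = (Token){ TOK_NUMBER, 2 };
    return n;
}
```

Here the "pointer to symbol table entry" is represented by an array index, which keeps the sketch short; a real compiler would more likely store a pointer or handle.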
Lexical errors

• Some errors are beyond the power of the lexical analyzer to recognize:
• fi (a == f(x)) … (fi is a valid identifier, so the misspelled if can only be caught later, by the parser)
• However, it may be able to recognize errors like:
• d = 2r
• Such errors are recognized when no token pattern matches a character sequence
Error recovery

• Panic mode: successive characters are ignored until we reach a well-formed token
• Delete one character from the remaining input
• Insert a missing character into the remaining input
• Replace a character by another character
• Transpose two adjacent characters
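Panic mode, the first strategy in the list, might be sketched as follows. The set of characters that may begin a token is an assumption made for this example (letters, underscore, digits, and a handful of operator and punctuation characters); a real lexer would derive it from its token patterns. The names can_start_token and panic_skip are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Assumption: a token can only begin with a letter, '_', a digit,
   or one of the listed operator/punctuation characters. */
static int can_start_token(char c) {
    return isalpha((unsigned char)c) || c == '_' || isdigit((unsigned char)c)
        || (c != '\0' && strchr("<>=!+-*/();", c) != NULL);
}

/* Panic mode: skip input characters until one can begin a token or the
   input ends. Returns a pointer to where scanning should resume. */
const char *panic_skip(const char *p) {
    while (*p != '\0' && !can_start_token(*p))
        p++;
    return p;
}
```

For example, panic_skip("@#$x = 1") returns a pointer to "x = 1", so scanning resumes at the identifier x.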
Input buffering

• Sometimes the lexical analyzer needs to look ahead several symbols to decide which token to return
• In C: we need to look at the character after -, = or < to decide which token to return
• In Fortran: DO 5 I = 1.25 (an assignment to the variable DO5I; it would be a DO loop only if the period were a comma)
• We need a two-buffer scheme to handle large lookaheads safely

E = M * C * * 2 eof
Sentinels

E = M eof * C * * 2 eof eof

switch (*forward++) {
    case eof:
        if (forward is at end of first buffer) {
            reload second buffer;
            forward = beginning of second buffer;
        }
        else if (forward is at end of second buffer) {
            reload first buffer;
            forward = beginning of first buffer;
        }
        else /* eof within a buffer marks the end of input */
            terminate lexical analysis;
        break;
    cases for the other characters;
}
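The switch above is pseudocode. A runnable approximation is sketched below under several simplifying assumptions: the "file" is an in-memory string, each buffer half holds only N = 4 characters so that reloads happen often, the sentinel is the NUL character (so the source text must not contain NUL), and the lexemeBegin pointer is left out. Scanner, reload, scanner_init and next_char are names invented for this sketch.

```c
#include <string.h>

#define N 4            /* tiny buffer half, so reloads are easy to observe */
#define SENTINEL '\0'  /* assumption: the source text never contains NUL */

typedef struct {
    const char *src;        /* unread source text (stands in for a file) */
    char buf[2 * (N + 1)];  /* two halves: N characters plus one sentinel slot each */
    char *forward;
} Scanner;

/* Copy up to N source characters into the given half and put a sentinel after them. */
static void reload(Scanner *s, char *half) {
    size_t n = strlen(s->src);
    if (n > N) n = N;
    memcpy(half, s->src, n);
    half[n] = SENTINEL;
    s->src += n;
}

void scanner_init(Scanner *s, const char *text) {
    s->src = text;
    reload(s, s->buf);
    s->forward = s->buf;
}

/* Return the next source character, or -1 at end of input. */
int next_char(Scanner *s) {
    for (;;) {
        char c = *s->forward++;
        if (c != SENTINEL) return (unsigned char)c;
        if (s->forward == s->buf + N + 1) {               /* end of first half */
            reload(s, s->buf + N + 1);
            s->forward = s->buf + N + 1;
        } else if (s->forward == s->buf + 2 * (N + 1)) {  /* end of second half */
            reload(s, s->buf);
            s->forward = s->buf;
        } else {                                          /* sentinel inside a half */
            s->forward--;  /* stay on the sentinel so later calls also return -1 */
            return -1;
        }
    }
}
```

The point of the sentinel is visible in next_char: the common path performs a single comparison per character, and the buffer-boundary checks run only when that comparison hits the sentinel.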
Specification of tokens

• In compiler theory, regular expressions are used to formalize the specification of tokens
• Regular expressions are a means of specifying regular languages
• Example:
• letter_(letter_ | digit)*
• Each regular expression is a pattern specifying the form of strings
Regular expressions

• ε is a regular expression, L(ε) = {ε}
• If a is a symbol in Σ, then a is a regular expression, L(a) = {a}
• (r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
Regular definitions

d1 -> r1
d2 -> r2
…
dn -> rn

• Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Extensions

• One or more instances: (r)+
• Zero or one instance: r?
• Character classes: [abc]

• Example:
• letter_ -> [A-Za-z_]
• digit -> [0-9]
• id -> letter_(letter_|digit)*
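As a rough illustration, the id pattern can be hand-coded as a function that returns the length of the longest matching prefix. The function name match_id is hypothetical, and the sketch assumes the default "C" locale, where isalpha accepts exactly A-Z and a-z.

```c
#include <ctype.h>
#include <stddef.h>

/* letter_ -> [A-Za-z_]  (isalpha is assumed to run in the "C" locale) */
static int is_letter_(char c) {
    return isalpha((unsigned char)c) || c == '_';
}

/* Length of the longest prefix of s that matches letter_(letter_|digit)*,
   or 0 if s does not start with an identifier. */
size_t match_id(const char *s) {
    size_t n;
    if (!is_letter_(s[0])) return 0;
    n = 1;
    while (is_letter_(s[n]) || isdigit((unsigned char)s[n]))
        n++;
    return n;
}
```

For instance, match_id("D2 = 1") yields 2 (the lexeme D2), while match_id("2r") yields 0 because an identifier cannot start with a digit.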
Recognition of tokens

• The starting point is the language grammar, which shows what the tokens are:
stmt -> if expr then stmt
| if expr then stmt else stmt
| ε
expr -> term relop term
| term
term -> id
| number
Recognition of tokens (cont.)

• The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
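The number pattern, read as digits (. digits)? (E [+-]? digits)?, could be hand-coded along these lines. This is a sketch rather than a definitive scanner: match_number and the digits helper are hypothetical names, and only an uppercase E is accepted because that is what the pattern specifies.

```c
#include <ctype.h>
#include <stddef.h>

/* digits -> digit+ : count the leading decimal digits of s (0 if none). */
static size_t digits(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Length of the longest prefix of s matching
   digits (. digits)? (E [+-]? digits)?, or 0 if there is no match. */
size_t match_number(const char *s) {
    size_t n = digits(s);
    if (n == 0) return 0;
    if (s[n] == '.') {                 /* optional fraction */
        size_t f = digits(s + n + 1);
        if (f > 0) n += 1 + f;         /* accept '.' only if digits follow */
    }
    if (s[n] == 'E') {                 /* optional exponent */
        size_t k = n + 1;
        if (s[k] == '+' || s[k] == '-') k++;
        size_t e = digits(s + k);
        if (e > 0) n = k + e;          /* accept 'E' only if digits follow */
    }
    return n;
}
```

Each optional part is committed only once its mandatory digits have been seen, which plays the same role as retracting the input pointer in a transition diagram.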
Transition diagrams

• Transition diagram for relop

Transition diagrams (cont.)

• Transition diagram for reserved words and identifiers
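One common way to handle reserved words is to scan every word with the identifier diagram and then look the lexeme up in a table of reserved words. The sketch below assumes that approach and the three keywords of the example grammar; classify_word and the enum names are hypothetical.

```c
#include <string.h>
#include <stddef.h>

typedef enum { TOK_IF, TOK_THEN, TOK_ELSE, TOK_ID } TokenName;

/* Classify a lexeme of the given length that already matched the id pattern:
   a reserved word gets its own token name, anything else is an id. */
TokenName classify_word(const char *lexeme, size_t len) {
    static const struct { const char *text; TokenName name; } reserved[] = {
        { "if", TOK_IF }, { "then", TOK_THEN }, { "else", TOK_ELSE }
    };
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strlen(reserved[i].text) == len
            && memcmp(reserved[i].text, lexeme, len) == 0)
            return reserved[i].name;
    return TOK_ID;
}
```

Because the whole lexeme is compared, a word such as iffy is classified as an id even though it begins with the keyword if.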

Transition diagrams (cont.)

• Transition diagram for unsigned numbers

Transition diagrams (cont.)

• Transition diagram for whitespace

Architecture of a transition-diagram-based lexical analyzer
TOKEN getRelop()
{
    TOKEN retToken = new(RELOP);
    while (1) { /* repeat character processing until a
                   return or failure occurs */
        switch (state) {
            case 0: c = nextchar();
                    if (c == '<') state = 1;
                    else if (c == '=') state = 5;
                    else if (c == '>') state = 6;
                    else fail(); /* lexeme is not a relop */
                    break;
            case 1: …
            …
            case 8: retract();
                    retToken.attribute = GT;
                    return(retToken);
        }
    }
}
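A self-contained variant of getRelop is sketched below. The slide shows only states 0, 1, 5, 6 and 8; the remaining states here follow the usual textbook numbering and should be read as an assumption. The sketch also scans a string and returns the lexeme length instead of using nextchar, retract and fail, and get_relop and RelopAttr are names chosen for this example.

```c
#include <stddef.h>

typedef enum { LT, LE, EQ, NE, GT, GE } RelopAttr;

/* Recognize a relop (< | > | <= | >= | = | <>) at the start of s.
   On success, store its attribute in *attr and return the lexeme length;
   return 0 if s does not start with a relop. */
size_t get_relop(const char *s, RelopAttr *attr) {
    size_t forward = 0;
    int state = 0;
    for (;;) {
        char c;
        switch (state) {
        case 0:
            c = s[forward++];
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else return 0;                 /* lexeme is not a relop */
            break;
        case 1:
            c = s[forward++];
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: *attr = LE; return forward;
        case 3: *attr = NE; return forward;
        case 4: forward--; *attr = LT; return forward;  /* retract one character */
        case 5: *attr = EQ; return forward;
        case 6:
            c = s[forward++];
            state = (c == '=') ? 7 : 8;
            break;
        case 7: *attr = GE; return forward;
        case 8: forward--; *attr = GT; return forward;  /* retract one character */
        }
    }
}
```

States 4 and 8 give back the extra character that was read, which is how the lookahead needed to tell < from <= and <> shows up in code.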
Lexical Analyzer Generator - Lex

Lex source program (lex.l) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> Sequence of tokens
Structure of Lex programs

declarations
%%
translation rules (each of the form: Pattern {Action})
%%
auxiliary functions
Compiling & executing lex programs
