Compiler Design Lexical Analysis

The document discusses the role of lexical analysis in compiler design. It explains that the lexical analyzer tokenizes the source program by separating it into tokens which are then passed to the parser. It describes tokens as pairs of a token name and optional value. Regular expressions are used to formally specify token patterns. The lexical analyzer generates a symbol table and detects lexical errors. It also discusses topics like input buffering, token attributes, transition diagrams, and lexical analyzer generators like Lex.

Uploaded by

Bhaskar P


COMPILER DESIGN

BCA 5th Semester 2020

Topic: Lexical Analysis

Sakhi Bandyopadhyay
Department of Computer Science and BCA
Kharagpur College
The role of the lexical analyzer

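Figure: the lexical analyzer reads the source program and passes tokens to the parser, which requests each one by calling getNextToken; the parser's output goes on to semantic analysis, and both components consult the symbol table.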
Why separate lexical analysis and parsing?
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Tokens, Patterns and Lexemes

• A token is a pair consisting of a token name and an optional attribute value

• A pattern is a description of the form that the lexemes of a token may take
• A lexeme is a sequence of characters in the source program that matches the pattern for a token
Example

Token        Informal description                      Sample lexemes
if           characters i, f                           if
else         characters e, l, s, e                     else
comparison   < or > or <= or >= or == or !=            <=, !=
id           letter followed by letters and digits     pi, score, D2
number       any numeric constant                      3.14159, 0, 6.02e23
literal      anything but " surrounded by "s           "core dumped"

In the C statement printf("total = %d\n", score); both printf and score are lexemes matching the pattern for token id, and "total = %d\n" is a lexeme matching literal.

Attributes for tokens

• E = M * C ** 2
• <id, pointer to symbol table entry for E>
• <assign-op>
• <id, pointer to symbol table entry for M>
• <mult-op>
• <id, pointer to symbol table entry for C>
• <exp-op>
• <number, integer value 2>
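To make the <name, attribute> pairs concrete, here is a minimal C sketch of one possible token representation for E = M * C ** 2. The enum names, the union layout, and the use of a symbol-table index in place of a pointer are assumptions for illustration, not definitions from these slides.

```c
#include <stdio.h>

/* Hypothetical token names for the statement E = M * C ** 2. */
typedef enum { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUMBER } TokenName;

/* A token: a token name plus an optional attribute value. */
typedef struct {
    TokenName name;
    union {
        int symtab_index; /* for ID: index of the symbol-table entry (stands in for a pointer) */
        int int_value;    /* for NUMBER: the integer value */
    } attr;               /* unused for the operator tokens */
} Token;

/* Minimal symbol table: entry i holds the lexeme of an identifier. */
static const char *symtab[] = { "E", "M", "C" };

/* Token stream for E = M * C ** 2, written out by hand. */
static const Token stream[] = {
    { ID,        { .symtab_index = 0 } },
    { ASSIGN_OP, { 0 } },
    { ID,        { .symtab_index = 1 } },
    { MULT_OP,   { 0 } },
    { ID,        { .symtab_index = 2 } },
    { EXP_OP,    { 0 } },
    { NUMBER,    { .int_value = 2 } },
};

static const char *token_names[] = { "id", "assign-op", "mult-op", "exp-op", "number" };

/* Print each token in the <name, attribute> notation used above. */
void print_stream(void) {
    size_t n = sizeof stream / sizeof stream[0];
    for (size_t i = 0; i < n; i++) {
        const Token *t = &stream[i];
        if (t->name == ID)
            printf("<id, symtab[%d] = %s>\n", t->attr.symtab_index,
                   symtab[t->attr.symtab_index]);
        else if (t->name == NUMBER)
            printf("<number, %d>\n", t->attr.int_value);
        else
            printf("<%s>\n", token_names[t->name]);
    }
}
```

Operator tokens carry no attribute here because the token name alone tells the parser everything it needs; only id and number need extra information.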
Lexical errors

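• Some errors are beyond the power of a lexical analyzer to recognize:

• fi (a == f(x)) …
• Here fi is a valid lexeme for the token id, so the lexical analyzer cannot tell that it is a misspelling of if; the error must be caught by a later phase
• However, it may be able to recognize errors like:
• d = 2r
• Such errors are recognized when no token pattern matches a prefix of the remaining input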
Error recovery

• Panic mode: successive characters are deleted from the remaining input until a well-formed token can be found
• Delete one character from the remaining input
• Insert a missing character into the remaining input
• Replace a character by another character
• Transpose two adjacent characters
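Panic mode is the simplest of these strategies to implement. The sketch below is illustrative only: the helper can_start_token and its character set are hypothetical, and a real lexer would derive that set from its own token patterns.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a character can begin some token if it is a letter, digit,
   underscore, whitespace, or one of these operator/punctuation symbols. */
static int can_start_token(char c) {
    return isalnum((unsigned char)c) || c == '_' || isspace((unsigned char)c)
        || (c != '\0' && strchr("+-*/=<>!();,\"", c) != NULL);
}

/* Panic-mode recovery: at position pos no token pattern matched.
   Delete the offending character (so scanning always makes progress),
   then keep deleting until a character that can start a token is found.
   Returns the position to resume scanning from; *skipped receives the
   number of deleted characters. */
size_t panic_mode_recover(const char *input, size_t pos, size_t *skipped) {
    size_t start = pos;
    if (input[pos] != '\0')
        pos++;
    while (input[pos] != '\0' && !can_start_token(input[pos]))
        pos++;
    *skipped = pos - start;
    return pos;
}
```

For example, scanning x = @#$y; and failing at @ (position 4) resumes at y (position 7) after deleting three characters. The other strategies (delete, insert, replace, transpose one character) are single-edit repairs and need a way to test whether the repaired input now matches a token.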
Input buffering

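• Sometimes the lexical analyzer must look ahead one or more characters before it can decide which token to return

• In C: after reading -, = or <, the lexer must look at the next character, since it may be the start of ->, == or <=
• In Fortran: DO 5 I = 1.25 is an assignment to the identifier DO5I (blanks are insignificant), while DO 5 I = 1,25 begins a loop; the lexer cannot tell which until it reaches the . or the ,
• A two-buffer scheme lets the lexer handle such large lookaheads safely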

Figure: an input buffer holding E = M * C * * 2 eof
Sentinels

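Figure: each buffer half ends with an eof sentinel (E = M eof | * C * * 2 eof | eof); the eof that appears before the end of the second half marks the real end of the input.

Because each half ends with a sentinel, the lexer normally needs only one comparison per character to detect both the end of a buffer and the end of the input: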

switch (*forward++) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
/* cases for the other characters */
}
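The pseudocode above can be turned into a small runnable sketch. It assumes the input comes from an in-memory C string (the src variable, the helper names, and the tiny half size N = 4 are hypothetical, chosen to force frequent reloads) and that the character '\0' never appears in the source text, so it can serve as the eof sentinel.

```c
#include <stddef.h>
#include <string.h>

#define N 4            /* size of each buffer half (tiny, to force reloads) */
#define SENTINEL '\0'  /* assumption: never occurs in the source text */

/* Two buffer halves, each followed by one sentinel slot:
   buf[0..N-1] | buf[N] | buf[N+1..2N] | buf[2N+1] */
static char buf[2 * (N + 1)];
static char *forward;
static const char *src;   /* hypothetical input source: a C string */
static size_t src_pos;

/* Copy up to N characters into the half starting at dst and put the
   sentinel right after them. If fewer than N characters remain, the
   sentinel lands inside the half and marks the real end of input. */
static void reload(char *dst) {
    size_t n = strlen(src + src_pos);
    if (n > N) n = N;
    memcpy(dst, src + src_pos, n);
    src_pos += n;
    dst[n] = SENTINEL;
}

void init_buffers(const char *text) {
    src = text;
    src_pos = 0;
    buf[N] = SENTINEL;          /* end-of-first-half sentinel */
    buf[2 * N + 1] = SENTINEL;  /* end-of-second-half sentinel */
    reload(buf);
    forward = buf;
}

/* Return the next character, or -1 at end of input. */
int next_char(void) {
    for (;;) {
        char c = *forward++;
        if (c != SENTINEL)              /* common case: one test only */
            return (unsigned char)c;
        if (forward == buf + N + 1) {   /* passed the end of the first half */
            reload(buf + N + 1);
            forward = buf + N + 1;
        } else if (forward == buf + 2 * N + 2) {  /* end of the second half */
            reload(buf);
            forward = buf;
        } else {                        /* eof within a half: end of input */
            forward--;                  /* stay put so later calls also return -1 */
            return -1;
        }
    }
}
```

A real lexer would also keep a lexemeBegin pointer and must not reload the half that still holds the start of the current lexeme; that bookkeeping is omitted here.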
Specification of tokens

• In the theory of compilation, regular expressions are used to formalize the specification of tokens

• Regular expressions are a notation for specifying regular languages
• Example:
• letter_ (letter_ | digit)*
• Each regular expression is a pattern specifying the form of strings
Regular expressions

• ε is a regular expression, L(ε) = {ε}

• If a is a symbol in Σ, then a is a regular expression, L(a) = {a}
• (r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
Regular definitions

d1 -> r1
d2 -> r2
…
dn -> rn

• Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Extensions

• One or more instances: (r)+

• Zero or one instance: r?
• Character classes: [abc] is shorthand for a | b | c

• Example:
• letter_ -> [A-Za-z_]
• digit -> [0-9]
• id -> letter_ (letter_ | digit)*
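With character classes, each part of the id definition becomes a simple character test. A minimal sketch, assuming ASCII input in the default C locale (where isalpha matches exactly [A-Za-z]); the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* letter_ -> [A-Za-z_] */
static int is_letter_(char c) {
    return isalpha((unsigned char)c) || c == '_';
}

/* digit -> [0-9] */
static int is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* Return the length of the longest prefix of s that matches
   id -> letter_ (letter_ | digit)*, or 0 if s does not start with an id. */
size_t match_id(const char *s) {
    size_t i;
    if (!is_letter_(s[0]))
        return 0;                     /* an id must begin with letter_ */
    for (i = 1; is_letter_(s[i]) || is_digit(s[i]); i++)
        ;                             /* the (letter_ | digit)* part */
    return i;
}
```

Returning the longest matching prefix, rather than a yes/no answer, mirrors what a lexer needs: match_id("D2 = 1") returns 2, the length of the lexeme D2.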
Recognition of tokens

• The starting point is the grammar of the language, which shows which tokens are needed:

stmt -> if expr then stmt
| if expr then stmt else stmt
| ε
expr -> term relop term
| term
term -> id
| number
Recognition of tokens (cont.)

• The next step is to formalize the patterns:

digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
Transition diagrams

• Transition diagram for relop

Transition diagrams (cont.)

• Transition diagram for reserved words and identifiers
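One common way to realize this diagram is to scan the longest letter (letter | digit)* lexeme and then look it up in a table of reserved words. The sketch below uses a small keyword table; the names scan_word and TokenName and the table itself are assumptions, and pre-loading the reserved words into the symbol table works equally well.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_IF, TOK_THEN, TOK_ELSE, TOK_ID, TOK_NONE } TokenName;

/* Reserved words of the example grammar. */
static const struct { const char *lexeme; TokenName name; } keywords[] = {
    { "if", TOK_IF }, { "then", TOK_THEN }, { "else", TOK_ELSE },
};

/* Scan letter (letter | digit)* at the start of s (letter includes '_').
   Store the lexeme length in *len and return a keyword token if the
   lexeme is reserved, TOK_ID otherwise, or TOK_NONE (with *len = 0)
   if s does not start with a letter. */
TokenName scan_word(const char *s, size_t *len) {
    size_t i = 0;
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) {
        *len = 0;
        return TOK_NONE;
    }
    while (isalnum((unsigned char)s[i]) || s[i] == '_')
        i++;                          /* longest match first ... */
    *len = i;
    for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
        if (strlen(keywords[k].lexeme) == i &&
            strncmp(s, keywords[k].lexeme, i) == 0)
            return keywords[k].name;  /* ... then check for a reserved word */
    return TOK_ID;
}
```

Because the longest lexeme is taken first, iffy is an id rather than if followed by fy, and fi from the earlier error example is likewise returned as an ordinary id.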

Transition diagrams (cont.)

• Transition diagram for unsigned numbers
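A sketch of a recognizer for number -> digits (. digits)? (E [+-]? digits)?, following the pattern given earlier. It assumes that when an optional fraction or exponent is only partially present (as in 12. or 1E+), the lexer backs up to the longest complete match; match_number is a hypothetical name and, like the pattern, it accepts only an uppercase E.

```c
#include <stddef.h>

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Return the length of the longest prefix of s matching
   number -> digits (. digits)? (E [+-]? digits)?, or 0 if none. */
size_t match_number(const char *s) {
    size_t i = 0;
    while (is_digit(s[i])) i++;
    if (i == 0)
        return 0;                         /* digits is mandatory */
    if (s[i] == '.' && is_digit(s[i + 1])) {
        i++;                              /* optional fraction: . digits */
        while (is_digit(s[i])) i++;
    }
    if (s[i] == 'E') {                    /* optional exponent */
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-') j++;
        if (is_digit(s[j])) {             /* accept only if digits follow */
            while (is_digit(s[j])) j++;
            i = j;
        }                                 /* otherwise back up to i */
    }
    return i;
}
```

Backing up on an incomplete exponent is what the retract step does in a transition diagram: the characters read past the last accepting point are returned to the input.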

Transition diagrams (cont.)

• Transition diagram for whitespace
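Whitespace matched by ws -> (blank | tab | newline)+ is normally discarded by the lexical analyzer rather than returned to the parser. A minimal sketch; skip_ws is a hypothetical name, and the line counter is an extra included only because a newline count is handy for error messages.

```c
#include <stddef.h>

/* Skip ws -> (blank | tab | newline)+ at the start of s. Adds the number
   of newlines seen to *lines and returns the number of characters
   skipped (0 if s does not start with whitespace). No token is produced. */
size_t skip_ws(const char *s, int *lines) {
    size_t i = 0;
    while (s[i] == ' ' || s[i] == '\t' || s[i] == '\n') {
        if (s[i] == '\n')
            (*lines)++;
        i++;
    }
    return i;
}
```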

Architecture of a transition-diagram-based
lexical analyzer
TOKEN getRelop()
{
    TOKEN retToken = new(RELOP);
    while (1) { /* repeat character processing until a
                   return or failure occurs */
        switch (state) {
        case 0: c = nextchar();
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else fail(); /* lexeme is not a relop */
            break;
        case 1: …
        …
        case 8: retract();
            retToken.attribute = GT;
            return(retToken);
        }
    }
}
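For reference, here is a self-contained sketch of the whole relop diagram for relop -> < | > | <= | >= | = | <>, working on a C string instead of the nextchar, retract and fail helpers used above. States 0, 1 and 6 follow the fragment; the accepting states (including 5 and 8) are folded into early returns, and the attribute names other than GT are assumptions.

```c
#include <stddef.h>

typedef enum { LT, LE, EQ, NE, GT, GE } RelopAttr;

/* Simulate the relop transition diagram on the start of s.
   On success store the attribute in *attr and return the lexeme length;
   return 0 if s does not start with a relop (the fail() case). */
size_t get_relop(const char *s, RelopAttr *attr) {
    int state = 0;
    size_t pos = 0;
    for (;;) {
        char c = s[pos++];                              /* nextchar() */
        switch (state) {
        case 0:
            if (c == '<') state = 1;
            else if (c == '=') { *attr = EQ; return pos; }  /* state 5 */
            else if (c == '>') state = 6;
            else return 0;                              /* fail() */
            break;
        case 1:                                         /* seen '<' */
            if (c == '=') { *attr = LE; return pos; }
            if (c == '>') { *attr = NE; return pos; }
            *attr = LT; return pos - 1;                 /* retract() */
        case 6:                                         /* seen '>' */
            if (c == '=') { *attr = GE; return pos; }
            *attr = GT; return pos - 1;                 /* state 8: retract() */
        }
    }
}
```

The pos - 1 returns play the role of retract(): the character that ended the lexeme was read but does not belong to it, so it stays in the input for the next token.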
Lexical Analyzer Generator - Lex

Figure: the Lex source program lex.l is fed to the Lex compiler, which produces lex.yy.c; the C compiler compiles lex.yy.c into a.out; a.out reads an input stream and produces a sequence of tokens.
Structure of Lex programs

declarations
%%
translation rules
%%
auxiliary functions

• Each translation rule has the form Pattern { Action }
Thank You

You might also like

• Design Thinking and Innovation at Apple (36 pages)
• Lisp Interpreter in Rust, Vishal Patil (from Everand)
• Comparative Report SMS LMS (14 pages)
• How To Create Bank Statement Transaction Creation Rules and Account Bank Charges Fees or Interest (9 pages)
• Yardi Commercial Suite (52 pages)
• Lexical Analysis 3 (27 pages)
• Compiler-Lexical Analysis (59 pages)
• Lexical Analysis (57 pages)
• Compiler (60 pages)
• SSC Module2 LexicalAnalysis (26 pages)
• Chapter 2 - Lexical Analysis (56 pages)
• 2 - Lexical Analysis (52 pages)
• Compiler Course: Lexical Analysis (50 pages)
• Chapter 2 - Lexical Analysis (69 pages)
• Chapter 2 (77 pages)
• Chapter 2 Lexical Analysis (33 pages)
• Lec2 LexicalAnalyser (30 pages)
• 4 Lexical Analysis (60 pages)
• Chapter 3 Lexical Analysis (5 pages)
• Pdf&rendition 1 (14 pages)
• Chapter 3 - Lexical Analysis (34 pages)
• 21CS51 ATCD MODULE 2 - 2 Lexical Analyser Part2 (62 pages)
• Chapter2-Lexical Analysis (64 pages)
• CD KCS502 Unit 1 B (12 pages)
• Chapter 2 - Lexical Analyser (40 pages)
• Module 5 Lexical Analyser (10 pages)
• Chapter 3 - Lexical Analysis (51 pages)
• Chapter 2 - Lexical Analyser (38 pages)
• Unit 1 (B) (69 pages)
• Ch2+3 Compiler (21 pages)
• CD - Ch.1 (28 pages)
• Unit 2 Lexical Analyzer (63 pages)
• Chapter 3 - Lexical Analysis and Lexical Analyzer Generators (52 pages)
• 04 Lexi Cal A Analysis (39 pages)
• SP Unit III-2024-25 (126 pages)
• Chapter 2 - Lexical Analyser (39 pages)
• CH 2 - Lexical Analysis (36 pages)
• Compiler Design Chapter-2 (105 pages)
• Chapter 2 (27 pages)
• Ch3 Modified (80 pages)
• Lexical and Syntax Analysis (63 pages)
• UNIT-I - Lexical Analysis (51 pages)
• CD Unit-2 (64 pages)
• Lexical Analysis and Lexical Analyzer Generators: COP5621 Compiler Construction (52 pages)
• Chapter 3 - Lexical Analysis (52 pages)
• Lexical Analysis1 (44 pages)
• Lexical Analysis (6 pages)
• Lecture 02 (150 pages)
• Chapter 2 - Lexical Analysis - Regular Expressions (27 pages)
• 03 Lex Analysis (61 pages)
• Acknowledgements: The Slides For This Lecture Are A Modified Versions of The Offering by (40 pages)
• Compiler - Lexical Analyzer-2 (16 pages)
• Chapter 2 (56 pages)
• Chapter 1 (28 pages)
• CD UNIT-1 (60 pages)
• 2 Lex (45 pages)
• Ch3 1 (52 pages)
• Lec 02 (17 pages)
• 21CS51 ATCD MODULE 2 - 2 Lexical Analyser Part1 (63 pages)
• Lexical Analysis (62 pages)
• Hiragana Memory Hint Worksheet Booklet JPF (12 pages)
• TP3 - Hadoop Python - Wordcount (6 pages)
• James Hall (8 pages)
• Siemens Relay (12 pages)
• Resume 1 Naresh Bolikonda (3 pages)
• Erp Performance As Intervening Variable To Financial Performance For Erp Implementation, Adherence To Coso, and GCG Implementation (20 pages)
• Fusion 360 Lab Report (15 pages)
• Log (2 pages)
• Advatnages of Intranet Disadvantages of Intranet (2 pages)
• Big Data Analytics: Free Guide: 5 Data Science Tools To Consider (8 pages)
• Courier Onboarding (32 pages)
• Gmail - Experian Credit Report and Credit Score Through INDIALENDS (2 pages)
• COS 101 Revision (7 pages)
• PLSQL 16 18 (3 pages)
• SS7 Mad Prac 10 2 (4 pages)
• Scoreboarding or SVA?: in A UVM Class-Based Environment (3 pages)
• Vismat Material V-Ray For Sketchup (19 pages)
• DSA Lab 05 (5 pages)
• (CC & DSC) Midterm Exam Routine-CSE-Spring 2022 (7 pages)
• SECM (17 pages)
• RoadMap Data Science (6 pages)
• MC0069-System Analysis and Design Model Question Paper (23 pages)
• Reduced-Order State Observer Design (145 pages)
• A+ Exam Wrong Answers (50 pages)
• Model Predictive Control Using YALMIP Getting Started (5 pages)
• Smaart9 2ReleaseOverview (5 pages)
