Unit 1-REGULAR LANGUAGES

The document discusses lexical analysis, detailing the roles of lexical analyzers and their interaction with parsers. It covers concepts such as tokens, lexemes, patterns, and attributes, as well as methods for error recovery and input buffering techniques. Additionally, it introduces regular expressions, their algebraic properties, and the recognition of tokens through grammar and transition diagrams.


Module 2

Chapter 3

LEXICAL ANALYSIS
• Role of lexical analyzer / interaction between lexer & parser
• Some additional tasks: eliminating comments, blanks, tabs and newline characters; providing line numbers associated with error messages; and making a copy of the source program with error messages.
• Token: a token is a pair consisting of a token name and an optional attribute value.
• Pattern: a pattern is a description of the form that the lexemes of a token may take.
• Lexeme: a lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.
In many programming languages, the following classes cover most or all of the tokens:
1. One token for each keyword. The pattern for a keyword is the same as the keyword itself.
2. Tokens for the operators, either individually or in classes such as the token comparison.
3. One token representing all identifiers.
4. One or more tokens representing constants, such as numbers and literals.
5. Tokens for each punctuation symbol, such as left and right parentheses, comma, and semicolon.
Attributes for tokens
The tokens of the statement E = M * C ** 2 are written below as a sequence of <token-name, attribute-value> pairs.
<id, pointer to symbol-table entry for E>
<assign_op>
<id, pointer to symbol-table entry for M>
<mult_op>
<id, pointer to symbol-table entry for C>
<exp_op>
<number, integer value 2>
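The pair representation above can be sketched in a few lines of Python. This is a hedged illustration, not the book's implementation: the TOKEN_SPEC table, the tokenize function, and the use of an insertion-order index as the symbol-table "pointer" are all assumptions made for the sketch.

```python
import re

# Token classes for the E = M * C ** 2 example; ** must precede * in the
# alternation so the longer operator wins.
TOKEN_SPEC = [
    ("number",    r"\d+"),
    ("id",        r"[A-Za-z_]\w*"),
    ("exp_op",    r"\*\*"),
    ("mult_op",   r"\*"),
    ("assign_op", r"="),
    ("ws",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(src):
    symtab = {}                    # symbol table: lexeme -> entry index
    pairs = []
    for m in MASTER.finditer(src):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue               # whitespace is stripped, not tokenized
        if kind == "id":
            entry = symtab.setdefault(lexeme, len(symtab))
            pairs.append(("id", entry))        # attribute: symbol-table entry
        elif kind == "number":
            pairs.append(("number", int(lexeme)))  # attribute: integer value
        else:
            pairs.append((kind, None))         # operators need no attribute
    return pairs

print(tokenize("E = M * C ** 2"))
```

Running the sketch reproduces the sequence above: three id tokens pointing at distinct symbol-table entries, the three operators, and the number 2 with its integer value as attribute.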
Lexical errors
Ex: fi ( a == f(x) ) …
Here fi could be a misspelling of the keyword if, or an undeclared function identifier; the lexical analyzer alone cannot tell which.
"Panic mode" recovery: delete successive characters from the remaining input until a well-formed token is found.
Other recovery methods are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
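The four single-edit repair actions listed above can be sketched as a candidate generator, checking whether any one edit turns a bad lexeme into a keyword. The function name one_edit_candidates and the tiny keyword set are assumptions for the sketch.

```python
# Generate every string reachable from `lexeme` by one of the four repair
# actions: delete, insert, replace, transpose.
def one_edit_candidates(lexeme, alphabet="abcdefghijklmnopqrstuvwxyz"):
    cands = set()
    for i in range(len(lexeme)):
        cands.add(lexeme[:i] + lexeme[i+1:])              # 1. delete a character
        for ch in alphabet:
            cands.add(lexeme[:i] + ch + lexeme[i+1:])     # 3. replace a character
    for i in range(len(lexeme) + 1):
        for ch in alphabet:
            cands.add(lexeme[:i] + ch + lexeme[i:])       # 2. insert a character
    for i in range(len(lexeme) - 1):
        cands.add(lexeme[:i] + lexeme[i+1] + lexeme[i] + lexeme[i+2:])  # 4. transpose
    return cands

KEYWORDS = {"if", "then", "else"}
print(KEYWORDS & one_edit_candidates("fi"))   # transposing "fi" yields "if"
```

For the fi example above, the only keyword within one edit is if, found by transposing the two adjacent characters.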
INPUT BUFFERING
• To speed up reading of the source program.
• Need to look at least one additional character ahead. Ex: = is part of ==, < is part of <=.
• Two-buffer scheme is introduced:
  • to handle large lookaheads safely.
  • to reduce the amount of overhead required to process a single input character.
• Each buffer is of the same size N (the size of a disk block, e.g., 4096 bytes).
• If there are fewer than N characters in the input file, a special character "eof" marks the end of the source file.
Buffer Pairs: two pointers are used.
• lexemeBegin: marks the beginning of the current lexeme.
• forward: scans ahead until a pattern match is found.
Sentinels: a special character that cannot be part of the source program, e.g., "eof".
Without sentinels, for each character read we make two tests: one for end of buffer, and one for the actual character read. The sentinel combines them: we test only the character read, and check for end of buffer just in the case where it is "eof".
Lookahead code with sentinels
switch ( *forward++ ) {
case eof:
    if ( forward is at end of first buffer ) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if ( forward is at end of second buffer ) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
/* cases for the other characters */
}
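The pseudocode above can be simulated in Python. This is a sketch under stated assumptions: the TwoBuffers class name, the NUL character as the "eof" sentinel, and the deliberately tiny N = 4 (to force reloads) are not from the source.

```python
import io

EOF = "\0"   # sentinel: a character assumed never to appear in the source
N = 4        # buffer size (4096 in practice; tiny here to exercise reloads)

class TwoBuffers:
    """Buffer-pair sketch: halves [0..N] and [N+1..2N+1], each sentinel-terminated."""
    def __init__(self, stream):
        self.stream = stream
        self.buf = [EOF] * (2 * (N + 1))
        self.forward = 0
        self._reload(0)

    def _reload(self, half):
        """Fill one half with up to N characters, then place the sentinel."""
        start = half * (N + 1)
        data = self.stream.read(N)
        for i, ch in enumerate(data):
            self.buf[start + i] = ch
        self.buf[start + len(data)] = EOF

    def next_char(self):
        c = self.buf[self.forward]
        self.forward += 1
        if c == EOF:                           # only one test in the common case
            if self.forward - 1 == N:          # sentinel ending the first half
                self._reload(1)
                self.forward = N + 1
                return self.next_char()
            if self.forward - 1 == 2 * N + 1:  # sentinel ending the second half
                self._reload(0)
                self.forward = 0
                return self.next_char()
            return None                        # eof inside a half: real end of input
        return c

tb = TwoBuffers(io.StringIO("abcdefghij"))
out = []
while (ch := tb.next_char()) is not None:
    out.append(ch)
print("".join(out))   # abcdefghij
```

Note that the end-of-buffer test runs only when the sentinel is seen, matching the point of the scheme: ordinary characters cost a single comparison.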
SPECIFICATION OF TOKENS
Alphabet: finite set of symbols, ex: {0, 1}.
String: finite sequence of symbols drawn from that alphabet, ex: 000, 0011, …
Length of a string s, written |s|: the number of occurrences of symbols in s.
Empty string: denoted by ε.
Prefix, suffix, substring, proper prefix, subsequence of a string (a subsequence is obtained by deleting zero or more not necessarily consecutive positions).
Language: set of strings over some fixed alphabet.
Operations on languages
Ex: L = {A, B, …, Z, a, b, …, z}, D = {0, 1, …, 9}
L ∪ D = 62 strings of length 1 (all letters and digits)
LD = 520 strings of length 2 (52 × 10 = 520)
D2 = set of all strings of two digits
L* = set of all strings of letters, including the empty string ε
L+ = set of all strings of one or more letters
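The finite set sizes quoted above are easy to verify directly; a minimal check, with L as the 52 letters and D as the 10 digits:

```python
import string

L = set(string.ascii_letters)          # the 52 letters
D = set(string.digits)                 # the 10 digits

union = L | D                          # L ∪ D: single-character strings
LD = {l + d for l in L for d in D}     # concatenation: a letter then a digit
D2 = {a + b for a in D for b in D}     # D^2: all strings of two digits

print(len(union), len(LD), len(D2))    # 62 520 100
```

L* and L+ are infinite, so there is nothing to count there; they differ only in whether ε is included.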
Regular Expressions
• ε is a regular expression denoting {ε}, the set containing the empty string.
• a is a regular expression denoting {a}, for each symbol a in the alphabet.
• Suppose r and s are regular expressions denoting the languages L(r) and L(s); then,
  • (r|s) is a regular expression denoting L(r) ∪ L(s)
  • rs is a regular expression denoting L(r)L(s)
  • (r)* is a regular expression denoting (L(r))*
  • (r) is a regular expression denoting L(r)

Regular expression Example strings
a|b {a, b}
(a|b)(a|b) {aa, ab, ba, bb}
a* {ε, a, aa, aaa,…}
a|a*b {a, b, ab, aab, aaab,…}

Algebraic properties (laws) of regular expressions:
r|s = s|r (| is commutative)
r|(s|t) = (r|s)|t and r(st) = (rs)t (| and concatenation are associative)
r(s|t) = rs|rt and (s|t)r = sr|tr (concatenation distributes over |)
εr = rε = r (ε is the identity for concatenation)
r* = (r|ε)* (ε is guaranteed in a closure)
r** = r* (* is idempotent)
Regular definition
The process of giving names to certain regular expressions and using those names in subsequent expressions.
Ex: 0|1|2|…|9 can be written as D → 0|1|2|…|9, after which D may be used wherever the digits are needed.
Extensions of regular expressions
Zero or more: *
One or more: +
Zero or one: ?
Character class: [ ]
[abc] denotes the regular expression a|b|c
[a-z] denotes the regular expression a|b|..|z
1. Write regular definition for C identifier using extensions.
Letter_ → [a-zA-Z_]
Digit → [0-9]
Id → Letter_(Letter_|Digit)*

2. Write regular definition for unsigned number using extensions.
Digit → [0-9]
Digits → Digit+
Number → Digits(. Digits)?(E[+-]?Digits)?
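Expanding the names away, the two regular definitions above become plain regular expressions, which can be checked directly. A sketch using Python's re syntax (the variable names ID and NUMBER are assumptions):

```python
import re

# Id = Letter_(Letter_|Digit)* and Number = Digits(.Digits)?(E[+-]?Digits)?
# with the shorthand names substituted by their definitions.
ID     = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")
NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?")

assert ID.fullmatch("_count1")
assert not ID.fullmatch("1count")          # identifiers cannot start with a digit
assert NUMBER.fullmatch("6336")            # the fraction and exponent are optional
assert NUMBER.fullmatch("1.89E-4")
print("definitions behave as expected")
```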
RECOGNITION OF TOKENS
• Consider the following grammar/regular definitions: a grammar for branching statements.
stmt → if expr then stmt
    | if expr then stmt else stmt
expr → term relop term | term
term → id | num
if → if
then → then
else → else
relop → < | <= | > | >= | = | <>
id → letter(letter|digit)*
num → digit+(.digit+)?(E[+-]?digit+)?
digit → [0-9]
letter → [a-zA-Z]
The regular definition for blank (white space) is,
delim → blank | tab | newline
ws → delim+
Transition Diagrams
• A collection of nodes or circles, called states.
• Edges are directed from one state of the transition diagram to another.
• Each edge is labeled by a symbol.
• Certain states are said to be accepting, or final.
• One state is designated the start state, or initial state.
• Retracting the forward pointer one position is denoted by *.
Ex: Write the transition diagram for relational operators.
Recognition of Reserved Words and Identifiers
There are two ways to handle reserved words that look like identifiers:
1) Install the reserved words in the symbol table initially.
 • installID( ): places the lexeme in the symbol table if it is not already there.
 • getToken( ): examines the symbol table entry for the lexeme found.
2) Create separate transition diagrams for each keyword.
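Method 1 above can be sketched in a few lines: with the keywords preinstalled, installID and getToken (names taken from the slide) need no special cases for reserved words. The dict-based symbol table and the three-keyword set are assumptions for the sketch.

```python
KEYWORDS = ["if", "then", "else"]

# Symbol table: lexeme -> (token name, entry index). Keywords go in first,
# each with a token name equal to the keyword itself.
symtab = {}
for kw in KEYWORDS:
    symtab[kw] = (kw, len(symtab))

def installID(lexeme):
    """Place the lexeme in the symbol table if it is not already there."""
    if lexeme not in symtab:
        symtab[lexeme] = ("id", len(symtab))
    return symtab[lexeme][1]

def getToken(lexeme):
    """Examine the symbol table entry for the lexeme found."""
    return symtab[lexeme][0]

installID("count")
print(getToken("if"), getToken("count"))   # if id
```

Because "if" was preinstalled with token name if, the identifier code path returns the keyword token automatically; only genuinely new lexemes get the id token.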
Transition diagram for white space:

Transition diagram for unsigned number:


Implementation of relop T.D
TOKEN getRelop( ) {
    TOKEN retToken = new(RELOP);
    while (1) { /* repeat character processing until a return or failure occurs */
        switch (state) {
        case 0:
            c = nextChar( );
            if ( c == '<' ) state = 1;
            else if ( c == '=' ) state = 5;
            else if ( c == '>' ) state = 6;
            else fail(); /* lexeme is not a relop */
            break;
        case 1: ...
        case 2: ...
        ...
        case 8:
            retract();
            retToken.attribute = GT;
            return(retToken);
        }
    }
}
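The elided cases of the pseudocode above can be filled in as a runnable Python sketch of the relop transition diagram. The state numbering follows the textbook diagram, but the function name get_relop, the (token, chars_consumed) return shape, and the NUL end-of-input marker are assumptions.

```python
def get_relop(s):
    """Return (relop token, characters consumed) if s starts with a relop, else None."""
    state, i = 0, 0

    def next_char():
        nonlocal i
        c = s[i] if i < len(s) else "\0"   # NUL stands in for end of input
        i += 1
        return c

    while True:
        if state == 0:
            c = next_char()
            if c == "<": state = 1
            elif c == "=": return ("EQ", i)
            elif c == ">": state = 6
            else: return None              # lexeme is not a relop
        elif state == 1:                   # saw '<'
            c = next_char()
            if c == "=": return ("LE", i)
            elif c == ">": return ("NE", i)
            else: return ("LT", i - 1)     # retract: '<' alone (the * state)
        elif state == 6:                   # saw '>'
            c = next_char()
            if c == "=": return ("GE", i)
            else: return ("GT", i - 1)     # retract: '>' alone (case 8 above)

print(get_relop("<= b"))   # ('LE', 2)
print(get_relop("> b"))    # ('GT', 1)
```

The two retracting returns correspond to the starred accepting states of the diagram: one lookahead character was consumed to decide, so the count is backed up by one.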
End of Module
