
UNIT 2

LEXICAL ANALYSIS
The Role of The Lexical Analyzer

The lexical analyzer is the first phase of a compiler.

The lexical analyzer reads the input source program from left to
right, one character at a time, and generates a sequence of tokens.
Each token is a single logically cohesive unit such as an identifier,
keyword, operator, or punctuation mark.
The parser can then use these tokens to determine the syntax of the
source program.
The role of the lexical analyzer in the process of compilation is shown
below:
 Source     +----------+     token      +--------+
 program -->| Lexical  |--------------->| Parser |
            | analyzer |<---------------|        |
            +----------+ get next token +--------+
                 |                           |
                 +------>  Symbol  <---------+
                            table
How does the lexical analyzer work?
The syntax checker (parser) in a compiler serves as the master program.
It first sends a request to the lexical analyzer for a valid token.

The scanner then does its pattern matching to create a valid token, if
possible, and sends it back to the syntax checker.

As soon as it sends back the token, the scanner is suspended.

The syntax checker performs its grammar check over the token and then
asks for the next token.

This process repeats until the entire input string is consumed, as the
sketch below illustrates.
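
A minimal C sketch of this request loop follows; the type names and the
function get_next_token are our own illustrative assumptions, not a
fixed interface.

#include <stdio.h>

typedef enum { TOK_ID, TOK_NUM, TOK_OP, TOK_EOF } TokenType;

typedef struct {
    TokenType type;
    char lexeme[64];            /* the matched input string */
} Token;

Token get_next_token(void);     /* the scanner, implemented elsewhere */

void parse(void)
{
    Token t = get_next_token();   /* parser requests the first token */
    while (t.type != TOK_EOF) {
        /* grammar check over t goes here */
        t = get_next_token();     /* scanner resumes for the next token */
    }
}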
Apart from token identification, the lexical analyzer also performs the
following functions.

Functions of lexical analyzer

• It produces a stream of tokens.

• It eliminates blanks and comments.

• It generates the symbol table, which stores information about the
identifiers and constants encountered in the input.

• It keeps track of line numbers.

• It reports the errors encountered while generating the tokens.
Why separate lexical analysis and parsing?
There are several reasons for separating the analysis phase of
compiling into lexical analysis and parsing.
1. Simpler design
This is the most important consideration. The separation of lexical
analysis from syntax analysis often allows us to simplify one or the
other of these phases.
Eg: a parser that had to deal with whitespace and comments directly
would be much harder to design.

2. Compiler efficiency is improved:
A separate lexical analyzer can be optimized, because a large amount of
compilation time is spent reading the source program and partitioning
it into tokens.

3. Compiler portability is enhanced:
Input-alphabet peculiarities and other device-specific anomalies can
be restricted to the lexical analyzer.
Eg: special and non-standard symbols can be isolated there.
Terminology used in lexical analysis
1 Token: A set of input strings that are related through a similar
pattern.

For example, any word that starts with a letter and may contain any
number of letters or digits after it is called an identifier.
Identifier is a token; raghu and raghu123 are instances of it.

2 Lexeme: The actual input string which represents a token. Here,
identifier is the token, while raghu and raghu123 are lexemes.

3 Pattern: The rule which a lexical analyzer follows to create a
token.
Example:
X = X * (acc + 123)

Token               Lexeme
Identifier          X
Operator equal      =
Identifier          X
Operator multiply   *
Left parenthesis    (
Identifier          acc
Operator plus       +
Integer constant    123
Right parenthesis   )
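
As an illustration of a pattern, here is a minimal C sketch of the
identifier rule described above (a letter followed by any number of
letters or digits); the function name is our own.

#include <ctype.h>
#include <stdio.h>

/* Pattern for the identifier token: a letter followed by any number
 * of letters or digits. */
int is_identifier(const char *lexeme)
{
    if (!isalpha((unsigned char)lexeme[0]))
        return 0;                          /* must start with a letter */
    for (const char *p = lexeme + 1; *p; p++)
        if (!isalnum((unsigned char)*p))
            return 0;                      /* only letters or digits after */
    return 1;
}

int main(void)
{
    printf("%d\n", is_identifier("raghu123"));  /* 1: lexeme matches the pattern */
    printf("%d\n", is_identifier("123raghu"));  /* 0: starts with a digit */
    return 0;
}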
Examples of Tokens


• In most programming languages, the following constructs are treated
as tokens: keywords, identifiers, constants, literal strings,
operators, and punctuation symbols.
Attributes for Tokens
• When more than one lexeme matches a pattern, the lexical analyzer
must provide additional information about the particular lexeme that
matched to the subsequent phases of the compiler.

• The lexical analyzer collects such information about tokens into
their associated attributes.
Attributes for Tokens (Cont.)
• E = M * C ** 2
– <id, pointer to symbol table entry for E>
– <assign-op>
– <id, pointer to symbol table entry for M>
– <mult-op>
– <id, pointer to symbol table entry for C>
– <exp-op>
– <number, integer value 2>
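
The pairing of a token with its attribute can be sketched as a tagged
union in C; the names below are our own illustrative choices.

/* A token carries an attribute whose form depends on the token type:
 * identifiers point at their symbol-table entry, numbers carry their
 * value, and operators need no attribute at all. */
typedef enum { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUMBER } TokenType;

struct symtab_entry;                 /* symbol-table entry, defined elsewhere */

typedef struct {
    TokenType type;
    union {
        struct symtab_entry *entry;  /* attribute of an ID token    */
        int value;                   /* attribute of a NUMBER token */
    } attr;                          /* unused for operator tokens  */
} Token;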
Lexical Errors
• Few errors are detectable at the lexical level alone, because a
lexical analyzer has a very localized view of the source program.
• For example, suppose the string fi is encountered in a C program
for the first time in the context
– fi ( a == f(x)) …..
– The lexical analyzer cannot tell whether fi is a misspelling of the
keyword if or an undeclared function identifier.
– Since fi is a valid identifier, the lexical analyzer must return
the token for an identifier and let a later phase handle any error,
as the sketch below illustrates.
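
A minimal C sketch of why fi comes back as an identifier (the keyword
table contents are assumed for illustration): the scanner matches the
identifier pattern first and only then consults the keyword table;
fi is not in it, so it is returned as a plain identifier.

#include <string.h>

static const char *keywords[] = { "if", "else", "while", "for", "return" };

/* Return 1 if the lexeme is a reserved keyword, 0 otherwise. */
int is_keyword(const char *lexeme)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return 1;
    return 0;   /* "fi" falls through here and is treated as an identifier */
}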
Lexical Errors (Cont.)
• Panic mode: successive characters are ignored until a well-formed
token can begin again (sketched after this list).

• Other possible error-recovery actions are:
– Deleting an extraneous character
– Inserting a missing character
– Replacing an incorrect character by a correct character
– Transposing two adjacent characters
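
A minimal C sketch of panic-mode recovery; the set of characters that
may start a token is an assumption for illustration, since a real
scanner would derive it from its token patterns.

#include <string.h>

/* Skip characters after an error until one that can begin a
 * well-formed token is found; scanning resumes from there. */
const char *panic_mode_skip(const char *p)
{
    static const char starters[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789(+*=";              /* assumed token-start characters */

    while (*p != '\0' && strchr(starters, *p) == NULL)
        p++;                           /* ignore successive characters */
    return p;
}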
Input Buffering

This section examines ways of speeding up the reading of the source
program, including a two-buffer scheme that handles large lookahead
safely.
The lexical analyzer scans the input string from left to right, one
character at a time. It uses two pointers, the begin pointer (bptr)
and the forward pointer (fptr), to keep track of the portion of the
input scanned.

Initially both pointers point to the first character of the input
string. The bptr remains at the beginning of the lexeme being read,
while the fptr moves ahead in search of the end of the lexeme. As soon
as a blank space is encountered, it indicates the end of the lexeme;
fptr then sits on the white space, which it ignores as it moves ahead.
Both bptr and fptr are then set to the start of the next token.

The input characters are read from secondary storage, but reading one
character at a time from secondary storage is costly. Hence a buffering
technique is used: a block of data is first read into a buffer and then
scanned by the lexical analyzer.

There are two methods used in this context: the one-buffer scheme and
the two-buffer scheme.
One buffer scheme

In this scheme, only one buffer is used to store the input string.

The problem with this scheme is that if a lexeme is very long, it
crosses the buffer boundary; to scan the rest of the lexeme the buffer
has to be refilled, which overwrites the first part of the lexeme.
Two Buffer scheme
To overcome the problem of the one-buffer scheme, this method uses two
buffers to store the input string.

The two buffers are scanned alternately: when the end of the current
buffer is reached, the other buffer is filled.

The only remaining problem is that if the length of a lexeme exceeds
the length of a buffer, the input cannot be scanned completely.
Initially both bptr and fptr point to the first character of the first
buffer. The fptr then moves right in search of the end of the lexeme;
as soon as a blank character is recognized, the string between bptr
and fptr is identified as the corresponding token.

To identify the boundary of the first buffer, an end-of-buffer (eof)
character is placed at the end of the first buffer. Similarly, the end
of the second buffer is recognized by the end-of-buffer mark present
at its end.
When fptr encounters the first eof, the end of the first buffer is
recognized, and filling of the second buffer begins. In the same way,
the second eof indicates the end of the second buffer.

The two buffers are refilled alternately in this way until the end of
the input program is reached and the stream of tokens is identified.

The eof character introduced at the end of each buffer is called a
sentinel; it is used to identify the end of the buffer.
if (fptr == eof(buff1))          /* reached end of first buffer */
{
    /* refill buff2 */
    fptr = start of buff2;       /* resume scanning in second buffer */
}
else if (fptr == eof(buff2))     /* reached end of second buffer */
{
    /* refill buff1 */
    fptr = start of buff1;       /* resume scanning in first buffer */
}
else if (fptr == eof(input))     /* eof within a buffer: true end of input */
    return;                      /* terminate scanning */
else
    fptr++;                      /* ordinary character: keep scanning */
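
The same scheme can be fleshed out into a small runnable C sketch; the
buffer size N, the choice of '\0' as the sentinel, and all names here
are our own assumptions for illustration (in particular, an input that
itself contains '\0' would defeat this sentinel choice).

#include <stdio.h>

#define N 4096                      /* size of each buffer half (assumed) */
#define SENTINEL '\0'               /* stands in for the eof mark */

static char buf[2 * N + 2];         /* two halves plus one sentinel slot each */
static char *fptr;                  /* forward pointer */
static FILE *src;                   /* source program */

/* Fill one half with up to N characters and plant the sentinel; a short
 * read leaves the sentinel inside the half, marking true end of input. */
static void fill(char *start)
{
    size_t n = fread(start, 1, N, src);
    start[n] = SENTINEL;
}

/* Advance fptr, refilling the other half whenever a boundary sentinel
 * is reached; returns the next character, or EOF at true end of input. */
static int next_char(void)
{
    for (;;) {
        if (*fptr != SENTINEL)
            return (unsigned char)*fptr++;
        if (fptr == buf + N) {                /* sentinel of first half */
            fill(buf + N + 1);                /* refill second half */
            fptr = buf + N + 1;
        } else if (fptr == buf + 2 * N + 1) { /* sentinel of second half */
            fill(buf);                        /* refill first half */
            fptr = buf;
        } else {
            return EOF;                       /* sentinel inside a half */
        }
    }
}

int main(int argc, char **argv)
{
    src = (argc > 1) ? fopen(argv[1], "r") : stdin;
    if (src == NULL)
        return 1;
    fill(buf);                      /* load the first half */
    fptr = buf;

    long count = 0;
    int c;
    while ((c = next_char()) != EOF)
        count++;                    /* a real scanner would match patterns here */
    printf("%ld characters scanned\n", count);
    return 0;
}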
