Chapter Two - Lexical Analysis
2.1 Functions of Lexical Analysis
2.2 Role of the Lexical Analyzer
2.3 Input Buffering
2.4 Specification of Tokens
2.5 Recognition of Tokens
Lexical Analysis
Lexical means “Anything related to words”.
Terminology used in Lexical Analysis.
1. Token: A class of input strings that are related through a common pattern.
2. Lexeme: The actual input string that represents an instance of a token.
3. Pattern: The rule the lexical analyzer follows to recognize a token.
For example, the statement X = X * (Acc + 123) is broken into the following token/lexeme pairs (a small code sketch of this token stream follows the table):

Token               Lexeme
Identifier          X
Operator Eq         =
Identifier          X
Operator Mul        *
Left Parenthesis    (
Identifier          Acc
Operator Plus       +
Integer Constant    123
Right Parenthesis   )
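As a rough illustration, the same token stream can be written down in code. The struct layout and the TOK_* names below are assumptions made for this sketch, not part of any fixed lexer interface.

#include <stdio.h>

/* Illustrative token representation; the names are assumptions for this sketch. */
typedef enum { TOK_ID, TOK_EQ, TOK_MUL, TOK_PLUS, TOK_LPAREN, TOK_RPAREN, TOK_INT } TokenType;

typedef struct {
    TokenType type;      /* the token class (which pattern matched)   */
    const char *lexeme;  /* the actual input string for this token    */
} Token;

int main(void) {
    /* Token stream for the statement  X = X * (Acc + 123)  */
    Token stream[] = {
        { TOK_ID,     "X"   }, { TOK_EQ,     "="   }, { TOK_ID,     "X"   },
        { TOK_MUL,    "*"   }, { TOK_LPAREN, "("   }, { TOK_ID,     "Acc" },
        { TOK_PLUS,   "+"   }, { TOK_INT,    "123" }, { TOK_RPAREN, ")"   },
    };
    for (size_t i = 0; i < sizeof stream / sizeof stream[0]; i++)
        printf("token %d  lexeme %s\n", stream[i].type, stream[i].lexeme);
    return 0;
}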
The main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce as output a sequence of tokens, one for each lexeme in the source program.
The stream of tokens is sent to the parser for syntax analysis.
The lexical analyzer also interacts with the symbol table, e.g., when the lexical analyzer discovers a lexeme constituting an identifier, it needs to enter that lexeme into the symbol table.
The interactions are suggested in Figure 2.1.
Figure 2.1: Interactions between the lexical analyzer and the parser
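A minimal sketch of this interaction: the parser repeatedly asks the lexical analyzer for the next token and enters identifier lexemes into the symbol table. All names here (Token, getNextToken, installId) are illustrative assumptions, and the lexer's functions are left as declarations.

/* Sketch of the lexer/parser interaction suggested in Figure 2.1. */
typedef enum { TOK_EOF, TOK_ID, TOK_OTHER } TokenType;
typedef struct { TokenType type; const char *lexeme; } Token;

Token getNextToken(void);            /* implemented by the lexical analyzer        */
int   installId(const char *name);   /* enters an identifier into the symbol table */

void parse(void) {
    /* The parser drives the lexer: it asks for tokens one at a time. */
    for (Token t = getNextToken(); t.type != TOK_EOF; t = getNextToken()) {
        if (t.type == TOK_ID)
            installId(t.lexeme);     /* identifier lexemes go to the symbol table  */
        /* ... grammar analysis of t happens here ... */
    }
}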
The following are additional tasks performed by the lexical analyzer other than identifying lexemes:
- Stripping out comments and whitespace (blank, newline, and tab)
- Correlating error messages generated by the compiler with the source program by keeping track of line numbers (using newline characters); a sketch of the whitespace and line-number handling follows this list
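The sketch below covers only the whitespace-skipping and line-counting part of these tasks (comment stripping would follow the same pattern); the variable and function names are assumptions.

#include <stdio.h>

/* Blanks, tabs, and newlines are consumed without producing a token;
 * newlines bump a line counter so later error messages can be
 * correlated with the source program. */
static int lineNumber = 1;

void skipWhitespace(FILE *src) {
    int c;
    while ((c = getc(src)) != EOF) {
        if (c == '\n')
            lineNumber++;            /* track line numbers for error reporting */
        else if (c != ' ' && c != '\t') {
            ungetc(c, src);          /* not whitespace: give the character back */
            return;
        }
    }
}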
Input Buffering
- Reading characters one at a time from secondary storage is a slow and time-consuming process
- It is often necessary to look ahead several characters beyond the lexeme before a match against a pattern can be announced
- One technique is to read characters from the source program and, if the pattern is not matched, push the look-ahead characters back to the source program
- This technique is time consuming
- Using buffering techniques eliminates this problem and increases efficiency
Single Buffer
- A buffer of n characters is defined in memory, usually of a disk-block size (e.g., 1024 characters)
- Two pointers are used:
  o Beginning pointer (BP)
  o Forward pointer (FP)
- BP points to the start of the lexeme while FP scans the input buffer for the end of the lexeme (a sketch of the two pointers follows this list)
- When the end of the lexeme is found, the lexeme is processed, i.e., matched against a pattern and converted into a token, while FP remains where it stopped
- Once the lexeme has been processed, both pointers are moved to the start of the next lexeme
- If the character is whitespace, it is also matched but no token is generated; both pointers simply move ahead to detect the next lexeme
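A minimal sketch of the two pointers over a single buffer; the buffer size and the delimiter set used here are assumptions for illustration.

#define BUF_SIZE 1024
char buffer[BUF_SIZE];

void scanOneLexeme(void) {
    char *bp = buffer;                 /* beginning pointer: start of the lexeme */
    char *fp = bp;                     /* forward pointer: scans the buffer      */
    while (*fp != '\0' && *fp != ' ' && *fp != '\n' && *fp != '\t')
        fp++;                          /* advance until a delimiter ends the lexeme */
    /* characters in [bp, fp) form the lexeme: match them against the token
       patterns, emit a token, then move both pointers past the lexeme */
}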
Buffer Pairs
- The buffer is divided into two halves of n characters each; FP is advanced with the following scheme (a C rendering follows the pseudocode):

if FP is at the end of the first half then begin
    reload the second half;
    FP = FP + 1
end
else if FP is at the end of the second half then begin
    reload the first half;
    move FP to the beginning of the first half
end
else
    FP = FP + 1
Sentinel
- While using the buffer-pair technique, each time FP is moved we have to check that it does not cross the end of a buffer half, and when it reaches the end of one half the other half needs to be loaded
- We can avoid this extra check by placing a sentinel character at the end of each half of the buffer
- The sentinel can be any character that is not part of the source program; EOF is usually preferred, since it also indicates the end of the source program
- Through this sentinel the end of a buffer half can be detected
- FP is only checked against this sentinel, and when the sentinel is encountered the appropriate action is taken to fill the next buffer half. If EOF is used, then encountering EOF elsewhere in the buffer means the end of the source program (a sketch of this check follows the list)
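A sketch of the sentinel variant of the advance routine: each half ends in a sentinel, so the common case needs a single comparison instead of two boundary checks. The sizes, the choice of '\0' as the sentinel, and all function names are assumptions for this sketch.

#define N 1024
#define SENTINEL '\0'                  /* stands in for the eof character         */

char buf[2 * N + 2];                   /* buf[N] and buf[2N+1] hold the sentinels */
char *fp = buf;

void reloadSecondHalf(void);           /* fill buf[N+1 .. 2N] from the source     */
void reloadFirstHalf(void);            /* fill buf[0 .. N-1] from the source      */
void endOfInput(void);                 /* eof found inside a half: stop scanning  */

void advance(void) {
    fp++;
    if (*fp != SENTINEL)
        return;                        /* common case: only one comparison        */
    if (fp == buf + N) {               /* sentinel closing the first half         */
        reloadSecondHalf();
        fp++;
    } else if (fp == buf + 2 * N + 1) { /* sentinel closing the second half       */
        reloadFirstHalf();
        fp = buf;
    } else {
        endOfInput();                  /* sentinel elsewhere marks end of source  */
    }
}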
Specification of Tokens
Regular expressions are used to specify token patterns.
Notational shorthands:
+ means one or more
* means zero or more
? means zero or one
Character classes: [a-z], [abc], [0-9]
Ex: A Pascal identifier is a letter followed by zero or more letters or digits (a small checker for this pattern is sketched below).
id → letter (letter | digit)*
letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
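A minimal checker for the pattern letter (letter | digit)*; the function name is an assumption, and the ctype classification is used in place of explicit letter/digit alternatives.

#include <ctype.h>

/* Returns 1 if the whole string matches  letter (letter | digit)* , else 0. */
int isIdentifier(const char *s) {
    if (!isalpha((unsigned char)*s))       /* must start with a letter */
        return 0;
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s))   /* then letters or digits   */
            return 0;
    return 1;
}

For example, isIdentifier("Acc") and isIdentifier("x25") return 1, while isIdentifier("123") returns 0.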
Recognition of Tokens
The lexical analyzer recognizes tokens by using transition diagrams. Some important conventions about transition diagrams are:
1. Certain states are said to be accepting or final. These states indicate that a lexeme has been found.
Accepting states are indicated by a double circle, and if there is an action to be taken – typically returning a
token and an attribute value to the parser – we shall attach that action to the accepting state
2. In addition, if it is necessary to retract the forward pointer one position (i.e., the lexeme does not include the
symbol that got us to the accepting state), then we shall additionally place a * near that accepting state. Any
number of *s can be attached depending on the number of positions to retract
3. One state is designated the start state, or initial state; it is indicated by an edge labeled “start”, entering from
nowhere. The transition diagram always begins in the start state before any input symbols have been read
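A sketch of simulating the identifier transition diagram over a character buffer, under these conventions: state 0 is the start state, state 1 loops on letters and digits, and state 2 is the accepting state marked with a *, so the forward pointer is retracted by one position before the lexeme is reported. The Lexeme struct and function name are assumptions for this sketch.

#include <ctype.h>

typedef struct { const char *begin; const char *end; } Lexeme;

/* Returns 1 and fills *out if an identifier starts at fp, else returns 0. */
int recognizeId(const char *fp, Lexeme *out) {
    const char *begin = fp;                /* lexemeBegin (BP)                */
    int state = 0;                         /* start state                     */
    for (;;) {
        int c = (unsigned char)*fp++;      /* read one character, advance FP  */
        switch (state) {
        case 0:                            /* start state                     */
            if (isalpha(c)) state = 1;
            else return 0;                 /* first character is not a letter */
            break;
        case 1:                            /* after letter (letter | digit)*  */
            if (!isalnum(c)) state = 2;    /* delimiter reached: accept       */
            break;
        }
        if (state == 2) {                  /* accepting state marked with *   */
            fp--;                          /* retract: delimiter is not part of the lexeme */
            out->begin = begin;
            out->end   = fp;
            return 1;
        }
    }
}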