2 - Lexical Analysis

Uploaded by

Anonymous Racoon

Lexical Analysis

Role of Lexical Analyzer

• First phase of a compiler.
• Its main tasks are to
• read the input characters of the source program,
• group them into lexemes, and
• produce as output a sequence of tokens, one for each lexeme.
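The three tasks above can be sketched as a hand-written loop. This is a minimal illustration for a toy language, assuming only identifiers, unsigned integers and one-character operators; the token names ID, NUM and OP are our own.

```python
def tokenize(source: str) -> list[tuple[str, str]]:
    """Read characters, group them into lexemes, and emit one (token, lexeme) pair per lexeme."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                      # white space only separates lexemes
            i += 1
        elif ch.isalpha():                    # identifier: letter (letter | digit)*
            j = i + 1
            while j < len(source) and source[j].isalnum():
                j += 1
            tokens.append(("ID", source[i:j]))
            i = j
        elif ch.isdigit():                    # unsigned integer: digit+
            j = i + 1
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("NUM", source[i:j]))
            i = j
        else:                                 # anything else: a one-character operator
            tokens.append(("OP", ch))
            i += 1
    return tokens


print(tokenize("count = count + 10"))
# [('ID', 'count'), ('OP', '='), ('ID', 'count'), ('OP', '+'), ('NUM', '10')]
```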

01/06/2025 2
Role of Lexical Analyzer

Fig 1: Interaction between the lexical analyzer and the parser
source program → Lexical Analyzer → (token) → Parser → to semantic analysis
The Parser calls getNextToken to request each token from the Lexical Analyzer; both consult the Symbol Table.
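The getNextToken interaction in Fig 1 can be sketched with a Python generator, so the lexical analyzer only does work when the parser asks for a token. The toy token set (id, num, op), the class Parser and the method name get_next_token are illustrative assumptions; symbol-table handling is omitted.

```python
import re
from typing import Iterator

# Toy token set (assumption): identifiers, unsigned integers, one-character operators.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z]\w*)|(?P<num>\d+)|(?P<op>\S))")


def lexer(source: str) -> Iterator[tuple[str, str]]:
    """Yield one (token, lexeme) pair each time the parser asks for the next token."""
    pos = 0
    while True:
        m = TOKEN_RE.match(source, pos)
        if m is None:                 # only trailing white space (or nothing) is left
            return
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)


class Parser:
    def __init__(self, source: str):
        self._tokens = lexer(source)  # nothing is scanned until the first request

    def get_next_token(self) -> tuple[str, str]:
        # Plays the role of getNextToken in Fig 1: the lexer advances by one token.
        return next(self._tokens, ("EOF", ""))


p = Parser("x = 42")
print(p.get_next_token(), p.get_next_token(), p.get_next_token(), p.get_next_token())
# ('id', 'x') ('op', '=') ('num', '42') ('EOF', '')
```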

Role of Lexical Analyzer
• The lexical analyzer may also perform other tasks:
• Stripping out comments and white space (blank, tab, newline).
• Correlating error messages generated by the compiler with the source program.
• e.g., keeping track of newline characters so that a line number can be associated with each error message.
• Expanding macro-preprocessor functions, if the source program uses them.

Role of Lexical Analyzer
• The lexical analyzer may be divided into a cascade of two processes:
• Scanning
• Simple processes that do not require tokenization of the input.
• Deletion of comments.
• Compaction of consecutive white-space characters into one.
• Lexical analysis proper
• The more complex portion.
• Produces tokens from the output of the scanner.
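A minimal sketch of the scanning process, assuming C-style /* ... */ and // comments. For simplicity it does not special-case comment markers inside string literals, which a real scanner must do; the function name scan is our own.

```python
import re


def scan(source: str) -> str:
    """Scanning phase: delete comments, then compact runs of white space into one blank."""
    no_block = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # /* ... */ comments
    no_line = re.sub(r"//[^\n]*", " ", no_block)                     # // ... comments
    return re.sub(r"\s+", " ", no_line).strip()                      # compact white space


print(scan("x = 1;  /* set x */\n\ty = 2; // done"))   # x = 1; y = 2;
```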

Tokens
• A token is a pair consisting of a token name and an optional attribute value.
• The token name is an abstract symbol representing a kind of lexical unit, for example
• a particular keyword, or
• a sequence of input characters denoting an identifier.

Patterns
• A pattern is a description of the form that the lexemes of a token may take.

Lexeme
• A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.

Tokens, Patterns and Lexemes
printf("Total = %d\n", score)

• printf and score are lexemes matching the pattern for token id.
• "Total = %d\n" is a lexeme matching the pattern for token literal.

Tokens, Patterns and Lexemes

Token        Informal Description                      Sample Lexemes
if           characters i, f                           if
else         characters e, l, s, e                     else
comparison   < or > or <= or >= or == or !=            <=, !=
id           letter followed by letters and digits     pi, score
literal      anything but " surrounded by "'s          "core dumped"
number       any numeric constant                      3.14, 0, 6.02e23

Figure 2: Examples of Tokens
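The informal descriptions in Figure 2 can be written as regular expressions. This is a sketch assuming ASCII double quotes for literal and a simplified number pattern (digits, optional fraction, optional exponent) standing in for "any numeric constant"; PATTERNS and classify are illustrative names. Keyword patterns are tried before id so that if is not classified as an identifier.

```python
import re
from typing import Optional

# Figure 2 patterns as regular expressions; the order of the list matters.
PATTERNS = [
    ("if",         r"if"),
    ("else",       r"else"),
    ("comparison", r"<=|>=|==|!=|<|>"),
    ("id",         r"[A-Za-z][A-Za-z0-9]*"),
    ("literal",    r'"[^"]*"'),
    ("number",     r"[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"),
]


def classify(lexeme: str) -> Optional[str]:
    """Return the name of the first token whose pattern matches the whole lexeme."""
    for name, pattern in PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return name
    return None


for lexeme in ["if", "else", "<=", "score", '"core dumped"', "6.02e23", "iffy"]:
    print(lexeme, "->", classify(lexeme))
```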

Attribute for Tokens
• When more than one lexeme can match a pattern, the lexical analyzer must provide subsequent phases with additional information about the particular lexeme that matched.
• For example, 0 and 1 both match the pattern for number.
• In many cases, the lexical analyzer returns to the parser a token name together with an attribute value.
• The attribute value describes the lexeme represented by the token.
• The token name influences parsing decisions.
• The attribute value influences the translation of the token after the parse.

Attribute for Tokens
• An attribute can be
• a single value, or
• a structure combining several pieces of information.
• For example, the attribute of an id may hold its
• lexeme,
• type, and
• location at which it is first found.
• These values are stored in the symbol table.
• Hence the appropriate attribute value for an identifier is a pointer to the symbol-table entry for that identifier.
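A minimal symbol-table sketch, assuming that the index of an entry in a Python list stands in for the "pointer to the symbol-table entry". The class and field names are hypothetical; the type field is left empty because later phases fill it in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SymbolEntry:
    lexeme: str
    first_line: int                    # location at which the identifier is first found
    data_type: Optional[str] = None    # filled in later, e.g. by semantic analysis


class SymbolTable:
    def __init__(self) -> None:
        self.entries: list[SymbolEntry] = []
        self._index: dict[str, int] = {}

    def install(self, lexeme: str, line: int) -> int:
        """Return the index of lexeme's entry, creating the entry on first sight.

        The index plays the role of the pointer to the symbol-table entry
        that the lexer stores as the attribute value of an id token.
        """
        if lexeme not in self._index:
            self._index[lexeme] = len(self.entries)
            self.entries.append(SymbolEntry(lexeme, line))
        return self._index[lexeme]


table = SymbolTable()
print(table.install("score", 3), table.install("total", 4), table.install("score", 7))
# 0 1 0
print(table.entries[0])
# SymbolEntry(lexeme='score', first_line=3, data_type=None)
```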
Attribute for Tokens
E = M * C ** 2

• The token names and associated attribute values for this statement are

• <id, pointer to symbol-table entry for E>
• <assign_op>
• <id, pointer to symbol-table entry for M>
• <mult_op>
• <id, pointer to symbol-table entry for C>
• <exp_op>
• <number, integer value 2>
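The token stream above can be reproduced with a small regex-based sketch. The token names follow the slide; using a dict index as the "pointer" is an assumption. exp_op must be tried before mult_op, otherwise ** would be read as two * tokens. Characters that match no pattern are silently skipped by finditer here, whereas a real lexer would report them as lexical errors.

```python
import re

TOKEN_SPEC = [
    ("number",    r"\d+"),
    ("id",        r"[A-Za-z]\w*"),
    ("exp_op",    r"\*\*"),      # must precede mult_op so ** is one token
    ("mult_op",   r"\*"),
    ("assign_op", r"="),
    ("ws",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokens_with_attributes(source: str):
    """Return (tokens, symbol table); an id's attribute is its symbol-table index."""
    symtab: dict[str, int] = {}
    out = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue                                   # white space yields no token
        if kind == "id":
            out.append(("id", symtab.setdefault(lexeme, len(symtab))))
        elif kind == "number":
            out.append(("number", int(lexeme)))        # integer value as attribute
        else:
            out.append((kind,))                        # operators carry no attribute
    return out, symtab


print(tokens_with_attributes("E = M * C ** 2"))
```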

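Lexical Errors
• It is hard for a lexical analyzer to detect an error without the help of other components.
fi(a%2==0)

• The lexical analyzer can't tell whether fi is a misspelling of the keyword if or an undeclared function identifier.
• Since fi is a valid lexeme for the token id, the lexical analyzer returns the token id and lets a later phase (e.g., the parser) handle the error.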

Lexical Errors
• A lexical error occurs when the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input.
• Simplest recovery strategy: panic-mode recovery.
• Delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left.
• Other possible error-recovery actions:
• Delete one character from the remaining input.
• Insert a missing character into the remaining input.
• Replace a character by another character.
• Transpose two adjacent characters.
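Panic-mode recovery can be sketched as follows, assuming a toy token set of identifiers, integers and the operators + - * / =. When no pattern matches at the current position, the sketch deletes characters until a token can start again (or white space is reached) and records what it skipped. The function name is hypothetical.

```python
import re

# Toy token set (assumption): identifiers, integers, and the operators + - * / =.
TOKEN = re.compile(r"[A-Za-z]\w*|\d+|[-+*/=]")


def lex_with_panic_mode(source: str):
    """Return (lexemes, errors); each error is (start position, deleted text)."""
    tokens, errors = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(source, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
            continue
        # No pattern matches any prefix of the remaining input:
        # delete successive characters until some token can start again.
        start = pos
        while pos < len(source) and not source[pos].isspace() and not TOKEN.match(source, pos):
            pos += 1
        errors.append((start, source[start:pos]))
    return tokens, errors


print(lex_with_panic_mode("x = 3 @# + y"))   # (['x', '=', '3', '+', 'y'], [(6, '@#')])
```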

Input Buffering
• We often need to look one or more characters beyond the next lexeme to be sure we have the right lexeme.
• e.g., we cannot be sure we have seen the end of an identifier until we see a character that is not a letter or digit, such as a space.
• A single-character operator such as < or = could be the beginning of a two-character operator such as <= or ==.

Buffer Pair
• Buffering techniques reduce the amount of overhead required to process a single input character.
• One scheme uses two buffers that are alternately reloaded.
• Each buffer is of the same size N.
• N is usually the size of a disk block (e.g., 4096 bytes).
• Using one system read call, we can read N characters into a buffer, rather than making one call per character.

Buffer Pair
• Two pointers into the input are maintained:
• lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.
• forward scans ahead until a pattern match is found.

E = M * C * * 2 eof

(pointers shown in the original figure: lexemeBegin, forward)

Buffer Pair
• Once the next lexeme is determined:
• forward is set to the character at its right end; if it has read one character too far, it is retracted one position to the left.
• lexemeBegin is then set to the character immediately after the lexeme just found.

• Advancing forward requires first testing whether we have reached the end of a buffer.
• If so, the other buffer is reloaded from the input, and forward is moved to the beginning of the newly loaded buffer.
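The pointer discipline above can be sketched on a single in-memory string, with buffer reloading left out: forward reads one character past an identifier, is retracted, and lexemeBegin moves to the character right after the lexeme. next_identifier is a hypothetical helper name, and the empty string stands in for eof.

```python
def next_identifier(buf: str, lexeme_begin: int):
    """Scan an identifier starting at lexeme_begin with a forward pointer.

    Returns (lexeme, new lexemeBegin). Assumes buf[lexeme_begin] is a letter.
    """
    forward = lexeme_begin
    while True:
        ch = buf[forward] if forward < len(buf) else ""   # "" plays the role of eof
        forward += 1                                       # forward has now read ch
        if not ch.isalnum():
            break
    forward -= 1            # retract: the last character read is not part of the lexeme
    lexeme = buf[lexeme_begin:forward]
    return lexeme, forward  # lexemeBegin moves to the character right after the lexeme


print(next_identifier("count+1", 0))   # ('count', 5)
```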

Buffer Pair
• Without further care, each advance of forward requires two checks:
• Have we reached the end of the buffer?
• Which character have we read?
• We can combine the buffer-end test with the test for the current character by placing a sentinel at the end of each buffer.
• A sentinel is a special character that cannot be part of the source program; a natural choice is the character eof.

E = M * eof C * * 2 eof eof
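A minimal Python sketch of the buffer pair with sentinels, assuming the NUL character "\0" stands in for eof (so the source must not contain NUL) and that stream.read(n) returns fewer than n characters only at end of input, which holds for io.StringIO and text files. The class name TwoBufferReader and the tiny buffer size in the usage line are illustrative. In the common case next_char makes a single comparison per character; only on a sentinel does it check which of the three situations it is in. Tracking lexemeBegin across the two halves is omitted.

```python
import io

EOF = "\0"   # sentinel; assumes the source text never contains a NUL character


class TwoBufferReader:
    """Buffer pair with sentinels laid out as [half 0 | EOF | half 1 | EOF]."""

    def __init__(self, stream, n: int = 4096):
        self.stream, self.n = stream, n
        self.buf = [EOF] * (2 * n + 2)
        self._load(0)                       # fill the first half
        self.forward = 0

    def _load(self, start: int) -> None:
        chunk = self.stream.read(self.n)    # one read call per half
        self.buf[start:start + len(chunk)] = list(chunk)
        self.buf[start + len(chunk)] = EOF  # sentinel right after the data

    def next_char(self) -> str:
        """Return the character at forward and advance; "" at end of input."""
        ch = self.buf[self.forward]
        self.forward += 1
        if ch != EOF:                            # common case: one test per character
            return ch
        if self.forward - 1 == self.n:           # sentinel ending the first half
            self._load(self.n + 1)
            self.forward = self.n + 1
            return self.next_char()
        if self.forward - 1 == 2 * self.n + 1:   # sentinel ending the second half
            self._load(0)
            self.forward = 0
            return self.next_char()
        self.forward -= 1                        # eof inside a half: real end of input
        return ""


reader = TwoBufferReader(io.StringIO("E = M * C ** 2"), n=4)
print("".join(iter(reader.next_char, "")))   # E = M * C ** 2
```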

Specification of Token
• Regular expressions are used to specify lexeme patterns.
• Not all patterns can be expressed using regular expressions, but they are very effective for specifying the kinds of patterns that tokens need.

• Let's recap regular expressions.

Strings and Languages
• Alphabet
• Any finite set of symbols.
• {0,1} is the binary alphabet.
• ASCII and Unicode are important examples of alphabets.
• String
• A finite sequence of symbols drawn from an alphabet.
• 0, 1, 00, 01, 1111, … are strings over the binary alphabet.
• Length of a string s, written |s|
• The number of occurrences of symbols in s.
• ε is the empty string, with length 0.

Strings and Languages
• Language
• Any countable set of strings over some fixed alphabet.
• This is a very broad definition, e.g.
• the set of all syntactically well-formed C programs;
• the set of all grammatically correct English sentences.

Operations on Languages

Operation                    Definition and Notation
Union of L and M             L ∪ M = {s | s is in L or s is in M}
Concatenation of L and M     LM = {st | s is in L and t is in M}
Kleene closure of L          $L^* = \bigcup_{i=0}^{\infty} L^i$
Positive closure of L        $L^+ = \bigcup_{i=1}^{\infty} L^i$

(Here $L^0 = \{\varepsilon\}$ and $L^i = L^{i-1}L$.)

Example of Operations
• Let L be the set of letters {A, B, …, Z, a, b, …, z}.
• Let D be the set of digits {0, 1, …, 9}.
• L ∪ D
• The set of letters and digits.
• 62 strings of length 1.
• LD
• The set of 520 strings of length two.
• Each consists of one letter followed by one digit.
• L^4
• The set of all 4-letter strings.
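These counts can be checked by computing the operations directly on finite sets, with Python strings as the elements and "" as ε. A sketch; concat and power are illustrative helper names. L^4 itself has 52^4 = 7,311,616 strings, so the usage below builds only L^2.

```python
import string

L = set(string.ascii_letters)   # {A, ..., Z, a, ..., z}: 52 one-symbol strings
D = set(string.digits)          # {0, ..., 9}: 10 one-symbol strings


def concat(X: set, Y: set) -> set:
    """XY = {st | s is in X and t is in Y}."""
    return {s + t for s in X for t in Y}


def power(X: set, k: int) -> set:
    """X^k, the concatenation of k copies of X; X^0 = {ε}, with ε written as ""."""
    result = {""}
    for _ in range(k):
        result = concat(result, X)
    return result


print(len(L | D))          # 62
print(len(concat(L, D)))   # 520
print(len(power(L, 2)))    # 2704 = 52 * 52
```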

Example of Operations
• Let L be the set of letters {A, B, …, Z, a, b, …, z}.
• Let D be the set of digits {0, 1, …, 9}.
• L*
• The set of all strings of letters, including ε, the empty string.
• L(L ∪ D)*
• The set of all strings of letters and digits beginning with a letter.
• D+
• The set of all strings of one or more digits.

Regular Expressions
• A regular expression is a sequence of characters specifying a pattern.
• If letter_ stands for any letter or the underscore,
• and digit stands for any digit,
• we can describe the language of C identifiers by
• letter_ ( letter_ | digit )*
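In Python's re syntax, with letter_ written as the character class [A-Za-z_] and digit as [0-9], the expression becomes the sketch below. It accepts exactly the strings of the form letter_ (letter_ | digit)*; note that it does not exclude C keywords.

```python
import re

# letter_ (letter_ | digit)*  with letter_ = [A-Za-z_] and digit = [0-9]
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_c_identifier(s: str) -> bool:
    return C_IDENTIFIER.fullmatch(s) is not None


print([is_c_identifier(s) for s in ["_tmp1", "x", "2x", "a-b"]])   # [True, True, False, False]
```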

Formation of Regular Expressions
• Regular expressions are built recursively out of smaller regular expressions.
• Each regular expression r denotes a language L(r).

Formation of Regular Expressions
• Rules that define regular expressions over an alphabet Σ:
• Basis
• ε is a regular expression, and L(ε) = {ε}.
• If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}.

Formation of Regular Expressions
• Induction
• Suppose r and s are regular expressions denoting languages L(r) and L(s).
• (r)|(s) is a regular expression denoting L(r) ∪ L(s).
• (r)(s) is a regular expression denoting L(r)L(s).
• (r)* is a regular expression denoting (L(r))*.
• (r) is a regular expression denoting L(r).
• That is, we can add extra pairs of parentheses around expressions without changing the language they denote.
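The basis and induction rules translate directly into a recursive function. Since L(r*) is infinite, this sketch only returns the strings of L(r) up to a given length k; representing regular expressions as nested tuples is a choice made for illustration.

```python
# Regular expressions as nested tuples (representation chosen for this sketch):
#   ("eps",)        ε
#   ("sym", "a")    a symbol a in Σ
#   ("or", r, s)    (r)|(s)
#   ("cat", r, s)   (r)(s)
#   ("star", r)     (r)*
def lang(r, k: int) -> set:
    """Return the strings of L(r) whose length is at most k."""
    op = r[0]
    if op == "eps":                                  # basis: L(ε) = {ε}
        return {""}
    if op == "sym":                                  # basis: L(a) = {a}
        return {r[1]} if len(r[1]) <= k else set()
    if op == "or":                                   # L(r) ∪ L(s)
        return lang(r[1], k) | lang(r[2], k)
    if op == "cat":                                  # L(r)L(s)
        return {s + t for s in lang(r[1], k) for t in lang(r[2], k) if len(s + t) <= k}
    if op == "star":                                 # (L(r))* = L^0 ∪ L^1 ∪ ...
        base = lang(r[1], k)
        result = {""}                                # L^0 = {ε}
        frontier = {""}
        while frontier:                              # extend only newly found strings
            frontier = {s + t for s in frontier for t in base if len(s + t) <= k} - result
            result |= frontier
        return result
    raise ValueError(f"unknown operator {op!r}")


a, b = ("sym", "a"), ("sym", "b")
a_or_b = ("or", a, b)
print(sorted(lang(("cat", a_or_b, a_or_b), 2)))   # ['aa', 'ab', 'ba', 'bb']
```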

Precedence and Associativity
• The unary operator * has the highest precedence.
• Concatenation has the second-highest precedence.
• | has the lowest precedence.
• All three operators are left associative.
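Python's re module uses the same precedence (star binds tighter than concatenation, which binds tighter than |), so it can illustrate how a|bc* is grouped, compared with the explicitly parenthesized (a|b)c*. A small sketch; the variable names are ours.

```python
import re

star_binds_tightest = re.compile(r"a|bc*")   # read as a | (b (c*))
grouped = re.compile(r"(a|b)c*")             # parentheses override the precedence

for s in ["a", "bccc", "ac", "abc"]:
    print(s, bool(star_binds_tightest.fullmatch(s)), bool(grouped.fullmatch(s)))
# a True True
# bccc True True
# ac False True
# abc False False
```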

Regular Expression Example
• Let Σ = {a, b}.
• a|b denotes the language {a, b}.
• (a|b)(a|b)
• denotes {aa, ab, ba, bb}.
• a*
• denotes the language of all strings of zero or more a's: {ε, a, aa, aaa, …}.
• (a|b)*
• denotes all strings of zero or more instances of a or b:
• {ε, a, b, aa, ab, ba, bb, aaa, …}.

Regular Definition
• Used for notational convenience:
• give names to certain regular expressions and use those names as symbols in subsequent expressions.
• If Σ is an alphabet of basic symbols,
• then a regular definition is a sequence of definitions of the form

d1 → r1
d2 → r2
…
dn → rn

Regular Definition
• Each di is a new symbol, not in Σ and not the same as any other of the d's.
• Each ri is a regular expression over the alphabet
• Σ ∪ {d1, d2, …, di-1}.

d1 → r1
d2 → r2
…
dn → rn

Regular Definition Example
• C identifiers are strings of letters, digits and underscores.
• A regular definition for the language of C identifiers:

letter_ → A | B | … | Z | a | b | … | z | _
digit → 0 | 1 | … | 9
id → letter_ ( letter_ | digit )*
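Because each ri may use only earlier names, a regular definition can be expanded into a single regular expression by substituting definitions in order. In this sketch the names are written in braces ({letter_}) to make substitution easy; that notation and the helper expand are assumptions.

```python
import re
import string

# Regular definitions in order; each right side may mention only earlier names,
# written in braces (an assumption of this sketch).
DEFINITIONS = [
    ("letter_", "|".join(string.ascii_uppercase + string.ascii_lowercase + "_")),
    ("digit",   "|".join(string.digits)),
    ("id",      "{letter_}({letter_}|{digit})*"),
]


def expand(definitions):
    """Substitute each earlier name by its parenthesized expansion, in order."""
    expanded = {}
    for name, rhs in definitions:
        for earlier, body in expanded.items():
            rhs = rhs.replace("{" + earlier + "}", "(" + body + ")")
        expanded[name] = rhs
    return expanded


ID = re.compile(expand(DEFINITIONS)["id"])
print(bool(ID.fullmatch("_count2")), bool(ID.fullmatch("2count")))   # True False
```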

Extensions of Regular Expressions
• One or more instances
• Unary postfix operator +.
• Represents the positive closure of a regular expression and its language.
• (r)+ denotes the language (L(r))+.
• Zero or one instance
• Unary postfix operator ?.
• r? is equivalent to r | ε.
• The operators + and ? have the same precedence and associativity as *.
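The two shorthands can be spot-checked with Python's re, writing ε as an empty alternative. Checking a handful of sample strings illustrates the equivalences; it is not a proof. agree is a hypothetical helper.

```python
import re


def agree(r1: str, r2: str, samples) -> bool:
    """True if r1 and r2 accept exactly the same strings among the samples."""
    return all(bool(re.fullmatch(r1, s)) == bool(re.fullmatch(r2, s)) for s in samples)


samples = ["", "ab", "abab", "ababab", "a", "aba"]
print(agree(r"(ab)+", r"(ab)(ab)*", samples))   # r+ versus r r*
print(agree(r"(ab)?", r"(ab|)", samples))       # r? versus r | ε
```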

Extensions of Regular Expressions
• Character classes
• A regular expression a1 | a2 | … | an, where the ai are each symbols of the alphabet, can be replaced by the shorthand
• [a1a2…an]
• If a1, a2, …, an form a logical sequence,
• e.g., consecutive uppercase letters, lowercase letters or digits,
• we can replace a1a2…an by a1-an,
• i.e., just the first and last symbols separated by a hyphen.

Regular Expression Example
• Rewriting the regular definition of C identifiers with character classes:

letter_ → [A-Za-z_]
digit → [0-9]
id → letter_ ( letter_ | digit )*

Recognition of Tokens
• So far we have seen how to express patterns using regular expressions.
• Now we want to use these patterns to find, in the input, prefixes that are lexemes matching one of the patterns.

Recognition of Tokens
• Consider the following grammar for branching statements and conditional expressions:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | number

Recognition of Tokens
• The terminals of the grammar are
• if, then, else, relop, id and number.
• For relop we use the comparison operators
• =, <>, <, >, <=, >=

Recognition of Tokens
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z]
id → letter ( letter | digit )*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>
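The same regular definitions, transcribed into Python regex strings. A sketch: the variable id_ has a trailing underscore only to avoid shadowing Python's built-in id, and the "." in number must be escaped because an unescaped "." matches any character.

```python
import re

digit = r"[0-9]"
digits = rf"{digit}+"
number = rf"{digits}(\.{digits})?(E[+-]?{digits})?"   # the '.' is escaped
letter = r"[A-Za-z]"
id_ = rf"{letter}({letter}|{digit})*"
relop = r"<=|>=|<>|<|>|="   # longer operators first, so a prefix match prefers <= over <

for s in ["42", "6.336E4", "1.89E-4", "1.", "x2"]:
    print(s, bool(re.fullmatch(number, s)))
```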

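Recognition of Tokens
• The lexical analyzer also strips out white space, recognized by the token ws:
• ws → ( blank | tab | newline )+
• When ws is recognized, no token is returned to the parser; lexical analysis restarts from the character following the white space.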

Tokens, Patterns and Attribute Values

Lexemes      Token Name   Attribute Value
Any ws       -            -
if           if           -
then         then         -
else         else         -
Any id       id           Pointer to table entry
Any number   number       Pointer to table entry
<            relop        LT
<=           relop        LE
=            relop        EQ
<>           relop        NE
>            relop        GT
>=           relop        GE
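Putting the patterns and the attribute table together gives a small lexical analyzer. A sketch under stated assumptions: the symbol table is a dict from lexeme to index (the index standing in for "pointer to table entry"), keywords are recognized by first matching id and then looking the lexeme up, and a lexical error is raised as SyntaxError. re tries the alternatives in order and takes the first that matches, so <= and <> are listed before <. The names get_tokens, KEYWORDS and RELOP_ATTR are our own.

```python
import re

KEYWORDS = {"if", "then", "else"}
RELOP_ATTR = {"<": "LT", "<=": "LE", "=": "EQ", "<>": "NE", ">": "GT", ">=": "GE"}

SPEC = re.compile(r"""
    (?P<ws>[ \t\n]+)
  | (?P<number>[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?)
  | (?P<id>[A-Za-z][A-Za-z0-9]*)
  | (?P<relop><=|>=|<>|<|>|=)
""", re.VERBOSE)


def get_tokens(source: str):
    """Yield (token name, attribute value) pairs following the table above."""
    table: dict[str, int] = {}
    pos = 0
    while pos < len(source):
        m = SPEC.match(source, pos)
        if m is None:
            raise SyntaxError(f"lexical error at position {pos}: {source[pos]!r}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue                                  # no token is returned for ws
        if kind == "id" and lexeme in KEYWORDS:
            yield lexeme, None                        # keyword: token name is the lexeme
        elif kind in ("id", "number"):
            yield kind, table.setdefault(lexeme, len(table))
        else:
            yield "relop", RELOP_ATTR[lexeme]


print(list(get_tokens("if x1 <= 10 then y else z")))
```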

Transition Diagrams
• As an intermediate step in constructing a lexical analyzer, patterns are converted into stylized flowcharts called transition diagrams.

Transition Diagram
• A transition diagram has a collection of nodes, or circles, called states.
• Each state represents a condition that could occur while scanning the input for a lexeme matching one of several patterns.
• Edges are directed from one state of the transition diagram to another.
• Each edge is labeled by a symbol or set of symbols.
• We assume our transition diagrams are deterministic: for any state and symbol, there is at most one edge out of that state labeled by that symbol.

Transition Diagram
• Certain states are said to be accepting, or final.
• They indicate that a lexeme has been found.
• They are drawn as double circles.
• An action may be attached to an accepting state.
• Typically the action returns a token and an attribute value to the parser.
• One state is designated the start state, or initial state.
• It is indicated by an edge labeled "start" entering from nowhere.

Transition Diagram Example
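The example diagram on this slide did not survive extraction. As an illustration (our own, not necessarily the slide's figure), the relop patterns from the table of attribute values can be recognized by a deterministic transition diagram; the sketch below simulates one with a state variable. The state numbers and the function name scan_relop are assumptions. Accepting states reached after reading one character too far return a position one less, which is the retraction of forward.

```python
def scan_relop(buf: str, lexeme_begin: int):
    """Simulate a deterministic transition diagram for relop.

    Returns (attribute, position after the lexeme), or None if no relop starts here.
    """
    def char_at(i):
        return buf[i] if i < len(buf) else ""   # "" stands for end of input

    forward = lexeme_begin
    state = 0
    while True:
        c = char_at(forward)
        forward += 1
        if state == 0:                        # start state
            if c == "<":
                state = 1
            elif c == "=":
                return "EQ", forward
            elif c == ">":
                state = 6
            else:
                return None                   # fail: try another diagram
        elif state == 1:                      # seen '<'
            if c == "=":
                return "LE", forward
            if c == ">":
                return "NE", forward
            return "LT", forward - 1          # accept, retracting forward
        elif state == 6:                      # seen '>'
            if c == "=":
                return "GE", forward
            return "GT", forward - 1          # accept, retracting forward


print(scan_relop("a<=b", 1), scan_relop("a<b", 1), scan_relop("x", 0))
# ('LE', 3) ('LT', 2) None
```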

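Recognition of Reserved Words and Identifiers
• Keywords such as if and then are reserved, so they are not identifiers,
• even though they look like identifiers.

• A transition diagram that recognizes id as a letter followed by letters and digits will therefore also recognize if as an identifier.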

Methods to Handle Reserved Words
• Install the reserved words in the symbol table initially.
• A field of the symbol-table entry indicates that these strings are not ordinary identifiers, and which token they represent.
• installID() places an identifier in the symbol table if it is not already there.
• Alternatively, create a separate transition diagram for each keyword.
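A sketch of the first method, assuming a dictionary-based symbol table whose value field records the token name: reserved words are pre-installed with their own token names, and install_id (modeled on the slide's installID()) adds a lexeme as an ordinary id only if it is not already present. The class KeywordSymbolTable and the function scan_words are hypothetical names.

```python
import re


class KeywordSymbolTable:
    def __init__(self, reserved):
        # Pre-install reserved words; the value field says which token each one is.
        self.entries = {word: word for word in reserved}   # lexeme -> token name

    def install_id(self, lexeme: str) -> str:
        """Install lexeme as an id unless already present; return its token name."""
        return self.entries.setdefault(lexeme, "id")


ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # the id diagram also accepts keywords


def scan_words(source: str, table: KeywordSymbolTable):
    """Return (token name, lexeme) for every letter-initial word in source."""
    return [(table.install_id(m.group()), m.group()) for m in ID_PATTERN.finditer(source)]


table = KeywordSymbolTable(["if", "then", "else"])
print(scan_words("if x then y else ifx", table))
```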

Transition Diagram for Numbers
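The figure for this slide did not survive extraction. As an illustration (not a reproduction of the slide's diagram), the sketch below simulates a deterministic transition diagram for the pattern digits (. digits)? (E [+-]? digits)?; the state numbers and the name scan_number are our own. Because the scanner may read past the end of a number, as in 1.E5 where the "." and "E" turn out not to begin a fraction or exponent, it remembers the last accepting position and retracts to it.

```python
DIGITS = "0123456789"
ACCEPTING = {1, 3, 6}   # after integer part, after fraction, after exponent digits


def scan_number(buf: str, begin: int):
    """Return the longest prefix of buf[begin:] matching the number pattern, or None."""
    state, forward, last_accept = 0, begin, None
    while forward < len(buf):
        c = buf[forward]
        if c in DIGITS:
            nxt = {0: 1, 1: 1, 2: 3, 3: 3, 4: 6, 5: 6, 6: 6}[state]
        elif c == "." and state == 1:
            nxt = 2                      # seen '.', need at least one digit
        elif c == "E" and state in (1, 3):
            nxt = 4                      # seen 'E'
        elif c in "+-" and state == 4:
            nxt = 5                      # seen exponent sign
        else:
            break                        # no edge for c: stop scanning
        state, forward = nxt, forward + 1
        if state in ACCEPTING:
            last_accept = forward
    # Retract forward to the last accepting position, if there was one.
    return None if last_accept is None else buf[begin:last_accept]


print(scan_number("3.14+x", 0), scan_number("1.E5", 0), scan_number("x", 0))   # 3.14 1 None
```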

The End
