Ch3 - Lexical Analysis

This chapter discusses lexical analysis and how it relates to parsing source code. It defines key terms like tokens, patterns, and lexemes. It also covers how regular expressions are used to specify patterns and the use of finite automata in lexical analysis.

Lexical Analysis

• Basic Concepts & Regular Expressions
• What does a Lexical Analyzer do?
• How does it Work?
• Formalizing Token Definition & Recognition
• Reviewing Finite Automata Concepts
• Non-Deterministic and Deterministic FA
• Conversion Process
  • Regular Expressions to NFA
  • NFA to DFA
• Relating NFAs/DFAs/Conversion to Lexical Analysis
Lexical Analyzer in Perspective

                            token →
  source program → [lexical analyzer]          [parser]
                                    ← get next token
                           ↕                      ↕
                             [ symbol table ]

Important Issue:
• What are the responsibilities of each box?
• Focus on the Lexical Analyzer and the Parser.
Lexical Analyzer in Perspective

LEXICAL ANALYZER
• Scan Input
• Remove WS, NL, …
• Identify Tokens
• Create Symbol Table
• Insert Tokens into ST
• Generate Errors
• Send Tokens to Parser

PARSER
• Perform Syntax Analysis
• Actions Dictated by Token Order
• Update Symbol Table Entries
• Create Abstract Rep. of Source
• Generate Errors
• And More… (We’ll see later)
What Factors Have Influenced the Functional Division of Labor?

• Separation of Lexical Analysis from Parsing Presents a Simpler
  Conceptual Model
  • A parser embodying the conventions for comments and white space is
    significantly more complex than one that can assume comments and
    white space have already been removed by the lexical analyzer.

• Separation Increases Compiler Efficiency
  • Specialized buffering techniques for reading input characters and
    processing tokens…

• Separation Promotes Portability
  • Input-alphabet peculiarities and other device-specific anomalies can
    be restricted to the lexical analyzer.
Introducing Basic Terminology

• What are the Major Terms for Lexical Analysis?
• TOKEN
  • A pair consisting of a token name and an optional attribute value.
  • E.g., a particular keyword, or a sequence of input characters
    denoting an identifier.
• PATTERN
  • A description of the form that the lexemes of a token may take.
  • For keywords, the pattern is just the sequence of characters that
    forms the keyword.
• LEXEME
  • The actual sequence of characters that matches a pattern and is
    classified by a token.
Introducing Basic Terminology

Token      Sample Lexemes           Informal Description of Pattern
--------   ---------------------    --------------------------------------
const      const                    const
if         if                       the characters i, f
relation   <, <=, =, <>, >, >=      < or <= or = or <> or > or >=
id         pi, count, D2            letter followed by letters and digits
num        3.1416, 0, 6.02E23       any numeric constant
literal    “core dumped”            any characters between “ and ” except “

The token classifies the pattern. Actual values are critical; this info is:
1. Stored in the symbol table
2. Returned to the parser
Attributes for Tokens

• When more than one lexeme can match a pattern, a lexical analyzer must
  provide the compiler additional information about the particular lexeme
  that matched.

• Information about an identifier – its lexeme, its type, and the
  location at which it was first found – is kept in the symbol table.

• The appropriate attribute value for an identifier is therefore a
  pointer to the symbol-table entry for that identifier.
Attributes for Tokens

Tokens influence parsing decisions;
the attributes influence the translation of tokens.

Example: E = M * C ** 2

<id, pointer to symbol-table entry for E>
<assign_op, >
<id, pointer to symbol-table entry for M>
<mult_op, >
<id, pointer to symbol-table entry for C>
<exp_op, >
<num, integer value 2>
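
The same stream can be sketched in code. A minimal illustration (the names and the dict-based symbol table are my assumptions, not the slides' code): tokens as (name, attribute) pairs, with identifier attributes serving as keys into a symbol table.

  # Token stream for  E = M * C ** 2  as (token-name, attribute) pairs.
  # Identifiers carry a "pointer" (here: a key) into the symbol table.
  symbol_table = {"E": {}, "M": {}, "C": {}}

  tokens = [
      ("id", "E"),          # pointer to symbol-table entry for E
      ("assign_op", None),
      ("id", "M"),
      ("mult_op", None),
      ("id", "C"),
      ("exp_op", None),
      ("num", 2),           # attribute: integer value 2
  ]
  for name, attr in tokens:
      print(f"<{name}, {'' if attr is None else attr}>")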
Handling Lexical Errors

• It is hard for a lexical analyzer to tell, without the aid of other
  components, that there is a source-code error.
• If the string fi is encountered for the first time in a C program, the
  lexical analyzer cannot tell whether fi is a misspelling of the keyword
  if or an undeclared identifier.
• Probably the parser will be able to handle this case.

• Error handling is very localized with respect to the input source.

• For example: whil ( x = 0 ) do
  generates no lexical errors in PASCAL.
Handling Lexical Errors

• In what situations do errors occur?
  • The lexical analyzer is unable to proceed because none of the
    patterns for tokens matches a prefix of the remaining input.
• Panic-mode Recovery
  • Delete successive characters from the remaining input until the
    analyzer can find a well-formed token.
  • May confuse the parser – creating syntax errors.
• Possible error-recovery actions:
  • Deleting or inserting input characters
  • Replacing or transposing characters
Buffer Pairs

• The lexical analyzer needs to look ahead several characters beyond the
  lexeme for a pattern before a match can be announced.
• One could use a function ungetc to push look-ahead characters back into
  the input stream, but a large amount of time can be consumed moving
  characters one at a time.

Special Buffering Technique:
• Use a buffer divided into two N-character halves, where
  N = number of characters in one disk block.
• One system read command fills N characters.
• Fewer than N characters read ⇒ mark the end with eof.
Buffer Pairs (2)

• Two pointers, lexeme_beginning and forward, into the input buffer are
  maintained.
• The string of characters between the pointers is the current lexeme.
• Initially both pointers point to the first character of the next lexeme
  to be found. The forward pointer scans ahead until a match for a
  pattern is found.
• Once the next lexeme is determined, the forward pointer is set to the
  character at its right end.
• After the lexeme is processed, both pointers are set to the character
  immediately past the lexeme.

  E = M * C * * 2 eof
  ↑               ↑
  lexeme_beginning   forward

Comments and white space can be treated as patterns that yield no token.
Code to advance forward pointer:

if forward at end of first half then begin
    reload second half;
    forward := forward + 1;
end
else if forward at end of second half then begin
    reload first half;
    move forward to beginning of first half;
end
else forward := forward + 1;

Pitfalls:
1. This buffering scheme works quite well most of the time, but the
   amount of lookahead is limited.
2. Limited lookahead makes it impossible to recognize tokens in
   situations where the distance the forward pointer must travel is more
   than the length of the buffer.
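
For concreteness, here is a minimal runnable sketch of the two-half scheme described above in Python (class and method names are my own assumptions, not the slides' code):

  import io

  N = 8   # assumed block size; a real lexer would use a disk-block size such as 4096

  class BufferPair:
      """Two N-character halves; a half is refilled only when `forward`
      crosses into it, mirroring the pseudocode above."""

      def __init__(self, f):
          self.f = f
          self.buf = ['\0'] * (2 * N)    # '\0' doubles as the eof marker
          self._load(0)
          self.forward = -1

      def _load(self, half):
          data = self.f.read(N)
          for i in range(N):
              self.buf[half * N + i] = data[i] if i < len(data) else '\0'

      def advance(self):
          if self.forward == N - 1:         # at end of first half
              self._load(1)                 # reload second half
              self.forward += 1
          elif self.forward == 2 * N - 1:   # at end of second half
              self._load(0)                 # reload first half
              self.forward = 0              # wrap to beginning of first half
          else:
              self.forward += 1
          return self.buf[self.forward]

  bp = BufferPair(io.StringIO("E = M * C ** 2"))
  out = []
  while (c := bp.advance()) != '\0':
      out.append(c)
  print(''.join(out))   # E = M * C ** 2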
Specification of Tokens

Regular expressions are an important notation for specifying lexeme patterns.

An alphabet is a finite set of symbols.
• Typical examples of symbols are letters, digits, and punctuation.
• The set {0, 1} is the binary alphabet.

A string over an alphabet is a finite sequence of symbols drawn from that
alphabet.
• The length of a string s is denoted |s|.
• The empty string is denoted by ε.

Prefix: ban, banana, ε, etc. are prefixes of banana.
Suffix: nana, banana, ε, etc. are suffixes of banana.

Kleene (or star) closure of a language L is denoted by L*.
• L*: concatenation of L zero or more times
• L0: concatenation of L zero times
• L+: concatenation of L one or more times
Operations on Languages

Kleene closure: L* denotes “zero or more concatenations of” L.
Example

Let: L = { a, b, c, ..., z }
     D = { 0, 1, 2, ..., 9 }

D+          = “the set of strings with one or more digits”
L ∪ D       = “the set of all letters and digits (alphanumeric characters)”
LD          = “the set of strings consisting of a letter followed by a digit”
L*          = “the set of all strings of letters, including ε, the empty string”
(L ∪ D)*    = “sequences of zero or more letters and digits”
L((L ∪ D)*) = “the set of strings that start with a letter, followed by
              zero or more letters and digits”
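
These operations are easy to model concretely. A small illustrative sketch (mine, not from the slides), using Python sets and a finite approximation of closure:

  import string
  from itertools import product

  L = set(string.ascii_lowercase)   # { a, b, ..., z }
  D = set(string.digits)            # { 0, 1, ..., 9 }

  def concat(A, B):
      """Concatenation AB = { xy | x in A, y in B }."""
      return {x + y for x, y in product(A, B)}

  def closure(A, k):
      """Finite slice of A*: concatenations of at most k strings from A."""
      result, layer = {''}, {''}
      for _ in range(k):
          layer = concat(layer, A)
          result |= layer
      return result

  print(len(L | D))          # 36: all letters and digits
  print(len(concat(L, D)))   # 260: a letter followed by a digit
  print(sorted(closure({'a', 'b'}, 2)))
  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb'] -- includes '' (i.e. epsilon)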
Rules for Specifying Regular Expressions

Regular expressions over alphabet Σ:

1. ε is a regular expression that denotes {ε}.

2. If a is a symbol (i.e., if a ∈ Σ), then a is a regular expression
   that denotes {a}.

3. Suppose r and s are regular expressions denoting the languages L(r)
   and L(s). Then
   a) (r) | (s) is a regular expression denoting L(r) ∪ L(s).
   b) (r)(s) is a regular expression denoting L(r)L(s).
   c) (r)* is a regular expression denoting (L(r))*.
   d) (r) is a regular expression denoting L(r).
How to “Parse” Regular Expressions

• Precedence:
  • * has highest precedence.
  • Concatenation has middle precedence.
  • | has lowest precedence.
  • Use parentheses to override these rules.
• Examples:
  • a b* = a (b*)
    • If you want (a b)* you must use parentheses.
  • a | b c = a | (b c)
    • If you want (a | b) c you must use parentheses.
• Concatenation and | are associative:
  • (a b) c = a (b c) = a b c
  • (a | b) | c = a | (b | c) = a | b | c
• Example:
  • b d | e f * | g a = (b d) | (e (f *)) | (g a)
Example

Let Σ = {a, b}. Consider the following regular expressions:

• a | b denotes the set {a, b}
• (a|b)(a|b) denotes {aa, ab, ba, bb}
• a* denotes the set of all strings of zero or more a’s, i.e.,
  {ε, a, aa, aaa, …}
• (a|b)* denotes the set containing zero or more instances of an a or b
• a | a*b denotes the set containing the string a and all strings
  consisting of zero or more a’s followed by one b
Regular Definition

• If Σ is an alphabet of basic symbols, then a regular definition is a
  sequence of the following form:

  d1 → r1
  d2 → r2
  ……
  dn → rn

  where
  • Each di is a new symbol such that di ∉ Σ and di ≠ dj for j < i
  • Each ri is a regular expression over Σ ∪ {d1, d2, …, di−1}
Unsigned Number

Examples: 1240, 39.45, 6.33E15, or 1.578E-41

digit → 0 | 1 | 2 | … | 9
digits → digit digit*
optional_fraction → . digits | ε
optional_exponent → ( E ( + | − | ε ) digits ) | ε
num → digits optional_fraction optional_exponent
Additional Notation / Shorthand

Unsigned Number: 1240, 39.45, 6.33E15, or 1.578E-41

digit → 0 | 1 | 2 | … | 9
digits → digit digit*
optional_fraction → . digits | ε
optional_exponent → ( E ( + | − | ε ) digits ) | ε
num → digits optional_fraction optional_exponent

Shorthand:

digit → 0 | 1 | 2 | … | 9
digits → digit+
optional_fraction → ( . digits )?
optional_exponent → ( E ( + | - )? digits )?
num → digits optional_fraction optional_exponent
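
As a sanity check (my own sketch, not part of the slides), the shorthand form of num maps almost symbol-for-symbol onto a Python regular expression, since re uses the same +, ?, and grouping operators:

  import re

  num = re.compile(r"""
      [0-9]+               # digits            -> digit+
      (\.[0-9]+)?          # optional_fraction -> ( . digits )?
      (E[+-]?[0-9]+)?      # optional_exponent -> ( E (+|-)? digits )?
  """, re.VERBOSE)

  for s in ["1240", "39.45", "6.33E15", "1.578E-41"]:
      print(s, bool(num.fullmatch(s)))   # all True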
Token Recognition

How can we use the concepts developed so far to assist in recognizing
tokens of a source language?

Assume the following tokens: if, then, else, relop, id, num

Given the tokens, what are the patterns?

Grammar:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | ε
  expr → term relop term | term
  term → id | num

Patterns:
  if    → if
  then  → then
  else  → else
  relop → < | <= | > | >= | = | <>
  id    → letter ( letter | digit )*
  num   → digit+ ( . digit+ )? ( E ( + | - )? digit+ )?
What Else Does the Lexical Analyzer Do?

Scan away blanks, new lines, tabs.

Can we define tokens for these?

  blank   → blank
  tab     → tab
  newline → newline
  delim   → blank | tab | newline
  ws      → delim+

In these cases no token is returned to the parser.
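
Putting these patterns together, one plausible regex-driven scanner looks like the sketch below (my own illustration, not the slides' code; keyword patterns are listed before id so that keywords win, and ws is matched but yields no token):

  import re

  TOKEN_SPEC = [
      ("ws",    r"[ \t\n]+"),                      # delim+ : no token returned
      ("if",    r"if\b"),
      ("then",  r"then\b"),
      ("else",  r"else\b"),
      ("num",   r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
      ("id",    r"[A-Za-z][A-Za-z0-9]*"),
      ("relop", r"<=|<>|>=|<|>|="),
  ]
  MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

  def tokens(src):
      for m in MASTER.finditer(src):
          if m.lastgroup != "ws":     # scan away blanks, tabs, newlines
              yield (m.lastgroup, m.group())

  print(list(tokens("if x1 <= 42 then y else z")))
  # [('if', 'if'), ('id', 'x1'), ('relop', '<='), ('num', '42'),
  #  ('then', 'then'), ('id', 'y'), ('else', 'else'), ('id', 'z')]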
Overall

Lexeme    Token    Attribute Value
-------   ------   -----------------------
ws        -        -
if        if       -
then      then     -
else      else     -
id        id       pointer to table entry
num       num      exact value
<         relop    LT
<=        relop    LE
=         relop    EQ
<>        relop    NE
>         relop    GT
>=        relop    GE

Note: Each token has a unique token identifier to define the category of
its lexemes.
Constructing Transition Diagrams for Tokens

• Transition Diagrams (TDs) are used to represent the tokens.
• As characters are read, the relevant TDs are used to attempt to match a
  lexeme to a pattern.
• Each TD has:
  • States: represented by circles
  • Actions: represented by arrows between states
  • Start state: beginning of a pattern (arrowhead)
  • Final state(s): end of pattern (concentric circles)
  • Edges: arrows connecting the states
  • Label other: any character not indicated by any other edge
• Each TD is deterministic (assumed) – no need to choose between two
  different actions!
Example TDs

>= :

  start → (0) --‘>’--> (6) --‘=’--> ((7))   return(relop, GE)
                        |
                        └--other--> ((8))*  return(relop, GT)

* We’ve accepted “>” and have read one extra character that must be
  unread (retracted).
Example: All RELOPs

  start → (0) --‘<’--> (1) --‘=’-->   ((2))   return(relop, LE)
                       (1) --‘>’-->   ((3))   return(relop, NE)
                       (1) --other--> ((4))*  return(relop, LT)
          (0) --‘=’-->                ((5))   return(relop, EQ)
          (0) --‘>’--> (6) --‘=’-->   ((7))   return(relop, GE)
                       (6) --other--> ((8))*  return(relop, GT)
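
This diagram translates directly into code. A minimal sketch (the function name and shape are my assumptions): states marked * simply decline to consume the lookahead character, which is the “retract” action.

  def relop(s, i):
      """Try to match a relational operator in s starting at index i.
      Returns (token, attribute, next_index) or None."""
      n = len(s)
      if i >= n:
          return None
      if s[i] == '<':                          # state 1
          if i + 1 < n and s[i+1] == '=':      # state 2
              return ('relop', 'LE', i + 2)
          if i + 1 < n and s[i+1] == '>':      # state 3
              return ('relop', 'NE', i + 2)
          return ('relop', 'LT', i + 1)        # state 4*: retract, keep only '<'
      if s[i] == '=':                          # state 5
          return ('relop', 'EQ', i + 1)
      if s[i] == '>':                          # state 6
          if i + 1 < n and s[i+1] == '=':      # state 7
              return ('relop', 'GE', i + 2)
          return ('relop', 'GT', i + 1)        # state 8*: retract, keep only '>'
      return None

  print(relop("<>", 0))    # ('relop', 'NE', 2)
  print(relop(">= 1", 0))  # ('relop', 'GE', 2)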
Example TDs: id and delim

id : id → letter ( letter | digit )*

                 letter or digit (loop)
  start → (9) --letter--> (10) --other--> ((11))*

  return( get_token(), install_id() )
  install_id() either returns a pointer to the table entry, or “0” if the
  lexeme is a reserved word.

delim : ws → delim+

                  delim (loop)
  start → (28) --delim--> (29) --other--> ((30))*
Example TDs: Unsigned #s

digit → 0 | 1 | 2 | … | 9
digits → digit digit*
optional_fraction → . digits | ε
optional_exponent → ( E ( + | − | ε ) digits ) | ε
num → digits optional_fraction optional_exponent

TD for a number with fraction and exponent:

  start → (12) --digit--> (13)  [loop: digit]
          (13) --‘.’--> (14) --digit--> (15)  [loop: digit]
          (15) --‘E’--> (16) --‘+’|‘-’--> (17) --digit--> (18)  [loop: digit]
          (16) --digit--> (18)
          (18) --other--> ((19))*

TD for a number with fraction only:

  start → (20) --digit--> (21)  [loop: digit]
          (21) --‘.’--> (22) --digit--> (23)  [loop: digit]
          (23) --other--> ((24))*

  return( num, install_num() )

TD for an integer:

  start → (25) --digit--> (26)  [loop: digit]
          (26) --other--> ((27))*

Questions:
• Is ordering important for unsigned #s?
• Why are there no TDs for then, else, if?
QUESTION:

What would the transition diagram (TD) for strings containing each vowel,
in strict lexicographical order, look like?
Answer

cons → B | C | D | F | G | H | J | … | N | P | … | T | V | … | Z
string → cons* A cons* E cons* I cons* O cons* U cons*

        cons        cons        cons        cons        cons        cons
       (loop)      (loop)      (loop)      (loop)      (loop)      (loop)
  start → ○ --A--> ○ --E--> ○ --I--> ○ --O--> ○ --U--> ○ --other--> accept

Note: The error path is taken if the character is other than a cons or
the vowel expected next in the lexicographic order.
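
For comparison (my own sketch, not from the slides), the same language can be checked with a regular expression built from the cons pattern above:

  import re

  cons = "[B-DF-HJ-NP-TV-Z]"   # uppercase letters minus the vowels A, E, I, O, U
  vowel_order = re.compile(f"{cons}*A{cons}*E{cons}*I{cons}*O{cons}*U{cons}*")

  print(bool(vowel_order.fullmatch("FACETIOUS")))   # True: A, E, I, O, U in order
  print(bool(vowel_order.fullmatch("EDUCATION")))   # False: vowels out of order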
Finite State Automata (FSAs)

• Also called “Finite State Machines”, “Finite Automata”, or “FA”.
• A recognizer for a language is a program that takes as input a string x
  and answers “yes” if x is a sentence of the language and “no” otherwise.
• A regular expression is compiled into a recognizer by constructing a
  generalized transition diagram called a finite automaton.
• Each state is labeled with a state name.
• Directed edges are labeled with symbols.
• Two types:
  • Deterministic (DFA)
  • Non-deterministic (NFA)
Nondeterministic Finite Automata

A nondeterministic finite automaton (NFA) is a mathematical model that
consists of

1. A set of states S
2. A set of input symbols Σ
3. A transition function that maps state/symbol pairs to a set of states
4. A special state s0 called the start state
5. A set of states F (a subset of S) of final states

INPUT: string
OUTPUT: yes or no
Example – NFA : (a|b)*abb

S = { 0, 1, 2, 3 }
s0 = 0
F = { 3 }
Σ = { a, b }

           a, b (loop)
  start → (0) --a--> (1) --b--> (2) --b--> ((3))

Transition Table:

  state |    a     |   b
  ------+----------+-------
    0   | { 0, 1 } | { 0 }
    1   |    --    | { 2 }
    2   |    --    | { 3 }

ε (null) moves are possible: i --ε--> j switches state without consuming
any input symbol. (This particular NFA has none.)
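
A direct simulation keeps a set of current states and applies the transition table to each of them. A minimal sketch (the dict-of-sets representation is my assumption):

  # Transition table of the (a|b)*abb NFA; missing entries mean "no move".
  delta = {
      (0, 'a'): {0, 1}, (0, 'b'): {0},
      (1, 'b'): {2},
      (2, 'b'): {3},
  }
  START, FINAL = {0}, {3}

  def nfa_accepts(x):
      states = set(START)               # this particular NFA has no ε-moves
      for c in x:
          states = set().union(*(delta.get((s, c), set()) for s in states))
      return bool(states & FINAL)

  print(nfa_accepts("ababb"))   # True
  print(nfa_accepts("abab"))    # False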
How Does An NFA Work?

           a, b (loop)
  start → (0) --a--> (1) --b--> (2) --b--> ((3))

• Given an input string, we trace moves.
• If there is no more input and we are in a final state, ACCEPT.

EXAMPLE: Input ababb

  move(0, a) = 1                      move(0, a) = 0
  move(1, b) = 2           -OR-       move(0, b) = 0
  move(2, a) = ? (undefined)          move(0, a) = 1
                                      move(1, b) = 2
  REJECT !                            move(2, b) = 3
                                      ACCEPT !
Handling Undefined Transitions

We can handle undefined transitions by defining one more state, a “death”
state, and routing all previously undefined transitions to this death
state.

           a, b (loop)
  start → (0) --a--> (1) --b--> (2) --b--> ((3))

  (1) --a--> (4)    (2) --a--> (4)    (3) --a, b--> (4)
  (4) loops on a, b
Other Concepts

Not all paths may result in acceptance.

           a, b (loop)
  start → (0) --a--> (1) --b--> (2) --b--> ((3))

aabb is accepted along the path: 0 → 0 → 1 → 2 → 3

BUT… it is not accepted along the (equally valid) path:
0 → 0 → 0 → 0 → 0
Deterministic Finite Automata

A DFA is an NFA with the following restrictions:
• ε-moves are not allowed.
• For every state s ∈ S, there is exactly one transition from s for every
  input symbol a ∈ Σ.

Since its transition table doesn’t have any alternative options, a DFA is
easily simulated via an algorithm:

  s ← s0
  c ← nextchar;
  while c ≠ eof do
      s ← move(s, c);
      c ← nextchar;
  end;
  if s is in F then return “yes”
  else return “no”
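
The loop translates line-for-line into Python. A sketch using the transition table of the (a|b)*abb DFA shown on the next slide:

  DELTA = {
      (0, 'a'): 1, (0, 'b'): 0,
      (1, 'a'): 1, (1, 'b'): 2,
      (2, 'a'): 1, (2, 'b'): 3,
      (3, 'a'): 1, (3, 'b'): 0,
  }
  FINAL = {3}

  def dfa_accepts(x):
      s = 0                       # s ← s0
      for c in x:                 # while c ≠ eof do
          s = DELTA[(s, c)]       #     s ← move(s, c): exactly one choice
      return s in FINAL           # “yes” iff s ∈ F

  print(dfa_accepts("ababb"))   # True
  print(dfa_accepts("abab"))    # False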
Example – DFA : (a|b)*abb

  start state: (0);  final state: ((3))

  δ(0, a) = 1    δ(0, b) = 0
  δ(1, a) = 1    δ(1, b) = 2
  δ(2, a) = 1    δ(2, b) = 3
  δ(3, a) = 1    δ(3, b) = 0

What language is accepted?

Recall the original NFA:

           a, b (loop)
  start → (0) --a--> (1) --b--> (2) --b--> ((3))
Relation between RE, NFA and DFA

1. There is an algorithm for converting any RE into an NFA.
2. There is an algorithm for converting any NFA to a DFA.
3. There is an algorithm for converting any DFA to an RE.

These facts tell us that REs, NFAs and DFAs have equivalent expressive
power.

All three describe the class of regular languages.
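
Fact 2 is the subset construction: each DFA state stands for a set of NFA states. A compact sketch for NFAs without ε-moves (my own code, not from the slides):

  from collections import deque

  def subset_construction(delta, start, finals, alphabet):
      """delta maps (state, symbol) -> set of states; returns a DFA."""
      start_set = frozenset(start)
      dstates = {start_set: 0}           # numbering of discovered subsets
      ddelta = {}
      work = deque([start_set])
      while work:
          T = work.popleft()
          for a in alphabet:
              U = frozenset().union(*(delta.get((s, a), set()) for s in T))
              if U not in dstates:       # a new DFA state
                  dstates[U] = len(dstates)
                  work.append(U)
              ddelta[(dstates[T], a)] = dstates[U]
      dfinals = {i for S, i in dstates.items() if S & finals}
      return ddelta, dstates, dfinals

  # The (a|b)*abb NFA from the earlier example:
  nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}, (2, 'b'): {3}}
  ddelta, dstates, dfinals = subset_construction(nfa, {0}, {3}, "ab")
  print(len(dstates), "DFA states; final:", dfinals)   # 4 DFA states; final: {3}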
NFA vs DFA

• An NFA may be simulated directly by an algorithm once it is constructed
  from the RE.
  • The algorithm’s run time is proportional to |N| × |x|, where |N| is
    the number of states and |x| is the length of the input.
• Alternatively, we can construct a DFA from the NFA and use it to
  recognize input.
  • The space requirement of a DFA can be large. The RE
    (a|b)*a(a|b)(a|b)…(a|b) [with n−1 copies of (a|b) at the end] has no
    DFA with fewer than 2^n states. Fortunately, such REs rarely occur in
    practice.

        space required   time to simulate
  NFA   O(|r|)           O(|r| × |x|)
  DFA   O(2^|r|)         O(|x|)

  where |r| is the length of the regular expression.


Thank You! Any Questions?