2 Compiler - Slide

This document provides an overview of compilation and lexical analysis. It defines key terms like compilers, lexical analysis, tokens, and finite state automata. It then gives examples of finite state automata for recognizing specific words and integers. The document discusses how lexical analysis works by stripping unnecessary characters and tracking line numbers from source code. It also introduces regular expressions, deterministic and nondeterministic finite automata as formalisms used for scanners in compilers.


CS 335: Lexical Analysis

Swarnendu Biswas
Semester 2022-2023-II
CSE, IIT Kanpur

Content influenced by many excellent references; see the References slide for acknowledgements.

An Overview of Compilation

• A compiler translates a source program into a target program through a pipeline of phases:
  lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator
• The symbol table and the error handler are shared by all phases

CS 335 Swarnendu Biswas

Overview of Lexical Analysis

• First stage of a three-part frontend that helps understand the source program
• Processes every character in the input program
• If a word is valid, it is assigned to a syntactic category
  • This is similar to identifying the part of speech of an English word (noun, verb, adjective, punctuation)

Compilers are engineered objects.

Description of Lexical Analysis

• Input
  • A high-level language (e.g., C++ or Java) program in the form of a sequence of ASCII characters
• Output
  • A sequence of tokens, along with attributes corresponding to different syntactic categories, that is forwarded to the parser for syntax analysis
• Functionality
  • Strips off blanks, tabs, newlines, and comments from the source program
  • Keeps track of line numbers and associates error messages from various parts of the compiler with line numbers
  • Performs some preprocessor functions in languages like C


Recognizing the Word “new”

c = getNextChar();
if (c == ‘n’)
  c = getNextChar();
  if (c == ‘e’)
    c = getNextChar();
    if (c == ‘w’)
      report success;
    else
      // Other logic
  else
    // Other logic
else
  // Other logic

The corresponding FSA: s0 —n→ s1 —e→ s2 —w→ s3 (accepting)

Formalism for Scanners
Regular expressions, DFAs, and NFAs

Definitions

• An alphabet is a finite set of symbols
  • Typical symbols are letters, digits, and punctuation
  • ASCII and Unicode are examples of alphabets
• A string over an alphabet is a finite sequence of symbols drawn from that alphabet
• A language is any countable set of strings over a fixed alphabet

Finite State Automaton

• A finite state automaton (FSA) is a five-tuple, or quintuple, (S, Σ, δ, s0, SF)
  • S is a finite set of states
  • Σ is the alphabet or character set; it is the union of all edge labels in the FSA and is finite
  • δ(s, c) represents the transition from state s on input c
  • s0 ∈ S is the designated start state
  • SF ⊆ S is the set of final states
• An FSA accepts a string x if and only if
  i. the FSA starts in s0,
  ii. executes transitions for the sequence of characters in x, and
  iii. is in an accepting state ∈ SF after x has been consumed


FSA for Recognizing “new”

• FSA = (S, Σ, δ, s0, SF)
  • S = {s0, s1, s2, s3}
  • Σ = {n, e, w}
  • δ = {s0 —n→ s1, s1 —e→ s2, s2 —w→ s3}
  • Start state: s0
  • SF = {s3}

char = getNextChar()
state = s0                      // se is the error state
while (char ≠ EOF and state ≠ se)
  state = δ(state, char)
  char = getNextChar()
if (state ∈ SF)
  report success
else
  report failure

A string is recognized in time proportional to the length of the input.

FSA for Unsigned Integers

• FSA = (S, Σ, δ, s0, SF)
  • S = {s0, s1, s2, se}
  • Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
  • δ = {s0 —0→ s1, s0 —1-9→ s2, s2 —0-9→ s2, s1 —0-9→ se}
  • Start state: s0
  • SF = {s1, s2}
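The unsigned-integer FSA above can be simulated directly from its transition table. A minimal sketch in Python (the string state names and the dictionary encoding of δ are conventions of this sketch, not part of the slide):

```python
# Simulation of the unsigned-integer FSA from the slide.
# States: s0 (start), s1 (accepts "0"), s2 (accepts other integers), se (error).
DELTA = {
    ("s0", "0"): "s1",
    **{("s0", d): "s2" for d in "123456789"},
    **{("s2", d): "s2" for d in "0123456789"},
    **{("s1", d): "se" for d in "0123456789"},  # a leading zero cannot be extended
}
ACCEPTING = {"s1", "s2"}

def accepts(s: str) -> bool:
    state = "s0"
    for ch in s:
        # Any missing transition goes to the error state.
        state = DELTA.get((state, ch), "se")
        if state == "se":
            return False
    return state in ACCEPTING
```

Note that because se is a trap state, the simulation can stop as soon as it is entered.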

Dealing with Erroneous Situations

• The FSA is in state s, the next input character is c, and δ(s, c) is not defined
• The FSA processes the complete input and is still not in a final state
• The input string is a proper prefix of some word accepted by the FSA

Nondeterministic Finite Automaton

• An NFA is an FSA that allows transitions on the empty string ε and can have states with multiple transitions on the same input character
• Simulating an NFA
  • Always make the correct nondeterministic choice, following transitions that lead to accepting state(s) for the input string, if such transitions exist
  • Or try all nondeterministic choices in parallel to search the space of all possible configurations
• Simulating a DFA is more efficient than simulating an NFA
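The “try all choices in parallel” strategy can be sketched by tracking the set of states the NFA could currently be in. The example NFA below, for (a|b)*ab, is an assumption of this sketch and is not taken from the slides:

```python
EPS = ""  # label used for epsilon transitions

def epsilon_closure(states, delta):
    """All states reachable from `states` via epsilon transitions alone."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_accepts(s, delta, start, finals):
    # Track every state the NFA could be in after each character.
    current = epsilon_closure({start}, delta)
    for ch in s:
        moved = set()
        for st in current:
            moved |= delta.get((st, ch), set())
        current = epsilon_closure(moved, delta)
    return bool(current & finals)

# NFA for (a|b)*ab: state 0 loops on a and b and "guesses" when the final ab starts.
delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
```

Here the per-character cost is proportional to the number of live states, which is why simulating a DFA (one live state) is cheaper.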
Regular Expressions

• The set of words accepted by an FSA F is called its language L(F)
• For any FSA F, we can also describe L(F) using a notation called regular expressions (REs)
• The language described by an RE r is called a regular language, denoted L(r)

• ε is an RE, and L(ε) = {ε}
• Let Σ be an alphabet. For each a ∈ Σ, a is an RE, and L(a) = {a}
• Let r and s be REs denoting the languages R and S, respectively
  • Alternation (or union): (r|s) is an RE, L(r|s) = R ∪ S = {x | x ∈ R or x ∈ S} = L(r) ∪ L(s)
  • Concatenation: (rs) is an RE, L(rs) = R.S = {xy | x ∈ R ∧ y ∈ S}
  • Closure: (r*) is an RE, L(r*) = R* = ⋃_{i=0}^{∞} R^i
    • L* is called the Kleene closure, or simply closure, of L

Examples of Regular Expressions

L = set of all strings of 0’s and 1’s
r = (0 + 1)*

L = {w ∈ {0,1}* | w has two or three occurrences of 1, of which the first and second are not consecutive}
r = 0*10*010*(10* + ε)

L = {w ∈ {0,1}* | w has no pair of consecutive zeros}
r = (1 + 01)*(0 + ε)

L = {w | w ∈ {a,b}* ∧ w ends with a}
r = (a + b)*a

Unsigned real numbers with exponents:
r = (0 | [1-9][0-9]*)(.[0-9]* | ε)(E(+ | − | ε)(0 | [1-9][0-9]*) | ε)
Regular Expressions

• We can reduce the use of parentheses by introducing precedence and associativity rules
  • The binary operators, concatenation and alternation, are left-associative
  • The precedence order is
    parentheses > closure > concatenation > alternation

Algebraic Rules for REs

Rule                                 Description
r|s = s|r                            | is commutative
r|(s|t) = (r|s)|t                    | is associative
r(st) = (rs)t                        Concatenation is associative
r(s|t) = rs|rt;  (s|t)r = sr|tr      Concatenation distributes over |
εr = rε = r                          ε is the identity for concatenation
r* = (r|ε)*                          ε is guaranteed in a closure
(r*)* = r*                           * is idempotent

Regular Definitions

• Let ri be a regular expression and di be a distinct name
• A regular definition is a sequence of definitions of the form
  d1 → r1
  d2 → r2
  …
  dn → rn
• Each ri is a regular expression over the symbols Σ ∪ {d1, d2, …, d(i−1)}
• Each di is a new symbol not in Σ

Example of Regular Definitions

• Unsigned numbers (e.g., 5280, 0.01234, 6.336E4, or 1.89E-4)

digit       = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digits      = digit digit*
optfrac     = . digits | ε
optexp      = (E (+ | − | ε) digits) | ε
unsignednum = digits optfrac optexp


Extensions of Regular Expressions

• “.” is any character other than “\n”
• [xyz] is x | y | z
• [abg-pT-Y] is any one of the characters a, b, g, …, p, T, …, Y
• [^G-Q] is any character other than G, H, …, Q
• r+ is one or more r’s
• r? is zero or one r

Example of Regular Definitions

• Unsigned numbers (e.g., 5280, 0.01234, 6.336E4, or 1.89E-4)

digit       = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digits      = digit digit*
optfrac     = . digits | ε
optexp      = (E (+ | − | ε) digits) | ε
unsignednum = digits optfrac optexp

The extensions make the definitions simpler to write:

digit       = [0-9]
digits      = digit+
unsignednum = digits (. digits)? (E (+|−)? digits)?
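The shorthand definition of unsignednum maps almost directly onto a practical regular-expression library. A sketch in Python (here `\d` plays the role of digit, and `fullmatch` checks that the whole string matches):

```python
import re

# digits (. digits)? (E (+|-)? digits)?  from the slide, transliterated:
UNSIGNED_NUM = re.compile(r"\d+(\.\d+)?(E[+-]?\d+)?")

def is_unsigned_num(s: str) -> bool:
    return UNSIGNED_NUM.fullmatch(s) is not None
```

All four example lexemes from the slide (5280, 0.01234, 6.336E4, 1.89E-4) match this pattern.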

Equivalence of RE and FSA

• There exists an NFA with ε-transitions that accepts L(r), where r is an RE
• If L is accepted by a DFA, then L is generated by an RE

RE —Thompson’s construction→ NFA —subset construction→ DFA —DFA minimization→ minimal DFA → code for a scanner
Kleene’s construction recovers an RE from a DFA
• DFAs are easier to simulate than NFAs; minimization improves the run time and memory overhead of the scanner

NFA to DFA: Subset Construction

NFA = (N, Σ, δN, n0, NA), DFA = (D, Σ, δD, d0, DA)

Subset Construction
  q0 = ε-closure({n0})
  Q = {q0}
  WorkList = {q0}
  while (WorkList ≠ φ) do
    remove q from WorkList
    for each character c ∈ Σ do
      t = ε-closure(δN(q, c))
      T[q, c] = t
      if t ∉ Q then
        add t to Q and to WorkList

ε-closure (computed for all states)
  for each state n ∈ N do
    E(n) = {n}
  WorkList = N
  while (WorkList ≠ φ) do
    remove n from WorkList
    t = {n} ∪ ⋃_{n —ε→ p ∈ δN} E(p)
    if t ≠ E(n)
      E(n) = t
      WorkList = WorkList ∪ {m | m —ε→ n ∈ δN}
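The subset construction above can be sketched compactly in Python. Here the NFA is encoded as a dictionary `delta[(state, symbol)] → set of states`, with `""` as the ε label; these encoding choices are assumptions of the sketch:

```python
EPS = ""

def eps_closure(states, delta):
    """ε-closure computed on demand (the slide precomputes E(n) for all n)."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(delta, n0, sigma):
    q0 = eps_closure({n0}, delta)
    Q, worklist, T = {q0}, [q0], {}
    while worklist:
        q = worklist.pop()
        for c in sigma:
            moved = set()
            for s in q:
                moved |= delta.get((s, c), set())
            t = eps_closure(moved, delta)
            T[(q, c)] = t
            # The empty set is the dead state; this sketch does not enqueue it.
            if t and t not in Q:
                Q.add(t)
                worklist.append(t)
    return q0, Q, T
```

For example, Thompson’s NFA for a|b (states 0–5, ε-edges 0→1, 0→3, 2→5, 4→5, and edges 1—a→2, 3—b→4) yields the DFA start state {0, 1, 3}.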
DFA to Minimal DFA: Hopcroft’s Algorithm

• A DFA from subset construction can have a large number of states
  • This does not increase the time needed to scan a string
  • It does increase the space requirement of the scanner in memory
    • The speed of accesses to main memory may turn out to be the bottleneck
    • A smaller scanner has a better chance of fitting in the processor cache

Splitting a Partition

[Figure: a character a does not split a partition p1 = {di, dj, dk} when all of its states transition on a into the same partition; a splits a partition p3 when its states transition on a into different partitions, yielding partitions such as p6 = {di} and p7 = {dj, dk} after splitting on a.]
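The split test illustrated above, combined with the refinement loop of Hopcroft’s algorithm given on the next slide, can be sketched in Python. The dict encoding of δ and the use of `None` for the implicit error partition are assumptions of this sketch:

```python
def split(S, partition_of, delta, sigma):
    """Split block S on the first symbol whose target partitions disagree."""
    S = list(S)
    for c in sigma:
        target = lambda s: partition_of.get(delta.get((s, c)))
        first = target(S[0])
        s1 = {s for s in S if target(s) == first}
        s2 = set(S) - s1
        if s2:
            return [s1, s2]
    return [set(S)]

def minimize(states, finals, delta, sigma):
    # Initial partition: accepting states vs. everything else.
    T = [b for b in (set(finals), set(states) - set(finals)) if b]
    P = []
    while P != T:          # iterate to a fixed point
        P = T
        partition_of = {s: i for i, b in enumerate(P) for s in b}
        T = []
        for p in P:
            T.extend(split(p, partition_of, delta, sigma))
    return T
```

A DFA for a* with a redundant second state collapses to a single block, while distinguishing the states by acceptance keeps them apart.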

DFA to Minimal DFA: Hopcroft’s Algorithm

Minimization
  T = {DA, D − DA}
  P = φ
  while (P ≠ T) do
    P = T
    T = φ
    for each set p ∈ P do
      T = T ∪ Split(p)

Split(S)
  for each c ∈ Σ do
    if c splits S into s1 and s2
      return {s1, s2}
  return S

Realizing Scanners


Tokens

float abs_zero = -273; /* Kelvin */

• Token
  • A string of characters which logically belong together in a syntactic category
  • Sentences consist of a string of tokens (e.g., float, identifier, assign, minus, intnum, semicolon)
  • Tokens are treated as terminal symbols of the grammar specifying the source language
  • Tokens may have optional attributes
  • Examples of tokens in programming languages: keywords, operators, identifiers (names), constants, literal strings, and punctuation symbols (parentheses, brackets, commas, semicolons, and colons)

Patterns and Lexemes

• Pattern
  • The rule describing the set of strings for which the same token is produced
  • The pattern is said to match each string in the set
  • For the line above: float, letter(letter|digit|_)*, =, -, digit+, ;
• Lexeme
  • The sequence of characters matched by a pattern to form the corresponding token
  • For the line above: “float”, “abs_zero”, “=”, “-”, “273”, “;”

Attributes of Tokens

• An attribute of a token is a value that the scanner extracts from the corresponding lexeme and supplies to the syntax analyzer
• Example attributes for tokens
  • identifier: the lexeme of the token, or a pointer into the symbol table where the lexeme is stored by the LA; also the type of the identifier and the location where it was first found
  • intnum: the value of the integer (similarly for floatnum, etc.)
• The exact set of attributes is dependent on the compiler designer

Role of a Lexical Analyzer

• Identify tokens and corresponding lexemes
• Construct constants: for example, convert a number to token intnum and pass the value as its attribute
  • 31 becomes <intnum, 31>
• Recognize keywords and identifiers
  • counter = counter + increment becomes id = id + id
  • Check that id here is not a keyword
• Discard whatever does not contribute to parsing
  • White spaces (blanks, tabs, newlines) and comments


Specifying and Recognizing Patterns and Tokens

• Patterns are denoted with REs and recognized with FSAs
• Regular definitions, a mechanism based on regular expressions, are popular for the specification of tokens
• Transition diagrams, a variant of FSAs, are used to implement regular definitions and to recognize tokens
  • Usually used to model the LA before translating it to an executable program

Transition Diagrams

• Transition diagrams (TDs) are generalized DFAs with the following differences
  • Edges may be labelled by a symbol, a set of symbols, or a regular definition
  • A few accepting states may be indicated as retracting states
    • A retracting state indicates that the lexeme does not include the symbol that caused the transition to the accepting state
  • Each accepting state has an action attached to it
    • The action is executed when the state is reached (e.g., return a token and its attribute value)

Examples of Transition Diagrams

Identifiers and reserved words:
  letter     = [a-zA-Z]
  digit      = [0-9]
  identifier = letter(letter|digit)*

  start → (0) —letter→ (1) ⟲ letter/digit —other→ (2)*   return(get_token_code(), name)

• * indicates a retracting state
• get_token_code() searches a table to check if the name is a reserved word and returns its integer code if so
  • Otherwise, it returns the integer code of the IDENTIFIER token, with name containing the string of characters forming the token
  • The name is not relevant for reserved words

A Sample Specification

Grammar:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | ε
  expr → term relop term
       | term
  term → id
       | number

Patterns:
  digit  → [0-9]
  digits → digit+
  number → digits (. digits)? (E (+|−)? digits)?
  letter → [A-Za-z]
  id     → letter (letter | digit)*
  if     → if
  then   → then
  else   → else
  relop  → < | > | <= | >= | = | <>
  ws     → (blank | tab | newline)+


Tokens, Lexemes, and Attributes

Lexemes        Token Name    Attribute Value
Any ws         --            --
if             if            --
then           then          --
else           else          --
Any id         id            Pointer to symbol table entry
Any number     number        Pointer to symbol table entry
<              relop         LT
<=             relop         LE
=              relop         ASSGN
<>             relop         NE
>              relop         GT
>=             relop         GE

Transition Diagram for relop

  start → (0)
  (0) —<→ (1);  (1) —=→ (2) return(relop, LE);  (1) —>→ (3) return(relop, NE);  (1) —other→ (4)* return(relop, LT)
  (0) —=→ (5) return(relop, ASSGN)
  (0) —>→ (6);  (6) —=→ (7) return(relop, GE);  (6) —other→ (8)* return(relop, GT)

• States marked * are retracting states: the “other” character is pushed back to the input
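The relop transition diagram translates into straight-line code, one branch per edge. A sketch in Python, where the returned flag says whether the last character must be retracted (the `(token, attribute, retract)` shape is a convention of this sketch):

```python
def relop(get_next_char):
    """Walk the relop transition diagram; get_next_char() returns "" at EOF."""
    c = get_next_char()
    if c == "<":
        c = get_next_char()
        if c == "=":
            return ("relop", "LE", False)
        if c == ">":
            return ("relop", "NE", False)
        return ("relop", "LT", True)    # retracting state (4)*: c is pushed back
    if c == "=":
        return ("relop", "ASSGN", False)
    if c == ">":
        c = get_next_char()
        if c == "=":
            return ("relop", "GE", False)
        return ("relop", "GT", True)    # retracting state (8)*
    raise ValueError("not a relational operator")
```

A caller would push the lookahead character back into its input buffer whenever the retract flag is set.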

Transition Diagrams for IDs and Keywords

  start → (9) —letter→ (10) ⟲ letter/digit —other→ (11)*   return(get_token_code(), name)

Transition Diagram for Unsigned Numbers

  start → (12) —digit→ (13) ⟲ digit —.→ (14) —digit→ (15) ⟲ digit —E→ (16) —+|−→ (17) —digit→ (18) ⟲ digit —other→ (19)*
  with (13) —E→ (16) and (16) —digit→ (18); additional retracting accepting states (e.g., (20)*, (21)*) accept numbers without a fraction or an exponent

Transition Diagram for Whitespace

  start → (22) —delim→ (23) ⟲ delim —other→ (24)*          (delim = blank | tab | newline)


Combining Transition Diagrams to form a Lexical Analyzer

• Different transition diagrams (TDs) must be combined appropriately to yield a scanner. How do we do this?
• One option: try the different transition diagrams one after another
  • For example, TDs for reserved words, constants, identifiers, and operators could be tried in that order
  • However, this does not implement the “longest match” characteristic
    • thenext should be an identifier, and not the reserved word then followed by the identifier ext
• To find the longest match, all TDs must be tried and the longest match must be used
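The longest-match rule can be sketched by trying every pattern at the current position and keeping the longest lexeme, with declaration order breaking ties. The pattern set below is an illustrative assumption:

```python
import re

# Order matters only for breaking length ties (keyword beats id for "then").
PATTERNS = [
    ("keyword", re.compile(r"then|if|else")),
    ("id",      re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("number",  re.compile(r"[0-9]+")),
    ("ws",      re.compile(r"[ \t\n]+")),
]

def tokenize(src):
    pos, tokens = 0, []
    while pos < len(src):
        best = None
        for name, pat in PATTERNS:
            m = pat.match(src, pos)
            # Keep the strictly longest match; earlier patterns win ties.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"illegal character at position {pos}")
        name, lexeme = best
        if name != "ws":            # whitespace is discarded
            tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens
```

With this rule, thenext scans as a single identifier, while then ext yields the keyword followed by an identifier.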

Challenges in Lexical Analysis

• Certain languages like PL/I do not have any reserved words
  • while, do, if, and else are reserved in C but not in PL/I
  • This makes it difficult for the scanner to distinguish between keywords and user-defined identifiers
  • The following are valid PL/I statements:
      if then then then = else else else = then
      if if then then = then + 1
• PL/I declarations
  • DECLARE(arg1,arg2,arg3,…,argn)
  • The scanner cannot tell whether DECLARE is a keyword with variable definitions or a procedure with arguments until after the “)”
  • This requires arbitrary lookahead and very large buffers
    • Worse, the buffers may have to be reloaded in case of wrong inferences


Challenges in Lexical Analysis

• fi (a == g(x)) …
  • Is fi a typo or a function call?
  • Remember, fi is a valid lexeme for IDENTIFIER
• Think of C++
  • Template syntax: Foo<Bar>
  • Stream syntax: cin >> var;
  • Nested templates: Foo<Bar<Bazz>>
• Can these problems be resolved by lexical analyzers alone? No; in some cases the parser needs to help.

• Consider a fixed-format language like Fortran
  • 80 columns per line
    • Columns 1-5: the statement number/label
    • Column 6: continuation mark
    • Columns 7-72: the program statements
    • Columns 73-80: ignored (used for other purposes)
  • The letter C in column 1 means the current line is a comment

Challenges in Lexical Analysis

• In fixed-format Fortran, some keywords are context-dependent
  • In the statement DO 10 I = 10.86, DO10I is an identifier, and DO is not a keyword
  • But in the statement DO 10 I = 10, 86, DO is a keyword
  • Blanks are not significant in Fortran and can appear in the midst of identifiers
    • The variable “counter” is the same as “count er”
    • In Fortran, blanks are important only in literal strings
  • Reading from left to right, one cannot distinguish between the two statements until the “,” or “.” is reached
  • This requires lookahead for resolution

Programming Languages vs Natural Languages

• The meaning of words in natural languages is often context-sensitive
  • An English word can be a noun or a verb (e.g., “stress”)
  • “are” is a verb, “art” is a noun, and “arz” is undefined
• Grammars of programming languages are rigorously specified to provide meaning
  • Words in a programming language are always lexically specified
  • For example, any string in (1…9)(0…9)* is a positive integer


Why Separate Tokens and Lexemes?

• The rules that govern the lexical structure of a programming language are called its microsyntax
• Separating syntax and microsyntax allows for a simpler parser
  • The parser only needs to deal with syntactic categories like IDENTIFIER
  • A parser is more complicated than a lexical analyzer, and shrinking the grammar makes the parser more efficient

Lexical Analysis as a Separate Phase

1. Simplifies the compiler design: I/O issues are limited to only the lexical analyzer, leading to better portability
2. Allows designing a more compact and faster parser
   • Comments and whitespace need not be handled by the parser
   • No rules for numbers, names, and comments are needed in the parser
3. Scanners based on finite automata are more efficient to implement than the stack-based pushdown automata used for parsing

Interfacing with Parser

• A unique integer representing the token is passed by the LA to the parser

  source program → Lexical Analyzer —token→ Syntax Analyzer → to semantic analysis
                                    ←get next token—
  (both the Lexical Analyzer and the Syntax Analyzer consult the symbol table)

Error Handling in Lexical Analysis

• The LA cannot catch any errors other than simple ones such as illegal symbols
• In such cases, the LA skips characters in the input until a well-formed token is found
  • This is called “panic mode” recovery
• We can think of other possible recovery strategies
  • Delete one character from the remaining input, or insert a missing character
  • Replace a character, or transpose two adjacent characters
  • The idea is to see if a single (or a few) transformation(s) can repair the error


Other Uses of Lexical Analysis Concepts

• UNIX command-line tools like grep, awk, and sed
• Search tools in editors
• Word-processing tools

Implementing Scanners

Implementing Scanners

1. Specify REs for each syntactic category in the PL
2. Construct an NFA for each RE
3. Join the NFAs with ε-transitions
4. Create the equivalent DFA
5. Minimize the DFA
6. Generate code to implement the DFA

Implementation Considerations

• Speed is paramount for scanning
  • The scanner processes every character from a possibly large input source program
• Repeatedly read input characters and simulate the corresponding DFA
• Types of scanner implementations: table-driven, direct-coded, and hand-coded
  • The asymptotic complexity is the same; they differ in run-time costs


High-Level Idea in Implementing Scanners

• Read input characters one by one
• Look up the transition based on the current state and the input character
• Switch to the new state
• Check for termination conditions, i.e., accept and error
• Repeat

Table-Driven Scanner

• A scanner generator takes the lexical patterns and produces lexical tables, which are interpreted by a generic FSA interpreter
• Running example: register specification, e.g., r1 and r27
  • RE: r [0-9]+
  • DFA: s0 —r→ s1 —[0-9]→ s2 ⟲ [0-9]

Table-Driven Scanner

state = s0; lexeme = “”;
clear stack; push(bad);

// Model the DFA
while (state ≠ se)
  char = getNextChar()
  lexeme = lexeme + char
  if state ∈ SA
    clear stack
  push(state)
  cat = lookup(char)           // classify the character
  state = δ(state, cat)        // involves two table lookups

// Rollback to the most recent accepting state
while (state ∉ SA and state ≠ bad)
  state = pop()
  truncate lexeme
  rollback()

if state ∈ SA
  return token
else
  return invalid

Character classification table:
  r → Register    0, 1, 2, …, 9 → Digit    EOF → EOF    everything else → Other

Transition table δ:
  δ    Register   Digit   Other
  s0   s1         se      se
  s1   se         s2      se
  s2   se         s2      se
  se   se         se      se
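The table-driven loop and its rollback stack can be sketched in Python for the register DFA above (returning the lexeme and the position after it is a convention of this sketch):

```python
BAD, ERR = "bad", "se"
ACCEPTING = {"s2"}
CLASSIFY = {"r": "Register", **{d: "Digit" for d in "0123456789"}}
DELTA = {("s0", "Register"): "s1", ("s1", "Digit"): "s2", ("s2", "Digit"): "s2"}

def next_token(src, pos):
    start = pos
    state, lexeme, stack = "s0", "", [BAD]
    while state != ERR and pos < len(src):
        ch = src[pos]; pos += 1
        lexeme += ch
        if state in ACCEPTING:
            stack.clear()               # no need to roll back past this point
        stack.append(state)
        cat = CLASSIFY.get(ch, "Other")          # first table lookup
        state = DELTA.get((state, cat), ERR)     # second table lookup
    # Roll back to the most recent accepting state.
    while state not in ACCEPTING and state != BAD:
        state = stack.pop()
        lexeme = lexeme[:-1]
        pos -= 1
    if state in ACCEPTING:
        return lexeme, pos
    return None, start                  # no prefix matched
```

On input r17+ the scanner overshoots into the error state on +, then pops back to the accepting state and returns the lexeme r17.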


Problem of Rollbacks

• A scanner’s aim is to recognize the longest match, but this can cause many rollbacks
  • Consider the RE ab | (ab)*c and the input abababab: the scanner repeatedly scans to the end of the input looking for a c, rolls back, and re-scans
• A scanner can avoid such pathological quadratic expense by remembering failed attempts
  • Such scanners are called maximal munch scanners

Addressing Excessive Rollbacks

state = s0; lexeme = “”;
clear stack; push(⟨bad, bad⟩);
inputPos = 0
for each state s ∈ DFA
  for i = 1:|input stream|
    Failed[s, i] = false

while (state ≠ se)
  char = getNextChar()
  lexeme = lexeme + char
  inputPos = inputPos + 1
  if Failed[state, inputPos]
    break
  if state ∈ SA
    clear stack
  push(⟨state, inputPos⟩)
  cat = lookup(char)
  state = δ(state, cat)

// Rollback
while (state ∉ SA and state ≠ bad)
  Failed[state, inputPos] = true
  ⟨state, inputPos⟩ = pop()
  truncate lexeme
  rollback()
if state ∈ SA
  return token
else
  return invalid

Overhead with Table Lookups

• A two-dimensional table with c columns and entries of width w stored at base is indexed as
    Address = base + (i*c + j) * w      for row i and column j
  while a one-dimensional table is indexed as
    Address = base + offset * w
• The table-driven scanner performs two address computations and two load operations for each character that it processes

Direct-Coded Scanner

lexeme = “”; clear stack;
push(bad); goto s0;

s0: char = getNextChar()
    lexeme = char
    if state ∈ SA
      clear stack
    push(s0)
    if (char == ‘r’)
      goto s1
    else
      goto se

s1: char = getNextChar()
    lexeme = lexeme + char
    if state ∈ SA
      clear stack
    push(s1)
    if (‘0’ ≤ char ≤ ‘9’)
      goto s2
    else
      goto se


Direct-Coded Scanner

s2: char = getNextChar()
    lexeme = lexeme + char
    if state ∈ SA
      clear stack
    push(s2)
    if (‘0’ ≤ char ≤ ‘9’)
      goto s2
    else
      goto se

se: while (state ∉ SA and state ≠ bad)
      state = pop()
      truncate lexeme
      rollback()
    if state ∈ SA
      return token
    else
      return invalid

Hand-Coded Scanner

• Many real-world compilers use hand-coded scanners for further efficiency
  • For example, gcc 4.0 uses hand-coded scanners in several of its front ends
• Two common optimizations:
  i. Fetching a character one by one from I/O is expensive; fetch a number of characters in one go and store them in a buffer
  ii. Use double buffering to simplify lookahead and rollback

Reading Characters from Input

• A scanner reads the input character by character
  • Reading the input would be very inefficient if it required a system call for every character read
• Input buffer
  • The OS reads a block of data, supplies the scanner the required amount, and stores the remaining portion in a buffer called the buffer cache
  • In subsequent calls, no actual I/O takes place as long as the data is available in the buffer cache
  • The scanner uses its own buffer as well, since requesting the OS for a single character is also costly due to context-switching overhead

Optimizing Reads from the Buffer

• A buffer may end with an initial portion of a lexeme
  • For the input E = M*C**2, a buffer may end after E = M *
• This creates a problem when refilling the buffer, so a two-buffer scheme is used where the two buffers are filled alternately

  E = M * C * * 2 eof
  (lexBegin marks the start of the current lexeme; forward scans ahead)


Optimizing Reads from the Buffer

• Each read from the buffer requires two tests: (1) check for the end of the buffer, and (2) test the type of the input character
  • If at the end of a buffer, reload the other buffer

Advance Forward Pointer

if (forward is at end of first buffer) {
  reload second buffer
  forward = beginning of second buffer
} else if (forward is at end of second buffer) {
  reload first buffer
  forward = beginning of first buffer
} else {
  forward++
}

Optimizing Reads from the Buffer

• A sentinel character (say eof) is placed at the end of each buffer to avoid the two comparisons per character

  Buffer 1: E = M eof    Buffer 2: * C * * 2 eof eof
  (lexBegin and forward as before; the final eof marks the true end of input)

switch (*forward++) {
case eof:
  if (forward is at end of first buffer) {
    reload second buffer
    forward = beginning of second buffer
  } else if (forward is at end of second buffer) {
    reload first buffer
    forward = beginning of first buffer
  } else { // end of input
    break
  }
// cases for other characters
}
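The two-buffer scheme with sentinels can be sketched as follows. The buffer size, the NUL sentinel, and the assumption that the input contains no NUL characters are all choices of this sketch:

```python
import io

SENTINEL = "\0"   # assumes the input itself never contains NUL
BUFSIZE = 4       # illustrative; real scanners use, e.g., 4096

class TwoBufferReader:
    def __init__(self, stream):
        self.stream = stream
        self.bufs = ["", ""]
        self.cur, self.pos = 0, 0
        self._reload(0)

    def _reload(self, which):
        # Fill one buffer from the stream and append the sentinel.
        self.bufs[which] = self.stream.read(BUFSIZE) + SENTINEL

    def next_char(self):
        ch = self.bufs[self.cur][self.pos]
        self.pos += 1
        if ch != SENTINEL:
            return ch
        # Sentinel: end of this buffer, or true end of input?
        if self.pos - 1 < BUFSIZE:    # buffer was not full: end of input
            self.pos -= 1             # stay on the sentinel for repeated calls
            return ""
        other = self.cur ^ 1
        self._reload(other)           # reload the other buffer
        self.cur, self.pos = other, 0
        return self.next_char()
```

The fast path tests only the character itself; the end-of-buffer check runs only when the sentinel is actually seen.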


Symbol Table

• A data structure that stores information for subsequent phases
• Symbol table interface
  • insert(s, t): save lexeme s and token t, and return a pointer to the entry
  • lookup(s): return the index of the entry for lexeme s, or 0 if s is not found

Implementation of Symbol Table

• Option 1: a fixed amount of space (say 32 bytes) for the lexeme in each entry, alongside the other attributes
  • A fixed amount of space to store lexemes might waste space
• Option 2: each entry stores a pointer (say 4 bytes) to the lexeme, alongside the other attributes
  • The lexemes themselves are kept in a separate character array: lexeme1 eos lexeme2 eos …
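The insert/lookup interface can be sketched with a Python dict. The sketch also preloads keyword lexemes, anticipating the scheme described on the Handling Keywords slide; the token names are illustrative:

```python
ID = "ID"

class SymbolTable:
    def __init__(self):
        self.entries = {}              # lexeme -> token

    def insert(self, lexeme, token):
        if lexeme in self.entries:     # subsequent inserts fail
            return False
        self.entries[lexeme] = token
        return True

    def lookup(self, lexeme):
        return self.entries.get(lexeme)

def token_for(table, lexeme):
    """Called when the ID transition diagram accepts `lexeme`."""
    token = table.lookup(lexeme)
    if token is None:
        table.insert(lexeme, ID)
        token = ID
    return token

table = SymbolTable()
table.insert("div", "DIV")    # seed keywords before scanning begins
table.insert("mod", "MOD")
```

With the table seeded, a lookup on div returns the keyword token, while any other accepted name is entered as an identifier.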

Handling Keywords

• Two choices: use separate REs for keywords, or recognize them as ID tokens and compare lexemes
• Consider tokens DIV and MOD with lexemes div and mod
  • Initialize the symbol table with insert(“div”, DIV) and insert(“mod”, MOD) before scanning begins
  • Any subsequent insert fails, and any subsequent lookup returns the keyword value
  • These lexemes can no longer be used as identifiers

References

• A. Aho et al. Compilers: Principles, Techniques, and Tools, 2nd edition, Chapter 3.
• K. Cooper and L. Torczon. Engineering a Compiler, 2nd edition, Chapter 2.
