
Chapter 2_1: Lexical Analysis/scanning

Syntax Analysis: Scanner


Dataflow chart:

  Source Program
        |   (stream of characters)
        v
     Scanner  --->  Error Reports
        |   (stream of "tokens")
        v
     Parser   --->  Error Reports
        |
        v
  Abstract Syntax Tree

Steps for Developing a Scanner
1) Express the "lexical" grammar.
2) Implement the scanner based on this grammar.
3) Refine the scanner to keep track of the spelling and kind of the
   currently scanned token.

Systematic Development of a Scanner
(1) Express the (lexical) grammar.
(2) Create a scanning method scanN for each non-terminal N
    (a minimal sketch follows this list).
(3) Create a scanner class with
    - a private variable currentChar
    - private methods take and takeIt
    - the private scanning methods implemented in step (2)
    - a private scanN method for each non-terminal N,
      enhanced to record each token's kind and spelling
    - a public scan method that scans Separator* Token,
      discarding any separators but returning the token that
      follows them
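
As an illustration of step (2), here is a minimal sketch of a scanning method for one non-terminal, assuming the lexical rule Integer-Literal ::= Digit Digit* and a character-classification helper isDigit (the rule and the helper name are used for illustration only):

  private void scanIntegerLiteral() {   // Integer-Literal ::= Digit Digit*
    takeIt();                           // take the first digit
    while (isDigit(currentChar))        // then zero or more further digits
      takeIt();
  }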
Developing a Scanner
Implementation of the scanner:

public class Scanner {

  private char currentChar;
  private StringBuffer currentSpelling;
  private byte currentKind;

  private void take(char expectedChar) { ... }
  private void takeIt() { ... }

  // other private auxiliary methods and scanning
  // methods here.

  public Token scan() { ... }
}
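
For illustration, the parser (or a test driver) obtains tokens one at a time by calling scan(). A minimal sketch, assuming a no-argument constructor and a Token.EOT kind marking the end of the source text (both are assumptions, not shown on these slides):

  Scanner scanner = new Scanner();                         // constructor arguments not shown here
  Token t;
  do {
    t = scanner.scan();                                    // one token per call
    System.out.println(t.kind + " \"" + t.spelling + "\"");
  } while (t.kind != Token.EOT);                           // assumed end-of-text token kind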
Developing a Scanner
The scanner will return instances of Token:

public class Token {
  byte kind; String spelling;
  final static byte
    IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2,
    BEGIN = 3, CONST = 4, ...;
  ...

  public Token(byte kind, String spelling) {
    this.kind = kind; this.spelling = spelling;
    // if spelling matches a keyword, change my kind automatically
  }
  ...
}
Developing a Scanner

public class Scanner {

  private char currentChar = /* get first source char */;
  private StringBuffer currentSpelling;
  private byte currentKind;

  private void take(char expectedChar) {
    if (currentChar == expectedChar) {
      currentSpelling.append(currentChar);
      currentChar = /* get next source char */;
    }
    else
      /* report lexical error */;
  }

  private void takeIt() {
    currentSpelling.append(currentChar);
    currentChar = /* get next source char */;
  }
  ...
Developing a Scanner

  ...
  public Token scan() {
    // Get rid of potential separators before
    // scanning a token
    while ( (currentChar == '!')
         || (currentChar == ' ')
         || (currentChar == '\n') )
      scanSeparator();
    currentSpelling = new StringBuffer();
    currentKind = scanToken();
    return new Token(currentKind,
                     currentSpelling.toString());
  }

  // scanSeparator and scanToken are developed much in the
  // same way as parsing methods:
  private void scanSeparator() { ... }
  private byte scanToken() { ... }
  ...
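
The body of scanSeparator is not shown on the slide. A minimal sketch, assuming (as in Triangle) that '!' begins a comment running to the end of the line, that blank and newline are the only other separators, that every comment is terminated by a newline, and using a hypothetical getNextSourceChar() helper standing in for "get next source char":

  private void scanSeparator() {
    switch (currentChar) {
      case '!':                               // a comment: skip to end of line
        do {
          currentChar = getNextSourceChar();
        } while (currentChar != '\n');
        currentChar = getNextSourceChar();    // skip the newline itself
        break;
      case ' ': case '\n':                    // a single blank or newline
        currentChar = getNextSourceChar();
        break;
    }
  }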
Developing a Scanner

  private byte scanToken() {
    switch (currentChar) {
      case 'a': case 'b': ... case 'z':
      case 'A': case 'B': ... case 'Z':
        scan Letter (Letter | Digit)*
        return Token.IDENTIFIER;
      case '0': ... case '9':
        scan Digit Digit*
        return Token.INTLITERAL;
      case '+': case '-': ... case '=':
        takeIt();
        return Token.OPERATOR;
      ...etc...
    }
  }
Developing a Scanner
Let's look at the identifier case in more detail. The pseudocode

      case 'a': case 'b': ... case 'z':
      case 'A': case 'B': ... case 'Z':
        scan Letter (Letter | Digit)*
        return Token.IDENTIFIER;
      case '0': ... case '9':
        ...

is refined first into

      case 'a': case 'b': ... case 'z':
      case 'A': case 'B': ... case 'Z':
        takeIt();
        scan (Letter | Digit)*
        return Token.IDENTIFIER;
      case '0': ... case '9':
        ...

and then into

      case 'a': case 'b': ... case 'z':
      case 'A': case 'B': ... case 'Z':
        takeIt();
        while (isLetter(currentChar)
            || isDigit(currentChar))
          takeIt();
        return Token.IDENTIFIER;
      case '0': ... case '9':
        ...
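
The helpers isLetter and isDigit used above are not defined on the slides. A minimal sketch, assuming plain ASCII letters and digits:

  private boolean isLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  }

  private boolean isDigit(char c) {
    return c >= '0' && c <= '9';
  }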
Developing a Scanner
The scanner will return instances of Token. The implementation below
is the one in the Triangle source code.

public class Token {
  ...
  public Token(byte kind, String spelling) {
    if (kind == Token.IDENTIFIER) {
      int currentKind = firstReservedWord;
      boolean searching = true;
      while (searching) {
        int comparison = tokenTable[currentKind].compareTo(spelling);
        if (comparison == 0) {
          this.kind = currentKind;
          searching = false;
        } else if (comparison > 0 || currentKind == lastReservedWord) {
          this.kind = Token.IDENTIFIER;
          searching = false;
        } else { currentKind++; }
      }
    } else
      this.kind = kind;
    ...
  }
}
Developing a Scanner
The scanner will return instances of Token:

public class Token {
  ...
  private static String[] tokenTable = new String[] {
    "<int>", "<char>", "<identifier>", "<operator>",
    "array", "begin", "const", "do", "else", "end",
    "func", "if", "in", "let", "of", "proc", "record",
    "then", "type", "var", "while",
    ".", ":", ";", ",", ":=", "~", "(", ")", "[", "]", "{", "}", "",
    "<error>" };

  private final static int firstReservedWord = Token.ARRAY,
                           lastReservedWord  = Token.WHILE;
  ...
}
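
To illustrate the keyword lookup performed by the constructor shown earlier (the resulting kinds are what the tokenTable search yields):

  Token t1 = new Token(Token.IDENTIFIER, "while");  // t1.kind becomes Token.WHILE
  Token t2 = new Token(Token.IDENTIFIER, "count");  // "count" is not a reserved word,
                                                    // so t2.kind stays Token.IDENTIFIER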
Generating Scanners
Generation of scanners is based on
• Regular Expressions: to describe the tokens to be recognized
• Finite State Machines: an execution model to which REs are
“compiled”
Recap: Regular Expressions

  ε      The empty string
  t      Generates only the string t
  X Y    Generates any string xy such that x is generated by X
         and y is generated by Y
  X | Y  Generates any string which is generated either by X or by Y
  X*     Generates the concatenation of zero or more strings generated by X
  (X)    For grouping
Generating Scanners
• Regular expressions can be recognized by a finite state machine
  (often-used synonym: finite automaton (FA)).

Definition: A finite state machine is a 5-tuple (States, Σ, start, δ, End) where

  States  A finite set of "states"
  Σ       An "alphabet": a finite set of symbols from which the
          strings we want to recognize are formed (for example:
          the ASCII character set)
  start   A "start state", start ∈ States
  δ       A transition relation, δ ⊆ States × Σ × States. These are
          the "arrows" between states, labeled by a letter from the
          alphabet.
  End     A set of final states, End ⊆ States
Generating Scanners
• Finite state machine: the easiest way to describe a finite state
  machine is by means of a picture.

Example: an FA that recognizes M r | M s

[Diagram: from the initial state, one branch reads M then r and another
branch reads M then s, each ending in a final state. The picture uses
distinct symbols for the initial state, final states and non-final states.]
Deterministic and non-deterministic FA
• An FA is called deterministic (DFA) if for every state and every
  possible input symbol there is only one possible transition to
  choose from. Otherwise it is called non-deterministic (NDFA or NFA).

Q: Is this FSM deterministic or non-deterministic?

[Diagram: the FA for M r | M s from the previous slide.]

It is non-deterministic: from the initial state there are two different
transitions labeled M.
Deterministic and non-deterministic FA
• Theorem: every NDFA can be converted into an equivalent DFA.

[Diagram: the NDFA for M r | M s, with the question of what an
equivalent DFA looks like.]
Deterministic and non-deterministic FA
• Theorem: every NDFA can be converted into an equivalent DFA.

Algorithm:
The basic idea: the DFA is defined as a machine that does a "parallel
simulation" of the NDFA.
• The states of the DFA are subsets of the states of the NDFA
  (i.e. every state of the DFA is a set of states of the NDFA).
  => Such a state can be interpreted as meaning "the simulated
  NDFA is now in any of these states". (A sketch of this construction
  in code is given below.)
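
A minimal sketch of this parallel simulation (the subset construction) for an NDFA without ε-moves, assuming the NDFA's transition relation is given as a map from state and symbol to a set of successor states; all names here are illustrative, not taken from the Triangle sources:

  import java.util.*;

  class SubsetConstruction {
    // Result: for each reachable subset of NDFA states, one DFA transition row.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> toDFA(
        Map<Integer, Map<Character, Set<Integer>>> delta,
        int start, Set<Character> alphabet) {
      Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new HashMap<>();
      Deque<Set<Integer>> work = new ArrayDeque<>();
      work.push(Set.of(start));
      while (!work.isEmpty()) {
        Set<Integer> current = work.pop();
        if (dfa.containsKey(current)) continue;          // already processed
        Map<Character, Set<Integer>> row = new HashMap<>();
        for (char sym : alphabet) {
          Set<Integer> next = new HashSet<>();
          for (int s : current)                           // "parallel simulation": follow
            next.addAll(delta.getOrDefault(s, Map.of())   // sym from every NDFA state
                             .getOrDefault(sym, Set.of())); // in the current subset
          if (!next.isEmpty()) { row.put(sym, next); work.push(next); }
        }
        dfa.put(current, row);
      }
      return dfa;   // a DFA state is final iff it contains a final NDFA state
    }
  }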
Deterministic and non-deterministic FA
Conversion algorithm example:

[Diagram: an NDFA with states 1, 2, 3 and 4 over the symbols M, r and s,
and the equivalent DFA with states {1}, {2,4}, {3,4} and {4}.]

  {2,4} --r--> {3,4}  because:
      2 --r--> 3
      4 --r--> 3
      4 --r--> 4

  {3,4} is a final state because 3 is a final state.
FA with ε-moves
(N)DFA-ε automata are like (N)DFA, except that in an (N)DFA-ε we are
also allowed to have transitions which are "ε-moves": transitions that
consume no input.

Example: M r (M r)*

[Diagram: a machine that reads M then r into a final state, with an
ε-move from the final state back to the initial state.]

Theorem: every (N)DFA-ε can be converted into an equivalent NDFA
(without ε-moves).

[Diagram: the equivalent NDFA, in which the ε-move is replaced by a
direct M-transition out of the final state.]
FA with ε-moves
Theorem: every (N)DFA-ε can be converted into an equivalent NDFA
(without ε-moves).

Algorithm:
1) Converting states into final states:
   if a final state can be reached from a state S using an ε-transition,
   convert S into a final state.
   Repeat this rule until no more states can be converted.

   For example:

   [Diagram: state 1 --ε--> state 2 --ε--> a final state. First state 2
   is converted into a final state, then state 1.]
FA with ε-moves
Algorithm:
1) Converting states into final states (as above).
2) Adding transitions (repeat until no more can be added):
   a) for every transition on t followed by an ε-transition
      (A --t--> B --ε--> C), add the transition A --t--> C;
   b) for every transition on t preceded by an ε-transition
      (A --ε--> B --t--> C), add the transition A --t--> C.
3) Delete all ε-transitions.
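
Both steps rely on following chains of ε-transitions. A minimal sketch of the underlying ε-closure computation, assuming the ε-moves are given as a map from each state to the set of states reachable by a single ε-transition (names illustrative): step 1 makes a state final if its ε-closure contains a final state.

  import java.util.*;

  class EpsilonClosure {
    // All states reachable from `state` using only ε-transitions (including state itself).
    static Set<Integer> of(int state, Map<Integer, Set<Integer>> epsMoves) {
      Set<Integer> closure = new HashSet<>();
      Deque<Integer> work = new ArrayDeque<>();
      work.push(state);
      while (!work.isEmpty()) {
        int s = work.pop();
        if (closure.add(s))                                  // a newly reached state:
          work.addAll(epsMoves.getOrDefault(s, Set.of()));   // follow its ε-moves too
      }
      return closure;
    }
  }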
Converting an RE into an NDFA-ε

RE: ε      FA: [a single state that is both initial and final]

RE: t      FA: [an initial state with a t-transition to a final state]

RE: X Y    FA: [the machine for X, followed by an ε-transition into the
                machine for Y]
Converting an RE into an NDFA-ε

RE: X|Y    FA: [from a new initial state, an ε-transition into the machine
                for X and an ε-transition into the machine for Y; from the
                end of each, an ε-transition into a common final state]

RE: X*     FA: [ε-transitions allow the machine for X to be bypassed
                entirely or traversed any number of times]
FA and the implementation of Scanners
• Regular expressions, (N)DFA-ε, NDFAs and DFAs are all equivalent
  formalisms in terms of what languages can be defined with them.
• Regular expressions are a convenient notation for describing the
  "tokens" of programming languages.
• Regular expressions can be converted into FAs (the algorithm for
  conversion into an NDFA-ε is straightforward).
• DFAs can easily be implemented as computer programs.
FA and the implementation of Scanners

What a typical scanner generator does:

  Token definitions            Scanner Generator            Scanner
  (regular expressions)  --->                         --->  (a DFA in Java or C or ...)

A possible algorithm:
  - Convert the REs into an NDFA-ε
  - Convert the NDFA-ε into an NDFA
  - Convert the NDFA into a DFA
  - Generate Java/C/... code

Note: in practice this exact algorithm is not used. For performance
reasons, sophisticated optimizations are applied, e.g.
  • direct conversion from REs to a DFA
  • minimizing the DFA
Implementing a DFA
Definition: A finite state machine is a 5-tuple (States, Σ, start, δ, End).

  States  N different states => integers {0,..,N-1} => int data type
  Σ       byte or char data type
  start   an integer number
  δ       the transition relation δ ⊆ States × Σ × States.
          For a DFA this is a function States × Σ -> States,
          represented by a two-dimensional array (one dimension
          for the current state, another for the current character);
          the contents of the array is the next state.
  End     a set of final states, represented (for example) by an
          array of booleans (final states marked true, other
          states false).
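
A minimal sketch of this table-driven representation, assuming states numbered 0..N-1 over the char alphabet (the table contents themselves would come from the token definitions and are not shown):

  class DFA {
    static final int REJECT = -1;   // marks "no transition" in the table

    int[][] delta;                  // delta[state][character] = next state (or REJECT)
    boolean[] isFinal;              // isFinal[state] = true iff state is a final state
    int start;                      // the start state

    boolean accepts(String input) {
      int state = start;
      for (int i = 0; i < input.length() && state != REJECT; i++)
        state = delta[state][input.charAt(i)];
      return state != REJECT && isFinal[state];
    }
  }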
JLex Regular Expressions
• Regular expressions are expressed using ASCII
characters (0 – 127).
• The following characters are metacharacters.
? * + | ( ) ^ $ . [ ] { } " \
• Metacharacters have special meaning; they do not
represent themselves.
• All other characters represent themselves.
THANK YOU
