Lexing

Uploaded by

madtecharch

LEXING

ACKNOWLEDGEMENT: SLIDES ARE ADAPTED FROM THE MATERIALS FROM JOSHUA ELLIS AND ANTHONY
CLARK

INTERPRETER

Source Code (Plain Text) -> Lexer/Scanner -> Lexemes/Tokens

Lexer/Scanner (working on this): groups characters into the smallest meaningful units, using regular expressions and finite state machines.

Source code (plain text):

    // GCD Program (in C)
    int main() {
        int i = getint(), j = getint();
        while (i != j) {
            if (i > j) i = i - j;
            else j = j - i;
        }
        putint(i);
    }

Lexemes/Tokens:

    int main ( ) {
    int i = getint ( ) , j = getint ( ) ;
    while ( i != j ) {
    if ( i > j ) i = i - j ;
    else j = j - i ;
    }
    putint ( i ) ;
    }

We’ll create our own simple calculator language
The finite-state machine used by a turnstile

Finite-State Machine (FSM), aka Finite-State Automaton (FSA) or State Machine

A mathematical model of computation. It is an abstract machine that
can be in exactly one of a finite number of states at any given time.
The FSM can change from one state to another in response to some
inputs; the change from one state to another is called a transition.
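
The definition above can be sketched in code. The following is a minimal illustration, assuming the usual textbook turnstile with two states (locked and unlocked) and two inputs (a coin and a push); the names `State`, `Input`, `transition`, and `run` are made up for this example and are not from the slides.

```rust
/// The two states of a coin-operated turnstile (assumed for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Locked,
    Unlocked,
}

/// The inputs the turnstile can react to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Input {
    Coin,
    Push,
}

/// The transition function: given the current state and an input,
/// return the next state. The machine is always in exactly one state.
fn transition(state: State, input: Input) -> State {
    match (state, input) {
        (State::Locked, Input::Coin) => State::Unlocked,
        (State::Locked, Input::Push) => State::Locked,
        (State::Unlocked, Input::Coin) => State::Unlocked,
        (State::Unlocked, Input::Push) => State::Locked,
    }
}

/// Run the machine over a sequence of inputs, starting from `Locked`.
fn run(inputs: &[Input]) -> State {
    inputs
        .iter()
        .fold(State::Locked, |state, &input| transition(state, input))
}
```

Calling `run(&[Input::Coin, Input::Push])` walks the machine from locked to unlocked and back to locked. A lexer works the same way, except that the inputs are characters and the states record how much of a lexeme has been seen so far.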
FINITE STATE MACHINE DIAGRAMS

[Diagram legend: State (or State and Details information); Transition/Condition; Entrance; Exit]

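INTERPRETER

[Diagram: the Lexer reads the source; the Parser requests a token and the Lexer sends a token, repeatedly; the Tree Walker requests an AST (Abstract Syntax Tree) and the Parser sends the AST; the Tree Walker handles console I/O.]
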
LEXER

This process is known as:

• Scanning, lexing (lexical analysis), or tokenizing

This is the first step for any compiler or interpreter

A scanner takes in a raw source (plain text) and turns it into lexemes
• A lexeme is the smallest grouping of characters that represents
something useful/meaningful

GENERATING AND DESIGNING A SCANNER

1. Look at your language

2. Find all of your lexemes

• Lexemes are the smallest meaningful grouping of characters
• You also must respect the maximal munch rule (always take more characters if possible)
 • == is usually an EqualEqual token, not two Equal tokens
 • let lettuce = … is Let, Identifier(“lettuce”), Equal, … not Let, Let, Identifier(“tuce”), …

3. Write a regular expression for each lexeme

• All of these expressions together create a lexical grammar

4. Generate or design a scanner

• Use ANTLR or Lex or Flex or JFlex or … to automatically generate a scanner

• Write a scanner by hand (known as the ad-hoc method)
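
The maximal munch rule from step 2 can be sketched as follows. This is only an illustration under assumptions: the `Token` variants mirror the names used in the bullets above, while the `scan` function and its decision to skip unknown characters are invented for this example.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Let,
    Identifier(String),
    Equal,
    EqualEqual,
}

/// Hypothetical helper: scan a string into tokens using maximal munch.
fn scan(source: &str) -> Vec<Token> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '=' {
            // Maximal munch: prefer "==" over "=" when a second '=' follows.
            if i + 1 < chars.len() && chars[i + 1] == '=' {
                tokens.push(Token::EqualEqual);
                i += 2;
            } else {
                tokens.push(Token::Equal);
                i += 1;
            }
        } else if c.is_ascii_alphabetic() {
            // Maximal munch: consume the whole word first, then decide
            // whether it is a keyword or an identifier.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_alphanumeric() {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            tokens.push(if word == "let" {
                Token::Let
            } else {
                Token::Identifier(word)
            });
        } else {
            // Characters outside this tiny example are skipped.
            i += 1;
        }
    }
    tokens
}
```

Because the word is consumed in full before it is compared against `"let"`, the input `let lettuce = x` yields `Let`, `Identifier("lettuce")`, `Equal`, `Identifier("x")` instead of splitting `lettuce` into a keyword and a leftover identifier.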

AD-HOC SCANNERS

For this class, we will be writing an ad-hoc scanner

• This has the benefit of not hiding any of the details
• This has the drawback of not hiding any of the details

An ad-hoc scanner is basically a match expression on steroids

• Or a switch statement if you're in C++
• Or branching if-statements if you're in some other language
• “…pretty much a switch statement with delusions of grandeur.”

fn get_next_token(&mut self) -> Result<Token, String> {
    self.skip_whitespace();

    // No characters left: signal the end of the input.
    if self.cursor >= self.input.len() {
        return Ok(Token::End);
    }

    // Every lexeme here is one character long, so one match is enough.
    let new_token = match self.input[self.cursor] {
        '+' => Token::Plus,
        '-' => Token::Minus,
        '*' => Token::Star,
        '/' => Token::Slash,
        '^' => Token::Caret,
        '(' => Token::LParen,
        ')' => Token::RParen,
        _ => {
            return Err(format!(
                "Unexpected character: '{}'",
                self.input[self.cursor]
            ));
        }
    };

    // Consume the character that was just matched.
    self.cursor += 1;
    Ok(new_token)
}
} // closes the enclosing impl block (not shown)
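
The function above relies on a `Token` type, a `skip_whitespace` method, and `input`/`cursor` fields that the slide does not show. A minimal sketch of the missing pieces, assuming `input` is a `Vec<char>` and that the struct is called `Lexer` (both are guesses, as is the `tokenize` helper), could look like this:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Plus,
    Minus,
    Star,
    Slash,
    Caret,
    LParen,
    RParen,
    End,
}

/// Assumed shape of the scanner state: the source as characters plus
/// the index of the next unread character.
struct Lexer {
    input: Vec<char>,
    cursor: usize,
}

impl Lexer {
    fn new(source: &str) -> Self {
        Lexer { input: source.chars().collect(), cursor: 0 }
    }

    /// Advance the cursor past any whitespace.
    fn skip_whitespace(&mut self) {
        while self.cursor < self.input.len() && self.input[self.cursor].is_whitespace() {
            self.cursor += 1;
        }
    }

    /// Same logic as the slide: one token per call, `End` when exhausted.
    fn get_next_token(&mut self) -> Result<Token, String> {
        self.skip_whitespace();
        if self.cursor >= self.input.len() {
            return Ok(Token::End);
        }
        let new_token = match self.input[self.cursor] {
            '+' => Token::Plus,
            '-' => Token::Minus,
            '*' => Token::Star,
            '/' => Token::Slash,
            '^' => Token::Caret,
            '(' => Token::LParen,
            ')' => Token::RParen,
            other => return Err(format!("Unexpected character: '{}'", other)),
        };
        self.cursor += 1;
        Ok(new_token)
    }
}

/// Hypothetical driver: collect tokens until `End` or the first error.
fn tokenize(source: &str) -> Result<Vec<Token>, String> {
    let mut lexer = Lexer::new(source);
    let mut tokens = Vec::new();
    loop {
        let token = lexer.get_next_token()?;
        if token == Token::End {
            return Ok(tokens);
        }
        tokens.push(token);
    }
}
```

With this in place, `tokenize("( + ) * ^")` returns the five matching tokens, while `tokenize("1")` returns an error, since this version does not handle numbers yet.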
LEXEME TOKEN CATEGORIES

• Single-character punctuators ( + , ; - ( } etc. )
• Multi-character punctuators ( == <= etc. )
• Comments
• Literals (strings and numbers)
• Reserved keywords ( while for let int etc. )
• Identifiers

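
One way to mirror these categories in code is a single enum with a variant (or a group of variants) per category. The sketch below is illustrative only: the variant names and the `classify_word` helper are assumptions, and the keyword list simply reuses the examples from the bullets.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    // Single-character punctuators
    Plus,
    Semicolon,
    // Multi-character punctuators
    EqualEqual,
    LessEqual,
    // Comments (often discarded by the scanner instead of emitted)
    Comment(String),
    // Literals
    Number(f64),
    Str(String),
    // Reserved keywords
    While,
    For,
    Let,
    Int,
    // Identifiers
    Identifier(String),
}

/// Hypothetical helper: decide whether a scanned word is a reserved
/// keyword or an ordinary identifier.
fn classify_word(word: &str) -> Token {
    match word {
        "while" => Token::While,
        "for" => Token::For,
        "let" => Token::Let,
        "int" => Token::Int,
        _ => Token::Identifier(word.to_string()),
    }
}
```

Keywords and identifiers share the same shape, so a common approach is to scan the whole word first and only then look it up, which is what `classify_word` does.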
ANTLR Syntax

    Comment : '/*' .*? '*/' | '//' ~[\r\n]*;
    Assign : ':=';
    Plus : '+';
    Minus : '-';
    Times : '*';
    Divide : '/';
    LParen : '(';
    RParen : ')';
    Read : 'read';
    Write : 'write';
    Identifier : Letter (Letter | Digit)*;
    Number : Digit+ | Digit* ('.' Digit | Digit '.') Digit*;
    fragment Letter : [a-zA-Z];
    fragment Digit : [0-9];

Example program:

    // Read two values
    read A
    read B

    // Add them
    sum := A + B

    // Print stuff
    write sum
    write sum / 2

We’ll talk more about regular expressions next.

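
To see what the Number rule accepts, it can help to translate it by hand. The sketch below is one possible reading, not generated ANTLR output: the hypothetical function `number_len` returns the length of the longest prefix of its input that matches `Digit+ | Digit* ('.' Digit | Digit '.') Digit*`, that is, either a run of digits or a dot with at least one digit on one side of it.

```rust
/// Hypothetical helper: length (in bytes) of the longest prefix of
/// `source` that matches the Number rule, or `None` if nothing matches.
fn number_len(source: &str) -> Option<usize> {
    let bytes = source.as_bytes();

    // Count consecutive ASCII digits starting at index `from`.
    let count_digits = |from: usize| {
        bytes[from..]
            .iter()
            .take_while(|b| b.is_ascii_digit())
            .count()
    };

    let int_digits = count_digits(0);

    // Second alternative: a dot with at least one digit before or after it.
    if bytes.get(int_digits) == Some(&b'.') {
        let frac_digits = count_digits(int_digits + 1);
        if int_digits + frac_digits > 0 {
            return Some(int_digits + 1 + frac_digits);
        }
    }

    // First alternative: one or more digits with no usable dot.
    if int_digits > 0 {
        Some(int_digits)
    } else {
        None
    }
}
```

Under this reading `12`, `12.5`, `12.`, and `.5` are all numbers, while a lone `.` is not. Taking the longest matching prefix is the maximal munch rule applied to numeric literals.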
CHOICES TO MAKE

• How do we represent comments? (// and /* */) or (# and ''' ''')

• How do we end statements/lines of code? (newline or ;)

• Do we require type definitions?

• How are variables initialized?

• Do we implement lists/arrays?

• Do we implement functions?
• Do we allow for the declaration of functions?
WHAT DOES OUR FSM/LEXER LOOK LIKE?

What features does our language have?

What tokens are we looking for?

What coding language should we use?