
Compilers and Execution Environments (ID2202)
Fall 2020
Lecture 2: Lexical Analysis

David Broman
Associate Professor, KTH Royal Institute of Technology
Associate Director Operations, Digital Futures
Part I: Lexical Analysis
Part II: Regular Expressions
Part III: Deterministic Finite Automata

Part I
Lexical Analysis

Lexical Analysis / Scanning

Source Text -> Lexical Analyzer / Scanner -> Tokens (or a Lexical Error)

Source text:

    // A function
    int foo(int x){
      return x + 1;
    }

Tokens:

    int
    foo      Identifier token, IDENT (“foo” is a specific lexeme)
    (
    int
    x
    )
    {
    return   Keyword, RETURN
    x
    +
    1        Integer Literal / Constant, INT
    ;
    }        Specific token, RCURLY

Line comment: Recognize “//”, ignore until new line.

The scanner ignores white space and comments.
White space characters: Space 0x20, Tab 0x09, Line Feed 0xa, Carriage Return 0xd
Line endings:
    CR LF   Windows/DOS
    LF      Unix, MacOS X, Linux
    CR      Classic Mac, C64
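
The scanning step above can be sketched by hand in a few lines of Python. This is only an illustration, not the course's implementation: the token names INT_KW, LPAREN, RPAREN, LCURLY, PLUS and SEMI are hypothetical (the slide names only IDENT, RETURN, INT and RCURLY), and only the characters needed for the example function are handled.

```python
import string

# Hypothetical token names, except IDENT, RETURN, INT and RCURLY from the slide.
KEYWORDS = {"int": "INT_KW", "return": "RETURN"}
PUNCT = {"(": "LPAREN", ")": "RPAREN", "{": "LCURLY", "}": "RCURLY",
         "+": "PLUS", ";": "SEMI"}
WHITESPACE = {"\x20", "\x09", "\x0a", "\x0d"}  # space, tab, LF, CR
IDENT_START = set(string.ascii_letters + "_")
IDENT_REST = IDENT_START | set(string.digits)
DIGITS = set(string.digits)

def scan(text):
    """Return a list of (token, lexeme) pairs; raise ValueError on a lexical error."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c in WHITESPACE:
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip everything up to the end of the line.
            while i < len(text) and text[i] not in ("\n", "\r"):
                i += 1
        elif c in IDENT_START:
            start = i
            while i < len(text) and text[i] in IDENT_REST:
                i += 1
            lexeme = text[start:i]
            # Match an identifier first, then check whether it is a keyword.
            tokens.append((KEYWORDS.get(lexeme, "IDENT"), lexeme))
        elif c in DIGITS:
            start = i
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append(("INT", text[start:i]))
        elif c in PUNCT:
            tokens.append((PUNCT[c], c))
            i += 1
        else:
            raise ValueError(f"lexical error at offset {i}: {c!r}")
    return tokens

source = "// A function\nint foo(int x){\n  return x + 1;\n}\n"
tokens = scan(source)
for token, lexeme in tokens:
    print(token, lexeme)
```

The comment and all white space disappear from the output, and “foo” and “x” both come out as IDENT tokens with different lexemes.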


Lexical Analysis / Scanning

Exercise: which lexical errors would a C/C++ compiler detect?

    void foo(int x){
      while(x < 10]{
        if(x == 7) // Works?
          x = x + 2;
        braek;
        x = x + 1
      }
      return "Hello\z";
    }

Annotations on the code:
    Not a lexical error. Would be detected by the parser.
    Not a lexical error. “braek” is an identifier.
    Lexical warning. Unknown escape sequence ‘\z’.

NOTE: most of the errors in this example will be parsing errors.

How do we implement scanning?

1. By hand, programmatically (longest match)
2. Scanner tool
    Specify => regular expressions (regex)
    Implement => deterministic finite automata (DFA)


Languages

Language
A set of strings
String
A finite sequence of symbols

Symbol
An element in a finite set (the alphabet)

Example 1: set of strings representing an integer
Finite or infinite set? Answer: infinite

Example 2: set of strings forming an identifier, which represents a keyword in the C language
Finite or infinite set? Answer: finite
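
The two examples can be made concrete with a small Python sketch. It rests on assumptions: KEYWORD_SAMPLE is a hypothetical four-word sample rather than the full list of C keywords, and Example 1 is narrowed to unsigned decimal integers.

```python
from itertools import count, islice

# Example 2: a finite language. This is a small sample, not the full C keyword list.
KEYWORD_SAMPLE = {"if", "while", "return", "int"}

def unsigned_integers():
    """Enumerate the infinite language "0", "1", "2", ... one string at a time."""
    for n in count():
        yield str(n)

# An infinite set cannot be listed in full, only sampled.
first_five = list(islice(unsigned_integers(), 5))
print(len(KEYWORD_SAMPLE), first_five)
```

A finite language can be stored as a set, whereas an infinite one needs a rule that describes its members, which is the role regular expressions play in the next part.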


Chomsky Hierarchy

         Grammars                    Languages               Accepting Machines
Type 0   Unrestricted grammars       Recursively enumerable  Turing machine
Type 1   Context-sensitive grammars  Context-sensitive       Linear-bounded automata
Type 2   Context-free grammars       Context-free            Pushdown automata
Type 3   Regular grammars            Regular language        Deterministic finite automaton,
                                                             Nondeterministic finite automaton

Part of this course: Type 2 (used in parsing) and Type 3 (used in lexing / scanning).

For details, see for instance “Languages and Machines” by Sudkamp, 2006.

Part II
Regular Expressions


Regular Expressions

Regular expression
    A way to specify a (possibly infinite) set of strings

Example: Alphabet {a, b, c}

Symbol a
    A language containing just the string “a”

Alternation M|N
    M | N is a new regular expression, where M and N are regular expressions
    Regex a|b represents the language {“a”, “b”}

Concatenation M⋅N
    M⋅N is a new regular expression, forming the concatenation of M and N, where M and N are regular expressions
    Regex (a|b)⋅c represents the language {“ac”, “bc”}

Epsilon ε
    Regular expression ε is the language containing only the empty string
    Regex (a⋅c)|ε represents the language {“ac”, “”}

Repetition M*
    * is called the Kleene closure of M. M* forms a regular expression, representing the concatenation of zero or more M
    Regex ((a⋅b)|c)* represents the language {“”, “ab”, “c”, “abc”, “abcc”, “abccc”, “abccabab”, …}

Structure from Appel (1998)
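
The five constructs can be read as operations on sets of strings. The following Python sketch is one way to check the example languages; the function names are made up for illustration, and because M* is infinite in general, star takes a hypothetical max_len bound and returns only the strings up to that length.

```python
def symbol(a):
    """The language containing just the one-symbol string a."""
    return {a}

def alt(m, n):
    """Alternation M|N: every string in M or in N."""
    return m | n

def concat(m, n):
    """Concatenation M.N: a string from M followed by a string from N."""
    return {x + y for x in m for y in n}

EPSILON = {""}  # the language containing only the empty string

def star(m, max_len):
    """Kleene closure of M, truncated to strings of at most max_len symbols."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more element of M, keeping new ones only.
        frontier = {x + y for x in frontier for y in m
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

a, b, c = symbol("a"), symbol("b"), symbol("c")
print(sorted(alt(a, b)))                           # a|b
print(sorted(concat(alt(a, b), c)))                # (a|b).c
print(sorted(alt(concat(a, c), EPSILON)))          # (a.c)|epsilon
print(sorted(star(alt(concat(a, b), c), 3)))       # ((a.b)|c)* up to length 3
```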


Regular Expressions, Conventions

Conventions

May omit ⋅ or ε
    ab is the same as a⋅b
    a| is the same as a|ε

Kleene closure binds tighter than concatenation
    ab* is the same as a(b)*

Concatenation binds tighter than alternation
    ab|c is the same as (ab)|c
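
These conventions can be checked experimentally. The sketch below uses Python's re module, which follows the same precedence rules for these operators but is an extended, backtracking regex dialect rather than the formal notation of the slides; the helper name language and the length bound are assumptions made for the example.

```python
import re
from itertools import product

def language(pattern, alphabet="abc", max_len=4):
    """All strings over `alphabet` of length <= max_len that the pattern matches in full."""
    return {"".join(p)
            for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)
            if re.fullmatch(pattern, "".join(p))}

# Kleene closure binds tighter than concatenation.
same_star = language("ab*") == language("a(b)*")
# Concatenation binds tighter than alternation.
same_alt = language("ab|c") == language("(ab)|c")
# Grouping the other way gives a different language.
differs = language("ab|c") != language("a(b|c)")
# An omitted alternative stands for the empty string.
omitted_epsilon = language("a|") == {"", "a"}

print(same_star, same_alt, differs, omitted_epsilon)
```

Comparing the matched sets over all short strings is only evidence up to the chosen length, not a proof that two expressions are equivalent.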


Regular Expressions, Abbreviations

Abbreviations

Alternation abbreviation
    [abc] is the same as a|b|c
    [b-h] is the same as [bcdefgh]
    [b-d01A-C] is the same as [bcd01ABC]

Optional and repetition
    M? with the meaning (M|ε)
    M+ with the meaning (M⋅M*)

Any character is represented by a period .
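
A quick way to convince oneself of these abbreviations is to compare both sides on sample inputs. The sketch below again uses Python's re module as a stand-in for the slide notation; the helper matches and the sample sets are assumptions for the example. Note that in Python the period does not match a newline by default.

```python
import re
import string

def matches(pattern, candidates):
    """The subset of candidates that the pattern matches in full."""
    return {s for s in candidates if re.fullmatch(pattern, s)}

chars = set(string.ascii_letters + string.digits)
samples = {"", "a", "aa", "aaa", "b"}

class_list = matches("[abc]", chars) == matches("a|b|c", chars)
class_range = matches("[b-h]", chars) == set("bcdefgh")
class_mixed = matches("[b-d01A-C]", chars) == set("bcd01ABC")
optional = matches("a?", samples) == matches("(a|)", samples) == {"", "a"}
one_or_more = matches("a+", samples) == matches("aa*", samples) == {"a", "aa", "aaa"}
any_char = matches(".", chars) == chars

print(class_list, class_range, class_mixed, optional, one_or_more, any_char)
```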


Regular Expressions, Abbreviations

Examples

Unsigned integer
    [0-9]+

Unsigned integer, avoiding octal number
    0|[1-9][0-9]*

Hexadecimal number, e.g. “0x01a”, “0XfF31”
    0[xX][0-9a-fA-F]+

Identifiers, e.g. “foo”, “Bar”, “_foo12”. It is not allowed to start with a digit.
    [_a-zA-Z][_a-zA-Z0-9]*

Keyword while
    “while”

Keyword if
    “if”

What about tokenizing the string “ifwhile”?
    Typically, match identifier, check if keyword

What about matching operators + and ++?
    Two rules to disambiguate: longest match and rule priority
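
The two disambiguation rules can be sketched with a small table-driven tokenizer in Python. This is an illustration under assumptions, not how a scanner generator works internally: the token names are hypothetical, white space is skipped with str.isspace, and keywords are separated from identifiers by rule priority (listing them first) instead of the “match identifier, check if keyword” approach mentioned above.

```python
import re

# Rules in priority order: on a tie in length, the earlier rule wins.
RULES = [
    ("WHILE", r"while"),
    ("IF", r"if"),
    ("HEX", r"0[xX][0-9a-fA-F]+"),
    ("INT", r"0|[1-9][0-9]*"),
    ("IDENT", r"[_a-zA-Z][_a-zA-Z0-9]*"),
    ("INCR", r"\+\+"),
    ("PLUS", r"\+"),
]
COMPILED = [(name, re.compile(rx)) for name, rx in RULES]

def tokenize(text):
    """Split text into (token, lexeme) pairs using longest match and rule priority."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None  # (match length, token name)
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Longest match: only a strictly longer match replaces the current best,
            # so among equally long matches the earliest rule is kept.
            if m and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, name)
        if best is None:
            raise ValueError(f"lexical error at offset {pos}: {text[pos]!r}")
        length, name = best
        tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens

print(tokenize("ifwhile"))
print(tokenize("if x++ + 0x01a"))
```

With these rules “ifwhile” becomes a single identifier because the identifier match is longer than the keyword match, while “if” alone is a keyword because the keyword rule has priority over the equally long identifier match.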

Part III
Deterministic Finite Automata


Deterministic Finite Automata (DFA)

A DFA is a machine M, represented as a 5-tuple:

    $M = (S, \Sigma, \delta, s_0, F)$

    $S$: Set of states
    $\Sigma$: Finite set of input symbols (the alphabet)
    $\delta : S \times \Sigma \to S$: Transition function. Note: deterministic because only one possible output
    $s_0$: Start state
    $F$: Set of final states

The language recognized by M is the set of strings that M accepts. It is written as $L(M)$.

What is the formalization of the DFA?

[Figure: DFA with states 1, 2, 3, 4 and transitions 1 -f-> 2 -o-> 3 -r-> 4; state 1 is the start state and state 4 is final.]

    $S = \{1, 2, 3, 4\}$ (1)
    $\Sigma = \{r, f, o\}$ (2)
    $\delta = \{((1, f), 2), ((2, o), 3), ((3, r), 4)\}$ (3)
    $s_0 = 1$ (4)
    $F = \{4\}$ (5)
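
The 5-tuple maps directly onto a small simulator. The Python sketch below encodes the example DFA from equations (1) to (5); the function name accepts is an assumption, and so is the treatment of missing transitions: the transition function given here is partial, and the sketch rejects as soon as no transition is defined, which corresponds to an implicit dead state.

```python
def accepts(dfa, string):
    """Run the DFA on the string and report whether it ends in a final state."""
    states, alphabet, delta, start, final = dfa
    state = start
    for symbol in string:
        if (state, symbol) not in delta:
            return False  # assumption: an undefined transition means reject
        state = delta[(state, symbol)]
    return state in final

# M = (S, Sigma, delta, s0, F) from equations (1) to (5).
M = (
    {1, 2, 3, 4},
    {"r", "f", "o"},
    {(1, "f"): 2, (2, "o"): 3, (3, "r"): 4},
    1,
    {4},
)

for s in ["for", "fo", "forr", "off"]:
    print(s, accepts(M, s))
```

Storing the transition function as a dictionary keyed by (state, symbol) makes the determinism visible: each key has exactly one next state.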

Deterministic Finite Automata (DFA)

Exercise: Write down a regular expression for a lower case identifier (can include an underscore as first character)

    [_a-z][a-z0-9]*

[Figure: DFA with states 1 and 2; transitions 1 -> 2 on a-z and _, and a loop 2 -> 2 on a-z and 0-9; state 1 is the start state and state 2 is final.]

    $S = \{1, 2\}$ (6)
    $\Sigma = \{\_, a, b, \ldots, z, 0, 1, \ldots, 9\}$ (7)
    $\delta = \{((1, a), 2), \ldots, ((2, 0), 2), \ldots\}$ (8)
    $s_0 = 1$ (9)
    $F = \{2\}$ (10)

The labels a-z and 0-9 are abbreviations. Each represents one transition line for each character.
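
Expanding the abbreviated labels into one transition per character is easy to do programmatically. The following Python sketch builds the transition table for the identifier DFA of equations (6) to (10); the names DELTA and accepts are assumptions, and an undefined transition is treated as a rejection.

```python
import string

LOWER = string.ascii_lowercase
DIGITS = string.digits

# One transition per character, as the abbreviations a-z and 0-9 stand for.
DELTA = {}
for ch in "_" + LOWER:
    DELTA[(1, ch)] = 2          # first character: underscore or lower case letter
for ch in LOWER + DIGITS:
    DELTA[(2, ch)] = 2          # remaining characters: lower case letter or digit

START, FINAL = 1, {2}

def accepts(string_):
    """Run the identifier DFA; an undefined transition rejects the string."""
    state = START
    for symbol in string_:
        state = DELTA.get((state, symbol))
        if state is None:
            return False
    return state in FINAL

for s in ["_foo12", "foo", "1foo", "foo_bar", "Foo"]:
    print(s, accepts(s))
```

The table has 27 transitions out of state 1 and 36 out of state 2, which is why the figure uses range labels instead of drawing each line.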


Deterministic Finite Automata (DFA)

Task 1. Suppose your alphabet contains the characters N, a, v, /. We assume that N represents a new line. Write down the regular expression that matches a line comment, starting with “//”.

    //[av/]*N

Task 2. Write out the DFA.

[Figure: DFA with states 1, 2, 3, 4; transitions 1 -/-> 2 -/-> 3, a loop on state 3 for a, v and /, and 3 -N-> 4; state 1 is the start state and state 4 is final.]

Task 3. Given the DFA M in the figure, which of the following strings is a string in L(M)?

    1. “aavaN” Is rejected by M
    2. “//avvN” Is accepted by M

This is how you would write it in ocamllex. ^'\n' means not a new line.

    "//" [^'\n']* '\n'
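
The answers to Task 3 can be checked by simulating the DFA and comparing it with the regular expression from Task 1. The Python sketch below does both; the names DELTA, accepts and agrees_with_regex are assumptions, the letter N stands for a new line as on the slide, and Python's re module is used as a stand-in for the slide notation.

```python
import re
from itertools import product

ALPHABET = "Nav/"  # N stands for a new line, as on the slide
DELTA = {
    (1, "/"): 2,
    (2, "/"): 3,
    (3, "a"): 3, (3, "v"): 3, (3, "/"): 3,
    (3, "N"): 4,
}
START, FINAL = 1, {4}

def accepts(string):
    """Run the line comment DFA; an undefined transition rejects the string."""
    state = START
    for symbol in string:
        state = DELTA.get((state, symbol))
        if state is None:
            return False
    return state in FINAL

def agrees_with_regex(max_len=6):
    """Compare the DFA with //[av/]*N on every string up to max_len symbols."""
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            s = "".join(chars)
            if accepts(s) != bool(re.fullmatch(r"//[av/]*N", s)):
                return False
    return True

print(accepts("aavaN"), accepts("//avvN"), agrees_with_regex())
```

The exhaustive comparison covers only strings up to the chosen length, so it is a sanity check of the construction rather than a proof of equivalence.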


Reading Guidelines - Module 1

• Compiler overview: C&T Chapters 1.1 - 1.4

• Lexing and scanning: C&T Chapters 2.1 - 2.5
• Parsing: C&T Chapters 3.1 - 3.5. Focus on 3.2 and 3.3.

C&T = the course book by Cooper and Torczon

Reading Guidelines
See the course webpage
for more information.


Soon time for coffee…


Conclusions

Some key takeaway points:

• Lexical analysis (scanning) is the first phase in the front end. The output is a stream of tokens.

• Regular expressions can be used when specifying the scanner.

• A deterministic finite automaton (DFA) may be used when implementing the lexical analysis.

Thanks for listening!
