CPSC 388 - Compiler Design and Construction: Scanners - Regular Expressions

The document provides information about a CPSC 388 compiler design course. It includes announcements about due dates for homework, programming assignments, and reading. It also discusses topics that will be covered, including scanners, regular expressions, finite state automata, and creating a scanner from regular expressions.

Uploaded by

Kashif Raffat


CPSC 388 – Compiler Design and Construction

Scanners – Regular Expressions

Announcements
• Last day to Add/Drop: Sept 11
• Wipe down computer keyboards and mice
• ACM programming contest
• Read Chapter 3
• Homework 2 due this Friday
• PROG1 due this Friday
• FSA for Java string constants (did anybody figure it out?)
Homework 1 returned
• Summary – In addition to your summary, please answer the following questions in your report:
  • Context – What is the context of the research presented in the paper? What new ideas or concepts do you think were presented in the paper?
  • Evaluation – How was the research evaluated? What evaluation techniques would you like to see to compare the research to the state of the art before the paper?
  • Significance – What is the significance of the research? Do you feel the work is a minor improvement or a major step in the field?
• Grammar / spelling / clarity / support for statements made
Scanner Generator

.Jlex file (containing regular expressions) → Scanner Generator → .java file (containing scanner code)

To understand regular expressions, you need to understand finite-state automata.
FSA Formal Definition (5-tuple)
Q – a finite set of states
Σ – the alphabet of the automaton (a finite set of characters used to label edges)
δ – the state transition function: δ(state_i, character) → state_j
q – the start state (q ∈ Q)
F – the set of final states (F ⊆ Q)
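The 5-tuple can be written down almost literally as a data structure. The following is a minimal sketch, not the course's code: the class name `DFA`, the method `accepts`, and the example automaton (binary strings with an even number of 1s) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset    # Q
    alphabet: frozenset  # Σ
    delta: dict          # δ: (state, character) -> state
    start: str           # q
    finals: frozenset    # F

    def accepts(self, text: str) -> bool:
        """Follow one edge per input character; accept if we end in a final state."""
        state = self.start
        for ch in text:
            key = (state, ch)
            if key not in self.delta:
                return False  # no outgoing edge with this label: reject
            state = self.delta[key]
        return state in self.finals


# Hypothetical example: strings over {0, 1} containing an even number of 1s.
EVEN_ONES = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset({"0", "1"}),
    delta={
        ("even", "0"): "even",
        ("even", "1"): "odd",
        ("odd", "0"): "odd",
        ("odd", "1"): "even",
    },
    start="even",
    finals=frozenset({"even"}),
)
```

For instance, `EVEN_ONES.accepts("0110")` is true and `EVEN_ONES.accepts("1")` is false. A character outside Σ has no edge, so the string is rejected.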
Types of FSA
• Deterministic (DFA)
  • No state has more than one outgoing edge with the same label.
• Non-Deterministic (NFA)
  • States may have more than one outgoing edge with the same label.
  • Edges may be labeled with ε, the empty string. The FSA can take an epsilon transition without looking at the current input character.
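One common way to run an NFA is to track the set of all states it could be in, following ε edges for free. The sketch below assumes a transition table that maps (state, label) to a set of states and uses the empty string as the ε label. The function names and the example NFA (an ε edge into an automaton for strings over {a, b} ending in "ab") are hypothetical.

```python
EPS = ""  # label used for ε edges in this sketch (an assumption, not from the slides)


def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε edges."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(delta, start, finals, text):
    """Simulate the NFA by tracking every state it could currently be in."""
    current = epsilon_closure({start}, delta)
    for ch in text:
        moved = set()
        for s in current:
            moved |= set(delta.get((s, ch), ()))
        current = epsilon_closure(moved, delta)
    return bool(current & set(finals))


# Hypothetical NFA: "q0" has two outgoing edges labeled "a" (non-deterministic).
DELTA = {
    ("s", EPS): {"q0"},
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
FINALS = {"q2"}
```

With this table, `nfa_accepts(DELTA, "s", FINALS, "abab")` is true, while `"aba"` and the empty string are rejected.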
Terms to Know
• Alphabet (Σ) – any finite set of symbols, e.g. binary, ASCII, Unicode
• String – a finite sequence of symbols, e.g. 010001, banana, bãër
• Language – any countable set of strings, e.g.
  • the empty set
  • well-formed C programs
  • English words
Regular Expressions
• An easy way to express a language that is accepted by an FSA
• Rules:
  • ε is a regular expression
  • Any symbol in Σ is a regular expression
  • If r and s are any regular expressions, then so are:
    • r|s, which denotes union (“r or s”)
    • rs, which denotes r followed by s (concatenation)
    • (r)*, which denotes concatenation of r with itself zero or more times (Kleene closure)
    • (r), where the parentheses control the order of operations
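Because the rules are inductive, they map directly onto a small tree type with one case per rule. The following sketch is illustrative only. The class names `Eps`, `Sym`, `Alt`, `Cat`, `Star` and the function `language` are assumptions, and the language is enumerated only up to a length bound because Kleene closure makes it infinite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:
    """ε: denotes the language containing only the empty string."""


@dataclass(frozen=True)
class Sym:
    ch: str  # a single symbol of Σ


@dataclass(frozen=True)
class Alt:
    r: object  # r|s
    s: object


@dataclass(frozen=True)
class Cat:
    r: object  # rs
    s: object


@dataclass(frozen=True)
class Star:
    r: object  # (r)*


def language(e, max_len):
    """Set of strings of length <= max_len denoted by expression e."""
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.ch} if max_len >= 1 else set()
    if isinstance(e, Alt):
        return language(e.r, max_len) | language(e.s, max_len)
    if isinstance(e, Cat):
        left, right = language(e.r, max_len), language(e.s, max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if isinstance(e, Star):
        base = language(e.r, max_len) - {""}  # non-empty pieces only, so lengths grow
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")
```

For example, `language(Cat(Sym("a"), Star(Sym("b"))), 3)` yields the strings a, ab, and abb.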
Example Regular Expressions
Regular Expression        Corresponding Language
ε                         {“”}
a                         {“a”}
abc                       {“abc”}
a|b|c                     {“a”, “b”, “c”}
(a|b|c)*                  {“”, “a”, “b”, “c”, “aa”, “ab”, “ac”, “aaa”, …}
a|b|c|…|z|A|B|…|Z         any letter
0|1|2|…|9                 any digit
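These rows can be checked with Python's `re` module, whose syntax is a superset of the notation used here. This is a small sketch under two assumptions: the helper name `in_language` is made up, and ε is written as the empty pattern because `re` has no ε symbol. `re.fullmatch` succeeds only when the whole string matches, which corresponds to membership in the language.

```python
import re

# Pattern -> a few strings that should belong to its language.
EXAMPLES = {
    "": [""],                # ε, written as the empty pattern
    "a": ["a"],
    "abc": ["abc"],
    "a|b|c": ["a", "b", "c"],
    "(a|b|c)*": ["", "a", "ac", "aaa", "cab"],
}


def in_language(pattern, text):
    """True if the entire string `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text) is not None
```

By contrast, `in_language("a|b|c", "ab")` is false, since that language contains only one-character strings.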
Precedence in Regular Expressions
• * has the highest precedence, left associative
• Concatenation has the second highest precedence, left associative
• | has the lowest precedence, left associative
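Under these rules, ab*|c groups as (a(b*))|c. Python's `re` module follows the same precedence, so the grouping can be checked on sample strings. This is a sketch: the helper `agree_on` and the sample list are assumptions, and agreement on a finite sample only illustrates the rule; it does not prove equivalence.

```python
import re

SAMPLES = ["", "a", "ab", "abb", "c", "abab", "ac", "cc", "abc"]


def agree_on(p1, p2, samples):
    """True if both patterns accept exactly the same strings from `samples`."""
    return all(
        (re.fullmatch(p1, s) is None) == (re.fullmatch(p2, s) is None)
        for s in samples
    )


IMPLICIT = "ab*|c"          # relies on precedence
EXPLICIT = "(a(b*))|c"      # same grouping spelled out
STAR_LAST = "(ab)*|c"       # what you would get if * bound looser than concatenation
OR_FIRST = "a(b*|c)"        # what you would get if | bound tighter than concatenation
```

Here `agree_on(IMPLICIT, EXPLICIT, SAMPLES)` holds, while the two alternative groupings disagree with `IMPLICIT` on the empty string and on "c" respectively.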

More Regular Expression Examples
Regular Expression        Corresponding Language
ε|a|b|ab*                 {“”, “a”, “b”, “ab”, “abb”, “abbb”, …}
ab*c                      {“ac”, “abc”, “abbc”, …}
ab*|a*                    {“”, “a”, “ab”, “aa”, “aaa”, “abb”, …}
a(b*|a*)                  {“a”, “ab”, “aa”, “abb”, “aaa”, …}
a(b|a)*                   {“a”, “ab”, “aa”, “aaa”, “aab”, “aba”, …}
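One way to check a row like these is to enumerate every short string over a small alphabet and keep the ones the expression accepts. The sketch below does this by brute force with Python's `re` module. The function name, the default alphabet, and the length bound are assumptions, and ε is written as an empty alternative.

```python
import re
from itertools import product


def enumerate_language(pattern, alphabet="abc", max_len=3):
    """Strings over `alphabet`, up to max_len long, fully matched by `pattern`.

    Results are listed shortest first, then in alphabet order.
    """
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                found.append(s)
    return found
```

For example, `enumerate_language("ab*c")` returns `["ac", "abc"]`, and `enumerate_language("|a|b|ab*", "ab", 3)` returns `["", "a", "b", "ab", "abb"]`.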

You Try
• What is the language described by each regular expression?
  • a*
  • (a|b)*
  • a|a*b
  • (a|b)(a|b)
  • aa|ab|ba|bb
  • (+|-|ε)(0|1|2|3|4|5|6|7|8|9)*
Regular Definitions
If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form:
d_1 → r_1
d_2 → r_2
…
d_n → r_n
where:
1. Each d_i is a new symbol not in Σ and not the same as any other of the d’s.
2. Each r_i is a regular expression over Σ ∪ {d_1, d_2, …, d_{i-1}}.
Regular Definitions Example
Example C identifiers:
Σ = ASCII

letter_ → a|b|c|…|z|A|B|C|…|Z|_
digit → 0|1|2|…|9
id → letter_(letter_|digit)*
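A regular definition behaves like a sequence of named constants, each built from the ones before it. Here is a minimal Python sketch of the identifier definition, assuming bracketed character classes as shorthand for the long alternations. The constant names and the helper `is_identifier` are illustrative only.

```python
import re

LETTER_ = "[A-Za-z_]"                  # letter_ → a|b|…|z|A|B|…|Z|_
DIGIT = "[0-9]"                        # digit → 0|1|2|…|9
ID = f"{LETTER_}({LETTER_}|{DIGIT})*"  # id → letter_(letter_|digit)*


def is_identifier(text):
    """True if `text` is matched in full by the id definition."""
    return re.fullmatch(ID, text) is not None
```

So `is_identifier("_tmp1")` is true and `is_identifier("1abc")` is false. The definition describes only the lexical shape; it does not exclude reserved words.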
Regular Definitions Example
Example Unsigned Numbers (integer or float):
Σ = ASCII

digit → 0|1|2|…|9
digits → digit digit*
optionalFraction → . digits | ε
optionalExponent → (E (+|-|ε) digits) | ε
number → digits optionalFraction optionalExponent
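The same layering can be sketched in Python by substituting each earlier definition into the later ones. The constant names mirror the definitions above. The helper `is_number` is hypothetical, ε is written as an empty alternative, and the literal `.` and `+` are escaped because they are operators in Python's `re` syntax.

```python
import re

DIGIT = "[0-9]"
DIGITS = f"{DIGIT}{DIGIT}*"
OPTIONAL_FRACTION = rf"(\.{DIGITS}|)"        # . digits | ε
OPTIONAL_EXPONENT = rf"(E(\+|-|){DIGITS}|)"  # (E (+|-|ε) digits) | ε
NUMBER = f"{DIGITS}{OPTIONAL_FRACTION}{OPTIONAL_EXPONENT}"


def is_number(text):
    """True if `text` is matched in full by the number definition."""
    return re.fullmatch(NUMBER, text) is not None
```

This accepts strings such as 42, 3.14, and 6.02E23, and rejects .5, 1., and 1e5, because the definition requires digits on both sides of the point and an uppercase E.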
Special Characters in Reg. Exp.
What does each of the following mean?
* – Kleene Closure
| – or
() – grouping
[] – creates a character class
+ – Positive Closure
? – zero or one instance
"" – anything in quotes means itself, e.g. "*"
. – matches any single character (except newline)
\ – used for escape characters (newline, tab, etc.)
^ – matches beginning of a line
$ – matches the end of a line
Extensions to Regular Expressions
• + means one or more occurrences (positive closure)
• ? means zero or one occurrence
• Character classes
  • a|r|t can be written [art]
  • a|b|…|z can be written [a-z], as long as there is a clear ordering of the characters
  • [^a-z] matches any character except a-z
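Each extension is shorthand for something the basic operators can already express. The sketch below compares each shorthand with its expansion on every string up to a small length, using Python's `re` module. The helper `equivalent_up_to` and the test alphabet are assumptions, and the check is exhaustive only up to the stated bound.

```python
import re
from itertools import product


def equivalent_up_to(p1, p2, alphabet, max_len=3):
    """True if p1 and p2 accept the same strings over `alphabet` up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (re.fullmatch(p1, s) is None) != (re.fullmatch(p2, s) is None):
                return False
    return True


# (shorthand, expansion using only |, concatenation, * and an empty alternative for ε)
EQUIVALENCES = [
    ("a+", "aa*"),
    ("a?", "(a|)"),
    ("[art]", "a|r|t"),
    ("[a-c]", "a|b|c"),
]
```

A negated class has no short expansion, which is part of its appeal: `re.fullmatch("[^a-z]", "A")` succeeds while `re.fullmatch("[^a-z]", "q")` does not.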
Example Using Character Classes
^[^aeiou]*$
Matches any complete line that does not
contain a lowercase vowel
How do you tell which meaning of ^ is
intended?
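The two meanings are told apart by position: a ^ that is the first character inside square brackets negates the class, while a ^ outside brackets anchors the match to the start of a line. The sketch below uses Python's `re` module, where the same convention holds. The function name is hypothetical. The pattern is applied one line at a time because a negated class in Python also matches the newline character.

```python
import re

# First ^ is an anchor (outside brackets); second ^ negates the class (inside brackets).
LINE_WITHOUT_LOWERCASE_VOWELS = re.compile(r"^[^aeiou]*$")


def vowel_free_lines(text):
    """Return the lines of `text` that contain no lowercase vowel."""
    return [
        line
        for line in text.splitlines()
        if LINE_WITHOUT_LOWERCASE_VOWELS.match(line)
    ]
```

Given the four lines rhythm, hello, XYZ 123, and AEIOU, the function keeps all but hello. The uppercase vowels pass because the class lists only lowercase ones.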
Try It
• Create character classes for:
  • The first ten letters (up to “j”)
  • Lowercase consonants
  • Digits in hexadecimal
• Create regular expressions for:
  • A case-insensitive keyword such as SELECT (or Select or SeLeCt) in SQL
  • Java string constants
  • Any string of whitespace characters
Creating a Scanner
• Create a set of regular expressions, one for each token to be recognized
• Convert the regular expressions into one combined DFA
• Run the DFA over the input character stream
  • The longest matching regular expression is selected
  • If there is a tie, use the first matching regular expression
• Attach code to run when a regular expression matches
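The selection policy in these steps (longest match, ties going to the earliest rule) can be sketched without building the combined DFA. The version below is an assumption-laden illustration, not what a scanner generator emits. It tries each rule's pattern separately with Python's `re` module, the token names and rules are hypothetical, and the only attached action is skipping whitespace.

```python
import re

# Hypothetical token rules in priority order: earlier rules win ties.
RULES = [
    ("IF", r"if"),
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("NUM", r"[0-9]+"),
    ("WS", r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def scan(text):
    """Return (token, lexeme) pairs using longest match, ties to the first rule."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Strictly greater: on equal length the earlier rule is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if best_name != "WS":  # action for WS: discard it
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

On the input `if iffy 42`, the first lexeme matches both IF and ID at length 2, so the earlier rule IF wins. For `iffy`, ID matches four characters against IF's two, so the longer match wins. The result is `[("IF", "if"), ("ID", "iffy"), ("NUM", "42")]`.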

