2 - 2specification of Tokens

Uploaded by

2k5preethi

SPECIFICATION OF TOKENS

Strings and Languages

• Regular expressions are an important notation for specifying patterns.

• Alphabet – any finite set of symbols
e.g. ASCII, the binary alphabet {0,1}, Unicode, EBCDIC, Latin-1

• String – a finite sequence of symbols drawn from an alphabet
– banana (over the ASCII alphabet)
– Length of a string s => |s|
– Empty string => ε, with |ε| = 0

• Other terms relating to strings: prefix; suffix; substring; proper prefix, suffix, or substring (non-empty, not the entire string); subsequence

• Language – a set of strings over a fixed alphabet

Languages
• A language, L, is simply any set of strings over a
fixed alphabet.

Alphabet                          Languages
{0,1}                             {0, 10, 100, 1000, 10000, …}
                                  {0, 1, 00, 11, 000, 111, …}
{a,b,c}                           {abc, aabbcc, aaabbbccc, …}
{A,…,Z}                           {FOR, WHILE, GOTO, …}
{A,…,Z,a,…,z,0,…,9,+,-,…,<,>,…}   {all legal PASCAL programs}

Special Languages: ∅ - the EMPTY LANGUAGE

{ε} - contains the empty string ε only

String operations
• Given String: banana
• Prefix : ban, banana
• Suffix : ana, banana
• Substring : nan, ban, ana, banana
• Subsequence: bnan, nn
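• Proper prefix and suffix: a prefix or suffix that is non-empty and not the entire string, e.g. ban (proper prefix), ana (proper suffix)
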
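The terms on this slide can be checked mechanically. The following is a minimal sketch in Python; the helper names (`prefixes`, `suffixes`, `substrings`, `subsequences`, `proper`) are chosen here for illustration and do not come from the slides.

```python
from itertools import combinations

def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return {s[:i] for i in range(len(s) + 1)}

def suffixes(s):
    """All suffixes of s, from s itself down to the empty string."""
    return {s[i:] for i in range(len(s) + 1)}

def substrings(s):
    """Every contiguous slice of s (delete a prefix and a suffix)."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

def subsequences(s):
    """Delete zero or more symbols at arbitrary positions, keeping order."""
    return {"".join(c) for k in range(len(s) + 1) for c in combinations(s, k)}

def proper(parts, s):
    """Keep only the parts that are non-empty and differ from s."""
    return {p for p in parts if p and p != s}

s = "banana"
print(sorted(proper(prefixes(s), s), key=len))
```

Every substring is also a subsequence, but not the other way round: bnan is a subsequence of banana without being a substring.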
String Operations
• Concatenation
– the concatenation of x and y is written xy; $s\varepsilon = \varepsilon s = s$; ε is the identity for concatenation
– $s^0 = \varepsilon$; for $i > 0$, $s^i = s^{i-1}s$

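As a small illustration of concatenation and string exponentiation, here is a sketch that treats Python's empty string as ε. The names `EPSILON` and `power` are assumptions made for this example.

```python
EPSILON = ""  # Python's empty string stands in for ε

def power(s, i):
    """s^0 = ε; for i > 0, s^i = s^(i-1) s."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    return EPSILON if i == 0 else power(s, i - 1) + s

x, y = "dog", "house"
print(x + y)           # concatenation xy
print(power("ba", 3))  # "ba" concatenated with itself three times
```

The recursion follows the slide's definition directly; in practice Python's `s * i` computes the same string.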
Operations on Languages

OPERATION                               DEFINITION
union of L and M, written L ∪ M         $L \cup M = \{\, s \mid s \text{ is in } L \text{ or } s \text{ is in } M \,\}$
concatenation of L and M, written LM    $LM = \{\, st \mid s \text{ is in } L \text{ and } t \text{ is in } M \,\}$
Kleene closure of L, written L*         $L^* = \bigcup_{i \ge 0} L^i$
                                        L* denotes "zero or more concatenations of" L
positive closure of L, written L+       $L^+ = \bigcup_{i \ge 1} L^i$
                                        L+ denotes "one or more concatenations of" L
exponentiation                          $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = LL$

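The operations in the table can be sketched on finite Python sets of strings. Since L* and L+ are infinite whenever L contains a non-empty string, the `closure` helper below only computes a finite approximation up to an assumed bound `max_power`. All function names are illustrative.

```python
def union(L, M):
    """L ∪ M: strings that are in L or in M."""
    return L | M

def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {s + t for s in L for t in M}

def power(L, i):
    """L^0 = {ε}; L^i = L^(i-1) L."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def closure(L, max_power, positive=False):
    """Finite approximation of L* (or L+): the union of L^i
    for i from 0 (or 1) up to max_power."""
    start = 1 if positive else 0
    result = set()
    for i in range(start, max_power + 1):
        result |= power(L, i)
    return result

print(sorted(closure({"a"}, 3)))
```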
Operations on Languages
Let L be the set of letters and D the set of digits:
• L ∪ D is the set of letters and digits
• LD is the set of strings consisting of a letter followed by a digit
• $L^4$ is the set of all four-letter strings
• L* is the set of all strings of letters, including ε, the empty string
• D+ is the set of strings of one or more digits.

Say What?
L = {A, B, C, D}   D = {1, 2, 3}
• L ∪ D
{A, B, C, D, 1, 2, 3}
• LD
{A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3}
• $L^2$
{AA, AB, AC, AD, BA, BB, BC, BD, CA, …, DD}
• L*
{all possible strings over L, plus ε}
• L+
L* − {ε}
• L (L ∪ D)
Valid: {A1, AA, B3, CD}   Invalid: {321, 1A, AA2}
• L (L ∪ D)*
Valid: {A, A1, A23, D3, DA3, …}   Invalid: {31}

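A quick way to test membership in these languages is to build the finite sets directly and to express L (L ∪ D) and L (L ∪ D)* as Python `re` patterns. This is a sketch under the slide's definitions L = {A, B, C, D} and D = {1, 2, 3}; the helper names are made up for the example.

```python
import re

L = {"A", "B", "C", "D"}
D = {"1", "2", "3"}

LD = {s + t for s in L for t in D}   # concatenation LD
L2 = {s + t for s in L for t in L}   # L^2 = LL

# Character classes built from the two sets: [ABCD] and [123ABCD].
letter = "[" + "".join(sorted(L)) + "]"
letter_or_digit = "[" + "".join(sorted(L | D)) + "]"

def in_L_LuD(s):
    """Membership in L (L ∪ D): exactly two symbols."""
    return re.fullmatch(letter + letter_or_digit, s) is not None

def in_L_LuD_star(s):
    """Membership in L (L ∪ D)*: a letter, then any letters or digits."""
    return re.fullmatch(letter + letter_or_digit + "*", s) is not None

print(len(LD), len(L2))
print(in_L_LuD("A1"), in_L_LuD("AA2"), in_L_LuD_star("A23"), in_L_LuD_star("31"))
```

Note that a string such as AA2 has three symbols, so it belongs to L (L ∪ D)* but not to L (L ∪ D), and any string containing a symbol outside L ∪ D (for example 5) belongs to neither.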
Regular Expressions
• A regular expression is a set of rules / techniques for constructing sequences of symbols (strings) from an alphabet.

• Let Σ be an alphabet and r a regular expression over Σ.

Then L(r) is the language that is characterized by the rules of r.

Regular Expressions
• Defined over an alphabet Σ

• ε is a regular expression denoting {ε}, the set containing the empty string

• If a is a symbol in Σ, then a is a regular expression denoting {a}, the set containing the string a

• If r and s are regular expressions denoting the languages L(r) and L(s), then:
– (r)|(s) is a regular expression denoting L(r) ∪ L(s)
– (r)(s) is a regular expression denoting L(r)L(s)
– (r)* is a regular expression denoting (L(r))*
– (r) is a regular expression denoting L(r)

• Precedence: * (left associative), then concatenation (left associative), then | (left associative)

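The precedence rules decide how an unparenthesized expression is grouped. Python's `re` module happens to use the same ordering for `*`, concatenation and `|`, so it can serve as a rough check. The sketch below compares patterns on all strings over an assumed alphabet {a, b, c} up to length 4; `same_language` is a hypothetical helper and only tests agreement on those samples, not true equivalence.

```python
import re
from itertools import product

# All strings over {a, b, c} of length 0 to 4.
samples = ["".join(t) for n in range(5) for t in product("abc", repeat=n)]

def same_language(p, q):
    """True if patterns p and q accept exactly the same sample strings."""
    return all(
        (re.fullmatch(p, w) is None) == (re.fullmatch(q, w) is None)
        for w in samples
    )

# * binds tightest, then concatenation, then |,
# so a|bc* groups as (a)|((b)((c)*)).
print(same_language("a|bc*", "(a)|((b)((c)*))"))
print(same_language("a|bc*", "(a|b)c*"))
```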
Regular Expressions
Alphabet = {a, b}
1. a|b denotes {a, b}
2. (a|b)(a|b) denotes {ab, aa, ba, bb}
3. a* denotes {ε, a, aa, …}
4. (a|b)* denotes all strings of a's and b's, including the empty string ε
5. a|a*b denotes the string a, together with all strings of zero or more a's followed by a b

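These examples can be explored by enumerating every string over {a, b} up to a small length and keeping the ones a pattern accepts. This is a sketch using Python's `re.fullmatch`; the function name `language` and the length bound of 3 are assumptions for the example.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=3):
    """Strings over `alphabet` of length <= max_len that `pattern` matches entirely."""
    words = (
        "".join(t)
        for n in range(max_len + 1)
        for t in product(alphabet, repeat=n)
    )
    return {w for w in words if re.fullmatch(pattern, w)}

for p in ["a|b", "(a|b)(a|b)", "a*", "a|a*b"]:
    print(p, sorted(language(p), key=lambda w: (len(w), w)))
```

Because `|` has the lowest precedence, a|a*b is the union of {a} with the language of a*b, so within this bound it contains a, b, ab and aab.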
Algebraic Properties of Regular Expressions

AXIOM                      DESCRIPTION
r|s = s|r                  | is commutative
r|(s|t) = (r|s)|t          | is associative
(rs)t = r(st)              concatenation is associative
r(s|t) = rs|rt             concatenation distributes over |
(s|t)r = sr|tr
εr = r                     ε is the identity element for concatenation
rε = r
r* = (r|ε)*                relation between * and ε
r** = r*                   * is idempotent

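The axioms can be spot-checked on concrete languages. The sketch below represents a language as a finite set of strings and, as a simplifying assumption, discards every string longer than `N` so that the star operation terminates. The names `alt`, `cat`, `star` and the sample languages `r`, `s`, `t` are illustrative; passing checks on one example is evidence, not a proof.

```python
N = 4  # assumed bound: only strings of length <= N are kept

def trunc(lang):
    return frozenset(w for w in lang if len(w) <= N)

def alt(r, s):
    """Language of r|s."""
    return trunc(r | s)

def cat(r, s):
    """Language of rs."""
    return trunc(x + y for x in r for y in s)

def star(r):
    """Language of r*, built by repeated concatenation until nothing new appears."""
    result = frozenset({""})
    while True:
        bigger = result | cat(result, r)
        if bigger == result:
            return result
        result = bigger

EPS = frozenset({""})  # language of ε
r = frozenset({"a"})
s = frozenset({"b", "ab"})
t = frozenset({"ba"})

laws = {
    "| is commutative": alt(r, s) == alt(s, r),
    "| is associative": alt(r, alt(s, t)) == alt(alt(r, s), t),
    "concatenation is associative": cat(cat(r, s), t) == cat(r, cat(s, t)),
    "left distributivity": cat(r, alt(s, t)) == alt(cat(r, s), cat(r, t)),
    "right distributivity": cat(alt(s, t), r) == alt(cat(s, r), cat(t, r)),
    "ε is the identity": cat(EPS, r) == r == cat(r, EPS),
    "r* = (r|ε)*": star(r) == star(alt(r, EPS)),
    "r** = r*": star(star(r)) == star(r),
}
for name, holds in laws.items():
    print(name, holds)
```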
Regular Definitions
• Names may be given to regular expressions; these names can be used like symbols
• Let Σ be an alphabet of basic symbols. A regular definition is a sequence of definitions of the form
$d_1 \to r_1$
$d_2 \to r_2$
...
$d_n \to r_n$
where each $d_i$ is a distinct name, and each $r_i$ is a regular expression over the symbols in $\Sigma \cup \{d_1, d_2, \dots, d_{i-1}\}$

Regular Definitions
• Example 1:
– letter → A | B | … | Z | a | b | … | z
– digit → 0 | 1 | … | 9
– id → letter ( letter | digit )*
• Example 2
– digit → 0 | 1 | 2 | … | 9
– digits → digit digit*
– optional_fraction → . digits | ε
– optional_exponent → ( E ( + | - | ε ) digits ) | ε
– num → digits optional_fraction optional_exponent

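A regular definition maps naturally onto building a pattern from named pieces, where each name may only use names defined before it. The following is a sketch with Python's `re`; the variable names mirror the definitions above (`ident` is used instead of `id` to avoid shadowing a Python built-in), the character classes `[A-Za-z]` and `[0-9]` stand in for the long alternations, and an empty alternative such as `(x|)` plays the role of ε.

```python
import re

# Example 1: identifiers
letter = "[A-Za-z]"   # A | B | ... | Z | a | b | ... | z
digit = "[0-9]"       # 0 | 1 | ... | 9
ident = f"{letter}({letter}|{digit})*"

# Example 2: unsigned numbers
digits = f"{digit}{digit}*"
optional_fraction = rf"(\.{digits}|)"          # . digits | ε
optional_exponent = rf"(E(\+|-|){digits}|)"    # ( E ( + | - | ε ) digits ) | ε
num = f"{digits}{optional_fraction}{optional_exponent}"

for text in ["count1", "1count", "5280", "39.37", "6.336E4", "1.894E-4", "1."]:
    kind = ("id" if re.fullmatch(ident, text)
            else "num" if re.fullmatch(num, text)
            else "no match")
    print(text, kind)
```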
Regular Definitions
• Shorthand
– One or more instances: r+ denotes rr*
– Zero or one Instance: r? denotes r|ε
– Character classes: [a-z] denotes a|b|…|z

Example
• digit → 0 | 1 | 2 | … | 9
• digits → digit+
• optional_fraction → ( . digits )?
• optional_exponent → ( E ( + | - )? digits )?
• num → digits optional_fraction optional_exponent

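The shorthand operators `+`, `?` and character classes exist with the same meaning in Python's `re` syntax, so the definition of num can be transcribed almost symbol for symbol. The sketch below also compares it against a longhand pattern written with only `*`, `|` and an empty alternative for ε; the sample strings and the name `LONGHAND` are assumptions for the example.

```python
import re

digit = "[0-9]"
digits = f"{digit}+"                             # digit+
optional_fraction = rf"(\.{digits})?"            # ( . digits )?
optional_exponent = rf"(E(\+|-)?{digits})?"      # ( E ( + | - )? digits )?
num = digits + optional_fraction + optional_exponent

# The same language without shorthand: rr* for r+, and r| for r?.
LONGHAND = r"[0-9][0-9]*(\.[0-9][0-9]*|)(E(\+|-|)[0-9][0-9]*|)"

samples = ["5280", "39.37", "6.336E4", "1.894E-4", "2E+10",
           "", "1.", ".5", "E4", "1E", "1.5E", "12a"]

agree = all(
    bool(re.fullmatch(num, w)) == bool(re.fullmatch(LONGHAND, w))
    for w in samples
)
print(agree)
```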
Limitations of Regular Expressions
• Some languages cannot be described by any regular expression
• Cannot describe balanced or nested constructs
– Example: all valid strings of balanced parentheses
– This can be done with a context-free grammar (CFG)
• Cannot describe repeated strings
– Example: {wcw | w is a string of a's and b's}
– This cannot be done with a CFG either
• Can be used to denote only a fixed number or an unspecified number of repetitions of a construct; two arbitrary counts cannot be compared for equality.

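To see why balanced parentheses fall outside regular expressions, it helps to look at what a recognizer needs: a counter that can grow without bound, which a finite automaton does not have. The sketch below is an ordinary Python function, not a regular expression; the name `is_balanced` is chosen for illustration. A context-free grammar such as S → ( S ) S | ε describes the same language.

```python
def is_balanced(s):
    """Return True if s consists only of parentheses and they are balanced."""
    depth = 0  # number of currently open parentheses; unbounded in general
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ")" appeared with nothing open
                return False
        else:
            return False       # only "(" and ")" are allowed
    return depth == 0

for text in ["", "()", "(())()", "(()", ")("]:
    print(repr(text), is_balanced(text))
```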
