Specification of Tokens

The document provides an introduction to compilers, focusing on lexical analysis and the specification of tokens using regular expressions. It explains the concepts of strings, languages, and operations on languages, including union, concatenation, and closure. Additionally, it covers regular definitions and their applications in defining languages, such as C identifiers and unsigned numbers.

Uploaded by

Subashini Hari Kumar


UNIT I

INTRODUCTION TO COMPILERS

Lexical Analysis
Specification of Tokens
Strings and Languages
• Regular expressions are an important notation
for specifying lexeme patterns
• Alphabet
– Any finite set of symbols
– Examples:
• The set {0, 1} is the binary alphabet
• ASCII
• Unicode
Strings and Languages
• String (sentence or word)
– A string over an alphabet is a finite sequence of symbols
drawn from that alphabet
– Length of a string s
• Written |s|
• The number of occurrences of symbols in s
• Example:
– banana is a string of length six
• The empty string, denoted ε, is the string of length zero
• Language
– Any countable set of strings over some fixed alphabet
• ∅, the empty set
• {ε} , the set containing only the empty string
Strings and Languages
• Parts of strings
– A prefix of string s is any string obtained by
removing zero or more symbols from the end of s
• Example: ban, banana, and ε are prefixes of banana
– A suffix of string s is any string obtained by
removing zero or more symbols from the
beginning of s
• Example: nana, banana, and ε are suffixes of banana
– A substring of s is obtained by deleting any prefix
and any suffix from s
• Example: banana, nan, and ε are substrings of banana
Strings and Languages
– The proper prefixes, suffixes, and substrings of a
string s are those prefixes, suffixes, and
substrings, respectively, of s that are not ε and not
equal to s itself
– A subsequence of s is any string formed by
deleting zero or more not necessarily consecutive
positions of s
• Example: baan is a subsequence of banana
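The definitions of prefix, suffix, substring, and subsequence can be checked mechanically. The following is a minimal Python sketch; the function names are illustrative assumptions and do not come from the slides.

```python
def prefixes(s: str) -> set[str]:
    """All prefixes of s: remove zero or more symbols from the end."""
    return {s[:i] for i in range(len(s) + 1)}


def suffixes(s: str) -> set[str]:
    """All suffixes of s: remove zero or more symbols from the beginning."""
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s: str) -> set[str]:
    """All substrings of s: delete any prefix and any suffix."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def is_subsequence(t: str, s: str) -> bool:
    """True if t can be formed by deleting zero or more positions of s."""
    remaining = iter(s)
    # 'c in remaining' consumes the iterator up to the first match, so the
    # symbols of t must occur in s in the same relative order.
    return all(c in remaining for c in t)


def proper(parts: set[str], s: str) -> set[str]:
    """Drop ε and s itself, leaving only the proper parts."""
    return parts - {"", s}
```

For example, `prefixes("banana")` contains `"ban"`, `"banana"`, and the empty string, while `proper(prefixes("banana"), "banana")` excludes the last two.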
Strings and Languages
• If x and y are strings, then the concatenation of x
and y, denoted xy, is the string formed by
appending y to x
– Example: If x = dog and y = house, then xy = doghouse
• The empty string is the identity under
concatenation
– That is, for any string s, εs = sε = s
• Exponentiation of strings
– Define s^0 to be ε
– For all i > 0, s^i is s^(i-1) s
• s^1 = s^0 s = εs = s, s^2 = ss, s^3 = sss, and so on
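The recursive definition of string exponentiation translates directly into code. This is a small sketch in Python; the function name `power` is an assumption chosen for illustration.

```python
def power(s: str, i: int) -> str:
    """Return s^i using the recursive definition: s^0 = ε, s^i = s^(i-1) s."""
    if i < 0:
        raise ValueError("exponent must be non-negative")
    if i == 0:
        return ""  # s^0 is the empty string ε
    return power(s, i - 1) + s
```

In Python the built-in expression `s * i` computes the same string; the recursive form is shown only because it mirrors the definition.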
Operations on Languages
• In lexical analysis, the most important
operations on languages are union,
concatenation, and closure
Operations on Languages
• Union of languages
– Strings are found either in the first language or in the second
language
• Concatenation of languages
– Strings formed by taking a string from the first language and a string
from the second language, and concatenating them
• Kleene closure of a language L
– Denoted by L*
– The set of strings obtained by concatenating L zero or more times
• L^0 = {ε}
• L^i = L^(i-1) L
• Positive closure
– Denoted by L+
– Same as Kleene closure, but without the term L^0
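For finite languages these operations can be tried out with Python sets. The sketch below is illustrative only: the function names are assumptions, and because L* is infinite whenever L contains a non-empty string, the two closures are approximated by stopping at a chosen maximum power.

```python
def union(l1: set[str], l2: set[str]) -> set[str]:
    """Strings found in either language."""
    return l1 | l2


def concat(l1: set[str], l2: set[str]) -> set[str]:
    """Every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}


def lang_power(lang: set[str], i: int) -> set[str]:
    """L^i, starting from L^0 = {ε} and applying L^i = L^(i-1) L."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result


def kleene_closure(lang: set[str], max_power: int) -> set[str]:
    """Finite approximation of L*: the union of L^0 through L^max_power."""
    result: set[str] = set()
    for i in range(max_power + 1):
        result |= lang_power(lang, i)
    return result


def positive_closure(lang: set[str], max_power: int) -> set[str]:
    """Finite approximation of L+: the union of L^1 through L^max_power."""
    result: set[str] = set()
    for i in range(1, max_power + 1):
        result |= lang_power(lang, i)
    return result
```

With `lang = {"a"}` and `max_power = 3`, the Kleene closure yields ε, a, aa, and aaa, and the positive closure yields the same set without ε.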
Operations on Languages
• L = {A, B, ..., Z, a, b, ..., z} and D = {0, 1, ..., 9}
– L ∪ D = Set of letters and digits. The language
with 62 strings of length one, each of which
is either one letter or one digit
– LD = Set of 520 strings of length two, each
consisting of one letter followed by one digit
– L^4 = Set of all 4-letter strings
– L* = Set of all strings of letters, including ε, the
empty string
– L(L ∪ D)* = Set of all strings of letters and digits
beginning with a letter
– D+ = Set of all strings of one or more digits
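The counts in this example can be confirmed with a few lines of Python. This is a sketch under the assumption that the letters are the 52 ASCII letters; `starts_with_letter` is a hypothetical helper name used here to test membership in the infinite language of letter-and-digit strings that begin with a letter.

```python
import string

L = set(string.ascii_letters)  # A..Z and a..z, 52 symbols
D = set(string.digits)         # 0..9, 10 symbols

letters_or_digits = L | D                            # union of L and D
letter_then_digit = {x + y for x in L for y in D}    # concatenation LD
four_letter_count = len(L) ** 4                      # size of the fourth power of L


def starts_with_letter(s: str) -> bool:
    """Membership test: one letter followed by any letters or digits."""
    return len(s) >= 1 and s[0] in L and all(c in letters_or_digits for c in s[1:])
```

The sizes follow from counting: 52 + 10 = 62 one-symbol strings in the union, and 52 × 10 = 520 two-symbol strings in the concatenation.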
Regular Expressions
• Regular expressions over some alphabet Σ and the
languages that those expressions denote
– BASIS: There are two rules
• ε is a regular expression, and L(ε) = {ε}, the language whose sole
member is the empty string
• If a is a symbol in Σ, then a is a regular expression, and L(a) = {a},
the language with one string, of length one, consisting of a
– INDUCTION: There are four parts. Suppose r and s are
regular expressions denoting languages L(r) and L(s),
respectively
• (r)|(s) is a regular expression denoting the language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
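The basis and induction rules map naturally onto a small syntax tree with one case per rule. The Python sketch below is an illustration only: the class names `Eps`, `Sym`, `Alt`, `Cat`, and `Star` and the function `language` are assumptions, and since L(r) may be infinite the function returns only the strings up to a given length. Parentheses need no node of their own because (r) denotes the same language as r.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Eps:
    """The regular expression ε."""


@dataclass(frozen=True)
class Sym:
    a: str  # a single symbol of the alphabet


@dataclass(frozen=True)
class Alt:
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Cat:
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Star:
    r: "Regex"


Regex = Union[Eps, Sym, Alt, Cat, Star]


def language(r: Regex, max_len: int) -> set[str]:
    """Return the strings of L(r) whose length is at most max_len."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.a} if max_len >= 1 else set()
    if isinstance(r, Alt):  # union of the two languages
        return language(r.r, max_len) | language(r.s, max_len)
    if isinstance(r, Cat):  # concatenation of the two languages
        left, right = language(r.r, max_len), language(r.s, max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if isinstance(r, Star):  # closure, built up until no new strings appear
        base = language(r.r, max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {
                x + y for x in frontier for y in base if len(x + y) <= max_len
            } - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")
```

For instance, `language(Alt(Sym("a"), Cat(Star(Sym("b")), Sym("c"))), 3)` enumerates the strings of a|b*c that have at most three symbols.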
Regular Expressions
• The unary operator * has highest precedence
and is left associative
• Concatenation has second highest precedence
and is left associative
• | has lowest precedence and is left associative
• Example
– (a)|((b)*(c)) = a|b*c
• Set of strings that are either a single a or are zero or more b's followed by one c
– (b)* = b* = {ε, b, bb, bbb, …}
– (bc)* ≠ bc*, since (bc)* = {ε, bc, bcbc, …} and bc* = {b, bc, bcc, …}
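Python's `re` module uses the same relative precedence for `*`, concatenation, and `|`, so the example can be tried directly. The helper name `accepts` is an assumption; `re.fullmatch` is used so that the whole string, not just a prefix, must belong to the language.

```python
import re


def accepts(pattern: str, text: str) -> bool:
    """True if the whole of text is in the language of pattern."""
    return re.fullmatch(pattern, text) is not None


SAMPLES = ["", "a", "c", "bc", "bbc", "ac", "abc", "bcbc", "bcc"]

# a|b*c and its fully parenthesized form agree on every sample string.
same_parse = all(
    accepts("a|b*c", s) == accepts("(a)|((b)*(c))", s) for s in SAMPLES
)

# (bc)* repeats the pair bc, while bc* repeats only the c.
only_in_grouped = [s for s in SAMPLES if accepts("(bc)*", s) and not accepts("bc*", s)]
only_in_ungrouped = [s for s in SAMPLES if accepts("bc*", s) and not accepts("(bc)*", s)]
```

A finite sample list cannot prove two expressions equivalent in general; it only shows agreement, or a concrete disagreement, on the strings tested.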
Regular Expressions
• Example: Let Σ = {a, b}
– a | b = {a, b}
– (a | b)(a | b) = {aa, ab, ba, bb}
• Another regular expression for the same language is aa | ab | ba | bb
– a* = Set of all strings of zero or more a's
= {ε, a, aa, aaa, ... }
– (a | b)* = Set of all strings of zero or more instances of a or b
= {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, ... }
• Another regular expression for the same language is (a*b*)*
– a | a*b = {a, b, ab, aab, aaab, ... }
• Set consisting of the string a and all strings of zero or more a's
ending in b
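The sets listed in this example can be reproduced by brute force: generate every string over the alphabet up to some length and keep those that the expression matches in full. The Python sketch below assumes a helper named `language` (not from the slides) built on `re.fullmatch` and `itertools.product`.

```python
import re
from itertools import product


def language(pattern: str, alphabet: str, max_len: int) -> set[str]:
    """Strings over alphabet, up to max_len symbols, matched in full by pattern."""
    regex = re.compile(pattern)
    return {
        "".join(word)
        for n in range(max_len + 1)
        for word in product(alphabet, repeat=n)
        if regex.fullmatch("".join(word))
    }
```

Calling `language("a|a*b", "ab", 4)` lists the members of a | a*b that have at most four symbols, which makes the pattern "zero or more a's ending in b" easy to see.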
Regular Expressions
• A language that can be defined by a regular
expression is called a regular set.
• If two regular expressions r and s denote the
same regular set, they are said to be
equivalent (r = s)
Regular Expressions
• There are a number of algebraic laws for regular
expressions
• Consider arbitrary regular expressions r, s, and t
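The standard algebraic laws (commutativity and associativity of |, associativity of concatenation, distributivity of concatenation over |, ε as the identity for concatenation, and idempotence of *) can be spot-checked on concrete expressions. The Python sketch below is an assumption-laden illustration: the sample expressions chosen for r, s, and t, the helper `equivalent`, and the length bound are all arbitrary, and agreement on short strings is evidence for a law rather than a proof of it.

```python
import re
from itertools import product


def equivalent(p: str, q: str, alphabet: str = "ab", max_len: int = 5) -> bool:
    """Bounded check: p and q accept the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            word = "".join(symbols)
            if bool(re.fullmatch(p, word)) != bool(re.fullmatch(q, word)):
                return False
    return True


# Arbitrary sample expressions, parenthesized so they can be substituted safely.
r, s, t = "(ab)", "(b*)", "(a|ba)"
EPS = "()"  # an empty group matches only the empty string ε

LAWS = {
    "| is commutative": (f"{r}|{s}", f"{s}|{r}"),
    "| is associative": (f"{r}|({s}|{t})", f"({r}|{s})|{t}"),
    "concatenation is associative": (f"{r}({s}{t})", f"({r}{s}){t}"),
    "left distributivity": (f"{r}({s}|{t})", f"{r}{s}|{r}{t}"),
    "right distributivity": (f"({s}|{t}){r}", f"{s}{r}|{t}{r}"),
    "ε is a left identity": (f"{EPS}{r}", r),
    "ε is a right identity": (f"{r}{EPS}", r),
    "ε is guaranteed in a closure": (f"{r}*", f"({r}|{EPS})*"),
    "* is idempotent": (f"({r}*)*", f"{r}*"),
}

results = {name: equivalent(p, q) for name, (p, q) in LAWS.items()}
```

The same helper also exposes non-laws: `equivalent("(bc)*", "bc*", alphabet="bc")` is false because the two expressions already disagree on the empty string.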
Regular Definitions
• If Σ is an alphabet of basic symbols, then a regular
definition is a sequence of definitions of the form:
d_1 → r_1
d_2 → r_2
...
d_n → r_n
where:
1. Each d_i is a new symbol, not in Σ and not the same as any
other of the d's, and
2. Each r_i is a regular expression over the alphabet
Σ ∪ {d_1, d_2, ..., d_(i-1)}
Regular Definitions
• Regular definition for the language of C
identifiers
letter_ → A | B | ··· | Z | a | b | ··· | z | _
digit → 0 | 1 | ··· | 9
id → letter_ (letter_ | digit)*
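A regular definition can be turned into a single regular expression by substituting each name into the definitions that follow it. The Python sketch below illustrates this for C identifiers. The `{name}` placeholder syntax, the `expand` function, and the use of Python character classes in place of the long alternations are all assumptions made for the example.

```python
import re

# Definitions in order; a body may refer only to names defined earlier,
# written as {name}. Bodies here must not contain other braces.
DEFINITIONS = [
    ("letter_", "[A-Za-z_]"),
    ("digit", "[0-9]"),
    ("id", "{letter_}({letter_}|{digit})*"),
]


def expand(definitions: list[tuple[str, str]]) -> dict[str, str]:
    """Substitute earlier definitions into later ones, in order."""
    expanded: dict[str, str] = {}
    for name, body in definitions:
        # str.format raises KeyError for a name that is not yet defined,
        # which enforces the rule that a body uses only earlier names.
        expanded[name] = "(?:" + body.format(**expanded) + ")"
    return expanded


ID = re.compile(expand(DEFINITIONS)["id"])
```

With this in place, `ID.fullmatch("_tmp1")` succeeds and `ID.fullmatch("9lives")` returns `None`, since an identifier may not begin with a digit.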
Regular Definitions
• Regular definition for unsigned numbers such as 100.05
digit → 0 | 1 | ··· | 9
digits → digit digit*
optionalFraction → . digits | ε
optionalExponent → (E ( + | - | ε ) digits) | ε
number → digits optionalFraction optionalExponent
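The definition of unsigned numbers can be transcribed almost line by line into Python, with each named definition becoming a string that later definitions embed. This is a sketch, not a definitive implementation: the variable names are assumptions, and ε is written as an empty alternative inside a non-capturing group.

```python
import re

digit = "[0-9]"
digits = f"{digit}{digit}*"
# ". digits | ε": the trailing empty alternative plays the role of ε.
optional_fraction = f"(?:\\.{digits}|)"
# "(E (+ | - | ε) digits) | ε"
optional_exponent = f"(?:E(?:\\+|-|){digits}|)"
number = f"{digits}{optional_fraction}{optional_exponent}"

NUMBER = re.compile(number)


def is_unsigned_number(text: str) -> bool:
    """True if the whole of text matches the number definition."""
    return NUMBER.fullmatch(text) is not None
```

Under this definition a fraction, when present, needs digits on both sides of the point, so `"1."` and `".5"` are rejected while `"100.05"` is accepted.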
Regular Expressions
• One or more instances
– Positive closure (+)
– If r is a regular expression, then (r)+ denotes the language (L(r))+
– r* = r+ | ε and r+ = rr* = r*r
• Zero or one instance
– Operator: ?
– r? = r | ε
– L(r?) = L(r) ∪ {ε}
• Character classes
– A regular expression a_1 | a_2 | ··· | a_n, where the a_i's are each
symbols of the alphabet, can be replaced by the shorthand [a_1a_2···a_n]
– When a_1, a_2, ..., a_n form a logical sequence, it can be replaced by
a_1-a_n
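Each shorthand is defined in terms of the basic operators, and Python's `re` module supports the same `+`, `?`, and `[...]` notation, so the identities can be spot-checked. In the sketch below the sample strings and the helper `agree` are assumptions, and ε is written as an empty alternative.

```python
import re

# Each shorthand on the left is paired with its expansion on the right.
EXPANSIONS = {
    "a+": "aa*",             # r+ = rr*
    "a*": "a+|",             # r* = r+ | ε
    "a?": "a|",              # r? = r | ε
    "[abc]": "a|b|c",        # character class
    "[a-e]": "a|b|c|d|e",    # range within a character class
}

SAMPLES = ["", "a", "aa", "aaa", "b", "c", "e", "f", "ab"]


def agree(shorthand: str, expansion: str) -> bool:
    """True if both patterns accept exactly the same sample strings."""
    return all(
        bool(re.fullmatch(shorthand, s)) == bool(re.fullmatch(expansion, s))
        for s in SAMPLES
    )


all_agree = all(agree(short, full) for short, full in EXPANSIONS.items())
```

Agreement on a handful of samples is only a sanity check; the identities themselves follow from the definitions of the operators.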
Regular Definitions
• Regular definition for the language of C identifiers
letter_ → A | B | ··· | Z | a | b | ··· | z | _
digit → 0 | 1 | ··· | 9
id → letter_ (letter_ | digit)*
• Regular definition for unsigned numbers, using the shorthands
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
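To show how such definitions are used in lexical analysis, the following Python sketch combines the identifier and number patterns into one expression and splits an input string into tokens. It is a hypothetical illustration, not a description of any particular scanner: the token names, the whitespace rule, and the `tokenize` function are assumptions.

```python
import re

# One named group per token class; inner groups are non-capturing so that
# lastgroup always names the alternative that matched.
TOKEN = re.compile(
    r"(?P<number>[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?)"
    r"|(?P<id>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<ws>\s+)"
)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (token name, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()  # every alternative consumes at least one character
    return tokens
```

For the input `"rate_2 6.02E23 x"` this produces an identifier, a number, and another identifier; a character covered by no pattern, such as `+`, raises `ValueError`.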
