2-Patterns, Lexemes, Tokens, Attributes-18-12-2024

The document covers the fundamentals of lexical analysis, including the interaction between scanners and parsers, the definition of tokens, patterns, and lexemes, as well as input buffering techniques. It explains the specification of tokens using regular expressions, transition diagrams, and finite automata, along with operations on languages. Additionally, it discusses regular expressions and their applications in defining patterns for token recognition.

Uploaded by

shukraditya.bose2022


Lexical Analyzer

Topics to be covered
1. Interaction of scanner & parser
2. Token, Pattern & Lexemes
3. Input buffering
4. Specification of tokens
5. Regular expression & Regular definition
6. Transition diagram
7. Hard coding & automatic generation lexical analyzers
8. Finite automata
9. Regular expression to NFA using Thompson's rule
10. Conversion from NFA to DFA using subset construction method
11. DFA optimization
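1. Interaction of scanner & parser
[Figure: Source program → Lexical analyzer → Parser. The lexical analyzer passes a token to the parser, the parser sends "Get next token" back to the lexical analyzer, and the symbol table is drawn beneath them.]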
• Upon receiving a “Get next token” command from the parser, the lexical analyzer reads input characters until it can identify the next token.
• The lexical analyzer also strips out comments and white space (blanks, tabs, and newline characters) from the source program.
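The pull-style interaction described above can be sketched in Python. This is only an illustration under stated assumptions: the `Scanner` class, its `get_next_token` method, the token names and the `//` comment syntax are hypothetical and do not come from the slides.

```python
import re


class Scanner:
    """Hypothetical scanner that hands out one token per request."""

    # Assumed patterns: white space and // comments are skipped, never returned.
    SKIP = re.compile(r"(?:\s+|//[^\n]*)+")
    TOKEN = re.compile(r"(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+\-*/])")

    def __init__(self, source):
        self.source = source
        self.pos = 0

    def get_next_token(self):
        """Read characters until the next token is identified; None at end of input."""
        skipped = self.SKIP.match(self.source, self.pos)
        if skipped:
            self.pos = skipped.end()
        if self.pos >= len(self.source):
            return None
        found = self.TOKEN.match(self.source, self.pos)
        if found is None:
            raise SyntaxError(f"unexpected character {self.source[self.pos]!r}")
        self.pos = found.end()
        return found.lastgroup, found.group()


def parse(source):
    """Stand-in for a parser: it only pulls tokens until the scanner runs dry."""
    scanner, tokens = Scanner(source), []
    while (token := scanner.get_next_token()) is not None:
        tokens.append(token)
    return tokens
```

A real parser would act on each token as it arrives; here `parse` just collects them so the request/response loop is visible.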
Why separate lexical analysis & parsing?
1. Simplicity in design.
2. Improves compiler efficiency.
3. Enhances compiler portability.
2. Token, Pattern & Lexemes
Token
A sequence of characters having a collective meaning is known as a token.
Categories of tokens:
1. Identifier
2. Keyword
3. Operator
4. Special symbol
5. Constant
Pattern
The set of rules associated with a token is called a pattern.
Example: “non-empty sequence of digits”, “letter followed by letters and digits”
Lexeme
The sequence of characters in a source program matched with a pattern for a token is called a lexeme.
Example: Rate, DIET, count, Flag
Example: Token, Pattern & Lexemes
Example: total = sum + 45
Tokens:
total    Identifier1
=        Operator1
sum      Identifier2
+        Operator2
45       Constant1
Lexemes:
Lexemes of identifier: total, sum
Lexemes of operator: =, +
Lexemes of constant: 45
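As a rough illustration of how patterns tie lexemes to tokens, the sketch below groups the lexemes of the example statement under the token whose pattern they match. The `PATTERNS` list and the function name are assumptions made for this example, not part of the slides.

```python
import re

# Assumed patterns, one per token category used in the example above.
PATTERNS = [
    ("Constant", r"\d+"),                     # non-empty sequence of digits
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),  # letter followed by letters and digits
    ("Operator", r"[=+]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in PATTERNS))


def lexemes_by_token(source):
    """Group the lexemes of `source` under the token whose pattern they match."""
    groups = {name: [] for name, _ in PATTERNS}
    for match in MASTER.finditer(source):
        groups[match.lastgroup].append(match.group())
    return groups


print(lexemes_by_token("total = sum + 45"))
```

Characters that match no pattern (here, the blanks) are simply skipped by `finditer`.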
Attributes for Tokens
3. Input buffering
• There are mainly two techniques for input buffering:
1. Buffer pairs
2. Sentinels
Buffer pairs
• The lexical analyzer scans the input string from left to right, one character at a time.
• The buffer is divided into two N-character halves, where N is the number of characters on one disk block.
[Figure: buffer pair holding  : : : E : : = : : Mi : * : : : C : * : * : 2 : eof : : :  with the lexeme_beginning pointer at the start of the current lexeme and the forward pointer scanning ahead of it]
• The lexeme_beginning pointer marks the beginning of the current lexeme.
• The forward pointer scans ahead until a pattern match is found.
• Once the next lexeme is determined, forward is set to the character at its right end.
• lexeme_beginning is then set to the character immediately after the lexeme just found.
• If the forward pointer is at the end of the first buffer half, the second half is filled with N input characters.
• If the forward pointer is at the end of the second buffer half, the first half is filled with N input characters.
Buffer pairs
Code to advance the forward pointer:
if forward at end of first half then begin
    reload second half;
    forward := forward + 1;
end
else if forward at end of second half then begin
    reload first half;
    move forward to beginning of first half;
end
else forward := forward + 1;
Sentinels
[Figure: buffer pair with sentinels  : : E : : = : : Mi : * : eof : C : * : * : 2 : eof : : eof  where each half ends in an eof sentinel; pointers lexeme_beginning and forward]
• With buffer pairs, each time we move the forward pointer we must check that we have not moved off one of the buffers.
• Thus, for each character read, we make two tests: one for the end of the buffer and one for the current character.
• We can reduce the two tests to one if we extend each buffer half to hold a sentinel character at the end; the buffer-end test is then combined with the test for the current character.
• The sentinel is a special character that cannot be part of the source program; a natural choice is the character eof.
Sentinels
Code to advance the forward pointer with sentinels:
forward := forward + 1;
if forward↑ = eof then begin
    if forward at end of first half then begin
        reload second half;
        forward := forward + 1;
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half;
    end
    else terminate lexical analysis;
end
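The sentinel scheme can be tried out with a small Python sketch. It is a simplified model, not the slides' implementation: the class name `SentinelBuffer`, the use of `"\0"` as the eof sentinel, the tiny default half size and the `read_all` helper are all assumptions, and the lexeme_beginning pointer is left out.

```python
import io

EOF = "\0"  # assumed sentinel; it must never occur in the source text


class SentinelBuffer:
    """Two N-character halves, each followed by one sentinel slot."""

    def __init__(self, stream, n=4):
        self.stream, self.n = stream, n
        # Layout: [0..n-1] first half, [n] sentinel,
        #         [n+1..2n] second half, [2n+1] sentinel.
        self.buf = [EOF] * (2 * n + 2)
        self._load(0)
        self.forward = -1

    def _load(self, start):
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            # A short read leaves EOF inside the half, marking the real end.
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        """Advance forward; a single test per character in the common case."""
        self.forward += 1
        if self.buf[self.forward] == EOF:
            if self.forward == self.n:            # end of first half
                self._load(self.n + 1)
                self.forward += 1
            elif self.forward == 2 * self.n + 1:  # end of second half
                self._load(0)
                self.forward = 0
            else:
                return None                        # eof inside a half: input is done
        ch = self.buf[self.forward]
        return None if ch == EOF else ch


def read_all(text, n=4):
    """Helper for experiments: pull every character through the buffer."""
    buffer, chars = SentinelBuffer(io.StringIO(text), n), []
    while (ch := buffer.next_char()) is not None:
        chars.append(ch)
    return "".join(chars)
```

The buffer-end checks run only when the current character is the sentinel, which is the point of the technique.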
4. Specification of tokens
• Specification of tokens depends on the pattern of the lexeme.
• Regular expressions are used to specify the different types of patterns that can actually form tokens.

Strings and Languages

Operation on Languages

Regular Expression

Regular Definition
Strings and Languages
1. String
• An alphabet or character class is a finite set of symbols, denoted Σ.
Example:
  - Σ = {0, 1} is the binary alphabet.
  - ASCII, used on almost every computer, encodes each character as a string over {0, 1}; for example, the letter A is 01000001.
  - Σ = {a, b, …, z} is the set of lower case letters.
• Symbols can be letters, digits and punctuation.
• A string is a finite sequence of symbols drawn from an alphabet Σ.
  - From Σ = {a, b} we can derive any number of strings: a, ab, ba, aab, …
  - From Σ = {0, 1} the possible strings are 0, 01, 1, 10, 100, 010, …
2. Language
• A language is a set (collection) of strings over some fixed alphabet.
  L = {0, 01, 1, 10, 100, 010, …}
  L = {a, b, ab, ba, abb, …}
• ∅, the empty set, and {ε}, the set containing only the empty string, are also languages.
Operations on strings
Length of string
• The length of a string is the number of symbols in the string.
• If the string is represented by the letter s, then |s| represents its length.
  s = banana, |s| = 6
  s = 1100, |s| = 4
Empty string
• The empty string, i.e. the string of length 0, is represented by ε.
• It does not contain any character: |ε| = 0
Strings and languages
Term: Prefix of s
  Definition: a string obtained by removing zero or more trailing symbols of string s.
  e.g., ban is a prefix of banana.
Term: Suffix of s
  Definition: a string obtained by removing zero or more leading symbols of string s.
  e.g., nana is a suffix of banana.
Term: Substring of s
  Definition: a string obtained by removing a prefix and a suffix from s.
  e.g., nan is a substring of banana.
Term: Proper prefix, suffix and substring of s
  Definition: any nonempty string x that is respectively a prefix, suffix or substring of s, such that s ≠ x.
Term: Subsequence of s
  Definition: a string obtained by removing zero or more, not necessarily contiguous, symbols from s.
  e.g., baaa is a subsequence of banana.
TERM, DEFINITION, EXAMPLE
Term: Prefix of s
  Definition: a string obtained by removing zero or more trailing symbols of string s.
  Example: ban is a prefix of banana. For s = abcd, the prefixes are ε, a, ab, abc, abcd.
Term: Suffix of s
  Definition: a string formed by deleting zero or more of the leading symbols of s.
  Example: nana is a suffix of banana. For s = abcd, the suffixes are ε, d, cd, bcd, abcd.
Term: Substring of s
  Definition: a string obtained by deleting a prefix and a suffix from s.
  Example: nan is a substring of banana. For s = banana, substrings include ε, nan, na, anan.
Term: Proper prefix, suffix, or substring of s
  Definition: any nonempty string x that is a prefix, suffix or substring of s such that s ≠ x.
  Example: for s = abcd, proper prefixes: a, ab, abc; proper suffixes: d, cd, bcd; proper substrings include bcd, abc, cd, ab.
Term: Subsequence of s
  Definition: any string formed by deleting zero or more, not necessarily contiguous, symbols from s.
  Example: baaa is a subsequence of banana. For s = abcd, subsequences include abd, bcd, bd.
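These definitions translate almost directly into code. The helper names below are illustrative only; the empty Python string `""` plays the role of ε.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All suffixes of s, from s itself down to the empty string."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s):
    """Every string left after deleting some prefix and some suffix of s."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def proper(parts, s):
    """Keep only the nonempty members that differ from s itself."""
    return [x for x in parts if x and x != s]


def is_subsequence(x, s):
    """True if x can be formed by deleting zero or more symbols of s."""
    remaining = iter(s)
    # `in` on an iterator consumes it, so the symbols must appear in order.
    return all(symbol in remaining for symbol in x)


print(prefixes("abcd"))                    # ['', 'a', 'ab', 'abc', 'abcd']
print(proper(suffixes("abcd"), "abcd"))    # ['bcd', 'cd', 'd']
print(is_subsequence("baaa", "banana"))    # True
```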
Operation on Languages
• Let L be the set {A, B, …, Z, a, b, …, z} and D the set {0, 1, …, 9}.
• L and D can be thought of in two ways:
  - as alphabets: L consists of the upper and lower case letters, and D of the ten decimal digits;
  - as finite languages, since each symbol is also a string of length one.
• New languages that can be created from L and D by applying the operations are:
1. L ∪ D is the set of letters and digits
2. LD is the set of strings consisting of a letter followed by a digit
3. L^4 is the set of all four-letter strings
4. L* is the set of all strings of letters, including ε
5. L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter
6. D+ is the set of all strings of one or more digits
OPERATION, DEFINITION, EXAMPLE
Operation: Union of L and M (L ∪ M)
  Definition: L ∪ M = { s | s is in L or s is in M }
  Example: if L = {a, b} and M = {c, d}, then L ∪ M = {a, b, c, d}
Operation: Concatenation of L and M (LM)
  Definition: LM = { st | s is in L and t is in M }
  Example: if L = {a, b} and M = {c, d}, then L ⋅ M = {ac, ad, bc, bd}
Operation: Kleene closure of L (L*)
  Definition: L* denotes “zero or more concatenations of” L
  Example: if L = {a, b}, then L* = {ε, a, b, aa, ab, ba, bb, aaa, …}
Operation: Positive closure of L (L+)
  Definition: L+ denotes “one or more concatenations of” L
  Example: if L = {a, b}, then L+ = {a, b, aa, ab, ba, bb, aaa, …}
Example: let L = {0, 1} and S = {a, b, c}. Perform L ∪ S, L.S, L*, L+.
1. Union (L ∪ S) = {0, 1, a, b, c}
2. Concatenation (L.S) = {0a, 0b, 0c, 1a, 1b, 1c}
3. Kleene closure (L*) = {ε, 0, 1, 00, …}
4. Positive closure (L+) = {0, 1, 00, …}
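For finite languages the four operations can be tried out with Python sets. Because L* and L+ are infinite, the sketch only enumerates them up to a chosen power; the function names and the `max_power` cut-off are assumptions made for this illustration.

```python
from itertools import product


def union(L, M):
    return L | M


def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {s + t for s, t in product(L, M)}


def power(L, n):
    """L^n: n-fold concatenation of L with itself; L^0 is {""}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def kleene(L, max_power):
    """Finite slice of L*: the union of L^0 .. L^max_power."""
    return set().union(*(power(L, i) for i in range(max_power + 1)))


def positive(L, max_power):
    """Finite slice of L+: the union of L^1 .. L^max_power."""
    return set().union(*(power(L, i) for i in range(1, max_power + 1)))


L, S = {"0", "1"}, {"a", "b", "c"}
print(sorted(union(L, S)))    # ['0', '1', 'a', 'b', 'c']
print(sorted(concat(L, S)))   # ['0a', '0b', '0c', '1a', '1b', '1c']
print(sorted(kleene(L, 2)))   # ['', '0', '00', '01', '1', '10', '11']
```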
Exercise
• Write prefix, suffix, substring, proper prefix, proper suffix and
subsequence of following string:
String: Compiler
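Operations on languages
Operation: Union of L and M, written L ∪ M
  Definition: L ∪ M = { s | s is in L or s is in M }
Operation: Concatenation of L and M, written LM
  Definition: LM = { st | s is in L and t is in M }
Operation: Kleene closure of L, written L*
  Definition: L* = L^0 ∪ L^1 ∪ L^2 ∪ …
Operation: Positive closure of L, written L+
  Definition: L+ = L^1 ∪ L^2 ∪ L^3 ∪ …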
5. Regular Expression & Regular Definition
Regular expression
• A regular expression is a sequence of characters that defines a pattern.
Notational shorthands
1. One or more instances: +
2. Zero or more instances: *
3. Zero or one instance: ?
4. Alphabet: Σ
Rules to define regular expression
1. ε is a regular expression that denotes {ε}, the set containing the empty string.
2. If a is a symbol in Σ, then a is a regular expression that denotes {a}.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,
   a. (r)|(s) is a regular expression denoting L(r) ∪ L(s)
   b. (r)(s) is a regular expression denoting L(r)L(s)
   c. (r)* is a regular expression denoting (L(r))*
   d. (r) is a regular expression denoting L(r)
The language denoted by a regular expression is said to be a regular set.
Regular expression
• L = zero or more occurrences of a = a*
  a* = {ε, a, aa, aaa, aaaa, aaaaa, …} (infinite)
Regular expression
• L = one or more occurrences of a = a+
  a+ = {a, aa, aaa, aaaa, aaaaa, …} (infinite)
Precedence and associativity of operators (1 is the highest precedence)
Operator         Precedence   Associativity
Kleene *         1            left
Concatenation    2            left
Union |          3            left
Regular expression examples
1. 0 or 1
   Strings: 0, 1   R.E. = 0 | 1
2. 0 or 11 or 111
   Strings: 0, 11, 111   R.E. = 0 | 11 | 111
3. String having zero or more a
   Strings: ε, a, aa, aaa, aaaa, …   R.E. = a*
4. String having one or more a
   Strings: a, aa, aaa, aaaa, …   R.E. = a+
5. Regular expression over Σ = {a, b, c} that represents all strings of length 3
   Strings: abc, bca, bbb, cab, aba, …   R.E. = (a|b|c)(a|b|c)(a|b|c)
6. All binary strings
   Strings: 0, 11, 101, 10101, 1111, …   R.E. = (0|1)+
Regular expression examples
7. 0 or more occurrences of either a or b or both
   Strings: ε, a, aa, abab, bab, …   R.E. = (a|b)*
8. 1 or more occurrences of either a or b or both
   Strings: a, aa, abab, bab, bbbaaa, …   R.E. = (a|b)+
9. Binary no. ends with 0
   Strings: 0, 10, 100, 1010, 11110, …   R.E. = (0|1)*0
10. Binary no. ends with 1
   Strings: 1, 101, 1001, 10101, …   R.E. = (0|1)*1
11. Binary no. starts and ends with 1
   Strings: 11, 101, 1001, 10101, …   R.E. = 1(0|1)*1
12. String starts and ends with same character
   Strings: 00, 101, aba, baab, …
Regular expression examples
13. All strings of a and b starting with a
   Strings: …   R.E. = a(a|b)*
14. String of 0 and 1 ends with 00
   Strings: …   R.E. = (0|1)*00
15. String ends with abb
   Strings: …   R.E. = (a|b)*abb
16. String starts with 1 and ends with 0
   Strings: …   R.E. = 1(0|1)*0
17. All binary strings with at least 3 characters whose 3rd character is zero
   Strings: …   R.E. = (0|1)(0|1)0(0|1)*
18. Language which consists of exactly two b’s over the set Σ = {a, b}
   Strings: …   R.E. = a*ba*ba*
Regular expression examples
19. The language over Σ = {a, b} such that the 3rd character from the right end of the string is always a
   Strings: …   R.E. = (a|b)*a(a|b)(a|b)
20. Any no. of a followed by any no. of b followed by any no. of c
   Strings: …   R.E. = a*b*c*
21. String should contain at least three 1s
   Strings: …   R.E. = (0|1)*1(0|1)*1(0|1)*1(0|1)*
22. String should contain exactly two 1s
   Strings: …   R.E. = 0*10*10*
23. Length of string should be at least 1 and at most 3
   Strings: …   R.E. = (0|1) | (0|1)(0|1) | (0|1)(0|1)(0|1)
24. No. of zeros should be a multiple of 3
   Strings: …   R.E. = (1*01*01*0)*1*
Regular expression examples
25. The language over Σ = {a, b, c} where the number of a’s should be a multiple of 3
   Strings: aaa, baaa, bacaba, aaaaaa, …   R.E. = ((b|c)*a(b|c)*a(b|c)*a)*(b|c)*
26. Even no. of 0
   Strings: …   R.E. = (1*01*0)*1*
27. String should have odd length
   Strings: …   R.E. = (0|1)((0|1)(0|1))*
28. String should have even length
   Strings: …   R.E. = ((0|1)(0|1))*
29. String starts with 0 and has odd length
   Strings: …   R.E. = 0((0|1)(0|1))*
30. String starts with 1 and has even length
   Strings: …   R.E. = 1(0|1)((0|1)(0|1))*
31. All strings that begin or end with 00 or 11
   Strings: 00101, 10100, 110, 01011, …   R.E. = (00|11)(0|1)* | (0|1)*(00|11)
Regular expression examples
32. Language of all strings containing both 11 and 00 as substrings
   Strings: 0011, 1100, 100110, 010011, …   R.E. = (0|1)*00(0|1)*11(0|1)* | (0|1)*11(0|1)*00(0|1)*
33. String ending with 1 and not containing 00
   Strings: 011, 1101, 1011, …   R.E. = (1|01)+
34. Language of C identifier
   Strings: area, i, redious, grade1, …   R.E. = (_ | L)(_ | L | D)*
   where L is a letter and D is a digit
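One way to gain confidence in expressions like these is to compare each one with a plain-language predicate on every short string. The sketch below does that with Python's `re` module; the `CASES` table, the helper names and the length bound are assumptions for illustration, and only a handful of the examples are covered.

```python
import re
from itertools import product


def all_strings(alphabet, max_len):
    """Every string over `alphabet` of length 0 .. max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


# (description, alphabet, regular expression, predicate stating the same property)
CASES = [
    ("ends with 00", "01", r"(0|1)*00", lambda s: s.endswith("00")),
    ("exactly two b's", "ab", r"a*ba*ba*", lambda s: s.count("b") == 2),
    ("3rd from right is a", "ab", r"(a|b)*a(a|b)(a|b)",
     lambda s: len(s) >= 3 and s[-3] == "a"),
    ("odd length", "01", r"(0|1)((0|1)(0|1))*", lambda s: len(s) % 2 == 1),
    ("ends with 1, no 00", "01", r"(1|01)+",
     lambda s: s.endswith("1") and "00" not in s),
]


def mismatches(alphabet, pattern, predicate, max_len=6):
    """Strings on which the expression and the predicate disagree."""
    rx = re.compile(pattern)
    return [s for s in all_strings(alphabet, max_len)
            if bool(rx.fullmatch(s)) != predicate(s)]


for name, alphabet, pattern, predicate in CASES:
    print(name, mismatches(alphabet, pattern, predicate))
```

An empty list means the expression agrees with the predicate on all strings up to the bound; it is a sanity check, not a proof.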

Regular definition
• A regular definition gives names to certain regular expressions and uses those names in other regular expressions.
• A regular definition is a sequence of definitions of the form:
  d1 → r1
  d2 → r2
  ……
  dn → rn
  where each di is a distinct name and each ri is a regular expression.
• Example: regular definition for identifier
  letter → A|B|C|…|Z|a|b|…|z
  digit → 0|1|…|9
  id → letter (letter | digit)*
Regular definition example
• Example: unsigned Pascal numbers
  3
  5280
  39.37
  6.336E4
  1.894E-4
  2.56E+7
Regular definition
  digit → 0|1|…|9
  digits → digit digit*
  optional_fraction → . digits | ε
  optional_exponent → (E (+|-|ε) digits) | ε
  num → digits optional_fraction optional_exponent
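The regular definition can be mirrored name by name in Python by building each pattern from the ones before it. This is a sketch using Python's `re` syntax; the variable names follow the definition, while `is_unsigned_number` is a hypothetical helper.

```python
import re

# Each name is defined in terms of the earlier ones, as in the regular definition.
digit = r"[0-9]"
digits = rf"{digit}{digit}*"
optional_fraction = rf"(\.{digits})?"        # . digits | ε
optional_exponent = rf"(E[+-]?{digits})?"    # (E (+|-|ε) digits) | ε
num = re.compile(rf"{digits}{optional_fraction}{optional_exponent}")


def is_unsigned_number(text):
    """True if the whole text is a lexeme of num."""
    return num.fullmatch(text) is not None


for sample in ["3", "5280", "39.37", "6.336E4", "1.894E-4", "2.56E+7", "1.", "E4"]:
    print(sample, is_unsigned_number(sample))
```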
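6. Transition Diagram
• A stylized flowchart is called a transition diagram.
• Notation: a circle is a state; an arrow between circles is a transition; the circle entered by the arrow labelled "start" is the start state; a double circle is a final state.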
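Transition diagram: relational operator
[Figure: transition diagram for relational operators, start state 0]
  state 0, on <: go to state 1
  state 1, on =: go to state 2, return (relop, LE)
  state 1, on >: go to state 3, return (relop, NE)
  state 1, on other: go to state 4, return (relop, LT)
  state 0, on =: go to state 5, return (relop, EQ)
  state 0, on >: go to state 6
  state 6, on =: go to state 7, return (relop, GE)
  state 6, on other: go to state 8, return (relop, GT)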
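A transition diagram like this one maps naturally onto nested conditionals. The sketch below follows the state numbers of the diagram; the function name, the returned tuple shape and the choice not to consume the character on an "other" edge are assumptions for the example.

```python
def relop(text, pos=0):
    """Return ((token, attribute), next_pos), or None if no relop starts at pos."""
    def peek(i):
        return text[i] if i < len(text) else ""

    first = peek(pos)                          # state 0
    if first == "<":                           # state 1
        second = peek(pos + 1)
        if second == "=":
            return ("relop", "LE"), pos + 2    # state 2
        if second == ">":
            return ("relop", "NE"), pos + 2    # state 3
        return ("relop", "LT"), pos + 1        # state 4: "other" is not consumed
    if first == "=":
        return ("relop", "EQ"), pos + 1        # state 5
    if first == ">":                           # state 6
        if peek(pos + 1) == "=":
            return ("relop", "GE"), pos + 2    # state 7
        return ("relop", "GT"), pos + 1        # state 8: "other" is not consumed
    return None


print(relop("<= b"))   # (('relop', 'LE'), 2)
print(relop("a>b", 1)) # (('relop', 'GT'), 2)
```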
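Transition diagram: unsigned number
[Figure: transition diagram for unsigned numbers with states 1 to 8: from the start state, a run of digits, an optional "." followed by digits, an optional E followed by an optional + or - and digits, then "other" into the final state]
Example inputs:
  3
  5280
  39.37
  1.894 E - 4
  2.56 E + 7
  45 E + 6
  96 E 2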
7. Hard coding & automatic generation lexical analyzers
• Lexical analysis is about identifying patterns in the input.
• To recognize a pattern, a transition diagram is constructed and coded by hand. This is known as a hard-coded lexical analyzer.
• Example: to represent an identifier in ‘C’, the first character must be a letter and the other characters are either letters or digits.
• To recognize this pattern, a hard-coded lexical analyzer works from a transition diagram.
• An automatically generated lexical analyzer takes a special notation as input.
• For example, the lex compiler tool takes regular expressions as input and finds the lexemes that match them.
[Figure: transition diagram for identifiers: start → state 1, on letter go to state 2, loop on state 2 labelled "letter or digit", then state 3]
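The contrast between the two approaches can be sketched for the identifier pattern. The first function walks the diagram by hand; the second lets a regular expression engine do the matching. Python's `re` module is used here only as a stand-in for a generator such as lex, and both function names are hypothetical.

```python
import re
import string

LETTERS = string.ascii_letters
DIGITS = string.digits


def scan_identifier(text, pos=0):
    """Hard-coded walk through the diagram; returns the lexeme or None."""
    if pos >= len(text) or text[pos] not in LETTERS:              # state 1: need a letter
        return None
    end = pos + 1
    while end < len(text) and text[end] in LETTERS + DIGITS:      # state 2: letter or digit
        end += 1
    return text[pos:end]                                          # state 3: accept


# "Generated" variant: the pattern is data, the matching engine is reused.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def scan_identifier_generated(text, pos=0):
    found = IDENTIFIER.match(text, pos)
    return found.group() if found else None


print(scan_identifier("count1 = 0"))            # count1
print(scan_identifier_generated("count1 = 0"))  # count1
```

Changing the token's definition means editing control flow in the first version but only the pattern string in the second.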
8. Finite Automata
• Finite automata are recognizers.
• An FA simply says “Yes” or “No” about each possible input string.
• A finite automaton is a mathematical model consisting of:
1. Set of states
2. Set of input symbols
3. A transition function move
4. Initial state
5. Final states or accepting states
Types of finite automata
• Types of finite automata are:
  - Deterministic finite automata (DFA): have, for each state, exactly one edge leaving for each symbol.
  - Nondeterministic finite automata (NFA): there are no restrictions on the edges leaving a state. There can be several with the same symbol as label, and some edges can be labeled with ε.
[Figure: a DFA and an NFA, each drawn with states 1, 2, 3, 4 and edges labelled a and b]
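To see what nondeterminism means operationally, an NFA can be simulated by tracking the set of all states it might be in. The transition table below is an assumed NFA for (a|b)*abb with states 1 to 4, chosen for illustration; ε edges are not handled in this sketch.

```python
# Assumed NFA for (a|b)*abb: state 1 loops on a and b, state 4 is accepting.
NFA = {
    (1, "a"): {1, 2},   # two edges labelled a leave state 1
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}


def nfa_accepts(text, start=1, accepting=frozenset({4})):
    """Track every state the NFA could be in after each input symbol."""
    current = {start}
    for symbol in text:
        current = set().union(*(NFA.get((state, symbol), set()) for state in current))
    return bool(current & accepting)


for sample in ["abb", "aabb", "babb", "ab", "abba"]:
    print(sample, nfa_accepts(sample))
```

A DFA needs only a single current state, because each state has exactly one edge per symbol.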
9. DFA optimization
Input: a DFA M with set of states S, accepting states F and start state s0. Output: a DFA M' with as few states as possible.
1. Construct an initial partition Π of the set of states with two groups: the accepting states F and the non-accepting states S − F.
2. Apply the repartition procedure to Π to construct a new partition Πnew.
3. If Πnew = Π, let Πfinal = Π and continue with step (4). Otherwise, repeat step (2) with Π = Πnew.
Repartition procedure:
for each group G of Π do begin
    partition G into subgroups such that two states s and t
    of G are in the same subgroup if and only if for all
    input symbols a, states s and t have transitions on a
    to states in the same group of Π.
    replace G in Πnew by the set of all subgroups formed.
end
DFA optimization
4. Choose one state in each group of the partition Πfinal as the representative for that group. The representatives will be the states of M'. Let s be a representative state, and suppose on input a there is a transition of M from s to t. Let r be the representative of t's group. Then M' has a transition from s to r on a. Let the start state of M' be the representative of the group containing the start state s0 of M, and let the accepting states of M' be the representatives that are in F.
5. If M' has a dead state d, then remove d from M'. Also remove any state not reachable from the start state.
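DFA optimization
Transition table of the DFA with states {A, B, C, D, E}:
States   a   b
A        B   C
B        B   D
C        B   C
D        B   E
E        B   C
• Initial partition: nonaccepting states {A, B, C, D}, accepting states {E}.
• On input b, D goes to E while A, B and C stay inside {A, B, C, D}, so D is split off: {A, B, C} {D} {E}.
• On input b, B goes to D while A and C go to C, so B is split off: {A, C} {B} {D} {E}.
• Now no more splitting is possible.
• If we choose A as the representative for group (AC), then we obtain the reduced transition table.
Optimized transition table:
States   a   b
A        B   A
B        B   D
D        B   E
E        B   A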
10. Conversion from regular expression to DFA
Rules to compute nullable, firstpos, lastpos
• nullable(n)
  • True if the subtree at node n generates a language that includes the empty string.
• firstpos(n)
  • The set of positions that can match the first symbol of a string generated by the subtree at node n.
• lastpos(n)
  • The set of positions that can match the last symbol of a string generated by the subtree at node n.
• followpos(i)
  • The set of positions that can follow position i in the tree.
Rules to compute nullable, firstpos, lastpos
Node n: a leaf labeled ε
  nullable(n) = true
  firstpos(n) = ∅
  lastpos(n) = ∅
Node n: a leaf with position i
  nullable(n) = false
  firstpos(n) = {i}
  lastpos(n) = {i}
Node n: c1 | c2
  nullable(n) = nullable(c1) or nullable(c2)
  firstpos(n) = firstpos(c1) ∪ firstpos(c2)
  lastpos(n) = lastpos(c1) ∪ lastpos(c2)
Node n: c1 . c2
  nullable(n) = nullable(c1) and nullable(c2)
  firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
  lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
Node n: c1*
  nullable(n) = true
  firstpos(n) = firstpos(c1)
  lastpos(n) = lastpos(c1)
Rules to compute followpos
1. If n is a concatenation node with left child c1 and right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a * node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
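The rules for nullable, firstpos, lastpos and followpos can be applied in one bottom-up pass over a syntax tree. The sketch below does this for (a|b)*abb#; the tuple encoding of tree nodes and the helper names are assumptions for the example, and ε leaves are left out for brevity.

```python
from collections import defaultdict

# Assumed node encoding: ("leaf", symbol, position), ("or", c1, c2),
# ("cat", c1, c2), ("star", c1).


def analyze(node, followpos):
    """Return (nullable, firstpos, lastpos) of node and fill in followpos."""
    kind = node[0]
    if kind == "leaf":
        return False, {node[2]}, {node[2]}
    if kind == "or":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], followpos)
        n2, f2, l2 = analyze(node[2], followpos)
        for i in l1:                       # followpos rule 1
            followpos[i] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f1, l1 = analyze(node[1], followpos)
        for i in l1:                       # followpos rule 2
            followpos[i] |= f1
        return True, f1, l1
    raise ValueError(f"unknown node kind: {kind}")


def cat(*parts):
    """Left-nested chain of concatenation nodes."""
    node = parts[0]
    for part in parts[1:]:
        node = ("cat", node, part)
    return node


tree = cat(
    ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))),
    ("leaf", "a", 3), ("leaf", "b", 4), ("leaf", "b", 5), ("leaf", "#", 6),
)
followpos = defaultdict(set)
nullable, firstpos, lastpos = analyze(tree, followpos)
print(nullable, firstpos, lastpos)   # False {1, 2, 3} {6}
print(dict(followpos))
```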
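Conversion from regular expression to DFA
Regular expression: (a|b)*abb#
Step 1: Construct syntax tree
[Figure: syntax tree for (a|b)*abb#: a chain of concatenation (.) nodes; the lowest one has the * node as left child, whose child is the | node with leaves a (position 1) and b (position 2); the right children of the concatenation nodes, from bottom to top, are the leaves a (3), b (4), b (5) and # (6)]
Step 2: Nullable node
Here, * is the only nullable node.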
Conversion from regular expression to DFA
Step 3: Calculate firstpos
Rules used:
  A leaf with position i: {i}
  n = c1 | c2: firstpos(c1) ∪ firstpos(c2)
  n = c1*: firstpos(c1)
  n = c1 . c2: if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
Firstpos of each node:
  Leaves: a (1) = {1}, b (2) = {2}, a (3) = {3}, b (4) = {4}, b (5) = {5}, # (6) = {6}
  | node = {1,2}
  * node = {1,2}
  Every concatenation node, up to the root = {1,2,3}
Conversion from regular expression to DFA
Step 3: Calculate lastpos
Rules used:
  A leaf with position i: {i}
  n = c1 | c2: lastpos(c1) ∪ lastpos(c2)
  n = c1*: lastpos(c1)
  n = c1 . c2: if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
Lastpos of each node:
  Leaves: a (1) = {1}, b (2) = {2}, a (3) = {3}, b (4) = {4}, b (5) = {5}, # (6) = {6}
  | node = {1,2}
  * node = {1,2}
  Concatenation nodes, from the lowest to the root = {3}, {4}, {5}, {6}
Conversion from regular expression to DFA
Step 4: Calculate followpos
Concatenation nodes (for each i in lastpos(c1), add firstpos(c2) to followpos(i)), from the root downwards:
  lastpos(c1) = {5}, firstpos(c2) = {6}: followpos(5) = {6}
  lastpos(c1) = {4}, firstpos(c2) = {5}: followpos(4) = {5}
  lastpos(c1) = {3}, firstpos(c2) = {4}: followpos(3) = {4}
  lastpos(c1) = {1,2}, firstpos(c2) = {3}: followpos(1) and followpos(2) contain 3
Star node n (for each i in lastpos(n), add firstpos(n) to followpos(i)):
  lastpos(n) = {1,2}, firstpos(n) = {1,2}: followpos(1) = followpos(2) = {1,2,3}
Position   followpos
5          6
4          5
3          4
2          1,2,3
1          1,2,3
Conversion from regular expression to DFA
Initial state = firstpos of root = {1,2,3} ----- A
State A
δ((1,2,3), a) = followpos(1) ∪ followpos(3) = (1,2,3) ∪ (4) = {1,2,3,4} ----- B
δ((1,2,3), b) = followpos(2) = (1,2,3) ----- A
State B
δ((1,2,3,4), a) = followpos(1) ∪ followpos(3) = (1,2,3) ∪ (4) = {1,2,3,4} ----- B
δ((1,2,3,4), b) = followpos(2) ∪ followpos(4) = (1,2,3) ∪ (5) = {1,2,3,5} ----- C
State C
δ((1,2,3,5), a) = followpos(1) ∪ followpos(3) = (1,2,3) ∪ (4) = {1,2,3,4} ----- B
δ((1,2,3,5), b) = followpos(2) ∪ followpos(5) = (1,2,3) ∪ (6) = {1,2,3,6} ----- D
State D
δ((1,2,3,6), a) = followpos(1) ∪ followpos(3) = (1,2,3) ∪ (4) = {1,2,3,4} ----- B
δ((1,2,3,6), b) = followpos(2) = (1,2,3) ----- A
States           a   b
A = {1,2,3}      B   A
B = {1,2,3,4}    B   C
C = {1,2,3,5}    B   D
D = {1,2,3,6}    B   A
D contains position 6 (the end marker #), so D is the accepting state.
[Figure: DFA with states A, B, C, D and the a and b transitions listed in the table]
DFA
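The state-by-state construction can be automated once followpos is known. The sketch below takes the followpos table and position symbols for (a|b)*abb# as fixed input data and builds the transition table; the function names and the use of `frozenset` for DFA states are assumptions for the example.

```python
# Input data for (a|b)*abb#: followpos table and the symbol at each position.
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
START = frozenset({1, 2, 3})          # firstpos of the root


def build_dfa(start, followpos, symbol_at, alphabet="ab"):
    """Each DFA state is a set of positions; explore states until none are new."""
    table, todo = {}, [start]
    while todo:
        state = todo.pop()
        if state in table:
            continue
        table[state] = {}
        for a in alphabet:
            # Union of followpos(p) for every position p in the state labelled a.
            target = frozenset().union(
                *(followpos[p] for p in state if symbol_at[p] == a))
            table[state][a] = target
            todo.append(target)
    return table


def accepts(table, start, text, end_position=6):
    """Run the DFA; a state is accepting if it contains the position of #."""
    state = start
    for symbol in text:
        state = table[state][symbol]   # assumes text uses only symbols of the alphabet
    return end_position in state


TABLE = build_dfa(START, FOLLOWPOS, SYMBOL_AT)
for state, row in sorted(TABLE.items(), key=lambda item: sorted(item[0])):
    print(sorted(state), {a: sorted(t) for a, t in row.items()})
print(accepts(TABLE, START, "aabb"), accepts(TABLE, START, "abba"))
```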
Conversion from regular expression to DFA
Construct DFA for following regular expression:
1. (c | d)*c#
Thank You
