
Lecture3 E

The document discusses the role of the lexical analyzer in compilers, which involves reading source program characters, grouping them into lexemes, and producing tokens for syntax analysis. It details the interaction between the lexical analyzer and the parser, the processes of scanning and lexical analysis, and the importance of separating these tasks for simplicity and efficiency. Additionally, it covers concepts such as tokens, patterns, lexemes, error recovery, regular expressions, and the recognition of tokens.

Uploaded by

mimise6572


Lexical Analysis

The Role of the Lexical Analyzer

• The main task of the lexical analyzer is to read the input
characters of the source program, group them into lexemes,
and produce as output a sequence of tokens, one for each
lexeme in the source program.
• The stream of tokens is sent to the parser for syntax
analysis.
• It is common for the lexical analyzer to interact with the
symbol table as well.
• When the lexical analyzer discovers a lexeme constituting
an identifier, it needs to enter that lexeme into the symbol
table.
• In some cases, information regarding the kind of identifier
may be read from the symbol table by the lexical analyzer
to assist it in determining the proper token it must pass to
the parser.
Figure : Interaction between the lexical analyzer & the parser
• Other tasks:
o Stripping out comments and whitespace (blank, newline, tab, and perhaps other characters that are used to separate tokens in the input).
o Correlating error messages generated by the compiler with the source program.
• Lexical analyzers are sometimes divided into a cascade of two processes:
1. Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments and compaction of consecutive whitespace characters into one.
2. Lexical analysis proper is the more complex portion, where the scanner produces the sequence of tokens as output.
Lexical Analysis Versus Parsing
• Why is the analysis portion of a compiler normally separated into lexical analysis and parsing (syntax analysis) phases?
1. Simplicity of design is the most important consideration.
o The separation of lexical & syntactic analysis often allows us to simplify at least one of these tasks.
o If we are designing a new language, separating lexical & syntactic concerns can lead to a cleaner overall language design.
2. Compiler efficiency is improved.
o A separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing.
o In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly.
3. Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to the lexical analyzer.
Tokens, Patterns & Lexemes
• Token
o A token is a pair consisting of a token name and an optional attribute value.
o The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier.
o The token names are the input symbols that the parser processes.
• Pattern
o A pattern is a description of the form that the lexemes of a token may take.
o In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword.
o For identifiers & some other tokens, the pattern is a more complex structure that is matched by many strings.
• Lexeme
o A lexeme is a sequence of characters in the source program that matches the pattern for a token & is identified by the lexical analyzer as an instance of that token.
Example: Patterns & Lexemes

• C statement
printf("Total = %d\n", score);
• Both printf and score are lexemes matching the pattern for token id, and "Total = %d\n" is a lexeme matching literal.
Token classes that cover most or all of the tokens
1. One token for each keyword. The pattern for a keyword is the same as the keyword itself.
2. Tokens for the operators, either individually or in classes.
3. One token representing all identifiers.
4. One or more tokens representing constants, such as numbers & literal strings.
5. Tokens for each punctuation symbol, such as left & right parentheses, comma, & semicolon.
Attributes for Tokens
• When more than one lexeme can match a pattern, the
lexical analyzer must provide the subsequent compiler
phases additional information about the particular lexeme
that matched.
• For example, the pattern for token number matches both
0 and 1, but it is extremely important for the code
generator to know which lexeme was found in the source
program.
• Thus, in many cases the lexical analyzer returns to the parser not only a token name, but also an attribute value that describes the lexeme represented by the token.
• The token name influences parsing decisions, while the attribute value influences translation of tokens after the parse.
Attributes for Tokens
• Tokens have at most one associated attribute,
although this attribute may have a structure
that combines several pieces of information.
• Normally, information about an identifier (e.g., its lexeme, its type, and the location at which it is first found) is kept in the symbol table.
• Thus, the appropriate attribute value for an identifier is a pointer to the symbol-table entry for that identifier.
Example 3.2: The token names & associated attribute values for the Fortran statement
E = M * C ** 2
• Sequence of pairs:
<id, pointer to symbol-table entry for E>
<assign_op> (no attribute value needed)
<id, pointer to symbol-table entry for M>
<mult_op> (no attribute value needed)
<id, pointer to symbol-table entry for C>
<exp_op> (no attribute value needed)
<number, integer value 2>
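A minimal sketch of how a scanner might produce the pairs of Example 3.2. The token patterns, the function name `tokenize`, and the use of a plain Python list as the symbol table (so a "pointer" is just a list index) are assumptions made for illustration, not the textbook's implementation.

```python
import re

# Hypothetical token patterns, tried in order; "**" must come before "*".
TOKEN_SPEC = [
    ("ws", r"[ \t\n]+"),
    ("exp_op", r"\*\*"),
    ("mult_op", r"\*"),
    ("assign_op", r"="),
    ("number", r"[0-9]+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source, symtab):
    """Return (token name, attribute) pairs; symtab is a list of lexemes."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"no token pattern matches at position {pos}")
        name, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if name == "ws":
            continue  # whitespace is stripped, never passed to the parser
        if name == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)
            pairs.append((name, symtab.index(lexeme)))  # "pointer" = index
        elif name == "number":
            pairs.append((name, int(lexeme)))
        else:
            pairs.append((name, None))  # operators carry no attribute
    return pairs
```

With `symtab = []`, `tokenize("E = M * C ** 2", symtab)` returns `[("id", 0), ("assign_op", None), ("id", 1), ("mult_op", None), ("id", 2), ("exp_op", None), ("number", 2)]` and leaves `symtab == ["E", "M", "C"]`.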
Lexical Errors
• It is hard for a lexical analyzer to tell, without the
aid of other components, that there is a source-
code error.
• For instance, if the string fi is encountered for the
first time in a C program in the context :
fi ( a == f (x) ) . . .
• A lexical analyzer cannot tell whether fi is a
misspelling of the keyword if or an undeclared
function identifier.
• Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser.
Lexical Errors
• Some other phase of the compiler (probably the parser) can then handle an error due to transposition of the letters.
• However, suppose a situation arises in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input.
• Error Recovery
o The simplest recovery strategy is "panic mode" recovery.
o We delete successive characters from the remaining input, until the lexical analyzer can find a well-formed token at the beginning of what input is left.
o This recovery technique may confuse the parser, but in an interactive computing environment it may be quite adequate.
• Other possible error-recovery actions:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.
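Panic-mode recovery can be sketched as follows. The pattern set and the names `match_prefix` and `scan_with_panic_mode` are hypothetical; the point is the inner loop that deletes characters until some token pattern matches at the start of the remaining input.

```python
import re

# Hypothetical token patterns for a tiny language (illustration only).
PATTERNS = [
    ("ws", re.compile(r"[ \t\n]+")),
    ("id", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("number", re.compile(r"[0-9]+")),
    ("relop", re.compile(r"<=|>=|==|<|>")),
]


def match_prefix(text, pos):
    """Return (token name, end) for the first pattern matching at pos, else None."""
    for name, pattern in PATTERNS:
        m = pattern.match(text, pos)
        if m:
            return name, m.end()
    return None


def scan_with_panic_mode(text):
    """Tokenize text; on a lexical error, delete characters in panic mode."""
    tokens, deleted = [], []
    pos = 0
    while pos < len(text):
        hit = match_prefix(text, pos)
        if hit is None:
            # Panic mode: drop successive characters until a well-formed
            # token can be found at the beginning of what input is left.
            start = pos
            while pos < len(text) and match_prefix(text, pos) is None:
                pos += 1
            deleted.append(text[start:pos])
            continue
        name, end = hit
        if name != "ws":
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens, deleted
```

For the input `"x1 @# <= 42"` the sketch deletes `"@#"` and still returns the tokens `id`, `relop`, and `number`.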
Terms for Parts of Strings
1. A prefix of string s is any string obtained by removing zero or more symbols from the end of s.
• ban, banana, and ε are prefixes of banana.
2. A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s.
• nana, anana, and ε are suffixes of banana.
3. A substring of s is obtained by deleting any prefix and any suffix from s.
• banana, nan, and ε are substrings of banana.
4. The proper prefixes, suffixes, and substrings of a string s are those prefixes, suffixes, and substrings, respectively, of s that are not ε and not equal to s itself.
5. A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s.
• baan is a subsequence of banana.
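The definitions above translate directly into Python set comprehensions. A small sketch; the function names are illustrative only, and the empty Python string `""` plays the role of ε.

```python
def prefixes(s):
    """All prefixes of s: remove zero or more symbols from the end."""
    return {s[:i] for i in range(len(s) + 1)}


def suffixes(s):
    """All suffixes of s: remove zero or more symbols from the beginning."""
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s):
    """Delete any prefix and any suffix from s."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def proper(parts, s):
    """Proper prefixes/suffixes/substrings: neither empty nor equal to s."""
    return {p for p in parts if p not in ("", s)}


def is_subsequence(t, s):
    """True if t results from deleting zero or more (not necessarily
    consecutive) positions of s."""
    it = iter(s)
    # "ch in it" advances the iterator past the first match of ch.
    return all(ch in it for ch in t)
```

For example, `is_subsequence("baan", "banana")` is `True`, while `is_subsequence("nab", "banana")` is `False`.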
Regular Expressions
• In the regular expressions for C identifiers, the underscore is included among the letters.
• If letter_ is established to stand for any letter or the underscore, and digit is established to stand for any digit, then we could describe the language of C identifiers by:
letter_ ( letter_ | digit )*
o Vertical bar: union
o Parentheses: group subexpressions
o Star: "zero or more occurrences of"
o Juxtaposition of letter_ with the remainder of the expression signifies concatenation
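The identifier expression can be tried out with Python's `re` module, whose syntax is a superset of this notation. The sketch assumes ASCII letters and writes letter_ and digit as character classes (an abbreviation introduced later in this section); the names `LETTER_`, `DIGIT`, and `is_c_identifier` are illustrative.

```python
import re

LETTER_ = r"[A-Za-z_]"  # any (ASCII) letter or the underscore
DIGIT = r"[0-9]"        # any decimal digit

# letter_ ( letter_ | digit )*  : concatenation, union, and Kleene star
C_IDENTIFIER = re.compile(f"{LETTER_}({LETTER_}|{DIGIT})*")


def is_c_identifier(s):
    # fullmatch: the whole string must belong to the language, not a prefix
    return C_IDENTIFIER.fullmatch(s) is not None
```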
• Each regular expression r denotes a language L(r), which is also defined recursively from the languages denoted by r's subexpressions.
• The following rules define the regular expressions over some alphabet Σ and the languages that those expressions denote.
BASIS: There are two rules

• 1. ε is a regular expression, and L(ε) is {ε}, the language whose sole member is the empty string.
• 2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}, the language with one string, of length one, with a in its one position.
INDUCTION: There are four parts to the induction whereby larger regular expressions are built from smaller ones. Suppose r and s are regular expressions denoting languages L(r) and L(s).

1. (r)|(s) is a regular expression denoting the language L(r) ∪ L(s).

2. (r)(s) is a regular expression denoting the language L(r)L(s).

3. (r)* is a regular expression denoting (L(r))*.

4. (r) is a regular expression denoting L(r).

o This last rule says that we can add additional pairs of parentheses around expressions without changing the language they denote.
• We may drop certain pairs of parentheses if we adopt the following conventions:
• a) The unary operator * has highest precedence & is left associative.
• b) Concatenation has second highest precedence and is left associative.
• c) | has lowest precedence and is left associative.
• Under these conventions, (a)|((b)*(c)) = a|b*c.
• Both expressions denote the set of strings that are either a single a or are zero or more b's followed by one c.
Example 3.4: Let Σ = {a, b}.
1. The regular expression a|b denotes the language {a, b}.
2. (a|b)(a|b) denotes {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ. Another regular expression for the same language: aa|ab|ba|bb
3. a* denotes the language consisting of all strings of zero or more a's: {ε, a, aa, aaa, ...}.
4. (a|b)* denotes the set of all strings consisting of zero or more instances of a or b,
o that is, all strings of a's and b's: {ε, a, b, aa, ab, ba, bb, aaa, ...}.
o Another regular expression for the same language: (a*b*)*.
5. a|a*b denotes the language {a, b, ab, aab, aaab, ...},
o that is, the string a & all strings consisting of zero or more a's & ending in b.
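One way to sanity-check the claims of Example 3.4 is to enumerate every string over Σ = {a, b} up to a small length and keep those that Python's `re.fullmatch` accepts. This is only evidence up to the chosen length (`max_len`, an assumed parameter), not a proof that two expressions are equivalent.

```python
import re
from itertools import product


def language(regex, alphabet="ab", max_len=4):
    """All strings over alphabet of length <= max_len that regex fully matches."""
    pattern = re.compile(regex)
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return {w for w in words if pattern.fullmatch(w)}
```

For instance, `language("a|a*b")` gives `{"a", "b", "ab", "aab", "aaab"}`, and `language("(a|b)*") == language("(a*b*)*")` holds for all strings up to length 4.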
• Regular set: a language that can be defined by a regular expression.
• If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s. For instance, (a|b) = (b|a).
Algebraic laws for regular expressions r, s, & t
Regular Definition
• If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
···
dn → rn
where:
• 1. Each di is a new symbol, not in Σ and not the same as any other of the d's.
• 2. Each ri is a regular expression over the alphabet Σ ∪ {d1, d2, ..., di-1}.
• By restricting ri to Σ & previously defined d's, we avoid recursive definitions, and we can construct a regular expression over Σ alone, for each ri.
• How: first replace uses of d1 in r2 (which cannot use any of the d's except for d1), then replace uses of d1 and d2 in r3 by r1 and (the substituted) r2, and so on.
• Finally, in rn replace each di, for i = 1, 2, ..., n - 1, by the substituted version of ri, each of which has only symbols of Σ.
• Example: C identifiers are strings of letters, digits, and underscores. Here is a regular definition for the language of C identifiers. We shall conventionally use italics for the symbols defined in regular definitions.
• letter_ → A | B | ··· | Z | a | b | ··· | z | _
• digit → 0 | 1 | ··· | 9
• id → letter_ ( letter_ | digit )*

• Example: Unsigned numbers (integer or floating point) are strings such as 5280, 0.01234, 6.336E4, or 1.89E-4.
• digit → 0 | 1 | ··· | 9
• digits → digit digit*
• optionalFraction → . digits | ε
• optionalExponent → ( E ( + | - | ε ) digits ) | ε
• number → digits optionalFraction optionalExponent
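Substituting each definition into the next, as described for regular definitions, yields a single regular expression over Σ. A sketch in Python, where the empty alternative `(x|)` plays the role of x | ε; the snake_case names mirror optionalFraction and optionalExponent and are illustrative.

```python
import re

# Build bottom-up: each name is replaced by its parenthesized expression.
digit = "(0|1|2|3|4|5|6|7|8|9)"
digits = f"({digit}{digit}*)"
optional_fraction = rf"(\.{digits}|)"           # . digits | epsilon
optional_exponent = rf"(E(\+|-|){digits}|)"      # ( E ( + | - | eps ) digits ) | eps
number = re.compile(digits + optional_fraction + optional_exponent)
```

All four example lexemes match, while strings such as `1.` (a fraction needs at least one digit) or `1E` (an exponent needs digits) do not.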

Abbreviations

• The basic operations generate all possible regular expressions, but there are common abbreviations used for convenience. Typical examples:
Abbr. Meaning Notes
r+ (rr*) 1 or more occurrences
r? (r | ε) 0 or 1 occurrence
[a-z] (a|b|…|z) 1 character in given range
[abxyz] (a|b|x|y|z) 1 of the given characters
Examples

re Meaning
+ single + character
! single ! character
= single = character
!= 2 character sequence
<= 2 character sequence
xyzzy 5 character sequence
Extensions of Regular Expressions
• 1. One or more instances. The unary, postfix operator + represents the positive closure of a regular expression and its language.
o That is, if r is a regular expression, then (r)+ denotes the language (L(r))+.
o The operator + has the same precedence and associativity as the operator *.
o Two useful algebraic laws, r* = r+|ε and r+ = rr* = r*r, relate the Kleene closure & positive closure.
• 2. Zero or one instance. The unary postfix operator ? means "zero or one occurrence." That is, r? is equivalent to r|ε, or put another way, L(r?) = L(r) ∪ {ε}.
o The ? operator has the same precedence and associativity as * and +.
• 3. Character classes. A regular expression a1|a2|···|an, where the ai's are each symbols of the alphabet, can be replaced by the shorthand [a1a2···an].
o When a1, a2, ..., an form a logical sequence, e.g., consecutive uppercase letters, lowercase letters, or digits, we can replace them by a1-an, that is, just the first and last separated by a hyphen.
o [abc] = a|b|c, [a-z] = a|b|···|z
Recognition of Tokens
• Build a piece of code that examines the input
string & finds a prefix that is a lexeme
matching one of the patterns.
Example: Patterns
• Terminals: if, then, else, relop, id, and number are the names of tokens.
• ws → ( blank | tab | newline )+
• blank, tab, and newline are abstract symbols.
• Token ws is different from the other tokens in that, when we recognize it, we do not return it to the parser, but rather restart the lexical analysis from the character that follows the whitespace.
Tokens, their patterns, and attribute values

• For each lexeme or family of lexemes, the table shows which token name is returned to the parser and what attribute value, if any, is returned.
• For the six relational operators, symbolic constants (LT, LE, EQ, NE, GT, GE) are used as the attribute value, in order to indicate which instance of the token relop we have found.
Transition Diagrams
• We convert patterns into stylized flowcharts, called "transition diagrams".
o Transition diagrams have a collection of nodes or circles, called states.
o Each state represents a condition that could occur during the process of scanning the input looking for a lexeme that matches one of several patterns.
Transition Diagrams
• Edges are directed from one state to another.
• Each edge is labeled by a symbol or set of symbols.
• If we are in some state s, & the next input symbol is a, we look for an edge out of state s labeled by a (and perhaps by other symbols, as well).
• If such an edge is found, we advance the forward pointer & enter the state of the transition diagram to which that edge leads.
• We assume that our transition diagrams are deterministic:
o there is never more than one edge out of a given state with a given symbol among its labels.
Some important conventions
1. Certain states are said to be accepting, or final. These
states indicate that a lexeme has been found. We always
indicate an accepting state by a double circle, and if there is
an action to be taken - typically returning a token and an
attribute value to the parser - we shall attach that action to
the accepting state.
2. In addition, if it is necessary to retract the forward pointer one position (i.e., the lexeme does not include the symbol that got us to the accepting state), then we shall additionally place a * near that accepting state.
3. One state is designated the start state, or initial state; it is
indicated by an edge, labeled "start," entering from
nowhere. The transition diagram always begins in the start
state before any input symbols have been read.
• Example: Transition diagram that recognizes
the lexemes matching the token relop.
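The relop transition diagram can be simulated directly. The figure is not reproduced here, so this sketch assumes the usual diagram for the six operators <, <=, <>, =, >, >= with attribute values LT, LE, NE, EQ, GT, GE; the function name `get_relop` and the empty string standing in for eof are also assumptions. Returning `forward - 1` implements the retraction marked by * on an accepting state.

```python
def get_relop(text, forward=0):
    """Simulate the relop diagram from text[forward].

    Returns (("relop", attribute), new_forward), or None if no relational
    operator starts at this position.
    """
    state = 0
    while True:
        c = text[forward] if forward < len(text) else ""  # "" stands for eof
        forward += 1
        if state == 0:
            if c == "<":
                state = 1
            elif c == "=":
                return ("relop", "EQ"), forward        # accept "="
            elif c == ">":
                state = 6
            else:
                return None                            # not a relop
        elif state == 1:                               # seen "<"
            if c == "=":
                return ("relop", "LE"), forward
            if c == ">":
                return ("relop", "NE"), forward
            return ("relop", "LT"), forward - 1        # *: retract one position
        else:                                          # state 6: seen ">"
            if c == "=":
                return ("relop", "GE"), forward
            return ("relop", "GT"), forward - 1        # *: retract one position
```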
Recognition of Reserved Words and Identifiers

• Keywords (like if or then) are reserved: they are not identifiers, even though they look like identifiers.
• Transition diagram for identifier lexemes (it will also recognize the keywords if, then, & else)
There are two ways that we can handle reserved words that look like identifiers:
1. Install the reserved words in the symbol table
initially. A field of the symbol-table entry
indicates that these strings are never ordinary
identifiers, and tells which token they represent.
When we find an identifier, a call to installID places
it in the symbol table if it is not already there
and returns a pointer to the symbol-table entry
for the lexeme found.
2. Create separate transition diagrams for each keyword.
Note that such a transition diagram consists of states representing the situation after each successive letter of the keyword is seen, followed by a test for a "nonletter-or-digit," i.e., any character that cannot be the continuation of an identifier.

Hypothetical transition diagram for the keyword then

Figure : A transition diagram for unsigned numbers

Figure : A transition diagram for whitespace

Finite Automata
• Finite automata are essentially graphs, like transition diagrams, with a few differences:
1. Finite automata are recognizers; they simply say "yes" or "no" about each possible input string.
2. Finite automata come in two flavors:
o (a) Nondeterministic finite automata (NFA) have no restrictions on the labels of their edges. A symbol can label several edges out of the same state, and ε, the empty string, is a possible label.
o (b) Deterministic finite automata (DFA) have, for each state, and for each symbol of its input alphabet, exactly one edge with that symbol leaving that state.
• Both NFA & DFA are capable of recognizing the same languages (the regular languages).
Nondeterministic Finite Automata (NFA)
1. A finite set of states S.
2. A set of input symbols Σ, the input alphabet. We assume that ε, which stands for the empty string, is never a member of Σ.
3. A transition function that gives, for each state, and for each symbol in Σ ∪ {ε}, a set of next states.
4. A state s0 from S that is distinguished as the start state (or initial state).
5. A set of states F, a subset of S, that is distinguished as the accepting states (or final states).
Any NFA/DFA can be represented by a transition graph, where the nodes are states and the labeled edges represent the transition function.
There is an edge labeled a from state s to state t if and only if t is one of the next states for state s and input a.

This graph is very much like a transition diagram, except:

a) The same symbol can label edges from one state to several different states.
b) An edge may be labeled by ε, the empty string, instead of, or in addition to, symbols from the input alphabet.
• Example: R = (a|b)*abb
NFA

The double circle around state 3 indicates that this state is accepting.

The only ways to get from the start state 0 to the accepting state are to follow some path that stays in state 0 for a while, then goes to states 1, 2, and 3 by reading abb from the input.

Thus, the only strings getting to the accepting state are those that end in abb.
Transition Tables
• Rows: states
• Columns: input symbols and ε.
• The entry for a given state & input is the value of the transition function applied to those arguments.
• If the transition function has no information about that state-input pair, we put ∅ in the table.
• Advantage: we can easily find the transitions on a given state and input.
• Disadvantage: the table takes a lot of space when the input alphabet is large.
Acceptance of Input Strings by
Automata
• An NFA accepts input string x if & only if there is some path in the transition graph from the start state to one of the accepting states such that the symbols along the path spell out x.
• ε labels along the path are effectively ignored, since the empty string does not contribute to the string constructed along the path.
Example: The string aabb is accepted by the NFA for (a|b)*abb, via the path 0 -a→ 0 -a→ 1 -b→ 2 -b→ 3.

• Another path labeled aabb, 0 -a→ 0 -a→ 0 -b→ 0 -b→ 0, is not accepting.

o The NFA accepts a string as long as some path labeled by that string leads from the start state to an accepting state.
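Because an NFA accepts when *some* path works, a simulator can track the set of all states reachable so far instead of guessing a path. A sketch using the transition table of the (a|b)*abb NFA described above (state 0 loops on a and b; 0 -a→ 1 -b→ 2 -b→ 3); the dictionary encoding and the name `nfa_accepts` are assumptions.

```python
# Transition table: state -> input symbol -> set of next states.
# A missing entry stands for the empty set.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START, ACCEPTING = 0, {3}


def nfa_accepts(x):
    """True iff some path labeled x leads from START to an accepting state."""
    current = {START}
    for c in x:
        # Follow every possible edge labeled c from every current state.
        current = {t for s in current for t in NFA_TABLE[s].get(c, ())}
    return bool(current & ACCEPTING)
```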
NFA accepting L(aa*|bb*)

• The string aaa is accepted.

• Its accepting path is labeled εaaa, which spells aaa, since ε "disappears" in a concatenation.
Example: Moves on a Chessboard

• States = squares.
• Inputs = r (move to an adjacent red square) and b (move to an adjacent black square).
• The start state and the final state are in opposite corners.

Example: Chessboard – (2)
Board:
1 2 3
4 5 6
7 8 9

Transition table (start state 1; * marks the final state 9):
State   r          b
1       2,4        5
2       4,6        1,3,5
3       2,6        5
4       2,8        1,5,7
5       2,4,6,8    1,3,7,9
6       2,8        3,5,9
7       4,8        5
8       4,6        5,7,9
*9      6,8        5

Input rbb: {1} on r → {2,4}, on b → {1,3,5,7}, on b → {1,3,5,7,9}
Accept, since the final state 9 is reached.
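Example
• An NFA accepting all strings that end in 01
o States: q0 (start), q1, q2 (accepting); q0 loops on 0 and 1, q0 -0→ q1, q1 -1→ q2.
• Input: 00101
o {q0} -0→ {q0, q1} -0→ {q0, q1} -1→ {q0, q2} -0→ {q0, q1} -1→ {q0, q2}
o The branch in q1 gets stuck on 0, and the branch in q2 gets stuck on any input; since q2 is reached at the end of the input, 00101 is accepted.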
Example
• An NFA that has an input alphabet {0} consisting of a single symbol. It accepts all strings of the form 0^k where k is a multiple of 2 or 3 (accept: ε, 00, 000, 0000, 000000; but not 0 or 00000).
Example
• An NFA with states q1, q2, q3 over the alphabet {a, b}
• Accept: ε, a, baba, baa
• Reject: b, bb, babba
Transition Table
NFA A = ({q0, q1, q2}, {0, 1}, δ, q0, {q2}), accepting all strings that end in 01 (start state q0)

        0           1
q0      {q0, q1}    {q0}
q1      ∅           {q2}
*q2     ∅           ∅
Transition Table
• An NFA accepting all strings that contain either 101 or 11 as a substring (e.g., 010110)
o q1 loops on 0,1; q1 -1→ q2; q2 -0,ε→ q3; q3 -1→ q4; q4 loops on 0,1.

1. Q = {q1, q2, q3, q4}
2. Σ = {0, 1}
3. δ is given by the table:
        0       1           ε
q1      {q1}    {q1, q2}    ∅
q2      {q3}    ∅           {q3}
q3      ∅       {q4}        ∅
*q4     {q4}    {q4}        ∅
4. Start state: q1
5. F = {q4}
Deterministic Finite Automata (DFA)
1. There are no moves on input ε.
2. For each state s & input symbol a, there is exactly one edge out of s labeled a.
• If we are using a transition table to represent a DFA, then each entry is a single state.
• We may therefore represent this state without the curly braces that we use to form sets.
• When we build a lexical analyzer, we are in effect implementing a DFA.
Algorithm: Simulating a DFA.
• INPUT: An input string x terminated by an end-of-
file character eof. A DFA D with start state s0 ,
accepting states F, and transition function move.
• OUTPUT: Answer "yes" if D accepts x ; "no"
otherwise.
• METHOD: Apply the algorithm to the input string
x. The function move(s, c) gives the state to which
there is an edge from state s on input c. The
function nextChar returns the next character of
the input string x.
• Example: given the DFA accepting (a|b)*abb and the input string ababb, the DFA enters the sequence of states 0, 1, 2, 1, 2, 3 & returns "yes."
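The Simulating-a-DFA loop can be written out as below. Since the figure with the DFA is not reproduced, the sketch assumes the usual four-state DFA for (a|b)*abb; `MOVE`, `S0`, `F`, and `simulate_dfa` are illustrative names. It also records the visited states so the trace can be compared with the one given above.

```python
# Assumed DFA for (a|b)*abb: MOVE[s][c] is move(s, c).
MOVE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
S0, F = 0, {3}


def simulate_dfa(x):
    """Return ("yes" or "no", list of states entered) for input string x."""
    s = S0
    trace = [s]
    for c in x:           # c = nextChar(), until eof
        s = MOVE[s][c]    # s = move(s, c)
        trace.append(s)
    return ("yes" if s in F else "no"), trace
```

Here `simulate_dfa("ababb")` returns `("yes", [0, 1, 2, 1, 2, 3])`.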
Example
Draw the transition diagram for the DFA accepting all strings with a substring 01.
• q0 (start) loops on 1 and goes to q2 on 0; q2 loops on 0 and goes to q1 on 1; q1 (accepting) loops on 0,1.

A = ({q0, q1, q2}, {0, 1}, δ, q0, {q1})
Check with the strings 01, 11010, 100011, 0111, 110101, 11101101, 111000.
Transition Function & Table
δ(q0,0) = q2    δ(q0,1) = q0
δ(q1,0) = q1    δ(q1,1) = q1
δ(q2,0) = q2    δ(q2,1) = q1

        0     1
q0      q2    q0
*q1     q1    q1
q2      q2    q1
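The table can be checked against the suggested strings by folding δ over each input, which is exactly the extended transition function. A sketch; `DELTA` and `accepts` are illustrative names.

```python
from functools import reduce

# delta(q, a) for A = ({q0, q1, q2}, {0, 1}, delta, q0, {q1})
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}


def accepts(w, start="q0", final=frozenset({"q1"})):
    # Extended transition function: apply delta to each symbol of w in turn.
    return reduce(lambda q, a: DELTA[(q, a)], w, start) in final
```

All of the suggested strings except 111000 are accepted, matching the condition "contains 01".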
Example
Let us design a DFA to accept the language
L = {w | w has both an even number of 0's and an even number of 1's}
• q0: even number of 0's, even number of 1's (start state and accepting state)
• q1: even 0's, odd 1's
• q2: odd 0's, even 1's
• q3: odd 0's, odd 1's

        0     1
*q0     q2    q1
q1      q3    q0
q2      q0    q3
q3      q1    q2
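A quick brute-force cross-check of this table: run the DFA on every binary string up to length 8 and compare with a direct count of 0's and 1's. The names `TABLE` and `run` are illustrative.

```python
from itertools import product

# state -> (next state on '0', next state on '1')
TABLE = {"q0": ("q2", "q1"), "q1": ("q3", "q0"),
         "q2": ("q0", "q3"), "q3": ("q1", "q2")}


def run(w, state="q0"):
    """Return the state reached from `state` after reading w."""
    for a in w:
        state = TABLE[state][int(a)]
    return state


# Cross-check against the definition of L on all strings of length <= 8.
for n in range(9):
    for w in map("".join, product("01", repeat=n)):
        expected = w.count("0") % 2 == 0 and w.count("1") % 2 == 0
        assert (run(w) == "q0") == expected, w
```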
Example: Try Yourself
• A = {w | w contains at least one 1 and an even number of 0s follow the last 1}
• Hints: A1 = (Q, Σ, δ, q1, F)
1. Q = {q1, q2, q3}
2. Σ = {0, 1}
3. δ: try yourself
4. Start state: q1
5. Final state: {q2}
Example
• A2 = ({q1, q2}, {0, 1}, δ, q1, {q2})
• Transition function δ:
        0     1
q1      q1    q2
*q2     q1    q2
• Try: 1101, 11010, 0011010
• L(A2) = {w | w ends in a 1}
Example
• A3 = ({q1, q2}, {0, 1}, δ, q1, {q1})
• Transition function δ:
        0     1
*q1     q1    q2
q2      q1    q2
• Try: 1101, 11010, 0011010
• L(A3) = {w | w is ε or ends in a 0}
DFA vs. NFA
DFA:
◊ δ returns a single state.
◊ Every state of a DFA always has exactly one exiting transition arrow for each symbol in the alphabet.
◊ Labels on the transition arrows are symbols from the alphabet.
NFA:
◊ δ returns a set of states.
◊ An NFA may have arrows labeled with members of the alphabet or ε.
◊ Zero, one, or many arrows may exit from each state with label ε.

DFA vs. NFA
Figure : Deterministic computation vs. the parallel computation tree of an NFA, whose branches end in accept or reject
NFA to DFA
• Subset Construction Algorithm
Subset Construction
• Given an NFA with states Q, inputs Σ, transition function δN, start state q0, and final states F, construct an equivalent DFA with:
– States 2^Q (the set of subsets of Q).
– Inputs Σ.
– Start state {q0}.
– Final states = all those with a member of F.
Subset Construction
• Given: NFA N = (QN, Σ, δN, q0, FN)
• Goal: DFA D = (QD, Σ, δD, {q0}, FD)
• L(D) = L(N)
States
• QD is the set of subsets of QN
- QD is the power set of QN
- If QN has n states, QD will have 2^n states
• Inaccessible states can be thrown away, so in practice the number of states of D is often much smaller than 2^n.
Subset construction
Final States
• FD is the set of subsets S of QN such that S ∩ FN ≠ ∅. That is, FD is all sets of N's states that include at least one accepting state of N.
Transition Function
• The transition function δD is defined by:
δD({q1, …, qk}, a) is the union over all i = 1, …, k of δN(qi, a).
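The construction can be sketched so that only states reachable from {q0} are ever built, which is how inaccessible states are avoided in practice. The sketch uses the chessboard NFA from the earlier example; `CHESS`, `subset_construction`, and the dictionary encoding are assumptions. DFA states are frozensets so they can be used as dictionary keys.

```python
# Chessboard NFA: square -> input (r or b) -> set of next squares.
CHESS = {
    1: {"r": {2, 4}, "b": {5}},
    2: {"r": {4, 6}, "b": {1, 3, 5}},
    3: {"r": {2, 6}, "b": {5}},
    4: {"r": {2, 8}, "b": {1, 5, 7}},
    5: {"r": {2, 4, 6, 8}, "b": {1, 3, 7, 9}},
    6: {"r": {2, 8}, "b": {3, 5, 9}},
    7: {"r": {4, 8}, "b": {5}},
    8: {"r": {4, 6}, "b": {5, 7, 9}},
    9: {"r": {6, 8}, "b": {5}},
}


def subset_construction(delta_n, q0, final_n, sigma):
    """Return (start, delta_d, final_d) for the reachable part of the DFA."""
    start = frozenset({q0})
    delta_d = {}
    todo = [start]
    while todo:
        S = todo.pop()
        if S in delta_d:
            continue  # this DFA state has already been built
        delta_d[S] = {}
        for a in sigma:
            # delta_D(S, a) = union of delta_N(q, a) over all q in S
            T = frozenset(p for q in S for p in delta_n.get(q, {}).get(a, ()))
            delta_d[S][a] = T
            todo.append(T)
    # A DFA state is final iff it contains at least one NFA final state.
    final_d = {S for S in delta_d if S & final_n}
    return start, delta_d, final_d
```

Run on the chessboard NFA, this builds 7 DFA states rather than all 2^9 = 512 subsets.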
Subset Construction: Example 1
• Example: We'll construct the DFA equivalent of our "chessboard" NFA.

1 2 3
4 5 6
7 8 9
Example: Subset Construction
NFA transition table (start state 1; * marks the final state):
State   r          b
1       2,4        5
2       4,6        1,3,5
3       2,6        5
4       2,8        1,5,7
5       2,4,6,8    1,3,7,9
6       2,8        3,5,9
7       4,8        5
8       4,6        5,7,9
*9      6,8        5

DFA transition table, built one row at a time starting from {1}:
State           r            b
{1}             {2,4}        {5}
{2,4}           {2,4,6,8}    {1,3,5,7}
{5}             {2,4,6,8}    {1,3,7,9}
{2,4,6,8}       {2,4,6,8}    {1,3,5,7,9}
{1,3,5,7}       {2,4,6,8}    {1,3,5,7,9}
*{1,3,7,9}      {2,4,6,8}    {5}
*{1,3,5,7,9}    {2,4,6,8}    {1,3,5,7,9}
Example 2
NFA accepting all strings that end in 01: q0 (start) loops on 0,1; q0 -0→ q1; q1 -1→ q2 (accepting).

δD({q0, q2}, 0) = δN(q0, 0) ∪ δN(q2, 0) = {q0, q1} ∪ ∅ = {q0, q1}

δD({q0, q2}, 1) = δN(q0, 1) ∪ δN(q2, 1) = {q0} ∪ ∅ = {q0}

                0           1
∅               ∅           ∅
{q0}            {q0,q1}     {q0}
{q1}            ∅           {q2}
*{q2}           ∅           ∅
{q0,q1}         {q0,q1}     {q0,q2}
*{q0,q2}        {q0,q1}     {q0}
*{q1,q2}        ∅           {q2}
*{q0,q1,q2}     {q0,q1}     {q0,q2}
Example 2
• NFA N accepts all strings that end in 01.
• N's set of states: {q0, q1, q2}, i.e., 3 states.
• Subset construction: the DFA needs 2^3 = 8 states.
• Assign new names: A for ∅, B for {q0}, C for {q1}, D for {q2}, E for {q0, q1}, F for {q0, q2}, G for {q1, q2}, H for {q0, q1, q2}.

        0    1
A       A    A
B       E    B
C       A    D
*D      A    A
E       E    F
*F      E    B
*G      A    D
*H      E    F
Example 2
• DFA diagram: start state B; B -1→ B, B -0→ E, E -0→ E, E -1→ F, F -0→ E, F -1→ B; F is accepting.
• Of the 8 states, starting in start state B, we can only reach states B, E, & F.
• The other 5 states are inaccessible from B.
Example 3
• N = (Q, {a, b}, δ, 1, {1})
• Q = {1, 2, 3}: 3 states
• DFA states = 2^3 = 8:
{∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}

                a             b
∅               ∅             ∅
{1}             ∅             {2}
{2}             {2, 3}        {3}
{3}             {1, 3}        ∅
{1, 2}          {2, 3}        {2, 3}
{1, 3}          {1, 3}        {2}
{2, 3}          {1, 2, 3}     {3}
{1, 2, 3}       {1, 2, 3}     {2, 3}

Figure : DFA equivalent to N, with all 8 subset states

Example 3
Simplified: no incoming arrows point at states {1} & {1, 2}, so they may be removed without affecting the language the DFA accepts.
Figure : Simplified DFA
Closure of States
• CL(q) = set of states you can reach from state q following only arcs labeled ε.
• Example: CL(A) = {A}; CL(E) = {B, C, D, E}.
• Closure of a set of states = union of the closure of each state.
Algorithm : The subset construction of
a DFA from an NFA.
INPUT: An NFA N.
OUTPUT: A DFA D accepting the same language as N.
METHOD: Construct a transition table Dtran for D. Each state of D is a set of NFA states, and we construct Dtran so that D will simulate "in parallel" all possible moves N can make on a given input string.
Here s is a single state of N, while T is a set of states of N.
Figure : Operations on NFA states
Figure : The subset construction
Figure : Computing ε-closure(T)
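The three operations used by the algorithm are ε-closure(s), the set of NFA states reachable from s on ε-transitions alone; ε-closure(T), the union of ε-closure(s) over all s in T; and move(T, a), the set of states reachable on input a from some state in T. A stack-based sketch, assuming the standard 11-state NFA for (a|b)*abb used in the example that follows (state 10 accepting; the edge list is an assumption since the figure is not reproduced):

```python
EPS = "eps"  # label used for epsilon-transitions
# Assumed NFA for (a|b)*abb: state -> label -> set of next states.
NFA = {
    0: {EPS: {1, 7}}, 1: {EPS: {2, 4}}, 2: {"a": {3}}, 3: {EPS: {6}},
    4: {"b": {5}}, 5: {EPS: {6}}, 6: {EPS: {1, 7}}, 7: {"a": {8}},
    8: {"b": {9}}, 9: {"b": {10}}, 10: {},
}


def eps_closure(T):
    """epsilon-closure(T): start from T, then follow epsilon-edges."""
    stack = list(T)
    closure = set(T)
    while stack:
        t = stack.pop()
        for u in NFA[t].get(EPS, ()):  # each u with an edge t --eps--> u
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure


def move(T, a):
    """move(T, a): states with a transition on a from some state in T."""
    return {u for s in T for u in NFA[s].get(a, ())}
```

With this NFA, `eps_closure({0})` is `{0, 1, 2, 4, 7}`, the DFA start state A of the example.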
Example: NFA accepting R = (a|b)*abb, with Σ = {a, b}

• ε-closure(0) = {0, 1, 2, 4, 7} = A
• Mark A; compute Dtran[A, a] & Dtran[A, b].
• Dtran[A, a] = ε-closure(move(A, a))
= ε-closure(move({0, 1, 2, 4, 7}, a))
= ε-closure({3, 8})
= {3, 6, 7, 1, 2, 4} ∪ {8}
= {1, 2, 3, 4, 6, 7, 8} = B

• Dtran[A, b] = ε-closure(move(A, b))
= ε-closure(move({0, 1, 2, 4, 7}, b))
= ε-closure({5})
= {5, 6, 7, 1, 2, 4}
Dtran[A, b] = {1, 2, 4, 5, 6, 7} = C

• Mark B; compute Dtran[B, a] & Dtran[B, b].
• Dtran[B, a] = ε-closure(move(B, a))
= ε-closure(move({1, 2, 3, 4, 6, 7, 8}, a))
= ε-closure({3, 8})
= {3, 6, 7, 1, 2, 4} ∪ {8}
Dtran[B, a] = {1, 2, 3, 4, 6, 7, 8} = B

• Dtran[B, b] = ε-closure(move(B, b))
= ε-closure(move({1, 2, 3, 4, 6, 7, 8}, b))
= ε-closure({5, 9})
= {5, 6, 7, 1, 2, 4} ∪ {9}
Dtran[B, b] = {1, 2, 4, 5, 6, 7, 9} = D

• Mark C; compute Dtran[C, a] & Dtran[C, b].
• Dtran[C, a] = ε-closure(move(C, a))
= ε-closure(move({1, 2, 4, 5, 6, 7}, a))
= ε-closure({3, 8})
= {3, 6, 7, 1, 2, 4} ∪ {8}
Dtran[C, a] = {1, 2, 3, 4, 6, 7, 8} = B

• Dtran[C, b] = ε-closure(move(C, b))
= ε-closure(move({1, 2, 4, 5, 6, 7}, b))
= ε-closure({5})
= {5, 6, 7, 1, 2, 4}
Dtran[C, b] = {1, 2, 4, 5, 6, 7} = C

• Mark D; compute Dtran[D, a] & Dtran[D, b].
• Dtran[D, a] = ε-closure(move(D, a))
= ε-closure(move({1, 2, 4, 5, 6, 7, 9}, a))
= ε-closure({3, 8})
= {3, 6, 7, 1, 2, 4} ∪ {8}
Dtran[D, a] = {1, 2, 3, 4, 6, 7, 8} = B

• Dtran[D, b] = ε-closure(move(D, b))
= ε-closure(move({1, 2, 4, 5, 6, 7, 9}, b))
= ε-closure({5, 10})
= {5, 6, 7, 1, 2, 4} ∪ {10}
Dtran[D, b] = {1, 2, 4, 5, 6, 7, 10} = E

• Mark E; compute Dtran[E, a] & Dtran[E, b].
• Dtran[E, a] = ε-closure(move(E, a))
= ε-closure(move({1, 2, 4, 5, 6, 7, 10}, a))
= ε-closure({3, 8})
= {3, 6, 7, 1, 2, 4} ∪ {8}
Dtran[E, a] = {1, 2, 3, 4, 6, 7, 8} = B

• Dtran[E, b] = ε-closure(move(E, b))
= ε-closure(move({1, 2, 4, 5, 6, 7, 10}, b))
= ε-closure({5})
= {5, 6, 7, 1, 2, 4}
Dtran[E, b] = {1, 2, 4, 5, 6, 7} = C
Summary
Dtran [A, a] = {1, 2, 3, 4, 6, 7, 8} = B
Dtran [A, b] = {1, 2, 4, 5, 6, 7} = C
Dtran [B, a] = {1, 2, 3, 4, 6, 7, 8} = B
Dtran [B, b] = {1, 2, 4, 5, 6, 7, 9} = D
Dtran [C, a] = {1, 2, 3, 4, 6, 7, 8} = B
Dtran [C, b] = {1, 2, 4, 5, 6, 7} = C
Dtran [D, a] = {1, 2, 3, 4, 6, 7, 8} = B
Dtran [D, b] = {1, 2, 4, 5, 6, 7, 10} = E
Dtran [E, a] = {1, 2, 3, 4, 6, 7, 8} = B
Dtran [E, b] = {1, 2, 4, 5, 6, 7} = C

NFA State                  DFA State   a   b
{0, 1, 2, 4, 7}            A           B   C
{1, 2, 3, 4, 6, 7, 8}      B           B   D
{1, 2, 4, 5, 6, 7}         C           B   C
{1, 2, 4, 5, 6, 7, 9}      D           B   E
*{1, 2, 4, 5, 6, 7, 10}    *E          B   C
(* marks the accepting state: E is the only DFA state that contains NFA state 10.)
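The subset construction traced above can be sketched in Python. This is a minimal sketch, assuming the NFA for (a|b)*abb has exactly the transitions implied by the ε-closure and move computations in this example (states 0 to 10, start state 0, accepting state 10); the names `EPS`, `DELTA`, `eps_closure`, `move` and `subset_construction` are illustrative, not taken from the lecture. Unmarked states wait in a FIFO queue, so they are marked in the same order as above (A, B, C, D, E).

```python
from collections import deque

# Assumed NFA for (a|b)*abb, inferred from the computations above.
# EPS: state -> epsilon-successors; DELTA: (state, symbol) -> successors.
EPS = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
DELTA = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}
START, ACCEPT = 0, 10


def eps_closure(states):
    """All NFA states reachable from `states` using epsilon-transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in EPS.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """States reachable from `states` by a single transition on `symbol`."""
    result = set()
    for s in states:
        result |= DELTA.get((s, symbol), set())
    return result


def subset_construction(alphabet=("a", "b")):
    start = eps_closure({START})
    names = {start: "A"}              # Dstates: NFA-state set -> DFA state name
    unmarked = deque([start])
    dtran = {}
    while unmarked:
        T = unmarked.popleft()        # mark T
        for a in alphabet:
            U = eps_closure(move(T, a))
            if not U:
                continue              # no transition on a (no dead state is built)
            if U not in names:
                names[U] = chr(ord("A") + len(names))
                unmarked.append(U)
            dtran[names[T], a] = names[U]
    accepting = {name for T, name in names.items() if ACCEPT in T}
    return names, dtran, accepting


names, dtran, accepting = subset_construction()
for (state, symbol), target in sorted(dtran.items()):
    print(f"Dtran [{state}, {symbol}] = {target}")
```

Running it should print the same ten Dtran entries as the summary, with E as the only accepting state. Using frozensets for DFA states lets a set of NFA states serve directly as a dictionary key when checking whether it is already in Dstates.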
Regular Expression to NFA
• McNaughton-Yamada-Thompson Algorithm
Construction of an NFA from a Regular Expression
• Algorithm: The McNaughton-Yamada-Thompson algorithm to convert a regular expression to an NFA.
• INPUT: A regular expression r over alphabet Σ.
• OUTPUT: An NFA N accepting L(r).
• METHOD:
- Begin by parsing r into its constituent subexpressions.
- The rules for constructing an NFA consist of:
- basis rules for handling subexpressions with no operators, and
- inductive rules for constructing larger NFAs from the NFAs for the immediate subexpressions of a given expression.
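Basis
1. For the expression ε (r = ε), construct an NFA with a new start state i and a new accepting state f, joined by a single ε-transition from i to f.

2. For any subexpression a (r = a, with a in Σ), construct an NFA with a new start state i and a new accepting state f, joined by a single transition on a from i to f.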
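INDUCTION
• Suppose N(s) and N(t) are NFAs for regular expressions s and t, respectively.
1. r = s|t (union): add a new start state i with ε-transitions to the start states of N(s) and N(t), and a new accepting state f with ε-transitions into it from the accepting states of N(s) and N(t).
2. r = st (concatenation): merge the accepting state of N(s) with the start state of N(t); the start state of N(s) becomes the start state of N(r), and the accepting state of N(t) becomes its only accepting state.
3. r = s* (closure/star): add a new start state i and a new accepting state f, with ε-transitions from i to the start state of N(s), from i to f, from the accepting state of N(s) back to the start state of N(s), and from the accepting state of N(s) to f.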
Observations
• 1. N(r) has at most twice as many states as there are operators and operands in r.
-This bound follows from the fact that each step of the algorithm creates at most two new states.
• 2. N(r) has one start state and one accepting state.
-The accepting state has no outgoing transitions,
-and the start state has no incoming transitions.
• 3. Each state of N(r) other than the accepting state has either
-one outgoing transition on a symbol in Σ,
-or at most two outgoing transitions, all on ε (in the union rule, the old accepting states of N(s) and N(t) each keep a single ε-transition to the new accepting state).
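The basis and induction rules can be sketched as a recursive builder. This is a minimal sketch, assuming r has already been parsed into nested tuples such as `("or", s, t)`, `("cat", s, t)`, `("star", s)`, `("sym", "a")` and `("eps",)`; this encoding and the class name `ThompsonNFA` are hypothetical. The concatenation case merges the accepting state of N(s) with the start state of N(t) by renaming edges, so state numbers skip the merged ids and will not match the numbering in the figures.

```python
import itertools


class ThompsonNFA:
    """Edges are (source, label, target); a label of None stands for epsilon."""

    def __init__(self):
        self._ids = itertools.count()
        self.edges = []

    def _new(self):
        return next(self._ids)

    def build(self, r):
        """Add N(r) to this automaton and return its (start, accept) pair."""
        kind = r[0]
        if kind in ("eps", "sym"):          # basis: one transition from i to f
            i, f = self._new(), self._new()
            self.edges.append((i, r[1] if kind == "sym" else None, f))
            return i, f
        if kind == "or":                    # new i and f, four epsilon edges
            i = self._new()
            s1, f1 = self.build(r[1])
            s2, f2 = self.build(r[2])
            f = self._new()
            self.edges += [(i, None, s1), (i, None, s2), (f1, None, f), (f2, None, f)]
            return i, f
        if kind == "cat":                   # accept of N(s) merged with start of N(t)
            s1, f1 = self.build(r[1])
            s2, f2 = self.build(r[2])
            self.edges = [(f1 if u == s2 else u, lab, f1 if v == s2 else v)
                          for u, lab, v in self.edges]
            return s1, f2
        if kind == "star":                  # new i and f, four epsilon edges
            i = self._new()
            s1, f1 = self.build(r[1])
            f = self._new()
            self.edges += [(i, None, s1), (i, None, f), (f1, None, s1), (f1, None, f)]
            return i, f
        raise ValueError(f"unknown node: {r!r}")

    def states(self):
        return {u for u, _, _ in self.edges} | {v for _, _, v in self.edges}


# (a|b)*abb, already parsed: ((((a|b)*) a) b) b
example = ("star", ("or", ("sym", "a"), ("sym", "b")))
for ch in "abb":
    example = ("cat", example, ("sym", ch))

nfa = ThompsonNFA()
start, accept = nfa.build(example)
print(len(nfa.states()))   # 11
```

Counting states this way gives 11 for (a|b)*abb, the same as states 0 to 10 in the example and well within twice the 10 operators and operands. Merging states (rather than adding an ε-edge between N(s) and N(t)) is what keeps the concatenation case from creating any new state.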
Example: Construct an NFA for r = (a|b)*abb

Parse tree
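Step 1: For subexpression r1 = a

Step 2: For subexpression r2 = b

Step 3: For subexpression r3 = r1|r2

Step 4: For subexpression r4 = (r3)

The NFA for r4 is the same as the NFA for r3.
Step 5: For subexpression r5 = (r3)*
Step 6: For subexpression r6 = a

Step 7: For subexpression r7 = r5r6

Step 8: For subexpression r8 = b
[figure: a transition on b from state 8′ to state 9]

Step 9: For subexpression r9 = r7r8
[figure: the NFA for r7 followed by a transition on b from state 8 to state 9]

Step 10: For subexpression r10 = b
[figure: a transition on b from state 9′ to state 10]

Step 11: For subexpression r11 = r9r10
[figure: the complete NFA for (a|b)*abb; its last two transitions are on b, from 8 to 9 and from 9 to 10]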
Important States of NFA
• A state of an NFA is important if it has a non-ε out-transition.
• Notice that the subset construction uses only the important states in a set T when it computes ε-closure(move(T, a)), the set of states reachable from T on input a.
• The set of states move(s, a) is nonempty only if state s is important.
• During the subset construction, two sets of NFA states can be identified (treated as if they were the same set) if they:
• 1. Have the same important states, and
• 2. Either both have accepting states or neither does.
• The only important states are those introduced as initial states in the basis part for a particular symbol position in the regular expression.
• Each important state corresponds to a particular operand in the regular expression.
• The constructed NFA has only one accepting state, but this state, having no out-transitions, is not an important state.
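The important-state test and the identification rule can be sketched in a few lines. This sketch assumes the same (a|b)*abb NFA used in the subset-construction example (states 0 to 10, accepting state 10), with its non-ε transitions inferred from the move computations there; `DELTA`, `IMPORTANT` and `identity_key` are hypothetical names.

```python
# Non-epsilon transitions of the assumed (a|b)*abb NFA (states 0..10).
DELTA = {(2, "a"): {3}, (4, "b"): {5}, (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10}}
ACCEPTING = frozenset({10})

# A state is important iff it has at least one non-epsilon out-transition.
IMPORTANT = frozenset(state for state, _symbol in DELTA)


def identity_key(T):
    """Sets of NFA states with equal keys may be treated as one DFA state:
    same important states, and both or neither contain an accepting state."""
    T = frozenset(T)
    return T & IMPORTANT, bool(T & ACCEPTING)


A = {0, 1, 2, 4, 7}
C = {1, 2, 4, 5, 6, 7}
print(sorted(IMPORTANT))                    # [2, 4, 7, 8, 9]
print(identity_key(A) == identity_key(C))   # True
```

Here A and C both reduce to the important states {2, 4, 7} and neither is accepting, which matches their identical rows in the DFA table (both go to B on a and to C on b). E has the same important states but contains accepting state 10, so it keeps a different key.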
By concatenating a unique right endmarker # to a regular expression r, we give the accepting state for r a transition on #, making it an important state of the NFA for (r)#.
By using the augmented regular expression (r)#, we can ignore accepting states while the subset construction proceeds; when the construction is complete, any state with a transition on # must be an accepting state.
Nodes
• The important states of the NFA correspond directly to the positions in the regular expression that hold symbols of the alphabet.
• Represent the regular expression by its syntax tree:
-leaves correspond to operands,
-interior nodes correspond to operators.
• An interior node is one of:
• cat-node: concatenation operator (dot)
• or-node: union operator (|)
• star-node: star operator (*)
Syntax tree: (a|b)*abb#
• Leaves in a syntax tree are labeled by ε or by an alphabet symbol.
- To each leaf not labeled ε, attach a unique integer, called the position of the leaf and also a position of its symbol.
- A symbol can have several positions (in (a|b)*abb#, a has positions 1 and 3).
• The positions in the syntax tree correspond to the important states of the constructed NFA.
Example: NFA for r = (a|b)*abb# with the important states numbered and the other states represented by letters
[figure]
Functions Computed From the Syntax Tree
• To construct a DFA directly from a regular
expression, we construct its syntax tree and
then compute four functions:
nullable
firstpos
lastpos
followpos
The Four Functions
1. nullable(n) is true for a syntax-tree node n if and only if the subexpression represented by n has ε in its language.
- That is, the subexpression can be "made null" (can generate the empty string), even though there may be other strings it can represent as well.
2. firstpos(n) is the set of positions in the subtree rooted at n that correspond to the first symbol of at least one string in the language of the subexpression rooted at n.
3. lastpos(n) is the set of positions in the subtree rooted at n that correspond to the last symbol of at least one string in the language of the subexpression rooted at n.
4. followpos(p), for a position p, is the set of positions q in the entire syntax tree such that there is some string x = a1a2...an in L((r)#) such that, for some i, there is a way to explain the membership of x in L((r)#) by matching ai to position p of the syntax tree and ai+1 to position q.
Example: Consider the cat-node n that corresponds to the expression (a|b)*a (it generates strings such as aa, ba and aba).
• nullable(n) is false, since this node generates all strings of a's and b's ending in an a; it does not generate ε.
• The star-node below it is nullable; it generates ε along with all other strings of a's and b's.
• firstpos(n) = {1, 2, 3}
• lastpos(n) = {3}
• followpos(1) = {1, 2, 3}
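Computing nullable, firstpos, & lastpos
• Compute nullable, firstpos, & lastpos by a straightforward recursion on the height of the tree.
• Basis & inductive rules for nullable & firstpos:

Node n                    nullable(n)                     firstpos(n)
leaf labeled ε            true                            ∅
leaf with position i      false                           {i}
or-node n = c1|c2         nullable(c1) or nullable(c2)    firstpos(c1) U firstpos(c2)
cat-node n = c1c2         nullable(c1) and nullable(c2)   if nullable(c1): firstpos(c1) U firstpos(c2), else firstpos(c1)
star-node n = c1*         true                            firstpos(c1)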
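The rules for nullable and firstpos (and the mirrored rules for lastpos) translate almost line for line into recursive functions. This is a minimal sketch, assuming a hypothetical `Node` class whose leaves carry either a symbol and a position or no symbol at all (an ε-leaf); it recomputes `nullable` at every cat-node instead of caching it, which is acceptable for small trees.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str                        # "leaf", "or", "cat" or "star"
    left: Optional["Node"] = None    # also the only child of a star-node
    right: Optional["Node"] = None
    symbol: Optional[str] = None     # leaves only; None marks an epsilon-leaf
    pos: Optional[int] = None        # position of a non-epsilon leaf


def nullable(n):
    if n.kind == "leaf":
        return n.symbol is None      # only an epsilon-leaf is nullable
    if n.kind == "or":
        return nullable(n.left) or nullable(n.right)
    if n.kind == "cat":
        return nullable(n.left) and nullable(n.right)
    return True                      # every star-node is nullable


def firstpos(n):
    if n.kind == "leaf":
        return set() if n.symbol is None else {n.pos}
    if n.kind == "or":
        return firstpos(n.left) | firstpos(n.right)
    if n.kind == "cat":
        if nullable(n.left):
            return firstpos(n.left) | firstpos(n.right)
        return firstpos(n.left)
    return firstpos(n.left)          # star-node


def lastpos(n):
    # Same rules as firstpos, with the children of a cat-node interchanged.
    if n.kind == "leaf":
        return set() if n.symbol is None else {n.pos}
    if n.kind == "or":
        return lastpos(n.left) | lastpos(n.right)
    if n.kind == "cat":
        if nullable(n.right):
            return lastpos(n.left) | lastpos(n.right)
        return lastpos(n.right)
    return lastpos(n.left)           # star-node


def example_tree():
    """Syntax tree of (a|b)*abb#, numbering the leaves 1..6 from left to right."""
    counter = itertools.count(1)

    def leaf(c):
        return Node("leaf", symbol=c, pos=next(counter))

    star = Node("star", Node("or", leaf("a"), leaf("b")))
    lowest = Node("cat", star, leaf("a"))          # the cat-node for (a|b)*a
    n = Node("cat", lowest, leaf("b"))
    n = Node("cat", n, leaf("b"))
    return Node("cat", n, leaf("#")), lowest, star


root, lowest, star = example_tree()
print(nullable(lowest), sorted(firstpos(lowest)), sorted(lastpos(lowest)))
# False [1, 2, 3] [3]
```

The printed values for the lowest cat-node agree with the (a|b)*a example: it is not nullable, firstpos is {1, 2, 3} and lastpos is {3}.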
Example: only the star-node is nullable.
• None of the leaves is nullable, because each corresponds to a non-ε operand.
• The or-node is not nullable, because neither of its children is.
• The star-node is nullable, because every star-node is nullable.
• Each of the cat-nodes, having at least one non-nullable child, is not nullable.
In the figure, firstpos(n) is shown to the left of each node n and lastpos(n) to its right.
• Each of the leaves has only itself for firstpos & lastpos, as required by the rule for non-ε leaves.
• For the or-node, we take the union of firstpos at the children and do the same for lastpos.
• Consider the lowest cat-node, which we shall call n.
• To compute firstpos(n), we first consider whether the left operand is nullable, which it is in this case.
• Therefore, firstpos for n is the union of firstpos for each of its children, that is, {1, 2} U {3} = {1, 2, 3}.
• The rules for lastpos are the same as for firstpos, with the children interchanged.
• To compute lastpos(n), we must ask whether its right child (the leaf with position 3) is nullable, which it is not.
• Therefore, lastpos(n) is the same as lastpos of the right child, or {3}.
Computing Followpos
• There are only two ways that a position of a regular expression can be made to follow another:
1. If n is a cat-node with left child C1 & right child C2, then for every position i in lastpos(C1), all positions in firstpos(C2) are in followpos(i).
2. If n is a star-node, & i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
Example: Rule 1 for followpos requires that we look at each cat-node and put each position in firstpos of its right child in followpos for each position in lastpos of its left child.
[figure: syntax tree for (a|b)*abb# with firstpos and lastpos shown at each node]
• For the lowest cat-node, rule 1 says position 3 is in followpos(1) and followpos(2).
• The next cat-node says that 4 is in followpos(3).
• The remaining two cat-nodes give us 5 in followpos(4) and 6 in followpos(5).
• Applying rule 2 to the star-node: positions 1 and 2 are in both followpos(1) and followpos(2), since both firstpos and lastpos for this node are {1, 2}.
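Directed graph for the function followpos (the same information as a table):
Node n   followpos(n)
1        {1, 2, 3}
2        {1, 2, 3}
3        {4}
4        {5}
5        {6}
6        ∅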
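Both followpos rules can be applied during the same bottom-up pass that computes nullable, firstpos and lastpos. This is a sketch, assuming a hypothetical tuple encoding of the syntax tree (`("leaf", symbol, position)`, `("or", l, r)`, `("cat", l, r)`, `("star", c)`) with no ε-leaves; rule 1 is applied at each cat-node and rule 2 at each star-node.

```python
from collections import defaultdict


def compute_followpos(tree):
    """Return (followpos, symbol_at, firstpos of the root) for a syntax tree."""
    followpos = defaultdict(set)
    symbol_at = {}

    def visit(n):
        """Return (nullable, firstpos, lastpos) of n, recording followpos on the way."""
        kind = n[0]
        if kind == "leaf":
            _, sym, pos = n
            symbol_at[pos] = sym
            return False, {pos}, {pos}
        if kind == "star":
            _, first, last = visit(n[1])
            for i in last:              # rule 2: star-node
                followpos[i] |= first
            return True, first, last
        n1, f1, l1 = visit(n[1])
        n2, f2, l2 = visit(n[2])
        if kind == "or":
            return n1 or n2, f1 | f2, l1 | l2
        for i in l1:                    # rule 1: cat-node
            followpos[i] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last

    _, root_first, _ = visit(tree)
    return dict(followpos), symbol_at, root_first


# (a|b)*abb# with leaves numbered 1..6 from left to right.
tree = ("cat", ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))), ("leaf", "a", 3))
for symbol, pos in (("b", 4), ("b", 5), ("#", 6)):
    tree = ("cat", tree, ("leaf", symbol, pos))

followpos, symbol_at, start = compute_followpos(tree)
for p in sorted(symbol_at):
    print(p, symbol_at[p], sorted(followpos.get(p, set())))
```

For (a|b)*abb# this should give followpos(1) = followpos(2) = {1, 2, 3}, followpos(3) = {4}, followpos(4) = {5} and followpos(5) = {6}, as derived in the example above; position 6 (the endmarker) follows nothing, so it has no entry.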
Converting a Regular Expression
Directly to a DFA
Algorithm: Construction of a DFA from a regular expression r.
INPUT : A regular expression r.
OUTPUT: A DFA D that recognizes L (r) .
METHOD:
1. Construct a syntax tree T from the augmented regular expression (r)#.
2. Compute nullable, firstpos, lastpos, & followpos for T.
3. Construct Dstates, the set of states of DFA D, & Dtran, the transition function for D. The states of D are sets of positions in T.
Initially, each state is "unmarked," & a state becomes "marked" just before we consider its out-transitions.
While there is an unmarked state S in Dstates: mark S; for each input symbol a, let U be the union of followpos(p) for all positions p in S that correspond to a; if U is not yet in Dstates, add it as an unmarked state; set Dtran[S, a] = U.
The start state of D is firstpos(n0), where node n0 is the root of T.
The accepting states are those containing the position for the endmarker symbol #.
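Step 3 can be sketched as a worklist loop over sets of positions. This is a minimal sketch, assuming followpos and the symbol at each position have already been computed for (a|b)*abb# and are hard-coded here (`FOLLOWPOS`, `SYMBOL_AT`, `ROOT_FIRSTPOS` and `regex_to_dfa` are hypothetical names); a symbol that leads to an empty set of positions gets no transition, so no dead state is created.

```python
from collections import deque

# followpos for (a|b)*abb#, positions 1..6; position 6 holds the endmarker #.
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
ROOT_FIRSTPOS = {1, 2, 3}
END = 6


def regex_to_dfa(alphabet=("a", "b")):
    start = frozenset(ROOT_FIRSTPOS)        # start state is firstpos(n0)
    names = {start: "A"}                    # Dstates: position set -> state name
    unmarked = deque([start])
    dtran = {}
    while unmarked:
        S = unmarked.popleft()              # mark S
        for a in alphabet:
            U = set()
            for p in S:
                if SYMBOL_AT[p] == a:       # positions in S that correspond to a
                    U |= FOLLOWPOS[p]
            if not U:
                continue
            U = frozenset(U)
            if U not in names:
                names[U] = chr(ord("A") + len(names))
                unmarked.append(U)
            dtran[names[S], a] = names[U]
    accepting = {name for S, name in names.items() if END in S}
    return names, dtran, accepting


names, dtran, accepting = regex_to_dfa()
print(dtran)
print(accepting)
```

Applied to (a|b)*abb#, it should produce the four states A to D and the transitions worked out in the example that follows, with D (the only state containing position 6) as the accepting state.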
Construction of a DFA directly from a regular expression
Example: construct a DFA for the regular expression r = (a|b)*abb.
The value of firstpos for the root of the tree is {1, 2, 3}, so
A = {1, 2, 3} (start state)
• Compute Dtran[A, a] & Dtran[A, b].
• Among the positions of A, 1 & 3 correspond to a, while 2 corresponds to b.
• Dtran[A, a] = followpos(1) U followpos(3) = {1, 2, 3, 4} = B
• Compute Dtran[A, b].
• Among the positions, only 2 corresponds to b.
• Dtran[A, b] = followpos(2) = {1, 2, 3} = A
• Compute Dtran[B, a] = Dtran[{1, 2, 3, 4}, a]
• Among the positions, 1 & 3 correspond to a.
• Dtran[B, a] = followpos(1) U followpos(3) = {1, 2, 3, 4} = B
• Compute Dtran[B, b] = Dtran[{1, 2, 3, 4}, b]
• Among the positions, 2 & 4 correspond to b.
• Dtran[B, b] = followpos(2) U followpos(4) = {1, 2, 3, 5} = C
• Compute Dtran[C, a] = Dtran[{1, 2, 3, 5}, a]
• Among the positions, 1 & 3 correspond to a.
• Dtran[C, a] = followpos(1) U followpos(3) = {1, 2, 3, 4} = B
• Compute Dtran[C, b] = Dtran[{1, 2, 3, 5}, b]
• Among the positions, 2 & 5 correspond to b.
• Dtran[C, b] = followpos(2) U followpos(5) = {1, 2, 3, 6} = D
• Compute Dtran[D, a] = Dtran[{1, 2, 3, 6}, a]
• Among the positions, 1 & 3 correspond to a.
• Dtran[D, a] = followpos(1) U followpos(3) = {1, 2, 3, 4} = B
• Compute Dtran[D, b] = Dtran[{1, 2, 3, 6}, b]
• Among the positions, only 2 corresponds to b (position 6 holds the endmarker #).
• Dtran[D, b] = followpos(2) = {1, 2, 3} = A
A = {1, 2, 3}
Dtran[A, a] = followpos(1) U followpos(3) = {1, 2, 3, 4} = B
Dtran[A, b] = followpos(2) = {1, 2, 3} = A
Dtran[B, a] = followpos(1) U followpos(3) = {1, 2, 3, 4} = B
Dtran[B, b] = followpos(2) U followpos(4) = {1, 2, 3, 5} = C
Dtran[C, a] = followpos(1) U followpos(3) = {1, 2, 3, 4} = B
Dtran[C, b] = followpos(2) U followpos(5) = {1, 2, 3, 6} = D
Dtran[D, a] = followpos(1) U followpos(3) = {1, 2, 3, 4} = B
Dtran[D, b] = followpos(2) = {1, 2, 3} = A

Positions        State   a   b
{1, 2, 3}        A       B   A
{1, 2, 3, 4}     B       B   C
{1, 2, 3, 5}     C       B   D
*{1, 2, 3, 6}    *D      B   A
(* marks the accepting state: D is the only state containing position 6, the position of the endmarker #.)

DFA Construction
[figure: transition diagram of the DFA with states A, B, C and D]
Conclusion
• Tokens
• Lexemes
• Patterns
• Regular Expressions
• Regular Definitions
• Transition Diagrams
• Finite Automata
• DFA & NFA
• Conversion (NFA to DFA, Regular Expression to
NFA/DFA)
