Lecture02 Scanning 1

This document discusses the process of scanning or lexical analysis in compiler construction. It covers the main topics of: 1) The function of a scanner is to read source code characters and form them into tokens. Tokens are logical entities like keywords, symbols, and identifiers. 2) Regular expressions are used to represent patterns of character strings and define the language of tokens. Operations like choice, concatenation, and repetition are used to construct regular expressions. 3) Finite automata or finite state machines are used to implement scanners by recognizing patterns in the input defined by regular expressions. The document will cover converting regular expressions to deterministic finite automata.

Uploaded by

Nada Shaaban

COMPILER CONSTRUCTION

Principles and Practice

Kenneth C. Louden
2. Scanning (Lexical Analysis)

Contents
PART ONE
2.1 The Scanning Process
2.2 Regular Expressions
2.3 Finite Automata

PART TWO
2.4 From Regular Expressions to DFAs
2.5 Implementation of a TINY Scanner
2.6 Use of Lex to Generate a Scanner Automatically
2.1 The Scanning Process
The Function of a Scanner
• Reading characters from the source code
and forming them into logical units called
tokens
• Tokens are logical entities defined as an
enumerated type
– typedef enum
{IF, THEN, ELSE, PLUS, MINUS, NUM, ID, …}
TokenType;
The Categories of Tokens
• RESERVED WORDS
– Such as IF and THEN, which represent the strings of
characters “if” and “then”
• SPECIAL SYMBOLS
– Such as PLUS and MINUS, which represent the
characters “+” and “-”
• OTHER TOKENS
– Such as NUM and ID, which represent numbers and
identifiers
Relationship between a Token and Its
String
• The string is called the STRING VALUE or
LEXEME of the token
• Some tokens have only one lexeme, such as
reserved words
• A token may have infinitely many lexemes,
such as the token ID
Relationship between a Token and Its
String
• Any value associated with a token is called an attribute of the
token
– The string value is an example of an attribute.
– A NUM token may have a string value such as “32767” and the actual
numeric value 32767
– A PLUS token has the string value “+” as well as the arithmetic
operation +
• The token can be viewed as the collection of all of its
attributes
– The scanner only needs to compute as many attributes as necessary to allow
further processing
– The numeric value of a NUM token need not be computed immediately
Some Practical Issues of the Scanner
• One structured data type collects all the
attributes of a token, called a token record
– typedef struct
{ TokenType tokenval;
char *stringval;
int numval;
} TokenRecord;
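The token record, together with the earlier remark that the numeric value of a NUM token need not be computed immediately, can be sketched as follows. This is only an illustration: the helper names makeToken and tokenNumber are hypothetical and do not come from the slides, and the enumeration is cut down to the tokens named earlier.

```c
#include <stdlib.h>

typedef enum { IF, THEN, ELSE, PLUS, MINUS, NUM, ID } TokenType;

typedef struct {
    TokenType tokenval;   /* kind of token */
    char *stringval;      /* lexeme as it appeared in the source */
    int numval;           /* numeric value, meaningful only for NUM */
} TokenRecord;

/* Hypothetical helper: builds a record without computing numval. */
TokenRecord makeToken(TokenType kind, char *lexeme) {
    TokenRecord t = { kind, lexeme, 0 };
    return t;
}

/* Hypothetical helper: computes the numeric attribute only on request. */
int tokenNumber(TokenRecord *t) {
    if (t->tokenval == NUM)
        t->numval = (int)strtol(t->stringval, NULL, 10);
    return t->numval;
}
```

A later phase that needs the value of the lexeme “32767” would call tokenNumber on the record; tokens that never need a number never pay for the conversion.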
Some Practical Issues of the Scanner
• The scanner returns the token value only and
places the other attributes in variables
TokenType getToken(void);
• As an example of the operation of getToken,
consider the following line of C code.
a[index] = 4+2

[Figure: the input buffer holding a[index] = 4+2, shown at two points during scanning]
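A minimal sketch of how getToken might behave on this line is given below. It assumes the whole line sits in a fixed string; the tokens LBRACKET, RBRACKET, ASSIGN, ENDFILE and ERROR, the variables input, pos and tokenString, and the constant MAXTOKENLEN are illustrative additions that the slides do not define.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { ID, NUM, PLUS, MINUS, LBRACKET, RBRACKET, ASSIGN,
               ENDFILE, ERROR } TokenType;

#define MAXTOKENLEN 40

static const char *input = "a[index] = 4+2"; /* stand-in for the source buffer */
static size_t pos = 0;                       /* next unread character */
char tokenString[MAXTOKENLEN + 1];           /* lexeme of the last token */

/* Returns the next token and leaves its lexeme in tokenString. */
TokenType getToken(void) {
    size_t len = 0;
    TokenType result;

    while (input[pos] == ' ' || input[pos] == '\t' || input[pos] == '\n')
        pos++;                               /* skip white space */

    unsigned char c = (unsigned char)input[pos];
    if (c == '\0') {
        result = ENDFILE;
    } else if (isalpha(c)) {                 /* identifier: letter(letter|digit)* */
        while (isalnum((unsigned char)input[pos])) {
            if (len < MAXTOKENLEN) tokenString[len++] = input[pos];
            pos++;
        }
        result = ID;
    } else if (isdigit(c)) {                 /* number: digit+ */
        while (isdigit((unsigned char)input[pos])) {
            if (len < MAXTOKENLEN) tokenString[len++] = input[pos];
            pos++;
        }
        result = NUM;
    } else {                                 /* single-character symbols */
        tokenString[len++] = input[pos++];
        switch (c) {
        case '+': result = PLUS; break;
        case '-': result = MINUS; break;
        case '[': result = LBRACKET; break;
        case ']': result = RBRACKET; break;
        case '=': result = ASSIGN; break;
        default:  result = ERROR; break;
        }
    }
    tokenString[len] = '\0';
    return result;
}
```

Successive calls yield ID (a), LBRACKET, ID (index), RBRACKET, ASSIGN, NUM (4), PLUS, NUM (2) and finally ENDFILE. Only the token value is returned; the lexeme is left in a variable, as described above.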
2.2 Regular Expressions
Some Related Basic Concepts
• Regular expressions
– represent patterns of strings of characters.
• A regular expression r
– is completely defined by the set of strings it matches.
– This set is called the language of r, written L(r).
• The elements of the character set
– are referred to as symbols.
• The set of legal symbols
– is called the alphabet and is written as the Greek symbol ∑.
Some Related Basic Concepts
• A regular expression r
– contains characters from the alphabet that indicate
patterns; for example, a is the character a used as a pattern
• A regular expression r
– may also contain special characters called meta-characters
or meta-symbols
• An escape character can be used to turn off the
special meaning of a meta-character.
– Such as the backslash and quotes
More About Regular Expressions
2.2.1 Definition of Regular Expressions
2.2.2 Extensions to Regular Expressions
2.2.3 Regular Expressions for Programming
Language Tokens
2.2.1 Definition of Regular
Expressions
Basic Regular Expressions
• Single characters from the alphabet
match themselves
– a matches the character a, written L(a) = {a}
– ε denotes the empty string, with L(ε) = {ε}
Regular Expression Operations
• Choice among alternatives, indicated by
the meta-character |
• Concatenation, indicated by juxtaposition
• Repetition or “closure”, indicated by the
meta-character *
Choice Among Alternatives
• If r and s are regular expressions, then r|s is a
regular expression which matches any string that
is matched either by r or by s.
• In terms of languages, the language of r|s is the union
of the languages of r and s, or L(r|s) = L(r) ∪ L(s)
• A simple example: L(a|b) = L(a) ∪ L(b) = {a, b}
• Choice can be extended to more than one
alternative.
Concatenation
• If r and s are regular expressions, then rs is their
concatenation, which matches any string that is the
concatenation of two strings, the first of which
matches r and the second of which matches s.
• In terms of generated languages, the concatenation
of two sets of strings S1 and S2, written S1S2, is the set of strings
of S1, each followed by every string of S2.
• A simple example: (a|b)c matches ac and bc
• Concatenation can also be extended to more than
two regular expressions.
Repetition
• The repetition operation of a regular expression,
called (Kleene) closure, is written r*, where r is a
regular expression. The regular expression r*
matches any finite concatenation of strings, each
of which matches r.
• A simple example: a* matches the strings ε,
a, aa, aaa, …
• In terms of generated languages, given a set S of
strings, S* is an infinite union of sets, but each element
of it is a finite concatenation of strings from S
Precedence of Operations and Use of
Parentheses
• The standard convention
Repetition * has the highest precedence
Concatenation has the next highest
Choice | has the lowest
• A simple example
a|bc* is interpreted as a|(b(c*))
• Parentheses are used to indicate a different
precedence
Names for Regular Expressions
• Give a name to a long regular expression
– digit = 0|1|2|3|4|…|9
– (0|1|2|3|…|9)(0|1|2|3|…|9)* can then be written as digit digit*
Definition of Regular Expression
• A regular expression is one of the following:
(1) A basic regular expression, a single legal character a from
alphabet ∑ or meta-character ε.
(2) The form r|s, where r and s are regular expressions
(3) The form rs, where r and s are regular expressions
(4) The form r*, where r is a regular expression
(5) The form (r), where r is a regular expression

• Parentheses do not change the language.

Examples of Regular Expressions
Example 1:
– ∑={ a,b,c}
– the set of all strings over this alphabet that contain exactly one b.
– (a|c)*b(a|c)*

Example 2:
– ∑={ a,b,c}
– the set of all strings that contain at most one b.
– (a|c)*|(a|c)*b(a|c)* or, equivalently, (a|c)*(b|ε)(a|c)*
– the same language may be generated by many different regular
expressions.
Examples of Regular Expressions
Example 3:
– ∑={ a,b}
– the set of strings consisting of a single b surrounded by the same
number of a’s.
– S = {b, aba, aabaa, aaabaaa, …} = { a^n b a^n | n ≥ 0 }
– This set cannot be described by a regular expression.
• “regular expressions can’t count”

– not all sets of strings can be generated by regular expressions.

– a regular set: a set of strings that is the language of a regular
expression; the name distinguishes it from other sets.
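The set in Example 3 is easy to recognize once a counter is available, which is exactly what a regular expression (or a finite automaton with a fixed number of states) lacks. The sketch below is illustrative only; the function name isBalancedAroundB is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if s has the form a^n b a^n for some n >= 0.
   The two counters can grow without bound, so this check is not
   something a fixed, finite set of states could perform. */
bool isBalancedAroundB(const char *s) {
    size_t before = 0, after = 0, i = 0;

    while (s[i] == 'a') { before++; i++; }   /* count leading a's */
    if (s[i] != 'b') return false;           /* exactly one b must follow */
    i++;
    while (s[i] == 'a') { after++; i++; }    /* count trailing a's */

    return s[i] == '\0' && before == after;
}
```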
Examples of Regular Expressions
Example 4:
– ∑={ a,b,c}
– The strings that contain no two consecutive b’s
– First attempt: ( (a|c)* | (b(a|c))* )*
– Simplified: ( (a|c) | (b(a|c)) )* or (a | c | ba | bc)*
• Not yet the correct answer: strings that end in b are not matched
The correct regular expression
– (a | c | ba | bc)* (b|ε)
– (b|ε) (a | c | ab | cb)*
– (notb | b notb)* (b|ε), where notb = a|c
Examples of Regular Expressions
Example 5:
– ∑={ a,b,c}
– ((b|c)* a(b|c)*a)* (b|c)*
– Determine a concise English description of
the language
– the strings that contain an even number of a’s
– (nota* a nota* a)* nota*, where nota = b|c
2.2.2 Extensions to Regular
Expression
List of New Operations
1) one or more repetitions
r+
2) any character
period "."
3) a range of characters
[0-9], [a-zA-Z]
List of New Operations
4) any character not in a given set
~(a|b|c): a character that is neither a nor b nor c
[^abc] in Lex
5) optional sub-expressions
– r?: the strings matched by r are optional
2.2.3 Regular Expressions for
Programming Language Tokens
Numbers, Reserved Words, and
Identifiers
Numbers
– nat = [0-9]+
– signedNat = (+|-)?nat
– number = signedNat("." nat)? (E signedNat)?
Reserved Words and Identifiers
– reserved = if | while | do | …
– letter = [a-zA-Z]
– digit = [0-9]
– identifier = letter(letter|digit)*
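The three number definitions translate almost directly into code, one function per name. The following is a hand-written sketch rather than generated code; the function names matchNat, matchSignedNat and isNumber are hypothetical, and isNumber requires the whole string to match.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* nat = [0-9]+ : returns a pointer just past the digits, or NULL if none. */
static const char *matchNat(const char *p) {
    if (!isdigit((unsigned char)*p)) return NULL;
    while (isdigit((unsigned char)*p)) p++;
    return p;
}

/* signedNat = (+|-)? nat */
static const char *matchSignedNat(const char *p) {
    if (*p == '+' || *p == '-') p++;
    return matchNat(p);
}

/* number = signedNat ("." nat)? (E signedNat)? */
bool isNumber(const char *s) {
    const char *p = matchSignedNat(s);
    if (p == NULL) return false;

    if (*p == '.') {                         /* optional fraction */
        const char *q = matchNat(p + 1);
        if (q == NULL) return false;         /* "." must be followed by digits */
        p = q;
    }
    if (*p == 'E') {                         /* optional exponent */
        const char *q = matchSignedNat(p + 1);
        if (q == NULL) return false;         /* "E" must be followed by signedNat */
        p = q;
    }
    return *p == '\0';                       /* nothing may be left over */
}
```

Under these definitions 123, -3.14 and +2E-10 are numbers, while 1. and .5 are not, because nat requires at least one digit on each side of the period.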
Comments
Several forms:
{ this is a Pascal comment }        {(~})*}
; this is a Scheme comment
-- this is an Ada comment           --(~newline)* newline
/* this is a C comment */
– cannot be written as ba(~(ab))*ab (with b standing for / and a for *), since ~ is restricted to single characters
– one solution for ~(ab): b*(a*~(a|b)b*)*a*

Because of the complexity of such regular expressions, comments are usually
handled by ad hoc methods in actual scanners.
Ambiguity
Ambiguity: some strings can be matched
by several different regular expressions.
– Either an identifier or a keyword: the keyword
interpretation is preferred.
– A single token or a sequence of several tokens:
the single token is preferred (the principle of
longest substring).
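One common way to apply both rules is to scan the longest identifier-shaped string first and only then look it up in a table of reserved words. The sketch below assumes that approach; the table contents and the name reservedLookup are illustrative, not taken from the slides.

```c
#include <stddef.h>
#include <string.h>

typedef enum { IF, THEN, ELSE, WHILE, DO, ID } TokenType;

/* Illustrative reserved-word table. */
static const struct { const char *str; TokenType tok; } reservedWords[] = {
    { "if", IF }, { "then", THEN }, { "else", ELSE },
    { "while", WHILE }, { "do", DO }
};

/* Classifies a complete lexeme that already matched letter(letter|digit)*.
   The keyword interpretation wins when the whole lexeme is a reserved word. */
TokenType reservedLookup(const char *lexeme) {
    size_t n = sizeof reservedWords / sizeof reservedWords[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(lexeme, reservedWords[i].str) == 0)
            return reservedWords[i].tok;
    return ID;
}
```

Because the lookup sees the longest possible lexeme, a string such as iffy is classified as ID and is never split into the keyword if followed by fy.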
White Space and Lookahead
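White space:
– Delimiters: characters that are unambiguously part of
other tokens are delimiters.
– whitespace = ( newline | blank | tab | comment)+
– free format or fixed format
Lookahead:
– buffering of input characters, marking places for
backtracking
DO99I=1,10 (a loop: DO is a keyword)
DO99I=1.10 (an assignment to the variable DO99I)
– In FORTRAN, where blanks are not significant, the scanner cannot tell
the two apart until it reaches the comma or the period.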
2.3 FINITE AUTOMATA
Introduction to Finite Automata
• Finite automata (finite-state machines) are a
mathematical way of describing particular kinds of
algorithms.
• There is a strong relationship between finite automata and
regular expressions
• identifier = letter (letter | digit)*
[Figure: finite automaton for identifiers; state 1 goes to state 2 on letter, and state 2 loops on letter and on digit]
Introduction to Finite Automata
[Figure: the identifier automaton with states 1 and 2, repeated]
• Transitions:
– Record a change from one state to another upon a match of the
character or characters by which they are labeled.
• Start state:
– The state in which the recognition process begins
– Drawn with an unlabeled arrowed line coming into it “from nowhere”
• Accepting states:
– Represent the end of the recognition process.
– Drawn with a double-line border around the state in the diagram
More About Finite Automata
2.3.1 Definition of Deterministic Finite
Automata
2.3.2 Lookahead, Backtracking, and
Nondeterministic Automata
2.3.3 Implementation of Finite Automata in
Code
2.3.1 Definition of Deterministic
Finite Automata
The Concept of DFA
DFA: an automaton where the next state is uniquely given by the
current state and the current input character.

Definition of a DFA:
A DFA (Deterministic Finite Automaton) M consists of
(1) an alphabet ∑,
(2) a set of states S,
(3) a transition function T : S × ∑ → S,
(4) a start state s0 ∈ S,
(5) and a set of accepting states A ⊂ S
The Concept of DFA
The language accepted by a DFA M, written L(M),
is defined to be
the set of strings of characters c1c2c3…cn with each ci ∈ ∑ such that
there exist states s1 = T(s0, c1), s2 = T(s1, c2), …, sn = T(sn-1, cn) with sn an
element of A (i.e. an accepting state).

The existence of such states, ending in an accepting state sn, means the same thing as the
diagram:
→ s0 –c1→ s1 –c2→ s2 → … → sn-1 –cn→ sn
Some differences between the definition
of a DFA and the diagram:
[Figure: identifier DFA with states named start and in_id; start goes to in_id on letter, and in_id loops on letter and on digit]
1) The definition does not restrict the set of states to numbers.
2) We have not labeled the transitions with characters but with names
representing sets of characters.
3) By the definition T : S × ∑ → S, T(s, c) must have a value for every s and
c.
– In the diagram, T(start, c) is defined only if c is a letter, and T(in_id, c) is
defined only if c is a letter or a digit.
– Error transitions are not drawn in the diagram but are simply assumed
to always exist.
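Examples of DFA
Example 2.6: accepts the strings that contain exactly one b
[Figure: two-state DFA; both states loop on notb, b leads from the start state to the second state, and only the second state is accepting]
Example 2.7: accepts the strings that contain at most one b
[Figure: the same two-state DFA with both states accepting]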
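Examples of DFA
Example 2.8: digit = [0-9]
nat = digit+
signedNat = (+|-)? nat
number = signedNat("." nat)?(E signedNat)?

A DFA of nat:
[Figure: the start state goes on digit to an accepting state that loops on digit]
A DFA of signedNat:
[Figure: the nat DFA preceded by an optional transition on + or −]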
Examples of DFA
Example 2.8 (continued)

A DFA of number:
[Figure: the signedNat DFA, optionally followed by "." and digits, optionally followed by E and a signedNat]
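Examples of DFA
Example 2.9: A DFA of C Comments
(easier than writing down a regular expression)
[Figure: DFA with states 1 to 5; 1 goes to 2 on /, 2 goes to 3 on *, 3 loops on other and goes to 4 on *, 4 loops on *, returns to 3 on other, and goes to the accepting state 5 on /]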
2.3.2 Lookahead, Backtracking,
and Nondeterministic Automata
A Typical Action of a DFA Algorithm
• Making a transition: move the character from the input
string to a string that accumulates the characters
belonging to a single token (the token string value or
lexeme of the token)
• Reaching an accepting state: return the token just
recognized, along with any associated attributes.
• Reaching an error state: either back up in the input
(backtracking) or generate an error token.
[Figure: start goes to in_id on letter; in_id loops on letter and on digit; in_id goes to finish on [other], returning ID]
Finite automaton for an identifier
with delimiter and return value
• The error state represents the fact that either
an identifier is not to be recognized (if we came
from the start state) or a delimiter has been seen
and we should now accept and generate an
identifier token.
• [other]: indicates that the delimiting character
should be considered lookahead; it should be
returned to the input string and not consumed.
Finite automaton for an identifier
with delimiter and return value
• This diagram also expresses the principle of
longest substring described in Section 2.2.4:
the DFA continues to match letters and digits (in
state in_id) until a delimiter is found.
• By contrast, the old diagram allowed the DFA
to accept at any point while reading an
identifier string.
[Figure: the old diagram (start, in_id, no delimiter) next to the new diagram (start, in_id, [other] to finish, return ID)]
How to arrive at the start state in
the first place
(combine all the tokens into one DFA)
Each of these tokens begins with a
different character
• Consider the tokens given by the strings :=, <=, and =
• Each of these is a fixed string, and DFAs for them
can be written as shown at right
[Figure: three separate DFAs; : then = returns ASSIGN, < then = returns LE, = returns EQ]
• Uniting all of their start states into a single start state
gives the DFA
[Figure: one DFA whose start state has transitions on :, < and = leading to return ASSIGN, return LE and return EQ]

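Several tokens beginning with the
same character
• They cannot simply be written as the right diagram, since
it is not a DFA
[Figure: a start state with three separate transitions on <; one followed by = returns LE, one followed by > returns NE, one returns LT]
• The diagram can be rearranged into a
DFA
[Figure: a single transition on < from the start state, followed by = (return LE), > (return NE) or [other] (return LT)]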
Expand the Definition of a Finite
Automaton
• One solution to the problem is to expand
the definition of a finite automaton
• More than one transition from a state
may exist for a particular character
(NFA: non-deterministic finite automaton)
• This requires developing an algorithm for systematically
turning these NFAs into DFAs
ε-transition
• A transition that may occur without consulting the
input string (and without consuming any characters)
[Figure: an arrow between two states labeled ε]
• It may be viewed as a "match" of the empty string.
• (This should not be confused with a match of the
character ε in the input.)
ε-Transitions Used in Two Ways
• First: to express a choice of
alternatives in a way that does not involve
combining states
– Advantage: keeping the
original automata intact
and only adding a new
start state to connect them
[Figure: a new start state with ε-transitions to the start states of the automata for :=, <= and =]
• Second: to explicitly
describe a match of the
empty string.
[Figure: an ε-transition between two states]
Definition of NFA
• An NFA (non-deterministic finite automaton) M consists
of
– an alphabet Σ, a set of states S,
– a transition function T : S × (Σ ∪ {ε}) → ℘(S),
– a start state s0 from S, and a set of accepting states A from S
• The language accepted by M, written L(M),
– is defined to be the set of strings of characters c1c2…cn with
– each ci from Σ ∪ {ε} such that
– there exist states s1 in T(s0, c1), s2 in T(s1, c2), …, sn in T(sn-1, cn)
with sn an element of A.
Some Notes
• Any of the ci in c1c2…cn may be ε, and
the string that is actually accepted is the string c1c2…cn with the ε's
removed (since the concatenation of s with ε is s itself).
Thus, the string c1c2…cn may actually have fewer than n
characters in it.
• The sequence of states s1, …, sn is chosen from the sets of
states T(s0, c1), …, T(sn-1, cn), and this choice will not
always be uniquely determined.
The sequence of transitions that accepts a particular string is
not determined at each step by the state and the next input
character.
Indeed, arbitrary numbers of ε's can be introduced into the string at
any point, corresponding to any number of ε-transitions in the NFA.
Some Notes
• An NFA does not represent an algorithm.
However, it can be simulated by an algorithm
that backtracks through every non-deterministic
choice.
Examples of NFAs
Example 2.10
[Figure: NFA with states 1 to 4 and transitions labeled a, b and ε]
• The string abb can be accepted by either
of the following sequences of transitions:
→ 1 –a→ 2 –b→ 4 –ε→ 2 –b→ 4
→ 1 –a→ 3 –ε→ 4 –ε→ 2 –b→ 4 –ε→ 2 –b→ 4
• This NFA accepts the language of the
regular expression ab+|ab*|b*, which is the same as (a|ε)b*
• The DFA at left accepts the same language.
[Figure: DFA for (a|ε)b*]
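The backtracking simulation mentioned in the notes above can be sketched for an NFA like the one in Example 2.10. The transition list below is a reconstruction from the two accepting sequences and the stated language (the diagram itself did not survive extraction), so it should be read as an assumption; the names Transition, nfa, accepts and nfaAccepts are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define EPS '\0'   /* label used in this sketch for an ε-transition */

typedef struct { int from; char label; int to; } Transition;

/* Assumed transitions: 1 -a-> 2, 1 -a-> 3, 1 -ε-> 4,
   2 -b-> 4, 3 -ε-> 4, 4 -ε-> 2; state 4 is accepting. */
static const Transition nfa[] = {
    {1, 'a', 2}, {1, 'a', 3}, {1, EPS, 4},
    {2, 'b', 4}, {3, EPS, 4}, {4, EPS, 2}
};
static const int acceptingState = 4;

/* Tries every transition out of `state`; a failed choice simply returns
   false and the loop moves on to the next alternative (backtracking).
   This terminates only because the NFA has no cycle of ε-transitions. */
static bool accepts(int state, const char *s) {
    if (*s == '\0' && state == acceptingState) return true;
    for (size_t i = 0; i < sizeof nfa / sizeof nfa[0]; i++) {
        if (nfa[i].from != state) continue;
        if (nfa[i].label == EPS) {
            if (accepts(nfa[i].to, s)) return true;       /* consume nothing */
        } else if (*s != '\0' && nfa[i].label == *s) {
            if (accepts(nfa[i].to, s + 1)) return true;   /* consume one character */
        }
    }
    return false;
}

bool nfaAccepts(const char *s) { return accepts(1, s); }
```

With these transitions nfaAccepts agrees with (a|ε)b*: it accepts abb, a, bbb and the empty string, and rejects ba and aa.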
Examples of NFAs
Example 2.11
[Figure: NFA with states 1 to 10 for (a|c)*b, with an a branch through states 3 and 4, a c branch through states 5 and 6, and ε-transitions connecting them]
• It accepts the string acab by making the following
transitions (unlabeled steps are ε-transitions):
– (1)(2)(3)a(4)(7)(2)(5)(6)c(7)(2)(3)a(4)(7)(8)(9)b(10)
• It accepts the same language as that generated by the
regular expression (a|c)*b
2.3.3 Implementation of Finite
Automata in Code
Ways to Translate a DFA or NFA
into Code
The code for the DFA accepting identifiers:
[Figure: DFA with states 1, 2, 3; 1 goes to 2 on letter, 2 loops on letter and on digit, 2 goes to 3 on [other]]
  { starting in state 1 }
  if the next character is a letter then
    advance the input;
    { now in state 2 }
    while the next character is a letter or a digit do
      advance the input; { stay in state 2 }
    end while;
    { go to state 3 without advancing the input }
    accept;
  else
    { error or other cases }
  end if;

Two drawbacks:
• It is ad hoc: each DFA has to be treated slightly differently, and it is difficult
to state an algorithm that will translate every DFA to code in this way.
• The complexity of the code increases dramatically as the number of states rises or,
more specifically, as the number of different states along arbitrary paths rises.
Ways to Translate a DFA or NFA
into Code
The code of the DFA that accepts C comments:
[Figure: the C comment DFA with states 1 to 5]
  { state 1 }
  if the next character is "/" then
    advance the input; { state 2 }
    if the next character is "*" then
      advance the input; { state 3 }
      done := false;
      while not done do
        while the next input character is not "*" do
          advance the input;
        end while;
        advance the input; { state 4 }
        while the next input character is "*" do
          advance the input;
        end while;
        if the next input character is "/" then
          done := true;
        end if;
        advance the input;
      end while;
      accept; { state 5 }
    else { other processing }
    end if;
  else { other processing }
  end if;
Ways to Translate a DFA or NFA
into Code
A better method:
• Using a variable to maintain the current state and
• writing the transitions as a doubly nested case statement inside a loop,
• where the first case statement tests the current state and the nested second level tests the input
character.

The code of the DFA for identifiers:
[Figure: DFA with states 1, 2, 3; 1 goes to 2 on letter, 2 loops on letter and on digit, 2 goes to 3 on [other]]
  state := 1; { start }
  while state = 1 or 2 do
    case state of
    1: case input character of
       letter: advance the input;
               state := 2;
       else state := … { error or other };
       end case;
    2: case input character of
       letter, digit: advance the input;
                      state := 2; { actually unnecessary }
       else state := 3;
       end case;
    end case;
  end while;
  if state = 3 then accept else error;
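The state-variable scheme for the identifier DFA might look as follows in C. This is a sketch under a few assumptions: the state names and the function scanIdentifier are hypothetical, an explicit ERROR_STATE stands in for the "error or other" case, and the inner case on character classes is written with if tests because C's switch cannot match a whole class such as letter.

```c
#include <ctype.h>
#include <stddef.h>

enum { ERROR_STATE = 0, START = 1, IN_ID = 2, FINISH = 3 };

/* Returns the length of the identifier at the start of s, or 0 if there is none. */
size_t scanIdentifier(const char *s) {
    int state = START;
    size_t i = 0;                            /* index of the next input character */

    while (state == START || state == IN_ID) {
        unsigned char c = (unsigned char)s[i];
        switch (state) {
        case START:
            if (isalpha(c)) { i++; state = IN_ID; }   /* advance the input */
            else state = ERROR_STATE;
            break;
        case IN_ID:
            if (isalnum(c)) i++;             /* advance and stay in IN_ID */
            else state = FINISH;             /* [other]: not consumed */
            break;
        }
    }
    return state == FINISH ? i : 0;
}
```

For the input index] the function returns 5 and leaves the ] unread, which is the behavior the bracketed [other] transition describes.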
Ways to Translate a DFA or NFA
into Code
The code of the DFA for C comments:
[Figure: the C comment DFA with states 1 to 5]
  state := 1; { start }
  while state = 1, 2, 3 or 4 do
    case state of
    1: case input character of
       "/": advance the input;
            state := 2;
       else state := … { error or other };
       end case;
    2: case input character of
       "*": advance the input;
            state := 3;
       else state := … { error or other };
       end case;
    3: case input character of
       "*": advance the input;
            state := 4;
       else advance the input { and stay in state 3 };
       end case;
    4: case input character of
       "/": advance the input;
            state := 5;
       "*": advance the input; { and stay in state 4 }
       else advance the input;
            state := 3;
       end case;
    end case;
  end while;
  if state = 5 then accept else error;
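A C rendering of the same state-variable scheme for comments is sketched below. It assumes the candidate comment is a complete string, uses state 0 as the error state, and also stops at the end of the string so that an unterminated comment is rejected; the function name isCComment is hypothetical.

```c
#include <stdbool.h>

/* Returns true if s consists of exactly one C comment. */
bool isCComment(const char *s) {
    int state = 1;                           /* start state; 0 means error */

    while (state >= 1 && state <= 4 && *s != '\0') {
        char c = *s;
        switch (state) {
        case 1: state = (c == '/') ? 2 : 0; break;
        case 2: state = (c == '*') ? 3 : 0; break;
        case 3: state = (c == '*') ? 4 : 3; break;            /* other: stay in 3 */
        case 4: state = (c == '/') ? 5 : (c == '*') ? 4 : 3; break;
        }
        s++;                                 /* every transition here consumes input */
    }
    return state == 5 && *s == '\0';
}
```

The string /***/ is accepted by passing through state 4 twice, while /*/ ends in state 3 and is rejected.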
Ways to Translate a DFA or NFA
into Code
Generic code:
Express the DFA as a data structure and then write
"generic" code;
A transition table, or two-dimensional array,
indexed by state and input character, that expresses
the values of the transition function T

[Figure: table layout; rows are the states s, columns are the characters c in the alphabet, and the entry at (s, c) is the state T(s, c) representing the transition]
Ways to Translate a DFA or NFA
into Code
The transition table of the DFA for identifiers:

State   letter   digit   other   Accepting
1       2                        no
2       2        2       [3]     no
3                                yes

Brackets indicate “noninput-consuming” transitions.
The last column indicates accepting states.
Assume: the first state listed is the start state.

Ways to Translate a DFA or NFA
into Code
The transition table of the DFA for C comments:

State   /   *   other   Accepting
1       2               no
2           3           no
3       3   4   3       no
4       5   4   3       no
5                       yes

The code scheme:
  state := 1;
  ch := next input character;
  while not Accept[state] and not error(state) do
    newstate := T[state,ch];
    if Advance[state,ch] then ch := next input char;
    state := newstate;
  end while;
  if Accept[state] then accept;

Assumes:
• The transitions are kept in a transition array T indexed by states and input characters;
• The transitions that advance the input (i.e., those not marked with brackets in the table) are given by
the Boolean array Advance, indexed also by states and input characters;
• Accepting states are given by the Boolean array Accept, indexed by states.
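The code scheme can be sketched in C using the identifier table from the previous slide, since that table has a bracketed entry and so shows what Advance is for. Several details are assumptions of this sketch: input characters are mapped to the three column classes by a hypothetical classOf function, blank table entries are represented by an explicit ERR state, and runDFA indexes into a string instead of reading one character ahead.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

enum { LETTER, DIGIT, OTHER, NCLASSES };            /* table columns */
enum { ERR = 0, S1 = 1, S2 = 2, S3 = 3, NSTATES };  /* table rows; ERR = blank entry */

static const int T[NSTATES][NCLASSES] = {
    /* ERR */ { ERR, ERR, ERR },
    /* 1   */ { S2,  ERR, ERR },
    /* 2   */ { S2,  S2,  S3  },
    /* 3   */ { ERR, ERR, ERR }
};
static const bool Advance[NSTATES][NCLASSES] = {
    /* ERR */ { false, false, false },
    /* 1   */ { true,  false, false },
    /* 2   */ { true,  true,  false },   /* [3] does not consume input */
    /* 3   */ { false, false, false }
};
static const bool Accept[NSTATES] = { false, false, false, true };

/* Maps an input character to its column in the table. */
static int classOf(char ch) {
    unsigned char c = (unsigned char)ch;
    if (isalpha(c)) return LETTER;
    if (isdigit(c)) return DIGIT;
    return OTHER;
}

/* Generic driver: returns the number of characters consumed if an
   accepting state is reached, or 0 on error. */
size_t runDFA(const char *s) {
    int state = S1;
    size_t i = 0;
    while (!Accept[state] && state != ERR) {
        int cls = classOf(s[i]);
        int newstate = T[state][cls];
        if (Advance[state][cls]) i++;
        state = newstate;
    }
    return Accept[state] ? i : 0;
}
```

Only the three arrays and classOf describe the identifier DFA; the loop in runDFA would stay the same for the C comment table, which is the point of the table-driven approach.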
Features of Table-Driven Method
Table driven: use tables to direct the progress of the algorithm.

The advantage:
• The size of the code is reduced, the same code will work for many different problems,
and the code is easier to change (maintain).

The disadvantage:
• The tables can become very large, causing a significant increase in the space used by the
program. Indeed, much of the space in the arrays we have just described is wasted.
• Table-driven methods often rely on table-compression methods such as sparse-array
representations, although there is usually a time penalty to be paid for such compression,
since table lookup becomes slower. Since scanners must be efficient, these methods are
rarely used for them.

NFAs can be implemented in similar ways to DFAs, except that, since NFAs are nondeterministic,
• there are potentially many different sequences of transitions that must be tried.
• A program that simulates an NFA must store up transitions that have not yet been tried
and backtrack to them on failure.
End of Part One

THANKS
