Chapter 3 Finite Automata and Lexical Analysis

Uploaded by

sikaryoseph
Chapter 3:

Finite automata and lexical analysis


Contents

• Lexical analysis
  - The role of the lexical analyzer
  - Lexical scanning, token classes, keyword recognition
• Finite automata
  - Alphabet, strings and languages
  - Regular expressions
  - Finite automata (DFA and NFA)
  - Regular expression to finite automaton conversion
  - Minimizing the number of states of a DFA
Lexical analysis
• Lexical analysis (scanning) plays an important role in the compilation of a program.
• It takes the source program as input, reads it one character at a time, and produces the equivalent token stream.
• For example, consider the source statement A = B + C * 50.
• The token stream produced by the lexical analyzer is x1 = x2 + x3 * 50, where x1, x2 and x3 are tokens.
Lexical Analyzer
• Scans the pure HLL code line by line.
• Takes lexemes as input and produces tokens.
• Removes comments and whitespace from the pure HLL code.
• Helps in macro expansion in the pure HLL code.
• Other tasks performed by lexers are:
  - skipping comments and white space;
  - detecting malformed tokens (lexical errors).
Lexical analysis
• Input program representation: character sequence
• Output program representation: token sequence
• Analysis specification: regular expressions
• Recognizing (abstract) machine: finite automata
• Implementation: finite automata
Role of the Lexical Analyzer
• The lexical analyzer performs the following tasks:
  - Reads the source program, scans the input characters, groups them into lexemes, and produces tokens as output.
  - Enters the identified tokens into the symbol table.
  - Strips whitespace and comments from the source program.
  - Correlates error messages with the source program, i.e., reports each error together with the line number where it occurs.
  - Expands macros if any are found in the source program.
Need of Lexical Analyzer
• Simplicity of compiler design
  - Removing white space and comments lets the syntax analyzer handle syntactic constructs efficiently.
• Compiler efficiency is improved
  - Specialized buffering techniques for reading characters speed up compilation.
• Compiler portability is enhanced


Description of a Language
 Syntax: the form or structure of the expressions, statements, and program
units.
• Specifying how statements, declarations, and other language constructs
are written.
 Semantics: the meaning of the expressions, statements, and program units.

• What programs do, their behavior and meaning.


 Semantics is more complex and involved. It is harder to define, e.g., natural
language .
 Example: if statement
• Syntax: if (<expr>) <statement>

• Semantics: if <expr> is true, execute <statement>


Description of a Language
• A sentence is a string of characters over some alphabet.
• A language is a set of sentences.
• A lexeme is the lowest-level syntactic unit of the language (e.g., +, int, total).
• The lexemes of a PL include its numeric literals, operators, and special words.
• Lexemes are partitioned into groups; for example, the names of variables, methods, classes, and so forth in a PL form a group called identifiers.
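Description of a Language
• A token is a category of lexemes (e.g., identifier, keyword, whitespace).
• In programming languages, the following are the possible tokens to be identified:
[Figure: token categories with sample lexemes. Recoverable labels: punctuation (, ; :); letters (a, g, f); numeric literals (123, 123.45, 123E with an optional + or - sign, 123.45E); keywords (if, else, while, struct, float, int, begin, …).]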
Example
• Consider the following Java statement: index = 2 * count + 17;
• The lexemes and tokens of this statement are:
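The lexeme/token pairing for this statement can be sketched with a small scanner. This is only an illustration: the token names (identifier, int_literal, equal_sign, mult_op, plus_op, semicolon) and the function name tokenize are assumptions chosen for the example, not names taken from the slides.

```python
import re

# Hypothetical token names paired with the regular expression for each.
# Order matters: the first alternative that matches wins.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z][A-Za-z_0-9]*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (lexeme, token) pairs, dropping whitespace."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.group(), m.lastgroup))
        pos = m.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<6} {token}")
```

Each printed row pairs one lexeme (left) with the token category it belongs to (right).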
Description of a Language
• A pattern describes a rule that a sequence of characters (a lexeme) must match to form a token.
• It can be defined by regular expressions or grammar rules.
• For example, [A-Za-z][A-Za-z_0-9]* matches a letter followed by any number of letters, digits, or underscores.
Exercise
Count the number of tokens in a given code segment:
1. Count the number of tokens in the following C statement:
printf("i=%d, &i=%x", i, &i);
2. Consider the program
int max (x, y)
int x, y;
{
/* find max of x and y*/
{
return (x>y ? x:y)
}
Q1. How many tokens are found in this source code?
Q2. Identify lexeme, and token.
Approaches of building Lexical Analyzer
• There are three approaches to building a lexical analyzer:
1) Write a formal description of the token patterns of the language using a descriptive language related to regular expressions.
  - These descriptions are used as input to a software tool (for example, Lex) that automatically generates a lexical analyzer.
2) Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram.
3) Design a state transition diagram that describes the token patterns of the language and hand-construct a table-driven implementation of the state diagram.
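The third approach can be sketched as follows. This is a minimal illustration under assumptions: the state names (start, in_id, in_int), the character classes, and the token names are hypothetical, and the table covers only identifiers and integer literals.

```python
def char_class(ch):
    """Map a character to the column of the transition table it selects."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Transition table of the (hypothetical) state diagram: (state, class) -> next state.
TABLE = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_int",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_int", "digit"): "in_int",
}
# Accepting states and the token each one reports.
ACCEPTING = {"in_id": "identifier", "in_int": "int_literal"}


def scan(text):
    """Run the table repeatedly, always taking the longest possible lexeme."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, begin = "start", pos
        while pos < len(text) and (state, char_class(text[pos])) in TABLE:
            state = TABLE[(state, char_class(text[pos]))]
            pos += 1
        if state in ACCEPTING:
            tokens.append((ACCEPTING[state], text[begin:pos]))
        else:
            # No transition out of the start state: report a one-character symbol.
            tokens.append(("symbol", text[pos]))
            pos += 1
    return tokens


print(scan("count1 = 42"))
```

The control loop never changes when the language grows; only TABLE and ACCEPTING do, which is the main appeal of the table-driven style.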
Approaches of building Lexical Analyzer
- Lexemes, Tokens and Patterns

C-Tokens:
If:
Identifier:
Integer:
Specification of Tokens (Alphabet, Strings and Languages)
• Some important definitions regarding languages include:
1) Alphabet: a finite, non-empty set of symbols.
  - It is denoted by Σ (Greek letter sigma).
  - Example: Σ = {a, b}, Σ = {i, j, k}
  - Roman alphabet: Σ = {a, b, …, z}
  - Binary alphabet: Σ = {0, 1}, which is pertinent to the theory of computation.
Alphabet, Strings & Languages
• String: a finite sequence of symbols, usually written next to one another and not separated by commas.
• Examples:
1) If Σa = {0, 1}, then 001001 is a string over Σa.
2) If Σb = {a, b, …, z}, then axyrpqstcd is a string over Σb.
3) 00110, 01, 1, 0 are all strings over the alphabet Σ = {0, 1}.
4) abab, aabb, ab, ba, a are all strings over the alphabet Σ = {a, b}.
String Operations
1) Empty string: the string of length zero is called the empty string.
  - It is denoted by ε.
  - The empty string plays the role of 0 in a number system.
2) Reverse string: if w = w1 w2 … wn, where each wi ∈ Σ, the reverse of w is wn wn-1 … w1.
3) Substring: z is a substring of w if z appears consecutively within w.
  - Example: deck is a substring of abcdeckabcjkl.
String Operations
4) Concatenation: given a string x of length m and a string y of length n, the concatenation of x and y is written xy. It is the string obtained by appending y to the end of x, as in x1 x2 … xm y1 y2 … yn.
  - To concatenate a string with itself many times, we use superscript notation: w^k denotes k copies of w written one after another, with w^0 = ε.
Lexicographic ordering
• The lexicographic ordering of strings is the same as the dictionary ordering, except that shorter strings precede longer strings.
• The lexicographic ordering of all strings over the alphabet {0, 1} is
  - (ε, 0, 1, 00, 01, 10, 11, 000, ...).
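The ordering in which shorter strings come first can be generated mechanically. The sketch below is one possible way to do it; the function name shortlex is an assumption made for this example.

```python
from itertools import count, islice, product


def shortlex(alphabet):
    """Yield every string over `alphabet`: shorter strings first, then dictionary order."""
    symbols = sorted(alphabet)
    for length in count(0):
        for combo in product(symbols, repeat=length):
            yield "".join(combo)


# First eight strings over {0, 1}; the empty string prints as ''.
print(list(islice(shortlex("01"), 8)))
```

The generator is infinite, so a caller takes as many strings as needed with islice.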
Languages
• A language is simply a set of strings involving symbols from some alphabet.
• Any set of strings over an alphabet Σ is called a language.
• The set of all strings over an alphabet Σ, including the empty string, is denoted Σ*.
• Infinite languages L are denoted as:
  L = { w ∈ Σ* : w has property P }.
Languages
• For example,
1) L1 = { w ∈ {0, 1}* : w has an equal number of 0's and 1's }.
  - L1 = {01, 10, 1010, 1100, …}
2) L2 = { w ∈ Σ* : w = wR }, where wR is the reverse string of w.
  - L2 = {101, 10101, radar, level, …}
Operations on Language
• Union:
  - If L1 and L2 are two languages, then their union, denoted L1 ∪ L2, is the language containing all strings w that belong to either language.
• Concatenation of languages:
  - If L1 and L2 are languages over Σ, their concatenation is L = L1•L2, or simply L = L1L2, where
    L = { w ∈ Σ* : w = x•y for some x ∈ L1 and y ∈ L2 }.
Example: If L = {001, 10, 111} and M = {ε, 001}, then
  - L.M = {001, 10, 111, 001001, 10001, 111001}
  - L ∪ M = {ε, 001, 10, 111}
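For finite languages, both operations map directly onto set operations. The sketch below reproduces the example with L and M, writing the empty string ε as ""; the function names union and concat are assumptions made for this illustration.

```python
def union(l1, l2):
    """All strings that belong to either language."""
    return l1 | l2


def concat(l1, l2):
    """All strings x + y with x taken from l1 and y taken from l2."""
    return {x + y for x in l1 for y in l2}


L = {"001", "10", "111"}
M = {"", "001"}  # "" plays the role of the empty string

print(sorted(concat(L, M)))
print(sorted(union(L, M)))
```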
Operations on Language
• Kleene star:
  - The "Kleene star" of a language L is denoted by L*.
  - L* is the set of all strings obtained by concatenating zero or more strings from L.
  - L* = { w ∈ Σ* : w = w1 … wk for some k ≥ 0 and some w1, w2, …, wk ∈ L }
  - Example: If L = {01, 1, 100}, then 110001110011 ∈ L*, since 110001110011 = 1•100•01•1•100•1•1 and each of these strings is in L.
  - L* = L^0 ∪ L^1 ∪ L^2 ∪ …, where L^0 = {ε}
• Positive closure: the positive closure of a language L is L+ = L^1 ∪ L^2 ∪ …
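Membership in L* for a finite language L can be decided by checking whether the string splits into pieces from L. The sketch below uses a simple prefix-by-prefix table; the function name in_star is an assumption made for this illustration.

```python
def in_star(w, language):
    """Return True if w is a concatenation of zero or more strings from `language`."""
    # reachable[i] is True when the prefix w[:i] can be split into strings of the language.
    reachable = [False] * (len(w) + 1)
    reachable[0] = True  # the empty prefix is the concatenation of zero strings
    for end in range(1, len(w) + 1):
        reachable[end] = any(
            reachable[end - len(x)] and w.endswith(x, 0, end)
            for x in language
            if 0 < len(x) <= end
        )
    return reachable[len(w)]


print(in_star("110001110011", {"01", "1", "100"}))
```

The same function also answers questions of the form "is this string in L*?" for any other finite L.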
Exercise - Operations on Language
• If L1 = {a, b} and L2 = {c, d} then,
1. L1.L2 = ?
2. L1 ∪ L2 = ?
Regular expressions
• Regular expressions are a mathematical tool designed to represent regular languages.
• They are built from a set of primitives and operations.
• This representation involves a combination of strings of symbols from some alphabet Σ, parentheses, and the operators +, ⋅, and *.
• A regular expression is obtained from the symbols of the alphabet (e.g., {a, b, c}), the empty string ε, and the empty set ∅ by applying the operations +, ⋅ and * (union, concatenation and Kleene star).
Regular Expressions
A regular expression can be recursively defined as follows:
1. ε is a regular expression denoting the language containing only the empty string: L(ε) = {ε}.
2. ∅ is a regular expression denoting the empty language: L(∅) = { }.
3. x is a regular expression, for any symbol x of the alphabet, where L(x) = {x}.
4. If X is a regular expression denoting the language L(X) and Y is a regular expression denoting the language L(Y), then
  - X + Y is a regular expression corresponding to the language L(X) ∪ L(Y): L(X + Y) = L(X) ∪ L(Y).
  - X . Y is a regular expression corresponding to the language L(X) . L(Y): L(X . Y) = L(X) . L(Y).
5. If R is a regular expression, then R* is a regular expression corresponding to the language L(R*) = (L(R))*.
• Any expression obtained by applying rules 1 to 5 a finite number of times is a regular expression.
Regular expressions
• Examples
  - 0 + 1 represents the set {0, 1}
  - 1 represents the set {1}
  - (0 + 1)1 represents the set {01, 11}
  - (a + b)⋅(b + c) represents the set {ab, bb, ac, bc}
  - (0 + 1)* = ε + (0 + 1) + (0 + 1)(0 + 1) + … = Σ*
  - (0 + 1)+ = (0 + 1)(0 + 1)* = Σ+ = Σ* - {ε}
Building Regular expressions
• Assume that Σ = {a, b, c}.
• Zero or more: a* means zero or more a's.
  - To say zero or more ab's, i.e., {ε, ab, abab, …}, you need to say (ab)*.
• One or more: since a* means zero or more a's, you can use aa* (or equivalently a*a) to mean one or more a's.
  - Similarly, to describe "one or more ab's", that is {ab, abab, ababab, …}, you can use ab(ab)*.
• Zero or one: an optional 'a' can be described with (a + ε).
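These building blocks can be tried out with Python's re module. Note one notational difference: in textbook notation + means union, whereas re writes union as | and writes an optional symbol as ?. The dictionary name PATTERNS and the helper matches are assumptions made for this sketch.

```python
import re

# Textbook notation on the left, an equivalent Python `re` pattern on the right.
PATTERNS = {
    "a*": r"a*",            # zero or more a's
    "(ab)*": r"(ab)*",      # zero or more ab's
    "aa*": r"aa*",          # one or more a's
    "ab(ab)*": r"ab(ab)*",  # one or more ab's
    "a + ε": r"a?",         # zero or one a
}


def matches(pattern, text):
    """True if the whole of `text` belongs to the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None


for notation, pattern in PATTERNS.items():
    print(notation, [s for s in ["", "a", "aa", "ab", "abab"] if matches(pattern, s)])
```

re.fullmatch is used instead of re.match so that the entire string, not just a prefix, has to match.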
Regular expressions
• Examples: represent the following sets by regular expressions.
  a. {ε, ab}
  b. {1, 11, 111, …}
  c. {ab, a, b, bb}
• Solution
  a. The set {ε, ab} is represented by the regular expression ε + ab.
  b. The set {1, 11, 111, …} is obtained by concatenating 1 with any element of {1}*. Therefore 1(1)* represents the given set.
  c. The set {ab, a, b, bb} is represented by the regular expression ab + a + b + bb.
Regular expressions - Exercises
• Obtain the regular expressions for the following sets:
1. The set of all strings over {a, b} beginning and ending with 'a'.
  ⇒ The regular expression is a(a + b)*a + a (the second term covers the single-symbol string a).
2. {b^2, b^5, b^8, …}
  ⇒ The regular expression is bb(bbb)*.
3. {a^(2n+1) | n > 0}
  ⇒ The regular expression is a(aa)+.
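Regular expressions - Exercises
• Let L = {ab, aa, baa}. Which of the following strings are in L*?
  I. abaabaaabaa
  II. aaaabaaaa
  III. baaaaabaaaab
  IV. baaaaabaa
Answer: note that L* is the star closure of the language L, given by L* = L^0 ∪ L^1 ∪ L^2 ∪ L^3 ∪ …
  I. abaabaaabaa = ab•aa•baa•ab•aa, so this string is in L*.
  II. aaaabaaaa = aa•aa•baa•aa, so this string is in L*.
  III. baaaaabaaaab cannot be split into strings of L (after baa•aa•ab•aa•aa a single b remains), so it is not in L*.
  IV. baaaaabaa = baa•aa•ab•aa, so this string is in L*.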
Finite Automata - Basic Concepts
1. What is a Finite Automaton (FA)?
  - Define an FA in terms of its components: states, alphabet, transition function, start state, and final states.
  - Explain the difference between Deterministic Finite Automata (DFA) and Non-Deterministic Finite Automata (NFA).
  - What is the purpose of a Finite Automaton?
2. Explain the subset construction algorithm for NFA-to-DFA conversion.
3. Provide examples of designing FAs for specific languages, such as strings of even length.
4. Explain the concept of state minimization.
Finite Automata
• A finite state automaton (finite automaton) is an abstract machine having:
  - A finite set of states.
  - A start state and a set of final states.
  - A finite set of input symbols (alphabet).
  - A finite set of transition rules, which specify how the machine in a particular state responds to a particular input symbol. The response may be to change state and /or
Finite Automata
• The finite automaton is a mathematical model of a system with discrete inputs and outputs.
• The system can be in any one of a finite number of internal configurations or "states".
• The state of the system summarizes the information concerning past inputs that is needed to determine the behavior of the system on subsequent inputs.
Elements of Finite Automata
• The components of a finite automaton are:
  - Input tape: divided into cells (squares), each of which can hold one symbol from the input alphabet.
  - Finite control: indicates the current state and decides the next state on receiving a particular input from the input tape.
  - Reading head: reads the cells one by one from left to right, one input symbol at a time; it examines the symbol read and moves to the right, with or without a change of state.
Finite Automata
• Finite automata can be classified into two types:
  - Deterministic Finite Automaton (DFA): for each state and input symbol, there is exactly one transition.
  - Non-deterministic Finite Automaton (NDFA / NFA): for a given state and input symbol, there can be multiple possible transitions.
• Applications of FAs:
  - lexical analysis,
  - text search,
Deterministic Finite Automaton (DFA)
• A DFA is a finite state automaton that accepts or rejects finite strings of symbols.
• It produces a unique computation of the automaton for each input string.
• Formal definition of a DFA
  - A DFA is a 5-tuple M = (Q, Σ, δ, q0, F), where
    - Q: a finite set of states
    - Σ: an alphabet of input symbols
    - δ : Q × Σ → Q: a transition function
    - q0 ∈ Q: a start state
    - F ⊆ Q: a set of final (accepting) states
Deterministic Finite Automaton (DFA)
• The input mechanism can move only from left to right and reads exactly one symbol on each step.
• The transitions from one internal state to another are governed by the transition function δ.
• If δ(q0, 0) = q1, then when the DFA is in state q0 and the current input symbol is 0, the DFA goes into state q1.
Example - DFA
• Q = {0, 1, 2}
• Σ = {a}
• 0 = start state
• 1 = final state
• The transition function is:
  - δ(0, a) = 1
  - δ(1, a) = 2
  - δ(2, a) = 2
Example
• Design a DFA such that the language recognized by the automaton is L = {a^n b : n ≥ 0}.
• For the given language L = {a^n b : n ≥ 0}, the strings are b, ab, a^2 b, a^3 b, …
• Therefore the DFA accepts all strings consisting of an arbitrary number of a's followed by a single b.
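One possible DFA for this language can be written down as a transition table and simulated directly. The state names q0, q1 and q2 are assumptions made for this sketch (the original diagram is not reproduced here): q0 loops on a, q1 is reached after the single b, and q2 is a trap state for anything that follows.

```python
# Transition function δ as a dictionary: (state, symbol) -> next state.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
START = "q0"
FINAL = {"q1"}


def accepts(w):
    """Run the DFA on w; symbols outside {a, b} raise KeyError."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINAL


for w in ["b", "ab", "aaab", "", "aba", "abb"]:
    print(repr(w), accepts(w))
```

Because δ is total, the machine always has exactly one next state, which is what makes it deterministic.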
Non-deterministic Finite Automata (NFA)
• In automata theory, a nondeterministic finite automaton (NFA), or nondeterministic finite state machine, is a finite state machine where, from each state and a given input symbol, the automaton may jump into several possible next states.
• A non-deterministic finite state automaton (NFA) is a 5-tuple (Q, Σ, δ, s0, F), where:
  - Q is a finite set called the states;
  - Σ is a finite set called the alphabet;
Example - NFA
• Please note that this is an NFA, as δ(q0, 0) = q0 and δ(q0, 0) = q1, i.e., δ(q0, 0) = {q0, q1}.
DFA Vs NFA:
S.No | DFA | NFA
1. | For every symbol of the alphabet, there is exactly one state transition from each state. | There is no need to specify how the NFA reacts to every symbol in every state.
2. | A DFA cannot use empty-string (ε) transitions. | An NFA can use empty-string (ε) transitions.
3. | A DFA can be understood as one machine. | An NFA can be understood as multiple little machines computing at the same time.
4. | A DFA is more difficult to construct. | An NFA is easier to construct.
DFA Representations
• There are different representations of a DFA, such as:
  - Graph: transition diagram
  - Tabular: transition table
  - Mathematical: detailed description
Transition diagram
• A directed graph whose vertices are states and whose edges are labeled with symbols of the alphabet.
• A diagram consisting of circles to represent states and directed line segments to represent transitions between the states.
• [Figure: symbols for the initial state and the final state.]
Transition table
• A state transition table is a table showing what state a finite state machine (or states, in the case of an NFA) will move to, based on the current state and other inputs.
  - Rows: states
  - Columns: inputs
  - Entries: next state
  - → marks the start state
  - * marks a final state
Detailed description
The mathematical model of an automaton consists of
  - Q → finite set of states
  - Σ → finite set of input symbols
  - δ : Q × Σ → Q, the transition function
  - q0 → start / initial state
  - F → set of final / accepting states

Transition functions
A transition function returns the next state.
Parameters:
  - current state
  - current input symbol
δ(current state, current input symbol) → next state.

Example
δ(q0, 0) → q1
δ(q0, 1) → q0
δ(q1, 0) → q1
δ(q1, 1) → q2
δ(q2, 0) → q2
Example - DFA
Determine the DFA schematic for M = (Q, Σ, δ, q1, F), where Q = {q1, q2, q3}, Σ = {0, 1}, q1 is the start state, F = {q2} and δ is given by the table below.
• Language of accepted strings
• Consider the DFA shown in the figure below.
  - Input strings are 01 and 011.
  - Check the acceptability of each string.
Consider the following NFA given below

Check the acceptability of the following strings


i) 011
ii) 010
iii) 011011
Regular Expressions to NFA
1) The regular expression ε denotes the language {ε}, which contains only the empty string.
  - RE = ε
2) The regular expression ∅ denotes the language ∅; no strings belong to this language, not even the empty string.
  - RE = ∅
3) For any x in Σ, the regular expression x denotes the language {x}.
  - RE = x
Regular Expressions to NFA
4) For juxtaposition (strings in L(r1) followed by strings in L(r2)), we chain the NFAs together.
  - RE = r1r2
5) The "+" denotes "or" in a regular expression, so we use an NFA with a choice of paths.
  - RE = r1 + r2
6) The star (*) denotes zero or more applications of the regular expression, hence a loop has to be set up in the NFA.
  - RE = r*
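The constructions for a single symbol, juxtaposition, "+" and "*" can be sketched in code in the style usually called Thompson's construction. This is an illustrative sketch, not the slides' own implementation: the class name ThompsonNFA, its method names, and the representation of a fragment as a (start, accept) pair are all assumptions.

```python
from itertools import count


class ThompsonNFA:
    """ε-NFA built from fragments; each fragment is a (start, accept) pair of states."""

    def __init__(self):
        self.edges = {}        # state -> list of (symbol, target); symbol None means ε
        self._ids = count()

    def _new_state(self):
        state = next(self._ids)
        self.edges[state] = []
        return state

    def symbol(self, x):       # RE = x (passing None gives RE = ε)
        s, f = self._new_state(), self._new_state()
        self.edges[s].append((x, f))
        return s, f

    def concat(self, a, b):    # RE = r1r2: chain the two fragments with an ε move
        self.edges[a[1]].append((None, b[0]))
        return a[0], b[1]

    def union(self, a, b):     # RE = r1 + r2: a choice of two paths
        s, f = self._new_state(), self._new_state()
        self.edges[s] += [(None, a[0]), (None, b[0])]
        self.edges[a[1]].append((None, f))
        self.edges[b[1]].append((None, f))
        return s, f

    def star(self, a):         # RE = r*: skip the fragment or loop through it
        s, f = self._new_state(), self._new_state()
        self.edges[s] += [(None, a[0]), (None, f)]
        self.edges[a[1]] += [(None, a[0]), (None, f)]
        return s, f

    def closure(self, states):
        """All states reachable from `states` using only ε moves."""
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for sym, target in self.edges[q]:
                if sym is None and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    def accepts(self, fragment, w):
        current = self.closure({fragment[0]})
        for ch in w:
            moved = {t for q in current for sym, t in self.edges[q] if sym == ch}
            current = self.closure(moved)
        return fragment[1] in current


nfa = ThompsonNFA()
any_bits = nfa.star(nfa.union(nfa.symbol("0"), nfa.symbol("1")))   # (0+1)*
print(nfa.accepts(any_bits, "0110"), nfa.accepts(any_bits, "012"))
```

Larger expressions are built the same way, by nesting calls: each operator consumes fragments and returns a new one.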
Examples
1) Construct an NFA with ε moves for the regular expression (0+1)*.
Solution: The NFA is constructed step by step by breaking the regular expression into smaller regular expressions.
  - r3 = (r1 + r2)
  - r = r3*, where r1 = 0 and r2 = 1
[Figures: the NFA for r1, the NFA for r2, the NFA for r3, and finally the NFA for the regular expression (0+1)*.]
Exercise
2) Construct an NFA with ε moves for the regular expression (01+2*)0.
Solution: The NFA is constructed step by step by breaking the regular expression into smaller regular expressions.
  - r = (r1 + r2)r3, where r1 = 01, r2 = 2* and r3 = 0
3) Construct an NFA for the regular expression r = (a|b)*abb.

Equivalence of NFA and DFA
• Two finite accepters M1 and M2 are equivalent iff L(M1) = L(M2), i.e., if both accept the same language.
• Both DFAs and NFAs recognize the same class of languages.
• It is important to note that every NFA has an equivalent DFA.
NDFA to DFA Conversion - Subset Construction
Problem statement
• Let X = (Qx, Σ, δx, q0, Fx) be an NDFA which accepts the language L(X).
• We have to design an equivalent DFA Y = (Qy, Σ, δy, q0, Fy) such that L(Y) = L(X).
Algorithm
• Input: an NDFA
• Output: an equivalent DFA
NDFA to DFA Conversion - Algorithm
• Step 1: Create the state table from the given NDFA.
• Step 2: Create a blank state table under the possible input symbols for the equivalent DFA.
• Step 3: Mark the start state of the DFA by q0 (same as the NDFA).
• Step 4: Find the combination of states {Q0, Q1, ..., Qn} reached for each possible input symbol.
• Step 5: Each time we generate a new DFA state under the input symbol columns, we have to apply step 4 again; otherwise go to step 6.
• Step 6: The states which contain any of the final states of the NDFA are the final states of the equivalent DFA.
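The steps above can be sketched as a worklist loop over sets of NDFA states. This is a minimal illustration for an NDFA without ε moves; the function name nfa_to_dfa and the sample NDFA (strings over {0, 1} ending in 01, with states q0, q1, q2) are assumptions made for the example.

```python
def nfa_to_dfa(nfa_delta, start, nfa_finals, alphabet):
    """Subset construction. nfa_delta maps (state, symbol) -> set of states;
    a missing key means there is no move."""
    start_set = frozenset({start})                 # step 3: the DFA start state
    dfa_delta = {}
    todo, seen = [start_set], {start_set}
    while todo:                                    # step 5: repeat for each new DFA state
        current = todo.pop()
        for sym in alphabet:                       # step 4: combine the targets per symbol
            target = frozenset(
                t for q in current for t in nfa_delta.get((q, sym), ())
            )
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_finals = {s for s in seen if s & nfa_finals}   # step 6
    return seen, dfa_delta, start_set, dfa_finals


# Hypothetical NDFA accepting strings over {0, 1} that end in 01.
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
states, delta, q_start, finals = nfa_to_dfa(NFA_DELTA, "q0", {"q2"}, "01")


def dfa_accepts(w):
    state = q_start
    for sym in w:
        state = delta[(state, sym)]
    return state in finals


print(len(states), dfa_accepts("1101"), dfa_accepts("10"))
```

Only subsets that are actually reachable from the start state are created, so the result is usually far smaller than the full power set.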
• Let us illustrate the conversion of NFA(NDFA ) to
DFA through an example.
Example
Steps for converting NFA with ε to DFA:
• ε-closure: the set of states which can be reached from a state using only ε moves, including the state itself.
01: Take the ε-closure of the starting state of the NFA as the starting state of the DFA.
02: For each input symbol, find the states that can be reached from the current DFA state, i.e., the union of the transition values and their ε-closures for each NFA state contained in the current DFA state.
03: If we find a new state, take it as the current state and repeat step 02.
04: Repeat steps 02 and 03 until no new state appears in the transition table of the DFA.
05: Mark as final every DFA state which contains a final state of the NFA.
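Steps 01 and 02 rest on two small helpers, sketched below. The function names (epsilon_closure, move, dfa_step) and the tiny sample NFA are assumptions made for this illustration; ε moves are kept in a separate dictionary from the ordinary transitions.

```python
def epsilon_closure(states, eps):
    """States reachable from `states` using only ε moves, including the states themselves.
    `eps` maps a state to the set of states one ε move away."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for target in eps.get(q, ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def move(states, symbol, delta):
    """States reachable from `states` by reading one `symbol` (no ε moves)."""
    return {t for q in states for t in delta.get((q, symbol), ())}


def dfa_step(states, symbol, delta, eps):
    """Step 02: the union of the transition values, followed by their ε-closure."""
    return epsilon_closure(move(states, symbol, delta), eps)


# Hypothetical NFA: q0 -ε-> q1 -ε-> q2, q2 loops on a, q0 loops on b.
EPS = {"q0": {"q1"}, "q1": {"q2"}}
DELTA = {("q2", "a"): {"q2"}, ("q0", "b"): {"q0"}}

start = epsilon_closure({"q0"}, EPS)          # step 01
print(sorted(start))
print(sorted(dfa_step(start, "a", DELTA, EPS)))
```

Repeating dfa_step for every new set that appears is exactly the loop described in steps 03 and 04.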
DFA Minimization
DFA minimization is the task of transforming a given deterministic finite automaton (DFA) into an equivalent DFA that has a minimum number of states.
There are two popular methods for minimizing a DFA.

1. Minimization of DFA using the Equivalence Theorem
01: Eliminate all the dead states and inaccessible states from the given DFA (if any).
  - Dead state: a non-final state which transits to itself for all input symbols in Σ.
  - Inaccessible state: a state which can never be reached from the initial state.
02: Now, start applying the equivalence theorem.
  - Take a counter variable k and initialize it with value 0.
  - Let P0 be the partition of the states into two sets: the final states and the non-final states.
1. Minimization of DFA using the Equivalence Theorem (continued)
03: Increment k by 1.
  - Find Pk by partitioning the different sets of Pk-1.
  - In each set of Pk-1, consider all the possible pairs of states within the set; if two states are distinguishable, partition the set into different sets in Pk.
04: Repeat step 03 until no change in the partition occurs.
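The repeated partitioning can be sketched as follows. The sketch assumes that dead and inaccessible states have already been removed and that δ is total; it starts from the usual initial partition into final and non-final states. The function name minimize and the sample DFA (states A to D, accepting strings that end in 01) are assumptions made for the example.

```python
def minimize(states, alphabet, delta, finals):
    """Refine the partition until it stops changing; return it as a set of frozensets."""
    symbols = sorted(alphabet)
    # Initial partition: final states versus non-final states (empty blocks dropped).
    partition = {frozenset(finals), frozenset(states - finals)} - {frozenset()}
    while True:
        block_of = {q: block for block in partition for q in block}
        refined = set()
        for block in partition:
            # Two states stay together only if, for every symbol,
            # their successors lie in the same block of the current partition.
            groups = {}
            for q in block:
                signature = tuple(block_of[delta[(q, a)]] for a in symbols)
                groups.setdefault(signature, set()).add(q)
            refined.update(frozenset(g) for g in groups.values())
        if refined == partition:      # no change in the partition: stop
            return partition
        partition = refined


# Hypothetical DFA over {0, 1} accepting strings that end in 01.
STATES = {"A", "B", "C", "D"}
DELTA = {
    ("A", "0"): "B", ("A", "1"): "C",
    ("B", "0"): "B", ("B", "1"): "D",
    ("C", "0"): "B", ("C", "1"): "C",
    ("D", "0"): "B", ("D", "1"): "C",
}
print(minimize(STATES, "01", DELTA, {"D"}))
```

Each block of the returned partition becomes one state of the minimized DFA; here A and C end up merged.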
Example - Minimize the given DFA using the Equivalence Theorem
Example #2
2. DFA Minimization using the Myhill-Nerode Theorem (Table Filling) method
• Steps
01: Draw a table for all pairs of states (Qi, Qj), not necessarily connected directly. [All are unmarked initially.]
02: Consider every state pair (Qi, Qj) in the DFA where Qi ∈ F and Qj ∉ F, or vice versa, and mark them. [Here F is the set of final states.]
03: Repeat this step until we cannot mark any more pairs:
  - If there is an unmarked pair (Qi, Qj), mark it if the pair {δ(Qi, A), δ(Qj, A)} is marked for some input symbol A.
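The marking procedure can be sketched as below, assuming a total transition function. The function name distinguishable_pairs and the sample DFA (states A to D, accepting strings over {0, 1} that end in 01) are assumptions made for this illustration.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, finals):
    """Table filling: return the set of marked (distinguishable) pairs of states."""
    # Step 02: mark every pair with exactly one final state.
    marked = {
        frozenset(pair)
        for pair in combinations(states, 2)
        if (pair[0] in finals) != (pair[1] in finals)
    }
    # Step 03: keep marking until a full pass adds nothing new.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if frozenset({p, q}) in marked:
                continue
            for a in alphabet:
                successors = frozenset({delta[(p, a)], delta[(q, a)]})
                # Equal successors form a one-element set, which is never marked.
                if successors in marked:
                    marked.add(frozenset({p, q}))
                    changed = True
                    break
    return marked


# Hypothetical DFA over {0, 1} accepting strings that end in 01.
STATES = ["A", "B", "C", "D"]
DELTA = {
    ("A", "0"): "B", ("A", "1"): "C",
    ("B", "0"): "B", ("B", "1"): "D",
    ("C", "0"): "B", ("C", "1"): "C",
    ("D", "0"): "B", ("D", "1"): "C",
}
marked = distinguishable_pairs(STATES, "01", DELTA, {"D"})
unmarked = [set(p) for p in combinations(STATES, 2) if frozenset(p) not in marked]
print(unmarked)
```

Pairs that remain unmarked at the end are equivalent states and can be merged in the minimized DFA.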


• Example - DFA Minimization using the Myhill-Nerode Theorem (Table Filling) method
• To understand the concept of minimization using the Myhill-Nerode Theorem, let us take an example.
• To minimize the above DFA using the Myhill-Nerode theorem, follow these steps:
  - Initially an X is placed in each entry corresponding to one final state and one non-final state, in the following format.
[Table residue: only the labels "A, E" and "B, H" survive from the marking table.]
A language for specifying lexical analyzers
• There is a wide range of tools for constructing lexical analyzers.
  - Lex
• Lex is a computer program that generates lexical analyzers.
• Lex is commonly used with the yacc parser generator.
A language for specifying lexical analyzers
• Lex specification or structure
• A Lex program has the following form:
Auxiliary definitions
  D1 = R1
  D2 = R2
  ...
  Dn = Rn
• Each Di is a distinct name and each Ri is a regular expression whose symbols are chosen from Σ ∪ {D1, D2, …, Di-1}.
• Example:
  letter = A|B|…|Z
  digit = 0|1|…|9
  identifier = letter(letter | digit)*
Translation rules
  P1 = {A1}
  P2 = {A2}
  ...
  Pn = {An}
• Each Pi is a regular expression, called a token pattern, over the alphabet consisting of Σ and the auxiliary definition names.
• Example:
  ab* (for the input symbols a, b)
  if, then, else (for keywords)
• Each Ai is a program fragment describing what action the lexical analyzer should take when token Pi is found.
Creating a lexical analyzer
• First, a specification of a lexical analyzer is prepared by creating a program lex.l in the Lex language.
• Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c.
• Finally, lex.yy.c is run through the C compiler to produce an object program a.out, which is the lexical analyzer that
Recognition of tokens
• Recognizers
  - Tokens can be recognized by finite automata.
  - A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language.
  - Example: the syntax analysis part of a compiler.
  - Compilers and interpreters recognize syntax and convert it into machine-understandable form.
• Generators
  - A device that generates sentences of a language.
  - One can determine whether a particular sentence is syntactically correct by comparing it to the structure of the generator.
  - Example: regular expression
Implementation of a lexical analyzer
• There are three general approaches to the implementation of a lexical analyzer:
  - Use a lexical-analyzer generator. The generator provides routines for reading and buffering the input.
  - Write the lexical analyzer in a high-level language.
  - Write the lexical analyzer in a low-level language.
Exercise
1. Which of the following statements are true? Argue each answer informally.
a) Any subset of a regular language is itself a regular language.
b) Any superset of a regular language is itself a regular language.
c) The set of anagrams of strings from a regular language forms a regular language. (An anagram of a string is obtained by rearranging the order of characters in the string, but without adding or deleting any. The anagrams of the string abc are hence abc, acb, bac, bca, cab and cba.)
