Unit I - Automata and Lexical Analyzer
10/22/2021 6
Why to bother with automata theory?
Historical perspective of Automata theory
Finite automata are a useful model for many important kinds of hardware and
software. Some of the most important kinds:
• Software for designing and checking the behavior of digital circuits.
• The lexical analyzer of a typical compiler, that is, the compiler component that
breaks the input text into logical units such as identifiers, keywords, and
punctuation.
• Software for scanning large bodies of text, such as collections of Web pages, to
find occurrences of words, phrases, or other patterns.
• Software for verifying systems of all types that have a finite number of distinct
states, such as communications protocols or protocols for secure exchange of
information.
Prepared by Dr R Raja
Structural Representations
There are two important notations that are not automaton-like but play an important role in
the study of automata and their applications:
1. Grammars are useful models when designing software that processes data with a
recursive structure. The best-known example is a parser, the component of a compiler
that deals with the recursively nested features of the typical programming language,
such as expressions (arithmetic, conditional, and so on). For instance, a grammatical rule
like E → E + E states that an expression can be formed by taking any two expressions
and connecting them by a plus sign.
2. Regular expressions also denote the structure of data, especially text strings. The
patterns of strings they describe are exactly the same as what can be described by
finite automata. The style of these expressions differs significantly from that of
grammars. The UNIX-style regular expression [A-Z][a-z]*[ ][A-Z][A-Z] represents capitalized
words followed by a space and two capital letters. This expression represents patterns
in text that could be a city and state. If multiword city names
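As a rough illustration of the city-and-state pattern, the sketch below uses Python's `re` module. The pattern string, the helper name `find_city_state`, and the sample sentences are assumptions chosen to match the verbal description (a capitalized word, a space, two capital letters).

```python
import re

# Assumed pattern: a capitalized word, one space, then two capital letters.
CITY_STATE = re.compile(r"[A-Z][a-z]*[ ][A-Z][A-Z]")

def find_city_state(text):
    """Return every substring of text that looks like 'City ST'."""
    return CITY_STATE.findall(text)

print(find_city_state("Flights from Ithaca NY to Boston MA"))
# A multiword city name is only partly matched by this simple pattern.
print(find_city_state("Palo Alto CA"))
```

With this pattern only the last word of a multiword city name is captured, which is why such names need a richer expression.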
1. What can a computer do at all? This study is called decidability, and the
problems that can be solved by computer are called decidable.
2. What can a computer do efficiently? This study is called intractability, and the
problems that can be solved by a computer using no more time than some slowly
growing function of the size of the input are called tractable. Often we take all
polynomial functions to be slowly growing, while functions that grow faster than
any polynomial are deemed to grow too fast.
The Central Concepts of Automata Theory
Example: 0,1
Positive Closure:
The '+' (plus operation) is sometimes called positive closure.
If ∑ = {a}, then ∑+ = {a, aa, aaa, …} = the set of nonempty strings from ∑
∑+ = ∑* – {ε}
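The relation ∑+ = ∑* – {ε} can be checked on a finite prefix of both sets. The sketch below is only an illustration: the helper name `strings_up_to` and the length bound of 3 are assumptions, since ∑* and ∑+ themselves are infinite.

```python
from itertools import product

def strings_up_to(alphabet, max_len, include_empty=True):
    """List all strings over alphabet with length <= max_len, shortest first."""
    symbols = sorted(alphabet)
    shortest = 0 if include_empty else 1
    return ["".join(p)
            for n in range(shortest, max_len + 1)
            for p in product(symbols, repeat=n)]

star = strings_up_to({"a"}, 3)                       # finite prefix of ∑*
plus = strings_up_to({"a"}, 3, include_empty=False)  # finite prefix of ∑+
print(plus)
# Removing the empty string from the ∑* prefix gives the ∑+ prefix.
print(set(star) - {""} == set(plus))
```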
Finite Automaton (FA) or Finite State Machine (FSM)
An automaton with a finite number of states is called a Finite Automaton (FA) or Finite
State Machine (FSM).
Formal definition of a Finite Automaton
An automaton can be represented by a 5-tuple (Q, ∑, δ, q0, F), where −
• Q is a finite set of states.
• ∑ is a finite set of symbols, called the alphabet of the automaton.
• δ is the transition function.
• q0 is the initial state from where any input is processed (q0 ∈ Q).
• F is a set of final state/states of Q (F ⊆ Q).
Finite Automaton can be classified into two types −
• Deterministic Finite Automaton (DFA)
• Non-deterministic Finite Automaton (NDFA / NFA)
In state q1, reading 1 keeps the machine in q1, while reading 0 moves it to q2, which is
the final state. In state q2, reading 0 keeps the machine in q2 and reading 1 returns it
to q1. Note that whenever the input ends with 0, the machine will be in the final
state.
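The transitions just described can be simulated with a small table-driven loop. This is a minimal sketch, not the slide's full machine: the transitions out of the start state are not given in the text, so the code assumes the run starts in q1, and the names `run_dfa` and `DELTA` are illustrative.

```python
def run_dfa(delta, start, finals, word):
    """Run a DFA whose transition function is a dict (state, symbol) -> state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

# Only the transitions stated in the text; starting in q1 is an assumption.
DELTA = {
    ("q1", "1"): "q1", ("q1", "0"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}

for word in ["110", "101", "0"]:
    print(word, run_dfa(DELTA, "q1", {"q2"}, word))
```

Under that assumption the machine accepts exactly the inputs that end with 0.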
Example 2:
Solution:
In the given solution, only the input 101 is accepted; no path is shown for any
other input.
The states q0, q1, and q2 are the final states. The DFA accepts the strings that do not
contain three consecutive 1's, such as 10, 110, 101, etc.
Example 6:
Design an FA with ∑ = {0, 1} that accepts the strings with an even number of 0's followed by
a single 1.
Solution:
The DFA can be shown by a transition diagram as:
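Since the diagram itself is not reproduced here, the following is one possible design written as a transition table, not necessarily the one drawn on the slide. The state names are assumptions, and zero 0's is treated as an even count, so the string 1 is accepted.

```python
EVEN, ODD, ACCEPT, DEAD = "even", "odd", "accept", "dead"

# EVEN/ODD track the parity of 0's read so far; DEAD is a trap state.
DELTA = {
    (EVEN, "0"): ODD,     (EVEN, "1"): ACCEPT,
    (ODD, "0"): EVEN,     (ODD, "1"): DEAD,
    (ACCEPT, "0"): DEAD,  (ACCEPT, "1"): DEAD,
    (DEAD, "0"): DEAD,    (DEAD, "1"): DEAD,
}

def accepts(word):
    """Return True if word is an even number of 0's followed by a single 1."""
    state = EVEN
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == ACCEPT

for word in ["001", "1", "01", "0011"]:
    print(word, accepts(word))
```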
A finite automaton is called an NFA when there exist many paths for a specific input from the
current state to the next state.
Not every NFA is a DFA, but every NFA can be translated into an equivalent DFA.
NFA is defined in the same way as DFA but with the two exceptions, it contains multiple
states q1 and q2, similarly, from q0 for input b, the next states are q0 and q1. Thus it is
not fixed or determined that with a particular input where to go next. Hence this FA is
Design an NFA for the transition table given below:
Present State 0 1
Solution:
Now before double 1, there can be any string of 0 and 1. Similarly, after double 0, there
can be any string of 0 and 1. Hence the NFA becomes:
The language consists of all strings containing the substring 1110. The partial transition
diagram can be:
Since 1110 may appear anywhere in the string, we add transitions on 0 and 1 before and
after it so that any prefix and suffix are allowed. Hence the NFA becomes:
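An NFA of this shape can be simulated by tracking the set of states it could be in after each symbol. The sketch below assumes the usual five-state layout (q0 loops on 0 and 1, a chain q0 to q4 spells 1110, q4 loops on 0 and 1); the state names and the helper `run_nfa` are illustrative, since the diagram is not reproduced here.

```python
def run_nfa(delta, start, finals, word):
    """Simulate an NFA by tracking every state reachable after each symbol."""
    current = {start}
    for symbol in word:
        current = {nxt
                   for state in current
                   for nxt in delta.get((state, symbol), set())}
    return bool(current & finals)

# Assumed layout: q0 -1-> q1 -1-> q2 -1-> q3 -0-> q4, with loops on q0 and q4.
DELTA = {
    ("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"},
    ("q1", "1"): {"q2"},
    ("q2", "1"): {"q3"},
    ("q3", "0"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}

for word in ["1110", "0111011", "110"]:
    print(word, run_nfa(DELTA, "q0", {"q4"}, word))
```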
Thus we get the third symbol from the right end as '0' always. The NFA can be:
000, 011, 1011, 11011
[Figure: transition diagram with a Start state and states 1 to 8; edge labels Σ, w, e, b, a, y. Not reproduced.]
Use of ε -transitions
We allow the automaton to accept the empty string ε.
This means that a transition is allowed to occur without
reading input symbol.
The resulting NFA is called ε -NFA.
It adds “programming (design) convenience” (more
intuitive for use in designing FA’s)
The transition function δ of an ε-NFA takes two arguments:
• a state in Q, and
• a member of ∑ ∪ {ε}
ε-closure(1) = {1, 2, 3, 4, 6}
ε-closure(2) = {2, 3, 6}
ε-closure(3) = {3, 6}
ε-closure(4) = {4}
ε-closure(5) = {5, 7}
ε-closure(6) = {6}
ε-closure(7) = {7}
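An ε-closure is the set of states reachable through ε-moves alone, which a short graph search can compute. In the sketch below the ε-edges in `EPS` are an assumption: the diagram is not available, so the edges were chosen to be consistent with the closures listed above.

```python
def epsilon_closure(eps, state):
    """Return all states reachable from state using only ε-transitions."""
    closure, stack = {state}, [state]
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

# Assumed ε-edges (state -> states reached on ε), inferred from the closures.
EPS = {1: [2, 4], 2: [3], 3: [6], 5: [7]}

for state in range(1, 8):
    print(state, sorted(epsilon_closure(EPS, state)))
```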
Conversion of NFA to DFA
In an NFA, when a specific input is given to the current state, the machine may go to multiple
states. It can have zero, one, or more than one move on a given input symbol.
In a DFA, on the other hand, when a specific input is given to the current state, the
machine goes to exactly one state. A DFA has only one move on a given input symbol.
Let M = (Q, ∑, δ, q0, F) be an NFA that accepts the language L(M). Then there is an
equivalent DFA M' = (Q', ∑', δ', q0', F') such that L(M) = L(M').
Step 2: Add q0 of NFA to Q'. Then find the transitions from this start
state.
Step 3: In Q', find the possible set of states for each input symbol. If this
Step 4: In the DFA, the final states are all the states that contain a final state of
the NFA (a member of F).
Example: Convert the given NFA to DFA.
Solution: For the given transition diagram we will first construct the transition table.
State    0           1
→q0      q0          q1
q1       {q1, q2}    q1
*q2      q2          {q1, q2}
Now we will obtain δ' transition for state q0.
δ'([q0], 0) = [q0]
δ'([q0], 1) = [q1]
The δ' transition for state q1 is obtained as:
δ'([q1], 0) = [q1, q2] (new state generated)
δ'([q1], 1) = [q1]
The δ' transition for state q2 is obtained as:
δ'([q2], 0) = [q2]
δ'([q2], 1) = [q1, q2]
Now we will obtain δ' transition on [q1, q2].
δ'([q1, q2], 0) = δ(q1, 0) ∪ δ(q2, 0)
= {q1, q2} ∪ {q2}
= [q1, q2]
δ'([q1, q2], 1) = δ(q1, 1) ∪ δ(q2, 1)
= {q1} ∪ {q1, q2}
= {q1, q2}
= [q1, q2]
The state [q1, q2] is the final state as well because it contains a final state q2. The transition
table for the constructed DFA will be:
State        0           1
→[q0]        [q0]        [q1]
[q1]         [q1, q2]    [q1]
*[q2]        [q2]        [q1, q2]
*[q1, q2]    [q1, q2]    [q1, q2]
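The conversion worked through in this example can be sketched as a worklist version of the subset construction. The function name `nfa_to_dfa` and the dictionary encoding of δ are assumptions made for illustration. The sketch generates only states reachable from [q0], so [q2] does not appear in its output.

```python
def nfa_to_dfa(delta, start, finals, alphabet):
    """Subset construction; DFA states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # Union of the NFA moves of every state inside the current set.
            target = frozenset(n for q in current
                               for n in delta.get((q, symbol), ()))
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_finals = {s for s in seen if s & finals}
    return dfa_delta, start_set, dfa_finals

# NFA transition table from the example above.
NFA = {
    ("q0", "0"): {"q0"},       ("q0", "1"): {"q1"},
    ("q1", "0"): {"q1", "q2"}, ("q1", "1"): {"q1"},
    ("q2", "0"): {"q2"},       ("q2", "1"): {"q1", "q2"},
}

dfa_delta, dfa_start, dfa_finals = nfa_to_dfa(NFA, "q0", {"q2"}, "01")
for (state, symbol), target in sorted(dfa_delta.items(),
                                      key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(state), symbol, "->", sorted(target))
```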
Solution: For the given transition diagram we will first construct the transition table.
State    0           1
→q0      {q0, q1}    {q1}
*q1      ϕ           {q0, q1}
Now we will obtain δ' transition for state q0.
δ'([q0], 0) = {q0, q1}
= [q0, q1] (new state generated)
δ'([q0], 1) = {q1} = [q1]
The δ' transition for state q1 is obtained as:
δ'([q1], 0) = ϕ
δ'([q1], 1) = [q0, q1]
Now we will obtain δ' transition on [q0, q1].
δ'([q0, q1], 0) = δ(q0, 0) ∪ δ(q1, 0)
= {q0, q1} ∪ ϕ
= {q0, q1}
= [q0, q1]
δ'([q0, q1], 1) = δ(q0, 1) ∪ δ(q1, 1)
= {q1} ∪ {q0, q1}
= {q0, q1}
= [q0, q1]
Since q1 is a final state in the given NFA, every DFA state that contains q1 becomes a
final state. Hence in the DFA, the final states are [q1] and [q0, q1]. Therefore the set of
final states is F = {[q1], [q0, q1]}.
The transition table for the constructed DFA will be:
State        0           1
→[q0]        [q0, q1]    [q1]
*[q1]        ϕ           [q0, q1]
*[q0, q1]    [q0, q1]    [q0, q1]
The transition diagram will be:
We can also rename the states of the DFA.
Suppose
A = [q0]
B = [q1]
C = [q0, q1]
Example Problems for Conversion of NFA to DFA
Construct an NFA accepting the set of strings over {a, b} ending with "aba". Use it to
construct a DFA accepting the same set of strings.
Solution: NFA for accepting the strings ending with "aba"
Now we will obtain δ' transition for state q0.
Now we will obtain δ' transition for state {q0, q2}.
Transition Table and Transition Diagram
Conversion of NFA with ε to DFA:
Steps for converting NFA with ε to DFA:
Step 1: Take the ε-closure of the starting state of the NFA as the starting state of the DFA.
Step 2: For each input symbol, find the states that can be reached from the present DFA
state. That is, take the union of the transitions of every NFA state in the present state,
together with the ε-closures of the states reached.
Step 3: If a new state is found, take it as the current state and repeat Step 2.
Step 4: Repeat Step 2 and Step 3 until no new state appears in the transition table
of the DFA.
Step 5: Mark as final every DFA state that contains a final state of the NFA.
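The five steps can be sketched in code as follows. The small ε-NFA used here (for the language a*b*) is a hypothetical example, not one from the slides, and the names `eclose` and `enfa_to_dfa` are illustrative.

```python
def eclose(eps, states):
    """ε-closure of a collection of NFA states."""
    closure, stack = set(states), list(states)
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)

def enfa_to_dfa(delta, eps, start, finals, alphabet):
    start_set = eclose(eps, [start])                      # Step 1
    table, todo, seen = {}, [start_set], {start_set}
    while todo:                                           # Steps 3 and 4
        current = todo.pop()
        for symbol in alphabet:                           # Step 2
            moved = {n for q in current
                     for n in delta.get((q, symbol), ())}
            target = eclose(eps, moved)
            table[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return table, start_set, {s for s in seen if s & finals}  # Step 5

# Hypothetical ε-NFA for a*b*: q0 loops on a, q0 -ε-> q1, q1 loops on b.
DELTA = {("q0", "a"): {"q0"}, ("q1", "b"): {"q1"}}
EPS = {"q0": ["q1"]}

table, dfa_start, dfa_finals = enfa_to_dfa(DELTA, EPS, "q0", {"q1"}, "ab")
print(sorted(dfa_start), [sorted(s) for s in dfa_finals])
```

The empty set shows up as an ordinary DFA state here; it plays the role of the dead state ϕ.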
Convert the NFA with ε into its equivalent DFA.
Now,
δ'(B, 0) = ε-closure{δ(q3, 0)}
= ϕ
δ'(B, 1) = ε-closure{δ(q3, 1)}
= ε-closure{q4}
= {q4} i.e. state C
For state C:
δ'(C, 0) = ε-closure{δ(q4, 0)}
= ϕ
δ'(C, 1) = ε-closure{δ(q4, 1)}
= ϕ
Conversion of Epsilon NFA to DFA
Convert the Following Epsilon NFA into DFA
[Exercises 1 to 3: ε-NFA transition diagrams not reproduced.]
Conversion of Regular Expression to Finite Automata : Direct Method
Example 1: Design an FA from the given regular expression 10 + (0 + 11)0* 1
Steps 1 to 5: [transition diagrams not reproduced]
Example 2: Design an FA from the given regular expression 1 (1* 01* 01*)*
Steps 1 to 3: [transition diagrams not reproduced]
Example 3: [regular expression and transition diagrams for Steps 1 to 3 not reproduced]
Example 4: Design an FA from the given regular expression (0+1)*(00+11)(0+1)*
Steps 1 to 5: [transition diagrams not reproduced]
Lexical analysis
Syntax analysis
Semantic analysis
Language Processors
Lexical Analyzer
The Lexical Analyzer reads the source program character by character and returns the
tokens of the source program.
A token describes a pattern of characters having the same meaning in the source program
(such as identifiers, operators, keywords, numbers, delimiters, and so on).
Ex: newval := oldval + 12 => tokens:
newval    identifier
:=        assignment operator
oldval    identifier
+         add operator
12        a number
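The token list above can be reproduced with a very small scanner. This is a minimal sketch under assumed token patterns (the slides do not define them), and it handles only the four token kinds in the example; `TOKEN_SPEC` and `tokenize` are illustrative names.

```python
import re

# Assumed patterns for the token kinds that appear in the example.
TOKEN_SPEC = [
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("number", re.compile(r"[0-9]+")),
    ("assignment operator", re.compile(r":=")),
    ("add operator", re.compile(r"\+")),
]

def tokenize(source):
    """Scan source left to right and return (lexeme, token kind) pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():          # skip blanks between tokens
            pos += 1
            continue
        for kind, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                tokens.append((match.group(), kind))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
    return tokens

for lexeme, kind in tokenize("newval := oldval + 12"):
    print(lexeme, kind)
```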
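[Figure: parse tree for newval := oldval + 12 (assignment statement: identifier := expression; leaves identifier oldval and number 12). Not reproduced.]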
• The type of the identifier newval must match the type of the expression
(oldval + 12).
Ex:
MULT id2,id3,temp1
ADD temp1,#1,id1
Code Generator
Produces the target language for a specific architecture.
The target program is normally a relocatable object file containing the machine
code.
Ex:
(assume an architecture in which at least one operand of each instruction is a
machine register)
MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
Symbol-Table Management
An essential function of a compiler is to record the identifiers used in the source
program and to collect information about various attributes of each identifier.
These attributes may provide information about the storage allocated for an identifier,
its type, its scope, and, in the case of procedure names, such things as the number and
types of its arguments, the method of passing each argument, and the type returned, if any.
A symbol table is a data structure containing a record for each identifier, with fields
for the attributes of the identifier.
When an identifier in the source program is detected by the lexical analyzer, the
identifier is entered into the symbol table.
The remaining phases enter information about identifiers into the symbol table and
then use this information in various ways.
Ex. When doing semantic analysis and intermediate code generation, we need to know the
types of identifiers, so that we can check that the source program uses them in valid
ways and generate the proper operations on them.
The code generator typically enters and uses detailed information about the storage
assigned to identifiers.
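A symbol table of this kind can be sketched as a dictionary of records keyed by identifier name. The class and method names below are hypothetical, and the attribute names (`type`, `offset`) are only examples of the fields that later phases might fill in.

```python
class SymbolTable:
    """One record per identifier, with a field for each attribute."""

    def __init__(self):
        self._records = {}

    def enter(self, name):
        """Called when the lexical analyzer detects an identifier."""
        return self._records.setdefault(name, {"name": name})

    def set_attribute(self, name, key, value):
        """Later phases record attributes such as type or storage."""
        self._records[name][key] = value

    def lookup(self, name):
        """Return the record for name, or None if it was never entered."""
        return self._records.get(name)

table = SymbolTable()
table.enter("newval")                             # lexical analysis
table.set_attribute("newval", "type", "integer")  # semantic analysis
table.set_attribute("newval", "offset", 0)        # code generation
print(table.lookup("newval"))
```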
Error Detection and Reporting
Assembly code: names are used for instructions, and names are used for
memory addresses.
Two-pass Assembly:
First Pass: all identifiers are assigned to memory addresses (0-offset)
e.g. substitute 0 for a, and 4 for b
Second Pass: produce relocatable machine code:
0001 01 00 00000000 *
0011 01 10 00000010
0010 01 00 00000100 *
(* = relocation bit)
LOADER AND LINK-EDITOR
LOADERS take relocatable machine code and alter the addresses, putting the
instructions and data in a particular location in memory.
The LINK EDITOR (part of the loader) pieces together a complete program from
several independently compiled parts.
Loader: takes relocatable machine code, alters the addresses, and places the
altered instructions into memory.
Link-editor: takes many (relocatable) machine code programs (with cross-references)
and produces a single file. It needs to keep track of the correspondence between
variable names and the corresponding addresses in each piece of code.
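[Figure: interaction between the lexical analyzer and the parser. The source program feeds the Lexical Analyzer, which returns a token to the Parser on each getNextToken request; the Parser passes its output to semantic analysis; both components use the Symbol table.]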
Simplicity of design
Improving compiler efficiency
Enhancing compiler portability
Example
const pi = 3.1416;
The substring pi is a lexeme for the token “identifier.”
Operations on Languages
Rules
• ε is a regular expression that denotes {ε}, the set containing the empty string.
• If a is a symbol in Σ, then a is a regular expression that denotes {a}, the set containing the
string a.
• Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then
• (r)|(s) is a regular expression denoting L(r) ∪ L(s).
• (r)(s) is a regular expression denoting L(r)L(s).
• (r)* is a regular expression denoting (L(r))*.
• (r) is a regular expression denoting L(r).
Precedence Conventions
• The unary operator * has the highest precedence and is left associative.
• Concatenation has the second highest precedence and is left associative.
• | has the lowest precedence and is left associative.
• With these conventions, (a)|(b)*(c) can be written as a|b*c.
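The effect of the precedence conventions can be checked with Python's `re` module, whose `*`, concatenation, and `|` follow the same relative precedence. The sketch below compares a|b*c with its fully parenthesized form on every string over {a, b, c} up to an assumed length bound; the function name `agree_up_to` is illustrative.

```python
import re
from itertools import product

SHORT = re.compile(r"a|b*c")          # written using the precedence conventions
FULL = re.compile(r"(a)|((b)*(c))")   # fully parenthesized form

def agree_up_to(max_len, alphabet="abc"):
    """Check that both expressions accept the same strings up to max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if bool(SHORT.fullmatch(word)) != bool(FULL.fullmatch(word)):
                return False
    return True

print(agree_up_to(5))
print([w for w in ["a", "c", "bbc", "ac", "ab"] if SHORT.fullmatch(w)])
```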
Example of Regular Expressions
Regular Definitions
• If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions
of the form:
d1→r1
d2→r2
...
dn→rn
• where each di is a distinct name, and each ri is a regular expression over the symbols in
Σ ∪ {d1, d2, …, di-1}, i.e., the basic symbols and the previously defined names.
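A regular definition can be mimicked by building each expression from the names defined before it. The definitions below (letter, digit, id) are a common textbook illustration and an assumption here, not taken from the slide. The substitution uses Python string formatting, which works in this sketch because each substituted definition is a single character class.

```python
import re

# Each definition may refer only to names defined earlier in the sequence.
definitions = {}
definitions["letter"] = "[A-Za-z]"
definitions["digit"] = "[0-9]"
definitions["id"] = "{letter}({letter}|{digit})*".format(**definitions)

ID = re.compile(definitions["id"])

print(definitions["id"])
for lexeme in ["newval", "x1", "1x"]:
    print(lexeme, bool(ID.fullmatch(lexeme)))
```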
Examples of Regular Definitions