Chapter 3 Finite Automata and Lexical Analysis
Lexical analysis
The role of the lexical analyzer:
Finite automata
Alphabet, Strings and languages
Regular expressions
Finite automata (DFA and NFA)
Regular expressions to Finite automata conversion
Minimizing the number of states of a DFA
Lexical analysis
Lexical Analysis (Scanning) plays an important role in the compilation process of a program.
It takes the source program as input, reads it one character at a time, and produces the equivalent token stream of the program.
For example, take the source program statement A = B + C * 50.
The corresponding token stream after the lexical analysis phase is x1 = x2 + x3 * 50, where x1, x2 and x3 are tokens.
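The scanning step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular compiler: the token class names (NUMBER, IDENT, OP, SKIP) and the function `tokenize` are assumptions introduced here, and the regular-expression engine stands in for the character-by-character reading.

```python
import re

# Hypothetical token classes; the names are illustrative, not from the slides.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan source left to right and return a list of (token, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("A = B + C * 50"))
```

Each pair keeps the lexeme next to its token class, so A, B and C all map to the same class IDENT while 50 maps to NUMBER.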
Lexical Analyzer
C-Tokens:
If:
Identifier:
Integer:
Approaches of building Lexical Analyzer -
Lexemes, Tokens and Patterns
C- Tokens:
Specification of Tokens (Alphabet, Strings and Languages)
Some important definitions regarding languages include:
1) Alphabet: a finite, non-empty set of symbols.
It is denoted by Σ (Greek letter sigma).
Example: Σ = {0,1}
reverse string of w.
L2 = {101, 10101, radar, level,….}
Operations on Languages
Union:
- If L1 and L2 are two languages, then their union, denoted by L1 ∪ L2, is the language containing every string w that belongs to L1 or to L2.
Concatenation of Languages:
- If L1 and L2 are languages over Σ, their concatenation is
- L = L1•L2, or simply
- L = L1L2, where
L = {w ∈ Σ* : w = x•y for some x ∈ L1 and y ∈ L2}.
Example: If L = {001, 10, 111} and M = {ε, 001}, where ε is the empty string, then
– L.M = {001, 10, 111, 001001, 10001, 111001}
– L ∪ M = {ε, 001, 10, 111}
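For finite languages, union and concatenation can be checked directly with Python sets. The sketch below is only an illustration: the helper names `union` and `concat` are introduced here, and the empty string is written as "".

```python
def union(l1, l2):
    """All strings that belong to l1 or to l2."""
    return l1 | l2


def concat(l1, l2):
    """All strings x + y with x taken from l1 and y taken from l2."""
    return {x + y for x in l1 for y in l2}


L = {"001", "10", "111"}
M = {"", "001"}  # "" stands for the empty string

print(sorted(concat(L, M)))
print(sorted(union(L, M)))
```

Because M contains the empty string, every string of L appears unchanged in the concatenation.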
Kleene Star:
- The “Kleene star” of a language L is denoted by L*.
- L* is the set of all strings obtained by concatenating zero or more strings from L.
• L* = {w ∈ Σ* : w = w1...wk for some k ≥ 0 and some w1, w2, ..., wk ∈ L}, i.e. each of these strings is in L.
- L* = L^0 ∪ L^1 ∪ L^2 ∪ …, where L^0 = {ε}
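Since L* is infinite whenever L contains a non-empty string, code can only enumerate it up to a bound. The following sketch computes the union of the powers of L from 0 to max_k; the names `power` and `kleene_star_up_to` and the bound itself are assumptions made for illustration.

```python
def concat(l1, l2):
    """All strings x + y with x taken from l1 and y taken from l2."""
    return {x + y for x in l1 for y in l2}


def power(lang, k):
    """The k-th power of lang: concatenations of exactly k strings from lang."""
    result = {""}  # the 0-th power contains only the empty string
    for _ in range(k):
        result = concat(result, lang)
    return result


def kleene_star_up_to(lang, max_k):
    """Union of the powers of lang for k = 0 .. max_k (a finite part of lang*)."""
    result = set()
    for k in range(max_k + 1):
        result |= power(lang, k)
    return result


print(sorted(kleene_star_up_to({"1"}, 3)))
```

The case k = 0 is what puts the empty string into L*, even when L itself does not contain it.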
with (a + λ).
Examples:
– Represent the following sets by regular expressions
a. {∧, ab}
b. {1, 11, 111, ...}
c. {ab, a, b, bb}
Solution
a. The set {∧, ab} is represented by the regular expression ∧ + ab.
b. The set {1, 11, 111, ...} is obtained by concatenating 1 and any element of {1}*.
Therefore 1(1)* represents the given set.
c. The set {ab, a, b, bb} is represented by the regular expression ab + a + b + bb.
Regular expressions - Exercises
Obtain the regular expressions for the following sets:
1. The set of all strings over {a, b} beginning and ending with ‘a’.
⇒ The regular expression is a(a + b)*a. (This covers strings of length at least 2; to include the one-symbol string a as well, use a + a(a + b)*a.)
2. {b^2, b^5, b^8, ...}
⇒ The regular expression is bb(bbb)*.
3. {a^(2n+1) | n > 0}
⇒ The regular expression is a(aa)+.
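The three answers can be checked by brute force: enumerate every string over {a, b} up to some length and compare the regular expression with the set definition. The sketch below uses Python's `re` module, where union is written `|` instead of `+`; the helper names and the length bound of 8 are assumptions. The check for the first expression is restricted to strings of length at least 2, since a(a|b)*a cannot match the one-symbol string a.

```python
import re
from itertools import product


def strings_over(alphabet, max_len):
    """Yield every string over alphabet with length 0 .. max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def agrees(pattern, predicate, alphabet="ab", max_len=8):
    """True if pattern matches exactly the strings for which predicate holds."""
    regex = re.compile(pattern)
    return all(
        (regex.fullmatch(w) is not None) == predicate(w)
        for w in strings_over(alphabet, max_len)
    )


CHECKS = {
    # 1. begins and ends with 'a' (length at least 2)
    "a(a|b)*a": lambda w: len(w) >= 2 and w[0] == "a" and w[-1] == "a",
    # 2. only b's, with length 2, 5, 8, ...
    "bb(bbb)*": lambda w: set(w) <= {"b"} and len(w) >= 2 and len(w) % 3 == 2,
    # 3. only a's, with odd length 2n + 1 for n > 0
    "a(aa)+": lambda w: set(w) <= {"a"} and len(w) >= 3 and len(w) % 2 == 1,
}

for pattern, predicate in CHECKS.items():
    print(pattern, agrees(pattern, predicate))
```

A bounded check like this is not a proof, but it catches off-by-one mistakes such as writing (aa)* where (aa)+ is needed.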
Let L = {ab, aa, baa}, which of the following
Applications of Finite Automata (FAs):
Lexical analysis,
text search,
Deterministic Finite Automaton (DFA)
A DFA is a finite-state automaton (FSA) that accepts or rejects finite strings of symbols.
It produces a unique computation of the automaton for each input string.
A DFA is a 5-tuple M = (Q, Σ, δ, q0, F) where
– Q: a finite set of states
– Σ: an alphabet of input symbols
– δ : Q × Σ → Q: a transition function
– q0 ∈ Q: a start state
– F ⊆ Q: a set of final (accepting) states
The input mechanism can move only from left to right and reads exactly one symbol on each step.
The transition from one internal state to another is governed by the transition function δ.
Transition Table
A state transition table is a table showing what state a finite state machine (or states, in the case of an NFA) will move to, based on the current state and other inputs.
Row – states
Column – inputs
Entries – next state
→ - start state
* - final state
Detailed description
The mathematical model of an automaton consists of
Q: finite set of states
Σ: finite set of input symbols
δ : Q × Σ → Q, transition function
Example
δ(q0, 0) = q1
δ(q0, 1) = q0
δ(q1, 0) = q1
δ(q1, 1) = q2
δ(q2, 0) = q2
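The transition function in this example can be stored as a dictionary and run on an input string one symbol at a time. The slide fragment does not name the start state or the final states, and it gives no entry for (q2, 1), so the sketch below makes three labelled assumptions: q0 is the start state, q2 is the only final state, and a missing entry means the input is rejected.

```python
# Transition function from the example, stored as (state, symbol) -> next state.
DELTA = {
    ("q0", "0"): "q1",
    ("q0", "1"): "q0",
    ("q1", "0"): "q1",
    ("q1", "1"): "q2",
    ("q2", "0"): "q2",
}
START = "q0"    # assumption: not stated in the example
FINAL = {"q2"}  # assumption: not stated in the example


def accepts(word):
    """Run the DFA on word, reading exactly one symbol per step."""
    state = START
    for symbol in word:
        key = (state, symbol)
        if key not in DELTA:  # assumption: an undefined transition rejects
            return False
        state = DELTA[key]
    return state in FINAL


for w in ["01", "1100100", "10", "011"]:
    print(w, accepts(w))
```

Under these assumptions the machine accepts exactly the strings of the form 1*0 0*1 0*.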
Example - DFA
Determine the DFA schematic for M = (Q, Σ, δ, q1, F), where Q = {q1, q2, q3}, Σ = {0,1}, q1 is the start state, F = {q2} and δ is given by the table below
Language of accepted Strings
Consider a DFA shown in figure below
R = r3* , where r1 = 0 , r2 = 1
Exercise
Solution:
The NFA is constructed step by step by breaking the regular expression into smaller regular expressions.
R = (r1 + r2)r3 , where r1 = 01 , r2 = 2* and r3 = 0
Exercise 3) Construct NFA for the regular expression
r= (a|b)* abb.
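One possible answer to this exercise is the four-state NFA below, written without ε-moves. It is a hand-built sketch, not the result of a step-by-step construction, and the state numbers 0 to 3 are chosen here. The simulation tracks the set of states the NFA could be in after each symbol.

```python
# (state, symbol) -> set of possible next states
NFA = {
    (0, "a"): {0, 1},  # stay in the (a|b)* loop, or guess that "abb" starts here
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
FINAL = {3}


def nfa_accepts(word):
    """Simulate the NFA by following all possible moves at once."""
    current = {START}
    for symbol in word:
        next_states = set()
        for state in current:
            next_states |= NFA.get((state, symbol), set())
        current = next_states
    return bool(current & FINAL)


for w in ["abb", "babb", "abba", "ab"]:
    print(w, nfa_accepts(w))
```

The nondeterminism sits entirely in state 0 on input a, where the machine either stays in the loop or starts matching the suffix abb.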
EQUIVALENCE OF NFA AND DFA
• Two finite accepters M1 and M2 are equivalent iff L(M1) = L(M2), i.e., if both accept the same language.
• Both DFA and NFA recognize the same class of languages.
• It is important to note that every NFA has an equivalent DFA.
Problem Statement
Steps for converting NFA with ε to DFA:
• ε-closure – the set of states that can be reached from a state using only ε-moves, including the state itself.
01: Take the ε-closure of the starting state of the NFA as the starting state of the DFA.
02: For each input symbol, find the states that can be reached from the current DFA state, i.e., the union of the transitions and their ε-closures for each NFA state contained in the current DFA state.
03: If a new state is found, take it as the current state and repeat step 02.
04: Repeat steps 02 and 03 until no new state appears in the transition table of the DFA.
05: Mark as final every DFA state that contains a final state of the NFA.
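The steps above can be sketched as a short subset construction. The representation is an assumption made for this illustration: NFA transitions are a dictionary from (state, symbol) to a set of states, ε-moves use the empty string as their symbol, and each DFA state is a frozenset of NFA states. The two-state NFA for a*b* at the end is a hypothetical example, not one taken from the slides.

```python
from collections import deque

EPS = ""  # symbol used for ε-moves in this sketch


def epsilon_closure(states, delta):
    """States reachable from `states` using only ε-moves, including themselves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def nfa_to_dfa(delta, start, finals, alphabet):
    dfa_start = epsilon_closure({start}, delta)  # step 01
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:  # steps 03 and 04: continue until no new state appears
        current = queue.popleft()
        for symbol in alphabet:  # step 02
            moved = set()
            for s in current:
                moved |= delta.get((s, symbol), set())
            target = epsilon_closure(moved, delta)
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_finals = {S for S in seen if S & finals}  # step 05
    return dfa_start, dfa_delta, dfa_finals


def dfa_accepts(word, dfa_start, dfa_delta, dfa_finals):
    state = dfa_start
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_finals


# Hypothetical ε-NFA for a*b*: 0 --a--> 0, 0 --ε--> 1, 1 --b--> 1, final state 1.
NFA_DELTA = {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {1}}
START_STATE, TABLE, FINALS = nfa_to_dfa(NFA_DELTA, 0, {1}, "ab")
print(dfa_accepts("aabb", START_STATE, TABLE, FINALS))
print(dfa_accepts("ba", START_STATE, TABLE, FINALS))
```

The empty frozenset plays the role of the dead state: once the DFA reaches it, every further symbol leads back to it.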
Example
DFA Minimization
DFA minimization is the task of transforming a given deterministic finite automaton (DFA) into an equivalent DFA that has a minimum number of states.
value 0.
1. Minimization of DFA Using Equivalence Theorem-
03: Increment k by 1.
Find Pk by partitioning the different sets of Pk-1.
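The refinement of Pk-1 into Pk can be sketched as follows. Since only part of the procedure survives in this section, the sketch rests on labelled assumptions: P0 separates final from non-final states, two states stay in the same set only if every input symbol sends them into the same set of the previous partition, the DFA is total, and unreachable states have already been removed. The four-state DFA at the end is a hypothetical example.

```python
def equivalence_classes(states, alphabet, delta, finals):
    """Return the final partition of `states` into sets of equivalent states."""
    finals = set(finals)
    # Assumed P0: non-final states and final states form the two initial sets.
    partition = [block for block in (set(states) - finals, set(states) & finals) if block]
    while True:
        # Index of the set of P(k-1) that each state currently belongs to.
        index = {s: i for i, block in enumerate(partition) for s in block}
        new_partition = []
        for block in partition:
            groups = {}
            for s in block:
                # States stay together only if each symbol leads to the same set.
                signature = tuple(index[delta[(s, a)]] for a in alphabet)
                groups.setdefault(signature, set()).add(s)
            new_partition.extend(groups.values())
        if len(new_partition) == len(partition):  # Pk equals P(k-1): stop
            return new_partition
        partition = new_partition


# Hypothetical DFA: q2 and q3 are both final and behave identically.
STATES = ["q0", "q1", "q2", "q3"]
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "q2",
    ("q3", "0"): "q2", ("q3", "1"): "q3",
}
CLASSES = equivalence_classes(STATES, "01", DELTA, {"q2", "q3"})
print(sorted(sorted(block) for block in CLASSES))
```

Each resulting set becomes one state of the minimized DFA; refinement can only split sets, so an unchanged count means the partition itself is unchanged.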
DFA Minimization
Example -
A language for specifying lexical analyzers
• There is a wide range of tools for constructing lexical analyzers.
– Lex
• Lex is a computer program that generates lexical analyzers.
• Lex is commonly used with the yacc parser generator.
• Lex Specification or Structure
• A LEX program has the following form:
Auxiliary Definitions
D1 = R1
D2 = R2
...
Dn = Rn
Regular expression
Implementation of a lexical analyzer
Three general approaches for the implementation of a lexical analyzer:
By using a lexical-analyzer generator: the generator provides routines for reading and buffering the input.