Formal Languages CHAPTER-I
INTRODUCTION
Automata theory is the study of abstract computing devices, or "machines". In the 1930s, before
computers existed, A. Turing studied an abstract machine (the Turing machine) that had all the
capabilities of today's computers (concerning what they could compute). His goal was to describe
precisely the boundary between what a computing machine could do and what it could not do.
Simpler kinds of machines (finite automata) were studied by a number of researchers and proved
useful for a variety of purposes.
Theoretical developments bear directly on what computer scientists do today:
1. Finite automata and formal grammars: used in the design and construction of software
2. Turing machines: help us understand what we can expect from software
3. Theory of intractable problems: are we likely to be able to write a program to solve a
given problem, or should we try an approximation or a heuristic instead?
Why we study automata:
Finite automata are a useful model for many important kinds of software and hardware:
1. Software for designing and checking the behaviour of digital circuits
2. The lexical analyser of a typical compiler, that is, the compiler component that
breaks the input text into logical units
3. Software for scanning large bodies of text, such as collections of Web pages, to
find occurrences of words, phrases or other patterns
4. Software for verifying systems of all types that have a finite number of distinct
states, such as communications protocols or protocols for the secure exchange of
information.
Applications:
Modern applications of automata theory go far beyond compiler techniques or hardware
verification. Automata are widely used for modelling and verification of software, distributed
systems, real-time systems, or structured data. They have been equipped with features to model
time and probabilities as well.
Terminologies:
Alphabets:
Definition: An alphabet is a finite, nonempty set of symbols.
Symbol: Σ
Example: Σ = {a, b, c, d} is an alphabet set where ‘a’, ‘b’, ‘c’, and ‘d’ are symbols.
The binary alphabet: Σ = {0, 1}
The set of all lower-case letters: Σ = {a, b, . . . , z}
The set of all ASCII characters.
POWER OF AN ALPHABET :
If Σ is an alphabet, we can express the set of all strings of a certain length from that
alphabet by using the exponential notation:
Σ^k: the set of strings of length k, each of whose symbols is in Σ
Examples:
Σ^0 = {ε}, regardless of what alphabet Σ is. That is, ε is the only string of
length 0.
If Σ = {0, 1}, then:
1. Σ^1 = {0, 1}
2. Σ^2 = {00, 01, 10, 11}
3. Σ^3 = {000, 001, 010, 011, 100, 101, 110, 111}
Note: do not confuse Σ and Σ^1:
1. Σ is an alphabet; its members 0 and 1 are symbols
2. Σ^1 is a set of strings; its members are strings (each one of length 1)
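The powers of an alphabet can be generated mechanically. The following is a minimal sketch in Python; the function name `power` is an illustrative choice and not part of these notes. It builds the strings of length k as all k-fold combinations of symbols.

```python
from itertools import product

def power(sigma, k):
    """Return the strings of length k over the alphabet sigma, in sorted order."""
    # product(..., repeat=k) yields every k-tuple of symbols;
    # for k = 0 it yields one empty tuple, which joins to the empty string.
    return ["".join(symbols) for symbols in product(sorted(sigma), repeat=k)]

sigma = {"0", "1"}
sigma_0 = power(sigma, 0)   # only the empty string
sigma_1 = power(sigma, 1)   # strings of length 1 (not symbols)
sigma_2 = power(sigma, 2)
sigma_3 = power(sigma, 3)
print(sigma_0, sigma_1, sigma_2, sigma_3)
```

An alphabet with n symbols gives n^k strings of length k, which is why the three sets for the binary alphabet have 2, 4 and 8 members.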
Strings:
Definition: A string (sometimes called a word) is a finite sequence of symbols chosen from
some alphabet Σ.
Example: ‘cabcad’ is a valid string over the alphabet Σ = {a, b, c, d}
Length of a string: the number of positions for symbols in the string, denoted by |S|.
Examples:
01101 has length 5. There are only two distinct symbols (0 and 1) in the string 01101,
but 5 positions for symbols.
If S = ‘cabcad’, |S| = 6
If |S| = 0, S is called the empty string (denoted by λ or ε).
LANGUAGES: “A language is a collection of sentences of finite length all constructed from a finite
alphabet of symbols”. (OR)
A language is a subset of Σ* for some alphabet Σ. It can be finite or
infinite.
L is said to be a language over the alphabet Σ only if L ⊆ Σ*.
This is because Σ* is the set of all strings (of all possible lengths, including 0) over the given alphabet Σ.
Examples:
1. Let L be the language of all strings consisting of n 0’s followed by n 1’s, for some n ≥ 0:
L = {ε, 01, 0011, 000111, …}
2. Let L be the language of all strings with an equal number of 0’s and 1’s:
L = {ε, 01, 10, 0011, 1100, 0101, 1010, 1001, …}
3. If the language consists of all possible strings of length 2 over Σ = {a, b},
then L = {aa, ab, ba, bb}.
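Since a language is just a set of strings, each of the three examples can be described by a membership test. The sketch below is illustrative only; the function names `in_0n1n` and `in_equal_01` are assumptions made for this example.

```python
def in_0n1n(w):
    """True if w consists of n 0's followed by n 1's, for some n >= 0."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def in_equal_01(w):
    """True if w is a string over {0, 1} with as many 0's as 1's."""
    return set(w) <= {"0", "1"} and w.count("0") == w.count("1")

# Example 3: every string of length 2 over the alphabet {a, b}.
length_two = sorted(x + y for x in "ab" for y in "ab")

print(in_0n1n("0011"), in_0n1n("0101"))
print(in_equal_01("0101"), in_equal_01("011"))
print(length_two)
```

The first two languages are infinite, so they cannot be listed in full, but membership of any given string can still be decided.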
CLOSURE:
The closure of a language L is denoted L* and represents the set of those strings that
can be formed by taking any number of strings from L, possibly with repetitions (i.e., the
same string may be selected more than once), and concatenating all of them.
Examples:
If L = {0, 1} then L* is the set of all strings of 0’s and 1’s.
If L = {0, 11} then L* consists of the strings of 0’s and 1’s in which the 1’s come in
pairs, e.g., 011, 11110 and ε, but not 01011 or 101.
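Whether a string belongs to the closure of a finite language can be checked by asking which prefixes of the string are concatenations of strings from L. The following is a sketch of that idea; the name `in_closure` is hypothetical, and it assumes L is given as a finite Python set of strings.

```python
def in_closure(w, language):
    """True if w is a concatenation of zero or more strings taken from language."""
    # reachable[i] is True when the prefix w[:i] can be built from strings in language.
    # The empty prefix is always reachable (take zero strings).
    reachable = [True] + [False] * len(w)
    for i in range(1, len(w) + 1):
        reachable[i] = any(
            reachable[i - len(piece)] and w.endswith(piece, 0, i)
            for piece in language
            if 0 < len(piece) <= i
        )
    return reachable[len(w)]

L = {"0", "11"}
for w in ["011", "11110", "", "01011", "101"]:
    print(repr(w), in_closure(w, L))
```

For L = {0, 11} this reports the first three strings as members of the closure and the last two as non-members, in agreement with the example.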
GRAMMARS:
The theory of formal languages is applied extensively in computer science.
Noam Chomsky gave a mathematical model of grammar in 1956 which is effective
for describing computer languages.
Grammar
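A grammar G can be formally written as a 4-tuple (N, T, S, P) where −
N (also written VN) is a set of variables or non-terminal symbols.
T (also written ∑) is a set of terminal symbols.
S is a special variable called the start symbol, S ∈ N.
P is a set of production rules for terminals and non-terminals. A production rule has the form α
→ β, where α and β are strings on VN ∪ ∑ and at least one symbol of α belongs to VN.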
“A grammar can be regarded as a device that enumerates the sentences of a language”: nothing more,
nothing less.
G = (N, T, S, P).
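The view of a grammar as a device that enumerates sentences can be made concrete with a small program. The sketch below is illustrative only. The grammar used (S → 0S1 and S → ε) is a hypothetical example chosen because it generates the strings of n 0's followed by n 1's. The function name `enumerate_sentences` is also an assumption. The code is restricted to productions whose left side is a single non-terminal and to symbols written as single characters.

```python
from collections import deque

def enumerate_sentences(nonterminals, productions, start, max_length):
    """Return the terminal strings of length <= max_length derivable from start.

    Assumes each production is a pair (A, body) with a single non-terminal A.
    Pruning counts terminals only, so a grammar whose forms can grow without
    adding terminals (for example S -> SS) may not terminate.
    """
    sentences, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal of the current sentential form.
        position = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if position is None:
            sentences.add(form)          # no non-terminal left: a sentence
            continue
        for head, body in productions:
            if head != form[position]:
                continue
            new_form = form[:position] + body + form[position + 1:]
            terminals = sum(1 for s in new_form if s not in nonterminals)
            if terminals <= max_length and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(sentences, key=lambda s: (len(s), s))

# Hypothetical grammar G = (N, T, S, P) with N = {S}, T = {0, 1}.
N = {"S"}
P = [("S", "0S1"), ("S", "")]
sentences = enumerate_sentences(N, P, "S", 6)
print(sentences)
```

With a length bound of 6 the enumeration yields the empty string, 01, 0011 and 000111.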
Kleene Star:
Definition: The Kleene star, Σ*, is a unary operator on a set of symbols or strings,
Σ, that gives the infinite set of all possible strings of all possible lengths over Σ,
including λ.
Representation: Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ … where Σ^p is the set of all possible strings
of length p.
Example: If Σ = {a, b}, Σ* = {λ, a, b, aa, ab, ba, bb, …}
Kleene Plus (Positive Closure)
Definition: The set Σ+ is the infinite set of all possible strings of all possible lengths
over Σ, excluding λ.
Representation: Σ+ = Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ …
Σ+ = Σ* − {λ}
Example: If Σ = {a, b}, Σ+ = {a, b, aa, ab, ba, bb, …}
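Because Σ* and Σ+ are infinite, a program cannot build them as complete sets, but it can produce their members one at a time in order of length. The following sketch uses Python generators for this; the names `kleene_star` and `kleene_plus` are illustrative assumptions.

```python
from itertools import count, islice, product

def kleene_star(sigma):
    """Lazily yield every string over sigma, shortest first, starting with the empty string."""
    symbols = sorted(sigma)
    for length in count(0):                      # lengths 0, 1, 2, ...
        for letters in product(symbols, repeat=length):
            yield "".join(letters)

def kleene_plus(sigma):
    """Lazily yield every non-empty string over sigma (the star without the empty string)."""
    return (w for w in kleene_star(sigma) if w != "")

first_star = list(islice(kleene_star({"a", "b"}), 7))
first_plus = list(islice(kleene_plus({"a", "b"}), 6))
print(first_star)
print(first_plus)
```

The first seven members of the star are the empty string followed by a, b, aa, ab, ba, bb; the plus gives the same list without the empty string.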
Automata :
The term "Automata" is derived from the Greek word "αὐτόματα", which means "self-acting".
An automaton (plural: automata) is an abstract self-propelled computing device which
follows a predetermined sequence of operations automatically.
An automaton with a finite number of states is called a Finite Automaton (FA) or Finite State
Machine (FSM).
Definition of Finite Automata. A finite automaton (FA) is a simple idealized machine used to
recognize patterns within input taken from some character set (or alphabet) C. The job of an FA
is to accept or reject an input depending on whether the pattern defined by the FA occurs in the
input.
A finite automaton has a finite set of states with which it accepts or rejects strings.
An FA has three components:
1. an input tape, which contains a single string;
2. a head, which reads the input string one symbol at a time; and
3. a memory, which is in one of a finite number of states.
Operating an FA.
1) Set the machine to the start state.
2) If End-of-String, then halt.
3) Read a symbol.
4) Update the state according to the current state and the symbol read.
5) Go to Step 2.
The final state is the state the FA is in when it has finished reading the input string.
There are accept states (drawn as double circles) and reject states.
An FA accepts the input string if the final state is an accept state; otherwise it rejects.
Example
Let a deterministic finite automaton be →
Q = {a, b, c},
∑ = {0, 1},
q0 = a,
F = {c}, and
Transition function δ as shown by the following table −
Present State    Next State for Input 0    Next State for Input 1
a                a                         b
b                c                         a
c                b                         c
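The operating steps of an FA translate directly into a short loop. The sketch below runs the example automaton from the table above; the function name `run_dfa` and the dictionary encoding of δ are assumptions made for illustration.

```python
def run_dfa(delta, start, accepting, word):
    """Return True if the DFA ends in an accept state after reading word."""
    state = start                          # 1) set the machine to the start state
    for symbol in word:                    # 2)-3) read symbols until end of string
        state = delta[(state, symbol)]     # 4) update state from current state and symbol
    return state in accepting              # accept only if the final state is an accept state

# Transition table of the example: (present state, input) -> next state.
dfa_delta = {
    ("a", "0"): "a", ("a", "1"): "b",
    ("b", "0"): "c", ("b", "1"): "a",
    ("c", "0"): "b", ("c", "1"): "c",
}

for word in ["10", "101", "", "1", "100"]:
    print(repr(word), run_dfa(dfa_delta, "a", {"c"}, word))
```

For instance, reading 10 takes the machine from a to b and then to c, so 10 is accepted, while 100 ends in b and is rejected.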
In a non-deterministic finite automaton (NDFA), for a particular input symbol, the machine can
move to any combination of the states in the machine. In other words, the exact state to which
the machine moves cannot be determined. Hence, it is called a Non-deterministic Automaton. As it
has a finite number of states, the machine is called a Non-deterministic Finite Machine or
Non-deterministic Finite Automaton.
Formal Definition of an NDFA
An NDFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −
Q is a finite set of states.
∑ is a finite set of symbols called the alphabet.
δ is the transition function, where δ: Q × ∑ → 2^Q.
(Here the power set of Q (2^Q) has been taken because, in the case of an NDFA, from a state,
a transition can occur to any combination of Q states.)
q0 is the initial state from where any input is processed (q0 ∈ Q).
F is a set of final state/states of Q (F ⊆ Q).
Example
Let a non-deterministic finite automaton be →
Q = {a, b, c},
∑ = {0, 1},
q0 = a,
F = {c}, and
the transition function δ as shown below −
Present State    Next State for Input 0    Next State for Input 1
a                a, b                      b
b                c                         a, c
c                b, c                      c
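An NDFA can be simulated by keeping track of the set of all states the machine could be in after each symbol. The sketch below applies this to the example table above. The name `run_nfa` and the dictionary encoding are illustrative assumptions, and empty string transitions are not handled.

```python
def run_nfa(delta, start, accepting, word):
    """Return True if at least one sequence of transitions on word ends in an accept state."""
    current = {start}                      # every state the machine could be in right now
    for symbol in word:
        # Union of the successor sets of all currently possible states.
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & accepting)

# Transition table of the example: (present state, input) -> set of next states.
nfa_delta = {
    ("a", "0"): {"a", "b"}, ("a", "1"): {"b"},
    ("b", "0"): {"c"},      ("b", "1"): {"a", "c"},
    ("c", "0"): {"b", "c"}, ("c", "1"): {"c"},
}

for word in ["00", "11", "10", "0", "1", ""]:
    print(repr(word), run_nfa(nfa_delta, "a", {"c"}, word))
```

On input 00 the possible states are first {a, b} and then {a, b, c}; since c is among them, the string is accepted.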
DFA vs NDFA
The following table lists the differences between DFA and NDFA.
DFA                                          NDFA

The transition from a state is to a          The transition from a state can be to
single particular next state for each        multiple next states for each input
input symbol. Hence it is called             symbol. Hence it is called
deterministic.                               non-deterministic.

Empty string transitions are not seen        An NDFA permits empty string transitions.
in a DFA.

A string is accepted by a DFA if it          A string is accepted by an NDFA if at least
transits to a final state.                   one of all possible transitions ends in a
                                             final state.
Example
Let us consider the DFA shown in Figure 1.3. From the DFA, the acceptable strings can be
derived.
Strings accepted by the above DFA: {0, 00, 11, 010, 101, ...}
Strings not accepted by the above DFA: {1, 011, 111, ...}
Algorithm (converting an NDFA to an equivalent DFA)
Input − An NDFA
Output − An equivalent DFA
Step 1 − Create the state table of the given NDFA.
Step 2 − Create a blank state table under the possible input symbols for the equivalent DFA.
Step 3 − Mark the start state of the DFA by q0 (same as the NDFA).
Step 4 − For each DFA state already in the table, find the combination of NDFA states
{Q0, Q1, ..., Qn} reached on each possible input symbol.
Step 5 − Each time a new DFA state is generated under the input symbol columns, apply Step 4
to it again; otherwise go to Step 6.
Step 6 − The states which contain any of the final states of the NDFA are the final states of the
equivalent DFA.
Example:
Let us consider the NDFA shown in the figure below.
q    δ(q,0)         δ(q,1)
a    {a,b,c,d,e}    {d,e}
b    {c}            {e}
c    ∅              {b}
d    {e}            ∅
e    ∅              ∅
Using the above algorithm, we find its equivalent DFA. The state table of the DFA is shown
below.
q              δ(q,0)         δ(q,1)
[a]            [a,b,c,d,e]    [d,e]
[a,b,c,d,e]    [a,b,c,d,e]    [b,d,e]
[d,e]          [e]            ∅
[b,d,e]        [c,e]          [e]
[e]            ∅              ∅
[c,e]          ∅              [b]
[b]            [c]            [e]
[c]            ∅              [b]
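The conversion steps can be sketched in Python as follows. This is an illustrative version of the procedure (commonly called the subset construction), not a definitive implementation; the names `nfa_to_dfa` and `dfa_final_states` are assumptions. DFA states are represented as frozensets of NDFA states, and the empty set ∅ appears only as a table entry, not as a row, to match the table above. The final states of this example NDFA are not listed in the text, so `dfa_final_states` is defined but not applied to it.

```python
def nfa_to_dfa(nfa_delta, start, alphabet):
    """Return the DFA state table reachable from the start state of the NDFA."""
    dfa_delta = {}
    pending = [frozenset([start])]             # Step 3: the DFA starts in {q0}
    while pending:
        state = pending.pop()
        if state in dfa_delta:
            continue
        dfa_delta[state] = {}
        for symbol in alphabet:                # Step 4: combine the NDFA moves
            target = frozenset(
                t for q in state for t in nfa_delta.get((q, symbol), ())
            )
            dfa_delta[state][symbol] = target
            if target and target not in dfa_delta:
                pending.append(target)         # Step 5: process each new DFA state
    return dfa_delta

def dfa_final_states(dfa_delta, nfa_finals):
    """Step 6: DFA states containing at least one final state of the NDFA."""
    return {state for state in dfa_delta if state & nfa_finals}

# State table of the example NDFA; missing entries stand for the empty set.
example_nfa = {
    ("a", "0"): {"a", "b", "c", "d", "e"}, ("a", "1"): {"d", "e"},
    ("b", "0"): {"c"},                     ("b", "1"): {"e"},
    ("c", "1"): {"b"},
    ("d", "0"): {"e"},
}

dfa = nfa_to_dfa(example_nfa, "a", ["0", "1"])
for state in sorted(dfa, key=lambda s: (len(s), sorted(s))):
    row = ["".join(sorted(dfa[state][x])) or "∅" for x in ["0", "1"]]
    print("".join(sorted(state)), row)
```

Only DFA states reachable from the start state are generated, which is why the result has far fewer rows than the 2^5 possible subsets of {a, b, c, d, e}.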