Unit 1 - Merged

The document outlines the course structure for Automata Theory, including six units covering topics such as mathematical induction, regular languages, finite automata, and Turing machines. It also lists textbooks and provides detailed explanations of mathematical induction, recursive definitions, and types of grammars and languages. Additionally, it discusses the applications of regular expressions and finite automata in various computational contexts.


Course Name: Automata Theory

Course Code: UCSC0402


Unit 1: Mathematical Induction, Regular Languages &
Finite Automata
Unit 2: Kleene's Theorem
Unit 3: Grammars and Languages
Unit 4: Pushdown Automata
Unit 5: CFLs and non-CFLs
Unit 6: Turing Machines

Text Books:
1. Introduction to Languages and the Theory of Computation - John C. Martin (MGH), Chapters 1-8
2. Discrete Mathematical Structures with Applications to Computer Science - J. P. Tremblay & R. Manohar (MGH), Chapter 1
Unit 1: Mathematical Induction, Regular Languages &
Finite Automata
• The Principle of Mathematical Induction
• Recursive Definitions,
• Definition & types of grammars & languages,
• Regular expressions and corresponding regular
languages, examples and applications,
• unions, intersection & complements of regular
languages,
• Finite automata: definition and representation,
• Non-deterministic FA, NFA with null transitions,
• Equivalence of FAs, NFAs, and NFAs with null
transitions.
The Principle of Mathematical Induction
• Mathematical Induction is a technique of proving a statement,
theorem or formula which is thought to be true, for each and
every natural number n.
• Generalizing this into a principle that we can use to
prove any such mathematical statement gives the
'Principle of Mathematical Induction'.

• For example: 1³ + 2³ + 3³ + … + n³ = (n(n+1)/2)², a statement
considered true for all values of the natural number n.

The technique involves two steps to prove a statement P(n), as stated
below:
• Step 1 (Base step) − Prove that the statement is true for the initial value n = a.
• Step 2 (Inductive step) − Assume that the statement P(k) is true for an arbitrary
k ≥ a, and prove that P(k + 1) is then also true.
• The base step and the inductive step together give the chain P(a) ⇒ P(a + 1) ⇒
P(a + 2) ⇒ …, so P(n) is true for all integers n ≥ a.
The Principle of Mathematical Induction example
• Example 1:
• Prove the following by mathematical induction:
1 + 3 + 5 + … + (2n − 1) = n².
Solution:
Let P(n) be the statement 1 + 3 + 5 + … + (2n − 1) = n².
Base step: For n = 1, the left side is 1 and the right side is 1² = 1,
so P(1) is true. ............... (i)
Inductive step: Assume P(r) is true for some r:
1 + 3 + 5 + … + (2r − 1) = r² ............... (ii)
Adding 2r + 1 to both sides:
1 + 3 + 5 + … + (2r − 1) + (2r + 1) = r² + (2r + 1)
= r² + 2r + 1
= (r + 1)² ............... (iii)
So whenever P(r) is true, P(r + 1) is also true.
From (i), (ii) and (iii) we conclude that
1 + 3 + 5 + … + (2n − 1) = n² is true for n = 1, 2, 3, 4, 5 …. Hence proved.
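The identity can also be sanity-checked numerically. The sketch below checks finitely many cases in Python (the function name is ours); it illustrates the claim but does not replace the induction proof.

```python
# Numerically check 1 + 3 + 5 + ... + (2n - 1) = n^2 for small n.
def sum_of_first_n_odds(n):
    return sum(2 * k - 1 for k in range(1, n + 1))

for n in range(1, 100):
    assert sum_of_first_n_odds(n) == n ** 2
```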
Example:The Product of Two Odd Integers Is Odd
• To Prove: For every two integers a and b, if a and b are odd,
then ab is odd.
Proof:
• The conditional statement can be restated as follows:
• If there exist integers i and j so that a = 2i + 1 and b = 2j + 1,
then there exists an integer k so that ab = 2k + 1.
• Our proof will be constructive—not only will we show that
there exists such an integer k, but we will demonstrate how to
construct it.
• Assuming that a = 2i + 1 and b = 2j + 1,
• we have ab = (2i + 1)(2j + 1) = 4ij + 2i + 2j + 1 = 2(2ij + i + j ) + 1
• Therefore, if we let k = 2ij + i + j, we have the result we want,
ab = 2k + 1.
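Because the proof is constructive, it translates directly into code: from i and j it exhibits the witness k (the function name is ours).

```python
# Given odd numbers a = 2i + 1 and b = 2j + 1, the proof's witness
# k = 2ij + i + j satisfies ab = 2k + 1, so ab is odd.
def odd_product_witness(i, j):
    a, b = 2 * i + 1, 2 * j + 1
    k = 2 * i * j + i + j
    assert a * b == 2 * k + 1
    return k

k = odd_product_witness(3, 5)   # a = 7, b = 11, ab = 77 = 2*38 + 1
```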
Recursive Definitions
• Recursion is the general term for the practice of
defining an object in terms of itself or of part of
itself.
• Recursively Defined Functions:
• A recursive or inductive definition of a function
consists of two steps.
– Basis Step: Specify the value of the function at
initial values (e.g., f(0) defined).
– Recursive Step: Give a rule for finding its value at an
integer from its values at smaller integers (for n > 0,
define f(n) in terms of f(0), f(1), . . . , f(n − 1)).
Example: a recursive definition of the function that calculates the
factorial of any number n ≥ 0.
• A function is said to be recursive when it is calling itself (with somewhat less
complex arguments) to solve a problem.
• Now, factorial can be a recursive function as we can calculate factorial of
bigger numbers by calculating factorials of lower numbers.
• e.g., 5! = 5*4! = 5*4*3! = 5*4*3*2! = 5*4*3*2*1! = 5*4*3*2*1 = 120
by taking 1! = 1.
• n! = 1 if n = 0
n! = n * (n − 1)! if n > 0
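The two-clause definition above translates directly into a recursive Python function:

```python
# Direct transcription of the recursive definition of n!.
def factorial(n):
    if n == 0:                        # basis step: 0! = 1
        return 1
    return n * factorial(n - 1)       # recursive step: n! = n * (n - 1)!
```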
Recursive Definitions
• Recursion is a technique that is often useful in
writing computer programs.
• Here recursion as a tool for defining sets:
primarily, sets of numbers, sets of strings, and
sets of sets (of numbers or strings).
• A recursive definition of a set begins with a basis
statement that specifies one or more elements in
the set. The recursive part of the definition
involves one or more operations that can be
applied to elements already known to be in the
set, so as to produce new elements of the set
Example of recursive definition is the axiomatic
definition of the set N of natural numbers
• We might write the definition this way:
1. 0 ∈ N .
2. For every n ∈ N , n + 1 ∈ N .
3. Every element of N can be obtained by using statement
1 or statement 2.
In order to obtain an element of N , we use statement 1
once and statement 2 a finite number of times (zero or
more).
To obtain the natural number 7, for example, we use
statement 1 to obtain 0;
then statement 2 with n = 0 to obtain 1;
then statement 2 with n = 1 to obtain 2; ... ;
and finally, statement 2 with n = 6 to obtain 7.
Definition & types of grammars & languages
• According to Noam Chomsky, there are four
types of grammars − Type 0, Type 1, Type 2, and
Type 3. The scope of each type is described below.
Type - 0 Grammar
• Type-0 grammars generate recursively enumerable
languages. The productions have no restrictions; these
are the unrestricted phrase-structure grammars, which
include all formal grammars.
• They generate the languages that are recognized by a
Turing machine.
• The productions can be in the form α → β where α is
a string of terminals and non-terminals with at least one
non-terminal, and α cannot be null. β is a string of
terminals and non-terminals.
• Examples (includes all natural languages):
S → ACaB
Bc → acB
CB → DB
aD → Db
Type - 1 Grammar
• Type-1 grammars generate context-sensitive languages.
The productions must be in the form
• αAβ→αγβ
• where A ∈ N (Non-terminal)
• and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-
terminals)
• The strings α and β may be empty, but γ must be non-
empty.
• The rule S → ε is allowed if S does not appear on the
right side of any rule. The languages generated by these
grammars are recognized by a linear bounded
automaton.
• Examples (most programming languages):
AB → AbBc
A → bcA
B→b
Type - 2 Grammar
• Type-2 grammars are CFGs (context-free
grammars); they generate context-free languages.
• The productions must be in the form A → γ
where A ∈ N (non-terminal) and
γ ∈ (T ∪ N)* (a string of terminals and non-terminals).
• The languages generated by these grammars are
recognized by a non-deterministic pushdown automaton.
• Examples (some simple programming languages):
S→Xa
X→a
X → aX
X → abc
X→ε
Type - 3 Grammar
• Type-3 grammars are Regular grammar which generate
Regular languages.
• Type-3 grammars must have a single non-terminal on the left-
hand side and a right-hand side consisting of a single terminal
or a single terminal followed by a single non-terminal.
• The productions must be in the form
X → a or X → aY
where X, Y ∈ N (Non terminal) and
a ∈ T (Terminal)
• The rule S → ε is allowed if S does not appear on the right side
of any rule.
• Examples (pattern-matching languages containing regular
expressions):
X→ε
X → a | aY
Y→b
Application of Regular Expression
• Regular expressions are useful in a wide variety of text-processing
tasks, and more generally string processing, where the data need
not be textual.
• Common applications include data validation, data scraping
(especially web scraping), data wrangling, simple parsing, the
production of syntax-highlighting systems, and many other tasks.
• While regexps would be useful on Internet search engines,
processing them across the entire database could consume
excessive computer resources depending on the complexity and
design of the regex.
• Basically, regular expressions are used in search tools.
• Text file search in Unix (tool: egrep). This command searches for a
text pattern in a file and lists the names of the files containing that
pattern.
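The behavior of an egrep-style search can be sketched with Python's `re` module (the `grep` function name is ours; the real egrep has many options this sketch ignores, and it reports matches per file rather than per string):

```python
import re

# An egrep-like line search: return the lines of `text` that
# contain a match of `pattern`.
def grep(pattern, text):
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]

matches = grep(r"ab*c", "abc\nxyz\nabbbc\nac")
```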
Introduction to Finite Automata
• Study a class of machines called finite automata.
• Finite automata are computing devices that
accept/recognize regular languages and are used to
model operations of many systems we find in
practice.
• Their operations can be simulated by a very simple
computer program.
• A language is a subset of the set of strings over an
alphabet.
• A language can be generated by a grammar. A
language can also be recognized by a machine; such
a machine is called a recognition device.
• The simplest machine is the finite state automaton.
• A finite automaton is a mathematical (abstract)
machine that has a set of "states" and whose
"control" moves from state to state in response to
external "inputs".
• The finite state machines are used in applications in
computer science and data networking.
• For example, finite-state machines are basis for
programs for spell checking, indexing, grammar
checking, searching large bodies of text, recognizing
speech, transforming text using markup languages
such as XML & HTML, and network protocols that
specify how computers communicate.
Types of Finite Automata
• The control may be either "deterministic",
meaning that the automaton cannot be in more
than one state at any one time, or "non-
deterministic", meaning that it may be in several
states at once.
• This distinguishes the classes of automata as
– DFA − a Deterministic Finite Automaton
cannot be in more than one state at any time.
– NFA − a Non-deterministic Finite
Automaton can be in more than one state at a time.
– NFA with epsilon (null) moves.
Deterministic Finite Automata- DFA
• Definition : A deterministic finite automaton is defined
by a quintuple (5-tuple) as (Q, Σ, δ, q0, F). Where,
Q = Finite set of states,
Σ = Finite set of input symbols,
δ = A transition function that maps Q × Σ -> Q
q0 = A start state; q0 ∈ Q
F = Set of final states; F ⊆ Q.
A transition function δ takes as arguments a state
and an input symbol and returns a state. In a transition
diagram, δ is represented by arcs between states and the
labels on the arcs.
For example, if S is a state and a is an input symbol, then
δ(S, a) is the state F such that there is an arc labeled 'a'
from S to F.
Examples of DFA
Example 1:
Design an FA with ∑ = {0, 1} that accepts those strings which start
with 1 and end with 0.
Solution: The FA will have a start state q0, from which only the
edge with input 1 goes to the next state.
In state q1, if we read 1, we stay in state q1, but if we read 0 at
state q1, we reach state q2, which is the final state. In state
q2, if we read 0 or 1, we go to state q2 or state q1
respectively. Note that if the input ends with 0, the FA will be in
the final state.
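The DFA of Example 1 can be simulated directly from its transition table. A sketch follows; the state names, including the trap state `dead` that absorbs strings beginning with 0, are ours.

```python
# Transition table for the DFA of Example 1 (strings over {0, 1}
# that start with 1 and end with 0).
DELTA1 = {
    ('q0', '1'): 'q1', ('q0', '0'): 'dead',
    ('q1', '1'): 'q1', ('q1', '0'): 'q2',
    ('q2', '1'): 'q1', ('q2', '0'): 'q2',
    ('dead', '0'): 'dead', ('dead', '1'): 'dead',
}

def accepts_example1(s):
    state = 'q0'                        # start state
    for symbol in s:
        state = DELTA1[(state, symbol)]
    return state == 'q2'                # q2 is the only final state
```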
• Example 2:
Design an FA with ∑ = {0, 1} that accepts only the
input 101.
Solution: In the given solution, we can see that
only the input 101 will be accepted. Hence, apart
from the path for input 101, no path is shown for
any other input.
• Example 3:
Design an FA with ∑ = {0, 1} that accepts strings
with an even number of 0's and an even number of 1's.
• This FA uses four different states to track
the parities of 0's and 1's in the input read
so far.
• Here q0 is both the start state and the final
state. Note carefully that a symmetry of
0's and 1's is maintained. We can associate
a meaning with each state:
• q0: state of even number of 0's and
even number of 1's.
q1: state of odd number of 0's and
even number of 1's.
q2: state of odd number of 0's and
odd number of 1's.
q3: state of even number of 0's and
odd number of 1's.
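Using the state meanings listed above, this four-state DFA can be sketched as a transition table (the dictionary encoding is ours):

```python
# DFA of Example 3: q0 = (even 0's, even 1's), q1 = (odd, even),
# q2 = (odd, odd), q3 = (even, odd).  q0 is both start and final state.
DELTA3 = {
    ('q0', '0'): 'q1', ('q0', '1'): 'q3',
    ('q1', '0'): 'q0', ('q1', '1'): 'q2',
    ('q2', '0'): 'q3', ('q2', '1'): 'q1',
    ('q3', '0'): 'q2', ('q3', '1'): 'q0',
}

def accepts_even_parity(s):
    state = 'q0'
    for symbol in s:
        state = DELTA3[(state, symbol)]
    return state == 'q0'
```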
Unit 2: Kleene's Theorem

•Part I & II statements and proofs,


•minimum state of FA for a regular language,
•minimizing number of states in Finite Automata.
• In this unit we are going to learn Kleene's
theorem. It states that any regular language is
accepted by an FA and, conversely, that any
language accepted by an FA is regular.
• Kleene's theorem is proved in the following two
parts:
1. Kleene’s Theorem, Part 1
Any regular language can be accepted by a Finite
Automaton.
2. Kleene’s Theorem, Part 2
The Language accepted by Finite Automaton is
regular
1. Kleene’s Theorem, Part 1 :
Any regular language can be accepted by a Finite
Automaton.

Proof:
• This is going to be proven by (general) induction
following the recursive definition of regular
languages.
• The inductive proof includes two steps:
– Basis Step
– Inductive Step
Basis Step:
As shown below, the languages ∅, {Λ} and {a} for
any symbol a in Σ are accepted by an FA.
Inductive Step:
• We are going to show that for any languages L1 and
L2, if they are accepted by FAs, then L1L2, L1 ∪ L2 and
L1* are accepted by FAs.
• Since any regular language is obtained from ∅, {Λ} and
{a} for symbols a in Σ by using the union, concatenation
and Kleene star operations, that together with the Basis
Step proves the theorem.
• Suppose that L1 and L2 are accepted by
FAs M1 = < Q1 , ∑ , q1,0 , δ1 , A1 > and
M2 = < Q2 , ∑, q2,0 , δ2 , A2 > , respectively.
We assume that Q1 ∩ Q2 = Φ without loss of
generality since states can be renamed if necessary.
• Then L1. L2 , L1UL2 and L1* are
accepted by the FAs
L1 U L2 is Mu = < Qu , Σ , qu,0 , δu , Au > ,

L1. L2 is Mc = < Qc , Σ , qc,0 , δc , Ac >

L1* is Mk = < Qk , Σ , qk,0 , δk , Ak > ,

which are given below.


L1 U L2 is represented as ,
Mu = < Qu , Σ , qu,0 , δu , Au >
where
Qu = Q1 ∪ Q2 ∪ { qu,0 } ,
where qu,0 is a state which is neither in Q1 nor in
Q2 .
δu = δ1 ∪ δ2 ∪ { (qu,0, Λ, { q1,0 , q2,0 }) } ,
that is, δu(qu,0, Λ) = { q1,0 , q2,0 } .

Note that δu(qu,0, a) = ∅ for all a in Σ.


Au = A 1 ∪ A2
L1. L2 is represented as ,
Mc = < Qc , Σ , qc,0 , δc , Ac > :
where,
Qc = Q 1 ∪ Q2
qc,0 = q1,0
δc = δ1 ∪ δ2 ∪ { (q, Λ, { q2,0 }) | q ∈ A1 },
Ac = A 2
L1* is represented as ,
Mk = < Qk , Σ , qk,0 , δk , Ak > :
where,
Qk = Q1 ∪ { qk,0 } ,
where qk,0 is a state which is not in Q1 .
δk = δ1 ∪ { (qk,0, Λ, { q1,0 }) } ∪ { (q, Λ, { qk,0 }) | q ∈ A1 }
Ak = { qk,0 }
These NFA-Λ's are illustrated below.
It can be proven, though we omit the proofs, that these NFA-Λ's Mu, Mc and Mk in
fact accept L1 ∪ L2, L1L2 and L1* respectively.
End of Proof
Examples of Mu, Mc and Mk:

• Example 1: An NFA-Λ that accepts the language
represented by the regular expression (aa + b)*
can be constructed as follows using the
operations given above.
• Example 2: An NFA-Λ that accepts the language
represented by the regular expression ((a + b)a*)*
can be constructed as follows using the
operations given above.
Example 1: An NFA-Λ that accepts the language
represented by the regular expression (aa + b)* can be
constructed as follows using the operations given above.
Solution:
First construct NFA-Λ transitions for a, b and aa.
Continued with Example 1:
Then construct NFA-Λ's for the subexpressions of the given RE using union and Kleene star.
Example 2: An NFA-Λ that accepts the language represented by the
regular expression ((a + b)a*)* can be constructed as follows using
the operations given above.
Solution:
First construct NFA-Λ transitions for a, b, a* and a + b.
Continued with Example 2:
Then construct NFA-Λ's for the subexpressions of the given RE using concatenation and Kleene star.
2 Kleene’s Theorem, Part 2
The Language accepted by Finite Automaton is regular
• The converse of Part 1 of Kleene's theorem also holds
true. It states that any language accepted by a finite
automaton is regular.
• Before proceeding to a proof outline for the converse, let
us study a method to compute the set of strings accepted
by a finite automaton.
• Given a finite automaton, first relabel its states with the
integers 1 through n, where n is the number of states of the
finite automaton.
• Next denote by L(p, q, k) the set of strings representing
paths from state p to state q that go through only
intermediate states numbered no higher than k.
• Note that paths may go through arcs and vertices any
number of times.
• Then the following lemmas hold.
Lemma 1:
L(p, q, k+1) = L(p, q, k) ∪ L(p, k+1, k)L(k+1, k+1, k)*L(k+1, q, k).

What this lemma says is that the set of strings
representing paths from p to q passing through states
labeled with k+1 or lower numbers consists of the
following two sets:
1. L(p, q, k): The set of strings representing paths from p
to q passing through states labeled with k or lower
numbers.
2. L(p, k+1, k)L(k+1, k+1, k)*L(k+1, q, k): The set of strings
going first from p to k+1, then from k+1 to k+1 any
number of times, then from k+1 to q, all without passing
through states labeled higher than k.
See the figure below for the illustration of above.
Lemma 2: L(p, q, 0) is regular.
Proof: L(p, q, 0) is the set of strings representing paths from p
to q without passing through any states in between. Hence if p and
q are different, it consists of the single symbols representing arcs
from p to q. If p = q, then Λ is in it, as well as the strings
representing any loops at p (they are all single symbols). Since
the number of symbols is finite and since any finite language is
regular, L(p, q, 0) is regular.
From Lemmas 1 and 2, by induction, the following lemma
holds.

Lemma 3: L(p, q, k) is regular for any states p and q and any
natural number k.

Since the language accepted by a finite automaton is the union
of L(q0, q, n) over all accepting states q, where n is the number
of states of the finite automaton, we have the
converse of Part 1 of Kleene's theorem.
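Lemmas 1-3 suggest a dynamic program over k that builds a regular expression for the language of a DFA. The sketch below uses Python `re` syntax (the empty string for Λ, `|` for union, `None` for the empty set), makes no attempt to simplify the resulting expression, and all names are ours.

```python
import re

# Build a regex for the language of a DFA with states 1..n,
# transition dict delta[(state, symbol)] = state, start state,
# and a list of accepting states, via the L(p, q, k) recursion.
def dfa_to_regex(n, delta, start, accepting):
    # L(p, q, 0): symbols labeling arcs p -> q, plus Λ ('') if p == q.
    R = {}
    for p in range(1, n + 1):
        for q in range(1, n + 1):
            symbols = [a for (s, a), t in delta.items() if s == p and t == q]
            alts = symbols + ([''] if p == q else [])
            R[p, q] = '|'.join(alts) if alts else None   # None = empty set
    # Lemma 1: allow intermediate state k, for k = 1..n.
    for k in range(1, n + 1):
        R2 = {}
        for p in range(1, n + 1):
            for q in range(1, n + 1):
                parts = []
                if R[p, q] is not None:
                    parts.append(R[p, q])
                if R[p, k] is not None and R[k, q] is not None:
                    loop = '(%s)*' % R[k, k] if R[k, k] else ''
                    parts.append('(%s)%s(%s)' % (R[p, k], loop, R[k, q]))
                R2[p, q] = '|'.join(parts) if parts else None
        R = R2
    # Union of L(start, q, n) over accepting states q.
    return '|'.join('(%s)' % R[start, q] for q in accepting
                    if R[start, q] is not None)
```

For example, for the two-state DFA over {a} that accepts strings with an even number of a's, the resulting (unsimplified) expression matches exactly those strings.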
Unit 3: Grammars and Languages
Syllabus
• Derivation and ambiguity,
• BNF & CNF notations,
• Union, Concatenation and *’s of CFLs,
• Eliminating Є-productions & unit productions from a
CFG,
• Eliminating useless variables from a context Free
Grammar.
• Parsing: Top-Down, Recursive Descent
and Bottom-Up Parsing
Introduction
• Grammar is a set of rules by which strings in a language
can be generated.
• Here we discuss the CFG (Context-Free Grammar), a
more powerful method of describing languages.
• The set of strings generated by a context-free grammar is
called a context-free language and context-free languages
can describe many practically important systems.
• Most programming languages can be approximated by
context-free grammar and compilers for them have been
developed based on properties of context-free languages.
• Let us define context-free grammars and context-free
languages here.
CFG Vs RE
• The CFG are more powerful than the regular
expressions as they have more expressive power
than the regular expression.
• Generally, regular expressions are useful for
describing the structure of lexical constructs such as
identifiers, keywords, constants, etc.
• But they do not have the capability to specify the
recursive structure of programming constructs.
• However, CFGs are capable of defining any such
recursive structure as well.
• Thus, CFG can define the languages that are regular
as well as those languages that are not regular.
Definition (Context-Free Grammar) :
• A 4-tuple G = < V ,∑ , S , P > is a context-free
grammar (CFG)
• if V and ∑ are finite sets sharing no elements
between them (V ∩ ∑ = ∅),
• S ∈ V is the start symbol, and
• P is a finite set of productions of the form
X → α, where X ∈ V and α ∈ (V ∪ ∑)*.
A language is a context-free language (CFL) if all
of its strings are generated by a context-free
grammar.
Properties of Context-Free Language
Theorem 1:
Let L1 and L2 be context-free languages. Then
L1 ∪ L2 , L1L2 , and L1* are context-free languages.
Normal forms and Simplification of CFG
BNF & CNF notations,
• Productions in CFG, satisfying certain restrictions
are said to be in Normal Forms.
• There are 2 notations used for Normal Forms,
1. CNF- Chomsky Normal Form:
A context-free grammar is said to be in
Chomsky normal form if every production is
of one of these two types:
A → BC (where B and C are variables)
A → a (where a is a terminal symbol)
Simplification of CFG
• To get CFG in CNF form, we need to make a
number of preliminary Simplifications, which
are themselves useful in various ways.
• A CFG is simplified by eliminating the following,
1. “Useless Symbols”: Those variables or terminals
that do not appear in any derivation of a terminal
string from the start symbol.
2. “Null-productions”: Those of the form
A→ Є for some variable A.
3. “Unit productions”: Those of the form
A →B for variables A and B.
1) Eliminating "Useless Symbols"
There are 2 types of useless symbols:
• Non-generating symbols
• Non-reachable symbols
Thus useful symbols are those variables or
terminals that appear in some derivation of a
terminal string from the start symbol.
Eliminating a useless symbol includes identifying
whether or not the symbol is "generating" and
"reachable".
• Generating symbol:
We say x is generating if x ⇒* w for some
terminal string w. Note that every terminal is
generating, since w can be the terminal itself,
derived in zero steps.
• Reachable symbol:
We say x is reachable if there is a derivation
S ⇒* αxβ for some α and β.

• Thus if we eliminate the non-generating symbols
and then the non-reachable ones, we shall have
only the useful symbols left.
• Example :
Consider a grammar defined by following
productions:

S→aB | bX
A → Bad | bSX | a
B → aSB | bBX
X → SBd | aBX | ad
Here:
A and X can directly generate terminal strings, so A
and X are generating symbols (we have the
productions A → a and X → ad).
Also,
S → bX, and X generates a terminal string, so S can also
generate a terminal string. Hence S is also a generating
symbol.
B cannot produce any terminal string, so it is non-
generating.
Hence, the new grammar after removing
non-generating symbols is:
S → bX
A → bSX | a
X → ad
• Here,
• A is non-reachable as there is no any derivation
of the form S→* α A β in the grammar. Thus
eliminating the non-reachable symbols, the
resulting grammar is:
S→ bX
X→ ad
This is the grammar with only useful symbols.
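The "generating" check used above can be computed as a simple fixed point. A sketch for this example grammar follows; the dictionary encoding and function name are ours.

```python
# Fixed-point computation of the generating variables of a grammar,
# given as {variable: [alternative bodies]}.  Uppercase single
# characters that are keys of the dict are the variables.
PRODUCTIONS = {
    'S': ['aB', 'bX'],
    'A': ['Bad', 'bSX', 'a'],
    'B': ['aSB', 'bBX'],
    'X': ['SBd', 'aBX', 'ad'],
}

def generating_variables(prods):
    variables = set(prods)
    gen = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in prods.items():
            if var in gen:
                continue
            # var is generating if some body uses only terminals
            # and already-generating variables.
            if any(all(c not in variables or c in gen for c in body)
                   for body in bodies):
                gen.add(var)
                changed = True
    return gen
```

Running it on the example grammar reproduces the analysis above: B is the only non-generating variable.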
Exercise
1) Remove useless symbol from the following
grammar:

S→ xyZ | XyzZ
X → Xz | xYZ
Y → yYy | XZ
Z → Zy | z
2) Remove useless symbol from the following grammar

S → aC | SB
A → bSCa
B → aSB | bBC
C → aBc | ad
2) Eliminating "Null-productions":
A grammar is said to have Є-productions if
there is a production of the form
A → Є.
Here our strategy is to begin by
discovering which variables are
"nullable".

A variable A is "nullable" if A ⇒* Є.
Algorithm (steps to remove Є-productions from the
grammar):
• If there is a production of the form A → Є,
then A is nullable.
• If there is a production of the form B → X1X2…Xn
and each Xi is nullable, then B is also
nullable.
• Find all the nullable variables.
• If B → X1X2…Xn is a production in P, then
add to P' all productions formed by striking out
some subset of those Xi's that are nullable.
• Do not include B → Є even if there is such a production.
Example:
Consider the grammar:
S→ABC
A → BB | Є
B → CC | a
C → AA | b
Here,
A→Є A is nullable.
C → AA → * Є, C is nullable
B → CC → * Є, B is nullable
S → ABC → * Є, S is nullable
Now for removal of Є-productions:
• In the production
S → ABC, all of A, B and C are nullable.
So striking out each possible subset of nullable
symbols gives the new productions:
S → ABC | AB | BC | AC | A | B | C
• The other productions can be handled similarly, and
the resulting grammar after removal of Є-productions is:
S → ABC | AB | BC | AC | A | B | C
A → BB | B
B → CC | C | a
C → AA | A | b
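The nullable computation is again a fixed point. A sketch for this grammar follows, encoding an Є-body as the empty string (the encoding and names are ours).

```python
# Fixed-point computation of the nullable variables.
# '' stands for an epsilon (Є) body in this encoding.
GRAMMAR = {
    'S': ['ABC'],
    'A': ['BB', ''],
    'B': ['CC', 'a'],
    'C': ['AA', 'b'],
}

def nullable_variables(prods):
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in prods.items():
            if var in nullable:
                continue
            # var is nullable if some body consists only of nullable
            # variables (the empty body vacuously qualifies).
            if any(all(c in nullable for c in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable
```

On this grammar it finds, as argued above, that all four variables are nullable.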
Exercise:
Remove Є-productions for each of grammar;
1)
S→ AB
A → aAA | Є
B → bBB | Є
3) Eliminating Unit Production:
• A unit production is a production of the
form A→ B, where A and B are both
variables.
• Here, if A → B, we say B is A-derivable.
B→ C, we say C is B-derivable.
• Thus if both of two A → B and B → C, then
A → * C, hence C is also A-derivable.
• Here pairs (A, B), (B, C) and (A, C) are
called the unit pairs.
To eliminate the unit productions, first find all of the unit
pairs. The unit pairs are;
(A, A) is a unit pair for any variable A as A→* A
If we have A → B then (A, B) is unit pair.
If (A, B) is unit pair i.e. A → B, and if we have B → C then
(A, C) is also a unit pair.
Now, to eliminate the unit productions for a given
grammar, say G = (V, T, P, S), we have to find another
grammar G' = (V, T, P', S) with no unit productions. For this,
we may work as below:
• Initialize P' = P.
• For each A ∈ V, find the set of A-derivable variables.
• For every pair (A, B) such that B is A-derivable, and for every
non-unit production B → α, add the production A → α to P'
if it is not in P' already.
• Delete all unit productions from P'.
Example
Remove the unit production for grammar G defined
by productions:
P = { S→ S + T | T
T → T* F | F
F → (S) | a };
Solution:
Initialize
1) P‘= { S→ S + T | T
T → T* F | F
F → (S) | a };
2) Now, find unit pairs;
Here, S→ T So, (S, T) is unit pair.
T→ F So, (T, F) is unit pair.
Also, S → T and T → F So, (S, F) is unit pair.

3) Now, add each non-unit productions of the form B → α


for each pair (A, B);
P‘ = {
S → S + T |T * F| (S) | a
T → T * F | (S) | a | F
F → (S) | a
}
4) Delete the unit productions from the grammar;
P‘ = {
S→ S + T | T * F | (S) | a
T→ T * F | (S) | a
F→ (S) | a
}
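Finding all unit pairs is a transitive-closure computation. A sketch for the expression grammar above follows (the encoding and names are ours).

```python
# Unit pairs of the expression grammar, by transitive closure.
EXPR_GRAMMAR = {
    'S': ['S+T', 'T'],
    'T': ['T*F', 'F'],
    'F': ['(S)', 'a'],
}

def unit_pairs(prods):
    variables = set(prods)
    # (A, A) is a unit pair for every variable A, since A =>* A.
    pairs = {(v, v) for v in variables}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in prods[b]:
                # A unit production B -> C extends the pair (A, B) to (A, C).
                if len(body) == 1 and body in variables:
                    if (a, body) not in pairs:
                        pairs.add((a, body))
                        changed = True
    return pairs
```

On this grammar it finds the trivial pairs plus (S, T), (T, F) and (S, F), matching step 2 above.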
Exercise
1) Simply the grammar G = (V, T, P, S) defined by following
productions.
1) S → ASB | Є
A → aAS | a
B → SbS | A | bb | Є
Note: Here simplify means you have to remove all the useless
symbol, Unit production and Є- productions.

2) Simplify the grammar defined by following production:

S → 0A0 | 1B1 | BB
A→C
B→S|A
C→S|Є
1)CNF- Chomsky Normal Form:
A context-free grammar is said to be in Chomsky
normal form if every production is of one of
these two types:
A → BC (where B and C are variables)
A → a (where a is a terminal symbol)
Thus a grammar in CNF is one which should
not have:
• Є-productions
• Unit productions
• Useless symbols
Algorithm to convert CFG into CNF:
• Step 1: Eliminate Є-productions, unit productions
and useless symbols from the given CFG.
• Step 2: If all the productions are of the form
A → a or A → BC with A, B, C ∈ V and
a ∈ T, we are done.
Otherwise, we have to do two tasks:
1. Arrange that all bodies of length 2 or more
consist only of variables.
2. Break bodies of length 3 or more into a
cascade of productions, each with a body
consisting of two variables.
• The construction for task (1) is as follows:
If a production is of the form
A → X1X2…Xm, m ≥ 2, and some
Xi is a terminal a,
then we replace that Xi by Ca, adding Ca → a,
where Ca is a new variable.
Thus as a result we will have all such productions
in the form:
A → B1B2…Bm, m ≥ 2,
where all the Bi's are non-terminals.
• The construction for task (2) is as follows:
We break each production
A → B1B2…Bm with m ≥ 3 into a group of
productions with two variables in each body.
We introduce m − 2 new variables
C1, C2, …, Cm−2.
The original production is replaced by the m − 1
productions:
A → B1C1,
C1 → B2C2,
……
Cm−2 → Bm−1Bm
• Finally, all of the productions are achieved
in the form as:
A→ BC or A→a
This is certainly a grammar in CNF and
generates a language without Є-
productions.
• Consider an example: Convert CFG to CNF,
S→ AAC
A→ aAb | Є
C → aC | a
Solution: 1) First, removing Є- productions;
Here, A is nullable symbol as A→Є
So, eliminating such Є-productions,
we have;
S→ AAC | AC | C
A→ aAb | ab
C→ aC | a
2) Removing unit-productions:
Here, the unit pair we have is (S, C) as S→C
So, removing unit-production,
we have CFG as ;
S→ AAC | AC | aC| a
A→ aAb | ab
C→ aC | a

3) Here in CFG , we do not have any useless


symbols.
Now, we can convert the grammar to CNF. For this;
• First replace the terminal by a variable and introduce new productions for
those which are not as the productions in CNF.
• i.e. S→AAC | AC |C1C | a
C1→ a
A→ C1AB1 | C1B1
B1→ b
C→ C1C | a
Now, replace the sequence of non-terminals by a variable and introduce new
productions.
Here, replace S→ AAC by S→AC2, C2→AC
Similarly, replace A→ C1AB1 by A→ C1C3, C3→ AB1
Thus the final grammar in CNF will be:
S → AC2 | AC | C1C | a
A → C1C3 | C1B1
C1 → a
B1 → b
C2 → AC
C3 → AB1
C → C1C | a
Q. Convert the following CFG into CNF form,
S → TU | V
T → aTb | Є
U → cU | Є
V → aVc | W
W → bW | Є
Solution:
1. (Identifying nullable variables)
The variables T, U, and W are nullable because they are involved in Є-
productions;
V is nullable because of the production V → W; and S is also, either because of
the production S → TU or
because of S → V.
So all the variables are!
2. (Eliminating Є-productions)
Before the Є-productions are eliminated, the following productions are added:
S → T, S → U, T → ab, U → c, V → ac, W → b
After eliminating Є-productions, we are left with
S → TU | T | U | V
T → aTb | ab
U → cU | c
V → aVc | ac | W
W → bW | b
3. (Identifying A-derivable variables, for each A)
The S-derivable variables obviously include T, U, and V, and
they also include W because of the production V → W. The
only V-derivable variable is W.
4. (Eliminating unit productions)
We add the productions
S → aTb | ab | cU | c | aVc | ac | bW | b
V → bW | b
before eliminating the unit productions.
At this stage, we have
S → TU | aTb | ab | cU | c | aVc | ac | bW | b
T → aTb | ab
U → cU | c
V → aVc | ac | bW | b
W → bW | b
5. (Converting to Chomsky normal form)
We replace a, b, and c by Xa, Xb, and Xc, respectively, in productions whose right
sides are not single terminals, obtaining
S → TU | XaTXb | XaXb | XcU | c | XaVXc | XaXc | XbW | b
T → XaTXb | XaXb
U → XcU | c
V → XaVXc | XaXc | XbW | b
W → XbW | b
This grammar fails to be in Chomsky normal form only because of the productions
S → XaTXb, S → XaVXc, T → XaTXb, and V → XaVXc. When we take care of
these as described above, we obtain the final CFG G1 with productions
S → TU | XaY1 | XaXb | XcU | c | XaY2 | XaXc | XbW | b
Y1 → TXb
Y2 → VXc
T → XaY3 | XaXb
Y3 → TXb
U → XcU | c
V → XaY4 | XaXc | XbW | b
Y4 → VXc
W → XbW | b
(We obviously don't need both Y1 and Y3, and we don't need both Y2 and Y4, so we
could simplify G1 slightly.)
