Module-4 Normal Forms

This document outlines Module 4 of the Theory of Computation course, focusing on Normal Forms for Context-Free Grammars. It covers topics such as eliminating useless symbols, ε-productions, unit productions, and the Chomsky Normal Form, along with the Pumping Lemma for Context-Free Languages. The content is structured to provide algorithms and definitions necessary for understanding context-free languages and their properties.

S. J. P. N. TRUST'S
HIRASUGAR INSTITUTE OF TECHNOLOGY, NIDASOSHI
Accredited at 'A+' Grade by NAAC
Programmes Accredited by NBA: CSE, ECE

Department of Computer Science & Engineering


Course: Theory of Computation (BCS503)
Module 4: Normal Forms for Context-Free Grammars

Prof. A. A. Daptardar
Asst. Prof., Dept. of Computer Science & Engg.,
Hirasugar Institute of Technology, Nidasoshi
Module-4
Content to be covered:
• Normal Forms for Context-Free
Grammars, The Pumping Lemma for
Context-Free Languages, Closure
Properties of Context-Free Languages.
• TEXT BOOK: Sections 7.1, 7.2, 7.3

NORMAL FORMS FOR
CONTEXT FREE GRAMMARS
Eliminating Useless Symbols
Computing the Generating and Reachable Symbols
Eliminating ε-Productions
Eliminating Unit Productions
Chomsky Normal Form

Introduction

• The goal of this section is to show that every CFL (without ε) is generated by a CFG in which all productions are of the form A → BC or A → a, where A, B, and C are variables, and a is a terminal. This form is called Chomsky Normal Form.
• To get there, we need to make a number
of preliminary simplifications, which are
themselves useful in various ways:
• 1. We must eliminate useless symbols, those variables or terminals that do not appear in any derivation of a terminal string from the start symbol.
• 2. We must eliminate ε-productions, those of the form A → ε for some variable A.
• 3. We must eliminate unit productions, those of the form A → B for variables A and B.

7.1.1 Eliminating Useless Symbols

Algorithm
• Stage 1: Obtain the set of variables and productions that derive only strings of terminals.
• The algorithm to obtain this set of variables:
• Step 1: [Initialize old_variables, denoted by ov, to ∅]
ov = ∅
• Step 2: Take all productions of the form A → x where x ∈ T+, i.e. productions whose RHS consists only of terminals. The non-terminals on the LHS of those productions are added to new_variables, denoted by nv.
nv = {A | A → x and x ∈ T+}
• Step 3: Compare ov and nv. As long as ov and nv are not equal, repeat the following statements. Otherwise go to step 4.
• a. [Copy new variables to old variables]
ov = nv
• b. Add all the elements in ov to nv. Also add the variables that derive a string consisting only of terminals and non-terminals that are in ov, i.e.
nv = ov ∪ {A | A → y and y ∈ (ov ∪ T)*}
• Step 4: After completion of step 3, nv (or ov) contains all those non-terminals from which strings of terminals can be derived. Assign those variables to V1, i.e.
V1 = ov
• Step 5: [Terminate the algorithm]
return V1
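The Stage 1 loop can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the representation of productions as (head, body) pairs and the function name `generating_variables` are assumptions made for the example. The first pass, with ov still empty, plays the role of Step 2, because only bodies made entirely of terminals qualify at that point.

```python
def generating_variables(productions, terminals):
    """Return the variables that derive some string of terminals.

    productions: list of (head, body) pairs; body is a tuple of symbols.
    terminals:   set of terminal symbols.
    (Both the representation and the name are assumptions for this sketch.)
    """
    ov = None      # old variables; None forces at least one pass
    nv = set()     # new variables
    while ov != nv:
        ov = set(nv)
        for head, body in productions:
            # A qualifies when every body symbol is a terminal or is in ov.
            if all(sym in terminals or sym in ov for sym in body):
                nv.add(head)
    return nv


# Illustrative grammar: S -> AB | a, A -> b (B has no productions).
P = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
V1 = generating_variables(P, {"a", "b"})
print(sorted(V1))  # ['A', 'S']
```

B is left out of V1 because it has no production at all, so S → AB can never be used to reach a terminal string.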

• Stage 2: Obtain the set of variables and terminals that are reachable from the start symbol. Productions that are never used are useless. This can be obtained as shown below:
• Given a CFG G = (V, T, P, S), we can find an equivalent grammar G1 = (V1, T1, P1, S) such that for each X in (V1 ∪ T1) there exists some α such that
S ⇒* α
• where X is a symbol in α, i.e. X ∈ V1 if X is a variable and X ∈ T1 if X is a terminal. Each symbol X in (V1 ∪ T1) is thus reachable from the start symbol S.

• The algorithm for this is shown below:
V1 = {S}, T1 = ∅
Repeat until no new symbol is added to V1 or T1
For each A in V1
If A → α then
Add the variables in α to V1
Add the terminals in α to T1
End if
End for
End repeat
• Using this algorithm, all those symbols (whether variables or terminals) that are not reachable from the start symbol are eliminated.
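A minimal Python sketch of the Stage 2 computation is given below. It uses a worklist instead of repeated full passes, which is a substitution made for brevity and gives the same sets. The (head, body) representation, the function name `reachable_symbols`, and the extra production C → c in the sample grammar are all assumptions for illustration.

```python
def reachable_symbols(productions, variables, start):
    """Return (V1, T1): the variables and terminals reachable from start.

    productions: list of (head, body) pairs; body is a tuple of symbols.
    variables:   set of all variables of the grammar (start included).
    """
    reached = {start}
    worklist = [start]              # reached variables not yet expanded
    while worklist:
        a = worklist.pop()
        for head, body in productions:
            if head != a:
                continue
            for sym in body:
                if sym not in reached:
                    reached.add(sym)
                    if sym in variables:
                        worklist.append(sym)
    return reached & variables, reached - variables


# Illustrative grammar: S -> AB | a, A -> b, C -> c (C is never reached).
P = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",)), ("C", ("c",))]
V = {"S", "A", "B", "C"}
v1, t1 = reachable_symbols(P, V, "S")
print(sorted(v1), sorted(t1))  # ['A', 'B', 'S'] ['a', 'b']
```

Each variable enters the worklist at most once, so the loop terminates even when the grammar is recursive.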
ELIMINATING
ε-PRODUCTIONS

Introduction

• A production of the form A → ε is undesirable in a CFG unless the empty string is derived from the start symbol. Suppose the language generated by a grammar G does not contain the empty string, yet the grammar contains ε-productions. Such ε-productions can be removed.

Definition
• Let G = (V, T, P, S) be a CFG. A production in P of the form
A → ε
is called an ε-production or NULL production.
• After applying the production, the variable A is erased. For each A in V, if there is a derivation of the form
A ⇒* ε
• then A is a nullable variable.
• A nullable variable is defined as follows:
– 1. If A → ε is a production in P, then A is a nullable variable.
– 2. If A → B1B2…Bn is a production in P, and B1, B2, …, Bn are all nullable variables, then A is also a nullable variable.
– 3. Only the variables identified by rules 1 and 2 are nullable variables.
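The three rules above amount to a fixed-point computation, which might be sketched in Python as below. The (head, body) representation, the use of an empty tuple for an ε body, the function name `nullable_variables`, and the sample grammar are assumptions for illustration.

```python
def nullable_variables(productions):
    """Return the set of nullable variables.

    productions: list of (head, body) pairs; body is a tuple of symbols,
    and an empty tuple stands for an epsilon body (assumed convention).
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # Rule 1: an empty body makes all(...) true immediately.
            # Rule 2: every body symbol is already known to be nullable.
            if head not in nullable and all(sym in nullable for sym in body):
                nullable.add(head)
                changed = True
    return nullable


# Illustrative grammar: S -> AB, A -> aAA | epsilon, B -> bBB | epsilon.
P = [("S", ("A", "B")),
     ("A", ("a", "A", "A")), ("A", ()),
     ("B", ("b", "B", "B")), ("B", ())]
print(sorted(nullable_variables(P)))  # ['A', 'B', 'S']
```

Terminals never enter the set, so any body containing a terminal can never satisfy rule 2.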

ELIMINATING
UNIT PRODUCTIONS

Unit productions
CNF-
CHOMSKY NORMAL FORM

CNF Definition:
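The introduction to this module gives the CNF shape as A → BC or A → a, with A, B, C variables and a a terminal. A small Python check of that shape is sketched below. The (head, body) representation and the function name `is_cnf` are assumptions for illustration, and the sketch tests only the production shapes, not whether the grammar is free of useless symbols.

```python
def is_cnf(productions, variables, terminals):
    """Return True if every production is A -> BC or A -> a.

    productions: list of (head, body) pairs; body is a tuple of symbols.
    """
    for head, body in productions:
        if head not in variables:
            return False
        two_variables = len(body) == 2 and all(s in variables for s in body)
        one_terminal = len(body) == 1 and body[0] in terminals
        if not (two_variables or one_terminal):
            return False
    return True


V, T = {"S", "A", "B"}, {"a", "b"}
print(is_cnf([("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))], V, T))  # True
print(is_cnf([("S", ("A", "b"))], V, T))                                # False
```

Unit productions (a body of one variable) and ε-productions (an empty body) both fail the check, which is why they are eliminated before conversion.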
Computing the Generating
and Reachable Symbols

Computing the Generating
Symbols

• By the basis, a and b are generating. For the induction, we can use the production A → b to conclude that A is generating, and we can use the production S → a to conclude that S is generating.
• At that point, the induction is finished.
• We cannot use the production S → AB, because B has not been established to be generating.
• Thus, the set of generating symbols is {a, b, A, S}.
Computing Reachable Symbols
• Let us consider the inductive algorithm
whereby we find the set of reachable
symbols for the grammar G = (V, T, P, S).
• BASIS: S is surely reachable.
• INDUCTION: Suppose we have
discovered that some variable A is
reachable. Then for all productions with A
in the head, all the symbols of the bodies
of those productions are also reachable.
• By the basis, S is reachable. Since S has production bodies AB and a, we conclude that A, B, and a are reachable. B has no productions, but A has A → b. We therefore conclude that b is reachable. Now, no more symbols can be added to the reachable set, which is {S, A, B, a, b}.
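Putting the two computations together on the same grammar (S → AB | a, A → b) gives the sketch below: first drop everything that is not generating, then drop what is no longer reachable. The (head, body) representation and the function name `eliminate_useless` are assumptions for illustration.

```python
def eliminate_useless(productions, terminals, start):
    """Return the productions left after removing useless symbols.

    productions: list of (head, body) pairs; body is a tuple of symbols.
    """
    # Pass 1: find generating symbols (terminals are generating by the basis).
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in generating and all(s in generating for s in body):
                generating.add(head)
                changed = True
    kept = [(h, b) for h, b in productions
            if h in generating and all(s in generating for s in b)]

    # Pass 2: find symbols reachable from start in the remaining grammar.
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for head, body in kept:
            if head in reachable:
                for s in body:
                    if s not in reachable:
                        reachable.add(s)
                        changed = True
    return [(h, b) for h, b in kept if h in reachable]


P = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(eliminate_useless(P, {"a", "b"}, "S"))  # [('S', ('a',))]
```

The order matters here. Removing S → AB first is what makes A unreachable; running the reachability pass first would keep A → b in the result.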
THE PUMPING LEMMA FOR
CONTEXT FREE LANGUAGES
The size of the parse tree
Statement of the Pumping Lemma
Applications of the Pumping Lemma

Introduction
• Now, we shall develop a tool for showing that certain languages are not context-free.
• The theorem, called the "pumping lemma for context-free languages," says that in any sufficiently long string in a CFL, it is possible to find at most two short, nearby substrings that we can "pump" in tandem.
• That is, we may repeat both of the strings i times, for any integer i ≥ 0, and the resulting string will still be in the language.
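To make "pumping in tandem" concrete, the sketch below brute-forces every split z = uvwxy of one string and tests whether u vⁱ w xⁱ y stays in the language. It assumes the standard textbook conditions |vwx| ≤ n and vx ≠ ε, which are not spelled out in the text above, and it tries only i = 0..3, so a True result is evidence rather than proof. The function names, the sample language {aⁿbⁿcⁿ}, and the choice n = 4 are assumptions for illustration.

```python
def in_anbncn(s):
    """Membership test for {a^k b^k c^k | k >= 1}."""
    k = len(s) // 3
    return k >= 1 and s == "a" * k + "b" * k + "c" * k


def can_be_pumped(z, n, member, max_i=3):
    """True if some split z = uvwxy with |vwx| <= n and vx nonempty keeps
    u v^i w x^i y in the language for every i in 0..max_i."""
    length = len(z)
    for start in range(length):                          # vwx = z[start:end]
        for end in range(start + 1, min(start + n, length) + 1):
            for v_end in range(start, end + 1):          # v = z[start:v_end]
                for x_start in range(v_end, end + 1):    # x = z[x_start:end]
                    u, v = z[:start], z[start:v_end]
                    w, x, y = z[v_end:x_start], z[x_start:end], z[end:]
                    if not v and not x:
                        continue                         # vx must be nonempty
                    if all(member(u + v * i + w + x * i + y)
                           for i in range(max_i + 1)):
                        return True
    return False


n = 4
z = "a" * n + "b" * n + "c" * n
print(can_be_pumped(z, n, in_anbncn))  # False
```

No split survives because a window of length at most n cannot touch both the a's and the c's, so pumping always unbalances the three counts. Under the assumed conditions, this is the usual argument that {aⁿbⁿcⁿ} is not context-free.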
The Size of the Parse Trees
• Our first step in deriving a pumping lemma for CFLs is to examine the shape and size of parse trees. One of the uses of CNF is to turn parse trees into binary trees. These trees have some convenient properties, one of which we exploit here.

The Statement of the Pumping
Lemma

Applications of Pumping Lemma
