
Theory of Computation: Lecture 7: Context-Free Grammar

This document summarizes key topics from a lecture on context-free grammars:
- A context-free grammar (CFG) is defined as a 4-tuple (V, Σ, R, S) comprising variables, terminals, substitution rules, and a start variable.
- Derivations in a CFG are sequences of substitutions starting with the start symbol and resulting in a string.
- The language of a CFG is the set of all strings that can be derived from the start variable. A language is context-free if it is generated by some CFG.
- It is always possible to construct a CFG that generates the same language as a given deterministic finite automaton (DFA).
Copyright
© Attribution Non-Commercial (BY-NC)

Theory of Computation

Lecture 7: Context-free Grammar


Max Alekseyev
University of South Carolina
February 2, 2012
Lecture Outline
Context-free Grammars
Derivation
Context-free Languages
Parse Trees and Ambiguity
Chomsky Normal Form
Languages with Recursive Structure
Consider the language $L = \{0^n 1^n \mid n \ge 0\}$. As we already know, this
language is not regular.
However, it can be generated by simple recursive rules:
- the empty string $\varepsilon$ is in L;
- for a string s from L, the string 0s1 is also in L.
That is, applying these rules and starting with $\varepsilon$, we will get
that the strings 01, 0011, 000111 and so on are also in L. Sooner
or later we will obtain every element of L.
A context-free grammar is a powerful tool to describe languages with
recursive structure.
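The two recursive rules can be tried out directly. The following is a minimal Python sketch, assuming the base rule is that the empty string belongs to L; the function name `strings_of_L` is a hypothetical helper, not part of the lecture.

```python
def strings_of_L(max_n):
    """Return the strings 0^n 1^n for n = 0..max_n, shortest first."""
    s = ""                 # base rule: the empty string is in L
    result = [s]
    for _ in range(max_n):
        s = "0" + s + "1"  # recursive rule: if s is in L, so is 0s1
        result.append(s)
    return result

print(strings_of_L(3))  # ['', '01', '0011', '000111']
```

Every string of L appears after finitely many applications of the second rule, which is the sense in which the rules generate the whole language.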
Example of a Context-free Grammar

The recursive rules
- the empty string $\varepsilon$ is in L;
- for a string s from L, the string 0s1 is also in L
can be formally written as substitution rules:
$S \to \varepsilon$
$S \to 0S1$
where on the left-hand side we have variables (usually denoted by
capital letters) such as S in this case; symbols other than variables
are called terminals. The top-most variable is called the start
variable.
The set of rules, the set of terminals, and the set of variables with
one marked as the start variable form a context-free grammar.
The set of rules corresponding to the same variable can be written
in a single line, e.g.:
$S \to 0S1 \mid \varepsilon$
Context-free Grammars
A context-free grammar (CFG) is a 4-tuple $(V, \Sigma, R, S)$ where
- $V$ is a finite set of variables;
- $\Sigma$ is a finite set of terminals (disjoint from $V$);
- $R$ is a finite set of substitution rules, each rule maps a variable
  into a string of variables and terminals;
- $S \in V$ is the start variable.
The term context-free reflects the fact that each variable can be
substituted independently of the context in which it appears.
Q: Can we say that $R : V \to (V \cup \Sigma)^*$, i.e., R is a function from
V to the set of strings over variables and terminals?
Q: What about $R : V \to \mathcal{P}((V \cup \Sigma)^*)$?
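The 4-tuple can be written down as a small data structure. This is a sketch under stated assumptions: the class `CFG`, its field names, and the helper `is_well_formed` are hypothetical, rules are stored as (variable, right-hand side) pairs, and a right-hand side is a tuple of symbols with the empty tuple standing for the empty string.

```python
from typing import NamedTuple

class CFG(NamedTuple):
    variables: frozenset  # V
    terminals: frozenset  # Sigma, must be disjoint from V
    rules: frozenset      # R, a set of (variable, right-hand side) pairs
    start: str            # S, must belong to V

def is_well_formed(g):
    """Check the conditions listed in the definition of a CFG."""
    if g.variables & g.terminals:        # V and Sigma must be disjoint
        return False
    if g.start not in g.variables:       # S must be a variable
        return False
    symbols = g.variables | g.terminals
    return all(head in g.variables and all(sym in symbols for sym in body)
               for head, body in g.rules)

# The grammar S -> 0S1 | (empty string) for the language {0^n 1^n | n >= 0}.
g = CFG(frozenset({"S"}), frozenset({"0", "1"}),
        frozenset({("S", ("0", "S", "1")), ("S", ())}), "S")
print(is_well_formed(g))  # True
```

Storing R as a set of pairs rather than as a dictionary is one possible design choice; it allows several rules for the same variable without any extra machinery.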
Derivation
Substitution rules are a convenient way to represent a context-free
grammar:
$A \to 0A1$
$A \to B$
$B \to \#$
Q: What are $(V, \Sigma, R, S)$ for this grammar?
A derivation of a string w in a given grammar G is a sequence of
substitutions starting with the start symbol and resulting in w, e.g.:
$A \Rightarrow 0A1 \Rightarrow 00A11 \Rightarrow 000A111 \Rightarrow 000B111 \Rightarrow 000\#111$
is a derivation of the string 000#111 in the grammar defined
above. The symbol $\Rightarrow$ reads "yields".
We say u derives v and write $u \Rightarrow^* v$ if there is a derivation (for
some $k \ge 0$):
$u \Rightarrow u_1 \Rightarrow u_2 \Rightarrow \dots \Rightarrow u_k \Rightarrow v$.
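A derivation can be checked mechanically, one "yields" step at a time. The sketch below assumes every symbol is a single character, so sentential forms can be plain Python strings; the names `RULES`, `yields`, and `is_derivation` are hypothetical.

```python
# Rules of the grammar A -> 0A1, A -> B, B -> #, as (variable, right-hand side) pairs.
RULES = [("A", "0A1"), ("A", "B"), ("B", "#")]

def yields(u, v, rules=RULES):
    """True if v results from u by rewriting one occurrence of a variable with one rule."""
    for head, body in rules:
        for i, sym in enumerate(u):
            if sym == head and u[:i] + body + u[i + 1:] == v:
                return True
    return False

def is_derivation(steps, start="A", rules=RULES):
    """True if steps begins with the start variable and each form yields the next."""
    return steps[0] == start and all(
        yields(u, v, rules) for u, v in zip(steps, steps[1:]))

steps = ["A", "0A1", "00A11", "000A111", "000B111", "000#111"]
print(is_derivation(steps))  # True
```

Skipping an intermediate form, for example going from A straight to 00A11, makes the check fail because no single rule application produces that step.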
Context-free Languages
The set of all strings that can be derived (generated) in a given
grammar $G = (V, \Sigma, R, S)$ is called the language of the grammar
G and denoted L(G). In other words,
$L(G) = \{ w \in \Sigma^* \mid S \Rightarrow^* w \}$.
A language is called a context-free language (CFL) if it is
generated by some context-free grammar.
We already considered two examples of context-free languages:
$L(G_1) = \{0^n 1^n \mid n \ge 0\}$ and $L(G_2) = \{0^n \# 1^n \mid n \ge 0\}$.
Given two CFLs, it is easy to construct a CFG for their union, e.g.,
combining CFGs for $L(G_1)$ and $L(G_2)$:
$S \to S_1 \mid S_2$
$S_1 \to 0 S_1 1 \mid \varepsilon$
$S_2 \to 0 S_2 1 \mid \#$
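The union construction can be sketched in Python as follows. Everything here is illustrative: rules are a dictionary from a variable to a list of right-hand sides (tuples of symbols, the empty tuple meaning the empty string), and `union_grammar` and `strings_up_to` are hypothetical helpers. The enumeration assumes every sentential form contains at most one variable, which holds for the grammars used here but not for context-free grammars in general.

```python
def union_grammar(rules1, start1, rules2, start2, new_start="S"):
    """Add new_start -> start1 | start2 on top of two grammars with disjoint variables."""
    assert not set(rules1) & set(rules2), "variable sets must be disjoint"
    assert new_start not in rules1 and new_start not in rules2
    rules = {new_start: [(start1,), (start2,)]}
    rules.update(rules1)
    rules.update(rules2)
    return rules, new_start

def strings_up_to(rules, start, max_len):
    """All generated strings with at most max_len symbols (bounded exhaustive search)."""
    found, seen, stack = set(), set(), [(start,)]
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:                      # no variables left: a generated string
            found.add("".join(form))
            continue
        for body in rules[form[idx]]:        # rewrite the leftmost variable
            new = form[:idx] + body + form[idx + 1:]
            if sum(1 for s in new if s not in rules) <= max_len:
                stack.append(new)
    return found

G1 = {"S1": [("0", "S1", "1"), ()]}          # generates 0^n 1^n
G2 = {"S2": [("0", "S2", "1"), ("#",)]}      # generates 0^n # 1^n
rules, start = union_grammar(G1, "S1", G2, "S2")
print(sorted(strings_up_to(rules, start, 4), key=lambda s: (len(s), s)))
# ['', '#', '01', '0#1', '0011']
```

The requirement that the two variable sets be disjoint matters: if both grammars used the same variable name, the rules of one grammar could be applied inside derivations of the other.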
From DFA to CFG
CFGs have more power than DFAs. It is not surprising that for any
DFA recognizing a language L, we can construct a CFG
generating the same language L.
Q: What would such a construction prove?
For a given DFA $(Q, \Sigma, \delta, q_0, F)$, an equivalent CFG can be
constructed as follows:
- Introduce the variable $R_i$ for every state $q_i \in Q$;
- Make $R_0$ the start variable of the CFG;
- Add the rule $R_i \to a R_j$ for every transition $q_i \xrightarrow{a} q_j$ defined by
  $\delta$ (i.e., $\delta(q_i, a) = q_j$);
- Add the rule $R_i \to \varepsilon$ if $q_i$ is an accepting state.
Q: Verify that the constructed CFG generates the same language
as the one recognized by the given DFA.
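The construction steps above can be sketched as follows. The example DFA (strings over {0, 1} with an even number of 1s) and all function names are assumptions made for illustration. The membership test `cfg_generates` relies on the special shape of the constructed rules (one terminal followed by one variable, or the empty string), so it is not a general CFG parser, and the final comparison is an empirical check on short strings rather than a proof.

```python
from itertools import product

def dfa_to_cfg(states, alphabet, delta, start, accepting):
    """Build the rules R_i -> a R_j and R_i -> (empty string) from a DFA."""
    var = {q: f"R_{q}" for q in states}
    rules = {var[q]: [] for q in states}
    for q in states:
        for a in alphabet:
            rules[var[q]].append((a, var[delta[(q, a)]]))  # R_i -> a R_j
        if q in accepting:
            rules[var[q]].append(())                       # R_i -> empty string
    return rules, var[start]

def dfa_accepts(delta, start, accepting, w):
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in accepting

def cfg_generates(rules, start, w):
    """Follow derivations symbol by symbol; valid only for rules of the shape built above."""
    current = {start}
    for a in w:
        current = {body[1] for v in current for body in rules[v]
                   if body and body[0] == a}
    return any(() in rules[v] for v in current)

# Hypothetical DFA: state 0 = even number of 1s seen so far (accepting), state 1 = odd.
states, alphabet = [0, 1], ["0", "1"]
delta = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
rules, start = dfa_to_cfg(states, alphabet, delta, 0, {0})
words = ["".join(p) for n in range(5) for p in product(alphabet, repeat=n)]
print(all(cfg_generates(rules, start, w) == dfa_accepts(delta, 0, {0}, w)
          for w in words))  # True
```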
Parse Trees
A parse tree is a convenient way to represent a derivation of a
particular string.
For example, in the grammar
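$A \to 0A1$
$A \to B$
$B \to \#$
the derivation of the string 000#111 is given by the tree (written here in
bracket notation, each node followed by its children):
A( 0 A( 0 A( 0 A( B( # ) ) 1 ) 1 ) 1 )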
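A parse tree for 000#111 can also be held as a nested data structure. In this sketch a node is a pair (variable, list of children) and a leaf is a one-character terminal; this representation and the helper names are assumptions chosen for illustration.

```python
RULES = {("A", "0A1"), ("A", "B"), ("B", "#")}

# Three applications of A -> 0A1, then A -> B, then B -> #.
tree = ("A", ["0",
              ("A", ["0",
                     ("A", ["0",
                            ("A", [("B", ["#"])]),
                            "1"]),
                     "1"]),
              "1"])

def tree_yield(node):
    """Concatenate the leaves from left to right."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(tree_yield(child) for child in children)

def uses_only_rules(node, rules=RULES):
    """Check that every internal node together with its children matches some rule."""
    if isinstance(node, str):
        return True
    head, children = node
    body = "".join(c if isinstance(c, str) else c[0] for c in children)
    return (head, body) in rules and all(uses_only_rules(c, rules) for c in children)

print(tree_yield(tree), uses_only_rules(tree))  # 000#111 True
```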
Parse Trees and Ambiguity
The same parse tree may correspond to multiple different derivations.
However, each parse tree uniquely defines a leftmost derivation
where at every step the leftmost remaining variable is being
replaced.
A string w is derived ambiguously in a context-free grammar G if
it has two or more different leftmost derivations (thus, parse trees).
For example, the string $a + a \times a$ has two different parse trees in the grammar
$EXPR \to EXPR + EXPR \mid EXPR \times EXPR \mid (EXPR) \mid a$
one grouping the string as $a + (a \times a)$ and the other as $(a + a) \times a$.
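Ambiguity in the EXPR grammar can be observed by counting parse trees. The sketch below is tailored to this one grammar, with the multiplication sign written as × and one character per token; `count_parse_trees` is a hypothetical helper that counts, for each substring, the ways it can be derived from EXPR.

```python
from functools import lru_cache

def count_parse_trees(w):
    """Number of parse trees of w in EXPR -> EXPR + EXPR | EXPR × EXPR | (EXPR) | a."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving the substring w[i:j] from EXPR.
        total = 0
        if j - i == 1 and w[i] == "a":                      # EXPR -> a
            total += 1
        if j - i >= 3 and w[i] == "(" and w[j - 1] == ")":  # EXPR -> (EXPR)
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                       # EXPR -> EXPR op EXPR
            if w[k] in ("+", "×"):
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(w))

print(count_parse_trees("a+a×a"))    # 2
print(count_parse_trees("(a+a)×a"))  # 1
```

A count of 2 or more means the string is derived ambiguously; adding parentheses removes the second grouping.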
Inherently Ambiguous Grammars
A grammar G is ambiguous if it generates some string ambiguously.
Sometimes for an ambiguous grammar, there exists an
unambiguous grammar generating the same language. But some
languages can be generated only by ambiguous grammars. Such
languages are called inherently ambiguous.
In particular, the language $\{a^i b^j c^k \mid i = j \text{ or } j = k\}$ is inherently
ambiguous.
Chomsky Normal Form
A CFG in Chomsky normal form may have rules only of the
following forms:
$S \to \varepsilon$
$A \to BC$
$A \to a$
where S is the start variable, A, B, and C are any variables, except
that $B \ne S$ and $C \ne S$; and a is any terminal.
Theorem
Every CFL is generated by some CFG in Chomsky normal form.
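The three permitted rule forms are easy to test for. This is a sketch under assumptions: rules are a dictionary from a variable to a list of right-hand sides (tuples of symbols), every variable appears as a key, and the two example grammars, both meant to generate $\{0^n 1^n \mid n \ge 0\}$, are illustrative choices rather than grammars from the lecture.

```python
def is_chomsky_normal_form(rules, start):
    """Check that every rule is S -> empty string, A -> BC, or A -> a."""
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                ok = head == start                  # only the start variable may vanish
            elif len(body) == 1:
                ok = body[0] not in variables       # A -> a with a terminal a
            elif len(body) == 2:
                ok = all(s in variables and s != start for s in body)  # A -> BC
            else:
                ok = False
            if not ok:
                return False
    return True

# A grammar in Chomsky normal form (Z derives 0, O derives 1, T derives S followed by 1).
cnf = {
    "S0": [(), ("Z", "T"), ("Z", "O")],
    "S":  [("Z", "T"), ("Z", "O")],
    "T":  [("S", "O")],
    "Z":  [("0",)],
    "O":  [("1",)],
}
plain = {"S": [("0", "S", "1"), ()]}  # S -> 0S1 | empty string, not in normal form

print(is_chomsky_normal_form(cnf, "S0"), is_chomsky_normal_form(plain, "S"))  # True False
```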
Converting CFG into Chomsky Normal Form
A CFG can be converted into Chomsky normal form as follows:
- Add a new start variable $S_0$ and the rule $S_0 \to S$ where S is
  the old start variable.
- Remove each rule $A \to \varepsilon$, expanding the other rules with A in the
  r.h.s.: each rule $R \to uAv$ gets a complement rule $R \to uv$;
  rule $R \to uAvAw$ gets three complements: $R \to uvAw$,
  $R \to uAvw$, and $R \to uvw$; etc. Rule $R \to A$ is complemented
  by $R \to \varepsilon$ unless this rule was previously removed.
- Remove each unit rule $A \to B$ and complement each rule
  $B \to u$ with a rule $A \to u$ unless it was a unit rule removed
  before.
- Replace each rule $A \to u_1 u_2 \dots u_k$, where $k \ge 3$ and each
  $u_i \in V \cup \Sigma$, with the rules $A \to u_1 A_1$, $A_1 \to u_2 A_2$, ...,
  $A_{k-2} \to u_{k-1} u_k$ where $A_i$ are new variables. If k = 2, we
  replace each terminal $u_i$ with the new variable $U_i$ and add a
  new rule $U_i \to u_i$.
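The last step of the conversion (splitting long right-hand sides and replacing terminals) can be sketched as follows; the earlier steps are not implemented here. The function name `binarize`, the naming scheme `A_1, A_2, ...` and `U_0, U_1, ...` for new variables, and the input grammar are assumptions, and the code assumes these new names do not clash with existing variables. Terminals are replaced before splitting, which is a different order from the description above but produces the same rules.

```python
def binarize(rules):
    """Last conversion step only; rules maps a variable to a list of tuples of symbols."""
    variables = set(rules)
    new_rules = {head: [] for head in rules}
    terminal_var = {}
    counter = 0

    def var_for(t):
        # Introduce U_t -> t once per terminal t.
        if t not in terminal_var:
            terminal_var[t] = f"U_{t}"
            new_rules[terminal_var[t]] = [(t,)]
        return terminal_var[t]

    for head, bodies in rules.items():
        for body in bodies:
            if len(body) < 2:                # A -> a and the empty rule stay as they are
                new_rules[head].append(body)
                continue
            body = tuple(s if s in variables else var_for(s) for s in body)
            current = head
            while len(body) > 2:             # A -> u1 A1, A1 -> u2 A2, ...
                counter += 1
                fresh = f"A_{counter}"
                new_rules.setdefault(current, []).append((body[0], fresh))
                current, body = fresh, body[1:]
            new_rules.setdefault(current, []).append(body)
    return new_rules

# S -> 0S1 | empty string after the first three steps (assumed input for this sketch).
g = {"S_0": [("0", "S", "1"), ("0", "1"), ()],
     "S":   [("0", "S", "1"), ("0", "1")]}
for head, bodies in binarize(g).items():
    print(head, "->", bodies)
```

For this input the rule $S_0 \to 0S1$ becomes $S_0 \to U_0 A_1$ and $A_1 \to S U_1$, with $U_0 \to 0$ and $U_1 \to 1$ added once and shared by all rules.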
