CFG
For a rule A → α, A is the rule's head and α is its body.
Context-free grammars
Example: CFG example
Σ = {the, cat, in, hat}
V = {D, N, P, NP, PP}
The start symbol is NP
The rules:
D → the      NP → D N
N → cat      PP → P NP
N → hat      NP → NP PP
P → in
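As an illustration, the example grammar can be written down as plain data. This is only a sketch: the names TERMINALS, NONTERMINALS, START, RULES and is_well_formed are hypothetical and not part of the slides. Each rule is stored as a (head, body) pair.

```python
# Hypothetical encoding of the example grammar (names are illustrative).
TERMINALS = {"the", "cat", "in", "hat"}
NONTERMINALS = {"D", "N", "P", "NP", "PP"}
START = "NP"

# Each rule is a (head, body) pair; the body is a tuple of symbols.
RULES = [
    ("D", ("the",)),
    ("N", ("cat",)),
    ("N", ("hat",)),
    ("P", ("in",)),
    ("NP", ("D", "N")),
    ("PP", ("P", "NP")),
    ("NP", ("NP", "PP")),
]


def is_well_formed(rules, nonterminals, terminals, start):
    """Check that heads are non-terminals and bodies use only known symbols."""
    symbols = nonterminals | terminals
    return (
        start in nonterminals
        and nonterminals.isdisjoint(terminals)
        and all(head in nonterminals for head, _ in rules)
        and all(sym in symbols for _, body in rules for sym in body)
    )


print(is_well_formed(RULES, NONTERMINALS, TERMINALS, START))  # True
```

Tuples are used for rule bodies so that forms built from them can later be stored in sets.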
Context-free grammars: language
Each non-terminal symbol in a grammar denotes a language.
A rule such as N → cat implies that the language denoted by the
non-terminal N includes the alphabet symbol cat.
The symbol cat here is a single, atomic alphabet symbol, and not
a string of symbols: the alphabet of this example consists of
natural language words, not of natural language letters.
For a more complex rule such as NP → D N, the language
denoted by NP contains the concatenation of the language
denoted by D with that denoted by N: L(NP) ⊇ L(D) · L(N).
Matters become more complicated when we consider recursive rules
such as NP → NP PP.
Context-free grammars: derivation
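Given a grammar G = ⟨V, Σ, P, S⟩, we define the set of forms to
be (V ∪ Σ)*: the set of all sequences of terminal and
non-terminal symbols.
A form α derives a form β, written α ⇒ β, if and only if
α = γ_l A γ_r, β = γ_l γ_c γ_r, and A → γ_c is a rule in P.
A is called the selected symbol. The rule A → γ_c is said to be
applicable to α.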
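The one-step derivation relation can be sketched in a few lines. The function name derive_one_step and the tuple encoding of forms are assumptions made for illustration. For every position in the form, the symbol at that position is treated as the selected symbol, and every rule headed by it is applied while the left and right contexts are kept unchanged.

```python
# Example grammar as (head, body) pairs; this encoding is an assumption.
RULES = [
    ("D", ("the",)),
    ("N", ("cat",)),
    ("N", ("hat",)),
    ("P", ("in",)),
    ("NP", ("D", "N")),
    ("PP", ("P", "NP")),
    ("NP", ("NP", "PP")),
]


def derive_one_step(form, rules):
    """Return every form beta such that form => beta in one step."""
    results = []
    for i, selected in enumerate(form):
        # Split the form into left context, selected symbol, right context.
        gamma_l, gamma_r = form[:i], form[i + 1:]
        for head, body in rules:
            if head == selected:
                results.append(gamma_l + body + gamma_r)
    return results


print(derive_one_step(("NP",), RULES))
# [('D', 'N'), ('NP', 'PP')]
print(derive_one_step(("D", "N"), RULES))
# [('the', 'N'), ('D', 'cat'), ('D', 'hat')]
```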
Derivation
Example: Forms
The set of non-terminals of G is V = {D, N, P, NP, PP} and the
set of terminals is Σ = {the, cat, in, hat}.
The set of forms therefore contains all the (infinitely many)
sequences of elements from V and Σ, such as ε (the empty
sequence), NP, D cat P D hat, D N, the cat in the hat, etc.
Derivation
Example: Derivation
Let us start with a simple form, NP. Observe that it can be
written as γ_l NP γ_r, where both γ_l and γ_r are empty. Observe
also that NP is the head of some grammar rule: the rule NP → D N.
Therefore, the form is a good candidate for derivation: if we replace
the selected symbol NP with the body of the rule, while preserving
its environment, we get γ_l D N γ_r = D N. Therefore, NP ⇒ D N.
Derivation
Example: Derivation
We now apply the same process to D N. This time the selected
symbol is D (we could have selected N, of course). The left context
is again empty, while the right context is γ_r = N. As there exists
a grammar rule whose head is D, namely D → the, we can replace
the rule's head by its body, preserving the context, and obtain the
form the N. Hence D N ⇒ the N.
Derivation
Example: Derivation
Given the form the N, there is exactly one non-terminal that we
can select, namely N. However, there are two rules that are headed
by N: N → cat and N → hat. We can select either of these rules
to show that both the N ⇒ the cat and the N ⇒ the hat.
Since the form the cat consists of terminal symbols only, no
non-terminal can be selected and hence it derives no form.
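The choices discussed in these examples (which symbol to select, and which rule to apply to it) can be listed mechanically. The sketch below assumes the same (head, body) encoding of rules; the helper name applicable is hypothetical. A form made of terminals only yields an empty list, matching the observation that it derives no form.

```python
# Example grammar as (head, body) pairs; this encoding is an assumption.
RULES = [
    ("D", ("the",)),
    ("N", ("cat",)),
    ("N", ("hat",)),
    ("P", ("in",)),
    ("NP", ("D", "N")),
    ("PP", ("P", "NP")),
    ("NP", ("NP", "PP")),
]


def applicable(form, rules):
    """List (position, head, body) for every rule applicable to `form`."""
    return [
        (i, head, body)
        for i, symbol in enumerate(form)
        for head, body in rules
        if head == symbol
    ]


print(applicable(("the", "N"), RULES))
# [(1, 'N', ('cat',)), (1, 'N', ('hat',))]
print(applicable(("the", "cat"), RULES))
# []
```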
Extended derivation
α ⇒^k_G β if α derives β in k steps:
α ⇒_G α_1 ⇒_G α_2 ⇒_G ... ⇒_G α_k and α_k = β.
The reflexive-transitive closure of ⇒_G is ⇒*_G:
α ⇒*_G β if α ⇒^k_G β for some k ≥ 0.
A G-derivation is a sequence of forms α_1, ..., α_n, such that for
every i, 1 ≤ i < n, α_i ⇒_G α_{i+1}.
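These definitions translate directly into code. The following is a minimal sketch, assuming forms are tuples of symbols and rules are (head, body) pairs; the function names step, derives_in_k and is_derivation are illustrative. derives_in_k computes the set of forms reachable in exactly k steps, so k = 0 gives the reflexive case.

```python
# Example grammar as (head, body) pairs; this encoding is an assumption.
RULES = [
    ("D", ("the",)),
    ("N", ("cat",)),
    ("N", ("hat",)),
    ("P", ("in",)),
    ("NP", ("D", "N")),
    ("PP", ("P", "NP")),
    ("NP", ("NP", "PP")),
]


def step(form, rules):
    """All forms derivable from `form` in exactly one step."""
    return {
        form[:i] + body + form[i + 1:]
        for i, symbol in enumerate(form)
        for head, body in rules
        if head == symbol
    }


def derives_in_k(alpha, beta, k, rules):
    """True if beta is reachable from alpha in exactly k steps."""
    current = {alpha}
    for _ in range(k):
        current = {nxt for form in current for nxt in step(form, rules)}
    return beta in current


def is_derivation(forms, rules):
    """True if each form in the sequence derives the next in one step."""
    return all(b in step(a, rules) for a, b in zip(forms, forms[1:]))


np_, dn, the_n, the_cat = ("NP",), ("D", "N"), ("the", "N"), ("the", "cat")
print(is_derivation([np_, dn, the_n, the_cat], RULES))  # True
print(derives_in_k(np_, the_cat, 3, RULES))             # True
print(derives_in_k(np_, the_cat, 2, RULES))             # False
print(derives_in_k(np_, np_, 0, RULES))                 # True
```

The set of reachable forms grows quickly with k for a recursive grammar, so this brute-force check is only practical for small k.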
Extended derivation: example
Example: Derivation
(1) NP ⇒ D N
(2) D N ⇒ the N
(3) the N ⇒ the cat
Extended derivation: example
Example: Derivation
Therefore, we trivially have:
(4) NP ⇒* D N
(5) D N ⇒* the N
(6) the N ⇒* the cat
From (2) and (6) we get
(7) D N ⇒* the cat
and from (1) and (7) we get
(8) NP ⇒* the cat
Languages
A form α is a sentential form of a grammar G iff S ⇒*_G α, i.e., it
can be derived in G from the start symbol.
The (formal) language generated by a grammar G with respect to
a category name (non-terminal) A is L_A(G) = {w ∈ Σ* | A ⇒*_G w}.
The language generated by the grammar is L(G) = L_S(G).
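For a grammar with recursive rules L_A(G) is infinite, but its members up to a given length can be enumerated by searching through the forms derivable from A. The sketch below is one way to do this, not a definitive implementation. It assumes that no rule has an empty body, so forms never shrink and any form longer than the bound can be discarded. The function name language and the parameter max_len are hypothetical.

```python
from collections import deque

# Example grammar as (head, body) pairs; this encoding is an assumption.
NONTERMINALS = {"D", "N", "P", "NP", "PP"}
RULES = [
    ("D", ("the",)),
    ("N", ("cat",)),
    ("N", ("hat",)),
    ("P", ("in",)),
    ("NP", ("D", "N")),
    ("PP", ("P", "NP")),
    ("NP", ("NP", "PP")),
]


def language(start, rules, nonterminals, max_len):
    """Terminal strings of length <= max_len derivable from `start`.

    Assumes no rule has an empty body, so forms never shrink and any
    form longer than max_len can be pruned safely.
    """
    seen = {(start,)}
    queue = deque(seen)
    words = set()
    while queue:
        form = queue.popleft()
        if not any(sym in nonterminals for sym in form):
            words.add(" ".join(form))  # terminals only: a word of the language
            continue
        for i, sym in enumerate(form):
            for head, body in rules:
                if head == sym:
                    nxt = form[:i] + body + form[i + 1:]
                    if len(nxt) <= max_len and nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return words


print(sorted(language("N", RULES, NONTERMINALS, 1)))   # ['cat', 'hat']
print(sorted(language("NP", RULES, NONTERMINALS, 2)))  # ['the cat', 'the hat']
```

Every form placed on the queue is a sentential form with respect to the chosen start symbol; the words collected are those sentential forms that contain terminals only.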
A language that can be generated by some CFG is a context-free
language and the class of context-free languages is the set of
languages every member of which can be generated by some CFG.
If no CFG can generate a language L, L is said to be
trans-context-free.
Language of a grammar
Example: Language
For the example grammar (with NP the start symbol):
D → the      NP → D N
N → cat      PP → P NP
N → hat      NP → NP PP
P → in
it is fairly easy to see that L(D) = {the}.
Similarly, L(P) = {in} and L(N) = {cat, hat}.
Language of a grammar
Example: Language
It is more difficult to define the languages denoted by the
non-terminals NP and PP, although it should be straightforward that
the latter is obtained by concatenating {in} with the former.
Proposition: L(NP) is the denotation of the regular expression
the (cat + hat) (in the (cat + hat))*
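The proposition can be checked empirically up to a bounded length. This is a sanity check, not a proof. The sketch below assumes that a symbol is a non-terminal exactly when it heads some rule, computes the bounded languages of all non-terminals as a fixed point, and compares the result for NP with Python's re module, where the regular expression is written with | in place of +. The names bounded_languages and MAX_LEN are hypothetical.

```python
import re
from itertools import product

# Example grammar as (head, body) pairs; this encoding is an assumption.
RULES = [
    ("D", ("the",)),
    ("N", ("cat",)),
    ("N", ("hat",)),
    ("P", ("in",)),
    ("NP", ("D", "N")),
    ("PP", ("P", "NP")),
    ("NP", ("NP", "PP")),
]
ALPHABET = ["the", "cat", "in", "hat"]
MAX_LEN = 8


def bounded_languages(rules, max_len):
    """For each head, the terminal strings (as tuples) of length <= max_len."""
    heads = {head for head, _ in rules}
    lang = {head: set() for head in heads}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # A terminal contributes itself; a non-terminal contributes a
            # snapshot of the strings found for it so far.
            options = [tuple(lang[s]) if s in heads else ((s,),) for s in body]
            for parts in product(*options):
                word = tuple(sym for part in parts for sym in part)
                if len(word) <= max_len and word not in lang[head]:
                    lang[head].add(word)
                    changed = True
    return lang


pattern = re.compile(r"the (cat|hat)( in the (cat|hat))*")
generated = {" ".join(w) for w in bounded_languages(RULES, MAX_LEN)["NP"]}
matched = {
    " ".join(w)
    for n in range(1, MAX_LEN + 1)
    for w in product(ALPHABET, repeat=n)
    if pattern.fullmatch(" ".join(w))
}
print(generated == matched)  # True
```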
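It is convenient to divide grammar rules into two classes: one
that contains only phrasal rules of the form A → α where α ∈ V*, and
another that contains only terminal rules of the form B → σ where
σ ∈ Σ. It turns out that every CFG is equivalent to some CFG of
this form.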
Normal form
A grammar G is in phrasal/terminal normal form iff for every
production A → α of G, either α ∈ V* or α ∈ Σ. Productions of
the form A → σ are called terminal rules, and A is said to be a
pre-terminal category, the lexical entry of σ. Productions of the
form A → α, where α ∈ V*