Chapter 4 - Context Free Languages
Beginning with the root, which is labeled with the start symbol, and ending in leaves that are terminals, a derivation tree shows how each variable is replaced in the derivation.

Let G = (N, T, S, P) be a context-free grammar. An ordered tree is a derivation tree for G if and only if it has the following properties.
1. The root is labeled S.
2. Every leaf has a label from T ∪ {ε}.
3. Every interior vertex (a vertex which is not a leaf) has a label from N.
4. If a vertex has label A ∈ N, and its children are labeled (from left to right) a1, a2, ..., an, then P must contain a production of the form A → a1a2…an.
5. A leaf labeled ε has no siblings, that is, a vertex with a child labeled ε can have no other children.

A tree that has properties 3, 4 and 5, but in which property 1 does not necessarily hold and property 2 is replaced by
2(a). Every leaf has a label from N ∪ T ∪ {ε}
is said to be a partial derivation tree. The string of symbols obtained by reading the leaves of the tree from left to right, omitting any ε's encountered, is said to be the yield of the tree.

For example, consider the grammar G, with productions
S → aAB
A → bBb
B → A | ε
The tree in the figure below is a partial derivation tree for G.

[Figure: partial derivation tree, written as label(children): S(a, A(b, B, b), B)]

The tree shown in the next figure, on the other hand, is a derivation tree.

[Figure: derivation tree S(a, A(b, B(ε), b), B(A(b, B(ε), b)))]

The string abBbB, which is the yield of the first tree (the partial derivation tree), is a sentential form of G. The yield of the second tree, abbbb, is a sentence of L(G).

4.2 Parsing and Ambiguity

We have so far concentrated on the generative aspects of grammars: given a grammar G, we studied the set of strings that can be derived using G. In practical applications we are also concerned with the analytical side of the grammar: given a string w of terminals, we want to know whether or not w is in L(G). If so, we may want to find a derivation of w. An algorithm that can tell us whether w is in L(G) is a membership algorithm. The term "parsing" describes finding a sequence of productions by which a w ∈ L(G) is derived.
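The definitions of derivation tree, partial derivation tree and yield given in this section translate directly into a small checker. The Python sketch below is illustrative only: the Node class and the functions check, is_derivation_tree, is_partial_derivation_tree and tree_yield are hypothetical names, and the grammar is hard-coded as the example grammar S → aAB, A → bBb, B → A | ε. It builds the two example trees and recovers their yields.

```python
from __future__ import annotations
from dataclasses import dataclass, field

EPS = "ε"

# Example grammar from the text (hard-coded assumption of this sketch):
#   S -> aAB,  A -> bBb,  B -> A | ε
NONTERMINALS = {"S", "A", "B"}
TERMINALS = {"a", "b"}
PRODUCTIONS = {
    ("S", ("a", "A", "B")),
    ("A", ("b", "B", "b")),
    ("B", ("A",)),
    ("B", (EPS,)),
}
START = "S"


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def check(node: Node, partial: bool) -> bool:
    """Check property 2 (or 2(a) if partial), 3, 4 and 5 for the subtree at node."""
    if not node.children:
        # Property 2: leaves are terminals or ε; 2(a) also allows variables.
        allowed = TERMINALS | {EPS} | (NONTERMINALS if partial else set())
        return node.label in allowed
    if node.label not in NONTERMINALS:           # property 3
        return False
    labels = tuple(child.label for child in node.children)
    if EPS in labels and len(labels) > 1:        # property 5
        return False
    if (node.label, labels) not in PRODUCTIONS:  # property 4
        return False
    return all(check(child, partial) for child in node.children)


def is_derivation_tree(root: Node) -> bool:
    # Property 1 (root labeled S) plus properties 2-5.
    return root.label == START and check(root, partial=False)


def is_partial_derivation_tree(root: Node) -> bool:
    # Property 1 is not required; property 2 is weakened to 2(a).
    return check(root, partial=True)


def tree_yield(node: Node) -> str:
    """Read the leaves from left to right, omitting ε."""
    if not node.children:
        return "" if node.label == EPS else node.label
    return "".join(tree_yield(child) for child in node.children)


def leaf(label: str) -> Node:
    return Node(label)


# The two example trees: S(a, A(b, B, b), B) and the complete tree for abbbb.
partial_tree = Node("S", [leaf("a"), Node("A", [leaf("b"), leaf("B"), leaf("b")]), leaf("B")])
full_tree = Node("S", [
    leaf("a"),
    Node("A", [leaf("b"), Node("B", [leaf(EPS)]), leaf("b")]),
    Node("B", [Node("A", [leaf("b"), Node("B", [leaf(EPS)]), leaf("b")])]),
])

print(tree_yield(partial_tree), is_derivation_tree(partial_tree), is_partial_derivation_tree(partial_tree))
# abBbB False True
print(tree_yield(full_tree), is_derivation_tree(full_tree))
# abbbb True
```

The first tree passes only the partial-tree check, because two of its leaves are labeled with the variable B, while the second satisfies all five properties, so its yield abbbb is a sentence of L(G).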
Ambiguity in Grammars and Languages

A context-free grammar G is said to be ambiguous if there exists some w ∈ L(G) that has at least two distinct derivation trees. Equivalently, G is ambiguous if some w ∈ L(G) has two or more distinct leftmost derivations (or, equivalently, two or more distinct rightmost derivations). For example, consider the grammar
S → SS
S → aSb
S → ε
The sentence aabb has two derivation trees, as shown in the figures below:

[Figure: first derivation tree of aabb, written as label(children): S(a, S(a, S(ε), b), b)]
[Figure: second derivation tree of aabb: S(S(ε), S(a, S(a, S(ε), b), b))]

Hence the above grammar is ambiguous.

Ambiguity is a common feature of natural languages, where it is tolerated and dealt with in a variety of ways. In programming languages, where there should be only one interpretation of each statement, ambiguity must be removed when possible. Often we can achieve this by rewriting the grammar in an equivalent, unambiguous form. Let us demonstrate this with an example.

Consider the grammar G = (N, T, E, P) with
N = {E, I}
T = {a, b, c, +, *, (, )}
and productions
E → E+E
E → E*E
E → (E)
E → I
I → a | b | c

[Figure: a derivation tree for a+b*c: E(E(I(a)), +, E(E(I(b)), *, E(I(c))))]
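Ambiguity can be explored for a particular sentence by counting its derivation trees. The Python sketch below uses a hypothetical helper, count_trees, a memoized dynamic program over substrings in the style of CYK parsing. It assumes the grammar has no ε-productions and no cycles of unit productions (A ⇒+ A), so that every symbol covers at least one input character and the recursion terminates. It compares the expression grammar above with the equivalent grammar E → E+T | T, T → T*F | F, F → (E) | I, I → a | b | c, which is used again in Section 4.3. Because a count greater than zero means w ∈ L(G), the same function also serves as a membership algorithm for grammars meeting these assumptions. The grammar S → SS | aSb | ε does not meet them: since S → SS can be applied with one child rewritten to ε any number of times, aabb in fact has infinitely many derivation trees there.

```python
from functools import lru_cache

# Grammars as: variable -> list of right-hand sides (tuples of symbols).
# Any symbol that is not a key is treated as a terminal (a single character).
AMBIGUOUS = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("I",)],
    "I": [("a",), ("b",), ("c",)],
}
UNAMBIGUOUS = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("I",)],
    "I": [("a",), ("b",), ("c",)],
}


def count_trees(grammar, start, w):
    """Number of distinct derivation trees with root `start` and yield w.

    Assumes no ε-productions and no cycles of unit productions.
    """

    @lru_cache(maxsize=None)
    def from_symbol(x, i, j):
        # Trees deriving w[i:j] from the single symbol x.
        if x not in grammar:  # terminal: must match exactly one character
            return 1 if j == i + 1 and w[i] == x else 0
        return sum(from_sequence(rhs, 0, i, j) for rhs in grammar[x])

    @lru_cache(maxsize=None)
    def from_sequence(rhs, k, i, j):
        # Ways for rhs[k:] to derive w[i:j]; each symbol covers >= 1 character.
        rest = len(rhs) - k
        if rest == 1:
            return from_symbol(rhs[k], i, j)
        total = 0
        for m in range(i + 1, j - rest + 2):  # rhs[k] derives w[i:m]
            first = from_symbol(rhs[k], i, m)
            if first:
                total += first * from_sequence(rhs, k + 1, m, j)
        return total

    return from_symbol(start, 0, len(w))


for sentence in ["a+b*c", "a+b+c", "(a+b)*c"]:
    print(sentence, count_trees(AMBIGUOUS, "E", sentence), count_trees(UNAMBIGUOUS, "E", sentence))
# a+b*c 2 1
# a+b+c 2 1
# (a+b)*c 1 1
```

The ambiguous grammar gives two trees for a+b*c (grouping as a+(b*c) or as (a+b)*c) and two for a+b+c, while the rewritten grammar gives exactly one tree for each sentence.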
Instructor: Yonas T. 6 Chapter -4
[email protected] Context Free Languages
Aksum University Electrical Engineering and Informatics Department
4.3 Context-Free Grammars and Programming Languages

One of the most important uses of the theory of formal languages is in the definition of programming languages and in the construction of interpreters and compilers for them. The basic problem here is to define a programming language precisely and to use this definition as the starting point for writing efficient and reliable translation programs. Both regular and context-free grammars are important in achieving this. As we have seen, regular grammars are used in the recognition of certain simple patterns which occur in programming languages, but, as we have already discussed, we need context-free grammars to model more complicated aspects. Note also that every regular grammar is a context-free grammar.

It is traditional in writing on programming languages to use a convention for specifying grammars called the Backus-Naur Form, or BNF. This form is in essence the same as the notation we have used, but its appearance is different. In BNF, variables (non-terminals) are enclosed in angle brackets < >. Terminal symbols are written without any special marking. BNF also uses subsidiary symbols such as |, much in the way we have used them. Let us represent the grammar given below in BNF.

Context-Free Grammar
E → E+T | T
T → T*F | F
F → (E) | I
I → a | b | c

BNF
<expression> ::= <term> | <expression> + <term>
<term> ::= <factor> | <term> * <factor>
<factor> ::= (<expression>) | <identifier>
<identifier> ::= a | b | c

The symbols +, *, (, ), a, b and c are terminals. The symbol | separates alternatives, and the symbol ::= is used in place of →. BNF descriptions of programming languages tend to use more explicit variable identifiers to make the intent of each production explicit, but otherwise there are no significant differences between the two notations. For example, the if-then-else statement of a high-level programming language can be defined as

<if-statement> ::= if <expression> <then-clause> <else-clause>

Here the keyword "if" is a terminal symbol. All other terms are variables which still have to be defined.

Those aspects of a programming language which can be modeled by a context-free grammar are usually referred to as its syntax. However, it is normally the case that not all programs which are syntactically correct in this sense are in fact acceptable programs. For example, the usual BNF definition of C allows constructs such as

int x, y;
char a, x;

or

int x;
x = "a";

Neither of these constructs is valid C, since each violates a rule that the grammar does not express: in the first case, "two variables declared in the same scope cannot have the same name", and in the second case, "a string value cannot be assigned to an integer variable". Rules of this kind are part of programming language semantics, since they have to do with how we interpret the meaning of a particular construct. Programming language semantics are a complicated matter. It is an ongoing concern, both in programming languages and in formal language theory, to find effective methods for defining programming language semantics. Several methods have been proposed, but none of them has been universally accepted or has been as successful for semantic definition as context-free grammars have been for syntax.
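A BNF definition such as the expression grammar in this section can serve as the starting point for a translation program. The Python sketch below is a hand-written recursive-descent parser with one method per BNF variable; the names ParseError, Parser and parse are hypothetical. A left-recursive rule such as <expression> ::= <expression> + <term> would make a naive recursive-descent procedure call itself forever, so the sketch reads it iteratively as a <term> followed by zero or more "+ <term>" pieces. This generates the same strings and keeps + and * grouped to the left. The parser returns the input fully parenthesized, which makes the grouping chosen by the grammar visible.

```python
from __future__ import annotations

# Hypothetical recursive-descent parser for the BNF grammar
#   <expression> ::= <term> | <expression> + <term>
#   <term>       ::= <factor> | <term> * <factor>
#   <factor>     ::= (<expression>) | <identifier>
#   <identifier> ::= a | b | c
# Left recursion is read iteratively: <expression> is <term> { + <term> }.


class ParseError(Exception):
    """Raised when the input is not derivable from <expression>."""


class Parser:
    def __init__(self, text: str) -> None:
        self.text = text.replace(" ", "")  # blanks are ignored in this sketch
        self.pos = 0

    def peek(self) -> str | None:
        return self.text[self.pos] if self.pos < len(self.text) else None

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ParseError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def expression(self) -> str:
        # <expression> ::= <term> { + <term> }, grouped to the left
        result = self.term()
        while self.peek() == "+":
            self.pos += 1
            result = f"({result}+{self.term()})"
        return result

    def term(self) -> str:
        # <term> ::= <factor> { * <factor> }, grouped to the left
        result = self.factor()
        while self.peek() == "*":
            self.pos += 1
            result = f"({result}*{self.factor()})"
        return result

    def factor(self) -> str:
        # <factor> ::= (<expression>) | <identifier>
        ch = self.peek()
        if ch == "(":
            self.pos += 1
            inner = self.expression()
            self.expect(")")
            return inner
        if ch is not None and ch in "abc":  # <identifier> ::= a | b | c
            self.pos += 1
            return ch
        raise ParseError(f"unexpected {ch!r} at position {self.pos}")


def parse(text: str) -> str:
    """Return text fully parenthesized according to the grammar."""
    parser = Parser(text)
    tree = parser.expression()
    if parser.peek() is not None:
        raise ParseError(f"unexpected {parser.peek()!r} at position {parser.pos}")
    return tree


print(parse("a+b*c"))    # (a+(b*c))
print(parse("(a+b)*c"))  # ((a+b)*c)
print(parse("a+b+c"))    # ((a+b)+c)
```

Because <term> sits below <expression> in the grammar, * always binds more tightly than +, so a+b*c is grouped as a+(b*c) without any extra precedence rules in the parser.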