Chapter 4 - Context Free Languages

The document discusses context-free grammars and context-free languages. It defines a context-free grammar as a grammar where productions have the form A → β, where A is a non-terminal and β is a string containing terminals and/or non-terminals. A language is context-free if there exists a context-free grammar such that the language is the set of strings derived from the grammar. Examples of context-free languages are given, including languages of balanced parentheses and strings with equal numbers of a's and b's. Derivations in context-free grammars can be leftmost or rightmost depending on the order variables are replaced. Derivations can also be represented using derivation trees showing the step-

Aksum University Electrical Engineering and Informatics Department

Formal Languages
Chapter 4
Context Free Languages

In the previous chapter, we discussed that not all languages are regular. While regular languages are effective in describing certain simple patterns, one does not need to look very far for examples of non-regular languages. The relevance of these limitations to programming languages becomes evident if we reinterpret some of the examples. If in L = {a^n b^n : n ≥ 0} we substitute a left parenthesis for 'a' and a right parenthesis for 'b', then parenthesis strings such as (()) and ((())) are in L, but (() is not. The language therefore describes a simple kind of nested structure found in programming languages, which shows that some properties of programming languages require something beyond regular languages. In order to cover this and other more complicated features we must enlarge the family of languages. This leads us to consider context-free languages and grammars.

The topic of context-free languages is perhaps the most important aspect of formal language theory as it applies to programming languages. Actual programming languages have many features that can be described elegantly by means of context-free languages. What formal language theory tells us about context-free languages has important applications in the design of programming languages as well as in the construction of efficient compilers.

4.1 Context-Free Grammars

The productions in a regular grammar are restricted in two ways: the left side must be a single variable, while the right side has a special form. To create grammars that are more powerful, we must relax some of these restrictions. By retaining the restriction on the left side, but permitting anything on the right, we get context-free grammars.

A grammar G = (N, T, S, P) is said to be context-free if all productions in P have the form

A → β

where A is a non-terminal [A ∈ N] and β is any string containing non-terminals and/or terminals [β ∈ (N ∪ T)*].

A language L is said to be context-free if and only if there is a context-free grammar G such that L = L(G). Every regular grammar is context-free, so every regular language is also context-free, and the family of regular languages is a proper subset of the family of context-free languages. Context-free grammars derive their name from the fact that the substitution of the variable on the left of a production can be made any time such a variable appears in a sentential form. It does not depend on the symbols in the rest of the sentential form (the context). This feature is the consequence of allowing only a single variable on the left side of the production.

Examples of Context-Free languages:

The grammar G = ({S}, {a, b}, S, P) with productions
S → aSa
S → bSb
S → ε
is context-free. A typical derivation in this grammar is
S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
This makes it clear that
L(G) = {ww^R : w ∈ {a, b}*}.
The language is context-free, but it is not regular.

The grammar G, with productions
S → abB
A → aaBb
B → bbAa
A → ε
is context-free. We leave it to the reader to show that
L(G) = {ab (bbaa)^n bba (ba)^n | n ≥ 0}

Both of the above examples involve grammars that are not only context-free, but linear. Regular and linear grammars are clearly context-free, but a context-free grammar is not necessarily linear.

The language L = {a^n b^m | n ≠ m} is context-free. To show this, we need to produce a context-free grammar for the language.
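Before committing to a grammar for a language, it can help to list the short strings the grammar derives and compare them with the intended language. The sketch below does this for the palindrome grammar S → aSa | bSb | ε from the first example. The names `derive` and `PALINDROME_GRAMMAR` are illustrative and not part of the notes. The code assumes that uppercase keys are variables, that ε is written as the empty string, and that every production which keeps a variable also adds a terminal, which holds for this grammar but not for every grammar.

```python
# Illustrative encoding (an assumption, not from the notes):
# variables are dictionary keys, and "" stands for ε.
PALINDROME_GRAMMAR = {"S": ["aSa", "bSb", ""]}


def derive(productions, start, max_len):
    """Return every terminal string with at most max_len symbols derivable from start.

    Termination relies on each variable-preserving production adding a terminal.
    """
    results, seen, stack = set(), {start}, [start]
    while stack:
        form = stack.pop()
        # Position of the leftmost variable in the sentential form, if any.
        pos = next((i for i, c in enumerate(form) if c in productions), None)
        if pos is None:
            results.add(form)  # no variables left: the form is a sentence
            continue
        for rhs in productions[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for c in new_form if c not in productions)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                stack.append(new_form)
    return results


if __name__ == "__main__":
    for s in sorted(derive(PALINDROME_GRAMMAR, "S", 4), key=lambda x: (len(x), x)):
        print(repr(s))
```

Every string listed has the form ww^R, which agrees with the language stated for this grammar. Such a listing is only a sanity check on short strings, not a proof.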

Instructor: Yonas T.

First consider the case n = m, which is generated by the productions
S → aSb
S → ε
Take the case n > m. We first generate a string with an equal number of a's and b's, then add extra a's on the left. This is done with
S → AB
B → aBb | ε
A → aA | a
We can use similar reasoning for the case n < m, and we get the answer
S → AN | NB
N → aNb | ε
A → aA | a
B → bB | b
The resulting grammar is context-free; hence L is a context-free language. However, the grammar is not linear.

Leftmost and Rightmost Derivations

In context-free grammars that are not linear, a derivation may involve sentential forms with more than one variable. In such cases, we have a choice in the order in which variables are replaced. Take for example the grammar G = ({A, B, S}, {a, b}, S, P) with productions
(1) S → AB
(2) A → aaA
(3) A → ε
(4) B → Bb
(5) B → ε

It is easy to see that this grammar generates the language L(G) = {a^(2n) b^m | n ≥ 0, m ≥ 0}.

Consider now the two derivations

S ⇒(1) AB ⇒(2) aaAB ⇒(3) aaB ⇒(4) aaBb ⇒(5) aab

and

S ⇒(1) AB ⇒(4) ABb ⇒(2) aaABb ⇒(5) aaAb ⇒(3) aab

In order to show which production is used, we have numbered the productions and written the appropriate number on the ⇒ symbol. From this we see that the two derivations not only yield the same sentence but use exactly the same productions. The difference is entirely in the order in which the productions are applied.

A derivation is said to be leftmost if in each step the leftmost variable in the sentential form is replaced. If in each step the rightmost variable is replaced, we call the derivation rightmost.

For example:
Consider the grammar with productions
S → aAB
A → bBb
B → A | ε
Then,
S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB ⇒ abbbbB ⇒ abbbb
is a leftmost derivation of the string abbbb. A rightmost derivation of the same string is
S ⇒ aAB ⇒ aA ⇒ abBb ⇒ abAb ⇒ abbBbb ⇒ abbbb.

Derivation Trees

A second way of showing derivations, independent of the order in which productions are used, is by a derivation tree. A derivation tree is an ordered tree in which nodes are labeled with the left sides of productions and in which the children of a node represent its corresponding right sides. The figure given below shows part of a derivation tree representing the production A → abABc.

        A
    / / | \ \
   a b  A  B c

In a derivation tree, a node labeled with a variable occurring on the left side of a production has children consisting of the symbols on the right side of that production. Beginning with the root, labeled with the start


symbol and ending in leaves that are terminals, a derivation tree shows how each variable is replaced in the derivation.

Let G = (N, T, S, P) be a context-free grammar. An ordered tree is a derivation tree for G if and only if it has the following properties.
1. The root is labeled S.
2. Every leaf has a label from T ∪ {ε}.
   2(a). Relaxed form, used only for partial derivation trees: every leaf has a label from N ∪ T ∪ {ε}.
3. Every interior vertex (a vertex which is not a leaf) has a label from N.
4. If a vertex has label A ∈ N, and its children are labeled (from left to right) a1, a2, ..., an, then P must contain a production of the form A → a1a2...an.
5. A leaf labeled ε has no siblings, that is, a vertex with a child labeled ε can have no other children.
A tree that has properties 2(a), 3, 4 and 5, but in which 1 does not necessarily hold, is said to be a partial derivation tree. The string of symbols obtained by reading the leaves of the tree from left to right, omitting any ε's encountered, is said to be the yield of the tree.

For example, consider the grammar G, with productions
S → aAB
A → bBb
B → A | ε
The tree in the figure below is a partial derivation tree for G.

        S
      / | \
     a  A  B
      / | \
     b  B  b

The tree shown in the figure given below, on the other hand, is a derivation tree.

          S
        / | \
       a  A    B
        / | \    \
       b  B  b    A
          |     / | \
          ε    b  B  b
                  |
                  ε

The string abBbB, which is the yield of the first tree (the partial derivation tree), is a sentential form of G. The yield of the second tree, abbbb, is a sentence of L(G).

4.2 Parsing and Ambiguity

We have so far concentrated on the generative aspects of grammars. Given a grammar G, we studied the set of strings that can be derived using G. In cases of practical applications, we are also concerned with the analytical side of the grammar: given a string w of terminals (a sentence), we want to know whether or not w is in L(G). If so, we may want to find a derivation of w. An algorithm that can tell us whether w is in L(G) is a membership algorithm. The term "parsing" describes finding a sequence of productions by which a w ∈ L(G) is derived.

Parsing and Membership

Given a string w in L(G), we can parse it in a rather obvious fashion: we systematically construct all possible (say, leftmost) derivations and see whether any of them match w. Specifically, we start at iteration one by looking at all productions of the form


S → x

finding all x that can be derived from S in one step. If none of these results in a match with w, we go to the next iteration, in which we apply all applicable productions to the leftmost variable of every x. This gives us a set of sentential forms, some of them possibly leading to w. On each subsequent iteration we again take all leftmost variables and apply all possible productions. It may be that some of these sentential forms can be rejected on the grounds that w can never be derived from them, but in general, we will have on each iteration a set of possible sentential forms. After the first iteration, we have the sentential forms that can be derived by applying a single production, after the second iteration we have the sentential forms that can be derived with two productions, and so on. If w ∈ L(G), then it must have a leftmost derivation of finite length. Thus, the method will eventually give a leftmost derivation of w. For further reference, we will call this the "exhaustive search" parsing method. It is a form of top-down parsing, which we can view as the construction of a derivation tree from the root down.

For example, consider the grammar
S → SS
S → aSb
S → bSa
S → ε
and the string w = aabb.

The first iteration gives us

1. S ⇒ SS
2. S ⇒ aSb
3. S ⇒ bSa
4. S ⇒ ε

The last two of these can be removed from further consideration for obvious reasons: one starts with b and the other is the empty string, so neither can lead to aabb. Iteration two then yields the sentential forms
S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒ bSaS
S ⇒ SS ⇒ S
which are obtained by replacing the leftmost S in sentential form 1 with all applicable substitutes. Similarly, from sentential form 2 we get the additional sentential forms
S ⇒ aSb ⇒ aSSb
S ⇒ aSb ⇒ aaSbb
S ⇒ aSb ⇒ abSab
S ⇒ aSb ⇒ ab
Again, several of these can be removed from contention. On the next iteration, we find the actual target string from the sequence
S ⇒ aSb ⇒ aaSbb ⇒ aabb

Therefore, aabb is in the language generated by the grammar under consideration.

Exhaustive search parsing has serious flaws. The most obvious one is its tediousness; it is not to be used where efficient parsing is required. But even when efficiency is a secondary issue, there is a more pertinent objection. While the method always parses a w ∈ L(G), it is possible that it never terminates for strings not in L(G). This is certainly the case in the previous example; with w = abb, the method will go on producing trial sentential forms indefinitely unless we build into it some way of stopping.

The problem of non-termination of exhaustive search parsing is relatively easy to overcome if we restrict the form that the grammar can have. If we examine the previous example, we see that the difficulty comes from the production S → ε; this production can be used to decrease the length of successive sentential forms, so that we cannot tell easily when to stop. If we do not have any such productions, then we have far fewer difficulties. In fact, there are two types of productions we want to rule out, those of the form A → ε as well as those of the form A → B. The grammar
S → SS | aSb | bSa | ab | ba
satisfies the given requirements. It generates the language of the previous example without the empty string. Given any w ∈ {a, b}+, the exhaustive search parsing method will always terminate in no more than |w| iterations.
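The exhaustive search method can be sketched in a few lines for the restricted grammar S → SS | aSb | bSa | ab | ba. This is a minimal illustration, not a practical parser. The name `exhaustive_parse` and the encoding (uppercase dictionary keys are variables) are assumptions made for the example. The breadth-first queue plays the role of the iterations described above. The two pruning rules are additions that the notes do not spell out: a sentential form longer than w is discarded, which is safe only because no production in this grammar shortens a form, and a form whose leading terminals disagree with w is discarded because a leftmost derivation never changes them.

```python
from collections import deque

# Grammar from the text with no productions of the form A -> ε or A -> B.
PRODUCTIONS = {"S": ["SS", "aSb", "bSa", "ab", "ba"]}


def exhaustive_parse(w, productions=PRODUCTIONS, start="S"):
    """Return a leftmost derivation of w as a list of sentential forms, or None."""
    if not w:
        return None  # this grammar cannot derive the empty string
    queue = deque([[start]])
    seen = {start}
    while queue:
        derivation = queue.popleft()
        form = derivation[-1]
        if form == w:
            return derivation
        # Index of the leftmost variable, or None if the form is all terminals.
        pos = next((i for i, c in enumerate(form) if c in productions), None)
        if pos is None:
            continue
        for rhs in productions[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # No production shortens a form, so longer forms cannot yield w.
            if len(new_form) > len(w):
                continue
            # Terminals to the left of the leftmost variable are final.
            prefix_len = next(
                (i for i, c in enumerate(new_form) if c in productions),
                len(new_form),
            )
            if new_form[:prefix_len] != w[:prefix_len]:
                continue
            if new_form not in seen:
                seen.add(new_form)
                queue.append(derivation + [new_form])
    return None


if __name__ == "__main__":
    print(exhaustive_parse("aabb"))  # a derivation ending in 'aabb'
    print(exhaustive_parse("abb"))   # None: abb is not in the language
```

With the length bound in place, the search for w = abb stops after a handful of sentential forms instead of running indefinitely, which is the behaviour the restriction on the grammar is meant to guarantee.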


Ambiguity in Grammars and Languages

A context-free grammar G is said to be ambiguous if there exists some w ∈ L(G) which has at least two distinct derivation trees. Alternatively, ambiguity implies the existence of two or more leftmost or rightmost derivations. For example, consider the grammar
S → SS
S → aSb
S → ε

The sentence aabb has two derivation trees as shown in the figures below:

        S
      / | \
     a  S  b
      / | \
     a  S  b
        |
        ε

          S
        /   \
       S     S
       |   / | \
       ε  a  S  b
           / | \
          a  S  b
             |
             ε

Hence the above grammar is an ambiguous grammar.

Ambiguity is a common feature of natural languages, where it is tolerated and dealt with in a variety of ways. In programming languages, where there should be only one interpretation of each statement, ambiguity must be removed when possible. Often we can achieve this by rewriting the grammar in an equivalent, unambiguous form. Let us demonstrate this with an example.
Consider the grammar G = (N, T, E, P) with
N = {E, I}
T = {a, b, c, +, *, (, )}
and productions
E → E+E
E → E*E
E → (E)
E → I
I → a | b | c

The grammar is ambiguous because the string a+b*c has two different derivation trees, as shown in the figures below:

        E
      / | \
     E  +  E
     |   / | \
     I  E  *  E
     |  |     |
     a  I     I
        |     |
        b     c


          E
        / | \
       E  *  E
     / | \   |
    E  +  E  I
    |     |  |
    I     I  c
    |     |
    a     b

One way to resolve the ambiguity is, as is done in programming manuals, to associate precedence rules with the operators + and *. To rewrite the grammar in the above example we introduce new variables, taking
N = {E, T, F, I}
and replace the productions with
E → E+T | T
T → T*F | F
F → (E) | I
I → a | b | c
The derivation tree of the sentence a+b*c is shown below:

        E
      / | \
     E  +  T
     |   / | \
     T  T  *  F
     |  |     |
     F  F     I
     |  |     |
     I  I     c
     |  |
     a  b

No other derivation tree is possible for this string: the grammar is unambiguous.

In the foregoing example the ambiguity came from the grammar, in the sense that it could be removed by finding an equivalent unambiguous grammar. In some instances, however, this is not possible because the ambiguity is in the language. If L is a context-free language for which there exists an unambiguous grammar, then L is said to be unambiguous. If every grammar that generates L is ambiguous, then the language is called inherently ambiguous. It is difficult even to exhibit an inherently ambiguous language. The best we can do here is give an example with some reasonably plausible claim that it is inherently ambiguous. For example, the language
L = {a^n b^n c^m} ∪ {a^n b^m c^m}, where n and m are non-negative,
is an inherently ambiguous context-free language. The context-free language L is represented as
L = L1 ∪ L2
where L1 is generated from the sentential form RC by the productions
R → aRb | ε
C → cC | ε
and where L2 can be generated from S using the productions
S → aS | B
B → bBc | ε

Then L is generated by the combination of these two grammars with the additional production
P → RC | S

The grammar is ambiguous since the string a^n b^n c^n has two different derivations, one starting with P ⇒ RC and the other with P ⇒ S. It does of course not follow from this that L is inherently ambiguous, as there might exist some other unambiguous grammar for it. But in some way L1 and L2 have conflicting requirements, the first putting a restriction on the number of a's and b's, while the second does the same for b's and c's. A few tries will quickly convince you of the impossibility of combining these requirements in a single set of rules that cover the case n = m uniquely.
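The claims about the two expression grammars in this section can be checked mechanically by counting derivation trees. The sketch below counts the trees for a+b*c under the ambiguous grammar E → E+E | E*E | (E) | I and under the rewritten grammar with variables E, T, F, I. The function name `count_trees` and the dictionary encoding are illustrative assumptions. The counting scheme also assumes a grammar with no ε-productions and no cyclic chains of unit productions, which is true of both expression grammars but not of the grammar given for L1 ∪ L2.

```python
from functools import lru_cache

# Dictionary keys are variables; every other character is a terminal.
AMBIGUOUS = {
    "E": ["E+E", "E*E", "(E)", "I"],
    "I": ["a", "b", "c"],
}
UNAMBIGUOUS = {
    "E": ["E+T", "T"],
    "T": ["T*F", "F"],
    "F": ["(E)", "I"],
    "I": ["a", "b", "c"],
}


def count_trees(grammar, start, w):
    """Count derivation trees of w (no ε-productions, no cyclic unit productions)."""

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Number of derivation trees rooted at `symbol` whose yield is w[i:j].
        if symbol not in grammar:  # terminal symbol
            return 1 if w[i:j] == symbol else 0
        return sum(split(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        # Ways for the symbols of `rhs` to cover w[i:j], each deriving
        # at least one character.
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        total = 0
        for m in range(i + 1, j - len(rest) + 1):
            left = trees(head, i, m)
            if left:
                total += left * split(rest, m, j)
        return total

    return trees(start, 0, len(w))


if __name__ == "__main__":
    print(count_trees(AMBIGUOUS, "E", "a+b*c"))
    print(count_trees(UNAMBIGUOUS, "E", "a+b*c"))
```

A count above one for a single string is enough to show that a grammar is ambiguous. A count of one for a particular string says nothing about other strings, so it cannot by itself establish that a grammar is unambiguous.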

4.3 Context-Free Grammars and Programming Languages

One of the most important uses of the theory of formal languages is in the definition of programming languages and in the construction of interpreters and compilers for them. The basic problem here is to define a programming language precisely and to use this definition as the starting point for the writing of efficient and reliable translation programs. Both regular and context-free grammars are important in achieving this. As we have seen, regular grammars are used in the recognition of certain simple patterns which occur in programming languages, but as we have discussed already, we need context-free grammars to model more complicated aspects. It is also to be noted that all regular grammars are context-free grammars.

It is traditional in writing on programming languages to use a convention for specifying grammars called the Backus-Naur Form or BNF. This form is in essence the same as the notation we have used, but the appearance is different. In BNF, variables or non-terminals are enclosed in triangular brackets. Terminal symbols are written without any special marking. BNF also uses subsidiary symbols such as |, much in the way we have used them. Let us represent the grammar given below in BNF.

Context-Free Grammar
E → E+T | T
T → T*F | F
F → (E) | I
I → a | b | c

BNF
<expression> ::= <term> | <expression> + <term>
<term> ::= <factor> | <term> * <factor>
<factor> ::= (<expression>) | <identifier>
<identifier> ::= a | b | c

The symbols +, *, (, ), a, b and c are terminals. The symbol | is used to separate alternatives. The symbol ::= is used in place of →. BNF descriptions of programming languages tend to use more explicit variable identifiers to make the intent of each production explicit. But otherwise there are no significant differences between the two notations. For example, the if-then-else statement of a high-level programming language can be defined as

<if-statement> ::= if <expression> <then-clause> <else-clause>

Here the keyword "if" is a terminal symbol. All other terms are variables which still have to be defined.

Those aspects of a programming language which can be modeled by a context-free grammar are usually referred to as its syntax. However, it is normally the case that not all programs which are syntactically correct in this sense are in fact acceptable programs. For example, the usual BNF definition allows constructs such as

int x,y;
char a,x;

or

int x;
x='a';

Neither of these two constructs is what the programmer is likely to intend, since they run against constraints that the grammar does not express: "two different variables cannot have the same name" in the first case, and "an integer variable should not be given a character value as it is" in the second case. A C compiler rejects the first construct when both declarations appear in the same scope. The second is rejected by strictly typed languages, although C itself accepts it through an implicit conversion. This kind of rule is part of programming language "semantics", since it has to do with how we interpret the meaning of a particular construct. Programming language semantics are a complicated matter. It is an ongoing concern both in programming languages and in formal language theory to find effective methods for defining programming language semantics. Several methods have been proposed, but none of them has been as universally accepted and as successful for semantic definition as context-free grammars have been for syntax.
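As a small illustration of how a BNF definition can serve as the starting point for a translation program, the expression grammar given in BNF in this section can be turned into a recognizer. The sketch below is a recursive-descent recognizer, a technique the notes do not cover, and the name `accepts` is illustrative. One substitution should be stated plainly: the left-recursive rules <expression> ::= <expression> + <term> and <term> ::= <term> * <factor> are replaced by loops, which accept the same strings but do not follow the BNF rules literally. The code also assumes input with no whitespace.

```python
def accepts(text):
    """Return True if text is derivable from <expression> (terminals: + * ( ) a b c)."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def expression():
        # <expression>: one <term> followed by zero or more "+ <term>".
        nonlocal pos
        if not term():
            return False
        while peek() == "+":
            pos += 1
            if not term():
                return False
        return True

    def term():
        # <term>: one <factor> followed by zero or more "* <factor>".
        nonlocal pos
        if not factor():
            return False
        while peek() == "*":
            pos += 1
            if not factor():
                return False
        return True

    def factor():
        # <factor> ::= ( <expression> ) | <identifier>
        nonlocal pos
        if peek() == "(":
            pos += 1
            if not expression() or peek() != ")":
                return False
            pos += 1
            return True
        if peek() in ("a", "b", "c"):  # <identifier> ::= a | b | c
            pos += 1
            return True
        return False

    # The whole input must be consumed for the string to be accepted.
    return expression() and pos == len(text)


if __name__ == "__main__":
    for s in ["a+b*c", "(a+b)*c", "a+", "ab"]:
        print(s, accepts(s))
```

The recognizer checks syntax only. Constraints of the semantic kind discussed above, such as duplicate declarations, would need additional bookkeeping outside the grammar.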
