Context Free Grammars

Context-Free Languages & Grammars The document discusses context-free languages and grammars (CFLs and CFGs). It provides examples of CFLs including the language of binary palindromes and balanced parentheses. A CFG defines a language by specifying variables, terminals, productions, and a start variable. Strings in the language must be derivable from the start variable using the productions. Parse trees can represent derivations visually. CFLs are more powerful than regular languages and have applications in areas like syntax analysis for programming languages. CFGs may be ambiguous, having more than one parse tree for some strings.

Uploaded by

Venugopal Reddy

Context-Free Languages & Grammars
(CFLs & CFGs)
Reading: Chapter 5

Not all languages are regular

So what happens to the languages which are not regular?
Can we still come up with a language recognizer?
i.e., something that will accept (or reject) strings that belong (or do not belong) to the language?

Context-Free Languages

A language class larger than the class of regular languages
Supports natural, recursive notation called "context-free grammar"
Applications:
Parse trees, compilers
XML

[Figure: regular languages (FA/RE) shown inside context-free languages (PDA/CFG)]

An Example

A palindrome is a word that reads the same from both ends
E.g., madam, redivider, malayalam, 010010010

Let L = { w | w is a binary palindrome }
Is L regular?
No.
Proof (assuming N to be the pumping-lemma constant):
Let w = 0^N 1 0^N
By the pumping lemma, w can be rewritten as xyz, such that x y^k z is also in L (for any k ≥ 0)
But |xy| ≤ N and y ≠ ε
==> y = 0^+
==> x y^k z will NOT be in L for k = 0
==> Contradiction
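The contradiction in the proof can be checked mechanically for a small constant. The sketch below is only an illustration: N = 5 is an arbitrary stand-in for the pumping-lemma constant, and the helper names are made up here. It tries every split w = xyz with |xy| ≤ N and y ≠ ε, and confirms that pumping down (k = 0) always leaves the language.

```python
def is_palindrome(s: str) -> bool:
    return s == s[::-1]

def splits(w: str, n: int):
    """Yield every (x, y, z) with w == x + y + z, |xy| <= n and y non-empty."""
    for i in range(n + 1):              # i = |x|
        for j in range(i + 1, n + 1):   # j = |xy|
            yield w[:i], w[i:j], w[j:]

N = 5  # assumed value standing in for the pumping-lemma constant
w = "0" * N + "1" + "0" * N

# For every allowed split, xz (the k = 0 case) is not a palindrome.
pumped_down_all_fail = all(not is_palindrome(x + z) for x, y, z in splits(w, N))
print(pumped_down_all_fail)
```

Every y in such a split lies inside the leading block of 0s, so removing it leaves fewer 0s on the left of the 1 than on the right.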

But the language of palindromes is a CFL, because it supports recursive substitution (in the form of a CFG)

This is because we can construct a grammar like this:

Productions:
1. A ==> ε
2. A ==> 0
3. A ==> 1
4. A ==> 0A0
5. A ==> 1A1
(0 and 1 are terminals; A is a variable, or non-terminal)

Same as:
A => 0A0 | 1A1 | 0 | 1 | ε

How does this grammar work?

How does the CFG for palindromes work?

An input string belongs to the language (i.e., is accepted) iff it can be generated by the CFG

Example: w = 01110
G:
A => 0A0 | 1A1 | 0 | 1 | ε
G can generate w as follows:
1. A => 0A0
2. => 01A10
3. => 01110

Generating a string from a grammar:
1. Pick and choose a sequence of productions that would allow us to generate the string.
2. At every step, substitute one variable with one of its productions.
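The two generation steps for the palindrome grammar A => 0A0 | 1A1 | 0 | 1 | ε can be sketched as a short function. This is a minimal illustration, not part of the original slides; the function name `derive_palindrome` is hypothetical. It peels matching outer symbols (A => 0A0 or A => 1A1) and finishes with A => 0, A => 1, or A => ε.

```python
def derive_palindrome(w: str):
    """Return the sentential forms of a derivation of w from A, or None.

    Mirrors G: A => 0A0 | 1A1 | 0 | 1 | ε.
    """
    steps = ["A"]
    left, right, rest = "", "", w
    while True:
        if rest in ("", "0", "1"):                    # A => ε | 0 | 1
            steps.append(left + rest + right)
            return steps
        if rest[0] == rest[-1] and rest[0] in "01":   # A => 0A0 | 1A1
            left, right = left + rest[0], rest[-1] + right
            rest = rest[1:-1]
            steps.append(left + "A" + right)
        else:
            return None                               # no production applies

print(" => ".join(derive_palindrome("01110")))
```

A return value of None means that no sequence of productions generates the string, so the string is rejected.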

Context-Free Grammar: Definition

A context-free grammar G = (V,T,P,S), where:
V: set of variables or non-terminals
T: set of terminals (= alphabet U {ε})
P: set of productions, each of which is of the form
V ==> α1 | α2 | ...
where each αi is an arbitrary string of variables and terminals
S: start variable

CFG for the language of binary palindromes:
G = ({A},{0,1},P,A)
P: A ==> 0 A 0 | 1 A 1 | 0 | 1 | ε
More examples

Parenthesis matching in code
Syntax checking
In scenarios where there is a general need for:
Matching a symbol with another symbol, or
Matching a count of one symbol with that of another symbol, or
Recursively substituting one symbol with a string of other symbols
Example #2

Language of balanced parentheses
e.g., ()(((())))((()))
CFG?
G:
S => (S) | SS | ε

How would you interpret the string "(((()))()())" using this grammar?
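One way to interpret a string of parentheses is to recover its nesting structure. The sketch below is an illustration under a stated assumption: it parses with the equivalent grammar S => (S)S | ε, which generates the same language as S => (S) | SS | ε but is easier to parse by recursive descent. The function name `parse_balanced` is hypothetical.

```python
def parse_balanced(s: str):
    """Parse s with S => ( S ) S | ε; return nested lists, or None if unbalanced."""
    def parse_s(i: int):
        groups = []
        while i < len(s) and s[i] == "(":
            inner, i = parse_s(i + 1)          # the S inside "( S )"
            if i >= len(s) or s[i] != ")":
                raise ValueError("missing ')'")
            groups.append(inner)
            i += 1                             # consume ")"
        return groups, i

    try:
        tree, end = parse_s(0)
    except ValueError:
        return None
    return tree if end == len(s) else None

print(parse_balanced("(((()))()())"))
```

Each list in the result stands for one matched pair and contains the pairs nested directly inside it.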

Example #3

A grammar for L = { 0^m 1^n | m ≥ n }
CFG?
G:
S => 0S1 | A
A => 0A | ε

How would you interpret the string 00000111 using this grammar?
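A derivation for such a string can be built directly: apply S => 0S1 once per 1, switch to A, then apply A => 0A for each surplus 0. The sketch below is a hedged illustration of that idea for the grammar S => 0S1 | A, A => 0A | ε; the function name `derive` is hypothetical.

```python
import re

def derive(w: str):
    """Return a derivation of w in S => 0S1 | A, A => 0A | ε, or None."""
    match = re.fullmatch(r"(0*)(1*)", w)
    if match is None or len(match.group(1)) < len(match.group(2)):
        return None                               # not of the form 0^m 1^n with m >= n
    m, n = len(match.group(1)), len(match.group(2))
    steps = ["S"]
    for k in range(1, n + 1):                     # S => 0S1, applied n times
        steps.append("0" * k + "S" + "1" * k)
    steps.append("0" * n + "A" + "1" * n)         # S => A
    for k in range(n + 1, m + 1):                 # A => 0A, applied m - n times
        steps.append("0" * k + "A" + "1" * n)
    steps.append(w)                               # A => ε
    return steps

print(" => ".join(derive("00000111")))
```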


Example #4
A program containing if-then(-else) statements
if Condition then Statement else Statement
(Or)
if Condition then Statement
CFG?


More examples

L1 = { 0^n | n ≥ 0 }
L2 = { 0^n | n ≥ 1 }
L3 = { 0^i 1^j 2^k | i=j or j=k, where i,j,k ≥ 0 }
L4 = { 0^i 1^j 2^k | i=j or i=k, where i,j,k ≥ 1 }
Applications of CFLs & CFGs

Compilers use parsers for syntactic checking
Parsers can be expressed as CFGs
1. Balancing parentheses:
B ==> BB | (B) | Statement
Statement ==> ...
2. If-then-else:
S ==> SS | if Condition then Statement else Statement | if Condition then Statement | Statement
Condition ==> ...
Statement ==> ...
3. C parenthesis matching { }
4. Pascal begin-end matching
5. YACC (Yet Another Compiler-Compiler)

More applications

Markup languages

Nested Tag Matching

HTML

<html> <p> <a href=> </a> </p> </html>

XML

<PC> <MODEL> </MODEL> .. <RAM> </RAM> </PC>

Tag-Markup Languages
Roll ==> <ROLL> Class Students </ROLL>
Class ==> <CLASS> Text </CLASS>
Text ==> Char Text | Char
Char ==> a | b | ... | z | A | B | .. | Z
Students ==> Student Students | ε
Student ==> <STUD> Text </STUD>
Here, the left-hand side of each production denotes one non-terminal (e.g., Roll, Class, etc.)
Those symbols on the right-hand side for which no productions (i.e., substitutions) are defined are terminals (e.g., a, b, |, <, >, ROLL, etc.)

Structure of a production

A ==> α1 | α2 | ... | αk
(head: A; derivation: ==>; body: α1 | α2 | ... | αk)

The above is same as:
1. A ==> α1
2. A ==> α2
3. A ==> α3
...
K. A ==> αk

CFG conventions

Terminal symbols <== a, b, c, ...
Non-terminal symbols <== A, B, C, ...
Terminal or non-terminal symbols <== X, Y, Z
Terminal strings <== w, x, y, z
Arbitrary strings of terminals and non-terminals <== α, β, γ, ..

Syntactic Expressions in Programming Languages

result = a*b + score + 10 * distance + c
[Figure: parts of the expression above are labeled as terminals and as variables]
Operators are also terminals

Regular languages have only terminals
Reg expression = [a-z][a-z0-1]*
If we allow only letters a & b, and 0 & 1 for constants (for simplification)
Regular expression = (a+b)(a+b+0+1)*

String membership

How to say if a string belongs to the language defined by a CFG?
1. Derivation: head to body
2. Recursive inference: body to head
Both are equivalent forms

Example:
w = 01110
Is w a palindrome?
G:
A => 0A0 | 1A1 | 0 | 1 | ε
A => 0A0
=> 01A10
=> 01110

Simple Expressions

We can write a CFG for accepting simple expressions
G = (V,T,P,S)
V = {E,F}
T = {0,1,a,b,+,*,(,)}
S = E
P:
E ==> E+E | E*E | (E) | F
F ==> aF | bF | 0F | 1F | a | b | 0 | 1

Generalization of derivation

Derivation is head ==> body
A ==> X (A derives X in a single step)
A ==>*G X (A derives X in zero or more steps)
Transitivity:
IF A ==>*G B, and B ==>*G C, THEN A ==>*G C

Context-Free Language

The language of a CFG, G=(V,T,P,S), denoted by L(G), is the set of terminal strings that have a derivation from the start variable S.

L(G) = { w in T* | S ==>*G w }
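The definition of L(G) can be explored by enumerating derivations. The sketch below is an illustration only: it encodes the palindrome grammar G = ({A},{0,1},P,A) as Python data (an encoding chosen here, with "" standing for ε) and collects every terminal string up to a length bound by searching over leftmost derivations. The pruning rule is an assumption that suits this grammar, where every sentential form contains at most one variable.

```python
from collections import deque

# G = (V, T, P, S) for binary palindromes; "" encodes the empty string ε.
V = {"A"}
T = {"0", "1"}
P = {"A": ["0A0", "1A1", "0", "1", ""]}
S = "A"

def language_up_to(max_len: int) -> set:
    """Collect every terminal string w with S ==>* w and |w| <= max_len."""
    seen, found = {S}, set()
    queue = deque([S])
    while queue:
        form = queue.popleft()
        # Locate the leftmost variable, if any.
        pos = next((i for i, sym in enumerate(form) if sym in V), None)
        if pos is None:
            if len(form) <= max_len:
                found.add(form)
            continue
        for body in P[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            # Pruning bound: enough here because forms hold at most one variable.
            if len(new) <= max_len + 1 and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

print(sorted(language_up_to(3), key=lambda w: (len(w), w)))
```

A grammar such as S => SS | ε would need a more careful bound, since its sentential forms can grow without adding terminals.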

Left-most & Right-most Derivation Styles

G:
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | ε

Derive the string a*(ab+10) from G:
E ==>*G a*(ab+10)

Left-most derivation: always substitute the leftmost variable
E
==> E * E
==> F * E
==> aF * E
==> a * E
==> a * (E)
==> a * (E + E)
==> a * (F + E)
==> a * (aF + E)
==> a * (abF + E)
==> a * (ab + E)
==> a * (ab + F)
==> a * (ab + 1F)
==> a * (ab + 10F)
==> a * (ab + 10)

Right-most derivation: always substitute the rightmost variable
E
==> E * E
==> E * (E)
==> E * (E + E)
==> E * (E + F)
==> E * (E + 1F)
==> E * (E + 10F)
==> E * (E + 10)
==> E * (F + 10)
==> E * (aF + 10)
==> E * (abF + 10)
==> E * (ab + 10)
==> F * (ab + 10)
==> aF * (ab + 10)
==> a * (ab + 10)
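A leftmost derivation is fully described by the sequence of production bodies it applies, because the variable to replace is always the leftmost one. The sketch below replays the leftmost derivation of a*(ab+10) in the grammar E => E+E | E*E | (E) | F, F => aF | bF | 0F | 1F | ε. It is a minimal illustration; the dictionary encoding and the name `leftmost_step` are assumptions made here, and "" stands for ε.

```python
# Grammar from the slide; "" encodes ε.
P = {"E": ["E+E", "E*E", "(E)", "F"], "F": ["aF", "bF", "0F", "1F", ""]}

def leftmost_step(form: str, body: str) -> str:
    """Replace the leftmost variable in form with body (must be one of its bodies)."""
    for i, sym in enumerate(form):
        if sym in P:
            if body not in P[sym]:
                raise ValueError(f"{sym} => {body!r} is not a production")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no variable left to substitute")

# Production bodies in the order used by the leftmost derivation.
bodies = ["E*E", "F", "aF", "", "(E)", "E+E", "F", "aF", "bF", "", "F", "1F", "0F", ""]
form = "E"
for body in bodies:
    form = leftmost_step(form, body)
print(form)
```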

Leftmost vs. Rightmost derivations

Q1) For every leftmost derivation, there is a rightmost derivation, and vice versa. True or False?
True - will use parse trees to prove this

Q2) Does every word generated by a CFG have a leftmost and a rightmost derivation?
Yes - easy to prove (reverse direction)

Q3) Could there be words which have more than one leftmost (or rightmost) derivation?
Yes - depending on the grammar

How to prove that your CFGs are correct?
(using induction)

CFG & CFL

Gpal:
A => 0A0 | 1A1 | 0 | 1 | ε

Theorem: A string w in (0+1)* is in L(Gpal), if and only if, w is a palindrome.
Proof:
Use induction
On string length for the IF part
On length of derivation for the ONLY IF part

Parse trees


Parse Trees

Each CFG can be represented using a parse tree:
Each internal node is labeled by a variable in V
Each leaf is a terminal symbol
For a production A ==> X1 X2 ... Xk, any internal node labeled A has k children, which are labeled X1, X2, ..., Xk from left to right

Parse tree for a production and all other subsequent productions:
A ==> X1..Xi..Xk
[Figure: a tree with root A and children X1 ... Xi ... Xk]

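Examples

Parse tree for a + 1
G:
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | 0 | 1 | a | b
[Figure: parse tree with root E and children E, +, E; the left E derives F, then a; the right E derives F, then 1]

Parse tree for 0110
G:
A => 0A0 | 1A1 | 0 | 1 | ε
[Figure: parse tree with root A and children 0, A, 0; the inner A has children 1, A, 1; the innermost A derives ε]

[Figure annotation: reading a tree from the root down is a derivation; reading it from the leaves up is a recursive inference]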

Parse Trees, Derivations, and Recursive Inferences

Production:
A ==> X1..Xi..Xk
[Figure: the parse tree with root A and children X1 ... Xi ... Xk, linked to the equivalent representations: derivation, left-most derivation, right-most derivation, and recursive inference]

Interchangeability of different CFG representations

Parse tree ==> left-most derivation
DFS left to right

Parse tree ==> right-most derivation
DFS right to left

==> left-most derivation == right-most derivation

Derivation ==> Recursive inference
Reverse the order of productions

Recursive inference ==> Parse trees
bottom-up traversal of parse tree
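The conversion from a parse tree to a left-most or right-most derivation can be sketched in a few lines. This is an illustrative sketch, not the slides' own code: the `Node` class and the `derivation` function are hypothetical names, and the example tree is the one for a + 1 in the grammar E => E+E | E*E | (E) | F, F => aF | bF | 0F | 1F | 0 | 1 | a | b. Expanding the leftmost unexpanded variable first corresponds to a left-to-right depth-first traversal; choosing the rightmost one corresponds to a right-to-left traversal.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)   # empty for leaves

def derivation(root: Node, leftmost: bool = True):
    """List the sentential forms obtained by expanding the tree in DFS order."""
    form = [root]                      # current sentential form, as tree nodes
    steps = [root.label]
    while True:
        idxs = [i for i, n in enumerate(form) if n.children]
        if not idxs:
            return steps               # only leaves remain
        i = idxs[0] if leftmost else idxs[-1]
        form[i:i + 1] = form[i].children           # apply the production at node i
        steps.append("".join(n.label for n in form))

tree = Node("E", [
    Node("E", [Node("F", [Node("a")])]),
    Node("+"),
    Node("E", [Node("F", [Node("1")])]),
])
print(" => ".join(derivation(tree, leftmost=True)))
print(" => ".join(derivation(tree, leftmost=False)))
```

Both derivations come from the same tree, which is why each left-most derivation has a matching right-most derivation.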

Connection between CFLs and RLs

What kind of grammars result for regular languages?

CFLs & Regular Languages

A CFG is said to be right-linear if all the productions are one of the following two forms:
A ==> wB (or) A ==> w
Where:
A & B are variables,
w is a string of terminals

Theorem 1: Every right-linear CFG generates a regular language
Theorem 2: Every regular language has a right-linear grammar
Theorem 3: Left-linear CFGs also represent RLs

Some Examples

[Figure: finite automaton over {0,1} with a state labeled A]
Right linear CFG?

[Figure: finite automaton over {0,1} with states labeled A and B]
Right linear CFG?

A => 01B | C
B => 11B | 0C | 1A
C => 1A | 0 | 1
Finite Automaton?
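A right-linear grammar can be run like an automaton: the variables act as states, and a production A ==> wB reads w and moves to B. The sketch below applies this idea to the grammar A => 01B | C, B => 11B | 0C | 1A, C => 1A | 0 | 1. It is only an illustration; the encoding of each body as a (terminal string, next variable) pair and the name `accepts` are assumptions made here, and it does not construct the automaton's state diagram.

```python
from functools import lru_cache

# Each body is (terminal string w, next variable or None for A ==> w).
P = {
    "A": [("01", "B"), ("", "C")],
    "B": [("11", "B"), ("0", "C"), ("1", "A")],
    "C": [("1", "A"), ("0", None), ("1", None)],
}

def accepts(s: str, start: str = "A") -> bool:
    """Decide whether start ==>* s, treating variables as automaton states."""
    @lru_cache(maxsize=None)
    def run(state: str, pos: int) -> bool:
        for w, nxt in P[state]:
            if s.startswith(w, pos):
                end = pos + len(w)
                if nxt is None:
                    if end == len(s):      # A ==> w must consume the rest exactly
                        return True
                elif run(nxt, end):        # A ==> wB: continue from state B
                    return True
        return False
    return run(start, 0)

print(accepts("0100"), accepts("010"))
```

The search terminates on this grammar because its only production that reads nothing (A => C) leads to a variable whose productions all read at least one symbol.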

Ambiguity in CFGs and CFLs


Ambiguity in CFGs

A CFG is said to be ambiguous if there exists a string which has more than one left-most derivation

Example:
S ==> AS | ε
A ==> A1 | 0A1 | 01

Input string: 00111
Can be derived in two ways

LM derivation #1:
S => AS
=> 0A1S
=> 0A11S
=> 00111S
=> 00111

LM derivation #2:
S => AS
=> A1S
=> 0A11S
=> 00111S
=> 00111
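Since each parse tree corresponds to exactly one left-most derivation, ambiguity can be detected by counting parse trees. The sketch below does this for the example grammar S ==> AS | ε, A ==> A1 | 0A1 | 01 only; it is a hand-written counter for these specific productions, not a general algorithm, and the function names are hypothetical.

```python
from functools import lru_cache

def count_parse_trees(w: str) -> int:
    """Count parse trees of w in S => AS | ε, A => A1 | 0A1 | 01."""
    @lru_cache(maxsize=None)
    def count_a(i: int, j: int) -> int:           # trees for A ==>* w[i:j]
        if j - i < 2:
            return 0
        total = 1 if w[i:j] == "01" else 0        # A => 01
        if w[j - 1] == "1":
            total += count_a(i, j - 1)            # A => A1
            if w[i] == "0":
                total += count_a(i + 1, j - 1)    # A => 0A1
        return total

    @lru_cache(maxsize=None)
    def count_s(i: int) -> int:                   # trees for S ==>* w[i:]
        total = 1 if i == len(w) else 0           # S => ε
        for k in range(i + 2, len(w) + 1):        # S => AS, with A ==>* w[i:k]
            total += count_a(i, k) * count_s(k)
        return total

    return count_s(0)

print(count_parse_trees("00111"))
```

A count above 1 for any string shows that the grammar is ambiguous; a count of 0 means the string is not in the language.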

Why does ambiguity matter?

E ==> E + E | E * E | (E) | a | b | c | 0 | 1

string = a * b + c

LM derivation #1:
E => E + E => E * E + E ==>* a * b + c
[Figure: parse tree with + at the root, corresponding to (a*b)+c]

LM derivation #2:
E => E * E => a * E => a * E + E ==>* a * b + c
[Figure: parse tree with * at the root, corresponding to a*(b+c)]

Values are different!
The calculated value depends on which of the two parse trees is actually used.
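The effect of the two parse trees on the computed value can be seen with a small evaluator. The sketch below is illustrative: the tuple encoding of trees, the name `evaluate`, and the sample values a = 2, b = 3, c = 4 are all assumptions made here.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree, env):
    """Evaluate a parse tree given as a leaf name or a (left, op, right) tuple."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))

tree1 = (("a", "*", "b"), "+", "c")   # from LM derivation #1: (a*b)+c
tree2 = ("a", "*", ("b", "+", "c"))   # from LM derivation #2: a*(b+c)
env = {"a": 2, "b": 3, "c": 4}        # sample values, chosen only for illustration
print(evaluate(tree1, env), evaluate(tree2, env))
```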

Removing Ambiguity in Expression Evaluations

It MAY be possible to remove ambiguity for some CFLs
E.g., in a CFG for expression evaluation, by imposing rules & restrictions such as precedence
This would imply a rewrite of the grammar

Precedence: (), *, +

Ambiguous version:
E ==> E + E | E * E | (E) | a | b | c | 0 | 1

Modified unambiguous version:
E => E + T | T
T => T * F | F
F => I | (E)
I => a | b | c | 0 | 1
How will this avoid ambiguity?
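The layered grammar E => E + T | T, T => T * F | F, F => I | (E), I => a | b | c | 0 | 1 fixes one grouping per string: * binds tighter than +, and both group to the left. The sketch below shows this with a small recursive-descent parser. It is an illustration under a stated assumption: the left-recursive productions are handled with loops (a standard rewriting for hand-written parsers), and the function names are hypothetical.

```python
def parse(expr: str) -> str:
    """Parse expr with the layered grammar; return a fully parenthesized string."""
    tokens = [c for c in expr if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_e():                       # E => E + T | T, grouped to the left
        node = parse_t()
        while peek() == "+":
            take()
            node = f"({node}+{parse_t()})"
        return node

    def parse_t():                       # T => T * F | F, grouped to the left
        node = parse_f()
        while peek() == "*":
            take()
            node = f"({node}*{parse_f()})"
        return node

    def parse_f():                       # F => I | (E)
        tok = peek()
        if tok == "(":
            take()
            node = parse_e()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            take()
            return node
        if tok is not None and tok in "abc01":   # I => a | b | c | 0 | 1
            return take()
        raise SyntaxError(f"unexpected token {tok!r}")

    result = parse_e()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result

print(parse("a * b + c"), parse("a + b * c"))
```

Because a + can only be introduced at the E level and a * only at the T level, the string a * b + c has a single parse tree, the one for (a*b)+c.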

Inherently Ambiguous CFLs

However, for some languages, it may not be possible to remove ambiguity

A CFL is said to be inherently ambiguous if every CFG that describes it is ambiguous
Example:
L = { a^n b^n c^m d^m | n,m ≥ 1 } U { a^n b^m c^m d^n | n,m ≥ 1 }
L is inherently ambiguous
Why?
Input string: a^n b^n c^n d^n

Summary

Context-free grammars
Context-free languages
Productions, derivations, recursive inference,
parse trees
Left-most & right-most derivations
Ambiguous grammars
Removing ambiguity
CFL/CFG applications: parsers, markup languages