2 Grammars

The document discusses the concept of grammars as generative models that produce strings of languages through rewriting rules. It defines formal grammars, their components, and provides examples to illustrate different types of grammars, including context-free and regular grammars. Additionally, it explores the relationships between grammars and automata, highlighting their equivalence and applications in programming language syntax and language recognition.

1

Grammars

• Automata are a model suitable to recognize/accept, translate, and
compute (languages): they “receive” an input string and process it in
various ways
• Let us now consider a generative model:
a grammar produces, or generates, the strings (of a language)
• General notion of a grammar or syntax (“grammar” and “syntax” are
synonymous, as are “alphabet” and “vocabulary”):
a set of rules to build the phrases of a language (strings); it applies to
any notion of a language in the widest possible sense.
• In a way similar to normal linguistic mechanisms, a formal grammar
generates strings of a language through a process of rewriting:
2

• “A phrase is made of a subject followed by a predicate”
“A subject can be a noun or a pronoun, or …”
“A predicate can be a verb followed by a complement …”
• A program consists of a declarative part and an executable part
The declarative part …
The executable part consists of a statement sequence
A statement can be simple or compound
…
3

• An email message consists of a header and a body
The header contains an address, …
• …

• In general, this kind of linguistic rule describes a “main object”
(a book, a program, a message, a protocol, …) as a sequence of
“composing objects” (subject, header, declarative part, …).
Each of these is then “refined” by replacing it with more
detailed objects, and so on, until a sequence of base elements is
obtained (bits, characters, …)
The various rewriting operations can be alternatives: a subject
can be a noun or a pronoun or something else; a statement can
be an assignment, or I/O, ...
4

Formal definition of a grammar

• G = (VN, VT, P, S)
– VN : nonterminal alphabet or vocabulary
– VT : terminal alphabet or vocabulary
– V = VN ∪ VT
– S ∈ VN : a particular element of VN called axiom or initial
(Start) symbol
– P ⊆ VN+ × V* : set of rewriting rules, or productions
P = {(α, β) | α ∈ VN+, β ∈ V*}
for ease of notation, write (α, β) as α → β
to emphasize the action of rewriting
5

Example

• VN = {S, A, B, C, D}
• VT = {a, b, c}
• S
• P = {S → AB, BA → cCD, CBS → ab, A → ε}
6

Relation of Immediate Derivation “⇒”

α ⇒ β, with α ∈ V+, β ∈ V*,
if and only if
α = α1 α2 α3, β = α1 β2 α3, and α2 → β2 ∈ P
(α2 is rewritten as β2 in the context of α1 and α3)

With reference to the previous grammar: applying rule BA → cCD

aaBAS ⇒ aacCDS

As usual, define ⇒*, the reflexive and transitive closure of ⇒:
it means “zero or more rewriting steps”

7
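The immediate-derivation relation is mechanical enough to execute. Below is a minimal Python sketch (the name derive_once and the (lhs, rhs) pair encoding are my own, not from the slides) that returns every string reachable in one rewriting step; applied to aaBAS with the example grammar of the previous slide, it yields aacCDS among the results.

```python
def derive_once(s, productions):
    """Return all strings obtained from s by one application of a
    production (lhs, rhs): every occurrence of lhs gives one result."""
    results = []
    for lhs, rhs in productions:
        i = s.find(lhs)
        while i >= 0:
            results.append(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return results

# The example grammar: P = {S -> AB, BA -> cCD, CBS -> ab, A -> epsilon}
P = [("S", "AB"), ("BA", "cCD"), ("CBS", "ab"), ("A", "")]

# Applying BA -> cCD in the context aa...S gives: aaBAS => aacCDS
assert "aacCDS" in derive_once("aaBAS", P)
```

Note that a single string can admit several immediate derivations (here also aaBAS ⇒ aaBAAB via S → AB, and aaBAS ⇒ aaBS via A → ε); this nondeterminism is what the derivation relation captures.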

Language generated by a grammar

L(G) = {x ∈ VT* | S ⇒* x}

It consists of all strings, containing only terminal symbols, that can be
derived (in any number of steps) from S

NB: not necessarily all derivations lead to a string of terminal symbols
some may “get stuck” (the string is not terminal, but no rule can be applied)
some may be “never ending”
8

A first example
G1 = ({S, A, B}, {a, b, 0},
{S → aA, A → aS, S → bB, B → bS, S → 0}, S)

example derivations:
S ⇒ 0
S ⇒ aA ⇒ aaS ⇒ aa0
S ⇒ bB ⇒ bbS ⇒ bb0
S ⇒ aA ⇒ aaS ⇒ aabB ⇒ aabbS ⇒ aabb0

We can see that:

L(G1) = {aa, bb}* · 0
9
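The claim L(G1) = {aa, bb}* · 0 can be checked by exhaustively exploring derivations. A small Python sketch (function names are mine; the breadth-first search is bounded by string length, which is sound here because no rule of G1 shortens a sentential form):

```python
from collections import deque

def derive_once(s, productions):
    out = []
    for lhs, rhs in productions:
        i = s.find(lhs)
        while i >= 0:
            out.append(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def terminal_strings(productions, start, nonterminals, max_len):
    """BFS over sentential forms; collect terminal strings up to max_len."""
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        for t in derive_once(queue.popleft(), productions):
            if len(t) > max_len or t in seen:
                continue
            seen.add(t)
            if any(n in t for n in nonterminals):
                queue.append(t)   # still contains nonterminals: keep deriving
            else:
                found.add(t)      # a string of L(G1)
    return found

P1 = [("S", "aA"), ("A", "aS"), ("S", "bB"), ("B", "bS"), ("S", "0")]
L1 = terminal_strings(P1, "S", "SAB", 5)
# L1 holds exactly the {aa, bb}-block sequences followed by 0, up to length 5
```

Every derivation of G1 alternates one nonterminal at the right end until S → 0 closes it, which is why the search space stays small.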

Second example

G2 = ({S}, {a, b},
{S → aSb | ab}, S)   (abbreviation for S → aSb, S → ab)

some derivations:
S ⇒ ab
S ⇒ aSb ⇒ aabb
S ⇒ aSb ⇒ aaSbb ⇒ aaabbb

Through an easy generalization:

L(G2) = {a^n b^n | n > 0}
If we replace S → ab with S → ε we get:
L(G2) = {a^n b^n | n ≥ 0}
10
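Since every string of L(G2) has exactly one derivation, it can be produced directly. A tiny Python sketch (the name g2_string is mine) that follows the canonical derivation step by step:

```python
def g2_string(n):
    """Derive a^n b^n in G2 (n > 0): apply S -> aSb exactly n - 1 times,
    then close the derivation with S -> ab."""
    form = "S"
    for _ in range(n - 1):
        form = form.replace("S", "aSb")   # one step S => aSb
    return form.replace("S", "ab")        # final step S => ab

# S => aSb => aaSbb => aaabbb
assert g2_string(3) == "aaabbb"
```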
Third example: G3
{S → aACD, A → aAC, A → ε, B → b, CD → BDc, CB → BC, D → ε}

S ⇒ aACD ⇒ aCD ⇒ aBDc ⇒* abc
S ⇒ aACD ⇒ aaACCD ⇒ aaCCD ⇒ aaCC (stuck: no rule applies)
S ⇒ aACD ⇒ aaACCD ⇒ aaCCD ⇒ aaCBDc ⇒
aaBCDc ⇒ aabCDc ⇒ aabBDcc ⇒ aabbDcc ⇒ aabbcc
S ⇒* aaaACCCD ⇒ aaaCCCD ⇒ aaaCCBDc ⇒ aaaCCbDc ⇒ aaaCCbc (stuck)
...

1. S → aACD, A → aAC and A → ε generate a^n C^n D
2. any x ∈ L includes only terminal symbols, hence nonterminal symbols must disappear
3. C disappears only when it “hits” the D, and then it generates a ‘B’ and a ‘c’
4. C’s and B’s must switch to permit all the C’s to reach the D
5. Hence C^n D ⇒* b^n c^n
6. Hence L = {a^n b^n c^n | n > 0}
11
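Membership in L(G3) can also be checked by bounded search over derivations. A hedged Python sketch (function names are mine; the length cap len(target) + slack is a heuristic assumption: G3's ε-rules let sentential forms briefly exceed the target length, and slack 2 suffices for the small cases below):

```python
from collections import deque

def derive_once(s, productions):
    out = []
    for lhs, rhs in productions:
        i = s.find(lhs)
        while i >= 0:
            out.append(s[:i] + rhs + s[i + len(lhs):])
            i = s.find(lhs, i + 1)
    return out

def derives(productions, start, target, slack=2):
    """Bounded BFS for a derivation start =>* target; the cap keeps the
    search finite, at the price of being a heuristic for longer targets."""
    cap = len(target) + slack
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if s == target:
            return True
        for t in derive_once(s, productions):
            if len(t) <= cap and t not in seen:
                seen.add(t)
                queue.append(t)
    return False

P3 = [("S", "aACD"), ("A", "aAC"), ("A", ""), ("B", "b"),
      ("CD", "BDc"), ("CB", "BC"), ("D", "")]
```

Running derives(P3, "S", "aabbcc") retraces exactly the successful derivation shown above, while strings outside {a^n b^n c^n} exhaust the bounded search space.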

Some “natural” questions

• What is the practical use of grammars (beyond funny
“tricks” like {a^n b^n})?
• What languages can be obtained through grammars?
• What relations exist among grammars and automata
(better: among languages generated by grammars and
languages accepted by automata)?
12

Some answers

• Definition of the syntax of programming languages
• Applications are “dual” w.r.t. automata: grammars
generate languages, whereas automata recognize them
• Simplest example, language compilation: the grammar
defines the language, the automaton accepts and
translates it
13

Classes of grammars

• Context-free (CF) grammars:

– ∀ rule (α → β) ∈ P, |α| = 1, i.e., α is an element A of VN.
– Context-free because the rewriting of α (i.e., of A ∈ VN) does not
depend on its context (the string parts surrounding it do not appear
in the left-hand side of the rule)
– These are in fact the same as the BNF used for defining the syntax
of programming languages (so they are well fit to define typical
features of programming and natural languages, … but not all)
– G1 and G2 above are context-free; not so G3
14
• Regular Grammars:

– ∀ rule (α → β) ∈ P, |α| = 1, β ∈ (VT · VN) ∪ VT

– Regular grammars are also context-free, but not vice versa
– G1 above is regular, not so G2.
– (For the empty string, we also need the rule S → ε)
15

Monotonic Grammars:

Every rule has the form α → β, where |α| ≤ |β|

As before, we need S → ε for the empty string, but then S cannot occur
in right parts of rules

Regular grammars are clearly monotonic,

while CF grammars appear in general not to be monotonic,
because they allow rules A → ε.

On the other hand, we can easily remove such rules:

e.g.: with the rule A → ε, a rule B → abaAcDA can be replaced by the rules
B → abaAcDA | abacDA | abaAcD | abacD

Note: languages defined by monotonic grammars are called
context-sensitive (or contextual) and are recognized by Linear Bounded
Automata (i.e., TMs where the tape length is bounded by the input length)
16
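The ε-rule removal above replaces each right part by all variants with some occurrences of the nullable nonterminal erased. A short Python sketch (the name expand_nullable is mine; it assumes one-character symbols, as in the slides):

```python
from itertools import combinations

def expand_nullable(rhs, nullable):
    """All variants of rhs with any subset of the occurrences of the
    nullable nonterminal deleted, compensating for dropping
    nullable -> epsilon."""
    positions = [i for i, sym in enumerate(rhs) if sym == nullable]
    variants = set()
    for k in range(len(positions) + 1):
        for dropped in combinations(positions, k):
            variants.add("".join(sym for i, sym in enumerate(rhs)
                                 if i not in dropped))
    return variants

# The slide's example: B -> abaAcDA becomes four alternatives
assert expand_nullable("abaAcDA", "A") == {"abaAcDA", "abacDA",
                                           "abaAcD", "abacD"}
```

With two occurrences of A there are 2^2 = 4 variants; in general a right part with k occurrences expands to 2^k alternatives.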

Relations among grammars and languages (Chomsky Hierarchy)

General or unrestricted g. (type 0)
⊇ Monotonic g. (type 1)
⊇ CF grammars (type 2)
⊇ Regular g. (type 3)

(each class contains the next)
17

We can immediately deduce that:

L3 ⊆ L2 ⊆ L1 ⊆ L0

Are the containments strict?

The answer is given by the relation with automata

18
Relations between grammars and automata
(with few surprises)
• Define “equivalence” between RG and FSA
(i.e., the FSA accepts the same language that the RG generates)
– From FSA to RG: given A = (Q, I, δ, q0, F), let VN = Q, VT = I, S = <q0>, and,
for each δ(q, i) = q′ let <q> → i<q′> and, if q′ ∈ F, add <q> → i
– It is an easy intuition (proved by induction) that
δ*(q, x) = q′ iff <q> ⇒* x<q′>, and hence, if q′ ∈ F, <q> ⇒* x
• Vice versa, from RG to FSA:
– Given a RG, let Q = VN ∪ {qF}, I = VT, q0 = S, qF ∈ F, and,
for each A → bC let δ(A, b) ∋ C
for each A → b let δ(A, b) ∋ qF

The FSA thus obtained is nondeterministic (why?): much easier!

19
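The RG-to-FSA direction is short enough to execute. A minimal Python sketch (assuming one-character terminals and nonterminals, and a fresh final-state name "F" not used by the grammar; function names are mine, not the slides'):

```python
def rg_to_nfa(productions):
    """Transition table of the NFA built from a regular grammar:
    A -> bC puts C in delta(A, b); A -> b puts the final state 'F' there."""
    delta = {}
    for lhs, rhs in productions:
        target = rhs[1] if len(rhs) == 2 else "F"
        delta.setdefault((lhs, rhs[0]), set()).add(target)
    return delta

def accepts(delta, start, x):
    """Run the nondeterministic automaton on x by tracking the whole set
    of reachable states; accept if the final state is among them."""
    states = {start}
    for c in x:
        states = set().union(*(delta.get((q, c), set()) for q in states))
    return "F" in states

# G1 from the first example: L(G1) = {aa, bb}* . 0
delta1 = rg_to_nfa([("S", "aA"), ("A", "aS"), ("S", "bB"),
                    ("B", "bS"), ("S", "0")])
assert accepts(delta1, "S", "aabb0")
```

Tracking the set of reachable states is the usual subset trick for running an NFA directly, which is exactly why the nondeterministic construction is “much easier”: no determinization is needed to define it.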

• CFGs are equivalent to PDAs (nondeterministic!)

intuitive justification (no proof:
the proof is the “heart” of compiler construction)
20

[PDA for S → aSb | ab, reconstructed from the diagram:
state q0, on ε, Z0 / Z0 S, go to q1 (push the axiom S)
state q1, loops: ε, S / b a and ε, S / b S a (expand S; pushed so that a ends on top)
a, a / ε and b, b / ε (match input against the stack top)
state q1, on ε, Z0 / ε, go to q2 (accept)]
21
S → aSb | ab
S ⇒ aSb ⇒ aabb
[Stack evolution while reading a a b b: from Z0 S, expand to Z0 b S a;
match a, leaving Z0 b S; expand to Z0 b b a; match a, then b, then b,
leaving Z0 alone, so ε, Z0 / ε accepts]
22
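The nondeterministic PDA above can be simulated by backtracking over its ε-choices. A hedged Python sketch (the function name and list-as-stack encoding are mine; acceptance mirrors the ε, Z0 / ε move to q2):

```python
def pda_accepts(x):
    """Backtracking simulation of the nondeterministic PDA for
    S -> aSb | ab: either expand the topmost S (two choices, pushed so
    that 'a' ends on top) or match input against the stack top; accept
    when the input is consumed and only Z0 is left on the stack."""
    def run(i, stack):
        if i == len(x) and stack == ["Z0"]:
            return True              # epsilon, Z0 / epsilon: accept
        if not stack:
            return False
        top = stack[-1]
        if top == "S":               # nondeterministic choice: try both
            return (run(i, stack[:-1] + ["b", "a"]) or
                    run(i, stack[:-1] + ["b", "S", "a"]))
        if i < len(x) and top == x[i]:
            return run(i + 1, stack[:-1])   # a, a / eps or b, b / eps
        return False
    return run(0, ["Z0", "S"])       # after the initial eps, Z0 / Z0 S

assert pda_accepts("aabb") and not pda_accepts("abab")
```

The or between the two recursive calls is where the machine's nondeterminism becomes backtracking: a failed expansion of S is simply undone and the other production is tried.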

General grammars are equivalent to TMs

• Given G, let us construct (in broad lines) a nondeterministic TM M
accepting L(G):
– The input string x is on the input tape
– Loop:

The memory tape (which, in general, will contain a string γ, γ ∈ V*)
is scanned searching for the right part β of some α → β ∈ P

When one is found (not necessarily the first one: M operates a
nondeterministic choice), β is substituted by the corresponding left part α
23
– This way: γ ⇒ δ iff c = <q, δ> ⊢* <q, γ>
– The TM follows the derivation relation ⇒ from right to left

If and when the tape holds the axiom S, x is accepted
(i.e., <q0, x> ⊢* <qF, S>)

Otherwise this particular computation of the nondeterministic TM does
not lead to acceptance.
Note: for monotonic grammars, we use only the memory
originally occupied by x, so we do not need other memory cells (Linear
Bounded Automata), with the only exception of the empty string
24

• Using a nondeterministic TM facilitates the construction but is not necessary

• It is instead possible (and, we will see, unavoidable) that, if x ∉
L(G), M “tries an infinite number of ways”, some of which might
never terminate, without being able (rightly) to conclude that x ∈
L(G), but not even the opposite.

• This is consistent with the definition of acceptance, which requires M
to reach an accepting configuration if and only if x ∈ L, but does not
require M to terminate its computation without accepting (i.e., in a
“rejecting state”) if x ∉ L
25
• Given M (single-tape, for ease of reasoning and without loss of
generality), we define (in broad lines) a G generating L(M):
– First, G generates all strings of the type
x$X, x ∈ VT*, X being a “copy of x” composed of nonterminal
symbols (e.g., for x = aba, x$X = aba$ABA)
– G simulates the successive configurations of M using the string on the
right of $
– G is defined in a way such that it has a derivation x$X ⇒* x, with
x ∈ VT*, if and only if x is accepted by M.
– The idea: simulate each move of M by an immediate derivation of G
26

Example grammar for strings of the form x$X, with alphabet {a, b}:

S → S A′ A    S → S B′ B
S → $
A A′ → A′ A    A B′ → B′ A
B A′ → A′ B    B B′ → B′ B
$ A′ → a$    $ B′ → b$

• The objective is to generate x$X and then a derivation
x$X ⇒* x iff x is accepted by M.
• The main idea is to simulate every move of M with an
immediate derivation of G:
27
– We represent the configuration
… blank … a B A C b … blank …
(head on A, internal state q)

• through the string (special cases are left as an exercise): $aBqACb

• to start, G has therefore a set of derivations of the kind x$X ⇒* x$q0X (where q0X
encodes the initial configuration of M)
• for each value of the transition function δ of M, a rule of G is defined:
– δ(q, A) = <q′, A′, R>: G includes the rule qA → A′q′
– δ(q, A) = <q′, A′, S>: G includes the rule qA → q′A′
– δ(q, A) = <q′, A′, L>: G includes the rules BqA → q′BA′,
∀ B in the alphabet of M (recall that M is single-tape, hence it has a unique
alphabet for input, memory, and output)
28
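The three transition-to-production cases can be written down mechanically. A small Python sketch (the name rules_for_move and the plain-string encoding of states and tape symbols are illustrative assumptions, following the scheme above):

```python
def rules_for_move(q, A, q2, A2, move, tape_alphabet):
    """Grammar productions simulating one TM transition
    delta(q, A) = <q2, A2, move>, for move in {R, S, L}."""
    if move == "R":
        return [(q + A, A2 + q2)]            # qA -> A'q'
    if move == "S":
        return [(q + A, q2 + A2)]            # qA -> q'A'
    # move == "L": one rule per possible symbol B left of the head
    return [(B + q + A, q2 + B + A2) for B in tape_alphabet]

# Right move: delta(q, A) = <p, X, R> gives the single rule qA -> Xp
assert rules_for_move("q", "A", "p", "X", "R", "ab") == [("qA", "Xp")]
# Left move: one rule for every possible left neighbour B
assert rules_for_move("q", "A", "p", "X", "L", "ab") == [
    ("aqA", "paX"), ("bqA", "pbX")]
```

Only the left move needs the neighbour B in the left-hand side, because the state marker must jump over a symbol the rule cannot otherwise see; this is also why the resulting grammar is not context-free.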

– This way, for instance, the right move rewriting A as A′ takes the
configuration
… a B A C b … (head on A, state q)
to
… a B A′ C b … (head on C, state q′)
– if and only if: x$aBqACb ⇒ x$aBA′q′Cb
– etc. …
– We finally add productions allowing G to derive from
x$aBqFACb a unique x if (and only if) M reaches an
accepting configuration (aBqFACb), by deleting whatever is
to the right of $, and also $ itself
