
Chapter 2 Grammars

Chapter 2 discusses grammars as formal systems for language generation, defining grammars as quadruplets consisting of terminal and non-terminal vocabularies, an axiom, and production rules. It covers derivation processes, including leftmost and rightmost derivations, and introduces the concept of derivation trees as a representation of the derivation sequence. Additionally, it explores the classification of grammars and languages, emphasizing the equivalence of grammars that generate the same language.

Uploaded by

Moussab Chibani

Chapter 2 : Grammars

Chapter 2 outline :

1. Grammar (language generation system)


2. Formal definition of grammars
3. Derivation and generated language
4. Derivation tree
5. Classification of grammars (Chomsky hierarchy)
6. Classification of languages

1. Grammar (language generation system)


How can we describe a language formally so that a computer can process it ?
- A grammar is a general formalism for describing a language.
- It is based on a generative mechanism capable of producing all the words of a given language.
- The main advantage of a grammar is that it exposes the structure of the language : a grammar describes how the words of the language are constructed.

2. Formal definition of grammars


A grammar is a quadruple 𝐺 = (𝑉𝑇 , 𝑉𝑁 , 𝑆, 𝑃) where :
- 𝑉𝑇 : the terminal vocabulary, a non-empty set. It is the vocabulary of the language (the alphabet on which the language is defined) ;
- 𝑉𝑁 : the non-terminal vocabulary. These are intermediate symbols used to produce new objects (symbols that still need to be defined) ;
- 𝑆 ∈ 𝑉𝑁 : a particular non-terminal called the axiom ;
The total vocabulary 𝑉 is partitioned into 𝑉𝑁 and 𝑉𝑇 , so we have : 𝑉𝑁 ∩ 𝑉𝑇 = ∅ and 𝑉𝑁 ∪ 𝑉𝑇 = 𝑉
- 𝑃 : a set of production or rewriting rules. Each rule is of the form 𝛼 → 𝛽 with
𝛼 ∈ (VN ∪ VT )+ and 𝛽 ∈ (VN ∪ VT )∗ ;
A production rule 𝛼 → 𝛽 specifies that the sequence of symbols 𝛼 can be replaced by the
sequence of symbols 𝛽 ;

Module : Language theory 2nd Year Bachelor of Computer Science – 4th Semester Module supervisor : Amiar Lotfi

Note that 𝛼 → 𝛽 is a production rule, not a derivation : a derivation (section 3) is the application of one or more rules to a sequence of symbols ;
𝛼 is called the left member and 𝛽 the right member of the rule ;
A word is obtained by rewriting non-terminals until only terminals remain.
There are two types of rules :
o Non-recursive rules : 𝐴 → 𝑏.
o Recursive rules : right-recursive (𝐴 → 𝑏𝐴) and left-recursive (𝐴 → 𝐴𝑏).
Each recursive non-terminal must also have a non-recursive rule, so that the recursion can terminate.

Notation : There are several ways to write a production rule :


a) 𝛼 → 𝛽
b) 𝛼 ::= 𝛽 : Backus notation
c) < 𝛼 > ∷= < 𝛽 > : BNF notation (Backus-Naur Form, originally called Backus Normal Form)

Conventions :
a) For the elements of 𝑉𝑇 , we use lowercase letters, numbers, and other symbols.
b) For the elements of 𝑉𝑁 , we only use capital letters.
c) (𝛼 → 𝛽1 , 𝛼 → 𝛽2 , … , 𝛼 → 𝛽𝑛 ) ⇔ (𝛼 → 𝛽1 |𝛽2 | … |𝛽𝑛 ) .
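The quadruple above can be written down as a small data structure. The following is a minimal sketch, not a reference implementation : the class name `Grammar`, the method `check` and the helper `expand` are hypothetical names chosen for illustration, symbols are assumed to be single characters, and 𝜀 is represented by the empty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # V_T
    nonterminals: frozenset   # V_N
    axiom: str                # S
    rules: tuple              # P, stored as (left, right) pairs of strings

    def check(self):
        """Return the list of violated conditions (empty if well formed)."""
        problems = []
        vocabulary = self.terminals | self.nonterminals
        if not self.terminals:
            problems.append("V_T must be non-empty")
        if self.terminals & self.nonterminals:
            problems.append("V_N and V_T must be disjoint")
        if self.axiom not in self.nonterminals:
            problems.append("the axiom must belong to V_N")
        for left, right in self.rules:
            if left == "":
                problems.append("the left member must not be empty")
            if not set(left + right) <= vocabulary:
                problems.append(f"unknown symbol in {left} -> {right}")
        return problems


def expand(alternatives):
    """Convention c): turn {alpha: [b1, b2]} (alpha -> b1 | b2) into separate rules."""
    return tuple((left, right)
                 for left, rights in alternatives.items()
                 for right in rights)


# The grammar with the single rule line S -> aS | epsilon.
g1 = Grammar(frozenset("a"), frozenset("S"), "S", expand({"S": ["aS", ""]}))
print(g1.rules)    # (('S', 'aS'), ('S', ''))
print(g1.check())  # []
```

Storing the rules as pairs of strings keeps the sketch close to the notation 𝛼 → 𝛽 ; a representation with multi-character symbols would need lists of symbols instead.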

3. Derivation and generated language



Definition 1 (derivation) : We define the derivation relation, denoted ⇒*, on the set 𝑉∗ × 𝑉∗ by : 𝛼1 ⇒* 𝛼𝑚 (which is read as : 𝛼1 derives into 𝛼𝑚) if and only if there exist 𝛼2 , … , 𝛼𝑚−1 in 𝑉∗ such that :
𝛼1 ⟹ 𝛼2 ⟹ ⋯ ⟹ 𝛼𝑚−1 ⟹ 𝛼𝑚 .
We can specify the number 𝑘 of derivation steps by writing ⇒^𝑘 (𝛼1 ⇒^𝑘 𝛼𝑚).
The direct derivation relation is denoted by ⟹ (𝛼𝑖 ⟹ 𝛼𝑗) : 𝛼𝑗 is obtained from 𝛼𝑖 by applying a single rule of 𝑃 once.
Remarks :

- 𝑆 ⇒* 𝑤 : is a derivation sequence. We then say that there is a derivation chain leading from 𝑆 to 𝑤.
- Let 𝐺 = (𝑉𝑇 , 𝑉𝑁 , 𝑆, 𝑃) be a grammar. We say that the word 𝑤 belonging to 𝑉𝑇∗ is derived (or generated) from 𝐺 if there exists a derivation sequence that, starting from the axiom 𝑆, leads to 𝑤 : 𝑆 ⇒* 𝑤.


- We apply the production rules to construct words. There are two standard ways to derive a word from the axiom :
• The leftmost derivation : at each step, we rewrite the leftmost non-terminal.
• The rightmost derivation : at each step, we rewrite the rightmost non-terminal.
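The difference between the two orders can be seen on a small case. The sketch below uses a hypothetical grammar that does not come from these notes (𝑆 → 𝐴𝐵, 𝐴 → 𝑎, 𝐵 → 𝑏), chosen because it has two non-terminals side by side ; the function `derive` is an illustrative name and always applies the first alternative of the chosen non-terminal.

```python
# Hypothetical grammar (not from the notes): S -> AB, A -> a, B -> b.
RULES = {"S": ["AB"], "A": ["a"], "B": ["b"]}


def derive(axiom, rules, leftmost=True):
    """Rewrite the leftmost (or rightmost) non-terminal at each step.

    Assumption: the first alternative of each non-terminal is always used,
    which terminates here because the grammar has no recursive rule.
    """
    form, chain = axiom, [axiom]
    while any(symbol in rules for symbol in form):
        positions = [i for i, symbol in enumerate(form) if symbol in rules]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + rules[form[i]][0] + form[i + 1:]
        chain.append(form)
    return chain


print(" => ".join(derive("S", RULES)))                  # S => AB => aB => ab
print(" => ".join(derive("S", RULES, leftmost=False)))  # S => AB => Ab => ab
```

Both sequences produce the same word ; only the order in which the non-terminals are rewritten differs.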

Definition 2 (language generated by a grammar) : The language of all the words generated by the grammar 𝐺 is denoted 𝐿(𝐺), with 𝐿(𝐺) = {𝑤 ∈ 𝑉𝑇∗ | 𝑆 ⇒* 𝑤}, and we say that the language 𝐿(𝐺) is generated by the grammar 𝐺.


Note that an expression derived from the axiom is considered to belong to 𝐿(𝐺) only if it does not contain any non-terminal.

Definition 3 (word generated by a grammar) : To show that a word 𝑤 can be generated by a grammar, it is necessary to find a sequence of derivations starting from the axiom 𝑆 and ending with 𝑤 : 𝑆 ⇒* 𝑤.

Example 1 :
Let 𝐺1 = (𝑉𝑇 , 𝑉𝑁 , 𝑆, 𝑃) with : 𝑉𝑇 = {𝑎}, 𝑉𝑁 = {𝑆}, axiom 𝑆 and 𝑃 = {𝑆 → 𝑎𝑆 | 𝜀}
Give :
a) the words generated by this grammar,
b) the general form of this language generated by 𝐺1,
c) the derivation sequence of the word 𝑎𝑎𝑎𝑎𝑎.

Solution to Example 1 :
a) Words generated by this grammar :
The derivation sequence 𝑆 ⟹ 𝜀 generates the word 𝑤 = 𝜀
The derivation sequence 𝑆 ⟹ 𝑎𝑆 ⟹ 𝑎𝜀 = 𝑎 generates the word 𝑤 = 𝑎
The derivation sequence 𝑆 ⟹ 𝑎𝑆 ⟹ 𝑎𝑎𝑆 ⟹ 𝑎𝑎𝜀 = 𝑎𝑎 generates the word 𝑤 = 𝑎𝑎
The derivation sequence 𝑆 ⟹ 𝑎𝑆 ⟹ 𝑎𝑎𝑆 ⟹ 𝑎𝑎𝑎𝑆 ⟹ 𝑎𝑎𝑎𝜀 = 𝑎𝑎𝑎 generates the word 𝑤 = 𝑎𝑎𝑎

Then the words that can be generated by this grammar are :
{𝜀, 𝑎, 𝑎𝑎, 𝑎𝑎𝑎, 𝑎𝑎𝑎𝑎, 𝑎𝑎𝑎𝑎𝑎, … }
b) 𝐿(𝐺1) = {𝑎^𝑛 | 𝑛 ≥ 0}.
c) The derivation sequence for the word 𝑎𝑎𝑎𝑎𝑎 is given by :
𝑆 ⟹ 𝑎𝑆 ⟹ 𝑎𝑎𝑆 ⟹ 𝑎𝑎𝑎𝑆 ⟹ 𝑎𝑎𝑎𝑎𝑆 ⟹ 𝑎𝑎𝑎𝑎𝑎𝑆 ⟹ 𝑎𝑎𝑎𝑎𝑎𝜀 = 𝑎𝑎𝑎𝑎𝑎

Then 𝑆 ⇒* 𝑎𝑎𝑎𝑎𝑎.
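The pattern of this solution (apply 𝑆 → 𝑎𝑆 exactly 𝑛 times, then 𝑆 → 𝜀 once) can be sketched in a few lines. The function name `derive_a_power` is a hypothetical choice, and 𝜀 is represented by the empty string.

```python
def derive_a_power(n):
    """Derivation sequence of a^n in G1: n times S -> aS, then S -> epsilon."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "aS", 1))  # rule S -> aS
    forms.append(forms[-1].replace("S", "", 1))        # rule S -> epsilon
    return forms


print(" => ".join(form or "ε" for form in derive_a_power(5)))
# S => aS => aaS => aaaS => aaaaS => aaaaaS => aaaaa
print(len(derive_a_power(5)) - 1)  # 6
```

The word 𝑎^𝑛 is therefore obtained in 𝑛 + 1 derivation steps, which agrees with the six steps needed for 𝑎𝑎𝑎𝑎𝑎.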


Example 2 :
Let 𝐺2 = (𝑉𝑇 , 𝑉𝑁 , 𝑆, 𝑃) with : 𝑉𝑇 = {𝑎, 𝑏}, 𝑉𝑁 = {𝑆, 𝐴}, axiom 𝑆 and 𝑃 = {𝑆 → 𝑎𝑆 | 𝑎𝐴 ; 𝐴 → 𝑏𝐴 | 𝑏}
Give:
a) the words generated by this grammar,
b) the general form of this language generated by 𝐺2,
c) the derivation sequence of the word 𝑎𝑎𝑏𝑏𝑏.

Solution to Example 2 :
a) Words generated by this grammar :
The derivation sequence 𝑆 ⟹ 𝑎𝐴 ⟹ 𝑎𝑏 generates the word 𝑤 = 𝑎𝑏
The derivation sequence 𝑆 ⟹ 𝑎𝑆 ⟹ 𝑎𝑎𝐴 ⟹ 𝑎𝑎𝑏 generates the word 𝑤 = 𝑎𝑎𝑏
The derivation sequence 𝑆 ⟹ 𝑎𝑆 ⟹ 𝑎𝑎𝑆 ⟹ 𝑎𝑎𝑎𝐴 ⟹ 𝑎𝑎𝑎𝑏 generates the word 𝑤 = 𝑎𝑎𝑎𝑏
The derivation sequence 𝑆 ⟹ 𝑎𝐴 ⟹ 𝑎𝑏𝐴 ⟹ 𝑎𝑏𝑏 generates the word 𝑤 = 𝑎𝑏𝑏
The derivation sequence 𝑆 ⟹ 𝑎𝐴 ⟹ 𝑎𝑏𝐴 ⟹ 𝑎𝑏𝑏𝐴 ⟹ 𝑎𝑏𝑏𝑏 generates the word 𝑤 = 𝑎𝑏𝑏𝑏

Then the words that can be generated by this grammar are :
{𝑎𝑏, 𝑎𝑎𝑏, 𝑎𝑎𝑎𝑏, 𝑎𝑎𝑎…𝑎𝑏, 𝑎𝑏𝑏, 𝑎𝑏𝑏𝑏, 𝑎𝑏𝑏𝑏…𝑏, 𝑎…𝑎𝑏…𝑏}
b) 𝐿(𝐺2) = {𝑎^𝑛 𝑏^𝑚 | 𝑛 ≥ 1 and 𝑚 ≥ 1}.
c) The derivation sequence for the word 𝑎𝑎𝑏𝑏𝑏 is given by :
𝑆 ⟹ 𝑎𝑆 ⟹ 𝑎𝑎𝐴 ⟹ 𝑎𝑎𝑏𝐴 ⟹ 𝑎𝑎𝑏𝑏𝐴 ⟹ 𝑎𝑎𝑏𝑏𝑏

Then 𝑆 ⇒* 𝑎𝑎𝑏𝑏𝑏.
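Finding such a derivation sequence can be automated for small grammars. The following sketch searches breadth-first over leftmost derivations of 𝐺2 ; the function name `find_derivation` is hypothetical, and the search relies on two assumptions that hold for 𝐺2 but not for every grammar : each left member is a single capital letter, and no rule shortens the current form (so forms longer than the target can be discarded).

```python
from collections import deque

# Rules of G2: S -> aS | aA ; A -> bA | b
G2_RULES = [("S", "aS"), ("S", "aA"), ("A", "bA"), ("A", "b")]


def find_derivation(rules, axiom, target):
    """Return a leftmost derivation axiom => ... => target, or None."""
    parent = {axiom: None}
    queue = deque([axiom])
    while queue:
        form = queue.popleft()
        if form == target:
            chain = []
            while form is not None:       # walk back to the axiom
                chain.append(form)
                form = parent[form]
            return chain[::-1]
        # Position of the leftmost non-terminal (capital letter), if any.
        pos = next((i for i, s in enumerate(form) if s.isupper()), None)
        if pos is None:
            continue                      # a word different from the target
        for left, right in rules:
            if form[pos] == left:
                new = form[:pos] + right + form[pos + 1:]
                # Assumption: no rule shortens a form, so longer forms are useless.
                if len(new) <= len(target) and new not in parent:
                    parent[new] = form
                    queue.append(new)
    return None


print(" => ".join(find_derivation(G2_RULES, "S", "aabbb")))
# S => aS => aaA => aabA => aabbA => aabbb
print(find_derivation(G2_RULES, "S", "ba"))  # None
```

A result of `None` means that the word has no derivation, so it does not belong to 𝐿(𝐺2).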

Definition 4 (Equivalent grammars) : Two grammars 𝐺 and 𝐺′ are said to be equivalent, denoted 𝐺 ≡ 𝐺′, if they generate the same language : 𝐺 ≡ 𝐺′ ⇔ 𝐿(𝐺) = 𝐿(𝐺′)
Remarks :
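- A grammar defines a single language.
- On the other hand, the same language can be generated by several different grammars. For example, the grammar with the rules 𝑆 → 𝑆𝑎 | 𝜀 and the grammar with the rules 𝑆 → 𝑎𝑆 | 𝜀 both generate the words 𝜀, 𝑎, 𝑎𝑎, 𝑎𝑎𝑎, … , so they are equivalent.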

4. Derivation tree
Let 𝐺 = (𝑉𝑇 , 𝑉𝑁 , 𝑆, 𝑃) be a grammar. The derivation tree is another representation of the derivation sequence 𝑆 ⇒* 𝑤 that leads from the axiom 𝑆 to the word 𝑤. This representation abstracts from the order in which the grammar rules are applied and helps to understand the syntax of the word considered.
The derivation tree of a word relative to a grammar 𝐺 is a tree such that :
a) the root is labeled by the axiom 𝑆,


b) each internal node is labeled by a non-terminal symbol 𝐴 of 𝑉𝑁 . If its 𝑛 children are respectively labeled with 𝛼1 , 𝛼2 , … , 𝛼𝑛 , then :
𝐴 → 𝛼1 𝛼2 … 𝛼𝑛
must be a production rule of 𝑃.
c) each leaf node is labeled by a terminal symbol of 𝑉𝑇 , or by 𝜀 when the rule applied is 𝐴 → 𝜀.
Remarks :
- The word represented by the derivation tree is reconstructed by reading the leaves from
left to right.
- To show that a word 𝑤 can be generated by a grammar 𝐺, we need to find a derivation
tree that generates 𝑤.
Example 3 :
The word 𝑤 = 𝑎𝑎𝑎𝑎𝑎 in Example 1 is represented by a derivation tree as follows :
𝑆
├─ 𝒂
└─ 𝑆
   ├─ 𝒂
   └─ 𝑆
      ├─ 𝒂
      └─ 𝑆
         ├─ 𝒂
         └─ 𝑆
            ├─ 𝒂
            └─ 𝑆
               └─ 𝜺
Fig.1. Derivation tree for word 𝑎𝑎𝑎𝑎𝑎
Example 4 :
The word 𝑤 = 𝑎𝑎𝑏𝑏𝑏 in Example 2 is represented by a derivation tree as follows :

𝑆
├─ 𝒂
└─ 𝑆
   ├─ 𝒂
   └─ 𝐴
      ├─ 𝒃
      └─ 𝐴
         ├─ 𝒃
         └─ 𝐴
            └─ 𝒃
Fig.2. Derivation tree for word 𝑎𝑎𝑏𝑏𝑏
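The three conditions on a derivation tree, and the reading of the leaves from left to right, can be checked mechanically. This is a minimal sketch under stated assumptions : a tree is encoded as a pair `(label, children)`, a leaf is a plain string (the empty string stands for 𝜀), terminals are lowercase letters as in the conventions of section 2, and the names `tree_yield` and `is_derivation_tree` are hypothetical.

```python
def label(node):
    """Label of a node: a leaf is its own label, an internal node stores it first."""
    return node if isinstance(node, str) else node[0]


def tree_yield(node):
    """Word represented by the tree: the leaves read from left to right."""
    if isinstance(node, str):
        return node
    return "".join(tree_yield(child) for child in node[1])


def is_derivation_tree(node, rules, axiom):
    """Check conditions a), b) and c) for a tree relative to a set of rules."""
    def valid(n):
        if isinstance(n, str):                       # c) leaf: terminal or epsilon
            return n == "" or n.islower()
        head, children = n
        right = "".join(label(child) for child in children)
        # b) the children of an internal node spell the right member of a rule
        return (head, right) in rules and all(valid(child) for child in children)

    # a) the root is an internal node labeled by the axiom
    return not isinstance(node, str) and label(node) == axiom and valid(node)


G2_RULES = {("S", "aS"), ("S", "aA"), ("A", "bA"), ("A", "b")}
# Tree for aabbb: S(a, S(a, A(b, A(b, A(b)))))
TREE = ("S", ["a", ("S", ["a", ("A", ["b", ("A", ["b", ("A", ["b"])])])])])

print(is_derivation_tree(TREE, G2_RULES, "S"))  # True
print(tree_yield(TREE))                         # aabbb
```

The check does not depend on the order in which the rules were applied, which is exactly what the tree abstracts away.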


5. Classification of grammars (Chomsky hierarchy)
By introducing more or less restrictive criteria on the form (or nature) of grammar rules, we obtain hierarchical classes of grammars, ordered by inclusion. The classification of grammars, defined in 1957 by Noam Chomsky, distinguishes the following four classes. A grammar 𝐺 = (𝑉𝑇 , 𝑉𝑁 , 𝑆, 𝑃) is said to be :

Type 3 (regular grammar or linear grammar) :


- Right regular (right-linear), if all the rules of 𝑃 are of the form :
𝐴 → 𝛼𝐵 | 𝛼 with 𝐴, 𝐵 ∈ 𝑉𝑁 and 𝛼 ∈ 𝑉𝑇∗
- Left regular (left-linear), if all the rules of 𝑃 are of the form :
𝐴 → 𝐵𝛼 | 𝛼 with 𝐴, 𝐵 ∈ 𝑉𝑁 and 𝛼 ∈ 𝑉𝑇∗

In other words, the left-hand member of each rule consists of a single non-terminal symbol, and the right-hand member consists of a (possibly empty) sequence of terminal symbols and at most one non-terminal symbol.

For right regular grammars, the non-terminal symbol must always be at the right end of the right-hand member, whereas for left regular grammars it must be at the left end. The two forms must not be mixed within the same grammar.

Type 2 (context-free grammar, algebraic grammar or Chomsky grammar) :

If all the rules of 𝑃 are of the form :


𝐴 → 𝛼 with 𝐴 ∈ 𝑉𝑁 and 𝛼 ∈ ( 𝑉𝑇 ∪ 𝑉𝑁 )∗
In other words, the left-hand member of each rule consists of a single non-terminal symbol, and there is no restriction on the right-hand member.


Type 1 (context sensitive grammar or contextual grammar) :


If all the rules of 𝑃 are of the form :
𝛼𝐴𝛽 → 𝛼𝑤𝛽 with 𝛼, 𝛽 ∈ ( 𝑉𝑇 ∪ 𝑉𝑁 )∗ , 𝐴 ∈ 𝑉𝑁 and 𝑤 ∈ ( 𝑉𝑇 ∪ 𝑉𝑁 )+

In other words, the non-terminal symbol 𝐴 is replaced by the non-empty sequence 𝑤 when it appears with the context 𝛼 on its left and 𝛽 on its right. The only exception concerns the empty word : only the axiom can generate it ((𝑆 → 𝜀) ∈ 𝑃), and in this case 𝑆 must not appear in the right-hand member of any rule.

We can also find the following definition of type 1 grammars :

All rules are of the form 𝛼 → 𝛽 such that 𝛼 ∈ ( 𝑉𝑇 ∪ 𝑉𝑁 )+ , 𝛽 ∈ ( 𝑉𝑇 ∪ 𝑉𝑁 )∗ and |𝛼| ≤ |𝛽|. The only rule allowed to have 𝜀 on its right side is 𝑆 → 𝜀, where 𝑆 is the axiom ; no other rule may produce 𝜀.

Type 0 (unrestricted grammar or grammar without restriction) :


No restrictions on rules :
𝛼 → 𝛽 with 𝛼 ∈ ( 𝑉𝑇 ∪ 𝑉𝑁 )+ and 𝛽 ∈ ( 𝑉𝑇 ∪ 𝑉𝑁 )∗
Remarks :
- There is an inclusion relationship between these four types of grammars. In other words,
we have : 𝑡𝑦𝑝𝑒 3 ⊆ 𝑡𝑦𝑝𝑒 2 ⊆ 𝑡𝑦𝑝𝑒 1 ⊆ 𝑡𝑦𝑝𝑒 0.

Type 0 : unrestricted grammar

Type 1 : context sensitive grammar

Type 2 : context free grammar

Type 3 : regular grammar

Fig. 3 – Chomsky hierarchy


- The type retained for a grammar is that of the most restrictive class whose conditions it satisfies, that is, the smallest class in the sense of inclusion (the highest type number).
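The rule for choosing the type can be sketched as a sequence of tests, from the most restrictive class to the least restrictive one. This is only an illustration under stated assumptions : symbols are single characters, 𝜀 is the empty string, type 1 is tested with the length-based definition (|𝛼| ≤ |𝛽|, with the exception 𝑆 → 𝜀), and the function name `grammar_type` is hypothetical.

```python
def grammar_type(rules, nonterminals, axiom):
    """Return 3, 2, 1 or 0: the most restrictive class satisfied by the rules."""
    def terminals_only(part):
        return all(symbol not in nonterminals for symbol in part)

    # Types 3 and 2 both need a single non-terminal as left member.
    single_left = all(left in nonterminals for left, _ in rules)
    # Right regular: a non-terminal may only be the last symbol of the right member.
    right_regular = single_left and all(terminals_only(r[:-1]) for _, r in rules)
    # Left regular: a non-terminal may only be the first symbol of the right member.
    left_regular = single_left and all(terminals_only(r[1:]) for _, r in rules)
    if right_regular or left_regular:
        return 3
    if single_left:
        return 2

    axiom_on_right = any(axiom in right for _, right in rules)

    def monotone(left, right):
        if (left, right) == (axiom, ""):      # S -> epsilon is the only exception
            return not axiom_on_right
        return 0 < len(left) <= len(right)

    return 1 if all(monotone(left, right) for left, right in rules) else 0


print(grammar_type([("S", "aS"), ("S", "")], {"S"}, "S"))     # 3
print(grammar_type([("S", "aSb"), ("S", "")], {"S"}, "S"))    # 2
print(grammar_type([("S", "aSb"), ("Sb", "b")], {"S"}, "S"))  # 0
```

The tests are tried in the order 3, 2, 1, 0, so the first one that succeeds gives the highest type number.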

6. Classification of languages
The classification of grammars above will allow for the categorization of languages according
to the type of grammars necessary for their generation. A language that can be generated by a
type 𝑖 grammar but not by a grammar of a higher type in the hierarchy, will be called a type 𝑖
language.
Each type of grammar is associated with a type of language :


- 𝑡𝑦𝑝𝑒 3 grammars generate regular (or rational) languages ;


- 𝑡𝑦𝑝𝑒 2 grammars generate context-free (algebraic or non-contextual) languages ;
- 𝑡𝑦𝑝𝑒 1 grammars generate context sensitive (or contextual) languages ;
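- 𝑡𝑦𝑝𝑒 0 grammars generate the recursively enumerable (or semi-decidable) languages, in other words, all languages whose words can be accepted in a finite time by a Turing machine ; on a word that does not belong to the language, the machine may run forever. Languages that cannot be generated by any 𝑡𝑦𝑝𝑒 0 grammar are not recursively enumerable.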

Remarks :
- A language can be generated by different grammars that can be of different types. The type of a language is that of the most restrictive grammar able to generate it (the smallest class in the sense of inclusion).
- There are "fundamental" languages for the most commonly used types of languages ; these are languages that illustrate an important property of their class :
• Regular languages (type 3) :
The language 𝑎^𝑛 𝑏^𝑚 , with 𝑛, 𝑚 ≥ 0 (any number of 𝑎 followed by any number of 𝑏), can be generated by the following grammar :
〈{𝑎, 𝑏}, {𝑆, 𝑆1 }, 𝑆, {𝑆 → 𝑎𝑆 | 𝑆1 | 𝜀 ; 𝑆1 → 𝑏𝑆1 | 𝜀}〉.
• Context-free languages (type 2) :
The language 𝑎^𝑛 𝑏^𝑛 , with 𝑛 ≥ 0 (the language of words composed of a certain number of 𝑎 followed by the same number of 𝑏), can be generated by the following grammar :
〈{𝑎, 𝑏}, {𝑆}, 𝑆, {𝑆 → 𝑎𝑆𝑏 | 𝜀}〉.
• Context sensitive languages (type 1) :
The language 𝑎^𝑛 𝑏^𝑛 𝑐^𝑛 , with 𝑛 ≥ 1, can be generated by the following grammar :
〈{𝑎, 𝑏, 𝑐}, {𝑆, 𝑆1 , 𝑆2 }, 𝑆, {𝑆 → 𝑎𝑆1 𝑐 ; 𝑆1 → 𝑏 | 𝑆𝑆2 ; 𝑐𝑆2 → 𝑆2 𝑐 ; 𝑏𝑆2 → 𝑏𝑏 }〉.
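The last grammar is the least obvious of the three, so it can be useful to enumerate the short words it generates. The sketch below explores all sentential forms breadth-first ; it is an illustration under assumptions, not a general procedure. The non-terminals 𝑆1 and 𝑆2 are renamed `X` and `Y` so that every symbol is one character, the function name `words_up_to` is hypothetical, and discarding forms longer than `max_len` is only safe because no rule of this grammar shortens a form.

```python
from collections import deque

# Type 1 grammar above, with S1 renamed X and S2 renamed Y (one character per symbol).
RULES = [("S", "aXc"), ("X", "b"), ("X", "SY"), ("cY", "Yc"), ("bY", "bb")]
NONTERMINALS = {"S", "X", "Y"}


def words_up_to(rules, axiom, nonterminals, max_len):
    """Words of length <= max_len, found by exploring every direct derivation."""
    seen, queue, words = {axiom}, deque([axiom]), set()
    while queue:
        form = queue.popleft()
        if not any(symbol in nonterminals for symbol in form):
            words.add(form)                 # only terminals remain: a word
            continue
        for left, right in rules:
            start = form.find(left)
            while start != -1:              # try every occurrence of the left member
                new = form[:start] + right + form[start + len(left):]
                # Assumption: rules never shorten a form, so long forms are dropped.
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(left, start + 1)
    return sorted(words, key=len)


print(words_up_to(RULES, "S", NONTERMINALS, 9))
# ['abc', 'aabbcc', 'aaabbbccc']
```

The rule 𝑐𝑆2 → 𝑆2𝑐 moves 𝑆2 to the left across the 𝑐 symbols, and 𝑏𝑆2 → 𝑏𝑏 can only be applied once 𝑆2 touches a 𝑏 ; this use of context is what keeps the three counts equal.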
