Lecture 8 - CFG
Lecture 8
Grammars & CFG
Dr. Samar Hesham
Department of Computer Science
Faculty of Computers and AI
Cairo University
Egypt
Syllabus and Terminologies
5/31/2024 FCI-CU-EG
Languages
[Figure: the regular languages are nested inside the context-free languages;
{a^n b^n} and {ww^R} are context-free but not regular.]
[Figure: context-free languages are described by context-free grammars and
recognized by pushdown automata (an automaton plus a stack).]
What is a Grammar
A grammar is a precise description of a formal
language.
It describes which sequences of symbols constitute
valid words or sentences in that language.
Natural languages:
Arabic, English, French, Spanish, etc.
Programming and markup languages:
C, C++, Java, C#, HTML, XML, …
What is a Grammar
A grammar G = <N, Σ, P, S> consists of the following
components:
1. A finite set N of non-terminal symbols or variables.
2. A finite set Σ of terminal symbols, disjoint from N.
3. A finite set P of production rules of the form
(Σ ∪ N)* N (Σ ∪ N)* → (Σ ∪ N)*
where * is the Kleene star operator and ∪ denotes set
union. Each production rule maps one string of
symbols to another, where the left-hand side contains at
least one non-terminal symbol.
4. A distinguished start symbol S ∈ N.
Regular languages
A language is said to be a regular language if it is generated by
a regular grammar.
A grammar is said to be regular if it is either right-linear or
left-linear.
Specifically, a grammar G = <N, Σ, P, S> is said to be:
right-linear if each of its production rules has the form
A → xB or A → x,
left-linear if each of its production rules has the form
A → Bx or A → x,
where:
A and B are non-terminal symbols in N, and
x is a string of terminal symbols in Σ*.
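The two rule shapes above can be checked mechanically. The following is a minimal Python sketch, assuming (for illustration only) that every symbol is a single character and every rule is a (head, body) pair; the function names are hypothetical and not part of the lecture.

```python
def is_right_linear(productions, nonterminals):
    """True if every rule has the form A -> xB or A -> x (x a terminal string)."""
    for head, body in productions:
        if head not in nonterminals:
            return False
        # A non-terminal may appear only as the last symbol of the body.
        if any(sym in nonterminals for sym in body[:-1]):
            return False
    return True


def is_left_linear(productions, nonterminals):
    """True if every rule has the form A -> Bx or A -> x (x a terminal string)."""
    for head, body in productions:
        if head not in nonterminals:
            return False
        # A non-terminal may appear only as the first symbol of the body.
        if any(sym in nonterminals for sym in body[1:]):
            return False
    return True


# Rules S -> (empty) | aS | bS | cS; the empty string stands for epsilon.
a_star = [("S", ""), ("S", "aS"), ("S", "bS"), ("S", "cS")]
# Rules S -> AB, A -> (empty) | aA, B -> (empty) | bB.
a_then_b = [("S", "AB"), ("A", ""), ("A", "aA"), ("B", ""), ("B", "bB")]
```

Under these assumptions the first rule set is right-linear, while the second is neither right-linear nor left-linear because of S → AB, even though the language it generates may still have some other regular grammar.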
Example
Let A = {a, b, c}. The language A* can be described by
a grammar with the following production rules:
S → ε
S → aS
S → bS
S → cS
How do we know that this grammar describes the
language A*?
We must be able to derive each string of the language using the
grammar rules.
Exercise: show that the string aacb is in A*.
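One way to see that any string over {a, b, c} is derivable is to build the derivation symbol by symbol: apply S → xS for each symbol x, then finish with S → ε. A minimal Python sketch of that idea follows; the function name `derive` is an assumption made for illustration.

```python
def derive(target):
    """Return the sentential forms deriving `target` with S -> eps | aS | bS | cS."""
    steps = ["S"]
    prefix = ""
    for ch in target:
        if ch not in "abc":
            raise ValueError(f"{ch!r} is not in the alphabet {{a, b, c}}")
        prefix += ch
        steps.append(prefix + "S")  # apply S -> chS to the trailing S
    steps.append(prefix)            # apply S -> epsilon to remove the trailing S
    return steps


print(" => ".join(derive("aacb")))
# prints: S => aS => aaS => aacS => aacbS => aacb
```

Since the loop works for every string over the alphabet, the same argument shows that the grammar generates all of A*.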
Example
Consider the grammar G = <N, Σ, P, S> = <{S, A, B}, {a, b}, P, S>,
where P consists of the production rules
S → AB, A → ε | aA, B → ε | bB.
Let us derive the string aab:
S ⇒ AB ⇒ aAB ⇒ aaAB ⇒ aaB ⇒ aabB ⇒ aab.
Note that a language can have more than one grammar,
so we should not be surprised when two people come up
with two different grammars for the same language.
Combining grammars
Suppose M and N are languages whose grammars have disjoint sets of
non-terminals. Suppose also that the start symbols of the grammars for M
and N are A and B, respectively. We can obtain the following new
languages and grammars:
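The standard constructions in this setting are union (M ∪ N, adding S → A | B), product (MN, adding S → AB) and closure (M*, adding S → AS | ε), where S is a fresh start symbol. The following is a minimal Python sketch, assuming a grammar is stored as a dict from each non-terminal to a list of bodies, each body being a list of symbols with the empty list standing for ε. The representation and function names are illustrative assumptions.

```python
def union(m_rules, a, n_rules, b, s="S"):
    """Grammar for M union N: keep all rules and add S -> A | B."""
    return {**m_rules, **n_rules, s: [[a], [b]]}


def product(m_rules, a, n_rules, b, s="S"):
    """Grammar for the product MN: keep all rules and add S -> AB."""
    return {**m_rules, **n_rules, s: [[a, b]]}


def closure(m_rules, a, s="S"):
    """Grammar for M*: keep all rules and add S -> AS | epsilon."""
    return {**m_rules, s: [[a, s], []]}


# M = strings of a's with A -> aA | epsilon; N = strings of b's with B -> bB | epsilon.
m = {"A": [["a", "A"], []]}
n = {"B": [["b", "B"], []]}
combined = product(m, "A", n, "B")
```

Here `combined` holds the rules S → AB, A → aA | ε, B → bB | ε. The sketch relies on the non-terminal sets being disjoint and on S not already being used by either grammar.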
Context-free languages
A language is said to be context-free if it is generated by a context-free
grammar (CFG).
A grammar G = <N, Σ, P, S> is context-free if every production
rule has the form A → w, with A ∈ N and w ∈ (N ∪ Σ)*.
Unlike regular grammars, the right-hand sides of the production rules
in CFGs are unrestricted and can be any combination of terminals
and non-terminals.
The regular languages (RLs) form a proper subset of the context-free
languages (CFLs).
Things that cannot be expressed by a regular grammar but are needed when
parsing CFLs:
Palindromes.
Balanced brackets.
Counting, as in {a^n b^n}.
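Each of these three patterns has a short context-free grammar. The Python sketch below pairs one possible grammar per pattern (in the comments) with a recognizer for the same language. The grammars and function names are illustrative choices, not taken from the slides, and the bracket check uses a depth counter in place of an explicit stack because there is only one bracket type.

```python
def is_palindrome(s):
    """Palindromes over {a, b}: S -> aSa | bSb | a | b | epsilon."""
    if len(s) <= 1:
        return all(c in "ab" for c in s)
    return s[0] == s[-1] and s[0] in "ab" and is_palindrome(s[1:-1])


def is_anbn(s):
    """Counting, n a's followed by n b's: S -> aSb | epsilon."""
    if s == "":
        return True
    return s[0] == "a" and s[-1] == "b" and is_anbn(s[1:-1])


def is_balanced(s):
    """Balanced brackets: S -> (S)S | epsilon."""
    depth = 0
    for c in s:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:  # a closing bracket with no matching opener
                return False
        else:
            return False
    return depth == 0
```

The two recursive functions peel off one matched pair of outer symbols per call, mirroring the rules S → aSa and S → aSb. A regular grammar cannot express this matching of both ends.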
CFG
A context-free grammar is a notation for defining
context-free languages.
It is more powerful than finite automata or regular expressions (REs),
but still cannot define all possible languages.
Useful for nested structures, e.g., parentheses in
programming languages.
Basic idea is to use “variables” (non-terminals) to
stand for sets of strings.
These variables are defined recursively, in terms of
one another.
CFG
A CFG is used to generate the strings belonging to a CFL.
Each production has the form A → w, where A is a
non-terminal and w is a string of terminals and
non-terminals.
Any non-terminal can be replaced by the right-hand side of any of
its productions at any point in a derivation.
Language of a CFG: the set of strings of terminals that
can be derived from its start symbol.
A pushdown automaton (PDA) is the kind of automaton capable
of accepting the languages defined by CFGs.
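The idea of "the language of a CFG" can be made concrete by enumerating the short strings derivable from the start symbol. The Python sketch below is a minimal illustration, assuming single-character symbols and a dict from each non-terminal to its list of bodies (with "" standing for ε); the name `generate` is hypothetical. It explores leftmost derivations breadth-first and prunes forms with too many terminals, so it is only guaranteed to terminate for grammars that cannot grow the number of non-terminals indefinitely without adding terminals.

```python
from collections import deque


def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from `start`."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if the form is all terminals.
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            words.add(form)
            continue
        for body in rules[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            # Terminals never disappear, so a form with too many is a dead end.
            terminals = sum(1 for sym in new if sym not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))


print(generate({"S": ["aSb", ""]}, "S", 4))
# prints: ['', 'ab', 'aabb']
```

Always expanding the leftmost non-terminal loses no strings, because every derivable terminal string also has a leftmost derivation.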
CFGs: Alternate Definition
Many textbooks use different symbols and terms to
describe CFGs:
G = (V, Σ, P, S)
V = variables, a finite set
Σ = alphabet or terminals, a finite set
P = productions, a finite set
S = start variable, S ∈ V
Each production has the form A → x, where A ∈ V and
x is a string of variables and terminals.
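The 4-tuple maps naturally onto a small record type. The following is a minimal Python sketch, assuming single-character symbols; the class name `CFG` and its field names are illustrative choices, and the checks simply restate the conditions in the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    productions: tuple     # P: pairs (A, x), with x a string of symbols
    start: str             # S

    def __post_init__(self):
        if self.variables & self.terminals:
            raise ValueError("V and Sigma must be disjoint")
        if self.start not in self.variables:
            raise ValueError("start variable must belong to V")
        symbols = self.variables | self.terminals
        for head, body in self.productions:
            if head not in self.variables:
                raise ValueError(f"left-hand side {head!r} is not a variable")
            if any(sym not in symbols for sym in body):
                raise ValueError(f"unknown symbol in body {body!r}")


# G = ({S}, {a, b}, {S -> aSb, S -> epsilon}, S); "" stands for epsilon.
anbn = CFG(frozenset("S"), frozenset("ab"), (("S", "aSb"), ("S", "")), "S")
```

Constructing a grammar whose left-hand side is a terminal, such as the rule a → S, raises `ValueError`, which reflects the requirement that A is a single variable.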
CSG
A context-sensitive grammar (CSG) is a notation for
defining context-sensitive languages.
Each production has the form wAx → wyx,
where A is a non-terminal, w and x are strings of terminals and
non-terminals, and y is a non-empty string of terminals and non-terminals.
The productions give rules saying "if you see A in a
given context, you may replace A by the string y".
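To illustrate what "in a given context" means, the Python sketch below applies a single rule wAx → wyx to a sentential form, rewriting A only where it is surrounded by w and x. The function name and the two example rules (aB → ab and bB → bb) are assumptions chosen for illustration, with single-character symbols.

```python
def apply_rule(form, w, a, x, y):
    """Apply wAx -> wyx at the leftmost place where A occurs between w and x.

    Returns the rewritten form, or None if A never appears in that context.
    """
    pattern = w + a + x
    i = form.find(pattern)
    if i == -1:
        return None
    return form[:i] + w + y + x + form[i + len(pattern):]


step1 = apply_rule("aaBB", "a", "B", "", "b")  # rule aB -> ab
step2 = apply_rule(step1, "b", "B", "", "b")   # rule bB -> bb
print(step1, step2)
# prints: aabB aabb
```

Unlike in a CFG, the rule aB → ab cannot fire on a B that is not preceded by a, so `apply_rule("BB", "a", "B", "", "b")` returns `None`.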