Motivation for Formal Grammars
The structure of natural-language sentences can be described
by rules like the ones shown below:
Example:
(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets
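Rules (1)-(6) can be written down directly as data. The Python sketch below is illustrative only. It assumes a dictionary encoding in which any symbol that is not a key counts as a terminal, and the names GRAMMAR and expand are hypothetical. It also includes the final "." that appears in the derivation of "the girl sees a dog .". None of these rules is recursive, so the grammar generates a finite language, and expand can list all of it.

```python
from itertools import product

# Hypothetical encoding of rules (1)-(6): nonterminal -> list of alternatives.
# Any symbol that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def expand(symbol):
    """Return every terminal string (a list of words) derivable from symbol.

    Only safe for non-recursive grammars; a rule such as E -> E+E would
    make this recursion run forever.
    """
    if symbol not in GRAMMAR:  # a terminal derives only itself
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then take every combination.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


if __name__ == "__main__":
    sentences = [" ".join(words) for words in expand("sentence")]
    print(len(sentences))  # 32
    print(sentences[0])    # a girl sees a girl .
```

The count is 4 noun phrases × 2 verbs × 4 noun phrases = 32 sentences.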
Grammars Produce Languages
Language: the set of strings (of terminals) that
can be generated from the start symbol by derivation:
sentence
⇒ noun-phrase verb-phrase . (rule 1)
⇒ article noun verb-phrase . (rule 2)
⇒ the noun verb-phrase . (rule 3)
⇒ the girl verb-phrase . (rule 4)
⇒ the girl verb noun-phrase . (rule 5)
⇒ the girl sees noun-phrase . (rule 6)
⇒ the girl sees article noun . (rule 2)
⇒ the girl sees a noun . (rule 3)
⇒ the girl sees a dog . (rule 4)
•Language: the set of programs (character streams) that are allowed
•Grammar rules (productions): "produce" the language
–each rule has the form left-hand side → right-hand side
•nonterminals (structured names): noun-phrase, verb-phrase
•terminals (tokens): ".", dog
•metasymbols: → (“consists of”), | (choice)
•start symbol: the nonterminal that stands for the entire structure (sentence, program)
–e.g., sentence
•E.g., if-statement → if ( expression ) statement else statement
Context-Free Grammar
Context-Free Grammars (CFGs)
Formalized by Noam Chomsky in the 1950s.
They define the context-free languages.
Four components:
terminals, nonterminals, one start symbol, and
productions (each left-hand side is one single
nonterminal)
CFGs
A context-free grammar (CFG, or just grammar) is a
4-tuple denoted $G = (V, T, P, S)$,
where
V is a finite set of variables (also called nonterminals or syntactic
categories)
T is a finite set of terminals (tokens)
P is a finite set of production rules of the form
$A \to \alpha$, where $A$ is a variable and $\alpha$ is a string of symbols from
$(V \cup T)^*$
S is a special variable called the start symbol
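The 4-tuple maps directly onto a small data structure. The sketch below assumes each production is stored as a pair (A, alpha), with alpha a tuple of symbols. The class CFG and its check method are hypothetical names. The method validates the conditions in the definition, plus the usual requirement that V and T be disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """G = (V, T, P, S): variables, terminals, productions, start symbol."""
    V: frozenset  # variables (nonterminals)
    T: frozenset  # terminals (tokens)
    P: tuple      # productions: pairs (A, alpha), alpha a tuple of symbols
    S: str        # start symbol

    def check(self):
        """Raise ValueError if the 4-tuple is not a well-formed CFG."""
        if self.V & self.T:
            raise ValueError("variables and terminals must be disjoint")
        if self.S not in self.V:
            raise ValueError(f"start symbol {self.S!r} is not a variable")
        for A, alpha in self.P:
            # Context-free: the left-hand side is exactly one variable.
            if A not in self.V:
                raise ValueError(f"left-hand side {A!r} is not a variable")
            # alpha must be a string over V ∪ T (the empty tuple is allowed).
            for X in alpha:
                if X not in self.V and X not in self.T:
                    raise ValueError(f"unknown symbol {X!r} in a production for {A}")


# A small illustrative grammar: E -> E + E | a
G = CFG(V=frozenset({"E"}), T=frozenset({"a", "+"}),
        P=(("E", ("E", "+", "E")), ("E", ("a",))), S="E")
G.check()  # passes silently
```

A production whose left-hand side is a terminal, or a string of several symbols, is rejected. That is exactly the restriction that makes the grammar context-free.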
What does “Context-Free” mean?
The left-hand side of a production is always
one single nonterminal:
the nonterminal is replaced by the
corresponding right-hand side no matter
where it appears (i.e., the surrounding
context plays no role in the
replacement/derivation).
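Because a production's left-hand side is a single nonterminal, one derivation step may rewrite any occurrence of that nonterminal, whatever surrounds it. The Python sketch below illustrates this with a hypothetical successors helper, assuming productions are kept in a dictionary from nonterminal to right-hand sides. It lists every sentential form reachable in exactly one step.

```python
def successors(form, productions):
    """All sentential forms reachable from `form` in exactly one step.

    `form` is a tuple of symbols; `productions` maps a nonterminal to a list
    of right-hand sides (tuples). Only the symbol being rewritten is
    inspected, never its neighbours: that is what "context-free" means.
    """
    result = []
    for i, symbol in enumerate(form):
        for rhs in productions.get(symbol, []):
            result.append(form[:i] + rhs + form[i + 1:])
    return result


# Illustrative subset of productions: E -> ID | NUM
PRODUCTIONS = {"E": [("ID",), ("NUM",)]}

if __name__ == "__main__":
    for f in successors(("E", "+", "E"), PRODUCTIONS):
        print(" ".join(f))
    # ID + E
    # NUM + E
    # E + ID
    # E + NUM
```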
Contrast: context-sensitive grammars (which define context-sensitive
languages) may make a replacement depend on the symbols surrounding the nonterminal.
Why context-free?
CFG Example 1
E → ID
| NUM
| E*E
| E/E
| E+E
| E-E
| (E)
ID → a | b |…|z
NUM → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Grammars Produce Languages
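Does the above grammar produce the sentence (a*b)+c?
Start with the start symbol of the CFG and apply a leftmost derivation:
E ⇒ E+E
⇒ (E)+E
⇒ (E*E)+E
⇒ (ID*E)+E
⇒ (a*E)+E
⇒ (a*ID)+E
⇒ (a*b)+E
⇒ (a*b)+ID
⇒ (a*b)+c
Notations:
$E \underset{lm}{\Rightarrow} E+E$ (single-step leftmost derivation)
$E \overset{*}{\underset{lm}{\Rightarrow}} E+E$ (leftmost derivation in zero or more steps)
Example: $E \overset{*}{\underset{G}{\Rightarrow}} (a*b)+c$
A G under the arrow names the grammar used; lm marks leftmost derivation steps.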
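Whether a sequence of sentential forms is a valid leftmost derivation can be checked mechanically. Each step must replace the leftmost nonterminal by one of its right-hand sides. The Python sketch below does this for Example 1. It assumes sentential forms are written as space-separated tokens, and PRODS, is_leftmost_step and DERIVATION are hypothetical names.

```python
# Example 1, with ID -> a | ... | z and NUM -> 0 | ... | 9 written out in full.
PRODS = {
    "E": [["ID"], ["NUM"], ["E", "*", "E"], ["E", "/", "E"],
          ["E", "+", "E"], ["E", "-", "E"], ["(", "E", ")"]],
    "ID": [[c] for c in "abcdefghijklmnopqrstuvwxyz"],
    "NUM": [[d] for d in "0123456789"],
}


def is_leftmost_step(before, after):
    """True if `after` follows from `before` by rewriting its leftmost nonterminal."""
    for i, sym in enumerate(before):
        if sym in PRODS:  # leftmost nonterminal found
            return any(before[:i] + rhs + before[i + 1:] == after
                       for rhs in PRODS[sym])
    return False  # no nonterminal left to rewrite


# The leftmost derivation of (a*b)+c, one sentential form per entry.
DERIVATION = [
    "E", "E + E", "( E ) + E", "( E * E ) + E", "( ID * E ) + E",
    "( a * E ) + E", "( a * ID ) + E", "( a * b ) + E", "( a * b ) + ID",
    "( a * b ) + c",
]

if __name__ == "__main__":
    forms = [s.split() for s in DERIVATION]
    print(all(is_leftmost_step(x, y) for x, y in zip(forms, forms[1:])))  # True
```

The step E + E ⇒ E + ID would be rejected. It is a valid derivation step, but it rewrites the rightmost E rather than the leftmost one.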
Context-Free Languages (CFLs)
The languages described by context-free grammars are
known as CFLs.
Formal notation:
The language generated by G, denoted L(G), is
$L(G) = \{\, w \mid w \in T^* \text{ and } S \overset{*}{\underset{G}{\Rightarrow}} w \,\}$
That is, a string is in L(G) if:
1) the string consists solely of terminals, and
2) the string can be derived from S.
A string $\alpha$ of terminals and variables is called
a sentential form if $S \overset{*}{\Rightarrow} \alpha$.
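The definition of L(G) suggests a brute-force membership test: search the leftmost derivations from S for the target terminal string. The Python sketch below is for Example 1 and is illustrative only; practical parsers use far more efficient algorithms. It assumes the grammar has no ε-productions, so no sentential form is longer than the string it eventually derives, and longer forms can be discarded. It also discards forms whose terminal prefix (the symbols before the leftmost nonterminal) already disagrees with the target. PRODS and in_language are hypothetical names.

```python
from collections import deque

# Example 1 grammar, with ID and NUM written out in full.
PRODS = {
    "E": [["ID"], ["NUM"], ["E", "*", "E"], ["E", "/", "E"],
          ["E", "+", "E"], ["E", "-", "E"], ["(", "E", ")"]],
    "ID": [[c] for c in "abcdefghijklmnopqrstuvwxyz"],
    "NUM": [[d] for d in "0123456789"],
}


def in_language(target, start="E"):
    """Brute-force test of target ∈ L(G) by breadth-first search over
    leftmost derivations. Correct only for grammars without ε-productions."""
    target = tuple(target)
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # Index of the leftmost nonterminal, or None if form is all terminals.
        i = next((k for k, s in enumerate(form) if s in PRODS), None)
        if i is None:
            continue  # a terminal string other than the target
        for rhs in PRODS[form[i]]:
            new = form[:i] + tuple(rhs) + form[i + 1:]
            j = next((k for k, s in enumerate(new) if s in PRODS), len(new))
            # Prune: too long, or the fixed terminal prefix already disagrees.
            if len(new) > len(target) or new[:j] != target[:j]:
                continue
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return False


if __name__ == "__main__":
    print(in_language("( a * b ) + c".split()))  # True
    print(in_language("a +".split()))            # False
```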
CFG Example 2
S → if E then S else S
| begin S L
| print E
L → end
| ; S L
E → NUM = NUM
NUM → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse Tree
Represents the derivation steps from the start
symbol to the string.
Given the derivation used in parsing an
input sequence, a parse tree has:
the start symbol as the root
the terminals of the input sequence as leaves
for each production A → X1 X2 ... Xn used in the
derivation, a node A with children X1 X2 ... Xn
Parse Tree Example 1
CFG:
expr → expr + expr | expr * expr | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Input sequence: 3+4*5
Parse tree:
expr
├── expr
│   └── number
│       └── digit
│           └── 3
├── +
└── expr
    ├── expr
    │   └── number
    │       └── digit
    │           └── 4
    ├── *
    └── expr
        └── number
            └── digit
                └── 5
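A parse tree can be represented as a node labeled with a grammar symbol plus an ordered list of children. The Python sketch below uses a hypothetical Node class and num helper. It builds the tree for 3+4*5 with * grouped below +, as in this example, then reads the input back off the leaves (the tree's yield).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # grammar symbol labelling the node
    children: list = field(default_factory=list)  # empty for leaves (terminals)

    def leaves(self):
        """Terminals at the leaves, left to right (the yield of the tree)."""
        if not self.children:
            return [self.symbol]
        return [t for child in self.children for t in child.leaves()]


def num(d):
    """Subtree expr -> number -> digit -> d for a single-digit number."""
    return Node("expr", [Node("number", [Node("digit", [Node(d)])])])


# Parse tree for 3+4*5 with * grouped below +.
TREE = Node("expr", [num("3"), Node("+"),
                     Node("expr", [num("4"), Node("*"), num("5")])])

if __name__ == "__main__":
    print("".join(TREE.leaves()))  # 3+4*5
```

Each interior node together with its children corresponds to one production used in the derivation. For example, the root with children expr, +, expr corresponds to expr → expr + expr.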
What is Parsing?
Given a grammar and a token string:
determine whether the grammar can generate the
token string,
i.e., whether the string is a legal program in the
language.
In other words, construct a parse
tree for the token string.
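For CFG Example 2 (the if/begin/print grammar), each alternative of S and of L starts with a different token. A recursive-descent parser, with one procedure per nonterminal that picks the alternative from the next token, is therefore enough to decide membership and build a parse tree. Recursive descent is a standard technique swapped in here for illustration, not something the slides prescribe. The sketch assumes the input is already split into tokens and represents trees as nested tuples (symbol, children...). Parser, parse, ParseError and the parse_X methods are hypothetical names.

```python
DIGITS = set("0123456789")


class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser for CFG Example 2; one method per nonterminal."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1
        return tok

    # Tuple elements are evaluated left to right, so tokens are consumed in order.
    def parse_S(self):
        t = self.peek()
        if t == "if":      # S -> if E then S else S
            return ("S", self.expect("if"), self.parse_E(), self.expect("then"),
                    self.parse_S(), self.expect("else"), self.parse_S())
        if t == "begin":   # S -> begin S L
            return ("S", self.expect("begin"), self.parse_S(), self.parse_L())
        if t == "print":   # S -> print E
            return ("S", self.expect("print"), self.parse_E())
        raise ParseError(f"unexpected {t!r} at the start of S")

    def parse_L(self):
        if self.peek() == "end":  # L -> end
            return ("L", self.expect("end"))
        # L -> ; S L
        return ("L", self.expect(";"), self.parse_S(), self.parse_L())

    def parse_E(self):  # E -> NUM = NUM
        return ("E", self.parse_NUM(), self.expect("="), self.parse_NUM())

    def parse_NUM(self):  # NUM -> 0 | 1 | ... | 9
        t = self.peek()
        if t not in DIGITS:
            raise ParseError(f"expected a digit, found {t!r}")
        self.pos += 1
        return ("NUM", t)


def parse(tokens):
    """Return the parse tree for tokens, or raise ParseError if not in L(G)."""
    p = Parser(tokens)
    tree = p.parse_S()
    if p.peek() is not None:
        raise ParseError(f"trailing input starting at {p.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("begin print 1 = 1 ; print 2 = 3 end".split()))
```

A ParseError means the token string is not a legal program. A returned tuple is the parse tree whose existence answers the question "can the grammar generate this string?".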
What’s significant about a parse tree?
A parse tree gives a unique syntactic
structure to the token string.
Leftmost, rightmost derivation
There is only one leftmost derivation for a
parse tree, and symmetrically only one
rightmost derivation for a parse tree.
Example
expr → expr + expr | expr * expr | ( expr ) | number
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse tree: the tree for 3+4*5 from Parse Tree Example 1.
Leftmost derivation (the leftmost nonterminal is expanded at each step):
expr ⇒ expr + expr
⇒ number + expr
⇒ digit + expr
⇒ 3 + expr
⇒ 3 + expr * expr
⇒ 3 + number * expr
⇒ 3 + digit * expr
⇒ 3 + 4 * expr
⇒ 3 + 4 * number
⇒ 3 + 4 * digit
⇒ 3 + 4 * 5
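The one-to-one correspondence can be seen by reading derivations directly off a tree. Always expanding the leftmost unexpanded interior node yields the leftmost derivation, and always expanding the rightmost one yields the rightmost derivation. Since each choice is forced, neither derivation can differ for a given tree. The Python sketch below assumes trees are nested tuples (symbol, child, ...) with bare strings as terminal leaves. label, derivation, num and TREE are hypothetical names.

```python
def label(node):
    """Grammar symbol of a node: interior nodes are tuples, leaves are strings."""
    return node[0] if isinstance(node, tuple) else node


def derivation(tree, leftmost=True):
    """Sentential forms of the leftmost (or rightmost) derivation for tree."""
    form = [tree]
    forms = [" ".join(label(n) for n in form)]
    while True:
        interior = [i for i, n in enumerate(form) if isinstance(n, tuple)]
        if not interior:  # only terminals remain
            return forms
        i = interior[0] if leftmost else interior[-1]
        # Replace the chosen node by its children (one production step).
        form = form[:i] + list(form[i][1:]) + form[i + 1:]
        forms.append(" ".join(label(n) for n in form))


def num(d):
    """Subtree expr -> number -> digit -> d."""
    return ("expr", ("number", ("digit", d)))


# Parse tree for 3+4*5 with * grouped below +.
TREE = ("expr", num("3"), "+", ("expr", num("4"), "*", num("5")))

if __name__ == "__main__":
    print("\n".join(derivation(TREE)))                  # leftmost
    print("\n".join(derivation(TREE, leftmost=False)))  # rightmost
```

The leftmost run prints the same eleven steps as the derivation listed above. The rightmost run begins expr, expr + expr, expr + expr * expr, expr + expr * number.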