
Grammars and Pushdown Automata

COMP1600 / COMP6260

Nisansala Yatapanage, Dirk Pattinson


(with contributions by many previous colleagues)

Australian National University

Semester 2, 2023
Formal Languages – A Reminder of Terminology

The alphabet or vocabulary of a formal language is a set of tokens
(or letters). It is usually denoted Σ.

A string over Σ is a sequence of tokens, or the null-string ε.
- the sequence may be empty, giving the empty string ε
- ababc is a string over Σ = {a, b, c}

A language with alphabet Σ is some set of strings over Σ.
- for example, the set of all strings Σ∗
- or the set of all strings of even length, {w ∈ Σ∗ | w has even length}

Notation.
Σ∗ is the set of all strings over Σ.
Therefore, every language with alphabet Σ is some subset of Σ∗.

Specifying Languages

Languages can be given . . .
- as a finite enumeration, e.g. L = {ε, a, ab, abb}
- as a set, by giving a predicate, e.g. L = {w ∈ Σ∗ | P(w)} for some alphabet Σ
- algebraically, by regular expressions, e.g. L = L(r) for a regexp r
- by an automaton, e.g. L = L(A) for some FSA A
- by a grammar (this lecture)

Grammar.
- a concept invented in linguistics to describe natural languages
- describes how strings are constructed, rather than how membership can be
  checked (e.g. by an automaton)
- the main tool to describe syntax
Grammars in general

Formal Definition. A grammar is a quadruple ⟨Vt, Vn, S, P⟩ where
- Vt is a finite set of terminal symbols (the alphabet)
- Vn is a finite set of non-terminal symbols, disjoint from Vt
  (Notation: V = Vt ∪ Vn)
- S is a distinguished non-terminal symbol called the start symbol
- P is a set of productions, written

  α → β

  where
  - α ∈ V∗ Vn V∗ (i.e. α contains at least one non-terminal)
  - β ∈ V∗ (i.e. β is any string of symbols)
Example
The grammar

G = ⟨{a, b}, {S, A}, S, {S → aAb, aA → aaAb, A → ε}⟩

has the following components:

Terminals: {a, b}
Non-terminals: {S, A}
Start symbol: S
Productions:
  S → aAb
  aA → aaAb
  A → ε

Notation.
- Often, we just list the productions P, as all other components can be
  inferred (S is the standard notation for the start symbol).
- The notation α → β1 | · · · | βn abbreviates the set of productions
  α → β1, α → β2, ..., α → βn
  (like for inductive data types).
Derivations
Intuition.
- A production α → β tells you what you can “make” if you have α:
  you can turn it into β.
- The production α → β allows us to re-write any string γαρ to γβρ.
  Notation: γαρ ⇒ γβρ

Derivations.
- α ⇒∗ β if α can be re-written to β in 0 or more steps
- so ⇒∗ is the reflexive transitive closure of ⇒

Sentential Forms of a grammar.
- informally: all strings (which may contain non-terminals) that can be
  generated from S
- formally: S(G) = {w ∈ V∗ | S ⇒∗ w}

Language of a grammar.
- informally: all strings of terminal symbols that can be generated from
  the start symbol S
- formally: L(G) = {w ∈ Vt∗ | S ⇒∗ w} ⊆ S(G)
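Code sketch. Since ⇒ is just string rewriting, these definitions are easy to animate. Below is a minimal Python sketch (the helper names derive_step and words_up_to are ours, not course material) that enumerates the words derivable via ⇒∗ in the grammar of the next example:

```python
# Productions of the example grammar below: S → aAb, aA → aaAb, A → ε
PRODUCTIONS = [("S", "aAb"), ("aA", "aaAb"), ("A", "")]

def derive_step(s):
    """All strings reachable from s in one ⇒ step (rewrite any occurrence of a LHS)."""
    for lhs, rhs in PRODUCTIONS:
        start = 0
        while (i := s.find(lhs, start)) != -1:
            yield s[:i] + rhs + s[i + len(lhs):]
            start = i + 1

def words_up_to(max_len, start="S"):
    """Terminal words of length ≤ max_len derivable via ⇒∗ (breadth-first search)."""
    seen, frontier, words = {start}, [start], set()
    while frontier:
        nxt = []
        for s in frontier:
            for t in derive_step(s):
                if t not in seen and len(t) <= max_len + 1:  # +1: one non-terminal pending
                    seen.add(t)
                    nxt.append(t)
                    if t and t.islower():  # no non-terminals left: a word
                        words.add(t)
        frontier = nxt
    return sorted(words, key=len)

print(words_up_to(6))   # ['ab', 'aabb', 'aaabbb']
```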
Example
Productions of the grammar G.

S → aAb, aA → aaAb, A → ε.

Example Derivation.

S ⇒ aAb ⇒ aaAbb ⇒ aaaAbbb ⇒ aaabbb

The last string aaabbb is a word; the others are sentential forms.

Language of grammar G.

L(G) = {a^n b^n | n ∈ ℕ, n ≥ 1}

Alternative Grammar for the same language:

S → aSb, S → ab.

(Grammars and languages are not in 1-1 correspondence.)
The Chomsky Hierarchy (by Noam Chomsky, a linguist!)
Each grammar is of a type:

Unrestricted: (type 0) no constraints.

Context-sensitive: (type 1) the length of the left-hand side of each
  production must not exceed the length of the right.
  Note 1: There are other, equivalent definitions which
  don’t restrict the length.
  Note 2: If ε ∈ L should be allowed, we allow
  S → ε, but then we don’t allow S to occur on any
  right-hand side.

Context-free: (type 2) the left-hand side of each production must be a
  single non-terminal.

Regular: (type 3) As for type 2, and the right-hand side of each production
  is also constrained (details to come).

(There are lots of intermediate types, too.)
Classification of Languages

Definition. A language is of type n if it can be generated by a type-n
grammar.

Immediate Fact.
Every language of type n + 1 is also of type n.

Establishing that a language is of type n
- give a type-n grammar that generates the language
- usually the easier task

Disproving that a language is of type n
- must show that no type-n grammar generates the language
- usually a difficult problem
Example — language {a^n b^n | n ∈ ℕ, n ≥ 1}

Different grammars for this language

Unrestricted (type 0):

S → aAb
aA → aaAb
A → ε

Context-free (type 2):

S → ab
S → aSb

Recall. We know from last week that there is no DFA accepting L.
- We will see that this means that there’s no regular grammar,
- so the language is context-free, but not regular.
Regular (Type 3) Grammars
Definition. A grammar is regular if all its productions are either
right-linear, i.e. of the form

A → aB    or    A → a    or    A → ε

or left-linear, i.e. of the form

A → Ba    or    A → a    or    A → ε.

- right- and left-linear grammars are equivalent: they generate the same
  languages
- we focus on right-linear grammars (for no deep reason)
- i.e. one symbol is generated at a time (cf. DFA/NFA!)
- termination with a terminal symbol or ε

Next Goal. Regular grammars generate precisely the regular languages.
Regular Languages - Many Views
Theorem. Let L be a language. Then the following are equivalent:
L is the language generated by a right-linear grammar;
L is the language generated by a left-linear grammar;
L is the language accepted by some DFA;
L is the language accepted by some NFA;
L is the language specified by a regular expression.

So far.
- we have seen that NFAs and DFAs accept the same languages (subset
  construction)
- we have hinted that regular expressions and NFAs specify the same
  languages

Goal. Show that NFAs and right-linear grammars generate the same
languages.
From NFAs to Right-linear Grammars
Given. Take an NFA A = (Σ, S, s0, F, R).
(alphabet, state set, initial state, final states, transition relation)

Construction of a right-linear grammar
- terminal symbols are the elements of the alphabet Σ;
- non-terminal symbols are the states S;
- the start symbol is the start state s0;
- productions are constructed as follows:
  - each transition T —a→ U gives a production T → aU
  - each final state T ∈ F gives a production T → ε

(Formally, a transition T —a→ U means (T, a, U) ∈ R.)

Observation. The grammar so generated is right-linear, and hence regular.
NFAs to Right-linear Grammars - Example

Given. A non-deterministic automaton


[Diagram: an NFA with states S, S1, S2, S3 (S3 final); transitions
S —a→ S, S —a→ S1, S1 —b→ S1, S1 —b→ S2, S2 —c→ S2, S2 —c→ S3.]

Equivalent Grammar obtained by construction

S → aS S2 → cS2
S → aS1 S2 → cS3
S1 → bS1 S3 → ε
S1 → bS2

Exercise. Convince yourself that the NFA accepts precisely the words that
the grammar generates.
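Code sketch. The construction two slides back is mechanical enough to script. A minimal Python sketch (nfa_to_grammar is our own helper name) that reproduces the grammar above from the transition relation R:

```python
def nfa_to_grammar(start, finals, transitions):
    """transitions: a set of triples (T, a, U), i.e. the relation R."""
    productions = [(t, a + u) for (t, a, u) in sorted(transitions)]  # T --a--> U gives T → aU
    productions += [(t, "") for t in sorted(finals)]                 # T ∈ F gives T → ε
    return start, productions

# The example NFA above:
R = {("S", "a", "S"), ("S", "a", "S1"), ("S1", "b", "S1"),
     ("S1", "b", "S2"), ("S2", "c", "S2"), ("S2", "c", "S3")}
for lhs, rhs in nfa_to_grammar("S", {"S3"}, R)[1]:
    print(lhs, "→", rhs or "ε")
```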

From Right-linear Grammars to NFAs
Given. A right-linear grammar (Vt, Vn, S, P)
(terminals, non-terminals, start symbol, productions)

Construction of an equivalent NFA
- the alphabet is the set of terminal symbols Vt;
- the states are the non-terminal symbols Vn, together with a new state Sf
  (for final);
- the start state is the start symbol S;
- the final states are Sf and all non-terminals T such that there exists a
  production T → ε;
- the transition relation is constructed as follows:
  - each production T → aU gives a transition T —a→ U
  - each production T → a gives a transition T —a→ Sf
Right-linear Grammars to NFAs - Example
Given. Grammar G with the productions
S → 0     S → 1T
T → ε     T → 0T     T → 1T

(generates binary strings without leading zeros)

Equivalent Automaton obtained by construction.

[Diagram: states S, T, Sf with T and Sf final; transitions
S —0→ Sf, S —1→ T, T —0→ T, T —1→ T.]

Exercise. Convince yourself that the NFA accepts precisely the words that
the grammar generates.
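Code sketch. The reverse construction can be scripted the same way; a minimal sketch with our own helper name grammar_to_nfa (single-character symbols assumed for brevity):

```python
def grammar_to_nfa(nonterminals, start, productions):
    """productions: pairs (A, rhs) with rhs of the form aU, a, or '' (for ε)."""
    states = set(nonterminals) | {"Sf"}
    finals = {"Sf"} | {a for a, rhs in productions if rhs == ""}  # T → ε: T is final
    delta = set()
    for a, rhs in productions:
        if len(rhs) == 2:
            delta.add((a, rhs[0], rhs[1]))   # T → aU  gives  T --a--> U
        elif len(rhs) == 1:
            delta.add((a, rhs, "Sf"))        # T → a   gives  T --a--> Sf
    return states, start, finals, delta

# Grammar for binary strings without leading zeros:
prods = [("S", "0"), ("S", "1T"), ("T", ""), ("T", "0T"), ("T", "1T")]
states, start, finals, delta = grammar_to_nfa({"S", "T"}, "S", prods)
print(finals)          # final states: Sf and T
print(sorted(delta))   # S --0--> Sf, S --1--> T, T --0--> T, T --1--> T
```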
Context-Free (Type 2) Grammars (CFGs)

Recall. A grammar is type-2 or context-free if all productions have the form

A → w

where A ∈ Vn is a non-terminal, and w ∈ V∗ is an (arbitrary) string.
- the left side is a single non-terminal
- the right side can be anything
- independent of context: replace LHS with RHS

In Contrast. Context-sensitive grammars may have productions

αAβ → αwβ

- may only replace A by w if A appears between α and β
Example
Goal. Design a CFG for the language

L = {a^m b^n c^(m−n) | m ≥ n ≥ 0}

Strategy. Every word w ∈ L can be split as

w = a^(m−n) a^n b^n c^(m−n)

and hence L = {a^k a^n b^n c^k | n, k ≥ 0}
- convenient: no comparison between n and m needed
- generate a^k … c^k, i.e. the same number of leading a’s and trailing c’s
- fill … in the middle with a^n b^n, i.e. the same number of a’s and b’s
- use different non-terminals for the two phases of the construction

Resulting Grammar. (productions only)

S → aSc | T
T → aTb | ε
Example ctd.

Grammar

S → aSc | T
T → aTb | ε

Example Derivation of aaabbc:

S ⇒ aSc
⇒ aTc
⇒ aaTbc
⇒ aaaTbbc
⇒ aaabbc
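Code sketch. As a sanity check, one can enumerate all short words of this grammar and compare them against {a^k a^n b^n c^k | n, k ≥ 0} directly. A minimal sketch (enumerate_words is our own helper; it exploits that every sentential form of this grammar contains at most one non-terminal):

```python
# Productions: S → aSc | T,  T → aTb | ε  (non-terminals are upper case)
PRODUCTIONS = {"S": ["aSc", "T"], "T": ["aTb", ""]}

def enumerate_words(max_len, start="S"):
    """All terminal words of length ≤ max_len, by exhaustive expansion."""
    words, stack = set(), [start]
    while stack:
        s = stack.pop()
        i = next((j for j, c in enumerate(s) if c.isupper()), None)
        if i is None:
            if len(s) <= max_len:
                words.add(s)
        elif len(s) <= max_len + 1:  # one non-terminal still pending
            stack.extend(s[:i] + rhs + s[i + 1:] for rhs in PRODUCTIONS[s[i]])
    return words

expected = {"a" * (n + k) + "b" * n + "c" * k
            for n in range(4) for k in range(4) if 2 * n + 2 * k <= 6}
assert enumerate_words(6) == expected  # matches {a^k a^n b^n c^k | n, k ≥ 0}
print(sorted(enumerate_words(6), key=len))
```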

The Power of Context-Free Grammars

A fun example (for the “usefulness” of CFGs):


http://pdos.csail.mit.edu/scigen

Note that the generator doesn’t seem to work anymore, but they
also link similar services. And maybe you can find a working version
of it somewhere.

Parse Trees
Idea. Represent a derivation as a tree rather than as a list of rule applications
- describes where and how productions have been applied
- the generated word can be collected at the leaves
Example for the grammar that we have just constructed (parse tree of aaabbc):

        S
      / | \
     a  S  c
        |
        T
      / | \
     a  T  b
      / | \
     a  T  b
        |
        ε
Parse Trees Carry Semantics
Take the code

if e1 then if e2 then s1 else s2

where e1, e2 are boolean expressions and s1, s2 are subprograms.

Two Readings.

if e1 then ( if e2 then s1 else s2 )

and

if e1 then ( if e2 then s1 ) else s2

Goal. An unambiguous interpretation of the code, leading to deterministic
and clear program execution.
Ambiguity

Recall that we can present CFG derivations as parse trees.
Until now this was mere pretty presentation; now it will become important.

- A context-free grammar G is unambiguous iff every string can be derived
  by at most one parse tree.
- G is ambiguous iff there exists a word w ∈ L(G) derivable by more than
  one parse tree.
Example: If-Then and If-Then-Else

Consider the CFG

S → if bexp then S | if bexp then S else S | prog

where bexp and prog stand for boolean expressions and (if-statement
free) programs respectively, defined elsewhere.
The string if e1 then if e2 then s1 else s2 then has two parse trees:

[Two parse trees:
 1) S ⇒ if e1 then S, with S ⇒ if e2 then S else S, S ⇒ s1, S ⇒ s2
    — the else attaches to the inner if;
 2) S ⇒ if e1 then S else S, with the first S ⇒ if e2 then S, S ⇒ s1,
    and the second S ⇒ s2 — the else attaches to the outer if.]
Example: If-Then and If-Then-Else

That grammar was ambiguous. But here’s a grammar accepting the
exact same language that is unambiguous:

S → if bexp then S | T
T → if bexp then T else S | prog

There is now only one parse for if e1 then if e2 then s1 else s2.
This is given on the next slide:

Example: If-Then and If-Then-Else

[The unique parse tree: S ⇒ if e1 then S, where that S ⇒ T and
T ⇒ if e2 then T else S, with T ⇒ s1 and S ⇒ s2 —
i.e. the reading if e1 then ( if e2 then s1 else s2 ).]

You cannot parse this string as if e1 then ( if e2 then s1 ) else s2.
Reflecting on This Example
Observation.
more than one grammar for a language
some are ambiguous, others are not
ambiguity is a property of grammars

Grammars for Programs.


ambiguity is bad: don’t know how program will execute!
replace ambiguous grammar with unambiguous one

Choices for converting ambiguous grammars to unambiguous ones
- decide on just one parse tree
  e.g. if e1 then ( if e2 then s1 ) else s2 vs.
  if e1 then ( if e2 then s1 else s2 )
- in the example we have chosen:
  if e1 then ( if e2 then s1 else s2 )
What Ambiguity Isn’t
Question. Is the grammar with the following production ambiguous?

T → if bexp then T else S

Reasoning.
Suppose that the above production was used
we can then expand either T or S first.

A. This is not ambiguity.
- both options give rise to the same parse tree
- indeed, for context-free grammars it doesn’t matter which production is
  applied first
- thinking about parse trees, both expansions happen in parallel

Main Message. Parse trees provide a better representation of syntax than
derivations.
Inherently Ambiguous Languages

Q. Can we always remove ambiguity?

Example. Language L = {a^i b^j c^k | (j = i or j = k) and i, j, k ∈ ℕ}

Q. Why is this context-free?

A. Note that L = {a^i b^i c^k | i, k ∈ ℕ} ∪ {a^i b^j c^j | i, j ∈ ℕ}
- idea: start with a production that “splits” between the union
- S → T | W, where T is “left” and W is “right”

Complete Grammar.

S → T | W
T → UV          W → XY
U → aUb | ε     X → aX | ε
V → cV | ε      Y → bYc | ε
Inherently Ambiguous Languages
Problem. The left part a^i b^i c^k and the right part a^i b^j c^j have a
non-empty intersection: a^i b^i c^i

Ambiguity where a, b and c are equi-numerous:

[Two parse trees for a^i b^i c^i: one via S ⇒ T ⇒ UV, where U ⇒∗ a^i b^i
and V ⇒∗ c^i; the other via S ⇒ W ⇒ XY, where X ⇒∗ a^i and Y ⇒∗ b^i c^i.]

So there are two parse trees for the same word!

Fact. There is no unambiguous grammar for this language
(we don’t prove this).
The Bad News

Q. Can we compute an unambiguous grammar whenever one exists?

Q. Can we even determine whether an unambiguous grammar exists?

A. If we interpret “compute” and “determine” as “by means of a
program”, then no.
There is no program that solves this problem for all grammars
(input: CFG G; output: ambiguous or not). This problem is undecidable.
(More undecidable problems next week!)

Example: Subtraction
Example.
S → S − S | int
int stands for integers
the intended meaning of − is subtraction

Ambiguity.

[Two parse trees for 5 − 3 − 1: the left tree groups as (5 − 3) − 1, the
right tree as 5 − (3 − 1).]

Evaluation.
left parse tree evaluates to 1
right parse tree evaluates to 3
so ambiguity matters! (As we also saw for the if/else statements.)
Technique 1: Associativity
Idea for ambiguity induced by binary operator (think: −)
prescribe “implicit parentheses”, e.g. a − b − c ≡ (a − b) − c
make operator associate to the left or the right

Left Associativity.
S → S − int | int

Result.
5 − 3 − 1 can only be read as (5 − 3) − 1
this is left associativity

Right Associativity.
S → int − S | int

Idea. Break the symmetry
- one side of the operator is forced to a lower level
- e.g. for left associativity: force the right-hand side of − down to int
Example: Multiplication and Addition

Example. Grammar for addition and multiplication

S → S ∗ S | S + S | int

Ambiguity.
1 + 2 ∗ 3 can be read as (1 + 2) ∗ 3 and 1 + (2 ∗ 3) with different results
also 1 + 2 + 3 is ambiguous – but this doesn’t matter here.

Take 1. The trick we have just seen
- strictly evaluate from left to right
- but this gives 1 + 2 ∗ 3 ≡ (1 + 2) ∗ 3, not intended!

Goal. Want ∗ to have higher precedence than +
Technique 2: Precedence
Example Grammar giving ∗ higher precedence:

S → S + T | T
T → T ∗ int | int

Given e.g. 1 + 2 ∗ 3 or 2 ∗ 3 + 1
- forced to expand + first: otherwise we only get ∗
- so + will be the last operation evaluated

Example. Derivation of 1 + 2 ∗ 3
- suppose we start with S ⇒ T ⇒ T ∗ int
- stuck, as we cannot generate 1 + 2 from T

Idea. Force operations with higher priority to a lower level
- three levels: S (highest), T (middle) and integers (lowest)
- the lowest-priority operation is generated by the highest-level
  non-terminal
Example: Basic Arithmetic

Repeated use of + and ∗:

S → S + T | S − T | T
T → T ∗ U | T / U | U
U → (S) | int

Main Differences.
- parentheses break operator priorities, e.g. (1 + 2) ∗ 3
- parentheses are at the lowest level, so they have the highest priority
- a lower-priority operator can appear inside parentheses
- expressions of arbitrary complexity (no nesting in the previous examples)
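Code sketch. The levels S, T, U map directly onto a recursive-descent evaluator, one function per non-terminal; the left recursion becomes a loop, which yields left associativity. A minimal Python sketch (ours, with a simplistic tokeniser):

```python
import re

def evaluate(expr):
    """Recursive-descent evaluator for
       S → S+T | S−T | T,  T → T∗U | T/U | U,  U → (S) | int."""
    tokens = re.findall(r"\d+|[+\-*/()]", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = tokens[pos]
        assert expected is None or tok == expected, f"expected {expected}, got {tok}"
        pos += 1
        return tok

    def parse_S():                      # S → S + T | S − T | T
        value = parse_T()
        while peek() in ("+", "-"):
            op, rhs = eat(), parse_T()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_T():                      # T → T ∗ U | T / U | U
        value = parse_U()
        while peek() in ("*", "/"):
            op, rhs = eat(), parse_U()
            value = value * rhs if op == "*" else value / rhs
        return value

    def parse_U():                      # U → (S) | int
        if peek() == "(":
            eat("(")
            value = parse_S()
            eat(")")
            return value
        return int(eat())

    value = parse_S()
    assert peek() is None, "trailing input"
    return value

print(evaluate("1+2*3"))    # 7 — ∗ binds tighter than +
print(evaluate("5-3-1"))    # 1 — left associative: (5 − 3) − 1
print(evaluate("(1+2)*3"))  # 9 — parentheses break the priorities
```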
Example: Balanced Brackets

S → ε | (S) | SS

Ambiguity.
- associativity: create brackets from the left or from the right (as before)
- two ways of generating (): S ⇒ SS ⇒ S ⇒ (S) ⇒ ()
- indeed, any expression has infinitely many parse trees

Reason. More than one way to derive ε.
Technique 3: Controlling 

Alternative Grammar with only one way to derive ε:

S → ε | T
T → TU | U
U → () | (T)

- ε can only be derived from S
- all other derivations go through T
- here: combined with the multiple-levels technique
- ambiguity involving ε can be easy to miss!
From Grammars to Automata

So Far.
regular languages correspond to regular grammars.

Q. What automata correspond to context free grammars?

General Structure of Automata

[Diagram: an input tape a0 a1 a2 … an, read left-to-right by a head that is
attached to a finite state control with auxiliary memory.]

- the input tape holds a sequence of symbols
- the finite state control is just like for DFAs / NFAs
- symbols are processed and the head advances
- new aspect: auxiliary memory

Auxiliary Memory classifies languages and grammars
- no auxiliary memory: NFAs / DFAs: regular languages
- stack: push-down automata: context-free languages
- unbounded tape: Turing machines: all languages
Push-down Automata — PDAs

[Diagram: an input tape a0 a1 a2 … an read by a head; the finite state
control is attached to a stack memory holding z1 z2 … zk.]
PDAs ctd.
Actions of a push-down automaton
- change of internal state
- pushing or popping the stack
- advance to the next input symbol

Action dependencies. Actions generally depend on
- current state (of the finite state control)
- input symbol
- symbol at the top of the stack

Acceptance. The machine accepts if
- the input string is fully read
- the machine is in an accepting state

Variation.
- PDAs can equivalently be defined without final states F.
- Then the acceptance condition is having an empty stack
  (after the input word has been completely read).
Example

Language (that cannot be recognised by a DFA)

L = {a^n b^n | n ≥ 1}

- cannot be recognised by a DFA
- can be generated by a context-free grammar
- can be recognised by a PDA

PDA design. (ad hoc, but showcases the idea)
- phase 1: (state S1) push a’s from the input onto the stack
- phase 2: (state S2) pop a’s from the stack, if there is a b on input
- finalise: if the input is exhausted and the stack is empty, enter a final
  state (S3), i.e., accept the string.
Deterministic PDA – Definition
Definition. A deterministic PDA has the form (S, s0, F, Σ, Γ, Z, δ), where
- S is the finite set of states, s0 ∈ S is the initial state and F ⊆ S are
  the final states;
- Σ is the finite alphabet, or set of input symbols;
- Γ is the finite set of stack symbols, and Z ∈ Γ is the initial stack
  symbol;
- δ is a (partial) transition function

  δ : S × (Σ ∪ {ε}) × Γ ⇀ S × Γ∗
  δ : (state, input token or ε, top-of-stack) ⇀ (new state, new stack)

Additional Requirement to ensure determinism:
- if δ(s, ε, γ) is defined, then δ(s, a, γ) is undefined for all a ∈ Σ
- ensures that the automaton has at most one execution
Notation

Given. Deterministic PDA with transition function

δ : S × (Σ ∪ {ε}) × Γ ⇀ S × Γ∗
δ : (state, input token or ε, top-of-stack) ⇀ (new state, stack string)

Notation.
- write δ(s, a, γ) = s′/σ
- σ is a string that replaces the top stack symbol
- final states are usually underlined

Rationale.
- replacing the top stack symbol gives just one operation for push and pop
- pop: δ(s, a, γ) = s′/ε
- push: δ(s, a, γ) = s′/wγ
Two types of PDA transition

Input-consuming transitions
- δ contains (s1, x, γ) ↦ s2/σ
- the automaton reads symbol x
- symbol x is consumed

Non-consuming transitions
- δ contains (s1, ε, γ) ↦ s2/σ
- recall that for the pair s1, γ we can’t have any other entry (s1, a, γ)
  with a ∈ Σ, to stay deterministic (see the definition above)
- independent of the input symbol
- can happen at any time and does not consume an input symbol
Example ctd.

Language L = {a^n b^n | n ≥ 1}

Push-down automaton
- starts with Z (initial stack symbol) on the stack
- final state is S3 (underlined)
- transition function (partial) given by

  δ(S0, a, Z) ↦ S1/aZ    push first a
  δ(S1, a, a) ↦ S1/aa    push further a’s
  δ(S1, b, a) ↦ S2/ε     start popping a’s
  δ(S2, b, a) ↦ S2/ε     pop further a’s
  δ(S2, ε, Z) ↦ S3/ε     accept

(δ is partial, i.e. undefined for many arguments)

Also note that we don’t have to delete Z in the last step.
The stack doesn’t have to be empty at the end.
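Code sketch. A minimal Python simulation of this deterministic PDA (function and variable names are ours; the transition table is transcribed from above):

```python
# δ: (state, input symbol or "" for ε, top of stack) -> (new state, replacement)
DELTA = {
    ("S0", "a", "Z"): ("S1", "aZ"),   # push first a
    ("S1", "a", "a"): ("S1", "aa"),   # push further a's
    ("S1", "b", "a"): ("S2", ""),     # start popping a's
    ("S2", "b", "a"): ("S2", ""),     # pop further a's
    ("S2", "",  "Z"): ("S3", ""),     # accept
}
FINALS = {"S3"}

def accepts(word, state="S0", stack="Z"):
    """Run the DPDA; the stack is a string with its top at the left."""
    while True:
        if word and stack and (state, word[0], stack[0]) in DELTA:
            state, push = DELTA[(state, word[0], stack[0])]
            word, stack = word[1:], push + stack[1:]
        elif stack and (state, "", stack[0]) in DELTA:      # ε-transition
            state, push = DELTA[(state, "", stack[0])]
            stack = push + stack[1:]
        else:
            return not word and state in FINALS             # input consumed, final state

print(accepts("aaabbb"))  # True
print(accepts("aaba"))    # False
```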
Example ctd. – Diagrammatically

[Diagram: start → S0; transitions S0 —a,Z/aZ→ S1; S1 —a,a/aa→ S1 (loop);
S1 —b,a/ε→ S2; S2 —b,a/ε→ S2 (loop); S2 —ε,Z/ε→ S3; S3 final.]
Example ctd. — PDA Trace
PDA configurations
triples: (state, remaining input, stack)
top of stack on the left (by convention)

Example Execution.

(S0, aaabbb, Z) ⇒ (S1, aabbb, aZ)   (push first a)
                ⇒ (S1, abbb, aaZ)   (push further a’s)
                ⇒ (S1, bbb, aaaZ)   (push further a’s)
                ⇒ (S2, bb, aaZ)     (start popping a’s)
                ⇒ (S2, b, aZ)       (pop further a’s)
                ⇒ (S2, ε, Z)        (pop further a’s)
                ⇒ (S3, ε, ε)        (accept)

Accepting execution. Input exhausted, ends in a final state (as usual!).
Example ctd. — Rejection

PDA execution.

(S0, aaba, Z) ⇒ (S1, aba, aZ)
              ⇒ (S1, ba, aaZ)
              ⇒ (S2, a, aZ)
              ⇒ ???

Non-accepting execution.
No transition possible, stuck without reaching final state
rejection happens when transition function is undefined for current
configuration (state, input, top of stack)

Example: Palindromes with ‘Centre Mark’

Example Language.

L = {w c w^R | w ∈ {a, b}∗ ∧ w^R is w reversed}

Deterministic PDA that accepts L
- push a’s and b’s onto the stack as we see them;
- when we see c, change state;
- now try to match the tokens we are reading with the tokens on top of
  the stack, popping as we go;
- if the top of the stack is the initial stack symbol Z, enter the final
  state via an ε-transition. Hopefully our input has been used up too!

Exercise. Define this formally!
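A possible formalisation, sketched as a transition table in the style of the a^n b^n example (our sketch, not the official solution to the exercise):

```python
# States: "push" (reading w), "match" (reading w^R), "done" (final).
DELTA = {}
for x in "ab":
    DELTA[("push", x, "Z")] = ("push", x + "Z")     # push onto Z
    for g in "ab":
        DELTA[("push", x, g)] = ("push", x + g)     # push a's and b's
    DELTA[("push", "c", x)] = ("match", x)          # centre mark: change state
    DELTA[("match", x, x)] = ("match", "")          # pop on a match
DELTA[("push", "c", "Z")] = ("match", "Z")          # the case w = ε
DELTA[("match", "", "Z")] = ("done", "Z")           # Z back on top: accept via ε-move
FINALS = {"done"}

# Runs with the accepts() simulator from the a^n b^n sketch (same module-level
# DELTA/FINALS), e.g.:
#   accepts("abcba", state="push")   # True
#   accepts("abcab", state="push")   # False
```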

Non-Deterministic PDAs
Deterministic PDAs
- transitions are a partial function
  δ : S × (Σ ∪ {ε}) × Γ ⇀ S × Γ∗
  δ : (state, input token or ε, top-of-stack) ⇀ (new state, stack string)
- side condition on ε-transitions

Non-Deterministic PDAs
- transitions given by a relation
  δ ⊆ S × (Σ ∪ {ε}) × Γ × S × Γ∗
- no side condition (at all)

Main differences
- for a deterministic PDA: at most one transition possible
- for a non-deterministic PDA: zero or more transitions possible
Non-Deterministic PDAs ctd.
Finite Automata
non-determinism is convenient
but doesn’t give extra power (subset construction)
can convert every NFA to an equivalent DFA

Push-down automata.
non-determinism gives extra power
cannot convert every non-deterministic PDA to deterministic PDA
there are context free languages that can only be recognised by
non-deterministic PDA
intuition: non-determinism allows “guessing”

Grammar / Automata correspondence


non-deterministic PDAs are more important
they correspond to context-free languages
Example: Even-Length Palindromes
Palindromes of even length, without centre-marks

L = {w w^R | w ∈ {a, b}∗ ∧ w^R is w reversed}

- this is a context-free language
- it cannot be recognised by a deterministic PDA
- intuitive reason: no centre-mark, so we don’t know when the first half of
  the word has been read

Non-deterministic PDA for L has the transitions

δ(s, ε, x) = r/x for x ∈ {a, b, Z}

where s is the ‘push’ state and r the ‘match and pop’ state.

Intuition
“guess” (non-deterministically) whether we need to enter
“match-and-pop”-state
automaton gets stuck if guess is not correct (no harm done)
automaton accepts if guess is correct
Grammars and PDAs

Theorem. Context-free languages and non-deterministic PDAs are equivalent:
- for every CFL L there exists a PDA that accepts L
- if L is accepted by a non-deterministic PDA, then L is a CFL

Proof. We only do one direction: construct a PDA from a CFG.
- this is the “interesting” direction for parser generators
- the other direction is quite complex
From CFG to PDA
Given. Context-Free Grammar G = (Vt, Vn, S, P)

Construct a non-deterministic PDA A = (Q, Q0, F, Σ, Γ, Z, δ)

States. Q0 (initial state), Q1 (working state) and Q2 (final state).

Alphabet. Σ = Vt, the terminal symbols of the grammar.

Stack Alphabet. Γ = Vt ∪ Vn ∪ {Z}

Initialisation.
- push the start symbol S onto the stack, enter working state Q1
- δ(Q0, ε, Z) ↦ Q1/SZ

Termination.
- if the stack is empty (i.e. just contains Z), terminate
- δ(Q1, ε, Z) ↦ Q2/ε
From CFGs to PDAs: working state
Idea.
- build the derivation on the stack by expanding non-terminals according
  to the productions
- if a terminal appears that matches the input, pop it
- terminate if the entire input has been consumed

Expand Non-Terminals.
- non-terminals on the stack are replaced by right-hand sides of productions
- δ(Q1, ε, A) ↦ Q1/α for all productions A → α

Pop Terminals.
- terminals on the stack are popped if they match the input
- δ(Q1, t, t) ↦ Q1/ε for all terminals t

Result of Construction. A non-deterministic PDA
- there may be more than one production for a non-terminal
Example — Derive a PDA for a CFG

Arithmetic Expressions as a grammar:

S → S + T | T
T → T ∗ U | U
U → (S) | int

1. Initialise:

δ(Q0, ε, Z) ↦ Q1/SZ

2. Expand non-terminals:

δ(Q1, ε, S) ↦ Q1/S + T     δ(Q1, ε, T) ↦ Q1/U
δ(Q1, ε, S) ↦ Q1/T         δ(Q1, ε, U) ↦ Q1/(S)
δ(Q1, ε, T) ↦ Q1/T ∗ U     δ(Q1, ε, U) ↦ Q1/int
CFG to PDA ctd.

3. Match and pop terminals:

δ(Q1, +, +) ↦ Q1/ε
δ(Q1, ∗, ∗) ↦ Q1/ε
δ(Q1, int, int) ↦ Q1/ε
δ(Q1, (, () ↦ Q1/ε
δ(Q1, ), )) ↦ Q1/ε

4. Terminate:

δ(Q1, ε, Z) ↦ Q2/ε
Example Trace

(Q0, int ∗ int, Z) ⇒ (Q1, int ∗ int, SZ)
                   ⇒ (Q1, int ∗ int, TZ)
                   ⇒ (Q1, int ∗ int, T ∗ UZ)
                   ⇒ (Q1, int ∗ int, U ∗ UZ)
                   ⇒ (Q1, int ∗ int, int ∗ UZ)
                   ⇒ (Q1, ∗ int, ∗ UZ)
                   ⇒ (Q1, int, UZ)
                   ⇒ (Q1, int, intZ)
                   ⇒ (Q1, ε, Z)
                   ⇒ (Q2, ε, ε)
                   ⇒ accept
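Code sketch. The whole construction, plus a breadth-first search over the non-deterministic configurations, fits in a short Python sketch (names are ours; 'i' abbreviates the token int, and the length-based pruning is sound here because every grammar symbol derives at least one terminal):

```python
from collections import deque

# Grammar: S → S+T | T,  T → T∗U | U,  U → (S) | int   ("i" stands for int)
PRODUCTIONS = {"S": ["S+T", "T"], "T": ["T*U", "U"], "U": ["(S)", "i"]}

def cfg_pda_accepts(word, start="S"):
    """Breadth-first search over configurations of the PDA built from the CFG
    (state Q1 is implicit; stack top at the left, 'Z' at the bottom)."""
    queue, seen = deque([(word, start + "Z")]), set()
    while queue:
        inp, stack = queue.popleft()
        if (inp, stack) in seen or len(stack) - 1 > len(inp):
            continue  # prune: each stack symbol must still derive ≥ 1 input token
        seen.add((inp, stack))
        if stack == "Z":                        # δ(Q1, ε, Z) ↦ Q2/ε
            if inp == "":
                return True                     # accept in Q2
            continue
        top = stack[0]
        if top in PRODUCTIONS:                  # δ(Q1, ε, A) ↦ Q1/α (non-deterministic)
            for alpha in PRODUCTIONS[top]:
                queue.append((inp, alpha + stack[1:]))
        elif inp and inp[0] == top:             # δ(Q1, t, t) ↦ Q1/ε
            queue.append((inp[1:], stack[1:]))
    return False

print(cfg_pda_accepts("i*i"))    # True  (the trace above: int ∗ int)
print(cfg_pda_accepts("i+i*i"))  # True
print(cfg_pda_accepts("i+"))     # False
```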

