Automata Theory LecturesSlides Compressed
Kinds of languages:
– Spoken (natural) language
– Programming language
– Formal Languages (Syntactic languages)
EMPTY STRING or NULL STRING
• Example:
If Σ = {a}, then a language L can be defined as
L = {a^n : n = 1, 2, 3, …} or L = {a, aa, aaa, …}
[Figure: finite automaton diagrams (states off and on; states q0, q1, q2 with edges labeled 0 and 1) and a partial table of states]
Language of a DFA
[Figure: example DFA transition diagrams (M with states off and on; states q0, q1, q2 with edges labeled 0, 1, and 0,1)]
Examples
• Construct a DFA that accepts the language
L = {010, 1} (Σ = {0, 1})
• Answer
[Figure: DFA with states qe, q0, q01, q010, q1, and qdie, with edges labeled 0, 1, and 0,1]
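The answer can be checked by simulating the DFA. The sketch below is a minimal Python version; the transition table is an assumption read off the state names (qe is the start state, q0, q01, and q010 track prefixes of 010, q1 is reached after reading 1, and qdie is a dead state), since the diagram itself did not survive extraction.

```python
# Assumed transition table for the DFA accepting L = {010, 1}.
DELTA = {
    "qe":   {"0": "q0",   "1": "q1"},
    "q0":   {"0": "qdie", "1": "q01"},
    "q01":  {"0": "q010", "1": "qdie"},
    "q010": {"0": "qdie", "1": "qdie"},
    "q1":   {"0": "qdie", "1": "qdie"},
    "qdie": {"0": "qdie", "1": "qdie"},
}
ACCEPTING = {"q010", "q1"}


def accepts_010_or_1(word, start="qe"):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = start
    for symbol in word:
        state = DELTA[state][symbol]
    return state in ACCEPTING
```

Every state has exactly one outgoing edge per input symbol, which is what makes the machine deterministic.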
Examples
• Construct a DFA over alphabet {0, 1} that
accepts all strings that end in 101
[Figure: partial DFA whose states are named after the input read so far (qe, q1, q10, q101, q11, q111, …), with edges labeled 0 and 1]
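One common way to finish this construction is to remember only the longest suffix of the input that is also a prefix of 101, which needs four states. The sketch below is that standard construction, not necessarily the one drawn on the slide; the state names are hypothetical.

```python
# Each state is the longest suffix of the input read so far that is a prefix of "101".
DELTA_101 = {
    "":    {"0": "",   "1": "1"},
    "1":   {"0": "10", "1": "1"},
    "10":  {"0": "",   "1": "101"},
    "101": {"0": "10", "1": "1"},
}


def ends_in_101(word):
    """Return True if the DFA accepts word, i.e. word ends in 101."""
    state = ""
    for symbol in word:
        state = DELTA_101[state][symbol]
    return state == "101"
```

For example, `ends_in_101("0101")` is true and `ends_in_101("1010")` is false.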
Grammar ?
Grammar and its Chomsky Classification
• We’ll cover three types of structures used in modeling computation:
• Grammars
• Used to generate sentences of a language and to determine if a given
sentence is in a language
• Formal languages, generated by grammars, provide models for
programming languages (Java, C, etc.) as well as natural languages;
they are important for constructing compilers
• Finite-state machines (FSM)
• An FSM is characterized by a set of states, an input alphabet, and a
transition function that assigns a next state to each pair of a state and an input. We’ll
study FSMs with and without output. They are used in language
recognition (equivalent to certain grammars) but also for other tasks such
as controlling vending machines
• Turing machines
• An abstraction of a computer; used to compute number-theoretic functions
Intro to Languages
predicate → verb
article → a
article → the
noun → boy
noun → dog
verb → runs
verb → sleeps
• A derivation of “the boy sleeps”:
• Language of the grammar:
L = { “a boy runs”,
“a boy sleeps”,
“the boy runs”,
“the boy sleeps”,
“a dog runs”,
“a dog sleeps”,
“the dog runs”,
“the dog sleeps” }
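The eight sentences can be reproduced by expanding the grammar mechanically. In the sketch below, the three upper-level rules (sentence → noun_phrase predicate, noun_phrase → article noun, predicate → verb) are partly an assumption: only predicate → verb and the word-level rules survive in the extracted slides, and the other two are chosen so that the listed language comes out.

```python
from itertools import product

# Grammar as a dict from variable to a list of production bodies.
# The "sentence" and "noun_phrase" rules are assumed, not taken from the slides.
RULES = {
    "sentence": [["noun_phrase", "predicate"]],
    "noun_phrase": [["article", "noun"]],
    "predicate": [["verb"]],
    "article": [["a"], ["the"]],
    "noun": [["boy"], ["dog"]],
    "verb": [["runs"], ["sleeps"]],
}


def expand(symbol):
    """Return every sequence of terminal words derivable from symbol."""
    if symbol not in RULES:  # a terminal derives only itself
        return [[symbol]]
    results = []
    for body in RULES[symbol]:
        # Combine the expansions of the body's symbols from left to right.
        for parts in product(*(expand(s) for s in body)):
            results.append([word for part in parts for word in part])
    return results


language = {" ".join(words) for words in expand("sentence")}
```

This works only because the grammar has no recursion; a recursive grammar generates an infinite language and would need a length bound.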
Notation
• noun → boy
noun → dog
• Each line is a production rule. The left-hand side (noun) is a variable, or
non-terminal; the right-hand side (boy, dog) is a terminal. Both are symbols
of the vocabulary.
Basic Terminology
Example 1:
Assignment statements
• V = { S, E }, T = { i, =, +, *, n }
• Productions:
S → i = E
E → n
E → i
E → E + E
E → E * E
Example 3: 0^n 1^n
• V = { S }, T = { 0, 1 }
• Productions:
S → ε
S → 0S1
Derivation
• Definition
• If w0, w1, …, wn are strings over V such that w0 ⇒ w1, w1 ⇒ w2, …, wn-1 ⇒ wn,
then we say that wn is derivable from w0, and write w0 ⇒* wn.
Leftmost vs rightmost derivations
• Leftmost derivation: the leftmost variable is always
the one replaced when applying a production
– Example: S ⇒ i = E ⇒ i = E + E
⇒ i = n + E ⇒ i = n + n
• Rightmost derivation: rightmost variable is replaced
– Example: S ⇒ i = E ⇒ i = E + E
⇒ i = E + n ⇒ i = n + n
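The two derivation orders can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: sentential forms are lists of symbols, the helper names are hypothetical, and the code does not check that each body is a legal production for the variable it replaces.

```python
VARIABLES = {"S", "E"}


def step(form, body, leftmost=True):
    """Replace the leftmost (or rightmost) variable in form with body."""
    positions = [i for i, sym in enumerate(form) if sym in VARIABLES]
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + body + form[i + 1:]


def derive(bodies, leftmost):
    """Apply the given production bodies in order, starting from S."""
    form, forms = ["S"], ["S"]
    for body in bodies:
        form = step(form, body, leftmost)
        forms.append(" ".join(form))
    return " => ".join(forms)


# The same four productions, applied in leftmost and in rightmost order.
bodies = [["i", "=", "E"], ["E", "+", "E"], ["n"], ["n"]]
leftmost_derivation = derive(bodies, leftmost=True)
rightmost_derivation = derive(bodies, leftmost=False)
```

Both runs end in the same string; only the intermediate sentential forms differ.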
Sentential forms
• In a derivation, assuming it begins with S, all
intermediate strings are called sentential forms of
the grammar G
• Example: i = E and i = E + n are sentential forms of
the assignment statement grammar
• The sentential forms are called leftmost (rightmost)
sentential forms if they are a result of leftmost
(rightmost) derivations
Parse trees
• Recall that a tree in graph theory is a set of nodes
such that
– There is a special node called the root
– Nodes can have zero or more child nodes
– Nodes without children are called leaves
– Interior nodes: nodes that are not leaves
• A parse tree for a grammar G is a tree such that the
interior nodes are non-terminals in G and children of
a non-terminal correspond to the body of a
production in G
Yield of a parse tree
• Yield: concatenation of leaves from left to right
• If the root of the tree is the start symbol, and all
leaves are terminal symbols, then the yield is a string
in L(G)
• A derivation always corresponds to some parse tree
Types of Grammars -
Chomsky hierarchy of languages
• Venn Diagram of Grammar Types:
Type 1 – Context-Sensitive
Type 2 – Context-Free
Type 3 – Regular
Classifying grammars
{a^n b^n : n ≥ 0}    {ww^R}
Regular Languages
a*b*    (a + b)*
Fall 2006
Definition: Context-Free Grammars
Grammar G = (V, T, S, P)
Fall 2006
Costas Busch - RPI 51
Grammar for mathematical expressions
E → E + E | E * E | (E) | a
Example strings:
(a + a) * a + (a + a * (a + a))
E → E + E | E * E | (E) | a
A leftmost derivation for a + a*a:
E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a
[Figure: derivation tree with E + E at the root and E * E as its right subtree]
Another leftmost derivation for a + a*a:
E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a
[Figure: derivation tree with E * E at the root and E + E as its left subtree]
Two derivation trees for a + a*a
[Figure: the two derivation trees side by side]
Take a = 2: a + a*a = 2 + 2*2
[Figure: the same two trees with every a replaced by 2]
Good tree: 2 + 2*2 = 6. Bad tree: 2 + 2*2 = 8.
Compute the expression result using the tree: the good tree evaluates 2*2 = 4 first and then 2 + 4 = 6; the bad tree evaluates 2 + 2 = 4 first and then 4*2 = 8.
Ambiguous Grammar:
A context-free grammar G is ambiguous
if there is a string w ∈ L(G) which has:
two different derivation trees
or
two leftmost derivations
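Ambiguity of the expression grammar E → E + E | E * E | (E) | a can be checked by counting derivation trees for a given string. The sketch below is one possible way to do this (a memoized count over substrings); the function name is hypothetical, and the input is assumed to be a string of the single-character tokens a, +, *, ( and ).

```python
from functools import lru_cache


def count_trees(tokens):
    """Count derivation trees of tokens under E -> E+E | E*E | (E) | a."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees that derive tokens[i:j] from E.
        if j <= i:
            return 0
        if j - i == 1:
            return 1 if tokens[i] == "a" else 0
        total = 0
        # E -> (E)
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # E -> E + E and E -> E * E: try each operator as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

A string with a count of 2 or more witnesses that the grammar is ambiguous; `count_trees("a+a*a")` is such a case, while the parenthesized `count_trees("(a+a)*a")` has a single tree.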
• Context-Sensitive Languages
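[Figure: finite control reading an input tape]
• Example #1:
[Figure: DFA with states q0 and q1 over inputs 0 and 1]
Input:    1  0  0  1  1
States: q0 q0 q1 q0 q0 q0
[Figure: DFA with states q0, q1, q2; a and b loop on q0 and q1, c leads from q0 to q1 and from q1 to q2, and a/b/c loop on q2]
Input a c c c b: accepted
States: q0 q0 q1 q2 q2 q2
Input a a c: rejected
States: q0 q0 q0 q1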
[Figure: the same DFA with states q0, q1, q2]
Inductive proof (sketch) that the machine correctly accepts strings with at least two c’s:
The proof is by induction on the length of the string.
Inductive step: for a string xp with |xp| = k+1, consider each case for the last symbol p = a, b, or c, that is, xa, xb, and xc.
• A DFA is a five-tuple:
M = (Q, Σ, δ, q0, F)
δ:      0     1
  q0    q1    q0
  q1    q0    q1
• Revisit example #2:
[Figure: DFA diagram with states q0, q1, q2]
Q = {q0, q1, q2}
Σ = {a, b, c}
Start state is q0
F = {q2}
δ:      a     b     c
  q0    q0    q0    q1
  q1    q1    q1    q2
  q2    q2    q2    q2
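The five-tuple translates directly into a table-driven simulator. The sketch below encodes δ from the table above as a Python dictionary and records the sequence of states visited; the function name is hypothetical.

```python
# delta for the "at least two c's" DFA, taken from the table above.
DELTA = {
    "q0": {"a": "q0", "b": "q0", "c": "q1"},
    "q1": {"a": "q1", "b": "q1", "c": "q2"},
    "q2": {"a": "q2", "b": "q2", "c": "q2"},
}


def run_dfa(word, start="q0", final=frozenset({"q2"})):
    """Return (list of states visited, whether the word is accepted)."""
    states = [start]
    for symbol in word:
        states.append(DELTA[states[-1]][symbol])
    return states, states[-1] in final
```

Running it on `"acccb"` and `"aac"` gives the two state sequences shown in the earlier example, accepted and rejected respectively.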
Nondeterministic Finite State Automata (NFA)
• An NFA is a five-tuple:
M = (Q, Σ, δ, q0, F)
[Figure: NFA diagram with states q0, q1, q2]
Q = {q0, q1, q2}
Σ = {0, 1}
Start state is q0
F = {q2}
δ:      0           1
  q0    {q0, q1}    {}
  q1    {}          {q1, q2}
  q2    {q2}        {q2}
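An NFA can be simulated by tracking the set of all states it could be in after each input symbol. The sketch below does this for the δ table above; the function name is hypothetical.

```python
# delta maps a state and a symbol to a set of possible next states.
NFA_DELTA = {
    "q0": {"0": {"q0", "q1"}, "1": set()},
    "q1": {"0": set(), "1": {"q1", "q2"}},
    "q2": {"0": {"q2"}, "1": {"q2"}},
}


def nfa_accepts(word, start="q0", final=frozenset({"q2"})):
    """Accept if at least one possible run ends in a final state."""
    current = {start}
    for symbol in word:
        # Union of the next-state sets of every state we might be in.
        current = set().union(*(NFA_DELTA[q][symbol] for q in current))
    return bool(current & final)
```

An empty `current` set means every possible run has died, so the word is rejected.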
• Example #2: pair of 0’s or pair of 1’s as substring
[Figure: NFA diagram with states q0 through q4]
Q = {q0, q1, q2, q3, q4}
Σ = {0, 1}
Start state is q0
F = {q2, q4}
δ:      0           1
  q0    {q0, q3}    {q0, q1}
  q1    {}          {q2}
  q2    {q2}        {q2}
  q3    {q4}        {}
  q4    {q4}        {q4}
• Question: Why is non-determinism useful?
– Non-determinism = Backtracking
– Compressed information
– Non-determinism hides backtracking
– Programming languages, e.g., Prolog, hide backtracking => Easy to
program at a higher level: what we want to do, rather than how to do it
– Useful in algorithm complexity study
– Is an NFA more “powerful” than a DFA, i.e., does it accept a type of language that
no DFA can?
Equivalence of DFAs and NFAs
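The usual argument for equivalence is the subset construction: each DFA state is a set of NFA states. The sketch below applies it to the NFA of Example #2 (a pair of 0’s or a pair of 1’s as a substring); the function and variable names are hypothetical, and only subsets reachable from the start state are built.

```python
NFA = {
    "q0": {"0": {"q0", "q3"}, "1": {"q0", "q1"}},
    "q1": {"0": set(), "1": {"q2"}},
    "q2": {"0": {"q2"}, "1": {"q2"}},
    "q3": {"0": {"q4"}, "1": set()},
    "q4": {"0": {"q4"}, "1": {"q4"}},
}
NFA_FINAL = {"q2", "q4"}


def subset_construction(nfa, start, alphabet):
    """Build the DFA transition table over reachable subsets of NFA states."""
    start_set = frozenset({start})
    dfa, todo = {}, [start_set]
    while todo:
        subset = todo.pop()
        if subset in dfa:
            continue
        dfa[subset] = {}
        for symbol in alphabet:
            target = frozenset(t for q in subset for t in nfa[q][symbol])
            dfa[subset][symbol] = target
            todo.append(target)
    return start_set, dfa


START, DFA = subset_construction(NFA, "q0", "01")


def dfa_accepts(word):
    """A subset is accepting if it contains at least one NFA final state."""
    state = START
    for symbol in word:
        state = DFA[state][symbol]
    return bool(state & NFA_FINAL)
```

In the worst case the construction produces exponentially many subsets, but here only a handful are reachable.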
• Consider the following DFA: 2 or more c’s
[Figure: DFA diagram with states q0, q1, q2]
Q = {q0, q1, q2}
Σ = {a, b, c}
Start state is q0
F = {q2}
δ:      a     b     c
  q0    q0    q0    q1
  q1    q1    q1    q2
  q2    q2    q2    q2
• An Equivalent NFA:
[Figure: NFA diagram with states q0, q1, q2]
Q = {q0, q1, q2}
Σ = {a, b, c}
Start state is q0
F = {q2}
δ:      a       b       c
  q0    {q0}    {q0}    {q1}
Real-life Uses of DFAs
Grep
Coke Machines
Thermostats (fridge)
Elevators
• Introduction
• Chomsky normal form
• Preliminary simplifications
• Final steps
• Greibach Normal Form
• Algorithm (Example)
• Summary
Grammar: G = (V, T, P, S)
Terminals T = { a, b }
Variables V = { A, B, C }
Start Symbol S
Productions P = { S → A }
Grammar example
S → aBSc
S → abc
Ba → aB
Bb → bb
L = { a^n b^n c^n | n ≥ 1 }

S → P
P → aPb
P → ε
L = { a^n b^n | n ≥ 0 }
2 Eliminate ε productions
• X is generating if X ⇒* ω for some terminal string ω.
• X is reachable if there is a derivation S ⇒* αXβ
for some α and β
Initial CFL grammar:
S → AB | a
A → b
Identify generating symbols:
S → AB | a
A → b
Remove non-generating:
S → a
A → b
Identify reachable symbols:
S → a
A → b
S → AB | a S→a
A→b A→b
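Both tests are fixed-point computations and can be sketched in Python. The encoding is an assumption made for brevity: uppercase letters are variables, lowercase letters are terminals, and each production body is a string. The order used (remove non-generating symbols first, then unreachable ones) is the usual one.

```python
GRAMMAR = {"S": ["AB", "a"], "A": ["b"]}  # B has no productions


def generating(grammar):
    """Variables that derive some terminal string."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in gen:
                continue
            # A body qualifies if it uses only terminals and generating variables.
            if any(all(c.islower() or c in gen for c in body) for body in bodies):
                gen.add(head)
                changed = True
    return gen


def reachable(grammar, start="S"):
    """Variables that appear in some sentential form derived from start."""
    seen, todo = {start}, [start]
    while todo:
        head = todo.pop()
        for body in grammar.get(head, []):
            for c in body:
                if c.isupper() and c not in seen:
                    seen.add(c)
                    todo.append(c)
    return seen


def remove_useless(grammar, start="S"):
    gen = generating(grammar)
    kept = {
        head: [b for b in bodies if all(c.islower() or c in gen for c in b)]
        for head, bodies in grammar.items() if head in gen
    }
    reach = reachable(kept, start)
    return {head: bodies for head, bodies in kept.items() if head in reach}
```

With this grammar, B is not generating, so S → AB is dropped; after that A is no longer reachable from S, so this sketch also drops A → b.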
Eliminate ε Productions
Nullable variable: A ⇒* ε
If A is a nullable variable
S → ASA | aB
A → B | S
B → b | ε
Nullable: {A, B}
Eliminate ε Productions
Before:               After:
S → ASA | aB          S → ASA | aB | AS | SA | S | a
A → B | S             A → B | S
B → b | ε             B → b
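The nullable set and the rewritten grammar can be computed mechanically. In this sketch, production bodies are strings, the empty string stands for ε, and the function names are hypothetical. Each nullable symbol in a body is either kept or dropped, and the empty body is discarded.

```python
from itertools import product

GRAMMAR = {"S": ["ASA", "aB"], "A": ["B", "S"], "B": ["b", ""]}  # "" stands for ε


def nullable_variables(grammar):
    """Variables A with A =>* ε, found by iterating to a fixed point."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(c in nullable for c in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(grammar):
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # A nullable symbol may be kept or dropped; other symbols are kept.
            options = [(c, "") if c in nullable else (c,) for c in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result
```

The result still contains the unit production S → S, matching the slide; unit productions are handled in a later step.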
A ⇒* B
If A → B and B → ω, then A → ω
Example:
T = {*, +, (, ), a, b, 0, 1}
I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T * F
E → T | E + T

Basis: (A, A) is a unit pair for any variable A, since A ⇒* A by 0 steps.

Pairs       Productions
( E, E )    E → E + T
( E, T )    E → T * F
( E, F )    E → (E)
( E, I )    E → a | b | Ia | Ib | I0 | I1
( T, T )    T → T * F
( T, F )    T → (E)
( T, I )    T → a | b | Ia | Ib | I0 | I1
( F, F )    F → (E)
( F, I )    F → a | b | Ia | Ib | I0 | I1
( I, I )    I → a | b | Ia | Ib | I0 | I1
Example:
Pairs       Productions
…           …
( T, T )    T → T * F
( T, F )    T → (E)
( T, I )    T → a | b | Ia | Ib | I0 | I1
…           …

I → a | b | Ia | Ib | I0 | I1
E → E + T | T * F | (E) | a | b | Ia | Ib | I0 | I1
T → T * F | (E) | a | b | Ia | Ib | I0 | I1
F → (E) | a | b | Ia | Ib | I0 | I1
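The unit pairs and the resulting grammar can be computed with a small closure loop. In this sketch, each body is a string, a body counts as a unit production when it is exactly one variable name, and the function names are hypothetical.

```python
GRAMMAR = {
    "E": ["T", "E+T"],
    "T": ["F", "T*F"],
    "F": ["I", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def unit_pairs(grammar):
    """All pairs (A, B) such that A =>* B using unit productions only."""
    pairs = {(v, v) for v in grammar}  # basis: (A, A) in 0 steps
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in grammar[b]:
                # B -> C is a unit production when the body is a single variable.
                if body in grammar and (a, body) not in pairs:
                    pairs.add((a, body))
                    changed = True
    return pairs


def eliminate_unit(grammar):
    """For each unit pair (A, B), give A every non-unit production of B."""
    result = {v: [] for v in grammar}
    for a, b in sorted(unit_pairs(grammar)):
        for body in grammar[b]:
            if body not in grammar and body not in result[a]:
                result[a].append(body)
    return result
```

The closure finds the same ten pairs as the table, and `eliminate_unit(GRAMMAR)` reproduces the final grammar above.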
A → αX
Example:
Original grammar     Renaming     Renamed grammar
S → XA | BB          S = A1       A1 → A2A3 | A4A4
B → b | SB           X = A2       A4 → b | A1A4
X → b                A = A3       A2 → b
A → a                B = A4       A3 → a
Example:
A4 → A1A4
Example:
Grammar:                  Rewriting A4 → A1A4:
A1 → A2A3 | A4A4          A4 → A1A4
A4 → b | A1A4             A4 → A2A3A4 | A4A4A4 | b
A2 → b                    A4 → bA3A4 | A4A4A4 | b
A3 → a
Example:
Second Step: Eliminate Left Recursions
A1 → A2A3 | A4A4
A4 → bA3A4 | A4A4A4 | b
A2 → b
A3 → a
Left-recursive production: A4 → A4A4A4
Example:
A1 → A2A3 | A4A4
A4 → bA3A4 | b | bA3A4Z | bZ
Z → A4A4 | A4A4Z
A2 → b
A3 → a
GNF: A → αX
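The left-recursion step follows a fixed pattern: productions A → Aα | β become A → β | βZ and Z → α | αZ. The sketch below applies that pattern to the A4 productions; bodies are lists of symbols, all bodies are assumed non-empty, and the function name is hypothetical.

```python
def eliminate_left_recursion(head, bodies, new_var="Z"):
    """Rewrite head -> head alpha | beta as head -> beta | beta Z, Z -> alpha | alpha Z."""
    alphas = [body[1:] for body in bodies if body[0] == head]  # left-recursive tails
    betas = [body for body in bodies if body[0] != head]       # all other bodies
    if not alphas:
        return {head: bodies}
    return {
        head: betas + [beta + [new_var] for beta in betas],
        new_var: alphas + [alpha + [new_var] for alpha in alphas],
    }


A4_BODIES = [["b", "A3", "A4"], ["A4", "A4", "A4"], ["b"]]
result = eliminate_left_recursion("A4", A4_BODIES)
```

Here `result["A4"]` corresponds to bA3A4 | b | bA3A4Z | bZ and `result["Z"]` to A4A4 | A4A4Z, as in the grammar above.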
Example:
This definition may seem circular, but 1-3 form the basis
Precedence: Parentheses have the highest precedence, followed by * (iteration),
concatenation, and then union (ICU)
RE Examples
• L(001) = {001}
• L(0+10*) = { 0, 1, 10, 100, 1000, 10000, … }
• L(0*10*) = {1, 01, 10, 010, 0010, …} i.e. {w | w has exactly a single 1}
• L((ΣΣ)*) = {w | w is a string of even length}
• L((0(0+1))*) = { ε, 00, 01, 0000, 0001, 0100, 0101, …}
• L((0+ε)(1+ ε)) = {ε, 0, 1, 01}
• L(1Ø) = Ø ; concatenating the empty set to any set yields the empty set.
• Rε = R
• R+Ø = R
• Note that R+ε may or may not equal R (we are adding ε to the language)
• Note that RØ will only equal R if R itself is the empty set.
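Several of these examples can be checked with Python’s `re` module. The notation differs slightly: union is written `|` instead of `+`, and ε is written as an empty alternative. The helper `language` below is hypothetical; it lists, in order of length, every string over {0, 1} up to a given length that the pattern matches in full.

```python
import re
from itertools import product


def language(pattern, max_len, alphabet="01"):
    """All strings over alphabet, up to max_len long, that fully match pattern."""
    regex = re.compile(pattern)
    words = (
        "".join(p)
        for n in range(max_len + 1)
        for p in product(alphabet, repeat=n)
    )
    return [w for w in words if regex.fullmatch(w)]


single_one = language(r"0*10*", 3)      # 0*10*
zero_or_ten = language(r"0|10*", 4)     # 0+10*
pairs = language(r"(0(0|1))*", 4)       # (0(0+1))*
optional = language(r"(0|)(1|)", 3)     # (0+ε)(1+ε)
```

`fullmatch` is used rather than `search` because a regular expression in the formal sense describes whole strings, not substrings.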
Regular Expressions
• Regular expressions describe regular languages
• Example: (a + b·c)* describes the language
{a, bc}* = {ε, a, bc, aa, abc, bca, ...}
• Example
[Figure: automaton with states 1, 2, and 3 and edges labeled 0, 1, and 0+1, used to illustrate conversion to RE’s]
Converting an RE to an Automaton
• We have shown we can convert an automaton to an RE.
To show equivalence we must also go the other
direction and convert an RE to an automaton.
• This is easiest to do by converting an RE to an ε-NFA
– Inductive construction
– Start with a simple basis, use that to build more complex
parts of the NFA
RE to ε-NFA
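• Basis: R = a, R = ε, R = Ø
[Figure: ε-NFA for each basis case]
[Figure: ε-NFA that joins the automata for S and T in parallel using ε-transitions]
R = ST
[Figure: ε-NFA for S followed by an ε-transition into the automaton for T]
R = S*
[Figure: ε-NFA for S with added ε-transitions]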
RE to ε-NFA Example
• Convert R= (ab+a)* to an NFA
– We proceed in stages, starting from simple
elements and working our way up
[Figure: ε-NFA fragments for a, for b, and for ab (a, then an ε-transition, then b)]
RE to ε-NFA Example (2)
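[Figure: ε-NFA for ab+a, joining the fragments for ab and a with ε-transitions]
[Figure: ε-NFA for (ab+a)*, adding ε-transitions around the fragment for ab+a]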
Pushdown Automata
Formal Definition of a PDA
Pushdown Automaton
• A pushdown automaton (PDA) is an abstract model machine
similar to the FSA
Power of PDAs
NDPDAs are different from DPDAs
• The PDA can be represented by
M = (Q, Σ, Γ, δ, s, F)
where Σ is the alphabet of input symbols and Γ
is the alphabet of stack symbols.
• The set of all strings accepted by a PDA M is
denoted by L(M). We also say that the
language L(M) is accepted by M.
• The transition diagram of a PDA is an alternative way
to represent the PDA.
• For M = (Q, Σ, Γ, δ, s, F), the transition diagram of M
is an edge-labeled digraph G=(V, E) satisfying the
following:
V = Q (the start state s and each final state f ∈ F are drawn with distinguishing marks)
Solution 1.
[Figure: PDA transition diagram with edges labeled 0, ε/0 and 1, 0/ε]
Solution 2.
Consider a CFG
G = ({S}, {0,1}, {S → ε | 0S1}, S).
[Figure: PDA transition diagram built from G, with edges labeled ε, ε/S; ε, S/ε; ε, S/1; ε, ε/S; ε, ε/0; 0, 0/ε; 1, 1/ε]
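The idea behind Solution 1 can be sketched with an explicit stack. The problem statement did not survive extraction, so this sketch assumes the target language is the one generated by G, {0^n 1^n | n ≥ 0}. The two state names and the acceptance condition (all input consumed and stack empty) are also assumptions.

```python
def pda_accepts(word):
    """Simulate a two-phase PDA: push a 0 for each 0 read, pop a 0 for each 1 read."""
    stack = []
    state = "push"  # hypothetical state names: "push" first, then "pop"
    for symbol in word:
        if state == "push" and symbol == "0":
            stack.append("0")   # edge 0, ε/0
        elif symbol == "1" and stack:
            stack.pop()         # edge 1, 0/ε
            state = "pop"
        else:
            return False        # no applicable transition
    return not stack
```

The stack is what lets the machine compare the two counts, which no finite automaton can do.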
• TMs model the computing capability of a general purpose computer, which
informally can be described as:
– Effective procedure
• Finitely describable
• Well defined, discrete, “mechanical” steps
• Always terminates
– Computable function
• A function computable by an effective procedure
Deterministic Turing Machine (DTM)
…….. B B 0 1 1 0 0 B B ……..
[Figure: finite control attached to the tape head]
• Two-way, infinite tape, broken into cells, each containing one symbol.
• Two-way, read/write tape head.
• An input string is placed on the tape, padded to the left and right infinitely with
blanks, read/write head is positioned at the left end of input string.
• Finite control, i.e., a program, containing the position of the read head, current
symbol being scanned, and the current state.
• In one move, depending on the current state and the current symbol being
scanned, the TM 1) changes state, 2) prints a symbol over the cell being
scanned, and 3) moves its tape head one cell left or right.
• Many modifications are possible, but the Church-Turing thesis holds that all are equivalent.
Formal Definition of a DTM
• A DTM is a seven-tuple:
M = (Q, Σ, Γ, δ, q0, B, F)
Intuitively, δ(q,s) specifies the next state, symbol to be written, and the direction of
tape head movement by M after reading symbol s while in state q.
• Example #1: {w | w is in {0,1}* and w ends with a 0}
Accepted: 0, 00, 10, 10110
Not accepted: ε
          0            1            B
->q0      (q0, 0, R)   (q0, 1, R)   (q1, B, L)
  q1      (q2, 0, R)   -            -
  q2*     -            -            -
– q0 is the start state and the “scan right” state, until it hits B
– q1 is the “verify 0” state
– q2 is the final state
• Example #2: {0^n 1^n | n ≥ 1}
          0            1            X            Y            B
->q0      (q1, X, R)   -            -            (q3, Y, R)   -
  q1      (q1, 0, R)   (q2, Y, L)   -            (q1, Y, R)   -
  q2      (q2, 0, L)   -            (q0, X, R)   (q2, Y, L)   -
  q3      -            -            -            (q3, Y, R)   (q4, B, R)
  q4*     -            -            -            -            -
– (q0, Y): the 0’s are finished
– (q1, 0) and (q1, Y): skipped while scanning right (ignore1, ignore2)
– (q1, B): undefined (more 0’s)
– (q2, Y) and (q2, 0): skipped while scanning left (ignore1, ignore2)
– (q3, 1): undefined (more 1’s)
– (q3, Y): skipped (ignore)
q0 0011 BB…  ⊢ X q1 011
             ⊢ X0 q1 11
             ⊢ X q2 0Y1
             ⊢ q2 X0Y1
             ⊢ X q0 0Y1
             ⊢ XX q1 Y1
             ⊢ XXY q1 1
             ⊢ XX q2 YY
             ⊢ X q2 XYY
             ⊢ XX q0 YY
             ⊢ XXY q3 Y B…
             ⊢ XXYY q3 BB…
             ⊢ XXYYB q4
• Same Example #2: {0^n 1^n | n ≥ 1}
          0            1            X            Y            B
  q0      (q1, X, R)   -            -            (q3, Y, R)   -
  q1      (q1, 0, R)   (q2, Y, L)   -            (q1, Y, R)   -
  q2      (q2, 0, L)   -            (q0, X, R)   (q2, Y, L)   -
  q3      -            -            -            (q3, Y, R)   (q4, B, R)
  q4      -            -            -            -            -
Logic: cross off a 0 with an X, scan right to look for the corresponding 1, cross it off with a Y, and scan
left to find the next leftmost 0. Keep iterating until there are no more 0’s, then scan right looking for B.
– The TM matches up 0’s and 1’s
– q1 is the “scan right” state, looking for 1
– q2 is the “scan left” state, looking for X
– q3 is “scan right”, looking for B
– q4 is the final state
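The transition table can be run directly with a small simulator. In the sketch below, δ is copied from the table above; the sparse-dictionary tape, the step limit, and the function name are implementation choices for illustration. A missing table entry means the machine halts without accepting.

```python
B = "B"  # blank symbol

# delta[(state, scanned symbol)] = (next state, symbol to write, head direction)
DELTA = {
    ("q0", "0"): ("q1", "X", "R"), ("q0", "Y"): ("q3", "Y", "R"),
    ("q1", "0"): ("q1", "0", "R"), ("q1", "1"): ("q2", "Y", "L"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q2", "0"): ("q2", "0", "L"), ("q2", "X"): ("q0", "X", "R"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q3", "Y"): ("q3", "Y", "R"), ("q3", B): ("q4", B, "R"),
}


def tm_accepts(word, start="q0", final="q4", max_steps=10_000):
    """Run the DTM on word; accept if it reaches the final state."""
    tape = dict(enumerate(word))  # cells not in the dict are blank
    state, head = start, 0
    for _ in range(max_steps):
        if state == final:
            return True
        move = DELTA.get((state, tape.get(head, B)))
        if move is None:
            return False  # no transition defined: halt and reject
        state, tape[head], direction = move
        head += 1 if direction == "R" else -1
    raise RuntimeError("step limit reached")
```

On input `"0011"` the simulator passes through the same configurations as the trace shown earlier and ends in q4.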
Formal Definitions for DTMs
• Let M = (Q, Σ, Г, δ, q0, B, F) be a TM.
x1 x2 … x(i-1) q xi x(i+1) … xn
x1 x2 … x(i-1) q xi x(i+1) … xn ⊢ x1 x2 … x(i-1) y p x(i+1) … xn
x1 x2 … xn q ⊢ x1 x2 … xn y p
L is recursively enumerable (r.e.):
TMs exist (M0, M1, …) that accept every string in L and do not accept any string outside L
L is recursive:
at least one TM accepts L and halts on every string in L and in Σ* - L; others may or may not halt
L is not r.e.:
no TM exists that accepts L
Modifications of the Basic TM Model
The Halting Problem - Background
• Definition: A decision problem is a problem having a yes/no answer (that one
presumably wants to solve with a computer). Typically, there is a list of parameters on
which the problem is based.
– Given a list of numbers, is that list sorted?
– Given a number x, is x even?
– Given a C program, does that C program contain any syntax errors?
– Given a TM (or C program), does that TM contain an infinite loop?
From a practical perspective, many decision problems do not seem all that interesting.
However, from a theoretical perspective they are, for the following two reasons:
– Decision problems are more convenient/easier to work with when proving complexity results.
– Non-decision counter-parts can always be created & are typically at least as difficult to solve.
• Notes:
– The following terms and phrases are analogous:
r.e. recursive
languages languages
THE HALTING PROBLEM
HALT_TM = { (M, w) | M is a TM that halts on string w }