Outline: Great Theoretical Ideas in CS

This document outlines key concepts in theoretical computer science related to finite automata and regular languages. It discusses deterministic finite automata (DFAs) and how they can be used to recognize regular languages. Specifically, it defines what a DFA is, provides examples of DFAs and the languages they recognize, and discusses the membership problem of determining if a given string is in the language recognized by a DFA. It also proves that any finite language is regular by showing that a language consisting of a finite number of strings can be recognized by a DFA.

Uploaded by

Asheber

Great Theoretical Ideas in CS

V. Adamchik, CS 15-251, Carnegie Mellon University

Finite Automata

Outline

DFA
Regular Languages
0^n1^n is not regular
Union Theorem
Kleene’s Theorem
NFA
Application: KMP

Deterministic Finite Automaton

A machine so simple that you can understand it in just one minute.

[Figure: state diagram of a DFA over {0,1}, labelled with the strings 0111, 111, 11, 1 and ε]

The machine processes a string and accepts it if the process ends in a double circle.
The unique string of length 0 will be denoted by ε and will be called the empty or null string.

Anatomy of a Deterministic Finite Automaton

[Figure: the same state diagram, annotated with its states, transitions, start state (q0) and accept states (F)]

The machine accepts a string if the process ends in an accept state (double circle).

The singular of automata is automaton.
The alphabet Σ of a finite automaton is the set where the symbols come from, for example {0,1}.
The language L(M) of a finite automaton is the set of strings that it accepts:
L(M) = {x ∈ Σ* : M accepts x}
It’s also called the “language decided/accepted by M”.

The Language L(M) of Machine M

[Figure: one-state DFA, accepting state q0 with a self-loop on 0,1]
L(M) = all strings of 0s and 1s
The language of a finite automaton is the set of strings that it accepts.

The Language L(M) of Machine M

[Figure: two-state DFA with states q0 and q1; 0 loops on each state, 1 moves between them]
What language does this DFA decide/accept?
L(M) = { w | w has an even number of 1s }

Formal definition of DFAs

A finite automaton is a 5-tuple M = (Q, Σ, δ, q0, F), where
Q is the finite set of states
Σ is the alphabet
δ : Q × Σ → Q is the transition function
q0 ∈ Q is the start state
F ⊆ Q is the set of accept states

L(M) = the language of machine M
     = set of all strings machine M accepts

Example: M = (Q, Σ, δ, q0, F) where
Q = {q0, q1, q2, q3}
Σ = {0,1}
q0 ∈ Q is the start state
F = {q1, q2} ⊆ Q are the accept states
δ : Q × Σ → Q is the transition function given by

δ     0    1
q0    q0   q1
q1    q2   q2
q2    q3   q2
q3    q0   q2

[Figure: state diagram of M]
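
The 5-tuple translates almost directly into code. The following is a minimal Python sketch, assuming the example machine with Q = {q0, q1, q2, q3} and F = {q1, q2}; the names DELTA, START, ACCEPT and accepts are illustrative choices, not part of the slides.

```python
# Hypothetical encoding of the example machine M = (Q, Σ, δ, q0, F).
# The transition function δ is stored as a dict keyed by (state, symbol).
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "q2",
    ("q3", "0"): "q0", ("q3", "1"): "q2",
}
START = "q0"
ACCEPT = {"q1", "q2"}


def accepts(word: str) -> bool:
    """Follow one transition per input symbol, then test membership in F."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPT


print(accepts("10"))   # q0 -> q1 -> q2, an accept state
print(accepts("100"))  # q0 -> q1 -> q2 -> q3, not an accept state
```

The loop does one dictionary lookup per input symbol, so the running time grows linearly with the length of the word.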

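EXAMPLE
An automaton that accepts all and only those strings that contain 001.

[Figure: DFA over {0,1} whose states are labelled {0}, {00}, {001}, tracking how much of 001 has been seen]

Determine the language recognized by

[Figure: state diagram of a DFA over {0,1}]

L(M) = {1, 11, 111, …}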

Determine the language decided by

[Figure: state diagram of a DFA over {0,1}]

L(M) = {1, 01}

Membership problem
Determine whether some word belongs to the language.

Regular Languages

A language over Σ is a set of strings over Σ.
A language L ⊆ Σ* is regular if it is recognized by a deterministic finite automaton, that is, if there is a DFA which decides it.

L = { w | w contains 001 } is regular
L = { w | w has an even number of 1s } is regular

DFA Membership problem

Determine whether some word belongs to the language.

Theorem: The DFA Membership Problem is solvable in linear time.

Let M = (Q, Σ, δ, q0, F) and w = w1...wm.
Algorithm for DFA M:
p := q0;
for i := 1 to m do p := δ(p, wi);
if p ∈ F then return Yes else return No.

Are all languages regular?

Theorem: Any finite language is regular

Claim 1: Let w be a string over an alphabet. Then {w} is a regular language.

Proof: By induction on the number of characters. If {a} and {b} are regular then {ab} is regular.

Claim 2: A language consisting of n strings is regular.

Proof: By induction on the number of strings. If L and {a} are regular then L ∪ {a} is regular.
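
The theorem can also be checked constructively. The sketch below does not follow the inductive proof. It takes a different route, assumed here for illustration: use the prefixes of the given strings as states and add one dead state. The function names are hypothetical.

```python
def finite_language_dfa(words, alphabet):
    """Build a DFA whose states are all prefixes of the words, plus a dead state."""
    dead = None  # trap state for inputs that are not a prefix of any word
    states = {""}
    for w in words:
        for i in range(1, len(w) + 1):
            states.add(w[:i])
    delta = {}
    for s in states:
        for a in alphabet:
            # Stay inside the prefix tree if possible, otherwise fall into the trap.
            delta[(s, a)] = s + a if s + a in states else dead
    for a in alphabet:
        delta[(dead, a)] = dead
    return delta, "", set(words)  # (δ, start state, accept states)


def accepts(dfa, word):
    delta, state, accept = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accept


M = finite_language_dfa({"1", "01"}, "01")
print([w for w in ["", "0", "1", "01", "11", "010"] if accepts(M, w)])
```

The number of states is at most one more than the total length of all strings plus one, so it is finite whenever the language is finite.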

Theorem: L = {0^n 1^n : n ∈ ℕ} is not regular

Notation:
If a ∈ Σ is a symbol and n ∈ ℕ then a^n denotes the string aaa···a (n times).
E.g., a^3 means aaa, a^5 means aaaaa, a^1 means a, a^0 means ε, etc.
Thus L = {ε, 01, 0011, 000111, 00001111, …}.

Theorem: L = {0^n 1^n : n ∈ ℕ} is not regular

Wrong Intuition:
For a DFA to decide L, it seems like it needs to “remember” how many 0’s it sees at the beginning of the string, so that it can “check” there are equally many 1’s.
But a DFA has only finitely many states, so it shouldn’t be able to handle arbitrary n.

L = strings where the number of occurrences of 01 is equal to the number of occurrences of 10

[Figure: state diagram of a DFA M over {0,1}]

M accepts only the strings with an equal number of 01’s and 10’s!
For example, 010110

How to prove a language is NOT regular…

Assume for contradiction there is a DFA M with L(M) = L.
Argue (usually by Pigeonhole) there are two strings x and y which reach the same state in M.
Show there is a string z such that xz ∈ L but yz ∉ L.
Contradiction, since M accepts either both or neither.

Theorem: L = {0^n 1^n : n ∈ ℕ} is not regular

Full proof:
Suppose M is a DFA deciding L with, say, k states.
Let r_i be the state M reaches after processing 0^i.
By Pigeonhole, there is a repeat among r_0, r_1, r_2, …, r_k. So say that r_s = r_t for some s ≠ t.
Since 0^s 1^s ∈ L, starting from r_s and processing 1^s causes M to reach an accepting state.

So on input 0^s 1^s ∈ L, M will reach an accepting state.
Consider input 0^t 1^s ∉ L, s ≠ t.
M will process 0^t, reach state r_t = r_s, then M will process 1^s, and reach an accepting state.
Contradiction!
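
The Pigeonhole step can be made concrete for any given machine. The sketch below is an illustration only: the three-state DFA is a made-up candidate, and find_collision and run are hypothetical names. It finds s ≠ t with r_s = r_t and then shows that the machine ends in the same state on 0^s 1^s and 0^t 1^s.

```python
def find_collision(delta, start, num_states):
    """Return (s, t) with s < t such that 0^s and 0^t reach the same state."""
    seen = {}  # state -> smallest i with r_i equal to that state
    state = start
    for i in range(num_states + 1):  # r_0 .. r_k: k+1 visits, only k states
        if state in seen:
            return seen[state], i
        seen[state] = i
        state = delta[(state, "0")]
    raise AssertionError("unreachable if num_states is the true number of states")


def run(delta, start, word):
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state


# A made-up 3-state candidate DFA over {0,1}; any DFA would do.
delta = {
    ("a", "0"): "b", ("b", "0"): "c", ("c", "0"): "b",
    ("a", "1"): "a", ("b", "1"): "a", ("c", "1"): "c",
}
s, t = find_collision(delta, "a", 3)
same = run(delta, "a", "0" * s + "1" * s) == run(delta, "a", "0" * t + "1" * s)
print(s, t, same)
```

Whatever accept states the candidate has, it gives the same verdict on a string in L and a string outside L, which is exactly the contradiction in the proof.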

Regular Languages

Definition: A language L ⊆ Σ* is regular if there is a DFA which decides it.

Questions:
1. Are all languages regular?
2. Are there other ways to tell if L is regular?

Equivalence of two DFAs

Definition: Two DFAs M1 and M2 over the same alphabet are equivalent if they accept the same language: L(M1) = L(M2).

Given a few equivalent machines, we are naturally interested in the smallest one, the one with the least number of states.

Union Theorem

Given two languages, L1 and L2, define the union of L1 and L2 as
L1 ∪ L2 = { w | w ∈ L1 or w ∈ L2 }

Theorem: The union of two regular languages is also a regular language.

Proof (Sketch): Let
M1 = (Q1, Σ, δ1, q0, F1) be a finite automaton for L1
and
M2 = (Q2, Σ, δ2, q0, F2) be a finite automaton for L2.
We want to construct a finite automaton M = (Q, Σ, δ, q0, F) that recognizes L = L1 ∪ L2.

Idea: Run both M1 and M2 at the same time.

Union Theorem

L1 = strings with even # of 1’s (machine M1)
L2 = strings x with |x| div. by 3 (machine M2)

[Figure, repeated across several slides: M1 with states qeven and qodd (0 loops on each state, 1 switches between them) and M2 with states p0, p1, p2 (each step on 0,1 moves to the next state in the cycle). The slides step both machines through the input one symbol at a time.]

Input: 101001
Accept.

Union Theorem

Make a DFA keeping track of both at once.

[Figure: M1 (states qeven, qodd) and M2 (states p0, p1, p2)]

Union Theorem

Q = pairs of states, one from M1 and one from M2
  = { (q1, q2) | q1 ∈ Q1 and q2 ∈ Q2 }
  = Q1 × Q2

[Figure: product DFA with the six states (qeven, p0), (qeven, p1), (qeven, p2), (qodd, p0), (qodd, p1), (qodd, p2)]
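
The pair construction can be sketched in a few lines of Python. The slides stop at Q = Q1 × Q2. The accept set used below (a pair accepts if either component accepts) is the standard choice for union and is an assumption added here, as are the names union_dfa and accepts.

```python
from itertools import product


def union_dfa(dfa1, dfa2, alphabet):
    """Product construction: run both DFAs in lockstep on the same input."""
    states1, delta1, start1, accept1 = dfa1
    states2, delta2, start2, accept2 = dfa2
    states = set(product(states1, states2))  # Q = Q1 x Q2
    delta = {
        ((p, q), a): (delta1[(p, a)], delta2[(q, a)])
        for (p, q) in states
        for a in alphabet
    }
    # Assumed choice of F: accept when either component is in its accept set.
    accept = {(p, q) for (p, q) in states if p in accept1 or q in accept2}
    return states, delta, (start1, start2), accept


def accepts(dfa, word):
    _, delta, state, accept = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accept


# M1: even number of 1s.  M2: length divisible by 3.
M1 = ({"qeven", "qodd"},
      {("qeven", "0"): "qeven", ("qeven", "1"): "qodd",
       ("qodd", "0"): "qodd", ("qodd", "1"): "qeven"},
      "qeven", {"qeven"})
M2 = ({"p0", "p1", "p2"},
      {(p, a): q for p, q in [("p0", "p1"), ("p1", "p2"), ("p2", "p0")] for a in "01"},
      "p0", {"p0"})
M = union_dfa(M1, M2, "01")
print(accepts(M, "101001"))  # odd number of 1s, but length 6 is divisible by 3
```

Replacing "or" with "and" in the accept set would give a DFA for the intersection instead.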

The Regular Operations

Union: A ∪ B = { w | w ∈ A or w ∈ B }
Intersection: A ∩ B = { w | w ∈ A and w ∈ B }
Negation: ¬A = { w | w ∉ A }
Reverse: A^R = { w1 …wk | wk …w1 ∈ A }
Concatenation: A · B = { vw | v ∈ A and w ∈ B }
Star: A* = { w1 …wk | k ≥ 0 and each wi ∈ A }

The Kleene closure: A*

Star: A* = { w1 …wk | k ≥ 0 and each wi ∈ A }

From the definition of the concatenation, we define A^n, n = 0, 1, 2, … recursively:
A^0 = {ε}
A^(n+1) = A^n A

A* is a set consisting of concatenations of arbitrarily many strings from A:
A* = ⋃_{k ≥ 0} A^k
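
The recursive definition of A^n can be tried out directly. This is a small sketch with hypothetical helper names. Since A* is usually infinite, star_up_to only collects the powers A^0 through A^max_power.

```python
def concat(A, B):
    """A · B = { vw | v in A and w in B }."""
    return {v + w for v in A for w in B}


def power(A, n):
    """A^0 = {ε}; A^(n+1) = A^n A."""
    result = {""}
    for _ in range(n):
        result = concat(result, A)
    return result


def star_up_to(A, max_power):
    """Finite approximation of A*: the union of A^0, ..., A^max_power."""
    out = set()
    for k in range(max_power + 1):
        out |= power(A, k)
    return out


print(sorted(star_up_to({"11"}, 3), key=len))
print(sorted(power({"0", "1"}, 2)))
```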

The Kleene closure: A*

What is A* of A={0,1}?
All binary strings

What is A* of A={11}?
All strings consisting of an even number of 1s: ε, 11, 1111, …

Regular Languages Are Closed Under The Regular Operations

An axiomatic system for regular languages

Vocabulary: Languages over alphabet Σ
Axioms: ∅, {a} for each a∈Σ
Deduction rules:
Given L1, L2, can obtain L1 ⋃ L2
Given L1, L2, can obtain L1 ⋅ L2
Given L, can obtain L*

The Kleene Theorem (1956)

Every regular language over Σ can be constructed from ∅ and {a}, a ∈ Σ, using only the operations union, concatenation and the Kleene star.

Reverse

Reverse: A^R = { w1 …wk | wk …w1 ∈ A }
How to construct a DFA for the reversal of a language?
The direction in which we read a string should be irrelevant.
If we flip transitions around we might not get a DFA.

[Figure: a two-state DFA with states q0 and q1 over {0,1}, and the same machine with its transitions reversed]

Nondeterministic Finite Automaton

There is another type of machine in which there may be several possible next states. Such machines are called nondeterministic.

[Figure: a state qk with several outgoing transitions, all labelled a]

Allows transitions from qk on the same symbol to many states.

Nondeterministic finite automaton (NFA)

Nondeterminism can arise from two different sources:
- Transition nondeterminism
- Initial state nondeterminism

Nondeterministic finite automaton (NFA)

An NFA is defined using the same notation M = (Q, Σ, δ, I, F) as a DFA, except that I is a set of initial states and the transition function δ assigns a set of states to each pair in Q × Σ of state and input.

Note, every DFA is automatically also an NFA.

NFA for {0^k | k is a multiple of 2 or 3}

[Figure: NFA with two ε-transitions leading into a cycle of two 0-transitions and a cycle of three 0-transitions]

Find the language recognized by this NFA

[Figure: NFA over {0,1} with states s0, s1, s2, s3, s4]

L = {0^n, 0^n 01, 0^n 11 | n = 0, 1, 2, …}

What does it mean for an NFA to recognize a string?

[Figure: the same NFA with states s0, s1, s2, s3, s4]

Since each input symbol xj (for j>1) takes the previous state to a set of states, we shall use a union of these states.
What does it mean for an NFA to recognize a string?

Here we are going to formally define this.

For a state q and string w, δ*(q, w) is the set of states that the NFA can reach when it reads the string w starting at the state q.

Thus for NFA = (Q, Σ, δ, q0, F), the function
δ* : Q × Σ* → 2^Q
is defined by δ*(q, ε) = {q} and
δ*(q, y xk) = ⋃_{p ∈ δ*(q, y)} δ(p, xk)

Find the language recognized by this NFA

[Figure: state diagram of an NFA over {0,1} with start state s0]

L = 1* (01, 1, 10) (00)*
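
The union in the definition of δ* is a one-line set operation in code. The sketch below is an illustration under stated assumptions: the NFA for {0^k | k is a multiple of 2 or 3} is encoded with two initial states instead of ε-transitions, and the state names a0, a1, b0, b1, b2 are made up for this example.

```python
def delta_star(delta, states, word):
    """Set of states reachable from any state in `states` by reading `word`."""
    current = set(states)
    for symbol in word:
        # Union of δ(p, symbol) over all p in the current set; missing entries mean ∅.
        current = set().union(*(delta.get((p, symbol), set()) for p in current))
    return current


# Hypothetical encoding: a cycle of length 2 and a cycle of length 3, both on 0.
DELTA = {
    ("a0", "0"): {"a1"}, ("a1", "0"): {"a0"},
    ("b0", "0"): {"b1"}, ("b1", "0"): {"b2"}, ("b2", "0"): {"b0"},
}
INITIAL = {"a0", "b0"}  # initial state nondeterminism
ACCEPT = {"a0", "b0"}


def accepts(word):
    """The NFA accepts if at least one reachable state is an accept state."""
    return bool(delta_star(DELTA, INITIAL, word) & ACCEPT)


print([k for k in range(13) if accepts("0" * k)])
```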

Nondeterministic finite automaton

Theorem. If the language L is recognized by an NFA, then L is also recognized by a DFA.

In other words, if we ask whether there is an NFA that is not equivalent to any DFA, the answer is No.

Nondeterministic finite automaton

Theorem (Rabin, Scott 1959). For every NFA there is an equivalent DFA.

For this they won the Turing Award.

[Photos: Rabin and Scott; caption: CMU prof. emeritus]
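
The slides state the theorem without a construction. The usual way to prove it is the subset (powerset) construction, sketched below under that assumption: each DFA state is the set of NFA states reachable so far. The example NFA (strings whose second-to-last symbol is 1) and all names are hypothetical.

```python
def subset_construction(delta, initial, accept, alphabet):
    """Build an equivalent DFA whose states are frozensets of NFA states."""
    start = frozenset(initial)
    dfa_delta = {}
    seen = {start}
    todo = [start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            # All NFA states reachable from some state in S on symbol a.
            T = frozenset(q for p in S for q in delta.get((p, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accept = {S for S in seen if S & set(accept)}
    return seen, dfa_delta, start, dfa_accept


def dfa_accepts(dfa, word):
    _, delta, state, accept = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accept


# Example NFA: accepts strings over {0,1} whose second-to-last symbol is 1.
NFA_DELTA = {
    ("s", "0"): {"s"}, ("s", "1"): {"s", "t"},
    ("t", "0"): {"u"}, ("t", "1"): {"u"},
}
D = subset_construction(NFA_DELTA, {"s"}, {"u"}, "01")
print(len(D[0]), dfa_accepts(D, "0110"), dfa_accepts(D, "0101"))
```

Only subsets reachable from the start set are generated, so the result can be much smaller than the full 2^|Q| bound.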

NFA vs. DFA

Advantages.
Easier to construct and manipulate.
Sometimes exponentially smaller.
Sometimes algorithms much easier.

Drawbacks
Acceptance testing slower.
Sometimes algorithms more complicated.

Pattern Matching

Input: Text T of length k, string/pattern P of length n
Problem: Does pattern P appear inside text T?

Naïve method:
a1, a2, a3, a4, a5, …, an

Cost: Roughly O(n k) comparisons
(this worst case may occur in images and DNA sequences, but is unlikely in English text)

Pattern Matching

Input: Text T, length n. Pattern P, length k.
Output: Does P occur in T?

Automata solution:
The language of all texts that contain P is regular!
There is some DFA MP which decides it.
Once you build MP, feed in T: takes time O(n).

Build DFA from pattern

The alphabet is {a, b}.
The pattern is a a b a a a b b.

To create a DFA we consider all prefixes
ε, a, aa, aab, aaba, aabaa, aabaaa, aabaaab, aabaaabb

These prefixes are states. The initial state is ε. The pattern is the accepting state.
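
One way to fill in the transitions between the prefix states is sketched below. It is a deliberately naive construction and rests on an assumption not spelled out in the slides: from the prefix of length j, reading c leads to the longest prefix of the pattern that is a suffix of that prefix followed by c. States are numbered by prefix length, and the function names are hypothetical.

```python
def build_pattern_dfa(pattern, alphabet):
    """delta[(j, c)] = length of the longest prefix of pattern ending pattern[:j] + c."""
    m = len(pattern)
    delta = {}
    for j in range(m + 1):
        for c in alphabet:
            s = pattern[:j] + c
            k = min(m, len(s))
            while not s.endswith(pattern[:k]):  # k = 0 always matches the empty prefix
                k -= 1
            delta[(j, c)] = k
    return delta


def occurs(pattern, text, alphabet):
    """Feed the text through the DFA; stop as soon as the full pattern is matched."""
    if not pattern:
        return True
    delta = build_pattern_dfa(pattern, alphabet)
    state = 0
    for c in text:
        state = delta[(state, c)]
        if state == len(pattern):
            return True
    return False


d = build_pattern_dfa("aabaaabb", "ab")
print(d[(5, "b")], d[(7, "a")], d[(7, "b")])
print(occurs("aabaaabb", "abaabaaabbab", "ab"))
```

Building the table this way is slow, but once it exists the scan over the text does one lookup per character.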

DFA Construction

[Figure sequence for the pattern aabaaabb: the DFA is built one state at a time, from states 0 and 1 up to states 0 through 8, adding the a and b transitions of each new state]

The Knuth-Morris-Pratt Algorithm (1976)

In 1970 Cook published a paper about the possibility of the existence of a linear time algorithm.
Knuth and Pratt developed an algorithm.
Morris discovered the same algorithm.
[Photo caption: Pittsburgh native, CMU professor.]

The KMP Algorithm - Motivation

The algorithm compares the pattern to the text left-to-right, but shifts the pattern more intelligently than the brute-force algorithm.
When a mismatch occurs, we compute the length of the longest prefix of P that is a proper suffix of the part of P matched so far.

[Figure: pattern a b a a b a aligned under the text a b a a b x, with a mismatch at position j. After the shift, the slide marks "No need to repeat these comparisons" and "Resume comparing here".]
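
The prefix lengths used for shifting can be precomputed once per pattern. The sketch below is a common textbook formulation of KMP, offered as an illustration rather than as the exact version in the slides. The names failure_function and kmp_search are hypothetical.

```python
def failure_function(pattern):
    """fail[j] = length of the longest proper prefix of pattern[:j+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter candidate
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = failure_function(pattern)
    k = 0  # number of pattern characters matched so far
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]  # shift the pattern without re-reading the text
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(failure_function("abaaba"))
print(kmp_search("abaabxabaaba", "abaaba"))
```

The text index never moves backwards, which is what keeps the scan linear in the length of the text.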

Here’s What You Need to Know…

Languages
DFAs
The regular operations
0^n1^n is not regular
Union Theorem
Kleene’s Theorem
NFAs
Application: KMP
