Chapter 1 Intro To The Theory of Computation (2016)

Chapter 1

Introduction to the Theory of Computation

Strings and Languages
• Symbol: anything like a, b, c, 0, 1, …
• Alphabet Σ: a finite set of symbols.
• Example: the Roman alphabet {A, …, Z, a, b, …, z}.
• The "binary alphabet" {0, 1} is especially pertinent to the theory of computation.
• String: a "string" over an alphabet is a finite sequence of symbols from that alphabet, usually written next to one another and not separated by commas.
• (i) If Σa = {0, 1}, then 001001 is a string over Σa.
• (ii) If Σb = {a, b, …, z}, then axyrpqstcd is a string over Σb.
• Length of a string: its length as a sequence, i.e. the number of symbols in the string.
• The length of a string w is written |w|.
• Example: |10011| = 5
• Empty string: the string of length zero is called the "empty string".
• It is denoted by ε or λ.
• The empty string plays the role of 0 in a number system: for every string w, λw = wλ = w.
Strings
• If w is a string, then w^n stands for the string obtained by repeating w n times.
• As a special case, we define w^0 = λ for all w.
• If Σ is an alphabet, then we use Σ* to denote the set of strings obtained by concatenating zero or more symbols from Σ.
• The set Σ* always contains λ.
• To exclude the empty string, we define Σ+ = Σ* − {λ}.
• Σ^0 = {ε}, where ε is the empty string (common to all alphabets).
• While Σ is finite by assumption, Σ* and Σ+ are always infinite (for a nonempty Σ), since there is no limit on the length of the strings in these sets.

• A language is defined as a subset of Σ*.

• A string in a language L will be called a sentence of L.

• Any set of strings on an alphabet Σ can be considered a language.
• A set of strings over Σ (i.e. any subset of Σ*) is called a formal
language over Σ.
• For example, if Σ = {0, 1}, the set of strings with an even number of
zeros ({ε, 1, 00, 11, 001, 010, 100, 111, 0000, 0011, 0101, 0110, 1001,
1010, 1100, 1111, …}) is a formal language over Σ.
Reverse string: If w = w1 w2 … wn, where each wi ∈ Σ, the reverse of w is wn wn−1 … w1.
Substring: z is a substring of w if z appears consecutively within w. For example, 'deck' is a substring of 'abcdeckabcjkl'.
Concatenation: Given a string x of length m and a string y of length n, the concatenation of x and y, written xy, is the string obtained by appending y to the end of x, as in x1 x2 … xm y1 y2 … yn.
To concatenate a string with itself many times we use the "superscript" notation: w^n denotes w concatenated with itself n times, and w^0 = λ.
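A minimal Python sketch of these string operations, assuming every symbol is a single character so that a string over Σ can be modelled as a Python str (with "" standing for λ). The function names concat, power and reverse are our own, chosen only for illustration.

```python
def concat(x: str, y: str) -> str:
    """Concatenation xy: the symbols of y appended to the end of x."""
    return x + y


def power(w: str, n: int) -> str:
    """w^n: w concatenated with itself n times; w^0 is the empty string."""
    result = ""  # start from λ, the identity element for concatenation
    for _ in range(n):
        result = concat(result, w)
    return result


def reverse(w: str) -> str:
    """Reverse of w1 w2 ... wn, i.e. wn ... w2 w1."""
    return w[::-1]


print(len("10011"))          # |10011| = 5
print(concat("ab", "ba"))    # abba
print(power("ab", 3))        # ababab
print(repr(power("ab", 0)))  # '' (λ)
print(reverse("abc"))        # cba
```

Writing power in terms of concat mirrors the definition w^n = w^(n−1) w; in everyday Python one would simply write w * n.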
Suffix: If w = xv for some x, then v is a suffix of w.
Example: Take w = 0110. Then λ, 0, 10, 110, and 0110 are the suffixes of 0110. A string of length n has n + 1 suffixes.

Proper suffix: any suffix of a string other than the string itself is called a proper suffix of the string. Example: for w = 0110, the proper suffixes are λ, 0, 10, and 110.

Prefix: If w = vy for some y, then v is a prefix of w.

Example: Take w = 0111. Then λ, 0, 01, 011, and 0111 are the prefixes of 0111. A string of length n has n + 1 prefixes.
Proper prefix: any prefix of a string other than the string itself is called a proper prefix of the string. Example: for w = 0111, the proper prefixes are λ, 0, 01, and 011.
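A short Python sketch that lists prefixes and suffixes by slicing, again assuming single-character symbols and "" for λ. The helper names are illustrative; the output reproduces the two examples above and the n + 1 count.

```python
def prefixes(w: str) -> list[str]:
    """All prefixes of w, from λ (length 0) up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]


def suffixes(w: str) -> list[str]:
    """All suffixes of w, from λ (length 0) up to w itself."""
    return [w[len(w) - i:] for i in range(len(w) + 1)]


def proper(parts: list[str]) -> list[str]:
    """Drop the last element, which is the whole string itself."""
    return parts[:-1]


print(suffixes("0110"))          # ['', '0', '10', '110', '0110']
print(proper(prefixes("0111")))  # ['', '0', '01', '011']
print(len(prefixes("0111")))     # 5, i.e. n + 1 for n = 4
```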
Lexicographic ordering is the same as dictionary ordering.
The set of all strings over Σ is written Σ*. For the alphabet {a, b}, we have {a, b}* = {λ, a, b, aa, ab, ba, bb, aaa, aab, …}.
Here the strings are listed in canonical order, in which shorter strings precede longer strings and strings of the same length appear alphabetically.
Canonical order is different from lexicographic (strictly alphabetical) order, in which aa precedes b.
An essential difference is that canonical order can be described by a single list of strings that includes every element of Σ* exactly once, each at some finite position.
If we wanted to describe an algorithm that did something with each string in {a, b}*, it would make sense to say, "Consider the strings in canonical order, and for each one, …".
If an algorithm were to "consider the strings of {a, b}* in lexicographic order", it would have to start by considering λ, a, aa, aaa, …, and it would never get around to considering the string b.
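The canonical order can be sketched as an infinite Python generator: for each length n = 0, 1, 2, … it lists Σ^n alphabetically, so every string of Σ* shows up after finitely many steps. The function name canonical_order is our own; itertools.islice is used to take a finite prefix of the endless sequence.

```python
from itertools import count, islice, product


def canonical_order(sigma):
    """Yield every string over sigma exactly once, in canonical order:
    shorter strings first, strings of equal length alphabetically."""
    symbols = sorted(sigma)
    for n in count(0):  # n = 0, 1, 2, ... (the generator never stops)
        # product(..., repeat=n) lists Σ^n alphabetically; repeat=0 yields λ
        for letters in product(symbols, repeat=n):
            yield "".join(letters)


first_nine = list(islice(canonical_order({"a", "b"}), 9))
print(first_nine)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab']
```

Here b is the third string produced; a generator that emitted strings in lexicographic order would instead keep producing λ, a, aa, aaa, … forever.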
Language: Any set of strings over an alphabet Σ is called a language.

e.g. Σ = {0, 1}
L1 = set of all strings of length 2 = {00, 01, 10, 11}
L2 = set of all strings of length 3 = {000, 001, 010, 011, 100, 101, 110, 111}
L3 = set of all strings that begin with 0 = {0, 00, 01, 011, 000, 0101, …}
L1 and L2 are finite; L3 is infinite.
Powers of Σ: Σ = {0, 1}
Σ^0 = set of all strings of length 0: Σ^0 = {ε}
Σ^1 = set of all strings of length 1: Σ^1 = {0, 1}
Σ^2 = set of all strings of length 2: Σ^2 = {00, 01, 10, 11}
Σ^3 = set of all strings of length 3: Σ^3 = {000, 001, 010, 011, 100, 101, 110, 111}
…
Σ^n = set of all strings of length n
Cardinality (the number of elements in a set): for Σ = {0, 1}, |Σ^n| = 2^n; in particular |Σ^0| = 1.
Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ … = {ε} ∪ {0, 1} ∪ {00, 01, 10, 11} ∪ …
   = the set of all possible strings of all lengths over {0, 1}, which is infinite.
Formal Language
• A formal language is a set of words, i.e. finite strings of letters, or
symbols.
• The inventory/list from which these letters are taken is called the
alphabet over which the language is defined.
• A formal language is often defined by means of a formal grammar.
Formal languages are a purely syntactical notion, so there is not
necessarily any meaning associated with them.
Formal Definition
A formal language L over an alphabet Σ is just a subset of Σ*, that
is, a set of words over that alphabet. For example, three sample
languages over the same alphabet Σ = {a, b}:
L1 = {aa, aaa}
L2 = {aba, aab}
L3 = {ab, ba, aabb, abab, …, aaabbb, …}
In computer science and mathematics, which do not deal with
natural languages, the adjective "formal" is usually omitted as
redundant.
Example1
The following rules define a formal language L over the alphabet Σ=
{0,1,2,3,4,5,6,7,8,9,+,=}:
• Every nonempty string that does not contain + or = and does not start with 0 is in L.
• The string 0 is in L.
• A string containing = is in L if and only if there is exactly one =, and it separates two strings in L.
• A string containing + but not = is in L if and only if every + in the string separates two valid strings in L.
• No string is in L other than those implied by the previous rules.
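A minimal Python membership test for this language, written directly from the rules. It assumes one natural reading of the "+" rule: since a piece containing neither + nor = is in L only through the first two rules, every piece between plus signs must be a number. The names is_number and in_L are our own.

```python
DIGITS = set("0123456789")


def is_number(s: str) -> bool:
    """Rules 1 and 2: "0", or a nonempty digit string not starting with 0."""
    return s == "0" or (s != "" and s[0] != "0" and set(s) <= DIGITS)


def in_L(s: str) -> bool:
    """Membership test for the language L defined by the rules above."""
    if "=" in s:
        # Rule 3: exactly one '=', and it separates two strings of L
        if s.count("=") != 1:
            return False
        left, right = s.split("=")
        return in_L(left) and in_L(right)
    # Rule 4 (no '='): every '+' separates two strings of L, so each piece
    # between the plus signs must itself be a number (rules 1 and 2)
    return all(is_number(piece) for piece in s.split("+"))


print(in_L("23+4=555"))  # True
print(in_L("=234=+"))    # False
print(in_L("007"))       # False (starts with 0)
print(in_L("1+1=3"))     # True: the rules check form, not arithmetic
```

The last line illustrates that a formal language is purely syntactic: "1+1=3" is well formed, so it is in L even though it is false as arithmetic.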
Automata
• An automaton is an abstract model of a digital computer; automata theory studies such machines and the computational problems that can be solved using them.
• Automata are abstract "mathematical" machines or systems.
• An automaton has a mechanism for reading input.
• The input is a string over a given alphabet, written on an input file, which the automaton can read but not change.
• The input file is divided into cells, each of which can hold one symbol.
Figure 1.1 Schematic representation of a general automaton.
• The input mechanism can read the input file from left to right, one symbol at a time.
• The automaton can produce output of some form.
• It may have a temporary storage device,
  • consisting of an unlimited number of cells,
  • each capable of holding a single symbol from an alphabet.
• The automaton can read and change the contents of the storage cells.
• The automaton has a control unit, which can be in any one of a finite number of internal states, and which can change state in some defined manner.
Formal Definitions
A (deterministic finite) automaton is represented formally by the 5-tuple ⟨Q, Σ, δ, q0, F⟩, where:
• Q is a finite set of states.
• Σ is a finite set of symbols, called the alphabet of the automaton.
• δ is the transition function, that is, δ: Q × Σ → Q.
• q0 is the start state, that is, the state the automaton is in when no input has been processed yet, where q0 ∈ Q.
• F is a set of states of Q (i.e. F ⊆ Q) called accept states.
Automaton                                                      Recognizable languages
Deterministic finite automata (DFA)                            Regular languages
Nondeterministic finite automata (NFA)                         Regular languages
Nondeterministic finite automata with ε-transitions (ε-NFA)    Regular languages
Pushdown automata (PDA)                                        Context-free languages
Linear bounded automata (LBA)                                  Context-sensitive languages
Turing machines                                                Recursively enumerable languages
The Chomsky Hierarchy
• The Chomsky hierarchy is an important contribution in the field of formal languages and automata theory.
• Chomsky classified grammars into four types depending on the form of their production rules.
• These are:
  • Type 0 (unrestricted grammars)
  • Type 1 (context-sensitive grammars)
  • Type 2 (context-free grammars)
  • Type 3 (regular grammars)
• The hierarchy provides a basis for understanding the relationships between these grammars.
2.1. Finite Automata
• A finite automaton is:
  – a mechanism to recognize a set of valid inputs before carrying out an action;
  – a notation for describing a family of language recognition algorithms.
  – It consists of
    – an input tape holding the input,
    – a finite, nonempty set of states,
    – an input alphabet and a read-only head,
    – a transition function, which defines the change of configuration,
    – an initial state, and
    – a set of final (accepting) states, which may be empty.
• The input tape is divided into cells, and each cell contains one symbol from the input alphabet.
• A special end-marker symbol is placed in the leftmost cell to indicate the beginning of the input tape.
• $ is placed in the rightmost cell to indicate the end of the input.
• The head reads one symbol on the input tape at a time.
• The finite control determines the next configuration.
• The head moves one cell at a time. In the standard model used in this chapter it moves only from left to right (two-way variants that also move right-to-left exist, and they recognize the same languages).
• The head can read but cannot write.
Operation of the machine
[Figure: input tape containing 0 0 1, the tape head over the first cell, and the finite control in state q0.]
• Read the current letter of input under the tape head.
• Transit to a new state depending on the current input and the current state, as dictated by the transition function.
• Halt after consuming the entire input.
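Operation of the machine
• Transitions show the current state, the input, and the next state
  – Form: δ(q, a) = p
• Example:
  – δ(q0, 0) = q1, δ(q0, 1) = q2
• After reading the first 0, the tape head advances to the next cell, and the machine is in state q1.
• What happens now?
  – What is δ(q1, 0)?
[Figure: input tape containing 0 0 1, with the head over the second cell and the machine in state q1.]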
States of the FA
An FA has the following kinds of states:
• Initial state
• Final states
• Non-final states: all states except the final states
• Hang states: states not included in Q; after reaching such a state the FA sits idle, since a hang state has no outgoing edges. These states are usually denoted by a special symbol in the transition diagram.
Definition of a Finite Automaton
1. A finite set of states, typically Q.
2. An alphabet of input symbols, typically Σ.
3. One state is the start/initial state, typically q0 // q0 ∈ Q
4. Zero or more final/accepting states; the set is typically F. // F ⊆ Q
5. A transition function, typically δ. This function
   • takes a state and an input symbol as arguments;
   • returns a state.
   • One "rule" would be written δ(q, a) = p, where q and p are states, and a is an input symbol.
   • Intuitively: if the FA is in state q and input a is received, then the FA goes to state p (note: q = p is OK).
6. An FA is represented as the five-tuple A = (Q, Σ, δ, q0, F).
Definition of Computation
• Let M = (Q, Σ, δ, q0, F) be a finite automaton and let w = w1 w2 … wn be a string where each wi is a member of the alphabet Σ.
• M accepts w if a sequence of states r0 r1 … rn in Q exists with three conditions:
  1. r0 = q0
  2. δ(ri, wi+1) = ri+1 for i = 0, …, n−1
  3. rn ∈ F
We say that M recognizes language A if A = {w | M accepts w}.
In other words, the language is the set of all strings that are accepted by the finite automaton.
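The definition of computation translates almost line by line into Python. This is a sketch under stated assumptions: δ is total and stored as a dict keyed by (state, symbol), and the example machine (states E and O tracking whether an even or odd number of 0s has been read) is our own illustration of the even-zeros language from the start of the chapter.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: set
    alphabet: set
    delta: dict  # (state, symbol) -> state; assumed to be total
    start: str
    accept: set

    def run(self, w: str) -> list:
        """Return the state sequence r0, r1, ..., rn visited while reading w."""
        r = [self.start]                           # condition 1: r0 = q0
        for symbol in w:
            r.append(self.delta[(r[-1], symbol)])  # condition 2: δ(ri, wi+1) = ri+1
        return r

    def accepts(self, w: str) -> bool:
        return self.run(w)[-1] in self.accept      # condition 3: rn ∈ F


# Illustrative machine: E = "even number of 0s read so far", O = "odd"
even_zeros = DFA(
    states={"E", "O"},
    alphabet={"0", "1"},
    delta={("E", "0"): "O", ("E", "1"): "E",
           ("O", "0"): "E", ("O", "1"): "O"},
    start="E",
    accept={"E"},
)
print(even_zeros.run("0110"))      # ['E', 'O', 'O', 'O', 'E']
print(even_zeros.accepts("0110"))  # True
print(even_zeros.accepts("0"))     # False
```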
Construct
• A finite automaton accepting the language of strings ending in aa, i.e.
  L = {x ∈ {a, b}* | x ends with aa}
• An FA accepting the language of strings ending in b and not containing the substring aa, i.e.
  L = {x ∈ {a, b}* | x ends with b and does not contain the substring aa}
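One possible pair of solutions, written as Python transition tables. These particular state names and designs are our own sketch, not necessarily the machines intended on the original slides: the first DFA remembers how many a's end the input (capped at 2), and the second tracks the last symbol read plus a dead state for "aa has occurred".

```python
def run_dfa(delta, start, accept, w):
    """Follow the (total) transition table delta on w; accept iff we end in `accept`."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accept


# L = {x ∈ {a,b}* | x ends with aa}. State = number of trailing a's, capped at 2.
ENDS_AA = {
    ("0", "a"): "1", ("0", "b"): "0",
    ("1", "a"): "2", ("1", "b"): "0",
    ("2", "a"): "2", ("2", "b"): "0",
}

# L = {x ∈ {a,b}* | x ends with b and does not contain aa}.
# s = start, A = last symbol was a, B = last symbol was b, D = dead (aa seen).
ENDS_B_NO_AA = {
    ("s", "a"): "A", ("s", "b"): "B",
    ("A", "a"): "D", ("A", "b"): "B",
    ("B", "a"): "A", ("B", "b"): "B",
    ("D", "a"): "D", ("D", "b"): "D",
}

print(run_dfa(ENDS_AA, "0", {"2"}, "baa"))        # True
print(run_dfa(ENDS_B_NO_AA, "s", {"B"}, "abab"))  # True
print(run_dfa(ENDS_B_NO_AA, "s", {"B"}, "aab"))   # False
```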
Finite Automata: Some Applications
– Software for designing and checking the behavior of digital circuits
– The lexical analyzer of a typical compiler
– Software for scanning large bodies of text (e.g., web pages) for pattern finding
– Software for verifying systems of all types that have a finite number of states (e.g., stock market transactions, communication/network protocols)
Finite Automata
[Figure: classification of finite automata. FA with output: Moore machine, Mealy machine. FA without output: DFA, NFA, ε-NFA.]
• FAs without output (DFA, NFA, ε-NFA) all describe regular languages.
  – Deterministic (DFA): there is a fixed number of states and we can only be in one state at a time. It is one in which each move (transition from one state to another) is uniquely determined by the current configuration.
  – Nondeterministic (NFA): there is a fixed number of states, but we can be in multiple states at one time.
• While NFAs are often more convenient (and can be much smaller) than DFAs, we will see that adding nondeterminism does not let us define any language that cannot be defined by a DFA.
• One way to think of this is that we might write a program using an NFA, but then when it is "compiled" we turn the NFA into an equivalent DFA.
Deterministic Finite Automaton (DFA)
A DFA is represented by a quintuple (5-tuple) M = (Q, Σ, δ, q0, F):
Q is the set of states (finite)
Σ is the alphabet (finite), λ ∉ Σ
δ : Q × Σ → Q is the transition function
q0 ∈ Q is the start state
F ⊆ Q is the set of accept states
Let w1, …, wn ∈ Σ and w = w1 … wn ∈ Σ*.
Then M accepts w if there are r0, r1, …, rn ∈ Q such that r0 = q0, δ(ri, wi+1) = ri+1 for i = 0, …, n−1, and rn ∈ F.
• The input mechanism can move only from left to right and reads exactly one symbol on each step.
• The transitions from one internal state to another are governed by the transition function δ.
• If δ(q0, a) = q1, then if the DFA is in state q0 and the current input symbol is a, the DFA will go into state q1.
[Figure: an example DFA over {0, 1} with states q0, q1, q2, q3, labeling the start state (q0), the states, and the accept states (F).]
The machine accepts a string if the process ends in a double circle (an accept state).
NOTATION
An alphabet Σ is a finite set (e.g., Σ = {0, 1}).
A string over Σ is a finite-length sequence of elements of Σ.
Σ* denotes the set of finite-length sequences of elements of Σ.
For x a string, |x| is the length of x.
The unique string of length 0 will be denoted by ε and will be called the empty or null string.
A language over Σ is a set of strings over Σ, i.e., a subset of Σ*.
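Transition Graph
Alphabet Σ = {a, b}
[Figure: transition graph with initial state q0 and accepting state q4. The transitions q0 -a→ q1 -b→ q2 -b→ q3 -a→ q4 spell abba; every other transition, including those leaving q4, leads to the state q5, which loops on a, b. Labels: initial state, transition, state, accepting state.]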
Initial Configuration
[Figure: the input string abba written on the input tape, with the head on the first symbol and the automaton in the initial state q0.]
Scanning the Input
[Figures: the head advances one cell per symbol read, and the automaton moves q0 → q1 → q2 → q3.]
Input finished
[Figure: all input has been read and the automaton is in the accepting state q4: accept.]
Last state determines the outcome.
A Rejection Case
[Figures: the automaton scans an input string other than abba, one symbol at a time; when the input is finished it is in a non-accepting state: reject.]
Last state determines the outcome.
Another Rejection Case
Tape is empty (λ).
Input finished (no symbol read): the automaton remains in the initial state q0, which is not accepting: reject.
This automaton accepts only one string.
Language accepted: L = {abba}
Another Example
L = {λ, ab, abba}
[Figure: the same transition graph, but with three accepting states, q0, q2 and q4, each marked "accept".]
Empty Tape (λ)
Input finished: no symbol is read, and the automaton stays in q0, which is now accepting: accept.
DFA problems can be of several kinds:
• Starts with: all strings starting with 0
• Ends with: all strings ending with 1
• Contains/does not contain: all strings containing 00
• Length: all strings of length ≥ 3
• Divisibility: all strings whose length is divisible by 3
E.g.
1) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} that start with '0'.
2) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} that start with '01'.
3) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} that end with '0'.
4) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} that end with '10'.
5) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} of length exactly 2.
6) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} of length ≥ 2.
7) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} of length ≤ 2.
8) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} of even length (length mod 2 = 0).
9) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} of odd length.
10) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} that, read as binary integers, are divisible by 3, e.g. {0, 11, 110, 1001, 1100, 1111, …}.
11) Construct a DFA that accepts all strings over the alphabet Σ = {0, 1} that, read as binary integers, are divisible by 4.
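Exercises 10 and 11 have a compact general solution, sketched here in Python: let state r be the value of the bits read so far modulo k. Appending a bit b to a number v gives 2v + b, so δ(r, b) = (2r + b) mod k, and the accepting state is 0. The helper names are our own. Note that this sketch also accepts λ (its value is taken as 0); if the exercise excludes λ, a separate non-accepting start state is needed.

```python
def divisible_dfa(k: int):
    """DFA over {0,1} whose state r is (value of the bits read so far) mod k.
    Appending bit b to a number v gives 2v + b, so δ(r, b) = (2r + b) mod k."""
    delta = {(r, b): (2 * r + int(b)) % k for r in range(k) for b in "01"}
    return delta, 0, {0}  # transition table, start state, accepting states


def accepts(dfa, w: str) -> bool:
    delta, state, accept = dfa
    for b in w:
        state = delta[(state, b)]
    return state in accept


div3 = divisible_dfa(3)
print([w for w in ["0", "11", "110", "1001", "1100", "1111", "10"] if accepts(div3, w)])
# ['0', '11', '110', '1001', '1100', '1111']
```

The DFA for divisibility by 3 has exactly three states (remainders 0, 1, 2); divisible_dfa(4) gives a four-state machine for exercise 11.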
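Language accepted: L = {a^n b : n ≥ 0}
[Figure: DFA with states q0, q1, q2; q0 loops on a and moves to the accepting state q1 on b; q1 moves to q2 on a, b; q2 loops on a, b.]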
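Another Example
Alphabet: Σ = {1}
[Figure: DFA with states q0 and q1; q0 is the start state and is accepting, and the symbol 1 moves q0 to q1 and q1 back to q0.]
Language accepted:
EVEN = {x : x ∈ Σ* and |x| is even}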
Set of States Q
Example: Q = {q0, q1, q2, q3, q4, q5}
[Figure: the transition graph above, with states q0 to q5.]
Input Alphabet Σ
λ ∉ Σ: the input alphabet never contains λ.
Example: Σ = {a, b}
Initial State q0
Example: [Figure: the transition graph with q0 marked as the initial state.]
Set of Accepting States F ⊆ Q
Example: F = {q4}
Transition Function δ : Q × Σ → Q
δ(q, x) = q′ describes the result of a transition from state q with symbol x.
Examples: δ(q0, a) = q1, δ(q0, b) = q5, δ(q2, b) = q3
Transition Table for δ
state | a  | b
q0    | q1 | q5
q1    | q5 | q2
q2    | q5 | q3
q3    | q4 | q5
q4    | q5 | q5
q5    | q5 | q5
Extended Transition Function δ*
δ* : Q × Σ* → Q
δ*(q, w) = q′ describes the resulting state after scanning string w from state q.
Examples: δ*(q0, ab) = q2, δ*(q0, abbbaa) = q5, δ*(q1, bba) = q4
Special case: for any state q, δ*(q, λ) = q.
In general: δ*(q, w) = q′ implies that there is a walk of transitions
w = σ1 σ2 … σk,  q -σ1→ p1 -σ2→ p2 … -σk→ q′
(states may be repeated).
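More DFA Examples
Σ = {a, b}
[Figure: a single non-accepting state q0 that loops on a, b.] L(M) = { } (the empty language)
[Figure: a single accepting state q0 that loops on a, b.] L(M) = Σ* (all strings)

Σ = {a, b}
[Figure: the accepting start state q0 moves to q1 on a, b; q1 loops on a, b.]
L(M) = {λ}: the language of the empty string

Σ = {a, b}
L(M) = {all strings with prefix ab}
[Figure: q0 -a→ q1 -b→ q2, where q2 is accepting and loops on a, b; the remaining transitions lead to the trap state q3, which loops on a, b.]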
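L(M) = {all binary strings containing substring 001}
[Figure: DFA whose states λ, 0, 00, 001 record how much of 001 has just been read; 001 is accepting and loops on 0, 1.]
L(M) = {all binary strings without substring 001}
[Figure: the same DFA with accepting and non-accepting states swapped: λ, 0 and 00 are accepting and 001 is a trap state.]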
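A Python sketch of the substring-001 machine and its complement. Each state is named by the longest suffix of the input read so far that is also a prefix of 001, which is one standard way to design such a DFA; the names DELTA_001, contains_001 and without_001 are our own.

```python
# State = longest suffix of the input read so far that is also a prefix of "001".
DELTA_001 = {
    ("", "0"): "0",       ("", "1"): "",
    ("0", "0"): "00",     ("0", "1"): "",
    ("00", "0"): "00",    ("00", "1"): "001",
    ("001", "0"): "001",  ("001", "1"): "001",  # 001 seen: stay accepting
}


def final_state(w: str) -> str:
    state = ""  # start state λ: nothing of 001 matched yet
    for bit in w:
        state = DELTA_001[(state, bit)]
    return state


def contains_001(w: str) -> bool:
    return final_state(w) == "001"


def without_001(w: str) -> bool:
    # Complement DFA: same transitions, accepting and non-accepting states swapped
    return final_state(w) != "001"


print(contains_001("11001"))  # True
print(without_001("0101"))    # True
```

Swapping accepting and non-accepting states complements the language only because the DFA is total (every state has a transition on every symbol); the same trick does not work directly on an NFA.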

L(M) = {awa : w ∈ {a, b}*}
[Figure: DFA with states q0, q1, q2, q3.]
2.3 Nondeterministic Finite Automata
• An NFA (nondeterministic finite automaton) is able to be in several states at once.
  – In a DFA, we can only take a transition to a single deterministic state.
  – In an NFA, the same input can lead to multiple destination states.
  – You can think of this as the NFA "guessing" something about its input: it always follows the proper path if that path can lead to an accepting state.
  – Another way to think of the NFA is that it travels all possible paths, and so it remains in many states at once. As long as at least one of the paths results in an accepting state, the NFA accepts the input.
• The NFA is a useful tool:
  – It is often more convenient (more concise) than a DFA.
  – BUT we will see that it is not more powerful: NFAs recognize exactly the same languages as DFAs!
An NFA
• Similar to a DFA:
1. A finite set of states, typically Q.
2. An alphabet of input symbols, typically Σ.
3. One state is the start/initial state, typically q0.
4. Zero or more final/accepting states; the set is typically F.
5. A transition function, typically δ. This function:
   • takes a state and an input symbol (or λ) as arguments;
   • returns a set of states instead of a single state, as a DFA would.
6. An NFA is represented as the five-tuple A = (Q, Σ, δ, q0, F). Here, F is the set of accepting states.
#  | DFA | NFA
1  | For each symbol of the alphabet, there is exactly one state transition from each state. | There is no need to specify how the NFA reacts to every symbol.
2  | A DFA cannot use empty-string transitions. | An NFA can use empty-string transitions.
3  | In a DFA, the next state is uniquely determined. | In an NFA, each pair of state and input symbol can have many possible next states.
4  | A DFA is more difficult to construct. | An NFA is easier to construct.
5  | A DFA rejects a string if it terminates in a state that is not an accepting state. | An NFA rejects a string only if all branches die or end in a non-accepting state.
6  | Time needed for executing an input string is less. | Time needed for executing an input string is more.
7  | All DFAs are NFAs. | Not all NFAs are DFAs.
8  | A DFA requires more space. | An NFA requires less space than a DFA.
9  | A dead state may be required. | A dead state is not required.
10 | δ: Q × Σ → Q, i.e. the next state belongs to Q. | δ: Q × Σ → 2^Q, i.e. the set of possible next states belongs to the power set of Q.
NFA Example
• The NFA below accepts ε, a, baba, baa, and aa, but it doesn't accept b, bb, and babba.
[Figure: NFA with states q1, q2, q3; its transitions are labeled b, a, ε, a and a,b.]
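The "travel all paths at once" view of an NFA can be simulated by keeping the set of current states. The transitions below are an assumption: they are our reading of the figure labels (start and accepting state q1; q1 -b→ q2, q1 -ε→ q3, q2 -a→ q2 and q3, q2 -b→ q3, q3 -a→ q1), chosen because they reproduce exactly the accept/reject behaviour stated above. The ε-closure step follows ε-transitions without consuming input.

```python
EPS = ""  # label used for ε-transitions

# Assumed reading of the figure: start state q1, accepting states {q1}
NFA_DELTA = {
    ("q1", "b"): {"q2"},
    ("q1", EPS): {"q3"},
    ("q2", "a"): {"q2", "q3"},
    ("q2", "b"): {"q3"},
    ("q3", "a"): {"q1"},
}
START, ACCEPT = "q1", {"q1"}


def eps_closure(states):
    """All states reachable from `states` using ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in NFA_DELTA.get((q, EPS), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure


def nfa_accepts(w: str) -> bool:
    current = eps_closure({START})  # every state the NFA can be in right now
    for symbol in w:
        moved = set()
        for q in current:
            moved |= NFA_DELTA.get((q, symbol), set())
        current = eps_closure(moved)  # an empty set means every branch has died
    return bool(current & ACCEPT)


print([w for w in ["", "a", "baba", "baa", "aa", "b", "bb", "babba"] if nfa_accepts(w)])
# ['', 'a', 'baba', 'baa', 'aa']
```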
• There are three major differences. In the NFA:
  – The range of δ is the power set 2^Q: its value is not a single element of Q, but a subset of it.
  – λ is allowed as the second argument of δ, so the NFA can make a transition without consuming an input symbol.
  – The set δ(qi, a) may be empty, meaning that there is no transition defined.
Nondeterministic Finite Automaton (NFA)
Alphabet = {a}
[Figure: NFA with states q0, q1, q2, q3, where q2 is accepting. On a, q0 can move either to q1 or to q3 (two choices); q1 moves to q2 on a; q2 and q3 have no outgoing transitions.]
Input aa: First Choice
[Figures: q0 -a→ q1 -a→ q2.] All input is consumed and the automaton is in q2: "accept".
Input aa: Second Choice
[Figures: q0 -a→ q3.] The remaining input cannot be consumed, so the automaton halts: "reject".
aa is accepted by the NFA, because the first computation accepts it; the second computation rejects it, but one accepting computation is enough.
Rejection example (input a)
First Choice
[Figure: q0 -a→ q1; all input is consumed, but q1 is not accepting: "reject".]
Second Choice
[Figure: q0 -a→ q3; all input is consumed, but q3 is not accepting: "reject".]
Another Rejection example (an input longer than aa)
First Choice
[Figures: q0 -a→ q1 -a→ q2; input remains but cannot be consumed, so the automaton halts: "reject".]
Second Choice
[Figures: q0 -a→ q3; input remains but cannot be consumed, so the automaton halts: "reject".]
An NFA rejects a string if there is no computation of the NFA that accepts the string. For each computation, either
• all the input is consumed and the automaton is in a non-final state, OR
• the input cannot be consumed.
In both rejection examples the string is rejected by the NFA: all possible computations lead to rejection.
Language accepted: L = {aa}
[Figure: the NFA with states q0, q1, q2, q3.]
Lambda Transitions
[Figure: NFA with states q0, q1, q2, q3 in a chain; one of the transitions is a λ-transition, and q3 is accepting.]
[Figures: the NFA processes the input aa. When it follows the λ-transition, the input tape head does not move.]
All input is consumed: "accept".
String aa is accepted.
Rejection Example
[Figures: on another input the NFA again follows the λ-transition (the read head doesn't move).]
The remaining input cannot be consumed, so the automaton halts: "reject".
String is rejected.
Language accepted: L = {aa}
Another NFA Example
[Figure: NFA with states q0, q1, q2, q3 and transitions labeled a, b and λ.]
Input ab
[Figures: the NFA reads a, then b, and ends in an accepting state: "accept".]
Another String: abab
[Figures: the NFA reads a, b, a, b in turn and ends in an accepting state: "accept".]
Language accepted:
L = {ab, abab, ababab, …} = {ab}+
Another NFA Example
[Figure: NFA with states q0, q1, q2 over the alphabet {0, 1}.]
Language accepted:
L(M) = {λ, 10, 1010, 101010, …} = {10}*
(q2 is a redundant state.)
Transition Function δ
δ(q, x) = {q1, q2, …, qk}: the resulting states after following one transition with symbol x from state q.
Examples (for the NFA above):
δ(q0, 1) = {q1}
δ(q1, 0) = {q0, q2}
δ(q0, λ) = {q2}
δ(q2, 1) = ∅
Extended Transition Function δ*
Same as δ, but applied to strings.
[Figure: NFA with states q0, q1, q2, q3, q4, q5 over {a, b}.]
Examples:
δ*(q0, a) = {q1}
δ*(q0, aa) = {q4, q5}
δ*(q0, ab) = {q2, q3, q0}
Special case: for any state q, q ∈ δ*(q, λ).
In general: qj ∈ δ*(qi, w) means there is a walk from qi to qj with label w:
w = σ1 σ2 … σk,  qi -σ1→ … -σk→ qj
The Language of an NFA
The language accepted by M is
L(M) = {w1, w2, …, wn, …},
where δ*(q0, wm) = {qi, …, qk, …, qj} and there is some qk ∈ F (an accepting state).
In other words, wm ∈ L(M) exactly when δ*(q0, wm) contains some accepting state: L(M) = {w ∈ Σ* : δ*(q0, w) ∩ F ≠ ∅}.
Example: F = {q0, q5}
[Figure: the same NFA with states q0 to q5.]
δ*(q0, aa) = {q4, q5}, which contains q5 ∈ F, so aa ∈ L(M).
δ*(q0, ab) = {q2, q3, q0}, which contains q0 ∈ F, so ab ∈ L(M).
L(M) = {ab}* ∪ {ab}*{aa}
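A sketch of δ* for an NFA with λ-transitions. The figure's transitions did not survive, so the table below is a hypothetical reading (q0 -a→ q1, q1 -a→ q4, q1 -b→ q2, and λ-transitions q4 → q5, q2 → q3, q3 → q0), chosen because it reproduces every δ* value listed above and the stated language. δ* closes the current set under λ-transitions before reading and after each symbol, so q ∈ δ*(q, λ) holds automatically.

```python
LAMBDA = ""  # label used for λ-transitions

# Hypothetical transitions consistent with the δ* values on the slides;
# start state q0, accepting states F = {q0, q5}.
DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", "a"): {"q4"},
    ("q1", "b"): {"q2"},
    ("q4", LAMBDA): {"q5"},
    ("q2", LAMBDA): {"q3"},
    ("q3", LAMBDA): {"q0"},
}
F = {"q0", "q5"}


def closure(states):
    """States reachable from `states` by λ-transitions alone (including themselves)."""
    result, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in DELTA.get((q, LAMBDA), set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def delta_star(q, w):
    current = closure({q})  # ensures q ∈ δ*(q, λ)
    for symbol in w:
        step = set()
        for s in current:
            step |= DELTA.get((s, symbol), set())
        current = closure(step)
    return current


def in_language(w):
    """w ∈ L(M) iff δ*(q0, w) contains an accepting state."""
    return bool(delta_star("q0", w) & F)


print(sorted(delta_star("q0", "aa")))  # ['q4', 'q5']
print(sorted(delta_star("q0", "ab")))  # ['q0', 'q2', 'q3']
```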
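Example of equivalent machines
NFA M1: L(M1) = {10}*
[Figure: NFA M1 with states q0 and q1 and transitions labeled 1 and 0.]
DFA M2: L(M2) = {10}*
[Figure: DFA M2 with states q0, q1, q2 and transitions labeled 0, 1 and 0,1.]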
Theorem:
Languages accepted by NFAs = Regular languages = Languages accepted by DFAs
NFAs and DFAs have the same computational power: they accept the same set of languages.
Equivalence of DFAs and NFAs
• For most languages, NFAs are easier to construct than DFAs.
• But it turns out we can build a corresponding DFA for any NFA.
  – The downside is that turning an NFA with n states into a DFA may produce up to 2^n states. However, in many practical cases only a small fraction of these subsets is reachable, and the DFA is not much larger than the NFA.
• Theorem: A language L is accepted by some DFA if and only if L is accepted by some NFA; i.e., L(DFA) = L(NFA) for a DFA appropriately constructed from the NFA.
Conversion NFA to DFA
[Figure: NFA M over {a, b} with states q0, q1, q2 and accepting state q1; the DFA M′ under construction starts with the state {q0}.]
δ*(q0, a) = {q1, q2}, so M′ gets the transition {q0} -a→ {q1, q2}.
δ*(q0, b) = ∅ (the empty set), so {q0} -b→ ∅; the state ∅ acts as a trap state.
δ*(q1, a) = {q1, q2} and δ*(q2, a) = ∅; their union is {q1, q2}, so {q1, q2} -a→ {q1, q2}.
δ*(q1, b) = {q0} and δ*(q2, b) = {q0}; their union is {q0}, so {q1, q2} -b→ {q0}.
The trap state ∅ loops on a, b.
END OF CONSTRUCTION
Since q1 ∈ F in the NFA M, the DFA state {q1, q2} is in F′.
[Figure: the resulting DFA M′ with states {q0}, {q1, q2} and ∅.]
General Conversion Procedure
Input: an NFA M
Output: an equivalent DFA M′ with L(M) = L(M′)
The NFA has states q0, q1, q2, …
The DFA has states from the power set: ∅, {q0}, {q1}, {q0, q1}, {q1, q2, q3}, …
Only those DFA states that are reachable from the start state need to be kept.
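Conversion Procedure Steps
Step 1. Initial state of NFA: q0
Initial state of DFA: {q0} (more generally, the set of states reachable from q0 by λ-transitions alone)
Example
[Figure: NFA M with states q0, q1, q2; DFA M′ with its initial state {q0}.]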
Step 2. For every DFA state {qi, qj, …, qm} and every symbol a, compute in the NFA
δ*(qi, a) ∪ δ*(qj, a) ∪ … ∪ δ*(qm, a) = {qk, ql, …, qn}
and add the transition δ′({qi, qj, …, qm}, a) = {qk, ql, …, qn} to the DFA.
Example
δ*(q0, a) = {q1, q2}, so δ′({q0}, a) = {q1, q2}.
[Figure: DFA M′ with the transition {q0} -a→ {q1, q2}.]
Step 3. Repeat Step 2 for every state in the DFA and every symbol in the alphabet until no more states can be added to the DFA.
Example
[Figure: DFA M′ with states {q0}, {q1, q2} and ∅ and all of their transitions on a and b.]
Step 4. For any DFA state {qi, qj, …, qm}:
if some qj is an accepting state in the NFA,
then {qi, qj, …, qm} is an accepting state in the DFA.
Example
Since q1 ∈ F in the NFA M, {q1, q2} ∈ F′ in the DFA M′.
[Figure: the finished DFA M′.]
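The four steps can be sketched in Python as the usual subset construction, using a work queue for Step 3. The NFA table is a hypothetical reading of the example figure (q0 -a→ q1, q1 -a→ q1, a λ-transition q1 → q2, q2 -b→ q0, accepting state q1), chosen because it reproduces every δ* value in the walkthrough. Because each DFA state is kept closed under λ-transitions, moving the whole set on a and closing again gives the same union of δ*(q, a) as Step 2.

```python
from collections import deque

LAMBDA = ""  # label used for λ-transitions

# Hypothetical reading of the NFA M in the example (start q0, F = {q1})
NFA_DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", "a"): {"q1"},
    ("q1", LAMBDA): {"q2"},
    ("q2", "b"): {"q0"},
}
SIGMA = ("a", "b")
NFA_ACCEPT = {"q1"}


def closure(states):
    """λ-closure, returned as a frozenset so it can serve as a DFA state."""
    result, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in NFA_DELTA.get((q, LAMBDA), set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return frozenset(result)


def subset_construction(start):
    dfa_start = closure({start})                        # Step 1
    dfa_delta, seen, todo = {}, {dfa_start}, deque([dfa_start])
    while todo:                                          # Step 3: until nothing new
        S = todo.popleft()
        for a in SIGMA:                                  # Step 2: union over q in S
            moved = set()
            for q in S:
                moved |= NFA_DELTA.get((q, a), set())
            T = closure(moved)
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accept = {S for S in seen if S & NFA_ACCEPT}     # Step 4
    return dfa_start, dfa_delta, seen, dfa_accept


start, delta, states, accept = subset_construction("q0")
print(sorted(map(sorted, states)))    # [[], ['q0'], ['q1', 'q2']]
print([sorted(S) for S in accept])    # [['q1', 'q2']]
```

The empty frozenset plays the role of the trap state ∅; only the three reachable subsets are generated, not all 2^3 = 8.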
Lemma:
If we convert NFA M to DFA M′, then the two automata are equivalent:
L(M) = L(M′)
Proof:
We only need to show:
L(M) ⊆ L(M′) AND L(M) ⊇ L(M′)
Languages & Grammars

Phrase-Structure Grammars
Types of Phrase-Structure
Grammars
Derivation Trees
Backus-Naur Form
Intro to Languages
• English grammar tells us if a given combination of words is a valid sentence.
• The syntax of a sentence concerns its form, while the semantics concerns its meaning.
  e.g. the mouse wrote a poem
• From a syntax point of view this is a valid sentence.
• From a semantics point of view, not so fast… perhaps in Disneyland.
• Natural languages (English, French, Portuguese, etc.) have very complex rules of syntax that are not necessarily well defined.
Formal Language
• A formal language is specified by a well-defined set of rules of syntax.
• We describe the sentences of a formal language using a grammar.
• Two key questions:
  1. Is a combination of words a valid sentence in a formal language?
  2. How can we generate the valid sentences of a formal language?
• Formal languages provide models for both natural languages and programming languages.
Grammars
A formal grammar G is any compact, precise mathematical definition of a language L, as opposed to just a raw listing of all of the language's legal sentences, or just examples of them.
A grammar implies an algorithm that would generate all legal sentences of the language.
Often, it takes the form of a set of recursive definitions.
A popular way to specify a grammar recursively is as a phrase-structure grammar.
Grammars (Semi-formal)
• Example: a grammar that generates a subset of the English language
sentence → noun_phrase predicate
noun_phrase → article noun
predicate → verb
article → a
article → the
noun → boy
noun → dog
verb → runs
verb → sleeps
• A derivation of "the boy sleeps":
sentence ⇒ noun_phrase predicate
⇒ noun_phrase verb
⇒ article noun verb
⇒ the noun verb
⇒ the boy verb
⇒ the boy sleeps
• A derivation of "a dog runs":
sentence ⇒ noun_phrase predicate
⇒ noun_phrase verb
⇒ article noun verb
⇒ a noun verb
⇒ a dog verb
⇒ a dog runs
Language of the grammar:

L = { “a boy runs”,
“a boy sleeps”,
“the boy runs”,
“the boy sleeps”,
“a dog runs”,
“a dog sleeps”,
“the dog runs”,
“the dog sleeps” }
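The language of this small grammar can be generated mechanically. The Python sketch below stores the productions in a dict (our own encoding: each variable maps to its list of right-hand sides) and repeatedly replaces the leftmost variable, exactly like the derivations above; since the grammar is not recursive, the search terminates.

```python
GRAMMAR = {
    "sentence": [["noun_phrase", "predicate"]],
    "noun_phrase": [["article", "noun"]],
    "predicate": [["verb"]],
    "article": [["a"], ["the"]],
    "noun": [["boy"], ["dog"]],
    "verb": [["runs"], ["sleeps"]],
}


def generate(symbols):
    """Yield every terminal string derivable from `symbols`,
    always expanding the leftmost variable first."""
    for i, sym in enumerate(symbols):
        if sym in GRAMMAR:  # leftmost variable found
            for rhs in GRAMMAR[sym]:
                yield from generate(symbols[:i] + rhs + symbols[i + 1:])
            return
    yield " ".join(symbols)  # only terminals remain: a sentence of L


language = sorted(set(generate(["sentence"])))
print(len(language))  # 8
print(language[0])    # a boy runs
```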

Notation
noun → boy
noun → dog
• noun is a variable (or non-terminal); boy and dog are terminals. Both are symbols of the vocabulary.
• Each line, such as noun → boy, is a production rule.
Basic Terminology
• A vocabulary/alphabet V is a finite nonempty set of elements called symbols.
  Example: V = {a, b, c, A, B, C, S}
• A word/sentence over V is a string of finite length of elements of V.
  Example: Aba
• The empty/null string λ is the string with no symbols.
• V* is the set of all words over V.
  Example: V* = {λ, Aba, BBa, bAA, cab, …}
• A language over V is a subset of V*.
We can give some criteria for a word to be in a language.
Cont.
A grammar G is defined as a quadruple G = (V, T, S, P), where
V is a finite set of objects called variables,
T is a finite set of objects called terminal symbols,
S ∈ V is a special symbol called the start variable,
P is a finite set of productions.
It will be assumed without further mention that the sets V and T are nonempty and disjoint.
Productions are of the form
x → y,
where x is an element of (V ∪ T)+ and y is in (V ∪ T)*.
Cont.
Given a string of the form w = uxv, we say the production x → y is applicable to this string, and we may use it to replace x with y, thereby obtaining a new string
z = uyv.
This is written as
w ⇒ z.
We say that w derives z or that z is derived from w. Successive strings are derived by applying the productions of the grammar in arbitrary order. A production can be used whenever it is applicable, and it can be applied as often as desired. If
w1 ⇒ w2 ⇒ ⋯ ⇒ wn,
Cont.
we say that w1 derives wn and write
w1 ⇒* wn.
The * indicates that an unspecified number of steps (including zero) can be taken to derive wn from w1.
Definition: Let G = (V, T, S, P) be a grammar. Then the set
L(G) = {w ∈ T* : S ⇒* w}
is the language generated by G.
If w ∈ L(G), then the sequence S ⇒ w1 ⇒ w2 ⇒ ⋯ ⇒ wn ⇒ w is a derivation of the sentence w. The strings S, w1, w2, …, wn, which contain variables as well as terminals, are called sentential forms of the derivation.
Conti…

► Consider the grammar
 G = ({S}, {a, b}, S, P)
with P given by
S → aSb
S → λ
Then S ⇒ aSb ⇒ aaSbb ⇒ aabb,
so we can write S ⇒* aabb.
The string aabb is a sentence in the language generated by G, while aaSbb is a sentential form.
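The derivation S ⇒ aSb ⇒ aaSbb ⇒ aabb can be replayed by applying the chosen productions one at a time. The sketch below makes a few assumptions. It rewrites the leftmost occurrence of the left side, which makes no difference here because each sentential form contains only one S. The names apply_leftmost, RULE_1 and RULE_2 are hypothetical.

```python
def apply_leftmost(w, production):
    """Apply x -> y at the leftmost occurrence of x in w (one step w => z)."""
    x, y = production
    i = w.find(x)
    if i == -1:
        raise ValueError(f"{x} -> {y} is not applicable to {w!r}")
    return w[:i] + y + w[i + len(x):]

RULE_1 = ("S", "aSb")  # S -> aSb
RULE_2 = ("S", "")     # S -> λ

forms = ["S"]
for rule in (RULE_1, RULE_1, RULE_2):
    forms.append(apply_leftmost(forms[-1], rule))

print(" => ".join(f or "λ" for f in forms))  # S => aSb => aaSbb => aabb
```

Every entry of `forms` except the last is a sentential form that still contains S. The last entry, aabb, contains only terminals, so it is a sentence of L(G).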
Phrase-Structure Grammars
A phrase-structure grammar (abbr. PSG) G = (V, T, S, P) is a 4-tuple, in which:
V is a vocabulary (set of symbols).
The "template vocabulary" of the language. (Note that here V is the whole vocabulary, terminals and nonterminals together, while in the grammar definition given earlier V denoted only the variables.)
T ⊆ V is a set of symbols called terminals.
Actual symbols of the language.
Also, N :≡ V − T is a set of special "symbols" called nonterminals (representing concepts like "noun").
S ∈ N is a special nonterminal, the start symbol.
In our example the start symbol was "sentence".
P is a set of productions (to be defined).
Rules for substituting one sentence fragment for another.
Every production rule must contain at least one nonterminal on its left side.
Phrase-structure Grammar
► EXAMPLE:
 Let G = (V, T, S, P),
 where V = {a, b, A, B, S},
 T = {a, b},
 S is the start symbol,
 P = {S → ABa, A → BB, B → ab, A → Bb}.
G is a phrase-structure grammar.
What sentences can be generated with this grammar?
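One way to answer the question is to explore the sentential forms breadth-first and keep those made only of terminals. The sketch below does this. It assumes single-character symbols and stops after `max_steps` rewriting steps. A bound of 6 is enough for this grammar because none of its productions is recursive, so every derivation is short. The function name sentences is hypothetical.

```python
def sentences(productions, start, terminals, max_steps):
    """Collect the terminal strings derivable from `start` in at most
    `max_steps` steps, exploring sentential forms breadth-first."""
    seen = {start}
    frontier = {start}
    found = set()
    for _ in range(max_steps):
        next_frontier = set()
        for w in frontier:
            for x, y in productions:
                i = w.find(x)
                while i != -1:               # rewrite every occurrence of x, one at a time
                    z = w[:i] + y + w[i + len(x):]
                    if z not in seen:
                        seen.add(z)
                        next_frontier.add(z)
                    i = w.find(x, i + 1)
        frontier = next_frontier
        # a sentential form made only of terminals is a sentence of L(G)
        found |= {z for z in frontier if set(z) <= terminals}
    return found

P = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("A", "Bb")]
print(sorted(sentences(P, "S", {"a", "b"}, max_steps=6)))  # ['abababa', 'abbaba']
```

The two sentences come from the two choices for A. The derivation S ⇒ ABa ⇒ BBBa leads to abababa, and S ⇒ ABa ⇒ BbBa leads to abbaba.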
Derivation
► Definition
► Let G = (V, T, S, P) be a phrase-structure grammar.
► Let w0 = lz0r (the concatenation of l, z0, and r) and w1 = lz1r be strings over V.
► If z0 → z1 is a production of G, we say that w1 is directly derivable from w0, and we write w0 ⇒ w1.
► If w0, w1, …, wn are strings over V such that w0 ⇒ w1, w1 ⇒ w2, …, wn-1 ⇒ wn, then we say that wn is derivable from w0, and write w0 ⇒* wn.
► The sequence of steps used to obtain wn from w0 is called a derivation.
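The definition also shows how to check a claimed derivation. Every consecutive pair must satisfy w(i-1) ⇒ w(i), which means some production z0 → z1 turns l z0 r into l z1 r. The sketch below tests this using the grammar from the previous example. It assumes single-character symbols, and the helper names directly_derives and is_derivation are hypothetical.

```python
def directly_derives(w0, w1, productions):
    """True if w0 => w1, i.e. w0 = l z0 r and w1 = l z1 r for some production z0 -> z1."""
    for z0, z1 in productions:
        i = w0.find(z0)
        while i != -1:
            l, r = w0[:i], w0[i + len(z0):]
            if l + z1 + r == w1:
                return True
            i = w0.find(z0, i + 1)
    return False

def is_derivation(steps, productions):
    """True if steps = [w0, w1, ..., wn] with w(i-1) => w(i) for every i."""
    return all(directly_derives(a, b, productions) for a, b in zip(steps, steps[1:]))

P = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("A", "Bb")]
print(is_derivation(["S", "ABa", "BBBa", "abBBa", "ababBa", "abababa"], P))  # True
print(is_derivation(["S", "ABa", "aba"], P))                                   # False
```

The second sequence fails because no single production turns ABa into aba.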
Productions
A production p ∈ P is a pair p = (b, a) of sentence fragments a, b (not necessarily in L), which may generally contain a mix of both terminals and nonterminals.
We often denote the production as b → a.
Read "replace b by a".
Call b the "before" string, a the "after" string.
It is a kind of recursive definition meaning that
if lbr ∈ LT, then lar ∈ LT. (LT = sentence "templates")
That is, if lbr is a legal sentence template, then so is lar.
That is, we can substitute a in place of b in any sentence template.
Languages from PSGs
The recursive definition of the language L defined by the PSG G = (V, T, S, P):
Rule 1: S ∈ LT (LT is L's template language).
The start symbol is a sentence template (a member of LT).
Rule 2: ∀(b → a) ∈ P: ∀l, r ∈ V*: lbr ∈ LT → lar ∈ LT.
Any production, after substituting in any fragment of any sentence template, yields another sentence template. We abbreviate this using lbr ⇒ lar (read "lar is directly derivable from lbr").
Rule 3: (σ ∈ LT ∧ ¬∃n ∈ N: n ∈ σ) → σ ∈ L.
All sentence templates that contain no nonterminal symbols are sentences in L.
Language
Let G = (V, T, S, P) be a phrase-structure grammar. The language generated by G (or the language of G), denoted by L(G), is the set of all strings of terminals that are derivable from the start symbol S:
L(G) = {w ∈ T* | S ⇒* w}
Language L(G)
► EXAMPLE:
 Let G = (V, T, S, P), where V = {a, b, A, S}, T = {a, b}, S is the start symbol, and P = {S → aA, S → b, A → aa}.
The language of this grammar is given by
L(G) = {b, aaa}:
1. we can derive aA from S using S → aA, and then derive aaa using A → aa;
2. we can derive b directly from S using S → b.
Another example
Grammar: G = (V, T, S, P), where V = {a, b, S}, T = {a, b}, and P contains the productions
S → aSb
S → λ
Derivation of the sentence ab:
S ⇒ aSb ⇒ ab (apply S → aSb, then S → λ)
Derivation of the sentence aabb:
S ⇒ aSb ⇒ aaSbb ⇒ aabb (apply S → aSb twice, then S → λ)
Other derivations:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ aaaabbbb
So, what is the language of the grammar with the productions S → aSb and S → λ?
Answer: L = {aⁿbⁿ : n ≥ 0}
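The two productions also suggest a membership test that reads the grammar in reverse. If w is λ, it comes from S → λ. Otherwise w must start with a and end with b, and the part in between must again be generated from S, following S → aSb. Below is a minimal recursive sketch of that idea. The function name in_anbn is hypothetical.

```python
def in_anbn(w):
    """Decide w ∈ {a^n b^n : n >= 0} by undoing S -> aSb / S -> λ from the outside in."""
    if w == "":
        return True                      # S -> λ
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b":
        return in_anbn(w[1:-1])          # S -> aSb: strip one a and one b
    return False

print([w for w in ["", "ab", "aabb", "aaabbb", "aab", "ba", "abab"] if in_anbn(w)])
# ['', 'ab', 'aabb', 'aaabbb']
```

Each recursive call strips one outer a…b pair, which matches one use of S → aSb in the derivations above.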
PSG Example – English Fragment
We have G = (V, T, S, P), where:
V = {(sentence), (noun phrase), (verb phrase), (article), (adjective), (noun), (verb), (adverb), a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly}
T = {a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly}
S = (sentence)
P = (see next slide)
Productions for our Language
P = { (sentence) → (noun phrase) (verb phrase),
(noun phrase) → (article) (adjective) (noun),
(noun phrase) → (article) (noun),
(verb phrase) → (verb) (adverb),
(verb phrase) → (verb),
(article) → a, (article) → the,
(adjective) → large, (adjective) → hungry,
(noun) → rabbit, (noun) → mathematician,
(verb) → eats, (verb) → hops,
(adverb) → quickly, (adverb) → wildly }
A Sample Sentence Derivation
(sentence)
⇒ (noun phrase) (verb phrase)
⇒ (article) (adjective) (noun) (verb phrase)
⇒ (article) (adjective) (noun) (verb) (adverb)
⇒ the (adjective) (noun) (verb) (adverb)
⇒ the large (noun) (verb) (adverb)
⇒ the large rabbit (verb) (adverb)
⇒ the large rabbit hops (adverb)
On each step, we apply a production to a fragment of the previous sentence template to get a new sentence template. Finally, we end up with a sequence of terminals (real words), that is, a sentence of our language L.
Another Example
Let G = ({a, b, A, B, S}, {a, b}, S, {S → ABa, A → BB, B → ab, AB → b}),
that is, V = {a, b, A, B, S}, T = {a, b}, start symbol S, and P = {S → ABa, A → BB, B → ab, AB → b}.
One possible derivation in this grammar is:
S ⇒ ABa ⇒ Aaba ⇒ BBaba ⇒ Bababa
Defining the PSG Types
Type 0: Phrase-structure grammars – no restrictions on the production rules.
Type 1: Context-Sensitive PSG:
All after fragments are either at least as long as the corresponding before fragments, or empty:
if b → a, then |b| ≤ |a| ∨ a = λ.
Type 2: Context-Free PSG:
All before fragments have length 1 and are nonterminals:
if b → a, then |b| = 1 ∧ b ∈ N.
Type 3: Regular PSGs:
All before fragments have length 1 and are nonterminals, and all after fragments are either single terminals, or a pair of a terminal followed by a nonterminal:
if b → a, then a ∈ T ∨ a ∈ TN.
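The four definitions can be checked mechanically by testing the most restrictive type first. The sketch below makes several assumptions. It uses the noncontracting condition |b| ≤ |a| for type 1, which is what lets a rule such as CB → BC in the aⁿbⁿcⁿ grammar count as context-sensitive. Symbols are single characters, λ is "", and the caller passes the set of nonterminals. The function name grammar_type is hypothetical. The function does not check the PSG requirement that every left side contains a nonterminal.

```python
def grammar_type(productions, nonterminals):
    """Return the most restrictive type (3, 2, 1, or 0) such that every
    production b -> a satisfies that type's condition. Symbols are single
    characters, and the empty string "" stands for λ."""

    def type3(b, a):
        # one nonterminal on the left; on the right a single terminal,
        # or a terminal followed by a nonterminal
        if len(b) != 1 or b not in nonterminals:
            return False
        if len(a) == 1:
            return a not in nonterminals
        return len(a) == 2 and a[0] not in nonterminals and a[1] in nonterminals

    def type2(b, a):
        # a single nonterminal on the left, anything on the right
        return len(b) == 1 and b in nonterminals

    def type1(b, a):
        # noncontracting: the after fragment is at least as long, or empty
        return len(b) <= len(a) or a == ""

    for grammar_class, condition in ((3, type3), (2, type2), (1, type1)):
        if all(condition(b, a) for b, a in productions):
            return grammar_class
    return 0

print(grammar_type([("S", "aA"), ("A", "b")], {"S", "A"}))   # 3
print(grammar_type([("S", "aSb"), ("S", "")], {"S"}))        # 2
CS = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"), ("aB", "ab"),
      ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
print(grammar_type(CS, {"S", "B", "C"}))                     # 1
print(grammar_type([("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")],
                   {"S", "A", "B"}))                         # 0
```

The last grammar is type 0 only because AB → b shortens the string, which no context-sensitive rule may do.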
Types of Grammars - Chomsky hierarchy of languages
► Venn Diagram of Grammar Types: [Figure: nested regions, with Type 3 – Regular inside Type 2 – Context-Free, inside Type 1 – Context-Sensitive, inside Type 0 – Phrase-structure Grammars]
Classifying grammars
Given a grammar, we need to be able to find the smallest class in which it belongs. This can be determined by answering three questions:
Are the left hand sides of all of the productions single non-terminals?
► If yes, does each of the productions create at most one non-terminal and is it on the right?
 Yes – regular; No – context-free.
► If not, can any of the rules reduce the length of a string of terminals and non-terminals?
 Yes – phrase-structure (type 0); No – context-sensitive.
A regular grammar is one where each production takes one of the following forms (where the capital letters are non-terminals and w is a non-empty string of terminals):
► S → λ,
► S → w,
► S → T,
► S → wT.
Only one nonterminal can appear on the right side, and it must be at the right end of the right side.
Therefore, the grammar S → 0S1, S → λ is not regular; it is context-free.
Likewise, the productions A → aBc and S → TU are not part of a regular grammar, but the production A → abcA is.
Definition: Context-Free Grammars
Grammar G = (V, T, S, P), with V the vocabulary, T the terminal symbols, S the start variable, and P the productions.
Productions are of the form
A → x
where A is a non-terminal and x is a string of variables and terminals.
Derivation Tree of a Context-free Grammar
► Represents the language using an ordered rooted tree.
► The root represents the starting symbol.
► Internal vertices represent the nonterminal symbols that arise in the production.
► Leaves represent the terminal symbols.
► If the production A → w arises in the derivation, where w is a word, the vertex that represents A has as children vertices that represent each symbol in w, in order from left to right.
Language Generated by a Grammar
► Example: Let G = ({S, A, a, b}, {a, b}, S, {S → aA, S → b, A → aa}). What is L(G)?
► Easy: we can just draw a tree of all possible derivations.
We have: S ⇒ aA ⇒ aaa,
and S ⇒ b.
► Answer: L = {aaa, b}.
[Figure: example of a derivation tree (or parse tree, or sentence diagram): S branches to aA and to b, and aA continues to aaa.]
Example: Derivation Tree
► Let G be a context-free grammar with the productions P = {S → aAB, A → Bba, B → bB, B → c}. The word w = acbabc can be derived from S as follows:
S ⇒ aAB ⇒ a(Bba)B ⇒ acbaB ⇒ acba(bB) ⇒ acbabc
Thus, the derivation tree is given as follows (each vertex is followed by its children in parentheses):
S(a, A(B(c), b, a), B(b, B(c)))
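A derivation tree can be stored as nested (label, children) pairs. Reading the leaves from left to right gives the derived word. Checking each internal vertex against P confirms that every vertex A with children w corresponds to a production A → w. The sketch below assumes single-character labels, and the helpers node, tree_yield and respects are hypothetical names.

```python
def node(label, *children):
    """A tree vertex: its label plus an ordered list of child vertices."""
    return (label, list(children))

tree = node("S",
            node("a"),
            node("A", node("B", node("c")), node("b"), node("a")),
            node("B", node("b"), node("B", node("c"))))

def tree_yield(t):
    """Read the leaves from left to right: the word the tree derives."""
    label, children = t
    if not children:
        return label
    return "".join(tree_yield(c) for c in children)

def respects(t, productions):
    """True if every internal vertex A with children w matches a production A -> w."""
    label, children = t
    if not children:
        return True
    rhs = "".join(child[0] for child in children)
    return (label, rhs) in productions and all(respects(c, productions) for c in children)

P = {("S", "aAB"), ("A", "Bba"), ("B", "bB"), ("B", "c")}
print(tree_yield(tree))    # acbabc
print(respects(tree, P))   # True
```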
Backus-Naur Form
⟨sentence⟩ ::= ⟨noun phrase⟩ ⟨verb phrase⟩
⟨noun phrase⟩ ::= ⟨article⟩ [⟨adjective⟩] ⟨noun⟩
⟨verb phrase⟩ ::= ⟨verb⟩ [⟨adverb⟩]
⟨article⟩ ::= a | the
⟨adjective⟩ ::= large | hungry
⟨noun⟩ ::= rabbit | mathematician
⟨verb⟩ ::= eats | hops
⟨adverb⟩ ::= quickly | wildly
Square brackets [ ] mean "optional"; vertical bars | mean "alternatives".
Generating Infinite Languages
► A simple PSG can easily generate an infinite language.
► Example: S → 11S, S → 0 (T = {0, 1}).
► The derivations are:
S ⇒ 0
S ⇒ 11S ⇒ 110
S ⇒ 11S ⇒ 1111S ⇒ 11110
and so on…
L = {(11)ⁿ0 : n ≥ 0}, written (11)*0 as a regular expression: the set of all strings consisting of some number (possibly zero) of concatenations of 11 with itself, followed by 0.
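A quick check of the claim is to generate the first few sentences and test each against the regular expression (11)*0. The sketch below is a minimal version of this. It builds the sentence that uses S → 11S exactly k times and then S → 0, for k = 0..n. The function name derive_all is hypothetical.

```python
import re

def derive_all(n):
    """Sentences from S -> 11S, S -> 0 using S -> 11S exactly k times, for k = 0..n."""
    results = []
    form = "S"
    for _ in range(n + 1):
        results.append(form.replace("S", "0"))   # finish with S -> 0
        form = form.replace("S", "11S")          # or keep going with S -> 11S
    return results

generated = derive_all(4)
print(generated)  # ['0', '110', '11110', '1111110', '111111110']
print(all(re.fullmatch(r"(11)*0", w) for w in generated))  # True
```

A string with an odd number of leading 1s, such as 1110, does not match the expression, and the grammar cannot derive it either.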
Another example
Construct a PSG that generates the language L = {0ⁿ1ⁿ | n ∈ N}.
0 and 1 here represent symbols being concatenated n times, not integers being raised to the nth power.
Solution strategy: each step of the derivation should preserve the invariant that the number of 0's equals the number of 1's in the template so far, and all 0's come before all 1's.
Solution: S → 0S1, S → λ.
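The solution strategy can be checked on a concrete derivation. Apply S → 0S1 n times, then S → λ, and test the invariant on every template along the way. In the sketch below, a template is assumed to contain at most one S, which is removed before counting. The helper names invariant_holds and derivation are hypothetical.

```python
def invariant_holds(template):
    """#0 = #1 in the template, and no 0 appears after a 1 (S may sit in between)."""
    bits = template.replace("S", "")
    return bits.count("0") == bits.count("1") and "10" not in bits

def derivation(n):
    """S => 0S1 => 00S11 => ... => 0^n 1^n, using S -> 0S1 n times, then S -> λ."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "0S1"))
    forms.append(forms[-1].replace("S", ""))
    return forms

steps = derivation(3)
print(steps)  # ['S', '0S1', '00S11', '000S111', '000111']
print(all(invariant_holds(t) for t in steps))  # True
```

Both productions keep the invariant. S → 0S1 adds one 0 on the left of S and one 1 on its right, and S → λ adds nothing, so every sentence derived is 0ⁿ1ⁿ.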
Context-Sensitive Languages
The language {aⁿbⁿcⁿ | n ≥ 1} is context-sensitive but not context-free.
A grammar for this language is given by:
S → aSBC | aBC
CB → BC
aB → ab
bB → bb
bC → bc
cC → cc
(Here a, b, c are terminals and S, B, C are non-terminals.)
A derivation from this grammar is:
S ⇒ aSBC
⇒ aaBCBC (using S → aBC)
⇒ aabCBC (using aB → ab)
⇒ aabBCC (using CB → BC)
⇒ aabbCC (using bB → bb)
⇒ aabbcC (using bC → bc)
⇒ aabbcc (using cC → cc)
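Because no production of this grammar shortens a string, every derivation of a sentence of length at most k passes only through sentential forms of length at most k. A search can therefore discard longer forms without missing any sentence. The sketch below relies on that fact to list every sentence of length ≤ 9. It assumes single-character symbols, and the function name sentences_up_to is hypothetical. Some branches of the search stop without producing a sentence, for example after applying bC → bc too early, and the search simply abandons them.

```python
def sentences_up_to(productions, start, terminals, max_len):
    """Terminal strings of length <= max_len derivable from `start`.
    Pruning forms longer than max_len is safe only because every production
    here is noncontracting (|b| <= |a|), so forms never shrink."""
    seen = {start}
    stack = [start]
    found = set()
    while stack:
        w = stack.pop()
        if set(w) <= terminals:
            found.add(w)
            continue
        for x, y in productions:
            i = w.find(x)
            while i != -1:
                z = w[:i] + y + w[i + len(x):]
                if len(z) <= max_len and z not in seen:
                    seen.add(z)
                    stack.append(z)
                i = w.find(x, i + 1)
    return found

P = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"), ("aB", "ab"),
     ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
print(sorted(sentences_up_to(P, "S", {"a", "b", "c"}, 9), key=len))
# ['abc', 'aabbcc', 'aaabbbccc']
```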
