The Chomsky Hierarchy
Who is Noam Chomsky Anyway?
▪ Philosopher of Languages
▪ Professor of Linguistics at MIT
▪ Argued that language is not a learned “behavior”
driven by stimulus and response, but a
cognitive and innate capacity
▪ In an effort to explain these theories, he
developed the Chomsky Hierarchy
Chomsky Hierarchy
• Comprises four types of languages and
their associated grammars and machines.
Type 3: Regular Languages
Type 2: Context-Free Languages
Type 1: Context-Sensitive Languages
Type 0: Recursively Enumerable Languages
• These languages form a strict hierarchy
Chomsky Hierarchy
Language               | Grammar                                        | Machine                                                  | Example
Regular                | Regular grammar (right-linear or left-linear)  | Deterministic or nondeterministic finite-state acceptor  | a*
Context-free           | Context-free grammar                           | Nondeterministic pushdown automaton                      | a^n b^n
Context-sensitive      | Context-sensitive grammar                      | Linear-bounded automaton                                 | a^n b^n c^n
Recursively enumerable | Unrestricted grammar                           | Turing machine                                           | Any computable function
The Chomsky Hierarchy
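[Figure: nested language classes, from innermost to outermost: Regular, Context-free, Context-sensitive, decidable, Turing-Acceptable. Languages outside the outermost region are Non Turing-Acceptable.]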
9.7: Chomsky Hierarchy
Turing Machine
Turing Machine (II)
Unrestricted grammar
Recognized by Turing machine
It consists of a read-write head that can be
positioned anywhere along an infinite tape.
It is not a useful class of language for
compiler design.
Linear-Bounded Automata
Context-sensitive
Restrictions
The left-hand side of each production must have at least
one nonterminal in it
The right-hand side must not have fewer symbols
than the left-hand side
There can be no empty productions (N → λ)
Push-Down Automata
Push-Down Automata (II)
Context-free
Recognized by push-down automata
Can only read its input tape but has a stack that can grow to
arbitrary depth where it can save information
An automaton with a read-only tape and two independent
stacks is equivalent to a Turing machine.
A context-free grammar allows exactly one nonterminal (and no terminals) on
the left-hand side of each production.
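As a minimal sketch of how a stack lets a machine recognize a^n b^n, the function below reads the input once, left to right, and uses only a stack plus a two-valued state. The function name, the state names, and the choice n ≥ 1 are assumptions made for illustration.

```python
def accepts_anbn(word: str) -> bool:
    """Simulate a pushdown-style recognizer for a^n b^n with n >= 1 (assumed)."""
    stack = []             # unbounded stack: the only memory besides the state
    state = "reading_a"    # hypothetical state names
    for symbol in word:    # the input tape is read-only, scanned left to right
        if state == "reading_a" and symbol == "a":
            stack.append("A")      # save one marker per 'a'
        elif symbol == "b" and stack:
            stack.pop()            # match one 'b' against one saved 'a'
            state = "reading_b"
        else:
            return False           # stray symbol, 'a' after 'b', or too many b's
    # Accept only if at least one 'b' was read and every 'a' was matched.
    return state == "reading_b" and not stack


print(accepts_anbn("aabb"))  # True
print(accepts_anbn("aab"))   # False
```

The stack depth grows with the number of a's, which is exactly the information a finite-state machine cannot hold.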
Finite-State Automata
Finite State Automata (II)
Regular language
Anything that must be remembered about
the context of a symbol on the input tape
must be preserved in the state of the
machine.
A regular grammar allows only one symbol (a nonterminal) on
the left-hand side, and only one or two symbols
on the right-hand side.
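The point that all memory lives in the state can be sketched with a table-driven finite-state acceptor. The example below accepts a* over an assumed alphabet {a, b}; the state names and the table layout are hypothetical.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    ("only_a", "a"): "only_a",
    ("only_a", "b"): "dead",    # once a 'b' is seen, the word can never be in a*
    ("dead", "a"): "dead",
    ("dead", "b"): "dead",
}
START = "only_a"
ACCEPTING = {"only_a"}


def dfa_accepts(word: str) -> bool:
    """Run the finite-state acceptor; the current state is its only memory."""
    state = START
    for symbol in word:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False        # symbol outside the assumed alphabet
        state = TRANSITIONS[key]
    return state in ACCEPTING


print(dfa_accepts("aaa"))  # True
print(dfa_accepts("aab"))  # False
```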
Linear-Bounded Automata:
Same as Turing machines with one difference:
the tape space occupied by the input string
is the only tape space the machine is allowed to use
Linear Bounded Automaton (LBA)
[Figure: tape holding the input string  [ a b c d e ]  with a left-end marker "[" and a right-end marker "]"; the cells between the markers are the working space in the tape.]
All computation is done between the end markers
We define LBAs as nondeterministic
Open Problem:
Do nondeterministic LBAs
have the same power as
deterministic LBAs?
Example languages accepted by LBAs:
L = { a^n b^n c^n }
L = { a^{n!} }
LBA’s have more power than PDA’s
(pushdown automata)
LBA’s have less power than Turing Machines
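A rough sketch of how a linear-bounded machine can accept a^n b^n c^n: it overwrites cells of the input tape and never uses any cell beyond it. The sketch is deterministic (a special case of the nondeterministic definition), the index variables stand in for the head position and finite control, and the function name and the choice n ≥ 1 are assumptions.

```python
def lba_accepts_anbncn(word: str) -> bool:
    """Accept a^n b^n c^n (n >= 1 assumed) using only the input cells as storage."""
    tape = list(word)          # one cell per input symbol; no extra tape is used
    n = len(tape)

    # First sweep: check the shape a...ab...bc...c with each block nonempty.
    i = 0
    for letter in "abc":
        block_start = i
        while i < n and tape[i] == letter:
            i += 1
        if i == block_start:
            return False       # a block is missing
    if i != n:
        return False           # leftover symbols after the c-block

    # Repeated sweeps: cross off one a, one b and one c per round, in place.
    while True:
        crossed = 0
        for letter in "abc":
            for pos in range(n):
                if tape[pos] == letter:
                    tape[pos] = "X"    # overwrite the cell, as an LBA may
                    crossed += 1
                    break
        if crossed == 0:
            return True        # every symbol was matched in complete rounds
        if crossed != 3:
            return False       # one block ran out before the others


print(lba_accepts_anbncn("aabbcc"))  # True
print(lba_accepts_anbncn("aabbc"))   # False
```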
Unrestricted Grammars:
Productions
u → v
where u is a string of variables and terminals
and v is a string of variables and terminals
Example unrestricted grammar:
S → aBc
aB → cA
Ac → d
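To see what this grammar generates, one can treat each production as a string-rewriting rule and search over sentential forms. The sketch below assumes that uppercase letters are variables and lowercase letters are terminals, as in the example; the function names and the step limit are hypothetical.

```python
from collections import deque

RULES = [("S", "aBc"), ("aB", "cA"), ("Ac", "d")]


def successors(form, rules):
    """Yield every sentential form reachable from `form` in one rewriting step."""
    for left, right in rules:
        pos = form.find(left)
        while pos != -1:
            yield form[:pos] + right + form[pos + len(left):]
            pos = form.find(left, pos + 1)


def derive_terminal_strings(start, rules, max_steps=10):
    """Breadth-first search; returns terminal strings found within max_steps."""
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        form, steps = queue.popleft()
        if not any(ch.isupper() for ch in form):
            found.add(form)            # no variables left: a generated word
            continue
        if steps == max_steps:
            continue                   # safety limit, since derivations may not end
        for nxt in successors(form, rules):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return found


print(derive_terminal_strings("S", RULES))  # {'cd'}
```

The only derivation is S ⇒ aBc ⇒ cAc ⇒ cd, so the grammar generates the single word cd.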
Theorem:
A language L is Turing-Acceptable
if and only if L is generated by an
unrestricted grammar
Context-Sensitive Grammars:
Productions
u → v
where u is a string of variables and terminals
and v is a string of variables and terminals,
and: |u| ≤ |v|
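The length condition is easy to check mechanically. The sketch below lists the productions that violate |u| ≤ |v|, assuming every grammar symbol is written as a single character; the function name is hypothetical.

```python
def contracting_rules(rules):
    """Return the productions (u, v) that violate |u| <= |v|."""
    return [(u, v) for u, v in rules if len(u) > len(v)]


# The example unrestricted grammar: Ac -> d shrinks the string.
unrestricted = [("S", "aBc"), ("aB", "cA"), ("Ac", "d")]
print(contracting_rules(unrestricted))  # [('Ac', 'd')]

# A grammar whose productions never shrink the string.
noncontracting = [("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
                  ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA")]
print(contracting_rules(noncontracting))  # []
```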
The language { a^n b^n c^n }
is context-sensitive:
S → abc | aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa | aaA
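Because no production shrinks the string, a derivation of a word w never needs a sentential form longer than w, so membership can be tested by a finite search. The sketch below applies that idea to the grammar above; the function name is hypothetical, and uppercase letters are assumed to be variables.

```python
from collections import deque

CSG_RULES = [
    ("S", "abc"), ("S", "aAbc"),
    ("Ab", "bA"), ("Ac", "Bbcc"),
    ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA"),
]


def csg_generates(word, rules=CSG_RULES, start="S"):
    """Breadth-first search over sentential forms no longer than `word`."""
    if not word:
        return False                   # this grammar has no empty productions
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == word:
            return True
        for left, right in rules:
            pos = form.find(left)
            while pos != -1:
                nxt = form[:pos] + right + form[pos + len(left):]
                # Longer forms can never shrink back to `word`, so prune them.
                if len(nxt) <= len(word) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                pos = form.find(left, pos + 1)
    return False                       # finite search space exhausted


print(csg_generates("aabbcc"))  # True
print(csg_generates("aabbc"))   # False
```

The search always terminates because there are only finitely many strings of length at most |w| over a finite vocabulary.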
Theorem:
A language L is context-sensitive
if and only if
it is accepted by a Linear-Bounded automaton
Observation:
Every context-sensitive language is decidable, and
there is a language which is decidable
but not context-sensitive
Intro to Languages
English grammar tells us if a given combination of words is a
valid sentence.
The syntax of a sentence concerns its form while the
semantics concerns
its meaning.
e.g. the mouse wrote a poem
From a syntax point of view this is a valid sentence.
From a semantics point of view it is doubtful, except perhaps in
Disneyland.
Natural languages (English, French, Portuguese, etc.) have very
complex rules of syntax, and those rules are not necessarily well-defined.
Formal Language
Formal language: a language specified by a well-defined set of rules of
syntax
We describe the sentences of a formal language using a
grammar.
Two key questions:
1. Is a combination of words a valid sentence in a formal
language?
2. How can we generate the valid sentences of a formal
language?
Formal languages provide models for both natural languages and
programming languages.
Grammars
A formal grammar G is any compact, precise
mathematical definition of a language L.
As opposed to just a raw listing of all of the language’s
legal sentences, or just examples of them.
A grammar implies an algorithm that would
generate all legal sentences of the language.
Often, it takes the form of a set of recursive
definitions.
A popular way to specify a grammar recursively is
to specify it as a phrase-structure grammar.
Grammars (Semi-formal)
Example: A grammar that generates a
subset of the English language
sentence → noun_phrase predicate
noun_phrase → article noun
predicate → verb
article → a
article → the
noun → boy
noun → dog
verb → runs
verb → sleeps
A derivation of “the boy sleeps”:
sentence ⇒ noun_phrase predicate
⇒ noun_phrase verb
⇒ article noun verb
⇒ the noun verb
⇒ the boy verb
⇒ the boy sleeps
A derivation of “a dog runs”:
sentence ⇒ noun_phrase predicate
⇒ noun_phrase verb
⇒ article noun verb
⇒ a noun verb
⇒ a dog verb
⇒ a dog runs
Language of the grammar:
L = { “a boy runs”,
“a boy sleeps”,
“the boy runs”,
“the boy sleeps”,
“a dog runs”,
“a dog sleeps”,
“the dog runs”,
“the dog sleeps” }
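The eight sentences can be reproduced by expanding the productions mechanically. The sketch below stores the grammar as a dictionary and expands each variable recursively; the dictionary layout and the function name are assumptions, and the approach terminates only because this grammar is not recursive.

```python
from itertools import product

# Each variable maps to its alternative right-hand sides.
GRAMMAR = {
    "sentence": [["noun_phrase", "predicate"]],
    "noun_phrase": [["article", "noun"]],
    "predicate": [["verb"]],
    "article": [["a"], ["the"]],
    "noun": [["boy"], ["dog"]],
    "verb": [["runs"], ["sleeps"]],
}


def expand(symbol):
    """Return every list of terminal words derivable from `symbol`."""
    if symbol not in GRAMMAR:
        return [[symbol]]              # a terminal derives only itself
    results = []
    for body in GRAMMAR[symbol]:
        parts = [expand(s) for s in body]
        for combo in product(*parts):  # one choice per symbol of the body
            results.append([word for part in combo for word in part])
    return results


sentences = sorted(" ".join(words) for words in expand("sentence"))
print(len(sentences))  # 8
for s in sentences:
    print(s)
```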
Notation
noun → boy
noun → dog
Each line is a production rule.
The left-hand symbol (noun) is a variable, or non-terminal.
The right-hand symbol (boy, dog) is a terminal.
Both variables and terminals are symbols of the vocabulary.
Basic Terminology
► A vocabulary/alphabet, V is a finite nonempty set of
elements called symbols.
Example: V = {a, b, c, A, B, C, S}
► A word/sentence over V is a string of finite length of
elements of V.
Example: Aba
► The empty/null string, λ is the string with no symbols.
► V* is the set of all words over V.
Example: V* = {Aba, BBa, bAA, cab …}
► A language over V is a subset of V*.
We can give some criteria for a word to be in a
language.
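A small sketch of these definitions: V* is infinite, so the code enumerates only the words up to a chosen length, then filters them with a membership criterion. The function names and the criterion (words built only from a, b, c) are hypothetical examples.

```python
from itertools import product

V = ["a", "b", "c", "A", "B", "C", "S"]


def words_up_to(vocabulary, max_length):
    """Yield all words over `vocabulary` of length 0..max_length, shortest first."""
    for length in range(max_length + 1):
        for letters in product(vocabulary, repeat=length):
            yield "".join(letters)     # length 0 yields the empty string


def in_language(word):
    """Hypothetical criterion: the word uses only the symbols a, b, c."""
    return all(ch in "abc" for ch in word)


all_words = list(words_up_to(V, 2))
language = [w for w in all_words if in_language(w)]
print(len(all_words))  # 57  (1 + 7 + 49)
print(len(language))   # 13  (1 + 3 + 9)
```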
Context-Sensitive Languages
The language { a^n b^n c^n | n ≥ 1 } is context-sensitive
but not context-free.
A grammar for this language is given by:
S → aSBC | aBC
CB → BC
aB → ab
bB → bb
bC → bc
cC → cc
The left-hand sides aB, bB, bC and cC each pair a terminal
with a non-terminal.
Context-Sensitive Languages
Example
A derivation from this grammar is:-
S ⇒ aSBC (using S → aSBC)
⇒ aaBCBC (using S → aBC)
⇒ aabCBC (using aB → ab)
⇒ aabBCC (using CB → BC)
⇒ aabbCC (using bB → bb)
⇒ aabbcC (using bC → bc)
⇒ aabbcc (using cC → cc)
which derives a^2 b^2 c^2.
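The derivation can be replayed step by step by applying each production to its leftmost occurrence in the current sentential form. The helper names below are hypothetical, and each grammar symbol is assumed to be a single character.

```python
# Productions in the order they are used in the derivation.
STEPS = [
    ("S", "aSBC"), ("S", "aBC"), ("aB", "ab"), ("CB", "BC"),
    ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
]


def apply_leftmost(form, left, right):
    """Rewrite the leftmost occurrence of `left` in `form` with `right`."""
    pos = form.find(left)
    if pos == -1:
        raise ValueError(f"{left!r} does not occur in {form!r}")
    return form[:pos] + right + form[pos + len(left):]


def replay(start, steps):
    """Return the list of sentential forms produced by the given steps."""
    forms = [start]
    for left, right in steps:
        forms.append(apply_leftmost(forms[-1], left, right))
    return forms


print(" => ".join(replay("S", STEPS)))
# S => aSBC => aaBCBC => aabCBC => aabBCC => aabbCC => aabbcC => aabbcc
```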