Introduction to the Theory of Computation

Languages, Automata, Grammars

Slides for CIS262

Jean Gallier

February 21, 2020

Chapter 1

Introduction

1.1 Generalities, Motivations, Problems

In this part of the course we want to understand

• What is a language?
• How do we define a language?
• How do we manipulate languages, combine them?
• What is the complexity of a language?

Roughly, there are two dual views of languages:

(A) The recognition point of view.
(B) The generation point of view.


No matter how we view a language, we are typically
considering two things:
(1) The syntax, i.e., what are the “legal” strings in that
language (what are the “grammar rules”?).
(2) The semantics of strings in the language, i.e., what
is the meaning (or interpretation) of a string.

The semantics is usually a lot more interesting than the
syntax but unfortunately much more difficult to deal with!

Therefore, sorry, we will only be dealing with syntax!

In (A), we typically assume some kind of “black box”,
M, (an automaton) that takes a string, w, as input and
returns two possible answers:

Yes, the string w is accepted, which means that w
belongs to the language, L, that we are trying to define.

No, the string w is rejected, which means that w does
not belong to the language, L.

Usually, the black box M gives a definite answer for every
input after a finite number of steps, but not always.

For example, a Turing machine may go on computing
forever and not give any answer for certain strings not in
the language. This is an example of undecidability.

The black box may compute deterministically or
nondeterministically, which means roughly that on input w,
the machine M is allowed to try different computations
and to ignore failing computations as long as there is some
successful computation on input w.

This greatly affects the complexity of recognition, i.e.,
how many steps it takes to process w.

Sometimes, a nondeterministic version of an automaton
turns out to be equivalent to the deterministic version
(although with different complexity).

This tends to happen for very restrictive models, where
nondeterminism does not help, or for very powerful
models, where again nondeterminism does not help, but
because the deterministic model is already very powerful!

We will investigate automata of increasing power of
recognition:
(1) Deterministic and nondeterministic finite automata
(DFA’s and NFA’s, their power is the same).
(2) Pushdown automata (PDA’s) and deterministic
pushdown automata (DPDA’s); here PDA’s are strictly
more powerful than DPDA’s.
(3) Deterministic and nondeterministic Turing machines
(their power is the same).
(4) If time permits, we will also consider some restricted
type of Turing machine known as LBA (linear bounded
automaton).

In (B), we are interested in formalisms that specify a
language in terms of rules that allow the generation of
“legal” strings. The most common formalism is that of a
formal grammar.

Remember:
• An automaton recognizes (or accepts) a language,
• a grammar generates a language.
• grammar is spelled with an “a” (not with an “e”).
• The plural of automaton is automata
(not automatons).

For “good” classes of grammars, it is possible to build an
automaton, $M_G$, from the grammar, G, in the class, so
that $M_G$ recognizes the language, L(G), generated by the
grammar G.

However, grammars are nondeterministic in nature. Thus,

even if we try to avoid nondeterministic automata, we
usually can’t escape having to deal with them.

We will investigate the following types of grammars (the
so-called Chomsky hierarchy) and the corresponding
families of languages:
(1) Regular grammars (type 3-languages).
(2) Context-free grammars (type 2-languages).
(3) The recursively enumerable languages or r.e. sets
(type 0-languages).
(4) If time permits, context-sensitive languages
(type 1-languages).

Miracle: The grammars of type (1), (2), (3), (4)
correspond exactly to the automata of the corresponding type!

Furthermore, there are algorithms for converting
grammars to the corresponding automata (and backward),
although some of these algorithms are not practical.

Building an automaton from a grammar is an important
practical problem in language processing. A lot is known
for the regular and the context-free grammars, but there
is still room for improvements and innovations!

There are other ways of defining families of languages, for

example

Inductive closures.

In this style of definition, a collection of basic (atomic)
languages is specified, some operations to combine
languages are also specified, and the family of languages is
defined as the smallest one containing the given atomic
languages and closed under the operations.

Investigating closure properties (for example, union,
intersection) is a way to assess how “robust” (or complex)
a family of languages is.

Well, it is now time to be precise!

Chapter 2

Basics of Formal Language Theory

2.1 Review of Some Basic Math Notation and Definitions

N, Z, Q, R, C.

The natural numbers,

N = {0, 1, 2, . . .}.

The integers,
Z = {. . . , −2, −1, 0, 1, 2, . . .}.

The rationals,
$\mathbb{Q} = \left\{ \frac{p}{q} \;\middle|\; p, q \in \mathbb{Z},\ q \neq 0 \right\}$.

The reals, R.

The complex numbers,

C = {a + ib | a, b ∈ R} .

Given any set X, the power set of X is the set of all
subsets of X and is denoted $2^X$.

The notation
f: X →Y
denotes a function with domain X and range
(or codomain) Y .

graph(f) = {(x, f(x)) | x ∈ X} ⊆ X × Y

is the graph of f .

Im(f ) = f (X) = {y ∈ Y | ∃x ∈ X, y = f (x)} ⊆ Y

is the image of f .

More generally, if A ⊆ X, then

f (A) = {y ∈ Y | ∃x ∈ A, y = f (x)} ⊆ Y
is the (direct) image of A.

If B ⊆ Y, then
$f^{-1}(B) = \{x \in X \mid f(x) \in B\} \subseteq X$
is the inverse image (or pullback) of B.

$f^{-1}(B)$ is a set; it might be empty even if B ≠ ∅.
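The direct image and the inverse image can be tried out on small finite sets. The following is a minimal sketch: the helper names `image` and `inverse_image` and the example function `square` are assumptions made for illustration, not notation from the slides.

```python
def image(f, A):
    """Direct image f(A) = {f(x) | x in A}."""
    return {f(x) for x in A}


def inverse_image(f, X, B):
    """Inverse image f^{-1}(B) = {x in X | f(x) in B}."""
    return {x for x in X if f(x) in B}


def square(x):
    # Hypothetical example function f : X -> Y, chosen only for illustration.
    return x * x


X = {-2, -1, 0, 1, 2}

print(sorted(image(square, X)))               # [0, 1, 4]
print(sorted(inverse_image(square, X, {4})))  # [-2, 2]
print(inverse_image(square, X, {3}))          # set(): empty although B = {3} is not
```

The last call illustrates the remark that the inverse image of a nonempty set may be empty.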


Given two functions f : X → Y and g : Y → Z, the

function g ◦ f : X → Z given by
(g ◦ f )(x) = g(f (x)) for all x ∈ X
is the composition of f and g.

The function $\mathrm{id}_X : X \to X$ given by
$\mathrm{id}_X(x) = x$ for all x ∈ X
is the identity function (of X).

A function f : X → Y is injective (old terminology
one-to-one) if for all $x_1, x_2 \in X$,
if $f(x_1) = f(x_2)$, then $x_1 = x_2$;

equivalently if $x_1 \neq x_2$, then $f(x_1) \neq f(x_2)$.


Fact: If X ≠ ∅ (and so Y ≠ ∅), a function f : X → Y is
injective iff there is a function r : Y → X (a left inverse)
such that
$r \circ f = \mathrm{id}_X$.

Note: r is surjective.

A function f : X → Y is surjective (old terminology
onto) if for all y ∈ Y, there is some x ∈ X such that
y = f(x), iff
f(X) = Y.

Fact: If X ≠ ∅ (and so Y ≠ ∅), a function f : X → Y
is surjective iff there is a function s : Y → X (a right
inverse or section) such that
$f \circ s = \mathrm{id}_Y$.

Note: s is injective.

A function f : X → Y is bijective if it is injective and
surjective.

Fact: If X ≠ ∅ (and so Y ≠ ∅), a function f : X → Y
is bijective iff there is a function $f^{-1} : Y \to X$ which is a
left and a right inverse, that is
$f^{-1} \circ f = \mathrm{id}_X, \quad f \circ f^{-1} = \mathrm{id}_Y$.

The function $f^{-1}$ is unique and called the inverse of f.

The function f is said to be invertible.
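For finite sets, the left and right inverses mentioned in the facts above can be built explicitly. This is only a sketch under stated assumptions: functions are represented as Python dictionaries, and the names `left_inverse` and `right_inverse` are hypothetical helpers, not notation from the slides.

```python
def is_injective(f, X):
    """f is injective iff distinct inputs have distinct outputs."""
    return len({f[x] for x in X}) == len(X)


def is_surjective(f, X, Y):
    """f is surjective iff f(X) = Y."""
    return {f[x] for x in X} == set(Y)


def left_inverse(f, X, Y):
    """Build r : Y -> X with r(f(x)) = x, assuming f injective and X nonempty."""
    default = next(iter(X))          # arbitrary value for y outside f(X)
    r = {y: default for y in Y}
    for x in X:
        r[f[x]] = x
    return r


def right_inverse(f, X, Y):
    """Build s : Y -> X with f(s(y)) = y, assuming f surjective."""
    s = {}
    for x in X:
        s.setdefault(f[x], x)        # keep one preimage per y
    return s


# Injective but not surjective example.
X1, Y1 = [1, 2, 3], ["a", "b", "c", "d"]
f = {1: "a", 2: "b", 3: "c"}
r = left_inverse(f, X1, Y1)
print(all(r[f[x]] == x for x in X1))   # True: r o f = id_X

# Surjective but not injective example.
X2, Y2 = [1, 2, 3, 4], ["a", "b"]
g = {1: "a", 2: "b", 3: "a", 4: "b"}
s = right_inverse(g, X2, Y2)
print(all(g[s[y]] == y for y in Y2))   # True: g o s = id_Y
```

Note that the left inverse is not unique in general: the value chosen for elements of Y outside f(X) is arbitrary.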

A binary relation R between two sets X and Y is a

subset
R ⊆ X × Y = {(x, y) | x ∈ X, y ∈ Y }.

dom(R) = {x ∈ X | ∃y ∈ Y, (x, y) ∈ R} ⊆ X
is the domain of R.

range(R) = {y ∈ Y | ∃x ∈ X, (x, y) ∈ R} ⊆ Y
is the range of R.

We also write xRy instead of (x, y) ∈ R.


Given two relations R ⊆ X × Y and S ⊆ Y × Z, their

composition R ◦ S ⊆ X × Z is given by
R◦S = {(x, z) | ∃y ∈ Y, (x, y) ∈ R and (y, z) ∈ S}.
Note that if R and S are the graphs of two functions
f and g, then R ◦ S is the graph of g ◦ f (the order is reversed).

$I_X = \{(x, x) \mid x \in X\}$
is the identity relation on X.

Given R ⊆ X × Y, the converse $R^{-1} \subseteq Y \times X$ of R is
given by
$R^{-1} = \{(y, x) \in Y \times X \mid (x, y) \in R\}$.

A relation R ⊆ X × X is transitive if for all x, y, z ∈ X,
if (x, y) ∈ R and (y, z) ∈ R, then (x, z) ∈ R.

A relation R ⊆ X × X is transitive iff R ◦ R ⊆ R.
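Relation composition and the characterization of transitivity by R ◦ R ⊆ R can be checked on small examples. A minimal sketch, assuming relations are represented as Python sets of pairs (the names `compose` and `is_transitive` are illustrative):

```python
def compose(R, S):
    """R o S = {(x, z) | there is y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


def is_transitive(R):
    """R is transitive iff R o R is a subset of R."""
    return compose(R, R) <= R


less_than = {(a, b) for a in range(4) for b in range(4) if a < b}
successor = {(a, a + 1) for a in range(3)}

print(is_transitive(less_than))   # True
print(is_transitive(successor))   # False: (0, 1), (1, 2) are in it, (0, 2) is not

# Graphs of f(x) = x + 1 and g(y) = 2y: composing the graphs in the order
# graph(f) o graph(g) yields the graph of g o f.
graph_f = {(x, x + 1) for x in range(3)}
graph_g = {(y, 2 * y) for y in range(1, 4)}
print(compose(graph_f, graph_g) == {(x, 2 * (x + 1)) for x in range(3)})  # True
```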

A relation R ⊆ X × X is reflexive if (x, x) ∈ R for all
x ∈ X.

A relation R ⊆ X × X is reflexive iff $I_X \subseteq R$.

A relation R ⊆ X × X is symmetric if for all x, y ∈ X,
if (x, y) ∈ R, then (y, x) ∈ R.

A relation R ⊆ X × X is symmetric iff $R^{-1} \subseteq R$.

Given R ⊆ X × X (a relation on X), define $R^n$ by
$R^0 = I_X$
$R^{n+1} = R \circ R^n$.

The transitive closure $R^+$ of R is given by
$R^+ = \bigcup_{n \geq 1} R^n$.

Fact. $R^+$ is the smallest transitive relation containing
R.

The reflexive and transitive closure $R^*$ of R is given by
$R^* = \bigcup_{n \geq 0} R^n = R^+ \cup I_X$.

Fact. $R^*$ is the smallest transitive and reflexive relation
containing R.
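On a finite set, both closures can be computed by accumulating the powers of R until nothing new appears. The sketch below assumes relations are Python sets of pairs; the function names are illustrative, and the loop is a direct (not optimized) reading of the union of powers.

```python
def compose(R, S):
    """R o S = {(x, z) | there is y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


def transitive_closure(R):
    """Union of R^n for n >= 1, computed for a finite relation R."""
    closure = set(R)
    power = set(R)                    # current power R^n, starting with n = 1
    while True:
        power = compose(R, power)     # R^(n+1) = R o R^n
        if power <= closure:          # no new pairs: later powers add nothing
            return closure
        closure |= power


def reflexive_transitive_closure(R, X):
    """R* = R+ together with the identity relation on X."""
    return transitive_closure(R) | {(x, x) for x in X}


R = {(0, 1), (1, 2), (2, 3)}
print(sorted(transitive_closure(R)))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(reflexive_transitive_closure(R, range(4))))   # 10
```

The loop may stop as soon as a power contributes no new pair, because every later power is then already contained in the accumulated union.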

A relation R ⊆ X × X is an equivalence relation if it is

reflexive, symmetric, and transitive.

Fact. The smallest equivalence relation containing a
relation R ⊆ X × X is given by
$(R \cup R^{-1})^*$.

A relation R ⊆ X × X is antisymmetric if for all
x, y ∈ X, if (x, y) ∈ R and (y, x) ∈ R, then x = y.

A relation R ⊆ X × X is a partial order if it is reflexive,
transitive, and antisymmetric.

A partial order R ⊆ X × X is a total order if for all

x, y ∈ X, either (x, y) ∈ R or (y, x) ∈ R.

2.2 Alphabets, Strings, Languages

Our view of languages is that a language is a set of

strings.

In turn, a string is a finite sequence of letters from some
alphabet. These concepts are defined rigorously as
follows.

Definition 2.1. An alphabet Σ is any finite set.

We often write $\Sigma = \{a_1, \ldots, a_k\}$. The $a_i$ are called the
symbols of the alphabet.

Examples:
Σ = {a}
Σ = {a, b, c}
Σ = {0, 1}
Σ = {α, β, γ, δ, ϵ, λ, ϕ, ψ, ω, µ, ν, ρ, σ, η, ξ, ζ}

A string is a finite sequence of symbols. Technically,

it is convenient to define strings as functions. For any
integer n ≥ 1, let
[n] = {1, 2, . . . , n},
and for n = 0, let
[0] = ∅.
Definition 2.2. Given an alphabet Σ, a string over Σ
(or simply a string) of length n is any function

u : [n] → Σ.

The integer n is the length of the string u, and it is

denoted as |u|.

When n = 0, the special string u : [0] → Σ of length 0 is

called the empty string, or null string, and is denoted
as ϵ.

Given a string u : [n] → Σ of length n ≥ 1, u(i) is the
i-th letter in the string u. For simplicity of notation,
we write $u_i$ instead of u(i), and we denote the string
u = u(1)u(2) · · · u(n) as

$u = u_1 u_2 \cdots u_n$,

with each $u_i \in \Sigma$.

For example, if Σ = {a, b} and u : [3] → Σ is defined
such that u(1) = a, u(2) = b, and u(3) = a, we write
u = aba.

Other examples of strings are

work, fun, gabuzomeuh

Strings of length 1 are functions u : [1] → Σ simply
picking some element $u(1) = a_i$ in Σ.

Thus, we will identify every symbol $a_i \in \Sigma$ with the
corresponding string of length 1.

The set of all strings over an alphabet Σ, including the

empty string, is denoted as Σ∗.

Observe that when Σ = ∅, then

∅∗ = {ϵ}.
When Σ ≠ ∅, the set Σ∗ is countably infinite. Later on,
we will see ways of ordering and enumerating strings.

Strings can be juxtaposed, or concatenated.

Definition 2.3. Given an alphabet Σ, given any two
strings u : [m] → Σ and v : [n] → Σ, the concatenation
u · v (also written uv) of u and v is the string
uv : [m + n] → Σ, defined such that
$uv(i) = \begin{cases} u(i) & \text{if } 1 \leq i \leq m, \\ v(i - m) & \text{if } m + 1 \leq i \leq m + n. \end{cases}$

In particular, uϵ = ϵu = u. Observe that

|uv| = |u| + |v|.
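The definition of strings as functions u : [n] → Σ and the piecewise definition of concatenation can be mirrored directly. In this sketch a string is modeled as a dictionary with keys 1, . . . , n; this representation and the helper names are assumptions made for illustration only.

```python
def as_function(text):
    """Model a string of length n as a function [n] -> Sigma (dict with keys 1..n)."""
    return {i + 1: ch for i, ch in enumerate(text)}


def concat(u, v):
    """uv(i) = u(i) if 1 <= i <= m, and v(i - m) if m + 1 <= i <= m + n."""
    m, n = len(u), len(v)
    return {i: (u[i] if i <= m else v[i - m]) for i in range(1, m + n + 1)}


def as_text(u):
    """Read the letters u(1), ..., u(n) back in order."""
    return "".join(u[i] for i in range(1, len(u) + 1))


u, v = as_function("ga"), as_function("buzo")
empty = as_function("")              # the empty string: a function with domain [0]

print(as_text(concat(u, v)))         # gabuzo
print(concat(u, empty) == u)         # True
print(concat(empty, u) == u)         # True
print(len(concat(u, v)) == len(u) + len(v))   # True
```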

For example, if u = ga, and v = buzo, then

uv = gabuzo

It is immediately verified that

u(vw) = (uv)w.
Thus, concatenation is a binary operation on Σ∗ which is
associative and has ϵ as an identity.

Note that generally, uv ≠ vu, for example for u = a and
v = b.

Given a string u ∈ Σ∗ and n ≥ 0, we define $u^n$ recursively
as follows:

$u^0 = \epsilon$
$u^{n+1} = u^n u \quad (n \geq 0)$.

Clearly, $u^1 = u$, and it is an easy exercise to show that
$u^n u = u u^n$, for all n ≥ 0.

The base case n = 0 holds since $u^0 u = \epsilon u = u = u \epsilon = u u^0$.
For the induction step, we have
$u^{n+1} u = (u^n u) u$ by definition of $u^{n+1}$
$= (u u^n) u$ by the induction hypothesis
$= u (u^n u)$ by associativity
$= u u^{n+1}$ by definition of $u^{n+1}$.
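The recursive definition of powers and the identity proved by induction can be checked on examples. A small sketch using ordinary Python strings (the function name `power` is illustrative):

```python
def power(u, n):
    """u^0 is the empty string and u^(n+1) = u^n u."""
    return "" if n == 0 else power(u, n - 1) + u


u = "ab"
print(power(u, 3))                                                # ababab
print(all(power(u, n) + u == u + power(u, n) for n in range(6)))  # True
```

Checking finitely many values of n is of course not a proof; it only illustrates the statement established by the induction.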

Definition 2.4. Given an alphabet Σ, given any two
strings u, v ∈ Σ∗ we define the following notions as
follows:

u is a prefix of v iff there is some y ∈ Σ∗ such that
v = uy.

u is a suffix of v iff there is some x ∈ Σ∗ such that
v = xu.

u is a substring of v iff there are some x, y ∈ Σ∗ such
that
v = xuy.

We say that u is a proper prefix (suffix, substring) of
v iff u is a prefix (suffix, substring) of v and u ≠ v.

For example, ga is a prefix of gabuzo,

zo is a suffix of gabuzo and

buz is a substring of gabuzo.
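These three relations correspond to standard string tests. The sketch below uses Python's built-in `str.startswith`, `str.endswith`, and the `in` operator; the wrapper names are illustrative.

```python
def is_prefix(u, v):
    """u is a prefix of v iff v = u y for some string y."""
    return v.startswith(u)


def is_suffix(u, v):
    """u is a suffix of v iff v = x u for some string x."""
    return v.endswith(u)


def is_substring(u, v):
    """u is a substring of v iff v = x u y for some strings x, y."""
    return u in v


def is_proper(relation, u, v):
    """Proper version of any of the three relations: it holds and u != v."""
    return relation(u, v) and u != v


print(is_prefix("ga", "gabuzo"))                  # True
print(is_suffix("zo", "gabuzo"))                  # True
print(is_substring("buz", "gabuzo"))              # True
print(is_proper(is_prefix, "gabuzo", "gabuzo"))   # False
print(is_prefix("", "gabuzo"))                    # True: take y = gabuzo
```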

Recall that a partial ordering ≤ on a set S is a binary

relation ≤ ⊆ S × S which is reflexive, transitive, and
antisymmetric.

The concepts of prefix, suffix, and substring define binary
relations on Σ∗ in the obvious way. It can be shown that
these relations are partial orderings.

Another important ordering on strings is the lexicographic

(or dictionary) ordering.

Definition 2.5. Given an alphabet $\Sigma = \{a_1, \ldots, a_k\}$
assumed totally ordered such that $a_1 < a_2 < \cdots < a_k$,
given any two strings u, v ∈ Σ∗, we define the
lexicographic ordering ≼ as follows:
$u \preceq v \iff \begin{cases} (1) & v = uy, \text{ for some } y \in \Sigma^*, \text{ or} \\ (2) & u = x a_i y,\ v = x a_j z,\ a_i < a_j, \text{ with } a_i, a_j \in \Sigma, \text{ and for some } x, y, z \in \Sigma^*. \end{cases}$

Note that cases (1) and (2) are mutually exclusive. In
case (1) u is a prefix of v. In case (2) $v \not\preceq u$ and u ≠ v.

For example
ab ≼ b, gallhager ≼ gallier.

It is fairly tedious to prove that the lexicographic ordering

is in fact a partial ordering.

In fact, it is a total ordering, which means that for any

two strings u, v ∈ Σ∗, either u ≼ v, or v ≼ u.
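The two cases of Definition 2.5 translate into a short comparison routine. In this sketch the alphabet is passed as a Python string listing the symbols in increasing order; the function name `lex_leq` and that convention are assumptions for illustration.

```python
from itertools import product


def lex_leq(u, v, alphabet):
    """Return True iff u precedes or equals v in the lexicographic ordering."""
    rank = {a: i for i, a in enumerate(alphabet)}
    for a, b in zip(u, v):
        if a != b:
            # Case (2): u = x a y and v = x b z, compared at the first difference.
            return rank[a] < rank[b]
    # No difference on the common part: case (1) holds iff u is a prefix of v.
    return len(u) <= len(v)


letters = "abcdefghijklmnopqrstuvwxyz"
print(lex_leq("ab", "b", letters))                # True
print(lex_leq("gallhager", "gallier", letters))   # True
print(lex_leq("gallier", "gallhager", letters))   # False

# Totality on all strings of length at most 3 over {a, b}.
words = ["".join(p) for n in range(4) for p in product("ab", repeat=n)]
print(all(lex_leq(u, v, "ab") or lex_leq(v, u, "ab")
          for u in words for v in words))         # True
```

When the alphabet is listed in code point order, this agrees with Python's built-in comparison `u <= v` on strings.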

The reversal $w^R$ of a string w is defined inductively as
follows:
$\epsilon^R = \epsilon$,
$(ua)^R = a u^R$,
where a ∈ Σ and u ∈ Σ∗.

For example
$reillag = gallier^R$.

By setting u = ϵ in $(ua)^R = a u^R$ and using the fact that
$\epsilon^R = \epsilon$, we obtain $a^R = a$ for all a ∈ Σ.

It can be shown that
$(uv)^R = v^R u^R$.
Thus,
$(u_1 \cdots u_n)^R = u_n^R \cdots u_1^R$,
and when $u_i \in \Sigma$, we have
$(u_1 \cdots u_n)^R = u_n \cdots u_1$.
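The inductive definition of reversal can be followed literally. A minimal sketch on Python strings (the function name `reverse` is illustrative; idiomatic Python would simply write `w[::-1]`):

```python
def reverse(w):
    """Reversal defined inductively: the empty string is its own reversal,
    and (ua)^R = a u^R for a symbol a and a string u."""
    if w == "":
        return ""
    u, a = w[:-1], w[-1]
    return a + reverse(u)


print(reverse("gallier"))                                         # reillag
print(reverse("ga" + "buzo") == reverse("buzo") + reverse("ga"))  # True
```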

We can now define languages.

Definition 2.6. Given an alphabet Σ, a language over

Σ (or simply a language) is any subset L of Σ∗.

If Σ ≠ ∅, there are uncountably many languages.


A Quick Review of Finite, Infinite, Countable, and Uncountable Sets

For details and proofs, see Discrete Mathematics, by

Gallier.

Let N = {0, 1, 2, . . .} be the set of natural numbers.

Recall that a set X is finite if there is some natural

number n ∈ N and a bijection between X and the set
[n] = {1, 2, . . . , n}. (When n = 0, X = ∅, the empty
set.)

The number n is uniquely determined. It is called the

cardinality (or size) of X and is denoted by |X|.

A set is infinite iff it is not finite.


Recall that any injection or surjection of a finite set to

itself is in fact a bijection.

The above fails for infinite sets.

The pigeonhole principle asserts that there is no
bijection between a finite set X and any proper subset Y
of X.

Consequence: If we think of X as a set of n pigeons
and if there are only m < n boxes (corresponding to the
elements of Y), then at least two of the pigeons must
share the same box.

As a consequence of the pigeonhole principle, a set X is
infinite iff it is in bijection with a proper subset of itself.

For example, we have a bijection $n \mapsto 2n$ between N and
the set 2N of even natural numbers, a proper subset of
N, so N is infinite.

Definition 2.7. A set X is countable (or denumerable)
if there is an injection from X into N.

If X is not the empty set, then X is countable iff there is
a surjection from N onto X.

Fact. It can be shown that a set X is countable iff either
it is finite or it is in bijection with N (in which case it
is infinite).

We will see later that N × N is countable. As a
consequence, the set Q of rational numbers is countable.

A set is uncountable if it is not countable.

For example, R (the set of real numbers) is uncountable.

Similarly

(0, 1) = {x ∈ R | 0 < x < 1}

is uncountable. However, there is a bijection between

(0, 1) and R (find one!)

The set $2^{\mathbb{N}}$ of all subsets of N is uncountable. This is a
special case of Cantor’s theorem discussed below.

Suppose |Σ| = k with $\Sigma = \{a_1, \ldots, a_k\}$.

There are $k^n$ strings of length n and $(k^{n+1} - 1)/(k - 1)$
strings of length at most n over Σ; when k = 1, the
second formula should be replaced by n + 1.
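The two counting formulas can be confirmed by brute-force enumeration for small k and n. A sketch using `itertools.product` from the Python standard library; the helper names are illustrative.

```python
from itertools import product


def strings_of_length(alphabet, n):
    """All strings of length exactly n over the given alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=n)]


def count_up_to(k, n):
    """Number of strings of length at most n over an alphabet of size k >= 1."""
    return n + 1 if k == 1 else (k ** (n + 1) - 1) // (k - 1)


for alphabet in ["a", "ab", "abc"]:
    k = len(alphabet)
    for n in range(5):
        assert len(strings_of_length(alphabet, n)) == k ** n
        total = sum(len(strings_of_length(alphabet, m)) for m in range(n + 1))
        assert total == count_up_to(k, n)

print(count_up_to(2, 3))   # 15 = 1 + 2 + 4 + 8
```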

If Σ ≠ ∅, then the set Σ∗ of all strings over Σ is infinite
and countable, as we now show.

If k = 1 write $a = a_1$, and then
$\{a\}^* = \{\epsilon, a, aa, aaa, \ldots, a^n, \ldots\}$.

We have the bijection $n \mapsto a^n$ from N to $\{a\}^*$.


If k ≥ 2, then we can think of the string
$u = a_{i_1} \cdots a_{i_n}$
as a representation of the integer ν(u) in base k shifted
by $(k^n - 1)/(k - 1)$, with

$\nu(u) = i_1 k^{n-1} + i_2 k^{n-2} + \cdots + i_{n-1} k + i_n$
$= \frac{k^n - 1}{k - 1} + (i_1 - 1) k^{n-1} + \cdots + (i_{n-1} - 1) k + i_n - 1$

(and with ν(ϵ) = 0), where $1 \leq i_j \leq k$ for j = 1, . . . , n.

We leave it as an exercise to show that ν : Σ∗ → N is a
bijection.

In fact, ν corresponds to the enumeration of Σ∗ where u
precedes v if |u| < |v|, and u precedes v in the
lexicographic ordering if |u| = |v|.

For example, if k = 2 and if we write Σ = {a, b}, then
the enumeration begins with

u:     ϵ   a   b   aa  ab  ba  bb  aaa  aab  aba  abb  baa  bab  bba  bbb
ν(u):  0   1   2   3   4   5   6   7    8    9    10   11   12   13   14

To get the next block of strings, concatenate a on the left
of every string of the previous block, and then
concatenate b on the left.

$\nu(bab) = 2 \cdot 2^2 + 1 \cdot 2^1 + 2 = 8 + 2 + 2 = 12$.
It works!
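The map ν and its inverse can be computed with a Horner-style loop and repeated division. The sketch below assumes the alphabet is given as a Python string listing $a_1, \ldots, a_k$ in order; the names `nu` and `nu_inverse` are illustrative.

```python
def nu(u, alphabet):
    """nu(u) = i_1 k^(n-1) + ... + i_n, where the j-th letter of u is a_{i_j}."""
    k = len(alphabet)
    index = {a: i + 1 for i, a in enumerate(alphabet)}   # digits run from 1 to k
    value = 0
    for a in u:
        value = value * k + index[a]
    return value


def nu_inverse(m, alphabet):
    """Recover the unique string u with nu(u) = m."""
    k = len(alphabet)
    letters = []
    while m > 0:
        m, r = divmod(m - 1, k)       # last digit is r + 1, in the range 1..k
        letters.append(alphabet[r])
    return "".join(reversed(letters))


print(nu("bab", "ab"))                            # 12
print([nu_inverse(m, "ab") for m in range(7)])
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(all(nu(nu_inverse(m, "ab"), "ab") == m for m in range(200)))   # True
```

The round-trip check on an initial segment of N is consistent with ν being a bijection, although it does not replace the exercise.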
On the other hand, if Σ ≠ ∅, the set $2^{\Sigma^*}$ of all subsets of
Σ∗ (all languages) is uncountable.

Indeed, we can show that there is no surjection from N
onto $2^{\Sigma^*}$.

First, we will show that there is no surjection from Σ∗
onto $2^{\Sigma^*}$. This is an instance of Cantor’s Theorem.

We claim that if there is no surjection from Σ∗ onto $2^{\Sigma^*}$,
then there is no surjection from N onto $2^{\Sigma^*}$ either.
Proof. Assume by contradiction that there is a surjection
$g : \mathbb{N} \to 2^{\Sigma^*}$.

But, if Σ ≠ ∅, then Σ∗ is infinite and countable, thus we
have the bijection ν : Σ∗ → N. Then the composition
$\Sigma^* \xrightarrow{\ \nu\ } \mathbb{N} \xrightarrow{\ g\ } 2^{\Sigma^*}$
is a surjection, because the bijection ν is a surjection, g
is a surjection, and the composition of surjections is a
surjection, contradicting the hypothesis that there is no
surjection from Σ∗ onto $2^{\Sigma^*}$.

We use a diagonalization argument to prove Cantor’s
Theorem.
Theorem 2.1. (Cantor, 1873) For every set X, there
is no surjection from X onto $2^X$.
Proof. Assume there is a surjection $h : X \to 2^X$, and
consider the set
$D = \{x \in X \mid x \notin h(x)\} \in 2^X$.
By definition, for any x ∈ X we have x ∈ D iff $x \notin h(x)$.
Since h is surjective, there is some y ∈ X such that
h(y) = D. Then, by definition of D and since D = h(y),
we have
$y \in D$ iff $y \notin h(y) = D$,
a contradiction. Therefore, h is not surjective.
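For a small finite set X, the diagonal argument can be checked exhaustively: for every function h from X to its power set, the diagonal set D is missed by h. The sketch below is an illustration only; the names `powerset`, `diagonal_set`, and `check_all_functions` are assumptions, and functions are represented as dictionaries.

```python
from itertools import product


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    X = list(X)
    return [frozenset(x for x, keep in zip(X, bits) if keep)
            for bits in product([False, True], repeat=len(X))]


def diagonal_set(h, X):
    """D = {x in X | x not in h(x)}."""
    return frozenset(x for x in X if x not in h[x])


def check_all_functions(X):
    """Verify that D is outside the image of h for every h : X -> 2^X.
    Returns the number of functions examined."""
    subsets = powerset(X)
    count = 0
    for images in product(subsets, repeat=len(X)):
        h = dict(zip(X, images))
        assert diagonal_set(h, X) not in h.values()
        count += 1
    return count


print(check_all_functions([0, 1, 2]))   # 512 functions, none hits its diagonal set
```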

Applying Theorem 2.1 to the case where X = Σ∗, we
deduce that there is no surjection from Σ∗ onto $2^{\Sigma^*}$.
Therefore, if Σ ≠ ∅, then $2^{\Sigma^*}$ is uncountable.

Applying Theorem 2.1 to the case where X = N, we see
that there is no surjection from N onto $2^{\mathbb{N}}$. This shows
that $2^{\mathbb{N}}$ is uncountable, as we claimed earlier.

For any set X, by mapping x ∈ X to $\{x\} \in 2^X$, we
obtain an injection of X into $2^X$. However, Cantor’s
theorem implies that there is no injection of $2^X$ into X.

Intuitively, $2^X$ is strictly larger than X.

Since $2^{\Sigma^*}$ is uncountable (if Σ ≠ ∅), we will try to single
out countable “tractable” families of languages.

We will begin with the family of regular languages, and

then proceed to the context-free languages.

We now turn to operations on languages.


2.3 Operations on Languages

A way of building more complex languages from simpler
ones is to combine them using various operations. First,
we review the set-theoretic operations of union,
intersection, and complementation.

Given some alphabet Σ, for any two languages $L_1, L_2$ over
Σ, the union $L_1 \cup L_2$ of $L_1$ and $L_2$ is the language
$L_1 \cup L_2 = \{w \in \Sigma^* \mid w \in L_1 \text{ or } w \in L_2\}$.

The intersection $L_1 \cap L_2$ of $L_1$ and $L_2$ is the language
$L_1 \cap L_2 = \{w \in \Sigma^* \mid w \in L_1 \text{ and } w \in L_2\}$.

The difference $L_1 - L_2$ of $L_1$ and $L_2$ is the language
$L_1 - L_2 = \{w \in \Sigma^* \mid w \in L_1 \text{ and } w \notin L_2\}$.
The difference is also called the relative complement.

A special case of the difference is obtained when $L_1 = \Sigma^*$,
in which case we define the complement $\overline{L}$ of a language
L as
$\overline{L} = \{w \in \Sigma^* \mid w \notin L\}$.

The above operations do not use the structure of strings.

The following operations use concatenation.

Definition 2.8. Given an alphabet Σ, for any two
languages $L_1, L_2$ over Σ, the concatenation $L_1 L_2$ of $L_1$ and
$L_2$ is the language
$L_1 L_2 = \{w \in \Sigma^* \mid \exists u \in L_1,\ \exists v \in L_2,\ w = uv\}$.

For any language L, we define $L^n$ as follows:
$L^0 = \{\epsilon\}$,
$L^{n+1} = L^n L \quad (n \geq 0)$.

By setting n = 0 in the above equation we get $L^1 = L$.


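The following properties are easily verified:

$L\emptyset = \emptyset$,
$\emptyset L = \emptyset$,
$L\{\epsilon\} = L$,
$\{\epsilon\}L = L$,
$(L_1 \cup \{\epsilon\}) L_2 = L_1 L_2 \cup L_2$,
$L_1 (L_2 \cup \{\epsilon\}) = L_1 L_2 \cup L_1$,
$L^n L = L L^n$.

In general, $L_1 L_2 \neq L_2 L_1$.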
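For finite languages, concatenation and powers are easy to compute, and the listed identities can be spot-checked. A minimal sketch, assuming languages are Python sets of strings with `""` playing the role of ϵ (the names `concat` and `power` are illustrative):

```python
def concat(L1, L2):
    """L1 L2 = {uv | u in L1, v in L2}."""
    return {u + v for u in L1 for v in L2}


def power(L, n):
    """L^0 = {epsilon} and L^(n+1) = L^n L."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


L1, L2 = {"a", "ab"}, {"b", "ba"}
EPS = {""}

print(concat(L1, set()) == set())                          # True
print(concat(L1, EPS) == L1)                               # True
print(concat(L1 | EPS, L2) == concat(L1, L2) | L2)         # True
print(concat(power(L1, 3), L1) == concat(L1, power(L1, 3)))   # True
print(concat({"a"}, {"b"}) == concat({"b"}, {"a"}))        # False
```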

So far, the operations that we have introduced, except
complementation (since $\overline{L} = \Sigma^* - L$ is infinite if L is finite
and Σ is nonempty), preserve the finiteness of languages.
This is not the case for the next two operations.

Definition 2.9. Given an alphabet Σ, for any language
L over Σ, the Kleene ∗-closure $L^*$ of L is the language
$L^* = \bigcup_{n \geq 0} L^n$.

The Kleene +-closure $L^+$ of L is the language
$L^+ = \bigcup_{n \geq 1} L^n$.

Thus, $L^*$ is the infinite union
$L^* = L^0 \cup L^1 \cup L^2 \cup \ldots \cup L^n \cup \ldots$,

and $L^+$ is the infinite union
$L^+ = L^1 \cup L^2 \cup \ldots \cup L^n \cup \ldots$.

Since $L^1 = L$, both $L^*$ and $L^+$ contain L.


In fact,
$L^+ = \{w \in \Sigma^* \mid \exists n \geq 1,\ \exists u_1 \in L \cdots \exists u_n \in L,\ w = u_1 \cdots u_n\}$,

and since $L^0 = \{\epsilon\}$,
$L^* = \{\epsilon\} \cup \{w \in \Sigma^* \mid \exists n \geq 1,\ \exists u_1 \in L \cdots \exists u_n \in L,\ w = u_1 \cdots u_n\}$.

Thus, the language $L^*$ always contains ϵ, and we have
$L^* = L^+ \cup \{\epsilon\}$.

However, if $\epsilon \notin L$, then $\epsilon \notin L^+$. The following is easily
shown:

$\emptyset^* = \{\epsilon\}$,
$L^+ = L^* L$,
$(L^*)^* = L^*$,
$L^* L^* = L^*$.
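Since the Kleene closures are usually infinite, a program can only list their members up to a length bound. The sketch below does this for a finite language L; the bound `max_len` and all function names are assumptions for illustration, and the identities are checked only on the truncated sets.

```python
def concat(L1, L2, max_len):
    """Strings uv with u in L1, v in L2, and |uv| <= max_len."""
    return {u + v for u in L1 for v in L2 if len(u + v) <= max_len}


def kleene_star(L, max_len):
    """Members of L* of length at most max_len (L is assumed finite)."""
    result = {""}                     # L^0 = {epsilon}
    frontier = {""}
    while frontier:
        # Strings reached with one more factor from L that were not seen before.
        frontier = concat(frontier, L, max_len) - result
        result |= frontier
    return result


def kleene_plus(L, max_len):
    """Members of L+ = L* L of length at most max_len."""
    return concat(kleene_star(L, max_len), L, max_len)


L = {"ab", "c"}
star = kleene_star(L, 3)
print(sorted(star))                          # ['', 'ab', 'abc', 'c', 'cab', 'cc', 'ccc']
print(kleene_plus(L, 3) == star - {""})      # True, since epsilon is not in L
print(kleene_star(set(), 3))                 # {''}: the star of the empty language
print(concat(star, star, 3) == star)         # True (truncated check of L* L* = L*)
print(kleene_star(star, 3) == star)          # True (truncated check of (L*)* = L*)
```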

The Kleene closures have many other interesting
properties.

Homomorphisms are also very useful.

Given two alphabets Σ, ∆, a homomorphism

h : Σ∗ → ∆∗ between Σ∗ and ∆∗ is a function
h : Σ∗ → ∆∗ such that

h(uv) = h(u)h(v) for all u, v ∈ Σ∗.


Letting u = v = ϵ, we get

h(ϵ) = h(ϵ)h(ϵ),

which implies that (why?)

h(ϵ) = ϵ.

If $\Sigma = \{a_1, \ldots, a_k\}$, it is easily seen that h is completely
determined by $h(a_1), \ldots, h(a_k)$ (why?)

Example: Σ = {a, b, c}, ∆ = {0, 1}, and

h(a) = 01, h(b) = 011, h(c) = 0111.

For example

h(abbc) = 010110110111.

Given any language $L_1 \subseteq \Sigma^*$, we define the image $h(L_1)$
of $L_1$ as
$h(L_1) = \{h(u) \in \Delta^* \mid u \in L_1\}$.

Given any language $L_2 \subseteq \Delta^*$, we define the
inverse image $h^{-1}(L_2)$ of $L_2$ as
$h^{-1}(L_2) = \{u \in \Sigma^* \mid h(u) \in L_2\}$.
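A homomorphism is determined by its values on the symbols, so it can be built from a table and applied letter by letter. The sketch below reuses the example h(a) = 01, h(b) = 011, h(c) = 0111; the helper names and the length bound used to search for inverse images are assumptions for illustration.

```python
from itertools import product


def extend(letter_images):
    """Extend a map on symbols to a homomorphism on strings."""
    def h(u):
        return "".join(letter_images[a] for a in u)
    return h


def image(h, L):
    """h(L) = {h(u) | u in L}."""
    return {h(u) for u in L}


def inverse_image(h, alphabet, L2, max_len):
    """Strings u over the alphabet with |u| <= max_len and h(u) in L2."""
    found = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            u = "".join(letters)
            if h(u) in L2:
                found.add(u)
    return found


h = extend({"a": "01", "b": "011", "c": "0111"})

print(h("abbc"))                                   # 010110110111
print(h("") == "")                                 # True: h(epsilon) = epsilon
print(h("ab" + "bc") == h("ab") + h("bc"))         # True
print(sorted(image(h, {"a", "bc"})))               # ['01', '0110111']
print(sorted(inverse_image(h, "abc", {"01011", "0111", "1"}, 3)))   # ['ab', 'c']
```

The string 1 has no preimage under this h, which shows that the inverse image of a nonempty language can miss some of its members entirely.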

We now turn to the first formalism for defining languages,

Deterministic Finite Automata (DFA’s)