Discrete Mathematics - Balakrishnan and Viswanathan
2 Combinatorics 47
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2 Elementary Counting Ideas . . . . . . . . . . . . . . . . . . . . 48
2.2.1 Sum Rule . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2.2 Product Rule . . . . . . . . . . . . . . . . . . . . . . . 49
2.3 Combinations and Permutations . . . . . . . . . . . . . . . . . 51
2.4 Stirling’s Formula . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.5 Examples in simple combinatorial reasoning . . . . . . . . . . 54
2.6 The Pigeon-Hole Principle . . . . . . . . . . . . . . . . . . . . 59
2.7 More Enumerations . . . . . . . . . . . . . . . . . . . . . . . . 62
2.7.1 Enumerating permutations with constrained repetitions 64
2.8 Ordered and Unordered Partitions . . . . . . . . . . . . . . . . 65
2.8.1 Enumerating the ordered partitions of a set . . . . . . 65
2.9 Combinatorial Identities . . . . . . . . . . . . . . . . . . . . . 68
2.10 The Binomial and the Multinomial Theorems . . . . . . . . . 71
8 Cryptography 392
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
8.2 Some Classical Cryptosystem . . . . . . . . . . . . . . . . . . 393
8.2.1 Caesar Cryptosystem . . . . . . . . . . . . . . . . . . . 393
8.2.2 Affine Cryptosystem . . . . . . . . . . . . . . . . . . . 394
8.2.3 Private Key Cryptosystems . . . . . . . . . . . . . . . 396
8.2.4 Hacking an affine cryptosystem . . . . . . . . . . . . . 396
8.3 Encryption Using Matrices . . . . . . . . . . . . . . . . . . . . 399
8.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
8.5 Other Private Key Cryptosystems . . . . . . . . . . . . . . . . 402
8.5.1 Vigenere Cipher . . . . . . . . . . . . . . . . . . . . . . 402
8.5.2 The One-Time Pad . . . . . . . . . . . . . . . . . . . . 403
8.6 Public Key Cryptography . . . . . . . . . . . . . . . . . . . . 404
8.6.1 Working of Public Key Cryptosystems . . . . . . . . . 405
8.6.2 RSA Public Key Cryptosystem . . . . . . . . . . . . . 406
8.6.3 The ElGamal Public Key Cryptosystem . . . . . . . . 409
8.6.4 Description of ElGamal System . . . . . . . . . . . . . 410
8.7 Primality Testing . . . . . . . . . . . . . . . . . . . . . . . . . 411
8.7.1 Nontrivial Square Roots (mod n) . . . . . . . . . . . . 411
8.7.2 Prime Number Theorem . . . . . . . . . . . . . . . . . 412
8.7.3 Pseudoprimality Testing . . . . . . . . . . . . . . . . . 413
8.7.4 The Miller-Rabin Primality Testing Algorithm . . . . . 414
1.1 Introduction
In this chapter, we recall some of the basic facts about sets, functions, relations and lattices. We are sure that the reader is already familiar with most of these topics, which are usually taught in high school algebra, with the exception of lattices. We also assume that the reader is familiar with the basics of real and complex numbers.
If A and B are sets and A ⊆ B (that is A is a subset of B, and A may
be equal to B), then the complement of A in B is the set B \ A consisting of
all elements of B not belonging to A. The sets A and B are equal if A ⊆ B
and B ⊆ A.
Definition 1.1.1:
By a family of sets we mean an indexed collection of sets.
Chapter 1 Introduction: Sets, Functions and Relations 2
For instance, F = {Aα}α∈I is a family of sets. Here for each α ∈ I, there exists a set Aα of F. Assume that each Aα is a subset of a set X. Such a set X certainly exists since we can take X = ∪α∈I Aα. For each α ∈ I, denote by A′α the complement X \ Aα of Aα in X. We then have the celebrated laws of de Morgan:
(i) (∪α∈I Aα)′ = ∩α∈I A′α;
(ii) (∩α∈I Aα)′ = ∪α∈I A′α.
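De Morgan's laws are easy to verify on a small concrete family using Python's built-in sets; a minimal sketch (the ambient set X and the family below are made up purely for illustration):

```python
# A concrete check of de Morgan's laws for an indexed family of subsets of X.
X = set(range(1, 13))
family = [{1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8, 10}]

def complement(s):
    """Complement of s in the ambient set X."""
    return X - s

union = set().union(*family)
intersection = set.intersection(*family)

# (i)  (union of the A's)' = intersection of their complements
assert complement(union) == set.intersection(*[complement(a) for a in family])
# (ii) (intersection of the A's)' = union of their complements
assert complement(intersection) == set().union(*[complement(a) for a in family])
```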
Definition 1.1.3:
A family {Aα}α∈I is called a disjoint-family of sets if whenever α ∈ I, β ∈ I and α ≠ β, we have Aα ∩ Aβ = φ.
For instance, if A1 = {1, 2}, A2 = {3, 4} and A3 = {5, 6, 7} then {Aα }α∈I ,
I = {1, 2, 3} is a disjoint-family of sets.
1.2 Functions
Definition 1.2.1:
A function (also called a map or mapping or a single-valued function) f :
A → B from a set A to a set B is a rule by which to each a ∈ A, there is
assigned a unique element f (a) ∈ B. f (a) is called the image of a under f .
Definition 1.2.2:
Two functions f : A → B and g : A → B are called equal if f (a) = g(a) for
each a ∈ A.
Definition 1.2.3:
If E is a subset of A, then the image of E under f : A → B is ∪a∈E {f (a)} = {f (a) : a ∈ E}. It is denoted by f (E).
Definition 1.2.4:
A function f : A → B is one-to-one (or 1–1 or injective) if for a1 and a2 in
A, f (a1 ) = f (a2 ) implies that a1 = a2 .
On the other hand, if A is the set of students of a school, B a set of positive integers, and f (a) denotes the age of the student a, then f is not 1–1 (two students may well have the same age).
Definition 1.2.5:
A function f : A → B is called onto (or surjective) if for each b ∈ B, there
exists at least one a ∈ A with f (a) = b (that is, the image f (A) = B).
For example, let A denote the set of integers Z and B, the set of even integers. If f : A → B is defined by setting f (a) = 2a, then f : A → B is onto. Again, if f : R → R+ (the set of non-negative reals) is defined by f (x) = x², then f is onto but not 1–1.
Definition 1.2.6:
A function f : A → B is bijective (or is a bijection) if it is both 1–1 and
onto.
Definition 1.2.7:
Let f : A → B and g : B → C be functions. The composition of g with f , denoted by g ◦ f , is the function
g ◦ f : A → C defined by (g ◦ f )(a) = g(f (a)) for each a ∈ A.
Definition 1.2.8:
Let f : A → B be a function. For F ⊆ B, the inverse image of F under f ,
denoted by f −1 (F ) is the set of all a ∈ A with f (a) ∈ F . In symbols:
f −1 (F ) = {a ∈ A : f (a) ∈ F }.
For example, let A be the set of students of a school, B the set of standards (grades) of the school, and f (a) the standard in which the student a studies. If F = {1, 2}, that is, if F consists of the 1st and 2nd standards of the school, then f −1 (F ) is the set of students who are either in the 1st standard or in the 2nd standard.
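Computationally, the inverse image just filters the domain; a tiny sketch (the student names and the function f below are invented for illustration):

```python
# f assigns to each student the standard (grade) in which the student studies.
f = {"Asha": 1, "Ravi": 2, "Mala": 2, "John": 5}

def inverse_image(F):
    """f^{-1}(F) = {a : f(a) in F}."""
    return {a for a, b in f.items() if b in F}

assert inverse_image({1, 2}) == {"Asha", "Ravi", "Mala"}
assert inverse_image({7}) == set()   # nothing maps into F
```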
Theorem 1.2.9:
Let f : A → B, and X1 , X2 ⊆ A and Y1 , Y2 ⊆ B. Then the following statements are true:
(i) f (X1 ∪ X2 ) = f (X1 ) ∪ f (X2 );
(ii) f (X1 ∩ X2 ) ⊆ f (X1 ) ∩ f (X2 );
(iii) f −1 (Y1 ∪ Y2 ) = f −1 (Y1 ) ∪ f −1 (Y2 );
(iv) f −1 (Y1 ∩ Y2 ) = f −1 (Y1 ) ∩ f −1 (Y2 ).
Proof. We prove (iv). The proofs of the other statements are similar.
So assume that a ∈ f −1 (Y1 ∩ Y2 ), where a ∈ A. Then f (a) ∈ Y1 ∩ Y2 , and
therefore, f (a) ∈ Y1 and f (a) ∈ Y2 . Hence a ∈ f −1 (Y1 ) and a ∈ f −1 (Y2 ), and
therefore, a ∈ f −1 (Y1 ) ∩ f −1 (Y2 ). The converse is proved just by retracing
the steps.
Note that, in general, we may not have equality in (ii). Here is an example
where equality does not hold good. Let A = {1, 2, 3, 4, 5} and B = {6, 7, 8}.
Let f (1) = f (2) = 6, f (3) = f (4) = 7, and f (5) = 8. Let X1 = {1, 2, 4} and
X2 = {2, 3, 5}. Then X1 ∩ X2 = {2}, and so, f (X1 ∩ X2 ) = {f (2)} = {6}.
However, f (X1 ) = {6, 7}, and f (X2 ) = {6, 7, 8}. Therefore f (X1 ) ∩ f (X2 ) =
{6, 7} ≠ f (X1 ∩ X2 ).
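The counterexample can be replayed directly with Python sets:

```python
# The function f of the counterexample, stored as a dict a -> f(a).
f = {1: 6, 2: 6, 3: 7, 4: 7, 5: 8}

def image(E):
    """f(E) = {f(a) : a in E}."""
    return {f[a] for a in E}

X1, X2 = {1, 2, 4}, {2, 3, 5}
assert image(X1 & X2) == {6}              # f(X1 ∩ X2)
assert image(X1) & image(X2) == {6, 7}    # strictly larger than f(X1 ∩ X2)
```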
We next define a family of elements and a sequence of elements in a set
X.
Definition 1.2.10:
A family {xi }i∈I of elements xi in a set X is a map x : I → X, where for
i ∈ I, x(i) = xi ∈ X. I is the indexing set of the family (In other words, for
each i ∈ I, there is an element xi ∈ X of the family).
Definition 1.2.11:
A sequence {xn }n∈N of elements of X is a map x : N → X. In other words, a
sequence in X is a family in X where the indexing set is the set N of natural
numbers. For example, {2, 4, 6, . . .} is the sequence of even positive integers.
Definition 1.3.1:
The Cartesian product X × Y of two (not necessarily distinct) sets X and
Y is the set of all ordered pairs (x, y), where x ∈ X and y ∈ Y . In symbols:
X × Y = {(x, y) : x ∈ X, y ∈ Y }.
In the ordered pair (x, y), the order of x and y is important, whereas the unordered pairs (x, y) and (y, x) are equal. As ordered pairs, (x, y) and (y, x) are equal if and only if x = y. For instance, the pairs (1, 2) and (2, 1) are not equal as ordered pairs, while they are equal as unordered pairs.
Definition 1.3.2:
A relation R on a set X is a subset of the Cartesian product X × X.
Definition 1.3.3:
A relation R on a set X is an equivalence relation on X if R is (i) reflexive ((a, a) ∈ R for each a ∈ X), (ii) symmetric ((a, b) ∈ R implies (b, a) ∈ R) and (iii) transitive ((a, b) ∈ R and (b, c) ∈ R imply (a, c) ∈ R).
Example 1.3.4: (1) On the set N of positive integers, let aRb mean that a | b (a is a divisor of b). Then R is reflexive and transitive but not symmetric.
Definition 1.3.6:
A partition P of a set X is a collection P of nonvoid subsets of X whose
union is X such that the intersection of any two distinct members of P is
empty.
Theorem 1.3.7:
Any equivalence relation R on a set X induces a partition on X in a natural
way.
Proof. As above, let [x] denote the class defined by x. We show that the classes [x], x ∈ X, define a partition on X. First of all, each x of X belongs to the class [x] since (x, x) ∈ R. Hence
X = ∪x∈X [x].
We now show that if (x, y) ∉ R, then [x] ∩ [y] = φ. Suppose on the contrary that [x] ∩ [y] ≠ φ. Let z ∈ [x] ∩ [y]. This means that z ∈ [x] and z ∈ [y]; hence (z, x) ∈ R and (z, y) ∈ R. By symmetry this means that (x, z) ∈ R and (z, y) ∈ R, and hence by transitivity (x, y) ∈ R, a contradiction. Thus {[x] : x ∈ X} forms a partition of X.
Example 1.3.8:
On the set Z of integers, let aRb mean that a ≡ b (mod 5), that is, 5 divides a − b. Then R is an equivalence relation, and the class [r] consists of all integers leaving the remainder r on division by 5. Note that [5] = [0] and so on. Then the collection {[0], [1], [2], [3], [4]} of equivalence classes forms a partition of Z.
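A finite window into this partition can be checked mechanically; a small sketch:

```python
# Equivalence classes of congruence mod 5, restricted to a finite window of Z.
window = set(range(-10, 11))
classes = {r: {n for n in window if n % 5 == r} for r in range(5)}

# The five classes cover the window ...
assert set().union(*classes.values()) == window
# ... and are pairwise disjoint.
assert all(classes[i].isdisjoint(classes[j])
           for i in range(5) for j in range(5) if i != j)
```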
Definition 1.4.1:
Two sets are called equipotent if there exists a bijection between them.
Equivalently, if A and B are two sets, then A is equipotent to B if there exists a bijection φ : A → B from A onto B.
Let Nn denote the set {1, 2, . . . , n}. Nn is called the initial segment
defined with respect to n.
Definition 1.4.2:
A set S is finite if S is equipotent to Nn for some positive integer n; otherwise
S is called an infinite set.
Theorem 1.4.3:
Let S be a finite set and f : S → S. Then f is 1–1 iff f is onto.
Example 1.4.4:
We show by means of examples that the conclusion of Theorem 1.4.3 may not be true if S is an infinite set.
First, take S = Z, the set of integers, and f : Z → Z defined by f (a) = 2a. Clearly f is 1–1 but not onto (the image of f being the set of even integers).
Next, let R be the set of real numbers, and let f : R → R be defined by
f (x) = x − 1 if x > 0, f (x) = 0 if x = 0, and f (x) = x + 1 if x < 0.
Then f is onto but not 1–1 (for instance, f (1) = f (0) = f (−1) = 0).
Theorem 1.4.5:
The union of any two finite sets is finite.
Proof. First we show that the union of any two disjoint finite sets is finite. Let S and T be two disjoint finite sets of cardinalities n and m respectively. Then S is equipotent to Nn and T equipotent to Nm = {1, 2, . . . , m}. Clearly T is also equipotent to the set {n + 1, n + 2, . . . , n + m}. Hence S ∪ T is equipotent to {1, . . . , n} ∪ {n + 1, . . . , n + m} = Nn+m , and so S ∪ T is also a finite set. In the general case, S ∪ T = S ∪ (T \ S), a union of two disjoint finite sets, and is therefore finite.
Corollary 1.4.6:
The union of any finite number of finite sets is finite.
In this section, we briefly discuss the cardinal numbers of sets. Recall Sec-
tion 1.4.
Definition 1.5.1:
A set A is equipotent to a set B if there exists a bijection f from A onto
B, and that equipotence between members of a collection of sets S is an
equivalence relation on S .
As mentioned before, the sets in the same equivalence class are said to
have the same cardinality or the cardinal number. Intuitively it must be
clear that equipotent sets have the same “number” of elements. The cardinal
number of any finite set is a positive integer, while the cardinal numbers of
infinite sets are denoted by certain symbols. The cardinal number of the
infinite set N (the set of positive integers) is denoted by ℵ0 (aleph nought). ℵ is the first letter of the Hebrew alphabet.
Definition 1.5.2:
A set is called denumerable if it is equipotent to N (equivalently, if it has
cardinal number ℵ0 ). A set is countable if it is finite or denumerable. It
is uncountable if it is not countable (clearly, any uncountable set must be
infinite).
Lemma 1.5.3:
Every infinite set contains a denumerable subset.
Theorem 1.5.4:
A set is infinite iff it is equipotent to a proper subset of itself.
Proof (sketch of the “only if” part). Let X be an infinite set. By Lemma 1.5.3, X contains a denumerable subset D = {x1 , x2 , . . .}. Let x = x1 and Y = X \ {x}. The map φ : X → Y with φ(xi ) = xi+1 for xi ∈ D and φ(y) = y for y ∈ X \ D (so that φ(x1 ) = x2 , φ(x2 ) = x3 and so on) is a 1–1 map of X onto Y , and therefore an equipotence (that is, a bijection). Thus X is equipotent to the proper subset Y = X \ {x} of X.
Notation
Definition 1.5.5:
Let X and Y be any two sets. Then |X| ≤ |Y | iff there exists a 1–1 mapping
from X to Y .
Suppose we have |X| ≤ |Y | and |Y | ≤ |X|. If X and Y are finite sets, it is clear that X and Y have the same number of elements, that is, |X| = |Y |. The same result holds good even if X and Y are infinite sets. This result is known as the Schröder–Bernstein theorem.
Lemma 1.5.7:
Let A be a set and A1 and A2 be subsets of A such that A ⊇ A1 ⊇ A2 . If
|A| = |A2 |, then |A| = |A1 |.
Proof. Since |A| = |A2 |, there is a bijection φ : A → A2 . Put A3 = φ(A1 ) and A4 = φ(A2 ); note that the bijection from A2 to A4 is given by the same map φ. In this way, we get a sequence of sets
A ⊇ A1 ⊇ A2 ⊇ A3 ⊇ . . . (1.3)
A \ A1 = A2 \ A3 ,
A1 \ A2 = A3 \ A4 ,
A2 \ A3 = A4 \ A5 ,
and so on (see Figure 1.1); once again, the bijections are given by the same map φ. Let P = A ∩ A1 ∩ A2 ∩ · · · .
Figure 1.1: The nested sets A ⊇ A1 ⊇ A2 ⊇ A3 , with φ carrying A \ A1 onto A2 \ A3 .
A = (A \ A1 ) ∪ (A1 \ A2 ) ∪ (A2 \ A3 ) ∪ · · · ∪ P, and
A1 = (A1 \ A2 ) ∪ (A2 \ A3 ) ∪ · · · ∪ P.
In the first decomposition, replace each piece A2k \ A2k+1 by the equipotent piece A2k+2 \ A2k+3 (under φ), and leave the remaining pieces fixed; this gives a bijection from A to A1 . Hence |A| = |A1 |.
We recall the definition of the power set of a given set from Section 1.4.
Definition 1.6.1:
The power set P(X) of a set X is the set of all subsets of X.
For example, if X = {1, 2, 3}, then
P(X) = {φ, {1}, {2}, {3}, {1, 2}, {2, 3}, {3, 1}, {1, 2, 3}} (the last subset being X itself).
The empty set φ and the whole set X, being subsets of X, are elements of P(X). Now each subset S of X is uniquely determined by its characteristic function χS : X → {0, 1} defined by
χS (s) = 1 if s ∈ S, and χS (s) = 0 if s ∉ S.
Conversely, given any function f : X → {0, 1}, if we set
S = {x ∈ X : f (x) = 1},
then f = χS .
Definition 1.6.2:
For sets X and Y , denote by Y^X the set of all functions f : X → Y .
Theorem 1.6.3:
|X| < |P(X)| for each nonvoid set X.
Proof. The assertion means that there exists a 1–1 function from X to P(X) but no bijection between X and P(X).
First of all, the mapping f : X → P(X) defined by f (x) = {x} ∈ P(X) is clearly 1–1. Hence |X| ≤ |P(X)|. Next, suppose there exists a 1–1 map from P(X) to X. Then by the Schröder–Bernstein theorem, there exists a bijection g : P(X) → X. This means that for each element S of P(X), the element g(S) of X is defined. Consider the set
T = {g(S) : S ∈ P(X) and g(S) ∉ S} ∈ P(X),
and let t = g(T ). If t ∈ T , then t = g(S) for some S with g(S) ∉ S; since g is 1–1, S = T , and so t ∉ T . If t ∉ T , then g(T ) ∉ T , and so t ∈ T by the definition of T . Either way we have a contradiction. Hence no such bijection exists, and |X| < |P(X)|.
1.7 Exercises
(i) ∪n∈N Mn ; (ii) Mn ∩ Mm ; (iii) ∩n∈N Mn ; (iv) ∪p a prime Mp .
Prove:
9. Does there exist a relation which is not reflexive but both symmetric and
transitive?
10. Let X be the set of all ordered pairs (a, b) of integers with b ≠ 0. Set (a, b) ∼ (c, d) in X iff ad = bc. Prove that ∼ is an equivalence relation on X. What is the class to which (1, 2) belongs?
Definition 1.8.1:
A relation R on a set X is called antisymmetric if, for a, b ∈ X, (a, b) ∈ R
and (b, a) ∈ R together imply that a = b.
For instance, the relation R defined on N, the set of natural numbers, by setting that “(a, b) ∈ R iff a | b (a divides b)” is an antisymmetric relation. However, the same relation defined on Z⋆ = Z \ {0}, the set of nonzero integers, is not antisymmetric. For instance, 5 | (−5) and (−5) | 5 but 5 ≠ −5.
Definition 1.8.2:
A relation R on a set X is called a partial order on X if it is (i) Reflexive,
(ii) Antisymmetric and (iii) Transitive.
A partially ordered set (or poset) is a set with a partial order defined on it.
Examples
For a set S, (P(S), ⊆) is a poset. For S = {1, 2, 3}, its Hasse diagram has φ at the bottom; the singletons {1}, {2}, {3} above it; the doubletons {1, 2}, {1, 3}, {2, 3} above them; and {1, 2, 3} at the top.
Definition 1.8.3:
A partial order “≤” on X is a total order (or linear order) if for any two
elements a and b of X, either a ≤ b or b ≤ a holds.
For instance, if X = {1, 2, 3, 4} and “≤” is the usual “less than or equal to”, then (X, ≤) is a totally ordered set, since any two elements of X are comparable.
(Its Hasse diagram is a chain, with 1 at the bottom and 4 at the top.)
The Hasse diagrams of all lattices with five elements are given in Fig-
ure 1.4.
(Figure 1.4: the Hasse diagrams of the five-element lattices V11 , V12 , V13 , V14 , V24 , V15 , V25 , V35 , V45 , V55 , each with least element 0 and greatest element 1.)
If S has at least two elements, then (P(S), ⊆) is not a totally ordered set. Indeed, if a and b are distinct elements of S, then {a} and {b} are incomparable (under ⊆) elements of P(S).
Definition 1.8.5:
Let (X, ≤) be a poset. An element a ∈ X is a maximal element of X if there is no element b ≠ a in X with a ≤ b; a is the greatest element of X if b ≤ a for every b ∈ X. Minimal elements and the least element are defined dually.
Clearly, the greatest element of a poset is a maximal element and the least element a minimal element.
Example 1.8.6:
Let (X, ⊆) be the poset where X = {{1}, {2}, {1, 2}, {2, 3}, {1, 2, 3}}. In
X, {1}, {2} are minimal elements, {1, 2, 3} is the greatest element (and the
only maximal element) but there is no smallest element.
Definition 1.8.7:
Let (X, ≤) be a poset and Y ⊆ X. An element x ∈ X is an upper bound of Y if y ≤ x for every y ∈ Y ; the least such element, if it exists, is the supremum of Y . Lower bounds and the infimum of Y are defined dually.
Example 1.8.8:
If X = [0, 1] and ≤ stands for the usual ordering in the reals, then 1 is the
supremum of X and 0 is the infimum of X. Instead, if we take X = (0, 1), X
has neither an infimum nor a supremum in X. Here we have taken Y = X.
However, if X = R and Y = (0, 1), then 1 and 0 are the supremum and
infimum of Y respectively. Note that the supremum and infimum of Y ,
namely, 1 and 0, do not belong to Y .
1.9 Lattices
Definition 1.9.1:
A lattice L = (L, ∧, ∨) is a nonempty set L together with two binary oper-
ations ∧ (called meet or intersection or product) and ∨ (called join or union
or sum) that satisfy the following axioms:
For all a, b, c ∈ L,
(L1 ) a ∧ b = b ∧ a; a ∨ b = b ∨ a, (Commutative law)
(L2 ) a ∧ (b ∧ c) = (a ∧ b) ∧ c; a ∨ (b ∨ c) = (a ∨ b) ∨ c, (Associative law)
(L3 ) a ∧ (a ∨ b) = a; a ∨ (a ∧ b) = a. (Absorption law)
The idempotent laws a ∧ a = a and a ∨ a = a follow from the axioms. Now, by (L3 ), a ∨ (a ∧ a) = a, and hence a ∧ a = a ∧ (a ∨ (a ∧ a)) = a, again by (L3 ). Dually, a ∨ a = a.
Theorem 1.9.2:
The relation “a ≤ b iff a ∧ b = a” in a lattice (L, ∧, ∨), defines a partial order
on L.
Proof. Since a ∧ a = a, the relation is reflexive. If a ≤ b and b ≤ a, then a = a ∧ b = b ∧ a = b, so the relation is antisymmetric. For transitivity, let a ≤ b and b ≤ c, that is, a ∧ b = a and b ∧ c = b. Now
a ∧ c = (a ∧ b) ∧ c = a ∧ (b ∧ c) (by (L2 ))
= a ∧ b = a, and hence a ≤ c.
Theorem 1.9.3:
Any partially ordered set (L, ≤) in which any two elements a and b have an infimum and a supremum in L is a lattice under the operations a ∧ b = inf{a, b} and a ∨ b = sup{a, b}.
Examples of Lattices
For a nonvoid set S, (P(S), ∩, ∪) is a lattice. Again, for a positive integer n, define Dn to be the set of divisors of n, and let a ≤ b in Dn mean that a | b, that is, a is a divisor of b. Then a ∧ b = (a, b), the gcd of a and b, and a ∨ b = [a, b], the lcm of a and b, and (Dn , ∧, ∨) is a lattice (see Chapter 2 for the definitions of gcd and lcm). For example, if n = 20, Fig. 1.5 gives the Hasse diagram of the lattice D20 = {1, 2, 4, 5, 10, 20}. It has the least element 1 and the greatest element 20.
(Figure 1.5: the Hasse diagram of D20 = {1, 2, 4, 5, 10, 20}, with least element 1 and greatest element 20.)
Duality Principle
In any lattice (L, ∧, ∨), any formula or statement involving the operations ∧
and ∨ remains valid if we replace ∧ by ∨ and ∨ by ∧.
The statement got by the replacement is called “the dual statement” of
the original statement.
The validity of the duality principle lies in the fact that in the set of
axioms for a lattice, any axiom obtained by such a replacement is also an
axiom. Consequently, whenever we want to establish a statement and its
dual, it is enough to establish one of them. Note that the dual of the dual
statement is the original statement. For instance, the statement
a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c)
has as its dual the statement
a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c).
Definition 1.9.4:
A subset L′ of a lattice L = (L, ∧, ∨) is a sublattice of L if L′ is closed under the operations ∧ and ∨ of L, that is, if (L′ , ∧, ∨) is itself a lattice with the same meet and join (and hence the same partial order a ≤ b iff a ∧ b = a).
For example, let V be a vector space, and consider the lattice (P(V ), ∩, ∪) of all subsets of V . The collection S of all subspaces of V is, in general, not a sublattice, since the union of two subspaces of V need not be a subspace of V .
Lemma 1.9.5:
In any lattice L = (L, ∧, ∨), the operations ∧ and ∨ are isotone, that is, for
a, b, c in L,
if b ≤ c, then a ∧ b ≤ a ∧ c and a ∨ b ≤ a ∨ c.
Proof. Let b ≤ c, so that b ∧ c = b. Then
a ∧ b = a ∧ (b ∧ c) = (a ∧ b) ∧ c (by (L2 ))
≤ a ∧ c (as a ∧ b ≤ a).
The statement for ∨ follows by duality.
Lemma 1.9.6:
Any lattice satisfies the two distributive inequalities: for all x, y, z,
(i) x ∧ (y ∨ z) ≥ (x ∧ y) ∨ (x ∧ z);
(ii) x ∨ (y ∧ z) ≤ (x ∨ y) ∧ (x ∨ z).
Proof. We have x ∧ y ≤ x, and x ∧ y ≤ y ≤ y ∨ z. Hence x ∧ y ≤ inf(x, y ∨ z) = x ∧ (y ∨ z). Also x ∧ z ≤ x, and x ∧ z ≤ z ≤ y ∨ z. Thus x ∧ z ≤ x ∧ (y ∨ z).
Therefore, x ∧ (y ∨ z) is an upper bound for both x ∧ y and x ∧ z and hence
greater than or equal to their least upper bound, namely, (x ∧ y) ∨ (x ∧ z).
The second statement follows by duality.
Lemma 1.9.7:
The elements of a lattice satisfy the modular inequality:
x≤z implies x ∨ (y ∧ z) ≤ (x ∨ y) ∧ z.
Aliter.
By Lemma 1.9.6 x ∨ (y ∧ z) ≤ (x ∨ y) ∧ (x ∨ z)
= (x ∨ y) ∧ z, as x ≤ z.
Two important classes of lattices are the distributive lattices and modular
lattices. We now define them.
Definition 1.9.8:
A lattice L is distributive if for all a, b, c ∈ L,
a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c),
and a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c).
Note that in view of the duality that is valid for lattices, if one of the
two distributive laws holds in L then the other would automatically remain
valid.
Example 1.9.9 (Examples of Distributive Lattices): (i) (P(S), ∩, ∪);
(ii) (N, gcd, lcm). (Here a ∧ b = (a, b), the gcd of a and b, and a ∨ b = [a, b], the lcm of a and b.)
(Figure 1.6: (a) the diamond lattice, with 0 at the bottom, the three incomparable elements a, b, c in the middle, and 1 at the top; (b) the pentagonal lattice, with the chain 0 < a < c < 1 and the element b incomparable to both a and c.)
Neither of these lattices is distributive. For instance, in the pentagonal lattice,
a ∨ (b ∧ c) = a ∨ 0 = a, while
(a ∨ b) ∧ (a ∨ c) = 1 ∧ c = c (≠ a).
Complemented Lattice
Definition 1.9.11:
A lattice L with 0 and 1 is complemented if for each element a ∈ L, there
exists at least one element b ∈ L such that
a ∧ b = 0 and a ∨ b = 1.
(3) Not every lattice with 0 and 1 is complemented. In the lattice of Fig-
ure 1.7 (b), a has no complement.
That the diamond lattice and the pentagonal lattice (of Figure 1.6) are
crucial in the study of distributive lattices is the content of Theorem 1.9.13.
(Figure 1.7: two lattices with 0 and 1: (a) the pentagonal lattice; (b) a lattice in which the element a has no complement.)
Theorem 1.9.13:
A lattice is distributive iff it does not contain a sublattice isomorphic to the
diamond lattice or the pentagonal lattice.
The necessity of the condition in Theorem 1.9.13 is trivial but the proof
of sufficiency is more involved. (For a proof, see ????)
However, a much simpler result is the following:
Theorem 1.9.14:
If a lattice L is distributive then for a, b, c ∈ L, the equations a ∧ b = a ∧ c
and a ∨ b = a ∨ c together imply that b = c.
Proof. b = b ∧ (a ∨ b) (by absorption)
= b ∧ (a ∨ c) (since a ∨ b = a ∨ c)
= (b ∧ a) ∨ (b ∧ c) (by distributivity)
= (a ∧ b) ∨ (b ∧ c) = (a ∧ c) ∨ (b ∧ c) (since a ∧ b = a ∧ c)
= (a ∨ (b ∧ c)) ∧ (c ∨ (b ∧ c)) (again by distributivity)
= (a ∨ c) ∧ c (since a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c) = a ∨ c, and c ∨ (b ∧ c) = c)
= c.
Modular Lattices
Definition 1.9.15:
A lattice is modular if it satisfies the following modular identity:
x ≤ z ⇒ x ∨ (y ∧ z) = (x ∨ y) ∧ z.
Hence the modular lattices are those lattices for which equality holds in
Lemma 1.9.7. We prove in Chapter 5 that the normal subgroups of any
group form a modular lattice.
The pentagonal lattice of Figure 1.7 is nonmodular, since in it a ≤ c, a ∨ (b ∧ c) = a ∨ 0 = a, while (a ∨ b) ∧ c = 1 ∧ c = c (≠ a). In fact, the following result is true.
Theorem 1.9.16:
Any nonmodular lattice L contains the pentagonal lattice as a sublattice.
Proof. Since L is nonmodular, there exist elements a, b, c of L with
a < c and a ∨ (b ∧ c) ≠ (a ∨ b) ∧ c.
But the modular inequality (Lemma 1.9.7) holds in any lattice. Hence a ∨ (b ∧ c) < (a ∨ b) ∧ c.
(Figure 1.8: the pentagonal sublattice constructed in the proof.)
Theorem 1.9.17:
In a distributive lattice L, an element can have at most one complement.
Proof. Let y1 and y2 be complements of an element x of L, so that
x ∧ y1 = 0 = x ∧ y2 , and x ∨ y1 = 1 = x ∨ y2 .
By Theorem 1.9.14 (taking a = x, b = y1 and c = y2 ), y1 = y2 .
1.10 Boolean Algebras
1.10.1 Introduction
Definition 1.10.1:
A complemented distributive lattice is a Boolean algebra. Hence a Boolean algebra B has the universal elements 0 and 1, and every element x of B has a complement x′ ; since B is a distributive lattice, x′ is unique by Theorem 1.9.17. The Boolean algebra B is symbolically represented as (B, ∧, ∨, 0, 1, ′ ).
2. Let B^n denote the set of all binary sequences of length n. For (a1 , . . . , an ) and (b1 , . . . , bn ) ∈ B^n , set
(a1 , . . . , an ) ∧ (b1 , . . . , bn ) = (min(a1 , b1 ), . . . , min(an , bn )),
(a1 , . . . , an ) ∨ (b1 , . . . , bn ) = (max(a1 , b1 ), . . . , max(an , bn )),
and (a1 , . . . , an )′ = (a′1 , . . . , a′n ), where 0′ = 1 and 1′ = 0.
Note that the zero element is the n-vector (0, 0, . . . , 0), and, the unit
element is (1, 1, . . . , 1). For instance, if n = 3, x = (1, 1, 0) and
y = (0, 1, 0), then x ∧ y = (0, 1, 0), x ∨ y = (1, 1, 0), and x′ = (0, 0, 1).
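These componentwise operations are easy to model; a minimal sketch (the helper names are ours):

```python
# Meet, join and complement in the Boolean algebra B^n of binary sequences.
def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))

def comp(x):
    return tuple(1 - a for a in x)

x, y = (1, 1, 0), (0, 1, 0)
assert meet(x, y) == (0, 1, 0)
assert join(x, y) == (1, 1, 0)
assert comp(x) == (0, 0, 1)
# De Morgan's law (x ∨ y)' = x' ∧ y' holds componentwise as well.
assert comp(join(x, y)) == meet(comp(x), comp(y))
```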
Theorem 1.10.2 (the de Morgan laws):
In a Boolean algebra B, for all a, b ∈ B, (a ∧ b)′ = a′ ∨ b′ and (a ∨ b)′ = a′ ∧ b′ .
Proof. To prove the first law, we verify that a′ ∨ b′ is the complement of a ∧ b. Now
(a ∧ b) ∧ (a′ ∨ b′ ) = (a ∧ (a′ ∨ b′ )) ∧ (b ∧ (a′ ∨ b′ ))
= ((a ∧ a′ ) ∨ (a ∧ b′ )) ∧ ((b ∧ a′ ) ∨ (b ∧ b′ ))
= (0 ∨ (a ∧ b′ )) ∧ ((b ∧ a′ ) ∨ 0)
= (a ∧ b′ ) ∧ (b ∧ a′ )
= (a ∧ a′ ) ∧ (b ∧ b′ ) = 0 ∧ 0 = 0 (since a ∧ a′ = 0 = b ∧ b′ ).
Similarly,
(a ∧ b) ∨ (a′ ∨ b′ ) = (a ∨ a′ ∨ b′ ) ∧ (b ∨ a′ ∨ b′ ) = (1 ∨ b′ ) ∧ (1 ∨ a′ ) = 1 ∧ 1 = 1.
The second law follows by duality.
Corollary 1.10.3:
In a Boolean algebra B, for a, b ∈ B, a ≤ b iff a′ ≥ b′ .
Proof. a ≤ b ⇔ a ∨ b = b ⇔ b′ = (a ∨ b)′ = a′ ∧ b′ ⇔ b′ ≤ a′ ⇔ a′ ≥ b′ .
Theorem 1.10.4:
In a Boolean algebra B, we have for all a, b ∈ B,
a ≤ b iff a ∧ b′ = 0 iff a′ ∨ b = 1.
Proof. Suppose a ∧ b′ = 0. Then
a = a ∧ 1 = a ∧ (b ∨ b′ ) = (a ∧ b) ∨ (a ∧ b′ ) = (a ∧ b) ∨ 0
= a ∧ b ⇒ a ≤ b.
Conversely, if a ≤ b, then a ∧ b′ ≤ b ∧ b′ = 0. Finally, by the de Morgan laws, a ∧ b′ = 0 iff (a ∧ b′ )′ = a′ ∨ b = 1.
Boolean Subalgebras
Definition 1.10.5:
A Boolean subalgebra of a Boolean algebra B = (B, ∧, ∨, 0, 1, ′ ) is a subset
B1 of B such that (B1 , ∧, ∨, 0, 1, ′ ) is itself a Boolean algebra with the same
elements 0 and 1 of B.
Boolean Isomorphisms
Definition 1.10.6:
A Boolean homomorphism from a Boolean algebra B1 to a Boolean algebra
B2 is a map f : B1 → B2 such that for all a, b in B1 ,
Theorem 1.10.7:
Let f : B1 → B2 be a Boolean homomorphism. Then
(i) f (0) = 0 and f (1) = 1; (ii) f is isotone.
Proof. Straightforward.
Example 1.10.8:
Let S = {1, 2, . . . , n}, let A be the Boolean algebra (P(S), ∩, ∪, ′ ), and let B be the Boolean algebra defined by the set of all functions from S to the set {0, 1}. Any such function is a sequence (x1 , . . . , xn ) where each xi = 0 or 1. Let ∧, ∨ and ′ be as in Example 2 of Section 1.10.1. Now consider the map f : A = P(S) → B = {0, 1}^S defined as follows: for X ⊆ S (that is, X ∈ P(S) = A), f (X) = (x1 , x2 , . . . , xn ), where xi = 1 or 0 according as i ∈ X or not. For X, Y ∈ P(S), f (X ∩ Y ) is the binary sequence having 1 only in the places common to X and Y , which is f (X) ∧ f (Y ) as per the definitions in Example 2 of Section 1.10.1. Similarly, f (X ∪ Y ) is the binary sequence having 1 in all the places corresponding to the 1’s in the set X ∪ Y , which is f (X) ∨ f (Y ).
Further, f (X ′ ) = f (S \ X) is the binary sequence having 1’s in the places where f (X) has zeros, and zeros in the places where f (X) has 1’s, which is (f (X))′ . f is 1–1 since distinct subsets of S give rise to distinct binary sequences in B. Finally, f is onto, since any binary sequence in B is the image of the corresponding subset of S (namely, the subset of the places of the sequence carrying a 1). Thus f is a Boolean isomorphism.
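The isomorphism of Example 1.10.8 can be sketched for a small S (the function name f follows the text):

```python
# The map f : P(S) -> {0,1}^n sending a subset to its characteristic sequence.
S = [1, 2, 3]

def f(X):
    return tuple(1 if i in X else 0 for i in S)

X, Y = {1, 2}, {2, 3}
# f turns ∩, ∪ and complement into componentwise min, max and bit flip.
assert f(X & Y) == tuple(min(a, b) for a, b in zip(f(X), f(Y)))
assert f(X | Y) == tuple(max(a, b) for a, b in zip(f(X), f(Y)))
assert f(set(S) - X) == tuple(1 - a for a in f(X))
```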
Example 1.10.9:
Let A be a proper Boolean subalgebra of B = P(S). Then if f : A → B is
Definition 1.11.1:
An element a of a lattice L with zero is called an atom of L if a ≠ 0 and, for all b ∈ L, 0 < b ≤ a ⇒ b = a. That is to say, a is an atom if there is no nonzero b strictly less than a.
Definition 1.11.2:
An element a of a lattice L is called join-irreducible if a = b ∨ c implies a = b or a = c; otherwise, a is join-reducible.
Lemma 1.11.3:
Every atom of a lattice with zero is join-irreducible.
Lemma 1.11.4:
Let L be a distributive lattice and c ∈ L be join-irreducible. If c ≤ a ∨ b, then c ≤ a or c ≤ b.
Definition 1.11.5:
(i) Given a ≤ b in a lattice L, the interval [a, b] is the set
[a, b] = {x ∈ L : a ≤ x ≤ b}.
(ii) Let x ∈ [a, b]. x is said to be relatively complemented in [a, b], if x has
a complement y in [a, b], that is, x ∧ y = a and x ∨ y = b. If all intervals
[a, b] of L are complemented, then the lattice L is said to be relatively
complemented.
(iii) If L has a zero element and all elements in [0, b] have complements in L
for every nonzero b in L, then L is said to be sectionally complemented.
Our next theorem is crucial for the proof of the representation theorem
for finite Boolean algebras.
Theorem 1.11.6:
The following statements are true:
(i) Any Boolean algebra is relatively complemented.
Proof. (i) Let [a, b] be an interval in a Boolean algebra B, and x ∈ [a, b]. We have to prove that [a, b] is complemented. Now, as B is a Boolean algebra, it is a complemented lattice, and hence there exists x′ in B such that x ∧ x′ = 0 and x ∨ x′ = 1. Set y = b ∧ (a ∨ x′ ). Then y ∈ [a, b]. Also, y is a complement of x in [a, b], since
x ∧ y = x ∧ (b ∧ (a ∨ x′ )) = x ∧ (a ∨ x′ ) (as x ≤ b)
= (x ∧ a) ∨ (x ∧ x′ ) (as B is distributive)
= (x ∧ a) ∨ 0 = a (as a ≤ x),
and x ∨ y = x ∨ (b ∧ (a ∨ x′ )) = (x ∨ b) ∧ (x ∨ (a ∨ x′ )) (again by distributivity)
= b ∧ ((x ∨ x′ ) ∨ a) = b ∧ (1 ∨ a) = b ∧ 1 = b.
Corollary 1.11.7:
In any finite Boolean algebra, every nonzero element is a join of atoms.
We end this section with the representation theorem for finite Boolean
algebras which says that any finite Boolean algebra may be thought of as the
Boolean algebra P(S) defined on a finite set S.
a ∈ A(b′ ) ⇔ a ≤ b′ ⇔ a ≰ b (since a is an atom) ⇔ a ∉ A(b) ⇔ a ∈ A \ A(b) = (A(b))′ . Thus A(b′ ) = (A(b))′ .
Given b ∈ B, write b = c1 ∨ · · · ∨ ck as a join of atoms (Corollary 1.11.7), and let C = {c1 , . . . , ck }. We show that φ(b) = C, and this would prove that φ is onto. Now ci ≤ b for each i, and so by the definition of φ, φ(b) = {set of atoms c ∈ A with c ≤ b} ⊇ C. Conversely, if a ∈ φ(b), then a is an atom with a ≤ b = c1 ∨ · · · ∨ ck . Therefore a ≤ ci for some i by Lemma 1.11.4. As ci is an atom and a ≠ 0, this means that a = ci ∈ C. Thus φ(b) = C.
1.12 Exercises
1. Draw the Hasse diagrams of all the 15 essentially distinct lattices with six elements.
2. Show that the closed interval [a, b] of reals is a sublattice of the lattice (R, inf, sup).
7. Show that the three lattices of Fig. 1.9 are not distributive.
(Figure 1.9: the three lattices referred to in Exercise 7, each with least element 0 and greatest element 1.)
9. Show that the lattice of all subspaces of a vector space is not distribu-
tive.
10. Which of the following lattices are (i) distributive (ii) modular (iii) mod-
ular, but not distributive? (a) D160 (b) D20 (c) D36 (d) D40 .
Combinatorics
2.1 Introduction
Combinatorics is the science (and to some extent, the art) of counting and enumeration of configurations (it is understood that a configuration arises every time objects are distributed according to certain predetermined constraints). Just as arithmetic deals with integers (with the standard operations), algebra deals with operations in general, analysis deals with functions, geometry deals with rigid shapes and topology deals with continuity, so does combinatorics deal with configurations. The word combinatorial was first used in the modern mathematical sense by Gottfried Wilhelm Leibniz (1646–1716) in his Dissertatio de Arte Combinatoria (Dissertation Concerning the Combinatorial Arts). Reference to “combinatorial analysis” is found in English in 1818 in the title Essays on the Combinatorial Analysis by P. Nicholson (see Jeff Miller’s Earliest known uses of the words of Mathematics, Society for Industrial and Applied Mathematics, U.S.A.). In his book [4], C. Berge points out the following interesting aspects of combinatorics:
2.2 Elementary Counting Ideas
We begin with some simple ideas of counting using the Sum Rule and the Product Rule, and by obtaining permutations and combinations of finite sets of objects.
Example 2.2.1:
Assume that a car registration system allows a registration plate to consist of one, two or three English letters followed by a number (not zero and not starting with zero) having as many digits as there are letters. How many possible registrations are there?
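Under this reading, the Sum and Product Rules give the count directly; a one-line check (assuming a 26-letter alphabet and 9 · 10^(k−1) valid k-digit numbers):

```python
# k letters (26 choices each) followed by a k-digit number with a
# nonzero leading digit (9 * 10**(k-1) choices), summed over k = 1, 2, 3.
total = sum(26 ** k * 9 * 10 ** (k - 1) for k in (1, 2, 3))
print(total)   # 15,879,474 possible registrations
```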
Example 2.2.2:
Consider tossing 100 distinguishable dice. By the Product Rule, it follows that there are 6^100 ways of their falling.
Example 2.2.3:
A self-dual 2-valued Boolean function is one whose definition remains unchanged if we change all the 0’s to 1’s and all the 1’s to 0’s simultaneously. How many such functions in n variables exist?
For instance, the following function of the three variables a, b, c is self-dual:
a b c | f (a, b, c)
0 0 0 |     1
0 0 1 |     1
0 1 0 |     0
0 1 1 |     0
1 0 0 |     1
1 0 1 |     1
1 1 0 |     0
1 1 1 |     0
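The count of self-dual functions is 2^(2^(n−1)), since a self-dual function may be chosen freely on one input of each complementary pair; this can be confirmed by brute force for small n:

```python
from itertools import product

def count_self_dual(n):
    """Count Boolean functions f of n variables with f(x') = f(x)'
    (complementing all inputs complements the output)."""
    inputs = list(product([0, 1], repeat=n))
    count = 0
    # A function is a tuple of output bits, one per input vector.
    for outputs in product([0, 1], repeat=len(inputs)):
        f = dict(zip(inputs, outputs))
        if all(f[tuple(1 - b for b in x)] == 1 - f[x] for x in inputs):
            count += 1
    return count

for n in (1, 2, 3):
    assert count_self_dual(n) == 2 ** (2 ** (n - 1))
```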
2.3 Combinations and Permutations
For example, the 3-combinations of the multiset {3·a, 1·b, 1·c} are a a a, a a b, a a c and a b c. This ignores the ordering of objects. On the other hand, the 3-permutations are a a a, a a b, a b a, b a a, a a c, a c a, c a a, a b c, a c b, b a c, b c a, c a b and c b a.
When we allow unlimited repetitions of objects, we denote the repetition number by α. Consider the 3-combinations possible from {α·a, α·b, α·c, α·d}. There are 20 of them. On the other hand, there are 4³ or 64 3-permutations.
We use the following notations:
P (n, r) = the number of r-permutations of n distinct elements without repetitions;
C(n, r) = the number of r-combinations of n distinct elements without repetitions.
The following are basic results: P (n, r) = n!/(n − r)! and C(n, r) = n!/(r!(n − r)!).
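Both formulas can be cross-checked against direct enumeration; a sketch:

```python
from itertools import combinations, permutations
from math import factorial

def P(n, r):
    """P(n, r) = n!/(n - r)!"""
    return factorial(n) // factorial(n - r)

def C(n, r):
    """C(n, r) = n!/(r!(n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))

# Compare the formulas with brute-force counts of the actual arrangements.
items = range(7)
assert P(7, 3) == len(list(permutations(items, 3))) == 210
assert C(7, 3) == len(list(combinations(items, 3))) == 35
```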
Example 2.4.1:
The table below compares n! with its Stirling approximation S_n = √(2πn)(n/e)^n:

n      n!              S_n             Percentage error
8      40 320          39 902          1.0357
9      362 880         359 537         0.9213
10     3 628 800       3 598 696       0.8296
11     39 916 800      39 615 625      0.7545
12     479 001 600     475 687 486     0.6919
13     6 227 020 800   6 187 239 475   0.6389
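The table can be reproduced with a few lines of Python (a sketch, assuming the approximation S_n = √(2πn)(n/e)^n used above):

```python
from math import e, pi, sqrt, factorial

def stirling(n):
    # Stirling's approximation S_n = sqrt(2*pi*n) * (n/e)**n
    return sqrt(2 * pi * n) * (n / e) ** n

for n in range(8, 14):
    exact = factorial(n)
    pct = 100 * (exact - stirling(n)) / exact
    print(n, exact, round(stirling(n)), f"{pct:.4f}%")
```

The relative error decreases slowly with n, as the table suggests.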
Example 2.4.2:
Enumerating r-permutations of n objects with unlimited repetitions is easy. We consider r boxes, each to be filled by one of n possibilities. Thus the answer is U(n, r) = n^r.
It is easy to see that C(n, r) can also be expressed as n(n − 1)(n − 2) · · · (n − r + 1)/r!.
The numerator is often denoted by [n]_r, which is a polynomial in n of degree r. Thus we can write, [n]_r = s_r^0 + s_r^1 n + s_r^2 n^2 + · · · + s_r^r n^r.
By definition, the coefficients s_r^k are the Stirling Numbers of the first kind.
s_r^0 = 0, s_r^r = 1, and s_{r+1}^k = s_r^{k−1} − r s_r^k.
Proof. By definition, [x]_{r+1} = [x]_r (x − r). Again by definition, we have from the above equality,
· · · + s_{r+1}^k x^k + · · · = (· · · + s_r^{k−1} x^{k−1} + s_r^k x^k + · · ·)(x − r).
Equating the coefficients of x^k on both the sides gives the required recurrence. From the above relations we can build the following table:

s_r^k    k = 0    1     2     3    4
r = 1      0      1     0     0    0
2          0     −1     1     0    0
3          0      2    −3     1    0
4          0     −6    11    −6    1
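The table can be generated mechanically, using the recurrence obtained by equating coefficients in [x]_{r+1} = [x]_r (x − r) (a small sketch; the function name stirling_first is ours):

```python
def stirling_first(rmax):
    # s[r][k]: coefficient of n**k in [n]_r = n(n-1)...(n-r+1),
    # built from the recurrence s_{r+1}^k = s_r^{k-1} - r * s_r^k
    s = {1: {0: 0, 1: 1}}
    for r in range(1, rmax):
        s[r + 1] = {}
        for k in range(0, r + 2):
            s[r + 1][k] = s[r].get(k - 1, 0) - r * s[r].get(k, 0)
    return s

s = stirling_first(4)
# Row r = 4 of the table above
assert [s[4].get(k, 0) for k in range(5)] == [0, -6, 11, -6, 1]
```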
2.5 Examples in simple combinatorial reasoning
Example 2.5.1:
Show that C(n, r) = C(n, n − r).
The left hand side of the equality denotes the number of ways of choosing
r objects from n objects. Each such choice leaves out (n − r) objects. This
is exactly equivalent to choosing (n − r) objects, leaving out r objects, which
is the right hand side.
Example 2.5.2:
Show that C(n, r) = C(n − 1, r − 1) + C(n − 1, r).
The left hand side is the number of ways of selecting r objects from out
of n objects. To do this, we proceed in a different manner. We mark one of
the n objects as X. In the selected r objects, (a) either X is included or (b)
X is excluded. The two cases (a) and (b) are mutually exclusive and totally
exhaustive. Case (a) is equivalent to selecting (r − 1) objects from (n − 1)
objects while case (b) is equivalent to selecting r objects from (n−1) objects.
Example 2.5.3:
There are a roads from city A to city B, b roads from city B to city C, c
roads from city C to city D, e roads from city A to city C, d roads from
city B to city D and f roads from city A to city D. In how many ways can one travel from city A to city D and come back to city A while visiting at least one of city B and city C at least once? Starting from city A, the different routes leading to city D are shown in the following "tree diagram".
It follows that the total number of ways of going to city D from city A is
(abc + ad + ec + f ). The tree diagram also suggests (from the leaves to the
root) the number of ways of going from city D to city A is (abc + ad + ec + f ).
Therefore, the total number of ways of going from city A to city D and back
is (abc + ad + ec + f )2 . The number of ways of directly going from city A to
city D and back to city A directly is f 2 . Hence the number of ways of going
from city A to city D and back while visiting city B and/or city C at least
once is (abc + ad + ec + f )2 − f 2 .
Figure 2.1: Tree diagram of the routes from city A to city D, with edges labeled a, b, c, d, e and f.
Example 2.5.4:
Show that P (n, r) = r · P (n − 1, r − 1) + P (n − 1, r).
The left hand side is the number of ways of arranging r objects chosen from n objects. This can be counted in the following way. Among the n objects, we mark one object as X. A given arrangement either includes X or does not include X. In the former case, we first arrange (r − 1) objects from among the other (n − 1) objects and then introduce X in any of the r positions. This gives the first term on the right hand side. In the latter case, we simply arrange r objects from the (n − 1) objects excluding X. This gives the second term on the right hand side.
Example 2.5.5:
Count the number of simple undirected graphs with a given set V of n ver-
tices.
Obviously, V contains C(n, 2) = n(n − 1)/2 unordered pairs of vertices.
We may include or exclude each pair as an edge in forming a graph with
vertex set V . Therefore, there are 2C(n,2) simple graphs with vertex set V .
Example 2.5.6:
Let S be a set of 2n distinct objects. A pairing of S is a partition of S into
2-element subsets; that is, a collection of pairwise disjoint 2-element subsets
whose union is S. How many different pairings of S are there?
Method 1: pair off the elements one at a time. The first element can be paired in (2n − 1) ways; the smallest remaining unpaired element in (2n − 3) ways, and so on, giving
(2n − 1) · (2n − 3) · · · 5 · 3 · 1.
Method 2: choose the pairs as unordered 2-subsets and divide by the number of orderings of the n pairs:
(1/n!) [C(2n, 2) · C(2n − 2, 2) · C(2n − 4, 2) · · · C(4, 2) · C(2, 2)].
This expression is the same as the one obtained in Method 1 above.
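For small n the two counts can be confirmed by exhaustive enumeration (a sketch; the helper names are ours):

```python
from math import comb, factorial, prod

def count_pairings(elems):
    # Pair the first remaining element with each possible partner and recurse
    if not elems:
        return 1
    first, rest = elems[0], elems[1:]
    total = 0
    for p in rest:
        remaining = [x for x in rest if x != p]
        total += count_pairings(remaining)
    return total

def double_factorial_odd(n):
    # (2n-1)(2n-3)...5*3*1
    out = 1
    for k in range(1, 2 * n, 2):
        out *= k
    return out

for n in range(1, 5):
    method2 = prod(comb(2 * n - 2 * i, 2) for i in range(n)) // factorial(n)
    assert count_pairings(list(range(2 * n))) == double_factorial_odd(n) == method2
```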
Example 2.5.7:
Let S = {1, 2, . . . , (n + 1)}, where n ≥ 2, and let T = {(x, y, z) ∈ S^3 | x < z and y < z}. Show by counting |T| in two different ways that,
Σ_{1≤k≤n} k^2 = C(n + 1, 2) + 2C(n + 1, 3).      (2.2)
Example 2.5.8:
A sequence of (mn + 1) distinct integers u1 , u2 , . . . , umn+1 is given. Show
that the sequence contains either a decreasing subsequence of length greater
than m or an increasing subsequence of length greater than n (this result is
due to P. Erdös and G. Szekeres (1935)).
We present the proof as in [4]. Let l_i(−) be the length of the longest decreasing subsequence with first term u_i, and let l_i(+) be the length of the longest increasing subsequence with first term u_i.
Assume that the result is false. Then u_i → (l_i(−), l_i(+)) defines a mapping of {u_1, u_2, . . . , u_{mn+1}} into the Cartesian product {1, 2, . . . , m} × {1, 2, . . . , n}. This mapping is injective since if i < j,
u_i > u_j ⇒ l_i(−) > l_j(−) ⇒ (l_i(−), l_i(+)) ≠ (l_j(−), l_j(+)),
u_i < u_j ⇒ l_i(+) > l_j(+) ⇒ (l_i(−), l_i(+)) ≠ (l_j(−), l_j(+)).
But an injective mapping of a set of mn + 1 elements into a set of mn elements is impossible, and this contradiction proves the result.
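The theorem is easy to confirm exhaustively for small m and n (a sketch; longest_monotone is our helper, a simple O(n²) dynamic program over distinct integers):

```python
from itertools import permutations

def longest_monotone(seq, increasing=True):
    # Length of the longest strictly monotone subsequence of distinct integers
    best = [1] * len(seq)
    for j in range(len(seq)):
        for i in range(j):
            if (seq[i] < seq[j]) == increasing:
                best[j] = max(best[j], best[i] + 1)
    return max(best) if seq else 0

# Every sequence of mn + 1 distinct integers (here m = n = 2, length 5)
# has an increasing subsequence of length > n or a decreasing one of length > m.
m, n = 2, 2
for p in permutations(range(m * n + 1)):
    assert longest_monotone(p, True) > n or longest_monotone(p, False) > m
```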
2.6 The Pigeon-Hole Principle
The principle is attributed to Dirichlet in the year 1834, although he apparently used the German term Schubfachprinzip. The French term is le principe des tiroirs de Dirichlet, which can be translated as "the principle of the drawers of Dirichlet".
The Pigeon-Hole Principle: If n objects are put into m boxes and n > m (m and n are positive integers), then at least one box contains two or more objects.
A stronger form: If n objects are put into m boxes and n > m, then some box must contain at least ⌈n/m⌉ objects.
Another form: Let k and n be two positive integers. If at least kn + 1 objects are distributed among n boxes, then one of the boxes must contain at least k + 1 objects.
We now illustrate this principle with some examples.
Example 2.6.1:
Show that, among a group of 7 people there must be at least four of the same
sex.
Example 2.6.2:
Given any five points chosen within a square of side length 2 units, prove that there must be two points which are at most √2 units apart.
Subdivide the square into four small squares, each with side of length 1 unit. By the pigeon-hole principle, at least two of the chosen points must be in (or on the boundary of) one small square. But then the distance between these two points cannot exceed the diagonal length √2 of the small square.
Example 2.6.3:
Let A = {a_1, a_2, . . . , a_m} be a set of m positive integers. Show that there exists a nonempty subset B of A such that the sum Σ_{x∈B} x is divisible by m.
Consider the m partial sums a_1, a_1 + a_2, . . . , a_1 + a_2 + · · · + a_m. If any of these sums is exactly divisible by m, then the corresponding set is the required subset B. Therefore, we will assume that none of the above sums is divisible by m. We thus have,
a_1 ≡ r_1 (mod m)
a_1 + a_2 ≡ r_2 (mod m)
a_1 + a_2 + a_3 ≡ r_3 (mod m)
. . .
a_1 + a_2 + · · · + a_m ≡ r_m (mod m),
where each remainder r_i lies in {1, 2, . . . , m − 1}. Since there are m sums but only m − 1 possible remainders, by the pigeon-hole principle two of the sums leave the same remainder r; say, for i < j,
a_1 + a_2 + · · · + a_i ≡ r (mod m) and a_1 + a_2 + · · · + a_j ≡ r (mod m).
Subtracting, a_{i+1} + a_{i+2} + · · · + a_j is divisible by m, and B = {a_{i+1}, . . . , a_j} is the required subset.
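The pigeon-hole argument is constructive, and can be turned into a short prefix-sum procedure (a sketch; the function name divisible_subset is ours):

```python
def divisible_subset(a):
    # Among the prefix sums a1, a1+a2, ..., a1+...+am (plus the empty prefix),
    # two must share a remainder mod m; the elements between them form B.
    m = len(a)
    seen = {0: 0}          # remainder -> index of the prefix achieving it
    total = 0
    for i, x in enumerate(a, start=1):
        total += x
        r = total % m
        if r in seen:
            return a[seen[r]:i]   # consecutive block whose sum is divisible by m
        seen[r] = i

b = divisible_subset([3, 7, 5, 2])
assert b and sum(b) % 4 == 0
```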
Example 2.7.1:
The number of integral solutions of x_1 + x_2 + · · · + x_n = r, x_i > 0, for all admissible values of i, is equal to the number of ways of distributing r similar balls into n numbered bins with at least one ball in each bin. This is equal to C(n − 1 + (r − n), r − n) = C(r − 1, r − n) = C(r − 1, n − 1).
P(n; q_1, q_2, . . . , q_t) = n!/(q_1! · q_2! · · · q_t!) = C(n, q_1) · C(n − q_1, q_2) · · · C(n − q_1 − · · · − q_{t−1}, q_t).
By substituting the formula for each term in the product, the last expression can be simplified to the previous expression.
S = A_1 ∪ A_2 ∪ · · · ∪ A_t, and A_i ∩ A_j = ∅ for i ≠ j.
({a}, {b}, {c, d}) ({b}, {a}, {c, d}) ({a}, {c}, {b, d}) ({c}, {a}, {b, d})
({a}, {d}, {b, c}) ({d}, {a}, {b, c}) ({b}, {c}, {a, d}) ({c}, {b}, {a, d})
({b}, {d}, {a, c}) ({d}, {b}, {a, c}) ({c}, {d}, {a, b}) ({d}, {c}, {a, b})
Here, our concern is in the number of such partitions rather than the actual
list itself.
We see this by choosing the q_1 elements to occupy the first subset in C(n, q_1) ways; the q_2 elements for the second subset in C(n − q_1, q_2) ways, etc. Thus, the number of ordered partitions of type (q_1, q_2, . . . , q_t) is n!/(q_1! q_2! · · · q_t!).
Example 2.8.1:
In the game of bridge, the four players N, E, S and W are seated in a specified order and each is dealt a hand of 13 cards. In how many ways can the 52 cards be dealt to the four players?
We see that the order counts. Therefore, the number of ways is 52!/(13!)4 .
Example 2.8.2:
To show that (n^2)!/(n!)^n is an integer.
Consider a set of n^2 elements, and partition it into ordered partitions with n parts, each of size n. The number of such ordered partitions is (n^2)!/(n!)^n, which therefore has to be an integer.
({a}, {b, c, d}) , ({b}, {a, c, d}) , ({c}, {a, b, d}) , ({d}, {a, b, c}) ,
({a, b}, {c, d}) , ({a, c}, {b, d}) , ({a, d}, {b, c}) .
S_n^1 = S_n^n = 1.
Also, S_{n+1}^k = S_n^{k−1} + kS_n^k, for 1 < k ≤ n.
(i) The (n+1)th object is the sole member of a class: In this case, we simply
form the partitions of the remaining n objects into k − 1 classes and
attach the class containing the sole member. The number of partitions
thus formed is Snk−1 .
(ii) The (n+1)th object is not the sole member of any class: In this case, we
first form the partitions of the remaining n objects into k classes. This
gives Snk partitions. In each such partition we then add the (n + 1)th
object to one of the k classes. We thus get kSnk partitions of the required
type.
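The two cases translate directly into a table-building procedure for the Stirling numbers of the second kind (a sketch; the function name is ours):

```python
def stirling_second(nmax):
    # S[n][k] = number of partitions of an n-set into k nonempty classes,
    # via S_{n+1}^k = S_n^{k-1} + k * S_n^k with S_n^1 = S_n^n = 1
    S = {1: {1: 1}}
    for n in range(1, nmax):
        S[n + 1] = {}
        for k in range(1, n + 2):
            S[n + 1][k] = S[n].get(k - 1, 0) + k * S[n].get(k, 0)
    return S

S = stirling_second(4)
assert S[4][2] == 7          # the seven 2-part partitions of {a, b, c, d} listed earlier
assert S[4][1] == S[4][4] == 1
```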
Newton’s Identity
C(0, 0) = 1
C(1, 0) = 1    C(1, 1) = 1
. . .
Figure 2.2: The first rows of Pascal's triangle.
Diagonal Summation:
Row Summation
2.10 The Binomial and the Multinomial Theorems
Theorem 2.10.1:
Let n be a positive integer. Then for all elements x and y belonging to a commutative ring with unit element (with the usual operations + and ·),
(x + y)^n = C(n, 0)x^n + C(n, 1)x^{n−1}y + C(n, 2)x^{n−2}y^2 + · · · + C(n, r)x^{n−r}y^r + · · · + C(n, n)y^n.
Note: For the definition of a commutative ring, see Chapter 5. For the present, it is enough to think of x and y as real numbers.
The proof follows by expanding the n-fold product (x + y)(x + y) · · · (x + y): a term x^{n−r}y^r arises once for each choice of r factors contributing y, that is, C(n, r) times.
The binomial coefficients (of the type C(n, r)) appearing above occur in Pascal's triangle. For a fixed n, we can obtain the ratio of the (k + 1)-st binomial coefficient of order n to the k-th: C(n, k + 1)/C(n, k) = (n − k)/(k + 1).
This ratio is larger than 1 if k < (n − 1)/2 and is less than 1 if k > (n − 1)/2. Therefore, we can infer that the biggest binomial coefficient must occur in the "middle". We use Stirling's approximation to estimate how big the central binomial coefficient is (for even n):
C(n, n/2) = n!/[(n/2)!]^2 ≈ (n/e)^n √(2nπ) / [(n/2e)^{n/2} √(nπ)]^2 = 2^n √(2/(nπ)).
Corollary 2.10.2:
Using the Binomial Theorem we can get expansions for (1 + x)^n and (1 − x)^n. Setting x = 1 in the expansion of (1 − x)^n shows that the sum of the binomial coefficients with even index equals the sum of those with odd index; let S be the common sum. Then, by the previous identity (see row summation), adding the two series, we get 2S = 2^n, or S = 2^{n−1}. The combinatorial interpretation is easy. If S′ is a set with n elements, then the number of subsets of S′ with an even number of elements is equal to the number of subsets of S′ with an odd number of elements, and each of these counts is equal to 2^{n−1}.
Example 2.10.3:
To show that: 1 · C(n, 1) + 2 · C(n, 2) + 3 · C(n, 3) + · · · + n · C(n, n) = n · 2^{n−1},
for each positive integer n.
Here, the role of the binomial coefficients gets replaced by the “multinomial
coefficients”
P(n; q_1, q_2, . . . , q_t) = n!/(q_1! q_2! · · · q_t!),
where the q_i's are non-negative integers and Σ q_i = n. (Recall that the multinomial coefficients enumerate the ordered partitions of a set of n elements of the type (q_1, q_2, . . . , q_t).)
Example 2.10.4:
By long multiplication we can get, (x_1 + x_2 + x_3)^3 = x_1^3 + x_2^3 + x_3^3 + 3x_1^2 x_2 + 3x_1^2 x_3 + 3x_1 x_2^2 + 3x_1 x_3^2 + 3x_2 x_3^2 + 3x_2^2 x_3 + 6x_1 x_2 x_3.
To get the coefficient of, say, x_2 x_3^2, we choose x_2 from one of the three factors and x_3 from the remaining two. This can be done in C(3, 1) · C(2, 2) = 3 ways; therefore the required coefficient should be 3.
Example 2.10.5:
Find the coefficient of x_1^4 x_2^5 x_3^6 x_4^3 in (x_1 + x_2 + x_3 + x_4)^18.
The product will occur as often as x_1 can be chosen from 4 out of the 18 factors, x_2 from 5 out of the remaining 14 factors, x_3 from 6 out of the remaining 9 factors, and x_4 from the last 3 factors. Therefore the required coefficient is C(18, 4) · C(14, 5) · C(9, 6) · C(3, 3) = 18!/(4! 5! 6! 3!).
To count the number of terms in the above expansion, we note that each term of the form x_1^{q_1} x_2^{q_2} · · · x_t^{q_t} corresponds to a selection of n objects with repetitions from t distinct types. There are C(n + t − 1, n) ways of doing this. This then is the number of terms in the above expansion.
Example 2.10.7:
In (x_1 + x_2 + x_3 + x_4 + x_5)^10, the coefficient of x_1^2 x_3 x_4^3 x_5^4 is 10!/(2! 1! 3! 4!) = 12 600.
There are C(10 + 5 − 1, 10) = C(14, 10) = 1001 terms in the above multinomial expansion.
Corollary 2.10.8:
In the multinomial theorem if we let x_1 = x_2 = · · · = x_t = 1, then for any positive integer t, we have t^n = Σ P(n; q_1, q_2, . . . , q_t), where the summation extends over all sets of non-negative integers q_1, q_2, . . . , q_t with Σ q_i = n.
2.11 The Principle of Inclusion and Exclusion
The Sum Rule stated earlier (see Section 2.2) applies only to disjoint sets. A generalization is the Inclusion-Exclusion Principle, which applies to non-disjoint sets as well.
We first consider the case of two sets. If A and B are finite subsets of
some universe U , then
|A ∪ B| = |A| + |B| − |A ∩ B|
|A ∪ B| = |A ∩ B ′ | + |A ∩ B| + |A′ ∩ B| (2.3)
Also, we have
|A| = |A ∩ B′| + |A ∩ B|.      (2.5)
Example 2.11.1:
From a group of ten professors, in how many ways can a committee of five members be formed so that at least one of professor A or professor B is included?
Theorem 2.11.2:
If A_1, A_2, . . . , A_n are finite subsets of a universal set, then
|A_1 ∪ A_2 ∪ · · · ∪ A_n| = Σ|A_i| − Σ|A_i ∩ A_j| + Σ|A_i ∩ A_j ∩ A_k| − · · · + (−1)^{n+1}|A_1 ∩ A_2 ∩ · · · ∩ A_n|      (2.6)
— the second summation on the right-hand side is taken over all the 2-combinations (i, j) of the integers {1, 2, . . . , n}; the third summation is taken over all the 3-combinations (i, j, k) of the integers {1, 2, . . . , n}, and so on. Thus, for n = 4, there are 4 + C(4, 2) + C(4, 3) + 1 = 2^4 − 1 = 15 terms on the right-hand side. In general there are 2^n − 1 terms.
Proof. The proof by induction is boring! Here we give the proof based on
combinatorial arguments.
We must show that every element of A1 ∪ A2 ∪ · · · ∪ An is counted exactly
once in the right hand side of (2.6). Suppose that an element x ∈ A_1 ∪ A_2 ∪ · · · ∪ A_n is in exactly m (m ≥ 1) of the sets A_1, A_2, . . . , A_n; for definiteness, say
x ∈ A_1, x ∈ A_2, . . . , x ∈ A_m and x ∉ A_{m+1}, . . . , x ∉ A_n.
Then x is counted C(m, 1) times in the first summation of (2.6), C(m, 2) times in the second, C(m, 3) times in the third, and so on; that is, x is counted
C(m, 1) − C(m, 2) + C(m, 3) − · · · + (−1)^{m+1}C(m, m)
number of times. Now, we must show that this last expression is 1. Expanding (1 − 1)^m by the Binomial Theorem we get,
0 = C(m, 0) − C(m, 1) + C(m, 2) − · · · + (−1)^m C(m, m).
Using the fact that C(m, 0) = 1 and transposing all other terms to the left-
hand side of the above equation, we get the required relation.
The Sieve of Eratosthenes
Let A_1, A_2, A_3 and A_4 denote the sets of positive integers not exceeding 1000 that are divisible by 2, 3, 5 and 7 respectively. Then, for instance,
|A_1 ∩ A_2 ∩ A_3 ∩ A_4| = ⌊1000/210⌋ = 4.
Then, |A_1 ∪ A_2 ∪ A_3 ∪ A_4| =
(500 + 333 + 200 + 142) − (166 + 100 + 71 + 66 + 47 + 28) + (33 + 23 + 14 + 9) − 4 = 772.
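A brute-force count confirms the inclusion-exclusion computation (a short sketch):

```python
# Count the integers 1..1000 divisible by at least one of 2, 3, 5, 7
primes = [2, 3, 5, 7]
count = sum(1 for x in range(1, 1001) if any(x % p == 0 for p in primes))
assert count == 772
```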
2.14 Derangements
(1, 2, . . . , k, b_{k+1}, . . . , b_n)
D_n = (n − 1)(D_{n−1} + D_{n−2})
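The recurrence can be checked for small n against a direct enumeration of derangements (a sketch; the function names are ours, and we take D_0 = 1, D_1 = 0):

```python
from itertools import permutations

def derangements_rec(n):
    # D_n = (n - 1)(D_{n-1} + D_{n-2}), with D_0 = 1, D_1 = 0
    D = [1, 0]
    for k in range(2, n + 1):
        D.append((k - 1) * (D[k - 1] + D[k - 2]))
    return D[n]

def derangements_brute(n):
    # Count permutations of (0, ..., n-1) with no fixed point
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

for n in range(1, 7):
    assert derangements_rec(n) == derangements_brute(n)
```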
n = α1 + α2 + · · · + αm and specify α1 ≥ α2 ≥ · · · ≥ αm ≥1
φ(α1 , α2 , . . . , αm ) = (α1 + 1, α2 + 1, . . . , αm + 1, 1, 1, . . . , 1)
The equations (2.7) and (2.8) allow us to compute p(n, m)’s recursively.
For example, the values of p(n, m) for n ≤ 6 and m ≤ 6 are given by the following array:

p(n, m)   m = 1   2   3   4   5   6
n = 1       1     0   0   0   0   0
2           1     1   0   0   0   0
3           1     1   1   0   0   0
4           1     2   1   1   0   0
5           1     2   2   1   1   0
6           1     3   3   2   1   1
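The array can be generated from a standard recurrence for p(n, m), the number of partitions of n into exactly m parts (a sketch; this recurrence may differ in form from equations (2.7)-(2.8) referred to above):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p(n, m):
    # Partitions of n into exactly m parts: either the smallest part is 1
    # (remove it: p(n-1, m-1)), or every part exceeds 1 (subtract 1 from
    # each part: p(n-m, m)).
    if m <= 0 or n < m:
        return 0
    if m == 1 or m == n:
        return 1
    return p(n - 1, m - 1) + p(n - m, m)

table = [[p(n, m) for m in range(1, 7)] for n in range(1, 7)]
assert table[3] == [1, 2, 1, 1, 0, 0]   # row n = 4 of the array above
assert table[5] == [1, 3, 3, 2, 1, 1]   # row n = 6
```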
5 + 3 + 3 + 1 + 1  →  9 + 3 + 1
(Ferrers-diagram illustration: the partition 5 + 3 + 3 + 1 + 1 of 13 is mapped to the partition 9 + 3 + 1.)
2.16.1 Proposition
2.16.2 Proposition
2.16.3 Proposition
Example 2.17.1:
The number D_n of derangements of the integers (1, 2, . . . , n), as we have seen in section ***, satisfies the recurrence relation D_n − nD_{n−1} = (−1)^n, for n ≥ 2, with D_1 = 0. The easiest way to solve this recurrence relation is to rewrite it as,
D_n/n! − D_{n−1}/(n − 1)! = (−1)^n/n!,
which is easy to solve.
Example 2.17.2:
The sorting problem asks for an algorithm that takes as input a list or an array of n integers and sorts them, that is, arranges them in nondecreasing (or nonincreasing) order. One algorithm, call it procedure Mergesort(n), does this by splitting the given list of n integers into two sublists of ⌊n/2⌋ and ⌈n/2⌉ integers, applying the procedure recursively to sort the sublists, and merging the sorted sublists (note the recursive formulation). If the "time taken" by
procedure Mergesort(n) in terms of the number of comparisons is denoted
by T (n) then T (n) is known to satisfy the following recurrence relation:
T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + n − 1, with T(2) = 1.
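The recurrence is easy to evaluate directly; for powers of two it is known to reduce to the closed form n log₂ n − n + 1 (a sketch, taking T(1) = 0):

```python
from math import log2

def T(n):
    # T(n) = T(floor(n/2)) + T(ceil(n/2)) + n - 1, with T(2) = 1, T(1) = 0
    if n <= 1:
        return 0
    if n == 2:
        return 1
    return T(n // 2) + T(n - n // 2) + n - 1

# For powers of two, T(n) = n*log2(n) - n + 1
for n in [2, 4, 8, 16]:
    assert T(n) == n * int(log2(n)) - n + 1
```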
Example 2.17.3:
Consider the problem of finding the number of binary sequences of length n
which do not contain two consecutive 1’s.
Let wn be the number of such sequences. Let un be the number of such
sequences whose last digit is a 1. Also, let vn be the number of such sequences
whose last digit is a 0. Obviously, wn = un + vn .
Any valid sequence ending in 0 arises by appending 0 to a valid sequence of length n − 1, so v_n = w_{n−1}; a valid sequence ending in 1 must end in 01, so u_n = w_{n−2}. Hence
w_n = w_{n−1} + w_{n−2}.
This equation is the same as that for the Fibonacci numbers, and it can be solved with the initial conditions w_1 = 2 and w_2 = 3.
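A direct enumeration over all binary strings confirms the recurrence and initial conditions (a sketch; helper names are ours):

```python
from itertools import product

def w_brute(n):
    # Count binary sequences of length n with no two consecutive 1's
    return sum(1 for bits in product('01', repeat=n)
               if '11' not in ''.join(bits))

def w_rec(n):
    # w_n = w_{n-1} + w_{n-2}, with w_1 = 2, w_2 = 3
    a, b = 2, 3
    if n == 1:
        return a
    for _ in range(n - 2):
        a, b = b, a + b
    return b

for n in range(1, 10):
    assert w_brute(n) == w_rec(n)
```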
Example 2.17.4:
A circular disk is divided into n sectors. There are p different colors (of
paints) to color the sectors so that no two adjacent sectors get the same
color. We are interested in the number of ways of coloring the sectors.
(Figure: a circular disk divided into n sectors, numbered 1, 2, 3, . . . , n.)
Let un be the number of ways to color the disk in the required manner.
This number clearly depends upon both n and p. We form a recurrence
relation in n using the following reasoning as given in [56]. We construct two
mutually exclusive and exhaustive cases:
(i) The sectors 1 and 3 are colored differently. In this case, removing sector 2 gives a disk of n − 1 sectors, which can be colored in u_{n−1} ways; sector 2 can then be colored in p − 2 ways (any color except those of sectors 1 and 3).
(ii) The sectors 1 and 3 are of the same color. In this case, removing sector 2 gives a disk of n − 2 sectors, as sectors 1 and 3, being of the same color, can be fused into one. Sector 2 can be colored using any of the p − 1 colors (i.e., excluding the common color of sectors 1 and 3). For each coloring of sector 2, we can color the disk of n − 2 sectors in u_{n−2} ways. Thus, we have the following recurrence relation:
un = (p − 2)un−1 + (p − 1)un−2
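The recurrence can be verified against a brute-force count of the circular colorings (a sketch; we assume the natural initial conditions u_2 = p(p − 1) and u_3 = p(p − 1)(p − 2)):

```python
from itertools import product

def colorings_brute(n, p):
    # Count colorings of n sectors in a circle: adjacent sectors (i, i+1 mod n) differ
    return sum(1 for c in product(range(p), repeat=n)
               if all(c[i] != c[(i + 1) % n] for i in range(n)))

def colorings_rec(n, p):
    # u_n = (p-2) u_{n-1} + (p-1) u_{n-2}
    u = {2: p * (p - 1), 3: p * (p - 1) * (p - 2)}
    for k in range(4, n + 1):
        u[k] = (p - 2) * u[k - 1] + (p - 1) * u[k - 2]
    return u[n]

for n in range(2, 8):
    for p in range(3, 6):
        assert colorings_brute(n, p) == colorings_rec(n, p)
```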
The recurrence relations in the above examples can be solved using the tech-
niques described in this section. In general, we can consider equations of the
form,
a_n = f(a_{n−1}, a_{n−2}, . . . , a_{n−i}), where n ≥ i.
In particular, consider a recurrence of the form
c_0 a_n + c_1 a_{n−1} + · · · + c_k a_{n−k} = 0.      (2.9)
The equation (2.9) is linear as it does not contain terms like a_{n−i} · a_{n−j}, a_{n−i}^2 and so on; it is homogeneous as the linear combination of the terms a_{n−i} is equated to zero; it is with constant coefficients because the c_i's are constants.
well-known Fibonacci Sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, . . . is defined by
the linear homogeneous recurrence, fn = fn−1 + fn−2 , when n ≥ 2 with the
initial conditions, f0 = 0 and f1 = 1.
It is easy to see that if f_n and g_n are solutions to (2.9), then so is any linear combination pf_n + qg_n, where p and q are constants.
To solve (2.9), we try a_n = x^n, where x is an unknown constant. If a_n = x^n is substituted in (2.9) we should have,
c_0 x^n + c_1 x^{n−1} + · · · + c_k x^{n−k} = 0.
Dividing by x^{n−k}, the nonzero roots are given by the characteristic equation
p(x) ≡ c_0 x^k + c_1 x^{k−1} + · · · + c_k = 0.
Example 2.18.1:
Consider the Fibonacci Sequence as above. We can write the recurrence
relation as
fn − fn−1 − fn−2 = 0. (2.10)
Example 2.18.2:
Consider the relation,
a_n − 6a_{n−1} + 11a_{n−2} − 6a_{n−3} = 0, for n ≥ 3,
with a_0 = 1, a_1 = 3 and a_2 = 5.
From the given equation we directly write its characteristic equation as,
x3 − 6x2 + 11x − 6 = 0,
Thus the roots of the characteristic equation are 1, 2 and 3, and the solution should be of the form, a_n = A · 1^n + B · 2^n + C · 3^n.
Applying the initial conditions, we get,
a0 = 1 = A + B + C; a1 = 3 = A + 2B + 3C; and a2 = 5 = A + 4B + 9C
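Solving this 3 × 3 system by elimination gives A = −2, B = 4 and C = −1; a quick numerical check (a sketch) confirms both the initial conditions and the recurrence:

```python
def a_closed(n, A=-2, B=4, C=-1):
    # a_n = A*1^n + B*2^n + C*3^n, with constants from the initial conditions
    return A + B * 2**n + C * 3**n

# The initial conditions a_0 = 1, a_1 = 3, a_2 = 5
assert [a_closed(n) for n in range(3)] == [1, 3, 5]
# The recurrence a_n = 6 a_{n-1} - 11 a_{n-2} + 6 a_{n-3}
for n in range(3, 12):
    assert a_closed(n) == 6 * a_closed(n - 1) - 11 * a_closed(n - 2) + 6 * a_closed(n - 3)
```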
p(x) = c_0 x^k + c_1 x^{k−1} + · · · + c_k.
Let r be a multiple root occurring two times; that is, (x − r)2 is a factor of
p(x). We can write,
Example 2.18.3:
Consider the recurrence, a_n − 11a_{n−1} + 39a_{n−2} − 45a_{n−3} = 0, for n ≥ 3, with a_0 = 0, a_1 = 1 and a_2 = 2.
We can write the characteristic equation as
x3 − 11x2 + 39x − 45 = 0,
the roots of which are 3, 3 and 5. Hence, the general solution can be written
as
an = (A + Bn)3n + C5n .
Applying the initial conditions a_0 = 0, a_1 = 1 and a_2 = 2 gives A = 1, B = 1 and C = −1, so that
a_n = (1 + n)3^n − 5^n.
Example 2.19.1:
a_n − 3a_{n−1} = 5^n      (2.11)
We can observe that 3^n and 5^n are not solutions to (2.11)! The reason is that (2.11) implies (2.13), but (2.13) does not imply (2.11), and hence they are not equivalent.
From the original equation we can write a_1 = 3a_0 + 5, where a_0 is the initial condition.
From (2.14) we get, A + B = a_0 and 3A + 5B = a_1 = 3a_0 + 5.
Therefore, we should have, A = a_0 − 5/2 and B = 5/2.
Hence, a_n = [(2a_0 − 5)3^n + 5^{n+1}]/2.
We now try various possible candidate solutions for an , evaluate the left-
hand side above and check to see if it yields the required value for f (n) – the
required f (n) should be (2 − n). We tabulate the work as follows:
We note that row 1 and row 2 are not linearly independent. We also note
that in row 3, we have an = n which gives the correct f (n) but the initial
conditions are not correct. We can subtract row 1 from row 3 which gives
an = (n − 1) with the correct initial conditions. Thus, an = (n − 1) is the
required solution.
Next we consider the following example.
As before, we try various possibilities for an and look for the resulting f (n)
to get a “repertoire” of recurrences. We summarize the work in the following
table:
In the perturbation step, we note that the last term contains the "1/n^2" factor; we therefore reason that it will bring only a small contribution to the recurrence; hence, approximately, a_{n+1} ≈ 2a_n, so that,
P_{n+1} ≤ ∏_{k=1}^{n} [1 + 1/(4k^2)].
k=1
The infinite product corresponding to the right hand side above converges monotonically to
α_0 = ∏_{k=1}^{∞} (1 + 1/(4k^2)) = 1.46505 . . .
2.22 Generating Functions
(ii) The series converges for some z ≠ 0 if and only if the sequence {|a_n|^{1/n}} is bounded. (If this condition is not satisfied, it may still hold for the sequence {a_n/n!}.)
1. If the given infinite sequence is {1, 1, 1, . . .}, then the corresponding generating function A(z) is given by A(z) = 1 + z + z^2 + · · · = 1/(1 − z).
2. If the given infinite sequence is {1, 1/1!, 1/2!, 1/3!, 1/4!, . . .}, then A(z) = e^z.
2.22.1 Convolution
Let A(z) and B(z) be the generating functions of the sequences {a0 , a1 , a2 , . . .}
and {b0 , b1 , b2 , . . .} respectively. The product A(z)B(z) is the series,
(a0 + a1 z + a2 z 2 + . . .) × (b0 + b1 z + b2 z 2 + . . .)
= a0 b0 + (a0 b1 + a1 b0 )z + (a0 b2 + a1 b1 + a2 b0 )z 2 + . . .
It is easily seen that [z^n]A(z)B(z) = Σ_{k=0}^{n} a_k b_{n−k}. Therefore, if we wish to
evaluate any sum that has the general form
c_n = Σ_{k=0}^{n} a_k b_{n−k}      (2.17)
and if the generating functions A(z) and B(z) are known then we have cn =
[z n ]A(z)B(z).
The sequence {cn } is called the convolution of the sequences {an } and
{bn }. In short, we say that the convolution of two sequences corresponds to
the product of the respective generating functions. We illustrate this with an
example.
From the Binomial Theorem, we know that, (1 + z)r is the generating
function of the sequence {C(r, 0), C(r, 1), C(r, 2), . . .}. Thus we have,
(1 + z)^r = Σ_{k≥0} C(r, k)z^k and (1 + z)^s = Σ_{k≥0} C(s, k)z^k.
Example 2.22.1:
By taking the convolution of the sequence {1, 1, 1, . . .} (whose generating function is 1/(1 − z)) with itself we can immediately deduce that 1/(1 − z)^2 is the generating function of the sequence {1, 2, 3, 4, 5, . . .}.
Example 2.22.2:
We can easily see that 1/(1 + z) is the generating function for the sequence {1, −1, 1, −1, . . .}. Therefore 1/[(1 + z)(1 − z)], or 1/(1 − z^2), is the generating function of the sequence {1, 0, 1, 0, . . .}, which is the convolution of the sequences {1, 1, 1, . . .} and {1, −1, 1, −1, . . .}.
which is the generating function for the sequence {c^n g_n}. Thus 1/(1 − cz) is the generating function of the sequence {1, c, c^2, c^3, . . .}.
3. Given G(z) in a closed form we can get G′ (z). Term by term differentiation
(when possible) of the infinite sum of G(z) yields,
Thus G′(z) represents the infinite sequence {g_1, 2g_2, 3g_3, 4g_4, . . .}, i.e., {(n + 1)g_{n+1}}. Thus, with a shift, we have "brought down a factor of n" into the terms of the original sequence {g_n}. Equivalently, zG′(z) is the generating function for {ng_n}.
Thus by integrating G(z) we get the generating function for the sequence
{gn−1 /n}.
Example 2.23.1:
It is required to find the generating function of the sequence {1^2, 2^2, 3^2, . . .}.
We have seen above that 1/(1 − x)^2 is the generating function of the sequence {1, 2, 3, . . .}.
By differentiation, we can see that 2/(1 − x)^3 is the generating function of the sequence {2·1, 3·2, 4·3, . . .}. In this sequence, the term with index k is (k + 2)(k + 1), which can be written as (k + 1)^2 + (k + 1). We want the sequence {a_k} where a_k = (k + 1)^2. By subtracting the generating function for the sequence {1, 2, 3, . . .} from that for the sequence {2·1, 3·2, 4·3, . . .}, we get the required answer as [2/(1 − x)^3] − [1/(1 − x)^2].
where the initial condition a0 = 1 has been used and it is assumed that G(z)
and H(z) are the generating functions of the sequences {a0 , a1 , a2 , . . .} and
{b0 , b1 , b2 , . . .} respectively. In a similar manner, from (2.19) we can obtain,
We multiply both sides of the above equation by z n+2 and sum over all n
obtaining,
Σ_{n≥0} a_{n+2} z^{n+2} − 3z Σ_{n≥0} a_{n+1} z^{n+1} + 2z^2 Σ_{n≥0} a_n z^n = Σ_{n≥0} n z^{n+2}.
If we define G(z) = Σ_{n≥0} a_n z^n, then the above equation can be written as,
(G(z) − z − 1) − 3z(G(z) − 1) + 2z 2 G(z) = z 3 /(1 − z)2 .
Note that in obtaining the above we have used the initial conditions a0 =
a1 = 1 and we have used the fact that the infinite sequence {0, 0, 0, 1, 2, 3, . . .}
has the generating function z 3 /(1 − z)2 . From the above, we get G(z) as,
To get [z n ]G(z) , we first express the right-hand side of G(z) above in terms
of partial fractions. We get,
G(z) = 1/(1 − 2z) + 1/(1 − z)^2 − 1/(1 − z)^3.
Using infinite series expansions of the terms on the right, it is easy to check
that,
[z^n]G(z) = a_n = 2^n − (n^2 + n)/2.
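The closed form extracted from the generating function can be checked against the original recurrence a_{n+2} − 3a_{n+1} + 2a_n = n with a_0 = a_1 = 1 (a small sketch):

```python
def a(n):
    # a_n = 2^n - (n^2 + n)/2, read off from the partial-fraction expansion
    return 2**n - (n * n + n) // 2   # n^2 + n is always even

assert a(0) == a(1) == 1                      # initial conditions
for n in range(0, 12):
    assert a(n + 2) - 3 * a(n + 1) + 2 * a(n) == n   # the recurrence
```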
Example 2.23.2:
Consider the following sequence:
2. Consider the non-linear recurrence, a_n = √(a_{n−1} a_{n−2}), n > 1, with a_1 = 2 and a_0 = 1. We take the logarithm on both sides and set b_n = log a_n. This gives,
b_n = (b_{n−1} + b_{n−2})/2.
By definition, B(z) = b0 + b1 z + b2 z 2 + b3 z 3 + . . ..
For n ≥ 1 the number of binary trees with n vertices can be enumerated
as the number of ordered pairs of the form (B1 , B2 ) where B1 and B2 are
binary trees that together have exactly (n − 1) vertices i.e., if B1 has k
vertices then B2 will have (n − k − 1) vertices where k can take the values
0, 1, 2, 3, . . . , (n − 1). Therefore the number bn of such ordered pairs is given
by,
bn = b0 bn−1 + b1 bn−2 + · · · + bn−1 b0 where n ≥ 1.
B(z) = 1 + z{B(z)}2
If z is such a real number so that the power series B(z) converges then B(z)
will also be a real number; then the above quadratic can be solved to give,
B(z) = [1 + √(1 − 4z)]/(2z)  or  B(z) = [1 − √(1 − 4z)]/(2z).
Since zB(z) = b_0 z + b_1 z^2 + b_2 z^3 + · · · tends to 0 as z → 0, we must choose the negative sign, so that
zB(z) = [1 − (1 − 4z)^{1/2}]/2, or B(z) = [1 − (1 − 4z)^{1/2}]/(2z).
By expanding (1 − 4z)^{1/2} in an infinite series we can get the expansion for B(z) and obtain,
b_n = [z^n]B(z) = (1/(n + 1)) C(2n, n).
The numbers bn are known as Catalan numbers.
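The convolution recurrence and the closed form can be cross-checked directly (a sketch; the function name is ours):

```python
from math import comb

def catalan_conv(nmax):
    # b_n = b_0 b_{n-1} + b_1 b_{n-2} + ... + b_{n-1} b_0, with b_0 = 1
    b = [1]
    for n in range(1, nmax + 1):
        b.append(sum(b[k] * b[n - 1 - k] for k in range(n)))
    return b

b = catalan_conv(8)
for n, bn in enumerate(b):
    assert bn == comb(2 * n, n) // (n + 1)   # closed form b_n = C(2n, n)/(n+1)
```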
Problem 2.24.2:
Average-case analysis of a simple algorithm to find the maximum in a list.
    max := X[0]; i := 0;      { X[0] = −∞, a sentinel }
 1: i := i + 1;
    if i > n then goto 2;
    if X[i] > max then max := X[i];
    goto 1;
 2: - - -
Thus on a sequential computer, a compiled form of FindMax will execute:
We can thus conclude that the time, tmaxfn of FindMax has to be of the
form:
tmaxfn = c0 + c1 n + c2 EXCH[X]
where EXCH[X] is the number of times the instruction “max := X[i]” is
executed ( i.e., the number of “exchanges” that has taken place) and c0 , c1
and c2 are the implementation constants dependent on the machine where
the code runs. We note that EXCH[X] is 1 if X[1] is the largest element. Also,
EXCH[X] takes the maximum value n when the array X is already sorted in
the increasing order.
To get an estimate of the expected value of EXCH[X] we introduce the
“permutation model”. In this model we will assume that the array X is a
permutation of the integers (1, . . . , n). Then each permutation can occur (be
an input to FindMax) with equal probability 1/n!. Let sn,k be the num-
ber of those permutations wherein EXCH[X] is k. Then, if p_{n,k} denotes the probability that EXCH[X] equals k, we have p_{n,k} = s_{n,k}/n!, and the expected value of EXCH[X] is Σ_k k · p_{n,k}.
To get the sum on the right-side above, we consider all those permutations
σ1 σ2 σ3 . . . σn of (1, . . . , n) wherein the value EXCH[X] is exactly k (by defini-
tion, there are exactly sn,k of these). With respect to these types of permu-
tations, we reason that the following two cases can occur:
(a) the last element σn is equal to n: in this case σ1 σ2 . . . σn−1 should have
produced exactly k − 1 exchanges because the last element being the
largest will surely force one more exchange. Thus the number of permu-
tations in this case is sn−1,k−1 .
(b) the last element σn is not equal to n: in this case σn is one of 1, 2, 3, . . . , (n−
1). Then σ1 σ2 . . . σn−1 should have produced exactly k exchanges because
the element (being less than the maximum) will not be able to force an
exchange. In this case the number of permutations is (n − 1)sn−1,k .
Thus we have,
sn,k = sn−1,k−1 + (n − 1)sn−1,k (2.23)
Multiplying both sides of (2.23) by x^k, summing over k from 1 through n, and using the definition (2.24), we get,
From the definition (2.24) we find S1 (x) = x. Then from (2.25) we get
S2 (x) = x(x + 1) etc. In general we find that the explicit form of Sn (x) is
given by
S_n(x) = ∏_{j=0}^{n−1} (x + j)      (2.26)
tmaxfn^{AVG} = c_0 + c_1 n + c_2 H_n,
where H_n = 1 + 1/2 + · · · + 1/n denotes the n-th harmonic number.
Exercises
is always divisible by 2.
Step 4. If all sequences have been generated then stop; else go to Step
1.
3. How many ways are there to choose three or more people from a set of
eleven people?
4. Three boys and four girls are to sit on a bench. The boys must sit
together and the girls must sit together. In how many ways can this
be done?
6. Let X ⊆ {1, 2, 3, 4, . . . , (2n − 1)} and let |X| ≥ (n + 1). Prove the following result (due to P. Erdös):
"There are two numbers a, b ∈ X, with a < b, such that a divides b."
In the above problem, if we prescribe |X| = n, will the above result still be true?
8. Without using the general formula for φ(n) above, reason that when n = p^α (α a natural number) is a prime power, φ(n) can be expressed as p^α(1 − 1/p).
10. For an arbitrary natural number n, prove that Σ_{d|n} φ(d) = n (the sum is over all natural numbers d dividing n).
13. Find the number of ways in which eight rooks may be placed on a
conventional 8 × 8 chessboard so that no rook can attack another and
the white diagonal is free of rooks.
Solve the above recurrence and hence show that fn ≤ en!, for all n > 1.
17. This problem is due to H. Larson (1977). Consider the infinite sequence
an with a1 = 1, a5 = 5, a12 = 144 and an + an+3 = 2an+2 .
Prove that an is the nth Fibonacci number.
where, a0 = 5, a1 = 3, a2 = 6, a3 = −21.
an = 6an−1 − 9an−2 ,
3.1 Introduction
3.2 Divisibility
Definition 3.2.1:
Let a and b be any two integers with a ≠ 0. Then b is divisible by a (equivalently, a is a divisor of b) if there exists an integer c such that b = ac.
115
Chapter 3 Basics of Number Theory 116
The proofs of these results are trivial and are left as exercises. (For
instance, to prove (iii), we note that if a divides b1 , . . . , bn , there exist integers
c_1, c_2, . . . , c_n such that b_1 = ac_1, b_2 = ac_2, . . . , b_n = ac_n. Hence b_1x_1 + · · · + b_nx_n = a(c_1x_1 + · · · + c_nx_n), and so a divides b_1x_1 + · · · + b_nx_n.)
Consider the arithmetic progression . . . , b − 2|a|, b − |a|, b, b + |a|, b + 2|a|, . . .
with common difference |a| and extended infinitely in both the directions.
Certainly, this sequence contains a least non-negative integer r. Let this
term be b + q|a|, q ∈ Z. Thus
b + q|a| = r (3.1)
Theorem 3.2.3 gives the division algorithm, that is, the process by means
of which division of one integer by a nonzero integer is carried out. An
algorithm is a step-by-step procedure to solve a given mathematical problem
in finite time. We next present Euclid’s algorithm to determine the gcd of
two integers a and b. Euclid’s algorithm is the first known algorithm in the
mathematical literature. It is just the usual algorithm taught in high-school
algebra.
3.3 Greatest Common Divisor of Two Integers
Definition 3.3.1:
Let a and b be two integers, at least one of which is not zero. A common
divisor of a and b is an integer c(6= 0) such that c | a and c | b. The greatest
common divisor of a and b is the greatest of the common divisors of a and b.
It is denoted by (a, b).
If c divides a and b, then so does −c. Hence (a, b) > 0 and is uniquely
defined. Moreover, if c is a common divisor of a and b, that is, if c | a and
c | b, then
a = a′ c and b = b′ c
(a, b) = c(a′ , b′ )
so that c | (a, b). Thus any common divisor of a and b divides the gcd of a
and b. Hence (a, b) is the unique positive common divisor of a and b that is
divisible by every common divisor of a and b. Moreover, (a, b) = (±a, ±b).
Proposition 3.3.2:
If c | ab and (c, b) = 1, then c | a.
Definition 3.3.3:
If a, b and c are nonzero integers and if a | c and b | c, then c is called a
common multiple of a and b.
The least common multiple (lcm) of a and b is the smallest of the positive
common multiples of a and b and is denoted by [a, b]. As in the case of gcd,
[a, b] = [±a, ±b].
Euclid’s Algorithm
Since (±a, ±b) = (a, b), we may assume without loss of generality that a > 0
and b > 0 and that a > b (If a = b, then (a, b) = (a, a) = a). By the Division
Algorithm,
a = q 1 b + r1 , 0 ≤ r1 < b. (3.2)
b = q 2 r1 + r2 , 0 ≤ r2 < r1 . (3.3)
r1 = q 3 r2 + r3 , 0 ≤ r3 < r2 . (3.4)
Continuing in this way, let rj be the last nonzero remainder. Then (a, b) = rj .
Since rj | rj−1 , rj divides the expression on the right side of (3.7), and so rj |
rj−2 . Going backward, we get successively that rj divides rj−1 , rj−2 , . . . , r1 , b
and a. Thus rj | a and rj | b.
Next, let c | a and c | b. Then from equation (3.2), c | r1 , and this when
substituted in equation (3.3), gives c | r2 . Thus the successive equations of
Euclid’s algorithm show that c | rj . Hence rj = (a, b).
Theorem 3.3.4:
If rj = (a, b), then it is possible to find integers x and y such that
ax + by = rj (3.8)
Equation (3.9) expresses rj in terms of rj−1 and rj−2 while the equation
preceding it expresses rj−1 in terms of rj−2 and rj−3 . Thus
rj = rj−2 − qj rj−1 = rj−2 − qj (rj−3 − qj−1 rj−2 ) = (1 + qj qj−1 ) rj−2 − qj rj−3 .
Thus we have expressed rj as a linear combination of rj−2 and rj−3 , the coef-
ficients being integers. Working backward, we get rj as a linear combination
of a and b, with the coefficients being integers.
The process given in the proof of Theorem 3.3.4 is known as the Extended
Euclidean Algorithm.
Corollary 3.3.5:
If (a, m) = 1, then there exists an integer u such that au ≡ 1(mod m) and
any two such integers are congruent modulo m.
Example 3.3.6:
Find integers x and y so that 120x + 70y = 1.
We apply Euclid’s algorithm to a = 120 and b = 70. We have
120 = 1 · 70 + 50
70 = 1 · 50 + 20
50 = 2 · 20 + 10
20 = 2 · 10
Now starting from the last but one equation and going backward, we get
10 = 50 − 2 · 20
= 50 − 2(70 − 1 · 50)
= 3 · 50 − 2 · 70
= 3 · (120 − 1 · 70) − 2 · 70
= 3 · 120 − 5 · 70.
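The back-substitution just performed is mechanized by the extended Euclidean algorithm of Theorem 3.3.4. A minimal sketch in Python (the function name is ours):

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = extended_gcd(b, a % b)
    # g == b*x + (a % b)*y and a % b == a - (a//b)*b
    return (g, y, x - (a // b) * y)

g, x, y = extended_gcd(120, 70)
print(g, x, y)  # 10 3 -5, matching 10 = 3 · 120 − 5 · 70
```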
3.4 Primes
Definition 3.4.1:
An integer n > 1 is a prime if its only positive divisors are 1 and n. A natural
number greater than 1 which is not a prime is a composite number.
Naturally, 2 is the only even prime. 3, 5, 7, 11, 13, 17, . . . are all odd
primes. The composite numbers are 4, 6, 8, 9, 10, . . .
Theorem 3.4.2:
Every integer n > 1 can be expressed as a product of primes, unless the
number n itself is a prime.
Proof. The result is obvious for n = 2, 3 and 4. So assume that n > 4 and
apply induction. If n is a prime there is nothing to prove. If n is not a prime,
then n = n1 n2 , where 1 < n1 < n and 1 < n2 < n. By induction hypothesis,
both n1 and n2 are products of primes. Hence n itself is a product of primes.
(Note that the prime factors of n need not all be distinct.)
We now show that this factorization is unique in the sense that in any prime
factorization, the prime factors that occur are the same and that the prime
powers are also the same except for the order of the prime factors. For
instance, 200 = 2^3 × 5^2 , and the only other way to write it in the form (3.10)
is 5^2 × 2^3 .
Lemma 3.4.4:
If p is a prime such that p | (ab), but p ∤ a, then p | b.
Note 3.4.5:
Lemma 3.4.4 implies that if p is a prime and p ∤ a and p ∤ b, then p ∤
(ab). More generally, p ∤ a1 , p ∤ a2 , . . . p ∤ an imply that p ∤ (a1 a2 · · · an ).
Consequently if p | (a1 a2 · · · an ), then p must divide at least one ai , 1 ≤ i ≤ n.
n = p1^α1 p2^α2 · · · pr^αr = q1^β1 q2^β2 · · · qj^βj · · · qs^βs (3.11)
are two prime factorizations of n, where the pi ’s and qi ’s are all primes. As
p1 | (p1^α1 · · · pr^αr ), p1 | q1^β1 · · · qs^βs . Hence by Note 3.4.5, p1 must divide some
qj . As p1 and qj are primes and p1 | qj , p1 = qj . Cancelling p1 on both the
sides, we get
p1^(α1−1) p2^α2 · · · pr^αr = q1^β1 q2^β2 · · · qj^(βj−1) · · · qs^βs . (3.12)
p2^α2 · · · pr^αr = q1^β1 q2^β2 · · · qj^(βj−α1) · · · qs^βs . (3.13)
Now p1 divides the right hand expression of (3.13) and so must divide the
left hand expression of (3.13). But this is impossible as the pi ’s are distinct
primes. Hence α1 = βj . Cancellation of p1^α1 on both sides of (3.11) yields
p2^α2 · · · pr^αr = q1^β1 q2^β2 · · · qj−1^βj−1 qj+1^βj+1 · · · qs^βs .
a = 2^3 · 3^2 · 5^0
and b = 2^0 · 3^2 · 5^1 .
Then clearly,
(a, b) = ∏_{i=1}^{r} pi^min(αi, βi) , and
[a, b] = ∏_{i=1}^{r} pi^max(αi, βi) .
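These min/max formulas are easy to check numerically; the sketch below factors both numbers by trial division (helper names are ours):

```python
from collections import Counter

def factorize(n):
    """Prime factorization of n as a Counter {prime: exponent}."""
    f, p = Counter(), 2
    while p * p <= n:
        while n % p == 0:
            f[p] += 1
            n //= p
        p += 1
    if n > 1:
        f[n] += 1
    return f

def gcd_lcm(a, b):
    fa, fb = factorize(a), factorize(b)
    g = l = 1
    for p in set(fa) | set(fb):
        g *= p ** min(fa[p], fb[p])   # min exponent -> gcd
        l *= p ** max(fa[p], fb[p])   # max exponent -> lcm
    return g, l

print(gcd_lcm(8 * 9, 9 * 5))  # a = 2^3·3^2, b = 3^2·5 -> (9, 360)
```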
Proof. The proof is by contradiction. Suppose there are only finitely many
primes, say, p1 , p2 , . . . , pr . Then the number n = 1 + p1 p2 · · · pr is larger
than each pi , 1 ≤ i ≤ r, and hence composite. Now any composite number
is divisible by some prime. But none of the primes pi , 1 ≤ i ≤ r, divides
n. (For, if pi divides n, then pi | 1, an impossibility). Hence the number of
primes is infinite.
Definition 3.4.8:
Two numbers a and b are coprime or relatively prime if they are prime to
each other, that is, if (a, b) = 1.
3.5 Exercises
3. Use the unique factorization theorem to prove that for any two positive
integers a and b, (a, b)[a, b] = ab, and that (a, b) | [a, b]. (Remark:
This shows that if (a, b) = 1, then [a, b] = ab. More generally, if
{a1 , a2 , . . . , ar }, is any set of positive integers, then (a1 , a2 , . . . , ar )
divides [a1 , a2 , . . . , ar ]. Here (a1 , a2 , . . . , ar ) and [a1 , a2 , . . . , ar ] denote
respectively the gcd and lcm of the numbers a1 , a2 , . . . , ar .)
5. Show that there exist infinitely many pairs (x, y) such that x + y = 72
and (x, y) = 9.
Hint: One choice for (x, y) is (63, 9). Take x′ prime to 8, that is,
(x′ , 8) = 1, and take y ′ = 8 − x′ . Now use the pairs (9x′ , 9y ′ ).
6. If a+b = c, show that (a, c) = 1, iff (b, c) = 1. Hence show that any two
consecutive Fibonacci numbers are coprime. (The Fibonacci numbers
9. For a positive integer n, show that there exist integers a and b such
that (a, b) = d and ab = n if and only if d^2 | n.
(Hint: By Exercise 3 above, (a, b)[a, b] = ab = n, and (a, b) | [a, b].
Hence d^2 | n. Conversely, if d^2 | n, then n = d^2 c. Now take a = d and b = dc.)
12. Let pn denote the n-th prime (p1 = 2, p2 = 3 and so on). Prove that
pn < 2^(2^n) .
13. Prove that if an = 2^(2^n) + 1, then (an , an+1 ) = 1 for each n ≥ 1. (Hint:
Set 2^(2^n) = x.)
14. If n = p1^a1 · · · pr^ar is the prime factorization of n, show that d(n), the
number of distinct divisors of n, is (a1 + 1) · · · (ar + 1).
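The divisor-count formula of Exercise 14 can be checked numerically; a sketch using trial-division factoring (the function name is ours):

```python
def num_divisors(n):
    """d(n) = (a1+1)...(ar+1) for n = p1^a1 ... pr^ar."""
    d, p = 1, 2
    while p * p <= n:
        a = 0
        while n % p == 0:
            a += 1
            n //= p
        d *= a + 1
        p += 1
    if n > 1:
        d *= 2  # one leftover prime factor with exponent 1
    return d

# 200 = 2^3 · 5^2 has (3+1)(2+1) = 12 divisors
print(num_divisors(200))  # 12
```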
3.6 Congruences
Definition 3.6.1:
Given integers a, b and n (≠ 0), a is said to be congruent to b modulo n, if
a − b is divisible by n, that is, a − b is a multiple of n. In symbols, it is
denoted by a ≡ b (mod n), and is read as a is congruent to b modulo n. The
number n is the modulus of the congruence.
Definition 3.6.2:
If f (x), g(x) and h(x) (≠ 0) are any three polynomials with real coefficients
then by f (x) ≡ g(x) (mod h(x)), we mean that f (x) − g(x) is divisible by
h(x) over R, that is to say, there exists a polynomial q(x) with real coefficients
such that
f (x) − g(x) = q(x)h(x).
Proof. We prove only 3 and 4; the rest follow immediately from the definition.
Proposition 3.6.4:
If ab ≡ ac (mod m), and (a, m) = 1, then b ≡ c (mod m).
Corollary 3.6.5:
If ab ≡ ac (mod m), then b ≡ c (mod m/d), where d = (a, m).
Proposition 3.6.6:
If (a, m) = (b, m) = 1, then (ab, m) = 1.
Proposition 3.6.7:
If ax ≡ 1 (mod m) and (a, m) = 1, then (x, m) = 1.
Proposition 3.6.8:
If a ≡ b (mod mi ), 1 ≤ i ≤ r, then a ≡ b (mod [m1 , . . . , mr ]), where
[m1 , . . . , mr ] stands for the lcm of m1 , . . . , mr .
Definition 3.7.1:
Given a positive integer m, a set S = {x1 , . . . , xm } of m numbers is called
a complete system of residues modulo m if for any integer x, there exists a
unique xi ∈ S such that x ≡ xi (mod m).
Euler’s φ-Function
Definition 3.7.2:
The Euler function φ(n) (also called the totient function) is defined to be
the number of positive integers less than n and prime to n. It is also the
cardinality of a reduced residue system modulo n.
We have seen earlier that φ(10) = 4. We note that φ(12) is also equal to
4 since 1, 5, 7, 11 are all the numbers less than 12 and prime to 12. If p is a
prime, then all the numbers in {1, 2, . . . , p − 1} are less than p and prime to
p, and hence φ(p) = p − 1.
Proof. Let r1 , . . . , rφ(n) be a reduced residue system modulo n. Now
(ri , n) = 1 for each i, 1 ≤ i ≤ φ(n). Further, as (a, n) = 1, by Proposi-
tion 3.6.6, (ari , n) = 1. Moreover, if i ≠ j, then ari ≢ arj (mod n). For, ari ≡ arj
(mod n) implies (as (a, n) = 1), by virtue of Proposition 3.6.4, that ri ≡ rj
(mod n), a contradiction to the fact that r1 , . . . , rφ(n) is a reduced residue
system modulo n. Hence ar1 , . . . , arφ(n) is also a reduced residue system
modulo n and
∏_{i=1}^{φ(n)} (ari ) ≡ ∏_{j=1}^{φ(n)} rj ( mod n).
This gives that a^φ(n) ∏_{i=1}^{φ(n)} ri ≡ ∏_{j=1}^{φ(n)} rj (mod n). Further, (ri , n) = 1 for
each i = 1, 2, . . . , φ(n) gives that ( ∏_{i=1}^{φ(n)} ri , n ) = 1 by Proposition 3.6.6.
Consequently, by Proposition 3.6.4, a^φ(n) ≡ 1 (mod n).
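Euler's theorem — a^φ(n) ≡ 1 (mod n) whenever (a, n) = 1 — is easy to verify by brute force; here φ is computed by direct count (names are ours):

```python
from math import gcd

def phi(n):
    """Euler's totient, by direct count of integers prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# Check a^phi(n) == 1 (mod n) for every a coprime to n
for n in range(2, 50):
    for a in range(1, n):
        if gcd(a, n) == 1:
            assert pow(a, phi(n), n) == 1
print("Euler's theorem verified for n < 50")
```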
We see more properties of the Euler function φ(n) in Section 3.11. An-
other interesting theorem in elementary number theory is Wilson’s theorem.
Theorem 3.7.5:
If u ∈ [1, m − 1] is a solution of the congruence ax ≡ 1 (mod m), then all
the solutions of the congruence are given by u + km, k ∈ Z. In particular,
there exists a unique u ∈ [1, m − 1] such that au ≡ 1(mod m).
It suffices to prove that
2 · 3 · · · (p − 2) ≡ 1 ( mod p) and p − 1 ≡ −1 ( mod p),
since the multiplication of the three congruences 1 ≡ 1, 2 · 3 · · · (p − 2) ≡ 1
and p − 1 ≡ −1 (mod p) (See Proposition 3.6.3) will yield the required result.
Now, as p (≥ 5) is an odd prime, the cardinality of L is even, where
L = {2, 3, . . . , p − 2}. For each i ∈ L, by virtue of Corollary 3.3.5, there
exists a unique j ∈ [1, p − 1] with ij ≡ 1 (mod p); moreover, j ∈ L and j ≠ i
(since i^2 ≡ 1 (mod p) would force i ≡ ±1 (mod p)). Pairing off the elements
of L with their inverses, we get
2 · 3 · · · (p − 2) ≡ 1 · 1 · · · 1 ≡ 1 ( mod p)
Example 3.7.7:
As an application of Wilson’s theorem, we prove that 712! + 1 ≡ 0 (mod 719).
Note that 719 is a prime, so that by Wilson’s theorem, 718! ≡ −1 (mod 719).
Now
718! = 718 · 717 · 716 · 715 · 714 · 713 · 712!
≡ (−1)(−2)(−3)(−4)(−5)(−6) × 712! ( mod 719)
≡ 712! × 6! ( mod 719)
≡ 712! × 720 ( mod 719)
≡ 712! × (719 + 1) ( mod 719)
≡ (712! × 719) + 712! ( mod 719)
≡ 712! ( mod 719).
Hence 712! ≡ −1 (mod 719), that is, 712! + 1 ≡ 0 (mod 719).
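Both Wilson's theorem and the computation above can be checked by reducing the factorial modulo 719 at every step (the helper name is ours):

```python
def factorial_mod(n, m):
    """n! reduced modulo m at every step, to keep numbers small."""
    r = 1
    for k in range(2, n + 1):
        r = r * k % m
    return r

# Wilson: (p-1)! == -1 (mod p) for primes p
for p in (2, 3, 5, 7, 11, 13, 719):
    assert factorial_mod(p - 1, p) == p - 1   # p - 1 is -1 mod p

# The worked example: 712! + 1 == 0 (mod 719)
print((factorial_mod(712, 719) + 1) % 719)  # 0
```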
Theorem 3.7.8:
If f (x) is a polynomial with integer coefficients, and a ≡ b(mod m), then
f (a) ≡ f (b) (mod m).
3.8 Linear Congruences and the Chinese Remainder Theorem
ax ≡ b ( mod m) (3.17)
Theorem 3.8.1:
Let (a, m) = 1 and b an integer. Then the linear congruence
ax ≡ b ( mod m) (3.18)
has a solution, and any two solutions are congruent modulo m.
ax ≡ b ( mod m)
ax ≡ 1 ( mod m)
Theorem 3.8.2:
Let (a, m) = d. Then the congruence
ax ≡ b ( mod m) (3.19)
has a solution if and only if d | b. In that case, (3.19) is equivalent to
(a/d) x ≡ (b/d) ( mod m/d),
and therefore to
a0 x ≡ b0 ( mod m0 ), (3.21)
where a0 = a/d, b0 = b/d and m0 = m/d.
Suppose there is more than one linear congruence. In general, they need
not possess a common solution. (In fact, as seen earlier, even a single linear
congruence may not have a solution.) The Chinese Remainder Theorem
ensures that if the moduli of the linear congruences are pairwise coprime,
then the simultaneous congruences all have a common solution. To start
with, consider congruences of the form x ≡ bi (mod mi ).
Theorem 3.8.3:
Let m1 , . . . , mr be positive integers that are pairwise coprime, that is, (mi , mj ) =
1 whenever i ≠ j. Let b1 , . . . , br be arbitrary integers. Then the system of
congruences
x ≡ b1 ( mod m1 )
..
.
x ≡ br ( mod mr )
x ≡ bi Mi Mi′ ( mod mi )
and, therefore,
x ≡ y ( mod mi ), 1 ≤ i ≤ r.
x ≡ y ( mod M ).
a1 x ≡ b1 ( mod m1 )
..
.
ar x ≡ br ( mod mr )
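The constructive proof of Theorem 3.8.3 translates directly into code. A sketch assuming pairwise-coprime moduli (the function name is ours; pow(Mi, -1, m) computes the inverse Mi′ and needs Python ≥ 3.8):

```python
from math import prod

def crt(residues, moduli):
    """Solve x == b_i (mod m_i) for pairwise-coprime m_i.

    Returns the unique solution modulo M = m_1 * ... * m_r.
    """
    M = prod(moduli)
    x = 0
    for b, m in zip(residues, moduli):
        Mi = M // m                # product of the other moduli
        Mi_inv = pow(Mi, -1, m)    # Mi' with Mi * Mi' == 1 (mod m)
        x += b * Mi * Mi_inv
    return x % M

x = crt([2, 3, 2], [3, 5, 7])
print(x)  # 23: x == 2 (mod 3), 3 (mod 5), 2 (mod 7)
```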
axes visible from the origin. Further, the point (2, 3) is visible from the
origin, but (2, 2) is not (See Figure 3.1). Hence we consider here lattice
points (a, b) not on the coordinate axes but visible from the origin. Without
loss of generality, we may assume that a ≥ 1 and b ≥ 1.
[Figure 3.1: Lattice points visible from the origin — (1, 1) and (2, 3) are
visible, while (2, 2) is not. Figure 3.2: the point (a′ , b′ ) on the segment
joining O to (a, b).]
Lemma 3.9.1:
The lattice point (a, b) (not belonging to any of the coordinate axes) is visible
from the origin iff (a, b) = 1.
Proof. Suppose first that (a, b) = d > 1. Then a = da′ and b = db′ for positive
integers a′ , b′ . Then the lattice point (a′ , b′ ) lies on the segment joining (0,
0) with (a, b), and since a′ < a and b′ < b, (a, b) is not visible from the
origin.
Corollary 3.9.2:
The lattice point (a, b) is visible from the lattice point (c, d) iff (a−c, b−d) =
1.
Proof. Shift the origin to (c, d) through parallel axes. Then, the new origin
is (c, d) and the new coordinates of the original point (a, b) with respect to
the new axes are (a − c, b − d). Now apply Lemma 3.9.1.
Theorem 3.9.3:
The set of lattice points visible from the origin contains arbitrarily large
square gaps. That is, given any positive integer k, there exists a lattice point
(a, b) such that none of the lattice points
(a + r, b + s), 1 ≤ r ≤ k, 1 ≤ s ≤ k,
Proof. Let {p1 , p2 , . . .} be the sequence of primes. Given the positive integer
k, construct a k by k matrix M whose first row is the sequence of first
k primes p1 , p2 , . . . , pk , the second row is the sequence of next k primes,
[Figure: the k × k square of lattice points (a + r, b + s), 1 ≤ r ≤ k,
1 ≤ s ≤ k, none of which is visible from the origin.]
Let mi (resp. Mi ) be the product of the k primes in the i-th row (resp.
column) of M . Then for i 6= j, (mi , mj ) = 1 and (Mi , Mj ) = 1 because in
the products mi and mj (resp. Mi and Mj ), there is no repetition of any
prime. Now by Chinese Remainder Theorem, the set of congruences
x ≡ −1 ( mod m1 )
x ≡ −2 ( mod m2 )
..
.
x ≡ −k ( mod mk )
y ≡ −1 ( mod M1 )
y ≡ −2 ( mod M2 )
..
.
y ≡ −k ( mod Mk )
3.10 Exercises
2. Show that the sum of the numbers less than n and prime to n is nφ(n)/2.
Definition 3.11.1:
An arithmetical function or a number-theoretical function is a function whose
domain is the set of natural numbers and codomain is the set of real or
complex numbers.
Definition 3.11.2:
The Möbius function µ(n) is defined as follows:
µ(1) = 1; if n > 1 and n = p1^a1 · · · pr^ar is the prime factorization of n, then
µ(n) = (−1)^r if a1 = a2 = · · · = ar = 1, and µ(n) = 0 otherwise.
Theorem 3.11.3:
If n ≥ 1, we have
Σ_{d|n} µ(d) = ⌊1/n⌋ = 1 if n = 1, and 0 if n > 1. (3.23)
(Recall that for any real number x, ⌊x⌋ stands for the floor of x, that is,
the greatest integer not greater than x. For example, ⌊15/2⌋ = 7.)
Proof. The result is obvious for n = 1. So let n > 1, and let n = p1^a1 · · · pr^ar
be the prime factorization of n, where each ai ≥ 1. The divisors of n for which
the µ-function has nonzero values are the numbers in the set
{p1^σ1 · · · pr^σr : σi = 0 or 1, 1 ≤ i ≤ r} =
{1; p1 , . . . , pr ; p1 p2 , p1 p3 , . . . , pr−1 pr ; . . . ; p1 p2 · · · pr }. Now µ(1) = 1; µ(pi ) =
(−1)^1 = −1; µ(pi pj ) = (−1)^2 = 1; µ(pi pj pk ) = (−1)^3 = −1 and so on.
Further, the number of terms of the form pi is C(r, 1), of the form pi pj is
C(r, 2), and so on, where C(r, k) denotes the binomial coefficient “r choose k”.
Hence if n > 1,
Σ_{d|n} µ(d) = 1 − C(r, 1) + C(r, 2) − · · · + (−1)^r C(r, r) = (1 − 1)^r = 0.
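The µ-function and Theorem 3.11.3 can be checked numerically; a trial-division sketch (the function name is ours):

```python
def mobius(n):
    """mu(n): 0 if a square divides n, else (-1)^(number of prime factors)."""
    if n == 1:
        return 1
    r, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            r += 1
        p += 1
    if n > 1:
        r += 1
    return -1 if r % 2 else 1

# Theorem 3.11.3: the sum of mu(d) over the divisors d of n is 1 iff n == 1
for n in range(1, 100):
    s = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert s == (1 if n == 1 else 0)
print("Theorem 3.11.3 verified for n < 100")
```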
Theorem 3.11.4:
If n ≥ 1, we have
φ(n) = Σ_{d|n} µ(d) (n/d). (3.24)
Proof. If (n, k) = 1, then ⌊1/(n, k)⌋ = ⌊1/1⌋ = 1, while if (n, k) > 1,
⌊1/(n, k)⌋ = ⌊a positive number less than 1⌋ = 0. Hence
φ(n) = Σ_{k=1}^{n} ⌊1/(n, k)⌋ .
Hence, by Theorem 3.11.3,
φ(n) = Σ_{k=1}^{n} Σ_{d|(n,k)} µ(d) = Σ_{k=1}^{n} Σ_{d|n, d|k} µ(d). (3.25)
For a fixed divisor d of n, we must sum over all those k in the range 1 ≤ k ≤ n
which are multiples of d. Hence if we take k = qd, then 1 ≤ q ≤ n/d.
Therefore (3.25) reduces to
φ(n) = Σ_{d|n} Σ_{q=1}^{n/d} µ(d) = Σ_{d|n} µ(d) Σ_{q=1}^{n/d} 1 = Σ_{d|n} µ(d) (n/d).
Theorem 3.11.5:
If n ≥ 1, we have Σ_{d|n} φ(d) = n.
Theorem 3.11.6:
For n ≥ 2, we have
φ(n) = n ∏_{p|n, p a prime} (1 − 1/p). (3.27)
Proof. We use the formula φ(n) = Σ_{d|n} µ(d) (n/d) of Theorem 3.11.4 for the
proof. Let p1 , . . . , pr be the distinct prime factors of n. Then
∏_{p|n, p a prime} (1 − 1/p) = ∏_{i=1}^{r} (1 − 1/pi )
= 1 − Σ_i 1/pi + Σ_{i≠j} 1/(pi pj ) − Σ 1/(pi pj pk ) + · · · , (3.28)
where, for example, the sum Σ 1/(pi pj pk ) is formed by taking distinct prime
divisors pi , pj and pk of n. Now, by definition of the µ-function, µ(pi ) = −1,
µ(pi pj ) = 1, µ(pi pj pk ) = −1 and so on. Hence the sum on the right side of
(3.28) is equal to
1 + Σ_{pi} µ(pi )/pi + Σ_{pi ,pj} µ(pi pj )/(pi pj ) + · · · = Σ_{d|n} µ(d)/d ,
since all the other divisors of n, that is, divisors which are not products of
distinct primes, contain a square and hence their µ-values are zero. Thus
n ∏_{p|n, p a prime} (1 − 1/p) = Σ_{d|n} µ(d) (n/d) = φ(n) (by Theorem 3.11.4).
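Theorem 3.11.6 gives a practical way to compute φ(n). To keep the arithmetic exact, the sketch below applies each factor (1 − 1/p) as the integer step result → result − result/p (the function name is ours):

```python
def phi(n):
    """Euler's totient via phi(n) = n * prod(1 - 1/p) over primes p | n."""
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p  # multiply by (1 - 1/p), exactly
        p += 1
    if n > 1:                      # one prime factor left over
        result -= result // n
    return result

print(phi(10), phi(12), phi(7))  # 4 4 6
```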
(v) φ(n) is even for n ≥ 3. Moreover, if n has k distinct odd prime factors,
then 2^k | φ(n).
(ii) We have φ(mn) = mn ∏_{p|mn, p a prime} (1 − 1/p). If p is a prime that divides
mn, then p divides either m or n. But then there may be primes which
divide both m and n and these are precisely the prime factors of (m, n).
∏_{p|mn} (1 − 1/p) = [ ∏_{p|m} (1 − 1/p) · ∏_{p|n} (1 − 1/p) ] / ∏_{p|(m,n)} (1 − 1/p)
= [ (φ(m)/m) · (φ(n)/n) ] / (φ(d)/d) (by the product formula)
= (1/mn) φ(m)φ(n) (d/φ(d)), where d = (m, n).
This gives the required result, since the term on the left is φ(mn)/mn.
An Application
We prove that the determinant of the n × n matrix S whose (i, j)-th entry is
(i, j), the gcd of i and j, equals φ(1)φ(2) · · · φ(n). Let A = (aij ) be the n × n
matrix with aij = 1 if i | j and aij = 0 otherwise, and let D be the diagonal
matrix with diagonal entries φ(1), φ(2), . . . , φ(n).
Then A is an upper triangular matrix (See Definition ??) with all diagonal
entries equal to 1. Hence det A = 1 = det At . Set S = At DA. Then
det S = det At · det D · det A
= 1 · (φ(1)φ(2) · · · φ(n)) · 1
= φ(1)φ(2) · · · φ(n).
We now show that S = (sij ), where sij = (i, j). This would prove our
statement.
Now At = (bij ), where bij = aji . Hence if D is the matrix (dαβ ), then the
(i, j)-th entry of S is given by:
sij = Σ_{α=1}^{n} Σ_{β=1}^{n} biα dαβ aβj .
3.12 Exercises
3. Prove that the sum of the positive integers less than n and prime to n
is nφ(n)/2.
4. Let σ(n) denote the sum of the divisors of n. Prove that σ(n) is multi-
plicative. Hence prove that if n = p1^a1 · · · pr^ar is the prime factorization
of n, then
σ(n) = ∏_{i=1}^{r} (pi^(ai+1) − 1)/(pi − 1).
The big O notation is used mainly to express an upper bound for a given
arithmetical function in terms of another, simpler arithmetical function.
Definition 3.13.1:
Let f : N → C be an arithmetical function. Then f (n) is O(g(n)) (read big
O of g(n)), where g(n) is another arithmetical function, provided that there
exists a constant K > 0 such that |f (n)| ≤ K|g(n)| for all n ∈ N.
More generally, we have the following definition for any real-valued func-
tion.
Definition 3.13.2:
Let f : R → C be a real valued function. Then f (x) = O g(x) , where
g : R → C is another function if there exists a constant K > 0 such that
Definition 3.13.3:
Let f : N → C be an arithmetical function. Then f (n) is O(g(n)), where
g(n) is another arithmetical function if there exists a constant K > 0 such
that
Definition 3.13.4:
Let n1 , . . . , nr be positive integers and let ni be a ki -bit integer (so that the
size of ni is ki ), 1 ≤ i ≤ r. An algorithm to perform a computation involving
n1 , . . . , nr is said to be a polynomial–time algorithm if there exist nonnegative
integers m1 , . . . , mr such that the number of bit operations required to perform
the algorithm is O(k1^m1 · · · kr^mr ).
Recall that the size of a positive integer is the number of bits in it. For
instance, 8 = (1000) and 9 = (1001) in binary. So both 8 and 9 are of size 4.
In fact, all numbers n such that 2^(k−1) ≤ n < 2^k are k-bit numbers. Taking
logarithms to base 2, we get k − 1 ≤ log2 n < k and hence k − 1 ≤ ⌊log2 n⌋ < k,
so that ⌊log2 n⌋ = k − 1. Thus k = 1 + ⌊log2 n⌋ and hence k is O(log2 n). Thus
we have proved the following result.
Theorem 3.13.5:
The size of n is O(log2 n).
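In Python this size is exactly int.bit_length(), which can be checked against 1 + ⌊log2 n⌋:

```python
from math import floor, log2

# The size (number of bits) of n equals 1 + floor(log2 n).
for n in (1, 8, 9, 1023, 1024):
    assert n.bit_length() == 1 + floor(log2(n))

print((8).bit_length(), (9).bit_length())  # 4 4 -- both have size 4
```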
Note that in writing O(log n), the base of the logarithm is immaterial.
For, if the base is b, then any number that is O(log2 n) is O(logb n) and vice
versa. This is because log2 n = logb n · log2 b, and log2 b can be absorbed in
the constant K of Definition 3.13.1.
Example 3.13.6:
Let g(n) be a polynomial of degree t. Then g(n) is O(nt ).
Theorem 3.13.7:
Euclid’s algorithm is a polynomial time algorithm.
Proof. We show that Euclid’s algorithm of computing the gcd (a, b), a > b,
can be performed in time O(log3 a) .
Adopting the same notation as in (3.5), we have
rj+2 < (1/2) rj
for each j. This means that the remainder in every other step in the
Euclidean algorithm is less than half of the original remainder. Hence if
a = O(2^k ), then there are at most O(k) steps in the Euclidean algorithm.
Each step is a division, which takes O(log^2 a) bit operations; as k = O(log a),
the total is O(log^3 a) bit operations.
Suppose now that we wish to compute a^c (mod m), where c is a k-bit
number with binary digits bk−1 , bk−2 , . . . , b0 (so that bk−1 = 1). Then
c = bk−1 2^(k−1) + bk−2 2^(k−2) + · · · + b0 2^0 , and therefore
a^c = a^(bk−1 2^(k−1)) · a^(bk−2 2^(k−2)) · · · a^(b1 · 2) · a^(b0) ,
where each bi = 0 or 1. We now compute a^c (mod m) recursively by reducing
the number computed at each step modulo m. Set
y0 = a^(bk−1) = a.
There are k − 1 steps in the algorithm. Note that yi+1 is computed by squaring
yi and multiplying the resulting number by 1 if bk−i−2 = 0, or else multiplying
the resulting number by a if bk−i−2 = 1. Now yi (mod m) being an O(log m)-bit
number, to compute yi^2 we make O(log^2 m) = O(t^2) bit operations, where
t = O(log m). yi being a t-bit number, yi^2 is a 2t- or (2t + 1)-bit number and
so it is also an O(t)-bit number. Now we reduce yi^2 modulo m, that is, we
divide the O(t)-bit number yi^2 by the O(t)-bit number m. Hence this requires
an additional O(t^2) bit operations. Thus in all we have performed until now
O(t^2) + O(t^2) bit operations, that is, O(t^2) bit operations. Having computed
yi^2 (mod m), we next multiply it by a^0 or a^1. As a is an O(log m) = O(t)-bit
number, this requires O(t^2) bit operations. Thus in all, computation of yi+1
from yi requires O(t^2) bit operations. But then there are k − 1 = O(log c)
steps in the algorithm. Thus the number of bit operations in the computation
of a^c (mod m) is O(log c · log^2 m) = O(kt^2). Thus the algorithm is a polynomial–
time algorithm.
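The repeated-squaring scheme just analyzed can be sketched as follows; Python's built-in pow(a, c, m) performs the same computation:

```python
def power_mod(a, c, m):
    """Compute a^c (mod m) by scanning the bits of c from the top."""
    y = 1
    for bit in bin(c)[2:]:       # b_{k-1}, ..., b_0
        y = y * y % m            # squaring step
        if bit == '1':
            y = y * a % m        # multiply by a when the bit is 1
    return y

print(power_mod(7, 712, 719) == pow(7, 712, 719))  # True
```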
Next, we give an algorithm that is not a polynomial–time algorithm.
(Sieve of Eratosthenes)
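The sieve named here lists all primes up to a bound by repeatedly crossing out the multiples of each prime found; a standard sketch (the function name is ours):

```python
def sieve(limit):
    """Sieve of Eratosthenes: return all primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # cross out multiples of p, starting at p*p
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n in range(2, limit + 1) if is_prime[n]]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```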
Chapter 4
Mathematical Logic
The study of logic can be traced back to the ancient Greek philosopher Aris-
totle (384–322 B.C.). Modern logic started seriously in the mid-19th century
mainly due to the British mathematicians George Boole and Augustus de
Morgan. The German mathematician and philosopher Gottlob Frege (1848–
1925) is widely regarded as the founder of modern mathematical logic. Logic
is implicit in every form of common reasoning. It is concerned with the rela-
tionships between language (syntax), reasoning (deduction and computation)
and meaning (semantics). A simple and popular definition of logic is that it
is the analysis of the methods of reasoning. In the study of these methods,
logic is concerned with the form of arguments rather than the contents or the
meanings associated with the statements. To illustrate this point, consider
the following arguments:
(a) All men are mortal. Socrates is a man. Therefore Socrates is mortal.
Both (a) and (b) have the same form: All A are B; x is an A; therefore x
is a B. The truth or the falsity of the premise and the conclusion is not the
primary concern. Again, consider the following pattern of argument:
4.1 Preliminaries
“ 2 is an even number”
“ 52! is always less than 100!”
“If x, y and z are the sides of a triangle, then x + y = z”
“Heterological is heterological”.
p  ¬p          p  q  p∧q        p  q  p∨q
⊤  ⊥           ⊥  ⊥  ⊥          ⊥  ⊥  ⊥
⊥  ⊤           ⊥  ⊤  ⊥          ⊥  ⊤  ⊤
               ⊤  ⊥  ⊥          ⊤  ⊥  ⊤
               ⊤  ⊤  ⊤          ⊤  ⊤  ⊤
realize that money is sufficient to buy and therefore own a car. By a similar
reasoning, the implication, “I have money” implies “I own a car” appears to
be true. The implication, “I don’t have money” implies “I don’t own a car”
appears not to be a false statement. The implication, “I don’t have money”
implies “I own a car” is not clear. If money is a necessary prerequisite to
buy, and therefore own a car, then this last implication is false. But since p
and q are unrelated, we can, in a relaxed manner, reason that in the absence
of money also, owning a car may be possible. We make this allowance and
take the implication to be true.
If p ⇒ q is true then p is said to be a stronger assertion than q. For
example, consider the implications,
Example 4.1.1:
f (x) = x^2 ⇒ f ′ (x) = 2x has the contrapositive form f ′ (x) ≠ 2x ⇒ f (x) ≠ x^2
Example 4.1.2:
a ≥ 0 ⇒ √a is real has the contrapositive form: √a is not real ⇒ a < 0.
The biconditional of p and q has the truth-value ⊤ if p and q both have the
truth-value ⊤ or both have the truth-value ⊥; otherwise, the biconditional
has truth-value ⊥.
Truth Values
Using the different logical connectives introduced above, given any set of
propositional variables, we can form meaningful fully parenthesized proposi-
tions or formulas. Below, we first define the rules which describe the syntactic
structure of such formulas:
(ii) Assertions denoted by small letters such as p, q etc., are atomic propo-
sitions or formulas (which are defined to take values ⊥ or ⊤).
(iv) If P and Q are any propositions then so are (P ∧ Q), (P ∨ Q), (P ⊕ Q),
(P ⇒ Q) and (P ⇔ Q).
Note:
(ii) When there is no confusion, we relax the above rules and we will denote
(P ) simply by P , (R ∧ S) simply by R ∧ S etc.
The rules given above specify the syntax for propositions formed using
the different logical connectives. The meaning or semantics of such formulas
is given by specifying how to systematically evaluate any fully parenthesized
proposition. The following cases arise in evaluating a fully parenthesized
proposition:
Case 3. The value of a formula with more than one operator is found by
repeatedly applying the Case 2 to the subformulas and replacing every
subformula by its value until the given proposition is reduced to ⊥ or
⊤.
Example 4.2.1:
Consider evaluating (T ∧T ) ⇒ F . We first substitute the values for T and
F and get the proposition (⊤ ∧ ⊤) ⇒ ⊥ . This is first reduced to (⊤ ⇒ ⊥)
by evaluating (⊤ ∧ ⊤) and then is reduced to ⊥.
Example 4.2.2:
To construct the truth table for the proposition (p ⇒ q) ∧ (q ⇒ p) , we
proceed in stages by assigning all possible combinations of truth values to the
propositions p and q. We then evaluate the “inner” propositions and finally
determine the truth-value of the given proposition. This is summarized in
the following truth table which contains the intermediate results also.
p q p⇒q q⇒p (p ⇒ q) ∧ (q ⇒ p)
⊥ ⊥ ⊤ ⊤ ⊤
⊥ ⊤ ⊤ ⊥ ⊥
⊤ ⊥ ⊥ ⊤ ⊥
⊤ ⊤ ⊤ ⊤ ⊤
We note that the given formula has truth value ⊤ whenever p and q have
identical truth values.
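Truth tables such as the one above can be generated mechanically; a sketch using Python booleans (True for ⊤, False for ⊥; the helper implies is ours):

```python
from itertools import product

def implies(p, q):
    """Material implication: p => q is (not p) or q."""
    return (not p) or q

# rows in the order ⊥⊥, ⊥⊤, ⊤⊥, ⊤⊤, as in the table above
for p, q in product([False, True], repeat=2):
    print(p, q, implies(p, q) and implies(q, p))
```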
Example 4.2.3:
To construct the truth table for the proposition (p ∧ q) ∨ (¬ r) ⇔ p we
have to consider all combinations of truth values to p, q and r. This results
in the following table.
p q r (p ∧ q) (¬ r) (p ∧ q) ∨ (¬ r) (p ∧ q) ∨ (¬ r) ⇔ p
⊥ ⊥ ⊥ ⊥ ⊤ ⊤ ⊥
⊥ ⊥ ⊤ ⊥ ⊥ ⊥ ⊤
⊥ ⊤ ⊥ ⊥ ⊤ ⊤ ⊥
⊥ ⊤ ⊤ ⊥ ⊥ ⊥ ⊤
⊤ ⊥ ⊥ ⊥ ⊤ ⊤ ⊤
⊤ ⊥ ⊤ ⊥ ⊥ ⊥ ⊥
⊤ ⊤ ⊥ ⊤ ⊤ ⊤ ⊤
⊤ ⊤ ⊤ ⊤ ⊥ ⊤ ⊤
The last two examples above illustrate that, given a propositional formula
we can determine its truth value by considering all possible combinations of
truth values to its constituent atomic variables. On the other hand given the
propositional variables (such as p, q, r, etc.), for each possible combination of
truth values to these variables we can define a functional value by assigning
a corresponding truth value (conventionally denoted in Computer Science
by 0 or 1) to the function. This corresponds to the definition of a Boolean
function.
Definition 4.2.4:
A Boolean function is a function f : {0, 1}n → {0, 1}, where n ≥ 1.
Definition 4.2.5:
A truth-assignment A is a function from the set of atomic variables to the
set {⊥, ⊤} of truth values.
Definition 4.2.6:
Let A be any function from the set of atomic variables to {⊥, ⊤}. Then, we
extend the notion of A as follows:
1. A((P ∧ Q)) = ⊤ if A(P ) = ⊤ and A(Q) = ⊤
= ⊥ otherwise.
2. A((P ∨ Q)) = ⊤ if A(P ) = ⊤ or A(Q) = ⊤
= ⊥ otherwise.
3. A((¬ R)) = ⊤ if A(R) = ⊥
= ⊥ otherwise.
4. A((P ⊕ Q)) = ⊤ if A(P ) ≠ A(Q)
= ⊥ otherwise.
5. A((P ⇒ Q)) = ⊤ if A(P ) = ⊥ or A(Q) = ⊤
= ⊥ otherwise.
6. A((P ⇔ Q)) = ⊤ if A(P ) = A(Q)
= ⊥ otherwise.
cepts
for this problem but no one has proved that an efficient solution does not
exist. In classical complexity theory, the problem is termed NP-Complete.
A formula P is valid if every truth-assignment appropriate to P , verifies
P . We then call P , a tautology or a universally valid formula.
Trivially T is a tautology and F is not. It easily follows that the proposi-
tion (P ∨ ¬ P ) is a tautology. It is easy to see that if S and T are equivalent,
then (S ⇔ T ) is a tautology.
A tautology can be established by constructing a truth table as the fol-
lowing example shows.
Example 4.3.1:
To show that (¬ p) ⇒ (p ⇒ q) is a tautology.
We do this by constructing the following truth-table:
p q (¬ p) (p ⇒ q) ¬ p ⇒ (p ⇒ q)
⊥ ⊥ ⊤ ⊤ ⊤
⊥ ⊤ ⊤ ⊤ ⊤
⊤ ⊥ ⊥ ⊥ ⊤
⊤ ⊤ ⊥ ⊤ ⊤
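A tautology check is just a sweep over all rows of the truth table; a sketch (names are ours):

```python
from itertools import product

def is_tautology(formula, num_vars):
    """formula: a function of num_vars booleans; True if it always holds."""
    return all(formula(*vals)
               for vals in product([False, True], repeat=num_vars))

# (not p) => (p => q), writing "A => B" as "(not A) or B"
f = lambda p, q: (not (not p)) or ((not p) or q)
print(is_tautology(f, 2))  # True

# p => q alone is not a tautology
print(is_tautology(lambda p, q: (not p) or q, 2))  # False
```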
(i) Addition: p ⇒ (p ∨ q)
(ii) Simplification: (p ∧ q) ⇒ p
(iii) Modus Ponens: p ∧ (p ⇒ q) ⇒ q
(iv) Modus Tollens: (p ⇒ q) ∧ ¬ q ⇒ ¬ p
(v) Disjunctive Syllogism: ¬ p ∧ (p ∨ q) ⇒ q
(vi) Hypothetical Syllogism: (p ⇒ q) ∧ (q ⇒ r) ⇒ (p ⇒ r)
Definition 4.3.2:
Two propositional formulas P and Q are said to be equivalent (and we write
P ≡ Q), if and only if they are assigned the same truth-value by every truth
assignment appropriate to both.
Example 4.3.3:
We show that it is possible to use the laws of equivalence to check if a formula
is a tautology. Consider the formula P where,
P = (¬ (p ⇒ q) ∧ ¬ (¬ p ⇒ (q ∨ r))) ⇒ (¬ q ⇒ r).
Successive applications of the laws of equivalence reduce P to
¬p ∨ p ∨ q ∨ r,
which is a tautology.
Let P ≡ Q and let R be any formula involving P ; say, R = (A ∨ P ) ∧
¬ (C ∧ P ) . In the process of the evaluation of R, every occurrence of P
will result in a truth-value. In each such occurrence of P in R, the same
truth-value will result if any (all) occurrence(s) of P is (are) replaced by Q
in R. Thus, in general, if R′ is the formula that results by replacing some
occurrence of P by Q within R, then we will have R ≡ R′ .
Definition 4.4.1:
A formula is in conjunctive normal form (CNF) if it is a conjunction of
disjunctions of literals. Similarly, a formula is in disjunctive normal form
(DNF) if it is a disjunction of conjunctions of literals.
For example, the formula (a ∧ b) ∨ (¬ a ∧ c) is in DNF and the formula
(a ∨ b) ∧ (b ∨ c ∨ ¬ a) ∧ (a ∨ c) is in CNF. The formulas (a ∨ b) and (a ∧ ¬ c)
are both in CNF as well as in DNF.
(Notation: if F = ¬ G, then let F ′ = G; if F is not a negation of any
formula, let F ′ = ¬ F ; this fits well with the case when F is atomic, say A; then
A′ = ¬ A and ¬ A′ = A)
Theorem 4.4.2:
Every formula has at least one CNF and at least one DNF representation.
Furthermore, there is an algorithm that transforms any given formula into a
CNF or a DNF as desired.
Proof. We argue by induction on the number k of occurrences of logical connectives in the formula.
(a) F = ¬G for some formula G with k or fewer occurrences of logical connectives. Take a CNF of G, say H1 ∧ · · · ∧ Hm, where each Hi is a disjunction of literals li1 ∨ · · · ∨ lik. By De Morgan's laws,
(l′11 ∧ · · · ∧ l′1k) ∨ · · · ∨ (l′m1 ∧ · · · ∧ l′mk)
is a DNF of F. Similarly a CNF of F can be obtained from a DNF of G.
(b) F = (G1 ∨ G2) for some formulas G1 and G2, each of which has k or fewer occurrences of logical connectives. To obtain a DNF of F we simply take the disjunction of a DNF of G1 and a DNF of G2. To obtain a CNF of F, find CNFs of G1 and G2, say ∧_{i=1}^{m} Hi and ∧_{j=1}^{n} Jj, where H1, . . . , Hm and J1, . . . , Jn are disjunctions of literals; then, by the distributive law,
∧_{i=1}^{m} ∧_{j=1}^{n} (Hi ∨ Jj)
is a CNF of F.
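The distributive step in case (b) is easy to mechanize. A sketch in Python, with a CNF represented as a list of clauses and each clause as a frozenset of literal strings (the encoding is ours):

```python
from itertools import product

def cnf_of_disjunction(cnf1, cnf2):
    """CNF of (G1 ∨ G2) from CNFs of G1 and G2, by the distributive law."""
    return [h | j for h, j in product(cnf1, cnf2)]

g1 = [frozenset({"a"}), frozenset({"b", "~c"})]  # CNF of a ∧ (b ∨ ¬c)
g2 = [frozenset({"d"})]                          # CNF of d
# CNF of (a ∧ (b ∨ ¬c)) ∨ d: clauses (a ∨ d) and (b ∨ ¬c ∨ d)
print(cnf_of_disjunction(g1, g2))
```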
4.5 Compactness
Theorem 4.5.1:
A set of formulas is satisfiable if and only if each of its finite subsets is
satisfiable. (Alternatively, if a set of formulas is unsatisfiable, some finite
subset of the given set must be unsatisfiable.)
be defined on all the atomic subformulas of S. We, however, claim that from
{A0 , A1 , A2 , . . .} we can construct a truth-assignment A that does verify
S. For each n ≥ 0, this construction specifies how to build a set Un of
truth-assignments. The construction specifies:
4.6 The Resolution Calculus
We first introduce the idea of a clause and a clause set. Consider a CNF
formula F = (p ∨ q) ∧ (¬ r ∨ s ∨ t) . There are other equivalent ways of
writing F e.g., (q ∨ p) ∧ (¬ r ∨ t ∨ s) . All such other syntactical forms of
F and F itself can therefore be regarded as a set of sets of literals. For example, F is captured by the set {{p, q}, {t, s, ¬r}}.
Formally, a clause is a finite set of literals. Each disjunction of literals corresponds to a clause and each nonempty clause corresponds to one or more disjunctions of literals. We also allow the empty set as a clause, in which case it does not correspond to any formula. We write □ for the empty clause.
A clause set is a set of clauses (the empty clause allowed); a clause set may itself be empty or infinite. Every CNF formula naturally corresponds to a clause set and
every finite clause set not containing the empty clause and not itself empty
corresponds to one or more formulas in CNF. The empty clause set is not
the same as the empty clause, although they are identical when considered
as sets. We write ∅ to denote the empty clause set.
We can carry forward the notion of a truth-assignment A as appropriate
to a clause set: if S is a clause set and every atomic formula in S is in the domain of A, we say that A is appropriate to S.
Example 4.6.1:
Let S be the clause set {{p, q}, {¬r}}. If a truth-assignment A verifies p,
verifies ¬ q and verifies ¬ r, then, A also verifies S. On the other hand, if A
verifies all of p, q and r then it follows that A does not verify S.
must also be verified by the same truth-assignment. In fact, the clause set {{p, q, ¬r}, {q, r, s}, {p, q, s}} can be shown to be equivalent to S. We say that the clause {p, q, s} is a resolvent of the clauses {p, q, ¬r} and {q, r, s}.
Definition 4.6.2:
Let C1 and C2 be two clauses. Then the clause D is a resolvent of C1 and C2 if and only if for some literal l we have l ∈ C1 and ¬l ∈ C2 and D = (C1 \ {l}) ∪ (C2 \ {¬l}).
Let S be a clause set with two or more elements and let D be a resolvent of
any two clauses in S. Then S ≡ S ∪ {D}.
Starting with a clause set S, the resolution rule allows us to build a new
clause set S ′ by adding all resolvents to S (in the sense that S and S ′ are
equivalent). Now we can replace S by S ′ and add new resolvents again and
we can repeat this procedure as long as it is possible. We precisely formalize
this in the following:
Let S be a finite clause set with two or more elements. We define AddRes(S) = S ∪ {D : D is a resolvent of two clauses in S}, and then AddRes^0(S) = S, AddRes^{i+1}(S) = AddRes(AddRes^i(S)) for i ≥ 0, and AddRes⋆(S) = ∪_{i≥0} AddRes^i(S).
In other words, AddRes⋆ (S) is the closure of S under the operation AddRes
of adding all resolvents of clauses already present. Since each clause in S is
finite, there are only a finite number of clauses that can be formed by the
atomic formulas appearing in S. Hence only a finite number of resolvents
can ever be formed starting from a finite S. So there exists an i > 0 such
that, AddResi+1 (S) =AddResi (S). Then AddRes⋆ (S) = AddRes i (S). By
induction, it follows that S ≡ AddRes⋆ (S).
Determining AddRes⋆ (S) by repeated application of the resolution rule
can be used to find out whether a clause set is satisfiable. In turn, this
implies that we can have a computational procedure to determine whether a
formula is satisfiable. This is possible because of the following theorem:
Strictly speaking, not all the intermediate resolvents in AddRes⋆ (S) are really needed to check whether □ ∈ AddRes⋆ (S). The following example is
illustrative.
Example 4.6.4:
Show that the following formula is unsatisfiable:
(A ∨ B) ∧ (A ∨ C) ∧ (B ∨ C) ∧ (¬A ∨ ¬B) ∧ (¬A ∨ ¬C) ∧ (¬B ∨ ¬C)
Starting with the given clause set, a self-explanatory "tree diagram" of resolution steps yields, among others, the intermediate resolvents {C} and {¬C}, whose resolvent □ therefore belongs to AddRes⋆ (S).
The following exercise gives an important result where the resolution tech-
nique holds the key.
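The saturation AddRes⋆ and the unsatisfiability test of Example 4.6.4 can be sketched as a small program; clauses are frozensets of literal strings with a leading "~" for negation (our encoding), and the empty frozenset plays the role of □:

```python
from itertools import combinations

def neg(l):
    return l[1:] if l.startswith("~") else "~" + l

def resolvents(c1, c2):
    """All resolvents D = (C1 \\ {l}) ∪ (C2 \\ {¬l}) of two clauses."""
    return {(c1 - {l}) | (c2 - {neg(l)}) for l in c1 if neg(l) in c2}

def unsatisfiable(clauses):
    """Saturate under resolution; S is unsatisfiable iff □ is derived."""
    s = set(clauses)
    while True:
        new = {d for c1, c2 in combinations(s, 2) for d in resolvents(c1, c2)}
        if frozenset() in new:
            return True          # the empty clause □ was derived
        if new <= s:             # AddRes(S) = S: the closure is reached
            return False
        s |= new

# The clause set of Example 4.6.4
S = [frozenset(c) for c in [{"A", "B"}, {"A", "C"}, {"B", "C"},
                            {"~A", "~B"}, {"~A", "~C"}, {"~B", "~C"}]]
print(unsatisfiable(S))  # True
```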
During program execution, when the if-statement is encountered, the current value of x is substituted to determine whether (x > 3) evaluates to true or false (and then the program control is transferred conditionally).
Both the above assertions are true and hence these are propositions. To
succinctly express the above, we need two special symbols ∀ (read for-all )
and ∃ (read there-exists) respectively known as universal quantifier and ex-
istential quantifier . We can now write the above propositions as,
∀x ((x mod 2 = 0) ⇒ (x > 1))
∀x ∃y ∃z ((x mod 2 = 0) ⇒ (y + z = x))
∀x (M(x) ⇒ ∃y (T(y) ∧ D(x, y)))
—this says, "for all x, if x is a man then there exists a y such that y is a truck and x drives y." In other words, it says "every man drives at least one truck". The following formula,
∀y (T(y) ⇒ ∃x (M(x) ∧ D(x, y)))
says that every truck is driven by at least one man. We remark that parentheses add clarity. For example, the formula ¬T(y) ∧ D(x, y) can be interpreted in two ways. It can denote (¬T(y)) ∧ D(x, y), which means "y is not a truck and x drives y". Alternately, it can denote ¬(T(y) ∧ D(x, y)), which means "it is not the case that y is a truck that x drives".
More generally, we can make statements about relations that are not
explicitly specified. For example, ∀x∀y (P(x, y) ⇔ P(y, x)) simply states that P is a symmetric relation; the formula ∃x ¬P(x, x) can be interpreted to mean that P is not reflexive.
Apart from predicate signs and variables, predicate calculus also allows function signs. Thus, if f is a function sign corresponding to a binary function f(x, y), where x and y are variables (denoting objects in a fixed universe), then ∀x∀y P f xy f yx (which may be informally written as)
∀x∀y P(f(x, y), f(y, x))
is a legal formula.
Formulas in Predicate Calculus turn out to be true or false depending on
the interpretation of predicates and function signs. Thus ∃x ¬P xx is true if and only if P is interpreted as a binary relation that is not reflexive. Also, the formula ∀x∀y∀z ((P xy ∧ P yz) ⇒ P xz) is true if and only if P is interpreted as a transitive relation. In the universe of discourse U, for any predicate P and for any element m ∈ U, the formula (∀x P(x)) ⇒ P(m) is always true.
In the sequel, we will prefer to use the informal style of writing the formulas
to provide some clarity.
(c) If F is any formula and x is a variable then ∀xF and ∃xF are
formulas.
Thus the truth value of the predicate depends on the values of m, n and x but not on the value of i. It is obvious that the truth value of the predicate does not change if all occurrences of i are replaced by a new variable, say j. The variable i is bound to the quantifier ∀ in the predicate. The variables m, n and x are free in the predicate. A predicate such as (i > 0) ∧ (∀i(x ∗ i > 0)) can be confusing. In such cases we rewrite the predicate to remove the ambiguity. For example, we can rewrite this last predicate as (i > 0) ∧ (∀j(x ∗ j > 0)).
More precisely, the free variables of a formula are defined inductively as follows:
(a) The free variables of an atomic formula are all the variables occurring in
it.
(b) The free variables of the formulas (F ∨G) or (F ∧G) are the free variables
of F and the free variables of G; the free variables of ¬F are the free variables of F.
(c) The free variables of ∀xF and ∃xF are the free variables of F , except for
x (if x happens to be a free variable in F ).
When there are no free occurrences of any variable in a formula, the formula
is called a closed formula or sentence.
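The inductive clauses (a)–(c) translate directly into a recursive function. A sketch, using a nested-tuple representation of formulas that is our own choice:

```python
# Formulas as nested tuples, e.g. ("forall", "x", body); atomic formulas
# carry their variable names.  This AST encoding is ours, for illustration.
def free_vars(f):
    """Free variables of a formula, following clauses (a)-(c)."""
    op = f[0]
    if op == "atom":                  # (a) all variables of an atomic formula
        return set(f[2])
    if op == "not":                   # (b) ¬F
        return free_vars(f[1])
    if op in ("and", "or"):           # (b) F ∧ G, F ∨ G
        return free_vars(f[1]) | free_vars(f[2])
    if op in ("forall", "exists"):    # (c) quantifiers bind their variable
        return free_vars(f[2]) - {f[1]}
    raise ValueError(op)

# ∀x (P(x, y) ∨ Q(x))  -- only y is free
f = ("forall", "x", ("or", ("atom", "P", ("x", "y")), ("atom", "Q", ("x",))))
print(free_vars(f))  # {'y'}
```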
4.9 Semantics of Predicate Calculus
4.9.1 Structures
A structure A is a pair ([A], I_A), where [A] is any nonempty set called the universe of A and I_A is a function whose domain is a set of predicate and function signs. Specifically, I_A assigns to each n-place predicate sign P an n-ary relation P^A ⊆ [A]^n, and to each n-place function sign f an n-ary function f^A : [A]^n → [A].
Example 4.9.1:
Let P be a 2-place predicate sign and f a 1-place function sign, and let F be the formula ∀x P(x, f(x)). Consider the structure A with universe [A] = N, with P^A = {(m, n) | m, n ∈ N and m < n}, and with f^A(n) = n + 1, the successor function.
We regard F as true in the structure A since every number is less than its successor. If we define a new structure B which is the same as A except that P^B = {(m, n) | m, n ∈ N and m > n}, then F is false in B.
The formula P (f (x), y) cannot be regarded as true or false in A or in B
without knowing what x and y are.
The next subsection formalizes these ideas.
This completes the task of evaluating the truth values of formulas in predicate calculus.
We write A ⊨_ξ G if and only if A(G)_ξ = ⊤.
Example 4.9.2:
Let us consider the structure A of Example 4.9.1 and the formula P (x, f (y)).
Let ξ be the function from {x, y} to [A] = N such that ξ(x) = 1 and ξ(y) = 2.
Example 4.9.3:
For the structure A and the function ξ as in Example 4.9.2, let us consider the formula ∀x P(x, f(x)). By definition, A(∀x P(x, f(x)))_ξ = ⊤ if and only if A(P(x, f(x)))_{ξ[x/a]} = ⊤ for each a ∈ [A] = N.
Now A(P(x, f(x)))_{ξ[x/a]} = ⊤ if and only if (A(x)_{ξ[x/a]}, A(f(x))_{ξ[x/a]}) ∈ P^A. We see that A(x)_{ξ[x/a]} = a and A(f(x))_{ξ[x/a]} = f^A(A(x)_{ξ[x/a]}) = f^A(a) = a + 1. Since (a, a + 1) ∈ P^A for every a ∈ N, we conclude that A ⊨_ξ ∀x P(x, f(x)).
Example 4.9.4:
Let L be a 2-place predicate sign and f a 2-place function sign, and let x and y be variables. Consider a structure A with
[A] = N,    f^A(m, n) = m + n.
Then A(∀x∀y L(x, f(x, y)))_ξ = ⊤ if and only if
A(∀y L(x, f(x, y)))_{ξ[x/a]} = ⊤ for each a ∈ [A].
4.10 Equivalences in Predicate Calculus
Two given formulas F and G are equivalent if and only if, for every structure A and function ξ appropriate to both F and G, A(F)_ξ = A(G)_ξ. As in propositional calculus, we write F ≡ G if F and G are equivalent.
It should be obvious that ≡ is an equivalence relation on the set of sen-
tences. All the laws of equivalence in propositional calculus (seen earlier) continue to hold in predicate calculus. In addition, we have the following:
(a) For any formula F and any variable x,
¬∀xF ≡ ∃x¬F
¬ ∃xF ≡ ∀x¬ F
(b) For any formulas F and G variable x, such that x has no occurrence in
G we have,
(∀xF ∨ G) ≡ ∀x(F ∨ G)
(∀xF ∧ G) ≡ ∀x(F ∧ G)
(∃xF ∨ G) ≡ ∃x(F ∨ G)
(∃xF ∧ G) ≡ ∃x(F ∧ G)
Proof. (a) We prove that ¬∀xF ≡ ∃x¬F. The other equivalence can be proved in a similar way. For any structure A and appropriate ξ,
A(¬∀xF)_ξ = ⊤ if and only if A(∀xF)_ξ = ⊥, i.e., if and only if A(F)_{ξ[x/a]} = ⊥ for some a ∈ [A], i.e., if and only if A(¬F)_{ξ[x/a]} = ⊤ for some a ∈ [A], i.e., if and only if A(∃x¬F)_ξ = ⊤.
(b) We prove that (∀xF ∨ G) ≡ ∀x(F ∨ G). Now A((∀xF ∨ G))_ξ = ⊤ if and only if A(F)_{ξ[x/a]} = ⊤ for each a ∈ [A] or A(G)_ξ = ⊤. Since x has no occurrence in G, A(G)_ξ = A(G)_{ξ[x/a]} for every a ∈ [A]; hence the condition holds if and only if A((F ∨ G))_{ξ[x/a]} = ⊤ for each a ∈ [A], that is, if and only if A(∀x(F ∨ G))_ξ = ⊤. The remaining equivalences are proved similarly.
A formula is in prenex form (or prenex normal form) if and only if all quantifiers (if any) occur at the extreme left without intervening parentheses. The prenex form is
Q1v1 . . . QnvnG,
where each Qi is ∀ or ∃, the vi are variables, and G contains no quantifiers (G is called the matrix of the formula).
Step 1 Rename the variables, if necessary, so that no variable is both free and bound and so that there is at most one occurrence of a quantifier with any particular variable (we get what is called a rectified formula).
It is easy to see that a given formula may have different prenex forms.
Example 4.11.1:
We wish to obtain the prenex form of the formula (¬∀xP(x, y) ∨ ∀xR(x, y)).
The given formula is equivalent to the rectified formula (¬∀xP(x, y) ∨ ∀zR(z, y)). This formula is equivalent to (∃x¬P(x, y) ∨ ∀zR(z, y)), which is of the form (∃xF ∨ G) with x having no free occurrence in G, and hence is equivalent to ∃x(F ∨ G). So the formula is equivalent to ∃x(¬P(x, y) ∨ ∀zR(z, y)) ≡ ∃x(∀zR(z, y) ∨ ¬P(x, y)) ≡ ∃x∀z(R(z, y) ∨ ¬P(x, y)), which is in prenex
form.
ways of substituting values for the universally quantified variables so that the matrix turns out to be true in each case. For the above case, we first select a 1-place function sign f and a 0-place function sign a. We next replace F by
its functional form,
∀ yP (y, f (y)) ∧ ∀ w ¬ P (w, a)
Here f (y) denotes a choice for the object x corresponding to y such that
P (y, x) holds. For each y, there can be many possible x—we simply say
that there must be at least one choice for x. Similarly a stands for some
fixed object such that ¬ P (w, a) holds, whatever be the value of w. To make
P (y, x) true we can also take y to be the object f (a) itself and take x to be
f (f (a)), and so on.
The Herbrand Universe of F is the set of terms that can be formed from
a and f , namely,
{a, f (a), f (f (a)), . . .}
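With the single constant a and single 1-place sign f, the Herbrand universe is the chain a, f(a), f(f(a)), . . . ; a sketch that enumerates it to a chosen depth (the cut-off is ours, since the universe is infinite):

```python
def herbrand_universe(depth):
    """Terms of {a, f(a), f(f(a)), ...} up to the given nesting depth."""
    terms = ["a"]
    for _ in range(depth):
        terms.append("f(" + terms[-1] + ")")
    return terms

print(herbrand_universe(3))  # ['a', 'f(a)', 'f(f(a))', 'f(f(f(a)))']
```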
follows:
Let us take the formula G = (∀y∃x P(y, x) ∧ ∃z∀w ¬P(w, z)). The corresponding functional form is ∀y P(y, f(y)) ∧ ∀w ¬P(w, a). We have the Herbrand expansion as the set of instances of the matrix obtained by substituting the terms a, f(a), f(f(a)), . . . for the universally quantified variables.
Exercises
p q p↑q p q p↓q
⊥ ⊥ ⊤ ⊥ ⊥ ⊤
⊥ ⊤ ⊤ ⊥ ⊤ ⊥
⊤ ⊥ ⊤ ⊤ ⊥ ⊥
⊤ ⊤ ⊥ ⊤ ⊤ ⊥
3. Show that:
(a) P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R)
(b) ¬ (P ∨ Q) ≡ (¬ P ∧ ¬ Q)
(c) (P ∨ Q) ≡ Q, if P is unsatisfiable.
4. Show that the formula (a ∨ b ∨ c) ∧ (c ∨ ¬ a) is equivalent to the
formula c ∨ (b ∧ ¬ a) .
5. Let S be a finite clause set such that |C| ≤ 2 for each C ∈ S. Show that
the resolution technique provides a polynomial-time decision procedure
for determining the satisfiability of S.
8. Show that the definitions 4, 5 and 6 in Definition 4.2.6 are direct con-
sequences of the other definitions.
Chapter 5
Algebraic Structures
5.1 Introduction
5.2 Matrices
      ( a11  a12  . . .  a1n )
      ( a21  a22  . . .  a2n )
A =   ( . . . . . . . . . .  )
      ( am1  am2  . . .  amn )
Multiplication of Matrices
Thus Cij = Ri · Cj′ . Both Ri and Cj′ are vectors of length n. It is well-
known that the matrix product satisfies both the distributive laws and the
associative laws, namely, for matrices A, B and C,
A(B + C) = AB + AC,
(AB)C = A(BC)
Theorem 5.3.1:
For any square matrix A of order n,
ai1 Aj1 + ai2 Aj2 + · · · + ain Ajn = A1i a1j + A2i a2j + · · · + Ani anj = det A if i = j, and 0 if i ≠ j, where Apq denotes the cofactor of apq in A.
Corollary 5.3.2:
Let A be a nonsingular matrix, that is, det A ≠ 0. Set A⁻¹ = (1/det A) adj A.
Then AA⁻¹ = A⁻¹A = In, where n is the order of A.
The matrix A−1 , as defined in Corollary 5.3.2, is called the inverse of the
(nonsingular) matrix A. If A, B are square matrices of the same order with
AB = I, then B = A⁻¹ and A = B⁻¹. This is seen by premultiplying the equation AB = I by A⁻¹ and postmultiplying it by B⁻¹. Note that A⁻¹ and B⁻¹ exist since, taking determinants of both sides of AB = I, we get det A · det B = det I = 1, so that det A ≠ 0 and det B ≠ 0.
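For a 2 × 2 matrix, Corollary 5.3.2 can be verified directly: adj A swaps the diagonal entries and negates the off-diagonal ones. A sketch (pure Python; the helper names are ours):

```python
# Inverse of a 2x2 matrix via the adjugate, as in Corollary 5.3.2.
# For A = [[a, b], [c, d]], adj A = [[d, -b], [-c, a]].
def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [[d, -b], [-c, a]]
    return [[x / det for x in row] for row in adj]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 3], [-2, 2]]
print(matmul(A, inverse_2x2(A)))  # [[1.0, 0.0], [0.0, 1.0]]
```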
is skew-symmetric if aij = −aji for all i and j. Clearly, symmetric and skew-
symmetric matrices are square matrices. If A = (aij ) is skew-symmetric,
then aii = −aii , and hence aii = 0 for each i. Thus in a skew-symmetric
matrix, all the diagonal entries are zero.
A real matrix (that is, a matrix whose entries are real numbers) P of order
n is called orthogonal if P P t = In . If P P t = In , then P t = P −1 . Thus the
inverse of an orthogonal matrix is its transpose. Further as P −1 P = In , we
also have P^t P = In. If R1, . . . , Rn are the row vectors of P, the relation P P^t = In implies that Ri · Rj = δij, where δij = 1 if i = j, and δij = 0 if i ≠ j. A similar statement also applies to the column vectors of P. As an example, the matrix
(  cos α   sin α )
( −sin α   cos α )
is orthogonal. Indeed, if (x, y) are the cartesian coordinates of a point and (x′, y′) its coordinates referred to axes rotated through an angle α, then
x = x′ cos α + y′ sin α
Exercises 5.2
1. If A = ( 3  −4
           1  −1 ), prove by induction that A^k = ( 1+2k   −4k
                                                      k   1−2k ) for any positive integer k.
2. If M = (  cos α   sin α
            −sin α   cos α ), prove that M^n = (  cos nα   sin nα
                                                 −sin nα   cos nα ), n ∈ N.
3. Compute the transpose, adjoint and inverse of the matrix
   ( 1  −1   0
     0   1  −1
     1   0   1 ).
4. If A = (  1  3
            −2  2 ), show that A² − 3A + 8I = 0. Hence compute A⁻¹.
(i) AB ≠ BA
(ii) (AB)^t ≠ AB
9. Show that every real square matrix can be expressed uniquely as the sum of a symmetric matrix and a skew-symmetric matrix.
10. Show that every complex square matrix can be expressed uniquely as the sum of a Hermitian and a skew-Hermitian matrix.
5.4 Groups
Groups constitute an important basic algebraic structure that occurs very naturally not only in mathematics but also in many other fields such as physics
and chemistry. In this section, we present the basic properties of groups. In
particular, we discuss abelian and nonabelian groups, cyclic groups, permuta-
tion groups and homomorphisms and isomorphisms of groups. We establish
Lagrange’s theorem for finite groups and the basic isomorphism theorem for
groups.
Definition 5.4.1:
A binary operation · on a nonempty set S is a map · : S × S → S; that is, for every ordered pair (a, b) of elements of S, there is associated a unique element a · b of S. A binary system is a pair (S, ·), where S is a nonempty set and · is a binary operation on S. The binary system (S, ·) is associative if · is an associative operation on S, that is, for all a, b, c in S, (a · b) · c = a · (b · c).
Definition 5.4.2:
A semigroup is an associative binary system. An element e of a binary
system (S, ·) is an identity element of S if a · e = e · a = a for all a ∈ S.
Examples
Definition 5.4.3:
A group is a binary system (G, ·) such that the following axioms are satisfied:
(i) the operation · is associative;
(ii) there is an identity element e ∈ G, that is, a · e = e · a = a for all a ∈ G;
(iii) every a ∈ G has an inverse, that is, an element a⁻¹ ∈ G with a · a⁻¹ = a⁻¹ · a = e.
The identity element and inverses are unique. For instance, if b and c are both inverses of a, then
b = b · e = b · (a · c)
  = (b · a) · c    by the associativity of ·
  = e · c
  = c.
Thus henceforth we can talk of “The identity element e” of the group (G, ·),
and “The inverse element a−1 of a” in (G, ·).
If a ∈ G, then a · a ∈ G; also, a · a · · · (n times)∈ G. We denote a · a · · · (n
times) by an . Further, if a, b ∈ G, a · b ∈ G, and (a · b)−1 = b−1 · a−1 . (Check
that (a · b)(a · b)−1 = (a · b)−1 (a · b) = e). More generally, if a1 , a2 , . . . , an ∈ G,
then (a₁ · a₂ · · · aₙ)⁻¹ = aₙ⁻¹ · aₙ₋₁⁻¹ · · · a₁⁻¹, and hence (aⁿ)⁻¹ = (a⁻¹)ⁿ =
(written as) a⁻ⁿ. Then the relation aᵐ⁺ⁿ = aᵐ · aⁿ holds for all integers m
and n, with a0 = e. In what follows, we drop the group operation · in (G, ·),
and simply write group G, unless the operation is explicitly needed.
Lemma 5.4.4:
In a group, both the cancellation laws are valid, that is, if a, b, c are elements
of a group G with ab = ac, then b = c (left cancellation law), and if ba = ca,
then b = c (right cancellation law).
Definition 5.4.5:
The order of a group G is the cardinality of G. The order of an element a of a group G is the least positive integer n such that aⁿ = e, the identity element of G.
If no such n exists, the order of a is taken to be infinity.
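For a finite group the order of an element can be found by repeated multiplication. A sketch for the multiplicative group of integers prime to m (Lemma 5.9.11 shows this is indeed a group); the function name is ours:

```python
def order(a, m):
    """Least positive n with a^n ≡ 1 (mod m); a must be prime to m,
    otherwise the loop never terminates."""
    x, n = a % m, 1
    while x != 1:
        x = (x * a) % m
        n += 1
    return n

# In the multiplicative group mod 7: 3 has order 6 (a generator), 2 has order 3.
print(order(3, 7), order(2, 7))  # 6 3
```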
1. (Z, +) is an abelian group, that is, the set Z of integers is an abelian group
under the usual addition operation. The identity element of this group is 0, and the inverse of a is −a. (Z, +) is often referred to as the additive
group of integers. Similarly, (Q, +), (R, +), (C, +) are all additive abelian
groups.
2. The sets Q∗, R∗ and C∗ of nonzero rationals, reals and complex numbers are groups under the usual multiplication operation.
1. Let G = GL(n, R), the set of all n by n nonsingular matrices with real
entries. Then G is an infinite nonabelian group under multiplication.
3. Let S4 denote the set of all 1-1 maps f : N4 → N4 , where N4 = {1, 2, 3, 4}.
If · denotes composition of maps, then (S4 , ·) is a nonabelian group of order
4! = 24. (See Section*** for more about such groups). For instance, let
f = ( 1  2  3  4
      4  1  2  3 ).
Here the parentheses notation signifies the fact that the image under f of a number in the top row is the corresponding number in the bottom row.
For instance, f (1) = 4, f (2) = 1 and so on. Let
g = ( 1  2  3  4
      3  1  2  4 ).    Then g · f = ( 1  2  3  4
                                      4  3  1  2 ).
Note that (g · f)(1) = g(f(1)) = g(4) = 4, while (f · g)(1) = f(g(1)) = f(3) = 2, and hence f · g ≠ g · f. In other words, S4 is a nonabelian group. The identity element of S4 is the map
I = ( 1  2  3  4
      1  2  3  4 )    and f⁻¹ = ( 1  2  3  4
                                  2  3  4  1 ).
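The computations with f and g above can be replayed mechanically. A sketch representing a permutation as a dict (our encoding), with g · f meaning "apply f first" as in the text:

```python
def compose(g, f):
    """g · f : first apply f, then g."""
    return {x: g[f[x]] for x in f}

f = {1: 4, 2: 1, 3: 2, 4: 3}
g = {1: 3, 2: 1, 3: 2, 4: 4}
print(compose(g, f))  # {1: 4, 2: 3, 3: 1, 4: 2}, i.e. g·f as in the text
print(compose(f, g) == compose(g, f))  # False: S4 is nonabelian
```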
Examples Continued
This gives the group table of the Klein 4-group K4 (Table 5.1):
·   e   a   b   c
e   e   a   b   c
a   a   e   c   b
b   b   c   e   a
c   c   b   a   e
(Figure: the rotation r and the flip f applied to the triangle ABC.)
Thus f r leaves B fixed and flips A and C in △ABC. There are six congruent
transformations of an equilateral triangle and they form a group as per the
following group table.
(r³ = e)   e     r     r²    f     rf    r²f
e          e     r     r²    f     rf    r²f
r          r     r²    e     rf    r²f   f
r²         r²    e     r     r²f   f     rf
f          f     fr    fr²   e     r²    r
rf         rf    f     r²f   r     e     r²
r²f        r²f   rf    f     r²    r     e
Group Table of the Dihedral group D3
(Figure: the effect of the transformations r, r² and f on the equilateral triangle ABC.)
Congruent Transformations of a Square
The congruent transformations of a square form a group in the same way, generated by a quarter-turn r and a flip f with
r⁴ = e = f² = (rf)².
5.7 Subgroups
Definition 5.7.1:
A subset H of a group (G, ·) is a subgroup of (G, ·) if (H, ·) is a group under the operation · of G.
Examples of Subgroups
Definition 5.7.2:
Let S be a nonempty subset of a group G. The subgroup generated by S in
G, denoted by < S >, is the intersection of all subgroups of G containing S.
Proposition 5.7.3:
The intersection of any family of subgroups of G is a subgroup of G.
Corollary 5.7.4:
⟨S⟩ is the smallest subgroup of G containing S.
Definition 5.8.1:
Let G be a group and a an element of G. Then the subgroup generated by a in G is ⟨{a}⟩, that is, the subgroup generated by the singleton subset {a}. It is also denoted simply by ⟨a⟩.
m = qn + r, 0 ≤ r < n.
Then
aᵐ = a^(qn+r) = (aⁿ)^q a^r = e^q a^r = e · a^r = a^r,  0 ≤ r < n,
and hence ⟨a⟩ = {a, a², . . . , aⁿ⁻¹, aⁿ = e}.
2. The group of n-th roots of unity, n ≥ 1. Let G be the set of n-th roots of
unity so that
G = {ω, ω², . . . , ωⁿ = 1},  where ω = cos(2π/n) + i sin(2π/n).
If G = ⟨a⟩ = {aⁿ : n ∈ Z}, then since for any two integers n and m, aⁿaᵐ = aⁿ⁺ᵐ = aᵐaⁿ, G is abelian. In other words, every cyclic group is
abelian. However, the converse is not true. K4 , the Klein’s 4-group (See
Table 5.1 of Section 5.4) is abelian but not cyclic since K4 has no element of
order 4.
Theorem 5.8.2:
Any subgroup of a cyclic group is cyclic.
m = qs + r, 0 ≤ r < s.
Definition 5.8.3:
Let S be any nonempty set. A permutation on S is a bijective mapping from
S to S.
Lemma 5.8.4:
If σ₁ and σ₂ are permutations on S, then the map σ = σ₁σ₂ defined on S by σ(s) = σ₁(σ₂(s)) for all s ∈ S is also a permutation on S.
(Figure: σ = σ₁σ₂ maps s₁ to σ₁(σ₂(s₁)) and s₂ to σ₁(σ₂(s₂)).)
Let B denote the set of all bijections on S. Then it is easy to verify that
(B, ·), where · is the composition map, is a group. The identity element of
this group is the identity function e on S.
Thus the number of permutations on a set of n elements is n · (n − 1) · (n − 2) · · · 2 · 1 = n!.
Example
Definition 5.8.5:
A cycle in Sn is a permutation σ ∈ Sn that can be represented in the form
(a1 , a2 , . . . , ar ), where the ai , 1 ≤ i ≤ r, r ≤ n, are all in S, and σ(ai ) = ai+1 ,
1 ≤ i ≤ r − 1, and σ(ar ) = a1 , that is, each ai is mapped cyclically to the
next element (or number) aᵢ₊₁ and σ fixes the remaining elements of S. For example, if
σ = ( 1  3  2  4
      3  2  1  4 ) ∈ S₄,
then σ can be represented by (132). Here σ leaves 4 fixed.
Now consider the permutation p = ( 1  2  3  4  5  6  7
                                   3  4  2  1  6  5  7 ). Clearly, p is the
product of the cycles
(1324)(56)(7) = (1324)(56).
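The disjoint-cycle decomposition is computed by following each element until its cycle closes. A sketch (same dict encoding as before, ours; dropping fixed points mirrors (1324)(56)(7) = (1324)(56)):

```python
def cycles(p):
    """Decompose a permutation (dict on {1,...,n}) into disjoint cycles."""
    seen, result = set(), []
    for start in sorted(p):
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = p[x]
        if len(cyc) > 1:          # drop fixed points such as (7)
            result.append(tuple(cyc))
    return result

p = {1: 3, 2: 4, 3: 2, 4: 1, 5: 6, 6: 5, 7: 7}
print(cycles(p))  # [(1, 3, 2, 4), (5, 6)]
```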
For example, with t₁ = (13), t₂ = (12) and t₃ = (14), the cycle (1423) = t₁t₂t₃: (t₁t₂t₃)(4) = (t₁t₂)(t₃(4)) = (t₁t₂)(1) = t₁(t₂(1)) = t₁(2) = 2, and so on.
In the same way, any cycle (a1 a2 . . . an ) = (a1 an )(a1 an−1 ) · · · (a1 a2 ), a product
of transpositions. Since any permutation is a product of disjoint cycles and every cycle is a product of transpositions, every permutation is a product of transpositions.
Theorem 5.8.6:
Let σ be any permutation on n symbols. Then in whatever way σ is expressed
as a product of transpositions, the number of transpositions is always odd or
always even.
Proof. Consider the product
P = (a₁ − a₂)(a₁ − a₃) · · · (a₁ − aₙ)
        (a₂ − a₃) · · · (a₂ − aₙ)
        · · ·
        (aₙ₋₁ − aₙ)
  = ∏_{1≤i<j≤n} (aᵢ − aⱼ) = det ( 1      1      · · ·  1
                                  a₁     a₂     · · ·  aₙ
                                  a₁²    a₂²    · · ·  aₙ²
                                  · · ·
                                  a₁ⁿ⁻¹  a₂ⁿ⁻¹  · · ·  aₙⁿ⁻¹ ).
Any transposition (aᵢ aⱼ) applied to the product P changes P to −P, as this amounts to the interchange of the i-th and j-th columns of the above determinant.
Definition 5.8.7:
A permutation is odd or even according to whether it is expressible as the
product of an odd number or even number of transpositions.
Example 5.8.8:
Let σ = ( 1  2  3  4  5  6  7  8  9
          4  5  1  2  3  7  9  8  6 ).
Then σ = (14253)(679)(8)
= (13)(15)(12)(14)(69)(67)
We now establish the most famous basic theorem on finite groups, namely,
Lagrange’s theorem. For this, we need the notion of left and right cosets of
a subgroup.
Definition 5.9.1:
Let H be a subgroup of a group G and let a ∈ G. The set aH = {ah : h ∈ H} is called the left coset of H in G determined by a; similarly, Ha = {ha : h ∈ H} is the right coset of H determined by a.
Lemma 5.9.2:
Any two left cosets of a subgroup H of a group G are equipotent (that is, have the same cardinality). Moreover, they are equipotent to H.
Proof sketch. The map φ : aH → bH defined by φ(ah) = bh is a bijection.
Lemma 5.9.3:
The left coset aH is equal to H iff a ∈ H.
Example 5.9.4:
It is not necessary that aH = Ha for all a ∈ G. For example, consider S₃ and its subgroup H = {e, (12)}. Then (123)H = {(123)e, (123)(12)} = {(123), (13)}, while H(123) = {e(123), (12)(123)} = {(123), (23)},
so that aH ≠ Ha for a = (123).
Proposition 5.9.5:
The left cosets aH and bH are equal iff a−1 b ∈ H.
Proof. If aH = bH, then there exist h₁, h₂ ∈ H such that ah₁ = bh₂, and therefore a⁻¹b = h₁h₂⁻¹ ∈ H. Conversely, if a⁻¹b ∈ H, let a⁻¹b = h ∈ H, so that b = ah and hence bH = (ah)H = a(hH) = aH.
Lemma 5.9.6:
Any two left cosets of the same subgroup of a group are either identical or
disjoint.
Proof. Suppose aH and bH are two left cosets of the subgroup H of a group
G, where a, b ∈ G. If aH and bH are disjoint there is nothing to prove.
Otherwise, aH ∩ bH ≠ ∅, and therefore there exist h₁, h₂ ∈ H with ah₁ = bh₂. Then a⁻¹b = h₁h₂⁻¹ ∈ H, and so, by Proposition 5.9.5,
aH = bH.
Example 5.9.7:
For the subgroup H of Example 5.9.4, we have seen that (123)H = {(123), (13)}. Now (12)H = {(12)e, (12)(12)} = {(12), e} = H, and hence (123)H ∩ (12)H = ∅. Also (23)H = {(23)e, (23)(12)} = {(23), (132)}, and (13)H = {(13)e, (13)(12)} = {(13), (123)} = (123)H.
The last equation holds since (13)⁻¹(123) = (13)(123) = (12) ∈ H (refer to Proposition 5.9.5).
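The cosets in Examples 5.9.4 and 5.9.7 can be recomputed mechanically. A sketch encoding an element of S₃ as the tuple of images of 1, 2, 3 (our encoding):

```python
def comp(p, q):
    """p · q : first apply q, then p."""
    return tuple(p[q[i] - 1] for i in range(3))

e, t12, c123 = (1, 2, 3), (2, 1, 3), (2, 3, 1)   # e, (12), (123)
H = {e, t12}

left = {comp(c123, h) for h in H}    # (123)H = {(123), (13)}
right = {comp(h, c123) for h in H}   # H(123) = {(123), (23)}
print(left == right)  # False, so aH ≠ Ha here
```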
Theorem 5.9.8:
[Lagrange's Theorem, after the French mathematician J. L. Lagrange]
The order of any subgroup of a finite group G divides the order of G.
Definition 5.9.9:
Let H be a subgroup of a group G. Then the number (which may be infinite) of distinct left cosets of H in G is called the index of H in G.
Example 5.9.10:
[An application of Lagrange’s theorem] If p is a prime, and n any positive
integer, then
n | φ(pⁿ − 1).
Lemma 5.9.11:
If m ≥ 2 is a positive integer, and S, the set of positive integers less than m
and prime to it, then S is a multiplicative group modulo m.
every element of H has an inverse modulo m. Therefore (as the other group axioms are trivially satisfied by H), H is a subgroup of order n of S. By Lagrange's theorem, o(H) | o(S), and so n | φ(pⁿ − 1).
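The conclusion n | φ(pⁿ − 1) is easy to spot-check numerically; the totient helper below is a direct count, chosen for clarity rather than efficiency:

```python
from math import gcd

def phi(m):
    """Euler's totient: count of 1 ≤ k < m with gcd(k, m) = 1."""
    return sum(1 for k in range(1, m) if gcd(k, m) == 1)

# Spot-check n | phi(p^n - 1) for a few primes p and exponents n.
for p in (2, 3, 5):
    for n in (2, 3, 4):
        assert phi(p**n - 1) % n == 0
print("n | phi(p^n - 1) verified for these cases")
```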
Theorem 5.9.12:
Any group of prime order is cyclic.
5.10 Homomorphisms and Isomorphisms of Groups
(For instance, the group of n-th roots of unity corresponds to the additive group of integers modulo n under ωⁱ ←→ i.)
Definition 5.10.1:
Let G and G′ be groups (distinct or not). A homomorphism from G to G′ is a map f : G → G′ such that
f(ab) = f(a)f(b) for all a, b ∈ G,
and so on.
Definition 5.10.2:
An isomorphism from a group G to a group G′ is a bijective homomorphism
from G to G′ , that is, it is a map f : G → G′ which is both a bijection and a
group homomorphism.
Examples
1. Let G = (Z, +), and G′ = (nZ, +). (nZ is the set obtained by multiplying all integers by n.) The map f : G → G′ defined by f(m) = mn, m ∈ G, is a group homomorphism from G onto G′.
Proof. (i) Let f (a), f (b) ∈ f (G), where a, b ∈ G. Then f (a)f (b) =
f (ab) ∈ f (G), as ab ∈ G.
(iii) By Property 1, the element f (e) ∈ f (G) acts as the identity ele-
ment of f (G).
Theorem 5.11.1:
Let f : G → G′ be a group homomorphism and K = {a ∈ G : f(a) = e′}, that is, K is the set of all those elements of G that are mapped by f to the identity element e′ of G′. Then K is a subgroup of G.
Definition 5.11.2:
The subgroup K defined in the statement of Theorem 5.11.1 is called the
kernel of the group homomorphism f .
Proof.
⇔ f (ab−1 ) = e′
⇔ ab−1 ∈ K.
= (g · f)(a) · (g · f)(b)
= h(a)h(b).
Definition 5.12.1:
An automorphism of a group G is an isomorphism of G onto itself.
Example 5.12.2:
Let G = {ω⁰ = 1, ω, ω²} be the group of cube roots of unity, where ω = cos(2π/3) + i sin(2π/3). Let f : G → G be defined by f(ω) = ω². To make f a group homomorphism, we have to set f(ω²) = f(ω · ω) = f(ω)f(ω) = ω² · ω² = ω, and f(1) = f(ω³) = (f(ω))³ = (ω²)³ = (ω³)² = 1² = 1. In other words, the homomorphism f : G → G is uniquely defined on G once we set f(ω) = ω².
Clearly, f is onto. Further, only 1 is mapped to 1 by f , while the other two
elements ω and ω 2 are moved by f . Thus Ker f = {1}. So by Property 7, f
is an isomorphism of G onto G, that is, an automorphism of G.
Our next theorem shows that there is a natural way of generating at least
one set of automorphisms of a group.
Theorem 5.12.3:
Let G be a group and a ∈ G. The map fa : G → G defined by fa (x) = axa−1
is an automorphism of G.
fₐ(xy) = a(xy)a⁻¹ = a(xa⁻¹ay)a⁻¹
= (axa−1 )(aya−1 )
= fa (x)fa (y).
Definition 5.12.4:
An automorphism of a group G that is a map of the form fa for some a ∈ G
is called an inner automorphism of G.
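Theorem 5.12.3 can be verified exhaustively for a small group such as S₃. A sketch that checks that every fₐ(x) = axa⁻¹ is a bijective homomorphism (the tuple encoding of permutations is ours):

```python
from itertools import permutations

def comp(p, q):
    return tuple(p[q[i] - 1] for i in range(3))   # apply q first, then p

def inv(p):
    r = [0, 0, 0]
    for i, v in enumerate(p):
        r[v - 1] = i + 1
    return tuple(r)

S3 = list(permutations((1, 2, 3)))
for a in S3:
    f_a = {x: comp(comp(a, x), inv(a)) for x in S3}
    assert sorted(f_a.values()) == sorted(S3)      # f_a is a bijection of S3
    assert all(f_a[comp(x, y)] == comp(f_a[x], f_a[y]) for x in S3 for y in S3)
print("every f_a is an automorphism of S3")
```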
Definition 5.13.1:
A subgroup N of a group G is called a normal subgroup of G (equivalently, N is normal in G) if aNa⁻¹ ⊆ N for every a ∈ G.
Proposition 5.13.2:
The normal subgroups of a group G are those subgroups of G that are left invariant by all the inner automorphisms of G.
The conditions (5.3) and (5.4) give the following equivalent definition of a
normal subgroup.
Definition 5.13.3:
A subgroup N of a group G is normal in G iff aN a−1 = N (equivalently,
aN = N a) for every a ∈ G.
Examples
(23)H(23)⁻¹ = {(23)e(23), (23)(12)(23)}
= {e, (13)} ≠ H,
so the subgroup H = {e, (12)} is not normal in S₃.
Definition 5.13.4:
The centre of a group G consists of those elements of G each of which commutes with all the elements of G. It is denoted by C(G). Thus
C(G) = {g ∈ G : ga = ag for all a ∈ G}.
For example, C(S3 ) = {e}, that is, the centre of S3 is trivial. Also, it is
easy to see that the centre of an abelian group G is G itself. Clearly the
trivial subgroup {e} is normal in G and G is normal in G. (Recall that
aG = G for each a ∈ G).
Proposition 5.13.5:
The centre C(G) of a group G is a normal subgroup of G.
Proof. For any a ∈ G,
aC(G)a⁻¹ = {aga⁻¹ : g ∈ C(G)}
         = {(ag)a⁻¹ : g ∈ C(G)}
         = {(ga)a⁻¹ : g ∈ C(G)}    (as each g ∈ C(G) commutes with a)
         = {g(aa⁻¹) : g ∈ C(G)}
         = {g : g ∈ C(G)} = C(G).
Theorem 5.13.6:
Let f : G → G′ be a group homomorphism. Then H = Ker f is a normal subgroup of G.
aH = a1 H, and bH = b1 H (5.5)
and so (ab)⁻¹(a₁b₁) = b⁻¹(a⁻¹a₁)b₁ = b⁻¹hb₁, where h = a⁻¹a₁ ∈ H.    (5.7)
Now we apply Property ???. Thus the product of two (left) cosets of H
in G is itself a left coset of H. Further for a, b, c ∈ G,
Thus the binary operation defined in G/H satisfies the associative law.
Further eH = H acts as the identity element of G/H as
(aH)(a−1 H) = (aa−1 )H = eH = H,
and for a similar reason (a⁻¹H)(aH) = H. Thus G/H is a group under this binary operation. G/H is called the quotient group or factor group of G modulo H.
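The most familiar concrete quotient group is Z/nZ, the cosets of nZ in (Z, +); since the group is abelian the coset product becomes coset addition. A sketch (the class is our own illustration):

```python
class Zmod:
    """The quotient group Z/nZ: cosets a + nZ, represented by 0, ..., n-1."""
    def __init__(self, n):
        self.n = n
        self.elements = list(range(n))   # one representative per coset

    def add(self, a, b):
        """(a + nZ) + (b + nZ) = (a + b) + nZ."""
        return (a + b) % self.n

Z6 = Zmod(6)
print(Z6.add(4, 5))  # 3, since (4 + 6Z) + (5 + 6Z) = 9 + 6Z = 3 + 6Z
```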
Example 5.14.1:
We now present an example of a quotient group. Let G = (R2 , +), the
additive group of points of the plane R2 . (If (x1 , y1 ) and (x2 , y2 ) are two
points of R², their sum (x₁, y₁) + (x₂, y₂) is defined as (x₁ + x₂, y₁ + y₂). The identity element of this group is (0, 0) and the inverse of (x, y) is (−x, −y).)
Let H be the subgroup {(x, 0) : x ∈ R}, the X-axis. If (a, b) is any point of R², then the coset (a, b) + H = {(a + x, b) : x ∈ R} is the line through (a, b) parallel to the X-axis.
(Figure 5.15: the cosets of H are the lines parallel to the X-axis.)
= φ(g1 K)φ(g2 K)
Let us see what this isomorphism means with regard to the factor group R²/H given in Example 5.14.1. Define f : R² → R by f(a, b) = (0, b), the projection of the point (a, b) ∈ R² on the Y-axis = R. The identity element of the image group is the origin (0, 0). Clearly K is the set of all points (a, b) ∈ R² that are mapped to (0, 0), that is, the set of those points of R² whose projections on the Y-axis coincide with the origin. Thus K is the X-axis (= R). Now φ : G/K = R²/R → G′ is defined by φ((a, b) + K) = f(a, b) = (0, b). This means that all points of
the line through (a, b) parallel to the X-axis are mapped to their common
projection on the Y-axis, namely, the point (0, b). Thus the isomorphism
between G/K and G′ is obtained by mapping each line parallel to the
X-axis to the point where the line meets the Y-axis.
5.16 Exercises
2. Let G denote the set of all real matrices of the form ( a  b
                                                           0  1 ) with a ≠ 0. Show that G is a group under matrix multiplication.
(i). (Q, ·)
(ii). (R∗ , ·)
(iii). (Q, +)
(iv). (R∗ , ·)
8. Prove that any group of even order has an element of order 2. (Hint: for a ≠ e, o(a) ≠ 2 iff a ≠ a⁻¹; pair off such elements as (a, a⁻¹).)
10. Show that no group can be the set union of two of its proper subgroups.
21. Show that any infinite cyclic group is isomorphic to (Z, +).
22. Show that the set {e^{in} : n ∈ Z} forms a multiplicative group. Show that this is isomorphic to (Z, +). Is this group cyclic?
24. Give an example of a group that is isomorphic to one of its proper sub-
groups.
25. Prove that (Z, +) is not isomorphic to (Q, +). (Hint: Suppose ∃ an iso-
morphism φ : Z → Q. Let φ(5) = a ∈ Q. Then ∃ b ∈ Q with 2b = a. Let
x ∈ Z be the preimage of b. Then 2x = 5 in Z, which is not true.)
26. Prove that the multiplicative groups R∗ and C∗ are not isomorphic.
28. Give the group table of the group S3 . From the table, find the centre of
S3 .
31. Let G be a group. Let [G, G] denote the subgroup of G generated by all
elements of G of the form aba−1b−1 (called the commutator of a and b)
for all pairs of elements a, b ∈ G. Show that [G, G] is a normal subgroup of
G. [Hint: For c ∈ G, we have c(aba−1b−1)c−1 = (cac−1)(cbc−1)(cac−1)−1(cbc−1)−1 ∈
[G, G]. Now apply Exercise 29.]
33. Let G be the set of all roots of unity, that is, G = {ω ∈ C : ω n = 1 for
some n ∈ N}. Prove that G is an abelian group that is not cyclic.
35. If H is the only subgroup of a given finite order in a group G, show that
H is normal in G.
38. Prove that the subgroup {e, (123), (132)} is a normal subgroup of S3 .
5.17 Rings
Definition 5.17.1:
A ring is a set A with two binary operations, denoted by + and · (called
addition and multiplication respectively) satisfying the following axioms:
R1 : (A, +) is an abelian group.
R2 : Multiplication is associative: for all a, b, c ∈ A, (ab)c = a(bc).
R3 : For all a, b, c ∈ A, a(b + c) = ab + ac and (a + b)c = ac + bc (the
distributive laws).
Examples of Rings
1. A = Z, the set of all integers with the usual addition + and the usual
multiplication taken as ·.
2. A = 2Z, the set of even integers with the usual addition and multipli-
cation.
Definition 5.17.2:
A ring A is called commutative if for all a, b ∈ A, ab = ba.
Definition 5.17.3:
An element e of a ring A is called a unity element of A if ea = ae = a for all
a ∈ A.
A unity element of A, if it exists, must be unique. For, if e and f are
unity elements of A, then,
ef = f as e is a unity element of A, and ef = e as f is a unity element
of A. Hence e = f.
Proposition 5.17.4:
If a is a unit in a ring A with unity element e, and if ab = ca = e, then b = c.
Proposition 5.17.5:
The units of a ring A (with identity element) form a group under multipli-
cation.
Proof. Exercise.
Let a be a unit in the ring Zn . (See Example 5.17.1 above). Then there
exists an x ∈ Zn such that ax = 1 in Zn , or equivalently, ax ≡ 1( mod n).
But this implies that ax − 1 = bn for some integer b. Hence (a, n) = the
gcd of a and n = 1. (Because if an integer c > 1 divides both a and n, then
it should divide 1). Conversely, if (a, n) = 1, by Euclidean algorithm (see
Section ??), there exist integers x and y with ax + ny = 1, and therefore
ax ≡ 1( mod n). This however means that a is a unit in Zn . Thus the set
U of units of Zn consists precisely of those integers in Zn , that are relatively
prime to n. By Definition 3.7.2, |U | is φ(n), where φ is the Euler function.
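The criterion just proved, that a ∈ Zn is a unit iff (a, n) = 1, is easy to check by machine. A minimal Python sketch (the helper names `units` and `inverse` are ours; `pow(a, -1, n)` computes a modular inverse in Python 3.8+):

```python
from math import gcd

def units(n):
    """Elements of Z_n that are relatively prime to n, i.e. the units."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def inverse(a, n):
    """Multiplicative inverse of the unit a in Z_n (extended Euclid,
    done internally by Python's three-argument pow)."""
    return pow(a, -1, n)

U = units(12)                      # the units of Z_12
assert U == [1, 5, 7, 11]          # phi(12) = 4 of them
assert all(a * inverse(a, 12) % 12 == 1 for a in U)
```

The length of `units(n)` is exactly the Euler function φ(n), as noted above.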
Zero Divisors
Definition 5.17.6:
A left zero divisor in a ring A is a non-zero element a of A such that there
exists a non-zero element b of A with ab = 0 in A. a ∈ A is a right zero
divisor in A if ca = 0 for some c ∈ A, c ≠ 0.
Examples
In the ring of 2 by 2 real matrices, [ 0 0 ; 0 1 ] [ 1 1 ; 0 0 ] = [ 0 0 ; 0 0 ]
(rows separated by semicolons), so [ 0 0 ; 0 1 ] is a left zero divisor and
[ 1 1 ; 0 0 ] is a right zero divisor.
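Reading the display above as the matrix product [0 0; 0 1][1 1; 0 0] = [0 0; 0 0], the claim is a one-line computation. A small Python sketch (the helper `matmul` is ours):

```python
# 2x2 matrices over the integers, represented as tuples of rows
def matmul(A, B):
    """Ordinary matrix product of two 2x2 matrices."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

A = ((0, 0), (0, 1))
B = ((1, 1), (0, 0))
Z = ((0, 0), (0, 0))
# A and B are non-zero, yet their product is the zero matrix
assert A != Z and B != Z and matmul(A, B) == Z
```

Note that the product in the other order, BA, is not zero; left and right zero divisors are genuinely different notions in a non-commutative ring.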
Theorem 5.17.7:
The following statements are true for any ring A.
Proof. Exercise.
Definition 5.18.1:
An integral domain A is a commutative ring with unity element having no
divisors of zero.
Examples
5.19 Exercises
5. Let A be a ring, and a, b ∈ A. Then show that for any positive integer
n,
n(ab) = (na)b = a(nb).
(i) Z is a subring of Q.
(ii) Q is a subring of R.
(iii) R is a subring of C.
8. Prove that any ring A with identity element and cardinality p, where
p is a prime, is commutative. (Hint: Verify that the elements 1, 1 +
1, . . . , 1 + 1 + · · · + 1 (p times) are all distinct elements of A).
5.20 Fields
If rings are algebraic abstractions of the set of integers, fields are algebraic
abstractions of the sets Q, R and C (as mentioned already).
Definition 5.20.1:
A field is a commutative ring with unity element in which every non-zero
element is a unit.
Every field F is an integral domain. To see this, all that we have to verify
is that F has no zero divisors. Indeed, if ab = 0 and a ≠ 0, then a−1 exists in F
and so we have 0 = a−1 (ab) = (a−1 a)b = b in F. However, not every integral
domain is a field. For instance, the ring Z of integers is an integral domain
but not a field. (Recall that the only non-zero integers which are units are 1
and −1.)
Definition 5.21.1:
A field F is called finite if |F |, the cardinality of F , is finite; otherwise, F is
an infinite field.
Let F be a field whose zero and unity elements are denoted by 0F and
1F respectively. A subfield of F is a subset F ′ of F such that F ′ is also
a field with the same addition and multiplication operations of F . This of
course means that the zero and unity elements of F ′ are the same as those
of F . It is clear that the intersection of any family of subfields of F is again
a subfield of F . Let P denote the intersection of the family of all subfields
of F. Naturally, the subfield P is the smallest subfield of F. For if
P′ is a subfield of F that is properly contained in P, then, P being the
intersection of all subfields of F, P ⊆ P′ ⊊ P, a
contradiction. This smallest subfield P of F is called the prime field of F.
Necessarily, 0F ∈ P and 1F ∈ P .
As 1F ∈ P , the elements 1F , 1F + 1F = 2 · 1F , 1F + 1F + 1F = 3 · 1F and, in
general, n · 1F , n ∈ N, all belong to P . There are then two cases to consider:
Case 1: The elements n·1F , n ∈ N, are all distinct. In this case, the subfield
P itself is an infinite field and therefore F is an infinite field.
Case 2: The elements n · 1F , n ∈ N, are not all distinct. In this case,
there exist r, s ∈ N with r > s such that r · 1F = s · 1F , and therefore,
(r − s) · 1F = 0, where r − s is a positive integer. Hence there exists a least
positive integer p such that p · 1F = 0. We claim that p is a prime number.
If not, p = p1 p2, where p1 and p2 are positive integers less than p. Then
0 = p · 1F = (p1 p2) · 1F = (p1 · 1F)(p2 · 1F) gives, as F is a field, either
p1 · 1F = 0 or p2 · 1F = 0, contradicting the choice of p as the least positive
integer with p · 1F = 0. Hence p is a prime.
Definition 5.21.2:
The characteristic of a field F is the least positive integer p such that p·1F = 0
if such a p exists; otherwise, F is said to be of characteristic zero.
A field of characteristic zero is necessarily infinite (as its prime field al-
ready is). A finite field is necessarily of prime characteristic. However, there
are infinite fields with prime characteristic. Note that if a field F has char-
acteristic p, then px = 0 for each x ∈ F .
Examples
(iii) For a field F , denote by F [X] the set of all polynomials in X over F ,
that is, polynomials whose coefficients are in F . F [X] is an integral
domain and the group of units of F [X] = F ∗ .
(iv) The field Zp (X) of rational functions of the form a(X)/b(X), where a(X)
and b(X) are polynomials in X over Zp and b(X) ≠ 0, is an infinite
field of (finite) characteristic p.
Theorem 5.21.3:
Let F be a field of (prime) characteristic p. Then for all x, y ∈ F and all n ∈ N,
(x ± y)^{p^n} = x^{p^n} ± y^{p^n}, and (xy)^{p^n} = x^{p^n} y^{p^n}.
So assume that
(x + y)^{p^n} = x^{p^n} + y^{p^n}.
Then
(x + y)^{p^{n+1}} = ((x + y)^{p^n})^p
= (x^{p^n} + y^{p^n})^p (by the induction assumption)
= (x^{p^n})^p + (y^{p^n})^p (by (5.1))
= x^{p^{n+1}} + y^{p^{n+1}}.   (5.2)
Next we consider (x − y)^{p^n}. If p = 2, then −y = y and so the result is valid.
If p is an odd prime, change y to −y in (5.2). This gives
(x − y)^{p^n} = x^{p^n} + (−y)^{p^n}
= x^{p^n} + (−1)^{p^n} y^{p^n}
= x^{p^n} − y^{p^n},
since (−1)^{p^n} = −1.
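Because every a ∈ Zp satisfies a^{p^n} = a, Theorem 5.21.3 can be spot-checked exhaustively in the prime field itself. A Python sketch (our choice of p = 5 and n = 2; `pow(x, q, p)` is x^q reduced mod p):

```python
p = 5          # any prime; all arithmetic below is in the field Z_p
n = 2
q = p ** n     # the exponent p^n of the theorem

for x in range(p):
    for y in range(p):
        # (x +/- y)^{p^n} = x^{p^n} +/- y^{p^n} and (xy)^{p^n} = x^{p^n} y^{p^n}
        assert pow(x + y, q, p) == (pow(x, q, p) + pow(y, q, p)) % p
        assert pow(x - y, q, p) == (pow(x, q, p) - pow(y, q, p)) % p
        assert pow(x * y, q, p) == (pow(x, q, p) * pow(y, q, p)) % p
```

The check passes for every pair (x, y) because the "cross terms" of the binomial expansion all carry binomial coefficients divisible by p.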
Definition 5.22.1:
A vector space (or linear space) V over a field F is a nonvoid set V whose
elements satisfy the following axioms:
(B) For every pair of elements α and v, where α ∈ F and v ∈ V , there exists
an element αv ∈ V called the product of v by α such that
d^n y/dx^n + C1 d^{n−1}y/dx^{n−1} + · · · + Cn−1 y = 0.   (1)
Clearly, if y1 (x) and y2 (x) are two solutions of the differential equa-
tion (1), then so is y(x) = α1 y1 (x) + α2 y2 (x), α1 , α2 ∈ R. It is now easy
to verify that the axioms of a vector space are satisfied.
5.23 Subspaces
Definition 5.23.1:
A subspace W of a vector space V over F is a subset W of V such that W is
also a vector space over F with addition and scalar multiplication as defined
for V .
Proposition 5.23.2:
A non-void subset W of a vector space V is a subspace of V iff for all u, v ∈ W
and α, β ∈ F ,
αu + βv ∈ W
An example of a subspace
Proposition 5.23.3:
If W1 and W2 are subspaces of a vector space V , then W1 ∩ W2 is also a
subspace of V . More generally, the intersection of any family of subspaces of
a vector space V is also a subspace of V .
Definition 5.24.1:
Let S be a subset of a vector space V over F . By the subspace spanned by
S, denoted by < S >, we mean the smallest subspace of V that contains S.
If < S >= V , we call S a spanning set of V .
Example 5.24.2:
We shall determine the smallest subspace W of R3 containing the vectors (1, 2, 1)
and (2, 3, 4).
Clearly, W must contain the subspace spanned by (1, 2, 1), that is, the line
joining the origin (0, 0, 0) and (1, 2, 1). Similarly, W must also contain the
line joining (0, 0, 0) and (2, 3, 4). These two distinct lines meet at the origin
and hence define a unique plane through the origin, and this is the subspace
spanned by the two vectors (1, 2, 1) and (2, 3, 4). (See Proposition 5.24.3
below.)
Proposition 5.24.3:
Let S be a subset of a vector space V over F . Then < S >= L(S), where
L(S) = {α1 s1 + α2 s2 + · · · + αr sr : si ∈ S, 1 ≤ i ≤ r, and αi ∈ F, 1 ≤ i ≤
r, r ∈ N} = set of all finite linear combinations of vectors of S over F.
u = α1 s1 + · · · + αr sr , and
v = β1 s′1 + · · · + βt s′t
Proposition 5.24.4:
Let u1 , . . . , un and v be vectors of a vector space V . Suppose that v ∈ <
u1 , u2 , . . . , un >. Then < u1 , . . . , un > = < u1 , . . . , un ; v >.
Conversely, if
w = α1 u1 + · · · + αn un + βv ∈< u1 , . . . , un ; v >,
w = (α1 u1 + · · · + αn un ) + β(γ1 u1 + · · · + γn un )
= Σ_{i=1}^{n} (αi + βγi ) ui ∈ < u1 , . . . , un >.
Corollary 5.24.5:
If S is any nonempty subset of a vector space V , and v ∈< S >, then
< S ∪ {v} > = < S >.
0 · v1 + 0 · v2 + · · · + 0 · vn = 0
In this case we also say that the vectors v1 , . . . , vn are linearly indepen-
dent over F . In the above equation, the zero on the right refers to the
zero vector of V while the zeros on the left refer to the scalar zero, that
is, the zero element of F .
α1 v1 + · · · + αn vn = 0.
Remark 5.25.2: (i) The zero vector of V forms a linearly dependent set
since it satisfies the nontrivial equation 1 · 0 = 0, where 1 ∈ F and
0∈V.
(ii) Two vectors of V are linearly dependent over F iff one of them is a scalar
multiple of the other.
Proposition 5.25.3:
Any subset T of a linearly independent set S of a vector space is linearly
independent.
α1 v1 + · · · + αr vr = 0, αi ∈ F
(α1 v1 + · · · + αr vr ) + (0 · vr+1 + · · · + 0 · vn ) = 0.
Corollary 5.25.4:
If v ∈ L(S), then S ∪ {v} is linearly dependent.
Examples
α·1+β·i=0
and giving
2u − 3v − w = 0.
λ1 X^{i1} + λ2 X^{i2} + · · · + λn X^{in} = 0   (A)
Definition 5.26.1:
A basis (or base) of a vector space V over a field F is a subset B of V such
that
u = α1 u1 + α2 u2 + 0 · u′1 + 0 · u′2
= 0 · u1 + 0 · u2 + αu′1 + αu′2 .
Example 5.26.2:
The vectors e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1) form a basis for R3 .
This follows from the following two facts.
and hence αi = 0, 1 ≤ i ≤ 3.
Definition 5.27.1:
By a finite-dimensional vector space, we mean a vector space that can be
generated (or spanned) by a finite number of vectors in it.
Lemma 5.27.2:
No finite-dimensional vector space can have an infinite basis.
Lemma 5.27.3:
A finite sequence {v1 , . . . , vn } of non-zero vectors of a vector space V is
linearly dependent iff for some k, 2 ≤ k ≤ n, vk is a linear combination of its
preceding vectors.
Proof. In one direction, the proof is trivial: if vk ∈ < v1 , . . . , vk−1 >, then
by Corollary 5.25.4, {v1 , . . . , vk−1 , vk } is linearly dependent, and so is its
superset {v1 , . . . , vn } (by Proposition 5.25.3).
Conversely, assume that {v1 , v2 , . . . , vn } is linearly dependent. As v1 is
a non-zero vector, {v1 } is linearly independent (See (iii) of Remark 5.25.2).
Hence there must exist a k, 2 ≤ k ≤ n, such that {v1 , . . . , vk−1 } is linearly
independent while {v1 , . . . , vk } is linearly dependent (such a k exists since
at worst k = n). Hence there exists a set of scalars α1 , . . . , αk , not all zero, such that
α1 v1 + · · · + αk vk = 0.
Lemma 5.27.3 implies, by Proposition 5.24.4, that under the stated con-
ditions on vk ,
< v1 , . . . , vk , . . . , vn > = < v1 , . . . , v̂k , . . . , vn > = < v1 , . . . , vk−1 , vk+1 , . . . , vn >,
where the symbol ∧ over vk indicates that the vector vk should be deleted.
We next prove a very important property of finite-dimensional vector
spaces.
Theorem 5.27.4:
Any finite-dimensional vector space has a basis. Moreover, any two bases of
a finite-dimensional vector space have the same number of elements.
∧ ∧
= {v1 , v2 ; u1 , . . . , ui1 , . . . ui2 , . . . , um }/ ui1 , ui2 ,
< S3 > = V.
Note that we have actually shown that any finite spanning subset of a
finite-dimensional vector space V does indeed contain a finite basis of V .
Theorem 5.27.4 makes the following definition unambiguous.
Definition 5.27.5:
The dimension of a finite-dimensional vector space is the number of elements
in any one of its bases.
Examples
2. Cn is of dimension n over C.
S = {e1 , . . . , en ; f1 , . . . fn }
4. Let Pn (X) denote the set of polynomials in X with real coefficients
of degree not exceeding n. Then B = {1, X, X 2 , . . . , X n } is a basis for
Pn (X). Hence dimR Pn (X) = n + 1.
Proposition 5.27.6:
Any maximal linearly independent subset of a finite-dimensional vector space
V is a basis for V .
5.28 Exercises
2. If n ∈ N, show that the set of all real polynomials of degree n does not
form a vector space over R (under usual addition and scalar multipli-
cation of polynomials).
4. Show that the dimension of the vector space of all m by n real matrices
over R is mn. [Hint: For m = 2, n = 3, the matrices
[ 1 0 0 ; 0 0 0 ], [ 0 1 0 ; 0 0 0 ], [ 0 0 1 ; 0 0 0 ],
[ 0 0 0 ; 1 0 0 ], [ 0 0 0 ; 0 1 0 ], [ 0 0 0 ; 0 0 1 ]
(rows separated by semicolons)
form a basis for the space of all 2 by 3 real matrices. Verify this first].
(iv) In a vector space, any two generating (that is, spanning) subsets
are disjoint.
5.29 Rank of a Matrix
Let
A = [ a11 a12 . . . a1n
      a21 a22 . . . a2n
      . . .
      am1 am2 . . . amn ]
be an m by n matrix over a field F . To be precise, we take F = R, the field
of real numbers. Let R1 , R2 , . . . , Rm be the row vectors and C1 , C2 , . . . , Cn
the column vectors of A. Then each Ri ∈ Rn and each Cj ∈ Rm . The row
space of A is the subspace < R1 , . . . , Rm > of Rn , and its dimension is the
row rank of A. Clearly, (row rank of A) ≤ m since any m vectors of a vector
space span a subspace of dimension at most m. The column space of A and
the column rank of A (≤ n) are defined in an analogous manner.
We now consider three elementary row transformations (or operations)
defined on the row vectors of A:
(iii) Ri + cRj —addition to the i-th row of A, c times the j-th row of A, c
being a scalar.
(i) The leading non-zero entry of any non-zero row (if any) of A∗ is 1.
(ii) The leading 1’s in the non-zero rows of A∗ occur in increasing order of
their columns.
Now let D be a square matrix of order n. The three elementary row (respec-
tively column) operations considered above do not change the singular or
nonsingular nature of D. In other words, if D∗ is a row-reduced echelon form
of D, then D is singular iff D∗ is singular. In particular, D is nonsingular iff
D∗ = In , the identity matrix of order n. Hence if a row-reduced echelon form
A∗ of a matrix A has r non-zero rows, the maximum order of a nonsingular
square submatrix of A is r. This number is called the rank of A.
Definition 5.29.1:
The rank of a matrix A is the maximum order of a nonsingular square sub-
matrix of A. Equivalently, it is the maximum order of a nonvanishing deter-
minant minor of A.
Example 5.29.2:
Find the row-reduced echelon form of
A = [ 1 2 3 −1
      2 1 −1 4
      3 3 2 3
      6 6 4 6 ].
As the leading entry of R1 is 1, we perform the operations R2 −2R1 ; R3 −3R1 ;
R4 − 6R1 (where Ri stands for the i-th row of A). This gives
A1 = [ 1 2 3 −1
       0 −3 −7 6
       0 −3 −7 6
       0 −6 −14 12 ].
Next perform −(1/3)R2 (that is, replace R2 by −(1/3)R2 ). This gives
A′1 = [ 1 2 3 −1
       0 1 7/3 −2
       0 −3 −7 6
       0 −6 −14 12 ].
Now perform R1 − 2R2 (that is, replace R1 by R1 − 2R2 etc.); R3 + 3R2 ;
R4 + 6R2 . This gives the matrix
A2 = [ 1 0 −5/3 3
       0 1 7/3 −2
       0 0 0 0
       0 0 0 0 ].
A2 is the row-reduced echelon form of A. Note that A2 is uniquely determined
by A. Since the maximum order of a non-singular submatrix of A2 is 2, rank
of A = 2. Moreover, row space of A2 =< R1 , R2 (of A2 ) >. Clearly R1 and
R2 are linearly independent over R since for α1 , α2 ∈ R, α1 R1 + α2 R2 =
(α1 , α2 , −5α1 /3 + 7α2 /3, 3α1 − 2α2 ) = 0 = (0, 0, 0, 0) implies that α1 = 0 =
α2 . Thus the row rank of A is 2 and therefore the column rank of A is also
2.
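The reduction above is mechanical, so it can be reproduced with exact rational arithmetic. A Python sketch (the function `rref` is ours, implementing the three elementary row operations; `Fraction` avoids floating-point error when testing entries for zero):

```python
from fractions import Fraction

def rref(rows):
    """Row-reduced echelon form via the three elementary row operations."""
    M = [[Fraction(x) for x in row] for row in rows]
    lead = 0
    for r in range(len(M)):
        if lead >= len(M[0]):
            break
        i = r
        while M[i][lead] == 0:           # find a row with a non-zero entry
            i += 1
            if i == len(M):
                i, lead = r, lead + 1
                if lead == len(M[0]):
                    return M
        M[i], M[r] = M[r], M[i]          # row interchange
        M[r] = [x / M[r][lead] for x in M[r]]   # scale leading entry to 1
        for i in range(len(M)):
            if i != r:                   # clear the rest of the column
                M[i] = [a - M[i][lead] * b for a, b in zip(M[i], M[r])]
        lead += 1
    return M

A = [[1, 2, 3, -1], [2, 1, -1, 4], [3, 3, 2, 3], [6, 6, 4, 6]]
R = rref(A)
rank = sum(1 for row in R if any(x != 0 for x in row))
assert rank == 2
assert R[0] == [1, 0, Fraction(-5, 3), 3]
assert R[1] == [0, 1, Fraction(7, 3), -2]
```

The two non-zero rows recovered are exactly the rows of A2 above, and the count of non-zero rows is the rank.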
Remark 5.29.3:
Since the last three rows of A1 are proportional (that is, each is a scalar
multiple of the others), the conclusion that the rank of A is 2 can already
be read off at the stage A1.
X1 + 2X2 + 3X3 − X4 = 0
2X1 + X2 − X3 + 4X4 = 0
3X1 + 3X2 + 2X3 + 3X4 = 0
6X1 + 6X2 + 4X3 + 6X4 = 0   (5.3)
can be written as AX = 0,   (5.4)
where
A = [ 1 2 3 −1
      2 1 −1 4
      3 3 2 3
      6 6 4 6 ],  X = (X1, X2, X3, X4)^t, and 0 = (0, 0, 0, 0)^t.
If X′ and X′′ are any two solutions of (5.4), then so is aX′ + bX′′ for scalars
a and b, since A(aX′ + bX′′) = a(AX′) + b(AX′′) = a · 0 + b · 0 = 0. Thus the
set of solutions of (5.4) is (as X ∈ Rn ) a vector subspace of Rn, where n =
the number of indeterminates in the equations (5.3).
It is clear that the three elementary row operations performed on a system
of homogeneous linear equations do not alter the set of solutions of the
system.
Theorem 5.30.1:
The solution space of a system of homogeneous linear equations is of dimen-
sion n−r, where n is the number of unknowns and r is the rank of the matrix
A of coefficients.
Consider now a system of non-homogeneous linear equations
AX = B;
for example, the equations
X1 − X2 + X3 = 2
X1 + X2 − X3 = 0
3X1 = 6.
From the last equation, we get X1 = 2. This, when substituted in the first
two equations, yields −X2 + X3 = 0, X2 − X3 = −2 which are mutually
contradictory. Such equations are called inconsistent equations.
When are the equations represented by AX = B consistent?
Theorem 5.31.1:
The equations AX = B are consistent if and only if B belongs to the column
space of A.
Proof. The equations are consistent iff there exists a vector X0 = (α1, . . . , αn)^t such
that AX0 = B. But this happens iff α1 C1 + · · · + αn Cn = B, where C1 , . . . , Cn
are the column vectors of A, that is, iff B belongs to the column space of
A.
Corollary 5.31.2:
The equations represented by AX = B are consistent iff rank of A = rank
of (A, B). [(A, B) denotes the matrix obtained from A by adding one more
column vector B at the end. It is called the matrix augmented by B].
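Corollary 5.31.2 applied to the inconsistent system above (rank of A = 2, rank of (A, B) = 3) can be sketched as follows in Python (the function `rank`, plain Gaussian elimination over the rationals, is ours):

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue                     # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# the system X1 - X2 + X3 = 2, X1 + X2 - X3 = 0, 3X1 = 6
A = [[1, -1, 1], [1, 1, -1], [3, 0, 0]]
B = [2, 0, 6]
AB = [row + [b] for row, b in zip(A, B)]     # the augmented matrix (A, B)
assert rank(A) == 2 and rank(AB) == 3        # ranks differ: inconsistent
```

When the two ranks agree, B lies in the column space of A and the system is consistent, exactly as Theorem 5.31.1 states.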
Theorem 5.31.3:
Let X0 be any particular solution of the equation AX = B. Then, the set of
all solutions of AX = B is given by {X0 + U }, where U varies over the set
of solutions of the auxiliary equation AX = 0.
Definition 5.32.1:
By an LUP decomposition of a square matrix A we mean an equation of the
form
P A = LU,   (5.5)
where P is a permutation matrix, L is a unit lower-triangular matrix, and U
is an upper-triangular matrix. To solve AX = b, one may then instead solve
LU X = b′,   (5.6)
where b′ = P b.
= LU,   (5.7)
where L = [ 1 0 ; v/a11 L′ ] and U = [ a11 w^t ; 0 U′ ] (rows separated by
semicolons). The validity of the two middle
equations on the right of (5.7) can be verified by routine block multiplication
of matrices (See ???). This method is based on the supposition that a11
and all the leading entries of the successive Schur complements are all non-
zero. If a11 is zero, we interchange the first row of A with a subsequent row
having a non-zero first entry. This amounts to premultiplying both sides
by the corresponding permutation matrix P yielding the matrix P A on the
left. We now proceed as in the case when a11 ≠ 0. If a leading entry of
a subsequent Schur complement is zero, once again we make interchanges
of rows—not just the rows of the relevant Schur complement but the full
rows got from A. This again amounts to premultiplication by a permutation
matrix. Since any product of permutation matrices is a permutation matrix,
this process finally ends up with a matrix P ′ A, where P ′ is a permutation
matrix of order n.
We now present two examples, one to obtain the LU decomposition when
it is possible and another to determine the LUP decomposition.
Example 5.32.2:
Find the LU decomposition of
A = [ 2 3 1 2
      4 7 4 7
      2 7 13 16
      6 10 13 15 ].
Here a11 = 2, v = (4, 2, 6)^t, and w^t = (3, 1, 2).
Therefore v/a11 = (2, 1, 3)^t, and so vw^t/a11 = [ 6 2 4 ; 3 1 2 ; 9 3 6 ]
(rows separated by semicolons), where w^t denotes the transpose of w.
Hence the Schur complement of A is
A1 = [ 7 4 7 ; 7 13 16 ; 10 13 15 ] − [ 6 2 4 ; 3 1 2 ; 9 3 6 ] = [ 1 2 3 ; 4 12 14 ; 1 10 9 ].
The Schur complement of A1 is
A2 = [ 12 14 ; 10 9 ] − [ 4 ; 1 ] (2, 3) = [ 12 14 ; 10 9 ] − [ 8 12 ; 2 3 ] = [ 4 2 ; 8 6 ],
and the Schur complement of A2 is [ 6 ] − (8/4)(2) = [ 2 ] = (1)(2) = L3 U3.
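The successive Schur complements computed above are exactly the intermediate results of Doolittle-style elimination. A Python sketch (the function `lu` is ours; it assumes, as in this example, that every pivot is non-zero, so no permutation P is needed):

```python
from fractions import Fraction

def lu(A):
    """Doolittle LU factorization (no pivoting): A = L U with a unit
    lower-triangular L, assuming every leading pivot is non-zero."""
    n = len(A)
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(x) for x in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]                      # multiplier
            U[i] = [a - L[i][k] * b for a, b in zip(U[i], U[k])]
    return L, U

A = [[2, 3, 1, 2], [4, 7, 4, 7], [2, 7, 13, 16], [6, 10, 13, 15]]
L, U = lu(A)
# check the factorization A = L U
prod = [[sum(L[i][k] * U[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)]
assert prod == [[Fraction(x) for x in row] for row in A]
assert all(U[i][j] == 0 for i in range(4) for j in range(i))  # U is upper-triangular
```

The first column of L below the diagonal is exactly v/a11 = (2, 1, 3)^t, and the trailing submatrices of U reproduce the Schur complements A1, A2.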
Example 5.32.3:
Find the LUP decomposition of
A = [ 2 3 1 2
      4 6 4 7
      2 7 13 16
      6 10 13 15 ].
Example 5.32.4:
Solve the system of linear equations
Y1 = 10
3Y1 + Y2 = 50
Y1 + 4Y2 + Y3 = 40
2Y1 − (1/14)Y3 + Y4 = 25,
X2 + 10X3 + 9X4 = 20
5.33 Exercises
X1 + X2 + X3 + X4 = 0
X1 + X2 + 2X3 − 3X4 = 0
2X1 + 2X2 − X3 + X4 = 0
X1 + X2 + 2X3 − 3X4 = 0
3. Solve:
X1 + X2 + X3 + X4 = 0
2X1 + X3 − X4 = 0
(i)
4X1 + X2 − X3 + 3X4 = 10
(ii)
3X1 − 2X2 + X3 = 7
X1 + X2 + X3 = 12
−X1 + 4X2 − X3 = 3
(iii)
4X1 − 5X3 − X4 = 16
−4X1 + 2X2 + X4 = −5
In this section, we discuss the basic properties of finite fields. Finite fields
are fundamental to the study of codes and cryptography.
Recall that a field F is finite if |F | is finite; |F | is called the order of F. The
characteristic of a finite field F , as seen in Section 5.21, is a prime number p,
and the prime field P of F is a field of p elements. P consists of the p
elements 1F , 2 · 1F = 1F + 1F , . . . , p · 1F = 0F . Clearly, F is a vector space
over P . If the dimension of F over P is n, then n is finite. Hence F has a
basis {u1 , . . . , un } of n elements over P . This means that each element v ∈ F
is a unique linear combination of u1 , . . . , un , say,
v = α1 u1 + α2 u2 + · · · + αn un , αi ∈ P, 1 ≤ i ≤ n.
Theorem 5.34.1:
The order of a finite field is a power of a prime number.
Finite fields are known as Galois fields after the French mathematician
Évariste Galois (1811–1832) who first studied them. A finite field of order q
is denoted by GF (q).
We now look at the converse of Theorem 5.34.1. Given a prime power p^n
(where p is a prime), does there exist a field of order p^n ? The answer to this
question is in the affirmative. We give below two different constructions that
yield a field of order p^n.
Theorem 5.34.2:
Given p^n (where p is a prime), there exists a field of p^n elements.
Construction 1: Consider the polynomial X^{p^n} − X ∈ Zp [X] of degree p^n.
(Recall that Zp [X] stands for the ring of polynomials in X with coefficients
from the field Zp of p elements). The derivative of this polynomial is
p^n X^{p^n − 1} − 1 = −1 ∈ Zp [X],
and is therefore relatively prime to it. Hence the p^n roots of X^{p^n} − X are
all distinct. (Here, though no concept of the limit is involved, the notion of
the derivative has been employed as though it is a real polynomial). It is
known [28] that the roots of this polynomial lie in an extension field K ⊃ Zp .
n
K is also of characteristic p. If a and b any two roots of X p − X, then
n n
ap = a, and bp = b.
n n n
(a ± b)p = ap ± bp ,
n n n
ap bp = (ab)p ,
n
and so a ± b and ab are also roots of X p − X. Moreover, if a is a non-zero
n n n
root of X p − X, then so is a−1 since (a−1 )p = (ap )−1 = a−1 . Also the
associative and distributive laws are valid for the set of roots since they are
all elements of the field K. Finally 0 and 1 are also roots. In other words,
n
the pn roots of X p − X ∈ Zp [X] form a field of order pn .
Construction 2: Let f (X) = X^n + a1 X^{n−1} + · · · + an ∈ Zp [X] be a polynomial of
degree n irreducible over Zp. The existence of such an irreducible polynomial
(with leading coefficient 1) of degree n is guaranteed by a result (see [5]) in
algebra. Let F denote the ring of polynomials in Zp [X] reduced modulo
f (X) (that is, if g(X) ∈ Zp [X], divide g(X) by f (X) and take the remainder
g1 (X) which is 0 or of degree less than n). Then every non-zero polynomial
in F is a polynomial of Zp [X] of degree at most n − 1. Moreover, if a0 X^{n−1} +
· · · + an−1 and b0 X^{n−1} + · · · + bn−1 are two polynomials in F of degrees at most
n − 1, and if they are equal, then ai = bi for each i. As each of the n coefficients
a0 , . . . , an−1 can take any of the p values of Zp, the number of polynomials in
F, that is |F |, is p^n.
We now show that F is a field. Clearly, F is a commutative ring with
unit element 1(= 0 · X n−1 + · · · + 0 · X + 1). Hence we need only verify that
if a(X) ∈ F is not zero, then there exists b(X) ∈ F with a(X)b(X) = 1. As
a(X) ≠ 0, and f (X) is irreducible over Zp , the gcd (a(X), f (X)) = 1. So by
Euclidean algorithm (Section 3.3), there exist polynomials C(X) and g(X)
in Zp [X] such that
a(X)C(X) + f (X)g(X) = 1 (5.9)
in Zp [X]. Now there exists C1 (X) ∈ F with C1 (X) ≡ C(X)( mod f (X)).
This means that there exist a polynomial h(X) in Zp [X] with C(X) −
C1 (X) = h(X)f (X), and hence C(X) = C1 (X) + h(X)f (X). Substitut-
ing this in (5.9) and taking modulo f (X), we get a(X)C1 (X) = 1 in F. Hence
a(X) has C1 (X) as inverse in F . Thus every non-zero element of F has a
multiplicative inverse in F , and so F is a field of pn elements.
We have constructed a field of p^n elements in two different ways: one
as the field of roots of the polynomial X^{p^n} − X ∈ Zp [X], and the other as
the field of polynomials in Zp [X] reduced modulo an irreducible polynomial
f (X) of degree n over Zp. Essentially, there is not much of a difference
between the two constructions, as our next theorem shows.
Theorem 5.34.3:
Any two finite fields of the same order are isomorphic under a field isomorphism.
Example 5.34.4:
Take p = 2 and n = 3. The polynomial X^3 + X + 1 of degree 3 is irreducible
over Z2. (If it were reducible, one of the factors would be of degree 1, and it
would be either X or X + 1 = X − 1 ∈ Z2 [X]. But 0 and 1 are not roots
of X^3 + X + 1 ∈ Z2 [X]). The 2^3 = 8 polynomials over Z2 reduced modulo
X^3 + X + 1 are:
0, 1, X, X + 1, X^2, X^2 + 1, X^2 + X, X^2 + X + 1
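Arithmetic in this 8-element field is conveniently carried out on bitmasks, with bit i holding the coefficient of X^i and reduction by X^3 = X + 1. A Python sketch (the function `gf8_mul` is ours) confirming that every non-zero element has an inverse, as Construction 2 guarantees:

```python
def gf8_mul(a, b):
    """Multiply two elements of GF(8) = Z2[X] mod X^3 + X + 1.
    Elements are bitmasks: bit i is the coefficient of X^i."""
    r = 0
    while b:
        if b & 1:               # add a copy of a for this bit of b
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:          # degree reached 3: reduce by X^3 = X + 1
            a ^= 0b1011         # 0b1011 encodes X^3 + X + 1
    return r

# X * X^2 = X^3 = X + 1
assert gf8_mul(0b010, 0b100) == 0b011
# every non-zero element has a multiplicative inverse
for a in range(1, 8):
    assert any(gf8_mul(a, b) == 1 for b in range(1, 8))
```

The same scheme, with a different reduction mask, implements GF(2^n) for any irreducible polynomial of degree n over Z2.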
Theorem 5.34.5:
If F is a finite field, F ∗ (the set of non-zero elements of F ) is a cyclic group.
Since (o(α), o(β^{(k,l)})) = 1, we have o(αβ^{(k,l)}) = o(α) · o(β^{(k,l)}) = k · l/(k, l)
= [k, l], the lcm of k and l.
But, by our choice, the maximum order of any element of F∗ is k. Therefore
[k, l] = k, which implies that l | k. But l = o(β). Therefore β^k = 1. Thus
each of the q − 1 elements x of F∗ satisfies x^k = 1 and so is a root of X^k − 1.
Since a polynomial of degree k over a field has at most k roots, this
means that, as |F∗| = q − 1, k = q − 1. Thus o(α) = q − 1 and so F∗ is the
cyclic group generated by α.
a0 + a1 α + · · · + an−1 α^{n−1}, ai ∈ P
Example 5.34.7:
Consider the polynomial X 4 + X + 1 ∈ Z2 [X]. This is irreducible over Z2
(Check that it can have no linear or quadratic factor in Z2 [X]). Let α be a
root (in an extension field of Z2 ) of this polynomial so that α4 + α + 1 = 0.
This means that α4 = α + 1.
We now prove that α is a primitive element of a field of 16 elements over
Z2 by checking that the 15 powers α, α^2, . . . , α^15 are all distinct and that
α^15 = 1. Indeed, we have
α^1 = α
α^2 = α^2
α^3 = α^3
α^4 = α + 1
α^5 = α·α^4 = α^2 + α
α^6 = α·α^5 = α^3 + α^2
α^7 = α·α^6 = α^4 + α^3 = α^3 + α + 1
α^8 = α·α^7 = α^4 + α^2 + α = α^2 + 1
α^9 = α·α^8 = α^3 + α
α^10 = α·α^9 = α^4 + α^2 = α^2 + α + 1
α^11 = α·α^10 = α^3 + α^2 + α
α^12 = α·α^11 = α^4 + α^3 + α^2 = α^3 + α^2 + α + 1
α^13 = α·α^12 = α^4 + α^3 + α^2 + α = α^3 + α^2 + 1
α^14 = α·α^13 = α^4 + α^3 + α = α^3 + 1
α^15 = α·α^14 = α^4 + α = (α + 1) + α = 1.
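The table of powers can be generated mechanically by repeated multiplication by α, applying the reduction α^4 = α + 1 on bitmasks (bit i holding the coefficient of α^i). A Python sketch (the helper `next_power` is ours):

```python
def next_power(x):
    """Multiply x (a bitmask polynomial in alpha over Z2) by alpha,
    using the relation alpha^4 = alpha + 1 in GF(16)."""
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011        # reduce by alpha^4 + alpha + 1
    return x

powers = []
x = 1
for _ in range(15):
    x = next_power(x)
    powers.append(x)        # powers[k] is alpha^(k+1)

assert len(set(powers)) == 15    # alpha, ..., alpha^15 all distinct
assert powers[-1] == 1           # alpha^15 = 1
assert powers[3] == 0b0011       # alpha^4 = alpha + 1
```

Since the 15 powers exhaust the non-zero elements, α generates the cyclic group GF(16)∗, exactly as Theorem 5.34.5 predicts.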
Fields
(α^i)^{p^n} = α^i.
This shows that there exists a least positive integer t such that α^{i·p^{t+1}} = α^i.
Then set
Ci = {i, pi, p^2 i, . . . , p^t i}, 0 ≤ i ≤ p^n − 1.
The sets Ci are called the cyclotomic cosets modulo p defined with respect
to F and α. Now, corresponding to the coset Ci , 0 ≤ i ≤ pn − 1, consider
the polynomial
fi (X) = (X − α^i)(X − α^{i·p})(X − α^{i·p^2}) · · · (X − α^{i·p^t}).
The coefficients of fi are elementary symmetric functions of α^i, α^{i·p}, . . . , α^{i·p^t},
and if β denotes any of these coefficients, then β satisfies the relation β^p = β.
Hence β ∈ Zp and fi (X) ∈ Zp [X] for each i, 0 ≤ i ≤ p^n − 1. Each element
of Ci determines the same cyclotomic coset, that is, Ci = Cip = Cip^2 = · · · =
Cip^t. Moreover, if j ∉ Ci, then Ci ∩ Cj = ∅. This gives a factorization of X^{p^n} − X
into irreducible factors over Zp. In fact, X^{p^n} − X = X(X^{p^n − 1} − 1), and
X^{p^n − 1} − 1 = (X − α)(X − α^2) · · · (X − α^{p^n − 1})
= ∏_i ( ∏_{j ∈ Ci} (X − α^j) ),
where the first product is taken over all the distinct cyclotomic cosets. What
is more, each polynomial fi (X) is irreducible over Zp as shown below. To see
this, assume that
g(X) = a0 + a1 X + · · · + ak X^k ∈ Zp [X].
Then g(X)^p = a0^p + a1^p X^p + · · · + ak^p (X^k)^p
= a0 + a1 X^p + · · · + ak X^{kp}
= g(X^p).
Example 5.35.1:
We factorize X^{2^4} − X into irreducible factors over Z2.
Let α be a primitive element of the field GF (24 ). As a primitive polyno-
mial of degree 4 over Z2 having α as a root, we can take (See Example 5.34.7)
X 4 + X + 1.
The cyclotomic cosets modulo 2 w.r.t. GF (24 ) and α are:
C0 = {0}
C1 = {1, 2, 2^2 = 4, 2^3 = 8} (Note: 2^4 = 16 ≡ 1 (mod 15))
C3 = {3, 6, 12, 9}
C5 = {5, 10}
C7 = {7, 14, 13, 11}
For instance, f5(X) = (X − α^5)(X − α^10) = X^2 + (α^5 + α^10)X + α^{15}
= X^2 + X + 1.
The six factors on the right of Equation (5.10) are all irreducible over Z2 . The
minimal polynomials of α, α3 and α7 are all of degree 4 over Z2 . However,
while α and α7 are primitive elements of GF (24 ) (so that the polynomials
X 4 +X +1 and X 4 +X 3 +1 are primitive), α3 is not (even though its minimal
polynomial is also of degree 4).
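The cyclotomic cosets are easy to generate for any p and any modulus. A Python sketch (the function name is ours) reproducing the cosets modulo 15 listed above:

```python
def cyclotomic_cosets(p, m):
    """Cyclotomic cosets {i, i*p, i*p^2, ...} taken modulo m."""
    seen, cosets = set(), []
    for i in range(m):
        if i not in seen:
            c, j = [], i
            while j not in c:       # multiply by p until the orbit closes
                c.append(j)
                j = j * p % m
            cosets.append(c)
            seen.update(c)
    return cosets

cosets = cyclotomic_cosets(2, 15)
assert [1, 2, 4, 8] in cosets
assert [3, 6, 12, 9] in cosets
assert [5, 10] in cosets
assert [7, 14, 13, 11] in cosets
```

The coset sizes (1, 4, 4, 2, 4) sum to 15, matching the degrees of the irreducible factors of X^{2^4 − 1} − 1 over Z2.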
5.35.1 Exercises
3. Factorize X^{2^3} + X and X^{2^5} + X over Z2.
4. Factorize X^{3^2} − X over Z3.
5. Using Theorem 5.34.5, prove Fermat's little theorem: for any prime
p, a^{p−1} ≡ 1 (mod p) whenever a ≢ 0 (mod p).
dered pairs (1, 1), (2, 2), (3, 3); (2, 3), (3, 1), (1, 2); (3, 2), (1, 3), (2, 1) are all
distinct. However, if M1 = [ 1 2 ; 2 1 ] and M2 = [ 2 1 ; 1 2 ] (rows separated
by semicolons), then the 4 ordered pairs (1, 2), (2, 1), (2, 1) and (1, 2) are not
all distinct. Hence M1 and M2 are not
orthogonal. The study of orthogonal latin squares started with Euler, who
had proposed the following problem of 36 officers. The problem asks for
an arrangement of 36 officers of 6 ranks and from 6 regiments in a square
formation of size 6 by 6. Each row and column of this arrangement are to
contain only one officer of each rank and only one officer from each regiment.
We label the ranks and the regiments from 1 through 6, and assign to each
officer an ordered pair of integers in 1 through 6. The first component of the
ordered pair corresponds to the rank of the officer and the second component
his regiment. Euler’s problem then reduces to finding a pair of orthogonal
latin squares of order 6. Euler conjectured in 1782 that there exists no pair
of orthogonal latin squares of order n ≡ 2( mod 4). Euler himself verified
the conjecture for n = 2, while Tarry in 1900 verified it for n = 6 by a sys-
tematic case by case analysis. But the most significant result with regard to
the Euler conjecture came from Bose, Shrikhande, and Parker, who disproved
the conjecture by establishing that if n ≡ 2( mod 4) and n > 6, then there
exists a pair of orthogonal latin squares of order n.
A set {L1 , . . . , Lt } of t latin squares of order n on S is called a set of mu-
tually orthogonal latin squares (MOLS) if Li and Lj are orthogonal whenever
i 6= j. It is easy to see [59] that the number t of MOLS of order n is bounded
by n − 1. Further, any set of n − 1 MOLS of order n is known to be equiva-
lent to the existence of a finite projective plane of order n. A long standing
conjecture is that if n is not a prime power, then there exists no complete
set of MOLS of order n.
We now show that if n is a prime power, there exists a set of n − 1 MOLS
of order n. (Equivalently, this implies that there exists a projective plane of
any prime power order, though we do not prove this here).
Theorem 5.36.1:
Let n = pk , where p is a prime and k is a positive integer. Then for n ≥ 3,
there exists a complete set of MOLS of order n.
Proof. By Theorem 5.34.2, we know that there exists a finite field GF (pk ) =
GF (n) = F , say. Denote the elements of F by a0 = 0, a1 = 1, a2 , . . . , an−1 .
Define the n − 1 matrices A1, . . . , An−1 of order n by A_t = (a^t_ij),
where a^t_ij = a_t a_i + a_j. The entries a^t_ij are all elements of the field F. We
claim that each A_t is a latin square. Suppose, for instance, that two entries of
some i-th row of A_t, say a^t_ij and a^t_il, are equal. This implies that

a_t a_i + a_j = a_t a_i + a_l,

and hence a_j = a_l. Consequently j = l. Thus all the entries of the i-th row
of A_t are distinct. For a similar reason, no two entries of the same column of
A_t are equal. Hence A_t is a latin square.
We next claim that {A_1, . . . , A_{n−1}} is a set of MOLS. Suppose 1 ≤ r <
u ≤ n − 1. Then A_r and A_u are orthogonal. For suppose that

(a^r_ij, a^u_ij) = (a^r_{i′j′}, a^u_{i′j′}).

Then

a_r a_i + a_j = a_r a_{i′} + a_{j′},
and a_u a_i + a_j = a_u a_{i′} + a_{j′}.

Subtraction gives

(a_r − a_u) a_i = (a_r − a_u) a_{i′}.

Since r ≠ u, we have a_r ≠ a_u, and so a_i = a_{i′}; hence i = i′. The first
equation then gives a_j = a_{j′}, so j = j′. Thus the n² ordered pairs obtained
by superimposing A_r and A_u are all distinct, and A_r and A_u are orthogonal.
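For n = p prime (that is, k = 1), GF(p) is just Z_p, and the construction can be checked directly. The sketch below (a simplification: it assumes p prime, not a general prime power) builds the squares with (i, j) entry t·i + j (mod p) and verifies both the latin-square and the orthogonality conditions.

```python
# Theorem 5.36.1 for n = p prime, where GF(p) = Z_p. The general
# prime-power case needs polynomial arithmetic and is not shown here.
p = 5

def latin_square(t):
    # A_t has (i, j) entry a_t*a_i + a_j; over Z_p the field elements
    # a_0, ..., a_{p-1} are simply 0, 1, ..., p-1.
    return [[(t * i + j) % p for j in range(p)] for i in range(p)]

squares = [latin_square(t) for t in range(1, p)]

def is_latin(M):
    n = len(M)
    return all(len(set(row)) == n for row in M) and \
           all(len({M[i][j] for i in range(n)}) == n for j in range(n))

def orthogonal(M1, M2):
    # orthogonal iff superimposing gives n^2 distinct ordered pairs
    n = len(M1)
    pairs = {(M1[i][j], M2[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n

assert all(is_latin(M) for M in squares)
assert all(orthogonal(squares[r], squares[u])
           for r in range(len(squares)) for u in range(r + 1, len(squares)))
print(len(squares), "mutually orthogonal latin squares of order", p)
```

This produces the complete set of p − 1 MOLS promised by the theorem.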
Graph Theory
6.1 Introduction
Chapter 6 Graph Theory 317
banks C and D of the Pregel river (later called the Pregolya). The people
of Königsberg wondered if it was possible to take a stroll across the seven
bridges, crossing each bridge exactly once and returning to the starting point.
Euler showed that it was not possible.
Figure 6.1: The seven bridges of Königsberg: islands A and B, and banks C and D of the Pregel river.

Figure 6.2: A graph model of the seven-bridges problem.
Definition 6.2.1:
A graph G consists of a vertex (or point) set V (G) = {v1 , . . . , vn } and an edge
(or line) set E(G) = {e1 , . . . , em }, where each edge consists of an unordered
pair of vertices. If e is an edge of G, then it is represented by the unordered
pair {a, b} (denoted by ab when no confusion arises). We call a and b, the
endpoints of e. If an edge e = {u, v} ∈ E(G), then u and v are said to be
adjacent. A null graph on n vertices, denoted by Nn , has no edges. The
empty graph is denoted by φ and has vertex set φ (and therefore edge set φ).
A loop is an edge whose endpoints are the same. Parallel edges or multiple
edges are edges that have the same pair of endpoints. A simple graph is one
that has no loops and no multiple edges.
The order of a graph G, denoted by n(G) or simply n, is the number of
vertices in V (G). A graph of order 1 is called trivial. The size of a graph G,
denoted by m(G) or simply m, is the number of edges in E(G). If a graph
G has finite order and finite size, then G is said to be a finite graph.
Unless stated otherwise, we consider only simple finite graphs.
It is possible to assign a direction or orientation to each edge in a graph:
we then treat an edge e (with endpoints u and v) as an ordered pair (u, v)
or (v, u), often denoted by →uv or →vu respectively.
Definition 6.2.2:
A directed graph or a digraph G consists of a vertex set V (G) = {v1 , . . . , vn }
Figure 6.3: A graph G1 and a digraph G2, where
V(G1) = {1, 2, 3, 4, 5}, E(G1) = {{1, 2}, {2, 3}, {4, 5}};
V(G2) = {1, 2, 3, 4, 5, 6, 7}, A(G2) = {(1, 2), (2, 3), (1, 4), (3, 6), (6, 5), (6, 7), (7, 6)}.
and an arc set A(G) = {e1, . . . , em}, where each arc is an ordered pair of
vertices. A simple digraph is one in which each ordered pair of vertices
occurs at most once as an arc. We indicate an arc as (u, v) or →uv, where u
and v are the endpoints. Note that this is different from the arc →vu with the
same endpoints. If e = (u, v), then u is called the tail and v is called the
head of e.
Figure 6.4: Two isomorphic graphs G3 and G4, where
V(G3) = {1, 2, 3, 4, 5, 6, 7, 8},
E(G3) = {{1, 4}, {4, 8}, {8, 5}, {5, 1}, {1, 2}, {5, 6}, {8, 7}, {4, 3}, {2, 3}, {3, 7}, {7, 6}, {6, 2}};
V(G4) = {a, b, c, d, e, f, g, h},
E(G4) = {ab, bc, cd, da, ef, fg, gh, he, ae, bf, cg, dh}.
An isomorphism is given by the correspondence
1 ↔ a, 2 ↔ b, 3 ↔ c, 4 ↔ d, 5 ↔ e, 6 ↔ f, 7 ↔ g, 8 ↔ h.
Definition 6.2.3:
Two simple graphs G and H are isomorphic if there is a bijection φ : V (G) →
V (H) such that {u, v} ∈ E(G) if and only if {φ(u), φ(v)} ∈ E(H).
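Definition 6.2.3 can be tested mechanically by trying every bijection φ, which is feasible only for small graphs (there are n! candidates). A minimal sketch:

```python
# Brute-force isomorphism test per Definition 6.2.3: try every
# bijection phi : V(G) -> V(H) and check that it preserves
# adjacency and non-adjacency. Only practical for small n.
from itertools import permutations

def isomorphic(VG, EG, VH, EH):
    EG = {frozenset(e) for e in EG}
    EH = {frozenset(e) for e in EH}
    if len(VG) != len(VH) or len(EG) != len(EH):
        return False
    for perm in permutations(VH):
        phi = dict(zip(VG, perm))
        if all((frozenset((phi[u], phi[v])) in EH) == (frozenset((u, v)) in EG)
               for i, u in enumerate(VG) for v in VG[i + 1:]):
            return True
    return False

# the 4-cycle drawn two different ways:
print(isomorphic([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)],
                 ['a', 'b', 'c', 'd'],
                 [('a', 'c'), ('c', 'b'), ('b', 'd'), ('d', 'a')]))  # True
```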
Exercise 6.2.4:
Show that the Petersen graph P (most commonly drawn as G1 below) is
isomorphic to the graphs G2 , G3 and G4 below:
Figure 6.5: Four drawings G1, G2, G3 and G4 of graphs isomorphic to the Petersen graph.
Exercise 6.2.5:
Show that the Petersen graph is isomorphic to the following graph Q: The
vertex set V(Q) is the set of unordered pairs {i, j}, i ≠ j, 1 ≤
i, j ≤ 5. Two vertices {i, j} and {k, l} (i, j, k, l ∈ {1, 2, . . . , 5}) form an edge
if and only if {i, j} ∩ {k, l} = ∅.
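The graph Q of Exercise 6.2.5 is easy to build and sanity-check in code; the sketch below confirms it has the familiar Petersen parameters (10 vertices, 15 edges, 3-regular), though that alone does not prove the isomorphism asked for.

```python
# Construct Q: vertices are the 2-element subsets of {1,...,5};
# two vertices are adjacent when the subsets are disjoint.
from itertools import combinations

V = [frozenset(p) for p in combinations(range(1, 6), 2)]
E = [(a, b) for a, b in combinations(V, 2) if not (a & b)]

degrees = {v: sum(v in e for e in E) for v in V}
print(len(V), len(E), set(degrees.values()))  # 10 15 {3}
```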
Example 6.2.6:
Let V be a set of cardinality n (the vertices). From V we can obtain
C(n, 2) = n(n−1)/2 unordered pairs (these can be treated as possible edges).
Each subset of these unordered pairs defines a simple graph, and hence there
are 2^{n(n−1)/2} simple graphs on V.
Figure 6.6: Eleven simple graphs G1, . . . , G11 on four vertices each.
Definition 6.2.7:
The complement Ḡ of a simple graph G is the simple graph with vertex set
V(Ḡ) = V(G) and edge set E(Ḡ) defined thus: uv ∈ E(Ḡ) if and only if
uv ∉ E(G). That is, E(Ḡ) = {uv | uv ∉ E(G)}.
Example 6.2.8:
The following graph G is isomorphic to its complement Ḡ:
Figure 6.7: A graph G on the vertices 1, . . . , 8 and its complement Ḡ; here G is isomorphic to Ḡ.
Exercise 6.2.9:
Show that two graphs G and H are isomorphic if and only if Ḡ and H̄ are
isomorphic.
Definition 6.2.10:
A simple graph is called self-complementary if it is isomorphic to its own
complement.
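A standard small example of Definition 6.2.10 is the 5-cycle C5, which is self-complementary. The sketch below verifies this with the explicit bijection i → 2i (mod 5); the choice of this particular map is a known trick, not something stated in the text.

```python
# C5 is self-complementary: the map i -> 2i (mod 5) carries the
# edges of C5 onto the edges of its complement.
V = list(range(5))
E = {frozenset((i, (i + 1) % 5)) for i in V}                 # the 5-cycle
E_bar = {frozenset((u, v)) for u in V for v in V
         if u < v and frozenset((u, v)) not in E}            # its complement

phi = {i: (2 * i) % 5 for i in V}                            # candidate isomorphism
image = {frozenset((phi[u], phi[v])) for u, v in map(tuple, E)}
print(image == E_bar)  # True
```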
Exercise 6.2.11:
The line-graph L(G) of a given graph G = (V(G), E(G)) is the simple graph
whose vertices are the edges of G, with ef ∈ E(L(G)) if and only if the edges
e and f of G have a common endpoint. Prove that the Petersen graph is the
complement of the line-graph of K5.
Exercise 6.2.12:
Prove that if G is a self-complementary graph with n vertices, then n is either
4t or 4t + 1, for some integer t. (Hint: consider the number of edges of Kn.)
Figure 6.8: A graph G and two of its subgraphs G1 and G2.
Note that a complete graph may have many subgraphs that are not cliques,
but every induced subgraph of a complete graph is a clique.
The components of a graph G are its maximal connected subgraphs. A
component is nontrivial if it contains an edge.
An independent set of a graph G is a vertex subset S ⊆ V (G) such that
no two vertices of S are adjacent in G. It is easy to check that a clique of G
is an independent set of Ḡ and vice versa.
We next introduce a special class of graphs, called bipartite graphs. A
graph G is called bipartite if V (G) can be partitioned into two subsets X and
Y such that each edge of G has one endpoint in X and the other in Y . We
express this by writing G = G(X, Y ).
A complete bipartite graph G is a bipartite graph G(X, Y ) whose edge set
consists of all possible pairs of vertices having one endpoint in X and the
other in Y. If X has m vertices and Y has n vertices, such a graph is denoted
by Km,n . Note that Km,n is isomorphic to Kn,m . It is easy to see that Km,n
has mn edges.
Example 6.2.13:
The graph of Fig 6.9.(a) is the 3-cube. It is a bipartite graph (though not
complete). Fig. 6.9.(b) is a redrawing of Fig. 6.9.(a) exhibiting the biparti-
tion X = {x1, x2, x3, x4} and Y = {y1, y2, y3, y4}. The graphs of Fig. 6.9.(c) are
isomorphic to the complete bipartite graph K3,3 .
Figure 6.9: (a) the 3-cube; (b) a redrawing of (a) with X = {x1, x2, x3, x4} on one side and Y = {y1, y2, y3, y4} on the other; (c) two drawings of K3,3.
Exercise 6.2.14:
Let G be the graph whose vertex set is the set of binary strings of length
n (n ≥ 1). A vertex x in G is adjacent to vertex y of G if and only if x and
y differ exactly in one position in their binary representation. Prove that G
is a bipartite graph.
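Exercise 6.2.14 suggests the 2-coloring by parity of the number of 1s: adjacent strings differ in exactly one bit, so their parities differ. A quick check for n = 4 (the choice of n is arbitrary):

```python
# Verify the parity bipartition of the n-cube for n = 4.
from itertools import product

n = 4
V = [''.join(bits) for bits in product('01', repeat=n)]

def adjacent(x, y):
    # adjacent iff the strings differ in exactly one position
    return sum(a != b for a, b in zip(x, y)) == 1

X = {v for v in V if v.count('1') % 2 == 0}   # even-weight strings
# every edge goes between X and its complement:
ok = all((u in X) != (v in X) for u in V for v in V if adjacent(u, v))
print(ok)  # True
```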
Definition 6.2.15:
A walk of length k in a graph G is a non-null alternating sequence v0 e1 v1 e2 . . . ek vk
of vertices and edges of G (starting and ending with vertices) such that
ei = vi−1 vi , for all i. A trail is a walk in which no edge is repeated. A path
is a walk with no repeated vertex. A (u, v)-walk is one whose first vertex
is u and last vertex is v (u and v are the end vertices of the walk). A walk or
trail is closed if it has length at least one and its end vertices are the same.
A cycle is a closed path. A cycle on n vertices is denoted by Cn (where the
vertices are unlabeled).
Definition 6.2.16:
A graph G is connected if it has a (u, v)-path for each pair u, v ∈ V (G).
Exercise 6.2.17:
Let G be a simple graph. Show that if G is not connected, then its complement Ḡ is connected.
Six people are at a party. Show that there are three people who all know
each other or there are three people who are mutually strangers.
Perhaps, the easiest way to solve the problem is using graph theory. Con-
sider the complete graph K6 . We associate the six people with the six vertices
of K6 . We color the edges joining two vertices black if the corresponding peo-
ple know each other. If two people do not know each other, we color the edge
joining the corresponding vertices grey. If there are three people who know
(don’t know) each other, then we should have a black (grey) triangle in K6 .
Given an assignment of colors to all edges of K6 , a subgraph H is called
monochromatic if all edges of H have the same color.
The party problem can now be posed as follows:
If we arbitrarily color the edges of K6 black or grey, then there must be
a monochromatic clique on three vertices.
Let u, v, w, x, y, z be the vertices of K6. An arbitrary vertex, say u, of
K6 has degree 5. So when we color the edges incident with u, we must use the
color black or grey at least three times. Without loss of generality, assume
that the three edges are colored black as shown in Fig. 6.10. Let these edges
be uv, ux and uw. If any one of the edges vw, vx or xw is now colored
black, we get the required black triangle. Hence we suppose all these edges
are colored grey; but then v, w and x form a grey triangle, proving the claim.

Figure 6.10: The three black edges uv, uw, ux at the vertex u.
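Since K6 has only 15 edges, the party problem can also be settled by exhaustion over all 2^15 = 32768 colorings; a sketch:

```python
# Check exhaustively that every black/grey coloring of E(K6)
# contains a monochromatic triangle.
from itertools import combinations

edges = list(combinations(range(6), 2))        # the 15 edges of K6
triangles = list(combinations(range(6), 3))    # the 20 triangles of K6

def has_mono_triangle(coloring):
    color = dict(zip(edges, coloring))
    for a, b, c in triangles:
        if color[(a, b)] == color[(a, c)] == color[(b, c)]:
            return True
    return False

assert all(has_mono_triangle([(mask >> i) & 1 for i in range(15)])
           for mask in range(1 << 15))
print("every 2-coloring of K6 has a monochromatic triangle")
```

The same search on K5 finds colorings with no monochromatic triangle, so six is the smallest number of people for which the claim holds.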
(iii) two-way infinite paths with vertices alternating between X and Y . Such
a path is of the form
. . . , x, y, x′ , y ′ , . . . ,
where x, x′ , . . . ∈ X, and y, y ′ . . . ∈ Y .
x1, y1, x2, y2, . . . , xn, yn, x1.
Definition 6.2.18:
Given a graph G, let v ∈ V (G). The degree d(v) or dG (v) of v is the number
of edges of G incident with v.
Figure 6.12:
Exercise 6.2.19:
Consider any k-regular graph G for odd k. Prove that the number of edges
in G is a multiple of k.
Exercise 6.2.20:
Prove that every 5-regular graph contains a cycle of length at least six.
Theorem (Degree-Sum Formula):
In any graph G, Σ_{v∈V(G)} d(v) = 2m(G).

Proof. When the degrees of all the vertices are summed up, each edge is
counted twice. Hence the result.
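The Degree-Sum Formula is easy to confirm numerically; a minimal sketch on an arbitrary small graph:

```python
# Summing all degrees counts each edge twice.
E = [(1, 2), (1, 3), (2, 3), (3, 4)]           # an arbitrary small graph
V = {v for e in E for v in e}

deg = {v: sum(v in e for e in E) for v in V}   # degree of each vertex
print(sum(deg.values()), 2 * len(E))           # both equal 8
```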
Proof. Let A and B be respectively the sets of odd and even vertices of G.
Then for each u ∈ B, d(u) is even, and so Σ_{u∈B} d(u) is also even. By the
Degree-Sum Formula,

Σ_{u∈B} d(u) + Σ_{w∈A} d(w) = Σ_{v∈V(G)} d(v) = 2m(G).

This gives Σ_{w∈A} d(w) = 2m(G) − Σ_{u∈B} d(u), an even number. Since each
term d(w), w ∈ A, is odd, the number of terms |A| must be even.
Hence the result. This can be interpreted thus: the number of participants
at a birthday party each of whom shakes hands with an odd number of other
participants is always even.
Definition 6.3.1:
The adjacency list representation of a graph G = (V(G), E(G)) consists of
an array Adj of |V(G)| lists, one for each vertex. For each vertex u of G,
Adj[u] points to the list of all vertices v that are adjacent to u.
For a directed graph, Adj[u] points to the list of all vertices v such that
(u, v) is an arc of the digraph. It is easy to see that the adjacency list
representation of a graph (directed or undirected) has the desirable property
that the amount of memory it requires is proportional to |V(G)| + |E(G)|.
A drawback, however, is that to determine whether an edge
uv is present in the graph, the only way is to search the list Adj[u] for v.
The process of determining the presence (or absence) of an edge uv is much
simpler in the adjacency-matrix (defined below) representation of a graph.
Definition 6.3.2:
To represent a graph G = (V(G), E(G)), we first number the vertices of G
by 1, 2, . . . , |V| in some arbitrary manner. The adjacency matrix of G is then
the |V| × |V| matrix A = (aij) where

aij = 1 if ij ∈ E(G), and aij = 0 otherwise.

The above definition applies to directed graphs also, where we specify the
elements aij as

aij = 1 if (i, j) ∈ A(G), and aij = 0 otherwise.
Note that a graph may have many adjacency lists and adjacency matrices
because the numbering of the vertices is arbitrary. However, all the rep-
resentations yield graphs that are isomorphic. It is then possible to study
properties of graphs that do not depend on the labels of the vertices.
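The trade-off between the two representations can be shown side by side; a small sketch (the graph is arbitrary):

```python
# Definitions 6.3.1-6.3.2 for the same graph: adjacency lists use
# memory proportional to |V| + |E|, while the adjacency matrix
# answers "is uv an edge?" in a single lookup.
E = [(1, 2), (2, 3), (1, 4), (3, 4)]
n = 4

adj = {u: [] for u in range(1, n + 1)}          # adjacency lists
for u, v in E:
    adj[u].append(v)
    adj[v].append(u)

A = [[0] * n for _ in range(n)]                 # adjacency matrix
for u, v in E:
    A[u - 1][v - 1] = A[v - 1][u - 1] = 1

print(adj[1])        # neighbours of 1: [2, 4]
print(A[0][1])       # 1: edge 12 is present
print(A[0][2])       # 0: edge 13 is absent
```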
Theorem 6.3.3:
Let G be a graph with n vertices v1 , . . . , vn . Let A be the adjacency matrix
of G with this labeling of vertices. Let A^k (k a positive integer) be the
product of k copies of A. Then the (i, j)th entry of A^k is the number
of different (vi , vj )-walks in G of length k.
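Theorem 6.3.3 can be illustrated on the path with vertices 1, 2, 3: the (1, 3) entry of A² is 1 (the single walk 1, 2, 3) and the (1, 1) entry is 1 (the walk 1, 2, 1). Plain integer matrix multiplication suffices:

```python
# Walk counting via powers of the adjacency matrix (path 1-2-3).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A2 = matmul(A, A)
print(A2)  # [[1, 0, 1], [0, 2, 0], [1, 0, 1]]
```

The middle entry 2 counts the two length-2 walks 2, 1, 2 and 2, 3, 2.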
The next theorem uses the above result to determine whether or not a
graph is connected.
Theorem 6.3.4:
Let G be a graph with n vertices v1, . . . , vn and let A be the adjacency matrix
of G. Let B = (bij) be the matrix given by

B = A + A² + · · · + A^{n−1}.

Then G is connected if and only if bij ≠ 0 for every pair of distinct indices i, j.
Proof. Let a^(k)_ij denote the (i, j)th entry of A^k (k = 1, . . . , n − 1). We then
have

b_ij = a^(1)_ij + a^(2)_ij + · · · + a^(n−1)_ij.

By Theorem 6.3.3, a^(k)_ij denotes the number of distinct walks of length k from
vi to vj. In other words, bij is the number of different (vi, vj)-walks of length less than
n.
Assume that G is connected. Then for every pair i, j(i 6= j) there is a
path from vi to vj . Since G has only n vertices, any path is of length at most
n − 1. Hence there is a path of length less than n from vi to vj . This implies
that bij 6= 0.
Conversely, assume that bij 6= 0 for every pair i, j(i 6= j). Then from
the above discussion it follows that there is at least one walk of length less
than n, from vi to vj . This holds for every pair i, j(i 6= j) and therefore we
conclude that G is connected.
Exercise 6.3.5:
Let A be the adjacency matrix of a connected graph G with n vertices; is it
Figure 6.13: A graph on the vertices 1, 2, 3, 4.

Reading the entries above the diagonal of its adjacency matrix row by row
gives the binary string 1 0 0 1 1 1,
which represents the number 39 in the decimal system. Thus we can uniquely
represent a graph by a number.
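The encoding just described is easy to implement: concatenate the above-diagonal entries of the adjacency matrix and read the result in binary. The matrix below is one example whose string is 100111 (the exact graph of Fig. 6.13 is not reproduced here):

```python
# Encode a labeled simple graph as an integer by reading the
# above-diagonal entries of its adjacency matrix row by row.
def graph_number(A):
    n = len(A)
    bits = [A[i][j] for i in range(n) for j in range(i + 1, n)]
    return int(''.join(map(str, bits)), 2)

A = [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 0]]
print(graph_number(A))  # 39, i.e. binary 100111
```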
Figure 6.14: A graph G and the graphs G + uv (adding an edge), G − e (deleting an edge), G − w (deleting a vertex) and the graph obtained by contracting the edge e.
Definition 6.4.1:
In a connected graph G, a subset V′ ⊆ V(G) is a vertex cut of G if G − V′
is disconnected. It is a k-vertex cut if |V′| = k. V′ is also called a separating set
of vertices of G. A vertex v of a connected graph G is a cut vertex of G if
{v} is a vertex cut of G.
Definition 6.4.2:
Let G be a nontrivial graph. Let S be a proper nonempty subset of V . Let
[S, S̄] denote the set of all edges of G having one endpoint in S and the other
in S̄. A set of edges of G of the form [S, S̄] is called an edge cut of G. An
edge e ∈ E(G) is a cut edge of G if {e} is an edge cut of G. An edge cut
of cardinality k is called a k-edge cut of G. If e is a cut edge of a connected
graph, then G − e has exactly two components.
Example 6.4.3:
Consider the graph in Fig. 6.15.
Figure 6.15: A graph on the vertices u, v, w, x, y, z.

{v} and {w, x} are vertex cuts. The edge subsets {wy, xy}, {uv} and
{xz} are all edge cuts. Vertex v is a cut vertex. Edges uv and xz are cut
edges.
Theorem 6.4.4:
A vertex v of a connected graph G with at least three vertices is a cut vertex
of G if and only if there exist vertices u and w of G, distinct from v, such
that v is in every (u, w)-path in G.
Theorem 6.4.5:
In a connected graph G, an edge e = uv is a cut edge of G if and only if
e does not belong to any cycle of G.
Proof. Let e = uv be a cut edge of G, and let S and S̄ be the vertex sets of
the two components of G − e, with u ∈ S and v ∈ S̄; then [S, S̄] = {e}. If e belongs to a
cycle of G then [S, S̄] must contain at least one more edge contradicting that
{e} = [S, S̄]. Hence e cannot belong to a cycle.
Conversely assume that e is not a cut edge of G. Then G − e is connected
and hence there exists a (u, v)-path P in G − e. Then P together with the
edge e forms a cycle in G.
Theorem 6.4.6:
In a connected graph G, an edge e = uv is a cut edge if and only if there
exist vertices x and y such that e belongs to every (x, y)-path in G.
Proof. If e = uv is a cut edge of G, then u and v lie in distinct components
of G − e, so every (u, v)-path in G must use e; we may thus take x = u and y = v.
Conversely, assume that there exist vertices x and y satisfying the condition
of the theorem. Then there exists no (x, y)-path in G − e, and this means
that G − e is disconnected. Hence e is a cut edge of G.
Exercise 6.4.7:
Prove or disprove: Let G be a simple connected graph with |V (G)| ≥ 3.
Then G has a cut edge if and only if it has a cut vertex.
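For small graphs, cut vertices and cut edges can be found by brute force: delete each candidate and test connectivity. A sketch (the example graph, two triangles sharing a vertex, is made up):

```python
# Find cut vertices and cut edges by deletion plus a DFS
# connectivity test. Fine for small graphs.
def connected(V, E):
    if not V:
        return True
    adj = {v: set() for v in V}
    for u, v in E:
        adj[u].add(v); adj[v].add(u)
    seen, stack = set(), [next(iter(V))]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u] - seen)
    return seen == set(V)

def cut_vertices(V, E):
    return [w for w in V
            if not connected(V - {w}, [e for e in E if w not in e])]

def cut_edges(V, E):
    return [e for e in E if not connected(V, [f for f in E if f != e])]

# two triangles sharing the vertex 3:
V = {1, 2, 3, 4, 5}
E = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 3)]
print(cut_vertices(V, E), cut_edges(V, E))  # [3] []
```

Consistent with Theorem 6.4.5, no edge is a cut edge here, since every edge lies on a cycle.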
Definition 6.4.8:
Let G be a nontrivial connected graph having at least a pair of non-adjacent
vertices. The minimum k for which there exists a k-vertex cut is called the
vertex connectivity or simply the connectivity of G; it is denoted by κ(G).
If G has no pair of nonadjacent vertices, that is if G is a complete graph of
order n, then κ(G) is defined to be n − 1.
Note that the removal of any set of n − 1 vertices of Kn results in a K1 .
A subset of vertices or edges of a connected graph G is said to disconnect
the graph if its deletion results in a disconnected graph.
Definition 6.4.9:
The edge connectivity of a connected graph G is the smallest k for which
there exists a k-edge cut (i.e., an edge cut containing k edges). The edge
connectivity of G is denoted by λ(G).
Definition 6.4.10:
A graph G is r-connected if κ(G) ≥ r. G is r-edge connected if λ(G) ≥ r.
The parameters κ(G), λ(G) and δ(G) are related by the following inequal-
ities.
Theorem 6.4.11:
For a connected graph G, κ(G) ≤ λ(G) ≤ δ(G).
Theorem 6.4.12:
A graph G with at least three vertices is 2-connected if and only if there
exists a pair of internally disjoint paths between any pair of distinct vertices.
Proof. Assume first that, for any two distinct vertices u, v in G, G has at least
two internally disjoint (u, v)-paths. Let w be any vertex of G. Then w is
not a cut vertex of G: otherwise, by Theorem 6.4.4, there would exist vertices u and
v of G, distinct from w, such that every (u, v)-path in G contains w, contradicting
the assumption. Therefore G is 2-connected. Conversely, assume that G is 2-
connected. We apply induction on d(u, v) to prove that G has two internally
disjoint (u, v)-paths. When d(u, v) = 1, the graph G − uv is connected, since
λ(G) ≥ κ(G) ≥ 2. Any (u, v)-path in G − uv is internally disjoint in G
from the (u, v)-path consisting of the edge uv. Thus u and v are connected
by two internally-disjoint paths in G. Now we apply induction on d(u, v).
Let d(u, v) = k > 1 and assume that G has internally-disjoint (x, y)-paths
whenever 1 ≤ d(x, y) < k. Let w be the vertex appearing before v on a
shortest (u, v)-path. Since d(u, w) = k − 1, by induction hypothesis, G has
two internally disjoint (u, w)-paths, say P and Q. (see Fig. 6.16).
Figure 6.16: Two internally disjoint (u, w)-paths P and Q, used in the induction step of the proof of Theorem 6.4.12.
Theorem 6.4.14:
For a graph G with |V(G)| ≥ 3, the following conditions characterize 2-connectedness and are therefore equivalent:
(a) G is connected and has no cut vertex.
(b) For all u, v ∈ V(G), u ≠ v, there are two internally disjoint (u, v)-paths.
(c) For all u, v ∈ V(G), u ≠ v, there is a cycle through u and v.
(d) δ(G) ≥ 1, and every pair of edges of G lies on a common cycle.
Proof. Equivalence of (a) and (b) follows from Theorem 6.4.12. Any cycle
containing vertices u and v corresponds to a pair of internally disjoint (u, v)-
paths. Therefore (b) and (c) are equivalent.
Next we shall prove that (d) ⇒ (c). Let x, y ∈ V(G) be any two vertices
in G. We consider edges of the type ux and uy, or ux and wy (these exist
since δ(G) ≥ 1). By (d) these edges lie on a common cycle, and hence x and y lie on a
common cycle (see Fig. 6.17).
Figure 6.17: The two cases in the proof that (d) implies (c): edges ux, uy on a common cycle, and edges ux, wy on a common cycle.
neighborhood {x, y}. By the Expansion Lemma, the resulting graph G′
is 2-connected and hence w, z lie on a common cycle C in G′ . Since w, z each
have degree 2, this cycle contains the paths u, w, v and x, z, y but not uv and
xy. We replace the paths u, w, v and x, z, y in C by the edges uv and xy to
obtain a desired cycle in G.
Definition 6.4.15:
A graph G is nonseparable if it is nontrivial, connected and has no cut
vertices. A block of a graph G is a maximal nonseparable subgraph of G.
Example 6.4.16:
A graph G is shown in Fig 6.18.(a) and Fig 6.18.(b) shows its blocks B1 , B2 , B3
and B4 .
Figure 6.18: (a) A graph G on the vertices a, b, c, d, e, f, g, h; (b) its blocks B1, B2, B3 and B4.
2. Each edge of G belongs to one of its blocks, and hence G is the union of its
blocks.
3. Any two blocks of G have at most one vertex in common; such a vertex, if
it exists, is a cut vertex of G.
4. A vertex of G that is not a cut vertex belongs to exactly one of its blocks.
Definition 6.5.1:
A tree is a connected graph that has no cycle (that is, it is acyclic). A forest
is an acyclic graph; each component of a forest is a tree. Fig. 6.19 shows
an arbitrary tree. The graphs of Fig 6.20 show all (unlabeled) trees with at
most five vertices.
Figure 6.19: An arbitrary tree.

Figure 6.20: All unlabeled trees with at most five vertices.
Theorem 6.5.2:
For a graph G, the following statements are equivalent:
(i) G is a tree.
(ii) Any two vertices of G are connected by exactly one path.
(iii) G is connected, and deleting any edge of G disconnects it.
(iv) G is acyclic, and the graph formed from G by “adding an edge” (that is,
a graph of the form G + e where e joins a pair of nonadjacent vertices
of G) contains a unique cycle.
(v) G is connected and |V(G)| = |E(G)| + 1.
Lemma 6.5.3:
Any tree with at least two vertices contains at least two leaves.
Lemma 6.5.4:
Let v be a leaf in a graph G. Then G is a tree if and only if G − v is a tree.
Proof of Theorem 6.5.2. We prove that each of the statements (ii) through
(v) is equivalent to statement (i). The proofs go by induction on the number
of vertices of G, using Lemma 6.5.4. For the induction basis, we observe that
all the statements (i) through (v) are valid if G contains a single vertex only.
We first show that (i) implies all of (ii) to (v). Let G be a tree with at
least two vertices and let v be a leaf and let v ′ be the vertex adjacent to v
in G. By the induction hypothesis, we assume that G − v satisfies (ii) to
(v). Now the validity of (ii), (iii) and (v) for G is obvious. For (iv), since G
is connected, any two vertices x, y ∈ V (G) is connected by a path P and if
xy ∈
/ E(G), then P + xy creates a cycle. Therefore (i) implies (iv) as well.
We now prove that each of the conditions (ii) to (v) implies (i). In (ii)
and (iii) we already assume connectedness. Also, a graph satisfying (ii) or
(iii) cannot contain a cycle: for (ii), this is because two vertices in a cycle are
connected by two distinct paths and for (iii), the reason is that by omitting
an edge in a cycle we obtain a connected graph. Thus (ii) implies (i) and
(iii) also implies (i).
To verify that (iv) implies (i), it suffices to check that G is connected. If
x, y ∈ V(G), then either xy ∈ E(G) or the graph G + xy contains a unique
cycle. Necessarily, as G is acyclic, this cycle must contain the edge xy. Now
removal of the edge xy from this cycle gives a path from x to y in G. Thus
G is connected. We finally prove that (v) implies (i). Let G be a connected
graph satisfying |V (G)| = |E(G)| + 1 ≥ 2. The sum of the degrees of all
vertices is 2|V (G)| − 2. This means that not all vertices can have degree 2
or more. Since all degrees are at least 1 (by connectedness) there exists a
vertex v of degree exactly 1, that is, a leaf of G. The graph G′ = G − v is
again connected and satisfies |V(G′)| = |E(G′)| + 1; by the induction hypothesis
G′ is a tree, and hence, by Lemma 6.5.4, G is a tree as well.
Definition 6.5.5:
Let G be a graph and let u, v be a pair of vertices connected by a path in G.
Then the distance from u to v, denoted by dG (u, v) or simply d(u, v), is the
least length (in terms of the number of edges) of a (u, v)-path in G. If G has
no (u, v)-path, we define d(u, v) = ∞.
Definition 6.5.6:
The diameter of a connected graph G (denoted by diam(G)) is defined as
the maximum of the distances between pairs of vertices of G. In symbols,

diam(G) = max_{u,v∈V(G)} d(u, v).
Since in a tree any two vertices are connected by a unique path, the
diameter of a tree T is the length of a longest path in T .
We next prove a theorem that gives a bound on the sum of the distances
of all the vertices of a tree T from a given vertex of T .
Theorem 6.5.7:
Let u be a vertex of a tree T with n vertices. Then

Σ_{v∈V(T)} d(u, v) ≤ C(n, 2) = n(n−1)/2.
Figure 6.21: Deleting the vertex u from the tree T leaves components T1, T2, T3; vi is the neighbor of u in Ti.
If we now sum the formula for distances from u over all the components of
T − u, we obtain (as Σ_{i=1}^{k} n_i = n − 1)

Σ_{v∈V(T)} d_T(u, v) ≤ (n − 1) + Σ_{i=1}^{k} C(n_i, 2).

We note that Σ_{i=1}^{k} C(n_i, 2) ≤ C(n−1, 2) since Σ_{i=1}^{k} n_i = n − 1: the
right-hand side counts the edges in K_{n−1} and the left-hand side counts the edges
in a subgraph of K_{n−1} (a disjoint union of cliques). Hence we have

Σ_{v∈V(T)} d_T(u, v) ≤ (n − 1) + C(n−1, 2) = C(n, 2).
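Theorem 6.5.7 can be checked on a small tree by computing the distance sums with a breadth-first search; on a path the bound is attained at the two end vertices:

```python
# Distance sums from every vertex of the path 1-2-3-4-5 (a tree),
# compared against the bound C(5, 2) = 10.
from collections import deque

def distances_from(u, adj):
    dist = {u: 0}
    q = deque([u])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return dist

adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
n = len(adj)
sums = {u: sum(distances_from(u, adj).values()) for u in adj}
print(sums)  # {1: 10, 2: 7, 3: 6, 4: 7, 5: 10}
assert all(s <= n * (n - 1) // 2 for s in sums.values())
```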
Definition 6.6.1:
A spanning subgraph of a graph G is a subgraph H of G such that V (H) =
V (G). A spanning tree of G is a spanning subgraph of G which is a tree.
Exercise 6.6.2:
Prove that a graph G is connected if and only if it has a spanning tree.
Figure 6.22: The sixteen labeled spanning trees of K4 on the vertices 1, 2, 3, 4.
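One way to confirm the count of sixteen is Kirchhoff's Matrix-Tree theorem, a classical result not proved in this chapter: the number of spanning trees equals any cofactor of the Laplacian L = D − A. For K4 this is a single 3×3 determinant:

```python
# Count the spanning trees of K4 via a cofactor of its Laplacian.
def det3(M):
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Laplacian of K4 with the first row and column deleted:
L_minor = [[3, -1, -1],
           [-1, 3, -1],
           [-1, -1, 3]]
print(det3(L_minor))  # 16, matching Cayley's formula 4^(4-2)
```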
Exercise 6.6.5:
Let G be a graph with exactly one spanning tree. Prove that G is a tree.
Exercise 6.6.6:
If ∆ = k, then show that a graph G has a spanning tree with at least k
leaves.
The sum of the distances over all pairs of distinct vertices of a graph G
is known as the Wiener index of G, denoted by W(G). Thus W(G) =
Σ_{u,v∈V(G)} d(u, v).
Figure 6.23: The path Pn on the vertices 1, 2, . . . , n, obtained from Pn−1 by adding the vertex n; this relates W(Pn) to W(Pn−1).
Exercise 6.6.7:
Prove the following:
(i) W(Kn) = C(n, 2).
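The Wiener index is straightforward to compute with a breadth-first search from every vertex; for Kn every pair is at distance 1, so W(Kn) = C(n, 2), which for K5 gives 10:

```python
# Wiener index by BFS from every vertex, checked on K5.
from collections import deque

def wiener(adj):
    total = 0
    for u in adj:
        dist = {u: 0}
        q = deque([u])
        while q:
            x = q.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    q.append(y)
        total += sum(dist.values())
    return total // 2          # each unordered pair is counted twice

K5 = {u: [v for v in range(5) if v != u] for u in range(5)}
print(wiener(K5))  # 10 = C(5, 2)
```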
Algorithm SP-TREE
Proposition 6.6.8:
If algorithm SP-TREE outputs a graph T with n − 1 edges, then T is a
spanning tree of G. If T has k < (n − 1) edges, then G is a disconnected
graph with n − k components.
Proof. From the way the sets Ei are constructed the graph T contains no
cycle. If k = |E(T )| = n−1, then by Theorem 6.5.2 (v), T is a tree and hence
it is a spanning tree. If k < n − 1, then T is a disconnected graph each of
whose components is a tree. It is easy to reason that it has n − k components.
We prove that the vertex sets of the components of the graph T coincide
with those of the components of the graph G. Assume the contrary and
let x and y be vertices lying in the same component of G but in distinct
components of T . Let C be the component of T containing the vertex x.
Consider some path,
(x = x0 , e1 , x1 , e2 , . . . , ek , xk = y)
Figure 6.24: A path in G from x to y leaving the component C of T along the edge e.
The design of electronic circuits using integrated chips often requires that
the pins of several components be at the same potential. This is achieved
by wiring them together. To interconnect a set of n pins, we can use an
arrangement of n − 1 wires, each connecting two points. Of the various
arrangements on a circuit board, the one that uses the least amount of wire
is desirable.
The above wiring problem can be modeled thus: we are given a connected
graph G = (V (G), E(G)), where V (G) corresponds to the set of pins and
E(G) corresponds to the possible interconnections. Associated with each
edge uv ∈ E(G), we have a “weight” w(u, v) specifying the cost (amount
of wire needed) to connect u and v. We then wish to find a spanning tree
T = (V, E(T)) of G whose total weight

W(E(T)) = Σ_{uv∈E(T)} w(u, v)

is minimum; such a tree is called a minimum spanning tree (MST) of G.
We now set, Vi = Vi−1 ∪ {yi } and Ei = Ei−1 ∪ {ei }. If no such edge exists,
the algorithm terminates. Let Et denote the set for which the algorithm has
stopped. The algorithm outputs the graph T = (V, Et ) as the MST.
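The tree-growing rule just described (repeatedly add a minimum-weight edge with exactly one endpoint in the current tree) is the idea behind Prim's algorithm. A sketch with made-up weights, not a transcription of the book's listing:

```python
# Prim-style MST: grow a tree from one vertex, always taking the
# cheapest edge that leaves the set of tree vertices.
import heapq

def prim(n, wedges, start=0):
    adj = {v: [] for v in range(n)}
    for u, v, w in wedges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    in_tree, tree, heap = {start}, [], list(adj[start])
    heapq.heapify(heap)
    while heap and len(in_tree) < n:
        w, v = heapq.heappop(heap)
        if v not in in_tree:           # skip edges internal to the tree
            in_tree.add(v)
            tree.append((w, v))
            for item in adj[v]:
                heapq.heappush(heap, item)
    return tree

edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]
tree = prim(4, edges)
print(sum(w for w, _ in tree))  # 8: the edges of weight 1, 2 and 5
```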
Let us now consider the stage in the algorithm’s execution when the edge
ek+1 has been added to T . Let Tk = (Vk , Ek ) be the tree formed by the
addition of the edges e1, . . . , ek. Then ek+1 = xy, where x ∈ V(Tk) and
y ∉ V(Tk). Consider the graph T̂ + ek+1. This graph contains some cycle
C; such a cycle necessarily contains the edge ek+1.
The cycle C consists of the edge ek+1 = xy plus a path, say P, connecting
the vertices x and y in the spanning tree T̂. At least one edge of the path P
has one vertex in the set Vk and the other vertex not in Vk. Let e be such an
edge. Obviously e is different from ek+1 (see Fig. 6.25), and also e ∈ Ê while
ek+1 ∉ Ê.
Figure 6.25: The cycle formed by ek+1 = xy together with the path P joining x and y in T̂; the edge e of P has exactly one endpoint in Vk.
Both e and ek+1 connect a vertex of Vk with a vertex outside Vk, and
by the edge selection rule in the algorithm we get w(ek+1) ≤ w(e).
Now consider the graph T′ = (T̂ + ek+1) − e. This graph has n − 1 edges
and is connected, as can easily be seen. Hence it is a spanning tree. Now we
have w(E(T′)) = w(Ê) − w(e) + w(ek+1) ≤ w(Ê), and thus T′ is an MST as
well, but with k(T′) > k(T̂). This contradicts the choice of T̂, and
hence T must be an MST.
Exercise 6.6.9:
Prove that algorithm KRUSKAL does produce an MST.
Definition 6.7.1:
Recall (see Section 6.2.1 on page 319) that an independent set of a graph
G is a subset S ⊆ V (G) such that no two vertices of S are adjacent in G.
S is a maximum independent set of G if G has no independent set S ′ with
|S ′ | > |S|. A maximal independent set of G is an independent set that is not
a proper subset of an independent set of G.
Figure 6.26: A star with center v and end vertices p, q, r, s, t, u.
In Fig 6.26, {v} and {p, q, r, s, t, u} are both maximal independent sets.
The latter set is also a maximum independent set.
Definition 6.7.2:
A subset K ⊆ V (G) is called a covering of G if every edge of G is incident
with at least one vertex of K. A covering K is minimum if there is no
covering K ′ of G such that |K ′ | < |K|; it is minimal if there is no covering
K ′′ of G such that K ′′ is a proper subset of K.
Figure 6.27: A graph on the vertices u, v, w, x, y, z.
Theorem 6.7.3:
In a graph G = (V(G), E(G)), a subset S ⊆ V(G) is independent if and only if
V(G) \ S is a covering of G.
Definition 6.7.4:
The number of vertices in a maximum independent set of G is called the
independence number of G and is denoted by α(G). The number of vertices
in a minimum covering of G is the covering number of G and is denoted by
β(G).
Corollary 6.7.5:
For a graph G of order n, α(G) + β(G) = n.
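Corollary 6.7.5 can be checked by brute force on a small graph; on the 5-cycle, α = 2 and β = 3:

```python
# Compute alpha and beta of C5 over all vertex subsets and verify
# alpha + beta = n (Corollary 6.7.5).
from itertools import combinations

V = [0, 1, 2, 3, 4]
E = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # the 5-cycle

def independent(S):
    return all(not (u in S and v in S) for u, v in E)

def covering(S):
    return all(u in S or v in S for u, v in E)

subsets = [set(c) for k in range(len(V) + 1) for c in combinations(V, k)]
alpha = max(len(S) for S in subsets if independent(S))
beta = min(len(S) for S in subsets if covering(S))
print(alpha, beta, alpha + beta)  # 2 3 5
```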
Exercise 6.7.6:
Prove that every maximal independent set in a d-regular graph on n vertices
has at least n/(d + 1) vertices.
Exercise 6.7.7:
Show that a graph G having all degrees at most d satisfies the inequality

α(G) ≥ |V(G)|/(d + 1).
Definition 6.8.1:
The chromatic number χ(G) of a graph G is the minimum number of
independent subsets that partition the vertex set of G. Any such minimum
partition of V(G) into independent subsets is called a chromatic partition of G.
Definition 6.8.2:
A k-coloring of a graph G is a labeling f : V (G) → {1, . . . , k}. The labels
are interpreted as colors; all vertices with the same color form a color class.
A k-coloring f is proper if f(u) ≠ f(v) whenever uv ∈ E(G). A graph
G is k-colorable if it has a proper k-coloring. We will call the labels “colors”
because their numerical value is not important. Note that χ(G) is then the
minimum number of colors needed for a proper coloring of G. We also say “G
is k-chromatic” to mean χ(G) = k. It is obvious that χ(Kn) = n. Further,
χ(G) = 2 if and only if G is bipartite with at least one edge. We can also
reason that χ(Cn) = 2 if n is even and χ(Cn) = 3 if n is odd.
Exercise 6.8.3:
Prove that χ(G) = 2 if and only if G is a bipartite graph with at least one
edge.
The Petersen graph P has chromatic number 3. Fig. 6.28 shows a proper
3-coloring of P. Certainly, P is not 2-colorable, since it
contains an odd cycle.
Figure 6.28: A proper 3-coloring of the Petersen graph, with color classes labeled 1, 2 and 3.
Since each color class of a graph G is an independent set, we see that χ(G) ≥ |V(G)|/α(G), where α(G) is the independence number of G.
Definition 6.8.4:
A graph G is called critical if for every proper subgraph H of G, we have
χ(H) < χ(G). Also, G is called k-critical if it is k-chromatic and critical.
The above definition holds for any graph. When G is connected it is
equivalent to the condition that χ(G − e) < χ(G) for each edge e ∈ E(G);
but then this is equivalent to saying χ(G − e) = χ(G) − 1. If χ(G) = 1, then
G is either trivial or totally disconnected. Hence G is 1-critical if and only
if G is K1 . Also χ(G) = 2 implies that G is bipartite and has at least one
edge. Hence G is 2-critical if and only if G is K2 .
Exercise 6.8.5:
Prove that every critical graph is connected.
Exercise 6.8.6:
Show that if G is k-critical, then for any v ∈ V (G) and e ∈ E(G),
χ(G − v) = χ(G − e) = k − 1.
Theorem 6.8.7:
If G is k-critical, then δ(G) ≥ k − 1.
Corollary 6.8.8:
For any graph G, χ(G) ≤ 1 + ∆(G).
We can also show that the above result is implied by the “greedy” coloring algorithm below.
Algorithm GREEDY-COLORING
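The greedy strategy can be rendered in Python as follows (a minimal sketch; the adjacency-list representation and the default vertex order are our own illustrative choices):

```python
# Greedy coloring: scan the vertices in some order and give each vertex
# the smallest color not already used by one of its colored neighbors.
def greedy_coloring(adj, order=None):
    if order is None:
        order = list(adj)
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

# The 5-cycle C5: every vertex has degree 2, so greedy needs at most
# 1 + max degree = 3 colors.
c5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = greedy_coloring(c5)
```

On C5, an odd cycle, the sketch uses exactly three colors, in accordance with χ(Cn) = 3 for odd n.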
Exercise 6.8.10:
If χ(G) = k, then show that G contains at least k vertices each of degree at
least k − 1.
Chapter 7
Coding Theory
7.1 Introduction
Coding theory has its origin in communication engineering. Since Shannon’s seminal paper of 1948 [?], it has also been greatly influenced by mathematics, with a variety of mathematical techniques brought in to tackle its problems. Algebraic coding theory makes heavy use of matrices, groups, rings, fields, vector spaces, algebraic number theory and even algebraic geometry. In algebraic coding, each message is regarded as a block of symbols taken from a finite alphabet; on most occasions, these are elements of Z2 = {0, 1}, so that each message is a finite string of 0’s and 1’s. For instance, 00110111 is a message. Messages get transmitted through a communication channel. Such channels may be subject to noise, and consequently the messages may get changed. The purpose of an error-correcting code is to add redundancy symbols to the message, based of course on some rule, so that the original message can be retrieved even if it is garbled.
Any communication channel looks as in Figure 7.1. The first box of the
Figure 7.2: The binary symmetric channel: each transmitted bit is received correctly with probability q and flipped with probability p = 1 − q.
original message could have been 0011. On the other hand, if the channel is
two-way, that is, it can detect errors so that the receiver knows the places
where the errors have occurred and also contains the provision for feedback,
then it can prove to be more effective in decoding the received message.
One of the simplest channels is the binary symmetric channel (BSC). This channel has no memory and transmits just the two symbols 0 and 1. It has the property that the probability that a transmitted symbol is received correctly is q, while the probability that it is not is p = 1 − q. This is pictorially represented in Figure 7.2. Before considering an example of a BSC, we first give the formal definition of a code.
Definition 7.2.1:
A code C of length n over a field F is a set of vectors in F^n, the space of ordered n-tuples over F. Any element of C is called a codeword of C.
Definition 7.3.1:
An [n, k]-linear code C over a finite field F is a k-dimensional subspace of F^n, the vector space of ordered n-tuples over F.
If F has q elements, that is, F = GF(q), the [n, k]-code will have q^k codewords. The codewords of C are all of length n, as they are n-vectors over F, and k is the dimension of C.
Definition 7.3.2:
A nonlinear code of length n over a field F is just a subset of the vector space
F n over F .
C is a binary code if F = Z2 .
A linear code C is best represented by any one of its generator matrices.
Definition 7.3.3:
A generator matrix of a linear code C over F is a matrix whose row vectors
form a basis for C over F .
X = x1 R1 + x2 R2 + x3 R3 , (8.1)
x4 = x1 + x2 , and
x5 = x1 + x3 . (8.2)
In other words, the first redundancy coordinate of any codeword is the sum of
the first two information coordinates of that word while the next redundancy
coordinate is the sum of the first and third information coordinates.
Equations (8.2) are the parity-check equations of the code C1 . They can
be rewritten as
x1 + x2 − x4 = 0, and
x1 + x3 − x5 = 0. (8.3)
Over Z2, −1 = 1, so these can equally be written as
x1 + x2 + x4 = 0, and
x1 + x3 + x5 = 0. (8.4)
Thus C1 = {X ∈ Z2^5 : H1 X^t = 0} = null space of the matrix H1.
Every word of C is a linear combination
α1 u1 + · · · + αk uk , αi ∈ F for each i,
of the rows of a generator matrix G; that is, C is the row space of G over F. The null space of C is the space of vectors X ∈ F^n which are orthogonal to all the words of C; in other words, it is the dual space C⊥ of C. As C is of dimension k over F, C⊥ is of dimension n − k over F. Let {X1 , . . . , Xn−k } be a basis of C⊥ over F. If H is the matrix whose row vectors are X1 , . . . , Xn−k , then H is a parity-check matrix of C.
It is an (n − k) by n matrix. Thus
C = row space of G = null space of H = {X ∈ F^n : HX^t = 0}.
Theorem 7.3.4:
Let G = (Ik | A) be a generator matrix of a linear code C over F, where Ik is the identity matrix of order k over F, and A is a k by (n − k) matrix over F. Then a generator matrix of C⊥ is given by
H = (−A^t | In−k)
over F.
Proof. Each row of H is orthogonal to all the rows of G since, by block multiplication (see ...),
GH^t = [Ik | A] [ −A
                  In−k ] = −A + A = 0.
Corollary 7.3.5:
G = [Ik |A] is a generator matrix of a code C of length n iff H = [−At |In−k ]
is a parity-check matrix of C.
7.4 Weight and Distance of a Code
Definition 7.4.1:
The weight wt(v) of a codeword v of a code C is the number of nonzero coordinates in v. The minimum weight of C is the least of the weights of its nonzero codewords. The weight of the zero vector of C is naturally zero.
Example 7.4.2:
As an example, consider the binary code C2 with generator matrix
G2 = [ 1 0 1 1 0
       0 1 1 0 1 ] = [I2 | A].
Definition 7.4.3:
Let X, Y ∈ F n . The distance d(X, Y ), also called the Hamming distance
between X and Y , is defined to be the number of places in which X and Y
differ.
d(X, Y) = wt(X − Y). (8.5)
Theorem 7.4.4:
The minimum distance of a linear code C is the minimum weight of a nonzero
codeword of C.
Thus for the linear code C2 of Example 7.4.2, the minimum distance is 3.
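This can be checked by enumerating the four codewords of C2 (a small sketch; the enumeration style is our own):

```python
from itertools import product

# List the codewords of the binary code C2 of Example 7.4.2 from its
# generator matrix G2 and verify that the minimum weight (and hence,
# by Theorem 7.4.4, the minimum distance) is 3.
G2 = [(1, 0, 1, 1, 0),
      (0, 1, 1, 0, 1)]

codewords = set()
for x1, x2 in product((0, 1), repeat=2):
    word = tuple((x1 * a + x2 * b) % 2 for a, b in zip(*G2))
    codewords.add(word)

min_weight = min(sum(w) for w in codewords if any(w))   # min_weight == 3
```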
The function d(X, Y) defined in Equation (8.5) does indeed define a distance function (that is, a metric) on C. That is to say, it has the following three properties: for all X, Y, Z in C,
(i) d(X, Y) ≥ 0, and d(X, Y) = 0 if and only if X = Y;
(ii) d(X, Y) = d(Y, X);
(iii) d(X, Z) ≤ d(X, Y) + d(Y, Z).
Hamming codes are binary linear codes. They can be defined either by their
generator matrices or by their parity-check matrices. We prefer the latter.
Let us start by defining the [7, 4]-Hamming code H3. The seven column vectors of its parity-check matrix H are the binary representations of the numbers 1 to 7, written in such a way that the last three of its column vectors form I3, the identity matrix of order 3. Thus
H = [ 1 1 1 0 1 0 0
      1 1 0 1 0 1 0
      1 0 1 1 0 0 1 ].
We now write the coset decomposition of Z2^7 with respect to the subspace H3. (Recall that H3 is a subgroup of the additive group Z2^7.) As Z2^7 has 2^7 vectors, and H3 has 2^4 codewords, the number of cosets of H3 in Z2^7 is 2^7/2^4 = 2^3. (See ....) Each coset is of the form X + H3 = {X + v : v ∈ H3}. Any two cosets are either identical or disjoint. The vector X is a representative of the coset X + H3. The zero vector is a representative of the coset H3. If X and Y are distinct vectors, each of weight 1, then X + H3 ≠ Y + H3, since X − Y is of weight 2, while the minimum weight of a nonzero codeword of H3 is 3.
(0000000) (1000111) (0100110) (0010101) (0001011) (1100001) (1010010) (1001100) (0110011) (0101101) (0011010) (0111000) (1011001) (1101010) (1110100) (1111111)
(1000000) (0000111) (1100110) (1010101) (1001011) (0100001) (0010010) (0001100) (1110011) (1101101) (1011010) (1111000) (0011001) (0101010) (0110100) (0111111)
(0100000) (1100111) (0000110) (0110101) (0101011) (1000001) (1110010) (1101100) (0010011) (0001101) (0111010) (0011000) (1111001) (1001010) (1010100) (1011111)
(0010000) (1010111) (0110110) (0000101) (0011011) (1110001) (1000010) (1011100) (0100011) (0111101) (0001010) (0101000) (1001001) (1111010) (1100100) (1101111)
(0001000) (1001111) (0101110) (0011101) (0000011) (1101001) (1011010) (1000100) (0111011) (0100101) (0010010) (0110000) (1010001) (1100010) (1111100) (1110111)
(0000100) (1000011) (0100010) (0010001) (0001111) (1100101) (1010110) (1001000) (0110111) (0101001) (0011110) (0111100) (1011101) (1101110) (1110000) (1111011)
(0000010) (1000101) (0100100) (0010111) (0001001) (1100011) (1010000) (1001110) (0110001) (0101111) (0011000) (0111010) (1011011) (1101000) (1110110) (1111101)
Figure 7.3: The standard array of H3; each row is a coset of H3, listed with its coset leader first.
Figure 7.4: The parity-check matrix H shown in two forms, (a) and (b).
As before, let F^n denote the vector space of all ordered n-tuples over F. Recall (Section 7.4) that F^n is a metric space with the Hamming distance between vectors of F^n as the metric.
Definition 7.7.1:
In F^n, the sphere with centre X and radius r is the set
S(X, r) = {Y ∈ F^n : d(X, Y) ≤ r} ⊆ F^n.
Definition 7.7.2:
An r-error-correcting linear code C is perfect if the spheres of radius r with
the words of C as centres are pairwise disjoint and their union is F n .
Theorem 7.7.3:
The Hamming code Hm is a single-error-correcting perfect code.
Proof. Hm is a code of dimension 2^m − 1 − m over Z2 and hence has 2^(2^m − 1 − m) words. Now if v is any codeword of Hm, then S(v, 1) contains v (which is at distance zero from v) and the 2^m − 1 words got from v (which is of length 2^m − 1) by altering one position at a time. Thus S(v, 1) contains 1 + (2^m − 1) = 2^m words. Since Hm has minimum distance 3, these spheres are pairwise disjoint, and so their union, as v varies over Hm, contains 2^(2^m − 1 − m) · 2^m = 2^(2^m − 1) vectors. But this is exactly the number of vectors in F^n, where n = 2^m − 1.
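The counting in this proof is easily checked numerically for small m (an illustrative sketch):

```python
# Sphere-packing count for the Hamming code Hm: the 2^(2^m - 1 - m)
# spheres of radius 1, each containing 2^m words, exactly fill the
# 2^(2^m - 1) vectors of F^n, where n = 2^m - 1.
for m in range(2, 8):
    n = 2**m - 1                 # code length
    num_codewords = 2**(n - m)   # 2^(2^m - 1 - m)
    sphere_size = 1 + n          # 1 + (2^m - 1) = 2^m words per sphere
    assert num_codewords * sphere_size == 2**n
```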
Our next theorem shows that the minimum distance d of a linear code is a good measure of its error-correcting capability.
Theorem 7.7.4:
If C is a linear code of minimum distance d, then C can correct t = ⌊(d − 1)/2⌋ or fewer errors.
Figure 7.5: A received vector z and codewords u and v.
Proof. Suppose a codeword u is transmitted and at most t errors occur, so that the received vector z satisfies d(u, z) ≤ t (see Figure 7.5). If z were also within distance t of another codeword v, the triangle inequality would give
d(u, v) ≤ d(u, z) + d(z, v) ≤ t + t = 2t ≤ d − 1 < d,
contradicting the fact that distinct codewords of C are at distance at least d. Hence u is the unique codeword nearest to z, and z is decoded correctly.
Let C be a binary linear code of length n. We can extend this code by adding an overall parity check at the end: we add a 0 at the end of each word of even weight in C and a 1 at the end of every word of odd weight. This gives an extended code C′ of length n + 1.
To study some of the properties of C′, we need a lemma.
Lemma 7.8.1:
Let w denote the weight function of a binary code C, and for X, Y ∈ C let X ⋆ Y denote the number of positions in which both X and Y have a 1. Then
w(X + Y) = w(X) + w(Y) − 2(X ⋆ Y).
Proof. Let X and Y have common 1’s in the i1, i2, . . . , ip-th positions, so that X ⋆ Y = p. Let X have 1’s in the i1, . . . , ip and j1, . . . , jq-th positions, and Y in the i1, . . . , ip and l1, . . . , lr-th positions. Then w(X) = p + q, w(Y) = p + r and w(X + Y) = q + r = w(X) + w(Y) − 2p. The proof is now clear.
Let C be an [n, k]-linear code over GF(q) = F. The standard array decoding scheme requires storage of the q^n vectors of F^n and also comparisons of a received vector with the coset leaders. The number of such comparisons is at most q^(n−k), the number of distinct cosets in the standard array. Hence any method that makes a sizeable reduction in storage and in the number of comparisons is to be welcomed. One such method is given by the syndrome-decoding scheme.
Definition 7.9.1:
The syndrome of a vector Y ∈ F^n with respect to a linear [n, k]-code over F with parity-check matrix H is the vector HY^t.
Theorem 7.9.2:
Two vectors of F n belong to the same coset in the standard array decompo-
sition of a linear code C iff they have the same syndrome.
Theorem 7.9.2 shows that the syndromes of all the vectors of F^n are determined by the syndromes of the coset leaders of the standard array of C. In case C is an [n, k]-binary linear code, there are 2^(n−k) cosets and therefore the number of distinct syndromes is 2^(n−k). Hence, in contrast to standard-array decoding, it is enough to store 2^(n−k) vectors (instead of 2^n vectors) in syndrome decoding. For instance, if C is a [100, 30]-binary linear code, it is enough to store the 2^70 syndromes instead of the 2^100 vectors of Z2^100, a considerable saving.
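For the Hamming code H3, syndrome decoding is particularly simple: the syndrome of a received vector with a single error equals the column of H in which the error occurred. A minimal sketch (the helper names are ours):

```python
# Single-error syndrome decoding for the [7,4] Hamming code H3,
# using the parity-check matrix H given earlier.
H = [(1, 1, 1, 0, 1, 0, 0),
     (1, 1, 0, 1, 0, 1, 0),
     (1, 0, 1, 1, 0, 0, 1)]

def syndrome(y):
    return tuple(sum(h * b for h, b in zip(row, y)) % 2 for row in H)

def decode(y):
    s = syndrome(y)
    if s == (0, 0, 0):
        return list(y)              # no error detected
    # the syndrome matches the column of H where the error occurred
    cols = [tuple(row[i] for row in H) for i in range(7)]
    corrected = list(y)
    corrected[cols.index(s)] ^= 1
    return corrected

# Flip one bit of the codeword (1000111) and recover it.
sent = [1, 0, 0, 0, 1, 1, 1]
received = sent[:]
received[2] ^= 1
```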
7.10 Exercises
1. Find all the codewords of the binary code with generator matrix
[ 1 0 1 1 1
  1 0 0 1 1 ].
Find a parity-check matrix of the code. Write down the parity-check equations.
3. Decode the received vector (1100011) in H3 using (i) standard array decoding, and (ii) syndrome decoding.
4. How many vectors of Z2^7 are there in S(u, 3), where u ∈ Z2^7?
7. Show that there exists a set of eight binary vectors of length 6 such
that the distance between any two of them is at least 3.
9. Show that the function d(X, Y ) defined in Section 7.4 is indeed a metric.
10. Show that a linear code of minimum distance d can detect at most ⌊d/2⌋ errors.
Chapter 8
Cryptography
8.1 Introduction
Each message unit is converted into a number using modular arithmetic, and the transformations are then carried out on this set of numbers.
An enciphering transformation f converts a plaintext message unit P (given by its corresponding number) into a number that represents the corresponding ciphertext message unit C, while its inverse, the deciphering transformation, does the opposite by taking C back to P. We assume that there is a 1–1 correspondence between the set of all plaintext units and the set of all ciphertext units; hence each plaintext unit gives rise to a unique ciphertext unit and vice versa. Symbolically,
P --f--> C --f^(-1)--> P.
A B C D E F G H I J K L M
0 1 2 3 4 5 6 7 8 9 10 11 12
N O P Q R S T U V W X Y Z
13 14 15 16 17 18 19 20 21 22 23 24 25
Figure 8.1:
Figure 8.1 gives the 1–1 correspondence between the characters A to Z and the numbers 0 to 25. For example, the word “OKAY” corresponds to the number sequence “(14) (10) (0) (24)”, and this gets transformed, by eqn. (8.1), to “(17) (13) (3) (1)”, so the corresponding ciphertext is “RNDB”. The deciphering transformation applied to “RNDB” then gives back the message “OKAY”.
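The “OKAY” to “RNDB” computation can be checked with a short script. From the numbers above, the transformation behaves as the shift C ≡ P + 3 (mod 26); this is our reading, since eqn. (8.1) itself is not reproduced in this excerpt.

```python
# Caesar-style shift cipher on the alphabet A = 0, ..., Z = 25.
def caesar_encrypt(text, shift=3):
    return "".join(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")) for ch in text)

def caesar_decrypt(text, shift=3):
    return caesar_encrypt(text, -shift)

# caesar_encrypt("OKAY") -> "RNDB"; note Y = 24 wraps around to B = 1.
```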
where a and b are in the ring Z27, a ≠ 0 and (a, 27) = 1. In eqn. (8.3), P and C denote a pair of corresponding plaintext and ciphertext units. The Extended Euclidean Algorithm [?] ensures that, as (a, 27) = 1, a has a unique inverse a^(-1) (mod 27). As an example with message units of length 2, consider the affine transformation
C ≡ 4P + 2 (mod 27^2).
Further, as (4, 27^2) = 1, 4 has a unique inverse (mod 27^2); in fact 4^(-1) = −182, since 4 · (−182) ≡ 1 (mod 27^2). This, when substituted in congruence (8.4), gives the deciphering transformation P ≡ −182(C − 2) (mod 27^2).
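The enciphering and deciphering maps of this example can be sketched as follows; note that 4^(-1) = −182 ≡ 547 (mod 729), since 4 · 547 = 2188 = 3 · 729 + 1.

```python
# The digraph affine system C = 4P + 2 (mod 27^2), deciphered with
# 4^(-1) = 547 (mod 729).
MOD = 27**2  # 729 possible message-unit values

def affine_encrypt(p, a=4, b=2):
    return (a * p + b) % MOD

def affine_decrypt(c, a_inv=547, b=2):
    return (a_inv * (c - b)) % MOD
```

Decryption inverts encryption for every one of the 729 message units.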
In the Caesar cryptosystem and the affine cryptosystem, the keys are known to the sender and the receiver in advance. That is to say, whatever information the sender has with regard to his encryption is shared by the receiver. For this reason, these cryptosystems are called private key cryptosystems.
Suppose an intruder I (that is, a person other than the sender A and the receiver B) who has no knowledge of the private keys wants to hack the message, that is, decipher it stealthily. We may suppose that the type of cryptosystem used by A and B, including the unit length, though not the keys, is known to I. Such information may get leaked out over a passage of time, or may even be obtained by spying. How does I go about hacking the message? He does it by a method known as frequency analysis.
Assume for a moment that the message units are of length 1. Look at a long string of the ciphertext and find the most repeated character, the next most repeated character, and so on. Suppose, for the sake of precision, they are U, V, X, . . . . Now in the English language, the most common characters of the alphabet of 27 letters, consisting of the characters A to Z and “space”, are known to be “space” and E. Then “space” and E of the plaintext correspond to U and V of the ciphertext respectively. If the cryptosystem used is the affine system given by the equation
C = aP + b (mod 27),
Subtraction yields
22a ≡ −1 (mod 27) (8.5)
a unique nonnegative integer less than 27^2. Suppose the frequency analysis of the ciphertext reveals that the most commonly occurring ordered pairs are “CB” and “DX”, in decreasing order of frequency. The decryption transformation is of the form
P ≡ a′C + b′ (mod 27^2).
Here a and b are the enciphering keys and a′, b′ are the deciphering keys. Now it is known that in the English language the most frequently occurring ordered pairs, in decreasing order of frequency, are “E(space)” and “S(space)”. Symbolically,
“E(space)” −→ CB, and
“S(space)” −→ DX.
Subtraction gives
As (50, 729) = 1, this congruence has a unique solution by the Extended Euclidean Algorithm [?]. In fact, a′ = ?????????? and therefore b′ = ??????????.
Thus the deciphering keys a′ and b′ have been determined and the cryptosystem has been hacked.
In our case, gcd(50, 729) happened to be 1, and hence we had no problem in determining the deciphering key. If not, we would have to try all the possible solutions for a′ and take the plaintext that is meaningful. Instead, we can also continue with the frequency analysis, compare the next most repeated ordered pairs in the plaintext and ciphertext, obtain a third congruence, and try for a solution in conjunction with one or both of the earlier congruences. If these also fail, we may have to adopt ad hoc techniques to determine a′ and b′.
Assume once again that the message units are ordered pairs in the same alphabet of size 27 of Section 8.2. We can use 2 by 2 matrices over the ring Z27 to set up a private key cryptosystem in this case. In fact, if A is any 2 by 2 matrix with entries from Z27, and (X, Y) is any plaintext unit, we encipher it as B = A [X, Y]^t, where B is again a 2 by 1 matrix and therefore a ciphertext unit of length 2. If B = [X′, Y′]^t, we have the equations
[X′, Y′]^t = A [X, Y]^t and [X, Y]^t = A^(-1) [X′, Y′]^t. (8.8)
The first equation of (8.8) gives the encryption while the second gives the decryption. Notice that A^(-1) must be taken in Z27. For A^(-1) to exist, we must have (det A, 27) = 1. If this is not the case, we may again have to resort to ad hoc methods.
As an example, take A = [2 1; 4 3] (rows separated by semicolons). Then det A = 2, and (det A, 27) = (2, 27) = 1. Hence 2^(-1) (mod 27) exists and 2^(-1) = 14 ∈ Z27. This gives
A^(-1) = 14 [3 −1; −4 2] = [42 −14; −56 28] = [15 13; 25 1] over Z27. (8.9)
Suppose, for instance, we want to encipher “HEAD” using the above matrix transformation. We proceed as follows: “HE” corresponds to the vector [7, 4]^t, and “AD” to the vector [0, 3]^t. Hence the enciphering transformation gives the corresponding ciphertext as
A [7, 4]^t = [18, 40]^t ≡ [18, 13]^t (mod 27), corresponding to “SN”, and
A [0, 3]^t = [3, 9]^t (mod 27), corresponding to “DJ”.
Thus the ciphertext is “SNDJ”. Deciphering proceeds in exactly the same manner by taking A^(-1) in Z27. This gives the plaintext A^(-1) [18, 13]^t, A^(-1) [3, 9]^t, where A^(-1) = [15 13; 25 1], as given by (8.9).
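The whole computation can be reproduced in a few lines (a sketch; the helper apply_mod27 is our own):

```python
# The 2-by-2 matrix cryptosystem over Z27, with A and A^(-1) as in (8.9).
A     = [[2, 1], [4, 3]]
A_inv = [[15, 13], [25, 1]]

def apply_mod27(M, v):
    # multiply the 2-by-2 matrix M by the column vector v over Z27
    return [(M[0][0] * v[0] + M[0][1] * v[1]) % 27,
            (M[1][0] * v[0] + M[1][1] * v[1]) % 27]

# "HE" = (7, 4) and "AD" = (0, 3), as in the worked example above.
cipher = [apply_mod27(A, [7, 4]), apply_mod27(A, [0, 3])]   # [[18, 13], [3, 9]]
plain  = [apply_mod27(A_inv, c) for c in cipher]            # back to [[7, 4], [0, 3]]
```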
8.4 Exercises
1. Find the inverse of A = [17 5; 8 7] in Z27.
2. Find the inverse of A = [12 3; 5 17] in Z29.
3. Solve the system of congruences
x − y ≡ 4 (mod 26)
7x − 4y ≡ 10 (mod 26).
(space) C ? Y C F ! Q, T W I U M H Q V.
Suppose we know by some means that the last four letters of the plain-
text are our adversary’s signature “MIKE”. Determine the full plain-
text.
In this cipher, the plaintext is in the English alphabet. The key consists of an ordered set of d letters for some fixed positive integer d. The plaintext is divided into message units of length d, and the ciphertext is obtained by adding the key to each message unit using modulo 26 addition.
For example, let d = 3 and the key be XYZ. If the message is “ABANDON”, the ciphertext is obtained by taking the numerical equivalents of the plaintext, namely,
(0) (1) (0) (13) (3) (14) (14) (13),
and adding to them, modulo 26, the numerical equivalents (23) (24) (25) of the key, repeated cyclically. This yields
(23) (25) (25) (36) (27) (39) (37) (37) (mod 26) = “X Z Z K B N L L”
as the ciphertext.
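The cipher can be sketched as follows (letter-by-letter key addition with A = 0, . . . , Z = 25, matching the text; the function names are ours):

```python
# Vigenere cipher: add the key, repeated cyclically, modulo 26.
def vigenere_encrypt(plain, key):
    k = [ord(c) - ord("A") for c in key]
    return "".join(chr((ord(c) - ord("A") + k[i % len(k)]) % 26 + ord("A"))
                   for i, c in enumerate(plain))

def vigenere_decrypt(cipher, key):
    k = [ord(c) - ord("A") for c in key]
    return "".join(chr((ord(c) - ord("A") - k[i % len(k)]) % 26 + ord("A"))
                   for i, c in enumerate(cipher))
```

Decryption simply subtracts the same key, so any message round-trips exactly.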
C ≡ M + K (mod 26)
Notwithstanding the fact that the key K is as long as the message M , the
system has its own drawbacks.
(ii) The long private key K must be communicated to the receiver in advance.
All cryptosystems described so far are private key cryptosystems. This means that someone who has enough information to encipher messages also has enough information to decipher messages. As a result, in private key cryptography, any two persons in a group who want to communicate secretly must have exchanged keys in a safe way (for instance, through a trusted courier).
In 1976, the face of cryptography got altered radically with the invention
of public key cryptography by Diffie and Hellman [?]. In this cryptosystem,
the encryption can be done by any one. But the decryption can be done only
by the intended recipient who alone is in possession of a secret key.
At the heart of this cryptography is the concept of a “one-way function”. Roughly speaking, a one-way function is a 1–1 function f such that, whenever k is given, it is possible to compute f(k) “rapidly”, while it is “extremely difficult” to compute the inverse of f in a “reasonable” amount of time. There is no way of asserting once and for all that a given function is a one-way function, since the computations depend on the technology of the day: the hardware and the software. So what passes for a one-way function today may fail to be one a few years hence.
As an example of a one-way function, consider two large primes p and q, each having at least 500 digits. It is “easy” to compute their product n = pq. However, given n, there is no efficient factoring algorithm known to date that would yield p and q in a reasonable amount of time. Forming the product pq with p and q having 100 digits passed for a one-way function in the 1980’s, but is no longer so today.
(PA ◦ SA ) M = M = (SA ◦ PA ) M.
Transmission of Messages
Digital Signature
We now describe two public key cryptosystems. The first is RSA, named after its inventors, Rivest, Shamir and Adleman. Diffie and Hellman, though they invented public key cryptography in 1976, did not give a procedure to implement it; Rivest, Shamir and Adleman did so in 1978, two years later.
Description of RSA
2. Each user A chooses a small positive integer e, 1 < e < φ(n), such that (e, φ(n)) = 1, where (the Euler function) φ(n) = φ(pq) = φ(p)φ(q) = (p − 1)(q − 1). (e is odd, as φ(n) is even.)
3. A computes an integer d such that ed ≡ 1 (mod φ(n)).
4. A (Alice) gives the ordered pair (n, e) as her public key and keeps d as
her private (secret) key.
S(M′) ≡ M′^d (mod n). (8.11)
Thus both P and S (of A) act on the ring Zn. Before we establish the correctness of RSA, we observe that d (which is computed using the Extended Euclidean Algorithm) can be computed in O(log^3 n) time. Further, the powers M^e and M′^d modulo n in eqns. (8.10) and (8.11) can also be computed in O(log^3 n) time [?]. Thus all computations in RSA can be done in polynomial time.
Proof. We have to show that
M^(ed) ≡ M (mod n).
Since
ed ≡ 1 (mod φ(n)),
we can write, for some integer k,
M^(ed) = M^(1+k(p−1)(q−1)) = M · M^(k(p−1)(q−1)).
If p ∤ M, Fermat’s Little Theorem gives M^(p−1) ≡ 1 (mod p), and therefore
M^(ed) ≡ M (mod p); (8.12)
this also holds trivially when p | M. Similarly,
M^(ed) ≡ M (mod q). (8.13)
As p and q are distinct primes, the congruences (8.12) and (8.13) imply that
M^(ed) ≡ M (mod pq) ≡ M (mod n).
The above description shows that if Bob wants to send the message M to Alice, he will send it as M^e (mod n), using the public key of Alice. To decipher, Alice raises this number to the power d and gets M^(ed) ≡ M (mod n), the original message sent by Bob.
The security of RSA rests on the supposition that no one other than Alice can determine her private key d. A person can compute d if he/she knows φ(n) = (p − 1)(q − 1) = n − (p + q) + 1, that is to say, if he/she knows the sum p + q; and for this, he/she should know the factors p and q of n. Thus, in essence, the security of RSA is based on the assumption that factoring a large number n that is a product of two distinct primes is “difficult”. However, to quote Koblitz [?], “no one can say with certainty that breaking RSA requires factoring n. In fact, there is even some indirect evidence that breaking the RSA cryptosystem might not be quite as hard as factoring n. RSA is the public key cryptosystem that has had by far the most commercial success. But, increasingly, it is being challenged by elliptic curve cryptography”.
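A toy instance of RSA with small primes illustrates the key setup and the encryption-decryption round trip; real keys use primes of hundreds of digits, and the values p, q, e below are purely illustrative.

```python
# A toy RSA key pair and one encryption/decryption round trip.
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, (e, phi) = 1
d = pow(e, -1, phi)            # private exponent with e*d = 1 (mod phi); d = 2753

M = 65                         # a message unit in Z_n
C = pow(M, e, n)               # encryption: C = M^e (mod n)
recovered = pow(C, d, n)       # decryption: C^d = M^(ed) = M (mod n)
```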
We have seen that RSA is based on the premise that factoring a very large integer which is a product of two “large” primes p and q is “difficult” compared to forming their product pq. In other words, given p and q, forming their product is a one-way function. The ElGamal public key cryptosystem uses a different one-way function, namely, one that computes powers of an element of a large finite group G: given G, g ∈ G, g ≠ e, and a positive integer a, the ElGamal cryptosystem is based on the assumption that computing g^a = b ∈ G is “easy”, while, given b ∈ G and g ∈ G, it is “difficult” to recover the exponent a.
Definition 8.6.2:
Let G be a finite group and b ∈ G. If y ∈ G, then the discrete logarithm of y
with respect to base b is any non-negative integer x less than o(G), the order
of G, such that bx = y, and we write logb y = x.
As per the definition, logb y may or may not exist. However, if we take
G = Fq∗ , the group of nonzero elements of a finite field Fq of q elements and
g, a generator of the cyclic group Fq∗ (See [?]), then for any y ∈ Fq∗ , the
discrete logarithm logg y exists.
Example 8.6.3:
5 is a generator of F17*. In F17*, the discrete logarithm of 12 with respect to base 5 is 9; in symbols, log5 12 = 9. In fact, in F17*,
⟨5⟩ = {5^1 = 5, 5^2 = 8, 6, 13, 14, 2, 10, 5^8 = −1, 12, 9, 11, 4, 3, 15, 7, 5^16 = 1}.
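The listing of ⟨5⟩ can be recomputed directly:

```python
# Powers of 5 in F_17^*, recording the exponent of each element reached.
powers = {}
x = 1
for k in range(1, 17):
    x = (x * 5) % 17
    powers[x] = k          # now x = 5^k (mod 17)

dlog_12 = powers[12]       # the discrete logarithm log_5 12
```

Every nonzero residue mod 17 appears exactly once among the powers, confirming that 5 is a generator.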
The ElGamal system works in the following way. All the users in the system agree to work in an already chosen large finite field Fq. A generator g of Fq* is fixed once and for all. Each message unit is then converted into a number in Fq. For instance, if the alphabet is the set of English characters and if each message unit is of length 3, then the message unit BCD will have the numerical equivalent 26^2 · 1 + 26 · 2 + 3 (mod q). It is clear that in order that these numerical equivalents of the message units are all distinct, q should be quite large; in our case, q ≥ 26^3. Now each user A in the system randomly chooses an integer a = aA, 0 < a < q − 1, and keeps it as his or her secret key. A declares g^a ∈ Fq as his public key.
If B wants to send the message unit M to A, he chooses a random positive integer k, k < q − 1, and sends the ordered pair
(g^k, M g^(ak)). (8.14)
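A toy run of the scheme, with a deliberately small field: the text requires q to be very large, so q = 19 with generator g = 2 is purely illustrative.

```python
import random

# A toy ElGamal exchange in F_19^* (2 is a generator of F_19^*).
q, g = 19, 2
rng = random.Random(1)

a = rng.randrange(1, q - 1)                        # Alice's secret key
public = pow(g, a, q)                              # her public key g^a

M = 12                                             # Bob's message unit in F_q^*
k = rng.randrange(1, q - 1)                        # Bob's random k
pair = (pow(g, k, q), (M * pow(public, k, q)) % q) # (g^k, M g^(ak)), as in (8.14)

# Alice recovers M by dividing the second component by (g^k)^a.
recovered = (pair[1] * pow(pow(pair[0], a, q), -1, q)) % q
```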
We have seen that the most commonly applied public key cryptosystem, namely RSA, is built upon very large prime numbers (numbers having, say, 500 digits or more). So there arises the natural question: given a large positive integer, how do we know whether or not it is prime? A “primality test” is a test that tells whether a given number is prime.
For a positive real number x, let π(x) denote the number of primes less than or equal to x. The Prime Number Theorem states that π(x) is asymptotic to x/log x; in symbols, π(x) ≈ x/log x. Here the logarithm is to the base e. Consequently, π(n) ≈ n/log n or, equivalently, π(n)/n ≈ 1/log n. In other words, in order to find a 100-digit prime, one has to examine roughly log_e 10^100 ≈ 230 randomly chosen 100-digit numbers for primality. (This figure may drop by half if we omit even numbers.)
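The estimate quoted above is a one-line computation:

```python
import math

# Expected number of random 100-digit candidates to examine before
# hitting a prime: ln(10^100) = 100 ln 10, roughly 230.
expected_trials = 100 * math.log(10)
half = expected_trials / 2        # skipping even numbers halves the count
```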
Fermat’s Little Theorem (FLT) states that if n is prime, then for each a, 1 ≤ a ≤ n − 1,
a^(n−1) ≡ 1 (mod n). (8.15)
Note that for any given a, a^(n−1) (mod n) can be computed in polynomial time using the repeated squaring method [?]. However, the converse of Fermat’s Little Theorem is not true. This is because of the presence of Carmichael numbers. A Carmichael number is a composite number n satisfying (8.15) for each a prime to n. Carmichael numbers are sparse, but there are infinitely many of them (??????????). The first few are 561, 1105, 1729.
Since we are interested in checking whether a given large number n is prime, n is certainly odd and hence (2, n) = 1. Consequently, if 2^(n−1) ≢ 1 (mod n), we can conclude with certainty, in view of FLT, that n is composite. However, if 2^(n−1) ≡ 1 (mod n), n may or may not be prime; if it is not a prime, then it is a pseudoprime to base 2.
Definition 8.7.1:
n is called a pseudoprime to base a, where (a, n) = 1, if
(i) n is composite, and
(ii) a^(n−1) ≡ 1 (mod n).
But then there is a chance that n is not a prime. How often does this happen? For n < 10,000, there are only 22 pseudoprimes to base 2; they are 341, 561, 645, 1105, . . . . Using better estimates due to Carl Pomerance (see [?]), we can conclude that the chance that a randomly chosen 50-digit (resp. 100-digit) number satisfies (8.15) but fails to be a prime is < 10^(−6) (resp. < 10^(−13)).
More generally, if (a, n) = 1, 1 < a < n, the pseudoprime test with respect to base a checks whether a^(n−1) ≢ 1 (mod n). If so, n is composite; if not, n may be a prime.
If we were to try every a with (a, n) = 1, we might have to examine φ(n) base values in the worst case. Instead, the Miller–Rabin test works as follows:
(i) It tries several randomly chosen base values a instead of just one.
(ii) While computing each modular exponentiation a^(n−1) (mod n), it stops as soon as it notices a nontrivial square root of 1 (mod n) and outputs COMPOSITE.
We now proceed to present the pseudocode for the Miller–Rabin test. The
code uses an auxiliary procedure WITNESS such that WITNESS (a, n) is
TRUE iff a is a “witness” to the compositeness of n. We now present and
justify the construction of “WITNESS”.
WITNESS (a, n)
1  let (bk , bk−1 , . . . , b0 ) be the binary representation of n − 1
2  d ←− 1
3  for i ←− k downto 0
4    do x ←− d
5       d ←− d · d (mod n)
6       if d = 1 and x ≠ 1 and x ≠ n − 1
7         then return TRUE
8       if bi = 1
9         then d ←− d · a (mod n)
10 if d ≠ 1
11   then return TRUE
12 return FALSE
MILLER-RABIN (n, s)
1 for j ←− 1 to s
2   do a ←− RANDOM (1, n − 1)
3      if WITNESS (a, n)
4        then return COMPOSITE ⊲ Definitely
5 return PRIME ⊲ Almost surely
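A direct Python transcription of the two procedures (variable names are ours):

```python
import random

def witness(a, n):
    """Return True iff a proves n composite."""
    d = x = 1
    for bit in bin(n - 1)[2:]:            # bits of n-1, most significant first
        x = d
        d = (d * d) % n
        if d == 1 and x != 1 and x != n - 1:
            return True                   # nontrivial square root of 1 found
        if bit == "1":
            d = (d * a) % n
    return d != 1                         # Fermat test: is a^(n-1) = 1 (mod n)?

def miller_rabin(n, s=20, rng=random.Random(0)):
    if n < 4:
        return n in (2, 3)
    for _ in range(s):
        a = rng.randrange(1, n - 1)
        if witness(a, n):
            return False                  # definitely composite
    return True                           # almost surely prime
```

The Carmichael number 561 defeats the plain Fermat test, but Miller-Rabin detects it via a nontrivial square root of 1.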
8.8.1 Introduction
In 2002, M. Agrawal, N. Kayal and N. Saxena of the Indian Institute of Technology, Kanpur, India made the sensational revelation that they had found a polynomial time algorithm for primality testing. Their algorithm works in Õ(log^7.5 n) time. (Recall from ?????????? that Õ(f(n)) stands for O(f(n) · (polynomial in log f(n))).) It is based on a generalization of Fermat’s Little Theorem to polynomial rings over finite fields. Notably, the correctness proof of their algorithm requires only simple tools of algebra. In the following section, we present the details of the AKS algorithm in ???
The AKS algorithm is based on the following identity for prime numbers, which is a generalization of Fermat’s Little Theorem.
Lemma 8.8.1:
Let a ∈ Z, n ∈ N, n ≥ 2, and (a, n) = 1. Then n is prime if and only if
(X + a)^n ≡ X^n + a (mod n). (8.16)
Proof. We have
(X + a)^n = X^n + Σ_{i=1}^{n−1} C(n, i) X^(n−i) a^i + a^n,
where C(n, i) denotes the binomial coefficient. If n is prime, each C(n, i), 1 ≤ i ≤ n − 1, is divisible by n. Further, as (a, n) = 1, Fermat’s Little Theorem gives a^n ≡ a (mod n). This establishes (8.16).
If n is composite, then n has a prime factor q < n. Let q^k || n (that is, q^k | n but q^(k+1) ∤ n). Now consider the term C(n, q) X^(n−q) a^q in the expansion of (X + a)^n, where C(n, q) is the binomial coefficient
C(n, q) = n(n − 1) · · · (n − q + 1) / (1 · 2 · · · q).
Then q^k ∤ C(n, q): if q^k | C(n, q), then, since the denominator contributes one factor q and q^k || n, the product n(n − 1) · · · (n − q + 1) would have to be divisible by q^(k+1); but none of n − 1, . . . , n − q + 1 is divisible by q, a contradiction. Hence q^k, and therefore n, does not divide the coefficient of the term C(n, q) X^(n−q) a^q. This shows that (X + a)^n − (X^n + a) is not identically zero over Zn.
The above identity suggests a simple test for primality: given input n,
choose an a and test whether the congruence (8.16) is satisfied. However, this takes time Ω(n) because we need to evaluate n coefficients in the LHS
in the worst case. A simple way to reduce the number of coefficients is to
evaluate both sides of (8.16) modulo a polynomial of the form X r − 1 for an
appropriately chosen small r. In other words, test if the following equation
is satisfied:
(X + a)^n = X^n + a (mod X^r − 1, n)    (8.17)
From Lemma 8.8.1, it is immediate that all primes n satisfy eqn. (8.17) for
all values of a and r. The problem now is that some composites n may also
satisfy the eqn. (8.17) for a few values of a and r (and indeed they do).
However, we can almost restore the characterization: we show that for an
appropriately chosen r if the eqn. (8.17) is satisfied for several a’s, then n
must be a prime power. It turns out that the number of such a’s and the
appropriate r are both bounded by a polynomial in log n, and this yields a
deterministic polynomial time algorithm for testing primality.
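The congruence (8.17) can be tested directly by representing polynomials in Z_n[X]/(X^r − 1) as length-r coefficient lists and using repeated squaring; a small Python sketch (function names ours; this is only the congruence test, not the full AKS algorithm):

```python
def polymul(f, g, r, n):
    """Product of two polynomials modulo (X^r - 1, n); f, g are length-r lists."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                h[(i + j) % r] = (h[(i + j) % r] + fi * gj) % n
    return h

def congruence_8_17(a, n, r):
    """Test whether (X + a)^n = X^n + a (mod X^r - 1, n); assumes r >= 2."""
    base = [0] * r
    base[0], base[1] = a % n, 1          # the polynomial X + a
    lhs = [0] * r
    lhs[0] = 1                           # start from the constant polynomial 1
    e = n
    while e:                             # repeated squaring
        if e & 1:
            lhs = polymul(lhs, base, r, n)
        base = polymul(base, base, r, n)
        e >>= 1
    rhs = [0] * r
    rhs[0] = a % n
    rhs[n % r] = (rhs[n % r] + 1) % n    # X^n + a, reduced mod X^r - 1
    return lhs == rhs
```

For a prime such as n = 7 the congruence holds for every a and r, as Lemma 8.8.1 guarantees; for n = 6, a = 1, r = 5 it fails.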
Fp denotes the finite field with p elements, where p is a prime. Recall that if p
is prime and h(x) is a polynomial of degree d irreducible over Fp , then Fp [X]/
(h(X)) is a finite field of order p^d. We will use the notation f(X) = g(X) (mod h(X), n) to represent the equation f(X) = g(X) in the ring Z_n[X]/(h(X)); that is, if the coefficients of f(X), g(X) and h(X) are reduced modulo n, then h(X) divides f(X) − g(X).
As mentioned earlier, for any function f(n) of n, Õ(f(n)) stands for O(f(n) · (polynomial in log f(n))). For example,
Õ(log^k n) = O(log^k n · poly(log log^k n))
           = O(log^k n · poly(log log n))
           = O(log^{k+ǫ} n) for any ǫ > 0.
Lemma 8.8.2:
Let lcm(m) denote the lcm of the first m natural numbers. Then for m ≥ 7,
lcm(m) ≥ 2^m.
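The bound is easy to spot-check numerically (a sketch, using Python's math.gcd):

```python
from math import gcd

def lcm_upto(m):
    """lcm(m): least common multiple of 1, 2, ..., m."""
    l = 1
    for i in range(2, m + 1):
        l = l * i // gcd(l, i)
    return l

# Lemma 8.8.2 asserts lcm(m) >= 2^m for m >= 7; check a range of values.
for m in range(7, 40):
    assert lcm_upto(m) >= 2 ** m
```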
Theorem 8.8.3:
The AKS algorithm returns PRIME iff n is prime.
Lemma 8.8.4:
If n is prime, then the AKS algorithm returns PRIME.
Proof. If n is prime, we have to show that AKS will not return COMPOSITE
in steps 1, 3 and 5. Certainly, the algorithm will not return COMPOSITE
in step 1. Also, if n is prime, there exists no a such that 1 < (a, n) < n, so
that the algorithm will not return COMPOSITE in step 3. By Lemma 8.8.1,
the for loop in step 5 cannot return COMPOSITE. Hence the algorithm will
identify n as PRIME either in step 4 or in step 6.
We now consider the steps when the algorithm returns PRIME, namely,
steps 4 and 6. Suppose the algorithm returns PRIME in step 4. Then n
must be prime. If n were composite, n = n1 n2 , where 1 < n1 , n2 < n.
Lemma 8.8.5:
There exists an r ≤ 16 log^5 n + 1 such that O_r(n) > 4 log^2 n.
Proof. Let r_1, . . . , r_t be all the numbers such that O_{r_i}(n) ≤ 4 log^2 n for each i; therefore r_i divides α_i = n^{O_{r_i}(n)} − 1 for each i. Now for each i, α_i divides the product

P = \prod_{i=1}^{⌊4 log^2 n⌋} (n^i − 1) < n^{16 log^4 n} = (2^{log n})^{16 log^4 n} = 2^{16 log^5 n}.

(Note that we have used the fact that \prod_{i=1}^{t} (n^i − 1) < n^{t^2}, the proof of which follows readily by induction on t.) As r_i divides α_i and α_i divides P for each i, 1 ≤ i ≤ t, the lcm of the r_i's also divides P. Hence (lcm of the r_i's) < 2^{16 log^5 n}. However, by Lemma 8.8.2,

lcm(1, 2, . . . , ⌈16 log^5 n⌉) ≥ 2^{⌈16 log^5 n⌉}.

Hence there must exist a number r in {1, 2, . . . , ⌈16 log^5 n⌉}, that is, r ≤ 16 log^5 n + 1, such that O_r(n) > 4 log^2 n.
Definition 8.8.6:
For a polynomial f(X) and a number m ∈ N, m is said to be introspective for f(X) if
[f(X)]^m = f(X^m) (mod X^r − 1, p).
It is clear from eqns. (8.18) and (8.19) that both n and p are introspective
for X + a, 1 ≤ a ≤ l. Our next lemma shows that introspective numbers are
closed under multiplication.
Lemma 8.8.7:
If m and m′ are introspective numbers for f (X), then so is mm′ .
Proof. Since m is introspective for f(X),
f(X^m) = [f(X)]^m (mod X^r − 1, p),
and hence [f(X^m)]^{m′} = [f(X)]^{mm′} (mod X^r − 1, p).    (8.20)
Next we show that for a given number m, the set of polynomials for which
m is introspective is closed under multiplication.
Lemma 8.8.8:
If m is introspective for both f (X) and g(X), then it is also introspective for
the product f (X)g(X).
Eqns. (8.18) and (8.19) together imply that both n and p are introspective
for (X + a). Hence by Lemmas 8.8.7 and 8.8.8, every number in the set
I = {n^i p^j : i, j ≥ 0} is introspective for every polynomial in the set

P = { \prod_{a=1}^{l} (X + a)^{e_a} : e_a ≥ 0 }.

We now define two groups based on the sets I and P that will play a crucial role in the proof.
The first group consists of the set G of all residues of numbers in I modulo r. Since both n and p are prime to r, so is any number in I. Hence G ⊂ Z_r^*, the multiplicative group of residues mod r that are relatively prime to r. It is easy to check that G is a group. (The only thing that requires verification is that n^i p^j has a multiplicative inverse in G. Since n^{O_r(n)} ≡ 1 (mod r), there exists i′, 0 ≤ i′ < O_r(n), with n^i ≡ n^{i′} (mod r); hence the inverse of n^i (= n^{i′}) is n^{O_r(n) − i′}. A similar argument applies for p, as p^{O_r(p)} ≡ 1 (mod r); p, being a prime divisor of n, satisfies (p, r) = 1.) Let |G| = the order of the group G = t (say). As G is generated by n and p modulo r and since O_r(n) > 4 log^2 n, t > 4 log^2 n.
To define the second group, we need some basic facts about cyclotomic polynomials over finite fields. Let Q_r(X) be the r-th cyclotomic polynomial over the field F_p. Then Q_r(X) divides X^r − 1 and factors into irreducible factors of the same degree d = O_r(p). Let h(X) be one such irreducible factor of degree d. Then F = F_p[X]/(h(X)) is a field. The second group that we want to consider is the group generated by X + 1, X + 2, . . . , X + l in the multiplicative group F^* of nonzero elements of the field F. Hence it consists simply of the residues of polynomials in P modulo h(X) and p. Denote this group by 𝒢.
We claim that the order of 𝒢 is exponential in either t = |G| or l.
Lemma 8.8.9:
|𝒢| ≥ min(2^l − 1, 2^t).
Proof. First note that h(X) | Qr (X) and Qr (X) | (X r − 1). Hence X may be
taken as a primitive r-th root of unity in F = Fp [X]/(h(X)).
We claim that (*) if f(X) and g(X) are polynomials of degree less than t and if f(X) ≠ g(X) in P, then their images in F (obtained by reducing the coefficients modulo p and then reducing modulo h(X)) are distinct. To see
this, assume that f (X) = g(X) in the field F (that is, the images of f (X)
and g(X) in the field F are the same). Let m ∈ I. Recall that every number
of I is introspective with respect to every polynomial in P . Hence m is
introspective with respect to both f (X) and g(X). This means that
Finally we show that if n is not a prime power, then |𝒢| is bounded above by an exponential function of t = |G|.
Lemma 8.8.10:
If n is not a prime power, |𝒢| ≤ (1/2) n^{2√t}.
two distinct numbers in Î which become equal when reduced modulo r. Let them be m1, m2 with m1 > m2. So we have (since r divides m1 − m2)

X^{m1} = X^{m2} (mod X^r − 1).    (8.22)

Let f(X) ∈ P. Then
[f(X)]^{m1} = f(X^{m1}) (mod X^r − 1, p)
            = f(X^{m2}) (mod X^r − 1, p)    by (8.22)
            = [f(X)]^{m2} (mod X^r − 1, p).
Thus every element of 𝒢 is a root of the polynomial
Q1(Y) = Y^{m1} − Y^{m2} over F.
Thus there are at least |𝒢| distinct roots of Q1(Y) in F. Naturally, |𝒢| ≤ degree of Q1(Y). Now the degree of Q1(Y)
= m1 (as m1 > m2)
Lemma 8.8.9 gives a lower bound for |G| while Lemma 8.8.10 gives an
upper bound for |G|. These bounds enable us to prove the correctness of the
algorithm.
Chapter 9
Finite Automata
9.1 Introduction
and elegant. In the 1970s, with the development of the UNIX operating system, practical applications of finite automata appeared—in lexical analysis (lex), in text searching (grep) and in Unix-like utilities (awk). Extensions to finite automata have been proposed in the literature. For example, a Büchi automaton is one that accepts infinite input sequences.
We begin with the notion of languages, which is freely used in later discussions.
We then introduce regular expressions. Subsequent sections deal with finite
automata and their properties.
9.2 Languages
then,
{a}⋆ = {ǫ, a, aa, aaa, aaaa, . . .}.
Consider the alphabet A comprising both upper and lower case of the 26 letters of the English alphabet as well as the punctuation marks, the blank (written ␣ here) and the question mark. That is,
A = {a, b, . . . , A, B, . . . , ␣, ?}
The sentence
This␣is␣a␣sentence.
is a sequence in A⋆. Similarly, the sentence
sihT␣si␣ynnuf.
is a sequence in A⋆.
The objective in defining a language is to selectively pick certain sequences
of Σ⋆ , that make sense in some way or that satisfy some property.
Definition 9.2.1:
Let Σ be a finite set, the alphabet. A language over Σ is a subset of the set
Σ⋆ .
Note that Σ⋆, φ and Σ each denote a language! It is not difficult to reason that since Σ is finite, Σ⋆ is a countably infinite set (the basic idea is this: for each k ≥ 0, all strings of length k are enumerated before all strings of length k + 1; strings of length exactly k are enumerated lexicographically, once we fix some ordering of the symbols in Σ).
Since languages are sets, they can be combined by union, intersection and
difference. Given a language A we also define the complement in Σ⋆ as
Ā = ∼A = {x ∈ Σ⋆ | x ∉ A}.
In other words, ∼ A is Σ⋆ \ A.
If A and B are languages over Σ, their concatenation is a language A · B
or simply AB defined as,
AB = {xy | x ∈ A and y ∈ B} .
For example, {a, b}^n is the set of strings over {a, b} of length n.
The closure or the Kleene star or the asterate A⋆ of a language A is the
set of all strings obtained by concatenating zero or more strings from A. This
is exactly equivalent to taking the union of all finite powers of A. That is
A⋆ = ⋃_{n≥0} A^n.
Similarly, A^+, the positive closure of A, is defined as A^+ = AA⋆.
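For finite languages these operations can be experimented with directly; a Python sketch (function names ours), with A⋆ approximated by the union of the first few powers:

```python
def concat(A, B):
    """Concatenation AB = {xy | x in A, y in B} of two finite languages."""
    return {x + y for x in A for y in B}

def star_upto(A, n):
    """Finite approximation of the Kleene star: the union of A^0, A^1, ..., A^n."""
    result = {''}          # A^0 = {epsilon}
    power = {''}
    for _ in range(n):
        power = concat(power, A)
        result |= power
    return result
```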
Exercise 9.2.2:
The language L is defined over the alphabet {a, b} recursively as follows:
i) ǫ ∈ L
Show that L is the set of all elements in {a, b}⋆ except those containing the
substring aab.
(ii) how to recognize whether a given string belongs to the given language?
Answers to these questions comprise much of the subject matter of the study
of automata and formal languages. Models of computation, such as finite automata, pushdown automata, Turing machines or the λ-calculus, evolved before modern computers came into existence. In parallel and independently, the formal notions of grammar and language (the Chomsky hierarchy, a hierarchy of language classification) were developed. The equivalence between languages and automata is now well understood.
Our concern here is restricted to what are called regular languages, described precisely by regular expressions (answering question (i) above). We are also concerned with the corresponding model of computation called finite automata or finite state machines (which answers question (ii) above).
Definition 9.3.1:
Regular expressions (and the corresponding languages) over Σ are exactly
those expressions that can be constructed from the following rules:
Example 9.3.2:
We write a + b⋆ c to mean (a + ((b⋆ )c)), a regular expression over {a, b, c}.
Example 9.3.3:
The regular expression (a + b)⋆ cannot be written as a + b⋆ because they
denote different languages.
Example 9.3.4:
To reason about the language represented by the regular expression (a+b)⋆ a,
we note that (a + b)⋆ denotes strings of any length (including 0) formed by
taking a or b. This is then concatenated with the symbol a. Therefore the
language consists of all possible strings from {a, b} ending in a.
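The reasoning of Example 9.3.4 can be checked with Python's re module, where the textbook union + is written | and ⋆ is *:

```python
import re

# (a + b)* a in textbook notation becomes the Python pattern (a|b)*a;
# fullmatch tests whether the whole string is in the language.
pattern = re.compile(r'(a|b)*a')

assert pattern.fullmatch('a')            # ends in a: accepted
assert pattern.fullmatch('babba')        # ends in a: accepted
assert pattern.fullmatch('ab') is None   # ends in b: rejected
assert pattern.fullmatch('') is None     # empty string: rejected
```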
Exercise 9.3.5:
Let Σ = {a, b}. Reason that
Figure 9.1: The automaton M1 (transition diagram over {0, 1} with states q0 and q1)
Figure 9.2: The automaton M2 (transition diagram over {0, 1} with states q0 and q1)
We can easily reason that M2 accepts the empty string ǫ and those that
end in 0.
We now consider the following automaton M3 which “solves” a “practical” problem: testing whether a number, given by its decimal digits, is divisible by 3.
Figure 9.3: Automaton M3, a divisibility-by-3 tester (states s, q0, q1, q2; transitions labelled by the digit classes {0, 3, 6, 9}, {1, 4, 7} and {2, 5, 8})
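M3 can be simulated by tracking the running remainder; a minimal Python sketch (it folds the start state s into q0, since both stand for remainder 0):

```python
def divisible_by_3(digits):
    """Simulate M3 on a decimal string: the state is the remainder mod 3.

    Since 10 ≡ 1 (mod 3), reading digit d takes remainder rem to (rem + d) % 3;
    the string is accepted iff the final state is q0 (remainder 0)."""
    rem = 0                      # start in q0 (the start state s is folded in)
    for ch in digits:
        rem = (rem + int(ch)) % 3
    return rem == 0

assert divisible_by_3('2043')        # 2043 = 3 * 681
assert not divisible_by_3('1000')
```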
Exercise 9.4.1:
By trial and error, design a finite automaton that precisely recognizes the
following language A:
A = {ω | ω is a string that has an equal number of occurrences of 01 and 10
as substrings}.
Definition 9.4.2:
A (deterministic) finite automaton M is a 5-tuple
M = (Q, Σ, δ, s, F ) ,
where
Q is a finite set of states
Σ is a finite alphabet
δ: Q × Σ → Q is a transition function
s ∈ Q is the start state
F ⊆ Q is the set of accept (or final) states.
Note that δ, the transition function, defines the rules for “moving” through the automaton as depicted pictorially. It can equivalently be described by a table. If M is in state q and sees input a, it moves to state δ(q, a). No move on ǫ is allowed; also, δ(q, a) is uniquely specified.
Example 9.4.3:
M4 = (Q, Σ, δ, s, F )
δ :    0    1
a :    a    b
b :    a    c
c :    a    c
δ̂ : Q × Σ⋆ → Q
Thus the state δ̂(q, x) is the state the automaton M will end up in, when
started in state q, fed the input x and allowed transitions according to δ.
Note that δ̂ and δ agree on strings of length one: δ̂(q, a) = δ(q, a) for every q ∈ Q and a ∈ Σ.
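In code, δ̂ is simply δ folded over the symbols of the input string; a sketch using the transition table of M4 from Example 9.4.3:

```python
def delta_hat(delta, q, x):
    """Extended transition function: fold delta over the symbols of x."""
    for a in x:
        q = delta[(q, a)]
    return q

# The transition table of M4 (Example 9.4.3): states a, b, c over {'0', '1'}.
delta = {('a', '0'): 'a', ('a', '1'): 'b',
         ('b', '0'): 'a', ('b', '1'): 'c',
         ('c', '0'): 'a', ('c', '1'): 'c'}

assert delta_hat(delta, 'a', '11') == 'c'
assert delta_hat(delta, 'a', '110') == 'a'
assert delta_hat(delta, 'a', '1') == delta[('a', '1')]   # agreement on length one
```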
Definition 9.4.4:
A language A is said to be regular if A = L(M ) for some DFA M .
M1 = (Q1 , Σ, δ1 , s1 , F1 )
M2 = (Q2 , Σ, δ2 , s2 , F2 )
To show that A∩B is regular, we have to show that there exists an automaton
M3 such that L(M3 ) = A ∩ B. We claim that M3 can be constructed as given
below:
Q3 = Q1 × Q2 = {(p, q) | p ∈ Q1 and q ∈ Q2}
F3 = F1 × F2 = {(p, q) | p ∈ F1 and q ∈ F2}
s3 = (s1, s2)
δ3((p, q), a) = (δ1(p, a), δ2(q, a))
Lemma 9.4.5:
L(M3) = A ∩ B.
Proof. We first show that for all x ∈ Σ⋆,
δ̂3((p, q), x) = (δ̂1(p, x), δ̂2(q, x)).
Now assume that this holds for x ∈ Σ⋆. We will show that it holds for xa also, where a ∈ Σ:
δ̂3((p, q), xa) = δ3(δ̂3((p, q), x), a)                definition of δ̂3
              = δ3((δ̂1(p, x), δ̂2(q, x)), a)           induction hypothesis
              = (δ1(δ̂1(p, x), a), δ2(δ̂2(q, x), a))    definition of δ3
              = (δ̂1(p, xa), δ̂2(q, xa))                definition of δ̂1 and δ̂2
For all x ∈ Σ⋆ ,
x ∈ L(M3 )
A ∪ B = ∼(∼A ∩ ∼B)
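The product construction is short to sketch in Python; the two component DFAs below are toy examples of our own (even number of a's; ends in b), not automata from the text:

```python
def product_dfa(Q1, Q2, Sigma, d1, d2, s1, s2, F1, F2):
    """Build M3 with Q3 = Q1 x Q2 recognizing L(M1) ∩ L(M2).

    d1, d2 are dicts (state, symbol) -> state; F1, F2 are accept-state sets."""
    Q3 = {(p, q) for p in Q1 for q in Q2}
    d3 = {((p, q), a): (d1[(p, a)], d2[(q, a)]) for (p, q) in Q3 for a in Sigma}
    F3 = {(p, q) for p in F1 for q in F2}
    return Q3, d3, (s1, s2), F3

def accepts(delta, s, F, x):
    q = s
    for a in x:
        q = delta[(q, a)]
    return q in F

# M1: even number of a's ('e'/'o');  M2: ends in b ('y'/'n').
d1 = {('e', 'a'): 'o', ('e', 'b'): 'e', ('o', 'a'): 'e', ('o', 'b'): 'o'}
d2 = {('n', 'a'): 'n', ('n', 'b'): 'y', ('y', 'a'): 'n', ('y', 'b'): 'y'}
Q3, d3, s3, F3 = product_dfa({'e', 'o'}, {'n', 'y'}, {'a', 'b'},
                             d1, d2, 'e', 'n', {'e'}, {'y'})

assert accepts(d3, s3, F3, 'aab')       # even number of a's and ends in b
assert not accepts(d3, s3, F3, 'ab')    # odd number of a's
```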
9.5.1 Nondeterminism
accepting states when the end of x is reached. Since there are many choices
for going to the next state, there may be many paths through the NFA in
response to the input x—some may lead to accept states and some may lead
to reject states. The NFA is said to accept x if at least one computation
path on x starting from the start state leads to an accept state. It should be
noted that in the NFA model itself there is no mechanism to determine as to
which transition to make in response to a next symbol from the input. We
illustrate these ideas with an example. Consider the following automaton
N1 :
Figure 9.4: The automaton N1 (q0 loops on 0, 1; q0 → q1 on 1; q1 → q2 and q2 → q3 on either 0 or 1; q3 accepting)
In the automaton N1 , from the state q0 there are two transitions on the
symbol 1. So N1 is indeed nondeterministic. It is easy to reason that N1 accepts all strings in {0, 1}⋆ that contain a 1 in the third position from the end.
On an input, say, 01110100, a computation can be such that we always
stay in state q0 . But the computation where we stay in state q0 till we read
the first five symbols and then move to states q1 , q2 and q3 on reading the
next three symbols accepts the string 01110100. Therefore N1 accepts the
string 01110100.
The definition is very similar to that of a DFA except that we need to describe
the new way of making transitions. In an NFA the input to the transition
function is a state plus an input symbol or the empty string; the transition
is to a set of possible next (legal) states.
Let P(Q) be the power set of Q. Let Σǫ denote Σ ∪ {ǫ}, for any alphabet
Σ. The formal definition is as follows:
Definition 9.5.1:
A nondeterministic finite automaton is a 5-tuple (Q, Σ, δ, s, F ), where
Q is a finite set of states
Σ is a finite alphabet
δ: Q × Σǫ → P(Q) is the transition function
s ∈ Q is the start state and
F ⊆ Q is the set of accept states.
We can now formally state the notion of computation for an NFA. Let
N = (Q, Σ, δ, s, F ) be an NFA and let w be a string over the alphabet Σ.
Then we say that N accepts w if we can write w as w = w1 w2 · · · wk, where each wi, 1 ≤ i ≤ k, is in Σǫ, and there exists a sequence of states q0, q1, . . . , qk such that
(i) q0 = s
(ii) qi+1 ∈ δ(qi, wi+1) for 0 ≤ i ≤ k − 1
(iii) qk ∈ F
Condition (i) states that the machine starts in its start state. Condition (ii)
states that on reading the next symbol in any current state the next state is
any one of the allowed legal states. Condition (iii) states that the machine
exhausts its input to end up in any one of the final states.
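This acceptance condition can be checked mechanically by tracking the set of all states reachable on the input read so far; a sketch (without ǫ-transitions, for brevity), exercised on the automaton N1:

```python
def nfa_accepts(delta, s, F, w):
    """Simulate an NFA by tracking the set of currently possible states.

    delta maps (state, symbol) -> set of states; missing entries mean no move."""
    current = {s}
    for a in w:
        current = {q for p in current for q in delta.get((p, a), set())}
    return bool(current & F)

# N1: accepts strings with a 1 in the third position from the end.
delta = {('q0', '0'): {'q0'}, ('q0', '1'): {'q0', 'q1'},
         ('q1', '0'): {'q2'}, ('q1', '1'): {'q2'},
         ('q2', '0'): {'q3'}, ('q2', '1'): {'q3'}}

assert nfa_accepts(delta, 'q0', {'q3'}, '01110100')
assert not nfa_accepts(delta, 'q0', {'q3'}, '0010')
```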
9.6 Equivalence of DFA and NFA
Definition 9.6.1:
Two automata M and N are said to be equivalent if L(M ) = L(N ).
Figure 9.1: Computing ǫ-CLOSURE
We now give the subset construction procedure. That is, we are given an
NFA N ; we are required to construct a DFA D equivalent to N .
Initially let ǫ-CLOSURE (s) be a state (the start state) of D, where s is
the start state of N . We assume that initially each state of D is “unmarked”.
We now execute the following procedure.
begin
  while (there is an unmarked state q = (r1, r2, . . . , rn) of D) do
  begin
    mark q;
    for (each input symbol a ∈ Σ) do
    begin
      let T be the set of states to which there is
        a transition on a from some state ri in q;
      x = ǫ-CLOSURE (T);
      if (x has not yet been added to the set of states of D)
        then make x an unmarked state of D;
      add a transition from q to x labelled a
        if not already present;
    end
  end
end
Figure 9.2: The subset construction procedure
Theorem 9.6.2:
Every NFA has an equivalent DFA.
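The subset construction sketched in Python, for NFAs without ǫ-transitions (so that ǫ-CLOSURE(T) = T); DFA states are frozensets of NFA states:

```python
def subset_construction(delta, s, F, Sigma):
    """Determinize an NFA without ǫ-transitions.

    delta maps (state, symbol) -> set of NFA states; returns the DFA transition
    table (keyed by frozensets), the start state and the accept states."""
    start = frozenset([s])
    dfa_delta, accept = {}, set()
    seen, unmarked = {start}, [start]
    while unmarked:
        q = unmarked.pop()              # mark q
        if q & F:
            accept.add(q)
        for a in Sigma:
            t = frozenset(r for p in q for r in delta.get((p, a), set()))
            dfa_delta[(q, a)] = t
            if t not in seen:
                seen.add(t)
                unmarked.append(t)
    return dfa_delta, start, accept

# Demo: the NFA of Exercise 9.6.5 below, accepting strings ending in 01.
nfa = {('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'}, ('q1', '1'): {'q2'}}
dd, start, acc = subset_construction(nfa, 'q0', {'q2'}, {'0', '1'})
```

On this NFA, only three subset-states are reachable, so the resulting DFA has three states.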
Exercise 9.6.3:
The following example illustrates the fact that given an NFA with ǫ-transitions,
we can get another NFA with no ǫ-transitions.
Figure 9.5: An NFA with ǫ-transitions (left) and an equivalent NFA without ǫ-transitions (right), on states q0, q1, q2, q3
Exercise 9.6.4:
Argue that both the following NFAs accept the language (01⋆ + 0⋆ 1).
Figure 9.6: Two NFAs, on states q0, q1 and p0, p1 respectively, each accepting (01⋆ + 0⋆ 1)
Exercise 9.6.5:
Consider the following NFA which accepts all strings ending in 01 (interpreted
as a binary integer, the NFA accepts all integers of the form 4x + 1, x = any
integer).
Figure 9.7: An NFA on states q0, q1, q2 accepting all strings ending in 01 (q0 loops on 0, 1; q0 → q1 on 0; q1 → q2 on 1)
Show that by applying the subset construction procedure we get the following
equivalent DFA.
Figure 9.8: The equivalent DFA obtained by the subset construction
Exercise 9.6.6:
Consider the following NFA which accepts all strings of 0s and 1s such that
the nth symbol from the end is 1.
Figure 9.9: An NFA on states q0, q1, . . . , qn accepting all strings whose nth symbol from the end is 1 (q0 loops on 0, 1 and moves to q1 on 1; thereafter qi → qi+1 on either 0 or 1)
Argue that the above NFA is a bad case for subset construction.
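The badness can be made concrete by counting the subset-states that determinization reaches: all 2^n of them. A self-contained sketch (function name ours):

```python
def reachable_dfa_states(n):
    """Count DFA states produced by subset construction on the NFA whose
    accepted strings have a 1 in the nth position from the end."""
    # NFA states 0..n: state 0 loops on 0/1 and moves to 1 on input 1;
    # state i (0 < i < n) moves to i+1 on either input; state n accepts.
    def step(S, a):
        T = {0}                                # 0 loops on both symbols
        if a == '1':
            T.add(1)
        T |= {i + 1 for i in S if 0 < i < n}
        return frozenset(T)
    start = frozenset([0])
    seen, stack = {start}, [start]
    while stack:                               # breadth-free reachability search
        S = stack.pop()
        for a in '01':
            T = step(S, a)
            if T not in seen:
                seen.add(T)
                stack.append(T)
    return len(seen)

assert reachable_dfa_states(3) == 8            # 2^n reachable subset-states
```

Each reachable subset records exactly which of the last n symbols were 1s, so no smaller DFA is possible either.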
We first show that the class of regular languages is closed under the con-
catenation operation. Given two regular languages A and B, let N1 and N2
be two NFAs such that L(N1 ) = A and L(N2 ) = B. From N1 and N2 we
construct a new NFA N , to recognize AB, as suggested by the following
figure.
Figure 9.10: The NFA N for AB: N1 (start state s1 = s) joined to N2 (start state s2) by ǫ-transitions from the accept states of N1
The key idea is that we make the start state s1 of N1 the start state s of N. From each accept state of N1 we make an ǫ-transition to the start state s2 of N2. The accept states of N are the accept states of N2 only.
Let N1 = (Q1 , Σ, δ1 , s1 , F1 )
N2 = (Q2 , Σ, δ2 , s2 , F2 )
N = (Q, Σ, δ, s1 , F2 )
where
(i) Q = Q1 ∪ Q2
(ii) We define δ so that for any q ∈ Q and a ∈ Σǫ
δ(q, a) =
  δ1(q, a)             if q ∈ Q1 and q ∉ F1
  δ1(q, a)             if q ∈ F1 and a ≠ ǫ
  δ1(q, a) ∪ {s2}      if q ∈ F1 and a = ǫ
  δ2(q, a)             if q ∈ Q2
Finally we show that the class of regular languages is closed under the
star operation.
Given a regular language A, we wish to prove that the language A⋆ is also
regular. Let N be an NFA such that L(N ) = A. We modify N , as suggested
by the following figure, to build an NFA N ′ :
Figure 9.11: The NFA N′ for A⋆: a new start state s′ with an ǫ-transition to s, and ǫ-transitions from the accept states of N back to s
The NFA N ′ accepts any input that can be broken into several pieces
and each piece is accepted by N . In addition, N ′ also accepts ǫ which is a
member of A⋆ .
Let N = (Q, Σ, δ, s, F ) be an NFA that recognizes A. We construct N ′
to recognize A⋆ as follows:
N ′ = (Q′ , Σ, δ ′ , s′ , F ′ )
Q′ = Q ∪ {s′ }
F ′ = F ∪ {s′ }
Theorem 9.7.1:
The class of languages accepted by finite automata is closed under intersection, complementation, union, concatenation and Kleene star.
Lemma 9.8.1:
If a language A is described by a regular expression R, then it is regular;
that is, there is an NFA to recognize A.
Figure 9.12:
Figure 9.13:
Figure 9.14:
i) R1 + R2
ii) R1 · R2
iii) R1⋆
For this case we simply construct the equivalent NFA following the techniques used in proving the closure properties (union, concatenation and Kleene star) of regular languages.
Example 9.8.2:
Consider the regular expression (ab+aab)⋆ . We proceed to construct an NFA
to recognize the corresponding language:
Figure 9.15: NFAs for the single symbols a and b
Step 2 We use the above NFAs and construct NFAs for ab and aab.
Figure 9.16: NFAs for ab and for aab
Step 3 Using the above NFAs for ab and aab, we construct an NFA for
ab + aab.
Figure 9.17: An NFA for ab + aab
Step 4 From the above NFA for ab + aab, we build the required NFA for
(ab + aab)⋆ .
Figure 9.18: An NFA for (ab + aab)⋆
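The constructed machine can be exercised with an ǫ-closure-based simulator; the NFA below is hand-coded in the spirit of Figure 9.18 (the state numbering is ours and the machine is simplified, not a literal transcription of the figure):

```python
def eps_closure(delta, states):
    """All states reachable from `states` via ǫ-moves (transitions on '')."""
    stack, closure = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, ''), set()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure

def enfa_accepts(delta, s, F, w):
    current = eps_closure(delta, {s})
    for a in w:
        moved = {q for p in current for q in delta.get((p, a), set())}
        current = eps_closure(delta, moved)
    return bool(current & F)

# A hand-built ǫ-NFA for (ab + aab)*: state 0 is both start and accept;
# the upper branch reads ab, the lower branch reads aab, then loops back.
delta = {
    (0, ''): {1, 4},
    (1, 'a'): {2}, (2, 'b'): {3},                  # branch for ab
    (4, 'a'): {5}, (5, 'a'): {6}, (6, 'b'): {3},   # branch for aab
    (3, ''): {0},                                  # loop back for the star
}

assert enfa_accepts(delta, 0, {0}, '')
assert enfa_accepts(delta, 0, {0}, 'abaab')
assert not enfa_accepts(delta, 0, {0}, 'aba')
```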
Lemma 9.8.3:
If a language is regular (that is, it is recognized by a finite automaton), then
it is described by a regular expression.
More explicitly,
L(p, q, 0) =
  {a ∈ Σ | δ(p, a) = q}            if p ≠ q
  {a ∈ Σ | δ(p, a) = p} ∪ {ǫ}      if p = q.
case, in general, it goes from state p to the state k + 1 (for the first time),
then possibly loops from state k + 1 back to itself (zero or more times), and
then from the state k + 1 to the state q. This means, we can write x as yzw,
where y corresponds to the path from state p to the first visit of state k + 1,
z corresponds to the looping in state k + 1, and w corresponds to the path
from state k + 1 to state q. We note that in each of the two parts y and w, and in each of the loops making up z, the path does not go through any state higher than k. Therefore,

L(p, q, k + 1) = L(p, q, k) + L(p, k + 1, k) · (L(k + 1, k + 1, k))⋆ · L(k + 1, q, k).
The expression on the right hand side above can be described by a regular
expression if the individual languages in it can be described by regular ex-
pressions (this is so, by induction hypothesis). Therefore L(p, q, k + 1) can
be described by a regular expression.
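The inductive definition of L(p, q, k) translates into a short dynamic program over regular expressions represented as strings; in this sketch (ours), '@' stands for ǫ, '+' for union, and states are numbered 1, . . . , m:

```python
import re

def dfa_to_regex(states, delta, s, F, Sigma):
    """Kleene's construction: build L(p, q, k) bottom-up as regex strings.

    states is a list of state names; delta maps (state, symbol) -> state."""
    def union(parts):
        parts = [p for p in parts if p is not None]
        return '+'.join(parts) if parts else None

    # Base case k = 0: single symbols (plus '@' = epsilon when p == q).
    L = {}
    for p in states:
        for q in states:
            syms = [a for a in Sigma if delta[(p, a)] == q]
            if p == q:
                syms.append('@')
            L[(p, q)] = union(syms)

    # Inductive step: paths through intermediate state k split into
    # y (p -> k), loops z (k -> k) and w (k -> q), as in the text.
    for k in states:
        newL = {}
        for p in states:
            for q in states:
                through = None
                if L[(p, k)] is not None and L[(k, q)] is not None:
                    loop = '(' + L[(k, k)] + ')*' if L[(k, k)] else ''
                    through = '(' + L[(p, k)] + ')' + loop + '(' + L[(k, q)] + ')'
                newL[(p, q)] = union([L[(p, q)], through])
        L = newL
    return union([L[(s, q)] for q in F])

# Demo: the two-state DFA over {a} accepting strings with an even number of a's.
delta = {(1, 'a'): 2, (2, 'a'): 1}
regex = dfa_to_regex([1, 2], delta, 1, [1], ['a'])
# Convert to Python re syntax for testing: '+' -> '|', '@' -> empty string.
pyre = regex.replace('+', '|').replace('@', '')
```

The resulting expression is far from minimal, which is typical of this construction; it is correct, not short.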
Theorem 9.8.4:
A language is regular if and only if some regular expression describes it.
Consider the regular expression (a + b)⋆ abb. If we construct the NFA for this and apply the subset construction algorithm, we will get the following DFA:
Figure 9.19: A five-state DFA (states A, B, C, D, E) for (a + b)⋆ abb, obtained by subset construction
The above DFA has five states. The following DFA with only four states
also accepts the language described by (a + b)⋆ abb:
Figure 9.20: A four-state DFA (states A, B, C, D) accepting (a + b)⋆ abb
The above example suggests that a given DFA can possibly be simplified to give an equivalent DFA with fewer states.
We now give an algorithm that provides a general method of reducing the number of states of a given DFA
M = (Q, Σ, δ, s, F).
We assume that from every state there is a transition on every input (if q is a state not conforming to this, then introduce a new “dead state” d, add transitions from q to d on the inputs that are not already present, and also add transitions from d to d on all inputs).
We say that a string w distinguishes a state q1 from a state q2 if δ̂(q1, w) ∈ F and δ̂(q2, w) ∉ F, or vice versa.
The minimization procedure works on M by finding all groups of states
that can be distinguished by some input string. Those groups of states that
cannot be distinguished are then merged to form a single state for the entire
group. The algorithm works by keeping a partition of Q such that each group
of states consists of states which have not yet been distinguished from one
another and such that any pair of states chosen from different groups have
been found distinguishable by some input.
Initially the two groups are F and Q \ F . The fundamental step is to take
a group of states, say A = {q1 , q2 , . . . , qk } and some input symbol a ∈ Σ and
consider δ(qi , a) for every qi ∈ A. If these transitions are to states that fall
into two or more different groups of the current partition, then we must split
A into subsets so that the transitions from the subsets of A are all confined
to a single group of the current partition. For example, let δ(q1 , a) = t1 and
δ(q2 , a) = t2 and let t1 and t2 be in different groups. Then we must split A
into at least two subsets so that one subset contains q1 and another contains
q2 . Note that t1 and t2 are distinguished by some string w and so q1 and q2
are distinguished by the string aw.
The algorithm repeats the process of splitting groups until no more groups
need to be split. The following fact can be formally proved: if there exists a string w such that δ̂(q1, w) ∈ F and δ̂(q2, w) ∉ F, or vice versa, then q1 and q2 cannot be in the same group; if no such w exists, then q1 and q2 can be in the same group.
1. We construct a partition Π of Q. Initially Π consists of F and Q \ F only. We next refine Π to Π_new, a new partition, using the procedure Refine-Π given in Fig. 9.3. That is, Π_new consists of the groups of Π, each split into one or more subgroups. If Π_new ≠ Π, we replace Π by Π_new and repeat the procedure Refine-Π. If Π_new = Π, then we terminate the process of refining Π.
Let G1, G2, . . . , Gk be the final groups of Π.

procedure Refine-Π
begin
  for (each group G of Π) do
  begin
    partition G into subgroups such that two states
      q1 and q2 of G are in the same subgroup iff for
      all a ∈ Σ, δ(q1, a) and δ(q2, a) are in the same group of Π;
    /* in the worst case, a state will be in a
       subgroup by itself */
    place all subgroups so formed in Π_new
  end
end

Figure 9.3: The procedure Refine-Π
2. For each Gi in Π we pick a representative, an arbitrary state in Gi. The representatives will be the states of the DFA M′.
Let qi be the representative of Gi, and for a ∈ Σ let δ(qi, a) ∈ Gj. Note that Gj can be the same as Gi. Let qj be the representative of Gj. Then in M′ we add the transition from qi to qj on a. Let the initial state of M′ be the representative of the group containing the initial state s of M. Also let the final states of M′ be the representatives which are in F.
3. If M′ has a dead state d, then remove d from M′. Also remove any state not reachable from the initial state. Any transitions from other states to d become undefined.
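The refinement loop admits a compact Python sketch; the demo DFA is the classical five-state automaton for (a + b)⋆abb (our assumption about the layout of Figure 9.19), where minimization merges A and C:

```python
def minimize(Q, Sigma, delta, F):
    """Partition refinement: return the final groups G1, ..., Gk as frozensets.

    delta maps (state, symbol) -> state; every state must have a move on
    every symbol (dead states added beforehand, as in the text)."""
    partition = {frozenset(F), frozenset(Q - F)} - {frozenset()}
    while True:
        def group_of(q):
            return next(g for g in partition if q in g)
        new_partition = set()
        for G in partition:
            # States stay together iff, for every symbol, their successors
            # lie in the same group of the current partition.
            subgroups = {}
            for q in G:
                key = tuple(group_of(delta[(q, a)]) for a in sorted(Sigma))
                subgroups.setdefault(key, set()).add(q)
            new_partition |= {frozenset(s) for s in subgroups.values()}
        if new_partition == partition:
            return partition
        partition = new_partition

# Demo: five-state DFA for (a + b)*abb; A and C are indistinguishable.
Q = {'A', 'B', 'C', 'D', 'E'}
delta = {('A', 'a'): 'B', ('A', 'b'): 'C', ('B', 'a'): 'B', ('B', 'b'): 'D',
         ('C', 'a'): 'B', ('C', 'b'): 'C', ('D', 'a'): 'B', ('D', 'b'): 'E',
         ('E', 'a'): 'B', ('E', 'b'): 'C'}
groups = minimize(Q, {'a', 'b'}, delta, {'E'})
```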
Example 9.9.1:
The following DFA recognizes the language described by the regular expres-
sion (0 + 1)⋆ 10.
Figure 9.21: A DFA with states 1, 2, . . . , 7 recognizing the language of (0 + 1)⋆ 10
Applying the above algorithm gives the final partition of states as {1, 2, 4},
{5, 3, 7} and {6}. The minimized DFA is given below:
Figure 9.22: The minimized DFA, with states {1, 2, 4}, {5, 3, 7} and {6}
Section 9.11 will answer the question of the uniqueness of the minimal
DFA.
D1 = (Q1 , Σ, δ1 , s1 , F1 ) and
D2 = (Q2 , Σ, δ2 , s2 , F2 )
The DFAs D1 and D2 are said to be isomorphic if there is a 1–1 onto mapping
f : Q1 → Q2 such that,
(i) f (s1 ) = s2
Conditions (i), (ii) and (iii) imply that D1 and D2 are essentially the same
automaton up to renaming of states. Therefore they accept the same input
set. We can show that the minimal state DFA corresponding to the set it accepts is unique up to isomorphism. This can be done via a beautiful correspondence between DFAs with input alphabet Σ and certain equivalence relations on Σ⋆.
M = (Q, Σ, δ, s, F )
Hence xa ≡M ya.
(b) ≡M refines R:
For any x, y ∈ Σ⋆ , if x ≡M y then (x ∈ R ⇔ y ∈ R). By definition
x ≡M y means δ̂(s, x) = δ̂(s, y), which is either an accept state or a
reject state. Therefore either both x and y are accepted or both x and
y are rejected. Stated another way, every equivalence class induced by
≡M has all its elements in R or none of its elements in R.
Definition 9.10.1:
An equivalence relation ≡M on Σ⋆ is a Myhill-Nerode relation for R, a regular
set, if it satisfies the properties (a), (b) and (c) above; that is, ≡M is a right
congruence of finite index, refining R.
The interesting fact about the definition above is that it characterises ex-
actly the relations on Σ⋆ that are ≡M for some automaton M . That is, we
can construct M from ≡M using only the fact that ≡M is a Myhill-Nerode
relation.
[x] = {y | y ≡ x}
Note that there are infinitely many strings, but there are only finitely many equivalence classes, by property (c) above.
We now define the DFA M≡ = (Q, Σ, δ, s, F) as follows:
Q = {[x] | x ∈ Σ⋆}
s = [ǫ]
F = {[x] | x ∈ R}
δ([x], a) = [xa]
Lemma 9.10.2:
δ̂ ([x], y) = [xy]
Proof. The proof is by induction on |y|. For the basis, we note that,
= [xyz]
Theorem 9.10.3:
L(M≡ ) = R
Now, x ∈ L(M≡)
⇔ [x] ∈ F, by Lemma 9.10.2
⇔ x ∈ R, by definition of F
We will now show that the constructions (i) and (ii) are inverses up to iso-
morphism of automata.
Lemma 9.10.4:
Let ≡ be a Myhill-Nerode relation for R ⊆ Σ⋆ . Let M≡ be the corresponding
DFA. From M≡ if we now define the corresponding Myhill-Nerode relation,
say ≡M≡ , it is identical to ≡.
⇔ δ̂ ([ǫ], x) = δ̂ ([ǫ], y)
⇔ x ≡ y.
Lemma 9.10.5:
Let the DFA for R be M , with no inaccessible states. Let the corresponding
Myhill-Nerode relation be ≡M . From ≡M if we construct the corresponding
DFA say M≡M , it is isomorphic to M .
Proof.
[x] = {y | y ≡M x}
    = {y | δ̂(s, y) = δ̂(s, x)}
Q′ = {[x] | x ∈ Σ⋆}
s′ = [ǫ]
F′ = {[x] | x ∈ R}
δ′([x], a) = [xa]
We now have to show that M≡M and M are isomorphic under the map
f([x]) = δ̂(s, x),
which is well defined by the definition of ≡M. We must verify that
(i) f(s′) = s,
(ii) f(δ′([x], a)) = δ(f([x]), a), and
(iii) [x] ∈ F′ iff f([x]) ∈ F.
For (i),
f(s′) = f([ǫ])
      = δ̂(s, ǫ)         definition of f
      = s.               definition of δ̂
For (ii),
f(δ′([x], a)) = f([xa])          definition of δ′
             = δ̂(s, xa)          definition of f
             = δ(δ̂(s, x), a)     definition of δ̂
             = δ(f([x]), a).     definition of f
For (iii), [x] ∈ F′ iff x ∈ R iff δ̂(s, x) ∈ F iff f([x]) ∈ F, by definition of f.
Theorem 9.10.6:
Let Σ be a finite alphabet. Up to isomorphism of automata, there is a 1–1
correspondence between DFA (with no inaccessible states) over Σ accepting
R ⊆ Σ⋆ and Myhill-Nerode relations for R on Σ⋆ .
Theorem 9.10.6 implies that we can deal with regular sets and finite
automata in terms of a few simple algebraic properties.
Definition 9.11.1:
A relation r1 is said to refine another relation r2 if r1 ⊆ r2, considered as sets of ordered pairs. That is, r1 refines r2 if, for all x, y, whenever x r1 y holds then x r2 y also holds.
For equivalence relations ≡1 and ≡2 , this means that for every x, the
≡1 -class of x is included in the ≡2 -class of x.
For example, the equivalence relation i ≡ j mod 6 on the integers refines
the equivalence relation i ≡ j mod 3.
We will now show that there exists a coarsest Myhill-Nerode relation ≡R
for any given regular set R; that is, any other Myhill-Nerode relation for R
refines ≡R . The relation ≡R corresponds to the unique minimal DFA for R.
Property (b) of the definition of Myhill-Nerode relations says that a Myhill-Nerode relation ≡ for R refines the equivalence relation with equivalence classes R and Σ⋆ \ R. The relation of refinement between equivalence relations is a partial order:
Lemma 9.11.2:
Let R ⊆ Σ⋆ be any set, regular or not. Let the relation ≡R be defined as follows:
For any x, y ∈ Σ⋆, x ≡R y if and only if for all z ∈ Σ⋆ (xz ∈ R ⇔ yz ∈ R).
Then ≡R satisfies properties (a) and (b):
x ≡R y ⇒ ∀a ∈ Σ, ∀w ∈ Σ⋆ (xaw ∈ R ⇔ yaw ∈ R)
       ⇒ ∀a ∈ Σ (xa ≡R ya)
x ≡R y ⇒ (x ∈ R ⇔ y ∈ R).
We now show that ≡R is the coarsest such relation; that is, any other equivalence relation ≡ satisfying properties (a) and (b) refines ≡R:
⇒ x ≡R y, by definition of ≡R .
(i) R is regular.
Proof. We first show (i) ⇒ (ii): Given a DFA M for R (because R is reg-
ular) we can construct ≡R , a Myhill-Nerode relation for R (as shown in
Section 9.11.2).
9.12.1 An Example
The intuitive argument that there exists no DFA M with L(M ) = A, where
A = {aⁿbⁿ | n ≥ 0}, goes thus: if such an M exists, then on crossing the
centre point between the a's and the b's it has to remember how many a's it
has seen. It must do this for arbitrarily long strings aⁿbⁿ (n may be arbi-
trarily large, in particular much larger than the number of states). This is
an unbounded amount of information, and it cannot be remembered with
only a finite memory.
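The finite-memory limitation can be demonstrated concretely: any k-state DFA sends two of the k + 1 prefixes a⁰, a¹, …, aᵏ to the same state, after which it can no longer tell them apart. The sketch below uses an assumed three-state toy DFA to find such a collision:

```python
# Demonstration (with an assumed 3-state toy DFA over {a, b}) that a
# k-state machine must confuse two of the prefixes a^0, ..., a^k: by the
# pigeonhole principle two of them reach the same state, so the machine's
# verdict on a^i b^i and a^j b^i must then agree -- it cannot count the a's.

delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 0,
         (0, "b"): 0, (1, "b"): 1, (2, "b"): 2}
start = 0
k = 3  # number of states

def run(w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q

seen = {}
for i in range(k + 1):          # k + 1 prefixes, only k states available
    q = run("a" * i)
    if q in seen:
        print(f"a^{seen[q]} and a^{i} both reach state {q}")
        break
    seen[q] = i
```

Here a⁰ and a³ collide, so this machine treats b and aaab identically from that point on.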
The formal argument is given below:
Assume that A is regular, so that a DFA M exists with L(M ) = A. Let k be
the number of states of M . Consider the action of M on the input aⁿbⁿ,
where n ≫ k. Let the start state be s and let the machine reach a final
state r after scanning the input string aⁿbⁿ:
Figure 9.23: M starts in state s, scans the block of n a's followed by the
block of n b's, and reaches the final state r.
Since n ≥ k, by the pigeonhole principle there must exist some state, say
p, that M enters more than once while scanning the block of a's. We break
the string aⁿbⁿ into three pieces u, v, w, where v is the string of a's scanned
between the two entries into state p. This is depicted below:
Figure 9.24: the input aⁿbⁿ split as u v w; M is in state s before u, is in
state p both before and after v (a nonempty block of a's), and reaches r
after w.
We now show that the substring v can be deleted and the resulting string
will still be erroneously accepted:
δ̂(s, uw) = δ̂( δ̂(s, u), w ) = δ̂(p, w) = r ∈ F
The acceptance is erroneous because, after deleting v, the number of a's in
the resulting string is strictly less than the number of b's, so uw ∉ A. This
contradicts the assumption that L(M ) = A; hence A is not regular.
(i) xyⁱz ∈ A for all i ≥ 0
(ii) |y| > 0
(iii) |xy| ≤ p
Figure 9.25: M processing w = w1 w2 w3 w4 w5 · · · wn ; the state q10 repeats
along the run and q14 is the accept state.
Let us “pump in” a copy of y into w to get the string xyyz. The extra copy
of y starts in state q10 and ends in q10 , so z still starts in state q10 and leads
to the accept state q14 . Thus xyyz is accepted. Similar reasoning shows
that xyⁱz is accepted for every i > 0, and it is easy to see that xz is also
accepted. Thus condition (i) is satisfied.
Since y is the part of w between the two occurrences of state q10 , we have
|y| > 0, and so condition (ii) is satisfied.
To get condition (iii), we make sure that q10 is the first repetition in the
sequence. By the pigeonhole principle, the first p + 1 states in the sequence
must contain a repetition. Therefore |xy| ≤ p.
We now formalize the above ideas.
As before, let w = w1 w2 w3 · · · wn , n ≥ p. Let r1 (= s), r2 , r3 , . . . , rn+1
be the sequence of states that M enters while processing w, so that ri+1 =
δ(ri , wi ), 1 ≤ i ≤ n. This sequence of states has length n + 1, which is at least
p + 1. By the pigeonhole principle there must be two identical states among
the first p + 1 states in the sequence. Let the first occurrence of the repeated
state be ra and the second occurrence of the same state be rb . Because rb
occurs among the first p + 1 states starting at r1 , we have b ≤ p + 1. Now let,
x = w1 w2 · · · wa−1
y = wa · · · wb−1
z = wb · · · wn
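This construction can be carried out mechanically: record the state sequence r1, …, rn+1, find the first repeated state among the first p + 1 entries, and split w at the two occurrences. The sketch below uses an assumed two-state toy DFA (accepting strings over {a, b} with an even number of a's, so p = 2) and checks conditions (i)–(iii):

```python
# Mechanical version of the x, y, z construction, on an assumed toy DFA
# accepting strings with an even number of a's (p = 2 states).
# Indices are 0-based here: position idx in `states` corresponds to r_{idx+1}.

delta = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 0, (1, "b"): 1}
start, accepting, p = 0, {0}, 2

def run(w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q

def pump_split(w):
    """Return (x, y, z) from the first repeated state among r_1 .. r_{p+1}."""
    states = [start]                      # r_1, r_2, ..., r_{n+1}
    for c in w:
        states.append(delta[(states[-1], c)])
    first = {}
    for idx, q in enumerate(states[:p + 1]):
        if q in first:
            a, b = first[q], idx          # positions of r_a and r_b
            return w[:a], w[a:b], w[b:]
        first[q] = idx
    raise AssertionError("no repetition among the first p + 1 states")

w = "aab"                                 # accepted, and |w| >= p
x, y, z = pump_split(w)
print(repr((x, y, z)))                    # ('', 'aa', 'b')
assert len(y) > 0 and len(x + y) <= p     # conditions (ii) and (iii)
assert all(run(x + y * i + z) in accepting for i in range(4))  # condition (i)
```

Pumping y zero or more times never leaves the language, exactly as the lemma asserts for a string this DFA accepts.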