
Lecture Notes in Mathematics

Finite Fields and Error-Correcting Codes


Karl-Gustav Andersson
(Lund University)

(version 1.014 - 31 December 2015)

Translated from Swedish by Sigmundur Gudmundsson


Contents

Chapter 1. Finite Fields

1. Basic Definitions and Examples
2. Calculations with Congruences
3. Vector Spaces
4. Polynomial Rings
5. Finite Fields
6. The Existence and Uniqueness of GF(p^n)
7. The Möbius Inversion Formula

Chapter 2. Error-Correcting Codes

1. Introduction
2. Linear Codes and Generating Matrices
3. Control Matrices and Decoding
4. Some Special Codes
5. Vandermonde Matrices and Reed-Solomon Codes

CHAPTER 1

Finite Fields

1. Basic Definitions and Examples


In this introductory section we discuss the basic algebraic opera-
tions addition and multiplication from an abstract point of view. We
consider a set A equipped with two operations defined in such a way
that to each pair of elements a, b ∈ A there are associated two new
elements a + b and a · b in A called the sum and the product of a and
b, respectively. We assume that for the sum we have the following four
axioms.

(A1) a + (b + c) = (a + b) + c
(A2) a+b=b+a

(A3) there exists an element 0 ∈ A such that

a + 0 = a for all a ∈ A

(A4) for every a ∈ A there exists an element −a ∈ A such that

a + (−a) = 0.

These axioms guarantee that subtraction is well-defined in A. It is


easily checked that (A1)–(A4) imply that the equation a + x = b in A
has the unique solution x = b + (−a). In what follows we will write
b − a for b + (−a).
The corresponding axioms for the multiplication are

(M1) a · (b · c) = (a · b) · c
(M2) a·b=b·a

(M3) there exists an element 1 ∈ A such that

1 · a = a · 1 = a for all a ∈ A

(M4) for every a ≠ 0 in A there exists an element a^(−1) ∈ A such that

a · a^(−1) = 1.
Sometimes we will only assume that some of these axioms for the
multiplication are satisfied. If they all apply then, precisely as for the
subtraction, a division is well-defined in A i.e. the equation ax = b
with a ≠ 0 has the unique solution x = a^(−1) · b.
Finally, we always assume the distributive laws for A:
(D) a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c
Definition 1.1. A ring A is a set equipped with an addition and
a multiplication such that all the rules (A1)–(A4) are satisfied and
furthermore (M1) and (D). If A also satisfies (M2) it is said to be a
commutative ring and if (M3) is fulfilled we say that the ring has a
unity. A ring that contains at least two elements and satisfies all the
rules (M1)–(M4) for the multiplication is called a field.
Example 1.2. The rational numbers Q , the reals R and the com-
plex numbers C are important examples of fields, when equipped with
their standard addition and multiplication. The integers Z form a
commutative ring but are not a field since (M4) is not valid in Z .
Example 1.3. The set M2 (R) of 2 × 2 real matrices forms a ring.
Here 0 is the zero matrix and 1 is the unit matrix. In M2 (R) the
commutative law (M2) is not satisfied. The rule (M4) is not fulfilled
either, since there exist non-zero matrices that are not invertible. For
example we have

[ 1 −2 ] [ 4 −2 ]   [ 0 0 ]
[ −2  4 ] [ 2 −1 ] = [ 0 0 ] .
It follows from this relation that none of the two matrices on the left-
hand side are invertible.
Definition 1.4. Two elements a ≠ 0 and b ≠ 0 in a ring are called
zero divisors if a · b = 0.
Example 1.5. The two matrices

[ 1 −2 ]       [ 4 −2 ]
[ −2  4 ]  and  [ 2 −1 ]
in Example 1.3 are zero divisors in the ring M2 (R).
We shall now discuss, in more detail, a family of rings that will play
an important role in what follows. Let n ≥ 2 be a given integer. We
say that two integers a and b are congruent modulo n if their difference
a − b is divisible by n. For this we simply write a ≡ b (mod n). For
example we have 13 ≡ 4 (mod 3). Denote by [a] the class of integers
that are congruent to a modulo n. We can then define an addition and
a multiplication of such congruence classes by
[a] + [b] = [a + b] and [a] · [b] = [a · b].
Here we must verify that these definitions do not depend on the choice
of representatives for each congruent class. So assume that a ≡ a1
(mod n) and b ≡ b1 (mod n). Then a1 = a + kn and b1 = b + ln for
some integers k and l. This implies that
a1 + b1 = a + b + (k + l)n and a1 b1 = ab + (al + bk + kln)n,
hence a1 +b1 is congruent with a+b and a1 b1 with ab modulo n. Denote
by Zn the set of congruence classes modulo n i.e.
Zn = {[0], [1], [2], . . . , [n − 1]}.
It is easily checked that the above defined addition and multiplication
turn Zn into a commutative ring.
Example 1.6. In the ring Z11 we have
[5] + [9] = [14] = [3] and [5] · [9] = [45] = [1]
and in Z12 the following equalities hold
[4] + [9] = [13] = [1] and [4] · [9] = [36] = [0].
As a direct consequence of the example we see that [5] is the mul-
tiplicative inverse of [9] in the ring Z11 . The following result gives a
criterion for an element of Zn to have a multiplicative inverse.
Theorem 1.7. Let [a] in Zn be different from [0]. Then there exists
an element [b] in Zn such that [a][b] = [1] if and only if a and n are
relatively prime i.e. they do not have a non-trivial common divisor.
Proof. Let us first assume that a and n have a common divisor
d ≥ 2. Then a = kd and n = ld for some integers k and l with
0 < l < n. This implies that [l][a] = [lkd] = [kn] = [0]. Hence there
does not exist a multiplicative inverse [b] to [a], because in that case
[l] = [l][1] = [l][a][b] = [0][b] = [0].
On the other hand, if a and n are relatively prime then it is a con-
sequence of the Euclidean algorithm that there exist integers b and c
such that 1 = ab + nc. This gives [1] = [a][b]. □
Example 1.8. We will now use the Euclidean algorithm to determine whether or not [235] has a multiplicative inverse in Z567.
567 = 2 · 235 + 97
235 = 2 · 97 + 41
97 = 2 · 41 + 15
41 = 3 · 15 − 4
15 = 4 · 4 − 1
This shows that 567 and 235 are relatively prime, and by following the
calculations backwards we see that
1 = 4·4−15 = 4·(3·15−41)−15 = 11·15−4·41 = · · · = 63·567−152·235.
Hence the multiplicative inverse of [235] is [−152] = [415].
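The back-substitution above can be mechanized by the extended Euclidean algorithm. The following small Python sketch (our own illustration; the function name egcd is not from the notes) recovers the inverse found above.

    def egcd(a, b):
        # returns (g, u, v) with g = gcd(a, b) and u*a + v*b = g
        if b == 0:
            return (a, 1, 0)
        g, u, v = egcd(b, a % b)
        return (g, v, u - (a // b) * v)

    g, u, v = egcd(235, 567)
    print(g)          # 1, so 235 and 567 are relatively prime
    print(u % 567)    # 415, the multiplicative inverse of [235] in Z567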
If n = p is a prime, then it is clear that none of the numbers
1, 2, . . . , p − 1 has a common divisor with p. This shows that all the
classes [1], [2], . . . , [p − 1] in Zp , different from [0], have a multiplicative
inverse, so Zp is a field. If n is not a prime, then n = kl for some
integers k, l ≥ 2. Then none of the two classes [k] and [l] has an inverse
in Zn , so Zn is not a field. We summarize:
Theorem 1.9. The ring Zn is a field if and only if n is a prime.
We conclude this section by defining the notion of an isomorphism
between rings. Let A1 and A2 be two rings and assume that there exists
a bijective map f from A1 to A2 such that
f (a + b) = f (a) + f (b) and f (a · b) = f (a) · f (b)
for all elements a and b in A1 . In that case, we say that the rings A1 and
A2 are isomorphic and that f is an isomorphism from A1 to A2 . Two
rings that are isomorphic are actually just two different representations
of the same ring. An isomorphism corresponds to just changing the
names of the elements. All calculations in one of the rings correspond
to exactly the same calculations in the other.
Example 1.10. Let M be the ring of all 2 × 2 matrices of the form

[ a −b ]
[ b  a ]
where a and b are real numbers and the operations are the standard
matrix addition and matrix multiplication. Then the map

M ∋ [ a −b ]
    [ b  a ]  ↦  a + ib ∈ C
defines an isomorphism from M to the ring C of complex numbers.


The reader is encouraged to check this fact.

Exercises

Exercise 1.1. Show that the following rules are valid in any ring:
(1) 0 · a = a · 0 = 0, (Hint: 0 · a + 0 · a = 0 · a.)
(2) (−a)b = a(−b) = −ab,
(3) (−a)(−b) = ab.
Exercise 1.2. Show that a field does not have any zero divisors.
Exercise 1.3. Show that if a is not a zero divisor in the ring A
then the following cancelation law applies
ax = ay ⇒ x = y
for all x and y in A.
Exercise 1.4. Let M be the set of all matrices

[ a  2b ]
[ −b  a ] ,
where a and b are integers. Show that, with the standard matrix addi-
tion and multiplication, M forms a commutative ring with unity. Does
M have any zero divisors?

Exercise 1.5. Let Q[√2] be the set of all numbers of the form a + b√2, where a and b are rational. Show that the usual addition and multiplication of real numbers turn Q[√2] into a field.
Exercise 1.6. Let Z[i] be the set of Gaussian integers a + ib, where
a and b are integers. Show that Z[i], with the usual addition and
multiplication of complex numbers, is a commutative ring with unity.
For which elements u ∈ Z[i] does there exist a multiplicative inverse v
i.e. an element v such that uv = 1?
Exercise 1.7. Show that a ring A is commutative if and only if
(a + b)^2 = a^2 + 2ab + b^2
for all a and b in A.
Exercise 1.8. Find out if the determinant

| 325 131 340 |
| 142 177 875 |
| 214 122 961 |

is an odd number or an even one.


Exercise 1.9. Solve in Z23 the equations
[17] · x = [5] and [12] · x = [7].
Exercise 1.10. Determine if [121] and [212] are invertible in Z9999
or not. Find the inverses if they exist.
Exercise 1.11. Consider the elements [39], [41], [46] and [51] in
Z221 .
(1) Which of these are zero divisors?
(2) Which have a multiplicative inverse? Find the inverses if they
exist.
Exercise 1.12. Solve the following systems of equations
4x + 7y ≡ 3 (mod 11)        4x + 7y ≡ 5 (mod 13)
8x + 5y ≡ 9 (mod 11) ,      7x + 5y ≡ 8 (mod 13) .
Exercise 1.13. Determine the digits x and y such that the follow-
ing decimal numbers are divisible by 11
2x653874 , 37y64943252.
(Hint: 10^n ≡ (−1)^n (mod 11).)
Exercise 1.14. Let A be a finite commutative ring with a unity.
Show that if a ∈ A is not a zero divisor, then a has a multiplicative
inverse. (Hint: Consider the map x ↦ ax , x ∈ A.)
Exercise 1.15. Let a be a non-zero element in a field A.
(1) Show that if a^(−1) = a, then either a = 1 or a = −1.
(2) Prove Wilson’s theorem stating that for every prime p we have
(p − 1)! ≡ −1 (mod p).

2. Calculations with Congruences


Let F be a finite field with q elements and F* = {x ∈ F ; x ≠ 0}. We order the elements of F* in a sequence x1, x2, . . . , xq−1. Then for every fixed a ∈ F* the sequence ax1, ax2, . . . , axq−1 contains exactly the same elements i.e. those of F*, since if axi = axj then multiplication by a^(−1) gives xi = xj. We have therefore shown that

(ax1)(ax2) · · · (axq−1) = x1 x2 · · · xq−1 .

By collecting a from each of the q − 1 factors on the left-hand side and dividing by x1 x2 · · · xq−1, we obtain a^(q−1) = 1 and have thereby proven
the following result.
Theorem 2.1. Let F be a finite field with q elements and a ≠ 0 be an element of F . Then
a^(q−1) = 1.
Specializing to the case when F = Zp , for some prime p, we obtain
the following result due to Pierre de Fermat in 1640:
Theorem 2.2 (Fermat’s little theorem). If p is a prime number
and a is an integer not divisible by p, then
a^(p−1) ≡ 1 (mod p).
Example 2.3. We now want to calculate the least positive remainder when dividing 3^350 by 17. Since 17 is a prime, Fermat's theorem tells us that 3^16 ≡ 1 (mod 17). Hence
3^350 = 3^(21·16+14) ≡ 3^14 (mod 17).
A continued calculation modulo 17 gives
3^14 = 9^7 = 9 · 81^3 ≡ 9 · (−4)^3 = 9 · (−4) · 16 ≡ 9 · (−4) · (−1) = 36 ≡ 2.
The remainder that we are looking for is therefore 2.
Alternatively, one can show that 3^14 ≡ 2 by observing that 3^14 · 3^2 = 3^16 ≡ 1. This implies that [3^14] = [9]^(−1) = [2], since 2 · 9 = 18 ≡ 1.
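Such remainders are easy to verify by machine. A short Python check (our own addition), using the built-in three-argument pow for modular exponentiation:

    print(pow(3, 350, 17))   # 2, the least positive remainder of 3^350 modulo 17
    print(pow(3, 16, 17))    # 1, as Fermat's little theorem predicts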

The next result generalizes Fermat’s little theorem.


Theorem 2.4. Let p and q be different prime numbers and m be a
positive integer. Then
a^(m(p−1)(q−1)+1) ≡ a (mod pq)
for every integer a.
Proof. If p does not divide a, then it follows from Fermat's theorem that
a^(p−1) ≡ 1 (mod p).
This implies that
a^(m(p−1)(q−1)) ≡ 1 (mod p).
Multiplication by a gives
a^(m(p−1)(q−1)+1) ≡ a (mod p).
This equality is of course also valid when p divides a, since then a ≡ 0 (mod p). In the same way, we see that
a^(m(p−1)(q−1)+1) ≡ a (mod q).
Since both p and q divide the difference a^(m(p−1)(q−1)+1) − a, so does the product pq, and the statement is proven. □
Example 2.5. Theorem 2.4 has an interesting application in cryp-
tology. Assume that a receiver, for example a bank, receives messages
from a large number of senders and does not want the content to be read
by unauthorized individuals. Then the messages must be encrypted.
This means that an encrypting key must be available to the sender.
One way to achieve this is to use a system with a public key. Such
systems are based on the idea that there exist functions that are eas-
ily computed but the inverse operation is very difficult without some
additional information. The following method (the RSA-system) was
suggested by Rivest, Shamir and Adleman in 1978.
Choose two large different primes p and q (by large numbers we here mean numbers with hundreds of digits) and set n = pq. Then
pick a large number d relatively prime to (p − 1)(q − 1). According to
Theorem 1.7 of the last section, d has a multiplicative inverse e in the
ring Z(p−1)(q−1) , which can be determined by the Euclidean algorithm.
The numbers n and e are made public as well as necessary information
on how they should be used for the encrypting. The numbers p, q and
d are kept secret by the receiver.
Assume that all the messages are of the form of one or more integers
between 1 and n. A sender interested in sending such a number M will
encrypt it by calculating C ≡ M^e (mod n). After receiving C, the receiver calculates the unique number D between 1 and n satisfying D ≡ C^d (mod n). According to Theorem 2.4 we have the equality D ≡ M (mod n). Indeed, since e is the multiplicative inverse of d in the ring Z(p−1)(q−1), it follows that ed = m(p − 1)(q − 1) + 1 for some integer m, so
D ≡ C^d ≡ M^(ed) = M^(m(p−1)(q−1)+1) ≡ M (mod n).
The interesting question is now whether it is possible to use only the public information e and n to get hold of the content of the message sent. To do this within a reasonable amount of time one would need to know the prime numbers p and q. These can be determined by factorizing n, but even with modern computers this is in general an infeasible task.
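To make the scheme concrete, here is a toy Python sketch of Example 2.5 with absurdly small primes (the values p = 61, q = 53, d = 2753 are our own choice and are far too small for real use).

    p, q = 61, 53                # the secret primes
    n = p * q                    # 3233, made public
    phi = (p - 1) * (q - 1)      # 3120
    d = 2753                     # secret exponent, relatively prime to phi
    e = pow(d, -1, phi)          # 17, the public exponent (inverse of d modulo phi)

    M = 1234                     # a message, an integer between 1 and n
    C = pow(M, e, n)             # encryption with the public key (n, e)
    D = pow(C, d, n)             # decryption with the secret exponent d
    print(D == M)                # True, by Theorem 2.4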
In the next example we deal with the problem of finding a simulta-
neous solution to several different congruences.
Example 2.6. In a 2000-year-old book by the Chinese author
Sun-Tsu one can read:
“There exists an unknown number which divided by 3 leaves the remainder 2, by 5 the remainder 3 and by 7 the remainder 2. What is this number?”
In other words, one should find an integer x that simultaneously
solves the three congruences
x≡2 (mod 3)
x≡3 (mod 5)
x≡2 (mod 7).
The method that Sun-Tsu presented for solving the problem gives the
Chinese remainder theorem.
Theorem 2.7. Assume that the integers n1 , n2 , . . . , nk are pairwise
relatively prime. Then the system of congruences
x ≡ a1 (mod n1 )
x ≡ a2 (mod n2 )
...
x ≡ ak (mod nk )
has a unique solution x modulo n = n1 n2 · · · nk .
Proof. Define
Ni = n/ni = ∏_{j≠i} nj .

Then the numbers Ni and ni are relatively prime for each i. Hence
there exist integers si and ti such that
si Ni + ti ni = 1.
Set
x = Σ_{j=1}^{k} aj sj Nj = a1 s1 N1 + · · · + ak sk Nk .

We have si Ni ≡ 1 (mod ni) and Nj ≡ 0 (mod ni) when j ≠ i. This implies that
x ≡ ai (mod ni ) , i = 1, . . . , k.
We still have to show that the solution x is uniquely determined
modulo n. Assume that x̃ was another solution. Then x ≡ x̃ (mod ni )
for all i. Since the numbers ni are pairwise relatively prime, it follows
that x ≡ x̃ (mod n) and the result follows. □
Example 2.8. In the last example we have n1 = 3, n2 = 5, n3 = 7 and N1 = 35, N2 = 21, N3 = 15. We find
2 · 35 − 23 · 3 = 1
1 · 21 − 4 · 5 = 1
1 · 15 − 2 · 7 = 1.
So the above method gives the solution
x = 2 · 2 · 35 + 3 · 1 · 21 + 2 · 1 · 15 = 233.
The least positive solution is
233 − 2n = 233 − 210 = 23.
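The proof of Theorem 2.7 translates directly into a short Python routine (our own sketch; the modular inverse is computed with the built-in pow).

    from math import prod

    def crt(residues, moduli):
        # solve x ≡ a_i (mod n_i) for pairwise relatively prime moduli n_i
        n = prod(moduli)
        x = 0
        for a, ni in zip(residues, moduli):
            Ni = n // ni
            si = pow(Ni, -1, ni)   # s_i with s_i * N_i ≡ 1 (mod n_i)
            x += a * si * Ni
        return x % n

    print(crt([2, 3, 2], [3, 5, 7]))   # 23, Sun-Tsu's unknown number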
The Chinese remainder theorem has another, a bit more abstract,
formulation. If A1 , . . . , Ak are k rings, then we can form a new ring
denoted by A1 × · · · × Ak consisting of all elements (a1 , . . . , ak ) where
ai ∈ Ai . The addition and the multiplication in the new ring are defined
by
(a1 , . . . , ak ) + (b1 , . . . , bk ) = (a1 + b1 , . . . , ak + bk )
(a1 , . . . , ak ) · (b1 , . . . , bk ) = (a1 · b1 , . . . , ak · bk ).
Assume now that n = n1 n2 · · · nk where the numbers ni are pairwise
relatively prime. Then the Chinese remainder theorem states that for
given integers a1 , . . . , ak with 0 ≤ ai < ni , there exists precisely one
integer a with 0 ≤ a < n such that
a ≡ ai (mod ni ) , i = 1, . . . , k.
It is easily checked that the map that takes a to (a1 , . . . , ak ) is an
isomorphism between Zn and Zn1 × · · · × Znk .
Example 2.9. Let n = 1001 = 7 · 11 · 13 and consider the two
elements [778] and [431] in Z1001 . Then
778 ≡ 1 (mod 7) 431 ≡ 4 (mod 7)
778 ≡ 8 (mod 11) 431 ≡ 2 (mod 11)
778 ≡ 11 (mod 13) 431 ≡ 2 (mod 13).
Instead of calculating the product 778 · 431 modulo 1001, we can also
calculate
(1, 8, 11) · (4, 2, 2) = (4, 16, 22) ≡ (4, 5, 9)
in the ring Z7 × Z11 × Z13 and then, as in the proof of the Chinese
remainder theorem, determine the corresponding element in Z1001 . This
sort of arithmetic is sometimes useful when performing such calculations with large numbers.

Exercises

Exercise 2.1. Find the multiplicative inverse of [45] in Z101. Then determine the integer x between 1 and 100 such that
45^99 ≡ x (mod 101) .
Exercise 2.2. In each of the following cases, find the least non-
negative integer x satisfying
x ≡ 3^5000 (mod 13),   x ≡ 3^100 (mod 101),
x ≡ 3^40 (mod 23),   x ≡ 2^1000 (mod 7).
Exercise 2.3. Show that if p and q are different primes, then
p^(q−1) + q^(p−1) ≡ 1 (mod pq).
Exercise 2.4. Let p1 , p2 , . . . , pk be different primes and r be a pos-
itive integer divisible by pi − 1 for all i = 1, . . . , k. Show that
a^(r+1) ≡ a (mod p1 · p2 · · · pk)
for all integers a.
Exercise 2.5. Show that all integers n satisfy
(1) n^7 ≡ n (mod 42),
(2) n^13 ≡ n (mod 2730).
(Hint: Use the result from Exercise 2.4.)
Exercise 2.6. Find the least positive integer M , such that
M^49 ≡ 21 (mod 209).
Exercise 2.7. Show that if p is a prime and m is a positive integer,
then
a^((p−1)p^(m−1)) ≡ 1 (mod p^m)
for all integers a not divisible by p. (Hint: Copy the proof of Theorem 2.1 with F* equal to the set of all invertible elements in Z_(p^m).)
Exercise 2.8. Show that all odd integers k satisfy
(1) k^4 ≡ 1 (mod 16),
(2) k^(2^n) ≡ 1 (mod 2^(n+2)) where n ≥ 2.
Exercise 2.9. Find all integers x such that
x ≡ 1 (mod 3)
x ≡ 3 (mod 7)
x ≡ 7 (mod 16) .
Exercise 2.10. Find the least positive integer x satisfying
2x ≡ 9 (mod 11)
7x ≡ 2 (mod 19) .
Exercise 2.11. Verify that
95 ≡ 3 (mod 23)
95 ≡ 2 (mod 31)
and apply this to calculate 95^36 (mod 713) .

3. Vector Spaces
Definition 3.1. A vector space (or a linear space) over a field F is
a set V , containing an element denoted by 0, and for each pair u, v ∈ V
and each α ∈ F having a well-defined sum u + v ∈ V and a product
αu ∈ V such that the following rules are satisfied
(i) u + (v + w) = (u + v) + w
(ii) u+v =v+u
(iii) α(βu) = (αβ)u
(iv) 1u = u
(v) 0u = 0
(vi) α(u + v) = αu + αv
(vii) (α + β)u = αu + βu .
Remark 3.2. It follows from these rules that all the axioms for
addition, (A1)–(A4) from Section 1, are satisfied in a vector space.
From (iv) , (v) and (vii) we get
u + 0 = 1u + 0u = (1 + 0)u = 1u = u
so (A3) applies. The axiom (A4) can be verified as follows
u + (−1)u = 1u + (−1)u = (1 + (−1))u = 0u = 0 .
Remark 3.3. The elements of a vector space are often called vec-
tors. In (v) the zero on the left-hand side is the zero of the field F, while the zero on the right-hand side is the zero vector of V. In what follows, we will simply denote also the zero vector by 0.
The basic theory for vector spaces over a general field F is the same
as for the special case when F = R. A number of vectors u1 , . . . , ul in
V are said to be linearly dependent if there exist α1, . . . , αl ∈ F , not all zero, such that
α1 u1 + · · · + αl ul = 0 .
We say that u1 , . . . , ul are linearly independent if they are not linearly
dependent. The vectors u1 , . . . , ul generate the vector space V if every
vector u ∈ V is a linear combination of u1 , . . . , ul i.e. if
u = α1 u1 + · · · + αl ul
for some α1 , . . . , αl ∈ F . A basis for V is a collection of vectors
e1 , . . . , en which are linearly independent and generate V . This is equiv-
alent to the statement that every vector u ∈ V can, in a unique way,
be written as
u = α1 e1 + · · · + αn en ,
where α1 , . . . , αn ∈ F . The coefficients α1 , . . . , αn are called the coor-
dinates of the vector u in the basis e1 , . . . , en . Two different bases for a
given vector space always contain equally many elements and a vector
space is said to have the dimension n if it has a basis with n vectors. If
a vector space V is generated by a finite number of vectors v1 , . . . , vm ,
then we can always pick a basis from these. If the vectors v1 , . . . , vm
are linearly independent then they form a basis. Otherwise, one of
them, for example vm , is a linear combination of the others. Then V is
generated by v1 , . . . , vm−1 . In this way, we can continue until we obtain
a collection of linearly independent vectors which generate V .
Example 3.4. For a given field F the standard example of a vector
space over F is its n-fold product
F^n = {(α1, . . . , αn) ; αi ∈ F}
with addition and multiplication, by elements from F , in each compo-
nent. Every vector space V over F of dimension n can be identified
with F n by choosing a basis in V .
Example 3.5. Let f be a subfield of a larger field F . This means
that f is a subset of F and that f is itself a field with the same operations
as defined in F . For this to be the case, it is necessary that f contains
at least two elements, that the operations addition and multiplication
applied to two elements in f again give an element in f , and that −α and α^(−1) also belong to f for every α ≠ 0 in f . In this case, we can
think of F as a vector space over the subfield f . It follows from the
rules for F that the axioms (i)–(vii) for a vector space are satisfied.
It is clear, that if we view the finite field F as a vector space over f ,
then it is generated by a finite number of vectors. In other words there
exists a basis e1, . . . , en of elements in F such that every u ∈ F can, in a unique way, be written as
u = α1 e1 + · · · + αn en
with α1 , . . . , αn ∈ f . Here the dimension of F is n. If p is the number
of elements in the subfield f , then each coordinate αi can be chosen in
p different ways, so F has exactly p^n elements.
In connection with error-correcting codes, we will later deepen our
discussion on vector spaces over finite fields. Here we just show how
Example 3.5 can be used to see that the number of elements of a finite
field must be a power of a single prime.
Let F be a finite field and as usual denote the unity in F by 1.
Consider the sums
1 , 1 + 1 , 1 + 1 + 1 , . . . , m1 , . . .
where m1 means the sum of m copies of the unity. Since F is finite,
there exist integers r < s such that r1 = s1. If m = s−r, then m1 = 0.
The least positive integer p such that p1 = 0 is called the characteristic
of the field F . The characteristic p must be a prime, since if p were the
product of two integers p1 and p2 greater than 1 then
(p1 1) · (p2 1) = p1 = 0
and hence p1 1 = 0 or p2 1 = 0. This contradicts the fact that p is the
least positive integer with p1 = 0. Now set
f = {m1 ; m ∈ Z} = { 0 , 1 , 1 + 1 , . . . , (p − 1)1 } .
Then it is easily checked that f is a subfield of F and that the map
m 7→ m1 gives an isomorphism between Zp and f . Because f has p
elements, it follows from Example 3.5 that the field F has p^n elements
for some positive integer n. We can now formulate our result as the
following theorem.
Theorem 3.6. For every finite field F there exist a prime number
p and a positive integer n such that the number of elements in F is p^n .
The prime p is the characteristic of the field.
Remark 3.7. The notion of a characteristic can also be defined for
infinite fields, but here there are two cases. Either, there exists a least
positive integer p such that p1 = 0 which we then call the characteristic,
or the elements m1 are non-zero for all non-zero m. In the latter case
we say that the characteristic is 0. As examples we have Q, R and C
which all are fields of characteristic 0.

Exercises

Exercise 3.1. Let V be a vector space over a field F . A subset U of V is called a subspace of V if
u, v ∈ U ⇒ αu + βv ∈ U , for all α, β ∈ F.
Check that every subspace U of V is a vector space with the same
operations as in V .
Let F be the field Z3 and U be the subspace of F 4 generated by
the vectors (0, 1, 2, 1), (1, 0, 2, 2) and (1, 2, 0, 1). Find a basis for U and
determine its dimension.
Exercise 3.2. Let F be a field with characteristic p ≠ 0 .
(1) Show that pa = 0 for all a ∈ F .
(2) Show that
(a + b)^p = a^p + b^p
for all a, b ∈ F .
(Hint: Show first that for 0 < k < p the binomial coefficients (p choose k) are divisible by p.)
Exercise 3.3. (1) Show that for a field of characteristic p ≠ 0
(a1 + a2 + · · · + al)^p = a1^p + a2^p + · · · + al^p .
(2) Prove Fermat’s little theorem by choosing all ai = 1 in (1).

4. Polynomial Rings
According to Theorem 3.6, any finite field must have p^n elements,
where p is a prime number and n is some positive integer. So far,
we have only dealt with the fields Zp for which n = 1. To be able
to construct fields with n > 1, we need to discuss polynomials with
coefficients in finite fields.
A polynomial with coefficients in a field F is an expression of the
form
(1) f(x) = an x^n + an−1 x^(n−1) + · · · + a1 x + a0 ,
where ai ∈ F . Strictly speaking, a polynomial is just a finite sequence
a0 , a1 , . . . , an of elements in F and the letter x should be seen as a
formal symbol. The value f (α) of the polynomial f at α ∈ F is
an α^n + an−1 α^(n−1) + · · · + a1 α + a0 ∈ F.
Example 4.1. Consider the polynomials
f(x) = x^3 + 1 and g(x) = x^4 + x^2 + x + 1
with coefficients in Z2 (observe that we do not write out the terms with
coefficient 0). Despite the fact that the values of f and g are equal for
all α ∈ Z2 = {0, 1}, the polynomials should be considered as different.
If an ≠ 0 in equation (1), then we say that the polynomial f(x)
is of degree n and f (x) is said to be monic if an = 1. The set of
all polynomials with coefficients in a field F is denoted by F [x]. The
addition and multiplication of polynomials are defined as usual when
the coefficients lie in R or C. The division algorithm, the factor theorem
and the Euclidean algorithm can be proven, in the general case, in
exactly the same way as when F = R. The division algorithm tells us
that if f and g are polynomials such that deg f ≥ deg g, then there
exist polynomials q and r such that
f (x) = q(x)g(x) + r(x),
where either r(x) is the zero polynomial or deg r < deg g. If r is the zero
polynomial, then we say that g divides f and write g|f . The statement
of the factor theorem is that f (α) = 0 if and only if (x − α) divides
f (x). Finally, the Euclidean algorithm gives a method for finding a
greatest common divisor of two polynomials f and g. That h is a
greatest common divisor of f and g means that h divides both f and g,
furthermore that any other polynomial that divides both f and g must
divide h. The greatest common divisor is not uniquely determined,
but two different greatest common divisors h1 and h2 only differ by a
constant multiple. This follows from the fact that h1 divides h2 and
h2 divides h1 . This is only possible if h1 = ah2 for some a ∈ F . If
we demand that the greatest common divisor of f and g is a monic
polynomial, then it is uniquely determined and is denoted by (f, g).
Example 4.2. We will now illustrate the Euclidean algorithm by
calculating the greatest common divisor of the following polynomials
in Z3 [x]:
f(x) = x^5 + 2x^3 + x^2 + 2,   g(x) = x^4 + 2x^3 + 2x^2 + 2x + 1.
Observe that since the coefficients are in Z3 , we can apply identities
such as 4 ≡ 1 and 2 ≡ −1. (In what follows, we will leave out the
brackets around elements in Zn .)
x^5 + 2x^3 + x^2 + 2 = (x + 1)(x^4 + 2x^3 + 2x^2 + 2x + 1) + (x^3 + 1)
x^4 + 2x^3 + 2x^2 + 2x + 1 = (x + 2)(x^3 + 1) + (2x^2 + x + 2)
x^3 + 1 = (2x + 2)(2x^2 + x + 2).
The last non-vanishing remainder 2x^2 + x + 2 is a greatest common divisor of f and g. The corresponding monic polynomial is obtained by multiplying by 2^(−1) = 2. This gives (f, g) = x^2 + 2x + 1.
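The same computation is easily mechanized. In the Python sketch below (our own illustration) a polynomial in Z3[x] is stored as a list of coefficients, lowest degree first, and the Euclidean algorithm is run as above.

    P = 3  # we work in Z_3[x]

    def trim(f):
        while f and f[-1] % P == 0:
            f = f[:-1]
        return f

    def poly_mod(f, g):
        # remainder of f on division by g in Z_P[x]
        f, g = trim([c % P for c in f]), trim([c % P for c in g])
        inv_lead = pow(g[-1], -1, P)
        while f and len(f) >= len(g):
            shift = len(f) - len(g)
            factor = (f[-1] * inv_lead) % P
            for i, c in enumerate(g):
                f[i + shift] = (f[i + shift] - factor * c) % P
            f = trim(f)
        return f

    def poly_gcd(f, g):
        while g:
            f, g = g, poly_mod(f, g)
        return [(c * pow(f[-1], -1, P)) % P for c in f]   # monic gcd

    f = [2, 0, 1, 2, 0, 1]    # x^5 + 2x^3 + x^2 + 2
    g = [1, 2, 2, 2, 1]       # x^4 + 2x^3 + 2x^2 + 2x + 1
    print(poly_gcd(f, g))     # [1, 2, 1], i.e. (f, g) = x^2 + 2x + 1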
Definition 4.3. A polynomial s(x) in F [x] of degree n ≥ 1 is said
to be irreducible if it does not have a non-trivial divisor i.e. if there
does not exist a polynomial g(x), with 1 ≤ deg g < n, that divides
s(x). Irreducible polynomials are also called prime polynomials.
Example 4.4. The polynomial f(x) = x^3 + 2x + 1 is irreducible in Z3[x]. To check this, observe that if f(x) were reducible then at least one of its factors would be of degree 1. Then f(x) would necessarily have a zero in Z3, but this is not the case since f(0) = 1, f(1) = 1 and f(−1) = 1.
We will now prove that every monic polynomial in F [x] can be
written as a product of monic prime polynomials and that this product
is unique up to the order of its factors. For this we need the following
lemma.
Lemma 4.5. Assume that f , g and h are three polynomials in F [x]
such that f (x) divides the product g(x)h(x). If f and g are relatively
prime i.e. (f, g) = 1 then f divides h.
Proof. It follows from the Euclidean algorithm that since (f, g) =
1 there exist two polynomials c(x) and d(x) such that
1 = c(x)f (x) + d(x)g(x).
Hence
h(x) = c(x)f (x)h(x) + d(x)g(x)h(x).
Both terms on the right-hand side are divisible by f so f must divide
h. □
Theorem 4.6. Let F be a field and f (x) be a monic polynomial with
coefficients in F . Then there exist a number of different monic prime
polynomials s1 (x), . . . , sl (x) in F [x] and positive integers m1 , . . . , ml
such that
f(x) = s1(x)^m1 · · · sl(x)^ml .
The prime polynomials si and the integers mi are, up to order, uniquely
determined.
Proof. We prove by induction, over the degree of f , that f can
be written as a product of prime polynomials. When the degree of
f is 1 there is nothing to prove. Now assume that the degree of f
is n and that the statement is correct for any polynomial of lower degree. If f is a prime polynomial we are done. Otherwise, we can
write f(x) = g1(x)g2(x) for some polynomials g1 and g2, both of
degree less than n. According to the induction hypothesis these can be
written as a product of prime polynomials. This proves that f has a
prime factorization.
What is left to prove is the uniqueness. Assume that we have two
prime factorizations for f (x)
(2) s1(x)^m1 · · · sl(x)^ml = t1(x)^n1 · · · tj(x)^nj .
Let us first consider t1 (x). We shall show that t1 (x) is equal to one
of the factors si (x) on the left-hand side. Since s1 and t1 are monic
prime polynomials, we know that either s1 = t1 or s1 and t1 are rela-
tively prime. If s1 = t1 we are done. Otherwise s1(x)^m1 and t1(x) are relatively prime. According to Lemma 4.5, t1(x) must then divide the product
s2(x)^m2 · · · sl(x)^ml .
We can now continue the same procedure. Either t1 = s2 or else t1(x) divides the product
s3(x)^m3 · · · sl(x)^ml .
Sooner or later we end up with t1 (x) = si (x) for some i. We can then
divide both sides of equation (2) by t1 (x) and repeat the procedure
now for t2 (x). When we have, in this way, divided out all the factors
ti (x) on the right-hand side, all the factors si (x) on the left-hand side
must have disappeared. Otherwise a product of such factors would
be equal to 1, which is impossible. This proves the uniqueness of the
prime factorization. □
For a given field F the set F [x], equipped with the polynomial ad-
dition and the polynomial multiplication, forms a ring. As we have
seen above, there are great similarities between F [x] and the ring Z of
integers. For both Z and F [x] we have the division algorithm, the Eu-
clidean algorithm and furthermore a unique prime factorization. The
prime numbers in Z correspond to the prime polynomials in F [x]. We
shall now copy the construction of the rings Zn from Z to F [x]. Let
s(x) be a given non-zero polynomial with coefficients in F . Two poly-
nomials f (x) and g(x) in F [x] are said to be congruent modulo s(x)
if their difference f (x) − g(x) is divisible by s(x). For this we simply
write f ≡ g (mod s). Denote by [f (x)] the class of polynomials which
are congruent to f (x) modulo s(x). Then we define an addition and a
multiplication by
[f (x)] + [g(x)] = [f (x) + g(x)] and [f (x)] · [g(x)] = [f (x)g(x)].
In the same way as for the integers, one can check that these definitions
are independent of the choice of the representatives for the congruence
classes. Denote by
F [x]/(s(x))
the set of congruence classes modulo s(x). It is easily checked that
F [x]/(s(x)), equipped with this addition and this multiplication, is a
commutative ring.
Example 4.7. For the ring Z5[x]/(x^3 + 1) we have
[x^2 + 2x + 1] · [x^2 + x + 2] = [x^4 + 3x^3 + 5x^2 + 5x + 2]
= [x^4 + 3x^3 + 2] = [(x + 3)(x^3 + 1 − 1) + 2]
= [(x + 3)(−1) + 2] = [−x − 1] = [4x + 4].
Observe that x^3 can always be substituted by −1, since we are calculating modulo x^3 + 1.
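Reduction modulo x^3 + 1 is also easy to automate. The Python sketch below (our own; coefficients are listed lowest degree first) multiplies two classes in Z5[x]/(x^3 + 1) by replacing every x^3 by −1.

    P = 5  # coefficients in Z_5

    def mul_mod(f, g):
        # multiply two polynomials of degree < 3 and reduce modulo x^3 + 1
        prod = [0] * 5
        for i, a in enumerate(f):
            for j, b in enumerate(g):
                prod[i + j] = (prod[i + j] + a * b) % P
        result = prod[:3]
        for k in range(3, len(prod)):          # x^k = x^(k-3) * x^3 ≡ -x^(k-3)
            result[k - 3] = (result[k - 3] - prod[k]) % P
        return result

    print(mul_mod([1, 2, 1], [2, 1, 1]))   # [4, 4, 0], i.e. [4x + 4]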
In analogy with the rings Zn one can show that F [x]/(s(x)) is a
field if and only if s(x) is a prime polynomial. If s(x) is not a prime
polynomial, then s(x) = s1 (x)s2 (x) for some polynomials s1 and s2
of positive degree. Then [s1 (x)][s2 (x)] = 0, so F [x]/(s(x)) has zero
divisors and hence is not a field. If s(x) is a prime polynomial, then
(f, s) = 1 for every non-zero polynomial f(x) of degree less than that of s.
By the Euclidean algorithm there exist polynomials c(x) and d(x) such
that
1 = c(x)f (x) + d(x)s(x).
This implies that [1] = [c(x)][f (x)], so [c(x)] is the inverse of [f (x)]. Ac-
cording to the division algorithm, every congruence class in F [x]/(s(x))
is represented by a polynomial of degree less than the degree of s(x). This means
that every non-zero element has an inverse, so F [x]/(s(x)) is a field.
Example 4.8. The polynomial x^2 + 1 is irreducible in the ring R[x] of polynomials with real coefficients. This means that
R[x]/(x^2 + 1)
is a field. Every congruence class is represented by a polynomial of degree at most one and if we apply [x^2 + 1] = 0, then we easily get
[a + bx][c + dx] = [(ac − bd) + (ad + bc)x].
With this we easily see that R[x]/(x^2 + 1) is isomorphic to the field C
of complex numbers.

Exercises
Exercise 4.1. Let f(x) be the polynomial x^214 + 3x^152 + 2x^47 + 2 in Z5[x]. Find the value f(3) in Z5 .
Exercise 4.2. Show that if f (x) is a polynomial of degree n with
coefficients in a field F , then f has at most n zeros in F .
Exercise 4.3. Determine the greatest common divisor (f, g) of the
following polynomials in Z2 [x]:
(1) f(x) = x^7 + 1 , g(x) = x^5 + x^3 + x + 1.
(2) f(x) = x^5 + x + 1 , g(x) = x^6 + x^5 + x^4 + x + 1.
Exercise 4.4. Find the greatest common divisor h = (f, g) of the
polynomials f(x) = x^17 + 1 and g(x) = x^7 + 1 in Z2[x] and determine
two polynomials c(x) and d(x) such that
h(x) = c(x)f (x) + d(x)g(x).
Exercise 4.5. Show that there exists only one irreducible poly-
nomial in Z2 [x] of degree two. Determine whether the polynomial
x^5 + x^4 + 1 in Z2[x] is irreducible or not.
Exercise 4.6. Determine all monic irreducible polynomials in Z3 [x]
of degree 2.
Exercise 4.7. Find in Z3 [x] the prime factorization for the follow-
ing polynomials:
(1) x^5 + x^4 + x^3 + x − 1
(2) x^4 + 2x^2 + 2x + 2
(3) x^4 + 1
(4) x^8 + 2.
Exercise 4.8. How many zero divisors do there exist in the ring
Z5[x]/(x^3 + 1)?
Exercise 4.9. (1) Let F be a finite field. Show that the product
of all non-zero elements in F is equal to −1. (Hint: Apply Theorem
2.1 and the relationship between zeros and coefficients.)
(2) Show that for every prime number p we have
(p − 1)! ≡ −1 (mod p).
(Compare this result with Exercise 1.15.)
Exercise 4.10. Let F be a field with q elements, where q = 2m + 1 is odd. Show that x ∈ F is the square of some non-zero element in F if and only if x^m = 1. (Hint: Show first that a^2 = b^2 implies that a = b or a = −b and then use Exercise 4.2.)
Exercise 4.11. Show that for a field with an even number of ele-
ments, every element is the square of one and only one element.

5. Finite Fields
Example 5.1. We shall here determine all irreducible polynomi-
als in Z2 [x] of degree less than or equal to 4. There exist only two
polynomials of degree 1, namely
x and x + 1.
These are trivially irreducible. A polynomial of degree 2 or 3 is irre-
ducible if and only if it has no zeros in Z2 . It is easily checked that
such a polynomial has no zeros exactly when it has an odd number
of terms and the constant term is 1. This shows that the irreducible
polynomials of degree 2 and 3 are exactly the following:
x^2 + x + 1
x^3 + x^2 + 1 and x^3 + x + 1.
If a polynomial of degree 4 is irreducible, then necessarily it does not
have a factor of degree 1, i.e. it does not have a zero in Z2 , and it is not
a product of two irreducible factors of degree 2. The second condition
only excludes (x^2 + x + 1)^2 = x^4 + x^2 + 1, since there only exists one
prime polynomial of degree 2. The other polynomials in Z2 of degree
4 that do not have a zero are
x^4 + x^3 + 1 , x^4 + x + 1 and x^4 + x^3 + x^2 + x + 1.
These are all the prime polynomials in Z2 [x] of degree 4.
If s(x) is any of the irreducible polynomials of degree 4 mentioned
above, then Z2[x]/(s(x)) is a field with 2^4 = 16 elements. This follows
from the fact that every congruence class is represented by a unique
polynomial of degree 3 and for this each coefficient can be chosen in
exactly two ways, namely as 0 or 1. Any irreducible polynomial of
degree 2 or 3 induces a field with 2^2 = 4 or 2^3 = 8 elements, respectively.
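The list above can be checked by brute force. The following Python sketch (our own illustration) marks every monic polynomial over Z2 that is a product of two monic factors of positive degree and prints the remaining, irreducible ones; a polynomial is stored as a tuple of coefficients, lowest degree first.

    from itertools import product

    def polys(deg):
        # all monic polynomials of the given degree over Z_2
        return [c + (1,) for c in product((0, 1), repeat=deg)]

    def mult(f, g):
        h = [0] * (len(f) + len(g) - 1)
        for i, a in enumerate(f):
            for j, b in enumerate(g):
                h[i + j] ^= a & b        # arithmetic modulo 2
        return tuple(h)

    for n in (2, 3, 4):
        reducible = {mult(f, g)
                     for d in range(1, n // 2 + 1)
                     for f in polys(d) for g in polys(n - d)}
        print(n, [f for f in polys(n) if f not in reducible])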
In the next section, we will show that for every prime number p and
every positive integer n there exists an irreducible polynomial in Zp [x]
of degree n. As a direct consequence of this, there exists for each such
p and n a field with p^n elements. We shall also show that any two finite
fields with the same number of elements are isomorphic. This means
that up to isomorphism there exists, for each prime p and each positive
integer n, exactly one finite field with p^n elements. These fields are denoted by GF(p^n) and called the Galois field of order p^n in honour of
the French mathematician Évariste Galois (1811-1832). In this section
we shall give examples of how to do calculations in finite fields.
Example 5.2. In order to find the multiplicative inverse of [x^2 + 1] in the field Z2[x]/(x^3 + x^2 + 1) we apply the Euclidean algorithm:
x^3 + x^2 + 1 = (x + 1)(x^2 + 1) + x
x^2 + 1 = x · x + 1.
This leads to (observe that + = − in Z2)
1 = (x^2 + 1) + x · x = (x^2 + 1) + x((x^3 + x^2 + 1) + (x + 1)(x^2 + 1))
= (x^2 + x + 1)(x^2 + 1) + x(x^3 + x^2 + 1).
We end up with [x^2 + 1]^(−1) = [x^2 + x + 1].
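The back-substitution is again an instance of the extended Euclidean algorithm. In the Python sketch below (our own) a polynomial over Z2 is stored as a bitmask, bit k being the coefficient of x^k.

    def deg(f):
        return f.bit_length() - 1      # degree of a non-zero polynomial

    def pmul(f, g):                    # multiplication in Z_2[x]
        h = 0
        while g:
            if g & 1:
                h ^= f
            f <<= 1
            g >>= 1
        return h

    def pdivmod(f, g):                 # division with remainder in Z_2[x]
        q = 0
        while f and deg(f) >= deg(g):
            shift = deg(f) - deg(g)
            q ^= 1 << shift
            f ^= g << shift
        return q, f

    def pinv(f, s):                    # inverse of [f] in Z_2[x]/(s(x))
        r0, r1, u0, u1 = s, f, 0, 1
        while r1:
            q, r = pdivmod(r0, r1)
            r0, r1 = r1, r
            u0, u1 = u1, u0 ^ pmul(q, u1)
        return u0                      # valid when s is irreducible

    print(bin(pinv(0b101, 0b1101)))    # 0b111, i.e. [x^2 + 1]^(-1) = [x^2 + x + 1]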
We will now turn our attention to calculations concerning powers.
If a is a non-zero element of a finite field F then some power of it must be 1. We know for example from Theorem 2.1 that a^(q−1) = 1, where q
is the number of elements in F .
Definition 5.3. The order of a non-zero element a in a finite field
is the least positive integer m such that a^m = 1. We denote the order
of a by o(a).
Example 5.4. Here we determine the order of [10] in the field Z73 :
10^2 = 100 ≡ 27
10^3 ≡ 270 ≡ −22
10^4 ≡ −220 ≡ −1.
This implies that 10^5 ≡ −10, 10^6 ≡ −27, 10^7 ≡ 22 and 10^8 ≡ 1. The
order of [10] is therefore 8.
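Orders like this are quickly found by trying the successive powers; a small Python sketch of our own:

    def order(a, p):
        # least positive m with a^m ≡ 1 (mod p), for a not divisible by the prime p
        m, power = 1, a % p
        while power != 1:
            m, power = m + 1, (power * a) % p
        return m

    print(order(10, 73))   # 8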
According to Fermat’s little theorem, we know that for any non-zero
element a in the field Z73 we have a^72 = 1. The following result shows
that it is not a coincidence that the order 8 in Example 5.4 divides 72.
Lemma 5.5. Let a be a non-zero element in a finite field. If a^n = 1 for some positive integer n, then the order of a divides n.
Proof. Assume the contrary. If m is the order of a, then there exist integers q and r with 0 < r < m, such that
n = qm + r.
From this it follows that
1 = a^n = (a^m)^q · a^r = a^r .
This contradicts the fact that m = o(a), since 0 < r < m. □
The next result gives us a method for constructing elements of high order.
Lemma 5.6. Assume that the elements a1 and a2 in a finite field
have the orders m1 and m2 , respectively, and that m1 and m2 are rel-
atively prime. Then a = a1 a2 has the order m1 m2 .
Proof. Assume that a^k = 1. Then we have
1 = a^(k m1) = a1^(k m1) · a2^(k m1) = a2^(k m1) .
According to Lemma 5.5, m2 must divide km1. Since (m1, m2) = 1 the number m2 must divide k. Using a similar argument, we see that m1 divides k. This means that k is divisible by m1 m2, since m1 and m2 are relatively prime. The order of a is therefore at least m1 m2. That it is exactly m1 m2 follows from
a^(m1 m2) = (a1^m1)^m2 · (a2^m2)^m1 = 1. □
Example 5.7. In the field Z73 we have
8^2 = 64 ≡ −9
8^3 ≡ −72 ≡ 1
so the order of [8] is 3. According to Example 5.4 and Lemma 5.6 the
order of [80] = [7] is 8 · 3 = 24.
Before we can formulate the main result of this section we need the
following lemma.
Lemma 5.8. Let a and b be elements of a finite field F of order m and n, respectively, and assume that m does not divide n. Then there exists an element in F of order greater than n.
Proof. If m does not divide n, then there exists a prime power p^k that divides m but not n. Then m = m′p^k and n = n′p^l, where 0 ≤ l < k and n′ is not divisible by p. According to Lemma 5.6, this means that (p^k, n′) = 1 and the order of a^(m′) · b^(p^l) is p^k n′ > n. □
Theorem 5.9. If F is a finite field with q elements, then there
always exists an element in F of order q − 1.
Proof. Let b be a non-zero element in F such that the order of
b is larger than or equal to the order of any other element of F . Set
n = o(b). According to Lemma 5.8 the order of any element in F must
divide n, since otherwise there would exist an element of order greater
than n. This means that any non-zero element of F must satisfy the
equation
x^n = 1.
The polynomial x^n − 1 has therefore q − 1 different zeros. By the factor theorem we therefore have n ≥ q − 1. On the other hand Theorem 2.1 tells us that the order never can be greater than q − 1. Hence n = q − 1, and we have proven the result. □
Definition 5.10. Let F be a field with q elements. An element of
order q − 1 in F is said to be a primitive element.
Example 5.11. We shall show that [3] is a primitive element for
Z101. Since the order of [3] must divide 100 = 2^2 · 5^2, it is enough to check the powers 2, 4, 5, 10, 20, 25 and 50:
3^2 = 9
3^4 = 81 ≡ −20
3^5 ≡ −60
3^10 ≡ 3600 ≡ −36
3^20 ≡ 1296 ≡ −17
3^25 ≡ 1020 ≡ 10
3^50 ≡ 100 ≡ −1
The least positive integer m for which 3^m ≡ 1 is therefore 100.
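In fact it suffices to check the two maximal proper divisors 50 = 100/2 and 20 = 100/5, since every proper divisor of 100 divides one of them. A two-line Python check (our own):

    print(pow(3, 50, 101))   # 100, i.e. -1
    print(pow(3, 20, 101))   # 84, i.e. -17; neither power is 1, so [3] has order 100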
For a primitive element a in a field F with q elements the powers
a^0, a^1, a^2, . . . , a^(q−2)
are all different. Otherwise we would have a^j = a^k for some integers j < k between 0 and q − 2. Then a^(k−j) = 1, which contradicts the fact that the order of a is q − 1. For every non-zero b in F there exists a uniquely determined j with 0 ≤ j ≤ q − 2 such that b = a^j. We call j the index of b and write j = ind(b). The index is also called the discrete logarithm of b with respect to the primitive element a. The index can be used to simplify calculations of products and quotients in finite fields. If the field has q elements then we have
ind(b1 · b2) ≡ ind(b1) + ind(b2) (mod q − 1)
ind(b1 · b2^(−1)) ≡ ind(b1) − ind(b2) (mod q − 1) .
Example 5.12. We have seen in Example 5.1 that the polynomial
x^4 + x^3 + 1 is irreducible in Z2[x]. The field
F = Z2[x]/(x^4 + x^3 + 1)
has 2^4 = 16 elements. Each element in F can be described with a string of four binary digits given by the coefficients of the polynomial of degree at most 3 representing the congruence class. As an example, the string 1011 denotes the class [x^3 + x + 1]. The class [x] is a primitive element in F and this induces a table containing each element in F*:

index    0     1     2     3     4     5     6     7
element  0001  0010  0100  1000  1001  1011  1111  0111

index    8     9     10    11    12    13    14
element  1110  0101  1010  1101  0011  0110  1100

As an example, the calculation of the element of index 5 goes as follows
[x^5] = [x · x^4] = [x · (x^3 + 1)] = [x^4 + x]
= [(x^3 + 1) + x] = [x^3 + x + 1] .
We illustrate how the table can be used by calculating
(1111) · (1101)^(−1) .
The index for this element is
6 − 11 = −5 ≡ 10 (mod 15)
Hence
(1111) · (1101)^(−1) = (1010).
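The whole table is generated by repeatedly multiplying by [x] and replacing x^4 by x^3 + 1, exactly as in the calculation above. A small Python sketch of our own, with the class stored as a four-bit integer (bit k is the coefficient of x^k):

    element = 0b0001                  # the class [1], of index 0
    for index in range(15):
        print(index, format(element, '04b'))
        element <<= 1                 # multiply by [x]
        if element & 0b10000:         # if x^4 appears ...
            element ^= 0b11001        # ... remove x^4 and add x^3 + 1 (mod 2)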

Exercises

Exercise 5.1. Determine all irreducible polynomials of degree 5 in


Z2 [x].
Exercise 5.2. Prove that Z3[x]/(x^3 + x^2 + 2) is a field with 27 elements and determine the multiplicative inverse to [x + 2].
Exercise 5.3. Prove that Z11[x]/(x^2 + x + 4) is a field and determine the multiplicative inverse to [3x + 2]. How many elements does the field have?
Exercise 5.4. (1) Determine the order of the elements [3] and [4]
in Z37 . (2) Determine a primitive element in Z37 .
Exercise 5.5. Determine a primitive element in Z73 .
Exercise 5.6. (1) Show that L = Z2[x]/(x^3 + x + 1) is a field. (2) Show that [x] is a primitive element and calculate, as in Example 5.12, an index table for L. (3) Calculate [x^2 + 1] · [x^2 + x + 1]^(−1).
Exercise 5.7. Use the table in Example 5.12 to calculate the following
(1) (1001) · ((1011)^2 + (0011)^(−2)),
(2) ((1010)^2 + (0101)^3) · ((0001) + (1101)^2)^(−1).

6. The Existence and Uniqueness of GF(p^n)


To show that there exists a field with p^n elements we shall here prove that for each prime p and every positive integer n there exists an irreducible polynomial of degree n in Zp[x]. We start by noticing that the total number of monic polynomials
f(x) = x^n + an−1 x^(n−1) + · · · + a1 x + a0
with coefficients in Zp is equal to p^n. According to Theorem 4.6, every such polynomial can, in a unique way, up to the order of the factors, be written as a product

(3) f(x) = s1(x)^m1 · · · sl(x)^ml ,

where s1(x), . . . , sl(x) are monic prime polynomials in Zp[x]. If di is the degree of si(x) then
(4) n = m1 d1 + · · · + ml dl .
The number of monic polynomials of degree n in Zp[x] is equal to the number of ways of writing, as in (3), a monic polynomial of degree n as a product of prime polynomials. If Id denotes the number of monic prime polynomials of degree d, then according to (4), the total number of monic polynomials of degree n in Zp[x] is equal to the coefficient of t^n in the product
(1 + t + t^2 + · · · )^I1 (1 + t^2 + t^4 + · · · )^I2 (1 + t^3 + t^6 + · · · )^I3 · · · .
Since we know that this number is equal to p^n, we have
∏_d ( 1/(1 − t^d) )^Id = 1/(1 − pt) .
By taking logarithms on each side we obtain
Σ_d −Id ln(1 − t^d) = − ln(1 − pt)
and by Taylor expanding on both sides we get
I1 (t + t^2/2 + t^3/3 + · · · ) + I2 (t^2 + t^4/2 + t^6/3 + · · · ) + I3 (t^3 + t^6/2 + t^9/3 + · · · ) + · · ·
= pt + p^2 t^2/2 + p^3 t^3/3 + · · · .
Comparing the coefficients of t^n on each side gives
Σ_{d|n} Id · (d/n) = p^n/n .

Observe that on the left-hand side we only have terms where d divides
n. Multiplying by n gives the following result:
Theorem 6.1. If Id is the number of monic irreducible polynomials
of degree d in Zp [x], then
Σ_{d|n} d Id = p^n .

Example 6.2. If p = 2 and n = 6 then we obtain


I1 + 2I2 + 3I3 + 6I6 = 2^6 = 64.
According to Example 5.1 we have I1 = 2, I2 = 1 and I3 = 2, so I6 = 9.
By applying Theorem 6.1 repeatedly we can, in this way, determine
the numbers Id . But to do this in one go, we will make use of the
Möbius inversion formula proven in the next section.
The Möbius function µ(n) is defined for positive integers n and takes only the three values 0, 1 and −1. It is given by
µ(n) = 1 if n = 1,
µ(n) = (−1)^k if n is the product of k different primes,
µ(n) = 0 otherwise.
If we apply the Möbius inversion formula to the equation in Theorem 6.1 then we get
n In = Σ_{d|n} µ(d) p^(n/d) .
The right-hand side contains a lowest power of p. If the lowest power is p^m, then
n In / p^m = ±1 + (a number of p-powers with coefficients ±1).
Hence
n In / p^m ≡ ±1 (mod p)
and in particular n In ≠ 0.
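The formula is easy to evaluate by machine. The Python sketch below (our own) computes In for p = 2 and reproduces the values I1 = 2, I2 = 1, I3 = 2, I4 = 3 from Example 5.1 and I6 = 9 from Example 6.2.

    def mobius(n):
        # 0 if n has a repeated prime factor, otherwise (-1)^(number of prime factors)
        result, d = 1, 2
        while d * d <= n:
            if n % d == 0:
                n //= d
                if n % d == 0:
                    return 0
                result = -result
            d += 1
        return -result if n > 1 else result

    def num_irreducible(p, n):
        # n * I_n = sum of mu(d) * p^(n/d) over the divisors d of n
        return sum(mobius(d) * p ** (n // d) for d in range(1, n + 1) if n % d == 0) // n

    print([num_irreducible(2, n) for n in range(1, 7)])   # [2, 1, 2, 3, 6, 9]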

Theorem 6.3. For each prime number p and each positive integer
n there exists an irreducible polynomial of degree n in Zp [x].
It is a direct consequence of Theorem 6.3 that there exists a field
with p^n elements. We shall now focus our attention on proving that,
up to isomorphisms, there exists only one such field.
Let F be an arbitrary finite field of characteristic p. Then F con-
tains the subfield
f = { 0 , 1 , . . . , (p − 1)1 }
which is isomorphic to Zp . If m1 ∈ f and β ∈ F , then (m1) · β = mβ .
We can therefore consider F as a vector space over Zp . Since F is
finite, this vector space is finite dimensional. This implies that for
every α ∈ F there exists a positive integer d such that the powers
α^0, α^1, α^2, . . . , α^d
are linearly dependent, i.e. there exist a0 , a1 , . . . , ad ∈ Zp not all zero
such that
a0 1 + a1 α + a2 α^2 + · · · + ad α^d = 0 .
Let d be the smallest such integer and set s(x) = a0 + a1 x + · · · + ad x^d .
Then s(x) has the lowest degree amongst the non-trivial polynomials in
Zp [x] having α as a zero. We can always choose ad = 1, and then s(x)
is uniquely determined and called the minimal polynomial of α. The
minimal polynomial is irreducible in Zp [x] because if s(x) was a product
s1 (x)s2 (x) of factors of lower degree than d, then s1 or s2 would have
α as zero and this would contradict the fact that s(x) is the minimal
polynomial of α.
Theorem 6.4. Let F be a finite field of characteristic p and let α be an element of F . If L is the smallest subfield of F containing α and if s(x) is the minimal polynomial of α, then L is isomorphic to the field
Zp [x]/(s(x)).
Proof. Set
L = {f (α) ; f ∈ Zp [x]}.
Every subfield of F containing α must include L, since such a field
contains all powers of α and all linear combinations of such powers.
We shall show that L is isomorphic to the field Zp [x]/(s(x)). It follows
from this that L itself is a field and hence the smallest subfield of F
containing α. Consider the map
Zp[x]/(s(x)) ∋ [f(x)] ↦ f(α) ∈ L.
It is well-defined since if f and g belong to the same congruence class i.e. if f(x) = g(x) + h(x)s(x) for some polynomial h, then
f (α) = g(α) + h(α)s(α) = g(α) .
It immediately follows from the definition that [f (x)]+[g(x)] is mapped
to f (α) + g(α) and [f (x)] · [g(x)] to f (α)g(α). It remains to show that
the map is bijective. It is clear that it is surjective. To show that it
is injective, we first observe that if the minimal polynomial s(x) has
degree d, then it is enough to consider polynomials f (x) of degree less
than d. Every congruence class in Zp [x]/(s(x)) is represented by such
a polynomial. Assume that f (α) = g(α) for two different polynomials
of degree less than d. Then α is a zero of f − g , which contradicts
the fact that s(x) is the minimal polynomial of α. This shows that the
map is injective and the statement is proven. □
Corollary 6.5. Let F be a field with p^n elements and let s(x) be a
monic prime polynomial in Zp [x] with zero α in F . Then s(x) is the
minimal polynomial of α and the degree of s divides n.
Proof. The element α is a zero of both s(x) and its minimal poly-
nomial t(x). Hence α is a zero to the greatest common divisor (s, t).
Since s and t are irreducible, we must have s = (s, t) = t. If s(x) has
the degree d and L is the smallest subfield containing α, then Theorem
6.4 tells us that L has p^d elements. Because F can be seen as a vector space over L, we have
|F| = |L|^m
for some positive integer m, where |F| and |L| denote the number of elements in F and L, respectively. This means that
p^n = p^(dm)
and from this it follows that d divides n. □
We now have all the tools needed to prove that two finite fields
with the same number of elements must be isomorphic. Let F be an
arbitrary field with q = p^n elements. According to Theorem 2.1 every element in F is a zero of the polynomial x^q − x. We have multiplied the equation in the theorem by x to include x = 0. According to Theorem 4.6, x^q − x can be written as a product of prime polynomials in Zp[x]:

(5) x^q − x = ∏_i si(x).

Here the sum of the degrees of the polynomials si is equal to q. Since x^q − x has q different zeros in F, the prime polynomials on the right-hand side must all be different and for each polynomial si its degree
must be equal to the number of its different zeros in F. The above corollary shows that the degree of each polynomial si divides n.
Let us now consider the formula in Theorem 6.1. It shows that the sum of the degrees of all prime polynomials in Zp[x] whose degree divides n is equal to p^n. This means that the product on the right-hand side of equation (5) must contain every prime polynomial whose degree divides n. In particular, according to Theorem 6.3, the right-hand side of (5) must contain a prime polynomial of degree n. This is the minimal polynomial
of each of its n zeros in F . Let α be such a zero. Then it follows from
Theorem 6.4 that the smallest subfield of F containing α is isomorphic
to the field Zp [x]/(s(x)) and consequently contains pn elements. The
field F is therefore isomorphic to Zp [x]/(s(x)) . We have hereby proven
the following result.
Theorem 6.6. Let s(x) be a prime polynomial of degree n in Zp [x].
Then every field with p^n elements is isomorphic to Zp[x]/(s(x)) .
Remark 6.7. In particular, we have shown that if s1 and s2 are
two different prime polynomials of degree n in Zp [x] then the fields
Zp [x]/(s1 (x)) and Zp [x]/(s2 (x)) are isomorphic.

7. The Möbius Inversion Formula


Let us first remember the fact that the Möbius function µ(n) is
defined for positive integers n, as 0 if n has multiple prime factors and
as (−1)k if n is the product of k different primes. As a special case we
have µ(1) = 1 .
Lemma 7.1.
Σ_{d|n} µ(d) = 1 for n = 1,   and   Σ_{d|n} µ(d) = 0 for n > 1 .

Proof. When n = 1 the sum is equal to µ(1) = 1. If n > 1 and
n = p1^m1 · · · pr^mr
is the prime factorization of n, we set n* = p1 · · · pr. Then
Σ_{d|n} µ(d) = Σ_{d|n*} µ(d)
= 1 − r + · · · + (−1)^k (r choose k) + · · · + (−1)^r (r choose r)
= (1 − 1)^r
= 0.
The binomial coefficient (r choose k) tells us how many different numbers d are products of k prime factors chosen amongst p1, . . . , pr. □
Theorem 7.2 (Möbius inversion formula). Let f(n) and g(n) be defined for positive integers n and assume that
f(n) = Σ_{d|n} g(d)
for all n. Then
g(n) = Σ_{d|n} µ(d) f(n/d) .
Proof. It follows from
f(n/d) = Σ_{d′|(n/d)} g(d′),
that
Σ_{d|n} µ(d) f(n/d) = Σ_{d|n} µ(d) Σ_{d′|(n/d)} g(d′) = Σ_{d′|n} g(d′) Σ_{d|(n/d′)} µ(d) = g(n).
For the last equality we have used Lemma 7.1, which gives
Σ_{d|(n/d′)} µ(d) = 1 if d′ = n, and 0 if d′ < n. □
CHAPTER 2

Error-Correcting Codes

1. Introduction
When transferring or storing information there is always a risk of
errors occurring in the process. To increase the possibility of detecting
and possibly correcting such errors, one can add a certain redundancy to the text carrying the information, for example in the form of control
digits. We shall now give two simple examples.

Example 1.1. Assume that a sender transmits a text which is divided
into a number of six digit binary words. Each such word consists of six
digits, each of which is either 0 or 1. To increase the possibility for a
receiver to detect errors that might have occurred during the transfer,
the sender can add a seventh binary digit to each word in such a way
that every seven digit word contains an even number of ones. If the
receiver registers a word with an odd number of ones, then he will know
that an error has occurred and can possibly ask the sender to repeat the
message.

Example 1.2. If the receiver in Example 1.1 does not have the
opportunity to ask for a repetition, the sender can proceed in a different
way. Instead of adding the seventh digit he can send every six digit
word three times in a row. If the three words are not identical when
they reach the receiver, he will know that an error has occurred and
could try to correct it by choosing, at each place, the digit that occurs
at that place in at least two of the three received words. He
can of course not be completely sure that the erroneous word has been
corrected, but if the probability for more than one error to occur is
low, then the chances are good.
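Both examples can be simulated with a few lines of code. The sketch
below (Python; all names are our own) appends the parity digit of Example
1.1 and decodes the threefold repetition of Example 1.2 by a majority vote
in each position.

```python
def add_parity(word):
    # Example 1.1: append a seventh digit so that the number of ones is even.
    return word + [sum(word) % 2]

def repeat_encode(word):
    # Example 1.2: send the word three times in a row.
    return word * 3

def majority_decode(received, length):
    # Choose, at each place, the digit occurring there in at least two copies.
    copies = [received[i * length:(i + 1) * length] for i in range(3)]
    return [1 if copies[0][j] + copies[1][j] + copies[2][j] >= 2 else 0
            for j in range(length)]

word = [1, 0, 1, 1, 0, 0]
print(add_parity(word))           # [1, 0, 1, 1, 0, 0, 1]
sent = repeat_encode(word)
sent[4] = 1 - sent[4]             # introduce a single transmission error
print(majority_decode(sent, 6))   # recovers [1, 0, 1, 1, 0, 0]
```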

One disadvantage of the method in Example 1.2 is that, compared
with the original text, the message with the error-correcting mechanism
takes three times as long to send. Hence it seems a worthwhile exercise
to find more effective methods and this is the purpose of the theory of
error-correcting codes. This was started off by the work of Shannon,
Golay and Hamming at the end of the 1940s and has since evolved
rapidly using ever more sophisticated mathematical methods. Here
the theory of finite fields plays a particularly important role.

For writing a text we must have an alphabet. This is a finite set F
of symbols called letters. As is common in coding theory, we assume
that F is a finite field. When F = Z_2, as in the above examples, the
code is said to be binary. A word is a finite sequence x_1 x_2 . . . x_m of
letters. We shall here only deal with so-called block codes. This means
that the words are all of the same length m and can therefore be seen
as elements in the vector space F^m. When appropriate, we write the
words as vectors x = (x_1, . . . , x_m) in F^m. A coding function E is an
injective map

    E : F^m → F^n

from F^m into a vector space F^n of higher dimension, i.e. m < n. The
image C = E(F^m) is what we call a code. To improve the possibility
of detecting and correcting errors, it is useful that the elements of
the code C lie far apart from each other in F^n. This is to minimize the
probability that a sent code word is received erroneously as a different
code word.
Definition 1.3. The Hamming distance d(x, y) between two vectors
x = (x_1, . . . , x_n) and y = (y_1, . . . , y_n) in F^n is defined as the
number of coordinates i where x_i ≠ y_i.
Example 1.4. In the space Z_2^5 the Hamming distance satisfies
d(10111, 11001) = 3 and in Z_3^4 we have d(1122, 1220) = 2.
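The Hamming distance is a one-line computation; the following Python
sketch (the helper name is ours) reproduces the two distances of Example 1.4.

```python
def hamming_distance(x, y):
    # The number of coordinates where the two words differ.
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

print(hamming_distance("10111", "11001"))   # 3, as in Z_2^5
print(hamming_distance("1122", "1220"))     # 2, as in Z_3^4
```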
Remark 1.5. If it is equally likely that an erroneously received
letter is any other letter from the alphabet, then the Hamming distance
is a natural measure for how big the error is. In some situations, other
measures are more appropriate, but here we will only deal with the
Hamming distance.
Definition 1.6. Let C be a code in F n . Then we define its sep-
aration d(C) as the least distance between two different words in the
code i.e.
d(C) = min{ d(x, y) ; x, y ∈ C, x ≠ y }.
Theorem 1.7. Let C be a code with separation d(C).
(1) If d(C) ≥ k + 1 then C can detect up to k errors in each word.
(2) If d(C) ≥ 2k + 1 then C can correct up to k errors in each
word.
Remark 1.8. The consequence of (2) is that if d(C) ≥ 2k + 1
then, for each word containing at most k errors, there exists a uniquely
determined closest code word. We assume that the erroneous word is
corrected by picking instead the closest word in the code. For prac-
tical purposes, it is of great importance to find effective algorithms
correcting errors and the existence of such algorithms can be a strong
argument for the choice of a particular code. In the following we will
focus on how to construct codes with high separation and not on error-
correcting algorithms.
Proof of Theorem 1.7. (1) If d(C) ≥ k + 1, then any two code
words differ in at least k + 1 places. A received word with at
most k letters wrong cannot be a code word and is therefore detected
as erroneous.
To prove (2) we assume that x is a received word differing from
a code word y in at most k places. If z were another code word with
this property then the triangle inequality gives d(y, z) ≤ d(y, x) +
d(x, z) ≤ 2k. This contradicts the assumption that d(C) ≥ 2k + 1.
This means that we can correct x to y. □
If we are interested in constructing a code C = E(F^m) in F^n with
a given separation σ = d(C), then there is a natural limit on how large
m can be chosen. We shall now give a theoretical estimate of the largest
possible value of m.
Definition 1.9. For every non-negative integer r we define the
sphere S(x, r), with centre x ∈ F n and radius r, by
S(x, r) = {y ∈ F n ; d(x, y) ≤ r}.
Lemma 1.10. If F has q elements then the sphere S(x, r) contains
exactly

    \binom{n}{0} + \binom{n}{1}(q − 1) + \binom{n}{2}(q − 1)^2 + \cdots + \binom{n}{r}(q − 1)^r

words.

Proof. The result follows from the fact that if 0 ≤ j ≤ r, then
there exist \binom{n}{j}(q − 1)^j words which have exactly j coordinates
different from x. □
Theorem 1.11. Assume that F has q elements, that the code C in
F^n contains M words and has separation 2k + 1. Then

(6)    M \left( \binom{n}{0} + \binom{n}{1}(q − 1) + \cdots + \binom{n}{k}(q − 1)^k \right) ≤ q^n.

Proof. The spheres of radius k with centres in different code words
of C cannot intersect, since d(C) = 2k + 1. Because the number of
elements in F^n is q^n, the result then follows from Lemma 1.10. □
Remark 1.12. If C = E(F^m) then M = q^m.

Remark 1.13. The inequality (6) is called the sphere packing bound
or the Hamming bound. In case of equality, the corresponding code C
is said to be perfect. For such a code, every word y in F^n lies in exactly
one sphere S(x, k) with x in C.
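Both sides of inequality (6) are easy to evaluate. The following Python
sketch (the helper names are our own) computes the largest M compatible
with the sphere packing bound for two sets of parameters that occur later
in these notes.

```python
from math import comb

def sphere_size(n, r, q):
    # Lemma 1.10: the number of words within Hamming distance r of a fixed word.
    return sum(comb(n, j) * (q - 1) ** j for j in range(r + 1))

def hamming_bound(n, k, q):
    # The largest M allowed by (6): M * |S(x, k)| <= q^n.
    return q ** n // sphere_size(n, k, q)

print(hamming_bound(7, 1, 2))    # 16 = 2^4; equality is attained by the binary [7,4] Hamming code
print(hamming_bound(23, 3, 2))   # 4096 = 2^12; equality is attained by the binary Golay code G_23
```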

Exercises

Exercise 1.1. In Examples 1.1 and 1.2 we defined two coding functions
from Z_2^6 to Z_2^7 and Z_2^18, respectively. Determine the separation
for the corresponding codes. Compare the result with Theorem 1.11.
Exercise 1.2. Let σ > 0 be an odd integer and C be a code in
Z_2^n with M words and separation σ. Show that there exists a code Ĉ
in Z_2^{n+1} with M words and separation σ + 1. (Hint: Compare with
Example 1.1)
Exercise 1.3. Construct a code in Z_2^8 with 4 words and separation 5.
Exercise 1.4. Show that there does not exist a code in Z_2^{12} with
2^7 words and separation 5.

2. Linear Codes and Generating Matrices


Definition 2.1. A code C in F n is said to be linear if it is a linear
subspace of F n . If the dimension of C is m then it is called an [n, m]
code.
Remark 2.2. That C is a linear subspace of F n means that every
linear combination of vectors in C is also contained in C. Then C is
itself a vector space with the same operations as F n , so the dimension
of C is well-defined.
In practice, most error-correcting codes are linear or can be ob-
tained from linear ones. A great advantage of linear codes is that it is
much easier to determine their separation than in the general case.
Remark 2.3. By the weight w(x) of a code word x = (x1 , . . . , xn )
in F n we mean the number of coordinates in x that are different from
zero. The weight w(C) of a linear code C in F n is defined by
w(C) = min{ w(x) ; x ∈ C, x ≠ 0 }.
Theorem 2.4. For a linear code C the separation d(C) is equal to
its weight w(C).
Proof. A linear code that contains the two words x and y also
contains their difference x − y. The result follows from the fact that
the Hamming distance d(x, y) is equal to the weight w(x − y). 
Remark 2.5. If we are interested in determining the separation
for a general code containing M words, then we must, in principle,
determine M (M − 1)/2 different Hamming distances, one for each pair
in the code. For a linear code, it is enough to calculate the weight of
the M − 1 non-zero code words.
Definition 2.6. A generator matrix for a linear [n, m] code C in
F^n is an m × n matrix G, with elements in F, such that its rows form
a basis for C.
Example 2.7. Consider the following 3 × 7 matrix with elements
in F = Z_3:

        | 1 1 1 1 1 1 1 |
    G = | 1 1 2 2 1 1 2 | .
        | 2 1 2 1 2 1 2 |

By subtracting the first row from the second and adding the first to
the third, we obtain the matrix

    | 1 1 1 1 1 1 1 |
    | 0 0 1 1 0 0 1 | .
    | 0 2 0 2 0 2 0 |

Multiplying the third row by 2 gives

    | 1 1 1 1 1 1 1 |
    | 0 0 1 1 0 0 1 | .
    | 0 1 0 1 0 1 0 |

Finally, subtracting both the second and the third row from the first
yields

        | 1 0 0 2 1 0 0 |
    G̃ = | 0 0 1 1 0 0 1 | .
        | 0 1 0 1 0 1 0 |

The rows of G̃ generate the same subspace of F^7 as the rows of G,
because we can write the rows of one matrix as linear combinations
of the rows of the other. The two matrices G and G̃ are therefore
generator matrices for the same code C in F^7. We now observe that
the first three columns of G̃ are columns of the identity matrix of order
3. If we interchange the second and the third columns of G̃ we get

    | 1 0 0 2 1 0 0 |
    | 0 1 0 1 0 0 1 | .
    | 0 0 1 1 0 1 0 |

This matrix generates a code C′ in F^7 that is obtained from C by
interchanging the letters in position 2 and 3 for all words in C.
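For a code as small as the one in Example 2.7 the separation can be found
by brute force: generate all q^m code words as linear combinations of the
rows of a generator matrix and take the minimal weight of a non-zero word,
which by Theorem 2.4 equals d(C). A Python sketch of this (the function
names are ours; q is assumed to be a prime, so that arithmetic modulo q
gives the field):

```python
from itertools import product

def codewords(G, q):
    # All linear combinations of the rows of G with coefficients in Z_q.
    m, n = len(G), len(G[0])
    for coeffs in product(range(q), repeat=m):
        yield tuple(sum(c * G[i][j] for i, c in enumerate(coeffs)) % q
                    for j in range(n))

def weight(x):
    return sum(1 for a in x if a != 0)

# The generator matrix G of Example 2.7 over Z_3.
G = [[1, 1, 1, 1, 1, 1, 1],
     [1, 1, 2, 2, 1, 1, 2],
     [2, 1, 2, 1, 2, 1, 2]]

# Theorem 2.4: d(C) is the least weight of a non-zero code word.
print(min(weight(x) for x in codewords(G, 3) if any(x)))   # 3
```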
Definition 2.8. Two codes C and C′ in F^n are said to be equivalent
if there exists a permutation π of the numbers 1, . . . , n such that

    C′ = { x_{π(1)} x_{π(2)} . . . x_{π(n)} ; x_1 x_2 . . . x_n ∈ C }.
Remark 2.9. If two codes C and C′ are equivalent then their
separations are equal, i.e. d(C) = d(C′).
The ideas presented in Example 2.7 can be applied to prove the
following theorem.
Theorem 2.10. Every linear [n, m] code C is equivalent to a code
with a generator matrix of the form
    [I_m | A]
where I_m is the identity matrix of order m and A is an m × (n − m)
matrix.
Definition 2.11. When a generator matrix for a linear code takes
the form as in Theorem 2.10 we say that it is of normal form.
Let G = [I_m | A] be a generator matrix of normal form for a linear
[n, m] code C in F^n. If the elements in F^m and F^n are seen as row
matrices, then the map

    F^m ∋ x ↦ xG ∈ F^n

gives a natural linear coding function. The first m letters in the word
xG are given by x in F^m and the last n − m letters (control digits) by
xA.
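Encoding with a generator matrix of normal form is thus a single matrix
multiplication over F. A small Python sketch (the prime field Z_2 is assumed,
and the particular matrix A below is our own choice):

```python
def encode(x, G, q):
    # The coding function F^m -> F^n, x -> xG, computed modulo q.
    m, n = len(G), len(G[0])
    assert len(x) == m
    return [sum(x[i] * G[i][j] for i in range(m)) % q for j in range(n)]

# A binary generator matrix of normal form [I_4 | A].
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

# The first four letters repeat the message, the last three are control digits.
print(encode([1, 0, 1, 1], G, 2))   # [1, 0, 1, 1, 0, 1, 0]
```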

Exercises

Exercise 2.1. Construct generator matrices for the codes in Ex-
amples 1.1 and 1.2.
Exercise 2.2. Let C be a binary linear code with the generator
matrix

    | 1 0 0 1 1 0 1 |
    | 0 1 0 1 0 1 1 | .
    | 0 0 1 0 1 1 1 |
List all the code words in C and determine the separation for C.
Exercise 2.3. The matrix

    | 1 0 1 1 |
    | 0 1 1 2 |

is a generator matrix for a linear code C in Z_3^4. Determine all the code
words in C and the separation d(C). Then show that the code C is
perfect.
Exercise 2.4. Let C be a binary linear code with generator matrix

    | 1 1 1 0 0 0 0 |
    | 1 0 0 1 1 0 0 |
    | 1 0 0 0 0 1 1 | .
    | 0 1 0 1 0 1 0 |
Find a generator matrix for C of normal form.
Exercise 2.5. Prove Theorem 2.10 by showing that every m × n
matrix G, with elements in a field F and linearly independent rows,
can be transformed into a matrix of the form [I_m | A] by repeated use
of the following operations:
(1) multiplication of a row with an element in F
(2) addition of a row to another one
(3) swapping two columns.
(Hint: Use induction over the number of rows in G)

3. Control Matrices and Decoding


Definition 3.1. The scalar product < x, y > of two vectors x =
(x1 , . . . , xn ) and y = (y1 , . . . , yn ) in F n is defined by
< x, y > = x1 y1 + · · · + xn yn .
Definition 3.2. The dual code C ⊥ of a linear code C in F n is the
linear code
C ⊥ = {y ∈ F n ; < x, y >= 0 for all x ∈ C}.
Remark 3.3. As for subspaces in Rn , it is easy to show that if the
code C in F n has dimension m, then the dual code C ⊥ is of dimension
n − m. For vector spaces F n over a finite field F , it is not true in
general that every vector in F n can, in a unique way, be written as
the sum of a vector in C and a vector in C ⊥ . It can even happen that
C ⊥ = C. In that case the code is said to be self-dual.
Example 3.4. For the matrix

    G = | 1 0 1 1 |
        | 0 1 1 2 |

the scalar product of the first row with itself is 3, the scalar product of
the second row with itself is 6, and the scalar product of the two rows
is 3. This means that each scalar product is 0 modulo 3. From this we
see that the [4, 2] code over Z_3 with generator matrix G is self-dual.
Definition 3.5. A generator matrix for the dual code C ⊥ of C is
called a control matrix for C.
A word x ∈ F^n is contained in the code C if and only if the scalar
product of x with every row of a control matrix for C is zero. In this way
we can easily check if a word belongs to the code or not.
If G is a generator matrix for an [n, m] code C and H is a control
matrix for C, then G is an m × n matrix and H is an (n − m) × n
matrix of rank n − m. The condition that H is a control matrix for
C can be written as

(7)    G · H^t = 0,

where H^t denotes the transpose of the matrix H. The content of equation
(7) is namely that the scalar products of the rows of G with the
rows of H are zero.
Let us now assume that the generator matrix G is of normal form
[I_m | A], where A is an m × (n − m) matrix. If we then choose

    H = [−A^t | I_{n−m}],

then it is easily verified that condition (7) is satisfied. We now formulate
this as the following theorem.
Theorem 3.6. If a linear [n, m] code C has the generator matrix
[I_m | A], then [−A^t | I_{n−m}] is a control matrix for C.

Remark 3.7. If the field F is Z_2, then −A^t = A^t so we can take
[A^t | I_{n−m}] as a control matrix.
Example 3.8. The binary [5, 2] code which has the generator matrix

    G = | 1 0 1 0 1 |
        | 0 1 0 1 1 |

has as control matrix

        | 1 0 1 0 0 |
    H = | 0 1 0 1 0 | .
        | 1 1 0 0 1 |
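The passage from [I_m | A] to [−A^t | I_{n−m}] is purely mechanical, so
Example 3.8 can be checked by machine. A Python sketch (prime field
assumed; the function names are ours) which also verifies condition (7):

```python
def control_matrix(G, q):
    # Theorem 3.6: if G = [I_m | A] then H = [-A^t | I_{n-m}] is a control matrix.
    m, n = len(G), len(G[0])
    A = [row[m:] for row in G]
    return [[(-A[i][j]) % q for i in range(m)] +
            [1 if k == j else 0 for k in range(n - m)]
            for j in range(n - m)]

def product_with_transpose(G, H, q):
    # The matrix G * H^t modulo q; by (7) it should be the zero matrix.
    return [[sum(g[k] * h[k] for k in range(len(g))) % q for h in H] for g in G]

# The generator matrix of Example 3.8 over Z_2.
G = [[1, 0, 1, 0, 1],
     [0, 1, 0, 1, 1]]
H = control_matrix(G, 2)
print(H)                                # [[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 1, 0, 0, 1]]
print(product_with_transpose(G, H, 2))  # [[0, 0, 0], [0, 0, 0]]
```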
We shall now describe how a receiver can apply a control matrix
H of a linear code C to correct errors that possibly have occurred
during the transfer of information when using the code C. We start by
checking if the received word x ∈ F^n satisfies the condition xH^t = 0.
If that is the case then x is orthogonal to all the rows of H and hence
a code word. We then assume that no error has occurred and that x
is the code word sent. On the other hand, if xH^t ≠ 0 then an error
has occurred. In order to correct it, we consider the set of all words y
in F^n such that yH^t = xH^t. We call this set the coset corresponding
to the syndrome xH^t. In the coset corresponding to xH^t we choose
the word ȳ with least weight, i.e. the least Hamming distance from the
origin. The fact that ȳH^t = xH^t means that the difference x − ȳ is
a code word and there does not exist any other code word closer to x,
since ȳ is of minimal weight. For this reason it is reasonable to correct
x to x − ȳ. The word ȳ is called a coset leader corresponding to the
syndrome xH^t.
Example 3.9. For the code in Example 3.8 we have the following
table of coset leaders of the listed syndromes

coset leader 00000 10000 01000 00100 00010 00001 11000 10010
syndrome 000 101 011 100 010 001 110 111

The syndrome 000 corresponds to the coset of code words. The five
following syndromes correspond to cosets consisting of words different
from a code word at only one place. For those the coset leaders are
uniquely determined since different words of weight one have different
syndromes. This is a consequence of the fact that the columns of the
control matrix H are all different. The syndrome of a word that has
1 at place j and 0 elsewhere is the j-th row in H^t. The two last coset
leaders are not uniquely determined by their syndromes. For example,
also 01100 gives the syndrome 111. Here the receiver can act in several
ways. One possibility is that he decides to pick one of the coset leaders
and uses that one for error-correcting. Other alternatives are that he
asks the sender to repeat the message or simply ignores the word.
Let us now apply the above table to the three received words 11111,
01110 and 01101. The first word has the syndrome 001. The corre-
sponding coset leader is 00001 and the corrected word becomes 11110.
For 01110 the syndrome is 101 with coset leader 10000. Even in this
case the corrected word is 01110 − 10000 = 11110. For the word 01101
the syndrome is 110 so at least two letters must be wrong. If the re-
ceiver picks the coset leader in the list above, then the corrected word
becomes 10101.
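The whole decoding procedure of Example 3.9 fits into a few lines of
Python. The sketch below stores the coset leader table of the example (for
the syndromes 110 and 111 one of the possible leaders has been picked) and
corrects the three received words discussed above.

```python
H = [[1, 0, 1, 0, 0],    # the control matrix of Example 3.8
     [0, 1, 0, 1, 0],
     [1, 1, 0, 0, 1]]

def syndrome(x):
    # The components of x H^t over Z_2, written as a string of three digits.
    return ''.join(str(sum(a * b for a, b in zip(x, row)) % 2) for row in H)

# The coset leaders from the table in Example 3.9.
leader = {'000': [0, 0, 0, 0, 0], '101': [1, 0, 0, 0, 0], '011': [0, 1, 0, 0, 0],
          '100': [0, 0, 1, 0, 0], '010': [0, 0, 0, 1, 0], '001': [0, 0, 0, 0, 1],
          '110': [1, 1, 0, 0, 0], '111': [1, 0, 0, 1, 0]}

def correct(x):
    # Subtract the coset leader belonging to the syndrome of x.
    y = leader[syndrome(x)]
    return [(a - b) % 2 for a, b in zip(x, y)]

for word in ([1, 1, 1, 1, 1], [0, 1, 1, 1, 0], [0, 1, 1, 0, 1]):
    print(word, '->', correct(word))
# 11111 -> 11110,  01110 -> 11110,  01101 -> 10101, as in the example
```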
We conclude this section with a theorem telling us how we can
determine the separation of a code from its control matrix.
Theorem 3.10. A linear code C with the control matrix H has
separation σ if and only if there exist σ columns in H that are linearly
dependent and furthermore any σ − 1 of the columns in H are linearly
independent.
Proof. That σ columns in H are linearly dependent means that
there exists a non-zero word x of weight at most σ such that xH^t = 0.
The weight of such a word can never be less than σ, since any σ − 1
columns in H are linearly independent. Hence w(C) = σ and the result
follows from Theorem 2.4 of the last section. □

Exercises

Exercise 3.1. Construct a control matrix for the code in Example
1.1.
Exercise 3.2. Show that for a linear [n, m] code C the dual code
C ⊥ has dimension n − m. (Hint: Use Theorem 2.10)
Exercise 3.3. The matrices

    | 1 0 0 1 1 |        | 1 2 4 0 3 |
    | 1 1 1 0 1 |  and   | 0 2 1 4 1 |
    | 0 0 1 1 1 |        | 2 0 3 1 4 |

are generator matrices for two linear codes C_1 and C_2 in Z_2^5 and Z_5^5,
respectively. Construct control matrices for C_1 and C_2. What are the
separations for C_1 and C_2?
Exercise 3.4. Consider the linear code in Z_2^6 with the generator
matrix

    | 1 0 0 1 1 1 |
    | 0 1 0 1 0 1 | .
    | 0 0 1 0 1 1 |

(1) Which of the following words are code words?
    111001, 010100, 101100, 110111, 100001.
(2) Which of the words can be corrected? Correct those!
Exercise 3.5. Let C be a binary code with generator matrix

    | 1 0 0 0 1 0 1 |
    | 0 1 0 0 1 0 1 |
    | 0 0 1 0 0 1 1 | .
    | 0 0 0 1 0 1 1 |
Correct the following words in C if possible
1101011 , 0110111 , 0111000 .
Exercise 3.6. Determine the separation for the linear code in Z_3^8
with control matrix

    | 1 1 1 1 1 1 1 1 |
    | 0 1 0 0 1 2 1 2 |
    | 0 0 1 0 1 1 2 0 | .
    | 0 0 0 1 1 1 0 1 |
Exercise 3.7. Let C be the code in Z_5^6 with the generator matrix

    | 1 0 0 1 1 1 |
    | 0 1 0 1 2 3 | .
    | 0 0 1 1 3 4 |
Show that d(C) = 4.

4. Some Special Codes


Example 4.1. The matrix

        | 1 1 1 0 1 0 0 |
    H = | 1 1 0 1 0 1 0 |
        | 1 0 1 1 0 0 1 |

is a control matrix for a binary [7, 4] code consisting of all the words
in Z_2^7 such that xH^t = 0. The seven columns in H are all different
and together they are all the non-zero elements in Z_2^3. Therefore every
non-zero syndrome in Z_2^3 has a unique coset leader of weight 1. For
example, ȳ = 0001000 has the syndrome ȳH^t = 011 corresponding to
the fourth column in H. Every word x in Z_2^7 which is not a code word
can be corrected to a code word by only changing one digit in x. Which
digit is to be changed is determined by which column in H corresponds
to the syndrome xH^t.
Codes with the properties explained in the last example carry a
special name.
Definition 4.2. A linear [n, m] code over Z_2, with a control matrix
whose columns are all different and constitute all the non-zero vectors
in Z_2^{n−m}, is called a binary Hamming code.
Remark 4.3. Hamming codes can only occur for special values of
the parameters m and n. If r = n − m then the number of non-zero
vectors in Z_2^{n−m} is 2^r − 1. This means that for a binary Hamming code
we have n = 2^r − 1 and m = n − r = 2^r − 1 − r for some positive integer
r. In Example 4.1 we have r = 3.
Remark 4.4. In the same way as in Example 4.1, we see that for
an arbitrary binary Hamming code every word in Z_2^n is either a code
word or has Hamming distance 1 from a uniquely determined code word.
This implies that the spheres of radius 1 and centre in a code word
cover Z_2^n and that two such spheres never intersect.
This means that every binary Hamming code is perfect.
Example 4.5. Let C be the [10, 8] code over the field Z_11 defined
by xH^t = 0, where

    H = | 1 1 1 1 1 1 1 1 1  1 |
        | 1 2 3 4 5 6 7 8 9 10 | .

Observe that the control matrix H is not of the normal form [−A^t | I_2].
If we so wish, it is easy to transform it to normal form but for our
purposes it is more useful as it is. Note that the calculations take
place in Z_11, so x ∈ Z_11^10 is a code word if and only if

    x_1 + x_2 + · · · + x_10 = 0    (mod 11)
    x_1 + 2x_2 + · · · + 10x_10 = 0    (mod 11).

Assume that when transferring a code word z = (z_1, . . . , z_10) exactly
one error e has occurred at place k, so that the received word is

    x = (z_1, . . . , z_k + e, . . . , z_10).

Then the syndrome xH^t is equal to (e, ke). From this we can directly
determine the error e and also at which place it occurred, by dividing
the second component by the first. If for example x = 0610271355,
then xH^t = (8, 6). Since 8^{−1} = 7 in Z_11 we get k = 6 · 8^{−1} ≡ 42 ≡ 9
(mod 11). If only one error has occurred in x, then this has happened
at place 9 and the corresponding digit should be changed to 5 − 8 ≡ 8.
If we do not want to use the "digit 10" in the code words we can
simply remove all the words containing 10 from the code C. Employing
the principle of inclusion-exclusion, one can see that we then still have
82,644,629 words left in the code. This means that we could issue
that many ten digit telephone numbers and guarantee that the correct
person would be reached even if one digit had been pressed wrongly.
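The error-locating calculation in Example 4.5 is easily automated. The
following Python sketch (the function names are ours) reproduces the
correction of the received word x = 0610271355; pow(e, -1, 11) computes
the inverse of e modulo 11.

```python
p = 11

def syndrome(x):
    # The two components of x H^t for the control matrix of Example 4.5.
    return sum(x) % p, sum((i + 1) * xi for i, xi in enumerate(x)) % p

def correct_single_error(x):
    e, ke = syndrome(x)
    if (e, ke) == (0, 0):
        return list(x)                  # already a code word
    if e == 0:
        raise ValueError("more than one error must have occurred")
    k = (ke * pow(e, -1, p)) % p        # the place of the error (counted from 1)
    y = list(x)
    y[k - 1] = (y[k - 1] - e) % p       # remove the error e at place k
    return y

x = [0, 6, 1, 0, 2, 7, 1, 3, 5, 5]
print(syndrome(x))               # (8, 6)
print(correct_single_error(x))   # [0, 6, 1, 0, 2, 7, 1, 3, 8, 5]
```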
To prepare the next example we describe how two given codes can
be used to produce a new one.
Theorem 4.6. Let F be a finite field and C_1, C_2 be two linear codes
in F^n of dimension m_1 and m_2, respectively. Then

    C = { (x, x + y) ∈ F^{2n} ; x ∈ C_1 and y ∈ C_2 }

is a linear [2n, m_1 + m_2] code. If σ_1 is the separation of C_1 and σ_2 the
separation of C_2, then the separation of C is

    σ = min(2σ_1, σ_2).

Proof. We leave it to the reader to prove that C is a linear code
of dimension m_1 + m_2. To determine the separation of C we must
estimate the least possible weight of the non-zero words in C. If y = 0,
then w(x, x) = 2w(x) ≥ 2σ_1 and equality is obtained for some x ≠ 0 in
C_1. If y ≠ 0, then w(x, x + y) ≥ w(y) ≥ σ_2 and equality holds for x = 0
and some y ∈ C_2. Hence the separation of C equals min(2σ_1, σ_2). □
Example 4.7. By repeatedly using Theorem 4.6, we shall construct
a code which has, amongst other things, been used by Mariner 9 to send
pictures of the planet Mars back to Earth.
Let C_1 be the binary [4,3] code consisting of all the words x =
(x_1, x_2, x_3, x_4) in Z_2^4 such that

    x_1 + x_2 + x_3 + x_4 = 0    (mod 2).

The code C_1 thus consists of those words that contain an even number
of the digit 1. A non-zero word must therefore contain at least two
ones, so the separation of C_1 is 2. As C_2 we take the code consisting
of the two words 0000 and 1111. The code C_2 has dimension 1 and
separation 4. If we now apply the construction of Theorem 4.6 to C_1
and C_2, then we obtain an [8, 3 + 1] code with separation 4. Call this
code C_1′ and now choose C_2′ to be the code in Z_2^8 containing the two
elements for which their digits are either all 0 or all 1. If we then apply
Theorem 4.6 to C_1′ and C_2′ we get a [16,5] code with separation 8. Call
this C_1″ and take C_2″ to be the code in Z_2^16 consisting of the two words
with all digits equal. If we then yet again employ Theorem 4.6 we obtain
a [32,6] code with separation 16. This is the code that was used by
Mariner 9. Since the separation is 16, Theorem 1.7 tells us that up to
15 errors are detected and that up to 7 errors can be corrected in each
word consisting of 32 letters. For this 32 − 6 = 26 control digits are
needed. The Mariner code belongs to a general class called Reed-Muller
codes.
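The construction of Theorem 4.6 and the iteration in Example 4.7 can be
carried out explicitly by listing the code words. The following Python sketch
(binary case only; all names are ours) builds the [32,6] Mariner-type code and
confirms its separation by brute force.

```python
from itertools import product

def combine(C1, C2):
    # Theorem 4.6 over Z_2: C = { (x, x + y) ; x in C1, y in C2 }.
    return [x + tuple((a + b) % 2 for a, b in zip(x, y)) for x in C1 for y in C2]

def separation(C):
    # For a linear code this is the least weight of a non-zero word (Theorem 2.4).
    return min(sum(w) for w in C if any(w))

# C_1: all words of length 4 with an even number of ones;  C_2 = {0000, 1111}.
C1 = [w for w in product((0, 1), repeat=4) if sum(w) % 2 == 0]
C2 = [(0,) * 4, (1,) * 4]

C = combine(C1, C2)            # the [8, 4] code C_1' of Example 4.7
for _ in range(2):             # two more rounds give the [32, 6] code
    n = len(C[0])
    C = combine(C, [(0,) * n, (1,) * n])

print(len(C[0]), len(C), separation(C))   # 32 64 16
```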
The last example of this section is a classical code constructed by
M. J. E. Golay in 1949.
Example 4.8. Let C be the [12,6] code over Z_3 with generator
matrix

                    | 1 0 0 0 0 0 0 1 1 1 1 1 |
                    | 0 1 0 0 0 0 1 0 1 2 2 1 |
                    | 0 0 1 0 0 0 1 1 0 1 2 2 |
    G = [I_6 | A] = | 0 0 0 1 0 0 1 2 1 0 1 2 | .
                    | 0 0 0 0 1 0 1 2 2 1 0 1 |
                    | 0 0 0 0 0 1 1 1 2 2 1 0 |

The five last digits in the five last rows are obtained by a cyclic permutation
of the vector 01221. It is easily checked that the scalar products
of the rows of G are zero (note that 2 = −1 in Z_3). The code C is
therefore self-dual. In particular, we have <x, x> = 0 for every word
x in C.
Since the letters in x are 0 or ±1, this implies that the weight w(x)
must be divisible by 3. We will show that there does not exist a word
in C of weight 3. Such a word must be of the type (3 | 0), (2 | 1), (1 | 2)
or (0 | 3), where the digits to the left and to the right of | tell us how
many of the first six and last six digits in the word are different from
0, respectively. Since the code is self-dual, the scalar product of any
code word and any row of the generator matrix G must be zero. This
is impossible for the words of the type (3 | 0) and (2 | 1). On the other
hand, every code word is a linear combination of the rows of G. This is
impossible for the types (1 | 2) and (0 | 3). This means that the lowest
weight of a non-zero word in C is 6, which therefore is the separation
of the code. If we now remove the first column of A in the generator
matrix we obtain an [11,6] code, called the Golay code over Z_3 and
denoted by G_11. By removing a letter from a word its weight is reduced
by at most 1, so G_11 has separation 5 and can therefore correct up
to 2 errors.
The code G_11 is in fact perfect. In order to check this one
has to show that equality holds in (6) of Theorem 1.11. For G_11 we
have M = 3^6, n = 11, k = 2 and q = 3, so we must verify

    3^6 · \left( \binom{11}{0} + \binom{11}{1} · 2 + \binom{11}{2} · 2^2 \right) = 3^{11}.

This is left to the reader.
Remark 4.9. In 1949 Golay also constructed a perfect binary
[23,12] code with separation 7 denoted by G_23. One can show that
Golay's codes are the only perfect codes over a finite field containing
more than two words and correcting more than one error. To be more
precise, every such code must be equivalent to either G_11 or G_23.

Exercises

Exercise 4.1. Construct a control matrix for a binary [15,11] Ham-
ming code.
Exercise 4.2. Let F be a finite field and C be an [n, m] code in
F^n with separation 3. If C has a control matrix H such that every
vector in F^{n−m} can be obtained by multiplying some column in H by
an element in F, then C is called a Hamming code over F.
(1) Show that every such Hamming code is perfect.
(2) Construct a control matrix for an [8,6] Hamming code over Z_7.
(3) Construct a control matrix for a [13,10] Hamming code over
Z_3.
(4) For which values of n and m does there exist an [n, m] Hamming
code over Z_p?
Exercise 4.3. Correct, with respect to the code in Example 4.5,
the received word 0617960587 under the condition that it contains at
most one error.
Exercise 4.4. Let H be the control matrix in Example 4.5. What
conclusion can be drawn if exactly one of the two components of the
syndrome xH^t is zero for the received word x?
Exercise 4.5. Describe a generator matrix for the code C in Theorem
4.6, if G_1 and G_2 are generator matrices for the codes C_1 and C_2.
Also construct a generator matrix for the code C_1′ in Example 4.7.
Exercise 4.6. Show that in a binary self-dual code the weight of
any element must be an even number.
Exercise 4.7. Let C be a binary code with generator matrix

    | 1 1 0 0 0 1 1 0 |
    | 0 1 1 1 0 1 0 0 |
    | 0 0 1 0 1 1 1 0 | .
    | 0 0 1 1 1 0 0 1 |
(1) Show that C is self-dual.
(2) Use the result of Exercise 4.6 to calculate the separation d(C).

5. Vandermonde Matrices and Reed-Solomon Codes


In this last section we describe a particular type of codes with a
high error-correcting capacity. They have, amongst other things, been
important for the development of modern CD-technology.
According to Theorem 3.10, a linear code with control matrix H
has separation at least σ if every collection of σ − 1 columns in H
is linearly independent. We start by showing how to easily construct
matrices in which every collection of a fixed number of columns is
linearly independent.
Let F be a finite field and β_0, β_1, . . . , β_d be different elements of
F. Then the factor theorem tells us that a polynomial c(x) in F[x] of
degree at most d with zeros β_0, β_1, . . . , β_d must be the zero polynomial.
If

    c(x) = c_0 + c_1 x + · · · + c_d x^d,

then this implies that the system of equations

    | 1  β_0  β_0^2  . . .  β_0^d |   | c_0 |   | 0 |
    | 1  β_1  β_1^2  . . .  β_1^d |   | c_1 |   | 0 |
    | .   .     .             .   | · |  .  | = | . |
    | 1  β_d  β_d^2  . . .  β_d^d |   | c_d |   | 0 |

only has the trivial solution c_0 = c_1 = · · · = c_d = 0. This means that
the coefficient matrix is invertible, so the columns of the transposed
matrix

          | 1      1      . . .  1      |
          | β_0    β_1    . . .  β_d    |
    (8)   | β_0^2  β_1^2  . . .  β_d^2  |
          |  .      .              .    |
          | β_0^d  β_1^d  . . .  β_d^d  |

are linearly independent. A matrix of this form is called a Vandermonde
matrix.
Now let n be an integer greater than d and let α_0, α_1, . . . , α_n be
different elements of the field F. Then every collection of d + 1 columns
from the matrix

          | 1      1      . . .  1      |
          | α_0    α_1    . . .  α_n    |
    (9)   | α_0^2  α_1^2  . . .  α_n^2  |
          |  .      .              .    |
          | α_0^d  α_1^d  . . .  α_n^d  |

is linearly independent. This is because the chosen columns form a
Vandermonde matrix. According to Theorem 3.10 every matrix of the form
(9) is a control matrix of a linear code in F^{n+1} with separation d + 2.
Example 5.1. Consider the linear [10,6] code over Z_11 defined by
the control matrix

        | 1   1    1    1    1    1    1    1    1    1    |
    H = | 1   2    3    4    5    6    7    8    9    10   |
        | 1   2^2  3^2  4^2  5^2  6^2  7^2  8^2  9^2  10^2 |
        | 1   2^3  3^3  4^3  5^3  6^3  7^3  8^3  9^3  10^3 |

where we calculate the powers in Z_11. According to what we have
just proven, four arbitrarily chosen columns in H are always linearly
independent, so the corresponding code has separation 5. Observe that
the columns are contained in a four dimensional vector space, so more
than four vectors are always linearly dependent.
Since the separation is 5, it follows from Theorem 1.7 that the code
corrects two errors. This is an improvement compared with the code
in Example 4.5, which only corrects one error. The price for this is that
the number of code words in Z_11^10 is now only 11^6 compared with 11^8
in Example 4.5.
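The claims in Example 5.1 can be verified directly: every choice of four
columns of H is a 4 × 4 Vandermonde matrix and hence invertible modulo
11. The sketch below (Python; the rank computation is a small Gaussian
elimination over Z_p written for this purpose, and all names are ours) checks
this for all 210 choices of columns.

```python
from itertools import combinations

p = 11
H = [[pow(a, i, p) for a in range(1, 11)] for i in range(4)]   # rows 1, a, a^2, a^3

def rank_mod_p(rows, p):
    # Gaussian elimination over Z_p.
    M = [list(r) for r in rows]
    rank, col = 0, 0
    while rank < len(M) and col < len(M[0]):
        pivot = next((r for r in range(rank, len(M)) if M[r][col] % p != 0), None)
        if pivot is None:
            col += 1
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], -1, p)
        M[rank] = [inv * v % p for v in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col] % p != 0:
                factor = M[r][col]
                M[r] = [(M[r][c] - factor * M[rank][c]) % p for c in range(len(M[0]))]
        rank, col = rank + 1, col + 1
    return rank

# Any four columns of H are linearly independent, so the code has separation 5.
assert all(rank_mod_p([[H[i][j] for j in cols] for i in range(4)], p) == 4
           for cols in combinations(range(10), 4))
print("every choice of 4 columns of H is linearly independent")
```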
The code in Example 5.1 is a so-called Reed-Solomon code. In
general this name is given to every code over a finite field F with a
control matrix of the form (9) where α_0, α_1, . . . , α_n are all the non-zero
elements of F. If F has q elements then n = q − 2. Usually, we then
list the elements α_0, α_1, . . . , α_n by choosing a primitive element α ∈ F
and put α_i = α^i. Then the control matrix (9) takes the form

    | 1   1    1      . . .  1          |
    | 1   α    α^2    . . .  α^{q−2}    |
    | .   .    .             .          |
    | 1   α^d  α^{2d} . . .  α^{(q−2)d} |
Since α^{q−1} = 1 in F, it is of course sufficient to calculate the exponents
modulo q − 1.
Remark 5.2. If d = 2k − 1 in the control matrix (9), then the
separation is 2k + 1 and the corresponding code corrects k errors. In
most applications we have F = GF(2^m) and each "letter" in F can then
be written as a string of m binary symbols, 0 or 1. If one considers a
continuous sequence of (k − 1)m + 1 binary symbols in one word, then
they cannot influence more than k letters in GF(2^m). This means
that a single "cascade" of binary errors of length ≤ (k − 1)m + 1 can
be corrected. This is the reason why Reed-Solomon codes are used in
today's CD-technology. This is utilized when playing a disc to eliminate
noise caused by dust, fingerprints, small scratches, etc.
Example 5.3. Let us consider the case when F = GF(2^6) and
k = 5. Since F has 64 elements, every word in a Reed-Solomon code
over F has length 63 if the letters are elements in F. This corresponds
to binary words of length 6 · 63 = 378. When k = 5 we can correct
single cascades of binary errors up to length (k − 1)m + 1 = 25. In this
case the control matrix (9) has d + 1 = 2k = 10 rows, so the code has
dimension 63 − 10 = 53 as a vector space over F. This means that it
contains (2^6)^53 = 2^318 words.

Exercises

Exercise 5.1. Construct a linear [8,4] code over Z_17 with separation 5.
Exercise 5.2. Construct a control matrix for a Reed-Solomon code
over F = GF(2^3) that corrects 2 errors in F.
