Coding Theory
Lecture Notes
Matthew Fayers
January–March 2008
This is a set of notes which is supposed to augment your own notes for the Coding Theory course.
They were written by Matthew Fayers, and very lightly edited by me, Mark Jerrum, for 2008. I am
very grateful to Matthew Fayers for permission to use this excellent material. If you find any mistakes,
please e-mail me: [email protected]. Thanks to the following people who have already sent
corrections: Nilmini Herath, Julian Wiseman, Dilara Azizova.
Contents
1 Introduction and definitions
  1.1 Alphabets and codes
  1.2 Error detection and correction
  1.3 Equivalent codes

2 Good codes
  2.1 The main coding theory problem
  2.2 Spheres and the Hamming bound
  2.3 The Singleton bound
  2.4 Another bound
  2.5 The Plotkin bound

3 Error probabilities and nearest-neighbour decoding

4 Linear codes
  4.1 Revision of linear algebra
  4.2 Finite fields and linear codes
  4.3 The minimum distance of a linear code
  4.4 Bases and generator matrices
  4.5 Equivalence of linear codes
  4.6 Decoding with a linear code

5 Dual codes and parity-check matrices

6 Some examples of linear codes
Proof. (1), (2) and (3) are very easy, so let's do (4). Now d(x, z) is the number of values i for which xi ≠ zi. Note that if xi ≠ zi, then either xi ≠ yi or yi ≠ zi. Hence

{i | xi ≠ zi} ⊆ {i | xi ≠ yi} ∪ {i | yi ≠ zi}.

So

|{i | xi ≠ zi}| ≤ |{i | xi ≠ yi} ∪ {i | yi ≠ zi}| ≤ |{i | xi ≠ yi}| + |{i | yi ≠ zi}|,

i.e.

d(x, z) ≤ d(x, y) + d(y, z).
Now we can talk about error detection and correction. We say that a code C is t-error-detecting
if d(x, y) > t for any two distinct words x, y in C. We say that C is t-error-correcting if there do not exist
distinct words x, y ∈ C and a word z ∈ A^n such that d(x, z) ≤ t and d(y, z) ≤ t.
Example. The simplest kinds of error-detecting codes are repetition codes. The repetition code of length n over A simply consists of all words aa . . . a, for a ∈ A. For this code, any two distinct codewords differ in every position, and so d(x, y) = n for all x ≠ y in C. So the code is t-error-detecting for every t ≤ n − 1, and is t-error-correcting for every t ≤ (n − 1)/2.
Given a code C, we define its minimum distance d(C) to be the smallest distance between distinct
codewords:
d(C) = min{d(x, y) | x, y ∈ C, x ≠ y}.
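For small codes, d(C) can be computed directly from this definition by brute force; a Python sketch (names are mine):

    from itertools import combinations

    def minimum_distance(code):
        # d(C): smallest distance over all pairs of distinct codewords
        dist = lambda x, y: sum(a != b for a, b in zip(x, y))
        return min(dist(x, y) for x, y in combinations(code, 2))

    # the repetition code of length 5 over {0, 1}:
    assert minimum_distance(["00000", "11111"]) == 5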
Lemma 1.2. A code C is t-error-detecting if and only if d(C) ≥ t + 1, and is t-error-correcting if and only if d(C) ≥ 2t + 1.
Proof. The first part is immediate from the definition of "t-error-detecting". For the second part, assume that C is not t-error-correcting. Then there exist distinct codewords x, y ∈ C and a word z ∈ A^n such that d(x, z) ≤ t and d(y, z) ≤ t. By the triangle inequality, d(x, y) ≤ d(x, z) + d(z, y) ≤ 2t, and hence d(C) ≤ 2t. Conversely, if d(C) ≤ 2t then choose x, y ∈ C such that d(x, y) ≤ 2t. There exists z ∈ A^n such that d(x, z) ≤ t and d(y, z) ≤ t. (Check this! It is a property of the Hamming metric, but not of metrics in general.) Thus, C is not t-error-correcting.
Corollary 1.3. A code C is t-error-correcting if and only if it is 2t-error-detecting.

Proof. By the previous lemma, the properties "t-error-correcting" and "2t-error-detecting" for the code C are both equivalent to d(C) ≥ 2t + 1.
From now on, we shall think about the minimum distance of a code rather than how many errors
it can detect or correct.
We say that a code of length n with M codewords and minimum distance at least d is an (n, M, d)-code. For example, the repetition code described above is an (n, q, n)-code. Another example is the following 'parity-check' code, which is a binary (4, 8, 2)-code:

C = {0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111},

the set of binary words of length 4 containing an even number of 1s.
The point of using error-detecting and error-correcting codes is that we might like to transmit a
message over a ‘noisy’ channel, where each symbol we transmit gets mis-transmitted with a certain
probability; an example (for which several of the codes we shall see have been used) is a satellite
transmitting images from the outer reaches of the solar system. Using an error-detecting code, we
reduce the probability that the receiver misinterprets distorted information – provided not too many
errors have been made in transmission, the receiver will know that errors have been made, and can
request re-transmission; in a situation where re-transmission is impractical, an error-correcting code
can be used. Of course, the disadvantage of this extra ‘certainty’ of faithful transmission is that we are
adding redundant information to the code, and so our message takes longer to transmit. In addition,
for intricate codes, decoding may be difficult and time-consuming.
The main tasks of coding theory, therefore, are to find codes which enable error-detection and
-correction while adding as little redundant information as possible, and to find efficient decoding
procedures for these codes. Clearly, as d gets large, codes with minimum distance d have fewer and
fewer codewords. So we try to find codes of a given length and a given minimum distance which have
as many codewords as possible. We shall see various bounds on the possible sizes of codes with given
length and minimum distance, and also construct several examples of ‘good’ codes.
Operation 1 – permutation of the positions in the codewords Choose a permutation σ of {1, . . . , n},
and for a codeword v = v1 . . . vn in C define
vσ = vσ(1) . . . vσ(n) .
Now define
Cσ = {vσ | v ∈ C}.
Operation 2 – permutation of the symbols in a given position Choose i ∈ {1, . . . , n} and a permutation f of A, and for a codeword v = v1 . . . vn in C define

v f,i = v1 . . . vi−1 f(vi) vi+1 . . . vn.

Now define

C f,i = {v f,i | v ∈ C}.
We can get from C to D by Operation 1 – we replace each codeword ab with ba. We can get from D to E
by Operation 2 – we permute the symbols appearing in the second position via 0 → 2 → 1 → 0. So
C, D and E are equivalent codes.
The point of equivalence is that equivalent codes have the same size and the same minimum
distance; we can often simplify both decoding procedures and some of our proofs by replacing codes
with equivalent codes.
Lemma 1.4. Suppose C is a code and σ a permutation of {1, . . . , n}, and define Cσ as above. Then
|C| = |Cσ |.
Lemma 1.5. Suppose C is a code containing words v and w, and suppose σ is a permutation of {1, . . . , n}. Define the words vσ and wσ as above. Then

d(vσ, wσ) = d(v, w).

Proof. Write x = vσ and y = wσ. Then d(v, w) = |{i | vi ≠ wi}|, and similarly

d(vσ, wσ) = |{i | xi ≠ yi}|.

Since xi = vσ(i) and yi = wσ(i), σ defines a bijection from

{i | xi ≠ yi}

to

{i | vi ≠ wi}.

So these two sets have the same size, and d(vσ, wσ) = d(v, w).
Lemma 1.7. Suppose C is a code, f a permutation of A and i ∈ {1, . . . , n}, and define C f,i as above.
Then |C f,i | = |C|.
Lemma 1.8. Suppose C is a code containing codewords v and w, and define v f,i and w f,i as above.
Then
d(v f,i , w f,i ) = d(v, w).
2 Good codes
2.1 The main coding theory problem
The most basic question we might ask about codes is: given n, M and d, does an (n, M, d)-code
exist? Clearly, better codes are those which make M and d large relative to n, so we define Aq (n, d)
to be the maximum M such that a q-ary (n, M, d)-code exists. The numbers Aq (n, d) are unknown in
general, and calculating them is often referred to as the ‘main coding theory problem’. Here are two
very special cases.
Theorem 2.1.
1. Aq(n, 1) = q^n.
2. Aq(n, n) = q.
Proof.
1. We can take C = An , the set of all words of length n. Any two distinct words must differ in
at least one position, so the code has minimum distance at least 1. Obviously a q-ary code of
length n can’t be bigger than this.
2. Suppose we have a code of length n with at least q + 1 codewords. Then by the pigeonhole
principle there must be two words with the same first symbol. These two words can therefore
differ in at most n − 1 positions, and so the code has minimum distance less than n. So Aq(n, n) ≤
q. On the other hand, the repetition code described above is an (n, q, n)-code.
Now we come to our first non-trivial result. It is a ‘reduction theorem’, which in effect says that
for binary codes we need only consider codes with odd values of d.
Theorem 2.3. Suppose d is even. Then a binary (n, M, d)-code exists if and only if a binary (n − 1, M, d − 1)-code exists.
Hence if d is even, then A2(n, d) = A2(n − 1, d − 1).
Proof. The ‘only if’ part follows from the Singleton bound, which we state and prove in Section 2.3.
So we concentrate on the ‘if’ part.
Suppose we have a binary (n − 1, M, d − 1)-code. Given a codeword x, we form a word x̂ of length
n by adding an extra symbol, which we choose to be 0 or 1 in such a way that x̂ contains an even
number of 1s.
Claim. If x, y are codewords in C, then d(x̂, ŷ) is even.

Proof. The number of positions in which x̂ and ŷ differ is

(number of places where x̂ has a 1 and ŷ has a 0) + (number of places where x̂ has a 0 and ŷ has a 1),

which equals

(number of 1s in x̂) + (number of 1s in ŷ) − 2 × (number of places where x̂ and ŷ both have a 1),

which is even, since x̂ and ŷ both contain an even number of 1s.

Now for any x, y ∈ C, we have d(x, y) ≥ d − 1, and clearly this gives d(x̂, ŷ) ≥ d − 1. But d − 1 is odd, and d(x̂, ŷ) is even, so in fact we have d(x̂, ŷ) ≥ d. So the code

Ĉ = {x̂ | x ∈ C}

is an (n, M, d)-code.
For the final part of the theorem, we have just seen that when d is even, (n, M, d)-codes and (n − 1, M, d − 1)-codes exist for exactly the same values of M; taking the largest such M gives A2(n, d) = A2(n − 1, d − 1).
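The extension step in the proof is easy to carry out mechanically; a Python sketch, assuming binary words are stored as strings (the function name is mine):

    def extend_with_parity(code):
        # append 0 or 1 so that each extended word has an even number of 1s
        return [x + str(x.count("1") % 2) for x in code]

    # a binary (4, 2, 3)-code becomes a (5, 2, 4)-code:
    assert extend_with_parity(["0000", "0111"]) == ["00000", "01111"]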
Now we’ll look at some upper bounds for sizes of (n, M, d)-codes.
(n.b. in metric-space language this is a ball, but the word ‘sphere’ is always used by coding-theorists.)
The importance of spheres lies in the following lemma.
Lemma 2.4. A code C is t-error-correcting if and only if for any distinct words x, y ∈ C, the spheres
S (x, t) and S (y, t) are disjoint.
This lemma gives us a useful bound on the size of a t-error-correcting code. We begin by counting
the words in a sphere; recall the binomial coefficient \binom{n}{r} = n!/((n − r)! r!).
Lemma 2.5. If A is a q-ary alphabet, x is a word over A of length n and r ≤ n, then the sphere S(x, r) contains exactly

\binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^r\binom{n}{r}

words.
Proof. We claim that for any i, the number of words y such that d(x, y) equals i is (q − 1)^i\binom{n}{i}; the lemma then follows by summing for i = 0, 1, . . . , r.
d(x, y) = i means that x and y differ in exactly i positions. Given x, in how many ways can we choose such a y? We begin by choosing the i positions in which x and y differ; this can be done in \binom{n}{i} ways. Then we choose what symbols will appear in these i positions in y. For each position, we can choose any symbol other than the symbol which appears in that position in x – this gives us q − 1 choices. So we have (q − 1)^i choices for these i symbols altogether.
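Lemma 2.5 is easy to check by brute force for small parameters; a Python sketch (names are mine; math.comb needs Python 3.8+):

    from itertools import product
    from math import comb

    def sphere_size(q, n, r):
        # |S(x, r)| over a q-ary alphabet, by Lemma 2.5
        return sum((q - 1) ** i * comb(n, i) for i in range(r + 1))

    # brute-force comparison for q = 2, n = 4, r = 2:
    x = (0, 0, 0, 0)
    count = sum(1 for y in product(range(2), repeat=4)
                if sum(a != b for a, b in zip(x, y)) <= 2)
    assert count == sphere_size(2, 4, 2) == 11   # 1 + 4 + 6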
Theorem 2.6 (Hamming bound). If C is a t-error-correcting code of length n over a q-ary alphabet A, then

|C| ≤ q^n / (\binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t}).
Proof. Each codeword has a sphere of radius t around it, and by Lemma 2.4 these spheres are disjoint.
So the total number of words in all these spheres together is
M × (\binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t}),
and this can’t be bigger than the total number of possible words, which is qn .
Corollary 2.7.

Aq(n, 2t + 1) ≤ q^n / (\binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t}).
Proof. Suppose C is a q-ary (n, M, 2t + 1)-code. Then C is t-error-correcting (from Section 1), so by the Hamming bound

M ≤ q^n / (\binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t}).

We say that an (n, M, d)-code is perfect if

M = q^n / (\binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^r\binom{n}{r})

for some r, that is, if equality holds in the Hamming bound. For example, if n is odd and q = 2, then the repetition code described in §1.2 is perfect (check this!). Later, we shall see some more interesting examples of perfect codes.
2.3 The Singleton bound

Theorem 2.8 (Singleton bound).
1. Suppose n, d > 1. If a q-ary (n, M, d)-code exists, then a q-ary (n − 1, M, d − 1)-code exists. Hence Aq(n, d) ≤ Aq(n − 1, d − 1).
2. Aq(n, d) ≤ q^{n−d+1}.
Proof.
1. Let C be a q-ary (n, M, d)-code, and for x ∈ C, let x̄ be the word obtained by deleting the last symbol. Let C̄ = {x̄ | x ∈ C}.

Claim. If x, y are distinct codewords in C, then d(x̄, ȳ) ≥ d − 1.

Proof. x and y differ in at least d positions, and at most one of these is the deleted last position.

The first consequence of the claim is that, since d > 1, x̄ and ȳ are distinct when x and y are. So |C̄| = M. The second consequence is that d(C̄) ≥ d − 1. So C̄ is an (n − 1, M, d − 1)-code.
To show that Aq(n, d) ≤ Aq(n − 1, d − 1), take an (n, M, d)-code C with M = Aq(n, d). Then we get an (n − 1, M, d − 1)-code C̄, which means that Aq(n − 1, d − 1) ≥ M = Aq(n, d).
2. We prove this part by induction on d, with the case d = 1 following from Theorem 2.1. Now suppose d > 1 and that the inequality holds for d − 1 (and n − 1). This means

Aq(n − 1, d − 1) ≤ q^{(n−1)−(d−1)+1} = q^{n−d+1}.

By part 1, Aq(n, d) ≤ Aq(n − 1, d − 1) ≤ q^{n−d+1}, as required.
Note that part 1 of the Singleton bound finishes off the proof of Theorem 2.3.
2.4 Another bound

Theorem 2.9. Suppose n > 1. Then Aq(n, d) ≤ q·Aq(n − 1, d).

Proof. It suffices to prove that if a q-ary (n, M, d)-code exists, then so does a q-ary (n − 1, P, d)-code, for some P ≥ M/q. Indeed, we can take M = Aq(n, d), which will give qP ≥ Aq(n, d), so that qAq(n − 1, d) ≥ Aq(n, d). So let C be a q-ary (n, M, d)-code. Look at the last symbol of each codeword, and for each a ∈ A, let n(a) be the number of codewords ending in a.
Claim. There is some a ∈ A with n(a) ≥ M/q.

Proof. Suppose not, i.e. n(a) < M/q for all a ∈ A. Then we get Σ_{a∈A} n(a) < q × (M/q) = M. But Σ_{a∈A} n(a) is the number of codewords, which is M. Contradiction.
So take some a such that n(a) ≥ M/q, and let C′ denote the set of codewords ending in a. For each x ∈ C′, define x̄ to be the word obtained by deleting the last symbol from x, and then define

C̄ = {x̄ | x ∈ C′}.
Claim. If x, y are distinct codewords in C′, then d(x̄, ȳ) ≥ d.

Proof. We have d(x, y) ≥ d, so x and y differ in at least d positions. Furthermore, none of these positions is the last position, since x and y both have an a here. So x and y differ in at least d positions among the first n − 1 positions, which means that x̄ and ȳ differ in at least d places.
The first consequence of this claim is that if x, y ∈ C′ with x ≠ y, then x̄ ≠ ȳ. So |C̄| = |C′| = n(a). The second consequence is that d(C̄) ≥ d. So C̄ is an (n − 1, n(a), d)-code with n(a) ≥ M/q, which is what we needed.
2.5 The Plotkin bound
Now we need to recall some notation: remember that if x ∈ R, then ⌊x⌋ is the largest integer which is less than or equal to x. We shall also need two easy lemmas.

Lemma 2.10. If M and N are integers, then N(M − N) ≤ M^2/4 if M is even, and N(M − N) ≤ (M^2 − 1)/4 if M is odd.

Lemma 2.11. If x ∈ R, then ⌊2x⌋ ≤ 2⌊x⌋ + 1.

Proof. Let y = ⌊x⌋; then x < y + 1. So 2x < 2y + 2, so ⌊2x⌋ ≤ 2y + 1.
Now we can state the Plotkin bound – there are two cases, depending on whether d is even or odd.
But in fact either one of these can be recovered from the other, using Theorem 2.3.
Theorem 2.12 (Plotkin bound).
1. If d is even and n < 2d, then

A2(n, d) ≤ 2⌊d/(2d − n)⌋.

2. If d is odd and n < 2d + 1, then

A2(n, d) ≤ 2⌊(d + 1)/(2d + 1 − n)⌋.
The proof is a double-counting argument. Suppose C is a binary (n, M, d)-code. We suppose that
our alphabet is {0, 1}, and if v = (v1 . . . vn ) and w = (w1 . . . wn ) are codewords, then we define v + w to
be the word (v1 + w1 )(v2 + w2 ) . . . (vn + wn ), where we do addition modulo 2 (so 1 + 1 = 0).
A really useful feature of this addition operation is the following.
Lemma 2.13. Suppose v, w are binary words of length n. Then d(v, w) is the number of 1s in v + w.
Proof. By looking at the possibilities for vi and wi, we see that

(v + w)i = 0 if vi = wi, and (v + w)i = 1 if vi ≠ wi.

So

d(v, w) = |{i | vi ≠ wi}| = |{i | (v + w)i = 1}|.
Now we write down a \binom{M}{2} by n array A whose rows are all the words v + w for pairs of distinct
codewords v, w. We're going to count the number of 1s in this array in two different ways.
Lemma 2.14. The number of 1s in A is at most nM^2/4 if M is even, and at most n(M^2 − 1)/4 if M is odd.
Proof. We count the number of 1s in each column. The word v + w has a 1 in the jth position if and only if one of v and w has a 1 in the jth position, and the other has a 0. If we let N be the number of codewords which have a 1 in the jth position, then the number of ways of choosing a pair v, w such that v + w has a 1 in the jth position is N(M − N). So the number of 1s in the jth column of our array is at most

M^2/4 (if M is even) or (M^2 − 1)/4 (if M is odd),

by Lemma 2.10. This is true for every j, so by adding up we obtain the desired inequality.
Lemma 2.15. The number of 1s in A is at least \binom{M}{2}d.

Proof. By Lemma 2.13, the row v + w contains d(v, w) ≥ d 1s, and there are \binom{M}{2} rows.

Proof of the Plotkin bound. We assume first that d is even. Suppose we have a binary (n, M, d)-code C, and construct the array as above. Now we simply combine the inequalities of Lemma 2.14 and Lemma 2.15. There are two cases, according to whether M is even or odd.
Case 1: M even. By combining the two inequalities, we get

d\binom{M}{2} ≤ nM^2/4,

i.e.

dM^2/2 − dM/2 ≤ nM^2/4,

so

(2d − n)M^2 ≤ 2dM.

By assumption, 2d − n and M are both positive, so we divide both sides by (2d − n)M to get

M ≤ 2d/(2d − n).

But M is an integer, so in fact

M ≤ ⌊2d/(2d − n)⌋ ≤ 2⌊d/(2d − n)⌋ + 1;

since M is even and 2⌊d/(2d − n)⌋ + 1 is odd, we can improve this to

M ≤ 2⌊d/(2d − n)⌋.
Case 2: M odd. Combining the two inequalities in this case, we get

d\binom{M}{2} ≤ n(M^2 − 1)/4,

i.e.

dM/2 ≤ n(M + 1)/4

(dividing both sides by M − 1). It follows that (2d − n)M ≤ n, and hence

M ≤ n/(2d − n) = 2d/(2d − n) − 1.

Now M is an integer, so we get

M ≤ ⌊2d/(2d − n) − 1⌋ = ⌊2d/(2d − n)⌋ − 1 ≤ 2⌊d/(2d − n)⌋ + 1 − 1 = 2⌊d/(2d − n)⌋,

as required.
Now we consider the case where d is odd; but this follows by Theorem 2.3. If d is odd and n < 2d + 1, then d + 1 is even and n + 1 < 2(d + 1). So by the even case of the Plotkin bound we have

A2(n + 1, d + 1) ≤ 2⌊(d + 1)/(2(d + 1) − (n + 1))⌋ = 2⌊(d + 1)/(2d + 1 − n)⌋,

and by Theorem 2.3 this equals A2(n, d).
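Both cases of the Plotkin bound are simple to evaluate; a Python sketch (the function name is mine):

    def plotkin_bound(n, d):
        # Theorem 2.12; valid for n < 2d (d even) or n < 2d + 1 (d odd)
        if d % 2 == 0:
            assert n < 2 * d
            return 2 * (d // (2 * d - n))
        assert n < 2 * d + 1
        return 2 * ((d + 1) // (2 * d + 1 - n))

    assert plotkin_bound(9, 6) == 4   # even case
    assert plotkin_bound(9, 5) == 6   # odd case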
3 Error probabilities and nearest-neighbour decoding

codeword
↓ (noisy channel)
distorted word
↓ (decoding process)
codeword
We’d like the codeword at the bottom to be the same as the codeword at the top as often as possible.
This relies on a good choice of code, and a good choice of decoding process. Most of this course is
devoted to looking at good codes, but here we look at decoding processes. Given a code C of length n over the alphabet A, a decoding process is simply a function from A^n to C – given a received word, we try to 'guess' which word was sent.
We make certain assumptions about our noisy channel, namely that all errors are independent and
equally likely. This means that there is some error probability p such that any transmitted symbol a
will be transmitted correctly with probability 1 − p, or incorrectly with probability p, and that if there
is an error then all the incorrect symbols are equally likely. Moreover, errors on different symbols are
independent – whether an error occurs in one symbol has no effect on whether errors occur in later
symbols. We also assume that p ≤ 1/2.
Suppose we have a decoding process f : A^n → C. We say that f is a nearest-neighbour decoding process if for all w ∈ A^n and all v ∈ C we have

d(w, f(w)) ≤ d(w, v).
This means that for any received word, we decode it using the nearest codeword. Note that some code-
words may be equally near, so there may be several different nearest-neighbour decoding processes
for a given code.
Example. Let C be the binary repetition code of length 5:

C = {00000, 11111}.
Given a code and a decoding process, we consider the word error probability: given a codeword w,
what is the probability that after distortion and decoding, we end up with a different codeword? Let’s
calculate this for the above example, with w = 00000. It’s clear that this will be decoded wrongly if
at least three of the symbols are changed into 1s. If the error probability of the channel is p, then the
probability that this happens is
\binom{5}{3} p^3(1 − p)^2 + \binom{5}{4} p^4(1 − p) + \binom{5}{5} p^5 = 6p^5 − 15p^4 + 10p^3.
For example, if p = 1/4, then the word error probability is only about 0.104.
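The same calculation in Python, for any odd repetition-code length (names are mine):

    from math import comb

    def repetition_word_error(n, p):
        # decoding fails iff more than n//2 symbols are distorted (n odd)
        return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
                   for k in range(n // 2 + 1, n + 1))

    # agrees with 6p^5 - 15p^4 + 10p^3 at p = 1/4:
    assert abs(repetition_word_error(5, 0.25) - 0.103515625) < 1e-12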
In general, word error probability depends on the particular word, and we seek a decoding process
which minimises the maximum word error probability. It can be shown that the best decoding process
in this respect is always a nearest-neighbour decoding process (remembering our assumption that
p ≤ 1/2).
4 Linear codes
For the rest of the course, we shall be restricting our attention to linear codes; these are codes in
which the alphabet A is a finite field, and the code itself forms a vector space over A. These codes are
of great interest because:
• they are easy to describe – we need only specify a basis for our code;
• it is easy to calculate the minimum distance of a linear code – we need only calculate the
distance of each word from the word 00 . . . 0;
• many of the best codes known are linear; in particular, every known non-trivial perfect code has
the same parameters (i.e. length, number of codewords and minimum distance) as some linear
code.
4.1 Revision of linear algebra

First we recall the definition of a field: a field is a set F, with distinguished elements 0 and 1 and binary operations + and ×, such that:
• F forms an abelian group under +, with identity element 0 (that is, we have
a + b = b + a,
(a + b) + c = a + (b + c),
a + 0 = a,
and for each a ∈ F there exists an element −a of F such that −a + a = 0);

• F \ {0} forms an abelian group under ×, with identity element 1 (that is, we have
a × b = b × a,
(a × b) × c = a × (b × c),
a × 1 = a,
and for each a ∈ F \ {0} there exists an element a^{-1} of F such that a^{-1} × a = 1);

• a × (b + c) = (a × b) + (a × c) for all a, b, c ∈ F.
We make all the familiar notational conventions: we may write a × b as a.b or ab; we write a × b^{-1} as a/b; we write a + (−b) as a − b.
We shall need the following familiar property of fields.
We also need to recall the definition of a vector space. If F is a field, then a vector space over F is a set V with a distinguished element 0, a binary operation + under which V is an abelian group, and a function × : (F × V) → V (that is, a function which, given an element λ of F and an element v of V, produces a new element λ × v of V) such that, for all λ, µ ∈ F and u, v ∈ V:

(λ × µ) × v = λ × (µ × v),
(λ + µ) × v = (λ × v) + (µ × v),
λ × (u + v) = (λ × u) + (λ × v),
1 × v = v.
There shouldn’t be any confusion between the element 0 of F and the element 0 of V, or between
the different versions of + and ×. We use similar notational conventions for + and × those that we use
for fields.
If V is a vector space over F, then a subspace is a subset of V which is also a vector space under
the same operations. In fact, a subset W of V is a subspace if and only if
• 0 ∈ W,
• v + w ∈ W, and
• λv ∈ W
whenever v, w ∈ W and λ ∈ F.
Suppose V is a vector space over F and v1, . . . , vn ∈ V. Then we say that v1, . . . , vn are linearly independent if there do not exist λ1, . . . , λn ∈ F, not all zero, such that

λ1v1 + · · · + λnvn = 0.
We define the span of v1, . . . , vn to be the set of all linear combinations of v1, . . . , vn, i.e. the set

⟨v1, . . . , vn⟩ = {λ1v1 + · · · + λnvn | λ1, . . . , λn ∈ F}.

If V and W are vector spaces over F and α : V → W is a linear map, we define

ker(α) = {v ∈ V | α(v) = 0},
Im(α) = {α(v) | v ∈ V}.

ker(α) is a subspace of V, and we refer to its dimension as the nullity of α, which we write n(α). Im(α) is a subspace of W, and we refer to its dimension as the rank of α, which we write r(α). The Rank–nullity Theorem says that if α is a linear map from V to W, then

r(α) + n(α) = dim V.
We shall only be interested in one particular type of vector space. For a non-negative integer n, we consider the set F^n, which we think of as the set of column vectors of length n over F, with operations

(x1, . . . , xn)^T + (y1, . . . , yn)^T = (x1 + y1, . . . , xn + yn)^T

and

λ(x1, . . . , xn)^T = (λx1, . . . , λxn)^T.
F^n is a vector space over F of dimension n. Sometimes we will think of the elements of F^n as row vectors rather than column vectors, or as words of length n over F.
Given m, n and an n × m matrix A over F, we can define a linear map F^m → F^n by

v ↦ Av.

Every linear map from F^m to F^n arises in this way. The rank of A is defined to be the rank of this linear map. The column rank of A is defined to be the dimension of ⟨c1, . . . , cm⟩, where c1, . . . , cm are the columns of A regarded as vectors in F^n, and the row rank is defined to be the dimension of ⟨r1, . . . , rn⟩, where r1, . . . , rn are the rows of A regarded as (row) vectors in F^m. We shall need the result that the rank, row rank and column rank of A are all equal.
Note that when we think of F^n as the space of row vectors rather than column vectors, we may think of a linear map as being multiplication on the right by an m × n matrix.
4.2 Finite fields and linear codes

Theorem 4.2. Let q be an integer greater than 1. Then a field of order q exists if and only if q is a prime power, and all fields of order q are isomorphic.
If q is a prime power, then we refer to the unique field of order q as Fq . For example, if q is
actually a prime, then Fq simply consists of the integers mod q, with the operations of addition and multiplication mod q. It is a reasonably easy exercise to show that this really is a field – the hard part is to show that multiplicative inverses exist, and this is a consequence of Bézout's lemma (via Euclid's algorithm).
If q is a prime power but not a prime, then the field Fq is awkward to describe without developing
lots of theory. But this need not worry us – all the explicit examples we meet will be over fields
of prime order. Just remember that there is a field of each prime power order. As an example, the
addition and multiplication tables for the field of order 4 are given below; we write F4 = {0, 1, a, b}.
+ | 0 1 a b        × | 0 1 a b
0 | 0 1 a b        0 | 0 0 0 0
1 | 1 0 b a        1 | 0 1 a b
a | a b 0 1        a | 0 a b 1
b | b a 1 0        b | 0 b 1 a
What this means for coding theory is that if we have a q-ary alphabet A with q a prime power, then we may assume that A = Fq (since A is just a set of size q with no additional structure, we lose nothing by re-labelling the elements of A as the elements of Fq), and we get lots of extra structure on A (i.e. the structure of a field) and on A^n = F_q^n (the structure of a vector space).
Definition. Assume that q is a prime power and that A = Fq. A linear code over A is a subspace of A^n.
Example. We claim that the binary code

C = {00000, 01101, 10110, 11011}

that we saw earlier is linear. To check this, we have to show that it is closed under addition and scalar multiplication. Scalar multiplication is easy: the only elements of F2 are 0 and 1, and we have
multiplication. Scalar multiplication is easy: the only elements of F2 are 0 and 1, and we have
0x = 00000, 1x = x
for any codeword x. For addition, notice that we have x + x = 00000 and x + 00000 = x for any x, and that the sum of any two of the codewords 01101, 10110, 11011 equals the third. So C is linear.

4.3 The minimum distance of a linear code

Lemma 4.3. Suppose x, y, z ∈ F_q^n, and λ ∈ Fq with λ ≠ 0. Then

d(x + z, y + z) = d(x, y) and d(λx, λy) = d(x, y).

Proof. We have

d(x + z, y + z) = |{i | xi + zi ≠ yi + zi}|.

Now for any xi, yi, zi we have xi + zi ≠ yi + zi if and only if xi ≠ yi (since we can just add or subtract zi to/from both sides). So

d(x + z, y + z) = |{i | xi ≠ yi}| = d(x, y).

Now since λ ≠ 0 we have λxi ≠ λyi if and only if xi ≠ yi (since we can multiply both sides by λ or λ^{-1}). So we find

d(λx, λy) = |{i | λxi ≠ λyi}| = |{i | xi ≠ yi}| = d(x, y).
Definition. Given a word x ∈ F_q^n, the weight of x, written weight(x), is d(x, 00 . . . 0), i.e. the number of non-zero symbols in x.

Corollary 4.4. The minimum distance of a linear code C equals the minimum weight of a non-zero codeword in C.

Proof. It suffices to show that for any δ:

(C contains distinct codewords x, y with d(x, y) = δ) ⇔ (C contains a non-zero codeword x with weight(x) = δ).

(⇐) C must contain the zero element of F_q^n, namely the word 00 . . . 0. This is because C must contain some word x, and hence must contain 0x = 00 . . . 0. So if x is a non-zero codeword with weight δ, then x and 00 . . . 0 are distinct codewords with d(x, 00 . . . 0) = δ.

(⇒) If x, y are distinct codewords with d(x, y) = δ, then x − y is a non-zero codeword with weight(x − y) = δ.
4.4 Bases and generator matrices

Recall that a basis for a vector space V over Fq is a set {e1, . . . , ek} of linearly independent vectors spanning V; equivalently, every v ∈ V can be written uniquely in the form

v = λ1e1 + λ2e2 + · · · + λkek,

with λ1, . . . , λk ∈ Fq.
Example. The set {01101, 11011} is a basis for the code in the last example.
In general, V will have lots of different bases to choose from. But recall from Linear Algebra that any two bases have the same size, and this size we call the dimension of V. So the code in the
examples above has dimension 2. If a code C is of length n and has dimension k as a vector space, we
say that C is an [n, k]-code. If in addition C has minimum distance at least d, we may say that C is an
[n, k, d]-code. So the code in the example above is a binary [5, 2, 3]-code.
Lemma 4.5. If V is a k-dimensional vector space over Fq, then V contains exactly q^k vectors.

Proof. Suppose {e1, . . . , ek} is a basis for V. Then, by the definition of a basis, every element of V is uniquely of the form
λ1 e1 + λ2 e2 + · · · + λk ek ,
for some choice of λ1 , . . . , λk ∈ Fq . On the other hand, every choice of λ1 , . . . , λk gives us an element
of V, so the number of vectors in V is the number of choices of λ1 , . . . , λk . Now there are q ways to
choose each λi (since there are q elements of Fq to choose from), and so the total number of choices
of these scalars is qk .
As a consequence, we see that a q-ary [n, k, d]-code is a q-ary (n, q^k, d)-code. This highlights a
slight disadvantage of linear codes – their sizes must be powers of q. So if we’re trying to find optimal
codes for given values of n, d (i.e. (n, M, d)-codes with M = Aq (n, d)), then we can’t hope to do this
with linear codes if Aq (n, d) is not a power of q. In practice (especially for q = 2) many of the values
of Aq (n, d) are powers of q.
It will be useful in the rest of the course to arrange a basis of our code in the form of a matrix.
Definition. Suppose C is a q-ary [n, k]-code. A generator matrix for C is a k × n matrix with entries
in Fq , whose rows form a basis for C.
Examples.
1. The binary [5, 2, 3]-code from the last example has various different generator matrices; for
example

(0 1 1 0 1)        (1 0 1 1 0)
(1 0 1 1 0),       (1 1 0 1 1).
2. If q is a prime power, then the q-ary repetition code is linear. It has a generator matrix
(11 . . . 1).
3. The binary parity-check code of length n,

C = {v = v1v2 . . . vn | v1, . . . , vn ∈ F2, v1 + · · · + vn = 0},

is also linear.
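Listing the codewords spanned by the rows of a generator matrix is mechanical; a Python sketch, assuming q is prime so that arithmetic mod q is field arithmetic (names are mine):

    from itertools import product

    def codewords(G, q):
        # all q^k linear combinations of the rows of G, coordinatewise mod q
        k, n = len(G), len(G[0])
        return {tuple(sum(c * G[i][j] for i, c in enumerate(coeffs)) % q
                      for j in range(n))
                for coeffs in product(range(q), repeat=k)}

    # the binary [5, 2]-code with basis {01101, 10110}:
    G = [[0, 1, 1, 0, 1], [1, 0, 1, 1, 0]]
    assert codewords(G, 2) == {(0, 0, 0, 0, 0), (0, 1, 1, 0, 1),
                               (1, 0, 1, 1, 0), (1, 1, 0, 1, 1)}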
4.5 Equivalence of linear codes

Recall Operation 2 from §1.3, which we use (together with Operation 1) to define equivalence of codes.

Operation 2: permuting the symbols in a given position Choose i ∈ {1, . . . , n} and a permutation f of A. For v = v1 . . . vn ∈ A^n, define

v f,i = v1 . . . vi−1 f(vi) vi+1 . . . vn.
There’s a slight problem with applying this to linear codes, which is that if C is linear and D is
equivalent to C, then D need not be linear. Operation 1 is OK, as we shall now show.
Lemma 4.6. Suppose C is a linear [n, k, d]-code over Fq, and σ is a permutation of {1, . . . , n}. Then the map

φ : C −→ F_q^n, v ↦ vσ

is linear, and Cσ is a linear [n, k, d]-code over Fq.

Proof. To show that φ is linear, we must show that φ(λv + µw) = λφ(v) + µφ(w) for v, w ∈ C and λ, µ ∈ Fq, i.e.

(φ(λv + µw))i = (λφ(v) + µφ(w))i

for every i ∈ {1, . . . , n}. We have

(φ(λv + µw))i = (λv + µw)σ(i) = λvσ(i) + µwσ(i) = λ(φ(v))i + µ(φ(w))i = (λφ(v) + µφ(w))i,

as required.
Now Cσ is by definition the image of φ, and so is a subspace of F_q^n, i.e. a linear code. We know that d(Cσ) = d(C) from before, and that |Cσ| = |C|. This implies q^{dim Cσ} = q^{dim C} by Lemma 4.5, i.e. dim Cσ = dim C = k, so Cσ is an [n, k, d]-code.
Unfortunately, Operation 2 does not preserve linearity. Here is a trivial example of this. Suppose q = 2, n = 1 and C = {0}. Then C is a linear [1, 0]-code. If we choose i = 1 and f the permutation which swaps 0 and 1, then we have C f,i = {1}, which is not linear. So we need to restrict Operation 2. We define the following.

Operation 2′ Suppose C is a linear code of length n over Fq. Choose i ∈ {1, . . . , n} and a ∈ Fq \ {0}. For v = v1 . . . vn ∈ F_q^n define

va,i = v1 . . . vi−1 (avi) vi+1 . . . vn,

and define Ca,i = {va,i | v ∈ C}.
We want to show that Operation 2′ preserves linearity, dimension and minimum distance. We begin by showing that it's a special case of Operation 2.
Lemma 4.7. If a ∈ Fq \ {0}, then the map

f : Fq −→ Fq, x ↦ ax

is a bijection, i.e. a permutation of Fq.
Proof. Since f is a function from a finite set to itself, we need only show that f is injective. If x, y ∈ Fq and f(x) = f(y), then we have ax = ay. Since a ≠ 0, a has an inverse a^{-1}, and we can multiply both sides by a^{-1} to get x = y. So f is injective.
Now we show that the operation which sends v to va,i is linear, which will mean that it sends linear
codes to linear codes.
Lemma 4.8. Suppose C is a linear [n, k, d]-code over Fq, i ∈ {1, . . . , n} and 0 ≠ a ∈ Fq. Then the map

φ : F_q^n −→ F_q^n, v ↦ va,i

is linear, and Ca,i is a linear [n, k, d]-code over Fq.
Proof. For any vector v ∈ F_q^n, we have

φ(v)j = (va,i)j = avj (if j = i), vj (if j ≠ i).
Now take v, w ∈ F_q^n and λ, µ ∈ Fq. We must show that
(φ(λv + µw)) j = (λφ(v) + µφ(w)) j
for each j ∈ {1, . . . , n}. For j , i we have
(φ(λv + µw)) j = (λv + µw) j
= (λv) j + (µw) j
= λv j + µw j
= λ(φ(v)) j + µ(φ(w)) j
= (λφ(v)) j + (µφ(w)) j ,
= (λφ(v) + µφ(w)) j ,
while for j = i we have
(φ(λv + µw)) j = a(λv + µw) j
= a((λv) j + (µw) j )
= a(λv j + µw j )
= aλv j + aµw j
= λ(av j ) + µ(aw j )
= λ(φ(v) j ) + µ(φ(w) j )
= (λφ(v)) j + (µφ(w)) j
= (λφ(v) + µφ(w)) j ,
as required.
Now Ca,i is by definition the image of φ, and this is a subspace of F_q^n, i.e. a linear code. We know from before (since Operation 2′ is a special case of Operation 2) that d(Ca,i) = d(C) and |Ca,i| = |C|, and this gives dim Ca,i = dim C = k, so that Ca,i is a linear [n, k, d]-code.
In view of these results we re-define equivalence for linear codes: we say that linear codes C and D are equivalent if we can get from one to the other by applying Operations 1 and 2′ repeatedly.
Example. Let n = q = 3, and define
Then C, D and E are all [3, 1]-codes over F3 (check this!). We can get from C to D by swapping the first two positions, and we can get from D to E by multiplying everything in the third position by 2. So C, D and E are equivalent linear codes.
We’d like to know the relationship between equivalence and generator matrices: if C and D are
equivalent linear codes, how are their generator matrices related? Well, a given code usually has
more than one choice of generator matrix, and so first we’d like to know how two different generator
matrices for the same code are related.
We define the following operations on matrices over Fq:
MO1. permuting the rows;
MO2. multiplying a row by a non-zero element of Fq;
MO3. adding a multiple of a row to another row;
MO4. permuting the columns;
MO5. multiplying a column by a non-zero element of Fq.
You should recognise these as the ‘elementary row operations’ from Linear Algebra. Their im-
portance is as follows.
Lemma 4.9. Suppose C is a linear [n, k]-code with generator matrix G. If the matrix H can be
obtained from G by applying the row operations (MO1–3) repeatedly, then H is also a generator
matrix for C.
Proof. Since G is a generator matrix for C, we know that the rows of G are linearly independent and
span C. So G has rank k (the number of rows) and row space C. We know from linear algebra that
elementary row operations do not affect the rank or the row space of a matrix, so H also has rank k
and row space C. So the rows of H are linearly independent and span C, so form a basis for C, i.e. H
is a generator matrix for C.
Lemma 4.10. Suppose C is a linear [n, k]-code over Fq , with generator matrix G. If the matrix H
is obtained from G by applying matrix operation 4 or 5, then H is a generator matrix for a code D
equivalent to C.
Proof. Suppose G has entries gjl, for 1 ≤ j ≤ k and 1 ≤ l ≤ n. Let r1, . . . , rk be the rows of G, i.e.

rj = gj1gj2 . . . gjn.
If H is obtained from G by matrix operation 4, permuting the columns via a permutation σ, then the jth row of H is (rj)σ, which lies in the code Cσ, equivalent to C. If H is obtained by matrix operation 5, multiplying column i by a non-zero scalar a, then the jth row of H is

gj1 . . . gj,i−1 (agji) gj,i+1 . . . gjn.

But this is the word (rj)a,i as defined in equivalence operation 2′. So the rows of H lie in the code Ca,i, which is equivalent to C.
For either matrix operation, we have seen that the rows of H lie in a code D equivalent to C. We
need to know that they form a basis for D. Since there are k rows and dim(D) = dim(C) = k, it
suffices to show that the rows of H are linearly independent, i.e. to show that H has rank k. But matrix operations 4 and 5 are elementary column operations, and we know from linear algebra that these don't affect the rank of a matrix. So rank(H) = rank(G) = k.
Proposition 4.11. Suppose C is a linear [n, k]-code with a generator matrix G, and that the matrix H
is obtained by applying a sequence of matrix operations 1–5. Then H is a generator matrix for a code
D equivalent to C.
Proof. By applying matrix operations 1–3, we get a new generator matrix for C, by Lemma 4.9, and
C is certainly equivalent to itself. By applying matrix operations 4 and 5, we get a generator matrix
for a code equivalent to C, by Lemma 4.10.
Note that in the list of matrix operations 1–5, there is a sort of symmetry between rows and columns. In fact, you might expect that you can do another operation

MO6. adding a multiple of a column to another column,

but you can't. Doing this can take you to a code with a different minimum distance. For example, suppose q = 2, and that C is the parity-check code of length 3,

C = {000, 011, 101, 110},

which has minimum distance 2 and generator matrix

G = (1 0 1
     0 1 1).

If we applied operation 6 above, adding column 1 to column 2, we'd get the matrix

H = (1 1 1
     0 1 1),

which generates a code with minimum distance 1 (it contains 111 + 011 = 100), so is not equivalent to C. So the difference between 'row operations' and 'column operations' is critical.
Armed with these operations, we can define a standard way in which we can write generator
matrices.
Definition. Let G be a k × n generator matrix. We say that G is in standard form if

G = (I_k | A),

where I_k is the k × k identity matrix and A is some k × (n − k) matrix.

For example, the generator matrix G for the binary parity-check code given above is in standard form.
Lemma 4.12. Suppose G is a k × n matrix over Fq whose rows are linearly independent. By applying
matrix operations 1–5, we may transform G into a matrix in standard form.
Proof. The idea is to transform G one column at a time, so that after the ith stage, column i of the matrix is the standard basis vector (0, . . . , 0, 1, 0, . . . , 0)^T, where the 1 is in position i. Suppose we have already done this for columns 1, . . . , i − 1.
Step 1. Since the rows of our matrix are linearly independent, there must be some non-zero entry in
the ith row. Furthermore, by what we know about columns 1, . . . , i − 1, this non-zero entry must
occur in one of columns i, . . . , n. So we apply matrix operation 4, permuting columns i, . . . , n
to get a non-zero entry in the (i, i)-position of our matrix.
Step 2. Suppose the (i, i)-entry of our matrix is a ≠ 0. Then we apply matrix operation 2, multiplying row i of our matrix by a^{-1}, to get a 1 in the (i, i)-position. Note that this operation does not affect columns 1, . . . , i − 1.
Step 3. We now apply matrix operation 3, adding multiples of row i to the other rows in order to 'kill' the remaining non-zero entries in column i. Note that this operation does not affect columns 1, . . . , i − 1.
By applying Steps 1–3 for i = 1, . . . , k in turn, we get a matrix in standard form. Note that it is automatic from the proof that k ≤ n.
Corollary 4.13. Suppose C is a linear [n, k]-code over Fq . Then C is equivalent to a code with a
generator matrix in standard form.
Proof. Let G be a generator matrix for C. Then G has linearly independent rows, so by Lemma 4.12 we can transform G into a matrix H in standard form using matrix operations 1–5. By Proposition 4.11, H is a generator matrix for a code equivalent to C.
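The proof of Lemma 4.12 is an algorithm, and can be followed literally; a Python sketch over F_q with q prime (names are mine; pow(a, -1, q) computes a^{-1} mod q and needs Python 3.8+):

    def standard_form(G, q):
        # transform G (rows linearly independent over F_q, q prime) into
        # (I_k | A) by row operations and column swaps, as in Lemma 4.12
        G = [row[:] for row in G]
        k, n = len(G), len(G[0])
        for i in range(k):
            # Step 1: swap columns so row i has a non-zero entry in column i
            j = next(c for c in range(i, n) if G[i][c] != 0)
            for row in G:
                row[i], row[j] = row[j], row[i]
            # Step 2: multiply row i by the inverse of the pivot
            inv = pow(G[i][i], -1, q)
            G[i] = [x * inv % q for x in G[i]]
            # Step 3: add multiples of row i to the others to clear column i
            for r in range(k):
                if r != i and G[r][i] != 0:
                    c = G[r][i]
                    G[r] = [(a - c * b) % q for a, b in zip(G[r], G[i])]
        return G

    G = [[0, 1, 1, 0, 1], [1, 0, 1, 1, 0]]
    assert standard_form(G, 2) == [[1, 0, 1, 0, 1], [0, 1, 1, 1, 0]]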
4.6 Decoding with a linear code

Definition. Suppose C is a linear [n, k]-code over Fq, and v ∈ F_q^n. The coset of C determined by v is the set

v + C = {v + w | w ∈ C}.

You should remember the word coset from group theory, and this is exactly what cosets are here – cosets of the group C as a subgroup of F_q^n.
Proposition 4.14. Suppose C is a linear [n, k]-code over Fq, and u, v ∈ F_q^n. Then:
1. the coset v + C contains exactly q^k words;
2. v is contained in the coset v + C;
3. if v ∈ u + C, then v + C = u + C;
4. any two cosets of C are either equal or disjoint;
5. there are exactly q^{n−k} different cosets of C.

Proof.
1. C contains q^k words, and the map from C to v + C given by w ↦ v + w is a bijection.
5. There are q^n words altogether in F_q^n, and each of them is contained in exactly one coset of C. Each coset has size q^k, and so the number of cosets must be q^n/q^k = q^{n−k}.
Given a linear [n, k]-code C and a coset w + C, we define a coset leader to be a word of minimal weight in w + C. Now we define a Slepian array to be a q^{n−k} × q^k array, constructed as follows:
• choose one leader from each coset (note that the word 00 . . . 0 must be the leader chosen from
the coset 00 . . . 0 + C = C, since it has smaller weight than any other word);
• in the first row of the array, put all the codewords, with 00 . . . 0 at the left and the other code-
words in any order;
• in the first column put all the coset leaders – the word 00 . . . 0 is at the top, and the remaining
leaders go in any order;
• now fill in the remaining entries by letting the entry in row i and column j be

(coset leader at the start of row i) + (codeword at the top of column j).
Example. For the code in the last example, we may choose 00, 02 and 10 as coset leaders, and draw
the Slepian array
00 12 21
02 11 20 .
10 22 01
Lemma 4.15. Every word in F_q^n appears in a Slepian array for C.

Proof. Let v be a word in F_q^n. Then v lies in some coset x + C, by Proposition 4.14(2). Let y be the chosen leader for this coset; then y appears in column 1, in row i, say. Since y ∈ x + C, we have y + C = x + C, by Proposition 4.14(3). So v ∈ y + C, and so we can write v = y + u, where u ∈ C. u lies in row 1 of the array, in column j, say, and so v lies in row i and column j of the array.
Now we show how to use a Slepian array to construct a decoding process. Let C be an [n, k]-code over Fq, and let S be a Slepian array for C. We define a decoding process f : F_q^n → C as follows. For v ∈ F_q^n, we find v in the array S (which we can do, by Lemma 4.15). Now we let f(v) be the codeword at the top of the same column as v.
We claim that f is a nearest-neighbour decoding process. To see this, take v ∈ F_q^n and w ∈ C; we must show that d(v, f(v)) ≤ d(v, w). Find v in the Slepian array, and let u be the word at the start of the same row as v. Then, by the construction of the Slepian array,

v = u + f(v) ∈ u + C.

This gives

v − w = u + (f(v) − w) ∈ u + C.

Of course u ∈ u + C, and u was chosen to be a leader for this coset, which means that

weight(v − w) ≥ weight(u) = weight(v − f(v)),

i.e. d(v, w) ≥ d(v, f(v)), as required.
Example. Let q = 2 and

C = {000, 111}.

In practice, we can build a Slepian array for C row by row: having written the codewords in the first row, we repeatedly take a word of smallest weight not yet appearing in the array as the next coset leader, and fill in its row by adding it to the codeword at the top of each column, putting the sum in column j.
This is a much better way to construct Slepian arrays, since we only need to know the code, not the cosets. However, we'll see that we can do better than to use a Slepian array in the next section.
5 Dual codes and parity-check matrices

Definition. Given words v, w ∈ F_q^n, the dot product v.w is defined to be

v.w = v1w1 + v2w2 + · · · + vnwn ∈ Fq.

Lemma 5.1. Suppose v, v′, w ∈ F_q^n and λ, µ ∈ Fq. Then v.w = w.v and (λv + µv′).w = λ(v.w) + µ(v′.w).

Proof. We have

v.w = Σ_{i=1}^{n} viwi = Σ_{i=1}^{n} wivi = w.v.

Also,

(λv + µv′).w = Σ_{i=1}^{n} (λv + µv′)iwi = Σ_{i=1}^{n} (λvi + µv′i)wi = λ Σ_{i=1}^{n} viwi + µ Σ_{i=1}^{n} v′iwi = λ(v.w) + µ(v′.w).

Definition. Let C be a linear code of length n over Fq. The dual code of C is

C⊥ = {w ∈ F_q^n | v.w = 0 for all v ∈ C}.
Lemma 5.2. Suppose C is a linear [n, k]-code and G is a generator matrix for C. Then

w ∈ C⊥ ⇔ GwT = 0.
Note that we think of the elements of F_q^n as row vectors; if w is the row vector (w1 . . . wn), then wT is the column vector with entries w1, . . . , wn. G is a k × n matrix, so GwT is a column vector of length k.
Proof. Suppose G has entries gij, for 1 ≤ i ≤ k and 1 ≤ j ≤ n. Let gi denote the ith row of G. Then g1, . . . , gk are words which form a basis for C, and

gi = gi1 . . . gin.
Now

(GwT)i = Σ_{j=1}^{n} gijwj = gi.w,

so GwT = 0 if and only if gi.w = 0 for all i.
(⇒) If w ∈ C⊥ , then v.w = 0 for all v ∈ C. In particular, gi .w = 0 for i = 1, . . . , k. So GwT = 0.
(⇐) If GwT = 0, then gi .w = 0 for i = 1, . . . , k. Now g1 , . . . , gk form a basis for C, so any v ∈ C can
be written as
v = λ1 g 1 + · · · + λk g k
for λ1 , . . . , λk ∈ Fq . Then
v.w = (λ1 g1 + · · · + λk gk ).w
= λ1 (g1 .w) + · · · + λk (gk .w)
by Lemma 5.1
= 0 + · · · + 0.
So w ∈ C⊥ .
This lemma gives us another way to think of C⊥ – it is the kernel of any generator matrix of C.
Examples.
1. Suppose q = 3 and C is the repetition code {000, 111, 222}. This has a 1 × 3 generator matrix (1 1 1), so

C⊥ = {w ∈ F_3^3 | 111.w = 0} = {w ∈ F_3^3 | w1 + w2 + w3 = 0} = {000, 012, 021, 102, 111, 120, 201, 210, 222},

the linear [3, 2]-code with basis {210, 201}.
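C⊥ can be computed from its definition by brute force for small parameters; a Python sketch (names are mine, q prime):

    from itertools import product

    def dual_code(code, q, n):
        # all words orthogonal to every codeword
        return {w for w in product(range(q), repeat=n)
                if all(sum(a * b for a, b in zip(v, w)) % q == 0
                       for v in code)}

    C = {(0, 0, 0), (1, 1, 1), (2, 2, 2)}     # the ternary repetition code
    assert dual_code(C, 3, 3) == {w for w in product(range(3), repeat=3)
                                  if sum(w) % 3 == 0}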
2. Let q = 2 and C = {0000, 0101, 1010, 1111}. This has generator matrix

G = (1 0 1 0
     0 1 0 1),

and we have

GwT = (w1 + w3
       w2 + w4).

So C⊥ is the set of all words w with w1 + w3 = w2 + w4 = 0, i.e.

C⊥ = {0000, 0101, 1010, 1111}.
So C⊥ = C. This is something which can’t happen for real vector spaces and their orthogonal
complements.
3. Let q = 2, and C = {000, 001, 110, 111}. Then C has generator matrix

G = (0 0 1
     1 1 0),

and C⊥ is the set of words w with w3 = 0 and w1 + w2 = 0, i.e. C⊥ = {000, 110}.
Note that in all these examples, C⊥ is a subspace of F_q^n, i.e. a linear code. In fact, this is true in general.
Theorem 5.3. Suppose C is a linear [n, k]-code over Fq. Then C⊥ is a linear [n, n − k]-code.

Proof. Let G be a generator matrix of C. Then Lemma 5.2 says that C⊥ = ker(G). So by the rank–
nullity theorem C⊥ is a subspace of F_q^n, i.e. a linear code, and the dimension of C⊥ is n minus the rank of G. Recall that the rank of a matrix is the maximum l such that G has a set of l linearly independent
rows. Well, G has k rows, and since they form a basis for C, they must be linearly independent. So G
has rank k, and the theorem is proved.
Example. Suppose q = 2 and C is the repetition code {00 . . . 0, 11 . . . 1} of length n. C has generator matrix

G = (1 1 . . . 1),

and so

C⊥ = {v ∈ F_2^n | v1 + · · · + vn = 0},

the parity-check code. By Theorem 5.3, this is an [n, n − 1]-code, so contains 2^{n−1} words.
Theorem 5.4. Suppose C is a linear code. Then (C⊥)⊥ = C.

Proof.
Claim. C ⊆ (C⊥)⊥.
Proof. C⊥ = {w ∈ F_q^n | v.w = 0 for all v ∈ C}. This means that v.w = 0 for all v ∈ C and w ∈ C⊥. Another way of saying this is that if v ∈ C, then w.v = 0 for all w ∈ C⊥, i.e. v ∈ (C⊥)⊥.

If C is an [n, k]-code, then Theorem 5.3 says that C⊥ is an [n, n − k]-code. By applying Theorem 5.3 again, we find that (C⊥)⊥ is an [n, k]-code. So we have (C⊥)⊥ ⊇ C, and (C⊥)⊥ and C have the same dimension, and so (C⊥)⊥ = C.
Definition. Let C be a linear code. A parity-check matrix for C is a generator matrix for C⊥ .
We will see in the rest of the course that parity-check matrices are generally more useful than
generator matrices. Here is an instance of this.
Lemma 5.5. Let C be a code, H a parity-check matrix for C, and v a word in F_q^n. Then v ∈ C if and only if HvT = 0.
Proof. By Theorem 5.4 we have v ∈ C if and only if v ∈ (C⊥)⊥. Now H is a generator matrix for C⊥, and so by Lemma 5.2 we have v ∈ (C⊥)⊥ if and only if HvT = 0.
But can we find a parity-check matrix? The following lemma provides a start.
Lemma 5.6. Suppose C is a linear [n, k]-code with generator matrix G, and let H be any (n − k) × n matrix. Then H is a parity-check matrix for C if and only if
• the rows of H are linearly independent, and
• GHT = 0.
Proof. Let h1 , . . . , hn−k be the rows of H. Then the ith column of GH T is GhTi , so GH T = 0 if and
only if GhTi = 0 for i = 1, . . . , n − k.
(⇒) If H is a parity-check matrix for C, then it is a generator matrix for C⊥ , so its rows h1 , . . . , hn−k
form a basis for C⊥ ; in particular, they are linearly independent. Also, since h1 , . . . , hn−k lie in
C⊥ , we have GhTi = 0 for each i by Lemma 5.2, so GH T = 0.
(⇐) Suppose that GH T = 0 and the rows of H are linearly independent. Then GhTi = 0 for each i,
and so h1 , . . . , hn−k lie in C⊥ by Lemma 5.2. So the rows of H are linearly independent words
in C⊥ . But the dimension of C⊥ is n − k (the number of rows of H), so in fact these rows form
a basis for C⊥ , and hence H is a generator matrix for C⊥ , i.e. a parity-check matrix for C.
This helps us to find a parity-check matrix if we already have a generator matrix. If the generator
matrix is in standard form, then a parity-check matrix is particularly easy to find.
Lemma 5.7. Suppose C is a linear [n, k]-code over Fq, and that

G = (I_k | A)
is a generator matrix for C in standard form: Ik is the k by k identity matrix, and A is some k by n − k
matrix. Then the matrix
H = (−AT |In−k )
is a parity-check matrix for C.
Proof. Certainly H is an (n − k) × n matrix. The last n − k columns of H are the standard basis vectors, and so are linearly independent. So H has rank at least n − k, and hence the rows of H must be linearly independent. It is a simple exercise (which you should do!) to check that GHT = 0, and we
can appeal to Lemma 5.6.
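Lemma 5.7 translates directly into code; a Python sketch (names are mine; matrices are lists of rows over F_q with q prime):

    def parity_check_from_standard(G, q):
        # G = (I_k | A)  ->  H = (-A^T | I_{n-k}), as in Lemma 5.7
        k, n = len(G), len(G[0])
        A = [row[k:] for row in G]
        return [[(-A[i][j]) % q for i in range(k)]
                + [1 if c == j else 0 for c in range(n - k)]
                for j in range(n - k)]

    G = [[1, 0, 1, 0, 1], [0, 1, 1, 1, 0]]       # binary, standard form
    H = parity_check_from_standard(G, 2)
    # G H^T = 0: every row of G is orthogonal to every row of H
    assert all(sum(g * h for g, h in zip(gr, hr)) % 2 == 0
               for gr in G for hr in H)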
In view of Lemma 5.7, we say that a parity-check matrix is in standard form if it has the form (B | I_{n−k}).

Definition. Suppose C is a linear [n, k]-code over Fq with parity-check matrix H. For y ∈ F_q^n, the syndrome of y is

S(y) = yHT ∈ F_q^{n−k}.
We saw above (Lemma 5.5) that y ∈ C if and only if HyT = 0, i.e. if and only if S(y) = 0. So the syndrome of a word tells us whether it lies in our code. In fact, the syndrome tells us which coset of our code the word lies in.
Lemma 5.8. Suppose C is a linear [n, k]-code, and v, w are words in F_q^n. Then v and w lie in the same coset of C if and only if S(v) = S(w).
Proof.
v and w lie in the same coset ⇔ v ∈ w + C
⇔ v = w + x for some x ∈ C
⇔ v − w ∈ C
⇔ H(vT − wT) = 0
⇔ HvT − HwT = 0
⇔ HvT = HwT
⇔ S(v) = S(w).
In view of this lemma, we can talk about the syndrome of a coset to mean the syndrome of a word in that coset. Note that if C is an [n, k]-code, then a syndrome is a row vector of length n − k. So there are q^{n−k} possible syndromes. But we saw after Proposition 4.14 that there are also q^{n−k} different cosets of C, so in fact each possible syndrome must appear as the syndrome of a coset.
The point of this is that we can use syndromes to decode a linear code without having to write out
a Slepian array. We construct a syndrome look-up table as follows.
1. Choose a parity-check matrix H for C.
2. Choose a set of coset leaders (i.e. one leader from each coset) and write them in a list.
3. For each coset leader, calculate its syndrome and write this next to it.
Example. Let q = 3, and consider the repetition code

C = {000, 111, 222}

again. This has generator matrix

G = (1 1 1),

and hence parity-check matrix

H = (1 0 2
     0 1 2).
There are 9 cosets of C, and a set of coset leaders, with their syndromes, is

leader   syndrome
000      00
001      22
002      11
010      01
020      02
100      10
200      20
012      12
021      21
Given a syndrome look-up table for a code C, we can construct a decoding process as follows.
• Given a received word w, calculate its syndrome S(w).
• Find this syndrome in the syndrome look-up table; let v be the coset leader with this syndrome.
• Define g(w) = w − v.
Lemma 5.9. The decoding process g that we obtain for C using a syndrome look-up table with coset
leaders l1 , . . . , lr is the same as the decoding process f that we obtain using a Slepian array with coset
leaders l1 , . . . , lr .
Proof. When we decode w using g, we find the coset leader with the same syndrome as w; by Lemma 5.8, this means the leader in the same coset as w. When we decode w using f, we write

w = (leader at the left of the same row as w) + (codeword at the top of the same column as w),

so

f(w) = w − (leader at the left of the same row as w) = w − (leader in the same coset as w)

(since the words in any one row of a Slepian array form a coset), and this equals g(w).
So we have seen the advantages of linear codes – although there might often be slightly larger
non-linear codes with the same parameters, it requires a lot more work to describe them and to encode
and decode.
Now we’ll look at some specific examples of codes.
Definition. Let r be any positive integer, and let Hr be the r by 2r − 1 matrix whose columns are all
the different non-zero vectors over F2 . Define the binary Hamming code Ham(r, 2) to be the binary
[2r − 1, 2r − r − 1]-code with Hr as its parity-check matrix.
Example.
• For r = 1, we have

H = (1),

so that Ham(1, 2) is the [1, 0]-code {0}.

• For r = 2, we may take

H = (0 1 1
     1 0 1).

Hence Ham(2, 2) is the [3, 1]-code {000, 111}, the binary repetition code of length 3.
Note that the Hamming code is not uniquely defined – it depends on the order you choose for the
columns of H. But choosing different orders still gives equivalent codes, so we talk of the Hamming
code Ham(r, 2).
Now we consider q-ary Hamming codes for an arbitrary prime power q. We impose a relation on the non-zero vectors in F_q^r by saying that v ≡ w if and only if v = λw for some non-zero λ ∈ Fq.
Now we count the equivalence classes of non-zero words (the zero word lies in an equivalence
class on its own).
Lemma 6.2. Under the equivalence relation ≡, there are exactly (q^r − 1)/(q − 1) equivalence classes of non-zero words.
Proof. Take v ∈ F_q^r \ {0}. The equivalence class containing v consists of all words λv for λ ∈ Fq \ {0}. There are q − 1 possible choices of λ, and these give distinct words: if λ ≠ µ, then (since v ≠ 0) λv ≠ µv. So there are exactly q − 1 words in each equivalence class. There are q^r − 1 non-zero words altogether, so the number of equivalence classes is (q^r − 1)/(q − 1).
Now choose vectors v1, . . . , vN, where N = (q^r − 1)/(q − 1), choosing exactly one from each equivalence class. Define H to be the r × N matrix with columns v1, . . . , vN, and define the q-ary Hamming code Ham(r, q) to be the [N, N − r]-code over Fq with parity-check matrix H. Again, the actual code depends on the order of v1, . . . , vN (and on the choice of v1, . . . , vN), but different choices give equivalent codes.
Example.
• Let q = 5 and r = 2. H may be chosen to equal

(1 1 1 1 1 0
 1 2 3 4 0 1),

so that Ham(2, 5) is the [6, 4]-code over F5 with generator matrix

(1 0 0 0 4 4
 0 1 0 0 4 3
 0 0 1 0 4 2
 0 0 0 1 4 1).
• Let q = 3 and r = 3. H may be chosen to be

(0 0 1 1 1 1 1 1 1 1 1 0 0
 1 1 0 0 1 1 1 2 2 2 0 1 0
 1 2 1 2 0 1 2 0 1 2 0 0 1),

and Ham(3, 3) is the [13, 10]-code with generator matrix

(1 0 0 0 0 0 0 0 0 0 0 2 2
 0 1 0 0 0 0 0 0 0 0 0 2 1
 0 0 1 0 0 0 0 0 0 0 2 0 2
 0 0 0 1 0 0 0 0 0 0 2 0 1
 0 0 0 0 1 0 0 0 0 0 2 2 0
 0 0 0 0 0 1 0 0 0 0 2 2 2
 0 0 0 0 0 0 1 0 0 0 2 2 1
 0 0 0 0 0 0 0 1 0 0 2 1 0
 0 0 0 0 0 0 0 0 1 0 2 1 2
 0 0 0 0 0 0 0 0 0 1 2 1 1).
It may not be obvious how to choose the vectors v1 , . . . , vN , but there is a trick: choose all non-
zero vectors whose first non-zero entry is a 1. You might like to prove as an exercise that this always
works, but you can use this trick without justification in the exam.
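The trick is easy to implement; a Python sketch generating one column per equivalence class (names are mine):

    from itertools import product

    def hamming_columns(r, q):
        # non-zero vectors in F_q^r whose first non-zero entry is 1:
        # exactly one representative from each equivalence class of ≡
        return [v for v in product(range(q), repeat=r)
                if any(v) and next(x for x in v if x) == 1]

    assert len(hamming_columns(2, 5)) == 6     # N = (5^2 - 1)/(5 - 1)
    assert len(hamming_columns(3, 3)) == 13    # N = (3^3 - 1)/(3 - 1)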
Lemma 6.3. Ham(r, q) is an [N, N − r]-code over Fq, where as above N = (q^r − 1)/(q − 1).

Proof. H is an r × N matrix, and so Ham(r, q)⊥ is an [N, r]-code. So Ham(r, q) is an [N, N − r]-code, by Theorem 5.3.
Proposition 6.4. The minimum distance of Ham(r, q) is at least 3.

Proof. Since Ham(r, q) is linear, it's enough to show that the minimum weight of a non-zero codeword is at least 3, i.e. that there are no codewords of weight 1 or 2. We have a parity-check matrix H, and by Lemma 5.5 a word w lies in Ham(r, q) if and only if HwT = 0. So all we need to do is show that HwT ≠ 0 whenever w is a word of weight 1 or 2.
Suppose that w has weight 1, with

wi = λ ≠ 0, wj = 0 for j ≠ i.

Recall that the columns of H are v1, . . . , vN. We calculate that HwT = λvi. Now vi ≠ 0 by construction, and λ ≠ 0, and so HwT ≠ 0, so w ∉ Ham(r, q).
Next suppose w has weight 2, with

wi = λ ≠ 0, wj = µ ≠ 0, wk = 0 for k ≠ i, j.

Then HwT = λvi + µvj. If this were 0, we would have vi = (−µ/λ)vj, so that vi ≡ vj; but vi and vj were chosen from different equivalence classes of ≡, a contradiction. So HwT ≠ 0, and w ∉ Ham(r, q).
Theorem 6.5. Ham(r, q) is a perfect 1-error-correcting code.

Proof. Ham(r, q) is 1-error-correcting since its minimum distance is greater than 2. To show that it is perfect, we have to show that equality holds in the Hamming bound, i.e.

|Ham(r, q)| = q^N / (\binom{N}{0} + (q − 1)\binom{N}{1}),

where N = (q^r − 1)/(q − 1).
Since Ham(r, q) is an [N, N − r]-code, the left-hand side equals q^{N−r} by Lemma 4.5. The right-hand side equals

q^N / (1 + (q − 1)N) = q^N / (1 + (q − 1)(q^r − 1)/(q − 1)) = q^N / (1 + q^r − 1) = q^{N−r},

as required.
For binary Hamming codes, there is quite a neat way to do syndrome decoding. First, we need to
know what the coset leaders look like in a Hamming code.
Lemma 6.6. Suppose C = Ham(r, q), and that D is a coset of C. Then D contains a unique word of
weight at most 1.
Proof. First we’ll show that the number of words in Frq equals the number of cosets, so that there is
one word of weight at most 1 per coset on average. Then we’ll show that any coset contains at most
one word of weight 1. This will then imply that each coset contains exactly one word of weight 1.
C is an [N, N − r]-code, so there are qr cosets of C (see the discussion after Proposition 4.14).
Now we look at the number of words of weight at most 1. There is one word of weight 0. To specify
a word of weight 1, we need to choose what the non-zero entry in the word will be (q − 1 choices),
and where it will occur (N choices). So the number of words of weight at most 1 is
qr − 1
1 + (q − 1)N = 1 + (q − 1) = qr .
q−1
For the second part, suppose v and w are words of weight at most 1 lying in the same coset. Then v ∈ w + C, so v = w + x for some x ∈ C, i.e. v − w ∈ C. Now v and −w are words of weight at most 1, and so v − w has weight at most 2. But d(C) ≥ 3, so the only word in C of weight at most 2 is the zero word. So v − w = 0, i.e. v = w, and so a coset contains at most one word of weight at most 1.
The preceding lemma tells us that the coset leaders for a Hamming code must be precisely all the words of weight at most 1. Now we restrict our attention to the case q = 2. Recall that the columns of Hr are precisely all the different non-zero column vectors over F2 – these give the binary representations of the numbers 1, 2, . . . , 2^r − 1. We order the columns of Hr so that column i gives the binary representation of the number i.
Lemma 6.7. Let S(w) be the syndrome of w. If w ∈ C, then S(w) = 0. Otherwise, write w = v + ej, where v ∈ C and ej is the word with a 1 in position j and 0s elsewhere; then S(w) gives the binary representation of j.
Proof. Since w − e j lies in the code, e j must lie in the same coset as w. So e j has the same syndrome
as w. But clearly the syndrome of e j is the jth row of H T , and we picked H so that the jth row of H T
is the binary representation of j.
Example. Continuing the last example: suppose the codeword 0101010 is transmitted (exercise:
check that this is a codeword, given the way we’ve chosen Hr ). Suppose that the fifth digit gets
distorted to a 1, so we receive 0101110. We calculate the syndrome

(0 1 0 1 1 1 0) (0 0 1
                 0 1 0
                 0 1 1
                 1 0 0
                 1 0 1
                 1 1 0
                 1 1 1)  =  (1 0 1),
which is the binary representation of the number 5. So we know to change the fifth digit to recover
the codeword.
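The rule "the syndrome spells out the position of the error in binary" is a one-liner to implement; a Python sketch for r = 3 (names are mine):

    # rows of Hr^T: the binary representations of 1, ..., 7
    HT = [[(i >> 2) & 1, (i >> 1) & 1, i & 1] for i in range(1, 8)]

    def decode_hamming(w):
        # syndrome of w, interpreted as a position (0 means no error)
        s = [sum(w[i] * HT[i][j] for i in range(7)) % 2 for j in range(3)]
        j = 4 * s[0] + 2 * s[1] + s[2]
        if j:
            w = w[:j - 1] + [1 - w[j - 1]] + w[j:]   # flip bit j (1-based)
        return w

    assert decode_hamming([0, 1, 0, 1, 1, 1, 0]) == [0, 1, 0, 1, 0, 1, 0]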
Theorem 6.8. Suppose C is an [n, k]-code with parity-check matrix H. Then C has minimum distance at least d if and only if any d − 1 columns of H are linearly independent.
In particular, an [n, k, d]-code exists if and only if there is a sequence of n vectors in F_q^{n−k} such that any d − 1 of them are linearly independent.
Proof. Let c1, . . . , cn be the columns of H, and suppose first that there are columns ci1, . . . , cid−1 which are linearly dependent, i.e. there are scalars λ1, . . . , λd−1 (not all zero) such that

λ1ci1 + · · · + λd−1cid−1 = 0.

Let w be the word which has λ1, . . . , λd−1 in positions i1, . . . , id−1, and 0s elsewhere. Then the above
equation is the same as saying HwT = 0, which, since H is a parity-check matrix for C, is the same as
saying w ∈ C. But then w is a non-zero word of weight at most d − 1, while C is a code of minimum
distance at least d; contradiction.
The other direction is basically the same: if C does not have minimum distance at least d, then C
has a non-zero codeword w of weight e < d, and the equation HwT = 0 provides a linear dependence
between some e of the columns of H; if e vectors are linearly dependent, then any d − 1 vectors
including these e are certainly linearly dependent.
The second paragraph of the proposition is now immediate – if we have such a code, then the
columns of a parity-check matrix are such a set of vectors, while if we have such a set of vectors, then
the matrix with these vectors as its columns is the parity-check matrix of such a code.
Notice that we say ‘sequence’ rather than ‘set’, since two columns of a matrix might be the same.
However, if there are two equal columns, then they are linearly dependent, and so the minimum
distance of our code would be at most 2.
Theorem 6.8 tells us that one way to look for good linear codes is to try to find large sets of
vectors such that large subsets of these are linearly independent. This is often referred to as the ‘main
linear coding theory problem’. We use this now to prove the Gilbert–Varshamov bound. The bounds
we saw earlier – the Hamming, Singleton and Plotkin bounds – were all essentially of the form ‘if a
code with these parameters exists, then the following inequality holds', and so gave upper bounds
on Aq (n, d). The Gilbert–Varshamov bound says ‘if this inequality holds, then a code with these
parameters exists’, and so it gives some lower bounds on Aq (n, d).
First we note a very simple fact (Pascal’s identity), which we shall use in the proof below: for any n ≥ 2 and any i,
\[
\binom{n-1}{i} = \binom{n-2}{i} + \binom{n-2}{i-1}.
\]
Theorem 6.10 (Gilbert–Varshamov bound). Suppose q is a prime power, and n, r, d are positive inte-
gers satisfying
\[
\binom{n-1}{0} + \binom{n-1}{1}(q-1) + \binom{n-1}{2}(q-1)^2 + \cdots + \binom{n-1}{d-2}(q-1)^{d-2} < q^r. \qquad (*)
\]
Then an [n, n − r, d]-code over Fq exists.
Proof. By Theorem 6.8, all we need to do is find a sequence of n vectors in F_q^r such that any d − 1
of them are linearly independent. In fact, we can do this in a completely naïve way. We begin by
choosing any non-zero vector v_1 ∈ F_q^r. Then we choose any vector v_2 such that v_1 and v_2 are linearly
independent. Then we choose any v_3 such that any d − 1 of v_1, v_2, v_3 are linearly independent, and so
on. We need to show that this always works, i.e. at each stage we can choose an appropriate vector.
Formally, this amounts to the following.
We prove the Gilbert–Varshamov bound by induction on n. For the case n = 1, we just need to be
able to find a non-zero vector v1 ; we can do this, since r > 0.
Now suppose n > 1 and that the theorem is true with n replaced by n − 1, i.e. whenever
\[
\binom{n-2}{0} + \binom{n-2}{1}(q-1) + \binom{n-2}{2}(q-1)^2 + \cdots + \binom{n-2}{d-2}(q-1)^{d-2} < q^r, \qquad (\dagger)
\]
we can find a sequence v_1, . . . , v_{n−1} of vectors in F_q^r such that any d − 1 of them are linearly independent.
Assume that the inequality (∗) holds. Recall that for any i we have
\[
\binom{n-1}{i} = \binom{n-2}{i} + \binom{n-2}{i-1};
\]
Hence if (∗) holds then so does (†). So by our inductive hypothesis we can find a sequence v1 , . . . , vn−1
of vectors of which any d − 1 are linearly independent.
Given a vector v_n ∈ F_q^r, we say that it is good if any d − 1 of the vectors v_1, . . . , v_n are linearly
independent, and bad otherwise. All we need to do is show that there is a good vector. We’ll do this
by counting the bad vectors, and showing that the number of bad vectors v_n is strictly less than the
total number of choices of v_n ∈ F_q^r, i.e. q^r; then we’ll know that there is a good vector.
Suppose v_n is bad. This means that some d − 1 of the vectors v_1, . . . , v_n are linearly dependent, i.e.
there exist 1 ≤ i_1 < · · · < i_{d−1} ≤ n and λ_1, . . . , λ_{d−1} ∈ F_q not all zero such that
\[
\lambda_1 v_{i_1} + \lambda_2 v_{i_2} + \cdots + \lambda_{d-1} v_{i_{d-1}} = 0.
\]
By our assumption, any d − 1 of the vectors v_1, . . . , v_{n−1} are linearly independent, so the above sum
must involve v_n with non-zero coefficient, i.e. i_{d−1} = n and λ_{d−1} ≠ 0. So we have
\[
v_n = -\frac{\lambda_1}{\lambda_{d-1}} v_{i_1} - \frac{\lambda_2}{\lambda_{d-1}} v_{i_2} - \cdots - \frac{\lambda_{d-2}}{\lambda_{d-1}} v_{i_{d-2}}.
\]
Discarding any terms with zero coefficient and relabelling, we may therefore write
\[
v_n = \mu_1 v_{i_1} + \cdots + \mu_e v_{i_e}
\]
for some 0 ≤ e ≤ d − 2, some 1 ≤ i_1 < · · · < i_e ≤ n − 1 and some non-zero µ_i ∈ F_q. (n.b. the case
e = 0 is allowed – it gives v_n equal to the empty sum, i.e. v_n = 0.)
So every bad vector can be written as a linear combination with non-zero coefficients of e of the
vectors v_1, . . . , v_{n−1}, for some e ≤ d − 2. So the number of bad vectors is at most the number of such
linear combinations. (Note that we say ‘at most’ because it might be that a bad vector can be written
in several different ways as a linear combination like this.)
How many of these linear combinations are there? For a given e, we can choose the numbers
i_1, . . . , i_e in \binom{n-1}{e} different ways. Then we can choose each of the coefficients µ_i in q − 1 different ways
(since µ_i must be chosen non-zero). So for each e the number of different linear combinations
\[
\mu_1 v_{i_1} + \cdots + \mu_e v_{i_e}
\]
is (q − 1)^e \binom{n-1}{e}. We sum over e to obtain
\[
(\text{number of bad vectors}) \le \binom{n-1}{0} + \binom{n-1}{1}(q-1) + \binom{n-1}{2}(q-1)^2 + \cdots + \binom{n-1}{d-2}(q-1)^{d-2} < q^r
\]
by (∗). So not every vector is bad, and so we can find a good vector.
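As an illustration, take q = 2, n = 7 and d = 3. With r = 3 the left-hand side of (∗) is
\binom{6}{0} + \binom{6}{1} = 1 + 6 = 7 < 8 = 2^3, so a binary [7, 4, 3]-code exists – and indeed we
have already seen one, namely the Hamming code of redundancy 3.

The naïve construction in the proof is also easy to carry out by machine. Here is a sketch in Python
for the binary case (the function name is a choice made for this illustration, and the exhaustive search
over subsets is hopelessly slow for large parameters – it is only meant to make the greedy argument
concrete):

    from itertools import combinations

    def gv_vectors(n, r, d):
        """Greedily build n vectors in F_2^r (as r-bit integers) such that
        any d-1 of them are linearly independent.  Following the proof, a
        candidate is bad iff it is a sum of at most d-2 chosen vectors."""
        chosen = []
        for _ in range(n):
            bad = {0}                      # the empty sum: 0 is always bad
            for e in range(1, d - 1):      # sums of e chosen vectors, e <= d-2
                for subset in combinations(chosen, e):
                    s = 0
                    for v in subset:
                        s ^= v             # addition in F_2^r is XOR
                    bad.add(s)
            # the bound (*) guarantees a good vector exists at each stage
            chosen.append(next(u for u in range(2 ** r) if u not in bad))
        return chosen

    # With n = 7, r = 3, d = 3 this returns [1, 2, 3, 4, 5, 6, 7]: the
    # columns of a parity-check matrix of the [7, 4, 3] Hamming code.
    print(gv_vectors(7, 3, 3))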
Theorem 6.11 (Singleton bound for linear codes). An [n, k, d]-code over F_q satisfies
\[
d \le n - k + 1.
\]
Proof. If C is an [n, k, d]-code over F_q, then C is a q-ary (n, M, d)-code, where M = q^k by Lemma 4.5.
Hence by Theorem 2.8(2) we have
\[
q^k \le q^{n+1-d},
\]
i.e. k ≤ n + 1 − d.
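For example, for the [7, 4, 3] Hamming code we have d = 3 < 7 − 4 + 1 = 4, so the Singleton bound
is not attained.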
The aim of this section is to look at codes which give equality in this bound. In an [n, k]-code, the
number n − k is called the redundancy of the code – it can be thought of as the number of redundant
digits we add to our source word to make a codeword.
Definition. A maximum distance separable code (or MDS code) of length n and redundancy r is a
linear [n, n − r, r + 1]-code.
Obviously, the main question concerning MDS codes is: for which n, r does an MDS code of
length n and redundancy r exist? The answer is not known in general, but we shall prove some results
and construct some MDS codes. Theorem 6.8 says that an MDS code of length n and redundancy r
exists if and only if we can find a sequence of n vectors in F_q^r of which any r are linearly independent.
Clearly if we can do this for a given value of n, then we can do it for any smaller value of n, just
by deleting some of the vectors. This implies that for each r (and q) there is a maximum n (possibly
infinite) for which an MDS code of length n and redundancy r exists. Given r, q, we write max(r, q)
for this maximum. The main question about MDS codes is to find the values of max(r, q).
Note that max(r, q) may be infinite, even though there are only finitely many vectors in F_q^r, because
we are allowed repeats in our sequence of vectors. Note, though, that if we have a repeated vector in
our sequence, then the code cannot possibly have distance more than 2. Here is our main theorem on
the values max(r, q).
Theorem 6.12.
1. If r = 0 or 1, then max(r, q) = ∞.
2. If r ≥ q, then max(r, q) = r + 1.
3. If 2 ≤ r ≤ q − 1, then max(r, q) ≥ q + 1.
It is conjectured that the bounds in (3) actually give the right values of max(r, q), except in the
cases where q is even and r = 3 or q − 1, in which case MDS codes of length q + 2 can be constructed.
The three parts of Theorem 6.12 have different proofs, and the first two are quite easy.
Proof of Theorem 6.12(1). We must show that for r = 0 or 1 and for any n and q, there exists an
MDS code of length n and redundancy r. For r = 0, this means we need an [n, n, 1]-code. But the
whole of F_q^n is such a code, as we saw in Theorem 2.1.
For r = 1, we want to construct a sequence of n vectors in F_q^1 such that the set formed by any one
of them is linearly independent. We just take each of the vectors to be the vector (1).
Proof of Theorem 6.12(2). To show that max(r, q) ≥ r + 1, we need to show that an MDS code of
length r + 1 and redundancy r exists. But this is an [r + 1, 1, r + 1]-code, and the repetition code is
such a code.
To show that max(r, q) ≤ r + 1, we have to show that we can’t find an MDS code of length r + 2 and
redundancy r, i.e. an [r + 2, 2, r + 1]-code. If we can find such a code C, then any code D equivalent to
C is also an [r + 2, 2, r + 1]-code. So by taking a generator matrix for C and applying matrix operations
MO1–5, we may find an [r + 2, 2, r + 1]-code with a generator matrix in standard form. This has a
parity-check matrix H in standard form, i.e. of the form (B|Ir ), where B is some r × 2 matrix over Fq .
Let v, w be the columns of B. By Theorem 6.8 any r of the columns of H are linearly independent,
i.e. any r of the vectors v, w, e1 , . . . , er are linearly independent, where e1 , . . . , er are the columns of
the identity matrix, i.e. the standard basis vectors.
Suppose the entries of v are v_1, . . . , v_r. First we show that v_1, . . . , v_r are all non-zero. If not, then
v_j = 0 for some j. Then we have
\[
v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_{j-1} \\ 0 \\ v_{j+1} \\ \vdots \\ v_r \end{pmatrix}
= v_1 e_1 + v_2 e_2 + \cdots + v_{j-1} e_{j-1} + v_{j+1} e_{j+1} + \cdots + v_r e_r,
\]
and so the r vectors v, e_1, . . . , e_{j−1}, e_{j+1}, . . . , e_r are linearly dependent. Contradiction. So each v_j is
non-zero. Similarly we find that the entries w_1, . . . , w_r of w are non-zero. This means that v_1/w_1, . . . , v_r/w_r
are non-zero elements of F_q. Now there are only q − 1 < r distinct non-zero elements of F_q, and so
we must have v_i/w_i = v_j/w_j for some i < j. We re-write this as v_j/v_i − w_j/w_i = 0. Now consider the vector
v/v_i − w/w_i. The kth component of this vector equals v_k/v_i − w_k/w_i, and so we have
\[
\frac{v}{v_i} - \frac{w}{w_i} = \sum_{k=1}^{r}\left(\frac{v_k}{v_i} - \frac{w_k}{w_i}\right) e_k.
\]
Now for k = i and k = j we have v_k/v_i − w_k/w_i = 0, and so we may ignore the i and j terms to get
\[
\frac{v}{v_i} - \frac{w}{w_i} = \sum_{k=1}^{i-1}\left(\frac{v_k}{v_i} - \frac{w_k}{w_i}\right) e_k + \sum_{k=i+1}^{j-1}\left(\frac{v_k}{v_i} - \frac{w_k}{w_i}\right) e_k + \sum_{k=j+1}^{r}\left(\frac{v_k}{v_i} - \frac{w_k}{w_i}\right) e_k.
\]
So the r vectors v, w, e1 , . . . , ei−1 , ei+1 , . . . , e j−1 , e j+1 , . . . , er are linearly dependent; contradiction.
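For example, taking q = r = 2, the theorem gives max(2, 2) = 3: the binary repetition code of length 3
is a [3, 1, 3]-code, i.e. an MDS code of length 3 and redundancy 2, but there is no [4, 2, 3]-code
over F_2.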
For the proof of Theorem 6.12(3), we can be a bit more clever. Given a sequence of vectors
v1 , . . . , vn , we have an easy way to check whether some r of them are linearly independent. We
let A be the matrix which has these vectors as its columns. Then A is a square matrix, and so has a
determinant. And the columns of A are linearly independent if and only if the determinant is non-zero.
We need to look at a particular type of determinant, called a ‘Vandermonde determinant’.
Proposition 6.13. Suppose x_1, . . . , x_r are distinct elements of a field F. Then the determinant
\[
\begin{vmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_r \\
x_1^2 & x_2^2 & \cdots & x_r^2 \\
\vdots & \vdots & & \vdots \\
x_1^{r-1} & x_2^{r-1} & \cdots & x_r^{r-1}
\end{vmatrix}
\]
is non-zero. (Indeed, this ‘Vandermonde determinant’ equals \prod_{1 \le i < j \le r}(x_j - x_i), which is
non-zero precisely because the x_i are distinct.)
Now we can construct our codes.
Proof of Theorem 6.12(3). We need to construct a [q + 1, q + 1 − r, r + 1]-code. Label the elements
of F_q as λ_1, . . . , λ_q in some order, and let
\[
H = \begin{pmatrix}
1 & 1 & \cdots & 1 & 0 \\
\lambda_1 & \lambda_2 & \cdots & \lambda_q & 0 \\
\lambda_1^2 & \lambda_2^2 & \cdots & \lambda_q^2 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
\lambda_1^{r-2} & \lambda_2^{r-2} & \cdots & \lambda_q^{r-2} & 0 \\
\lambda_1^{r-1} & \lambda_2^{r-1} & \cdots & \lambda_q^{r-1} & 1
\end{pmatrix}.
\]
Let C be the code with H as its parity-check matrix. Then C is a [q + 1, q + 1 − r]-code, and we claim
that C has minimum distance at least r + 1. Recall from Theorem 6.8 that this happens if and only if
any r columns of H are linearly independent. H has r rows, and so any r columns together will form
a square matrix, and we can check whether the columns are linearly independent by evaluating the
determinant. So choose r columns of H, and let J be the matrix formed by them.
If the last column of H is not one of the columns chosen, then
\[
J = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_r \\
x_1^2 & x_2^2 & \cdots & x_r^2 \\
\vdots & \vdots & & \vdots \\
x_1^{r-1} & x_2^{r-1} & \cdots & x_r^{r-1}
\end{pmatrix}
\]
for some distinct x_1, . . . , x_r, and so det(J) ≠ 0 by Proposition 6.13. If the last column of H is one of
the columns chosen, then we have
\[
J = \begin{pmatrix}
1 & 1 & \cdots & 1 & 0 \\
x_1 & x_2 & \cdots & x_{r-1} & 0 \\
x_1^2 & x_2^2 & \cdots & x_{r-1}^2 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
x_1^{r-2} & x_2^{r-2} & \cdots & x_{r-1}^{r-2} & 0 \\
x_1^{r-1} & x_2^{r-1} & \cdots & x_{r-1}^{r-1} & 1
\end{pmatrix}
\]
for some distinct x_1, . . . , x_{r−1}. Let J′ be the matrix formed by the first r − 1 rows and the first r − 1
columns. Then det(J′) ≠ 0 by Proposition 6.13, and so the first r − 1 rows of J are linearly independent.
Now consider the rows of J; suppose that
\[
\mu_1.(\text{row } 1) + \cdots + \mu_r.(\text{row } r) = 0
\]
for some scalars µ_1, . . . , µ_r. Looking at the last column of J, this gives
\[
0 + \cdots + 0 + \mu_r = 0,
\]
so µ_r = 0. Hence
\[
\mu_1.(\text{row } 1) + \cdots + \mu_{r-1}.(\text{row } r-1) = 0,
\]
but this implies µ_1 = · · · = µ_{r−1} = 0, since the first r − 1 rows of J are linearly independent. And so
all the rows of J are linearly independent, so det J ≠ 0, as required.
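Example. Take q = 3 and r = 2, and order the elements of F_3 as λ_1 = 0, λ_2 = 1, λ_3 = 2. Then
\[
H = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{pmatrix},
\]
and one checks directly that any two columns of H are linearly independent (no column is a scalar
multiple of another). So the code with parity-check matrix H is a [4, 2, 3]-code over F_3: an MDS code
of length q + 1 = 4 and redundancy 2.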
We now turn to another important family of binary codes, the Reed–Muller codes. For 0 ≤ i ≤ n − 1,
let x_i(n) denote the word of length 2^n whose jth digit is the coefficient of 2^i in the binary
representation of j − 1.
Example. We have
x0 (2) = 0101,
x1 (2) = 0011,
x0 (3) = 01010101,
x1 (3) = 00110011,
x2 (3) = 00001111.
We will write x_i(n) as x_i when it is clear what the value of n is. We consider products of the words
x_i(n), multiplying digit by digit; we write the product of words x and y as x ∗ y. For example,
\[
x_0(3) ∗ x_1(3) = 01010101 ∗ 00110011 = 00010001.
\]
We include the word 11 . . . 1, which we regard as the ‘empty product’, and write as 1(n). Note
that we only bother with products of distinct x_i(n)s, since x_i(n) ∗ x_i(n) = x_i(n).
Definition. The rth-order Reed–Muller code R(r, n) is the binary linear code of length 2n spanned by
all products of at most r of the words x0 (n), . . . , xn−1 (n).
Note that in ‘at most r’ we include 0, so we include the product of none of the words x0 (n), . . . , xn−1 (n),
i.e. the word 1(n).
Example. Take n = 3. The products of the words x_0, x_1, x_2 (including the empty product 1) are:
1 = 11111111,
x0 = 01010101,
x1 = 00110011,
x0 ∗ x1 = 00010001,
x2 = 00001111,
x0 ∗ x2 = 00000101,
x1 ∗ x2 = 00000011,
x0 ∗ x1 ∗ x2 = 00000001.
So
\[
R(0, 3) = \langle 11111111 \rangle,
\]
\[
R(1, 3) = \langle 11111111, 01010101, 00110011, 00001111 \rangle,
\]
and
\[
R(2, 3) = \langle 11111111, 01010101, 00110011, 00001111, 00010001, 00000101, 00000011 \rangle.
\]
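The words x_i(n) and their products are easy to generate by machine. Here is a short Python sketch
(the function names are choices made for this illustration) which reproduces the spanning set of
R(r, n) described above:

    from itertools import combinations

    def x(i, n):
        """The word x_i(n): digit j (1-indexed) is the coefficient of 2^i
        in the binary representation of j - 1."""
        return [(j >> i) & 1 for j in range(2 ** n)]

    def star(words, n):
        """Digit-by-digit product; the empty product is the word 1(n)."""
        result = [1] * (2 ** n)
        for w in words:
            result = [a & b for a, b in zip(result, w)]
        return result

    def rm_spanning_set(r, n):
        """All products of at most r of the words x_0(n), ..., x_{n-1}(n)."""
        return [star([x(i, n) for i in subset], n)
                for s in range(r + 1)
                for subset in combinations(range(n), s)]

    # Reproduces the list of spanning words of R(2, 3) given above.
    for w in rm_spanning_set(2, 3):
        print(''.join(map(str, w)))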
We want to work out the dimension of R(r, n). In fact, the spanning set we’ve chosen is a basis, but
this is not immediately obvious. Let’s have a look at the size of this spanning set: for each 0 ≤ i ≤ r,
we take all products of i of the words x_0, . . . , x_{n−1}. The number of ways of choosing these i words is
\binom{n}{i}. By summing for all i, we find that
\[
\dim R(r, n) \le \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{r}.
\]
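For example, this gives dim R(1, 3) ≤ \binom{3}{0} + \binom{3}{1} = 4 and dim R(2, 3) ≤ 1 + 3 + 3 = 7,
matching the numbers of spanning words listed above.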
Lemma 6.14. Suppose 0 ≤ i_1 < · · · < i_s < n and let x = x_{i_1}(n) ∗ x_{i_2}(n) ∗ · · · ∗ x_{i_s}(n). If i_s < n − 1, then
x is simply the word x_{i_1}(n − 1) ∗ x_{i_2}(n − 1) ∗ · · · ∗ x_{i_s}(n − 1) written twice. If i_s = n − 1, then x is the
word 00 . . . 0 of length 2^{n−1} followed by the word x_{i_1}(n − 1) ∗ x_{i_2}(n − 1) ∗ · · · ∗ x_{i_{s−1}}(n − 1).
Proof. If i < n − 1, then x_i(n) is the word x_i(n − 1) written twice. So if we take any product of words
x_i(n) with i < n − 1, then we’ll get the product of the corresponding x_i(n − 1)s written twice. The word
x_{n−1}(n) consists of 2^{n−1} zeroes followed by 2^{n−1} ones. So if v is a word of length 2^n consisting of a word w
written twice, then v ∗ x_{n−1}(n) consists of 2^{n−1} zeroes followed by w. The lemma follows.
Proposition 6.15. Suppose 0 ≤ i_1 < · · · < i_s < n, and let x = x_{i_1}(n) ∗ x_{i_2}(n) ∗ · · · ∗ x_{i_s}(n). Then the
first 1 in x appears in position 1 + 2^{i_1} + · · · + 2^{i_s}.
Proof. We prove this by induction on n, with the case n = 1 being easy to check. So we suppose that
n > 1 and the result holds with n replaced by n − 1. Let x = x_{i_1}(n) ∗ x_{i_2}(n) ∗ · · · ∗ x_{i_s}(n). There are two
cases to consider, according to whether i_s = n − 1 or not. If i_s < n − 1, then by Lemma 6.14 x consists
of the word x_{i_1}(n−1) ∗ · · · ∗ x_{i_s}(n−1) written twice, so the first 1 appears in position 1 + 2^{i_1} + · · · + 2^{i_s}, by
induction. If i_s = n − 1, then by Lemma 6.14 x consists of 2^{n−1} zeroes followed
by the word x_{i_1}(n − 1) ∗ · · · ∗ x_{i_{s−1}}(n − 1). So by induction the first 1 appears in position 2^{n−1} + p, where
p is the position of the first 1 in x_{i_1}(n − 1) ∗ · · · ∗ x_{i_{s−1}}(n − 1). By induction p = 1 + 2^{i_1} + · · · + 2^{i_{s−1}}, and
since i_s = n − 1 this gives 2^{n−1} + p = 1 + 2^{i_1} + · · · + 2^{i_s}, so
the result follows.
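For example, with n = 3 the word x_0(3) ∗ x_2(3) = 00000101 has its first 1 in position
6 = 1 + 2^0 + 2^2, as the proposition predicts.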
There are 2^n different products of the words x_0(n), . . . , x_{n−1}(n) (since each x_i(n) is either involved
in the product or not involved in it). Recall that for a positive integer p there is a unique way to write
p − 1 as a sum of distinct powers of 2, i.e. there are unique integers i_1 < i_2 < · · · < i_s such that
\[
p - 1 = 2^{i_1} + \cdots + 2^{i_s},
\]
i.e.
\[
p = 1 + 2^{i_1} + \cdots + 2^{i_s}.
\]
Combined with Proposition 6.15, this means that for each p there is exactly one word which is a
product of words x_i(n) and which has its first 1 in position p. We label the products of the words
x_0(n), . . . , x_{n−1}(n) as y_1, . . . , y_{2^n} so that y_p has its first 1 in position p, for each p.
Proposition 6.16. The words y_1, . . . , y_{2^n} are linearly independent.
Proof. Suppose
\[
\lambda_1 y_1 + \cdots + \lambda_{2^n} y_{2^n} = 0,
\]
with each λ_i equal to 0 or 1 and not all the λ_i being 0. Let p be minimal such that λ_p = 1. The word
y_p has a 1 in position p, while the words y_{p+1}, . . . , y_{2^n} each have a 0 in position p. Hence the word
\[
\lambda_1 y_1 + \cdots + \lambda_{2^n} y_{2^n}
\]
has a 1 in position p, and in particular is non-zero; contradiction.
Theorem 6.17. The dimension of R(r, n) is
\[
\dim R(r, n) = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{r}.
\]
Proof. Consider the words which are products of at most r of the words x_0(n), . . . , x_{n−1}(n); as
computed above, there are \binom{n}{0} + \cdots + \binom{n}{r} of them. These span R(r, n) by defini-
tion, and by Proposition 6.16 they are linearly independent (each of them is one of the words
y_1, . . . , y_{2^n}), and so they form a basis for R(r, n).
Theorem 6.18. The minimum distance of R(r, n) is 2^{n−r}.
Proof. (Non-examinable) We use the fact that the minimum distance of a linear code equals the
minimum weight of a non-zero codeword. First we want to show that there is a codeword of weight
2^{n−r}. We claim that the word
\[
x_{n-r}(n) ∗ x_{n-r+1}(n) ∗ \cdots ∗ x_{n-1}(n)
\]
consists of 2^n − 2^{n−r} zeroes followed by 2^{n−r} ones. We prove this by induction on n, with the case
n = 1 easy to check. Suppose n > 1. By Lemma 6.14 the word x_{n−r}(n) ∗ x_{n−r+1}(n) ∗ · · · ∗ x_{n−1}(n) equals
the word 00 . . . 0 of length 2^{n−1} followed by the word
\[
x_{n-r}(n - 1) ∗ x_{n-r+1}(n - 1) ∗ \cdots ∗ x_{n-2}(n - 1).
\]
By induction this word consists of 2^{n−1} − 2^{n−r} zeroes followed by 2^{n−r} ones, and the claim is proved.
Hence R(r, n) contains a word of weight 2^{n−r}, so d(R(r, n)) ≤ 2^{n−r}.
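For example, in R(1, 3) the word x_2(3) = 00001111 has weight 4 = 2^{3−1}, and (as we are about to
show) no non-zero word of R(1, 3) has smaller weight.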
Now we show that every non-zero word has weight at least 2^{n−r}. Again, we proceed by induction
on n, with the case n = 1 being easy to check. So we suppose that n > 1 and that the theorem is true
for n − 1. Suppose w is a non-zero word in R(r, n); then we can write
\[
w = y_1 + \cdots + y_s,
\]
where each y_i is a product of at most r of the words x_0(n), . . . , x_{n−1}(n). We let u be the sum of all the
terms which do not include x_{n−1}(n), and we let v be the sum of all the terms which do include x_{n−1}(n).
Then w = u + v, and by Lemma 6.14 we see that u consists of a word u′ written twice, where
u′ ∈ R(r, n − 1), while v consists of 2^{n−1} zeroes followed by a word v′, where v′ ∈ R(r − 1, n − 1). So
w consists of the word u′ followed by the word u′ + v′.
Now we can prove that the weight of w is at least 2^{n−r}, by considering several cases.
• Suppose u′ = 0. Then v′ ≠ 0 (since w ≠ 0), and w consists of 2^{n−1} zeroes followed by the
word v′. Now v′ ∈ R(r − 1, n − 1), so by induction weight(w) = weight(v′) ≥ 2^{(n−1)−(r−1)} = 2^{n−r}.
• Suppose u′ ≠ 0 and u′ ≠ v′. Then u′ and u′ + v′ are both non-zero words in R(r, n − 1) (note
that v′ ∈ R(r − 1, n − 1) ⊆ R(r, n − 1)), so by induction each has weight at least 2^{(n−1)−r}, and
hence weight(w) = weight(u′) + weight(u′ + v′) ≥ 2 · 2^{(n−1)−r} = 2^{n−r}.
• Now suppose u′ = v′. Then u′ ≠ 0, and w consists of the word u′ followed by 2^{n−1} zeroes. So
weight(w) = weight(u′). Now u′ = v′ ∈ R(r − 1, n − 1), and so by induction the weight of u′ is
at least 2^{(n−1)−(r−1)} = 2^{n−r}.