Group Theory and Error Detecting / Correcting Codes

S. K. MOSCHOYIANNIS
Department of Computing
School of Electronics, Computing & Mathematics
University of Surrey
Guildford, Surrey GU2 7XH, UK

September 2001
ABSTRACT
At the dawn of the 21st century it is more than obvious that the information age is upon us. Technological
developments such as orbiting and geo-stationary satellites, deep-space telescopes, high-speed computers,
compact disks, digital versatile disks, high-definition television and international networks allow massive
amounts of information to be transmitted, stored and retrieved. Increasingly, efficient communication of information, in terms of speed, economy, accuracy and reliability, is becoming an essential process. From its origins, the field of error detecting / correcting codes has grown in response to practical problems in the reliable communication of digital information.
Natural communication systems such as our eyes or the English language use mechanisms to achieve
reliability. Our eyes, when we are disoriented, use experience to guess the meaning of what they see,
heavily depending on the various independent guessing mechanisms of our brains. The English language
makes use of built-in restrictions to ensure that most sequences of letters do not form words, so as to allow
very few candidates for the correct version of a misspelled word.
However, when processing digital information, the lack of assumptions about the original message provides
no statistic upon which to base a reasonable guess. Therefore, robust communication systems employ error
detecting / correcting codes to combat the noise in transmission and storage systems. These codes obtain
error control capability by adding redundant digits in a systematic fashion to the original message so that
the receiver terminal can reproduce the message if altered during transmission. In order to ensure that
redundancy will allow for error detection / correction, various mathematical methods are invoked in the
design of error control codes.
This study aims to indicate the algebraic techniques applied in developing error detecting / correcting codes. Many of these techniques are based on standard abstract algebra, particularly the theory of groups and other algebraic structures such as rings, vector spaces and matrices over finite fields.
These mathematical concepts are discussed, focusing on their relation to error control coding, as they
provide the basis for constructing efficient error detecting / correcting codes.
ACKNOWLEDGEMENTS
I would like to thank my supervisor Dr. M. W. Shields for his guidance through to the completion of this project. He made me feel I had the appropriate degree of freedom in conducting this study, and at the same time his valuable comments and suggestions at each phase gave me direction towards the next stage.
I would also like to thank my family for their continuous encouragement throughout this study. Their
financial support enabled me to concentrate solely on this project.
Many thanks to my friend Mary for her constant support and assistance. Her printer facilitated numerous
corrections on pre-drafts of my work.
CONTENTS
Abstract......................................................................................................................................iv
Contents......................................................................................................................................vi
1. Introduction............................................................................................................................1
4.5.2 Syndromes of Words.........................................................................................24
4.5.2.1 Observation........................................................................................26
7. Cyclic Codes........................................................................................................................54
7.1 Polynomial Representation.........................................................................................54
7.2 Cyclic Codes as Ideals................................................................................................55
7.3 Parity-Check Polynomial – Generator Matrix for Cyclic Codes................................56
7.4 Systematic Encoding for Cyclic Codes.......................................................................59
9. Performance of Error Detecting / Correcting Codes...........................................................71
9.1 Error Detection Performance......................................................................................71
9.2 Error Correction Performance.....................................................................................72
9.3 Information Rate and Error Control Capacity.............................................................73
Afterword..................................................................................................................................79
Appendix A...............................................................................................................................81
Appendix B...............................................................................................................................86
Appendix C...............................................................................................................................87
Appendix D...............................................................................................................................89
Appendix E...............................................................................................................................90
Bibliography.............................................................................................................................91
1. INTRODUCTION
There was a time when analysis of the design criteria for a communication system could indicate that the desired system was a physical impossibility. Shannon and Hamming laid the
foundation for error control coding, a field which now includes powerful techniques for
achieving reliable reproduction of data that is transmitted in a noisy environment. Shannon’s
existential approach motivated the search for codes by providing the limits for ideal error
control coding while Hamming constructed the first error detecting / correcting code.
The purpose of error detecting / correcting codes is to reduce the chance of receiving
messages which differ from the original message. The main concept behind error control
coding is redundancy. That is, adding further symbols to the original message that do not add
information but serve as check / control symbols. Error detecting / correcting codes insert
redundancy into the message, at the transmitter’s end, in a systematic, analytic manner in
order to enable reconstruction of the original message, at the receiver’s end, if it has been
distorted during transmission.
The ultimate objective is to ensure that the message and its redundancy are interrelated by
some set of algebraic equations. In case the message is disturbed during transmission, it is
reproduced at the receiver terminal by the use of these equations. Explicitly, error control
efficiency is highly associated with applying mathematical theory in the design of error
control schemes. The purpose of this study is to indicate the underlying mathematical
structure of error detecting / correcting codes.
Chapter 2 highlights the main principles of error detecting / correcting codes. Two examples
of basic binary codes are included and the Chapter concludes with a discussion on Shannon’s
Theorem.
Concepts of elementary linear algebra are introduced in Chapter 3. In particular, the theory of groups and related algebraic structures such as fields, vector spaces and matrices are selectively presented.
Chapter 4 covers the basic structure of linear block codes. Based on the vector representation,
these codes are postulated in terms of the mathematical entities introduced in Chapter 3.
Additionally, an extensive section is devoted to the decoding process of linear block codes.
The well-known Hamming codes are presented in Chapter 5, along with other classes of linear block codes such as Golay, Hadamard and Reed-Muller codes. The construction of
these codes rests on the concepts introduced in Chapters 3 and 4. In this part, we have
included two programs that perform encoding and decoding for the [7,4] Hamming code.
Chapter 6 delves into the structure of Galois fields, aiming to form the necessary
mathematical framework for defining several powerful classes of codes. The main interest is
in the algebraic properties of polynomials over Galois fields.
Chapter 7 proceeds to develop the structure and properties of cyclic codes. The polynomial
representation of a code is used as a link between error detecting / correcting codes and the
mathematical entities introduced in Chapter 6. Emphasis is placed on the application of these
mathematical tools to the construction of cyclic codes.
Chapter 8 is devoted to the presentation of the important classes of BCH and Reed-Solomon
codes for multiple error correction. Attention is confined to their powerful algebraic decoding
algorithm. The last section describes the capability of Reed-Solomon codes to correct error
bursts.
Chapter 10 investigates suitable error control strategies for specific applications. Reed-
Solomon codes are emphasised due to their popularity in current error control systems.
2. PRINCIPLES OF ERROR CONTROL
Error detecting / correcting codes are implemented in almost every electronic device which
entails transmission of information, whether this information is transmitted across a
communication channel or stored and retrieved from a storage system such as a compact disk.
The set of symbols – this set always being finite – used to form the information message,
constitutes the alphabet of the code. In order to send information, a channel, be it physical or
not, is required. In most of the cases presented, a random symmetric channel is considered.
A channel is a random symmetric error channel if for each pair of distinct symbols a, b of the alphabet there is a fixed probability p(a,b) that when a is transmitted, b is received, and p(a,b) is the same for all possible pairs a, b (a ≠ b).
The basic operating scheme of error detecting / correcting codes is depicted in Figure 2–1.
Suppose that an information sequence of k message symbols or information digits is to be
transmitted. This sequence m may be referred to as the message word. The encoder, at the transmitter's end, adds r check digits from the alphabet according to a certain rule, referred to as the encoding rule.

[Figure 2–1: m → Encoder → u → channel → v → Decoder; retransmission required when decoding fails]
The encoder outputs a total sequence u of n digits, called codeword, which is the actual
sequence transmitted. The n – k = r additional digits, known as parity-check digits, are the
redundant digits used at the receiver’s end for detection and correction of errors. Errors that
occur during transmission alter codeword u to a word v, which is the received word. The
decoder checks whether the received word v satisfies the encoding rule or not. If the
condition is determined to be false, then error processing is performed, in an attempt to
reproduce the actual transmitted codeword u. If this attempt fails, the received word is
ignored and retransmission is required, else the decoder extracts the original message m from
the reconstructed codeword u.
G = 1 – H(p)

where the function H(p), called the binary entropy of the information source, is defined as

H(p) = – p·log2(p) – (1 – p)·log2(1 – p)

for the binary symmetric channel (BSC), which consists of two symbols with the same probability p of incorrect transmission. The precise statement has as follows, Jones and Jones (2000, p. 88).

Theorem 2-1: Let Γ be a binary symmetric channel with p < 1/2, so Γ has capacity G = 1 – H(p) > 0, and let δ, ε > 0. Then, for all sufficiently large n, there is a code of length n with information rate R > G – δ whose probability of a decoding error satisfies PrE < ε.
For simplicity, the statement is for the BSC, but the theorem is valid for all channels. Thus, by choosing δ and ε sufficiently small, PrE and R can be made as close as required to 0 and G respectively. Informally, the theorem states that if long enough codewords are chosen, then information can be transmitted across a channel Γ as accurately as required, at an information rate as close as desired to the capacity of the channel. Theorem 2-1 motivates the search for codes whose information rate R approaches the channel capacity as the length of codewords n increases. Such codes are often characterised as ‘good’ codes.
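These two quantities are easy to evaluate numerically. The short Python sketch below (the function names are ours, chosen for clarity) computes the binary entropy H(p) and the BSC capacity G = 1 – H(p):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1 - p)*log2(1 - p); H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity G = 1 - H(p) of a binary symmetric channel with error probability p."""
    return 1 - binary_entropy(p)

# A channel that corrupts 1 bit in 10 still retains more than half its capacity.
print(round(bsc_capacity(0.1), 4))   # 0.531
```

Note that H(p) is maximal at p = 1/2, where the capacity vanishes and no reliable communication is possible.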
2.2.1 Observation
Since a very large value of n is required to achieve R → G and PrE → 0, very long codewords must be transmitted, making encoding and decoding more complex processes.
Furthermore, if n is large then the receiver may experience delays until a codeword is
received, resulting in a sudden burst of information, which may be difficult to handle.
3. MATHEMATICAL BACKGROUND I
3.1 Groups
A fundamental concept used in the study of error detection / correction codes is the structure
of a group, which underlies other algebraic structures such as fields and rings.
A binary operation operates on two elements of a set at a time, yielding a third (not necessarily distinct) element. When a binary operation, along with certain rules restricting the results of the operation, is imposed on a set, the resulting structure is a group.
Definition 3-1: A group is a set of elements G with a binary operation ‘·’ defined in such a way that the following requirements are satisfied:
1. G is closed; that is, α·β is in G whenever α and β are in G
2. α·(β·γ) = (α·β)·γ for all α, β, γ ∈ G (associative law)
3. There exists e ∈ G such that α·e = e·α = α for all α ∈ G (identity)
4. For all α ∈ G, there exists α⁻¹ ∈ G such that α·α⁻¹ = α⁻¹·α = e (inverse)
The order of a group is defined to be the cardinality of the group, which is the number of
elements contained in the group. The order of a group is not sufficient to completely specify
the group. Restriction to a particular operation is necessary. Groups with a finite number of
elements are called finite groups.
For example, the set of integers forms an infinite commutative group under integer addition,
but not under integer multiplication, since the latter does not allow for the required
multiplicative inverses.
The order of a group element g ∈ G, essentially different from the order of the group, is defined to be the smallest positive integer r such that g^r = e, where e is the identity element of group G. A simple method for constructing a finite group is based on the application of modular arithmetic to the set of integers, as stated in the next two theorems, Wicker (1995, pp. 23-4).
Theorem 3-1: The elements {0, 1, 2, …, m – 1} form a commutative group of order m under
modulo m integer addition for any positive integer m.
As for integer multiplication, m cannot be selected arbitrarily: if the modulus m has factors other than 1 and m, the set will contain zero divisors. A zero divisor is any non-zero number a for which there exists a non-zero number b such that a·b = 0 modulo m. Hence, to construct a finite group under multiplication modulo m, the modulus must be restricted to prime integers.
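These statements can be verified directly by testing the requirements of Definition 3-1. The Python sketch below (the helper function is ours; associativity is taken for granted, since modular arithmetic inherits it from the integers) checks closure, identity and inverses:

```python
def is_group(elements, op):
    """Check closure, identity, and inverses (associativity holds for modular arithmetic)."""
    elements = list(elements)
    if any(op(a, b) not in elements for a in elements for b in elements):
        return False                      # not closed
    identity = next((e for e in elements
                     if all(op(e, a) == a == op(a, e) for a in elements)), None)
    if identity is None:
        return False                      # no identity element
    return all(any(op(a, b) == identity for b in elements) for a in elements)

print(is_group(range(6), lambda a, b: (a + b) % 6))       # True: addition mod 6
print(is_group(range(1, 6), lambda a, b: (a * b) % 6))    # False: 2·3 = 0 mod 6 (zero divisor)
print(is_group(range(1, 7), lambda a, b: (a * b) % 7))    # True: prime modulus 7
```

The middle example fails precisely because 6 is composite, so {1, ..., 5} is not even closed under multiplication modulo 6.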
A subset S of G is a subgroup if it exhibits closure and contains all the necessary inverses; that is, c = a·b⁻¹ ∈ S for all a, b ∈ S. The order of a subgroup is related to the order of the group according to Lagrange’s Theorem, which states that the order of a subgroup is always a divisor of the order of the group.
Another important algebraic structure in the study of error control codes is the cyclic group,
defined as follows.
Definition 3-2: A group G is said to be a cyclic group if each of its elements is equal to a
power of an element a in G. Then, the group G is determined by <a>.
Element a is called a generating element of <a>. The element a⁰ is by convention the unit element.
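For instance, the non-zero integers modulo 7 form a cyclic group under multiplication modulo 7. The Python sketch below (the function name is ours) lists the successive powers of a candidate generator until the identity recurs:

```python
def generated_subgroup(a, op, identity):
    """Successive powers a, a·a, a·a·a, ... until the identity recurs; returns <a>."""
    powers, x = [], a
    while x != identity:
        powers.append(x)
        x = op(x, a)
    powers.append(identity)
    return powers

mul7 = lambda x, y: (x * y) % 7
# 3 generates all six non-zero residues mod 7, so the group is cyclic with generator 3.
print(generated_subgroup(3, mul7, 1))   # [3, 2, 6, 4, 5, 1]
# 2 generates only a subgroup of order 3, consistent with Lagrange's Theorem (3 divides 6).
print(generated_subgroup(2, mul7, 1))   # [2, 4, 1]
```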
3.2 Fields
The concept of a field, particularly a finite field, is of great significance in the theory of error control codes, as will be highlighted throughout this study. A common approach to the construction of error detecting / correcting codes suggests that the symbols of the alphabet used are elements of a finite field.
Definition 3-3: A field F is a set of elements with two binary operations ‘+’ and ‘·’ such that:
1. F forms a commutative group under ‘+’ with identity 0
2. F –{0} forms a commutative group under ‘·’ with identity 1
3. The operations ‘+’ and ‘·’ distribute: α·(β+γ) = α·β + α·γ for all α, β, γ ∈ F
For example, the real numbers form an infinite field, as do the rational numbers.
A non-empty subset F´ of a field F is a subfield, if and only if F´ constitutes a field with the
same binary operations of F.
If the set F is finite, then F is called a finite field. Finite fields are often known as Galois
fields, in honour of the French mathematician Evariste Galois who provided the fundamental
results on finite fields.
The order of a finite field F is defined to be the number of elements in the field. It is standard
practice to denote a finite field of order q by GF(q). For example, the binary alphabet B is a
finite field of order 2, denoted by GF(2), under the operations of modulo 2 addition and
multiplication.
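The arithmetic of GF(2) can be tabulated in a couple of lines of Python, with ‘+’ realised as addition modulo 2 (exclusive OR) and ‘·’ as multiplication modulo 2:

```python
# Operation tables for GF(2) = {0, 1}: addition and multiplication modulo 2.
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} + {b} = {(a + b) % 2}    {a} · {b} = {(a * b) % 2}")
```

Note in particular that 1 + 1 = 0, so every element of GF(2) is its own additive inverse.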
Finite fields and their properties are further discussed in Chapter 6 since they are used in most
of the known classes of error detection / correction codes.
3.3 Vector Spaces

Consider V to be a set of elements called vectors and F a field of elements called scalars. A
vector space V over a field F is defined by introducing two operations in addition to the two
already defined between field elements:
i. Let ‘+’ be a binary additive operation, called additive vector operation, which maps
pairs of vectors v1 ,v2 ∈ V onto vector v = v1 + v2 ∈ V
ii. Let ‘·’ be a binary multiplication operation, called scalar multiplicative operation,
which maps a scalar a ∈ F and a vector v∈ V onto a vector u = a·v∈ V
Now, V forms a vector space over F if the following conditions are satisfied:
1. V forms an additive commutative group
Let u, v ∈ V where v = (v0, v1, …, vn-1) and u = (u0, u1, …, un-1) with {vi} ∈ F and {ui} ∈ F. Then, vector addition can be defined as

v + u = (v0 + u0, v1 + u1, …, vn-1 + un-1)
For example, the set of binary n-tuples, Vn, forms a vector space over GF(2), with coordinate-wise addition and scalar multiplication. Note that the operations for the coordinates are performed under the restrictions imposed on the set they are taken from. Obviously, Vn has cardinality 2^n, since that is the number of all possible distinct sequences of 1s and 0s of length n. Since V forms an additive commutative group, for scalars a0, a1, …, an-1 in F, the linear combination v = a0v0 + a1v1 + … + an-1vn-1 is a vector in V.
A spanning set for V is a set of vectors G = {v0, v1, …, vn-1}, the linear combinations of which include all vectors in a vector space V. Equivalently, we can say that G spans V. A spanning set with minimal cardinality is called a basis for V. If a basis for V has k elements, hence its cardinality is k, then the vector space V is said to have dimension k. Furthermore, according to the following theorem, Wicker (1995, p. 31), each vector in V can be written as a linear combination of the basis elements for some collection of scalars {ai} in F.
Theorem 3-3: Let {vi}, i = 0..k – 1, be a basis for a vector space V. For every vector v in V there is a representation v = a0v0 + … + ak-1vk-1. This representation is unique.
Equivalently, S is a vector subspace of the vector space V over field F if and only if a·v1 + b·v2
is in S, for v1 ,v2 ∈ S and a,b ∈ F.
Definition 3-5: Let u = (u 0 , u 1 , …, un-1 ) and v = (v0 , v1 , …, vn-1) be vectors in the vector space
V over the field F. The inner product u·v is defined as
u·v = u0v0 + u1v1 + … + un-1vn-1
The inner product defined in a vector space V over F, has the following properties derived
from its definition:
i. commutative:
u·v = v·u, for all u,v ∈ V
ii. associative with scalar multiplication:
a·(v·u) = (a·v)·u
iii. distributive with vector addition:
u·(v + w) = u·v + u·w
If the inner product of two vectors v and u is v·u = 0, then v is said to be orthogonal to u or
equivalently u is orthogonal to v.
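A Python sketch of the inner product over GF(q) makes the orthogonality test concrete (the function name is ours; q = 2 by default):

```python
def inner_product(u, v, q=2):
    """Sum of coordinate-wise products, reduced modulo q."""
    return sum(a * b for a, b in zip(u, v)) % q

u = (1, 0, 1, 1)
v = (1, 1, 1, 0)
w = (0, 1, 0, 1)
print(inner_product(u, v))   # 0 -> u is orthogonal to v
print(inner_product(u, w))   # 1 -> u is not orthogonal to w
```

Observe that over GF(2) two vectors are orthogonal exactly when they share an even number of 1-positions.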
The inner product, which is a binary operation that maps pairs of vectors in the vector space V
over field F onto scalars in F, is used to characterise dual (null) spaces.
Given that a vector space V over F is a vector space with inner product, the dual space S⊥ of a subspace S is defined as the set of all vectors v in V such that u·v = 0 for all u ∈ S.

Note that S and S⊥ are not disjoint, since they both contain the vector whose coordinates are all zero, denoted by 0. Additionally, the Dimension Theorem imposes that the sum of the dimensions of S and S⊥ is equal to the dimension of the vector space V.
3.4 Matrices
It is common practice in error control systems to employ matrices in the encoding and
decoding processes.
A k×n matrix G over a Galois field GF(q) is a rectangular array with k rows and n columns, where each entry gij is an element of GF(q) (i = 0..k – 1 and j = 0..n – 1).
If k ≤ n and the k rows of a matrix G are linearly independent, the q^k linear combinations of these rows form a k-dimensional subspace of the vector space Vn of all n-tuples over GF(q). Such a subspace is called the row space of matrix G.
Furthermore, by using the notion of dual space previously introduced, an important theorem implies the existence of an (n – k)×n matrix H for each k×n matrix G with k linearly independent rows. The precise statement of this theorem, Lin and Costello (1983, p. 47), which will appear rather useful in section 4.3 concerning the matrix description of a code, has as follows.

Theorem 3-4: For any k×n matrix G over GF(q) with k linearly independent rows, there exists an (n – k)×n matrix H over GF(q) with (n – k) linearly independent rows such that for any row gi in G and any hj in H, gi·hj = 0. The row space of G is the null (dual) space of H, and vice versa.
4. LINEAR BLOCK CODES
Most known codes are block codes (or codes of fixed length). These codes divide the data
stream into blocks of fixed length which are then treated independently. There also exist
codes of non-constant length such as the convolutional codes which offer a substantially
different approach to error control. In these codes, redundancy can be introduced into an
information sequence through the use of linear shift registers which convert the entire data
stream, regardless of its length, into a single codeword. In general, encoding and decoding of
convolutional codes depends more on designing the appropriate shift register circuits and less
on mathematical structures. Therefore, in this study, attention is confined to block codes,
which invoke algebraic techniques mainly based on the theory of groups to insert redundancy
into the information sequence.
[Figure 4–1: the encoder maps each block of k message digits to an n-digit codeword]
The length n of the codeword is greater than k and these (n – k) additional digits, often
referred to as parity-check digits, are the redundancy added.
In order to ensure that the encoding process can be reversed at the receiver’s end to retrieve the original message, there must be a one-to-one correspondence between a message block and its corresponding codeword. This implies that there are exactly q^k codewords. The set of q^k codewords of length n is called an (n,k) block code.
This representation allows us to exploit the algebraic structures introduced in Chapter 3 and
provides the basis for defining linear block codes.
Based on the definition of linear block codes over GF(2) in Lin and Costello (1983, p. 52),
these codes can also be defined over a general finite field alphabet as follows.
Definition 4-1: A block code of length n and q^k codewords is called a linear (n,k) code C if and only if its q^k codewords form a k-dimensional subspace of the vector space of all n-tuples over the field GF(q).
Based on the properties of vector spaces, discussed in section 3.3, it can be seen that the linear
combination of any set of codewords is a codeword. This implies that the sum of any two
codewords in C is a codeword in C. Another consequence of this is that linear codes always
contain the all-zero vector, 0, as a codeword.
c = m0g0 + m1g1 + … + mk-1gk-1

where {mi}, i = 0..k – 1, are in GF(q). The above expression is valid for any codeword c in C, implying that there is a one-to-one correspondence between the set of message blocks of the form (m0, m1, ..., mk-1) and the codewords in C.
      | g0   |   | g0,0    g0,1    . . .  g0,n-1   |
      | g1   |   | g1,0    g1,1    . . .  g1,n-1   |
G  =  |  .   | = |   .                             |
      |  .   |   |   .                             |
      | gk-1 |   | gk-1,0  gk-1,1  . . .  gk-1,n-1 |
Clearly, the rows of G completely specify the code C. The matrix G is called a generator
matrix for code C and is used for encoding any message m = (m0 , m1 , ..., mk-1 ) as
c = m·G = [m0 m1 ... mk-1]·G = m0g0 + m1g1 + … + mk-1gk-1
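Encoding is therefore nothing more than forming this linear combination. The Python sketch below computes c = m·G over GF(2); the particular matrix G is one illustrative choice of four linearly independent rows, not one prescribed by the text:

```python
def encode(m, G, q=2):
    """c = m·G over GF(q): linear combination of the rows of G weighted by the message digits."""
    k, n = len(G), len(G[0])
    return tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))

# An illustrative generator matrix for a binary (7,4) code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
m = (1, 0, 1, 1)
print(encode(m, G))   # (1, 0, 1, 1, 0, 1, 0)
```

Only the k rows of G are stored; each of the q^k messages selects one of their linear combinations.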
The generator matrix is central to the description of linear block codes since the encoding
process is reduced to matrix multiplication. In addition, only the k rows of G need to be
stored; the encoder merely needs to form a linear combination of the k rows based on the
input message m = (m0 , m1 , ..., mk-1 ). Likewise, the decoding process can be simplified by the
use of another matrix, the parity-check matrix, introduced next.
As stated in Theorem 3-4, for any k×n matrix G with k linearly independent rows there exists an (n – k)×n matrix H with (n – k) linearly independent rows such that any vector orthogonal to the rows of H is in the row space of G and is thus a valid codeword.
Such a matrix H is called a parity-check matrix of the code and is used for decoding, since a received word v is a codeword if and only if v·H^T = 0, where H^T is the transpose¹ of H. In addition, the (n – k) linearly independent rows of H span an (n – k)-dimensional subspace of the vector space of all n-tuples over GF(q). It can be seen that this is the dual space of the vector space formed by the (n,k) code. Thus, H can be regarded as a generator matrix for an (n, n – k) code. This code, with regard to the notion of dual space, is called the dual code C⊥ of C.
¹ The transpose of H is an n×(n – k) matrix whose rows are the columns of H and whose columns are the rows of H.
Consider a linear code C with generator matrix G. By applying elementary row operations² and column permutations it is always possible to obtain a generator matrix of the form

G = [P | Ik]

where P is a k×(n – k) matrix and Ik is the k×k identity matrix. Matrix G in the above expression is said to be in systematic form.
Note that, by permuting the columns of G, code C may change, but the new code C´ will differ from C only in the order of symbols within codewords, so the two codes are not regarded as essentially different. In fact, such codes are called equivalent codes.
If the generator matrix G is in systematic form, the message block during encoding is
embedded without modification in the last k coordinates of the resulting codeword. The
(n – k) first coordinates contain a set of (n – k) symbols that are linear combinations of certain information symbols. This set is determined by the matrix P, which can be stored in the read-only memory (ROM) of a PC.
c = m·G = [m0 m1 ... mk-1]·[P | Ik] = [c0 c1 ... cn-k-1 | m0 m1 ... mk-1]

With G in this systematic form, a corresponding parity-check matrix can be written as

H = [In-k | P^T]
The use of G and H in systematic form has implementation advantages. An encoder of binary
systematic linear codes, according to Reed and Chen (1999, p. 82) can be implemented by a
logic circuit, as illustrated in Appendix D. Additionally, the decoding process is simplified
since a received word contains information in the last k positions and thus in case of correct
transmission, the original message can be reconstructed by simply extracting the last k
coordinates.
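This extraction property can be sketched in Python, following the [P | Ik] convention above: the check digits are formed from P and the message is appended unchanged, so error-free decoding is mere extraction of the last k coordinates. The parity sub-matrix P below is an illustrative choice, and the function name is ours:

```python
def systematic_encode(m, P, q=2):
    """Encode with G = [P | I_k]: (n - k) check digits from P, then the message unchanged."""
    k = len(P)
    r = len(P[0])                       # r = n - k parity-check digits
    checks = tuple(sum(m[i] * P[i][j] for i in range(k)) % q for j in range(r))
    return checks + tuple(m)

P = [                                   # illustrative k x (n-k) parity sub-matrix
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
m = (1, 0, 1, 1)
c = systematic_encode(m, P)
print(c)                  # (0, 1, 0, 1, 0, 1, 1)
print(c[-len(m):] == m)   # True: error-free decoding is simple extraction
```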
² Elementary row operations are defined to be:
i. multiplication of a row by a non-zero constant
ii. replacement of a row ri with ri + a·rj, where i ≠ j and a ≠ 0
iii. row permutations (row reordering)
4.4.1 Definition
Effective codes tend to use codewords that are very unlike each other, since such a property is
required for applying maximum likelihood decoding. Clearly, there is a necessity to measure
how like or unlike two codewords are.
Definition 4-2: If u and v are two n-tuples, then we shall say that their Hamming distance is

dH(u,v) = | { i | 0 ≤ i ≤ n – 1, ui ≠ vi } |
In short, the Hamming distance is the number of places (i = 0..n – 1) in which two codewords differ. It is a metric; that is, it satisfies the following properties that a distance function must satisfy on the set of codewords of a code C:
1. dH(u,v) ≥ 0 for all n-tuples u, v
2. dH(u,v) = 0 if and only if u = v
3. dH(u,v) = dH(v,u)
4. For any three n-tuples u, v, w in C: dH(u,w) ≤ dH(u,v) + dH(v,w) (triangle inequality)
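As a Python sketch (the function name is ours), the Hamming distance is a one-line count, and the triangle inequality can be spot-checked on concrete words:

```python
def hamming_distance(u, v):
    """dH(u, v): number of coordinates in which u and v differ."""
    return sum(1 for a, b in zip(u, v) if a != b)

u, v, w = (1, 0, 1, 1, 0), (1, 1, 1, 0, 0), (0, 1, 1, 0, 0)
print(hamming_distance(u, v))   # 2
# Triangle inequality: dH(u,v) <= dH(u,w) + dH(w,v)
print(hamming_distance(u, v) <= hamming_distance(u, w) + hamming_distance(w, v))   # True
```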
The Hamming distance dH is calculated for all possible pairs of codewords of a code C, and much attention is given to its minimum value. The minimum distance d of a code is the least Hamming distance among all pairs of distinct codewords, Reed and Chen (1999, p. 85).
Definition 4-3: The minimum distance of a block code C is the minimum Hamming distance
between all distinct pairs of codewords in C
Calculating distances among q^k codewords is a quite tedious task, which may be simplified by the notion of weight, defined as follows, van Lint (1999, p. 33).
Definition 4-4: The weight w(v) of any vector (codeword) v = v0v1…vn-1 is defined by w(v) = dH(v,0)
In other words, the weight w(v) of a codeword v is the number of non-zero coordinates in v.
Now, the minimum distance of a code can be obtained by using the following theorem, Lin
and Costello (1983, p. 63).
Theorem 4-1: The minimum distance of a linear block code is equal to the minimum weight
of its non-zero codewords.
Consequently, finding the minimum distance of a code requires the weight structure of q^k codewords rather than computing the Hamming distance for all q^2k ordered pairs of codewords.
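Theorem 4-1 thus turns the pairwise search over q^2k ordered pairs into a weight computation over q^k codewords. A brute-force Python sketch, practical for small codes only (the matrix G is an illustrative binary (7,4) generator matrix, and the function name is ours):

```python
from itertools import product

def min_distance(G, q=2):
    """Minimum weight over all non-zero codewords of the code generated by G (Theorem 4-1)."""
    k, n = len(G), len(G[0])
    best = n
    for m in product(range(q), repeat=k):
        if any(m):                          # skip the all-zero message
            c = [sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n)]
            best = min(best, sum(1 for x in c if x != 0))
    return best

G = [                                       # illustrative binary (7,4) generator matrix
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
print(min_distance(G))   # 3: detects any 2 errors, corrects t = 1
```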
Another way to determine the minimum distance of a code is by using the parity-check matrix
H, described in section 4.3.2. It has been seen that if c is a codeword then c·H^T = 0. Further, c·H^T can be written as a linear combination of the columns of H (the rows of H^T). Thus, the equation c·H^T = 0 implies that a set of columns of H is linearly dependent. These results lead to the following theorem, Jones and Jones (2000, p. 131), which provides an alternative way of determining the minimum distance of a code.
Theorem 4-2: Let C be a linear code of minimum distance d and let H be a parity-check
matrix for C. Then d is the minimum number of linearly dependent columns
of H.
If d is the minimum distance of a code, any two distinct codewords differ in at least d coordinates. For such a code, a received word containing (d – 1) or fewer errors cannot coincide with any codeword other than the one transmitted. Hence, the code is capable of detecting all (d – 1) or fewer errors that may occur during transmission.
When applying maximum likelihood decoding, incorrect decoding may occur whenever a
received word is closer, in Hamming distance, to an incorrect codeword than to the correct
codeword. For an (n,k) code with minimum distance d, all incorrect codewords are at least
distance d from the transmitted codeword. This implies that incorrect decoding may be the
case only when at least d / 2 errors occur. This is stated in an equivalent fashion in the
following theorem, Pless (1998, p. 11).
Theorem 4-3: If d is the minimum distance of a code C, then C can correct t = ⌊(d − 1)/2⌋ or fewer errors^3, and conversely.
Equivalently, the above theorem imposes the condition that d and t satisfy the inequality
d ≥ 2t + 1 for a code to be t-error-correcting. That is, a code can correct any word received
with errors in at most t of its symbols.
It is also possible to employ codes, which can simultaneously perform correction and
detection. Such modes of error control, often referred to as hybrid modes of error control, are
commonly used in practice. It can be shown, by use of geometric arguments according to
Reed and Chen (1999, p. 86) that a code with minimum distance d can correct t errors and at
the same time detect l errors where t, l and d satisfy t + l + 1 ≤ d.
A Hamming sphere of radius t contains all possible received words that are Hamming
distance t or less from a codeword. If a received word falls within a Hamming sphere it is
decoded as the codeword in the center of the sphere. The common radius t of these disjoint^4
spheres is desired to be large in order to achieve good error correcting capacity. Yet, to attain
good information rate R, the number of these spheres M must also be large resulting in a
conflict, since the spheres are disjoint. Thus, there is a limit on the number of spheres, and
consequently codewords, that can be used. An upper bound, known as Hamming's Sphere-packing Bound, addresses the problem as stated in the following theorem, Pless (1998, p. 23).
^3 ⌊y⌋ denotes the largest integer less than or equal to y (floor function).
^4 The requirement that the Hamming spheres are disjoint allows only one candidate codeword for each received word (unambiguous decoding).
Theorem 4-4: Every t-error-correcting q-ary (n,k) code satisfies

( C(n,0) + C(n,1)(q − 1) + C(n,2)(q − 1)^2 + … + C(n,t)(q − 1)^t ) · q^k ≤ q^n

where C(n,i) denotes the binomial coefficient.
Hence, given n and k this expression bounds t and so bounds d. An important result that rests
on Theorem 4-4 introduces the notion of t-perfect codes, as presented by Jones and Jones
(2000, p. 108).
Theorem 4-5: Every t-error-correcting linear (n,k) code C over GF(q) satisfies

∑_{i=0}^{t} C(n,i)(q − 1)^i ≤ q^(n−k)
A linear code C is t-perfect if it attains equality in the above theorem. Note that the inequality in Theorem 4-5 can be obtained by dividing the inequality in Theorem 4-4 by the number of all codewords, q^k, since C is of dimension k. Based on this condition for t-perfect codes, Golay constructed the two perfect codes, G11 and G23, as will be described in section 5.2.
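Equality in the bound is easy to test; a Python sketch using the standard-library binomial coefficient `math.comb` verifies the perfect-code condition for the parameter sets named in the text:

```python
from math import comb

def attains_sphere_packing_bound(n, k, q, t):
    """Equality in Theorem 4-5: sum_{i=0}^{t} C(n,i)(q-1)^i == q^(n-k)."""
    return sum(comb(n, i) * (q - 1)**i for i in range(t + 1)) == q**(n - k)

assert attains_sphere_packing_bound(7, 4, 2, 1)    # binary [7,4] Hamming code
assert attains_sphere_packing_bound(11, 6, 3, 2)   # ternary Golay code G11
assert attains_sphere_packing_bound(23, 12, 2, 3)  # binary Golay code G23
```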
From the discussion in this section, it can be concluded that the minimum distance is central to the structure of error detecting / correcting codes. Its tight relation to the error control capacity of a code motivates the search for codes with a large minimum distance for given length n and dimension k.
A standard array T for a linear (n,k) code C over GF(q) is constructed as follows.

Step 1: The first row of T consists of all the codewords in any order, with the restriction that the first word is 0, the all-zero vector, which has all entries equal to zero.
Step 2: The i-th row is formed by choosing an n-tuple in GF(q) that has not appeared yet in T
and placing it in the first column. This word is called the row leader. The rest of the
row is determined by adding the codeword at the head of each column – it is an
element of the first row – to the row leader.
Following the above steps, an array T with q^(n−k) rows and q^k columns is formed, called a standard array for the linear (n,k) code. Clearly, T contains all q^n words of length n over GF(q), where q is the number of code symbols and n is the length of the code.
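The two construction steps can be sketched as follows (Python; the binary (4,2) toy code is an assumption, and row leaders are taken in lexicographic order, so the resulting array may differ from one built with minimum-weight leaders):

```python
from itertools import product

code = [(0,0,0,0), (1,0,1,0), (0,1,1,1), (1,1,0,1)]   # binary (4,2) code

def vadd(u, v):
    """Coordinatewise addition over GF(2)."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def standard_array(code, q=2, n=4):
    rows = [list(code)]                  # Step 1: codewords, with 0 first
    used = set(code)
    for leader in product(range(q), repeat=n):   # Step 2: unused row leaders
        if leader not in used:
            row = [vadd(leader, c) for c in code]
            rows.append(row)
            used.update(row)
    return rows

T = standard_array(code)
# q^(n-k) = 4 rows of q^k = 4 entries, covering all q^n = 16 words exactly once
assert len(T) == 4 and len({w for row in T for w in row}) == 16
```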
From the way in which the standard array is constructed – that is, every entry is the sum of its row leader and the codeword at the head of its column – it follows that the horizontal differences in a standard array are codewords. This property yields a test of whether two entries u and v lie in the same row of the standard array or not, according to the following theorem, Pretzel (1996, p. 50).
Theorem 4-6: Let C be a linear (n,k) code and T a standard array for C. Then two entries u
and v lie in the same row of T if and only if their difference u – v is a
codeword.
Error processing with the standard array is performed by replacing each received word with the codeword at the head of its column. An important property that rests on the above theorem is that every word of length n over GF(q) occurs exactly once in a standard array for the code C. As a result, standard array decoding is complete and unambiguous. By 'complete' is meant that every received word is always matched to a codeword, and 'unambiguous' refers to the fact that the received word uniquely determines the corresponding codeword, not allowing any arbitrary choice.
The set of elements that appear in the same row is called a coset. Thus, the cosets of a code
can be defined as the rows of the standard array. These cosets are the same for any possible
standard array of a code, because the test whether two entries lie in the same row, is based on
the set of codewords of the code and not on the actual choice of the array. The entries in the
first row may be in different order, but the sets of elements are the same. Consequently, the
rest of the rows in the standard array have the same entries, even though their order may differ
since ordering depends on the choice of row leader for each row. This property of the
standard array is important, as the error processor associates the whole row to its row leader.
Once the error processor using a standard array detects an erroneous received word, instead of assuming which codeword was sent, it attempts to guess which error occurred. Then, based on the fact that an incorrect received word v equals the transmitted codeword u plus an error pattern e, where e = v − u, the word v is corrected to the codeword u. To accomplish that, the error patterns are chosen to be the row leaders of the standard array. In this way, each error pattern e lies in the first column, whose head is 0. Thus, e − 0 = v − u, where u is the word at the head of the column containing v. Hence, u = v − e, and the transmitted codeword u is determined.
Since a standard array for a linear (n,k) code cannot always have all possible error patterns as
row leaders, decoding with a standard array does not enable correction of all errors. In an
attempt to ease this deficiency of the standard array, the row leaders can be chosen to be of
minimal weight among all candidates. In this case, the row leaders are called coset leaders.
However, the choice of leader may be arbitrary in many cases, leading to different versions of the standard array; still, the cosets consist of the same words in every version. In this way, each received word is corrected to a closest codeword. The above argument is justified in the following theorem, Pretzel (1996, p. 54).
Theorem 4-7: Let C be a linear (n,k) code over A with a standard array T in which the row leaders have the smallest possible weight. Let u be a word of length n and let v be the codeword at the head of its column. Then for any codeword w, the distance d(u,w) is greater than or equal to the distance d(u,v).
In practice, however, the size of the standard array often becomes prohibitive. Codes need long block lengths in order to achieve multiple error correction, resulting in standard arrays with, say, 2^100 entries, which is not feasible on computers in terms of storage space and the time required to search for a specific entry.
Additionally, in the standard array decoding technique, the error patterns corrected are the
row (coset) leaders. As a result, if two error patterns lie in the same row (coset), at most one
of them can be corrected.
To work within the limitation that the standard array imposes on the error correction capabilities of the code, a set of error patterns is chosen, usually the set of all errors up to some given weight. The elements of this set are chosen as the coset leaders; they therefore lie in distinct rows, allowing the standard array to correct all of them and no more. In short, the corresponding code can correct all error patterns of a certain weight so long as they appear as coset leaders. Regarding the size of the standard array, the notion of syndromes of received words reduces the large standard array to a table of coset leaders and their syndromes, as discussed next, resulting in reduced storage space.
Definition 4-5: Given a linear (n,k) code C and a check matrix H, the syndrome of a word v of length n is v·H^T.
Consequently, the standard array can be reduced to a table containing the row leaders and
syndromes only, resulting in reduced storage space while the search to locate a received word
is less time-consuming.
The received words and the corresponding error patterns have the same syndromes. Indeed, supposing that codeword u is transmitted and v = u + e is received, where e is the error pattern, the syndrome of v is

v·H^T = (u + e)·H^T = u·H^T + e·H^T = e·H^T

since u ∈ C and u·H^T = 0 because u is a codeword. The basic property of the check matrix can be restated using the notion of syndromes by saying that a received word v is a codeword if and only if its syndrome v·H^T = 0. The syndrome of each received word is computed and
located in the syndrome list of the syndrome table. Then, the corresponding coset leader is
the error pattern to be subtracted from the received word. In this way, the actual codeword
sent is reconstructed.
As an example, consider the binary (4,2) code with generator matrix

G = 1 0 1 0
    0 1 1 1

and check matrix

H = 1 1 1 0
    0 1 0 1
The first row of a possible standard array consists of the codewords, with 0 first:

0000  1010  0111  1101
1000  0010  1111  0101
0100  1110  0011  1001
0001  1011  0110  1100

The corresponding syndrome table lists each coset leader together with its syndrome:

coset leader   syndrome
0000           00
1000           10
0100           11
0001           01
Note that the error pattern 0010 was not chosen as a coset leader, because it had already
appeared in a coset, at the second row, of the standard array. The syndrome of the coset
leader 1000 is identical to the first column of the check matrix. Likewise, the syndrome of
0100 is the second column of H and that of 0001 is the fourth column of the check matrix.
The position of 1 in the error pattern indicates the column and consequently the position of
error in the received word. In the previous example, the error pattern was 0001 and the error
symbol was in the fourth position of the received word.
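The full error processor for a small code fits in a few lines. The Python sketch below (function names are hypothetical) builds the syndrome table from the coset leaders and corrects a received word by subtracting the matching error pattern:

```python
H = [(1, 1, 1, 0),
     (0, 1, 0, 1)]   # check matrix of the binary (4,2) example code

def syndrome(v):
    """s = v * H^T over GF(2)."""
    return tuple(sum(a * b for a, b in zip(row, v)) % 2 for row in H)

# coset leaders from the standard array, keyed by their syndromes
leaders = {syndrome(e): e
           for e in [(0,0,0,0), (1,0,0,0), (0,1,0,0), (0,0,0,1)]}

def decode(v):
    e = leaders[syndrome(v)]                          # guessed error pattern
    return tuple((a - b) % 2 for a, b in zip(v, e))   # u = v - e

assert decode((1, 0, 1, 0)) == (1, 0, 1, 0)  # error-free word is unchanged
assert decode((0, 0, 1, 0)) == (1, 0, 1, 0)  # error in position 1 corrected
```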
4.5.2.1 Observation
Syndrome decoding is more efficient than standard array decoding in terms of storage space and speed of error processing. Considering a binary (100, 60) code, standard array decoding requires storing 2^100 entries and searching through 2^60 vectors to locate a received word. With syndrome decoding, the requirement is to store and search through 2^40 coset leaders and their syndromes, resulting in reduced storage space and a less time-consuming process.
Hamming Codes - Golay Codes - RM Codes
As described by Reed and Chen (1999, p. 104), a binary Hamming code can be constructed for any positive integer m ≥ 2, with the following parameters:

code length n = 2^m − 1, number of information symbols k = 2^m − 1 − m, number of parity-check symbols n − k = m, minimum distance d = 3.
Hamming codes are determined by their parity-check matrices. The parity-check matrix H of
a binary Hamming code, consists of all non-zero m-tuples as its columns which are ordered
arbitrarily.
The smallest number of distinct non-zero binary m-tuples that are linearly dependent is three, and this can be justified as follows. Since the columns of H are non-zero and distinct, no two columns sum to zero. In addition, H has all the non-zero m-tuples as its columns, and thus the vector sum of any two columns h_i and h_j must be a third column h_k of H. This implies that h_i + h_j + h_k = 0, which means that some three columns of H are linearly dependent. It follows from Theorem 4-2 that the Hamming codes always have minimum distance three. Further, Theorem 4-3 implies that Hamming codes can always correct a single error.
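The construction of H can be sketched directly (Python; ordering the columns as the binary representations of 1 to 2^m − 1 is one admissible arbitrary ordering):

```python
def hamming_check_matrix(m):
    """Parity-check matrix of the binary Hamming code of length 2^m - 1:
    the columns are all non-zero binary m-tuples."""
    n = 2**m - 1
    cols = [[(j >> i) & 1 for i in range(m)] for j in range(1, n + 1)]
    return [[col[r] for col in cols] for r in range(m)]   # m rows, n columns

H = hamming_check_matrix(3)          # gives the [7,4] Hamming code
assert len(H) == 3 and len(H[0]) == 7

# the sum (XOR) of any two distinct columns is again a non-zero column,
# so some three columns are linearly dependent and d = 3 (Theorem 4-2)
assert all(i ^ j in range(1, 8) for i in range(1, 8) for j in range(1, 8) if i != j)
```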
Hamming codes can be decoded with syndrome decoding, developed in section 4.5. Suppose that r is the received word and e is the error pattern with a single error in the j-th position. When the syndrome of r is computed, the transpose of the j-th column of H is obtained, as demonstrated in the following expression

s = r·H^T = e·H^T = h_j^T
Step 1: Compute the syndrome s = r·H^T of the received word r

Step 2: Determine the position j of the column of H whose transpose is the syndrome

Step 3: Complement the bit in the j-th position of the received word
The binary Hamming code obtained for m = 3 is the [7,4] Hamming code. This code encodes message words of length 4 as codewords of length 7. A table of all 2^4 message words and their corresponding codewords, taken from Lin and Costello (1983, p. 67), was rather useful for testing the programs presented in the following sections and is included in Appendix E. This table can be constructed by using the following equations (encoding rule) based on Pretzel (1996, p. 67):
c1 = m1 + m3 + m4
c2 = m1 + m2 + m3 (5.1.1)
c3 = m2 + m3 + m4
where {m_i} denote the coordinates of the message word m = (m1 m2 m3 m4). The corresponding codeword is formed as c = (c1 c2 c3 c4 c5 c6 c7), where c1, c2, c3 are determined by the above equations and c4 = m1, c5 = m2, c6 = m3 and c7 = m4.
For example, the message word m = (0 1 0 1) is encoded as

c1 = m1 + m3 + m4 = 0 + 0 + 1 = 1
c2 = m1 + m2 + m3 = 0 + 1 + 0 = 1
c3 = m2 + m3 + m4 = 1 + 0 + 1 = 0

so the corresponding codeword is c = (1 1 0 0 1 0 1).
Note that the information symbols – that is, the message word coordinates – occupy the last 4
positions of the corresponding codeword and therefore, equations (5.1.1) perform systematic
encoding for the [7,4] Hamming code.
G = 1 1 0 1 0 0 0
    0 1 1 0 1 0 0
    1 1 1 0 0 1 0
    1 0 1 0 0 0 1
Though comments have been added to the source code for clarity, a brief description of the
program follows.
The program takes a message word as input from the user. This message word, consisting of
4 bits, is encoded through matrix multiplication by the generator matrix G of the code, where
the operations are performed modulo 2. The output of the program is the corresponding
codeword.
#!/usr/bin/perl -w
#HamSysEncoder.pl
#the program performs systematic encoding for the [7,4] Hamming Code
#it uses the generator matrix G in systematic form

use strict;

#rows of the generator matrix G
my @G = ([1,1,0,1,0,0,0],
         [0,1,1,0,1,0,0],
         [1,1,1,0,0,1,0],
         [1,0,1,0,0,0,1]);

while (1) {
    print "Enter a message word (4 bits separated by spaces): ";
    chomp(my $mword = <STDIN>);
    if (!defined($mword) || $mword !~ /^[01](\s+[01]){3}$/) {
        print "Enter a message word first:\n";
        next;
    }
    my @message = split(/\s+/, $mword);

    #codeword = message x G, with operations performed modulo 2
    my @codeword;
    for my $n (0..6) {
        my $sum = 0;
        $sum += $message[$_] * $G[$_][$n] for 0..3;
        $codeword[$n] = $sum % 2;
    }
    print "Codeword: @codeword\n";

    #continue encoding?
    print "\n\nDo you want to encode another message word? (Y/N) ";
    chomp(my $yesorno = <STDIN>);
    while ($yesorno !~ /^[YyNn]$/) {
        print "Enter Y or N\n";
        chomp($yesorno = <STDIN>);
    }
    last if $yesorno =~ /^[Nn]$/;
}
Given the generator matrix G, used for systematic encoding in the previous section, the
corresponding parity-check matrix H in systematic form for the [7,4] Hamming code is of the
form
H = 1 0 0 1 0 1 1
    0 1 0 1 1 1 0
    0 0 1 0 1 1 1
The table of Appendix E, and in particular its second column, can be used for testing the program. A brief description of the program follows.
The program takes a received word as input from the user. This word of length 7 is
multiplied by the transpose of the parity-check matrix H to obtain its syndrome for error
detection / correction. All single errors are detected and corrected. Double errors are
detected but cannot be corrected, as imposed by the [7,4] Hamming code. The output of the
program is the actual transmitted codeword.
#!/usr/bin/perl -w
#HamSyndDecoder.pl
#the program performs syndrome decoding for the [7,4] Hamming code
#detects and corrects ALL single errors
#detects but cannot correct double errors (results in incorrect
#decoding)

use strict;

#rows of the parity-check matrix H in systematic form
my @H = ([1,0,0,1,0,1,1],
         [0,1,0,1,1,1,0],
         [0,0,1,0,1,1,1]);

while (1) {
    print "Enter a received word (7 bits separated by spaces): ";
    chomp(my $rword = <STDIN>);
    if (!defined($rword) || $rword !~ /^[01](\s+[01]){6}$/) {
        print "Enter a received word first:\n";
        next;
    }
    my @received = split(/\s+/, $rword);

    #syndrome s = r x H^T, with operations performed modulo 2
    my @syndrome;
    for my $i (0..2) {
        my $sum = 0;
        $sum += $received[$_] * $H[$i][$_] for 0..6;
        $syndrome[$i] = $sum % 2;
    }

    #error correction: a non-zero syndrome equals the transpose of the
    #column of H in the position of the (single) error
    if (grep { $_ } @syndrome) {
        for my $j (0..6) {
            if ($syndrome[0] == $H[0][$j] &&
                $syndrome[1] == $H[1][$j] &&
                $syndrome[2] == $H[2][$j]) {
                $received[$j] = 1 - $received[$j]; #complement the j-th bit
                last;
            }
        }
    }
    print "Decoded codeword: @received\n";

    #continue decoding?
    print "\n\nDo you want to decode another received word? (Y/N) ";
    chomp(my $yesorno = <STDIN>);
    while ($yesorno !~ /^[YyNn]$/) {
        print "Enter Y or N\n";
        chomp($yesorno = <STDIN>);
    }
    last if $yesorno =~ /^[Nn]$/;
}
Golay's observation was that a perfect linear code could have as parameters q = 3, n = 11, k = 6 and t = 2, since

1 + C(11,1)·2 + C(11,2)·2^2 = 243 = 3^5

Recall that a linear code is perfect if it attains equality in Theorem 4-5.
equality in Theorem 4-5. To construct such a code, Golay considered the following parity-
check matrix, Jones and Jones (2000, p. 137),
H = 1 1 1 2 2 0 1 0 0 0 0
    1 1 2 1 0 2 0 1 0 0 0
    1 2 1 0 1 2 0 0 1 0 0
    1 2 0 1 2 1 0 0 0 1 0
    1 0 2 2 1 1 0 0 0 0 1
with n = 11 columns and n – k = 5 independent rows. Though a tedious task, it can be proved
that there are no sets of four linearly dependent columns, while there is a set of five.
Consequently, the minimum distance is d = 5, hence t = 2 according to Theorem 4-3. The
code C defined by H, attains equality in Theorem 4-5 since
∑_{i=0}^{t} C(n,i)(q − 1)^i = 1 + C(11,1)·2 + C(11,2)·2^2 = 243 = 3^5

and 3^5 = q^(n−k), so the code is perfect; it is called the ternary Golay code G11 of length 11.
Inspired by the equation

1 + C(23,1) + C(23,2) + C(23,3) = 2048 = 2^11
and motivated by Theorem 4-5, Golay took q = 2, n = 23, k = 12 and t = 3 to construct the binary Golay code G23 of length 23. Similarly, a parity-check matrix H with 23 columns and 11 independent rows was used, in which the minimum number of linearly dependent columns is seven. Thus, this code has minimum distance d = 7 and, from Theorem 4-3, it follows that t = 3, making G23 a 3-error-correcting perfect code. Recall that it is perfect since it attains equality in Theorem 4-5.
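The claimed column dependences can be confirmed by brute force; the Python sketch below tests every set of up to five columns of the ternary matrix H above, with all non-zero coefficient combinations from GF(3):

```python
from itertools import combinations, product

H = [[1,1,1,2,2,0,1,0,0,0,0],
     [1,1,2,1,0,2,0,1,0,0,0],
     [1,2,1,0,1,2,0,0,1,0,0],
     [1,2,0,1,2,1,0,0,0,1,0],
     [1,0,2,2,1,1,0,0,0,0,1]]

def dependent(positions, coeffs):
    """Do the chosen columns, scaled by the non-zero coeffs, sum to 0 mod 3?"""
    return all(sum(H[r][p] * c for p, c in zip(positions, coeffs)) % 3 == 0
               for r in range(5))

def some_dependence(size):
    return any(dependent(pos, coeffs)
               for pos in combinations(range(11), size)
               for coeffs in product((1, 2), repeat=size))

assert not any(some_dependence(s) for s in (1, 2, 3, 4))  # no 4 dependent columns
assert some_dependence(5)      # five columns are dependent, so d = 5 and t = 2
```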
5.3.1 Definition
For each positive integer m and each integer r satisfying 0 ≤ r ≤ m, the r-th order Reed-Muller code R(r,m) is a binary linear code with parameters

n = 2^m,  M = 2^(1 + C(m,1) + … + C(m,r)),  d = 2^(m−r)
The special case R(0,m) is the repetition code. More attention is given to the first order
Reed-Muller codes (r = 1) which are binary linear codes. Among several ways of defining
the Reed-Muller codes, R(m), Roman (1997, p. 234) takes the following approach.
Definition 5-1: The Reed-Muller codes R(m) are binary codes defined for all integers m ≥ 1,
as follows:
In words, the codewords in R(m + 1) are formed by juxtaposing each codeword in R(m) with
itself and with its complement.
For example, given that R(1) = {00, 01, 10, 11}, according to the previous definition the R(2) code consists of the codewords

0000, 0011, 0101, 0110, 1010, 1001, 1111, 1100
Furthermore, the above definition leads to the construction of a generator matrix for R(m)
based on the generator matrix for R(1), as stated in the following theorem, Roman (1997,
p. 236).
Theorem 5-1: A generator matrix for R(1) is

R1 = 0 1
     1 0

If Rm is a generator matrix for R(m), then a generator matrix for R(m + 1) is

R(m+1) = Rm    Rm
         0…0   1…1
The first row of a generator matrix Rm for R(m) consists of 2^(m−1) 0s followed by 2^(m−1) 1s. The i-th row of Rm is formed by alternating blocks of 0s and 1s of length 2^(m−i). As for the m-th row of Rm, it consists of alternating 0s and 1s, since the blocks are of length 2^(m−m) = 2^0 = 1. The last row consists of all 1s.
R3 = 0 0 0 0 1 1 1 1
     0 0 1 1 0 0 1 1
     0 1 0 1 0 1 0 1
     1 1 1 1 1 1 1 1
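The row description above translates directly into code; a Python sketch (the helper name `rm_generator` is hypothetical) builds R3 and checks the first-order parameters n = 2^m, M = 2^(1+m) and d = 2^(m−1):

```python
from itertools import product

def rm_generator(m):
    """Generator rows of the first-order Reed-Muller code R(m): row i
    alternates 0-blocks and 1-blocks of length 2^(m-i); last row all 1s."""
    n = 2**m
    rows = [[(j >> (m - i)) & 1 for j in range(n)] for i in range(1, m + 1)]
    rows.append([1] * n)
    return rows

G = rm_generator(3)
assert G[0] == [0,0,0,0,1,1,1,1] and G[-1] == [1]*8

codewords = set()
for coeffs in product((0, 1), repeat=len(G)):
    codewords.add(tuple(sum(c * row[j] for c, row in zip(coeffs, G)) % 2
                        for j in range(8)))
assert len(codewords) == 16                            # M = 2^(1+m)
assert min(sum(w) for w in codewords if any(w)) == 4   # d = 2^(m-1)
```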
An n×n matrix H with entries +1 and −1 whose distinct rows are mutually orthogonal (that is, H·H^T = n·I_n) is called a Hadamard matrix of order n. For example,

H = 1  1
    1 −1

is a Hadamard matrix of order 2.
As proven in the next theorem, Hadamard matrices generate binary codes, Jones and Jones
(2000, p. 116).
Theorem 5-2: Each Hadamard matrix of order n gives rise to a binary code of length n, with
M = 2n codewords and minimum distance d = n / 2.
Indeed, if the 2n vectors ±r_1, ±r_2, …, ±r_n are formed from the rows r_i of H, the orthogonality condition implies that these vectors are distinct. If each entry −1 is replaced by 0, the 2n vectors can be regarded as elements of GF(2), generating a binary code. Its codewords are these 2n vectors of length n.
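Theorem 5-2 can be checked computationally; the sketch below (Python) uses the Sylvester construction, one standard way (not mentioned in the text) of producing a Hadamard matrix of order 8:

```python
def sylvester(n):
    """Hadamard matrix of order n (n a power of 2): H_2n = [[H, H], [H, -H]]."""
    if n == 1:
        return [[1]]
    h = sylvester(n // 2)
    return ([row + row for row in h] +
            [row + [-x for x in row] for row in h])

H = sylvester(8)

# form the 2n vectors +-r_i and replace every -1 entry by 0
code = {tuple(1 if s * x == 1 else 0 for x in row)
        for row in H for s in (1, -1)}
assert len(code) == 16   # M = 2n codewords

dmin = min(sum(a != b for a, b in zip(u, v))
           for u in code for v in code if u != v)
assert dmin == 4         # d = n / 2
```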
6. MATHEMATICAL BACKGROUND II
The error detecting / correcting codes discussed so far were constructed using algebraic structures introduced in Chapter 3. In this Chapter, a larger portion of the available algebraic tools is presented, since the application of more advanced algebraic techniques leads to a significant increase in the power of the resulting codes. Such a class of codes are the so-called cyclic codes, presented in the following Chapter. These codes can correct multiple errors and can cope with error bursts, that is, errors that do not occur entirely independently of each other but affect several neighbouring bits. For instance, a 2.5mm scratch on a compact disk accounts for approximately 4,000 erroneous bits. The class of cyclic codes includes certain families of codes, such as BCH and Reed-Solomon codes, which are widely adopted in practical applications. In order to describe these codes, it is necessary to consider finite fields.
Consider the set of integers {0, 1, 2, …, p – 1}. Under addition modulo p these elements form
an additive commutative group, by Theorem 3-1. Under multiplication modulo p the subset
of elements {1, 2, …, p – 1} forms a multiplicative commutative group, by Theorem 3-2. If
the two operations are allowed to distribute, which is the case in integer arithmetic, then the
above set is a field. Hence, the integers {0, 1, 2, …, p – 1} where p is a prime, form a finite
field of order p under modulo p addition and multiplication, denoted by GF(p).
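The field axioms can be verified exhaustively for a small prime; a Python sketch (p = 7 is an arbitrary choice):

```python
from itertools import product

p = 7   # any prime yields the finite field GF(p)

# every non-zero element has a multiplicative inverse modulo p
assert all(any((a * b) % p == 1 for b in range(1, p)) for a in range(1, p))

# distributivity ties the additive and multiplicative structures together
assert all((a * ((b + c) % p)) % p == ((a * b) % p + (a * c) % p) % p
           for a, b, c in product(range(p), repeat=3))
```

For a composite modulus the first assertion fails (e.g. 2 has no inverse modulo 6), which is why p must be prime.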
Finite fields of order p^m, with p a prime and m a positive integer, can be constructed as vector spaces over the prime-order field GF(p), as shall be demonstrated in section 6.1.3.
The order of a finite or Galois field element is of significant importance since it determines
some of its basic properties.
Consider the finite field GF(q) and the following sequence of elements
1, β, β^2, β^3, …
where β is an element of GF(q) and 1 is the multiplicative identity. Since β ∈ GF(q), the property of closure under multiplication implies that all successive powers of β must also be in GF(q). However, GF(q) has only a finite number of elements. Thus, for some power of β the sequence begins to repeat values found earlier in the sequence. The first element to repeat must be 1, and this can be proven by contradiction. Assume that β^x ≠ 1 is the first repeating element and that it repeats β^y, where 0 < y < x; then β^x = β^y. It follows that β^(x−y) = 1 is a repeating element, where 0 < x − y < x. Since β^x ≠ 1 was assumed to be the first repeating element, there is a contradiction. Thus, the first element to repeat is the element 1.
The above concept is related to the order of a Galois field element, which is defined as
follows.
Definition 6-1: Let β be an element in GF(q). The order of β, denoted by ord(β), is the smallest positive integer t such that β^t = 1.
Note that for a Galois field element β, ord(β) is defined using the multiplicative operation, whereas the order of a group element was defined using the additive operation. Furthermore, unlike a group, the order of a finite field completely specifies the field: a finite field of order q, GF(q), is unique up to isomorphism^8. Thus, two finite fields of the same order are always identical up to the labelling of their elements, regardless of how the fields were constructed. The order of a non-zero element β in GF(q) must satisfy certain requirements, as dictated by Theorems 6-1 and 6-2, Wicker (1995, p. 34).
Theorem 6-1: Let β be a non-zero element in GF(q). Then ord(β) divides (q − 1).

Theorem 6-2: Let α and β be elements in GF(q) such that β = α^i. If ord(α) = t then

ord(β) = t / GCD(i,t)

(where GCD(i,t) denotes the greatest common divisor of i and t).
Theorem 6-1 states that the order of an element β in GF(q) must be a divisor of (q − 1). For example, in GF(16) the elements can only have orders in {1, 3, 5, 15}. It is possible to determine the number of elements in the field of any given order. This information is
^8 Two fields F and F′ are called isomorphic if there exists a map f from F to F′ such that:
i. f is bijective
ii. for any a, b ∈ F, f(a·b) = f(a)·f(b) and f(a + b) = f(a) + f(b)
The map f is called an isomorphism.
obtained by the Euler φ function, φ(t), defined as the number of integers in the set {1, …, t − 1} that are relatively prime^9 to t.

Theorem 6-1 and the Euler φ function are combined in Theorem 6-3, which describes the multiplicative structure of finite fields, Wicker (1995, p. 36).

Theorem 6-3: In GF(q) there are exactly φ(t) elements of order t for every t that divides (q − 1), and no elements of any other order.
Important results related to the elements of order (q − 1) in GF(q) rest on Theorem 6-3.

Definition 6-2: An element of order (q − 1) in GF(q) is called a primitive element in GF(q).

As an immediate consequence of Theorem 6-3, it follows that in every finite field GF(q) there are exactly φ(q − 1) primitive elements. The notion of primitive elements is central to the analysis of certain powerful classes of codes, as will be demonstrated in Chapter 8.

The fact that φ(t) is always greater than zero for positive t, combined with the above corollary, dictates that every finite field GF(q) contains at least one primitive element.
Consider the following sequence that comprises successive powers of a primitive element α in GF(q)

1, α, α^2, …, α^(q−2), α^(q−1), α^q, …

Since α is a primitive element, ord(α) = q − 1 and thus α^(q−1) is the first power of α that repeats the value 1. It can be shown that 1 is the first element to repeat, in a way analogous to the argument developed for the order of a field element. Consequently, the first (q − 1) elements in the above sequence are distinct and repetitions start from the (q − 1)-th and higher powers of α. Since the powers of α are non-zero, the first (q − 1) elements in the sequence must comprise the
^9 Integers that have no common divisors with t except 1.
(q − 1) non-zero elements in GF(q). Thus, all non-zero elements in GF(q) can be represented as (q − 1) consecutive powers of a primitive element α ∈ GF(q).
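These facts about orders and primitive elements are easy to confirm in a small field; a Python sketch over GF(7) (an arbitrary choice of prime):

```python
from math import gcd

q = 7   # the non-zero elements of GF(7) form a multiplicative group of order 6

def order(b):
    """Smallest positive t with b^t = 1 (mod q)."""
    t, x = 1, b % q
    while x != 1:
        x, t = (x * b) % q, t + 1
    return t

# Theorem 6-1: every order divides q - 1
assert all((q - 1) % order(b) == 0 for b in range(1, q))

# exactly phi(q - 1) primitive elements (here phi(6) = 2: the elements 3 and 5)
phi = sum(1 for i in range(1, q - 1) if gcd(i, q - 1) == 1)
primitive = [b for b in range(1, q) if order(b) == q - 1]
assert primitive == [3, 5] and len(primitive) == phi

# consecutive powers of a primitive element run through all non-zero elements
assert {pow(3, i, q) for i in range(q - 1)} == set(range(1, q))
```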
Consider the following sequence of sums of the multiplicative identity element 1 in a finite field GF(q)

∑_{i=1}^{1} 1 = 1,  ∑_{i=1}^{2} 1 = 1 + 1,  ∑_{i=1}^{3} 1 = 1 + 1 + 1,  …,  ∑_{i=1}^{k} 1 = 1 + 1 + … + 1 (k times),  …

Closure of the field GF(q) under addition implies that these sums are elements of the field. Since GF(q) has a finite number of elements, these sums cannot all be distinct and thus the sequence must begin to repeat at some point. That is, there must exist two positive integers j and k (j < k) such that

∑_{i=1}^{j} 1 = ∑_{i=1}^{k} 1

Thus, ∑_{i=1}^{k−j} 1 = 0. Consequently, there must exist a smallest positive integer λ such that ∑_{i=1}^{λ} 1 = 0. This integer λ is called the characteristic of GF(q). Further, the following theorem, Lin and Costello (1983, p. 22), determines λ to be a prime integer.

Theorem 6-4: The characteristic λ of a finite field is prime.
As mentioned earlier, a Galois field GF(p) can be constructed by reducing the set of integers modulo p, where p is a prime number. The following theorem, Berlekamp (1968, p. 102), in combination with Theorem 6-4, extends the available range of finite fields: the order of a Galois field may also be a power of a prime.

Theorem 6-5: For every prime p and every positive integer m there exists a finite field with p^m elements.
Definition 6-3: A ring is a set of elements R with two binary operations '+' and '·' such that the following requirements are satisfied:
1. R forms an additive commutative group; the additive identity element is '0'
2. The multiplicative operation '·' is associative: a·(b·c) = (a·b)·c
3. Multiplication distributes over addition: a·(b + c) = a·b + a·c and (b + c)·a = b·a + c·a
4. The multiplicative operation '·' is commutative: a·b = b·a
5. There is a multiplicative identity element '1' such that a·1 = 1·a = a
If a ring satisfies all five above requirements, it is said to be a commutative ring with identity. Based on the definition of a ring, a field F can be considered as a ring R for which R − {0} forms a commutative group under multiplication.

The set GF(p)[x] of polynomials with coefficients in GF(p) forms a commutative ring with identity under the following operations.
• additive operation:
(a_0 + a_1·x + a_2·x^2 + … + a_n·x^n) + (b_0 + b_1·x + b_2·x^2 + … + b_n·x^n) = (a_0 + b_0) + (a_1 + b_1)x + … + (a_n + b_n)x^n

• multiplicative operation:
(a_0 + a_1·x + a_2·x^2 + … + a_n·x^n) · (b_0 + b_1·x + b_2·x^2 + … + b_m·x^m) = a_0b_0 + (a_0b_1 + a_1b_0)x + … + a_nb_m·x^(n+m)
Note that the coefficient operations are performed using the operations for the field from
which they were taken. For example, in GF(2) addition modulo 2 dictates that 1 + 1 = 0.
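The two operations can be sketched on coefficient lists (Python, lowest-degree coefficient first; `poly_add` and `poly_mul` are hypothetical helper names):

```python
def poly_add(a, b, p=2):
    """Add two polynomials over GF(p), given as coefficient lists."""
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + y) % p for x, y in zip(a, b)]

def poly_mul(a, b, p=2):
    """Multiply two polynomials over GF(p)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

# in GF(2)[x]: (1 + x) + (1 + x) = 0 and (1 + x)(1 + x) = 1 + x^2,
# because the coefficient arithmetic obeys 1 + 1 = 0
assert poly_add([1, 1], [1, 1]) == [0, 0]
assert poly_mul([1, 1], [1, 1]) == [1, 0, 1]
```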
A polynomial f(x) is irreducible in GF(p)[x] if f(x) cannot be factored into a product of lower-degree polynomials in GF(p)[x]. The notion of an irreducible polynomial is, in a way, analogous to that of a prime number p for the integers. Yet a polynomial f(x) may be irreducible in one ring and reducible in another; for this reason, the notion is used with respect to a specific ring of polynomials.

An irreducible polynomial that satisfies the property indicated by the following definition is central to constructing a Galois field of non-prime order and is called a primitive polynomial.

Definition 6-4: An irreducible polynomial p(x) of degree m in GF(p)[x] is primitive if the smallest positive integer n for which p(x) divides x^n − 1 is n = p^m − 1.
With regard to the roots of primitive polynomials in GF(p)[x], which can be found in the finite field of order p^m, Theorem 6-6 states that they are of order (p^m − 1), Wicker (1995, p. 41).

Theorem 6-6: The roots {α_i} of an m-th degree primitive polynomial in GF(p)[x] have order (p^m − 1).
It can also be shown that all of the roots of an irreducible polynomial have the same order.
However, we can extend GF(p) to a larger field in which p(x) has a root, by a method similar to that which constructs the complex numbers from the reals.

We adjoin a new symbol α to GF(p) and form all formal sums a_0 + a_1·α + … + a_(m−1)·α^(m−1) of degree less than m = deg(p(x)).
These may be added and multiplied like polynomials, except that the multiplication is performed modulo p(α). This requirement ensures that the set of formal sums is closed under the above operations. It may be shown that the resulting set with these operations is a finite field containing GF(p). Furthermore, if α is a root of p(x) then p(α) = 0.

Since α is of order (p^m − 1), by Theorem 6-6, the (p^m − 1) distinct powers of α must have (p^m − 1) corresponding non-zero formal sums of the form

b_0 + b_1·α + … + b_(m−1)·α^(m−1)

The coefficients {b_i} are taken from GF(p), so there are exactly (p^m − 1) distinct non-zero formal sums, or polynomial representations, for the (p^m − 1) powers of α.
It can be shown that these (p^m − 1) polynomials, together with zero, form an additive group under polynomial addition. It can also be shown that they form a multiplicative group under polynomial multiplication performed modulo p(α), and for the operations thus defined multiplication distributes over addition. As a result, the (p^m − 1) polynomial representations together with zero form a finite field of order p^m, GF(p^m).

The non-zero elements of this finite field can be represented as (p^m − 1) consecutive powers of α, or as polynomials in α of degree less than m with coefficients in GF(p). We return to the more general topic of rings of polynomials in sections 6.2.4 and 7.2, where their significance in the construction of cyclic codes shall be explained.
Consider the commutative ring with identity GF(p)[x]. The polynomial p(x) = x3 + x + 1 is a
primitive polynomial in GF(2)[x]. Let a be a root of p(x). Then,
a3 + a + 1 = 0 ⇒ a3 = a + 1
Also, m = 3 since p(x) is of degree 3. The mapping between the distinct powers of a and the
polynomials in a of degree at most m – 1 = 2 has as follows.
0     0
a^0   1
a^1   a
a^2   a^2
a^3   a + 1           since a^3 = a + 1
a^4   a^2 + a         since a^4 = a^3·a = (a + 1)a = a^2 + a
a^5   a^2 + a + 1     since a^5 = a^4·a = (a^2 + a)a = a^3 + a^2 = (a + 1) + a^2
a^6   a^2 + 1         since a^6 = a^5·a = (a^2 + a + 1)a = a^3 + a^2 + a = (a + 1) + a^2 + a = a^2 + 1
As depicted in the above mapping between the exponential (first column) and polynomial (second column) representations, the 7 distinct powers of a have 7 distinct representations as polynomials in a of degree less than 3.
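The reduction a^3 = a + 1 makes this table mechanical to generate. The following Python sketch (function name illustrative) stores each formal sum b_0 + b_1 a + b_2 a^2 as a bit mask and builds the table of the 7 powers of a:

```python
# Build the exponential-to-polynomial table for GF(8) = GF(2)[x]/(x^3 + x + 1).
# A polynomial b0 + b1*a + b2*a^2 is stored as the bit mask b0 | b1<<1 | b2<<2.

def gf8_power_table():
    table = {}
    elem = 1                      # a^0 = 1
    for exp in range(7):          # the 7 distinct powers a^0 .. a^6
        table[exp] = elem
        elem <<= 1                # multiply by a
        if elem & 0b1000:         # degree reached 3: substitute a^3 = a + 1
            elem = (elem ^ 0b1000) ^ 0b011
    return table

table = gf8_power_table()
# a^3 should equal a + 1 (mask 0b011) and a^6 should equal a^2 + 1 (mask 0b101)
print(table[3] == 0b011, table[6] == 0b101)   # True True
```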
Another useful representation for the field elements of GF(p^m) is the vector representation. If b_0 + b_1 a + … + b_{m−1} a^{m−1} is the polynomial representation of an element ß ∈ GF(p^m), then ß can be represented by the vector whose m coordinates are the m coefficients of the polynomial representation of ß. Hence,

ß ↔ (b_0, b_1, …, b_{m−1})

The zero element is represented by the all-zero vector (0, 0, …, 0) of dimension m. In this manner, each distinct power of a, or equivalently each field element in GF(p^m), is associated
with a vector of dimension m with coordinates in GF(p). This representation allows the operation of addition in GF(p^m) to be reduced to vector addition over GF(p). Clearly, GF(p^m) forms a vector space over GF(p).
With regard to the previous example, a vector space representation for GF(8) can be obtained by using the set {1, a, a^2} as a basis. Since all the non-zero elements of GF(8) can be represented as polynomials in a of degree m − 1 = 2 or less, by taking their vector representation the operations in GF(8) are reduced to vector operations. Each distinct power of a is associated with a vector of dimension 3 as:
0                 ↔ (0, 0, 0)
a                 ↔ (0, 1, 0)
a^2               ↔ (0, 0, 1)
a^3 = a + 1       ↔ (1, 1, 0)
a^4 = a^2 + a     ↔ (0, 1, 1)
a^5 = a^2 + a + 1 ↔ (1, 1, 1)
a^6 = a^2 + 1     ↔ (1, 0, 1)
a^7 = 1           ↔ (1, 0, 0)
The above vector representation allows the operation of addition in the field GF(2^3) to be reduced to vector addition over the field GF(2).
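Since each element is a binary 3-tuple, field addition reduces to coordinate-wise addition mod 2, i.e. XOR of the vectors. A minimal sketch, restating the table above as Python literals:

```python
# Vector representation of GF(8): element -> (b0, b1, b2) over GF(2).
# Addition in the field is coordinate-wise addition mod 2 (XOR).

vec = {
    "0": (0, 0, 0), "1": (1, 0, 0), "a": (0, 1, 0), "a2": (0, 0, 1),
    "a3": (1, 1, 0), "a4": (0, 1, 1), "a5": (1, 1, 1), "a6": (1, 0, 1),
}

def add(u, v):
    return tuple((x + y) % 2 for x, y in zip(u, v))

# a^3 + a^4 = (a + 1) + (a^2 + a) = a^2 + 1 = a^6
print(add(vec["a3"], vec["a4"]) == vec["a6"])   # True
```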
In general, a finite field GF(p^m), constructed using an m-th degree polynomial in GF(p)[x], contains GF(p) and can be viewed as a construction over GF(p). Consequently, fields of prime power order, GF(p^m), are called extensions of the prime order field, GF(p), which is referred to as the ground field of GF(p^m).
Just as groups may contain more than one subgroup, so may a finite field contain subfields other than GF(p), p a prime. In fact, GF(p^m) contains all Galois fields of order p^b where b divides m. For example, the field GF(64) contains GF(2^6), GF(2^3), GF(2^2) and GF(2), all being proper subfields except for GF(2^6).
Definition 6-5: A Euclidean domain is a set D with two operations ‘+’ and ‘·’ such that:
1. D forms a commutative ring with identity under ‘+’ and ‘·’
2. Cancellation: if a·b = c·b, b ≠ 0, then a = c
3. There exists a function g: D − {0} → N such that:
   i. g(a) ≤ g(a·b) for all non-zero a, b ∈ D
   ii. for all a, b ∈ D with b ≠ 0, there exist q and r such that a = q·b + r, where r = 0 or g(r) < g(b)
   (q is called the quotient and r the remainder)
Note that for the additive identity element, the value g(0) is taken by convention to be −∞.
The ring of polynomials f(x) over a finite field GF(q), with the function g defined as g(f(x)) = deg(f(x)), forms a Euclidean domain. Put more simply, with each polynomial in GF(q)[x] we associate an integer which is the degree of the polynomial. For example, f(x) = x^3 + x + 1 ∈ GF(2)[x] is associated with the integer 3, which equals deg(f(x)).
The introduction of the function g as defined above allows the operation of division in a Euclidean domain.
A general procedure for finding the greatest common divisor of any two polynomials, also valid for integers, is Euclid’s algorithm. Euclid’s algorithm for polynomials in a Euclidean domain D is as follows.
Given a(x) and b(x) ≠ 0, there exist polynomials s(x) and d(x) such that

a(x)·s(x) + b(x)·d(x) = GCD(a(x), b(x))
The process consists of dividing appropriate polynomials until a remainder of 0 results. The following algorithm determines these appropriate polynomials.
Euclid’s Algorithm
Step 1: (initialisation)
Let r_{−1}(x) = a(x)
and r_0(x) = b(x)
Step 2: (recursion formula)
For i = 1, 2, …, divide r_{i−2}(x) by r_{i−1}(x) to obtain the quotient q_i(x) and the remainder r_i(x):
r_{i−2}(x) = q_i(x)·r_{i−1}(x) + r_i(x)
Stop when r_i(x) = 0; the last non-zero remainder is GCD(a(x), b(x)).
Note that with each iteration of the recursion formula the degree of r_i(x) gets smaller, since it is strictly less than the degree of the divisor (which is r_{i−1}(x)). Thus, the algorithm terminates after a finite number of steps.
The polynomials s(x) and d(x) can be obtained by applying the process described in Euclid’s algorithm in reverse. Supposing that GCD(a(x), b(x)) = r_2(x) was found after two iterations, it follows from the recursion formula that

r_2(x) = r_0(x) − q_2(x)·r_1(x)  and  r_1(x) = r_{−1}(x) − q_1(x)·r_0(x)

Substituting the second equation into the first expresses r_2(x) in terms of r_{−1}(x) = a(x) and r_0(x) = b(x). Consequently, the linear combination a(x)·s(x) + b(x)·d(x) = r_2(x) can be determined, which gives the GCD(a(x), b(x)).
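The division step and the back-substitution can be sketched for GF(2)[x], representing polynomials as bit masks (bit i is the coefficient of x^i); function names here are illustrative:

```python
# Euclid's algorithm for polynomials over GF(2), with back-substitution
# carried along so that a(x)*s(x) + b(x)*d(x) = GCD(a(x), b(x)).
# Polynomials are bit masks: bit i is the coefficient of x^i.

def deg(f):
    return f.bit_length() - 1          # deg(0) is treated as -1

def mul2(a, b):
    """Multiplication in GF(2)[x] (carry-less multiplication)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        b >>= 1
    return p

def divmod2(a, b):
    """Divide a by b in GF(2)[x]; return (quotient, remainder)."""
    q = 0
    while deg(a) >= deg(b):
        shift = deg(a) - deg(b)
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def extended_gcd(a, b):
    # Invariant maintained: r0 = a*s0 + b*d0 (and likewise for r1, s1, d1).
    r0, r1, s0, s1, d0, d1 = a, b, 1, 0, 0, 1
    while r1:
        q, r = divmod2(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, s0 ^ mul2(q, s1)
        d0, d1 = d1, d0 ^ mul2(q, d1)
    return r0, s0, d0                  # gcd, s(x), d(x)

# x^7 + 1 and its factor x^3 + x + 1: the GCD is the factor itself.
g, s, d = extended_gcd(0b10000001, 0b1011)
print(g == 0b1011)   # True
```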
Definition 6-6: Let a be an element in the field GF(q^m). The minimal polynomial of a with respect to GF(q) is the smallest-degree monic⁷ (and thus non-zero) polynomial p(x) in GF(q)[x] such that p(a) = 0.
In other words, the minimal polynomial p(x) ∈ GF(q)[x] for a is the smallest-degree non-zero polynomial that has a as a root.
Properties of the minimal polynomial are stated in the following theorem, Wicker (1995,
p. 54).
Theorem 6-7: For each element a in GF(q m ) there exists a unique monic (and thus non-zero)
polynomial p(x) of minimal degree in GF(q)[x] such that the following are
true:
1. p(a) = 0
2. The degree of p(x) is less than or equal to m
3. f(a) = 0 implies that f(x) is a multiple of p(x)
4. p(x) is irreducible in GF(q)[x]
Consequently, there exist polynomials with coefficients taken from the field GF(q) which have a specified set of roots – the elements of GF(q^m). Those polynomials are the minimal polynomials of the elements a in GF(q^m). The next theorem, Wicker (1995, p. 56), determines the other roots of the minimal polynomial of a as the conjugates of a with respect to GF(q). These are defined as follows.
Let a be an element in the Galois field GF(q^m). The conjugates of a with respect to GF(q) are the elements
⁷ A polynomial f(x) is said to be monic if the coefficient of the highest power of x in f(x) is equal to 1.
a, a^q, a^{q^2}, a^{q^3}, …
The conjugacy class of a with respect to GF(q) is the set of conjugates of a with respect to GF(q). It can be shown that the conjugacy class of a ∈ GF(q^m) with respect to GF(q) contains d elements, where d is the smallest integer such that a^{q^d} = a.
Theorem 6-8: Let a be an element in GF(q^m). Let p(x) be the minimal polynomial of a with respect to GF(q). The roots of p(x) are exactly the conjugates of a with respect to GF(q).
Furthermore, the conjugates of an element a ∈ GF(q^m) can be used to obtain a form of the minimal polynomial according to the following theorem, Lin and Costello (1983, p. 37).
Theorem 6-9: Let p(x) be the minimal polynomial of an element a in GF(q^m). Let d be the smallest integer such that a^{q^d} = a. Then,

p(x) = ∏_{i=0}^{d−1} (x + a^{q^i})
The minimal polynomials are of significant importance for the complete factorisation of polynomials of the form x^n − 1, which is essential in describing certain classes of error detecting / correcting codes and is discussed in the following section.
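The exponents appearing in a conjugacy class form a cyclotomic coset {i, iq, iq^2, …} modulo q^m − 1, which is easy to compute. A small sketch for GF(2^3):

```python
# Conjugacy classes (cyclotomic cosets) of GF(2^3) with respect to GF(2).
# The conjugates of a^i are a^(i*2^j), exponents taken mod 2^3 - 1 = 7.

def cyclotomic_cosets(q, n):
    """Partition {0, ..., n-1} into cosets {i, i*q, i*q^2, ...} mod n."""
    seen, cosets = set(), []
    for i in range(n):
        if i in seen:
            continue
        coset, j = [], i
        while j not in coset:
            coset.append(j)
            j = (j * q) % n
        seen.update(coset)
        cosets.append(coset)
    return cosets

print(cyclotomic_cosets(2, 7))   # [[0], [1, 2, 4], [3, 6, 5]]
```

Each coset of size d corresponds to a minimal polynomial of degree d; for instance the coset {1, 2, 4} corresponds to x^3 + x + 1, the minimal polynomial of a in the running example.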
6.2.3 Factorisation of x^n − 1
The order of any element a in the field GF(q^m) divides (q^m − 1), according to Theorem 6-1. By the definition of order (Definition 6-1), it follows that the non-zero elements in GF(q^m) are roots of the equation x^{q^m − 1} − 1 = 0 or, equivalently, they are (q^m − 1)-st roots of unity. Since the expression x^{q^m − 1} − 1 is of degree (q^m − 1), it may be shown that it has exactly (q^m − 1) distinct roots. Consequently, the (q^m − 1) non-zero elements of GF(q^m) form the complete set of roots of x^{q^m − 1} − 1 = 0. Since every non-zero element in GF(q^m) is a root of x^{q^m − 1} − 1 = 0, the minimal polynomials of all the non-zero elements in GF(q^m) provide the complete factorisation of x^{q^m − 1} − 1. Further, those factors are irreducible polynomials in the ring GF(q)[x].
The reasoning developed above for factoring the expression x^{q^m − 1} − 1 can be extended to polynomials of the more general form x^n − 1. Assume that there exists an element ß of order n in some field GF(q^m). It follows that ß and all powers of ß are roots of x^n − 1 = 0. In addition,
the elements 1, ß, ß^2, …, ß^{n−1} are distinct (repetitions begin from ß^n, and the first element to repeat is the element 1). Thus, the n roots of x^n − 1 are generated by computing n consecutive powers of ß. For this reason, elements of order n like ß are called primitive n-th roots of unity.
If n is a divisor of (q^m − 1), then there are exactly φ(n) elements of order n in GF(q^m), where φ is Euler’s totient function. Therefore, the existence of a positive integer m such that n | (q^m − 1) implies the existence of a primitive n-th root of unity ß in an extension field GF(q^m) of GF(q). Moreover, if m is chosen to be the smallest positive integer such that n | (q^m − 1), ß can be found in the smallest extension field GF(q^m) of GF(q). Once the desired primitive n-th root of unity has been found, forming the conjugacy class of that root and computing the associated minimal polynomials, of the root and its conjugates, completes the factorisation of x^n − 1. A discussion of the existence of such an element ß and of where it can be found is included in Appendix B.
In general, factoring x^n − 1 is a quite tedious task. Due to its application to the construction of cyclic codes, which will be described in Chapter 7, computer algebra systems like MAPLE have been developed that are capable of factoring polynomials of reasonable degree over some prime order fields.
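Finding the smallest extension field containing a primitive n-th root of unity is a direct computation of the multiplicative order of q modulo n. A sketch (assuming gcd(n, q) = 1):

```python
# Find the smallest m such that n divides q^m - 1, i.e. the smallest
# extension field GF(q^m) containing a primitive n-th root of unity.
# (Requires gcd(n, q) = 1.)

def smallest_extension(q, n):
    m, power = 1, q % n
    while power != 1:
        m += 1
        power = (power * q) % n
    return m

# A primitive 9th root of unity over GF(2) first appears in GF(2^6),
# since 9 | 2^6 - 1 = 63.
print(smallest_extension(2, 9))   # 6
```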
6.2.4 Ideals
Rings of polynomials are of great significance in the study of algebraic codes, since they are used for defining powerful classes of error control codes. It has already been seen that the collection of all polynomials of arbitrary degree with coefficients in the finite field GF(q) forms a commutative ring with identity. If the ring of polynomials GF(q)[x] is reduced modulo (x^n − 1), the resulting structure is again a ring, this time containing all polynomials of degree less than n with coefficients in GF(q), denoted by GF(q)[x]/(x^n − 1). This ring and, in particular, the notion of ideals in GF(q)[x]/(x^n − 1), is rather useful in defining linear cyclic codes, as will be described in section 7.2.
Definition 6-7: A subset I of a ring R is called an ideal if I forms a group under the addition operation of R and, for every a ∈ I and every r ∈ R, the product a·r is in I. An ideal I is a principal ideal if there exists an element g ∈ I such that every element of I can be expressed as a product g·r for some r ∈ R.
It can be seen that {0} and R are the trivial ideals in any ring R.
As dictated by the above definition, every element in a principal ideal can be represented as a multiple of one specific element of the ideal. This element g, used to represent all elements of the principal ideal, is called the generator element. The ideal thus generated is denoted by <g>.
Basic properties of ideals in the ring GF(q)[x]/(x^n − 1) are presented in the following theorem, Wicker (1995, p. 64).
Theorem 6-10: Every ideal I in GF(q)[x]/(x^n − 1) is principal, I = <g(x)>, where g(x) is the unique monic polynomial of minimal degree in I; moreover, g(x) divides x^n − 1.
For example, the ring GF(2)[x]/(x^7 − 1) contains the ideals <x + 1>, <x^3 + x + 1>, <x^3 + x^2 + 1>, <(x + 1)(x^3 + x + 1)>, <(x + 1)(x^3 + x^2 + 1)>, <(x^3 + x + 1)(x^3 + x^2 + 1)> and the two trivial ideals {0} and R.
The above theorem, in combination with the reasoning developed in factorising x^n − 1, leads to the characterisation of all the ideals in GF(q)[x]/(x^n − 1), which is central to the construction of the class of cyclic codes, as described in the following chapter.
Cyclic Codes
7. CYCLIC CODES
Cyclic codes are an important class of linear block codes. A substantial body of theory has been developed for cyclic codes, which enhances their practical applications. Their considerable algebraic structure enables multiple error correction and can provide protection against error bursts. Many important codes, such as the Golay, Hamming and BCH codes, can be represented as cyclic codes. A cyclic version of the [7,4] Hamming code is included in Appendix C. The underlying mathematical structure of cyclic codes allows for the design of various encoding / decoding methods which are implemented by means of shift registers.
Definition 7-1: An (n,k) linear code C is called a cyclic code if every cyclic shift of a
codeword in C is also a codeword in C.
When considering cyclic codes, it is useful to associate each codeword c = (c_0, c_1, …, c_{n−1}) of length n with a polynomial c(x) of degree at most (n − 1) which has the coordinates of the code vector as its coefficients. Thus,

c(x) = c_0 + c_1 x + … + c_{n−1} x^{n−1}
The product a(x)·c(x), where c(x) is a code polynomial and a(x) an arbitrary polynomial in GF(q)[x]/(x^n − 1), is a linear combination of cyclic shifts of c(x). By Definition 7-1, and bearing the polynomial representation in mind, a(x)·c(x) must also be a code polynomial. Thus, the code C, which forms a vector space within the ring of polynomials GF(q)[x]/(x^n − 1), is an ideal (recall Definition 6-7), and this is formally put in the following theorem, Poli and Huguet (1992, p. 188).
According to Theorem 6-10, every cyclic code is a principal ideal of the ring GF(q)[x]/(x^n − 1). This implies that a cyclic code C consists of the multiples of a polynomial g(x) ∈ C which is unique, monic and of lowest degree among all code polynomials. This polynomial is called the generator polynomial of the q-ary (n,k) cyclic code. Further, g(x) is a divisor of x^n − 1, for otherwise the greatest common divisor (GCD) of x^n − 1 and g(x) would be a polynomial in C of lower degree than g(x).
The above results combined with Theorem 6-10 are summarised in the following theorem,
Wicker (1995, p. 101), which presents the basic properties of cyclic codes.
The requirement that the generator polynomial must be a divisor of x^n − 1 limits the selection of g(x). Based on the factorisation of x^n − 1 into irreducible polynomials in GF(q)[x], it is possible to list all q-ary cyclic codes of length n. Let x^n − 1 be factorised into irreducible factors, as x^n − 1 = f_1(x) f_2(x) … f_t(x). By choosing, in all possible ways, one of the 2^t factors of x^n − 1 as the generator polynomial g(x) and defining the corresponding code to be the set of multiples of g(x) modulo (x^n − 1), all cyclic codes of length n can be determined.
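For n = 7 and q = 2 this enumeration can be carried out directly from the factorisation x^7 + 1 = (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1) over GF(2) (where + and − coincide). A sketch, with polynomials stored as bit masks:

```python
# Enumerate the generator polynomials of all binary cyclic codes of length 7
# as products of subsets of the irreducible factors of x^7 - 1 over GF(2).
# Polynomials are bit masks: bit i is the coefficient of x^i.
from itertools import combinations

def mul2(a, b):
    """Multiplication in GF(2)[x] (carry-less multiplication)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        b >>= 1
    return p

factors = [0b11, 0b1011, 0b1101]   # x+1, x^3+x+1, x^3+x^2+1

gens = []
for r in range(len(factors) + 1):
    for subset in combinations(factors, r):
        g = 1
        for f in subset:
            g = mul2(g, f)
        gens.append(g)

# 2^3 = 8 generator polynomials, one per subset; g = 1 gives the whole ring
# (the trivial ideal R) and g = x^7 + 1 gives the zero code {0}.
print(len(gens), 0b10000001 in gens)   # 8 True
```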
Given a message block (a_0, a_1, …, a_{k−1}) with associated information polynomial i(x) = a_0 + a_1 x + … + a_{k−1} x^{k−1}, a codeword

c(x) = c_0 + c_1 x + … + c_{n−1} x^{n−1}

will be produced as

c(x) = i(x)·g(x)
     = (a_0 + a_1 x + … + a_{k−1} x^{k−1})·g(x)
     = a_0 g(x) + a_1 x g(x) + … + a_{k−1} x^{k−1} g(x)

or, in matrix form,

c(x) = [a_0 a_1 … a_{k−1}] · | g(x)         |
                             | x g(x)       |
                             | ⋮            |
                             | x^{k−1} g(x) |

This provides a general form for the generator matrix G for cyclic codes
    | g_0  g_1  g_2  …  g_{n−k}  0    0   …   0       |
    | 0    g_0  g_1  g_2  …  g_{n−k}  0   …   0       |
G = | ⋮                                           ⋮   |
    | 0    …    0    g_0  g_1  g_2  …         g_{n−k} |

Note that the k rows of the generator matrix G are the codewords g(x), xg(x), …, x^{k−1}g(x), which form a basis for C.
Since g(x) divides x^n − 1, we may write x^n − 1 = g(x)·h(x), where h(x) is a monic polynomial of degree k. Let c(x) be a code polynomial in C. Then c(x) = m(x)·g(x). If c(x) is multiplied by h(x),

c(x)·h(x) = m(x)·g(x)·h(x)
          = m(x)·(x^n − 1)
          = x^n m(x) − m(x)     (7.3.1)

The degree of m(x) is at most (k − 1) and thus the powers x^k, x^{k+1}, …, x^{n−1} do not appear in x^n m(x) − m(x). It follows that on the left-hand side of equation (7.3.1) the coefficients of the powers x^k, x^{k+1}, …, x^{n−1} must be equal to zero, providing (n − k) parity-check equations (7.3.2). Taking the reciprocal polynomial⁸ of h(x),

x^k h(x^{−1}) = h_k + h_{k−1} x + … + h_0 x^k
⁸ Let f(x) = a_0 + a_1 x + … + a_n x^n be an n-th degree polynomial. The reciprocal f*(x) is the polynomial f*(x) = x^n f(x^{−1}) = a_n + a_{n−1} x + … + a_0 x^n.
which can be shown to also be a factor of x^n − 1, an (n, n − k) cyclic code is generated with the following (n − k) × n matrix as a generator matrix

    | h_k  h_{k−1}  …  h_0  0    0   …   0   |
    | 0    h_k      …  h_1  h_0  0   …   0   |
H = | ⋮                                  ⋮   |
    | 0    0    …   h_k  h_{k−1}  …     h_0 |

The construction of H, based on the (n − k) parity-check equations (7.3.2), implies that any codeword v in C is orthogonal to the (n − k) rows of H. It follows that the rows of H are vectors in the dual space C⊥ of C. Since h(x) is monic, the (n − k) rows of H are linearly independent and, according to Theorem 3-4 and the reasoning developed in section 4.3.2, they span C⊥. Thus, H is a parity-check matrix for C. The polynomial h(x), used to obtain H, is called the parity-check polynomial of the cyclic code C, Pless (1998, p. 74).
Indeed, the parity-check matrix and the generator matrix for a cyclic code C share the same structure. The reciprocal polynomial of h(x) can be used to construct a generator matrix for C⊥.
Consider the systematic encoding of an information polynomial i(x) = a_0 + a_1 x + … + a_{k−1} x^{k−1}, whose information sequence is to occupy the last k positions of a codeword, as in

(0, 0, …, 0, a_0, a_1, …, a_{k−1})

Multiplying i(x) by x^{n−k} and dividing by g(x) yields a quotient q(x) and a remainder d(x). Hence,

q(x)·g(x) = x^{n−k} i(x) − d(x)     (7.4.1)

Since the product q(x)·g(x) = c(x) is a multiple of g(x), it is a valid code polynomial and so is x^{n−k} i(x) − d(x). The remainder d(x) is of degree less than (n − k) and thus can be associated with the sequence

(−d_0, −d_1, …, −d_{n−k−1}, 0, 0, …, 0)

whose last k positions are zero. Thus, using expression (7.4.1), the codeword c(x) = q(x)·g(x) can be written as

c = (−d_0, −d_1, …, −d_{n−k−1}, a_0, a_1, …, a_{k−1})

In this way, the information polynomial i(x) has been systematically encoded. That is, the information sequence, associated with i(x), has been mapped to a codeword in which the information bits occupy the last k positions.
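This encoding procedure can be sketched for the cyclic [7,4] code with g(x) = x^3 + x + 1; over GF(2) the subtraction of d(x) is the same as addition (XOR):

```python
# Systematic cyclic encoding sketch for the binary [7,4] code with
# g(x) = x^3 + x + 1: multiply i(x) by x^(n-k), divide by g(x), and place
# the remainder's coefficients in the first n-k positions.
# Polynomials are bit masks: bit i is the coefficient of x^i.

def polymod(a, b):
    """Remainder of a divided by b in GF(2)[x]."""
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def encode_systematic(info, g=0b1011, n=7, k=4):
    shifted = info << (n - k)          # x^(n-k) * i(x)
    d = polymod(shifted, g)            # remainder d(x), degree < n-k
    return shifted ^ d                 # parity digits first, info bits last

c = encode_systematic(0b1010)          # i(x) = x + x^3
# The top k bits reproduce the information; the codeword is a multiple of g.
print(bin(c), c >> 3 == 0b1010, polymod(c, 0b1011) == 0)
```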
BCH Codes – Reed-Solomon Codes
In constructing cyclic codes there is no guarantee on the minimum distance of the resulting code. Given an arbitrary generator polynomial g(x), a computer search of all non-zero codewords needs to be conducted to determine the minimum weight and thus the minimum distance. Placing a constraint on the generator polynomial in order to ensure the minimum distance of the resulting code is the main concept behind another powerful class of codes, the so-called BCH codes, named in honour of their discoverers Bose, Chaudhuri and Hocquenghem. In addition to these codes, the Reed-Solomon codes, which address the issue of minimising the added redundancy, are presented in the following discussion.
Definition 8-1: A cyclic code of length n over GF(q) is called a BCH code of designated distance d if its generator polynomial g(x) is the least common multiple of the minimal polynomials of ß^l, ß^{l+1}, …, ß^{l+d−2} for some (non-negative integer) l, where ß is a primitive n-th root of unity.
The codes defined for l = 1 are called narrow-sense BCH codes, and for n = q^m − 1 the resulting codes are called primitive BCH codes, since the n-th root of unity ß is then a primitive element in GF(q^m).
A BCH code of designated distance d has minimum distance that equals or exceeds d, as stated in the following theorem, Pless (1998, p. 112), which reveals the importance of this class of codes.
Theorem 8-1: (BCH Bound) The minimum weight of a BCH code of designated distance d is
at least d.
Now, a t-error-correcting BCH code of length n can be constructed for any positive integers m
and t that satisfy the requirement imposed by the following theorem, Poli and Huguet (1992,
p. 200).
Theorem 8-2: For every integer of the form q^m − 1, m ≥ 3 (q a power of a prime), there exists a BCH (n,k) code C which is t-error-correcting, such that k ≥ n − 2tm if q > 2 (k ≥ n − tm if q = 2), whose generator polynomial is the least common multiple of the minimal polynomials of 2t consecutive powers of a primitive n-th root of unity.
The construction process of BCH codes can be described in three steps as follows.
Step 1: Find a primitive n-th root of unity ß in the smallest extension field GF(q^m) of GF(q)
Step 2: Select (d − 1) consecutive powers of ß, namely ß^l, ß^{l+1}, …, ß^{l+d−2}
Step 3: Let g(x) be the l.c.m. of the minimal polynomials of these (d − 1) consecutive powers of ß
It can be seen that we follow the general construction process for cyclic codes, but by placing the constraint dictated by Steps 2 and 3 on the generator polynomial, we ensure that the resulting code has minimum distance at least equal to d.
The importance of the class of BCH codes stems from the fact that a BCH code with desired minimum distance d can be constructed for a specific value of d that we select in the construction process. Yet the choice of d is not totally arbitrary, since there are implementation considerations.
polynomials of ß^l, ß^{l+1}, …, ß^{l+d−2}. It follows that v(x) is also divisible by their least common multiple, which is the generator polynomial g(x) of the t-error-correcting BCH code C. Hence, v(x) is a code polynomial.
Since the (d − 1) consecutive powers of ß starting from ß^l are roots of v(x), the following equations must be satisfied. These equations perform error detection, by checking whether a received word v has the required zeros.
Writing v = (v_0, v_1, …, v_{n−1}) and

    | 1  ß^l        ß^{2l}        …  ß^{(n−1)l}       |
    | 1  ß^{l+1}    ß^{2(l+1)}    …  ß^{(n−1)(l+1)}   |
H = | ⋮                                           ⋮   |
    | 1  ß^{l+d−2}  ß^{2(l+d−2)}  …  ß^{(n−1)(l+d−2)} |

these equations can be expressed as

v·Hᵀ = 0     (8.1.2.2)

It follows from equation (8.1.2.2) that if v = (v_0, v_1, …, v_{n−1}) is a codeword in the t-error-correcting BCH code C, then v·Hᵀ = 0. Conversely, if v satisfies v·Hᵀ = 0, then it follows from equations (8.1.2.1) that the (d − 1) consecutive powers of ß, starting from ß^l, are roots of its corresponding polynomial v(x). This implies that v is a codeword in C. Hence, H is the parity-check matrix of the BCH code C. Note that it has entries from the extension field GF(q^m) of GF(q), with m minimal.
BCH codes may achieve a given error correction at the expense of adding more redundancy than actually needed. Two finite fields are used in the construction of BCH codes: one is GF(q), over which the code is defined, and the other is GF(q^m), where a primitive n-th root of unity can be found.
Reed-Solomon codes are an important class of codes for which the two fields coincide. These codes are also cyclic and can be defined as follows, Poli and Huguet (1992, p. 205).
Definition 8-2: A Reed-Solomon code (RS code) over GF(q^m) is a BCH code of length n = q^m − 1, dimension k = n − d + 1 and minimum distance d. It is therefore an ideal in GF(q^m)[x]/(x^{q^m − 1} − 1).
It can be seen that the length of the code is one less than the number of possible code symbols, and k = n − d + 1 implies that the designated distance is d = n − k + 1; hence the minimum distance of an RS code is one greater than the number of parity-check digits.
Since each minimal polynomial over GF(q^m) of an element a^i of GF(q^m) is simply (x − a^i), the generator polynomial of an RS code takes the form

g(x) = (x − a^l)(x − a^{l+1}) … (x − a^{l+d−2})

where l is a positive integer which is the power of the first of the (d − 1) consecutive powers of a. Different generator polynomials are formed for different values of l.
Clearly, g(x) has (d − 1) consecutive powers of a as all its roots and has coefficients from the field GF(q^m). The RS code thus generated is an (n, n − 2t) cyclic code which consists of those polynomials of degree at most (n − 1) with coefficients in GF(q^m) that are multiples of g(x).
The original approach to the construction of RS codes by their discoverers, I.S. Reed and G. Solomon, is slightly different from the generator polynomial construction, which was initially developed to describe cyclic codes, as discussed in Chapter 7. A brief illustration of the original approach to RS codes, based on Reed and Chen (1999, p. 243-4), is given next.
Consider a message block m = (m_0, m_1, …, m_{k−1}) whose k symbols are taken from GF(q^m), with corresponding message (information) polynomial i(x) = m_0 + m_1 x + … + m_{k−1} x^{k−1}. This message block m is encoded by evaluating i(x) at each of the q^m elements in the finite field
GF(q^m). Recall that the non-zero elements in GF(q^m) can be represented as the (q^m − 1) powers of some primitive element a. Thus, c is obtained as

c = (c_0, c_1, …, c_{n−1}) = [i(0), i(a), i(a^2), …, i(a^{q^m − 1})]

In this way, a complete set of codewords can be determined by allowing the k information symbols to take all possible values. This set of RS codewords forms a k-dimensional vector space over GF(q^m). Besides, the code length is equal to q^m since each codeword has q^m coordinates.
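The evaluation construction can be sketched over GF(8), with field elements stored as bit masks and reduction by p(a) = a^3 + a + 1; function names are illustrative:

```python
# Sketch of the original evaluation construction of an RS code over GF(8):
# a message polynomial i(x) of degree < k is evaluated at all 8 field
# elements. Field elements are bit masks; products are reduced modulo
# p(a) = a^3 + a + 1.

def gf8_mul(x, y):
    p = 0
    while y:
        if y & 1:
            p ^= x
        x <<= 1
        if x & 0b1000:
            x ^= 0b1011            # reduce using a^3 = a + 1
        y >>= 1
    return p

def evaluate(poly, x):
    """Evaluate a polynomial with GF(8) coefficients at x (Horner's rule)."""
    acc = 0
    for coeff in reversed(poly):
        acc = gf8_mul(acc, x) ^ coeff
    return acc

def pow_a(j):
    e = 1
    for _ in range(j):
        e = gf8_mul(e, 0b010)      # repeated multiplication by a
    return e

def rs_encode(message):
    """Codeword = evaluations of i(x) at 0 and the 7 powers of a."""
    points = [0] + [pow_a(j) for j in range(7)]
    return [evaluate(message, x) for x in points]

c = rs_encode([0b001, 0b010])      # i(x) = 1 + a*x
print(len(c), c[0])                # 8 coordinates; i(0) = 1
```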
In contrast, following the generator polynomial construction, the resulting RS codes have code length q^m − 1. However, this approach is currently more popular than the original one, mainly because it is in accordance with the construction method adopted for cyclic and BCH codes.
One of the most significant properties of Reed-Solomon codes is that an (n,k) RS code always has minimum distance d equal to the designated distance d = n − k + 1, as dictated by Definition 8-2. The fact that d is one greater than the number of parity-check digits makes the Reed-Solomon codes maximum-distance separable (MDS). The term was coined to describe codes whose minimum distance is the largest possible for fixed length n and dimension k. This property, satisfied by RS codes, largely justifies the fact that they are preferred in various practical applications for both random and burst error correction. The latter is briefly illustrated in section 8.4.
Encoding for Reed-Solomon codes is similar to encoding for cyclic codes, described earlier (section 7.4). To obtain the 2t parity-check digits, the information polynomial is multiplied by x^{2t} and then divided by the generator polynomial g(x). The coefficients of the remainder are the parity-check digits. This process performs systematic encoding for Reed-Solomon codes, since the parity-check digits occupy the first 2t positions of the corresponding codeword.
In decoding non-binary BCH and Reed-Solomon codes, not only the locations of the errors need to be determined, but the corresponding error values as well. In the binary case the error values are equal to 1 and are thus known.
Suppose that the codeword v(x) = v_0 + v_1 x + … + v_{n−1} x^{n−1} is transmitted. Errors that occur during transmission result in the received polynomial r(x) = r_0 + r_1 x + … + r_{n−1} x^{n−1}. Let e(x) be the error pattern, e(x) = e_0 + e_1 x + … + e_{n−1} x^{n−1}. Then,

r(x) = v(x) + e(x)
Selecting d − 1 = 2t, we ensure that the code C can correct t errors, by Theorem 8-1 and Theorem 4-3. The first step is to compute the syndrome from the received vector r(x). The j-th component of the syndrome, by (8.3.2) and H taken from equation (8.1.2.2), is

S_j = r(a^j),  j = 1, 2, …, 2t

The syndrome components are in the field GF(q^m) and can be computed from r(x) as follows. We divide r(x) by the minimal polynomial p_j(x) of a^j and we obtain from Euclid’s algorithm r(x) = a_j(x)·p_j(x) + b_j(x), where deg(b_j(x)) < deg(p_j(x)). Since p_j(a^j) = 0 we get

S_j = r(a^j) = b_j(a^j)

Since a, a^2, …, a^{2t} are roots of every code polynomial, it follows that v(a^j) = 0, j = 1, …, 2t. Hence,
S_j = e(a^j) = Σ_{k=0}^{n−1} e_k (a^j)^k     (8.3.5)

Suppose that the error pattern e(x) contains ν errors, at locations i_1, i_2, …, i_ν, with error values e_{i_l} and error locators X_l = a^{i_l}, l = 1, …, ν. Then equation (8.3.5) becomes

S_j = e(a^j) = Σ_{k=0}^{n−1} e_k (a^j)^k = Σ_{l=1}^{ν} e_{i_l} X_l^j     (8.3.6)

Expanding for j = 1, …, 2t:

S_1 = e_{i_1} X_1 + e_{i_2} X_2 + … + e_{i_ν} X_ν
S_2 = e_{i_1} X_1^2 + e_{i_2} X_2^2 + … + e_{i_ν} X_ν^2
⋮                                                     (8.3.7)
S_{2t} = e_{i_1} X_1^{2t} + e_{i_2} X_2^{2t} + … + e_{i_ν} X_ν^{2t}

Equations (8.3.7) are not linear, but can be reduced to a set of linear equations in the unknown quantities. For this purpose, an error locator polynomial Λ(x) is defined, such that its roots are the inverses of the error locators {X_l}:

Λ(x) = ∏_{l=1}^{ν} (1 − X_l·x) = Λ_ν x^ν + Λ_{ν−1} x^{ν−1} + … + Λ_1 x + Λ_0     (8.3.8)
Since each X_l^{−1} is a root of Λ(x), multiplying Λ(X_l^{−1}) = 0 by e_{i_l} X_l^j and summing over l gives

Λ_ν Σ_{l=1}^{ν} e_{i_l} X_l^{j−ν} + Λ_{ν−1} Σ_{l=1}^{ν} e_{i_l} X_l^{j−ν+1} + … + Λ_1 Σ_{l=1}^{ν} e_{i_l} X_l^{j−1} + Λ_0 Σ_{l=1}^{ν} e_{i_l} X_l^j = 0

hence,

Λ_ν S_{j−ν} + Λ_{ν−1} S_{j−ν+1} + … + Λ_1 S_{j−1} + Λ_0 S_j = 0     (8.3.11)

From equation (8.3.8) defining Λ(x), it follows that Λ_0 is always 1. Thus, equation (8.3.11) can be reexpressed as

Λ_ν S_{j−ν} + Λ_{ν−1} S_{j−ν+1} + … + Λ_1 S_{j−1} = −S_j     (8.3.12)

Assuming that ν = t, where t is the error-correcting capability of the code, the following matrix expression of (8.3.12) can be obtained
        | S_1     S_2     S_3     …  S_t      |   | Λ_t     |   | −S_{t+1}  |
        | S_2     S_3     S_4     …  S_{t+1}  |   | Λ_{t−1} |   | −S_{t+2}  |
A′·Λ =  | ⋮                               ⋮   | · | ⋮       | = | ⋮         |
        | S_{t−1} S_t     S_{t+1} …  S_{2t−2} |   | Λ_2     |   | −S_{2t−1} |
        | S_t     S_{t+1} S_{t+2} …  S_{2t−1} |   | Λ_1     |   | −S_{2t}   |
It can be shown that the matrix A′ is non-singular if ν = t. If fewer than t errors occurred (ν < t), then it can be shown that A′ is singular. If A′ is singular, then the rightmost column and the last row are removed and the determinant of the resulting (t − 1) × (t − 1) matrix is computed. The process is repeated until the resulting matrix becomes non-singular. Once this is true, the coefficients {Λ_l}, l = 1, …, ν, of the error locator polynomial are determined using standard linear algebra techniques, with computations performed in GF(q^m), from which the
entries are taken. Once the ν error locations are determined, the equations in (8.3.7) form a system of 2t equations in ν unknowns, {e_{i_l}}, l = 1, …, ν. Solving this system for the error values completes the correction of the errors.
A set of binary errors in a word is called a burst. The length l of the burst is defined as the number of binary positions between the first and the last error, inclusive. An (n,k) code that is capable of correcting all error bursts of length l or less, but not all error bursts of length (l + 1), is called an l-burst-error-correcting code. It may be shown that an (n,k) code C is l-burst-error-correcting if no burst of length 2l or less is a codeword in C. The following theorem, Lin and Costello (1983, p. 258), places an upper bound, known as the Reiger bound, on the burst error correction capability of an (n,k) code.
Theorem 8-3: (Reiger Bound) The burst-error-correcting capability l of an (n,k) code satisfies 2l ≤ n − k.
An (n,k) code that achieves equality in the inequality stated in the above theorem is said to be optimal.
According to the discussion in section 6.1.3, any element ß in GF(2^m) can be represented by a vector of dimension m, (b_0, b_1, …, b_{m−1}), where the {b_i}, i = 0, …, m − 1, lie in GF(2). This representation of ß is referred to as an m-bit byte. Consider a t-error-correcting Reed-Solomon code with code symbols from GF(2^m). If each element in GF(2^m) is represented by its corresponding m-bit byte, the resulting code is a binary linear code with the following parameters

n = m(2^m − 1),  k = m(2^m − 1 − 2t)
During the decoding process, the binary received vector is divided into (2^m − 1) m-bit bytes and each m-bit byte is transformed back to a symbol in GF(2^m). If an error affects t or fewer
of these m-bit bytes, it affects t or fewer symbols in GF(2^m) and can thus be corrected using the decoding method for RS codes, demonstrated in section 8.3.
For example, RS codes of length 255 are used in many applications, since each of the 256 = 2^8 field elements of GF(256) can be represented as a binary 8-tuple (byte), allowing the code to be implemented using binary electronic devices.
The code is capable of correcting all error bursts of length (t − 1)m + 1 or less, since such a burst cannot affect more than t m-bit bytes. Note that this binary Reed-Solomon code can still correct any combination of t or fewer random errors. Put more simply, in addition to being effective for burst error correction, the code continues to be t-error-correcting.
With regard to Theorem 8-3, a binary RS code achieves (t − 1)m + 1 as the maximum length of a correctable burst, which is quite close to the optimum of tm.
Performance of Error Detecting / Correcting Codes
The number of possible non-zero received words is 2^n − 1 and thus there are 2^n − 1 corresponding error patterns, of which exactly 2^k − 1 are identical to the non-zero codewords of the (n,k) linear code. Since in any (n,k) linear code the sum of two codewords is a codeword, whenever one of these 2^k − 1 error patterns occurs, the transmitted codeword is altered to another codeword, resulting in incorrect decoding. Thus, there are exactly 2^k − 1 undetectable error patterns.
Consequently, there are exactly 2^n − 2^k detectable error patterns, namely those not identical to the codewords of the (n,k) code. It can be seen that for large n the number 2^n is much bigger than 2^k − 1, allowing a relatively small number of undetectable error patterns.
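This closure argument can be checked exhaustively for a small code, for instance the cyclic [7,4] code with generator polynomial g(x) = x^3 + x + 1:

```python
# For an (n,k) linear code the undetectable error patterns are exactly the
# 2^k - 1 non-zero codewords. Demonstrated for the cyclic [7,4] code with
# g(x) = x^3 + x + 1 (polynomials as bit masks, bit i = coefficient of x^i).

def polymod(a, b):
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

g = 0b1011
codewords = {c for c in range(2**7) if polymod(c, g) == 0}

# Closure under addition (XOR): adding a non-zero codeword to a transmitted
# codeword yields another codeword, so such an error goes undetected.
closed = all((u ^ v) in codewords for u in codewords for v in codewords)
undetectable = len(codewords) - 1
print(closed, undetectable)    # True 15
```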
An upper bound on the probability of undetected word error P_u(E) can be obtained using the minimum distance, according to Wicker (1995, p. 240-2). The communication channel is assumed to be the Binary Symmetric Channel, illustrated in Figure 9–1, which carries two symbols, 0 and 1, and for which the probability of error, p, is the same for both symbols.

Figure 9–1: The Binary Symmetric Channel. An input bit is received correctly with probability 1 − p and inverted with probability p.
Since an (n,k) linear code with minimum distance d is capable of detecting all error patterns
of weight (d – 1) or less, the probability of undetected error is bounded above by the
following expression

Pu(E) ≤ ∑_{j=d}^{n} C(n, j) p^j (1 – p)^(n–j)

where the binomial coefficient C(n, j) is the number of error patterns of weight j, with j
starting from d, and p^j (1 – p)^(n–j) is the probability of occurrence of a particular error
pattern of weight j.
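As a quick numerical check, the bound can be evaluated directly. The sketch below (function name is illustrative) computes it for a code with n = 7, d = 3 over a BSC with p = 0.01, and verifies the equivalent complementary form 1 minus the probability of fewer than d errors.

```python
from math import comb

def undetected_error_bound(n, d, p):
    """Upper bound on Pu(E): sum over error weights j = d .. n."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(d, n + 1))

bound = undetected_error_bound(7, 3, 0.01)

# Since the binomial probabilities sum to 1, the bound equals
# 1 minus the probability of fewer than d errors.
complement = 1 - sum(comb(7, j) * 0.01**j * 0.99**(7 - j) for j in range(3))
assert abs(bound - complement) < 1e-12
assert bound < 1e-4   # roughly 3.4e-5 for these parameters
```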
For the non-binary case, the q^m-ary Uniform Discrete Symmetric Channel is considered,
which consists of q^m symbols. The probability that a symbol is correctly received is s,
while the probability that any particular incorrect symbol is received is (1 – s)/(q^m – 1).
It can be shown that the probability of undetected error is bounded by the expression

Pu(E) ≤ 1 – ∑_{j=0}^{d–1} C(n, j) (1 – s)^j s^(n–j)
A decoder error occurs when a received word is matched to a codeword which was not the
one actually transmitted. The probability of decoder error P(E) is bounded above by the
probability of occurrence of error patterns of weight greater than t, since up to t errors are
corrected by the (n,k) code. Hence,

P(E) ≤ ∑_{j=t+1}^{n} C(n, j) p^j (1 – p)^(n–j)

The exact probability of decoder error is given by

P(E) = ∑_{j=d}^{n} A_j ∑_{k=0}^{t} P_k^j

where A_j is the number of codewords of weight j.
The quantity P_k^j is the probability that a received word lies at Hamming distance exactly k
from a given codeword of weight j; over the BSC,

P_k^j = ∑_{r=0}^{k} C(j, k–r) C(n–j, r) p^(j–k+2r) (1 – p)^(n–j+k–2r)
For the non-binary case and given the UDSC channel, it can be shown – through an
analysis based on probability theory, not pursued in this study – that the probability of
decoder error is given by the expression

P(E) = ∑_{j=d}^{n} A_j ∑_{k=0}^{t} P_k^j
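These expressions can be checked concretely on the binary [7,4] Hamming code, whose weight distribution is A_0 = 1, A_3 = A_4 = 7, A_7 = 1. Because this code is perfect with t = 1, a decoder error occurs exactly when two or more channel errors occur, so the exact formula must reproduce that probability. A sketch (helper name is illustrative):

```python
from math import comb

def P_kj(n, j, k, p):
    """Probability that the received word lies at Hamming distance k
    from a fixed codeword of weight j, over a BSC with error rate p."""
    return sum(comb(j, k - r) * comb(n - j, r)
               * p**(j - k + 2*r) * (1 - p)**(n - j + k - 2*r)
               for r in range(0, k + 1)
               if 0 <= k - r <= j and r <= n - j)

n, t, p = 7, 1, 0.01
A = {3: 7, 4: 7, 7: 1}          # non-zero weights of the [7,4] Hamming code

PE = sum(Aj * sum(P_kj(n, j, k, p) for k in range(t + 1))
         for j, Aj in A.items())

# Perfect single-error-correcting code: decoder error <=> >= 2 channel errors.
two_or_more = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(2, n + 1))
assert abs(PE - two_or_more) < 1e-12
```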
In the discussion of basic binary codes in section 2.1 it was shown that the parity-check code
has a very high information rate R = k/(k + 1), since it uses a single parity-check digit, but
can only detect an odd number of errors and has no error correction capability (t = 0). The
Triple Repetition code, which can detect a double error and correct a single error (t = 1),
has a low information rate R = 1/3. The desirable, yet impossible, performance for an error
control code would be to achieve R approaching 1 while t approaches n. This implies that the
performance of a code is associated with achieving a “moderate” pair (R, t), since one
parameter is considered in relation to the other.
For instance, the formulae for the parameters of Hamming codes, presented in section 5.1,
imply that the information rate R approaches 1 quite fast as m grows large. However,
Hamming codes can only ever correct a single error, and the value of this error control
capacity fades as the code length n increases: compare correcting one error in a code length
of 7 (m = 3) with one error in a length of 63 (m = 6). It is therefore the shorter Hamming
codes that are used in practical applications.
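This trade-off is easy to tabulate from the Hamming parameters n = 2^m – 1, k = n – m; a short sketch:

```python
# Hamming code parameters: n = 2^m - 1, k = n - m, always t = 1.
for m in range(3, 11):
    n = 2**m - 1
    k = n - m
    R = k / n
    print(f"m={m:2d}  n={n:4d}  k={k:4d}  R={R:.3f}")

# R -> 1 quickly, but a single correctable error in n bits
# becomes less and less significant as n grows.
assert (2**3 - 1 - 3) / (2**3 - 1) == 4 / 7
assert (2**10 - 1 - 10) / (2**10 - 1) > 0.99
```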
As described in section 8.2, Reed-Solomon codes can be defined over Galois fields of order
q^m, though in the coding literature they are usually discussed over GF(2^m) due to the
simplicity of their implementation using binary electronic devices. Table 9–1, based on Reed
and Chen (1999, p. 246), lists all Reed-Solomon codes defined over the Galois fields GF(2^m)
for m ≤ 4. A column for the minimum distance d of each code has been added. The length of
each code is in the column under n, the number of information symbols under k, the number
of correctable errors under t and the information rate of each code under R.
m    n    k    d    t    R (%)
2    3    1    3    1    33.3%
3    7    5    3    1    71.4%
3    7    3    5    2    42.9%
3    7    1    7    3    14.3%
4    15   13   3    1    86.7%
4    15   11   5    2    73.3%
4    15   9    7    3    60.0%
4    15   7    9    4    46.7%
4    15   5    11   5    33.3%
4    15   3    13   6    20.0%
4    15   1    15   7    6.7%
Table 9–1
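The rows of Table 9–1 follow directly from the Reed-Solomon parameter relations n = 2^m – 1 and d = n – k + 1 = 2t + 1 (RS codes are maximum distance separable); a sketch regenerating them:

```python
rows = []
for m in range(2, 5):
    n = 2**m - 1
    for t in range(1, (n - 1) // 2 + 1):
        d = 2 * t + 1
        k = n - d + 1          # MDS property: d = n - k + 1
        rows.append((m, n, k, d, t, round(100 * k / n, 1)))

assert (3, 7, 3, 5, 2, 42.9) in rows
assert (4, 15, 9, 7, 3, 60.0) in rows
assert len(rows) == 1 + 3 + 7   # one code for m=2, three for m=3, seven for m=4
```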
It can be seen that the number of possible Reed-Solomon codes increases quite fast as m
increases. In order to choose an RS code, the error correcting capacity t must be considered
in relation to the information rate R. For instance, a reasonable choice would be the (15,9) RS
code with t = 3 and R = 0.6 since it corrects 3 errors in a length of 15 and has moderate
information rate. However, considering the frequency of occurrence of errors in the
transmission channel used by a specific application, the (15,11) RS code could be chosen,
though it corrects only 2 errors in a length of 15, as it is more economical with information
rate R = 0.733. If the channel used induces a relatively large number of errors, then error
correction may be given higher priority and the (15,7) RS code could be a good choice, since
it corrects 4 errors (t = 4) in a length of 15. Yet, this code is uneconomical, since its
information rate is R = 0.467. In any case, Table 9–1 reveals the importance of the class of
Reed-Solomon codes, which offers a wide range of codes suitable for a number of practical
applications.
Another parameter to be considered in the performance analysis of error control codes is the
implementation complexity: the amount of hardware and software required for realisation of
the encoding and decoding processes. This parameter is also subject to the specific
application, since the hardware and software requirements for applications with a low data
rate, such as the 1.41 Mbit/sec audio bit-stream of the compact disk, are substantially
different from those of high data rate applications such as High-Definition Television at
19.3 Mbit/sec. Consequently, powerful encoding and decoding algorithms of error detecting /
correcting codes need to have a feasible implementation in order to be regarded as efficient.
Error Control Strategies and Applications
The uses of error detecting / correcting codes are continuously expanding, roughly in
proportion to the technological developments in applications concerned with the transmission
of information. The type of error detection and correction coding deployed by a real
communication system depends mainly on the application. Parameters such as the channel
(or storage medium) properties, the amount of transmitted power and digital equipment
limitations primarily determine the error control strategy adopted. As exhibited throughout
the coding theory literature, three preponderant error control schemes are in use: forward
error correction (FEC), automatic repeat request (ARQ) and hybrid error control.
The major advantage of ARQ over FEC is that it is adaptive, since error processing
(retransmission) is performed only when errors occur. However, error control codes tend to
use long codewords for efficient error detection. As a result, if retransmission is requested
frequently, the receiver may experience delays in receiving the original message. A mixed
error control strategy, called hybrid error control (HEC), addresses the problem by using
error correction for the most frequent errors in combination with error detection and
retransmission for the less frequent error patterns.
Another approach to specifying the error control strategy is to consider which types of data
need to be strongly protected during transmission. For example, in a
computer system there is greater sensitivity to errors in the machine instructions than in the
user’s data.
The uses of error control codes are ever expanding and include:
• radio links
• long-distance telephony
• television (High-Definition Television)
• data storage systems
• compact disk (Digital Versatile Disk)
• international data networks
• wireless communications
• deep-space communications (satellites, telescopes, space probes)
Error detecting / correcting codes are widely used for improving the reliability of computer
storage systems. The requirement for such systems, which at first used core memories, was
for a single error correcting / double error detecting (SECDED) code. The first error control
schemes to be implemented on computer memories were the Hamming codes, which have this
error control capacity, as discussed in section 5.1. Especially after core memories were
replaced by semiconductor memories, which are faster but whose high density per chip induces
more errors, error control codes became an essential design feature of computer storage
systems.
A commonly used protection method is to apply two levels of coding, called concatenation.
The information sequence is encoded with a code C2 (external code) and transformed from a
sequence of length k to an encoded sequence of length n. This new sequence is regarded as
the information for a second code C1 (internal code) and is accordingly again encoded. This
process results in a concatenated code which is effective against a mixture of error bursts and
random errors. In general, the external code corrects residual errors generated by incorrect
decoding of the internal code.
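A toy illustration of the two-level encoding order (with a hypothetical outer parity-check code and inner triple-repetition code, far simpler than the convolutional/RS pairs used in practice):

```python
def outer_encode(bits):
    """Outer code C2: append a single even-parity bit."""
    return bits + [sum(bits) % 2]

def inner_encode(bits):
    """Inner code C1: repeat every bit three times."""
    return [b for b in bits for _ in range(3)]

message = [1, 0, 1, 1]
codeword = inner_encode(outer_encode(message))   # outer first, then inner

assert len(codeword) == (len(message) + 1) * 3   # (k + 1) outer symbols, tripled
assert codeword[:3] == [1, 1, 1]                 # first message bit, repeated
```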
The most commonly adopted combination is that of a convolutional internal code with a
Reed-Solomon external code. Concatenated convolutional and RS codes find widespread use
in applications concerned with the reliable transmission of data representing photographs or
video such as satellite telecommunications, deep-space telescopes and, recently,
High-Definition Television.
Currently, the most popular class of codes for communication and storage systems seems to
be the class of Reed-Solomon codes. The prevalence of these codes over most of the codes
presented in this study can be chiefly justified by their powerful algebraic decoding algorithm
in combination with their byte symbol structure.
For example, the compact disk error control system employs Reed-Solomon codes – in fact
two RS codes defined over the Galois field GF(2^8), according to Reed and Chen (1999,
p. 300-7). Each symbol from GF(2^8) can be represented by a binary 8-tuple (byte), based on
the development of section 8.4. The compact disk digital audio system uses the byte symbol
structure of RS codes, since these 8-bit symbols from GF(2^8) prove to be suitable, from an
electrical engineering perspective, for the 16-bit samples obtained from the original analog
music source.
The length of an RS code over GF(2^8) is 255, leading to 2040-bit codewords, which would
result in increased complexity and high implementation cost. In order to minimise this cost,
as the compact disk application is aimed at retail sales, a concatenation of two shortened
Reed-Solomon codes is implemented in its error correction system. The external code is a
(28,24) RS code and the internal a (32,28) RS code, both still using 8-bit symbols from
GF(2^8). Their concatenation, combined with their byte symbol structure, provides protection
against error bursts caused by material imperfections or by fingerprints and scratches that
may occur when handling the CD.
AFTERWORD
As observed throughout this study of error detecting / correcting codes, the encoding process
is likely to be less extensive than decoding. In fact, the encoder performs a single task:
transforming each message word into a codeword by adding redundancy. In contrast, the
decoder carries out three tasks: detecting errors, processing errors when they occur, and
extracting the original message word from the codeword. These processes reveal a
substantial decoding complexity and imply extensive digital equipment requirements. Given
that decoding is primarily based on the encoding rule, the design of error detecting /
correcting codes should focus on optimising the encoding process in such a manner that it
simplifies decoding.
Error detecting / correcting codes cannot eliminate the probability of erroneous transmission.
Yet, they contribute to a significant reduction in the effects of noisy transmission channels.
Practical applications demand codes with a specified error control capability. Such codes can
be deployed once they are proved mathematically to satisfy the required error control
capacity.
In this direction, advanced algebraic techniques are exploited in the construction of error
control codes. The resulting codes gain in error control capability, but this benefit is
counterbalanced by increased implementation complexity. Clearly, only that portion of the
available algebraic machinery which admits a feasible implementation should be applied.
Put more simply, rigorous research on advanced applied mathematics should be performed in
parallel with research on engineering developments that will remove severe implementation
constraints. This combination can facilitate innovative approaches to error control systems
for the efficient transmission of information.
Appendices
APPENDIX A
1. Groups
A set of elements G that constitutes a group under a binary operation ‘·’ has only one identity
element; similarly, each element a ∈ G has a unique inverse.
Proof (uniqueness of the inverse): Assume that there exist two inverse elements a´ and a´´ for
an element a in G.
Then, a´ = a´· e = a´·(a · a´´) = (a´· a)· a´´ = e · a´´ = a´´.
Hence, a´ = a´´, implying that the inverse a´ of a group element a is unique.
When the operation imposed on a set of elements for the formation of a group is
multiplication ‘·’, the inverse element of a is often written as a^(–1) and the unit element is
written as 1. We also write a^i for a · … · a (i times).
When the operation is taken to be addition ‘+’, the inverse element of a ∈ G is often written
as –a and the unit element is written as 0.
The set of all permutations of {1, 2, …, n} is a group under composition of functions. It is
called the symmetric group and its cardinality is equal to n!.
If G is a commutative (or abelian) group and H a subgroup, then the sets aH = {a·h | h ∈ H}
are called cosets of H. The cosets again form a group if multiplication of cosets is defined by
(aH)·(bH) = (a·b)H. This group is called the factor group, denoted by G/H.
2. Fields
The following properties can be derived from the definition of a field GF(q):
1. a·0 = 0·a = 0, for any a ∈ GF(q)
2. If a·b = 0, a ≠ 0, then b = 0, a,b ∈ GF(q)
3. For all a,b ∈ GF(q) with a ≠ 0 ≠ b, a·b ≠ 0
4. – (a·b) = (–a)·b = a·(– b) for all a,b ∈ GF(q)
5. If a·b = a·c, a ≠ 0 then b = c
We can regard a field GF(q) as having four operations: addition, subtraction, multiplication
and division by a non-zero element – where subtraction and division are performed using the
inverse element in GF(q) – with the understanding that a – b = a + (–b) and a / b = a·b^(–1),
(b ≠ 0) for all a,b ∈ GF(q).
The following theorem is useful for performing field arithmetic in fields of characteristic λ.
Theorem A-1: In a field GF(q) of characteristic λ, (x ± y)^λ = x^λ ± y^λ for any x, y variables
or elements in GF(q).
Proof: By the binomial theorem,
(x + y)^λ = x^λ + C(λ,1) x^(λ–1) y + C(λ,2) x^(λ–2) y^2 + … + y^λ
Every term except the first and the last is multiplied by a binomial coefficient C(λ,i) for
1 ≤ i ≤ λ – 1, and each of these binomial coefficients has λ as a factor when multiplied out.
Hence, in a field GF(q) of characteristic λ they are all 0 and the result follows.
For (x – y)^λ, replace y by –y in the above expression.
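Theorem A-1 can be spot-checked numerically in the prime fields GF(λ), where arithmetic is just integer arithmetic modulo λ; a minimal sketch:

```python
# In GF(p) for prime characteristic p, (x + y)^p = x^p + y^p (Theorem A-1).
for prime in (2, 3, 5, 7):
    for x in range(prime):
        for y in range(prime):
            lhs = pow(x + y, prime, prime)                      # (x + y)^p mod p
            rhs = (pow(x, prime, prime) + pow(y, prime, prime)) % prime
            assert lhs == rhs
```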
In the discussion about fields in section 6.1.1, it is mentioned that a finite field of order q is
unique up to isomorphism.
Two fields GF(q) and GF(q´) are called isomorphic if there exists a map f from GF(q) to
GF(q´) such that:
i. f is bijective; that is, for each element β ∈ GF(q´) there is exactly one element
α ∈ GF(q) with f(α) = β
ii. for any α, β ∈ GF(q), f(α·β) = f(α)·f(β), and f(α + β) = f(α) + f(β)
An isomorphism amounts to a relabelling of the elements of the field that preserves ‘+’ and
‘·’, implying that f(0) = 0 and f(1) = 1.
3. Vector spaces
A measure of the degrees of freedom available to the elements of a vector space is an
invariant called the dimension or rank of the vector space. It is described through the concept
of linear independence.
A linearly independent subset B of a vector space V is called a basis of V if for any vector
v ∈ V, v is a linear combination of elements of B.
As for the dual space S⊥ of a vector subspace S of a vector space V over GF(q), the
Dimension Theorem dictates a property to be satisfied by their dimensions:
dim S + dim S⊥ = dim V.
4. Matrices
A matrix is an m × n array of entries a_ij which, for the purposes of coding theory, are
assumed to be elements of a finite field GF(q).
The set of vectors of length n, denoted by F^n, comes equipped with standard operations for
adding vectors and multiplying them by elements of GF(q), called scalars.
Coding theory is concerned with choosing subsets of F^n which are closed under the
operations of vector addition and scalar multiplication defined above.
Note that matrix multiplication between A and B is defined only when the number of columns
of A is equal to the number of rows of B.
The following properties for matrix multiplication can be derived from its definition.
Let A be an m × n matrix, B and C n × s matrices and D an s × r matrix, with entries from a
finite field GF(q). Then,
1. A·(B + C) = A·B + A·C
2. A·(a·B) = a·(A·B), a ∈ GF(q)
3. A·(B·D) = (A·B)·D
A system of linear equations A·x = b can then be written in matrix form as

    [ a11  a12  ...  a1n ]   [ x1 ]   [ b1 ]
    [ a21  a22  ...  a2n ] · [ x2 ] = [ b2 ]
    [  .    .         .  ]   [  . ]   [  . ]
    [ am1  am2  ...  amn ]   [ xn ]   [ bm ]
APPENDIX B
The following discussion investigates the existence of an element β with order n in some field
GF(q^m), since such an element is central to the process of factoring x^n – 1.
If n is a divisor of (q^m – 1), then there are exactly φ(n) elements of order n in GF(q^m),
where φ(n) is the Euler φ function evaluated at n, according to property 2 of Theorem 6-3.
For positive n, φ(n) is always greater than zero and thus the field GF(q^m) contains at least
one element of order n. Consequently, if a positive integer m can be found such that
n | (q^m – 1), the existence of a primitive n-th root of unity is guaranteed in an extension field
GF(q^m) of GF(q).
The next step is to determine precisely in which extension field of GF(q) the element β of
order n can be found. If m is the order of q modulo n – that is, m is the smallest positive
integer such that n divides (q^m – 1) – then GF(q^m) is the smallest extension field of GF(q)
in which a primitive n-th root of unity can be found.
Over GF(2), the following factorisations hold:
x^3 – 1 = (x + 1)(x^2 + x + 1)
x^5 – 1 = (x + 1)(x^4 + x^3 + x^2 + x + 1)
x^7 – 1 = (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1)
x^9 – 1 = (x + 1)(x^2 + x + 1)(x^6 + x^3 + 1)
x^11 – 1 = (x + 1)(x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)
Note that further factorisations can be obtained considering that over GF(2),
x^(2^k·n) – 1 = (x^n – 1)^(2^k)
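The factorisations above can be verified by multiplying the factors back together modulo 2; a sketch using coefficient lists (index i holds the coefficient of x^i):

```python
def polymul_gf2(a, b):
    """Multiply two GF(2) polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj   # addition in GF(2) is XOR
    return out

x_plus_1 = [1, 1]          # x + 1
f1       = [1, 1, 0, 1]    # x^3 + x + 1
f2       = [1, 0, 1, 1]    # x^3 + x^2 + 1

product = polymul_gf2(polymul_gf2(x_plus_1, f1), f2)
assert product == [1, 0, 0, 0, 0, 0, 0, 1]   # x^7 + 1 = x^7 - 1 over GF(2)
```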
APPENDIX C
The cyclic property and the property of linearity allow the process of shifting and addition
respectively, which can be applied to generate cyclic codes as depicted in the following
example, presented by Sweeney (1991, p. 47).
Consider the field GF(2^3). The factors of x^7 – 1 are x^7 – 1 = (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1).
Of the two polynomials of degree 3, both primitive, x^3 + x + 1 can be arbitrarily chosen to
construct a cyclic code of length 7. Using x^3 + x + 1 as the generator polynomial, which
corresponds to the generator sequence 0001011, all the codewords of length 7 can be
constructed as follows:
1                        0000000
2   generator sequence   0001011
3   1st shift            0010110
4   2nd shift            0101100
5   3rd shift            1011000
6   4th shift            0110001
7   5th shift            1100010
8   6th shift            1000101
9   sequences 2 + 3      0011101
10  1st shift            0111010
11  2nd shift            1110100
12  3rd shift            1101001
13  4th shift            1010011
14  5th shift            0100111
15  6th shift            1001110
16  sequences 2 + 11     1111111
The first codeword was taken to be the all-zero sequence, since the all-zero vector is always a
codeword of a linear code. Then, starting from the generator sequence and shifting it
cyclically until all seven positions have been registered, the next seven codewords were
constructed. Then, two of those sequences are found, such that, when added they give a new
sequence. That sequence is cyclically shifted until all seven positions have again been
registered, to construct the next seven codewords. Finally, two sequences are found such that,
when added, they form the all-one codeword which remains the same even if shifts are
applied. Further shifting and addition cannot produce any new codewords.
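The shift-and-add construction can be automated; the sketch below regenerates the 16 codewords from the generator sequence 0001011 and confirms the code's size and minimum distance:

```python
def shifts(word):
    """All cyclic shifts of a tuple."""
    n = len(word)
    return {tuple(word[(i + s) % n] for i in range(n)) for s in range(n)}

gen = (0, 0, 0, 1, 0, 1, 1)        # generator sequence for x^3 + x + 1
code = {(0,) * 7} | shifts(gen)    # all-zero word plus the 7 cyclic shifts

# Close the set under GF(2) addition until no new codewords appear.
changed = True
while changed:
    changed = False
    for a in list(code):
        for b in list(code):
            s = tuple((x + y) % 2 for x, y in zip(a, b))
            if s not in code:
                code.add(s)
                changed = True

assert len(code) == 16                                   # a (7,4) code
assert min(sum(c) for c in code if any(c)) == 3          # minimum distance 3
```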
The code constructed in this way has 16 codewords and consequently 4 information bits; it is
thus a (7,4) code. Its minimum distance is 3, as the minimum weight of a non-zero codeword
is 3. It can be seen that the code thus produced is a cyclic version of the [7,4] Hamming
code.
APPENDIX D
Error detecting / correcting codes mostly employ code symbols from the binary finite field GF(2) or
its extension GF(2^m), since information in digital-data transmission and storage systems is
universally coded in binary form. In fact, error control codes are implemented on bi-stable
electronic devices and therefore data in binary form is suitable for the realisation of encoding and
decoding schemes.
Binary addition and multiplication are basic to binary field arithmetic. According to Reed and Chen
(1999, p. 51) these elementary operations can be implemented using standard logic gates as depicted
in Figure D–1.
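In software, the gate-level operations of Figure D–1 correspond directly to bitwise operators; a minimal sketch:

```python
# GF(2) arithmetic: addition is the XOR gate, multiplication the AND gate.
def gf2_add(a, b):
    return a ^ b

def gf2_mul(a, b):
    return a & b

# Addition and subtraction coincide: every element is its own additive inverse.
assert gf2_add(1, 1) == 0
assert gf2_add(1, 0) == 1
assert gf2_mul(1, 1) == 1
assert gf2_mul(1, 0) == 0
```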
As for systematic encoding of linear block codes, discussed in section 4.3.3, a binary linear systematic
encoder is illustrated in Figure D–2, as proposed by Reed and Chen (1999, p. 82).
Figure D–2: binary linear systematic encoder – the n – k parity columns of the generator
matrix are stored in a k × (n – k) ROM addressed by the message word; the parity bits are
formed in accumulators and appended to the message word to produce the codeword.
APPENDIX E
The following table lists all 2^4 message words and the corresponding codewords of the [7,4]
Hamming code, as found in Lin and Costello (1983, p. 67).
0000 0000000
1000 1101000
0100 0110100
1100 1011100
0010 1110010
1010 0011010
0110 1000110
1110 0101110
0001 1010001
1001 0111001
0101 1100101
1101 0001101
0011 0100011
1011 1001011
0111 0010111
1111 1111111
This table can be constructed by following the process described in section 5.1.1 for all 2^4
message words to obtain their corresponding codewords.
The program presented in section 5.1.1.1 takes as input an element (a message word) of the first
column and produces as output the corresponding codeword of the second column.
The program presented in section 5.1.1.2 takes as input a received word. This received word can
either be an element (a codeword) of the second column or a codeword with one complemented bit
(single error) or a codeword with two complemented bits (double error). The program outputs the
index of the erroneous coordinate of the received word and the actual transmitted codeword.
BIBLIOGRAPHY
Berlekamp, E. R. (1984) Algebraic Coding Theory, Revised 1984 Edition, Aegean Park Press, ISBN
0-89412-063-8
Jones, G. A., Jones J. M. (2000) Information and Coding Theory, Springer Mathematics Series,
Springer, ISBN 1-85233-622-6
Lin, S., Costello, D. J. (1983) Error Control Coding: Fundamentals and Applications, Prentice-Hall,
ISBN 0-13-283796-X
van Lint, J. H. (1992) Introduction to Coding Theory, Graduate Texts in Mathematics, Second
Edition, Springer, ISBN 3-540-54894-7
van Lint, J. H. (1999) Introduction to Coding Theory, Graduate Texts in Mathematics, Third Edition,
Springer, ISBN 3-540-64133-5
Pless, V. (1989) Introduction to the Theory of Error-Correcting Codes, Second Edition, Wiley-
Interscience Series in Discrete Mathematics and Optimisation, John Wiley & Sons Inc, ISBN 0-471-
61884-5
Pless, V. (1998) Introduction to the Theory of Error-Correcting Codes, Third Edition, Wiley-
Interscience Series in Discrete Mathematics and Optimisation, John Wiley & Sons Inc, ISBN 0-471-
19047-0
Poli, A., Huguet, L. (1992) Error Correcting Codes: Theory and Applications, Prentice Hall -
Masson, ISBN 0-13-284894-5
Pretzel, O. (1996) Error-Correcting Codes and Finite Fields, Oxford Applied Mathematics &
Computing Science Series, Student Edition, Clarendon Press, Oxford ISBN 0-19-269067-1
Reed, I. S., Chen, X. (1999) Error-Control Coding for Data Networks, Kluwer Academic Publishers,
ISBN 0-7923-8528-4
Roman, S. (1997) Introduction to Coding and Information Theory, Springer, ISBN 0-387-94704-3
Sweeney, P. (1991) Error Control Coding, An Introduction, Prentice Hall, ISBN 0-13-284126-6
Wicker, S. B. (1995) Error Control Systems for Digital Communication and Storage, Prentice Hall,
ISBN 0-13-200809-2