Algebraic Coding Theory
MA 407
Adam Attarian
Andrew Hutzel
Ryan Neal
May 4, 2006
1 Introduction and Motivation
Over the last six decades, algebraic coding has become one of the most important and widely applied aspects of abstract algebra. Coding theory forms the basis of all modern communication systems and is the key to another area of study, Information Theory, which lies at the intersection of probability and coding theory. Algebraic codes are now used in essentially all hardware-level implementations of intelligent machines, such as scanners, optical devices, and telecom equipment. It is only with algebraic codes that we are able to communicate reliably over long distances or to achieve megabit bandwidth over a wireless channel.
Algebraic coding is most prevalent in communication systems, and it has been developed and engineered because of one inescapable fact of communication: noise. Noise will always be a part of communications, and its presence has the potential to corrupt data and voice. Noise comes from a practically infinite number of sources: from cosmic background radiation (affecting space-based communication), from an inductive motor in a vending machine down the hall, and even from the users themselves through signal reflections induced in the environment. The implications of destructive interference in communications are obvious: mission-critical communiques could not be trusted, and decisions based on those communications could not be made.
Consider these basic applications of algebraic codes. Suppose that in the case of two
warring nations, a binary message is to be sent indicating an intention of surrender or an
intention of war. If a binary 1 is sent, the nation surrenders. If a binary 0 is sent, then
war it is. In this era of such rudimentary communication, there is no concept of noise or error correction, and so it is possible, if not likely, that due to noise a transmitted 0 is received as a 1, or vice versa. To make this system substantially more robust, a party can transmit five bits, and the receiver then infers the message from the majority of the received bits. For instance, if 00000 meant surrender and was sent, but due to noise 00100 was received, the message remains intact and the white flag is raised. Based on this sender-receiver agreement, up to two bit errors can be tolerated; at least three errors must occur before the message's intent is reversed and ultimately lost. The probability of three bit errors occurring can be shown to be lower than that of a single error, and so the addition of this decoding makes the system more
robust. This decision process is called the maximum-likelihood decoding procedure, and
will be discussed further.
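As a brief illustrative sketch (the five-bit repetition agreement and the received word 00100 come from the discussion above; the function names are our own), the majority-vote decision can be written in a few lines of Python:

def encode_repetition(bit, n=5):
    # Repeat the single information bit n times, per the sender-receiver agreement.
    return [bit] * n

def decode_majority(received):
    # Maximum-likelihood decoding for the repetition code: take the majority vote.
    return 1 if sum(received) > len(received) / 2 else 0

# 00000 ("surrender") is sent, but noise flips one bit on the way.
received = [0, 0, 1, 0, 0]
print(decode_majority(received))  # prints 0: the intent of the message survives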
A more involved yet more pertinent example is the method by which current-generation cell phones prevent cross-talk and ensure user security. This example will be explained further once the requisite background is established, though an overview will suffice for now. Code Division Multiple Access, or CDMA, is a dominant cell phone standard in North America, and it operates on the idea of orthogonal codes. In a cell phone environment, there are multiple users talking on the same network, at the same time, using the same space in a finite frequency band. How do the users remain independent of each other and avoid cross-talk? Each user is assigned a unique spreading sequence, or code, and the network identifies and routes traffic based on these unique sequences. The code is a length-M binary vector that is multiplied onto the user's signal. The key here is that the codes are mutually orthogonal and of high dimensionality relative to the number of users, so that the users' conversations do not literally collapse into each other.
Two main branches of coding theory are source coding and channel coding. They are so named because the former manipulates the source to allow more efficient transmission (i.e., smaller messages), while the latter addresses the errors that may be introduced in the transmission channel. The fundamental theorem of source coding was given in 1948 by Claude Shannon, widely considered the father of Information Theory. Shannon's Theorem describes the best possible error correction of a code given certain parameters. Source coding lies more within the computer science and engineering disciplines, its main application being the compression of data prior to transmission. Our textbook focuses on error-correcting codes, since that is where algebra is most applicable.
One of the most fundamental concepts to understand about channel coding is that it
is only possible to catch errors if there are some restrictions on what constitutes a proper
message. The receiver needs to have an idea of the shape of what it will be receiving. If
the message space is entirely composed of legitimate codewords then any error will change
one code word into another, and the receiver will not be able to discern that the seemingly
legitimate code word that was received is not the same as what was sent. Note that it is
difficult to discuss encoding without including the corresponding decoding method. The
most fundamental concept in encoding is to build redundancy into the message.
2 Mathematical Discussion
2.1 Code Words
There are several different kinds of codes; one of the most common is the linear code.
Definition 1. Linear Code. An (n, k) linear code over a finite field F is a k-dimensional subspace V of the vector space F^n over F. The vectors in V are called the code words. When F = Z2, we say we are working with binary codes.
An (n, k) linear code over a field F can thus be thought of as a set of n-tuples from F, where each vector contains both the message word and a redundancy, which occupies the remaining n − k components of the code word. For a finite field of order q, there are then q^k possible code words. In the common case of binary codes, for k message digits, there are 2^k possible code words.
Example 1. The set {0000, 0101, 1010, 1111} is a (4, 2) binary code. The first two digits
of each string are the numbers 0 through 3, whereas the trailing two bits illustrate the
redundancy.
When discussing error-correcting and detecting capabilities, it is necessary to refer to
the Hamming Distance and Hamming Weight.
Definition 2. Hamming Distance. The Hamming distance between any two vectors u, v ∈ V is the number of components in which they differ. Let d(u, v) denote the Hamming distance between the two vectors u and v.
Definition 3. Hamming Weight. The Hamming weight of a vector u ∈ V is the number of its nonzero components. The Hamming weight of a linear code is the minimum weight of any nonzero vector in the code. Let wt(u) denote the Hamming weight of the vector u.
Another way to look at the Hamming distance is to notice that it is the number of substitutions required to change one vector into another, or the number of errors that transformed one code word into another. The Hamming weight of a vector can likewise be thought of as the Hamming distance between that vector and the zero word 00···0. For binary codes, the Hamming weight is simply the number of ones in the code word.
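These two definitions are easy to compute directly; the following sketch (using the (4, 2) binary code of Example 1; the helper names are our own) is one way to do so in Python:

def hamming_distance(u, v):
    # Number of components in which the two words differ.
    return sum(a != b for a, b in zip(u, v))

def hamming_weight(u):
    # Number of nonzero components, i.e. the distance from u to the zero word.
    return sum(c != '0' for c in u)

code = ["0000", "0101", "1010", "1111"]  # the (4, 2) binary code of Example 1
print(hamming_distance("0101", "1010"))                      # 4
print(min(hamming_weight(w) for w in code if w != "0000"))   # weight of the code: 2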
2.2 Error Detection
Linear codes allow for well-defined algorithms for error detection. These routines are so mature and developed that they are now typically implemented in hardware circuits rather than software programs. How well errors can be detected depends on the code's Hamming weight. In the end, it may come down to a choice between detection and correction.
Theorem 1. If the Hamming weight of a linear code is at least 2t + 1, then the code can correct any t or fewer errors. Consequently, the same code can detect any 2t or fewer errors.
Proof. Suppose that a transmitted code word u is received as the vector v, and that at most t errors have been made in transmission. Then, by definition, d(u, v) ≤ t. If w is any code word other than u, then w − u is a nonzero code word, so

2t + 1 ≤ wt(w − u) = d(w, u) ≤ d(w, v) + d(v, u) ≤ d(w, v) + t,

and so t + 1 ≤ d(w, v). The code word closest to the received vector v is therefore u, and so v is correctly decoded as u.
This theorem is interesting because it implies that the user must choose either to correct t errors or to detect 2t errors. It also means that sometimes it is not possible to decode a received message at all; in this case, the receiver will request a retransmission of the applicable packet or message.
Example 2. Let the Hamming weight of a code be 4. Then we know the code will correct any single error and detect any two errors (t = 1, s = 1), or detect any three errors (t = 0, s = 3).
To generate a linear code, a useful method is via a generator matrix. The matrix G is a linear transform which maps F^k to a k-dimensional subspace W of F^n, such that for any message vector v ∈ F^k the vector vG agrees with v in the first k components and builds an amount of redundancy into the remaining n − k components. This k × n linear transform has the matrix representation

G = [ I_{k×k} | A_{k×(n−k)} ]

where the individual elements of A satisfy a_ij ∈ F. This linear transform G is called the standard generator matrix. It is worth noting that any matrix of rank k will map F^k to a k-dimensional subspace of F^n, but the standard generator matrix has the advantage that the original message vector forms the first k components of the resulting code word.
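As a small sketch of encoding with a standard generator matrix over Z2 (the particular matrix A below is chosen only for illustration and is not taken from the text):

import numpy as np

k = 2
A = np.array([[1, 1],
              [0, 1]])                       # illustrative redundancy block
G = np.hstack([np.eye(k, dtype=int), A])     # standard generator matrix G = [I_k | A]

def encode(message, G):
    # Multiply the length-k message row vector by G, working mod 2.
    return np.mod(message @ G, 2)

msg = np.array([1, 0])
print(encode(msg, G))   # [1 0 1 1]: the first k components agree with the message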
2.3 Error Decoding
Once an error has been detected, it must be corrected, or the word decoded, via a method based on the original encoding. The receiver needs to know what method to employ, and it relies on prior knowledge of the method that was originally used in encoding. A very common method of decoding uses the parity-check matrix; this is still the predominant decoding technique for standard telephone modem systems. For this method, we presume all of the code words were encoded via a common generator matrix G. We simply need to undo the encoding that the matrix G put onto the words.
Let V be a linear code over the finite field F given by the standard generator matrix G = [ I_{k×k} | A ]. Then the n × (n − k) matrix

H =
A
I_{(n−k)×(n−k)}

obtained by stacking A on top of the (n − k) × (n − k) identity matrix,
is referred to as the parity check matrix. To decode any received message vector m, we
follow this procedure:
1. If the product mH is zero, then we presume that no error was made.
2. If there is a unique nonzero row i of H such that mH is s times row i for some s ∈ F, assume the sent word was m − (0 · · · 0 s 0 · · · 0), where s occurs in the ith component. If there is more than one such row, do not decode.
3. If the code is binary, then: if mH is the ith row of H for exactly one i, we know that an error was made in the ith component of m. If mH equals more than one row of H, do not decode.
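A sketch of the binary case (steps 1 and 3) follows; for concreteness it uses the same (6, 3) code that appears in the exercises later in this paper, and the function name is our own.

import numpy as np

def parity_check_decode(m, H):
    # Binary parity-check decoding: return the corrected word, or None for "do not decode".
    s = np.mod(m @ H, 2)                    # the product mH
    if not s.any():
        return m                            # step 1: no error detected
    matches = [i for i, row in enumerate(H) if np.array_equal(row, s)]
    if len(matches) == 1:                   # step 3: mH equals exactly one row of H
        corrected = m.copy()
        corrected[matches[0]] ^= 1          # flip the offending component
        return corrected
    return None                             # mH matches no row or several rows: do not decode

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])
H = np.vstack([A, np.eye(3, dtype=int)])    # H = A stacked on top of I_3
print(parity_check_decode(np.array([0, 0, 1, 0, 0, 1]), H))  # [0 0 1 1 0 1]: error in the 4th component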
At this point in the discussion, it is appropriate to note that any treatment of Coding Theory or Information Theory must include a discussion of probability. This is important for many reasons. In all of these cases, there is an underlying idea of the probability that an error occurred. Many times, in processor- and power-limited settings (such as spacecraft), computational resources are finite and can only be expended when absolutely necessary. Though it is out of the scope of this paper, complex digital devices generally first estimate a bit error probability before engaging in error correction or error decoding; if this estimate is above a certain threshold, then error correction is attempted. The error probability depends heavily on the type of communication system, the kind of noise, and the transmission method. Additionally, several decoding procedures depend exclusively on the probability of an error occurring.
For the following examples, let C ⊆ F_2^n be a linear code of length n, and let x, y ∈ F_2^n.
Example 3. In Ideal Observer Decoding, the word x is received. This method then picks a code word y ∈ C to maximize the conditional probability

P(y sent | x received).

This is related to the notion of entropy, a concept from Information Theory. In the event that more than one y attains the maximum, it is usually requested that the message be retransmitted.
Example 4. In Maximum Likelihood Decoding, the word x is received and then a y ∈ C is chosen to maximize the conditional probability

P(x received | y sent).

In other words, in Ideal Observer Decoding we choose the y that was most likely sent given that x was received, while in Maximum Likelihood Decoding we choose the y that would have most likely resulted in x being received.
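On a binary symmetric channel, where each bit is flipped independently with probability p (an assumption we add here only for illustration), the probability P(x received | y sent) depends only on the Hamming distance between x and y, so Maximum Likelihood Decoding can be computed directly:

def likelihood(x, y, p):
    # P(x received | y sent) when each bit flips independently with probability p.
    d = sum(a != b for a, b in zip(x, y))
    return (p ** d) * ((1 - p) ** (len(x) - d))

def ml_decode(x, code, p=0.1):
    # Choose the code word y maximizing P(x received | y sent).
    return max(code, key=lambda y: likelihood(x, y, p))

code = ["000000", "100110", "010011", "001101"]   # a few code words for illustration
print(ml_decode("001001", code))                  # 001101: the closest code word wins

For p < 1/2 the likelihood decreases as the Hamming distance grows, which is why Maximum Likelihood Decoding coincides with nearest-neighbor decoding in this setting.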
Theorem 2. If each code word is equally likely to be sent, then Maximum Likelihood Decoding is equivalent to Ideal Observer Decoding.
Proof. By the definition of conditional probability, we know that

P(y sent | x received) = P(x received ∩ y sent) / P(x received)
                       = P(x received | y sent) · P(y sent) / P(x received).

If each code word is equally likely to be sent, then P(y sent) is the same constant for every y ∈ C, and P(x received) does not depend on y. Hence maximizing P(y sent | x received) over y ∈ C is the same as maximizing P(x received | y sent), so the two decoding rules select the same code word.
Algebraic coding is what makes wireless communication possible at bandwidths above a few bits per second. We will investigate how this is possible, but we need some background first.
In a cell phone environment, all of the users in a given location are talking at the same time in the same frequency band. Because of this, it would be logical to think that as one user consumes increasing bandwidth for different applications (such as wireless email), all other users' links must suffer proportionally. To get around this, CDMA allows all users access to all frequencies at all times; each user therefore spans enough "good" frequencies to access all requested services. However, by allowing several users access to the same frequency at the same time, on the same channel, how are we able to recover each user's information? To do this, we must create an orthogonality structure among the signals that the users are assigned. This structure is an algebraic code, called a spreading sequence.
Definition 4. Let C be an (n, k) linear code over F with generator matrix G and parity-check matrix H. Then for any vector v ∈ F^n, we have vH = 0 if and only if v ∈ C.
Another, more practical way of thinking about orthogonality is that for any two vectors u, v ∈ C the cosine of the angle between them is zero. More generally, two vectors u, v are orthogonal if their inner product ⟨u, v⟩ = 0. With this understanding of orthogonality, we are able to transmit our bits.
Suppose Users 1 and 2 each want to transmit one information bit to their receivers. Let the bits be given by

b1[1] = 1,    b2[1] = −1
Remember that both users will be transmitting at the same time on the same frequency. To separate b1[1] and b2[1] at the receiver, we must assign mutually orthogonal codes to each. These codes can be thought of as vectors in a high-dimensional vector space. For this simple example, let the vector space be V = R^4, and let Users 1 and 2 be assigned the respective codes

p1[m] = {1, 1, 1, 1}
p2[m] = {1, −1, −1, 1}
We see by taking the inner product of these vectors that they are in fact mutually orthogonal. We follow these steps to recover the coded bits at the receiver:
1. Each user's information bit is first multiplied onto that user's individual code.
2. The total transmitted signal is created by adding together all of the coded sequences.
3. At the receiver, individual filters use the different codes to recover each user's information bit from the received signal.
To transmit, we first multiply these codes by the information bits, resulting in b1[1]p1[m] and b2[1]p2[m]. In this example, the system transmits the sum signal s[n] over the channel:

s[n] = b1[1]p1[m] + b2[1]p2[m],    1 ≤ n ≤ 4, 1 ≤ m ≤ 4

We now have

s[n] = {0, 2, 2, 0}
For this example, assume there is no noise and that we are operating under the convenient mathematical fiction of a lossless channel, so that our received signal is r[n] = s[n]. The individual bits b1[1] and b2[1] can be recovered by sending the received signal r[m] through detection filters which are matched to the individual codes. The rationale shall be omitted for brevity; however, we define the detection filter hi[m] for the ith user as the reflected (time-reversed) version of that user's code. Recalling our original orthogonal codes, this gives us the matched detection filters

h1[m] = {1, 1, 1, 1}
h2[m] = {1, −1, −1, 1}
To obtain the output of the receiver, yi[n], we must pass the received signal through the detection filter. For a linear, time-invariant system such as this one, the output is defined as the convolution of the input with the system's impulse response; in this case, the convolution of hi[m] with r[m]. Mathematically, the convolution sum is

yi[n] = Σ_{m=1}^{4} hi[m] r[n + 1 − m]
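The whole two-user exchange can be checked numerically. The sketch below uses the bits and spreading codes from the example above, assumes the ideal noiseless channel described there, and lets numpy's convolve stand in for the convolution sum.

import numpy as np

p1 = np.array([1, 1, 1, 1])        # spreading code of User 1
p2 = np.array([1, -1, -1, 1])      # spreading code of User 2 (orthogonal to p1)
b1, b2 = 1, -1                     # the two information bits

s = b1 * p1 + b2 * p2              # transmitted sum signal: [0, 2, 2, 0]
r = s                              # lossless, noise-free channel: r[n] = s[n]

h1, h2 = p1[::-1], p2[::-1]        # matched detection filters: the reflected codes

y1 = np.convolve(h1, r)            # yi[n] = sum_m hi[m] r[n + 1 - m]
y2 = np.convolve(h2, r)

# The despread outputs at n = 4 (index 3 of the full convolution) equal 4*b1 and 4*b2.
print(y1[3] / 4, y2[3] / 4)        # 1.0 -1.0: both bits are recovered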
Gallian Exercises
31.3 Referring to Example 1, use the nearest-neighbor method to decode the received words 0000110 and 1110100.
1. 0000110: This string has one error, and is in fact 1000110.
2. 1110100: According to nearest-neighbor, we decode the string with no errors.
31.10 Let C = {0000000, 1110100, 0111010, 0011101, 1001110, 0100111, 1010011, 1101001}. Determine the error-correcting and error-detecting capability of C.
The weight of C is 4, so there are two options: we can detect up to any 3 errors, or we can correct 1 error and detect 2 further errors.
31.13 Find all code words of the (7, 4) binary code given by the generator matrix

G = [ I_{4×4} | A ] =
1 0 0 0 1 1 1
0 1 0 0 1 0 1
0 0 1 0 1 1 0
0 0 0 1 0 1 1
To do this, we treat the binary numbers representing 0 through 15 (i.e., all 4-bit binary words) as our message vectors mi and multiply each by G to form a row of C. All of the individual code words are:
C =
0 0 0 0 0 0 0
1 0 0 0 1 1 1
0 1 0 0 1 0 1
0 0 1 0 1 1 0
0 0 0 1 0 1 1
1 1 0 0 0 1 0
1 0 1 0 0 0 1
1 0 0 1 1 0 0
0 1 1 0 0 1 1
0 1 0 1 1 1 0
0 0 1 1 1 0 1
0 1 1 1 0 0 0
1 0 1 1 0 1 0
1 1 0 1 0 0 1
1 1 1 0 1 0 0
1 1 1 1 1 1 1

where each row has the form mi G for one of the sixteen message words mi.
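This list can be regenerated mechanically; the following short sketch multiplies every 4-bit message by the generator matrix G given in the exercise:

import numpy as np
from itertools import product

G = np.array([[1, 0, 0, 0, 1, 1, 1],      # generator matrix of the (7, 4) code
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1, 1]])

# Each 4-bit message m gives the code word mG (mod 2); this prints all sixteen.
for m in product([0, 1], repeat=4):
    word = np.mod(np.array(m) @ G, 2)
    print("".join(map(str, word)))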
The corresponding parity-check matrix H, with A stacked on top of I_{3×3}, is

H =
1 1 1
1 0 1
1 1 0
0 1 1
1 0 0
0 1 0
0 0 1

The next exercise works with the (6, 3) binary code given by the standard generator matrix

G = [ I_{3×3} | A ] =
1 0 0 1 1 0
0 1 0 0 1 1
0 0 1 1 0 1
Here we are working with 3-bit binary words, so there are 8 possible words to send. To develop the code, we again treat each word as a message vector and form the rows of C by multiplying by G. The code words are then:
C =
0 0 0 0 0 0
1 0 0 1 1 0
0 1 0 0 1 1
0 0 1 1 0 1
1 1 0 1 0 1
1 0 1 0 1 1
0 1 1 1 1 0
1 1 1 0 0 0

where each row has the form mi G for one of the eight message words mi.
We decode the received words 001001, 011000, 000110, and 100001 by several methods. Via the nearest-neighbor method:
1. 001001 is decoded as 001101. Eliminating the trailing 3-bit redundancy, we finally obtain the word 001 (a single correction).
2. 011000 is decoded as 111000. Similarly eliminating the acquired redundancy (this comment will be omitted for future decodes), we obtain 111 (a single correction).
3. 000110 is decoded as 100110 → 100 (a single correction).
4. 100001 is not decoded, since it lies at distance 2 from more than one code word; at least two errors occurred.
To decode with the parity check method, we must first find the parity check matrix
H.
H =
1 1 0
0 1 1
1 0 1
1 0 0
0 1 0
0 0 1

(the matrix A stacked on top of I_{3×3}).
Decoding the same received words, now labeled di, with the parity-check method, we have
1. d1 = 001001, and d1 H = 100, so we conclude there is an error in the 4th component of d1. We decode as 001101 → 001.
2. d2 = 011000, and d2 H = 110, so we conclude there is an error in the first component of d2. We decode as 111000 → 111.
3. Similarly, d3 H = 110, so we decode as 100110 → 100.
4. Lastly, d4 H = 111, which is not a row of H, so at least two errors occurred and we do not decode.
For a coset decoding approach, we must define a standard array. To do this, we place the code words of C in the first row. For each subsequent row, choose a vector not already in the standard array and call it the coset leader; then generate the coset of this vector with C and list it as the next row. For reasons which will become obvious, we note the location of our received words in the array. This results in
S =
000000 100110 010011 001101 110101 101011 011110 111000
100000 000110 110011 101101 010101 001011 111110 011000
010000 110110 000011 011101 100101 111011 001110 101000
001000 101110 011011 000101 111101 100011 010110 110000
000100 100010 010111 001001 110001 101111 011010 111100
000010 100100 010001 001111 110111 101001 011100 111010
000001 100111 010010 001100 110100 101010 011111 111001
100001 000111 110010 101100 010100 001010 111111 011001
To decode, we locate each received word in the array and then decode the word at
the top of the column in which the word is located.
1. 001001 → 001101 → 001
2. 011000 → 111000 → 111
3. 000110 → 100110 → 100
4. 100001 → 000000 → 000
Lastly, to decode using the syndrome method, we calculate the syndrome of each coset leader by taking its product with the parity-check matrix H.
000000 H = 000
100000 H = 110
010000 H = 011
001000 H = 101
000100 H = 100
000010 H = 010
000001 H = 001
100001 H = 111
We then decode by taking the bit-wise difference between each received word and the coset leader whose syndrome matches it:
1. 001001 is decoded as 001001 − 000100 = 001101 → 001
2. 011000 is decoded as 011000 − 100000 = 111000 → 111
3. 000110 is decoded as 000110 − 100000 = 100110 → 100
4. 100001 is decoded as 100001 − 100001 = 000000 → 000
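As a sketch, the syndrome method translates directly into a short program; the one below uses the coset leaders from the standard array above and reproduces the four decodings (the function name is our own).

import numpy as np

H = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1],     # parity-check matrix of the (6, 3) code
              [1, 0, 0], [0, 1, 0], [0, 0, 1]])

leaders = ["000000", "100000", "010000", "001000",
           "000100", "000010", "000001", "100001"]  # coset leaders from the standard array

# Build the syndrome table: syndrome of each leader -> the leader itself.
table = {}
for leader in leaders:
    vec = np.array([int(c) for c in leader])
    table[tuple(np.mod(vec @ H, 2))] = vec

def syndrome_decode(word):
    # Subtract (XOR, in the binary case) the coset leader with the same syndrome,
    # then drop the trailing 3-bit redundancy.
    vec = np.array([int(c) for c in word])
    corrected = np.mod(vec - table[tuple(np.mod(vec @ H, 2))], 2)
    return "".join(map(str, corrected[:3]))

for w in ["001001", "011000", "000110", "100001"]:
    print(w, "->", syndrome_decode(w))    # 001, 111, 100, 000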