
CHAPTER 0

INTRODUCTION

0.1 INFORMATION THEORY

Nowadays, many people claim we live in the so-called information age.

Clearly, the rise of the internet (among other developments) has made information

available to people on an unprecedented scale and to an extent never seen before.

This can be, and has been, compared to the introduction of the printing

press in the Middle Ages. With its advent, the massive distribution of books

and ideas became possible, and the printing press has certainly played a

significant role since its invention. Even though it is still much too early to tell,

the rise of the internet seems to be of a similar scale: for the first time in history,

everyone is able to publish his or her own words.

In the early 1940s it was thought to be impossible to send information at a positive

rate with negligible probability of error. Shannon (1948) surprised the

communication theory community by proving that the probability of error could be

made nearly zero for all communication rates below channel capacity. The

capacity can be computed simply from the noise characteristics of the channel.

Shannon further argued that random processes such as music and speech have an

irreducible complexity below which the signal cannot be compressed. This he

named the entropy, in deference to the parallel use of this word in

thermodynamics, and argued that if the entropy of the source is less than the

capacity of the channel, asymptotically error-free communication can be achieved.



Information theory today represents the extreme points of the set of all possible

communication schemes, as shown in the fanciful Figure 0.1.2. The data

compression minimum, min I(X; X̂), lies at one extreme of the set of communication

ideas. All data compression schemes require description rates at least equal to this

minimum. At the other extreme is the data transmission maximum, max I(X; Y), known

as the channel capacity. Thus, all modulation schemes and data compression

schemes lie between these limits. Information theory also suggests means of

achieving these ultimate limits of communication. However, these theoretically

optimal communication schemes, beautiful as they are, may turn out to be

computationally impractical. It is only because of the computational feasibility of

simple modulation and demodulation schemes that we use them rather than the

random coding and nearest-neighbor decoding rule suggested by Shannon's proof

of the channel capacity theorem. Progress in integrated circuits and code design

has enabled us to reap some of the gains suggested by Shannon's theory.

Computational practicality was finally achieved by the advent of turbo codes. A

good example of an application of the ideas of information theory is the use of

error-correcting codes on compact discs and DVDs.

Recent work on the communication aspects of information theory has

concentrated on network information theory: the theory of the simultaneous

rates of communication from many senders to many receivers in the presence

of interference and noise. Some of the trade-offs of rates between senders and

receivers are unexpected, and all have a certain mathematical simplicity.

A unifying theory, however, remains to be found.

Figure 0.1.1: Relationship of information theory to other fields.

Figure 0.1.2: Information theory as the extreme points of communication theory, with the data compression limit min I(X; X̂) at one extreme and the data transmission limit max I(X; Y) at the other.



Kolmogorov (1956), Chaitin (1966) and Solomonoff (1964) put forth the idea

that the complexity of a string of data can be defined by the length of the

shortest binary computer program for computing the string. Thus, the

complexity is the minimal description length. This definition of complexity

turns out to be universal, that is, computer independent, and is of fundamental

importance. Thus, Kolmogorov complexity lays the foundation for the theory

of descriptive complexity. Gratifyingly, the Kolmogorov complexity K is

approximately equal to the Shannon entropy H if the sequence is drawn at

random from a distribution that has entropy H. So the tie-in between

information theory and Kolmogorov complexity is perfect. Indeed, we consider

Kolmogorov complexity to be more fundamental than Shannon entropy. It is

the ultimate data compression and leads to a logically consistent procedure for

inference.

There is a pleasing complementary relationship between algorithmic complexity

and computational complexity. One can think about computational complexity

(time complexity) and Kolmogorov complexity (program length or descriptive

complexity) as two axes corresponding to program running time and program

length. Kolmogorov complexity focuses on minimizing along the second axis,

and computational complexity focuses on minimizing along the first axis. Little

work has been done on the simultaneous minimization of the two.

Statistical mechanics is the birthplace of entropy and the second law of

thermodynamics, which states that the entropy of an isolated system never decreases. Among other things, the second

law allows one to dismiss any claims to perpetual motion machines.



The fundamental quantities of information theory (entropy, relative entropy,

and mutual information) are defined as functionals of probability distributions. In

turn, they characterize the behavior of long sequences of random variables and

allow us to estimate the probabilities of rare events (large deviation theory) and

to find the best error exponent in hypothesis tests.

As we build larger computers out of smaller components, we encounter both a

computation limit and a communication limit. Computation is communication

limited and communication is computation limited. These become intertwined,

and thus all of the developments in communication theory via information

theory should have a direct impact on the theory of computation.

Coding theory originated with the 1948 publication of the paper "A mathematical

theory of communication" by Claude Shannon. For the past half century, coding

theory has grown into a discipline intersecting mathematics and engineering

with applications to almost every area of communication such as satellite and

cellular telephone transmission, compact disc recording, and data storage.

Modern information and communication systems are based on the reliable and

efficient transmission of information. Channels encountered in practical

applications are usually disturbed regardless of whether they correspond to

information transmission over noisy and time-variant mobile radio channels or

to information transmission on optical discs that might be damaged by

scratches. Owing to these disturbances, appropriate channel coding schemes



have to be employed such that errors within the transmitted information can be

detected or even corrected. To this end, channel coding theory provides

suitable coding schemes for error detection and error correction. Besides good

code characteristics with respect to the number of errors that can be detected or

corrected, the complexity of the architectures used for implementing the

encoding and decoding algorithms is important for practical applications.

The subject of coding is the detection and correction of errors in digital information.

Such errors almost inevitably occur after the transmission, storage or processing of

information in digital (mainly binary) form, because of noise and interference in

communication channels, or imperfections in storage media, for example. Protecting

digital information with a suitable error-control code enables the efficient detection

and correction of any errors that may have occurred.

Error-control codes are now used in almost the entire range of information

communication, storage and processing systems. Rapid advances in electronic

and optical devices and systems have enabled the implementation of very

powerful codes with close to optimum error-control performance. In addition,

new types of code, and new decoding methods, have recently been developed

and are starting to be applied. However, error-control coding is complex, novel

and unfamiliar, not yet widely understood and appreciated.

In the 50 years since Shannon's seminal papers of 1948 and 1949, coding

theory has progressed rather fitfully through periods of euphoric highs with

discoveries of promising code classes, elegant decoding algorithms, and visions

of revolutionizing communications, and dismal lows when it was feared that

coding application would never move beyond hard decision Hamming and

Golay codes. It now appears safe to say that coding is maturing into an

important segment of communications systems engineering, one that will see

increasing applications of relatively standard techniques to reduce system cost

and complexity. Revolutions are not yet in sight.

Certain of the reasons for the increasing applications of coding are external: the

anticipated rapid growth in satellite communications; the revolution in digital

integrated circuits; the increasing emphasis on the reliable transmission of

digital data and of digitally coded analog signals; the availability of

inexpensive computers for system, algorithm, and hardware simulation; the

increasing digitalization of modems, switches, and interconnect facilities,

permitting ready interfacing and common maintainability; and the increasing

sophistication of the user community.

Also, there are many cases in which (a form of) confidentiality is required. An

obvious example would be sending sensitive information in some digital form.

But there are also other concerns: for instance, one could want to be sure of

whether a certain (electronic) letter actually comes from the mentioned author.

In daily life, the author can achieve this by writing his signature on the letter.

But how can one do that in an email? Another, more mundane example is

getting money from an ATM. Before handing you money, the bank wants to be

sure that there is enough money in your account. On the other hand, you would

like to be the only person able to withdraw money from your account. Again,

going to a bank teller and using handwritten signatures has been the solution

for centuries. But this is very difficult, if not impossible for an automated

machine, and so other, intrinsically digital methods must be adopted. The tools

of choice here, collectively called cryptography, were once used mainly to protect national

secrets. Both coding theory and cryptography have already proven to be

essential in our information age. While they may seem to achieve opposite

goals at first sight, they share much more than that. The aim of cryptography is

to provide secure transmission of messages, in the sense that two or more

persons can communicate in a way that guarantees that the desired subset of the

following four primitives is met:

(i) Confidentiality. This primitive is usually perceived to be the main focus

of cryptography, providing a way such that the information can only be

viewed by those people authorized to see it.

(ii) Data integrity. This service will provide a means to check if the

transmitted information was altered in any way, including but not limited

to things like insertion, deletion and substitution of messages.

(iii) Authentication. This service will establish some identity pertaining to the

message. Thus, this primitive can (among others) be used to guarantee the

identity of the sender, guarantee the identity of the receiver, or guarantee

the time the message was sent.

(iv) Non-repudiation. This serves to prevent someone from denying previous

commitments. It is needed in cases where disputes might have to be

resolved, for instance in E-commerce.

The aim of coding theory is to provide reliable transmission of messages, in the

sense that errors (up to a certain number) that occur during the transmission

can be corrected. However, this capability comes at a price,

in the form of redundancy in the transmitted data.

0.2 ENTROPY AND MUTUAL INFORMATION

The initial questions treated by information theory lay in the areas of data

compression and transmission. The answers are quantities such as entropy and

mutual information, which are functions of the probability distributions that

underlie the process of communication.

In the 1930s, Hartley introduced a logarithmic measure of information for

communication. His measure was essentially the logarithm of the alphabet size.

Shannon (1948) was the first to define entropy and mutual information.

The entropy H(X) of a discrete random variable X is defined by

H(X) = -\sum_{i} p(x_i) \ln p(x_i),   (0.2.1)

and that of a continuous random variable with probability density p(x) by

H(X) = -\int p(x) \ln p(x) \, dx.   (0.2.2)

Relative entropy was first defined by Kullback and Leibler (1951). It is known

under a variety of names, including the Kullback-Leibler distance, cross

entropy, information divergence, and information for discrimination.

The relative entropy or Kullback-Leibler distance between two probability

mass functions p(x) and q(x) is defined as

H(p \| q) = \sum_{x} p(x) \ln \frac{p(x)}{q(x)}.   (0.2.3)

Consider two random variables X and Y with a joint probability mass function

p(x, y) and marginal probability mass functions p(x) and p(y). The mutual

information I (X; Y) is the relative entropy between the joint distribution and

the product distribution p(x) p(y):

I(X; Y) = \sum_{x} \sum_{y} p(x, y) \ln \frac{p(x, y)}{p(x) p(y)}.   (0.2.4)

The relationship of mutual information and sufficiency is due to Kullback

(1959). Mutual information and entropy satisfy the following identities:

I(X; Y) = H(X) - H(X|Y)   (0.2.5)

I(X; Y) = H(Y) - H(Y|X)   (0.2.6)

I(X; Y) = H(X) + H(Y) - H(X, Y)   (0.2.7)

I(X; Y) = I(Y; X)   (0.2.8)

I(X; X) = H(X).   (0.2.9)
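These definitions are easy to evaluate numerically. The following Python sketch is illustrative only (it is not part of the original text, and the joint distribution used is hypothetical); it computes (0.2.1), (0.2.3), and (0.2.4) for a small discrete example and checks the identity (0.2.7).

# Minimal sketch: entropy, relative entropy, and mutual information
# for a discrete joint distribution, using natural logs as in (0.2.1)-(0.2.4).
import numpy as np

def entropy(p):
    """H(X) = -sum p(x) ln p(x); zero-probability terms contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def relative_entropy(p, q):
    """Kullback-Leibler distance H(p||q) = sum p(x) ln(p(x)/q(x))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

def mutual_information(p_xy):
    """I(X;Y) as the relative entropy between p(x,y) and p(x)p(y)."""
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    return relative_entropy(p_xy.ravel(), np.outer(p_x, p_y).ravel())

# Hypothetical joint distribution of two binary random variables.
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

I = mutual_information(p_xy)
# Check identity (0.2.7): I(X;Y) = H(X) + H(Y) - H(X,Y).
assert np.isclose(I, entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel()))
print("I(X;Y) =", I)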


0.3 INFORMATION MEASURE

It has been shown by Kannappan and Rathie (1973) that Kullback's (1959)

information divergence possesses properties that make it a useful measure of

the difference between two probability distributions. Such a measure of

difference is useful when one wishes to estimate a probability distribution.

In a common situation, one begins with a probability measure, Q, which may

be based on intuition or incomplete prior information, and one wishes to find a new

probability measure, P, which is compatible with the new information and which is

as close as possible to Q.

One attractive method to determine P is to minimize the information

divergence between P and Q subject to the constraints imposed by the new

information. This method becomes the maximum entropy method of Jaynes

(1957) when Q is a uniform probability distribution. Minimum information

divergence estimates of probability have been employed in such diverse fields as

statistical mechanics, decision theory, and pattern recognition.

Information divergence can be regarded as a measure of closeness of two

probability distributions P and Q. When the information divergence I(P : Q) is

small, the P probabilities of events are usually close to the Q probabilities of the same

events, while if I(P : Q) is large the P probabilities and the Q probabilities of the

same events are quite different. Thus I(P : Q) acquires a natural interpretation in

terms of probabilities of events, which complements the interpretation provided by


axiomatic characterizations. The interpretation of the result in terms of a measure

of closeness of two probability measures seems not to have been emphasized

previously, although it is implicit in Kullback's (1959) discussions of

discrimination between hypotheses and in Sakrison's (1968) treatment of empirical

distributions.

The concept of directed divergence between two discrete distributions, some of

its generalizations, and some measures involving more than two distributions

are considered below; different definitions and some of their properties are pointed out, and characterizations

are also given. These measures are widely used in information theory,

communication theory, forecasting, statistical inference, statistical mechanics,

and other disciplines.

Taneja and Tuteja (1984) defined the qualitative-quantitative measure of

relative information associated with the probability distribution P, the predicted

probability distribution Q, and the utility distribution U as the mean value of the useful

self-information gain:

I(P/Q ; U) = \sum_{i=1}^{n} u_i\, p_i \log \frac{p_i}{q_i}.   (0.3.1)

I(P/Q ; U) is also called the relative "useful" information measure.

Theil (1967) introduced an information measure (information improvement) for

three probability distributions:

I(P/Q : R) = \sum_{i=1}^{n} p_i \log \frac{q_i}{r_i}, \quad P, Q, R \in \Delta_n.   (0.3.2)
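As a small numerical illustration (not part of the original text; the three distributions below are hypothetical), the following Python sketch evaluates Theil's information improvement (0.3.2) for a realized distribution P, an original prediction R, and a revised prediction Q; roughly speaking, a positive value indicates that the revision moved the prediction closer to P.

# Minimal sketch of Theil's information improvement (0.3.2).
import numpy as np

def information_improvement(p, q, r, base=2.0):
    """I(P/Q:R) = sum_i p_i * log(q_i / r_i)."""
    p, q, r = (np.asarray(v, dtype=float) for v in (p, q, r))
    return np.sum(p * (np.log(q / r) / np.log(base)))

# Hypothetical example distributions on three outcomes.
P = [0.50, 0.30, 0.20]   # realized distribution
R = [0.40, 0.40, 0.20]   # original prediction
Q = [0.45, 0.35, 0.20]   # revised prediction
print("I(P/Q:R) =", information_improvement(P, Q, R))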


Rathie and Kannappan (1972) gave a generalization of Theil's information

improvement as

I_\beta(P/Q : R) = \frac{\sum_{i=1}^{n} p_i \left( \frac{q_i}{r_i} \right)^{\beta-1} - 1}{2^{\beta-1} - 1}, \quad \beta \neq 1; \; P, Q, R \in \Delta_n.   (0.3.3)

Aczel and Nath (1972) proposed the following two measures of generalized

directed divergence in information:

I_\alpha(P/Q : R) = \frac{1}{\alpha-1} \log_2 \sum_{i=1}^{n} p_i \left( \frac{q_i}{r_i} \right)^{\alpha-1}   (0.3.4)

and

I^{\alpha}(P/Q : R) = \frac{1}{\alpha-1} \left[ \sum_{i=1}^{n} p_i \left( \frac{q_i}{r_i} \right)^{\alpha-1} - 1 \right], \quad \alpha \neq 1; \; P, Q, R \in \Delta_n.   (0.3.5)

Vinocha and Goyal (1980, 1981, 1982) defined new generalizations of Theil's

information improvement for the additive and non-additive cases, respectively, as

I_A[P_k] = \sum_{k=1}^{m-2} \sum_{i=1}^{n} p_{i,k} \log_2 \frac{p_{i,k+1}}{p_{i,k+2}}   (0.3.6)

and

I_B[P_k] = I(P_1/P_r : P_m) - \sum_{k=2,\, k \neq r}^{m-1} \sum_{i=1}^{n} p_{i,1} \log_2 \frac{1}{p_{i,k}},   (0.3.7)


where P_k = (p_{1,k}, p_{2,k}, \ldots, p_{n,k}), k = 1, 2, \ldots, m, with p_{i,k} > 0 and \sum_{i=1}^{n} p_{i,k} = 1, are m finite discrete probability distributions.


The corresponding non-additive generalization of type β is

I_\beta[P_k] = \frac{\sum_{k=1}^{m-2} \sum_{i=1}^{n} p_{i,k} \left( \frac{p_{i,k+1}}{p_{i,k+2}} \right)^{\beta-1} - (m-2)}{2^{\beta-1} - 1}, \quad \beta \neq 1.   (0.3.8)

I_B[P_k] is called the net information of the m distribution spaces, since the first term on

the right-hand side of (0.3.7) is the information improvement of the three distribution

spaces 1, r, and m, while the second term is the inaccuracy of the remaining distributions.

Vinocha and Goyal (1984) also generalized the non-additive directed

divergence of type [α, β] as

I_{\alpha}^{\beta}[P_k] = \frac{\sum_{k=1}^{m-2} \sum_{i=1}^{n} p_{i,k}^{\alpha} \left( \frac{p_{i,k+1}}{p_{i,k+2}} \right)^{\beta-\alpha} - (m-2)}{2^{\beta-1} - 1}, \quad \alpha \neq 1, \; \beta \neq 1.   (0.3.9)

Vinocha and Faruq (2000) have generalized a new directed divergence of type β as

I^{\beta}[P_k] = \frac{\sum_{i=1}^{n} \sum_{k=2}^{m-1} p_{i,k}^{\beta}\, p_{i,k+1}^{1-\beta} - 1}{2^{\beta-1} - 1}, \quad \beta \neq 1.   (0.3.10)

Vinocha and Faruq (2000) have also defined a directed divergence of type α as

I_{\alpha}[P_k] = \frac{1}{\alpha-1} \log_2 \sum_{i=1}^{n} \sum_{k=2}^{m-1} p_{i,k}^{\alpha}\, p_{i,k+1}^{1-\alpha}, \quad \alpha \neq 1.   (0.3.11)

Further, Vinocha and Faruq (2000) have defined an amount of information as

I[P_k] = \log M_{\phi}[P_k],   (0.3.12)

where

M_{\phi}[P_k] = \phi^{-1}\!\left( \sum_{i=1}^{n} p_{i,1}\, \phi\!\left( \frac{p_{i,m-1}}{p_{i,m}} \right) \right).   (0.3.13)

Vinocha and Faruq (2000) also generalized the amount of information as

I_{\beta}[P_k] = \left[ N_{\phi}[P_k] \right]^{\beta},   (0.3.14)

where

N_{\phi}[P_k] = \phi^{-1}\!\left( \frac{\sum_{k=1}^{m-2} \sum_{i=1}^{n} p_{i,k}\, \phi\!\left( \frac{p_{i,k+1}}{p_{i,k+2}} \right) - (m-2)}{2^{\beta-1} - 1} \right).   (0.3.15)

Vinocha and Faruq (2000) have further generalized the Taneja and Tuteja (1984)

relative "useful" information measure to m distribution spaces, associated with

the probability distribution P_1, the revised probability distributions P_2, P_3, \ldots, P_m,

and the utility distribution U, as

I(P_1/P_2, \ldots, P_m ; U) = \sum_{i=1}^{n} u_i\, p_{i,1} \sum_{k=2}^{m} \log \frac{p_{i,k-1}}{p_{i,k}}.   (0.3.16)

0.4 CODING THEORY

The technology of communication and computing advanced at a breathtaking

pace in the 20th century, especially in the second half. A significant part of this

advance in communication began some 60 years ago when Shannon published

his seminal paper "A Mathematical Theory of Communication." In that paper

Shannon framed and posed a fundamental question: how can we efficiently and

reliably transmit information? Shannon also gave a basic answer: coding can


do it. Since that time the problem of finding practical coding schemes that

approach the fundamental limits established by Shannon has been at the heart

of information theory and communications. Recently, significant advances

have taken place that bring us close to answering this question. Perhaps, at least

in a practical sense, the question has been answered.

The advance came with a fundamental paradigm shift in the area of coding that

took place in the early 1990s. In modern coding theory, codes are viewed as

large complex systems described by random sparse graphical models, and

encoding as well as decoding are accomplished by efficient local algorithms.

The local interactions of the code bits are simple but the overall code is

nevertheless complex (and so sufficiently powerful to allow reliable

communication) because of the large number of interactions. The idea of

random codes is in the spirit of Shannon's original formulation.

These are exciting times for coding theorists and practitioners. Despite all the

progress made, many fundamental questions are still open. Sparse graphical

models and message-passing algorithms, to name just two of the notions that

are fundamental to our treatment, play an increasingly important role in many

other fields as well. This is not a coincidence. Many of the innovations were brought

into the field of coding by physicists or computer scientists. Conversely, the success

of modern coding has inspired work in several other fields.

Modern coding will not displace classical coding anytime soon. At any point in

time hundreds of millions of Reed-Solomon codes work hard to make your life

less error prone. This is unlikely to change substantially in the near future. But

modern coding offers an alternative way of solving the communications

problem. Most current wireless communications systems have already adopted

modern coding.

Technically, the aim of modern coding is focused on Shannon's classical problem:

we want to transmit a message across a noisy channel so that the receiver can

determine this message with high probability despite the imperfections of the

channel. We are interested in low-complexity schemes that introduce little delay

and allow reliable transmission close to the ultimate limit, the Shannon capacity.

0.4.1 Communications Problem

Consider the following communications scenario - the point-to-point

communications problem depicted in Figure 0.4.1.

Figure 0.4.1: Basic point-to-point communications problem (source → channel → sink).

A source transmits its information (speech, audio, data, etc.) via a noisy

channel (phone line, optical link, wireless, storage medium, etc.) to a sink. We

are interested in reliable transmission, i.e., we want to recreate the transmitted

information with as little distortion (number of wrong bits, mean squared error

distortion, etc.) as possible at the sink.

Shannon (1948) formalized the communications problem and showed that the

point-to-point problem can be decomposed into two separate problems as

shown in Figure (0.4.2). First, a source encoder transforms the source into a bit


stream. Ideally, the source encoder removes all redundancy from the source so

that the resulting bit stream has the smallest possible number of bits while still

representing the source with enough accuracy. The channel encoder then

processes the bit stream to add redundancy. This redundancy is carefully

chosen to combat the noise that is introduced by the channel.

Figure 0.4.2: Basic point-to-point communications problem in view of the source-channel separation theorem (source → source encoder → channel encoder → channel → channel decoder → source decoder → sink).

To be mathematically more precise: we model the output of the source as a

stochastic process. For example, we might represent text as the output of a

Markov chain, describing the local dependency structure of letter sequences. It

is the task of the source encoder to represent this output as efficiently as

possible (using as few bits as possible) given a desired distortion. The

distortion measure reflects the "cost" of deviating from the original source

output. If the source emits points in R^n, it might be natural to consider the

squared Euclidean distance, whereas if the source emits binary strings a more

natural measure might be to count the number of positions in which the source

output and the word that can be reconstructed from the encoded source differ.


Shannon's source coding theorem asserts that, for a given source and distortion

measure, there exists a minimum rate R = R(d) (bits per emitted source

symbol) which is necessary (and sufficient) to describe this source with

distortion not exceeding d. The plot of this rate R as a function of the distortion

d is usually called the rate-distortion curve. In the second stage an appropriate

amount of redundancy is added to these source bits to protect them against the

errors in the channel. This process is called channel coding. Richardson and

Urbanke (2008) model the channel as a probabilistic mapping, and one is

typically interested in the average performance, where the average is taken over

all channel realizations. Shannon's channel coding theorem asserts the

existence of a maximum rate (bits per channel use) at which information can be

transmitted reliably, i.e., with vanishing probability of error, over a given

channel. This maximum rate is called the capacity of the channel and is

denoted by C. At the receiver, the received bits are first decoded to determine

the transmitted information, and the decoded bits are then used to reconstruct

the source at the receiver. Shannon's source-channel separation theorem asserts

that the source can be reconstructed with a distortion of at most d at the

receiver if R(d) < C, i.e., if the rate required to represent the given source with

the allowed distortion is smaller than the capacity of the channel. Conversely,

no scheme can do better. One great benefit of the separation theorem is that a

communications link can be used for a large variety of sources: one good

channel coding solution can be used with any source. Virtually all systems in


use today are based on this principle. It is important though to be aware of the

limitations of the source-channel separation theorem. The optimality is only in

terms of the achievable distortion when large blocks of data are encoded together.

Joint schemes can be substantially better in terms of complexity or delay. Also, the

separation is no longer valid if one looks at multi-user scenarios.

We will not be concerned with the source coding problem or, equivalently, we

assume that the source coding problem has been solved. For us, the source emits a

sequence of independent identically distributed (iid) bits which are equally likely to

be zero or one. Under this assumption, we will see how to accomplish the channel

coding problem in an efficient manner for a variety of scenarios.

0.4.2 Coding: Trial and Error

How can we transmit information reliably over a noisy channel at a strictly

positive rate? At some level we have already given the answer: add redundancy

to the message that can be exploited to combat the distortion introduced by the

channel. We start with a special case in order to introduce the key concepts.

Binary Symmetric Channel

Consider the binary symmetric channel with cross-over probability ε depicted

in Figure (0.4.3). We denote it by BSC(ε). Both the input X_t and the output Y_t are

elements of {±1}. A transmitted bit is either received correctly or received

flipped, the latter occurring with probability ε, and different bits are flipped or

not flipped independently. We can assume that 0 ≤ ε ≤ 1/2 without loss of

generality.

Figure 0.4.3: The binary symmetric channel BSC(ε).

The BSC is the generic model of a binary-input memoryless channel in which

hard decisions are made at the front end of the receiver, i.e., where the received

value is quantized to two values.

First Trial: Suppose that the transmitted bits are independent and

that P{X_t = +1} = P{X_t = -1} = 1/2. We start by considering uncoded transmission

over the BSC(ε). Thus, we send the source bits across the channel as is, without

the insertion of redundant bits. At the receiver we estimate the transmitted bit X

based on the observation Y. The decision rule that minimizes the bit-error

probability, call it x̂^{MAP}(y), is to choose that element of {±1} which maximizes

p_{X|Y}(x | y) for the given y. Since the prior on X is uniform, an application of

Bayes's rule shows that this is equivalent to maximizing p_{Y|X}(y | x) for the given y.

Since ε < 1/2, we conclude that the optimal estimator is x̂^{MAP}(y) = y. The

probability that the estimate differs from the true value, i.e., P_b = P{x̂^{MAP}(Y) ≠ X},

is equal to ε. Since for every information bit we want to convey we send exactly

one bit over the channel, we say that this scheme has rate 1. We conclude that with

uncoded transmission we can achieve a (rate, P_b)-pair of (1, ε).

Second Trial: If the error probability ε is too high for our application, what

transmission strategy can we use to lower it? The simplest strategy is repetition

coding. Assume we repeat each bit k times. To keep things simple, assume that

k is odd. So if X, the bit to be transmitted, has value x, then the input to the

BSC(ε) is the k-tuple x_1 = x_2 = ... = x_k = x. Denote the k associated observations by

Y_1, Y_2, ..., Y_k. It is intuitive, and not hard to prove, that the estimator that

minimizes the bit-error probability is given by the majority rule

x̂^{MAP}(y_1, y_2, ..., y_k) = majority of {y_1, y_2, ..., y_k}.   (0.4.1)

Hence the probability of bit error is given by

P_b = P{x̂^{MAP}(Y) ≠ X} = P{at least ⌈k/2⌉ errors occur} = \sum_{i > k/2} \binom{k}{i} \epsilon^{i} (1 - \epsilon)^{k-i}.   (0.4.2)

Since for every information bit we want to convey we send k bits over the

channel, we say that such a scheme has rate 1/k. So with repetition codes we

can achieve the (rate, P_b)-pairs \left( \frac{1}{k}, \sum_{i > k/2} \binom{k}{i} \epsilon^{i} (1 - \epsilon)^{k-i} \right). For P_b to

approach zero we have to choose k larger and larger, and as a consequence the

rate approaches zero as well.
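The trade-off between rate and bit-error probability is easy to reproduce numerically. The following Python sketch is illustrative only (it is not from the original text; the channel parameter, repetition factor, and sample size are arbitrary choices): it simulates repetition coding over the BSC(ε) with majority decoding and compares the measured bit-error rate with formula (0.4.2).

# Monte Carlo sketch: repetition coding over BSC(eps) with majority decoding.
import numpy as np
from math import comb

rng = np.random.default_rng(0)
eps, k, n_bits = 0.1, 5, 200_000        # hypothetical parameters (k odd)

bits = rng.integers(0, 2, n_bits)                      # i.i.d. source bits
reps = np.repeat(bits[:, None], k, axis=1)             # send each bit k times
flips = rng.random((n_bits, k)) < eps                  # channel crossovers
received = reps ^ flips
decoded = (received.sum(axis=1) > k / 2).astype(int)   # majority rule

p_b_sim = np.mean(decoded != bits)
p_b_formula = sum(comb(k, i) * eps**i * (1 - eps)**(k - i)
                  for i in range(k // 2 + 1, k + 1))   # formula (0.4.2)
print(f"rate = 1/{k}, simulated P_b = {p_b_sim:.4f}, formula = {p_b_formula:.4f}")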

0.4.3 Codes and Ensembles

Information is inherently discrete. It is natural and convenient to use finite

fields to represent it. The most important instance for us is the binary field

GF(2), consisting of {0, 1} with mod-2 addition and mod-2 multiplication. In

words, if we use GF(2) then we represent information in terms of (sequences

of) bits, a natural representation and convenient for the purpose of processing.

If you are not familiar with finite fields, very little is lost if you replace any

mention of a generic finite field F with GF(2). We write |F| to indicate the

number of elements of the finite field F, e.g., |GF(2)| = 2. Why do we choose

finite fields? As we will see, by using algebraic operations in both the encoding

as well as the decoding we can significantly reduce the complexity.

The following definitions play an important role in coding theory, and it is

convenient to collect them here for reference.

Code: A code C of length n and cardinality M over a field F is a collection of

M elements from F^n, i.e.,

C(n, M) = {x^{[1]}, ..., x^{[M]}},  x^{[m]} ∈ F^n, 1 ≤ m ≤ M.   (0.4.3)

The elements of the code are called codewords. The parameter n is called the

block length.


Repetition Code: Let F = GF(2). The binary repetition code of length 3 is

defined as C(n = 3, M = 2) = {000, 111}.

In the preceding definition we have introduced binary codes, i.e., codes whose

components are elements of GF(2) = {0,1}. Sometimes it is more convenient to

think of the two field elements as {±1} instead (see, the definition of the BSC).

The standard mapping is 0 → +1 and 1 → -1. It is convenient to use both

notations, and we freely and frequently switch between them. With some abuse of notation, we

make no distinction between these two cases and talk about binary codes and

GF(2) even if the components take values in {±1}.

Rate: The rate of a code C(n, M) is r = (1/n) log_{|F|} M. It is measured in information

symbols per transmitted symbol. For example, for the repetition code with F = GF(2)

we have r(C(3, 2)) = (1/3) log_2 2 = 1/3. It takes three channel symbols to transmit one

information symbol.

Support Set: The support set of a codeword x ∈ C is the set of locations

i ∈ [n] = {1, 2, ..., n} such that x_i ≠ 0.

Minimal Codeword: Consider a binary code C, i.e., a code over GF(2). We

say that a codeword x ∈ C is minimal if its support set does not contain the

support set of any other (non-zero) codeword.

The Hamming distance introduced in the following definition and the derived

minimum distance of a code, are the central characters in all of classical


coding. This is probably one of the most distinguishing factors between

classical and modern coding.

Hamming Weight and Hamming Distance: Let u, v ∈ F^n. The Hamming weight

of a word u, which we denote by w(u), is equal to the number of non-zero symbols

in u, i.e., the cardinality of the support set. The Hamming distance of a pair (u, v),

which we denote by d(u, v), is the number of positions in which u differs from v.

We have d(u, v) = d(u - v, 0) = w(u - v). Further, d(u, v) = d(v, u) and

d(u, v) ≥ 0, with equality if and only if u = v. Also, d(u, v) satisfies the triangle

inequality

d(u, v) ≤ d(u, t) + d(t, v)   (0.4.4)

for any triple u, v, t ∈ F^n. In words, d(u, v) is a true distance in the

mathematical sense.

Minimum Distance of a Code: Let C be a code. Its minimum distance d(C) is

defined as

d(C) = min{d(u, v) : u, v ∈ C, u ≠ v}.   (0.4.5)
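As a small illustration of these definitions (not part of the original text; the helper names are arbitrary), the following Python sketch computes the rate and the minimum distance of the length-3 repetition code used in the examples above.

# Minimal sketch: Hamming weight, Hamming distance, rate, minimum distance.
from itertools import combinations
from math import log2

def hamming_weight(u):
    return sum(1 for s in u if s != 0)

def hamming_distance(u, v):
    return sum(1 for a, b in zip(u, v) if a != b)

def rate(code, field_size=2):
    n = len(code[0])
    return log2(len(code)) / (n * log2(field_size))   # (1/n) log_|F| M

def minimum_distance(code):
    return min(hamming_distance(u, v) for u, v in combinations(code, 2))

C = [(0, 0, 0), (1, 1, 1)]               # repetition code C(n = 3, M = 2)
print(rate(C), minimum_distance(C))      # prints 1/3 and 3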

0.5 IMPORTANT CODES

0.5.1 Linear Codes: A binary block code of length n with 2^k codewords is

called an (n, k) linear block code if and only if its 2^k codewords form a k-

dimensional subspace of the vector space V of all the n-tuples over GF(2).


Generator and Parity-Check Matrices

Since a binary (n, k) linear block code C is a k-dimensional subspace of the

vector space of all the n-tuples over GF(2), there exist k linearly independent

codewords, g_0, g_1, ..., g_{k-1}, such that every codeword v in C is a linear

combination of these k linearly independent codewords. These k linearly

independent codewords in C form a basis B_C of C. Using this basis, encoding

can be done as follows. Let u = (u_0, u_1, ..., u_{k-1}) be the message to be encoded.

The codeword v = (v_0, v_1, ..., v_{n-1}) for this message is given by the following linear

combination of g_0, g_1, ..., g_{k-1}, with the k message bits of u as the coefficients:

v = u_0 g_0 + u_1 g_1 + ... + u_{k-1} g_{k-1}.   (0.5.1)

We may arrange the k linearly independent codewords g_0, g_1, ..., g_{k-1} of C as the

rows of a k x n matrix over GF(2) as follows:

G = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{k-1} \end{bmatrix}
  = \begin{bmatrix} g_{0,0} & g_{0,1} & \cdots & g_{0,n-1} \\ g_{1,0} & g_{1,1} & \cdots & g_{1,n-1} \\ \vdots & \vdots & & \vdots \\ g_{k-1,0} & g_{k-1,1} & \cdots & g_{k-1,n-1} \end{bmatrix}.   (0.5.2)

Then the codeword v = (v_0, v_1, ..., v_{n-1}) for the message u = (u_0, u_1, ..., u_{k-1}) given by

(0.5.1) can be expressed as the matrix product of u and G as follows:

v = u · G.   (0.5.3)

Therefore, the codeword v for a message u is simply a linear combination of the

rows of matrix G with the information bits in the message u as the coefficients.

G is called a generator matrix of the (n, k) linear block code C. Since C is

spanned by the rows of G, it is called the row space of G. In general, an (n, k)


linear block code has more than one basis. Consequently, a generator matrix of

a given (n, k) linear block code is not unique. Any choice of a basis of C gives

a generator matrix of C. Obviously, the rank of a generator matrix of a linear

block code C is equal to the dimension of C.

Since a binary (n, k) linear block code C is a k-dimensional subspace of the vector

space V of all the n-tuples over GF(2), its null (or dual) space, denoted C_d, is an

(n - k)-dimensional subspace of V given by the following set of n-tuples in V:

C_d = {w ∈ V : ⟨w, v⟩ = 0 for all v ∈ C},   (0.5.4)

where ⟨w, v⟩ denotes the inner product of w and v. C_d may be regarded as a

binary (n, n - k) linear block code and is called the dual code of C. Let B_d be a

basis of C_d. Then B_d consists of n - k linearly independent codewords in C_d. Let

h_0, h_1, ..., h_{n-k-1} be the n - k linearly independent codewords in B_d. Then every

codeword in C_d is a linear combination of these n - k linearly independent

codewords in B_d. Form the following (n - k) x n matrix over GF(2):

H = \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_{n-k-1} \end{bmatrix}
  = \begin{bmatrix} h_{0,0} & h_{0,1} & \cdots & h_{0,n-1} \\ h_{1,0} & h_{1,1} & \cdots & h_{1,n-1} \\ \vdots & \vdots & & \vdots \\ h_{n-k-1,0} & h_{n-k-1,1} & \cdots & h_{n-k-1,n-1} \end{bmatrix}.   (0.5.5)

Then H is a generator matrix of the dual code C_d of the binary (n, k) linear

block code C. It follows from (0.5.2), (0.5.4), and (0.5.5) that G H^T = O, where

O is a k x (n - k) zero matrix. Furthermore, C is also uniquely specified by the


H matrix as follows: a binary n-tuple v ∈ V is a codeword in C if and only if

v H^T = 0 (the all-zero (n - k)-tuple), i.e.,

C = {v ∈ V : v H^T = 0}.   (0.5.6)

H is called a parity-check matrix of C and C is said to be the null space of H.

Therefore, a linear block code is uniquely specified by two matrices, a

generator matrix and a parity-check matrix. In general, encoding of a linear

block code is based on a generator matrix of the code using (0.5.3) and

decoding is based on a parity-check matrix of the code. Many classes of well-

known linear block codes are constructed in terms of their parity-check

matrices. A parity-check matrix H of a linear block code is said to be a full-

rank matrix if its rank is equal to the number of rows of H. However, in many

cases, a parity-check matrix of an (n, k) linear block code is not given as a full-

rank matrix, i.e., the number of its rows is greater than its row rank, n - k. In

this case, some rows of the given parity-check matrix H are linear

combinations of a set of n - k linearly independent rows. These extra rows are

called redundant rows.
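The relations v = u·G and v H^T = 0 are easy to check numerically. The following Python sketch is illustrative only (the small (5, 2) code used here is a made-up toy example, not one discussed in the text): it builds a systematic generator matrix G = [I_k | P] together with the corresponding parity-check matrix H = [P^T | I_{n-k}] and verifies that every codeword has zero syndrome.

# Minimal sketch: encoding and parity checking of a toy (5, 2) linear block code.
import numpy as np
from itertools import product

P = np.array([[1, 1, 0],
              [0, 1, 1]])
G = np.hstack([np.eye(2, dtype=int), P])        # generator matrix (k = 2, n = 5)
H = np.hstack([P.T, np.eye(3, dtype=int)])      # parity-check matrix

assert not (G @ H.T % 2).any()                  # G H^T = O over GF(2)

codewords = [tuple(np.array(u) @ G % 2) for u in product([0, 1], repeat=2)]
print(codewords)                                # the 2^k = 4 codewords of C
for v in codewords:
    assert not (np.array(v) @ H.T % 2).any()    # every codeword has zero syndrome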

0.5.2 Hamming Codes

A widely used class of linear block codes is the Hamming code family

[Hamming (1950)]. For any positive integer m > 3, there exists a Hamming

code with the following characteristics:

Length: n = 2^m - 1

Number of message bits: k = 2^m - m - 1

Number of parity-check bits: n - k = m

Error-correction capability: t = 1 (d_min = 3)

The parity-check matrix H of these codes is formed of the non-zero columns of

m bits, and can be put in systematic form:

H = [I_m | Q],   (0.5.7)

where the identity submatrix I_m is a square matrix of size m x m and the

submatrix Q consists of the 2^m - m - 1 columns formed with vectors of weight 2

or more.

For the simplest case, for which m = 3,

n = 2^3 - 1 = 7,  k = 2^3 - 3 - 1 = 4,  n - k = m = 3,

H = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{bmatrix},

which is the linear block code C(7, 4) that has been analyzed previously. The

generator matrix can be constructed using the following expression for linear

block codes of systematic form:

G = [Q^T | I_k].   (0.5.8)

In the parity check matrix H, the sum of three columns can result in the all-zero

vector, and it is not possible for the sum of two columns to give the same


result, and so the minimum distance of the code is d_min = 3. This means that

these codes can be used for correcting any error pattern of one error, or for detecting any error

pattern of up to two errors. In this case there are 2^m - 1 correctable single-error

patterns (together with the all-zero pattern), and on the other hand there exist 2^m cosets, so that the number of

possible correctable error patterns is the same as the number of different cosets

(syndrome vectors). Codes with this characteristic are called perfect codes.
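Single-error correction with this code can be sketched as follows in Python (an illustration, not part of the original text): the syndrome of a received word equals the column of H at the position of the error, which identifies and corrects a single flipped bit.

# Minimal sketch: syndrome decoding of a single error with the (7, 4) Hamming code.
import numpy as np

H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1]])
G = np.hstack([H[:, 3:].T, np.eye(4, dtype=int)])   # systematic G with G H^T = O

u = np.array([1, 0, 1, 1])        # message
v = u @ G % 2                     # transmitted codeword
r = v.copy(); r[5] ^= 1           # channel flips bit 5

s = r @ H.T % 2                   # syndrome
# The syndrome matches the column of H at the error position.
err = next(j for j in range(7) if np.array_equal(H[:, j], s))
r[err] ^= 1                       # correct it
assert np.array_equal(r, v)
print("corrected error at position", err)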

0.5.3 Cyclic Codes

Let v = (v_0, v_1, ..., v_{n-1}) be an n-tuple over GF(2). If we shift every component of

v cyclically one place to the right, we obtain the following n-tuple:

v^{(1)} = (v_{n-1}, v_0, v_1, ..., v_{n-2}),   (0.5.9)

which is called the right cyclic-shift (or simply cyclic-shift) of v.

An (n, k) linear block code C is said to be cyclic if the cyclic-shift of each

codeword in C is also a codeword in C.

Cyclic codes form a very special type of linear block codes. They have encoding

advantage over many other types of linear block codes. Encoding of this type of

code can be implemented with simple shift-registers with feedback connections.

Many classes of cyclic codes with large minimum distances have been constructed,

and, furthermore, efficient algebraic hard-decision decoding algorithms for some of

these classes of cyclic codes have been developed. To analyze the structural

properties of a cyclic code, a codeword v = (v_0, v_1, ..., v_{n-1}) is represented by a


polynomial over GF(2) of degree n -1 or less with the components of v as

coefficients as follows:

v(X) = v_0 + v_1 X + ... + v_{n-1} X^{n-1}.   (0.5.10)

This polynomial is called a code polynomial. In polynomial form, an (n, k)

cyclic code C consists of 2^k code polynomials. The code polynomial

corresponding to the all-zero codeword is the zero polynomial. All the other

2^k - 1 code polynomials corresponding to the 2^k - 1 nonzero codewords in C

are nonzero polynomials.

Some important structural properties of cyclic codes are presented in the

following without proofs. Berlekamp (1984), Blahut (2003), Blake and Mullin

(1975), Clark and Cain (1981), Lin and Costello (2004), MacWilliams and

Sloane (1977), and Peterson and Weldon (1972) contain good and extensive

coverage of the structure and construction of cyclic codes.

In an (n, k) cyclic code C, every nonzero code polynomial has degree at least n

- k but not greater than n - 1. There exists one and only one code polynomial

g(X) of degree n - k of the following form:

g(X) = 1 + g_1 X + g_2 X^2 + ... + g_{n-k-1} X^{n-k-1} + X^{n-k}.   (0.5.11)

Therefore, g(X) is a nonzero code polynomial of minimum degree and is unique.

Every code polynomial v(X) in C is divisible by g(X), i.e., is a multiple of

g(X). Moreover, every polynomial over GF(2) of degree n - 1 or less that is


divisible by g(X) is a code polynomial in C. Therefore, an (n, k) cyclic code C

is completely specified by the unique polynomial g(X) of degree n - k given by

(0.5.11). This unique nonzero code polynomial g(X) of minimum degree in C

is called the generator polynomial of the (n, k) cyclic code C. The degree of

g(X) is simply the number of parity-check bits of the code. Since each code

polynomial v(X) in C is a multiple of g(X) (including the zero code

polynomial), it can be expressed as the following product:

v(X) = m(X) g(X) (0.5.12)

where m(X) = m_0 + m_1 X + ... + m_{k-1} X^{k-1} is a polynomial over GF(2) of degree (k

- 1) or less. If m = (m_0, m_1, ..., m_{k-1}) is the message to be encoded, then m(X)

is the corresponding code polynomial of the message polynomial m(X). With

this encoding, the corresponding k x n generator matrix of the (n, k) cyclic

code C is given as follows:

G = \begin{bmatrix}
1 & g_1 & g_2 & \cdots & g_{n-k-1} & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & g_1 & \cdots & g_{n-k-2} & g_{n-k-1} & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & & & & & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & g_1 & g_2 & g_3 & \cdots & g_{n-k-1} & 1
\end{bmatrix}.   (0.5.13)

Note that G is simply obtained by using the n-tuple representation of the

generator polynomial g(X) as the first row and its k - 1 right cyclic-shifts as

the other k - 1 rows. G is not in systematic form but can be put into systematic

form by elementary row operations without column permutations.
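Non-systematic encoding via (0.5.12) amounts to a polynomial multiplication over GF(2). The following Python sketch is illustrative only; the generator polynomial g(X) = 1 + X + X^3 of a (7, 4) cyclic code is an assumed example and is not taken from the text above.

# Minimal sketch: cyclic-code encoding v(X) = m(X) g(X) over GF(2).
def poly_mul_gf2(a, b):
    """Multiply two GF(2) polynomials given as coefficient lists (low degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

g = [1, 1, 0, 1]          # g(X) = 1 + X + X^3 (degree n - k = 3), assumed example
m = [1, 0, 1, 1]          # message polynomial m(X) = 1 + X^2 + X^3
v = poly_mul_gf2(m, g)    # code polynomial v(X) = m(X) g(X), degree at most 6
print(v)                  # coefficients (v_0, ..., v_6) of the length-7 codeword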


0.5.4 Reed-Muller codes

The binary Reed-Muller codes were first constructed and explored by Muller

(1954), and a majority-logic decoding algorithm for them was described by

Reed (1954). Although their minimum distance is relatively small, they are of

practical importance because of the ease with which they can be implemented

and decoded. They are of mathematical interest because of their connection

with finite affine and projective geometries [Assmus and Key (1998)]. These

codes can be defined in several different ways. Here we choose a recursive

definition based on the (u | u + v) construction.

Let m be a positive integer and r a nonnegative integer with r ≤ m. The binary codes

we construct will have length 2^m. For each length there will be m + 1 linear

codes, denoted R(r, m) and called the r-th order Reed-Muller, or RM, code of

length 2^m. The codes R(0, m) and R(m, m) are trivial codes: the 0th order RM

code R(0, m) is the binary repetition code of length 2^m with basis {1}, and the

m-th order RM code R(m, m) is the entire space GF(2)^{2^m}. For 1 ≤ r < m, define

R(r, m) = {(u, u + v) : u ∈ R(r, m - 1), v ∈ R(r - 1, m - 1)}.   (0.5.14)

Let G(0, m) = [1 1 · · · 1] and G(m, m) = I_{2^m}. From the above description, these

are generator matrices for R(0, m) and R(m, m), respectively. For 1 ≤ r < m, a

generator matrix G(r, m) for R(r, m) is

G(r, m) = \begin{bmatrix} G(r, m-1) & G(r, m-1) \\ 0 & G(r-1, m-1) \end{bmatrix}.   (0.5.15)


The generator matrices for R(r, m) with 1 ≤ r < m ≤ 3 are constructed as follows:

G(1, 2) = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix},
G(1, 3) = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix},

and

G(2, 3) = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}.

From these matrices, notice that R(1, 2) and R(2, 3) are both the set of all even-

weight vectors in GF(2)^4 and GF(2)^8, respectively. Notice also that R(1, 3) is

an [8, 4, 4] self-dual code.

The dimension, minimum weight, and duals of the binary Reed-Muller codes

can be computed directly from their definitions.

Properties: Let r be an integer with 0 ≤ r ≤ m. Then the following hold:

(i) R(i, m) ⊆ R(j, m), if 0 ≤ i ≤ j ≤ m.

(ii) The dimension of R(r, m) equals \binom{m}{0} + \binom{m}{1} + \cdots + \binom{m}{r}.

(iii) The minimum weight of R(r, m) equals 2^{m-r}.

(iv) R(m, m)^⊥ = {0}, and if 0 ≤ r < m, then R(r, m)^⊥ = R(m - r - 1, m).
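The recursion (0.5.15) and properties (ii) and (iii) can be checked directly for small parameters. The following Python sketch is illustrative (not part of the original text); it builds G(r, m) recursively and verifies the dimension and minimum weight for R(1, 3).

# Minimal sketch: recursive Reed-Muller generator matrices via (0.5.15).
import numpy as np
from math import comb
from itertools import product

def rm_generator(r, m):
    if r == 0:
        return np.ones((1, 2**m), dtype=int)
    if r == m:
        return np.eye(2**m, dtype=int)
    top = rm_generator(r, m - 1)
    bottom = rm_generator(r - 1, m - 1)
    return np.vstack([np.hstack([top, top]),
                      np.hstack([np.zeros_like(bottom), bottom])])

r, m = 1, 3
G = rm_generator(r, m)
assert G.shape[0] == sum(comb(m, i) for i in range(r + 1))   # dimension, property (ii)

# minimum weight over all nonzero codewords (small enough to enumerate here)
weights = [int((np.array(u) @ G % 2).sum())
           for u in product([0, 1], repeat=G.shape[0]) if any(u)]
assert min(weights) == 2**(m - r)                            # property (iii)
print(G)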


0.6 GOLAY CODES

0.6.1 The Binary Golay Code: The binary form of the Golay code is one of the

most important types of linear binary block codes. It is of particular significance

since it is one of only a few examples of a nontrivial perfect code. A t-error-

correcting code can correct a maximum of t errors. A perfect t-error-correcting code

has the property that every word lies within a distance of t of exactly one codeword.

Equivalently, the code has d_min = 2t + 1 and covering radius t, where the

covering radius r is the smallest number such that every word lies within a distance

of r of a codeword.

If there is an (n, k) code with an alphabet of q elements and d_min = 2t + 1, then

q^{k} \sum_{i=0}^{t} \binom{n}{i} (q-1)^{i} \leq q^{n}.   (0.6.1)

The inequality in (0.6.1) is known as the Hamming bound. Clearly, a code is perfect

precisely when it attains equality in the Hamming bound. Two Golay codes do

attain equality, making them perfect codes: the [23, 12] binary code

with d_min = 7, and the (11, 6) ternary code with d_min = 5. Both codes have the

largest minimum distance for any known code with the same values of n and k.

Golay was in search of a perfect code when he noticed that

\binom{23}{0} + \binom{23}{1} + \binom{23}{2} + \binom{23}{3} = 2^{11} = 2^{23-12},   (0.6.2)

which indicated the possible existence of a [23, 12] perfect binary code that could

correct up to three errors. Golay (1949) discovered such a perfect code, and it is


the only one known capable of correcting any combination of three or fewer

random errors in a block of 23 elements. This [23, 12] Golay code can be

constructed as a cyclic code. The only knowledge we shall assume in advance is

the factorization of x^{23} - 1 over GF(2). There is a clever method of finding the

factors of x^n - 1 over GF(q) [MacWilliams and Sloane (1977)]. Let g_1(x), g_2(x)

be the factors of x^{23} - 1 over GF(2); then

x^{23} - 1 = (x - 1)(x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1)(x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1)

           = (x - 1) g_1(x) g_2(x).

Both the polynomials g_1(x) and g_2(x) generate the [23, 12, 7] Golay code. Let

C_1 be the cyclic code generated by g_1(x) and C_2 be the cyclic code generated

by g_2(x). It can easily be observed that the polynomials g_1(x) and g_2(x)

are reciprocals of each other, and so C_2 is equivalent to C_1.

There are several different ways to decode the [23, 12] binary Golay code that

maximize its error-correcting capability at t = 3. Two of the best methods are

refined error-trapping schemes: the Kasami decoder and the systematic

search decoder. Both are explained by Lin and Costello (1983). There are also

other systems of decoding, but they are not as good, because some of the error-

correcting capability of the code is lost in carrying out the decoding.


0.6.2 The Extended Binary Golay Code

Let C be any [n, k] code whose minimum distance is odd. We can obtain a

new (n + 1, k) code C' with the new minimum distance d'_min = d_min + 1 by adding

a 0 at the end of each codeword of even weight and a 1 at the end of each

codeword of odd weight. This process is called adding an overall parity check, or

extension of the code. The [23, 12] Golay code can be extended by adding an

overall parity check to each codeword to form the [24, 12] extended Golay

code C_24. This code can be generated by the 12 by 24 matrix G = [I_12 | A], where

I_12 is the 12 by 12 identity matrix and A is the following matrix:

A = \begin{bmatrix}
0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}.   (0.6.3)

In addition, the 12 by 24 matrix G' = [A | I_12] is also a generator for the code.

This [24, 12] extended Golay code C_24 has minimum distance d_min = 8 and has

a code rate of exactly R = 1/2. Following are the properties of C_24:

Property 1: The extended binary Golay code C24 is a doubly-even code, i.e.

the Hamming weights of all codewords of C24 are divisible by four.


Property 2: C_24 is invariant under a permutation of coordinates that

interchanges the two halves of each codeword.

Property 3: Unlike the [23,12] code, the [24, 12] extended Golay code is not

perfect, only quasi-perfect, because all spheres of radius t are disjoint, but

every vector is at most a distance of t + 1 from some code vector. A quasi-

perfect code is defined to be a code which, for some t, has most coset leaders of

weight t or less, a few of weight t + 1, and none of weight greater than t + 1.

Property 4: There are 2^12, or 4096, possible codewords in the extended

Golay code and like the non-extended [23,12] code, it can be used to correct at

most three errors.

Property 5: C_24 has no codewords of weight 4.

The possible weights of codewords in the [24, 12] extended Golay code C_24 are 0,

4, 8, 12, 16, 20, 24. If u ∈ C_24 has weight 20, then u + 1 has weight 4, since 1 ∈ C_24 and

C_24 is a linear code. But it can be proved that C_24 has neither codewords of

weight 4 nor of weight 20 [MacWilliams and Sloane (1977)]. Thus the weights

occurring in C_24 are 0, 8, 12, 16, 24.

Property 6: Weight distribution of C_24.

Let A_i be the number of words of weight i. Then A_0 = A_24 = 1 and A_8 = A_16. To

each left side L of a codeword in C_24 there are two possible right sides, R

and R̄ (the complement of R). If wt(L) = 0, then wt(R) ≠ 4 and wt(R) ≠ 8 (or else wt(R̄) = 4), which is

not possible, so wt(R) = 0 or 12. If wt(L) = 2, then wt(R) = 6 by a similar


argument. Proceeding in this way, the possibilities for the numbers of codewords of

different weights of C_24 can be tabulated, with columns listing the number of left halves, wt(L), wt(R), wt(R̄), and the total weight, for total weights 0, 8, 12, 16, and 24.

Since C_24 is invariant under a permutation of coordinates that interchanges the

two halves of each codeword, the numbers of codewords of each type can be counted, giving

A_8 = 759,   (0.6.4)

and, since A_0 + A_8 + A_12 + A_16 + A_24 = 2^{12}, it follows that A_12 = 2576. Hence the weight distribution of the extended binary Golay code C_24 is

i:   0    8     12    16    24
A_i: 1   759   2576   759    1   (0.6.5)

Any binary vector of weight 5 and length 24 is covered by exactly one codeword of

C_24 of weight 8, since if a vector of weight 5 were covered by two codewords

u, v of weight 8, then dist(u, v) ≤ 6 < 8, a contradiction. Each codeword of

weight 8 covers \binom{8}{5} vectors of weight 5, which are all distinct,

and 759 \binom{8}{5} = \binom{24}{5}. Thus the codewords of weight 8 in the extended binary

Golay code C_24 form a Steiner system S(5, 8, 24). The codewords of weight 8

and 12 in C_24 are called octads and dodecads, respectively.

Property 7: C_24 is unique in terms of its parameters.

Witt (1938) proved an important theorem which states that the Steiner system

S(5, 8, 24) is unique. Other proofs of the theorem are given by Curtis (1976) and

Jonsson (1972). Since there is a generator matrix for C_24 all of whose rows have

weight 8, it follows that the octads of S(5, 8, 24) generate C_24. This result is

used by Delsarte and Goethals (1975) and Pless (1968) to prove an important property

of the extended Golay code, namely that C_24 is unique in terms of its parameters.

Since the [23, 12, 7] perfect code C_23 may be obtained by deleting any

coordinate of C_24, it is also unique in terms of its parameters.

Property 8: The Golay code C_24 is a self-dual code, i.e., C_24 = C_24^⊥.

Property 9: The Golay code C_24 is an extremal code.

It is well known that the length of any binary doubly-even self-dual code is divisible

by eight. Mallows and Sloane (1973) proved the following celebrated bound on the

minimum Hamming weight of a binary [n, n/2, d] doubly-even self-dual code C:

d \leq 4 \left\lfloor \frac{n}{24} \right\rfloor + 4.   (0.6.6)

If C meets this bound, that is, d = 4 \lfloor n/24 \rfloor + 4, then C is called an extremal code.

Property 10: The block intersection numbers λ_{ij} for the octads and the dodecads of C_24 are shown below:

0   759
1   506  253
2   330  176   77
3   210  120   56   21
4   130   80   40   16    5
5    78   52   28   12    4    1
6    46   32   20    8    4    0    1
7    30   16   16    4    4    0    0    1
8    30    0   16    0    4    0    0    0    1

Figure 0.6.1: Block intersection numbers λ_{ij} for the octads in the extended Golay code.

0   2576
1   1288  1288
2    616   672   616
3    280   336   336   280
4    120   160   176   160   120
5     48    72    88    88    72    48
6     16    32    40    48    40    32    16
7      0    16    16    24    24    16    16     0
8      0     0    16     0    24     0    16     0     0

Figure 0.6.2: Generalized block intersection numbers λ_{ij} for the dodecads in the extended Golay code.


Property 11: Automorphism group of the extended binary Golay code.

Conway (1971) showed that the Mathieu group M_24 preserves the extended binary

Golay code C_24, i.e., M_24 is the full automorphism group of C_24. The Mathieu

group M_24 is a five-fold transitive group of order 24 · 23 · 22 · 21 · 20 · 48 = 244,823,040.
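The stated weight distribution and self-duality can be verified by brute force from the generator matrix G = [I_12 | A]. The following Python sketch is illustrative only (it is not part of the original text); it rebuilds A from the bordered-circulant structure visible in (0.6.3), enumerates all 2^12 codewords, and tabulates their weights, which should reproduce (0.6.5).

# Minimal sketch: weight distribution of the extended binary Golay code C_24.
import numpy as np
from itertools import product
from collections import Counter

row = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]            # first circulant row of A
A = np.zeros((12, 12), dtype=int)
A[0, 1:] = 1                                       # top row 0 1 1 ... 1
A[1:, 0] = 1                                       # left border of ones
for i in range(11):
    A[i + 1, 1:] = np.roll(row, -i)                # successive left cyclic shifts

G = np.hstack([np.eye(12, dtype=int), A])
assert not (G @ G.T % 2).any()                     # C_24 is self-dual (Property 8)

dist = Counter(int((np.array(u) @ G % 2).sum())
               for u in product([0, 1], repeat=12))
print(dict(sorted(dist.items())))                  # expect {0:1, 8:759, 12:2576, 16:759, 24:1}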

0.6.3 Ternary Golay Codes:

In addition to the binary Golay codes discussed previously, there are also [11,

6, 5] and [12, 6, 6] ternary Golay codes, denoted by C_11 and C_12. The ternary

[11, 6, 5] Golay code is the only known perfect nonbinary code. Note that a

Hamming sphere of radius 2 over GF(3) contains 243 vectors, because

1 + 2\binom{11}{1} + 4\binom{11}{2} = 243.

Since 243 is equal to 3^5, there may be a perfect packing with 3^6 spheres (codewords)

of radius t = 2, which was discovered by Golay. The [11, 6] Golay code

over the Galois field with three elements, GF(3), has minimum distance 5

and can correct up to two errors. As stated previously, like the [23, 12, 7]

binary Golay code, the [11, 6, 5] ternary Golay code has the largest minimum

distance d_min of any known code with the same values of n and k.

Like the [23, 12, 7] binary Golay code, the [11, 6, 5] ternary Golay code C_11 may

be constructed as a cyclic code. The factorization of x^{11} - 1 over GF(3) is

x^{11} - 1 = (x - 1)(x^5 + x^4 - x^3 + x^2 - 1)(x^5 - x^3 + x^2 - x - 1)

           = (x - 1) g_1(x) g_2(x).


Note that g_1(x) = -x^5 g_2(x^{-1}), and so ⟨g_1(x)⟩ and ⟨g_2(x)⟩ are equivalent [11, 6]

codes with minimum distance 5.

Also, like the [23, 12, 7] binary code, the [11, 6, 5] ternary code can be

extended by adding an overall parity check to form the extended [12, 6, 6]

ternary Golay code C_12. The generator matrix of C_12 can be given as

G = [I_6 | A] = \left[ I_6 \;\middle|\; \begin{matrix} 0 & 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 2 & 2 & 1 \\ 1 & 1 & 0 & 1 & 2 & 2 \\ 1 & 2 & 1 & 0 & 1 & 2 \\ 1 & 2 & 2 & 1 & 0 & 1 \\ 1 & 1 & 2 & 2 & 1 & 0 \end{matrix} \right].   (0.6.7)

Properties of ternary Golay codes:

The [11, 6, 5] ternary code C_11 is a perfect code, as it attains the sphere-packing

bound, i.e., 3^6 \left[ 1 + 2\binom{11}{1} + 2^2\binom{11}{2} \right] = 3^{11}. The supports of the codewords of

weight 5 of C_11 form the blocks of the Steiner system S(4, 5, 11). The [12, 6, 6]

code C_12 is self-dual, so all of its weights are divisible by 3. The supports of the

codewords of weight 6 form the 132 blocks of the Steiner system S(5, 6, 12).

Delsarte and Goethals (1975) and Pless (1968) proved that C_11 and the Steiner system

S(4, 5, 11), as well as C_12 and S(5, 6, 12), are unique. The automorphism group of C_12 is

isomorphic to the Mathieu group M_12, which is a five-fold transitive group of

order 12 · 11 · 10 · 9 · 8 = 95040.


0.7 A LATTICE PRIMER

0.7.1 Definitions

A real lattice Λ is simply a discrete set of vectors (points, N-tuples) in real

Euclidean N-space R^N that forms a group under ordinary vector addition, i.e.,

the sum or difference of any two vectors in Λ is in Λ. Thus Λ necessarily

includes the all-zero N-tuple 0, and if λ is in Λ, then so is its additive

inverse -λ. The vectors in a lattice may possibly span fewer than N

dimensions; however, this will not be the case for any lattice considered here,

so there will be no confusion if we call a lattice of real N-tuples an N-

dimensional real lattice.

As an example, the set Z of all integers is essentially the only one-dimensional

real lattice, up to scaling, and the prototype of all lattices. The set Z^N of all

integer N-tuples is an N-dimensional real lattice for any N.

Lattices have only two principal structural characteristics. Algebraically, a

lattice is a group; this property leads to the study of subgroups (sublattices) and

partitions (coset decompositions) induced by such subgroups. Geometrically, a

lattice is endowed with the properties of the space in which it is embedded,

such as the Euclidean distance metric and the notion of volume in R^N. The

following two sections are concerned with these two aspects of lattice structure.

Lattices closely related to a given real N-dimensional lattice Λ are obtained by

the following operations.


1) Scaling: If r is any real number, then rΛ is the lattice consisting of all multiples rλ of vectors λ in Λ by the scalar r.

2) Orthogonal Transformation: More generally, if T is any scaled orthogonal transformation of N-space, then TΛ is the lattice consisting of all transformations Tλ of vectors λ in Λ by T. We say that TΛ is a version of Λ.

3) Cartesian Product: The M-fold Cartesian product of Λ with itself, i.e., the set of all MN-tuples (λ_1, λ_2, ..., λ_M) where each λ_j is in Λ, is an MN-dimensional lattice denoted by Λ^M.

For example, Z^N is the N-fold Cartesian product of Z with itself, and rZ^N is a scaled version of Z^N for any r and N. The two-dimensional lattice Z^2 is illustrated in the following Fig. (0.7.1).

Figure 0.7.1: Lattice Z^2 and its sublattice RZ^2 (black dots)

The most important scaled orthogonal transformation is the rotation operator R,

defined by the 2 x 2 matrix

R = [ 1   1 ]        (0.7.1)
    [ 1  -1 ]


RZ^2 is a version of Z^2 obtained by rotating Z^2 by 45° and scaling by 2^{1/2}, and is also illustrated in Fig. (0.7.1). The points in RZ^2 are a subset of the points in Z^2, meaning that RZ^2 is a sublattice of Z^2. Note that R^2 = 2I, where I is the identity operator (in two dimensions), so that R^2 Z^2 = 2Z^2. We can define a

2N-dimensional rotation operator by letting R operate on each pair of

coordinates in a 2N-tuple; with a slight abuse of notation, we denote by R any

such a rotation operator. For instance, in four dimensions,

R = [ 1   1   0   0 ]
    [ 1  -1   0   0 ]        (0.7.2)
    [ 0   0   1   1 ]
    [ 0   0   1  -1 ]

Note that R^2 = 2I for any N, where I is the identity operator in 2N dimensions, so that R^2 Λ = 2Λ for any real 2N-dimensional lattice Λ.
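The stated properties of R are easy to check numerically. The following short Python sketch (an illustration added here, using numpy; it is not part of the original text) verifies that R² = 2I, that R doubles every squared norm (a 45° rotation combined with scaling by √2), and that RZ² lies inside Z²:

import numpy as np

R = np.array([[1, 1],
              [1, -1]])

# R^2 = 2I, so applying R twice simply doubles every lattice point.
assert np.array_equal(R @ R, 2 * np.eye(2, dtype=int))

# R multiplies squared norms by 2: ||Rx||^2 = 2 ||x||^2 for every x.
x = np.array([3, -2])
assert (R @ x) @ (R @ x) == 2 * (x @ x)

# Every point of RZ^2 is an integer pair whose coordinate sum is even,
# so RZ^2 is a sublattice of Z^2 with |Z^2 / RZ^2| = 2.
points = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
image = {tuple(R @ np.array(p)) for p in points}
assert all((u + v) % 2 == 0 for (u, v) in image)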

0.7.2 Group Properties

A coset of a lattice Λ, denoted by Λ + c, is the set of all N-tuples of the form λ + c, where λ is any point in Λ and c is some constant N-tuple that specifies the coset. Geometrically, the coset Λ + c is therefore a translate of Λ by c (if c is in Λ, then Λ + c = Λ). Two N-tuples are equivalent modulo Λ if their difference is a point in Λ. Thus the coset Λ + c is the set of all points equivalent to c modulo Λ.

A sublattice Λ' of a lattice Λ is a subset of the elements of Λ that is itself a lattice, i.e., Λ' is a subgroup of the additive group Λ. Thus, by elementary


group theory, a sublattice Λ' induces a partition (denoted by Λ/Λ') of Λ into equivalence classes modulo Λ' (the equivalence classes may be added modulo Λ' and form the quotient group Λ/Λ'). We shall say that the order of the partition (or quotient group) Λ/Λ' is the number |Λ/Λ'| of such equivalence classes (in the mathematical literature, |Λ/Λ'| is usually called the index of Λ' in Λ). Each equivalence class is a coset of Λ' (one being Λ' itself), or, geometrically, a translate of Λ'. For example, the partition Z^2/RZ^2 has order |Z^2/RZ^2| = 2, and Fig. 0.7.1 illustrates Z^2 as the union of two cosets of RZ^2. Of course, any N-dimensional integer lattice Λ is a sublattice of Z^N.

If we take one element from each equivalence class, we obtain a system of coset representatives for the partition Λ/Λ', denoted by [Λ/Λ']. (In general, there are many ways of selecting such a system [Λ/Λ'], so the notation does not entirely specify the system.) Then every element of Λ can be written uniquely as a sum λ = λ' + c, where c ∈ [Λ/Λ'] is the coset representative of the equivalence class in which λ lies, and λ' = λ - c is an element of Λ' (because λ ≡ c mod Λ'). This is called a coset decomposition of Λ and will be written here as

Λ = Λ' + [Λ/Λ']        (0.7.3)

For example, the two 2-tuples (0, 0) and (1, 0) are a system of coset representatives for the partition Z^2/RZ^2, and every element of Z^2 may be written as the sum of one of these two 2-tuples with an element of RZ^2, i.e., Z^2 is the union of RZ^2 + (0, 0) = RZ^2 and RZ^2 + (1, 0) (the black dots and white dots in Fig. (0.7.1), respectively).


As another example, if m is any integer, the lattice mZ of integer multiples of m is a sublattice of Z. The partition Z/mZ is the partition of the integers into m equivalence classes modulo mZ (modulo m), and the order of the partition is m. The integers {0, 1, ..., m - 1} form a system of coset representatives for the partition Z/mZ, and every integer n can be written uniquely as n = am + c, where am is an element of mZ and c ∈ {0, 1, ..., m - 1} = [Z/mZ] (thus [Z/mZ] is essentially the ring Z_m of integers modulo m). In particular, the partition

Z/2Z has order 2 and divides the integers into two subsets, 2Z (the even

integers) and 2Z + 1 (the odd integers).

More generally, for any m ∈ Z, the lattice mZ^N of N-tuples of integer multiples of m is a sublattice of Z^N of order m^N, and [Z/mZ]^N is a system of coset representatives for Z^N/mZ^N; hence Z^N = mZ^N + [Z/mZ]^N. A partition Λ/Λ' also induces a coset decomposition of any coset of Λ, say Λ + c; for

Λ + c = Λ' + [Λ/Λ'] + c        (0.7.4)

A partition chain Λ/Λ'/Λ''/... is a sequence of lattices such that each is a sublattice of the previous one (in other words, Λ ⊇ Λ' ⊇ Λ'' ⊇ ...). For example, Z/2Z/4Z/... is an infinite sequence of two-way partitions of the integers. A partition chain induces a multiterm coset decomposition chain, with a term corresponding to each partition; e.g., if Λ/Λ'/Λ'' is a partition chain, then

Λ = Λ'' + [Λ'/Λ''] + [Λ/Λ']        (0.7.5)


that is, every element of Λ can be expressed as an element of Λ'' plus a coset representative from [Λ'/Λ''] plus a coset representative from [Λ/Λ']. For example, the chain Z/2Z/4Z/... leads to the standard binary representation of an integer m:

m = a_0 + 2a_1 + 4a_2 + ...        (0.7.6)

where a_0, a_1, a_2, ... ∈ {0, 1}, and a_0 specifies the coset in the partition Z/2Z, 2a_1 specifies the coset in the partition 2Z/4Z, and so forth. That is,

Z = [Z/2Z] + [2Z/4Z] + [4Z/8Z] + ...        (0.7.7)
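A small Python sketch (an illustration added here, not part of the original text) extracts these coset representatives digit by digit, reproducing the binary expansion m = a_0 + 2a_1 + 4a_2 + ...:

def coset_digits(m, levels=8):
    """Return (a0, a1, ..., a_{levels-1}) with m = sum a_i * 2^i, a_i in {0, 1}.

    At step i, a_i * 2^i is the coset representative of m in the two-way
    partition 2^i Z / 2^(i+1) Z; subtracting it leaves an element of 2^(i+1) Z.
    """
    digits = []
    for i in range(levels):
        a_i = (m >> i) & 1           # representative of the coset at level i
        digits.append(a_i)
        m -= a_i * (1 << i)          # now m is a multiple of 2^(i+1)
        assert m % (1 << (i + 1)) == 0
    return tuple(digits)

print(coset_digits(45))   # (1, 0, 1, 1, 0, 1, 0, 0): 45 = 1 + 4 + 8 + 32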

0.7.3 Geometric Properties

The geometry of a real lattice Λ arises from the geometry of real Euclidean N-space R^N. The two principal geometric parameters of Λ are the minimum squared distance d²_min(Λ) between its points and its fundamental volume V(Λ); these determine its fundamental coding gain γ(Λ).

The norm ||x||² of a vector x in R^N is the sum of the squares of its coordinates. Norms are nonnegative, and in fact nonzero unless x = 0. The squared distance between two vectors x and y is the norm of their difference ||x - y||².

Because a lattice A consists of discrete points, the norms of all lattice points

are an infinite set of discrete values that can be enumerated in ascending order.

We call this the weight distribution of the lattice (theta series, in the lattice

literature). The weight distribution is also the squared distance distribution


between any point in the lattice and all other points, since any point λ in Λ can be taken as the origin 0 by translation of Λ by -λ (looking out from any point in Λ, the lattice looks the same).

The minimum nonzero norm is thus the minimum squared distance d²_min(Λ) between any two points in Λ. The number of elements of Λ with this norm is the number of nearest neighbors of any lattice point (also called the kissing number, or multiplicity), and will be called the error coefficient N_0(Λ). For example, for any N, the integer lattice Z^N has d²_min(Z^N) = 1. The set of all integer N-tuples of norm 1 is the set of all permutations and sign changes of the vector (1, 0, ..., 0), so N_0(Z^N) = 2N.

Loosely, the fundamental volume V(Λ) is the volume of N-space per lattice

point, or the reciprocal of the number of lattice points per unit volume. More

precisely, if we can partition N-space into regions of equal volume, one

associated with each lattice point, then V(Λ) is the volume of each such region.

For example, it is easy to see that we may partition N-space into N-cubes of

side 1, one associated with each point of Z^N, so V(Z^N) = 1.

To treat the general case, note that R^N is itself a group under ordinary vector addition (but not a lattice), because its points are not discrete. Any real N-dimensional lattice Λ is a subgroup of R^N. Thus there is a partition R^N/Λ of N-space into equivalence classes modulo Λ (cosets of Λ) (in our original definition of a coset of Λ, implicitly we meant a coset in the partition R^N/Λ).


Define a fundamental region R(Λ) as a region of N-space that contains one and only one point from each such equivalence class modulo Λ; thus R(Λ) is a system of coset representatives for the partition R^N/Λ. Every point x in R^N is thus uniquely representable as x = λ + c, where λ ∈ Λ and c ∈ R(Λ), i.e., there is a coset decomposition R^N = Λ + R(Λ). Geometrically, this is a tessellation of N-space by translates of fundamental regions of Λ. While there is no unique fundamental region, every fundamental region R(Λ) must have the same volume V(Λ) (if it is measurable), since it is congruent to any other fundamental region modulo Λ; this uniquely defines the fundamental volume V(Λ).

The computation of the fundamental volume of an integer lattice Λ may be completely avoided by use of the following property, if we know the order |Z^N/Λ| of the partition Z^N/Λ.

Property: If Λ' is a sublattice of Λ of order |Λ/Λ'|, then V(Λ') = |Λ/Λ'| V(Λ).

From the two geometric parameters d²_min(Λ) and V(Λ) we define the fundamental coding gain γ(Λ) of a lattice Λ as follows:

γ(Λ) = d²_min(Λ) / V(Λ)^{2/N}        (0.7.8)

(in the mathematical literature this is called Hermite's parameter and is also denoted by the symbol γ).
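As a quick numerical illustration (a sketch added here, using only the definitions above), the gain of Z^2 and of its scaled and rotated version RZ^2 is the same, as expected, since γ is invariant under scaling and orthogonal transformations:

def coding_gain(d2_min, volume, N):
    """Fundamental coding gain gamma = d^2_min / V^(2/N)."""
    return d2_min / volume ** (2.0 / N)

# Z^2: minimum squared distance 1, fundamental volume 1.
print(coding_gain(1, 1, 2))   # 1.0

# RZ^2: minimum squared distance 2, fundamental volume |Z^2 / RZ^2| * V(Z^2) = 2.
print(coding_gain(2, 2, 2))   # 1.0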



0.7.4 Complex Lattices and Gaussian Integers

A complex lattice A is a discrete set of points in complex Euclidean

N-space C^N that forms a group under ordinary (complex) vector addition.

Again, we stipulate that the only such lattices to be considered here will

actually span N dimensions, so we shall feel free to call such a A an N-

dimensional complex lattice.

An obvious isomorphism (written Λ_r ≅ Λ_c) exists between any 2N-dimensional real lattice Λ_r and a corresponding N-dimensional complex lattice Λ_c, formed by taking each pair of coordinates of Λ_r to specify the real and imaginary parts of each coordinate of Λ_c, or vice versa. Addition of two points gives the same result in either case. Sublattices, cosets, and all such group properties carry over. Even the norm of two corresponding vectors is the same, so distances are not affected. Thus for most purposes it makes no difference whether we consider a lattice to be real or complex. For all parameters previously defined (e.g., d²_min(Λ), V(Λ), γ(Λ)), we may define the values for a complex lattice to be the same as those for the corresponding real lattice.

The only difference of any significance arises when we consider multiplicative

operations, such as scaling, or the taking of inner products. A complex lattice

Λ_c may be scaled by either a real number r or a complex number α, the latter operation involving an equal phase rotation of each coordinate of Λ_c by the phase of α (as well as a scaling of lengths by |α|, or of norms by |α|²). The


inner product (x, y) of two real vectors x and y is the sum of the products of their coordinates and must be real; the (Hermitian) inner product (x, y) of two

complex vectors x and y is the sum of the products of the coordinates of x with

the complex conjugates of the coordinates of y and may be complex. Thus

there may arise differences in definitions of orthogonality, duality, and so forth.

The simplest example of a complex lattice is the one dimensional complex

lattice G corresponding to the two-dimensional real lattice Z^2. The point (a, b) in Z^2 corresponds to the point a + bi in G, where a and b may be any pair of

integers. The set G is called the set of Gaussian integers. The Gaussian integers

G actually form a system of complex integers analogous to the ordinary real

integers Z. Multiplication of two elements of G (using complex arithmetic)

yields another element of G, which cannot be 0 unless one of the two elements

is 0 (in fact, their norms multiply as real integers). Thus G is a ring and, in fact,

an integral domain. Indeed, we have unique factorization in G: every element

of G can be expressed uniquely as a product of primes, up to units, where the

units (invertible elements) are ±1 and ±i, and the primes are the elements that

have no divisors other than themselves, up to units. The primes of G, in order

of increasing norm, are 1 ± i, 2 ± i, 3, ..., with norms 2, 5, 9, .... We denote the prime of least norm by φ = 1 + i. (Note that φφ* = |φ|² = 2, and thus two is not a prime in G.) We may scale G by any element g ∈ G and obtain a sublattice gG of G. The partition G/gG must have order |g|² (the norm of g). There are thus |φ|² = 2 equivalence classes of G modulo φ.
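These statements are easy to check with ordinary Python complex arithmetic; the sketch below (an illustration added here, not part of the original text) verifies that norms multiply, that φ = 1 + i has norm 2, and that there are exactly |φ|² = 2 classes of Gaussian integers modulo φ:

phi = 1 + 1j

def norm(g):
    """Norm of a Gaussian integer: |g|^2 = a^2 + b^2 (an ordinary integer)."""
    return round((g * g.conjugate()).real)

# Norms multiply, as for ordinary integers.
g1, g2 = 2 + 1j, 3 + 0j
assert norm(g1 * g2) == norm(g1) * norm(g2)          # 5 * 9 = 45

# phi * conjugate(phi) = |phi|^2 = 2, so 2 is not a Gaussian prime.
assert norm(phi) == 2 and phi * phi.conjugate() == 2 + 0j

# G / phi*G has exactly |phi|^2 = 2 classes: a + bi is divisible by phi
# exactly when a + b is even, so the class of a + bi is (a + b) mod 2.
def coset_mod_phi(a, b):
    return (a + b) % 2

classes = {coset_mod_phi(a, b) for a in range(-3, 4) for b in range(-3, 4)}
assert classes == {0, 1}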


0.8 BCH CODES

BCH (Bose, Chaudhuri (1960), Hocquenghem (1959)) codes are a class of

linear and cyclic block codes that can be considered as a generalization of the

Hamming codes, as they can be designed for any value of the error-correction

capability t. These codes are defined in the binary field GF(2), and also in their

non-binary version, over the Galois field GF(q). These codes were generalized

to all finite fields by Gorenstein and Zierler (1961).

0.8.1 Description of BCH Cyclic Codes

BCH codes are a generalization of Hamming codes, and they can be designed

to be able to correct any error pattern of size t or less [Bose, Chaudhuri (1960), Hocquenghem (1959), Lin and Costello (1983)]. In this sense the generalization of the Hamming codes extends the design of codes for t = 1 (Hamming codes) to codes for any desired higher value of t (BCH codes). The

design method is based on taking an LCM of appropriate minimal polynomials.

For any positive integers m ≥ 3 and t < 2^{m-1}, there exists a binary BCH code C_BCH(n, k) with the following properties:

Code length: n = 2^m - 1
Number of parity bits: n - k ≤ mt
Minimum Hamming distance: d_min ≥ 2t + 1
Error-correction capability: t errors in a code vector.

These codes are able to correct any error pattern of size t or less, in a code vector of length n = 2^m - 1.


The generator polynomial of a BCH code is described in terms of its roots, taken from the Galois field GF(2^m). If α is a primitive element in GF(2^m), the generator polynomial g(X) of a BCH code for correcting t errors in a code vector of length n = 2^m - 1 is the minimum-degree polynomial over GF(2) that has α, α², ..., α^{2t} as its roots:

g(α^i) = 0,   i = 1, 2, ..., 2t        (0.8.1)

It is also true that g(X) has α^i and its conjugates as its roots. On the other hand, if φ_i(X) is the minimal polynomial of α^i, then the LCM of φ_1(X), φ_2(X), ..., φ_{2t}(X) is the generator polynomial g(X):

g(X) = LCM{φ_1(X), φ_2(X), ..., φ_{2t}(X)}        (0.8.2)

However, due to the repetition of conjugate roots, it can be shown that the generator polynomial g(X) can be formed with only the odd-index minimal polynomials [Lin and Costello (1983)]:

g(X) = LCM{φ_1(X), φ_3(X), ..., φ_{2t-1}(X)}        (0.8.3)

Since the degree of each minimal polynomial is m or less, the degree of g(X) is at most mt. As BCH codes are cyclic codes, this means that the value of n - k can be at most mt. The Hamming codes are a particular class of BCH codes, for which the generator polynomial is g(X) = φ_1(X). A BCH code for t = 1 is then a Hamming code. Since α is a primitive element of GF(2^m), φ_1(X) is a polynomial of degree m.


Example: Let α be a primitive element of GF(2^4), so that 1 + α + α^4 = 0. The minimal polynomials of α, α³ and α⁵ are, respectively,

φ_1(X) = 1 + X + X^4
φ_3(X) = 1 + X + X² + X³ + X^4
φ_5(X) = 1 + X + X²

A BCH code for correcting error patterns of size t = 2 or less, and with block length n = 2^4 - 1 = 15, will have the generator polynomial

g(X) = LCM{φ_1(X), φ_3(X)}

Since φ_1(X) and φ_3(X) are two irreducible and distinct polynomials,

g(X) = φ_1(X) φ_3(X)
g(X) = (1 + X + X^4)(1 + X + X² + X³ + X^4)
g(X) = 1 + X^4 + X^6 + X^7 + X^8

This is the BCH code C_BCH(15, 7) with minimum Hamming distance d_min ≥ 5. Since the generator polynomial is of weight 5, the minimum Hamming distance of the BCH code which this polynomial generates is d_min = 5.

In order to increase the error-correction capability to any error pattern of size t = 3 or less, the corresponding binary BCH code is C_BCH(15, 5) with minimum distance d_min ≥ 7, which can be constructed using the generator polynomial

g(X) = φ_1(X) φ_3(X) φ_5(X)
     = (1 + X + X^4)(1 + X + X² + X³ + X^4)(1 + X + X²)
     = 1 + X + X² + X^4 + X^5 + X^8 + X^10


This generator polynomial is of weight 7, and so it generates a BCH code of minimum Hamming distance d_min = 7.
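The two generator-polynomial products above can be checked with a few lines of Python (a sketch added here for illustration; polynomials over GF(2) are represented by coefficient lists, lowest degree first):

def gf2_mul(p, q):
    """Multiply two polynomials over GF(2); coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= a & b
    return out

phi1 = [1, 1, 0, 0, 1]           # 1 + X + X^4
phi3 = [1, 1, 1, 1, 1]           # 1 + X + X^2 + X^3 + X^4
phi5 = [1, 1, 1]                 # 1 + X + X^2

g_t2 = gf2_mul(phi1, phi3)
print(g_t2)                      # [1, 0, 0, 0, 1, 0, 1, 1, 1]: 1 + X^4 + X^6 + X^7 + X^8
print(sum(g_t2))                 # weight 5, so d_min = 5 for the (15, 7) code

g_t3 = gf2_mul(g_t2, phi5)
print(g_t3)                      # 1 + X + X^2 + X^4 + X^5 + X^8 + X^10
print(sum(g_t3))                 # weight 7, so d_min = 7 for the (15, 5) code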

As a result of the definition of a linear binary block BCH code C_BCH(n, k) for correcting error patterns of size t or less, and with code length n = 2^m - 1, it is possible to affirm that any code polynomial of such a code will have α, α², ..., α^{2t} and their conjugates as its roots. This is so because any code polynomial is a multiple of the corresponding generator polynomial g(X), and also of all the minimal polynomials φ_1(X), φ_2(X), ..., φ_{2t}(X). Any code polynomial c(X) = c_0 + c_1 X + c_2 X² + ... + c_{n-1} X^{n-1} of C_BCH(n, k) has α^i as a root:

c(α^i) = c_0 + c_1 α^i + c_2 α^{2i} + ... + c_{n-1} α^{(n-1)i} = 0        (0.8.4)

In matrix form,

(c_0, c_1, c_2, ..., c_{n-1}) ∘ (1, α^i, α^{2i}, ..., α^{(n-1)i})^T = 0        (0.8.5)

The inner product of the code vector (c_0, c_1, c_2, ..., c_{n-1}) and the vector of roots (1, α^i, α^{2i}, ..., α^{(n-1)i}) is equal to zero. The following matrix can then be formed:


        [ 1   α        α²         α³         ...   α^{n-1}         ]
        [ 1   α²       (α²)²      (α²)³      ...   (α²)^{n-1}      ]
H =     [ 1   α³       (α³)²      (α³)³      ...   (α³)^{n-1}      ]        (0.8.6)
        [ ...                                                       ]
        [ 1   α^{2t}   (α^{2t})²  (α^{2t})³  ...   (α^{2t})^{n-1}  ]

If c is a code vector, it should be true that

c ∘ H^T = 0        (0.8.7)

From this point of view, the linear binary block BCH code C_BCH(n, k) is the dual row space of the matrix H, and this matrix is in turn its parity check matrix. If for some i and some j, α^j is a conjugate of α^i, then c(α^j) = 0 follows from c(α^i) = 0. This means that the inner product of c = (c_0, c_1, c_2, ..., c_{n-1}) with the corresponding row of H is automatically zero, so that such rows can be omitted in the construction of the matrix H, which then adopts the form

        [ 1   α          α²           α³           ...   α^{n-1}           ]
        [ 1   α³         (α³)²        (α³)³        ...   (α³)^{n-1}        ]
H =     [ 1   α⁵         (α⁵)²        (α⁵)³        ...   (α⁵)^{n-1}        ]        (0.8.8)
        [ ...                                                               ]
        [ 1   α^{2t-1}   (α^{2t-1})²  (α^{2t-1})³  ...   (α^{2t-1})^{n-1}  ]

Each element of the matrix H is an element of GF(2^m), which can be

represented as an m-component vector taken over GF(2), arranged as a column,

which allows us to construct the same matrix in binary form.


Example: For the binary BCH code C_BCH(15, 7) of length n = 2^4 - 1 = 15, able to correct any error pattern of size t = 2 or less, and α being a primitive element of GF(2^4), the parity check matrix H is of the form

H = [ 1   α    α²   α³   α⁴    α⁵   α⁶   α⁷   α⁸   α⁹    α^10   α^11   α^12   α^13   α^14 ]
    [ 1   α³   α⁶   α⁹   α^12  1    α³   α⁶   α⁹   α^12  1      α³     α⁶     α⁹     α^12 ]

which can be described in binary form as follows:

    [ 1 0 0 0 1 0 0 1 1 0 1 0 1 1 1 ]
    [ 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0 ]
    [ 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 ]
H = [ 0 0 0 1 0 0 1 1 0 1 0 1 1 1 1 ]
    [ 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 ]
    [ 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 ]
    [ 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 ]
    [ 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 ]
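A compact sketch of GF(2^4) arithmetic (added here for illustration, built on the primitive polynomial 1 + X + X^4 used above) reproduces the two symbol rows of H as powers of α and α³, and checks that the codeword corresponding to g(X) = 1 + X^4 + X^6 + X^7 + X^8 is annihilated by both rows:

# GF(2^4) with primitive polynomial X^4 + X + 1; elements are 4-bit integers
# with bit k holding the coefficient of alpha^k.
def build_powers():
    powers, x = [], 1
    for _ in range(15):
        powers.append(x)
        x <<= 1                      # multiply by alpha
        if x & 0b10000:              # reduce modulo X^4 + X + 1
            x ^= 0b10011
    return powers                    # powers[i] = alpha^i, and alpha^15 = 1

alpha = build_powers()

# Rows of H: (alpha^j)_{j=0..14} and (alpha^{3j})_{j=0..14}.
row1 = [alpha[j % 15] for j in range(15)]
row3 = [alpha[(3 * j) % 15] for j in range(15)]

# Codeword from g(X) = 1 + X^4 + X^6 + X^7 + X^8, padded to length 15.
c = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]

def syndrome(codeword, row):
    """Evaluate the GF(2^m) inner product of a binary vector with a row of H."""
    s = 0
    for bit, sym in zip(codeword, row):
        if bit:
            s ^= sym                 # addition in GF(2^m) is XOR
    return s

print(syndrome(c, row1), syndrome(c, row3))   # 0 0: c is a codeword
assert syndrome(c, row1) == 0 and syndrome(c, row3) == 0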

0.8.2 Bounds on the Error-Correction Capability of a BCH Code

The Vandermonde Determinant

It can be shown that a given BCH code must have minimum distance d_min ≥ 2t + 1 by showing that no set of 2t or fewer columns of its corresponding parity check matrix H sums to the zero vector. BCH codes are linear block codes, and so the minimum distance is determined by the non-zero code vector of minimum weight. Should there exist a non-zero code vector of weight p_H ≤ 2t with non-zero elements c_{j1}, c_{j2}, c_{j3}, ..., c_{jp_H}, then


(c_{j1}, c_{j2}, c_{j3}, ..., c_{jp_H}) ∘

[ α^{j1}      (α^{j1})²      ...   (α^{j1})^{2t}     ]
[ α^{j2}      (α^{j2})²      ...   (α^{j2})^{2t}     ]
[ ...                                                 ]  = 0        (0.8.9)
[ α^{jp_H}    (α^{jp_H})²    ...   (α^{jp_H})^{2t}   ]

By making use of (α^{j_i})^l = (α^l)^{j_i}, and since p_H ≤ 2t, we obtain

(c_{j1}, c_{j2}, c_{j3}, ..., c_{jp_H}) ∘

[ α^{j1}      (α^{j1})²      ...   (α^{j1})^{p_H}    ]
[ α^{j2}      (α^{j2})²      ...   (α^{j2})^{p_H}    ]
[ ...                                                 ]  = 0        (0.8.10)
[ α^{jp_H}    (α^{jp_H})²    ...   (α^{jp_H})^{p_H}  ]

which involves a p_H × p_H matrix; equation (0.8.10) can hold only if its determinant is zero:

| α^{j1}      (α^{j1})²      ...   (α^{j1})^{p_H}    |
| α^{j2}      (α^{j2})²      ...   (α^{j2})^{p_H}    |
| ...                                                 |  = 0        (0.8.11)
| α^{jp_H}    (α^{jp_H})²    ...   (α^{jp_H})^{p_H}  |

Extracting a common factor from each row, we get

                       | 1   α^{j1}     ...   (α^{j1})^{p_H - 1}    |
α^{(j1+j2+...+jp_H)} · | 1   α^{j2}     ...   (α^{j2})^{p_H - 1}    |  = 0        (0.8.12)
                       | ...                                         |
                       | 1   α^{jp_H}   ...   (α^{jp_H})^{p_H - 1}  |

This determinant is called the Vandermonde determinant, and is a non-zero

determinant [Lin and Costello (1983), Blaum (2001)]. Thus, the initial

assumption that p_H ≤ 2t is not valid, and the minimum Hamming distance of a


binary BCH code is then equal to 2t + 1 or more. The parameter 2t + 1 is called

the designed distance of a BCH code, but the actual minimum distance can be higher.

Binary BCH codes can also be designed with block lengths less than 2^m - 1, in a similar way to that described for BCH codes of length equal to 2^m - 1. If β is an element of order n in GF(2^m), then n is a factor of 2^m - 1. Let g(X) be a minimum-degree binary polynomial that has β, β², ..., β^{2t} as its roots. Let φ_1(X), φ_2(X), ..., φ_{2t}(X) be the minimal polynomials of β, β², ..., β^{2t}, respectively. Then

g(X) = LCM{φ_1(X), φ_2(X), ..., φ_{2t}(X)}        (0.8.13)

Since β^n = 1, the elements β, β², ..., β^{2t} are roots of X^n + 1. Therefore the cyclic code generated by g(X) is a code of length n. It can be shown, in the same way as for binary BCH codes of length n = 2^m - 1, that the number of parity check bits is not greater than mt, and that the minimum Hamming distance satisfies d_min ≥ 2t + 1.

The above analysis provides us with a more general definition of a binary BCH code [Lin and Costello (1983)]. If β is an element of GF(2^m) and n_0 a positive integer, then the binary BCH code with a designed minimum distance d_0 is generated by the minimum-degree generator polynomial g(X) that has as its roots the consecutive powers β^{n_0}, β^{n_0+1}, ..., β^{n_0+d_0-2} of the element β, i.e. β^{n_0+i} with 0 ≤ i ≤ d_0 - 2:

g(X) = LCM{φ_0(X), φ_1(X), ..., φ_{d_0-2}(X)}        (0.8.14)


Here, φ_i(X) is the minimal polynomial of β^{n_0+i} and n_i is its order. The constructed binary BCH code has a code length equal to n:

n = LCM{n_0, n_1, ..., n_{d_0-2}}        (0.8.15)

The designed binary BCH code has minimum distance at least d_0, a maximum number m(d_0 - 1) of parity check bits, and is able to correct any error pattern of size ⌊(d_0 - 1)/2⌋ or less.

When n_0 = 1 and d_0 = 2t + 1, if β is a primitive element of GF(2^m), then the code length of the binary BCH code is n = 2^m - 1. In this case the binary BCH code is said to be primitive. When n_0 = 1 and d_0 = 2t + 1, if β is not a primitive element of GF(2^m), then the code length of the binary BCH code is not n = 2^m - 1, but is equal to the order of β. In this case the binary BCH code is said to be non-primitive. The requirement that the d_0 - 1 consecutive powers of β have to be roots of the generator polynomial g(X) ensures that the binary BCH code has a minimum distance of at least d_0.

In a more general way, BCH codes can be defined as follows [Ling and Xing (2004)].
Let α be a primitive element of GF(q^m) and denote by M^{(i)}(X) the minimal polynomial of α^i with respect to GF(q). A (primitive) BCH code over GF(q) of length n = q^m - 1 with designed distance δ is a q-ary cyclic code generated by

g(X) = LCM{M^{(a)}(X), M^{(a+1)}(X), ..., M^{(a+δ-2)}(X)}        (0.8.16)

for some integer a. Furthermore, the code is called narrow-sense if a = 1.


0.8.3 Dimension of a q-ary BCH code

(i) The dimension of a q-ary BCH code of length n = q^m - 1 generated by g(X) = LCM{M^{(a)}(X), M^{(a+1)}(X), ..., M^{(a+δ-2)}(X)} is independent of the choice of the primitive element α.

Let C_i be the cyclotomic coset of q modulo q^m - 1 containing i. Put S = ∪_{i=a}^{a+δ-2} C_i. We have

g(X) = LCM{ ∏_{j∈C_a}(X - α^j), ∏_{j∈C_{a+1}}(X - α^j), ..., ∏_{j∈C_{a+δ-2}}(X - α^j) }
     = ∏_{j∈S}(X - α^j)        (0.8.17)

Hence, the dimension is equal to q^m - 1 - deg(g(X)) = q^m - 1 - |S|. As the set S is independent of the choice of α, the desired result follows.

(ii) A q-ary BCH code of length n = q^m - 1 with designed distance δ has dimension at least q^m - 1 - m(δ - 1).

By part (i), the dimension k satisfies

k = q^m - 1 - |S|
  = q^m - 1 - |∪_{i=a}^{a+δ-2} C_i|
  ≥ q^m - 1 - Σ_{i=a}^{a+δ-2} |C_i|
  ≥ q^m - 1 - m(δ - 1)        (0.8.18)


The above result shows that, in order to find the dimension of a q-ary BCH code of length n = q^m - 1 generated by g(X) = LCM{M^{(a)}(X), M^{(a+1)}(X), ..., M^{(a+δ-2)}(X)}, it is sufficient to check the cardinality of ∪_{i=a}^{a+δ-2} C_i, where C_i is the cyclotomic coset of q modulo q^m - 1 containing i.

Example: Consider the following cyclotomic cosets of 2 modulo 15:

C_2 = {1, 2, 4, 8},    C_3 = {3, 6, 12, 9}

Then the dimension of the binary BCH code of length 15 of designed distance 3 generated by g(X) = LCM{M^{(2)}(X), M^{(3)}(X)} is 15 - |C_2 ∪ C_3| = 15 - 8 = 7.
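The cyclotomic cosets and the resulting dimension are easy to compute; the following sketch (added here as an illustration, not part of the original text) does this for q = 2 and n = 15:

def cyclotomic_coset(i, q, n):
    """Cyclotomic coset of q modulo n containing i: {i, iq, iq^2, ...} mod n."""
    coset, x = set(), i % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset

q, n = 2, 15
C2 = cyclotomic_coset(2, q, n)
C3 = cyclotomic_coset(3, q, n)
print(sorted(C2))                       # [1, 2, 4, 8]
print(sorted(C3))                       # [3, 6, 9, 12]
print(n - len(C2 | C3))                 # dimension 15 - 8 = 7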

Example: For t ≥ 1, t and 2t belong to the same cyclotomic coset of 2 modulo 2^m - 1. This is equivalent to the fact that M^{(t)}(X) = M^{(2t)}(X). Therefore,

{M^{(1)}(X), M^{(2)}(X), ..., M^{(2t-1)}(X)} = {M^{(1)}(X), M^{(2)}(X), ..., M^{(2t)}(X)};

i.e., the narrow-sense binary BCH codes of length 2^m - 1 with designed distance 2t + 1 are the same as the narrow-sense binary BCH codes of length 2^m - 1 with designed distance 2t.



In Table 0.8.1 we list the dimensions of narrow-sense binary BCH codes of length 2^m - 1 with designed distance 2t + 1, for 3 ≤ m ≤ 6. Note that the dimension of a narrow-sense BCH code is independent of the choice of the primitive element.


Table 0.8.1

 n    k    t   |   n    k    t
 7    4    1   |  63   51    2
15   11    1   |  63   45    3
15    7    2   |  63   39    4
15    5    3   |  63   36    5
31   26    1   |  63   30    6
31   21    2   |  63   24    7
31   16    3   |  63   18   10
31   11    5   |  63   16   11
31    6    7   |  63   10   13
63   57    1   |  63    7   15

The dimensions of narrow-sense binary BCH codes of length 2^m - 1 with designed distance 2t + 1, for 3 ≤ m ≤ 6.

0.8.4 Dimension of a narrow-sense q-ary BCH code

(i) A narrow-sense q-ary BCH code of length n = q^m - 1 with designed distance δ has dimension exactly q^m - 1 - m(δ - 1) if q ≠ 2 and gcd(q^m - 1, e) = 1 for all 1 ≤ e ≤ δ - 1.

From the dimension of a q-ary BCH code, we know that the dimension is equal to q^m - 1 - |∪_{i=1}^{δ-1} C_i|, where C_i stands for the cyclotomic coset of q modulo q^m - 1 containing i. Hence, it is sufficient to prove that |C_i| = m for all 1 ≤ i ≤ δ - 1, and that C_i and C_j are disjoint for all 1 ≤ i < j ≤ δ - 1.


For any integer 1 ≤ t ≤ m - 1, we claim that i ≢ q^t·i (mod q^m - 1) for 1 ≤ i ≤ δ - 1. Otherwise, we would have (q^t - 1)i ≡ 0 (mod q^m - 1). This forces (q^t - 1) ≡ 0 (mod q^m - 1), as gcd(q^m - 1, i) = 1. This is a contradiction. This implies that |C_i| = m for all 1 ≤ i ≤ δ - 1.

For any integers 1 ≤ i < j ≤ δ - 1, we claim that j ≢ q^s·i (mod q^m - 1) for any integer s ≥ 0. Otherwise, we would have j - i ≡ (q^s - 1)i (mod q^m - 1). This forces j - i ≡ 0 (mod q^m - 1), which is a contradiction to the condition gcd(j - i, q^m - 1) = 1. Hence, C_i and C_j are disjoint.

Example: Consider a narrow-sense 4-ary BCH code of length 63 with

designed distance 3. Its dimension is 63 - 3(3 - 1) = 57.
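The same coset computation confirms this 4-ary example (again a sketch added for illustration): the cosets of 4 modulo 63 containing 1 and 2 each have size m = 3 and are disjoint, so the dimension is 63 - 3·2 = 57.

def cyclotomic_coset(i, q, n):
    """Cyclotomic coset of q modulo n containing i."""
    coset, x = set(), i % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset

q, n, delta = 4, 63, 3
cosets = [cyclotomic_coset(i, q, n) for i in range(1, delta)]   # C_1, C_2
print([sorted(c) for c in cosets])      # [[1, 4, 16], [2, 8, 32]]
union = set().union(*cosets)
print(n - len(union))                   # 63 - 6 = 57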

As we know that a narrow-sense binary BCH code with designed distance 2t is the same as a narrow-sense binary BCH code with designed distance 2t + 1, it is enough to consider narrow-sense binary BCH codes with odd designed distance.

(ii) A narrow-sense binary BCH code of length n = 2^m - 1 and designed distance δ = 2t + 1 has dimension at least n - m(δ - 1)/2.


As the cyclotomic cosets C_{2i} and C_i are the same, the dimension k satisfies

k = 2^m - 1 - |∪_{i=1}^{2t} C_i|
  = 2^m - 1 - |∪_{i=1}^{t} C_{2i-1}|
  ≥ 2^m - 1 - Σ_{i=1}^{t} |C_{2i-1}|
  ≥ 2^m - 1 - tm
  = 2^m - 1 - m(δ - 1)/2.

Example: A narrow-sense binary BCH code of length 63 with designed distance δ = 5 has dimension exactly 51 = 63 - 6(5 - 1)/2. However, a narrow-sense binary BCH code of length 31 with designed distance δ = 11 has dimension 11, which is bigger than 31 - 5(11 - 1)/2 = 6.

0.9 NEW CODES FROM OLD

In this section we describe how many interesting and important codes arise by modifying or combining existing codes. We will discuss five ways to do this.

0.9.1 Puncturing codes

Let C be an [n, k, d] code over F_q. We can puncture C by deleting the same coordinate i in each codeword. The resulting code is still linear, its length is n - 1, and we often denote the punctured code by C*. If G is a generator matrix for C, then a generator matrix for C* is obtained from G by deleting column i (and omitting a zero or duplicate row that may occur). What are the dimension and minimum weight of C*? Because C contains q^k codewords, the only way that C* could contain fewer codewords is if two codewords of C agree in all but coordinate i. In that case C has minimum distance d = 1 and a codeword of weight 1 whose nonzero entry is in coordinate i. The minimum distance


decreases by 1 only if a minimum weight codeword of C has a nonzero ith

coordinate.

Summarizing, we have the following property.

Property 0.9.1 Let C be an [n, k, d] code over F_q, and let C* be the code C punctured on the ith coordinate.

(i) If d > 1, C* is an [n - 1, k, d*] code, where d* = d - 1 if C has a minimum weight codeword with a nonzero ith coordinate and d* = d otherwise.

(ii) When d = 1, C* is an [n - 1, k, 1] code if C has no codeword of weight 1 whose nonzero entry is in coordinate i; otherwise, if k > 1, C* is an [n - 1, k - 1, d*] code with d* ≥ 1.

Example 0.9.1 Let C be the [5, 2, 2] binary code with generator matrix

G = [ 1 1 0 0 0 ]
    [ 0 0 1 1 1 ]

Let C*_1 and C*_5 be the code C punctured on coordinates 1 and 5, respectively. They have generator matrices

G*_1 = [ 1 0 0 0 ]        and        G*_5 = [ 1 1 0 0 ]
       [ 0 1 1 1 ]                          [ 0 0 1 1 ]

So C*_1 is a [4, 2, 1] code, while C*_5 is a [4, 2, 2] code.
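Example 0.9.1 can be verified by brute force (a sketch added here for illustration): enumerate the four codewords generated by G, delete a coordinate, and look at the minimum nonzero weight of the punctured code.

from itertools import product

def codewords(G):
    """All F_2-linear combinations of the rows of G."""
    k = len(G)
    words = set()
    for coeffs in product([0, 1], repeat=k):
        w = tuple(sum(c * g for c, g in zip(coeffs, col)) % 2
                  for col in zip(*G))
        words.add(w)
    return words

def puncture(words, i):
    """Delete coordinate i (0-based) from every codeword."""
    return {w[:i] + w[i + 1:] for w in words}

def min_weight(words):
    return min(sum(w) for w in words if any(w))

G = [(1, 1, 0, 0, 0),
     (0, 0, 1, 1, 1)]
C = codewords(G)

C1 = puncture(C, 0)   # puncture on coordinate 1
C5 = puncture(C, 4)   # puncture on coordinate 5
print(len(C1), min_weight(C1))   # 4 codewords, minimum weight 1 -> [4, 2, 1]
print(len(C5), min_weight(C5))   # 4 codewords, minimum weight 2 -> [4, 2, 2]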

0.9.2 Extending codes

We can create longer codes by adding a coordinate. There are many possible

ways to extend a code but the most common is to choose the extension so that


the new code has only even-like vectors. If C is an [n, k, d] code over F_q, define the extended code Ĉ to be the code

Ĉ = { x_1 x_2 ... x_{n+1} ∈ F_q^{n+1} | x_1 x_2 ... x_n ∈ C with x_1 + x_2 + ... + x_{n+1} = 0 }.

Ĉ is an [n + 1, k, d̂] code, where d̂ = d or d + 1. Let G and H be generator and parity check matrices, respectively, for C. Then a generator matrix Ĝ for Ĉ can be obtained from G by adding an extra column to G so that the sum of the coordinates of each row of Ĝ is 0. A parity check matrix for Ĉ is the matrix

Ĥ = [ 1 ... 1  1 ]
    [    H     0 ]

This construction is also referred to as adding an overall parity check.
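A short sketch of the extension (added here for illustration): append to each codeword the parity symbol that makes the coordinates sum to zero; over F_2 this turns the [5, 2, 2] code of Example 0.9.1 into a code containing only even-weight words.

def extend(words, q=2):
    """Add an overall parity check: each word gets x_{n+1} = -(x_1 + ... + x_n) mod q."""
    return {w + ((-sum(w)) % q,) for w in words}

# The [5, 2, 2] binary code of Example 0.9.1.
C = {(0, 0, 0, 0, 0), (1, 1, 0, 0, 0), (0, 0, 1, 1, 1), (1, 1, 1, 1, 1)}
C_ext = extend(C)
print(sorted(C_ext))
# Every extended codeword has even weight; here the minimum distance stays 2
# because a minimum-weight codeword of C already had even weight.
assert all(sum(w) % 2 == 0 for w in C_ext)
print(min(sum(w) for w in C_ext if any(w)))   # 2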

0.9.3 Shortening codes

Let C be an [n, k, d] code over F_q and let T be any set of t coordinates. Consider the set C(T) of codewords which are 0 on T; this set is a subcode of C. Puncturing C(T) on T gives a code over F_q of length n - t called the code shortened on T and denoted C_T.

Property 0.9.2 Let C be an [n, k, d] code over F_q. Let T be a set of t coordinates, and let C^T denote the code C punctured on T. Then:

(i) (C⊥)_T = (C^T)⊥ and (C⊥)^T = (C_T)⊥;

(ii) if t < d, then C^T and (C⊥)_T have dimensions k and n - t - k, respectively;

(iii) if t = d and T is the set of coordinates where a minimum weight codeword is nonzero, then C^T and (C⊥)_T have dimensions k - 1 and n - d - k + 1, respectively.

0.9.4 Direct sums

For i ∈ {1, 2} let C_i be an [n_i, k_i, d_i] code, both over the same finite field F_q. Then their direct sum is the [n_1 + n_2, k_1 + k_2, min(d_1, d_2)] code

C_1 ⊕ C_2 = { (c_1, c_2) | c_1 ∈ C_1, c_2 ∈ C_2 }

If C_i has generator matrix G_i and parity check matrix H_i, then

G_1 ⊕ G_2 = [ G_1   0  ]        and        H_1 ⊕ H_2 = [ H_1   0  ]
            [  0   G_2 ]                                [  0   H_2 ]

are a generator matrix and parity check matrix for C_1 ⊕ C_2.

0.9.5 The (u | u + v) construction

Two codes of the same length can be combined to form a third code of twice the length in a way similar to the direct sum construction. Let C_i be an [n, k_i, d_i] code for i ∈ {1, 2}, both over the same finite field F_q. The (u | u + v) construction produces the [2n, k_1 + k_2, min(2d_1, d_2)] code

C = { (u | u + v) | u ∈ C_1, v ∈ C_2 }

If C_i has generator matrix G_i and parity check matrix H_i, then generator and parity check matrices for C are

[ G_1   G_1 ]        and        [  H_1    0  ]
[  0    G_2 ]                   [ -H_2   H_2 ]

Unlike the direct sum construction of the previous subsection, the (u | u + v) construction can produce codes that are important for more than purely theoretical reasons.

