H5 Code
David F. Brailsford
School of Computer Science, University of Nottingham, UK.
1. Introduction
The multi-error-correcting code used on the Mariner 9 Mars mission, to deliver error-corrected, 64-level grayscale photographs of the Martian surface, was a first-order Reed-Muller code. These codes were discovered in the mid-1950s by Irving Reed and David Muller.
From a pure mathematical standpoint the first-order Reed-Muller codes can be regarded as a subset of Hadamard codes which are, in turn, generated from a recursive sequence of Hadamard matrices making use of tensor (or ‘Kronecker’, or ‘outer’) products of matrices. It is fortunate that, as mere computer scientists, we can join in on this party simply by taking on board the primitive-recursive algorithm for Hn shown in section 2. After five levels of recursion we reach the code H5, and this is the Reed-Muller code used on Mariner 9.
The basis set for Hn consists of a vector made up of $2^{n-1}$ 0s followed by $2^{n-1}$ 1s, together with two-fold repetitions (each vector concatenated with itself) of the vectors in the basis set for Hn−1.
So:
H0 has basis set 1
⇒ H1 has basis set 01, 11
⇒ H2 has basis set 0011, 0101, 1111
and so on.
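The recursive rule is short enough to state directly in code. Below is a minimal Python sketch, assuming each vector is held as a string of '0' and '1' characters; the function name `basis` is chosen here for illustration and does not come from the original text. It reproduces the three basis sets just listed.

```python
def basis(n: int) -> list[str]:
    """Return the basis set for Hn as a list of '0'/'1' strings.

    Illustrative sketch: `basis` is a hypothetical helper name.
    """
    if n == 0:
        return ["1"]
    half = 2 ** (n - 1)
    # New first vector: 2^(n-1) zeros followed by 2^(n-1) ones.
    first = "0" * half + "1" * half
    # Remaining vectors: each basis vector of Hn-1 concatenated with itself.
    return [first] + [v + v for v in basis(n - 1)]


for n in range(3):
    print(f"H{n} has basis set", ", ".join(basis(n)))
# H0 has basis set 1
# H1 has basis set 01, 11
# H2 has basis set 0011, 0101, 1111
```

Calling `basis(5)` in the same way yields the six 32-bit basis vectors of H5.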
3. Construction of H2
We will now look in detail at the construction of all the codewords for the H2 code, starting from its basis set, which is: x1 = 0011, x2 = 0101, x3 = 1111. Note that the zero vector (denoted as 0 or as x0) is present in all the Hn codes and consists simply of a row of $2^n$ zeroes, since every codeword of Hn has length $2^n$.
We now add together (via bitwise binary addition, i.e. ‘exclusive or’, denoted ⊕) the basis set vectors in all possible combinations, as shown below:
0 (i.e. x0) = 0000
x1 = 0011
x2 = 0101
x3 = 1111
x1 ⊕ x2 = 0110
x1 ⊕ x3 = 1100
x2 ⊕ x3 = 1010
x1 ⊕ x2 ⊕ x3 = 1001
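Forming every ⊕-combination of a basis set is mechanical. The Python sketch below, with hypothetical helper names `xor` and `all_codewords` and vectors held as '0'/'1' strings, walks through the subsets in the same order as the list above (the empty subset gives the zero vector).

```python
from itertools import combinations


def xor(a: str, b: str) -> str:
    """Bitwise exclusive-or of two equal-length '0'/'1' strings (illustrative helper)."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


def all_codewords(basis: list[str]) -> list[str]:
    """XOR together every subset of the basis set, smallest subsets first."""
    words = []
    for r in range(len(basis) + 1):
        for subset in combinations(basis, r):
            word = "0" * len(basis[0])  # the empty subset gives the zero vector
            for v in subset:
                word = xor(word, v)
            words.append(word)
    return words


print(all_codewords(["0011", "0101", "1111"]))
# ['0000', '0011', '0101', '1111', '0110', '1100', '1010', '1001']
```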
It is now apparent that H2 is an even-parity code of length 4. The first 3 bits, counting from the left, can be thought of as the message bits and the 4th bit is then the parity check. The minimum distance between the codewords is 2 and so the code can be characterised, in [n, k, d] notation, as a [4, 3, 2]-code. Like all codes with a single-bit parity check, its capabilities are limited to detecting one error; it cannot correct any.
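These parameters can be confirmed by brute force. A small Python sketch, assuming the eight codewords are typed in as listed above; k is taken as log2 of the number of codewords and d as the smallest Hamming distance between any two distinct codewords (the helper name `distance` is illustrative).

```python
from itertools import combinations

H2 = ["0000", "0011", "0101", "1111", "0110", "1100", "1010", "1001"]


def distance(a: str, b: str) -> int:
    """Hamming distance: the number of positions at which a and b differ."""
    return sum(p != q for p, q in zip(a, b))


n = len(H2[0])                  # codeword length
k = len(H2).bit_length() - 1    # 2^k codewords (a power of two), so k = log2(8) = 3
d = min(distance(a, b) for a, b in combinations(H2, 2))
even_parity = all(w.count("1") % 2 == 0 for w in H2)

print(f"[{n}, {k}, {d}]", even_parity)                       # [4, 3, 2] True
print("detects", d - 1, "error(s), corrects", (d - 1) // 2)  # detects 1 error(s), corrects 0
```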
Another fascinating property of this code is that it embeds all of the eight possible 3-bit patterns.
Look at the codeword list again, with bit positions, starting from the left, numbered from 1
upwards. An inspection of bits 1, 2 and 4 (the “power-of-two positions”) over all eight codewords reveals that they form a collection (sometimes called C3) of all the possible 3-bit patterns, though not in any particular order.
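The power-of-two property is equally easy to check by machine. In the sketch below, the helper `bits_at` (a name introduced here for illustration) uses 1-based positions counted from the left, to match the text.

```python
H2 = ["0000", "0011", "0101", "1111", "0110", "1100", "1010", "1001"]


def bits_at(word: str, positions: list[int]) -> str:
    """Return the bits at the given 1-based positions, counting from the left."""
    return "".join(word[p - 1] for p in positions)


patterns = [bits_at(w, [1, 2, 4]) for w in H2]
print(patterns)
# ['000', '001', '011', '111', '010', '110', '100', '101']
print(sorted(patterns) == [format(i, "03b") for i in range(8)])  # True
```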
Finally, we note that these 4-bit codewords form a linear vector space because the bitwise combination of any two of them, using the ⊕ operation, yields a codeword in this same set. Two of the codewords are worth a special mention: 0000, when added to any other codeword, leaves it unchanged; 1111, when added to any other codeword, delivers back the bitwise inverse of that codeword, i.e. with every bit flipped.
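Because H2 has only eight codewords, both the closure property and the roles of 0000 and 1111 can be confirmed exhaustively over all 64 ordered pairs. A sketch, with `xor` as an illustrative helper on '0'/'1' strings:

```python
from itertools import product

H2 = ["0000", "0011", "0101", "1111", "0110", "1100", "1010", "1001"]


def xor(a: str, b: str) -> str:
    """Bitwise exclusive-or of two equal-length '0'/'1' strings (illustrative helper)."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


closed = all(xor(a, b) in H2 for a, b in product(H2, repeat=2))
identity = all(xor(w, "0000") == w for w in H2)
inverses = [xor(w, "1111") for w in H2]

print(closed, identity)  # True True
print(inverses)          # ['1111', '1100', '1010', '0000', '1001', '0011', '0101', '0110']
```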
Readers familiar with matrix operations will note that it is not necessary to write out all the basis-set combinations, as we have done above, in order to develop the full set of codewords. By writing out the basis vectors (including x0) as a matrix we obtain:
0 0 0 0
0 0 1 1
0 1 0 1
1 1 1 1
This generator matrix can be multiplied on the left by a 0/1 selection vector, with the arithmetic done modulo 2, that specifies which members of the basis set are to be combined, via ⊕, to form any given codeword.
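As a sketch of the matrix view, assuming the rows of G are x0, x1, x2, x3 as printed above: a codeword is sG with all arithmetic modulo 2, where s is a 0/1 row vector with one entry per row (the all-zero x0 row never contributes). The name `encode` is illustrative.

```python
from itertools import product

# Generator matrix as printed above: rows x0 (all zeros), x1, x2, x3.
G = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
]


def encode(s: list[int], G: list[list[int]]) -> list[int]:
    """Row vector s times matrix G, with all arithmetic done modulo 2 (illustrative)."""
    return [sum(si * row[j] for si, row in zip(s, G)) % 2 for j in range(len(G[0]))]


print(encode([0, 1, 1, 0], G))  # x1 ⊕ x2 -> [0, 1, 1, 0]
print(encode([0, 1, 1, 1], G))  # x1 ⊕ x2 ⊕ x3 -> [1, 0, 0, 1]

# All 16 selection vectors together produce just the 8 distinct codewords of H2.
codewords = {tuple(encode(list(s), G)) for s in product([0, 1], repeat=4)}
print(len(codewords))           # 8
```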
4. Construction of H3
The next stage along the way to H5 is to generate H3 by iterating the method of section 2 once again, but now using the H2 basis set as the starting point. In this way we obtain:
x1 = 00001111; x2 = 00110011; x3 = 01010101; x4 = 11111111
and, by noting that the 0 vector is x0 = 00000000 on this occasion, we can once again construct all combinations of x1 to x4 in order to generate the complete set of 16 codewords:
Codewords 0–7    Codewords 8–15
00000000         01100110
00001111         11001100
00110011         10101010
01010101         01101001
11111111         11000011
00111100         10100101
01011010         10011001
11110000         10010110
There are no prizes for guessing that within these 16 codewords the C4 code (i.e. all the 4-bit
combinations) is embedded at bit positions 1, 2, 4, and 8.
A careful inspection of all 16 codewords reveals that this is an [8, 4, 4]-code. Therefore the number of correctable errors in any codeword will be $\lfloor (d-1)/2 \rfloor = \lfloor 3/2 \rfloor = 1$. At this stage in the Hn recursive build-up we have generated a code that is broadly similar in its capabilities to the [7, 4, 3] Hamming code.
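For H3, brute-force checks confirm the [8, 4, 4] parameters, the single-error correction capability and the embedding of C4. The `span` helper below (a name chosen for this sketch) builds the codeword set by folding in one basis vector at a time, starting from the zero vector.

```python
from itertools import combinations

BASIS_H3 = ["00001111", "00110011", "01010101", "11111111"]


def xor(a: str, b: str) -> str:
    """Bitwise exclusive-or of two equal-length '0'/'1' strings (illustrative helper)."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


def span(basis: list[str]) -> set[str]:
    """All XOR combinations of the basis vectors, including the zero vector."""
    words = {"0" * len(basis[0])}
    for v in basis:
        # Keep every existing word and add a copy with v folded in.
        words |= {xor(w, v) for w in words}
    return words


H3 = span(BASIS_H3)
d = min(sum(p != q for p, q in zip(a, b)) for a, b in combinations(sorted(H3), 2))
embedded = {"".join(w[p - 1] for p in (1, 2, 4, 8)) for w in H3}

print(len(H3), d, (d - 1) // 2)  # 16 4 1
print(len(embedded) == 16)       # True: positions 1, 2, 4, 8 carry all of C4
```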
5. Construction of H4
We now turn back to section 2 once again and note that for H4 the new basis set’s first vector will consist of eight zeros followed by eight ones. Four other basis set vectors are obtained by two-fold repetition of the 4 basis set vectors we used for H3. Hence the basis set is:
0000000011111111; 0000111100001111; 0011001100110011; 0101010101010101; 1111111111111111
As ever we can supplement this basis set with a 0 vector consisting of 16 zeroes.
Exercise: Using the basis set just given, construct the full set of 32 codewords for H4. Verify that this forms a [16, 5, 8] linear code and that the C5 code is embedded at bit positions 1, 2, 4, 8 and 16.
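Readers who want to check their answer to the exercise mechanically could use a sketch along these lines. It relies on the standard fact that, for a linear code, the minimum distance equals the smallest weight of any non-zero codeword; the helper names are illustrative.

```python
BASIS_H4 = ["0000000011111111", "0000111100001111", "0011001100110011",
            "0101010101010101", "1111111111111111"]


def xor(a: str, b: str) -> str:
    """Bitwise exclusive-or of two equal-length '0'/'1' strings (illustrative helper)."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


def span(basis: list[str]) -> set[str]:
    """All XOR combinations of the basis vectors, including the zero vector."""
    words = {"0" * len(basis[0])}
    for v in basis:
        words |= {xor(w, v) for w in words}
    return words


H4 = span(BASIS_H4)
n, k = len(BASIS_H4[0]), len(BASIS_H4)
linear = all(xor(a, b) in H4 for a in H4 for b in H4)
# For a linear code, minimum distance = smallest weight of a non-zero codeword.
d = min(w.count("1") for w in H4 if "1" in w)
embedded = {"".join(w[p - 1] for p in (1, 2, 4, 8, 16)) for w in H4}

print(len(H4), [n, k, d], linear)  # 32 [16, 5, 8] True
print(len(embedded) == 2 ** k)     # True: C5 sits at positions 1, 2, 4, 8, 16
```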
The Hamming code that is closest to H4, in terms of codeword length, is [15, 11, 3] and this shows us very clearly the nature of the tradeoff we are now making. H4 has 5 message bits, i.e. fewer than half of the 11 message bits the Hamming code allows. But in return H4 can correct $\lfloor (d-1)/2 \rfloor = \lfloor 7/2 \rfloor = 3$ errors, whereas Hamming codes can only correct 1 error.
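We can now speculate that one further application of section 2’s recursive algorithm will certainly result in fairly lengthy (32-bit) codewords but might possibly yield even more impressive error-correction capabilities.
The full set of 64 H5 codewords is listed below, in ascending order of the 6-bit message carried at bit positions 1, 2, 4, 8, 16 and 32 (messages 0–31 in the left column, 32–63 in the right):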
00000000000000000000000000000000 10101010101010101010101010101010
00000000000000001111111111111111 10101010101010100101010101010101
00000000111111111111111100000000 10101010010101010101010110101010
00000000111111110000000011111111 10101010010101011010101001010101
00001111111100000000111111110000 10100101010110101010010101011010
00001111111100001111000000001111 10100101010110100101101010100101
00001111000011111111000011110000 10100101101001010101101001011010
00001111000011110000111100001111 10100101101001011010010110100101
00111100001111000011110000111100 10010110100101101001011010010110
00111100001111001100001111000011 10010110100101100110100101101001
00111100110000111100001100111100 10010110011010010110100110010110
00111100110000110011110011000011 10010110011010011001011001101001
00110011110011000011001111001100 10011001011001101001100101100110
00110011110011001100110000110011 10011001011001100110011010011001
00110011001100111100110011001100 10011001100110010110011001100110
00110011001100110011001100110011 10011001100110011001100110011001
01100110011001100110011001100110 11001100110011001100110011001100
01100110011001101001100110011001 11001100110011000011001100110011
01100110100110011001100101100110 11001100001100110011001111001100
01100110100110010110011010011001 11001100001100111100110000110011
01101001100101100110100110010110 11000011001111001100001100111100
01101001100101101001011001101001 11000011001111000011110011000011
01101001011010011001011010010110 11000011110000110011110000111100
01101001011010010110100101101001 11000011110000111100001111000011
01011010010110100101101001011010 11110000111100001111000011110000
01011010010110101010010110100101 11110000111100000000111100001111
01011010101001011010010101011010 11110000000011110000111111110000
01011010101001010101101010100101 11110000000011111111000000001111
01010101101010100101010110101010 11111111000000001111111100000000
01010101101010101010101001010101 11111111000000000000000011111111
01010101010101011010101010101010 11111111111111110000000000000000
01010101010101010101010101010101 11111111111111111111111111111111
Clearly H5 is an even-parity [32, 6, 16]-code. Its six message bits (at positions 1, 2, 4, 8, 16 and 32) allow for exactly 64 grayscale shades. Moreover, the minimum distance of 16 allows up to $\lfloor (16-1)/2 \rfloor = \lfloor 15/2 \rfloor = 7$ errors to be corrected.
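To see the 7-error claim in action, the sketch below uses brute-force nearest-codeword decoding: compare the received word with all 64 codewords and pick the closest. This naive technique is chosen purely for illustration and makes no claim to be the method used on Mariner 9. With d = 16, any pattern of at most 7 flipped bits leaves the transmitted codeword at distance at most 7, while every other codeword is at distance at least 16 − 7 = 9. All helper names are illustrative.

```python
import random


def basis(n: int) -> list[str]:
    """Basis set for Hn via the recursive rule of section 2 (illustrative helper)."""
    if n == 0:
        return ["1"]
    half = 2 ** (n - 1)
    return ["0" * half + "1" * half] + [v + v for v in basis(n - 1)]


def xor(a: str, b: str) -> str:
    """Bitwise exclusive-or of two equal-length '0'/'1' strings."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


def span(vectors: list[str]) -> set[str]:
    """All XOR combinations of the vectors, including the zero vector."""
    words = {"0" * len(vectors[0])}
    for v in vectors:
        words |= {xor(w, v) for w in words}
    return words


def distance(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(p != q for p, q in zip(a, b))


H5 = sorted(span(basis(5)))
d = min(w.count("1") for w in H5 if "1" in w)
print(len(H5), len(H5[0]), d)  # 64 32 16


def decode(received: str) -> str:
    """Brute-force nearest-codeword decoding over all 64 codewords (illustration only)."""
    return min(H5, key=lambda c: distance(c, received))


rng = random.Random(2024)
sent = rng.choice(H5)
flipped = set(rng.sample(range(32), 7))  # corrupt 7 distinct bit positions
received = "".join(("1" if b == "0" else "0") if i in flipped else b
                   for i, b in enumerate(sent))
print(distance(sent, received), decode(received) == sent)  # 7 True
```

Comparing against every codeword costs 64 distance computations per received word, which is fine for a demonstration but is not how a practical decoder would be organised.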
Even as early as the mid-1960s, on the Mariner 4 mission (which took photos during a Mars ‘fly-by’ in 1965), the 64-shade grayscale was in use. By the time Mariner 9 came along (1971–1972) it was possible to put the spacecraft in orbit around Mars. Furthermore, better technology enabled an 18-bit scale for each pixel. Even so, these 18 bits were packaged up into three 6-bit sub-packages precisely so that the Reed-Muller H5 6-message-bit code could still be used.
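As a purely hypothetical illustration of this packaging, the sketch below splits an 18-bit pixel value into three 6-bit chunks and encodes each one as an H5 codeword, looked up by the message bits it carries at positions 1, 2, 4, 8, 16 and 32. The chunk order (most significant first) and the names `ENCODE` and `encode_pixel` are assumptions made here; the text does not say exactly how the 18 bits were divided.

```python
def basis(n: int) -> list[str]:
    """Basis set for Hn via the recursive rule of section 2 (illustrative helper)."""
    if n == 0:
        return ["1"]
    half = 2 ** (n - 1)
    return ["0" * half + "1" * half] + [v + v for v in basis(n - 1)]


def xor(a: str, b: str) -> str:
    """Bitwise exclusive-or of two equal-length '0'/'1' strings."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


def span(vectors: list[str]) -> set[str]:
    """All XOR combinations of the vectors, including the zero vector."""
    words = {"0" * len(vectors[0])}
    for v in vectors:
        words |= {xor(w, v) for w in words}
    return words


H5 = span(basis(5))
MESSAGE_POSITIONS = (1, 2, 4, 8, 16, 32)  # 1-based, counting from the left

# Index every codeword by the 6-bit message it carries at the power-of-two positions.
ENCODE = {int("".join(w[p - 1] for p in MESSAGE_POSITIONS), 2): w for w in H5}


def encode_pixel(value: int) -> list[str]:
    """Hypothetical: split an 18-bit value into three 6-bit chunks and encode each.

    Assumption: most significant chunk first; the text does not specify the split.
    """
    if not 0 <= value < 2 ** 18:
        raise ValueError("pixel value must fit in 18 bits")
    chunks = [(value >> shift) & 0b111111 for shift in (12, 6, 0)]
    return [ENCODE[c] for c in chunks]


for word in encode_pixel(0b000001_111111_000000):
    print(word)
```

With this choice, a pixel costs 3 × 32 = 96 transmitted bits for its 18 bits of information.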
6.1. Decoding H5
It almost inevitably turns out to be far easier to invent a good code, one that is robust against various levels of noise, than it is to decode it when damaged codewords are received. Indeed there is a general result to the effect that decoding, in the worst case, can be NP-complete i.e. you
7. Further reading
The list below gives some details of publications I have found useful. I have listed them in order of the amount of mathematics you need to get to grips with what they are saying, with the easiest listed first. All of them touch on Reed-Muller codes in one way or another, although the SMP book gives the recursive algorithm without saying what the Hn codes are called!
SMP (1991)
Ron Haydock, Stan Dolan, and Andy Hall,
Information and Coding–The School Mathematics Project,
Cambridge University Press, 1991.
This book is one of the very few introductory texts that are accessible to those with only a High School level of Mathematics. In places it uses its own non-standard notation (e.g. "codenames" rather than "codewords" and "basic set" rather than "basis set") so be careful. In addition it uses the notation (n, k, d) to characterise a linear code, rather than [n, k, d]. This leads to confusion with respect to earlier books (e.g. Welsh, Hill – see below), which both reserve round brackets for denoting an (n, M, d)-code where M is the ‘number of messages’, as opposed to ‘the number of bits (k) devoted to the message part’. Given that M is not constrained to being a power of two, and can take on odd or even values, the (n, M, d) notation can represent non-linear codes.
Kun (2015)
Jeremy Kun, The Codes of Solomon, Reed and Muller, March 2015.
https://jeremykun.com/2015/03/23/the-codes-of-solomon-reed-and-muller
Jeremy Kun provides a very useful set of brief papers and primers on many topics, including
Coding Theory. The level of mathematics needed is about the same as that for Welsh’s book.
Welsh (1988)
Dominic Welsh, Codes and Cryptography, Oxford University Press, 1988.
Chapter 4 is especially useful. If you have had at least some exposure to University-level linear
algebra (including vector spaces and matrices) then you should be able to cope.
Hill (1986)
Raymond Hill, A First Course in Coding Theory, Oxford University Press, 1986.
Another approachable book, written at an elementary level for pure mathematicians. Computer
Scientists should be able to cope provided they have had some experience of linear algebra and
vector spaces.
MacKay (2003)
David J. C. MacKay, Information Theory, Inference, and Learning Algorithms,
Cambridge University Press, 2003.
(see also http://www.inference.phy.cam.ac.uk/mackay/)
David MacKay was a most talented polymath, sadly taken away from us in 2016 as a result of cancer. He had been a champion of the move, in recent years, to LDPC (Low-Density Parity-Check) codes. This book is a 2nd/3rd-year university-level text for keen engineers, mathematicians and computer scientists.
Reed (1954)
Irving Reed, A class of multiple-error-correcting codes and the decoding scheme,
Transactions of the IRE Professional Group on Information Theory, vol. 4, pp. 38–49, 1954.
A surprisingly approachable paper, from a renowned coding theorist, setting out the earliest general approach to majority-logic decoding in the context of what we now call Reed-Muller codes.
Appendix I. This is an awk script for generating and testing the H5 code
{
# basis vectors for H2
x0 = "0000"; x1 = "0011"; x2 = "0101"; x3 = "1111"
#basis vectors for H3
y0 = "00000000"; y1 = "00001111"; y2 = "00110011"; y3 = "01010101"; y4 = "11111111";
# basis vectors for H4
z0 = "0000000000000000"; z1 = "0000000011111111"; z2 = "0000111100001111";
z3 = "0011001100110011"; z4 = "0101010101010101"; z5 = "1111111111111111";
# Basis vectors for H5. Build these from concatenations of the z basis vectors for H4
w0 = conc(z0, z0); w1 = conc(z0,z5); w2 = conc(z1, z1); w3 = conc(z2, z2);
w4 = conc(z3,z3); w5 = conc(z4,z4); w6 = conc(z5,z5);
#
# H5 codewords will be built up in the array cw from all combinations of the above
# w[0-6]. Note that the 6 "message bits" in an H5 codeword (reading from the left
# and starting at 1) are at bit positions 1,2,4,8,16,32. The ordering of
# codewords in the cw array might seem arbitrary but it owes much to Pascal’s
# Triangle - which makes hand-checking easier - See assignments to "cw" below.
# The integer array "map", which now follows, points at elements in the cw array
# and reorders them so as to impose an ascending order of 6-bit message values
# i.e. increasing grayscale vals. Hence, map[0] leads to msg bits of "000000",
# map[31] to msg bits of "011111" and map[63] delivers a msg of "111111" etc.
#
map[0] = 0; map[1] = 1; map[2] = 7; map[3] = 2; map[4] = 12; map[5] = 22; map[6] = 8;
map[7] = 3; map[8] = 16; map[9] = 26; map[10] = 42; map[11] = 32; map[12] =13;
map[13] = 23; map[14] = 9; map[15] = 4; map[16] = 19; map[17] = 29; map[18] = 45;
map[19] = 35; map[20] = 52; map[21] = 57; map[22] = 48; map[23] = 38; map[24] = 17;
map[25] = 27; map[26] = 43; map[27] = 33; map[28] = 14; map[29] = 24; map[30] = 10;
map[31] = 5; map[32] = 21; map[33] = 31; map[34] = 47; map[35] = 37; map[36] = 54;
map[37] = 59; map[38] = 50; map[39] = 40; map[40] = 56; map[41] = 61; map[42] = 63;
map[43] = 62; map[44] = 55; map[45] = 60; map[46] = 51; map[47] = 41; map[48] = 20;
map[49] = 30; map[50] = 46; map[51] = 36; map[52] = 53; map[53] = 58; map[54] = 49;
map[55] = 39; map[56] = 18; map[57] = 28; map[58] = 44; map[59] = 34; map[60] = 15;
map[61] = 25; map[62] = 11; map[63] = 6;
#
#Building up the H5 (Reed-Muller) codewords
#
#Single vectors that form the basis set
cw[0] = w0; cw[1] = w1; cw[2] = w2; cw[3] = w3; cw[4] = w4; cw[5] = w5; cw[6] = w6;
# combinations of two basis vectors
cw[7] = rowxor(w1, w2)
cw[8] = rowxor(w1, w3)
cw[9] = rowxor(w1, w4)
cw[10] = rowxor(w1, w5)
cw[11] = rowxor(w1, w6)
cw[12] = rowxor(w2, w3)
cw[13] = rowxor(w2, w4)
cw[14] = rowxor(w2, w5)
cw[15] = rowxor(w2, w6)
cw[16] = rowxor(w3, w4)
cw[17] = rowxor(w3, w5)
cw[18] = rowxor(w3, w6)
cw[19] = rowxor(w4, w5)
cw[20] = rowxor(w4, w6)
cw[21] = rowxor(w5, w6)
# combs of 3 basis vectors
cw[22] = rowxor(cw[7], w3) # triples, starting with those containing w1 and w2
x = codematch(newtest48, 0);
x = codematch(random, 1);
# Test print of the codeword norms
#print ("\n here are the norms in order\n\n");
#for (k = 0; k <= 63; k++) print norm(cw[map[k]]);
exit
} #end of H5 program
# The function below is not needed under GNU awk (gawk), which provides a
# built-in xor() function.
#
#function xor (a, b){
# primitive xor on single char zeroes and ones
#if ((a == "0" && b == "0") || (a == "1" && b == "1"))
#return "0"
#else return "1"
#}
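The hand-written map array can, in principle, be derived by machine. The Python sketch below (not part of the original script) rebuilds the cw ordering and inverts it by reading each codeword's message bits at positions 1, 2, 4, 8, 16 and 32. It assumes that the elided part of the script continues the lexicographic pattern visible above: single vectors, then all pairs, then triples from w1 ⊕ w2 ⊕ w3 onwards, and so on. The assertions compare a handful of entries against the printed map.

```python
from itertools import combinations


def xor(a: str, b: str) -> str:
    """Bitwise exclusive-or of two equal-length '0'/'1' strings (illustrative helper)."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))


# H5 basis vectors w0..w6, built the same way as in the awk script.
z = ["0" * 16, "0000000011111111", "0000111100001111",
     "0011001100110011", "0101010101010101", "1" * 16]
w = [z[0] + z[0], z[0] + z[5]] + [v + v for v in z[1:]]

# Assumption: cw holds w0..w6, then all pairs, triples, ... of w1..w6 in
# lexicographic order (the visible part of the script follows this pattern).
cw = list(w)
for r in range(2, 7):
    for combo in combinations(w[1:], r):
        word = combo[0]
        for v in combo[1:]:
            word = xor(word, v)
        cw.append(word)


def message(word: str) -> int:
    """6-bit message read from bit positions 1, 2, 4, 8, 16, 32 (1-based)."""
    return int("".join(word[p - 1] for p in (1, 2, 4, 8, 16, 32)), 2)


# map_[m] is the index in cw of the codeword carrying message m.
map_ = [0] * 64
for index, word in enumerate(cw):
    map_[message(word)] = index

print(map_[:8])  # [0, 1, 7, 2, 12, 22, 8, 3]
```

The name `map_` is used rather than `map` to avoid shadowing Python's built-in `map` function.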