Hamming Code
In mathematical terms, Hamming codes are a class of binary linear codes. For each
integer m ≥ 2 there is a code with m parity bits and 2^m − m − 1 data bits. The parity-
check matrix of a Hamming code is constructed by listing as columns all 2^m − 1
distinct non-zero binary m-tuples; these columns are pairwise linearly independent, and
their order does not matter.
Because of their simplicity, Hamming codes are widely used in computer memory
(RAM). In particular, a single-error-correcting and double-error-detecting variant,
commonly referred to as SECDED, is in widespread use.
Contents
1 History
  1.1 Codes predating Hamming
    1.1.1 Parity
    1.1.2 Two-out-of-five code
2 Hamming codes
  2.1 General algorithm
3 Hamming codes with additional parity (SECDED)
4 Hamming(7,4) code
5 Construction of G and H
6 Encoding
7 Hamming(8,4) code
8 Hamming(11,7) code
9 See also
10 References
11 External links
History
Hamming worked at Bell Labs in the 1940s on the Bell Model V computer, an
electromechanical relay-based machine with cycle times in seconds. Input was fed in on
punch cards, which would invariably have read errors. During weekdays, special code
would find errors and flash lights so the operators could correct the problem. During
after-hours periods and on weekends, when there were no operators, the machine simply
moved on to the next job.
Hamming worked on weekends, and grew increasingly frustrated with having to restart
his programs from scratch due to the unreliability of the card reader. Over the next few
years he worked on the problem of error-correction, developing an increasingly powerful
array of algorithms. In 1950 he published what is now known as Hamming Code, which
remains in use today in applications such as ECC memory.
A number of simple error-detecting codes were used before Hamming codes, but none
were as effective as Hamming codes at the same overhead of space.
Parity
Parity adds a single bit that indicates whether the number of 1 bits in the preceding data
was even or odd. If an odd number of bits is changed in transmission, the message will
change parity and the error can be detected at this point. (Note that the bit that changed
may have been the parity bit itself!) The most common convention is that a parity value
of 1 indicates that there is an odd number of ones in the data, and a parity value of 0
indicates that there is an even number of ones in the data. In other words: The data and
the parity bit together should contain an even number of 1s.
Parity checking is not very robust, since if the number of bits changed is even, the check
bit will be valid and the error will not be detected. Moreover, parity does not indicate
which bit contained the error, even when it can detect it. The data must be discarded
entirely and re-transmitted from scratch. On a noisy transmission medium, a successful
transmission could take a long time or may never occur. However, while the quality of
parity checking is poor, since it uses only a single bit, this method results in the least
overhead. Furthermore, parity checking does allow for the restoration of an erroneous bit
when its position is known.
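As a minimal sketch of the even-parity convention described above (the function names here are illustrative, not from any particular library):

```python
def parity_bit(data):
    """Even parity: the added bit makes the total number of 1s even."""
    return sum(data) % 2

def passes_check(word):
    """True when data plus parity bit together contain an even number of 1s."""
    return sum(word) % 2 == 0

data = [1, 0, 1, 1, 0, 1, 1]           # five 1s, so the parity bit is 1
word = data + [parity_bit(data)]       # transmitted word now has six 1s

assert passes_check(word)              # no error: check passes

word[2] ^= 1                           # flip one bit: the error is detected
assert not passes_check(word)

word[5] ^= 1                           # flip a second bit: the error goes unseen
assert passes_check(word)
```

The final check passing despite two corrupted bits illustrates exactly the weakness noted above: an even number of flips leaves the parity valid.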
Hamming codes

Hamming studied the existing coding schemes, including two-out-of-five, and generalized
their concepts. To start with, he developed a nomenclature to describe the system,
including the number of data bits and error-correction bits in a block. For instance, parity
includes a single bit for any data word, so assuming 7-bit ASCII words, Hamming
described this as an (8,7) code, with eight bits in total, of which 7 are data. A triple
repetition code, in which every bit is sent three times, would be (3,1) by the same logic.
The code rate is the second number divided by the first; for our repetition example, 1/3.
Hamming also noticed the problems with flipping two or more bits, and described this as
the "distance" (it is now called the Hamming distance, after him). Parity has a distance of
2, as any two bit flips will be invisible. The (3,1) repetition has a distance of 3, as three
bits need to be flipped in the same triple to obtain another code word with no visible
errors. A (4,1) repetition (each bit is repeated four times) has a distance of 4, so flipping
two bits can be detected, but not corrected. When three bits flip in the same group there
can be situations where the code corrects towards the wrong code word.
Hamming was interested in two problems at once; increasing the distance as much as
possible, while at the same time increasing the code rate as much as possible. During the
1940s he developed several encoding schemes that were dramatic improvements on
existing codes. The key to all of his systems was to have the parity bits overlap, such that
they managed to check each other as well as the data.
General algorithm

The following general algorithm generates a single-error-correcting (SEC) code for any
number of bits. The choice of even or odd parity is irrelevant: even parity is simpler
from the perspective of theoretical mathematics, but there is no difference in practice.
Bit position:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Encoded bits:  p1  p2  d1  p4  d2  d3  d4  p8  d5  d6  d7  d8  d9  d10 d11 p16 d12 d13 d14 d15

Parity bit coverage:
p1  covers positions 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, ...
p2  covers positions 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, ...
p4  covers positions 4-7, 12-15, 20-23, ...
p8  covers positions 8-15, 24-31, ...
p16 covers positions 16-31, ...
Shown are only 20 encoded bits (5 parity, 15 data) but the pattern continues indefinitely.
The key thing about Hamming Codes that can be seen from visual inspection is that any
given bit is included in a unique set of parity bits. To check for errors, check all of the
parity bits. The pattern of errors, called the error syndrome, identifies the bit in error. If
all parity bits are correct, there is no error. Otherwise, the sum of the positions of the
erroneous parity bits identifies the erroneous bit. For example, if the parity bits in
positions 1, 2 and 8 indicate an error, then bit 1+2+8=11 is in error. If only one parity bit
indicates an error, the parity bit itself is in error.
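The general algorithm can be sketched in Python (the function names are illustrative). Parity bits occupy the power-of-two positions, and the parity bit at position 2^i covers every position whose index has bit i set:

```python
def is_power_of_two(n):
    return n & (n - 1) == 0

def hamming_encode(data):
    """Single-error-correcting Hamming code, even parity.
    Data bits fill the non-power-of-two positions (1-indexed)."""
    r = 0                                   # number of parity bits needed
    while (1 << r) < len(data) + r + 1:
        r += 1
    n = len(data) + r
    code = [0] * (n + 1)                    # index 0 unused; positions 1..n
    bits = iter(data)
    for pos in range(1, n + 1):
        if not is_power_of_two(pos):
            code[pos] = next(bits)
    for i in range(r):                      # set parity bit at position 2^i
        p = 1 << i
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    return code[1:]

def hamming_syndrome(code):
    """Sum of the positions of failing parity checks: 0 means no detected
    error; otherwise it is the 1-indexed position of the flipped bit."""
    n = len(code)
    syndrome, p = 0, 1
    while p <= n:
        if sum(code[i - 1] for i in range(1, n + 1) if i & p) % 2:
            syndrome += p
        p <<= 1
    return syndrome

code = hamming_encode([0, 1, 1, 0, 1, 0, 1])   # 7 data bits -> 11-bit word
assert code == [1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1]
assert hamming_syndrome(code) == 0
code[10] ^= 1                                  # corrupt position 11
assert hamming_syndrome(code) == 11            # syndrome names the position
```

The 7-bit example here (0110101 encoding to 10001100101) is the same one worked by hand later in the article.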
Hamming codes with additional parity (SECDED)

If, in addition, an overall parity bit (bit 0) is included, the code can detect (but not
correct) any two-bit error, making a SECDED code. The overall parity indicates whether
the total number of errors is even or odd. If the basic Hamming code detects an error, but
the overall parity says that there are an even number of errors, an uncorrectable 2-bit
error has occurred.
By including an extra parity bit, it is possible to increase the minimum distance of the
Hamming code to 4. This gives the code the ability to detect and correct a single error
and at the same time detect (but not correct) a double error. (It could also be used to
detect up to 3 errors but not correct any.)
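A sketch of a SECDED decoder following these rules; placing the overall parity bit at the front (index 0) is an assumption of this example, not a requirement of the code:

```python
def secded_status(word):
    """word[0] is an overall even-parity bit over the whole word;
    word[1:] is a standard Hamming codeword (1-indexed positions).
    Classifies the received word per the SECDED rules above."""
    overall_odd = sum(word) % 2 == 1        # odd number of bit flips?
    code = word[1:]
    n = len(code)
    syndrome, p = 0, 1
    while p <= n:
        if sum(code[i - 1] for i in range(1, n + 1) if i & p) % 2:
            syndrome += p
        p <<= 1
    if syndrome == 0:
        # Either no error, or the overall parity bit itself flipped.
        return 'ok' if not overall_odd else 'correctable'
    if overall_odd:
        return 'correctable'    # single error at position `syndrome`
    return 'double error'       # syndrome nonzero, even number of flips

# Hamming codeword for the 7 data bits 0110101, plus overall parity in front:
code = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1]
word = [sum(code) % 2] + code

assert secded_status(word) == 'ok'
bad1 = list(word); bad1[11] ^= 1            # one flipped bit: corrected
assert secded_status(bad1) == 'correctable'
bad2 = list(bad1); bad2[5] ^= 1             # two flipped bits: detected only
assert secded_status(bad2) == 'double error'
```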
Hamming(7,4) code

[Figure: Graphical depiction of the 4 data bits and 3 parity bits, showing which parity
bits apply to which data bits]

Main article: Hamming(7,4)
In 1950, Hamming introduced the (7,4) code. It encodes 4 data bits into 7 bits by adding
three parity bits. Hamming(7,4) can detect and correct single-bit errors and also can
detect (but not correct) double-bit errors.
Construction of G and H

This is the construction of G and H in standard (or systematic) form. Regardless of form,
G and H for linear block codes must satisfy

    G H^T = 0,

i.e., every row of G (and hence every codeword) is orthogonal over F2 to every row of H.
In systematic form, the columns of H are all the non-zero (n − k)-tuples, arranged so
that the weight-1 tuples form the identity block on the right (the order of the remaining
columns does not matter):

    H = ( A | I_{n−k} ).

G is then obtained from H by taking the transpose of the left-hand side of H, with the
k × k identity matrix on the left-hand side of G:

    G = ( I_k | A^T ).
Finally, these matrices can be mutated into equivalent non-systematic codes by the
following operations [Moon, p. 85]: column permutations (swapping columns) and
elementary row operations (replacing a row with a linear combination of rows).
Encoding
Example

From the above matrix we have 2^k = 2^4 = 16 codewords. The codewords x of this
binary code can be obtained from x = aG, where a = (a1, a2, a3, a4) with each ai ∈ F2
(the field with two elements, namely 0 and 1). Therefore, the 16 codewords are the
products aG for the 16 possible choices of a.
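The enumeration x = aG can be sketched directly; the particular systematic generator G = (I_4 | A^T) below is one conventional choice for Hamming(7,4), not the only one:

```python
from itertools import product

# A standard systematic generator for Hamming(7,4): G = (I_4 | A^T).
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(a):
    """Codeword x = aG, all arithmetic over F2 (mod 2)."""
    return [sum(a[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

codewords = [encode(a) for a in product([0, 1], repeat=4)]
assert len(codewords) == 16                       # 2^4 codewords

# For a linear code the minimum distance equals the minimum weight
# of a nonzero codeword; here it is 3, so single errors are correctable.
assert min(sum(c) for c in codewords if any(c)) == 3
```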
Hamming(8,4) code

[Figure: The same (7,4) example from above with an extra parity bit]

The Hamming(7,4) code can easily be extended to an (8,4) code by adding an extra
parity bit on top of the (7,4) encoded word (see Hamming(7,4)). This can be summed up
with revised matrices H and G.
Note that H is not in standard form. To obtain G, elementary row operations can be used
to produce an equivalent matrix in systematic form. For example, the first row in this
matrix is the sum of the second and third rows of H in non-systematic form. Using the
systematic construction for Hamming codes from above, the matrix A is apparent, and
the systematic form of G follows. The non-systematic form of G can be row reduced
(using elementary row operations) to match this matrix.
The addition of the fourth row effectively computes the sum of all the codeword bits
(data and parity) as the fourth parity bit.
For example, 1011 is encoded into 01100110: four of the digits are data, three are the
parity bits from the Hamming(7,4) code, and the final digit is the parity bit added by
Hamming(8,4), chosen so that the eight bits together have even parity.
Finally, it can be shown that the minimum distance has increased from 3, as with the
(7,4) code, to 4 with the (8,4) code. Therefore, the code can be defined as
Hamming(8,4,4).
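This increase in minimum distance can be checked exhaustively; as before, the systematic generator G = (I_4 | A^T) is one conventional choice assumed for illustration:

```python
from itertools import product

# A standard systematic Hamming(7,4) generator, G = (I_4 | A^T).
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode8(a):
    """Hamming(7,4) codeword x = aG over F2, extended with an
    overall even-parity bit to form the (8,4) code."""
    x = [sum(a[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]
    return x + [sum(x) % 2]

words = [encode8(a) for a in product([0, 1], repeat=4)]
dmin = min(sum(u != v for u, v in zip(w1, w2))
           for w1 in words for w2 in words if w1 != w2)
assert dmin == 4            # Hamming(8,4,4): minimum distance 4
```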
Hamming(11,7) code

[Figure: Graphical depiction of the 7 data bits and 4 parity bits, showing which parity
bits apply to which data bits]
[Figure: Mapping in the example data value; the parity of the red, yellow, green, and
blue circles is even]
[Figure: A bit error on bit 11 causes bad parity in the red, yellow, and green circles]
Consider the 7-bit data word 0110101. The parity bits are computed as follows, using
even parity:

Bit position:  1   2   3   4   5   6   7   8   9   10  11
Codeword:      p1  p2  d1  p3  d2  d3  d4  p4  d5  d6  d7

p1 checks positions 1, 3, 5, 7, 9, 11:   1 0 1 0 1 1  (even)
p2 checks positions 2, 3, 6, 7, 10, 11:  0 0 1 0 0 1  (even)
p3 checks positions 4, 5, 6, 7:          0 1 1 0      (even)
p4 checks positions 8, 9, 10, 11:        0 1 0 1      (even)
The new data word (with parity bits) is now "10001100101". We now assume the final
bit gets corrupted and turned from 1 to 0. Our new data word is "10001100100"; and this
time when we analyze how the Hamming codes were created we flag each parity bit as 1
when the even parity check fails.
Received data word: 1 0 0 0 1 1 0 0 1 0 0

p1 checks positions 1, 3, 5, 7, 9, 11:   1 0 1 0 1 0  → Fail → 1
p2 checks positions 2, 3, 6, 7, 10, 11:  0 0 1 0 0 0  → Fail → 1
p3 checks positions 4, 5, 6, 7:          0 1 1 0      → Pass → 0
p4 checks positions 8, 9, 10, 11:        0 1 0 0      → Fail → 1
The final step is to evaluate the value of the parity bits (remembering the bit with lowest
index is the least significant bit, i.e., it goes furthest to the right). The integer value of the
parity bits is 11, signifying that the 11th bit in the data word (including parity bits) is
wrong and needs to be flipped.
          p4  p3  p2  p1
Binary:    1   0   1   1
Decimal:   8       2   1    Σ = 11
Flipping the 11th bit changes 10001100100 back into 10001100101. Removing the
parity bits then gives the original data word 0110101.
Note that in this case, as parity bits do not check each other, if a single parity bit check
fails while all others succeed, then the bit error occurred in this parity bit and not any bit
it checks.
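The whole worked example above can be replayed in a few lines of Python (a sketch, using the same 1-indexed positions as the tables):

```python
word = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0]   # received word, bit 11 corrupted

# Re-run every parity check; each failing check at position p adds p
# to the syndrome, reproducing the p4 p3 p2 p1 sum computed above.
n = len(word)
syndrome, p = 0, 1
while p <= n:
    if sum(word[i - 1] for i in range(1, n + 1) if i & p) % 2:
        syndrome += p
    p <<= 1

assert syndrome == 11          # 1 + 2 + 8: checks p1, p2, p4 failed
word[syndrome - 1] ^= 1        # flip bit 11 back
assert word == [1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1]
```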
Finally, suppose two bits change, at positions x and y. If x and y have the same bit at
position 2^k in their binary representations, then the parity bit corresponding to that
position checks them both, and so will remain the same. However, some parity bit must
be altered: because x ≠ y, their binary representations differ in at least one bit position,
and the parity check for that position will fail. Thus, the Hamming code detects all two-
bit errors; however, it cannot distinguish them from one-bit errors.