9 Run Length Codes
M =
0 0 0 1
1 1 0 0
0 0 0 1
1 1 1 0
The following binary string arises when the image is horizontally scanned:
(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0)
The run lengths are r1 = 3, r2 = 3, r3 = 5, r4 = 4, r5 = 1. Huffman coding of the string of run
lengths would yield 10 codebits. A prefix of 0 to the Huffman codeword would be needed to indicate that
the first data symbol is “0”. This yields a total of 11 codebits at the run length encoder output. We have
overlooked the overhead that would be involved in telling the decoder what run lengths appeared and with
what frequencies. In this example, the overhead would be considerable. In the run length encoding of a large
image, the overhead would be almost insignificant. We will discuss the issue of overhead in the case study
at the end of this section. The reader can now see the advantage in using a prefabricated run length code
as opposed to a run length code adapted to the data as we did here—the prefabricated code would involve
no overhead.
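The 10-codebit figure can be checked without drawing the Huffman tree: the total number of codebits equals the sum of the weights created by the merging steps of Huffman's algorithm. A quick Python sketch (the helper name huffman_total_bits is ours):

```python
import heapq
from collections import Counter

def huffman_total_bits(symbols):
    """Total codebits used when the symbol sequence is Huffman coded:
    the sum of the merged weights formed while building the tree."""
    heap = list(Counter(symbols).values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

print(huffman_total_bits([3, 3, 5, 4, 1]))  # 10
```

Adding the single prefix bit that tells the decoder the first data symbol is a "0" gives the 11 codebits quoted above.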
The last section gives the MATLAB program “runlengths.m” for computing the sequence of run
lengths for a horizontally scanned binary image. To illustrate the use of this function, suppose we want to
find the run lengths for the 4 × 4 image above. Then we execute the one-line MATLAB command
>> runlengths(M)
which gives us the result:
ans =
3 3 5 4 1
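For readers following along without MATLAB, the run-length computation can be sketched in Python (run_lengths is our own translation, not the text's runlengths.m):

```python
from itertools import groupby

def run_lengths(image_rows):
    """Run lengths of a binary image scanned row by row."""
    scan = [bit for row in image_rows for bit in row]
    return [len(list(group)) for _, group in groupby(scan)]

M = [[0, 0, 0, 1],
     [1, 1, 0, 0],
     [0, 0, 0, 1],
     [1, 1, 1, 0]]

print(run_lengths(M))  # [3, 3, 5, 4, 1]
```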
>> load lena.dat
>> M=lena;
>> size(M)
ans =
256 256
>> r=runlengths(M);
>> entropy(r)*length(r)/65536
ans =
0.3827
%The preceding is the approximate compression rate in codebits
%per pixel if arithmetic coding of the run lengths is employed
%Now we try separate encoding of the odd and even indexed runlengths
>> length(r)
ans =
5410
>> i=1:2705;
>> i=2*i-1;
%We extract the odd indexed runlengths
>> r_odd = r(i);
%We extract the even indexed runlengths
>> r_even = r(i+1);
%We compute the approximate compression rate
>> compression_rate = (entropy(r_odd)*2705+entropy(r_even)*2705)/65536
compression_rate =
0.3769
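The rate estimates above rest on the empirical entropy of the run-length sequence. Assuming the MATLAB function entropy computes the plug-in entropy in bits per symbol, a Python sketch is:

```python
import math
from collections import Counter

def entropy(seq):
    """Empirical (plug-in) entropy of a sequence, in bits per symbol."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

# Approximate arithmetic-coding rate in codebits per pixel for a
# run-length sequence r of an image with 65536 pixels:
#   entropy(r) * len(r) / 65536
```

This is the computation behind the 0.3827 figure; splitting into odd and even indexed runs pays off because the two subsequences (runs of ones versus runs of zeros) have different statistics.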
Let us discuss the amount of overhead that would have to be used in the above study, thereby increasing
the compression rate. It can be determined that the largest odd indexed run length in our case study is
121 and the largest even indexed run length is 79. (Since the first entry of “lena” is a “1”, this means the
longest run of ones is 121 and the longest run of zeros is 79.) In the first
part of the overhead, we would let the decoder know the two integers 121 and 79. Using the function
“index to bitstring”, we can represent the integer 121 as “111010” and we can represent the integer 79 as
“010000”. We can write each digit in the first representation two times, follow that by “01”, write each digit
in the second representation twice, and follow that by “01”. This results in the string
1111110011000100110000000001
which can be the first part of our overhead—from this string, the integers 121 and 79 can be determined
by the decoder. The decoder then looks at the next 121 bits of overhead to determine which of the integers
{1, 2, . . . , 121} actually occur among the odd indexed runlengths, and then the decoder can look at the next
79 bits of overhead following that to determine which of the integers {1, 2, . . . , 79} actually occurred as even
indexed runlengths. The last piece of overhead needed allows the decoder to determine the lengths
of r_odd and r_even, so that r_odd and r_even can be arithmetically decoded. The length of r_odd is 2705
which converts to
0011001100001100001100
when “index to bitstring” is used and each digit is repeated. We can conclude the overhead with the
preceding string followed by “01” to denote that r_odd and r_even have the same length (we would have
used “10” if the length of r_odd were one greater). The overhead is now complete—we don’t need to give
the frequencies of the runlengths as part of the overhead, because we can use an adaptive arithmetic code
to encode r_odd and r_even. So the total length of the overhead required is
2 ∗ 6 + 2 + 2 ∗ 6 + 2 + 121 + 79 + 22 + 2 = 252
which increases the compression rate only by the insignificant amount of 252/65536 = .0038.
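The self-terminating bitstrings used in this overhead scheme are easy to reproduce. Judging from the examples (121 → “111010”, 79 → “010000”, 2705 → the 22-bit string above), “index to bitstring” appears to represent n by the binary expansion of n + 1 with its leading 1 removed; we assume that convention in the sketch below:

```python
def index_to_bitstring(n):
    """Binary expansion of n + 1 with the leading 1 removed
    (our reading of the text's "index to bitstring" function)."""
    return bin(n + 1)[3:]

def self_terminating(n):
    """Write each digit twice, then append the terminator "01" so the
    decoder can locate the end of the representation."""
    return "".join(2 * digit for digit in index_to_bitstring(n)) + "01"

# First part of the overhead: the integers 121 and 79.
print(self_terminating(121) + self_terminating(79))
# 1111110011000100110000000001
```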
Elias codewords
runlength codeword runlength codeword
1 0 11 11101011
2 1010 12 11101100
3 1011 13 11101101
4 1000100 14 11101110
5 1000101 15 11101111
6 1000110 16 10000010000
7 1000111 17 10000010001
8 11101000 18 10000010010
9 11101001 19 10000010011
10 11101010 20 10000010100
EXAMPLE. To form the Elias codeword for 100, you first expand 100 in binary, getting 1100100. This
requires 7 bits. You then subtract one from 7, getting 6. You then expand 6 in binary, getting 110. You then
write down each entry of 110 twice, getting 111100. You then delete “1” from the beginning and append
“0” to the end, getting 111000. To obtain the Elias codeword for 100, you then concatenate 111000 with
1100100, obtaining
E(100) = 1110001100100
These steps can be reversed. The appearance of the first 01 in E(100) allows the decoder to split 111000
off from 1100100. From 111000, the decoder determines that the next 7 bits are the binary representation
of the integer that was encoded. (Remember that as a result of the run length encoding process, the string
1100100 will be followed by other codebits and will have to be split off from these codebits—the decoder will
know that exactly 7 codebits have to be split off.)
Elias codewords work well when there is a very large number of different run length values. In such a
scenario, the Elias codewords can be shown to yield a run length code compression performance close to that
of an arithmetic coder.
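The construction in the example mechanizes directly. A Python sketch (treating runlength 1, whose codeword in the table is just “0”, as a special case):

```python
def elias(n):
    """Elias codeword for the integer n, following the construction
    illustrated above for n = 100."""
    if n == 1:
        return "0"                      # special case from the table
    body = bin(n)[2:]                   # binary expansion of n
    doubled = "".join(2 * d for d in bin(len(body) - 1)[2:])
    prefix = doubled[1:] + "0"          # drop the leading 1, append 0
    return prefix + body

print(elias(100))  # 1110001100100
print(elias(16))   # 10000010000
```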
9.2.2 Golomb codewords
When the frequencies of the runlengths in the sequence of runlengths r1, r2, . . . , rk follow an approximate
geometric distribution, then good compression performance is obtained by using Golomb codewords to encode
the run lengths. Specifically, suppose for some parameter p between zero and one, the normalized frequency
with which ri = j is approximately equal to
p^(j−1) (1 − p)          (1)
(Count the number of ri for which ri = j and divide by k, and see if this number is roughly that given
in (1) for each j.) Then, for certain values of p, Golomb tabulated the codewords one can use to get good
performance. The Golomb codewords can be shown to give the optimum run length code compression
performance for the geometrically distributed run lengths (1).
Golomb codewords (p = 1/2)
runlength codeword runlength codeword
1 0 6 111110
2 10 7 1111110
3 110 8 11111110
4 1110 9 111111110
5 11110 10 1111111110
The reader can see from these tables that a Golomb codeword w for p^m = 1/2 gives rise to the two Golomb
codewords w0 and w1 for p^(m+1) = 1/2.
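For p = 1/2 the table is just the unary code, which is trivial to generate (a sketch; the helper name is ours):

```python
def golomb_half(j):
    """Golomb codeword for run length j when p = 1/2:
    j - 1 ones followed by a single zero."""
    return "1" * (j - 1) + "0"

# Reproduce the table above:
for j in range(1, 11):
    print(j, golomb_half(j))
```

A quick sanity check against the geometric model: under (1) with p = 1/2, run length j has probability 2^(−j) and codeword length exactly j, so the code length matches −log2 of each probability, the best any prefix code can do.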
9.3 Run length coding of bit planes
Run length coding cannot be directly applied to a gray level image. Suppose there are 256 gray levels in a
gray level image. This means the image is represented as a matrix in which each element (i.e., each pixel
value) comes from the alphabet {0, 1, . . . , 255}. Each pixel value can be represented using 8 bits, the usual
binary code representation of the value, with most significant bit on the left. The most natural approach at
this point would seem to be to horizontally scan the image, concatenating together these 8-bit representations
to obtain one very long binary string, which one could then run length encode. However, this approach turns
out not to be as effective as other approaches for real images.
A better approach for a 256 level image would be to extract 8 “bit planes” from the image. Each of the
eight bit planes is a binary matrix. The first bit plane is obtained by replacing each pixel value in the image
with the most significant bit in its 8-bit representation. The next bit plane is obtained by replacing each
pixel value in the image with the second most significant bit in its 8-bit representation. One continues in this
way until all 8 bit planes are generated. One could then run length encode the separate bit planes, obtaining
8 separate runlength codewords, which could be concatenated together to yield the overall codeword for the
original gray level image. However, here is a simple example to show that this approach will sometimes
not work very well. Suppose you have a 256 level image in which half the pixel values are 127 and half the
pixel values are 128. Direct Huffman coding of the image would yield a compression rate of one codebit per
pixel. Now examine what happens with the 8 bit planes. The binary expansion of 127 is 01111111 and the
binary expansion of 128 is 10000000. Therefore each bit plane will have half of its pixel values equal to zero
and half of its pixel values equal to one. Run length encoding of each bit plane therefore could conceivably
yield a compression rate of one codebit per pixel for each of the 8 bit planes. Overall, this would lead to a
compression rate for the original gray level image of 8 codebits per pixel, no compression at all, and 8 times
worse than what is theoretically possible to achieve!
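The 127/128 example can be checked mechanically. A Python sketch of bit-plane extraction under the usual binary code (bit_plane is our own helper, not one of the text's MATLAB programs):

```python
def bit_plane(pixels, plane):
    """Bit plane of a list of 8-bit pixel values; plane = 1 is the
    most significant bit."""
    return [(p >> (8 - plane)) & 1 for p in pixels]

# Toy image: half the pixels 127 (01111111), half 128 (10000000).
pixels = [127, 128] * 8

# The two values differ in every bit position, so every plane comes
# out half zeros and half ones -- no plane is compressible.
for plane in range(1, 9):
    assert sum(bit_plane(pixels, plane)) == len(pixels) // 2
```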
Generally speaking, separate run-length coding of bit planes will work better if one uses the Gray code
representation for each pixel value instead of the usual binary code representation. When one converts each
integer in the set {0, 1, 2, . . . , 2i −1} into its i-bit Gray code representation, one will find that two consecutive
integers j, j + 1 will have Gray code representation strings that agree with each other except in just one
position. For example, the Gray code representation of the integer 127 is 01000000 and the Gray code
representation of the integer 128 is 11000000. If we use these representations in the example in the preceding
paragraph, the first bit plane will have one half of the pixel values equal to one and the other half equal
to zero, whereas bit planes two through eight will have all pixel values equal to zero. Separate run length
encoding of these bit planes will yield a compression rate almost equal to 1 codebit per pixel for the original
image, the best possible result.
The mapping rule for going from the binary code representation to the Gray code representation is very
easy to describe: Simply complement every entry in the binary code representation that is preceded by a 1.
For example, 10011011 in binary code becomes 11010110 in Gray code because the second, fifth, sixth, and
eighth entries are preceded by ones and are therefore complemented.
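The complement-after-a-1 rule is equivalent to XORing the binary representation with a one-bit right shift of itself, which gives a compact implementation (a Python sketch of what graycode.m presumably computes):

```python
def gray(q, width):
    """width-bit Gray codeword of the integer q: XOR the binary
    representation with itself shifted right one position."""
    return format(q ^ (q >> 1), "0{}b".format(width))

print(gray(127, 8))  # 01000000
print(gray(128, 8))  # 11000000
```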
We have developed MATLAB programs “binarycode.m” and “graycode.m” for computing the binary
code and Gray code representations of an integer. For example, to obtain the 8-bit binary codeword for the
integer 127, you would execute the command
binarycode(127,8)
To obtain the 4-bit binary codeword for the integer 13, you would execute the MATLAB command
binarycode(13,4)
To obtain the 8-bit Gray codeword for the integer 127, execute the command
graycode(127,8)
and to obtain the 4-bit Gray codeword for the integer 13, execute the command
graycode(13,4)
EXERCISE. Try to discover the simple rule for going from the Gray code representation back to the
binary code representation.
9.3.1 Case Study: A 256 × 256 8-bit image
We do separate run length encoding of the bit planes for the 256 × 256 8-bit Lena image, using the Gray
code representation of the pixel values in extracting the bit planes. Here is our MATLAB diary:
%M is the 256 x 256 Lena image; A is the 256 x 8 matrix whose
%row q+1 is the 8-bit Gray codeword of the pixel value q
>> size(M)
ans =
256 256
>> size(A)
ans =
256 8
%Generate bitplanes
>> for i=1:256;
for j=1:256;
q=M(i,j);
y=A(q+1,:);
bitplane1(i,j)=y(1);
bitplane2(i,j)=y(2);
bitplane3(i,j)=y(3);
bitplane4(i,j)=y(4);
bitplane5(i,j)=y(5);
bitplane6(i,j)=y(6);
bitplane7(i,j)=y(7);
bitplane8(i,j)=y(8);
end
end
%Generate runlengths in each bitplane
>> r1=runlengths(bitplane1);
>> r2=runlengths(bitplane2);
>> r3=runlengths(bitplane3);
>> r4=runlengths(bitplane4);
>> r5=runlengths(bitplane5);
>> r6=runlengths(bitplane6);
>> r7=runlengths(bitplane7);
>> r8=runlengths(bitplane8);
>> c1=entropy(r1)*length(r1);
>> c2=entropy(r2)*length(r2);
>> c3=entropy(r3)*length(r3);
>> c4=entropy(r4)*length(r4);
>> c5=entropy(r5)*length(r5);
>> c6=entropy(r6)*length(r6);
>> c7=entropy(r7)*length(r7);
>> c8=entropy(r8)*length(r8);
>> compression_rate = (c1+c2+c3+c4+c5+c6+c7+c8)/65536
compression_rate =
5.9028
We approximated arithmetic codeword lengths using entropy and we didn’t worry about overhead, since
this should be negligible, in view of our earlier discussion. So, we expect that this bit plane encoding method
will yield a compression rate of about 5.9 codebits per pixel for the 256 × 256 8-bit (256 level) Lena image.
EXERCISES. Re-do the above compression experiment using separate arithmetic encoders for the even
and odd indexed runlengths, and see how much better the compression rate becomes. Re-do the above
compression experiment using the binary code representation in obtaining the bitplanes, and see how much
worse the compression rate becomes.
To more easily explain what goes on, suppose we have a 16-level image. The pixel values then come from
the set {0, 1, 2, . . . , 15}. The codewords to be assigned to the pixel values are
ends in “1”, then the current second pass codeword is taken to be the complement of the current first pass
codeword; otherwise, the current second pass codeword is taken as the current first pass codeword. To
illustrate, suppose the precoder has assigned the following sequence of codewords on the first pass:
9.5 Programs
We developed three MATLAB programs, “runlengths.m”, “binarycode.m”, and “graycode.m”. Here are
the m-files:
   y=x;
else
   %pad x on the left with zeros so that the output has length L
   y=[zeros(1,L-n) x];
end