Data Compression Seminar John Kieffer

9 Run Length Codes


Run length codes are useful for the compression of binary data. Suppose you have a binary data string. If
the first symbol in the string is “0”, then there are positive integers $r_1, r_2, \ldots, r_k$ (the run lengths) such
that the string consists of $r_1$ zeroes, followed by $r_2$ ones, followed by $r_3$ zeroes, etc. If the first symbol
is “1”, then the only difference is that you have $r_1$ ones, followed by $r_2$ zeroes, etc. A run length code
encodes the binary data string by encoding the sequence of run lengths $r_1, r_2, \ldots, r_k$. (In addition, one
prefixes the run length encoder output with a “0” or “1” to denote how the string begins.) To encode
the run lengths, one could use a Huffman code, an arithmetic code, or a prefabricated code which works well
on a certain type of data. For slightly better compression, one could separately encode the odd indexed
run lengths ($r_1, r_3, r_5, \ldots$) and the even indexed run lengths ($r_2, r_4, r_6, \ldots$).
In Section 9.1, we apply run length codes to the compression of binary images. In Section 9.2, we consider
prefabricated run length codes. In Sections 9.3 and 9.4, we cover techniques by which run length coding can be
applied to gray level images.
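
The description above can be sketched in a few lines of MATLAB. This is only an illustration: the example string and the variable names (x, first, r, xhat) are made up here, and the loop-based decoder is one of several reasonable ways to rebuild the string from the prefix bit and the run lengths.

```matlab
% Sketch: run length encode a binary row vector, then decode it again.
x = [1 1 0 0 0 1 0 0 1 1 1 1];           % hypothetical binary data string

% Encoder: a run ends wherever a symbol differs from its successor.
ends = [find(diff(x) ~= 0) numel(x)];     % index of the last symbol of each run
r = diff([0 ends]);                       % run lengths r_1, ..., r_k
first = x(1);                             % prefix bit telling how the string begins

% Decoder: the runs alternate in value, starting from the prefix bit.
vals = mod(first + (0:numel(r)-1), 2);    % symbol value of each run
starts = cumsum([1 r(1:end-1)]);          % index of the first symbol of each run
xhat = zeros(1, sum(r));
for k = 1:numel(r)
    xhat(starts(k):starts(k)+r(k)-1) = vals(k);
end
```

Here r comes out as 2, 3, 1, 2, 4 with prefix bit 1, and xhat reproduces x, which is the sense in which the prefix bit plus the run lengths determine the string.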

9.1 Run length coding of binary images


Consider the binary image given by the following matrix M :

$$M = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}$$

The following binary string arises when the image is horizontally scanned:

(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0)
The run lengths are $r_1 = 3$, $r_2 = 3$, $r_3 = 5$, $r_4 = 4$, $r_5 = 1$. Huffman coding of the string of run
lengths would yield 10 codebits. A prefix of 0 to the Huffman codeword would be needed to indicate that
the first data symbol is “0”. This yields a total of 11 codebits at the run length encoder output. We have
overlooked the overhead that would be involved in telling the decoder what run lengths appeared and with
what frequencies. In this example, the overhead would be considerable. In the run length encoding of a large
image, the overhead would be almost insignificant. We will discuss the issue of overhead in the case study
at the end of this section. The reader can now see the advantage in using a prefabricated run length code
as opposed to a run length code adapted to the data as we did here—the prefabricated code would involve
no overhead.
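
The figure of 10 codebits can be checked without building the Huffman tree explicitly. The sketch below relies on a standard property of Huffman's procedure: the total codeword length equals the sum of the weights created by the successive merges. The variable names are chosen here for illustration.

```matlab
% Sketch: total Huffman codeword length for a sequence of run lengths.
r = [3 3 5 4 1];                   % run lengths of the 4 x 4 example
[~, ~, idx] = unique(r);           % idx labels the distinct run length values
w = accumarray(idx(:), 1)';        % frequency of each distinct value
total = 0;
while numel(w) > 1
    w = sort(w);                   % the two smallest weights come first
    merged = w(1) + w(2);          % merge them into one tree node
    total = total + merged;        % each merge adds one bit per symbol beneath it
    w = [merged w(3:end)];
end
% total now holds the number of Huffman codebits for the whole sequence
```

For this sequence the merges produce weights 2, 3 and 5, so total is 10, in agreement with the text.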
Section 9.5 gives the MATLAB program “runlengths.m” for computing the sequence of run lengths
for a horizontally scanned binary image. To illustrate the use of this function, suppose we want to
find the run lengths for the 4 × 4 image above. Then we execute the following one-line MATLAB command,
which gives us the result:

>> runlengths([0 0 0 1; 1 1 0 0; 0 0 0 1; 1 1 1 0])

ans =

3 3 5 4 1

9.1.1 Case Study: 256 × 256 binary image


We applied run length coding to the 256 × 256 binary image “Lena”. Here is the diary of our MATLAB
session:

>> load lena.dat
>> M=lena;
>> size(M)

ans =

256 256

>> r=runlengths(M);
>> entropy(r)*length(r)/65536

ans =

0.3827
%The preceding is the approximate compression rate in codebits
%per pixel if arithmetic coding of the run lengths is employed
%Now we try separate encoding of the odd and even indexed runlengths
>> length(r)

ans =

5410

>> i=1:2705;
>> i=2*i-1;
%We extract the odd indexed runlengths
>> r_odd = r(i);
%We extract the even indexed runlengths
>> r_even = r(i+1);
%We compute the approximate compression rate
>> compression_rate = (entropy(r_odd)*2705+entropy(r_even)*2705)/65536

compression_rate =

0.3769
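
The function “entropy” used in the diary is not among the programs listed in Section 9.5. A plausible reading, and it is only an assumption, is that it returns the first-order empirical entropy of its argument in bits per symbol, so that entropy(r)*length(r) estimates the arithmetic codeword length. Under that assumption the computation can be sketched on the run lengths of the 4 × 4 example from earlier in this section:

```matlab
% Sketch: empirical entropy estimate of the codeword length (assumed meaning of "entropy").
r = [3 3 5 4 1];                        % run lengths of the 4 x 4 example image
[~, ~, idx] = unique(r);                % idx labels the distinct run length values
p = accumarray(idx(:), 1) / numel(r);   % relative frequency of each distinct value
H = -sum(p .* log2(p));                 % empirical entropy in bits per run length
rate = H * numel(r) / 16;               % estimated codebits per pixel (16 pixels)
```

H is about 1.92 bits per run length here, giving roughly 9.6 codebits, or about 0.60 codebits per pixel, before overhead.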

Let us discuss the amount of overhead that would be needed in the above study, and by how much it would increase
the compression rate. It can be determined that the largest odd indexed run length in our case study is
121 and the largest even indexed run length is 79. (Since the first entry of “lena” is a “1”, this corresponds
to a longest run of ones equal to 121 ones, and the longest run of zeroes equal to 79 zeroes.) In the first
part of the overhead, we would let the decoder know the two integers 121 and 79. Using the function
“index to bitstring”, we can represent the integer 121 as “111010” and we can represent the integer 79 as
“010000”. We can write each digit in the first representation two times, follow that by “01”, write each digit
in the second representation twice, and follow that by “01”. This results in the string

1111110011000100110000000001
which can be the first part of our overhead—from this string, the integers 121 and 79 can be determined
by the decoder. The decoder then looks at the next 121 bits of overhead to determine which of the integers
{1, 2, . . . , 121} actually occur among the odd indexed runlengths, and then the decoder can look at the next
79 bits of overhead following that to determine which of the integers {1, 2, . . . , 79} actually occurred as even
indexed runlengths. The last piece of overhead lets the decoder determine the lengths
of r_odd and r_even, so that r_odd and r_even can be arithmetically decoded. The length of r_odd is 2705,
which converts to
0011001100001100001100

when “index to bitstring” is used and each digit is repeated. We can conclude the overhead with the
preceding string followed by “01” to denote that r_odd and r_even have the same length (we would have
used “10” if the length of r_odd were one greater). The overhead is now complete—we don’t need to give
the frequencies of the runlengths as part of the overhead, because we can use an adaptive arithmetic code
to encode r_odd and r_even. So the total length of the overhead required is

$$2 \cdot 6 + 2 + 2 \cdot 6 + 2 + 121 + 79 + 22 + 2 = 252$$
bits, which increases the compression rate by only 252/65536 ≈ 0.0038 codebits per pixel, an insignificant
amount.
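
The overhead strings quoted above can be reproduced with a short sketch. The helper below is an assumption about what “index to bitstring” does: it takes the binary expansion of n + 1 and drops the leading 1, which happens to reproduce all three strings given in the text. The helper names (drop_first, idx2bits, double_bits) are invented for this illustration.

```matlab
% Sketch: rebuild the overhead strings of the case study.
% Assumed behaviour of "index to bitstring": binary expansion of n+1 without its leading 1.
drop_first  = @(b) b(2:end);
idx2bits    = @(n) drop_first(dec2bin(n + 1) - '0');
double_bits = @(b) reshape([b; b], 1, []);     % write each digit twice

% Doubled digits are always 00 or 11, so the pair 01 can serve as a terminator.
part1 = [double_bits(idx2bits(121)) 0 1 double_bits(idx2bits(79)) 0 1];
part2 = [double_bits(idx2bits(2705)) 0 1];     % length of r_odd, then "01" (equal lengths)

overhead_bits = numel(part1) + 121 + 79 + numel(part2);
rate_increase = overhead_bits / 65536;         % extra codebits per pixel
```

With these definitions part1 is the 28-bit string shown above, part2 starts with the 22-bit string for 2705, and overhead_bits is 252.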

9.2 Prefabricated codes


Rather than adapt the choice of run length code to the particular binary image being encoded, the encoder
and decoder can agree to use prefabricated codewords to encode the run lengths. For example, one could
use Elias codewords or Golomb codewords. We describe how these codeword sets are formed.

9.2.1 Elias codewords


Let the sequence of run lengths for a binary image be $r_1, r_2, \ldots, r_k$. In run length coding with Elias codewords,
each run length $r_i = j$ is assigned an Elias codeword $E(j)$ of length $|E(j)|$ satisfying:

$$|E(j)| = \lceil \log_2 (j+1) \rceil + 2 \lceil \log_2 \lceil \log_2 (j+1) \rceil \rceil$$

and then the codewords $E(r_1), E(r_2), \ldots, E(r_k)$ are concatenated together. The first 20 Elias codewords are

Elias codewords
runlength codeword runlength codeword
1 0 11 11101011
2 1010 12 11101100
3 1011 13 11101101
4 1000100 14 11101110
5 1000101 15 11101111
6 1000110 16 10000010000
7 1000111 17 10000010001
8 11101000 18 10000010010
9 11101001 19 10000010011
10 11101010 20 10000010100

EXAMPLE. To form the Elias codeword for 100, you first expand 100 in binary, getting 1100100. This
requires 7 bits. You then subtract one from 7, getting 6. You then expand 6 in binary, getting 110. You then
write down each entry of 110 twice, getting 111100. You then delete “1” from the beginning and append
“0” to the end, getting 111000. To obtain the Elias codeword for 100, you then concatenate 111000 with
1100100, obtaining

E(100) = 1110001100100
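These steps can be reversed. The decoder reads the first bit of E(100), which is 1, and then reads two bits at a
time: 11, then 00, then 01. A pair that comes from a doubled digit is always 00 or 11, so the pair 01 marks the
end of the length information: its “0” is the appended “0” and its “1” is the leading bit of the binary expansion.
This allows the decoder to split 111000 off from 1100100. From 111000, the decoder determines that the binary
representation of the encoded integer is 7 bits long (restoring the deleted “1” and undoubling gives 110, that
is 6, and adding one gives 7). (Remember that as a result of the run length encoding process, the string
1100100 will be followed by other codebits and will have to be split off from these codebits; the decoder will
know that exactly 7 codebits have to be split off.)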
Elias codewords work well when there is a very large number of different run length values. In such a
scenario, the Elias codewords can be shown to yield a run length code compression performance close to that
of an arithmetic coder.
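
The construction in the example can be sketched as follows. The code is a reading of the steps in the text, checked against a few table entries; run length 1 is treated as a special case because the table gives it the one-bit codeword 0, which the general steps do not produce. The names js, E and len_formula are chosen here.

```matlab
% Sketch: Elias codewords built by the steps of the worked example.
js = [1 2 16 100];                        % run lengths to encode
E = cell(size(js));
for t = 1:numel(js)
    j = js(t);
    b = dec2bin(j);                       % binary expansion of j, n = numel(b) bits
    if j == 1
        E{t} = '0';                       % special case taken from the table
    else
        c = dec2bin(numel(b) - 1);        % expand n - 1 in binary
        d = reshape([c; c], 1, []);       % write each digit twice
        E{t} = [d(2:end) '0' b];          % drop leading 1, append 0, then attach b
    end
end

% Codeword lengths predicted by the length formula.
len_formula = ceil(log2(js + 1)) + 2 * ceil(log2(ceil(log2(js + 1))));
```

The four codewords come out as 0, 1010, 10000010000 and 1110001100100, and their lengths match the formula.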

9.2.2 Golomb codewords
When the frequencies of the run lengths in the sequence $r_1, r_2, \ldots, r_k$ follow an approximately
geometric distribution, good compression performance is obtained by using Golomb codewords to encode
the run lengths. Specifically, suppose that for some parameter $p$ between zero and one, the normalized frequency
with which $r_i = j$ is approximately equal to

$$p^{j-1}(1-p) \qquad (1)$$

(Count the number of $r_i$ for which $r_i = j$, divide by $k$, and see if this number is roughly that given
in (1) for each $j$.) Then, for certain values of $p$, Golomb tabulated the codewords one can use to get good
performance. The Golomb codewords can be shown to give the optimum run length code compression
performance for the geometrically distributed run lengths (1).
Golomb codewords (p = 1/2)
runlength codeword runlength codeword
1 0 6 111110
2 10 7 1111110
3 110 8 11111110
4 1110 9 111111110
5 11110 10 1111111110

Golomb codewords ($p^2 = 1/2$)


runlength codeword runlength codeword
1 00 6 1101
2 01 7 11100
3 100 8 11101
4 101 9 111100
5 1100 10 111101

Golomb codewords ($p^4 = 1/2$)


runlength codeword runlength codeword
1 000 6 1001
2 001 7 1010
3 010 8 1011
4 011 9 11000
5 1000 10 11001

Golomb codewords ($p^8 = 1/2$)


runlength codeword runlength codeword
1 0000 11 10010
2 0001 12 10011
3 0010 13 10100
4 0011 14 10101
5 0100 15 10110
6 0101 16 10111
7 0110 17 110000
8 0111 18 110001
9 10000 19 110010
10 10001 20 110011

The reader can see from these tables that a Golomb codeword w for $p^m = 1/2$ gives rise to the two Golomb
codewords w0 and w1 for $p^{2m} = 1/2$.
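
The four tables follow a simple pattern that can be sketched in code: for the table with exponent m, run length j is written as floor((j-1)/m) ones, a zero, and then the remainder of (j-1) modulo m in log2(m) bits. This is an observation about the tables above, valid when m is a power of two as it is there; other values of m are outside this sketch. The variable names are chosen here.

```matlab
% Sketch: regenerate a Golomb table for an exponent m that is a power of two.
m = 4;                                   % table with p^m = 1/2
nbits = log2(m);                         % width of the fixed-length tail
G = cell(1, 10);
for j = 1:10
    q = floor((j - 1) / m);              % number of leading ones
    rmd = mod(j - 1, m);                 % remainder, sent in nbits bits
    if nbits > 0
        tail = dec2bin(rmd, nbits);
    else
        tail = '';                       % m = 1: no tail, pure unary code
    end
    G{j} = [repmat('1', 1, q) '0' tail];
end
```

With m = 4 this reproduces the third table, from 000 for run length 1 to 11001 for run length 10; setting m to 1, 2 or 8 gives the other three.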

9.3 Run length coding of bit planes
Run length coding cannot be directly applied to a gray level image. Suppose there are 256 gray levels in a
gray level image. This means the image is represented as a matrix in which each element (i.e., each pixel
value) comes from the alphabet {0, 1, . . . , 255}. Each pixel value can be represented using 8 bits, the usual
binary code representation of the value, with most significant bit on the left. The most natural approach at
this point would seem to be to horizontally scan the image, concatenating together these 8-bit representations
to obtain one very long binary string, which one could then run length encode. However, this approach turns
out not to be as effective as other approaches for real images.
A better approach for a 256 level image would be to extract 8 “bit planes” from the image. Each of the
eight bit planes is a binary matrix. The first bit plane is obtained by replacing each pixel value in the image
with the most significant bit in its 8-bit representation. The next bit plane is obtained by replacing each
pixel value in the image with the second most significant bit in its 8-bit representation. One continues in this
way until all 8 bit planes are generated. One could then run length encode the separate bit planes, obtaining
8 separate runlength codewords, which could be concatenated together to yield the overall codeword for the
original gray level image. However, here is a simple example to show that this approach will sometimes
not work very well. Suppose you have a 256 level image in which half the pixel values are 127 and half the
pixel values are 128. Direct Huffman coding of the image would yield a compression rate of one codebit per
pixel. Now examine what happens with the 8 bit planes. The binary expansion of 127 is 01111111 and the
binary expansion of 128 is 10000000. Therefore each bit plane will have half of its pixel values equal to zero
and half of its pixel values equal to one. Run length encoding of each bit plane therefore could conceivably
yield a compression rate of one codebit per pixel for each of the 8 bit planes. Overall, this would lead to a
compression rate for the original gray level image of 8 codebits per pixel, no compression at all, and 8 times
worse than what is theoretically possible to achieve!
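
The 127/128 example is easy to check numerically. The sketch below uses a small hypothetical image with half its pixels equal to 127 and half equal to 128, and the built-in function bitget to pull out the bit planes of the usual binary representation.

```matlab
% Sketch: bit planes of the usual binary code for a 127/128 image.
M = [127 128 127 128; 128 127 128 127];   % hypothetical toy image
frac_ones = zeros(1, 8);
for k = 1:8
    plane = bitget(M, 9 - k);             % plane k holds the k-th most significant bit
    frac_ones(k) = mean(plane(:));        % fraction of ones in plane k
end
```

Every entry of frac_ones is 0.5, so none of the eight planes is constant and each one carries the full pattern of the image.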
Generally speaking, separate run length coding of bit planes will work better if one uses the Gray code
representation for each pixel value instead of the usual binary code representation. When one converts each
integer in the set $\{0, 1, 2, \ldots, 2^i - 1\}$ into its $i$-bit Gray code representation, one finds that two consecutive
integers $j, j+1$ have Gray code representations that agree with each other except in just one
position. For example, the Gray code representation of the integer 127 is 01000000 and the Gray code
representation of the integer 128 is 11000000. If we use these representations in the example in the preceding
paragraph, the first bit plane will have one half of the pixel values equal to one and the other half equal
to zero, whereas the second bit plane will have all pixel values equal to one and bit planes three through
eight will have all pixel values equal to zero. Since the last seven bit planes are constant, separate run length
encoding of these bit planes will yield a compression rate almost equal to 1 codebit per pixel for the original
image, the best possible result.
The mapping rule for going from the binary code representation to the Gray code representation is very
easy to describe: Simply complement every entry in the binary code representation that is preceded by a 1.
For example, 10011011 in binary code becomes 11010110 in Gray code because the second, fifth, sixth, and eighth
entries are preceded by ones and are therefore complemented.
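
Complementing every entry that is preceded by a 1 is the same as adding, modulo 2, each entry to the entry on its left. On integers this is an exclusive-or of the value with itself shifted right by one place, which gives a compact way to sketch the mapping with the built-in functions bitxor and bitshift. The variable names are chosen here, and the check covers 8-bit values only.

```matlab
% Sketch: Gray codewords of 0..255 via XOR with the right-shifted value.
vals = 0:255;
g = bitxor(vals, bitshift(vals, -1));   % Gray code value of each integer
gbits = dec2bin(g, 8) - '0';            % 256 x 8 matrix, row v+1 is the codeword of v

% Count the positions in which the codewords of v and v+1 differ.
changes = sum(abs(diff(gbits)), 2);
```

Every entry of changes equals 1, which is the one-position property described above, and rows 128 and 129 of gbits are the codewords of 127 and 128.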
Section 9.5 lists the MATLAB programs “binarycode.m” and “graycode.m” for computing the binary
code and Gray code representations of an integer. For example, to obtain the 8-bit binary codeword for the
integer 127, you would execute the command
binarycode(127,8)
To obtain the 4-bit binary codeword for the integer 13, you would execute the MATLAB command
binarycode(13,4)
To obtain the 8-bit Gray codeword for the integer 127, execute the command
graycode(127,8)
and to obtain the 4-bit Gray codeword for the integer 13, execute the command
graycode(13,4)
EXERCISE. Try to discover the simple rule for going from the Gray code representation back to the
binary code representation.

9.3.1 Case Study: A 256 × 256 8-bit image
We do separate run length encoding of the bit planes for the 256 × 256 8-bit Lena image, using the Gray
code representation of the pixel values in extracting the bit planes. Here is our MATLAB diary:

>> load lena256.dat


>> M=lena256;
>> size(M)

ans =

256 256

%Build table of gray codewords


>> for i=1:256;
A(i,:)=graycode(i-1,8);
end
>> size(A)

ans =

256 8
%Generate bitplanes
>> for i=1:256;
for j=1:256;
q=M(i,j);
y=A(q+1,:);
bitplane1(i,j)=y(1);
bitplane2(i,j)=y(2);
bitplane3(i,j)=y(3);
bitplane4(i,j)=y(4);
bitplane5(i,j)=y(5);
bitplane6(i,j)=y(6);
bitplane7(i,j)=y(7);
bitplane8(i,j)=y(8);
end
end
%Generate runlengths in each bitplane
>> r1=runlengths(bitplane1);
>> r2=runlengths(bitplane2);
>> r3=runlengths(bitplane3);
>> r4=runlengths(bitplane4);
>> r5=runlengths(bitplane5);
>> r6=runlengths(bitplane6);
>> r7=runlengths(bitplane7);
>> r8=runlengths(bitplane8);
>> c1=entropy(r1)*length(r1);
>> c2=entropy(r2)*length(r2);
>> c3=entropy(r3)*length(r3);
>> c4=entropy(r4)*length(r4);
>> c5=entropy(r5)*length(r5);
>> c6=entropy(r6)*length(r6);
>> c7=entropy(r7)*length(r7);
>> c8=entropy(r8)*length(r8);
>> compression_rate = (c1+c2+c3+c4+c5+c6+c7+c8)/65536

compression_rate =

5.9028

We approximated arithmetic codeword lengths using entropy and we didn’t worry about overhead, since
this should be negligible, in view of our earlier discussion. So, we expect that this bit plane encoding method
will yield a compression rate of about 5.9 codebits per pixel for the 256 × 256 8-bit (256 level) Lena image.
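
The bit planes in the diary are filled in pixel by pixel from a table of Gray codewords. As an alternative sketch, the same planes can be obtained for the whole matrix at once with the built-in functions bitxor, bitshift and bitget. The toy image below is hypothetical and merely stands in for the Lena data, which is not reproduced here; the array name planes is chosen for this illustration.

```matlab
% Sketch: Gray code bit planes of an 8-bit image without a per-pixel loop.
M = [127 128 127 128; 128 127 128 127];   % hypothetical stand-in for the image
G = bitxor(M, bitshift(M, -1));           % Gray code value of every pixel
planes = zeros([size(M) 8]);
for k = 1:8
    planes(:, :, k) = bitget(G, 9 - k);   % plane k = k-th most significant Gray bit
end
```

For this toy image the first plane marks the pixels equal to 128, the second plane is all ones, and planes three through eight are all zeros.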
EXERCISES. Re-do the above compression experiment using separate arithmetic encoders for the even
and odd indexed runlengths, and see how much better the compression rate becomes. Re-do the above
compression experiment using the binary code representation in obtaining the bitplanes, and see how much
worse the compression rate becomes.

9.4 Conditional run length coding


Conditional run length coding is another way in which run length coding can be implemented on a gray scale
image. Suppose one has a $2^i$-level image. Instead of assigning the $i$-bit Gray codeword to each pixel value,
followed by run length coding of the resulting bit planes, conditional run length coding involves a precoding
step in which an i-bit binary codeword is assigned adaptively to each pixel, followed by run length coding of
the concatenated i-bit codewords. The decomposition into bit planes is not used in conditional run length
coding. The reason for this is that the codeword assignment in the precoding step is done in such a way
that one gains by run length encoding the concatenated precoder codewords as opposed to breaking up the
bits in these codewords via bitplanes.
Suppose, in the generic gray level image below, the pixel marked “X” is the pixel currently being encoded.
The pixels marked “A” (immediately to the left of “X”) and “B” (immediately above “X”) are examined; the
pixel values in these locations are used to select the precoder codeword for pixel “X”.

        B
    A   X

To more easily explain what goes on, suppose we have a 16-level image. The pixel values then come from
the set {0, 1, 2, . . . , 15}. The codewords to be assigned to the pixel values are

0000, 0001, 0011, 0111,
1111, 1110, 1100, 1000,
0100, 0010, 0110, 1011,
1001, 1101, 0101, 1010          (2)

When it is time to encode pixel “X”, the pixel values “A” and “B” are noted, and then all previously
processed pixel positions are examined in which the pixel value to the left is “A” and the pixel value above
is “B”. (We assume pixels are processed via horizontal scanning.) A ranking $P_1, P_2, \ldots, P_{16}$ of the pixel
values $0, 1, \ldots, 15$ is then determined, with $P_i$ being the $i$-th most frequent of the sixteen pixel values in the
examined positions. If the pixel value in position “X” is $P_j$, then the precoder assigns the $j$-th codeword in
the list (2) to position “X”. (For position “X” in the upper left hand corner, there will be no pixel value “A”
or “B”; in this case, the ranking $P_1, \ldots, P_{16}$ is taken to be $0, 1, \ldots, 15$. For position “X” in the left column,
second position on down, there will be a pixel value “B” but not a pixel value “A”; the ranking $P_1, \ldots, P_{16}$
is then determined by examining all previous positions “X” where the pixel value above is “B”. For position
“X” in the top row, second position on over, there will be a pixel value “A” but not a pixel value “B”; the
ranking $P_1, \ldots, P_{16}$ is determined by examining all previous positions “X” where the pixel value to the left
is “A”. In all cases, if there is no previous “X” position to be examined, the standard ordering $0, 1, \ldots, 15$ is
taken as the ranking $P_1, \ldots, P_{16}$.)
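
The ranking step for a single pixel can be sketched as follows. The counts are hypothetical, and the tie-breaking rule (equally frequent values are ranked by increasing pixel value) is an assumption, since the text does not specify one; it has the convenient property that all-zero counts give back the standard ordering 0, 1, ..., 15. The names cw, counts, P, x and codeword are chosen here.

```matlab
% Sketch: choose the first pass codeword for one pixel from context counts.
% Codeword list (2), one codeword per row.
cw = ['0000'; '0001'; '0011'; '0111'; '1111'; '1110'; '1100'; '1000'; ...
      '0100'; '0010'; '0110'; '1011'; '1001'; '1101'; '0101'; '1010'];

% counts(v+1) = number of earlier positions with the same context (A, B)
% whose pixel value was v. These numbers are hypothetical.
counts = zeros(1, 16);
counts(6) = 3;                        % value 5 was seen three times
counts(1) = 1;                        % value 0 was seen once

% Rank by decreasing count; ties go to the smaller pixel value (assumed rule).
[~, order] = sortrows([-counts(:), (0:15)']);
P = order' - 1;                       % P(i) is the i-th ranked pixel value

x = 0;                                % pixel value at position X
j = find(P == x);                     % rank of x in the list P
codeword = cw(j, :);                  % j-th codeword of list (2)
```

In this example the ranking begins 5, 0, 1, so the pixel value 0 has rank 2 and receives the second codeword, 0001.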
In the previous paragraph, we have described how the precoder assigns a 4-bit codeword to each pixel
in the image. This is the first pass of the precoding process. The precoder then makes a second pass,
generating a second pass codeword for each first pass codeword. If the previous second pass codeword
ends in “1”, then the current second pass codeword is taken to be the complement of the current first pass
codeword; otherwise, the current second pass codeword is taken to be the current first pass codeword itself.
(The first codeword has no predecessor and is left unchanged.) To
illustrate, suppose the precoder has assigned the following sequence of codewords on the first pass:

0001, 0000, 0011, 0000, 1110, 0001, 0000


Then the corresponding codewords on the second pass are:

0001, 1111, 1100, 0000, 1110, 0001, 1111


The first pass codewords in positions 2, 3, 7 were complemented because, in each case, the preceding second
pass codeword ended in one.
Once the precoder has generated its sequence of second pass codewords, these codewords are concatenated
together and a run length encoder is applied. The reason for the second pass operation in the
precoder is to create codewords which, when concatenated together, yield longer runs of zeroes and ones than
the first pass codewords would have, thereby increasing the compression capability of the run length encoder.
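
The second pass is short enough to sketch directly. The code below applies the rule to the seven first pass codewords of the illustration and counts the runs in the concatenated bit strings before and after; the variable names are chosen here.

```matlab
% Sketch: second pass of the precoder on the example sequence.
first_pass = ['0001'; '0000'; '0011'; '0000'; '1110'; '0001'; '0000'] - '0';
second_pass = first_pass;              % the first codeword is kept as it is
for t = 2:size(first_pass, 1)
    if second_pass(t - 1, end) == 1    % previous second pass codeword ends in 1
        second_pass(t, :) = 1 - first_pass(t, :);   % complement the codeword
    end
end

% Number of runs in the concatenated codewords, before and after.
runs_first  = 1 + nnz(diff(reshape(first_pass', 1, [])));
runs_second = 1 + nnz(diff(reshape(second_pass', 1, [])));
```

The result agrees with the second pass sequence listed above, and the number of runs in the 28 concatenated bits drops from 9 to 6, which is the effect the second pass is designed to produce.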
EXERCISE. Using MATLAB, test the conditional run length encoding technique on some large 16-level
image.

9.5 Programs
We developed three MATLAB programs, “runlengths.m”, “binarycode.m”, and “graycode.m”. Here are
the m-files:

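%This program extracts the run lengths
%from a horizontally scanned binary image
function r = runlengths(M)
[nrows,ncols]=size(M);
%Concatenate the rows of M into one long row vector x
%(each row has ncols entries, so this also works for non-square images)
for i=1:nrows
x(ncols*(i-1)+1:ncols*i)=M(i,:);
end
N=nrows*ncols;
%z(n)==1 exactly when x(n) and x(n+1) differ, i.e. a run ends at position n
y=x(2:N);
x1=x(1:N-1);
z=y+x1;
j=find(z==1);
%Run lengths are the gaps between consecutive run end positions
j1=[j N];
j2=[0 j];
r=j1-j2;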

%This program computes the L-bit binary
%code representation of an integer i between
%0 and 2^L-1
function y = binarycode(i,L)
x=[];
q=i;
%Peel off the least significant bit of q until nothing is left
while q > 0
R=q-2*floor(q/2);
q=floor(q/2);
x=[R x];
end
%Pad on the left with zeroes up to length L
n=length(x);
if n==L
y=x;
else
y=[zeros(1,L-n) x];
end

%This program computes the L-bit Gray


%code representation of an integer i between
%0 and 2^L-1
function y = graycode(i,L)
x=binarycode(i,L);
y(1)=x(1);
for j=2:L;
if x(j-1)==1
y(j)=1-x(j);
else
y(j)=x(j);
end
end
