
Article · October 2016
DOI: 10.17656/jzs.10604


New Image Compression/Decompression Technique Using
Arithmetic Coding Algorithm
Nassir H. Salman; Computer Science Dept., Cihan Univ., Erbil, Iraq
[email protected]

Abstract
Arithmetic coding is a form of entropy encoding used in lossless data compression. In this article, a new technique for image compression/decompression based on arithmetic coding and decoding is presented. The input image is divided into blocks of 64 x 64 pixels. If the image dimensions are not multiples of 64, the size of what remains of the image is calculated and rows and columns of zero values are added (padding) until all blocks are 64 x 64. The blocks are first separated into a stream for processing. The result, after the arithmetic coding process, is a stream of compressed tables, and the stored bytes are calculated during this process. Then the compression ratio is calculated. Finally, the stream of compressed tables is recollected for decompression (reconstruction of the image after the decoding process), and this is done for each block. The algorithm was tested on different images; the results show a high compression ratio (88-90%), the process is fast, and it is easy to decode the binary bits to recover the original image exactly, without any loss. For more results see sections 5 and 6 of this article.

Keywords: arithmetic coding, image compression, lossless compression technique, decoding

1. Introduction
Arithmetic coding is a data compression technique that encodes data (characters, images, etc.) by creating a code which represents a fraction in the unit interval [0, 1]. The algorithm is recursive: on each recursion, it successively partitions subintervals of the unit interval [0, 1]. This means that in arithmetic coding, instead of using a sequence of bits to represent a symbol, a subinterval of the unit interval [0, 1] is used to represent that symbol. In other words, the data are encoded into a number in the unit interval [0, 1]. This technique can be implemented by separating the unit interval into several segments according to the number of distinct symbols. The length of each segment is proportional to the probability of the corresponding symbol, and the output data are located in the corresponding segment according to the input symbol [1].

Arithmetic coding (AC) is a very efficient technique for lossless data compression, and it produces a rate which approaches the entropy of the encoded data (i.e., it is a form of entropy encoding used in lossless data compression). AC is widely used in well-known image and video compression algorithms such as JBIG, JBIG2, JPEG2000 and H.264/AVC. AC is also a statistical coding technique which needs to work with a model that estimates the probability of each pixel at each iteration in the encoding and decoding processes. In this case it needs to find the probability distribution of symbols that maximizes the compression efficiency [2].

Normally, a string of characters is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic encoding, frequently used characters are stored with fewer bits and infrequently occurring characters with more bits, resulting in fewer bits used in total. In this respect arithmetic coding differs from other forms of entropy encoding, such as Huffman coding, in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, a fraction n where 0.0 ≤ n < 1.0 [1].

In [2] an image is divided into non-overlapping blocks and then each block is encoded separately using arithmetic coding. The proposed model provides a probability distribution for each block, modeled by a mixture of non-parametric distributions that exploits the high correlation between neighboring blocks. The Expectation-Maximization algorithm is used to find the maximum-likelihood mixture parameters in order to maximize the arithmetic coding compression efficiency.
In [3] an efficient algorithm was proposed for a lossy image compression/decompression scheme using histogram-based block optimization and arithmetic coding; the results show that the algorithm gives a better compression ratio than JPEG.

In [4] a complete program was used for coding a stream of symbols into a stream of bits using the arithmetic coding algorithm. The input to the coding function is an array of data such as [3 0 -1 129 9 -255 255 0 0 -3], and the program accepts negative, positive and zero data. In our work, image pixel values are stored as a vector and used as input to the coding and decoding functions.

In this article, we discuss basic concepts of arithmetic coding in section 1, the motivation and the coding theorem behind arithmetic coding in section 2, arithmetic coding procedures in section 3, the method of generating a binary sequence in section 4, the proposed method and the algorithm steps in section 5, the results of using arithmetic coding to compress image data in section 6, and we end with the conclusion.

2. Using Arithmetic Coding


The source symbols can be mapped to numbers. In the following discussion, we will use the mapping in equation (1) [1][6]:

𝑋(𝑎𝑖) = 𝑖,  𝑎𝑖 ∈ 𝒜   (1)

Where, 𝒜 = {𝑎1 , 𝑎2 , … , 𝑎𝑚 } is the alphabet for a discrete source and 𝑋 is a random variable.
This mapping means that, given a probability model for the source, we can obtain a probability density function (pdf) for the random variable 𝑋; equation (2):

𝑃𝑟(𝑋 = 𝑖) = 𝑃(𝑎𝑖)   (2)


and the cumulative distribution function (cdf) can be expressed as

𝐹𝑋(𝑖) = ∑_{𝑘=1}^{𝑖} 𝑃𝑟(𝑋 = 𝑘)   (3)
Hence, for each symbol 𝑎𝑖 with a nonzero probability, there is a distinct value of 𝐹𝑋(𝑖). Since the value of 𝐹𝑋(𝑖) is distinct for each distinct symbol 𝑎𝑖, this fact can be used in what follows to develop the arithmetic code. To make this clearer, the following example illustrates restricting the interval containing the tag for the input sequence (𝑎1, 𝑎2, 𝑎3) [6].
Consider a ternary alphabet 𝒜 = {𝑎1, 𝑎2, 𝑎3} with 𝑃(𝑎1) = 0.7, 𝑃(𝑎2) = 0.1, and 𝑃(𝑎3) = 0.2. Using equations (2) and (3) above, we have 𝐹𝑥(1) = 0.7, 𝐹𝑥(2) = 0.8, and 𝐹𝑥(3) = 1.0. Now let us consider the input sequence (𝑎1, 𝑎2, 𝑎3). This partitions the unit interval as shown in Figure 1 [1].

The partition in which the tag resides depends on the first symbol of the sequence. If the first
symbol is 𝑎1 , then the tag lies in the interval [0.0, 0.7); if the first symbol is 𝑎2 , then the tag
lies in the interval [0.7,0.8); if the first symbol is 𝑎3 , then the tag lies in the interval [0.8, 1.0).
Once the interval containing the tag has been determined, the rest of the unit interval is
discarded, and this restricted interval is again divided in the same proportions as the original
interval. Now the input sequence is (𝑎1, 𝑎2, 𝑎3), so the first symbol is 𝑎1, and the tag is contained in the subinterval [0.0, 0.7). This subinterval is then subdivided in exactly the same proportions as the original interval, yielding the subintervals [0.00, 0.49), [0.49, 0.56), and [0.56, 0.70). The first partition [0.00, 0.49) corresponds to the symbol 𝑎1, the second partition [0.49, 0.56) corresponds to the symbol 𝑎2, and the third partition [0.56, 0.70) corresponds to the symbol 𝑎3. Because the second symbol in the sequence is 𝑎2, the tag value is then restricted to lie in the interval [0.49, 0.56).

Figure 1. Restricting the interval containing the tag for the input sequence (𝑎1, 𝑎2, 𝑎3)

Following is a simple calculation of these intervals:
0.7 − 0 = 0.7, starting with the subinterval of 𝑎1; then
0.7 × 0.7 = 0.49, so [0, 0.49) represents 𝑎1; 0.7 × 0.8 = 0.56, so [0.49, 0.56) represents the 𝑎2 interval; continuing, [0.56, 0.7) represents 𝑎3. Now, using the interval of 𝑎2, the new intervals become:
0.56 − 0.49 = 0.07; 0.07 × 0.7 = 0.049, and 0.049 + 0.49 = 0.539, so [0.49, 0.539) represents 𝑎1;
0.07 × 0.1 = 0.007, and 0.007 + 0.539 = 0.546, so [0.539, 0.546) represents 𝑎2;
0.07 × 0.2 = 0.014, and 0.014 + 0.546 = 0.560, so [0.546, 0.560) represents 𝑎3.
So, this interval [0.49 -0.56] is partitioned in the same proportion as the original interval in order
to obtain the subinterval [0.49, 0.539) corresponding to the symbol 𝑎1 , the subinterval
[0.539,0.546) corresponding to the symbol 𝑎2 and the subinterval [0.546,0.560) corresponding
to the symbol 𝑎3 . Now the third symbol is 𝑎3 , then the tag will be restricted to the interval
[0.546,0.560), which can be subdivided further by the same procedure described above.

Note that the appearance of each new symbol restricts the tag to a subinterval that is
disjoint from any other subinterval that may have been generated using this process. For the
sequence (𝑎1 , 𝑎2 , 𝑎3 ), since the third symbol is 𝑎3 , the tag is restricted to the subinterval [0.546,
0.560). If the third symbol had been 𝑎1 instead of 𝑎3, the tag would have resided in the subinterval [0.49, 0.539), which is disjoint from the subinterval [0.546, 0.560).
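The interval-narrowing procedure above can be sketched in a few lines of Python (an illustrative sketch, not the paper's code; the function name tag_interval and the 0-based symbol indices are our own):

```python
# Illustrative sketch: compute the tag interval for the sequence
# (a1, a2, a3) with P(a1)=0.7, P(a2)=0.1, P(a3)=0.2, as in Section 2.

def tag_interval(sequence, probs):
    """Narrow [low, high) by each symbol's sub-interval."""
    # Cumulative bounds of each symbol inside the unit interval.
    cum = [0.0]
    for p in probs:
        cum.append(cum[-1] + p)
    low, high = 0.0, 1.0
    for s in sequence:                # s is a 0-based symbol index
        width = high - low
        high = low + width * cum[s + 1]
        low = low + width * cum[s]
    return low, high

low, high = tag_interval([0, 1, 2], [0.7, 0.1, 0.2])
print(round(low, 3), round(high, 3))  # 0.546 0.56
```

Each symbol shrinks the interval to its own slice of the current interval, reproducing the subinterval [0.546, 0.560) derived above.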

3. Arithmetic Coding Procedures

After reading the image, a matrix of pixel intensities is obtained. For a gray-scale image, the intensity values range from 0 to 255. All we have to do is make a list of all the intensity values, i.e., 0-255, and find the number of pixels that belong to each of these intensities (calculate the statistical distribution of the image pixels). Then, dividing the number in each entry by the total number of pixels gives the frequency (probability) of each of these intensities in the input image. These intensities are the symbols, with their corresponding probabilities. At this step the familiar parameters for arithmetic coding are available. A source coding technique is the final step in encoding, and to implement the arithmetic coding method directly on an image, a few steps need to be added, as follows:

1. Read any image file format.

2. From the image matrix, find the different intensity values that are used in the image and make a list of them.

3. Find the frequency of occurrence (probability) of each of the intensity values in the image.

The intensity values list then makes up the source symbols, and the frequencies of occurrence are their corresponding probabilities [5]. These parameters are supplied to the arithmetic coding function.
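Steps 2 and 3 above can be sketched as follows (an illustrative Python sketch; the function name symbol_model is our own, and the paper's own implementation is in MATLAB):

```python
# Illustrative sketch of steps 2-3: build the symbol table and the
# probability of each intensity from a flattened list of pixel values.
from collections import Counter

def symbol_model(pixels):
    counts = Counter(pixels)                    # occurrences of each intensity
    total = len(pixels)
    table = sorted(counts)                      # distinct intensity values
    probs = {v: counts[v] / total for v in table}
    return table, counts, probs

pixels = [0, 0, 255, 128, 128, 128]             # a tiny made-up example
table, counts, probs = symbol_model(pixels)
print(table)        # [0, 128, 255]
print(counts[128])  # 3
print(probs[128])   # 0.5
```

The table and probabilities produced here are exactly the parameters the arithmetic coding function needs.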

4. Generating a Binary Sequence


4.1 Efficiency of the Arithmetic Code
Consider a source that generates letters from alphabets of size four and five with the probabilities shown in tables 1 and 2 below. According to equation (3) in section 2 and equation (4) below, a binary code for this source can be generated [6].

𝐹̄𝑋(𝑥𝑖) = ∑_{𝑘=1}^{𝑖−1} 𝑃(𝑋 = 𝑘) + (1/2) 𝑃(𝑋 = 𝑖) = 𝐹𝑋(𝑖 − 1) + (1/2) 𝑃(𝑋 = 𝑖)   (4)

Equation (4) above means that for each 𝑥𝑖, 𝐹̄𝑋(𝑥𝑖) will have a unique value in the interval [0, 1]. This value can be used as a unique tag for 𝑥𝑖. Therefore, a binary code for 𝐹̄𝑋(𝑥𝑖) can be obtained by taking the binary representation of this number (a fraction) and truncating it to 𝑙(𝑥) = ⌈log(1/𝑃(𝑥))⌉ + 1 bits, where log(.) denotes the logarithm to base 2, computed as:

𝑙𝑜𝑔2(𝑋) = 𝑙𝑜𝑔10(𝑋)/𝑙𝑜𝑔10(2).

For example, for symbol no. 1 in table 1 below, the calculation is as follows:

⌈log(1/0.25)⌉ + 1 = 𝑙𝑜𝑔2(4) + 1 = 𝑙𝑜𝑔10(4)/𝑙𝑜𝑔10(2) + 1 = 0.602059/0.30102 + 1 = 3

Table 1
Symbol | 𝑃(𝑥) | 𝐹𝑋(𝑥) | 𝐹̄𝑋(𝑥) | 𝐹̄𝑋(𝑥) in binary | 𝑙(𝑥) = ⌈log(1/𝑃(𝑥))⌉ + 1 | Codeword
1 | 0.25 | 0.25 | 0.125 | 0.001 | 3 | 001
2 | 0.5 | 0.75 | 0.5 | 0.10 | 2 | 10
3 | 0.125 | 0.875 | 0.8125 | 0.1101 | 4 | 1101
4 | 0.125 | 1.0 | 0.9375 | 0.1111 | 4 | 1111

The quantity 𝐹̄𝑋(𝑥) is obtained using equation (4), and the binary representation of 𝐹̄𝑋(𝑥) is truncated to 𝑙(𝑥) = ⌈log(1/𝑃(𝑥))⌉ + 1 bits to obtain the binary code. With 𝐹𝑋(0) = 0 (see equation (4)):
𝐹𝑋(1) = 𝑃(𝑋 = 1) = 0.25
𝐹𝑋(2) = 𝑃(𝑋 = 1) + 𝑃(𝑋 = 2) = 0.25 + 0.5 = 0.75, but
𝐹̄𝑋(1) = 𝐹𝑋(0) + (1/2) 𝑃(𝑋 = 1) = 0 + (1/2) × 0.25 = 0.125
𝐹̄𝑋(2) = 𝐹𝑋(1) + (1/2) 𝑃(𝑋 = 2) = 0.25 + (1/2) × 0.5 = 0.5, and so on; table 2 is computed in the same way.

Table 2
Symbol | 𝑃(𝑥) | 𝐹𝑋(𝑥) | 𝐹̄𝑋(𝑥) | 𝐹̄𝑋(𝑥) in binary | 𝑙(𝑥) = ⌈log(1/𝑃(𝑥))⌉ + 1 | Codeword
1 | 0.25 | 0.25 | 0.125 | 0.001 | 3 | 001
2 | 0.25 | 0.50 | 0.375 | 0.011 | 3 | 011
3 | 0.20 | 0.70 | 0.60 | 0.10011 | 4 | 1001
4 | 0.15 | 0.85 | 0.775 | 0.1100011 | 4 | 1100
5 | 0.15 | 1.00 | 0.925 | 0.1110110 | 4 | 1110

It can be proved that a code obtained in this fashion is a uniquely decodable code. For more
about arithmetic coding details see [7-12].
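As a check on Tables 1 and 2, the codeword construction of equation (4) can be sketched in Python (an illustrative sketch; the function name codewords is our own):

```python
# Illustrative sketch: compute Fbar_X(x) from equation (4), then truncate
# its binary expansion to l(x) = ceil(log2(1/P(x))) + 1 bits.
import math

def codewords(probs):
    F = 0.0                                 # running cdf F_X(i-1)
    out = []
    for p in probs:
        fbar = F + p / 2                    # equation (4)
        length = math.ceil(math.log2(1 / p)) + 1
        # Binary expansion of the fraction fbar, truncated to `length` bits.
        bits, frac = "", fbar
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            bits += str(bit)
            frac -= bit
        out.append(bits)
        F += p
    return out

print(codewords([0.25, 0.5, 0.125, 0.125]))  # ['001', '10', '1101', '1111']
```

Running it on the Table 2 probabilities [0.25, 0.25, 0.20, 0.15, 0.15] likewise yields 001, 011, 1001, 1100, 1110.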

5. Proposed Method and Algorithm Steps
Figure 1 shows the steps of using arithmetic coding, as follows:

1-Read any image file format (e.g., .jpg, .gif, .png, etc.)

2-Divide the image into blocks (each block is 64 x 64) using a moving (sliding) window designed to strip the image into blocks. The blocks are first separated into a stream for processing, and a counter of the number of blocks is kept. Calculate the size of what remains of the image and complete it into blocks of size 64 x 64 through zero padding. Call the Arith_Code(f1) function to code the image pixel values. The result is a stream of compressed tables. In this step:

A complete program for coding a stream of symbols into a stream of bits using the arithmetic coding algorithm [4] was used. The input to the function is an array of data. Also in this step, the following were done:

• Compute the table of the symbols (pixels)
• Compute the probabilities of the symbols
• Apply arithmetic coding using code = arithenco(New_Data, Counts) to calculate Store_Byte, Counts, and the table of pixels.

3-Calculate the compressed image file size, where the amount of data for the block stream of the original image without compression and the amount of data for the block stream after compression are calculated.

4-Then calculate the compression ratio (CR): CR = (M*N*8 − sum(BlockSize)) / (M*N*8), i.e., (original image file size − compressed size) divided by the original image size.

5-Finally, the original image is recovered from the decompressed data, where the stream is recollected for the decompression process, and this is done for each block. This step represents the decoding process, using the arithdeco(code, Counts, length(Data)) function.

6-Calculate the elapsed time (sec) for compressing/decompressing different images.
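Steps 1-2 (block division with zero padding) can be sketched as follows (an illustrative Python sketch; the paper's implementation uses MATLAB, and the names pad_to_blocks and blocks are our own):

```python
# Illustrative sketch of steps 1-2: zero-pad the image so its dimensions
# are multiples of 64, then slice it into 64 x 64 blocks for coding.

BLOCK = 64

def pad_to_blocks(image):
    """image: list of rows of pixel values; zero-pad to multiples of BLOCK."""
    rows, cols = len(image), len(image[0])
    pad_r = (BLOCK - rows % BLOCK) % BLOCK    # rows of zeros to append
    pad_c = (BLOCK - cols % BLOCK) % BLOCK    # columns of zeros to append
    padded = [row + [0] * pad_c for row in image]
    padded += [[0] * (cols + pad_c) for _ in range(pad_r)]
    return padded

def blocks(image):
    """Yield the 64 x 64 blocks of a padded image, row-major."""
    for r in range(0, len(image), BLOCK):
        for c in range(0, len(image[0]), BLOCK):
            yield [row[c:c + BLOCK] for row in image[r:r + BLOCK]]

img = [[1] * 70 for _ in range(100)]          # a 100 x 70 test image
padded = pad_to_blocks(img)
print(len(padded), len(padded[0]))            # 128 128
print(sum(1 for _ in blocks(padded)))         # 4
```

Each yielded block would then be passed to the arithmetic coding function separately, as described in step 2.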

Fig. 1. The proposed method and algorithm steps

6. Test and Results

First of all, many samples of gray-scale medical images of different sizes and formats were read, and the following results were obtained:
1- Displaying the original data of the input image
2- The table of pixel values, where each occurring pixel value is displayed only once; this means the table includes only the image data, without repetition
3- Calculating the count of occurrences of each pixel value
4- Calculating the probabilities of the table symbol values

If the following data represent the original image data, then the table of pixel values and the count of each occurrence are as follows:

The values in the table include only the image data without repetition, where the original data represent all the input data (here, the pixel values), and the probabilities of the table symbol values are calculated as follows:

The following calculations show the coding of data using Arith_Code1(input image) [4]. If the pixel values are:
[10 8 8 10 9 9 9 12 0 0 10], this is coded into

D_Bits = 0001010110001110111111100100000 (31 bits, converted into bytes)
Store_Byte = 21 142 254 64, where 21 in binary is 00010101
Counts = 3 2 3 1 2, which means 10 occurs 3 times, etc.
Table = 10 8 9 12 0, the image pixels without repetition; see the pixel values above
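The Table and Counts values above can be reproduced with a short sketch (illustrative Python, not the paper's MATLAB code; the function name table_and_counts is our own):

```python
# Illustrative sketch: build Table (pixels in order of first appearance,
# without repetition) and Counts from the example vector in Section 6.

def table_and_counts(pixels):
    table, counts = [], []
    for p in pixels:
        if p in table:
            counts[table.index(p)] += 1
        else:
            table.append(p)
            counts.append(1)
    return table, counts

pixels = [10, 8, 8, 10, 9, 9, 9, 12, 0, 0, 10]
table, counts = table_and_counts(pixels)
print(table)    # [10, 8, 9, 12, 0]
print(counts)   # [3, 2, 3, 1, 2]
```

This matches the Table and Counts reported above for the same input vector.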

Fig. 2 below shows the original image and the recovered image after the compression and decompression processes using arithmetic coding, for some medical images obtained from Rezgari Hospital, Erbil, Iraq.

Fig. 2. The original image and the recovered image using arithmetic coding

Table 3
Image type and resolution | Image file size (bit) | Compressed image file size (bit) | Compression ratio* (%) | Elapsed time (sec) | Applications
331 x 553 .jpg | 1464344 | 145604 | 90.057 | 77.8591 | medical
357 x 429 .jpg | 1225224 | 139349 | 88.627 | 57.0807 | medical
500 x 362 | 1448000 | 160892 | 88.889 | 79.6536 | normal image

*CR = (M*N*8 − sum(BlockSize)) / (M*N*8); (original image file size − compressed size) divided by the original image size.

Conclusion

The compression ratio is high (about 90%) for the medical images we used. The original image is recovered easily from the compressed one without any loss (a lossless compression technique). The processing time is shorter when using blocks compared with using one block (the whole image). The proposed method can be used to code and decode image data, or any array of data, without any loss. Arithmetic coding is very useful for medical images compared with the .jpeg format, because JPEG is lossy; and with a CR of 90%, an image file of 100 Kbytes, for example, becomes 10 Kbytes.
Finally, one advantage of arithmetic coding over similar methods of data compression is the convenience of adaptation, that is, changing the frequency (or probability) tables while processing the data. The decoded data match the original data as long as the frequency table in decoding is updated in the same way and at the same step as in encoding. The synchronization is usually based on a combination of symbols occurring during the encoding and decoding processes. The proposed method gives a high compression ratio, which is useful for sending compressed images via e-mail (i.e., small size) for diagnosis, and for storing many image files in a hospital.

References
1- Khalid Sayood. Introduction to Data Compression, 3rd edition, Ch. 4, p. 81. Elsevier Inc., 2006.

2- Atef Masmoudi, William Puech, Afif Masmoudi. "An improved lossless image compression based arithmetic coding using mixture of non-parametric distributions". Multimedia Tools and Applications, Vol. 74, Issue 23, pp. 10605-10619, December 2015. Springer.

3- Subarna Dutta et al. "An efficient image compression algorithm based on histogram based block optimization and arithmetic coding". International Journal of Computer Theory and Engineering, Vol. 4, No. 6, Dec. 2012.

4- Mohammed Siddeq. Arithmetic Coding and Decoding. Sept. 2011, MathWorks File Exchange. See website: http://mamadmmx76.wix.com/mohammedsiddeq

5- http://www.mathworks.com/matlabcentral/fileexchange/36377-arithmetic-encoding---decoding

6- Yu-Yun Chang. Tutorial: Arithmetic Coding. National Taiwan University, Taipei, Taiwan, ROC.

7- Ian H. Witten, Radford M. Neal, and John G. Cleary. "Arithmetic Coding for Data Compression". Communications of the ACM, Vol. 30, No. 6, June 1987.

8- David Salomon. Data Compression: The Complete Reference, 3rd edition, Ch. 2, p. 106. Springer-Verlag New York, Inc., 2004.

9- David Salomon. A Concise Introduction to Data Compression, Ch. 4, p. 123. Springer-Verlag London Limited, 2008.

10- David Salomon, Giovanni Motta, D. Bryant. Handbook of Data Compression, 5th edition, Ch. 5, p. 264. Springer-Verlag London Limited, 2010.

11- Ida Mengyi Pu. Fundamental Data Compression, Ch. 6, p. 101. Elsevier Inc., 2006.

12- Adam Drozdek. Elements of Data Compression. Brooks/Cole, Thomson Learning, Inc., 2002.
