Introduction To Conventional Compression Solutions
A common characteristic of most images is that the neighboring pixels are correlated
and therefore contain redundant information. The foremost task then is to find less
correlated representation of the image. Two fundamental components of compression
are redundancy and irrelevancy reduction. Redundancy reduction aims at removing
duplication from the signal source (image/video). Irrelevancy reduction omits parts of
the signal that will not be noticed by the signal receiver, namely the Human Visual
System (HVS). In general, three types of redundancy can be identified: spatial redundancy (correlation between neighboring pixel values), spectral redundancy (correlation between different color planes or spectral bands), and temporal redundancy (correlation between adjacent frames in an image sequence).
Image compression research aims at reducing the number of bits needed to represent
an image by removing the spatial and spectral redundancies as much as possible.
There are basically two types of compression methods: lossy and lossless. Lossy
compression creates smaller files by discarding some information about the original
image. It removes details and color changes it deems too small for the human eye to
differentiate. Lossless compression, on the other hand, never discards any information
about the original file. In lossless compression schemes, the reconstructed image, after
compression, is numerically identical to the original image. An image reconstructed
following lossy compression contains some degradation relative to the original, because the compression scheme discards information that cannot be recovered. Under normal viewing conditions, however, no visible loss is perceived (visually lossless).
Image Compression Process
Over the years, a variety of linear transforms have been developed, including the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT), and many more, each with its own advantages and disadvantages.
Quantizer
A quantizer simply reduces the number of bits needed to store the transformed
coefficients by reducing the precision of those values. Since this is a many-to-one
mapping, it is a lossy process and is the main source of compression in an encoder.
Quantization can be performed on each individual coefficient, which is known as Scalar
Quantization (SQ). Quantization can also be performed on a group of coefficients
together, and this is known as Vector Quantization (VQ). Both uniform and non-uniform
quantizers can be used depending on the problem at hand.
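To make the many-to-one mapping concrete, the following Python sketch applies a uniform scalar quantizer to a handful of transform coefficients; the function names and the step size of 16 are illustrative choices, not values prescribed by any standard.

```python
import numpy as np

def uniform_quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer index."""
    return np.round(coeffs / step).astype(int)

def dequantize(indices, step):
    """Inverse mapping: reconstruct approximate coefficient values."""
    return indices * step

# Quantizing a few transform coefficients with an illustrative step size of 16.
coeffs = np.array([312.4, -57.9, 14.2, -3.8, 0.6])
indices = uniform_quantize(coeffs, step=16)
print(indices)                    # [20 -4  1  0  0]  -- fewer bits needed per value
print(dequantize(indices, 16))    # [320 -64 16 0 0]  -- the small values are lost for good
```

The collapse of small coefficients to zero is exactly what the entropy encoder later exploits.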
Entropy Encoder
An entropy encoder further compresses the quantized values losslessly to give better
overall compression. It uses a model to accurately determine the probabilities for each
quantized value and produces an appropriate code based on these probabilities so that
the resultant output code stream will be smaller than the input stream. The most
commonly used entropy encoders are the Huffman encoder and the arithmetic encoder,
although for applications requiring fast execution, simple run-length encoding (RLE) has
proven very effective.
It is important to note that a properly designed quantizer and entropy encoder are
absolutely necessary along with optimum signal transformation to get the best possible
compression.
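As a rough illustration of why the probability model matters, the sketch below computes the Shannon entropy of a quantized symbol stream; this value is the theoretical lower bound, in bits per symbol, that entropy coders such as Huffman or arithmetic coding try to approach. The stream used here is made up for illustration.

```python
import numpy as np
from collections import Counter

def entropy_bits_per_symbol(symbols):
    """Shannon entropy of the empirical symbol distribution, in bits per symbol."""
    counts = Counter(symbols)
    probs = np.array([c / len(symbols) for c in counts.values()])
    return float(-np.sum(probs * np.log2(probs)))

# A quantized coefficient stream dominated by zeros has low entropy,
# so a well-designed entropy coder can spend far fewer than 8 bits per symbol.
stream = [0] * 50 + [1] * 8 + [-1] * 5 + [3]
print(round(entropy_bits_per_symbol(stream), 2), "bits/symbol (vs. 8 for raw bytes)")
```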
Further Elaboration on the Source Encoder
The discrete cosine transform (DCT) helps separate the image into parts (or spectral sub-
bands) of differing importance (with respect to the image's visual quality). The DCT is
similar to the discrete Fourier transform: it transforms a signal or image from the spatial
domain to the frequency domain. With an input image, A, the coefficients for the output "image," B, are

B(k_1, k_2) = \alpha(k_1)\,\alpha(k_2) \sum_{i=0}^{N_1-1} \sum_{j=0}^{N_2-1} A(i,j)\, \cos\!\left[\frac{(2i+1)k_1\pi}{2N_1}\right] \cos\!\left[\frac{(2j+1)k_2\pi}{2N_2}\right],

where \alpha(k_1) = \sqrt{1/N_1} for k_1 = 0 and \sqrt{2/N_1} otherwise (and analogously for k_2). The input image is N2 pixels wide by N1 pixels high; A(i,j) is the intensity of the pixel in row i and column j; B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
All DCT multiplications are real. This lowers the number of required multiplications, as
compared to the discrete Fourier transform. The DCT input is an 8 by 8 array of integers.
This array contains each pixel's gray scale level; 8 bit pixels have levels from 0 to 255. The
output array of DCT coefficients contains integers; these can range from -1024 to 1023.
For most images, much of the signal energy lies at low frequencies; these appear in the
upper left corner of the DCT. The lower right values represent higher frequencies, and are
often small - small enough to be neglected with little visible distortion.
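The following sketch evaluates the 2-D DCT formula given above directly (a naive O(N^4) loop, perfectly adequate for an 8x8 block) and illustrates the energy compaction just described; the ramp-shaped test block is an arbitrary example.

```python
import numpy as np

def dct2(A):
    """Naive 2-D DCT of an N1 x N2 block, evaluated directly from the formula above."""
    N1, N2 = A.shape
    B = np.zeros((N1, N2))
    for k1 in range(N1):
        for k2 in range(N2):
            a1 = np.sqrt(1.0 / N1) if k1 == 0 else np.sqrt(2.0 / N1)
            a2 = np.sqrt(1.0 / N2) if k2 == 0 else np.sqrt(2.0 / N2)
            s = 0.0
            for i in range(N1):
                for j in range(N2):
                    s += A[i, j] * np.cos(np.pi * (2 * i + 1) * k1 / (2 * N1)) \
                                 * np.cos(np.pi * (2 * j + 1) * k2 / (2 * N2))
            B[k1, k2] = a1 * a2 * s
    return B

# A smooth 8x8 gray-level gradient: almost all of its energy lands in the
# upper-left (low-frequency) corner of the DCT coefficient array.
i, j = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
block = (100 + 10 * i + 6 * j).astype(float)
B = dct2(block)
energy = B ** 2
print(round(float(energy[:2, :2].sum() / energy.sum()), 4))  # close to 1.0
```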
Wavelets are functions defined over a finite interval and having an average value of
zero. The basic idea of the wavelet transform is to represent any arbitrary function f(t) as
a superposition of a set of such wavelets or basis functions. These basis functions or baby
wavelets are obtained from a single prototype wavelet called the mother wavelet, by
dilations or contractions (scaling) and translations (shifts). The Discrete Wavelet Transform
of a finite length signal x(n) having N components, for example, is expressed by an N x N
matrix.
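As a minimal illustration, the sketch below performs one analysis level of the DWT using the Haar wavelet, the simplest mother wavelet, splitting a signal into approximation and detail coefficients and then reconstructing it exactly; the test signal is arbitrary and no particular standard's filter bank is implied.

```python
import numpy as np

def haar_dwt_1level(x):
    """One analysis level of the Haar DWT: scaled averages (approximation) and differences (detail)."""
    x = np.asarray(x, dtype=float)
    assert len(x) % 2 == 0, "signal length must be even for this sketch"
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt_1level(approx, detail):
    """Inverse of the single Haar analysis level."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

x = [4, 6, 10, 12, 8, 8, 0, 2]
a, d = haar_dwt_1level(x)
print(a, d)                        # coarse trend vs. local differences
print(haar_idwt_1level(a, d))      # perfect reconstruction of the original signal
```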
Wavelet-based Compression
Despite the advantages of DCT-based JPEG compression schemes, namely simplicity, satisfactory performance, and the availability of special-purpose hardware for implementation, they are not without shortcomings. Since the input image needs to be "blocked," correlation across the block boundaries is not eliminated. This results in noticeable and annoying "blocking artifacts," particularly at low bit rates. Lapped Orthogonal Transforms (LOT) attempt to solve this problem by using smoothly overlapping blocks. Although blocking effects are reduced in LOT-compressed images, the increased computational complexity of such algorithms does not justify a wide replacement of the DCT by the LOT.
Over the past several years, the wavelet transform has gained widespread acceptance
in signal processing in general and in image compression research in particular. In many
applications, wavelet-based schemes (also referred to as subband coding) outperform other coding schemes, such as those based on the DCT. Since there is no need to block the input image and its basis functions have variable length, wavelet coding schemes avoid blocking artifacts at higher compression ratios. Wavelet-based coding is more robust under
transmission and decoding errors, and also facilitates progressive transmission of images.
Because of their inherent multiresolution nature, wavelet coding schemes are especially
suitable for applications where scalability and tolerable degradation are important.
JPEG (DCT-Based)
JPEG is the image compression standard developed by the Joint Photographic Experts
Group. It works best on natural images (scenes). The discovery of DCT in 1974 was an
important achievement for the research community working on image compression. The
DCT can be regarded as a discrete-time version of the Fourier-Cosine series. It is a close
relative of DFT, a technique for converting a signal into elementary frequency
components. Thus DCT can be computed with a Fast Fourier Transform (FFT) like algorithm
in O(n log n) operations. Unlike DFT, DCT is real-valued and provides a better
approximation of a signal with fewer coefficients. The DCT of a discrete signal x(n), n = 0, 1, ..., N-1, is defined as

C(u) = \alpha(u) \sum_{n=0}^{N-1} x(n)\, \cos\!\left[\frac{(2n+1)u\pi}{2N}\right], \qquad u = 0, 1, \ldots, N-1,

where \alpha(0) = \sqrt{1/N} and \alpha(u) = \sqrt{2/N} for u \geq 1.
In 1992, JPEG established the first international standard for still image compression where
the encoders and decoders are DCT-based. The JPEG standard specifies three modes of lossy encoding, namely sequential, progressive, and hierarchical, as well as one mode of lossless encoding.
JPEG 2000, as noted previously, is the next ISO/ITU-T standard for still image coding. JPEG
2000 is based on the discrete wavelet transform (DWT), scalar quantization, context
modeling, arithmetic coding and post compression rate allocation. The DWT is dyadic
and can be performed with either a reversible filter, which provides for lossless coding, or a non-reversible one, which provides higher compression but cannot be lossless. The
quantizer follows an embedded dead-zone scalar approach and is independent for
each sub-band. Each sub-band is divided into blocks, typically 64x64, and entropy
coded using context modeling and bit-plane arithmetic coding. The coded data is
organized in so called layers, which are quality levels, using the post-compression rate
allocation and output to the code stream in packets. The generated code-stream is
parseable and can be resolution, layer (i.e. SNR), position or component progressive, or
any combination thereof. JPEG 2000 also supports error resilience, arbitrarily shaped regions of interest, random access, multicomponent images, palettized color, and compressed-domain lossless flipping and simple rotation, to mention a few features.
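A minimal sketch of the dead-zone scalar quantization step described above: coefficient magnitudes smaller than one step size fall into the dead zone and map to zero. The step size delta and the reconstruction offset are illustrative values, not parameters taken from the JPEG 2000 specification.

```python
import numpy as np

def deadzone_quantize(coeffs, delta):
    """Dead-zone scalar quantization: q = sign(c) * floor(|c| / delta)."""
    return np.sign(coeffs) * np.floor(np.abs(coeffs) / delta)

def deadzone_dequantize(q, delta, r=0.5):
    """Reconstruct non-zero indices at an offset r within their decision interval."""
    return np.sign(q) * (np.abs(q) + r) * delta * (q != 0)

subband = np.array([0.3, -1.7, 5.2, -0.9, 12.6])   # made-up sub-band coefficients
q = deadzone_quantize(subband, delta=2.0)
print(q)                                  # small coefficients fall into the dead zone (index 0)
print(deadzone_dequantize(q, delta=2.0))  # only the surviving coefficients are reconstructed
```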
Features
Traditional JPEG compression uses the Discrete Cosine Transformation (DCT), which
compresses an image in 8x8 blocks and results in visible artifacts at high compression
rates. JPEG artifacts include visible seams at the tile edges, dubbed "blocking artifacts". The wavelet transform encodes an image in a continuous stream, allowing it to avoid the artifacts that result from the DCT's division of an image into discrete compression blocks. Wavelet artifacts take the form of blurred high-contrast lines, merely making the image look softer. The wavelet transform performs what is called multi-resolution compression: it stores image information in a series of bands, with the most important
image information at the beginning of the file. Each band contains a representation of
the entire image, with the various bands containing details of the image at every level,
from coarse resolution and textures to fine details.
Entropy Encoding
Lossless compression techniques frequently involve some form of entropy encoding and are based on information-theoretic techniques. Entropy encoding is an example of lossless encoding, as the decompression process regenerates the data completely: the raw data and the decompressed data are identical, and no information is lost. Entropy encoding manipulates bit streams without regard to what the bits mean, usually transforming the bit pattern into a different form for transmission. These methods exploit only the statistical redundancy of the data. There are several techniques of this kind. Entropy encoding is used regardless of the media's specific characteristics: the data stream to be compressed is treated as a simple digital sequence, and the semantics of the data are ignored.
Run Length Encoding (RLE)
Run length coding is an example of entropy encoding. If a byte occurs at least four consecutive times, the number of occurrences is counted. The compressed data contains this byte followed by a special flag, called the M-byte, and the number of occurrences. The exclamation mark "!" can be defined as this M-byte. A single occurrence of the exclamation mark is interpreted as the M-byte during decompression; two consecutive exclamation marks are interpreted as an exclamation mark occurring within the data.
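The following Python sketch implements the scheme just described: runs of four or more identical bytes are replaced by the byte, the "!" M-byte, and the run length, while a literal "!" is escaped by doubling it. It is an illustration of the description above, not the RLE variant of any particular file format.

```python
M = ord("!")   # the "M-byte" marker from the description above

def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if b == M:
            out.extend(bytes([M, M]) * run)        # escape every literal "!" as "!!"
        elif run >= 4:
            if run == M:                           # avoid a count byte equal to the marker itself
                run -= 1
            out.extend(bytes([b, M, run]))         # byte, marker, run length
        else:
            out.extend(bytes([b]) * run)           # short runs are stored literally
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == M:
            if i + 1 < len(data) and data[i + 1] == M:
                out.append(M)                      # "!!" decodes to a single "!"
            else:
                count = data[i + 1]
                out.extend(bytes([out[-1]]) * (count - 1))  # repeat the previous byte
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

sample = b"ABBBBBBBBC! D"
packed = rle_encode(sample)
print(packed)                                      # b'AB!\x08C!! D'
assert rle_decode(packed) == sample
```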
This algorithm is very easy to implement and does not require much CPU horsepower. RLE
compression is only efficient with files that contain lots of repetitive data. These can be text files if they contain lots of spaces for indenting, but line-art images that contain large white or black areas are far more suitable. Computer-generated colour images (e.g. architectural drawings) can also give fair compression ratios. RLE compression is used in the following file formats:
• TIFF files
• PDF files
• BMP files
• PCX files
Huffman Encoding
This compression algorithm is mainly efficient at compressing text or program files. Images such as those often used in prepress are better handled by other compression algorithms.
Huffman compression is mainly used in compression programs like pkZIP, lha, gz, zoo and
arj. It is also used within JPEG and MPEG compression.
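To illustrate the principle behind Huffman coding (frequent symbols receive short codewords, rare symbols long ones), the sketch below builds a prefix code from symbol frequencies using a heap. It is the textbook construction, not the exact table-building procedure of JPEG or the archivers mentioned above.

```python
import heapq
import itertools
from collections import Counter

def huffman_code(data):
    """Build a prefix code: frequent symbols get short codewords, rare symbols long ones."""
    freq = Counter(data)
    tie = itertools.count()                      # tie-breaker so the heap never compares dicts
    heap = [(weight, next(tie), {sym: ""}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                           # degenerate case: only one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)          # the two least probable subtrees...
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))   # ...are merged into one
    return heap[0][2]

text = "this is an example of a huffman tree"
codes = huffman_code(text)
encoded = "".join(codes[ch] for ch in text)
print(min(codes.items(), key=lambda kv: len(kv[1])))   # a very frequent symbol and its short code
print(len(encoded), "bits vs", 8 * len(text), "bits for the raw 8-bit text")
```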
LZW Encoding
LZW compression works best for files containing lots of repetitive data. This is often the
case with text and monochrome images. Files that are compressed but that do not
contain any repetitive information at all can even grow bigger. LZW compression is fast.
Royalties have to be paid to use LZW compression algorithms within applications (see below). LZW compression is used in the following file formats:
• TIFF files
• GIF files
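A minimal sketch of the LZW dictionary-building idea, showing why repetitive data compresses well: ever longer strings are added to the dictionary and each is later emitted as a single code. This is a plain byte-oriented illustration, not the exact variant used by TIFF or GIF.

```python
def lzw_encode(data: bytes):
    """LZW: grow a dictionary of byte strings and emit one code per longest known match."""
    dictionary = {bytes([i]): i for i in range(256)}   # start with all single bytes
    next_code = 256
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                                     # keep extending the current match
        else:
            codes.append(dictionary[w])                # emit the code for the longest match
            dictionary[wc] = next_code                 # learn the new string
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes

repetitive = b"TOBEORNOTTOBEORTOBEORNOT"
print(len(repetitive), "input bytes ->", len(lzw_encode(repetitive)), "output codes")
```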
Burrows-Wheeler Transform (BWT)
The Burrows-Wheeler Transform (BWT) is an algorithm that takes a block of data and rearranges it using a sorting
algorithm. The resulting output block contains exactly the same data elements that it
started with, differing only in their ordering. The transformation is reversible, meaning the
original ordering of the data elements can be restored with no loss of fidelity.
The BWT is performed on an entire block of data at once. Most of today's familiar lossless
compression algorithms operate in streaming mode, reading a single byte or a few bytes
at a time. With this transform, however, there is a need to operate on the largest chunks of data possible. Since the BWT operates on data in memory, some files will be too big to process in one piece; in these cases, the file must be split up and processed a block at a time.
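The sketch below applies the forward BWT to a small in-memory block by sorting all of its rotations (a naive approach; practical implementations use suffix sorting) and then inverts it. The "$" end-of-block marker is an assumption of this sketch, chosen so the original ordering can be recovered.

```python
def bwt(block: str) -> str:
    """Burrows-Wheeler Transform: sort all rotations of the block, keep the last column."""
    s = block + "$"                       # unique end-of-block marker (assumed absent from the data)
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(transformed: str) -> str:
    """Rebuild the rotation table column by column; the row ending in '$' is the original block."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]

data = "bananabananabanana"
out = bwt(data)
print(out)                 # same characters, reordered into long runs -> easy for RLE/entropy coding
assert inverse_bwt(out) == data
```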
The BWT is used in the following file formats:
• bzip2
• zip2