Unit 3 Image Compression
IMAGE COMPRESSION
1. Introduction:
A digital image is a rectangular array of dots, or picture elements, arranged in m rows and n columns.
The expression m×n is called the resolution of the image, and the dots are called pixels (except in the
cases of fax images and video compression, where they are referred to as pels). The term “resolution” is
sometimes also used to indicate the number of pixels per unit length of the image. Thus, dpi stands for
dots per inch.
The purpose of compression is to code the image data into a compact form, minimizing both the
number of bits in the representation, and the distortion caused by the compression. The importance of
image compression is emphasized by the huge amount of data in raster images: a typical gray‐scale image
of 512×512 pixels, each represented by 8 bits, contains 256 kilobytes of data. With the color information,
the number of bytes is tripled. If we talk about video at 25 frames per second, even one second
of color film requires approximately 19 megabytes of memory. Thus, the necessity for compression is
obvious.
Image compression addresses the problem of reducing the amount of data required to
represent a digital image. The underlying basis of the reduction process is the removal of
redundant data. From a mathematical viewpoint, this amounts to transforming a 2-D pixel
array into a statistically uncorrelated data set. The transformation is applied prior to storage
or transmission of the image. At some later time, the compressed image is decompressed to
reconstruct the original image or an approximation of it.
For the purpose of image compression it is useful to distinguish the following types of images:
1. A bilevel (or monochromatic) image. This is an image where the pixels can have one of two values,
normally referred to as black and white. Each pixel in such an image is represented by one bit, making
this the simplest type of image.
2. A grayscale image. A pixel in such an image can have one of the 2^n values 0 through 2^n − 1, indicating
one of 2^n shades of gray (or shades of some other color). The value of n is normally
compatible with a byte size; i.e., it is 4, 8, 12, 16, 24, or some other convenient multiple of 4 or of 8. The
set of the most‐significant bits of all the pixels is the most‐significant bitplane. Thus, a grayscale image
has n bitplanes.
3. A continuous-tone image. This type of image can have many similar colors (or grayscales). When
adjacent pixels differ by just one unit, it is hard or even impossible for the eye to distinguish their colors.
As a result, such an image may contain areas with colors that seem to vary continuously as the eye moves
along the area. A pixel in such an image is represented by either a single large number (in the case of
many grayscales) or three components (in the case of a color image). A continuous‐tone image is
normally a natural image (natural as opposed to artificial) and is obtained by taking a photograph with a
digital camera, or by scanning a photograph or a painting.
4. A discrete-tone image (also called a graphical image or a synthetic image). This is normally an artificial
image. It may have a few colors or many colors, but it does not have the noise and blurring of a natural
image. Examples are an artificial object or machine, a page of text, a chart, a cartoon, or the contents of a
computer screen. Artificial objects, text, and line drawings have sharp, well-defined edges, and are
therefore highly contrasted from the rest of the image (the background). Adjacent pixels in a discrete-tone
image are often either identical or vary significantly in value. Such an image does not compress well with
lossy methods, because the loss of just a few pixels may render a letter illegible, or change a familiar
pattern to an unrecognizable one.
5. A cartoon-like image. This is a color image that consists of uniform areas. Each area has a uniform
color but adjacent areas may have very different colors. This feature may be exploited to obtain excellent
compression.
2. Introduction to image compression
The term data compression refers to the process of reducing the amount of data required to represent a
given quantity of information. A clear distinction must be made between data and information. They are
not synonymous. In fact, data are the means by which information is conveyed. Various amounts of data
may be used to represent the same amount of information. A representation that uses more data than
necessary contains data (or words) that either provide no relevant information or simply restate what is
already known. Such a representation is said to contain data redundancy.
Data redundancy is a central issue in digital image compression. It is not an
abstract concept but a mathematically quantifiable entity. If n1 and n2 denote the number of
information-carrying units in two data sets that represent the same information, the relative data
redundancy RD of the first data set (the one characterized by n1) can be defined as
RD = 1 − 1/CR
where CR, commonly called the compression ratio, is
CR = n1/n2
For the case n2 = n1, CR = 1 and RD = 0, indicating that (relative to the second data set) the first
representation of the information contains no redundant data. When n2 ≪ n1, CR → ∞ and RD
→ 1, implying significant compression and highly redundant data. Finally, when n2 ≫ n1, CR → 0
and RD → −∞, indicating that the second data set contains much more data than the original
representation. In general, CR and RD lie in the open intervals (0, ∞) and (−∞, 1), respectively. A practical
compression ratio, such as 10 (or 10:1), means that the first data set has 10 information carrying units
(say, bits) for every 1 unit in the second or compressed data set. The corresponding redundancy of 0.9
implies that 90% of the data in the first data set is redundant.
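A two-line sketch of these definitions (the data-set sizes below are invented for illustration):

```python
def compression_stats(n1, n2):
    """Return (CR, RD) for an original of n1 units and a compressed version of n2 units."""
    cr = n1 / n2          # compression ratio CR = n1 / n2
    rd = 1 - 1 / cr       # relative data redundancy RD = 1 - 1/CR
    return cr, rd

# A 10:1 compression gives RD = 0.9, i.e., 90% of the first data set is redundant
print(compression_stats(1_000_000, 100_000))
```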
In digital image compression, three basic data redundancies can be identified and exploited:
1. coding redundancy,
2. interpixel redundancy,
3. psychovisual redundancy.
Data compression is achieved when one or more of these redundancies are reduced or eliminated.
2.1 Coding Redundancy
We know how the gray-level histogram of an image can provide a great deal of insight into the
construction of codes to reduce the amount of data used to represent it. Let us assume that a discrete
random variable rk in the interval [0, 1] represents the gray levels of an image and that each rk occurs with
probability pr(rk), given by
pr(rk) = nk / n,   k = 0, 1, 2, ..., L − 1
where L is the number of gray levels, nk is the number of times that the kth gray level appears in the
image, and n is the total number of pixels in the image. If the number of bits used to represent each value
of rk is l(rk), then the average number of bits required to represent each pixel is
Lavg = Σ (k = 0 to L−1) l(rk) pr(rk)
That is, the average length of the code words assigned to the various gray‐level values is found by
summing the product of the number of bits used to represent each gray level and the probability that the
gray level occurs. Thus the total number of bits required to code an M × N image is MNLavg.
Assigning fewer bits to the more probable gray levels than to the less probable ones achieves data
compression. This process commonly is referred to as variable-length coding. If the gray levels of an
image are coded in a way that uses more code symbols than absolutely necessary to represent each gray
level, the resulting image is said to contain coding redundancy. In general, coding redundancy is present
when the codes assigned to a set of events (such as gray‐level values) have not been selected to take full
advantage of the probabilities of the events. It is almost always present when an image's gray levels are
represented with a straight or natural binary code. In this case, the underlying basis for the coding
redundancy is that images are typically composed of objects that have a regular and somewhat predictable
morphology (shape) and reflectance, and are generally sampled so that the objects being depicted are
much larger than the picture elements. The natural consequence is that, in most images, certain gray
levels are more probable than others. A natural binary coding of their gray levels assigns the same number
of bits to both the most and least probable values, thus failing to minimize Lavg and resulting in coding
redundancy.
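To make Lavg concrete, the sketch below compares a fixed-length natural binary code with a variable-length code on a hypothetical four-level histogram (the probabilities and code lengths are invented for illustration):

```python
# Hypothetical gray-level probabilities pr(rk) for L = 4 levels
probs = [0.6, 0.25, 0.1, 0.05]
natural_bits = [2, 2, 2, 2]   # fixed-length (natural binary) code
vlc_bits = [1, 2, 3, 3]       # variable-length code: short codes for probable levels

def lavg(lengths, probabilities):
    """Average code length Lavg = sum over k of l(rk) * pr(rk)."""
    return sum(l * p for l, p in zip(lengths, probabilities))

print(lavg(natural_bits, probs))   # 2.00 bits/pixel
print(lavg(vlc_bits, probs))       # 1.55 bits/pixel: the coding redundancy is reduced
```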
Figure 1: Two images (a) and (b) and their gray-level histograms (c) and (d)
2.2 Interpixel Redundancy
In order to reduce the interpixel redundancies in an image, the 2-D pixel array normally used for
human viewing and interpretation must be transformed into a more efficient (but usually "nonvisual")
format. For example, the differences between adjacent pixels can be used to represent an image.
Transformations of this type (that is, those that remove interpixel redundancy) are referred to as
mappings. They are called reversible mappings if the original image elements can be reconstructed
from the transformed data set.
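As a minimal sketch of such a reversible mapping (assuming a single image row stored as a Python list), the differences between adjacent pixels are computed and then undone exactly:

```python
def to_differences(row):
    """Reversible mapping: keep the first pixel, then store adjacent differences."""
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

def from_differences(diffs):
    """Inverse mapping: a running sum reconstructs the original pixels exactly."""
    row = [diffs[0]]
    for d in diffs[1:]:
        row.append(row[-1] + d)
    return row

row = [100, 102, 103, 103, 101, 99]
print(to_differences(row))                            # small values cluster near zero
print(from_differences(to_differences(row)) == row)   # True: the mapping is reversible
```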
2.3 Psychovisual Redundancy
We know that the brightness of a region, as perceived by the eye, depends on factors other than simply
the light reflected by the region. For example, intensity variations (Mach bands) can be perceived in an
area of constant intensity. Such phenomena result from the fact that the eye does
not respond with equal sensitivity to all visual information. Certain information simply has less relative
importance than other information in normal visual processing. This information is said to be
psychovisually redundant. It can be eliminated without significantly impairing the quality of image
perception.
That psychovisual redundancies exist should not come as a surprise, because human perception of
the information in an image normally does not involve quantitative analysis of every pixel value in the
image. In general, an observer searches for distinguishing features such as edges or textural regions
and mentally combines them into recognizable groupings. The brain then correlates these groupings with
prior knowledge in order to complete the image interpretation process.
Psychovisual redundancy is fundamentally different from the redundancies discussed earlier. Unlike
coding and interpixel redundancy, psychovisual redundancy is associated with real or quantifiable visual
information. Its elimination is possible only because the information itself is not essential for normal
visual processing. Since the elimination of psychovisually redundant data results in a loss of quantitative
information, it is commonly referred to as quantization. This terminology is consistent with normal
usage of the word, which generally means the mapping of a broad range of input values to a limited
number of output values. As it is an irreversible operation (visual information is lost), quantization
results in lossy data compression.
Improved gray-scale (IGS) quantization method recognizes the eye's inherent sensitivity to
edges and breaks them up by adding to each pixel a pseudorandom number, which is generated from the
low-order bits of neighboring pixels, before quantizing the result. Because the low-order bits are fairly
random, this amounts to adding a level of randomness, which depends on the local characteristics of the
image, to the artificial edges normally associated with false contouring.
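A minimal sketch of one common textbook formulation of IGS quantization, reducing 8-bit pixels to 4-bit codes (treat the exact overflow rule as illustrative):

```python
def igs_quantize(pixels):
    """Improved gray-scale (IGS) quantization sketch: 8-bit pixels -> 4-bit codes.

    The 4 low-order bits of the previous sum are added to the current pixel
    (unless the pixel's high nibble is 1111, to avoid overflow), and the high
    nibble of the sum becomes the IGS code.
    """
    codes, prev_sum = [], 0
    for p in pixels:
        carry = prev_sum & 0x0F if (p & 0xF0) != 0xF0 else 0
        s = p + carry
        codes.append(s >> 4)      # keep the 4 most-significant bits
        prev_sum = s
    return codes

print(igs_quantize([108, 139, 135, 244]))   # [6, 9, 8, 15]
```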
Approach 1: This is appropriate for bi-level images. A pixel in such an image is represented by one bit.
Applying the principle of image compression to a bi‐level image therefore means that the immediate
neighbors of a pixel P tend to be identical to P. Thus, it makes sense to use run‐length encoding (RLE) to
compress such an image. A compression method for such an image may scan it in raster order (row by
row) and compute the lengths of runs of black and white pixels. The lengths are encoded by variable-size
(prefix) codes and are written on the compressed stream. An example of such a method is facsimile
compression.
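A minimal run-length encoder for one row of a bi-level image (the prefix coding of the run lengths is omitted; this only illustrates the run extraction):

```python
def run_lengths(row):
    """Return the first pixel's color and the run lengths of a bi-level row."""
    runs, count = [], 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return row[0], runs

# 0 = white, 1 = black
print(run_lengths([0, 0, 0, 1, 1, 0, 0, 0, 0, 1]))   # (0, [3, 2, 4, 1])
```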
Approach 2: Also for bi‐level images. The principle of image compression tells us that the neighbors of
a pixel tend to be similar to the pixel. We can extend this principle and conclude that if the current pixel
has color c (where c is either black or white), then pixels of the same color seen in the past (and also those
that will be found in the future) tend to have the same immediate neighbors.
This approach looks at n of the near neighbors of the current pixel and considers them an n-bit
number. This number is the context of the pixel. In principle there can be 2^n contexts, but because of
image redundancy we expect them to be distributed in a nonuniform way. Some contexts should be
common while others will be rare. This approach is used by JBIG.
Approach 3: Separate the grayscale image into n bi-level images and compress each with RLE and prefix
codes. The principle of image compression seems to imply intuitively that two adjacent pixels that are
similar in the grayscale image will be identical in most of the n bi-level images. This, however, is not
true for the natural binary code; a binary representation in which consecutive values differ by a single bit
is needed. An example of such a code is the reflected Gray code.
Approach 4: Use the context of a pixel to predict its value. The context of a pixel is the values of some
of its neighbors. We can examine some neighbors of a pixel P, compute an average A of their values, and
predict that P will have the value A. The principle of image compression tells us that our prediction will
be correct in most cases, almost correct in many cases, and completely wrong in a few cases. This is used
in the MLP method.
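A tiny sketch of this prediction step (the image and the choice of three causal neighbors are illustrative):

```python
def predict(img, i, j):
    """Predict pixel (i, j) as the average of three already-seen neighbors (W, N, NW)."""
    neighbors = [img[i][j - 1], img[i - 1][j], img[i - 1][j - 1]]
    return sum(neighbors) // len(neighbors)

img = [[100, 101, 103],
       [ 99, 102, 104]]
prediction = predict(img, 1, 2)
error = img[1][2] - prediction        # only the (usually small) error needs encoding
print(prediction, error)              # 102 2
```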
Approach 5: Transform the values of the pixels and encode the transformed values. Recall that
compression is achieved by reducing or removing redundancy. The redundancy of an image is caused by
the correlation between pixels, so transforming the pixels to a representation where they are decorrelated
eliminates the redundancy. It is also possible to think of a transform in terms of the entropy of the image.
In a highly correlated image, the pixels tend to have equiprobable values, which results in maximum
entropy. If the transformed pixels are decorrelated, certain pixel values become common, thereby having
large probabilities, while others are rare. This results in small entropy. Quantizing the transformed values
can produce efficient lossy image compression.
Approach 6: The principle of this approach is to separate a continuous-tone color image into three
grayscale images and compress each of the three separately, using approaches 3, 4, or 5. For a
continuous-tone image, the principle of image compression implies that adjacent pixels have similar,
although not necessarily identical, colors.
An important feature of this approach is the use of a luminance/chrominance color representation
instead of the more common RGB. The advantage of the luminance/chrominance representation is
that the eye is sensitive to small changes in luminance but not in
chrominance. This allows the loss of considerable data in the chrominance components, while making it
possible to decode the image without a significant visible loss of quality.
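A hedged sketch of the idea, using the commonly published JFIF-style RGB-to-YCbCr coefficients and a simple 2×2 chroma average (one typical subsampling choice):

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF-style luminance/chrominance conversion for 8-bit RGB values."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def subsample_2x2(block):
    """The eye tolerates coarse chrominance, so Cb and Cr are often averaged over 2x2 blocks."""
    return sum(block) / 4.0

print(rgb_to_ycbcr(118, 206, 12))
```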
Approach 7: A different approach is needed for discrete‐tone images. Recall that such an image contains
uniform regions, and a region may appear several times in the image. A good example is a screen dump.
Such an image consists of text and icons. Each character of text and each icon is a region, and any region
may appear several times in the image. A possible way to compress such an image is to scan it, identify
regions, and find repeating regions. If a region B is identical to an already found region A, then B can be
compressed by writing a pointer to A on the compressed stream. The block decomposition method
(FABD) is an example of how this approach can be implemented.
Approach 8: Partition the image into parts (overlapping or not) and compress it by processing the parts
one by one. Suppose that the next unprocessed image part is part number 15. Try to match it with parts 1–
14 that have already been processed. If part 15 can be expressed, for example, as a combination of parts 5
(scaled) and 11 (rotated), then only the few numbers that specify the combination need be saved, and part
15 can be discarded. If part 15 cannot be expressed as a combination of already‐processed parts, it is
declared processed and is saved in raw format.
This approach is the basis of the various fractal methods for image compression. It applies the
principle of image compression to image parts instead of to individual pixels. Applied this way, the
principle tells us that “interesting” images (i.e., those that are being compressed in practice) have a certain
amount of self-similarity. Parts of the image are identical or similar to the entire image or to other parts.
An image compression method that has been developed specifically for a certain type of image can
sometimes be used for other types. Any method for compressing bi‐level images, for example, can be
used to compress grayscale images by separating the bitplanes and compressing each individually, as if it
were a bi‐level image. Imagine, for example, an image with 16 grayscale values. Each pixel is defined by
four bits, so the image can be separated into four bi‐level images. The trouble with this approach is that it
violates the general principle of image compression. Imagine two adjacent 4-bit pixels with values 7 =
0111₂ and 8 = 1000₂. These pixels have close values, but when separated into four bitplanes, the
resulting 1-bit pixels are different in every bitplane! This is because the binary representations of the
consecutive integers 7 and 8 differ in all four bit positions. In order to apply any bi‐level
compression method to grayscale images, a binary
representation of the integers is needed where consecutive integers have codes differing by one bit only.
Such a representation exists and is called reflected Gray code (RGC).
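The RGC of an integer can be computed with one shift and one XOR; a minimal sketch of the standard binary-reflected construction:

```python
def gray_code(n):
    """Reflected Gray code of n: consecutive integers differ in exactly one bit."""
    return n ^ (n >> 1)

for n in (6, 7, 8, 9):
    print(n, format(n, '04b'), format(gray_code(n), '04b'))
# 7 -> 0100 and 8 -> 1100 differ in a single bit, unlike 0111 and 1000
```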
The conclusion is that the most-significant bitplanes of an image obey the principle of image
compression more than the least-significant ones. When adjacent pixels have values that differ by one
unit (such as p and p+1), chances are that the least‐significant bits are different and the most‐significant
ones are identical. Any image compression method that compresses bitplanes individually should
therefore treat the least-significant bitplanes differently from the most-significant ones, or should use
RGC instead of the binary code to represent pixels. In a typical 8-bit image, the bitplanes are numbered 8 (the leftmost or most-
significant bits) through 1 (the rightmost or least‐significant bits). It is obvious that the least‐significant
bitplane doesn’t show any correlations between the pixels; it is random or very close to random in both
binary and RGC. Bitplanes 2 through 5, however, exhibit better pixel correlation in the Gray code.
Bitplanes 6 through 8 look different in Gray code and binary, but seem to be highly correlated in either
representation.
Color images provide another example of using the same compression method across image
types. Any compression method for grayscale images can be used to compress color images. In a color
image, each pixel is represented by three color components (such as RGB). Imagine a color image where
each color component is represented by one byte. A pixel is represented by three bytes, or 24 bits, but
these bits should not be considered a single number. The two pixels 118|206|12 and 117|206|12 differ by
just one unit in the first component, so they have very similar colors. Considered as 24‐bit numbers,
however, these pixels are very different, since they differ in one of their most significant bits. Any
compression method that treats these pixels as 24‐bit numbers would consider these pixels very different,
and its performance would suffer as a result.
A compression method for grayscale images can be applied to compressing color images, but the color
image should first be separated into three color components, and each component compressed
individually as a grayscale image.
5. Error Metrics
Developers and implementers of lossy image compression methods need a standard metric to measure the
quality of reconstructed images compared with the original ones. The better a reconstructed image
resembles the original one, the bigger should be the value produced by this metric. Such a metric should
also produce a dimensionless number, and that number should not be very sensitive to small variations in
the reconstructed image.
A common measure used for this purpose is the peak signal to noise ratio (PSNR). Higher PSNR
values imply closer resemblance between the reconstructed and the original images, but they do not
provide a guarantee that viewers will like the reconstructed image. Denoting the pixels of the original
image by Pi and the pixels of the reconstructed image by Qi (where 1 ≤ i ≤ n), we first define the mean
square error (MSE) between the two images as
MSE = (1/n) Σ (i = 1 to n) (Pi − Qi)^2
It is the average of the square of the errors (pixel differences) of the two images. The root mean
square error (RMSE) is defined as the square root of the MSE, and the PSNR is defined as
PSNR = 20 log10 [ √((1/n) Σ (i = 1 to n) Pi^2) / RMSE ]
The numerator is the root mean square of the original image.
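A small numpy sketch computing these metrics exactly as defined above (the two tiny arrays are invented test data):

```python
import numpy as np

def error_metrics(original, reconstructed):
    """MSE, RMSE and PSNR as defined above (the PSNR numerator is the RMS of the original)."""
    p = np.asarray(original, dtype=float)
    q = np.asarray(reconstructed, dtype=float)
    mse = np.mean((p - q) ** 2)
    rmse = np.sqrt(mse)
    rms_signal = np.sqrt(np.mean(p ** 2))
    return mse, rmse, 20 * np.log10(rms_signal / rmse)

print(error_metrics([100, 102, 98, 101], [101, 102, 97, 100]))
```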
Another relative of the PSNR is the signal to quantization noise ratio (SQNR), which measures
the effect of quantization on signal quality.
6. Image Transforms
Image frequencies are important because of the following basic fact: Low frequencies
correspond to the important image features, whereas high frequencies correspond to the details of the
image, which are less important. Thus, when a transform isolates the various image frequencies, pixels
that correspond to high frequencies can be quantized heavily, whereas pixels that correspond to low
frequencies should be quantized lightly or not at all. This is how a transform can compress an image very
effectively by losing information, but only information associated with unimportant image details.
Practical image transforms should be fast and preferably also simple to implement. This suggests the use
of linear transforms. In such a transform, each transformed value (or transform coefficient) ci is a
weighted sum of the data items (the pixels) dj that are being transformed, where each item is multiplied
by a weight wij. Thus, ci = Σj dj wij for i, j = 1, 2, . . . , n. For n = 4, this is
expressed in matrix notation:
( c1 )   ( w11 w12 w13 w14 ) ( d1 )
( c2 ) = ( w21 w22 w23 w24 ) ( d2 )
( c3 )   ( w31 w32 w33 w34 ) ( d3 )
( c4 )   ( w41 w42 w43 w44 ) ( d4 )
For the general case, we can write C = W·D. Each row of W is called a “basis vector.” The only quantities
that have to be computed are the weights wij. The guiding principles are as follows:
1. Reducing redundancy. The first transform coefficient c1 can be large, but the remaining values
c2, c3, . . . should be small.
2. Isolating frequencies. The first transform coefficient c1 should correspond to zero pixel
frequency, and the remaining coefficients should correspond to higher and higher frequencies.
The key to determining the weights wij is the fact that our data items dj are not arbitrary numbers but pixel
values, which are nonnegative and correlated.
This choice of wij satisfies the first requirement: to reduce pixel redundancy by means of a
transform. In order to satisfy the second requirement, the weights wij of row i should feature frequencies
that get higher with i. Weights w1j should have zero frequency; they should all be +1’s. Weights w2j
should have one sign change; i.e., they should be +1, +1, . . . , +1, −1, −1, . . . , −1. This continues until the
last row of weights wnj, which should have the highest frequency: +1, −1, +1, −1, . . . , +1, −1.
+1,−1. The mathematical discipline of vector spaces coins the term “basis vectors” for our rows of
weights.
In addition to isolating the various frequencies of pixels dj, this choice results in basis vectors that are
orthogonal. The basis vectors are the rows of matrix W, which is why this matrix and, by implication, the
entire transform are also termed orthogonal. These considerations are satisfied by the orthogonal matrix
W = ( 1  1  1  1 )
    ( 1  1 −1 −1 )
    ( 1 −1 −1  1 )
    ( 1 −1  1 −1 )
The first basis vector (the top row of W) consists of all 1’s, so its frequency is zero. Each of the
subsequent vectors has two +1’s and two −1’s, so they produce small transformed values, and their
frequencies (measured as the number of sign changes along the basis vector) get higher. It is also possible
to modify this transform to conserve the energy of the data vector. All that’s needed is to multiply the
transformation matrix W by the scale factor 1/2. Another advantage of W is that it also performs the
inverse transform.
6.2 Two-Dimensional Transforms
Given two‐dimensional data such as the 4X4 matrix
5 6 7 4
6 5 7 5
7 7 6 6
8 8 8 8
where each of the four columns is highly correlated, we can apply our simple one-dimensional transform to
the columns of D. The result is,
             ( 1  1  1  1 )   ( 5 6 7 4 )   ( 26 26 28 23 )
C′ = W · D = ( 1  1 −1 −1 ) · ( 6 5 7 5 ) = ( −4 −4  0 −5 )
             ( 1 −1 −1  1 )   ( 7 7 6 6 )   (  0  2  2  1 )
             ( 1 −1  1 −1 )   ( 8 8 8 8 )   ( −2  0 −2 −3 )
Each column of C’ is the transform of a column of D. Notice how the top element of each column
of C’ is dominant, because the data in the corresponding column of D is correlated. Notice also that the
rows of C’ are still correlated. C’ is the first stage in a two‐stage process that produces the two‐
dimensional transform of matrix D. The second stage should transform each row of C’, and this is done
by multiplying C′ by the transpose Wᵀ. Our particular W, however, is symmetric, so we end up with C = C′·Wᵀ = W·D·W.
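A short numpy sketch that reproduces this two-stage computation with the matrices shown above (the scale factor 1/2 mentioned earlier is omitted, as in the worked example):

```python
import numpy as np

W = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])
D = np.array([[5, 6, 7, 4],
              [6, 5, 7, 5],
              [7, 7, 6, 6],
              [8, 8, 8, 8]])

C_prime = W @ D          # first stage: transform each column of D
C = C_prime @ W.T        # second stage: transform each row of C' (W is symmetric)
print(C)                 # the top-left coefficient dominates; the rest are small
```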
The transform used by JPEG is the discrete cosine transform (DCT). The one-dimensional DCT of N data
items f(x), x = 0, 1, . . . , N−1, is defined as
C(u) = α(u) Σ (x = 0 to N−1) f(x) cos[ (2x + 1)uπ / 2N ]
where α(u) = √(1/N) for u = 0 and α(u) = √(2/N) for u = 1, 2, . . . , N−1.
The corresponding 2‐D DCT, and the inverse DCT are defined as
C(u, v) = α(u)α(v) Σ (x = 0 to N−1) Σ (y = 0 to N−1) f(x, y) cos[ (2x + 1)uπ / 2N ] cos[ (2y + 1)vπ / 2N ]
and
f(x, y) = Σ (u = 0 to N−1) Σ (v = 0 to N−1) α(u)α(v) C(u, v) cos[ (2x + 1)uπ / 2N ] cos[ (2y + 1)vπ / 2N ]
The advantage of the DCT is that it can be expressed without complex numbers. The 2-D DCT is also separable
(like the 2-D Fourier transform), i.e., it can be obtained by two successive 1-D DCTs.
The important feature of the DCT, the feature that makes it so useful in data
compression, is that it takes correlated input data and concentrates its energy in just the
first few transform coefficients. If the input data consists of correlated quantities, then most
of the N transform coefficients produced by the DCT are zeros or small numbers, and only a
few are large (normally the first ones).
Compressing data with the DCT is therefore done by quantizing the coefficients. The
small ones are quantized coarsely (possibly all the way to zero), and the large ones can be
quantized finely to the nearest integer. After quantization, the coefficients (or variable-size
codes assigned to the coefficients) are written on the compressed stream. Decompression is
done by performing the inverse DCT on the quantized coefficients. This results in data items
that are not identical to the original ones but are not much different.
In practical applications, the data to be compressed is partitioned into sets of N items
each and each set is DCT-transformed and quantized individually. The value of N is critical.
Small values of N such as 3, 4, or 6 result in many small sets of data items. Such a small set
is transformed to a small set of coefficients where the energy of the original data is
concentrated in a few coefficients, but there are only a few coefficients in such a set! Thus,
there are not enough small coefficients to quantize. Large values of N result in a few large
sets of data. The problem in such a case is that the individual data items of a large set are
normally not correlated and therefore result in a set of transform coefficients where all the
coefficients are large. Experience indicates that N= 8 is a good value, and most data
compression methods that employ the DCT use this value of N.
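The sketch below builds the N = 8 DCT matrix directly from the definition above and applies it to one 8×8 block of a smooth (correlated) ramp, showing the energy concentrating in the first few coefficients (the block contents are invented test data):

```python
import numpy as np

N = 8
# DCT-II matrix: T[u, x] = alpha(u) * cos((2x + 1) * u * pi / (2N))
idx = np.arange(N)
T = np.cos((2 * idx[None, :] + 1) * idx[:, None] * np.pi / (2 * N))
T *= np.sqrt(2.0 / N)
T[0, :] = np.sqrt(1.0 / N)

block = np.add.outer(np.arange(N), np.arange(N)) * 4 + 100   # smooth 8x8 ramp
coeffs = T @ block @ T.T           # separable 2-D DCT: rows then columns
print(np.round(coeffs, 1))         # large DC term, small high-frequency terms
pixels = T.T @ coeffs @ T          # inverse DCT reconstructs the block
print(np.allclose(pixels, block))  # True (no quantization applied here)
```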
7. JPEG
JPEG is a sophisticated lossy/lossless compression method for color or grayscale still images. It does not
handle bi‐level (black and white) images very well. It also works best on continuous‐tone images, where
adjacent pixels have similar colors. An important feature of JPEG is its use of many parameters, allowing
the user to adjust the amount of the data lost (and thus also the compression ratio) over a very wide range.
Often, the eye cannot see any image degradation even at compression factors of 10 or 20. There are two
operating modes, lossy (also called baseline) and lossless (which typically produces compression ratios
of around 0.5). Most implementations support just the lossy mode. This mode includes progressive and
hierarchical coding. JPEG is a compression method, not a complete standard for image representation.
This is why it does not specify image features such as pixel aspect ratio, color space, or interleaving of
bitmap rows. JPEG has been designed as a compression method for continuous‐tone images.
1. Color images are transformed from RGB into a luminance/chrominance color space. The eye is less
sensitive to chrominance, so the chrominance components can later be subjected to much greater loss.
2. The chrominance components are downsampled by creating low-resolution pixels from the original ones.
3. The pixels of each color component are organized in groups of 8×8 pixels called data units, and
each data unit is compressed separately. If the number of image rows or columns is not a multiple of 8,
the bottom row and the rightmost column are duplicated as many times as necessary. In the
noninterleaved mode, the encoder handles all the data units of the first image component, then the data
units of the second component, and finally those of the third component. In the interleaved mode the
encoder processes the three top‐left data units of the three image components, then the three data units to
their right, and so on.
4. The discrete cosine transform is then applied to each data unit to create an 8×8 map of frequency
components. They represent the average pixel value and successive higher‐frequency changes within the
group. This prepares the image data for the crucial step of losing information.
5. Each of the 64 frequency components in a data unit is divided by a separate number called its
quantization coefficient (QC), and then rounded to an integer. This is where information is irretrievably
lost. Large QCs cause more loss, so the high frequency components typically have larger QCs. Each of
the 64 QCs is a JPEG parameter and can, in principle, be specified by the user. In practice, most JPEG
implementations use the QC tables recommended by the JPEG standard for the luminance and
chrominance image components.
6. The 64 quantized frequency coefficients (which are now integers) of each data unit are encoded using a
combination of RLE and Huffman coding.
7. The last step adds headers and all the required JPEG parameters, and outputs the result. The
compressed file may be in one of three formats (1) the interchange format, in which the file contains the
compressed image and all the tables needed by the decoder (mostly quantization tables and tables of
Huffman codes), (2) the abbreviated format for compressed image data, where the file contains the
compressed image and may contain no tables (or just a few tables), and (3) the abbreviated format for
table‐specification data, where the file contains just tables, and no compressed image. The second
format makes sense in cases where the same encoder/decoder pair is used, and they have the same tables
built in. The third format is used in cases where many images have been compressed by the same encoder,
using the same tables. When those images need to be decompressed, they are sent to a decoder preceded
by one file with table‐specification data.
The JPEG decoder performs the reverse steps. (Thus, JPEG is a symmetric compression
method.)
Figures 4 and 5 show the block diagrams of the JPEG encoder and decoder. Figure 6 shows JPEG for
RGB images.
7.1 Modes of JPEG algorithm:
The progressive mode is a JPEG option. In this mode, higher‐frequency DCT coefficients are
written on the compressed stream in blocks called “scans.” Each scan that is read and processed by the
decoder results in a sharper image. The idea is to use the first few scans to quickly create a low ‐quality,
blurred preview of the image, and then either input the remaining scans or stop the process and reject the
image. The trade‐off is that the encoder has to save all the coefficients of all the data units in a memory
buffer before they are sent in scans, and also go through all the steps for each scan, slowing down the
progressive mode.
In the hierarchical mode, the encoder stores the image several times in the output stream, at
several resolutions. However, each high‐resolution part uses information from the low‐
resolution parts of the output stream, so the total amount of information is less than that required to store
the different resolutions separately. Each hierarchical part may use the progressive mode. The hierarchical
mode is useful in cases where a high‐resolution image needs to be output in low resolution. Older dot‐
matrix printers may be a good example of a low‐resolution output device still in use.
The lossless mode of JPEG calculates a “predicted” value for each pixel, generates the difference
between the pixel and its predicted value, and encodes the difference using the same method (i.e.,
Huffman or arithmetic coding) employed by step 5 above. The predicted value is calculated using values
of pixels above and to the left of the current pixel (pixels that have already been input and encoded).
7.2 Why DCT?
The JPEG committee elected to use the DCT because of its good performance, because it does
not assume anything about the structure of the data (the DFT, for example, assumes that the data to be
transformed is periodic), and because there are ways to speed it up. DCT has two key advantages: the
decorrelation of the information by generating coefficients which are almost independent of each other
and the concentration of this information in a greatly reduced number of coefficients. It reduces
redundancy while guaranteeing a compact representation.
The JPEG standard calls for applying the DCT not to the entire image but to data units (blocks) of
8×8 pixels. The reasons for this are: (1) applying the DCT to large blocks involves many arithmetic
operations and is therefore slow, while applying it to small data units is faster; (2) experience shows that,
in a continuous‐tone image, correlations between pixels are short range. A pixel in such an image has a
value (color component or shade of gray) that’s close to those of its near neighbors, but has nothing to do
with the values of far neighbors. The JPEG DCT is therefore executed with n = 8.
The DCT is JPEG’s key to lossy compression. The unimportant image information is reduced or
removed by quantizing the 64 DCT coefficients, especially the ones located toward the lower-right. If
the pixels of the image are correlated, quantization does not degrade the image quality much. For
best results, each of the 64 coefficients is quantized by dividing it by a different quantization coefficient
(QC). All 64 QCs are parameters that can be controlled, in principle, by the user. Mathematically, the
DCT is a one-to-one mapping of 64-point vectors from the image domain to the frequency domain.
The IDCT is the reverse mapping. If the DCT and IDCT could be
calculated with infinite precision and if the DCT coefficients were not quantized, the original 64
pixels would be exactly reconstructed.
7.3 Quantization
After each 8×8 data unit of DCT coefficients Gij is computed, it is quantized. This is the step
where information is lost (except for some unavoidable loss because of finite precision calculations in
other steps). Each number in the DCT coefficients matrix is divided by the corresponding number from
the particular “quantization table” used, and the result is rounded to the nearest integer. As has already
been mentioned, three such tables are needed, for the three color components. The JPEG standard allows
for up to four tables, and the user can select any of the four for quantizing each color component.
The 64 numbers that constitute each quantization table are all JPEG parameters. In principle, they
can all be specified and fine‐tuned by the user for maximum compression. In practice, few users have the
patience or expertise to experiment with so many parameters, so JPEG software normally uses the
following two approaches:
1. Default quantization tables. Two such tables, for the luminance (grayscale) and the chrominance
components, are the result of many experiments performed by the JPEG committee. They are included in
the JPEG standard and are reproduced here as Table 1. It is easy to see how the QCs in the table generally
grow as we move from the upper left corner to the bottom right corner. This is how JPEG reduces the
DCT coefficients with high spatial frequencies.
2. A simple quantization table Q is computed based on one parameter R specified by the user. A simple
expression such as Qij = 1+(i + j) × R guarantees that QCs start small at the upper‐left corner and get
bigger toward the lower‐right corner. Table 2 shows an example of such a table with R = 2.
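A minimal sketch that generates such a table and quantizes one coefficient block with it (R and the coefficients are illustrative values):

```python
import numpy as np

def simple_q_table(R, n=8):
    """Qij = 1 + (i + j) * R: QCs grow from the upper-left to the lower-right corner."""
    i, j = np.indices((n, n))
    return 1 + (i + j) * R

Q = simple_q_table(R=2)
print(Q[0, 0], Q[7, 7])          # 1 and 29: high frequencies are quantized coarsely

coeffs = np.random.randint(-50, 50, size=(8, 8))   # stand-in for DCT coefficients
quantized = np.round(coeffs / Q).astype(int)       # this is where information is lost
```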
After a data unit has been quantized, its coefficients are encoded as follows:
1. The 64 numbers are collected by scanning the matrix in zigzags (see the sketch after this list). This
produces a string of 64 numbers that starts with some nonzeros and typically ends with many consecutive
zeros. Only the nonzero numbers are output (after further compressing them) and are followed by a
special end-of-block (EOB) code. This way there is no need to output the trailing zeros (we can say that
the EOB is the run-length encoding of all the trailing zeros).
2. The nonzero numbers are compressed using Huffman coding.
3. The first of those numbers (the DC coefficient) is treated differently from the others (the AC
coefficients).
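A sketch of the zigzag collection and of the run-length pairing (walking anti-diagonals in alternating directions yields the standard zigzag order; the block values are illustrative):

```python
def zigzag(block):
    """Collect an 8x8 block in zigzag order by walking its anti-diagonals."""
    n = len(block)
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # even diagonals are traversed upward, odd ones downward
        order.extend(diag if s % 2 else reversed(diag))
    return [block[i][j] for i, j in order]

def run_length_pairs(seq):
    """Produce (zero-run-length, value) pairs for the nonzero values, then 'EOB'."""
    pairs, zeros = [], 0
    for v in seq[1:]:              # the DC coefficient (seq[0]) is coded separately
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    pairs.append("EOB")
    return pairs

# Example: a block whose only nonzero quantized coefficients sit near the top-left
block = [[60, -3, 0, 0, 0, 0, 0, 0]] + [[0] * 8 for _ in range(7)]
block[1][0] = 2
print(run_length_pairs(zigzag(block)))   # [(0, -3), (0, 2), 'EOB']
```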
7.4 Coding:
Each 8×8 matrix of quantized DCT coefficients contains one DC coefficient [at position (0, 0),
the top left corner] and 63 AC coefficients. The DC coefficient is a measure of the average value of the 64
original pixels, constituting the data unit. Experience shows that in a continuous‐tone image, adjacent data
units of pixels are normally correlated in the sense that the average values of the pixels in adjacent data
units are close. We already know that the DC coefficient of a data unit is a multiple of the average of the
64 pixels constituting the unit. This implies that the DC coefficients of adjacent data units don’t differ
much. JPEG outputs the first one (encoded), followed by differences (also encoded) of the DC
coefficients of consecutive data units.
Example: If the first three 8×8 data units of an image have quantized DC coefficients of 1118,
1114, and 1119, then the JPEG output for the first data unit is 1118 (Huffman encoded) followed by the
63 (encoded) AC coefficients of that data unit. The output for the second data unit will be 1114 − 1118 =
−4 (also Huffman encoded), followed by the 63 (encoded) AC coefficients of that data unit, and the
output for the third data unit will be 1119 − 1114 = 5 (also Huffman encoded), again followed by the 63
(encoded) AC coefficients of that data unit. This way of handling the DC coefficients is worth the extra
trouble, because the differences are small.
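The DC handling from the example above, as a two-line sketch:

```python
def dc_differences(dc_coefficients):
    """Encode the first DC coefficient as-is, then only the differences between neighbors."""
    return [dc_coefficients[0]] + [b - a for a, b in zip(dc_coefficients, dc_coefficients[1:])]

print(dc_differences([1118, 1114, 1119]))   # [1118, -4, 5]
```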
Assume that 46 bits encode one color component of the 64 pixels of a data unit. Let’s
assume that the other two color components are also encoded into 46-bit numbers. If each
pixel originally consists of 24 bits, then this corresponds to a compression factor of 64 ×
24/(46 × 3) ≈ 11.13; very impressive!
Each quantized spectral domain is composed of a few nonzero quantized coefficients and a majority of
coefficients reduced to zero in the quantization stage. The positioning of the zeros changes
from one block to another. As shown in Figure 7, a zigzag scanning of the block is performed in order to
create a vector of coefficients with a lot of zero runlengths. The natural images generally have low
frequency characteristics. By beginning the zigzag scanning at the top left (by the low frequency zone),
the vector generated will at first contain significant coefficients, and then more and more runlengths of
zeros as we move towards the high frequency coefficients. Figure 7 gives us an example.
Figure 7. Zigzag scanning of a quantized DCT domain, the resulting coefficient vector, and the
generation of pairs (zero runlength, DCT coefficient). EOB stands for “end of block”
Pairs of (zero run-length, DCT coefficient value) are then generated and coded by a set of Huffman
coders defined in the JPEG standard. The mean values of the blocks (DC coefficient) are coded separately
by a DPCM method. Finally, the “.jpg” file is constructed with the union of the bitstreams associated with
the coded blocks.
8. JPEG – LS:
JPEG‐LS is a new standard for the lossless (or near‐lossless) compression of continuous tone images.
JPEG‐LS examines several of the previously‐seen neighbors of the current pixel, uses them as the context
of the pixel, uses the context to predict the pixel and to select a probability distribution out of several such
distributions, and uses that distribution to encode the prediction error with a special Golomb code. There
is also a run mode, where the length of a run of identical pixels is encoded. Figure 8 below shows the
block diagram of JPEG‐LS encoder.
The context used to predict the current pixel x is shown in Figure 9. The encoder examines the
context pixels and decides whether to encode the current pixel x in the run mode or in the
regular mode. If the context suggests that the pixels y, z,. . . following the current pixel are likely to be
identical, the encoder selects the run mode. Otherwise, it selects the regular mode. In the near-lossless
mode the decision is slightly different. If the context suggests that the pixels following the current pixel
are likely to be almost identical (within the tolerance parameter NEAR), the encoder selects the run mode.
Otherwise, it selects the regular mode. The rest of the encoding process depends on the mode selected.
In the regular mode, the encoder uses the values of context pixels a, b, and c to predict pixel x,
and subtracts the prediction from x to obtain the prediction error, denoted by Errval. This error is then
corrected by a term that depends on the context (this correction is done to compensate for systematic
biases in the prediction), and encoded with a Golomb code. The Golomb coding depends on all four
pixels of the context and also on prediction errors that were previously encoded for the same context (this
information is stored in arrays A and N). If near‐lossless compression is used, the error is quantized before
it is encoded.
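The predictor used in the regular mode of JPEG-LS (LOCO-I) is the so-called median edge detector, which chooses among the neighbors a (left), b (above) and c (above-left); a small sketch:

```python
def med_predict(a, b, c):
    """Median edge detector of JPEG-LS: a = left, b = above, c = above-left neighbor."""
    if c >= max(a, b):
        return min(a, b)      # a horizontal or vertical edge is likely nearby
    if c <= min(a, b):
        return max(a, b)
    return a + b - c          # smooth region: planar prediction

x = 97                                      # current pixel
errval = x - med_predict(a=95, b=99, c=96)  # only the (small) error is Golomb coded
print(errval)                               # -1
```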
In the run mode, the encoder starts at the current pixel x and finds the longest run of pixels that
are identical to context pixel a. The encoder does not extend this run beyond the end of the current image
row. Since all the pixels in the run are identical to a (and a is already known to the decoder) only the
length of the run needs be encoded, and this is done with a 32‐entry array denoted by J. If near‐lossless
compression is used, the encoder selects a run of pixels that are close to a within the tolerance parameter
NEAR.
The decoder is not substantially different from the encoder, so JPEG‐LS is a nearly symmetric
compression method. The compressed stream contains data segments (with the Golomb codes and the
encoded run lengths), marker segments (with information needed by the decoder), and markers (some of
the reserved markers of JPEG are used). A marker is a byte of all ones followed by a special code,
signaling the start of a new segment. If a marker is followed by a byte
whose most significant bit is 0, that byte is the start of a marker segment. Otherwise, that byte starts
a data segment.
Advantages of JPEG-LS:
[1] JPEG-LS is capable of lossless compression.
[2] JPEG-LS has very low computational complexity.
JPEG-LS achieves state-of-the-art compression rates at very low computational complexity
and memory requirements. These characteristics led to the selection of JPEG-LS,
which is based on the LOCO-I algorithm developed at Hewlett-Packard Laboratories, as
the new ISO/ITU standard for lossless and near-lossless still image compression.
Ref: M. J. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I Lossless Image Compression Algorithm:
Principles and Standardization into JPEG-LS,” IEEE Transactions on Image Processing, vol. 9, no. 8,
August 2000.
9. JPEG 2000:
The JPEG 2000 standard for the compression of still images is based on the Discrete Wavelet
Transform (DWT). This transform decomposes the image using functions called wavelets. The basic idea
is to have a more localized (and therefore more precise) analysis of the information (signal, image or 3D
objects), which is not possible using cosine functions whose temporal or spatial supports are identical to
the data (the same time duration for signals, and the same length of line or column for images).
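As a minimal illustration of the idea (one level of the simple Haar wavelet rather than the biorthogonal filters actually specified by JPEG 2000), the sketch below splits an image into one low-frequency subband and three detail subbands:

```python
import numpy as np

def haar_rows(m):
    """One level of the Haar transform along each row: (low band, high band)."""
    return (m[:, 0::2] + m[:, 1::2]) / 2, (m[:, 0::2] - m[:, 1::2]) / 2

def haar_2d(image):
    """One decomposition level: rows then columns, giving one approximation
    subband and three detail subbands (each a quarter of the original size)."""
    lo, hi = haar_rows(image)
    ll, lh = haar_rows(lo.T)
    hl, hh = haar_rows(hi.T)
    return ll.T, lh.T, hl.T, hh.T

img = np.add.outer(np.arange(8.0), np.arange(8.0))   # smooth test image
subbands = haar_2d(img)
print([np.abs(b).max() for b in subbands])   # energy concentrates in the first subband
```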
JPEG-2000 advantages:
Better image quality than JPEG at the same file size; or alternatively 25-35% smaller file
sizes at the same quality.
Good image quality at low bit rates (even with compression ratios over 80:1)
Low complexity option for devices with limited resources.
Scalable image files ‐‐ no decompression needed for reformatting. With JPEG 2000, the image
that best matches the target device can be extracted from a single compressed file on a server.
Options include:
1. Image sizes from thumbnail to full size
2. Grayscale to full 3 channel color
3. Low quality image to lossless (identical to original image)
JPEG 2000 is more suitable for web graphics than baseline JPEG because it supports an alpha
channel (transparency component).
Region of interest (ROI): one can define some more interesting parts of image, which are
coded with more bits than surrounding areas
Following is a list of areas where this new standard is expected to improve on existing
methods:
High compression efficiency. Bitrates of less than 0.25 bpp are expected for highly detailed
grayscale images.
The ability to handle large images, up to 2^32×2^32 pixels (the original JPEG can handle
images of up to 2^16×2^16).
Progressive image transmission. The proposed standard can decompress an image
progressively by SNR, resolution, color component, or region of interest.
Easy, fast access to various points in the compressed stream.
The decoder can pan/zoom the image while decompressing only parts of it.
The decoder can rotate and crop the image while decompressing it.
Error resilience. Error‐correcting codes can be included in the compressed stream, to
improve transmission reliability in noisy environments.
9.1 The JPEG 2000 Compression Engine
The JPEG 2000 compression engine (encoder and decoder) is illustrated in block diagram form
in Fig. 10.
Figure 10: General block diagram of the JPEG 2000 (a) encoder and (b) decoder.
At the encoder, the discrete transform is first applied on the source image data. The transform
coefficients are then quantized and entropy coded before forming the output code stream (bit stream).
The decoder is the reverse of the encoder. The code stream is first entropy decoded, dequantized, and
inverse discrete transformed, thus resulting in the reconstructed image data. Although this general block
diagram looks like the one for the conventional JPEG, there are radical differences in all of the processes
of each block of the diagram. A quick overview of the whole system is as follows:
The source image is decomposed into components.
The image components are (optionally) decomposed into rectangular tiles. The tile-component is
the basic unit of the original or reconstructed image.
A wavelet transform is applied on each tile. The tile is decomposed into different resolution
levels.
The decomposition levels are made up of subbands of coefficients that describe the frequency
characteristics of local areas of the tile components, rather than across the entire image
component.
The subbands of coefficients are quantized and collected into rectangular arrays of “code
blocks.”
The bit planes of the coefficients in a code block (i.e., the bits of equal significance across the
coefficients in a code block) are entropy coded.
The encoding can be done in such a way that certain regions of interest can be coded at a higher
quality than the background.
Markers are added to the bit stream to allow for error resilience.
The code stream has a main header at the beginning that describes the original image and the
various decomposition and coding styles that are used to locate, extract, decode and reconstruct
the image with the desired resolution, fidelity, region of interest or other characteristics.
For clarity of presentation we have decomposed the whole compression engine into three parts:
the preprocessing, the core processing, and the bit-stream formation part, although there is a high degree
of inter-relation between them. The preprocessing part includes the image tiling, the DC-level shifting and the
component transformations. The core processing part consists of the discrete transform, the
quantization and the entropy coding processes. Finally, the concepts of the precincts, code blocks, layers,
and packets are included in the bit‐stream formation part.
Ref: A. Skodras, C. Christopoulos, and T. Ebrahimi, “The JPEG 2000 Still Image Compression Standard,”
IEEE Signal Processing Magazine, September 2001, pp. 36-58.
10. DPCM:
The DPCM compression method is a member of the family of differential encoding compression
methods, which itself is a generalization of the simple concept of relative encoding . It is based on the
well‐known fact that neighboring pixels in an image (and also adjacent samples in digitized sound) are
correlated. Correlated values are generally similar, so their differences are small, resulting in
compression.
Differential encoding methods calculate the differences di = ai − ai−1 between consecutive data
items ai, and encode the di’s. The first data item, a0, is either encoded separately or is written on the
compressed stream in raw format. In either case the decoder can decode and generate a0 in exact form. In
principle, any suitable method, lossy or lossless, can be used to encode the differences. In practice,
quantization is often used, resulting in lossy compression. The quantity encoded is not the difference di
but a similar, quantized number denoted by d̂i. The difference between di and d̂i is the quantization error qi. Thus, d̂i = di + qi.
It turns out that the lossy compression of differences introduces a new problem, namely, the
accumulation of errors. This is easy to see when we consider the operation of the decoder. The decoder
inputs encoded values of d̂i, decodes them, and uses them to generate “reconstructed”
values âi (where âi = âi−1 + d̂i) instead of the original data values ai. The decoder starts by reading
and decoding a0. It then inputs d̂1 = d1 + q1 and calculates â1 = a0 + d̂1 = a0 + d1 + q1 = a1 + q1. The next
step gives â2 = â1 + d̂2 = a1 + q1 + d2 + q2 = a2 + q1 + q2; in general âi = ai + q1 + · · · + qi, so the
quantization errors accumulate.
In DPCM the prediction of the current item ai is a weighted sum pi = Σ (j = 1 to N) wj ai−j of its N
previously seen neighbors, where wj are the weights, which still need to be determined. Figure 12 shows a simple example for the
case N = 3. Let’s assume that a pixel X is predicted by its three neighbors A, B, and C according to the
simple weighted sum
X = 0.35A + 0.3B + 0.35C
The weights used in above equation have been selected more or less arbitrarily and are for
illustration purposes only. However, they make sense, because they add up to unity. In order to determine
the best weights, we denote by ei the prediction error for pixel ai,
ei = ai − pi = ai − Σ (j = 1 to N) wj ai−j
for i = 1, 2, …, n, where n is the number of pixels to be compressed. We then find the set of weights wj
that minimizes the sum
E = Σ (i = 1 to n) ei^2 = Σ (i = 1 to n) [ ai − Σ (j = 1 to N) wj ai−j ]^2
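A small numpy sketch of this idea: fit the weights that minimize E by least squares on a training signal, then use them for prediction (the signal and N = 3 are illustrative choices):

```python
import numpy as np

signal = np.array([100, 102, 103, 105, 104, 106, 108, 107, 109, 111], dtype=float)
N = 3                                   # number of previous samples used for prediction

# Each row holds the N previous samples a[i-1], ..., a[i-N]
rows = np.array([signal[i - N:i][::-1] for i in range(N, len(signal))])
targets = signal[N:]
weights, *_ = np.linalg.lstsq(rows, targets, rcond=None)   # minimizes E = sum of e_i^2

predictions = rows @ weights
errors = targets - predictions          # the small prediction errors are what DPCM encodes
print(np.round(weights, 3), np.round(errors, 2))
```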
Fractal Image Compression
The difference here is that the entire image is not self-similar, but parts of the image are self-
similar with properly transformed parts of itself. Studies suggest that most naturally occurring
images contain this type of self‐similarity. It is this restricted redundancy that fractal image compression
schemes attempt to eliminate.
What is Fractal Image Compression?
Imagine a special type of photocopying machine that reduces the image to be copied by half and
reproduces it three times on the copy (see Figure 1). What happens when we feed the output of this
machine back as input? Figure 2 shows several iterations of this process on several input images. We can
observe that all the copies seem to converge to the same final image, the one in 2(c). Since the copying
machine reduces the input image, any initial image placed on the copying machine will be reduced to a
point as we repeatedly run the machine; in fact, it is only the position and the orientation of the copies that
determines what the final image looks like.
The way the input image is transformed determines the final result when running the copy
machine in a feedback loop. However we must constrain these transformations, with the limitation that
the transformations must be contractive (see contractive box), that is, a given transformation applied to
any two points in the input image must bring them closer in the copy. This technical condition is quite
logical, since if points in the copy were spread out the final image would have to be of infinite size.
Except for this condition the transformation can have any form. In practice, choosing transformations of
the form
w(x, y) = ( a  b ) ( x ) + ( e )
          ( c  d ) ( y )   ( f )
is sufficient to generate interesting transformations called affine transformations of the plane. Each can
skew, stretch, rotate, scale and translate an input image. A common feature of these transformations
that run in a loop-back mode is that, for a given initial image, each image is formed
from transformed (and reduced) copies of itself, and hence it must have detail at every scale. That is, the
images are fractals. This method of generating fractals is due to John Hutchinson.
Barnsley suggested that perhaps storing images as collections of transformations could lead to
image compression. His argument went as follows: the image in Figure 3 looks complicated yet it is
generated from only 4 affine transformations.
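A toy sketch of the copying-machine idea: three contractive affine maps (each halving the image) applied repeatedly to an arbitrary point set converge to the same attractor regardless of the starting image (the particular maps below produce a Sierpinski-like figure and are purely illustrative):

```python
import random

# Three contractive maps: scale by 1/2 and translate to three corner positions
MAPS = [lambda x, y: (0.5 * x,        0.5 * y),
        lambda x, y: (0.5 * x + 0.5,  0.5 * y),
        lambda x, y: (0.5 * x + 0.25, 0.5 * y + 0.5)]

def iterate(points, rounds=8):
    """Apply all maps to every point each round (the 'copy machine' in a feedback loop)."""
    for _ in range(rounds):
        points = [m(x, y) for (x, y) in points for m in MAPS]
        points = random.sample(points, min(len(points), 2000))   # keep the set small
    return points

# Any initial image converges to the same attractor
print(iterate([(random.random(), random.random())])[:5])
```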
The JBIG Standard
JBIG is the coding standard recommended by the Joint Bi-level Image Experts Group for binary images.
This lossless compression standard is used primarily to code scanned images of printed or handwritten text,
computer-generated text, and facsimile transmissions. It offers progressive encoding and decoding capability, in
the sense that the resulting bitstream contains a set of progressively higher-resolution images. This standard can
also be used to code grayscale and color images by coding each bitplane independently, but this is not the main
objective.
The JBIG compression standard has three separate modes of operation: progressive, progressive-compatible
sequential, and single-progression sequential. The progressive-compatible sequential mode uses a bitstream
compatible with the progressive mode. The only difference is that the data is divided into strips in this mode.
The single-progression sequential mode has only a single lowest-resolution layer. Therefore, an entire image
can be coded without any reference to other higher-resolution layers. Both these modes can be viewed as special
cases of the progressive mode. Therefore, our discussion covers only the progressive mode.
The JBIG encoder can be decomposed into two components:
• Resolution-reduction and differential-layer encoder
• Lowest-resolution-layer encoder
The input image goes through a sequence of resolution-reduction and differential-layer encoders. Each is
equivalent in functionality, except that their input images have different resolutions. Some implementations of the
JBIG standard may choose to recursively use one such physical encoder. The lowest-resolution image is coded
using the lowest-resolution-layer encoder. The design of this encoder is somewhat simpler than that of the
resolution-reduction and differential-layer encoders, since the resolution-reduction and deterministic prediction
operations are not needed.
While the JBIG standard offers both lossless and progressive (lossy to lossless) coding abilities, the lossy
image produced by this standard has significantly lower quality than the original, because the lossy image
contains at most only one-quarter of the number of pixels in the original image. By contrast, the JBIG2 standard is
explicitly designed for lossy, lossless, and lossy to lossless image compression. The design goal for JBIG2 aims
not only at providing superior lossless compression performance over existing standards but also at incorporating
lossy compression at a much higher compression ratio, with as little visible degradation as possible.
A unique feature of JBIG2 is that it is both quality progressive and content progressive. By quality
progressive, we mean that the bitstream behaves similarly to that of the JBIG standard, in which the image quality
progresses from lower to higher (or possibly lossless) quality. On the other hand, content progressive allows
different types of image data to be added progressively. The JBIG2 encoder decomposes the input bilevel image
into regions of different attributes and codes each separately, using different coding methods. As in other image
compression standards, only the JBIG2 bitstream, and thus the decoder, is explicitly defined. As a result, any
encoder that produces the correct bitstream is “compliant,” regardless of the actions it actually takes. Another
feature of JBIG2 that sets it apart from other image compression standards is that it is able to represent multiple
pages of a document.
For example, if a character appears on one page, it is likely to appear on other pages as well. Thus, using a
dictionary-based technique, this character is coded only once instead of multiple times for every page on which it
appears. This compression technique is somewhat analogous to video coding, which exploits interframe
redundancy to increase compression efficiency.
JBIG2 offers content-progressive coding and superior compression performance through model-based coding,
in which different models are constructed for different data types in an image, realizing additional coding gain.
Model-Based Coding. The idea behind model-based coding is essentially the same as that of context-based
coding. From the study of the latter, we know we can realize better compression performance by carefully
designing a context template and accurately estimating the probability distribution for each context. Similarly, if
we can separate the image content into different categories and derive a model specifically for each, we are much
more likely to accurately model the behavior of the data and thus achieve a higher compression ratio.
In the JBIG style of coding, adaptive and model templates capture the structure within the image. This model
is general, in the sense that it applies to all kinds of data. However, being general implies that it does not
explicitly deal with the structural differences between text and halftone data that comprise nearly all the contents
of bilevel images. JBIG2 takes advantage of this by designing custom models for these data types. The JBIG2
specification expects the encoder to first segment the input image into regions of different data types, in
particular, text and halftone regions. Each region is then coded independently, according to its characteristics.
Text-Region Coding
Each text region is further segmented into pixel blocks containing connected black pixels. These blocks
correspond to characters that make up the content of this region. Then, instead of coding all pixels of each
character, the bitmap of one representative instance of this character is coded and placed into a dictionary. For any
character to be coded, the algorithm first tries to find a match with the characters in the dictionary. If one is found,
then both a pointer to the corresponding entry in the dictionary and the position of the character on the page are
coded. Otherwise, the pixel block is coded directly and added to the dictionary. This technique is referred to as
pattern matching and substitution in the JBIG2 specification.
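A toy sketch of pattern matching and substitution (bitmaps are represented here as tuples of row strings; real JBIG2 codes the bitmaps and numeric data with arithmetic or Huffman coding and allows inexact matching):

```python
def encode_symbols(blocks):
    """Sketch of pattern matching and substitution: blocks is a list of
    (bitmap, position) pairs extracted from a text region."""
    dictionary, output = [], []
    for bitmap, position in blocks:
        if bitmap in dictionary:                      # exact match found in the dictionary
            output.append(("ref", dictionary.index(bitmap), position))
        else:                                         # new symbol: code it and add it
            dictionary.append(bitmap)
            output.append(("new", bitmap, position))
    return dictionary, output

# Repeated characters are coded only once; later occurrences become pointers
a = ("0110", "1001", "1111", "1001")
blocks = [(a, (0, 0)), (a, (0, 40)), (("1110", "1001", "1110", "1000"), (0, 80))]
print(encode_symbols(blocks)[1])
```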
However, for scanned documents, it is unlikely that two instances of the same character will match pixel by
pixel. In this case, JBIG2 allows the option of including refinement data to reproduce the original character on the
page. The refinement data codes the current character using the pixels in the matching character in the
dictionary. The encoder has the freedom to choose the refinement to be exact or lossy. This method is called soft
pattern matching.
The numeric data, such as the index of the matched character in the dictionary and the position of the characters
on the page, are either arithmetic or Huffman encoded. Each bitmap for the characters in the dictionary is coded
using JBIG-based techniques.
Halftone-Region Coding
The JBIG2 standard suggests two methods for halftone image coding. The first is similar to the context-based
arithmetic coding used in JBIG. The only difference is that the new standard allows the context template to
include as many as 16 template pixels, four of which may be adaptive.
The second method is called descreening. This involves converting back to grayscale and coding the
grayscale values. In this method, the bilevel region is divided into blocks of size mb × nb. For an m × n bilevel
region, the resulting grayscale image has dimensions mg = ⌊(m + mb − 1)/mb⌋ by ng = ⌊(n + nb − 1)/nb⌋. Each grayscale
value is computed as the sum of the binary pixel values in the corresponding mb × nb block. The bit
planes of the grayscale image are coded using context-based arithmetic coding. The grayscale values are used as
indices into a dictionary of halftone bitmap patterns. The decoder can use this value to index into this dictionary,
to reconstruct the original halftone image.
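A small numpy sketch of the descreening step as described above (block size and the random test pattern are illustrative):

```python
import numpy as np

def descreen(bilevel, mb, nb):
    """Descreening sketch: each grayscale value is the sum of the binary pixels
    in the corresponding mb x nb block (the region is padded to full blocks)."""
    m, n = bilevel.shape
    mg, ng = (m + mb - 1) // mb, (n + nb - 1) // nb
    padded = np.zeros((mg * mb, ng * nb), dtype=int)
    padded[:m, :n] = bilevel
    return padded.reshape(mg, mb, ng, nb).sum(axis=(1, 3))

halftone = (np.random.rand(8, 8) > 0.5).astype(int)
print(descreen(halftone, 4, 4))   # 2x2 grayscale image with values in 0..16
```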
Preprocessing and Postprocessing: JBIG2 allows the use of lossy compression but does not specify a method for
doing so. From the decoder point of view, the decoded bit stream is lossless with respect to the image encoded by
the encoder, although not necessarily with respect to the original image. The encoder may modify the input image
in a prepro cessing step, to increase coding efficiency. The preprocessor usually tries to change the original image
to lower the code length in a way that does not generally affect the image's appearance. Typically, it tries to
remove noisy pixels and smooth out pixel blocks. Postprocessing, another issue not addressed by the
specification, can be especially useful for halftones, potentially producing more visually pleasing images. It is
also helpful to tune the decoded image to a particular output device, such as a laser printer.