
Chapter 3

IMAGE COMPRESSION

1. Introduction:

A digital image is a rectangular array of dots, or picture elements, arranged in m rows and n columns.
The expression m×n is called the resolution of the image, and the dots are called pixels (except in the
cases of fax images and video compression, where they are referred to as pels). The term “resolution” is
sometimes also used to indicate the number of pixels per unit length of the image. Thus, dpi stands for
dots per inch.
The purpose of compression is to code the image data into a compact form, minimizing both the
number of bits in the representation, and the distortion caused by the compression. The importance of
image compression is emphasized by the huge amount of data in raster images: a typical gray-scale image
of 512×512 pixels, each represented by 8 bits, contains 256 kilobytes of data. With the color information,
the number of bytes is tripled. For video at 25 frames per second, even one second
of color film requires approximately 19 megabytes of memory. Thus, the necessity for compression is
obvious.
Image compression addresses the problem of reducing the amount of data required to
represent a digital image. The underlying basis of the reduction process is the removal of
redundant data. From a mathematical viewpoint, this amounts to transforming a 2-D pixel
array into a statistically uncorrelated data set. The transformation is applied prior to storage
or transmission of the image. At some later time, the compressed image is decompressed to
reconstruct the original image or an approximation of it.
For the purpose of image compression it is useful to distinguish the following types of images:
1. A bilevel (or monochromatic) image. This is an image where the pixels can have one of two values,
normally referred to as black and white. Each pixel in such an image is represented by one bit, making
this the simplest type of image.
2. A grayscale image. A pixel in such an image is represented by n bits and can have one of 2^n values, 0 through 2^n − 1, indicating one of 2^n shades of gray (or shades of some other color). The value of n is normally compatible with a byte size; i.e., it is 4, 8, 12, 16, 24, or some other convenient multiple of 4 or of 8. The set of the most-significant bits of all the pixels is the most-significant bitplane. Thus, a grayscale image has n bitplanes.
3. A continuous-tone image. This type of image can have many similar colors (or grayscales). When
adjacent pixels differ by just one unit, it is hard or even impossible for the eye to distinguish their colors.
As a result, such an image may contain areas with colors that seem to vary continuously as the eye moves
along the area. A pixel in such an image is represented by either a single large number (in the case of
many grayscales) or three components (in the case of a color image). A continuous‐tone image is
normally a natural image (natural as opposed to artificial) and is obtained by taking a photograph with a
digital camera, or by scanning a photograph or a painting.
4. A discrete-tone image (also called a graphical image or a synthetic image). This is normally an artificial
image. It may have a few colors or many colors, but it does not have the noise and blurring of a natural
image. Examples are an artificial object or machine, a page of text, a chart, a cartoon, or the contents of a
computer screen. Artificial objects, text, and line drawings have sharp, well‐ defined edges, and are
therefore highly contrasted from the rest of the image (the background). Adjacent pixels in a discrete-tone
image often are either identical or vary significantly in value. Such an image does not compress well with
lossy methods, because the loss of just a few pixels may render a letter illegible, or change a familiar
pattern to an unrecognizable one.

5. A cartoon-like image. This is a color image that consists of uniform areas. Each area has a uniform
color but adjacent areas may have very different colors. This feature may be exploited to obtain excellent
compression.
2. Introduction to image compression
The term data compression refers to the process of reducing the amount of data required to represent a
given quantity of information. A clear distinction must be made between data and information. They are
not synonymous. In fact, data are the means by which information is conveyed. Various amounts of data
may be used to represent the same amount of information. A representation that uses more data than is
strictly necessary contains data (or words) that either provide no relevant information or simply restate
that which is already known. It is thus said to contain data redundancy.
Data redundancy is a central issue in digital image compression. It is not an
abstract concept but a mathematically quantifiable entity. If n1 and n2 denote the number of
information‐carrying units in two data sets that represent the same information, the relative data
redundancy RD of the first data set (the one characterized by n1) can be defined as,
RD = 1 − 1/CR

where CR, commonly called the compression ratio, is defined as

CR = n1 / n2
For the case n2 = n1, CR = 1 and RD = 0, indicating that (relative to the second data set) the first
representation of the information contains no redundant data. When n2 « n1, CR → ∞ and RD → 1,
implying significant compression and highly redundant data. Finally, when n2 » n1, CR → 0 and
RD → −∞, indicating that the second data set contains much more data than the original
representation. In general, CR and RD lie in the open intervals (0, ∞) and (−∞, 1), respectively. A practical
compression ratio, such as 10 (or 10:1), means that the first data set has 10 information carrying units
(say, bits) for every 1 unit in the second or compressed data set. The corresponding redundancy of 0.9
implies that 90% of the data in the first data set is redundant.
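As a quick check of these definitions, the following short Python sketch (the sizes n1 and n2 are made up) reproduces the 10:1 example and its redundancy of 0.9:

def compression_stats(n1, n2):
    # C_R = n1/n2 (compression ratio), R_D = 1 - 1/C_R (relative redundancy)
    cr = n1 / n2
    rd = 1.0 - 1.0 / cr
    return cr, rd

print(compression_stats(10_000, 1_000))   # (10.0, 0.9) -> 90% of the data is redundant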
In digital image compression, three basic data redundancies can be identified and exploited:
1. coding redundancy,
2. interpixel redundancy,
3. Psychovisual redundancy.
Data compression is achieved when one or more of these redundancies are reduced or eliminated.
2.1 Coding Redundancy
We know how the gray-level histogram of an image can provide a great deal of insight into the
construction of codes to reduce the amount of data used to represent it. Let us assume that a discrete
random variable rk in the interval [0, 1] represents the gray levels of an image and that each rk occurs with
probability pr(rk), which is given by

pr(rk) = nk / n,   k = 0, 1, 2, ..., L − 1
where L is the number of gray levels, nk is the number of times that the kth gray level appears in the
image, and n is the total number of pixels in the image. If the number of bits used to represent each value
of rk is l(rk), then the average number of bits required to represent each pixel is
Lavg = Σ_{k=0}^{L−1} l(rk) pr(rk)
That is, the average length of the code words assigned to the various gray‐level values is found by
summing the product of the number of bits used to represent each gray level and the probability that the
gray level occurs. Thus the total number of bits required to code an M × N image is M·N·Lavg.
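The following NumPy sketch evaluates Lavg for a hypothetical 4-level image (the histogram counts and both codes are made up); it shows how a variable-length prefix code that favors the likelier gray levels beats the 2-bit natural binary code:

import numpy as np

def average_code_length(hist, code_lengths):
    # L_avg = sum_k l(r_k) * p_r(r_k), with p_r(r_k) = n_k / n
    p = hist / hist.sum()
    return float(np.sum(code_lengths * p))

hist = np.array([600, 250, 100, 50])       # n_k for gray levels 0..3
natural = np.array([2, 2, 2, 2])           # natural binary code: 2 bits per level
varlen = np.array([1, 2, 3, 3])            # prefix code 0, 10, 110, 111
print(average_code_length(hist, natural))  # 2.0 bits/pixel
print(average_code_length(hist, varlen))   # 1.55 bits/pixel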

Assigning fewer bits to the more probable gray levels than to the less probable ones achieves data
compression. This process commonly is referred to as variable-length coding. If the gray levels of an
image are coded in a way that uses more code symbols than absolutely necessary to represent each gray
level, the resulting image is said to contain coding redundancy. In general, coding redundancy is present
when the codes assigned to a set of events (such as gray‐level values) have not been selected to take full
advantage of the probabilities of the events. It is almost always present when an image's gray levels are
represented with a straight or natural binary code. In this case, the underlying basis for the coding
redundancy is that images are typically composed of objects that have a regular and somewhat predictable
morphology (shape) and reflectance, and are generally sampled so that the objects being depicted are
much larger than the picture elements. The natural consequence is that, in most images, certain gray
levels are more probable than others. A natural binary coding of their gray levels assigns the same number
of bits to both the most and least probable values, thus failing to minimize Lavg and resulting in coding
redundancy.

2.2 Interpixel Redundancy


Consider the images shown in Figs. 1(a) and (b). As Figs. 1(c) and (d) show, these images have virtually
identical histograms. Note also that both histograms are trimodal, indicating the presence of three
dominant ranges of gray‐level values. Because the gray levels in these images are not equally probable,
variable-length coding can be used to reduce the coding redundancy that would result from a straight or
natural binary encoding of their pixels. The coding process, however, would not alter the level of
correlation between the pixels within the images. In other words, the codes used to represent the gray
levels of each image have nothing to do with the correlation between pixels. These correlations result
from the structural or geometric relationships between the objects in the image.
These illustrations reflect another important form of data redundancy—one directly related to the
interpixel correlations within an image. Because the value of any given pixel can be reasonably predicted
from the value of its neighbors, the information carried by individual pixels is relatively small. Much of
the visual contribution of a single pixel to an image is redundant; it could have been guessed on the basis
of the values of its neighbors. A variety of names, including spatial redundancy, geometric
redundancy, and interframe redundancy, have been coined to refer to these interpixel dependencies.
We use the term interpixel redundancy to encompass them all.

Figure 1: Two images (a) and (b) and their gray level histograms (c) and (d)

In order to reduce the interpixel redundancies in an image, the 2‐D pixel array normally used for
human viewing and interpretation must be transformed into a more efficient (but usually "nonvisual")
format. For example, the differences between adjacent pixels can be used to represent an image.
Transformations of this type (that is, those that remove interpixel redundancy) are referred to as
mappings. They are called reversible mappings if the original image elements can be reconstructed
from the transformed data set.
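A minimal sketch of such a mapping, assuming NumPy: horizontal differencing of one image row (the pixel values are made up). The mapping is reversible, since a cumulative sum restores the row exactly:

import numpy as np

def to_differences(row):
    # reversible mapping: keep the first pixel, then adjacent differences
    row = np.asarray(row, dtype=int)
    return np.concatenate(([row[0]], np.diff(row)))

def from_differences(d):
    # inverse mapping: a cumulative sum restores the original row
    return np.cumsum(d)

row = np.array([100, 102, 103, 103, 101, 99, 99, 100])
d = to_differences(row)    # [100, 2, 1, 0, -2, -2, 0, 1]: mostly small values
assert np.array_equal(from_differences(d), row)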
2.3 Psychovisual Redundancy
We know that the brightness of a region, as perceived by the eye, depends on factors other than simply
the light reflected by the region. For example, intensity variations (Mach bands) can be perceived in an
area of constant intensity. Such phenomena result from the fact that the eye does

not respond with equal sensitivity to all visual information. Certain information simply has less relative
importance than other information in normal visual processing. This information is said to be
psychovisually redundant. It can be eliminated without significantly impairing the quality of image
perception.
That psychovisual redundancies exist should not come as a surprise, because human perception of
the information in an image normally does not involve quantitative analysis of every pixel value in the
image. In general, an observer searches for distinguishing features such as edges or textural regions
and mentally combines them into recognizable groupings. The brain then correlates these groupings with
prior knowledge in order to complete the image interpretation process.
Psychovisual redundancy is fundamentally different from the redundancies discussed earlier. Unlike
coding and interpixel redundancy, psychovisual redundancy is associated with real or quantifiable visual
information. Its elimination is possible only because the information itself is not essential for normal
visual processing. Since the elimination of psychovisually redundant data results in a loss of quantitative
information, it is commonly referred to as quantization. This terminology is consistent with normal
usage of the word, which generally means the mapping of a broad range of input values to a limited
number of output values. As it is an irreversible operation (visual information is lost), quantization
results in lossy data compression.
The improved gray-scale (IGS) quantization method recognizes the eye's inherent sensitivity to
edges and breaks them up by adding to each pixel a pseudorandom number, which is generated from the
low-order bits of neighboring pixels, before quantizing the result. Because the low-order bits are fairly
random, this amounts to adding a level of randomness, which depends on the local characteristics of the
image, to the artificial edges normally associated with false contouring.
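The text describes IGS only in outline; the sketch below follows one common formulation (8-bit pixels quantized to 4 bits, with an all-ones exception to prevent overflow), so the exact bit widths and the exception rule should be read as assumptions rather than as part of this text:

def igs_quantize(pixels, keep_bits=4, total_bits=8):
    # add the low-order bits of the previous sum to the current pixel
    # (unless its high-order bits are all 1s), then keep the high-order bits
    low_mask = (1 << (total_bits - keep_bits)) - 1                   # 0x0F
    high_mask = ((1 << keep_bits) - 1) << (total_bits - keep_bits)   # 0xF0
    out, prev_sum = [], 0
    for p in pixels:
        s = p if (p & high_mask) == high_mask else p + (prev_sum & low_mask)
        out.append(s >> (total_bits - keep_bits))
        prev_sum = s
    return out

print(igs_quantize([108, 139, 135, 244, 172]))   # [6, 9, 8, 15, 11]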

3. Approaches to Image Compression

Approach 1: This is appropriate for bi-level images. A pixel in such an image is represented by one bit.
Applying the principle of image compression to a bi‐level image therefore means that the immediate
neighbors of a pixel P tend to be identical to P. Thus, it makes sense to use run‐length encoding (RLE) to
compress such an image. A compression method for such an image may scan it in raster order (row by
row) and compute the lengths of runs of black and white pixels. The lengths are encoded by variable ‐size
(prefix) codes and are written on the compressed stream. An example of such a method is facsimile
compression.

Approach 2: Also for bi‐level images. The principle of image compression tells us that the neighbors of
a pixel tend to be similar to the pixel. We can extend this principle and conclude that if the current pixel
has color c (where c is either black or white), then pixels of the same color seen in the past (and also those
that will be found in the future) tend to have the same immediate neighbors.
This approach looks at n of the near neighbors of the current pixel and considers them an n-bit
number. This number is the context of the pixel. In principle there can be 2^n contexts, but because of
image redundancy we expect them to be distributed in a nonuniform way. Some contexts should be
common while others will be rare. This approach is used by JBIG.
Approach 3: Separate the grayscale image into n bi-level images and compress each with RLE and prefix
codes. The principle of image compression seems to imply intuitively that two adjacent pixels that are
similar in the grayscale image will be identical in most of the n bi-level images. This, however, is not
true unless the pixels are first represented in a code where consecutive values differ by a single bit. An example of such a code is the reflected Gray code.
Approach 4: Use the context of a pixel to predict its value. The context of a pixel is the values of some
of its neighbors. We can examine some neighbors of a pixel P, compute an average A of their values, and
predict that P will have the value A. The principle of image compression tells us that our prediction will
be correct in most cases, almost correct in many cases, and completely wrong in a few cases. This
approach is used in the MLP method.
Approach 5: Transform the values of the pixels and encode the transformed values. Recall that
compression is achieved by reducing or removing redundancy. The redundancy of an image is caused by
the correlation between pixels, so transforming the pixels to a representation where they are decorrelated
eliminates the redundancy. It is also possible to think of a transform in terms of the entropy of the image.
In a highly correlated image, the pixels tend to have equiprobable values, which results in maximum
entropy. If the transformed pixels are decorrelated, certain pixel values become common, thereby having
large probabilities, while others are rare. This results in small entropy. Quantizing the transformed values
can produce efficient lossy image compression.
Approach 6: The principle of this approach is to separate a continuous-tone color image into three
grayscale images and compress each of the three separately, using approaches 3, 4, or 5. For a
continuous-tone image, the principle of image compression implies that adjacent pixels tend to have
similar, although perhaps not identical, colors.
An important feature of this approach is to use a luminance/chrominance color representation
instead of the more common RGB. The advantage of the luminance/chrominance representation is
that the eye is sensitive to small changes in luminance but not in

chrominance. This allows the loss of considerable data in the chrominance components, while making it
possible to decode the image without a significant visible loss of quality.
Approach 7: A different approach is needed for discrete‐tone images. Recall that such an image contains
uniform regions, and a region may appear several times in the image. A good example is a screen dump.
Such an image consists of text and icons. Each character of text and each icon is a region, and any region
may appear several times in the image. A possible way to compress such an image is to scan it, identify
regions, and find repeating regions. If a region B is identical to an already found region A, then B can be
compressed by writing a pointer to A on the compressed stream. The block decomposition method
(FABD) is an example of how this approach can be implemented.
Approach 8: Partition the image into parts (overlapping or not) and compress it by processing the parts
one by one. Suppose that the next unprocessed image part is part number 15. Try to match it with parts 1–
14 that have already been processed. If part 15 can be expressed, for example, as a combination of parts 5
(scaled) and 11 (rotated), then only the few numbers that specify the combination need be saved, and part
15 can be discarded. If part 15 cannot be expressed as a combination of already‐processed parts, it is
declared processed and is saved in raw format.
This approach is the basis of the various fractal methods for image compression. It applies the
principle of image compression to image parts instead of to individual pixels. Applied this way, the
principle tells us that “interesting” images (i.e., those that are being compressed in practice) have a certain
amount of self similarity. Parts of the image are identical or similar to the entire image or to other parts.

4. Gray Codes and its significance for image compression

An image compression method that has been developed specifically for a certain type of image can
sometimes be used for other types. Any method for compressing bi‐level images, for example, can be
used to compress grayscale images by separating the bitplanes and compressing each individually, as if it
were a bi‐level image. Imagine, for example, an image with 16 grayscale values. Each pixel is defined by
four bits, so the image can be separated into four bi‐level images. The trouble with this approach is that it
violates the general principle of image compression. Imagine two adjacent 4-bit pixels with values 7 =
0111 and 8 = 1000 (in binary). These pixels have close values, but when separated into four bitplanes, the
resulting 1-bit pixels are different in every bitplane! This is because the binary representations of the
consecutive integers 7 and 8 differ in all four bit positions. In order to apply any bi-level
compression method to grayscale images, a binary

representation of the integers is needed where consecutive integers have codes differing by one bit only.
Such a representation exists and is called reflected Gray code (RGC).
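A two-line converter for the RGC shows that 7 and 8, which differ in all four bits in binary, differ in only one bit in the Gray code:

def binary_to_gray(n):
    # reflected Gray code: adjacent integers differ in exactly one bit
    return n ^ (n >> 1)

def gray_to_binary(g):
    # inverse mapping, undoing the XOR cascade
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

print(format(binary_to_gray(7), '04b'), format(binary_to_gray(8), '04b'))  # 0100 1100
assert all(gray_to_binary(binary_to_gray(i)) == i for i in range(16))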
The conclusion is that the most-significant bitplanes of an image obey the principle of image
compression more than the least-significant ones. When adjacent pixels have values that differ by one
unit (such as p and p+1), chances are that the least‐significant bits are different and the most‐significant
ones are identical. Any image compression method that compresses bitplanes individually should
therefore treat the least‐significant bitplanes differently from the most‐ significant ones, or should use
RGC instead of the binary code to represent pixels. In a typical 8-bit image, with the bitplanes numbered 8 (the leftmost or most-significant
bits) through 1 (the rightmost or least-significant bits), the least-significant
bitplane doesn’t show any correlations between the pixels; it is random or very close to random in both
binary and RGC. Bitplanes 2 through 5, however, exhibit better pixel correlation in the Gray code.
Bitplanes 6 through 8 look different in Gray code and binary, but seem to be highly correlated in either
representation.
Color images provide another example of using the same compression method across image
types. Any compression method for grayscale images can be used to compress color images. In a color
image, each pixel is represented by three color components (such as RGB). Imagine a color image where
each color component is represented by one byte. A pixel is represented by three bytes, or 24 bits, but
these bits should not be considered a single number. The two pixels 118|206|12 and 117|206|12 differ by
just one unit in the first component, so they have very similar colors. Considered as 24‐bit numbers,
however, these pixels are very different, since they differ in one of their most significant bits. Any
compression method that treats these pixels as 24‐bit numbers would consider these pixels very different,
and its performance would suffer as a result.
A compression method for grayscale images can be applied to compressing color images, but the color
image should first be separated into three color components, and each component compressed
individually as a grayscale image.
5. Error Metrics
Developers and implementers of lossy image compression methods need a standard metric to measure the
quality of reconstructed images compared with the original ones. The better a reconstructed image
resembles the original one, the bigger should be the value produced by this metric. Such a metric should
also produce a dimensionless number, and that number should not be very sensitive to small variations in
the reconstructed image.
A common measure used for this purpose is the peak signal to noise ratio (PSNR). Higher PSNR
values imply closer resemblance between the reconstructed and the original images, but they do not
provide a guarantee that viewers will like the reconstructed image. Denoting the pixels of the original
image by Pi and the pixels of the reconstructed image by Qi (where 1 ≤ i ≤ n), we first define the mean
square error (MSE) between the two images as
MSE = (1/n) Σ_{i=1}^{n} (Pi − Qi)^2
It is the average of the square of the errors (pixel differences) of the two images. The root mean
square error (RMSE) is defined as the square root of the MSE, and the PSNR is defined as

PSNR = 20 log10 ( max_i |Pi| / RMSE )
The absolute value is normally not needed, since pixel values are rarely negative. For a bi‐ level
image, the numerator is 1. For a grayscale image with eight bits per pixel, the numerator is
255. For color images, only the luminance component is used. Greater resemblance between the images
implies smaller RMSE and, as a result, larger PSNR. The PSNR is dimensionless, since the units of both
numerator and denominator are pixel values. However, because of the use of the logarithm, we say that
the PSNR is expressed in decibels (dB). The use of the logarithm also implies less sensitivity to changes
in the RMSE. Notice that the PSNR has no absolute meaning. It is meaningless to say that a PSNR of,
say, 25 is good. PSNR values are used only to compare the performance of different lossy compression
methods or the effects of different parametric values on the performance of an algorithm.
Typical PSNR values range between 20 and 40. Assuming pixel values in the range [0, 255], an
RMSE of 25.5 results in a PSNR of 20, and an RMSE of 2.55 results in a PSNR of 40. An RMSE of zero
(i.e., identical images) results in an infinite (or, more precisely, undefined) PSNR. An RMSE of 255
results in a PSNR of zero, and RMSE values greater than 255 yield negative PSNRs.
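A small NumPy sketch of MSE, RMSE, and PSNR; it uses the fixed peak value 255 for 8-bit images (rather than max_i|Pi|), and the two test arrays are made up so that the RMSE is exactly 25.5, reproducing the 20 dB figure mentioned above:

import numpy as np

def psnr(original, reconstructed, peak=255):
    p = np.asarray(original, dtype=float)
    q = np.asarray(reconstructed, dtype=float)
    mse = np.mean((p - q) ** 2)      # mean square error
    rmse = np.sqrt(mse)              # root mean square error
    if rmse == 0:
        return float('inf')          # identical images
    return 20 * np.log10(peak / rmse)

p = np.zeros(100)
q = np.full(100, 25.5)
print(psnr(p, q))                    # 20.0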
A related measure is signal to noise ratio (SNR). This is defined as

SNR = 20 log10 ( sqrt( (1/n) Σ_{i=1}^{n} Pi^2 ) / RMSE )
The numerator is the root mean square of the original image.
Another relative of the PSNR is the signal to quantization noise ratio (SQNR). This is a measure of
the effect of quantization on signal quality. It is defined as

SQNR = 10 log10 ( signal power / quantization error )
where the quantization error is the difference between the quantized signal and the original signal.
Another approach to the comparison of an original and a reconstructed image is to generate the
difference image and judge it visually. Intuitively, the difference image is Di = Pi−Qi, but such an image
is hard to judge visually because its pixel values Di tend to be small numbers. If a pixel value of zero
represents white, such a difference image would be almost invisible. In the opposite case, where pixel
values of zero represent black, such a difference would be too dark to judge. Better results are obtained by
calculating
Di = a(Pi − Qi) + b
where a is a magnification parameter (typically a small number such as 2) and b is half the maximum
value of a pixel (typically 128). Parameter a serves to magnify small differences, while b shifts the
difference image from extreme white (or extreme black) to a more comfortable gray.
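A few lines for the scaled difference image Di = a(Pi − Qi) + b; the clipping to the displayable range is an added assumption, not part of the formula above:

import numpy as np

def difference_image(p, q, a=2, b=128, peak=255):
    # D_i = a*(P_i - Q_i) + b, clipped to the valid pixel range
    d = a * (np.asarray(p, dtype=int) - np.asarray(q, dtype=int)) + b
    return np.clip(d, 0, peak).astype(np.uint8)

print(difference_image([100, 101], [100, 99]))   # [128 132]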
6. Image Transforms
An image can be compressed by transforming its pixels (which are correlated) to a representation where
they are decorrelated. Compression is achieved if the new values are smaller, on average, than the
original ones. Lossy compression can be achieved by quantizing the transformed values. The decoder
inputs the transformed values from the compressed stream and reconstructs the (precise or approximate)
original data by applying the inverse transform. The transforms discussed in this section are orthogonal.
The term decorrelated means that the transformed values are independent of one another. As a
result, they can be encoded independently, which makes it simpler to construct a statistical model. An
image can be compressed if its representation has redundancy. The redundancy in images stems from
pixel correlation. If we transform the image to a representation where the pixels are decorrelated, we have
eliminated the redundancy and the image has been fully compressed.

6.1 Orthogonal Transforms


Image transforms are designed to have two properties:
1. to reduce image redundancy by reducing the sizes of most pixels and
2. to identify the less important parts of the image by isolating the various frequencies of the image.
We intuitively associate a frequency with a wave. Water waves, sound waves, and electromagnetic waves
have frequencies, but pixels in an image can also feature frequencies. Figure 2 shows a small, 5×8 bi‐
level image that illustrates this concept. The top row is uniform, so we can assign it zero frequency. The
rows below it have increasing pixel frequencies as measured by the number of color
changes along a row. The four waves on the right roughly correspond to the frequencies of the four top
rows of the image.

Figure 2: Image frequencies

Image frequencies are important because of the following basic fact: Low frequencies
correspond to the important image features, whereas high frequencies correspond to the details of the
image, which are less important. Thus, when a transform isolates the various image frequencies, pixels
that correspond to high frequencies can be quantized heavily, whereas pixels that correspond to low
frequencies should be quantized lightly or not at all. This is how a transform can compress an image very
effectively by losing information, but only information associated with unimportant image details.
Practical image transforms should be fast and preferably also simple to implement. This suggests the use
of linear transforms. In such a transform, each transformed value (or transform coefficient) ci is a
weighted sum of the data items (the pixels) dj that are being transformed, where each item is multiplied
by a weight wij. Thus, ci = Σ_j dj wij for i = 1, 2, . . . , n. For n = 4, this is
expressed in matrix notation:

[ c1 ]   [ w11 w12 w13 w14 ] [ d1 ]
[ c2 ] = [ w21 w22 w23 w24 ] [ d2 ]
[ c3 ]   [ w31 w32 w33 w34 ] [ d3 ]
[ c4 ]   [ w41 w42 w43 w44 ] [ d4 ]
For the general case, we can write C = W·D. Each row of W is called a “basis vector.” The only quantities
that have to be computed are the weights wij . The guiding principles are as follows:
1. Reducing redundancy. The first transform coefficient c1 can be large, but the remaining values
c2, c3, . . . should be small.
2. Isolating frequencies. The first transform coefficient c1 should correspond to zero pixel
frequency, and the remaining coefficients should correspond to higher and higher frequencies.
The key to determining the weights wij is the fact that our data items dj are not arbitrary numbers but pixel
values, which are nonnegative and correlated.
The weights wij are chosen to be +1's and −1's. This choice satisfies the first requirement: to reduce pixel redundancy by means of a
transform. In order to satisfy the second requirement, the weights wij of row i should feature frequencies
that get higher with i. Weights w1j should have zero frequency; they should all be +1's. Weights w2j
should have one sign change; i.e., they should be +1, +1, . . . , +1, −1, −1, . . . , −1. This continues until the
last row of weights wnj, which should have the highest frequency: +1, −1, +1, −1, . . . ,
+1,−1. The mathematical discipline of vector spaces coins the term “basis vectors” for our rows of
weights.
In addition to isolating the various frequencies of pixels dj, this choice results in basis vectors that are
orthogonal. The basis vectors are the rows of matrix W, which is why this matrix and, by implication, the
entire transform are also termed orthogonal. These considerations are satisfied by the orthogonal matrix

1
 1 1 1
 
1
1 1 1 
1 1 1 1 
The first basis vector (the1top1 1 W)1consists of all 1’s, so its frequency is zero. Each of the
row of 
subsequent vectors has two +1’s and two −1’s, so they produce small transformed values, and their
frequencies (measured as the number of sign changes along the basis vector) get higher. It is also possible
to modify this transform to conserve the energy of the data vector. All that’s needed is to multiply the
transformation matrix W by the scale factor 1/2. Another advantage of W is that it also performs the
inverse transform.
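A quick numerical check of this transform, assuming NumPy. The data vector is a made-up correlated column of pixels; note how the first coefficient is large while the rest are small, and how scaling W by 1/2 makes the matrix its own inverse:

import numpy as np

W = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])

d = np.array([5, 6, 7, 8])       # correlated data vector
c = W @ d                        # [26, -4, 0, -2]: one large value, three small ones
print(c)
print((W / 2) @ ((W / 2) @ d))   # [5. 6. 7. 8.]: the scaled W inverts itself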
6.2 Two-Dimensional Transforms
Given two-dimensional data such as the 4×4 matrix

D =
[ 5 6 7 4 ]
[ 6 5 7 5 ]
[ 7 7 6 6 ]
[ 8 8 8 8 ]

where each of the four columns is highly correlated, we can apply our simple one dimensional transform to
the columns of D. The result is,
1 1 1 5 1 6 7 4  26 26 28 23 

  
   
1 1 1 1 6 5 7 5 4 4 0 5
C' = W · D =  ·  =  

1 1 1 1 
 
7 7 6 6  0 2 2 1 
1 1 1 1 8 8 8 8 2 0 2 3
     
Each column of C’ is the transform of a column of D. Notice how the top element of each column
of C’ is dominant, because the data in the corresponding column of D is correlated. Notice also that the
rows of C’ are still correlated. C’ is the first stage in a two‐stage process that produces the two‐
dimensional transform of matrix D. The second stage should transform each row of C’, and this is done

by multiplying C' by the transpose W^T. Our particular W, however, is symmetric, so we end up with
C = C'·W^T = W·D·W^T = W·D·W, or


 26 26 28 23 1 1 1 1  103 1 5
5
 
     
4 4 0 5 1 1 1 1 13 3 5 5
C=   ·   =  

 0 2 2 1  
1 1 1 1 
 
 5 1 3 1

2 0 2 3 1 1 1 1 7 3 3 1
     
The elements of C are decorrelated. The top‐left element is dominant. It contains most of the
total energy of the original D. The elements in the top row and the leftmost column are somewhat large,
while the remaining elements are smaller than the original data items. The double‐ stage, two‐dimensional
transformation has reduced the correlation in both the horizontal and vertical dimensions. As in the one‐
dimensional case, excellent compression can be achieved by quantizing the elements of C, especially
those that correspond to higher frequencies (i.e., located toward the bottom‐right corner of C).
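The same computation in NumPy reproduces the matrices above and confirms that the transform is reversible (for this W, applying it twice on each side multiplies the data by 16):

import numpy as np

W = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]])

D = np.array([[5, 6, 7, 4],
              [6, 5, 7, 5],
              [7, 7, 6, 6],
              [8, 8, 8, 8]])

C = W @ D @ W            # two-dimensional transform (W is symmetric)
print(C)                 # the top-left element 103 dominates; the rest are small
print(W @ C @ W // 16)   # recovers D exactly: the transform is reversible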
This is the essence of orthogonal transforms. The important transforms are:
1. The Walsh-Hadamard transform: is fast and easy to compute (it requires only additions and
subtractions), but its performance, in terms of energy compaction, is lower than that of the DCT.
2. The Haar transform: is a simple, fast transform. It is the simplest wavelet transform.
3. The Karhunen-Loève transform: is the best one theoretically, in the sense of energy
compaction (or, equivalently, pixel decorrelation). However, its coefficients are not fixed; they depend
on the data to be compressed. Calculating these coefficients (the basis of the transform) is slow, as is the
calculation of the transformed values themselves. Since the coefficients are data dependent, they have to
be included in the compressed stream. For these reasons and because the DCT performs almost as well,
the KLT is not generally used in practice.
4. The discrete cosine transform (DCT): is an important transform, almost as efficient as the KLT in terms of
energy compaction, but it uses a fixed basis, independent of the data. There are also fast methods for
calculating the DCT. This method is used by JPEG and MPEG audio.
The 1-D discrete cosine transform (DCT) is defined as

C(u) = α(u) Σ_{x=0}^{N−1} f(x) cos[ (2x + 1)uπ / (2N) ]
The input is a set of n data values (pixels, audio samples, or other data), and the output is a set of n DCT
transform coefficients (or weights) C(u) . The first coefficient C(0) is called the DC coefficient, and the
rest are referred to as the AC coefficients. Notice that the coefficients are real numbers even if the input
data consists of integers. Similarly, the coefficients may be positive or negative even if the input data
consists of nonnegative numbers only.

Similarly, the inverse DCT is defined as

f(x) = Σ_{u=0}^{N−1} α(u) C(u) cos[ (2x + 1)uπ / (2N) ]

where

α(u) = sqrt(1/N)   for u = 0
α(u) = sqrt(2/N)   for u = 1, 2, ..., N − 1
The corresponding 2‐D DCT, and the inverse DCT are defined as

N 1 N 1  2x    2 y  1v 
1u 
Cu, v   uv  f  x, y  cos   cos 
x 0 y 0  2   2N 
N
and

N 1 N 1  2x    2 y  1v 
1u 
f  x, y   uvCu, v  cos   cos 
u0 v0  2   2N 
N
The advantage of the DCT is that it can be expressed without complex numbers. The 2-D DCT is also separable
(like the 2-D Fourier transform); i.e., it can be obtained by two successive 1-D DCTs, applied first to the rows and then to the columns.

The important feature of the DCT, the feature that makes it so useful in data
compression, is that it takes correlated input data and concentrates its energy in just the
first few transform coefficients. If the input data consists of correlated quantities, then most
of the N transform coefficients produced by the DCT are zeros or small numbers, and only a
few are large (normally the first ones).
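A direct (unoptimized) implementation of the 1-D DCT formula above, applied to a made-up correlated 8-sample block, shows the energy piling up in the first coefficients:

import numpy as np

def dct_1d(f):
    # C(u) = alpha(u) * sum_x f(x) * cos((2x+1) * u * pi / (2N))
    N = len(f)
    x = np.arange(N)
    C = np.empty(N)
    for u in range(N):
        alpha = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        C[u] = alpha * np.sum(f * np.cos((2 * x + 1) * u * np.pi / (2 * N)))
    return C

f = np.array([100., 102., 104., 107., 110., 112., 113., 115.])
print(np.round(dct_1d(f), 2))   # large C(0) and C(1); the remaining coefficients are tiny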

Compressing data with the DCT is therefore done by quantizing the coefficients. The
small ones are quantized coarsely (possibly all the way to zero), and the large ones can be
quantized finely to the nearest integer. After quantization, the coefficients (or variable-size
codes assigned to the coefficients) are written on the compressed stream. Decompression is
done by performing the inverse DCT on the quantized coefficients. This results in data items
that are not identical to the original ones but are not much different.
In practical applications, the data to be compressed is partitioned into sets of N items
each and each set is DCT-transformed and quantized individually. The value of N is critical.
Small values of N such as 3, 4, or 6 result in many small sets of data items. Such a small set
is transformed to a small set of coefficients where the energy of the original data is
concentrated in a few coefficients, but there are only a few coefficients in such a set! Thus,
there are not enough small coefficients to quantize. Large values of N result in a few large
sets of data. The problem in such a case is that the individual data items of a large set are
normally not correlated and therefore result in a set of transform coefficients where all the
coefficients are large. Experience indicates that N= 8 is a good value, and most data
compression methods that employ the DCT use this value of N.

7. JPEG

JPEG is a sophisticated lossy/lossless compression method for color or grayscale still images. It does not
handle bi‐level (black and white) images very well. It also works best on continuous‐tone images, where
adjacent pixels have similar colors. An important feature of JPEG is its use of many parameters, allowing
the user to adjust the amount of the data lost (and thus also the compression ratio) over a very wide range.
Often, the eye cannot see any image degradation even at compression factors of 10 or 20. There are two
operating modes, lossy (also called baseline) and lossless (which typically compresses an image to about
half its original size). Most implementations support just the lossy mode. This mode includes progressive and
hierarchical coding. JPEG is a compression method, not a complete standard for image representation.
This is why it does not specify image features such as pixel aspect ratio, color space, or interleaving of
bitmap rows. JPEG has been designed as a compression method for continuous‐tone images.

The name JPEG is an acronym that stands for Joint Photographic Experts Group, a joint committee of the ISO and the CCITT (now the ITU-T) that developed the standard. The main goals of JPEG compression are the following:

1. High compression ratios, especially in cases where image quality is judged as very good to excellent.
2. The use of many parameters, allowing knowledgeable users to experiment and achieve the desired
compression/quality trade‐off.
3. Obtaining good results with any kind of continuous‐tone image, regardless of image dimensions, color
spaces, pixel aspect ratios, or other image features.
4. A sophisticated, but not too complex compression method, allowing software and hardware
implementations on many platforms.
5. JPEG includes four modes of operation: (a) A sequential mode where each image component
(color) is compressed in a single left‐to‐right, top‐to‐bottom scan; (b) A progressive mode where the
image is compressed in multiple blocks (known as “scans”) to be viewed from coarse to fine detail; (c) A
lossless mode that is important in cases where the user decides that no pixels should be lost (the trade ‐off
is low compression ratio compared to the lossy modes); and (d) A hierarchical mode where the image is
compressed at multiple resolutions allowing lower‐ resolution blocks to be viewed without first having to
decompress the following higher‐resolution blocks.

Figure 3: Difference between sequential coding and progressive coding

The main JPEG compression steps are:


1. Color images are transformed from RGB into a luminance/chrominance color space. The
eye is sensitive to small changes in luminance but not in chrominance, so the chrominance part can later
lose much data, and thus be highly compressed, without visually impairing the overall image quality
much. This step is optional but important because the remainder of the algorithm works on each color
component separately. Without transforming the color space, none of the three color components will
tolerate much loss, leading to worse compression.
2. Color images are downsampled by creating low-resolution pixels from the original ones (this step
is used only when hierarchical compression is selected; it is always skipped for grayscale images).
The downsampling is not done for the luminance component. Downsampling is done either at a ratio of
2:1 both horizontally and vertically (the so called 2h2v or 4:1:1 sampling) or at ratios of 2:1 horizontally
and 1:1 vertically (2h1v or 4:2:2 sampling). Since this is done on two of the three color components, 2h2v
reduces the image to 1/3 + (2/3) × (1/4) = 1/2 its original size, while 2h1v reduces it to 1/3 + (2/3) × (1/2)
= 2/3 its original size. Since the luminance component is not touched, there is no noticeable loss of image
quality. Grayscale images don’t go through this step.

Figure 4: JPEG encoder and decoder


Figure 5: JPEG encoder

Figure 6: Scheme of the JPEG for RGB images

3. The pixels of each color component are organized in groups of 8×8 pixels called data units, and
each data unit is compressed separately. If the number of image rows or columns is not a multiple of 8,
the bottom row and the rightmost column are duplicated as many times as necessary. In the
noninterleaved mode, the encoder handles all the data units of the first image component, then the data
units of the second component, and finally those of the third component. In the interleaved mode the
encoder processes the three top‐left data units of the three image components, then the three data units to
their right, and so on.
4. The discrete cosine transform is then applied to each data unit to create an 8×8 map of frequency
components. They represent the average pixel value and successive higher‐frequency changes within the
group. This prepares the image data for the crucial step of losing information.
5. Each of the 64 frequency components in a data unit is divided by a separate number called its
quantization coefficient (QC), and then rounded to an integer. This is where information is irretrievably
lost. Large QCs cause more loss, so the high frequency components typically have larger QCs. Each of
the 64 QCs is a JPEG parameter and can, in principle, be specified by the user. In practice, most JPEG
implementations use the QC tables recommended by the JPEG standard for the luminance and
chrominance image components.
6. The 64 quantized frequency coefficients (which are now integers) of each data unit are encoded using a
combination of RLE and Huffman coding.
7. The last step adds headers and all the required JPEG parameters, and outputs the result. The
compressed file may be in one of three formats (1) the interchange format, in which the file contains the
compressed image and all the tables needed by the decoder (mostly quantization tables and tables of
Huffman codes), (2) the abbreviated format for compressed image data, where the file contains the
compressed image and may contain no tables (or just a few tables), and (3) the abbreviated format for
table‐specification data, where the file contains just tables, and no compressed image. The second
format makes sense in cases where the same encoder/decoder pair is used, and they have the same tables
built in. The third format is used in cases where many images have been compressed by the same encoder,
using the same tables. When those images need to be decompressed, they are sent to a decoder preceded
by one file with table‐specification data.
The JPEG decoder performs the reverse steps. (Thus, JPEG is a symmetric compression
method.)
Figures 4 and 5 show the block diagrams of the JPEG encoder and decoder. Figure 6 shows JPEG for
RGB images.
7.1 Modes of JPEG algorithm:
The progressive mode is a JPEG option. In this mode, higher‐frequency DCT coefficients are
written on the compressed stream in blocks called “scans.” Each scan that is read and processed by the
decoder results in a sharper image. The idea is to use the first few scans to quickly create a low ‐quality,
blurred preview of the image, and then either input the remaining scans or stop the process and reject the
image. The trade‐off is that the encoder has to save all the coefficients of all the data units in a memory
buffer before they are sent in scans, and also go through all the steps for each scan, slowing down the
progressive mode.
In the hierarchical mode, the encoder stores the image several times in the output stream, at
several resolutions. However, each high‐resolution part uses information from the low‐
resolution parts of the output stream, so the total amount of information is less than that required to store
the different resolutions separately. Each hierarchical part may use the progressive mode. The hierarchical
mode is useful in cases where a high‐resolution image needs to be output in low resolution. Older dot‐
matrix printers may be a good example of a low‐resolution output device still in use.
The lossless mode of JPEG calculates a “predicted” value for each pixel, generates the difference
between the pixel and its predicted value, and encodes the difference using the same method (i.e.,
Huffman or arithmetic coding) employed by step 6 above. The predicted value is calculated using values
of pixels above and to the left of the current pixel (pixels that have already been input and encoded).
7.2 Why DCT?
The JPEG committee elected to use the DCT because of its good performance, because it does
not assume anything about the structure of the data (the DFT, for example, assumes that the data to be
transformed is periodic), and because there are ways to speed it up. DCT has two key advantages: the
decorrelation of the information by generating coefficients which are almost independent of each other
and the concentration of this information in a greatly reduced number of coefficients. It reduces
redundancy while guaranteeing a compact representation.
The JPEG standard calls for applying the DCT not to the entire image but to dataunits (blocks) of
8×8 pixels. The reasons for this are (1) Applying DCT to large blocks involves many arithmetic
operations and is therefore slow. Applying DCT to small data units is faster. (2) Experience shows that,
in a continuous‐tone image, correlations between pixels are short range. A pixel in such an image has a
value (color component or shade of gray) that’s close to those of its near neighbors, but has nothing to do
with the values of far neighbors. The JPEG DCT is therefore executed with n = 8.
The DCT is JPEG’s key to lossy compression. The unimportant image information is reduced or
removed by quantizing the 64 DCT coefficients, especially the ones located toward the lower‐ right. If
the pixels of the image are correlated, quantization does not degrade the image quality much. For
best results, each of the 64 coefficients is quantized by dividing it by a different quantization coefficient
(QC). All 64 QCs are parameters that can be controlled, in principle, by the user. Mathematically, the
DCT is a one-to-one mapping of 64-point vectors from the image domain to the frequency domain.
The IDCT is the reverse mapping. If the DCT and IDCT could be
calculated with infinite precision and if the DCT coefficients were not quantized, the original 64
pixels would be exactly reconstructed.
7.3 Quantization
After each 8×8 data unit of DCT coefficients Gij is computed, it is quantized. This is the step
where information is lost (except for some unavoidable loss because of finite precision calculations in
other steps). Each number in the DCT coefficients matrix is divided by the corresponding number from
the particular “quantization table” used, and the result is rounded to the nearest integer. As has already
been mentioned, three such tables are needed, for the three color components. The JPEG standard allows
for up to four tables, and the user can select any of the four for quantizing each color component.
The 64 numbers that constitute each quantization table are all JPEG parameters. In principle, they
can all be specified and fine‐tuned by the user for maximum compression. In practice, few users have the
patience or expertise to experiment with so many parameters, so JPEG software normally uses the
following two approaches:
1. Default quantization tables. Two such tables, for the luminance (grayscale) and the chrominance
components, are the result of many experiments performed by the JPEG committee. They are included in
the JPEG standard and are reproduced here as Table 1. It is easy to see how the QCs in the table generally
grow as we move from the upper left corner to the bottom right corner. This is how JPEG reduces the
DCT coefficients with high spatial frequencies.
2. A simple quantization table Q is computed based on one parameter R specified by the user. A simple
expression such as Qij = 1+(i + j) × R guarantees that QCs start small at the upper‐left corner and get
bigger toward the lower‐right corner. Table 2 shows an example of such a table with R = 2.
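A small sketch of this step, assuming NumPy (the function names are illustrative): it builds the simple table Qij = 1 + (i + j) × R of Table 2 and quantizes a coefficient block with it; dequantization is what the decoder does, and the rounding error is exactly the information that is lost:

import numpy as np

def quant_table(R=2, n=8):
    # Q_ij = 1 + (i + j) * R, as in Table 2
    i, j = np.indices((n, n))
    return 1 + (i + j) * R

def quantize(G, Q):
    # divide each DCT coefficient by its QC and round to the nearest integer
    return np.round(G / Q).astype(int)

def dequantize(Gq, Q):
    return Gq * Q

Q = quant_table(R=2)
print(Q[0, 0], Q[7, 7])   # 1 and 29: the QCs grow toward the bottom-right corner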

Table 1: Recommended Quantization Tables.


If the quantization is done correctly, very few nonzero numbers will be left in the DCT
coefficients matrix, and they will typically be concentrated in the upper‐left region. These numbers are
the output of JPEG, but they are further compressed before being written on the output stream. In the
JPEG literature this compression is called “entropy coding.” Three techniques are used by entropy coding
to compress the 8 × 8 matrix of integers:

Table 2: The Quantization Table 1 + (i + j) × 2.

1. The 64 numbers are collected by scanning the matrix in zigzags. This produces a string of 64 numbers
that starts with some nonzeros and typically ends with many consecutive zeros. Only the nonzero
numbers are output (after further compressing them) and are followed by a special end-of-block (EOB)
code. This way there is no need to output the trailing zeros (we can say that the EOB is the run-length
encoding of all the trailing zeros).
2. The nonzero numbers are compressed using Huffman coding.
3. The first of those numbers (the DC coefficient) is treated differently from the others (the AC
coefficients).
7.4 Coding:
Each 8×8 matrix of quantized DCT coefficients contains one DC coefficient [at position (0, 0),
the top left corner] and 63 AC coefficients. The DC coefficient is a measure of the average value of the 64
original pixels, constituting the data unit. Experience shows that in a continuous‐tone image, adjacent data
units of pixels are normally correlated in the sense that the average values of the pixels in adjacent data
units are close. We already know that the DC coefficient of a data unit is a multiple of the average of the
64 pixels constituting the unit. This implies that the DC coefficients of adjacent data units don’t differ
much. JPEG outputs the first one (encoded), followed by differences (also encoded) of the DC
coefficients of consecutive data units.
Example: If the first three 8×8 data units of an image have quantized DC coefficients of 1118,
1114, and 1119, then the JPEG output for the first data unit is 1118 (Huffman encoded) followed by the
63 (encoded) AC coefficients of that data unit. The output for the second data unit will be 1114 − 1118 =
−4 (also Huffman encoded), followed by the 63 (encoded) AC coefficients of that data unit, and the
output for the third data unit will be 1119 − 1114 = 5 (also Huffman encoded), again followed by the 63
(encoded) AC coefficients of that data unit. This way of handling the DC coefficients is worth the extra
trouble, because the differences are small.
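A few lines reproducing the DC-difference (DPCM) handling from the example above:

def dc_differences(dc_coeffs):
    # encode the first DC coefficient as-is, then consecutive differences
    prev, out = 0, []
    for i, dc in enumerate(dc_coeffs):
        out.append(dc if i == 0 else dc - prev)
        prev = dc
    return out

print(dc_differences([1118, 1114, 1119]))   # [1118, -4, 5]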
Assume that 46 bits encode one color component of the 64 pixels of a data unit. Let’s
assume that the other two color components are also encoded into 46-bit numbers. If each
pixel originally consists of 24 bits, then this corresponds to a compression factor of 64 ×
24/(46 × 3) ≈ 11.13; very impressive!
Each quantized spectral domain is composed of a few non‐zero quantized coefficients, and the
majority of zero coefficients eliminated in the quantization stage. The positioning of the zeros changes
from one block to another. As shown in Figure 7, a zigzag scanning of the block is performed in order to
create a vector of coefficients with many zero run-lengths. Natural images generally have low-
frequency characteristics. By beginning the zigzag scanning at the top left (in the low-frequency zone),
the vector generated will at first contain significant coefficients, and then more and more run-lengths of
zeros as we move towards the high-frequency coefficients. Figure 7 gives an example.

Figure 7. Zigzag scanning of a quantized DCT domain, the resulting coefficient vector, and the
generation of pairs (zero runlength, DCT coefficient). EOB stands for “end of block”
Pairs of (zero run-length, DCT coefficient value) are then generated and coded by a set of Huffman
coders defined in the JPEG standard. The mean values of the blocks (DC coefficient) are coded separately
by a DPCM method. Finally, the “.jpg” file is constructed with the union of the bitstreams associated with
the coded blocks.
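A sketch of the zigzag scan and of the (zero run-length, coefficient) pairs described above; the helper names are illustrative, and the DC coefficient is skipped because it is coded separately by DPCM:

def zigzag_indices(n=8):
    # traverse the n*n block along anti-diagonals, alternating direction
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def run_length_pairs(block):
    # (zero run-length, nonzero coefficient) pairs, terminated by an EOB marker
    pairs, run = [], 0
    for i, j in zigzag_indices(len(block))[1:]:   # skip the DC coefficient
        v = block[i][j]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append('EOB')
    return pairs

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][0] = 96, -3, 5, -2
print(run_length_pairs(block))   # [(0, -3), (0, 5), (0, -2), 'EOB']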

Why the Zig-Zag Scan:

1. It groups the low-frequency coefficients at the top of the vector.

2. It maps the 8 × 8 block to a 1 × 64 vector.
3. It is more effective than a raster scan at producing long runs of zeros, which compress well.

8. JPEG – LS:

JPEG-LS is a standard for the lossless (or near-lossless) compression of continuous-tone images.
JPEG‐LS examines several of the previously‐seen neighbors of the current pixel, uses them as the context
of the pixel, uses the context to predict the pixel and to select a probability distribution out of several such
distributions, and uses that distribution to encode the prediction error with a special Golomb code. There
is also a run mode, where the length of a run of identical pixels is encoded. Figure 8 below shows the
block diagram of JPEG‐LS encoder.

Figure 8: JPEG – LS Block diagram

The context used to predict the current pixel x is shown in Figure 9. The encoder examines the
context pixels and decides whether to encode the current pixel x in the run mode or in the
regular mode. If the context suggests that the pixels y, z,. . . following the current pixel are likely to be
identical, the encoder selects the run mode. Otherwise, it selects the regular mode. In the near-lossless
mode the decision is slightly different. If the context suggests that the pixels following the current pixel
are likely to be almost identical (within the tolerance parameter NEAR), the encoder selects the run mode.
Otherwise, it selects the regular mode. The rest of the encoding process depends on the mode selected.

Figure 9: Context for Predicting x.

In the regular mode, the encoder uses the values of context pixels a, b, and c to predict pixel x,
and subtracts the prediction from x to obtain the prediction error, denoted by Errval. This error is then
corrected by a term that depends on the context (this correction is done to compensate for systematic
biases in the prediction), and encoded with a Golomb code. The Golomb coding depends on all four
pixels of the context and also on prediction errors that were previously encoded for the same context (this
information is stored in arrays A and N). If near‐lossless compression is used, the error is quantized before
it is encoded.
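The prediction itself is the well-known median edge detector (MED) of LOCO-I, which the text does not spell out; a minimal sketch, with variable names following Figure 9 (a = left, b = above, c = above-left), is:

def med_predict(a, b, c):
    # median edge detector used by JPEG-LS to predict x from its neighbors
    if c >= max(a, b):
        return min(a, b)    # an edge is likely: follow the smaller neighbor
    if c <= min(a, b):
        return max(a, b)    # an edge is likely: follow the larger neighbor
    return a + b - c        # smooth region: planar prediction

print(med_predict(a=100, b=40, c=100))   # 40: the predictor follows the edge

The prediction error Errval = x − med_predict(a, b, c) is the quantity that is bias-corrected and Golomb coded, as described above.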
In the run mode, the encoder starts at the current pixel x and finds the longest run of pixels that
are identical to context pixel a. The encoder does not extend this run beyond the end of the current image
row. Since all the pixels in the run are identical to a (and a is already known to the decoder) only the
length of the run needs be encoded, and this is done with a 32‐entry array denoted by J. If near‐lossless
compression is used, the encoder selects a run of pixels that are close to a within the tolerance parameter
NEAR.
The decoder is not substantially different from the encoder, so JPEG‐LS is a nearly symmetric
compression method. The compressed stream contains data segments (with the Golomb codes and the
encoded run lengths), marker segments (with information needed by the decoder), and markers (some of
the reserved markers of JPEG are used). A marker is a byte of all ones followed by a special code,
signaling the start of a new segment. If a marker is followed by a byte
whose most significant bit is 0, that byte is the start of a marker segment. Otherwise, that byte starts
a data segment.
Advantages of JPEG-LS:
[1] JPEG-LS is capable of lossless compression.
[2] JPEG-LS has very low computational complexity.
JPEG-LS achieves state-of-the-art compression rates at very low computational complexity
and memory requirements. These characteristics led to the selection of JPEG-LS, which is based
on the LOCO-I algorithm developed at Hewlett-Packard Laboratories, as the new ISO/ITU standard
for lossless and near-lossless still image compression.
Ref: M. J. Weinberger, G. Seroussi, and G. Sapiro, "The LOCO-I Lossless Image Compression Algorithm: Principles and Standardization into JPEG-LS," IEEE Transactions on Image Processing, vol. 9, no. 8, August 2000.

9. JPEG 2000:
The JPEG 2000 standard for the compression of still images is based on the Discrete Wavelet
Transform (DWT). This transform decomposes the image using functions called wavelets. The basic idea
is to have a more localized (and therefore more precise) analysis of the information (signal, image or 3D
objects), which is not possible using cosine functions whose temporal or spatial supports are identical to
the data (the same time duration for signals, and the same length of line or column for images).
JPEG‐2000 has the following advantages:

 Better image quality than JPEG at the same file size, or alternatively 25‐35% smaller file
sizes with the same quality.
 Good image quality at low bit rates (even with compression ratios over 80:1)
 Low complexity option for devices with limited resources.
 Scalable image files ‐‐ no decompression needed for reformatting. With JPEG 2000, the image
that best matches the target device can be extracted from a single compressed file on a server.
Options include:
1. Image sizes from thumbnail to full size
2. Grayscale to full 3 channel color
3. Low quality image to lossless (identical to original image)
 JPEG 2000 is more suitable for web graphics than baseline JPEG because it supports an alpha
channel (transparency component).
 Region of interest (ROI): the more important parts of the image can be defined and coded
with more bits than the surrounding areas.
Following is a list of areas where this new standard is expected to improve on existing
methods:
 High compression efficiency. Bitrates of less than 0.25 bpp are expected for highly detailed
grayscale images.
 The ability to handle large images, up to 2³²×2³² pixels (the original JPEG can handle
images of up to 2¹⁶×2¹⁶).
 Progressive image transmission. The proposed standard can decompress an image
progressively by SNR, resolution, color component, or region of interest.
 Easy, fast access to various points in the compressed stream.
 The decoder can pan/zoom the image while decompressing only parts of it.
 The decoder can rotate and crop the image while decompressing it.
 Error resilience. Error‐correcting codes can be included in the compressed stream, to
improve transmission reliability in noisy environments.
9.1 The JPEG 2000 Compression Engine
The JPEG 2000 compression engine (encoder and decoder) is illustrated in block diagram form
in Fig. 10.

Figure 10: General block diagram of the JPEG 2000 (a) encoder and (b) decoder.
At the encoder, the discrete transform is first applied on the source image data. The transform
coefficients are then quantized and entropy coded before forming the output code stream (bit stream).
The decoder is the reverse of the encoder. The code stream is first entropy decoded, dequantized, and
inverse discrete transformed, thus resulting in the reconstructed image data. Although this general block
diagram looks like the one for the conventional JPEG, there are radical differences in all of the processes
of each block of the diagram. A quick overview of the whole system is as follows:
 The source image is decomposed into components.
 The image components are (optionally) decomposed into rectangular tiles. The tile‐component is
the basic unit of the original or reconstructed image.
 A wavelet transform is applied on each tile. The tile is decomposed into different resolution
levels.
 The decomposition levels are made up of subbands of coefficients that describe the frequency
characteristics of local areas of the tile components, rather than across the entire image
component.
 The subbands of coefficients are quantized and collected into rectangular arrays of “code
blocks.”
 The bit planes of the coefficients in a code block (i.e., the bits of equal significance across the
coefficients in a code block) are entropy coded.
 The encoding can be done in such a way that certain regions of interest can be coded at a higher
quality than the background.
 Markers are added to the bit stream to allow for error resilience.
 The code stream has a main header at the beginning that describes the original image and the
various decomposition and coding styles that are used to locate, extract, decode and reconstruct
the image with the desired resolution, fidelity, region of interest or other characteristics.

For clarity of presentation, the whole compression engine can be decomposed into three parts: the preprocessing, the core processing, and the bit-stream formation part, although there is a high inter‐relation between them. In the preprocessing part the image tiling, the dc‐level shifting and the component transformations are included. The core processing part consists of the discrete transform, the
quantization and the entropy coding processes. Finally, the concepts of the precincts, code blocks, layers,
and packets are included in the bit‐stream formation part.
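To make the core-processing step more concrete, the sketch below applies one level of a 2-D Haar wavelet decomposition to a tile using NumPy. JPEG 2000 itself uses the reversible 5/3 and irreversible 9/7 filter banks, so this is only an illustration of how a tile is split into LL, LH, HL and HH subbands.

import numpy as np

def haar_dwt2(tile):
    """One level of a 2-D Haar transform; the tile must have even dimensions.
    Returns the LL (approximation), LH, HL and HH (detail) subbands."""
    t = tile.astype(float)
    # rows: low-pass (pairwise averages) and high-pass (pairwise differences)
    lo = (t[:, 0::2] + t[:, 1::2]) / 2.0
    hi = (t[:, 0::2] - t[:, 1::2]) / 2.0
    # columns of each intermediate result
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

tile = np.arange(64).reshape(8, 8)
ll, lh, hl, hh = haar_dwt2(tile)
print(ll.shape)   # (4, 4): the lower-resolution approximation of the tile

Repeating the transform on the LL subband produces the successive resolution levels mentioned above.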
Ref: A. Skodras, C. Christopoulos, and T. Ebrahimi, “The JPEG 2000 Still Image Compression Standard,” IEEE Signal Processing Magazine, pp. 36–58, September 2001.

10. DPCM:

The DPCM compression method is a member of the family of differential encoding compression
methods, which itself is a generalization of the simple concept of relative encoding . It is based on the
well‐known fact that neighboring pixels in an image (and also adjacent samples in digitized sound) are
correlated. Correlated values are generally similar, so their differences are small, resulting in
compression.
Differential encoding methods calculate the differences di = ai − ai−1 between consecutive data
items ai, and encode the di’s. The first data item, a0, is either encoded separately or is written on the
compressed stream in raw format. In either case the decoder can decode and generate a0 in exact form. In
principle, any suitable method, lossy or lossless, can be used to encode the differences. In practice,
quantization is often used, resulting in lossy compression. The quantity encoded is not the difference di but a similar, quantized number that we denote by d̂i. The difference between di and d̂i is the quantization error qi. Thus, d̂i = di + qi.
It turns out that the lossy compression of differences introduces a new problem, namely, the
accumulation of errors. This is easy to see when we consider the operation of the decoder. The decoder

inputs encoded values of d̂i, decodes them, and uses them to generate “reconstructed” values âi (where âi = âi−1 + d̂i) instead of the original data values ai. The decoder starts by reading and decoding a0. It then inputs d̂1 = d1 + q1 and calculates â1 = a0 + d̂1 = a0 + d1 + q1 = a1 + q1. The next step is to input d̂2 = d2 + q2 and to calculate â2 = â1 + d̂2 = a1 + q1 + d2 + q2 = a2 + q1 + q2. The decoded value â2 contains the sum of two quantization errors. In general, the decoded value is
ân = an + q1 + q2 + · · · + qn
and includes the sum of n quantization errors. Figure 11 summarizes the operations of both encoder and
decoder. It shows how the current data item ai is saved in a storage unit (a delay), to be used for encoding
the next item ai+1. The next step in developing a general differential encoding method is to take advantage
of the fact that the data items being compressed are correlated.
Figure 11: DPCM encoder and decoder
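The accumulation of quantization errors is easy to reproduce in code. The sketch below uses a hypothetical uniform quantizer with step size step on the differences; the decoder reconstructs âi = âi−1 + d̂i exactly as described above.

def dpcm_open_loop(samples, step):
    """Quantize the differences between consecutive ORIGINAL samples,
    then let the decoder accumulate them; its errors add up."""
    d_hat = [round((samples[i] - samples[i - 1]) / step) * step
             for i in range(1, len(samples))]
    recon = [samples[0]]              # a0 is transmitted exactly
    for d in d_hat:
        recon.append(recon[-1] + d)   # a_hat_i = a_hat_{i-1} + d_hat_i
    return recon

data = [10, 13, 17, 22, 28, 35]
print(dpcm_open_loop(data, step=3))   # the reconstruction drifts from data

Practical DPCM encoders avoid this drift by computing each difference against the previously reconstructed value rather than the previous original value, so that only a single quantization error affects each decoded sample.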
Any method using a predictor is called differential pulse code modulation, or DPCM. The
simplest predictor is linear. In such a predictor the value of the current pixel ai is predicted by a weighted
sum of N of its previously‐seen neighbors (in the case of an image these are the pixels above it or to its
left):
Pi = Σⱼ wj ai−j ,    j = 1, 2, . . . , N,

where wj are the weights, which still need to be determined. Figure 12 shows a simple example for the
case N = 3. Let’s assume that a pixel X is predicted by its three neighbors A, B, and C according to the
simple weighted sum
X = 0.35A + 0.3B + 0.35C

The weights used in the above equation have been selected more or less arbitrarily and are for
illustration purposes only. However, they make sense, because they add up to unity. In order to determine
the best weights, we denote by ei the prediction error for pixel ai,
ei = ai − Pi = ai − Σⱼ wj ai−j ,    i = 1, 2, . . . , n,
where n is the number of pixels to be compressed, and we find the set of weights wj that minimizes the sum of squared errors
E = Σᵢ ei² = Σᵢ [ ai − Σⱼ wj ai−j ]²
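Minimizing E is an ordinary least-squares problem. The sketch below solves it with NumPy for a 1-D sequence of samples; using the N previous samples as the neighborhood is a simplification of the 2-D pixel context.

import numpy as np

def optimal_weights(samples, N):
    """Weights w_1..w_N minimizing the sum of squared prediction errors
    e_i = a_i - sum_j w_j * a_{i-j} over the given sample sequence."""
    a = np.asarray(samples, dtype=float)
    # each row of X holds the N previous samples of the value being predicted
    X = np.column_stack([a[N - j:len(a) - j] for j in range(1, N + 1)])
    y = a[N:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

samples = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24]
print(optimal_weights(samples, N=3))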

11. Fractal Image Compression:
Coastlines, mountains and clouds are not easily described by traditional Euclidean geometry. Such natural objects may be described and mathematically modeled by Mandelbrot’s fractal geometry. This is another reason why image compression using fractal transforms is investigated. The word fractal was first coined by Mandelbrot in 1975.
Properties of fractals
1) The defining characteristic of a fractal is that it has a fractional dimension, from which the word fractal
is derived.
2) The property of self‐similarity or scaling is one of the central concepts of fractal geometry.

11.1 Self‐Similarity in Images


A typical image does not contain the type of self‐similarity found in fractals. But, it contains a
different sort of self‐similarity. The figure shows regions of Lenna that are self‐similar at different scales.
A portion of her shoulder overlaps a smaller region that is almost identical, and a portion of the reflection
of the hat in the mirror is similar to a smaller part of her hat.

The difference here is that the entire image is not self‐similar, but parts of the image are self‐similar to properly transformed parts of itself. Studies suggest that most naturally occurring
images contain this type of self‐similarity. It is this restricted redundancy that fractal image compression
schemes attempt to eliminate.
What is Fractal Image Compression?
Imagine a special type of photocopying machine that reduces the image to be copied by half and
reproduces it three times on the copy (see Figure 1). What happens when we feed the output of this
machine back as input? Figure 2 shows several iterations of this process on several input images. We can
observe that all the copies seem to converge to the same final image, the one in 2(c). Since the copying
machine reduces the input image, any initial image placed on the copying machine will be reduced to a
point as we repeatedly run the machine; in fact, it is only the position and the orientation of the copies that determine what the final image looks like.

The way the input image is transformed determines the final result when running the copy
machine in a feedback loop. However, we must constrain these transformations: they must be contractive, that is, a given transformation applied to any two points in the input image must bring them closer together in the copy. This technical condition is quite logical, since if points in the copy were spread out, the final image would have to be of infinite size. Except for this condition, the transformations can have any form. In practice, choosing transformations of
the form
wi(x, y) = (ai x + bi y + ei , ci x + di y + fi)
is sufficient to generate interesting transformations called affine transformations of the plane. Each can
skew, stretch, rotate, scale and translate an input image. A common feature of these transformations that are run in a feedback loop is that, for a given initial image, each resulting image is formed from transformed (and reduced) copies of itself, and hence it must have detail at every scale. That is, the images are fractals. This method of generating fractals is due to John Hutchinson.
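As a small illustration of Hutchinson's construction (not of a compression scheme), the sketch below iterates three contractive affine maps, each of which halves the coordinates just like the copying machine above. Any starting point is driven towards the same attractor, in this case the Sierpinski triangle.

import random

# three contractive affine maps: scale by 1/2, then translate
maps = [lambda x, y: (0.5 * x,       0.5 * y),
        lambda x, y: (0.5 * x + 0.5, 0.5 * y),
        lambda x, y: (0.5 * x,       0.5 * y + 0.5)]

def attractor_points(n_points=10000):
    """Chaos-game approximation of the attractor of the maps above."""
    x, y = random.random(), random.random()   # any starting point will do
    points = []
    for _ in range(n_points):
        x, y = random.choice(maps)(x, y)      # apply a randomly chosen map
        points.append((x, y))
    return points

pts = attractor_points()
print(len(pts), pts[-1])   # plotting pts reveals the Sierpinski triangle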

Barnsley suggested that perhaps storing images as collections of transformations could lead to
image compression. His argument went as follows: the image in Figure 3 looks complicated, yet it is generated from only four affine transformations.
12. The JBIG Standard
JBIG is the coding standard recommended by the Joint Bi-level Image Experts Group for binary images.
This lossless compression standard is used primarily to code scanned images of printed or handwritten text,
computer-generated text, and facsimile transmissions. It offers progressive encoding and decoding capability, in
the sense that the resulting bitstream contains a set of progressively higher-resolution images. This standard can
also be used to code grayscale and color images by coding each bitplane independently, but this is not the main
objective.

The JBIG compression standard has three separate modes of operation: progressive, progressive-compatible
sequential, and single-progression sequential. The progressive-compatible sequential mode uses a bitstream
compatible with the progressive mode. The only difference is that the data is divided into strips in this mode.

The single-progression sequential mode has only a single lowest-resolution layer. Therefore, an entire image
can be coded without any reference to other higher-resolution layers. Both these modes can be viewed as special
cases of the progressive mode. Therefore, our discussion covers only the progressive mode.

The JBIG encoder can be decomposed into two components:
 Resolution-reduction and differential-layer encoder
 Lowest-resolution-layer encoder

The input image goes through a sequence of resolution-reduction and differential-layer encoders. Each is
equivalent in functionality, except that their input images have different resolutions. Some implementations of the
JBIG standard may choose to recursively use one such physical encoder. The lowest-resolution image is coded
using the lowest-resolution layer encoder. The design of this encoder is somewhat simpler than that of the
resolution-reduction and differential-layer encoders, since the resolution-reduction and deterministic prediction operations are not needed.

13. The JBIG2 Standard

While the JBIG standard offers both lossless and progressive (lossy to lossless) coding abilities, the lossy
image produced by this standard has significantly lower quality than the original, because the lossy image
contains at most only one-quarter of the number of pixels in the original image. By contrast, the JBIG2 standard is
explicitly designed for lossy, lossless, and lossy to lossless image compression. The design goal for JBIG2 aims
not only at providing superior lossless compression performance over existing standards but also at incorporating
lossy compression at a much higher compression ratio, with as little visible degradation as possible.

A unique feature of JBIG2 is that it is both quality progressive and content progressive. By quality
progressive, we mean that the bitstream behaves similarly to that of the JBIG standard, in which the image quality
progresses from lower to higher (or possibly lossless) quality. On the other hand, content progressive allows
different types of image data to be added progressively. The JBIG2 encoder decomposes the input bilevel image into regions of different attributes and codes each separately, using different coding methods. As in other image compression standards, only the JBIG2 bitstream, and thus the decoder, is explicitly defined. As a result, any encoder that produces the correct bitstream is "compliant", regardless of the actions it actually takes. Another feature of JBIG2 that sets it apart from other image compression standards is that it is able to represent multiple pages of a document in a single file, enabling it to exploit interpage similarities.

For example, if a character appears on one page, it is likely to appear on other pages as well. Thus, using a
dictionary-based technique, this character is coded only once instead of multiple times for every page on which it
appears. This compression technique is somewhat analogous to video coding, which exploits interframe
redundancy to increase compression efficiency.

JBIG2 offers content-progressive coding and superior compression performance through model-based coding,
in which different models are constructed for different data types in an image, realizing additional coding gain.

Model-Based Coding. The idea behind model-based coding is essentially the same as that of context-based
coding. From the study of the latter, we know we can realize better compression performance by carefully
designing a context template and accurately estimating the probability distribution for each context. Similarly, if
we can separate the image content into different categories and derive a model specifically for each, we are much
more likely to model the behavior of the data accurately and thus achieve a higher compression ratio.

In the JBIG style of coding, adaptive and model templates capture the structure within the image. This model is general, in the sense that it applies to all kinds of data. However, being general implies that it does not explicitly deal with the structural differences between text and halftone data that comprise nearly all the contents of bilevel images. JBIG2 takes advantage of this by designing custom models for these data types. The JBIG2 specification expects the encoder to first segment the input image into regions of different data types, in particular, text and halftone regions. Each region is then coded independently, according to its characteristics.

Text-Region Coding.

Each text region is further segmented into pixel blocks containing connected black pixels. These blocks correspond to characters that make up the content of this region. Then, instead of coding all pixels of each character, the bitmap of one representative instance of this character is coded and placed into a dictionary. For any character to be coded, the algorithm first tries to find a match with the characters in the dictionary. If one is found, then both a pointer to the corresponding entry in the dictionary and the position of the character on the page are coded. Otherwise, the pixel block is coded directly and added to the dictionary. This technique is referred to as pattern matching and substitution in the JBIG2 specification.
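The following sketch illustrates the pattern matching and substitution idea. The match criterion used here, counting mismatching pixels against a fixed threshold, is only an illustration and not the criterion defined by the standard.

import numpy as np

def match_or_add(symbol, dictionary, max_mismatch=2):
    """Return (index, is_new): the index of a sufficiently similar dictionary
    bitmap, or add the symbol as a new entry when no match is found."""
    for idx, entry in enumerate(dictionary):
        if entry.shape == symbol.shape and np.sum(entry != symbol) <= max_mismatch:
            return idx, False              # code only a pointer and a position
    dictionary.append(symbol)              # code the bitmap and add it
    return len(dictionary) - 1, True

dictionary = []
glyph = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]])
print(match_or_add(glyph, dictionary))          # (0, True): new entry, bitmap coded
print(match_or_add(glyph.copy(), dictionary))   # (0, False): pointer coded instead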

However, for scanned documents, it is unlikely that two instances of the same character will match pixel by pixel. In this case, JBIG2 allows the option of including refinement data to reproduce the original character on the page. The refinement data codes the current character using the pixels in the matching character in the dictionary. The encoder has the freedom to choose the refinement to be exact or lossy. This method is called soft pattern matching.

The numeric data, such as the index of the matched character in the dictionary and the position of the characters on the page, are either bitwise or Huffman encoded. Each bitmap for the characters in the dictionary is coded using JBIG-based techniques.

Halftone-Region Coding.
The JBIG2 standard suggests two methods for halftone image coding. The first is similar to the context-based
arithmetic coding used in JBIG. The only difference is that the new standard allows the context template to include as many as 16 template pixels, four of which may be adaptive.

The second method is called descreening. This involves converting back to grayscale and coding the grayscale values. In this method, the bilevel region is divided into blocks of size mb × nb. For an m × n bilevel region, the resulting grayscale image has dimensions mg = ⌊(m + mb − 1)/mb⌋ and ng = ⌊(n + nb − 1)/nb⌋. Each grayscale value is then computed as the sum of the binary pixel values in the corresponding mb × nb block. The bit planes of the grayscale image are coded using context-based arithmetic coding. The grayscale values are used as indices into a dictionary of halftone bitmap patterns. The decoder can use these values to index into this dictionary, to reconstruct the original halftone image.
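The grayscale computation just described can be sketched as follows; the block size and the random halftone used in the example are arbitrary.

import numpy as np

def descreen(bilevel, mb, nb):
    """Convert an m x n bilevel region into a grayscale image whose pixels
    are the sums of the binary values in the corresponding mb x nb blocks."""
    m, n = bilevel.shape
    mg, ng = (m + mb - 1) // mb, (n + nb - 1) // nb
    gray = np.zeros((mg, ng), dtype=int)
    for i in range(mg):
        for j in range(ng):
            block = bilevel[i * mb:(i + 1) * mb, j * nb:(j + 1) * nb]
            gray[i, j] = block.sum()
    return gray

halftone = np.random.randint(0, 2, size=(8, 8))
print(descreen(halftone, mb=4, nb=4))   # 2 x 2 grayscale values in 0..16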

Preprocessing and Postprocessing: JBIG2 allows the use of lossy compression but does not specify a method for
doing so. From the decoder point of view, the decoded bit stream is lossless with respect to the image encoded by
the encoder, although not necessarily with respect to the original image. The encoder may modify the input image
in a preprocessing step, to increase coding efficiency. The preprocessor usually tries to change the original image
to lower the code length in a way that does not generally affect the image's appearance. Typically, it tries to
remove noisy pixels and smooth out pixel blocks. Postprocessing, another issue not addressed by the
specification, can be especially useful for halftones, potentially producing more visually pleasing images. It is
also helpful to tune the decoded image to a particular output device, such as a laser printer.
