Image Compression
The transmission of a picture with high quality requires a large transmission bandwidth. With so many TV channels to be transmitted simultaneously, we cannot afford to allot such a large bandwidth to each TV channel.
A picture is worth a thousand words. For example, 1000 words contain about 6000 characters. If each character is coded into a 7-bit ASCII symbol, then we require 6000 × 7 = 42,000 bits to transmit 1000 words.
What will be the size of a picture that can be described with 42000 bits?
As shown in Fig., an image is formed of picture elements called pixels. Generally, a medium-quality picture is formed with 300 pixels per inch.
With existing standards, 42,000 bits can describe only a picture of about 1/4 inch on a side, i.e. a very small picture. If the picture size is increased, the number of pixels increases and hence the number of bits also increases. A picture of 8.5 inch by 11.0 inch with 300 pixels per inch contains 2550 × 3300 ≈ 8.4 × 10^6 pixels, and hence requires on the order of 10^7 bits even at one bit per pixel.
To transmit this image, we have to use some kind of source coding technique to reduce the bandwidth and memory requirements. Source coding schemes such as PCM (pulse-code modulation), DM (delta modulation), DPCM (differential pulse-code modulation) or ADM (adaptive delta modulation) can be used to reduce the bandwidth and memory requirements for transmission of an image.
Characteristics of Images:
1) High compression ratio: Since a very large number of bits is required for the
representation of an image, it is necessary to use extremely high compression ratios to
make storage and transmission practically possible.
2) If moving images are to be transmitted (examples are TV, movies, computer graphics, the WWW, etc.), then compression and decompression must execute very fast.
3) Human eyes are highly tolerant of approximation errors in an image. This makes compression practically possible. Such compression is called lossy compression.
- We can compress video by compressing images. The two standards used for image
compression are:
1. Joint Photographic Experts Group (JPEG).
2. Moving Picture Experts Group (MPEG).
Description:
- The image that is to be transmitted is first converted into an uncompressed digital
image by the process of digitization.
- This digital image is applied to an encoder that uses an appropriate image compression
technique to compress the digital image.
- The compressed digital image is transmitted over a suitable communication channel.
- At the receiving end, a decoder decompresses the received compressed image and passes it on to the receiver, which displays the image.
A possible exception could be the Earth image. The best compressed file size using the second
JPEG mode and adaptive arithmetic coding is 32,137 bytes, compared to 34,276 bytes using
GIF. The difference between the file sizes is not significant. We can see the reason by looking
at the Earth image. Note that a significant portion of the image is the background, which is of
a constant value. In dictionary coding, this would result in some very long entries that would
provide significant compression.
We can see that if the ratio of background to foreground were just a little different in this image,
the dictionary method in GIF might have outperformed the JPEG approach. The PNG approach, which allows the use of a different predictor (or no predictor) on each row prior to dictionary coding, significantly outperforms both GIF and JPEG on this image.
2. CALIC
The CALIC (Context Adaptive Lossless Image Compression) method uses both the prediction and the context of a pixel value. The scheme works in two modes, one for gray-scale images and another for bi-level images. Here we will discuss the compression of gray-scale images. In an image, a given pixel generally has a value close to one of its neighbours. Which neighbour has the closest value depends on the local structure of the image.
Depending on whether there is a horizontal or vertical edge in the neighbourhood of the
pixel being encoded, the pixel above, or the pixel to the left, or some weighted average of
neighbouring pixels may give the best prediction. How close the prediction is to the pixel
being encoded depends on the surrounding texture. In a region of the image with a great
deal of variability, the prediction is likely to be further from the pixel being encoded than
in the regions with less variability.
In order to take into account all these factors, the algorithm has to make a determination of
the environment of the pixel to be encoded. The only information that can be used to make
this determination has to be available to both encoder and decoder.
Let’s take up the question of the presence of vertical or horizontal edges in the
neighbourhood of the pixel being encoded. To help our discussion, we will refer to Figure.
In this figure, the pixel to be encoded has been marked with an X. The pixel above is called
the north pixel, the pixel to the left is the west pixel, and so on. Note that when pixel X is being encoded, all the other marked pixels (N, W, NW, NE, WW, NN, and NNE) are available to both the encoder and the decoder.
Using the information about whether the pixel values are changing by large or small amounts
in the vertical or horizontal direction in the neighbourhood of the pixel being encoded provides
a good initial prediction. In order to refine this prediction, we need some information about the
interrelationships of the pixels in the neighbourhood. Using this information, we can generate
an offset or refinement to our initial prediction. We quantify the information about the
neighbourhood by first forming the vector
[N, W, NW, NE, NN, WW, 2N − NN, 2W − WW]
We then compare each component of this vector with our initial prediction X̂. If the value of
the component is less than the prediction, we replace the value with a 1; otherwise we replace
it with a 0. Thus, we end up with an eight-component binary vector. If each component of the
binary vector was independent, we would end up with 256 possible vectors. However, because
of the dependence of various components, we actually have 144 possible configurations. We
also compute a quantity that incorporates the vertical and horizontal variations and the previous
error in prediction by
δ = dh + dv + 2|N − N̂|
where N̂ is the predicted value of N, and dh and dv measure the amount of horizontal and vertical variation in the neighbourhood. The range of values of δ is divided into four intervals,
each being represented by 2 bits. These four possibilities, along with the 144 texture
descriptors, create 144×4 = 576 contexts for X. As the encoding proceeds, we keep track of
how much prediction error is generated in each context and offset our initial prediction by that
amount. This results in the final predicted value.
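The bookkeeping behind this bias feedback can be sketched as follows. This is a minimal illustrative sketch of the idea described above, not the actual CALIC implementation; the class and method names are invented for illustration.

class ContextBias:
    """Track the average prediction error seen in each context (a sketch)."""
    def __init__(self, num_contexts=576):
        self.error_sum = [0.0] * num_contexts   # accumulated prediction errors
        self.count = [0] * num_contexts         # occurrences of each context

    def refine(self, context, initial_prediction):
        # Offset the initial prediction by the mean error seen so far in
        # this context; with no history, leave the prediction unchanged.
        if self.count[context] == 0:
            return initial_prediction
        return initial_prediction + self.error_sum[context] / self.count[context]

    def update(self, context, actual, predicted):
        # After coding a pixel, record the error for future refinements.
        self.error_sum[context] += actual - predicted
        self.count[context] += 1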
Once the prediction is obtained, the difference between the pixel value and the prediction (the
prediction error, or residual) has to be encoded. While the prediction process outlined above
removes a lot of the structure that was in the original sequence, there is still some structure left
in the residual sequence. We can take advantage of some of this structure by coding the residual
in terms of its context. The context of the residual is taken to be the value of δ defined in the equation above. In order to reduce the complexity of the encoding, rather than using the actual value as the context, CALIC uses the range of values in which δ lies as the context. Since the pixel values lie between 0 and M − 1, the residual r = x − X̂ lies between −X̂ and M − 1 − X̂; it is remapped onto the range 0 to M − 1 as follows:
0 → 0
−1 → 1
+1 → 2
...
−X̂ → 2X̂ − 1
+X̂ → 2X̂
X̂ + 1 → 2X̂ + 1
X̂ + 2 → 2X̂ + 2
...
M − 1 − X̂ → M − 1
where we have assumed that X̂ ≤ (M − 1)/2.
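A minimal sketch of this remapping in Python (the function name is illustrative, and the symmetric case X̂ > (M − 1)/2 is omitted):

def remap_residual(r, pred, M):
    # Map the residual r = x - pred (which lies in -pred .. M-1-pred)
    # onto 0 .. M-1, assuming pred <= (M - 1) / 2 as in the text.
    if abs(r) <= pred:
        # zigzag region: 0 -> 0, -1 -> 1, +1 -> 2, -2 -> 3, +2 -> 4, ...
        return 2 * r if r >= 0 else -2 * r - 1
    # beyond +pred the negative side is exhausted; shift linearly
    return r + pred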
Another approach used by CALIC to reduce the size of its alphabet is to use a modification of
a technique called recursive indexing. Recursive indexing is a technique for representing a
large range of numbers using only a small set. It is easiest to explain using an example. Suppose
we want to represent positive integers using only the integers between 0 and 7—that is, a
representation alphabet of size 8. Recursive indexing works as follows: If the number to be
represented lies between 0 and 6, we simply represent it by that number.
If the number to be represented is greater than or equal to 7, we first send the number 7, subtract
7 from the original number, and repeat the process. We keep repeating the process until the
remainder is a number between 0 and 6. Thus, for example, 9 would be represented by 7
followed by a 2, and 17 would be represented by two 7s followed by a 3. When the decoder sees a number between 0 and 6, it decodes it at face value; when it sees a 7, it keeps accumulating values until a value between 0 and 6 is received.
This method of representation followed by entropy coding has been shown to be optimal for
sequences that follow a geometric distribution.
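As a quick illustrative sketch (the function names are ours, not from the standard), recursive indexing with the alphabet {0, 1, …, 7} can be written as:

def recursive_index(n, top=7):
    # Represent a non-negative integer n using only the symbols 0..top:
    # emit `top` repeatedly, then the remainder, which is between 0 and top-1.
    symbols = []
    while n >= top:
        symbols.append(top)
        n -= top
    symbols.append(n)
    return symbols

def recursive_decode(symbols, top=7):
    # A symbol below `top` terminates the number; the value is the sum.
    return sum(symbols)

print(recursive_index(9))    # [7, 2]
print(recursive_index(17))   # [7, 7, 3]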
In CALIC, the representation alphabet is different for different coding contexts. For each coding context k, we use an alphabet A_k = {0, 1, …, N_k}. Furthermore, if the residual occurs in context k, then the first number transmitted is coded with respect to context k; if further recursion is needed, we use the (k+1)th context.
We can summarize the CALIC algorithm as follows:
1. Find an initial prediction X̂.
2. Compute prediction context.
3. Refine prediction by removing the estimate of the bias in that context.
4. Update bias estimate.
5. Obtain the residual and remap it so the residual values lie between 0 and M−1, where
M is the size of the initial alphabet.
6. Find the coding context k.
7. Code the residual using the coding context.
All these components working together have kept CALIC as the state of the art in lossless
image compression. However, we can get almost as good a performance if we simplify some
of the more involved aspects of CALIC.
3. JPEG-LS
In JPEG-LS, the context of a pixel is determined by first computing three gradients of neighbouring pixel values; each gradient is quantized into one of nine regions defined by thresholds T1, T2, and T3, which are positive coefficients that can be defined by the user. Given nine possible values for each component of the context vector, this results in 9×9×9 = 729 possible
contexts. In order to simplify the coding process, the number of contexts is reduced by
replacing any context vector Q whose first nonzero element is negative by −Q. Whenever this
happens, a variable SIGN is also set to −1; otherwise, it is set to +1. This reduces the number
of contexts to 365. The vector Q is then mapped into a number between 0 and 364.
(The standard does not specify the particular mapping to use.)
The variable SIGN is used in the prediction refinement step. The correction is first multiplied
by SIGN and then added to the initial prediction.
The prediction error r_n is mapped into an interval that is the same size as the range occupied by the original pixel values. The mapping used in JPEG-LS is as follows: r_n → 2r_n for r_n ≥ 0, and r_n → −2r_n − 1 for r_n < 0.
Finally, the prediction errors are encoded using adaptively selected codes based on Golomb
codes, which have also been shown to be optimal for sequences with a geometric distribution.
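To give a feel for these codes, here is a minimal sketch of the power-of-two special case of Golomb codes (Rice codes). This is only the underlying idea, not the exact adaptive procedure used in JPEG-LS:

def rice_encode(n, k):
    # Rice code: the Golomb code with parameter m = 2**k. The quotient
    # n >> k is sent in unary, followed by the low k bits of n in binary.
    q = n >> k
    unary = '1' * q + '0'
    binary = format(n & ((1 << k) - 1), '0{}b'.format(k)) if k else ''
    return unary + binary

print(rice_encode(9, 2))   # quotient 2, remainder 1 -> '11001'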
In Table we compare the performance of the old and new JPEG standards and CALIC. The
results for the new JPEG scheme were obtained using a software implementation courtesy of
HP.
We can see that for most of the images the new JPEG standard performs very close to CALIC
and outperforms the old standard by 6% to 18%. The only case where the performance is not
as good is for the Omaha image. While the performance improvement in these examples may
not be very impressive, we should keep in mind that for the old JPEG we are picking the best
result out of eight. In practice, this would mean trying all eight JPEG predictors and picking
the best. On the other hand, both CALIC and the new JPEG standard are single-pass algorithms.
Furthermore, because of the ability of both CALIC and the new standard to function in multiple
modes, both perform very well on compound documents, which may contain images along
with text.
Gray codes:
The reflected binary code or Gray code is an ordering of the binary numeral system such that two successive values differ in only one bit (binary digit). Gray codes are very useful because, in the normal sequence of binary numbers generated by hardware, an error or ambiguity may occur during the transition from one number to the next when several bits change at once. The Gray code eliminates this problem, since only one bit changes its value during any transition between two numbers.
The Gray code is not weighted, which means it does not depend on the positional value of a digit. It is a cyclic code, which means every transition from one value to the next involves only one bit change.
There is a very simple method to obtain the Gray code from a binary number. For an n-bit binary number the steps are as follows (see the sketch below):
- The most significant bit (MSB) of the Gray code is always equal to the MSB of the given binary code.
- The other bits of the output Gray code are obtained by XORing the binary code bits at the current index and the previous index.
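A minimal sketch of this conversion in Python (the function name is illustrative):

def binary_to_gray(b):
    # b is a binary string, e.g. '1011'. The MSB is copied as-is;
    # every other Gray bit is the XOR of adjacent binary bits.
    gray = b[0]
    for i in range(1, len(b)):
        gray += str(int(b[i - 1]) ^ int(b[i]))
    return gray

# Equivalently, on integers: g = n ^ (n >> 1)
print(binary_to_gray('1011'))   # -> '1110'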
Types of DCT:
1. One-dimensional DCT
2. Two-dimensional DCT
In multimedia compression we use the two-dimensional DCT.
1. One-dimensional DCT:
The DCT in one dimension is given by

G_f = sqrt(2/n) · C_f · Σ_{t=0}^{n−1} p_t cos[ (2t+1)fπ / 2n ], for f = 0, 1, …, n−1,

where C_f = 1/√2 for f = 0 and C_f = 1 for f > 0.

The input is a set of n data values p_t (pixels, audio samples, or other data) and the output is a set of n DCT transform coefficients (or weights) G_f. The first coefficient is called the DC coefficient and the rest are referred to as the AC coefficients (these terms have been inherited from electrical engineering, where they stand for “direct current” and “alternating current”). Notice that the coefficients are real numbers even if the input data consists of integers. Similarly, the coefficients may be positive or negative even if the input data consists of nonnegative numbers only. This computation is straightforward but slow. The decoder inputs the DCT coefficients in sets of n and uses the inverse DCT (IDCT) to reconstruct the original data values (also in groups of n). The IDCT in one dimension is given by

p_t = sqrt(2/n) · Σ_{j=0}^{n−1} C_j G_j cos[ (2t+1)jπ / 2n ], for t = 0, 1, …, n−1.
2. Two-dimensional DCT:
The DCT in one dimension can be used to compress one-dimensional data, such as
audio samples. This chapter, however, discusses image compression which is based on
the two-dimensional correlation of pixels (a pixel tends to resemble all its near
neighbours, not just those in its row). This is why practical image compression methods
use the DCT in two dimensions. This version of the DCT is applied to small parts (data
blocks) of the image. It is computed by applying the DCT in one dimension to each row
of a data block, then to each column of the result. Because of the special way the DCT
in two dimensions is computed, we say that it is separable in the two dimensions.
Because it is applied to blocks of an image, we term it a “blocked transform.” It is defined by

G_ij = sqrt(2/n) · sqrt(2/m) · C_i C_j Σ_{x=0}^{n−1} Σ_{y=0}^{m−1} p_xy cos[ (2y+1)jπ / 2m ] cos[ (2x+1)iπ / 2n ],   (4.15)

for 0 ≤ i ≤ n−1 and 0 ≤ j ≤ m−1, and for C_i and C_j defined by Equation (4.13). The first
coefficient G00 is again termed the “DC coefficient” and the remaining coefficients are
called the “AC coefficients.”
The image is broken up into blocks of n×m pixels pxy (with n = m = 8 typically), and
Equation (4.15) is used to produce a block of n×m DCT coefficients Gij for each block
of pixels. The coefficients are then quantized, which results in lossy but highly efficient
compression. The decoder reconstructs a block of quantized data values by computing the IDCT, whose definition is

p_xy = sqrt(2/n) · sqrt(2/m) Σ_{i=0}^{n−1} Σ_{j=0}^{m−1} C_i C_j G_ij cos[ (2y+1)jπ / 2m ] cos[ (2x+1)iπ / 2n ].
1. The image is divided into k blocks of 8×8 pixels each. The pixels are denoted by pxy.
If the number of image rows (columns) is not divisible by 8, the bottom row (rightmost
column) is duplicated as many times as needed.
2. The DCT in two dimensions [Equation (4.15)] is applied to each block B_i. The result is a block (we’ll call it a vector) W^(i) of 64 transform coefficients w_j^(i) (where j = 0, 1, . . . , 63). The k vectors become the rows of matrix W.
3. The 64 columns of W are denoted by C^(1), C^(2), . . . , C^(64). The k elements of C^(j) are (w_j^(1), w_j^(2), . . . , w_j^(k)). The first coefficient vector C^(1) consists of the k DC coefficients.
4. Each vector C^(j) is quantized separately to produce a vector Q^(j) of quantized coefficients (JPEG does this differently). The elements of Q^(j) are then written on the compressed stream. In practice, variable-size codes are assigned to the elements, and the codes, rather than the elements themselves, are written on the compressed stream. Sometimes, as in the case of JPEG, variable-size codes are assigned to runs of zero coefficients, to achieve better compression.
In practice, the DCT is used for lossy compression. For lossless compression (where
the DCT coefficients are not quantized) the DCT is inefficient but can still be used, at
least theoretically, because (1) most of the coefficients are small numbers and (2) there
often are runs of zero coefficients. However, the small coefficients are real numbers,
not integers, so it is not clear how to write them in full precision on the compressed
stream and still have compression. Other image compression methods are better suited
for lossless image compression.
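To make the separability concrete, here is a small unoptimized Python sketch that computes the two-dimensional DCT exactly as described above, by applying the one-dimensional DCT to each row and then to each column:

import math

def dct_1d(v):
    # One-dimensional DCT as defined above:
    # G_f = sqrt(2/n) * C_f * sum_t p_t * cos((2t+1) f pi / 2n)
    n = len(v)
    out = []
    for f in range(n):
        c = 1 / math.sqrt(2) if f == 0 else 1.0
        s = sum(v[t] * math.cos((2 * t + 1) * f * math.pi / (2 * n))
                for t in range(n))
        out.append(math.sqrt(2 / n) * c * s)
    return out

def dct_2d(block):
    # Separable 2-D DCT: 1-D DCT on each row, then on each column.
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]   # transform columns
    return [list(row) for row in zip(*cols)]           # transpose back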
Limitations of JPEG:
1. JPEG offers excellent quality at high and mid bit rates, but at low bit rates (e.g. below 0.25 bits per pixel) the quality of JPEG is unacceptable.
2. JPEG cannot provide superior performance in both lossless and lossy compression within a single system.
3. The current JPEG standard provides some resynchronization markers, but the quality still degrades when bit errors are encountered.
4. JPEG was optimized for natural images, so its performance on computer-generated images and bi-level text images is poor.
5. Every additional cycle of compressing a JPEG image degrades its quality.
Quantization:
After each 8×8 matrix of DCT coefficients Gij is calculated, it is quantized. This is the step
where the information loss (except for some unavoidable loss because of finite precision
calculations in other steps) occurs. Each number in the DCT coefficients matrix is divided by
the corresponding number from the particular “quantization table” used, and the result is
rounded to the nearest integer. As has already been mentioned, three such tables are needed,
for the three color components. The JPEG standard allows for up to four tables, and the user
can select any of the four for quantizing each color component. The 64 numbers that constitute
each quantization table are all JPEG parameters. In principle, they can all be specified and fine-
tuned by the user for maximum compression. In practice, few users have the patience or expertise to experiment with so many parameters, so JPEG software normally uses the following two approaches:
1. Default quantization tables, included in the standard as the result of many experiments by the JPEG committee (one table for luminance and one for chrominance).
2. A single quality parameter, from which the software scales the default tables up or down.
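Whichever tables are chosen, the arithmetic of the quantization step itself is simple. Below is a minimal sketch for an 8×8 block, assuming a generic quantization table Q (the function names are illustrative):

def quantize(G, Q):
    # Divide each DCT coefficient by the corresponding table entry and
    # round to the nearest integer; this is where the information loss occurs.
    return [[round(G[i][j] / Q[i][j]) for j in range(8)] for i in range(8)]

def dequantize(q, Q):
    # Decoder side: multiply back. The rounding error is not recoverable.
    return [[q[i][j] * Q[i][j] for j in range(8)] for i in range(8)]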
Role of a Predictor:
- It is observed that if the sampling takes place at a rate higher than the Nyquist rate, then there is correlation between successive samples of the signal x(t).
- Hence, if we know the past sample value or the difference, we can predict the range of the next required increment or decrement in x(t) at the predictor output.
- This reduces the difference, or error, between x(t) and its predicted value x̂(t). To encode this small error value the DPCM system therefore requires a smaller number of bits, which ultimately reduces the bit rate. This is the role of the predictor in a DPCM system.
DPCM Transmitter:
- Suppose that a baseband signal x(t) is sampled at a rate fs = 1/Ts to produce the sampled signal {x(nTs)}. This signal acts as the input to the DPCM transmitter.
- Let the sequence of such samples be denoted by {x(nTs)}, where n is an integer.
- Let the predictor produce a predicted version of the sampled input, denoted by x̂(nTs).
- The predictor output is subtracted from the sampled input to obtain a difference signal e(nTs) as follows:
e(nTs) = x(nTs) − x̂(nTs)
- The predicted value x̂(nTs) is produced by a predictor whose input consists of a quantized version of the input signal x(nTs).
- The difference signal e(nTs) is called the prediction error, because it represents the difference between the sample and its predicted value.
- The quantizer output v(nTs) is encoded to obtain the digital pulses, i.e. the DPCM signal.
- Let the input-output characteristic of the quantizer be denoted by a nonlinear function Q(·).
- So, referring to Fig. 2.8.2, we get the quantizer output as
v(nTs) = Q[e(nTs)] = e(nTs) + q(nTs)
where q(nTs) is the quantization error.
- The predictor input u(nTs) is the sum of the predictor output and the quantizer output:
u(nTs) = x̂(nTs) + v(nTs)
- Substituting the expression for v(nTs) we get
u(nTs) = x̂(nTs) + e(nTs) + q(nTs)
- But x̂(nTs) + e(nTs) = x(nTs)
∴ u(nTs) = x(nTs) + q(nTs)
This is nothing but the quantized version of the input x(nTs).
- Thus the quantized signal u(nTs) at the predictor input differs from the original input signal by q(nTs), i.e. the quantization error.
DPCM Receiver:
- The DPCM signal is applied to the decoder to reconstruct the quantized version of the input.
- The decoder output is actually the reconstructed quantized error signal.
- This signal is then added to the predictor output to produce the original signal (to within the quantization error).
- The predictor used at the receiver is the same as that at the transmitter.
Receiver output = e(nTs) + x̂(nTs)
= x(nTs)
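A minimal end-to-end sketch of this transmitter/receiver pair, with a simple first-order predictor substituted for the general predictor in the text (illustrative only):

def dpcm_encode(samples):
    # First-order DPCM sketch: the previous reconstructed sample stands in
    # for the general predictor, and rounding stands in for the quantizer.
    pred, codes = 0, []
    for x in samples:
        e = x - pred          # prediction error e(nTs)
        v = round(e)          # quantized error v(nTs) = e(nTs) + q(nTs)
        codes.append(v)
        pred = pred + v       # u(nTs) = x_hat(nTs) + v(nTs), fed back
    return codes

def dpcm_decode(codes):
    # Receiver: the same predictor, driven by the decoded error signal.
    pred, out = 0, []
    for v in codes:
        pred = pred + v
        out.append(pred)
    return out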
Video compression:
A video is a sequence of images. They are displayed at a constant rate of 24 or 30 images per
second. Therefore we can compress video by compressing images. The two standards used for
image compression are:
1. Joint Photographic Experts Group (JPEG).
2. Moving Picture Experts Group (MPEG).
Out of these, JPEG is used to compress still images, whereas MPEG is used for compressing moving pictures.
Objectives of MPEG:
- As with the JPEG standard, the MPEG standard is intended to be generic, meaning that
it will support the needs of many applications.
- As such, it can be considered as a motion video compression toolkit, from which a user selects the particular features that suit the needs of an application. More specific objectives are:
1. The standard will deliver acceptable video quality at compressed data rates between
1.0 and 1.5 Mbps.
2. It will support either symmetric or asymmetric compress/decompress applications.
3. Random-access playback to any specified degree is possible, provided the compression process takes it into account.
4. Similarly, when the compression process takes it into account, fast-forward, fast-reverse, or normal-reverse playback modes can be made available in addition to normal (forward) playback.
5. Audio/Video synchronization will be maintained.
6. Catastrophic behaviour in the presence of data errors should be avoidable.
7. When it is required, compression-decompression delay can be controlled.
8. Editability should be available when required by the application.
9. There should be sufficient format flexibility to support playing of video in windows.
10. The processing requirements should not preclude the development of low-cost chipsets capable of encoding in real time.
- As you can see, some of these objectives conflict with one another, and they all conflict with the objectives of cost and quality.
- In spite of that, the proposed standard provides for all of these objectives, but of course not all at once.
- A proposed application has to make its own choices about which features of the
standard it requires and accept any trade-off that this may cause.
The MPEG compression standards are among the most popular compression techniques.
Various MPEG standards are as follows:
MPEG 1:
- MPEG-1 was the first MPEG procedure for video compression and was intended only for systems that use progressive scanning; MPEG-1 was not meant for interlaced scanning.
- The compression ratio used in MPEG-1 is about 100:1. That means an original signal of 150 Mbps can be compressed to 1.5 Mbps using MPEG-1.
- The MPEG-1 standard was first published in 1993; it also supported two-channel stereo audio.
- MPEG-1 Audio Layer 3 is also known as MP3. Do not confuse it with MPEG-3.
Limitations of MPEG-1:
1. It is not suitable for surround sound systems; it supports only two-channel stereo.
2. It cannot be applied to interlaced scanning (TV).
3. It cannot be used for HDTV.
4. Its audio compression is limited to only two channels (stereo).
5. It provides very poor compression when used for interlaced video.
6. It is not suitable for higher-resolution videos.
7. MPEG-1 supports only one chroma subsampling format, i.e. 4:2:0.
MPEG-2:
- MPEG-2 evolved out of the shortcomings of MPEG-1.
- MPEG-2 should not be confused with MPEG-1 Audio Layer II (MP2).
- The video part of MPEG-2 is also known as H.262, as defined by the ITU. MPEG-2 is a standard for “the generic coding of moving pictures” and the associated audio information.
Description:
- The key techniques used in MPEG-2 codecs include intraframe Discrete Cosine
Transform (DCT) coding and motion compensated interframe prediction.
- The MPEG-2 standard allows the encoding of video over a wide range of resolutions,
including higher resolutions commonly known as HDTV.
Alipta Anil Pawar
Assistant Professor,
Dept. of Electronics and Telecommunication Engineering,
Dr. Babasaheb Ambedkar Technological University, Lonere, Raigad
- MPEG-2 is a combination of lossy video compression and lossy audio compression
methods, based on motion vector estimation, discrete cosine transform (DCT),
quantization and Huffman encoding.
- Although MPEG-2 is not as efficient as newer standards such as H.264/AVC and H.265/HEVC, it is still widely used in over-the-air transmission of digital TV and in the DVD-Video standard.
- MPEG-2 is widely used as the format of digital television signals that are broadcast over the air, by cable, or by direct broadcast satellite (DBS) systems.
- MPEG-2 is also used to specify the format of movies that are stored on DVDs and
other discs.
- MPEG-2 governs the design of TV stations, TV receivers, DVD players and other related equipment.
- It is the second of many MPEG standards and it is an international standard (ISO/IEC 13818).
- Parts 1 and 2 of this standard were developed in collaboration with ITU-T, where they are known as H.222.0 and H.262 respectively.
- MPEG-2 is the core of most digital TV and DVD formats, yet it does not completely specify them.
- Part 2 of MPEG-2 is its video section. It is very similar to the previous MPEG-1 standard but with additional features; in particular, it also provides support for interlaced video.
- MPEG-2 video is not optimized for low bit rates, especially below 1 Mbps at standard resolutions.
- MPEG-2 is fully backward compatible with the MPEG-1 video format.
- MPEG-2 video is formally known as ISO/IEC 13818-2 and as ITU-T Rec. H.262.
- The enhanced version of MPEG-2 video can be used even for HDTV transmission and
the ATSC digital TV.
MPEG-2 Part 3:
MPEG-2 Part 3 extends the MPEG-1 audio standard to more than two channels (multichannel audio) while remaining backward compatible with MPEG-1 audio.
MPEG-2 Part 7:
MPEG-2 Part 7 specifies Advanced Audio Coding (AAC), a more efficient audio coding format that is not backward compatible with the earlier MPEG audio layers.
I, P and B Pictures:
- In MPEG, every Mth picture in a sequence can be fully compressed using a standard MPEG algorithm; these are the I-pictures.
- Then successive I-pictures are compared and the portions of the image that have moved are identified.
- The image sections which do not move are carried forward in the time domain to the intermediate pictures by the decoder memory.
- Then a subset of intermediate pictures is selected and the prediction and correction of
locations of the image section which have moved, is carried out.
- These predicted and corrected images are the P-pictures.
- The pictures between I and P-pictures are the B-pictures. They incorporate the
stationary image sections uncovered by the moving sections.
- Fig. 3.11.1 shows the relative position of these pictures.
- P- and B-pictures are allowed but not required, and their number can vary.
- It is possible to form a sequence without P- or B-pictures, but a sequence without I-pictures is not possible.
- The first step in MPEG is to identify the macroblocks that have moved between the I-pictures.
- The next step is to form the P-frame between the I-pictures. Each macroblock is placed at its predicted location in the P-frame and is cross-correlated within its neighbourhood to determine the true location of the macroblock in the P-frame.
- The difference between the predicted and true position of the macro block represents
the error in prediction.
- This error is compressed using DCT and used for correcting the P-frame.
- Fig. 3.11.3 shows the macro block shift between I- pictures and an intermediate P-
picture.
Advantages of MPEG-2:
1. It can provide compression of video content which cannot be done with MPEG-1
devices.
2. It provides encoding and decoding of high-quality audio content (enhanced audio coding).
3. It can multiplex a variety of MPEG channels into one single transmission stream.
Disadvantages:
Features of MPEG-2:
Applications of MPEG-2:
1. DVD-video.
2. MPEG-IMX which is a standard definition professional video recording format.
3. High definition video (HDV).
4. XDCAM-a tapeless video recording format.
5. HD-TV.
6. Blu-ray disc.
7. Broadcast TV.
8. Digital cable TV.
9. Satellite TV.