Image Compression

Image compression is the process of reducing the file size of an image while maintaining visual quality, categorized into lossless and lossy types. Lossless compression allows for perfect reconstruction, while lossy compression sacrifices some quality for greater size reduction, with JPEG being a prominent example. The JPEG algorithm involves dividing images into blocks, applying discrete cosine transform (DCT), quantizing coefficients, and using lossless compression techniques to achieve efficient storage.

Uploaded by

mora srinath
Image Compression

CS 663, Ajit Rajwade

1
Image Compression
• Process of converting an image file into
another image file that occupies less storage
space, without sacrificing its visual content as
far as possible.

• Useful for saving storage space and
transmission costs.

2
Types of compression
• Lossless: the compressed image can be
converted back with zero error.

• Lossy: the compressed image cannot be
converted back to the original without error.
The amount of error typically grows as the
storage space shrinks, and the trade-off can be
controlled by the user.

3
Lossless compression - examples
• LZW method (used in the GIF and TIFF formats; ZIP tools
such as Winzip mainly use the related Deflate method)
• Huffman encoding (part of the JPEG algorithm,
although overall JPEG is lossy)
• Run-length encoding (also part of the JPEG
algorithm, although JPEG is lossy overall)

4
Lossy compression
• JPEG
• MPEG (for video)
• MP3 (for audio)
• Machine learning based techniques for
compression of images or video (not covered
in this course).

5
Lossy image compression
• Compression of text files or exe files cannot
afford to be lossy.
• But some portion of image content is often
not very noticeable to the human eye,
especially the higher frequencies. Discarding
this extraneous information leads to
compression without significant loss of visual
appeal.

6
[Figure; one panel label is truncated in the source: DCT coefficients or ...]
Source: Article on compressive sensing by Candes and Wakin, from IEEE Signal Processing
Magazine, 2008

7
JPEG compression method
• JPEG = Joint Photographic Experts Group
• One of the most popular standards for
compression of photographic images – widely
used on the internet.
• Widely used in digital cameras.
• Implemented in all standard image processing
software (MATLAB, OpenCV, etc.)
• Essentially lossy (though there are some
lossless variants)
• Applicable for color as well as grayscale images.
8
JPEG image compression
• User specifies a quality factor (Q) between 0 and
100 (higher Q means better quality).
• JPEG algorithm compresses the image based on the
user-provided Q.
• The higher the Q, the lower the compression (larger
file, but higher image quality). Lower Q gives higher
compression (smaller file, but poorer image quality).
• JPEG can typically achieve a compression rate of 1/10
or 1/15 (compressed size over original size) with
little loss of quality.

9
JPEG image compression
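• How is the loss of quality measured?
• As MSE between original (uncompressed) and
reconstructed images, i.e. for an M x N image I and its reconstruction Î:
$\mathrm{MSE} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( I(x,y) - \hat{I}(x,y) \right)^2$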
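A minimal sketch of the MSE computation, assuming both images are grayscale arrays of the same size stored as nested lists. The function name `mse` and the tiny sample images are illustrative and not from the slides.

```python
def mse(original, reconstructed):
    """Mean squared error between two equally sized grayscale images."""
    rows, cols = len(original), len(original[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            diff = original[r][c] - reconstructed[r][c]
            total += diff * diff
    return total / (rows * cols)


# Made-up 2 x 2 images for illustration.
original = [[10, 20], [30, 40]]
reconstructed = [[12, 20], [30, 36]]
print(mse(original, reconstructed))  # (2^2 + 4^2) / 4 = 5.0
```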

10
[Figure: the same image compressed at different quality factors.]
• Q = 100, compression rate = 1/2.6
• Q = 50, compression rate = 1/15
• Q = 25, compression rate = 1/23
• Q = 10, compression rate = 1/46
• Q = 1, compression rate = 1/144
http://en.wikipedia.org/wiki/JPEG

11
Steps of the JPEG algorithm (encoder):
Overview (approximate)
1. Divide the image into non-overlapping 8 x 8 blocks and
compute the discrete cosine transform (DCT) of each block.
This produces a set of 64 “DCT coefficients” per block.
2. Quantize these DCT coefficients, i.e. divide by some number
and round off to nearest integer (that’s why it is lossy). Many
coefficients now become 0 and need not be stored!
3. Now run a lossless compression algorithm (typically
Huffman encoding) on the entire set of integers.

We will go through each step in detail in the slides that follow.

12
13
STEP 1: Discrete Cosine
Transform (DCT)

14
Discrete Cosine Transform (DCT) in 1D

15
Discrete Cosine Transform (DCT) in 1D

[Figure: DCT matrix with axes labelled by sample index n and frequency index u.]

16
DCT
• Expresses a signal as a linear combination of
cosine bases (as opposed to the complex
exponentials as in the Fourier transform).
• The coefficients of this linear combination are
called DCT coefficients.
• Is real-valued unlike the Fourier transform!
• Discovered by Ahmed, Natarajan and Rao
(1974)

17
• DCT basis matrix is orthonormal. The dot product of any row (or column) with itself
is 1. The dot product of any two different rows (or two different columns) is 0. The
inverse is equal to the transpose.

• Being orthonormal, it preserves the squared norm, i.e. $\|F\|_2^2 = \|f\|_2^2$, where F is
the vector of DCT coefficients of the signal f.

• DCT is NOT the real part of the Fourier!

• DCT basis matrix is NOT symmetric.

• Columns of the DCT matrix are called the DCT basis vectors.
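The properties listed above can be checked numerically. The sketch below assumes the standard orthonormal DCT-II definition, $F(u) = a(u) \sum_{n=0}^{N-1} f(n) \cos\left(\frac{\pi (2n+1) u}{2N}\right)$ with $a(0) = \sqrt{1/N}$ and $a(u) = \sqrt{2/N}$ otherwise. Here each row of the matrix holds one basis vector, i.e. it is the transpose of the column convention used on the slide. The helper names are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row u holds the cosine basis vector of frequency u."""
    matrix = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        matrix.append([scale * math.cos(math.pi * (2 * k + 1) * u / (2 * n))
                       for k in range(n)])
    return matrix


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


D = dct_matrix(8)

# Orthonormality: D times its transpose should be the identity matrix.
gram = [[dot(D[i], D[j]) for j in range(8)] for i in range(8)]
max_deviation = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
                    for i in range(8) for j in range(8))

# Norm preservation on one row of pixel values (sample data).
signal = [149, 74, 92, 74, 74, 74, 149, 162]
coeffs = [dot(row, signal) for row in D]
energy_signal = dot(signal, signal)
energy_coeffs = dot(coeffs, coeffs)

print(max_deviation)                 # tiny, of the order of floating-point round-off
print(energy_signal, energy_coeffs)  # the two energies agree up to round-off
print(D[0][1] == D[1][0])            # False: the matrix is not symmetric
```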
18
Digression: matrix view of a discrete orthonormal
transform (Fourier transform used as example here)

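• Remember (inverse DFT, orthonormal form):
$f(x) = \frac{1}{\sqrt{M}} \sum_{u=0}^{M-1} F(u) e^{j 2\pi u x / M}$, for x = 0, ..., M-1.

• In matrix form, we write: $f = W F$, where $W_{xu} = \frac{1}{\sqrt{M}} e^{j 2\pi u x / M}$.

Fourier matrix: in any row, the value of x is fixed, and the value of
u ranges from 0 to M-1.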

19
20
DCT in 2D
The DCT matrix in this case will have size
MN x MN, and it will be the Kronecker
product of two DCT matrices, one of size
M x M, the other of size N x N. The DCT
matrix for the 2D case is also
orthonormal. It is NOT symmetric and it is
NOT the real part of the 2D DFT.

21
What is a 2D Fourier Matrix?
• It is of the following form:

22
What is a 2D Fourier Matrix?

Consider a matrix A of size N1 x N2 and a matrix B of size M1 x M2. The size of their
Kronecker product is given by N1 M1 x N2 M2. The Kronecker product is
constructed by creating a rectangular grid of size N1 x N2 . In each cell of the grid,
you place B. The copy of B in the cell at grid location (i,j) is multiplied by Aij.
23
This big matrix is nothing but the Kronecker product of two
2 x 2 Fourier matrices.
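The Kronecker product construction described above can be sketched in a few lines. The function name `kron` is illustrative, and the 2 x 2 Fourier matrix is taken in its orthonormal form (an assumption about the normalization used on the slide); for size 2 this matrix happens to be real.

```python
import math


def kron(a, b):
    """Kronecker product: every entry a[i][j] is replaced by the block a[i][j] * b."""
    rows_b, cols_b = len(b), len(b[0])
    result = []
    for row_a in a:
        for i in range(rows_b):
            result.append([a_ij * b[i][j] for a_ij in row_a for j in range(cols_b)])
    return result


s = 1.0 / math.sqrt(2.0)
F2 = [[s, s], [s, -s]]   # 2 x 2 orthonormal Fourier matrix
F4 = kron(F2, F2)        # 4 x 4 matrix acting on vectorized 2 x 2 images

for row in F4:
    print([round(v, 2) for v in row])

# The Kronecker product of orthonormal matrices is again orthonormal.
gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in F4] for r1 in F4]
```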
24
What do the DCT bases look like? (1D case)

25
What do the DCT bases look like? (2D case)

The DCT transforms an 8×8 block of input values to a linear combination
of these 64 patterns. The patterns are referred to as the two-dimensional
DCT basis vectors, and the output values are referred to as transform
coefficients. Here each basis vector is reshaped to form an image.

http://en.wikipedia.org/wiki/JPEG

26
Again: DCT is NOT the real part of the DFT

[Figure: basis images. Panels: Real part of DFT; DCT.]

http://en.wikipedia.org/wiki/JPEG
27
DCT on grayscale image patches
• The DCT coefficients of natural image patches
have an amazing property.
• It is observed that most of the signal energy is
concentrated in only a small number of
coefficients.
• This is good news for compression! Store only a
few coefficients, and throw away the rest.
• The corresponding error will be small, due to
the orthonormal nature of the DCT.
28
IMAGE PATCH
149 74 92 74 74 74 149 162
87 74 117 30 74 105 180 130
30 117 105 43 105 130 149 105
74 162 105 74 105 117 105 105
117 149 74 117 74 105 74 149
149 87 74 87 74 74 117 180
105 74 105 43 61 117 180 149
74 74 105 74 105 130 149 105

DCT COEFFICIENTS
828.3750 -106.7827 126.4183 -8.2540 -57.3750 -0.5311 -2.1682 29.8472
-6.0004 2.5328 8.3779 -7.1377 -17.3419 -6.9695 -11.1366 22.7612
-6.5212 -56.2336 23.5930 16.3746 -5.5436 74.2016 23.1543 65.2328
17.2141 29.9058 91.3782 -19.9119 106.2541 37.4804 15.8409 -25.1828
14.1250 53.2562 -30.5477 -0.8891 30.8750 -23.2787 -9.4005 -41.8019
5.7938 -2.9468 10.0191 2.8929 -16.5056 -2.4595 -5.1284 12.7364
-3.6579 2.3417 -14.8457 -0.7304 34.6327 -10.3257 -7.3430 -5.6082
-1.7071 -9.8264 -6.4722 -1.3611 -10.5811 -4.5081 -0.4332 -20.6615

[Figure: HISTOGRAM OF DCT COEFFICIENTS]
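The energy compaction on this patch can be reproduced with a short sketch. It assumes the orthonormal DCT-II applied separably as D X D^T; the helper names are illustrative, and a naive matrix product stands in for MATLAB's dct2.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row u holds the basis vector of frequency u."""
    return [[(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * k + 1) * u / (2 * n)) for k in range(n)]
            for u in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dct2(block):
    """Separable 2D DCT of a square block: D * block * D^T."""
    d = dct_matrix(len(block))
    d_t = [list(col) for col in zip(*d)]
    return matmul(matmul(d, block), d_t)


patch = [
    [149, 74, 92, 74, 74, 74, 149, 162],
    [87, 74, 117, 30, 74, 105, 180, 130],
    [30, 117, 105, 43, 105, 130, 149, 105],
    [74, 162, 105, 74, 105, 117, 105, 105],
    [117, 149, 74, 117, 74, 105, 74, 149],
    [149, 87, 74, 87, 74, 74, 117, 180],
    [105, 74, 105, 43, 61, 117, 180, 149],
    [74, 74, 105, 74, 105, 130, 149, 105],
]

coeffs = dct2(patch)
energies = sorted((c * c for row in coeffs for c in row), reverse=True)
total_energy = sum(energies)                      # equals the sum of squared pixels
dc_fraction = coeffs[0][0] ** 2 / total_energy
top10_fraction = sum(energies[:10]) / total_energy

print(round(coeffs[0][0], 4))  # 828.375, the DC coefficient listed on the slide
print(round(dc_fraction, 3))   # 0.895: the DC term alone holds most of the energy
```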

29
[Figure, three panels:]
• Original image.
• Image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8
patches with absolute value less than 10, and then computing inverse DCT. Number of DCT
coefficients of non-overlapping 8 x 8 patches with absolute value less than 10 was 34,377
out of a total of 65536 (64 coefficients for each 8 x 8 patch, totally 1024 such patches).
This is more than 50%. Corresponding percentage for DFT was 1%.
• Image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8
patches with absolute value less than 20, and then computing inverse DCT. Number of DCT
coefficients of non-overlapping 8 x 8 patches with absolute value less than 20 was 51,045
out of a total of 65536 (64 coefficients for each 8 x 8 patch, totally 1024 such patches).
This is nearly 78%. Corresponding percentage for DFT was 7%.
30
Why DCT? DFT and DCT comparison
Property                               DFT          DCT
Orthonormal                            Yes          Yes
Real/complex                           Complex      Real
Separable in 2D                        Yes          Yes
Norm-preserving                        Yes          Yes
Inverse exists                         Yes          Yes
Fast implementation                    Yes (fft)    Yes (uses fft)
Energy compaction for natural images   Good/Fair    Much Better

31
DCT has better energy compaction than DFT because…

Recall that the DFT of a sequence is equal to the Discrete Fourier Series (DFS) of a periodic
extension of that sequence. In computing the DFT of a signal of length n, there is the implicit
extension of several copies of the signal placed one after the other (n-point periodicity). The
resultant discontinuities require several frequencies for good representation in the DFS. As
against this, the discontinuities are reduced in a DCT because a reflected copy of the signal is
appended to it (2n-point periodicity) before computing the DFS.
32
DCT has better energy compaction than
DFT because…

33
DCT computational complexity
• Naïve implementation (matrix times vector) is
O(N^2) for a vector of N elements.
• You can speed this up to O(N log N) using the
FFT as shown on next slide.

34
• DCT of f is computed from the DFT of a sequence of double the length of f: a reflected
version of f, appended to f (with some caveats, see next slide).
• Only the first N frequencies are picked.
• In MATLAB, you have the commands called dct and idct (in 1D)
and dct2 and idct2 (in 2D).

supporting code:
https://www.cse.iitb.ac.in/~ajitvr/CS663_Fall2024/Code_Compression/dct_dft_relation.m
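The relation can be sketched as follows. This is not the course's dct_dft_relation.m; it is an independent illustration that assumes the unnormalized DCT-II, $\sum_n f(n) \cos\left(\frac{\pi (2n+1) u}{2N}\right)$, and uses a naive O(N^2) DFT in place of an FFT so that only the standard library is needed. The per-frequency phase factor and the factor 1/2 are the kind of caveat the slide alludes to.

```python
import cmath
import math


def dct_unnormalized(f):
    """Direct O(N^2) evaluation of the DCT-II without the sqrt(1/N), sqrt(2/N) factors."""
    n = len(f)
    return [sum(f[k] * math.cos(math.pi * (2 * k + 1) * u / (2 * n)) for k in range(n))
            for u in range(n)]


def dft(seq):
    """Naive DFT; an FFT would give the same values in O(N log N)."""
    m = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * math.pi * u * k / m) for k in range(m))
            for u in range(m)]


def dct_via_dft(f):
    """DCT-II from the DFT of f followed by its reflection (length 2N)."""
    n = len(f)
    extended = list(f) + list(reversed(f))
    spectrum = dft(extended)
    # Keep only the first N frequencies, undo the half-sample phase shift, halve.
    return [0.5 * (cmath.exp(-1j * math.pi * u / (2 * n)) * spectrum[u]).real
            for u in range(n)]


f = [149, 74, 92, 74, 74, 74, 149, 162]
direct = dct_unnormalized(f)
through_dft = dct_via_dft(f)
print(max(abs(a - b) for a, b in zip(direct, through_dft)))  # round-off level
```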
35
See code in google drive folder
http://en.wikipedia.org/wiki/Discrete_cosine_transform
http://www.ecsutton.ece.ufl.edu/dip/handouts/dct.pdf

You will notice that the constant factors sqrt(1/N) and sqrt(2/N) are
missing in this expression. These factors are essential for the DCT
matrix to be orthonormal, but with them present this relationship
between DCT and DFT does not hold as written.
36
Which is the best orthonormal basis?
• Consider a set of M data-points (e.g. image
patches in a vectorized form) represented as a
linear combination of column vectors of an
ortho-normal basis matrix U:
$x_i = U \alpha_i = \sum_{j} \alpha_{ij} u_j$, 1 ≤ i ≤ M.

• Suppose we reconstruct each patch using only
a subset of some k coefficients as follows:
$\tilde{x}_i = \sum_{j=1}^{k} \alpha_{ij} u_j$

37
Which is the best orthonormal basis?
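• For which orthonormal basis U (with columns u_j) is the following
error the lowest:
$E(U) = \sum_{i=1}^{M} \left\| x_i - \sum_{j=1}^{k} (u_j^T x_i) u_j \right\|_2^2$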

38
Which is the best orthonormal basis?
• The answer is the PCA basis, i.e. the set of k
eigenvectors of the correlation matrix C,
corresponding to the k largest eigen-values.
Here C is defined as:
$C = \frac{1}{M} \sum_{i=1}^{M} x_i x_i^T$

39
PCA: separable 2D version
• Find the correlation matrix CR of row vectors from
the patches.
• Find the correlation matrix CC of column vectors
from the patches.
• The final PCA basis is the Kronecker product of the
individual bases: $V = V_R \otimes V_C$, where VR and VC hold the
eigenvectors of CR and CC respectively.

40
But PCA is not used in JPEG, because…
• It is image-dependent, and the basis matrix
would need to be computed afresh for each
image.
• The basis matrix would need to be stored for
each image.
• It is expensive to compute: O(n^3) for a vector
with n elements.
The DCT is used instead!

41
DCT and PCA
• DCT can be computed very fast using fft.
• It is universal – no need to store the DCT bases
explicitly.
• DCT has very good energy compaction
properties, only slightly worse than PCA.

42
Code:
https://www.cse.iitb.ac.in/~ajitvr/CS663_Fall2024/Code_Compression/dct_pca.m

Experiment
• Suppose you extract M ~ 100,000 small-sized (8 x 8) patches from a set of images.
• Compute the column-column and row-row correlation matrices.
• Compute their eigenvectors VR and VC.
• The eigenvectors will be very similar to the columns of the 1D-DCT matrix! (as
evidenced by dot product values).
• Now compute the Kronecker product of VR and VC and call it V. Reshape each
column of V to form an image. These images will appear very similar to the DCT
bases.

43
DCT matrix: dctmtx command from MATLAB (see code on website)
0.3536 0.4904 0.4619 0.4157 0.3536 0.2778 0.1913 0.0975
0.3536 0.4157 0.1913 -0.0975 -0.3536 -0.4904 -0.4619 -0.2778
0.3536 0.2778 -0.1913 -0.4904 -0.3536 0.0975 0.4619 0.4157
0.3536 0.0975 -0.4619 -0.2778 0.3536 0.4157 -0.1913 -0.4904
0.3536 -0.0975 -0.4619 0.2778 0.3536 -0.4157 -0.1913 0.4904
0.3536 -0.2778 -0.1913 0.4904 -0.3536 -0.0975 0.4619 -0.4157
0.3536 -0.4157 0.1913 0.0975 -0.3536 0.4904 -0.4619 0.2778
0.3536 -0.4904 0.4619 -0.4157 0.3536 -0.2778 0.1913 -0.0975

VC: Eigenvectors of column-column correlation matrix
0.3517 -0.4493 -0.4278 0.4230 0.3754 0.3247 -0.2250 -0.1245
0.3534 -0.4366 -0.2276 -0.0110 -0.3078 -0.4746 0.4732 0.2975
0.3543 -0.3101 0.1728 -0.4830 -0.3989 0.0498 -0.4299 -0.4109
0.3546 -0.1115 0.4799 -0.3005 0.3342 0.4102 0.1856 0.4761
0.3547 0.1141 0.4823 0.2944 0.3301 -0.4182 0.1745 -0.4771
0.3543 0.3104 0.1771 0.4834 -0.3977 -0.0322 -0.4308 0.4103
0.3535 0.4357 -0.2319 0.0143 -0.3009 0.4656 0.4851 -0.2975
0.3520 0.4468 -0.4328 -0.4204 0.3686 -0.3253 -0.2342 0.1261

VR: Eigenvectors of row-row correlation matrix
0.3520 -0.4461 -0.4305 0.4224 0.3696 0.3247 0.2342 0.1283
0.3537 -0.4338 -0.2345 -0.0114 -0.3000 -0.4671 -0.4814 -0.3028
0.3545 -0.3086 0.1662 -0.4896 -0.4007 0.0359 0.4261 0.4102
0.3548 -0.1145 0.4763 -0.3031 0.3339 0.4198 -0.1800 -0.4713
0.3548 0.1056 0.4839 0.2926 0.3349 -0.4194 -0.1766 0.4733
0.3543 0.3043 0.1863 0.4833 -0.4028 -0.0354 0.4269 -0.4097
0.3532 0.4389 -0.2269 0.0180 -0.3008 0.4654 -0.4811 0.3037
0.3512 0.4562 -0.4300 -0.4126 0.3694 -0.3242 0.2335 -0.1319

Absolute value of dot products between the columns of DCT matrix and columns of VR:
1.0000 0.0007 0.0032 0.0002 0.0013 0.0001 0.0005 0.0000
0.0007 0.9970 0.0097 0.0689 0.0009 0.0322 0.0003 0.0110
0.0033 0.0106 0.9968 0.0118 0.0713 0.0004 0.0334 0.0025
0.0002 0.0718 0.0124 0.9926 0.0007 0.0927 0.0017 0.0276
0.0010 0.0001 0.0737 0.0004 0.9942 0.0008 0.0780 0.0010
0.0000 0.0261 0.0015 0.0962 0.0005 0.9934 0.0011 0.0569
0.0003 0.0007 0.0276 0.0021 0.0802 0.0010 0.9964 0.0013
0.0000 0.0076 0.0026 0.0227 0.0012 0.0596 0.0015 0.9979

Absolute value of dot products between the columns of DCT matrix and columns of VC:
1.0000 0.0002 0.0029 0.0001 0.0010 0.0000 0.0004 0.0000
0.0002 0.9965 0.0028 0.0766 0.0005 0.0314 0.0009 0.0107
0.0029 0.0025 0.9969 0.0046 0.0728 0.0017 0.0304 0.0013
0.0001 0.0795 0.0044 0.9923 0.0029 0.0916 0.0015 0.0243
0.0008 0.0003 0.0747 0.0026 0.9948 0.0061 0.0696 0.0004
0.0000 0.0246 0.0021 0.0949 0.0069 0.9940 0.0131 0.0452
0.0003 0.0004 0.0252 0.0003 0.0715 0.0137 0.9970 0.0002
0.0000 0.0076 0.0013 0.0207 0.0001 0.0476 0.0009 0.9986
44
[Figure, two panels: the 64 columns of V, each reshaped to form an 8 x 8 image and
rescaled to fit in the 0-1 range; the DCT bases.]
Notice the similarity between the DCT bases and the columns of V. Again, V is the
Kronecker product of VR and VC.

45
DCT and PCA
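• DCT is very close to PCA when the patches
come from what is called a stationary first
order Markov process, i.e. one whose correlation
matrix has entries $C_{ij} = \sigma^2 \rho^{|i-j|}$, where ρ is the
correlation between neighbouring pixels.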

46
DCT and PCA
• One can show that the eigenvectors of the correlation
matrix of the form seen on the previous slide approach the DCT
basis vectors as ρ approaches 1!

• Natural images approximate this first order Markov model,
and hence DCT is almost as good as PCA for compression of
a large ensemble of image patches.

• DCT has the advantage of being a universal basis and also
the DCT coefficients are more efficiently computable than
PCA coefficients (because DCT computation uses FFT).
47
More results from the previous experiment. See code:
https://www.cse.iitb.ac.in/~ajitvr/CS663_Fall2024/Code_Compression/dct_pca.m

CR/CR(1,1): notice it can be approximated by the form shown two slides before, with ρ ~ 0.99
1.0000 0.9902 0.9795 0.9733 0.9682 0.9639 0.9604 0.9570
0.9902 1.0005 0.9908 0.9795 0.9734 0.9684 0.9643 0.9605
0.9795 0.9908 1.0010 0.9908 0.9796 0.9735 0.9689 0.9646
0.9733 0.9795 0.9908 1.0005 0.9904 0.9793 0.9735 0.9686
0.9682 0.9734 0.9796 0.9904 1.0004 0.9903 0.9794 0.9734
0.9639 0.9684 0.9735 0.9793 0.9903 1.0001 0.9903 0.9793
0.9604 0.9643 0.9689 0.9735 0.9794 0.9903 1.0004 0.9904
0.9570 0.9605 0.9646 0.9686 0.9734 0.9793 0.9904 1.0002

CC/CC(1,1): notice it can be approximated by the form shown two slides before, with ρ ~ 0.9888
1.0000 0.9888 0.9770 0.9704 0.9648 0.9599 0.9554 0.9510
0.9888 1.0004 0.9891 0.9768 0.9703 0.9646 0.9596 0.9548
0.9770 0.9891 1.0004 0.9886 0.9764 0.9698 0.9640 0.9587
0.9704 0.9768 0.9886 0.9994 0.9878 0.9755 0.9687 0.9627
0.9648 0.9703 0.9764 0.9878 0.9986 0.9870 0.9746 0.9676
0.9599 0.9646 0.9698 0.9755 0.9870 0.9978 0.9861 0.9734
0.9554 0.9596 0.9640 0.9687 0.9746 0.9861 0.9967 0.9847
0.9510 0.9548 0.9587 0.9627 0.9676 0.9734 0.9847 0.9951

48
Computation of DCT coefficients in JPEG
• Before computation, the value 128 (midpoint
of the range 0 to 255) is subtracted from every
pixel value.
• This changes the range of intensity values
from 0 to 255, to -128 to 127.
• This also changes the range of DCT coefficient
values from 0 to 2048, to -1024 to +1024.

49
STEP 2: Quantization

50
Quantization
• The DCT coefficients are floating point
numbers and storing them in a file will
produce no compression. So they need to be
quantized.
• The human eye is not sensitive to changes in
the higher frequency content.
• So we can have cruder quantization for the
higher frequency coefficients and a finer one
for the lower frequency coefficients.
51
Quantization
• Quantization is performed by dividing the DCT coefficient
matrix element-wise by a quantization matrix and rounding
off to the nearest integer.
• The quantization matrix on the next slide is for quality factor
Q = 50.
• Matrices for lower Q values are obtained by scaling the Q =
50 matrix with a constant 50/Q – which increases the values
in the quantization matrix.
• Matrices for higher Q values are obtained by scaling the Q =
50 matrix with a constant 50/Q – which decreases the values
in the quantization matrix.
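A minimal sketch of this quantization step. It assumes the Q = 50 matrix is the example luminance table from the JPEG standard (the matrix itself did not survive on the slide), scales it by 50/Q exactly as described above, and clamps entries to at least 1 to avoid division by zero (the clamp is an added assumption). Practical encoders such as libjpeg use a somewhat different scaling rule for Q above 50. The coefficient block is made up.

```python
# Assumed Q = 50 table: the example luminance quantization table of the JPEG standard.
Q50 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quant_matrix(q):
    """Scale the Q = 50 table by 50/q (the rule stated on the slide)."""
    scale = 50.0 / q
    return [[max(1, round(m * scale)) for m in row] for row in Q50]


def quantize(coeffs, m):
    """Element-wise division by the quantization matrix, rounded to integers."""
    return [[round(c / q) for c, q in zip(c_row, m_row)]
            for c_row, m_row in zip(coeffs, m)]


def dequantize(b, m):
    """Decoder side: multiply the stored integers back by the matrix entries."""
    return [[v * q for v, q in zip(b_row, m_row)] for b_row, m_row in zip(b, m)]


# Made-up DCT coefficients whose magnitude decays with frequency.
coeffs = [[(-1) ** (u + v) * 400.0 / (1 + u + v) ** 2 for v in range(8)]
          for u in range(8)]

B = quantize(coeffs, Q50)
restored = dequantize(B, Q50)
zeros = sum(v == 0 for row in B for v in row)
print(zeros, "of 64 quantized coefficients are zero")
```

Note that Python's `round` resolves exact ties to the nearest even integer, which only matters when a coefficient lies exactly halfway between two multiples of the matrix entry.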
52
[Figure: the quantization matrix M for Q = 50 and the quantized coefficient matrix B.]

Quantization matrix for Q = 50: notice the higher values in the matrix for higher
frequency coefficients.

Most of the values in B are 0! They need not be stored! Only the non-zero values in B
will be stored!
53
How was this quantization matrix picked?

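• The quantization error is given by $e_{uv} = F_{uv} - M_{uv} B_{uv}$, where
$B_{uv} = \mathrm{round}(F_{uv} / M_{uv})$. Here F is the matrix of DCT coefficients, M the
quantization matrix and B the matrix of quantized values.
• The maximum possible absolute value of the error is given by
Muv/2.
• Psychophysical studies have been performed to find
threshold values of DCT coefficients, i.e. for each
frequency (u,v), these studies have determined the
smallest DCT coefficient value that yielded a visible
signal. This threshold is called tuv.
• We set Muv = 2tuv, so that the maximum error Muv/2 equals tuv
and the errors remain invisible.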
54
STEP 3: Lossless compression steps:
Huffman encoding and Run length encoding

55
56
Huffman encoding
• Input: a set of non-zero quantized DCT coefficients from all the
different blocks of the image (values lying between -1024 to
+1024).
• Output: a set of encoded coefficients with length (in terms of
number of bits) less than that of the original set.
• Principles behind Huffman encoding:
(1) Encode the more frequently occurring coefficients with
fewer bits. Encode the rarely occurring coefficients with
more bits. This will reduce the average bit-length.
(2) Ensure that the encoding for no coefficient is a strict prefix of
the encoding of any other coefficient (to be explained on
next slide). This is called a “prefix-free code”.
57
Huffman encoding example
• Consider a set of alphabets {a,e,q}. Let the frequency of an
alphabet x be denoted as p(x).
• Assume p(e) > p(a) > p(q) [actually true in the English
language].
• Consider the following code-word assignment: e – 0, a – 1, q –
01 (note: we assigned more bits for q). Now consider the
encoded stream: 001. It can be interpreted as ‘eea’ or ‘eq’.
• The reason for this ambiguity is that the code for ‘e’ is a strict
prefix of the code for ‘q’.
• For unambiguous decoding, we need prefix-free codes.
For example, e – 0, a – 10, q – 11 is a prefix-free
code.
58
Huffman encoding example
• The Huffman encoding algorithm asks the
following question:
Given a set of n alphabets A = {ai} with
corresponding frequencies {p(ai)} (each
frequency lies from 0 to 1), what prefix-free
encoding yields the least average bit length?
That is, which set of code-words {λ(ai)} will
minimize $\sum_{i=1}^{n} p(a_i) \, |\lambda(a_i)|$, where |λ(ai)| is the
length of the code-word λ(ai)?
59
Algorithm
1. Sort alphabets in increasing order of frequency. Create a leaf
node from each alphabet. These leaf nodes will belong to a
binary tree called the Huffman tree.
2. Combine the two lowest frequency nodes s1 and s2 to create a
parent node s12. s1 and s2 will be the left and right child of s12.
The frequency of s12 is given by p(s12) = p(s1) + p(s2).
3. Label the edge from s12 to s1 with a ‘0’ and the edge from s12 to
s2 with a ‘1’.
4. Delete s1 and s2 from the sorted list of alphabets and insert the
node s12, i.e. root node of the tree (s12,s1,s2) in the correct place
depending on the value of p(s12).
5. Repeat steps 2 to 4 until there is only one node in the list. This
will be the root node of the final Huffman tree.
6. Traverse the tree from the root node to each leaf and collect
all the binary symbols along every edge into a string. This string
is the code-word for the alphabet at that leaf.
60
www.cis.upenn.edu/~matuszek/cit594-2002/Slides/huffman.ppt
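The steps above can be sketched with a priority queue in place of the explicitly sorted list. The function names and the tie-breaking counter are illustrative choices; depending on how ties are broken, the individual code-words may differ from a hand-built tree, but the average code length is the same. The frequencies are those of the five-alphabet example that follows.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a prefix-free code from a dict mapping symbol -> frequency."""
    tie = count()  # unique counter so the heap never has to compare tree nodes
    heap = [(p, next(tie), symbol) for symbol, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # lowest frequency node
        p2, _, s2 = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (p1 + p2, next(tie), (s1, s2)))  # parent node
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: (left child, right child)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                              # leaf: a symbol
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    inverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:             # safe because the code is prefix-free
            out.append(inverse[current])
            current = ""
    return "".join(out)


freqs = {"A": 0.4, "B": 0.2, "C": 0.1, "D": 0.1, "R": 0.2}
codes = huffman_codes(freqs)
average_length = sum(freqs[s] * len(codes[s]) for s in freqs)
bits = encode("RABBCDR", codes)
print(codes)
print(round(average_length, 6))  # 2.2
print(decode(bits, codes))       # RABBCDR
```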

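Example
[Figure: construction of the Huffman tree in four steps for five alphabets A, B, C, D, R
with frequencies 0.4, 0.2, 0.1, 0.1, 0.2. The two nodes of frequency 0.1 are merged into a
node of frequency 0.2; further merges create nodes of frequency 0.4 and 0.6, and finally
the root of frequency 1.00.]
Resulting codes: A = 0, B = 100, C = 1010, D = 1011, R = 11.
This is a prefix-free code. No leaf node is on the path to any other node.
61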
[Figure: the final Huffman tree.]
A = 0, B = 100, C = 1010, D = 1011, R = 11
Average code length here
= 1 * p(A) + 3 * p(B) + 4 * p(C) + 4 * p(D) + 2 * p(R)
= 0.4 + 0.6 + 0.4 + 0.4 + 0.4 = 2.2.

Input string: RABBCDR
Encoded bit stream: 11-0-100-100-1010-1011-11
Decoded string: RABBCDR

To perform encoding, we maintain an initially empty encoded bit stream. Read a symbol
from the input, traverse the Huffman tree from root node to the leaf node for that
symbol, collecting all the bit labels on the traversed path, and appending them to the
encoded bit stream. Repeat this for every symbol from the input. Example: for R, we
write 11.

To perform decoding, read the encoded bit stream, and traverse the Huffman tree from
the root node toward a leaf node, following the path as indicated by the bit stream. For
example, if you read in 11, you would travel to the leaf node R. When you reach a leaf
node, append its associated symbol to the decoded output. Go back to the root node
and traverse the tree as per the remaining bits from the encoded bit stream.
62
About the algorithm
• This is a greedy algorithm, which is guaranteed
to produce the prefix-free code with minimal
average length (proof beyond the scope of the
course).
• There could be multiple sets of code words with
the same average bit length. Huffman encoding
produces one of them, depending on the order
in which the nodes were combined, and the
convention for labeling the edges with a 0 or a 1.

63
Huffman Trees and Entropy
• The average code length L as computed by this
algorithm satisfies $H \leq L < H + 1$, where
$H = -\sum_{i} p(a_i) \log_2 p(a_i)$ is the entropy of the random variable
(set of alphabets).
• Entropy is a measure of uncertainty of the random variable, or the measure of
how much a random variable surprises you, or the average number of bits
required to store a random variable.
• Recall: Entropy is minimum in the case where the random variable takes on only
one value. It is the maximum when it can take on any one of some k different
values, each with equal probability.
• Note: Entropy is also the average number of bits required to encode the values of
the random variable. It is not possible to code the intensity values of an image
with fewer bits per pixel than its entropy.
64
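As a quick numerical check of the bound, the sketch below computes the entropy of the five-alphabet example (frequencies 0.4, 0.2, 0.1, 0.1, 0.2) and compares it with the average Huffman code length of 2.2 bits found earlier. The function name `entropy` is illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability values contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


probs = [0.4, 0.2, 0.1, 0.1, 0.2]
H = entropy(probs)
average_huffman_length = 2.2

print(round(H, 4))                                  # 2.1219
print(H <= average_huffman_length < H + 1)          # True
print(entropy([1.0]), entropy([0.25] * 4))          # minimum (zero) and maximum (2 bits) cases
```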
Zig-zag ordering

• The quantized DCT coefficients are arranged
now in a zigzag order as follows. The zig-zag
pattern leaves a bunch of consecutive zeros at
the end.
−26
−3 0
−3 −2 −6
2 −4 1 −3
1 1 5 1 2
−1 1 −1 2 0 0
0 0 0 −1 −1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0
0 0 0 0
0 0 0
0 0
0
65
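A minimal sketch of the zig-zag scan. The 8 x 8 block `B` below is reconstructed from the zig-zag listing above (the slide does not show the block itself), and the function name `zigzag` is illustrative.

```python
def zigzag(block):
    """Read a square block along anti-diagonals, alternating direction."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):                      # s = row + column of the diagonal
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            cells.reverse()                         # even diagonals run bottom-left to top-right
        out.extend(block[i][j] for i, j in cells)
    return out


# Quantized block reconstructed from the listing on the slide.
B = [
    [-26, -3, -6, 2, 2, -1, 0, 0],
    [0, -2, -4, 1, 1, 0, 0, 0],
    [-3, 1, 5, -1, -1, 0, 0, 0],
    [-3, 1, 2, -1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

scan = zigzag(B)
print(scan[:26])   # starts -26, -3, 0, -3, -2, -6, ... as on the slide
print(scan[26:])   # the remaining 38 entries are all zero
```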
Run length encoding
• The non-zero re-ordered quantized DCT
coefficients (except for the DC coefficient) are
written down in the following format:
• run-length (number of zeros before this coefficient), 4 bits each
• size (no. of bits to store the Huffman code for the
coefficient),
• actual Huffman code of the coefficient

We refer to the above set as a triple. In case there are more than 15 zeros in between
2 non-zero AC coefficients, a special triple is inserted. That triple is (15,0,0). If there
are a large number of trailing zeros at the end of a block, we put in an “end of block”
triple given as (0,0).
66
−26
−3 0
−3 −2 −6
2 −4 1 −3
1 1 5 1 2
−1 1 −1 2 0 0
0 0 0 −1 −1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0
0 0 0 0
0 0 0
0 0
0

(0, 2)(-3); (1, 2)(-3); (0, 2)(-2); (0, 3)(-6); (0, 2)(2);
(0, 3)(-4); (0, 1)(1); (0, 2)(-3); (0, 1)(1); (0, 1)(1);
(0, 3)(5); (0, 1)(1); (0, 2)(2); (0, 1)(-1); (0, 1)(1);
(0, 1)(-1); (0, 2)(2); (5, 1)(-1); (0, 1)(-1); (0, 0).
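The (run-length, size)(value) pairs can be generated mechanically from the zig-zag sequence. The sketch below assumes that "size" is the number of bits needed for the magnitude of the coefficient, and it keeps the raw value in place of its Huffman code; the function name is illustrative. Runs of 16 zeros are emitted as the special triple (15, 0, 0) mentioned earlier.

```python
def run_length_encode(ac):
    """Encode AC coefficients (zig-zag order, DC excluded) as (run, size, value) triples."""
    out = []
    run = 0
    last_nonzero = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    for v in ac[:last_nonzero + 1]:
        if v == 0:
            run += 1
            if run == 16:               # 16 zeros in a row: emit the special triple
                out.append((15, 0, 0))
                run = 0
        else:
            out.append((run, abs(v).bit_length(), v))
            run = 0
    if last_nonzero < len(ac) - 1:
        out.append((0, 0))              # end of block: only zeros remain
    return out


# Zig-zag sequence from the slide: 26 leading entries followed by 38 zeros.
zigzag_coeffs = [-26, -3, 0, -3, -2, -6, 2, -4, 1, -3, 1, 1, 5, 1, 2,
                 -1, 1, -1, 2, 0, 0, 0, 0, 0, -1, -1] + [0] * 38

triples = run_length_encode(zigzag_coeffs[1:])   # skip the DC coefficient
print(triples)
```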

67
Encoding DC coefficients
• The difference between the DC coefficient of
the current and previous patch is encoded and
stored.
• These difference values are Huffman encoded
using a separate table (different from the
Huffman table used for AC coefficients).
• The DC coefficient of the first patch is stored
explicitly.

68
JPEG encoded file
• Begins with a header that contains
information such as size of file, whether color
or grayscale, the table of different alphabets
(i.e. DCT coefficient values and their Huffman
codes) and the quantization matrix.
• This is followed by a bit stream containing
triples of the form: (run-length, the length of
the Huffman code of the coefficient, and the
Huffman code for the coefficient).
69
JPEG DECODER

70
JPEG decoding
• Perform Huffman decoding and obtain the DCT coefficients (AC).
• Multiply the AC coefficients point-wise with the entries in the
quantization matrix.
• Compute the DC coefficients for each patch using the differences
between the DC coefficients of successive patches. Multiply by
the appropriate entry from the quantization matrix.
• Reconstruct the image patches of size 8 x 8 using the inverse DCT.
Add 128 to the intensity values in the patch.
• Note: During JPEG encoding, the round-off errors from the
quantization step can never be recovered again. Hence JPEG is
overall a lossy algorithm.

71
JPEG for color images

72
JPEG for color images
• The RGB values are converted to the YCbCr color space using:

• Encode the Y, Cb, and Cr channels separately, using the
“grayscale” JPEG algorithm on each channel. The Cb and Cr
channels (the chrominance channels) are down-sampled by a
factor of 2 in X and Y direction to further save storage space.
• The Y channel (luminance) is not down-sampled. This is
because the human eye is much more sensitive to luminance
than to chrominance information.
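A sketch of the color conversion and chrominance down-sampling. The conversion coefficients are those of the JFIF convention for 8-bit values, which is an assumption here since the formula on the slide did not survive; down-sampling by averaging 2 x 2 blocks is likewise one common choice, not necessarily the one used in the course. Function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """RGB to YCbCr for 8-bit values (JFIF convention, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def downsample_2x2(channel):
    """Halve the resolution in X and Y by averaging each 2 x 2 block (even sizes assumed)."""
    return [[(channel[r][c] + channel[r][c + 1]
              + channel[r + 1][c] + channel[r + 1][c + 1]) / 4.0
             for c in range(0, len(channel[0]), 2)]
            for r in range(0, len(channel), 2)]


print(rgb_to_ycbcr(128, 128, 128))      # mid-gray: Y, Cb and Cr are all close to 128
print(downsample_2x2([[1, 3], [5, 7]])) # [[4.0]]
```

For an M x N image this stores M x N values for Y and M/2 x N/2 values each for Cb and Cr, i.e. half as many values in total as the original RGB image.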

73
PCA on RGB values
• Why can you not separately compress the R,G,B
images – instead of converting to Y, Cb, Cr?
• The answer lies in PCA!
• Suppose you take N color images and extract
RGB values of each pixel (3 x 1 vector at each
location).
• Now, suppose you build an eigenspace out of
this. You get 3 eigenvectors, corresponding to
3 different eigenvalues.
74
PCA on RGB values
• The eigenvectors (the columns of the matrix below) will typically look as follows:
0.5952 0.6619 0.4556
0.6037 0.0059 -0.7972
0.5303 -0.7496 0.3961
• Exact numbers are not important, but the first
eigenvector is like an average of RGB. It is
called the Luminance Channel (Y). It is
similar to the intensity in the HSI space.

75
PCA on RGB values
• The second eigenvector is like Y-B, and the third is
like Y-G. These are called the Chrominance
Channels.
• The Y-Cb-Cr color space is related to this PCA-based
space (though there are some details in the relative
weightings of RGB to get Luminance and
Chrominance – denoted by Cb and Cr).
• The values in the three channels Y, Cb and Cr are
decorrelated, similar to the values projected onto
the PCA-based channels.
76
PCA on RGB values
• Why does PCA produce decorrelated values (i.e. why are the
values of the eigencoefficients decorrelated)?

• Recall: in PCA, we built the correlation matrix C from N original
data-points {xi}, 1 ≤ i ≤ N.

• Recall that $C = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T$.

• The eigencoefficients are given as {αi}, 1 ≤ i ≤ N, where xi = Vαi
(i.e. αi = V^T xi) and C = VΛV^T.

• The correlation matrix of the eigencoefficients is given by:
$\frac{1}{N} \sum_{i=1}^{N} \alpha_i \alpha_i^T = V^T \left( \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T \right) V = V^T C V$
77
PCA on RGB values
• The correlation matrix of the eigencoefficients is
given by:
$V^T C V = V^T (V \Lambda V^T) V = \Lambda$, since $V^T V = I$.

• This is a diagonal matrix (of eigenvalues).

• Which means that the different elements of the
vector of eigencoefficients are decorrelated.

78
PCA on RGB values
• Why is it important to have decorrelated values in compression?

• Because if the variables were not decorrelated, any operations you
perform on one variable have an effect on the other variables.

• So if you compressed the R image, G image and B image separately,
a change in the R values (due to quantization) unintentionally
affects the G and B values (and vice versa).

• To prevent this, you convert the color values from RGB to a
decorrelated color space such as YCbCr.

79
PCA on RGB values
• The luminance channel (Y) carries most information from
the point of view of human perception, and the human
eye is less sensitive to changes in chrominance.
• This fact can be used to assign coarser quantization levels
(i.e. fewer bits) for storing or transmitting Cb and Cr
values as compared to the Y channel. This improves the
compression rate.
• The JPEG standard for color image compression uses the
YCbCr format. For an image of size M x N x 3, it stores Y
with full resolution (i.e. as an M x N image), and Cb and Cr
with 25% resolution, i.e. as M/2 x N/2 images.
80
81
82
83
The variances of the three eigen-coefficient values:
8411, 159.1, 71.7

84
85
86
87
RGB and its corresponding Y, Cb, Cr channels

88
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-4.pdf

89
[Figure: Cb channel under different down-sampling factors. Panels:]
• Down-sampling of Cb and Cr in X and Y directions by a factor of 2
• Down-sampling of Cb and Cr only in X direction by a factor of 2
• No down-sampling of chrominance or luminance channels

https://en.wikipedia.org/wiki/Chroma_subsampling#/media/File:Colorcomp.jpg
90
Modes of JPEG compression
• Sequential: encoding and decoding of patches
takes place in left to right, top to bottom
order.
• Progressive: encoding and decoding in
multiple scans, each one with finer
quantization levels.
• Hierarchical: encoding and decoding
performed at different scales.

91
Commonly seen in
web applications (e.g.:
Facebook)

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-4.pdf

92
JPEG artifacts
• Seam (blocking) artifacts at patch boundaries (more
prominent for lower Q values).
• Ringing artifacts around edges.
• Some loss of edge and textural detail.
• Color artifacts.

93
Video Compression

94
Need for video compression
• Huge data – typical HDTV has frames of size
1920 x 1080 and 30 fps frame-rate. At 24 bits per pixel, that is
about 1.5 Gbit per second (roughly 187 MB per second).
• Network channel bandwidths are limited
(around 20 Mbps).
• We need large rates of compression (around
1:80)!
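
The arithmetic behind these figures can be checked with a few lines. The 24 bits per pixel (8 bits for each of three color channels, no chroma down-sampling) is an assumption; the frame size, frame rate and channel bandwidth are the ones quoted above.

```python
width, height, fps = 1920, 1080, 30
bits_per_pixel = 24                      # assumption: 8 bits for each of 3 color channels

raw_bits_per_second = width * height * bits_per_pixel * fps
channel_bits_per_second = 20e6           # the 20 Mbps figure quoted above

required_ratio = raw_bits_per_second / channel_bits_per_second
print(raw_bits_per_second / 1e9)         # raw rate in Gbit/s
print(raw_bits_per_second / 8 / 1e6)     # raw rate in MB/s
print(required_ratio)                    # compression ratio needed to fit the channel
```

The required ratio comes out a little under 75, which is the order of magnitude of the 1:80 target.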

95
Motion JPEG
• Encode each frame using JPEG.
• That will yield only around 1:10 compression.
• This makes use of only spatial redundancy, no
temporal redundancy.

96
MPEG (Moving Picture Experts Group)
• Heavy use of temporal redundancy!
• Uses a process called predictive coding.
• Consider pixel f(x,y,t) in frame t. We try to
predict its value, denoted as g(x,y,t), using a
linear combination of the values from the
previous k frames, i.e. f(x,y,t-k),…,f(x,y,t-1).
• We simply store the error e(x,y,t) = f(x,y,t)-
g(x,y,t).

97
Differential coding (First order predictive
coding)
• In this method, k = 1, i.e. you encode the
differences between consecutive frames, i.e.
e(x,y,t) = f(x,y,t) – f(x,y,t-1).
• For most frames, the errors are highly sparse.
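
A small sketch of first order predictive coding on a synthetic two-frame "video" (a static background with one moving square). The frames are made up for illustration; the point is only that the error image is mostly zero and that the decoder can recover the current frame exactly from the previous frame plus the error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic frames: identical textured background, one bright square that moves.
background = rng.integers(0, 256, size=(64, 64)).astype(np.int16)
prev_frame = background.copy()
curr_frame = background.copy()
prev_frame[10:18, 10:18] = 255
curr_frame[10:18, 12:20] = 255           # the square moved 2 pixels to the right

error = curr_frame - prev_frame          # e(x,y,t) = f(x,y,t) - f(x,y,t-1)
nonzero_fraction = np.count_nonzero(error) / error.size
print(nonzero_fraction)                  # only pixels near the square's edges are non-zero

# Decoder side: previous frame + stored error gives back the current frame.
reconstructed = prev_frame + error
```

A signed type wider than 8 bits is used because the differences of 8-bit values range from -255 to 255.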

98
Errors are not always sparse ☹
• Exceptions: (1) camera zoom-in and zoom-out,
(2) sudden changes in viewpoint or scene
content, or fade-in/fade-out effects. In such
cases, the errors will be large!

100
Motion compensation
• For each macro-block (typically 16 x 16 or 8 x 8 in size) in
the frame to be encoded, find the most similar macro-
block in a reference frame (which could be the previous
frame, but not necessarily so).
• The difference between the pixel locations of the top-left
corner of the two macro-blocks is called the motion
vector.
• The motion vector is considered to be constant within a
macro-block.
• Search for similar macro-blocks is restricted to a small
search window around the original macro-block.
• The search window is rectangular – wider than it is tall
(why?).
101
Motion Compensation
• For many macro-blocks, the motion vectors will be 0.
• The search for similar macro-blocks is performed at a
sub-pixel level (1/2 pixel or ¼ pixel) for more
accuracy. In this case, image intensity values need to
be interpolated.
• Macro-block similarity measure is one of the
following:
Note: in color videos, only the luminance (i.e. Y) channel is used
in the similarity measure.

103
Motion compensation
• Often there is more than one similar block. In such
cases, the spatially closest block is chosen.
• Note: we are not concerned with the accuracy of the motion
estimate. It is only a means towards the larger goal of
compression.
• The inter-frame differences are computed after motion
compensation. This hugely improves the sparsity of these
differences.
• In other words, don’t blindly compute the difference between
macro-blocks at corresponding locations in frames t and t-1.
Compute the difference between the current macro-block in
frame t and its most similar match in frame t-1.
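
The idea can be sketched with an exhaustive block search. Several things here are assumptions made for illustration: the similarity measure is the sum of absolute differences (SAD), the search is over integer offsets only (no sub-pixel interpolation), the search window is square, and the motion vector is defined as the matched block's position in the reference frame minus the block's position in the current frame. The function names are hypothetical.

```python
import numpy as np

def best_match(block, reference, top, left, search=4):
    """Full search over integer offsets; returns the (dy, dx) with minimum SAD."""
    size = block.shape[0]
    best_key, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > reference.shape[0] or c + size > reference.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = int(np.abs(block - reference[r:r + size, c:c + size]).sum())
            key = (sad, dy * dy + dx * dx)  # ties are broken by spatial closeness
            if best_key is None or key < best_key:
                best_key, best_mv = key, (dy, dx)
    return best_mv

def motion_compensated_residual(current, reference, size=8, search=4):
    """Residual after subtracting, for every block, its best match in the reference."""
    residual = np.zeros_like(current)
    vectors = {}
    for top in range(0, current.shape[0], size):
        for left in range(0, current.shape[1], size):
            block = current[top:top + size, left:left + size]
            dy, dx = best_match(block, reference, top, left, search)
            match = reference[top + dy:top + dy + size, left + dx:left + dx + size]
            residual[top:top + size, left:left + size] = block - match
            vectors[(top, left)] = (dy, dx)
    return residual, vectors

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(32, 32)).astype(np.int32)
current = np.roll(reference, shift=(1, 2), axis=(0, 1))  # scene moves down 1, right 2

plain = current - reference                              # difference at the same locations
residual, vectors = motion_compensated_residual(current, reference)
print(np.abs(plain).sum(), np.abs(residual).sum())
print(vectors[(8, 8)])
```

In this synthetic example every block away from the top and left borders finds an exact match, so its residual is all zeros, whereas the plain frame difference is large everywhere.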
104
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-5.pdf

107
Motion Compensation
• For each macro-block (typically 16 x 16 or 8 x 8 in size) in
the frame to be encoded, find the most similar macro-block
in a reference frame.
• If the reference frame is the previous one or the previous
reference frame (see two slides later for more details), the
current frame (the one being encoded) is called a P-frame
(predicted frame).
• If the reference frame is a combination of the previous
frame and the next one, the current frame is called a B-frame
(bidirectionally predicted frame).
• A B-frame can use two motion vectors – one for the previous
and one for the next frame.

108
Example: why do we need B-frames?

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-5.pdf

109
No Motion Compensation here!
• For some frames that rest on shot-boundaries
(i.e. sudden changes in content), there is no
advantage to performing motion compensation.
• Such frames are called I-frames (independent
frames). These usually act as reference frames
for other frames to be differentially encoded.
• I-frames can be detected by the frequent occurrence of
very low similarity values during the search for
matching macro-blocks.
110
MPEG encoder

Note:
• YUV is a color-space very similar to YCbCr.
• MV = motion vector

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-5.pdf
111
MPEG encoder
• The I-frames are encoded using JPEG.
• The B-frames and P-frames are also encoded using JPEG, but the
DCT is computed on macro-blocks from the motion-
compensated residual image as follows:
o We have already computed motion vectors w.r.t. the reference
frame.
o Compute a residual image by calculating differences between
macro-blocks in the current frame and the most-similar
matching macro-blocks (as given by the motion vector) from the
reference image. This is called motion-compensated frame
differencing.
o Note: the motion vector for the macro-block also needs to be
stored. For this, motion-vectors from several macro-blocks are
collected together and Huffman-encoded. Only the non-zero
motion vectors are encoded.
112
Display Order and Transmission Order (or order in which frames are
compressed) may be different!
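
Why the two orders differ can be sketched as follows: a B-frame needs both the earlier and the later reference frame, so the later reference has to be sent before it. The function below is a hypothetical illustration under the simplifying assumption that every B-frame references the nearest I- or P-frame on each side; it is not taken from any MPEG specification.

```python
def transmission_order(display_types):
    """Reorder frames so each B-frame follows both reference frames it depends on.

    display_types: string such as "IBBPBBP", one letter per frame in display order.
    Returns the display indices in the order the frames would be sent.
    """
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)      # must wait for the next I/P frame
        else:
            order.append(index)          # send the reference frame first...
            order.extend(pending_b)      # ...then the B-frames that needed it
            pending_b = []
    order.extend(pending_b)              # trailing B-frames with no later reference
    return order

print(transmission_order("IBBPBBP"))     # [0, 3, 1, 2, 6, 4, 5]
```

For the display sequence I B B P B B P, the P-frame at position 3 is sent before the two B-frames at positions 1 and 2, and likewise for the second group.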

http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-15-5.pdf

113
MPEG decoder

Note:
• YUV is a color-space very similar to YCbCr.

114
