Image Compression
Peak Signal to Noise Ratio: $\mathrm{PSNR} = 10 \log_{10} \dfrac{255^2}{\mathrm{MSE}}$
[Figure: the same image compressed with JPEG at different quality factors — Q = 100 (compression rate ≈ 1/2.6), Q = 50 (≈ 1/15), Q = 25 (≈ 1/23), Q = 10 (≈ 1/46), Q = 1 (≈ 1/144). Source: http://en.wikipedia.org/wiki/JPEG]
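A minimal MATLAB sketch of how such numbers can be measured, assuming an 8-bit grayscale image stored in a hypothetical file in.png (the Quality parameter of imwrite sets the JPEG quality factor; imread/imwrite are standard MATLAB functions):

I = imread('in.png');                          % hypothetical input file
imwrite(I, 'out.jpg', 'Quality', 25);          % JPEG-compress at quality factor Q = 25
J = imread('out.jpg');
mse = mean((double(I(:)) - double(J(:))).^2);
psnrValue = 10 * log10(255^2 / mse)            % PSNR in dB, as defined above
info = dir('out.jpg');
compressionRate = info.bytes / numel(I)        % compressed size relative to the raw 8-bit size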
Steps of the JPEG algorithm: Overview (approximate)
1. Divide the image into non-overlapping 8 x 8 blocks and
compute the discrete cosine transform (DCT) of each
block. This produces a set of 64 “DCT coefficients” per
block.
2. Quantize these DCT coefficients, i.e. divide each one by some number and round off to the nearest integer (that’s why it is lossy). Many coefficients now become 0 and need not be stored! (A sketch of steps 1 and 2 appears after this list.)
3. Now run a lossless compression algorithm (typically
Huffman encoding) on the entire set of integers.
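A rough MATLAB sketch of steps 1 and 2 (not the full JPEG codec), assuming a grayscale image img with values 0–255 whose dimensions are multiples of 8; a single quantization step q is used here for illustration, whereas JPEG uses an 8 x 8 table of step sizes (dct2 and blockproc are from the Image Processing Toolbox):

q = 20;                                                 % illustrative quantization step
quantizeBlock = @(bs) round(dct2(bs.data - 128) / q);   % DCT of an 8x8 block, then quantize
coeffs = blockproc(img, [8 8], quantizeBlock);          % apply to every non-overlapping block
nnz(coeffs) / numel(coeffs)                             % fraction of coefficients that remain nonzero
% Decoding reverses the steps: multiply by q, take idct2 of each block, and add back 128.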
Discrete Cosine Transform (DCT) in 1D

DCT: $a_N^{u,n} = \dfrac{1}{\sqrt{N}}$ for $u = 0$; $\quad a_N^{u,n} = \sqrt{\dfrac{2}{N}}\cos\dfrac{\pi(2n+1)u}{2N}$ for $u = 1, \ldots, N-1$; $\quad \tilde{a}_N^{u,n} = a_N^{u,n}$

DFT: $a_N^{u,n} = \dfrac{1}{\sqrt{N}}\, e^{-j\frac{2\pi un}{N}}$; $\quad \tilde{a}_N^{u,n} = \left(a_N^{u,n}\right)^{*}$ (complex conjugate)
$$F(u) = \sum_{n=0}^{N-1} a_N^{u,n}\, f(n), \qquad f(n) = \sum_{u=0}^{N-1} \tilde{a}_N^{u,n}\, F(u)$$

In matrix form, $\mathbf{F} = A\,\mathbf{f}$ and $\mathbf{f} = \tilde{A}\,\mathbf{F}$, where $A \in \mathbb{R}^{N \times N}$ is the DCT basis matrix with entries $A_{u,n} = a_N^{u,n}$, and $A A^T = I$, $\tilde{A}\tilde{A}^T = I$.

DCT: $\tilde{A} = A^T$
DFT: $\tilde{A} = A^{*T}$ (conjugate transpose)
DCT
• Expresses a signal as a linear combination of cosine bases (as opposed to the complex exponentials used in the Fourier transform). The coefficients of this linear combination are called DCT coefficients.
• Is real-valued, unlike the Fourier transform!
• Discovered by Ahmed, Natarajan and Rao (1974).
$$f(n) = \sum_{u=0}^{N-1} \tilde{a}_N^{u,n}\, F(u), \qquad \mathbf{f} = A^T \mathbf{F}, \qquad A \in \mathbb{R}^{N \times N}\ (\text{DCT basis matrix}), \qquad A A^T = I$$
• DCT basis matrix is orthonormal. The dot product of any row (or column) with itself
is 1. The dot product of any two different rows (or two different columns) is 0. The
inverse is equal to the transpose.
• Columns of the DCT matrix are called the DCT basis vectors.
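A small MATLAB check of these properties (dctmtx is from the Image Processing Toolbox; the vector f is arbitrary test data):

N = 8;
A = dctmtx(N);                   % N x N DCT matrix; its rows are the 1D DCT basis vectors
max(max(abs(A*A' - eye(N))))     % ~ 0: the matrix is orthonormal
f = rand(N, 1);
F = A*f;                         % forward 1D DCT, same result as dct(f)
max(abs(A'*F - f))               % ~ 0: the inverse is just the transpose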
DCT in 2D

$$F(u,v) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} f(n,m)\, a_{NM}^{u,n,v,m}, \qquad f(n,m) = \sum_{u=0}^{N-1} \sum_{v=0}^{M-1} F(u,v)\, \tilde{a}_{NM}^{u,n,v,m}$$

$$a_{NM}^{u,n,v,m} = \alpha(u)\,\alpha(v)\,\cos\frac{\pi(2n+1)u}{2N}\,\cos\frac{\pi(2m+1)v}{2M}, \qquad u = 0 \ldots N-1,\ v = 0 \ldots M-1$$

$$\alpha(u) = \sqrt{1/N}\ (u = 0),\ \text{else}\ \alpha(u) = \sqrt{2/N}; \qquad \alpha(v) = \sqrt{1/M}\ (v = 0),\ \text{else}\ \alpha(v) = \sqrt{2/M}; \qquad \tilde{a}_{NM}^{u,n,v,m} = a_{NM}^{u,n,v,m}$$

The DCT matrix in this case will have size MN x MN, and it will be the Kronecker product of two DCT matrices – one of size M x M, the other of size N x N. The DCT matrix for the 2D case is also orthonormal; it is NOT symmetric and it is NOT the real part of the 2D DFT.
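A MATLAB sketch of the separable / Kronecker-product view for an 8 x 8 block (X here is arbitrary test data; dct2 and dctmtx are from the Image Processing Toolbox):

X = magic(8);                               % any 8x8 block of test data
D = dctmtx(8);                              % 1D DCT matrix
F1 = dct2(X);                               % built-in 2D DCT
F2 = D * X * D';                            % separable form: 1D DCTs along columns and rows
F3 = reshape(kron(D, D) * X(:), 8, 8);      % Kronecker form acting on the vectorized block
max(abs(F1(:) - F2(:))), max(abs(F1(:) - F3(:)))   % both differences are ~ 0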
What do the DCT bases look like? (1D case)
What do the DCT bases look like? (2D case)
The DCT transforms an 8×8 block of
input values to a linear
combination of these 64 patterns. The
patterns are referred to as the two-
dimensional DCT basis functions, and
the output values are referred to as
transform coefficients.
http://en.wikipedia.org/wiki/JPEG
Again: DCT is NOT the real part of the DFT
http://en.wikipedia.org/wiki/JPEG
DCT on grayscale image patches
• The DCT coefficients of image patches have an
amazing property. Most of the signal energy is
concentrated in only a small number of
coefficients.
• This is good news for compression! Store only
a few coefficients, and throw away the rest.
IMAGE PATCH (8 x 8):

149  74  92  74  74  74 149 162
 87  74 117  30  74 105 180 130
 30 117 105  43 105 130 149 105
 74 162 105  74 105 117 105 105
117 149  74 117  74 105  74 149
149  87  74  87  74  74 117 180
105  74 105  43  61 117 180 149
 74  74 105  74 105 130 149 105

[Figure: histogram of the DCT coefficients of this patch — most coefficient values are close to zero.]
[Figures: original image; image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 10, and then computing the inverse DCT; the same with threshold 20.]
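A minimal MATLAB sketch of this reconstruction experiment, assuming img is a grayscale image (double, graylevels 0–255) whose dimensions are multiples of 8, and a threshold T of 10 or 20 (dct2, idct2 and blockproc are from the Image Processing Toolbox):

T = 10;
thresholdBlock = @(bs) idct2(dct2(bs.data) .* (abs(dct2(bs.data)) >= T));  % zero out small DCT coefficients
rec = blockproc(img, [8 8], thresholdBlock);                               % reconstruct block by block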
DCT has better energy compaction than DFT because…

In computing the DFT of a signal, there is the implicit extension of several copies of the signal placed one after the other (N-point periodicity). The resultant discontinuities require several frequencies for good representation. In contrast, the discontinuities are reduced in a DCT because a reflected copy of the signal is appended to it (2N-point periodicity).
DCT computational complexity

• Naïve implementation (matrix times vector) is O(N²) for a vector of N elements.
• You can speed this up to O(N log N) using the FFT, as shown below.
Define the mirror-extended signal:

$$\tilde{f}(n) = f(n),\ 0 \le n \le N-1; \qquad \tilde{f}(n) = f(2N-n-1),\ N \le n \le 2N-1$$

In MATLAB, you have the commands called dct and idct (in 1D) and dct2 and idct2 (in 2D).
$$\mathrm{DFT}(\tilde{f})(u) = \sum_{n=0}^{N-1} f(n)\,e^{-j2\pi un/2N} + \sum_{n=N}^{2N-1} f(2N-n-1)\,e^{-j2\pi un/2N}$$

$$= \sum_{n=0}^{N-1} f(n)\,e^{-j2\pi un/2N} + \sum_{n=0}^{N-1} f(n)\,e^{-j2\pi u(2N-n-1)/2N} \qquad \text{(replace } n \text{ by } 2N-n-1\text{)}$$

$$= \sum_{n=0}^{N-1} f(n)\left[e^{-j2\pi un/2N} + e^{-j2\pi u(2N-n-1)/2N}\right]$$

$$= e^{j2\pi u/4N} \sum_{n=0}^{N-1} f(n)\left[e^{-j2\pi un/2N}\,e^{-j2\pi u/4N} + e^{-j2\pi u(2N)/2N}\,e^{j2\pi un/2N}\,e^{j2\pi u/2N}\,e^{-j2\pi u/4N}\right]$$

$$= e^{j\pi u/2N} \sum_{n=0}^{N-1} 2 f(n)\cos\frac{\pi(2n+1)u}{2N} \qquad \left(\text{since } e^{-j2\pi u(2N)/2N} = 1\right)$$

So the DCT coefficients of $f$ can be read off (up to the scale factors $\alpha(u)$) from the $2N$-point FFT of $\tilde{f}$.

http://en.wikipedia.org/wiki/Discrete_cosine_transform
N = 17;
f = rand(1,N);
ff = [f f(end:-1:1)];                                      % mirror-extended signal of length 2N
F = fft(ff);                                               % 2N-point FFT
alpha = [sqrt(1/N) sqrt(2/N)*ones(1,N-1)];                 % orthonormal DCT scale factors
C = 0.5*alpha.*real(exp(-1i*pi*(0:N-1)/(2*N)).*F(1:N));    % matches dct(f)
Which is the best orthonormal basis?

• Suppose each patch $q_i$ is represented in some orthonormal basis and all coefficients are discarded except k coefficients (the same k coefficients are retained for all patches), giving the approximation $\tilde{q}_i^{(k)}$. For which orthonormal basis is the following error the lowest:

$$\sum_{i=1}^{M} \left\| \tilde{q}_i^{(k)} - q_i \right\|^2$$
Which is the best orthonormal basis?

• The answer is the PCA basis, i.e. the set of k eigenvectors of the correlation matrix C corresponding to the k largest eigenvalues. Here C is defined as:

$$C = \frac{1}{M-1} \sum_{i=1}^{M} q_i q_i^T, \qquad C_{kl} = \frac{1}{M-1} \sum_{i=1}^{M} q_{ik}\, q_{il}$$
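A MATLAB sketch of this criterion, assuming Q is a d x M matrix whose columns are the (hypothetical) data vectors q_i; keeping the top k PCA coefficients minimizes the error above:

M = size(Q, 2);
C = (Q * Q') / (M - 1);                       % correlation matrix, 1/(M-1) * sum_i q_i q_i'
[V, D] = eig(C);
[~, idx] = sort(diag(D), 'descend');
V = V(:, idx);                                % PCA basis, ordered by decreasing eigenvalue
k = 8;
coeff = V(:, 1:k)' * Q;                       % keep the same k coefficients for every vector
Qk = V(:, 1:k) * coeff;                       % reconstructions q_i^(k)
err = sum(sum((Qk - Q).^2))                   % sum_i || q_i^(k) - q_i ||^2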
PCA: separable 2D version
• Find the correlation matrix CR of row vectors from
the patches.
• Find the correlation matrix CC of column vectors
from the patches.
• The final PCA basis is the Kronecker product of the
individual bases:
$$C_R = \frac{1}{M-1} \sum_{i=1}^{M} \sum_{j=1}^{n} q_i(j,:)'\, q_i(j,:), \qquad [V_R, D_R] = \mathrm{eig}(C_R), \qquad q_i(j,:) \in \mathbb{R}^{1 \times n} \text{ is the } j\text{-th row vector of } q_i$$

$$C_C = \frac{1}{M-1} \sum_{i=1}^{M} \sum_{j=1}^{n} q_i(:,j)\, q_i(:,j)'$$
Experiment
• Suppose you extract M ~ 100,000 small-sized (8 x 8) patches from a set of
images.
• Compute the column-column and row-row correlation matrices.
$$C_C = \frac{1}{M-1} \sum_{i=1}^{M} P_i P_i^T = \frac{1}{M-1} \sum_{i=1}^{M} \sum_{j=1}^{8} P_i(:,j)\, P_i(:,j)'$$

$$C_R = \frac{1}{M-1} \sum_{i=1}^{M} P_i^T P_i = \frac{1}{M-1} \sum_{i=1}^{M} \sum_{j=1}^{8} P_i(j,:)'\, P_i(j,:)$$
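A MATLAB sketch of this experiment, assuming img is a grayscale image in double format, e.g. im2double(imread('cameraman.tif')) (im2col, dctmtx and the example file are from the Image Processing Toolbox):

P = im2col(img, [8 8], 'distinct');          % each column is one 8x8 patch (64 x M)
M = size(P, 2);
CC = zeros(8);  CR = zeros(8);
for i = 1:M
    Pi = reshape(P(:,i), 8, 8);
    CC = CC + Pi * Pi';                      % column-column correlations
    CR = CR + Pi' * Pi;                      % row-row correlations
end
CC = CC / (M-1);  CR = CR / (M-1);
[VR, DR] = eig(CR);  [~, ir] = sort(diag(DR), 'descend');  VR = VR(:, ir);
[VC, DC] = eig(CC);  [~, ic] = sort(diag(DC), 'descend');  VC = VC(:, ic);
abs(dctmtx(8) * VR), abs(dctmtx(8) * VC)     % compare with the table of dot products below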
Absolute value of dot products between the columns of DCT matrix and columns of VR (left) and VC (right)
1.0000 0.0007 0.0032 0.0002 0.0013 0.0001 0.0005 0.0000 1.0000 0.0002 0.0029 0.0001 0.0010 0.0000 0.0004 0.0000
0.0007 0.9970 0.0097 0.0689 0.0009 0.0322 0.0003 0.0110 0.0002 0.9965 0.0028 0.0766 0.0005 0.0314 0.0009 0.0107
0.0033 0.0106 0.9968 0.0118 0.0713 0.0004 0.0334 0.0025 0.0029 0.0025 0.9969 0.0046 0.0728 0.0017 0.0304 0.0013
0.0002 0.0718 0.0124 0.9926 0.0007 0.0927 0.0017 0.0276 0.0001 0.0795 0.0044 0.9923 0.0029 0.0916 0.0015 0.0243
0.0010 0.0001 0.0737 0.0004 0.9942 0.0008 0.0780 0.0010 0.0008 0.0003 0.0747 0.0026 0.9948 0.0061 0.0696 0.0004
0.0000 0.0261 0.0015 0.0962 0.0005 0.9934 0.0011 0.0569 0.0000 0.0246 0.0021 0.0949 0.0069 0.9940 0.0131 0.0452
0.0003 0.0007 0.0276 0.0021 0.0802 0.0010 0.9964 0.0013 0.0003 0.0004 0.0252 0.0003 0.0715 0.0137 0.9970 0.0002
0.0000 0.0076 0.0026 0.0227 0.0012 0.0596 0.0015 0.9979 0.0000 0.0076 0.0013 0.0207 0.0001 0.0476 0.0009 0.9986
[Figures: the 64 columns of V – each reshaped to form an 8 x 8 image, and rescaled to fit in the 0-1 range – shown alongside the DCT bases. Notice the similarity between the DCT bases and the columns of V. Again, V is the Kronecker product of VR and VC.]
DCT and PCA
• DCT basis is very close to the PCA basis when the data-points come from what is called a stationary first order Markov process with ρ (defined below) close to 1, i.e.

$$P(q_i \mid q_{i-1}, q_{i-2}, \ldots, q_{i-n}) = P(q_i \mid q_{i-1}) \qquad \text{(defn. of 1st order Markov process)}$$

$$E(q_i \mid q_{i-1}, q_{i-2}, \ldots, q_{i-n}) = E(q_i \mid q_{i-1})$$

$$E(q_i q_{i-1}) = \rho,\quad E(q_i q_{i-2}) = \rho^2,\ \ldots,\ E(q_i q_{i-n+1}) = \rho^{n-1}, \qquad |\rho| \le 1$$

$$C = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \vdots & & \ddots & & \vdots \\ \rho^{n-1} & \cdots & \rho^2 & \rho & 1 \end{pmatrix}, \qquad C_{ij} = E(q_i q_j)$$

For this part, you may refer to section 2.9 (equations 2.67 and 2.68) of the book “Fundamentals of Digital Image Processing” by Anil Jain. Also read section 5.6 (equations 5.95 and 5.96).
DCT and PCA
• One can show that the eigenvectors of the covariance
matrix of the form seen on the previous slide are very
close to the DCT basis vectors!
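A small MATLAB check of this claim (the value of ρ below is a hypothetical choice close to 1; dctmtx is from the Image Processing Toolbox):

N = 8;  rho = 0.95;
C = rho .^ abs((1:N)' - (1:N));              % C_ij = rho^|i-j|, the matrix from the previous slide
[V, D] = eig(C);
[~, idx] = sort(diag(D), 'descend');
V = V(:, idx);                               % eigenvectors ordered by decreasing eigenvalue
abs(dctmtx(N) * V)                           % nearly the identity matrix (up to sign),
                                             % i.e. the PCA basis is close to the DCT basis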
[Figure: Example — a Huffman tree for five symbols with probabilities 0.4, 0.2, 0.1, 0.1 and 0.2; each edge is labeled with a 0 or a 1, and a symbol's code word is read off along the path from the root to its leaf.]
To perform encoding, we maintain an initially empty encoded bit stream. Read a symbol from the input, traverse the Huffman tree from the root node to the leaf node for that symbol, collect all the bit labels on the traversed path, and append them to the encoded bit stream. Repeat this for every symbol from the input. Example: for R, we write 11.
To perform decoding, read the encoded bit stream, and traverse the Huffman tree from
the root node toward a leaf node, following the path as indicated by the bit stream. For
example, if you read in 11, you would travel to the leaf node R. When you reach a leaf
node, append its associated symbol to the decoded output. Go back to the root node
and traverse the tree as per the remaining bits from the encoded bit stream.
About the algorithm
• This is a greedy algorithm, which is guaranteed to
produce the prefix-free code with minimal
average length (proof beyond the scope of the
course).
• There could be multiple sets of code words with
the same average bit length. Huffman encoding
produces one of them, depending on the order in
which the nodes were combined, and the
convention for labeling the edges with a 0 or a 1.
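A MATLAB sketch using the Communications Toolbox Huffman routines; the symbols (coded as the numbers 1–5) and their probabilities are hypothetical stand-ins, and as noted above the exact code words may differ depending on how ties are broken:

symbols = 1:5;                           % stand-ins for the source symbols
prob    = [0.4 0.2 0.1 0.1 0.2];         % their probabilities (must sum to 1)
dict    = huffmandict(symbols, prob);    % greedy construction of the prefix-free code
sig     = [5 1 2 5 1];                   % a sample symbol stream
code    = huffmanenco(sig, dict);        % encode: concatenate the code words
back    = huffmandeco(code, dict);       % decode: walk the tree bit by bit; equals sig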
Zig-zag ordering
• The quantized DCT coefficients are now arranged in a zig-zag order as follows. The zig-zag pattern leaves a bunch of consecutive zeros at the end.

Zig-zag sequence of the example block:
−26, −3, 0, −3, −2, −6, 2, −4, 1, −3, 1, 1, 5, 1, 2, −1, 1, −1, 2, 0, 0, 0, 0, 0, −1, −1, followed by zeros for all remaining entries.
Run length encoding
• The non-zero re-ordered quantized DCT
coefficients (except for the DC coefficient) are
written down in the following format:
• run-length (number of zeros before this coefficient), stored in 4 bits,
• size (no. of bits to store the Huffman code for the coefficient), also stored in 4 bits,
• actual Huffman code of the coefficient.

We refer to the above set as a triple. In case there are more than 15 zeros in between 2 non-zero AC coefficients, a special triple is inserted. That triple is (15,0,0). If there are a large number of trailing zeros at the end of a block, we put in an “end of block” triple given as (0,0). A sketch of the zig-zag scan and the run counting follows below.
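A minimal MATLAB sketch of the zig-zag scan and the run counting (not the actual JPEG bitstream format); B is assumed to be an 8 x 8 block of quantized DCT coefficients:

[r, c] = ndgrid(1:8, 1:8);
d = r + c;                                   % anti-diagonal index of each entry
s = c - r;
s(mod(d,2) == 1) = -s(mod(d,2) == 1);        % alternate the direction on successive diagonals
[~, order] = sortrows([d(:) s(:)]);          % zig-zag scan order of the 64 positions
zz = B(order);                               % coefficients in zig-zag order
pairs = [];  run = 0;
for k = 2:64                                 % AC coefficients only (skip the DC term)
    if zz(k) == 0
        run = run + 1;
    else
        pairs = [pairs; run zz(k)];          % (number of preceding zeros, coefficient value)
        run = 0;
    end
end
% Trailing zeros are not listed; JPEG marks them with an end-of-block code instead.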
PCA on RGB values
• Suppose you take N color images and extract
RGB values of each pixel (3 x 1 vector at each
location).
• Now, suppose you build an eigenspace out of this – you get 3 eigenvectors, corresponding to 3 different eigenvalues.
PCA on RGB values
• The eigenvectors will look typically as follows:
0.5952 0.6619 0.4556
0.6037 0.0059 -0.7972
0.5303 -0.7496 0.3961
• Exact numbers are not important, but the first eigenvector is like an average of RGB. It is called the Luminance Channel (Y).
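A MATLAB sketch of this computation, assuming RGB is an M x N x 3 color image, e.g. imread('peppers.png') (an example file that ships with the Image Processing Toolbox):

X = reshape(double(RGB), [], 3);             % one (R, G, B) vector per pixel
X = X - mean(X);                             % subtract the mean before computing correlations
C = (X' * X) / (size(X, 1) - 1);             % 3 x 3 correlation matrix of the RGB values
[V, D] = eig(C);
[~, idx] = sort(diag(D), 'descend');
V = V(:, idx)                                % first column: luminance-like direction;
                                             % remaining columns: chrominance-like directions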
PCA on RGB values
• The second eigenvector is like Y-B, and the third is like Y-G. These are called the Chrominance Channels.
• The Y-Cb-Cr color space is related to this PCA-
based space (though there are some details in
the relative weightings of RGB to get Luminance
denoted by Y, and Chrominance – denoted by Cb
and Cr).
• The values in the three channels Y, Cb and Cr are
decorrelated, similar to the values projected onto
the PCA-based channels.
PCA on RGB values
• The luminance channel (Y) carries most information
from the point of view of human perception, and the
human eye is less sensitive to changes in chrominance.
• This fact can be used to assign coarser quantization
levels (i.e. fewer bits) for storing or transmitting Cb and
Cr values as compared to the Y channel. This improves
the compression rate.
• The JPEG standard for color image compression uses
the YCbCr format. For an image of size M x N x 3, it
stores Y with full resolution (i.e. as an M x N image),
and Cb and Cr with 25% resolution, i.e. as M/2 x N/2
images.
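A MATLAB sketch of this chroma subsampling, assuming an M x N x 3 image RGB (rgb2ycbcr, ycbcr2rgb and imresize are from the Image Processing Toolbox):

YCC = rgb2ycbcr(RGB);
Y   = YCC(:,:,1);                            % luminance kept at full resolution (M x N)
Cb  = imresize(YCC(:,:,2), 0.5);             % chrominance stored at half resolution
Cr  = imresize(YCC(:,:,3), 0.5);             %   in each dimension (M/2 x N/2)
% For reconstruction, upsample Cb and Cr with imresize(..., 2), stack with Y, and apply ycbcr2rgb.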
[Figures: R, G and B channels of a color image, and the images of eigencoefficient values corresponding to the 1st eigenvector (maximum eigenvalue), the 2nd eigenvector (second largest eigenvalue) and the 3rd eigenvector (least eigenvalue). Source: http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-4.pdf]
JPEG artifacts
• Seam artifacts at patch boundaries (more
prominent for lower Q values).
• Ringing artifacts around edges.
• Some loss of edge and textural detail.
• Color artifacts
http://www.sitepoint.com/sharper-gif-jpeg-png-images/
A word about JPEG 2000
• JPEG2000 (extension jp2) is the latest series of
standards from the JPEG committee
– Uses wavelet transform
– Better compression than JPG
– Superior lossless compression
– Supports large images and images with many components
– Region-of-interest coding
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-4.pdf
References
• Image compression chapter of the book by
Gonzalez
• Section 2.9 and 5.6 of the book by Anil Jain
• Wikipedia article on JPEG
• Image compression slides by Prof. Prem Kalra
(IIT Delhi):
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-4.pdf