Image Compression
Image Compression
• Process of converting an image file into
another image file that occupies less storage
space, without sacrificing its visual content as
far as possible.
2
Types of compression
• Lossless: the compressed image can be
converted back with zero error.
• Lossy: the compressed image can be converted
back only approximately, i.e. with some error.
3
Lossless compression - examples
• LZW method (used in the GIF and TIFF formats; ZIP tools such as Winzip use the related LZ77-based DEFLATE method)
• Huffman encoding (part of the JPEG algorithm,
although overall JPEG is lossy)
• Run-length encoding (also part of the JPEG
algorithm, although JPEG is lossy overall)
4
Lossy compression
• JPEG
• MPEG (for video)
• MP3 (for audio)
• Machine learning based techniques for
compression of images or video (not covered
in this course).
5
Lossy image compression
• Compression of text files or exe files cannot
afford to be lossy.
• But some portion of image content is often
not very noticeable to the human eye,
especially the higher frequencies. Discarding
this extraneous information leads to
compression without significant loss of visual
appeal.
6
Figure source: Article on compressive sensing by Candes and Wakin, from IEEE Signal Processing Magazine, 2008
(Figure label, truncated in the source: "DCT coefficients or ...")
7
JPEG compression method
• JPEG = Joint Photographic Experts Group
• One of the most popular standards for
compression of photographic images – widely
used on the internet.
• Widely used in digital cameras.
• Implemented in all standard image processing
software (MATLAB, OpenCV, etc.)
• Essentially lossy (though there are some
lossless variants)
• Applicable for color as well as grayscale images.
8
JPEG image compression
• User specifies a quality factor (Q) between 0 and
100 (higher Q means better quality)
• JPEG algorithm compresses the image based on the
user-provided Q.
• A higher Q gives less compression (but higher image
quality). A lower Q gives more compression (but poorer
image quality).
• JPEG can typically compress an image to 1/10 or 1/15 of
its original size with little visible loss of quality.
9
JPEG image compression
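• How is the loss of quality measured?
• As the mean squared error (MSE) between the original (uncompressed)
image $f$ and the reconstructed image $\hat{f}$, both of size $M \times N$:
$$\mathrm{MSE} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( f(x,y) - \hat{f}(x,y) \right)^2$$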
10
[Figure: the same image compressed with JPEG at different quality factors. Source: http://en.wikipedia.org/wiki/JPEG]
Q = 100, compression rate = 1/2.6
Q = 50, compression rate = 1/15
Q = 25, compression rate = 1/23
Q = 10, compression rate = 1/46
Q = 1, compression rate = 1/144
11
Steps of the JPEG algorithm (encoder):
Overview (approximate)
1. Divide the image into non-overlapping 8 x 8 blocks and
compute the discrete cosine transform (DCT) of each block.
This produces a set of 64 “DCT coefficients” per block.
2. Quantize these DCT coefficients, i.e. divide by some number
and round off to nearest integer (that’s why it is lossy). Many
coefficients now become 0 and need not be stored!
3. Now run a lossless compression algorithm (typically
Huffman encoding) on the entire set of integers.
12
13
STEP 1: Discrete Cosine
Transform (DCT)
14
Discrete Cosine Transform (DCT) in 1D
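For a signal $f(n)$, $n = 0, \dots, N-1$, the orthonormal DCT coefficients are
$$F(u) = a(u) \sum_{n=0}^{N-1} f(n) \cos\left( \frac{\pi (2n+1) u}{2N} \right), \quad u = 0, \dots, N-1,$$
with $a(0) = \sqrt{1/N}$ and $a(u) = \sqrt{2/N}$ for $u \geq 1$. The inverse is
$$f(n) = \sum_{u=0}^{N-1} a(u) F(u) \cos\left( \frac{\pi (2n+1) u}{2N} \right).$$
In matrix form, entry $(n, u)$ of the DCT basis matrix is $a(u) \cos(\pi (2n+1) u / 2N)$, with $n$ indexing the rows and $u$ the columns.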
16
DCT
• Expresses a signal as a linear combination of
cosine bases (as opposed to the complex
exponentials as in the Fourier transform).
• The coefficients of this linear combination are
called DCT coefficients.
• Is real-valued unlike the Fourier transform!
• Discovered by Ahmed, Natarajan and Rao
(1974)
17
• DCT basis matrix is orthonormal. The dot product of any row (or column) with itself
is 1. The dot product of any two different rows (or two different columns) is 0. The
inverse is equal to the transpose.
• Columns of the DCT matrix are called the DCT basis vectors.
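The orthonormality claims above can be checked numerically. The following is a minimal sketch in plain Python (the function names `dct_matrix` and `gram` are illustrative, not from the course code); it assumes the orthonormal DCT-II convention, with basis vectors stored as columns.

```python
import math

def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix; column u is the u-th basis vector."""
    d = [[0.0] * n for _ in range(n)]
    for row in range(n):      # sample index
        for u in range(n):    # frequency index
            scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            d[row][u] = scale * math.cos(math.pi * (2 * row + 1) * u / (2 * n))
    return d

def gram(d):
    """Return D^T D, which is the identity when the columns are orthonormal."""
    n = len(d)
    return [[sum(d[k][i] * d[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

D = dct_matrix(8)
G = gram(D)

# Largest deviation of D^T D from the identity matrix.
max_dev = max(abs(G[i][j] - (1.0 if i == j else 0.0))
              for i in range(8) for j in range(8))
print("first row, first two entries:", round(D[0][0], 4), round(D[0][1], 4))
print("max deviation from identity:", max_dev)
```

Because `D^T D` is the identity, the inverse transform is simply multiplication by the transpose, as stated above.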
18
Digression: matrix view of a discrete orthonormal
transform (Fourier transform used as example here)
• Remember:
19
20
DCT in 2D
The DCT matrix in this case will have size
MN x MN, and it will be the Kronecker
product of two DCT matrices – one of size
M x M, the other of size N x N. The DCT
matrix for the 2D case is also
orthonormal. It is NOT symmetric, and it is
NOT the real part of the 2D DFT.
21
What is a 2D Fourier Matrix?
• It is of the following form:
22
What is a 2D Fourier Matrix?
Consider a matrix A of size N1 x N2 and a matrix B of size M1 x M2. The size of their
Kronecker product is given by N1 M1 x N2 M2. The Kronecker product is
constructed by creating a rectangular grid of size N1 x N2 . In each cell of the grid,
you place B. The copy of B in the cell at grid location (i,j) is multiplied by Aij.
23
This big matrix is nothing but the Kronecker product of two
2 x 2 Fourier matrices.
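The grid construction of the Kronecker product described above can be sketched as follows. This is an illustrative plain-Python version (the helper name `kron` is an assumption); the 2 x 2 Fourier matrix is used without its 1/sqrt(2) normalisation to keep the entries integer.

```python
def kron(a, b):
    """Kronecker product: a grid shaped like `a`, where cell (i, j) holds a[i][j] * b."""
    n1, n2 = len(a), len(a[0])
    m1, m2 = len(b), len(b[0])
    out = [[0] * (n2 * m2) for _ in range(n1 * m1)]
    for i in range(n1):
        for j in range(n2):
            for p in range(m1):
                for q in range(m2):
                    out[i * m1 + p][j * m2 + q] = a[i][j] * b[p][q]
    return out

# Unnormalised 2 x 2 Fourier matrix (entries are exp(-j*pi*u*n), i.e. +1 or -1).
F2 = [[1, 1], [1, -1]]
F4 = kron(F2, F2)
for row in F4:
    print(row)
```

The result has size (N1 M1) x (N2 M2), here 4 x 4.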
24
What do the DCT bases look like? (1D case)
What do the DCT bases look like? (2D case)
http://en.wikipedia.org/wiki/JPEG
26
Again: DCT is NOT the real part of the DFT
http://en.wikipedia.org/wiki/JPEG
DCT on grayscale image patches
• The DCT coefficients of natural image patches
have an amazing property.
• It is observed that most of the signal energy is
concentrated in only a small number of
coefficients.
• This is good news for compression! Store only a
few coefficients, and throw away the rest.
• The corresponding error will be small, due to
the orthonormal nature of the DCT.
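The link between orthonormality and a small error can be made concrete: because the 2D DCT preserves norms, the squared reconstruction error equals the energy of the discarded coefficients. The sketch below checks this on the 8 x 8 image patch listed in this section. The threshold of 20 and all helper names are illustrative assumptions.

```python
import math

# 8 x 8 image patch copied from this section.
PATCH = [
    [149, 74, 92, 74, 74, 74, 149, 162],
    [87, 74, 117, 30, 74, 105, 180, 130],
    [30, 117, 105, 43, 105, 130, 149, 105],
    [74, 162, 105, 74, 105, 117, 105, 105],
    [117, 149, 74, 117, 74, 105, 74, 149],
    [149, 87, 74, 87, 74, 74, 117, 180],
    [105, 74, 105, 43, 61, 117, 180, 149],
    [74, 74, 105, 74, 105, 130, 149, 105],
]
THRESHOLD = 20.0  # assumed cut-off: smaller coefficients are discarded

def dct_matrix(n):
    """Orthonormal DCT-II matrix with basis vectors as columns."""
    return [[(math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * r + 1) * u / (2 * n))
             for u in range(n)] for r in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

D = dct_matrix(8)
Dt = transpose(D)

coeffs = matmul(matmul(Dt, PATCH), D)                  # forward 2D DCT
kept = [[c if abs(c) >= THRESHOLD else 0.0 for c in row] for row in coeffs]
recon = matmul(matmul(D, kept), Dt)                    # inverse 2D DCT

num_discarded = sum(1 for row in coeffs for c in row if abs(c) < THRESHOLD)
discarded_energy = sum(c * c for row in coeffs for c in row if abs(c) < THRESHOLD)
error_energy = sum((PATCH[i][j] - recon[i][j]) ** 2
                   for i in range(8) for j in range(8))
patch_energy = sum(v * v for row in PATCH for v in row)
coeff_energy = sum(c * c for row in coeffs for c in row)

print("coefficients discarded:", num_discarded, "of 64")
print("squared error:", error_energy, "discarded energy:", discarded_energy)
```

The two printed energies agree up to floating-point round-off, which is exactly the norm-preservation property.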
28
IMAGE PATCH
149 74 92 74 74 74 149 162
87 74 117 30 74 105 180 130
30 117 105 43 105 130 149 105
74 162 105 74 105 117 105 105
117 149 74 117 74 105 74 149
149 87 74 87 74 74 117 180
105 74 105 43 61 117 180 149
74 74 105 74 105 130 149 105
[Figure: HISTOGRAM OF DCT COEFFICIENTS of this patch]
29
[Figure, three panels: (1) original image; (2) image reconstructed after discarding all DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 10, and then computing the inverse DCT; (3) the same with a threshold of 20.]
Threshold 10: the number of DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 10 was 34,377 out of a total of 65,536 (64 coefficients for each 8 x 8 patch, 1024 such patches in total). This is more than 50%. The corresponding percentage for the DFT was 1%.
Threshold 20: the number of DCT coefficients of non-overlapping 8 x 8 patches with absolute value less than 20 was 51,045 out of a total of 65,536. This is about 78%. The corresponding percentage for the DFT was 7%.
Why DCT? DFT and DCT comparison
Property                               DFT          DCT
Orthonormal                            Yes          Yes
Real/complex                           Complex      Real
Separable in 2D                        Yes          Yes
Norm-preserving                        Yes          Yes
Inverse exists                         Yes          Yes
Fast implementation                    Yes (fft)    Yes (uses fft)
Energy compaction for natural images   Good/Fair    Much better
31
DCT has better energy compaction than DFT because…
Recall that the DFT of a sequence is equal to the Discrete Fourier Series (DFS) of a periodic
extension of that sequence. Computing the DFT of a signal of length n implicitly extends it by
placing several copies of the signal one after the other (n-point periodicity). The
resulting discontinuities at the copy boundaries require many frequencies for a good
representation in the DFS. In contrast, the discontinuities are reduced in a DCT because a
reflected copy of the signal is appended to it (2n-point periodicity) before computing the DFS.
33
DCT computational complexity
• Naïve implementation (matrix times vector) is
O(N^2) for a vector of N elements.
• You can speed this up to O(N log N) using the
FFT as shown on next slide.
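The relation between the DCT and the DFT of the reflected signal can be sketched as follows. This is a minimal plain-Python illustration, not the course's `dct_dft_relation.m`: the function names are assumptions, and a naive O(N^2) DFT stands in for the FFT purely so that the example needs no external library. Swapping in an FFT routine gives the O(N log N) cost.

```python
import cmath
import math

def scale(u, n):
    """Orthonormal DCT-II scale factor a(u)."""
    return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

def dct_direct(f):
    """DCT-II computed straight from its definition."""
    n = len(f)
    return [scale(u, n) * sum(f[k] * math.cos(math.pi * (2 * k + 1) * u / (2 * n))
                              for k in range(n))
            for u in range(n)]

def dft(y):
    """Naive DFT; a real implementation would call an FFT here."""
    m = len(y)
    return [sum(y[k] * cmath.exp(-2j * cmath.pi * u * k / m) for k in range(m))
            for u in range(m)]

def dct_via_dft(f):
    """DCT-II from the 2N-point DFT of f followed by its reflection."""
    n = len(f)
    y = list(f) + list(reversed(f))      # reflected copy appended to f
    spectrum = dft(y)
    out = []
    for u in range(n):
        # A half-sample phase shift turns the DFT bin into a sum of cosines.
        shifted = cmath.exp(-1j * cmath.pi * u / (2 * n)) * spectrum[u]
        out.append(scale(u, n) * 0.5 * shifted.real)
    return out

signal = [149, 74, 92, 74, 74, 74, 149, 162]   # first row of the example patch
a = dct_direct(signal)
b = dct_via_dft(signal)
max_diff = max(abs(x - y) for x, y in zip(a, b))
print("max difference between the two computations:", max_diff)
```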
34
Reflected version of f, appended to f.
Supporting code:
https://www.cse.iitb.ac.in/~ajitvr/CS663_Fall2024/Code_Compression/dct_dft_relation.m
Which is the best orthonormal basis?
• For which orthonormal basis U is the following
error the lowest:
38
Which is the best orthonormal basis?
• The answer is the PCA basis, i.e. the set of k
eigenvectors of the correlation matrix C
corresponding to the k largest eigenvalues.
Here C is defined as:
39
PCA: separable 2D version
• Find the correlation matrix CR of row vectors from
the patches.
• Find the correlation matrix CC of column vectors
from the patches.
• The final PCA basis is the Kronecker product of the
individual bases: $V = V_R \otimes V_C$.
40
But PCA is not used in JPEG, because…
• It is image-dependent, and the basis matrix
would need to be computed afresh for each
image.
• The basis matrix would need to be stored for
each image.
• It is expensive to compute – O(n^3) for a vector
with n elements.
The DCT is used instead!
41
DCT and PCA
• DCT can be computed very fast using the FFT.
• It is universal – no need to store the DCT bases
explicitly.
• DCT has very good energy compaction
properties, only slightly worse than PCA.
42
Code:
https://www.cse.iitb.ac.in/~ajitvr/CS663_Fall2024/Code_Compression/dct_pca.m
Experiment
• Suppose you extract M ~ 100,000 small-sized (8 x 8) patches from a set of images.
• Compute the column-column and row-row correlation matrices.
43
DCT matrix: dctmtx command from MATLAB (see code on website)
0.3536 0.4904 0.4619 0.4157 0.3536 0.2778 0.1913 0.0975
0.3536 0.4157 0.1913 -0.0975 -0.3536 -0.4904 -0.4619 -0.2778
0.3536 0.2778 -0.1913 -0.4904 -0.3536 0.0975 0.4619 0.4157
0.3536 0.0975 -0.4619 -0.2778 0.3536 0.4157 -0.1913 -0.4904
0.3536 -0.0975 -0.4619 0.2778 0.3536 -0.4157 -0.1913 0.4904
0.3536 -0.2778 -0.1913 0.4904 -0.3536 -0.0975 0.4619 -0.4157
0.3536 -0.4157 0.1913 0.0975 -0.3536 0.4904 -0.4619 0.2778
0.3536 -0.4904 0.4619 -0.4157 0.3536 -0.2778 0.1913 -0.0975

VC: Eigenvectors of column-column correlation matrix
0.3517 -0.4493 -0.4278 0.4230 0.3754 0.3247 -0.2250 -0.1245
0.3534 -0.4366 -0.2276 -0.0110 -0.3078 -0.4746 0.4732 0.2975
0.3543 -0.3101 0.1728 -0.4830 -0.3989 0.0498 -0.4299 -0.4109
0.3546 -0.1115 0.4799 -0.3005 0.3342 0.4102 0.1856 0.4761
0.3547 0.1141 0.4823 0.2944 0.3301 -0.4182 0.1745 -0.4771
0.3543 0.3104 0.1771 0.4834 -0.3977 -0.0322 -0.4308 0.4103
0.3535 0.4357 -0.2319 0.0143 -0.3009 0.4656 0.4851 -0.2975
0.3520 0.4468 -0.4328 -0.4204 0.3686 -0.3253 -0.2342 0.1261

VR: Eigenvectors of row-row correlation matrix
0.3520 -0.4461 -0.4305 0.4224 0.3696 0.3247 0.2342 0.1283
0.3537 -0.4338 -0.2345 -0.0114 -0.3000 -0.4671 -0.4814 -0.3028
0.3545 -0.3086 0.1662 -0.4896 -0.4007 0.0359 0.4261 0.4102
0.3548 -0.1145 0.4763 -0.3031 0.3339 0.4198 -0.1800 -0.4713
0.3548 0.1056 0.4839 0.2926 0.3349 -0.4194 -0.1766 0.4733
0.3543 0.3043 0.1863 0.4833 -0.4028 -0.0354 0.4269 -0.4097
0.3532 0.4389 -0.2269 0.0180 -0.3008 0.4654 -0.4811 0.3037
0.3512 0.4562 -0.4300 -0.4126 0.3694 -0.3242 0.2335 -0.1319
Absolute value of dot products between the columns of DCT matrix and columns of VR (left) and VC (right)
1.0000 0.0007 0.0032 0.0002 0.0013 0.0001 0.0005 0.0000 1.0000 0.0002 0.0029 0.0001 0.0010 0.0000 0.0004 0.0000
0.0007 0.9970 0.0097 0.0689 0.0009 0.0322 0.0003 0.0110 0.0002 0.9965 0.0028 0.0766 0.0005 0.0314 0.0009 0.0107
0.0033 0.0106 0.9968 0.0118 0.0713 0.0004 0.0334 0.0025 0.0029 0.0025 0.9969 0.0046 0.0728 0.0017 0.0304 0.0013
0.0002 0.0718 0.0124 0.9926 0.0007 0.0927 0.0017 0.0276 0.0001 0.0795 0.0044 0.9923 0.0029 0.0916 0.0015 0.0243
0.0010 0.0001 0.0737 0.0004 0.9942 0.0008 0.0780 0.0010 0.0008 0.0003 0.0747 0.0026 0.9948 0.0061 0.0696 0.0004
0.0000 0.0261 0.0015 0.0962 0.0005 0.9934 0.0011 0.0569 0.0000 0.0246 0.0021 0.0949 0.0069 0.9940 0.0131 0.0452
0.0003 0.0007 0.0276 0.0021 0.0802 0.0010 0.9964 0.0013 0.0003 0.0004 0.0252 0.0003 0.0715 0.0137 0.9970 0.0002
0.0000 0.0076 0.0026 0.0227 0.0012 0.0596 0.0015 0.9979 0.0000 0.0076 0.0013 0.0207 0.0001 0.0476 0.0009 0.9986
[Figure, two panels: (left) the 64 columns of V, each reshaped to form an 8 x 8 image and rescaled to fit in the 0-1 range; (right) the DCT bases.]
Notice the similarity between the DCT bases and
the columns of V. Again, V is the Kronecker
product of VR and VC.
45
DCT and PCA
• DCT is very close to PCA when the patches
come from what is called a stationary first-order
Markov process, i.e. one whose correlation matrix has
entries $C_{ij} = \rho^{|i-j|}$ with $\rho$ close to 1.
46
DCT and PCA
• One can show that the eigenvectors of the correlation
matrix of the form seen on the previous slide are the DCT
basis vectors!
48
Computation of DCT coefficients in JPEG
• Before computation, the value 128 (midpoint
of the range 0 to 255) is subtracted from every
pixel value.
• This changes the range of intensity values
from 0 to 255, to -128 to 127.
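• This also shifts the DC coefficient of each 8 x 8 block by
8 x 128 = 1024, so its range changes from roughly 0 to 2048,
to roughly -1024 to +1024. The AC coefficients are unchanged,
since they are insensitive to a constant offset.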
49
STEP 2: Quantization
50
Quantization
• The DCT coefficients are floating point
numbers and storing them in a file will
produce no compression. So they need to be
quantized.
• The human eye is not sensitive to changes in
the higher frequency content.
• So we can have cruder quantization for the
higher frequency coefficients and a finer one
for the lower frequency coefficients.
51
Quantization
• Quantization is performed by dividing the DCT coefficient
matrix element-wise by a quantization matrix and rounding
off to the nearest integer.
• The quantization matrix on the next slide is for quality factor
Q = 50.
• Matrices for lower Q values are obtained by scaling the Q =
50 matrix with a constant 50/Q – which increases the values
in the quantization matrix.
• Matrices for higher Q values are obtained by scaling the Q =
50 matrix with a constant 50/Q – which decreases the values
in the quantization matrix.
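The quantization step can be sketched as below. Two assumptions are made here: the Q = 50 matrix is taken to be the widely used example luminance table from the JPEG standard (the slide's own matrix did not survive extraction), and the scaling uses the simplified 50/Q rule stated above rather than the piecewise rule found in common JPEG libraries. Function names are illustrative.

```python
# Example luminance quantization table from the JPEG standard, assumed to be
# the Q = 50 matrix referred to in the text.
Q50 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quant_matrix(q):
    """Scale the Q = 50 table by 50/q (simplified rule); entries stay >= 1."""
    if not 1 <= q <= 100:
        raise ValueError("quality factor must lie in 1..100")
    return [[max(1, round(v * 50.0 / q)) for v in row] for row in Q50]

def quantize(coeffs, qm):
    """Divide element-wise and round to the nearest integer (the lossy step)."""
    return [[round(c / m) for c, m in zip(crow, mrow)]
            for crow, mrow in zip(coeffs, qm)]

def dequantize(levels, qm):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [[l * m for l, m in zip(lrow, mrow)]
            for lrow, mrow in zip(levels, qm)]

# A single hypothetical DC coefficient, quantized with the Q = 50 step of 16.
levels = quantize([[-415.38]], [[16]])
restored = dequantize(levels, [[16]])
print(levels, restored)
```

Note that Python's `round` rounds exact halves to the nearest even integer; a production encoder may use a different tie-breaking rule.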
52
Huffman encoding
• Input: a set of non-zero quantized DCT coefficients from all the
different blocks of the image (values lying between -1024 and
+1024).
• Output: a set of encoded coefficients with length (in terms of
number of bits) less than that of the original set.
• Principles behind Huffman encoding:
(1) Encode the more frequently occurring coefficients with
fewer bits. Encode the rarely occurring coefficients with
more bits. This will reduce the average bit-length.
(2) Ensure that the encoding for no coefficient is a strict prefix of
the encoding of any other coefficient (to be explained on
next slide). This is called a “prefix-free code”.
57
Huffman encoding example
• Consider a set of alphabets {a,e,q}. Let the frequency of an
alphabet x be denoted as p(x).
• Assume p(e) > p(a) > p(q) [actually true in the English
language].
• Consider the following code-word assignment: e – 0, a – 1, q –
01 (note: we assigned more bits for q). Now consider the
encoded stream: 001. It can be interpreted as ‘eea’ or ‘eq’.
• The reason for this ambiguity is that the code for ‘e’ is a strict
prefix of the code for ‘q’.
• For unambiguous decoding, we need prefix-free codes.
For example, e – 0, a – 10, q – 11 is a prefix-free
code.
58
Huffman encoding example
• The Huffman encoding algorithm asks the
following question:
Given a set of n alphabets A = {ai} with
corresponding frequencies {p(ai)} (each
frequency lies between 0 and 1), what prefix-free
encoding yields the least average bit length?
That is, which set of code-words {λ(ai)} will
minimize
$$\bar{L} = \sum_{i=1}^{n} p(a_i) \, |\lambda(a_i)|,$$
where $|\lambda(a_i)|$ is the length of the code-word λ(ai)?
59
Algorithm
1. Sort alphabets in increasing order of frequency. Create a leaf
node from each alphabet. These leaf nodes will belong to a
binary tree called the Huffman tree.
2. Combine the two lowest frequency nodes s1 and s2 to create a
parent node s12. s1 and s2 will be the left and right child of s12.
The frequency of s12 is given by p(s12) = p(s1) + p(s2).
3. Label the edge from s12 to s1 with a ‘0’ and the edge from s12 to
s2 with a ‘1’.
4. Delete s1 and s2 from the sorted list of alphabets and insert the
node s12, i.e. root node of the tree (s12,s1,s2) in the correct place
depending on the value of p(s12).
5. Repeat steps 2 to 4 until there is only one node in the list. This
will be the root node of the final Huffman tree.
6. Traverse the tree from the root node to each leaf and collect
all the binary symbols along every edge into a string. This string
is the code-word for the alphabet at that leaf.
Source: www.cis.upenn.edu/~matuszek/cit594-2002/Slides/huffman.ppt
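The steps above can be sketched with a priority queue in place of the repeatedly re-sorted list. This is a minimal illustration, not the course's implementation: the symbol names A to E are hypothetical, the frequencies are chosen for illustration, and decoding uses a code-word lookup table, which is equivalent to walking the tree because the code is prefix-free.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Map each symbol (assumed to be a string) to its Huffman code-word."""
    tie = count()  # unique tie-breaker so heap entries never compare tree nodes
    heap = [(p, next(tie), sym) for sym, p in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # lowest frequency node
        p2, _, right = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (p1 + p2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)

def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:               # reached a leaf
            out.append(inverse[current])
            current = ""
    return out

freqs = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}
codes = huffman_codes(freqs)
average_length = sum(freqs[s] * len(codes[s]) for s in freqs)
message = list("ABACADAEB")
bits = encode(message, codes)
print(codes, average_length, decode(bits, codes) == message)
```

The exact code-words depend on how ties are broken, but the average length does not.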
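Example
[Figure: example Huffman tree. Leaf frequencies: 0.4, 0.2, 0.1, 0.1, 0.2; intermediate nodes are labeled with summed frequencies (e.g. 0.2, 0.4, 0.6) and every edge with a 0 or a 1.]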
To perform encoding, we maintain an initially empty encoded bit stream. Read a symbol
from the input, traverse the Huffman tree from root node to the leaf node for that
symbol, collecting all the bit labels on the traversed path, and appending them to the
encoded bit stream. Repeat this for every symbol from the input. Example: for R, we
write 11.
To perform decoding, read the encoded bit stream, and traverse the Huffman tree from
the root node toward a leaf node, following the path as indicated by the bit stream. For
example, if you read in 11, you would travel to the leaf node R. When you reach a leaf
node, append its associated symbol to the decoded output. Go back to the root node
and traverse the tree as per the remaining bits from the encoded bit stream.
62
About the algorithm
• This is a greedy algorithm, which is guaranteed
to produce the prefix-free code with minimal
average length (proof beyond the scope of the
course).
• There could be multiple sets of code words with
the same average bit length. Huffman encoding
produces one of them, depending on the order
in which the nodes were combined, and the
convention for labeling the edges with a 0 or a 1.
63
Huffman Trees and Entropy
• The average code length $\bar{L}$ as computed by this
algorithm satisfies
$$H(A) \leq \bar{L} < H(A) + 1, \quad H(A) = -\sum_{i} p(a_i) \log_2 p(a_i).$$
Here $H(A)$ is the entropy of the random variable (set of
alphabets): a measure of the uncertainty of
the random variable, or a measure of
how much a random variable surprises
you, or the average number of bits
required to store a random variable.
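As a small worked check of the entropy bound, consider five symbols with assumed frequencies 0.4, 0.2, 0.2, 0.1, 0.1. Merging the two smallest nodes repeatedly gives internal nodes of weight 0.2, 0.4, 0.6 and 1.0, so the average Huffman code length is their sum, 2.2 bits. The sketch below compares this with the entropy.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.4, 0.2, 0.2, 0.1, 0.1]   # assumed example frequencies
H = entropy(probs)
average_huffman_length = 2.2        # sum of internal node weights, see above

print("entropy:", round(H, 4), "average code length:", average_huffman_length)
print("bound holds:", H <= average_huffman_length < H + 1)
```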
We refer to the above set as a triple. In case there are more than 15 zeros in between
2 non-zero AC coefficients, a special triple is inserted. That triple is (15,0,0). If there
are a large number of trailing zeros at the end of a block, we put in an “end of block”
marker given as (0,0).
66
−26
−3 0
−3 −2 −6
2 −4 1 −3
1 1 5 1 2
−1 1 −1 2 0 0
0 0 0 −1 −1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0
0 0 0 0
0 0 0
0 0
0
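The zigzag ordering and the run-length step can be sketched as follows. The 8 x 8 block below is an assumption, chosen so that its zigzag scan reproduces the coefficient sequence listed above. Each non-zero AC coefficient becomes a (run of preceding zeros, number of bits in the magnitude, value) triple; using the bit count as the middle entry follows the usual JPEG convention and is also an assumption here.

```python
# Hypothetical quantized block whose zigzag scan matches the listed sequence.
BLOCK = [
    [-26, -3, -6, 2, 2, -1, 0, 0],
    [0, -2, -4, 1, 1, 0, 0, 0],
    [-3, 1, 5, -1, -1, 0, 0, 0],
    [-3, 1, 2, -1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

def zigzag(block):
    """Read an n x n block along anti-diagonals, alternating direction."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):                    # s = row + column
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                 # even diagonals run upwards
        out.extend(block[r][s - r] for r in rows)
    return out

def run_length_triples(ac):
    """Encode AC coefficients as (zero run, bit size, value) triples."""
    last_nonzero = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    triples, run = [], 0
    for v in ac[:last_nonzero + 1]:
        if v == 0:
            run += 1
            if run == 16:                         # 16 zeros: special triple
                triples.append((15, 0, 0))
                run = 0
        else:
            triples.append((run, abs(v).bit_length(), v))
            run = 0
    if last_nonzero < len(ac) - 1:                # trailing zeros remain
        triples.append((0, 0))                    # end-of-block marker
    return triples

sequence = zigzag(BLOCK)
triples = run_length_triples(sequence[1:])        # the DC term is coded separately
print(sequence[:10])
print(triples)
```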
67
Encoding DC coefficients
• The difference between the DC coefficient of
the current and previous patch is encoded and
stored.
• These difference values are Huffman encoded
using a separate table (different from the
Huffman table used for AC coefficients).
• The DC coefficient of the first patch is stored
explicitly.
68
JPEG encoded file
• Begins with a header that contains
information such as size of file, whether color
or grayscale, the table of different alphabets
(i.e. DCT coefficient values and their Huffman
codes) and the quantization matrix.
• This is followed by a bit stream containing
triples of the form: (run-length, the length of
the Huffman code of the coefficient, and the
Huffman code for the coefficient).
69
JPEG DECODER
70
JPEG decoding
• Perform Huffman decoding and obtain the DCT coefficients (AC).
• Multiply the AC coefficients point-wise with the entries in the
quantization matrix.
• Compute the DC coefficients for each patch using the differences
between the DC coefficients of successive patches. Multiply by
the appropriate entry from the quantization matrix.
• Reconstruct the image patches of size 8 x 8 using the inverse DCT.
Add 128 to the intensity values in the patch.
• Note: During JPEG encoding, the round-off errors from the
quantization step can never be recovered again. Hence JPEG is
overall a lossy algorithm.
71
JPEG for color images
72
JPEG for color images
• The RGB values are converted to the YCbCr color space using:
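A sketch of the conversion is given below. The coefficients are those of the JFIF convention commonly used with JPEG for 8-bit values; this choice is an assumption, since the slide's own formula did not survive extraction, and the function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """RGB to YCbCr for 8-bit values (JFIF-style coefficients, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Approximate inverse of rgb_to_ycbcr."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b

gray = rgb_to_ycbcr(128, 128, 128)     # a gray pixel carries no chrominance
pixel = (200, 30, 90)                  # arbitrary test colour
round_trip = ycbcr_to_rgb(*rgb_to_ycbcr(*pixel))
print(gray, round_trip)
```

For gray pixels (R = G = B) both chrominance values sit at the midpoint 128, which illustrates that Cb and Cr carry only colour differences.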
73
PCA on RGB values
• Why can you not separately compress the R,G,B
images – instead of converting to Y, Cb, Cr?
• The answer lies in PCA!
• Suppose you take N color images and extract
RGB values of each pixel (3 x 1 vector at each
location).
• Now, suppose you build an eigenspace out of
this – you get 3 eigenvectors, each
corresponding to a different eigenvalue.
74
PCA on RGB values
• The eigenvectors will look typically as follows:
0.5952 0.6619 0.4556
0.6037 0.0059 -0.7972
0.5303 -0.7496 0.3961
• Exact numbers are not important, but the first
eigenvector is like an average of RGB. It is
called the Luminance Channel (Y). It is
similar to the intensity in the HSI space.
75
PCA on RGB values
• The second eigenvector is like Y-B, and the third is
like Y-G. These are called the Chrominance
Channels.
• The Y-Cb-Cr color space is related to this PCA-based
space (though there are some details in the relative
weightings of RGB to get Luminance and
Chrominance – denoted by Cb and Cr).
• The values in the three channels Y, Cb and Cr are
decorrelated, similar to the values projected onto
the PCA-based channels.
76
PCA on RGB values
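• Why does PCA produce decorrelated values (i.e. why are the
values of the eigencoefficients decorrelated)?
• Recall that the correlation matrix factorizes as $C = V \Lambda V^T$, with the eigenvectors in the columns of the orthonormal matrix $V$ and the eigenvalues on the diagonal of $\Lambda$. The eigencoefficients $V^T x$ therefore have correlation matrix $V^T C V = \Lambda$, which is diagonal.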
78
PCA on RGB values
• Why is it important to have decorrelated values in compression?
79
PCA on RGB values
• The luminance channel (Y) carries most information from
the point of view of human perception, and the human
eye is less sensitive to changes in chrominance.
• This fact can be used to assign coarser quantization levels
(i.e. fewer bits) for storing or transmitting Cb and Cr
values as compared to the Y channel. This improves the
compression rate.
• The JPEG standard for color image compression uses the
YCbCr format. For an image of size M x N x 3, it stores Y
with full resolution (i.e. as an M x N image), and Cb and Cr
with 25% resolution, i.e. as M/2 x N/2 images.
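The storage saving from chroma down-sampling can be sketched as below. Averaging each 2 x 2 neighbourhood is one possible down-sampling filter and is an assumption here, as is the requirement that the image dimensions be even; the function names are illustrative.

```python
def downsample_2x2(channel):
    """Halve both dimensions by averaging each 2 x 2 neighbourhood."""
    h, w = len(channel), len(channel[0])
    return [[(channel[r][c] + channel[r][c + 1]
              + channel[r + 1][c] + channel[r + 1][c + 1]) / 4.0
             for c in range(0, w - 1, 2)]
            for r in range(0, h - 1, 2)]

def stored_samples(m, n):
    """Sample counts for full-resolution Y, Cb, Cr versus half-size Cb and Cr."""
    full = 3 * m * n
    subsampled = m * n + 2 * (m // 2) * (n // 2)
    return full, subsampled

cb = [[100, 104, 90, 94],
      [102, 106, 92, 96]]              # tiny hypothetical Cb channel (2 x 4)
small_cb = downsample_2x2(cb)
full, subsampled = stored_samples(512, 512)
print(small_cb, full, subsampled)
```

With Cb and Cr each stored at a quarter of the samples, the total is half of the original count even before the DCT and quantization are applied.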
The variances of the three eigen-coefficient values:
8411, 159.1, 71.7
RGB and its corresponding Y, Cb, Cr channels
88
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-4.pdf
89
[Figure, three panels: (1) down-sampling of Cb and Cr in X and Y directions by a factor of 2; (2) down-sampling of Cb and Cr only in X direction by a factor of 2; (3) no down-sampling of chrominance or luminance channels.]
[Figure: Cb channel under different down-sampling factors.]
https://en.wikipedia.org/wiki/Chroma_subsampling#/media/File:Colorcomp.jpg
90
Modes of JPEG compression
• Sequential: encoding and decoding of patches
takes place in left to right, top to bottom
order.
• Progressive: encoding and decoding in
multiple scans, each one with finer
quantization levels.
• Hierarchical: encoding and decoding
performed at different scales.
91
Commonly seen in
web applications (e.g.:
Facebook)
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-4.pdf
92
JPEG artifacts
• Seam artifacts at patch boundaries (more
prominent for lower Q values).
• Ringing artifacts around edges.
• Some loss of edge and textural detail.
• Color artifacts
93
Video Compression
94
Need for video compression
• Huge data – typical HDTV has frames of size
1920 x 1080 and 30 fps frame-rate. At 24 bits per pixel,
that is more than 1 Gbit (about 1.5 Gbit) per second.
• Network channel bandwidths are limited
(around 20 Mbps).
• We need large rates of compression (around
1:80)!
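The arithmetic behind these figures can be checked directly. The sketch assumes uncompressed 24-bit colour (8 bits each for three channels), which the slide does not state explicitly.

```python
width, height, fps = 1920, 1080, 30
bits_per_pixel = 24                        # assumption: 8 bits x 3 channels

raw_bits_per_second = width * height * bits_per_pixel * fps
raw_megabytes_per_second = raw_bits_per_second / 8 / 1e6

channel_bits_per_second = 20e6             # 20 Mbps channel from the text
required_ratio = raw_bits_per_second / channel_bits_per_second

print("raw rate (Gbit/s):", raw_bits_per_second / 1e9)
print("raw rate (MB/s):", raw_megabytes_per_second)
print("required compression ratio: about 1:%d" % round(required_ratio))
```

The raw rate is roughly 1.5 Gbit/s (about 187 MB/s), so a 20 Mbps channel needs a compression ratio of about 1:75, in line with the figure of around 1:80 quoted above.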
95
Motion JPEG
• Encode each frame using JPEG.
• That will yield only around 1:10 compression.
• This makes use of only spatial redundancy, no
temporal redundancy.
96
MPEG (Moving Picture Experts Group)
• Heavy use of temporal redundancy!
• Uses a process called predictive coding.
• Consider pixel f(x,y,t) in frame t. We try to
predict its value, denoted as g(x,y,t), using a
linear combination of the values from the
previous k frames, i.e. f(x,y,t-k),…,f(x,y,t-1).
• We simply store the error e(x,y,t) = f(x,y,t)-
g(x,y,t).
97
Differential coding (First order predictive
coding)
• In this method, k = 1, i.e. you encode the
differences between consecutive frames, i.e.
e(x,y,t) = f(x,y,t) – f(x,y,t-1).
• For most frames, the errors are highly sparse.
98
99
Errors are not always sparse ☹
• Exceptions: (1) camera zoom-in and zoom-out,
(2) sudden changes in viewpoint or scene
content, or fade-in/fade-out effects. In such
cases, the errors will be large!
100
Motion compensation
• For each macro-block (typically 16 x 16 or 8 x 8 in size) in
the frame to be encoded, find the most similar macro-
block in a reference frame (which could be the previous
frame, but not necessarily so).
• The difference between the pixel locations of the top-left
corner of the two macro-blocks is called the motion
vector.
• The motion vector is considered to be constant within a
macro-block.
• Search for similar macro-blocks is restricted to a small
search window around the original macro-block.
• The search window is rectangular – wider than it is tall
(why?).
102
Motion Compensation
• For many macro-blocks, the motion vectors will be 0.
• The search for similar macro-blocks is performed at a
sub-pixel level (1/2 pixel or ¼ pixel) for more
accuracy. In this case, image intensity values need to
be interpolated.
• Macro-block similarity measure is one of the
following:
In color videos, only
luminance (i.e. Y)
channel is used in
similarity measure
103
Motion compensation
• Often there is more than one similar block. In such
cases, the spatially closest block is chosen.
• Note: we are not concerned with the accuracy of the motion
estimate. It is only a means towards the larger goal – of
compression.
• The inter-frame differences are computed after motion
compensation. This hugely improves the sparsity of these
differences.
• In other words, don’t blindly compute the difference between
macro-blocks at corresponding locations in frames t and t-1.
Compute the difference between the current macro-block in
frame t and its most similar match in frame t-1.
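The search described above can be sketched as an exhaustive block-matching routine. This is a minimal integer-pixel illustration with assumed names and parameters: it uses the sum of absolute differences (SAD) as the similarity measure, a square search window for simplicity, and breaks ties in favour of the spatially closest block. The synthetic frames are generated by a formula so that the true motion is known.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))

def best_motion_vector(cur, ref, top, left, size, search):
    """Return ((dy, dx), cost) of the best match within +/- search pixels."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > h or rx + size > w:
                continue                          # candidate leaves the frame
            cost = sad(cur, ref, top, left, ry, rx, size)
            key = (cost, dy * dy + dx * dx)       # ties: spatially closest wins
            if best is None or key < best[0]:
                best = (key, (dy, dx))
    return best[1], best[0][0]

def texture(y, x):
    """Deterministic synthetic intensity pattern (purely illustrative)."""
    return (31 * y * y + 17 * x * x + 7 * x * y + 3 * x + 5 * y) % 251

N = 16
ref = [[texture(y, x) for x in range(N)] for y in range(N)]
# Current frame: every pixel shows what the reference had 1 row down, 2 columns left.
cur = [[texture(y + 1, x - 2) for x in range(N)] for y in range(N)]

motion_vector, cost = best_motion_vector(cur, ref, top=6, left=6, size=4, search=3)
residual = [[cur[6 + i][6 + j] - ref[6 + motion_vector[0] + i][6 + motion_vector[1] + j]
             for j in range(4)] for i in range(4)]
print(motion_vector, cost)
```

After compensation the residual for this block is entirely zero, whereas the plain difference between co-located blocks would not be. That is the gain in sparsity referred to above.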
104
105
106
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-5.pdf
107
Motion Compensation
• For each macro-block (typically 16 x 16 or 8 x 8 in size) in
the frame to be encoded, find the most similar macro-block
in a reference frame.
• If the reference frame is the previous one or the previous
reference frame (see two slides later for more details), the
current frame (the one being encoded) is called the P-frame.
• If the reference frame is a combination of the previous
frame and the next one, the current frame is called the B-
frame.
• A B-frame can use two motion vectors – one for previous
and one for next frame.
108
Example: why do we need B-frames?
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-5.pdf
109
No Motion Compensation here!
• For some frames that rest on shot-boundaries
(i.e. sudden changes in content), there is no
advantage to performing motion compensation.
• Such frames are called I-frames (independent
frames). These usually act as reference frames
for other frames to be differentially encoded.
• I-frames can be detected by the frequent occurrence
of low similarity values during the
search for macro-blocks.
110
MPEG encoder
Note:
•YUV is a color-space
very similar to YCbCr.
• MV = motion vector
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-11-5.pdf
111
MPEG encoder
• The I-frames are encoded using JPEG.
• The B-frames and P-frames are also encoded using JPEG, but the
DCT is computed on macro-blocks from the motion-
compensated residual image as follows:
o We have already computed motion vectors w.r.t. the reference
frame.
o Compute a residual image by calculating differences between
macro-blocks in the current frame and the most-similar
matching macro-blocks (as given by the motion vector) from the
reference image. This is called motion-compensated frame
differencing.
o Note: the motion vector for the macro-block also needs to be
stored. For this, motion-vectors from several macro-blocks are
collected together and Huffman-encoded. Only the non-zero
motion vectors are encoded.
112
Display Order and Transmission Order (or order in which frames are
compressed) may be different!
http://www.cse.iitd.ernet.in/~pkalra/siv864/pdf/session-15-5.pdf
113
MPEG decoder
Note:
•YUV is a color-space
very similar to YCbCr.
114