Lecture 20


Transform Coding

• Predictive coding is a spatial-domain technique, since it operates on the pixel values directly.
• Transform coding techniques operate on the coefficients of a reversible linear transform of the image (e.g., DCT, DFT, Walsh).

Block diagram of a transform coding system:

Encoder: input image (N × N) → construct n × n subimages → forward transform → quantizer → symbol encoder → compressed image.

Decoder: compressed image → symbol decoder → inverse transform → merge n × n subimages → decompressed image.

• The input N × N image is subdivided into subimages of size n × n.
• The n × n subimages are converted into transform arrays. This tends to decorrelate the pixel values and to pack as much information as possible into the smallest number of coefficients.
• The quantizer selectively eliminates or coarsely quantizes the coefficients carrying the least information.
• The symbol encoder uses a variable-length code to encode the quantized coefficients.
• Any of the above steps can be adapted to each subimage (adaptive transform coding), based on local image information, or fixed for all subimages. (A block-splitting sketch follows below.)
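To make the block-based pipeline concrete, here is a minimal Python/NumPy sketch of the subimage-splitting and merging step only (the function names extract_blocks and merge_blocks are placeholders of ours, not from the lecture):

    import numpy as np

    def extract_blocks(image, n=8):
        # Split an N x N image into non-overlapping n x n subimages
        # (N is assumed to be a multiple of n).
        N = image.shape[0]
        return [image[r:r + n, c:c + n]
                for r in range(0, N, n)
                for c in range(0, N, n)]

    def merge_blocks(blocks, N, n=8):
        # Reassemble the n x n subimages into an N x N image, in the same
        # row-major order used by extract_blocks.
        out = np.zeros((N, N))
        positions = [(r, c) for r in range(0, N, n) for c in range(0, N, n)]
        for block, (r, c) in zip(blocks, positions):
            out[r:r + n, c:c + n] = block
        return out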
Walsh Transform (1-D case)
• Given a one-dimensional image (sequence) f(m), m = 0, 1, ..., N − 1, with N = 2^q, its Walsh transform W(u) is defined as:

  W(u) = \frac{1}{N} \sum_{m=0}^{N-1} f(m) \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u)}, \qquad u = 0, 1, \ldots, N-1,

  where b_k(z) is the k-th bit (from the LSB) in the binary representation of z.

• Note that the Walsh-Hadamard transform discussed in the text is very similar to the Walsh transform defined above.

• Example: Suppose z = 6 = 110 in binary representation. Then b_0(6) = 0, b_1(6) = 1 and b_2(6) = 1.

• The inverse Walsh transform of W(u) is given by

  f(m) = \sum_{u=0}^{N-1} W(u) \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u)}, \qquad m = 0, 1, \ldots, N-1.

• Verify that the “inverse works.” Let

  g(m) = \sum_{u=0}^{N-1} \left[ \frac{1}{N} \sum_{n=0}^{N-1} f(n) \prod_{i=0}^{q-1} (-1)^{b_i(n)\, b_{q-1-i}(u)} \right] \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u)}

  (the bracketed term is W(u) with m replaced by n)

  = \frac{1}{N} \sum_{n=0}^{N-1} f(n) \sum_{u=0}^{N-1} \prod_{i=0}^{q-1} (-1)^{(b_i(n) + b_i(m))\, b_{q-1-i}(u)}

  \stackrel{?}{=} f(m)   (HW problem!!)
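A direct O(N^2) implementation of the 1-D transform pair defined above might look like the following Python/NumPy sketch (illustrative only; the function names are made up):

    import numpy as np

    def walsh_kernel(m, u, q):
        # prod_i (-1)^(b_i(m) * b_{q-1-i}(u)), where b_i(z) is the i-th bit of z from the LSB.
        expo = sum(((m >> i) & 1) * ((u >> (q - 1 - i)) & 1) for i in range(q))
        return -1 if expo % 2 else 1

    def walsh_1d(f):
        # Forward Walsh transform of a length-N sequence, N = 2**q.
        N = len(f)
        q = int(np.log2(N))
        return np.array([sum(f[m] * walsh_kernel(m, u, q) for m in range(N)) / N
                         for u in range(N)])

    def walsh_1d_inv(W):
        # Inverse transform: same kernel, but without the 1/N factor.
        N = len(W)
        q = int(np.log2(N))
        return np.array([sum(W[u] * walsh_kernel(m, u, q) for u in range(N))
                         for m in range(N)])

For example, walsh_1d_inv(walsh_1d(np.array([1., 2., 3., 4.]))) recovers the input, which is exactly the statement of the homework problem above.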
Walsh Transform (2-D case)

• Given a two-dimensional N × N image f(m, n), with N = 2^q, its Walsh transform W(u, v) is defined as:

  W(u, v) = \frac{1}{N} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m, n) \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u) + b_i(n)\, b_{q-1-i}(v)}, \qquad u, v = 0, 1, \ldots, N-1.

• Similarly, the inverse Walsh transform is given by

  f(m, n) = \frac{1}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} W(u, v) \prod_{i=0}^{q-1} (-1)^{b_i(m)\, b_{q-1-i}(u) + b_i(n)\, b_{q-1-i}(v)}, \qquad m, n = 0, 1, \ldots, N-1.


• The Walsh transform is
   – Separable (the 2-D transform can be performed in terms of 1-D transforms along rows and then columns; see the sketch below).
   – Symmetric (the operations on the variables m, n are identical).
   – Such that the forward and inverse transforms are identical.
   – Free of trigonometric functions (just +1 and −1), so it is computationally simpler.
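Because the 2-D kernel factors into a product of 1-D kernels, a 2-D transform can reuse the 1-D routine sketched earlier (illustrative only; assumes numpy as np and walsh_1d from the previous sketch, and note the 1/N bookkeeping, since the 2-D definition carries only a single 1/N):

    def walsh_2d(f):
        # 2-D Walsh transform via separability: 1-D transform of every row,
        # then of every column.  Each 1-D pass contributes a 1/N factor, so
        # multiply by N once to match the single 1/N in the 2-D definition.
        N = f.shape[0]
        rows = np.array([walsh_1d(f[m, :]) for m in range(N)])    # rows[m, v]
        W = np.array([walsh_1d(rows[:, v]) for v in range(N)]).T  # W[u, v]
        return N * W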
Discrete Cosine Transform (DCT)
• Given a two-dimensional N × N image f(m, n), its discrete cosine transform (DCT) C(u, v) is defined as:

  C(u, v) = \alpha(u)\,\alpha(v) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m, n) \cos\!\left[\frac{(2m+1)u\pi}{2N}\right] \cos\!\left[\frac{(2n+1)v\pi}{2N}\right], \qquad u, v = 0, 1, \ldots, N-1,

  where

  \alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u = 1, 2, \ldots, N-1. \end{cases}

• Similarly, the inverse discrete cosine transform (IDCT) is given by

  f(m, n) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \alpha(u)\,\alpha(v)\, C(u, v) \cos\!\left[\frac{(2m+1)u\pi}{2N}\right] \cos\!\left[\frac{(2n+1)v\pi}{2N}\right], \qquad m, n = 0, 1, \ldots, N-1.
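For reference, SciPy's orthonormal type-II DCT uses exactly the α(u) normalization above, so (assuming SciPy is available) a 2-D DCT/IDCT pair can be sketched as:

    from scipy.fft import dctn, idctn

    def dct2(f):
        # 2-D DCT-II with the alpha(u) normalization defined above ('ortho' in SciPy).
        return dctn(f, norm='ortho')

    def idct2(C):
        # Inverse 2-D DCT; idct2(dct2(f)) recovers f up to floating-point error.
        return idctn(C, norm='ortho')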

• The DCT is
   – Separable (the 2-D transform can be performed in terms of 1-D transforms).
   – Symmetric (the operations on the variables m, n are identical).
   – Such that the forward and inverse transforms are identical.

• The DCT is the most popular transform for image compression algorithms like JPEG (still images) and MPEG (motion pictures).

• The more recent JPEG2000 standard uses wavelet transforms instead of the DCT.

• We will now look at a simple example of image compression using the DCT. We will come back to this in detail later.
Discrete Cosine Transform Example

[Figure: Original image and DCT reconstructions. Fraction of DCT coefficients used 0.85, MSE 0.45; fraction 0.65, MSE 1.6; fraction 0.41, MSE 4.]

Discrete Cosine Transform Example (continued)

[Figure: Fraction of DCT coefficients used 0.19, MSE 7.7; fraction 0.08, MSE 12.]
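The kind of experiment shown in these figures can be reproduced with a short sketch: keep only a given fraction of the largest-magnitude DCT coefficients and measure the mean-squared error. This is an illustrative version that works on the whole image for brevity (the lecture figures use subimages); it assumes numpy as np and the dct2/idct2 sketch above.

    def dct_compress(f, fraction=0.25):
        # Keep only the largest-magnitude DCT coefficients and reconstruct.
        C = dct2(f)
        k = max(1, int(round(fraction * C.size)))   # number of coefficients kept
        cutoff = np.sort(np.abs(C), axis=None)[-k]  # magnitude of the k-th largest
        return idct2(np.where(np.abs(C) >= cutoff, C, 0.0))

    def mse(f, g):
        # Mean-squared error between the original and the reconstruction.
        return float(np.mean((f - g) ** 2))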
Transform Selection
• Commonly used ones are Karhunen-Loeve (Hotelling) transform
(KLT), discrete cosine transform (DCT), discrete Fourier
transform (DFT), Walsh-Hadamard transform (WHT).

• The choice depends on the computational resources available and the reconstruction error that can be tolerated.

• This step by itself is lossless and does not lead to compression; it is the quantization of the resulting coefficients that produces compression.

• The KLT is optimal in terms of packing the most information into any given fixed number of coefficients.

• However, the KLT is data dependent: computing it requires computing the correlation matrix of the image pixel values (a sketch of one such computation follows this list).

• The “non-sinusoidal” transforms like the WHT are easy to compute and implement (no multiplications and no trigonometric function evaluations).

• The performance of “sinusoidal” transforms like the DCT and DFT, in terms of information-packing capability, closely approximates that of the KLT.

• The DCT is by far the most popular choice and is used in the JPEG (Joint Photographic Experts Group) image standard.
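One possible way the data-dependent KLT basis could be estimated from an ensemble of subimages is sketched below (an assumption about the setup, not the procedure used for the figures; assumes numpy as np, with subimages coming e.g. from extract_blocks above):

    def klt_basis(blocks):
        # Estimate a KLT (Hotelling) basis from an ensemble of subimages:
        # eigenvectors of the covariance matrix of the flattened blocks,
        # ordered by decreasing eigenvalue (variance).
        X = np.array([b.flatten() for b in blocks])   # shape: (num_blocks, n*n)
        X = X - X.mean(axis=0)                        # remove the ensemble mean
        cov = (X.T @ X) / (X.shape[0] - 1)            # (n*n) x (n*n) covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
        return eigvecs[:, np.argsort(eigvals)[::-1]]  # columns = basis, most energetic first

    # Coefficients for one (mean-removed, flattened) block: klt_basis(blocks).T @ x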
[Figure: Reconstructed and error images for the DCT (RMSE = 0.018), DFT (RMSE = 0.028), and WHT (RMSE = 0.023).]

DCT/DFT/WHT comparison for 8 × 8 subimages, with 25% of the coefficients (those with largest magnitude) retained. Note also the blocking artifact.
Subimage Size Selection
• Images are subdivided into subimages of size n × n to reduce the correlation (redundancy) between adjacent subimages.

• Usually n = 2^k for some integer k. This simplifies the computation of the transforms (e.g., via FFT algorithms).

• Typical block sizes used in practice are 8 × 8 and 16 × 16.

[Plot: Comparison of different transforms (DCT, DFT, Hadamard) and block sizes for transform coding; normalized RMS error (roughly 0.016 to 0.032) versus subimage size.]
Bit Allocation
• After transforming each subimage, only a fraction of the coefficients are retained. This can be done in two ways:
   – Zonal coding: transform coefficients with large variance are retained. The same set of coefficients is retained in all subimages.
   – Threshold coding: transform coefficients with large magnitude in each subimage are retained. A different set of coefficients is retained in different subimages.

• The retained coefficients are quantized and then encoded.

• The overall process of truncating, quantizing, and coding the transformed coefficients of a subimage is called bit allocation.
[Figure: Reconstructed and error images for threshold coding (RMSE = 0.029) and zonal coding (RMSE = 0.038).]

Comparison of zonal and threshold coding for 8 × 8 DCT subimages, with 12.5% of the coefficients retained in each case.
Zonal Coding:

• Transform coefficients with large variance carry most of the information about the image. Hence a fraction of the coefficients with the largest variance is retained.

• The variance of each coefficient is calculated from the ensemble of (N/n)^2 transformed subimages, or using a statistical image model. Recall Project #2 on the DCT.

• The coefficients with maximum variance are usually located around the origin of an image transform. The retained set is usually represented as a 0-1 mask.

• The same set of coefficients is retained in each subimage.

• The retained coefficients are then quantized and coded. Two possible ways:
   – The retained coefficients are normalized with respect to their standard deviations and are all allocated the same number of bits. A uniform quantizer is then used.
   – A fixed number of bits is distributed among all the coefficients (based on their relative importance). An optimal quantizer, such as a Lloyd-Max quantizer, is designed for each coefficient.

(A mask-construction sketch follows below.)
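One way such a zonal mask could be built from the ensemble of transformed subimages (illustrative only; assumes numpy as np and the dct2 and extract_blocks sketches above, with the fraction to keep as a free parameter):

    def zonal_mask(image, n=8, fraction=0.125):
        # Transform every n x n subimage, estimate the variance of each
        # coefficient position over the ensemble of (N/n)^2 blocks, and keep
        # the positions with the largest variance (same mask for every block).
        coeffs = np.array([dct2(b) for b in extract_blocks(image, n)])
        variances = coeffs.var(axis=0)                 # per-position variance
        k = max(1, int(round(fraction * n * n)))       # number of positions kept
        cutoff = np.sort(variances, axis=None)[-k]
        return (variances >= cutoff).astype(int)       # 0-1 zonal mask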
Threshold Coding:

• In each subimage, the transform coefficients of largest magnitude contribute most significantly and are therefore retained.

• A different set of coefficients is retained in each subimage, so this is an adaptive transform coding technique.

• The thresholding can be represented as T(u, v) m(u, v), where m(u, v) is a masking function:

  m(u, v) = \begin{cases} 0, & \text{if } T(u, v) \text{ satisfies some truncation criterion} \\ 1, & \text{otherwise.} \end{cases}

• The elements of T(u, v) m(u, v) are reordered in a predefined manner (the zigzag ordering shown below) to form a 1-D sequence of length n^2. This sequence has several long runs of zeros, which are run-length encoded. (A zigzag/run-length sketch follows the tables below.)

Typical threshold mask:

1 1 1 0 1 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Zigzag ordering of coefficients:

 0  1  5  6 14 15 27 28
 2  4  7 13 16 26 29 42
 3  8 12 17 25 30 41 43
 9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
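A sketch of the zigzag reordering and a simple zero run-length encoding (illustrative only; the actual entropy coding in JPEG combines run lengths with coefficient size categories):

    def zigzag_order(n=8):
        # (row, col) positions of an n x n block in zigzag-scan order:
        # traverse anti-diagonals d = row + col, alternating direction.
        return sorted(((r, c) for r in range(n) for c in range(n)),
                      key=lambda rc: (rc[0] + rc[1],
                                      rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def run_length_encode(block):
        # Zigzag-scan a thresholded/quantized block (2-D NumPy array) and
        # encode it as (run_of_zeros, nonzero_value) pairs, with an
        # end-of-block marker standing in for the trailing zeros.
        seq = [block[r, c] for r, c in zigzag_order(block.shape[0])]
        pairs, run = [], 0
        for v in seq:
            if v == 0:
                run += 1
            else:
                pairs.append((run, v))
                run = 0
        pairs.append('EOB')
        return pairs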


• The retained coefficients are encoded using a suitable variable-length code.

• The thresholding itself can be done in three different ways, depending on the “truncation criterion”:
   – A single global threshold is applied to all subimages. The level of compression differs from image to image, depending on the number of coefficients that exceed the threshold.
   – N-largest coding: the largest N coefficients are retained in each subimage. Therefore, a different threshold is used for each subimage. The resulting code rate (total number of bits required) is fixed and known in advance.
   – The threshold is varied as a function of the location of each coefficient within the subimage. This results in a variable code rate (compression ratio).


• Here, the thresholding and quantization steps can together be represented as

  \hat{T}(u, v) = \mathrm{round}\!\left[ \frac{T(u, v)}{Z(u, v)} \right],

  where T(u, v) is the original transform coefficient, \hat{T}(u, v) is the thresholded and quantized value, and Z(u, v) is the normalization factor taken from a transform normalization matrix Z. A typical example of Z is shown below.

16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99

– The values in the Z matrix weight the transform coefficients according to heuristically determined perceptual (psychovisual) importance: the larger the value of Z(u, v), the smaller the importance of that coefficient.

– The Z matrix may be scaled to obtain different levels of compression.

– At the decoding end, T(u, v) = \hat{T}(u, v)\, Z(u, v) is used to denormalize the transform coefficients before the inverse transformation. (A sketch of the quantize/denormalize pair follows below.)
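A sketch of this normalization step and its inverse (assumes an n × n array of transform coefficients and the Z matrix above stored as a NumPy array; the scale parameter is an added illustration of scaling Z for different compression levels):

    import numpy as np

    def quantize(T, Z, scale=1.0):
        # T_hat(u, v) = round( T(u, v) / (scale * Z(u, v)) ).
        # A larger scale gives coarser quantization and more compression.
        return np.round(T / (scale * Z)).astype(int)

    def dequantize(T_hat, Z, scale=1.0):
        # Decoder side: denormalize before the inverse transform.
        return T_hat * (scale * Z)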
Example: Coding with different Z

[Figure: Reconstructed and error images. With quantization matrix Z: RMSE = 0.023 (9152 nonzero coefficients). With quantization matrix 8Z: RMSE = 0.048 (2389 nonzero coefficients).]
