Image Compression
Wen-Nung Lie, CCU, Taiwan
Applications of image compression
Televideo conferencing,
Remote sensing (satellite imagery),
Document and medical imaging,
Facsimile transmission (FAX),
Control of remotely piloted vehicles in military,
space, and hazardous waste management
Fundamentals
What is data and what is information?
Data are the means by which information is conveyed.
Various amounts of data may be used to represent the
same amount of information
Data redundancy
Coding redundancy
Interpixel redundancy
Psychovisual redundancy
Coding redundancy
The graylevel histogram of an image can provide a great
deal of insight into the construction of codes
The average number of bits used to represent each pixel:
L_avg = \sum_{k=0}^{L-1} l(r_k) p_r(r_k)
where l(r_k) is the number of bits used to represent graylevel r_k and p_r(r_k) is its probability
Variable-length coding (VLC): higher probability, shorter bit length
L_avg = 3.0 bits (code 1, fixed length) vs. 2.7 bits (code 2, variable length)
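As a quick sketch (NumPy assumed; the probabilities and variable code lengths below follow the standard textbook example and reproduce the 3.0- and 2.7-bit figures):

    import numpy as np

    # Graylevel probabilities p_r(r_k) and two candidate code-length tables:
    # a 3-bit fixed-length code vs. a variable-length code that gives
    # shorter codes to the more probable levels.
    p = np.array([0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02])
    l_code1 = np.full(8, 3)                       # fixed length
    l_code2 = np.array([2, 2, 2, 3, 4, 5, 6, 6])  # variable length

    print(np.sum(l_code1 * p))  # L_avg = 3.0 bits/pixel
    print(np.sum(l_code2 * p))  # L_avg = 2.7 bits/pixel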
Interpixel redundancy
Autocorrelation coefficients:
\gamma(\Delta n) = A(\Delta n) / A(0)
where
A(\Delta n) = \frac{1}{N - \Delta n} \sum_{y=0}^{N-1-\Delta n} f(x,y) f(x, y+\Delta n)
Interpixel redundancy
Spatial redundancy
Geometrical redundancy
Interframe redundancy
The larger the autocorrelation, the more the interpixel redundancy
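A minimal sketch of the coefficient above for one image row (NumPy assumed; the smooth test signal is illustrative):

    import numpy as np

    def autocorr_coeff(row, dn):
        """gamma(dn) = A(dn) / A(0) along one image row."""
        row = row.astype(float)
        N = len(row)
        A = lambda d: np.mean(row[:N - d] * row[d:])  # 1/(N-d) * sum f(y) f(y+d)
        return A(dn) / A(0)

    row = np.cumsum(np.random.rand(256))  # smooth, highly correlated test row
    print(autocorr_coeff(row, 1))         # near 1: high interpixel redundancy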
Run-length coding to remove spatial redundancy
For the example binary image: C_R = 2.63, R_D = 1 - 1/2.63 = 0.62
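A sketch of 1-D run-length coding of a binary row, with the compression ratio and relative data redundancy computed as above (the 8-bit run counter is an assumption):

    import numpy as np

    def run_lengths(row):
        """Return (first value, run lengths) of a binary row."""
        flips = np.flatnonzero(np.diff(row)) + 1          # positions where the value changes
        bounds = np.concatenate(([0], flips, [len(row)]))
        return row[0], np.diff(bounds)

    row = np.array([0] * 20 + [1] * 5 + [0] * 39)
    first, runs = run_lengths(row)         # -> 0, [20 5 39]
    C_R = len(row) / (8.0 * len(runs))     # assume an 8-bit count per run
    print(C_R, 1 - 1 / C_R)                # ~2.67 and ~0.62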
Psychovisual redundancy
Certain information simply has less relative importance than other information in normal visual processing -- psychovisual redundancy
The elimination of psychovisually redundant data results
in a loss of quantitative information – called quantization
⇒ lossy data compression
E.g., quantization in graylevels; line interlacing in TV (reduced video scanning rate)
[Figure: (b) false contouring caused by coarse graylevel quantization; (c) improved graylevel quantization via dithering]
Fidelity criteria
Objective fidelity criterion
Root-mean-square error:
e_rms = \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x,y) - f(x,y)]^2 \right]^{1/2}
Mean-square signal-to-noise ratio:
SNR_ms = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x,y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x,y) - f(x,y)]^2}
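Both criteria in NumPy (a sketch; f is the original image, f_hat the reconstruction):

    import numpy as np

    def e_rms(f, f_hat):
        """Root-mean-square error between image f and reconstruction f_hat."""
        d = f_hat.astype(float) - f.astype(float)
        return np.sqrt(np.mean(d ** 2))

    def snr_ms(f, f_hat):
        """Mean-square signal-to-noise ratio of the reconstruction."""
        f_hat = f_hat.astype(float)
        d = f_hat - f.astype(float)
        return np.sum(f_hat ** 2) / np.sum(d ** 2)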
Source encoder model
Mapper
Transforms the input data into a format designed to reduce interpixel redundancies in the image (reversible)
Quantizer
Reduces the accuracy of the mapper's output (reduces the psychovisual redundancies) -- irreversible, and must be omitted in error-free compression
Symbol encoder
Fixed- or variable-length coder (assigns the shortest code words to the most frequently occurring outputs to reduce coding redundancies) -- reversible
Channel encoder and decoder
Designed to reduce the impact of channel noise by
inserting a controlled form of redundancy into the source
encoded data
Hamming code as a channel code
7-bit Hamming (7,4) -- the minimum distance is 3, so it can correct one bit error
Data bits occupy h3 = b3, h5 = b2, h6 = b1, h7 = b0; h1, h2, h4 are even-parity bits:
h1 = b3 ⊕ b2 ⊕ b0
h2 = b3 ⊕ b1 ⊕ b0
h4 = b2 ⊕ b1 ⊕ b0
A single-bit error is indicated by a nonzero parity word c4 c2 c1:
c1 = h1 ⊕ h3 ⊕ h5 ⊕ h7
c2 = h2 ⊕ h3 ⊕ h6 ⊕ h7
c4 = h4 ⊕ h5 ⊕ h6 ⊕ h7
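A sketch of the (7,4) encoder and single-error-correcting decoder defined by the parity equations above (function names are mine):

    def hamming74_encode(b3, b2, b1, b0):
        """Return [h1..h7] with even-parity bits at positions 1, 2, 4."""
        h1 = b3 ^ b2 ^ b0
        h2 = b3 ^ b1 ^ b0
        h4 = b2 ^ b1 ^ b0
        return [h1, h2, b3, h4, b2, b1, b0]

    def hamming74_decode(h):
        """The nonzero parity word c4c2c1 is the 1-based error position."""
        h1, h2, h3, h4, h5, h6, h7 = h
        c1 = h1 ^ h3 ^ h5 ^ h7
        c2 = h2 ^ h3 ^ h6 ^ h7
        c4 = h4 ^ h5 ^ h6 ^ h7
        pos = 4 * c4 + 2 * c2 + c1
        if pos:
            h[pos - 1] ^= 1              # correct the single-bit error
        return h[2], h[4], h[5], h[6]    # recover b3, b2, b1, b0

    cw = hamming74_encode(1, 0, 1, 1)
    cw[3] ^= 1                           # inject one channel error
    print(hamming74_decode(cw))          # -> (1, 0, 1, 1)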
Information theory
Explores the minimum amount of data sufficient to completely describe an image without loss of information
Self-information of an event E with probability P(E):
I(E) = \log (1 / P(E)) = -\log P(E)
P(E) = 1 ⇒ I(E) = 0
P(E) = 1/2 ⇒ I(E) = 1 bit (base-2 logarithm)
Probability of channel output b_k, given source probabilities P(a_j) and conditional probabilities P(b_k | a_j):
P(b_k) = \sum_{j=1}^{J} P(b_k | a_j) P(a_j)
Mutual information
I(z, v) = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j, b_k)}{P(a_j) P(b_k)}
        = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j) q_{kj} \log \frac{q_{kj}}{\sum_{i=1}^{J} P(a_i) q_{ki}}
where Q = {q_kj} is the channel matrix of transition probabilities q_kj = P(b_k | a_j)
Channel capacity
Capacity : the maximum rate at which information can be
transmitted reliably through the channel
Channel capacity does not depend on the input probability
of the source (i.e., on how the channel is used) but is a
function of the conditional probabilities defining the
channel alone (i.e., Q)
Binary entropy function
To estimate source entropy: for a binary source z = [p_bs, 1 - p_bs]^T,
H(z) = H_bs(p_bs) = -p_bs \log_2 p_bs - (1 - p_bs) \log_2 (1 - p_bs)
max H_bs = 1 bit, attained at p_bs = 0.5
Binary symmetric channel (BSC)
Error probability p_e (write \bar{p}_e = 1 - p_e and \bar{p}_{bs} = 1 - p_bs); used to estimate channel capacity
channel matrix: Q = [\bar{p}_e, p_e; p_e, \bar{p}_e]
probability of received symbols: v = Qz = [\bar{p}_e p_bs + p_e \bar{p}_{bs}; p_e p_bs + \bar{p}_e \bar{p}_{bs}]
mutual information: I(z, v) = H_bs(\bar{p}_e p_bs + p_e \bar{p}_{bs}) - H_bs(p_e)
I(z, v) = 0 when p_bs = 0 or 1
I(z, v) = 1 - H_bs(p_e) = C when p_bs = 0.5, for any p_e
When p_e = 0 or 1 ⇒ C = 1 bit
When p_e = 0.5 ⇒ C = 0 bits (no information transmitted)
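The binary entropy function and BSC mutual information, as a sketch:

    import numpy as np

    def H_bs(p):
        """Binary entropy function (bits)."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def bsc_mutual_info(p_bs, p_e):
        """I(z, v) = H_bs(p_bs(1-p_e) + (1-p_bs)p_e) - H_bs(p_e)."""
        return H_bs(p_bs * (1 - p_e) + (1 - p_bs) * p_e) - H_bs(p_e)

    print(bsc_mutual_info(0.5, 0.1))  # capacity C = 1 - H_bs(0.1), ~0.531 bit
    print(bsc_mutual_info(0.5, 0.5))  # 0 bits: nothing gets through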
Shannon's first theorem for noiseless coding
Define the minimum average code length per source
symbol that can be achieved
For a zero-memory source (with finite ensemble (A, z) and
statistically independent source symbols)
Single symbols: A = {a1, a2, ..., aJ}
Block symbols (n-tuples of symbols): A' = {α1, α2, ..., α_{J^n}}
H(z') = n H(z)   (entropy of a zero-memory source with block symbols)
N-extended source
Use n symbols as a block symbol
The average word length of the n-extended code:
L'_avg = \sum_{i=1}^{J^n} P(α_i) l(α_i),  with l(α_i) = \lceil \log (1 / P(α_i)) \rceil
H(z') ≤ L'_avg < H(z') + 1
H(z) ≤ L'_avg / n < H(z) + 1/n
\lim_{n→∞} [L'_avg / n] = H(z)
It is possible to make L'_avg / n arbitrarily close to H(z) by coding infinitely long extensions of the source
Coding efficiency
Coding efficiency: η = n H(z) / L'_avg
Example:
Original source: P(a1) = 2/3, P(a2) = 1/3, so H(z) = 0.918 bits/symbol
L'_avg = 1 bit/symbol ⇒ η = 0.918 / 1 = 0.918
Second extension of the source: z' = {4/9, 2/9, 2/9, 1/9}
L'_avg = 17/9 = 1.89 bits per block symbol ⇒ η = 2 × 0.918 / 1.89 = 1.83 / 1.89 = 0.97
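The example can be verified numerically (the 1/2/3/3-bit lengths are the Huffman code lengths for the second-extension probabilities):

    import numpy as np

    def entropy(p):
        p = np.asarray(p, float)
        return -np.sum(p * np.log2(p))

    H = entropy([2/3, 1/3])            # ~0.918 bits/symbol
    print(H / 1.0)                     # eta for the original source: ~0.918

    p2 = [4/9, 2/9, 2/9, 1/9]          # second extension
    L2 = np.dot(p2, [1, 2, 3, 3])      # = 17/9 ~ 1.89 bits per block symbol
    print(2 * H / L2)                  # eta = nH(z)/L'_avg ~ 0.97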
Shannon's second theorem for noisy coding
For any R < C (the capacity of the zero-memory channel with matrix Q), there exist codes of block length r and rate R such that the block decoding error probability can be made arbitrarily small
Rate distortion function
given a distortion D, the minimum rate at which
information about the source can be conveyed to the user
Q: the matrix of transition probabilities from source symbols to output symbols induced by compression; d(Q): average distortion; D: distortion threshold
Q_D = { q_kj : d(Q) ≤ D }
R(D) = \min_{Q ∈ Q_D} I(z, v)
Example of rate-distortion function
Consider a zero-memory binary source with equally
probable source symbols {0,1}
R(D) = 1 - H_bs(D)
R(D) is monotonically decreasing and convex in the interval (0,Dmax)
Shape of R-D function represents the coding efficiency of a coder
Estimate the information content (entropy) of an image
Construct a source model based on the relative frequency
of occurrence of the graylevels
First-order estimate (consider histogram of single pixels) : 1.81
bits/pixel
Second-order estimate (consider histogram of pairs of adjacent
pixels) : 1.25 bits/pixel
Third-order, fourth-order … : approaching the source entropy
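A sketch of the first- and second-order estimates (NumPy assumed; pairs are taken from horizontally adjacent pixels):

    import numpy as np

    def first_order_entropy(img):
        """Entropy of the graylevel histogram, in bits/pixel."""
        _, counts = np.unique(img, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def second_order_entropy(img):
        """Half the entropy of horizontally adjacent pixel pairs (bits/pixel)."""
        pairs = np.stack([img[:, :-1].ravel(), img[:, 1:].ravel()], axis=1)
        _, counts = np.unique(pairs, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p)) / 2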
Error-free compression -- variable-length coding (VLC)
Huffman coding
the most popular method to yield the smallest possible number of
code symbols per source symbol
construct the Huffman tree according to source symbol
probabilities
Code the Huffman tree
Compute the source entropy, average code length, and code
efficiency
Huffman code is :
a block code (each symbol is mapped to a fixed sequence of bits)
instantaneous (decoded without referencing succeeding symbols)
uniquely decodable (no code word is a prefix of another)
for example : 010100111100 → a3 a1 a2 a2 a6
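A compact Huffman-tree construction (a sketch; the example probabilities are illustrative, not the code table behind this slide's example):

    import heapq

    def huffman_code(probs):
        """Build {symbol: bitstring} from {symbol: probability}."""
        # heap entries: (probability, tie-breaker, {symbol: partial code})
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)   # two least-probable nodes
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in c0.items()}
            merged.update({s: "1" + c for s, c in c1.items()})
            heapq.heappush(heap, (p0 + p1, count, merged))
            count += 1
        return heap[0][2]

    print(huffman_code({"a1": 0.4, "a2": 0.3, "a3": 0.1,
                        "a4": 0.1, "a5": 0.06, "a6": 0.04}))
    # more probable symbols receive shorter, prefix-free code words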
Variations of VLC
Variations of VLC (cont.)
Used when the number of source symbols to be coded is large
Truncated Huffman coding
Only a subset of the symbols (those with the highest probabilities; here the top 12) is encoded with a Huffman code
A prefix code (whose probability is the sum of the remaining symbols' probabilities; here "10"), followed by a fixed-length code, represents all the other symbols
B-code
Each code word is made up of continuation bits (C) and
information bits (natural binary numbers)
C can be 1 or 0, alternating between successive code words
E.g., a11a2 a7 → 001010101000010 or 101110001100110
Variations of VLC (cont.)
Binary shift code
Divide the symbols into blocks of equal size
Code the individual elements within all blocks identically
Add shift-up or shift-down symbols to identify each block (here,
“111”)
Huffman shift code
Select one reference block
Sum up probabilities of all other blocks and use it to determine the
shift symbol by Huffman method (here, “00”)
Arithmetic coding
AC generates non-block codes (no
look-up table as in Huffman code)
An entire sequence of source symbols
is assigned a single code word
The code word itself defines an
interval of real numbers between 0 and
1. As the number of symbols in the
message increases, the interval
becomes smaller (more information
bits to represent this real number)
Multiple symbols are jointly coded with one set of bits → the number of bits per symbol is effectively fractional
Arithmetic coding (cont.)
Example : coding of a1a2a3a3a4
Any real number between [0.06752, 0.0688] can represent
the source symbols (e.g., 0.000100011=0.068359375)
Theoretically, 0.068 is enough to represent the source
symbol → 3 decimal digits → 0.6 decimal digits per
symbol
Actually, 0.068359375 → 9 bits for 5 symbols → 1.8 bits
per symbol
As the length of the source symbol sequence increases, the
resulting arithmetic code approaches the Shannon’s bound
Two factors limit the coding performance:
An end-of-message indicator is necessary to separate one message sequence from another
Finite-precision arithmetic
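A sketch of the interval-shrinking step; with the source probabilities {a1: 0.2, a2: 0.2, a3: 0.4, a4: 0.2} it reproduces the [0.06752, 0.0688) interval of the example above (floating-point arithmetic, so expect tiny rounding noise):

    def arithmetic_interval(message, probs):
        """Shrink [low, high) once per symbol; any number inside encodes it."""
        cum, start = {}, 0.0
        for s, p in probs.items():      # cumulative start of each subinterval
            cum[s] = start
            start += p
        low, high = 0.0, 1.0
        for s in message:
            width = high - low
            low, high = low + width * cum[s], low + width * (cum[s] + probs[s])
        return low, high

    probs = {"a1": 0.2, "a2": 0.2, "a3": 0.4, "a4": 0.2}
    print(arithmetic_interval(["a1", "a2", "a3", "a3", "a4"], probs))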
LZW (Lempel-Ziv-Welch) coding
Assign fixed-length code words to variable-length
sequences of source symbols but require no a priori
knowledge of the probabilities of occurrence of symbols
LZW compression has been integrated into GIF, TIFF, and
PDF formats
For a 9-bit (512-word) dictionary, the latter half of the entries is constructed during the encoding process. An entry may represent a sequence of two or more pixels and is still assigned a single 9-bit code
If the size of the dictionary is too small, the detection of matching
gray-level sequences will be less likely.
If too large, the size of code words will adversely affect
compression performance
LZW coding (cont.)
The LZW decoder builds an identical decompression dictionary as it simultaneously decodes the bit stream
Note that dictionary overflow must be handled
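An encoder sketch; with an 8-bit source the dictionary starts with entries 0-255 and grows from 256 (the variable code width and overflow handling used in GIF/TIFF are omitted):

    def lzw_encode(seq, num_symbols=256):
        """Emit dictionary indices of the longest already-seen sequences."""
        dictionary = {(s,): s for s in range(num_symbols)}
        out, cur = [], ()
        for s in seq:
            if cur + (s,) in dictionary:
                cur = cur + (s,)                    # grow the current match
            else:
                out.append(dictionary[cur])
                dictionary[cur + (s,)] = len(dictionary)  # new entry (>= 256)
                cur = (s,)
        out.append(dictionary[cur])
        return out

    # the second repetition of the 4-pixel pattern reuses entries 256 and 258
    print(lzw_encode([39, 39, 126, 126, 39, 39, 126, 126]))
    # -> [39, 39, 126, 126, 256, 258]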
Bit-plane coding
Bit-plane decomposition
Binary : bit change between two adjacent codes may be
significant (e.g., 127 and 128)
Gray code : only 1 bit is changed between any two
adjacent codes
For an m-bit graylevel a_{m-1} 2^{m-1} + a_{m-2} 2^{m-2} + ... + a_1 2^1 + a_0 2^0:
g_i = a_i ⊕ a_{i+1},  0 ≤ i ≤ m-2
g_{m-1} = a_{m-1}
Gray-coded bit planes are less complex than the
corresponding binary bit planes
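The forward mapping is a one-liner, and it works elementwise on a NumPy image (a sketch):

    import numpy as np

    def binary_to_gray(a):
        """g = a XOR (a >> 1), i.e. g_i = a_i ^ a_(i+1) and g_(m-1) = a_(m-1)."""
        return a ^ (a >> 1)

    print(binary_to_gray(127), binary_to_gray(128))  # 64, 192: differ in 1 bit

    img = np.arange(256, dtype=np.uint8).reshape(16, 16)
    gray = binary_to_gray(img)
    planes = [(gray >> i) & 1 for i in range(8)]     # 8 Gray-coded bit planes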
[Figure: binary-coded vs. Gray-coded bit planes]
Lossless compression for binary images
1-D run-length coding
RLC+VLC according to run-lengths statistics
2-D run-length coding
used for FAX image compression
Relative address coding (RAC)
based on the principle of tracking the binary transitions that
begin and end each black and white run
combined with VLC
Other methods for lossless compression of binary images
White block skipping
Code solid white lines as 0 and all other lines with a 1 followed by the original bit pattern
Direct contour tracing
represent each contour by a single boundary point and a set of
directionals
Predictive differential quantizing (PDQ)
a scan-line-oriented contour tracing procedure
Comparisons of various algorithms
Only entropies after pixel mapping are computed, instead of performing real encoding (see the next slide)
Comparison of various methods
Run-length coding proved to
be the best coding method for
bit-plane coded images
2-D techniques (PDQ, DDC,
RAC) perform better when
compressing binary images or
higher bit-plane images
Gray-coded images proved to
gain additional 1 bit/pixel
compression efficiency relative
to binary-coded images
Methods of prediction
Use a local, global, or adaptive predictor to generate the prediction f̂_n
linear prediction : linear combination of m previous pixels
f̂_n = round[ \sum_{i=1}^{m} α_i f_{n-i} ]
m : order of prediction
αi : prediction coefficients
Use local neighborhoods for prediction of pixel X in 2-D images, e.g., the causal template
2 3 4
1 X
Special case: previous-pixel predictor
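A 1-D sketch of the predictor and its residuals (the previous-pixel case is m = 1, α_1 = 1):

    import numpy as np

    def predict_row(row, alphas):
        """Linear prediction along a row; returns predictions and residuals."""
        m = len(alphas)
        f_hat = np.zeros(len(row))
        for n in range(m, len(row)):
            # alphas[0] weights f_(n-1), alphas[1] weights f_(n-2), ...
            f_hat[n] = round(np.dot(alphas, row[n - m:n][::-1]))
        return f_hat, row[m:] - f_hat[m:]

    row = np.array([100, 102, 104, 107, 110, 111, 112], dtype=float)
    _, e = predict_row(row, [1.0])   # previous-pixel predictor
    print(e)                         # small residuals: [2 2 3 3 1 1]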
Entropy reduction by difference mapping
Due to the removal of interpixel redundancies by prediction, the first-order entropy of the difference mapping is lower than that of the original image (3.96 bits/pixel vs. 6.81 bits/pixel)
The probability density function of the prediction errors is
highly peaked at zero and characterized by a relatively small
variance (modeled by the zero mean uncorrelated Laplacian pdf)
p_e(e) = \frac{1}{\sqrt{2} σ_e} e^{-\sqrt{2} |e| / σ_e}
or equivalently, with λ = \sqrt{2} / σ_e,
p_e(e) = \frac{λ}{2} e^{-λ |e|}
Lossy predictive coding
Error-free encoding of images seldom results in more than
3:1 reduction in data
A lossy predictive coding model
the prediction in the encoder and decoder must be equivalent (same)
-- placing encoder’s predictor within a feedback loop
\dot{f}_n = \dot{e}_n + \hat{f}_n   (the dot denotes quantized quantities)
Delta modulation (DM)
DM: \hat{f}_n = α \dot{f}_{n-1},   \dot{e}_n = +ζ if e_n > 0, -ζ otherwise
The resulting bit rate is 1 bit/pixel
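A sketch that also shows DM's two failure modes, granular noise on flat regions and slope overload on edges (alpha and zeta values are illustrative):

    import numpy as np

    def delta_modulate(f, alpha=1.0, zeta=4.0):
        """1 bit/sample DM coder; returns the bits and the decoded signal."""
        f_dot = np.zeros(len(f))
        f_dot[0] = f[0]                   # assume the first sample is sent as-is
        bits = []
        for n in range(1, len(f)):
            f_hat = alpha * f_dot[n - 1]  # prediction from the previous output
            e = f[n] - f_hat
            bits.append(e > 0)
            f_dot[n] = f_hat + (zeta if e > 0 else -zeta)
        return bits, f_dot

    sig = np.array([14, 15, 14, 15, 13, 15, 30, 60, 70, 72], dtype=float)
    print(delta_modulate(sig)[1])  # chatter on the flat part, lag on the jump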
Optimal predictors
Minimize the encoder's mean-square prediction error:
E{e_n^2} = E{[f_n - \hat{f}_n]^2} = E{[f_n - \sum_{i=1}^{m} α_i f_{n-i}]^2}
The solution: α = R^{-1} r, with
r = [E{f_n f_{n-1}}, E{f_n f_{n-2}}, ..., E{f_n f_{n-m}}]^T
α = [α_1, α_2, ..., α_m]^T
R: the m × m autocorrelation matrix with elements R_{ij} = E{f_{n-i} f_{n-j}}
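A 1-D sketch: build R and r from sample autocorrelations and solve the normal equations; the AR(1) test signal (coefficient 0.95) is an assumption chosen so the correct answer is known:

    import numpy as np

    def optimal_predictor(f, m):
        """Solve R alpha = r using empirical autocorrelations of a 1-D signal."""
        f = f - f.mean()
        N = len(f)
        corr = lambda k: np.dot(f[:N - k], f[k:]) / (N - k)  # E{f_n f_(n-k)}
        R = np.array([[corr(abs(i - j)) for j in range(m)] for i in range(m)])
        r = np.array([corr(k) for k in range(1, m + 1)])
        return np.linalg.solve(R, r)

    rng = np.random.default_rng(0)
    f = np.zeros(10000)
    for n in range(1, len(f)):
        f[n] = 0.95 * f[n - 1] + rng.standard_normal()
    print(optimal_predictor(f, 2))   # ~[0.95, 0]: matches the AR(1) model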
Optimal predictors (cont)
Variance of the prediction error:
σ_e^2 = σ^2 - α^T r = σ^2 - \sum_{i=1}^{m} E{f_n f_{n-i}} α_i
For a 2-D Markov source with separable autocorrelation function
E{f(x,y) f(x-i, y-j)} = σ^2 ρ_v^i ρ_h^j
and the fourth-order predictor
f̂(x,y) = α_1 f(x, y-1) + α_2 f(x-1, y-1) + α_3 f(x-1, y) + α_4 f(x-1, y+1)
the optimal coefficients are α_1 = ρ_h, α_2 = -ρ_v ρ_h, α_3 = ρ_v, α_4 = 0
We generally have \sum_{i=1}^{m} α_i ≤ 1.0
Optimal quantization
The staircase quantization function is an odd function that
can be described by the decision ( si ) and reconstruction ( ti )
levels
Quantizer design is to select the best si and ti for a
particular optimization criterion and input probability
density function p(s)
L-level Lloyd-Max quantizer
Optimal in the mean-square error sense
\int_{s_{i-1}}^{s_i} (s - t_i) p(s) ds = 0,  i = 1, 2, ..., L/2   (t_i is the centroid of each decision interval)
s_0 = 0;  s_i = (t_i + t_{i+1}) / 2 for i = 1, 2, ..., L/2 - 1;  s_{L/2} = ∞   (decision levels lie halfway between the reconstruction levels)
s_{-i} = -s_i, t_{-i} = -t_i   (an odd-symmetric, non-uniform quantizer)
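An empirical Lloyd iteration alternating the two conditions (a sketch: p(s) is represented by samples, and the quantile initialization is my assumption):

    import numpy as np

    def lloyd_max(samples, L, iters=100):
        """Alternate centroid and midpoint conditions for an L-level quantizer."""
        t = np.quantile(samples, (np.arange(L) + 0.5) / L)  # initial t_i
        for _ in range(iters):
            s = (t[:-1] + t[1:]) / 2            # decision levels: midpoints
            idx = np.searchsorted(s, samples)   # assign samples to intervals
            t = np.array([samples[idx == i].mean() for i in range(L)])
        return s, t

    samples = np.random.laplace(scale=10.0, size=100_000)
    s, t = lloyd_max(samples, L=4)
    print(s.round(2), t.round(2))   # non-uniform, symmetric about zero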
Transform coding
Subimage decomposition
Transformation
Decorrelate the pixels of each subimage
Energy packing
Quantization
Eliminate coefficients that carry the least information
Coding
Transform selection
Requirements
orthonormal or unitary forward and inverse transformation kernels
basis function or basis images
separable and symmetric for the kernels
Types
DFT
DCT
WHT (Walsh-Hadamard transform)
Compression is achieved during the quantization of the
transformed coefficients (not during the transformation
step)
WHT
The summation in the exponent is performed in modulo 2
arithmetic
bk (z ) is the kth bit (from right to left) in the binary
representation of z
N = 2^m
T(u,v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) g(x,y,u,v)
f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} T(u,v) h(x,y,u,v)
g(x,y,u,v) = h(x,y,u,v) = (1/N) (-1)^{\sum_{i=0}^{m-1} [b_i(x) p_i(u) + b_i(y) p_i(v)]}
DCT
g(x,y,u,v) = h(x,y,u,v) = α(u) α(v) \cos\frac{(2x+1)uπ}{2N} \cos\frac{(2y+1)vπ}{2N}
α(u) = \sqrt{1/N} for u = 0;  α(u) = \sqrt{2/N} for u = 1, 2, ..., N-1
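The kernel is separable, so the 2-D DCT of an N × N block is C f C^T with the 1-D matrix C below (a sketch):

    import numpy as np

    def dct_matrix(N=8):
        """C[u, x] = alpha(u) * cos((2x+1) u pi / (2N))."""
        x = np.arange(N)
        C = np.cos((2 * x[None, :] + 1) * x[:, None] * np.pi / (2 * N))
        C[0, :] *= np.sqrt(1.0 / N)
        C[1:, :] *= np.sqrt(2.0 / N)
        return C

    C = dct_matrix(8)
    block = np.random.rand(8, 8)
    T = C @ block @ C.T                     # forward 2-D DCT: T(u, v)
    print(np.allclose(C.T @ T @ C, block))  # True: the kernel is orthonormal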
Approximations using DFT, WHT, and DCT
Truncating 50% of the transformed coefficients (retaining those of maximum magnitude) in each case
The resulting RMS errors are 1.28 (DFT), 0.86 (WHT), and 0.68 (DCT)
Basis images
A linear combination of n^2 basis images H_uv:
F = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} T(u,v) H_uv
Transform coefficient masking
Masking function γ(u,v):
F̂ = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} γ(u,v) T(u,v) H_uv
Mean-square approximation error:
e_ms = E{||F - F̂||^2} = ... = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} σ^2_{T(u,v)} [1 - γ(u,v)]
[Figure: approximations with different subimage sizes: original, 2x2, 4x4, and 8x8 subimages]
Zonal vs. threshold coding
The coefficients are retained based on
Maximum variance : zonal coding
Maximum magnitude : threshold coding
Zonal coding
Each DCT coefficient is considered as a random variable whose
distribution could be computed over the ensemble of all transformed
subimages
The variances can also be based on an assumed image model (e.g., a
Markov autocorrelation function)
Coefficients of maximum variances are usually located around the origin
Threshold coding
Select coefficients which have the largest magnitudes
Causing far less error than the zonal coding result
Bit allocation for zonal coding
The retained coefficients are
allocated bits according to :
The dc coefficient is modeled by
a Rayleigh density function
The ac coefficients are often
modeled by a Laplacian or
Gaussian density
The number of bits is made proportional to \log_2 σ^2_{T(u,v)}
(the information content of a Gaussian random variable with variance σ^2 at distortion D is proportional to \log_2 (σ^2 / D))
[Figure: typical threshold coding mask vs. zonal coding mask]
Bit allocation in threshold coding
The transform coefficients of largest magnitude make the
most significant contribution to reconstructed subimage
quality
Inherently adaptive in the sense that the locations of the retained transform coefficients vary from one subimage to another
Retained coefficients are recorded in a 1-D zigzag ordering
pattern
The mask pattern is run-length coded
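The zigzag order can be generated by sorting block positions by anti-diagonal and alternating the scan direction on odd and even diagonals (a sketch):

    def zigzag_order(n=8):
        """Positions (u, v) of an n x n block in zigzag scanning order."""
        return sorted(((u, v) for u in range(n) for v in range(n)),
                      key=lambda p: (p[0] + p[1],            # which diagonal
                                     p[0] if (p[0] + p[1]) % 2 else p[1]))

    print(zigzag_order(4)[:6])  # (0,0) (0,1) (1,0) (2,0) (1,1) (0,2)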
[Figure: zonal mask and the corresponding bit allocation]
Wavelet coding
Requirements in encoding an image
An analyzing wavelet
Number of decomposition levels
1-D Subband decomposition
Analysis filter :
Low-pass : approximation h0 (n)
High-pass : detail h1 (n)
Synthesis filter :
Low-pass : g 0 (n)
High-pass : g1 (n)
2-D subband image coding -- 4-band filter bank
An example of Daubechies orthonormal filters
Comparison between DCT-based and DWT-based coding
A noticeable decrease of error in the wavelet coding results (based on compression ratios of 34:1 and 67:1)
[Figure: DWT-based vs. DCT-based reconstructions]
Based on compression ratios of 108:1 and 167:1 (rms errors: 3.72 and 4.73)
Even the 167:1 DWT result (rms = 4.73) is better than the 67:1 DCT result (rms = 6.33)
At more than twice the compression ratio, the wavelet-based reconstruction has only 75% of the error of the DCT-based result
Wavelet selection
The wavelet selection affects all aspects of wavelet coding
system design and performance
Computational complexity of the transform
System’s ability to compress and reconstruct images of acceptable
error
Include the decomposition filters and reconstruction filters
Useful analysis property : number of zero moments
Important synthesis property : smoothness of reconstruction
The most widely used expansion functions:
Daubechies wavelets
Bi-orthogonal wavelets (allow filters with binary coefficients, i.e., numbers of the form k/2^a)
Comparison between different wavelets
(1) Haar wavelets: the simplest
(2) Daubechies wavelets: the most popular imaging wavelets
(3) Symlets: an extension of Daubechies with increased symmetry
(4) Cohen-Daubechies-Feauveau wavelets: biorthogonal wavelets
Comparison between different wavelets (cont)
Comparisons in number of operations required
As the computational complexity increases, the information
packing ability does as well
Other issues
The number of decomposition levels required
The initial decompositions are responsible for the majority of the data
compression (3 levels is enough)
Quantizer design
Introduce an enlarged quantization interval around zero (i.e., the dead
zone)
Adapting the size of the quantization interval from scale to scale
Quantization of coefficients at more decomposition levels impacts larger
areas of the reconstructed image
Binary image compression standards
Most of the standards are issued by
International Standardization Organization (ISO)
Consultative Committee of the International Telephone and Telegraph
(CCITT)
CCITT Group 3 and 4 are for binary image compression
Originally designed as facsimile (FAX) coding methods
G3 : Nonadaptive, 1-D run-length coding
G4 : a simplified or streamlined version of G3, only 2-D coding
The coding approach is quite similar to the RAC method
Joint Bilevel Imaging Group (JBIG)
A joint committee of CCITT and ISO
Proposed JBIG1: adaptive arithmetic compression technique (the best
average and worst-case available)
Proposed JBIG2 : achieve compressions 2 to 4 times greater than JBIG1
Continuous-tone image compression -- JPEG
JPEG, wavelet-based JPEG-2000, JPEG-LS
JPEG defines three coding systems
Lossy baseline coding system
Extended coding system – for greater compression, higher
precision, or progressive reconstruction applications
Lossless independent coding system
A product or system must include support for the baseline system
Baseline system
Input, output : 8 bits
Quantized DCT values : 11 bits
Subdivided into blocks of 8x8 pixels for encoding
Pixels are level-shifted by subtracting 128 graylevels
JPEG (cont)
DCT-transformed
Quantized by using the quantization or normalization matrix
The DC coefficient is difference-coded relative to the DC coefficient of the previous block
The non-zero AC coefficients are coded by using a VLC that
defines the coefficient’s value and number of preceding zeros
The user is free to construct custom tables and/or arrays, which may
be adapted to the characteristics of the image being compressed
Add a special EOB (end of block) code behind the last non-zero
AC coefficient
Huffman coding for the DC and AC coefficients
VLC = base code (category code) + value code
An example: encode DC = -9
Category 4: -15,...,-8, 8,...,15 (base code = 101)
Total length = 7 → value code length = 4
For a category K, K additional bits are needed as the value code, computed as either the K LSBs of the value (positive) or the K LSBs of the 1's complement of its absolute value (negative): here K = 4, value code = 0110
Complete code: 101 0110
An example: encode AC = (0, -3)
Length of the zero run: 0; AC coefficient value: -3
AC = -3 → category 2 → run/size 0/2 (Table 8.19) → base code = 01
Value code = 00
Complete code: 01 00
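The category/value-code rule is easy to check in code (a sketch; the base codes come from the standard's Huffman tables and are not computed here):

    def category_and_value_code(v):
        """Return (category K, K-bit value code) for a DC difference or AC value."""
        if v == 0:
            return 0, ""
        K = abs(v).bit_length()              # category: 2^(K-1) <= |v| < 2^K
        if v > 0:
            bits = v                         # K LSBs of the value itself
        else:
            bits = ~abs(v) & ((1 << K) - 1)  # K LSBs of the 1's complement
        return K, format(bits, "0%db" % K)

    print(category_and_value_code(-9))  # (4, '0110'); with base 101 -> 1010110
    print(category_and_value_code(-3))  # (2, '00');   with base 01  -> 0100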
A JPEG coding example
Original 8x8 block:
52  55  61  66  70  61  64  73
63  59  66  90 109  85  69  72
62  59  68 113 144 104  66  73
63  58  71 122 154 106  70  69
67  61  68 104 126  88  68  70
79  65  60  70  77  68  58  75
85  71  64  59  55  61  65  83
87  79  69  68  65  76  78  94
Level-shifted (by -128):
-76 -73 -67 -62 -58 -67 -64 -55
-65 -69 -62 -38 -19 -43 -59 -56
-66 -69 -60 -15  16 -24 -62 -55
-65 -70 -57  -6  26 -22 -58 -59
-61 -67 -60 -24  -2 -40 -60 -58
-49 -63 -68 -58 -51 -65 -70 -53
-43 -57 -64 -69 -73 -67 -63 -45
-41 -49 -59 -60 -63 -52 -50 -34
DCT-transformed (only rows 4-8 shown):
-50  13  35 -15  -9   6   0   3
 11  -8 -13  -2  -1   1  -4   1
-10   1   3  -3  -1   0   2  -1
 -4  -1   2  -1   2  -3   1  -2
 -1   1  -1  -2  -1  -1   0  -1
Quantized:
-26  -3  -6   2   2   0   0   0
  1  -2  -4   0   0   0   0   0
 -3   1   5  -1  -1   0   0   0
 -4   1   2  -1   0   0   0   0
  1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
Final codes:
1010110, 0100, 001, 0100, 0101, 100001, 0110, 100011, 001, 100011, 001, 001, 100101, 11100110, 110110, 0110, 11110100, 000, 1010
Reconstructed after VLD and dequantization (inverse DCT + level shift):
58  64  67  64  59  62  70  78
56  55  67  89  98  88  74  69
60  50  70 119 141 116  80  64
69  51  71 128 149 115  77  68
74  53  64 105 115  84  65  72
76  57  56  74  75  57  57  74
83  69  59  60  61  61  67  78
93  81  67  62  69  80  84  84
Residual (original - reconstructed):
-6  -9  -6   2  11  -1  -6  -5
 7   4  -1   1  11  -3  -5   3
 2   9  -2  -6   3 -12 -14   9
-6   7   0  -4  -5  -9  -7   1
-7   8   4  -1  11   4   3  -2
 3   8   4  -4   2  11   1   1
 2   2   5  -1  -6   0  -2   5
-6  -2   2   6  -4  -4  -6  10
JPEG-2000
Increased flexibility in accessing the compressed data
Portions of a JPEG 2000 compressed image can be extracted for re-transmission, storage, display, and editing
Procedures
DC level shift by subtracting 128
Optionally decorrelate the components by using a reversible or non-reversible linear combination of components (e.g., RGB to YCrCb component transformation); i.e., both RGB-based and YCrCb-based JPEG-2000 coding are allowed
Y0(x,y) = 0.299 I0(x,y) + 0.587 I1(x,y) + 0.114 I2(x,y)
Y1(x,y) = -0.16875 I0(x,y) - 0.33126 I1(x,y) + 0.5 I2(x,y)
Y2(x,y) = 0.5 I0(x,y) - 0.41869 I1(x,y) - 0.08131 I2(x,y)
Y1 and Y2 are difference images highly peaked around zero
JPEG-2000 (cont)
Optionally divided into tiles (rectangular arrays of pixels which
can be extracted and reconstructed independently), providing a
mechanism for accessing a limited region of a coded image
2-D DWT by using a biorthogonal 5-3 coefficient filter (lossless)
or a 9-7 coefficient filter (lossy) produces a low-resolution
approximation and horizontal, vertical, and diagonal frequency
components (LL, HL,LH, HH bands)
5-3 filter for reversible (lossless) DWT
i   Low-pass filter H0   High-pass filter H1
0   6/8                  1
1   2/8                  -1/2
2   -1/8
JPEG-2000 (cont)
A fast DWT algorithm or a lifting-based approach is often used
Iterative decomposition of the approximation part to obtain :
JPEG-2000 (cont)
Coefficient quantization adapted to individual scales and subbands:
q_b(u,v) = sign[a_b(u,v)] · floor[ |a_b(u,v)| / Δ_b ]
Quantization step size: Δ_b = 2^{R_b - ε_b} (1 + μ_b / 2^{11})
R_b is the nominal dynamic range (in bits) of subband b; ε_b and μ_b are the numbers of bits allocated to the exponent and mantissa of the subband's coefficients
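A sketch of the deadzone quantizer; the example values of R_b, ε_b, and μ_b are illustrative only:

    import numpy as np

    def jp2k_quantize(a, R_b, eps_b, mu_b):
        """q = sign(a) * floor(|a| / delta_b), with the step size above."""
        delta_b = 2.0 ** (R_b - eps_b) * (1 + mu_b / 2.0 ** 11)
        return np.sign(a) * np.floor(np.abs(a) / delta_b)

    coeffs = np.array([-73.4, -5.2, 0.8, 41.9, 130.0])
    print(jp2k_quantize(coeffs, R_b=8, eps_b=8, mu_b=256))  # step = 1.125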
[Figure: code-stream organization into quality layers (layers 1-5, some empty) across decomposition levels N+1, N, and N-1, ordered by resolution]
Video compression -- motion-compensated transform coding
Standards
Video teleconferencing standard (H.261, H.263, H.263+, H.320, H.264)
Multimedia standard (MPEG-1/2/4)
A combination of predictive coding (in temporal domain) and
transform coding (in spatial domain)
Estimate (predict) the image by simply propagating pixels in the
previous frame along their motion trajectory – motion
compensation
The success depends on accuracy, speed, and robustness of the
displacement estimator -- motion estimation
Motion estimation is based on each image block (16×16 or 8×8
pixels)
Motion-compensated transform coding -- encoder
Motion-compensated transform coding -- decoder
[Diagram: compressed bit stream → BUF → VLD → IT → (+) → reconstructed frame, with an MC predictor driven by the decoded motion vectors feeding the adder]
Block-Matching (BM) Motion
Estimation
Find the best match (mv_x, mv_y) between the current image block and candidate blocks in the previous frame by minimizing the cost
\sum_{(x,y) ∈ block} | x_n(x,y) - x_{n-1}(x + mv_x, y + mv_y) |
One motion vector (MV) per block between successive frames
[Figure: the current block in frame k and its search window in frame k-1]
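A full-search SAD matcher as a sketch (block size B and window ±w are parameters; the synthetic shifted frame makes the expected MV obvious):

    import numpy as np

    def full_search(cur, prev, bx, by, B=16, w=7):
        """Return the (mv_x, mv_y) minimizing the SAD over a +/-w window."""
        block = cur[by:by + B, bx:bx + B].astype(int)
        best, best_mv = None, (0, 0)
        for dy in range(-w, w + 1):
            for dx in range(-w, w + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + B > prev.shape[0] or x + B > prev.shape[1]:
                    continue                     # candidate leaves the frame
                sad = np.abs(block - prev[y:y + B, x:x + B].astype(int)).sum()
                if best is None or sad < best:
                    best, best_mv = sad, (dx, dy)
        return best_mv, best

    rng = np.random.default_rng(1)
    prev = rng.integers(0, 256, (64, 64))
    cur = np.roll(prev, (2, -3), axis=(0, 1))    # content moved by (dy, dx)
    print(full_search(cur, prev, bx=24, by=24))  # -> ((3, -2), 0)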
BM motion estimation techniques
Full search method
Fast Search (Reduced Search)
Usually a multistage search procedure that calculates fewer search points
Generally reaches a local (sub-)optimum
Evaluation – number of search points, noise immunity
Existing Algorithms :
Three-step method
Four-step method
Log-search method
Hierarchical matching method
Prediction search method
Kalman prediction method
Fast full search
Strategy: while accumulating partial errors, quit early to the next search point as soon as the current minimum error is exceeded
Re-ordering in computing errors
Conventional: raster-scanning order
Improvement: row or column sorting by gradient magnitudes in decreasing order
A 3x to 4x speedup with respect to traditional full search
[Figure: numeric example of fast full search -- a search area, the template block, and the per-row differences and gradient magnitudes used for re-ordering; assume the current minimum error is 100]
Three-step search
The full search examines 15 × 15 = 225 positions; the three-step search examines 9 + 8 + 8 = 25 positions (assuming the MV range is -7 to +7)
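A sketch of the three-step refinement; each stage evaluates the 3 × 3 neighborhood at step sizes 4, 2, 1 (the SAD helper mirrors the full-search cost):

    import numpy as np

    def three_step_search(cur, prev, bx, by, B=16):
        """Greedy coarse-to-fine search: 25 SADs instead of 225."""
        block = cur[by:by + B, bx:bx + B].astype(int)

        def sad(dx, dy):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + B > prev.shape[0] or x + B > prev.shape[1]:
                return float("inf")
            return np.abs(block - prev[y:y + B, x:x + B].astype(int)).sum()

        mvx = mvy = 0
        for step in (4, 2, 1):                 # shrink the step each stage
            cands = [(mvx + i * step, mvy + j * step)
                     for i in (-1, 0, 1) for j in (-1, 0, 1)]
            mvx, mvy = min(cands, key=lambda mv: sad(*mv))
        return mvx, mvy

Being greedy, it can settle on a local optimum that full search would avoid, which is the sub-optimality noted above.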
Predictive search
Explore the similarity of MVs between adjacent macroblocks
Obtain an initial predicted MV from neighboring macroblocks by averaging
Prediction followed by a small-area detailed search
Significantly reduces the CPU time (by 4x to 10x)
[Figure: neighboring macroblocks' motion vectors MV1, MV2, MV3 used for prediction]
Fractional-pixel motion estimation
Question: what happens if the motion vector (mv_x, mv_y) has real-valued instead of integer components?
Answer: find the integer motion vector first and then interpolate from surrounding pixels for a refined search. The most popular case is half-pixel accuracy (i.e., 1 out of 8 surrounding half-pel positions chosen for refinement); 1/3-pixel accuracy is proposed in the H.263L standard.
Half-pixel accuracy often leads to a match with a smaller residue (higher reconstruction quality), hence decreasing the bit rate required in the subsequent encoding.
[Figure: the integer MV position and the candidate half-pixel MV positions around it]
Evaluation of BM motion
estimation
Advantages :
Straightforward, regular operation, and promising for VLSI chip
design
Robustness (no differential operations as in optical flow approach)
Disadvantages :
Cannot handle several moving objects within one block (e.g., around occlusion boundaries)
Improvements :
Post-processing of images to eliminate blocking effect
Interpolation of images or averaging of motion vectors to subpixel
accuracy
Image prediction structure
I/P frames only (H.261, H.263)
I/P/B frames (MPEG-x)
Typical group of pictures: I B B B P B B B P B B B P B ...
I: intra-frame coded; P: forward prediction from the previous I/P frame; B: bidirectional prediction from the surrounding I/P frames