
Chapter 8: Image Compression

Wen-Nung Lie, CCU, Taiwan
Applications of image compression
- Video teleconferencing
- Remote sensing (satellite imagery)
- Document and medical imaging
- Facsimile (FAX) transmission
- Control of remotely piloted vehicles in military, space, and hazardous-waste management applications
Fundamentals
- What is data and what is information?
- Data are the means by which information is conveyed. Various amounts of data may be used to represent the same amount of information.
- Types of data redundancy:
  - Coding redundancy
  - Interpixel redundancy
  - Psychovisual redundancy
Coding redundancy
- The gray-level histogram of an image can provide a great deal of insight into the construction of codes.
- The average number of bits required to represent each pixel:

  L_{avg} = \sum_{k=0}^{L-1} l(r_k) p_r(r_k)

- l(r_k) is the number of bits used to represent gray level r_k, and p_r(r_k) is its probability.
- Variable-length coding (VLC): higher probability, shorter bit length.
  Example: L_{avg} = 3.0 bits (code 1) vs. 2.7 bits (code 2)
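
A minimal Python sketch of this computation; the probabilities and code lengths below are assumed (they follow the classic textbook table) and reproduce the 3.0 vs. 2.7 bit averages:

```python
# Average code length L_avg = sum_k l(r_k) * p_r(r_k) for two codes.
probs = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]  # p_r(r_k), sums to 1

code1_lengths = [3] * 8                   # fixed 3-bit natural binary code
code2_lengths = [2, 2, 2, 3, 4, 5, 6, 6]  # VLC: shorter words for probable levels

def l_avg(lengths, probs):
    return sum(l * p for l, p in zip(lengths, probs))

print(l_avg(code1_lengths, probs))  # 3.0 bits/pixel
print(l_avg(code2_lengths, probs))  # 2.7 bits/pixel
```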
Interpixel redundancy
- Autocorrelation coefficients:

  \gamma(\Delta n) = \frac{A(\Delta n)}{A(0)}, \quad
  A(\Delta n) = \frac{1}{N - \Delta n} \sum_{y=0}^{N-1-\Delta n} f(x, y) f(x, y + \Delta n)

- Interpixel redundancy is also called:
  - spatial redundancy
  - geometrical redundancy
  - interframe redundancy
- The larger the autocorrelation, the more interpixel redundancy.
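
A small sketch of the autocorrelation coefficient above, evaluated on a synthetic row (the test data are an assumption):

```python
import numpy as np

def gamma(row, dn):
    N = len(row)
    A = np.dot(row[:N - dn], row[dn:]) / (N - dn)   # A(dn)
    A0 = np.dot(row, row) / N                       # A(0)
    return A / A0

row = np.repeat([50.0, 200.0], 64)      # long runs => strong pixel correlation
print(gamma(row, 1), gamma(row, 32))    # near 1 for small dn, lower for large dn
```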
Run-length coding to remove spatial redundancy
- Example: compression ratio C_R = 2.63
- Relative data redundancy: R_D = 1 - 1/C_R = 1 - 1/2.63 = 0.62
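
A sketch of 1-D run-length encoding, together with the C_R and R_D figures quoted above:

```python
# Encode a binary line as (value, run-length) pairs.
def run_length_encode(bits):
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((prev, count))
            count = 1
    runs.append((bits[-1], count))
    return runs

line = [0] * 50 + [1] * 10 + [0] * 40
print(run_length_encode(line))      # [(0, 50), (1, 10), (0, 40)]

C_R = 2.63                          # compression ratio from the slide's example
print(1 - 1 / C_R)                  # R_D = 0.62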
Psychovisual redundancy
- Certain information simply has less relative importance than other information in normal visual processing – psychovisual redundancy.
- The elimination of psychovisually redundant data results in a loss of quantitative information – called quantization => lossy data compression.
- E.g., quantization of gray levels.
- E.g., line interlacing in TV (reduced video scanning rate).
[Figure: (b) false contouring caused by coarse quantization; (c) improved gray-level quantization via dithering]
Fidelity criteria
- Objective fidelity criteria:
  - Root-mean-square error:

    e_{rms} = \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x,y) - f(x,y)]^2 \right]^{1/2}

  - Mean-square signal-to-noise ratio:

    SNR_{ms} = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x,y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x,y) - f(x,y)]^2}

- Subjective fidelity criterion: a rating scale such as {much worse, worse, slightly worse, the same, slightly better, better, much better}.
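
Both objective criteria are straightforward to compute; a sketch with synthetic arrays standing in for the original f and the decompressed f̂:

```python
import numpy as np

def e_rms(f, f_hat):
    return np.sqrt(np.mean((f_hat - f) ** 2))

def snr_ms(f, f_hat):
    return np.sum(f_hat ** 2) / np.sum((f_hat - f) ** 2)

rng = np.random.default_rng(0)
f = rng.integers(0, 256, (64, 64)).astype(float)
f_hat = f + rng.normal(0, 2.0, f.shape)     # decompressed = original + noise
print(e_rms(f, f_hat), snr_ms(f, f_hat))
```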
Image compression model
- Source encoder: removes input redundancies.
- Channel encoder: increases the noise immunity of the source encoder's output.
Source encoder model
- Mapper: transforms the input data into a format designed to reduce interpixel redundancies in the image (reversible).
- Quantizer: reduces the accuracy of the mapper's output (reduces psychovisual redundancies) – irreversible, and must be omitted for error-free compression.
- Symbol encoder: fixed- or variable-length coder (assigns the shortest code words to the most frequently occurring outputs to reduce coding redundancies) – reversible.
Channel encoder and decoder
- Designed to reduce the impact of channel noise by inserting a controlled form of redundancy into the source-encoded data.
- Hamming code as a channel code: the 7-bit Hamming (7,4) code has minimum distance 3 and can correct one bit error.
- h_1, h_2, h_4 are even-parity bits:

  h_1 = b_3 ⊕ b_2 ⊕ b_0    h_3 = b_3
  h_2 = b_3 ⊕ b_1 ⊕ b_0    h_5 = b_2
  h_4 = b_2 ⊕ b_1 ⊕ b_0    h_6 = b_1
                           h_7 = b_0

- A single-bit error is indicated by a nonzero parity word c_4 c_2 c_1:

  c_1 = h_1 ⊕ h_3 ⊕ h_5 ⊕ h_7
  c_2 = h_2 ⊕ h_3 ⊕ h_6 ⊕ h_7
  c_4 = h_4 ⊕ h_5 ⊕ h_6 ⊕ h_7
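
A sketch of the (7,4) encoder and single-error corrector defined above:

```python
# Bits are ordered h1..h7; b3 is the MSB of the 4-bit message.
def hamming_encode(b3, b2, b1, b0):
    h1 = b3 ^ b2 ^ b0
    h2 = b3 ^ b1 ^ b0
    h4 = b2 ^ b1 ^ b0
    return [h1, h2, b3, h4, b2, b1, b0]          # h3=b3, h5=b2, h6=b1, h7=b0

def hamming_correct(h):
    h1, h2, h3, h4, h5, h6, h7 = h
    c1 = h1 ^ h3 ^ h5 ^ h7
    c2 = h2 ^ h3 ^ h6 ^ h7
    c4 = h4 ^ h5 ^ h6 ^ h7
    pos = c4 * 4 + c2 * 2 + c1                   # nonzero => 1-based error position
    if pos:
        h[pos - 1] ^= 1                          # flip the erroneous bit
    return h

word = hamming_encode(1, 0, 1, 1)
word[2] ^= 1                                     # inject a single-bit error
print(hamming_correct(word) == hamming_encode(1, 0, 1, 1))  # True
```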
Information theory
- Explores the minimum amount of data that is sufficient to describe an image completely, without loss of information.
- Self-information of an event E with probability P(E):

  I(E) = \log \frac{1}{P(E)} = -\log P(E)

  P(E) = 1 => I(E) = 0
  P(E) = 1/2 => I(E) = 1 bit
- The base of the logarithm determines the information units.
Information channel
- Source alphabet A = {a_1, a_2, ..., a_J}, with \sum_{j=1}^{J} P(a_j) = 1
- Symbol probability vector z = [P(a_1), P(a_2), ..., P(a_J)]^T
- The ensemble (A, z) describes the information source completely.
- Average information per source output:

  H(z) = -\sum_{j=1}^{J} P(a_j) \log P(a_j)

- H(z) is the uncertainty or entropy of the source, i.e., the average amount of information (in m-ary units) obtained by observing a single source output.
- For equally probable source symbols, the entropy is maximized.
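
A sketch of H(z), with a check that the uniform distribution maximizes it:

```python
import math

def entropy(z):
    return -sum(p * math.log2(p) for p in z if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
print(entropy([0.25] * 4))                 # 2.0 bits/symbol: the maximum for J=4
```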
- Channel alphabet B = {b_1, b_2, ..., b_K}
- Channel description (B, v), with v = [P(b_1), P(b_2), ..., P(b_K)]^T:

  P(b_k) = \sum_{j=1}^{J} P(b_k | a_j) P(a_j)

  Q = \begin{bmatrix}
        P(b_1|a_1) & P(b_1|a_2) & \cdots & P(b_1|a_J) \\
        P(b_2|a_1) & \cdots & \cdots & \vdots \\
        \vdots & \cdots & \cdots & \vdots \\
        P(b_K|a_1) & P(b_K|a_2) & \cdots & P(b_K|a_J)
      \end{bmatrix}

- Q is the forward channel transition matrix (channel matrix), and v = Qz.
Conditional entropy function
- H(z | b_k) = -\sum_{j=1}^{J} P(a_j | b_k) \log P(a_j | b_k)

  H(z | v) = \sum_{k=1}^{K} H(z | b_k) P(b_k)
           = -\sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log P(a_j | b_k)

- H(z | v) is called the equivocation of z with respect to v.
- The difference between H(z) and H(z | v) is the average information received upon observing a single output symbol – the mutual information I(z, v):

  I(z, v) = H(z) - H(z | v)
Mutual information
- I(z, v) = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j, b_k) \log \frac{P(a_j, b_k)}{P(a_j) P(b_k)}
          = \sum_{j=1}^{J} \sum_{k=1}^{K} P(a_j) q_{kj} \log \frac{q_{kj}}{\sum_{i=1}^{J} P(a_i) q_{ki}},  where Q = {q_{kj}}

- The minimum possible value of I(z, v) is zero; it occurs when the input and output symbols are statistically independent (i.e., P(a_j, b_k) = P(a_j) P(b_k)).
- The maximum value of I(z, v) over all possible z is the capacity C of the channel described by the channel matrix Q:

  C = \max_{z} [I(z, v)]
Channel capacity
- Capacity: the maximum rate at which information can be transmitted reliably through the channel.
- Channel capacity does not depend on the input probabilities of the source (i.e., on how the channel is used); it is a function of the conditional probabilities defining the channel alone (i.e., Q).
Binary entropy function
- For a binary source (to estimate the source entropy), z = [p_bs, 1 - p_bs]^T:

  H_bs(z) = -p_bs \log_2 p_bs - (1 - p_bs) \log_2 (1 - p_bs)

  max(H_bs) = 1 bit, attained when p_bs = 0.5
- Binary symmetric channel (BSC), used to estimate channel capacity:
  - error probability: p_e
  - channel matrix:

    Q = \begin{bmatrix} 1 - p_e & p_e \\ p_e & 1 - p_e \end{bmatrix}

  - probability of received symbols:

    v = Qz = \begin{bmatrix} (1 - p_e) p_bs + p_e (1 - p_bs) \\ p_e p_bs + (1 - p_e)(1 - p_bs) \end{bmatrix}

  - mutual information: I(z, v) = H_bs(p_bs (1 - p_e) + (1 - p_bs) p_e) - H_bs(p_e)
  - I(z, v) = 0 when p_bs = 0 or 1
  - I(z, v) = 1 - H_bs(p_e) = C when p_bs = 0.5, for any p_e
  - when p_e = 0 or 1 => C = 1 bit
  - when p_e = 0.5 => C = 0 bits (no information transmitted)
Shannon's first theorem for noiseless coding
- Defines the minimum average code length per source symbol that can be achieved.
- For a zero-memory source (with finite ensemble (A, z) and statistically independent source symbols):
  - Single symbols: A = {a_1, a_2, ..., a_J}
  - Block symbols (n-tuples of symbols): A' = {α_1, α_2, ..., α_{J^n}}, with

    P(α_i) = P(a_{j1}) P(a_{j2}) ... P(a_{jn})

  - (A, z) → H(z);  (A', z') → H(z'), where

    H(z') = -\sum_{i=1}^{J^n} P(α_i) \log P(α_i) = n H(z)

    (entropy of a zero-memory source with block symbols)
N-extended source
- Use n symbols as a block symbol.
- The average word length of the n-extended code:

  L'_{avg} = \sum_{i=1}^{J^n} P(α_i) l(α_i) = \sum_{i=1}^{J^n} P(α_i) \left\lceil \log \frac{1}{P(α_i)} \right\rceil

  H(z') ≤ L'_{avg} < H(z') + 1

  H(z) ≤ L'_{avg} / n < H(z) + 1/n

  \lim_{n→∞} [L'_{avg} / n] = H(z)

- It is possible to make L'_{avg} / n arbitrarily close to H(z) by coding infinitely long extensions of the source.
Coding efficiency
- Coding efficiency: η = n H(z) / L'_{avg}
- Example:
  - Original source: P(a_1) = 2/3, P(a_2) = 1/3, so H(z) = 0.918 bits/symbol.
    With L'_{avg} = 1 bit/symbol: η = 0.918 / 1 = 0.918
  - Second extension of the source: z' = {4/9, 2/9, 2/9, 1/9}
    L'_{avg} = 17/9 = 1.89 bits per block symbol: η = 2 × 0.918 / 1.89 = 1.83 / 1.89 = 0.97
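
A sketch verifying these numbers; the code lengths {1, 2, 3, 3} for the extension are an assumed (valid Huffman) code for z':

```python
import math

H = -(2/3) * math.log2(2/3) - (1/3) * math.log2(1/3)
print(H)                                  # 0.918 bits/symbol

print(H / 1.0)                            # first-order efficiency: 0.918

z2 = [4/9, 2/9, 2/9, 1/9]
lengths = [1, 2, 3, 3]                    # a Huffman code for the second extension
L2 = sum(p * l for p, l in zip(z2, lengths))
print(L2, 2 * H / L2)                     # 17/9 = 1.89 and efficiency 0.97
```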
Shannon's second theorem for noisy coding
- For any R < C (the capacity of a zero-memory channel with matrix Q), there exist codes of block length r and rate R such that the block decoding error probability can be made arbitrarily small.
- Rate-distortion function: given a distortion D, the minimum rate at which information about the source can be conveyed to the user:

  R(D) = \min_{Q ∈ Q_D} [I(z, v)],  with Q_D = {q_{kj} | d(Q) ≤ D}

  where Q is the (compression) mapping from source symbols to output symbols, d(Q) is the average distortion, and D is the distortion threshold.
Example of a rate-distortion function
- Consider a zero-memory binary source with equally probable source symbols {0, 1}:

  R(D) = 1 - H_bs(D)

- R(D) is monotonically decreasing and convex in the interval (0, D_max).
- The shape of the R-D function represents the coding efficiency of a coder.
Estimating the information content (entropy) of an image
- Construct a source model based on the relative frequency of occurrence of the gray levels:
  - First-order estimate (histogram of single pixels): 1.81 bits/pixel
  - Second-order estimate (histogram of pairs of adjacent pixels): 1.25 bits/pixel
  - Third-order, fourth-order, ...: approaching the true source entropy
- Example image:

  21 21 21 95 169 243 243 243
  21 21 21 95 169 243 243 243
  21 21 21 95 169 243 243 243
  21 21 21 95 169 243 243 243
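
A sketch of both estimates for this image; forming horizontal pairs with wraparound is an assumption, but it reproduces the 1.25 bits/pixel figure:

```python
import math
from collections import Counter

row = [21, 21, 21, 95, 169, 243, 243, 243]
image = [row] * 4

def entropy(counter):
    n = sum(counter.values())
    return -sum(c / n * math.log2(c / n) for c in counter.values())

first = Counter(p for r in image for p in r)
print(entropy(first))                       # 1.81 bits/pixel (first order)

pairs = Counter((r[i], r[(i + 1) % len(r)]) for r in image for i in range(len(r)))
print(entropy(pairs) / 2)                   # 1.25 bits/pixel (second order)
```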
- The difference between the higher-order estimates of entropy and the first-order estimate indicates the presence of interpixel redundancy.
- If the pixels are statistically independent, the higher-order estimates are equivalent to the first-order estimate, and variable-length coding (VLC) alone provides optimal compression (i.e., no further mapping is necessary).
- First-order estimate of the difference-mapped source: 1.41 bits/pixel.
  1.41 > 1.25 => there exists an even better mapping.
Error-free compression – variable-length coding (VLC)
- Huffman coding:
  - the most popular method; yields the smallest possible number of code symbols per source symbol
  - construct the Huffman tree according to the source symbol probabilities
  - code the Huffman tree
  - compute the source entropy, average code length, and coding efficiency (a sketch follows below)
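
A minimal Huffman-construction sketch; the source probabilities are assumed to be the example's (0.4, 0.3, 0.1, 0.1, 0.06, 0.04), for which the average length comes out to the 2.2 bits/symbol quoted on the next slide (individual code words may differ by tie-breaking):

```python
import heapq

def huffman(probs):
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)     # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next_id, merged))
        next_id += 1
    return heap[0][2]

probs = {"a1": 0.4, "a2": 0.3, "a3": 0.1, "a4": 0.1, "a5": 0.06, "a6": 0.04}
code = huffman(probs)
print(code)
print(sum(probs[s] * len(w) for s, w in code.items()))  # L_avg = 2.2 bits/symbol
```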
- The Huffman code is:
  - a block code (each symbol is mapped to a fixed sequence of bits)
  - instantaneous (each code word can be decoded without referencing succeeding symbols)
  - uniquely decodable (no code word is a prefix of another)
  - for example: 010100111100 → a_3 a_1 a_2 a_2 a_6
- L_avg = 2.2 bits/symbol, H(z) = 2.14 bits/symbol, η = 0.973
Variations of VLC
Variations of VLC (cont.)
- Used when a large number of source symbols must be considered.
- Truncated Huffman coding:
  - Only some symbols (those with the highest probabilities, here the top 12) are encoded with a Huffman code.
  - A prefix code (whose probability is the sum of the remaining symbols' probabilities, here "10"), followed by a fixed-length code, represents all the other symbols.
- B-code:
  - Each code word is made up of continuation bits (C) and information bits (natural binary numbers).
  - C can be 1 or 0, but must alternate between successive code words.
  - E.g., a_11 a_2 a_7 → 001010101000010 or 101110001100110
Variations of VLC (cont.)
- Binary shift code:
  - Divide the symbols into blocks of equal size.
  - Code the individual elements within all blocks identically.
  - Add shift-up or shift-down symbols to identify each block (here, "111").
- Huffman shift code:
  - Select one reference block.
  - Sum the probabilities of all other blocks and use this sum to determine the shift symbol by the Huffman method (here, "00").
Arithmetic coding
- AC generates non-block codes (no look-up table as in Huffman coding).
- An entire sequence of source symbols is assigned a single code word.
- The code word itself defines an interval of real numbers between 0 and 1. As the number of symbols in the message increases, the interval becomes smaller (more information bits are needed to represent the real number).
- Multiple symbols are coded jointly with one set of bits → the number of bits per symbol is effectively fractional.
Arithmetic coding (cont.)
- Example: coding of a_1 a_2 a_3 a_3 a_4.
- Any real number in [0.06752, 0.0688) can represent the source sequence (e.g., 0.000100011_2 = 0.068359375).
- Theoretically, 0.068 is enough to represent the sequence → 3 decimal digits → 0.6 decimal digits per symbol.
- Actually, 0.068359375 → 9 bits for 5 symbols → 1.8 bits per symbol.
- As the length of the source symbol sequence increases, the resulting arithmetic code approaches Shannon's bound.
- Two factors limit the coding performance:
  - an end-of-message indicator is necessary to separate one message sequence from another
  - the finite precision of the arithmetic
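
A sketch of the interval narrowing; the model probabilities are assumed to be the textbook example's (P(a_3) = 0.4, the rest 0.2 each), which reproduces the interval quoted above:

```python
# Each symbol owns a sub-interval of [0, 1); encoding nests them.
model = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def encode_interval(message):
    low, high = 0.0, 1.0
    for sym in message:
        lo, hi = model[sym]
        span = high - low
        low, high = low + span * lo, low + span * hi
    return low, high

print(encode_interval(["a1", "a2", "a3", "a3", "a4"]))  # (0.06752, 0.0688)
```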
LZW (Lempel-Ziv-Welch) coding
- Assigns fixed-length code words to variable-length sequences of source symbols, but requires no a priori knowledge of the probabilities of occurrence of the symbols.
- LZW compression has been integrated into the GIF, TIFF, and PDF formats.
- For a 9-bit (512-word) dictionary, the latter half of the entries are constructed during the encoding process. An entry may cover two or more pixels (i.e., be assigned a single 9-bit code).
- If the size of the dictionary is too small, the detection of matching gray-level sequences is less likely; if too large, the size of the code words adversely affects compression performance.
LZW coding (cont.)
- The LZW decoder builds an identical decompression dictionary as it simultaneously decodes the bit stream (an encoder sketch follows below).
- Take care to handle dictionary overflow.
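
A sketch of the LZW encoder with a 9-bit dictionary; overflow handling is omitted for brevity:

```python
def lzw_encode(pixels):
    dictionary = {(v,): v for v in range(256)}   # entries 0-255: single values
    next_code, current, out = 256, (), []
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate                  # grow the matched sequence
        else:
            out.append(dictionary[current])
            if next_code < 512:                  # 9-bit dictionary limit
                dictionary[candidate] = next_code
                next_code += 1
            current = (p,)
    out.append(dictionary[current])
    return out

# repeated pixel pairs get folded into single dictionary codes (256, 258)
print(lzw_encode([39, 39, 126, 126] * 2))   # [39, 39, 126, 126, 256, 258]
```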
Bit-plane coding
- Bit-plane decomposition:
  - Binary code: the bit change between two adjacent codes may be significant (e.g., 127 and 128).
  - Gray code: only 1 bit changes between any two adjacent codes. For

    a_{m-1} 2^{m-1} + a_{m-2} 2^{m-2} + ... + a_1 2^1 + a_0 2^0:

    g_i = a_i ⊕ a_{i+1},  0 ≤ i ≤ m-2
    g_{m-1} = a_{m-1}

- Gray-coded bit planes are less complex than the corresponding binary bit planes.
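
A sketch of the binary-to-Gray mapping (equivalent to n ^ (n >> 1) for integers), showing the 127/128 case:

```python
def to_gray(n):
    return n ^ (n >> 1)     # g_i = a_i XOR a_{i+1}, g_{m-1} = a_{m-1}

print(format(to_gray(127), "08b"))  # 01000000
print(format(to_gray(128), "08b"))  # 11000000 -- only 1 bit differs from 127's
```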
[Figure: binary-coded vs. gray-coded bit planes of an example image]
Lossless compression for binary images
- 1-D run-length coding:
  - RLC + VLC according to run-length statistics
- 2-D run-length coding:
  - used for FAX image compression
- Relative address coding (RAC):
  - based on the principle of tracking the binary transitions that begin and end each black and white run
  - combined with VLC
Other methods for lossless compression of binary images
- White block skipping:
  - code solid white lines as 0's, and all other lines with a 1 followed by the original bit pattern
- Direct contour tracing:
  - represent each contour by a single boundary point and a set of directionals
- Predictive differential quantizing (PDQ):
  - a scan-line-oriented contour tracing procedure
- Comparisons of various algorithms:
  - only entropies after pixel mapping are computed, instead of real encoding (see the next slide)
Comparison of various methods
- Run-length coding proved to be the best coding method for bit-plane coded images.
- 2-D techniques (PDQ, DDC, RAC) perform better when compressing binary images or higher bit planes.
- Gray-coded images proved to gain an additional 1 bit/pixel of compression efficiency relative to binary-coded images.
Lossless predictive coding
- Eliminates the interpixel redundancies of closely spaced pixels by extracting and coding only the new information in each pixel (the difference between the actual and predicted value):

  e_n = f_n - \hat{f}_n

- System architecture:
  - the same predictor on the encoder and decoder sides
  - the predicted value is rounded
  - RLC symbol encoder
Methods of prediction
- Use a local, global, or adaptive predictor to generate \hat{f}_n.
- Linear prediction: a linear combination of m previous pixels:

  \hat{f}_n = round\left[ \sum_{i=1}^{m} α_i f_{n-i} \right]

  - m: order of the predictor
  - α_i: prediction coefficients
- Use a local neighborhood (e.g., pixels 1, 2, 3, 4 in the pattern below) for prediction of pixel X in 2-D images:

  2 3 4
  1 X

- Special case: the previous-pixel predictor (a sketch follows below).
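
A sketch of the previous-pixel special case on a synthetic image (the test image is an assumption):

```python
import numpy as np

def predict_previous_pixel(image):
    residual = image.astype(int).copy()
    residual[:, 1:] -= image[:, :-1]     # e(x, y) = f(x, y) - f(x, y-1)
    return residual                      # first column is kept verbatim

img = np.tile(np.linspace(0, 255, 16, dtype=int), (4, 1))
print(predict_previous_pixel(img)[0])    # residuals are small, constant values
```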
Entropy reduction by difference mapping
- Due to the removal of interpixel redundancies by prediction, the first-order entropy of the difference mapping is lower than that of the original image (3.96 bits/pixel vs. 6.81 bits/pixel).
- The probability density function of the prediction errors is highly peaked at zero and characterized by a relatively small variance; it is modeled by the zero-mean, uncorrelated Laplacian pdf:

  p_e(e) = \frac{1}{\sqrt{2} σ_e} e^{-\sqrt{2} |e| / σ_e} = \frac{λ}{2} e^{-λ |e|}
Lossy predictive coding
- Error-free encoding of images seldom results in more than a 3:1 reduction in data.
- A lossy predictive coding model:
  - The predictions in the encoder and decoder must be equivalent (the same) – achieved by placing the encoder's predictor within a feedback loop:

    \dot{f}_n = \dot{e}_n + \hat{f}_n

  - This closed-loop configuration prevents error buildup at the decoder's output.
Delta modulation (DM)
- DM:

  \hat{f}_n = α \dot{f}_{n-1},  \dot{e}_n = +ζ for e_n > 0, -ζ otherwise

- The resulting bit rate is 1 bit/pixel.
- Slope overload effect: ζ is too small to represent the input's large changes, leading to blurred object edges.
- Granular noise effect: ζ is too large to represent the input's small changes, leading to grainy or noisy surfaces.
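
A sketch simulating DM with α = 1 on a step edge, exposing both effects:

```python
import numpy as np

def delta_modulate(signal, step):
    recon = np.zeros_like(signal, dtype=float)
    recon[0] = signal[0]
    for n in range(1, len(signal)):
        pred = recon[n - 1]                      # f_hat_n = alpha * f_dot_{n-1}
        e = signal[n] - pred
        recon[n] = pred + (step if e > 0 else -step)
    return recon

sig = np.concatenate([np.full(20, 50.0), np.full(20, 200.0)])  # a step edge
for step in (2.0, 40.0):
    err = np.abs(delta_modulate(sig, step) - sig)
    print(step, err[:20].mean(), err[20:].mean())
# small step: near-zero granular noise, but severe slope overload after the edge
# large step: visible granular noise on the flat region, faster edge tracking
```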
Delta modulation (cont.)
[Figure: delta modulation example]
Optimal predictors
- Minimize the encoder's mean-square prediction error

  E\{e_n^2\} = E\{[f_n - \hat{f}_n]^2\}

  with \dot{f}_n = \dot{e}_n + \hat{f}_n ≈ e_n + \hat{f}_n = f_n and \hat{f}_n = \sum_{i=1}^{m} α_i f_{n-i}   (DPCM)

- => select the m prediction coefficients to minimize

  E\{e_n^2\} = E\left\{ \left[ f_n - \sum_{i=1}^{m} α_i f_{n-i} \right]^2 \right\}

- The solution: α = R^{-1} r, where

  r = [E\{f_n f_{n-1}\}, E\{f_n f_{n-2}\}, ..., E\{f_n f_{n-m}\}]^T
  α = [α_1, α_2, ..., α_m]^T
  R = the m×m autocorrelation matrix with entries R_{ij} = E\{f_{n-i} f_{n-j}\}
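
A sketch solving α = R⁻¹ r from empirical autocorrelations; the toy 1-D signal is an assumption:

```python
import numpy as np

def optimal_predictor(f, m):
    N = len(f)
    corr = [np.dot(f[:N - k], f[k:]) / (N - k) for k in range(m + 1)]
    R = np.array([[corr[abs(i - j)] for j in range(m)] for i in range(m)])
    r = np.array(corr[1:m + 1])
    return np.linalg.solve(R, r)             # prediction coefficients alpha_i

rng = np.random.default_rng(1)
f = np.cumsum(rng.normal(size=4096))          # highly correlated toy signal
print(optimal_predictor(f, 2))                # alpha_1 near 1, alpha_2 near 0
```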
Optimal predictors (cont.)
- Variance of the prediction error:

  σ_e^2 = σ^2 - α^T r = σ^2 - \sum_{i=1}^{m} E\{f_n f_{n-i}\} α_i

- For a 2-D Markov source with separable autocorrelation function

  E\{f(x,y) f(x-i, y-j)\} = σ^2 ρ_v^i ρ_h^j

  and a 4th-order linear predictor (neighbors 1-4 around X as before):

  \hat{f}(x,y) = α_1 f(x,y-1) + α_2 f(x-1,y-1) + α_3 f(x-1,y) + α_4 f(x-1,y+1)

  the optimal coefficients are α_1 = ρ_h, α_2 = -ρ_v ρ_h, α_3 = ρ_v, α_4 = 0.
- We generally require \sum_{i=1}^{m} α_i ≤ 1.0 to reduce the DPCM decoder's susceptibility to transmission noise; this confines the impact of an input error to a small number of outputs.
Examples of predictors
- Four examples:

  \hat{f}(x,y) = 0.97 f(x,y-1)
  \hat{f}(x,y) = 0.5 f(x,y-1) + 0.5 f(x-1,y)
  \hat{f}(x,y) = 0.75 f(x,y-1) + 0.75 f(x-1,y) - 0.5 f(x-1,y-1)
  \hat{f}(x,y) = 0.97 f(x,y-1) if Δh ≤ Δv, else 0.97 f(x-1,y)   (adaptive, using a local measure of directionality)

- The visually perceptible error decreases as the order of the predictor increases.
Optimal quantization
- The staircase quantization function is an odd function that can be described by its decision levels s_i and reconstruction levels t_i.
- Quantizer design selects the best s_i and t_i for a particular optimization criterion and input probability density function p(s).
L-level Lloyd-Max quantizer
- Optimal in the mean-square error sense:

  \int_{s_{i-1}}^{s_i} (s - t_i) p(s) ds = 0,   i = 1, 2, ..., L/2

  (t_i is the centroid of each decision interval)

  s_i = 0 for i = 0;  s_i = \frac{t_i + t_{i+1}}{2} for i = 1, ..., L/2 - 1;  s_i = ∞ for i = L/2

  (decision levels lie halfway between the reconstruction levels)

  s_{-i} = -s_i,  t_{-i} = -t_i

- This is a non-uniform quantizer. How about an optimal uniform quantizer?
Transform coding
- Subimage decomposition
- Transformation:
  - decorrelate the pixels of each subimage
  - energy packing
- Quantization:
  - eliminate the coefficients that carry the least information
- Coding
Transform selection
- Requirements:
  - orthonormal or unitary forward and inverse transformation kernels
  - basis functions or basis images
  - separable and symmetric kernels
- Types: DFT, DCT, WHT (Walsh-Hadamard transform)
- Compression is achieved during the quantization of the transformed coefficients (not during the transformation step).
WHT
- The summation in the exponent is performed in modulo-2 arithmetic.
- b_k(z) is the k-th bit (from right to left) in the binary representation of z; N = 2^m.

  T(u,v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) g(x,y,u,v)

  f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} T(u,v) h(x,y,u,v)

  g(x,y,u,v) = h(x,y,u,v) = \frac{1}{N} (-1)^{\sum_{i=0}^{m-1} [b_i(x) p_i(u) + b_i(y) p_i(v)]}
DCT
- g(x,y,u,v) = h(x,y,u,v) = α(u) α(v) \cos\left[\frac{(2x+1)uπ}{2N}\right] \cos\left[\frac{(2y+1)vπ}{2N}\right]

  α(u) = \sqrt{1/N} for u = 0;  α(u) = \sqrt{2/N} for u = 1, 2, ..., N-1
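
A sketch of the DCT's energy packing on one 8×8 block (SciPy is assumed to be available):

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = 100 + 10 * np.cos(0.3 * x + 0.2 * y) + rng.normal(0, 1, (8, 8))

T = dctn(block, norm="ortho")                            # 2-D DCT-II
mask = np.abs(T) >= np.sort(np.abs(T), axis=None)[-16]   # keep top 16 of 64
approx = idctn(T * mask, norm="ortho")
print(np.sqrt(np.mean((approx - block) ** 2)))           # small rms error
```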
Approximations using DFT, WHT, and DCT
- Truncating 50% of the transformed coefficients, retaining those of maximum magnitude.
- The RMS reconstruction errors are 1.28 (DFT), 0.86 (WHT), and 0.68 (DCT).
Basis images
- An image is a linear combination of n^2 basis images H_{uv}:

  F = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} T(u,v) H_{uv}

  H_{uv} = \begin{bmatrix}
             h(0,0,u,v) & h(0,1,u,v) & \cdots & h(0,n-1,u,v) \\
             h(1,0,u,v) & \cdots & \cdots & \vdots \\
             \vdots & \cdots & \cdots & \vdots \\
             h(n-1,0,u,v) & h(n-1,1,u,v) & \cdots & h(n-1,n-1,u,v)
           \end{bmatrix}
Transform coefficient masking
- Masking function γ(u,v):

  \hat{F} = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} γ(u,v) T(u,v) H_{uv}

- Optimizing the masking function:

  e_{ms} = E\{ \|F - \hat{F}\|^2 \}
         = E\left\{ \left\| \sum_{u,v} T(u,v) H_{uv} - \sum_{u,v} γ(u,v) T(u,v) H_{uv} \right\|^2 \right\}
         = \cdots = \sum_{u=0}^{n-1} \sum_{v=0}^{n-1} σ^2_{T(u,v)} [1 - γ(u,v)]

- The total mean-square approximation error is the sum of the variances of the discarded transform coefficients.
- Transformations that pack the most information into the fewest coefficients (KLT or DCT) provide the best approximation.
Subimage size selection
- The most popular subimage sizes are 8×8 and 16×16.
- A comparison of n×n subimages for n = 2, 4, 8, 16 (truncating 75% of the resulting coefficients):
  - The error curves of the WHT and DCT flatten as the subimage size grows beyond 8×8, while that of the FT does not.
  - The blocking effect decreases as the subimage size increases.
[Figure: reconstructions using 2×2, 4×4, and 8×8 subimages vs. the original]
Zonal vs. threshold coding
- The coefficients are retained based on:
  - maximum variance: zonal coding
  - maximum magnitude: threshold coding
- Zonal coding:
  - Each DCT coefficient is considered a random variable whose distribution can be computed over the ensemble of all transformed subimages.
  - The variances can also be based on an assumed image model (e.g., a Markov autocorrelation function).
  - The coefficients of maximum variance are usually located around the origin.
- Threshold coding:
  - Select the coefficients that have the largest magnitudes.
  - Causes far less error than the zonal coding result.
Bit allocation for zonal coding
- The retained coefficients are allocated bits according to:
  - The DC coefficient is modeled by a Rayleigh density function.
  - The AC coefficients are often modeled by a Laplacian or Gaussian density.
  - The number of bits is made proportional to \log_2 σ^2_{T(u,v)} (the information content of a Gaussian random variable is proportional to \log_2(σ^2/D)).
[Figure: threshold coding vs. zonal coding results]
Bit allocation in threshold coding
- The transform coefficients of largest magnitude make the most significant contribution to reconstructed subimage quality.
- Inherently adaptive, in the sense that the locations of the retained transform coefficients vary from one subimage to another.
- Retained coefficients are reordered in a 1-D zigzag scanning pattern.
- The mask pattern is run-length coded.
[Figure: zonal mask and its bit allocation; thresholded/retained coefficients and the zigzag ordering pattern]
Bit allocation in threshold coding (cont.)
- Thresholding the transform coefficients – the threshold can be varied as a function of the location of each coefficient within the subimage.
- This results in a variable code rate, but offers the advantage that thresholding and quantization can be combined by replacing the masking γ(u,v) T(u,v) with

  \hat{T}(u,v) = round\left[ \frac{T(u,v)}{Z(u,v)} \right]

  where Z(u,v) is the transform normalization array.
- \dot{T}(u,v) = \hat{T}(u,v) Z(u,v) is the recovered (denormalized) value.
- The elements Z(u,v) can be scaled to achieve a variety of compression levels.
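
A sketch of the combined quantization/denormalization; the ramp Z used here is illustrative, not the JPEG table shown on the next slide:

```python
import numpy as np

def quantize(T, Z):
    return np.round(T / Z)          # T_hat = round(T / Z)

def denormalize(T_hat, Z):
    return T_hat * Z                # T_dot = T_hat * Z

u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
Z = 16.0 + 8.0 * (u + v)            # coarser quantization at higher frequencies
T = np.full((8, 8), 30.0)
T_hat = quantize(T, Z)
print(np.count_nonzero(T_hat))      # 21: high-frequency coefficients drop to zero
print(denormalize(T_hat, Z)[0, 0])  # 32.0: recovered value at the DC position
```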
Normalization (quantization) matrix
- Used in the JPEG standard.
Wavelet coding
- Requirements in encoding an image:
  - an analyzing wavelet
  - the number of decomposition levels
- The discrete wavelet transform (DWT) converts a large portion of the original image to horizontal, vertical, and diagonal decomposition coefficients with zero mean and Laplacian-like distributions.
Wavelet coding (cont.)
- Many computed coefficients can be quantized and coded to minimize intercoefficient and coding redundancy.
- The quantization can be adapted to exploit any positional correlation across the decomposition levels.
- Subdivision of the original image is unnecessary:
  - eliminates the blocking artifact that characterizes DCT-based approximations at high compression ratios
1-D subband decomposition
- Analysis filters:
  - low-pass (approximation): h_0(n)
  - high-pass (detail): h_1(n)
- Synthesis filters:
  - low-pass: g_0(n)
  - high-pass: g_1(n)
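
A sketch of two-band analysis/synthesis using Haar filters (an assumed, simplest choice) demonstrating perfect reconstruction:

```python
import numpy as np

h0 = np.array([1, 1]) / np.sqrt(2)          # analysis low-pass
h1 = np.array([1, -1]) / np.sqrt(2)         # analysis high-pass

def analyze(x):
    a = np.convolve(x, h0)[1::2]            # filter, then downsample by 2
    d = np.convolve(x, h1)[1::2]
    return a, d

def synthesize(a, d, n):
    up_a, up_d = np.zeros(n), np.zeros(n)
    up_a[::2], up_d[::2] = a, d             # upsample by 2
    g0, g1 = h0, -h1                        # Haar synthesis filters
    return np.convolve(up_a, g0)[:n] + np.convolve(up_d, g1)[:n]

x = np.arange(8, dtype=float)
a, d = analyze(x)
print(synthesize(a, d, len(x)))             # reconstructs x (up to float error)
```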
2-D subband image coding – 4-band filter bank
[Figure: 2-D four-band filter bank]
An example of Daubechies orthonormal filters
[Figure: Daubechies orthonormal filter impulse responses]
Comparison between DCT-based and DWT-based coding
- A noticeable decrease of error in the wavelet coding results (at compression ratios of 34:1 and 67:1):
  - DWT-based RMS errors: 2.29, 2.96
  - DCT-based RMS errors: 3.42, 6.33
- At compression ratios of 108:1 and 167:1 (DWT RMS errors: 3.72, 4.73):
  - Even the 167:1 DWT result (rms = 4.73) is better than the 67:1 DCT result (rms = 6.33).
  - At more than twice the compression ratio, the wavelet-based reconstruction has only 75% of the error of the DCT-based result.
Wavelet selection
- The wavelet selection affects all aspects of wavelet coding system design and performance:
  - computational complexity of the transform
  - the system's ability to compress and reconstruct images with acceptable error
- Includes both the decomposition filters and the reconstruction filters.
- Useful analysis property: the number of zero moments.
- Important synthesis property: the smoothness of the reconstruction.
- The most widely used expansion functions:
  - Daubechies wavelets
  - biorthogonal wavelets (allow filters with binary coefficients, i.e., numbers of the form k/2^a)
Comparison between different wavelets
- (1) Haar wavelets: the simplest.
- (2) Daubechies wavelets: the most popular imaging wavelets.
- (3) Symlets: an extension of Daubechies wavelets with increased symmetry.
- (4) Cohen-Daubechies-Feauveau wavelets: biorthogonal wavelets.
Comparison between different wavelets (cont.)
- Comparison of the number of operations required (for the analysis and reconstruction filters, respectively).
- As the computational complexity increases, the information packing ability does as well.
Other issues
- The number of decomposition levels required:
  - The initial decompositions are responsible for the majority of the data compression (3 levels are usually enough).
- Quantizer design:
  - Introduce an enlarged quantization interval around zero (the dead zone).
  - Adapt the size of the quantization interval from scale to scale.
  - Quantization of coefficients at deeper decomposition levels impacts larger areas of the reconstructed image.
Binary image compression standards
- Most of the standards are issued by:
  - the International Standardization Organization (ISO)
  - the Consultative Committee of the International Telephone and Telegraph (CCITT)
- CCITT Group 3 and Group 4 are for binary image compression:
  - originally designed as facsimile (FAX) coding methods
  - G3: nonadaptive, 1-D run-length coding
  - G4: a simplified or streamlined version of G3, 2-D coding only
  - the coding approach is quite similar to the RAC method
- Joint Bilevel Imaging Group (JBIG):
  - a joint committee of CCITT and ISO
  - JBIG1: adaptive arithmetic compression technique (the best average- and worst-case behavior available)
  - JBIG2: achieves compressions 2 to 4 times greater than JBIG1
Continuous-tone image compression – JPEG
- JPEG, wavelet-based JPEG-2000, JPEG-LS.
- JPEG defines three coding systems:
  - lossy baseline coding system
  - extended coding system – for greater compression, higher precision, or progressive reconstruction applications
  - lossless independent coding system
- A product or system must include support for the baseline system.
- Baseline system:
  - input and output: 8 bits
  - quantized DCT values: 11 bits
  - the image is subdivided into blocks of 8×8 pixels for encoding
  - pixels are level-shifted by subtracting 128 gray levels
JPEG (cont.)
- Each block is DCT-transformed.
- Quantized using the quantization (normalization) matrix.
- The DC DCT coefficient is difference-coded relative to the DC coefficient of the previous block.
- The non-zero AC coefficients are coded using a VLC that encodes the coefficient's value and the number of preceding zeros.
- The user is free to construct custom tables and/or arrays, which may be adapted to the characteristics of the image being compressed.
- A special EOB (end-of-block) code is appended after the last non-zero AC coefficient.
Huffman coding of the DC and AC coefficients
- VLC = base code (category code) + value code.
- An example: encode DC = -9.
  - Category 4 covers {-15,...,-8, 8,...,15} (base code = 101).
  - Total length = 7 → value code length = 4.
  - For category K, K additional bits are needed as the value code, computed as either the K LSBs of the value (positive) or the K LSBs of the one's complement of its absolute value (negative): K = 4, value code = 0110.
  - Complete code: 101-0110.
- An example: encode AC = (0, -3).
  - Length of the zero run: 0; AC coefficient value: -3.
  - AC = -3 → category 2 → run/size 0/2 (Table 8.19) → base code = 01.
  - Value code = 00.
  - Complete code = 0100.
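
A sketch of the category and value-code computation used in both examples:

```python
def category(value):
    k, v = 0, abs(value)
    while v:                # number of bits needed for |value|
        k += 1
        v >>= 1
    return k

def value_bits(value):
    k = category(value)
    # negative: K LSBs of the one's complement of |value|
    v = value if value >= 0 else (~abs(value)) & ((1 << k) - 1)
    return format(v, f"0{k}b") if k else ""

print(category(-9), value_bits(-9))   # 4 0110
print(category(-3), value_bits(-3))   # 2 00
```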
A JPEG coding example

Original 8×8 block:

  52 55 61  66  70  61 64 73
  63 59 66  90 109  85 69 72
  62 59 68 113 144 104 66 73
  63 58 71 122 154 106 70 69
  67 61 68 104 126  88 68 70
  79 65 60  70  77  68 58 75
  85 71 64  59  55  61 65 83
  87 79 69  68  65  76 78 94

Level-shifted (subtract 128):

  -76 -73 -67 -62 -58 -67 -64 -55
  -65 -69 -62 -38 -19 -43 -59 -56
  -66 -69 -60 -15  16 -24 -62 -55
  -65 -70 -57  -6  26 -22 -58 -59
  -61 -67 -60 -24  -2 -40 -60 -58
  -49 -63 -68 -58 -51 -65 -70 -53
  -43 -57 -64 -69 -73 -67 -63 -45
  -41 -49 -59 -60 -63 -52 -50 -34

DCT-transformed:

  -415 -29 -62  25  55 -20 -1  3
     7 -21 -62   9  11  -7 -6  6
   -46   8  77 -25 -30  10  7 -5
   -50  13  35 -15  -9   6  0  3
    11  -8 -13  -2  -1   1 -4  1
   -10   1   3  -3  -1   0  2 -1
    -4  -1   2  -1   2  -3  1 -2
    -1   1  -1  -2  -1  -1  0 -1
Quantized (divided by the normalization matrix and rounded):

  -26 -3 -6  2  2 0 0 0
    1 -2 -4  0  0 0 0 0
   -3  1  5 -1 -1 0 0 0
   -4  1  2 -1  0 0 0 0
    1  0  0  0  0 0 0 0
    0  0  0  0  0 0 0 0
    0  0  0  0  0 0 0 0
    0  0  0  0  0 0 0 0

1-D zigzag scanning: [-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]

Symbols to be coded: (DPCM = -9, assumed), (0,-3), (0,1), (0,-3), (0,-2), (0,-6), (0,2), (0,-4), (0,1), (0,-4), (0,1), (0,1), (0,5), (1,2), (2,-1), (0,2), (5,-1), (0,-1), EOB

Final codes: 1010110, 0100, 001, 0100, 0101, 100001, 0110, 100011, 001, 100011, 001, 001, 100101, 11100110, 110110, 0110, 11110100, 000, 1010
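
A sketch of the zigzag scan (order by anti-diagonal, alternating direction) applied to the quantized block above:

```python
import numpy as np

def zigzag(block):
    idx = sorted(((u, v) for u in range(8) for v in range(8)),
                 key=lambda p: (p[0] + p[1],
                                p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[u, v] for u, v in idx]

q = np.array([
    [-26, -3, -6,  2,  2, 0, 0, 0],
    [  1, -2, -4,  0,  0, 0, 0, 0],
    [ -3,  1,  5, -1, -1, 0, 0, 0],
    [ -4,  1,  2, -1,  0, 0, 0, 0],
    [  1,  0,  0,  0,  0, 0, 0, 0],
    [  0,  0,  0,  0,  0, 0, 0, 0],
    [  0,  0,  0,  0,  0, 0, 0, 0],
    [  0,  0,  0,  0,  0, 0, 0, 0]])
print(zigzag(q)[:26])   # matches the 26 values listed above (before EOB)
```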
Reconstructed after VLD and requantization (denormalization, inverse DCT, level shift):

  58 64 67  64  59  62 70 78
  56 55 67  89  98  88 74 69
  60 50 70 119 141 116 80 64
  69 51 71 128 149 115 77 68
  74 53 64 105 115  84 65 72
  76 57 56  74  75  57 57 74
  83 69 59  60  61  61 67 78
  93 81 67  62  69  80 84 84

Residual (original minus reconstructed):

  -6 -9 -6  2 11  -1  -6 -5
   7  4 -1  1 11  -3  -5  3
   2  9 -2 -6 -3 -12 -14  9
  -6  7  0 -4 -5  -9  -7  1
  -7  8  4 -1 11   4   3 -2
   3  8  4 -4  2  11   1  1
   2  2  5 -1 -6   0  -2  5
  -6 -2  2  6 -4  -4  -6 10
JPEG-2000
- Increased flexibility in accessing the compressed data:
  - portions of a JPEG 2000 compressed image can be extracted for re-transmission, storage, display, and editing
- Procedures:
  - DC level shift by subtracting 128.
  - Optional decorrelation using a reversible or irreversible linear combination of the components (e.g., an RGB-to-YCrCb component transformation); both RGB-based and YCrCb-based JPEG 2000 coding are allowed:

    Y_0(x,y) = 0.299 I_0(x,y) + 0.587 I_1(x,y) + 0.114 I_2(x,y)
    Y_1(x,y) = -0.16875 I_0(x,y) - 0.33126 I_1(x,y) + 0.5 I_2(x,y)
    Y_2(x,y) = 0.5 I_0(x,y) - 0.41869 I_1(x,y) - 0.08131 I_2(x,y)

  - Y_1 and Y_2 are difference images highly peaked around zero.
JPEG-2000 (cont.)
- Optionally divided into tiles (rectangular arrays of pixels that can be extracted and reconstructed independently), providing a mechanism for accessing a limited region of a coded image.
- A 2-D DWT using a biorthogonal 5-3 coefficient filter (lossless) or a 9-7 coefficient filter (lossy) produces a low-resolution approximation and horizontal, vertical, and diagonal frequency components (LL, HL, LH, HH bands).
9-7 filter for lossy DWT
[Table: 9-7 analysis filter coefficients not recoverable from this extraction]

5-3 filter for lossless DWT (symmetric coefficients; i counts from the filter center):

  i   low-pass filter H0   high-pass filter H1
  0   6/8                  1
  1   2/8                  -1/2
  2   -1/8
JPEG-2000 (cont.)
- A fast DWT algorithm or a lifting-based approach is often used.
- Iterative decomposition of the approximation part yields a multi-scale pyramid (this differs from subband decomposition). The standard does not specify the number of scales (i.e., the number of decomposition levels) to be computed.
- Important visual information is concentrated in a few coefficients.
JPEG-2000 (cont.)
- Coefficient quantization is adapted to individual scales and subbands:

  q_b(u,v) = sign[a_b(u,v)] \cdot floor\left[ \frac{|a_b(u,v)|}{Δ_b} \right]

  quantization step size: Δ_b = 2^{R_b - ε_b} \left(1 + \frac{μ_b}{2^{11}}\right)

  where R_b is the nominal dynamic range (in bits) of subband b, and ε_b and μ_b are the numbers of bits allocated to the exponent and mantissa of the subband's coefficients.
- For error-free compression, μ_b = 0, R_b = ε_b, and Δ_b = 1.
- The numbers of exponent and mantissa bits may be provided to the decoder on a per-subband basis, called explicit quantization, or for the LL subband only, called implicit quantization (the remaining subbands are then quantized using extrapolated LL-subband parameters).
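
A sketch of the step-size formula and dead-zone quantizer; the parameter values are illustrative, not taken from the standard's tables:

```python
import math

def step_size(R_b, eps_b, mu_b):
    return 2.0 ** (R_b - eps_b) * (1 + mu_b / 2 ** 11)

def quantize(a, delta):
    # sign(a) * floor(|a| / delta)
    return int(math.copysign(math.floor(abs(a) / delta), a))

delta = step_size(R_b=8, eps_b=8, mu_b=0)   # error-free case: Delta_b = 1
print(delta, quantize(-37.6, delta))        # 1.0 -37

delta = step_size(R_b=8, eps_b=6, mu_b=512) # a lossy example
print(delta, quantize(-37.6, delta))        # 5.0 -7
```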
JPEG-2000 (cont.)
- Quantized coefficients are arithmetically coded on a bit-plane basis:
  - The coefficients are arranged into rectangular blocks called code blocks, which are individually coded one bit plane at a time.
  - Coding starts from the most significant bit plane with nonzero elements and proceeds to the least significant bit plane.
  - Encoding uses a context-based arithmetic coding method.
- The coding outputs from each code block are grouped to form layers.
- The resulting layers are finally partitioned into packets.
- The encoder may encode M_b bit planes for a particular subband while the decoder decodes only N_b of them, thanks to the embedded nature of the code stream.
[Figure: code blocks from decomposition levels N+1, N, and N-1 each contribute a bit-stream; the bit-streams are assembled into quality layers 1-5 (some layer contributions may be empty), organized along resolution and R-D/quality axes]
Video compression – motion-compensated transform coding
- Standards:
  - video teleconferencing standards (H.261, H.263, H.263+, H.320, H.264)
  - multimedia standards (MPEG-1/2/4)
- A combination of predictive coding (in the temporal domain) and transform coding (in the spatial domain).
- Estimate (predict) the current image by propagating pixels of the previous frame along their motion trajectories – motion compensation.
- Success depends on the accuracy, speed, and robustness of the displacement estimator – motion estimation.
- Motion estimation is performed on each image block (16×16 or 8×8 pixels).
Motion-compensated transform coding – encoder
[Figure: encoder block diagram]
Motion-compensated transform coding – decoder

  compressed bit stream → BUF → VLD → IT → (+) → reconstructed frame
                                            ↑
                              MC predictor ← motion vectors

- IT: inverse DCT transform
- VLD: variable-length decoding
2-D motion estimation techniques
- Assumptions:
  - rigid translational movements only
  - uniform illumination in time and space
- Methods:
  - optical flow approach
  - pel-recursive approach
  - block-matching approach
- Different from motion estimation in target tracking:
  - causal operation (forward prediction; no future frames at the decoder)
  - goal: reproduce pictures with minimum distortion, not necessarily estimate accurate motion parameters
Block-matching (BM) motion estimation
- Find the best match (mv_x, mv_y) between the current image block and candidate blocks in the previous frame by minimizing the cost

  \sum_{(x,y) ∈ block} \left| x_n(x,y) - x_{n-1}(x + mv_x, y + mv_y) \right|

- One motion vector (MV) per block between successive frames.
[Figure: block in frame k matched against a search window in frame k-1]
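
A sketch of full-search block matching with a SAD cost over a ±7 search range:

```python
import numpy as np

def full_search(cur, prev, bx, by, bsize=16, srange=7):
    block = cur[by:by + bsize, bx:bx + bsize]
    best = (0, 0, np.inf)
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y, x = by + dy, bx + dx
            if 0 <= y and 0 <= x and y + bsize <= prev.shape[0] \
                    and x + bsize <= prev.shape[1]:
                sad = np.abs(block - prev[y:y + bsize, x:x + bsize]).sum()
                if sad < best[2]:
                    best = (dx, dy, sad)
    return best                               # (mv_x, mv_y, SAD)

rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (64, 64)).astype(float)
cur = np.roll(prev, (2, -3), axis=(0, 1))     # simulate a global translation
print(full_search(cur, prev, bx=24, by=24))   # recovers (mv_x, mv_y) = (3, -2)
```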
BM motion estimation techniques
- Full search method.
- Fast search (reduced search):
  - usually a multistage search procedure that evaluates fewer search points
  - locally (or sub-)optimal in general
  - evaluation criteria: number of search points, noise immunity
- Existing algorithms:
  - three-step method
  - four-step method
  - log-search method
  - hierarchical matching method
  - prediction search method
  - Kalman prediction method
Fast full search
- Strategy: quit early to the next search point while computing partial errors.
- Re-ordering the error computation:
  - conventional: raster-scanning order
  - improvement: row or column sorting by gradient magnitudes in decreasing order
- A 3x-4x speedup with respect to traditional full search.
[Figure: numeric example of partial-error accumulation over a search area, with the template rows re-ordered by gradient magnitude; accumulation quits once the partial error exceeds the current minimum (here, 100)]
Three-step search
- The full search examines 15×15 = 225 positions; the three-step search examines 9+8+8 = 25 positions (assuming an MV range of -7 to +7).
[Figure: search positions of the three steps within the search range]
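
A sketch of the three-step search; for simplicity the center point is re-evaluated at each stage, and a smooth test image is assumed so the error surface is unimodal:

```python
import numpy as np

def sad(cur, prev, bx, by, dx, dy, bs=16):
    return np.abs(cur[by:by + bs, bx:bx + bs]
                  - prev[by + dy:by + dy + bs, bx + dx:bx + dx + bs]).sum()

def three_step_search(cur, prev, bx, by):
    mvx = mvy = 0
    for step in (4, 2, 1):                    # step sizes cover -7..+7
        candidates = [(mvx + i * step, mvy + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        mvx, mvy = min(candidates, key=lambda mv: sad(cur, prev, bx, by, *mv))
    return mvx, mvy

y, x = np.mgrid[0:64, 0:64]
prev = 255 * np.exp(-((y - 32.0) ** 2 + (x - 32.0) ** 2) / 200)  # smooth blob
cur = np.roll(prev, (2, -3), axis=(0, 1))
print(three_step_search(cur, prev, 24, 24))   # (3, -2)
```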
Predictive search
- Exploits the similarity of MVs between adjacent macroblocks.
- Obtain an initial MV prediction from the neighboring macroblocks by averaging: MV_predict = (MV1 + MV2 + MV3)/3.
- Prediction is followed by a small-area detailed search.
- Significantly reduces the CPU time, by 4-10x.
[Figure: MV1, MV2, MV3 of neighboring macroblocks predict the search area]
Fractional-pixel motion estimation
- Question: what happens if the motion vector (mv_x, mv_y) has real-valued components instead of integer ones?
- Answer: find the integer motion vector first, then interpolate from the surrounding pixels for a refined search. The most popular case is half-pixel accuracy (i.e., 1 out of 8 surrounding half-pixel positions for refinement). 1/3-pixel accuracy has been proposed for the H.263L standard.
- Half-pixel accuracy often leads to a match with a smaller residue (higher reconstruction quality), and hence decreases the bit rate required by the subsequent encoding.
[Figure: integer MV position surrounded by candidate half-pixel MV positions]
Evaluation of BM motion estimation
- Advantages:
  - straightforward, regular operation; promising for VLSI chip design
  - robustness (no differential operations as in the optical flow approach)
- Disadvantages:
  - cannot handle several moving objects within one block (e.g., around occlusion boundaries)
- Improvements:
  - post-processing of images to eliminate the blocking effect
  - interpolation of images, or averaging of motion vectors, to subpixel accuracy
Image prediction structure
- I/P frames only (H.261, H.263) – forward prediction:

  I P P P P P .....

- I/P/B frames (MPEG-x) – forward and bidirectional prediction:

  I B B B P B B B P B B B P B .....

  (I: intra-frame, P: predictive frame, B: bi-directional frame)
