
Entropy Encoding in Wavelet Image Compression

Myung-Sin Song

Department of Mathematics and Statistics, Southern Illinois University Edwardsville
[email protected]

Summary. Entropy encoding is a form of lossless compression performed on an image after the quantization stage. It makes it possible to represent an image more efficiently, with the smallest memory for storage or transmission. In this paper we explore various schemes of entropy encoding and the mathematics behind how each of them works.

1 Introduction

In the process of wavelet image compression, there are three major steps that make the compression possible, namely the decomposition, quantization and entropy encoding steps. While quantization may be a lossy step, in which some of the data may be lost and not recovered, entropy encoding is a lossless step that compresses the data further. [13], [18], [5]
In this paper we discuss various entropy encoding schemes that are
used by engineers (in various applications).

1.1 Wavelet Image Compression

In wavelet image compression, after the quantization step (see Figure 1), entropy encoding, which is a lossless form of compression, is performed on the image for more efficient storage. Either 8 or 16 bits are required to store a pixel of a digital image. With efficient entropy encoding, we can use a smaller number of bits to represent a pixel in an image; this results in less memory usage to store or even transmit an image. The Karhunen-Loève theorem enables us to pick the best basis, and thus to minimize the entropy and error, to better represent an image for optimal storage or transmission. Also, Shannon-Fano entropy (see
section 3.3), Huffman coding (see section 3.4), Kolmogorov entropy (see
section 3.2) and arithmetic coding (see section 3.5) are ones that are
used by engineers. Here, optimal means it uses the least memory space to represent the data; i.e., instead of using 16 bits, use 11 bits. Thus, the best basis found makes it possible to represent the digital image with less storage memory. In addition, the choice of entropy encoding scheme varies; one might take into account the effectiveness of the coding and how difficult the scheme is to implement in programming code. We will also discuss how those preferences are made in section 3.

[Figure: Source Image → Forward Transform → Quantization → Entropy Encoding → Compressed Image → Store or Transmit; Compressed Image → Entropy Decoding → Inverse Quantization → Backward Transform → Reconstructed Image]
Fig. 1. Outline of the wavelet image compression process. [13]

1.2 Geometry in Hilbert Space


While finite or infinite families of nested subspaces are ubiquitous in
mathematics, and have been popular in Hilbert space theory for gen-
erations (at least since the 1930s), this idea was revived in a different
guise in 1986 by Stéphane Mallat, then an engineering graduate stu-
dent. In its adaptation to wavelets, the idea is now referred to as the
multiresolution method.
What made the idea especially popular in the wavelet community
was that it offered a skeleton on which various discrete algorithms in
applied mathematics could be attached and turned into wavelet con-
structions in harmonic analysis. In fact what we now call multiresolu-
tions have come to signify a crucial link between the world of discrete
wavelet algorithms, which are popular in computational mathematics
and in engineering (signal/image processing, data mining, etc.) on the
one side, and on the other side continuous wavelet bases in function
spaces, especially in L2 (Rd ). Further, the multiresolution idea closely
mimics how fractals are analyzed with the use of finite function systems.

But in mathematics, or more precisely in operator theory, the underlying idea dates back to work of John von Neumann, Norbert Wiener,
and Herman Wold, where nested and closed subspaces in Hilbert space
were used extensively in an axiomatic approach to stationary processes,
especially for time series. Wold proved that any (stationary) time series
can be decomposed into two different parts: The first (deterministic)
part can be exactly described by a linear combination of its own past,
while the second part is the opposite extreme; it is unitary, in the
language of von Neumann.
von Neumann’s version of the same theorem is a pillar in operator
theory. It states that every isometry in a Hilbert space H is the unique
sum of a shift isometry and a unitary operator, i.e., the initial Hilbert
space H splits canonically as an orthogonal sum of two subspaces Hs
and Hu in H, one which carries the shift operator, and the other Hu the
unitary part. The shift isometry is defined from a nested scale of closed
spaces Vn , such that the intersection of these spaces is Hu . Specifically,

· · · ⊂ V−1 ⊂ V0 ⊂ V1 ⊂ V2 ⊂ · · · ⊂ Vn ⊂ Vn+1 ⊂ · · ·
⋀_n V_n = H_u ,   and   ⋁_n V_n = H.

An important fact about the wavelet application is that then H_u = {0}.


However, Stéphane Mallat was motivated instead by the notion of
scales of resolutions in the sense of optics. This in turn is based on
a certain “artificial-intelligence” approach to vision and optics, devel-
oped earlier by David Marr at MIT, an approach which imitates the
mechanism of vision in the human eye.
The connection from these developments in the 1980s back to von
Neumann is this: Each of the closed subspaces Vn corresponds to a
level of resolution in such a way that a larger subspace represents a
finer resolution. Resolutions are relative, not absolute! In this view,
the relative complement of the smaller (or coarser) subspace in larger
space then represents the visual detail which is added in passing from
a blurred image to a finer one, i.e., to a finer visual resolution.
This view became an instant hit in the wavelet community, as it
offered a repository for the fundamental father and the mother func-
tions, also called the scaling function ϕ, and the wavelet function ψ.
Via a system of translation and scaling operators, these functions then
generate nested subspaces, and we recover the scaling identities which
initialize the appropriate algorithms. What results is now called the
family of pyramid algorithms in wavelet analysis. The approach itself
is called the multiresolution approach (MRA) to wavelets. And in the meantime various generalizations (GMRAs) have emerged.
Haar’s work in 1909–1910 had implicitly the key idea which got
wavelet mathematics started on a roll 75 years later with Yves Meyer,
Ingrid Daubechies, Stéphane Mallat, and others—namely the idea of a
multiresolution. In that respect Haar was ahead of his time. See Figures
2 and 3 for details.

[Figure: the scaling operator acting on the nested subspaces . . . V-1 ⊂ V0 ⊂ V1 ⊂ V2 ⊂ V3 . . ., with ϕ ∈ V0 and ψ ∈ W0.]
Fig. 2. Multiresolution. L2 (Rd )-version (continuous); ϕ ∈ V0 , ψ ∈ W0 .

[Figure: the discrete multiresolution generated by the operators S0 and S1.]
Fig. 3. Multiresolution. l2 (Z)-version (discrete); ϕ ∈ V0 , ψ ∈ W0 .

· · · ⊂ V−1 ⊂ V0 ⊂ V1 ⊂ · · · , V0 + W0 = V1
The word “multiresolution” suggests a connection to optics from
physics. So that should have been a hint to mathematicians to take a
closer look at trends in signal and image processing! Moreover, even
staying within mathematics, it turns out that as a general notion this
same idea of a “multiresolution” has long roots in mathematics, even
in such modern and pure areas as operator theory and Hilbert-space
geometry. Looking even closer at these interconnections, we can now
recognize scales of subspaces (so-called multiresolutions) in classical
algorithmic construction of orthogonal bases in inner-product spaces,
now taught in lots of mathematics courses under the name of the Gram–
Schmidt algorithm. Indeed, a closer look at good old Gram–Schmidt
reveals that it is a matrix algorithm; hence new mathematical tools
involving non-commutativity!

If the signal to be analyzed is an image, then why not select a fixed but suitable resolution (or a subspace of signals corresponding
to a selected resolution), and then do the computations there? The
selection of a fixed “resolution” is dictated by practical concerns. That
idea was key in turning computation of wavelet coefficients into iterated
matrix algorithms. As the matrix operations get large, the computation
is carried out in a variety of paths arising from big matrix products.
The dichotomy, continuous vs. discrete, is quite familiar to engineers.
The industrial engineers typically work with huge volumes of numbers.
In the formulas, we have the following two indexed number systems
a := (hi ) and d := (gi ), a is for averages, and d is for local differences.
They are really the input for the DWT. But they also are the key link
between the two transforms, the discrete and continuous. The link is
made up of the following scaling identities:
ϕ(x) = 2 ∑_{i∈Z} h_i ϕ(2x − i);

ψ(x) = 2 ∑_{i∈Z} g_i ϕ(2x − i);

and (low-pass normalization) ∑_{i∈Z} h_i = 1. The scalars (h_i) may be
real or complex; they may be finite or infinite in number. If there are
four of them, it is called the “four tap”, etc. The finite case is best for
computations since it corresponds to compactly supported functions.
This means that the two functions ϕ and ψ will vanish outside some
finite interval on a real line.
The two number systems are further subjected to orthogonality relations, of which

∑_{i∈Z} h̄_i h_{i+2k} = (1/2) δ_{0,k}                    (1)

is the best known.
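
As a quick check of these constraints, here is a minimal sketch (Python with NumPy; the helper name and the tolerance are my own) that verifies the low-pass normalization and the orthogonality relation (1) for the Haar filter and the Daubechies four-tap filter, both written in the ∑_i h_i = 1 normalization used above.

```python
import numpy as np

def check_filter(h, tol=1e-10):
    """Check the low-pass normalization and the orthogonality relation (1)."""
    h = np.asarray(h, dtype=float)
    # Low-pass normalization: sum_i h_i = 1.
    assert abs(h.sum() - 1.0) < tol, "normalization fails"
    # Orthogonality: sum_i h_i * h_{i+2k} = (1/2) delta_{0,k} (real filters).
    n = len(h)
    for k in range(-(n // 2), n // 2 + 1):
        s = sum(h[i] * h[i + 2 * k]
                for i in range(n) if 0 <= i + 2 * k < n)
        expected = 0.5 if k == 0 else 0.0
        assert abs(s - expected) < tol, f"orthogonality fails at k={k}"
    return True

# Haar filter: h = (1/2, 1/2).
check_filter([0.5, 0.5])

# Daubechies "four tap" filter, written in the sum-to-one normalization.
sqrt3 = np.sqrt(3.0)
d4 = np.array([1 + sqrt3, 3 + sqrt3, 3 - sqrt3, 1 - sqrt3]) / 8.0
check_filter(d4)
print("normalization and orthogonality relations hold")
```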


Our next section outlines how the whole wavelet image compression process works, step by step.
In our next section we give the general context and definitions from operators in Hilbert space which we shall need: we discuss the particular orthonormal bases (ONBs) and frames which we use, and we recall
the operator theoretic context of the Karhunen-Loève theorem [1]. In
approximation problems involving a stochastic component (for example
noise removal in time-series or data resulting from image processing)
one typically ends up with correlation kernels; in some cases as frame
kernels; see [10]. In some cases they arise from systems of vectors in
Hilbert space which form frames (see [10]). In some cases parts of the
frame vectors fuse (fusion-frames) onto closed subspaces, and we will
be working with the corresponding family of (orthogonal) projections.
Either way, we arrive at a family of selfadjoint positive semidefinite
operators in Hilbert space. The particular Hilbert space depends on
the application at hand. While the Spectral Theorem does allow us to diagonalize these operators, the direct application of the Spectral Theorem may lead to continuous spectrum, which is not directly useful in computations, or it may not be computable by recursive algorithms.
The questions we address are optimality of approximation in a va-
riety of ONBs, and the choice of the “best” ONB. Here “best” is given
two precise meanings: (1) In the computation of a sequence of approxi-
mations to the frame vectors, the error terms must be smallest possible;
and similarly (2) we wish to minimize the corresponding sequence of en-
tropy numbers (referring to von Neumann’s entropy). In two theorems
we make precise an operator theoretic Karhunen-Loève basis, which
we show is optimal with regard to both criteria (1) and (2). But before
we prove our theorems, we give the two problems an operator theo-
retic formulation; and in fact our theorems are stated in this operator
theoretic context.

2 How it works

In wavelet image compression, wavelet decomposition is performed on a digital image. Here, an image is treated as a matrix of functions where
the entries are pixels. The following is an example of a representation
for a digitized image function:

          ⎡ f(0, 0)        f(0, 1)        · · ·   f(0, N − 1)     ⎤
f(x, y) = ⎢ f(1, 0)        f(1, 1)        · · ·   f(1, N − 1)     ⎥      (2)
          ⎢    ⋮              ⋮            ⋱          ⋮           ⎥
          ⎣ f(M − 1, 0)    f(M − 1, 1)    · · ·   f(M − 1, N − 1) ⎦

After the decomposition, quantization is performed on the image. The quantization may be lossy (meaning some information is lost) or lossless. Then a lossless means of compression, entropy encoding, is applied to the image to minimize the memory space needed for storage or transmission. Here the mechanism of entropy will be discussed.

2.1 Entropy Encoding

In most images neighboring pixels are correlated and thus contain redundant information. Our task is to find a less correlated representation of the image, and then perform redundancy reduction and irrelevancy reduction. Redundancy reduction removes duplication from the signal source (for instance a digital image). Irrelevancy reduction omits parts
of the signal that will not be noticed by the Human Visual System
(HVS).
Entropy encoding further compresses the quantized values in a lossless manner, which gives better compression overall. It uses a model to
accurately determine the probabilities for each quantized value and
produces an appropriate code based on these probabilities so that the
resultant output code stream will be smaller than the input stream.

Some Terminology

(i) Spatial Redundancy: correlation between neighboring pixel values.
(ii) Spectral Redundancy: correlation between different color planes or spectral bands.
Inside the paper we use (ϕ_i) and (ψ_i) to denote generic ONBs. However, in wavelet theory [4] there is a tradition of reserving ϕ for the father function and ψ for the mother function. When a digital image is 1-level wavelet decomposed from the matrix representation in section 2, the 1-level wavelet transform of the N × M image can be represented as
 1
a | h1

f 7→ −− −− (3)


v1 | d1

where the subimages h1 , d1 , a1 and v1 each have the dimension of N/2


by M/2.

a¹ = V¹_m ⊗ V¹_n :  ϕ^A(x, y) = ϕ(x)ϕ(y) = ∑_i ∑_j h_i h_j ϕ(2x − i) ϕ(2y − j)

h¹ = V¹_m ⊗ W¹_n :  ψ^H(x, y) = ψ(x)ϕ(y) = ∑_i ∑_j g_i h_j ψ(2x − i) ϕ(2y − j)

v¹ = W¹_m ⊗ V¹_n :  ψ^V(x, y) = ϕ(x)ψ(y) = ∑_i ∑_j h_i g_j ϕ(2x − i) ψ(2y − j)

d¹ = W¹_m ⊗ W¹_n :  ψ^D(x, y) = ψ(x)ψ(y) = ∑_i ∑_j g_i g_j ψ(2x − i) ψ(2y − j)
                                                                              (4)
where ϕ is the father function and ψ is the mother function in the sense of wavelets, the V spaces denote the average spaces and the W spaces the difference spaces from multiresolution analysis (MRA) [4]; h and g are the low-pass and high-pass filter coefficients, respectively.
a1 : the first averaged image, which consists of average intensity values
of the original image. Note that only ϕ function, V space and h
coefficients are used here.
h1 : the first detail image of horizontal components, which consists of
intensity difference along the vertical axis of the original image. Note
that ϕ function is used on y and ψ function on x, W space for x
values and V space for y values; and both h and g coefficients are
used accordingly.
v1 : first detail image of vertical components, which consists of intensity
difference along the horizontal axis of the original image. Note that
ϕ function is used on x and ψ function on y, W space for y values
and V space for x values; and both h and g coefficients are used
accordingly.
d1 : the first detail image of diagonal components, which consists of intensity difference along the diagonal axis of the original image. Note that only the ψ function, W spaces and g coefficients are used here.
The original image is reconstructed from the decomposed image by taking the sum of the averaged image and the detail images and scaling by a scaling factor.
See [19], [17].
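
To make the decomposition above concrete, here is a minimal sketch of a 1-level 2-D Haar decomposition (Python with NumPy; the function name and the toy image are my own, and only the Haar filter is handled), producing the four subimages a1, h1, v1 and d1, each half the size of the input in each direction. The subband labeling follows the conventions listed above.

```python
import numpy as np

def haar_decompose_1level(image):
    """One level of a 2-D Haar wavelet decomposition.

    Returns the average (a1), horizontal (h1), vertical (v1) and
    diagonal (d1) subimages, each N/2 by M/2.
    """
    x = np.asarray(image, dtype=float)
    assert x.shape[0] % 2 == 0 and x.shape[1] % 2 == 0, "even dimensions expected"

    # Filter along rows (the x direction): averages and differences of pixel pairs.
    lo_rows = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi_rows = (x[:, 0::2] - x[:, 1::2]) / 2.0

    # Filter along columns (the y direction) of each half.
    a1 = (lo_rows[0::2, :] + lo_rows[1::2, :]) / 2.0   # phi on x, phi on y
    h1 = (hi_rows[0::2, :] + hi_rows[1::2, :]) / 2.0   # psi on x, phi on y
    v1 = (lo_rows[0::2, :] - lo_rows[1::2, :]) / 2.0   # phi on x, psi on y
    d1 = (hi_rows[0::2, :] - hi_rows[1::2, :]) / 2.0   # psi on x, psi on y
    return a1, h1, v1, d1

# Toy example: an 8x8 "image" with an intensity edge.
img = np.zeros((8, 8)); img[:, 4:] = 255
a1, h1, v1, d1 = haar_decompose_1level(img)
print(a1.shape, h1.shape, v1.shape, d1.shape)   # (4, 4) each
```
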
This decomposition is not limited to one step; it can be repeated on the averaged detail, depending on the size of the image. Once it stops at a certain level, quantization (see [15], [13], [18]) is done on the image. This quantization step may be lossy or lossless. Then the lossless entropy encoding is done on the decomposed and quantized image, as in Figure 6.
Figure 4 illustrates how the wavelet image decomposition is done mathematically. Figures 5 and 6 give an example of how the average, horizontal, vertical and diagonal details are obtained through the wavelet decomposition of a digital image of an octagon.
There are various means of quantization, and one commonly used method is called thresholding. Thresholding is a method of data reduction in which pixel values below the thresholding value are set to 0 or to some other 'appropriate' value. Soft thresholding is defined as follows:

             ⎧ 0       if |x| ≤ λ
T_soft(x) =  ⎨ x − λ   if x > λ                              (5)
             ⎩ x + λ   if x < −λ


[Figure: how the transformed image is subdivided into the subbands S0S^H, S^H, S0S^V, S0S^D, S^V and S^D.]
Fig. 4. How the subdivision works.

and hard thresholding as follows:

Fig. 5. Original octagon image.

Fig. 6. Octagon after 2-level decomposition.

             ⎧ 0   if |x| ≤ λ
T_hard(x) =  ⎨                                               (6)
             ⎩ x   if |x| > λ

where λ ∈ R+ and x is a pixel value. It can be observed from the definitions that the difference between them lies in how the coefficients larger than the threshold value λ in absolute value are handled. In hard thresholding, these coefficient values are left alone. In soft thresholding, by contrast, the coefficient values are decreased by λ if positive and increased by λ if negative [20]. Also, see [19], [8], [17].
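
A minimal sketch of the two thresholding rules (5) and (6) (Python with NumPy; the function names are my own), applied elementwise to an array of wavelet coefficients:

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft thresholding T_soft as in (5): shrink magnitudes toward zero by lam."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def hard_threshold(x, lam):
    """Hard thresholding T_hard as in (6): zero out coefficients with |x| <= lam."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= lam, 0.0, x)

coeffs = np.array([-3.0, -0.4, 0.2, 0.9, 2.5])
print(soft_threshold(coeffs, 1.0))   # small values become 0, large ones shrink by 1
print(hard_threshold(coeffs, 1.0))   # small values become 0, large ones are kept
```
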
Another way of quantization is as follows:
Definition 1. Let X be a set, and K be a discrete set. Let Q and D be mappings Q : X → K and D : K → X. Q and D are such that

‖x − D(Q(x))‖ ≤ ‖x − D(d)‖,   for all d ∈ K.

Applying Q to some x ∈ X is called quantization, and Q(x) is the quantized value of x. Likewise, applying D to some k ∈ K is called dequantization, and D(k) is the dequantized value of k. [15]
During the quantization process, the number of bits needed to store the wavelet transformed coefficients is reduced by reducing the precision of the values. This is a many-to-one mapping, meaning that it is a lossy process resulting in lossy compression.
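
A minimal sketch of a quantization/dequantization pair in the sense of Definition 1 (Python; the uniform step size delta and the function names are my own illustrative choices): Q maps a coefficient to the index of the nearest multiple of delta, and D maps the index back, so that ‖x − D(Q(x))‖ is as small as possible.

```python
def quantize(x, delta):
    """Q : X -> K, map x to the index of the nearest multiple of delta."""
    return int(round(x / delta))

def dequantize(k, delta):
    """D : K -> X, map the index back to its representative value."""
    return k * delta

delta = 0.5
for x in (1.90, -0.23, 0.26):
    k = quantize(x, delta)
    print(x, "->", k, "->", dequantize(k, delta))
# Many nearby values map to the same index k: a many-to-one (lossy) mapping.
```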

Entropy encoding further compresses the quantized values in a lossless manner, which gives better compression overall. It uses a model to
accurately determine the probabilities for each quantized value and
produces an appropriate code based on these probabilities so that the
resultant output code stream will be smaller than the input stream.

2.2 Benefits of Entropy Encoding

One might think that the quantization step suffices for compression. It
is true that the quantization does compress the data tremendously. Af-
ter the quantization step many of the pixel values are either eliminated
or replaced with other suitable values. However, those pixel values are
still represented with either 8 or 16 bits. See section 1.1. So we aim to minimize the number of bits used by means of entropy encoding. The Karhunen-Loève transform, or PCA, makes it possible to represent each pixel of the digital image with the fewest bits according to its probability, and thus yields a lossless, optimized representation using the least amount of memory.

3 Various entropy encoding schemes

In this section we discuss various entropy encoding schemes, how they work, and the mathematics behind them.

3.1 The Karhunen-Loève transform

The Karhunen-Loève transform, also known as Principal Components Analysis (PCA), allows us to better represent each pixel of the image matrix using the smallest number of bits. It makes it possible to assign the smallest number of bits to the pixel value that has the highest probability, the next fewest to the pixel value that has the second highest probability, and so forth; thus the pixel value with the smallest probability gets assigned the longest code among all the pixel values.
An example with letters in the text would better depict how the
mechanism works. Suppose we have a text with letters a, e, f, q, in decreasing order of probability. That is, 'a' shows up most frequently and 'q' shows up least frequently. Then we would assign 00 to 'a', then 01 to 'e', 100
to ‘f’, and 101 to ‘q’.
In general, one refers to a Karhunen-Loève transform as an expan-
sion in Hilbert space with respect to an ONB resulting from an appli-
cation of the Spectral-Theorem.

The Algorithm

Our aim is to reduce the number of bits needed to represent an image by removing redundancies as much as possible.
The algorithm for entropy encoding using Karhunen-Loève expan-
sion can be described as follows:
1. Perform the wavelet transform for the whole image. (ie. wavelet
decomposition.)
2. Do quantization to all coefficients in the image matrix, except the
average detail.
3. Subtract the mean: Subtract the mean from each of the data dimen-
sions. This produces a data set whose mean is zero.
4. Compute the covariance matrix:

   cov(X, Y) = (1/n) ∑_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ)
5. Compute the eigenvectors and eigenvalues of the covariance matrix.
6. Choose components and form a feature vector (matrix of vectors),

   (eig_1, ..., eig_n)

   where the eigenvectors are listed in order of their eigenvalues. The eigenvalues found in step 5 differ in value; the eigenvector with the highest eigenvalue is the principal component of the data set.
7. Derive the new data set: FinalData = RowFeatureVector × RowDataAdjust, where RowDataAdjust is the mean-adjusted data transposed. [15]
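
A compact sketch of steps 3–7 (Python with NumPy; the toy data set and all variable names are my own). The rows of data are observations; the columns are the data dimensions referred to in step 3.

```python
import numpy as np

def karhunen_loeve(data, n_components=None):
    """Steps 3-7: subtract the mean, form the covariance matrix, and project
    onto the eigenvectors with the largest eigenvalues."""
    data = np.asarray(data, dtype=float)        # rows = observations
    mean = data.mean(axis=0)
    adjusted = data - mean                      # step 3: subtract the mean
    cov = np.cov(adjusted, rowvar=False)        # step 4: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # step 5: eigenvalues/eigenvectors
    order = np.argsort(eigvals)[::-1]           # step 6: sort by eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is not None:
        eigvecs = eigvecs[:, :n_components]
    final_data = adjusted @ eigvecs             # step 7: derive the new data set
    return final_data, eigvals, eigvecs

# Toy example: 2-D points that are strongly correlated.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
pts = np.column_stack([t, 0.9 * t + 0.1 * rng.normal(size=200)])
projected, eigvals, _ = karhunen_loeve(pts, n_components=1)
print(eigvals)   # the first eigenvalue carries almost all of the variance
```
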
Starting with a matrix representation for a particular image, we compute the covariance matrix using steps (3) and (4) in the algorithm above. We then compute the Karhunen-Loève eigenvalues. Next, the eigenvalues are arranged in decreasing order, and the corresponding eigenvectors are arranged to match the eigenvalues, with multiplicity. The eigenvalues mentioned here are the same eigenvalues λ_i appearing later in this section, thus yielding the smallest error and the smallest entropy in the computation.
In computing probabilities and entropy, Hilbert space serves as a
helpful tool. For example, take a unit vector f in some fixed Hilbert
space H, and an orthonormal basis (ONB) ψ_i with i running over an index set I. With this we now introduce two families of probability measures: one P_f(·) indexed by f ∈ H, and a second P_T indexed by a
class of operators T : H → H.

Definition 2. Let H be a Hilbert space. Let (ψ_i) and (φ_i) be orthonormal bases (ONBs), with index set I. Usually

I = N = {1, 2, ...}. (7)

If (ψ_i)_{i∈I} is an ONB, we set Q_n := the orthogonal projection onto span{ψ_1, ..., ψ_n}.
We now introduce a few facts about operators which will be needed
in the paper. In particular we recall Dirac's terminology [6] for rank-one operators in Hilbert space. While there are alternative notations available, Dirac's bra-ket terminology is especially efficient for our present
considerations.
Definition 3. Let vectors u, v ∈ H. Then

⟨u|v⟩ = inner product ∈ C,                              (8)

|u⟩⟨v| = rank-one operator, H → H,                      (9)

where the operator |u⟩⟨v| acts as follows:

|u⟩⟨v| w = |u⟩ ⟨v|w⟩ = ⟨v|w⟩ u,   for all w ∈ H.         (10)


Consider an ensemble of a large number N of similar objects, of which N w^α are of type α, α = 1, 2, ..., ν, where the relative frequency w^α satisfies the probability axioms:

w^α ≥ 0,   ∑_{α=1}^{ν} w^α = 1.

Assume that each type, specified by a value of the index α, is represented by a function f^α(ξ) on a real domain [a, b], which we normalize by

∫_a^b |f^α(ξ)|² dξ = 1.

Let {ψ_i(ξ)}, i = 1, 2, ..., be a complete set of orthonormal basis functions defined on [a, b]. Then any function f^α(ξ) can be expanded as

f^α(ξ) = ∑_{i=1}^{∞} x_i^{(α)} ψ_i(ξ)                    (11)

with

x_i^α = ∫_a^b ψ_i*(ξ) f^α(ξ) dξ.                          (12)

Here, x_i^α is the component of f^α in the ψ_i coordinate system. With the normalization of f^α we have

∑_{i=1}^{∞} |x_i^α|² = 1.                                 (13)

Then substituting (12) in (11) gives

f^α(ξ) = ∫_a^b f^α(η) [ ∑_{i=1}^{∞} ψ_i*(η) ψ_i(ξ) ] dη = ∑_{i=1}^{∞} ⟨ψ_i | f^α⟩ ψ_i(ξ)      (14)

by the definition of an ONB.
Let H = L²(a, b), ψ_i : H → l²(Z), and U : l²(Z) → l²(Z), where U is a unitary operator. Note that the distance is invariant under a unitary transformation. Thus, using another coordinate system {φ_j} in place of {ψ_i} would not change the distance.
Let {φ_j}, j = 1, 2, ..., be another set of ONB functions instead of {ψ_i(ξ)}, i = 1, 2, .... Let y_j^α be the component of f^α in {φ_j}; it can be expressed in terms of the x_i^α by the linear relation

y_j^α = ∑_{i=1}^{∞} ⟨φ_j, ψ_i⟩ x_i^α = ∑_{i=1}^{∞} U_{i,j} x_i^α,

where U : l²(Z) → l²(Z) is a unitary operator matrix with entries

U_{i,j} = ⟨φ_j, ψ_i⟩ = ∫_a^b φ_j*(ξ) ψ_i(ξ) dξ.

Also, x_i^α can be written in terms of the y_j^α under the relation

x_i^α = ∑_{j=1}^{∞} ⟨ψ_i, φ_j⟩ y_j^α = ∑_{j=1}^{∞} U^{-1}_{i,j} y_j^α,

where U^{-1}_{i,j} = U*_{j,i}. Hence

f^α(ξ) = ∑_{i=1}^{∞} x_i^α ψ_i(ξ) = ∑_{j=1}^{∞} y_j^α φ_j(ξ),

so U(x_i) = (y_i), and

x_i^α = ⟨ψ_i, f^α⟩ = ∫_a^b ψ_i*(ξ) f^{(α)}(ξ) dξ.
The squared magnitude |x_i^{(α)}|² of the coefficient for ψ_i in the expansion of f^{(α)} can be considered as a good measure of its importance for that member of the ensemble; its average over the ensemble,

Q_i = ∑_{α=1}^{ν} w^{(α)} |x_i^{(α)}|²,

can be considered as the measure of importance of {ψ_i}. We have

Q_i ≥ 0,   ∑_i Q_i = 1.

Then the entropy function in terms of the Q_i's is defined as

S({ψ_i}) = − ∑_i Q_i log Q_i.

We are interested in minimizing the entropy; that is, if {Θ_j} is one such optimal coordinate system, we shall have

S({Θ_j}) = min_{ {ψ_i} } S({ψ_i}).

Let G(ξ, ξ′) = ∑_α w^α f^α(ξ) f^{α*}(ξ′). Then G is a Hermitian matrix, and Q_i = G(i, i) = ∑_α w^α x_i^α x_i^{α*}, where the normalization ∑_i Q_i = 1 gives us trace G = 1, where the trace means the diagonal sum.
Then define a special function system {Θ_k(ξ)} as the set of eigenfunctions of G, i.e.

∫_a^b G(ξ, ξ′) Θ_k(ξ′) dξ′ = λ_k Θ_k(ξ).                   (15)

so GΘk (ξ) = λk Θk (ξ).


When the data are not functions but vectors v^α whose components are x_i^{(α)} in the ψ_i coordinate system, we have

∑_{i′} G(i, i′) t^k_{i′} = λ_k t^k_i                         (16)

where t^k_i is the i-th component of the vector Θ_k in the coordinate system {ψ_i}. So we get ψ : H → (x_i) and also Θ : H → (t_i). The two ONBs result in

x_i^α = ∑_k c_k^α t^k_i   for all i,        c_k^α = ∑_i t^{k*}_i x_i^α,

which is the Karhunen-Loève expansion of f^α(ξ) or of the vector v^α. Then {Θ_k(ξ)} is the K-L coordinate system, dependent on {w^α} and {f^α(ξ)}. Then we arrange the corresponding functions or vectors in the order of the eigenvalues λ_1 ≥ λ_2 ≥ . . . ≥ λ_{k−1} ≥ λ_k ≥ . . ..
Now, Q_i = G_{i,i} = ⟨ψ_i, Gψ_i⟩ = ∑_k A_{ik} λ_k, where A_{ik} = t^k_i t^{k*}_i, which is a doubly stochastic matrix. Then

        ⎛ λ_1         0  ⎞
G = U   ⎜      ⋱         ⎟  U⁻¹
        ⎝  0        λ_k  ⎠

3.2 Kolmogorov Entropy

This is an example of an entropy that is hard to implement in code. Thus, it is not very commonly used in industry compared to the other methods mentioned.

Implementation

Let X be a metric space with distance function ρ. If f ∈ X and r > 0, let

B(f, r) := B(f, r)_X := {g ∈ X : ρ(f, g) < r}

be the open ball with radius r centered at f. For K ⊂ X compact and each ε > 0, there is a finite collection of balls B(f_i, ε), i = 1, . . . , n, which cover K: K ⊂ ∪_{i=1}^{n} B(f_i, ε). Then the covering number N_ε(K) := N_ε(K, X) is the smallest integer n for which there is such an ε-covering of K.

Definition 4. The Kolmogorov ε-entropy of K is defined as

H_ε(K) := H_ε(K, X) := log N_ε(K),   ε > 0,

where the log is the logarithm to base two. [3]
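
As a toy illustration (Python; the choice of K = [0, L] in the real line with the usual metric is my own, not from the paper): the covering number of such an interval by open balls of radius ε is ⌊L/(2ε)⌋ + 1, so its Kolmogorov ε-entropy grows like log₂(1/ε) as ε → 0.

```python
import math

def covering_number_interval(length, eps):
    """Smallest number of open balls (intervals of radius eps) covering [0, length]."""
    # An open cover of the compact interval needs strictly more than
    # length / (2*eps) balls, and that many evenly spaced balls suffice.
    return math.floor(length / (2 * eps)) + 1

def kolmogorov_entropy_interval(length, eps):
    """H_eps(K) = log2 N_eps(K) for K = [0, length] with the usual metric."""
    return math.log2(covering_number_interval(length, eps))

for eps in (0.5, 0.1, 0.01):
    print(eps, kolmogorov_entropy_interval(1.0, eps))
# H_eps grows like log2(1/eps) as eps -> 0.
```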

3.3 Shannon-Fano Entropy

For each data point on an image, i.e. pixel, a set of probabilities p_i is computed, where ∑_{i=1}^{n} p_i = 1. The entropy of this set gives a measure of how much choice is involved, on average, in the selection of a pixel value.

Definition 5. Shannon's entropy is the function E(p_1, p_2, . . . , p_n) which satisfies the following:
• E is a continuous function of the p_i.
• E is a steadily increasing function of n (for equally likely outcomes).
• If a choice is made in k successive stages, then E is the sum of the entropies of the choices at each stage, with weights corresponding to the probabilities of the stages.
These conditions give E = −k ∑_{i=1}^{n} p_i log p_i. The constant k controls the units of the entropy, which are "bits" when logs are taken to base 2. [2, 14]

Shannon-Fano entropy encoding is done according to the probabilities of the data, and the method is as follows (a sketch in code follows the list):

• The data items are listed in decreasing order of their probabilities.
• The list is divided into two parts that have roughly equal total probability.
• The codes for the data items in the first part start with a 0 bit, and those in the second part with a 1.
• Continue recursively until each subdivision contains just one data item.
[2, 14]
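
A minimal recursive sketch of the procedure above (Python; the symbols and probabilities are illustrative only, and the split heuristic simply takes the first point where the running total reaches half):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability), already sorted in decreasing
    order of probability. Returns a dict symbol -> binary code string."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Find the split point where the two parts have roughly equal probability.
    running, split = 0.0, 1
    for i, (_, p) in enumerate(symbols[:-1], start=1):
        running += p
        if running >= total / 2:
            split = i
            break
    codes = {}
    for sym, code in shannon_fano(symbols[:split]).items():
        codes[sym] = "0" + code          # first part: prefix 0
    for sym, code in shannon_fano(symbols[split:]).items():
        codes[sym] = "1" + code          # second part: prefix 1
    return codes

data = [("a", 0.5), ("e", 0.25), ("f", 0.125), ("q", 0.125)]
print(shannon_fano(data))   # {'a': '0', 'e': '10', 'f': '110', 'q': '111'}
```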

3.4 Huffman Coding


This was developed by Huffman shortly after Shannon's work. It gives greater compression than Shannon-Fano encoding. Huffman coding is done as follows (a sketch in code follows the list):
• The data items are listed with their probabilities.
• The two data items with the smallest probabilities are located.
• These two are replaced by a single set containing both, whose probability is the sum of the individual probabilities.
• These steps are repeated until the list is left with only one member.
See [2].
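
A minimal sketch of this construction using a priority queue (Python; heapq is the standard-library heap, and the probabilities reuse the illustrative values from earlier):

```python
import heapq
from itertools import count

def huffman_codes(symbols):
    """symbols: list of (symbol, probability). Returns dict symbol -> code."""
    tiebreak = count()
    # Each heap entry: (probability, tiebreaker, tree), tree = symbol or (left, right).
    heap = [(p, next(tiebreak), sym) for sym, p in symbols]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)     # the two smallest probabilities
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))
    _, _, tree = heap[0]
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"     # single-symbol edge case
    walk(tree, "")
    return codes

print(huffman_codes([("a", 0.5), ("e", 0.25), ("f", 0.125), ("q", 0.125)]))
# e.g. {'a': '0', 'e': '10', 'f': '110', 'q': '111'} (codes may differ on ties)
```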

3.5 Arithmetic Coding


This is one of the latest and most popular encoding schemes. In arithmetic coding, symbols are not restricted to translate into an integral number of bits each, which makes the coding more efficient. In this coding, the data is represented by an interval of real numbers between 0 and 1. As the data becomes larger, the interval required for its representation becomes smaller, and the number of bits required to specify that interval increases. Successive symbols of the data reduce the size of the interval according to the probabilities of the symbols generated by the model. More likely data reduces the range by less than unlikely data, and thus adds fewer bits. [2, 22]
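
A toy sketch of the interval-narrowing idea (Python; it uses floating point and a fixed symbol model of my own choosing, so it is illustrative only; a practical coder uses integer arithmetic with renormalization, see [23]):

```python
def build_intervals(model):
    """Assign each symbol a subinterval of [0, 1) proportional to its probability."""
    intervals, low = {}, 0.0
    for sym, p in model:
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def arithmetic_encode(message, model):
    """Narrow [low, high) once per symbol; any number in the final interval codes the message."""
    intervals = build_intervals(model)
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = intervals[sym]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return (low + high) / 2              # a representative of the final interval

def arithmetic_decode(value, length, model):
    intervals = build_intervals(model)
    message = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                message.append(sym)
                value = (value - s_low) / (s_high - s_low)   # rescale and continue
                break
    return "".join(message)

model = [("a", 0.5), ("e", 0.25), ("f", 0.125), ("q", 0.125)]
code = arithmetic_encode("aaeq", model)
print(code, arithmetic_decode(code, 4, model))   # the decoder recovers "aaeq"
```
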
The entropy encoding schemes mentioned above are chosen for use in wavelet image compression according to preferences among coding simplicity, effectiveness in minimizing the entropy, and the lossless compression ratio.

Acknowledgement
The author would like to thank Professor Palle Jorgensen, the members
of the WashU Wavelet Seminar, and Professors David Larson, Gestur Olafsson, Peter Massopust, Dorin Dutkay and Simon Alexander for helpful discussions, Professor Victor Wickerhauser for suggesting [1, 7], and Professor Brody Johnson for suggesting [21].

References
1. Ash RB (1990) Information theory. Corrected reprint of the 1965 original.
Dover Publications, Inc., New York
2. Bell TC, Cleary JG, Witten IH (1990) Text Compression. Prentice Hall,
Englewood Cliffs
3. Cohen A, Dahmen W, Daubechies I, DeVore R (2001) Tree Approxima-
tion and Optimal Encoding. Applied Computational Harmonic Analysis
11:192–226
4. Daubechies I (1992) Ten Lectures on Wavelets. SIAM
5. Donoho DL, Vetterli M, DeVore RA, Daubechies I (Oct. 1998) Data Com-
pression and Harmonic Analysis. IEEE Trans. Inf. Theory, 44 (6):2435–
2476
6. Dirac PAM (1947) The Principles of Quantum Mechanics. 3d ed Oxford,
at the Clarendon Press
7. Effros M, Feng H, Zeger K (Aug. 2004) Suboptimality of the Karhunen-
Loève Transform for Transform Coding. IEEE Trans. Inf. Theory, 50
(8):1605–1619
8. Field DJ (1999) Wavelets, vision and the statistics of natural scenes. Phil.
Trans. R. Soc. Lond. A 357:2527–2542
9. Gonzalez RC, Woods RE, Eddins SL (2004) Digital Image Processing
Using MATLAB. Prentice Hall
10. Jorgensen PET (2006) Analysis and Probability Wavelets, Signals, Frac-
tals. Springer, Berlin Heidelberg New York
11. Jorgensen PET, Song M-S (2007) Entropy Encoding using Hilbert Space
and Karhunen-Loève Transforms. preprint
12. Pierce JR (1980) An Introduction to Information Theory Symbols, Sig-
nals and Noise. 2nd Edition Dover Publications, Inc., New York
13. Schwab C, Todor RA (2006) Karhunen-Loève approximation of random
fields by generalized fast multipole methods. Journal of Computational
Physics 217, Elsevier
14. Skodras A, Christopoulos C, Ebrahimi T (Sept. 2001) JPEG 2000 Still
Image Compression Standard. IEEE Signal processing Magazine 18:36–
58
15. Shannon CE, Weaver W (1998) The Mathematical Theory of Communi-
cation. University of Illinois Press, Urbana and Chicago
16. Smith LI (2002) A Tutorial on Principal Components Analysis.
http://csnet.otago.ac.nz/cosc453/student tutorials/principal components.pdf
17. Song M-S (2005) Wavelet image compression. Ph.D. thesis The University
of Iowa, Iowa

18. Song M-S (2006) Wavelet image compression. Operator theory, operator
algebras, and applications Contemp. Math. 414:41–73, Amer. Math. Soc.,
Providence, RI
19. Usevitch BE (Sept. 2001) A Tutorial on Modern Lossy Wavelet Image
Compression: Foundations of JPEG 2000. IEEE Signal processing Mag-
azine 18:22–35
20. Walker JS (1999) A Primer on Wavelets and their Scientific Applications.
Chapman & Hall, CRC
21. Walnut DF (2002) An Introduction to Wavelet Analysis. Birkhäuser
22. Watanabe S (1965) Karhunen-Loève Expansion and Factor Analysis The-
oretical Remarks and Applications Transactions of the Fourth Prague
Conference on Information Theory Statistical Decision Functions Ran-
dom Process. Adademia Press
23. Witten IH, Neal RM, Cleary JG (June 1987) Arithmetic Coding for Data
Compression. Communications of the ACM 30 (6):520–540
