Wavelets for Kids

A Tutorial Introduction

By
Brani Vidakovic and Peter Müller
Duke University
Contents

1 What are wavelets?
2 How do the wavelets work?
  2.1 The Haar wavelet
  2.2 Mallat's multiresolution analysis, filters, and direct and inverse wavelet transformation
    2.2.1 Mallat's MRA
    2.2.2 The language of signal processing
3 Thresholding methods
  3.1 Different thresholding policies
    3.1.1 Hard thresholding
    3.1.2 Soft thresholding
    3.1.3 Quantile thresholding
    3.1.4 Universal thresholding
4 Example: California earthquakes
5 Wavelet image processing
6 Can you do wavelets?
7 Appendix
set where the wavelet $\psi_{jk}$ is constant. That makes integral (1) equal to zero. If $j = j'$ but $k \neq k'$, then at least one factor in the product $\psi_{j'k'}\,\psi_{jk}$ is zero.
[Figure 4: The functions $\psi_{10}, \psi_{11}, \psi_{20}, \psi_{21}, \psi_{22}, \psi_{23}$, plotted on [0, 1].]
The functions $\psi_{10}, \psi_{11}, \psi_{20}, \psi_{21}, \psi_{22}, \psi_{23}$ are depicted in Figure 4. The set $\{\psi_{jk},\ j \in Z,\ k \in Z\}$ defines an orthonormal basis for $L^2$. Alternatively, we will consider orthonormal bases of the form $\{\phi_{j_0 k}, \psi_{jk},\ j \ge j_0,\ k \in Z\}$, where $\phi_{00}$ is called the scaling function associated with the wavelet basis $\psi_{jk}$. The set $\{\phi_{j_0 k},\ k \in Z\}$ spans the same subspace as $\{\psi_{jk},\ j < j_0,\ k \in Z\}$. We will later make this statement more formal and define $\phi_{jk}$. For the Haar wavelet basis the scaling function is very simple: it is unity on the interval [0, 1), i.e.
$$\phi(x) = \mathbf{1}(0 \le x < 1).$$
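Since the appendix of this tutorial is a Mathematica notebook, it may help to see the Haar pair in Mathematica right away. The following is a minimal sketch (the names haarPhi, haarPsi, and psi are ours, and we assume the usual normalization $\psi_{jk}(x) = 2^{j/2}\psi(2^j x - k)$, which matches the coefficients computed below):

(* Haar scaling function: unity on [0,1), zero elsewhere *)
haarPhi[x_] := Piecewise[{{1, 0 <= x < 1}}, 0]

(* Haar mother wavelet: +1 on [0,1/2), -1 on [1/2,1) *)
haarPsi[x_] := Piecewise[{{1, 0 <= x < 1/2}, {-1, 1/2 <= x < 1}}, 0]

(* dilations and translations psi_{jk}(x) = 2^(j/2) psi(2^j x - k) *)
psi[j_, k_][x_] := 2^(j/2) haarPsi[2^j x - k]

Orthonormality can be spot-checked: Integrate[psi[1, 0][x] psi[1, 1][x], {x, 0, 1}] returns 0, while Integrate[psi[1, 0][x]^2, {x, 0, 1}] returns 1.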
The statistician may be interested in wavelet representations of functions generated by data sets.
Let $y = (y_0, y_1, \ldots, y_{2^n-1})$ be the data vector of size $2^n$. The data vector can be associated with the step function $f(x) = \sum_{k=0}^{2^n-1} y_k\,\phi(2^n x - k)$, which takes the value $y_k$ on the interval $[k\,2^{-n}, (k+1)\,2^{-n})$. The (data) function $f$ is obviously in the $L^2[0,1)$ space,² and the wavelet decomposition of $f$ has the form
$$f(x) = c_{00}\,\phi(x) + \sum_{j=0}^{n-1}\,\sum_{k=0}^{2^j-1} d_{jk}\,\psi_{jk}(x). \qquad (2)$$
The sum with respect to $j$ is finite because $f$ is a step function, and everything can be exactly described by resolutions up to the $(n-1)$-st level. For each level the sum with respect to $k$ is finite as well.

² $\|f\|^2 \overset{\text{def}}{=} \langle f, f\rangle = \int f^2$.
[Figure 5: The step function f corresponding to the data vector y = (1, 0, −3, 2, 1, 0, 1, 2), plotted on [0, 1].]
The solution is
$$\begin{pmatrix} c_{00} \\ d_{00} \\ d_{10} \\ d_{11} \\ d_{20} \\ d_{21} \\ d_{22} \\ d_{23} \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} \\ -\tfrac{1}{2} \\ \tfrac{1}{2\sqrt{2}} \\ -\tfrac{1}{2\sqrt{2}} \\ \tfrac{1}{4} \\ -\tfrac{5}{4} \\ \tfrac{1}{4} \\ -\tfrac{1}{4} \end{pmatrix}.$$
Thus,
$$f = \tfrac{1}{2}\,\phi - \tfrac{1}{2}\,\psi_{00} + \tfrac{1}{2\sqrt{2}}\,\psi_{10} - \tfrac{1}{2\sqrt{2}}\,\psi_{11} + \tfrac{1}{4}\,\psi_{20} - \tfrac{5}{4}\,\psi_{21} + \tfrac{1}{4}\,\psi_{22} - \tfrac{1}{4}\,\psi_{23}. \qquad (3)$$
The solution is easy to check. For example, when $x \in [0, \tfrac{1}{8})$,
$$f(x) = \tfrac{1}{2} - \tfrac{1}{2}\cdot 1 + \tfrac{1}{2\sqrt{2}}\cdot\sqrt{2} + \tfrac{1}{4}\cdot 2 = 1.$$
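The check can also be done mechanically. Here is a sketch (f and d are our names; haarPhi and psi are from the sketch above) that rebuilds the step function from the data vector and recovers the coefficients by integration:

y = {1, 0, -3, 2, 1, 0, 1, 2};

(* the step function taking the value y_k on [k/8, (k+1)/8) *)
f[x_] := Sum[y[[k + 1]] haarPhi[8 x - k], {k, 0, 7}]

(* wavelet coefficient d_{jk} = <f, psi_{jk}> *)
d[j_, k_] := Integrate[f[x] psi[j, k][x], {x, 0, 1}]

{d[0, 0], d[1, 0], d[2, 1]}  (* -> -1/2, 1/(2 Sqrt[2]), -5/4, as in (3) *)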
The reader may already have the following question ready: "What will we do for vectors $y$ of much bigger length?" Obviously, solving the matrix equations directly becomes infeasible.
We now explain how the wavelets enter the picture. Because $V_0 \subset V_1$, any function in $V_0$ can be written as a linear combination of the basis functions $\sqrt{2}\,\phi(2x - k)$ of $V_1$. In particular,
$$\phi(x) = \sum_k h(k)\,\sqrt{2}\,\phi(2x - k). \qquad (4)$$
The coefficients $h(k)$ are defined as $\langle \phi(x),\ \sqrt{2}\,\phi(2x - k)\rangle$. Consider now the orthogonal complement $W_j$ of $V_j$ in $V_{j+1}$ (i.e. $V_{j+1} = V_j \oplus W_j$) and define
$$\psi(x) = \sqrt{2}\,\sum_k (-1)^k\, h(-k+1)\,\phi(2x - k). \qquad (5)$$
It can be shown that $\{\sqrt{2}\,\psi(2x - k),\ k \in Z\}$ is an orthonormal basis for $W_1$.⁴

³ A function $f$ is in $L^2(S)$ if $\int_S f^2$ is finite.
⁴ This can also be expressed in terms of Fourier transformations as follows: let $m_0(\omega)$ be the
For a sequence $a = \{a_n\}$ the operators $H$ and $G$ are defined by the following coordinatewise relations:
$$(Ha)_k = \sum_n h(n - 2k)\,a_n,$$
$$(Ga)_k = \sum_n g(n - 2k)\,a_n.$$
The operators $H$ and $G$ correspond to one step in the wavelet decomposition. The only difference is that the above definitions do not include the $\sqrt{2}$ factor that appears in Equations (4) and (5).
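For the Haar filter, one step of $H$ or $G$ just dots the filter with successive non-overlapping pairs. A sketch (the names h, g, H, and G are ours; filters longer than two taps would also need the boundary treatment discussed below):

h = {1/Sqrt[2], 1/Sqrt[2]};   (* low-pass ("averaging") Haar filter *)
g = {1/Sqrt[2], -1/Sqrt[2]};  (* high-pass ("detail") Haar filter   *)

(* (Ha)_k = Sum_n h(n - 2k) a_n, and likewise for G *)
H[a_List] := Map[h.# &, Partition[a, 2, 2]]
G[a_List] := Map[g.# &, Partition[a, 2, 2]]

For example, H[{1, 0, -3, 2, 1, 0, 1, 2}] gives {1/Sqrt[2], -1/Sqrt[2], 1/Sqrt[2], 3/Sqrt[2]}.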
Denote the original signal by $c^{(n)}$. If the signal is of length $2^n$, then $c^{(n)}$ can be represented by the function $f(x) = \sum_k c_k^{(n)}\,\phi_{nk}(x)$, $f \in V_n$. At each stage of the wavelet transformation we move to a coarser approximation $c^{(j-1)}$ by $c^{(j-1)} = Hc^{(j)}$ and $d^{(j-1)} = Gc^{(j)}$. Here $d^{(j-1)}$ is the "detail" lost by approximating $c^{(j)}$ by the averaged $c^{(j-1)}$. The discrete wavelet transformation of a sequence $y = c^{(n)}$ of length $2^n$ can then be represented as another sequence of length $2^n$ (notice that the sequence $c^{(j-1)}$ has half the length of $c^{(j)}$):
$$(d^{(n-1)},\ d^{(n-2)},\ \ldots,\ d^{(1)},\ d^{(0)},\ c^{(0)}). \qquad (9)$$
Thus the discrete wavelet transformation can be summarized as a single line:
$$y \longrightarrow (Gy,\ GHy,\ GH^2 y,\ \ldots,\ GH^{n-1} y,\ H^n y).$$
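The whole cascade takes only a few lines. A sketch (dwt is our name; H and G are from the sketch above):

(* discrete wavelet transform, ordered as in (9):
   (d^(n-1), ..., d^(1), d^(0), c^(0)) *)
dwt[y_List] := Module[{c = y, details = {}},
  While[Length[c] > 1,
   details = Join[details, G[c]];  (* finer details come first *)
   c = H[c]];
  Join[details, c]]

Applied to our data vector, dwt[{1, 0, -3, 2, 1, 0, 1, 2}] reproduces the transformed vector that appears in the thresholding example of Section 3.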
The reconstruction formula is also simple in terms of $H$ and $G$; we first define the adjoint operators $H^\star$ and $G^\star$ as follows:
$$(H^\star a)_n = \sum_k h(n - 2k)\,a_k,$$
$$(G^\star a)_n = \sum_k g(n - 2k)\,a_k.$$
Recursive application leads to
$$(Gy,\ GHy,\ GH^2 y,\ \ldots,\ GH^{n-1} y,\ H^n y) \longrightarrow y = \sum_{j=0}^{n-1} (H^\star)^j G^\star d^{(j)} + (H^\star)^n c^{(0)}.$$
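For the Haar filters, the adjoints simply upsample: each coefficient $a_k$ spawns the pair $(h(0)a_k,\ h(1)a_k)$. A sketch of the inverse transform under that assumption (Hstar, Gstar, and idwt are our names):

(* (H* a)_{2k} = h(0) a_k and (H* a)_{2k+1} = h(1) a_k; Haar case *)
Hstar[a_List] := Flatten[Map[# h &, a]]
Gstar[a_List] := Flatten[Map[# g &, a]]

(* peel (9) from the back: c^(0) first, then d^(0), d^(1), ... *)
idwt[w_List] := Module[{c = Take[w, -1], rest = Drop[w, -1], d, len = 1},
  While[rest =!= {},
   d = Take[rest, -len];
   rest = Drop[rest, -len];
   c = Hstar[c] + Gstar[d];  (* c^(j) = H* c^(j-1) + G* d^(j-1) *)
   len *= 2];
  c]

A round trip such as idwt[dwt[{1, 0, -3, 2, 1, 0, 1, 2}]] // Simplify returns the original vector.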
Equations (7) and (8), which generate filter coefficients (sometimes called dilation equations), look very simple for the Haar wavelet:
$$\phi(x) = \phi(2x) + \phi(2x - 1) = \tfrac{1}{\sqrt{2}}\,\sqrt{2}\,\phi(2x) + \tfrac{1}{\sqrt{2}}\,\sqrt{2}\,\phi(2x - 1), \qquad (10)$$
$$\psi(x) = \phi(2x) - \phi(2x - 1) = \tfrac{1}{\sqrt{2}}\,\sqrt{2}\,\phi(2x) - \tfrac{1}{\sqrt{2}}\,\sqrt{2}\,\phi(2x - 1).$$
The filter coefficients in (10) are
$$h(0) = h(1) = \tfrac{1}{\sqrt{2}}, \qquad g(0) = -g(1) = \tfrac{1}{\sqrt{2}}.$$
[Figure 6: Decomposition example.
y = c^(3):           1, 0, −3, 2, 1, 0, 1, 2
d^(2) = Gc^(3):      1/√2, −5/√2, 1/√2, −1/√2
c^(2) = Hc^(3):      1/√2, −1/√2, 1/√2, 3/√2
d^(1) = Gc^(2):      1, −1
c^(1) = Hc^(2):      0, 2
d^(0) = Gc^(1):      −√2
c^(0) = Hc^(1):      √2]
Figure 6 schematically gives the decomposition algorithm applied to our data set. To get the wavelet coefficients as in (3), we multiply the components of $d^{(j)}$, $j = 0, 1, 2$, and of $c^{(0)}$ by the factor $2^{-N/2}$. Simply,
$$d_{jk} = 2^{-N/2}\, d_k^{(j)}, \qquad 0 \le j < N\ (= 3).$$
It is interesting that in the Haar wavelet case $2^{-3/2} c_0^{(0)} = c_{00} = \tfrac{1}{2}$ is the mean of the sample $y$.
Figure 7 schematically gives the reconstruction algorithm for our example.
The careful reader might have already noticed that when the length of the filter is larger than 2, boundary problems occur. (There are no boundary problems with the Haar wavelet!) There are two main ways to handle the boundaries: symmetric and periodic.
We should remark that some problems call for the use of continuous wavelet transforms.
3 Thresholding methods
In the wavelet decomposition the filter $H$ is an "averaging" filter, while its mirror counterpart $G$ produces details. The wavelet coefficients correspond to details. When details are small, they can be omitted without substantially affecting the "general picture." Thresholding wavelet coefficients is thus a way of cleaning out "unimportant" details that are considered to be noise. We illustrate the idea on our old friend, the data vector (1, 0, −3, 2, 1, 0, 1, 2).
Example: The data vector (1, 0, −3, 2, 1, 0, 1, 2) is transformed into the vector
$$\left(\tfrac{1}{\sqrt{2}},\ -\tfrac{5}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}},\ -\tfrac{1}{\sqrt{2}},\ 1,\ -1,\ -\sqrt{2},\ \sqrt{2}\right).$$
If all coefficients smaller in absolute value than 0.9 (well, our choice) are replaced by zeroes, then the resulting ("thresholded") vector is
$$\left(0,\ -\tfrac{5}{\sqrt{2}},\ 0,\ 0,\ 1,\ -1,\ -\sqrt{2},\ \sqrt{2}\right).$$
The graph of the "smoothed data", after reconstruction, is given in Figure 8.
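In code, this hard-thresholding step is a one-liner (hardThreshold is our name; dwt is the sketch from Section 2):

(* keep a coefficient only if its absolute value reaches lambda *)
hardThreshold[w_List, lambda_] := Map[If[Abs[#] < lambda, 0, #] &, w]

hardThreshold[dwt[{1, 0, -3, 2, 1, 0, 1, 2}], 0.9]
(* -> {0, -5/Sqrt[2], 0, 0, 1, -1, -Sqrt[2], Sqrt[2]} *)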
[Figure 7: Reconstruction example. c^(0) = (√2) and d^(0) = (−√2) combine through the adjoint filters into c^(1) = H*c^(0) + G*d^(0) = (0, 2); with d^(1) = (1, −1) this gives c^(2) = (1/√2, −1/√2, 1/√2, 3/√2), and with d^(2) the original data (1, 0, −3, 2, 1, 0, 1, 2) are recovered.]
[Figure 8: The "smoothed data": the reconstruction after thresholding.]
Donoho and Johnstone propose to start with a wavelet decomposition of the data set, threshold the coefficients, and then use the wavelet reconstruction as an estimate $\hat f$. When using the thresholding rule $\hat d_{jk} = \operatorname{sign}(d_{jk})\,(|d_{jk}| - \lambda)_+$ with $\lambda = \sqrt{2\log n}/\sqrt{n}$, the estimate $\hat f$ can be shown to have risk $R(\hat f, f)$ within a factor $2\log n$ of the minimum risk achievable by the (of course unknown) optimal thresholding rule. Here $R$ is given by $R(\hat f, f) = E\bigl(\sum_i (\hat f(t_i) - f(t_i))^2/n\bigr)$. Donoho and Johnstone (1993) show that the (interpolated) function estimate $\hat f$ is, with probability tending to 1 (as $n \to \infty$), at least as smooth as $f$.
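A sketch of this soft-thresholding estimator, with our own names and the dwt/idwt sketches from Section 2:

(* soft thresholding: shrink each coefficient toward zero by lambda *)
softThreshold[w_List, lambda_] := Map[Sign[#] Max[Abs[#] - lambda, 0] &, w]

(* the Donoho-Johnstone recipe: decompose, threshold, reconstruct *)
fHat[y_List, lambda_] := idwt[softThreshold[dwt[y], lambda]]

With lambda = Sqrt[2 Log[Length[y]]]/Sqrt[Length[y]] this is the universal rule of Section 3.1.4.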
Another interesting application of wavelet thresholding arises in density estimation. Assume $X_1, \ldots, X_n$ are i.i.d. observations from an unknown probability density function $f(x)$. Donoho, Johnstone, Kerkyacharian, and Picard (1993) define a nonlinear density estimate by thresholding the coefficients in the wavelet decomposition of the empirical p.d.f. If the unknown density is estimated by
$$\hat f(x) = \sum_{j,k} \hat d_{jk}\,\psi_{jk}(x),$$
then, due to the orthonormality of the $\psi_{jk}$'s, the sample estimator of $d_{jk}$ is $\hat d_{jk} = \frac{1}{n}\sum_i \psi_{jk}(X_i)$.
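The empirical coefficient is equally direct in code. A sketch (dHat is our name; psi is the Haar sketch from Section 1, so this version estimates densities supported on [0, 1]):

(* sample estimator dHat_{jk} = (1/n) Sum_i psi_{jk}(X_i) *)
dHat[j_, k_, sample_List] := Total[Map[psi[j, k], sample]]/Length[sample]

For instance, dHat[0, 0, RandomReal[{0, 1}, 1000]] should come out close to zero, since the Uniform(0, 1) density has no detail at level 0.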
Thresholding in this problem reminds us of well-known procedures in density estimation by orthogonal series: shrinking and tapering.
In the next subsection we give a brief tour through some thresholding policies.
[Figure 9: hard (y) and soft (z) thresholding rules plotted against x on [−3, 3].]

3.1.3 Quantile thresholding
In quantile thresholding a wavelet coefficient is replaced by zero when it does not exceed $\lambda_p$, the $p$-quantile of the set of all wavelet coefficients. For example, we might want to replace the smallest 30% of the wavelet coefficients by zero.
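A sketch using the built-in Quantile (quantileThreshold is our name):

(* zero out the fraction p of coefficients smallest in absolute value *)
quantileThreshold[w_List, p_] := With[{lambda = Quantile[N[Abs[w]], p]},
  Map[If[Abs[#] < lambda, 0, #] &, w]]

For example, quantileThreshold[dwt[y], 0.3] aims at the 30% rule above (coefficients tied with the quantile itself are kept).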
3.1.4 Universal thresholding

Donoho and Johnstone (1992) propose to use the threshold $\lambda = \sqrt{2\log n}/\sqrt{n}$ on the transformed data set, where $n$ is the sample size and $\sigma$ is the scale of the noise.
4 Example: California earthquakes

A plot of the raw data for hourly measurements over one year ($8192 = 2^{13}$ observations) is given in Figure 10a. After applying the DAUB #2 wavelet transformation and thresholding by the Donoho–Johnstone "universal" method, we obtain a very clear signal with big jumps at the earthquake time. The cleaned data are given in Figure 10b. The magnitude of the water-level change at the earthquake time was not distorted, in contrast to what usual smoothing techniques would do. This is a desirable feature of wavelet methods. Still, a couple of things should be addressed with more care.
(i) Fluctuations that are possibly important for earthquake prediction are cleaned out as noise. When post-analyzing the data, with information about the earthquake time available, one might do time-sensitive thresholding.
(ii) Small spikes in the smoothed signal (Figure 10b), as well as "boundary distortions", indicate that the DAUB #2 wavelet is not the most fortunate choice. By compromising between smoothness and shortness of support of the mother wavelet, with the help of wavelet banks, one can develop ad hoc rules for choosing a better mother wavelet (wavelet model).
[Figure 10: Panel (a) shows n = 8192 hourly measurements of the water level for a well in an earthquake zone (raw data, water level vs. time); panel (b) shows the data after thresholding the wavelet transformation. Notice the wide range of water levels at the time of an earthquake around t = 2000.]
5 Wavelet image processing

A (black and white) image can be approximated by a matrix $A$ in which the entries $a_{ij}$ correspond to intensities of gray in the pixel $(i, j)$. For reasons that will be obvious later, it is assumed that $A$ is a square matrix of dimension $2^n \times 2^n$, $n$ an integer.
The process of the image wavelet decomposition goes as follows. To the rows of the matrix $A$ the filters $H$ and $G$ are applied, and two resulting matrices are obtained: $H_r A$ and $G_r A$, both of dimension $2^n \times 2^{n-1}$ (the subscript $r$ suggests that the filters are applied to the rows of $A$). Now the filters $H$ and $G$ are applied to the columns of the matrices $H_r A$ and $G_r A$, and the four resulting matrices $H_c H_r A$, $G_c H_r A$, $H_c G_r A$, and $G_c G_r A$, of dimension $2^{n-1} \times 2^{n-1}$, are obtained. The matrix $H_c H_r A$ is the average, while the matrices $G_c H_r A$, $H_c G_r A$, and $G_c G_r A$ are the details (Figure 11).
[Figure 11: One decomposition step. Row filtering splits A into H_r A and G_r A; column filtering then yields the average H_c H_r A and the details G_c H_r A, H_c G_r A, and G_c G_r A.]
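One decomposition step on an image matrix can be sketched by reusing H and G from Section 2 (colApply and decomposeStep are our names):

(* apply a row operator F to every column of a matrix *)
colApply[F_, mat_] := Transpose[Map[F, Transpose[mat]]]

(* rows first (Hr, Gr), then columns (Hc, Gc), as in Figure 11 *)
decomposeStep[mat_] := Module[{hr = Map[H, mat], gr = Map[G, mat]},
  {colApply[H, hr],   (* Hc Hr A : the average *)
   colApply[G, hr],   (* Gc Hr A : detail      *)
   colApply[H, gr],   (* Hc Gr A : detail      *)
   colApply[G, gr]}]  (* Gc Gr A : detail      *)

Iterating decomposeStep on the first (average) block implements the full decomposition described next.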
The process continues with the average matrix $H_c H_r A$ until a single number (the average of the whole original matrix $A$) is obtained. Two examples are given below.
Example 1. This example is borrowed from Nason and Silverman (1993). The top-left panel in Figure 12 is a 256 × 256 black and white image of John Lennon in 0–255 gray scale. In the top-right panel each pixel is contaminated by normal N(0, 60) noise. (In S-Plus: lennon + rnorm(256*256, s=60), where lennon is the pixel matrix of the original image.)
The two bottom panels are restored images. The DAUB #4 filter was used for the first, while DAUB #10 was used for the second.
Though the quality of the restored images may be criticized, the stunning property of wavelet image analysis shows up in this example: both restored images use only about 1.8% of the information contained in the "blurred" image. The compression rate is amazing: 527,120 bytes go down to 9,695 bytes after the universal thresholding.
Example 2. This is an adaptation of a data set of J. Schmert, University of Washington. The word "five" was recorded, and each column in the top-right panel of Figure 13 represents a periodogram over a short period of time (adjacent columns have half of their observations in common); the rows represent time. The original 92 × 64 matrix was cut to a 64 × 64 matrix so that its dimensions are dyadic. After performing hard thresholding with $\lambda = 0.25$, a compression ratio of 1:2 is achieved. The compressed figures are shown in the two bottom panels of Figure 13.
[Figure 12: John Lennon image — original (top left), contaminated with N(0, 60) noise (top right), and the two restored images (bottom), in 0–255 gray scale.]
[Figure 13: Word FIVE data. The panels in the first row show the original data; the bottom panels show the signal after thresholding.]
being effective; rather, they are educational. A Mathematica notebook is given in the appendix.
References
[1] Barry A. C. (1993). Wavelet applications come to the fore. SIAM News, November 1993.

[2] Coifman, R., Meyer, Y., and Wickerhauser, V. (1991). Wavelet analysis and signal processing. In: Wavelets and Their Applications, edited by Mary Beth Ruskai, Jones and Bartlett Publishers.

[3] Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math., 41(7), 909-996.

[4] Daubechies, I. (1992). Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics.

[5] DeVore, R. and Lucier, B. J. (1991). Wavelets. Acta Numerica, 1, 1-56.

[6] Donoho, D. (1992). Wavelet shrinkage and WVD: A 10-minute tour. Presented at the International Conference on Wavelets and Applications, Toulouse, France, June 1992.

[7] Donoho, D., Johnstone, I., Kerkyacharian, G., and Picard, D. (1993). Density estimation by wavelet thresholding. Technical Report, Department of Statistics, Stanford University.

[8] Donoho, D., Johnstone, I., Kerkyacharian, G., and Picard, D. (1993). Wavelet shrinkage: Asymptopia? Technical Report, Department of Statistics, Stanford University.

[9] Donoho, D. and Johnstone, I. (1993). Adapting to unknown smoothness via wavelet shrinkage. Technical Report, Department of Statistics, Stanford University.

[10] Donoho, D. and Johnstone, I. (1992). Minimax estimation via wavelet shrinkage. Technical Report, Department of Statistics, Stanford University.

[11] Grossmann, A. and Morlet, J. (1984). Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal., 15, 723-736.

[12] Johnstone, I. (1993). Minimax-Bayes, asymptotic minimax and sparse wavelet priors. Technical Report, Department of Statistics, Stanford University.
7 Appendix
BeginPackage["Waves`"]

(* Author: Brani Vidakovic, ISDS, Duke University.
   Functions Dec and Comp are based on M. V. Wickerhauser's
   Mathematica program. Mirror, Dec, and Comp are defined in a
   part of the appendix that is not reproduced in this excerpt. *)

Begin["`Private`"]

(* WT: discrete wavelet transform of a vector of dyadic length;
   repeatedly splits the running average ve into averages (Dec
   with the filter) and details (Dec with the mirror filter) *)
WT[vector_List, filter_List] :=
 Module[{wav = {}, c, d, ve = vector, H = filter,
   G = Mirror[filter]},
  While[Length[ve] > 1,
   lev = Log[2, Length[ve]] - 1;
   c = Dec[ve, H];
   d = Dec[ve, G];
   wav = Join[wav, d];
   ve = c];
  Join[wav, c]]
(* WR: inverse wavelet transform; rebuilds the signal level by
   level from the tail of the transformed vector *)
WR[vector_List, filter_List] :=
 Module[{i = 1, vl = Length[vector], c = Take[vector, -1],
   d = Take[RotateRight[vector, 1], -1],
   mirrorf = Mirror[filter], cn, dn, k = 1},
  While[i <= vl/2,
   k += i;
   i = 2 i;
   cn = Comp[c, filter] + Comp[d, mirrorf];
   dn = Take[RotateRight[vector, k], -i];
   c = cn;
   d = dn];
  c]

End[]

EndPackage[]
Institute of Statistics
and Decision Sciences
Duke University
Durham, NC 27708-0251
[email protected]
[email protected]