Discrete Wavelet Transforms: Algorithms and Applications
Edited by Hannu Olkkonen
Published by InTech
Janeza Trdine 9, 51000 Rijeka, Croatia
Statements and opinions expressed in the chapters are those of the individual contributors
and not necessarily those of the editors or publisher. No responsibility is accepted
for the accuracy of information contained in the published articles. The publisher
assumes no responsibility for any damage or injury to persons or property arising out
of the use of any materials, instructions, methods or ideas contained in the book.
Preface
The discrete wavelet transform (DWT) algorithms have a firm position in processing
of signals in several areas of research and industry. As DWT provides both octave-
scale frequency and spatial timing of the analyzed signal, it is constantly used to solve
and treat more and more advanced problems. The DWT algorithms were initially
based on the compactly supported conjugate quadrature filters (CQFs). However, a
drawback in CQFs is due to the nonlinear phase effects such as spatial dislocations in
multi-scale analysis. This is avoided in biorthogonal discrete wavelet transform
(BDWT) algorithms, where the scaling and wavelet filters are symmetric and linear
phase. The BDWT algorithms are usually constructed by a ladder-type network called the
lifting scheme. The procedure consists of sequential down- and up-lifting steps, and the
reconstruction of the signal is made by running the lifting network in reverse order.
Efficient lifting BDWT structures have been developed for VLSI and microprocessor
applications. Only register shifts and summations are needed for integer arithmetic
implementation of the analysis and synthesis filters. In many systems BDWT-based
data and image processing tools have outperformed the conventional discrete cosine
transform (DCT)-based approaches. For example, in the JPEG2000 standard the DCT has
been replaced by the lifting BDWT.
A difficulty in multi-scale DWT analyses is the dependency of the total energy of the
wavelet coefficients in different scales on the fractional shifts of the analysed signal.
This has led to the development of the complex shift invariant DWT algorithms, in which
the real and imaginary parts of the complex wavelet coefficients are approximately a
Hilbert transform pair. The energy of the wavelet coefficients equals the envelope,
which provides shift-invariance. In two parallel CQF banks, which are constructed so
that the impulse responses of the scaling filters are half-sample delayed versions of
each other, the corresponding wavelet bases are a Hilbert transform pair. However,
the CQF wavelets do not have coefficient symmetry, and the phase nonlinearity disturbs
the spatial timing in different scales and prevents accurate statistical analyses.
Therefore, the current developments in theory and applications of shift invariant DWT
algorithms are concentrated on the dual-tree BDWT structures.
This book reviews the recent progress in discrete wavelet transform algorithms and
applications. The book covers a wide range of methods (e.g. lifting, shift invariance,
multi-scale analysis) for constructing DWTs. The book chapters are organized into
four major parts. Part I describes the progress in hardware implementations of the
DWT algorithms. Applications include multitone modulation for ADSL and
equalization techniques, a scalable architecture for FPGA implementation, a lifting
based algorithm for VLSI implementation, a comparison between DWT and FFT based
OFDM, and a modified SPIHT codec. Part II addresses image processing algorithms such
as a multiresolution approach for edge detection, low bit rate image compression, low
complexity implementation of CQF wavelets and compression of multi-component
images. Part III focuses on watermarking DWT algorithms. Finally, Part IV describes shift
invariant DWTs, DC lossless property, DWT based analysis and estimation of colored
noise and an application of the wavelet Galerkin method.
The chapters of the present book consist of both tutorial and highly advanced material.
Therefore, the book is intended to be a reference text for graduate students and
researchers to obtain state-of-the-art knowledge on specific applications. The editor is
greatly indebted to all co-authors for giving their valuable time and expertise in
constructing this book. The technical editors are also acknowledged for their tireless
support and help.
1. Introduction
The reliable delivery of information over severe fading wireless or wired channels is a major
challenge in communication systems. At the heart of every communication system is the
physical layer, consisting of a transmitter, a channel and a receiver. A transmitter maps the
input digital information into a waveform suitable for transmission over the channel. The
communication channel distorts the transmitted waveform. One of the many sources of
signal distortion is the presence of multipath in the communication channel. Due to the
effect of the multipath signal propagation, inter-symbol interference (ISI) occurs in the
received waveform. Moreover, the transmitted signal gets distorted due to the effect of
various kinds of interference and noise, as it propagates through the channel. ISI and the
channel noise distort the amplitude and phase of the transmitted signal, which leads to
erroneous bit detection at the receiver. It is desirable for a good communication system that
its receiver is able to retrieve the digital information from the received waveform, even in
the presence of channel impairments such as multipath and noise.
Orthogonal Frequency Division Multiplexing (OFDM) is a Multi-Carrier Modulation
(MCM) technique that enables high data rate transmission and is robust against ISI
(Saltzberg, 1967), (Weinstein and Ebert, 1971), (Hirosaki, 1981). It is a form of frequency
division multiplexing (FDM), where data is transmitted in several narrowband streams at
various carrier frequencies. The sub-carriers in an OFDM system are orthogonal under ideal
propagation conditions. By dividing the input bit-stream into multiple and parallel bit-
streams, the objective is to lower the data rate in each sub-channel as compared to the total
data rate and also to make sub-channel bandwidth lower than the coherence bandwidth of
the communication channel. Therefore, each sub-channel will experience flat-fading and will
have small ISI. Hence an OFDM system requires simplified equalization techniques, to
mitigate the inter-symbol interference. The ISI can be completely eliminated in OFDM
transceivers by utilizing the principle of cyclic prefixing (CP). Therefore, high data rate
communication systems prefer to apply multicarrier modulation techniques. OFDM has
been standardized for many digital communication systems, including ADSL, the 802.11a
and 802.11g Wireless LAN standards, Digital Audio Broadcasting including EUREKA 147
and Digital Radio Mondiale, Digital Video Broadcasting (DVB), some Ultra Wide Band
(UWB) systems, WiMax, and Power Line Communication (PLC) (Sari, et al., 1995),
(Frederiksen and Prasad, 2002), (Baig and Gohar, 2003).
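The role of the cyclic prefix can be illustrated with a small noise-free simulation. The following is only a sketch under stated assumptions: the number of sub-carriers, the prefix length, the QPSK mapping and the 3-tap channel `h` are hypothetical values chosen for illustration and are not taken from this chapter. When the prefix is at least as long as the channel memory, the linear convolution with the channel acts as a circular convolution on the useful part of the symbol, so a single complex division per sub-carrier removes the ISI.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sub = 64                         # number of sub-carriers (assumed)
cp_len = 8                         # cyclic prefix length, >= channel memory (assumed)
h = np.array([1.0, 0.5, 0.25])     # hypothetical 3-tap multipath channel

# Map random bits to unit-energy QPSK symbols, one per sub-carrier.
bits = rng.integers(0, 2, size=(n_sub, 2))
symbols = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

# Transmitter: IFFT, then copy the last cp_len samples to the front.
tx_time = np.fft.ifft(symbols)
tx = np.concatenate([tx_time[-cp_len:], tx_time])

# Channel: linear convolution with the multipath impulse response (no noise).
rx = np.convolve(tx, h)[:len(tx)]

# Receiver: drop the prefix, FFT, one-tap equalization per sub-carrier.
rx_time = rx[cp_len:cp_len + n_sub]
channel_freq = np.fft.fft(h, n_sub)
recovered = np.fft.fft(rx_time) / channel_freq

print(np.max(np.abs(recovered - symbols)))  # numerically negligible error
```

The cost of this simplicity is the `cp_len` redundant samples sent with every symbol, which is the spectral inefficiency discussed in the remainder of this section.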
Over the years, OFDM has evolved into variants, such as Discrete Multitone (DMT), and
hybrid modulation techniques, such as multi-carrier code division multiple access (MC-
CDMA), Wavelet OFDM and Discrete Wavelet Multitone (DWMT). Several factors are
responsible for the development of these variants, especially Wavelet based OFDM
techniques, which target several disadvantages associated with Multicarrier modulation
(MCM) techniques. Some of these drawbacks are:
• the spectral inefficiency associated with the guard interval insertion, which includes the
cyclic prefix
• the high degree of spectral leakage due to the high magnitude side lobes of the pulse shape of
sinusoidal carriers
• the sensitivity of OFDM based communication systems to inter-carrier interference (ICI) and
narrowband interference (NBI)
Therefore, a Discrete Wavelet Transform (DWT) based MCM system was developed as an
alternative to DFT based MCM scheme (Lindsey, 1995). DWT based MCM techniques came
to be known as Wavelet-OFDM in wireless communications and as Discrete Wavelet
Multitone (DWMT) for harsh and noisy wireline communication channels such as Digital
Subscriber Line (DSL) or Power Line Communications (PLC) (Baig and Mughal, 2009).
This chapter describes the application of DWT in Discrete Multitone (DMT) transceivers and
its performance analysis in the Digital Subscriber Line (DSL) channel, in the presence of
background noise and crosstalk. Time domain equalization techniques proposed for DWT
based multitone, that is DWMT, are discussed, along with the simulation results. The pros
and cons of adopting DWT instead of DFT in DMT transceivers are also discussed,
highlighting the open areas of research.
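$$
\begin{bmatrix} Y_0(z) \\ Y_1(z) \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} H_0(z^{1/2}) & H_0(-z^{1/2}) \\ H_1(z^{1/2}) & H_1(-z^{1/2}) \end{bmatrix}
\begin{bmatrix} X(z^{1/2}) \\ X(-z^{1/2}) \end{bmatrix}
\tag{1}
$$
The two signal spectra overlap. The downsampling will produce aliased components of the
signals, which are functions of $X(-z^{1/2})$ in Eq. 1, since the filtered signals are not bandlimited
to half of the original bandwidth. The two-channel synthesis filter bank is the dual of the analysis
filter bank, as shown in Fig. 3. $G_0(z)$ and $G_1(z)$ denote the lowpass and highpass filters, which
recombine the upsampled signals $Y_0(z^2)$ and $Y_1(z^2)$ into $\hat{X}(z)$, the reconstructed version of the
input signal. With $f_s$ denoting the sampling frequency, the aliased images are removed by the
filter $G_0(z)$ in the frequency range $f_s/4 \le f \le f_s/2$, while the filter $G_1(z)$ eliminates the images
in the upsampled signal $Y_1(z^2)$ in the frequency range $0 \le f \le f_s/4$. Therefore, the signal $\hat{X}(z)$
output from the synthesis filter bank is (Fliege, 1994),
$$
\hat{X}(z) = \begin{bmatrix} G_0(z) & G_1(z) \end{bmatrix}
\begin{bmatrix} Y_0(z^2) \\ Y_1(z^2) \end{bmatrix}
\tag{2}
$$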
Fig. 2. Signal spectra in two-channel analysis filter bank. (a) Low pass & high pass filter
transfer functions. (b) Low pass filtered signal spectrum. (c) High pass filtered signal
spectrum. (d) Downsampled low pass signal spectrum. (e) Downsampled high pass signal
spectrum. (f) Output signal spectra.
building block in many multirate systems. A two-channel QMF bank is shown in Fig. 4. The
constituent analysis and synthesis filter banks have power complementary frequency
responses. The low pass and high pass filters in the analysis filter bank decompose the input
signal into sub-bands, and the decimation introduces a certain amount of aliasing, due to the
non-ideal frequency response of the analysis filters. However, the frequency responses of the
synthesis filters are chosen such that the aliasing introduced by the
analysis filter bank is canceled out in the reconstruction process. The output signal $\hat{x}(n)$ is
the recovered version of the input signal $x(n)$. Therefore, the output signal $\hat{X}(z)$ is
expressed as,
$$
\hat{X}(z) = \frac{1}{2}
\begin{bmatrix} G_0(z) & G_1(z) \end{bmatrix}
\begin{bmatrix} H_0(z) & H_0(-z) \\ H_1(z) & H_1(-z) \end{bmatrix}
\begin{bmatrix} X(z) \\ X(-z) \end{bmatrix}
\tag{3}
$$
$$
\hat{X}(z) = T(z)\,X(z) + S(z)\,X(-z) \tag{4}
$$
The reconstructed signal $\hat{X}(z)$ consists of two terms. The first term, the product of the
transfer function $T(z)$ and $X(z)$, is the desired QMF output, while the second term, the
product of the transfer function $S(z)$ and $X(-z)$, is the aliasing term. $S(z)$ denotes the
aliasing components produced by the overlapping frequency responses of the analysis and
synthesis filter banks. For an alias-free filter bank, $S(z)$ must be equal to zero. This
condition is mathematically expressed as (Vaidyanathan, 1993),
$$
S(z) = \frac{1}{2}\left[ G_0(z)\,H_0(-z) + G_1(z)\,H_1(-z) \right] = 0 \tag{5}
$$
With the synthesis filters chosen as $G_0(z) = H_1(-z)$ and $G_1(z) = -H_0(-z)$, this condition is
satisfied and the remaining transfer function becomes
$$
T(z) = \frac{1}{2}\left[ H_0(z)\,H_1(-z) - H_1(z)\,H_0(-z) \right] \tag{6}
$$
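The alias cancellation condition can be checked numerically on a very small example. The sketch below is an illustration under assumptions, not the filter bank used in this chapter: it writes $H_0, H_1$ for the analysis filters and $G_0, G_1$ for the synthesis filters (names assumed here), uses the two-tap Haar pair as the analysis filters, and picks the synthesis filters as $G_0(z) = H_1(-z)$ and $G_1(z) = -H_0(-z)$. It then evaluates the aliasing term and the overall transfer function as polynomial products and runs a signal through the complete analysis/synthesis chain.

```python
import numpy as np

# Haar analysis filters (assumed example), coefficients in powers of z^-1.
h0 = np.array([1.0, 1.0]) / np.sqrt(2)    # lowpass
h1 = np.array([1.0, -1.0]) / np.sqrt(2)   # highpass

def negate_z(h):
    """Coefficients of H(-z): the k-th tap is multiplied by (-1)**k."""
    return h * (-1.0) ** np.arange(len(h))

# Synthesis filters chosen to cancel aliasing.
g0 = negate_z(h1)
g1 = -negate_z(h0)

# Aliasing term and overall transfer function as polynomial products.
alias = 0.5 * (np.convolve(g0, negate_z(h0)) + np.convolve(g1, negate_z(h1)))
transfer = 0.5 * (np.convolve(g0, h0) + np.convolve(g1, h1))

def analysis(x):
    """Filter with h0/h1 and keep every second sample."""
    return np.convolve(x, h0)[::2], np.convolve(x, h1)[::2]

def synthesis(y0, y1):
    """Insert zeros between samples, filter with g0/g1 and add the branches."""
    u0 = np.zeros(2 * len(y0))
    u1 = np.zeros(2 * len(y1))
    u0[::2] = y0
    u1[::2] = y1
    return np.convolve(u0, g0) + np.convolve(u1, g1)

x = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0])
x_hat = synthesis(*analysis(x))

print(alias)      # all zeros: the aliasing term vanishes
print(transfer)   # a single unit tap: the bank is a pure one-sample delay
```

For this particular filter pair the output is the input delayed by one sample, so `x_hat[1:1 + len(x)]` reproduces `x`. Longer filters give the same structure with a longer delay, provided the transfer function reduces to a pure delay.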
Fig. 5. (a) Three-level analysis filter bank (b) Three-level synthesis filter bank.
2.3 Transmultiplexer
Transmultiplexers form an integral part of modems and transceivers based on filter banks
that work on the principle of perfect reconstruction. A simple two-channel filter bank can be
utilized to illustrate the perfect reconstruction condition. A transmultiplexer is the dual of
Sub-band coder (SBC) in structure. Fig. 6 shows a two-channel transmultiplexer filter bank,
which converts a time-interleaved signal at its input to a FDM signal, having separate bands
of spectrum multiplexed together and then converts it back into TDM signal at its output.
Transmultiplexers find application in modems and transceivers for digital communication
(Vaidyanathan, 1993).
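A two-channel transmultiplexer can be sketched with the simplest perfect reconstruction pair. The example below is an assumption made for illustration (a Haar block transform written in polyphase form, with hypothetical function names), not the transmultiplexer of Fig. 6: the synthesis side merges two input streams into one line signal of twice the rate, and the analysis side separates them again.

```python
import numpy as np

def transmux_synthesis(x0, x1):
    """Combine two symbol streams into one line signal (Haar synthesis)."""
    y = np.empty(2 * len(x0))
    y[0::2] = (x0 + x1) / np.sqrt(2)
    y[1::2] = (x0 - x1) / np.sqrt(2)
    return y

def transmux_analysis(y):
    """Split the line signal back into the two streams (Haar analysis)."""
    a, b = y[0::2], y[1::2]
    return (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)

x0 = np.array([1.0, -1.0, 1.0, 1.0])
x1 = np.array([-1.0, -1.0, 1.0, -1.0])

line = transmux_synthesis(x0, x1)
r0, r1 = transmux_analysis(line)
print(np.allclose(r0, x0), np.allclose(r1, x1))
```

With an ideal channel both streams are recovered exactly. Any distortion of `line` between the two stages mixes the streams, which is why a transmultiplexer based transceiver needs channel equalization.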
cannot utilize CP to mitigate ISI created by the frequency-selective channel, as various DWT
symbols overlap in time domain (Vaidyanathan, 1993). Nevertheless, such MCM systems
based on DWT require an efficient equalization technique to counter the ISI created by the
channel.
implemented as a set of FIR filters, which leads to the filter bank realization of the wavelet
transform, according to Mallat’s algorithm (Mallat, 1998). The blocked version of the input
signal is mapped to a variable QAM constellation according to the number of bits
loaded. This is interpolated and filtered by the corresponding branch synthesis filter. The
combined signal is sent through the channel, and the received signal is filtered by an
equalizer filter. The equalized signal is passed through the corresponding analysis filter
and decimated to retrieve the QAM encoded version of the transmitted signal. The
transmitted signal is recovered after QAM decoding.
$$
b = \frac{1}{2}\log_2\left(1 + \frac{SNR}{\Gamma \cdot \gamma_m}\right) \tag{7}
$$
where $SNR = \varepsilon_n \cdot g_n$ is the SNR of each sub-channel, $\varepsilon_n$ is the sub-channel energy and $g_n$ is the
sub-channel gain-to-noise ratio, which can be calculated as,
$$
g_n = \frac{|H_n|^2}{\sigma^2} \tag{8}
$$
where $H_n$ is the ADSL channel response in sub-channel n, $\sigma^2$ is the noise power, $\Gamma$ is the SNR gap
and $\gamma_m$ is the performance margin, which is the amount by which the SNR can be reduced (Yu
and Cioffi, 2001). The water filling bit-loading for the proposed system is shown in Fig. 9.
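The bit-loading rule can be turned into a few lines of code. This is a minimal sketch under assumptions: the function name, the default gap of 9.8 dB and the default margin of 6 dB are illustrative choices, not parameters taken from this chapter, and the energy allocation itself (the water-filling step) is not implemented here. Only the number of bits for a sub-channel with a given energy, channel gain and noise power is evaluated.

```python
import math

def bits_per_subchannel(energy, gain, noise_power, gap_db=9.8, margin_db=6.0):
    """Bits carried by one sub-channel: b = 0.5 * log2(1 + SNR / (gap * margin)).

    energy      -- sub-channel energy
    gain        -- channel response H_n in this sub-channel (may be complex)
    noise_power -- noise power sigma^2
    gap_db, margin_db -- SNR gap and performance margin in dB (assumed defaults)
    """
    g_n = abs(gain) ** 2 / noise_power           # gain-to-noise ratio
    snr = energy * g_n                           # sub-channel SNR
    gap = 10.0 ** (gap_db / 10.0)
    margin = 10.0 ** (margin_db / 10.0)
    return 0.5 * math.log2(1.0 + snr / (gap * margin))

# A sub-channel with 40 dB SNR, rounded down to a whole number of bits.
b = bits_per_subchannel(energy=1.0, gain=100.0, noise_power=1.0)
loaded_bits = math.floor(b)
print(b, loaded_bits)
```

Rounding down to an integer mirrors the discrete loading shown in Fig. 9, where each sub-channel carries a whole number of bits.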
While considering the DWMT based communication system for the ADSL channel, it is
necessary to consider its frequency response and the effect of crosstalk, near-end crosstalk
(NEXT), and far-end crosstalk (FEXT) in system simulation. The ADSL channel impairments
and crosstalk is briefly discussed in the following section.
Fig. 9. ADSL channel frequency response & number of bits loaded according to discrete
water-filling algorithm.
Fig. 10. A typical DSL network connecting subscribers to internet services through DSL to
the Central Office.
Although the DSL channel offers the advantage of utilizing the already installed telephone
lines to carry digital data, there are different channel impairments that pose
4.2.1 Crosstalk
In a telephone network, each subscriber is connected to the CO through a twisted pair,
and hundreds of such pairs are bound together in a cable. The twisting of the wires
keeps the electromagnetic coupling between them to a minimum. However, when the pairs
are numerous, the crosstalk between the pairs cannot be completely removed. Therefore,
crosstalk constitutes a dominant impairment in the DSL channel. The DSL
crosstalk types, namely near-end crosstalk (NEXT) and far-end crosstalk (FEXT), are
illustrated in Fig. 13 (Thomas Starr, et al., 2002). NEXT is the crosstalk due to a
neighboring transmitter on a different twisted pair line, and its power increases with
frequency. FEXT is the noise detected by the receiver located at the far end of the
cable from the transmitter. FEXT is typically less severe than NEXT, because FEXT is
attenuated as the cable length increases.
In this chapter, the performance of the DWMT transceiver is evaluated for the downstream
ADSL channel. For this purpose, the NEXT and FEXT are modeled using the ADSL standard
G.992.1/G.992.2 (ITU-T, 2003).
Fig. 13. NEXT and FEXT, the DSL crosstalks illustrated (Thomas Starr, et al., 2002).
The PSD of the ADSL transceiver disturbers for downstream is given by (ITU-T, 2003),
2
f
sin
2 f o 1 1
PSDADSL , ds Disturber K ADSL , ds 2
12
16
,
fo f f f HP 3 dB
1 1
fo f HP 3 dB f
(0 f ) (9)
where f is in Hz and the remaining parameters are defined in Table 1. The PSD of the ADSL
transceiver downstream NEXT is given by (ITU-T, 2003),
$$
PSD_{ADSL,ds\text{-}NEXT} = PSD_{ADSL,ds\text{-}Disturber} \times 10^{-\frac{NPSL_n}{10}} \times f_{NXT}^{-1.5} \times f^{1.5}, \quad (0 \le f < \infty) \tag{10}
$$
where f is in Hz and the remaining parameters are also given in Table 1. The PSD of the
ADSL transceiver downstream FEXT is given by (ITU-T, 2003),
2
FPSLn
1 2
PSDADSL , ds FEXT PSDADSL , ds Disturber H channel ( f ) 10 10 dFXT dFXT d f 2,
(0 f ) (11)
where f is in Hz, and Hchannel(f) is the channel transfer function and the remaining
parameters are given in Table 1.
The PSDs of the disturbers and NEXT are shown in Fig. 14(a), and Fig. 14(b) displays the FEXT
PSD for downstream ADSL (ITU-T, 2003). The NEXT and FEXT for upstream can be computed in a
similar manner (ITU-T, 2003).
Fig. 14. (a) PSD-disturber & PSD-NEXT for downstream ADSL in G.992.1/G.992.2 standard.
= + (12)
The equalizer output vector z can be found by convolving a set of training sequence input
samples h with the equalizer tap weights c (Sklar, 2001),
$$
\mathbf{z} = \mathbf{h}\,\mathbf{c} \tag{13}
$$
However, we continue with the assumption that channel state information is entirely known
at the receiver. Therefore, a square matrix h, consisting of channel coefficients, is formulated
with the help of the ZF criterion. The ZF algorithm minimizes the peak
ISI distortion by selecting the equalizer filter weights c such that the equalizer output is
forced to zero at the sample points other than the desired pulse. The weights are chosen
such that (Sklar, 2001)
$$
z(k) = \begin{cases} 1 & \text{for } k = 0 \\ 0 & \text{for } k = \pm 1, \pm 2, \ldots, \pm N \end{cases} \tag{14}
$$
The equalizing filter has L = 2N+1 taps. The equalizer filter coefficients are computed by (Sklar,
2001)
$$
\mathbf{c} = \mathbf{h}^{-1}\,\mathbf{z} \tag{15}
$$
The job of equalizing filter is to recover the transmitted signal x̂ from the received channel-
distorted signal y, as follows,
xˆ = yc
(16)
= xch c rc
where x̂ is the distorted received signal which was transmitted through ADSL channel and
recovered after ZF equalization.
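The zero-forcing computation can be sketched for a short transversal filter. The code below is an illustration under assumptions: the function name and the five pulse samples are hypothetical and are not the ADSL channel of this chapter. The square matrix is built from the received pulse samples, the desired output is a single one at the centre tap, and the taps follow from solving the resulting linear system.

```python
import numpy as np

def zf_equalizer_taps(pulse, n):
    """Zero-forcing taps for an equalizer with 2n+1 taps.

    pulse holds the 4n+1 received pulse samples x(-2n), ..., x(2n).
    """
    pulse = np.asarray(pulse, dtype=float)
    taps = 2 * n + 1
    centre = 2 * n                      # position of x(0) inside pulse
    h = np.empty((taps, taps))
    for i in range(taps):
        for j in range(taps):
            h[i, j] = pulse[centre + i - j]   # entry x(i - j)
    z = np.zeros(taps)
    z[n] = 1.0                          # 1 at k = 0, 0 at k = +-1, ..., +-n
    return np.linalg.solve(h, z)        # c = h^-1 z

# Hypothetical distorted pulse x(-2), ..., x(2) and a 3-tap equalizer.
pulse = [0.0, 0.2, 0.9, -0.3, 0.1]
c = zf_equalizer_taps(pulse, n=1)

equalized = np.convolve(pulse, c)       # equalizer output around the main pulse
print(c)
print(equalized)
```

The equalized pulse is exactly 1 at the centre and 0 at the N samples on either side of it. Samples further away are not constrained, so some residual ISI remains outside the span of the filter.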
Fig. 15. A Linear transversal equalizer with coefficients optimized by Zero-Forcing criterion.
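$$
\mathbf{h}^{T}\mathbf{z} = \mathbf{h}^{T}\mathbf{h}\,\mathbf{c} \tag{17}
$$
$$
\mathbf{R}_{hz} = \mathbf{R}_{hh}\,\mathbf{c} \tag{18}
$$
where $\mathbf{R}_{hz} = \mathbf{h}^{T}\mathbf{z}$ is the cross correlation matrix and $\mathbf{R}_{hh} = \mathbf{h}^{T}\mathbf{h}$ is the autocorrelation matrix of the
input noisy signal, which are used to determine the equalizer coefficients c,
$$
\mathbf{c} = \mathbf{R}_{hh}^{-1}\,\mathbf{R}_{hz} \tag{19}
$$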
For the MMSE solution of the equalizing filter, an over sampled non-square matrix h is
formed which is transformed to a square autocorrelation matrix Rhh, yielding the optimized
filter coefficients.
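The step from a non-square matrix to a square autocorrelation matrix can be sketched as follows. This is an illustration under assumptions: the helper names and the pulse samples are hypothetical, the samples are noise-free, and the matrix is simply the full convolution matrix of the pulse, which has more rows than taps. The taps are obtained from the normal equations and compared with a standard least-squares solver.

```python
import numpy as np

def build_system(pulse, n_taps):
    """Non-square convolution matrix h and desired output z (one centred pulse)."""
    pulse = np.asarray(pulse, dtype=float)
    rows = len(pulse) + n_taps - 1
    h = np.zeros((rows, n_taps))
    for j in range(n_taps):
        h[j:j + len(pulse), j] = pulse   # column j is the pulse delayed by j
    z = np.zeros(rows)
    z[(rows - 1) // 2] = 1.0             # centre position (odd lengths assumed)
    return h, z

def mmse_taps(h, z):
    """Solve R_hh c = R_hz with R_hh = h^T h and R_hz = h^T z."""
    r_hh = h.T @ h
    r_hz = h.T @ z
    return np.linalg.solve(r_hh, r_hz)

pulse = [0.0, 0.2, 0.9, -0.3, 0.1]       # hypothetical received pulse samples
h, z = build_system(pulse, n_taps=3)
c = mmse_taps(h, z)

residual = h @ c - z                     # error over the whole observation window
print(c, np.sum(residual ** 2))
```

Unlike the zero-forcing solution, the output is no longer forced to exact zeros next to the main pulse. Instead the squared error is minimized over the whole window, so the error outside the span of the filter is taken into account as well.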
The BER curve in Fig. 17 shows that the two systems give almost identical
performance at lower SNR, while at higher SNR the DWMT system exhibits an
improvement of 1 dB in Eb/No over the DMT system for an AWGN channel, at a BER of 1E-
6. Thus, without crosstalk, the DMT and DWPT based ADSL techniques
perform identically except at higher SNR. In the next step, the simulation is performed
according to the ADSL standard with crosstalk from G.992.1/G.992.2 (ITU-T, 2003).
Fig. 17. BER Comparison of DWMT & DMT systems in AWGN with ZF Equalization
techniques.
Fig. 18 shows the performance of DWMT and DMT systems in ADSL channel with AWGN,
NEXT and FEXT (crosstalk), utilizing time-domain equalization (TEQ) techniques. The
NEXT & FEXT represent the downstream crosstalk in ADSL channel according to the
G.992.1/G.992.2 standard (ITU-T, 2003), with the simulation parameters as described in
Table 1. The DMT system is still equalized by ZF-TEQ, while the DWMT transceiver is equalized
by ZF-TEQ and by time-domain MMSE (MMSE-TEQ). The BER curves in Fig. 18 show
that the wavelet packet transmultiplexer improves the performance of the DWMT
transceiver with ZF-TEQ by an Eb/No margin of 1.0 dB at a BER of 1E-4, over a DMT
transceiver with an identical equalizer. Moreover, the MMSE-TEQ technique for the DWMT
system shows an improvement of 2 dB in Eb/No over the ZF-TEQ technique for DWMT and a 3
dB gain over the ZF-TEQ equalized DMT system, at a BER of 1E-4.
Fig. 18. BER Comparison of DWMT & DMT systems for ADSL channel with AWGN, NEXT
& FEXT.
6. Conclusion
The multirate digital signal processing techniques, including wavelets and filter banks, are part
of new emerging technologies, which are finding applications in the field of digital
communications. DWT based multicarrier modulation techniques have opened new avenues
for researchers to avoid the spectral leakage and spectral inefficiency associated with Fourier
transform based MCM techniques. Time domain equalizers based on ZF and MMSE
algorithms are utilized for DSL channel equalization in DWMT transceivers. MMSE based
equalizers outperform the ZF equalizers in terms of BER. The equalization techniques adopted
for DWMT transceivers are a topic of active research. Moreover, simulation results found in
the literature have shown that DWT based MCM systems exhibit higher immunity to narrowband
interference (NBI). Therefore, WOFDM/DWMT can be considered a viable alternative to the
spectrally inefficient OFDM/DMT, although at the cost of higher computational complexity of
equalization.
7. References
Acker, K. V., Leus, G., Moonen, M., van de Wiel, O. and Pollet, T. (2001). Per tone
equalization for DMT-based systems, IEEE Trans. on Communications, vol. 49, no. 1,
pp. 109-119.
Akansu, A. N.; Xueming Lin. (1998). A comparative performance evaluation of DMT (OFDM)
and DWMT (DSBMT) based DSL communications systems for single and multitone
interference. Proceedings of IEEE International Conference on Acoustics, Speech and
Signal Processing, 1998, vol. 6, pp. 3269-3272, 12-15.
Alliance for Telecommunications Industry Solutions, (1995). American National Standard
for Telecommunications - Network and Customer Installation Interfaces -
Asymmetric Digital Subscriber Line (ADSL) Metallic Interface. ANSI T1.413 1995
ANSI. New York.
Jamin, A. and Mähönen, P. (2005). Wavelet packet modulation for wireless communications,
Wireless Communications and Mobile Computing 5 (2): pp. 123-137.
Baig, S. and Gohar, N. D. (April 2003). Discrete Multi-Tone Transceiver at the Heart of PHY
Layer of an In-Home Powerline Communication Local Area Network. IEEE
Communications Magazine, pp. 48-53.
Baig, S. and Mughal, M. J. (2009). Multirate signal processing techniques for high-speed
communication over power lines, IEEE Communications Magazine, vol. 47, no. 1,
pp. 70-76, January 2009.
Bingham, J. C. (2000). ADSL, VDSL, and Multicarrier Modulation, Wiley & Sons.
Chow, P. S., Tu, J. C. and Cioffi, J. M. (1991). Performance Evaluation of a Multichannel
Transceiver System for ADSL and VHDSL Services. IEEE Journal on Select. Areas in
Commun., 9(6):909-919.
Cook, J. W.; Kirkby, R. H.; Booth, M. G.; Foster, K. T.; Clarke, D. E. A.; Young, G. (1999). The
noise and crosstalk environment for ADSL and VDSL systems. IEEE
Communications Magazine, vol. 37, no. 5, pp. 73-78, May 1999.
Doux, C., V.; Lienard, J.; Conq, B.; Gallay, P. (2003). Efficient implementation of discrete
wavelet multitone in DSL communications. Video/Image Processing and Multimedia
Communications, 2003. 4th EURASIP Conference focused on, vol. 1, pp. 393-398.
Farrukh, F., Baig, S. and Mughal, M. J. (2007). Performance Comparison of DFT-OFDM and
Wavelet-OFDM with Zero-Forcing Equalizer for FIR Channel Equalization, Proc. of
IEEE International Conference on Electrical Engineering, LHR, Pakistan.
Farrukh, F., Baig, S. and Mughal, M. J. (2009). MMSE Equalization for Discrete Wavelet
Packet Based OFDM, Proc. of IEEE International Conference on Electrical Engineering,
ICEE’09, LHR, Pakistan.
Frederiksen, F. B. and Prasad, R. (2002). An overview of OFDM and related techniques towards
development of future wireless multimedia communications. Proceedings of IEEE
Radio and Wireless Conference, RAWCON, pp. 19-22, Aug. 11-14, Boston,
Massachusetts, USA.
1. Introduction
In recent years, the Forward and Inverse Discrete Wavelet Transform (FDWT/IDWT) (S. Mallat,
1999) has been widely used as an alternative to the existing time-frequency representations
such as DFT and DCT. It has become a powerful tool in many areas, such as image
compression and analysis, texture discrimination, fractal analysis, pattern recognition and so
on. The recent and future developments of high definition digital video and the diversity of
terminals have led to the consideration of a multi-resolution codec. In this context, the FDWT/IDWT
as well as other computational functions such as Motion Estimation (ME) are required to
be scalable and flexible to support rich multimedia applications and to adapt to fast-changing
standards requirements. Against this background, a universal, extremely scalable and flexible
computational architecture which can adapt to a variable workload will become more and more
important and suitable for future multimedia applications.
In the literature, there have been several proposals devoted to the hardware implementation of
FDWT/IDWT. Some proposals (M.A.Trenas et al., 2002) (et al, 2002) (Lee & Lim, 2006) (Ravasi,
2002) (P.Jamkhandi et al., 2000) (Tseng et al., 2003) addressed the importance of flexibility and
proposed programmable DWT architectures of two types: VLSI or FPGA architectures.
The VLSI architectures have large limitations in terms of flexibility and scalability compared
to the FPGA architectures. Even though some recent solutions are programmable
and scalable for either variable wavelet filters (Olkkonen & T.Olkkonen, 2010) (Lee & Lim,
2006) or the structure of the FDWT, they remain, in addition to their cost, dedicated to specific
algorithms and cannot be adapted to future solutions. On the other hand, the existing FPGA
architectural solutions are mainly ASIC-like architectures and use external off-the-shelf
memory components, which represent a bottleneck for data access. The possibility of
parallelizing the processing elements offered by FPGAs, when combined with sequential access to
data and bandwidth limitations, does not enhance the overall computing throughput. The
very powerful commercial VLIW digital signal processors obtain their performance thanks to
a double data-path with a set of arithmetic and logic operators, with the possibility of parallel
execution and a wide execution pipeline. However, this performance is due to a high
working clock frequency. Even though these DSPs have parallel but limited access to a set
of instructions, the data memory access remains sequential. The performance is
paid for by high circuit complexity and power consumption. Most work focuses on the reuse of
devices like FPGAs for different applications or different partitions of one application.
In order to meet these needs, we propose a novel DWT architecture and implementation
method. The proposed architecture can support multiple standards by reconfiguring the
interconnection between data memories and processing elements. Moreover, the number
of processing elements and their working frequency can be reconfigured dynamically. A
controller plays a key role as a reconfigurable interface allowing multiple accesses to local
memory and to external memory through a DMA, and feeding the processing elements in an
optimal fashion. An implementation method is developed to identify the parallelism level of
the processing elements and the working frequency, as well as to find the tradeoff between power
consumption and performance. Compared with other VLSI and ASIC architectures, up to twice
the memory size can be saved with our novel architecture.
Fig. 1. Application adaptive configuration
In the remainder of this chapter, we start in Section 2 by presenting a definition of adaptation in two
manners, application adaptive and task adaptive, within the context of complex systems. We
then give a brief overview of the DWT algorithm in Section 3, where we detail a reconfigurable
DWT hardware processor architecture. In order to illustrate our proposed
system experimentally, Section 4 focuses on the details of our proposed reconfigurable architecture, which
supports our DWT algorithm implementation. Section 5 focuses on the implementation, analysis
and validation of the system. Finally, Section 6 summarizes and concludes this work.
2. Levels of adaptation
In the multimedia computing environment, adaptation can be seen in two manners:
application adaptive and task adaptive. Following the adaptation of the computing environment,
different applications or different standards of one application can be switched at run-time.
For example, the multimedia terminal switches its use from playing a movie to answering
a video call. Task adaptation consists of switching between different versions of a task of an
application; this situation can occur, for instance, in down scaling or up scaling situations.
of two applications is required. To achieve this, different versions of specific tasks must be
available.
$$
D = \sum_{i=1}^{L} \frac{N \times N}{4^{\,i-1}} = \frac{4^{L} - 1}{3 \times 4^{L-1}} \times N \times N \tag{1}
$$
To process an N × N image, a temporary memory of size
$$
D - N \times N = \left( \frac{4^{L} - 1}{3 \times 4^{L-1}} - 1 \right) \times N \times N \tag{2}
$$
is required. As an example, for a 2-level resolution a temporary memory of size 0.25 N × N is
required. For a given layer, the filtering process is performed horizontally and vertically; thus
two read accesses and two write accesses are necessary, and the total amount of data read and
written is expressed as $D_w = D_r = 2 \times D$. The memory bandwidth B, in the bidirectional access
case, can be considered as the product of the total amount of data processed per second,
$T_{df} = (D_r + D_w) \times fps$, where fps is the number of frames per second, and the number of bits $N_b$ of a coefficient:
$$
B = T_{df} \times N_b \tag{3}
$$
As an example, for a gray level image of 512 × 512 pixels with 25 frames per second, 8 bits
per pixel and 2 levels of reconstruction, a bandwidth of about 260 Mb/s is required. These results
illustrate the memory management problem as the main bottleneck of the classical approach.
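The figures quoted in this example can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical function names and assumes that the image size is a power of two, so that every level divides exactly. It evaluates the amount of data D, the temporary memory D − N × N and the bandwidth B for the 512 × 512 case.

```python
def pyramid_data(n, levels):
    """Total data D over all levels: sum of N*N / 4**(i-1) for i = 1..levels."""
    return sum((n * n) // 4 ** (i - 1) for i in range(1, levels + 1))

def temporary_memory(n, levels):
    """Temporary memory D - N*N needed in addition to the image itself."""
    return pyramid_data(n, levels) - n * n

def bandwidth_bits_per_second(n, levels, fps, bits_per_coeff):
    """Memory bandwidth B = (D_r + D_w) * fps * N_b with D_r = D_w = 2 * D."""
    d = pyramid_data(n, levels)
    d_r = d_w = 2 * d
    return (d_r + d_w) * fps * bits_per_coeff

n, levels = 512, 2
print(temporary_memory(n, levels) / (n * n))                     # 0.25
print(bandwidth_bits_per_second(n, levels, fps=25, bits_per_coeff=8) / 1e6)
```

For two levels the temporary memory is a quarter of the image, and the bandwidth evaluates to 262.144 Mb/s, which is the value of roughly 260 Mb/s quoted above.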
Fig. 3. 2-D IDWT Processing phases
same set of processing are performed on multiple data blocks. Each cluster is composed of
heterogeneous multiprocessor cores that allow software reuse, one or several
Reconfigurable Processing Modules (RPM), a Reconfigurable Communication Module (RCM), and
on-chip memory. The RPM allows hardware acceleration and can be configured in a way that
supports different versions of a task. The reconfigurable interface (RIF) is used to build the
interconnection between the different modules. Each RPM can be reconfigured at runtime.
Each cluster has a local configuration manager implemented in an on-chip processor that
controls the sequences of reconfigurations of the cluster. At this local configuration level, all
clusters are configurable in parallel and independently. The reconfiguration process dynamically
allocates to the different tasks of an application the adequate hardware resources and the
optimal operating frequency and voltage. The presence of local configuration managers
accelerates the adaptation process. To control the overall system, a global
reconfiguration level is necessary. At this level, the necessary information is managed in
order to modify the global organisation of the system by configuring the communication
between clusters and the elements of a cluster, allowing for instance to switch from one
application to another.
The overall architecture (M. Guarisco et al., 2007) is depicted in Figure 4. Three memory blocks
are present: the first and the last respectively store the original image data and
deliver the computed data, while the second block feeds the processing elements. In addition to these
three blocks, the system is composed of a reconfigurable processing unit, two data organization
units and a control unit. The control unit connects the right memory to the right unit
at the right time. Once memory block 1 is full (and, as a consequence, memory block 3 is
empty, or at least all its bytes have been read or stored in external memory), each memory datapath
is switched, allowing new picture data to be processed. A new cycle begins: memory 3 is now
filled while the data in memory 1 are transformed.
Table 1 lists the number of main computational requirements (the number of additions,
shifts, and multiplications per filtering operation). We choose two filters to illustrate the task
adaptive level.
4.1.0.1 The 5/3 lifting based wavelet transform
The 5/3 lifting-based IDWT uses short filters for both the low-pass and the high-pass
parts. The outputs are computed through the following equations:
D [n] = S0 [n] − [1/4( D [n] + D [n − 1]) + 1/2] (4)
S[n] = D0 [n] + [1/2(S0 [n + 1] + S0 [n])] (5)
The equations for the FDWT 5/3 are given below:
• Read (R): The source operands from the on-chip memory are sent to the register file. The
control module orders its integrated read address generator to read the row or column
data from the memory module (SRAM)
to the RPU at the address pointed to by a read counter. Two data items are read in one clock
cycle.
• Execution (E): The data available in the register file is used by the datapath to process in
parallel the two parts of the filter. As the high-pass filter part requires the previous result
of the low-pass filter part, the high-pass results are delayed by one clock cycle.
This operation is executed in one clock cycle.
• Writeback (W): The results of the computation are written back to the on-chip memory at the
address pointed to by a write counter.
Figure 7 illustrates the operating mode of the three-stage pipeline. Because of sequential
access to one memory block, the computations of the first level are performed as shown in (a),
allowing the execution of three operations in one clock cycle. For the remaining processing,
thanks to the parallel read, execute and write, six operations are executed in one clock cycle
(b).
level is T_load/2 (two pixels are read per clock cycle, which divides T_load by two; there are two
PEs, which divides T_load by two again; but the 1-D DWT has to be performed twice. Finally, the
execution time is T_load/2). Moreover, for the next level we need only the low-frequency
coefficients, which represent only a quarter of the total result of the previous level.
The execution time is then the sum of a geometric series, given in equation 12.

$$T_{exec} = \sum_{i=1}^{n} \frac{1}{4^{i-1}} \cdot \frac{T_{load}}{2} \qquad (12)$$

If the number of levels tends towards infinity, the execution time tends to 2 T_load/3. A data-out
unit allows getting back the DWT coefficients in an ordered way. This controller can be easily
modified to adapt the structure of the data flow to the system.
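As a quick numerical check of equation 12 and of its limit, the following sketch evaluates the sum for a given number of levels. The function name and the normalisation of T_load to 1 are illustrative choices, not part of the original design.

```python
def exec_time(t_load, levels):
    """Equation 12: sum over levels of (T_load / 2) / 4**(i - 1)."""
    return sum((t_load / 2) / 4 ** (i - 1) for i in range(1, levels + 1))


for n in (1, 2, 3, 10):
    print(n, exec_time(1.0, n))   # approaches 2/3 of T_load as n grows
```

With T_load normalised to 1, one level costs 0.5, and the total converges quickly towards 2/3, so additional levels add very little execution time.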
5. Implementation results
We have modeled the architecture in HDL in the ISE software suite from Xilinx. The simulation
results agree with our theoretical expectations. Indeed, this architecture can perform
a very high number of levels. According to the simulation results, it can run at a working
frequency of 67 MHz. However, as we use the internal memory of an FPGA, we are limited to
an image size of only 128×128 pixels. The solution consists of a small modification
of the data organizing units to allow the architecture to process macroblocks instead of a whole
picture.
Figure 10 illustrates the placement and routing of one RPU on a Xilinx Virtex-4 FPGA. The three
main parts of the system are the reconfigurable interface (middle block), four register blocks and two
datapaths of the IDWT algorithm. Each part has an independent configuration file, and these files are referred to
[Figures: RPU datapath configurations (1)–(3) with odd/even datapaths, register file, on-chip memory and reconfigurable interface; system diagram with PowerPC, PLB, ICAP controller, BRAM controller and DDR]
Table 2. Measured reconfiguration time of different bitstream files for 2-D IDWT.
A Scalable Architecture for Discrete Wavelet Transform on FPGA-Based System
Fig. 10. placement and routing schema of DWT processing unit with Xilinx PlanAhead tool
as partial bitstreams. The different partial bitstreams are stored in the on-chip BRAM. The
static bitstream is loaded using a cable. To measure the execution time of each partial bitstream,
a free-running hardware timer is used. The measurement results are in Table 2. In this table,
the main modules are: the difference between the 5/3 and 9/7 filters (R_d_97), the 5/3 filter
functional module (R_f_53), the interface for the 5/3 filter (R_com_1) and the communication interface
for the 9/7 filter (R_com_2).
The on-chip PowerPC processor is used for auto-configuration through the HWICAP. As the
PowerPC is an element of the system, it is used to detect external or internal events and
to load automatically the adequate configuration to adapt the system to the given
situation, thereby making the system auto-adaptive. The HWICAP makes auto-configuration easier:
a C program on the PowerPC transfers 512×32-bit blocks of the partial bitstream
from the configuration memory to a fixed-size buffer of the HWICAP peripheral, which then
transfers them from the buffer to the ICAP. The total reconfiguration time can be approximated by the
following equation:
two filters, Part3 corresponds to the 5/3 filter, and Part4 is the difference between the 5/3 filter
and the 9/7 filter). The configuration time is measured using a free-running counter (timer)
incremented every system clock cycle, capturing the start time and the end time. We see
that the configuration time, as expected, depends linearly on the size of the bitstream.
the definite size of the image. The memory requirement is scalable, and thus the correct
size of memory can be configured dynamically to adapt to the required memory
bandwidth. The other works shown in this table are based on ASIC (Tseng et al., 2003) and
VLSI (et al, 2002). There, the area of the circuit and the size of the memory are fixed, and thus the maximum
size of memory must be provisioned in advance, which may lead to a shortage or a surplus of
memory.
The proposed architecture features small area and low memory requirements. The processing time
for a 32×32 image block is 43 µs, which is lower than that of other traditional designs. Using 64×64
image blocks gives good throughput: the transformation takes 86 µs
for a two-level 2-D IDWT, which is sufficient to process a CCIR-format (720×576) image
signal at 50 frames/s.
6. Conclusion
In this chapter, we have described an auto-adaptive and reconfigurable hybrid architecture
for F/IDWT signal processing applications. Two levels of auto-adaptation are defined in
order to minimize the reconfiguration overhead. At the application adaptive level,
different applications of a domain are classified and characterized by a set of tasks. At the task
adaptive level, for a given task, a set of versions is defined and characterized for
use in a given situation, to adapt the application to different constraints such as energy and bandwidth
requirements.
The proposed architecture is universal, scalable and flexible, featuring two levels of
reconfiguration in order to enable application adaptivity and task adaptivity. We
demonstrated through the case study that it can be used for any type of filter, any image
size and any level of transformation. The memory is organized as a set of independent
memory blocks. Each memory block is a reconfigurable module. The high scalability of the
architecture is achieved through the flexibility and ease of choosing the number of memory
blocks and processing elements to match the desired resolution. The on-chip memory is used
not only to hold the source image, but also to store the temporary and final results. Hence, there
is no need for separate temporary memory. The processor has no instructions and therefore no decoder; instead,
the reconfigurable hardware controller plays the role of a specific set of instructions and their
sequencing. For a given set of tasks, a set of configurations is generated at compile time and
loaded at run time by the configuration manager via the configuration memory. The prototype
has been tested on a Xilinx FPGA development board with 65 nm CMOS technology. The
prototype chip can be reconfigured to implement the 5/3 filter or the 9/7 filter. Compared with
other ASIC architectures at the same working frequency, the proposed architecture requires
fewer memory blocks and fewer hardware resources.
7. References
et al, S. (2002). VLSI implementation of 2-D DWT/IDWT cores using 9/7-tap filter banks based on
the non-expansive symmetric extension scheme, in IEEE (ed.), Proceedings of the 15th
International Conference on VLSI Design, number 435 in ASP-DAC ’02, IEEE Computer
Society, Washington, DC, USA.
Inc., X. (ed.) (2004). Two flows for partial reconfiguration: module-based or difference based, Xilinx.
Lee, S.-W. & Lim, S.-C. (2006). VLSI design of a wavelet processing core, in IEEE (ed.), IEEE
Transactions on Circuits and Systems for Video Technology, Vol. 16.
M. A. Trenas, J. Lopez & E. L. Zapata (2002). A configurable architecture for the wavelet packet
transform, The Journal of VLSI Signal Processing, Vol. 32, pp. 151–163.
M. Guarisco, X. Zhang, H. Rabah & S. Weber (2007). An efficient implementation of scalable
architecture for discrete wavelet transform on FPGA, in IEEE (ed.), 6th IEEE Dallas
Circuits and Systems Workshop, pp. 1–3.
Olkkonen, H. & Olkkonen, J. T. (2010). Discrete Wavelet Transform Structures for VLSI
Architecture Design, in Zhongfeng Wang (Ed.), VLSI, ISBN: 978-953-307-049-0, InTech. Available from:
http://www.intechopen.com/articles/show/title/discrete-wavelet-transform-
structures-for-vlsi-architecture-design.
P. Jamkhandi, A. Mukherjee, K. Mukherjee & Franceschini, R. (2000). Parallel
hardware/software architecture for computation of discrete wavelet transform
using the recursive merge filtering algorithm, Proceeding International Parallel Distrib.
workshop, pp. 250–256.
Ravasi, M. e. a. (2002). A scalable and programmable architecture for 2-D DWT decoding, IEEE
Transactions on Circuits and Systems for Video Technology, Vol. 12, pp. 671–677.
S. Mallat (1989). A theory for multi-resolution signal decomposition: The wavelet
representation, number 7, IEEE, pp. 674–693.
Tseng, P.-C., Huang, C.-T. & Chen, L.-G. (2003). Reconfigurable discrete wavelet transform
architecture for advanced multimedia, in IEEE (ed.), Signal Processing Systems,
pp. 137–141.
3
1. Introduction
The advantages of the wavelet transform over conventional transforms, such as the Fourier
transform, are now well recognized. Because of its excellent locality in the time-frequency
domain, the wavelet transform is extensively used for signal analysis,
compression and denoising. The definition of the DWT by Mallat [1] made its digital
hardware and software implementation possible. The discrete wavelet transform (DWT) performs a
multiresolution signal analysis which has adjustable locality in both the space (time) and
frequency domains [1]. Unlike the Fourier transform, the wavelet transform has many
possible sets of basis functions. A trade-off can be made between the choice of basis functions
and the complexity of the corresponding hardware implementations. Using finite impulse
response (FIR) filters and then subsampling is the classical method for implementing the
DWT. Due to the large amount of computations required, there have been many research
efforts to develop new rapid algorithms [2]. In 1996, Sweldens presented a lifting scheme
for a fast DWT, which can be easily implemented by hardware due to significantly reduced
computations [3]. This method is entirely based on a spatial interpretation of the wavelet
transform. Moreover, it provides the capability of producing new mother wavelets for
the wavelet transform, based on space domain features. Due to recent advances in the
technology, implementation of the DWT on field programmable gate array (FPGA) and digital
signal processing (DSP) chips has been widely developed. As described in Sect. 3, in the
lifting scheme the structural processing elements, including multipliers, are arranged serially;
hence, the number of multipliers in each pipeline stage determines the clock speed of the
structure. Based on [4], the main challenges in the hardware architectures for 1-D DWT are
the processing speed and the number of multipliers, while for 2-D DWT it is the memory
issue that dominates the hardware cost and the architectural complexity. The reason is the
limitation of the on-chip memory and the power consumption [4,5].
2. DWT structures
The wavelet transform provides a time-frequency domain representation for the analysis
of signals. Therefore, there are two main methods to produce and implement wavelet
transforms. These methods are based on time domain or frequency domain features. The
frequency based method is Filter Banks (FB) and the time based one is called Lifting Scheme
(LS). We describe them in the following sections.
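[Figure: two-channel analysis filter bank. The input x is filtered by G(z) and H(z), each followed by downsampling by 2, giving the outputs d and s.]

$$H(z) = \frac{-1}{4\sqrt{2}} z^{2} + \frac{1}{2\sqrt{2}} z + \frac{3}{2\sqrt{2}} + \frac{1}{2\sqrt{2}} z^{-1} + \frac{-1}{4\sqrt{2}} z^{-2}$$

$$G(z) = \frac{-1}{2\sqrt{2}} z^{2} + \frac{1}{\sqrt{2}} z + \frac{-1}{2\sqrt{2}}$$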
The low-pass filter has 5 taps and the high-pass filter has 3 taps, so we call it the 5/3 wavelet. Although
the FB structure is the earlier one, it is only capable of providing wavelet transforms in the
frequency domain and not in the time domain. Moreover, in general, the FB filter coefficients
are not integers; hence, they are not well suited to hardware implementation. In
addition, the number of arithmetic computations in the FB method is very large.
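To make the filter bank view concrete, the sketch below lists the 5/3 taps of H(z) and G(z) and applies "filter, then downsample by 2" to a short signal. It is a minimal illustration: the helper name `analyze`, the use of only the fully overlapping part of the convolution, and the test signal are all assumptions made for the example.

```python
import math

R2 = math.sqrt(2)
# Taps of H(z) and G(z), ordered from the highest power of z to the lowest.
H = [-1 / (4 * R2), 1 / (2 * R2), 3 / (2 * R2), 1 / (2 * R2), -1 / (4 * R2)]
G = [-1 / (2 * R2), 1 / R2, -1 / (2 * R2)]


def analyze(x, taps):
    """FIR-filter x (fully overlapping part only), then keep every 2nd output."""
    k = len(taps)
    full = [sum(taps[j] * x[i + k - 1 - j] for j in range(k))
            for i in range(len(x) - k + 1)]
    return full[::2]


constant = [5.0] * 10
print(analyze(constant, H))   # low-pass: each value is 5 * sqrt(2)
print(analyze(constant, G))   # high-pass: a constant signal gives zeros
```

Note that every output costs five (or three) multiplications by irrational coefficients, which is the arithmetic burden the paragraph above refers to.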
d = x_odd − P(x_even),
VLSI Architectures of Lifting-Based Discrete Wavelet Transform
In (1), k and 1/k are normalization factors. The last matrix is used only for normalization
and may be omitted in many applications such as compression. The relation between FB
coefficients and LS equations is (3):
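$$E(z) = \begin{bmatrix} h_e(z) & h_o(z) \\ g_e(z) & g_o(z) \end{bmatrix} = \begin{bmatrix} k & 0 \\ 0 & 1/k \end{bmatrix} \prod_{i=1}^{m} \begin{bmatrix} 1 & s_i(z) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ t_i(z) & 1 \end{bmatrix} \qquad (3)$$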
Matrix E (z) is called a polyphase matrix, where according to the FB structure, he and ho are
even and odd taps of the low pass filter and ge and go are even and odd taps of the high
pass filter, respectively. si (z) and ti (z) are related to filter coefficients in FB structure. In other
words si (z) and ti (z) can be obtained from FB by factorization algorithm presented in [5].
Example: Let us consider the previous example, the 5/3 wavelet, in LS. This wavelet consists of one
lifting step (one P unit and one U unit together form a lifting step). For this wavelet, the prediction
of each odd sample in the signal is the average of the two adjacent even samples. The P block then
calculates the difference between the real value of the signal sample and its prediction:

$$d(n) = x(2n+1) - \frac{1}{2}\,[x(2n) + x(2n+2)].$$

The U block updates the even samples so that they keep the same properties as the original signal. It uses the two
most recently computed differences for the update procedure:

$$s(n) = x(2n) + \frac{1}{4}\,(d(n-1) + d(n)).$$
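The two lifting equations above translate almost directly into code. The following is a minimal sketch, not the chapter's implementation: the function names are illustrative, the normalization factors k and 1/k are omitted, and the boundary rule (mirroring the signal at both ends) is an assumption, since the text does not state one.

```python
def lifting_53_forward(x):
    """Split an even-length sequence into approximation s and detail d."""
    even, odd = x[0::2], x[1::2]
    n = len(even)
    # Predict: d(n) = x(2n+1) - 1/2 [x(2n) + x(2n+2)]
    # Assumed boundary rule: mirror the signal, so x(2n+2) past the end
    # is replaced by the last even sample.
    d = [odd[k] - 0.5 * (even[k] + even[min(k + 1, n - 1)]) for k in range(n)]
    # Update: s(n) = x(2n) + 1/4 (d(n-1) + d(n)); d(-1) is mirrored to d(0).
    s = [even[k] + 0.25 * (d[max(k - 1, 0)] + d[k]) for k in range(n)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo the update, then the predict step, and interleave the samples."""
    n = len(s)
    even = [s[k] - 0.25 * (d[max(k - 1, 0)] + d[k]) for k in range(n)]
    odd = [d[k] + 0.5 * (even[k] + even[min(k + 1, n - 1)]) for k in range(n)]
    x = [0.0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


x = [3, 7, 1, 8, 2, 9, 4, 6]
s, d = lifting_53_forward(x)
print(s, d)
print(lifting_53_inverse(s, d))   # reproduces x
```

The inverse simply runs the same two steps in reverse order with the signs flipped, which is why lifting gives perfect reconstruction regardless of the boundary rule chosen, as long as both directions use the same one.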
3. 1-D DWT
In this section, some types of lifting-based DWT processing elements and 1-D structures are
explained.
[Figure: processing element (PE) variants built from multipliers (×), adders (+) and delay elements (Z), realizing a(1 + z^-1), a + bz^-1 and a]
even samples while B receives odd samples. On the other hand, for the U unit, A and C
are odd samples and B receives even samples. Now, the structure of Fig. 4 can be used to
implement 5/3 and 9/7 wavelets. For instance, Fig. 5 and Fig. 6 show the architectures of
the 5/3 and 9/7 wavelets respectively, where each white circle represents a PE. In Fig. 6, the
registers is constant and is equal to 3, while the number of temporary memory registers is
(2e + 1), where e is the number of extended layers [13]. This structure can be implemented
by using combinatorial circuits so that, when the input samples are fed to the architecture,
outputs are ready to be used after a delay time. Also, the implementation of the structure
can be performed via a pipelined structure by adding some registers. The number of pipeline
stages depends on the added registers. Increasing the pipeline stages results in increases in
the clock frequency, system latency and number of required registers [4]. Note that 2-D DWT
architectures are constructed from 1-D DWT units as row-wise and column-wise DWT units.
The data of a complete row is saved for each memory in a column-wise unit. So, the sum of
the data and temporary memories in the column-wise DWT unit determines the amount of
needed internal memory [14,15,16]. The pipeline registers do not affect the required internal
memory [17].
4. Hardware architectures
4.1 Simple hardware implementation
Figure 7 shows an architecture proposed in [18] for the 9/7 wavelet. Indeed, it is the hardware
implementation of Fig. 6. Accordingly, the architecture presented in Fig. 9 can be used for the
5/3 wavelet.
Fig. 7. 1-D DWT structures based on lifting: (a) parallel architecture, (b) sequential-pipelined
architecture
architecture
U1 U2
i0
Z + Z + o0
+ β × + δ ×
α × + γ × +
i1 Z + Z Z + Z
o1
P1 P2
The calculation of consecutive wavelet coefficients is periodic and continuous; therefore, the
sequence of control signal "S" for data flow can be easily generated by a simple logic circuit.
Figure 11 shows the hardware architecture for Fig. 11. The 5/3 wavelet implementation of
the proposed architecture is depicted in Fig. 13. It is clear that only the number of coefficients
Fig. 9. Lifting-based hardware architecture for 5/3 wavelet (coefficients −1/2 and 1/4; registers R0, R1, R2)
Fig. 11. Minimized structure
and delay block registers, that is, the z−1 blocks, have been modified from four to two. So,
changing the wavelet type changes these two quantities, coefficients and registers, only. Both
P and U units in LS can be implemented by means of the PE shown in Fig. 4. We explored
this feature in the previous section and implemented a 1-D DWT structure containing only
one PE. We call this method the "folded method". The folded structure is an alternative for the
proposed method in [12] by which the lifting-based structures can be designed systematically.
As shown in Fig. 14 for 9/7 wavelet, the method in [12] produces systolic architecture, but
folded method produces folded architecture. In folded structure, the output of the PE unit is
fed back through the delay registers to the PE’s input. By incorporating different numbers of
delay registers and coefficients with PE, the structure for different wavelets can be designed.
For example the folded structure for 5/3 and 9/7 wavelets has two and four delay registers,
respectively. Also the coefficients for 5/3 wavelet are −1/2 and 1/4 while for 9/7 they are α, β, γ, δ.
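The idea behind the folded method, one PE reused for every predict and update step with only the coefficient list changing, can be imitated in software. The sketch below is a behavioural model, not the hardware design: it processes whole arrays rather than streaming samples through delay registers, it assumes periodic extension at the boundaries, and it omits normalization. The numeric 9/7 values are the commonly quoted lifting constants and are not given in the text.

```python
def pe(a, A, B, C):
    """Single processing element: B + a * (A + C)."""
    return B + a * (A + C)


def folded_lifting(x, coeffs):
    """Apply the lifting steps one after another with the same PE.

    Even-indexed steps are predict steps (A, C even; B odd), odd-indexed
    steps are update steps (A, C odd; B even). Assumes len(x) is even and
    uses periodic extension at the boundaries.
    """
    even, odd = list(x[0::2]), list(x[1::2])
    n = len(even)
    for step, a in enumerate(coeffs):
        if step % 2 == 0:
            odd = [pe(a, even[k], odd[k], even[(k + 1) % n]) for k in range(n)]
        else:
            even = [pe(a, odd[k - 1], even[k], odd[k]) for k in range(n)]
    return even, odd


COEFFS_53 = (-1 / 2, 1 / 4)
# Assumed values for alpha, beta, gamma, delta of the 9/7 wavelet.
COEFFS_97 = (-1.586134342, -0.05298011854, 0.8829110762, 0.4435068522)

s53, d53 = folded_lifting(list(range(8)), COEFFS_53)
s97, d97 = folded_lifting([3.0] * 8, COEFFS_97)
print(d53)   # zero detail on the ramp, except at the periodic wrap-around
print(d97)   # detail of a constant signal is numerically zero
```

Switching from the 5/3 to the 9/7 wavelet changes nothing but the tuple of coefficients, which mirrors the claim that only the coefficients and the number of delay registers differ between wavelet types.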
In order to show the efficiency of our architecture, several architectures are chosen for
comparison. Ignoring the pipeline registers, the results of comparison for the 9/7 wavelet are
given in Table 2. It is obvious that compared to other architectures, the number of processing
units is reduced in the folded architecture, thus requiring less area to implement the DWT.
Having smaller 1-D DWT units is very effective in multidimensional architectures or in 2-D
DWT, where it is needed to increase the number of 1-D DWT units to achieve a higher
performance [20]. The cost is that, in the proposed architecture, the clock pulses required
to compute outputs are more than those in the previous architectures. This requirement is
due to the sequential states required to complete the computation of each output.
$$4 \times \left(1 + \frac{1}{4} + \frac{1}{16} + \cdots + \left(\frac{1}{4}\right)^{J-1}\right) \times N^{2} \qquad (2)$$
increases the speed, it requires more internal memory and the size of the circuit is increased.
In [22], by replacing registers with line buffers and controlling data flow in a structure like Fig.
13, with 5 more registers, a 2-D DWT block for 5/3 wavelet has been proposed.
Obviously, for a higher-level 2-D DWT, only LL coefficients of the previous level are used, so
the total number of external memory access for a J-level 2-D DWT on an N × N image is
$$2 \times \left(1 + \frac{1}{4} + \frac{1}{16} + \cdots + \left(\frac{1}{4}\right)^{J-1}\right) \times N^{2} \qquad (3)$$
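Expression (3) is a truncated geometric series, so the access count saturates quickly as J grows. A small sketch (the function name is illustrative) evaluates it and shows the limit of 8N²/3:

```python
def external_accesses(n, levels):
    """Expression (3): 2 * (1 + 1/4 + ... + (1/4)**(J-1)) * N**2."""
    return 2 * n * n * sum(0.25 ** j for j in range(levels))


n = 1024
for levels in (1, 2, 3, 8):
    print(levels, external_accesses(n, levels))
print("limit:", 8 * n * n / 3)
```

For N = 1024, three levels already require about 2.75 million accesses against a limit of roughly 2.80 million, because each further level only touches the LL quarter of the previous one.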
The structure of Fig. 17 performs all levels of 2-D DWT, using only internal memory. So,
$$5N \times \left(1 + \frac{1}{2} + \frac{1}{4} + \cdots + \left(\frac{1}{2}\right)^{J-1}\right) \qquad (4)$$
19, L is 4 for the 9/7 wavelet without pipelining. The storage of temporary memory may be
fulfilled in internal or external memory. If internal memory is used to save temporary data, the
number of external memory accesses is equal to N 2 read and N 2 write operations. However,
if external memory is used to save temporary memory, both of the external memory reads and
writes are increased by an amount of (N/M − 1) N × L. Hence, the number of external memory
this expression, the (N/M − 1) coefficient is the number of M-pixel sections. Due to hardware
limitations (the limit size of internal memory on FPGA ICs), we select the second case for
implementation. Comparisons of the aforementioned methods for one level 2-D DWT are
given in Table 3. The table shows the values of the internal memory size and external memory
accesses for the three algorithms. It is shown that in our proposed algorithm the internal
memory size is between those of two other algorithms (the direct method and the line-based
method). The same conclusion is true for the external memory size. The table also shows
the order of complexity for the control circuits of these methods based on [28]. Similar
comparisons for J-level (J → ∞) 2-D DWT are given in Table 4. Note that M has been
considered to be fixed for all levels of 2-D DWT. It means that the width of the M-pixel section
for the current level is the same as the one in the previous level. The J-level structure can
be implemented either in the form of a single level (Fig. 15) or multilevel structure (Fig.
16), and the relevant values are listed in Table 4. We observe that the parameter M can be
determined according to hardware limitations, such as internal memory. The conclusion from
the two tables is that the block-based structure with the new scan method, in comparison
with the direct method, needs more internal memory, but needs only about one-half of the
external memory accesses. Due to the shorter access time for internal memory, the clock
pulse frequency will increase, and based on the energy model in [5] the power consumption
will decrease. In comparison with other methods, the new method remarkably decreases
the needed internal memory at the cost of a slight increase in the number of external memory
accesses.
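The trade-off controlled by M can be illustrated numerically for one level of 2-D DWT, using the counts stated above: N² reads when temporary data stays in internal memory, plus (N/M − 1) × N × L extra reads when it is kept externally (writes behave the same way). This is only a sketch; the function name and the example value M = 128 are hypothetical, and N is assumed to be a multiple of M.

```python
def external_reads(n, m, l, temp_in_external):
    """External reads for one 2-D DWT level on an N x N image.

    n: image width/height, m: section width in pixels,
    l: temporary values per column (4 for the 9/7 wavelet without pipelining).
    """
    reads = n * n
    if temp_in_external:
        reads += (n // m - 1) * n * l   # extra traffic for temporary data
    return reads


# Hypothetical example: N = 1024, M = 128, L = 4.
print(external_reads(1024, 128, 4, temp_in_external=False))
print(external_reads(1024, 128, 4, temp_in_external=True))
```

With these example values the overhead of moving temporary data off-chip is under 3% of the baseline reads, which is consistent with the description of a slight increase in external accesses in exchange for much less internal memory.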
7. Experimental results
The folded 1-D DWT architecture was described in VHDL code and simulated by
Active-HDL6.3 software. Then the relevant VHDL code was synthesized by the Synplify7.5.1
software tool to be implemented on IC XC2V40 (from the VirtexII family of Xilinx FPGAs).
The maximum estimated frequency for implementing Fig. 12 on this IC is 122.4 MHz, which
is practical for real-time implementation of the 9/7 wavelet for large images. The maximum
frequency to implement the 5/3 wavelet on the IC is 163.1 MHz. Also, the block-based
architecture with the new scan method was modeled and simulated for the 5/3 wavelet with
N = 1024 and 8-bit pixels. The code was synthesized by Synplify7.5.1 for implementation on
VirtexII. After post place and route simulation, the clock pulse frequency achieved was 97
MHz. The structure receives one pixel as input per each clock pulse. So, according to the
Block-Based
Direct Line-Based Single Level Line-Based Multilevel Line-Based
Single Multi-
Level level (27) (30) (27) (30)
Internal
Memory 0 5N 10N 5M 5M (5M ) J (5M ) J
Size
External
4 N2 M 4 N2 K 16N 2
Memory 8 2
3N
4 2
3N N2 3 ( M−K ) 3
N (1 + M
4 2 4
) − 8N N2 + 3 M−K N2 + M − 8N
Reads
External
16N 2
3 N (1 + M ) − 8N N2 + − 8N
8 N2 4 2 4 2 4 2 4
Memory 3 3N N2 3N N2 M
Write
Table 4. Comparison of different 2-D DWT structures for the J-level 9/7 wavelet (J → ∞)
calculations below, the folded structure can be used to perform 3 levels of 2-D DWT for 70
frames (1024×1024 pixels) per second, and it is suitable for use in real-time hardware video
codec.
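$$t = \frac{1024 \times 1024 \times \left(1 + \frac{1}{4} + \frac{1}{16}\right)}{97\,\text{MHz}} = 14.2\,\text{ms},$$

$$n_f = \frac{1}{t} = \frac{1}{14.2\,\text{ms}} = 70\ (\text{frame/s}).$$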
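The same arithmetic can be reproduced in a few lines, assuming as the text does that one pixel is consumed per clock cycle and that each level processes a quarter of the previous level's pixels. The function name is illustrative.

```python
def frame_time(n, levels, clock_hz):
    """Time to process one N x N frame over the given number of levels."""
    pixels = n * n * sum(0.25 ** j for j in range(levels))
    return pixels / clock_hz   # one pixel per clock cycle


t = frame_time(1024, levels=3, clock_hz=97e6)
print(f"t = {t * 1e3:.1f} ms, frame rate = {1 / t:.1f} frame/s")
```

The computed frame rate is slightly above 70 frame/s; the text rounds it down.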
8. Conclusion
The lifting scheme and several lifting-based architectures for the DWT were presented in this
chapter. We then focused on the size (area) of the architecture. An architecture to minimize
the number of multipliers and adders has been investigated for implementation of 1-D DWTs.
All types of 1-D DWTs can be implemented by modifying only the number of registers and
coefficients of the architecture. Thus, the folded architecture, which has fixed form units for
all DWT types, presents a new folded method for systematic implementation of DWT. It is
possible to design a software program to produce the folded architecture for different types
of wavelets. What is needed is to define coefficients (α, β, ...) required for each step of the
desired wavelet in the lifting scheme. The folded method can be extended for large and
complex structures such as multilevel discrete wavelet packet transforms [29] to reduce the
area. Also, we have reviewed the 2-D DWT block-based structure and shown its power
to trade off between the internal memory size and the number of external accesses by a
controlling parameter.
9. References
[1] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation.
IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693 (1989)
[2] T. Acharya, C. Chakrabarti, A survey on lifting-based discrete wavelet transform
architectures. J. VLSI Signal Process. 42, 321–339 (2006)
[3] I. Daubechies, W. Sweldens, Factoring wavelet transform into lifting steps. J. Fourier
Anal. Appl. 4, 247–269 (1998)
[4] C.-T. Huang, P.-C. Tseng, L.-G. Chen, Flipping structure: an efficient VLSI architecture
for lifting-based discrete wavelet transform, IEEE Trans. Signal Process. 52 (2004), pp.
1080–1089
[5] N.D. Zervas et al., Evaluation of design alternatives for the 2-D-discrete wavelet
transform. IEEE Trans. Circuits Syst. Video Technol. 11(12), 1246–1262 (2001)
[6] K. A. Kotteri, S. Barua, A. E. Bell, and J. E. Carletta, "A comparison of hardware
implementations of the biorthogonal 9/7 DWT: convolution versus lifting," IEEE Trans.
Circuits Syst. II, Expr. Br., vol. 52, no. 5, pp. 256-260, 2005.
[7] M. Maurizio, M. Guido, P. Gianluca, and Z. Maurizio, "Novel JPEG 2000 compliant DWT
and IWT VLSI implementations," J. VLSI Signal Process., vol. 35, no. 2, pp. 137-153, Sep.
2003.
[8] J. Reichel, On the arithmetic and bandwidth complexity of the lifting scheme, in Proc. of
International Conference on Image Processing (2001), pp. 198–201
[9] R. Calderbank, I. Daubechies, W. Sweldens, B.-L. Yeo, Wavelet transforms that map
integers to integers. Appl. Comput. Harmon. Anal. 5(3), 332–369 (1998)
[10] A. Jensen, A. La Cour-Harbo, Ripples in Mathematics: The Discrete Wavelet Transform
(Springer, Berlin, 2001)
[11] K.G. Oweiss et al., A scalable wavelet transform VLSI architecture for real-time signal
processing in high-density intra-cortical implants. IEEE Trans. Circuits Syst. 54(6),
1266–1278 (2007)
[12] C.-T. Huang, P.-C. Tseng, L.-G. Chen, Efficient VLSI architectures of lifting-based discrete
wavelet transform by systematic design method, in IEEE International Symposium on
Circuits and Systems, vol. 5 (2002), pp. 565–568
[13] W.-H. Chang, Y.-S. Lee, W.S. Peng, C.-Y. Lee, A line-based, memory efficient and
programmable architecture for 2D DWT using lifting scheme, in IEEE International
Symposium on Circuits and Systems, vol. 4 (2001), pp. 330–333
[14] K. Andra, C. Chakrabarti, T. Acharya, A VLSI architecture for lifting-based forward and
inverse wavelet transform. IEEE Trans. Signal Process. 50(4), 966–977 (2002)
[15] O. Fatemi, S. Bolouki, Pipeline memory-efficient and programmable architecture for 2D
discrete wavelet transform using lifting scheme, in Proceedings of IEE Circuits, Devices
and Systems, December 2005, pp. 703–708
[16] Lan, N. Zheng, Y. Liu, Low power and high-speed VLSI architecture for lifting-based
forward and inverse wavelet transform. IEEE Trans. Consumer Electron. 51(2), 379–385
(2005)
[17] C.Y. Xiong, J.W. Tian, J. Liu, A note on “flipping structure: an efficient VLSI architecture
for lifting-based discrete wavelet transform”. IEEE Trans. Signal Process. 54(4), 1910–1916
(2006)
[18] J.M. Jou, Y.H. Shiau, C.C. Lio, Efficient VLSI architectures for the biorthogonal wavelet
transform by filter bank and lifting scheme, in Proceedings of IEEE ISCAS 2001, pp.
529–533
[19] C.J. Lian, K.F. Chen, H.H. Chen, L.G. Chen, Lifting based discrete wavelet transform
architecture for JPEG2000, in IEEE International Symposium on Circuits and Systems
(ISCAS 2001) Sydney, May 2001
[20] C.Y. Xiong, J.W. Tian, J. Liu, Efficient architectures for two-dimensional discrete wavelet
transform using lifting scheme. IEEE Trans. Image Process. 16(3), 607–614 (2007)
[21] P.-C. Tseng, C.-T. Huang, L.-G. Chen, Generic RAM-based architecture for
two-dimensional discrete wavelet transform with line-based method, in Asia-Pacific
Conference on Circuits and Systems (2002), pp. 363–366
[22] H. Varshney, M. Hasan, S. Jain, Energy efficient novel architectures for the lifting-based
discrete wavelet transform. IET Image Process. 1(3), 305–310 (2007)
[23] C.-T. Huang, P.-C. Tseng, L.-G. Chen, Hardware implementation of shape-adaptive
discrete wavelet transform with the JPEG2000 defaulted 9/7 filter bank, in IEEE
International Conference on Image Processing, vol. 3 (2003), pp. 571–574
[24] L.Hongyu, M.K. Mandal, B.F. Cockburn, Efficient architectures for 1-D and 2-D
lifting-based wavelet transforms. IEEE Trans. Signal Process. 52(5), 1315–1326 (2004)
[25] C.H. Yang et al., A block-based architecture for lifting scheme discrete wavelet transform.
IEICE Trans. Fundam. 90(5), 1062–1071 (2007)
[26] J.W. Kim et al., Tiled interleaving for multi-level 2-D discrete wavelet transform, in IEEE
Int. Symp. Circuits Syst., May 2007, pp. 3984–3987
[27] C.-T. Huang, P.-C. Tseng, L.-G. Chen, Memory analysis and architecture for
two-dimensional discrete wavelet transform, in IEEE International Symposium on
Circuits and Systems (ISCAS 2004)
[28] C.-T. Huang, P.-C. Tseng, L.-G. Chen, Analysis and VLSI architecture for 1-D and 2-D
discrete wavelet transform. IEEE Trans. Signal Process. 53(4), 1575–1586 (2005)
[29] C. Wang, W.S. Gan, Efficient VLSI architecture for lifting-based discrete wavelet packet
transform. IEEE Trans. Circuits Syst. 54(5), 422–426 (2007)
[30] S.A. Salehi, S. Sadri, "Investigation of Lifting-Based Hardware Architectures for Discrete
Wavelet Transform," Journal of Circuits Systems and Signal Processing, vol. 28, N.1,
pp1-16, 2009.
4
1. Introduction
Orthogonal Frequency Division Multiplexing (OFDM) is a multicarrier modulation system.
The transmission channel is divided into a number of subchannels, each of which is
assigned a subcarrier. Conventional OFDM systems use IFFT and FFT algorithms at the
transmitter and receiver respectively to multiplex the signals and transmit them
simultaneously over a number of subcarriers. The system employs guard intervals or cyclic
prefixes (CP) so that the guard interval is longer than the channel impulse
response (Peled & Ruiz, 1980; Bahai & Saltzberg, 1999; Kalet, 1994; Beek et al., 1999; Bingham,
1990; Nee and Prasad, 2000). The system must also make sure that the cyclic prefix is a small
fraction of the per carrier symbol duration (Beek et al., 1999; Steendam & Moeneclaey, 1999).
The purpose of employing the CP is to minimize inter-symbol interference (ISI). However, a CP
reduces the power efficiency and data throughput. The CP also has the disadvantage of
reducing the spectral containment of the channels (Ahmed, 2000; Dilmirghani & Ghavami,
2007, 2008). Due to these issues, an alternative method is to use the wavelet transform to
replace the IFFT and FFT blocks (Ahmed, 2000; Dilmirghani & Ghavami, 2007, 2008; Akansu &
Xueming, 1998; Sandberg & Tzannes, 1995). The resulting system is referred to as Discrete
Wavelet Transform OFDM (DWT-OFDM). With this transform, the spectral containment of
the channels is better since no CP is used (Ahmed, 2000; Dilmirghani & Ghavami,
2007, 2008). The superior subchannel containment of wavelets compared with Fourier has
been described in detail by Sandberg & Tzannes (1995). The wavelet
transform also employs a Low Pass Filter (LPF) and a High Pass Filter (HPF) operating as
Quadrature Mirror Filters that satisfy the perfect reconstruction and orthonormal basis properties.
The LPF produces the approximation coefficients and the HPF produces the detail coefficients. The
approximation coefficients are sometimes referred to as scaling coefficients, whereas the detail
coefficients are referred to as wavelet coefficients (Abdullah et al., 2009; Weeks, 2007). In some of
the literature, this two-filter scheme is also called subband coding since the signal is divided into
sub-signals of low and high frequencies respectively. The purpose of this chapter is to present a
simulation study, carried out in MATLAB (Matrix Laboratory), of wavelet based OFDM, particularly
DWT-OFDM, as an alternative to Fourier based OFDM. MATLAB is preferred for this
approach because it offers very powerful matrix computation with a wide range of
toolboxes and simulation tools. To the best of the authors’ knowledge, there is no study that
describes the simulation procedures in MATLAB for flexible transform
models in an OFDM system, especially when dealing with the wavelet transform. Therefore, this
chapter is divided into three main sections: section 2 explains conventional FFT-OFDM,
section 3 describes in detail the models for DWT-OFDM, and section 4 discusses the Bit
Error Rate (BER) results for the two transform platforms, DWT-OFDM versus FFT-OFDM.
2. Fourier-based OFDM
A typical block diagram of an OFDM system is shown in Figure 1. The inverse and forward
blocks can be FFT-based or DWT-based OFDM.
Fig. 1. A Typical model of an OFDM transceiver with inverse and forward transformed
blocks which can be substituted as FFT-OFDM or DWT-OFDM.
The system model for FFT-based OFDM is not discussed in detail as it is well known in
the literature. We merely present a brief description of it. The data dk is first
processed by a constellation mapper. An M-ary QAM modulator is used in this work to map
the raw binary data to appropriate QAM symbols. These symbols are then input into the
IFFT block. This involves taking N parallel streams of QAM symbols (N being the number of
sub-carriers used in the transmission of the data) and performing an IFFT operation on this
parallel stream. The output in the discrete time domain is as follows:
$$x_k(n) = \frac{1}{N} \sum_{i=0}^{N-1} X_m(i)\, e^{\,j 2\pi \frac{n}{N} i} \tag{1}$$
where {xk(n) | 0 ≤ n ≤ N − 1} is a sequence in the discrete time domain and {Xm(i) | 0 ≤ i ≤ N − 1}
are complex numbers in the discrete frequency domain. The cyclic prefix (CP) is then
added before transmission to minimize the inter-symbol interference (ISI). At the receiver,
the process is reversed to obtain the decoded data. The CP is removed to obtain the data in
the discrete time domain, which is then passed to the FFT for data recovery. The output of the FFT
in the frequency domain is as follows:
$$U_m(i) = \sum_{n=0}^{N-1} U_k(n)\, e^{-j 2\pi \frac{n}{N} i} \tag{2}$$
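The transform pair in equations (1) and (2) can be checked with a few lines of Python. This is only a minimal sketch: it assumes a 1/N scaling on the inverse transform and no scaling on the forward transform, an ideal channel, and no cyclic prefix. The function names and the four test symbols are hypothetical and are not taken from the chapter's MATLAB models.

```python
import cmath

def idft(X):
    """Inverse transform: x(n) = (1/N) * sum_i X(i) * exp(+j*2*pi*n*i/N)."""
    N = len(X)
    return [sum(X[i] * cmath.exp(2j * cmath.pi * n * i / N) for i in range(N)) / N
            for n in range(N)]

def dft(x):
    """Forward transform: U(i) = sum_n x(n) * exp(-j*2*pi*n*i/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * i / N) for n in range(N))
            for i in range(N)]

# Four hypothetical 4-QAM symbols, one per subcarrier (N = 4).
symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
time_samples = idft(symbols)   # transmitter side
recovered = dft(time_samples)  # receiver side, ideal channel
```

With an ideal channel the recovered values agree with the transmitted symbols up to floating-point rounding, which is the property the FFT-based transceiver relies on.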
Simulation of Models and BER Performances of DWT-OFDM versus FFT-OFDM 59
3. Wavelet-based OFDM
As mentioned in the previous section, the inverse and forward transform blocks are flexible
and can be FFT-based or DWT-based. FFT-OFDM has been discussed briefly above, so
this section describes wavelet based OFDM, particularly the DWT-OFDM
transceiver. The section is divided into three parts: the DWT-OFDM transmitter model,
the receiver model, and a discussion of the perfect reconstruction property.
The detail and approximation coefficients must be orthogonal to each other and normalized.
By assigning g as the LPF filter coefficients and h as the HPF filter coefficients, the orthonormal
basis conditions are expressed by four relations (Weeks, 2007): <g, g*> = 1, <h, h*> = 1, <g, h*> = 0
and <h, g*> = 0. The symbol * indicates the conjugate, and the symbol < , > denotes the
dot product. A result of 1 corresponds to the normal property, whereas a
result of 0 corresponds to the orthogonal property.
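The four relations can be verified numerically for a concrete filter pair. The sketch below uses the two-tap Haar filters as an assumed example; the chapter's simulations use other families such as bior5.5, whose coefficients are not listed here.

```python
import math

def dot(u, v):
    """Dot product <u, v*>; the conjugate only matters for complex filters."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

r = 1 / math.sqrt(2)
g = [r, r]    # Haar low-pass coefficients (assumed example)
h = [r, -r]   # Haar high-pass coefficients (assumed example)

checks = {
    "<g,g*>": dot(g, g),  # normal property, expected close to 1
    "<h,h*>": dot(h, h),  # normal property, expected close to 1
    "<g,h*>": dot(g, h),  # orthogonal property, expected close to 0
    "<h,g*>": dot(h, g),  # orthogonal property, expected close to 0
}
```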
Fig. 3. The processed signals of one symbol DWT-OFDM system using bior5.5 in DWT
transmitter. Top: data CA, Middle: data CD, Bottom: data Xk, corresponding to Figure 2.
Both filters are also assumed to have perfect reconstruction property. The input and output
of the two filters are expected to be the same. A further discussion can be found in section
3.3.
Fig. 5. The processed signals of one symbol DWT-OFDM system using bior5.5 in DWT
receiver. Top: data ca. Middle: data cd. Bottom: data Uk, corresponding to Figure 4.
and HPF. The first level of analysis filter in the receiver part can be folded and the decimator
and the expander are cancelled out by each other.
Fig. 6. A simple, modified model of a two-channel filter bank illustrating the perfect
reconstruction property; the superscript numbers refer to the steps.
To satisfy perfect reconstruction, the output Yk(i) is expected to be the same as
Xk(i) apart from a time delay, so the output can be written as Yk(i) = Xk(i-n), where
n can be set to 1 for this simple case. The steps to perform the mathematical
operation of PR can be summarized as follows (Weeks, 2007):
1. Select the filter coefficients for ga, i.e., a and b. Thus, ga = {a; b}.
2. ha is a reversed version of ga with every other value negated. Thus, ha = {b; −a}. If the
system has 4 filter coefficients with ga = {a; b; c; d}, then ha = {d; −c; b; −a}.
3. hs is the reversed version of ga, thus hs = {b; a}.
4. gs is the reversed version of ha, therefore gs = {−a; b}.
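The four steps can be tried out on a short sequence. The sketch below rests on several assumptions: the decimator and expander have been cancelled, so each branch is a plain cascade of two filters; the branch pairing is ga followed by hs and ha followed by gs; and a = b = 0.5 is chosen so that the overall gain 2(a² + b²) equals 1. The helper names are hypothetical.

```python
def convolve(x, f):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(f) - 1)
    for i, xi in enumerate(x):
        for k, fk in enumerate(f):
            y[i + k] += xi * fk
    return y

def filter_bank(x, a, b):
    """Two-channel analysis/synthesis cascade without resampling."""
    ga = [a, b]      # step 1: analysis filter ga
    ha = [b, -a]     # step 2: ga reversed, every other value negated
    hs = ga[::-1]    # step 3: reversed ga, i.e. {b; a}
    gs = ha[::-1]    # step 4: reversed ha, i.e. {-a; b}
    branch1 = convolve(convolve(x, ga), hs)
    branch2 = convolve(convolve(x, ha), gs)
    return [p + q for p, q in zip(branch1, branch2)]

x = [3.0, 1.0, -2.0, 5.0]
y = filter_bank(x, 0.5, 0.5)
# y is x delayed by one sample, with a leading and a trailing zero from the convolutions
```

The cross terms ab cancel between the two branches, leaving only the middle tap a² + b² from each branch, which is why the output is a scaled, one-sample-delayed copy of the input.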
The above steps can be rewritten as follows:
Substituting equations (5), (6), (7) and (8) into (9) yields to
4. Simulation results
Simulation variables and their matrix sizes are shown in Table I. The number of
subcarriers N is 64, and the number of OFDM symbols ns is 1000. The data
sizes are the same for FFT-OFDM and DWT-OFDM in all variables except the multiplexed data. For
DWT-OFDM, the transmitted signal is required to have double the data of FFT-OFDM.
This is because the DWT transmitter has a zero-padding component. An entry
in the table that contains a multiplier gives the matrix size as rows by
columns. For example, 64 x 1000 means a matrix with 64 rows and 1000
columns.
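The doubling of the multiplexed data can be illustrated with a single-level inverse DWT. The sketch below is an assumption-laden illustration rather than the chapter's MATLAB model: it uses the Haar synthesis step instead of the bior or Daubechies families, it assumes the mapped data enters the approximation branch (CA) while the detail branch (CD) is the zero-padded one, and the data values are placeholders.

```python
import math

def haar_idwt(ca, cd):
    """Single-level inverse Haar DWT: each (ca, cd) pair yields two output samples."""
    r = 1 / math.sqrt(2)
    out = []
    for a, d in zip(ca, cd):
        out.extend([(a + d) * r, (a - d) * r])
    return out

N = 64                   # subcarriers per OFDM symbol
n_symbols = 1000         # OFDM symbols in the simulation
ca = [1.0] * N           # placeholder for one symbol's mapped data
cd = [0.0] * N           # zero-padded detail branch (assumption)
tx = haar_idwt(ca, cd)   # one transmitted DWT-OFDM symbol

samples_per_symbol = len(tx)
total_samples = samples_per_symbol * n_symbols
```

Under these assumptions each symbol carries 2N = 128 samples, so 1000 symbols give the 128000 x 1 multiplexed vector listed for DWT-OFDM, against 64000 x 1 for FFT-OFDM.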
Stage                 Variable                       FFT-OFDM     DWT-OFDM
Minimum requirement   Subcarriers                    64           64
                      OFDM symbols                   1000         1000
Transmitter           input binary generated         64 x 1000    64 x 1000
                      parallel transmitted data      64 x 1000    64 x 1000
                      serial transmitted data        1 x 64000    1 x 64000
                      multiplexed data transmitted   64000 x 1    128000 x 1
Receiver              multiplexed data received      64000 x 1    128000 x 1
                      serial received data           1 x 64000    1 x 64000
                      parallel received data         64 x 1000    64 x 1000
                      output binary recovered        64 x 1000    64 x 1000
(transmitter) system model. As a result, most samples in the middle of the DWT-OFDM symbol
are almost zero. The DWT-OFDM performance can be observed in Figure 8. The Biorthogonal
and Daubechies wavelet families are compared with FFT-OFDM. It is shown that
bior5.5 is superior to all the others. It outperforms FFT and Daubechies by about 2 dB and
bior3.3 by 8 dB at a BER of 0.001.
5. Conclusions
Simulation approaches using MATLAB for wavelet based OFDM, particularly DWT-OFDM
as an alternative to Fourier based OFDM, have been presented. Conventional
OFDM systems use IFFT and FFT algorithms at the transmitter and receiver respectively to
multiplex the signals and transmit them simultaneously over a number of subcarriers. The
system employs guard intervals or cyclic prefixes so that the guard interval
is longer than the channel impulse response. The system must also make sure that the
cyclic prefix is a small fraction of the per carrier symbol duration. The purpose of employing
the CP is to minimize inter-symbol interference (ISI). However, a CP reduces the power
efficiency and data throughput. The CP also has the disadvantage of reducing the spectral
containment of the channels. Due to these issues, an alternative method is to use the wavelet
transform to replace the IFFT and FFT blocks. The resulting system is referred to as Discrete
Wavelet Transform OFDM (DWT-OFDM). With this transform, the spectral containment
of the channels is better since no CP is used. The wavelet based OFDM (DWT-OFDM)
is assumed to have the orthonormal basis property and to satisfy the perfect
reconstruction property. We use different wavelet families, particularly Biorthogonal and
Daubechies, and compare them with the conventional FFT-OFDM system. BER performances of both
OFDM systems are also obtained. It is found that the DWT-OFDM platform is superior
to the others as it has a lower error rate, especially with the bior5.5 wavelet.
6. References
Abdullah, K.; Mahmoud, S. & Hussain, Z.M. (2009). Performance Analysis of an Optimal
Circular 16-QAM for Wavelet Based OFDM Systems. International Journal of
Communications, Network and System Sciences (IJCNS), Vol. 2, No. 9, (December
2009), pp 836-844, ISSN 1913-3715.
Ahmed, N. (2000). Joint Detection Strategies for Orthogonal Frequency Division
Multiplexing. Dissertation for Master of Science, Rice University, Houston, Texas.
pp. 1-51, April.
Akansu, A. N. & Xueming, L. (1998). A Comparative Performance Evaluation of DMT
(OFDM) and DWMT (DSBMT) Based DSL Communications Systems for Single and
Multitone Interference, Proceedings of the IEEE International Conference on
Acoustics, Speech and Signal Processing, vol. 6, pp. 3269 - 3272, May.
Bahai, A. R. S. & Saltzberg, B. R. (1999). Multi-Carrier Digital Communications - Theory and
Applications of OFDM. Kluwer Academic. ISBN: 0-306-46974-X 0-306-46296-6. New
York.
Baig, S. R.; Rehman, F. U. & Mughal, M. J. (2005). Performance Comparison of DFT, Discrete
Wavelet Packet and Wavelet Transforms in an OFDM Transceiver for Multipath
Fading Channel. 9th IEEE International Multitopic Conference, pp. 1-6, December.
1. Introduction
SPIHT (Set Partitioning In Hierarchical Trees), an efficient coding method for wavelet
coefficients, has found increasingly wide application, especially in image and video
compression. However, conventional SPIHT has some obvious limitations. For example,
in color image compression, polarimetric SAR image compression, multi-spectral
image compression and other multi-channel image compression tasks, there are only
a few image planes (R, G, B for color images; HH, HV, VV for polarimetric SAR images),
yet there is a large amount of information redundancy among the image planes. Because of
the support length of the discrete wavelet transform, the transform cannot be applied across so
few planes, so set 8-partitioning methods such as 3D-SPIHT, which has been used for video
compression, cannot be used. As another example, the input image may be unsymmetrical, such as
a 16x512 image block or even a single image line. This is very common in hardware design,
because many line buffers mean a larger final
chip die size. However, the traditional SPIHT codec achieves its best
compression performance only when the image is symmetrical in the horizontal and vertical
dimensions. For these specific applications, several kinds of modified
SPIHT are discussed in this chapter, most of which are the author’s recent research results.
In the following sections, we first give a simple description of the traditional SPIHT
codec. Then we take polarimetric SAR intensity image compression as an example and
give some specific compression methods for multi-channel image compression.
Before encoding the wavelet coefficients, a 3D matrix transform is needed to remove the
information redundancy among the image planes and within each image plane. For the
unsymmetrical image compression used in hardware design, an unsymmetrical SPIHT
codec is described in detail, together with its special case for single-line image compression,
the 1D SPIHT codec.
According to the SOT structure, the set partitioning can be defined as:
$$\begin{aligned} T(i,j) &= c(i,j) \cup D(i,j) \\ D(i,j) &= O(i,j) \cup L(i,j) \\ L(i,j) &= \bigcup_{(k,l) \in O(i,j)} D(k,l) \end{aligned} \tag{1}$$
Here, T(i, j) is the spatial orientation tree and c(i, j) is the root node of the tree; D(i, j), O(i, j)
and L(i, j) are the descendant node set, direct descendant node set and indirect
descendant node set of node c(i, j). The direct node set O(i, j) can be further partitioned into 4 nodes as
follows:
The conventional SPIHT encoding process can be divided into a sorting pass and a refinement pass.
During the encoding process, 3 lists are used to record the corresponding encoding
information: the list of insignificant sets (LIS), the list of insignificant pixels (LIP) and
the list of significant pixels (LSP). The basic operation is the significance test, and the significance test
function is as follows:
$$S_n(T) = \begin{cases} 1, & \max\limits_{(i,j) \in T} |c(i,j)| \ge 2^n \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
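The significance test in (3) is a single comparison against the bit-plane threshold. A minimal sketch follows; the function name, the coefficient values and the representation of a set as a list of (i, j) index pairs are hypothetical choices for illustration.

```python
def significance(coeffs, node_set, n):
    """Return 1 if any coefficient in node_set has magnitude >= 2**n, else 0."""
    largest = max(abs(coeffs[i][j]) for i, j in node_set)
    return 1 if largest >= 2 ** n else 0

# Hypothetical 2x2 block of wavelet coefficients.
coeffs = [[34, -5],
          [9, 2]]

root_only = [(0, 0)]
others = [(0, 1), (1, 0), (1, 1)]

s_root = significance(coeffs, root_only, 5)  # 34 >= 32, so the set is significant
s_high = significance(coeffs, others, 4)     # largest magnitude is 9 < 16
s_low = significance(coeffs, others, 3)      # 9 >= 8
```

Lowering n by one per pass is what lets a set that was insignificant at one bit plane become significant at the next.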
wavelet transform coefficients, this method is called 3D-SPIHT. But in some cases, we only
need to compress a multi-channel image, such as a color image (R, G, B), a polarimetric SAR
image (HH, HV, VV), a multi-spectral image, etc. The third dimension usually has
only a few image planes and cannot be processed by a discrete wavelet transform with finite support,
yet there is much information redundancy among the image channels. In this case,
for simplicity, a 2D DWT for each image plane and a 1D DCT across the planes can be used to remove the
information redundancy. In the following sections 3.1-3, taking polarimetric SAR images as an
example, the 3D matrix transform and the related compression methods, including the bit allocation
based encoding method and the 3D-SPIHT Embedded method, are addressed.
$$\begin{aligned} F(x,y,0) &= \frac{1}{\sqrt{3}} \sum_{z=0}^{2} f(x,y,z) \\ F(x,y,Z) &= \sqrt{\frac{2}{3}} \sum_{z=0}^{2} f(x,y,z) \cos\frac{(2z+1)Z\pi}{6}, \quad Z = 1, 2 \end{aligned} \tag{4}$$
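The three-point DCT across the image planes can be sketched for a single pixel position. The code below assumes the orthonormal DCT-II normalisation (1/√3 for Z = 0 and √(2/3) for Z = 1, 2); the function name and the sample values are hypothetical.

```python
import math

def dct3(f):
    """Orthonormal 3-point DCT-II of the samples f[0], f[1], f[2] taken across the planes."""
    F0 = sum(f) / math.sqrt(3)
    rest = [math.sqrt(2 / 3) * sum(f[z] * math.cos((2 * z + 1) * Z * math.pi / 6)
                                   for z in range(3))
            for Z in (1, 2)]
    return [F0] + rest

flat = dct3([1.0, 1.0, 1.0])    # identical planes: all energy goes to the first output
mixed = dct3([4.0, 1.0, 3.0])   # hypothetical co-located samples from three planes
energy_in = 4.0**2 + 1.0**2 + 3.0**2
energy_out = sum(v * v for v in mixed)
```

When the three planes are strongly correlated, the inputs are close to the "flat" case and most of the energy ends up in the first output plane, which is the behaviour the transform is used for.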
The 3D-matrix can be composed in other orders, such as HH, VV, HV as the 1st, 2nd
and 3rd planes respectively. The like-polarimetric components (HH and VV) are strongly
correlated, while the cross-polarimetric components (HH/VV and HV) are weakly
correlated. Both DCT theory and experimental results show that the DCT coefficients of
the matrix composed of HH, HV and VV are the most concentrated and that the decoded
images have the least loss at the same coding rate.
After the 1D-DCT, the data power of the 3D-matrix is concentrated in the 1st plane
and the redundancy among the three data planes decreases greatly. In order to remove the
redundancy within every data plane of the 3D matrix, the 2D-DWT is chosen. According to the
definition of the 2D-DWT, the image data are decomposed into horizontal,
vertical, diagonal and low frequency components after horizontal and vertical filtering. The
low frequency component can be decomposed further. After a multi-level discrete wavelet
transform, the data power is concentrated in the low frequency components. After the 1D-DCT
and the 2D-DWT, the power of the whole 3D-matrix is concentrated in
the top left corner. A 3-level wavelet transform is adopted, so each data plane is decomposed
into 10 subbands: LL3, HL3, LH3, HH3, HL2, LH2, HH2, HL1, LH1, HH1. Of all the
subbands, the LL3 subband of the 1st plane has the highest power.
The 3D-matrix transform of multi-polarimetric SAR intensity images (1D-DCT across the
polarimetric channels and 2D-DWT in each polarimetric plane) is illustrated in Figure 1. The
1D-DCT and 2D-DWT are linear transforms, so the operations 1DCTz and 2DWTx,y can be
applied in either order, that is:
$$\min_{R_1, R_2, R_3} \big( D_1(R_1) + D_2(R_2) + D_3(R_3) \big) \quad \text{subject to} \quad R_1 + R_2 + R_3 = 3R_T \tag{6}$$
$$F(R_1, R_2, R_3) = D_1(R_1) + D_2(R_2) + D_3(R_3) + \lambda\,(3R_T - R_1 - R_2 - R_3) \tag{7}$$
Several Kinds of Modified SPIHT Codec 71
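From
$$\frac{\partial F}{\partial R_1} = 0, \quad \frac{\partial F}{\partial R_2} = 0, \quad \frac{\partial F}{\partial R_3} = 0$$
we can acquire
$$\frac{\partial D_1(R_1)}{\partial R_1} = \frac{\partial D_2(R_2)}{\partial R_2} = \frac{\partial D_3(R_3)}{\partial R_3} = \lambda \tag{8}$$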
According to the inequality for the rate distortion function of a non-Gaussian continuous
information source, we have:
$$h(U) - \frac{1}{2}\log(2\pi e D) \le R(D) \le \frac{1}{2}\log\frac{\sigma^2}{D} \tag{9}$$
where h(U) is the differential entropy of the information source and σ² is its
variance.
At high bit rates, the lower bound of the inequality approaches the true rate distortion
function for most source probability distributions. So, we can set the lower bound
equal to the true rate distortion function and obtain:
$$R(D) = h(U) - \frac{1}{2}\log(2\pi e D) \tag{10}$$
Further, we can acquire
$$D(R) = \frac{e^{2(h(U) - R)}}{2\pi e} \tag{11}$$
and
$$\frac{\partial D(R)}{\partial R} = -\frac{e^{2(h(U) - R)}}{\pi e} \tag{12}$$
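For every mixed coefficient plane:
$$\begin{aligned} R_1 &= h(U_1) - \frac{1}{2}\log(-\pi e \lambda) \\ R_2 &= h(U_2) - \frac{1}{2}\log(-\pi e \lambda) \\ R_3 &= h(U_3) - \frac{1}{2}\log(-\pi e \lambda) \end{aligned} \tag{13}$$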
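One consequence of (13) is worth spelling out. Because the three rates differ from the entropies h(U1), h(U2), h(U3) by the same term, summing the three lines and applying the constraint R1 + R2 + R3 = 3RT from (6) eliminates λ and gives each rate as RT plus the deviation of that plane's entropy from the mean entropy. The sketch below is derived only from (6) and (13); the function name and the entropy values are hypothetical, and it does not clip negative rates, which a practical allocator would have to handle.

```python
def allocate_rates(entropies, r_target):
    """Rate per plane: target rate plus the plane's entropy offset from the mean."""
    mean_h = sum(entropies) / len(entropies)
    return [r_target + h - mean_h for h in entropies]

# Hypothetical differential entropies of the three mixed coefficient planes.
h = [5.0, 3.5, 3.0]
rates = allocate_rates(h, r_target=1.0)
total = sum(rates)   # meets the constraint 3 * r_target up to rounding
```

The plane with the highest entropy, normally the first plane after the 1D-DCT, receives the largest share of the bit budget.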
The 3D-SPIHT process can be divided into a sorting pass and a refinement pass, whose basic
operation is again the significance test of a set, just as in the conventional SPIHT algorithm. That is,
$$S_n(T_{iplane}) = \begin{cases} 1, & \max\limits_{(i,j) \in T_{iplane}} |C_{iplane}(i,j)| \ge 2^n \\ 0, & \text{otherwise} \end{cases} \tag{16}$$
Additionally, it should be mentioned that two kinds of 3D-SPIHT
algorithms have been proposed before. One was proposed for video compression by extending the conventional
SPIHT algorithm directly to the 3D case and encoding the 3D-DWT wavelet coefficients of the video
data, so its SOT is defined with 8-way splitting [7]. The other was proposed for the compression of
multispectral images and amends the conventional SPIHT by adding
one spectral child to every baseband coefficient, so its SOT still uses 4-way splitting [8]. The
3D-SPIHT embedded coding proposed in this chapter is very different from these two existing
3D-SPIHT algorithms: it encodes 1, 2 or 3 of the 3 mixed coefficient
planes sequentially, adopting 3 independent thresholds.
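$$O(i,j) = \begin{cases} \{c(i,2j),\ c(i,2j+1)\} & \text{if } H_0 \le i < \dfrac{H}{2} \\[2mm] \{c(2i,j),\ c(2i+1,j)\} & \text{if } W_0 \le j < \dfrac{W}{2} \\[2mm] \{c(2i,2j),\ c(2i+1,2j),\ c(2i,2j+1),\ c(2i+1,2j+1)\} & \text{otherwise} \end{cases} \tag{18}$$
Here, $H_0 = \dfrac{H}{2^{HLevel - WLevel + 1}}$ and $W_0 = \dfrac{W}{2^{WLevel - HLevel + 1}}$.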
When HLevel = WLevel, the unsymmetrical SPIHT codec degenerates completely into the
conventional SPIHT codec. When HLevel = 1 or WLevel = 1, only set 2-partitioning is
implemented, in one direction, and the coefficients can be encoded line by line independently.
The wavelet transform in the other direction is used only to compact the image energy.
Fig. 3 and Fig. 4 illustrate conventional SPIHT encoding and unsymmetrical
SPIHT encoding with HLevel = 3 and WLevel = 5.
Fig. 3. The illustration of set partitioning using conventional SPIHT encoding for
unsymmetrical image size
Fig. 4. The illustration of set partitioning with unsymmetrical SPIHT encoding for
unsymmetrical image size
5. 1D SPIHT codec
Section 4 described the unsymmetrical SPIHT codec in detail, which still needs 2D image
data or an image block. But in real-time image transmission or scan display systems, the image
data are usually transmitted or displayed line by line. Using conventional SPIHT or
unsymmetrical SPIHT then requires many line buffers to store the previous image data. In
hardware, this is a heavy burden because RAM is costly. So, a single-line image compression method
is preferable to block based compression methods. One such method is the 1D DWT
followed by the 1D SPIHT codec, which is addressed in the following.
After the 1D DWT, the wavelet coefficients also have a natural pyramid characteristic: every
pixel of a high frequency subband has 2 corresponding pixels at the same position in the adjacent
finer-level high frequency subband, which means that only set 2-partitioning can be adopted. The
illustration is given in Fig. 5.
Fig. 5. The illustration of set partitioning with 1D SPIHT encoding for 1 line image.
The SOT and set partitioning can be written as formulas 19 and 20.
$$\begin{aligned} T(i) &= c(i) \cup D(i) \\ D(i) &= O(i) \cup L(i) \\ L(i) &= \bigcup_{k \in O(i)} D(k) \end{aligned} \tag{19}$$
$$O(i) = \{c(2i),\ c(2i+1)\} \tag{20}$$
From Fig. 5 and formulas 19 and 20, we can see that 1D SPIHT uses set 2-partitioning to encode
the 1D DWT coefficients, which needs only 1 line buffer of RAM but leaves the redundancy in the
other dimension unremoved.
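The tree structure in formulas 19 and 20 can be enumerated directly. The sketch below is a minimal illustration under stated assumptions: coefficients of one line are indexed from 0, the index arithmetic 2i and 2i + 1 is applied only for i ≥ 1 (index 0 would list itself as a child, so the lowest band needs separate handling that is not shown), and children outside the line are dropped. The function names are hypothetical.

```python
def offspring(i, length):
    """Direct descendants O(i): positions 2i and 2i + 1 that fall inside the line."""
    if i < 1:
        raise ValueError("index 0 needs separate root handling")
    return [k for k in (2 * i, 2 * i + 1) if k < length]

def descendants(i, length):
    """All descendants D(i): O(i) plus L(i), gathered level by level."""
    result = []
    frontier = offspring(i, length)
    while frontier:
        result.extend(frontier)
        frontier = [k for p in frontier for k in offspring(p, length)]
    return result

direct = offspring(3, 16)        # [6, 7]
whole_tree = descendants(1, 8)   # [2, 3, 4, 5, 6, 7]
subtree = descendants(2, 16)     # [4, 5, 8, 9, 10, 11]
```

Each node has two children instead of four, so the significance test of a set covers a binary tree along the line rather than a quadtree over an image block.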
6. Conclusion
In this chapter, SPIHT and its derivatives or modifications, such as 3D-SPIHT, the
3D-SPIHT Embedded method, Unsymmetrical SPIHT and 1D-SPIHT, have been described.
These methods overcome the disadvantages of the traditional SPIHT codec and meet the
specific requirements of real applications. Fig. 6 gives the derivation relationship between
traditional SPIHT and its modified methods. SPIHT is the
foundation for traditional symmetrical images, but it cannot meet the requirements of some
specific applications, such as multi-channel images, strip images (unsymmetrical images),
or even single image lines. Its modified versions can meet these specific requirements in real
applications.
7. References
[1] Said A. and Pearlman W.A., A new, fast, and efficient image codec based on set
partitioning in hierarchical trees, IEEE Trans. on Circuits and Systems for Video
Technology, 1996, 6(3): 243-250.
[2] D. E. Dudgeon, R. M. Mersereau, Multidimensional digital signal processing, New
Jersey: Prentice-Hall, Inc., 1984.
[3] Zhu Yanqiu, Chen Hexin, Dai Yisong. Compression coding of color image via 3-D
matrix transform [J]. Acta Electronica Sinica, 1997, 25(7): 16-21.
[4] Y. Shoham, A. Gersho, Efficient bit allocation for an arbitrary set of quantizers, IEEE
Transactions on Acoustics, Speech, and Signal Processing, 1988, 36(9): 1445-1453.
[5] P. Prandoni, M. Vetterli, R/D optimal linear prediction, IEEE Transactions on Speech
and Audio Processing, 2000, 8(6): 646-655.
[6] A. Aminlou, O. Fatemi, Very fast bit allocation algorithm, based on simplified R-D curve
modeling, IEEE ICECS 2003, 112-115.
[7] B. J. Kim, Z. Xiong, W. A. Pearlman, Low bit-rate scalable video coding with 3-D set
partitioning in hierarchical trees (3-D SPIHT), IEEE Transactions on Circuits and
Systems for Video Technology, 2000, 10(8): 1374-1387.
[8] P. L. Dragotti, G. Poggi and A. R. P. Ragozini, Compression of multispectral images by
three-dimensional SPIHT algorithm, IEEE Transactions on Geoscience and Remote
Sensing, 2000, 38(1): 416-428.
[9] W.-C. Zhang, Y.-F. Wang, G.-H. Hu, Compression of multi-polarimetric SAR intensity
images based on 3D-matrix transform, IET Image Processing, 2008, 2(4): 194-202.
[10] Zhang Zhi-hui, Zhang Jun, Unsymmetrical SPIHT Codec and 1D SPIHT Codec,
International Conference on Electrical and Control Engineering (ICECE), 2010,
Wuhan, 2498-2501.
Part 2
1. Introduction
Detecting edges is a very well known subject in the image-processing field. Edge detection is
the process of the localization of significant discontinuities in the grey level image and the
identification of the physical phenomena that originated them. Those significant intensity
changes occur at different resolution or scales for a given image. As suggested by Rosenfeld
and Thurston (Rosenfeld & Thurston, 1976) and Marr (Marr, 1982), we can obtain a
description of the image changes at different scales combining the information given by an
edge detector applied at different resolutions. This is the aim of the work presented in this
chapter.
The first aspect to be covered by multiresolution analysis is the filter chosen to accomplish
the low-pass filtering of the image at different scales. At a single resolution, low pass
filtering is imposed because differentiation is an ill-posed problem (Torre & Poggio, 1984).
The needed regularization process is implemented by means of a low pass filter. Marr
(Marr, 1982) proposed the Gaussian filter because of its optimal behaviour in terms of
smoothing and localization in both the spatial and frequency domains. This filter has
been commonly used in edge detectors. In a multiresolution approach the first or second
directional derivatives of the low pass filtered image with Gaussians of different widths are
used to detect edges. In the Bergholm edge focusing method (Bergholm, 1987) various edge
maps extracted at different scales are integrated allowing distinguishing shadows contours
from perfect ones using Canny’s operator (Canny, 1986) with different widths. Another
possibility is to describe the image in terms of the scale space as proposed by Witkin
(Witkin, 1983) and to detect edges in terms of the zero crossing of the Laplacian operator
with different widths (Park et al., 1995) (Eklundth et al., 1982).
Other multiresolution methods have been proposed. Mallat and Zhong (Mallat & Zhong,
1992) related multiscale edge detection with the discrete wavelet transform (DWT). They
proposed a wavelet to perform edge detection and they showed that the evolution of
wavelet local maxima across scales characterizes the shape of irregular structures. In our
work we will use the wavelet-based algorithm proposed by Mallat and Zhong and we will
show the condition that must be satisfied by the Gaussian filter to be comparable with the
Mallat and Zhong’s wavelet. Our aim is to detect and classify different edge types. Various
edge profiles have been proposed. Rosenfeld (Rosenfeld & Kak, 1976) proposed the step,
ramp, pulse and stair as the basic edge types. He stated that these profiles are suited for a
first intuitive classification of the edges found in the contours of real images.
William and Shah (Williams & Shah, 1990, 1993) have studied these edge types using the
first directional derivative of the Gaussian. Some other profiles like the blurred step have
also been proposed and analyzed (Ziou & Tabbone, 1993). In a first step of this work we are
going to analyze the evolution of the modulus of the wavelet coefficients at the edge
position in order to classify the edges into four different profiles: step, ramp, stair and pulse
(Beltrán et al., 1994). Then we will propose a general schema to detect, analyze and classify
different edge types.
Due to the high pass filtering operation involved in the edge detection algorithms, the noise is
always amplified when detecting edges. Thresholding has been the most common way to
eliminate the irrelevant or noise detected edges (Canny, 1986; Marr, 1982). To validate the
proposed classification schema we will present the noise (Gaussian noise) as a new edge class
(Beltrán et al., 1998). Then, we have analyzed this contour type and we have modified the
proposed classification algorithm to include the noise as a new edge type. With the noise edges
labelled we can easily implement a noise-filtering algorithm. Finally, we have developed a
new classification algorithm including other edge profile models, such as the roof, ridge and
two non-antisymmetrical step profiles, like the ones proposed by Paillou (Paillou, 1994).
This chapter is divided as follows. Section 2 presents a survey of the wavelet formalism
introduced by Mallat and Zhong. Then, we will show the geometrical contour types: step,
ramp, pulse and stair. Section 3 presents the edge detection algorithm including the wavelet
algorithm implementation. Section 4 presents the classification algorithm implementation
details as well as the classification results obtained processing a 256x256 grey level synthetic
image with four objects: a circle, a square, a triangle and a narrow line, each one having a
different contour type. Section 5 deals with the characterization of the noise as a new type of
contour. Section 6 copes with the new contour types just introduced, the modified
classification algorithm together with the obtained results. In section 7 we present the main
conclusions and the future work.
2. Theoretical basis
Mallat and Zhong (Mallat & Zhong, 1992) introduced the relationship between the wavelet
transform and a multiresolution edge detection algorithm. We are going to briefly
summarize those results. Let f(x,y) ∈ L²(R²) be an image and ψ(x,y) a wavelet. The
bidimensional wavelet transform of f(x,y) is defined as:
$$W_s f(u,v) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x,y)\, \frac{1}{s}\, \psi\!\left(\frac{x-u}{s}, \frac{y-v}{s}\right) dx\, dy \tag{1}$$
If we define the dilation by a factor s as:
$$\psi_s(x,y) = \frac{1}{s}\, \psi\!\left(\frac{x}{s}, \frac{y}{s}\right) \tag{2}$$
and $\tilde{\psi}_s(x,y) = \psi_s(-x,-y)$, we can rewrite (1) as a convolution:
$$W_s f(u,v) = f * \tilde{\psi}_s(u,v) \tag{3}$$
Multiresolution Approaches for Edge Detection
and Classification Based on Discrete Wavelet Transform 83
So, as expressed in (3), the wavelet transform can be seen as the filtering of f(x,y) by the filter
$\tilde{\psi}_s(x,y)$, which is a variable width bandpass filter (Mallat, 1989). It is possible to define N
directional wavelets ψ^i(x,y) (1 ≤ i ≤ N) satisfying energy conservation properties. In such a
case the directional wavelet transform of f(x,y) is defined as:
$$W_s^i f(u,v) = f * \tilde{\psi}_s^i(u,v) \tag{4}$$
Equation (4) represents the filtering of f(x,y) by the bidimensional, directional bandpass
filter $\tilde{\psi}_s^i(x,y)$.
We can define the Discrete Wavelet Transform (DWT) by selecting the scales inside a dyadic
grid; that is to say, the scale can be expressed as s = 2^j with j ∈ Z. Therefore, for discrete
signals, we can understand the 2-D wavelet transform as the result of filtering the 2-D signal
(the original image) with a bandpass directional FIR filter.
Mallat and Zhong (Mallat & Zhong, 1992) designed a function specially suited for edge
detection purposes, which is a wavelet. This function is not an orthogonal wavelet, so the
only condition to be satisfied by ψ(x,y) is:
$$\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \psi(u,v)\, du\, dv = 0 \tag{5}$$
We can define two functions ψ1(x,y) and ψ2(x,y) as:
∂ θ(x, y) ∂ θ(x, y)
ψ 1 (x, y) = , ψ 2 (x, y) = (6)
∂x ∂y
where θ(x,y) is a smoothing function, that is to say, its integral over x and y is one and it
converges to zero at infinity. With these conditions it is easy to show that both ψ¹(x,y) and
ψ²(x,y) satisfy (5), so they are wavelets. Following equation (3) we can say that the wavelet
transform of f(x,y) is:
$$W_s^1 f(x, y) = f * \tilde{\psi}_s^1(x, y), \qquad W_s^2 f(x, y) = f * \tilde{\psi}_s^2(x, y) \tag{7}$$
From (6) and (7) it can be shown that the wavelet transform is the gradient of the image
smoothed by a factor or scale s. It can be expressed as:
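$$\begin{pmatrix} W_s^1 f(x, y) \\ W_s^2 f(x, y) \end{pmatrix} = s \begin{pmatrix} \dfrac{\partial}{\partial x}(f * \theta_s) \\ \dfrac{\partial}{\partial y}(f * \theta_s) \end{pmatrix} = s\,\nabla(f * \theta_s) \tag{8}$$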
We can define the modulus M and the phase Φ of the gradient at scale s as:
$$M\big(\nabla(f * \theta_s)\big) = \sqrt{\left|W_s^1 f(x, y)\right|^2 + \left|W_s^2 f(x, y)\right|^2} \tag{9}$$
$$\Phi\big(\nabla(f * \theta_s)\big) = \operatorname{atan}\!\left(\frac{W_s^2 f(x, y)}{W_s^1 f(x, y)}\right) \tag{10}$$
A point (x0,y0) is an edge point of the smoothed image f ∗ θs(x,y) if M has a relative
maximum at this point in the direction given by Φ. This is the classical definition of an
edge proposed by Canny (Canny, 1986). In the wavelet case we have a
discrete set of scales s = 2^j (0 ≤ j ≤ J), and we can calculate the edges at each scale rather than
at a single scale as in Canny’s work. We can therefore implement
a multiresolution edge detection algorithm by means of the 2-D wavelet transform. The
smoothing function could be a newly defined function, as in Mallat and Zhong’s work, or a
Gaussian function, as in Canny’s work.
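The edge criterion built on equations (9) and (10) can be illustrated with a short NumPy sketch. It is only a minimal example under stated assumptions: the two input arrays stand for the horizontal and vertical wavelet coefficients at one scale, the gradient direction is quantized to four neighbour axes (a common simplification that the text does not prescribe), and the function names are hypothetical.

```python
import numpy as np

def gradient_modulus_phase(w1, w2):
    """Modulus (9) and phase (10) from the two wavelet components at one scale."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    modulus = np.hypot(w1, w2)    # sqrt(w1**2 + w2**2)
    phase = np.arctan2(w2, w1)    # four-quadrant version of atan(w2 / w1)
    return modulus, phase

def local_maxima_along_gradient(modulus, phase):
    """Mark pixels where the modulus is a relative maximum along the gradient axis.

    Assumption: the direction is quantized to 0, 45, 90 or 135 degrees.
    """
    padded = np.pad(modulus, 1, mode="constant", constant_values=-np.inf)
    sector = np.rint(phase / (np.pi / 4)).astype(int) % 4
    offsets = [(0, 1), (1, 1), (1, 0), (1, -1)]   # (row step, column step) per sector
    r, c = np.indices(modulus.shape)
    is_max = np.zeros(modulus.shape, dtype=bool)
    for idx, (dy, dx) in enumerate(offsets):
        ahead = padded[r + 1 + dy, c + 1 + dx]
        behind = padded[r + 1 - dy, c + 1 - dx]
        # ">=" on one side and ">" on the other keeps a single pixel on plateaus
        is_max |= (sector == idx) & (modulus >= ahead) & (modulus > behind)
    return is_max & (modulus > 0)

# Usage on a vertical step; np.gradient is only a stand-in for the wavelet filters.
image = np.zeros((8, 10))
image[:, 5:] = 1.0
m, p = gradient_modulus_phase(np.gradient(image, axis=1), np.gradient(image, axis=0))
edges = local_maxima_along_gradient(m, p)
```

Repeating the same test on the coefficients of every dyadic scale gives one edge map per scale, which is the multiresolution behaviour described above.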
Mallat and Zhong (Mallat & Zhong, 1992) defined θ(x,y) as a separable bidimensional cubic
spline, so ψ(x,y) is a separable bidimensional quadratic spline. Their 1-D expressions are:
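$$\hat{\theta}(x) = \begin{cases}
\frac{8}{3}x^3 + 8x^2 + 8x + \frac{8}{3} & \text{if } -1 \le x \le -\frac{1}{2} \\
-8x^3 - 8x^2 + \frac{4}{3} & \text{if } -\frac{1}{2} \le x \le 0 \\
8x^3 - 8x^2 + \frac{4}{3} & \text{if } 0 \le x \le \frac{1}{2} \\
-\frac{8}{3}x^3 + 8x^2 - 8x + \frac{8}{3} & \text{if } \frac{1}{2} \le x \le 1 \\
0 & \text{otherwise}
\end{cases} \tag{11}$$
$$\hat{\psi}(x) = \begin{cases}
8x^2 + 16x + 8 & \text{if } -1 \le x \le -\frac{1}{2} \\
-24x^2 - 16x & \text{if } -\frac{1}{2} \le x \le 0 \\
24x^2 - 16x & \text{if } 0 \le x \le \frac{1}{2} \\
-8x^2 + 16x - 8 & \text{if } \frac{1}{2} \le x \le 1 \\
0 & \text{otherwise}
\end{cases} \tag{12}$$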
In the 2-D case we can define $\theta(x, y) = \hat{\theta}(x)\hat{\theta}(y)$ and $\psi(x, y) = \hat{\psi}(x)\hat{\psi}(y)$. We can compare the
behaviour of the wavelet and the Gaussian functions. In figure 1 we can observe that both
functions have a similar shape. We have obtained a relationship between the Gaussian
width σ and the scale s of the wavelet analysis. The Gaussian normalization constant is
$N = \frac{1}{\sqrt{2\pi}\,\sigma}$, and the smoothing function value at the origin is $\theta(0) = \frac{4}{3}$.
Equating both quantities we obtain σ ≈ 0.3. So, to obtain a similar analysis with the wavelet
and the Gaussian function, the Gaussian width and the scale should be related by the
aforementioned expression.
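The 1-D expressions (11) and (12) and the value of σ can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the Gaussian constant is taken as 1/(√(2π)·σ), which is the form that reproduces σ ≈ 0.3.

```python
import numpy as np

def theta_hat(x):
    """Cubic-spline smoothing function of equation (11), written for |x|."""
    ax = np.abs(np.asarray(x, dtype=float))
    inner = 8 * ax**3 - 8 * ax**2 + 4 / 3    # |x| <= 1/2
    outer = -(8 / 3) * (ax - 1) ** 3         # 1/2 <= |x| <= 1 (expanded form is in (11))
    return np.where(ax <= 0.5, inner, np.where(ax <= 1.0, outer, 0.0))

def psi_hat(x):
    """Quadratic-spline wavelet of equation (12), the derivative of theta_hat."""
    x = np.asarray(x, dtype=float)
    ax = np.abs(x)
    inner = 24 * ax**2 - 16 * ax             # 0 <= x <= 1/2
    outer = -8 * (ax - 1) ** 2               # 1/2 <= x <= 1
    half = np.where(ax <= 0.5, inner, np.where(ax <= 1.0, outer, 0.0))
    return np.sign(x) * half                 # odd symmetry covers x < 0

x = np.linspace(-1.0, 1.0, 20001)
dx = x[1] - x[0]
area = theta_hat(x).sum() * dx               # should be close to 1
sigma = 1.0 / (np.sqrt(2 * np.pi) * float(theta_hat(0.0)))
print(f"integral of theta_hat = {area:.6f}, sigma = {sigma:.4f}")
```

The unit integral confirms that the spline is a valid smoothing function, and σ evaluates to roughly 0.299, in line with the value used in figure 1.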
We now present the edge profiles considered initially. Each profile contour is treated as
a function that represents a change in the grey level with respect to the
image dimensions. In this work we denote by Ui the contour height, ω the contour width, x0
the contour position (at the middle of the contour) and s the scale.
Fig. 1. Left: Solid line: Smoothing function. Dotted line: Gaussian with σ = 0.3. Right: Solid
line: Wavelet. Dotted line: First derivative of Gaussian with σ = 0.3.
2.1 Step
A step profile is shown in figure 2. We denote by x0 the step location and by U0 the step
height. Let u(x) be the step function:
$$u(x) = \begin{cases} U_0 & \text{if } x \ge x_0 \\ 0 & \text{if } x < x_0 \end{cases} \tag{13}$$
Fig. 2. Step profile. Horizontal axis represents pixels. Vertical axis represents grey level.
2.2 Ramp
The ramp profile is drawn in figure 3. The point x0 is the middle point of the ramp; U0 is the
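height and ω is the ramp width. The ramp slope is m = U0/ω. Let r(x) be the ramp function:
$$r(x) = \begin{cases} 0 & \text{if } x < x_0 - \frac{\omega}{2} \\ mx + r_0 & \text{if } x_0 - \frac{\omega}{2} \le x \le x_0 + \frac{\omega}{2} \\ U_0 & \text{if } x > x_0 + \frac{\omega}{2} \end{cases} \tag{14}$$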
Fig. 3. Ramp profile. Horizontal axis represents pixels. Vertical axis represents grey level.
2.3 Stair
A stair profile with two steps, named s(x), is shown in figure 4. The point x0 is
the middle of the stair, ω is the stair width, and U0 and U1 are the step heights.
$$s(x) = \begin{cases} 0 & \text{if } x < x_0 - \frac{\omega}{2} \\ U_1 & \text{if } x_0 - \frac{\omega}{2} \le x \le x_0 + \frac{\omega}{2} \\ U_0 & \text{if } x > x_0 + \frac{\omega}{2} \end{cases} \tag{15}$$
Fig. 4. Stair profile. Horizontal axis represents pixels. Vertical axis represents grey level.
2.4 Pulse
A pulse profile, named p(x), is shown in figure 5. The point x0 is the middle of the pulse, ω is
the width. U0 and U1 are, respectively, the maximum and minimum height.
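$$p(x) = \begin{cases} 0 & \text{if } x < x_0 - \frac{\omega}{2} \\ U_0 & \text{if } x_0 - \frac{\omega}{2} \le x \le x_0 + \frac{\omega}{2} \\ U_1 & \text{if } x > x_0 + \frac{\omega}{2} \end{cases} \tag{16}$$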
Fig. 5. Pulse profile. Horizontal axis represents pixels. Vertical axis represents grey level.
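The four profiles of equations (13) to (16) can be generated with a few lines of NumPy, which is convenient for building 1-D test signals. This is a sketch under assumptions: the function names are hypothetical, and the ramp offset r0 (not given explicitly in the text) is chosen so that the ramp is continuous at both ends.

```python
import numpy as np

def step_profile(x, x0, u0):
    """Equation (13): 0 before x0, U0 from x0 onwards."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= x0, u0, 0.0)

def ramp_profile(x, x0, u0, width):
    """Equation (14) with slope m = U0 / width."""
    x = np.asarray(x, dtype=float)
    m = u0 / width
    r0 = -m * (x0 - width / 2)   # assumption: makes r(x0 - width/2) = 0
    return np.where(x < x0 - width / 2, 0.0,
                    np.where(x > x0 + width / 2, u0, m * x + r0))

def stair_profile(x, x0, u0, u1, width):
    """Equation (15): intermediate level U1 over the stair width, then U0."""
    x = np.asarray(x, dtype=float)
    return np.where(x < x0 - width / 2, 0.0,
                    np.where(x > x0 + width / 2, u0, u1))

def pulse_profile(x, x0, u0, u1, width):
    """Equation (16): level U0 over the pulse width, then U1."""
    x = np.asarray(x, dtype=float)
    return np.where(x < x0 - width / 2, 0.0,
                    np.where(x > x0 + width / 2, u1, u0))

# Usage: one 256-pixel line per profile, centred at pixel 128 (illustrative values).
pixels = np.arange(256)
ramp = ramp_profile(pixels, x0=128, u0=100.0, width=16)
pulse = pulse_profile(pixels, x0=128, u0=100.0, u1=20.0, width=16)
```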
eight possible scales for a 256x256 grey level image. The scale s forms a dyadic sequence
(s = 2^j, j = 1..6). The algorithm is summarized in figure 6, and table 1 shows the values of the
normalization coefficients λj. The algorithm outputs are twelve 256x256 grey level images
for each original one, six corresponding to the modulus of the derivative and six to the
phase.
As we have pointed out in section 2, it is possible to perform a multiresolution analysis
changing the smoothing function. The only difference between a typical Gaussian
algorithm and the wavelet one is the filtering stage. In this case, instead of using the filter
proposed by Mallat and Zhong, a Gaussian filter with a different size for each scale could
be used. It leads to a different way to obtain the derivative of the image at different scales.
A more detailed discussion about the Gaussian-based processing algorithm is given in
Beltrán (Beltrán et al., 1998). Processing the image with the wavelet filter is faster, in
terms of computational cost, because its non-zero coefficients are constant, independent of
the scale.
In order to obtain the contour image, a top-down searching algorithm has been
implemented. For a maximum at one scale to be accepted as an edge, it must
propagate down to the lowest scale with no change in the gradient
direction between scales. When a maximum is found at one scale s, we look for extrema at
the next lower scale within an interval of width 2s centered on the position of the extremum
at scale s, in the direction given by the gradient. This interval is greater than the theoretical
one found by Beltrán (Beltrán, 1994), in order to cover the displacement of the maxima.
The maximum appearing at the lower scale has to have the same direction as the upper
one. The best edge position is given at the lowest scale. The first scale to be analyzed
depends strongly on the image. Empirically we have noticed that this scale should be no
higher than the 5th or the 6th. Otherwise, the image is strongly blurred, which
gives information about global objects rather than finer patterns, like edges. The
procedure is iterated down to the first scale. It could be stopped at an upper scale,
depending on the details we are looking for; if we are looking for finer details we
should reach the first scale. A global threshold has been included in order to discard
irrelevant edges.
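The top-down search can be sketched in one dimension as follows. This is an illustration, not the original implementation: the data layout (a dictionary of maxima per scale index j, each stored as a position and a quantized direction code), the function name, and the choice of the nearest candidate when several fall inside the interval are all assumptions.

```python
def track_to_finest_scale(maxima, start_j, position, direction):
    """Follow a maximum from scale 2**start_j down to scale 2**1.

    maxima: dict mapping scale index j to a list of (position, direction) pairs.
    Returns the position at the finest scale, or None if the chain breaks.
    """
    for j in range(start_j, 1, -1):
        s = 2 ** j
        # Interval of width 2s centred on the current position, same direction only.
        candidates = [p for p, d in maxima.get(j - 1, [])
                      if abs(p - position) <= s and d == direction]
        if not candidates:
            return None   # the maximum does not propagate: not accepted as an edge
        # Assumption: keep the candidate closest to the coarser-scale position.
        position = min(candidates, key=lambda p: abs(p - position))
    return position

# Usage with made-up maxima for three scales (j = 1, 2, 3).
maxima = {
    3: [(52, 0)],
    2: [(50, 0), (90, 1)],
    1: [(49, 0), (51, 1)],
}
edge_position = track_to_finest_scale(maxima, start_j=3, position=52, direction=0)
```

In this example the maximum found at j = 3 is followed through position 50 at j = 2 to position 49 at j = 1, while the maximum at position 90 has no counterpart at the finest scale and would be discarded.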
j λj
1 1.5
2 1.125
3 1.031
4 1.007
5 1.001
6 1
4. Classification schema
To obtain real evolution patterns in an image we have analyzed the evolution across scales
of the value of the wavelet transform modulus for each contour type at the point at which
the edge is located in our test image (see figure 7).
Fig. 7. Test image. Each object has a different contour profile. Circle: step. Square: ramp.
Triangle: stair. Straight line: pulse.
In figure 8 we present for each contour class both the median value (normalized to the
highest value for each pixel) and the deviation for each scale.
A decreasing behaviour can be seen in the evolution of each contour class at the upper
scales, due to the interaction between opposite contours in the image. This decreasing
pattern is not present in the 1-D case. From those results we can say that a step profile,
figure 8(a), is characterized by an almost constant evolution across scales. The evolution
pattern for a ramp, presented in figure 8(b), grows continuously from low to high scales.
A constant value followed by an increase in the normalized modulus of the gradient at the
edge position characterizes a stair contour, figure 8(c). As shown in figure 8(d), the pulse
profile presents a sharp decrease at the upper scales due to the interaction between the
positive and negative slopes.
Fig. 8. (a) Step evolution. (b) Ramp evolution. (c) Stair evolution. (d) Pulse evolution. (See
text for details)
Figure 9 shows a block diagram of the edge detection and classification algorithm. An
immediate conclusion is that it is possible to implement an algorithm to detect and classify the
above-characterized four profiles using the wavelet transform coefficients. Then, we are able to
distinguish one contour type from another one only by looking at the coefficients evolution at
the appropriate contour point and with no pre-processing in the coefficient values.
The classification engine is based on a second-order polynomial-fitting algorithm. We have
chosen a second-order polynomial because properties such as differentiability, continuity,
concavity, and convexity are easy to evaluate. By analyzing the concavity and convexity,
the zero crossing, and the abscissa and ordinate of the minimum of the fitted polynomial we
are able to distinguish between the four profiles presented. Typical values of the coefficients
are presented in table 2 and the corresponding polynomials in figure 10. The second-order
polynomial is of the form f(x) = a0 + a1·x + a2·x². We have used a standard polynomial-fitting
algorithm with the 6 discrete values obtained for each contour at the edge location.
         a0        a1        a2
Step     0.8295    0.1026   -0.0187
Ramp    -0.0249    0.3678   -0.0362
Stair    0.3562    0.0894    0.0005
Pulse    1.0087   -0.0732   -0.0131
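The fitting step can be reproduced with a standard least-squares routine. The sketch below uses `numpy.polyfit` as one possible "standard polynomial-fitting algorithm"; the function name is hypothetical, and taking the scale index j = 1..6 as the abscissa is an assumption, since the text does not state which abscissa was used.

```python
import numpy as np

def fit_second_order(modulus_values):
    """Least-squares fit of f(x) = a0 + a1*x + a2*x**2 to the values across scales."""
    y = np.asarray(modulus_values, dtype=float)
    x = np.arange(1, len(y) + 1)          # assumption: x is the scale index j
    a2, a1, a0 = np.polyfit(x, y, 2)      # polyfit returns the highest power first
    return a0, a1, a2

# Usage: six synthetic values generated from the "Step" row of the table above.
scales = np.arange(1, 7)
step_values = 0.8295 + 0.1026 * scales - 0.0187 * scales**2
a0, a1, a2 = fit_second_order(step_values)
is_convex = a2 > 0                        # the sign of a2 separates convex from concave fits
```

Because the six synthetic values lie exactly on a parabola, the fit returns the tabulated coefficients; with measured modulus values the fit is only approximate.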
Fig. 10. Dotted lines: Original values. Solid lines: polynomial. (a) Polynomial fitting for the
step profile. (b) Polynomial fitting for the ramp profile. (c) Polynomial fitting for the stair
profile. (d) Polynomial fitting for the pulse profile.
As a preliminary result, figure 12 shows the correct classification obtained for our
test image. It is important to note that no post-processing (edge tracking, non-maxima
suppression, etc.) has been applied to the obtained contour image.
Fig. 12. From left to right and top to bottom: step, ramp, pulse, and stair classified profiles.
Fig. 13. (a) Mean values of Gaussian noise evolution. (b) Noise dependence on distance.
But noise rarely appears in isolation in a real image, and its evolution depends strongly on
the distance between the noise and the contours of the objects present in the image. To obtain
the noise evolution we have processed the image of a simple square with a step profile
corrupted with Gaussian noise. The evolution of the noise values, in terms of the
distance between the noise and the square, is presented in figure 13b. These patterns allow
us to classify the noise and differentiate it from other contour types. We can see that the
evolution pattern for a noise contour located very close to a real contour is quite similar to
the stair evolution. This is the expected behaviour, because the stair has been defined as two
close steps. We can include this contour type in our classification engine.
The final decision algorithm is presented in figure 14. We have to distinguish between the
stair type and the noise one. If we have a convex polynomial (a2 > 0) we have a stair or noise
edge. They are differentiated by means of the minimum ordinate value. If the minimum
ordinate is close to 0 it is classified as a noise edge.
[Fig. 14: decision tree]
Convex?
  YES: minimum == 0?
    YES: NOISE
    NO: STAIR
  NO: Increasing?
    YES: RAMP
    NO: Xm = 6?
      YES: PULSE
      NO: STEP
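The decision tree above translates directly into a small function. The sketch takes the features as precomputed inputs, because the text does not give the exact procedure used to evaluate "Increasing?" or the abscissa Xm. The function name, the argument names and the tolerance used for "minimum close to 0" are assumptions.

```python
def classify_edge(a2, increasing, min_ordinate, xm, zero_tol=0.05):
    """Classify an edge from features of its fitted second-order polynomial.

    a2           : quadratic coefficient (a2 > 0 means a convex polynomial)
    increasing   : True if the fitted evolution grows across scales
    min_ordinate : ordinate of the minimum of the fitted polynomial
    xm           : abscissa tested in the last node of the tree
    zero_tol     : hypothetical tolerance for "minimum close to 0"
    """
    if a2 > 0:
        return "noise" if abs(min_ordinate) <= zero_tol else "stair"
    if increasing:
        return "ramp"
    return "pulse" if xm == 6 else "step"

# Usage with illustrative feature values.
label = classify_edge(a2=0.0005, increasing=True, min_ordinate=0.45, xm=1)
```

Removing every edge labelled "noise" then acts as a simple noise filter on the contour image.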
It can be seen that the noise is perfectly classified (figure 15c) and can be eliminated by
simply removing this edge type (figure 15b). Figure 15d shows the output of the Canny edge
detector. In this case the Canny operator is more sensitive to noise than our algorithm. Figure 16
shows the different edge profiles detected.
6.1 Roof
A roof profile, named R(x), is shown in figure 17. We have named the point x0 the middle of
the roof, ω is the roof width and U0 the roof height.
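$$r(x) = \begin{cases} 0 & \text{if } x < x_0 - \frac{\omega}{2} \\ mx + k & \text{if } x_0 - \frac{\omega}{2} \le x \le x_0 \\ U_0 & \text{if } x = x_0 \\ -mx + k & \text{if } x_0 \le x \le x_0 + \frac{\omega}{2} \\ 0 & \text{if } x > x_0 + \frac{\omega}{2} \end{cases} \tag{17}$$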
Fig. 17. Roof profile. Horizontal axis represents pixels. Vertical axis represents grey level.
6.2 Ridge
A ridge profile, named R(x), is shown in figure 18. The point x0 is the middle of the ridge, ω1
is the width of the first ramp, ω2 is the width of the plain part, while ω3 is the width of the
second ramp. U0 is the ridge height.
$$R(x) = \begin{cases} 0 & \text{if } x < x_0 - \omega_1 - \frac{\omega_2}{2} \\ mx + k & \text{if } x_0 - \omega_1 - \frac{\omega_2}{2} \le x \le x_0 - \frac{\omega_2}{2} \\ U_0 & \text{if } x_0 - \frac{\omega_2}{2} \le x \le x_0 + \frac{\omega_2}{2} \\ -mx + k & \text{if } x_0 + \frac{\omega_2}{2} \le x \le x_0 + \frac{\omega_2}{2} + \omega_3 \\ 0 & \text{if } x > x_0 + \frac{\omega_2}{2} + \omega_3 \end{cases} \tag{18}$$
Fig. 18. Ridge profile. Horizontal axis represents pixels. Vertical axis represents grey level.
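$$nu(x) = \begin{cases} 0 & \text{if } x < x_0 - \omega_1 \\ mx + k & \text{if } x_0 - \omega_1 \le x \le x_0 + \omega_1 - \omega_2 \\ mx + k & \text{if } x_0 + \omega_1 - \omega_2 \le x \le x_0 + \omega_1 + \omega_2 \\ U_3 & \text{if } x > x_0 + \omega_1 + \omega_2 \end{cases} \tag{19}$$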
Fig. 19. First non-antisymmetrical step profile. Horizontal axis represents pixels. Vertical
axis represents grey level.
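$$nu2(x) = \begin{cases} 0 & \text{if } x < x_0 - \omega_1 \\ mx + k & \text{if } x_0 - \omega_1 \le x \le x_0 + \omega_1 - \frac{\omega_2}{2} \\ cx + k & \text{if } x_0 + \omega_1 - \frac{\omega_2}{2} \le x \le x_0 + \omega_1 + \frac{\omega_2}{2} \\ mx + k & \text{if } x_0 + \omega_1 + \frac{\omega_2}{2} - \omega_3 \le x \le x_0 + \omega_1 + \frac{\omega_2}{2} + \omega_3 \\ U_3 & \text{if } x > x_0 + \omega_1 + \frac{\omega_2}{2} + \omega_3 \end{cases} \tag{20}$$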
Fig. 20. Second non-antisymmetrical step profile. Horizontal axis represents pixels. Vertical
axis represents grey level.
In order to show the results of the modified algorithm, we have created a new test image,
including the new edge profiles. Figure 22 shows the new test image, together with the
different detected edge profiles. Results with a real image are shown in figures 23 and 24.
Fig. 22. (a) Test image. (b) Ramp edges. (c) Step edges. (d) Pulse edges. (e) Stair edges. (f)
Ridge edges. (g) Roof edges. (h) First non-antisymmetrical step edges. (i) Second non-
antisymmetrical step edges.
7. Conclusions
In this work, we have developed a new algorithm for edge detection and classification
purposes using the coefficients given by the DWT. We have shown that Mallat’s wavelet is a
very suitable tool to perform both edge detection and contour analysis. We have presented
the results obtained with a 256x256 synthetic image with several objects, each one with a
different contour profile, obtaining a very good segmentation. At this point, we are not only
able to see the evolution across scales of the edges proposed by Rosenfeld (Rosenfeld &
Thurston, 1976) like in Williams and Shah’s work (Williams & Shah, 1990, 1993), but we are
also able to classify them.
A new edge class has been introduced: the noise. This edge type presents a particular
evolution across scales. This has allowed us to implement a simple noise filtering algorithm
based on edge classification.
The classification algorithm we have developed is based on second order polynomial fitting
of the modulus of the wavelet transform coefficients. The mathematical behaviour of the
polynomial is a robust indicator of the edge class. This kind of classification is good enough
to obtain the five different profiles analyzed: step, ramp, stair, pulse, and noise. The
robustness of the proposed classification schema has been tested by including other profiles
that have appeared in the literature: roof, ridge and two kinds of non-antisymmetrical step
models. A third-order polynomial-fitting algorithm is needed to obtain a proper classification. This
algorithm can be viewed as a new framework to classify different contour types.
Some preliminary results, like the behaviour of ramp edges, are promising to obtain a
classification of the contours appearing in real images (shadows, changes in illumination,
corners and so on). A future work to perform, which has not been covered in this paper, is
the study of the evolution across scales of these real edges. If there were some special
evolution pattern for these real edge types it would be very important information for the
next stages of an image understanding algorithm.
An edge-closing algorithm based on the edge type is currently under development. The
extra information provided by the classification stage gives very good indicators to close
edges and extract objects in an image. The processing results presented in this work have
been obtained using Matlab®.
8. References
Beltrán, J. R., J. García-Lucía, J. & Navarro, J. (1994). Edge detection and classification using
Mallat’s wavelet. Proceedings of the ICIP-94, pp. 293-296
Beltrán, J. R., Beltrán, F & Estopañan A. (1998). Multiresolution edge classification: Noise
Characterization, Proceedings of the IEEE-SMC’98, pp. 4476-448
Bergholm, F. (1987). Edge Focusing. IEEE Trans. on Patt. Anal. and Machine Intell, Vol. 9, No.
6, pp. 726-741
Canny, J. F. (1986). A computational approach to edge detection. IEEE Trans. on Patt. Anal.
And Machine Intell. Vol. 8, pp. 679-698
Eklundh, J. O., Elfving, T. & Nyberg, S. (1982). Edge detection using the Marr-Hildreth
operator with different sizes. Proceedings of the 6th Int. Conf. on Pattern Recognition
(ICPR), Munich, Germany, pp. 1109-1112
Mallat, S. (1989). A theory for Multiresolution Signal Decomposition: The Wavelet
Representation. IEEE Trans. on Patt. Anal. and Machine Intell, Vol. 11, No. 7, pp. 674-693
Mallat, S. & Zhong, S. (1992). Characterization of Signals from Multiscale Edges. IEEE Trans.
on Patt. Anal. and Machine Intell, Vol. 14, No. 7, pp. 710-732
Marr, D. (1982). Vision. W. H. Freeman. San Francisco
Paillou, Ph. (1994). A non antisymmetrical edge profile detection. Pattern Recognition Letters
Vol. 15, pp. 595-605
Park, D. J., Nam, Kwon M. & Park, Rae-Hong. (1995). Multiresolution Edge Detection
Techniques, Pattern Recognition, Vol. 28, No. 2, pp. 211-229
Rosenfeld, A. & Thurston, M. (1971). Edge curve detection for visual scene analysis, IEEE
Transactions Computers, Vol. 20, pp. 562-569
Rosenfeld, A. & Kak, A. C. (1976). Digital Picture Processing. Academic Press
Torre, V. & Poggio, T. (1984). On edge detection. Technical Report 768, MIT
Williams, D. J. & Shah, M. (1990). Normalized Edge Detector. Proceedings of the 10th Int.
Conference On Pattern Recognition, 1 (16-21), pp. 942-946
Williams, D. J. & Shah, M. (1993). Edge Characterization Using Normalized Edge Detector.
Proceedings of CVGIP, Vol. 5, No. 4, pp. 311-318
Witkin, A. P. (1983). Scale-space filtering. Proceedings of the 4th Int. Joint Conference on Artificial
Intelligence (IJCAI), pp. 1019-1022
Ziou, D. & Tabbone, S. (1993). A Multiscale Edge Detector. Pattern Recognition, Vol. 26, No.
9, pp. 1305-1314
7
1. Introduction
Video applications, like video teleconferencing, video telephones, and advanced television
(ATV), have given the field of compression and transmission of digital video signals
significant importance. It is expected that advances in video compression technology
will play a crucial role in the transmission and display of three-dimensional video signals.
A typical image of size 512x512 pixels with 8 bits per pixel (bpp), for example, needs a storage
capacity of about 2 Mbits. A video sequence with the same frame size, on the other hand,
at 30 frames per second and a channel transmission rate of 64 kilobits per second (kbps)
would take about 17 minutes of transmission time. The required transmission time would
become unmanageable with the continuously increasing demand for image-based applications.
You can't put enough of it over a telephone line and you can't squeeze it into the broadcast
bandwidth of available channels¹. Therefore, image and video compression algorithms
became a necessity to store or transmit these images.
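The orders of magnitude quoted above can be checked with a few lines of arithmetic. The sketch assumes that the transmission time refers to one second of uncompressed video and that 64 kbps means 64,000 bits per second; neither is stated explicitly in the text.

```python
# Figures taken from the text; "one second of video" is an assumption.
width, height, bits_per_pixel = 512, 512, 8
frames_per_second = 30
channel_bps = 64_000

bits_per_frame = width * height * bits_per_pixel            # storage for one image
bits_per_video_second = bits_per_frame * frames_per_second  # one second of raw video
minutes = bits_per_video_second / channel_bps / 60

print(f"{bits_per_frame / 1e6:.2f} Mbit per frame")
print(f"{minutes:.1f} minutes to send one second of video")
```

Under these assumptions a frame needs about 2.1 Mbit and one second of video takes a little over 16 minutes to transmit, which is of the same order as the figure quoted above.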
Data compression is the science of representing information in a compact form by exploiting
the different kinds of statistical structures that may be present in the data². This is to reduce
the number of bits per sample while keeping the distortion constant³. There is a great deal of
correlation between neighboring pixel values of an image. Therefore, removing such
redundant information and transmitting only the new information (the changes) enables us
to reconstruct the original image. For video signals, redundancy over time between
successive images can also be eliminated.
There are two types of compression: lossless and lossy. In the lossless compression the
original image can be retrieved without error, while for the lossy compression, the original
image can’t be retrieved without error; an image copy close to the original can be retrieved.
1 Realtime video compression poses challenge to designers and vendors alike, Computer Design, vol.
32, no. 7, pp. 67-70 (Child, July 1993).
2 Hybrid coding of images for progressive transmission over a digital cellular channel, CISST’99
International Conference on Imaging Science, Systems and Technology, Monte Carlo resort, Las Vegas,
Nevada, USA, PIN 128C (Al-Asmari et al.,June 28 – July 1, 1999).
3 Introduction to data compression, Morgan Kaufmann Publishers Inc., San Francisco, California
(Sayood, 1996).
The Moving Picture Experts Group (MPEG) video standard is the most prevalent and widely used
for video compression³⁻⁶. Also, MPEG is an application-specific standard, and different
versions of MPEG (such as MPEG-1, MPEG-2, MPEG-4, and MPEG-7) are available for
different applications and bit rates. The basic algorithm for all these versions is the same and
is very similar to the other video compression standards.
The proposed algorithm is based on temporal filtering of image sequences with short
symmetric kernel filters (SSKFs)⁷⁻⁸, which are well known for their simplicity. In this paper,
we use four SSKFs, each with 4 taps and a decimation factor of 4:1, instead of the two
SSKFs, each with 2 taps and a decimation factor of 2:1, used in classical 3-D
decomposition algorithms⁷⁻⁸. The temporal filtering removes the redundancy in the temporal
domain. On the other hand, pyramid coding (PC) is used for subband decomposition in
the spatial domain. Vector quantization (VQ) and absolute moment block truncation
coding (AMBTC) will be used to encode the spatial domain subbands.
4 Digital pictures representation, compression, and standard, Plenum Press (Netravali & Haskell, 1995).
5 Image and video compression standards: algorithms and architectures, Kluwer Academic Publishers
(Bhaskaran & Konstantinides & Hewlett Packard Laboratories, 1996).
6 Digital compression of still images and video, Academic Press (Clarke, 1995).
7 Subband coding of video for packet networks, Optical Engineering, vol. 27, no. 7, pp. 574-586,
in Table 1. The frequency responses of these filters are shown in Figure 1. H0(e^{jω}) is the low
pass filter, H3(e^{jω}) is the high pass filter, while H1(e^{jω}) and H2(e^{jω}) are the band pass filters.
The 3-dB bandwidth for these filters is approximately π/4.
[Figure 1: magnitude responses of H0(e^{jω}), H1(e^{jω}), H2(e^{jω}) and H3(e^{jω}), each plotted against ω/π from 0 to 1.]
[Figure: four consecutive frames are passed through the Haar filters h0, h1, h2 and h3, giving the temporal subbands TLL, TLH, THL and THH, respectively.]
Table 2. The average entropy for the four frames before and after the decomposition.
9 A mathematical theory of communication, Bell System Tech. J.. Vol. 27, pp. 379-423, and pp. 623-656
(Shannon, July. 1948a, Oct. 1948b).
Low Bit Rate Video Compression Algorithm Using 3-D Discrete Wavelet Decomposition 105
Fig. 3. (a) Original and (b) Temporal filtered frames of the second four frames of Miss
America sequence.
3. Pyramid coding
Burt and Adelson have suggested a method of pyramid coding that is suitable for progressive
transmission¹¹. In this method, the original image is low-pass filtered and down-sampled by a
factor of 2. The image thus obtained serves as a decimated version of the original. Then, the
decimated image is interpolated by a factor of 2 and filtered to have the same size as the
original image. The difference between the original image and the interpolated one generates
an error image. This is called the first-level PC decomposition. This process can be repeated
on the decimated image to obtain higher levels. To achieve compression, bits are allocated to
the difference images and the decimated image depending on the amount of information in each
subband. Subbands with high information content are assigned a higher bit rate than
those with lower information content.
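One level of the pyramid decomposition described above can be sketched with NumPy alone. This is an illustration under assumptions: the 5-tap binomial kernel is a common Burt-Adelson style choice and is not necessarily the filter used in this chapter, borders are zero-padded, and the function names are hypothetical.

```python
import numpy as np

# Assumption: 5-tap binomial low-pass kernel.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def _smooth(img, kernel):
    """Separable filtering along rows, then columns (zero-padded, same size)."""
    tmp = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, kernel, mode="same")

def _interpolate(decimated, shape):
    """Zero-insert by 2 in each direction and filter; gain 2 per axis restores the level."""
    up = np.zeros(shape)
    up[::2, ::2] = decimated
    return _smooth(up, 2.0 * KERNEL)

def pyramid_level(img):
    """Return (decimated image, error image) for one pyramid level."""
    img = np.asarray(img, dtype=float)
    decimated = _smooth(img, KERNEL)[::2, ::2]
    error = img - _interpolate(decimated, img.shape)
    return decimated, error

def reconstruct(decimated, error):
    """Invert one level: interpolate the decimated image and add the error image."""
    return _interpolate(decimated, error.shape) + error

# Usage on a random 16x16 test image.
rng = np.random.default_rng(0)
image = rng.random((16, 16))
low, err = pyramid_level(image)
restored = reconstruct(low, err)
```

Without quantization the reconstruction is exact by construction. Compression comes from coding `low` and `err` with different bit allocations, and applying `pyramid_level` again to `low` gives the next level.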
10 Video Signal Transmission for IS-95 Environment, Electronic Letters, Vol. 36, no. 5, pp. 465-466 (Al-
Trans. Circuit and Systems for video Tech., vol. 5, no. 3 pp. 182-192,(Alasmari, June 1995).
By using a pyramidal coding scheme which basically follows a rate of change of 4 instead of
2 as used in the conventional pyramid coding, the number of samples to be encoded is 20%
lower than in the conventional pyramid¹²⁻¹³. Another advantage of this filtering
technique is that the 24-tap FIR filter involves 33% fewer computations as compared to the
Gaussian filter when FFT algorithm is used. The third advantage of this filter is the lower
entropy obtained when compared with that found when the Gaussian filter is applied12.8
Fig. 4. Spatial decomposition for the temporal subbands using pyramid coding.
13 Low complexity subband encoding for HDTV images, IEEE J. Select. Areas Commun., vol. 11, no. 1,
24-tap filter. It is found that the 24-tap filter does not give good performance when used to
decompose the temporal TLH, THL and THH bands, due to the sparse nature of the
information in these subbands¹⁰. The Gaussian filter given in¹¹ is used to decompose these
subbands. Figures 5, 6, 7, and 8 show the spatial subbands for the temporal bands of the Miss
America sequence using the pyramid coding concept.
Fig. 5 shows the spatial pyramid decomposition of the temporal TLL subband. Two levels of
pyramid coding have been applied for this temporal subband. The decimated image (band
1), which contains most of TLL subband information, is of dimension 18 × 22 pixels. The
difference images of this band (band 2 and band 3) contain edge components and are of
dimension 72 × 88 pixels and 288 × 352 pixels, respectively.
[Fig. 5: temporal TLL subband decomposed into band 1, band 2 and band 3.]
[Figure: temporal TLH subband decomposed into band 4 and band 5.]
In Figure 7 the spatial pyramid decomposition of the temporal THL subband is shown. One
level pyramid coding has been applied for this temporal subband. The decimated image
(band 6) is of dimension 144 × 176 pixels while the difference image (band 7) is of dimension
288 × 352 pixels.
[Fig. 7: temporal THL subband decomposed into band 6 and band 7.]
[Figure: temporal THH subband decomposed into band 8 and band 9.]
14 Analysis of Coding and Compression strategies for Data Storage and Transmission, Ph. D. thesis,
$$d_m = \left[\sum_{j=1}^{c_d} \frac{\big(V_i(j) - V_m(j)\big)^2}{c_d}\right]^{1/2} \qquad (m = 1, 2, \ldots, c_s \ \text{and}\ i = 1, 1 + c_d, 1 + 2c_d, \ldots, m \times n) \tag{1}$$
where
dm is the root mean square error (RMSE),
Vi is the image vector,
Vm is the codebook vector,
cs is the codebook size, and
cd is the codeword dimension.
The RMSE (dm) will be compared with a pre-decided threshold error (Vth). If dm is less
than this threshold, then the index of the codebook vector with lowest error will be
transmitted and this vector is moved to the top of the codebook in both transmitter and
receiver. Otherwise, the image vector will be transmitted and placed at the top of the
codebook.
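The matching and codebook update rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and return format are hypothetical, and dropping the last codeword when the codebook grows beyond its size is an assumption the text does not state.

```python
import numpy as np

def lavq_encode_vector(vector, codebook, threshold, max_size):
    """Encode one image vector against an adaptive, move-to-front codebook.

    codebook is a list of 1-D arrays and is updated in place; the decoder is
    assumed to apply the same update so that both codebooks stay identical.
    Returns ("index", m) when a codeword matches, otherwise ("vector", v).
    """
    v = np.asarray(vector, dtype=float)
    if codebook:
        # RMSE of equation (1) between the image vector and every codeword
        errors = [float(np.sqrt(np.mean((v - c) ** 2))) for c in codebook]
        best = int(np.argmin(errors))
        if errors[best] < threshold:
            codebook.insert(0, codebook.pop(best))   # move the match to the top
            return ("index", best)
    codebook.insert(0, v)                            # new vector goes to the top
    if len(codebook) > max_size:
        codebook.pop()                               # assumption: drop the last entry
    return ("vector", v)

# Usage with a two-entry codebook of dimension 4 (illustrative values).
book = [np.zeros(4), np.full(4, 10.0)]
first = lavq_encode_vector([10, 10, 10, 11], book, threshold=1.0, max_size=2)
second = lavq_encode_vector([50, 50, 50, 50], book, threshold=1.0, max_size=2)
```

In this example the first vector matches the second codeword (RMSE 0.5), so only its index is sent. The second vector has no match within the threshold, so it is sent in full and becomes the new top entry.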
[Figure: block diagram with the labels Band 1, Band 2, Band 3, 4, H24, Edge detection, Mapping the edge pixels to band 3, and Coding the pixels at the edge location.]
16 Absolute moment block truncation coding and application to color images, IEEE Trans. on Commun.,
sampled to have the same size as band 3. This partially reconstructed version of the original
image is used for edge-detection, which gives the location of the vectors of the band 3 that
are encoded. Once the locations are known, the encoded vectors are suitably placed to form
band 3. Thus, using this approach, no side information needs to be sent for the encoded
areas of the first difference image level and an average of only 4% to 5% of this level needs
to be encoded. Thus, more compression is achieved by edge-detection instead of coding the
entire band. After edge-detection, the data is encoded by using LAVQ.
The decomposed subbands (band 4 and band 5) of the temporal (TLH) subband are encoded
using the LAVQ for band 4 and the edge-detection approach for band 5. Band 4 is interpolated
to the size of band 5. Then, the edge-detection technique is applied to the interpolated version
and the corresponding pixels from band 5 are formed into vectors to be transmitted. At the
receiver, this process is repeated by interpolating the decoded band 4 to the size of band 5.
Then, edge-detection is taken for this band to decide the location of the received vectors for
band 5. The same decoding process is adopted for bands 6 and 7 to get the temporal THL
subband. Band 8 and band 9 form the temporal THH subband. Band 9 has extremely low
energy content and the sparse information carried in this band is not significant for the final
image reconstruction. Thus, it can be safely discarded. Band 8 is encoded using the LAVQ
algorithm. At the receiver, this band is interpolated to get the temporal THH subband.
The subbands encoded using LAVQ have different codebook sizes and vector (codeword)
dimensions. The choice of the codebook size and the codeword dimension depends on many
factors, such as the importance of the information to be transmitted, the correlation between the
data in each band, and the size of the band to be encoded. Band 4, for example, is more
important than band 8 because it carries some of the motion information and is also used at the
receiver as a detector to locate the vectors of band 5. Therefore, some care should be given to
this band. The codebook size is selected to be 128 codewords, each of dimension 4. Band 6 and
band 8 have most of the edge information, which is already emphasized by encoding band 7.
Therefore, a codebook of smaller size and a codeword of larger dimension than those of band 4
can be adopted. From the simulation results, we find that a codebook of size 64 with a codeword
dimension of 8 gives an excellent reconstructed video sequence quality.
Since the encoded vectors corresponding to the edge-detection concept are the only
information used to reconstruct band 3, band 5, and band 7, then, these vectors shall be
quantized with lowest possible MSE. Therefore, codebooks of size 64 and codewords of
dimension 4 pixels are selected to encode those subbands. Table 3 shows the encoding
techniques, PSNR, and bit rate for Miss America sequence.
7. Simulation results
The compression algorithm is tested on three video sequences with different motions and
backgrounds. The first is the Miss America sequence, with slow motion and a static
background. In this sequence, the only moving objects are the lips and head. The second,
with moderate motion and a noisy background, is the Salesman sequence. The man’s
head and hand move faster than in the Miss America sequence, and the background is noisy.
The third, with faster motion than the Salesman sequence, is the Walter sequence. All of these
sequences are 256-level gray-scale images with a dimension of 288 x 352 pixels per frame, at a
rate of 30 frames/s and 8 bits per pixel. These sequences are standard and are used by many
researchers. The only way to check the performance of our proposed algorithm is to test it on
such images and compare the results with those of other compression algorithms.
$$\text{Bitrate} = \frac{(B_m + B_{h-l}) \times \dfrac{\dfrac{l_1}{4^n} \times \dfrac{l_2}{4^n}}{B_d}}{l_1 \times l_2} \tag{2}$$
where Bm is the number of bits in the bit-map, Bh-l is the number of bits required to encode
the high and the low means, Bd is the sub-block dimension (in pixels), l1 × l2 is the original
image frame dimension, and n is the pyramid level number (n = 1 for the first level and n = 2
for the second level). In this paper, Bd is selected to be 3 × 3 for band 1 and 4 × 4 for band 2.
The high mean and the low mean for band 1 are encoded at 8 bits each, while for band 2 they
are encoded at 6 and 4 bits, respectively.
The bit rate calculation for the Miss America sequence according to equation (2) is shown
below for four frames of the original image sequence, namely frames 4, 5, 6, and 7.
These frames have the highest motion in this test sequence. Each band rate is divided by 4
because the subbands of one group represent four frames.
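$$\text{Bitrate(band 1)} = \frac{(9 + 16) \times \dfrac{18 \times 22}{3 \times 3}}{288 \times 352} = 0.01085, \qquad \frac{0.01085}{4} = 0.0027\ \text{bpp} \tag{3}$$

$$\text{Bitrate(band 2)} = \frac{(16 + 10) \times \dfrac{72 \times 88}{4 \times 4}}{288 \times 352} = 0.10156, \qquad \frac{0.10156}{4} = 0.0253\ \text{bpp} \tag{4}$$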
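The arithmetic of equations (2) to (4) can be checked with a short script. This is only a sketch: the function name `ambtc_bitrate` and its parameter names are illustrative and do not come from the chapter; the numeric inputs are the ones quoted in the text.

```python
def ambtc_bitrate(bitmap_bits, mean_bits, block_pixels, rows, cols, level):
    """Bits per pixel of one AMBTC-coded band, following Eq. (2).

    bitmap_bits  : Bm, bits in the bit-map of one sub-block
    mean_bits    : Bh-l, bits for the high and low means together
    block_pixels : Bd, pixels per sub-block (e.g. 3*3 or 4*4)
    level        : n, pyramid level (band is rows/4**n by cols/4**n)
    """
    band_rows = rows / 4 ** level
    band_cols = cols / 4 ** level
    blocks = band_rows * band_cols / block_pixels   # sub-blocks in the band
    return (bitmap_bits + mean_bits) * blocks / (rows * cols)


# Band 1: 3x3 blocks, 8 + 8 bits for the means, 18 x 22 band (n = 2).
band1 = ambtc_bitrate(9, 16, 3 * 3, 288, 352, level=2)
# Band 2: 4x4 blocks, 6 + 4 bits for the means, 72 x 88 band (n = 1).
band2 = ambtc_bitrate(16, 10, 4 * 4, 288, 352, level=1)

# Each group of subbands represents four frames, hence the division by 4.
print(band1, band1 / 4)
print(band2, band2 / 4)
```

The per-frame value for band 2 evaluates to about 0.02539 bpp, which the text reports truncated as 0.0253 bpp.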
The second encoding technique (edge detection + LAVQ) is used to encode bands 3, 5, and
7. First, the edge concept is applied to those bands. Then, the pixels at the
positions marked with "1" are encoded using the LAVQ technique.
The first algorithm (LAVQ) is found to perform well for bands 4, 6, and 8, since the
motion and edge information is present in these bands. The average bit rate for each
band can be calculated using the following formula:
Low Bit Rate Video Compression Algorithm Using 3-D Discrete Wavelet Decomposition 115
$$\text{Bitrate} = \frac{(X \times b) + \bigl(Y \times ((cd \times 8) + b)\bigr)}{l_1 \times l_2} \tag{5}$$
Where X is the number of matched codewords, b is the number of bits needed to encode the
index, Y is the number of non-matched codewords, and cd is the dimension of codeword.
Since the eight subbands represent four frames, then the bit rate for each frame is given by
the sum of the bit rates for each band divided by a factor of four.
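Equation (5) and the per-frame averaging can be sketched as follows. The function names and the example counts of matched and non-matched codewords are hypothetical, chosen only to illustrate the formula; they are not results from the chapter.

```python
def lavq_bitrate(matched, unmatched, index_bits, codeword_dim, rows, cols):
    """Average bits per pixel of one LAVQ-coded band, following Eq. (5).

    matched      : X, codewords found in the codebook (only the index is sent)
    unmatched    : Y, codewords sent in full (cd pixels at 8 bits) plus an index
    index_bits   : b, bits per codebook index
    codeword_dim : cd, pixels per codeword
    """
    matched_bits = matched * index_bits
    unmatched_bits = unmatched * (codeword_dim * 8 + index_bits)
    return (matched_bits + unmatched_bits) / (rows * cols)


def per_frame_bitrate(band_rates, frames_per_group=4):
    """Sum the band rates of one group and spread them over its frames."""
    return sum(band_rates) / frames_per_group


# Hypothetical counts for one band: 64-entry codebook (6-bit index), cd = 8.
rate = lavq_bitrate(matched=700, unmatched=90, index_bits=6,
                    codeword_dim=8, rows=288, cols=352)
print(rate, per_frame_bitrate([rate]))
```

Raising the matching threshold moves codewords from the non-matched count to the matched count, which is why the bit rate falls as the threshold grows.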
Simulation results on these sequences are discussed in terms of three main factors: peak
signal-to-noise ratio (PSNR), bits per pixel (bpp), and visual quality. The PSNR in decibels
(dB) is given by
$$\text{PSNR} = 10\log_{10}\left(\frac{255^2}{\text{MSE}}\right) \tag{6}$$
where MSE is the mean square error, written as follows:

$$\text{MSE} = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(X_{ij} - \tilde{X}_{ij}\right)^2 \tag{7}$$

where
m is the number of rows,
n is the number of columns,
X is the original pixel value,
X̃ is the reconstructed pixel value.
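A minimal sketch of equations (6) and (7) on plain nested lists is given below. The helper names `mse` and `psnr` and the tiny 2 × 2 test frame are illustrative assumptions, not part of the original work.

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equally sized 2-D pixel arrays (Eq. 7)."""
    m, n = len(original), len(original[0])
    total = sum((original[i][j] - reconstructed[i][j]) ** 2
                for i in range(m) for j in range(n))
    return total / (m * n)


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB for 8-bit images (Eq. 6)."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf          # identical images: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / err)


frame = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 100]]
print(mse(frame, decoded), psnr(frame, decoded))
```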
Table 4 shows the average bit rate and average PSNR for different pre-decided error
thresholds (Vth) for the three sequences. These thresholds are compared with the RMSE (dm) that
results from the LAVQ search. The larger Vth is, the fewer non-matched codewords (Y)
there will be. This reduces the average bit rate required for transmission, as given in
equation (5). However, the quality of the reconstructed image will be affected. Therefore, a
compromise between the bit rate and the required visual quality must be chosen.
at Vth = 12. Figure 11 shows the original and the reconstruction of the second four frames of
Miss America sequence at Vth =12. The visual quality of the reconstructed sequence is the
same as the original. From the simulation, it can be concluded that this compression
algorithm is capable of compressing video sequences with different types of motion.
Fig. 11. (a) Original and (b) Reconstructed frames of the second four frames of Miss America
sequence at Vth = 12.
The PSNR (dB) versus the number of frames is shown for the three sequences at
different thresholds. The average bit rate (bpp) versus the number of frames is also
presented for 16 frames of the test video sequence. Since four frames are simulated as one
group at a time, only 4 different bit rates are observed. The subband TLL is
transmitted first, followed by band TLH, which is the second most important subband in terms
of information content. This order is followed until all the subbands are transmitted.
The bit rate is then accumulated in order to plot the bit rate curves.
Figure 12 shows the overall performance in terms of PSNR (dB) and bit rate (bpp) of the
proposed algorithm for the Miss America sequence.
Fig. 12. Performance curves for thresholds 7 to 11. (a) PSNR vs. number of frames for Miss
America sequence. (b) bpp vs. number of frames for Miss America sequence.
8. Performance evaluation
The performance evaluation of the proposed algorithm is done in two stages. First, it is
compared with the performance of MPEG standard algorithm. Second, it is compared with
some existing research works in this field using multiresolution decomposition concept.
| | MPEG-1 | KARL [7] | NEHAL [8] | Al-Asmari [10] | Al-Asmari [17] | This Method |
|---|---|---|---|---|---|---|
| Interframe coding | Motion compensation | Temporal 2-tap filtering SSKFs | Temporal 2-tap filtering SSKFs | Temporal 2-tap filtering SSKFs | Motion compensation | Temporal 4-tap filtering SSKFs |
| Intraframe coding | DCT Transform (JPEG) | SBC + ADPCM and PCM | PC + DPCM or BTC | PC + FSCL (VQ) | AMBTC and quantization | PC + AMBTC + LAVQ |
| Average PSNR | 37 dB | 36.9 dB | 36.5 dB | 36.52 dB | 37 dB | 36.9 dB |
| Bpp | 0.343 | 0.434 | 0.273 | 0.25 | 0.2 | 0.13 |
| Relative complexity | Complex | Moderate | Low | Moderate | Moderate | Low |
| Comments | Excellent quality with moderate bit rate | Good quality at high bit rate | Competitive quality with low bit rate | Excellent quality with low bit rate | Excellent quality at low bit rate | Excellent quality with very low bit rate |
17 Low Complexity Video Compression Algorithm Using AMBTC, Proceeding of IEEE Military
codebook for the vector quantization. This algorithm gives an excellent image quality for the
Miss America sequence at an average bit rate of 0.25 bpp and PSNR 36.52 dB. This algorithm
is considered to be of higher complexity than our algorithm because of the codebook design.
In [17], the authors present a coder based on a combination of AMBTC for intraframe coding
and MC for interframe coding. They report results for the monochrome Miss America sequence.
For the CIF format (i.e. 288 × 352) at 30 frames/s, they achieve approximately 37 dB at 0.2 bpp.
The disadvantage of this algorithm is its reliance on motion compensation for fast-motion video
sequences. For the same video sequence, the algorithm proposed in this paper gives a higher
PSNR (37.2 dB) and a lower bit rate (0.13 bpp) than those algorithms, with a coder of lower
complexity.
9. Conclusion
The results presented here are better than those of other published coding schemes based on
almost the same coding concept. The 3-D decomposition does not make
unrealistic assumptions about the data, as do methods based on motion compensation (MC).
Moreover, coding and decoding for the proposed algorithm are of comparable and
relatively low complexity. The robustness obtained by adapting the LAVQ for the codebook
has been discussed. The results reported in this paper are independent of which sequence is
used to produce the codebook.
This scheme is faster than MPEG algorithms and other existing techniques whose
encoders are based on VQ, since it needs no training set or codebook generation. This makes
it well suited to real-time transmission, to progressive
transmission of the video sequence, and to browsing moving images via the Internet.
The LAVQ technique performs well on bands that are highly uncorrelated
and carry sparse information. The bit rate varies from 0.13 to 0.191 bpp depending on the
nature of the sequence. Different video sequences have been tested and show very good
image quality, with PSNR in the range of 36.9 to 35.15 dB and bit rates from 0.395
Mbps to 0.58 Mbps, as shown in Table 4.
10. References
Al-asmari Awad Kh. (1995), Optimum Bit Rate Pyramid Coding with Low Computational and
Memory Requirements, IEEE Trans. Circuit and Systems for video Tech., vol. 5, no. 3
pp. 182-192, (June 1995).
Al-Asmari Awad Kh., Ahmed Abobakr & Al-Doweesh Abdullah (1996), Image compression
scheme using improved basic-LAVQ and optimized VLC, J. King Saud Univ., Vol. 8,
Eng. Sci. (2), pp. 251-266.
Al-Asmari Awad Kh., Aryai Deepali, & Kwatra Subhash C. (2000), Video Signal Transmission
for IS-95 Environment, Electronic Letters, Vol. 36, no. 5, pp. 465-466, 2nd March
2000.
Al-Asmari Awad Kh., Dave Sameep & Kawatra Subhash C. (1999), Low Complexity Video
Compression Algorithm Using AMBTC, Proceeding of IEEE Military communication
conference, Atlantic city, NJ, (31 Oct. – 3 Nov).
Al-Asmari Awad, Singh Vinay & Kawatra Subhash (June 28 – July 1 CIST 99), Hybrid coding
of images for progressive transmission over a digital cellular channel, CISST’99
Low Complexity
Implementation of Daubechies
Wavelets for Medical Imaging Applications
Khan Wahid
University of Saskatchewan,
Canada
1. Introduction
The Discrete Wavelet Transform (DWT) has extensively been used in a wide range of
applications, including numerical analysis, image and video coding, pattern recognition,
medical and telemetric imaging, etc. The DWT decomposition introduced by Mallat
(Mallat, 1998) shows that the DWT can be viewed as a multiresolution decomposition of
a signal: it decomposes the signal into its components in different frequency
bands. The Inverse DWT does the opposite, i.e. it reconstructs the signal from its octave
band components. After its inclusion in JPEG2000 compression standard (Seo & Kim, 2007),
significant research has been done to optimize the DWT implementation to reduce the
computational complexity. Among a wide range of wavelets, the Daubechies wavelets
include members ranging from highly localized to highly smooth and can provide excellent
performance in image compression (Daubechies, 1992). Among the family members, the
first two – Daubechies 4-tap (DAUB4) and Daubechies 6-tap (DAUB6) – are popular choices
in medical imaging applications.
While compressing medical images, the key here is to preserve as much critical information
as possible in the reconstructed image so that accurate diagnosis is possible. There have
been several efficient implementations of wavelet filters proposed for applications in image
processing (Lee & Lim, 2006; Martina & Masera, 2007; Acharyya et al., 2009; Shi et al., 2009;
Lai et al., 2009). But, the use of conventional fixed-point (FP) binary (or any other weighted)
representation for implementing discrete wavelet coefficients (that are irrational in nature)
introduces round-off or approximation errors at the very beginning of the process. The error
is due to the lack of exact representation of the irrational numbers that form the coefficient
basis. These errors tend to expand as the calculations progress through the architecture,
degrading the quality of image reconstruction (Wahid et al., 2003). A lossless mapping
technique, known as Algebraic Integer Quantization (AIQ), can be used to minimize the
approximation error and efficiently compute the DAUB4 and DAUB6 coefficients (Wahid et
al., 2004). The AIQ scheme is divided into two parts: the first stage is based on factorization
and decomposition of transform matrices exploiting the symmetric structure. After the
decomposition, we map the irrational transform basis coefficients using multidimensional
algebraic integers that results in exact representation and simpler implementation. As a
result, less error is introduced in the computation process that yields significantly better
reconstruction of images while keeping critical information, making the scheme suitable for
medical and telemetric imaging applications.
As a case study, we apply the scheme to several medical images, such as endoscopic,
ultrasound, x-ray, CT-scan images and evaluate the performance. The chapter is organized
as follows: Previous related works are presented next. Section 3 presents a brief introduction
to Daubechies wavelets. In Section 4, we explain the AIQ scheme applied to Daubechies
wavelets. Then the simulation and synthesized results of the case study are summarized in
Section 5. Finally, we conclude the work in Section 6.
2. Past work
Lewis and Knowles proposed an architecture for Daubechies wavelets without multipliers
(Lewis & Knowles, 1991). A major drawback was that it depended heavily on the
properties of one specific wavelet, the DAUB4 tap coefficients. At the same time, Aware
Inc. released a chip called the Wavelet Transform Processor (WTP) (Aware, 1991). It
essentially consists of a 4-tap filter (4 multiply-accumulate cells) and some external
memory with control, but it has no specific features that take advantage of the DWT
structure; instead, it relies heavily on software to compute the DWT. It is also a complex
design requiring extensive user control. Parhi and Nishitani proposed two architectures,
folded and digit-serial, for the 1D DWT (Parhi & Nishitani, 1993). These architectures do not
scale easily with the filter size and the number of octaves computed. The number of
multipliers is higher, and hence the silicon area is large. In (Vishwanath et al., 1995), the
authors proposed a linear systolic array architecture. Paek and Kim proposed recursive
and semi-recursive architectures for the DWT, which have several drawbacks such as large area
(hardware cost), scheduling control overhead and incomplete data-bus utilization (Paek &
Kim, 1998).
Most of the research work to reduce the hardware complexity is inclined towards
multiplierless implementations by maneuvering the filter banks (Lee & Lim, 2006; Martina
& Masera, 2007; Acharyya et al., 2009) or using lifting schemes (Shi et al., 2009; Lai et al.,
2009; Huang et al., 2004). However, in these designs, the use of conventional FP binary
representation results in erroneous computation process and degrades image
reconstruction. In this chapter, we present an efficient low-cost implementation of the
DAUB filters with a demonstration of performance advantages on medical images and noisy
environment.
3. Daubechies wavelets
This section provides a brief introduction to Daubechies wavelets. This class of wavelets
includes members ranging from highly localized to highly smooth – Daubechies-2 (DAUB2
with two coefficients) to Daubechies-20 (DAUB20 with 20 coefficients) and also provides
excellent performance in image compression (Daubechies, 1992). The Daubechies wavelet
coefficients are based on computing wavelet coefficients, $C_n$ (where n = 0, 1, 2, ..., N-1 and N
is the number of coefficients) to satisfy the following conditions (Mallat, 1998):
1. The conservation of area under a finite length signal x(t): $\sum_n C_n = 2$
2. The accuracy conditions: $\sum_n (-1)^n n^m C_n = 0$ (where m = 0, 1, 2, ..., p-1 and p = N/2)
3. The perfect reconstruction conditions: $\sum_n C_n^2 = 2$ and $\sum_n C_n C_{n+2m} = 0$
Then the low-pass filter is $h(n) = C_n/\sqrt{2}$ and the high-pass filter is $g(n) = (-1)^{n+1} h(N - 1 - n)$.
One of the simplest and most localized members is the DAUB6 which has six coefficients:
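$$C_0 = \frac{1 + z_1 + z_2}{16\sqrt{2}}, \quad C_1 = \frac{5 + z_1 + 3z_2}{16\sqrt{2}}, \quad C_2 = \frac{10 - 2z_1 + 2z_2}{16\sqrt{2}},$$

$$C_3 = \frac{10 - 2z_1 - 2z_2}{16\sqrt{2}}, \quad C_4 = \frac{5 + z_1 - 3z_2}{16\sqrt{2}}, \quad C_5 = \frac{1 + z_1 - z_2}{16\sqrt{2}} \tag{1}$$

where $z_1 = \sqrt{10}$ and $z_2 = \sqrt{5 + 2\sqrt{10}}$. For an 8x8 input data, the DAUB6 forward transform
matrix (using an assumption of periodicity) is shown in Eq. (2):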
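$$\mathbf{W}_6(C) = \begin{bmatrix}
C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & 0 & 0 \\
C_5 & -C_4 & C_3 & -C_2 & C_1 & -C_0 & 0 & 0 \\
0 & 0 & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 \\
0 & 0 & C_5 & -C_4 & C_3 & -C_2 & C_1 & -C_0 \\
C_4 & C_5 & 0 & 0 & C_0 & C_1 & C_2 & C_3 \\
C_1 & -C_0 & 0 & 0 & C_5 & -C_4 & C_3 & -C_2 \\
C_2 & C_3 & C_4 & C_5 & 0 & 0 & C_0 & C_1 \\
C_3 & -C_2 & C_1 & -C_0 & 0 & 0 & C_5 & -C_4
\end{bmatrix} \tag{2}$$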
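Fig. 1. Signal flow graph of the conventional FP implementation of Eq. (2): inputs x0 to x5, each multiplied by a low-pass and a high-pass coefficient, producing the outputs LOW0 and HIGH0.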
The structure of the matrix uses the set of coefficients {C0, C1, ..., C5} as a smoothing filter
(low-pass) and the set {C5, -C4, ..., -C0} as a non-smoothing filter (high-pass). The DWT is
invertible and orthogonal - the inverse transform, when viewed as a matrix, is simply the
transpose of the forward transform matrix. So, basically we need only 2 sets of multiply-
accumulate (MAC) cells each containing 6 multipliers and 5 adders where partial
products are computed separately and subsequently added. However, it can be seen that,
by introducing additional control circuitry, the same multipliers can be used for both low-
pass and high-pass filtering. As a result, the number of multipliers can be reduced to 6
instead of 12. Fig. 1 shows the signal flow graph of the conventional finite-precision (FP)
implementation of Eq. (2), where xi is the input data vector. Since all the coefficients are
fixed, for a fixed precision, we can in fact replace all the multipliers by adders and
shifters. As a result, the total equivalent additions required to compute the 1-D DAUB6
filter is 44.
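The orthogonality claim can be checked numerically with a small sketch. It assumes the alternating sign pattern C5, -C4, C3, -C2, C1, -C0 for the high-pass rows (the common quadrature-mirror convention) and periodic wrap-around; the helper name `daub6_matrix` is illustrative only.

```python
import math

# DAUB6 coefficients from Eq. (1).
z1 = math.sqrt(10)
z2 = math.sqrt(5 + 2 * math.sqrt(10))
scale = 16 * math.sqrt(2)
C = [
    (1 + z1 + z2) / scale,
    (5 + z1 + 3 * z2) / scale,
    (10 - 2 * z1 + 2 * z2) / scale,
    (10 - 2 * z1 - 2 * z2) / scale,
    (5 + z1 - 3 * z2) / scale,
    (1 + z1 - z2) / scale,
]
low = C
# Assumed sign convention for the high-pass filter: reversed, alternating signs.
high = [(-1) ** n * C[5 - n] for n in range(6)]


def daub6_matrix(size=8):
    """Periodic forward transform matrix: low/high row pairs shifted by 2."""
    rows = []
    for shift in range(0, size, 2):
        for taps in (low, high):
            row = [0.0] * size
            for n, c in enumerate(taps):
                row[(shift + n) % size] += c   # wrap around (periodicity)
            rows.append(row)
    return rows


W = daub6_matrix()
# W * W^T should be the identity if the inverse is simply the transpose.
gram = [[sum(a * b for a, b in zip(r, s)) for s in W] for r in W]
max_dev = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(8) for j in range(8))
print(max_dev)
```

The deviation from the identity stays at floating-point rounding level, which is consistent with the inverse transform being the transpose of the forward matrix.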
4. AIQ-based algorithm
An algebraic integer (AI) is a number that is a root of a monic polynomial with
integer coefficients (Wahid et al., 2004). As an example, let $\omega = e^{2\pi j/16}$ denote a primitive 16th
root of unity over the ring of complex numbers. Then $\omega$ satisfies the equation $x^8 + 1 = 0$.
The ring $Z[\omega]$ can be regarded as consisting of polynomials in $\omega$ of degree up to 7 with integer
coefficients. The elements of $Z[\omega]$ are added and multiplied as polynomials, except that the
rule $\omega^8 = -1$ is used in the product to reduce the degree of powers to below 8.
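The multiplication rule can be sketched with integer coefficient lists. The function names are illustrative; an element is stored as eight integers, the coefficients of the powers 0 to 7 of the 16th root of unity.

```python
import cmath


def zomega_mul(a, b):
    """Multiply two ring elements given as 8 integer coefficients each.

    Powers of 8 or more are folded back using omega**8 = -1, so the
    result again has exactly 8 integer coefficients.
    """
    out = [0] * 8
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < 8:
                out[k] += ai * bj
            else:
                out[k - 8] -= ai * bj   # omega**k = -omega**(k - 8)
    return out


def zomega_value(a):
    """Evaluate an element numerically (used only for cross-checking)."""
    omega = cmath.exp(2j * cmath.pi / 16)
    return sum(c * omega ** k for k, c in enumerate(a))


x = [3, 0, 1, 0, 0, 0, 0, 2]
y = [1, -1, 0, 0, 0, 4, 0, 0]
product = zomega_mul(x, y)
print(product)
```

All intermediate values remain integers, so no rounding occurs until an element is finally converted to a numeric value.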
In summary, algebraic integers of an extension of degree n can be assumed to be of the
form:

$$a_0\omega_0 + a_1\omega_1 + \cdots + a_{n-1}\omega_{n-1} \tag{3}$$

where $\{\omega_0, \omega_1, \ldots, \omega_{n-1}\}$ is called the AI basis and the coefficients $a_i$ are integers. The process
of mapping with AI is known as Algebraic Integer Quantization (AIQ).
The AIQ technique is useful in computing discrete transforms as first explored by Cozzens
and Finkelstein (Cozzens & Finkelstein, 1985). In their work, the algebraic integer number
representation, in which the signal sample is represented by a set of (typically four to eight)
small integers, combines with the Residue Number System (RNS) to produce processors
composed of simple parallel channels. The analog samples must first be quantized into the
algebraic integer representation and the final algebraic integer result converted back to an
analog or digital form. In between these two conversions, the algebraic integer
representation must be converted into and out of two levels of RNS parallelism.
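$$f(z_1, z_2) = \sum_{i=0}^{1}\sum_{j=0}^{1} a_{ij}\, z_1^i z_2^j = a_{00} + a_{10} z_1 + a_{01} z_2 + a_{11} z_1 z_2 \tag{4}$$

So the corresponding coefficients, $a_{ij}$, are encoded in the form $\begin{bmatrix} a_{00} & a_{10} \\ a_{01} & a_{11} \end{bmatrix}$. Then all the
DAUB6 coefficients are exactly encoded (scaled by $16\sqrt{2}$) and shown below in Eq. (5):

$$C_0 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \quad C_1 = \begin{bmatrix} 5 & 1 \\ 3 & 0 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 10 & -2 \\ 2 & 0 \end{bmatrix},$$

$$C_3 = \begin{bmatrix} 10 & -2 \\ -2 & 0 \end{bmatrix}, \quad C_4 = \begin{bmatrix} 5 & 1 \\ -3 & 0 \end{bmatrix}, \quad C_5 = \begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix} \tag{5}$$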
The obvious advantages of this approach are: a) Very small dynamic range (numbers
ranging from 0 to 10); b) Multiplication by a constant is very easy and efficient (only 1
addition is required in most cases; so, multiplication can be eliminated by add/shift
algorithm); and c) We have 3 parallel channels through which data flows independently
(since a11 is zero for all) and also a very simple scheduling is needed. No quantization errors
would be incurred. Fig. 2 shows the signal flow graph of the AIQ-based scheme that
requires 42 adders.
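The three-channel idea can be illustrated in software. This is a sketch under stated assumptions, not the hardware design of Fig. 2: each DAUB6 coefficient is stored as the integer triple (a00, a10, a01) multiplying 1, z1 and z2, and the function names are hypothetical.

```python
import math

# Integer triples (a00, a10, a01) for C0..C5, scaled by 16*sqrt(2);
# a11 is zero for every coefficient, so it is omitted.
AIQ_DAUB6 = [(1, 1, 1), (5, 1, 3), (10, -2, 2),
             (10, -2, -2), (5, 1, -3), (1, 1, -1)]


def aiq_lowpass(samples):
    """Accumulate one low-pass output in three independent integer channels."""
    acc = [0, 0, 0]
    for x, coeff in zip(samples, AIQ_DAUB6):
        for k in range(3):
            acc[k] += x * coeff[k]      # integer arithmetic only, no rounding
    return tuple(acc)


def final_reconstruction(acc):
    """Final step: the only place where the irrational values are applied."""
    z1 = math.sqrt(10)
    z2 = math.sqrt(5 + 2 * z1)
    return (acc[0] + acc[1] * z1 + acc[2] * z2) / (16 * math.sqrt(2))


pixels = [12, 200, 37, 90, 255, 4]
channels = aiq_lowpass(pixels)
print(channels, final_reconstruction(channels))
```

Because the channels stay integer until the last step, approximation error enters only when z1 and z2 are substituted at the end.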
The encoding scheme can be easily applied to DAUB4 coefficients. In that case, we need a
2nd degree polynomial of one variable:
$$f(z_3) = a_0 + a_1 z_3 + a_2 z_3^2 \tag{6}$$

$$C_0 = 1 + z_3; \quad C_1 = 3 + z_3; \quad C_2 = 3 - z_3; \quad C_3 = 1 - z_3 \tag{7}$$
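As a quick check of the DAUB4 encoding, the sketch below rebuilds the filter from integer pairs. It assumes z3 = sqrt(3) and the standard DAUB4 normalization by 4*sqrt(2); the scale factor is not stated in this passage and is an assumption taken from the usual definition of the filter.

```python
import math

# Integer pairs (a0, a1) for C0..C3 of Eq. (7): C = a0 + a1 * z3.
AIQ_DAUB4 = [(1, 1), (3, 1), (3, -1), (1, -1)]


def daub4_coefficients():
    """Numeric DAUB4 low-pass taps from the integer encoding (assumed scale)."""
    z3 = math.sqrt(3)
    scale = 4 * math.sqrt(2)        # assumption: standard DAUB4 normalization
    return [(a0 + a1 * z3) / scale for a0, a1 in AIQ_DAUB4]


h = daub4_coefficients()
print(h)
print(sum(h), sum(c * c for c in h), h[0] * h[2] + h[1] * h[3])
```

With this scaling the taps sum to sqrt(2), have unit energy, and are orthogonal to their shift by two samples.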
Fig. 3. Stages in computation process (error induced in shaded blocks): (a) FP approach; (b)
AIQ approach.
Moreover, the error introduced at the final stage in AIQ approach can be further minimized
using higher precision AIQ multipliers. We have performed an error analysis that shows the
error incurred for different bit-length of the AIQ multipliers (in Fig. 4). The error is
computed taking a multiplier of 16-bit width as reference. The signed digit representation
(for 8-bits) is shown below in Eq. (8):
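$$z_1 = \sqrt{10} \approx 11.001010 = 3 + 2^{-3} + 2^{-5}$$

$$z_2 = \sqrt{5 + 2\sqrt{10}} \approx 100.\bar{1}0\bar{1}00 = 4 - 2^{-1} - 2^{-3} \tag{8}$$

$$z_3 = \sqrt{3} \approx 10.0\bar{1}00\bar{1}0 = 2 - 2^{-2} - 2^{-5}$$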
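The signed-digit values can be evaluated directly to see how close the 8-bit forms are. In this sketch each digit is 1, 0 or -1 (a digit of -1 marks a subtracted power of two); the helper name `signed_digit_value` is illustrative.

```python
import math


def signed_digit_value(int_digits, frac_digits):
    """Value of a radix-2 signed-digit number with digits in {-1, 0, 1}."""
    value = 0.0
    for power, d in enumerate(reversed(int_digits)):
        value += d * 2 ** power
    for power, d in enumerate(frac_digits, start=1):
        value += d * 2 ** -power
    return value


z1_sd = signed_digit_value([1, 1], [0, 0, 1, 0, 1, 0])        # 3 + 2^-3 + 2^-5
z2_sd = signed_digit_value([1, 0, 0], [-1, 0, -1, 0, 0])      # 4 - 2^-1 - 2^-3
z3_sd = signed_digit_value([1, 0], [0, -1, 0, 0, -1, 0])      # 2 - 2^-2 - 2^-5

exact = (math.sqrt(10), math.sqrt(5 + 2 * math.sqrt(10)), math.sqrt(3))
for approx, true in zip((z1_sd, z2_sd, z3_sd), exact):
    print(approx, true, abs(approx - true))
```

Each multiplier needs only two additions or subtractions of shifted operands, and the absolute error of each 8-bit form is below 0.02.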
Fig. 4. Computation error (×10^-3) versus multiplier precision (6 to 14 bits) for z1, z2 and z3.
5. Performance evaluation
The AIQ-based algorithm to compute the Daubechies Wavelet Transform is intended to
be used in applications where the quality of image reconstruction is critical, such as
biomedical imaging, telemedicine, capsule endoscopy (Wahid et al., 2008), etc. Due to the
error-free nature of integer mapping, the AIQ approach results in a much better
reconstruction compared to conventional binary approach. Here, we present the results of
our study, where we apply the scheme to several standard benchmark and medical
images, such as endoscopic, ultrasound, x-ray, CT-scan images and evaluate the
performance.
The section is divided into four sub-sections. In the first section, we evaluate the
performance of the AIQ scheme for standard images followed by the analysis of medical
images. Next, we show the performance of the scheme in a noisy environment. Finally, the
results are compared with existing works related to medical image compression. In all these
cases, we have used peak-signal-to-noise-ratio (PSNR) as the visual quality assessment
index which is given by Eq. (9):
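$$\text{PSNR} = 20\log_{10}\left(\frac{255}{\sqrt{\dfrac{1}{MN}\displaystyle\sum_{n=1}^{N}\sum_{m=1}^{M}\left(x_{m,n} - x'_{m,n}\right)^2}}\right) \tag{9}$$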
where M and N are the image width and height, respectively, and x and x' are the original and
reconstructed component values, respectively.
Fig. 5. Performance analysis of DAUB4 – FP vs. AIQ: (a) PSNR (dB) vs. bit precision; (b) PSNR (dB) vs.
hardware cost (cost of implementation).
Fig. 6. Performance analysis of DAUB6 – FP vs. AIQ: (a) PSNR (dB) vs. bit precision; (b) PSNR (dB) vs.
hardware cost (cost of implementation).
The same kind of superiority is seen for DAUB6 (Figure 6). So, not only is the
image reconstruction quality improved, but the hardware cost is also reduced. Fig. 7 shows the
image reconstruction for an 8-bit FP vs. a 14-bit AIQ implementation for DAUB4. The difference
in PSNR is about 45 dB, and the level of improvement is quite noticeable.
Fig. 9. (a) Original x-ray image; (b) Reconstructed image using FP scheme; (c) AIQ scheme.
Fig. 10. (a) Original CT-scan image; (b) Reconstructed image using FP scheme; (c) AIQ
scheme.
Fig. 11. (a) Original US image; (b) Reconstructed image using FP scheme; (c) AIQ scheme.
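PSNR (dB):

| Noise | Algorithm | Goldhill | US | X-ray | Endoscopic | CT |
|---|---|---|---|---|---|---|
| Gaussian | D4 - FP | 43.9 | 47.5 | 43.1 | 44.6 | 44.4 |
| Gaussian | D4 - AIQ | 51.4 | 55.5 | 50.0 | 51.6 | 52.4 |
| Gaussian | D6 - FP | 40.6 | 44.3 | 39.7 | 41.3 | 41.1 |
| Gaussian | D6 - AIQ | 49.9 | 53.2 | 49.3 | 50.6 | 49.7 |
| Poisson | D4 - FP | 44.1 | 47.6 | 43.2 | 44.8 | 44.6 |
| Poisson | D4 - AIQ | 51.6 | 55.6 | 50.0 | 51.8 | 52.5 |
| Poisson | D6 - FP | 40.7 | 44.5 | 39.8 | 41.5 | 41.3 |
| Poisson | D6 - AIQ | 50.1 | 53.4 | 49.3 | 50.9 | 49.9 |

Table 4. Comparative analysis (in terms of PSNR in dB) of the AIQ scheme.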
6. Conclusion
In this chapter, we have presented an efficient approach to compute Daubechies wavelet
transforms that is based on encoding the basis set of forward transform coefficients using
algebraic integers. The AIQ approach not only reduces the number of arithmetic operations,
but also reduces the dynamic range of the computations. Because of error-free mapping in
the earlier stages, less error is introduced in the system, as compared to FP implementation,
that results in much better data reconstruction. The performance is validated using standard
and medical images in both normal and noisy conditions. In all cases, the AIQ-based
approach outperforms the conventional FP scheme by a wide margin. The rate of data recovery
is very high while critical information is preserved, which makes the scheme suitable for
medical and telemetric imaging applications.
7. Acknowledgement
The author would like to acknowledge the Natural Science and Engineering Research
Council of Canada (NSERC) for its support to this research work. The author is also
indebted to the Canadian Microelectronics Corporation (CMC) for providing the hardware
and software infrastructure used in the development of this design.
8. References
Acharyya, A., Maharatna, K., Al-Hashimi, B., Gunn, S. (2009), Memory reduction
methodology for distributed arithmetic based DWT/IDWT exploiting data
symmetry, IEEE Trans. on Circuits and Systems II, vol. 56, no. 4, pp. 285-289.
Aware Inc., (1991) Aware Wavelet Transform Processor (WTP) Preliminary, Cambridge, MA.
Cozzens, J. and Finkelstein, L. (1985) Computing the Discrete Fourier Transform using
Residue Number Systems in a Ring of Algebraic Integers, IEEE Transactions on
Information Theory, vol. 31, pp. 580-588.
Daubechies, I., (1992) Ten lectures on wavelets, SIAM, 1992.
Huang, C., Tseng, P., and Chen, L., (2004) Flipping structure: an efficient VLSI architecture
for lifting based discrete wavelet transform, IEEE Trans. Signal Process., vol. 52, no.
4, pp. 1080–1089.
Lai, Y., Chen, L., Shih, Y. (2009) A high-performance and memory-efficient VLSI architecture
with parallel scanning method for 2-D lifting-based discrete wavelet transform,
IEEE Trans. on Consumer Electronics, vol. 55, no. 2, pp. 400 – 407.
Lee, S. and Lim, S., (2006) VLSI design of a wavelet processing core, IEEE Trans. Cir. Syst.
Video Tech., vol. 16, pp. 1350-1360.
Lewis, A. and Knowles, G. (1991) VLSI Architecture for 2D Daubechies Wavelet Transform
without Multipliers, IEE Electronics Letters, vol. 27, no. 2, pp. 171-173.
Mallat, S., (1998) A wavelet tour of signal processing, New York: Academic.
Martina, M. and Masera, G., (2007) Multiplierless, folded 9/7-5/3 wavelet VLSI
architecture, IEEE Trans. Cir. Syst. II, vol. 54, pp. 770-774.
Mohammed, U. (2008), Highly scalable hybrid image coding scheme, Digital Signal
Processing, Science Direct, vol. 18, pp. 364–374.
Mohammed, U. and Abd-elhafiez, W. (2010), Image coding scheme based on object
extraction and hybrid transformation technique, Int. J. of Engineering Science and
Technology, vol. 2, no. 5, pp. 1375–1383.
Paek, S. and Kim, L. (1998) 2D DWT VLSI Architecture for Wavelet Image Processing, IEE
Electronics Letters, vol. 34, no. 5, pp. 537-538.
Parhi, K. and Nishitani, T. (1993) VLSI Architectures for Discrete Wavelet Transforms, IEEE
Transactions on VLSI Systems, vol. 1, no. 2, pp. 191-202.
Seo, Y. and Kim, D. (2007) VLSI architecture of line-based lifting wavelet transform for
motion JPEG2000, IEEE J. Solid-State Circuits, vol. 42, no. 2, pp. 431-440.
Shi, G., Liu, W., Zhang, L., Li, F. (2009) An efficient folded architecture for lifting-based
discrete wavelet transform, IEEE Trans. on Circuits and Systems II, vol. 56, no. 4,
pp. 290-294.
Vishwanath, M., Owens, R. and Irwin, M. (1995) VLSI Architectures for the Discrete Wavelet
Transform, IEEE Transactions on Circuits and Systems - II, vol. 42, no. 5, pp. 305-
316.
Wahid, K., Dimitrov, V., Jullien, G. (2004) VLSI architectures of daubechies wavelet
transforms using algebraic integers, J. of Circuits, Sys., and Comp., vol. 13, no.6,
pp. 1251-1270.
Wahid, K., Dimitrov, V., Jullien, G. and Badawy, W. (2003) Error-Free computation of
daubechies wavelets for image compression applications, Elect. Lett., vol. 39, no. 5,
pp. 428-429.
Wahid, K., Ko, SB. and Teng, D., (2008) Efficient hardware implementation of an image
compressor for wireless capsule endoscopy applications, Proc. of the IEEE Int. Joint
Conf. on Neural Net., pp. 2762-2766.
Yu, T., and Mitra, S. (1997), Wavelet based hybrid image coding scheme, Proc. IEEE Int
Circuits and Systems Symp, vol. 1, pp. 377–380.
9
1. Introduction
Human life is closely tied to signals. These signals are present everywhere - listening to music
is possible because of audible sound signals traveling through air, reading a book is feasible
due to light waves bouncing off objects and interpreted by our bodies as visual images,
electromagnetic waves allow us to communicate through the radio or wireless Internet.
Signal Processing is an area of electrical engineering and applied mathematics that deals with
either continuous or discrete signals. Particularly, Image Processing is any kind of Signal
Processing where the input is an image, such as a digital photograph. The underlying
essence of Image Processing lies in understanding the concept of what is an image and
studying techniques for the manipulation of images with the use of a computer. While these
explanations may seem quite generic, the importance of Image Processing in the modern
world is undeniable and progress in this field is very desirable.
1.1 Images
The concept of an image can initially be mathematically defined as a function f : S → C
that goes from a certain space S (such as R², for instance) to a space C of colors that can be
perceived by the human eye. This definition does not exhaust all of the possible meanings of
this word, but will be enough for this chapter. When working on a computer, however, both
the domain and codomain of the image function must be discrete. The most common
representation of an image in Image Processing thus consists of taking a discrete subset S′ of S
and a function that associates the values of S′ to a certain subset C′ of C. In this way, an
image I can be thought of as a discrete function I : S′ → C′.
In this work and in Image Processing in general, the kind of image we are most interested in is
a digital image, usually obtained through a digital camera or generated by a computer. As in the
previous mathematical definition, digital images are discrete, that is, they are composed of a
finite number of elements. A digital image can be thought of as a mosaic of colors taken from a
certain set. In mathematical terms, a digital image can be represented via a matrix M ∈ M_{n,m},
composed of numbers that represent colors that can be shown by modern electronic devices,
such as televisions, computer monitors and projectors. Each element of this matrix is called a
pixel (this name comes from the words 'picture element').
It is important to understand the concept of color. Initially, color is a sensation produced by
the human brain when it receives certain visual stimuli. This input is given by electromagnetic
radiation (light) within a range of wavelengths called the visible spectrum. A typical human
eye will respond to wavelengths from about 390 to 750 nm. Theoretically speaking, the space
of all visible colors, as given by their wavelengths is of infinite dimension, and thus not fit for
a computer. This limitation is bypassed through the study of the human vision.
Scrutiny of the human eyes shows that they contain two different kinds of photo-receptor cells
that allow vision. These cells are rods and cones. Rods are very sensitive to light, being mostly
responsible for night vision and have little, if any, role in color vision. Cones on the other
hand are of three types (Short, Medium and Long), each covered in a different photo-sensitive
pigment. These pigments respond differently to incoming light wavelengths. A chart showing
the response of each kind of cone to light can be seen below in Figure 1.
By using the knowledge above, modern visual devices are built so that they emit light at only
three different wavelengths, specifically suited to excite each cone in a known way. This allows
devices to create a wide range of visible colors. While it is not possible to re-create all possible
color sensations using only these three colors, the difference when using modern technology
is mostly imperceptible. Thus we have arrived at the discretization of the color space used
for digital images. These colors can now be codified as certain finite amounts taken in small
intervals of these three primary colors. A schematic of a digital image can be seen in Figure 2.
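The codification described above can be sketched as an array with one axis per image dimension and a last axis for the three primary colors. This is only an illustration: the 2 x 2 image and the choice of 8 bits per channel are assumptions, not values taken from the text.

```python
import numpy as np

# Hypothetical 2 x 2 color image; the last axis holds the (R, G, B) amounts.
# 8 bits per channel is an assumption, giving 256 levels per primary color.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = (255, 0, 0)      # pure red pixel
image[0, 1] = (0, 255, 0)      # pure green pixel
image[1, 0] = (0, 0, 255)      # pure blue pixel
image[1, 1] = (255, 255, 0)    # red + green, perceived as yellow

levels_per_channel = np.iinfo(image.dtype).max + 1   # finite amounts per primary
n_colors = levels_per_channel ** 3                   # size of the discretized color space
```

With this encoding the color space is finite, which is what makes the matrix representation of a digital image possible.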
such as ends and bifurcations of said ridges. Figure 3 shows a program extracting information
from a finger photograph and Figure 4 shows a fingerprint recognition device being used.
Fig. 7. An example of image segmentation. The frog is being segmented from the
background.
Some more information on interesting applications of this field and otherwise can be found in
(Acharya & Ray, 2005).
There are plenty of uses of wavelets in image processing. For example, in 1994 (Fröhlich &
Weickert, 1994) presented an algorithm to solve a nonlinear diffusion equation in a wavelet
basis. This equation has the property of edge enhancement, an important feature for image
processing. More applications in edge detection are shown later in this chapter. The JPEG
2000 image coding system (from the Joint Photographic Experts Group) uses compression
techniques based on wavelets. In (Walker, 2003) the author describes a wavelet-based
technique for image denoising. Applications of the wavelet transform to detect cracks in
frame structures are presented by (Ovanesova & Suárez, 2004). Wavelet transforms have an
important role in multiresolution representations in order to effectively analyze the content of
images. Multiresolution will be introduced later in this chapter.
Unlike the Fourier transform, the wavelet transform can capture both frequency and location
information.
A wavelet is a function $\psi \in L^2(\mathbb{R})$ with a zero average:
\[ \int_{-\infty}^{+\infty} \psi(t)\,dt = 0 \tag{1} \]
This function is normalized, $\|\psi\| = 1$, and centered in the neighborhood of $t = 0$. A family of
time-frequency atoms is obtained by scaling $\psi$ by $s$ and translating it by $u$:
\[ \psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right) \tag{2} \]
Thus, the Continuous Wavelet Transform (CWT) of a function f at a scale s > 0 and translated
by u ∈ R can be written as:
\[ W f(u, s) = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{s}}\,\psi^{*}\!\left(\frac{t-u}{s}\right) dt \tag{3} \]
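Equations 1 to 3 can be checked numerically. The sketch below is an illustration only: the Mexican hat wavelet, the sampling grid and the function names are assumptions chosen for the example, and the integrals are approximated by Riemann sums.

```python
import numpy as np

def mexican_hat(t):
    """Real, zero-average wavelet (second derivative of a Gaussian), L2-normalized."""
    c = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return c * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def cwt_point(f, t, u, s):
    """Riemann-sum approximation of W f(u, s) on the uniform grid t."""
    dt = t[1] - t[0]
    atom = mexican_hat((t - u) / s) / np.sqrt(s)   # psi_{u,s}(t); real, so no conjugate needed
    return np.sum(f * atom) * dt

t = np.linspace(-20.0, 20.0, 8001)
dt = t[1] - t[0]
psi = mexican_hat(t)
mean_psi = np.sum(psi) * dt                     # zero average, Equation 1
norm_psi = np.sqrt(np.sum(psi ** 2) * dt)       # unit norm
atom = mexican_hat((t - 3.0) / 2.0) / np.sqrt(2.0)
norm_atom = np.sqrt(np.sum(atom ** 2) * dt)     # the 1/sqrt(s) factor preserves the norm
w_const = cwt_point(np.ones_like(t), t, u=0.0, s=1.0)   # a constant signal carries no detail
```

The factor $1/\sqrt{s}$ is what keeps every atom at unit norm, so coefficients at different scales are comparable.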
In the field of image processing we are interested in wavelets which form a basis of $L^2(\mathbb{R}^2)$
to represent images. If we have an orthonormal wavelet basis of $L^2(\mathbb{R})$ given by $\psi$ with the
scaling function $\phi$, we can use
\[ \begin{aligned}
\psi^1(x_1, x_2) &= \phi(x_1)\,\psi(x_2),\\
\psi^2(x_1, x_2) &= \psi(x_1)\,\phi(x_2),\\
\psi^3(x_1, x_2) &= \psi(x_1)\,\psi(x_2),
\end{aligned} \tag{4} \]
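The separable construction of Equation 4 can be illustrated on the smallest discrete case. The sketch assumes the Haar scaling and wavelet vectors sampled on two points; it builds the three products with outer products and checks that, together with the 2-D scaling function, they are orthonormal.

```python
import numpy as np

# 1-D Haar scaling and wavelet vectors on two samples (an illustrative choice).
phi = np.array([1.0, 1.0]) / np.sqrt(2.0)
psi = np.array([1.0, -1.0]) / np.sqrt(2.0)

# Separable products as in Equation 4; rows index x1, columns index x2.
psi1 = np.outer(phi, psi)    # phi(x1) psi(x2)
psi2 = np.outer(psi, phi)    # psi(x1) phi(x2)
psi3 = np.outer(psi, psi)    # psi(x1) psi(x2)
phi2d = np.outer(phi, phi)   # 2-D scaling function phi(x1) phi(x2)

# Stack the four 2 x 2 atoms as rows of a 4 x 4 matrix and form the Gram matrix.
basis = np.stack([a.ravel() for a in (phi2d, psi1, psi2, psi3)])
gram = basis @ basis.T       # identity matrix if the atoms are orthonormal
```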
The DWT can then be written as a circular convolution with $\bar{\psi}_j[n] = \psi_j^{*}[-n]$:
\[ W f(n, a^j) = \sum_{m=0}^{N-1} f[m]\,\psi_j^{*}[m-n] = f \circledast \bar{\psi}_j[n] \tag{7} \]
\[ L f[n, a^J] = \sum_{m=0}^{N-1} f[m]\,\phi_J^{*}[m-n] = f \circledast \bar{\phi}_J[n] \tag{9} \]
As Equations 6 and 9 show, the DWT is a circular convolution. We therefore have lowpass and
highpass filters which form a filter bank. Figure 9 shows the discrete wavelet transform for
3 scales, where hψ(n) is a highpass filter and hφ(n) is a lowpass filter. This form
is known as the Fast Wavelet Transform (FWT).
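One stage of such a filter bank can be sketched as follows. The Haar filters, the test signal and the helper name `circular_correlate` are assumptions made for the illustration; the helper evaluates the circular sum in the form written in Equation 7, and each filtered signal is then downsampled by two.

```python
import numpy as np

def circular_correlate(f, h):
    """Return c[n] = sum_m f[m] h[(m - n) mod N], the sum form used in Equation 7."""
    N = len(f)
    h_pad = np.zeros(N)
    h_pad[:len(h)] = h
    return np.array([np.dot(f, np.roll(h_pad, n)) for n in range(N)])

h_phi = np.array([1.0, 1.0]) / np.sqrt(2.0)    # Haar lowpass filter
h_psi = np.array([1.0, -1.0]) / np.sqrt(2.0)   # Haar highpass filter

f = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx = circular_correlate(f, h_phi)[::2]     # lowpass, then keep every second sample
detail = circular_correlate(f, h_psi)[::2]     # highpass, then keep every second sample

# Inverse step for Haar: each (approx, detail) pair restores two input samples.
rec = np.empty_like(f)
rec[0::2] = (approx + detail) / np.sqrt(2.0)
rec[1::2] = (approx - detail) / np.sqrt(2.0)
```

Applying the same stage again to `approx` gives the next scale, which is how the multi-scale structure of the FWT arises.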
\[ W_\phi(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x_1=0}^{M-1} \sum_{x_2=0}^{N-1} f(x_1, x_2)\,\phi_{j_0,m,n}(x_1, x_2) \]
\[ W_\psi^{i}(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x_1=0}^{M-1} \sum_{x_2=0}^{N-1} f(x_1, x_2)\,\psi^{i}_{j,m,n}(x_1, x_2) \tag{10} \]
where $i = \{1, 2, 3\}$.
As in Figure 9, the FWT in two dimensions can be expressed as shown in Figure 10.
Fig. 11. Results of lowpass and highpass filters. The first image is the original, the second is
the result of a lowpass filter and the third is the result of a highpass filter
• Regularity
The subspace V0 is generated as the linear combination of integer shifts of one or a
finite number of generating functions φ1 ,. . . ,φr . These generating functions are called
scaling functions. Usually those functions must have compact support and be piecewise
continuous.
• Completeness
Those nested subspaces fill the whole space L2 (R ), and they are not too redundant. So, the
intersection of these subspaces should only contain the zero element.
This concept, applied to image processing and wavelets, justifies the successful use of image
pyramids in the context of high frequency detection.
The Daubechies family is of particular interest because it is fractal in nature, and the Haar
family, although very simple, can be very useful in many applications.
In practical terms, the base of the pyramid is the image which we want to filter in various
scales, and each level of the pyramid above the base is produced by filtering it and generating
an image with half of its width and height.
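The pyramid construction can be sketched with the simplest filter. The code below is an illustration under stated assumptions: it uses the Haar filter, the function names `haar_dwt2` and `pyramid` are hypothetical, and the image sides are assumed to be divisible by two at every scale.

```python
import numpy as np

def haar_dwt2(img):
    """One 2-D Haar step: returns the scale band and three detail bands."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0      # lowpass in both directions
    d1 = (a - b + c - d) / 2.0      # differences between neighbouring columns
    d2 = (a + b - c - d) / 2.0      # differences between neighbouring rows
    d3 = (a - b - c + d) / 2.0      # diagonal differences
    return ll, (d1, d2, d3)

def pyramid(img, n_scales):
    """Apply haar_dwt2 repeatedly to the scale band; the base is the input image."""
    levels = [np.asarray(img, dtype=float)]
    details = []
    for _ in range(n_scales):
        ll, det = haar_dwt2(levels[-1])
        levels.append(ll)
        details.append(det)
    return levels, details

img = np.arange(64, dtype=float).reshape(8, 8)   # hypothetical 8 x 8 test image
levels, details = pyramid(img, 2)
```

Each entry of `levels` has half the width and height of the previous one, and the detail bands hold what is lost between two consecutive levels.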
Using wavelet and scale functions, the nested subspaces of scale and detail are produced. The
horizontal, vertical and diagonal details of a subspace Vi+1 are the information that cannot be
represented in Vi (Figure 13).
⊂ ⊂ ⊂
Fig. 13. Nested subspaces in the context of image processing. The details are represented in
the grey regions (the contrast was enhanced for better visualization).
It is now easy to understand how the discrete wavelet transform can be applied to images.
As we saw in Equation 10, the discrete wavelet transform in two dimensions captures the
variations along rows, columns and diagonals. Figure 14 shows an example of a DWT applied
to an image in 3 scales.
Fig. 14. The resulting decomposition of a blank image using a discrete wavelet transform for 1
and 2 scales (Gonzalez & Woods, 2006).
Section 4.2 describes a method which produces a pyramid of a chosen image and processes the
corresponding details in every scale. This allows us to detect discontinuities in a very precise
and adaptive way.
to combine the tensors obtained at each scale j, where n j is the number of scales and k j ∈ R is
the weight assigned to each scale, given by
\[ k_j = \frac{\sum_{n=1}^{n_p} \mathrm{Trace}(M_{j,n})}{\sum_{k=1}^{n_j} \sum_{n=1}^{n_p} \mathrm{Trace}(M_{k,n})} \tag{18} \]
where n p is the number of pixels and Trace( M j,p ) is the sum of the eigenvalues of M j,p .
The trace represents the amplification driven by the tensor to the unit sphere and is a good
estimator of its importance. Thus, the tensor sum is weighted by the proportion of energy of
each scale in the multiresolution pyramid.
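The weights of Equation 18 can be computed directly from the tensor traces. The sketch below is an illustration: the array layout, the function name `scale_weights` and the random rank-one tensors are assumptions made for the example.

```python
import numpy as np

def scale_weights(tensors):
    """tensors has shape (n_j, n_p, d, d): one d x d tensor per scale and pixel.

    Returns k_j = sum_n Trace(M_{j,n}) / sum_k sum_n Trace(M_{k,n}) (Equation 18).
    """
    traces = np.trace(tensors, axis1=-2, axis2=-1)   # shape (n_j, n_p)
    per_scale = traces.sum(axis=1)                   # energy of each scale
    return per_scale / per_scale.sum()

# Hypothetical data: 2 scales, 3 pixels, rank-one tensors v v^T built from random vectors.
rng = np.random.default_rng(0)
v = rng.normal(size=(2, 3, 3))
tensors = v[..., :, None] * v[..., None, :]
k = scale_weights(tensors)
```

The weights are non-negative and sum to one, so the combined tensor is a convex combination of the per-scale tensors.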
In order to find M j,p in Equation 17, we use bilinear interpolation of the tensor values, relative
to each position p in the initial image, at the subsampled image at scale j to find the resulting
tensor M j,p for each pixel of the initial image. This is depicted in Figure 15, where tensors are
represented as superquadric glyphs whose longer axis shows the main direction.
Note that the tensor presented in Equation 17 is a 3x3 positive symmetric matrix with
real coefficients, and thus we may apply Equation 14. We then find the main orientation
component (spear) of the final orientation tensor for each pixel of the input image. This
component indicates the collinearity of the interpolated tensors and provides interesting
results.
4.2.1 Implementation
The proposed algorithm consists of three main steps: a discrete wavelet transform (Barnard,
1994; Mallat, 1999), a tensor field computation and a weighted sum of the computed tensors.
The whole process is illustrated in Figure 16.
Fig. 15. A tensor is computed for each pixel in original image by a weighted sum of
corresponding tensors in each scale. In this example, two wavelet decompositions are
performed.
Fig. 16. Example of the proposed algorithm using Daubechies1 to decompose the image into
two scales.
The number of scales to be used is a parameter of the algorithm. The DWT splits the image
into three detail components and one scale component in the beginning of each iteration. In
the next iteration, the same process is applied, using the resulting scale component as the
input image.
For each pixel of the input image, its corresponding position at the current scale is computed
with subpixel precision for each resolution. The four nearest pixels in a given resolution are
used to compute the final tensor. The vectors v j,p described in Equation 15 are computed for
each of these pixels and then used to compute four spear type tensors. The final tensor for the
subpixel position is obtained by combining these four tensors with bilinear interpolation. The
pixel tensor is computed by combining the n j tensors as shown in Equation 17.
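The interpolation step can be sketched as follows. This is an illustration under stated assumptions: the function name `bilinear_tensor` is hypothetical, the tensor field is stored as an array of shape (H, W, d, d), and the spear tensors are assumed to be the outer products v vᵀ of the vectors.

```python
import numpy as np

def bilinear_tensor(field, y, x):
    """Interpolate a tensor field of shape (H, W, d, d) at the subpixel position (y, x)."""
    H, W = field.shape[:2]
    y0 = min(int(np.floor(y)), H - 2)    # top-left pixel of the 2 x 2 neighbourhood
    x0 = min(int(np.floor(x)), W - 2)
    ty, tx = y - y0, x - x0              # fractional offsets inside the cell
    return ((1 - ty) * (1 - tx) * field[y0, x0]
            + (1 - ty) * tx * field[y0, x0 + 1]
            + ty * (1 - tx) * field[y0 + 1, x0]
            + ty * tx * field[y0 + 1, x0 + 1])

rng = np.random.default_rng(1)
v = rng.normal(size=(2, 2, 3))                   # hypothetical vectors, one per pixel
field = v[..., :, None] * v[..., None, :]        # assumed spear tensors v v^T
center = bilinear_tensor(field, 0.5, 0.5)        # equal mix of the four tensors
corner = bilinear_tensor(field, 1.0, 0.0)        # falls exactly on a pixel
```

Since the weights are non-negative, the interpolated tensor stays symmetric and positive semidefinite.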
The pixel tensors are decomposed and their eigenvalues are then extracted. The values
λ1 − λ2 are computed and normalized to form the output image. Color images are split
into three monochromatic channels (Red, Green and Blue) and the proposed algorithm is
applied to each channel separately. The tensors for each color channel are summed before
eigen decomposition.
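The final step can be sketched as below. The function name `lambda_gap` and the normalization by the maximum value are assumptions, since the text does not specify how the output is normalized; for a color image, the three channel tensor fields would be summed before calling the function.

```python
import numpy as np

def lambda_gap(tensor_field):
    """Return the normalized lambda1 - lambda2 map of a (H, W, d, d) symmetric tensor field."""
    eigvals = np.linalg.eigvalsh(tensor_field)     # ascending order, shape (H, W, d)
    gap = eigvals[..., -1] - eigvals[..., -2]      # largest minus second largest eigenvalue
    peak = gap.max()
    return gap / peak if peak > 0 else gap         # assumed normalization to [0, 1]

field = np.zeros((1, 2, 3, 3))
v = np.array([3.0, 0.0, 4.0])
field[0, 0] = np.outer(v, v)     # spear tensor: eigenvalues (25, 0, 0)
field[0, 1] = np.eye(3)          # isotropic tensor: eigenvalues (1, 1, 1)
out = lambda_gap(field)
```

A strongly oriented tensor gives a large value, while an isotropic tensor gives zero, which is why this scalar field highlights aligned high-frequency responses.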
The complexity of the whole process is O(n j · n p ), where n j is the number of analyzed scales
and n p the amount of input pixels. Thus, this is an efficient method that can be further
parallelized.
Fig. 17. (a) input image. (b) λ1 − λ2 with Daubechies1 and 1 scale. (c) Daubechies1 and 3
scales. (d) Daubechies3 and 1 scale. (e) Daubechies3 and 3 scales.
A better estimation of soft edge transitions is obtained by changing the analyzing filter from
Daubechies1 to Daubechies3. Figures 17b and 17d illustrate this behavior.
In general, it can be noted that this method highlights high frequencies occurring in the same
region at different scales. We used thermal coloring with smooth transition from blue to red,
where blue means absence of high frequencies, and red means presence of high frequencies.
The green regions also indicate high frequencies, but less intense than those indicated by red
regions. The red regions correspond to the tensors that best estimate the high frequencies.
4.2.3 Conclusion
We presented an overview of discrete wavelets and multiresolution applied to edge detection.
We also presented a method for high frequency assessment visualization using these powerful
tools. The method is based on the DWT decomposition and detail information merging
using orientation tensors. This multiresolution analysis proved suitable for detecting
edges and salient areas in an image. The experimental results show that the high frequency
information can be inferred by varying the DWT filters and number of scales. Coincident
frequencies in space domain are successfully highlighted. By tuning the number of scales, one
may infer texture feature regions. The λ1 − λ2 scalar field is one of the most used orientation
alignment descriptors. However, other relations can be extracted from the final tensors. This
method can be easily parallelized, and the use of technologies like GPGPUs and multicore CPUs
makes it attractive for high performance applications.
5. References
Acharya, T. & Ray, A. K. (2005). Image Processing - Principles and Applications, First Edition,
Wiley InterScience.
Barnard, H. J. (1994). Image and Video Coding Using a Wavelet Decomposition, PhD thesis, Delft
University of Technology, Department of Electrical Engineering, Information Theory
Group, P.O.Box 5031, 2600 GA, Delft.
Belkasim, S., Derado, G., Aznita, R., Gilbert, E. & O’Connell, H. (2007). Multi-resolution
border segmentation for measuring spatial heterogeneity of mixed population
biofilm bacteria, Computerized Medical Imaging and Graphics 32.
Brannock, E. & Weeks, M. (2006). Edge detection using wavelets, Proceedings of the 44th annual
Southeast regional conference, ACM-SE 44, ACM, New York, NY, USA, pp. 649–654.
Burt, P. J. & Adelson, E. H. (1983). The laplacian pyramid as a compact image code, IEEE
Transactions on Communications 31: 532–540.
Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets, Communications
on Pure and Applied Mathematics 41(7): 909–996.
Daubechies, I. (1992). Ten lectures on wavelets, Society for Industrial and Applied Mathematics,
Philadelphia, PA, USA.
de Castro, T. K., de A. Perez, E., Mota, V. F., Chapiro, A., Vieira, M. B. & Freire, W. P. (2009).
High frequency assessment from multiresolution analysis., ICCS (1), Vol. 5544 of
Lecture Notes in Computer Science, Springer, pp. 429–438.
Fröhlich, J. & Weickert, J. (1994). Image processing using a wavelet algorithm for nonlinear
diffusion.
Gonzalez, R. C. & Woods, R. E. (2006). Digital Image Processing (3rd Edition), Prentice-Hall, Inc.,
Upper Saddle River, NJ, USA.
Haar, A. (1911). Zur theorie der orthogonalen funktionensysteme, Mathematische Annalen
71: 38–53. 10.1007/BF01456927.
Han, Y. & Shi, P. (2007). An adaptive level-selecting wavelet transform for texture defect
detection, Image Vision Comput. 25(8): 1239–1248.
Heric, D. & Zazula, D. (2007). Combined edge detection using wavelet transform and signal
registration, Image Vision Comput. 25: 652–662.
Knutsson, H. (1989). Representing local structure using tensors, The 6th Scandinavian
Conference on Image Analysis, Oulu, Finland, pp. 244–251. Report LiTH–ISY–I–1019,
Computer Vision Laboratory, Linköping University, Sweden, 1989.
Koenderink, J. J. (1984). The structure of images, Biological Cybernetics 50(5): 363–370.
Mallat, S. (1999). A Wavelet Tour of Signal Processing, Second Edition (Wavelet Analysis & Its
Applications), Academic Press.
Mallat, S. G. (1989). A theory for multiresolution signal decomposition: the wavelet
representation, IEEE Transactions on Pattern Analysis and Machine Intelligence
11: 674–693.
Max Planck Institut fur Informatik, M. (n.d.). http://www.mpi-inf.mpg.de/resources/hdr/.
Meyer, Y. (1987). Principe d’incertitude, bases hilbertiennes et algèbres d’opérateurs. (The
uncertainty principle, Hilbert base and operator algebras)., Sémin. Bourbaki, 38ème
année, Vol. 1985/86, Exp. Astérisque 145/146, 209-223 (1987).
Ovanesova, A. V. & Suárez, L. E. (2004). Applications of wavelet transforms to damage
detection in frame structures, Engineering Structures 26(1): 39 – 49.
Radunović, D. (2009). Wavelets: From Math to Practice, Springer.
Shih, M. & Tseng, D. (2005). A wavelet-based multiresolution edge detection and tracking,
IVC 23(4): 441–451.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A. & Blake,
A. (2011). Real-time human pose recognition in parts from a single depth image.
Sumengen, B. & Manjunath, B. S. (2005). Multi-scale edge detection and image segmentation,
European Signal Processing Conference (EUSIPCO).
Tremblais, B. & Augereau, B. (2004). A fast multi-scale edge detection algorithm, Pattern
Recogn. Lett. 25: 603–618.
Velho, L., Frery, A. & Gomes, J. (2008). Image Processing for Computer Graphics and Vision, First
Edition, Springer.
Walker, J. S. (2003). Tree-adapted wavelet shrinkage, Vol. 124 of Advances in Imaging and
Electron Physics, Elsevier, pp. 343 – 394.
Westin, C.-F. (1994). A Tensor Framework for Multidimensional Signal Processing, PhD thesis,
Department of Electrical Engineering Linköping University.
Witkin, A. P. (1983). Scale-Space Filtering., 8th Int. Joint Conf. Artificial Intelligence, Vol. 2,
Karlsruhe, pp. 1019–1022.
Zhang, M., Li, X., Yang, Z. & Yang, Y. (2010). A novel zero-crossing edge detection
method based on multi-scale space theory, Signal Processing (ICSP), 2010 IEEE 10th
International Conference on, pp. 1036 –1039.
10
1. Introduction
In recent years, research on multicomponent image compression has expanded, driven by
the development of multispectral and hyperspectral image sensors which supply ever larger
amounts of data. The end-users of such images have also become more numerous, with
varied needs and applications. Future earth observation systems, for instance, will use
multi-, super- and hyper-spectral image sensors with higher resolutions, leading to larger
amounts of transmitted data. However, the channel bandwidth for transmission is limited,
and there is therefore an interest in designing compression systems (onboard and on the
ground) for multicomponent images which are not application dependent and which are
compatible with the diversity of end-users’ needs. The components of a multicomponent
image generally represent the same scene with different views depending on the
wavelength. For data from different sensors, a preliminary step of image registration
is therefore required, as there is a high degree of dependence (or redundancy) between
the various components: the usual spatial redundancy (between different pixels in each
component) and the spectral redundancy (between the components).
During the past two decades, different solutions have been proposed for multicomponent
image coding. A solution currently adopted consists of using two different transformations,
each with the goal of reducing only one of the two redundancies. In (Dragotti et al., 2000),
a 2-D discrete wavelet transform (DWT) is used to reduce the spatial redundancies in each
component while the Karhunen Loève transform (KLT) is applied to reduce the spectral
ones. In that paper, the quantization and entropy coding are achieved thanks to the well
known SPIHT (Set Partitioning in Hierarchical Trees) codec by Said and Pearlman (Said
& Pearlman, 1996) in its original version and in a modified version including VQ (vector
quantization). In the same way, with the use of the 2-D DWT of (Antonini et al., 1992)
(usually called the Daubechies 9/7), the authors of (Vaisey et al., 1998) use a lattice VQ
with a stack run coder as quantization and entropy coding. More recently in (Rucker et al.,
2005), the KLT associated with the Daubechies 9/7 2-D DWT and with EBCOT (Taubman,
2000; Taubman & Marcellin, 2002) for quantizing and entropy coding has been tested on
hyperspectral images with different bit-allocations between components. It is shown that the
Post Compression Rate-Distortion (PCRD) optimizer of EBCOT applied across multiple bands
gives the best rate-distortion performance. Another solution consists of using a 3-D DWT for
reducing both the spatial and spectral redundancies with only one transform. This approach
is generally applied to hyperspectral images as in (Christophe et al., 2006). An overview of
3-D wavelet-based techniques and more can be found in (Fowler & Rucker, 2007). The two
above-mentioned solutions are compatible with the JPEG2000 Part 2 standard. The JPEG2000
standard is well known and widely used today. Moreover, the KLT used in JPEG2000 Part 2
is considered the best existing lossy compression technique for hyperspectral images at
medium and high bit rates (Du & Fowler, 2007; Penna et al., 2007). The KLT amounts to a
Principal Component Analysis (PCA), well known to statisticians, in which all the components
are kept. However, the rather high computational complexity of the KLT hinders its adoption
in practice, especially on satellite platforms, and recent works propose different solutions
to circumvent this problem. One approach consists in reducing the complexity of
the covariance matrix computation. This is done by randomly sampling the entire image in
order to obtain a small sample of the pixels’ population on which the covariance matrix is
computed (Du & Fowler, 2008; Penna et al., 2007). Another approach consists in computing
a kind of KLT average on a set of images (the learning basis) issued from only one sensor
and using it on other images obtained with the same sensor. This sub-optimal transform
is called exogenous KLT in (Thiebaut et al., 2006) and the computational complexity of the
second approach is compatible with satellite platforms. Both approaches are fruitful: the
rate-distortion performance sacrifice compared with the true KLT is very slight, whereas the
computational burden is significantly reduced. In the second approach, the exogenous KLT
matrix is known by the decoder, hence there is no need to transmit it.
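The KLT and the subsampling idea described above can be sketched as follows. This is an illustration only, not the implementation of the cited works: the function name `klt_matrix`, the synthetic data and the sample size are assumptions.

```python
import numpy as np

def klt_matrix(X, n_samples=None, rng=None):
    """Estimate a KLT for X of shape (N components, L pixels).

    If n_samples is given, the covariance is estimated on a random subset of pixels.
    Returns an orthogonal matrix A whose rows are eigenvectors of the covariance,
    sorted by decreasing eigenvalue.
    """
    if n_samples is not None:
        rng = np.random.default_rng() if rng is None else rng
        cols = rng.choice(X.shape[1], size=n_samples, replace=False)
        X = X[:, cols]
    cov = np.cov(X)                          # N x N sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, ::-1].T

rng = np.random.default_rng(0)
mixing = rng.normal(size=(4, 4))
X = mixing @ rng.normal(size=(4, 10000))     # hypothetical correlated components

A_full = klt_matrix(X)                               # covariance on all pixels
A_sub = klt_matrix(X, n_samples=1000, rng=rng)       # covariance on a 10% sample
Y = A_full @ (X - X.mean(axis=1, keepdims=True))     # decorrelated components
cov_Y = np.cov(Y)
```

An exogenous transform in the sense above would correspond to calling `klt_matrix` once on a learning set and reusing the returned matrix for other images of the same sensor.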
It is well known that the KLT can be suboptimal in transform coding when the data are not
Gaussian. Now, under only the high resolution quantization hypothesis, nearly everything
is known about the performance of a transform coding. Nevertheless, the optimal transform
computation is generally considered as a difficult task and the Gaussian assumption is then
used in order to simplify the calculation. Recently, the problem of computing the optimal
coding transform associated with scalar variable-rate quantizers for still images was resolved
under high-resolution quantization hypothesis, with mean square error as distortion and
without the Gaussian assumption (Narozny et al., 2005; 2008). However, for the JPEG2000
Part 2 compression scheme, the previous optimal transform computation cannot be directly
applied to obtain the optimal spectral transform, because of the presence of the 2-D DWT (see
the criterion (15) in Section 4, which depends on subband statistics). In (Akam Bita et al.,
2010a), the authors solved both the problems of computing an optimal spectral transform
(OST), with the constraint of orthogonality and without any constraint but invertibility, for
that compression scheme, when the 2D DWT has fixed coefficients and under the only
high resolution quantization hypothesis. They showed that on hyperspectral images, the
orthogonal OST, called OrthOST, performs slightly but significantly better than a KLT at
low, medium and high bit-rates and that the gain obtained by removing the orthogonality
constraint in the computation of the OST is not significant. Further, it is not widely
known that even when the input data are Gaussian, the KLT is not optimal in the above
mentioned compression scheme. Indeed, after the 2D DWT, the variance of the wavelet
coefficients depends on the subband they belong to (even for Gaussian data) and the KLT
does not capture these various variances, while the EBCOT coder with its PCRD optimizer
performing simultaneously across all the codeblocks from the entire image takes them into
account. In (Akam Bita et al., 2010b), the authors introduced an orthogonal spectral transform
(called JADO for Joint Approximate Diagonalization under Orthogonality constraint) using
only second order statistics that does not have this shortcoming, and that is optimal at high
bit-rates for the JPEG2000 Part 2 compression scheme, when the data are Gaussian. They
showed on natural hyperspectral images that JADO (resp. OrthOST) performs slightly but
significantly better than the KLT (resp. JADO). The main drawback of the OSTs is their heavy
computational cost, which is much higher than that of a KLT or JADO (which both have
roughly the same complexity).
In order to reduce the complexity of a codec based on OrthOSTs, the authors of (Akam Bita
et al., 2008; 2010c; Barret et al., 2009) used the same strategy as in (Thiebaut et al., 2006):
they replaced the OrthOST, which must be computed for each new encoded image, with an
exogenous quasi optimal spectral transform. This last transform is an OrthOST computed once
and for all on a learning basis constituted of images from only one spectrometer and which
is then applied to any image to be coded stemming from the same spectrometer. Using either
the JPEG2000 codec called Verification Model version 9 (JPEG2000, 2001) or the Bit Plane
Encoder (BPE (CCSDS-1, 2007)) recommended for satellite image compression (Yeh et al.,
2005) by the CCSDS (Consultative Committee for Space Data Systems), they showed that this
strategy yielded good performances, sometimes better than the (non exogenous) KLT ones,
in terms of bit-rate versus distortions. Four different distortions were considered: Signal to
Noise Ratio (SNR), Maximum Absolute Difference (MAD), Mean Absolute Error (MAE) and
Maximum Spectral Angle (MSA). Indeed, it is well-known that providing the mean square
error as one distortion only is not sufficient to assess the quality of a codec for hyperspectral
images (Christophe et al., 2005). However in the simulations presented in (Akam Bita et al.,
2008; 2010c; Barret et al., 2009) when the VM9 is used, the computational complexity of the
EBCOT coder associated with its PCRD optimizer is very high, and when the BPE is applied
to encode each component of the transformed image, the complexity of the algorithm for
optimal allocation between components is also very high. In both cases, the computational
complexity is too high for a compression system on-board a satellite. In (Barret et al., 2011), the
authors present a low complexity hyperspectral image coder based on exogenous OrthOST
and zerotrees well adapted to OrthOST.
It is important to note that the point of view presented in this chapter (i.e., a compression
scheme for hyperspectral images that is independent of the end-user application) is no
longer justified at very low bit-rates (lower than 0.5 bits per pixel and per band). For more
details on low bit-rate hyperspectral compression see (Chang et al., 2010c).
In this chapter, we study the question of an optimal linear transform for reducing spectral
redundancies under high resolution and variable rate constrained quantization hypothesis,
when a 2-D DWT — with fixed coefficients — is applied to each component to reduce
spatial redundancies and one scalar quantizer per subband and per component is used.
This compression scheme, described in Section 2, is compatible with the JPEG2000 Part 2
standard. The asymptotic expression of the mean square error distortion associated with that
compression scheme is given in Section 3. In Section 4, we clarify the criterion minimized by
such an optimal spectral transform with mean square error distortion and we show the link
between the criterion and the mutual information contrast used in Independent Component
Analysis (ICA). In Section 5, we derive a criterion minimized by an OrthOST under Gaussian
data assumption. Moreover, we describe in Section 6 the quasi-Newton algorithms used for
the minimization of the criterion, either with the constraint of an orthogonal transform or
with no constraint but invertibility or with the constraint of an orthogonal transform and the
assumption of Gaussian data. The first two algorithms are derived from Pham's ICAinf
algorithm, described in (Pham, 2004), which performs ICA. Then in Section 7, performances of these
transforms and comparisons with the KLT are given for multi- and hyper-spectral satellite
images, with the four above-mentioned measures of distortion. Finally, in Section 8
we introduce quasi-optimal OrthOSTs, called exogenous, that do not have the main drawback of
heavy computational cost and we compare their performances in lossy coding with OrthOSTs.
We begin by recalling the solution of the problem in a simple general case (Gersho & Gray,
1992; Taubman & Marcellin, 2002).
Proof: We have $X - \hat{X} = A^{-1} b$ and $\|A^{-1} b\|^2 = b^T A^{-T} A^{-1} b = \mathrm{tr}[A^{-1} b b^T A^{-T}] =
\mathrm{tr}[b b^T A^{-T} A^{-1}]$, therefore $D = \frac{1}{N} E(\|A^{-1} b\|^2) = \frac{1}{N} \mathrm{tr}[E(b b^T) A^{-T} A^{-1}]$.
Further, we may need the following assumption, that can be deduced from high resolution
quantization hypothesis (Gersho & Gray, 1992) (this point is recalled in Subsection 3.2).
H1 : The components of the quantization noise are zero mean and uncorrelated.
Theorem 1. 1. With the hypotheses of Lemma 3.1 and assuming H1 , the distortion becomes
\[ D = \frac{1}{N} \sum_{i=1}^{N} \alpha_i D_i, \tag{2} \]
where $D_i = E(b_i^2)$ is the quantizer distortion of the $i$th component $Y_i$ of $Y$ and, with $e_i$ the $i$th
canonical vector of $\mathbb{R}^N$ and $(A^{-1})_{ij}$ the element of $A^{-1}$ located on row $i$ and column $j$, we have
\[ \alpha_i = \sum_{j=1}^{N} (A^{-1})_{ji}^2 = \|A^{-1} e_i\|^2. \tag{3} \]
2. The assertion 1. holds without the assumption H1 if A−T A−1 is diagonal, e.g. if A is orthogonal.
Proof: The assumptions in 1. or 2. state that at least one of the two matrices A−TA−1 and
E(bb T ) is diagonal. Hence the trace of their product is equal to the sum of the products of
their diagonal elements.
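The trace identity used in this proof can be checked numerically. The sketch below is an illustration: the matrix `A` and the distortions `D_i` are random, hypothetical values, and the noise covariance is taken exactly diagonal, as under H1.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.normal(size=(N, N)) + N * np.eye(N)     # hypothetical invertible transform
A_inv = np.linalg.inv(A)
D_i = rng.uniform(0.1, 1.0, size=N)             # hypothetical quantizer distortions E(b_i^2)

# Lemma: D = (1/N) tr[E(b b^T) A^{-T} A^{-1}], with E(b b^T) diagonal under H1.
D_lemma = np.trace(np.diag(D_i) @ A_inv.T @ A_inv) / N

# Theorem 1: D = (1/N) sum_i alpha_i D_i with alpha_i = ||A^{-1} e_i||^2 (column norms).
alpha = np.sum(A_inv ** 2, axis=0)
D_theorem = np.sum(alpha * D_i) / N

# Orthogonal case: alpha_i = 1 for all i, so D is the plain average of the D_i.
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))
alpha_orth = np.sum(np.linalg.inv(Q) ** 2, axis=0)
```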
associated with S is the iso-barycenter of S. Indeed, if the three conditions C1 , C2 and C3 hold,
then the pdf $f_Y$ can be considered as quasi constant in the hypercube $Y^q + \prod_{i=1}^{N} [-h_i/2, h_i/2]$.
Further, the conditional law of the quantization noise $b = Y - Y^q$ knowing the dequantized
value $Y^q$ satisfies $f_{b|Y^q}(u) \simeq 1/\prod_{i=1}^{N} h_i$ if $u \in \prod_{i=1}^{N} [-h_i/2, h_i/2]$, and 0 otherwise. We see that
the conditional pdf $f_{b|Y^q}$ does not depend on the quantized value $Y^q$, hence it is equal to $f_b$,
the pdf of $b$. Further, the components of $b$ are zero mean and (quasi) independent, since their
joint density is approximately equal to the product of their marginal densities.
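These properties of the quantization noise can be observed in a small simulation. The sketch is an illustration under assumptions: the step sizes, the Gaussian data and the sample size are hypothetical. For a uniform density on $[-h_i/2, h_i/2]$ the variance is $h_i^2/12$, which is the value the simulation is compared with.

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.05, 0.02])                      # hypothetical small step sizes h_i
Y = rng.normal(size=(200000, 2))                # smooth pdf, wide compared with h
Y[:, 1] += 0.8 * Y[:, 0]                        # make the two components dependent

Yq = h * np.round(Y / h)                        # uniform scalar quantizer per component
b = Y - Yq                                      # quantization noise

mean_b = b.mean(axis=0)                         # expected close to 0
var_b = b.var(axis=0)                           # expected close to h**2 / 12
corr_b = np.corrcoef(b.T)[0, 1]                 # expected close to 0
```

Although the two components of `Y` are strongly dependent, the two noise components are almost uncorrelated, which is the behaviour assumed in H1.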
\[ D(X, \hat{X}) = E[D_a(X, \hat{X})] = \frac{1}{NL} E[\|X - \hat{X}\|^2]. \tag{4} \]
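Now, in order to express the relation (3) in terms of the DWT $W$ and the spectral transform
$A$, it is important to note first that the canonical basis of the space of matrices of dimension
$N \times L$ is the family of matrices $e_{i,k} = e_i e_k^T$ ($1 \le i \le N$, $1 \le k \le L$), with $e_i$ (resp. $e_k$) the $i$th
(resp. $k$th) vector of the canonical basis of $\mathbb{R}^N$ (resp. $\mathbb{R}^L$). Therefore, the weighting factor $\alpha_i$ in
relation (3) depends here on the two indices $i$ and $k$: $\alpha_{ik} = \|\mathcal{A}^{-1} e_{i,k}\|^2$, where $\mathcal{A}^{-1}$ denotes
the inverse of the overall separable transform. Then, let
\[ w_i = \|A^{-1} e_i\|^2 \quad (1 \le i \le N), \tag{5} \]
we have $\mathcal{A}^{-1} e_{i,k} = A^{-1} e_i e_k^T W^{-T}$ and $\|\mathcal{A}^{-1} e_{i,k}\|^2 = \mathrm{tr}[A^{-1} e_i e_k^T W^{-T} W^{-1} e_k e_i^T A^{-T}] =
e_i^T A^{-T} A^{-1} e_i \; e_k^T W^{-T} W^{-1} e_k$ and finally
\[ \alpha_{ik} = \|\mathcal{A}^{-1} e_{i,k}\|^2 = w_i \|W^{-1} e_k\|^2. \tag{6} \]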
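The factorization in equation (6) can be verified numerically. In the sketch below, `A` and `W` are random invertible matrices used as hypothetical stand-ins for the spectral transform and the DWT matrix; the squared norm of each transformed canonical matrix is compared with the product of the two factors.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 3, 5
A = rng.normal(size=(N, N)) + N * np.eye(N)      # hypothetical spectral transform
W = rng.normal(size=(L, L)) + L * np.eye(L)      # hypothetical invertible stand-in for the DWT
A_inv, W_inv = np.linalg.inv(A), np.linalg.inv(W)

w = np.sum(A_inv ** 2, axis=0)                   # w_i = ||A^{-1} e_i||^2, equation (5)
col_norms = np.sum(W_inv ** 2, axis=0)           # ||W^{-1} e_k||^2

alpha = np.empty((N, L))
for i in range(N):
    for k in range(L):
        E = np.zeros((N, L))
        E[i, k] = 1.0                            # canonical matrix e_i e_k^T
        alpha[i, k] = np.sum((A_inv @ E @ W_inv.T) ** 2)   # squared Frobenius norm
```

The weight therefore separates into a spectral factor depending only on `i` and a spatial factor depending only on `k`.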
Therefore, according to Theorem 1, under assumption H1 we have
\[ D(X, \hat{X}) = \frac{1}{NL} \sum_{i=1}^{N} \sum_{k=1}^{L} w_i \|W^{-1} e_k\|^2 \, E[(Y_i(k) - Y_i^q(k))^2]. \tag{7} \]
\[ D(X, \hat{X}) = \frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} \pi_m w_i \omega_m D_i^{(m)} \quad \text{with} \quad \omega_m = \frac{1}{K_m} \sum_{k} \|W^{-1} e_k\|^2, \tag{9} \]
where in the last summation, the range of k consists in the columns of Y with the subband m.
Or, by adopting a different perspective, if we assume that
H3 : the weight $\|W^{-1} e_k\|^2 = \omega_m$ does not depend on the spatial position $k$ in the subband $m$,
then equation (7) becomes
\[ D(X, \hat{X}) = \frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} \pi_m \omega_m w_i D_i^{(m)}, \quad \text{with} \quad D_i^{(m)} = \frac{1}{K_m} \sum_{k} E[(Y_i(k) - Y_i^q(k))^2], \tag{10} \]
where in the last summation, the range of k consists in the columns of Y with the subband m.
Remark 1. The condition H3 is satisfied by dyadic wavelets having Finite Impulse Response (FIR)
synthesis filters, when edge effects are neglected (for more details see e.g. (Usevitch, 1996; Woods &
Naven, 1992)).
Lastly, we can notice that the actual distortion Da given in equation (1) satisfies
\[ D_a(X, \hat{X}) = \frac{1}{NL} \mathrm{tr}[(X - \hat{X})(X - \hat{X})^T] = \frac{1}{NL} \mathrm{tr}[A^{-1} (Y - Y^q) W^{-T} W^{-1} (Y - Y^q)^T A^{-T}], \]
therefore if we assume
H4 : the DWT is orthogonal, i.e. $W W^T = I_L$, with $I_L$ the identity matrix of dimension $L$,
then
\[ D_a(X, \hat{X}) = \frac{1}{NL} \mathrm{tr}[A^{-1} (Y - Y^q)(Y - Y^q)^T A^{-T}] = \frac{1}{NL} \sum_{m=1}^{M} \mathrm{tr}[A^{-1} (Y^{(m)} - Y^{q(m)})(Y^{(m)} - Y^{q(m)})^T A^{-T}]. \]
Remark 2. The hypothesis H4 is roughly satisfied with the approximately orthogonal Daubechies
9/7 DWT (indeed, a simulation shows that the infinity norms of the diagonal and of the
off-diagonal elements of $W^T W - I_L$ are 0.42 and 0.16 respectively, for five levels of
decomposition on a 1-D signal of length 512).
Now, $\frac{1}{K_m} (Y^{(m)} - Y^{q(m)})(Y^{(m)} - Y^{q(m)})^T$ is the actual autocorrelation matrix of the $m$-th
subband quantization noise. If we assume
$H_1'$ : in each subband, the actual autocorrelation matrix of the quantization noise is
diagonal, i.e., $\frac{1}{K_m} (Y^{(m)} - Y^{q(m)})(Y^{(m)} - Y^{q(m)})^T = \mathrm{diag}(D_1^{(m)}, \ldots, D_N^{(m)})$ ($1 \le m \le M$),
then we have
N
(m) (m) (m)
tr[A−1 diag( D1 , . . . , D N )A−T ] = ∑ w i Di
i =1
1 M (m) (m)
) = πm tr[A−1 diag( D1 , . . . , D N )A−T ]
N m∑
Da (X, X
=1
N M
) = 1 ∑ ∑ π m wi D ( m ) .
Da (X, X (11)
N i =1 m =1 i
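The step from the trace form to equation (11) can be checked numerically. The following is a minimal sketch, assuming a randomly drawn invertible matrix `A`, subband proportions `pi` that sum to one, and arbitrary positive distortions `D[i, m]`; all of these values and the function names are hypothetical illustrations, not data or code from the chapter. It relies only on the fact that $w_i$ is the $i$th diagonal element of $A^{-T}A^{-1}$.

```python
import numpy as np

def synthesis_weights(A):
    """Return w_i, the i-th diagonal element of A^{-T} A^{-1}."""
    A_inv = np.linalg.inv(A)
    # (A^{-T} A^{-1})_{ii} is the squared norm of the i-th column of A^{-1}.
    return np.sum(A_inv ** 2, axis=0)

def distortion_eq11(A, pi, D):
    """(1/N) * sum_i sum_m pi_m * w_i * D[i, m], with D of shape (N, M)."""
    w = synthesis_weights(A)
    return float(w @ D @ pi) / A.shape[0]

def distortion_trace_form(A, pi, D):
    """(1/N) * sum_m pi_m * tr[A^{-1} diag(D[:, m]) A^{-T}]."""
    A_inv = np.linalg.inv(A)
    total = sum(p * np.trace(A_inv @ np.diag(D[:, m]) @ A_inv.T)
                for m, p in enumerate(pi))
    return float(total) / A.shape[0]

# Hypothetical example values (not taken from the chapter).
rng = np.random.default_rng(0)
N, M = 4, 3
A = rng.normal(size=(N, N)) + 3.0 * np.eye(N)   # some invertible spectral transform
pi = np.array([0.5, 0.3, 0.2])                  # subband proportions, summing to 1
D = rng.uniform(0.1, 1.0, size=(N, M))          # quantizer distortions D_i^(m)

print(distortion_eq11(A, pi, D), distortion_trace_form(A, pi, D))
```

Both printed values should agree up to rounding. For an orthogonal `A` all the weights returned by `synthesis_weights` equal one, which is the situation described later for the orthogonal transforms.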
Theorem 2. With the notations of Section 2.2, the end-to-end distortion of the separable scheme is
given by:
• equation (9) under the assumptions H1 and H2;
• equation (10) under the assumptions H1 and H3;
• equation (11) under the assumptions H1' and H4;
• equation (12) under the assumptions H4 and H5.
Remark 3. 1. The assumptions H1 and H1' are consequences of high resolution quantization (see
Subsection 3.2). They can also be deduced from the statistical independence of the
transformed components: if the components of Y are independent, then the components of the
quantization noise $Y - Y^{q}$, which is generally centered, are uncorrelated.
2. A method for computing the wavelet weighting coefficients $\omega_m$ $(1 \le m \le M)$ can be found
in (Usevitch, 1996; Woods & Naven, 1992).
3. Since the assumptions H1, H1', ..., H4 are only approximately satisfied, the equalities (9–13)
are only approximations. However, we observed in many experiments that these approximations
are very good for bit-rates greater than 0.25 bits per pixel and per band.
We seek the optimal spectral transform (that is, the one which minimizes the total bit-rate
for a given end-to-end distortion) which adapts to the data, assuming high resolution
quantization hypotheses and a 2-D DWT with fixed coefficients, i.e., coefficients which do not
adapt to the data. As already mentioned, in our tests we always used the Daubechies 9/7 DWT. First,
we derive the criterion minimized by an optimal spectral transform. We emphasize that
we do not assume Gaussian data, whereas in the literature this assumption is generally
made in order to clarify the criterion (coding gain) maximized by the optimal transform.
However, Bennett's formula and the formula for optimal bit allocation between quantizers, on
which our criteria are based, are well known, and therefore it is straightforward to deduce these
criteria from well-known results. Our major innovation lies in the computation
of the optimal transforms, since this computation is generally presented as a difficult task in
classical transform coding and has never been done in the case of the separable scheme which
is JPEG2000 compatible.
\[
R \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} \pi_m \Big[ H(Y_i^{(m)}) - \frac{1}{2} \log_2\big(c D_i^{(m)}\big) \Big]. \tag{13}
\]
The problem now consists in minimizing $R$ under the constraint (given by Theorem 2)
\[
\frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} \pi_m \omega_m w_i D_i^{(m)} \le D_t \tag{14}
\]
for a given end-to-end distortion $D_t$. In other words, for a target end-to-end distortion $D_t$,
how can the quantizer distortions $D_i^{(m)}$ be distributed in each subband of each component
in order to minimize the total bit-rate? It is a classical problem in compression, called
optimal bit allocation (Gersho & Gray, 1992), that can be solved as follows. According to
relation (13), when the spectral and spatial transforms $A$ and $W$ are given, the differential
entropies $H(Y_i^{(m)})$ and the factors $w_i$ and $\omega_m$ are given. Then, the total bit-rate is minimized
if and only if $\prod_{i=1}^{N} \prod_{m=1}^{M} (D_i^{(m)})^{\pi_m / N}$ is maximized, that is if and only if
\[
\prod_{i=1}^{N} \prod_{m=1}^{M} \big(D_i^{(m)}\big)^{\pi_m / N} \; \prod_{i=1}^{N} w_i^{1/N} \; \prod_{m=1}^{M} \omega_m^{\pi_m}
= \prod_{i=1}^{N} \prod_{m=1}^{M} \big(\omega_m w_i D_i^{(m)}\big)^{\pi_m / N}
\]
is maximized. Now the inequality of arithmetic and geometric means states that the last expression
(which is a weighted geometric mean) is not greater than the arithmetic mean corresponding to the
left member of inequality (14), with equality if and only if all the terms in the summation are equal.
Hence, the minimization holds when $D_i^{(m)} = D_t \, \omega_m^{-1} w_i^{-1}$ for all $m$ and $i$. That leads to
\[
R \approx \sum_{m=1}^{M} \pi_m \frac{1}{N} \sum_{i=1}^{N} \Big[ H(Y_i^{(m)}) + \frac{1}{2} \log_2 w_i + \frac{1}{2} \log_2 \omega_m - \frac{1}{2} \log_2 (c D_t) \Big]
\]
and, since $w_i$ is the $i$th diagonal element of $A^{-T} A^{-1}$ while the other terms $\omega_m$ do not depend on $A$,
we obtain the following theorem.
Theorem 3. For the separable scheme when the 2-D DWT has fixed coefficients, if high resolution
quantization hypotheses are assumed, then the optimal spectral transform $A$ is an $N \times N$ matrix that
minimizes the criterion:
\[
C_2(A) = \sum_{j=1}^{N} \sum_{m=1}^{M} \pi_m H(Y_j^{(m)}) + \frac{1}{2} \log_2 \det \operatorname{diag}(A^{-T} A^{-1}). \tag{15}
\]
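The bit-allocation argument that precedes Theorem 3 can be illustrated numerically. The sketch below uses hypothetical weights `w`, `omega` and proportions `pi` (illustrative values, not taken from the chapter). It builds the closed-form allocation $D_i^{(m)} = D_t \, \omega_m^{-1} w_i^{-1}$, checks that it meets constraint (14) with equality, and compares the part of the rate (13) that depends on the distortions against other allocations rescaled to meet the same constraint.

```python
import numpy as np

def optimal_distortions(w, omega, D_t):
    """Closed-form allocation D[i, m] = D_t / (omega_m * w_i), shape (N, M)."""
    return D_t / np.outer(w, omega)

def weighted_distortion(D, w, omega, pi):
    """Left member of constraint (14): (1/N) sum_i sum_m pi_m omega_m w_i D[i, m]."""
    return float(np.sum(np.outer(w, omega * pi) * D)) / len(w)

def rate_term(D, pi):
    """Part of the rate (13) that depends on D: -(1/(2N)) sum_i sum_m pi_m log2 D[i, m]."""
    return float(-0.5 * np.sum(np.log2(D) @ pi)) / D.shape[0]

# Hypothetical example values (assumptions for illustration only).
rng = np.random.default_rng(1)
w = rng.uniform(0.5, 2.0, size=4)        # spectral weights w_i
omega = rng.uniform(0.5, 2.0, size=3)    # wavelet weights omega_m
pi = np.array([0.5, 0.3, 0.2])           # subband proportions, summing to 1
D_t = 2.0                                # target end-to-end distortion

D_opt = optimal_distortions(w, omega, D_t)
print("constraint value:", weighted_distortion(D_opt, w, omega, pi))

# Any other allocation meeting (14) with equality should not give a lower rate.
D_alt = D_opt * rng.uniform(0.5, 2.0, size=D_opt.shape)
D_alt *= D_t / weighted_distortion(D_alt, w, omega, pi)
print("rate term, closed form :", rate_term(D_opt, pi))
print("rate term, perturbed   :", rate_term(D_alt, pi))
```

The perturbed allocation is only one random competitor, so this is a sanity check of the mean inequality rather than a proof.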
\[
C_2(A) = \sum_{m=1}^{M} \pi_m \Big[ \sum_{i=1}^{N} H(Y_i^{(m)}) - \log_2 |\det A| \Big]
+ \frac{1}{2} \log_2 \frac{\det \operatorname{diag}(A^{-T} A^{-1})}{\det (A^{-T} A^{-1})}
= \sum_{m=1}^{M} \pi_m C_{ICA}^{(m)}(A) + C_O(A), \tag{16}
\]
where, for $1 \le m \le M$, $C_{ICA}^{(m)}(A) = \sum_{i=1}^{N} H(Y_i^{(m)}) - \log_2 |\det A|$ is the criterion to minimize when
performing only ICA on the $N$ components of the transformed coefficients that belong to the subband
$m$. Pham (Pham, 2004) used that criterion in the algorithm ICAinf.
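The second term of (16), $C_O(A) = \frac{1}{2} \log_2 [\det \operatorname{diag}(A^{-T}A^{-1}) / \det(A^{-T}A^{-1})]$, can be evaluated directly. A minimal sketch follows; the function name and the test matrices are hypothetical. By Hadamard's inequality this quantity is non-negative, and it is zero when $A$ is orthogonal.

```python
import numpy as np

def orthogonality_penalty(A):
    """C_O(A) = 0.5 * log2( det diag(M) / det(M) ), with M = A^{-T} A^{-1}."""
    A_inv = np.linalg.inv(A)
    M = A_inv.T @ A_inv                       # symmetric positive definite
    _, logdet = np.linalg.slogdet(M)          # natural logarithm of det(M)
    return 0.5 * (np.sum(np.log(np.diag(M))) - logdet) / np.log(2.0)

# Hypothetical example matrices (assumptions for illustration only).
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))      # an orthogonal matrix
A = rng.normal(size=(5, 5)) + 2.0 * np.eye(5)     # a generic invertible matrix

print("C_O for an orthogonal matrix:", orthogonality_penalty(Q))
print("C_O for a generic matrix    :", orthogonality_penalty(A))
```

This also makes the passage from (15) to (16) easy to check: $\frac{1}{2}\log_2 \det \operatorname{diag}(A^{-T}A^{-1})$ equals $-\log_2|\det A| + C_O(A)$.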
1 See Remark 2.
the entropy estimation and uses only second order statistics. Thus its minimization requires
far fewer computing resources than using (15).
An analysis of criterion (17) shows that it takes into account two phenomena: 1) the
non-Gaussianity of the transformed coefficients $Y_i^{(m)}$ for $1 \le m \le M$ and $1 \le i \le N$, which
is controlled by the first term, and 2) the inhomogeneity of the variances in the subbands,
which is controlled by the second term. It is natural to explore the case where the second
phenomenon is the most important, since the DWT tends to render the variables more
Gaussian. In practice, this condition is generally roughly satisfied, except in the LL subband (a
subband of lowest resolution) for which the weighting coefficient $\pi_m$ is generally small. Thus,
if we neglect the variation, induced by the spectral transform $A$, of the first term in the right
member of equation (17), and if we consider only orthogonal matrices $A$, then the optimal
transform minimizes the new criterion
\[
C(A) = \frac{1}{2} \sum_{i=1}^{N} \sum_{m=1}^{M} \pi_m \log_2 \big[\operatorname{var}(Y_i^{(m)})\big]. \tag{18}
\]
Furthermore, if we assume that in each component the transformed coefficients all have the
same variance, regardless of the subband they belong to, then the criterion (18) becomes
$\frac{1}{2} \log_2 \prod_{i=1}^{N} \operatorname{var}(Y_i)$, leading to the KLT.
In the following, we express criterion (18) in terms of the covariance matrices of the wavelet
coefficients $X W^T = [(X W^T)^{(1)} \; (X W^T)^{(2)} \cdots (X W^T)^{(M)}]$ located in the same subband. The
matrix $(X W^T)^{(m)}$ is of dimension $N \times \pi_m L$. Its columns can be considered as different
realizations of a random vector of dimension $N$ whose covariance matrix is denoted $C^{(m)}$.
Now, $Y = A X W^T$ can be written $Y = [Y^{(1)} \cdots Y^{(M)}]$, where $Y^{(m)} = (A X W^T)^{(m)}$ is a
matrix whose columns can also be considered as different realizations of a random vector
having $A C^{(m)} A^T$ as covariance matrix. With these notations, we have
$\prod_{j=1}^{N} \operatorname{var}(Y_j^{(m)}) = \det \operatorname{diag}(A C^{(m)} A^T)$ and hence the new criterion becomes
\[
C(A) = \frac{1}{2} \sum_{m=1}^{M} \pi_m \log_2 \det \operatorname{diag}(A C^{(m)} A^T). \tag{19}
\]
2 The orthogonality constraint will be justified in § 7 in which we find that minimizing (15) with and
without this constraint yields almost the same performances. With the orthogonality constraint, the
second term in (15) vanishes.
The FG algorithm in (Flury & Gautschi, 1986) can be used to minimize the above criterion.
We have developed a slightly different algorithm (called JADO) which is briefly described in
Appendix 6.3.
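Criterion (19) only needs the subband covariance matrices $C^{(m)}$ and the proportions $\pi_m$, so it is cheap to evaluate. The sketch below is a minimal illustration with hypothetical inputs: the function name and the random covariance are assumptions, not part of the chapter. In the single-subband case it shows that the KLT reaches the Hadamard bound $\frac{1}{2}\log_2 \det C$, which agrees with the remark that equal variances across subbands lead to the KLT.

```python
import numpy as np

def criterion_19(A, covs, pi):
    """0.5 * sum_m pi_m * log2 det diag(A C^(m) A^T)."""
    total = 0.0
    for C, p in zip(covs, pi):
        variances = np.einsum("ij,jk,ik->i", A, C, A)   # diagonal of A C A^T
        total += p * np.sum(np.log2(variances))
    return 0.5 * total

# Hypothetical single-subband example (M = 1), for illustration only.
rng = np.random.default_rng(3)
G = rng.normal(size=(4, 4))
C = G @ G.T + np.eye(4)                 # a positive definite covariance matrix
klt = np.linalg.eigh(C)[1].T            # rows are eigenvectors of C

print("identity:", criterion_19(np.eye(4), [C], [1.0]))
print("KLT     :", criterion_19(klt, [C], [1.0]))
print("bound   :", 0.5 * np.linalg.slogdet(C)[1] / np.log(2.0))
```

With several subbands whose covariance matrices do not share eigenvectors, no single orthogonal matrix diagonalizes all of them, which is why an approximate joint diagonalization is needed.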
where the function $\psi_{Y_i^{(m)}}$ is equal to the derivative of $-\log p(y_i^{(m)})$, with $p(y_i^{(m)})$ denoting the
probability density function of $Y_i^{(m)}$, and is known as the score function. Let $M = A^{-T} A^{-1}$.
In (Narozny et al., 2008), the Taylor expansion of $C_O(A + \mathcal{E} A)$ is given up to the second order;
however, it is quite involved and it is simplified into
\[
C_O(A + \mathcal{E} A) \approx C_O(A) - \sum_{1 \le i \ne j \le N} \frac{M_{ji}}{M_{ii}} \mathcal{E}_{ji}
+ \frac{1}{2} \sum_{1 \le i \ne j \le N} \Big( \frac{M_{jj}}{M_{ii}} \mathcal{E}_{ji}^2 + \mathcal{E}_{ji} \mathcal{E}_{ij} \Big) + \cdots \tag{21}
\]
by neglecting the non diagonal elements of $M = [M_{ij}]$ in the second order terms of the Taylor
expansion.
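Using the approximation (21), the equality (20) and the relation (16) we obtain
\[
C_2(A + \mathcal{E} A) = C_2(A) + \sum_{1 \le i \ne j \le N} \Big[ \sum_{m=1}^{M} \pi_m E\big[Y_j^{(m)} \psi_{Y_i^{(m)}}(Y_i^{(m)})\big] - \frac{M_{ij}}{M_{jj}} \Big] \mathcal{E}_{ij}
\]
\[
\qquad + \frac{1}{2} \sum_{1 \le i \ne j \le N} \Big\{ \Big[ \sum_{m=1}^{M} \pi_m E\big[(Y_j^{(m)})^2\big] E\big[\psi^2_{Y_i^{(m)}}(Y_i^{(m)})\big] + \frac{M_{ii}}{M_{jj}} \Big] \mathcal{E}_{ij}^2 + 2 \mathcal{E}_{ij} \mathcal{E}_{ji} \Big\}. \tag{22}
\]
The quadratic form associated with this last expansion is positive definite. One iteration of the
algorithm consists first in solving the following equation
\[
\begin{bmatrix} \Psi_{ij} & 2 \\ 2 & \Psi_{ji} \end{bmatrix}
\begin{bmatrix} \mathcal{E}_{ij} \\ \mathcal{E}_{ji} \end{bmatrix}
=
\begin{bmatrix} \Phi_{ij} \\ \Phi_{ji} \end{bmatrix}, \tag{23}
\]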
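\[
\frac{1}{2} \sum_{1 \le i < j \le N} \Big[ \sum_{m=1}^{M} \pi_m \Big( E\big[(Y_j^{(m)})^2\big] E\big[\psi^2_{Y_i^{(m)}}(Y_i^{(m)})\big] + E\big[(Y_i^{(m)})^2\big] E\big[\psi^2_{Y_j^{(m)}}(Y_j^{(m)})\big] \Big) - 2 \Big] \mathcal{E}_{ij}^2. \tag{24}
\]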
Actually, $A + \mathcal{E} A$ obtained in this way is not a true orthogonal matrix. This can be overcome
by replacing $A + \mathcal{E} A$ with $e^{\mathcal{E}} A = (I + \mathcal{E} + \mathcal{E}^2/2! + \cdots) A$, which is orthogonal and differs from
$A + \mathcal{E} A$ only by second order terms. We call this algorithm OrthOST (Orthogonal Optimal Spectral
Transform), and we use the same name for the orthogonal transform that it returns. The case
where the spectral transform is constrained to be orthogonal is particularly interesting because
the weightings which depend on the linear transform are all equal to one.
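The replacement of $A + \mathcal{E}A$ by $e^{\mathcal{E}}A$ can be illustrated as follows. This is a minimal sketch, assuming a small antisymmetric update $\mathcal{E}$ (the exponential of an antisymmetric matrix is orthogonal) and a plain power-series evaluation of the matrix exponential. The helper `expm_series` and the random test matrices are hypothetical and are not the implementation used by the authors.

```python
import numpy as np

def expm_series(E, terms=20):
    """Matrix exponential by a truncated power series with scaling and squaring."""
    norm = np.linalg.norm(E, ord=np.inf)
    # Halve E until its norm is at most 0.5, sum the series, then square back.
    squarings = 0
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    scaled = E / (2.0 ** squarings)
    result = np.eye(E.shape[0])
    term = np.eye(E.shape[0])
    for n in range(1, terms + 1):
        term = term @ scaled / n
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def orthogonality_error(Q):
    """Frobenius norm of Q Q^T - I."""
    return float(np.linalg.norm(Q @ Q.T - np.eye(Q.shape[0])))

# Hypothetical example values (assumptions for illustration only).
rng = np.random.default_rng(0)
N = 4
A, _ = np.linalg.qr(rng.normal(size=(N, N)))     # current orthogonal transform
G = rng.normal(size=(N, N))
E = 0.01 * (G - G.T)                             # small antisymmetric update

first_order = A + E @ A
retracted = expm_series(E) @ A

print("orthogonality error of A + EA  :", orthogonality_error(first_order))
print("orthogonality error of exp(E) A:", orthogonality_error(retracted))
print("distance between the two       :", np.linalg.norm(retracted - first_order, 2))
```

The first error is of the order of $\|\mathcal{E}\|^2$ while the second is at rounding level, and the two updates differ only by second order terms, as stated above.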
6.3 The JADO (Joint Approximate Diagonalization under Orthogonality constraint) algorithm
Given $K$ positive definite (complex) matrices $C_1, \ldots, C_K$ associated with positive weights
$w_1, \ldots, w_K$, the JADO algorithm aims to find a unitary matrix $B$ which minimizes
\[
C(B) = \sum_{k=1}^{K} w_k \log \det \operatorname{diag}(B C_k B^*) \tag{26}
\]
where $^*$ denotes the conjugate (Hermitian) transpose. This algorithm differs only slightly from the FG
algorithm in (Flury & Gautschi, 1986). However, its derivation in (Flury & Gautschi, 1986)
is complex and difficult to understand. Here we briefly provide a much simpler derivation.
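The idea is to make successive Givens rotations, each time on a pair of rows of $B$, the $i$th row
$B_{i\cdot}$ and the $j$th row $B_{j\cdot}$, say:
\[
\begin{bmatrix} B_{i\cdot} \\ B_{j\cdot} \end{bmatrix} \leftarrow T_{ij} \begin{bmatrix} B_{i\cdot} \\ B_{j\cdot} \end{bmatrix}, \tag{27}
\]
where $T_{ij}$ is a $2 \times 2$ unitary matrix, chosen so that the criterion is decreased. The processing
of all the $\frac{K(K-1)}{2}$ pairs is called a sweep. The algorithm consists of repeated sweeps until
convergence is achieved.
The decrease of the criterion (26) induced by (27) is
\[
\sum_{k=1}^{K} w_k \log \frac{(B_{i\cdot} C_k B_{i\cdot}^*)(B_{j\cdot} C_k B_{j\cdot}^*)}
{\det \operatorname{diag}\Big( T_{ij} \begin{bmatrix} B_{i\cdot} \\ B_{j\cdot} \end{bmatrix} C_k \, [\, B_{i\cdot}^* \;\; B_{j\cdot}^* \,] \, T_{ij}^* \Big)}.
\]
A natural idea is to choose $T_{ij}$ to maximize this decrease, but there is no closed-form formula
for that. Our idea is to maximize a lower bound of it instead. Since for $a > 0$, $b \ge 0$, $\log(a/b) \ge
1 - b/a$, the above decrease can be seen to be bounded below by
\[
2 (w_1 + \cdots + w_K) - T_{ij;1\cdot} P \, T_{ij;1\cdot}^* - T_{ij;2\cdot} Q \, T_{ij;2\cdot}^*, \tag{28}
\]
where $T_{ij;1\cdot}$ and $T_{ij;2\cdot}$ are the first and second rows of $T_{ij}$ and
\[
P = \sum_{k=1}^{K} \frac{w_k}{B_{i\cdot} C_k B_{i\cdot}^*} \begin{bmatrix} B_{i\cdot} \\ B_{j\cdot} \end{bmatrix} C_k \, [\, B_{i\cdot}^* \;\; B_{j\cdot}^* \,]; \qquad
Q = \sum_{k=1}^{K} \frac{w_k}{B_{j\cdot} C_k B_{j\cdot}^*} \begin{bmatrix} B_{i\cdot} \\ B_{j\cdot} \end{bmatrix} C_k \, [\, B_{i\cdot}^* \;\; B_{j\cdot}^* \,].
\]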
Since $T_{ij;2\cdot}$ has unit norm and is orthogonal to $T_{ij;1\cdot}$, it must be of the form $e^{i\alpha} \, \overline{T_{ij;1\cdot}} \, J$ where
$\alpha$ is some phase angle, $\bar{x}$ denotes the complex conjugate of $x$ and $J$ is the $2 \times 2$ matrix with 0
on the diagonal and 1, $-1$ on the anti-diagonal. Thus
$T_{ij;2\cdot} Q \, T_{ij;2\cdot}^* = \overline{T_{ij;1\cdot}} \, J Q J^* \, \overline{T_{ij;1\cdot}}^{\,*}$, but since
the above left hand side is real (as $Q$ is hermitian), it also equals $T_{ij;1\cdot} \, J \bar{Q} J^* \, T_{ij;1\cdot}^*$. Therefore
expression (28) can be rewritten as $2 (w_1 + \cdots + w_K) - T_{ij;1\cdot} (P + J \bar{Q} J^*) T_{ij;1\cdot}^*$. Maximizing it
with respect to the unitary matrix $T_{ij}$ thus amounts to minimizing $T_{ij;1\cdot} (P + J \bar{Q} J^*) T_{ij;1\cdot}^*$ with
respect to the vector of unit norm $T_{ij;1\cdot}$. The solution is that $T_{ij;1\cdot}$ is (up to a factor of unit
modulus) the normalized left eigenvector associated with the smallest eigenvalue of $P + J \bar{Q} J^*$. Since $T_{ij;2\cdot}$
is orthogonal to $T_{ij;1\cdot}$, it is the other eigenvector. Finally, $T_{ij}$ is the matrix formed by the left
eigenvectors of $P + J \bar{Q} J^*$. Its elements can be computed explicitly in closed form as follows.
We note that the off diagonal elements of $J \bar{Q} J^*$ are the negatives of those of $Q$ while the diagonal
elements are those of $Q$ in reverse order. Thus $J \bar{Q} J^* = \operatorname{tr}(Q) I - Q$ where tr denotes the trace.
Since the addition of a multiple of the identity matrix does not change the eigenvectors, $T_{ij}$
is also the matrix formed by the left eigenvectors of $P - Q$. One can now recognize that
the rotation (27) is the same as an iteration in the G loop of the FG algorithm. However, the FG
algorithm differs from our JADO algorithm in that it repeats (27) with the same pair $i, j$ (but with the
newly computed $B_{i\cdot}$ and $B_{j\cdot}$) until convergence (the G loop), and only then is another pair $i, j$
considered. We feel that this is not efficient, since the decrease of the criterion will be very
small near the end of the G loop. We call JADOST the transform returned by the algorithm.
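The sweep described above can be sketched for the real symmetric case, where $B$ is orthogonal and $^*$ reduces to the transpose. The code below is a minimal illustration, not the authors' implementation: the function names, the starting point $B = I$, the fixed number of sweeps and the test matrices are all assumptions. Each pair update builds $P$ and $Q$ as in (28) and takes the rows of $T_{ij}$ as the eigenvectors of $P - Q$, the first row being the one associated with the smallest eigenvalue.

```python
import numpy as np

def jado_criterion(B, mats, weights):
    """Criterion (26) for real matrices: sum_k w_k log det diag(B C_k B^T)."""
    return float(sum(w * np.sum(np.log(np.einsum("ij,jk,ik->i", B, C, B)))
                     for C, w in zip(mats, weights)))

def jado_sweep(B, mats, weights):
    """One sweep of the pairwise updates (27) over all pairs of rows of B."""
    B = B.copy()
    n = B.shape[0]
    for i in range(n - 1):
        for j in range(i + 1, n):
            rows = B[[i, j], :]                   # 2 x n block [B_i. ; B_j.]
            P = np.zeros((2, 2))
            Q = np.zeros((2, 2))
            for C, w in zip(mats, weights):
                G = rows @ C @ rows.T             # 2 x 2 matrix of the pair for C_k
                P += w * G / G[0, 0]
                Q += w * G / G[1, 1]
            # eigh returns eigenvalues in ascending order, so the first row of T
            # is the eigenvector of P - Q with the smallest eigenvalue.
            _, V = np.linalg.eigh(P - Q)
            B[[i, j], :] = V.T @ rows
    return B

def jado(mats, weights, sweeps=20):
    """Run a fixed number of sweeps starting from the identity (an assumed choice)."""
    B = np.eye(mats[0].shape[0])
    for _ in range(sweeps):
        B = jado_sweep(B, mats, weights)
    return B

# Hypothetical test matrices sharing a common orthogonal diagonalizer U.
rng = np.random.default_rng(4)
n, K = 4, 5
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
mats = [U.T @ np.diag(rng.uniform(0.5, 5.0, size=n)) @ U for _ in range(K)]
weights = np.full(K, 1.0 / K)

B = jado(mats, weights)
print("criterion at identity:", jado_criterion(np.eye(n), mats, weights))
print("criterion after JADO :", jado_criterion(B, mats, weights))
print("Hadamard lower bound :", sum(w * np.linalg.slogdet(C)[1] for C, w in zip(mats, weights)))
```

Because the lower bound (28) is zero for $T_{ij} = I$, each pair update cannot increase the criterion. For matrices that share a common diagonalizer, as in this example, one would expect the criterion to approach the Hadamard bound $\sum_k w_k \log \det C_k$, although this sketch includes no convergence test.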
7. Experimental results
In this section we present the performances in image compression of the optimal transforms
described in the previous sections.
Nc × Nr = 320 × 1376, Toulouse with Nc × Nr = 352 × 3816, Vannes with Nc × Nr = 352 × 3736,
. . . The hyperspectral images are AVIRIS images (Moffett, Cuprite and Jasper; see footnote 4) with N = 224
components from the visible to the infrared, coded on Nb = 16 bpppb. They were
originally acquired with Nr × Nc = 512 × 624, but for the simulations we kept only the 512
leftmost columns. Some images used in our tests are shown in figures 1 and 2. As already
absolute error (MAE $= \sum_{i=1}^{N} \sum_{n=1}^{L} |X_i(n) - \hat{X}_i(n)| / (NL)$). With these four
distortions, one
can estimate the performance of a codec in usual applications of hyperspectral images, such as
classification and target detection (Christophe et al., 2005). For multispectral images, we
considered only the MAD and the MSE distortions, the latter being expressed in terms
of peak signal-to-noise ratio, PSNR $= 10 \log_{10} [(2^{N_b} - 1)^2 / D]$, where $D$ is the actual
end-to-end MSE distortion and $N_b$ is the number of bits per pixel and per band (bpppb) of
the initial image. The bit-rate, expressed in bpppb, was measured on the actual bit stream
obtained with the JPEG2000 coder EBCOT (Taubman, 2000) and its PCRD optimizer applied
across components for optimal bit allocation. We used the Verification Model version 9.1
(VM9 (JPEG2000, 2001)) codec developed by the JPEG2000 group. The coefficients of $A^{-1}$ (the
inverse matrix of the optimal spectral transform) and the mean of each component are stored
in the bitstream as float32 data (this costs $32(N + 1)/L$ bpppb). A difference exists between
the target bit-rate and the actual bit-rate obtained with the VM9. In our tests, this difference
does not exceed ± 0.001 bpppb and thus the precision of the PSNR is about ± 0.05 dB.
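The PSNR and MAE definitions above, and the side-information cost $32(N+1)/L$, translate directly into code. The following is a minimal sketch in which images are stored as arrays of shape (N, L), that is, N components of L pixels each. The function names and the toy data are hypothetical.

```python
import numpy as np

def psnr_db(X, X_hat, n_bits):
    """PSNR = 10 log10[(2^Nb - 1)^2 / D], with D the end-to-end MSE."""
    diff = np.asarray(X, dtype=float) - np.asarray(X_hat, dtype=float)
    mse = np.mean(diff ** 2)          # a zero MSE would give an infinite PSNR
    peak = 2.0 ** n_bits - 1.0
    return float(10.0 * np.log10(peak ** 2 / mse))

def mean_absolute_error(X, X_hat):
    """MAE = sum_i sum_n |X_i(n) - X_hat_i(n)| / (N L)."""
    diff = np.asarray(X, dtype=float) - np.asarray(X_hat, dtype=float)
    return float(np.mean(np.abs(diff)))

def side_info_bpppb(N, L):
    """Cost of storing A^{-1} (N*N values) and N means as float32: 32 (N + 1) / L."""
    return 32.0 * (N + 1) / L

# Toy data (assumption for illustration): a 3-component image with a unit error.
X = np.zeros((3, 10))
X_hat = X + 1.0
print(psnr_db(X, X_hat, 16), mean_absolute_error(X, X_hat))
print(side_info_bpppb(224, 512 * 512))
```

The side-information formula follows from $N^2 + N = N(N+1)$ float32 values spread over the $NL$ samples of the image.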
4 These images have been downloaded from the NASA web site http://aviris.jpl.nasa.gov/.
Table 1. Bit-rate (in bpppb) versus PSNR (in dB) and versus MAD of different spectral
transforms on multispectral images (best results are bolded). The bit-rate was computed with
the VM9.
Table 2. Bit-rate (in bpppb) versus SNR (in dB) and versus MAE of different spectral
transforms on hyperspectral images. The bit-rate was computed with the VM9.

two distortions PSNR and MAD on six multispectral images and Tables 2 and 3 show the
bit-rate of different transforms versus the four distortions SNR (in dB), MAE, MAD and
MSA (expressed in degrees) on three hyperspectral images. In all cases the 2-D DWT was applied
with five levels of decomposition. We observe the well-known fact that spectral transforms
perform significantly better than the identity matrix (i.e., no spectral transform), especially
for hyperspectral images. Indeed, on six multispectral images (see Table 1) the average gains
of the KLT, JADOST, OrthOST and OST over Identity are respectively 3.6 dB, 3.6 dB, 3.8 dB and
3.8 dB. On three hyperspectral images (see Table 2) the average gains of the KLT, JADOST,
OrthOST and OST over Identity are respectively 15.9 dB, 16.3 dB, 16.3 dB and 16.4 dB. Moreover,
we can notice that the optimal transforms OrthOST and OST always perform a little better
than the KLT at medium and high bit-rates: on six multispectral (resp. three hyperspectral)
images the average gains of OrthOST and OST over the KLT are about 0.23 dB and 0.28 dB (resp.
0.43 dB and 0.49 dB). On the multispectral images, we observed that JADOST performs
roughly as well as the KLT for MSE distortion, sometimes slightly better, sometimes slightly worse, at
any bit-rate. On six images, the average gain of JADOST over the KLT is negligible (about 0.02 dB)
at medium and high bit-rates (from 0.25 to 3 bpppb), whereas the average gain of OrthOST over
JADOST is about 0.21 dB at the same rates. Nevertheless, on hyperspectral images, JADOST
performs slightly but significantly better than the KLT for the four distortions tested at
medium and high bit-rates (see Tables 2 and 3) and nearly reaches the OrthOST scores with a
significantly lower computational complexity. The average gain of JADOST over the KLT (resp.
OrthOST over JADOST) is 0.37 dB (resp. 0.07 dB) on the range [0.25 bpppb, 3 bpppb]. Further,
we can remark that the difference in performance between OrthOST and OST is insignificant.
This can be explained by the fact that transforms minimizing the criterion (16) must have
a small value for $C_O(A)$, i.e., they must be close to orthogonality (see Remark 5). Therefore
there is no advantage in using OST rather than the orthogonal transform OrthOST. In examining
the MAD distortion we observe that on the multispectral images tested, at medium bit-rates
(i.e. between 0.25 and 1.5 bpppb), OrthOST performs worse than the KLT (see Table 1). On
the other hand, on the three hyperspectral (AVIRIS) images tested, for all the distortions
measured, at medium and high bit-rates, JADOST and OrthOST always perform better than
the KLT (see Tables 2 and 3). This is a nice finding, since the optimality of OrthOST is justified
only for the MSE distortion and at high bit-rates.
         MSA (°)                                          | MAD
bit-rate   0.25  0.50  0.75  1.00  1.50  2.00  2.50  3.00 |  0.25  0.50  0.75  1.00  1.50  2.00  2.50  3.00
Moffett
Id        12.12  6.82  3.94  2.66  1.29  0.85  0.52  0.36 |  1676   781   492  1259   183    62    32    20
KLT        1.43  0.87  0.57  0.37  0.20  0.15  0.12  0.10 |   392   211   119    67    24    14     8     7
JADOST     1.15  0.59  0.42  0.27  0.19  0.14  0.11  0.09 |   279   120    67    44    18    12     8     6
OrthOST    0.96  0.47  0.31  0.25  0.18  0.14  0.11  0.09 |   261    77    49    33    18    10     8     6
OST        0.86  0.50  0.32  0.25  0.18  0.14  0.11  0.09 |   207   101    46    37    19    12     7     6
Cuprite
Id         5.30  2.81  2.20  1.57  1.01  0.59  0.40  0.26 |   659   360   253   185   110    62    61    40
KLT        0.42  0.25  0.22  0.15  0.12  0.08  0.07  0.06 |   154   135   100    54    26    16    10     8
JADOST     0.33  0.25  0.16  0.14  0.10  0.08  0.07  0.05 |   112   109    61    39    20    11     9     7
OrthOST    0.32  0.25  0.17  0.14  0.10  0.08  0.07  0.05 |   113   110    61    37    22    11     9     7
OST        0.35  0.24  0.16  0.14  0.10  0.08  0.07  0.05 |   113   109    58    42    17    11     9     7
Jasper
Id        18.20 12.53  7.88  5.70  3.87  2.14  1.41  1.01 |  1907  1220   732   559   241   160    84    55
KLT        0.91  0.53  0.43  0.34  0.26  0.20  0.15  0.12 |   225   151    82    57    30    15    10     7
JADOST     0.87  0.51  0.44  0.33  0.24  0.19  0.15  0.12 |   157    91    56    51    20    11     9     7
OrthOST    0.83  0.51  0.40  0.33  0.24  0.19  0.15  0.12 |   157    84    46    34    23    13     9     7
OST        0.79  0.51  0.41  0.32  0.24  0.18  0.14  0.11 |   156    86    48    34    22    14     8     6
Table 3. Bit-rate (in bpppb) versus MSA (in degree ◦ ) and versus MAD of different spectral
transforms on hyperspectral images for the separable scheme. The bit-rate was computed
with the VM9.
As already mentioned, the main drawback of the orthogonal optimal spectral transforms returned
by the JADO and OrthOST algorithms is their heavy computational cost. In the next section we
introduce quasi-optimal orthogonal spectral transforms.
5 The images were acquired via the Data Disseminated System (http://dwlinkdvb.esrin.esa.it/DDS/)
thanks to ESA/ESRIN.
Fig. 3. Fifth component (corresponding approximately to the band [555 nm, 565 nm]) of the
hyperspectral images MERIS, numbered 1–4, 6, 8, 10, 13, 15–16, from left to right
reflective spectral range of visible and near infrared light. Each band has a programmable
width and a programmable location in the 390 nm to 1040 nm spectral range. As mentioned
in (MERIS, 2006), the instrument scans the Earth's surface by the push-broom method: CCD
arrays provide spatial sampling in the across-track direction, while the satellite's motion
provides scanning in the along-track direction. The scene is imaged simultaneously across
the entire spectral range, through a dispersing system, onto the CCD array. Therefore there
is no problem of deregistration on the MERIS images. The ten images of our tests all have
the same dimensions: Nr = 128, Nc = 1121 and N = 15. They are originally coded on
Nb = 16 bpppb and they were acquired with the same fifteen spectral bands. To construct
exogenous KLT and exogenous OrthOST, we split the ten MERIS images into two disjoint
sets, one consisting of seven images (the learning basis), the other consisting of the three
remaining images (the test subset). We considered 13 different learning bases, denoted Li
(1 ≤ i ≤ 13), which are presented in Table 4. The bit-rate is computed with the Verification
Model version 9 (VM9 (JPEG2000, 2001)). Note that the exogenous transforms are fixed (i.e.,
they do not adapt to the encoded image), hence they are known by the decoder and need not
be transmitted. However, in the lossy compression results given in Table 5 with the VM9,
the inverse of the spectral transform is coded in the bit-stream (it costs less than 0.001 bpppb).
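To make the idea of an exogenous transform concrete, the sketch below computes a KLT from a learning basis and applies it unchanged to any other image. The text does not specify how the learning images are combined. Pooling the centered pixels of all learning images into a single spectral covariance matrix is an assumption made here for illustration, and the function names and random data are hypothetical.

```python
import numpy as np

def pooled_covariance(images):
    """Spectral covariance pooled over images, each an (N, L) array (N components)."""
    centered = [img - img.mean(axis=1, keepdims=True) for img in images]
    pooled = np.concatenate(centered, axis=1)
    return pooled @ pooled.T / pooled.shape[1]

def exogenous_klt(learning_images):
    """Rows are eigenvectors of the pooled covariance, sorted by decreasing variance."""
    eigvals, eigvecs = np.linalg.eigh(pooled_covariance(learning_images))
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order].T

# Hypothetical learning basis of three random 6-component "images".
rng = np.random.default_rng(5)
scales = np.arange(1, 7)[:, None]
learning = [rng.normal(size=(6, 200)) * scales for _ in range(3)]
test_image = rng.normal(size=(6, 200)) * scales

T = exogenous_klt(learning)
Y = T @ (test_image - test_image.mean(axis=1, keepdims=True))   # fixed transform, no adaptation
print(np.round(T @ pooled_covariance(learning) @ T.T, 3))
```

The transform decorrelates the learning data exactly, but only approximately decorrelates a test image, which is one way to read the dependence on the learning basis reported in the experiments.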
In Table 5 we present the performances obtained with two images when the learning basis
varies. Among all the tests we made, we chose to show the best and worst cases obtained
with an exogenous OrthOST. For this, the PSNR of a spectral transform is compared to that
obtained with the KLT by subtraction, and we considered that difference of PSNR at 1 bpppb.
The best and worst cases correspond respectively to the tested images MERIS2 and MERIS8.
We can see that for MERIS2, the exogenous OrthOST performs significantly better than the
KLT at all rates for both MSE and MAE global distortions, whereas for both MAD and
MSA local distortions, exogenous OrthOST and KLT have roughly the same performance, the
winner depending on the bit-rate. A more interesting result is the worst case: at bit-rates
not greater than 1 bpppb, the worst exogenous OrthOST performs worse than the worst
exogenous KLT, and this trend is reversed for bit-rates larger than 1 bpppb. Moreover, the loss
of PSNR compared to the KLT is 4.3 dB at 1 bpppb; however, the difference of PSNR between
the KLT and identity (i.e., no spectral transform) is particularly high here (30 dB). For the
other eight tested images, the loss of PSNR of the worst exogenous OrthOST with respect to
the KLT is not greater than 2.5 dB, at all bit-rates. Moreover, it is always smaller than the loss
of PSNR of the best exogenous KLT with respect to the KLT. An example is shown in Fig. 4,
where the bit-rate is computed either with the VM9 or the Bit Plane Encoder (BPE) (CCSDS-1,
2007) recommended by the CCSDS (Yeh et al., 2005). In order to compute the bit-rate with
the BPE, we proceeded as follows: first, for each transformed component we computed a few
hundred points of the graph that links mean square error to bit-rate, then we applied the
algorithm by Shoham and Gersho (Shoham & Gersho, 1988) to optimally allocate distortions
between components for given maximal total bit-rates.
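The allocation step can be sketched with a Lagrangian search over sampled rate-distortion points. This is a simplified illustration in the spirit of Lagrangian bit allocation, not a reproduction of the Shoham and Gersho algorithm or of the authors' procedure. The bisection on the multiplier, the function names, and the synthetic curves $D = \sigma^2 4^{-R}$ are all assumptions.

```python
import numpy as np

def pick_points(rd_curves, lam):
    """For each component, index of the sampled point minimizing D + lam * R."""
    return [int(np.argmin(d + lam * r)) for r, d in rd_curves]

def total_rate(rd_curves, picks):
    return float(sum(r[k] for (r, _), k in zip(rd_curves, picks)))

def total_distortion(rd_curves, picks):
    return float(sum(d[k] for (_, d), k in zip(rd_curves, picks)))

def allocate(rd_curves, rate_budget, iters=60):
    """Bisection on the multiplier so that the summed rate fits within the budget."""
    if sum(float(r.min()) for r, _ in rd_curves) > rate_budget:
        raise ValueError("budget below the smallest achievable total rate")
    lo, hi = 0.0, 1.0
    # Grow the multiplier until the selected points fit the budget.
    while total_rate(rd_curves, pick_points(rd_curves, hi)) > rate_budget:
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_rate(rd_curves, pick_points(rd_curves, mid)) > rate_budget:
            lo = mid
        else:
            hi = mid
    return pick_points(rd_curves, hi)

# Synthetic curves (assumption): three components with variances 1, 4 and 16.
rates = np.arange(0.0, 5.0)
curves = [(rates, var * 4.0 ** (-rates)) for var in (1.0, 4.0, 16.0)]

picks = allocate(curves, rate_budget=6.0)
print("selected rates  :", [float(rates[k]) for k in picks])
print("total rate      :", total_rate(curves, picks))
print("total distortion:", total_distortion(curves, picks))
```

For a given multiplier, each component is handled independently, which is what makes this kind of allocation tractable when a few hundred points are available per component. Components with larger variance receive at least as much rate as the others.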
On average over the 10 images, the loss of PSNR of the worst exogenous OrthOST with respect
to the KLT is significantly smaller than that of the best exogenous KLT (see Table 6). This
is no longer the case for exogenous JADOST. We observed the importance of the learning
basis, whose influence can range from 0 dB to −4 dB. In other words, when the learning basis
is well chosen (depending on the scene and not only on the spectrometer), one can expect
a loss of PSNR of an exogenous OrthOST with respect to the KLT not greater than 0.4 dB,
whereas when it is badly chosen, the same loss of PSNR should be limited to 4 dB. However,
these values are only indicative and should not be considered definitive, because they were
obtained on a set of 10 MERIS images, which was not proven to be statistically significant.
We observed good performances of exogenous OrthOST used with the VM9 and in (Akam Bita
et al., 2010c) the authors observed that, associated with the BPE and the optimal bit allocation
Table 5. Bit-rate (in bpppb) vs PSNR (in dB), vs MAD, vs MSA and vs MAE of various
spectral transforms on two images. The exoi_KLT, exoi_JADOST and exoi_OrthOST
correspond respectively to exogenous KLT, JADOST and OrthOST computed with the
learning basis Li . The best (resp. worst) results of exogenous transforms at 1.0 bpppb are
bolded (resp. in italics).
Fig. 4. PSNR (in dB) versus bit-rate (in bpppb) for various spectral transforms: KLT, JADOST,
OrthOST and (left) best exogenous KLT, best exogenous JADOST, best exogenous OrthOST
or (right) worst exogenous KLT, worst exogenous JADOST, worst exogenous OrthOST. The
image is MERIS15 and the bit-rate is computed with the VM9 (first row) or the BPE (second row).
bit-rate (in bpppb)                                          0.25  0.50  0.75  1.00  1.50  2.00  3.00
mean (in dB) {PSNR(KLT) − worst exogenous PSNR(OrthOST)}     0.67  1.19  1.56  1.68  1.49  1.15  0.86
mean (in dB) {PSNR(KLT) − worst exogenous PSNR(JADOST)}      0.56  1.32  1.99  2.29  2.25  1.89  1.40
mean (in dB) {PSNR(KLT) − best exogenous PSNR(KLT)}          1.02  1.83  2.34  2.51  2.27  1.82  1.4
Table 6. Comparison of the averaged losses of PSNR with respect to the KLT for the worst
exogenous OrthOST, the worst exogenous JADOST and the best exogenous KLT. The worst
and best exogenous transforms are selected at 1.00 bpppb. The averages are computed on the
ten images.
algorithm by Shoham and Gersho (Shoham & Gersho, 1988) for quantization and entropy
coding, exogenous OrthOST still performs well (see Fig. 4). However, the VM9 and the
Shoham and Gersho algorithm both have too high a computational complexity for a coder
on board a satellite. In (Gutzwiller et al., 2009), the authors propose an extension to
multicomponent images of the well-known 2-D SPIHT encoder that does not have the shortcoming
of a high computational cost for bit-rate allocation.
9. Conclusion
In this chapter, we have studied the problem of finding optimal spectral transforms associated
with fixed 2-D discrete wavelet transforms in the coding of multi- and hyper-spectral images, for
a compression scheme that is compatible with the JPEG2000 Part 2 standard. We clarified
the criterion that gives, when minimized, an optimal transform under the hypothesis of high-rate
entropy-constrained scalar quantization, when one scalar quantizer per subband and per
component is applied. We showed the link between the criterion and the mutual information
contrast used in independent component analysis. We derived a criterion minimized by an
orthogonal optimal transform when the data are Gaussian. Then we gave three algorithms
that return the spectral transforms that minimize the JPEG2000 compatible criterion: two
under the constraint of orthogonality (one of which assumes Gaussian data) and a
third with no constraint other than invertibility. Finally, we have tested the optimal transforms
on satellite multi- and hyper-spectral images and found that, for hyperspectral images, the
orthogonal optimal transforms OrthOST and JADOST perform a little better than the KLT
for four distortion measures that allow the performance of the codec to be evaluated in
applications of hyperspectral images such as classification or target detection. However, the
computational complexity of the optimal transforms is too heavy for actual applications. Last,
we have presented the exogenous orthogonal quasi-optimal spectral transforms, which have a
significantly smaller complexity, and their performance in lossy coding. In future work, we
will study the problem of designing optimal spectral filters (i.e. a convolutive rather than an
instantaneous mixture) in lossy compression of multi- and hyper-spectral images.
10. References
Akam Bita, I. P., Barret, M., Dalla Vedova, F. & Gutzwiller, J.-L. (2008). Onboard
compression of hyperspectral images using an exogenous orthogonal quasi-optimal
transform, Proceedings of On-Board Payload Data Compression Workshop, Noordwijk
(The Netherlands).
Akam Bita, I. P., Barret, M. & Pham, D.-T. (2010a). On optimal transforms in lossy compression
of multicomponent images with JPEG2000, Signal Processing Vol. 90(No. 3): 759–773.
Akam Bita, I. P., Barret, M. & Pham, D.-T. (2010b). On optimal orthogonal transforms at high
bit-rates using only second order statistics in multicomponent image coding with
JPEG2000, Signal Processing Vol. 90(No. 3): 753–758.
Akam Bita, I. P., Barret, M., Vedova, F. D. & Gutzwiller, J.-L. (2010c). Lossy and lossless
compression of MERIS hyperspectral images with exogenous quasi-optimal spectral
transforms, Journal of Applied Remote Sensing Vol. 4: 1–15.
Antonini, M., Barlaud, M., Mathieu, P. & Daubechies, I. (1992). Image coding using wavelet
transform, IEEE Transactions on Image Processing Vol. 1(No. 2): 205–220.
Barret, M., Akam Bita, I. P., Gutzwiller, J.-L. & Dalla Vedova, F. (2009). Lossy hyperspectral
image coding with exogenous quasi optimal transforms, Proceedings Data Compression
Conference, Snowbird (USA), pp. 411–419.
Barret, M., Gutzwiller, J.-L. & Hariti, M. (2011). Low complexity hyperspectral image coding
using exogenous orthogonal optimal spectral transform (OrthOST) and degree-2
zerotrees, IEEE Transactions on Geoscience and Remote Sensing Vol. 49(No. 5).
CCSDS-1 (2007). Image data compression, Report concerning space data system standards, CCSDS
120.1-G-1, Green book .
Chang, C.-I., Ramakrishna, B., Wang, J. & Plaza, A. (2010). Low-bit rate exploitation-based
lossy hyperspectral image compression, Journal of Applied Remote Sensing Vol. 4: 1–24.
Christophe, E., Léger, D. & Mailhes, C. (2005). Quality criteria benchmark for hyperspectral
imagery, IEEE Transactions on Geoscience and Remote Sensing Vol. 43(No. 9): 2103–2114.
Christophe, E., Mailhes, C. & Duhamel, P. (2006). Best anisotropic 3-D wavelet decomposition
in a rate-distortion sense, Proceedings of International Conference on Acoustic, Speech and
Signal Processing, Toulouse (France), pp. II–17–20.
Dragotti, P. L., Poggi, G. & Ragozini, A. R. P. (2000). Compression of multispectral images
by three-dimensional SPIHT algorithm, IEEE Transactions on Geoscience and Remote
Sensing Vol. 38(No. 1): 416–428.
Du, Q. & Fowler, J. E. (2007). Hyperspectral image compression using JPEG2000 and principal
component analysis, IEEE Geoscience and Remote Sensing Letters Vol. 4: 201–205.
Du, Q. & Fowler, J. E. (2008). Low-complexity principal component analysis for hyperspectral
image compression, International Journal of High Performance Computing Applications
Vol. 22: 438–448.
Flury, B. N. & Gautschi, W. (1986). An algorithm for simultaneous orthogonal transformation
of several positive definite symmetric matrices to nearly diagonal form, SIAM Journal
on Scientific and Statistical Computing Vol. 7(No. 1): 169–184.
Fowler, J. E. & Rucker, J. T. (2007). chap 14: 3D wavelet-based compression of hyperspectral
imagery, in C.-I. Chang (ed.), Hyperspectral Data Exploitation: Theory and Applications,
John Wiley & Sons, Hoboken.
Gersho, A. & Gray, R. M. (1992). Vector quantization and signal compression, Kluwer Academic
Publisher.
Gray, R. M. & Neuhoff, D. L. (1998). Quantization, IEEE Transactions on Information Theory Vol.
44(No. 6): 2325–2384.
Gutzwiller, J.-L., Hariti, M., Barret, M., Christophe, E., Thiebaut, C. & Duhamel, P. (2009).
Extension du codeur SPIHT au codage d’images hyperspectrales, Proceedings of
Colloquium COmpression et REprésentation des Signaux Audiovisuels (CORESA), Toulouse
(France).
URL: http://liris.cnrs.fr/Documents/Liris-4087.pdf
JPEG2000 (2001). JPEG2000 verification model 9.1 (technical description), ISO/IEC JTC 1/SC
29/WG 1 WG1 N2165 pp. 1–213.
MERIS (2006). MERIS detailed instrument description, European Space Agency .
URL: http://envisat.esa.int/instruments/meris/
Narozny, M., Barret, M., Pham, D.-T. & Akam Bita, I. P. (2005). Modified ICA algorithms
for finding optimal transforms in transform coding, Proceedings of IEEE International
Symposium on Image and Signal Processing and Analysis, Zagreb (Croatie), pp. 111–116.
Narozny, M., Barret, M. & Pham, D.-T. (2008). ICA based algorithms for computing optimal
1-D linear block transforms in variable high-rate source coding, Signal Processing Vol.
88(No. 2): 268–283.
Papoulis, A. (1984). Probability, Random Variables, and Stochastic Processes, McGraw-Hill.
Penna, B., Tillo, T., Magli, E. & Olmo, G. (2007). Transform coding techniques for lossy
hyperspectral data compression, IEEE Transactions on Geoscience and Remote Sensing
Vol. 45(No. 5): 1408–1421.
Pham, D.-T. (2004). Fast algorithms for mutual information based independent component
analysis, IEEE Transactions on Signal Processing Vol. 52(No. 10): 2690–2700.
Pham, D.-T. (2005). Entropy of random variable slightly contaminated with another, IEEE
Signal Processing Letters Vol. 12(No. 7): 536–539.
Rucker, J. T., Fowler, J. E. & Younan, N. H. (2005). JPEG2000 coding strategies for hyperspectral
data, Proceedings of International Geoscience and Remote Sensing Symposium, Seoul
(Korea), pp. 128–131.
Said, A. & Pearlman, W. A. (1996). A new, fast and efficient image codec based on set
partitioning in hierarchical trees, IEEE Transactions on Circuits and Systems for Video
Technology Vol. 6(No. 3): 243–250.
Shoham, Y. & Gersho, A. (1988). Efficient bit allocation for an arbitrary set of quantizers, IEEE
Transactions on Acoustics, Speech, and Signal Processing Vol. 36(No. 9): 1445–1453.
Taubman, D. S. (2000). High performance scalable compression with EBCOT, IEEE Transactions
on Image Processing Vol. 9(No. 7): 1158–1170.
Taubman, D. S. & Marcellin, M. W. (2002). JPEG2000: Image Compression Fundamentals,
Standards and Practice, Kluwer Academic, Place of publication.
Thiebaut, C., Lebedeff, D., Latry, C. & Bobichon, Y. (2006). On-board compression algorithm
for satellite multispectral images, Proceedings of Data Compression Conference, Snowbird
(USA), pp. 28–30.
Usevitch, B. (1996). Optimal bit allocation for biorthogonal wavelet coding, Proceedings of Data
Compression Conference, Snowbird (USA), pp. 387–395.
Vaisey, J., Barlaud, M. & Antonini, M. (1998). Multispectral image coding using lattice VQ and
the wavelet transform, Proceedings of IEEE International Conference on Image Processing,
Chicago (USA), pp. 307–311.
Woods, J. W. & Naven, T. (1992). A filter based bit allocation scheme for subband compression
of HDTV, IEEE Transactions on Image Processing Vol. 1(No. 3): 436–440.
Yeh, P. S., Armbruster, P., Kiely, A., Masschelein, B., Moury, G. & Schafer, C. (2005). The
new CCSDS image compression recommendation, Proceedings of IEEE Aerospace
Conference, Big Sky (USA), pp. 1–8.
Part 3
Watermarking-Based
Image Authentication System in the
Discrete Wavelet Transform Domain
Clara Cruz Ramos, Rogelio Reyes Reyes, Mariko Nakano Miyatake and
Héctor Manuel Pérez Meana
SEPI ESIME Culhuacan, National Polytechnic Institute of México, México City,
México
1. Introduction
Nowadays, digital images and video are gradually replacing their conventional analog
counterparts. This is quite understandable because digital format is easy to edit, modify,
and exploit. Digital images and videos can be readily shared via computer networks and
conveniently processed for queries in databases. Also, digital storage does not age or
degrade with usage. On the other hand, thanks to powerful editing programs, it is very
easy even for an amateur to maliciously modify digital media and create "perfect"
forgeries. It is usually much more complicated to tamper with analog tapes and images.
Tools such as digital watermarks help us establish the authenticity and integrity of digital
media and can prove vital whenever questions are raised about the origin of an image and
its content.
A digital watermarking technique embeds a signal that is imperceptible to the human
audio/visual system, statistically undetectable, and resistant to lossy compression and
common signal processing operations. Existing methods for content authentication of
digital images can be classified in two groups: watermarking-based techniques
(Hsu & Wu, 1999) and digital-signature-based techniques (Friedman, 1993). Several authors
have written about digital image authentication systems (Wong, 1998; Holliman & Memon,
2000; Wong & Memon 2001; Celik, et al, 2002; Monzoy, et al, 2007; Cruz, et al, 2008; Cruz,
et al, 2009; Hernandez, et al, 2000; Lin & Chang 2001; Maeno, 2006; Hu & Chen, 2007; Zhou,
et al, 2004; Lu & Liao 2003), which are classified in three categories: complete
authentication, robust authentication and content authentication (Liu & Steinebach, 2006).
Complete authentication refers to techniques that consider the whole piece of multimedia
data and do not allow any manipulation (Yeung & Mintzer, 1997; Wu & Liu, 1998). Because
the non-manipulable data are like generic messages, many existing message authentication
techniques can be directly applied. For instance, digital signatures can be placed in the
LSB of uncompressed data, or in the header of compressed data. Manipulations will then be
detected because the hash values of the altered content bits will not match the
information in the embedded digital signature.
We define robust authentication as a technique that treats altered multimedia data as
authentic if manipulation is imperceptible. For example, authentication techniques that
tolerate lossy compression up to an allowable level of quality loss and reject other
manipulations, such as tampering, belong to this category.
Content authentication techniques are designed to authenticate multimedia content at a
semantic level even though manipulations may be perceptible. Such manipulations may
include filtering, color manipulation, geometric distortion, etc. We distinguish these
manipulations from lossy compression because these perceptible changes may be
considered acceptable by some observers but unacceptable by others.
A common objective for authentication is to reject the crop-and-replacement process that
may change the meaning of data. Many robust watermarking techniques in the literature are
designed to be robust to all manipulations for copyright protection purposes. They usually
fail to reject the crop-and-replacement process, so they are not suitable for robust
authentication and content authentication.
An authentication system can be considered as effective if it satisfies the following
requirements:
1. Sensitivity: The authenticator is sensitive to malicious manipulations such as crop-and-
replacement.
2. Robustness: The authenticator is robust to acceptable manipulations such as lossy
compression, or other content-preserving manipulations.
3. Security: The embedded information bits cannot be forged or manipulated. For
instance, if the embedded watermarks are independent of the content, then an attacker
can copy watermarks from one multimedia data to another.
4. Portability: Watermarks have better portability than digital signatures because the
authentication can be conducted directly from only received content.
5. Identification of manipulated area: Users may need partial information. The
authenticators should be able to detect location of altered areas, and verify other areas
as authentic.
Regardless of security issues, watermarking capacity is determined by invisibility and
robustness requirements. These three dimensions are shown in Figure 1. If one parameter is
fixed, the other two trade off against each other. For instance, a specific
application may determine how many message bits are needed. After the embedded
amount is decided, there is always a trade-off between visual quality and robustness which
must be considered. Robustness refers to the extraction of embedded bits with an error
probability equal to or approaching zero. Watermark imperceptibility (invisibility) represents
the quality of the watermarked image with respect to the original one. In general, if we want
to make our watermark more robust against attacks, then a longer codeword or larger codeword
amplitudes will be necessary to provide better error resistance. However, visual quality
degradation cannot be avoided. Another scenario may be that, for a given visual quality,
there exists a trade-off between the information quantity of the embedded message and
robustness. For instance, the fewer message bits are embedded, the more redundant the
code word can be. Therefore, the code word has better error correction capability against noise.
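The trade-off between payload and redundancy can be illustrated with a simple repetition code. The sketch below is only an illustration and is not the coding scheme used in this work: it assumes a fixed embedding capacity of 64 positions (a hypothetical figure), repeats each message bit as many times as fits, and decodes by majority vote.

```python
def repeat_encode(message_bits, capacity):
    """Spread the message over at most `capacity` positions by repeating each bit."""
    reps = capacity // len(message_bits)   # fewer message bits -> more copies per bit
    return [b for b in message_bits for _ in range(reps)], reps

def majority_decode(codeword, reps):
    """Recover each message bit by majority vote over its `reps` copies."""
    decoded = []
    for start in range(0, len(codeword), reps):
        chunk = codeword[start:start + reps]
        decoded.append(1 if 2 * sum(chunk) > len(chunk) else 0)
    return decoded

def correctable_errors(reps):
    """Bit flips per message bit that majority voting is guaranteed to tolerate."""
    return (reps - 1) // 2

CAPACITY = 64  # hypothetical number of embeddable positions
for n_bits in (4, 8, 16):
    message = [i % 2 for i in range(n_bits)]
    codeword, reps = repeat_encode(message, CAPACITY)
    print(n_bits, "message bits:", reps, "copies each, tolerates",
          correctable_errors(reps), "flips per bit")
```

With the capacity held fixed, halving the number of message bits doubles the number of copies per bit, and with it the number of flipped copies that can be tolerated.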
It is difficult for an authenticator to know the purpose of manipulation. A practical approach
is to design an authenticator based on the manipulation method. In this work, we design an
authenticator which accepts format transformation and lossy compression (JPEG). The
authenticator rejects replacement manipulations because they are frequently used for
attacks. Our authenticator does not aim to reject or accept, in absolute terms, other
manipulation methods because the problem of whether they are acceptable or not depends
on applications.
watermark detection error due to incidental alterations is shown to be smaller than the
probability of watermark detection error due to malicious tampering because they produce
comparatively smaller variance difference with the embedded marks. The authors argue
that this grants a certain degree of robustness to the system and show that their method is
able to authenticate JPEG compressed images without any access to the original unmarked
image. However, the degree of image compression allowed by the detection procedure is
not stated and the selection procedure of quantization parameters is not explained either.
In this work we develop a content authentication technique using imperceptible digital
watermarking, robust to malicious and incidental attacks, that embeds a digital signature
as the watermark. A digital signature is a set of features extracted from an image;
conventionally these features are stored as a file, which is used later for
authentication. To avoid the extra bandwidth needed to transmit the signature in this
conventional way, after extracting the digital signature we apply the discrete wavelet
transform (DWT) to the image and embed the watermark in the subband of lowest frequency,
because we want the watermark insertion to be imperceptible to the Human Visual System
and robust to common image processing such as JPEG compression and noise
contamination. The proposed system extracts the watermark in fully blind detection
mode, that is, without access to the original host signal, so the watermark has to be
re-derived from the watermarked signal itself; this increases the system security.
In the security community, an integrity service is unambiguously defined as one which
ensures that the sent and received data are identical. This binary definition is also
applicable to images; however, it is too strict and not well adapted to this type of digital
document. Indeed, in real-life situations images will be transformed: their pixel values will
be modified but not their actual semantic meaning. In other words, the problem of
image authentication concerns the image content, for example, whether modifications of
the document change its meaning or visually degrade it. In order to provide an
authentication service for still images, it is important to distinguish between malicious
manipulations, which consist of changing the content of the original image (captions, faces,
etc.) and manipulations related to the usage of an image such as format conversion,
compression, noise, etc.
Unfortunately this distinction is not always clear; it partly depends on the type of image and
its usage. Indeed, the integrity criteria of an artistic masterpiece and a medical image will
not be the same. In the first case, a JPEG compression will not affect the perception of the
image, whereas in the second case it may discard some of the fine details, which would
render the image totally useless. In the latter case, the strict definition of integrity is
required. We applied the proposed algorithms to grayscale and color non-medical images.
$$b_i = \begin{cases} 0, & |B \cdot P_i| < Th \\ 1, & |B \cdot P_i| \ge Th \end{cases} \qquad (1)$$
Since the patterns Pi have zero mean, the projections do not depend on the mean gray value
of the block and only depend on the variations in the block itself. The distribution of the
projections is image dependent and should be adjusted accordingly so that approximately
half the bits bi are zeros and half are ones. This will guarantee the highest information
content of the extracted N-tuple. This adaptive choice of the threshold becomes important
for those image operations that significantly change the distribution of projections, such as
contrast adjustment.
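The bit extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pattern generator (uniform pseudo-random values seeded by a key, with the mean subtracted) and the choice of the median of all absolute projections as the adaptive threshold are assumptions made for the example; the text only requires zero-mean patterns and a threshold that yields roughly as many zeros as ones.

```python
import random
import statistics

def make_patterns(n_patterns, block_len, key):
    """Key-dependent pseudo-random patterns, each shifted to have zero mean."""
    rng = random.Random(key)
    patterns = []
    for _ in range(n_patterns):
        p = [rng.random() for _ in range(block_len)]
        mean = sum(p) / block_len
        patterns.append([x - mean for x in p])
    return patterns

def projections(block, patterns):
    """Absolute inner products between a flattened block and each pattern."""
    return [abs(sum(b * p for b, p in zip(block, pat))) for pat in patterns]

def extract_bits(blocks, patterns):
    """Threshold all projections at their median so about half the bits are ones."""
    proj = [projections(blk, patterns) for blk in blocks]
    th = statistics.median(v for row in proj for v in row)
    return [[1 if v >= th else 0 for v in row] for row in proj], th

# Hypothetical data: four flattened 16x16 blocks, N = 16 bits per block.
rng = random.Random(0)
blocks = [[rng.randint(0, 255) for _ in range(256)] for _ in range(4)]
patterns = make_patterns(16, 256, key=1234)
bits, th = extract_bits(blocks, patterns)
print(bits[0], "threshold:", round(th, 2))
```

Because every pattern sums to zero, adding a constant to all pixels of a block leaves its projections unchanged up to rounding, which is the mean-invariance property noted in the text.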
The subbands labeled LH1, HL1, and HH1 represent the finest scale wavelet coefficients. In
the present work, the wavelet transform is realized with Daubechies Wavelets of order 2.
Using these wavelets, the image is decomposed into four subbands: LL1, LH1, HL1 and HH1.
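A one-level decomposition of this kind can be sketched in plain Python. The example assumes that "Daubechies Wavelets of order 2" refers to the four-tap filter usually called db2, uses periodic extension at the image borders, and adopts one possible naming convention for the LH1 and HL1 subbands; all three are assumptions for illustration. In practice a library routine such as PyWavelets' `pywt.dwt2(image, 'db2')` would normally be used instead.

```python
import math

S3, S2 = math.sqrt(3.0), math.sqrt(2.0)
# Four-tap Daubechies (db2) low-pass filter and its quadrature-mirror high-pass.
LO = [(1 + S3) / (4 * S2), (3 + S3) / (4 * S2),
      (3 - S3) / (4 * S2), (1 - S3) / (4 * S2)]
HI = [LO[3], -LO[2], LO[1], -LO[0]]

def analyze(signal, taps):
    """Filter and downsample by 2, wrapping around at the border."""
    n = len(signal)
    return [sum(t * signal[(2 * i + k) % n] for k, t in enumerate(taps))
            for i in range(n // 2)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dwt2_level1(image):
    """Return (LL1, LH1, HL1, HH1) for an image with even height and width."""
    low = [analyze(row, LO) for row in image]    # low-pass along each row
    high = [analyze(row, HI) for row in image]   # high-pass along each row

    def columns(m, taps):
        return transpose([analyze(col, taps) for col in transpose(m)])

    return (columns(low, LO), columns(low, HI),
            columns(high, LO), columns(high, HI))

# Hypothetical 8x8 test image.
image = [[(3 * r + 5 * c) % 17 for c in range(8)] for r in range(8)]
LL1, LH1, HL1, HH1 = dwt2_level1(image)
print(len(LL1), "x", len(LL1[0]), "coefficients per subband")
```

With periodic extension the transform is orthogonal, so the total energy of the four subbands equals that of the image, and a constant image produces nonzero coefficients only in LL1.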
= [( ) ⁄ ] (2)
Then, we extract the embedded binary data wk as follows: if S is an even number, then wk
=0, otherwise wk =1.
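As an illustration of parity-based extraction, the sketch below assumes that S is obtained by dividing a coefficient c by a quantization step Q and rounding down. The names c and Q, the step value, and the embedding rule (moving the coefficient to the centre of the nearest interval with the required parity) are assumptions chosen for the example and are not necessarily the exact rule of equation (2).

```python
import math

def quant_index(c, q):
    """Assumed form of S: the coefficient divided by the step, rounded down."""
    return math.floor(c / q)

def embed_bit(c, bit, q):
    """Move c to the centre of the nearest interval whose index parity equals `bit`."""
    s = quant_index(c, q)
    if s % 2 != bit:
        # Step to whichever neighbouring interval is closer to c.
        s += 1 if (c - s * q) >= q / 2 else -1
    return (s + 0.5) * q

def extract_bit(c, q):
    """w_k = 0 if S is even, w_k = 1 otherwise."""
    return quant_index(c, q) % 2

Q = 8.0  # hypothetical quantization step
marked = embed_bit(100.1, 1, Q)
print(marked, extract_bit(marked, Q), extract_bit(marked + 3.0, Q))
```

Placing the marked coefficient at the centre of its interval means that any later perturbation smaller than Q/2 leaves the extracted bit unchanged, which is where the robustness of this kind of scheme comes from; a larger Q gives more robustness at the cost of more distortion.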
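$$\text{block} = \begin{cases} \text{authentic}, & \sum_{k=1}^{16} (w_k \oplus b_k) \le Th_v \\ \text{modified}, & \sum_{k=1}^{16} (w_k \oplus b_k) > Th_v \end{cases} \qquad (3)$$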
The threshold Thv was determined through trial and error; the resulting value was Thv = 4.
This means that if at least 12 of the 16 bits of the digital signature extracted from the
block under authentication match, the block is considered authentic; otherwise it is
considered modified. However, a block flagged as modified is not necessarily the result of
an intentional modification, since a mismatch between the extracted 16-bit signature and
the original one can also be caused by a non-intentional modification, which is why we
propose the following verification check.
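$$\text{region} = \begin{cases} \text{authentic}, & \sum e \le 3 \\ \text{intentionally modified}, & \sum e > 3 \end{cases} \qquad (4)$$
where $e$ represents an error block, so if there are more than three consecutive error blocks in
the region, it has been intentionally modified.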
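The authentication and verification rules described above can be sketched together. In this sketch a block is flagged as an error block when more than Thv = 4 of its 16 signature bits differ, and error blocks are kept as tampered only when they belong to a group of more than three connected error blocks. Interpreting "consecutive" as 8-neighbour connectivity, as well as the function and variable names, are assumptions made for the example.

```python
def block_is_error(sig_embedded, sig_computed, thv=4):
    """A block is an error block if more than thv signature bits differ."""
    differing = sum(a ^ b for a, b in zip(sig_embedded, sig_computed))
    return differing > thv

def tampered_blocks(error_map, max_group=3):
    """Keep error blocks that lie in 8-connected groups larger than max_group."""
    rows, cols = len(error_map), len(error_map[0])
    seen = [[False] * cols for _ in range(rows)]
    tampered = set()
    for r in range(rows):
        for c in range(cols):
            if not error_map[r][c] or seen[r][c]:
                continue
            group, stack = [], [(r, c)]
            seen[r][c] = True
            while stack:                       # flood fill over the 8 neighbours
                y, x = stack.pop()
                group.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and error_map[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(group) > max_group:         # isolated errors are discarded
                tampered.update(group)
    return tampered

# Hypothetical 6x6 map of error blocks: two isolated errors and one cluster.
error_map = [[0] * 6 for _ in range(6)]
for y, x in ((0, 0), (0, 5), (3, 2), (3, 3), (4, 3), (5, 4)):
    error_map[y][x] = 1
print(sorted(tampered_blocks(error_map)))
```

Isolated error blocks, such as those typically produced by compression, are discarded by the second step, while a concentrated cluster is reported as the tampered region.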
Fig. 6. (a,b) Non-intentionally modified image; (c,d) Intentionally modified image.
4. Experimental results
4.1 Digital signature robustness
To evaluate the robustness of the bit extraction procedure, we subjected the test image
"Barbara" with 512x512 pixels and 256 gray levels to various image processing operations
available in specialized commercial image manipulation software (we used Photoshop). The
test image "Barbara" had 1024 blocks of 16x16 pixels. We extracted N=16 bits from each
block for the original image and the manipulated image and calculated the average number
of errors over all 1024 blocks. The results are shown in Table 1.
Table 1. Average number of erroneous recovered bits out of 16 bits after some image processing
operations.
Another advantage of this algorithm is that the size and texture of the image do not affect
the correct operation of the system.
(a) 8 bits per pixel (bpp) grayscale image (b) 8 bits per pixel (bpp) grayscale image
(c) 24 bits per pixel (bpp) color image (d) 24 bits per pixel (bpp) color image
$$PSNR = 10 \log_{10}\left(\frac{255^2}{MSE}\right) \qquad (5)$$
where $MSE$ is the mean square of the difference between the original image and the
watermarked one.
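The PSNR computation can be sketched as follows, assuming 8-bit images (peak value 255) stored as lists of rows; the function name and the image representation are choices made for this example.

```python
import math

def psnr(original, watermarked, peak=255):
    """PSNR in dB between two equally sized images given as lists of rows."""
    diffs = [(a - b) ** 2
             for row_o, row_w in zip(original, watermarked)
             for a, b in zip(row_o, row_w)]
    mse = sum(diffs) / len(diffs)      # mean square of the pixel differences
    if mse == 0:
        return math.inf                # identical images
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical 4x4 image and a copy in which every pixel is off by one level.
original = [[100 + r + c for c in range(4)] for r in range(4)]
watermarked = [[v + 1 for v in row] for row in original]
print(round(psnr(original, watermarked), 2), "dB")
```

For a color image the same function can be applied per channel, or to the concatenation of all channels, depending on the convention adopted.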
Figure 8 shows some examples of original images (in grayscale and color) together with
their respective watermarked images and PSNR values, where we can see that the watermarked
images are perceptually very similar to the original versions. Table 2 lists the PSNR values
of some grayscale and color images, where we can observe that the average PSNR
value is 45 dB for the grayscale images and 50 dB for the color images, so we can conclude
that the degradation in the watermarked image is not perceptible.
considered as authentic. To satisfy this need, we calculate the increase of the number of the
“different” signature bits after compression (error blocks). The number of the error blocks
increases if the image is more compressed. We can set a threshold on this change to reject
those images that have too many error blocks.
If the error blocks are isolated, we apply equation (4) to determine whether those blocks are
the result of JPEG compression; if, however, they are concentrated, we are dealing with an
intentional attack. We call this process “verification”; it helps us to differentiate
between an intentional and a non-intentional attack.
Figure 9 shows the authentication results for JPEG-compressed watermarked
images with quality factors higher than 75 and their corresponding verified images; we can
see that compressed images with quality factors higher than 75 have their error blocks
(white blocks) isolated; consequently, after the verification process they are considered as
not attacked.
5. Conclusion
The transition from analog to digital technologies is widespread; with the higher capacity of
storage devices and data communication channels, multimedia content has become a part of
our daily lives. Digital data is now commonly used in many areas such as education,
entertainment, journalism, law enforcement, finance, health services, and national defense.
The low cost of reproduction, storage, and distribution has added an additional dimension
to the complexity of the problem. In a number of applications, multimedia needs to be
protected for several reasons. Watermarking is a group of complementary technologies that
has been identified by content providers to protect multimedia data.
In this paper we have developed a robust digital signature algorithm which is
used as a semi-fragile watermarking algorithm for image authentication. The main
advantage of this combination, besides the digital signature robustness and the watermark
imperceptibility, is that no additional bandwidth is necessary to transmit the
digital signature, since it is embedded in the host image as a watermark. In addition to the
extraction and authentication processes, we propose a verification process, which helps us to
differentiate between an intentional and a non-intentional modification by applying the
concept of connectivity between the 8 neighbors of error blocks.
(c) Authentication of the altered image (d) Verification of the authenticated image
Numerical experiments show that this algorithm is robust to JPEG lossy compression; the
lowest acceptable JPEG quality factor is 75 for grayscale images and 70 for color images. In the
case of impulsive noise, the verification system determines that a watermarked image has a
non-intentional modification if the noise density is less than 0.002, which produces an average
PSNR of 32 dB between the watermarked image and the contaminated watermarked image; a
similar case occurs with Gaussian noise: the highest variance that the system accepts is
0.00011 before it considers the contaminated watermarked image as intentionally modified.
An important characteristic of this system, besides its robustness against common signal
processing, is its capacity to detect the exact locations that were intentionally
modified. Several watermarking systems using digital signatures have been reported, but they
are robust neither to JPEG compression nor to modifications caused by common signal
processing.
Finally it is important to mention that the watermarked images generated by the proposed
algorithm are secure because the embedded watermarks are dependent on their own
content.
6. Acknowledgment
This work is supported by the National Polytechnic Institute of México.
7. References
Celik, M.; Sharma, G.; Saber, E. & Tekalp, A. (2002), Hierarchical Watermarking for Secure
Image Authentication with Localization, IEEE Transactions on Image Processing, Vol.
11, No. 6, pp. 585–595.
Chen, T.; Wang, J. & Zhou, Y. (2001), Combined Digital Signature and Digital Watermark
Scheme for Image Authentication, Info-tech and Info-net, 2001. Proceedings. ICII 2001 -
International Conferences on, Vol. 5, pp. 78-82, Print ISBN: 0-7803-7010-4.
Cruz, C. ; Reyes, R.; Nakano M. and Pérez, H. (2009), Image Authentication Scheme Based
on Self-embedding Watermarking, CIARP '09 Proceedings of the 14th Iberoamerican
Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis,
Computer Vision, and Applications, ISBN: 978-3-642-10267-7.
Cruz, C.; Reyes, R.; Mendoza, J.; Nakano, M. and Pérez, H. (2008), A Novel Verification
Scheme for watermarking based Image Content Authentication Systems,
Telecommunications and Radio Engineering, vol. 67, no. 19, pp. 1777-1790, 2008,
ISSN:0040-2508, http://begelhouse.com
Fridrich, J. (1999), Robust Bit Extraction from Images, Proceedings of IEEE International
Conference on Multimedia Computing and Systems (ICMCS’99), Vol. 2, pp. 536-
540, ISBN:0-7695-0253-9.
Friedman, G.L. (1993), The Trustworthy Digital Camera: Restoring Credibility to the
Photographic Image, IEEE Transactions on Consum. Elec., Vol. 39, pp. 905–910.
Hernández, V. ; Cruz, C. ; Nakano M. and Pérez, H. (2000), Algoritmo de Marca de Agua
Basado en la DWT para Patrones Visualmente Reconocibles, IEEE Latin America
Transactions, Vol. 4, No. 4, June 2006.
Holliman, M. & Memon, N. (2000), Counterfeiting Attacks on Oblivious Block-Wise
Independent Invisible Watermarking Scheme, IEEE Transactions on Image Processing,
Vol. 9, No. 3, pp. 432-441.
Hsu, C. T. & Wu, J.I. (1999). Hidden Digital Watermarks in Images, IEEE Transactions on
Image Processing, Vol. 8, pp. 58–68.
Hu, Y. & Chen, Z. (2007), An SVD-Based Watermarking Method for Image Authentication,
Proceedings of the Sixth International Conference on Machine Learning and Cybernetics,
pp. 1723-1728, Hong Kong, 19-22 August (2007).
Inoue, H.; Miyazaki, A. & Katsura, T. (2000), A Digital Watermark for Images Using the Wavelet
Transform, Journal Integrated Computer-Aided Engineering, Vol. 7, No. 2, pp. 105-115.
Kundur, D. & Hatzinakos, D. (1999). Digital watermarking for telltale tamper proofing and
authentication, Proceedings of the IEEE, Vol. 87, No.7, pp. 1167–1180.
Lin, C. & Chang S. (2001), A Robust Image Authentication Method Distinguishing JPEG
Compression from Malicious Manipulation, IEEE Transactions on Circuits and
systems of Video Technology, Vol. 11 No. 2, pp. 153-168.
Liu, H. & Steinebach, M. (2006), Semi-Fragile Watermarking for Image Authentication with
High Tampering Localization Capability, Proc. of Int. Conf. Automated Production
of Cross Media Content for Multi-Channel Distribution, ISBN:0-7695-2625-X.
Lu, C. & Liao, H. M. (2003), Structural Digital Signature for Image Authentication: An
Incidental Distortion Resistant Scheme, IEEE Transactions on Multimedia, Vol. 5, No.
2, pp. 161-173.
Maeno, K.; Sun, Q.; Chang, S. & Suto, M. (2006), New Semi-Fragile Image Authentication
Watermarking Techniques Using Random Bias and Nonuniform Quantization,
IEEE Trans. on Multimedia, Vol. 8, No. 1, pp. 32-45.
Monzoy, M.; Salinas, M.; Nakano, M. & Pérez, H. (2007), Fragile Watermarking for Color
Image Authentication, 4th Int. Conf. Electrical and Electronic Engineering (ICEEE
2007), pp. 157-160.
Paquet, H. A.; Ward, R. K. & Pitas, I. (2003). Wavelet packets-based Digital Watermarking
for Image Verification and Authentication, Journal Signal Processing - Special section:
Security of data hiding technologies archive, Vol. 83 Issue 10, Amsterdam, The
Netherlands.
The Mathworks. Inc. (2008), Imwrite: Functions (Matlab functions references), Matlab help,
Ver. 7.6.0.324.
Wong, W. P. (1998), A Public Key Watermark for Image Verification and Authentication,
Proceedings of the IEEE Int. Conf. Image Processing, pp. 425-429.
Wong, W. P. & Memon, N. (2001), Secret and Public Key Image Watermarking Schemes for
Image Authentication and Ownership Verification, IEEE Transactions on Image
Processing, Vol. 10, No. 10, pp. 1593-1601.
Wu, M. & Liu, B. (1998), Watermarking for image authentication, Image Processing, 1998. ICIP 98.
Proceedings. International Conference on. Vol. 2, pp. 437–441, Print ISBN: 0-8186-8821-1
Yeung, M. & Mintzer, F., (1997), An Invisible Watermarking Technique for Image
Verification, Image Processing, International Conference on, Vol. 2, pp. 680, ISBN: 0-
8186-8183-7.
Yu, G.J. ; Lu, C.-S. ; Liao, H.-Y. M. & Sheu, J.-P. (2000). Mean quantization blind
watermarking for image authentication, IEEE International Conference on Image
Processing (ICIP’2000), Vol. III, pp. 706–709, Vancouver, BC, Canada.
Zhou, X.; Duan, X. & Wang, D. (2004), A Semi-Fragile Watermark Scheme for Image
Authentication, Proc. of Int. Conf. Multimedia Modeling Conference, pp. 374 – 377,
Print ISBN: 0-7695-2084-7.
12
Application of Discrete
Wavelet Transform in Watermarking
Corina Nafornita and Alexandru Isar
“Politehnica” University of Timisoara,
Romania
1. Introduction
The proliferation of multimedia data on the Internet and the ease of copying this data have
generated interest in copyright protection (Cox et al., 2002). During transmission, data can
be protected using encryption; however after decrypting it, it is no longer protected. As an
alternative to encryption, watermarking has been proposed as a means of identifying the
owner, by secretly embedding an imperceptible signal into the host signal (Cox, 2005) – see
Fig. 1.
Fig. 1. Watermark embedding. The watermark is embedded using a secret or public key,
making invisible changes to the cover work.
The main properties of a watermarking system are perceptual transparency, robustness,
security, and data hiding capacity (Cox et al., 1997). Some of the terms used in
watermarking are (Cox et al., 2002):
- The original data where the watermark is to be inserted is referred to as host or cover
work.
- The hidden information is called payload.
- Visible watermarks are visual patterns (images, logos) inserted or overlaid on
images/video. Visible watermarks are applied to photos publicly available on the web,
to prevent commercial use of such images. One example of visible watermarking has
been implemented by IBM for the Vatican library (Braudaway et al., 1996).
- Most watermarking systems involve making the watermark imperceptible.
- The key is required for embedding the watermark. If the same key is used for retrieving
the watermark, the system is private, while if another key is used to retrieve it, the
system is known as public.
- If the cover work is required at the detector, the system is informed (non-blind); if it’s
not required at the detector, the system is blind.
- Watermarking systems are robust or fragile. Robust watermarks should resist any
modifications and are designed for copyright protection. Fragile watermarks are
designed to fail whenever the cover work is modified and to give some measure of the
tampering. Fragile watermarks are used in authentication.
Most existing watermarking systems proposed in the literature can be classified
depending on the watermarking domain, where the embedding takes place: spatial domain
techniques (Nikolaidis & Pitas, 1998), where the pixels are directly modified, or transform
domain techniques.
The majority of watermarking algorithms operate based on the spread spectrum (SS)
communication principle. A pseudorandom sequence is added to the host signal in some
critically sampled domain and the watermarked signal is obtained by inverse transforming
the modified coefficients. Typical transform domains are the Discrete Wavelet Transform
(DWT), the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT). The
DWT based algorithms usually produce watermarked images with the best balance between
visual quality and robustness due to the absence of blocking artefacts (Nafornita, 2008).
Watermarks can be robust or fragile, depending on the application. For copyright
protection, robustness is required. This can be assured with encoding of the watermark
using a repetition code or an error correcting code. Robustness is increased with the increase
of the correction capacity of the code. Despite their efficient use in telecommunications,
turbo codes have been rarely used in watermarking (Abdulaziz et al., 2002, Serdean et al.,
2003, Balado & Perez-Gonzalez, 2001, Nafornita et al., 2009).
At the embedding side, the watermark can be added to coefficients of known robustness
(large valued coefficients) or perceptually significant regions (Cox, 2005), such as contours
and textures of an image. This can be done empirically, selecting larger coefficients (Cox et
al., 1997) or using a thresholding scheme in the transform domain (Podilchuk & Zeng, 1998,
Nafornita et al., 2005). Another approach is to insert the watermark in all coefficients of a
transform, using a variable strength for each coefficient (Barni et al., 2001). Hybrid
techniques, based on compression schemes, embed the watermark using a thresholding
scheme and variable strength (Podilchuk & Zeng, 1998). The performance of such a system
depends on the quality of the wavelet transform.
This chapter will focus on the application of the wavelet transforms in robust watermarking
for static images. We will present the classical techniques of watermarking; starting with the
spread spectrum DCT based watermarking system proposed by Cox et al. (Cox et al., 1997)
and continuing with those proposed in the wavelet domain.
Other wavelet transforms, such as the Dual-Tree Complex Wavelet Transform (DTCWT)
(Selesnick et al., 2005) or the Hyperanalytic Wavelet Transform (HWT) (Nafornita et al.,
2008, Firoiu et al., 2009) could also be considered. The advantages of such transforms
compared to DWT are: quasi-shift invariance and enhanced directional selectivity. The data
hiding capacity increases with the increase of redundancy (4x for DTCWT and HWT). We
will compare the efficiency of those wavelet transforms in watermarking.
2. Watermarking methods
Most techniques embed the watermark in a transform domain as mentioned before. Early
techniques have used the Discrete Cosine Transform. One of the most influential
watermarking works is the spread-spectrum approach proposed in (Cox et al., 1997). The authors
argue that the watermark should be placed explicitly in the perceptually most significant
components of the data, and that it should be composed of random numbers drawn
from a Gaussian distribution N(0,1), in order to make it invisible and robust to attacks:
v′(i) = v(i) (1 + α w(i)) (1)
where v(i) is the DCT coefficient to be watermarked, w(i) is the watermark sample, α is the
embedding strength and v′(i) is the watermarked coefficient. Detection is made using the
similarity between the original W and extracted Ŵ watermarks:
$$\mathrm{sim}(W, \hat{W}) = \frac{\hat{W} \cdot W}{\sqrt{\hat{W} \cdot \hat{W}}} \tag{2}$$
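Relations (1) and (2) can be illustrated with a minimal sketch. The coefficient vector below is random placeholder data rather than a real DCT, the value α = 0.1 is only illustrative, and the function names are hypothetical; the extraction step assumes that the original coefficients are available at the detector.

```python
import numpy as np


def embed_multiplicative(v, w, alpha=0.1):
    """Eq. (1): v'(i) = v(i) * (1 + alpha * w(i))."""
    return np.asarray(v, dtype=float) * (1.0 + alpha * np.asarray(w, dtype=float))


def extract_watermark(v_marked, v_original, alpha=0.1):
    """Invert Eq. (1); needs the original coefficients (non-blind extraction)."""
    ratio = np.asarray(v_marked, dtype=float) / np.asarray(v_original, dtype=float)
    return (ratio - 1.0) / alpha


def similarity(w, w_hat):
    """Eq. (2): sim(W, W_hat) = (W_hat . W) / sqrt(W_hat . W_hat)."""
    w = np.asarray(w, dtype=float)
    w_hat = np.asarray(w_hat, dtype=float)
    return float(np.dot(w_hat, w) / np.sqrt(np.dot(w_hat, w_hat)))


rng = np.random.default_rng(0)
v = rng.uniform(50.0, 200.0, size=1000)   # placeholder for the selected DCT coefficients
w = rng.standard_normal(1000)             # watermark drawn from N(0, 1)
w_other = rng.standard_normal(1000)       # an unrelated watermark

w_hat = extract_watermark(embed_multiplicative(v, w), v)
sim_true = similarity(w, w_hat)           # large: the extracted mark matches W
sim_other = similarity(w_other, w_hat)    # close to zero: no match
```

Without any attack the extracted watermark equals the embedded one, so the similarity reduces to the norm of W, while an unrelated watermark gives a value that behaves like a standard normal sample.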
The fact that the transform is performed over the entire image increases the computation
time. Other methods have been proposed that use the block-based DCT transform, just like
in the JPEG compression (see for example Podilchuk & Zeng, 1998).
Other authors have proposed the use of the Discrete Fourier Transform or its variant – the
Fourier-Mellin transform. This is useful in order to perform phase modulation between the
watermark and the original signal (Ó Ruanaidh et al., 1996). The phase is more important
than the amplitude; hence it will be difficult for an attacker to remove the watermark. Phase
modulation often possesses superior noise immunity in comparison with amplitude
modulation. Many watermarking techniques use DFT amplitude modulation because the
watermark will be translation invariant. The DFT is more often used in its derived forms
such as the Fourier-Mellin transform. This Fourier-Mellin transform approach has arisen out
of the need for Rotation, Scale and Translation invariant (RST-invariant) watermarking
techniques. It involves creating a Log Polar map of the DFT amplitudes of the image, where
the embedding takes place. This method is reported to be highly RST invariant and uses an
RST-invariant watermark (Lin et al., 2001, Ó Ruanaidh & Pun, 1998).
I_{u,v} are the coefficients of the original image, w_{u,v} are the watermark bits, and JND_{u,v} are
the just-noticeable-difference (JND) values computed using visual models. In the case of DCT, they are computed using
Watson’s perceptual model; for the wavelet domain, the weight is computed for each
frequency band based on typical viewing conditions. Detection is made using correlation
between the image difference and the watermark sequence. This method is more robust than
the spread-spectrum method of Cox et al., 1997. Although more robust than IA-DCT, the
IA-W method does not take into account perceptually significant regions, so the watermark
can be erased from perceptually insignificant coefficients. For example, low-pass filtering
will affect the watermark inserted in high frequency components.
Xia et al., 1998 propose a watermarking algorithm using the Haar mother wavelet, and two
levels of decomposition. A pseudo-random sequence is added to the highest coefficients not
located in the lowest resolution:
$$f'(m,n) = f(m,n) + \alpha \cdot f(m,n)^{\beta}\, w_i \tag{4}$$
where α is the watermark strength, and β is the amplification for large coefficients. This
algorithm concentrates most of the energy in edges and textures, which are the coefficients
in detail subbands. This increases the invisibility of the watermark, because human
observers are less sensitive to change in edges and textures compared to changes in smooth
areas of an image. More watermarks are inserted in each subband, and detection is done
hierarchically, for each resolution level, using the cross-correlation between the original watermark
and the difference of the two images. The method is robust to a series of distortions, but
low-pass and median filtering affect the watermark.
Kundur & Hatzinakos, 1998 use the Daubechies wavelet family to compute the DWT on
three levels of decomposition. The watermarking algorithm selects in a pseudo-random
manner the embedding locations from the detail subbands. The authors state that the
spread-spectrum technique is not appropriate for transmitting the watermark because the
correlator used for watermark detection is not effective in the presence of fading. Hence,
they use quantization for embedding the watermark bits. To increase robustness, they use a
reference watermark in order to estimate if the watermark bit has been embedded (Kundur
& Hatzinakos, 2001).
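The idea of embedding a bit by quantization can be sketched as follows. This is a generic parity-based quantizer, given only as an illustration under our own assumptions; it is not the exact rule of Kundur & Hatzinakos, and the names `embed_bit`, `extract_bit` and the step size are hypothetical.

```python
import math


def embed_bit(coeff, bit, step):
    """Move `coeff` to the centre of a nearby quantization cell whose index parity equals `bit`."""
    k = math.floor(coeff / step)
    if k % 2 != bit:
        # Pick the adjacent cell that is closer to the original coefficient.
        k += 1 if (coeff - k * step) >= step / 2 else -1
    return (k + 0.5) * step


def extract_bit(coeff, step):
    """Read the bit back as the parity of the quantization cell index."""
    return math.floor(coeff / step) % 2


marked = embed_bit(5.1, 1, step=4.0)   # cell index 1 already has parity 1
recovered = extract_bit(marked, step=4.0)
```

In this sketch the coefficient moves by at most one quantization step, and the bit survives any perturbation smaller than half a step, which is the kind of trade-off between distortion and robustness that the step size controls.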
One of the popular methods is the one proposed by Barni et al., 2001. The watermark is
masked according to the characteristics of the human visual system (HVS), taking into
account the texture and the luminance content of all the image subbands. For coefficients
corresponding to contours of the image a higher strength is used, for textures a medium
strength is used and for regions with high regularity a lower strength is used, in accordance
with the analogy between water-filling and watermarking (Kundur, 2000).
The image I, of size 2M×2N, is decomposed into 4 levels using the Daubechies-6 mother
wavelet, where I_l^θ is the subband from level l∈{0, 1, 2, 3} and orientation θ∈{0, 1, 2, 3}
(horizontal, diagonal and vertical detail subbands, and approximation subband). A
pseudorandom binary (±1) sequence is cast into 2D binary watermarks x_l^θ, each of size
MN/4^l. The watermark is embedded in all coefficients from level l=0 by addition:
$$\tilde{I}_0^{\theta}(i,j) = I_0^{\theta}(i,j) + \alpha\, w_0^{\theta}(i,j)\, x_0^{\theta}(i,j) \tag{5}$$
where α is the embedding strength and the weight w_l^θ(i,j) is half of the quantization step:
$$q_l^{\theta}(i,j) = \Theta(l,\theta)\, \Lambda(l,i,j)\, \Xi(l,i,j)^{0.2} \tag{6}$$
Fig. 2. Watermark embedding in the wavelet domain (Barni et al., 2001). The watermark is
embedded in the first resolution level using a perceptual mask. [Block diagram with the
labels: Original, DWT, Mask, Watermark, IDWT, Marked.]
The quantization step is the product of three factors: sensitivity to noise, local brightness and
texture activity around a pixel. They are computed as follows:
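$$\Theta(l,\theta) = \begin{cases} \sqrt{2}, & \theta = 1 \\ 1, & \text{otherwise} \end{cases} \cdot \begin{cases} 1.00, & l = 0 \\ 0.32, & l = 1 \\ 0.16, & l = 2 \\ 0.10, & l = 3 \end{cases} \tag{7}$$
$$\Lambda(l,i,j) = 1 + L'(l,i,j) \tag{8}$$
$$L(l,i,j) = \frac{1}{256}\, I_3^3\!\left( 1 + \left\lfloor \frac{i}{2^{3-l}} \right\rfloor,\ 1 + \left\lfloor \frac{j}{2^{3-l}} \right\rfloor \right) \tag{9}$$
$$\Xi(l,i,j) = \sum_{k=0}^{3-l} 16^{-k} \sum_{\theta=0}^{2} \sum_{x=0}^{1} \sum_{y=0}^{1} \left[ I_{k+l}^{\theta}\!\left( y + \frac{i}{2^{k}},\ x + \frac{j}{2^{k}} \right) \right]^{2} \cdot \mathrm{Var}\left\{ I_3^3\!\left( 1 + y + \frac{i}{2^{3-l}},\ 1 + x + \frac{j}{2^{3-l}} \right) \right\}_{x=0,1;\ y=0,1} \tag{10}$$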
The texture activity around a pixel is the product of two contributions: the first
is the local mean square value of the DWT coefficients in all detail subbands and the second
is the local variance of the 4th level approximation image. Both are computed in a small 2×2
neighborhood corresponding to the location (i, j) of the pixel. The first contribution reflects the
distance from the edges, and the second one the texture. The local variance is
estimated at low resolution.
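How the three factors combine into the embedding weight can be sketched as below. The brightness factor Λ and the texture factor Ξ are taken as precomputed arrays, the additive rule in `embed_additive` is our reading of the embedding step (the weight being half of the quantization step), and all function names are hypothetical.

```python
import math

import numpy as np

# Level-dependent gain of Eq. (7).
LEVEL_GAIN = {0: 1.00, 1: 0.32, 2: 0.16, 3: 0.10}


def noise_sensitivity(level, orientation):
    """Theta(l, theta) of Eq. (7); orientation 1 is the diagonal subband."""
    return (math.sqrt(2.0) if orientation == 1 else 1.0) * LEVEL_GAIN[level]


def quantization_step(level, orientation, brightness, texture):
    """Eq. (6): q = Theta * Lambda * Xi**0.2, evaluated element-wise."""
    brightness = np.asarray(brightness, dtype=float)   # Lambda(l, i, j), Eq. (8)
    texture = np.asarray(texture, dtype=float)         # Xi(l, i, j), Eq. (10)
    return noise_sensitivity(level, orientation) * brightness * texture ** 0.2


def embed_additive(subband, x, q, alpha):
    """Assumed additive rule: marked = subband + alpha * (q / 2) * x."""
    return np.asarray(subband, dtype=float) + alpha * 0.5 * q * np.asarray(x, dtype=float)


# Toy 2x2 subband with flat brightness and texture, for illustration only.
x = np.array([[1.0, -1.0], [-1.0, 1.0]])
q = quantization_step(0, 0, np.ones((2, 2)), np.full((2, 2), 32.0))
marked = embed_additive(np.zeros((2, 2)), x, q, alpha=1.5)
```

With Ξ = 32 the texture term is 32^0.2 = 2, so q = 2 and each coefficient moves by α·q/2 = 1.5 in the direction given by the watermark sign.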
Detection is made using the correlation between the marked DWT coefficients and the
watermarking sequence to be tested for presence (the original image is not needed):
$$\rho(l) = \frac{4^{l}}{3MN} \sum_{\theta=0}^{2} \sum_{i=0}^{M/2^{l}-1} \sum_{j=0}^{N/2^{l}-1} \tilde{I}_l^{\theta}(i,j)\, x_l^{\theta}(i,j) \tag{11}$$
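The correlation in (11) is simply the average of the products between marked coefficients and watermark samples over the three detail subbands of one level. A minimal sketch, using random arrays as stand-ins for real DWT subbands and a constant embedding weight (both are simplifying assumptions), is:

```python
import numpy as np


def correlation(marked_subbands, watermarks):
    """Eq. (11) for one level: average of I~ * x over the three detail subbands."""
    total = 0.0
    count = 0
    for coeffs, x in zip(marked_subbands, watermarks):
        total += float(np.sum(np.asarray(coeffs, dtype=float) * x))
        count += np.size(coeffs)
    # For subbands of size (M / 2**l) x (N / 2**l), count equals 3MN / 4**l.
    return total / count


rng = np.random.default_rng(1)
shape = (64, 64)
host = [rng.normal(0.0, 10.0, shape) for _ in range(3)]          # stand-in detail subbands
x_true = [rng.choice([-1.0, 1.0], size=shape) for _ in range(3)]
x_fake = [rng.choice([-1.0, 1.0], size=shape) for _ in range(3)]
marked = [h + 2.0 * x for h, x in zip(host, x_true)]             # constant weight 2.0 for simplicity

rho_true = correlation(marked, x_true)   # close to the embedding weight
rho_fake = correlation(marked, x_fake)   # close to zero
```

The original image is not needed: the host coefficients act as zero-mean noise that averages out, so only the embedded watermark contributes a systematic term.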
Barni’s method is quite robust against common signal processing operations such as filtering,
compression and cropping. However, because embedding is made only in the highest
resolution level (l=0), the watermark information can be easily erased by an attacker. Nafornita,
2008 proposed a pixel-wise mask allowing insertion of the watermark in lower resolution
levels. The third factor, the texture, is estimated using the local standard deviation of the
original image computed on a rectangular moving window W(i,j) of WS×WS pixels, centered
on each pixel I(i,j). This segmentation criterion finds the contours, textures and regions
with high homogeneity of the image. The local mean is:
$$\hat{\mu}(i,j) = WS^{-2} \sum_{I(m,n) \in W(i,j)} I(m,n) \tag{13}$$
$$\hat{\sigma}^{2}(i,j) = WS^{-2} \sum_{I(m,n) \in W(i,j)} \left( I(m,n) - \hat{\mu}(i,j) \right)^{2} \tag{14}$$
The local standard deviation is the square root of this local variance. The texture for a
considered DWT coefficient is proportional to the local standard deviation of the
corresponding pixel from the host image. We denote this local standard deviation image
by S, and the local mean image by U. Embedding is made in the subband s, level l; the
size of the texture matrix must agree with the size of the subband. Hence, the approximation
image at the lth decomposition level is used. This compression can be realized by exploiting the
separation properties of the DWT. To generate the mask required for the embedding into the
detail subimages corresponding to the lth decomposition level, the DWT of the local
standard deviation image is computed (making l+1 iterations). The required mask is
the approximation subimage from level l, denoted S_l^3, normalized to the local mean, also
compressed in the wavelet domain, U_l^3. This is illustrated in Fig. 3. One difference between
the watermarking method proposed by Nafornita, 2008 and the one proposed by Barni et
al., 2001 lies in the computation of the local variance, the second term in (10). To
obtain the new values of the texture, the local variance of the image to be watermarked is
computed using relations (13) and (14). The local standard deviation image is
decomposed using a one-iteration wavelet transform, and only the approximation image is
kept. Relation (10) is then replaced with:
$$\Xi(l,i,j) = \sum_{k=0}^{3-l} 16^{-k} \sum_{\theta=0}^{2} \sum_{x=0}^{1} \sum_{y=0}^{1} \left[ I_{k+l}^{\theta}\!\left( y + \frac{i}{2^{k}},\ x + \frac{j}{2^{k}} \right) \right]^{2} \cdot \frac{S_l^3(i,j)}{U_l^3(i,j)} \tag{15}$$
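The local statistics in (13) and (14), from which the images S and U are built, can be sketched with a sliding window. The window size, the reflective border handling and the function name are assumptions made for this illustration; the text does not specify how image borders are treated.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean_and_std(image, ws=3):
    """Local mean (Eq. 13) and local standard deviation (square root of Eq. 14).

    `ws` must be odd; borders are handled by reflection (an assumption).
    """
    image = np.asarray(image, dtype=float)
    pad = ws // 2
    padded = np.pad(image, pad, mode="reflect")
    windows = sliding_window_view(padded, (ws, ws))   # shape (H, W, ws, ws)
    mean = windows.mean(axis=(-2, -1))                # WS^-2 * sum of I(m, n)
    var = windows.var(axis=(-2, -1))                  # WS^-2 * sum of squared deviations
    return mean, np.sqrt(var)


toy = np.arange(9, dtype=float).reshape(3, 3)
U, S = local_mean_and_std(toy, ws=3)   # U: local mean image, S: local std image
```

For the centre pixel of the 3×3 toy image the window covers the whole image, so the local mean is 4 and the local variance is 60/9.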
Fig. 3. Watermark embedding. The watermark is embedded using a secret or public key,
making invisible changes to the cover work. [Block diagram with the labels: Original
image I, Local standard deviation S, Local mean U, DWT, S_l^θ, U_l^θ, normalization.]
The second difference is that the luminance mask is computed on the approximation image
from level l, where the watermark is embedded. The DWT of the original image using l
decomposition levels was computed and the approximation subimage corresponding to
level l was separated, obtaining the image I_l^3. The luminance content is computed using:
$$L(l,i,j) = \frac{I_l^3(i,j)}{256} \tag{16}$$
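Since both factors are more dependent on the resolution level than in the method proposed by
Barni, the noise sensitivity function becomes:
$$\Theta(l,\theta) = \begin{cases} \sqrt{2}, & \theta = 1 \\ 1, & \text{otherwise} \end{cases} \cdot \begin{cases} 1.00, & l \in \{0,1\} \\ 0.66, & l = 2 \end{cases} \tag{17}$$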
The ratio between the correlation ρ(l) in Eq. (11) and the image-dependent
threshold Tρ(l) was considered; hence the detector was viewed as a nonlinear function with a fixed
threshold. In Nafornita, 2007a, three detectors are used, to take advantage of the wavelet
hierarchical decomposition. The watermark presence is detected,
1. from all resolution levels, “all_levels”,
2. separately from each resolution level, considering the maximum detector response from
each level, “max_level”,
3. separately from each subband, considering the maximum detector response from each
subband, “max_subband”.
Evaluating the correlations separately per resolution level or subband can sometimes be
advantageous. In the case of cropping, the watermark is more likely to be damaged in the
lower frequencies than in the higher ones, while lowpass filtering affects the
higher frequencies more than the lower ones. Layers or subbands with lower detector response are
discarded. This type of embedding, combined with the new detectors, is more resilient to a
possible erasure of the watermark from the three subbands. The detector “all_levels” evaluates the
watermark’s presence on all resolution levels:
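$$d_1 = \frac{\rho_{d1}}{T_{d1}} \tag{18}$$
$$\rho_{d1} = \frac{\displaystyle \sum_{l=0}^{2} \sum_{\theta=0}^{2} \sum_{i=0}^{M/2^{l}-1} \sum_{j=0}^{N/2^{l}-1} \tilde{I}_l^{\theta}(i,j)\, x_l^{\theta}(i,j)}{\displaystyle 3MN \sum_{l=0}^{2} 4^{-l}} \tag{19}$$
$$\sigma_{\rho_{d1}}^{2} \approx \frac{\displaystyle \sum_{l=0}^{2} \sum_{\theta=0}^{2} \sum_{i=0}^{M/2^{l}-1} \sum_{j=0}^{N/2^{l}-1} \left( \tilde{I}_l^{\theta}(i,j) \right)^{2}}{\displaystyle \left( 3MN \sum_{l=0}^{2} 4^{-l} \right)^{2}} \tag{20}$$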
The second detector “max_level” considers the responses from different levels,
d(l)=ρ(l)/T(l), with l∈{0, 1, 2}, and discards the detector responses with lower values:
$$d_2 = \max_{l} \{ d(l) \} \tag{21}$$
The third detector considers the responses from different subbands and levels, d(l,θ) being the
ratio ρ(l,θ)/T(l,θ), with l,θ∈{0, 1, 2}, and discards the detector responses with lower values:
$$d_3 = \max_{l,\theta} \{ d(l,\theta) \} \tag{22}$$
The correlation and threshold are computed with the same rationale on one subband,
indicated by its orientation and level.
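The three detectors differ only in how the subbands are grouped before the ratio between correlation and threshold is formed. The sketch below illustrates this grouping on random stand-in subbands. The threshold is derived from a zero-mean Gaussian model of the correlation for a target false positive probability, which is an assumption of this sketch, and the names `response` and `detectors` are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def response(coeff_groups, mark_groups, pf=1e-8):
    """Ratio rho / T for a set of subbands, under a zero-mean Gaussian model for rho."""
    coeffs = np.concatenate([np.ravel(c) for c in coeff_groups])
    marks = np.concatenate([np.ravel(m) for m in mark_groups])
    n = coeffs.size
    rho = float(np.dot(coeffs, marks)) / n                # as in Eqs. (11) and (19)
    sigma = float(np.sqrt(np.sum(coeffs ** 2))) / n       # square root of Eq. (20)
    threshold = -NormalDist().inv_cdf(pf) * sigma         # assumed: P(rho > T) = pf
    return rho / threshold


def detectors(marked, marks, pf=1e-8):
    """marked[l][theta] and marks[l][theta] hold the detail subbands and watermarks."""
    levels = range(len(marked))
    d1 = response([s for l in levels for s in marked[l]],
                  [m for l in levels for m in marks[l]], pf)       # "all_levels"
    d2 = max(response(marked[l], marks[l], pf) for l in levels)    # "max_level"
    d3 = max(response([marked[l][t]], [marks[l][t]], pf)
             for l in levels for t in range(len(marked[l])))       # "max_subband"
    return d1, d2, d3


rng = np.random.default_rng(2)
sizes = [64, 32, 16]                                   # subband sides for l = 0, 1, 2
host = [[rng.normal(0.0, 10.0, (s, s)) for _ in range(3)] for s in sizes]
x_true = [[rng.choice([-1.0, 1.0], size=(s, s)) for _ in range(3)] for s in sizes]
x_fake = [[rng.choice([-1.0, 1.0], size=(s, s)) for _ in range(3)] for s in sizes]
marked = [[h + 3.0 * x for h, x in zip(hs, xs)] for hs, xs in zip(host, x_true)]

d_true = detectors(marked, x_true)   # all three ratios exceed 1
d_fake = detectors(marked, x_fake)   # all three ratios stay below 1
```

Taking the maximum over levels or subbands means that a single surviving group is enough for detection, which is why these detectors can tolerate attacks that destroy the watermark in only part of the decomposition.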
To maximize the robustness and the capacity, the role of the redundancy of the transform
used must be highlighted first. An example of redundant WT is represented by the tight
frame decomposition. Hua & Fowler, 2002 analyze watermarking systems based
on tight frame decompositions. Their analysis indicates that a tight frame offers no inherent
performance advantage over an orthonormal transform (DWT) in the watermark detection
process despite the well known ability of redundant transforms to accommodate greater
amounts of added noise for a given distortion. The overcompleteness of the expansion,
which aids the watermark insertion by accommodating greater watermark energy for a
given distortion, actually hinders the correlation operator in watermark detection. As a
result, the tight-frame expansion does not inherently offer greater spread-spectrum
watermarking performance. This analytical observation should be tempered with the fact
that spread-spectrum watermarking is often deployed in conjunction with an image-
adaptive weighting mask to take into account the human visual model (HVM) and to improve
perceptual performance. Another redundant WT, the DTCWT, was already used for
watermarking (Loo & Kingsbury, 2000). The authors of this paper prove that the capacity of a
watermarking system based on a complex wavelet transform is higher than the capacity of a
similar system that embeds the watermark in the DWT domain. Many authors (e.g. Daugman,
1980) have suggested that the processing of visual data inside our visual cortex resembles
filtering by an array of Gabor filters of different orientations and scales. The proposed
implementation of HWT is efficient, has only a modest amount of redundancy, provides
approximate shift invariance, has better directional selectivity than the 2D DWT and it can be
observed that the corresponding basis functions closely approximate the Gabor functions. So,
the spread spectrum watermarking based on the use of an image adaptive weighting mask
applied in the HWT domain is potentially a robust solution that increases the capacity.
$$\psi_a(x,y) = \psi(x,y) + i\,H_x\{\psi(x,y)\} + j\,H_y\{\psi(x,y)\} + k\,H_x\{H_y\{\psi(x,y)\}\} \tag{23}$$
where i² = j² = −k² = −1, and ij = ji = k (Davenport, 2008). The HWT of the image f(x,y) is:
$$\mathrm{HWT}\{f(x,y)\} = \left\langle f(x,y),\ \psi_a(x,y) \right\rangle \tag{24}$$
The 2D-HWT of the image f(x,y) can be computed using the 2D-DWT of its associated
hypercomplex image:
$$\begin{aligned} \mathrm{HWT}\{f(x,y)\} &= \mathrm{DWT}\{f(x,y)\} + i\,\mathrm{DWT}\{H_x\{f(x,y)\}\} + j\,\mathrm{DWT}\{H_y\{f(x,y)\}\} + k\,\mathrm{DWT}\{H_y\{H_x\{f(x,y)\}\}\} \\ &= \left\langle f_a(x,y),\ \psi(x,y) \right\rangle = \mathrm{DWT}\{f_a(x,y)\} \end{aligned} \tag{25}$$
HWT uses four trees, each implemented by 2D-DWT, being adequate to a multi-wavelet
environment (Firoiu et al., 2009). Hx is the Hilbert transform computed across lines and Hy
across columns (Fig. 4). The HWT coefficients are organized in two sequences of complex
coefficients separated by the sign of their preferential orientation, with 6 subbands, 3 of
positive orientations and 3 of negative orientations ±atan(1/2), ±π/4 and ±atan(2):
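$$z_{\pm} = z_{\pm r} + j\,z_{\pm i} = \left( f_{D_{1,2,3}} \mp \left( H_y\{H_x\{f\}\} \right)_{D_{1,2,3}} \right) + j \left( \left( H_x\{f\} \right)_{D_{1,2,3}} \pm \left( H_y\{f\} \right)_{D_{1,2,3}} \right) \tag{26}$$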
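The four-tree structure of (25) can be sketched as follows. The Hilbert transformer is realized with the FFT (periodic boundaries), a single-iteration Haar transform stands in for the 2D-DWT, and Hx is taken to act along each line and Hy along each column; these choices and all names are assumptions of the sketch, not the implementation used by the authors.

```python
import numpy as np


def hilbert(f, axis):
    """Discrete Hilbert transform along `axis` via the FFT (periodic boundary assumed)."""
    f = np.asarray(f, dtype=float)
    n = f.shape[axis]
    multiplier = -1j * np.sign(np.fft.fftfreq(n))   # -j for positive, +j for negative frequencies
    if n % 2 == 0:
        multiplier[n // 2] = 0.0                    # drop the Nyquist bin
    shape = [1] * f.ndim
    shape[axis] = n
    spectrum = np.fft.fft(f, axis=axis) * multiplier.reshape(shape)
    return np.fft.ifft(spectrum, axis=axis).real


def haar_dwt2(f):
    """One iteration of a 2D Haar DWT: approximation plus three detail subbands."""
    a, b = f[0::2, 0::2], f[0::2, 1::2]
    c, d = f[1::2, 0::2], f[1::2, 1::2]
    return {
        "approx": (a + b + c + d) / 2.0,
        "detail_1": (a - b + c - d) / 2.0,
        "detail_2": (a + b - c - d) / 2.0,
        "detail_3": (a - b - c + d) / 2.0,
    }


def hwt_trees(f):
    """The four real DWT trees of Eq. (25), keyed by their hypercomplex unit."""
    hx = hilbert(f, axis=1)          # assumed: Hx acts along each line (row)
    hy = hilbert(f, axis=0)          # assumed: Hy acts along each column
    hyhx = hilbert(hx, axis=0)
    return {"1": haar_dwt2(f), "i": haar_dwt2(hx), "j": haar_dwt2(hy), "k": haar_dwt2(hyhx)}


cols = np.arange(64)
f = np.tile(np.cos(2 * np.pi * 4 * cols / 64), (64, 1))   # cosine along each line
trees = hwt_trees(f)
```

Because any real orthogonal or biorthogonal DWT can replace `haar_dwt2`, the sketch also reflects the point made in the text that no special wavelet filter has to be designed.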
Fig. 6. Left: The ratio ρ/T as a function of the PSNR between the marked and the original
images, for different quality factors, JPEG compression. Right: Ratio ρ/T as a function of
embedding strength, for different quality factors, JPEG compression. Pf is set to 10⁻⁸.
Fig. 6 shows results for JPEG compression, for different quality factors: the ratio ρ/T is
plotted as a function of the peak signal-to-noise ratio (PSNR) between the marked (un-
attacked) image and the original one, and respectively as a function of α. The probability of
false positive detection is set to 10⁻⁸. If this ratio is greater than 1 then the watermark is
positively detected. Generally, for a PSNR higher than 30 dB, the original image and the
watermarked one are considered indistinguishable. For compression quality factors greater than
or equal to 25, the distortion introduced by JPEG compression is tolerable. For PSNR in
the range of 30-35 dB, of practical interest, the watermark is detected for all significant
compression quality factors. Increasing the embedding strength, the PSNR of the
watermarked image decreases, and the ratio ρ/T increases. The watermark is still detectable
even for very small values of α. For the quality factor Q=5 (or a compression ratio CR=32), the
watermark is still detectable even for α=0.5. Fig. 7 shows the detection of a true watermark for
various quality factors, in the case of α=1.5; the threshold is well below the detector response.
In Table 1 we give a comparison between the two methods, for the Lena image, α=1.5 in the
case of JPEG compression with a quality factor of 5 (compression ratio of 46).
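The two quantities used throughout these experiments, the PSNR between the original and the marked image and the decision based on the ratio ρ/T, can be sketched as follows. A peak value of 255 (8-bit images) is assumed, and the function names are hypothetical.

```python
import numpy as np


def psnr(original, marked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal size."""
    original = np.asarray(original, dtype=float)
    marked = np.asarray(marked, dtype=float)
    mse = np.mean((original - marked) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


def watermark_detected(rho, threshold):
    """The watermark is declared present when the ratio rho / T exceeds 1."""
    return rho / threshold > 1.0


# A uniform error of one gray level gives MSE = 1.
one_level_psnr = psnr(np.zeros((8, 8)), np.ones((8, 8)))
```

A uniform error of one gray level corresponds to about 48.13 dB, which gives a feeling for the 30 to 35 dB range discussed above.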
Fig. 7. Left: Detector response ρ, threshold T, as a function of different quality factors (JPEG
compression). The watermark is successfully detected. Pf is set to 10⁻⁸. Right: Highest
detector response, ρ2, corresponding to a fake watermark and threshold T. The threshold is
above the detector response.
Fig. 8. Original image Lena; mask from Nafornita et al., 2006b and Barni’s mask for level l=0.
The masks shown are the complements of the real ones.
In Nafornita et al., 2006b, Barni’s method is modified, using the texture mask in (15), as well
as the luminance factor in (16). The masks obtained are shown in Fig. 8. The improvement is
clearly visible around edges and contours. The method is applied in two cases, when the
watermark is inserted in level 0 only and when it’s inserted in level 1 only. JPEG
compression is again considered. The image Lena is watermarked at level l=0 and
respectively at level l=1 with α ranging from 1.5 to 5. The binary watermark is embedded in
all the detail wavelet coefficients of the resolution level l, as previously described. For α=1.5,
the watermarked images, in level 0 and level 1, as well as the image watermarked using
Barni’s mask, are shown in Fig. 9. The quality of the watermarked images is clearly
better preserved using the new pixel-wise mask. The PSNR values are 38 dB (level 0) and 43 dB
(level 1), compared to Barni’s method, with a PSNR of 20 dB.
Fig. 9. Watermarked images, α =1.5, for Nafornita et al., 2006b, level 0 (PSNR = 38 dB); level 1
(43 dB); for Barni et al., 2001, level 0 (20 dB).
Fig. 10. Left: PSNR as a function of α. Embedding is made either in level 0 or in level 1. Right:
Detector response ρ, threshold T, highest detector response, ρ2, corresponding to a fake
watermark, as a function of different quality factors (JPEG compression). The watermark is
successfully detected. Pf is set to 10⁻⁸. Embedding was made in level 0.
PSNR values are shown in Fig. 10(left) as a function of the embedding strength. The
watermark is still invisible, even for high values of α. Fig. 11 gives the results for JPEG
compression. In all experiments, the probability of false positive detection is set to 10⁻⁸. The
watermark is successfully detected for a large interval of compression quality factors. For
PSNR values higher than 30 dB, the watermarking is invisible. For quality factors Q≥10, the
distortion introduced by JPEG compression is tolerable. For all values of α, the watermark is
detected for all the significant quality factors (Q≥10). Increasing the embedding strength, the
PSNR of the watermarked image decreases, and ρ/T increases. For the quality factor Q = 10
(or a compression ratio CR = 32), the watermark is still detectable even for low values of α.
Fig. 10(right) shows the detection of a true watermark from level 0 for various quality
factors, for α=1.5; the threshold is below the detector response. The selectivity of the
watermark detector is also illustrated for a test with 999 fake watermarks:
the second highest detector response is shown, for each quality factor. False positives are
rejected.
In Table 2 a comparison between Nafornita et al., 2006b and Barni et al., 2001, can be seen
for JPEG compression with Q=10 (compression ratio of 32). The detector response for the
original watermark ρ, the detection threshold T, and the second highest detector response
ρ2, when the watermark was inserted in level 0 are given. The detector response is higher
than for Barni et al.
Fig. 11. Ratio ρ/T as a function of the embedding strength α. The watermarked image is
JPEG compressed with different quality factors Q. Pf is set to 10⁻⁸. Embedding was made in
level 0 (left), and in level 1 (right).
first resolution level, l=0, for α=0.2, which results in a similar image quality (see Fig. 12). This
was concluded in Nafornita, 2007b, where limiting the watermark strength such that
the PSNR is 35 dB and, on average, the percentage of affected pixels is less than 25% greatly
improves the quality of the images. Girod’s model has been used for determining the
location and number of affected pixels (Girod, 1989). For instance, in Barni’s case, the
image watermarked with α=0.2 has a PSNR of 36.39 dB and 11.84% affected pixels, whereas
the one watermarked with α=1.5 has a PSNR of 20 dB and all pixels affected. What is
kept constant for comparison are the 2D watermarks embedded in the first level, and the
image quality. The method in Nafornita, 2007a cannot be compared with the one in Barni et al.,
2001 when the watermark is embedded in all resolution levels, simply because their mask
is not suited for embedding in levels other than the highest resolution level. Results for some
of the standard images from the USC SIPI Image Database are given.
Fig. 12. (left) Original image Lena, (middle) Watermarked images for Nafornita, 2007a,
α=1.5, PSNR=36.86 dB, (right) Barni et al., 2001, α=0.2, PSNR=36.39 dB.
Table 3 includes PSNR values for the two cases. For the first detector, an estimate of the false
positive probability is shown for the image Lena, before and after JPEG compression attack,
with quality factor Q=10, as a function of the detection thresholds, Tρ1. The threshold values
have been computed using as estimate the variance of the ρ1 obtained from experiments.
The mean PSNR for the twelve images is 34.16 dB for the proposed method (Nafornita,
2007a) and 34.06 dB for Barni’s method.
                                          Nafornita, 2007a
Attack                        1-All levels  2-Max level  3-Max subband   Barni’s method
JPEG compression, Q=10            2.38         1.98          1.44             1.75
Median filtering, M=5             1.32         1.12          1.46             0.25
Scaling, 50%                      4.06         5.21          5.76             1.85
Cropping, 512x512 -> 32x32        0.68         0.98          1.73             1.48
Gamma correction, γ=2            20.32        29.19         28.06            32.54
Motion blur, L=31, θ=11           1.98         5.48          8.04             6.14
Table 3. Detector response versus attack: resistance to different attacks, for the Nafornita,
2007a method and Barni’s method. The detector response is a mean value of different responses.
Tests were made for JPEG compression, median filtering, cropping, resizing, gamma
correction and blurring. Table 3 shows the mean values of the detector responses for each
attack. A particular attack parameter is chosen where the watermark is still detectable by at
least one detector. For compression, the method in Nafornita, 2007a successfully detects the
watermark at Q=10. The 1st detector is better in all cases. This new method has better results
than Barni’s technique. The watermark of both methods survived in all images for median
filtering with kernel sizes up to 3. For kernel size 5, the watermark of Nafornita, 2007a using
the first and third detector is detectable; Barni’s method fails to detect the watermark. In the
case of scaling to 50%, the watermark was successfully detected in both cases, with better
results for Nafornita, 2007a. The third detector has the best performance in detecting the
mark. The watermark of Nafornita, 2007a was successfully detected in the cropped image of
32x32, only with the third detector, which proves its efficiency. Barni’s method detects the
watermark with similar detector responses as in the case of the third detector. As expected
for normalized correlation detection, both methods are practically insensitive to gamma
correction adjustment. For the motion blur attack, both methods have successfully detected
the watermark in all cases. Detector 3 has slightly better results than the others.
Fig. 13. Experimentally evaluated probability of false positive Pf vs. Tρ1/σρ1, the ratio
between the detection threshold and standard deviation of the correlations in the case where
an incorrect watermark was embedded. The theoretical trend is also shown (‘o’ marker).
Tests were made on Lena, before and after JPEG compression with quality factor 10, using
5×10⁴ different watermarks.
For the first detector, the probability of false positive was estimated by searching for many
different watermarks in one watermarked image, Lena. Each threshold Tρ1 was set so
as to grant a given value of Pf. The trial was repeated for values of Pf ranging from 10⁻¹
through 10⁻⁴. In total, 5×10⁴ watermarks per image have been tested. The estimation has been
done before any type of manipulation and after JPEG compression with quality factor 10.
The estimated Pf is plotted in Fig. 13 versus the ratio Tρ1/σρ1 between the detection
thresholds and the standard deviations of the correlations for the case corresponding to certain
estimates of this probability of false positive. This case corresponds to the situation where
the image is watermarked with a code Y other than X.
Surprisingly, the estimated false alarm probability Pf is lower in the case of compression than in the
case of no attack, for the same detection threshold. This can be explained by the fact that,
before compression, the empirical pdf of the correlations in the case where an incorrect
watermark is embedded was not Gaussian. Although the two empirical pdfs are closer
after the attack, they are still very well separated, and the empirical pdf for an incorrect
watermark has its mean below zero, whereas the equivalent one before the attack is
centered on zero. Thus setting a particular threshold can indeed result in a lower false alarm rate
after the attack. Similar results were obtained for Barbara, for the same attack.
For the first detector, the obtained probability of false positive is close to the expected one.
The assumption that the wavelet coefficients from different levels and subbands are i.i.d. is
thus reasonable and the detector has a good performance.
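The experiment can be imitated on synthetic data. The sketch below assumes that the correlation for an incorrect watermark is zero-mean Gaussian, sets the threshold for a target Pf accordingly, and then counts how often random ±1 watermarks exceed it. The coefficients are random placeholders rather than a real watermarked image, far fewer watermarks are tested than in the text, and the function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def threshold_for_pf(sigma, pf):
    """Threshold T such that P(rho > T) = pf for rho ~ N(0, sigma^2) (Gaussian assumption)."""
    return -NormalDist().inv_cdf(pf) * sigma


def empirical_pf(coeffs, pf_target, trials, rng):
    """Fraction of random +/-1 watermarks whose correlation exceeds the threshold."""
    coeffs = np.ravel(np.asarray(coeffs, dtype=float))
    n = coeffs.size
    sigma = np.sqrt(np.sum(coeffs ** 2)) / n            # same form as Eq. (20)
    marks = rng.choice([-1.0, 1.0], size=(trials, n))   # one candidate watermark per row
    rho = marks @ coeffs / n
    return float(np.mean(rho > threshold_for_pf(sigma, pf_target)))


rng = np.random.default_rng(3)
coeffs = rng.normal(0.0, 10.0, 1024)                    # placeholder coefficients
pf_estimate = empirical_pf(coeffs, pf_target=0.1, trials=5000, rng=rng)
```

Under this model the threshold for Pf = 10⁻⁸ is about 5.61 standard deviations of the correlation; estimating such a small probability empirically would need far more trials than used here, which is why only larger values of Pf are tested.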
Fig. 14. Original and watermarked images with method (Nafornita et al., 2008), for α=1.5,
PSNR=35.63 dB; Difference image, amplified 8 times.
The watermarked images have been exposed to some common attacks: JPEG compression
with different quality factors (Q), shifting, median filtering with different window sizes M,
resizing with different scale factors, cropping with different areas remaining, gamma
correction with different values of γ, blurring with a specified point spread function (PSF)
and perturbation with AWGN with different variances.
Resistance to unintentional attacks, for watermarked image Lena, can be compared to the
results obtained using the watermarking methods in Barni et al., 2001 and Nafornita, 2007a
by analyzing Table 4. For the method in Nafornita, 2007a, the same watermark strength, 1.5, is
used and the watermark is embedded in all three wavelet decomposition levels, resulting in
a PSNR of 36.86 dB. For the method in Barni et al., 2001, the watermark strength 0.2 is used
and the embedding is made only in the first resolution level, resulting in a similar quality of
the images (PSNR=36.39 dB).
Table 4. Resistance to different attacks, for HWT based method compared to DWT based
methods.
Special attention was paid to the shifting attack. First the watermarked image was circularly
shifted by li lines and co columns, obtaining the attacked image Ĩ_t. Supposing that the
numbers li and co are known, the messages at level l are circularly shifted by li/2^l lines
and co/2^l columns, obtaining the new messages (x_t)_l^θ. Next the watermark was detected
using the image Ĩ_t and the messages (x_t)_l^θ. The values obtained for li=128 and co=128 are
presented in Table 4.
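The compensation of a known circular shift can be sketched with a Haar transform standing in for the wavelet transform actually used. In this sketch l counts the number of decomposition iterations (l = 1 for the finest detail subbands), so that a subband at level l is subsampled by 2^l; this indexing, the random test image and the function names are assumptions of the illustration.

```python
import numpy as np


def haar_diagonal(image, level):
    """Diagonal detail subband after `level` (>= 1) iterations of a 2D Haar DWT."""
    approx = np.asarray(image, dtype=float)
    for _ in range(level):
        a, b = approx[0::2, 0::2], approx[0::2, 1::2]
        c, d = approx[1::2, 0::2], approx[1::2, 1::2]
        approx, diag = (a + b + c + d) / 2.0, (a - b - c + d) / 2.0
    return diag


def shift_message(x, li, co, level):
    """Circularly shift a 2D message by li / 2**level lines and co / 2**level columns."""
    return np.roll(x, (li // 2 ** level, co // 2 ** level), axis=(0, 1))


rng = np.random.default_rng(4)
image = rng.uniform(0.0, 255.0, (256, 256))            # placeholder for a marked image
x = rng.choice([-1.0, 1.0], size=(64, 64))             # message for the level-2 subband
li, co, level = 128, 128, 2

attacked = np.roll(image, (li, co), axis=(0, 1))       # circular shifting attack
sub_before = haar_diagonal(image, level)
sub_after = haar_diagonal(attacked, level)
x_shifted = shift_message(x, li, co, level)

rho_before = float(np.mean(sub_before * x))
rho_after = float(np.mean(sub_after * x_shifted))      # equal to rho_before
```

When the shift is a multiple of 2^l, the subband of the shifted image is exactly the circularly shifted subband, so shifting the message by the same amount restores the correlation. For other shifts the result depends on the shift invariance of the transform.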
From the results, it is clear that embedding in the real parts of the HWT coefficients yields
a higher capacity at the same visual impact and robustness. In fact the results obtained in
Nafornita et al., 2008 are slightly better than the results obtained with the DWT-based
methods in Nafornita, 2007a and Barni et al., 2001 for JPEG compression, median
filtering with window size M=3, resizing and gamma correction. For the other attacks the
results obtained are similar to the results of the watermarking methods based on DWT.
The case of the shifting attack is very interesting. In this case the robustness of the
watermarking method is given by two properties: the shift invariance degree of the WT
used and the masking ability. All the methods compared in Table 4 are very robust against
the shifting attack. The values of the ratios between the correlations and the image
dependent thresholds obtained before and after the shifting attack are equal for all the
methods compared in Table 4. So, the ability of masking seems to be more important than
the shift invariance degree of the WT used for the conception of counter-measures against
the shifting attack, when the numbers of lines and columns used for the attack are already
known. Of course, the detection of these numbers must also be realized, for the
implementation of a strategy against the shifting attack.
4. Conclusion
In a watermarking system, robustness should be evaluated only if the invisibility criteria are
satisfied. For this purpose, perceptual watermarks are used to address the trade-off between
robustness and invisibility. A blind spread spectrum technique has been proposed in the literature
that uses a perceptual mask in the wavelet domain, taking into account the noise
sensitivity, texture and the luminance content of all image subbands. We described new
techniques proposed by the authors, based on the modifications of this perceptual mask, in
order to increase robustness, while still maintaining imperceptibility. Moreover, using the
new mask, information is successfully hidden in the lower frequency levels, thus increasing
the capacity and making the watermark more robust to common attacks that affect both
high frequencies and low frequencies of the image. A good balance between robustness and
invisibility of the watermark is achieved when embedding is made in all detail subbands for
all resolution levels, except the coarsest level; this can be particularly useful against erasure
of high frequency subbands containing the watermark in Barni’s system.
A nonlinear detector with a fixed threshold, defined as the ratio between the correlation and the
image-dependent threshold, has been used; three watermark detectors were proposed in Nafornita,
2007a that take advantage of the hierarchical wavelet decomposition: 1) from all resolution
levels, 2) separately from each level, considering the maximum detector response for each
level and 3) separately from each subband, considering the maximum detector response for
each subband. This has been advantageous for cropping, scaling and median filtering where
the 3rd detector shows improved performance. We tested our methods against different
attacks, and found out that it is better than Barni’s method. The behavior of our methods can
be explained by the fact that we have used a better estimate of the mask and we took
advantage of the diversity of the wavelet decomposition. The effectiveness of the new
perceptual mask is appreciated by comparison with Barni’s method. Simulation results
show the superiority of the proposed methods (Nafornita et al., 2006a, b, Nafornita,
2007a).
The HWT is a very recent WT, as it was formalized only two years ago. A very simple
implementation of this transform has been used, which permits the exploitation of the
mathematical results and of the algorithms previously obtained in the evolution of wavelets
theory. It does not require the construction of any special wavelet filter. It has a very flexible
structure, as we can use any orthogonal or bi-orthogonal real mother wavelets for the
computation of the HWT. The presented implementation leads to both a high degree of
shift-invariance and to an enhanced directional selectivity in the 2D case. An ideal Hilbert
transformer was considered. A new type of pixel-wise masking for robust image
watermarking in the HWT domain has been presented (Nafornita et al., 2008). Modifications
were made to two existing watermarking techniques proposed in Barni et al., 2001 and
Nafornita, 2007a, based on DWT. These techniques were selected for their good robustness
against the usual attacks. The method is based on the method in Barni et al., 2001, with some
modifications. The first modification is in computing the estimate of the variance, which
gives a better measure of the texture activity. An improvement is also owed to the use of a
better luminance mask. The third improvement is to embed the watermark in the detail
coefficients at all resolutions, except the coarsest level, making the watermark more attack
resilient. The HWT embedding exploits the coefficients $z_{+r}$ and $z_{-r}$.
The simulation results illustrate the effectiveness of the proposed algorithms. The methods
were tested for robustness against different attacks. The HWT based watermarking
method performs similarly to the DWT based methods and in some cases outperforms them,
and it has a higher capacity than the DWT based methods.
As a future research direction, the statistical properties of the HWT will be used to improve
the watermark detection.
5. References
Abdulaziz, N.; Glass, A.; Pang, K.K. (2002). Embedding Data in Images Using Turbo-
Coding, 6th Symposium on DSP for Communication Systems, 28-31 Jan. 2002, Univ. of
Wollongong, Australia.
Adam, I.; Nafornita, C.; Boucher, J.-M. & Isar, A. (2007). A New Implementation of the
Hyperanalytic Wavelet Transform, Proc. of IEEE Symposium ISSCS 2007, Iasi,
Romania, ’07, 401-404.
Balado, F. & Perez-Gonzalez, F. (2001). Coding at the Sample Level for Data Hiding: Turbo
and Concatenated Codes, SPIE Security and Watermarking of Multimedia Contents,
San Jose CA, USA, 22-25 Jan. 2001, 4314, 532-543.
Barni, M.; Bartolini, F. & Piva, A. (2001). Improved wavelet-based watermarking through
pixel-wise masking, IEEE Trans. Image Processing, 10, 5, May 2001, 783-791.
Braudaway, G.W.; Magerlein, K.A. & Mintzer, F. (1996). Protecting publicly available images
with a visible watermark, Proc. SPIE - Int. Soc. Opt. Eng., vol. 2659, pp. 126-133,
1996.
Cox, I. (2005). Robust watermarking, ECRYPT Summer School on Multimedia Security,
Salzburg, Austria, Sept. 22, 2005
Cox, I.; Killian, J.; Leighton, T. & Shamoon, T. (1997). Secure Spread Spectrum
Watermarking for Multimedia, IEEE Trans. Image Processing, 6, 12, 1997, 1673-1687
Cox, I.; Miller, M. & Bloom, J. (2002). Digital Watermarking, Morgan Kaufmann Publishers,
2002
Daugman, J. (1980). Two-dimensional spectral analysis of cortical receptive field profiles,
Vision Res., 20, ’80, 847-856.
Davenport, C. (2008). Commutative Hypercomplex Mathematics, Available from:
http://home.comcast.net/~cmdaven/hyprcplx.htm.
Firoiu, I.; Nafornita, C.; Boucher, J.–M. & Isar, A. (2009). Image Denoising Using a New
Implementation of the Hyperanalytic Wavelet Transform, IEEE Transactions on
Instrumentation and Measurements, vol. 58, Issue 8, August 2009, pp. 2410-2416.
Girod, B. (1989). The information theoretical significance of spatial and temporal masking in
video signals, Proc. SPIE Human Vision, Visual Processing, and Digital Display, vol.
1077, pp. 178–187, 1989.
Hua, L. & Fowler, J. E. (2002). A Performance Analysis of Spread-Spectrum Watermarking
Based on Redundant Transforms, Proc. IEEE Int. Conf. on Multimedia and Expo,
Lausanne, Switzerland, ’02, vol. 2, 553–556.
Kingsbury, N. (2001). Complex Wavelets for Shift Invariant Analysis and Filtering of
Signals, Applied and Comp. Harm. Anal. 10, ’01, 234-253.
Kingsbury, N. (2000). A Dual-Tree Complex Wavelet Transform with improved
orthogonality and symmetry properties, Proc. IEEE Conf. on Image Processing,
Vancouver, ’00, paper 1429.
Kundur, D. (2000). Water-filling for Watermarking?, Proc. IEEE Int. Conf. On Multimedia and
Expo, NY, 1287-1290, Aug. 2000.
Kundur, D. & Hatzinakos, D. (1998). Digital Watermarking using Multiresolution Wavelet
Decomposition, Proc. IEEE Int. Conf. On Acoustics, Speech and Signal Processing,
Seattle, Washington, Vol. 5, pp. 2969-2972, May 1998.
Kundur, D. & Hatzinakos, D. (2001). Diversity and Attack Characterization for Improved
Robust Watermarking, IEEE Transactions on Signal Processing, Vol. 49, No. 10, 2001,
pp. 2383-2396.
Lin, C. Y.; Wu, M.; Bloom, J. A.; Cox, I. J.; Miller, M. L. & Lui, Y. M. (2001). Rotation, Scale,
and Translation Resilient Watermarking for Images, IEEE Trans. On Image
Processing, 10, 5, May 2001
Loo, P. & Kingsbury, N. (2000). Digital Watermarking Using Complex Wavelets, ICIP 2000.
Moulin, P. & Mihcak M.K. (2002). A Framework for Evaluating the Data-Hiding Capacity of
Image Sources, IEEE Trans. Image Processing, 11(9), ’02, 1029-1042.
Nafornita, C.; Isar, A. & Borda, M. (2005). Image Watermarking Based on the Discrete
Wavelet Transform Statistical Characteristics, Proc. IEEE EUROCON 2005, Serbia &
Montenegro, 943-946.
Nafornita, C. (2008). Contributions to Digital Watermarking of Still Images in the Wavelet
Transform, Ph.D. thesis, Feb. 2008, Technical University of Cluj-Napoca, Romania.
Nafornita, C.; Isar, A.; Kovaci M. (2009). Increasing Watermarking Robustness using Turbo
Codes, IEEE International Symposium on Intelligent Signal Processing WISP 2009,
Budapest, Hungary, 26-28 August 2009.
Nafornita, C.; Firoiu, I.; Boucher, J.-M. & Isar, A. (2008). A New Watermarking Method
Based on the Use of the Hyperanalytic Wavelet Transform, Proc. SPIE Europe:
Photonics Europe, vol. 7000: Optical and Digital Image Processing 70000W, pp.70000W-
1-70000W-12, ISBN 97808194 71987, Strasbourg, April, 2008.
Nafornita, C. (2007). A New Pixel-Wise Mask for Watermarking, Proc. of ACM Multimedia
and Security Workshop, 2007, Dallas, TX, USA.
Nafornita, C.; Isar, A. & Borda, M. (2006). Pixel-wise masking for watermarking using local
standard deviation and wavelet compression, Scientific Bulletin of the Politehnica
Univ. of Timisoara, Trans. on Electronics and Communications, 51(65), 2, pp. 146-151,
ISSN 1583-3380, 2006.
Nafornita, C.; Isar, A. & Borda, M. (2006). Improved Pixel-Wise Masking for Image
Watermarking, Multimedia Content Representation, Classification and Security,
September 11-13, 2006, Istanbul, Turkey, Lecture Notes in Computer Science,
Springer-Verlag, 2006, pp. 90-97.
Nafornita, C. (2007). Robustness Evaluation of Perceptual Watermarks, IEEE Int. Symposium
on Signal, Circuits and Systems ISSCS 2007, 12-13 July 2007, Iasi, Romania.
Nason, G.P. (2002). Choice of wavelet smoothness, primary resolution and threshold in
wavelet shrinkage, Statistics and Computing, 12, ’02, 219-227.
Nikolaidis, N. & Pitas, I. (1998). Robust Image Watermarking in the Spatial Domain, Trans.
Signal Processing, Vol. 66, No. 3, pp. 385-403, 1998.
Ó Ruanaidh, J.J.K. & Pun, T. (1998). Rotation, Scale and Translation Invariant Spread
Spectrum Digital Image Watermarking, Signal Processing, 66(1998), pp. 303-317.
Ó Ruanaidh, J.J.K; Dowling, W.J.; Boland, F.M. (1996). Phase watermarking of digital
images, Proc. IEEE Int. Conf. Image Processing, 1996, pp. 239-242.
Podilchuk, C. & Zeng, W. (1998). Image-Adaptive Watermarking Using Visual Models, IEEE
Journal on Selected Areas in Communications, 16, 4, May 1998, 525-539
Selesnick, I. W.; Baraniuk, R. G. & Kingsbury, N. (2005). The Dual-tree Complex Wavelet
Transform - A Coherent Framework for Multiscale Signal and Image Processing,
IEEE Signal Processing Magazine, 22(6):123-151, November 2005.
Serdean, C.V.; Ambroze, M.A.; Tomlinson, M. & Wade, J.G. (2003). DWT based high-
capacity blind video watermarking, invariant to geometrical attacks, IEE Proc.-Vis.
Image Signal Process., 150, 1, Feb. 2003.
Xia, X.; Boncelet, C. G. & Arce, G. R. (1998). Wavelet Transform Based Watermark for Digital
Images, Optics Express, Vol. 3, No. 12, 1998, pp. 497-505.
Part 4
1. Introduction
The discrete wavelet transform (DWT) has an established position in processing of signals
and images in research and industry. The first DWT structures were based on the compactly
supported conjugate quadrature filters (CQFs) (Smith & Barnwell, 1986; Daubechies, 1988).
However, a drawback in CQFs is related to the nonlinear phase effects such as image
blurring and spatial dislocations in multi-scale analyses. In contrast, in the biorthogonal
discrete wavelet transform (BDWT) the scaling and wavelet filters are symmetric and linear
phase. The biorthogonal filters (BFs) are usually constructed by a ladder-type network
called the lifting scheme (Sweldens, 1988). The procedure consists of sequential down and
uplifting steps and the reconstruction of the signal is made by running the lifting network in
reverse order. Efficient lifting BF structures have been developed for VLSI and
microprocessor environments (Olkkonen et al. 2005; Olkkonen & Olkkonen, 2008). The
analysis and synthesis filters can be implemented in integer arithmetic using only register
shifts and summations. Many BDWT-based data and image processing tools have
outperformed the conventional discrete cosine transform (DCT) based approaches. For
example, in the JPEG2000 standard (ITU-T, 2000), the DCT has been replaced by the lifting BFs.
One of the main difficulties in DWT analysis is that the total energy of the wavelet
coefficients in different scales depends on fractional shifts of the analysed signal. If we
have a discrete signal $x[n]$ and the corresponding time shifted signal $x[n-\tau]$, where
$\tau \in [0,1]$, there may exist a significant difference in the energy of the wavelet
coefficients as a function of the time shift. Kingsbury (2001) proposed a nearly shift
invariant method, where the real and imaginary parts of the complex wavelet coefficients are
approximately a Hilbert transform pair. The energy (absolute value) of the wavelet
coefficients equals the envelope, which provides smoothness and approximate shift-invariance.
Selesnick (2002) observed that when two parallel CQF banks are constructed so that the
impulse responses of the scaling filters are half-sample delayed versions of each other,
$h_0[n]$ and $h_0[n-0.5]$, the corresponding wavelets are a Hilbert transform pair. In the
z-transform domain we should be able to construct the scaling filters $H_0(z)$ and
$z^{-0.5}H_0(z)$. For design of the scaling filters Selesnick (2002) proposed a spectral
factorization method based on the half delay all-pass Thiran filters. As a disadvantage, the
scaling filters do not have coefficient symmetry, and the resulting phase nonlinearity
interferes with the spatial timing in different scales and prevents accurate statistical
correlations. Gopinath (2003) generalized the idea for N parallel
filter banks, which are phase shifted versions of each other. Gopinath showed that the
shift invariance of the wavelet transform improves as N increases. However, the greatest
improvement comes from the change from N = 1 to N = 2.
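The shift dependence described above can be illustrated with a minimal sketch. It assumes a one-level Haar wavelet filter and an integer one-sample shift rather than a fractional shift; neither choice comes from this chapter, and the function name `haar_detail_energy` is hypothetical.

```python
import numpy as np

def haar_detail_energy(x):
    """Energy of the one-level Haar wavelet (detail) coefficients of x."""
    x = np.asarray(x, dtype=float)
    # High-pass filtering followed by decimation by two: one coefficient per sample pair.
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return float(np.sum(detail ** 2))

pulse = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)
shifted = np.roll(pulse, 1)  # the same pulse delayed by one sample

# Pulse edges fall between the sample pairs, so the detail energy vanishes.
print(haar_detail_energy(pulse))
# After the shift both edges fall inside a sample pair and the energy is nonzero.
print(haar_detail_energy(shifted))
```

The two signals carry the same total energy, yet the energy of the decimated wavelet coefficients differs, which is the behaviour the dual-tree constructions aim to suppress.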
In this book chapter we review the methods for constructing the shift invariant CQF and BF
wavelet sequences. We describe a dual-tree wavelet transform, where two parallel CQF
wavelet sequences form a Hilbert pair, which ensures the shift invariance. Next we review
the construction of the BF wavelets and show the close relationship between the CQF and
BF wavelets. Then we introduce a novel Hilbert transform filter for constructing shift
invariant dual-tree BF banks.
Fig. 1. The analysis and synthesis parts of the real-valued CQF DWT bank.
$$
\begin{aligned}
H_0(z) &= (1 + z^{-1})^K P(z) \\
H_1(z) &= z^{-N} H_0(-z^{-1}) \\
G_0(z) &= H_1(-z) \\
G_1(z) &= -H_0(-z)
\end{aligned}
\tag{1}
$$
where $P(z)$ is a polynomial in $z^{-1}$. The scaling filter $H_0(z)$ has the Kth order zero at
$\omega = \pi$. The wavelet filter $H_1(z)$ has the Kth order zero at $\omega = 0$,
correspondingly. The filters are related via the perfect reconstruction (PR) condition

$$
\begin{aligned}
H_0(z)G_0(z) + H_1(z)G_1(z) &= 2z^{-N} \\
H_0(-z)G_0(z) + H_1(-z)G_1(z) &= 0
\end{aligned}
\tag{2}
$$
The tree structured implementation of the real-valued CQF filter bank is described in Fig. 2.
Let us denote the frequency response of the z-transform filter as

$$H(z) = \sum_n h_n z^{-n} \;\Rightarrow\; H(\omega) = \sum_n h_n e^{-j\omega n} \tag{3}$$

$$
\begin{aligned}
H(-z) &\Rightarrow H(\omega - \pi) \\
H(-z^{-1}) &\Rightarrow H^{*}(\omega - \pi)
\end{aligned}
\tag{4}
$$
where * denotes complex conjugation. In M-stage CQF tree the frequency response of the
wavelet sequence is

$$W_M(\omega) = H_1(\omega/2) \prod_{k=2}^{M} H_0(\omega/2^k) \tag{5}$$
Fig. 2. The tree structured implementation of the real-valued CQF DWT, which yields the
wavelet sequences $w_1[n], w_2[n], \ldots, w_M[n]$ and one scaling sequence $s_M[n]$.
Next we construct a phase shifted parallel CQF filter bank consisting of the scaling filter
$\bar{H}_0(z)$ and the wavelet filter $\bar{H}_1(z)$. Let us suppose that the scaling filters in
parallel CQF trees are related as

$$\bar{H}_0(\omega) = e^{-j\phi(\omega)} H_0(\omega) \tag{6}$$

where $\phi(\omega)$ is a $2\pi$ periodic phase function. Then the corresponding CQF wavelet
filters are related as

$$\bar{H}_1(\omega) = e^{-j\omega N} \bar{H}_0^{*}(\omega - \pi) \tag{7}$$

and

$$\bar{H}_1(\omega) = e^{-j\omega N} \bar{H}_0^{*}(\omega - \pi) = e^{-j\omega N} e^{j\phi(\omega - \pi)} H_0^{*}(\omega - \pi) = e^{j\phi(\omega - \pi)} H_1(\omega) \tag{8}$$

We may easily verify that the phase shifted CQF bank (6,8) obeys the PR condition (2).
Correspondingly, the frequency response of the M-stage CQF wavelet sequence is

$$
\begin{aligned}
\bar{W}_M(\omega) &= \bar{H}_1(\omega/2) \prod_{k=2}^{M} \bar{H}_0(\omega/2^k)
= e^{j\phi(\omega/2 - \pi)} H_1(\omega/2) \prod_{k=2}^{M} e^{-j\phi(\omega/2^k)} H_0(\omega/2^k) \\
&= e^{j\phi(\omega/2 - \pi)}\, e^{-j\sum_{k=2}^{M} \phi(\omega/2^k)}\, H_1(\omega/2) \prod_{k=2}^{M} H_0(\omega/2^k)
= e^{j\theta}\, W_M(\omega)
\end{aligned}
\tag{9}
$$

where

$$\theta = \phi(\omega/2 - \pi) - \sum_{k=2}^{M} \phi(\omega/2^k) \tag{10}$$
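With the phase function

$$\phi(\omega) = \omega/2 \tag{11}$$

the scaling filters (6) are half-sample delayed versions of each other. By inserting (11) in (10)
we have

$$\theta = \frac{\omega/2 - \pi}{2} - \omega \sum_{k=2}^{M} \frac{1}{2^{k+1}} = -\frac{\pi}{2} + \frac{\omega}{2^{M+1}} \tag{12}$$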
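The closed form in (12) can be checked numerically. The following is a minimal sketch that evaluates the sum in (10) directly for the half-sample phase function (11) and compares it with the right-hand side of (12); the function name `theta_from_sum` and the chosen frequency grid are illustrative assumptions.

```python
import numpy as np

def phi(omega):
    """Half-sample delay phase function, eq. (11)."""
    return omega / 2.0

def theta_from_sum(omega, M):
    """Phase difference theta evaluated term by term from eq. (10)."""
    return phi(omega / 2.0 - np.pi) - sum(phi(omega / 2.0 ** k) for k in range(2, M + 1))

omega = np.linspace(-np.pi, np.pi, 9)
for M in (1, 2, 3, 6):
    closed_form = -np.pi / 2.0 + omega / 2.0 ** (M + 1)  # right-hand side of eq. (12)
    max_dev = np.max(np.abs(theta_from_sum(omega, M) - closed_form))
    print(f"M={M}: largest deviation from (12) = {max_dev:.2e}")
```

The residual term omega / 2^(M+1) shrinks with the stage M, so the deviation from the ideal value -pi/2 is largest in the first stages.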
The wavelet sequences (5,9) yielded by the CQF bank (1) and the phase shifted CQF bank
(6,8) can be interpreted as real and imaginary parts of the complex wavelet sequence

$$W_M^{C}(\omega) = W_M(\omega) + j\,\bar{W}_M(\omega) \tag{13}$$

$$\bar{W}_M(\omega) = \mathcal{H}\{\psi_M(\omega)\} \tag{14}$$

where $\mathcal{H}$ denotes the Hilbert transform. The frequency response of the Hilbert transform
operator is defined as

$$\mathcal{H}(\omega) = -j\,\mathrm{sgn}(\omega) \tag{15}$$
Our result (12) reveals that if the scaling filters are the half-sample delayed versions of each
other, the resulting wavelet sequences are not precisely Hilbert transform pairs. There
occurs a phase error term $\omega/2^{M+1}$, which depends both on the frequency and on the stage M
of the wavelet sequence. In the sequel we describe a novel procedure for eliminating this error. We
move the phase error in front of the phase shifted CQF tree using the equivalence described
in Fig. 3. Then the error term reduces to $\omega/2$. The elimination of the error term can be made
by prefiltering the analyzed signal by the half-sample delay operator $D(z) = z^{-1/2}$, which has
the frequency response $D(\omega) = e^{-j\omega/2}$. The total phase function is then for $-\pi \leq \omega \leq \pi$

$$\theta(\omega) = \angle D(\omega) - \pi/2 + \omega/2 = -\pi/2 \tag{18}$$

which ensures that the M-stage CQF wavelet sequence and the phase error corrected
sequence are a Hilbert transform pair.
Fig. 3. The two equivalents for transferring the phase function in front of the phase shifted
CQF tree.
$$
\begin{aligned}
H_0(z) &= (1 + z^{-1})^L Q(z) \\
H_1(z) &= (1 - z^{-1})^M R(z) \\
G_0(z) &= H_1(-z) \\
G_1(z) &= -H_0(-z)
\end{aligned}
\tag{19}
$$

where the scaling filter $H_0(z)$ has the Lth order zero at $\omega = \pi$. The wavelet filter $H_1(z)$ has
the Mth order zero at $\omega = 0$, correspondingly. $Q(z)$ and $R(z)$ are polynomials in $z^{-1}$. The
low-pass and high-pass reconstruction filters $G_0(z)$ and $G_1(z)$ are defined as in the CQF
bank. For two-channel biorthogonal filter bank the PR relation is

$$
\begin{aligned}
H_0(z)G_0(z) + H_1(z)G_1(z) &= 2z^{-D} \\
H_0(-z)G_0(z) + H_1(-z)G_1(z) &= 0
\end{aligned}
\tag{20}
$$
Let us define the binomial term

$$B_K(z) = (1 + z^{-1})^K \tag{21}$$

which appears both in the CQF and BF banks. Using the binomial term the CQF bank can be
written as

$$
\begin{aligned}
H_0(z) &= B_K(z)P(z) \\
H_1(z) &= z^{-N}(-z)^K B_K(-z)P(-z^{-1}) \\
G_0(z) &= z^{-N} z^K B_K(z)P(z^{-1}) \\
G_1(z) &= B_K(-z)P(-z)
\end{aligned}
\tag{22}
$$
For the PR condition of the CQF bank ( ) the following is valid for K odd
The above relation (25) gives a novel way to design the biorthogonal wavelet filter bank
based on the CQF bank and vice versa. The polynomials $Q(z)$ and $R(-z)$ can be found by
factoring $P(z)P(z^{-1})$, which is a symmetrical polynomial. The roots of the product filter
$P(z)P(z^{-1})$ should be optimally divided so that both $Q(z)$ and $R(-z)$ are low-pass. Then
$R(z)$ is high-pass. If the BF bank is known it is easy to factor $Q(z)R(-z)$ into
$P(z)$ and $P(z^{-1})$ using some spectral factorization method. An important result is related to
the modification of the BF bank (Olkkonen & Olkkonen, 2007a).
Lemma 1: If the scaling filter $H_0(z)$, the wavelet filter $H_1(z)$ and the reconstruction filters
$G_0(z)$ and $G_1(z)$ in the BF bank (19) have a perfect reconstruction property (20), the following
modified BF bank also obeys the PR relation

$$
\begin{aligned}
\bar{H}_0(z) &= F(z)H_0(z) \\
\bar{H}_1(z) &= F^{-1}(-z)H_1(z) \\
\bar{G}_0(z) &= F^{-1}(z)G_0(z) \\
\bar{G}_1(z) &= F(-z)G_1(z)
\end{aligned}
\tag{26}
$$

where $F(z)$ is any polynomial in $z^{-1}$. The proof follows by direct insertion of (26) into the PR
condition (20).
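The direct insertion mentioned in the proof can be written out as follows. This is a sketch of the two substitutions, writing the modified filters with an overbar and assuming only the definitions (20) and (26).

```latex
% Distortion term: the factors F(z) and F(-z) cancel pairwise.
\begin{aligned}
\bar{H}_0(z)\bar{G}_0(z) + \bar{H}_1(z)\bar{G}_1(z)
 &= F(z)F^{-1}(z)\,H_0(z)G_0(z) + F^{-1}(-z)F(-z)\,H_1(z)G_1(z) \\
 &= H_0(z)G_0(z) + H_1(z)G_1(z) = 2z^{-D}
\end{aligned}

% Aliasing term: note that \bar{H}_1(-z) = F^{-1}(z) H_1(-z).
\begin{aligned}
\bar{H}_0(-z)\bar{G}_0(z) + \bar{H}_1(-z)\bar{G}_1(z)
 &= F(-z)F^{-1}(z)\,H_0(-z)G_0(z) + F^{-1}(z)F(-z)\,H_1(-z)G_1(z) \\
 &= \frac{F(-z)}{F(z)}\,\bigl[H_0(-z)G_0(z) + H_1(-z)G_1(z)\bigr] = 0
\end{aligned}
```

The aliasing term is multiplied by the common factor F(-z)/F(z), so it stays zero wherever that factor is finite.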
where $\mathrm{sgn}(\omega) = 1$ for $\omega \geq 0$ and $\mathrm{sgn}(\omega) = -1$ for $\omega < 0$. We describe a novel method for
constructing the Hilbert transform filter based on the half-sample delay filter $D(z) = z^{-0.5}$.
The classical approach for design of the half-sample delay filter $D(z)$ is based on the Thiran
all-pass interpolator

$$D(z) = z^{-0.5} = \prod_{k=1}^{p} \frac{c_k + z^{-1}}{1 + c_k z^{-1}} = \frac{z^{-N}A(z^{-1})}{A(z)} = \frac{c_N + c_{N-1}z^{-1} + \cdots + z^{-N}}{1 + c_1 z^{-1} + \cdots + c_N z^{-N}} \tag{28}$$

where the $c_k$ coefficients are optimized so that the frequency response follows approximately

$$D(\omega) = e^{-j\omega/2} \tag{29}$$
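As an illustration of (28) and (29), the sketch below evaluates a single first-order all-pass section. It assumes the standard first-order Thiran coefficient c = (1 - d)/(1 + d) for a delay of d samples, which is a textbook result and not a design taken from this chapter; the function names are hypothetical.

```python
import numpy as np

def thiran_first_order(delay):
    """Coefficient c of the all-pass section (c + z^-1) / (1 + c z^-1) (assumed Thiran formula)."""
    return (1.0 - delay) / (1.0 + delay)

def allpass_response(c, omega):
    """Frequency response of the first-order all-pass section at the frequencies omega."""
    z_inv = np.exp(-1j * omega)
    return (c + z_inv) / (1.0 + c * z_inv)

c = thiran_first_order(0.5)                 # half-sample delay
omega = np.array([0.05, 0.1, 0.5, 1.0, 2.0])
D = allpass_response(c, omega)
phase_delay = -np.angle(D) / omega          # delay in samples at each frequency

for w, mag, d in zip(omega, np.abs(D), phase_delay):
    print(f"omega={w:.2f}  |D|={mag:.6f}  phase delay={d:.4f} samples")
```

The magnitude is exactly one at all frequencies, while the phase delay is close to half a sample only at low frequencies, which is why (29) holds approximately and higher-order sections are used in practice.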
$$D(z) = \frac{A(z)}{B(z)} \tag{30}$$

The quadrature mirror filter $D(-z)$ has the frequency response $D(\omega - \pi) = e^{-j(\omega - \pi)/2}$, so that

$$\frac{D(\omega)}{D(\omega - \pi)} = e^{-j\omega/2}\, e^{j(\omega - \pi)/2} = e^{-j\pi/2} \tag{32}$$

Comparing with (27) and using the IIR filter notation (30) we obtain the Hilbert transform filter as

$$\mathcal{H}(z) = \frac{A(z)B(-z)}{A(-z)B(z)} \tag{33}$$
The Hilbert transform filter is inserted in the BF bank using the result of Lemma 1 (26). The
modified prototype BF filter bank is

$$
\begin{aligned}
\bar{H}_0(z) &= \mathcal{H}(z)H_0(z) \\
\bar{H}_1(z) &= \mathcal{H}^{-1}(-z)H_1(z) \\
\bar{G}_0(z) &= \mathcal{H}^{-1}(z)G_0(z) \\
\bar{G}_1(z) &= \mathcal{H}(-z)G_1(z)
\end{aligned}
\tag{34}
$$

The BF bank (34) can be highly simplified by noting the following equivalences, which follow
from (33)

$$
\begin{aligned}
\mathcal{H}^{-1}(-z) &= \mathcal{H}(z) \\
\mathcal{H}^{-1}(z) &= \mathcal{H}(-z)
\end{aligned}
\tag{35}
$$

$$
\begin{aligned}
\bar{H}_0(z) &= \mathcal{H}(z)H_0(z) \\
\bar{H}_1(z) &= \mathcal{H}(z)H_1(z) \\
\bar{G}_0(z) &= \mathcal{H}(-z)G_0(z) \\
\bar{G}_1(z) &= \mathcal{H}(-z)G_1(z)
\end{aligned}
\tag{36}
$$
The modified BF bank (36) can be realized by the Hilbert transform filter $\mathcal{H}(z)$, which works
as a prefilter for the analysed signal. Correspondingly, the Hilbert transform filter $\mathcal{H}(-z)$
works as a postfilter in the reconstruction stage. The wavelet sequences yielded by the two parallel
BF trees can be considered to form a complex wavelet sequence by defining the Hilbert
transform operator

$$\mathcal{H}_a(z) = 1 + j\,\mathcal{H}(z) \tag{37}$$

Filtering the real-valued signal $x[n]$ with the Hilbert transform operator yields an
analytic signal with the spectrum

$$X_a(\omega) = \begin{cases} 2X(\omega), & 0 \leq \omega < \pi \\ 0, & -\pi \leq \omega < 0 \end{cases} \tag{39}$$
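The spectral definition (39) can be tried out on a finite sequence. The sketch below is an FFT-based illustration, not the IIR filter realization (33) of this chapter: it zeroes the negative-frequency bins and doubles the positive ones. Leaving the DC and Nyquist bins at unit gain is a common discrete-time convention and an assumption here; the function name `analytic_signal` is hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence by one-sided spectral weighting, cf. eq. (39)."""
    n = len(x)
    X = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                       # DC bin kept at unit gain (assumed convention)
    if n % 2 == 0:
        gain[n // 2] = 1.0              # Nyquist bin kept at unit gain (assumed convention)
        gain[1:n // 2] = 2.0            # positive frequencies doubled
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * gain)        # negative-frequency bins are left at zero

n = 64
k = np.arange(n)
x = np.cos(2.0 * np.pi * 5.0 * k / n)   # real test tone on an exact FFT bin
xa = analytic_signal(x)

print(np.max(np.abs(xa.real - x)))      # real part reproduces the input
print(np.max(np.abs(np.abs(xa) - 1.0))) # envelope of the tone is constant
```

For the cosine test tone the analytic signal is the complex exponential at the same frequency, so its absolute value gives the smooth envelope referred to in the text.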
The wavelet sequence is obtained by decimation of the high-pass filtered analytic signal

$$W(\omega) = [X_a(\omega)H_1(\omega)]_{\downarrow 2} = W_a(\omega)_{\downarrow 2} = \frac{1}{2} X_a(\omega/2) H_1(\omega/2) \tag{40}$$

The result (40) means that the decimation does not produce aliasing but the frequency
spectrum is dilated by two. The frequency spectrum of the undecimated wavelet sequence
$W_a(\omega)$ contains frequency components only in the range $0 \leq \omega < \pi$, but the frequency
spectrum of the decimated analytic signal has the frequency band $0 \leq \omega < 2\pi$. Hence, the
decimation does not produce overlapping and leakage (aliasing) to the negative frequency
range. A key feature of the dual-tree wavelet transform is the shift invariance of the decimated
analytic wavelet coefficients. The Fourier transform of the decimated wavelet sequence of the
fractionally delayed signal $x[n-\tau]$ is $\frac{1}{2} e^{-j\omega\tau/2} W_a(\omega/2)$ and the corresponding wavelet
sequence is $w[n-\tau/2]$. The energy (absolute value) of the decimated wavelet coefficients is
$\frac{1}{2}|W_a(\omega/2)|$, which does not depend on the fractional delay $\tau$. If the wavelet filter has linear
phase the wavelet coefficients are shift invariant with respect to their energy content.
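The argument can be spelled out step by step. The following sketch contrasts the analytic case with a real-valued wavelet sequence; the subscript $\tau$ for quantities belonging to the delayed signal is notation introduced here for illustration only.

```latex
% Delay theorem applied to the undecimated analytic wavelet sequence
W_{a,\tau}(\omega) = e^{-j\omega\tau}\, X_a(\omega) H_1(\omega) = e^{-j\omega\tau}\, W_a(\omega)

% General decimation by two: two spectral images are summed
W_\tau(\omega) = \tfrac{1}{2}\left[ W_{a,\tau}(\omega/2) + W_{a,\tau}(\omega/2 - \pi) \right]

% Analytic case: for 0 <= omega < 2 pi the second image lies in [-pi, 0) and vanishes
W_\tau(\omega) = \tfrac{1}{2}\, e^{-j\omega\tau/2}\, W_a(\omega/2)
\quad\Rightarrow\quad
|W_\tau(\omega)| = \tfrac{1}{2}\, |W_a(\omega/2)|

% Real-valued case: both images are nonzero and their relative phase depends on tau
|W_\tau(\omega)| = \tfrac{1}{2}\left| W(\omega/2) + e^{j\pi\tau}\, W(\omega/2 - \pi) \right|
```

In the real-valued case the factor $e^{j\pi\tau}$ changes how the two images add, which is the source of the shift dependence of the coefficient energy.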
An integer-valued half-delay filter $D(z) = A(z)/B(z)$ is obtained by the B-spline transform
(for details see Olkkonen & Olkkonen, 2007a). Table I gives the polynomial coefficients for the
B-spline orders K=4, 5 and 6. The frequency response of the Hilbert transform filter
constructed by the fourth order B-spline (Fig. 4) shows a maximally flat magnitude
spectrum. The phase spectrum corresponds to an ideal Hilbert transformer (15).
K    A(z)                                                                   B(z)
4    $(1 + 6z^{-1} + z^{-2})/8$                                             $(1 + 4z^{-1} + z^{-2})/6$
5    $(1 + 76z^{-1} + 230z^{-2} + 76z^{-3} + z^{-4})/384$                   $(1 + 11z^{-1} + 11z^{-2} + z^{-3})/24$
6    $(1 + 237z^{-1} + 1682z^{-2} + 237z^{-3} + z^{-4})/3840$               $(1 + 26z^{-1} + 66z^{-2} + 26z^{-3} + z^{-4})/120$
Table I. The half-delay filter polynomials for the B-spline transform order K=4, 5 and 6.
Fig. 4. Magnitude and phase spectra of the Hilbert transform filter yielded by the fourth
order B-spline transform.
6. Conclusion
It is well documented that the real-valued DWTs are not shift invariant, but small fractional
time-shifts may introduce significant differences in the energy of the wavelet coefficients.
Kingsbury (2001) showed that the shift invariance is improved by using two parallel filter
banks, which are designed so that the wavelet sequences constitute real and imaginary parts
of the complex analytic wavelet transform. The dual-tree discrete wavelet transform has
been shown to outperform the real-valued DWT in a variety of applications such as
denoising, texture analysis, speech recognition, processing of seismic signals and
neuroelectric signal analysis (Olkkonen et al. 2006; Olkkonen et al. 2007b).
Selesnick (2002) made an observation that a half-sample time-shift between the scaling
filters in parallel CQF banks is enough to produce the shift invariant wavelet transform. In
this work we reanalysed the condition and observed a phase-error term $\omega/2^{M+1}$ (12)
compared with the ideal phase response $\theta(\omega) = -\pi/2$. The phase error attains its highest
value at high frequencies and at small stages M of the wavelet sequence. Fortunately, we
showed in this book chapter that the phase error term can be cancelled by adding a half-delay
prefilter in front of the CQF chain. For this purpose the half-delay filter
$D(z) = A(z)/B(z)$ (30, Table I) constructed by the B-spline transform (Olkkonen & Olkkonen,
2007a) is well suited. In addition, there exist many other design methods for half-delay
filters (see e.g. Laakso et al. 1996; Johansson & Lowenborg, 2002; Pei & Tseng, 2003; Pei &
Wang, 2004; Tseng, 2006).
In multi-scale DWT analysis the complex wavelet sequences should be shift invariant. This
requirement is satisfied in the Hilbert transform-based approach (Olkkonen et al. 2006,
Olkkonen et al. 2007b), where the signal in every scale is Hilbert transformed yielding
strictly analytic and shift invariant transform coefficients. The procedure needs FFT-based
computation which may be an obstacle in many digital signal processor realizations. To
avoid this we constructed the novel shift invariant dual-tree BF bank (36) based on the
Hilbert transform filter (33). This highly simplified BF bank is yielded by Lemma 1 and the
equivalence (35) of the Hilbert transform filter (33). In many respects the BF bank (36)
outperforms the previous nearly shift invariant DWT approaches.
7. References
Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets. Commun. Pure
Appl. Math., Vol. 41, 909-996.
ITU-T (2000). Recommend. T.800-ISO DCD15444-1: JPEG2000 Image Coding System.
International Organization for Standardization, ISO/IEC JTC1 SC29/WG1.
Johansson, H. & Lowenborg, P. (2002). Reconstruction of nonuniformly sampled bandlimited
signals by means of digital fractional delay filters, IEEE Trans. Signal Process., Vol.
50, No. 11, pp. 2757-2767.
Kingsbury, N.G. (2001). Complex wavelets for shift invariant analysis and filtering of
signals. J. Appl. Comput. Harmonic Analysis. Vol. 10, 234-253.
Laakso, T., Valimaki, V., Karjalainen, M. & Laine, U.K. (1996). Splitting the unit delay. Tools
for fractional delay filter design, IEEE Signal Processing Magazine, pp. 30-60.
Olkkonen, H., Pesola, P. & Olkkonen, J.T. (2005). Efficient lifting wavelet transform for
microprocessor and VLSI applications. IEEE Signal Process. Lett. Vol. 12, No. 2, 120-
122.
Olkkonen, H., Pesola, P., Olkkonen, J.T. & Zhou, H. (2006). Hilbert transform assisted
complex wavelet transform for neuroelectric signal analysis. J. Neuroscience Meth.
Vol. 151, 106-113.
Olkkonen, H. & Olkkonen, J.T. (2007a). Half-delay B-spline filter for construction of shift-
invariant wavelet transform. IEEE Trans. Circuits and Systems II. Vol. 54, No. 7, 611-
615.
Olkkonen, H., Olkkonen, J.T. & Pesola, P. (2007b). FFT-based computation of shift invariant
analytic wavelet transform. IEEE Signal Process. Lett. Vol. 14, No. 3, 177-180.
Olkkonen, H. & Olkkonen, J.T. (2008). Simplified biorthogonal discrete wavelet transform
for VLSI architecture design. Signal, Image and Video Process. Vol. 2, 101-105.
Pei, S.C. & Tseng, C.C. (2003). An efficient design of a variable fractional delay filter
using a first-order differentiator, IEEE Signal Processing Letters, Vol. 10, No. 10, pp.
307-310.
Pei, S.C. & Wang, P.H. (2004). Closed-form design of all-pass fractional delay, IEEE
Signal Processing Letters, Vol. 11, No. 10, pp. 788-791.
Selesnick, I.W. (2002). The design of approximate Hilbert transform pairs of wavelet bases.
IEEE Trans. Signal Process. Vol. 50, No. 5, 1144-1152.
Smith, M.J.T. & Barnwell, T.P. (1986). Exact reconstruction for tree-structured subband
coders. IEEE Trans. Acoust. Speech Signal Process. Vol. 34, 434-441.
Sweldens, W. (1988). The lifting scheme: A construction of second generation wavelets.
SIAM J. Math. Anal. Vol. 29, 511-546.
Tseng, C.C. (2006). Digital integrator design using Simpson rule and fractional delay filter,
IEEE Proc. Vision, Image and Signal Process., Vol. 153, No. 1, pp. 79-85.
14
1. Introduction
A discrete wavelet transform (DWT) has been widely applied in various digital signal
processing techniques. It has been designed under conditions such as perfect
reconstruction, aliasing cancellation, regularity, vanishing moments, etc. This article
introduces a new condition referred to as “DC lossless”. It guarantees lossless reconstruction of
a constant input signal (DC signal) in spite of rounding of signal values and coefficient
values inside a transform. The minimum word length of the values under the new condition
is theoretically derived and experimentally verified.
Since the JPEG 2000 algorithm based on the discrete wavelet transform (DWT) was adopted as
an international standard for digital cinema video coding [1], high speed and low power
implementation of a DWT has become an issue of great importance [2,3]. In
designing a DWT, its coefficient values and signal values are assumed to be real numbers.
However, in implementation, they are rounded to rational numbers so that they can be
expressed with a finite word length in binary digits. Therefore rounding errors inside a
DWT processing unit are inevitable.
In this article, we derive a condition on word length of coefficient values and that of signal
values of a DWT such that the transform becomes lossless for a DC signal. Under this
condition (DC lossless condition), it is theoretically guaranteed that an output signal
contains no error in spite of rounding of coefficients and signals inside the DWT. We treat
the irreversible 9-7 DWT adopted by the JPEG 2000 for lossy coding of image signals as an
example.
In the case of the 5-3 DWT in JPEG 2000 for lossless coding, thanks to its lifting structure
[4-6], lossless reconstruction of any signal is guaranteed even though signals and coefficients
are rounded. In contrast, this does not hold for the 9-7 DWT because of the scaling that
adjusts the DC gain of the low pass filter in the forward transform [7]. However, we have pointed
out that the transform can be made lossless for a DC signal under a certain condition on the word
length of coefficients and signals [8].
This DC lossless condition is a necessary condition for the regularity which has been
analyzed by numerous researchers to improve coding performance of a transform. When
the regularity is not satisfied, the DWT has some problems such as a checker board artifact
which is observed in a reconstructed signal as unnecessary high frequency noise in flat or
smooth region of a signal [9]. It also brings about DC leakage which decreases the coding
gain of a transform [10].
The regularity has been structurally guaranteed for a two channel quadrature mirror filter
bank (QMF) [9] and the DCT [10] respectively. However, since these previous methods were
based on the lattice structure, these are not directly applicable to the lifting structure of the
9-7 DWT. Beside these relations to the regularity, the DC lossless condition itself is also
considered to be important for white balancing of a video system in which the DC signal is
used as a reference input for calibration [11].
This article aims at deriving the DC lossless condition theoretically and clarifying the
minimum word length of signals and coefficients. In conventional analysis, errors due to
shortening of word length of signals (signal errors) were described as 'additive' to a signal
[7,12]. They were treated as independent and uniformly distributed white noise. On the
other hand, errors due to rounding of coefficients (coefficient errors) were described as
'multiplicative' to a signal and evaluated with the sensitivity [13-15]. It should be noted that
the signal error and the coefficient error have been treated independently. Unlike those
conventional approaches, we utilize the mutual effect between the rounding of signals and that of
coefficients. Introducing a new model which unifies the coefficient error and the signal
error, we define the tolerance for those errors as a parameter that simultaneously controls the
word length of signals and that of coefficients.
As a result of our theoretical analysis, the minimum word length of signals and that of
coefficients inside the lifting 9-7 DWT are derived under the DC lossless condition. We
confirm that the minimum word length derived by our analysis is shorter than that
determined by a conventional approach. We also confirm that the DWT under the condition
does not produce the checker board artifact for a DC signal.
This article is organized as follows. Chapter 2 defines a rounding operation and a rounding
error, describes their basic properties in algebraic approach, and derives 'addition' formula
and 'multiplication' formula of the rounding (modulo) operation. Application of these
formulas to scaling of a signal value is introduced in chapter 3. Chapter 4 introduces the DC
lossless DWT. Its usefulness is also described. Derivation process of conditions on word
length of signals and coefficients is described in chapter 5. The new condition derived from
the basic properties in chapter 2 is summarized in chapter 6. Other related condition derived
from a conventional approach is also summarized. Theoretical results are verified and the
minimum word length of the DC lossless DWT is clarified in chapter 7. This article is
concluded in chapter 8.
$$x = -b_{I-1}\,2^{I-1} + \sum_{p=-F}^{I-2} b_p\,2^{p}, \quad b_p \in \{0,1\},\ I \ge 1,\ F \ge 0,\ I \in \mathbb{Z},\ F \in \mathbb{Z} \tag{1}$$
where $b_p$, $p \in \{-F, \dots, I-1\}$, is the set of binary digits of a value $x$. The value has an $I$ bit integer part
including one sign bit $b_{I-1}$ (two's complement) and an $F$ bit fraction part. Hereinafter, $F$ is referred to as the word length of a
value $x$. This $F$ bit value $x$ has a range expressed as
$$x \in [-2^{I-1},\ 2^{I-1} - 2^{-F}] \subset [-2^{I-1},\ 2^{I-1}). \tag{2}$$
For example, in case of $I=1$ and $F=2$, the maximum value is $x=0.75$ for $[b_0\ b_{-1}\ b_{-2}]=[0\ 1\ 1]$, and
the minimum value is $x=-1.00$ for $[b_0\ b_{-1}\ b_{-2}]=[1\ 0\ 0]$.
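The example above can be checked with a short sketch. It evaluates a bit pattern as a two's complement fixed-point value with $I$ integer bits (sign bit included) and $F$ fraction bits. The function name `fixed_point_value` and the bit ordering (sign bit first) are illustrative assumptions, not notation from this article.

```python
from fractions import Fraction

def fixed_point_value(bits, I, F):
    """Value of a two's complement pattern b_{I-1} ... b_{-F} (sign bit first)."""
    assert len(bits) == I + F
    # Weights 2^(I-1), ..., 2^(-F), listed from the sign bit downwards.
    weights = [Fraction(2) ** p for p in range(I - 1, -F - 1, -1)]
    weights[0] = -weights[0]  # the sign bit carries a negative weight
    return sum(b * w for b, w in zip(bits, weights))

largest = fixed_point_value([0, 1, 1], I=1, F=2)
smallest = fixed_point_value([1, 0, 0], I=1, F=2)
print(largest, smallest)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the values are not affected by floating-point rounding.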
When an F bit signal value is multiplied with a coefficient value, in a convolution of a
filtering process in DWT for example, a resulting signal value has longer word length than
its original value. Therefore it is rounded to F bit again. So far there are various types of
rounding operations [16]. In this article, we deal with the rounding operation defined by
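$$\Delta_0[x] = x - R_0[x] \quad \text{or} \quad \Delta_0[x] = (x + 2^{-1}) \bmod 1 - 2^{-1}. \tag{4}$$
Expanding these expressions to an $F$ bit case, we can define the rounding operation and the
rounding error as
$$\begin{aligned} R_F[x] &= R_0[x\,2^{F}]\,2^{-F} \\ \Delta_F[x] &= \Delta_0[x\,2^{F}]\,2^{-F}. \end{aligned} \tag{5}$$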
Fig. 1. Definition of the rounding operation and the rounding error. (a) An integer
implementation case. (b) An F bit word length implementation case.
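A minimal sketch of the rounding operation and the rounding error follows. It assumes $R_0[x] = \lfloor x + 2^{-1} \rfloor$, which is the form implied by the modulo expression in Eq.(4); the function names `round_to_bits` and `rounding_error` are illustrative and do not appear in this article.

```python
from fractions import Fraction
from math import floor

def round_to_bits(x, F=0):
    """R_F[x] = R_0[x * 2^F] * 2^-F, assuming R_0[x] = floor(x + 1/2)."""
    scale = Fraction(2) ** F
    return Fraction(floor(Fraction(x) * scale + Fraction(1, 2))) / scale

def rounding_error(x, F=0):
    """Delta_F[x] = x - R_F[x]."""
    return Fraction(x) - round_to_bits(x, F)

x = Fraction(5, 8)
# Modulo form of Eq.(4) for the integer case.
modulo_form = (x + Fraction(1, 2)) % 1 - Fraction(1, 2)
print(round_to_bits(x, 0), rounding_error(x, 0), modulo_form)
print(round_to_bits(x, 2), rounding_error(x, 2))
```

With this convention the error always lies in the half-open interval $[-2^{-1-F}, 2^{-1-F})$, so a value exactly halfway between two grid points is rounded upwards.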
It suggests that an integer y can be ignored when only the rounding error is considered in an
analysis. There is another obvious property;
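$$R_0[x] = 0 \iff x \in [-2^{-1},\ 2^{-1}). \tag{8}$$
$$\Delta_0[x] \in [-2^{-1},\ 2^{-1}), \tag{9}$$
$$\begin{aligned} R_0[\Delta_0[x]] &= 0, \\ \Delta_0[\Delta_0[x]] &= \Delta_0[x]. \end{aligned} \tag{10}$$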
The equations above for F=0 can be straightforwardly extended to an F≠0 case as follows.
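$$y \in 2^{-F}\mathbb{Z} \ \Rightarrow\ \begin{cases} R_F[x+y] = R_F[x] + y \\ \Delta_F[x+y] = \Delta_F[x] \end{cases} \quad \text{for } x \in \mathbb{R} \tag{11}$$
$$R_F[x] = 0 \iff x \in [-2^{-1-F},\ 2^{-1-F}) \tag{12}$$
$$\Delta_F[x] \in [-2^{-1-F},\ 2^{-1-F}) \tag{13}$$
$$\begin{aligned} R_F[\Delta_F[x]] &= 0 \\ \Delta_F[\Delta_F[x]] &= \Delta_F[x] \end{aligned} \tag{14}$$
$$R_F[x] = n\,2^{-F} \iff x \in 2^{-F}\,[-2^{-1}+n,\ 2^{-1}+n) \quad \text{for } n \in \mathbb{Z}. \tag{15}$$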
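Addition formula
$$R_F[x+y] = R_F[x] + R_F[y + \Delta_F[x]] \quad \text{for } x, y \in \mathbb{R} \tag{16}$$
Proof:
$$\begin{aligned} R_F[x+y] &= R_F[R_F[x] + \Delta_F[x] + y] && \text{by (4)} \\ &= R_F[x] + R_F[\Delta_F[x] + y] && \text{by (11)} \end{aligned}$$
Q.E.D.
Multiplication formula
$$\begin{aligned} R_F[xy] &= R_F[x\Delta_F[y] + xR_F[y]] && \text{by (4)} \\ &= R_F[x\Delta_F[y] + \Delta_F[xR_F[y]] + R_F[xR_F[y]]] && \text{by (4)} \\ &= R_F[x\Delta_F[y] + \Delta_F[xR_F[y]]] + R_F[xR_F[y]] && \text{by (11)} \end{aligned}$$
Q.E.D.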
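The addition formula and the multiplication formula can be spot-checked numerically. The sketch below is only a sanity check under the assumption $R_0[x] = \lfloor x + 2^{-1} \rfloor$; the helper names `R` and `D` and the sample grid are illustrative choices.

```python
from fractions import Fraction
from itertools import product
from math import floor

def R(x, F):
    """R_F[x], assuming R_0[x] = floor(x + 1/2)."""
    scale = Fraction(2) ** F
    return Fraction(floor(x * scale + Fraction(1, 2))) / scale

def D(x, F):
    """Delta_F[x] = x - R_F[x]."""
    return x - R(x, F)

F = 3
samples = [Fraction(n, 37) for n in range(-60, 61, 7)]  # arbitrary rational test values

# R_F[x + y] = R_F[x] + R_F[y + Delta_F[x]]
addition_ok = all(
    R(x + y, F) == R(x, F) + R(y + D(x, F), F)
    for x, y in product(samples, repeat=2)
)

# R_F[xy] = R_F[x Delta_F[y] + Delta_F[x R_F[y]]] + R_F[x R_F[y]]
multiplication_ok = all(
    R(x * y, F) == R(x * D(y, F) + D(x * R(y, F), F), F) + R(x * R(y, F), F)
    for x, y in product(samples, repeat=2)
)
print(addition_ok, multiplication_ok)
```

Both identities rely only on the fact that a value already on the $2^{-F}$ grid can be moved outside the rounding operation.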
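Formulas for a rounding error (remainder) can be also derived as
$$\begin{aligned} \Delta_F[x+y] &= \Delta_F[\Delta_F[x] + \Delta_F[y]] \\ \Delta_F[xy] &= \Delta_F[\Delta_F[x]R_F[y] + R_F[x]\Delta_F[y] + \Delta_F[x]\Delta_F[y]] \end{aligned} \tag{18}$$
$$y \in 2^{-F}\mathbb{Z} \ \Rightarrow\ \begin{cases} R_F[x+y] = R_F[x] + y \\ R_F[x+y] + \Delta_F[x] = x + y \\ \Delta_F[x+y] = \Delta_F[x] \end{cases} \tag{19}$$
Multiplication formula
$$y \in 2^{-F}\mathbb{Z} \ \Rightarrow\ \begin{cases} R_F[xy] = R_F[\Delta_F[x]y] + R_F[x]y \\ R_F[xy] + \Delta_F[\Delta_F[x]y] = xy \\ \Delta_F[xy] = \Delta_F[\Delta_F[x]y] \end{cases} \tag{20}$$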
Especially when two kinds of word lengths are mixed in a signal processing, the following
variation of the multiplication formula is conveniently applied to analyzing behavior of
signals and errors in a pair of encoder and decoder [17].
Proof:
Q.E.D.
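[Figure: two multipliers, each followed by rounding to F bits, one with the coefficient $h$ and one with the rounded coefficient $h' = R_W[h]$.]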
$$E_m = 0 \quad \text{for} \quad \begin{cases} E_m = R_F[R_W[h]\,x] - R_F[hx] \\ R_W[h] = h - \Delta_W[h]. \end{cases} \tag{22}$$
$$-\Delta_W[h]\,x + \Delta_F[hx] \in [-2^{-1-F},\ 2^{-1-F}). \tag{23}$$
Proof:
$$\begin{aligned} R_F[R_W[h]\,x] - R_F[hx] &= R_F[hx - \Delta_W[h]\,x] - R_F[hx] \\ &= R_F[\Delta_F[hx] - \Delta_W[h]\,x + R_F[hx]] - R_F[hx] \\ &= R_F[\Delta_F[hx] - \Delta_W[h]\,x] \\ &= 0 \\ \iff\ \Delta_F[hx] - \Delta_W[h]\,x &\in [-2^{-1-F},\ 2^{-1-F}) \end{aligned}$$
Q.E.D.
The Eq.(23) also means
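$$\Delta_W[h] \in \begin{cases} \left( \dfrac{\Delta_F[hx] - 2^{-1-F}}{x},\ \dfrac{\Delta_F[hx] + 2^{-1-F}}{x} \right], & x > 0 \\[2ex] \left[ \dfrac{\Delta_F[hx] + 2^{-1-F}}{x},\ \dfrac{\Delta_F[hx] - 2^{-1-F}}{x} \right), & x < 0 \end{cases} \tag{24}$$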
which gives tolerance to the rounding error of a coefficient [8]. This is the mapping invariant
condition on word length W of a coefficient under a given word length F of signals. It
represents exact (not approximated) behavior of rounding errors.
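The mapping invariant condition can be explored numerically. The sketch below compares the error $E_m$ of Eq.(22) with the interval test of Eq.(23) for every combination of a small set of inputs and coefficient word lengths. The coefficient value `h`, the ranges of `x` and `W`, and all function names are hypothetical choices for illustration, and $R_0[x] = \lfloor x + 2^{-1} \rfloor$ is assumed.

```python
from fractions import Fraction
from math import floor

def R(x, F):
    """R_F[x], assuming R_0[x] = floor(x + 1/2)."""
    scale = Fraction(2) ** F
    return Fraction(floor(x * scale + Fraction(1, 2))) / scale

def D(x, F):
    """Delta_F[x] = x - R_F[x]."""
    return x - R(x, F)

def mapping_error(h, x, W, F):
    """E_m: change of the rounded product when h is rounded to W fraction bits."""
    return R(R(h, W) * x, F) - R(h * x, F)

def within_tolerance(h, x, W, F):
    """Interval test: -Delta_W[h] x + Delta_F[hx] in [-2^(-1-F), 2^(-1-F))."""
    t = -D(h, W) * x + D(h * x, F)
    half_lsb = Fraction(1, 2 ** (F + 1))
    return -half_lsb <= t < half_lsb

h = Fraction(443506852, 10**9)  # hypothetical coefficient value
F = 2
agree = all(
    (mapping_error(h, Fraction(x), W, F) == 0) == within_tolerance(h, Fraction(x), W, F)
    for x in range(-128, 128)
    for W in range(0, 13)
)
# Smallest coefficient word length W that leaves the output unchanged for a given x.
min_W = {
    x: min(W for W in range(0, 40) if mapping_error(h, Fraction(x), W, F) == 0)
    for x in (16, 100, 235)
}
print(agree, min_W)
```

The minimum in `min_W` depends on the input value, which is the point of the exact condition: it exploits the actual rounding errors instead of their upper bounds.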
Unlike the condition above, a sufficient condition can be derived by substituting the upper
bound of errors and signals;
$$|\Delta_W[h]| \le 2^{-1-W}, \quad |\Delta_F[hx]| \le 2^{-1-F}, \quad |x| \le 2^{I-1}, \tag{25}$$
$$W \ge F + I - 1. \tag{26}$$
This condition is too strict and requires too long word length to guarantee the mapping
invariance. In both cases of Eq.(24) and Eq.(26), the mapping invariant condition determines
the minimum of word length W of a coefficient under a given word length F of signals.
Fig. 3. Scaling pair has two coefficients $h_1$ and $h_2$ (=1/$h_1$). (a) Assumption: $x$, $y$, $w$ are real numbers and
$F_1 \to \infty$ [bit]; output $w$ is exactly the same as its original $x$. (b) Implementation: $x$, $y$, $w$ are rational
numbers and $F_1 \to$ min. [bit]; this lossless property is guaranteed under a condition on $F_1$ and $F_2$.
We apply the formulas and the properties to derive the condition on F1 and F2. The lossless
case in Fig.3(b) is described as
From the basic properties, the lossless condition on a scaling pair is derived as
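$$-h_2\,\Delta_{F_1}[h_1 x] \in [-2^{-1-F_2},\ 2^{-1-F_2}). \tag{28}$$
Proof:
$$\begin{aligned} & R_0\big[(-h_2\,\Delta_{F_1}[h_1x] + h_2h_1x)\,2^{F_2}\big]\,2^{-F_2} = x \\ \iff\ & R_0\big[-h_2\,\Delta_{F_1}[h_1x]\,2^{F_2}\big]\,2^{-F_2} + x = x \\ \iff\ & -h_2\,\Delta_{F_1}[h_1 x] \in [-2^{-1-F_2},\ 2^{-1-F_2}) \end{aligned}$$
Q.E.D.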
This condition determines the word length F1 and F2 of signals for an input value x. It
represents exact condition such that total accumulated rounding error is nullified by the
rounding just after the final multiplier with h2. As a result, the original value x is recovered
as the final output without any loss.
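The lossless condition on a scaling pair can be sketched in the same way. The code below rounds $h_1 x$ to $F_1$ bits, multiplies by $h_2 = 1/h_1$, rounds to $F_2$ bits, and compares the outcome with the interval test on $-h_2\,\Delta_{F_1}[h_1 x]$. The value of `h1`, the input range and the function names are hypothetical, $R_0[x] = \lfloor x + 2^{-1} \rfloor$ is assumed, and the input $x$ is taken on the $F_2$ bit grid.

```python
from fractions import Fraction
from math import floor

def R(x, F):
    """R_F[x], assuming R_0[x] = floor(x + 1/2)."""
    scale = Fraction(2) ** F
    return Fraction(floor(x * scale + Fraction(1, 2))) / scale

def D(x, F):
    """Delta_F[x] = x - R_F[x]."""
    return x - R(x, F)

def scaling_pair(x, h1, F1, F2):
    """w = R_F2[h2 * R_F1[h1 * x]] with h2 = 1/h1."""
    y = R(h1 * x, F1)
    return R(y / h1, F2)

def lossless_condition(x, h1, F1, F2):
    """Interval test: -h2 * Delta_F1[h1 x] in [-2^(-1-F2), 2^(-1-F2))."""
    t = -D(h1 * x, F1) / h1
    half_lsb = Fraction(1, 2 ** (F2 + 1))
    return -half_lsb <= t < half_lsb

h1 = Fraction(10**9, 1230174105)  # hypothetical scaling coefficient, so h2 is about 1.23
F2 = 0                            # integer input and output
agree = all(
    (scaling_pair(Fraction(x), h1, F1, F2) == x) == lossless_condition(Fraction(x), h1, F1, F2)
    for x in range(-128, 128)
    for F1 in range(0, 9)
)
lossless_at_F1_equal_1 = all(
    scaling_pair(Fraction(x), h1, 1, F2) == x for x in range(-128, 128)
)
print(agree, lossless_at_F1_equal_1)
```

For this particular `h1`, one fraction bit in the intermediate signal already recovers every integer input, because $|h_2|\,2^{-1-F_1} < 2^{-1}$ holds for $F_1 = 1$.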
Unlike the exact condition above, a sufficient condition can be derived by analyzing the
upper bound as follows.
$$|h_2\,\Delta_{F_1}[h_1x]| \le |h_2|\,2^{-1-F_1} \le 2^{-1-F_2} \tag{29}$$
When
$$F_1 \ge F_2 + \log_2 |h_2| \tag{30}$$
holds, the scaling pair becomes lossless. However, this condition is too strict and requires too
long a word length of signals.
Fig. 4. The irreversible 9-7 DWT of the JPEG 2000 standard. (a) Forward transform. (b) Backward transform.
The multiplier coefficients $c_i$, $i \in I$, $I = \{1, 2, \dots, 6\}$, are designed with a word length long
enough to be treated as real numbers. When the DWT is implemented, coefficient values are
rounded to a length as short as possible to minimize total hardware complexity. Similarly,
signal values are also rounded. In the figure, the fraction part of each signal is shortened to $F_S$,
$F_B$ or $F_X$ [bit] by a rounding operation illustrated as a circle.
Denoting the integer part as $I_S$ [bit], the total word length $W_S$ [bit] of a signal $s$ is defined as
$$W_S = I_S + F_S + 1 \tag{31}$$
including 1 [bit] for the sign part. Similarly, the total word length $W_C$ [bit] of a coefficient $c$ is
defined as
$$W_C = I_C + F_C + 1. \tag{32}$$
In Fig.4, the fraction part of the input signal $x(n)$ is given as $F_X$ [bit]. Inside the DWT, the fraction
part of the signals is rounded to $F_S$ [bit] just after each of the multiplications with $c_i$, $i \in I$.
Output signals from the forward and backward transforms are rounded to $F_B$ [bit] and $F_X$
[bit] respectively. Note that we do not truncate the integer part of signals or that of
coefficients. We determine $F_S$ and $F_C$ such that the DC lossless property is satisfied.
for a given constant value d with FX [bit] fraction part. When the proposition in Eq.(33)
holds, the DWT has no DC leakage for the DC input signal with value d. Similarly, when the
proposition in Eq.(34) is true, the reconstructed signal w(n) contains no checker board
artifact for the DC input signal. In the following chapters, we investigate the minimum
fraction part of signals FS [bit] which guarantees the DC lossless for given FX and FB [bit].
We also investigate the minimum fraction part $F_{Ci}$ [bit] of a coefficient $c_i$, $i \in I$, with the flexibility
of trading off the signal error and the coefficient error.
Fig.5(a) illustrates an example of a video system. It contains an encoder and a decoder which
are composed of a forward DWT and a backward DWT. In white balancing, a camera and a
display are calibrated with a constant valued input signal (DC signal) [11,19]. Therefore, it is
useful for this calibration if the forward DWT and its backward do not generate any error. In
this case, the camera and the display can be calibrated ignoring existence of the encoder and
the decoder as illustrated in Fig.5(b). Namely, the DC lossless condition provides a low
complexity DWT useful for the white balancing.
Fig. 5. The DC lossless property is useful for white balancing in a video system. (a) Video system. (b) Calibration.
In addition, the DC lossless condition is a necessary condition for the regularity which
controls smoothness of basis functions and coding performance of a transform. A DWT
under the regularity does not generate the checker board artifact or the DC leakage. Harada
et al. analyzed a condition for the regularity of a two channel quadrature mirror filter bank
(QMF) [9]. They confirmed that a QMF under the condition has a reduced checker board
artifact for an input step signal. It was extended to a multirate system under short word
length expression [20]. The regularity was structurally guaranteed for a biorthogonal linear
phase filter bank [21,22] and the DCT [10]. However, since these previous
methods are based on factorization of a transfer function including $(1+z^{-1})$ or $(1-z^{-1})$ in the
lattice structure, they are not directly applicable to the lifting structure of the 9-7 DWT in
Fig.4.
In this article, we derive the DC lossless condition theoretically in chapter 5, and determine
the minimum word length of signals and that of coefficients under the condition in
following chapters.
$$\Delta c = c - c'. \tag{35}$$
Just after the multiplication, the signal is rounded to s' with FS [bit] fraction part as
where e' is the signal error. From Eq.(35) and (36), the final output becomes
$$e'' = 2^{-F_S}\,p \tag{39}$$
where p is an integer. Given the tolerable maximum to an integer p, word length of the
coefficient c can be controlled independently of other coefficients in other sections inside the
DWT. Furthermore, denoting the signal error as e' similarly to Eq.(36), the output value is
described as
$$\begin{aligned} s' &= cs + e \\ e &= e' + e'' \end{aligned} \tag{40}$$
where
$$\begin{aligned} e' &= -\Delta_{F_S}[cs] \\ e'' &= R_{F_S}\big[\Delta_{F_S}[cs] - \Delta_{F_C}[c]\,s\big] \end{aligned} \tag{41}$$
as illustrated in Fig.6(d). In this new model, both the coefficient error $e''$ and the signal
error $e'$ are unified into the error $e$. Utilizing Eqs.(13) and (15), its absolute value is limited
to
$$|e| \le (|p| + 2^{-1})\,2^{-F_S}. \tag{42}$$
Note that the parameter $p$ to control the word length of a coefficient $c$ is included in this
equation. It is equivalent to
$$2^{-F_C + F_S + I_S - 1} \le |p| \tag{43}$$
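The unified error model of a single multiplier can be sketched as follows. The code assumes the sign convention $s' = cs + e$ with $e = e' + e''$, $e' = -\Delta_{F_S}[cs]$ and $e'' = R_{F_S}[\Delta_{F_S}[cs] - \Delta_{F_C}[c]\,s]$, together with $R_0[x] = \lfloor x + 2^{-1} \rfloor$; the coefficient value, the input value and the function names are hypothetical.

```python
from fractions import Fraction
from math import floor

def R(x, F):
    """R_F[x], assuming R_0[x] = floor(x + 1/2)."""
    scale = Fraction(2) ** F
    return Fraction(floor(x * scale + Fraction(1, 2))) / scale

def D(x, F):
    """Delta_F[x] = x - R_F[x]."""
    return x - R(x, F)

def multiplier_model(c, s, FC, FS):
    """Return (s', e', e'', p) for a coefficient rounded to FC bits and a signal rounded to FS bits."""
    s_out = R(R(c, FC) * s, FS)                   # what the hardware computes
    e_signal = -D(c * s, FS)                      # e': error of rounding the exact product
    e_coeff = R(D(c * s, FS) - D(c, FC) * s, FS)  # e'': extra error caused by the rounded coefficient
    p = e_coeff * 2 ** FS                         # e'' expressed in units of 2^-FS
    return s_out, e_signal, e_coeff, p

c = Fraction(-1586134342, 10**9)  # hypothetical coefficient value
s = Fraction(16)                  # hypothetical signal value
FS = 2
results = {FC: multiplier_model(c, s, FC, FS) for FC in range(9, 2, -1)}
for FC, (s_out, e_signal, e_coeff, p) in results.items():
    print(FC, s_out, e_signal, e_coeff, p)
```

In this sketch $p$ is always an integer, and $p = 0$ means that shortening the coefficient to $F_C$ bits does not change the multiplier output for this input.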
Fig. 6. A multiplier in the DWT and its models for error analysis. Annotations in the figure:
$|e'| \le 2^{-1-F_S}$, $e'' = p\,2^{-F_S}$, $|e| \le (|p| + 2^{-1})\,2^{-F_S}$.
Inside the forward DWT, the error $e$ is propagated and added up with other errors from
other multipliers. When its maximum absolute value is less than $2^{-1-F_B}$, the total error is
nullified by the rounding at the final output of the forward DWT. In this article, we utilize
this nullification of errors at the output of the DWT to derive a condition on word length such
that the DC lossless property defined by Eq.(33) and (34) is satisfied.
where
where
Similarly to Eq.(42), these errors are described with the parameters pi and qi to control word
length of coefficients as
[Fig. 7: DC equivalent circuit of the DWT, with the unified errors $e_1, \dots, e_6$ added after the multipliers of the forward transform.]
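$$Y_{12} = R_{F_B}\Big[\, I_U e_6 + I_L e_5 + K\big( I_U e_4 + H_4\big( I_L e_3 + H_3\big( I_U e_2 + H_2\big( I_L e_1 + H_1 I_{UL}\,x \big)\big)\big)\big)\Big] \tag{48}$$
where
$$H_{i \in \{1,3\}} = \begin{bmatrix} 1 & 0 \\ 2c_i & 1 \end{bmatrix}, \quad H_{j \in \{2,4\}} = \begin{bmatrix} 1 & 2c_j \\ 0 & 1 \end{bmatrix}, \quad K = \begin{bmatrix} c_6 & 0 \\ 0 & c_5 \end{bmatrix}.$$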
It is described with the unified error matrices E1 and E2 as
where
$$H_{e1} = [\,I_U \quad K I_U \quad K H_{43} I_U\,], \quad H_{e2} = [\,I_L \quad K H_4 I_L \quad K H_{432} I_L\,], \quad H_{43} = H_4 H_3$$
and
$$E_1 = [e_6\ e_4\ e_2]^T, \quad E_2 = [e_5\ e_3\ e_1]^T.$$
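Similarly, output values $W_{12} = [w_1\ w_2]^T$ from the backward transform in Fig.7(b) are
where
$$H_{e3} = [\,H_1^{-1} I_U \quad H_{123}^{-1} I_U \quad H_{1234}^{-1} I_U\,], \quad H_{e4} = [\,I_L \quad H_{12}^{-1} I_L \quad H_{1234}^{-1} I_L\,], \quad H_{12}^{-1} = H_1^{-1} H_2^{-1}$$
and
$$E_3 = [f_2\ f_4\ f_5]^T, \quad E_4 = [f_1\ f_3\ f_6]^T.$$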
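$$\begin{aligned} \hat{Y}_{12} &= K H_{4321}\, I_{UL}\, x = I_U\, x \\ \hat{W}_{12} &= (K H_{4321})^{-1}\, \hat{Y}_{12} = I_{UL}\, x. \end{aligned} \tag{51}$$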
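The error-free DC response can be illustrated with a few lines of arithmetic. The sketch below pushes a constant input through the four lifting matrices and the scaling matrix $K$ and checks that the low band returns the input while the high band vanishes. The coefficient values are the commonly quoted 9-7 lifting constants and are an assumption here (they are not listed in this section), as is the assignment of $c_5$ to the high pass branch and $c_6$ to the low pass branch; the function names are illustrative.

```python
# Assumed 9-7 lifting coefficients (commonly quoted values, not taken from this section).
SCALE = 1.230174105
c = {1: -1.586134342, 2: -0.052980118, 3: 0.882911075, 4: 0.443506852,
     5: SCALE, 6: 1.0 / SCALE}

def lift_lower(ci, v):
    """H_i for i in {1, 3}: lower branch += 2 c_i * upper branch."""
    upper, lower = v
    return (upper, lower + 2.0 * ci * upper)

def lift_upper(cj, v):
    """H_j for j in {2, 4}: upper branch += 2 c_j * lower branch."""
    upper, lower = v
    return (upper + 2.0 * cj * lower, lower)

def dc_response(x):
    """K H4 H3 H2 H1 applied to the DC input [x, x]^T, without any rounding."""
    v = (x, x)
    v = lift_lower(c[1], v)
    v = lift_upper(c[2], v)
    v = lift_lower(c[3], v)
    v = lift_upper(c[4], v)
    return (c[6] * v[0], c[5] * v[1])

y1, y2 = dc_response(1.0)
print(y1, y2)
```

With these constants the result agrees with $[x\ 0]^T$ only up to the precision of the nine-digit coefficients, which is why the check in practice uses a small tolerance rather than exact equality.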
Using this equation, the accumulated errors are defined as
$$\begin{aligned} E_{y12} &= Y_{12} - \hat{Y}_{12} \\ E_{w12} &= W_{12} - \hat{W}_{12}. \end{aligned} \tag{52}$$
Substituting Eqs.(49), (50), (51) and using the property in Eq.(6), we have
$$\begin{aligned} E_{y12} &= R_{F_B}[H_{e1}E_1 + H_{e2}E_2] \\ E_{w12} &= R_{F_X}[H_{e3}E_3 + H_{e4}E_4]. \end{aligned} \tag{53}$$
Applying Eq.(12), it becomes clear that when the conditions
$$\begin{aligned} |H_{e1}E_1 + H_{e2}E_2| &< I_{UL}\,2^{-1-F_B} \\ |H_{e3}E_3 + H_{e4}E_4| &< I_{UL}\,2^{-1-F_X} \end{aligned} \tag{54}$$
are satisfied, the accumulated errors are nullified by the rounding operations at the final
output of each of the forward transform and the backward transform.
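$$\begin{aligned} |E_1| &\le \big(|[p_6\ p_4\ p_2]^T| + I_3\,2^{-1}\big)\,2^{-F_S} \\ |E_2| &\le \big(|[p_5\ p_3\ p_1]^T| + I_3\,2^{-1}\big)\,2^{-F_S} \\ |E_3| &\le \big(|[q_2\ q_4\ q_5]^T| + I_3\,2^{-1}\big)\,2^{-F_S} \\ |E_4| &\le \big(|[q_1\ q_3\ q_6]^T| + I_3\,2^{-1}\big)\,2^{-F_S} \end{aligned} \tag{55}$$
for
$$I_3 = [1\ 1\ 1]^T$$
into Eq.(54). This is the condition we derived based on the new model described in section
5.1. We investigate the fraction part $F_{Ci}$ [bit] of a coefficient $c_i$, $i \in I$, as the minimum word
length under the condition for a DC value $x$ at the word length $F_S$ [bit] of signals.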
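$$\begin{aligned} \big(\|H_{e1}\|_{L1} + \|H_{e2}\|_{L1}\big)\,2^{-F_S}\,(|p| + 2^{-1}) &< I_{UL}\,2^{-1-F_B} \\ \big(\|H_{e3}\|_{L1} + \|H_{e4}\|_{L1}\big)\,2^{-F_S}\,(|p| + 2^{-1}) &< I_{UL}\,2^{-1-F_X} \end{aligned} \tag{56}$$
where $\|H\|_{L1}$ denotes a column vector whose component is the sum of the absolute values of all
components in each row. Substituting coefficients of the 9-7 DWT [1] into Eq.(56), we
have
$$|p| + 2^{-1} \le 2^{F_S - G_E - 1}, \qquad G_E = 2.66\ \text{[bit]} \tag{57}$$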
and GE is the lower bound. This means a sufficient condition for the DC lossless. Since it is
too strict, the word length under this condition is redundant. Unlike this sufficient
condition, our critical condition given as Eq.(54) under Eq.(55) determines the word length
minimum and necessary for the DC lossless.
7. Simulation results
This chapter verifies theoretically derived conditions, and clarifies the minimum word
length of the DC lossless DWT.
FC GE I S
(59)
FS GE
and it is also confirmed by the figure. It guarantees the DC lossless, however the condition is
too strict. Therefore the word length is redundant and there is room for further reduction.
Table 2. Word length calculated with equations in table 1 for Wx=8 [bit].
$$x_{in} = x + 2^{I_X - 1} \tag{60}$$
where x is an input value to the DC equivalent circuit in Fig.7. The minimum of FC for each
FS is indicated as a broken line. It is clear that the word length derived by the critical
condition is shorter than that determined by the sufficient condition. For example in
Fig.8(a), the fraction part FS (= FC) is reduced from 11 [bit] to 9 [bit] for Ex.2. The word
length is not shortened for Ex.1 and Ex.3. In case of Fig.8(b), FS (= FC) is reduced from 13 [bit]
to 12 [bit] for Ex.2. (FS, FC) is reduced from (14, 4) to (13, 3) or (12, 4) for Ex.1. WS (= WC) is
reduced from 16 [bit] to 15 [bit] for Ex.3. It is confirmed that the word length is shortened
due to the analysis in this article.
Fig. 8. Word length under the two conditions, plotted as $F_C$ [bit] against $F_S$ [bit] with the curves "Sufficient" and
"Minimum" and the points Ex.1, Ex.2 and Ex.3. (a) $x_{in} \in [0, 2^{8})$, $W_X = 8$ [bit], $x \in [-2^{7}, 2^{7})$.
(b) $x_{in} \in [0, 2^{10})$, $W_X = 10$ [bit], $x \in [-2^{9}, 2^{9})$. The markers indicate $(F_S, F_C)$ such that the DWT
becomes DC lossless.
Fig. 9. Word length under the two conditions for a specific value used in white balancing, plotted as $F_C$ [bit]
against $F_S$ [bit]. (a) $(F_S, F_C) = (2, 9)$ for $x_{in} = 16$. (b) White value: $(F_S, F_C) = (3, 9)$ for $x_{in} = 235$.
Fig. 10. The minimum word length of coefficients (min. $F_C$ [bit]) for each of the input DC values $x \in [1, 128]$ at
$(F_S, W_X) = (3, 8)$. According to the sufficient condition, the word length is too long.
W_X       input value               F_S   F_C
8 bit     sufficient                  4    12
          any x_in ∈ [0, 2^8)         2    12
                                      8     2
          x_in = 16 (black)           2     9
          x_in = 235 (white)          3     9
10 bit    sufficient                  4    14
Table 3. The minimum word length for a specific value for white balancing of a video
system.
forward transform have the same length. It is worth paying attention to the fact that the
parameter p1 is the same for FC=9, 8 and 7 [bit] for example. It means that word length of the
coefficient c1 can be reduced from 9 to 7 [bit] without any influence to the errors. Therefore,
word length $[F_{C1}\ F_{C2}\ \dots\ F_{C6}]$ of the coefficients $[c_1\ c_2\ \dots\ c_6]$ can be reduced from [9 9 9 9 9 9] to [7 9
7 4 6 4] according to the table.
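The assignment quoted above can be reproduced from the tolerance parameters of Table 4. The sketch below uses one reading of the table, stated here as an assumption: for each coefficient, take the smallest $F_C$ whose parameter $p_i$ equals the value it has at $F_C = 9$. The variable names are illustrative.

```python
# Tolerance parameters (p1, ..., p6) from Table 4, keyed by the fraction word length FC.
table4 = {
    9: (0, 0, 0, 0, 0, 1),
    8: (0, -3, 0, 0, 0, 0),
    7: (0, -3, 0, 0, 0, 0),
    6: (7, 12, -8, 0, 0, 2),
    5: (7, -18, 10, 0, 1, 0),
    4: (-21, -18, 9, 0, -2, 1),
    3: (35, 107, 7, 25, 9, -28),
}

reference = table4[9]
# Assumed rule: shortest FC whose p_i matches the reference value at FC = 9.
assignment = [
    min(fc for fc, row in table4.items() if row[i] == reference[i])
    for i in range(6)
]
average = sum(assignment) / len(assignment)
print(assignment, round(average, 2))
```

Under this reading the six word lengths are [7, 9, 7, 4, 6, 4], and their mean matches the average word length reported for the forward transform.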
Table 5 summarizes results of this optimum word length assignment for the forward
transform. Comparing to table 3, it is observed that word length of coefficients is reduced
from 9.00 [bit] to 6.17 [bit] on average for an input value xin=16. Table 6 summarizes results
for the backward transform. In this case, the word length is furthermore shortened. It is
observed that c6 and c4 can be omitted since y2 is equal to zero under the DC lossless. Fig.11
illustrates image signals reconstructed by the DWT which does not satisfy the DC lossless
condition. It demonstrates the checker board artifact for reference. It is confirmed that total
word length is furthermore shortened utilizing the tolerance parameters pi and qi introduced
in this article.
FC p1 p2 p3 p4 p5 p6 y1-x y2
9 0 0 0 0 0 1 0 0
8 0 -3 0 0 0 0 -1 -1
7 0 -3 0 0 0 0 -1 -1
6 7 12 -8 0 0 2 6 6
5 7 -18 10 0 1 0 -7 -5
4 -21 -18 9 0 -2 1 -10 -12
3 35 107 7 25 9 -28 63 70
Table 4. Tolerance parameters in Eq.(46) and (47) for xin=16 and FS=2 as an example.
forward transform
Table 5. The minimum word length of coefficients in the forward transform for a specific
values xin for a given word length of signals FS.
backward transform (columns: input values; signals $F_S$; coefficients $F_{C1}$, $F_{C2}$, $F_{C3}$, $F_{C4}$, $F_{C5}$, $F_{C6}$; ave.)
Table 6. The minimum word length of coefficients in the backward transform for a specific
values xin for a given word length of signals FS.
8. Conclusions
Introducing a new model which unifies the coefficient error and the signal error, and
utilizing the nullification of the accumulated errors, this article theoretically derived a
condition on word length of signals and coefficients such that the 9-7 DWT of JPEG 2000
becomes lossless for a DC input signal. It was confirmed that the minimum word length
derived by the newly introduced 'critical' condition was shorter than that determined by a
conventionally well known 'sufficient' condition. It was also confirmed that the DWT under
the condition does not have the checker board for a DC signal. Analysis in this article
contributes to build a low complexity DC lossless DWT.
9. Appendix
Proof of Eq.(43)
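Eq.(39) with Eq.(41) means
$$\big|\Delta_{F_S}[cs] - \Delta_{F_C}[c]\,s\big| \le (|p| + 2^{-1})\,2^{-F_S} \tag{A.1}$$
according to Eqs.(13) and (15). Applying the triangle inequality to the left hand side, we have
$$\big|\Delta_{F_S}[cs] - \Delta_{F_C}[c]\,s\big| \le \big|\Delta_{F_S}[cs]\big| + \big|\Delta_{F_C}[c]\,s\big|. \tag{A.2}$$
According to Eq.(13), each term on the right hand side is bounded as
$$\big|\Delta_{F_S}[cs]\big| \le 2^{-1-F_S}, \qquad \big|\Delta_{F_C}[c]\,s\big| \le 2^{-1-F_C}\,2^{I_S}. \tag{A.3}$$
Therefore,
$$2^{-1-F_S} + 2^{-1-F_C}\,2^{I_S} \le (|p| + 2^{-1})\,2^{-F_S}$$
$$2^{-F_C + F_S + I_S - 1} \le |p|.$$
Q.E.D.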
10. References
[1] ISO/IEC FCD15444-1, "JPEG2000 image coding system," March 2000.
[2] Descampe, et.al., "A flexible hardware JPEG 2000 decoder for digital cinema," IEEE Trans.
circuits, systems on video technology, vol.16, issue 11, pp.1397-1410, Nov. 2006
[3] Bing-Fei Wu, Chung-Fu Lin, "Memory-efficient architecture for JPEG 2000 coprocessor
with large tile image," IEEE Trans. circuits and systems II, vol.53, issue 4, pp.304-
308, April 2006.
[4] H. Kiya, M. Yae, M. Iwahashi, "Linear phase two channel filter bank allowing perfect
reconstruction", IEEE Proc. international symposium on circuits and systems
(ISCAS), no.2, pp.951-954, May 1992.
[5] W. Sweldens, "The lifting scheme: A custom-design construction of biorthogonal
wavelets," Technical Report 1994:7, industrial mathematics initiative, department of
mathematics, university of South Carolina, 1994.
[6] M. L. Bruekers, A. W. M. van den Enden, "New Networks for Perfect Inversion and
Perfect Reconstruction," IEEE Journal of selected areas in communications, vol.10,
no.1, pp.130-137, Jan.1992.
[7] M. Reza, Lian Zhu, "Analysis of error in the fixed-point implementation of two-
dimensional discrete wavelet transforms," IEEE Trans. circuits and systems,
fundamental theory and applications, vol.52, issue 3, pp.641-655, March 2005.
[8] Kiya , M. Iwahashi, O. Watanabe, "A new class of lifting wavelet transform for
guaranteeing losslessness of specific signals," IEEE international conference,
acoustics, speech, and signal processing (ICASSP), pp.3273-3276, March 2008.
[9] Y. Harada, S. Muramatsu, H. Kiya, "Two channel QMF bank without checker board
effect and its lattice structure," IEICE Trans. on fundamentals, vol.J80-A, no.11,
pp.1857-1867, Nov. 1997.
[10] Wei Dai, T. D. Tran, "Regularity-constrained pre- and post- filtering for block DCT-based
systems," IEEE Trans. signal processing, vol.51, Issue 10, pp.2568-2581, Oct. 2003.
[11] Hirakawa, T. W. Parks, "Chromatic adaptation and white balancing problem," IEEE
Proc. international conference, image processing (ICIP), vol. III, pp.984-987, Nov.
2005.
[12] Grangetto, et.al., "Optimization and implementation of the integer wavelet transform
for image coding," IEEE Trans. Image Processing, vol.11, Issue 6, pp. 596-604, June
2002.
[13] Xiao, et.al., "Coefficient sensitivity and structure optimization of multidimensional
state-space digital filters," IEEE Trans. circuits, systems I, vol.45, issue 9, pp.993-
998, 1998.
[14] S. Yamaki, M. Abe, M. Kawamata, "A closed form solution to L2-sensitivity
minimization of second-order state-space digital filters subject to L2-scaling
constraints," IEICE Trans. fundamentals of electronics, communications and
computer sciences, vol.E91A, no.7, pp.1697-1705, July 2008.
[15] Y. Tonomura, S. Chokchaitam, M. Iwahashi, "Minimum hardware implementation of
multipliers of the lifting wavelet transform," IEEE Proc. international conference,
image processing (ICIP), pp.2499-2502, Oct. 2004.
[16] IEEE Standard 754-1985, IEEE standard for binary floating-point arithmetic.
[17] M. Iwahashi, H. Kiya, "Finite word length error analysis based on basic formula of
rounding operation", the international symposium on intelligent signal processing
and communication systems (ISPACS), no.86, pp.49-52, Dec. 2008.
[18] M. Iwahashi, H. Kiya, "Word length condition for DC Lossless DWT," Asia pacific
signal and information processing association (APSIPA) annual summit and
conference, no.TA-P2-6, pp.469–472, Oct. 2009.
[19] The society of motion picture and television engineers, "Standard for television,
1920x1080 image sample structure, digital representation and digital timing
reference sequences for multiple picture rates", SMPTE 274 M-2005, Feb.2005.
[20] H. Iwai, M. Iwahashi, K. Kiya, "Methods for avoiding the checkerboard distortion
caused by finite word length error in multirate system", IEICE Trans.
fundamentals, vol. E93-A, no.3, pp.631-635, March, 2010.
[21] S. Oraintara, T. D. Tran, T. Q. Nguyen, "A class of regular biorthogonal linear-phase
filterbanks: Theory, structure, and application in image coding," IEEE Trans. signal
processing, vol.51, no.12, pp.3220-3235, Dec.2003.
[22] Y. Tanaka, M. Ikehara, "First order linear phase filter banks with regularity constrains
for efficient image coding," IEICE Trans. fundamentals, vol. J91-A, no.2, pp.192-201,
Feb. 2008.
[23] M. Iwahashi, H. Kiya, "A lossless condition of lifting DWT for specific DC values",
IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP), pp.1458-1461, March 2010.
15
1. Introduction
Digital imaging devices inevitably produce images corrupted with noise. The noise originates
from the sensors and analogue circuitry in the camera. In order to have better and sharper
images and also for commercial reasons, there is a recent tendency to further increase the
image resolution. Nowadays, cameras with more than 20 megapixels are not uncommon. To
reach such a high number of megapixels, the area of the sensor elements must be decreased
and correspondingly the elements become more sensitive to noise, resulting in a lower image
quality due to noise.
During the last decades, the use of image processing techniques has become widespread. The
increasing processing power of computers allows for more sophisticated techniques that are
better adapted to the classes of images under consideration (e.g. photographic images or
medical images). This also allows for new classes of techniques that alleviate the physical
limitations of the sensor elements by means of post-processing such as denoising. Because
of power and hardware complexity constraints, the post-processing techniques implemented
by camera manufacturers are based on simplistic assumptions with respect to the assumed
noise model: for example, while it is well known that photon signals are Poisson distributed,
the techniques most often rely on a white Gaussian noise model. In practice, such model
mismatches generally lead to inferior denoising results. Also, many factors cause the noise
in practice to be colored instead of white (i.e. with a flat power spectrum). For example, the
image formation is often a reconstruction process based on an insufficient number of samples,
and missing samples need to be estimated using interpolation techniques (e.g. Bayer pattern
demosaicing). Doing so, the noise becomes colored. A technique that is designed to remove
white Gaussian noise may offer a poor image quality: either some noise artifacts may be left in the
image, or the noise is suppressed too much, leading to an overblurred image.
The obvious solution to this problem is to adapt existing techniques to use a colored noise
model that is well matched to the underlying sensor characteristics and/or reconstruction.
Therefore, estimation of the noise statistics is indispensable. Stationary colored noise (or
correlated noise) is completely described by its Power Spectral Density (PSD). The noise PSD
describes the power distribution of the noise in frequency space and can be estimated by
using the Discrete Fourier Transform (DFT). However, noisy images also contain information
other than noise (e.g. edges and textures), and directly estimating the PSD through the DFT
will yield seriously biased estimates caused by the signal presence. Alternatively, the PSD
could be estimated from noise-only patches in the image. However, not all images contain
such patches and also the number of noise samples that can be used for this task is often too
limited to yield reliable PSD estimates. Hence, more specialized techniques are needed.
The discrete wavelet transform (DWT) is an important tool for developing such techniques.
The DWT provides a non-uniform partitioning of the space-frequency plane, which allows
positional information of structures to be included in the estimation. This is not possible with
the DFT, since the DFT cannot recover information at specified positions in the image.
In this chapter, we investigate the estimation of colored noise. First, we discuss a number of
origins for colored noise in images. Next, we explain the importance of wavelets in solving
the estimation problem. To proceed, it is necessary to know how the wavelet-domain and
spatial-domain autocorrelation functions are related to each other, since we are aiming at
estimating the wavelet-domain autocorrelation function. Because the wavelet transform in
general does not fully decorrelate signals as we will explain, noise-free wavelet coefficients
with significant magnitudes can still be found near high-frequent transitions in the signals (for
example, near edges in images). To benefit from prior knowledge in a statistical estimation
approach, we will discuss a number of wavelet domain prior models. Two iterative EM-based
techniques will be presented, to estimate the wavelet-domain autocorrelation function. Next,
we will explain how the parameters of a parametric noise PSD can be estimated using the
presented tools. Finally, we will give a number of experimental results for the proposed
techniques.
For colored noise, neighboring noise samples are not statistically independent, hence spatial
dependencies exist between these samples. Their dependencies can be characterized by the
autocorrelation function of the noise, which is - for colored noise - different from the Dirac
delta function.
The PSD is a related descriptor of colored noise. More specifically, the PSD describes how the
noise energy is distributed in frequency space. According to the Wiener-Khinchin theorem,
the power spectral density is the (discrete time) Fourier transform of the autocorrelation function
$R_w(\mathbf{p})$:
$$P_w(\boldsymbol{\omega}) = \sum_{\mathbf{p} \in \mathbb{Z}^2} R_w(\mathbf{p}) \exp\left(-j\boldsymbol{\omega}^T \mathbf{p}\right). \tag{3}$$
Wavelet-Based Analysis and Estimation of Colored Noise
Fig. 1. Noise in PAL broadcasting. (a) Power Spectral Density [dB] over the normalized horizontal frequency $\omega_x/\pi$
and the normalized vertical frequency $\omega_y/\pi$, (b) Noise signal
(containing horizontal stripe patterns due to correlations).
White noise has a flat PSD: $P_w(\boldsymbol{\omega}) = 1$. Suppose a filter with frequency response $H(\boldsymbol{\omega}) \neq 1$ is
applied to the noise signal, then the resulting PSD $P_w'(\boldsymbol{\omega})$ becomes Baher (2001):
$$P_w'(\boldsymbol{\omega}) = P_w(\boldsymbol{\omega})\,|H(\boldsymbol{\omega})|^2. \tag{4}$$
Clearly, the PSD $P_w'(\boldsymbol{\omega})$ is subjected to the filter magnitude response $|H(\boldsymbol{\omega})|$. Hence one can
think of correlated noise as white noise subjected to linear filtering. In analogy with the term
“white noise” the resulting term is called “colored noise” (or correlated noise, because the filtering
introduces correlations in the noise samples).
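The relation between filtering and the PSD can be verified on a small example. The sketch below is a one-dimensional simplification (the text works with two-dimensional lags): for unit-variance white noise passed through an FIR filter, the autocorrelation is the correlation of the filter taps with themselves, and its Fourier sum equals $|H(\omega)|^2$. The filter taps and the function names are arbitrary illustrative choices.

```python
import cmath
import math

def autocorrelation(h):
    """R(k), k >= 0, of unit-variance white noise filtered by the FIR taps h."""
    return [sum(h[n] * h[n + k] for n in range(len(h) - k)) for k in range(len(h))]

def psd_from_autocorrelation(r, omega):
    """Fourier sum over all lags, using R(-k) = R(k) for real taps."""
    return r[0] + 2.0 * sum(r[k] * math.cos(omega * k) for k in range(1, len(r)))

def squared_magnitude_response(h, omega):
    """|H(omega)|^2 with H(omega) = sum_n h[n] exp(-j omega n)."""
    H = sum(tap * cmath.exp(-1j * omega * n) for n, tap in enumerate(h))
    return abs(H) ** 2

h = [0.25, 0.5, 0.25]  # simple low pass filter, chosen only for illustration
r = autocorrelation(h)
omegas = [k * math.pi / 8 for k in range(9)]
max_gap = max(
    abs(psd_from_autocorrelation(r, w) - squared_magnitude_response(h, w))
    for w in omegas
)
print(r, max_gap)
```

Because the low pass filter attenuates high frequencies, the filtered noise has most of its power near $\omega = 0$, which is the kind of coloring described above.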
In practical circumstances, there are a number of origins of colored noise in images:
• Phase Alternating Line (PAL) television: the noise in PAL television images is a good example
of colored noise. The correlations between the noise samples are caused by several
mechanisms, such as deinterlacing Kwon et al. (2003), demodulation and filter schemes.
In Figure 1, the PSD of a noise patch from a PAL broadcast is shown. Here, there is a high
concentration of energy in the lower horizontal frequencies, leading to horizontal stripes
and artifacts.
• Color interpolation (demosaicing): modern digital cameras use a rectangular arrangement of
photosensitive elements. In this matrix arrangement, photosensitive elements of different
color sensitivity are placed in an interleaved way. This allows sampling of full color
images without the use of three arrays of photosensitive elements. One popular example
is the Bayer pattern Bayer (1976). Color interpolation (or demosaicing) is the process of
estimating the values of missing photosensitive elements.
• Post-processing techniques: image noise often becomes correlated by the use of
post-processing techniques, e.g., image quality enhancement techniques, sharpening
filters, digital zoom functions of cameras, JPEG compression...
Fig. 2. (a) Image corrupted with colored noise caused by demosaicing, (b) PSD of the noise in
the green color channel of (a), shown for ωx and ωy between −π and π.
• Thermal cameras: images captured by thermal cameras of the push broom or whisk broom
type often exhibit streaking noise artifacts, mainly caused by detector and sampling
circuitry Aelterman, Goossens, Pižurica & Philips (2010). This kind of noise can be
approximated using a 1/ f frequency characteristic (called pink noise) Borel et al. (1996).
Pink noise also frequently arises in image sensors that acquire pixel data in time.
• Medical imaging: in computed tomography (CT), noise correlations are introduced
by the specific reconstruction technique that is being used. Noise created by the
backprojection algorithm (without reconstruction filter) is called ramp-spectrum noise,
and has a 1/f frequency characteristic. Noise in magnetic resonance imaging
(MRI) is traditionally considered white Nowak (1999); Pižurica et al. (2003), although
many MRI scanner manufacturers have included a wide range of techniques to
allow for shorter scanning times (mainly to avoid patient motion artifacts in the
images). To name a few: K-space subsampling, partial Fourier, elliptical filtering
Aelterman, Deblaere, Goossens, Pižurica & Philips (2010). The use of these techniques
results in correlated noise in the reconstructed MRI images.
In Figure 3, another example is shown of an image corrupted with colored noise. The colored
noise was artificially generated by subjecting white noise to a filter with magnitude response
√P(ω) and subsequently adding the filtered noise to the image.
Fig. 3. Illustration of the noise PSD: (a) Image with correlated noise, (b) The noise PSD (in
frequency domain, with ωx and ωy between −π and π; the center of the image is the origin of
frequency space, white corresponds with low noise powers, black with high noise powers).
Fig. 4. Example of a piecewise linear signal with correlated noise. Our goal is to estimate the
noise power spectrum from the corrupted signal y(p). (a) The signals in time domain:
x(p), w(p) and their sum y(p), (b) The finest scale of the wavelet transform of the signals:
x̃(p), w̃(p) and their sum ỹ(p) (a Daubechies wavelet with 2 vanishing moments was used).
This problem is illustrated in Figure 4 for a piecewise linear signal corrupted with correlated
Gaussian noise. While the noise statistics can be easily estimated from w(p), we only have
the degraded signal y(p) at our disposal, which also contains an unknown signal component.
A straightforward solution is then to first compute a signal estimate x̂(p), to subtract it from y(p)
and finally to estimate the noise statistics from the difference y(p) − x̂(p). However, optimal
estimation of x(p) from y(p) itself requires knowledge of the noise statistics, so we have
a chicken-and-egg problem. The common approach is then to use iterative techniques, which
first estimate x̂(p) and later refine this estimate when better estimates for the noise
parameters become available.
In this chapter, we will take a different approach by relying on the properties of wavelets. The
wavelet transform Daubechies (1992); Mallat (1999) analyzes signals according to different
scales and at different points in time. Starting from a fixed mother wavelet ψ (t), the input
signal is correlated with time-shifted and time-stretched (dilated) versions of this wavelet.
Correlations with wavelets with a large dilation factor then give the coarse features of the
signal, while correlations with wavelets with small dilation factors give the fine signal details.
Because the wavelet basis functions are well localized in time or space (this is in contrast to
the basis functions of e.g., the Fourier transform), wavelets are ideal candidates for analyzing
non-stationary signals, having statistical properties that vary in time (or space).
The Daubechies wavelets are a class of orthogonal wavelets for which the number of vanishing
moments for a given support is maximal. More specifically, the n-th moment of a real-valued
wavelet function ψ(t) is defined by:

$$\mu_n = \int_{-\infty}^{+\infty} t^n \psi(t)\,dt. \qquad (5)$$

The Daubechies wavelet of support 2N (with N vanishing moments) will have moments μn = 0
for 0 ≤ n < N. Now, let us denote the time-shifted and dilated basis functions of ψ(t) by:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) \qquad (6)$$

where a is the dilation factor, b is a time shift, and the constant 1/√a is an energy
normalization factor. The continuous wavelet transform of a signal f ∈ L²(R) is defined
by:

$$W f(a,b) = \int_{-\infty}^{+\infty} f(t)\, \psi_{a,b}(t)\,dt. \qquad (7)$$
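Definitions (6) and (7) can be tried out numerically. The sketch below is only an illustration: because Daubechies wavelets have no closed-form expression, it swaps in the Mexican-hat wavelet (which also has two vanishing moments), and it approximates the integral in (7) with a midpoint rule. The function names, the integration window and the step count are assumptions.

```python
import math

def mexican_hat(t):
    """Mexican-hat wavelet, used here as a stand-in with two vanishing moments."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def psi_ab(t, a, b):
    """Dilated and shifted wavelet of (6): a^(-1/2) * psi((t - b) / a)."""
    return mexican_hat((t - b) / a) / math.sqrt(a)

def cwt(f, a, b, half_width=10.0, steps=20000):
    """Approximate W f(a, b) of (7) by a midpoint rule on [b - half_width*a, b + half_width*a]."""
    lo = b - half_width * a
    dt = 2.0 * half_width * a / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * dt
        total += f(t) * psi_ab(t, a, b)
    return total * dt

def energy(a, b):
    """Energy of psi_ab; the 1/sqrt(a) factor keeps it independent of a and b."""
    return cwt(lambda t: psi_ab(t, a, b), a, b)

print(energy(1.0, 0.0), energy(4.0, 2.0))   # both close to 0.75 * sqrt(pi)
print(cwt(lambda t: 3.0 * t, 2.0, 1.0))      # linear signal: close to zero
print(cwt(lambda t: t * t, 2.0, 1.0))        # quadratic signal: clearly nonzero
```

The last two lines contrast a linear signal, which a wavelet with two vanishing moments suppresses, with a quadratic one, which it does not.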
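Now, suppose that a signal is linear on a region larger than the support S(a) of the wavelet
function ψa,b(t):

$$f(t) = c \cdot t \quad \text{if } |t - b| \le S(a).$$

For Daubechies wavelets with at least two vanishing moments (N ≥ 2), the corresponding
wavelet coefficient W f(a, b) will be zero:

$$\begin{aligned}
W f(a,b) &= \int_{-\infty}^{+\infty} c \cdot t\, \psi_{a,b}(t)\,dt \\
&= \frac{c}{\sqrt{a}} \int_{-\infty}^{+\infty} t\, \psi\!\left(\frac{t-b}{a}\right) dt \\
&= c\sqrt{a} \int_{-\infty}^{+\infty} (a t + b)\, \psi(t)\,dt \\
&= c\sqrt{a} \left( a \int_{-\infty}^{+\infty} t\, \psi(t)\,dt + b \int_{-\infty}^{+\infty} \psi(t)\,dt \right) = 0,
\end{aligned}$$

because both μ1 and μ0 vanish.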
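The same cancellation can be checked in discrete time. The sketch below uses the four-tap Daubechies filter with two vanishing moments, verifies that the zeroth and first discrete moments of the wavelet filter vanish, and applies the filter (without subsampling) to a piecewise linear test signal. The test signal and the helper names are assumptions made for the example.

```python
import math

# Daubechies scaling filter with two vanishing moments, orthonormal scaling.
s3 = math.sqrt(3.0)
h = [c / (4.0 * math.sqrt(2.0)) for c in ((1 + s3), (3 + s3), (3 - s3), (1 - s3))]
# Wavelet (high-pass) filter from the alternating flip g[k] = (-1)^k h[3 - k].
g = [(-1) ** k * h[3 - k] for k in range(4)]

def discrete_moment(filt, n):
    """Discrete analogue of (5): sum_k k^n * filt[k]."""
    return sum((k ** n) * c for k, c in enumerate(filt))

def highpass(x, filt):
    """Correlate x with filt, keeping only positions where the filter fully overlaps x."""
    L = len(filt)
    return [sum(filt[k] * x[p + k] for k in range(L)) for p in range(len(x) - L + 1)]

# Piecewise linear test signal with a single kink at sample 50 (hypothetical values).
x = [0.3 * p if p < 50 else 15.0 - 0.1 * (p - 50) for p in range(100)]
d = highpass(x, g)
nonzero = [p for p, v in enumerate(d) if abs(v) > 1e-9]

print(discrete_moment(g, 0), discrete_moment(g, 1))  # both vanish up to rounding
print(nonzero)  # only the windows that straddle the kink
```

Only the filter windows that contain samples from both linear segments produce a nonzero output, which is the discrete counterpart of the derivation above.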
In the remainder of this chapter, for the ease of notation, we will consider one particular
wavelet subband (with scale a) at a time and we will denote the corresponding wavelet
coefficients by a tilde: for example x̃ (p) are the wavelet coefficients for that particular scale
of x (p). The process can then be repeated for other subbands as well. Let us now apply a
Daubechies wavelet transform to the piecewise linear signal from Figure 4(a). The result is
shown in Figure 4(b) for the finest scale of the DWT1 : because of the vanishing moments of
the wavelet, the wavelet coefficients x̃ (p) are zero, except at the positions where the derivative
of x (p) does not exist. At these positions, the wavelet coefficients have a negligibly small
magnitude. This nicely illustrates the sparsifying properties of the DWT for this type of signal.
Correspondingly, the wavelet coefficients ỹ(p) are (approximately) w̃(p), which means that
the chicken-and-egg problem is solved: the noise statistics can be directly estimated from
ỹ (p)! More specifically, the wavelet domain autocorrelation function of w(p) can in this case
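be estimated based on the following relationship:

$$R_{\tilde{w}}(\mathbf{p}) \approx R_{\tilde{y}}(\mathbf{p}) = E\left[\tilde{y}(\mathbf{p}')\, \tilde{y}(\mathbf{p}' + \mathbf{p})\right]. \qquad (8)$$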
It then suffices to compute the sample autocorrelation function of ỹ(p). There are now two
issues remaining, which we will explain in the remainder of this Chapter:
1. The autocorrelation function of a signal in the wavelet domain (e.g., for a particular wavelet
subband) is not the same as the autocorrelation function of the signal in the time domain.
Nevertheless, there exists a simple relation between both, as we will explain in Section
3.
2. Most real-life signals are not piecewise linear functions or piecewise polynomials. For such
signals, the wavelet coefficient magnitudes may become non-negligible, causing serious
biases in the final noise estimates. An example of a frequency modulated signal with
maximal frequency at half the length of the signal is given in Figure 5. Because of the high
local bandwidth of the signal at this time position, the wavelet is not able to cancel out
the signal, resulting in wavelet coefficients with a large magnitude. Consequently, the
approximation ỹ(p) ≈ w̃(p) does not hold anymore. However, it can be seen in Figure
5(b) that this phenomenon is well localized in time. Hence, because the noise process is
assumed to be stationary, a plausible solution would be to estimate the noise statistics from
the wavelet coefficients ỹ(p) that have a small underlying component x̃(p) (ignoring the
outliers in Figure 5(b)). In Section 4 we will discuss solutions that generalize this idea by
using a statistical prior model for wavelet coefficients.
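For the piecewise linear case, relationship (8) can be tried out on synthetic data. The following sketch is a toy experiment under stated assumptions: the signal, the two-tap filter that colors the noise, the random seed and the helper names are all chosen for illustration. It compares the sample autocorrelation of ỹ(p) with that of the noise coefficients w̃(p), which would be unknown in practice.

```python
import math
import random

s3 = math.sqrt(3.0)
h = [c / (4.0 * math.sqrt(2.0)) for c in ((1 + s3), (3 + s3), (3 - s3), (1 - s3))]
g = [(-1) ** k * h[3 - k] for k in range(4)]  # wavelet filter, two vanishing moments

def finest_scale(x):
    """Finest-scale detail coefficients: filter with g and keep every second sample."""
    return [sum(g[k] * x[p + k] for k in range(4)) for p in range(0, len(x) - 3, 2)]

def sample_autocorrelation(c, max_lag):
    """Sample estimate of R(p) = E[c(p') c(p' + p)] for p = 0..max_lag, as in (8)."""
    n = len(c)
    return [sum(c[i] * c[i + p] for i in range(n - p)) / (n - p)
            for p in range(max_lag + 1)]

random.seed(0)
N = 4000
# Piecewise linear signal with one kink, plus colored noise
# (white Gaussian noise smoothed by a hypothetical two-tap filter).
x = [0.05 * p if p < N // 2 else 0.05 * (N - p) for p in range(N)]
white = [random.gauss(0.0, 1.0) for _ in range(N + 1)]
w = [0.5 * (white[p] + white[p + 1]) for p in range(N)]
y = [xi + wi for xi, wi in zip(x, w)]

R_y = sample_autocorrelation(finest_scale(y), 3)  # available in practice
R_w = sample_autocorrelation(finest_scale(w), 3)  # ground truth, unknown in practice
print(R_y)
print(R_w)
```

The two printed lists agree closely because x̃(p) vanishes everywhere except in the single window that straddles the kink.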
So far, we discussed the estimation of colored noise for one dimensional signals. The reasoning
can also be extended to higher dimensional signals, such as images. To illustrate this, a
noisy image together with its DWT are shown in Figure 6. It can be seen that the wavelet
subbands (LH, HL and HH in Figure 6) predominantly contain information on the noise,
except in areas with textures and edges (the fine hairs of the mandrill). In these
areas, the (noise-free) wavelet coefficients x̃ (p) still have a relatively large magnitude, but this
phenomenon is localized - in the surrounding smooth regions the wavelet coefficients ỹ(p)
mostly consist of noise.
Fig. 5. Example of a signal that is not piecewise linear, with correlated noise. Our goal is to
estimate the noise power spectrum from the corrupted signal y(p). (a) The signals in time
domain: x(p), w(p) and their sum y(p), (b) The finest scale of the wavelet transform of the
signals: x̃(p), w̃(p) and their sum ỹ(p) (a Daubechies wavelet with 2 vanishing moments was used).

For higher dimensional signals, the DWT is usually computed by using basis functions that
are tensor products of one dimensional wavelets and one dimensional scaling functions.
While this approach can efficiently deal with point-wise singularities (e.g. bumps, dots,
...), most structures in images are line-like singularities with a given direction. However,
the DWT cannot adapt well to the arbitrary direction of the singularity: for example, the
transform cannot make a distinction between features oriented at +45° and -45°. This
is known as the checkerboard problem of the DWT: due to the separability of the higher
dimensional wavelets, these wavelets appear as a checkerboard pattern which does not have
a dominant direction. Consequently, many nonzero wavelet coefficients may be needed to
represent a line singularity at an arbitrary orientation. To overcome this limitation there has
recently been a lot of interest in transforms that offer a better directional selectivity. Examples
are steerable pyramids Simoncelli et al. (1992), dual-tree complex wavelets Selesnick et al.
(2005a), Marr-like wavelet pyramids Van De Ville & Unser (2008), 2-D (log) Gabor transforms
Fischer et al. (2007); Lee (1996), contourlets Do & Vetterli (2005), ridgelets Candès (1998);
Do & Vetterli (2003), curvelets Candès et al. (2006) and shearlets Guo & Labate (2007). These
transforms are designed to have better sparsifying properties so that our outlier problem in
Figure 5(b) is alleviated (but not solved).
In the next subsections we will focus on the DWT as the primary multiresolution decomposition
tool; however, the same reasoning can also be applied to more recently developed transforms.
Fig. 6. (a) Baboon image with noise, (b) DWT of the image (subbands labeled LL2, HL2, LH1 and HH1).
are subsequently decimated by a factor of two. The analysis is iterated on the scaling
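coefficients F2(z). Now, the input signal has an autocorrelation function in the z-domain
defined by R̃1(z) = E[F̃1(z) F̃1(z⁻¹)]. The filtered signals then have autocorrelation functions
R̃1(z)G(z)G(z⁻¹) and R̃1(z)H(z)H(z⁻¹), respectively. Decimating the resulting signals by a
factor of 2 leads to signals with autocorrelation functions (Goossens et al., 2010):

$$\begin{aligned}
R_1(z) &= E\left[F_1(z) F_1(z^{-1})\right] \\
&= \frac{1}{2}\left[ \tilde{R}_1\!\left(z^{1/2}\right) G\!\left(z^{1/2}\right) G\!\left(z^{-1/2}\right) + \tilde{R}_1\!\left(-z^{1/2}\right) G\!\left(-z^{1/2}\right) G\!\left(-z^{-1/2}\right) \right], \\
R_2(z) &= E\left[F_2(z) F_2(z^{-1})\right] \\
&= \frac{1}{2}\left[ \tilde{R}_1\!\left(z^{1/2}\right) H\!\left(z^{1/2}\right) H\!\left(z^{-1/2}\right) + \tilde{R}_1\!\left(-z^{1/2}\right) H\!\left(-z^{1/2}\right) H\!\left(-z^{-1/2}\right) \right]. \qquad (9)
\end{aligned}$$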
Hence, the wavelet-domain autocorrelation function R1 (z) can be directly computed from the
autocorrelation function of the input signal R̃1 (z) and the wavelet and scaling filters. This
involves two simple convolutions and a decimation operation of the input autocorrelation
function R̃1 (z). For subsequent decompositions (coarser scales of the wavelet transform), this
process can be iterated by re-inserting R̃1 (z) = R2 (z) in (9).
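In the lag domain, (9) amounts to convolving the input autocorrelation with the filter and with its time-reversed copy, and then keeping the even lags. The sketch below implements this for the four-tap Daubechies filters and iterates it over scales. The list layout (lag 0 in the middle of an odd-length list) and the function names are assumptions made for the example.

```python
import math

s3 = math.sqrt(3.0)
H = [c / (4.0 * math.sqrt(2.0)) for c in ((1 + s3), (3 + s3), (3 - s3), (1 - s3))]  # scaling filter
G = [(-1) ** k * H[3 - k] for k in range(4)]                                         # wavelet filter

def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def subband_autocorrelation(r, filt):
    """Lag-domain form of (9).

    r holds R(-M), ..., R(0), ..., R(M) (odd length, lag 0 in the middle). The result is
    the autocorrelation after filtering with filt and decimating by 2, in the same layout.
    """
    full = convolve(convolve(r, filt), filt[::-1])  # R(z) F(z) F(1/z)
    center = len(full) // 2                         # index of lag 0
    half = center // 2                              # largest lag kept after decimation
    return [full[center + 2 * k] for k in range(-half, half + 1)]

def variances_per_scale(r, levels):
    """Noise variance (lag-0 autocorrelation) of the wavelet subband at scales 1..levels."""
    out = []
    for _ in range(levels):
        detail = subband_autocorrelation(r, G)
        out.append(detail[len(detail) // 2])
        r = subband_autocorrelation(r, H)           # re-insert the scaling-band result
    return out

print(subband_autocorrelation([1.0], G))            # white noise stays white
print(variances_per_scale([0.25, 0.5, 0.25], 3))    # colored noise: variance depends on scale
```

For white input noise the orthonormal filter returns a unit impulse, while the colored example yields a different variance at every scale.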
To show that this reasoning also applies to other wavelet transforms, we will briefly discuss
the adaptation to the dual-tree complex wavelet transform (DT-CWT) Kingsbury (2001) in
one dimension. Extension to higher dimensions is then straightforward. The 1D DT-CWT is
implemented using two parallel DWT filter banks: the first filter bank uses the real parts of
the complex wavelet and scaling filters (respectively G1(z) and H1(z)), while the second
filter bank applies the imaginary parts of the wavelet and scaling filters (respectively G2(z) and
H2(z)). Finally, the outputs of both filter banks are mixed together (see the right
square in Figure 7(b)), applying a 45° rotation in the complex plane. This last step is in fact
only necessary in 2D (or higher dimensions), where complex wavelets are constructed using
tensor products of 1D complex wavelets. The translation of the resulting complex-valued
filter banks to parallel real-valued filter banks then automatically results in this phase
modulation in the complex plane (for more details, see Selesnick et al. (2005b)). Defining
[Fig. 7. Filter bank diagrams: (a) one stage of the DWT analysis filter bank, in which F̃1(z) is filtered by G(z) and H(z), each followed by decimation by 2, giving F1(z) and F2(z); (b) the DT-CWT filter bank with filters G1(z), H1(z), G2(z) and H2(z), each followed by decimation by 2, and the computation of the complex coefficients (scaling by 1/√2, sum and difference).]
Fig. 8. (a) Wavelet analysis of the autocorrelation function (in z-domain)
$R(z) = \sum_n \frac{\beta^2}{\pi(\beta^2 - n^2)} \left(1 + \cos\frac{\pi n}{\beta}\right) z^n$
across different scales (time domain and wavelet scales 1 to 4) and for different values of β
(between 1 and 1.4). A Daubechies wavelet with two vanishing moments was used. (b) Power
spectral density R(e^{jω}) for different values of β, plotted as |R(e^{jω})|/β against the
normalized frequency ω/π.
influences the noise variances of the individual wavelet subbands (see Figure 8(a)), due to
the frequency-selective behavior of the wavelets at different scales. For example, increasing
the parameter β causes the noise variance at wavelet scale 4 to decrease. This also
suggests that, when a thresholding strategy (e.g. soft or hard thresholding) is used to
suppress the colored noise process, the thresholds need to be level-dependent, e.g., as
proposed by Johnstone & Silverman (1997).
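A level-dependent threshold can be sketched as follows. This is a simplified illustration in the spirit of level-dependent thresholding, not a reproduction of the cited method: the per-scale noise level is estimated with the median absolute value divided by 0.6745, and the threshold rule σ_j·sqrt(2 ln n_j) and all function names are assumptions.

```python
import math
import statistics

def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t (soft thresholding)."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def mad_sigma(coeffs):
    """Robust noise standard deviation estimate: median(|c|) / 0.6745."""
    return statistics.median(abs(c) for c in coeffs) / 0.6745

def level_dependent_denoise(subbands):
    """Apply one threshold per scale, t_j = sigma_j * sqrt(2 ln n_j)."""
    out = []
    for coeffs in subbands:
        t = mad_sigma(coeffs) * math.sqrt(2.0 * math.log(len(coeffs)))
        out.append([soft_threshold(c, t) for c in coeffs])
    return out

# Two hypothetical subbands with different noise levels; the first contains one outlier.
subbands = [[0.1, -0.2, 0.15, 5.0, -0.1, 0.05, -0.12, 0.2],
            [1.0, -2.0, 1.5, -1.2]]
print(level_dependent_denoise(subbands))
```

Because each scale gets its own noise estimate, a coefficient of magnitude 2 is removed in the noisy second subband, whereas it would survive the much smaller threshold of the first subband.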
Fig. 9. Joint histogram of neighboring wavelet coefficients, y(p) versus y(p+1), for Figure 5(b).
Black dots are noise coefficients, crosses are the outliers due to signal presence.

[Histogram panel; annotated kurtosis = 6.982752.]
(1989); Moulin & Liu (1999); Simoncelli & Adelson (1996) proposed to use a generalized Laplace
distribution (GLD, also known as generalized Gaussian distribution) to model the kurtotic
behavior of wavelet coefficients. The GLD is defined as:

$$f_{\tilde{x}}(\tilde{x}) = \frac{\nu}{2 s\, \Gamma(1/\nu)} \exp\left(-\left|\frac{\tilde{x}}{s}\right|^{\nu}\right), \qquad (13)$$

where $\Gamma(x) = \int_0^{+\infty} t^{x-1} e^{-t}\,dt$ is the Gamma function. The parameter s is the scale
parameter of the distribution, which controls the variance of the distribution. The parameter ν
is a shape parameter that is related to the kurtosis of the distribution, given by:

$$\kappa = \frac{\Gamma(5/\nu)\,\Gamma(1/\nu)}{\Gamma^2(3/\nu)} - 3. \qquad (14)$$

The shape parameter ν is typically in the range [0.5, 1]. Because the actual value of this
parameter is unknown in practice, it is usually estimated from the observed
data. This may be done using the maximum likelihood method or the method of moments
Srivastava et al. (2003).
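As a sketch of the method of moments, the code below evaluates (13) and (14) and inverts (14) by bisection, so that a measured kurtosis can be mapped to a shape parameter. The bracketing interval and the function names are assumptions; the bisection relies on the kurtosis decreasing as ν grows.

```python
import math

def gld_pdf(x, s, nu):
    """Generalized Laplace density of (13)."""
    return nu / (2.0 * s * math.gamma(1.0 / nu)) * math.exp(-abs(x / s) ** nu)

def gld_kurtosis(nu):
    """Excess kurtosis of (14)."""
    return math.gamma(5.0 / nu) * math.gamma(1.0 / nu) / math.gamma(3.0 / nu) ** 2 - 3.0

def shape_from_kurtosis(kappa, lo=0.3, hi=5.0, iterations=60):
    """Solve gld_kurtosis(nu) = kappa for nu by bisection on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gld_kurtosis(mid) > kappa:
            lo = mid   # tails too heavy: the shape parameter must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(gld_kurtosis(2.0))          # Gaussian case: excess kurtosis 0 up to rounding
print(gld_kurtosis(1.0))          # Laplacian case: excess kurtosis 3 up to rounding
print(shape_from_kurtosis(3.0))   # recovers a shape parameter close to 1
```

In practice the sample kurtosis of a wavelet subband would be passed to `shape_from_kurtosis`, after which s follows from the sample variance.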
where m is the mean of the distribution (typically m = 0), g(u) is a real-valued function
(called density generator function), d is the length of x̃ and k_d is a proportionality constant.
A multivariate extension of the GLD is obtained by using the following density generator
function Kotz et al. (2000): g(u) = exp(−|u|^ν). The resulting distribution is known as
the multivariate exponential power distribution (EPD). For our modeling task, the EPD
has a number of practical limitations: 1) the marginal densities of the distribution are not
EPD-distributed and 2) for estimation purposes, the exponential power ν often leads to
integral expressions that are analytically intractable.
Wainwright & Simoncelli (2000) noted that when the wavelet filter responses are normalized
by dividing by the square root of the local variance, the statistics of the normalized coefficients
are approximately Gaussian. The Gaussian Scale Mixture (GSM), see Figure 10(c), was then
proposed to account both for the correlations and the variability in local variance of the
wavelet coefficients. A random vector x̃ is GSM distributed if it can be written as the
product of a zero mean Gaussian random vector ũ and a scalar positive random variable
√z Andrews & Mallows (1974):

$$\tilde{\mathbf{x}} \overset{d}{=} \sqrt{z}\, \tilde{\mathbf{u}} \qquad (16)$$

where $\overset{d}{=}$ denotes equality in distribution. The scalar variable z is not observed and is therefore
also called the 'hidden' multiplier or mixing variable. Because of the scaling ambiguity between √z
and ũ, the hidden multiplier is often assumed to be normalized such that E[z] = 1. Prior
distributions for z include Jeffreys' non-informative³ prior Portilla et al. (2003), the log-normal
prior Portilla & Simoncelli (2001), the exponential distribution Selesnick (2006) and the Gamma
distribution Fadili & Boubchir (2005); Srivastava et al. (2002).
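The effect of the hidden multiplier in (16) can be seen in a scalar toy example. The sketch below draws samples x = √z·u with a two-point distribution for z (a hypothetical choice, normalized such that E[z] = 1) and compares the sample kurtosis with the value 3E[z²]/E[z]² − 3 that holds for a scalar GSM. The names and the seed are assumptions.

```python
import math
import random

def sample_gsm(z_values, z_probs, sigma_u, rng):
    """Draw one scalar GSM sample x = sqrt(z) * u with u ~ N(0, sigma_u^2)."""
    z = rng.choices(z_values, weights=z_probs, k=1)[0]
    return math.sqrt(z) * rng.gauss(0.0, sigma_u)

def excess_kurtosis(samples):
    """Sample excess kurtosis m4 / m2^2 - 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 * m2) - 3.0

# Hypothetical two-point hidden multiplier with E[z] = 1.
z_values, z_probs = [0.1, 1.9], [0.5, 0.5]
rng = random.Random(1)
x = [sample_gsm(z_values, z_probs, 1.0, rng) for _ in range(100000)]

# For a scalar GSM with E[z] = 1, the excess kurtosis equals 3 E[z^2] - 3.
predicted = 3.0 * sum(p * z * z for z, p in zip(z_values, z_probs)) - 3.0
print(predicted, excess_kurtosis(x))
```

Although every sample is conditionally Gaussian, the mixture over z produces the positive kurtosis that is typical for wavelet coefficients.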
2 Quite often, the neighborhoods are chosen to be overlapping, despite the fact that this destroys the
mutual independence of the different neighborhood vectors. This is done to arrive at a sufficiently large
number of neighborhood vectors (for example, for a 3 × 3 neighborhood, the number of vectors will be
multiplied by 9), which will generally result in more reliable estimates.
3 Note that in this case, the mathematical expectation E[z] does not exist.
The GSM also belongs to the family of ESDs. The density generator function is given by:

$$g(x) = \int_0^{+\infty} f_z(z)\, z^{-d/2} \exp\left(-\frac{x^2}{2z}\right) dz. \qquad (17)$$
For some hidden multiplier densities f z (z) a closed-form expression can be found for
g( x ), although most often, numerical integration is performed over a closed interval. In
Gómez et al. (2008) it has been shown that the EPD is also a GSM distribution, for some
values of the shape parameter ν ∈]0, 1]. However, the distribution f z (z) depends on d and
has a complicated analytical expression (see Gómez et al. (2008)).
where w̃(p) is a spatially stationary, Gaussian distributed vector of length d with mean 0
and covariance Cw̃. Due to the assumed noise stationarity, the covariance matrix Cw̃ has
dimensions d × d and is directly related to the noise autocorrelation function Rw̃(p): the
covariance between two coefficients at positions p and q only depends on the difference in
location between both positions:
where vector-valued indices in (Cw̃)p,q are used as a short notation for their respective
column-stacked ordering. By (19), the estimation of the noise autocorrelation function is
equivalent to the estimation of the covariance Cw̃. Next, the noise-free coefficients are GSM
distributed with covariance matrix Cx̃. For the GSM model, we have x̃ | z ∼ N(0, zCũ).
Consequently, the density of ỹ is a specific case of a Gaussian mixture model:
where the signal covariance is also unknown. We remark that this matrix can be eliminated
by relying on Cũ + Cw̃ = Cỹ (this directly follows from (1), when E[z] = 1):
The signal-plus-noise covariance matrix can be estimated using the method of maximum
likelihood: $C_{\tilde{y}} = \frac{1}{N} \sum_{\mathbf{p}} \tilde{\mathbf{y}}(\mathbf{p})\, \tilde{\mathbf{y}}^T(\mathbf{p})$, with N the number of coefficients in the considered
wavelet subband.
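The sample covariance above can be sketched for a one-dimensional subband as follows. The use of overlapping neighborhoods with stride 1 follows the remark on overlapping neighborhoods made earlier; the function names and the tiny input are assumptions for illustration.

```python
def neighborhood_vectors(coeffs, d):
    """All overlapping length-d neighborhoods of a 1-D subband (stride 1)."""
    return [coeffs[p:p + d] for p in range(len(coeffs) - d + 1)]

def sample_covariance(vectors):
    """Zero-mean sample covariance (1/N) * sum_p y(p) y(p)^T, returned as a list of rows."""
    n, d = len(vectors), len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / n for j in range(d)] for i in range(d)]

subband = [1.0, -1.0, 1.0, -1.0]            # hypothetical wavelet coefficients
vectors = neighborhood_vectors(subband, 2)  # three overlapping pairs
print(sample_covariance(vectors))
```

The off-diagonal entries capture the correlation between neighboring coefficients, which is the quantity that distinguishes colored from white noise.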
$$C_{\tilde{w}}^{(i+1)} = \frac{\sum_{\mathbf{p}} P\left(z < z_0 \mid \tilde{\mathbf{y}}(\mathbf{p}), \Theta^{(i)}\right) \tilde{\mathbf{y}}(\mathbf{p})\, \tilde{\mathbf{y}}^T(\mathbf{p})}{\sum_{\mathbf{p}} P\left(z < z_0 \mid \tilde{\mathbf{y}}(\mathbf{p}), \Theta^{(i)}\right)}, \qquad (22)$$

where i is the iteration index, Θ^(i) denotes the GSM model parameters at iteration i and
z0 is a small positive constant. Equation (22) can be motivated by the observation that
for z sufficiently small, Cỹ|z ≈ Cw̃. The posterior probability that z < z0, conditioned on an
observation vector ỹ(p), i.e., P(z < z0 | ỹ(p), Θ^(i)), is then used as a weight in the averaging
process. We can understand this as follows: P(z < z0 | ỹ(p), Θ^(i)) represents the probability
that a given observation vector contains a negligible signal component. The estimated noise
covariance is then the average over all sample covariances ỹ(p)ỹ^T(p), weighted by the
probability that the considered sample contains a negligible signal component.
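The weighted average in (22) can be sketched as below. The weights stand in for the posterior probabilities P(z < z0 | ỹ(p), Θ^(i)); computing them requires the fitted GSM model and is outside this sketch, so they are passed in as plain numbers. The function name and the example values are assumptions.

```python
def weighted_covariance(vectors, weights):
    """Weighted average of the outer products y(p) y(p)^T, as in update rule (22).

    weights[p] plays the role of P(z < z0 | y(p), Theta); it is supplied by the caller.
    """
    d = len(vectors[0])
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one weight must be positive")
    return [[sum(w * v[i] * v[j] for v, w in zip(vectors, weights)) / total
             for j in range(d)] for i in range(d)]

# Hypothetical observations: the second vector is dominated by signal and gets a low weight.
vectors = [[1.0, 2.0], [10.0, 10.0]]
print(weighted_covariance(vectors, [1.0, 0.0]))
print(weighted_covariance(vectors, [0.9, 0.1]))
```

A vector with a weight of zero has no influence on the estimate, which is how observations with a strong signal component are kept out of the noise covariance.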
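Because the updating rule (22) is not guaranteed to increase the likelihood of the data, at
every iteration it is checked whether the new covariance estimate results in a higher likelihood:
Q(Θ^(i), Θ^(i+1)) > Q(Θ^(i), Θ^(i)), with Q(Θ^(i), Θ) the expected log-likelihood function of the
data:

$$\begin{aligned}
Q(\Theta^{(i)}, \Theta) &= E\left[\log f_{z|\tilde{\mathbf{y}}}(z \mid \tilde{\mathbf{y}}, \Theta) \,\middle|\, \tilde{\mathbf{y}}, \Theta^{(i)}\right] \\
&= \sum_{\mathbf{p}} \int_0^{+\infty} f_{z|\tilde{\mathbf{y}}}\left(z \mid \tilde{\mathbf{y}}(\mathbf{p}), \Theta^{(i)}\right) \log f_{z|\tilde{\mathbf{y}}}\left(z \mid \tilde{\mathbf{y}}(\mathbf{p}), \Theta\right) dz. \qquad (23)
\end{aligned}$$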
In case the expected log-likelihood (23) decreases, it is proposed in Portilla (2004) to perform
a gradient ascent step:

$$\begin{aligned}
C_{\tilde{w}}^{(i+1)} &= C_{\tilde{w}}^{(i)} + \lambda \left.\frac{\partial Q(\Theta^{(i)}, \Theta)}{\partial C_{\tilde{w}}}\right|_{C_{\tilde{w}} = C_{\tilde{w}}^{(i)}} \\
&= C_{\tilde{w}}^{(i)} + \frac{\lambda}{2} N \int_0^{+\infty} f_z(z)\,(1 - z)\, C_z^{-1} \left(I - \hat{C}_z C_z^{-1}\right) dz, \qquad (24)
\end{aligned}$$

where
Although a good fit to the data was reported in Portilla (2004), the technique requires the
relatively costly evaluation of the expected log-likelihood function (23). Another issue is the
choice of the constant z0. In Portilla (2004), this was solved by using a discrete GSM mixture
for the hidden multiplier density fz(z). By assigning a non-zero probability mass at z = 0, the
probability P(z = 0 | ỹ(p), Θ^(i)) is guaranteed to be non-zero.
In contrast to the GEM algorithm, where Cw̃ is optimized directly, we take a slightly
different approach. We rely on the fact that the density fỹ(ỹ) corresponds to a Gaussian
mixture model. This allows us to use the EM algorithm for Gaussian mixtures, with some
modifications that we will describe next. Let us denote by Ck the covariance matrices of
the mixture components. Because of (20), the mixture covariance matrices should be subject
to the constraint zk Cũ + Cw̃ = Ck. Our method now consists of optimizing the expected
log-likelihood function (as in a regular EM algorithm Dempster et al. (1977)), but now subject
to the GSM constraint:

$$\max_{\Theta}\; E\left[\log f_{z|\tilde{\mathbf{y}}}(z \mid \tilde{\mathbf{y}}, \Theta) \,\middle|\, \tilde{\mathbf{y}}, \Theta^{(i)}\right] \quad \text{s.t.} \quad z_k C_{\tilde{u}} + C_{\tilde{w}} = C_k \qquad (28)$$
4 Here, values zmin and zmax from (Portilla et al., 2003, p. 1343) are slightly modified to have a good
sampling of the continuous pdf fz(z) with a small number of components K (for example, K = 6).

To solve this constrained problem, we use the augmented Lagrangian (AL) method. In the AL
method, a constrained problem is translated to an unconstrained problem with a Lagrange
multiplier and an extra penalty term. In our case, the unconstrained problem is given by:

$$\max_{\Theta}\; E\left[\log f_{z|\tilde{\mathbf{y}}}(z \mid \tilde{\mathbf{y}}, \Theta) \,\middle|\, \tilde{\mathbf{y}}, \Theta^{(i)}\right] - 2 \sum_{k=1}^{K} \mathrm{Vec}[a_k]^T\, \mathrm{Vec}\left[C_k - z_k C_{\tilde{x}} - C_{\tilde{w}}\right] - \sum_{k=1}^{K} \lambda_k \left\| C_k - z_k C_{\tilde{x}} - C_{\tilde{w}} \right\|_F^2 \qquad (29)$$

where ak, k = 1, ..., K are d × d matrices of Lagrange multipliers, λk are penalty factors, Vec[·]
converts a matrix to a column vector (e.g., using column stacking) and ‖·‖F is the matrix
Frobenius norm. Taking the derivatives of (29) with respect to Cx̃ and Cw̃ and setting them to zero
leads to a linear system of equations, in block matrix form:

$$\begin{pmatrix} \mu_2 I & \mu_1 I \\ \mu_1 I & I \end{pmatrix} \begin{pmatrix} C_{\tilde{x}}^{(i+1)} \\ C_{\tilde{w}}^{(i+1)} \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^{K} z_k \left(\lambda_k C_k + a_k^{(i)}\right) \\ \sum_{k=1}^{K} \left(\lambda_k C_k + a_k^{(i)}\right) \end{pmatrix} \qquad (30)$$
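Because every block of the system matrix in (30) is a multiple of the identity, the system can be solved in closed form with the inverse of the 2 × 2 matrix [[μ2, μ1], [μ1, 1]]. The sketch below does this for given scalars μ1 and μ2 and given right-hand side blocks; how μ1 and μ2 follow from the penalty factors is not shown here, and the function name and test values are assumptions.

```python
def solve_block_system(mu1, mu2, rhs_x, rhs_w):
    """Solve [[mu2*I, mu1*I], [mu1*I, I]] [Cx; Cw] = [rhs_x; rhs_w] for the d x d blocks.

    Uses the closed-form inverse (1 / (mu2 - mu1^2)) * [[I, -mu1*I], [-mu1*I, mu2*I]].
    """
    det = mu2 - mu1 * mu1
    if abs(det) < 1e-12:
        raise ValueError("system is singular: mu2 must differ from mu1^2")
    d = len(rhs_x)
    c_x = [[(rhs_x[i][j] - mu1 * rhs_w[i][j]) / det for j in range(d)] for i in range(d)]
    c_w = [[(mu2 * rhs_w[i][j] - mu1 * rhs_x[i][j]) / det for j in range(d)] for i in range(d)]
    return c_x, c_w

# Build a right-hand side from known (hypothetical) covariances and recover them.
mu1, mu2 = 0.8, 1.0
Cx = [[2.0, 0.0], [0.0, 1.0]]
Cw = [[1.0, 0.5], [0.5, 1.0]]
rhs_x = [[mu2 * Cx[i][j] + mu1 * Cw[i][j] for j in range(2)] for i in range(2)]
rhs_w = [[mu1 * Cx[i][j] + Cw[i][j] for j in range(2)] for i in range(2)]
print(solve_block_system(mu1, mu2, rhs_x, rhs_w))
```

The cost is linear in the number of matrix entries, so this step is negligible compared to the evaluation of the posterior probabilities.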
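Algorithm 2 Constrained EM algorithm for estimating the noise covariance matrix Cw̃.
  Initialization: $C_{\tilde{y}} = \frac{1}{N} \sum_{\mathbf{p}} \tilde{\mathbf{y}}(\mathbf{p})\, \tilde{\mathbf{y}}^T(\mathbf{p})$, $C_{\tilde{w}}^{(0)} = \frac{9}{10} C_{\tilde{y}}$, $C_{\tilde{x}}^{(0)} = 0.1\, C_{\tilde{y}}$, $C_k^{(0)} = z_k C_{\tilde{x}}^{(0)} + C_{\tilde{w}}^{(0)}$, $\alpha_k^{(0)} = \frac{1}{K}$, $\lambda_k = \frac{z_1}{z_k}$.
  repeat
    $\hat{\alpha}_k^{(i+1)} = \frac{1}{N} \sum_{\mathbf{p}} P\left(z = z_k \mid \tilde{\mathbf{y}}(\mathbf{p}), \Theta^{(i)}\right)$, for k = 1, ..., K
    $C_k^{(i+1)} = \dfrac{\sum_{\mathbf{p}} P\left(z = z_k \mid \tilde{\mathbf{y}}(\mathbf{p}), \Theta^{(i)}\right) \tilde{\mathbf{y}}(\mathbf{p})\, \tilde{\mathbf{y}}^T(\mathbf{p}) - 2 \lambda_k \left(z_k C_{\tilde{x}}^{(i)} + C_{\tilde{w}}^{(i)}\right) - a_k^{(i)}}{\sum_{\mathbf{p}} P\left(z = z_k \mid \tilde{\mathbf{y}}(\mathbf{p}), \Theta^{(i)}\right) - 2 \lambda_k}$, for k = 1, ..., K
    $\begin{pmatrix} C_{\tilde{x}}^{(i+1)} \\ C_{\tilde{w}}^{(i+1)} \end{pmatrix} = \dfrac{1}{\mu_2 - \mu_1^2} \begin{pmatrix} I & -\mu_1 I \\ -\mu_1 I & \mu_2 I \end{pmatrix} \begin{pmatrix} \sum_{k=1}^{K} z_k \left(\lambda_k C_k^{(i+1)} + a_k^{(i)}\right) \\ \sum_{k=1}^{K} \left(\lambda_k C_k^{(i+1)} + a_k^{(i)}\right) \end{pmatrix}$
    $a_k^{(i+1)} = a_k^{(i)} + \frac{\lambda_k}{2} \left(C_k^{(i+1)} - z_k C_{\tilde{x}}^{(i+1)} - C_{\tilde{w}}^{(i+1)}\right)$, for k = 1, ..., K
    i ← i + 1
  until convergence ($\left\| C_{\tilde{w}}^{(i+1)} - C_{\tilde{w}}^{(i)} \right\|_F < \epsilon$).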
Fig. 11. Overview of the proposed algorithm for the estimation of a parametric noise PSD.
(Blocks in the diagram: wavelet subbands; parameter estimation; estimate of the noise
autocorrelation function; parametric noise autocorrelation function; wavelet-based
autocorrelation decomposition; parametric autocorrelation function for each wavelet subband.)
denoising approaches (see Portilla (2004)), the covariance matrices are not directly related to
the noise PSD (in the sense that, after estimation of the covariance matrices, the noise PSD is
still unknown). We here present a novel approach to estimate the parameters of a parametric
noise PSD based on the covariance matrix estimation methods. As far as the authors are
aware, such a technique does not yet exist. This approach also combines all the different
techniques discussed in this Chapter. An overview of our algorithm is given in Figure 11.
First, the noise is assumed to have a PSD with an unknown set of parameters β. Consequently,
by the Wiener-Khinchin theorem, the noise autocorrelation function Rw,β(p) is known up to
these parameters. The wavelet-domain noise autocorrelation functions can be computed from
Rw,β(p), as explained in Section 3. Using formula (19), the parametric wavelet domain noise
covariance matrix Cw̃(β) can be found. Defining Rw(β) = Rw,β(p), the noise covariance matrix can be
expressed in terms of Rw(β) by using a matrix multiplication:
Note that in practice this equation may be iterated several times until convergence in
an inner iteration, before the other model parameters are updated. As an example,
consider the autocorrelation function from Figure 8, corresponding to the PSD P(ω) =
β sin(β|ω|) I[β|ω| < π], with I[·] the indicator function. Application of the inverse DTFT
πn
Substitution of (36) into (35) then gives the desired update step.
An interesting special case is the estimation of white Gaussian noise, with autocorrelation
function Rw,β (n ) = sδ(n ), with s the unknown noise variance. In this case, (34) comprises a
least-squares problem, with a linear solution.
7. Experimental results
In this Section, we will compare the performances of the noise estimation methods from
Section 5. For this task, both iterative algorithms (the GEM algorithm and the constrained EM
algorithm), are initialized using the same set of parameters. The initial values used are given
in Algorithm 2 and in (27). The number of mixture components used is 6: K = 6. Five images
(Barbara, Baboon, Lena, Boats and Peppers) are transformed to the wavelet domain, using the
Daubechies wavelet with two vanishing moments. Artificial Gaussian noise with a known
(ground-truth) autocorrelation function is added to each LH1 -subband, which allows us to
compute the estimation error afterwards. This ground-truth noise autocorrelation function is
πy
with σ ∈ {1, 5, 10, 15, 25, 50}. Then, after every iteration of both algorithms, the log-likelihood
function log fỹ|Θ(ỹ | Θ) and the quadratic error ‖Ĉw̃ − Cw̃‖²F are computed, which allows us
to compare the performance of both algorithms as a function of the iteration number i. Both
algorithms maximize the log-likelihood function; note, however, that this does not necessarily
result in minimizing the quadratic error. The results are shown in Figure 12 and Table 1. It can
be seen that while the GEM algorithm converges to its final value, on average the constrained
EM algorithm is able to reach a solution with a higher log-likelihood function and a lower
error. We remark that the objective function is non-convex, such that both algorithms can get
trapped in local maxima. Although both algorithms use the same initialization, in most of
the experiments (see Table 1) the constrained EM gives a more accurate estimate of the noise
covariance matrix.

Fig. 12. Comparison of the performance of the GEM algorithm Portilla (2004) and the
constrained EM algorithm (Subsection 5.2), as a function of the iteration number i. Results are
averaged over 5 images and 6 noise levels. (left) average log-likelihood log fỹ|Θ(ỹ | Θ), (right)
average estimation error in logarithmic scale 10 log10 ‖Ĉw̃ − Cw̃‖²F.

Table 1. Comparison of the performance of the GEM algorithm Portilla (2004) and the
constrained EM algorithm (CEM) from Subsection 5.2, for 5 images and 6 noise levels. Shown
is the estimation error in logarithmic scale 10 log10 ‖Ĉw̃ − Cw̃‖²F after 40 iterations.

              σ = 1           σ = 5           σ = 10          σ = 15          σ = 25          σ = 50
Image      CEM     GEM     CEM     GEM     CEM     GEM     CEM     GEM     CEM     GEM     CEM     GEM
Barbara   14.25   14.56  -12.98  -11.78  -26.63  -23.21  -29.29  -28.08  -36.50  -31.83  -37.93  -35.50
Baboon    24.02   28.74   -3.79    2.13  -14.84   -7.23  -21.88  -12.82  -25.60  -18.60  -30.75  -28.38
Lena       9.56   14.42  -16.77  -12.23  -23.25  -23.34  -29.54  -29.68  -37.99  -37.17  -38.38  -38.93
Boats      7.72    9.66  -17.77  -16.55  -30.35  -26.31  -30.09  -28.59  -35.83  -32.65  -37.90  -37.06
Peppers   11.28   17.06  -14.34   -9.92  -24.71  -21.25  -30.10  -27.86  -31.84  -34.05  -40.29  -36.51
Average   13.37   16.89  -13.13   -9.67  -23.96  -20.27  -28.18  -25.41  -33.55  -30.86  -37.05  -35.28
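The error measure reported in Table 1, the squared Frobenius norm of the difference between the estimated and the ground-truth covariance matrix on a logarithmic scale, can be computed as in the sketch below. The function name and the example matrices are assumptions.

```python
import math

def estimation_error_db(c_est, c_true):
    """10 log10 of the squared Frobenius norm of (c_est - c_true).

    Raises ValueError when both matrices are identical, because the logarithm is undefined.
    """
    sq = sum((a - b) ** 2
             for row_e, row_t in zip(c_est, c_true)
             for a, b in zip(row_e, row_t))
    if sq == 0.0:
        raise ValueError("matrices are identical: error in dB is undefined")
    return 10.0 * math.log10(sq)

c_true = [[1.0, 0.5], [0.5, 1.0]]   # hypothetical ground-truth covariance
c_est = [[1.1, 0.4], [0.4, 0.9]]    # hypothetical estimate
print(estimation_error_db(c_est, c_true))
```

Negative values, as in most columns of Table 1, correspond to a squared error below one.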
In Figure 13 and Figure 14, we used the noise estimation method based on the constrained
EM algorithm in combination with the BLS-GSM Portilla et al. (2003) denoising method, in
order to perform blind noise removal. An undecimated wavelet transform of 3 levels with the
Daubechies wavelet with eight vanishing moments was used. The PSD of the Gaussian noise
is given in the captions of Figure 13 and Figure 14. Clearly, the combined method is able to
distinguish signal information from noise information, leading to a successful removal of the
noise while preserving signal structures.
(a) Original image (b) With artificial noise (PSNR=20.17dB) (c) Denoised (PSNR=41.21dB)
Fig. 13. Blind denoising results (using the BLS-GSM denoising method and the proposed
constrained EM noise estimation technique). Noise PSD
$P(\omega) \sim \exp(-4000((\omega_x/\pi - 0.1)^2 + (\omega_y/\pi - 0.12)^2))$.
(a) Original image (b) With artificial noise (PSNR=17.25dB) (c) Denoised (PSNR=29.00dB)
Fig. 14. Blind denoising results (using the BLS-GSM denoising method and the proposed
constrained EM noise estimation technique). Noise PSD
$P(\omega) \sim \exp(-2000((\omega_x/\pi - 0.1)^2 + (\omega_y/\pi - 0.12)^2)) + \exp(-3000((\omega_x/\pi + 0.15)^2 + (\omega_y/\pi - 0.22)^2)) + 10^{-3}$.
8. Conclusion
In this chapter, we investigated the estimation of stationary colored noise, which is most
efficiently described in a Fourier basis using the power spectral density (PSD). Because of
the time or spatial locality of signal structures, estimation of colored noise is best performed
in a transform domain that can adapt to the signal locality. We have shown that
wavelets are very good candidates for this task: their vanishing moment properties allow
us to completely suppress smoothly varying signals, such that efficient noise estimation can
directly be performed on a single wavelet subband. However, in practice, signals are not
smoothly varying and may contain transitions (such as edges and textures in images). To take
this into account, we have presented several prior models for noise-free wavelet coefficients.
These prior models are then used in an expectation-maximization algorithm, which gives us
an estimate of the noise covariance matrix for a given wavelet subband. We have further
shown how this covariance matrix is related to the noise autocorrelation function in spatial or
time domain. This relationship can then be used, e.g., to estimate parameters of parametric
PSDs, yielding reliable and accurate estimates for noise PSDs. Because noise is present
in most real-life signals and images, many signal and image processing methods can be
further improved by taking advantage of estimated noise characteristics using techniques as
described in this chapter.
9. References
Abramovich, F., Sapatinas, T. & Silverman, B. (1998). Wavelet thresholding via a Bayesian
approach, J. of the Royal Statist. Society B 60: 725–749.
Achim, A., Bezerianos, A. & Tsakalides, P. (2001). Wavelet-based ultrasound image denoising
using an alpha-stable prior probability model, Proc. International Conference on Image
Processing, Vol. 2, pp. 221–224.
Aelterman, J., Deblaere, K., Goossens, B., Pižurica, A. & Philips, W. (2010). Dual Tree Complex
Wavelet-Based Denoising of correlated noise in 3D Magnetic Resonance Imaging.
Under revision.
Aelterman, J., Goossens, B., Pižurica, A. & Philips, W. (2010). Recent Advances in Signal
Processing, IN-TECH, chapter Suppression of Correlated Noise.
Andrews, D. & Mallows, C. (1974). Scale mixtures of normal distributions, J. Royal Stat. Soc.
36: 99–102.
Antonini, M., Barlaud, M., Mathieu, P. & Daubechies, I. (1992). Image coding using wavelet
transform., IEEE Trans. Image Process. 1(2): 205–220.
Baher, H. (2001). Analog and Digital Signal Processing, Wiley, Chichester.
Bayer, B. (1976). Color imaging array, United States Patent 3971065.
Borel, C., Cooke, B. & Laubscher, B. (1996). Partial Removal of Correlated noise in Thermal
Imagery, Proceedings of SPIE, Vol. 2759, pp. 131–138.
Campbell, N. A., Lopuhaä, H. P. & Rousseeuw, P. J. (1998). On the calculation of a robust
s-estimator of a covariance matrix., Stat Med 17(23): 2685–2695.
Candès, E. (1998). Ridgelets: Theory and Applications, PhD thesis, Department of Statistics,
Stanford University.
Candès, E., Demanet, L., Donoho, D. & Ying, L. (2006). Fast Discrete Curvelet Transforms,
Multiscale modeling and simulation 5(3): 861–899.
Chang, S. G., Yu, B. & Vetterli, M. (1998). Spatially adaptive wavelet thresholding with context
modeling for image denoising, Proc. IEEE Internat. Conf. on Image Proc., Chicago, IL,
USA.
Clyde, M., Parmigiani, G. & Vidakovic, B. (1998). Multiple shrinkage and subset selection in
wavelets, Biometrika 85(2): 391–401.
Crouse, M. S., Nowak, R. D. & Baraniuk, R. G. (1998). Wavelet-based statistical signal
processing using hidden Markov models, IEEE Trans. Signal Proc. 46(4): 886–902.
Daubechies, I. (1992). Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics,
Philadelphia.
Dempster, A., Laird, N. & Rubin, D. (1977). Maximum likelihood from incomplete data via
the EM algorithm, Journal of the Royal Statistical Society, Series B 39(1): 1–38.
Do, M. N. & Vetterli, M. (2003). The finite ridgelet transform for image representation, IEEE
Trans. Image Processing 12(1): 16–28.
Do, M. N. & Vetterli, M. (2005). The contourlet transform: An efficient directional
multiresolution image representation, IEEE Trans. Image Process. 14(12): 2091–2106.
Donoho, D. L. & Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet
shrinkage, Journal of the American Statistical Association 90(432): 1200–1224.
Fadili, J. M. & Boubchir, L. (2005). Analytical form for a Bayesian wavelet estimator of images
using the Bessel K form densities, IEEE Trans. on Image Process. 14(2): 231–240.
Fan, G. & Xia, X. (2001). Image denoising using local contextual hidden Markov model in the
wavelet domain, IEEE Signal Processing Letters 8(5): 125–128.
Field, D. J. (1987). Relations between the statistics of natural images and the response
properties of cortical cells, J. Opt. Soc. Am. A 4(12): 2379–2394.
Fischer, S., Šroubek, F., Perrinet, L., Redondo, R. & Cristobal, G. (2007). Self-Invertible 2D
Log-Gabor Wavelets, International Journal of Computer Vision 75(2): 231–246.
Gómez, E., Gómez-Villegas, M. A. & Marín, J. M. (2008). Multivariate exponential
power distributions as mixtures of normal distributions with Bayesian applications,
Communications in Statistics - Theory and Methods 37(6): 972–985.
Goossens, B., Aelterman, J., Pižurica, A. & Philips, W. (2010). A Recursive Scheme for
Computing Autocovariance functions of complex wavelet subbands, IEEE Trans.
Signal Processing 58(7): 3907–3912.
Guo, K. & Labate, D. (2007). Optimally Sparse Multidimensional Representation using
Shearlets, SIAM J Math. Anal. 39: 298–318.
Johnstone, I. M. & Silverman, B. W. (1997). Wavelet threshold estimators for data with
correlated noise, Journal of the Royal Statistical Society B 59(2): 319–351.
Kingsbury, N. G. (2001). Complex wavelets for shift invariant analysis and filtering of signals,
Journal of Applied and Computational Harmonic Analysis 10(3): 234–253.
Kotz, S., Kozubowski, T. J. & Podgorski, K. (2000). An asymmetric multivariate laplace
distribution, Computational Statistics 4: 531–540.
Kotz, S., Kozubowski, T. & Podgorski, K. (2001). The Laplace Distributions And
Generalizations: A Revisit with Applications to Communications, Economics, Engineering,
Finance, Birkhäuser, Boston.
Kwon, O., Sohn, K. & Lee, C. (2003). Deinterlacing using Directional Interpolation and Motion
Compensation, IEEE Trans. Consumer Electronics 49(1): 198–203.
Lee, T. (1996). Image Representation Using 2D Gabor Wavelets, IEEE Trans. Pattern Analysis
and Machine Intelligence 18(10): 1.
Mallat, S. (1989). Multifrequency channel decomposition of images and wavelet models, IEEE
Trans. Acoust., Speech, Signal Proc. 37(12): 2091–2110.
Mallat, S. (1999). A Wavelet Tour of Signal Processing, Academic Press.
Moulin, P. & Liu, J. (1999). Analysis of multiresolution image denoising schemes using
generalized-gaussian and complexity priors, IEEE Trans. Info. Theory, Special Issue on
Multiscale Analysis 3(3): 909–919.
Nikias, C. L. & Shao, M. (1995). Signal Processing with Alpha-Stable Distributions and
Applications, Wiley-Interscience.
Nowak, R. (1999). Wavelet-based rician noise removal for magnetic resonance imaging., IEEE
Trans Image Process 8(10): 1408–1419.
Pena, D. & Prieto, F. (2001). Multivariate outlier detection and robust covariance matrix
estimation, Technometrics 43(3): 286–310.
Pižurica, A. & Philips, W. (2006). Estimating the probability of the presence of a signal of
interest in multiresolution single- and multiband image denoising., IEEE Trans. Image
Process. 15(3): 654–665.
Pižurica, A., Philips, W., Lemahieu, I. & Acheroy, M. (2003). A versatile wavelet domain noise
filtration technique for medical imaging, IEEE Trans. Medical Imaging 22(3): 323–331.
Portilla, J. (2004). Full blind denoising through noise covariance estimation using Gaussian
Scale Mixtures in the wavelet domain, IEEE Int. Conf. on Image Process. (ICIP)
2: 1217–1220.
Portilla, J. & Simoncelli, E. (2001). Adaptive Wiener Denoising using a Gaussian Scale Mixture
Model in the Wavelet Domain, IEEE Int. Conf. on Image Process. (ICIP) 2: 37–40.
Portilla, J., Strela, V., Wainwright, M. & Simoncelli, E. (2003). Image denoising using scale
mixtures of gaussians in the wavelet domain, IEEE Transactions on image processing
12(11): 1338–1351.
Rabbani, H., Vafadust, M., Gazor, S. & Selesnick, I. W. (2006). Image Denoising Employing
a Bivariate Cauchy Distribution with Local Variance in Complex Wavelet Domain,
12th Digital Signal Processing Workshop - 4th Signal Processing Education Workshop,
pp. 203–208.
Romberg, J., Choi, H. & Baraniuk, R. G. (2001). Bayesian tree structured image modeling using
wavelet-domain Hidden Markov Models, IEEE Trans. Image Process. 10(7): 1056–1068.
Selesnick, I. W. (2006). Laplace random vectors, Gaussian noise, and the generalized
incomplete Gamma function, Proc. IEEE Int. Conf. on Image Process., pp. 2097–2100.
Selesnick, I. W., Baraniuk, R. G. & Kingsbury, N. G. (2005a). The Dual-Tree Complex Wavelet
Transform, IEEE Signal Processing Magazine 22(6): 123–151.
Selesnick, I. W., Baraniuk, R. G. & Kingsbury, N. G. (2005b). The Dual-Tree Complex Wavelet
Transform, IEEE Signal Processing Magazine 22(6): 123–151.
Shi, F. & Selesnick, I. W. (2006). Multivariate Quasi-Laplacian Mixture Models for
Wavelet-based Image Denoising, Proc. Int. Conf. on Image Processing (ICIP),
pp. 2097–2100.
Simoncelli, E., Freeman, W. T., Adelson, E. H. & Heeger, D. J. (1992). Shiftable Multi-scale
Transforms, IEEE Trans. Information Theory 38(2): 587–607.
Simoncelli, E. P. & Adelson, E. H. (1996). Noise removal via Bayesian wavelet coring, Proc.
IEEE Internat. Conf. Image Proc. ICIP, Lausanne, Switzerland.
Srivastava, A., Lee, A. B., Simoncelli, E. & Zhu, S.-C. (2003). On Advances in Statistical
Modeling of Natural Images, Journal of Mathematical Imaging and Vision 18: 17–33.
Srivastava, A., Liu, X. & Grenander, U. (2002). Universal Analytical Forms for Modeling Image
Probabilities, IEEE Trans. Pattern Analysis and Machine Intelligence 24(9): 1200–1214.
Tzikas, D., Likas, A. & Galatsanos, N. (2007). Variational bayesian blind image deconvolution
with student-t priors, Proc. IEEE International Conference on Image Processing ICIP 2007,
Vol. 1, pp. I–109–I–112.
Van De Ville, D. & Unser, M. (2008). Complex Wavelet Bases, Steerability, and the Marr-like
pyramid, IEEE Trans. Image Processing 17(11): 2063–2080.
Vo, A., Nguyen, T. & Oraintara, S. (2007). Image denoising using shiftable directional pyramid
and scale mixtures of complex gaussians, Proc. IEEE International Symposium on
Circuits and Systems ISCAS 2007, pp. 4000–4003.
Wainwright, M. J. & Simoncelli, E. P. (2000). Scale mixtures of Gaussians and the statistics of
natural images, Adv. Neural Information Processing Systems (NIPS 1999) 12: 855–861.
16
1. Introduction
The time-independent neutron transport equation derived from the Boltzmann equation with
a linear collision kernel models the neutron population in the six-dimensional space defined
by $r \in D$ the space variable, $\vec{\Omega} \in S^2$ the direction of motion variable and
$E \in B = ]E_{G+1}, E_1[$ the energy variable. It represents the balance between the neutrons
entering the hypervolume $d^3r\, d^2\Omega\, dE$ about $(r, \vec{\Omega}, E)$ by fission or scattering
and those leaving by streaming or any kind of interactions. The unknown is the so-called neutron
flux $\phi(r, \vec{\Omega}, E) = v(E)\, n(r, \vec{\Omega}, E)$ with $n(r, \vec{\Omega}, E)$ the neutron
density and $v(E)$ the neutron velocity. The problem is defined in terms
of the neutron interaction properties of the different materials i.e. the cross sections.
The solution of this equation in a deterministic way proceeds by the successive discretization
of the three variables: energy, angle and space. The treatment of the energy variable invariably
consists in a multigroup discretization which considers the cross sections and the flux to be
constant within a group (i.e. a cell of the 1D energy mesh). A pre-homogenization of the cross
sections is performed at the library processing level using a spatially independent weighting
flux (e.g. 1/E spectrum in the epithermal range).
With a broad group structure (≈ 100 to 2000 energy groups), this prior homogenization is
insufficient to take into account the case-specific, spatially-dependent self-shielding effect, i.e.
the local flux depression in the vicinity of resonances that largely affects the neutron balance.
As a consequence, a neutron transport calculation has to incorporate a so-called self-shielding
model to correct the group cross sections of resonant isotopes. This homogenization stage
of a neutron transport calculation is known to be a main source of errors for deterministic
methods; as a consequence, substantial work has been carried out to improve it. An
optimized energy mesh structure (Mosca et al., 2011) in addition to an advanced self-shielding
model (Hébert, 2007) is incorporated in state-of-the-art transport codes.
A different treatment for the energy variable based on a finite element approach is the basis
of the present work. Such an avenue was proposed in the past by (Allen, 1986) but seldom
used in practice. Indeed, finite element methods are commonly based on polynomial function
bases which are not appropriate for non-smooth behavior.
Recently, two independent works by (Le Tellier et al., 2009) and (Yang et al., 2010) have
proposed wavelet-Galerkin methods to overcome this issue. In this chapter, after a review
of these two approaches, we will focus on the development in this framework of adaptive
algorithms with a control (at least, partial) of the discretization error. Such algorithms have
been partially presented in a previous conference presentation by (Fournier & Le Tellier, 2009)
but this book chapter gives a more in-depth presentation and updated numerical results for
algorithms that may be of interest for other applications of wavelet-based finite elements.
Such algorithms are analyzed in a limited framework (the fine structure flux equation for
a single isotope diluted in a mixture of non-resonant isotopes in an infinite homogeneous
medium) but the relevant issues regarding their extension to the general case are discussed.
Note that Eq. 5 introduces a coupling between the different modes within a group on the left
hand side of the transport equation i.e. a coupling of the angular flux projections
$\phi^g(r, \vec{\Omega})$ that is not present for the standard multigroup approach.
Two different approaches by (Le Tellier et al., 2009) and (Yang et al., 2010) based on compactly
supported Daubechies wavelets (Daubechies, 1992) have been proposed so far to deal with
this coupling:
1. in (Yang et al., 2010), a dilation order is fixed and the basis consists of the translates of the
associated scaling function; in this case, $\Sigma_t^g(r)$ is a band matrix and the mode coupling is
limited in such a way that a Richardson iterative scheme can be employed to resolve this
coupling.
2. in (Le Tellier et al., 2009), both dilates and translates of the mother wavelet functions are
retained in the basis according to a thresholding procedure applied to the discrete wavelet
transform of either the total cross section $\Sigma_t$ or an approximate flux. In this case, the basis
selection can be optimized but the modes are tightly coupled; a procedure based on a
change of basis through a matrix diagonalization has been proposed to explicitly decouple
the equations.
This second approach proceeds as follows. Let us consider that the nuclear data are known
by their projections on a set of orthonormal functions $(g_k^g)_{k \in [1,N_g]}$ in each group e.g.
$$\Sigma_t(r, E) = \hat{\Sigma}_t^{gT}(r)\, g^g(E). \qquad (7)$$
At this stage, $g^g$ is assumed to be spatially uniform. This condition is satisfied, for example, if
the same set of functions is considered for all the isotopes of a given configuration.
Considering the isomorphism between the Hilbert space $F_g = \mathrm{span}(g_1^g, \ldots, g_{N_g}^g)$
and $\mathbb{R}^{N_g}$, we can construct an orthonormal basis $(f_n^g)_{n \in [1,N_g]}$ of $F_g$ in such
a way that the different functions $f_n^g$ are $\Sigma_t$-orthogonal. Indeed,
$$\tilde{\Sigma}_t^g(r) = \int_{I_g} dE\, g^g(E)\, \Sigma_t(r, E)\, g^{gT}(E) \qquad (8)$$
is unitarily similar to a diagonal matrix (see (Le Tellier et al., 2009) for more details) i.e.
$$\tilde{\Sigma}_t^g(r) = C^g(r)\, \Sigma_t^g(r)\, C^{gT}(r), \qquad (9)$$
with
• $C^g(r)$ = a unitary matrix containing the eigenvectors of $\tilde{\Sigma}_t^g(r)$,
• $\Sigma_t^g(r)$ = a diagonal matrix containing its eigenvalues.
$$\Sigma_t(r, E) = \sum_g \Pi_{I_g}(E) \sum_i \Pi_{D_i}(r)\, \hat{\Sigma}_{ti}^{gT}\, g^g(E), \qquad (10)$$
where $\Pi_{D_i}$ is the characteristic function of $D_i$ and the flux is expanded as
For a given $i$, $f_i^g$ is uniform on $D_i$ and is obtained by diagonalizing $\tilde{\Sigma}_{ti}^g$ as previously
described.
For $r$ belonging to a uniform medium domain $D_i$, Eq. 4 can be written without any
complication of the streaming term. In fact, this formulation of the transport equation is
similar to the standard multigroup form. In this case, the mode coupling only appears for
the conditions at the interface $\Gamma_{ij}$ between two uniform medium domains $D_i$ and $D_j$ along
$\vec{\Omega}$. The continuity of $\phi(r, \vec{\Omega}, E)$ at $r \in \Gamma_{ij}$ implies directly the continuity of
$\phi^g(r, \vec{\Omega})$ in the standard multigroup case:
$$\phi_j^g(r, \vec{\Omega}) = \phi_i^g(r, \vec{\Omega}), \qquad (12)$$
while, in our case, it translates into
$$\phi_j^g(r, \vec{\Omega}) = C_j^{gT} C_i^g\, \phi_i^g(r, \vec{\Omega}). \qquad (13)$$
When crossing an interface between two uniform medium domains, a change of basis with
respect to the energy expansion has to be performed in order to maintain a diagonal group
transport operator over the whole domain.
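The change of basis of Eq. 13 can be illustrated with a minimal sketch. It assumes real orthogonal matrices for $C_i^g$ and $C_j^g$; the 2×2 rotation matrices, the function names and the flux values are hypothetical stand-ins for the eigenvector matrices of two media, not data from this work.

```python
import math

def rotation(theta):
    """2x2 orthogonal matrix, used here as a stand-in for an eigenvector matrix C."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def cross_interface(c_i, c_j, phi_i):
    """Apply Eq. 13: phi_j = C_j^T C_i phi_i."""
    return matvec(transpose(c_j), matvec(c_i, phi_i))

c_i = rotation(0.3)      # hypothetical basis of medium i
c_j = rotation(1.1)      # hypothetical basis of medium j
phi_i = [2.0, -1.0]      # illustrative modal flux components in medium i
phi_j = cross_interface(c_i, c_j, phi_i)
```

Because both matrices are orthogonal, the Euclidean norm of the modal flux vector is unchanged by the crossing, and the transformation reduces to the identity (Eq. 12) when the two media share the same basis.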
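$$\left(\sigma_t^{*g} + \sigma_d\right) \phi^g = \sum_{g'=1}^{G} \sigma_s^{*g \leftarrow g'}\, \phi^{g'} + \sigma_d \int_{I_g} dE\, f^g(E), \qquad (14)$$
$$H\Phi = S\Phi + Q. \qquad (15)$$
We will also consider that the source-flux coupling in Eq. 15 is solved by a simple Richardson
iterative scheme of the form
$$H\Phi^{n+1} = S\Phi^n + Q. \qquad (16)$$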
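The Richardson scheme of Eq. 16 can be sketched as follows, under the simplifying assumption that $H$ is diagonal (so that applying $H^{-1}$ is a division). The function name, the stopping rule on the relative change between iterates, and the small 3×3 operators are illustrative assumptions.

```python
def richardson(h_diag, s, q, tol=1e-10, max_iter=500):
    """Iterate H phi_{n+1} = S phi_n + Q for a diagonal H given by h_diag."""
    phi = [0.0] * len(q)
    for iteration in range(1, max_iter + 1):
        # Right-hand side S phi_n + Q
        rhs = [sum(a * x for a, x in zip(row, phi)) + qi for row, qi in zip(s, q)]
        # Solve the diagonal system H phi_{n+1} = rhs
        new = [r / h for r, h in zip(rhs, h_diag)]
        change = max(abs(a - b) for a, b in zip(new, phi))
        phi = new
        if change <= tol * max(abs(x) for x in phi):
            return phi, iteration
    return phi, max_iter

# Hypothetical operators chosen so that the iteration contracts.
h_diag = [2.0, 3.0, 4.0]
s = [[0.2, 0.1, 0.0],
     [0.1, 0.3, 0.1],
     [0.0, 0.2, 0.4]]
q = [1.0, 0.0, 0.5]
phi, n_iter = richardson(h_diag, s, q)
```

The iteration converges here because the induced norm of $H^{-1}S$ is below one for these values; for operators that do not satisfy this condition the loop simply stops at `max_iter`.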
where $\alpha_{j,k}$ and $\beta_{j,k}$ correspond to the orthogonal projection of $\phi$ on $\theta_{j,k}$ and $\gamma_{j,k}$ respectively.
In the present work, we consider for the basis functions $g^g$ in each group $I_g$ a subset of
$\{(\theta_{0,k})_k, (\gamma_{j,k})_{j,k}\}$ obtained by the sampling, discrete wavelet transform and thresholding
of $\sigma_t^{*g}(E)$ or an approximate flux restricted to $I_g$. This is to be distinguished from the work
of (Yang et al., 2010) where the basis is composed of the scaling functions for a given dilation
order $j$ i.e. $g^g = (\theta_{j,k})_k$.
In the following, we restrict ourselves to compactly supported wavelets introduced by
(Daubechies, 1992) constructed starting from a function $m_0(\xi) = \frac{1}{\sqrt{2}} \sum_k h_k e^{-ik\xi}$ where $h_k$
are real-valued coefficients such that only a finite number $M$ (the support length) of $h_k$ are
non-zero. In this context, the MRA obeys
and the decomposition of a sampled $N$-length signal is obtained efficiently by the discrete
wavelet transform (DWT) based on the cascade algorithm proposed in (Mallat, 1989).
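As an illustration of the cascade structure of the DWT, the sketch below uses the Haar filter ($h_0 = h_1 = 1/\sqrt{2}$), the shortest compactly supported Daubechies filter. This choice, the function names and the sample values are assumptions made to keep the example short; longer Daubechies filters follow the same cascade but require boundary handling that is omitted here.

```python
import math

def haar_step(signal):
    """One analysis level: returns (approximation, detail), each of half length."""
    r = 1.0 / math.sqrt(2.0)
    half = len(signal) // 2
    approx = [r * (signal[2 * k] + signal[2 * k + 1]) for k in range(half)]
    detail = [r * (signal[2 * k] - signal[2 * k + 1]) for k in range(half)]
    return approx, detail

def haar_dwt(signal, levels):
    """Cascade: re-apply haar_step to the approximation at each level.

    The signal length is assumed to be divisible by 2**levels.
    Returns the coarsest approximation and the details, finest level first.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

samples = [4.0, 2.0, 5.0, 5.0, 1.0, 1.0, 3.0, 7.0]   # illustrative sampled signal
coarse, details = haar_dwt(samples, 3)
```

Since the transform is orthonormal, the sum of squares of all output coefficients equals that of the input samples, which gives a simple sanity check of an implementation.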
where $(\tilde{\beta}_{j,k})_{j,k}$ is obtained from $(\beta_{j,k})_{j,k}$ and $\#(\tilde{\beta}_{j,k})_{j,k} \ll \#(\beta_{j,k})_{j,k}$.
A natural criterion is to discard coefficients lower than a given cut-off $\varepsilon$ i.e.
$$\tilde{\beta}_{j,k} = \begin{cases} 0 & \text{if } |\beta_{j,k}| \le \varepsilon \max_{j,k}(\beta_{j,k}), \\ \beta_{j,k} & \text{otherwise.} \end{cases} \qquad (21)$$
This method is called hard thresholding. We refer the interested reader to (Le Tellier et al.,
2009) for a comparison of different wavelet filters and thresholding strategies in this context.
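The hard thresholding rule of Eq. 21 can be sketched in a few lines. The function name and the coefficient values are illustrative; the maximum is taken here over the absolute values of the coefficients, which is an assumption about how the cut-off is normalized.

```python
def hard_threshold(coeffs, eps):
    """Zero every coefficient with |beta| <= eps * max|beta| and keep the others."""
    cutoff = eps * max(abs(b) for b in coeffs)
    return [0.0 if abs(b) <= cutoff else b for b in coeffs]

beta = [9.9, -0.7, 0.0, 2.8, 0.05, -1.4, 0.0]   # illustrative wavelet coefficients
beta_tilde = hard_threshold(beta, 0.1)
kept = sum(1 for b in beta_tilde if b != 0.0)    # size of the retained support
```

Retained coefficients keep their original value, in contrast to soft thresholding where they would also be shrunk towards zero.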
3. Adaptivity
In the context of Eq. 16, adaptive algorithms aim at improving the discretization of the operators
during the iterative process by dynamically selecting the basis functions and, consequently,
at optimizing the computational cost and controlling (at least partially) the error on the final solution.
The proposed algorithms aim at reducing the computational cost defined as the sum of the
support sizes at each iteration:
$$cost = \sum_{i=1}^{nbIter} \left( \#\Lambda_i^A + \#\Lambda_i^S \right), \qquad (22)$$
where $\Lambda_i^S$ (resp. $\Lambda_i^A$) represents the support of operator $S$ (resp. $A$) at iteration $i$. Indeed,
the computational cost required to solve Eq. 16 is directly linked to the size of the operators
manipulated: $\Lambda_i^S$ for the construction of matrix $S_i$ and $\Lambda_i^A$ for the order of the system used for
the iterations. This justifies the use of Eq. 22 as a measure of the computational cost of the algorithm.
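The cost measure of Eq. 22 amounts to counting non-zero coefficients per iteration. In the sketch below, the supports are represented by the non-zero entries of hypothetical right-hand-side and flux vectors; the vectors and function names are assumptions for illustration.

```python
def support_size(coeffs):
    """Number of non-zero coefficients, i.e. the size # of a support."""
    return sum(1 for c in coeffs if c != 0.0)

def total_cost(sizes_a, sizes_s):
    """Eq. 22: sum over iterations of #Lambda_i^A + #Lambda_i^S."""
    return sum(a + s for a, s in zip(sizes_a, sizes_s))

# Hypothetical iterates over three iterations (supports grow as accuracy improves).
flux_iterates = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.4, 0.0, 0.0], [1.1, 0.5, 0.1, 0.0]]
rhs_iterates = [[2.0, 0.3, 0.0, 0.0], [2.1, 0.4, 0.2, 0.0], [2.1, 0.4, 0.2, 0.1]]
sizes_a = [support_size(v) for v in flux_iterates]
sizes_s = [support_size(v) for v in rhs_iterates]
cost = total_cost(sizes_a, sizes_s)
```

With this measure, an algorithm that performs many iterations on small supports can be compared directly with one that performs few iterations on large supports.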
Our work differs from the approach in (Cohen, 2003) where the goal was to minimize the
final support. Here, the purpose is to find a balance between the number of iterations and
the support size. In the following, two different algorithms are presented and tested. Both are
based on a decomposition of the error in terms of the residual of the Richardson iterations ($\delta_{res}$) and
the errors due to the discretization of the $A$ and $S$ operators (denoted $\delta_A$ and $\delta_S$ respectively):
$$\frac{\|\Phi^{n+1} - \Phi\|}{\|\Phi^{n+1}\|} \le \frac{1}{1 - \|AS\|} \left( \delta_A + \delta_S + \delta_{res} \right) = NB. \qquad (23)$$
Sections 3.2 and 3.3 make this bound explicit for both algorithms. The first version, inspired by
(Cohen, 2003), uses two levels of iterations: one to increase the support and one to
converge the residual. The single-loop algorithm is proposed as a simplification of the first
one, and a way to correlate the errors on the operators and the residual is detailed.
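with $A_{j+1}$ (resp. $S_{j+1}$) representing matrix $A$ (resp. $S$) restricted to the $\Lambda^A_{j+1}$ (resp. $\Lambda^S_{j+1}$) support.
The error is given by:
$$\begin{aligned} \Phi_{j+1}^{n+1} - \Phi &= A_{j+1}\left(S_{j+1}\Phi_{j+1}^{n} + Q\right) - A\left(S\Phi + Q\right) \\ &= \left(A_{j+1} - A\right)\left(S_{j+1}\Phi_{j+1}^{n} + Q\right) + A\left(S_{j+1} - S\right)\Phi_{j+1}^{n} + AS\left(\Phi_{j+1}^{n} - \Phi\right). \end{aligned} \qquad (25)$$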
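It follows that the relative error can be expressed by Eq. 23 with
$$\delta_S = \|A\|\, \frac{\|(S_{j+1} - S)\Phi_{j+1}^{n}\|}{\|\Phi_{j+1}^{n+1}\|}, \qquad (26)$$
$$\delta_A = \frac{\|(A_{j+1} - A)(S_{j+1}\Phi_{j+1}^{n} + Q)\|}{\|\Phi_{j+1}^{n+1}\|}, \qquad (27)$$
$$\delta_{res} = \|AS\|\, \frac{\|\Phi_{j+1}^{n+1} - \Phi_{j+1}^{n}\|}{\|\Phi_{j+1}^{n+1}\|}. \qquad (28)$$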
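For completeness, the step from Eq. 25 to the bound of Eq. 23 can be sketched as follows. This is a reconstruction of the intermediate manipulations, assuming a sub-multiplicative norm and $\|AS\| < 1$ (the condition for the convergence of the Richardson iterations).

```latex
\begin{align*}
\Phi_{j+1}^{n} - \Phi
  &= \left(\Phi_{j+1}^{n} - \Phi_{j+1}^{n+1}\right) + \left(\Phi_{j+1}^{n+1} - \Phi\right), \\
(I - AS)\left(\Phi_{j+1}^{n+1} - \Phi\right)
  &= (A_{j+1} - A)\left(S_{j+1}\Phi_{j+1}^{n} + Q\right)
   + A\,(S_{j+1} - S)\,\Phi_{j+1}^{n}
   - AS\left(\Phi_{j+1}^{n+1} - \Phi_{j+1}^{n}\right), \\
\left\|(I - AS)^{-1}\right\|
  &\le \sum_{k \ge 0} \|AS\|^{k} = \frac{1}{1 - \|AS\|}, \\
\frac{\left\|\Phi_{j+1}^{n+1} - \Phi\right\|}{\left\|\Phi_{j+1}^{n+1}\right\|}
  &\le \frac{1}{1 - \|AS\|}\left(\delta_A + \delta_S + \delta_{res}\right).
\end{align*}
```

The first line is substituted into the last term of Eq. 25, the term in $\Phi_{j+1}^{n+1} - \Phi$ is moved to the left-hand side, and the triangle inequality applied to the three remaining terms gives the three contributions of Eqs. 26 to 28.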
A main issue is the choice of the matrices $S_{j+1}$ and $A_{j+1}$ or, in other words, the selection of the
wavelet supports. The idea in the remainder is to monitor the errors related to the operator
discretizations using the numerical residual in order to obtain a relation of the type:
$$\frac{\|\Phi_{j+1}^{n+1} - \Phi\|}{\|\Phi_{j+1}^{n+1}\|} \le K\, \frac{\|\Phi_{j+1}^{n+1} - \Phi_{j+1}^{n}\|}{\|\Phi_{j+1}^{n+1}\|}, \qquad (29)$$
where $K$ is a given constant. The error on the flux is thus controlled by the residual at each
iteration.
The error $\delta_S$ (resp. $\delta_A$) can be practically controlled by a thresholding on the product
$S\Phi^n$ (resp. $A(S_{j+1}\Phi_{j+1}^{n} + Q)$) ensuring:
$$\|(S_{j+1} - S)\Phi_{j+1}^{n}\| \le \tilde{\epsilon}_{j+1}\, \|\Phi_{j+1}^{n}\|, \qquad (30)$$
$$\|(A_{j+1} - A)(S_{j+1}\Phi_{j+1}^{n} + Q)\| \le \epsilon_{j+1}\, \|\Phi_{j+1}^{n}\|. \qquad (31)$$
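Remaining coefficients give the new supports $\Lambda^S_{j+1}$ and $\Lambda^A_{j+1}$ such that $\#\Lambda^S_{j+1} \ll \#\Lambda^S$ and
$\#\Lambda^A_{j+1} \ll \#\Lambda^A$ where $\Lambda^S$ and $\Lambda^A$ are the supports of the $S$ and $A$ operators approximated by
a large number of coefficients. The localization property of wavelets ensures that these two
supports slowly increase when $\tilde{\epsilon}_{j+1}$ and $\epsilon_{j+1}$ decrease (see (Cohen, 2003) for more details).
By applying the procedures exposed above to $A$ and $S$, Eq. 23 becomes:
$$\frac{\|\Phi_{j+1}^{n+1} - \Phi\|}{\|\Phi_{j+1}^{n+1}\|} \le \frac{1}{1 - \|AS\|} \left( \epsilon_{j+1} \frac{\|\Phi_{j+1}^{n}\|}{\|\Phi_{j+1}^{n+1}\|} + \tilde{\epsilon}_{j+1}\, \|A\|\, \frac{\|\Phi_{j+1}^{n}\|}{\|\Phi_{j+1}^{n+1}\|} + \|AS\|\, \frac{\|\Phi_{j+1}^{n+1} - \Phi_{j+1}^{n}\|}{\|\Phi_{j+1}^{n+1}\|} \right). \qquad (32)$$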
Note however that the thresholding procedure described for operator $A$ in Eq. 31 cannot
be applied in the general context of the spatially-dependent transport equation (Eq. 4). A
possibility is to use the same support for operators $A$ and $S$. Such a solution has been tested in
(Fournier & Le Tellier, 2009). Even if the convergence is deteriorated compared to the solution
with two different supports for $S$ and $A$, the results are interesting and show that the adaptive
algorithms proposed in this book chapter can be extended to the general problem.
As proposed in (Cohen, 2003), a geometrically decreasing sequence $\epsilon_j$ is fixed and iterations
on $n$ are performed until the residual becomes smaller than the value imposed by this sequence.
To link $\tilde{\epsilon}_j$ and $\epsilon_j$, we ensure that the first two terms defined in Eq. 32 decay at the same rate by
imposing:
$$\tilde{\epsilon}_{j+1} = \frac{\epsilon_{j+1}}{\|A\|}. \qquad (33)$$
At a given iteration $j$, Richardson iterations are carried out in order to ensure:
$$\frac{\|\Phi_{j+1} - \Phi_j\|}{\|\Phi_{j+1}\|} \le \frac{\epsilon_{j+1}}{\|AS\|}. \qquad (34)$$
Combining Eqs. 33 and 34 with the bound of Eq. 32 guarantees the convergence of the error:
$$\frac{\|\Phi_{j+1} - \Phi\|}{\|\Phi_{j+1}\|} \lesssim \frac{3\,\epsilon_{j+1}}{1 - \|AS\|}. \qquad (35)$$
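        tmp = $S\Phi_{j-1}^{n-1}$ ;
        prod = Thresholding(tmp, $\tilde{\epsilon}_j$) ;
        % remove smallest coefficients of tmp, guarantee $\|tmp - prod\| \le \tilde{\epsilon}_j \|\Phi_{j-1}^{n-1}\|$
        $R_j^n$ = prod + $Q$ ;
        $\Lambda^{S_j^n}$ = Support($R_j^n$) ;
        $\Phi_j^n$ = $H^{-1} R_j^n$ ;
        $\Phi_j^n$ = Thresholding($\Phi_j^n$, $\tilde{\epsilon}_j \|H^{-1}\|$) ;
        $\Lambda^{A_j^n}$ = Support($\Phi_j^n$) ;
        err ← $\|\Phi_j^n - \Phi_j^{n-1}\| / \|\Phi_j^n\|$ ;
        n ← n + 1 ;
    end
    $\Phi_{j+1}$ = $\Phi_j^n$ ;
end
Algorithm 1: two-loop adaptive algorithm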
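The structure of a two-loop scheme of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used for the results of this chapter: $H$ is taken diagonal, the max-norm is used for vectors and operators, the tolerance is halved at each outer iteration, the thresholding simply zeroes entries below the tolerance, and an iteration cap `max_inner` is added so that the inner loop always terminates. All names and numerical values are hypothetical.

```python
def matvec(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def norm_inf(v):
    return max(abs(x) for x in v)

def threshold(v, tol):
    """Zero entries with |x| <= tol, so the discarded part has max-norm <= tol."""
    return [x if abs(x) > tol else 0.0 for x in v]

def support_size(v):
    return sum(1 for x in v if x != 0.0)

def two_loop(h_diag, s, q, eps0=0.5, ratio=0.5, n_outer=25, max_inner=100):
    # Induced max-norms of A = H^{-1} and of AS for a diagonal H (assumed norm).
    norm_a = max(1.0 / abs(h) for h in h_diag)
    norm_as = max(sum(abs(x) for x in row) / abs(h) for row, h in zip(s, h_diag))
    phi = [qi / h for qi, h in zip(q, h_diag)]      # starting guess A Q
    eps, cost = eps0, 0
    for _ in range(n_outer):                        # outer loop: tighten the tolerance
        for _ in range(max_inner):                  # inner loop: Richardson iterations
            scale = norm_inf(phi)
            # Thresholded product S phi, tolerance eps / ||A|| (cf. Eqs. 30 and 33)
            prod = threshold(matvec(s, phi), eps / norm_a * scale)
            rhs = [p + qi for p, qi in zip(prod, q)]
            # Apply H^{-1}, then threshold the new flux with tolerance eps (cf. Eq. 31)
            new = threshold([r / h for r, h in zip(rhs, h_diag)], eps * scale)
            cost += support_size(rhs) + support_size(new)   # cf. Eq. 22
            err = norm_inf([a - b for a, b in zip(new, phi)]) / (norm_inf(new) or 1.0)
            phi = new
            if err <= eps / norm_as:                # residual criterion (cf. Eq. 34)
                break
        eps *= ratio
    return phi, cost

# Hypothetical operators for which ||AS|| < 1 in the max-norm.
h_diag = [2.0, 3.0, 4.0, 5.0]
s = [[0.2, 0.1, 0.0, 0.0],
     [0.1, 0.3, 0.1, 0.0],
     [0.0, 0.2, 0.4, 0.1],
     [0.0, 0.0, 0.3, 0.5]]
q = [1.0, 0.0, 0.5, 0.0]
phi, cost = two_loop(h_diag, s, q)
```

In the first outer iterations the loose tolerance removes most small coefficients, so the supports (and the accumulated cost) stay small; they grow only as the tolerance is tightened, which is the behaviour the cost measure of Eq. 22 is meant to reward.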
[Plot: NB and L2 error versus number of iterations, for ε = 1/2 and ε = ρ]
[Plots: relative error versus cost, left and right panels]
Fig. 2. Relative error versus cost for different values of the parameter, for group 88 of $^{238}$U (left) and for
groups 26 to 29 of $^{16}$O (right)
[Plot: cost versus parameter value]
A single loop means that the residual is no longer directly controlled and a strategy to handle
this point has to be devised. At a given iteration, the residual is given by:
$$\begin{aligned} \Phi^{n+1} - \Phi^{n} &= A^{n+1}\left(S^{n+1}\Phi^{n} + Q\right) - A^{n}\left(S^{n}\Phi^{n-1} + Q\right) \\ &= \left(A^{n+1} - A^{n}\right)\left(S^{n+1}\Phi^{n} + Q\right) + A^{n}\left(S^{n+1} - S^{n}\right)\Phi^{n} + A^{n}S^{n}\left(\Phi^{n} - \Phi^{n-1}\right). \end{aligned} \qquad (37)$$
The same relationship as the one for the two-loop algorithm holds for the actual error:
$$(I - AS)\left(\Phi^{n+1} - \Phi\right) = \left(A^{n+1} - A\right)\left(S^{n+1}\Phi^{n} + Q\right) + A\left(S^{n+1} - S\right)\Phi^{n} - AS\left(\Phi^{n+1} - \Phi^{n}\right). \qquad (38)$$
Substituting $\Phi^{n+1} - \Phi^{n}$ as given by Eq. 37 in Eq. 38 leads to an error bound given by Eq. 23
with
$$\delta_S = \|A\|\, \frac{\left\| \left[ \left(S^{n+1} - S\right) - S A^{n} \left(S^{n+1} - S^{n}\right) \right] \Phi^{n} \right\|}{\|\Phi^{n+1}\|}, \qquad (39)$$
$$\delta_A = \frac{\left\| \left[ \left(A^{n+1} - A\right) - AS \left(A^{n+1} - A^{n}\right) \right] \left(S^{n+1}\Phi^{n} + Q\right) \right\|}{\|\Phi^{n+1}\|}, \qquad (40)$$
$$\delta_{res} = \|A S A^{n} S^{n}\|\, \frac{\|\Phi^{n} - \Phi^{n-1}\|}{\|\Phi^{n+1}\|}. \qquad (41)$$
Such a bound for the operator-related error $\delta_A$ (resp. $\delta_S$) is interesting because it takes
into account both $\|A^{n+1} - A\|$ (resp. $\|S^{n+1} - S\|$), the distance between the current operator
and the complete one, and $\|A^{n+1} - A^{n}\|$ (resp. $\|S^{n+1} - S^{n}\|$), the distance between two
successive operators. The direct control of the numerical residual with Richardson iterations
in the previous algorithm is now “replaced” by the introduction of the distance between two
successive operators in the error bounds on $A$ and $S$. As the first term decreases with $n$ until 0,
the second one increases until $\|A - A^{n}\|$ (resp. $\|S - S^{n}\|$). Depending on the value of $\|AS\|$,
$\|A^{n+1} - A\| + \|AS\|\, \|A^{n+1} - A^{n}\|$ can be strictly decreasing or can present a minimum or a
maximum (Figure 4).
[Plots: $\|A_{n+1} - A_n\|$, $\|A_{n+1} - A\|$ and $\|A_{n+1} - A\| + \rho \|A_{n+1} - A_n\|$ versus number of coefficients, left and right panels]
Fig. 4. Comparison of error terms defined in Eq. 40 for group 88 of $^{238}$U with $\|AS\| = 0.26$
(left) and with $\|AS\|$ artificially increased to 0.8 (right)
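Even if the general behaviour is not known, the initial and final bounds are given by:
$$\delta^{S}_{(S^{n+1} = S^{n})} = \delta^{S}_{ini} = \|A\|\, \|(S - S^{n})\Phi^{n}\|, \qquad (44)$$
$$\delta^{A}_{(A^{n+1} = A^{n})} = \delta^{A}_{ini} = \|(A - A^{n})(S^{n+1}\Phi^{n} + Q)\|. \qquad (45)$$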
As $\|AS\| < 1$ (ensuring the convergence of Richardson iterations), it is guaranteed that
$\delta^{S}_{fin} < \delta^{S}_{ini}$ and $\delta^{A}_{fin} < \delta^{A}_{ini}$. These error bounds are the basis of our algorithm. Three different
cases are considered:
• $\delta_{res} \in [\delta^{S}_{fin}, \delta^{S}_{ini}]$. It is possible to decrease the error due to operator $S$ discretization to
[Plots: error criterion (log) versus number of iterations, left and right panels]
Fig. 5. Comparison of error terms on group 88 of $^{238}$U with $\|AS\| = 0.26$ (left) and with $\|AS\|$
artificially increased to 0.8 (right)
The only remaining parameter is α. A numerical study is performed to give us some
information about the optimal value.
Figure 6 shows that the choice of this parameter is important regarding the cost of the
algorithm. If not enough coefficients are kept at each iteration, the rate of convergence is
low, which leads to a high cost. Conversely, if a large number is kept, large systems
[Plots: cost versus parameter value, left and right panels]
Fig. 6. Cost of the algorithm depending on α for a given accuracy of $10^{-5}$ on group 56 of
$^{56}$Fe (left) and of $10^{-4}$ on group 88 of $^{238}$U (right)
4. Conclusion
Considering a wavelet-based Galerkin discretization for treating the energy variable in the
neutron transport equation, this chapter has proposed two adaptive algorithms for the
Richardson iterative scheme that is commonly used to solve the source-flux coupling. While
An Adaptive Energy Discretization
of the Neutron
An Adaptive Transport
Energy Discretization Equation
of the Neutron TransportBased on a
Equation based on Wavelet Galerkin
a Wavelet Galerkin Method Method 295
15
non−adaptive algorithm
single−loop algorithm non−adaptive algorithm
−1
10 double−loop algorithm −1 single−loop algorithm
10 double−loop algorithm
−2
−2 10
10
error criterion
error criterion
−3
10
−3
10
−4
10
−4
10
−5
10
−5
10
50 100 150 200 500 1000 1500 2000 2500 3000 3500
number of coefficients cost
Fig. 7. Algorithms comparison in terms of the convergence (left) and the cost (right) for
group 88 of 238 U
non−adaptive algorithm −1 non−adaptive algorithm
−1 10
10 single−loop algorithm single−loop algorithm
double−loop algorithm double−loop algorithm
−2
−2 10
10
error criterion
error criterion
−3
−3 10
10
−4
−4 10
10
−5
−5 10
10
100 200 300 400 500 600 700 200 400 600 800 1000 1200 1400 1600
number of coefficients cost
Fig. 8. Algorithms comparison in terms of the convergence (left) and the cost (right) for
group 56 of 56 Fe
the first algorithm based on two nested loops is a modification of an algorithm previously
proposed in the literature, the second one has been devised as a simplification that retains
the same convergence properties. Both approaches are based on a formal decomposition of
the error into three terms: two of them are related to the operators discretization while the
third one is the Richardson residual. The algorithms then consist in a strategy to monitor and
relate these three terms in such a way that error can be controlled by the Richardson iterations
296
16 Discrete Wavelet Transforms: Algorithms and Applications
Will-be-set-by-IN-TECH
residual. As a benefit of these algorithms, the accuracy of the final solution is known, and the
cost of obtaining it is reduced by adapting the size of the system during the iterations. The
performance of these algorithms has been demonstrated in the restricted framework of the
fine-structure flux equation in a homogeneous infinite medium. In the context of neutron
transport calculations, the modifications necessary for spatially dependent cases have been
mentioned.
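As a rough illustration of the idea summarized above (monitoring the Richardson residual against a discretization indicator and enlarging the system only when needed), the following Python sketch may help. It is a toy under stated assumptions, not the single-loop or double-loop algorithm of this chapter: the function name adaptive_richardson, the active-set enlargement rule and the splitting of the residual into an "inside" and an "outside" part are all hypothetical choices made for the example, and the matrix A is assumed symmetric positive definite with a relaxation factor omega below 2 divided by its largest eigenvalue.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def norm(v):
    """Euclidean norm of a list of floats."""
    return sum(c * c for c in v) ** 0.5


def adaptive_richardson(A, b, omega, tol, max_iter=10_000):
    """Toy adaptive Richardson iteration for A x = b (illustrative only).

    Coefficients outside the `active` index set are kept at zero. At each
    step the full residual is split into its part on the active set (the
    Richardson residual of the truncated system) and its part outside the
    set (used here as a crude discretization indicator, an assumption).
    """
    n = len(b)
    x = [0.0] * n
    # Assumption: start from the coefficient with the largest source term.
    active = {max(range(n), key=lambda i: abs(b[i]))}
    for it in range(1, max_iter + 1):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        r_in = norm([r[i] for i in active])
        r_out = norm([r[i] for i in range(n) if i not in active])
        # Stop once the norm of the full residual is below the tolerance.
        if (r_in ** 2 + r_out ** 2) ** 0.5 < tol:
            return x, sorted(active), it
        if r_in < r_out:
            # Truncation dominates: add the inactive index whose residual
            # entry is largest instead of iterating further.
            inactive = [i for i in range(n) if i not in active]
            active.add(max(inactive, key=lambda i: abs(r[i])))
        else:
            # Iteration error dominates: one Richardson sweep on the
            # active coefficients only.
            for i in active:
                x[i] += omega * r[i]
    raise RuntimeError("adaptive Richardson iteration did not converge")


if __name__ == "__main__":
    # Small symmetric positive definite tridiagonal test system.
    n = 6
    A = [[2.0 if i == j else (-0.5 if abs(i - j) == 1 else 0.0)
          for j in range(n)] for i in range(n)]
    b = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
    x, active, iterations = adaptive_richardson(A, b, omega=0.5, tol=1e-6)
    print("active coefficients:", active)
    print("iterations:", iterations)
```

In this sketch the number of active coefficients plays the role of the "number of coefficients" axis in Figs. 7 and 8, and the stopping test on the full residual gives an explicit bound on the final residual, which is the kind of accuracy control the chapter's algorithms aim for.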
5. References
Allen, E. J. (1986). A finite element approach for treating the energy variable in the numerical
solution of the neutron transport equation, Transport Theory and Statistical Physics
15(4): 449–478.
Cohen, A. (2003). Numerical Analysis of Wavelet Methods, Vol. 32 of Studies in Mathematics and
its Applications, North Holland.
Daubechies, I. (1992). Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in
Applied Mathematics, SIAM.
Fournier, D. & Le Tellier, R. (2009). Adaptive algorithms for a self-shielding treatment using
a wavelet-based Galerkin method, Proc. of Int. Conf. on Mathematics, Computational
Methods & Reactor Physics M&C 2009, ANS, Saratoga Springs, USA.
Hardle, W., Kerkyacharian, G., Picard, D. & Tsybakov, A. (1997). Wavelets, approximation and
statistical applications, Seminar Paris-Berlin.
Hébert, A. (2007). A review of legacy and advanced self-shielding models for lattice
calculations, Nuclear Science and Engineering 155(2): 310–320.
Le Tellier, R., Fournier, D. & Ruggieri, J. M. (2009). A wavelet-based finite element method
for the self-shielding issue in neutron transport, Nuclear Science and Engineering
163(1): 34–55.
Mallat, S. G. (1989). A theory for multiresolution signal decomposition: the wavelet
representation, IEEE Transactions on Pattern Analysis and Machine Intelligence
11: 674–693.
Mosca, P., Mounier, C., Sanchez, R. & Arnaud, G. (2011). An adaptive energy mesh constructor
for multigroup library generation for transport codes, Nuclear Science and Engineering
167(1): 40–60.
Yang, W., Wu, H., Zheng, Y. & Cao, L. (2010). Application of wavelets scaling function
expansion method in resonance self-shielding calculation, Annals of Nuclear Energy
37(5): 653–663.