Image Compression Using 2D Dual-Tree Discrete Wavelet Transform (DDWT)
Yao Wang
Department of Electrical and Computer Engineering, Polytechnic University, Brooklyn, New York, USA
[email protected]
This work is supported by the Joint Research Fund for Overseas Chinese Young Scholars of NSFC under grant No. 60528004 and the Distinguished Young Scholars of NSFC under grant No. 60525111.
I. INTRODUCTION
Wavelet-based image compression has seen great success in the past decade. However, it is well known that the 2D DWT does not represent directional features of images efficiently. Considerable effort has been devoted to multiscale directional representations [1][2][3]. The DDWT proposed by Kingsbury is a promising tool of this kind [4], and it has been used successfully in many applications such as image denoising, texture analysis, and motion estimation.
Using a redundant transform for compression seems contradictory to the goal of compression, which is to remove as much redundancy as possible. But if the coefficients of a redundant transform are sparse enough, compression can even benefit from the introduced redundancy, since most coefficients are nearly zero. Wang et al. first investigated the representation efficiency of the 3D DDWT for video [9] and proposed a DDWT-based scalable video coding scheme without motion estimation (DDWTVC) [10]. Better coding efficiency in terms of PSNR and visual quality is reported compared with 3D SPIHT, which also does not use motion compensation. Kingsbury and Reeves show that DDWT gives higher PSNR at the same estimated entropy of quantized coefficients [8]. To the best of our knowledge, no real coding result has been reported in the literature so far.
Encouraged by the coding performance of DDWTVC, this paper investigates image compression using the 2D DDWT. Noise shaping is first applied to the DDWT coefficients to obtain a sparser representation. To gain insight into the characteristics of DDWT coefficients after noise shaping, we investigate both their marginal and joint statistics. An information-theoretic analysis is then conducted to further reveal the inter-scale, inter-subband, and intra-subband dependencies. Our observation is that the highly sparse DDWT coefficients are substantially decorrelated, even to second-order statistics, and that little remaining dependency can be further exploited. We consider two alternatives for coding DDWT coefficients: SPIHT [14] and EBCOT, which is used in JPEG2000 [15]. Our experimental results show that DDWT with SPIHT outperforms JPEG2000 (with CDF 9/7 filters) at low bit rates and is comparable to JPEG2000 at higher bit rates. This is very encouraging, as previous image codecs using redundant transforms usually gain at low bit rates but lose at high bit rates [13].
The rest of this paper is organized as follows. Section 2 gives a brief introduction to the 2D DDWT. Coefficient statistics of DDWT are investigated in Section 3. Coding schemes with DDWT and experimental results are presented in Section 4, followed by conclusions in Section 5.
II. 2-D DUAL-TREE DISCRETE WAVELET TRANSFORM
The DDWT was developed to overcome two main drawbacks of the DWT: shift variance and poor directional selectivity [4]. With carefully designed filter banks, the DDWT offers approximate shift invariance, directional selectivity, limited redundancy, and computational efficiency similar to that of the DWT. Either the real part or the imaginary part of the DDWT yields perfect reconstruction and can thus be employed as a stand-alone transform. We use only the real part, which reduces the redundancy factor from 4:1 to 2:1. The real part of the DDWT is simply referred to as DDWT hereafter.
The implementation of DDWT is straightforward. An input image is decomposed by two sets of filter banks, (H0a, H1a) and (H0b, H1b), separately, filtering the image horizontally and then vertically just as the conventional 2D DWT does. Eight subbands are obtained: LLa, HLa, LHa, HHa, LLb, HLb, LHb, and HHb. Each highpass subband from one filter bank is combined with the corresponding subband from the other filter bank by a simple linear operation: averaging or differencing. Each subband has the same size as the corresponding 2D DWT subband at the same level, but there are six highpass subbands instead of three at each level. The two lowpass subbands, LLa and LLb, are iteratively decomposed up to the desired level within each branch. For a full description of DDWT, please refer to [4].
The basis functions of the 2D DDWT and the 2D DWT are shown in Fig. 1a and Fig. 1b, respectively. Each DDWT basis function is oriented along a specific direction: ±75°, ±15°, or ±45°. In contrast, the basis function of the HH subband of the 2D DWT mixes the +45° and -45° directions, so coarse quantization results in annoying checkerboard artifacts. Therefore, DDWT represents directional features more efficiently than DWT does.
A redundant transform can have different coefficient configurations that yield the same reconstruction. Matching pursuit [5] and basis pursuit [6] are two well-known techniques for obtaining sparse representations over redundant dictionaries, but they are too computationally demanding. We instead resort to an iterative projection-based noise shaping algorithm to sparsify the DDWT coefficients. The effectiveness of noise shaping for image and video data has been verified in [8][9][10]. For a detailed description of noise shaping, please refer to [7][8]. For all results reported in this paper, the noise shaping threshold is initially set to 128 and decreases to a target threshold with a step size of 1 per iteration. The gain factor is set to 1.6, as suggested in [8].
III. CHARACTERISTICS OF DDWT COEFFICIENTS
To gain insight into the characteristics of DDWT coefficients, we first study their marginal and joint statistics, and then investigate the inter-scale, inter-subband, and intra-subband dependencies among coefficients. The CDF 9/7 biorthogonal filter bank is employed for the DWT and for the first decomposition level of the DDWT. For the remaining stages of the DDWT, the Q-shift filters in [4] are used.
A. Statistical Analysis of DDWT Coefficients
Four finest-scale subbands, produced by DWT and DDWT, are shown in Fig. 2. These subbands are normalized to the range 0 to 255 for illustration purposes. It can be observed that DDWT has significantly fewer large coefficients than DWT, so DDWT coefficients are highly sparse after noise shaping. To further quantify the sparseness of the subbands shown, we plot their histograms in the middle row of Fig. 2. All distributions are sharply peaked around zero and have long, flat tails. This trend is much stronger for DDWT coefficients than for DWT coefficients. One simple measure of sparseness is kurtosis [11]. The estimated kurtosis values of the DDWT subbands (55.79 for Barbara and 25.89 for Lena) are much higher than those of the DWT subbands (11.34 for Barbara and 17.60 for Lena), which indicates the high sparseness of DDWT coefficients.
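As a rough illustration of the kurtosis measure used above, the following sketch computes the sample kurtosis of a subband. It assumes the non-excess (Pearson) definition, under which a Gaussian has kurtosis 3; the paper does not state which estimator was used, so this choice is an assumption. The synthetic Gaussian and Laplacian arrays are hypothetical stand-ins for real subbands, not data from the paper.

```python
import numpy as np

def kurtosis(coeffs):
    """Sample kurtosis E[(x - mu)^4] / sigma^4 (equals 3 for a Gaussian)."""
    x = np.asarray(coeffs, dtype=np.float64).ravel()
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / variance ** 2)

# Hypothetical stand-ins for subbands: a Gaussian field and a sparser,
# heavier-tailed Laplacian field (assumption for illustration only).
rng = np.random.default_rng(0)
gaussian_band = rng.normal(size=(256, 256))
laplacian_band = rng.laplace(size=(256, 256))

k_gauss = kurtosis(gaussian_band)      # close to 3
k_laplace = kurtosis(laplacian_band)   # noticeably larger than 3
print(f"Gaussian: {k_gauss:.2f}, Laplacian: {k_laplace:.2f}")
```

A distribution that is more sharply peaked at zero with heavier tails yields a larger value, which is why kurtosis can serve as a simple sparseness indicator.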
Marginal distributions (i.e., the histograms) capture only the first-order statistics of the coefficients. Joint distributions further describe the dependency between transform coefficients. The bottom row of Fig. 2 presents the conditional histograms of subband coefficients (X) conditioned on their nearest left neighbors (NX). Other neighbors give similar results. The conditional histograms reveal very different joint statistics for DDWT and DWT coefficients. The distribution of a DWT coefficient X depends on the magnitude of its neighbor, which suggests that DWT coefficients are still locally dependent on each other in spite of being decorrelated. In contrast, for DDWT coefficients the conditional distributions are nearly independent of the conditioning value. The kurtosis of the conditional distribution P(X | NX = nx), with nx fixed at 10, is 25.00 for Barbara and 12.26 for Lena, much higher than the kurtosis of 3 of a Gaussian distribution. Based on this statistical analysis, we conclude that DDWT coefficients are not only non-Gaussian but also conditionally non-Gaussian, in contrast to the observation that wavelet coefficients are non-Gaussian but conditionally Gaussian [11]. This implies that there is not much room for exploiting correlation among adjacent coefficients within the same subband (intra-subband correlation).
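The conditional histogram and conditional kurtosis discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the bin count, the plotting range of ±40, and the tolerance `half_width` around the conditioning value are hypothetical choices, and the i.i.d. Laplacian array merely stands in for a real subband.

```python
import numpy as np

def left_neighbor_pairs(band):
    """Return (x, nx): each coefficient and its nearest left neighbor."""
    band = np.asarray(band, dtype=np.float64)
    return band[:, 1:].ravel(), band[:, :-1].ravel()

def conditional_histogram(band, bins=41, limit=40.0):
    """Estimate P(X | NX) on a grid; every non-empty NX column sums to 1."""
    x, nx = left_neighbor_pairs(band)
    edges = np.linspace(-limit, limit, bins + 1)
    joint, _, _ = np.histogram2d(x, nx, bins=[edges, edges])  # rows: X, cols: NX
    column_mass = joint.sum(axis=0, keepdims=True)
    cond = np.divide(joint, column_mass,
                     out=np.zeros_like(joint), where=column_mass > 0)
    return cond, edges

def conditional_kurtosis(band, nx_value=10.0, half_width=0.5):
    """Kurtosis of X over samples whose left neighbor lies near nx_value."""
    x, nx = left_neighbor_pairs(band)
    selected = x[np.abs(nx - nx_value) <= half_width]
    centered = selected - selected.mean()
    return float(np.mean(centered ** 4) / np.mean(centered ** 2) ** 2)

# Hypothetical subband: i.i.d. Laplacian samples, so X is independent of NX.
rng = np.random.default_rng(0)
band = rng.laplace(scale=8.0, size=(512, 512))
cond, edges = conditional_histogram(band)
k_cond = conditional_kurtosis(band, nx_value=10.0, half_width=2.0)
print(cond.shape, round(k_cond, 2))
```

For independent, heavy-tailed coefficients such as these, every column of `cond` has the same shape and the conditional kurtosis stays well above 3, which is the behavior the text reports for DDWT subbands.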
Figure 1. (a) Six basis functions of 2D DDWT (real part) at level 3 and (b) three basis functions of 2D separable DWT at the same level.
Figure 2. Top row: a finest-scale (a) DWT subband and (b) DDWT subband of Barbara, and (c) DWT subband and (d) DDWT subband of Lena. Middle row: the corresponding marginal histograms; the kurtoses of these distributions are (a) 11.34, (b) 55.79, (c) 17.60, and (d) 25.89, respectively. Bottom row: the conditional histograms of each coefficient X conditioned on its left neighbor NX.
B. Information-theoretic Analysis of Coefficient Dependency
Mutual information is a good criterion for measuring how much information one random variable conveys about another. We evaluate the mutual information of three pairs, I(X;PX), I(X;CX), and I(X;NX), where X denotes a coefficient, PX its parent (the coefficient at the same spatial location in the next coarser subband), CX its cousins (the coefficients at the same spatial location in the other subbands at the same scale), and NX its eight neighbors (the neighboring coefficients within the same subband). Thus I(X;PX), I(X;CX), and I(X;NX) measure inter-scale, inter-subband, and intra-subband dependency, respectively. Note that a coefficient has multiple cousins and neighbors. Estimating the mutual information between one random variable and several random variables suffers from the dilution problem: the number of samples is too small for the dimensionality, which severely degrades estimation accuracy. Following the method in [12], we reduce the dimensionality with a sufficient statistic. For example, we estimate I(X;T) instead of I(X;CX), where $T = \sum_i a_i CX_i$ is considered to be a sufficient statistic of CX.
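To make the estimate of I(X;T) concrete, here is a minimal sketch of a histogram-based (plug-in) mutual information estimator applied to a coefficient X and a weighted sum T of related coefficients. The bin count, the uniform weights a_i, and the synthetic data are all assumptions made for illustration; the weights and estimator actually used follow [12] and are not specified in this paper.

```python
import numpy as np

def mutual_information(x, t, bins=32):
    """Plug-in estimate of I(X;T) in bits from a 2-D histogram."""
    joint, _, _ = np.histogram2d(np.ravel(x), np.ravel(t), bins=bins)
    p_xt = joint / joint.sum()
    p_x = p_xt.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    p_t = p_xt.sum(axis=0, keepdims=True)   # marginal of T, shape (1, bins)
    nonzero = p_xt > 0
    ratio = p_xt[nonzero] / (p_x @ p_t)[nonzero]
    return float(np.sum(p_xt[nonzero] * np.log2(ratio)))

def summary_statistic(related, weights):
    """T = sum_i a_i * related_i, with one row of `related` per sample."""
    return np.asarray(related, dtype=np.float64) @ np.asarray(weights, dtype=np.float64)

# Synthetic data (assumption): three "cousins" that depend on x,
# and three unrelated coefficients that do not.
rng = np.random.default_rng(0)
n = 50_000
x = rng.laplace(size=n)
cousins = x[:, None] + rng.normal(size=(n, 3))
unrelated = rng.laplace(size=(n, 3))
weights = np.full(3, 1.0 / 3.0)            # hypothetical uniform a_i

mi_dependent = mutual_information(x, summary_statistic(cousins, weights))
mi_independent = mutual_information(x, summary_statistic(unrelated, weights))
print(f"dependent: {mi_dependent:.3f} bits, independent: {mi_independent:.3f} bits")
```

Collapsing several coefficients into the scalar T keeps the histogram two-dimensional, which is what avoids the dilution problem. Note that the plug-in estimate is slightly biased upward for finite samples, so values near zero should be read as "no measurable dependency".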
Estimated mutual information for the DWT and DDWT coefficients of two test images, Lena and Barbara, is reported in Table 1. I(X;PX) and I(X;CX) for DDWT coefficients are almost zero. I(X;NX) is still large for some subbands of richly textured images such as Barbara, but it is significantly smaller than for DWT coefficients. Results for other natural test images are quite similar to those reported here. We conclude that the inter-subband and inter-scale correlations are very weak, and that the intra-subband correlation is significantly lower than that of DWT but still exists for highly textured images.
IV. CODING SCHEMES AND EXPERIMENTAL RESULTS
We examine two representative wavelet coding methods for coding DDWT coefficients after noise shaping: tree-structure-based SPIHT [14] and block-based EBCOT [15]. SPIHT exploits both inter-scale and intra-subband dependency, while EBCOT exploits intra-subband dependency only. Neither SPIHT nor EBCOT exploits inter-subband dependency. As shown in the previous section, the inter-subband dependency is very weak with DDWT and may not be worth considering. The coding schemes with these two methods are referred to as DDWT_SPIHT and DDWT_EBCOT, respectively. Three 512×512 test images, Lena, Barbara, and Baboon, are decomposed to six levels in all experiments. The filter banks for DDWT and DWT are the same as in Section 3.
DDWT has twice as many subbands as DWT. These subbands should be organized so that the structure does not degrade the efficiency of subband coding and incurs as little overhead as possible. For this purpose, we treat the subbands of each branch as if they were generated by a conventional DWT at the same decomposition level, and then concatenate the co-located subbands of the two branches horizontally. As a result, the reorganized subbands look as if they were produced by a DWT of a 512×1024 image. The reorganized subbands are then coded as a standard wavelet pyramid using either SPIHT or EBCOT.
The coding performance of these two schemes, together with that of JPEG2000 (using EBCOT on DWT) and SPIHT (using DWT), is reported in Tables 2-4. DDWT_SPIHT shows the best performance in most cases. It provides about 0.3 dB gain for images with rich directional features. We can observe that DDWT_SPIHT is consistently superior to DDWT_EBCOT. Since DDWT coefficients are highly sparse, only a small number of coefficients are significant at each bit-plane. The superiority of SPIHT for DDWT coefficients lies in its efficient clustering of contiguous zeros across scales, which is not exploited by EBCOT. Furthermore, JPEG2000 processes a bit-plane block by block, which limits its efficiency in coding large areas of zeros despite the use of context-based arithmetic coding. For coefficients with more intra-scale dependency, e.g. the LHa band and HHb band of Barbara in Table 1, the performance gap between DDWT_SPIHT and DDWT_EBCOT becomes smaller. This indicates that DDWT_EBCOT may be more efficient in exploiting intra-scale dependency. As shown in Fig. 3, DDWT gives better visual quality due to its high representation efficiency for directional features.
R. M. Figueras i Ventura et al. recently reported coding results based on redundant Gabor dictionaries [13]. Their scheme obtains some coding gain over JPEG2000 at very low bit rates, but suffers severe performance degradation as the bit rate increases. It is worth pointing out that our DDWT_SPIHT scheme is competitive with JPEG2000 even at high bit rates. This challenges the common belief that image compression with redundant representations performs better only at very low bit rates.
V. CONCLUSION
TABLE I. ESTIMATED MUTUAL INFORMATION BETWEEN THE CURRENT COEFFICIENT X AND ITS PARENT PX, ITS COUSINS CX, AND ITS NEIGHBORS NX FOR LENA AND BARBARA.
Lena, finest subbands
           DWT                 DDWT
           LH    HL    HH      LHa+b*  HLa+b  HHa+b  LHa-b**  HLa-b  HHa-b
I(X;PX)    0.11  0.07  0.05    0.03    0.01   0.00   0.00     0.00   0.00
I(X;CX)    0.05  0.05  0.05    0.00    0.00   0.00   0.00     0.00   0.00
I(X;NX)    0.27  0.17  0.10    0.09    0.03   0.00   0.01     0.01   0.01
Barbara, finest subbands
           DWT                 DDWT
           LH    HL    HH      LHa+b*  HLa+b  HHa+b  LHa-b**  HLa-b  HHa-b
I(X;PX)    0.16  0.07  0.12    0.01    0.00   0.00   0.00     0.00   0.00
I(X;CX)    0.18  0.11  0.16    0.02    0.00   0.00   0.01     0.00   0.04
I(X;NX)    0.70  0.44  0.33    0.29    0.10   0.00   0.06     0.02   0.13
* The subscript a+b means averaging, e.g. LHa+b is the average of LHa and LHb.
** The subscript a-b means differencing, e.g. LHa-b is the difference of LHa and LHb.
In this paper, we investigate image compression using a redundant wavelet transform, the DDWT. The characteristics of DDWT coefficients are investigated from both statistical and information-theoretic points of view. It is shown that both the inter-scale and inter-subband dependencies of DDWT coefficients are very limited, and that the intra-subband correlation is significantly lower than that of DWT. We examine two coding methods, SPIHT and EBCOT, for DDWT coefficients. Experimental results demonstrate that SPIHT consistently outperforms EBCOT, and that DDWT_SPIHT outperforms JPEG2000 at low bit rates while its coding performance is comparable with that of JPEG2000 at high bit rates.
Figure 3. Enlarged patches of Lena (top row), Barbara (middle row) and Baboon (bottom row) at 0.2 bpp. Left: original images. Middle: results of JPEG2000. Right: results of DDWT_SPIHT.
TABLE II. PERFORMANCE COMPARISON FOR LENA (PSNR, dB)
Bit-rate (bpp)   JPEG2000   DDWT (EBCOT)   DDWT (SPIHT)
0.1              29.94      29.59          30.52
0.2              32.95      32.49          33.40
0.3              34.86      34.13          34.94
0.4              36.12      35.35          36.24
0.5              37.24      36.34          37.12
0.8              39.22      38.07          39.09
1.0              40.35      39.04          39.98
TABLE III. PERFORMANCE COMPARISON FOR BARBARA (PSNR, dB)
Bit-rate (bpp)   SPIHT   JPEG2000   DDWT (EBCOT)   DDWT (SPIHT)
0.1              24.26   24.64      24.73          24.83
0.2              26.66   27.27      27.51          27.52
0.3              28.56   29.18      29.20          29.30
0.4              30.10   30.82      30.87          31.08
0.5              31.40   32.26      32.13          32.21
0.8              34.66   35.28      34.87          35.30
1.0              36.41   37.15      36.18          36.68
TABLE IV. PERFORMANCE COMPARISON FOR BABOON (PSNR, dB)
Bit-rate (bpp)   SPIHT   JPEG2000   DDWT (EBCOT)   DDWT (SPIHT)
0.1              21.34   21.35      21.20          21.39
0.2              22.69   22.63      22.37          22.75
0.3              23.76   23.65      23.47          23.96
0.4              24.66   24.62      24.41          24.69
0.5              25.64   25.55      25.17          25.51
0.8              27.84   27.74      27.91          27.82
1.0              29.17   29.09      28.46          28.88
ACKNOWLEDGMENT
We would like to thank Dr. Feng Wu, Beibei Wang, and Kun Li for helpful discussions.
REFERENCES
[1] M. N. Do and M. Vetterli, "The contourlet transform: an efficient directional multiresolution image representation", IEEE Trans. Image Processing, vol. 14, pp. 2091-2106, Dec. 2005.
[2] E. L. Pennec and S. Mallat, "Sparse geometric image representations with bandelets", IEEE Trans. Image Processing, vol. 14, pp. 423-438, Apr. 2005.
[3] V. Velisavljevic, et al., "Directionlets: anisotropic multidirectional representation with separable filtering", IEEE Trans. Image Processing, vol. 15, pp. 1916-1933, July 2006.
[4] N. G. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals", Applied Computational Harmonic Anal., vol. 10, no. 3, pp. 234-253, May 2001.
[5] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries", IEEE Trans. Signal Processing, vol. 41, pp. 3397-3415, Dec. 1993.
[6] S. S. Chen, et al., "Atomic decomposition by basis pursuit", SIAM J. Scientific Comp., vol. 20, pp. 33-61, 1999.
[7] T. H. Reeves and N. G. Kingsbury, "Overcomplete image coding using iterative projection-based noise shaping", in Proc. Int. Conf. Image Processing, Rochester, NY, Sept. 2002.
[8] N. G. Kingsbury and T. H. Reeves, "Redundant representation with complex wavelets: how to achieve sparsity", in Proc. Int. Conf. Image Processing, Barcelona, Sept. 2003.
[9] B. Wang, et al., "An investigation of 3D dual-tree wavelet transform for video coding", in Proc. Int. Conf. Image Processing, Singapore, Oct. 2004.
[10] B. Wang, et al., "Video coding using 3-D dual-tree wavelet transforms", in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Philadelphia, Mar. 2005.
[11] R. W. Buccigrossi and E. P. Simoncelli, "Image compression via joint statistical characterization in the wavelet domain", IEEE Trans. Image Processing, vol. 8, pp. 1688-1701, Dec. 1999.
[12] J. Liu and P. Moulin, "Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients", IEEE Trans. Image Processing, vol. 10, no. 11, pp. 1647-1658, 2001.
[13] R. M. Figueras i Ventura, et al., "Low-rate and flexible image coding with redundant representations", IEEE Trans. Image Processing, vol. 15, no. 3, Mar. 2006.
[14] A. Said and W. A. Pearlman, "A new, fast and efficient image codec based on set partitioning in hierarchical trees", IEEE Trans. Circuits and Systems for Video Tech., vol. 6, pp. 243-250, Jun. 1996.
[15] D. Taubman, "High performance scalable image compression with EBCOT", IEEE Trans. Image Processing, vol. 9, pp. 1158-1170, Jul. 2000.