
Sparse Representations:

An Overview of Theory and Applications

Justin Romberg

Georgia Tech, School of ECE

Tsinghua University
October 14, 2013
Beijing, China
Applied and Computational Harmonic Analysis

Signal/image f (t) in the time/spatial domain


Decompose f as a superposition of atoms
f(t) = Σi αi ψi(t)
ψi = basis functions
αi = expansion coefficients in ψ-domain

Classical example: Fourier series


ψi = complex sinusoids
αi = Fourier coefficients
Modern example: wavelets
ψi = “little waves”
αi = wavelet coefficients
More exotic example: curvelets (more later)
Taking images apart and putting them back together
Frame operators Ψ, Ψ̃ map images to sequences and back
Two sequences of functions: {ψi(t)}, {ψ̃i(t)}
Analysis (inner products):

α = Ψ̃∗[f],   αi = ⟨ψ̃i, f⟩

Synthesis (superposition):

f = Ψ[α],   f(t) = Σi αi ψi(t)
If {ψi(t)} is an orthobasis, then

‖α‖²_ℓ2 = ‖f‖²_L2   (Parseval)

Σi αi βi = ∫ f(t) g(t) dt   (where β = Ψ̃∗[g])

ψi(t) = ψ̃i(t)

i.e. all sizes and angles are preserved
Overcomplete tight frames have similar properties
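As a concrete check of these identities, here is a minimal NumPy/SciPy sketch (an illustration, not from the slides), using the orthonormal DCT-II as the orthobasis {ψi} and a made-up test signal f:

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
f = np.cumsum(rng.standard_normal(256))   # made-up test signal

# Analysis: alpha_i = <psi_i, f> against an orthobasis (DCT-II, norm='ortho')
alpha = dct(f, norm='ortho')

# Synthesis: superposing the same atoms recovers f exactly
f_rec = idct(alpha, norm='ortho')
print(np.allclose(f, f_rec))              # True

# Parseval: coefficient energy equals signal energy
print(np.sum(alpha**2), np.sum(f**2))     # equal up to round-off
```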
ACHA

ACHA Mission: construct “good representations” for


“signals/images” of interest

Examples of “signals/images” of interest


- Classical: signal/image is “bandlimited” or “low-pass”
- Modern: smooth between isolated singularities (e.g. 1D piecewise poly)
- Postmodern: 2D image is smooth between smooth edge contours

Properties of “good representations”


- sparsifies signals/images of interest
- can be computed using fast algorithms
  (O(N) or O(N log N) — think of the FFT)
Example: The discrete cosine transform (DCT)

For an image f(t, s) on [0, 1]², we have

ψℓ,m(t, s) = 2 λℓ λm · cos(πℓt) cos(πms),   where λℓ = 1/√2 if ℓ = 0, and 1 otherwise

Closely related to 2D Fourier series/DFT,


the DCT is real, and implicitly does symmetric extension
Can be taken on the whole image, or blockwise (JPEG)
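A minimal sketch of the blockwise usage (the JPEG-style 8 × 8 case); it assumes a grayscale image array img whose sides are multiples of the block size:

```python
import numpy as np
from scipy.fft import dctn, idctn

def blockwise_dct(img, b=8):
    """Orthonormal 2D DCT applied independently to each b-by-b block."""
    out = np.empty(img.shape)
    for i in range(0, img.shape[0], b):
        for j in range(0, img.shape[1], b):
            out[i:i+b, j:j+b] = dctn(img[i:i+b, j:j+b], norm='ortho')
    return out

def blockwise_idct(coeffs, b=8):
    """Inverse of blockwise_dct."""
    out = np.empty(coeffs.shape)
    for i in range(0, coeffs.shape[0], b):
        for j in range(0, coeffs.shape[1], b):
            out[i:i+b, j:j+b] = idctn(coeffs[i:i+b, j:j+b], norm='ortho')
    return out
```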
Image approximation using DCT
Take 1% of “low pass” coefficients, set the rest to zero

[Figure: original vs. approximated; rel. error = 0.075]


Image approximation using DCT
Take 1% of largest coefficients, set the rest to zero (adaptive)

[Figure: original vs. approximated; rel. error = 0.057]
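A sketch of this experiment (illustrative; it assumes a grayscale array img, and the error values above are for the slides' test image, so other inputs will differ):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_approx(img, keep=0.01):
    """Keep the `keep` fraction of largest-magnitude DCT coeffs, zero the rest."""
    alpha = dctn(img, norm='ortho')
    k = max(1, int(keep * alpha.size))
    thresh = np.sort(np.abs(alpha), axis=None)[-k]   # k-th largest magnitude
    alpha[np.abs(alpha) < thresh] = 0
    approx = idctn(alpha, norm='ortho')
    rel_err = np.linalg.norm(img - approx) / np.linalg.norm(img)
    return approx, rel_err
```

For the non-adaptive “low pass” variant, keep a fixed block of low-frequency coefficients instead of the largest ones.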


Wavelets
f(t) = Σj,k αj,k ψj,k(t)

Multiscale: indexed by scale j and location k


Local: ψj,k analyzes/represents an interval of size ∼ 2^(−j)
Vanishing moments: in regions where f is polynomial, αj,k = 0
[Figure: wavelets ψj,k at several scales j; a piecewise polynomial f and its wavelet coeffs αj,k]
2D wavelet transform

Sparse: few large coeffs, many small coeffs


Important wavelets cluster along edges
Multiscale approximations
Wavelet approximation keeping all coefficients up to a given scale:

Scale    Compression    rel. error
4        16384:1        0.29
5        4096:1         0.22
6        1024:1         0.16
7        256:1          0.12
8        64:1           0.07
9        16:1           0.04
10       4:1            0.02

[Figures: reconstructed image at each scale]


Image approximation using wavelets
Take 1% of largest coefficients, set the rest to zero (adaptive)

[Figure: original vs. approximated; rel. error = 0.031]


DCT/wavelets comparison
Take 1% of largest coefficients, set the rest to zero (adaptive)

[Figure: DCT (rel. error = 0.057) vs. wavelets (rel. error = 0.031)]
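The wavelet analogue, sketched with PyWavelets; the wavelet 'db4' and the 5 decomposition levels are arbitrary choices, not specified in the slides:

```python
import numpy as np
import pywt

def wavelet_approx(img, keep=0.01, wavelet='db4', level=5):
    """Keep the `keep` fraction of largest-magnitude wavelet coeffs."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    arr, slices = pywt.coeffs_to_array(coeffs)
    k = max(1, int(keep * arr.size))
    thresh = np.sort(np.abs(arr), axis=None)[-k]     # k-th largest magnitude
    arr[np.abs(arr) < thresh] = 0
    rec = pywt.waverec2(pywt.array_to_coeffs(arr, slices, output_format='wavedec2'),
                        wavelet)[:img.shape[0], :img.shape[1]]
    return rec, np.linalg.norm(img - rec) / np.linalg.norm(img)
```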


Linear approximation

Linear S-term approximation: keep S coefficients in fixed locations


fS(t) = Σ_{m=1..S} αm ψm(t)

- projection onto a fixed subspace
- lowpass filtering, principal components, etc.

Fast coefficient decay ⇒ good approximation:

|αm| ≲ m^(−r)  ⇒  ‖f − fS‖² ≲ S^(−2r+1)

Take f(t) periodic, d-times continuously differentiable, and Ψ = Fourier series:

‖f − fS‖² ≲ S^(−2d)
The smoother the function, the better the approximation
Something similar is true for wavelets ...
Nonlinear approximation

Nonlinear S-term approximation: keep S largest coefficients


fS(t) = Σ_{γ∈ΓS} αγ ψγ(t),   ΓS = locations of the S largest |αm|

Fast decay of sorted coefficients ⇒ good approximation

|α|(m) ≲ m^(−r)  ⇒  ‖f − fS‖² ≲ S^(−2r+1)

(|α|(m) = mth largest coefficient magnitude)


Linear v. nonlinear approximation

For f(t) uniformly smooth with d “derivatives”:

                        S-term approx. error
  Fourier, linear       S^(−2d+1)
  Fourier, nonlinear    S^(−2d+1)
  wavelets, linear      S^(−2d+1)
  wavelets, nonlinear   S^(−2d+1)

For f(t) piecewise smooth:

                        S-term approx. error
  Fourier, linear       S^(−1)
  Fourier, nonlinear    S^(−1)
  wavelets, linear      S^(−1)
  wavelets, nonlinear   S^(−2d+1)
Nonlinear wavelet approximations adapt to singularities
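A small sketch of this adaptation effect on a made-up piecewise-polynomial signal (all parameter choices here are illustrative, and "linear" is approximated by keeping the S coefficients in the coarsest fixed locations):

```python
import numpy as np
import pywt

t = np.linspace(0, 1, 1024)
f = np.where(t < 0.5, t**2, 1.5 - t)          # smooth pieces, one jump

arr, slices = pywt.coeffs_to_array(pywt.wavedec(f, 'db4', level=6))
S = 50

lin = arr.copy(); lin[S:] = 0                           # linear: S fixed coarsest coeffs
nl = arr.copy(); nl[np.argsort(np.abs(arr))[:-S]] = 0   # nonlinear: S largest coeffs

for name, a in [('linear', lin), ('nonlinear', nl)]:
    rec = pywt.waverec(pywt.array_to_coeffs(a, slices, output_format='wavedec'), 'db4')
    print(name, np.linalg.norm(f - rec[:len(f)]) / np.linalg.norm(f))
```

The nonlinear version spends a few coefficients on the jump and the rest on the smooth pieces, so its error comes out markedly lower.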
Wavelet adaptation
[Figure: piecewise polynomial f(t) and its wavelet coeffs αj,k]


Approximation curves

Approximating Pau with S terms...

[Plot: −log(rel. error) vs. S (S up to 10⁵) for wavelet nonlinear, DCT nonlinear, and DCT linear approximation]


Approximation comparison
[Figure: original; DCT linear (rel. error 0.075); DCT nonlinear (0.057); wavelet nonlinear (0.031)]


The ACHA paradigm

Sparse representations yield algorithms for (among other things)


1 compression,
2 estimation in the presence of noise (“denoising”),
3 inverse problems (e.g. tomography),
4 acquisition (compressed sensing)
that are
fast,
relatively simple,
and produce (nearly) optimal results
Compression
Transform-domain image coding

Sparse representation = good compression


Why? Because there are fewer things to code
Basic, “stylized” image coder
1 Transform image into sparse basis
2 Quantize
Most of the xform coefficients are ≈ 0
⇒ they require very few bits to encode
3 Decoder: simply apply inverse transform to quantized coeffs
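A toy version of this coder (a stylized sketch, not JPEG: the full-image DCT stands in for the sparse basis, step is a made-up quantizer step size, and a real coder would entropy-code the quantized values):

```python
import numpy as np
from scipy.fft import dctn, idctn

def toy_encode(img, step=10.0):
    alpha = dctn(img, norm='ortho')           # 1. transform into a sparse basis
    q = np.round(alpha / step).astype(int)    # 2. quantize: most coeffs -> 0,
    return q                                  #    so only nonzeros cost real bits

def toy_decode(q, step=10.0):
    return idctn(q * step, norm='ortho')      # 3. dequantize + inverse transform
```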
Image compression

Classical example: JPEG (1980s)


- standard implemented on every digital camera
- representation = local Fourier
  (discrete cosine transform on each 8 × 8 block)

Modern example: JPEG2000 (1990s)


- representation = wavelets
  (much sparser for images with edges)
- about a factor of 2 better than JPEG in practice
  (half the space for the same quality image)
JPEG vs. JPEG2000

Visual comparison at 0.25 bits per pixel (≈ 100:1 compression):
JPEG shows blocking artifacts, JPEG2000 does not.

[Figure: JPEG @ 0.25 bits/pixel vs. JPEG2000 @ 0.25 bits/pixel]
(Images from David Taubman, University of New South Wales)
Sparse transform coding is asymptotically optimal
Donoho, Cohen, Daubechies, DeVore, Vetterli, and others . . .
The statement “transform coding in a sparse basis is a smart thing to
do” can be made mathematically precise
Class of images C
Representation {ψi } (orthobasis) such that
|α|(n) ≲ n^(−r)
for all f ∈ C (|α|(n) is the nth largest transform coefficient)
Simple transform coding: transform, quantize (throwing most coeffs
away)
ℓ(ε) = length of code (# bits) that guarantees error < ε for all f ∈ C (worst case)
To within log factors,
ℓ(ε) ≍ ε^(−1/γ),   γ = r − 1/2
For piecewise smooth signals and {ψi } = wavelets,
no coder can do fundamentally better
Statistical Estimation
Statistical estimation setup

y(t) = f (t) + σz(t)

y: data
f : object we wish to recover
z: stochastic error; assume z(t) i.i.d. N(0, 1)
σ: noise level
The quality of an estimate f̃ is given by its risk (expected mean-square error):

MSE(f̃, f) = E‖f̃ − f‖²


Transform domain model

y = f + σz
Orthobasis {ψi}:

⟨y, ψi⟩ = ⟨f, ψi⟩ + σ⟨z, ψi⟩

yi = αi + σzi

zi: Gaussian white noise sequence
αi = ⟨f, ψi⟩: coordinates of f
σ: noise level
Classical estimation example

Classical model: signal of interest f is lowpass


[Figure: f(t) in the time domain; its Fourier transform f̂(ω)]

Observable frequencies: 0 ≤ ω ≤ Ω
f̂(ω) is nonzero only for ω ≤ B
Classical estimation example

Add noise: y = f + σz

[Figure: noisy y(t) and its Fourier transform ŷ(ω)]

Observation error: E‖y − f‖² = E‖ŷ − f̂‖² = Ω · σ²

Noise is spread out over the entire spectrum
Classical estimation example

Optimal recovery algorithm: lowpass filter (“kill” all ŷ(ω) for ω > B)

[Figure: ŷ(ω) before and the estimate's transform after lowpass filtering]

Original error: E‖ŷ − f̂‖² = Ω · σ²
Recovered error: E‖f̃ − f‖² = B · σ²

Only the lowpass noise affects the estimate, a savings of B/Ω
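The B · σ² versus Ω · σ² accounting is easy to reproduce in the coefficient domain (a sketch with made-up values of Ω, B, σ):

```python
import numpy as np

rng = np.random.default_rng(1)
Omega, B, sigma = 1024, 32, 0.5
alpha = np.zeros(Omega)
alpha[:B] = rng.standard_normal(B)            # lowpass signal: energy in first B freqs

y = alpha + sigma * rng.standard_normal(Omega)
est = np.where(np.arange(Omega) < B, y, 0.0)  # lowpass filter: kill omega > B

print(np.sum((y - alpha)**2))    # ~ Omega * sigma^2
print(np.sum((est - alpha)**2))  # ~ B * sigma^2
```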


Modern estimation example

Model: signal is piecewise smooth


Signal is sparse in the wavelet domain
[Figure: piecewise-smooth f(t) in the time domain; its wavelet coeffs αj,k]

Again, the αj,k are concentrated on a small set
This set is signal-dependent (and unknown a priori)
⇒ we don't know where to “filter”
Ideal estimation

yi = αi + σzi,   y ∼ Normal(α, σ²I)

Suppose an “oracle” tells us which coefficients are above the noise level.
Form the oracle estimate

α̃i^orc = yi  if |αi| > σ,   0  if |αi| ≤ σ

i.e. keep the observed coefficients above the noise level, ignore the rest.

Oracle risk:   E‖α̃^orc − α‖² = Σi min(αi², σ²)
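In code, the oracle rule and its risk are one-liners (a sketch; note the oracle thresholds on the true αi, which is exactly what we never have in practice):

```python
import numpy as np

def oracle_estimate(y, alpha, sigma):
    """Keep y_i wherever the TRUE coefficient exceeds the noise level."""
    return np.where(np.abs(alpha) > sigma, y, 0.0)

def oracle_risk(alpha, sigma):
    """E||oracle - alpha||^2 = sum_i min(alpha_i^2, sigma^2)."""
    return np.sum(np.minimum(alpha**2, sigma**2))
```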
Ideal estimation

Transform coefficients α:
- total length N = 64
- # nonzero components = 10
- # components above the noise level: S = 6

[Figure: original coeffs α, noisy coeffs y, oracle estimate α̃^orc]

E‖y − α‖² = N · σ²        E‖α̃^orc − α‖² = S · σ²


Interpretation
MSE(α̃^orc, α) = Σi min(αi², σ²)

Rearrange the coefficients in decreasing order:

|α|²(1) ≥ |α|²(2) ≥ ... ≥ |α|²(N)

S = number of αi's with αi² ≥ σ²

MSE(α̃^orc, α) = Σ_{i>S} |α|²(i) + S · σ²
             = ‖α − αS‖² + S · σ²
             = approx. error + (number of terms × noise level)
             = bias² + variance

The sparser the signal,
- the better the approximation error (lower bias), and
- the fewer terms above the noise level (lower variance)
Can we estimate as well without the oracle?
Denoising by thresholding

Hard thresholding (“keep or kill”):

α̃i = yi  if |yi| ≥ λ,   0  if |yi| < λ

Soft thresholding (“shrinkage”):

α̃i = yi − λ  if yi ≥ λ,   0  if −λ < yi < λ,   yi + λ  if yi ≤ −λ

Take λ a little bigger than σ


Working assumption: whatever is above λ is signal, whatever is below
is noise
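The two rules in NumPy (a direct transcription of the formulas above):

```python
import numpy as np

def hard_threshold(y, lam):
    """Keep or kill: zero everything with magnitude below lam."""
    return np.where(np.abs(y) >= lam, y, 0.0)

def soft_threshold(y, lam):
    """Shrinkage: pull every observation toward zero by lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
```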
Denoising by thresholding

Thresholding performs (almost) as well as the oracle estimator!


Donoho and Johnstone:
Form the estimate α̃t by thresholding at λ = σ√(2 log N); then

MSE(α̃t, α) := E‖α̃t − α‖² ≤ (2 log N + 1) · (σ² + Σi min(αi², σ²))

Thresholding comes within a log factor of the oracle performance


The (2 log N + 1) factor is the price we pay for not knowing the
locations of the important coeffs
Thresholding is simple and effective
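A quick Monte Carlo check of the bound on a made-up sparse vector (illustrative values of N, σ, and the nonzero amplitudes):

```python
import numpy as np

def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

rng = np.random.default_rng(2)
N, sigma = 1024, 1.0
alpha = np.zeros(N); alpha[:20] = 5.0                  # sparse signal
lam = sigma * np.sqrt(2 * np.log(N))

mse = np.mean([np.sum((soft_threshold(alpha + sigma * rng.standard_normal(N), lam)
                       - alpha)**2) for _ in range(500)])
bound = (2 * np.log(N) + 1) * (sigma**2 + np.sum(np.minimum(alpha**2, sigma**2)))
print(mse, bound)     # empirical risk stays below the oracle-inequality bound
```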
Sparsity ⇒ good estimation
Recall: Modern estimation example
Signal is piecewise smooth, and sparse in the wavelet domain
[Figure: f(t) and its wavelet coeffs αj,k; below, the noisy signal y(t) and its noisy wavelet coeffs]
Thresholding wavelets
Denoise (estimate) by soft thresholding
[Figure: noisy signal and its noisy wavelet coeffs; recovered signal and its recovered (thresholded) wavelet coeffs]
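Putting the pieces together, a minimal 1D wavelet denoiser (a sketch; the wavelet and level choices are again arbitrary):

```python
import numpy as np
import pywt

def wavelet_denoise(y, sigma, wavelet='db4', level=5):
    """Soft-threshold wavelet detail coeffs at lam = sigma * sqrt(2 log N)."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    lam = sigma * np.sqrt(2 * np.log(len(y)))
    # Threshold the detail coefficients; leave the coarse approximation alone.
    den = [coeffs[0]] + [pywt.threshold(c, lam, mode='soft') for c in coeffs[1:]]
    return pywt.waverec(den, wavelet)[:len(y)]
```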
Denoising the Phantom

[Figure: noisy (error = 25.0), lowpass filtered (error = 42.6), wavelet thresholding with λ = 3σ (error = 11.0)]


Wavelets and geometry

Wavelet basis functions are isotropic


⇒ they cannot adapt to geometrical structure
Curvelets offer a more refined scaling concept...
Geometrical transforms
[Figure: atoms of geometrical transforms: curvelet, bandelet, wedgeprint]

These geometrical basis functions are parameterized by scale, location, and orientation


Piecewise-smooth approximation

Image fragment: C² smooth regions separated by C² contours

Fourier approximation:            ‖f − fS‖² ≲ S^(−1/2)

Wavelet approximation:            ‖f − fS‖² ≲ S^(−1)

Geometrical basis approximation:  ‖f − fS‖² ≲ S^(−2) · log^q S

(for some small q; within a log factor of optimal)


Application: Curvelet denoising

Zoom-in on a piece of Lena

[Figure: wavelet thresholding vs. curvelet thresholding]


Summary

Having a sparse representation plays a fundamental role in how well we can
- compress
- denoise
- restore
signals and images.

The above were accomplished with relatively simple algorithms
(in practice, we use similar ideas plus a bag of tricks).

Geometrical representations → better results
