
Plug-and-Play Methods Provably Converge with Properly Trained Denoisers

Ernest K. Ryu¹  Jialin Liu¹  Sicheng Wang²  Xiaohan Chen²  Zhangyang Wang²  Wotao Yin¹

2019 International Conference on Machine Learning

¹ UCLA Mathematics
² Texas A&M Computer Science and Engineering
Image processing via optimization

Consider recovering or denoising an image through the optimization

    minimize_{x ∈ R^d}  f(x) + γ g(x)

- x is the image
- f(x) is the data fidelity (a posteriori knowledge)
- g(x) is the noisiness of the image (a priori knowledge)
- γ ≥ 0 is the relative importance between f and g

Image processing via ADMM

We often use first-order methods, such as ADMM:

    x^{k+1} = argmin_{x ∈ R^d}  σ² g(x) + (1/2)‖x − (y^k − u^k)‖²
    y^{k+1} = argmin_{y ∈ R^d}  α f(y) + (1/2)‖y − (x^{k+1} + u^k)‖²
    u^{k+1} = u^k + x^{k+1} − y^{k+1}

with σ² = αγ.

Image processing via ADMM

More concise notation:

    x^{k+1} = Prox_{σ²g}(y^k − u^k)
    y^{k+1} = Prox_{αf}(x^{k+1} + u^k)
    u^{k+1} = u^k + x^{k+1} − y^{k+1}

The proximal operator of h is

    Prox_{αh}(z) = argmin_{x ∈ R^d}  α h(x) + (1/2)‖x − z‖²

(Well-defined if h is proper, closed, and convex.)
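As a concrete illustration, here is a minimal Python/NumPy sketch of a proximal operator for one simple choice of h, the ℓ1 norm, whose prox is elementwise soft-thresholding. The function name is illustrative and not from the paper's code.

```python
import numpy as np

def prox_l1(z, alpha):
    """Prox of alpha*||.||_1: argmin_x alpha*||x||_1 + (1/2)*||x - z||^2,
    i.e., elementwise soft-thresholding of z at level alpha."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)
```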

Interpretations of ADMM subroutines

The subroutine Prox_{σ²g} : R^d → R^d is a denoiser, i.e.,

    Prox_{σ²g} : noisy image ↦ less noisy image

Prox_{αf} : R^d → R^d enforces consistency with measured data, i.e.,

    Prox_{αf} : less consistent ↦ more consistent with data

Other denoisers

However, some state-of-the-art image denoisers do not originate from optimization problems (e.g. NLM, BM3D, and CNN-based denoisers). Nevertheless, such a denoiser H_σ : R^d → R^d still has the interpretation

    H_σ : noisy image ↦ less noisy image

where σ ≥ 0 is a noise parameter.

Is it possible to integrate such denoisers with existing algorithms such as ADMM or proximal gradient?

Plug and play!

To address this question, Venkatakrishnan et al.³ proposed Plug-and-Play ADMM (PnP-ADMM), which simply replaces the proximal operator Prox_{σ²g} with the denoiser H_σ:

    x^{k+1} = H_σ(y^k − u^k)
    y^{k+1} = Prox_{αf}(x^{k+1} + u^k)
    u^{k+1} = u^k + x^{k+1} − y^{k+1}

Surprisingly and remarkably, this ad-hoc method exhibited great empirical success and spurred much follow-up work.

³ Venkatakrishnan, Bouman, and Wohlberg, Plug-and-play priors for model based reconstruction, IEEE GlobalSIP, 2013.


Plug and play!

By integrating modern denoising priors into ADMM or other proximal algorithms, PnP combines the advantages of data-driven operators and classic optimization.

In image denoising, PnP replaces total variation regularization with an explicit denoiser such as BM3D or deep learning-based denoisers.

PnP is suitable when end-to-end training is impossible (e.g. due to insufficient data or time).

Example: Poisson denoising

[Figure: corrupted image (noisy, peak 0.1), another method, and PnP-ADMM with BM3D]

Rond, Giryes, and Elad, J. Vis. Commun. Image R. 2016.


Example: Inpainting

[Figure: original image and 5% random sampling]

Sreehari et al., IEEE Trans. Comput. Imag., 2016.


Example: Inpainting

[Figure: another method and PnP-ADMM with NLM]

Sreehari et al., IEEE Trans. Comput. Imag., 2016.


Example: Super resolution

[Figure: low-resolution input, several other methods, and PnP-ADMM with BM3D]

Chan, Wang, Elgendy, IEEE Trans. Comput. Imag., 2017.


Example: Single photon imaging

[Figure: corrupted image, other methods, and PnP-ADMM with BM3D]


Chan, Wang, Elgendy, IEEE Trans. Comput. Imag., 2017.
Example: Single photon imaging

[Figure: corrupted image, other methods, and PnP-ADMM with BM3D]


Chan, Wang, Elgendy, IEEE Trans. Comput. Imag., 2017.
Contribution of this work

The empirical success of Plug-and-Play (PnP) naturally leads us to ask theoretical questions: when does PnP converge and what denoisers can we use?

- We prove convergence of PnP methods under a certain Lipschitz condition.
- We propose real spectral normalization, a technique for constraining deep learning-based denoisers in their training to enforce the proposed Lipschitz condition.
- We present experimental results validating our theory.⁴

⁴ Code available at: https://github.com/uclaopt/Provable_Plug_and_Play/

Outline

PNP-FBS/ADMM and their fixed points

Convergence via contraction

Real spectral normalization: Enforcing Assumption (A)

Experimental validation



PnP FBS

Plug-and-play forward-backward splitting:

    x^{k+1} = H_σ(I − α∇f)(x^k)        (PNP-FBS)

where α > 0.
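A minimal Python/NumPy sketch of the PNP-FBS iteration, assuming a gradient oracle grad_f and a denoiser H_sigma are supplied as functions (both names are placeholders, not the paper's code):

```python
import numpy as np

def pnp_fbs(x0, grad_f, H_sigma, alpha, num_iters=100):
    """PNP-FBS: x^{k+1} = H_sigma((I - alpha*grad_f)(x^k))."""
    x = x0.copy()
    for _ in range(num_iters):
        x = H_sigma(x - alpha * grad_f(x))  # forward (gradient) step, then denoise
    return x
```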



PnP FBS

PNP-FBS is a fixed-point iteration, and x⋆ is a fixed point if

    x⋆ = H_σ(I − α∇f)(x⋆).

Interpretation of fixed points: a compromise between making the image agree with measurements and making the image less noisy.



PnP ADMM

Plug-and-play alternating direction method of multipliers:

    x^{k+1} = H_σ(y^k − u^k)
    y^{k+1} = Prox_{αf}(x^{k+1} + u^k)        (PNP-ADMM)
    u^{k+1} = u^k + x^{k+1} − y^{k+1}

where α > 0.
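A minimal Python/NumPy sketch of the PNP-ADMM iteration, assuming prox_alpha_f and the denoiser H_sigma are supplied as functions (placeholder names, not the paper's code):

```python
import numpy as np

def pnp_admm(x0, prox_alpha_f, H_sigma, num_iters=100):
    """PNP-ADMM: the proximal step for sigma^2 * g is replaced by the denoiser H_sigma."""
    x, y, u = x0.copy(), x0.copy(), np.zeros_like(x0)
    for _ in range(num_iters):
        x = H_sigma(y - u)        # denoising step
        y = prox_alpha_f(x + u)   # data-fidelity step
        u = u + x - y             # dual update
    return x
```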



PnP ADMM

PNP-ADMM is a fixed-point iteration, and (x⋆, u⋆) is a fixed point if

    x⋆ = H_σ(x⋆ − u⋆)
    x⋆ = Prox_{αf}(x⋆ + u⋆).



PnP DRS

Plug-and-play Douglas–Rachford splitting:

    x^{k+1/2} = Prox_{αf}(z^k)
    x^{k+1} = H_σ(2x^{k+1/2} − z^k)        (PNP-DRS)
    z^{k+1} = z^k + x^{k+1} − x^{k+1/2}

where α > 0.

We can write PNP-DRS as z^{k+1} = T(z^k) with

    T = (1/2) I + (1/2) (2H_σ − I)(2Prox_{αf} − I).
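A minimal sketch of one application of the operator T (equivalently, one PNP-DRS step), with prox_alpha_f and H_sigma supplied as placeholder functions:

```python
def pnp_drs_step(z, prox_alpha_f, H_sigma):
    """z^{k+1} = T(z^k) with T = (1/2) I + (1/2)(2 H_sigma - I)(2 Prox_{alpha f} - I),
    written in the equivalent three-step PNP-DRS form."""
    x_half = prox_alpha_f(z)          # x^{k+1/2}
    x_next = H_sigma(2 * x_half - z)  # x^{k+1}
    return z + x_next - x_half        # z^{k+1}
```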

PNP-ADMM and PNP-DRS are equivalent. We analyze convergence of PNP-DRS and translate the result to PNP-ADMM.
PnP DRS

PNP-DRS is a fixed-point iteration, and z⋆ is a fixed point if

    x⋆ = Prox_{αf}(z⋆)
    x⋆ = H_σ(2x⋆ − z⋆).



Outline

PNP-FBS/ADMM and their fixed points

Convergence via contraction

Real spectral normalization: Enforcing Assumption (A)

Experimental validation



What we do not assume

If we assume 2H_σ − I is nonexpansive, standard tools of monotone operator theory tell us that PnP-ADMM converges. However, this assumption is unrealistic,⁵ so we do not assume it.

We do not assume H_σ is continuously differentiable.

⁵ Chan, Wang, and Elgendy, Plug-and-Play ADMM for Image Restoration: Fixed-Point Convergence and Applications, IEEE TCI, 2017.


Main assumption

Rather, we assume H_σ : R^d → R^d satisfies

    ‖(H_σ − I)(x) − (H_σ − I)(y)‖ ≤ ε‖x − y‖        (A)

for all x, y ∈ R^d, for some ε ≥ 0. Since σ controls the strength of the denoising, we can expect H_σ to be close to the identity for small σ. If so, Assumption (A) is reasonable.



Contractive operators

Under (A), we show PNP-FBS and PNP-DRS are contractive iterations in the sense that we can express the iterations as x^{k+1} = T(x^k), where T : R^d → R^d satisfies

    ‖T(x) − T(y)‖ ≤ δ‖x − y‖

for all x, y ∈ R^d for some δ < 1.

If x⋆ satisfies T(x⋆) = x⋆, i.e., x⋆ is a fixed point, then x^k → x⋆ geometrically by the classical Banach contraction principle.



Convergence of PNP-FBS

Theorem.
Assume H_σ satisfies Assumption (A) for some ε ≥ 0. Assume f is µ-strongly convex, f is differentiable, and ∇f is L-Lipschitz. Then

    T = H_σ(I − α∇f)

satisfies

    ‖T(x) − T(y)‖ ≤ max{|1 − αµ|, |1 − αL|}(1 + ε)‖x − y‖

for all x, y ∈ R^d. The coefficient is less than 1 if

    1/(µ(1 + 1/ε)) < α < 2/L − 1/(L(1 + 1/ε)).

Such an α exists if ε < 2µ/(L − µ).
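As a quick numerical illustration of the step-size condition (the values µ = 1, L = 4, ε = 0.5 are made up for this check; note ε < 2µ/(L − µ) = 2/3):

```python
mu, L, eps = 1.0, 4.0, 0.5                        # illustrative values only
lower = 1.0 / (mu * (1.0 + 1.0 / eps))            # 1/(mu(1 + 1/eps)) = 1/3
upper = 2.0 / L - 1.0 / (L * (1.0 + 1.0 / eps))   # 2/L - 1/(L(1 + 1/eps)) ≈ 0.4167
print(lower, upper, lower < upper)                # the admissible interval is nonempty
```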



Convergence of PNP-DRS

Theorem.
Assume H_σ satisfies Assumption (A) for some ε ≥ 0. Assume f is µ-strongly convex and differentiable. Then

    T = (1/2) I + (1/2) (2H_σ − I)(2Prox_{αf} − I)

satisfies

    ‖T(x) − T(y)‖ ≤ ((1 + ε + εαµ + 2ε²αµ) / (1 + αµ + 2εαµ)) ‖x − y‖

for all x, y ∈ R^d. The coefficient is less than 1 if

    ε / ((1 + ε − 2ε²)µ) < α,    ε < 1.



Convergence of PNP-ADMM

Corollary.
Assume H_σ satisfies Assumption (A) for some ε ∈ [0, 1). Assume f is µ-strongly convex. Then PNP-ADMM converges for

    ε / ((1 + ε − 2ε²)µ) < α.



PnP-FBS vs. PnP-ADMM

PNP-FBS and PNP-ADMM share the same fixed points.⁶ ⁷ They are distinct methods for finding the same set of fixed points.

PNP-FBS is easier to implement, as it requires ∇f rather than Prox_{αf}.

PNP-ADMM has better convergence properties, as demonstrated by Theorems 1 and 2 and our experiments.

⁶ Meinhardt, Moeller, Hazirbas, and Cremers, Learning proximal operators: Using denoising networks for regularizing inverse imaging problems, ICCV, 2017.

⁷ Sun, Wohlberg, and Kamilov, An online plug-and-play algorithm for regularized image reconstruction, IEEE TCI, 2019.


Convergence proof sketch

PnP-FBS: The iteration is the composition of an expansive operator with a contractive operator.

PnP-DRS: The proof is based on the notion of “negatively averaged” operators of Giselsson.⁸

⁸ Giselsson, Tight global linear convergence rate bounds for Douglas–Rachford splitting, J. Fixed Point Theory Appl., 2017.


Outline

PNP-FBS/ADMM and their fixed points

Convergence via contraction

Real spectral normalization: Enforcing Assumption (A)

Experimental validation



Deep learning denoiser: DnCNN

We use DnCNN⁹, which learns the residual mapping with a 17-layer CNN.

[Architecture diagram: 17 convolutional layers with ReLU and batch normalization (Conv + ReLU, Conv + BN + ReLU blocks, and a final Conv)]

Given a noisy observation y = x + e, where x is the clean image and e is noise, the residual mapping R outputs the noise, i.e., R(y) = e, so that y − R(y) is the clean recovery. Learning the residual mapping is a common approach in deep learning-based image restoration.

⁹ Zhang, Zuo, Chen, Meng, and Zhang, Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising, IEEE TIP, 2017.


Deep learning denoiser: SimpleCNN

We also construct a simple convolutional encoder-decoder model for denoising and call it SimpleCNN.

[Architecture diagram: Conv + ReLU, Conv + ReLU, Conv + ReLU, Conv; 4 layers]

We use SimpleCNN to show realSN is applicable to any CNN denoiser.



Lipschitz constrained deep denoising

Note that

    (I − H_σ)(y) = y − H_σ(y) = R(y),

with denoiser H_σ, residual mapping R, and identity I.

Enforcing

    ‖(I − H_σ)(x) − (I − H_σ)(y)‖ ≤ ε‖x − y‖        (A)

is equivalent to constraining the Lipschitz constant of R. We propose a variant of spectral normalization for this.



Spectral normalization

Miyato et al.¹⁰ proposed spectral normalization (SN), which controls the Lipschitz constant of a network's layers by controlling the spectral norm of each layer's weight. If we use 1-Lipschitz nonlinearities (such as ReLU), the Lipschitz constant of a layer is upper-bounded by the spectral norm of its weight, and the Lipschitz constant of the full network is bounded by the product of the spectral norms of all layers.

While this basic methodology suits our goal, Miyato et al.’s SN uses an inexact implementation that underestimates the true spectral norm of convolutional layers (it normalizes the reshaped kernel matrix rather than the convolution operator itself).

¹⁰ Miyato, Kataoka, Koyama, and Yoshida, Spectral Normalization for Generative Adversarial Networks, ICLR, 2018.


Real Spectral Normalization

Real spectral normalization (realSN) accurately constrains the network’s Lipschitz constant through a power iteration with the convolutional linear operator K_l : R^{C_in × h × w} → R^{C_out × h × w}, where h, w are the input’s height and width, and its conjugate (transpose) operator K_l*. The iteration maintains U_l ∈ R^{C_out × h × w} and V_l ∈ R^{C_in × h × w} to estimate the leading left and right singular vectors, respectively. During each forward pass of the neural network, realSN conducts:

1. Apply one step of the power method with operator K_l:

    V_l ← K_l*(U_l) / ‖K_l*(U_l)‖_2,
    U_l ← K_l(V_l) / ‖K_l(V_l)‖_2.

2. Normalize the convolutional kernel K_l with the estimated spectral norm:

    K_l ← K_l / σ(K_l),  where σ(K_l) = ⟨U_l, K_l(V_l)⟩.

We can view realSN as an approximate projected gradient enforcing the Lipschitz continuity constraint.
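A minimal PyTorch-style sketch of one realSN update for a single convolutional layer; it paraphrases the two steps above under stated assumptions (stride 1, fixed padding, e.g. 3×3 kernels with padding 1), and the function and variable names are illustrative rather than taken from the paper's code:

```python
import torch
import torch.nn.functional as F

def real_sn_step(weight, u, padding=1, eps=1e-12):
    """One power-method step on the convolution operator K_l (not on the reshaped
    kernel matrix), then normalization of the kernel by sigma = <U_l, K_l(V_l)>.
    weight: (C_out, C_in, k, k); u: (1, C_out, h, w), estimate of the left singular vector."""
    with torch.no_grad():
        # V_l <- K_l*(U_l) / ||K_l*(U_l)||_2, with K_l* computed via transposed convolution
        v = F.conv_transpose2d(u, weight, padding=padding)
        v = v / (v.norm() + eps)
        # U_l <- K_l(V_l) / ||K_l(V_l)||_2
        Kv = F.conv2d(v, weight, padding=padding)
        u = Kv / (Kv.norm() + eps)
        # sigma(K_l) = <U_l, K_l(V_l)>
        sigma = torch.sum(u * Kv)
    # Normalize the kernel so the convolution operator has (estimated) spectral norm 1.
    return weight / sigma, u
```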
Implementation details
We train SimpleCNN and DnCNN in the setting of Gaussian denoising with 40 × 40 patches from the BSD500 dataset (natural images). RealSN constrains the Lipschitz constant to no more than 1.

[Diagram: BSD500 original images (clean) → 40 × 40 patches → 40 × 40 patches corrupted with Gaussian noise]

On an Nvidia GTX 1080 Ti, DnCNN took 4.08 hours and realSN-DnCNN took 5.17 hours to train, so the added cost of realSN is mild.
Outline

PNP-FBS/ADMM and their fixed points

Convergence via contraction

Real spectral normalization: Enforcing Assumption (A)

Experimental validation

Poisson denoising

Given a true image x_true ∈ R^d, we observe Poisson random variables

    y_i ∼ Poisson((x_true)_i)

for i = 1, . . . , d. We use the negative log-likelihood

    f(x) = Σ_{i=1}^d ( −y_i log(x_i) + x_i ).

For further details of the experimental setup, see the main paper or Rond et al.¹¹

¹¹ Rond, Giryes, and Elad, Poisson inverse problems by the plug-and-play scheme, J. Vis. Commun. Image R., 2016.
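For PnP-FBS on this problem only ∇f is needed; a minimal NumPy sketch under the assumption x > 0 elementwise (the function name is illustrative):

```python
import numpy as np

def grad_poisson_nll(x, y, eps=1e-8):
    """Gradient of f(x) = sum_i (-y_i * log(x_i) + x_i): the i-th entry is 1 - y_i / x_i."""
    return 1.0 - y / np.maximum(x, eps)
```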


Poisson denoising

[Figure: corrupted image, 3.36 dB; recovery, 20.28 dB]

Poisson denoising

[Histograms of the ratios ‖(I − H_σ)(x) − (I − H_σ)(y)‖/‖x − y‖ for each denoiser; maximum values: (a) BM3D 1.198, (b) SimpleCNN 0.96, (c) RealSN-SimpleCNN 0.758, (d) DnCNN 0.484, (e) RealSN-DnCNN 0.464]

We run PnP iterations, calculate ‖(I − H_σ)(x) − (I − H_σ)(y)‖/‖x − y‖ between the iterates and the limit, and plot the histogram. The maximum value (the red bar) lower-bounds ε of (A). Convergence of PnP-ADMM requires ε < 1. The results prove BM3D violates this assumption and illustrate that RealSN indeed controls (reduces) the Lipschitz constant.
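A minimal sketch of this empirical check, assuming the denoiser H is available as a function on NumPy arrays (the names are illustrative):

```python
import numpy as np

def lipschitz_ratios(H, iterates, limit):
    """Ratios ||(I - H)(x_k) - (I - H)(x*)|| / ||x_k - x*|| over the PnP iterates x_k;
    their maximum lower-bounds the epsilon of Assumption (A)."""
    r_star = limit - H(limit)                 # (I - H)(x*)
    ratios = []
    for x in iterates:
        num = np.linalg.norm((x - H(x)) - r_star)
        den = np.linalg.norm(x - limit)
        if den > 0:
            ratios.append(num / den)
    return ratios  # plot a histogram; the largest value is the red bar
```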
Poisson denoising

              BM3D      RealSN-DnCNN   RealSN-SimpleCNN
PNP-ADMM      23.4617   23.5873        18.7890
PNP-FBS       18.5835   22.2154        22.7280

PSNR (in dB) of the PnP methods with BM3D, RealSN-DnCNN, and RealSN-SimpleCNN plugged in. In both PnP methods, one of the two denoisers using RealSN, for which we have theory, outperforms BM3D.

Single photon imaging

The measurement model of quanta image sensors is

    z = 1(y ≥ 1),    y ∼ Poisson(α_sg G x_true),

where x_true ∈ R^d is the true image, G : R^d → R^{dK} duplicates each pixel to K pixels, α_sg ∈ R is the sensor gain, K is the oversampling rate, and z ∈ {0, 1}^{dK} is the observed binary photons. (y is not measured.) The negative log-likelihood is

    f(x) = Σ_{j=1}^n ( −K_j^0 log(e^{−α_sg x_j / K}) − K_j^1 log(1 − e^{−α_sg x_j / K}) ),

where K_j^1 is the number of ones in the j-th unit pixel and K_j^0 is the number of zeros in the j-th unit pixel.

For further details of the experimental setup, see the main paper or Elgendy and Chan.¹²
¹² Elgendy and Chan, Image reconstruction and threshold design for quanta image sensors, IEEE ICIP, 2016.


Single photon imaging

[Figure: corrupted image, 17.32 dB; recovery, 36.02 dB]

Measurement pixels take integer values between 0 and K = 64.

Single photon imaging

PnP-ADMM with RealSN-DnCNN provides the best PSNR. We also observe that RealSN makes PnP converge more stably.

PnP-FBS, α = 0.005
Average PSNR    BM3D      RealSN-DnCNN   RealSN-SimpleCNN
Iteration 50    28.7933   27.9617        29.0062
Iteration 100   29.0510   27.9887        29.0517
Best Overall    29.5327   28.4065        29.3563

PnP-ADMM, α = 0.01
Average PSNR    BM3D      RealSN-DnCNN   RealSN-SimpleCNN
Iteration 50    30.0034   31.0032        29.2154
Iteration 100   30.0014   31.0032        29.2151
Best Overall    30.0474   31.0431        29.2155

Compressed sensing MRI

PnP is useful in medical imaging when we do not have enough data for end-to-end training: train the denoiser H_σ on natural images, and “plug” it into the PnP framework to be applied to medical images.

Given a true image x_true ∈ C^d, CS-MRI measures

    y = F_p x_true + ε_e,

where F_p is the Fourier k-domain subsampling (partial Fourier operator) and ε_e ∼ N(0, σ_e I_k) is measurement noise. We use the objective function

    f(x) = (1/2)‖y − F_p x‖².

For further details of the experimental setup, see the main paper or Eksioglu.¹³

¹³ Eksioglu, Decoupled algorithm for MRI reconstruction using nonlocal block matching model: BM3D-MRI, J. Math. Imaging Vis., 2016.


Compressed sensing MRI

[Figure: radially sampled k-space measurement and recovery, 19.09 dB]

k-space measurement is complex-valued so we plot the absolute value.

Compressed sensing MRI

PSNR (in dB) for 30% sampling with additive Gaussian noise σ_e = 15. RealSN generally improves the performance.

Sampling approach      Random           Radial           Cartesian
Image                  Brain    Bust    Brain    Bust    Brain    Bust
Zero-filling            9.58    7.00     9.29    6.19     8.65    6.01
TV¹⁴                   16.92   15.31    15.61   14.22    12.77   11.72
RecRF¹⁵                16.98   15.37    16.04   14.65    12.78   11.75
BM3D-MRI¹⁶             17.31   13.90    16.95   13.72    14.43   12.35
PnP-FBS:
  BM3D                 19.09   16.36    18.10   15.67    14.37   12.99
  DnCNN                19.59   16.49    18.92   15.99    14.76   14.09
  RealSN-DnCNN         19.82   16.60    18.96   16.09    14.82   14.25
  SimpleCNN            15.58   12.19    15.06   12.02    12.78   10.80
  RealSN-SimpleCNN     17.65   14.98    16.52   14.26    13.02   11.49
PnP-ADMM:
  BM3D                 19.61   17.23    18.94   16.70    14.91   13.98
  DnCNN                19.86   17.05    19.00   16.64    14.86   14.14
  RealSN-DnCNN         19.91   17.09    19.08   16.68    15.11   14.16
  SimpleCNN            16.68   12.56    16.83   13.47    13.03   11.17
  RealSN-SimpleCNN     17.77   14.89    17.00   14.47    12.73   11.88

¹⁴ Lustig, Santos, Lee, Donoho, and Pauly, SPARS, 2005.
¹⁵ Yang, Zhang, and Yin, IEEE JSTSP, 2010.
¹⁶ Eksioglu, J. Math. Imaging Vis., 2016.
Conclusion
1. PnP-FBS and PnP-ADMM converge under a Lipschitz assumption on the denoiser.
2. Real spectral normalization enforces the Lipschitz condition when training deep learning-based denoisers.
3. The experiments validate the theory.

Paper available at: http://proceedings.mlr.press/v97/ryu19a.html
Code available at: https://github.com/uclaopt/Provable_Plug_and_Play/

