From Classical to Unsupervised-Deep-Learning Methods for Solving Inverse Problems in Imaging
Harshit Gupta
Thèse N 7360 (septembre 2020)
In this thesis, we propose new algorithms to solve inverse problems in the context of
biomedical images. Due to ill-posedness, solving these problems requires some prior
knowledge of the statistics of the underlying images. The traditional algorithms in
the field assume prior knowledge related to the smoothness or sparsity of these images.
Recently, they have been outperformed by second-generation algorithms, which
harness the power of neural networks to learn the required statistics from training data.
Even more recently, last-generation deep-learning-based methods have emerged
which require neither training nor training data.
This thesis devises algorithms which progress through these generations. It ex-
tends these generations to novel formulations and applications while bringing more
robustness. In parallel, it also progresses in terms of complexity, from proposing
algorithms for problems with 1D data and an exact known forward model to the
ones with 4D data and an unknown parametric forward model.
We introduce five main contributions. The last three of them propose deep-
learning-based latest-generation algorithms that require no prior training.
1) We develop algorithms to solve the continuous-domain formulation of inverse
problems with both classical Tikhonov and total-variation regularizations. We for-
malize the problems, characterize the solution set, and devise numerical approaches
to find the solutions.
2) We propose an algorithm that improves upon end-to-end, neural-network-based
second-generation algorithms. In our method, a neural network is first trained
as a projector on a training set, and is then plugged in as a projector inside the
projected gradient descent (PGD). Since the problem is nonconvex, we relax the
PGD to ensure convergence to a local minimum under some constraints. This
method outperforms all the previous-generation algorithms for Computed Tomography (CT).
3) We develop a novel time-dependent deep-image-prior algorithm for modalities
that involve a temporal sequence of images. We parameterize them as the output
of an untrained neural network fed with a sequence of latent variables. To impose
temporal directionality, the latent variables are assumed to lie on a 1D manifold.
The network is then tuned to minimize the data fidelity. We obtain state-of-the-art
results in dynamic magnetic resonance imaging (MRI) and even recover intra-frame
images.
4) We propose a novel reconstruction paradigm for cryo-electron-microscopy
(CryoEM) called CryoGAN. Motivated by generative adversarial networks (GANs),
we reconstruct a biomolecule’s 3D structure such that its CryoEM measurements
resemble the acquired data in a distributional sense. The algorithm requires neither
pose nor likelihood estimation, needs no ab initio model, and comes with a theoretical
guarantee of recovery of the true structure.
5) We extend CryoGAN to reconstruct continuously varying conformations of
a structure from heterogeneous data. We parameterize the conformations as the
output of a neural network fed with latent variables on a low-dimensional manifold.
The method is shown to recover continuous protein conformations and their energy
landscape.
Dans cette thèse, nous proposons de nouveaux algorithmes pour résoudre des pro-
blèmes inverses dans le cadre d’images biomédicales. En raison de leur caractère mal posé,
la résolution de ces problèmes nécessite une connaissance préalable des statistiques
des images sous-jacentes. Les algorithmes traditionnels du domaine supposent des
connaissances préalables liées à la régularité ou à la parcimonie de ces images. Récemment,
ils ont été dépassés par les algorithmes de deuxième génération qui exploitent la
puissance des réseaux de neurones pour apprendre les statistiques requises à partir
des données d’entraînement. Plus récemment encore, des méthodes basées sur l’ap-
prentissage en profondeur de dernière génération sont apparues, qui ne nécessitent
ni formation ni données de formation.
Cette thèse conçoit des algorithmes qui progressent à travers ces générations. Il
étend ces générations à de nouvelles formulations et applications tout en apportant
plus de robustesse. En parallèle, il progresse également en termes de complexité, de
la proposition d’algorithmes pour des problèmes avec des données 1D et un modèle
direct exact connu à ceux avec des données 4D et un modèle direct paramétrique
inconnu.
Nous introduisons cinq contributions principales. Les trois derniers proposent
des algorithmes de dernière génération basés sur le deep learning qui ne nécessitent
aucune formation préalable.
1) Nous développons des algorithmes pour résoudre la formulation dans le do-
maine continu de problèmes inverses avec des régularisations Tikhonov classiques
et à variation totale. Nous formalisons les problèmes, caractérisons l’ensemble de
solutions et concevons des approches numériques pour trouver les solutions.
2) Nous proposons un algorithme qui améliore les algorithmes de deuxième gé-
nération basés sur un réseau de neurones de bout en bout. Dans notre méthode,
un réseau de neurones est d’abord formé en tant que projecteur sur un ensemble
d’entraînement, puis branché en tant que projecteur à l’intérieur de la descente
en gradient projeté (PGD). Comme le problème n’est pas convexe, nous assouplissons
le PGD pour assurer la convergence vers un minimum local sous certaines
contraintes. Cette méthode surpasse tous les algorithmes de génération précédente
pour la tomodensitométrie (CT).
3) Nous développons un nouvel algorithme de type deep image prior dépendant
du temps pour les modalités qui impliquent une séquence temporelle d’images.
Nous les paramétrons comme la sortie d’un réseau neuronal non entraîné alimenté par
une séquence de variables latentes. Pour imposer une directionnalité temporelle, les
variables latentes sont supposées se trouver sur une variété 1D. Le réseau est ensuite
réglé pour minimiser la fidélité des données. Nous obtenons des résultats de pointe
en imagerie par résonance magnétique dynamique (IRM) et récupérons même des
images intra-trame.
4) Nous proposons un nouveau paradigme de reconstruction pour la cryo micro-
scopie électronique (CryoEM) appelé CryoGAN. Motivés par des réseaux contra-
dictoires génératifs (GAN), nous reconstruisons la structure 3D d’une biomolécule
de telle sorte que ses mesures CryoEM ressemblent aux données acquises dans un
sens distributionnel. L’algorithme ne nécessite ni estimation de pose ni estimation
de vraisemblance, ne nécessite aucun modèle ab initio, et il est prouvé qu’il offre une
garantie théorique de récupération de la véritable structure.
5) Nous étendons CryoGAN pour reconstruire des conformations variant conti-
nuellement d’une structure à partir de données hétérogènes. Nous paramétrons les
conformations comme la sortie d’un réseau neuronal alimenté avec des variables la-
tentes sur une variété de faible dimension. Il est démontré que la méthode récupère
les conformations protéiques continues et leur paysage énergétique.
This thesis is the result of support from my friends, family, and mentors who showed
faith in me throughout this journey. Firstly, I thank Prof. Michael Unser for super-
vising this thesis. His intuition, passion, and curiosity helped me build my research
instinct. I also thank him for trusting me during the tough times and encouraging
me with both words and actions. In future, I would like to emulate the culture he
has cultivated in the group.
I express sincere thanks to jury president Prof. Dimitri Van de Ville, the jury
members, Prof. François Fleuret, Prof. Ender Konukoglu, and Dr. Sjors Scheres
for reviewing and accepting this thesis.
I would like to thank Dr. Michael T. McCann (Mike), the creative genius, who
with his beautiful mind helped me solve many personal and research problems. I
tried to learn his technique of brutally analyzing concepts and problems. This was
immensely useful during the course of this thesis. I thank Dr. Soham Basu for all
his advice and moral support. I thank Dr. Daniel Schmitter, Dr. Denis Fortun,
and Luc Zheng for being there in the beginning years of my PhD.
I was fortunate to share my office with Thanh-an Pham from whom I learnt
to have fun while carrying out a PhD. His humor has been a good company for
the last three years. I am glad to have Shayan Aziznejad as my lab mate and a
dear friend. Over the course of the thesis I learnt many life perspectives from him,
shared numerous views, and had a lot of fun. I thank Pablo Garcia for all the fun
coffee meetings and gaming sessions. I thank Dr. Kyong Hwan Jin for initiating
my journey into deep learning.
I would like to thank Dr. Daniel Sage for his help with almost all the aspects
of my research life at the lab and Dr. Philippe Thevenaz for all the help in writing
the research papers. I thank Dr. Emrah Bostan and Dr. Pedram Pad for being
really patient mentors and office mates during the initial part of my thesis. I thank
Dr. Masih Nilchian for all the life wisdom he shared with me.
I thank Dr. Laurene Donati, Dr. Anais Bodoul, Thomas Debarre, Fangshu
Yang, Pakshal Bohra, Dr. Quentin Denoyelle, and Dr. Jaejun Yoo for all the fun,
humor, and discussions we had in the past few years. I thank all the past members
of our lab Dr. Ferreol Soulez, Prof. Adrien Depeursinge, Dr. Emmanuel Soubies,
Dr. Zsuzsanna Püspöki, Dr. Virginie Uhlmann, Prof. Arash Amini, Leello Dadi,
and Carlos Garcia for the shared memories. I thank the recent members or affiliates
of the lab Joaquim Campos, Thong Huy Phan, Alexis Goujon, Dr. Pol del Aguila
Pla, and Yan Liu for bringing fresh perspectives.
I would like to specially thank the Indian community in Lausanne. I had fun of
a lifetime with Ranjith, Sai, Harshal, Kunhal, Maneesha, Sanket, Sourabh, Sagar,
Salil, Tejal, Anjali, Venkat, Anand, Rishikesh, Teju, Aparajitha, Yanisha, Sean,
Kavitha, Nithin, Chethana, Shravan, Mohit, Murali, Amrita, and Mayank. I thank
my friends Kaushal, Ravi, Gagandeep, Arpit, Pawan, Saurabh, Shashank, Puru,
Lakshman, Nikit, Rupam, Rajan from IIT Guwahati and Kirti, Khushal, Arpit,
Rishiraj, Rishabh, Rajwardhan, and Rahul from childhood for all their support.
Lastly, I thank my grandparents, parents, brother, and my entire long and wide
family without whom I could not have completed this thesis.
Contents
Abstract i
Résumé iii
Acknowledgement ix
Introduction 1
I First Generation 19
2 Continuous-Domain Extension of Classical Methods 21
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Continuous-Domain Formulation of Inverse Problems . . . . . . . . . 24
2.2.1 Measurement Operator . . . . . . . . . . . . . . . . . . . . . 25
2.2.2 Data-Fidelity Term . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.3 Regularization Operator . . . . . . . . . . . . . . . . . . . . 25
2.2.4 Regularization Norms . . . . . . . . . . . . . . . . . . . . . . 26
2.2.5 Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4.1 Inverse Problem with Tikhonov/L2 Regularization . . . . . . 30
2.4.2 Inverse Problem with gTV Regularization . . . . . . . . . . . 31
2.4.3 Illustration with Ideal Sampling . . . . . . . . . . . . . . . . 32
2.5 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Discretization and Algorithms . . . . . . . . . . . . . . . . . . . . . . 35
2.6.1 Tikhonov Regularization . . . . . . . . . . . . . . . . . . . . . 35
2.6.2 gTV Regularization . . . . . . . . . . . . . . . . . . . . . . . 37
2.6.3 Alternative Grid-free Techniques . . . . . . . . . . . . . . . . 42
2.7 Illustrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.7.1 Random Sampling . . . . . . . . . . . . . . . . . . . . . . . . 44
2.7.2 Multiple Solutions . . . . . . . . . . . . . . . . . . . . . . . . 44
2.7.3 Random Fourier Sampling . . . . . . . . . . . . . . . . . . . . 45
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
II Second Generation 53
3 Deep-Learning-based PGD for Iterative Reconstruction 55
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.1.2 Related and Prior Work . . . . . . . . . . . . . . . . . . . . . 58
3.1.3 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Theoretical Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.2 Constrained Least Squares . . . . . . . . . . . . . . . . . . . 61
3.2.3 Projected Gradient Descent . . . . . . . . . . . . . . . . . . . 61
A Appendices 173
A.1 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
A.1.1 Proof of Theorem 2.4.1 . . . . . . . . . . . . . . . . . . . . . 173
A.1.2 Proof of Theorem 2.4.2 . . . . . . . . . . . . . . . . . . . . . 178
A.1.3 Proof of Theorem 2.6.2 . . . . . . . . . . . . . . . . . . . . . 180
A.1.4 Proof of Proposition A.1.3 . . . . . . . . . . . . . . . . . . . . 181
A.1.5 Proof of Proposition A.1.4 . . . . . . . . . . . . . . . . . . . . 181
A.1.6 Structure of the Search Spaces . . . . . . . . . . . . . . . . . 182
A.2 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
A.2.1 Proof of Theorem 3.3.1 . . . . . . . . . . . . . . . . . . . . . 183
A.2.2 RPGD for Poisson Noise in CT . . . . . . . . . . . . . . . . . 185
A.2.3 Proof of Proposition 3.2.1 . . . . . . . . . . . . . . . . . . . . 186
A.2.4 Proof of Proposition 3.2.2 . . . . . . . . . . . . . . . . . . . . 187
A.2.5 Proof of Proposition 3.2.3 . . . . . . . . . . . . . . . . . . . . 187
A.2.6 Proof of Theorem 3.2.4 . . . . . . . . . . . . . . . . . . . . . 188
A.2.7 Proof of Theorem 3.2.5 . . . . . . . . . . . . . . . . . . . . . 189
A.2.8 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
A.3 Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
A.3.1 Synthetic Data (Figure 5.3) . . . . . . . . . . . . . . . . . . . 195
A.3.2 Additional Synthetic Data (Figure 5.4) . . . . . . . . . . . . . 197
A.3.3 Experimental Data (Figure 5.5) . . . . . . . . . . . . . . . . . 197
A.4 Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
A.4.1 Neural Network Architectures . . . . . . . . . . . . . . . . . . 200
Bibliography 200
Imagine we have an old weighing machine and a sealed paper bag that contains
two apples. We would like to know the weight of each apple. However, the bag
must remain sealed. When weighed, the total weight appears to be 501 grams.
This problem of finding the individual weights of the apples, defined here as our
signal, from an indirect hint about them, called measurement, can be regarded as an
inverse problem. The measurement is obtained from the signal through a forward
model. In this case, the forward model is the operation of summation.
This particular inverse problem is ill-posed since one cannot find the exact
weights of the two apples based solely on this one measurement. This is for two
reasons. Firstly, there is uncertainty about the exact total weight because the old
machine can have an error margin and the bag has some unknown weight. These
latter factors are called noise, which corrupts the true measurement. Secondly,
even if the true total weight were known, there would still be infinitely many
combinations of weights that sum to this exact total.
To address this problem, we could take advantage of prior knowledge about
the apples. For example, if we know that in this particular season all the apples have
almost the same weight, then we can estimate that each apple weighs around 250.5
grams. We could also factor in the weight of the bag and the error margin of the
machine to get an even better estimate. If we know that both of these together
contribute at most 1 gram of variation in the total weight, then we can claim that
the weight of each apple lies in an interval, somewhere between 250 and 251 grams.
The resolution of ill-posed inverse problems is fundamental to modern imag-
ing [1] such as fluorescence microscopy, computed tomography (CT), magnetic
resonance imaging (MRI), cryo-electron microscopy (Cryo-EM), single-molecule
localization microscopy (SMLM), etc. (see Figure 1). These techniques are essential
• The measurements are corrupted by noise. This can be thermal noise in the
detector or intrinsic to the forward model or imaging setup.
• The measurements may not be fully informative because the forward model
has an intrinsic null space.
Therefore, solving these problems requires prior knowledge about the true signal.
For example, it can be assumed that the CT image of the organ is piecewise-
constant. The quality of reconstruction therefore substantially hinges on the ability
to inject accurate prior knowledge during the reconstruction.
In all techniques (CT, MRI, Cryo-EM, SMLM, among others), the approaches to
solve the underlying inverse problems have dramatically shifted from the classical-
¹The COVID-19 image has been obtained from the website of the Centers for Disease Control and Prevention.
Depending on the way the network is used in the reconstruction, this second gen-
eration of methods can itself be divided into two categories.
1. Direct Feedforward Reconstruction: In this category, the training set is first
used to learn a neural network that maps the measurements to their corre-
sponding training signal [17–20]. Once trained, the neural network is fed with
the actual measurements and the output of the network then yields the re-
construction. These methods reconstruct images with unprecedented quality,
owing to the capacity of neural networks to faithfully learn complicated map-
pings. However, these methods do not actively inject the information from
the measurements and rely only on the training data to understand the inver-
sion of the underlying physics-based forward model. This might reconstruct
images that lack consistency with the actual measurements.
2. Iterative Reconstruction with Consistency Constraints: Many approaches
have been proposed to enforce data consistency in the solution [21–25], in-
cluding one of the contributions of this thesis [26]. In these approaches, recon-
struction is performed iteratively with information from the forward model
and the measurements being injected in conjunction with the ability of the
network to reconstruct quality images. In essence, methods of this category
combine the power of neural networks with the variational formulation of the
classical methods. They produce better results and favor robustness when
there is a mismatch between the training-data and the image to be recon-
structed.
The main disadvantage of these supervised learning-based methods is that they
require training-data.
Recently, there has been an emergence of third-generation methods that use
deep learning without requiring training or training-data. The most prominent
representatives of this generation are methods based on deep image priors [27],
which use an untrained network to solve inverse problems. In this scheme, a fixed
random input is fed to the network. The network is then optimized to ensure
that its output is consistent with the acquired measurements. The success of this
scheme is explained by the neural network architecture which imposes an implicit
regularization that favours the reconstruction of natural-looking images. However,
this deep image prior needs more theoretical and experimental analysis to understand
its effect, applicability, and limits. Another category has also
Main Contributions
This thesis brings five main contributions to the field of inverse problems in imaging.
These contributions progress from the classical methods of the first generation to the
deep-learning-based methods of the last generation. They extend these generations
to novel formulations and applications, all the while bringing more robustness. In
parallel, they also progress in terms of complexity, from algorithms for problems
with 1D data and an exact known forward model to the problems with 4D data
and an unknown parametric forward model. We summarize these contributions in
Figure 2.
ization,” IEEE Transactions on Signal Processing, vol. 66(17), pp. 4670-84, July
2018.
we assume the latent variable sequence to lie on a line segment. For approximately
periodic data like cardiac motion, we assume it to lie on a helix. We show that our
scheme results in state-of-the-art performance for both synthetic and experimental
data. Moreover, the interpolation of latent variables even lets us recover intra-
frame images. To the best of our knowledge, this is the first deep-learning method
applicable for dynamic MRI which requires neither prior training nor training-data.
Related Preprint: K.H. Jin*, H. Gupta*, J. Yerly, M. Stuber, M. Unser,
“Time-dependent deep image prior for dynamic MRI,” arXiv [eess.IV], October,
2019. *cofirst authors
Chapter 1
Linear Inverse Problems for Biomedical Imaging
In this chapter, we will provide the mathematical formulation of inverse problems
and the reconstruction methods.
1.1 Overview
Solving inverse problems implicitly requires inverting the forward model which in-
corporates the physical acquisition process by which the measurement is acquired
from a signal. In the imaging applications dealt with in this thesis, the acquisition
physics is such that the forward model is linear and hence can be represented by a
linear operator $\mathbf{H} : \mathbb{R}^N \to \mathbb{R}^M$. The measurement $\mathbf{y}$ is then given by
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad (1.1)$$
where $J$ is a suitable functional of the signal and the given measurement.
Hence, the reconstruction procedure requires carefully choosing the functional $J$ and
then finding the signal that minimizes it. Solving these problems when $M > N$
is easy, and the algorithms that deal with this scenario are mature and efficient. In
this thesis, we deal with the current trends in imaging. These trends may include
significantly fewer measurements than the number of unknowns ($M \ll N$). For example,
this is useful in decreasing either the radiation dose in computed tomography (CT)
or the scanning time in MRI. Moreover, the measurements are typically very noisy
due to short integration times, which calls for some form of denoising.
This gives rise to an ill-posed problem in the sense that there may be an infinity
of consistent signals (or images) that map to the same measurements y. Thus, one
challenge of the reconstruction algorithm is to select the best solution among a mul-
titude of potential candidates. To counteract this ill-posedness, appropriate prior
knowledge about the signal needs to be injected. The quality of the reconstruction
method is therefore highly dependent on the data-fidelity and regularization used.
The available reconstruction methods can be broadly arranged in three genera-
tions, which represent the continued efforts of the research community to address
the aforementioned challenges.
Due to the simplicity of the objective function, the reconstruction is easy to perform.
In fact, for the case when $E(\mathbf{H}\mathbf{x}, \mathbf{y}) = \|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2$, the solution is given by
$$\mathbf{x}^* = \left(\mathbf{H}^T\mathbf{H} + \lambda\mathbf{I}\right)^{-1}\mathbf{H}^T\mathbf{y}. \qquad (1.5)$$
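The closed form (1.5) is straightforward to evaluate when $\mathbf{H}$ is small enough to be stored explicitly. Below is a minimal NumPy sketch under that assumption; the forward model, the sizes, and the regularization weight are hypothetical choices, not quantities from this thesis.

```python
import numpy as np

def tikhonov_solve(H, y, lam):
    """Closed-form Tikhonov solution x* = (H^T H + lam*I)^{-1} H^T y, as in (1.5)."""
    N = H.shape[1]
    A = H.T @ H + lam * np.eye(N)          # normal matrix; invertible for lam > 0
    return np.linalg.solve(A, H.T @ y)

# Tiny illustration with a random forward model (M << N).
rng = np.random.default_rng(0)
H = rng.standard_normal((20, 50))
x_true = rng.standard_normal(50)
y = H @ x_true + 0.01 * rng.standard_normal(20)
x_hat = tikhonov_solve(H, y, lam=0.1)
```

For large images, one would instead solve the normal equations iteratively (e.g., with conjugate gradients) rather than forming and inverting the matrix explicitly.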
Tikhonov-based classical methods are fast and provide excellent results when
the number of measurements is large and the noise is small [30]. However, in case of
extreme imaging, when $M$ is smaller than $N$, they reconstruct overly smooth images.
Here, $\mathrm{L}$ could be a difference operator such that the regularization enforces the image to
be piecewise-constant. Due to the non-differentiability of the regularization term,
the solution to these problems cannot be found using simple gradient-based methods
such as gradient descent. Instead, the solutions are typically found iteratively
by enforcing the data fidelity and the regularization alternately. The latter is
done using a proximal operator. The overall skeleton of such iterative algorithms can be
summarized as follows.
This alternating mechanism is the backbone of many schemes like the Iterative Soft-Thresholding
Algorithm (ISTA) [31], the Fast Iterative Soft-Thresholding Algorithm
(FISTA) [32], and the Alternating-Direction Method of Multipliers (ADMM) [13]. In
ISTA, the step size $\tau$ and the threshold are kept the same through the iterations. However, the rate of
global convergence is sublinear, $O(1/k)$. In FISTA, these parameters are updated
based on Nesterov acceleration. This brings a faster speed of convergence than ISTA,
with convergence rate $O(1/k^2)$. A similar convergence rate is achieved in ADMM by
decoupling the data-fidelity and regularization terms using auxiliary variables and having
a separate step to explicitly update these auxiliary variables.
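To make the alternation concrete, here is a minimal NumPy sketch of ISTA for the common special case where the data-fidelity term is quadratic and the regularizer is the $\ell_1$ norm; the step size, number of iterations, and problem sizes are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (component-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(H, y, lam, n_iter=200):
    """ISTA for min_x 0.5 * ||Hx - y||_2^2 + lam * ||x||_1.

    Each iteration alternates a gradient step on the data fidelity with the
    proximal (soft-thresholding) step on the regularizer.
    """
    tau = 1.0 / np.linalg.norm(H, 2) ** 2      # 1 / (largest eigenvalue of H^T H)
    x = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ x - y)               # gradient of the data-fidelity term
        x = soft_threshold(x - tau * grad, tau * lam)
    return x
```

FISTA would add a Nesterov extrapolation step between iterations to obtain the $O(1/k^2)$ rate mentioned above.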
Under the assumption that the functionals $E$ and $R$ are convex, the solution of
(1.6) also satisfies
$$\mathbf{x}^* = \arg\min_{\mathbf{x} \in S_R} E(\mathbf{H}\mathbf{x}, \mathbf{y}). \qquad (1.9)$$

1.3 Generation II - Supervised-Deep-Learning-Based Methods

These methods rely on a training set
$$\{(\mathbf{x}_1, \mathbf{y}_1), \ldots, (\mathbf{x}_Q, \mathbf{y}_Q)\},$$
which constitutes pairs of signal and measurement. The network is first trained to
map these measurements to their true corresponding images and is then deployed
on new, unseen measurements. This is given by
$$\text{Training:} \quad \boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} \sum_{q=1}^{Q} \left\|\mathrm{CNN}_{\boldsymbol{\theta}}(\mathbf{y}_q) - \mathbf{x}_q\right\|^2, \qquad (1.11)$$
Often, the measurement is followed by a fixed linear step like Back Projection
(BP) or Filtered Back-Projection (FBP) before being fed to the CNN [17, 18]. This
helps in better convergence and implicitly injects the physical knowledge during
the learning.
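A minimal PyTorch-style sketch of this direct feedforward pipeline is given below. The network, the FBP operator, and the dataset are placeholders (not the exact architecture or data used in this thesis); the loop simply minimizes the squared error of (1.11) on FBP-preprocessed measurements.

```python
import torch

def train_direct_reconstructor(cnn, fbp, dataset, n_epochs=50, lr=1e-4):
    """Supervised training of a CNN that maps a fixed linear reconstruction
    (e.g., FBP) of each measurement to its ground-truth image, as in (1.11)."""
    optimizer = torch.optim.Adam(cnn.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(n_epochs):
        for y_q, x_q in dataset:          # (measurement, ground-truth) pairs
            x_init = fbp(y_q)             # fixed linear step (BP or FBP)
            loss = loss_fn(cnn(x_init), x_q)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return cnn
```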
Although it is now well documented that these methods have the capacity to
outperform the classical algorithms, there is still limited theoretical understanding
regarding them. Moreover, these methods are extremely sensitive to the mismatch
between the statistics of the training and testing data. This is because the mea-
surement information is injected into the reconstruction only once.
ments associated with the reconstruction. CNNs have the ability to represent a
large set of images. Therefore, they are susceptible to reconstructing undesirable images
in (1.13), which minimizes the data fidelity. Yet, it has been observed that,
when the optimization is carried out using standard techniques, the algorithm reconstructs
desirable natural and biomedical images. In the case when the measurement
is corrupted with noise, early stopping is required.
This bias or prior towards desirable images has been ascribed to the architecture
of the network, which not only lets it represent natural and biomedical images more
easily than other images but also helps in reaching these desirable images faster.
In the literature, this bias is called the deep image prior.
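The following PyTorch sketch illustrates the basic deep-image-prior loop described above; the untrained network, the fixed random input, and the differentiable forward model are placeholders, and the number of iterations doubles as the early-stopping parameter for noisy measurements.

```python
import torch

def deep_image_prior(net, z, forward_model, y, n_iter=2000, lr=1e-3):
    """Tune the weights of an untrained network so that the measurements of its
    output match the acquired data; the random input z is kept fixed."""
    z = z.detach()                                     # only the weights are optimized
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(n_iter):                            # n_iter acts as early stopping
        x = net(z)                                     # candidate image
        loss = torch.sum((forward_model(x) - y) ** 2)  # data fidelity only
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return net(z).detach()
```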
1.5 Summary
In this chapter, we give an overview of the reconstruction methods used to solve linear
inverse problems. We describe the optimization problems that are formulated for
these methods and the routines deployed to solve them numerically. This will act
as the background for understanding the next parts of our thesis.
Part I
First Generation
Chapter 2
Continuous-Domain Extension
of Classical Methods
In Chapter 1, we formulated linear inverse problems. A subtle point to note in that
formulation was that the signal, the measurement, and the forward model were represented
in the discrete domain (as vectors). It is important to understand that the true
signal and the forward model are continuous-domain entities. The measurements
are generally discrete since they are obtained from a finite number of detectors.
For example, MRI measurements are samples of the Fourier transform at a finite
number of different frequencies of a continuous-domain signal. In this chapter, we
further our understanding of classical methods by appropriate extension to solve
continuous-domain linear inverse problems.
2.1 Overview
Although the signals and forward model that one encounters are in continuous-
domain, they are discretized in order to numerically solve the inverse problems.
This is done by choosing an arbitrary but suitable basis $\{\varphi_n\}$ and representing the
continuous-domain signal as
$$f(x) = \sum_{n=1}^{N} f_n \varphi_n(x), \qquad (2.1)$$

¹The content of this chapter is based on our work [29].
which can be efficiently solved using iterative algorithms [7], [41]. The solutions
to (2.2), (2.3), and their variants with generalized data-fidelity terms are well
known [42], [43], [44], [45].
While those discretization paradigms are well studied and used successfully in prac-
tice, it remains that the use of a prescribed basis $\{\varphi_n\}$, as in (2.1), is somewhat
arbitrary.
In this chapter, we propose to bypass this limitation by reformulating and solv-
ing the linear inverse problem directly in the continuous domain. To that end, we
impose the regularization in the continuous domain, too, and restate the recon-
struction task as a functional minimization. We show that this new formulation
leads to the identification of an optimal basis for the solution which then suggests
a natural way to discretize the problem.
2.1.1 Contributions
Our contributions are twofold and are summarized as follows:
Theoretical.
• Given $\mathbf{y} \in \mathbb{R}^M$, we formalize the 1D inverse problem in the continuous domain as
Algorithmic.
• We propose a discretization scheme to minimize $J_R(f\,|\,\mathbf{y})$ in the continuous
domain. Even though the minimization of $J_R(f\,|\,\mathbf{y})$ is an infinite-dimensional
problem, the knowledge of the optimal basis of the solution makes the problem
finite-dimensional: it boils down to the search for a set of optimal expansion
coefficients.
This chapter is organized as follows: In Sections 2.2 and 2.4, we present the formu-
lation and the theoretical results of the inverse problem for the two regularization
cases. In Section 2.5, we compare the solutions of the two cases. We present our
numerical algorithm in Section 2.6 and illustrate its behavior with various exam-
ples in Section 2.7. The mathematical proofs of the main theorems are given in the
appendices.
Given that $\hat{L}$ is the frequency response of $\mathrm{L}$, the Green's function can be calculated
through the inverse Fourier transform $\rho_{\mathrm{L}} = \mathcal{F}^{-1}\left\{\frac{1}{\hat{L}}\right\}$. For example, if $\mathrm{L} = \mathrm{D}$,
then $\rho_{\mathrm{D}}(x) = \frac{1}{2}\,\mathrm{sign}(x)$. Here, the Fourier transform $\mathcal{F} : f \mapsto \mathcal{F}f = \int_{\mathbb{R}} f(x)\,\mathrm{e}^{-\mathrm{j}x(\cdot)}\,\mathrm{d}x$
is defined when the function is integrable and can be extended in the usual manner
to $f \in \mathcal{S}'(\mathbb{R})$, where $\mathcal{S}'(\mathbb{R})$ is the Schwartz space of tempered distributions. In cases
such as $\rho_{\mathrm{L}} = \mathcal{F}^{-1}\left\{\frac{1}{\hat{L}}\right\}$ when the argument is non-integrable, the definition should
be understood in terms of the generalized Fourier transform [47, Definition 8.9], which treats
the argument as a distribution.
There, $\mathcal{S}(\mathbb{R})$ is the Schwartz space of smooth and rapidly decaying functions,
which is also the dual of $\mathcal{S}'(\mathbb{R})$. Moreover, $\mathcal{M} = \{w \in \mathcal{S}'(\mathbb{R}) : \|w\|_{\mathcal{M}} < \infty\}$.
In particular, when $w \in L_1 \subset \mathcal{M}$, we have that
$$\|w\|_{\mathcal{M}} = \int_{\mathbb{R}} |w(x)|\,\mathrm{d}x = \|w\|_{L_1}. \qquad (2.9)$$
Yet, we note that $\mathcal{M}$ is slightly larger than $L_1$ since it also includes the Dirac
distribution $\delta$ with $\|\delta\|_{\mathcal{M}} = 1$. The popular TV norm is recovered by taking
$\|f\|_{\mathrm{TV}} = \|\mathrm{D}f\|_{\mathcal{M}}$ [46].
In other words, our search (or native) space is the largest space over which the
regularization is well defined. It turns out that $\mathcal{X}_2$ and $\mathcal{X}_1$ are Hilbert and Banach
spaces, respectively. However, this is nontrivial to see since these spaces contain
the null space, which makes $\|\mathrm{L}f\|_{L_2}$ and $\|\mathrm{L}f\|_{\mathcal{M}}$ semi-norms. This null space can
be taken care of by using an appropriate inner product $\langle \cdot, \cdot \rangle_{\mathcal{N}_{\mathrm{L}}}$ (norm $\|\cdot\|_{\mathcal{N}_{\mathrm{L}}}$,
respectively) such that $\langle \cdot, \cdot \rangle_{\mathcal{X}_2} = \langle \mathrm{L}\cdot, \mathrm{L}\cdot \rangle + \langle \cdot, \cdot \rangle_{\mathcal{N}_{\mathrm{L}}}$ ($\|\cdot\|_{\mathcal{X}_1} = \|\mathrm{L}\cdot\|_{\mathcal{M}} + \|\cdot\|_{\mathcal{N}_{\mathrm{L}}}$,
respectively) is the inner product (norm, respectively) on $\mathcal{X}_2$ ($\mathcal{X}_1$, respectively).
The structure of these spaces has been studied in [46] and is recalled in Appendix A.1.6.
As we shall see in Section 2.4, the solution of (2.4) will be composed of splines;
therefore, we also review the definition of splines.
$$\mathrm{L}f = \sum_{k=1}^{K} a_k \delta(\cdot - x_k). \qquad (2.12)$$
By solving the differential equation in (2.12), we find that the generic form of the
nonuniform spline f is
$$f = p_0 + \sum_{k=1}^{K} a_k \rho_{\mathrm{L}}(\cdot - x_k), \qquad (2.13)$$
where $p_0 \in \mathcal{N}_{\mathrm{L}}$.
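For $\mathrm{L} = \mathrm{D}^2$, the form (2.13) is easy to evaluate numerically, since $\rho_{\mathrm{D}^2}(x) = |x|/2$ and the null space is spanned by $\{1, x\}$. The NumPy sketch below evaluates such a nonuniform spline; the knots and coefficients are illustrative values only.

```python
import numpy as np

def d2_spline(x, knots, a, b1, b2):
    """Evaluate f(x) = b1 + b2*x + sum_k a_k * |x - x_k| / 2, i.e., the generic
    form (2.13) for L = D^2; the result is piecewise linear with kinks at the knots."""
    x = np.asarray(x, dtype=float)[:, None]            # (n_points, 1)
    knots = np.asarray(knots, dtype=float)[None, :]    # (1, K)
    return b1 + b2 * x[:, 0] + 0.5 * np.abs(x - knots) @ np.asarray(a, dtype=float)

# Example with three knots, evaluated on a fine grid.
grid = np.linspace(-1.0, 4.0, 501)
f_vals = d2_spline(grid, knots=[0.0, 1.5, 3.0], a=[1.0, -2.0, 0.5], b1=0.2, b2=0.1)
```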
one of the main contributions of our work. In addition, our results are valid for a
much larger set of data-fidelity terms than [46]. This is useful in practical scenarios
where one may use data-fidelity terms that depend on factors like the distribution of
the noise.
Assumption 1. Let the search space X and the regularization space Y be Banach
spaces such that the following holds.
iii. The inverse problem is well posed over the null space. This means that, for
any pair $p_1, p_2 \in \mathcal{N}_{\mathrm{L}}$, we have that
i. It is strictly convex; or
As we shall see later, stronger results can be derived for the E(y, ·) that satisfy
Assumption 2’.
Two remarks are in order. Firstly, the condition of being proper in the range
of $\mathbf{H}$ implies that there exists an $f \in \mathcal{X}$ such that $E(\mathbf{y}, \mathbf{H}\{f\})$ is finite. Secondly,
when $E(\mathbf{y}, \cdot)$ is strictly convex or is such that its range does not include $\infty$, it is
redundant to ensure that it is proper in the range of $\mathbf{H}$.
We now state our two main results. Their proofs are given in Appendix A.1.1
and Appendix A.1.2, respectively.
of minimizers is nonempty, convex, and such that any $f_2^* \in \mathcal{V}_2$ is of the form
$$f_2^*(x) = \sum_{m=1}^{M} a_m \varphi_m(x) + \sum_{n=1}^{N_0} b_n p_n(x), \qquad (2.16)$$
where $\varphi_m = \mathcal{F}^{-1}\left\{\frac{\hat{h}_m}{|\hat{L}|^2}\right\}$, and $\mathbf{a} = (a_1, \ldots, a_M)$ and $\mathbf{b} = (b_1, \ldots, b_{N_0})$ are expansion
coefficients such that
$$\sum_{m=1}^{M} a_m \langle h_m, p_n \rangle = 0 \qquad (2.17)$$
for all $n \in \{1, \ldots, N_0\}$. Moreover, if $E(\mathbf{y}, \cdot)$ satisfies Assumption 2', then the minimizer
is unique (the set $\mathcal{V}_2$ is a singleton).
for some $K \leq (M - N_0)$. The unknown knots $(x_1, \ldots, x_K)$ and the expansion coefficients
$\mathbf{a} = (a_1, \ldots, a_K)$ and $\mathbf{b} = (b_1, \ldots, b_{N_0})$ are the parameters of the solution,
with $\|\mathrm{L}f_1^*\|_{\mathcal{M}} = \|\mathbf{a}\|_1$. The solution set $\mathcal{V}_1$ is the closed convex hull of these extreme
points. Moreover, if Assumption 2' is satisfied, then all the solutions have the same
measurement; i.e., $\mathbf{y}_{\mathcal{V}_1} = \mathbf{H}\{f\}$ for all $f \in \mathcal{V}_1$.
A sufficient condition for weak*-continuity of $h_m$ is $\int_{\mathbb{R}} |h_m(x)|(1+|x|)^{D}\,\mathrm{d}x < \infty$
([46, Theorem 6]), meaning that $h_m$ should exhibit some minimal decay at infinity
(see Appendix A.1.6). Here, $D = \inf\{n \in \mathbb{N} : (\operatorname{ess\,sup}_{x \in \mathbb{R}} \rho_{\mathrm{L}}(x)(1+|x|)^{-n}) < +\infty\}$.
The ideal sampling is feasible as well, provided that $\rho_{\mathrm{L}}$ is continuous; a detailed
proof of the weak*-continuity of $\delta(\cdot - x_n)$ for the case $\mathrm{L} = \mathrm{D}^2$ can be found in [63,
Proposition 6].
We remark that [46, Theorem 2] is a special case of Theorem 2.4.2. The former
states the same result as Theorem 2.4.2 for the minimization problem
where $\mathcal{C}$ is feasible, convex, and compact. Feasibility of $\mathcal{C}$ means that the set
$\mathcal{C}_{\mathcal{X}_1} = \{f \in \mathcal{X}_1 : \mathbf{H}f \in \mathcal{C}\}$ is nonempty. In our setting, problem (2.20) can be
obtained by using an indicator function over the feasible set $\mathcal{C}$ as the data-fidelity
term. However, Theorem 2.4.2 covers other, more useful cases of $E$; for example,
$\|\mathbf{y} - \mathbf{H}\{f\}\|_1$ and $\|\mathbf{y} - \mathbf{H}\{f\}\|_2^2$. Moreover, as discussed earlier, when data-fidelity
terms are strictly convex or do not have $\infty$ in their range, they are proper in the
range of $\mathbf{H}$ for any $\mathbf{y} \in \mathbb{R}^M$. This means that they do not require careful selection
of $\mathcal{C}$ in order to satisfy the feasibility condition. This is helpful in directly devising
and deploying algorithms to find the minimizers.
Also, fundamentally, (2.20) only penalizes the regularization value, whereas Theorem 2.4.2
additionally penalizes a data-fidelity term that can recover more desirable
solutions. In fact, Theorem 2.4.2 also covers cases such as
which allow more control than (2.20) over the data-fidelity of the recovered solution.
As given in Theorem 2.4.1, $f_2^*$ is unique and has the basis function
$\varphi_m(x) = \mathcal{F}^{-1}\left\{\frac{\mathrm{e}^{-\mathrm{j}(\cdot)x_m}}{|(\cdot)^2|^2}\right\}(x) = \frac{1}{12}|x - x_m|^3$. The resulting solution is piecewise cubic. It
can be expressed as
$$f_2^*(x) = b_1 + b_2 x + \sum_{m=1}^{M} \frac{a_m}{12}|x - x_m|^3. \qquad (2.23)$$
In this scenario, the term $\|\mathrm{D}^2 f\|_{\mathcal{M}}$ is the total variation of the function $\mathrm{D}f$. It
penalizes solutions whose slope varies too much from one point to the next.
The Green's function in this case is $\rho_{\mathrm{D}^2}(x) = \frac{|x|}{2}$. Based on Theorem 2.4.2, any
extreme point of (2.24) is of the form
$$f_1^*(x) = b_1 + b_2 x + \frac{1}{2}\sum_{k=1}^{K} a'_k |x - \tau_k|, \qquad (2.25)$$
fixed a priori and usually differ from the measurement points $\{x_m\}_{m=1}^{M}$.
The two solutions and their basis functions are illustrated in Figure 2.1 for
specific data. This example demonstrates that the mere replacement of the L2
penalty with the gTV norm has a fundamental effect on the solution: piecewise-
cubic functions having knots at the sampling locations are replaced by piecewise-
linear functions with fewer adaptive knots. Moreover, in the gTV case,
the regularization has been imposed on the generalized second-order derivative of
the function, $\|\mathrm{D}^2 f\|_{\mathcal{M}}$, which uncovers the innovations $\mathrm{D}^2 f_1^* = \sum_{k=1}^{K} a'_k \delta(\cdot - \tau_k)$.
By contrast, when $R_2(f) = \|\mathrm{D}^2 f\|_{L_2}^2 = \langle \mathrm{D}^{2*}\mathrm{D}^2 f, f \rangle$, the recovered solution is such
that $\mathrm{D}^{2*}\mathrm{D}^2 f_2^* = \sum_{m=1}^{M} a_m \delta(\cdot - x_m)$, where $\mathrm{D}^{2*} = \mathrm{D}^2$ is the adjoint operator of $\mathrm{D}^2$.
Thus, in both cases, the recovered functions are composed of the Green's functions
of the corresponding active operators: $\mathrm{D}^2$ vs. $\mathrm{D}^{2*}\mathrm{D}^2 = \mathrm{D}^4$.
2.5 Comparison
We now discuss and contrast the results of Theorems 2.4.1 and 2.4.2. In either
case, the solution is composed of a primary component and a null-space component
whose regularization cost vanishes.
Nature of the Primary Component. The solutions for the gTV regularization
are composed of atoms within the infinitely large dictionary $\{\rho_{\mathrm{L}}(\cdot - \tau)\}_{\tau \in \mathbb{R}}$,
whose shapes depend only on $\mathrm{L}$. In contrast, the $L_2$ solutions are composed of
fixed atoms $\{\varphi_m\}_{m=1}^{M}$ whose shapes depend on both $\mathrm{L}$ and $\mathbf{H}$. As the shape of the
atoms of the gTV solutions does not depend on $\mathbf{H}$, this makes it easier to inject
prior knowledge in that case.
[Figure 2.1: (a) $f_1^*(x)$ and $f_2^*(x)$; (b) $\rho_{\mathrm{D}^2}(x)$ and $\rho_{\mathrm{D}^{2*}\mathrm{D}^2}(x)$.]
The weights and the location of the atoms of the gTV
solution are adaptive and found through a data-dependent procedure which results
in a sparse solution that turns out to be a nonuniform spline. By contrast, the L2
solution lives in a fixed finite-dimensional space.
Null-Space Component. The second component in either solution belongs to
the null space of the operator L. As its contribution to regularization vanishes, the
solutions tend to have large null-space components in both instances.
Oscillations. The modulus of the Fourier transform of the basis function of the
gTV case, $\frac{1}{\hat{L}}$, typically decays faster than that of the $L_2$ case, $\frac{\hat{h}_m}{|\hat{L}|^2}$. Therefore,
the gTV solution exhibits weaker Gibbs oscillations at edges.
Uniqueness of the Solution. Our hypotheses guarantee existence. Moreover,
the minimizer of the L2 problem is unique when Assumption 2’ is true. By contrast,
even for this special category of E(y, ·), the gTV problem can have infinitely many
solutions, despite all having the same measurements. Remarkably, however, when
the gTV solution is unique, it is guaranteed to be an L-spline.
Nature of the Regularized Function. One of the main differences between the
reconstructions $f_2^*$ and $f_1^*$ is their sparsity. Indeed, $\mathrm{L}f_1^*$ uncovers Dirac impulses
situated at $(M-1)$ locations for the gTV case, with $\mathrm{L}f_1^* = \sum_{m=1}^{M-1} a_m \delta(\cdot - \tau_m)$. In
return, $\mathrm{L}f_2^*$ is a nonuniform L-spline convolved with the measurement functions,
whose temporal support is not localized. This allows us to say that the gTV solution
is sparser than the Tikhonov solution.
can be expressed as
$$f_2^* = \sum_{m=1}^{M} a_m \varphi_m + \sum_{n=1}^{N_0} b_n p_n. \qquad (2.27)$$
$$\mathrm{L}^*\mathrm{L}f_2^* = \sum_{m=1}^{M} a_m h_m. \qquad (2.28)$$
The corresponding $J_2(\mathbf{y}\,|\,\lambda, \mathbf{a}, \mathbf{b})$ is then found by expressing $\mathbf{H}\{f_2^*\}$ and $\|\mathrm{L}f_2^*\|_{L_2}^2$ in
terms of $\mathbf{a}$ and $\mathbf{b}$. Due to the linearity of the model,
$$\mathbf{H}\{f_2^*\} = \sum_{m=1}^{M} a_m \mathbf{H}\{\varphi_m\} + \sum_{n=1}^{N_0} b_n \mathbf{H}\{p_n\} = \mathbf{V}\mathbf{a} + \mathbf{W}\mathbf{b}, \qquad (2.29)$$
where (2.30) uses (2.28) and where (2.31) uses the orthogonality property (2.17),
which we can restate as $\mathbf{a}^T\mathbf{W} = \mathbf{0}$. By substituting these reduced forms in (2.26),
the discretized problem becomes
which is very similar to (2.2). This criterion is convex with respect to the coefficients
$\mathbf{a}$ and $\mathbf{b}$. Setting the gradient of $J_2$ with respect to $\mathbf{a}$ and $\mathbf{b}$ to zero then yields
$M$ linear equations in the $M + N_0$ variables, while the orthogonality property (2.17)
gives $N_0$ additional constraints. The combined equations correspond to the linear system
$$\begin{bmatrix} \mathbf{V} + \lambda\mathbf{I} & \mathbf{W} \\ \mathbf{W}^T & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0} \end{bmatrix}. \qquad (2.34)$$
The system matrix so obtained can be proven to be positive definite due to the
property of Gram matrices generated in an RKHS and the admissibility condition of
the measurement functionals (Assumption 1). This ensures that the matrix is always
invertible. The consequence is that the reconstructed signal can be obtained by
solving a linear system of equations, for instance by QR decomposition or by simple
matrix inversion. The derived solution is the same as the least-squares solution
in [54].
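As an illustration, the linear system (2.34) can be assembled and solved directly once the matrices $\mathbf{V}$ and $\mathbf{W}$ have been computed; the sketch below assumes they are available as NumPy arrays and that the system is invertible, as discussed above.

```python
import numpy as np

def solve_tikhonov_coefficients(V, W, y, lam):
    """Solve the block system (2.34):
        [ V + lam*I   W ] [a]   [y]
        [    W^T      0 ] [b] = [0]
    and return the expansion coefficients a and b."""
    M, N0 = W.shape
    A = np.block([[V + lam * np.eye(M), W],
                  [W.T, np.zeros((N0, N0))]])
    rhs = np.concatenate([y, np.zeros(N0)])
    sol = np.linalg.solve(A, rhs)
    return sol[:M], sol[M:]          # coefficients a (size M) and b (size N0)
```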
and satisfies
$$\mathrm{L}f_1^* = w_1 = \sum_{k=1}^{K} a_k \delta(\cdot - \tau_k), \qquad (2.37)$$
where
$$\mathcal{X}_{1,\Delta} = \left\{ \sum_{n=0}^{N-1} a_n \rho_{\mathrm{L}}(\cdot - n\Delta) + \sum_{n=1}^{N_0} b_n p_n : (\mathbf{a}, \mathbf{b}) \in \mathbb{R}^{N+N_0} \right\}.$$
$$\mathbf{H}\{f_{1,\Delta}^*\} = \mathbf{P}\mathbf{a} + \mathbf{Q}\mathbf{b}, \qquad (2.40)$$
$$\|\mathrm{L}f_{1,\Delta}^*\|_{\mathcal{M}} = \|\mathbf{a}\|_1, \qquad (2.41)$$
$$f_{1,\Delta}^* = \arg\min_{\mathbf{a},\mathbf{b}} \underbrace{\left( E(\mathbf{y}, (\mathbf{P}\mathbf{a} + \mathbf{Q}\mathbf{b})) + \lambda\|\mathbf{a}\|_1 \right)}_{J_{1,\Delta}(\mathbf{a},\mathbf{b}\,|\,\mathbf{y},\lambda) = J_{1,\Delta}(f_{1,\Delta}^*\,|\,\mathbf{y},\lambda)}. \qquad (2.42)$$
which is more suitable when the measurements are noisy. The discrete version
(2.45) is similar to (2.3), the fundamental difference being in the nature of the
underlying basis function.
The problem is converted into a LASSO formulation [43] by decoupling the computation
of $\mathbf{a}^*$ and $\mathbf{b}^*$. Suppose that $\mathbf{a}^*$ is fixed; then, $\mathbf{b}^*$ is found by differentiating
(2.45) and equating the gradient to $\mathbf{0}$. This leads to
$$\mathbf{b}^* = \left(\mathbf{Q}^T\mathbf{Q}\right)^{-1}\mathbf{Q}^T(\mathbf{y} - \mathbf{P}\mathbf{a}^*). \qquad (2.46)$$
• If the LASSO problem has a unique solution, then the convergence to the exact solution can
be slow. The convergence rate is inversely proportional to the Lipschitz constant
of the gradient of the quadratic loss function, $\max\left(\mathrm{Eig}\left(\mathbf{H}^T\mathbf{H}\right)\right)$, which
is typically high for the system matrix obtained through our formulation (a minimal
sketch of the FISTA step for this LASSO is given below).
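The sketch below illustrates this decoupled step, under the assumption of a quadratic data-fidelity term $E(\mathbf{y}, \cdot) = \frac{1}{2}\|\mathbf{y} - \cdot\|_2^2$: the coefficients $\mathbf{b}$ are eliminated with (2.46), which leaves a LASSO in $\mathbf{a}$ that is solved with FISTA. All sizes and parameters are placeholders.

```python
import numpy as np

def gtv_lasso_fista(P, Q, y, lam, n_iter=500):
    """Eliminate b via (2.46) and solve the remaining LASSO in a with FISTA."""
    # Projector onto the orthogonal complement of range(Q), obtained from (2.46).
    Mproj = np.eye(len(y)) - Q @ np.linalg.solve(Q.T @ Q, Q.T)
    A = Mproj @ P
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
    a = np.zeros(P.shape[1]); v = a.copy(); t = 1.0
    for _ in range(n_iter):
        w = v - A.T @ (A @ v - Mproj @ y) / L      # gradient step
        a_new = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)   # soft-threshold
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        v = a_new + ((t - 1.0) / t_new) * (a_new - a)               # Nesterov step
        a, t = a_new, t_new
    b = np.linalg.solve(Q.T @ Q, Q.T @ (y - P @ a))                 # back-substitute (2.46)
    return a, b
```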
has the same measurement $\mathbf{H}\mathbf{a}^* = \mathbf{y}_0$ for any $\mathbf{a}^* \in \boldsymbol{\alpha}$. Moreover, if the solution is
not unique, then any two solutions $\mathbf{a}^{(1)}, \mathbf{a}^{(2)} \in \boldsymbol{\alpha}$ are such that their $m$th elements
satisfy $\mathrm{sign}\left(a^{(1)}_m\right)\mathrm{sign}\left(a^{(2)}_m\right) \geq 0$ for $m \in [1 \ldots M]$. In other words, any two
solutions have the same sign over their common support.
We use Lemma 2.6.1 to infer Theorem 2.6.2, whose proof is given in Appendix A.1.3.
Then, the solution $\mathbf{a}^*_{\mathrm{SLP}}$ (obtained using the simplex algorithm) of the linear program
corresponding to the problem
the Lipschitz constant of the gradient of the quadratic loss, the simplex algorithm
is used after the FISTA iterations are stopped using an appropriate convergence
criterion. For FISTA, the convergence behavior is ruled by the number of iterations
$t$ as
$$F(\mathbf{a}_t) - F(\mathbf{a}^*) \leq \frac{C}{(t+1)^2}, \qquad (2.51)$$
where $F$ is the LASSO functional and
$$C = 2\|\mathbf{a}_0 - \mathbf{a}^*\|_2^2 \, \max\left(\mathrm{Eig}\left(\mathbf{H}^T\mathbf{H}\right)\right) \qquad (2.52)$$
(see [32]). This implies that an $\epsilon$-neighborhood of the minimum of the functional
is reached in at most $t = \sqrt{C/\epsilon}$ iterations. To ensure convergence, it is also
advisable to rely on the modified version of FISTA proposed in [73].
However, there is no direct relation between the functional value and the sparsity
index of the iterative solution. Using the simplex algorithm as the next
step guarantees the upper bound $M$ on the sparsity index of the solution. Also,
$F(\mathbf{a}^*_{\mathrm{SLP}}) \leq F(\mathbf{a}^*_{\mathrm{F}})$. This implies that an $\epsilon$-based convergence criterion, in addition
to a sparsity-index-based criterion such as the sparsity index of $\mathbf{a}^*_{\mathrm{F}}$ falling below $M$,
can be used to stop FISTA. Then, the simplex scheme is deployed to find an extreme
point of the solution set with a reduced sparsity index.
Note that when E(y, ·) is not strictly convex, the solution set can have non-
unique measurements. In that case, it is still possible to further sparsify a recovered
solution by using the discussed Simplex approach.
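The sparsification step itself can be phrased as a small linear program: among all coefficient vectors that reproduce the measurements of the FISTA output, pick one of minimal $\ell_1$ norm, whose vertex solutions have at most $M$ nonzero entries. Below is a hedged SciPy sketch of this idea with the usual split $\mathbf{a} = \mathbf{u} - \mathbf{v}$; the dual-simplex option of SciPy's LP solver stands in for the simplex routine discussed above.

```python
import numpy as np
from scipy.optimize import linprog

def sparsify_with_lp(P, a_fista):
    """Find a minimal-l1 coefficient vector with the same measurements P a as the
    FISTA output; a vertex of this LP has at most M nonzero entries."""
    M, N = P.shape
    y0 = P @ a_fista                              # measurements to preserve
    c = np.ones(2 * N)                            # minimize sum(u) + sum(v) = ||a||_1
    A_eq = np.hstack([P, -P])                     # P (u - v) = y0
    res = linprog(c, A_eq=A_eq, b_eq=y0, bounds=(0, None), method="highs-ds")
    u, v = res.x[:N], res.x[N:]
    return u - v                                  # sparse coefficients a_SLP
```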
2.7 Illustrations
We discuss the results obtained for the cases when the measurements are random
samples either of the signal itself or of its continuous-domain Fourier transform.
The operators of interest are $\mathrm{L} = \mathrm{D}$ and $\mathrm{L} = \mathrm{D}^2$. The ground-truth (GT) signal
$f_{\mathrm{GT}}$ is the solution of the stochastic differential equation $\mathrm{L}f_{\mathrm{GT}} = w_{\mathrm{GT}}$ [78] for the two
cases when $w_{\mathrm{GT}}$ is
• Impulsive Noise. Here, the innovation wGT is a compound-Poisson noise
with Gaussian jumps, which corresponds to a sum of Dirac impulses whose
amplitudes follow a Gaussian distribution. The corresponding process fGT
has then the particularity of being piecewise smooth [79]. This case is matched
to the regularization operator $\|\mathrm{L}f\|_{\mathcal{M}}$ and is covered by Theorem 2.4.2, which
states that the minimizer $f_1^*$ for this regularization case is such that
$$w_1^* = \mathrm{L}f_1^* = \sum_{k=1}^{K} a_k \delta(\cdot - x_k), \qquad (2.53)$$
to $\rho_{\mathrm{D}^2}(x) = \frac{|x|}{2}$ and $\varphi_{\mathrm{D}^2}(x) = (\rho_{\mathrm{L}^*\mathrm{L}} * h_m)(x) = \frac{1}{12}|x - x_m|^3$. The null space
is $\mathcal{N}_{\mathrm{D}^2} = \mathrm{span}\{1, x\}$ for this operator. This means that the gTV-regularized solution
is piecewise linear and that the $L_2$-regularized solution is piecewise cubic. We
compare in Figures 2.3.a and 2.3.b the recovery from noiseless samples of a second-
order process, referred to as ground truth (GT). It is composed of sparse (impulsive
Poisson) and non-sparse (Gaussian) innovations, respectively [80]. The sparsity
index—the number of impulses or non-zero elements—for the original sparse signal
is 9. The solution for the gTV case is recovered with $\Delta = 0.05$ and $N = 200$. The
sparsity index of the gTV solution for the sparse and Gaussian cases are 9 and 16,
respectively. As expected, the recovery of the gTV-regularized reconstruction is
better than that of the L2 -regularized solution when the signal is sparse. For the
Gaussian case, the situation is reversed.
and $f^*_{\mathrm{SLP}}$ have basis functions whose coefficients are the (non-unique) solutions of a
given LASSO problem, as shown in Figure 2.2.b. The $\ell_1$ norms of the corresponding
coefficients are the same. Also, it holds that
which implies that the TV norms of the slopes of $f^*_{\mathrm{F}}$ and $f^*_{\mathrm{SLP}}$ are the same. This
is evident from Figure 2.2.c: the arc-lengths of the two curves are the same. The
signal $f^*_{\mathrm{SLP}}$ is piecewise linear ($21 < M$ knots), carries a piecewise-constant slope, and
is, by definition, a nonuniform spline of degree 1. By contrast, $f^*_{\mathrm{F}}$ has many more
knots and even sections whose slope appears to be piecewise-linear.
Theorem 2.4.2 asserts that the extreme points of the solution set of the gTV
regularization need to have fewer than $M$ knots. Remember that $f^*_{\mathrm{SLP}}$ is obtained
by combining FISTA and the simplex step; this ensures that the basis coefficients of $f^*_{\mathrm{SLP}}$
are the extreme points of the solution set of the corresponding LASSO problem
(Theorem 2.6.2) and guarantees that the number of knots is smaller than $M$.
This example shows an intuitive relationship between the continuous-domain
and the discrete-domain formulations of inverse problems with gTV and `1 regu-
larization, respectively. The nature of the continuous-domain solution set and its
extreme points resonates with its corresponding discretized version. In both cases,
the solution set is convex and the extreme points are sparse.
No. of impulses   Sparsity     D: TV    D: L2    D2: TV   D2: L2
10                Strong       19.60    15.7     52.08    41.54
100               Medium       16.58    16.10    41.91    41.26
2000              Low          14.45    16.14    39.68    41.40
-                 Gaussian     14.30    16.32    40.05    41.23
(a) Noiseless case.

No. of impulses   Sparsity     D: TV    D: L2    D2: TV   D2: L2
10                Strong       17.06    11.52    25.55    24.60
100               Medium       13.24    10.94    24.44    24.24
2000              Low          10.61    11.13    25.80    26.19
-                 Gaussian     10.40    11.10    24.95    25.48
(b) Noisy case.

Table 2.1: Comparison of TV and L2 recovery from their (top table) noiseless
and (bottom table) noisy (with 40 dB SNR) random Fourier samples.
The results have been averaged over 40 realizations.
$\frac{1}{2}|\cdot| * h_m(x)$. Figures 2.4.a and 2.4.b correspond to a first-order process with
sparse and Gaussian innovations, respectively. The grid step is $\Delta = 0.05$, $M = 41$,
and $N = 200$. The sparsity index of the gTV solution for the sparse and
Gaussian cases is 36 and 39, respectively. For the original sparse signal (GT), it
is 7. The oscillations of the solution in the L2 -regularized case are induced by
the sinusoidal form of the measurement functionals. This also makes the L2
solution intrinsically smoother than its gTV counterpart. Also, the quality of the
recovery depends on the frequency band used to sample.
In Figures 2.4.c and 2.4.d, we show the zoomed version of the recovered second-
order process with sparse and Gaussian innovations, respectively. The grid step is
$\Delta = 0.05$, $M = 41$, and $N = 200$. The operator $\mathrm{L} = \mathrm{D}^2$ is used for the regularization.
This corresponds to $\rho_{\mathrm{D}^2}(x) = \frac{|x|}{2}$ and $\varphi_{\mathrm{D}^2,m}(x) = \frac{1}{12}|\cdot|^3 * h_m(x)$. The
sparsity index of the gTV solution in the sparse and Gaussian cases is 10 and 36,
respectively. For the original sparse signal (GT), it is 10. Once again, the recovery
by gTV is better than by L2 when the signal is sparse. In the Gaussian case, the
L2 solution is better.
The effect of sparsity on the recovery of signals from their noiseless and noisy
(40 dB SNR) Fourier samples is shown in Table 2.1. The sample frequencies are
kept the same for all the cases. Here, $M = 41$, $N = 200$, $T = 10$, and the grid
step $\Delta = 0.05$. We observe that reconstruction performances for random processes
based on impulsive noise are comparable to that of Gaussian processes when the
number of impulses increases. This is reminiscent of the fact that generalized-
Poisson processes with Gaussian jumps are converging in law to corresponding
Gaussian processes [81].
2.8 Summary
In this chapter, we consider 1D linear inverse problems that are formulated in the
continuous domain. The object of recovery is a function that is assumed to min-
imize a convex objective functional. The solutions are constrained by imposing
a continuous-domain regularization. We derive the parametric form of the solu-
tion (representer theorems) for Tikhonov (quadratic) and generalized total-variation
(gTV) regularizations. We show that, in both cases, the solutions are splines that
are intimately related to the regularization operator. In the Tikhonov case, the
solution is smooth and constrained to live in a fixed subspace that depends on the
[Figure 2.2: panels (a)–(c), including the coefficients $\mathbf{a}^*_{\mathrm{F}}$ and $\mathbf{a}^*_{\mathrm{SLP}}$.]
Figure 2.3: Recovery of sparse (a) and Gaussian (b) second-order pro-
cesses (GT) using L = D2 from their nonuniform samples corrupted with
40 dB measurement noise.
Part II
Second Generation
Chapter 3
Deep-Learning-based PGD for Iterative Reconstruction
3.1 Overview
While medical imaging is a fairly mature area, there is recent evidence that it may
still be possible to reduce the radiation dose and/or speedup the acquisition pro-
cess without compromising image quality. This can be accomplished with the help
of sophisticated reconstruction algorithms that incorporate some prior knowledge
(e.g., sparsity) on the class of underlying images [7]. The reconstruction task is
usually formulated as an inverse problem where the image-formation physics are
modeled by an operator H : RN ! RM (called the forward model ). The measure-
ment equation is y = Hx + n 2 RM , where x2 RN is the space-domain image that
1 This content of this chapter are based on our work [26].
3.1.1 Contributions
In this work, we propose a simple yet effective iterative scheme (see Figure 3.1),
which tries to incorporate the advantages of the existing algorithms and side-steps
Figure 3.1: (a) Block diagram of projected gradient descent using a CNN
as the projector and E as the data-fidelity term. The gradient step pro-
motes consistency with the measurements and the projector forces the so-
lution to belong to the set of desired solutions. If the CNN is only an
approximate projector, the scheme may diverge. (b) Block diagram of the
proposed relaxed projected gradient descent. The ↵k s are updated in such
a way that the algorithm always converges (see Algorithm 1 for more de-
tails).
• We first propose to learn a CNN that acts as a projector onto a set S which
can be intuitively thought of as the manifold of the data (e.g., biomedical
images). In this sense, our CNN encodes the prior knowledge of the data. Its
purpose is to map an input image to an output image that is more similar to
the training data.
3.1.3 Roadmap
The chapter is organized as follows: In Section 3.2, we discuss the mathematical
framework that motivates our approach and justify the use of a projector onto a
set as an effective strategy to solve inverse problems. In Section 3.3, we present
our algorithm, which is a relaxed version of PGD. It has been modified so as to
converge in practical cases where the projection property is only approximate. We
discuss in Section 3.4 a novel technique to train the CNN as a projector onto
a set, especially when the training data is small. This is followed by experiments
(Section 3.5), results and discussions (Section 3.6 and Section 3.7), and conclusions
(Section 3.8).
3.2.1 Notation
We consider the finite-dimensional Hilbert space $\mathbb{R}^N$ equipped with the scalar product
$\langle \cdot, \cdot \rangle$ that induces the $\ell_2$ norm $\|\cdot\|_2$. The spectral norm of the matrix $\mathbf{H}$, denoted
by $\|\mathbf{H}\|_2$, is equal to its largest singular value. For $\mathbf{x} \in \mathbb{R}^N$ and $\varepsilon > 0$, we denote
by $B_\varepsilon(\mathbf{x})$ the $\ell_2$-ball centered at $\mathbf{x}$ with radius $\varepsilon$, i.e.,
where $\gamma$ is a step size chosen such that $\gamma < 2/\|\mathbf{H}^T\mathbf{H}\|_2$. This algorithm combines
the orthogonal projection onto $\mathcal{S}$ with the gradient descent with respect to the
quadratic objective function, also called the Landweber update [103]. PGD [104,
Propositions 1-3 suggest that, when $\mathcal{S}$ is nonconvex, the best we can hope for
is to find a local minimizer of (3.1) through a fixed point of $G_\gamma$. Theorem 3.2.4
provides a sufficient condition for PGD to converge to a unique fixed point of $G_\gamma$.
Theorem 3.2.4. Let $\lambda_{\max}$ and $\lambda_{\min}$ be the largest and smallest eigenvalues of
$\mathbf{H}^T\mathbf{H}$, respectively. If $P_{\mathcal{S}}$ satisfies (3.4) and is Lipschitz-continuous with constant
$L < (\lambda_{\max} + \lambda_{\min})/(\lambda_{\max} - \lambda_{\min})$, then, for $\gamma = 2/(\lambda_{\max} + \lambda_{\min})$, the sequence
$\{\mathbf{x}_k\}$ generated by (3.2) converges to a local minimizer of (3.1), regardless of the
initialization $\mathbf{x}_0$.
It is important to note that the projector $P_{\mathcal{S}}$ can never be contractive since it
preserves the distance between any two points on $\mathcal{S}$. Therefore, when $\mathbf{H}$ has a nontrivial
null space, the condition $L < (\lambda_{\max} + \lambda_{\min})/(\lambda_{\max} - \lambda_{\min})$ of Theorem 3.2.4
is not feasible. The smallest possible Lipschitz constant of $P_{\mathcal{S}}$ is $L = 1$, which
means that $P_{\mathcal{S}}$ is non-expansive. Even with this condition, it is not guaranteed
that the combined operator $F$ has a fixed point. This limitation can be overcome
when $F$ is assumed to have a nonempty set of fixed points. Indeed, we state in
Theorem 3.2.5 that one of them must be reached by iterating the averaged operator
$\alpha\,\mathrm{Id} + (1-\alpha)G_\gamma$, where $\alpha \in (0,1)$ and $\mathrm{Id}$ is the identity operator. We call this
scheme averaged PGD (APGD).
Theorem 3.2.5. Let $\lambda_{\max}$ be the largest eigenvalue of $\mathbf{H}^T\mathbf{H}$. If $P_{\mathcal{S}}$ satisfies (3.4)
and is a non-expansive operator such that $G_\gamma$ in (3.3) has a fixed point for some
$\gamma < 2/\lambda_{\max}$, then the sequence $\{\mathbf{x}_k\}$ generated by APGD, with
for any $\alpha \in (0,1)$, converges to a local minimizer of (3.1), regardless of the initialization
$\mathbf{x}_0$.
$k \leftarrow 0$
while not converged do
    $\mathbf{z}_k = F(\mathbf{x}_k - \gamma\mathbf{H}^T\mathbf{H}\mathbf{x}_k + \gamma\mathbf{H}^T\mathbf{y})$
    if $k \geq 1$ then
        if $\|\mathbf{z}_k - \mathbf{x}_k\|_2 > c_k \|\mathbf{z}_{k-1} - \mathbf{x}_{k-1}\|_2$ then
            $\alpha_k = \left(c_k \|\mathbf{z}_{k-1} - \mathbf{x}_{k-1}\|_2 / \|\mathbf{z}_k - \mathbf{x}_k\|_2\right) \alpha_{k-1}$
        else
            $\alpha_k = \alpha_{k-1}$
        end if
    end if
    $\mathbf{x}_{k+1} = (1 - \alpha_k)\mathbf{x}_k + \alpha_k \mathbf{z}_k$
    $k \leftarrow k + 1$
end while
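For concreteness, the update rule above can be written as a short NumPy routine; the projector $F$ is any callable standing in for the trained CNN, and the constant sequence $c_k = C$, the step size, and the initialization are illustrative choices.

```python
import numpy as np

def rpgd(F, H, y, x0, gamma, alpha0=1.0, C=0.99, n_iter=100):
    """Relaxed projected gradient descent: gradient step, approximate projector F,
    and a relaxation alpha_k that shrinks whenever ||z_k - x_k|| fails to decrease
    geometrically, which enforces convergence of the iterates."""
    x, alpha = x0.copy(), alpha0
    x_prev = z_prev = None
    for k in range(n_iter):
        z = F(x - gamma * (H.T @ (H @ x - y)))        # gradient step, then projector
        if k >= 1:
            prev_res = np.linalg.norm(z_prev - x_prev)
            res = np.linalg.norm(z - x)
            if res > C * prev_res:                    # relax to restore contraction
                alpha = (C * prev_res / res) * alpha
        x_prev, z_prev = x, z
        x = (1.0 - alpha) * x + alpha * z             # averaged update
    return x
```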
(iii) if, in addition to (ii), $F$ is indeed a projector onto $\mathcal{S}$ that satisfies (3.4), then
$\mathbf{x}^*$ is a local minimizer of (3.1).
We prove Theorem 3.3.1 in Appendix A.2.1. Note that the weakest statement
here is (i); it guarantees that RPGD always converges, albeit not necessarily to a
fixed point of G . Moreover, the assumption about the continuity of F in (ii) is
automatically satisfied when F is a CNN.
In summary, we have described three algorithms: PGD, APGD, and RPGD. PGD is a standard algorithm which, in the event of convergence, finds a local minimum of (3.1); however, it does not always converge. APGD ensures convergence under the broader set of conditions given in Theorem 3.2.5; but, in order to have these properties, both PGD and APGD necessarily need a projector. While we shall train our CNN to act like a projector, it may not exactly fulfill the required conditions. This is the motivation for RPGD, which, unlike PGD and APGD, is guaranteed to converge. It also retains the desirable properties of PGD and APGD: it finds a local minimum of (3.1), provided that conditions (ii) and (iii) of Theorem 3.3.1 are satisfied. Note, however, that when the set S is nonconvex, this local minimum may not be a global minimum. The results of Sections 3.2 and 3.3
of N × Q perturbed points and train the CNN by minimizing the loss function
$$J(\boldsymbol{\theta}) = \sum_{n=1}^{N} \underbrace{\sum_{q=1}^{Q} \big\|\mathbf{x}^{q} - \mathrm{CNN}_{\boldsymbol{\theta}}(\tilde{\mathbf{x}}^{q,n})\big\|_2^2}_{J_n(\boldsymbol{\theta})}. \qquad (3.9)$$
$$\tilde{\mathbf{x}}^{q,1} = \mathbf{x}^{q}, \qquad (3.10)$$
$$\tilde{\mathbf{x}}^{q,2} = \mathbf{A}\mathbf{H}\mathbf{x}^{q}, \qquad (3.11)$$
$$\tilde{\mathbf{x}}^{q,3} = \mathrm{CNN}_{\boldsymbol{\theta}_{t-1}}(\tilde{\mathbf{x}}^{q,2}), \qquad (3.12)$$
To understand the perturbation x̃^{q,2} in (3.11), recall that AHx^q is the classical linear reconstruction of x^q from its measurement y = Hx^q. Perturbation (3.11) is indeed useful because we initialize RPGD with AHx^q. Using only (3.11) for training would return the same CNN as in [17].
The perturbation x̃^{q,3} in (3.12) is the output of the CNN whose parameters θ_t change with every epoch t; thus, it is a nonlinear and dynamic (epoch-dependent) perturbation of x^q. The rationale for using (3.12) is that it greatly increases the training diversity by allowing the network to see T new perturbations of each training point, without greatly increasing the total training size, since it only requires Q additional gradient computations per epoch. Moreover, (3.12) is in sync with the iterative scheme of RPGD, where the output of the CNN is processed with a gradient descent and is again fed back into itself.
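As an illustration, a minimal sketch of how the three ensembles could be regenerated at each epoch is given below; H, A (e.g., the FBP), and cnn_prev (the network with the weights of the previous epoch) are assumed to be available as callables and are not part of the original text.

```python
import numpy as np

def make_perturbed_ensembles(x_list, H, A, cnn_prev):
    """Build the three training ensembles of (3.10)-(3.12) for one epoch (sketch)."""
    ens1, ens2, ens3 = [], [], []
    for x in x_list:
        x1 = x                 # (3.10): the ground truth itself
        x2 = A(H(x))           # (3.11): classical linear reconstruction of x
        x3 = cnn_prev(x2)      # (3.12): dynamic, epoch-dependent perturbation
        ens1.append(x1); ens2.append(x2); ens3.append(x3)
    return ens1, ens2, ens3
```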
3.4.1 Architecture
Our CNN architecture is the same as in [17]: a U-net [109] with intrinsic skip connections among its layers and an extrinsic skip connection between the input and the output. The intrinsic skip connections help to eliminate singularities during the training [110]. The extrinsic skip connection makes this network a residual net; i.e., CNN = Id + Unet, where Id denotes the identity operator and Unet : ℝ^N → ℝ^N denotes the U-net as a function. Therefore, the U-net actually provides the projection error (negative perturbation) that should be added to the input to get the projection.
Residual nets have been shown to be effective for image recognition [111] and for solving inverse problems [17]. While the residual-net architecture does not increase the capacity or the approximation power of the CNN, it does help in learning functions that are close to an identity operator, as is the case in our setting.
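A residual wrapper of this kind takes only a few lines; the sketch below assumes PyTorch and an arbitrary image-to-image unet module, and is only meant to illustrate the CNN = Id + Unet structure rather than reproduce the exact network.

```python
import torch
import torch.nn as nn

class ResidualProjector(nn.Module):
    """CNN = Id + Unet: the U-net outputs the correction added to its input (sketch)."""
    def __init__(self, unet: nn.Module):
        super().__init__()
        self.unet = unet

    def forward(self, x):
        return x + self.unet(x)

# Toy usage with a stand-in single-layer "U-net"
projector = ResidualProjector(nn.Conv2d(1, 1, kernel_size=3, padding=1))
out = projector(torch.randn(1, 1, 64, 64))
```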
T3 epochs with all three ensembles {x̃^{q,1}, x̃^{q,2}, x̃^{q,3}} to minimize the original loss function J = J1 + J2 + J3 from (3.9).
We shall see in Section 3.7.2 that this sequential procedure speeds up the training without compromising the performance. The parameters of U are initialized from a normal distribution with a very low variance. Since CNN = Id + U, the network acts close to an identity operator in the initial epochs, which makes it redundant to use {x̃^{q,1}} in the initial training stages. Therefore, {x̃^{q,1}} is only added at the last stage, when the CNN is no longer close to an identity operator. After training with only {x̃^{q,2}} in Stage 1, x̃^{q,3} will be close to x^q since it is the output of the CNN for the input x̃^{q,2}. This eases the training for {x̃^{q,3}} in the second and third stages.
3.5 Experiments
We validate the proposed method on the challenging case of sparse-view CT reconstruction. Conventionally, CT imaging requires many views to obtain good-quality reconstructions. We call this scenario full-dose reconstruction. Our main aim in these experiments is to reduce the number of views (or dose) for CT imaging while retaining the quality of full-dose reconstructions. We denote a k-times reduction in views by ×k.
The measurement operator H for our experiments is the Radon transform. It maps an image to the values of its integrals along a known set of lines [2]. In 2D, the measurements are indexed by the angle and offset of each line and arranged in a 2D sinogram. We implemented H and Hᵀ with Matlab's radon and iradon (normalized to satisfy the adjoint property), respectively. The Matlab code for RPGD and the sequential-strategy-based training is made publicly available2.
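For readers without Matlab, the adjoint normalization can be mimicked with scikit-image as sketched below; the view count, image size, and scaling strategy are illustrative assumptions and not the released Matlab implementation.

```python
import numpy as np
from skimage.transform import radon, iradon

theta = np.linspace(0.0, 180.0, 45, endpoint=False)   # e.g., 45 views
N = 128

def H(img):
    return radon(img, theta=theta, circle=False)

def backproject(sino):
    # iradon without a filter is a plain backprojection, up to a constant factor
    return iradon(sino, theta=theta, filter_name=None, output_size=N, circle=False)

# Estimate the scaling constant once so that <Hx, y> is approximately <x, Ht y>
rng = np.random.default_rng(0)
x0 = rng.random((N, N))
y0 = rng.random(H(x0).shape)
c = np.vdot(H(x0), y0) / np.vdot(x0, backproject(y0))

def Ht(sino):
    return c * backproject(sino)

# Check the adjoint property on a fresh pair (the two printed values should be close)
x1, y1 = rng.random((N, N)), rng.random(H(x0).shape)
print(np.vdot(H(x1), y1), np.vdot(x1, Ht(y1)))
```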
3.5.1 Datasets
We use two datasets for our experiments.
1) Mayo Clinic Dataset. It consists of 500 clinically realistic (512 × 512) CT images from the lower lungs to the lower abdomen of 10 patients. Those were obtained from the Mayo Clinic AAPM Low Dose CT Grand Challenge [112].
2) Rat Brain Dataset. We use a real (1493 px × 720 views × 377 slices) sinogram from a CT scan of a single rat brain. The data acquisition was performed at the Paul Scherrer Institute in Villigen, Switzerland, at the TOMCAT beamline of the Swiss Light Source. During pre-processing, we split this sinogram slice-by-slice and downsampled it to create a dataset of 377 (729 px × 720 views) sinograms. CT images of size (512 × 512) were then generated from these full-dose sinograms (using the FBP, see Section 3.5.3). For the qth z-slice, we denote the corresponding image x^q_FD. For experiments based on this dataset, the first 327 and the last 25 slices are used for training and testing, respectively. This leaves a gap of 25 slices between the training and testing data.
2 https://github.com/harshit-gupta-epfl/CNN-RPGD
task is to reconstruct x^q_FD back from y^q. Sinograms were generated with 25, 30, and 35 dB SNR with respect to Hx^q_FD. To achieve this, in (A.35) and (A.36), we assume the readout noise to be zero and {b1, . . . , bm} = b0 = 1.66 × 10^5, 5.24 × 10^5, and 1.66 × 10^6, respectively. More details about this process are given in Appendix A.2.2. The CNNs were trained at only the 30-dB level of noise. Again, our task is to reconstruct the images from the sinograms.
3) Experiment 3. We downsampled the views of the original (729 × 720) rat-brain sinograms by 5 to obtain sparse-view sinograms of size (729 × 144). For the qth z-slice, we denote the corresponding sparse-view sinogram y^q_Real. Note that, unlike in Experiments 1 and 2, the sinogram was not generated from an image but was obtained experimentally.
where the purpose of a and b is to adjust for contrast and offset. We also evaluate
the performance using the structural similarity index (SSIM) [113]. We compare
five reconstruction methods.
1) FBP. FBP is the classical direct inversion of the Radon transform H, here
implemented in Matlab by the iradon command with the ram-lak filter and linear
interpolation as options.
2) Total-Variation Reconstruction. TV solves
$$\mathbf{x}_{\mathrm{TV}} = \arg\min_{\mathbf{x}} \left(\frac{1}{2}\|\mathbf{H}\mathbf{x}-\mathbf{y}\|_2^2 + \lambda\|\mathbf{x}\|_{\mathrm{TV}}\right) \quad \text{s.t.} \quad \mathbf{x}\ge 0, \qquad (3.15)$$
where
$$\|\mathbf{x}\|_{\mathrm{TV}} = \sum_{i=1}^{N-1}\sum_{j=1}^{N-1}\sqrt{\big(D_{\mathrm{h};i,j}(\mathbf{x})\big)^2 + \big(D_{\mathrm{v};i,j}(\mathbf{x})\big)^2},$$
D_{h;i,j}(x) = [x]_{i,j+1} − [x]_{i,j}, and D_{v;i,j}(x) = [x]_{i+1,j} − [x]_{i,j}. The optimization is carried out via ADMM [13].
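For reference, the isotropic TV semi-norm in (3.15) can be evaluated directly from finite differences, as in the short NumPy sketch below (an illustration of the regularizer only, not the ADMM solver itself).

```python
import numpy as np

def tv_norm(x):
    """Isotropic TV of a 2D image: sum of gradient magnitudes (sketch)."""
    dh = x[:-1, 1:] - x[:-1, :-1]   # horizontal differences D_h
    dv = x[1:, :-1] - x[:-1, :-1]   # vertical differences D_v
    return np.sum(np.sqrt(dh ** 2 + dv ** 2))

print(tv_norm(np.arange(16.0).reshape(4, 4)))
```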
where E_j : ℝ^{N×N} → ℝ^{L²} extracts and vectorizes the jth patch of size (L × L) from the image x, D ∈ ℝ^{L²×256} is the dictionary, α_j is the jth column of α ∈ ℝ^{256×R}, and R = (N − L + 1)². Note that the patches are extracted with a sliding distance of one pixel.
For a given y, the dictionary D is learned from the corresponding ground truth
using the procedure described in [114]. The objective (3.16) is then solved iteratively
by first minimizing it with respect to x using gradient descent as described in [96]
and then with respect to ↵ using orthogonal matching pursuit (OMP) [115]. Since
D is learned from the testing ground truth itself, the performance that we report
here is an upper bound to the one that would be achieved by learning it using the
training images.
4) FBPconv. FBPconv [17] is a state-of-the-art deep-learning technique in which a residual CNN with the U-net architecture is trained to directly denoise the FBP. It has been shown to outperform other deep-learning-based direct reconstruction methods for sparse-view CT. In our proposed method, we use a CNN with the same architecture as in FBPconv. As a result, in our framework, FBPconv corresponds to training with only the ensemble in (3.11). In the testing phase, the FBP of the measurements is fed into the trained CNN to output the reconstructed image.
5) RPGD. RPGD is our proposed method, described in Algorithm 1. There, the nonlinear operator F is the CNN trained as a projector (as discussed in Section 3.4). For experiments with Poisson noise, we use the slightly modified RPGD described in Appendix A.2.2. For all the experiments, FBP is used for the operator A.
the ground truth. We set the additional penalty parameter inside ADMM (see Equation (2.6) in [13]) equal to λ. The rationale for this heuristic is that it puts the soft-threshold parameter in the same order of magnitude as the image gradients. We set the number of iterations to 100, which was enough to observe good empirical convergence.
For DL, the parameters are selected via a parameter sweep, roughly following the approach described in [96, Table 1]. Specifically: the patch size is L = 8. During dictionary learning, the sparsity level is set to 5 and 10. During reconstruction, the sparsity level for OMP is set to 5, 8, 10, 12, 20, and 25, while the tolerance level is taken to be 10, 100, and 1000. This, in effect, is the same as sweeping over ν_j in (3.16). For each of these 2 × 6 × 3 = 36 parameter settings, λ in (3.16) is chosen by a golden-section search over 7 values.
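A golden-section search of this kind is simple to implement; the sketch below is a generic version in which the objective f, the search interval, and the budget of 7 evaluations are placeholders for the actual sweep.

```python
import math

def golden_section_search(f, a, b, n_evals=7):
    """Minimize a unimodal scalar function with a fixed evaluation budget (sketch)."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0          # inverse golden ratio
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(n_evals - 2):
        if fc < fd:                              # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:                                    # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

print(golden_section_search(lambda t: (t - 0.3) ** 2, 0.0, 1.0))
```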
As discussed earlier, the CNNs for both the ×5 and ×16 cases are trained separately for high and low measurement noise.
i) Training with Noiseless Measurements. The training of the projector for RPGD follows the sequential procedure described in Section 3.4, with the configurations
We use the CNN obtained right after the first stage for FBPconv since, during this stage, only the training ensemble in (3.11) is taken into account. We empirically found that the training error J2 converged in T1 epochs of Stage 1, yielding an optimal performance for FBPconv.
ii) Training with 40-dB Measurement Noise. This includes replacing the ensemble in (3.11) with {Ay^q}, where y^q = Hx^q + n and n has a 40-dB SNR with respect to Hx^q. With 20% probability, we also perturb the views of the measurements with an AWGN of 0.05 standard deviation so as to enforce robustness to model mismatch. These CNNs are initialized with the ones obtained after the first stage of the noiseless training and are then trained with the configurations
Similarly to the previous case, the CNNs obtained after the first and the third training stage are used in FBPconv and RPGD, respectively. For clarity, these variants will be referred to as FBPconv40 and RPGD40.
The learning rate is decreased in a geometric progression from 10⁻² to 10⁻³ in Stage 1 and kept at 10⁻³ for Stages 2 and 3. Recall that the last two stages contain the ensemble with the dynamic perturbation (3.12), which changes in every epoch. The lower learning rate, therefore, avoids drastic changes in the parameters between epochs. The batch size is fixed to 2. The other hyper-parameters follow [17]. For stability, gradients above 10⁻² are clipped and the momentum is set to 0.99. The total training time for the noiseless case is around 21.5 hours on a Titan X GPU (Pascal architecture).
The hyper-parameters for RPGD are chosen as follows: the relaxation parameter α₀ is initialized with 1, and the sequence {c_k} is set to the constant C = 0.99 for RPGD and C = 0.8 for RPGD40. For each noise level and number of views, the only free parameter γ is swept over 20 values geometrically spaced between 10⁻² and 10⁻⁵. We pick the γ which gives the best average SNR over the 25 test images. Note that, for TV and DL, the value of the optimal λ generally increases as the measurement noise increases; however, no such obvious relation exists for γ. This is mainly because γ is the step size of the gradient descent in RPGD and not a regularization parameter. In all experiments, the gradient step is skipped during the first iteration.
On the GPU, one iteration of RPGD takes less than 1 second. The algorithm is stopped when the residual ‖x_{k+1} − x_k‖2 reaches a value less than 1, which is sufficiently small compared to the dynamic range [0, 350] of the image. It takes around 1-2 minutes to reconstruct an image with RPGD.
2) Experiment 2. For this case, the CNNs are trained similarly to the CNN for RPGD40 in Experiment 1. Perturbations (3.10)-(3.12) are used, with the replacement of AHx^q_FD in (3.11) by Ay^q, where y^q has 30-dB Poisson noise. The x^q_FD and Ay^q_Real are multiplied by a constant so that their maximum pixel value is 480.
The CNN obtained after the first stage is used as FBPconv.
While testing, we keep C = 0.4. Other training hyper-parameters and testing parameters of RPGD are kept the same as for RPGD40 in the ×5 case of Experiment 1.
3) Experiment 3. The CNNs are trained using the perturbations (3.10)-(3.12) with two modifications: (i) x^q is replaced with x^q_FD because the actual ground truth was unavailable; and (ii) AHx^q in (3.11) is replaced with Ay^q_Real because we now have access to the actual sinogram.
All other training hyper-parameters and testing parameters are kept the same as for RPGD in the ×5 case of Experiment 1. Similar to Experiment 1, the CNN obtained after the first stage of the sequential training is used as FBPconv.
The RPGD method outperforms all the others for both ×5 and ×16 reductions in terms of the SNR and SSIM indices. FBP performs the worst but is able to retain enough information to be utilized by FBPconv and RPGD. Due to the convexity of the iterative scheme, TV is able to perform well but tends to smooth textures and edges. DL performs worse than TV for the ×16 case but is equivalent to it for the ×5 case. FBPconv outperforms both TV and DL, but it is surpassed by RPGD. This is mainly due to the feedback mechanism in RPGD, which lets RPGD use the information in the given measurements to increase the quality of the reconstruction. In fact, for the ×16, no-noise case, the SNRs of the sinograms of the reconstructed images for TV, FBPconv, and RPGD are around 47 dB, 57 dB, and 62 dB, respectively. This means that the reconstruction using RPGD has both better image quality and more reliability, since it is consistent with the given noiseless measurements.
2) High Measurement Noise. In the noisier cases (Table 3.3), RPGD40 yields a better SNR than the other methods in the low-view (×16) cases and is more consistent in performance than the others in the high-view (×5) cases. In terms of the SSIM index, it outperforms all of them. The performances of DL and TV are robust to the noise level, with DL performing better than the others in terms of SNR for the 45-dB, ×5 case. FBPconv40 substantially outperforms DL and TV in the two scenarios with 40-dB measurement noise, over which it was actually trained. For this noise level and the ×5 case, it even performs slightly better than RPGD40, but only in terms of SNR. However, as the level of noise deviates from 40 dB, the performance of FBPconv40 degrades significantly. Surprisingly, its performances in the 45-dB cases are much worse than those in the corresponding 40-dB cases. In fact, its SSIM index for the 45-dB, ×5 case is even worse than that of FBP. This implies that FBPconv40 is highly sensitive to the difference between the training and testing conditions. By contrast, RPGD40 is more robust to this difference due to its iterative correction. In the ×16 cases with 45-dB and 35-dB noise levels, it outperforms FBPconv40 by around 3.5 dB and 6 dB, respectively.
3) Case Study. The reconstructions of lung and abdomen images for the case of ×16 downsampling and noiseless measurements are illustrated in Figure 3.2 (first and fifth columns). FBP is dominated by line artifacts, while TV and DL satisfactorily remove those but blur the fine structures. FBPconv and RPGD are able to reconstruct these details. The zoomed versions (second and sixth columns) suggest that RPGD is able to reconstruct the fine details better than the other methods. This observation remains the same when the measurement quality degrades. The remaining columns contain the reconstructions for different noise levels. For the abdomen image, it is noticeable that only TV is able to retain the small bone structure marked by an arrow in the zoomed version of the lung image (seventh column). A possible reason for this could be that structures similar to this one were rare in the training set. Increasing the size of the training data with suitable images could be a solution.
Figure 3.3 contains the profiles of high- and low-contrast regions of the recon-
structions for the two images. These regions are marked by line segments inside the
original image in the first column of Figure 3.2. The FBP profile is highly noisy and
the TV and DL profiles overly smooth the details. FBPconv40 is able to accommo-
date the sudden transitions in the high-contrast case. RPGD40 is slightly better
in this regard. For the low-contrast case, RPGD40 is able to follow the structures
of the original (GT) profile better than the others. A similar analysis holds for the
⇥5 case (see Figure A.3 in the Appendix).
Figure 3.3: Profiles of the high- and low-contrast regions marked in the first and fifth columns of Figure 3.2 by solid and dashed line segments, respectively. First and second columns: ×16, 45-dB noise case for the lung image. Third and fourth columns: ×16, 40-dB noise case for the abdomen image.
3.6.2 Experiment 2
We show in Table 3.4 the regressed SNR and SSIM indices averaged over the 25 reconstructed slices. RPGD outperforms both FBP and FBPconv in terms of SNR and SSIM. Similar to Experiment 1, its performance is also more robust with respect to noise mismatch. Figure A.4 in the Appendix compares the reconstructions for a given test slice.
3.6.3 Experiment 3
In Figure 3.4, we show the reconstruction result for one slice for γ = 10⁻⁵. Since the ground truth is unavailable, we show the reconstructions without a quantitative comparison. It can be seen that the proposed method is able to reconstruct images with reasonable perceptual quality.
Due to the high value of the step size (γ = 2 × 10⁻³) and the large difference (Hx_k − y), the initial few iterations have large gradients and result in the instability of the algorithm. The reason is that the CNN is fed with (x_k − γHᵀ(Hx_k − y)), which is drastically different from the perturbations on which it was trained. In this situation, α_k decreases steeply and stabilizes the algorithm. At convergence, α_k ≠ 0; therefore, according to Theorem 3.3.1, x₁₀₀ is a fixed point of (3.7) where F = CNN.
3.8 Summary
In this chapter, we present a new image reconstruction method that replaces the
projector in a projected gradient descent (PGD) with a convolutional neural net-
work (CNN). Recently, CNNs trained as image-to-image regressors have successfully
been used to solve inverse problems in imaging. However, unlike existing iterative
image reconstruction algorithms, these CNN-based approaches usually lack a feed-
back mechanism to enforce that the reconstructed image is consistent with the
measurements. We propose a relaxed version of PGD wherein gradient descent
enforces measurement consistency, while a CNN recursively projects the solution
closer to the space of desired reconstruction images. We show that this algorithm is
[Figure: (a) SNR versus iteration k for RPGD, FBPconv, TV, and FBP; (c) relaxation parameter α_k versus iteration k.]
Third Generation
Chapter 4
Time-Dependent Deep Image Prior
4.1 Overview
There are currently three main approaches to accelerate the magnetic resonance imaging (MRI) of a static image. All three methods rely on a partial sampling of the k-space to reduce the acquisition time. The resulting partial loss of data must then be compensated to maintain the quality of the image. Once compensation is achieved, the accelerated methods capture accurate motions of fast-moving organs such as the heart.
1 The content of this chapter is based on our work [116].
i. In parallel MRI (pMRI), the simultaneous use of several hardware coils re-
sults in spatial redundancy that enables algorithms to reconstruct clean im-
ages [117, 118].
ii. In compressed sensing (CS) MRI, the data are assumed to be sparse in cer-
tain transform domains [119, 120]. This ultimately leads to regularized com-
putational methods that compensate for the partial sampling. Their success
suggests that, in particular, a Fourier-based forward model matches well the
assumption of sparsity.
iii. In the context of trainable deep artificial neural networks, learning approaches
have already achieved fast and accurate reconstructions of partially sampled
MRI data [15,121]. Similarly to CS MRI, dynamic accelerated reconstructions
have also been proposed in the literature [122–124], possibly in combination
with pMRI in the learning loop [24]. These approaches depend on training
datasets [17, 20, 22, 84, 125, 126].
In the context of dynamic MRI, the approach that consists in the acquisition of
a sequence of frames is suboptimal. Instead, it is more efficient to take advantage of
the time dependencies between frames to gain additional improvements in terms of
temporal resolution [127, 128]. For instance, [129, 130] design non-overlapping sam-
pling masks at each frame to restore a dynamic volume—a method that demands
far fewer k-space samples than would a sequence of static MRI. Indeed, the CS
theory explains the possibility of perfect reconstruction despite a sub-Nyquist sam-
pling [131]. The capture of temporal redundancies has also been handled through
low-dimensional manifolds [132, 133]. In the specific case of cardiac applications,
the overall motion of the heart is expected to be approximately cyclic. This peri-
odicity can be exploited to bring additional gains in terms of temporal resolution,
but the length and phase of the cycles must be determined first. This is usually
achieved either through electrocardiograms (ECG) or self-gating [134, 135]. Under
the restrictive assumption of ideal periodicity, these methods allow one to prefix
the cardiac phases and to reorder temporally the acquired frames, effectively pro-
ducing a stroboscope-inspired analysis of the motion. Motion irregularities are not
captured by those methods.
4.1.1 Contribution
In this chapter, we propose an unsupervised learning framework in which a neural network reconstructs fast dynamic MRI from golden-angle radial lines in k-space, also called spokes. To reconstruct a single image, we feed one realization of low-dimensional latent variables to an artificial neural network. A nonuniform fast Fourier transform2 (NuFFT) is then applied to the output of the neural network and simulates the MRI measurement process [136]. Inspired by deep image priors [27], we fit the simulated measurements to the real measurements. The fit is controlled by adjusting the weights of the neural network until the Euclidean loss between the simulated and real measurements is minimized; this fitting process is referred to as the learning stage.
In the context of dynamic MRI, we extend the fitting process in such a way that
the weights of the network are learned by taking simultaneously into consideration
the joint collection of all acquisitions, which yields time-independent weights. Time
dependence is recovered by controlling the latent variables. Given some temporal
interval, we synthesize two independent random realizations of low-dimensional la-
tent variables and associate one to the initial and one to the final bound of the
interval. Timestamped contributions to learning are obtained by taking as ground-
truth the real measurements acquired over a quasi-instantaneous observation period
(say, five spokes), while we let the activation of the neural network be the inter-
mediate realization of the latent variables obtained by linear interpolation of the
two latent endpoints. This approach allows us to impose and exploit temporal
dependencies in the latent space.
In short, the action of our neural network is to map the manifold of latent
variables onto a manifold of dynamic images. Importantly, our approach is purely
unsupervised; moreover, priors are imposed only indirectly, arising from the mere
structure of a convolutional network. We demonstrate the performance of our
neural network by comparing its reconstructions to those obtained from CS algo-
rithms [129, 130].
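The following PyTorch sketch illustrates the fitting stage described above for a single coil: the generator g, the per-frame forward operators forward_ops, the acquired spokes measurements, and the two latent endpoints z0 and z1 are all assumed to exist; it is a schematic rendering of the idea, not the actual implementation.

```python
import torch

def fit_time_dependent_dip(g, forward_ops, measurements, z0, z1, n_steps=2000, lr=1e-3):
    """Fit the network weights so that simulated spokes match the acquired ones (sketch)."""
    K = len(measurements)
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = 0.0
        for k in range(K):
            t = k / max(K - 1, 1)
            z_t = (1.0 - t) * z0 + t * z1        # linear interpolation of the latent endpoints
            y_sim = forward_ops[k](g(z_t))       # simulated k-space spokes of frame k
            loss = loss + torch.sum(torch.abs(y_sim - measurements[k]) ** 2)
        loss.backward()
        opt.step()
    return g
```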
the limitations and pitfalls of hand-crafted priors. They have been deployed in [137] to build an unsupervised learning scheme for accelerated MRI; but, contrary to ours, the task addressed therein is static. Other researchers have used deep image priors to reconstruct positron-emission-tomography images, albeit again in a non-dynamic fashion [138].
4.2 Methods
Let R be the Radon transform of the complex-valued, continuously defined image x : ℝ² → ℂ, so that
$$\mathcal{R}\{x\}(r, \vartheta) = \int_{\mathbb{R}^2} x(\boldsymbol{\xi})\,\delta\!\big(\mathbf{u}_{\vartheta}^{\mathsf{T}}\boldsymbol{\xi} - r\big)\,\mathrm{d}\boldsymbol{\xi}, \qquad (4.1)$$
where r and ϑ are the spatial and angular Radon arguments, respectively, and where u_ϑ is a unit vector in the ϑ direction. Moreover, let F denote the one-dimensional continuous Fourier transform that follows the convention
$$\mathcal{F}\{x\}(\omega) = \int_{\mathbb{R}} x(r)\,\mathrm{e}^{-\mathrm{j}\omega r}\,\mathrm{d}r. \qquad (4.2)$$
model, in particular because of aliasing concerns, and also because the discrete version is no longer invertible.
Formally, let x ∈ ℂ^N be a vectorized version of the samples of x seen as an image of finite size (N₁ × N₂), with N = N₁N₂. Likewise, let y ∈ ℂ^M be a vectorized version of the samples of the sinogram H_ϑ{x}(ω), with measurements of finite size (M_ϑ × M_ω) taken over M_ϑ orientations and M_ω elements of the k-space, with M = M_ϑM_ω. Then, by linearity of the transformation, we write that
$$\mathbf{y} = \mathbf{G}\mathbf{x}, \qquad (4.4)$$
where G is an M-row by N-column matrix that combines the discrete Fourier and discrete Radon transforms.
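As a naive but self-contained stand-in for the NuFFT-based operator G, one radial spoke can be evaluated with a direct non-uniform DFT, as sketched below; the image size, number of frequencies, and normalization are arbitrary choices made for illustration only.

```python
import numpy as np

def spoke_measurement(img, angle, n_freqs=128):
    """Evaluate one radial k-space spoke of a 2D image by direct summation (sketch)."""
    n1, n2 = img.shape
    u, v = np.meshgrid(np.arange(n2) - n2 / 2, np.arange(n1) - n1 / 2)
    k = np.linspace(-np.pi, np.pi, n_freqs, endpoint=False)      # radial frequencies
    kx, ky = k * np.cos(angle), k * np.sin(angle)
    phase = np.exp(-1j * (np.outer(kx, u.ravel()) + np.outer(ky, v.ravel())))
    return phase @ img.ravel().astype(complex)

print(spoke_measurement(np.random.rand(64, 64), angle=0.3).shape)
```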
space, as suggested in Figure 4.1) are instantaneous and indexed at discrete times t′ ∈ ℤΔt, taken regularly at temporal interval Δt. The spoke orientations follow the golden-angle strategy
$$\vartheta_t = \vartheta_0 + \omega_0\, t, \qquad (4.5)$$
$$\mathbf{y}_{t'} = \mathbf{H}_{t'}\,\mathbf{x}_{t'}. \qquad (4.6)$$
4.2.3 Regularization
Even with n > 1, it is observed that (n M_ω) ≪ N, which makes the recovery of x_{t′} given y_{t′} severely ill-posed. To truly resolve this issue, practitioners often choose to regularize the problem over some extended temporal range. From a notational perspective, K vectors y_{t′} are concatenated over a large duration (KΔt) to build Y = (y_{kΔt})_{k∈[0…K−1]}. Likewise, we write that X = (x_{kΔt})_{k∈[0…K−1]} and H = [H_{kΔt}]_{k∈[0…K−1]}. The lengths of Y and X are (K n M_ω) and (K N), respectively. The size of H ensues.
In the context of CS dynamic imaging, the traditional regularization of the forward model (4.6) is established as a search for the solution
$$\mathbf{X}^{*} = \arg\min_{\mathbf{X}} \left(\|\mathbf{H}\mathbf{X} - \mathbf{Y}\|_2^2 + \lambda\|\mathbf{D}\mathbf{X}\|_p\right), \qquad (4.7)$$
For a single coil, our deep prior minimizes a Euclidean loss and results in the solution
$$\boldsymbol{\theta}^{*} = \arg\min_{\boldsymbol{\theta}} \sum_{k=0}^{K-1} \big\|\mathbf{H}_{k\Delta t}\, g_{\boldsymbol{\theta}}(\mathbf{z}_{k\Delta t}) - \mathbf{y}_{k\Delta t}\big\|_2^2, \qquad (4.10)$$
where C_c gives the sensitivity map of the cth coil, ⊙ is a pixel-wise multiplication operator in the spatial domain which relates the true magnetization image to the coil sensitivities, and y_{c,t} concatenates the n instantaneous acquisitions of spokes for the cth coil. Once an optimal θ* has been found in either (4.10) or (4.11), we can produce the final estimate x*_t for all values of t, including for t ∉ ℤΔt if desired, as
$$\mathbf{x}^{*}_{t} = g_{\boldsymbol{\theta}^{*}}(\mathbf{z}_{t}).$$
Datasets
All experimental datasets are breath-hold. We use golden-angle radial sparse par-
allel (GRASP) MRI as a common baseline [129]. Spoke-sharing is not applied for
Operation            | Number of Filters | Size of Each Filter (XYC) | Strides (XY) | Zero Padding (XY) | Size of Output Image (XYC)
Input                |        -          |            -              |      -       |         -         | 8 × 8
Conv+BN+ReLU         |       128         |       3 × 3 × 1           |    1 × 1     |       1 × 1       | 8 × 8 × 128
Conv+BN+ReLU         |       128         |       3 × 3 × 128         |    1 × 1     |       1 × 1       | 8 × 8 × 128
NN interp.           |        -          |            -              |    2 × 2     |         -         | 16 × 16 × 128
2×(Conv+BN+ReLU)     |       128         |       3 × 3 × 128         |    1 × 1     |       1 × 1       | 16 × 16 × 128
NN interp.           |        -          |            -              |    2 × 2     |         -         | 32 × 32 × 128
2×(Conv+BN+ReLU)     |       128         |       3 × 3 × 128         |    1 × 1     |       1 × 1       | 32 × 32 × 128
NN interp.           |        -          |            -              |    2 × 2     |         -         | 64 × 64 × 128
2×(Conv+BN+ReLU)     |       128         |       3 × 3 × 128         |    1 × 1     |       1 × 1       | 64 × 64 × 128
NN interp.           |        -          |            -              |    2 × 2     |         -         | 128 × 128 × 128
2×(Conv+BN+ReLU)     |       128         |       3 × 3 × 128         |    1 × 1     |       1 × 1       | 128 × 128 × 128
Conv.                |         2         |       3 × 3 × 128         |    1 × 1     |       1 × 1       | 128 × 128 × 2
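A PyTorch module matching the layer listing above could look as follows; the exact initialization, padding choices, and layer ordering of the network described in this chapter may differ, so this is only a structural sketch.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Generator(nn.Module):
    """(8 x 8) latent map -> (2 x 128 x 128) output (real/imaginary channels), as in the table (sketch)."""
    def __init__(self, channels=128):
        super().__init__()
        layers = [conv_block(1, channels), conv_block(channels, channels)]
        for _ in range(4):                                  # 8 -> 16 -> 32 -> 64 -> 128
            layers.append(nn.Upsample(scale_factor=2, mode="nearest"))
            layers.append(conv_block(channels, channels))
            layers.append(conv_block(channels, channels))
        layers.append(nn.Conv2d(channels, 2, kernel_size=3, stride=1, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

g = Generator()
print(g(torch.randn(1, 1, 8, 8)).shape)   # torch.Size([1, 2, 128, 128])
```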
Reconstruction Experiments
We use an Intel i7-7820X (3.60 GHz) CPU and an Nvidia Titan X (Pascal) GPU. PyTorch 1.0.0 on Python 3.6 is used to implement our method. All experiments are performed in single-batch mode. The input size is (8 × 8). The cost function used to train our neural network is (4.11). The learning rate is 10⁻³, with the Adam optimizer [141].
4.3 Results
4.3.1 Retrospective Simulation
In this experiment, the acquisition process is simulated, which allows us to build
the ground truth from a fully sampled k-space. We use n = 13 spokes for the
reconstruction and present the results in Figure 4.4. There, the bandpass method
(BP) corresponds to a zero-filled DFT while GRASP is the baseline against which
we compare the performance of our method. We see in Figure 4.4 (A) that GRASP
leads to blurring artifacts, while the residual map discloses the occurrence of errors
around the wall of the heart. By contrast, our proposed method gives better results.
This is confirmed in Figure 4.4 (B), where the cardiac motions are captured better
by our model than by GRASP. The systolic phase of our reconstruction is well
described and very close to the ground-truth, whereas the systolic phase captured
by GRASP is too flat.
Figure 4.5: Dynamic reconstructions of a fetal heart for one beating cycle.
Top rows (A), from left to right: field of view from OV, BP, GRASP,
RD, and ours. Bottom row (B), from left to right: OV with a white
line indicating the (y-t) location of cross sections; cross sections from BP,
GRASP, RD, and ours.
Figure 4.6: Top (A): Series of (y-t) cross sections of our reconstruction
from the region of interest in Figure 4.5 (B). Bottom (B): t-SNE embedding
from image frames (left) and latent variables (right). The temporal index
is color-coded.
4.4 Discussion
4.4.1 Latent Encoding for Acyclic Data
Letting K in Section 4.2.3 be such that truly all data in Figure 4.6 (A) were taken jointly, and interpolating the latent variables between the only two endpoint realizations z₀ and z_{(K−1)Δt}, we observed that the reconstruction x*_t of the fetal cardiac motion took a constant, time-independent value. We surmise that this failure is due to the overly strong presence of non-periodic components in the data. To cope with them, we adapted our scheme slightly and proceeded by piecewise interpolation, the pieces being made of temporal chunks in the latent space. More precisely, we generated fourteen realizations {z^(τ)}_{τ∈[0…13]} of the latent variables, equispaced in time; then, instead of building z_t as a linear combination of z₀ = z^(0) and z_{(K−1)Δt} = z^(13), we built z_t as a linear combination of z^(τ) and z^(τ+1), with an appropriate τ that depends on t. Note that, while the latent variables now evolve chunk-wise, the network is still time-independent and trained over all data jointly. The chunk boundaries are made visible in Figure 4.6 (B).
4 For display purposes, we show only one cycle of our cross section. In fact, our reconstructed data have as many frames (K = 1,400) as there are spokes, owing to the spoke-sharing mechanism of Section 4.2.2.
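In code, this chunk-wise encoding amounts to a piecewise-linear interpolation between key latent realizations, as in the sketch below; the frame count, the number of keys, and the tensor shapes are placeholders.

```python
import torch

def piecewise_latent(z_keys, t, total_frames):
    """Latent code at frame t, interpolated between the two enclosing key realizations (sketch)."""
    n_chunks = len(z_keys) - 1
    pos = (t / max(total_frames - 1, 1)) * n_chunks   # continuous position along the chunks
    tau = min(int(pos), n_chunks - 1)                 # index of the left key
    w = pos - tau                                     # interpolation weight in [0, 1]
    return (1.0 - w) * z_keys[tau] + w * z_keys[tau + 1]

z_keys = [torch.randn(1, 1, 8, 8) for _ in range(14)]   # fourteen key realizations
z_t = piecewise_latent(z_keys, t=700, total_frames=1400)
```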
[Table: regressed SNR when training with non-smooth latent variables versus smooth/interpolated latent variables; only the value 19.29 is legible in the source.]
Table 4.2: Regressed SNR in terms of the size of the latent variables.
iii. New Latent Variables. In the third scenario, we make use of random realizations of latent variables that differ from those used while learning. This gives severely perturbed images.
v. Scalar Latent Variables. In the fifth and last scenario, we deploy scalar latent variables. This results in a time-independent, non-moving sequence of images.
Several competing methods aim at synthesizing a single cardiac cycle out of spokes
collected over several cycles at various cardiac phases. There, a synchronized ECG
could in principle allow one to associate a phase to each spoke; however, the de-
ployment of this ancillary ECG would make the practical apparatus more involved,
which is often undesirable. Furthermore, there are applications such as fetal car-
diology where no ECG can be deployed at all. In traditional ECG-free approaches
instead, one deploys self-gating methods for the purpose of phase assignment. They
proceed on the basis of either human inspection or heuristic decisions, which makes
them arduous, non-reproducible, and prone to errors. (Sometimes, the assign-
ment is no more advanced than a simple manual sorting.) One specific additional
difficulty that self-gating methods must deal with originates with the necessary
Cartesian-to-polar conversion inherent in radial sampling trajectories, which ul-
timately results in streaking artifacts that tend to confound phase assignments,
particularly those based on visual assessments in the spatial domain.
4.5 Summary
In this chapter, we develop a novel unsupervised deep-learning-based algorithm to
solve the inverse problem found in dynamic magnetic resonance imaging (MRI).
Our method needs neither prior training nor additional data; in particular, it requires neither an electrocardiogram nor spoke reordering in the context of cardiac images. It generalizes the recently introduced deep-image-prior approach to sequences of images. The essence of the proposed algorithm is to proceed in two steps to
fit k-space synthetic measurements to sparsely acquired dynamic MRI data. In the
first step, we deploy a convolutional neural network (CNN) driven by a sequence of
low-dimensional latent variables to generate a dynamic series of MRI images. In the
second step, we submit the generated images to a nonuniform fast Fourier transform
that represents the forward model of the MRI system. By manipulating the weights
of the CNN, we fit our synthetic measurements to the acquired MRI data. The
corresponding images from the CNN then provide the output of our system; their
evolution through time is driven by controlling the sequence of latent variables
whose interpolation gives access to the sub-frame—or even continuous—temporal
control of reconstructed dynamic images. We perform experiments on simulated
CryoGAN: Cryo-EM
Reconstruction using GAN
Framework
In the previous chapter, we discussed the deep image prior, which, in principle, is applicable to any inverse problem where the forward model is exactly known. In this chapter, we propose a new unsupervised deep-learning-based algorithm for Cryo-EM, an imaging problem where only a parametric form of the forward model is known. Our method, as we shall see, approaches this problem from a distributional perspective.
5.1 Overview
Single-particle cryo-electron microscopy (Cryo-EM) is a powerful method to determine the atomic structure of macromolecules by imaging them with electron rays at cryogenic temperatures [143–145]. Its popularity has rocketed in recent years, culminating in 2017 with the Nobel Prize awarded to Jacques Dubochet, Richard Henderson, and Joachim Frank. In Cryo-EM, one obtains many noisy 2D tomographic projections from separate instances of the same but randomly oriented 3D biomolecule2. There exists a multitude of software packages to produce high-resolution 3D structure(s) from these 2D measurements [146–153]. These sophisticated algorithms, which include projection-matching approaches and maximum-likelihood optimization frameworks, enable the determination of structures with unprecedented atomic resolution.
1 This chapter uses content from our work [28].
Yet reconstruction procedures in single-particle Cryo-EM still face complex ob-
stacles. The task involves a high-dimensional, nonconvex optimization problem
with numerous local minima. Hence, the outcome of the global process is pred-
icated on the quality of the initial reconstruction [154, 155]. Moreover, one still
often relies on the input of an expert user for appropriate processing decisions and
parameter tuning [156]. Even for more automated methods, the risk of outputting
incorrect and misleading 3D reconstructions is ever-present. A key reason behind
such complexity is that the imaged particles have unknown poses. To handle this,
most software packages rely on a marginalized maximum-likelihood (ML) formula-
tion [157] that is solved through an expectation-maximization algorithm [151, 153].
The latter involves calculations over the discretized space of poses for each projec-
tion, a computationally demanding procedure.
To bypass these limitations, we introduce CryoGAN, an unsupervised recon-
struction algorithm for single-particle Cryo-EM that exploits the remarkable ability
of generative adversarial networks (GANs) to capture data distributions [36]. Sim-
ilar to GANs, CryoGAN is driven by the competitive training of two entities: one
that tries to capture the distribution of real data, and another that discriminates
between generated samples and samples from the real dataset (Figure 5.1). In a clas-
sical GAN, the two entities are each a convolutional neural network (CNN). They
are known as the generator and the discriminator and are trained simultaneously
using backpropagation. The important twist with CryoGAN is that we replace the
generator network by a Cryo-EM physics simulator. By doing so, CryoGAN learns
the 3D density map whose simulated projections are the most consistent with a
given dataset of 2D measurements in a distributional sense.
The CryoGAN architecture represents a complete change of paradigm for single-
particle Cryo-EM reconstruction. No estimation of the poses is attempted during
2 In this work, we consider the homogeneous case where the biomolecule exhibits only a single conformation. The heterogeneous (multiple-conformation) case is the topic of the next chapter.
density; H_φ ∈ ℝ^{M×V} is the forward operator with parameters φ ∼ p_φ; and n ∈ ℝ^M is an additive noise following a distribution p_n. In Cryo-EM, one obtains 10⁴–10⁷ measurements of the biomolecule. Each of these measurements is obtained with an unknown φ. The imaging parameters φ comprise the projection (Euler) angles θ = (θ₁, θ₂, θ₃), the projection shifts t = (t₁, t₂), and the CTF parameters c = (d₁, d₂, α_ast), where d₁ is the defocus-major, d₂ is the defocus-minor, and α_ast is the angle of astigmatism.
The forward operator H_φ is given by
$$\mathbf{H}_{\boldsymbol{\varphi}} = \mathbf{C}_{\mathbf{c}}\,\mathbf{S}_{\mathbf{t}}\,\mathbf{P}_{\boldsymbol{\theta}}, \qquad (5.2)$$
where P_θ : ℝ^V → ℝ^M is a projection operator (mathematically speaking, the X-ray transform [160]), S_t : ℝ^M → ℝ^M is a shift operator, and C_c : ℝ^M → ℝ^M is a convolution operator. We discuss in more detail the continuous-domain physics behind the image-formation model H_φ.
Projection Operator
The projection operation is given by [160]
$$\mathcal{P}_{\boldsymbol{\theta}}\{f\}(x_1, x_2) = \int_{-\infty}^{\infty} \mathcal{R}_{\boldsymbol{\theta}}\{f\}(x_1, x_2, x_3)\,\mathrm{d}x_3, \qquad (5.5)$$
where R_θ is the rotation matrix associated with θ and R_θ{f}(x) = f(R_θ⁻¹x).
Shift Operator
The projection measurements are picked from the micrographs and can thus be off-centered. This is modelled via the shift operator which, for any y_s : ℝ² → ℝ, yields
$$\mathcal{S}_{\mathbf{t}}\{y_s\}(x_1, x_2) = y_s(x_1 - t_1, x_2 - t_2),$$
where t = (t₁, t₂).
Convolution by CTF
The effect of the operator C_c on any y_c : ℝ² → ℝ is given in the Fourier domain as
$$\mathcal{F}\{\mathcal{C}_{\mathbf{c}}\{y_c\}\}(\boldsymbol{\omega}) = \hat{C}_{\mathbf{c}}(\boldsymbol{\omega})\,\hat{y}_c(\boldsymbol{\omega}),$$
where F is the Fourier transform and ŷ_c = F{y_c}. Its Fourier transform Ĉ_c (i.e., the CTF) is given by
There, Ĉ_c^p : ℝ² → ℝ is the phase-contrast transfer function that takes the form
$$\hat{C}^{\mathrm{p}}_{\mathbf{c}}(\boldsymbol{\omega}) = \sqrt{1 - A^2}\,\sin\!\big(\gamma_{\mathbf{c}}(\boldsymbol{\omega})\big) - A^2\cos\!\big(\gamma_{\mathbf{c}}(\boldsymbol{\omega})\big), \qquad (5.9)$$
with
$$\gamma_{\mathbf{c}}(\boldsymbol{\omega}) = \pi\left(\lambda\, d_{\mathbf{c}}(\alpha)\,\|\boldsymbol{\omega}\|^2 - \frac{1}{4}\,\lambda^3 c_s\,\|\boldsymbol{\omega}\|^4\right), \qquad (5.10)$$
where λ is the electron wavelength, c_s is the third-order spherical-aberration constant, α is the phase of the vector ω, and d_c(α) is the defocus arising at the phase α. This
defocus is given as
where d₁ and d₂ are the horizontal and vertical defocus, respectively, and α_ast is the reference angle that defines the azimuthal direction of axial astigmatism. The objective aperture function Â : ℝ² → ℝ is given by
$$\hat{A}(\boldsymbol{\omega}) = \begin{cases} 1, & \|\boldsymbol{\omega}\| \le \omega_{\text{cutoff}} \\ 0, & \|\boldsymbol{\omega}\| > \omega_{\text{cutoff}}, \end{cases} \qquad (5.12)$$
where ω_cutoff = 2πd_ap/(f₀λ) is the cutoff frequency, f₀ is the focal length of the objective lens, and d_ap corresponds to the diameter of the aperture. The spatial and chromatic envelope function Ê : ℝ² → ℝ is given by
Discretization
The discretization of H_φ results in the matrix H_φ. This discretized measurement operator is itself composed of the discretized projection, shift, and convolution operators, which are denoted by P_θ, S_t, and C_c, respectively. The input to the operator H_φ is a discretized version of the continuous-domain 3D volume. This discretization of the 3D volume is done using a suitable basis function [163].
where the parameters range over the set of all the possible imaging parameters. We denote a noiseless projection as y_noiseless = H_φx. In our formulation, the projections in the real dataset are samples of a distribution p_data; hence, p(y|x_true) = p_data(y), assuming that the forward model is correct.
We demonstrate in Theorem 5.5.5 in Section 5.5 that two 3D volumes have identical conditional distributions if and only if they are identical, up to rotation and reflection. Hence, Theorem 5.5.5 implies that, for the reconstruction x_rec to satisfy x_rec = x_true, it must also satisfy p(y|x_rec) = p(y|x_true). Thus, we can formulate the reconstruction task as the minimization problem
$$\mathbf{x}_{\mathrm{rec}} = \arg\min_{\mathbf{x}} D\big(p_{\mathrm{data}}, p(\cdot|\mathbf{x})\big), \qquad (5.15)$$
where D is some distance between two distributions. In essence, (5.15) states that the appropriate reconstruction is the 3D density map whose projection distribution is the most similar to the real dataset in a distributional sense. For the sake of conciseness, we shall henceforth use the notation p(y|x) = p_x(y).
As the distance in (5.15), we use the Wasserstein distance defined as
$$W(p_1, p_2) = \inf_{\pi \in \Pi(p_1, p_2)} \mathbb{E}_{(\mathbf{y}_1,\mathbf{y}_2)\sim\pi}\big[\|\mathbf{y}_1 - \mathbf{y}_2\|\big], \qquad (5.16)$$
where Π(p₁, p₂) is the set of all the joint distributions of (y₁, y₂) whose marginals are p₁ and p₂, respectively. Our choice is driven by works demonstrating that the Wasserstein distance is more amenable to minimization than other popular distances (e.g., total-variation or Kullback-Leibler divergence) for this kind of application [164]. Using (5.16), the minimization problem (5.15) expands as
$$\mathbf{x}_{\mathrm{rec}} = \arg\min_{\mathbf{x}}\; \inf_{\pi \in \Pi(p_{\mathrm{data}}, p_{\mathbf{x}})} \mathbb{E}_{(\mathbf{y}_1,\mathbf{y}_2)\sim\pi}\big[\|\mathbf{y}_1 - \mathbf{y}_2\|\big]. \qquad (5.17)$$
By using the formalism of [164–166], this minimization problem can also be stated in its dual form
$$\mathbf{x}_{\mathrm{rec}} = \arg\min_{\mathbf{x}}\; \max_{f:\|f\|_L < 1} \Big(\mathbb{E}_{\mathbf{y}\sim p_{\mathrm{data}}}[f(\mathbf{y})] - \mathbb{E}_{\mathbf{y}\sim p_{\mathbf{x}}}[f(\mathbf{y})]\Big), \qquad (5.18)$$
Here, p_int denotes the uniform distribution along the straight line between points sampled from p_data and p_x, while λ ∈ ℝ₊ is an appropriate penalty coefficient (see [167], Section 4).
minimization of
$$L_S(\mathbf{x}, D) = \sum_{n\in S} D(\mathbf{y}^n_{\mathrm{data}}) - \sum_{n\in S} D(\mathbf{y}^n_{\mathrm{sim}}) - \lambda\sum_{n\in S} \big(\|\nabla_{\mathbf{y}} D(\mathbf{y}^n_{\mathrm{int}})\| - 1\big)^2, \qquad (5.21)$$
where S consists of either the full dataset S_full = {1, . . . , N_tot} or a batch B ⊆ S_full; y^n_data is a projection sampled from the acquired experimental dataset; y^n_sim ∼ p_x is a projection of the current estimate x generated by the Cryo-EM physics simulator; and y^n_int = α_n·y^n_data + (1 − α_n)·y^n_sim, where α_n is sampled from a uniform distribution between 0 and 1.
In practice, we minimize (5.21) through stochastic gradient descent (SGD) using batches. We alternately update the discriminator D (for n_discr iterations) using an Adam optimizer [141] with gradient
$$\nabla L_B(\mathbf{x}, D) = \nabla\left(\sum_{n=1}^{N} D(\mathbf{y}^n_{\mathrm{batch}}) - \sum_{n=1}^{N} D(\mathbf{y}^n_{\mathrm{sim}}) - \lambda\sum_{n=1}^{N} \big(\|\nabla_{\mathbf{y}} D(\mathbf{y}^n_{\mathrm{int}})\| - 1\big)^2\right), \qquad (5.22)$$
where the gradient is taken with respect to the parameters of the discriminator.
The pseudocode and a schematic view of the CryoGAN algorithm are given in Algo-
rithm 3 and Figure 5.1b, respectively. We provide further details of the CryoGAN
physics simulator and discriminator network in the next two sections.
1: for n_train iterations do
2:     for n_discr iterations do
3:         sample real projections: {y¹_batch, . . . , y^N_batch} = {y^n_data}_{n∈B}
4:         sample projections simulated from the current x: {y¹_sim, . . . , y^N_sim} ∼ p_x (see Algorithm 2)
5:         sample {α₁, . . . , α_N} ∼ U[0, 1]
6:         for all n ∈ {1, . . . , N}, compute y^n_int = α_n · y^n_batch + (1 − α_n) · y^n_sim
7:         update the parameters of the discriminator D using (5.22)
8:     sample {y¹_sim, . . . , y^N_sim} ∼ p_x
9:     update the volume x using (5.23)
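A compact PyTorch rendering of one outer iteration is sketched below. The volume x (with requires_grad=True), the discriminator D, a differentiable simulator(x, n) that draws n projections from p_x, and the two optimizers are assumed to be set up elsewhere; the signs and hyper-parameters follow the usual Wasserstein-GAN-with-gradient-penalty recipe rather than the exact released code.

```python
import torch

def cryogan_step(x, D, simulator, real_batch, opt_D, opt_x, lam=10.0, n_discr=5):
    """One outer iteration: n_discr discriminator updates, then one volume update (sketch)."""
    N = real_batch.shape[0]
    for _ in range(n_discr):
        y_sim = simulator(x, N).detach()
        alpha = torch.rand(N, 1, 1, 1)                       # assumes (N, C, H, W) batches
        y_int = (alpha * real_batch + (1 - alpha) * y_sim).requires_grad_(True)
        grad = torch.autograd.grad(D(y_int).sum(), y_int, create_graph=True)[0]
        penalty = ((grad.flatten(1).norm(dim=1) - 1) ** 2).sum()
        loss_D = -(D(real_batch).sum() - D(y_sim).sum() - lam * penalty)
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    # Volume update: make the simulated projections score higher under D
    y_sim = simulator(x, N)
    loss_x = -D(y_sim).sum()
    opt_x.zero_grad(); loss_x.backward(); opt_x.step()
```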
Recall that the set of imaging parameters is given by φ = (θ₁, θ₂, θ₃, t₁, t₂, d₁, d₂, α_ast). We first sample the Euler angles θ = (θ₁, θ₂, θ₃) from a distribution p_θ decided a priori based on the acquired dataset. Similarly, the projection shifts t = (t₁, t₂) are sampled from the prior distribution p_t. The CTF parameters c = (d₁, d₂, α_ast) are sampled from the prior distribution p_c. In practice, we exploit the fact that the CTF parameters can often be efficiently estimated for all micrographs. We then uniformly sample from the whole set of extracted CTF parameters.
We generate noiseless projections y_noiseless by applying H_φ to the current volume estimate x. The projection operator P_θ in (5.2) is implemented using the ASTRA toolbox [168].
The precise modeling of the noise is a particularly challenging feat in single-
particle Cryo-EM. To produce noise realizations that are as realistic as possible,
we extract random background patches directly from the micrographs themselves,
at locations where particles do not appear. For consistency, the noise patch added
to a given noiseless projection is taken from the same micrograph that was used
to estimate the CTF parameters previously applied to that specific projection.
Additional details for this implementation are given in Section A.3.
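The simulator itself can be organized as in the following sketch: draw a pose, a shift, and a CTF from their priors, apply H_φ = C_c S_t P_θ to the current volume, and add a background patch. Every callable and prior below is a placeholder for the actual implementation (e.g., the ASTRA-based projector).

```python
import torch

def simulate_batch(x, n, project, shift, apply_ctf, noise_patches,
                   ctf_params, pose_prior, shift_prior):
    """Draw n simulated projections from p_x with the Cryo-EM physics model (sketch)."""
    sims = []
    for _ in range(n):
        theta = pose_prior()                             # Euler angles (theta1, theta2, theta3)
        t = shift_prior()                                # in-plane shift (t1, t2)
        idx = torch.randint(len(ctf_params), (1,)).item()
        y = project(x, theta)                            # P_theta
        y = shift(y, t)                                  # S_t
        y = apply_ctf(y, ctf_params[idx])                # C_c
        y = y + noise_patches[idx]                       # background patch from the same micrograph
        sims.append(y)
    return torch.stack(sims)
```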
[Figure: architecture of the discriminator network. The input of size H × W passes through a cascade of convolution + leaky-ReLU + max-pooling blocks producing feature maps of sizes C × H/2 × W/2, 2C × H/4 × W/4, 4C × H/8 × W/8, 8C × H/16 × W/16, 16C × H/32 × W/32, and 32C × H/64 × W/64, followed by two fully-connected layers with leaky ReLU (outputs of sizes 10 × 1 and 1 × 1).]
iii. the volume f is nonnegative everywhere and has a bounded support; and
where O₃ is the space of all orthogonal matrices A such that det A ∈ {−1, 1}. The invariance in (5.26) is true since
where A′ = R⁻¹A and the last equality follows from the right invariance of the Haar measure. We define G{F} = {Γ_A : A ∈ O₃} such that
$$(\Gamma_A f)(\cdot) = f(A^{-1}\cdot), \qquad \forall A \in O(3),\ f \in \mathcal{F}. \qquad (5.28)$$
Sketch of the Proof. Without loss of generality, we provide the sketch of the proof for the case when p_A is uniform. For the case when p_A is nonuniform, the argument remains the same provided that the measure associated with the nonuniform distribution p_A is absolutely continuous with respect to the Haar measure τ. This has been stated in [170]. Since we assume p_A to be bounded, this condition is satisfied. The only difference here with respect to the uniform distribution is that the orbits of f and g are more restricted than O(3).
The proof first uses [171, Proposition 7.8], which we restate here as Proposition 5.5.2.
Note that this proposition assumes that the angles of the projections are known. Although in our case the angles are unknown, we shall see that this proposition will be useful.
We now want to determine how different P_proj(·|f) and P_proj(·|g) are for any given f and g. For this, we use the equality between the total-variation distance and the minimal probability of disagreement over couplings, where TV is the total-variation distance and Π(P₁, P₂) is the set of all the joint distributions of (y₁, y₂) whose marginals are P₁ and P₂ [165]. In fact, E[1_{y₁≠y₂}] is equal to the probability of the event y₁ ≠ y₂. In our context, this translates into
The optimum is achieved at the extreme points, which are sparse joint distributions such that the variable y₂ is a function of y₁. For any arbitrary joint distribution (or coupling) of this form, the proof then assigns a measurable function h : SO(3) → SO(3) such that (y₁, y₂) = (P_A{f}, P_{h(A)}{g}) for A ∼ p_A.
We can then write that
The task now is to estimate Prob(y₁ ≠ y₂), where (y₁, y₂) = (P_A{f}, P_{h(A)}{g}) for A ∼ p_A.
The task now is to estimate Prob(y1 6= y2 ), where (y1 , y2 ) = (PA {f }, Ph(A) {g}) for
A ⇠ pA .
(Continuous h). When h is continuous, Proposition 5.5.2 implies that, if [f ] 6= [g],
then
(General h). When the function h is discontinuous, the proof uses Lusin’s theorem
to approximate h by a continuous function. Lusin’s theorem states that, for any
> 0, there exists an h such that h(A) = h (A), 8A 2 H and [SO(3)|H ] < .
This then leads to
In conclusion, for any arbitrary coupling, the event {PA {f } 6= Ph(A) {g}} has
probability 1 if [f ] 6= [g]. This implies that, when [f ] and [g] are not the same,
the total-variation distance between Pproj (·|f ) and Pproj (·|g) is 2. This ensures that
the two probability measures are mutually singular meaning that the intersection
of their support has zero measure. This concludes the proof.
Proof. Similarly to the previous proof, we show that the TV distance between P_proj,CTF(·|f) and P_proj,CTF(·|g) is 2 when [f] and [g] are distinct. For simplification, we assume that p_A is uniform. (When this is not the case, the proof essentially remains the same.) We need to show that Prob(y₁ ≠ y₂) = 1, where (y₁, y₂) is distributed according to any arbitrary coupling of P_proj,CTF(·|f) and P_proj,CTF(·|g). For an arbitrary coupling such that Prob(y₁ ≠ y₂) is minimum, we again assign h : (SO(3) × C) → (SO(3) × C) such that
$$P_{\mathrm{proj,CTF}}(\cdot|g) = \mu_{A,c}\big[\{(A, c) \in (SO(3)\times\mathcal{C}) : C_{h_1(A,c)} * P_{h_0(A,c)}\{g\} \in \cdot\}\big]. \qquad (5.38)$$
We now show that, for any h, the event {y₁ ≠ y₂} has probability 1.
(Continuous h). We first assume that h is continuous and use the same kind of technique as in the proof of [170, Theorem 3.1].
Since SO(3) is transitive, we can write that
which creates a partition of (SO(3) × C). These partitions are such that, for any m, there exists a k_m such that {A^m_{n+1} × C^m_{n+1}} ⊂ {A^{k_m}_n × C^{k_m}_n}. This means that there exist functions h^m_n such that
$$h^m_n = \arg\min_{(A',c')\in\{\bar{A}^m_n\times\bar{C}^m_n\}} \;\min_{(A,c)\in\{\bar{A}^m_n\times\bar{C}^m_n\}} \big\|P_A\{f\} - P_{A'}\{g\}\big\|, \qquad (5.41)$$
where Ā^m_n and C̄^m_n are the closures of A^m_n and C^m_n, respectively. The sequence h^m_n converges to h as n → ∞. We denote
We invoke Proposition 5.5.4, which gives that μ_{A,c}[K^m_n] = μ_{A,c}[(A^m_n × C^m_n)]. Therefore, μ_{A,c}[K] = μ_{A,c}[(SO(3) × C)] = 1. This means that, when h is continuous, the event {y₁ ≠ y₂} has probability 1 if [f] ≠ [g].
(General h). When h is discontinuous, we can invoke Lusin's theorem to claim the same, similarly to Theorem 5.5.1. This means that, for any h, if [f] ≠ [g], then the probability of the event {y₁ ≠ y₂} is 1. Therefore, the TV distance between P_proj,CTF(·|f) and P_proj,CTF(·|g) is 2, yielding that P_proj,CTF(·|f) ⊥ P_proj,CTF(·|g). This concludes the proof.
Proposition 5.5.4. Let f, g ∈ F, A′ ⊆ SO(3), C′ ⊆ C, Ψ ∈ SO(3), and
Let the assumptions from Theorem 5.5.5 hold. Then, if [f] ≠ [g], it holds that
$$\mu_{A,c}[K'] = \mu_{A,c}[(A' \times C')]. \qquad (5.46)$$
Proof. We show that μ_{A,c}[K′ᶜ] = 0, where (K′ᶜ ∪ K′) = (A′ × C′). We define the set S_A = {c ∈ C′ : ‖C_c * P_A{f} − C_{h₁(A,c)} * P_A{Ψg}‖ = 0}. We define S_{A″} = ∪_{A∈A″} S_A for any A″ ⊆ A′. We define
$$\mu_{A,c}[K'^{\mathrm{c}}] = \sum_{k=1}^{2} \mu_{A,c}\Big[\bigcup_{A\in A'_k}\big(\{A\}\times S_A\big)\Big], \qquad (5.49)$$
We define ze(Î) = {ω ∈ ℝ² : Î(ω) = 0}, ω_α = {(r cos α, r sin α) : r > 0}, and ze_α(Î) = ze(Î) ∩ ω_α. From (5.51), we can write that
$$ze(\hat{C}_{c}) \cup ze(\hat{I}_f) = ze(\hat{C}_{h_1(A,c)}) \cup ze(\hat{I}_g), \qquad \forall c \in S_A. \qquad (5.52)$$
$$ze_\alpha(\hat{C}_{c}) \cup \big(ze_\alpha(\hat{C}_{c}) \cap ze_\alpha(\hat{I}_f)\big) = \big(ze_\alpha(\hat{C}_{c}) \cap ze_\alpha(\hat{C}_{h_1(A,c)})\big) \cup \big(ze_\alpha(\hat{C}_{c}) \cap ze_\alpha(\hat{I}_g)\big),$$
$$ze_\alpha(\hat{C}_{c}) \cup \big(ze_\alpha(\hat{C}_{c}) \cap ze_\alpha(\hat{I}_f)\big) = ze_\alpha(\hat{C}_{c}) \cap ze_\alpha(\hat{I}_g) \qquad (5.53)$$
for all c ∈ S_A and α ∈ [0, π].
We can now write that
$$\bigcup_{c\in S_A}\Big(ze_\alpha(\hat{C}_{c}) \cup \big(ze_\alpha(\hat{C}_{c}) \cap ze_\alpha(\hat{I}_f)\big)\Big) = \bigcup_{c\in S_A}\big(ze_\alpha(\hat{C}_{c}) \cap ze_\alpha(\hat{I}_g)\big) \qquad (5.54)$$
for any α ∈ S_α. The set on the left-hand side of (5.54) has an uncountably infinite cardinality, since there are uncountably many c ∈ S_A and, for each c, the sets ze_α(Ĉ_c) are distinct. By contrast, the set on the right-hand side of (5.54) is countable for a given α ∈ S_α. Therefore, for any α ∈ S_α, the two sets have different cardinalities, which raises a contradiction. The only possible scenario in which (5.52) is true is when h₁(A, c) = c. Using (5.51), we infer that P_A{f} = P_A{Ψg}. Therefore, for any A ∈ A′₁, P_A{f} = P_A{Ψg}. However, μ[A′₁] = 0 since, if this were not true, then [f] = [g] by Proposition 5.5.2.
Now note that
For the first part, we proceed by noting that y = y_noiseless + n. Recall that the characteristic function of the probability measure associated with the sum of two independent random variables is the product of their characteristic functions. Mathematically,
$$\hat{P}_{\mathrm{noiseless}}(\cdot|f) = \frac{\hat{P}(\cdot|f)}{\hat{P}_{\mathbf{n}}}. \qquad (5.58)$$
From (5.58), it is easy to see that P(·|f₁) = P(·|f₂) ⟺ P_noiseless(·|f₁) = P_noiseless(·|f₂). This concludes the first part.
For the second part, we now invoke the result from Theorem 5.5.3. It states that, if f₂ ≠ G(f₁) for any G in the set of rotation and reflection operations, then the corresponding P_noiseless(·|f₁) and P_noiseless(·|f₂) are mutually singular. This means that the intersection of their supports has zero measure. Since we have P_noiseless(·|f₁) = P_noiseless(·|f₂), they are not mutually singular. This implies that f₂ = G(f₁) for some G in the set of rotation and reflection operations. This concludes the proof.
Cryo-EM Reconstruction
The main challenge in Cryo-EM reconstruction is that every particle has an un-
known pose in its micrograph—if the poses were known, maximum-likelihood (ML)
or maximum a posteriori (MAP) estimation of the volume could be performed by
solving a standard linear inverse problem, where robustness would result from the
large number of measurements which would counteract the low SNR of each mea-
surement. One approach is to attempt to estimate the unknown poses iteratively.
Pose estimation can be achieved with a variety of strategies, including the popular
projection-matching approach [172, 173]. Whatever the method used, pose estima-
tion is challenging because the SNR of individual projection images is extremely
low. It also requires the estimation of additional parameters and the projection of
the current reconstructed volume at a large number of poses and at every iteration
of the reconstruction pipeline; ultimately, this is very computationally demanding.
technique with theoretical properties that are at least as good—if not better—than
the ML ones. In fact, the Wasserstein distance is often easier to minimize than the
KL divergence (e.g., due to the smoothness of the former) [164].
5.7 Results
5.7.1 Results on Synthetic Data
We first assessed the viability and performance of CryoGAN on a synthetic dataset that consists of 41,000 β-galactosidase projections, designed to mimic the EMPIAR-10061 data [158] in terms of noise level and CTF parameters. To create this dataset, we generated a 2.5 Å-resolution density map from the PDB entry (5a1a) of the protein and applied the forward model described in Online Methods to obtain projections modulated by CTF effects and corrupted by noise. We then randomly divided
this dataset in two and applied the CryoGAN algorithm separately on both halves
to generate half-maps. In the context of this experiment, we refer to these synthetic
projections as “real,” in contrast to the projections coming from CryoGAN, which
we term “simulated.” The details of the experimental setup are given in Appendix
A.3.
We ran the CryoGAN algorithm for 400 minutes on an NVIDIA V100 GPU and
obtained a reconstruction with a resolution of 8.64 Å (Figure 5.3a). Starting from
a zero-valued volume, CryoGAN progressively updates the 3D structure so that
its simulated projections (Figure 5.3b) reach a distribution that matches that of
the real projections. These gradual updates are at the core of the deep adversarial
learning scheme of CryoGAN. At each iteration of the algorithm, the gradients
from the discriminator carry information about the current difference between the
real projections and the simulated projections. These gradients are used by the
Cryo-EM physics simulator to update the volume so as to improve the fidelity
of the simulated projections. Hence, at the end of its run, the volume learned by
CryoGAN has simulated projections (Figure 5.3.c, Rows 1-3) that are similar to the
real projections (Figure 5.3.c, Row 4) in a distributional sense. The evolution of the
Fourier-shell correlation (FSC) between the reconstructed half-maps (Figure 5.3.d)
testifies to the progressive increase in resolution that derives from this adversarial
learning scheme.
real dataset. Higher-resolution details are thus progressively introduced in the es-
timated volume throughout the run, as illustrated by the evolution of the FSC
curves between the reconstructed half-maps (Figure 5.5d). This resulted in a 12.08
Å β-galactosidase structure whose simulated projections closely resemble the real
ones (Figure 5.5c).
5.8 Summary
In this chapter, we present CryoGAN, a new paradigm for single-particle Cryo-
EM reconstruction based on unsupervised deep adversarial learning. The major
challenge in single-particle Cryo-EM is that the imaged particles have unknown
poses. Current reconstruction techniques are based on a marginalized maximum-
likelihood formulation that requires calculations over the set of all possible poses
for each projection image, a computationally demanding procedure. CryoGAN
sidesteps this problem by using a generative adversarial network (GAN) to learn the
3D structure that has simulated projections that most closely match the real data
in a distributional sense. The architecture of CryoGAN resembles that of a standard
GAN, with the twist that the generator network is replaced by a model of the
Cryo-EM image acquisition process. CryoGAN is an unsupervised algorithm that
only demands projection images and an estimate of the contrast transfer function
parameters. No initial volume estimate or prior training is needed. Moreover,
CryoGAN requires minimal user interaction and can provide reconstructions in a
matter of hours on a high-end GPU. In addition, we provide sound mathematical
guarantees on the recovery of the correct structure. CryoGAN currently achieves
an 8.6 Å resolution on a realistic synthetic dataset. Preliminary results on real
β-galactosidase data demonstrate CryoGAN's ability to exploit data statistics under
standard experimental imaging conditions. We believe that this paradigm opens
the door to a family of novel likelihood-free algorithms for Cryo-EM reconstruction.
Chapter 6
Reconstructing Continuous Conformations in CryoEM using GANs
In the previous chapters, we have only discussed inverse problems in which at
least the signal or the forward model is deterministic. In this chapter, we deal
with the challenging case of heterogeneous Cryo-EM, in which the 3D structure
exhibits an unknown conformational variability while being imaged by the stochastic
forward model. In order to solve this problem, we devise a third-generation
algorithm based on an extension of the distributional perspective proposed in
CryoGAN.
6.1 Overview
The determination of the structure of nonrigid macromolecules is an important
aspect of structural biology and is fundamental in our understanding of biological
mechanisms and in drug discovery [156]. Among other popular techniques such
as X-ray crystallography and nuclear magnetic resonance spectroscopy, Cryo-EM
1 This chapter uses content from our work [194].
whose randomly projected 2D Cryo-EM images match the acquired data in a distri-
butional sense (Figure 6.2(b)). Due to this likelihood-free characteristic, CryoGAN
does not require any additional processing step such as pose estimation and can
be deployed directly on the Cryo-EM measurements. This greatly simplifies the
reconstruction procedure. However, its application is limited to the reconstruction
of a single conformation.
In this work, we combine the advantages of CryoDRGN and CryoGAN. We
propose an unsupervised deep-learning-based method, called Multi-CryoGAN, that
can reconstruct continuously varying conformations of a molecule in a truly standalone
and likelihood-free manner. Using a convolutional neural network (CNN), it
directly learns a mapping from a latent space to the distribution of 3D conformations.
Unlike current methods, it requires no pose or conformation estimation for each
projection, yet it has the capacity to reconstruct low-dimensional but complicated
conformation manifolds [198].
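To make the idea of such a latent-to-conformation mapping concrete, a minimal PyTorch sketch of a CNN generator is shown below. The layer sizes, the latent dimension, and the nonlinearities are illustrative assumptions, not the architecture used in this work (the actual architecture is deferred to Appendix A.4).

# Minimal sketch of a latent-to-volume generator: a CNN G maps a latent
# variable z to a 32x32x32 conformation, so that sweeping z traces out a
# conformation manifold. Layer sizes and the latent dimension are assumptions.
import torch
import torch.nn as nn

class ConformationGenerator(nn.Module):
    def __init__(self, latent_dim=8):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 4 * 4 * 4)
        self.net = nn.Sequential(
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),  # 8^3
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1),   # 16^3
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, 1, kernel_size=4, stride=2, padding=1),    # 32^3
            nn.Softplus(),  # keeps the generated density nonnegative
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 128, 4, 4, 4)
        return self.net(x)

# Sampling a batch of conformations from latent variables:
G = ConformationGenerator()
z = torch.rand(16, 8)          # latent variables on a low-dimensional set
volumes = G(z)                 # shape (16, 1, 32, 32, 32)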
Using synthetic Cryo-EM data as our benchmark, we show that our method
can reconstruct the conformation manifold for both continuous and discrete con-
formation distributions. In the discrete case, it also reconstructs the corresponding
probabilities. To the best of our knowledge, this is the first standalone method that
can reconstruct the whole manifold of biomolecule conformations.
Each measurement $\mathbf{y}_q \in \mathbb{R}^{N \times N}$ is given by
$$\mathbf{y}_q = H_{\boldsymbol{\varphi}_q}\{\mathbf{x}_q\} + \mathbf{n}_q,$$
where
• $\mathbf{x}_q \in \mathbb{R}^{N \times N \times N}$ is a separate instance of the 3D molecular structure;
• $\mathbf{n}_q \in \mathbb{R}^{N \times N}$ is the noise;
• $H_{\boldsymbol{\varphi}_q}$ is the measurement operator which depends on the imaging parameters $\boldsymbol{\varphi}_q = (\boldsymbol{\theta}_q, \mathbf{t}_q, \mathbf{c}_q) \in \mathbb{R}^8$ and involves three operations.
  – The term $P_{\boldsymbol{\theta}_q}\{\mathbf{x}_q\}$ is the tomographic projection of $\mathbf{x}_q$ rotated by $\boldsymbol{\theta}_q = (\theta_1^q, \theta_2^q, \theta_3^q)$.
  – The operator $S_{\mathbf{t}_q}$ shifts the projected image by $\mathbf{t}_q = (t_1^q, t_2^q)$. This shift arises from off-centered particle picking.
  – The Fourier transform of the resulting image is then modulated by the CTF $\hat{C}_{\mathbf{c}_q}$ with defocus parameters $\mathbf{c}_q = (d_1^q, d_2^q, \alpha_{\mathrm{ast}}^q)$ and thereafter subjected to an inverse Fourier transform.
For more details, please see Section on Image Formation in the previous chapter.
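As an illustration, a minimal NumPy/SciPy sketch of this measurement operator is given below. The helper names, the simplified radially symmetric CTF (no astigmatism or envelope), and the default optical constants are assumptions for the purpose of the sketch, not the exact implementation used in this work.

# Minimal sketch of the measurement operator H_phi (rotation + projection,
# in-plane shift, CTF modulation, additive noise). The simplified CTF and the
# default constants are illustrative assumptions.
import numpy as np
from scipy.ndimage import rotate, shift as nd_shift

def project(volume, euler_deg):
    """Rotate the 3D volume by three Euler angles (degrees) and sum along one axis."""
    rot = rotate(volume, euler_deg[0], axes=(0, 1), reshape=False)
    rot = rotate(rot, euler_deg[1], axes=(0, 2), reshape=False)
    rot = rotate(rot, euler_deg[2], axes=(0, 1), reshape=False)
    return rot.sum(axis=0)                      # tomographic projection P_theta

def apply_ctf(image, defocus_um, pixel_size_A, wavelength_A=0.025, cs_mm=2.7):
    """Modulate the image by a simplified, radially symmetric CTF in Fourier space."""
    n = image.shape[0]
    freq = np.fft.fftfreq(n, d=pixel_size_A)    # spatial frequencies (1/Angstrom)
    fx, fy = np.meshgrid(freq, freq, indexing="ij")
    k2 = fx**2 + fy**2
    chi = np.pi * wavelength_A * (defocus_um * 1e4) * k2 \
          - 0.5 * np.pi * (cs_mm * 1e7) * wavelength_A**3 * k2**2
    ctf = -np.sin(chi)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * ctf))

def measure(volume, euler_deg, t_shift, defocus_um, pixel_size_A, noise_std):
    """One synthetic particle image: CTF{ S_t P_theta{x} } + noise."""
    proj = project(volume, euler_deg)
    proj = nd_shift(proj, t_shift)              # off-centered particle picking S_t
    proj = apply_ctf(proj, defocus_um, pixel_size_A)
    return proj + noise_std * np.random.randn(*proj.shape)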
The challenge of Cryo-EM is that, for each measurement $\mathbf{y}_q$, the structure $\mathbf{x}_q$ and
the imaging parameters $(\boldsymbol{\theta}_q, \mathbf{t}_q)$ are unknown, the CTF is a band-pass filter with
multiple radial zero frequencies that incur an irretrievable loss of information, and the
energy of the noise is many times (roughly 10 to 100 times) that of the signal, which
corresponds to SNRs of -10 to -20 dB. In the homogeneous case (single conformation),
all $\mathbf{x}_q$ are identical. In the heterogeneous case (multiple conformations),
each $\mathbf{x}_q$ represents a different conformation of the same biomolecule.
Stochastic Modeling. We denote by $p_{\mathrm{conf}}(\mathbf{x})$ the probability distribution over the
conformation landscape, from which a conformation $\mathbf{x}_q$ is assumed to be sampled.
We assume that the imaging parameters and the noise are sampled from
known distributions $p_{\boldsymbol{\varphi}} = p_{\boldsymbol{\theta}}\, p_{\mathbf{t}}\, p_{\mathbf{c}}$ and $p_{\mathbf{n}}$, respectively. For a given conformation
distribution $p_{\mathrm{conf}}(\mathbf{x})$, this stochastic forward model induces a distribution over the
measurements, which we denote by $p(\mathbf{y})$. We denote by $p^{\mathrm{data}}_{\mathrm{conf}}(\mathbf{x})$ the true conformation
distribution from which the data distribution $p^{\mathrm{data}}(\mathbf{y})$ is acquired, such that
$\{\mathbf{y}_1^{\mathrm{data}}, \ldots, \mathbf{y}_Q^{\mathrm{data}}\} \sim p^{\mathrm{data}}(\mathbf{y})$. The distribution $p^{\mathrm{data}}_{\mathrm{conf}}(\mathbf{x})$ is unknown and needs to
be recovered.
The classical methods are likelihood-based and rely on the estimation of the imaging
parameters $(\boldsymbol{\theta}_q, \mathbf{t}_q)$ (or of a distribution over them) and of the conformation class for
each measurement image $\mathbf{y}_q$. This information is then utilized to reconstruct the
multiple discrete conformations. Our method, in contrast, is built upon the insight
that, to recover $p^{\mathrm{data}}_{\mathrm{conf}}(\mathbf{x})$, it is sufficient to find a $p^{\mathrm{gen}}_{\mathrm{conf}}(\mathbf{x})$ whose corresponding
measurement distribution $p^{\mathrm{gen}}(\mathbf{y})$ is equal to $p^{\mathrm{data}}(\mathbf{y})$ (see Theorem 6.5.1). This
does away with the estimation of poses (or of distributions over the poses) and with
conformation clustering for each measurement.
6.3.2 CryoGAN
Our scheme is an extension of the CryoGAN [28] method, which is applicable only
to the homogeneous case $p^{\mathrm{data}}_{\mathrm{conf}}(\mathbf{x}) = \delta(\mathbf{x} - \mathbf{x}^{\mathrm{data}})$, where $\mathbf{x}^{\mathrm{data}}$ is the true 3D
structure. CryoGAN tackles the challenge by casting the reconstruction problem
as a distribution-matching problem (Figure 6.2(b)). More specifically, it learns
to reconstruct the 3D volume $\mathbf{x}^{*}$ whose simulated projection set (measurement
distribution) is most similar to the real projection data in a distributional sense,
such that
$$\mathbf{x}^{*} = \arg\min_{\mathbf{x}} \mathrm{WD}\big(p^{\mathrm{data}}(\mathbf{y})\,\|\,p^{\mathrm{gen}}(\mathbf{y};\mathbf{x})\big). \qquad (6.2)$$
Here, $p^{\mathrm{gen}}(\mathbf{y};\mathbf{x})$ is the distribution generated from the Cryo-EM physics simulator
and WD refers to the Wasserstein distance [165]. This goal is achieved by solving
the min-max optimization problem:
$$\mathbf{x}^{*} = \arg\min_{\mathbf{x}} \underbrace{\max_{D:\,\|D\|_{L}\leq 1}\Big(\mathbb{E}_{\mathbf{y}\sim p^{\mathrm{data}}(\mathbf{y})}[D(\mathbf{y})] - \mathbb{E}_{\mathbf{y}\sim p^{\mathrm{gen}}(\mathbf{y};\mathbf{x})}[D(\mathbf{y})]\Big)}_{\mathrm{WD}(p^{\mathrm{data}}(\mathbf{y})\,\|\,p^{\mathrm{gen}}(\mathbf{y};\mathbf{x}))}. \qquad (6.3)$$
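A minimal PyTorch sketch of one step of this min-max optimization is given below. The placeholder `cryoem_simulator` stands for a differentiable physics forward model; its name and signature are assumptions, and the Lipschitz constraint on the critic is assumed to be enforced elsewhere (e.g., with a gradient penalty).

# Sketch of the min-max objective (6.3): the critic D is trained to separate real
# from simulated projections, and the critic gap is minimized with respect to the
# volume through a differentiable simulator (placeholder name).
import torch

def critic_gap(D, y_real, y_sim):
    """Estimate E_real[D(y)] - E_sim[D(y)], the Wasserstein surrogate in (6.3)."""
    return D(y_real).mean() - D(y_sim).mean()

def training_step(D, volume, y_real, cryoem_simulator, opt_D, opt_vol):
    # Critic ascent step (maximize the gap over D).
    y_sim = cryoem_simulator(volume, batch_size=y_real.shape[0])
    loss_D = -critic_gap(D, y_real, y_sim.detach())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Volume descent step (minimize the gap over x): gradients flow from D
    # through the simulator into the voxel values of the reconstruction.
    y_sim = cryoem_simulator(volume, batch_size=y_real.shape[0])
    loss_vol = critic_gap(D, y_real, y_sim)
    opt_vol.zero_grad(); loss_vol.backward(); opt_vol.step()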
6.4 Method
data but does not recover the underlying conformation landscape. Our proposed
scheme includes the physics of Cryo-EM, which ties pgen (y) with the conformation
landscape pconf (x) and is thus able to recover it (See Theorem 6.5.1 in Section 6.5).
$$\mathbf{y} = H_{\boldsymbol{\varphi}}\{f\} + \mathbf{n}, \qquad (6.6)$$
where $f \in L_2(\mathbb{R}^3)$ is some conformation of the biomolecule sampled from the probability
measure $\tilde{P}_{\mathrm{conf}}$ on $L_2(\mathbb{R}^3)$, the imaging parameters $\boldsymbol{\varphi}$ are sampled from $p_{\boldsymbol{\varphi}}$,
and $\mathbf{n}$ is sampled from the noise probability measure $P_n$ on $L_2(\mathbb{R}^2)$.
We define $[f] := \{r_A\{f\} : r_A \in O\}$ as the set of all the rotated-reflected
versions of $f$, where $O$ is the set of all rotation-reflection operators over $L_2(\mathbb{R}^3)$.
We define the quotient space of shapes $L_2(\mathbb{R}^3)/O$. For
any $\tilde{P}_{\mathrm{conf}}$ defined over $L_2(\mathbb{R}^3)$, an equivalent $P_{\mathrm{conf}}$ exists over $L_2(\mathbb{R}^3)/O$. Since we
are interested only in the shape of the conformations of the biomolecule, we will only
focus on recovering $P_{\mathrm{conf}}$. We denote by $\mu$ the probability measure on $B \subset \mathbb{R}^8$
associated with the density function $p_{\boldsymbol{\varphi}}$. Together, these induce a
probability measure $P_{\mathrm{clean}}$ on the space $L_2(\mathbb{R}^2) = \{f : \mathbb{R}^2 \to \mathbb{R} \ \text{s.t.} \ \|f\|_{L_2} < \infty\}$
through the forward operator. It is given by $P_{\mathrm{clean}}[A] = (P_{\mathrm{conf}} \times \mu)\big[([f], \boldsymbol{\varphi}) \in
\big(L_2(\mathbb{R}^3)/O \times B\big) : H_{\boldsymbol{\varphi}} f \in A\big]$ for any Borel set $A \subset L_2(\mathbb{R}^2)$. We denote by $P_{\mathrm{meas}}$
the probability measure of the noisy measurements.
Theorem 6.5.1. Let $P^{\mathrm{data}}_{\mathrm{conf}}$ and $P^{\mathrm{gen}}_{\mathrm{conf}}$ be the true and the reconstructed conformation
probability measures on the quotient space of 3D structures $L_2(\mathbb{R}^3)/O$, respectively.
We assume that they are atomic and that they are supported only on nonnegative-valued
shapes. Let $P^{\mathrm{data}}_{\mathrm{meas}}$ and $P^{\mathrm{gen}}_{\mathrm{meas}}$ be the probability measures of the noisy Cryo-EM
measurements obtained from $P^{\mathrm{data}}_{\mathrm{conf}}$ and $P^{\mathrm{gen}}_{\mathrm{conf}}$, respectively.
Make the following physical assumptions:
i. the noise probability measure $P_n$ is such that its characteristic functional vanishes
nowhere in its domain and that its sample $\mathbf{n}$ is pointwise-defined everywhere;
iii. for any two $\mathbf{c}_1, \mathbf{c}_2 \sim p_{\mathbf{c}}$, $\mathbf{c}_1 \neq \mathbf{c}_2$, the CTFs $\hat{C}_{\mathbf{c}_1}$ and $\hat{C}_{\mathbf{c}_2}$ share no common
zero frequencies.
$$\hat{P}^{\mathrm{data}}_{\mathrm{meas}} = \hat{P}^{\mathrm{data}}_{\mathrm{clean}}\, \hat{P}_n,$$
$$\hat{P}^{\mathrm{gen}}_{\mathrm{meas}} = \hat{P}^{\mathrm{gen}}_{\mathrm{clean}}\, \hat{P}_n. \qquad (6.8)$$
From the assumption that $\hat{P}_n$ is nonzero everywhere, we deduce that $\hat{P}^{\mathrm{data}}_{\mathrm{clean}} = \hat{P}^{\mathrm{gen}}_{\mathrm{clean}}$.
This proves the first step.
To prove the next step, we invoke Theorem 4 in [28], which states that any
two probability measures $P^{1}_{\mathrm{clean}}$ and $P^{2}_{\mathrm{clean}}$ that correspond to Dirac probability
measures on two distinct shapes are mutually singular. Consider the set
$S_1 = \mathrm{Supp}\{P^{\mathrm{gen}}_{\mathrm{conf}}\} \cap \mathrm{Supp}\{P^{\mathrm{data}}_{\mathrm{conf}}\}^{C}$. For any $[f] \in S_1$, we denote by $P^{f}_{\mathrm{clean}}$ its noiseless probability
measure. Since $f \in S_1$, it is distinct from any constituent Dirac measure in $P^{\mathrm{data}}_{\mathrm{conf}}$.
Therefore, by using [28, Theorem 4], $P^{f}_{\mathrm{clean}}$ is mutually singular to each of the
constituent mutually singular measures of $P^{\mathrm{data}}_{\mathrm{clean}}$, implying that $P^{f}_{\mathrm{clean}} \perp P^{\mathrm{data}}_{\mathrm{clean}}$.
From $\mathrm{Supp}\{P^{f}_{\mathrm{clean}}\} \subset \mathrm{Supp}\{P^{\mathrm{gen}}_{\mathrm{clean}}\}$, it follows that $P^{\mathrm{data}}_{\mathrm{clean}} \neq P^{\mathrm{gen}}_{\mathrm{clean}}$, which raises
a contradiction. Therefore, the set $S_1$ is empty. The same can be proved for the
set $S_2 = \mathrm{Supp}\{P^{\mathrm{gen}}_{\mathrm{conf}}\}^{C} \cap \mathrm{Supp}\{P^{\mathrm{data}}_{\mathrm{conf}}\}$. Therefore, $\mathrm{Supp}\{P^{\mathrm{data}}_{\mathrm{conf}}\} = \mathrm{Supp}\{P^{\mathrm{gen}}_{\mathrm{conf}}\}$,
which means that the locations of their constituent Dirac measures are the same.
To maintain $P^{\mathrm{data}}_{\mathrm{clean}} = P^{\mathrm{gen}}_{\mathrm{clean}}$, the weights of their constituent Dirac measures have to
be the same, too. This concludes the proof.
In essence, Theorem 6.5.1 states that a reconstructed manifold of conformations
recovers the true conformations if its measurements match the acquired data in a
distributional sense. Though the result assumes the true conformation landscape
to be discrete (an atomic measure), it holds for an infinite number of discrete conformations
which can be arbitrarily close to each other and is thus relevant
to continuously varying conformations. We leave the proof of the latter case to
future work.
Here, Uniform(a, b) denotes the uniform distribution between a and b, and Bernoulli(p) denotes the Bernoulli
distribution with parameter p. A conformation is generated with (32 × 32 × 32)
voxels, where the size of each voxel is 5 Å. A 2D projection of this conformation
with random orientation is obtained ((32 × 32) image, Figure 6.4b). The orientation
is sampled from a uniform distribution over SO(3). Then, the CTF is applied to
this projection image with a defocus uniformly sampled between [1.0 µm, 2.0 µm],
assuming that the horizontal and vertical defocus values are the same and there
is no astigmatism. Translations/shifts are disabled in these experiments. Finally,
Gaussian noise was added to the CTF-modulated images, resulting in an SNR of
approximately -10 dB.
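For illustration, a small NumPy helper that corrupts a clean projection with Gaussian noise at a prescribed signal-to-noise energy ratio (here -10 dB) could look as follows; the function name is an assumption.

# Sketch: add Gaussian noise so that the energy ratio of signal to noise matches
# a target SNR (e.g., -10 dB, i.e., noise energy ten times the signal energy).
import numpy as np

def add_noise_at_snr(clean_image, snr_db, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    signal_energy = np.sum(clean_image ** 2)
    snr_linear = 10.0 ** (snr_db / 10.0)            # -10 dB -> 0.1
    noise = rng.standard_normal(clean_image.shape)
    noise_energy = np.sum(noise ** 2)
    # Scale the noise so that signal_energy / noise_energy equals snr_linear.
    scale = np.sqrt(signal_energy / (snr_linear * noise_energy))
    return clean_image + scale * noise

noisy = add_noise_at_snr(np.ones((32, 32)), snr_db=-10.0)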
Implementation Details. The reconstruction of the conformations is done by
solving (6.5) using Algorithm 6. For both continuous and discrete conformations,
we use the same latent distribution $p_z \sim \mathrm{Uniform}(\mathbf{z}_0, \mathbf{z}_1)$, where $\mathbf{z}_0, \mathbf{z}_1 \in \mathbb{R}^{32\times 32\times 32}$ are
randomly chosen from Uniform(0, 0.025) and fixed throughout the process. Thus,
we do not impose any prior knowledge of whether the landscape is continuous or
discrete. As we shall see later, this latent distribution is sufficiently rich to represent
the variation of interest in the synthetic datasets. The architectures of D and G and the
training details are provided in Appendix A.4.
Optimization Details. The models are trained end-to-end on the synthetic datasets
with the usual WGAN loss (gradient-penalty weight of 0.6) on a TITAN X
GPU. In all experiments, G, D, and the noise parameters are optimized using
three separate Adam optimizers with a learning rate of $10^{-3}$ and gradient-norm clipping
values of 1, $10^3$, and 1, respectively. Between each generator step, there are $n_{\mathrm{disc}} = 5$
discriminator steps. The batch size is kept at 16 samples and the algorithm is run
for 30 epochs, which was sufficient for convergence.
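A sketch of this optimization loop in PyTorch is given below. The placeholder names (`G`, `D`, `noise_params`, `simulator`, `loader`) and the exact penalty implementation are assumptions; `noise_params` is assumed to be a list of tensors.

# Sketch of the training loop: WGAN loss with gradient penalty, n_disc = 5 critic
# steps per generator step, separate Adam optimizers, and gradient-norm clipping.
import torch

def gradient_penalty(D, y_real, y_fake, weight=0.6):
    eps = torch.rand(y_real.size(0), 1, 1, 1, device=y_real.device)
    y_int = eps * y_real + (1.0 - eps) * y_fake          # random interpolates
    y_int.requires_grad_(True)
    grads = torch.autograd.grad(D(y_int).sum(), y_int, create_graph=True)[0]
    return weight * ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def train(G, D, noise_params, simulator, loader, epochs=30, n_disc=5):
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
    opt_n = torch.optim.Adam(noise_params, lr=1e-3)
    for _ in range(epochs):
        for i, y_real in enumerate(loader):
            y_fake = simulator(G, noise_params, batch_size=y_real.size(0))
            loss_D = D(y_fake.detach()).mean() - D(y_real).mean() \
                     + gradient_penalty(D, y_real, y_fake.detach())
            opt_D.zero_grad(); loss_D.backward()
            torch.nn.utils.clip_grad_norm_(D.parameters(), 1e3)
            opt_D.step()
            if (i + 1) % n_disc == 0:                    # one generator step
                y_fake = simulator(G, noise_params, batch_size=y_real.size(0))
                loss_G = -D(y_fake).mean()
                opt_G.zero_grad(); opt_n.zero_grad(); loss_G.backward()
                torch.nn.utils.clip_grad_norm_(G.parameters(), 1.0)
                torch.nn.utils.clip_grad_norm_(noise_params, 1.0)
                opt_G.step(); opt_n.step()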
Metric. We deploy two metrics based on the Fourier-shell correlation (FSC). The
FSC between two structures $\mathbf{x}_1$ and $\mathbf{x}_2$ is given by
$$\mathrm{FSC}(\omega, \mathbf{x}_1, \mathbf{x}_2) = \frac{\langle V^{\omega}_{\hat{\mathbf{x}}_1}, V^{\omega}_{\hat{\mathbf{x}}_2}\rangle}{\|V^{\omega}_{\hat{\mathbf{x}}_1}\|\,\|V^{\omega}_{\hat{\mathbf{x}}_2}\|}, \qquad (6.9)$$
where $V^{\omega}_{\hat{\mathbf{x}}}$ denotes the Fourier coefficients of $\mathbf{x}$ on the shell of radius $\omega$. The second
metric is a matrix $\mathbf{M}$ whose entries are given by
$$\mathbf{M}[m, n] = \mathrm{AreaFSC}(\mathbf{x}_m, \mathbf{x}_n) = \int_{0 < \omega \leq \omega_c} \mathrm{FSC}(\omega, \mathbf{x}_m, \mathbf{x}_n)\, \mathrm{d}\omega. \qquad (6.10)$$
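A minimal NumPy sketch of (6.9)-(6.10) is shown below. Binning the Fourier shells by integer frequency radius is an implementation assumption.

# Sketch of (6.9)-(6.10): FSC over spherical Fourier shells and its integral
# (area) up to a cutoff frequency.
import numpy as np

def fsc(x1, x2):
    n = x1.shape[0]
    X1, X2 = np.fft.fftn(x1), np.fft.fftn(x2)
    freq = np.fft.fftfreq(n) * n                         # integer frequency grid
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    radius = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    shells = np.arange(1, n // 2 + 1)
    curve = np.zeros(len(shells))
    for i, r in enumerate(shells):
        mask = radius == r                               # Fourier shell of radius r
        num = np.real(np.vdot(X1[mask], X2[mask]))
        den = np.linalg.norm(X1[mask]) * np.linalg.norm(X2[mask])
        curve[i] = num / den if den > 0 else 0.0
    return shells, curve

def area_fsc(x1, x2, omega_c=None):
    shells, curve = fsc(x1, x2)
    if omega_c is not None:
        curve = curve[shells <= omega_c]
    return np.trapz(curve)                               # integral of FSC up to omega_c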
any transition conformations. The structural quality of these two recovered configurations
with respect to the corresponding ground truths is given in Figure 6.6(b).
Their FSCs show that at least half of the Nyquist resolution is achieved (Figure
6.6(c)).
The FSCCC reconstruction matrix (Figure 6.5(b)) confirms that the reconstructed
structures cluster into two main conformations. We use it to determine
the probabilities of these clusters/conformations. We determine the probability of
a conformation using the first and last rows of the matrix (similarity of the conformations with
respect to the two extreme conformations). We consider that a conformation $\mathbf{x}_n$ belongs
to the first cluster/conformation if $\mathbf{M}[0, n] > 0.5$ and $\mathbf{M}[20, n] < 0.5$. If the
case is reversed ($\mathbf{M}[0, n] < 0.5$ and $\mathbf{M}[20, n] > 0.5$), then it belongs to the second
cluster/conformation.
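A short sketch of this assignment rule is given below; the indices 0 and 20 refer to the first and last reconstructed conformations, and the matrix is assumed to be normalized so that the 0.5 threshold is meaningful.

# Sketch of the cluster-assignment rule: a reconstruction n is assigned to the
# first conformation if M[0, n] > 0.5 and M[20, n] < 0.5, and to the second one
# in the reversed case; the probabilities are the fractions of assignments.
import numpy as np

def cluster_probabilities(M, first=0, last=20):
    n_total = M.shape[1]
    in_first = (M[first] > 0.5) & (M[last] < 0.5)
    in_second = (M[first] < 0.5) & (M[last] > 0.5)
    return in_first.sum() / n_total, in_second.sum() / n_total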
6.7 Summary
In this chapter, we propose a deep-learning-based reconstruction method for cryo-
electron microscopy (Cryo-EM) that can model multiple conformations of a nonrigid
biomolecule in a standalone manner. Cryo-EM produces many noisy projections
from separate instances of the same but randomly oriented biomolecule. Current
methods rely on pose and conformation estimation, which is inefficient for the
reconstruction of continuous conformations that carry valuable information. We
introduce Multi-CryoGAN, which sidesteps this additional processing by casting the
volume reconstruction as a distribution-matching problem. By introducing a
manifold-mapping module, Multi-CryoGAN can learn continuous structural heterogeneity
without pose estimation or clustering. We also give a theoretical guarantee
of recovery of the true conformations. Our method can successfully reconstruct 3D
protein complexes on synthetic 2D Cryo-EM datasets for both continuous and dis-
crete structural variability scenarios. To the best of our knowledge, Multi-CryoGAN
is the first model that can reconstruct continuous conformations of a biomolecule
from Cryo-EM images in a fully unsupervised and standalone manner.
• compute $\mathbf{y}^{\mathrm{int}}_b = \epsilon_b\, \mathbf{y}^{\mathrm{batch}}_b + (1 - \epsilon_b)\, \mathbf{y}^{\mathrm{gen}}_b$ for all $b \in \{1, \ldots, B\}$.
• update the discriminator D by gradient ascent on the loss (6.5) complemented with the gradient penalty term from [167].
end for
• sample generated data: $\{\mathbf{y}^{\mathrm{gen}}_1, \ldots, \mathbf{y}^{\mathrm{gen}}_B\} \sim p^{\mathrm{gen}}(\mathbf{y}; G)$ (see Algorithm 1).
end for
return $G^{*}$
Chapter 7
Conclusion and Outlook
Future Work
We expect that similar results exist in higher dimensions since the theory can be
generalized. However, the computations can be expected to be challenging for
signals defined over $\mathbb{R}^d$ with $d > 1$, for example, when considering images rather
than signals. Our scheme to solve the gTV-regularized 1D inverse problems has
been made more efficient in [205]. This formulation can be useful for cases where
a function needs to be optimized under a constraint on its sparsity. For example,
in [206] this numerical scheme has been used to learn the activations of a neural network
in order to increase its capacity while maintaining its stability.
Future Work
The proposed framework is generic and can be used to solve a variety of inverse
problems, including superresolution, deconvolution, accelerated MRI, etc. This can
bring more robustness and reliability to direct deep-learning-based techniques.
In fact, this method has contributed fundamentally to the emergence of the family
of such deep-learning-based iterative algorithms [21–25].
Our time-dependent deep-image-prior method is the first deep-learning method to reconstruct dynamic MRI without the requirement
of additional data or training. This framework fits well the learning of spatio-temporal
manifolds that are temporally smooth; it is purely unsupervised. It is also
particularly appropriate in the context of inverse problems where no ground truth
is available. In practice, it results in significant memory savings when compared to
compressed-sensing (CS) approaches. Our study shows that the proposed method
has the potential to reconstruct dynamic magnetic resonance images (MRI) in the
absence of an electrocardiogram signal.
Future Work
The method results in state-of-the-art reconstructions and could be utilized for other
similar temporal inverse problems. Recently, the research community has started
attempts to theoretically understand the deep image prior [207–209]. We believe that
this continued effort could bring perspectives that further improve the
proposed method.
The current bottleneck of our method is the slow forward model, the NuFFT. Currently,
the NuFFT package1 is optimized neither for Python nor for GPU usage,
which is a major reason for the slowdown of our implementation. With an efficient
implementation, we could speed up the algorithm severalfold.
1 https://github.com/marchdf/python-nufft
In the future, we could explore more architectures to further improve the performance.
Because the architecture exploration in this work is not exhaustive,
some architectural variations may bring more improvement over
the currently reported results. For example, we could explore advanced architectures
or good initialization techniques for the network parameters. Similar to [210], which
proposed a progressive learning strategy, we could also progressively increase
the spatial resolution of the network and of the forward model to achieve better reconstruction.
This would not only bring faster convergence but could also result
in improved performance and stability. Similar improvements could be achieved by
applying this strategy in the temporal dimension. This could be achieved, for example,
by initializing with a high amount of spoke-sharing and decreasing it over the course
of the optimization. This could be done in parallel with a similar aggregating strategy
in the latent-variable domain.
In addition, instead of finding an entire image, we could let the network focus on
finding the residual with respect to the static gold-standard image. This facilitates
finding the dynamics of the dataset, which can bring a better reconstruction [211]. This could
result in faster convergence to the solution. The convergence could be further
accelerated by employing temporal and spatial multi-resolution approaches for this
routine.
Currently, our framework optimizes the network in the measurement space
from the beginning. Instead, as a warm start, we could first optimize the network
in the image space. Currently, the network is initialized randomly and thus outputs
a random image series; instead, it could be initialized to output a more coherent
image series. For example, the network could first be optimized directly to output
the precomputed backprojection images $\mathbf{H}_k^{\mathsf T}\mathbf{y}_k$ or to fit the static gold-standard
image. This could result in faster convergence to the solution, which could be
further accelerated by employing temporal and spatial multi-resolution approaches
for this routine. Moreover, since this initial routine would not require the repetitive
use of the forward model, it could potentially even speed up the reconstruction.
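A minimal PyTorch sketch of this warm start is given below; `net`, `latents`, and `backprojections` are placeholders for the generator network, the latent codes, and the precomputed images $\mathbf{H}_k^{\mathsf T}\mathbf{y}_k$.

# Sketch of the warm start: fit the network output to precomputed backprojection
# images in the image domain, so the NuFFT forward model is not needed here.
import torch

def warm_start(net, latents, backprojections, n_iter=500, lr=1e-3):
    """latents: (T, d) latent codes; backprojections: (T, 1, H, W) images H_k^T y_k."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(n_iter):
        out = net(latents)                              # predicted image series
        loss = torch.nn.functional.mse_loss(out, backprojections)
        opt.zero_grad(); loss.backward(); opt.step()
    return net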
Future Work
Our implementation of CryoGAN is bound to further improve. Beyond simple
engineering tweaks (e.g., tuning the number of layers in the discriminator, testing
tions that this can also be achieved when there is a certain mismatch between the
assumed distribution of poses and the actual one, given that an appropriate GAN
loss is used.
Like all reconstruction algorithms, CryoGAN can fail if the dataset contains too
many corrupted particle images, e.g., those with broken structures or strong optical
aberrations. Several solutions could be deployed to handle excessive outliers in the
data distribution. One approach would be to include a step that automatically
spots and discards corrupted data so that the discriminator never sees them. Recent
deep-learning-based approaches able to track outliers in data could prove useful in
this regard [188].
While the spatial resolution of the CryoGAN reconstructions from real data
is not yet competitive with the state-of-the-art, the algorithm is already able to
steadily perform the hardest part of the job, which is to obtain a reasonable struc-
ture by using nothing save the particle dataset and CTF estimations. We believe
that the aforementioned developments will help to bring the CryoGAN algorithm to
the stage where it becomes a relevant contributor for high-resolution reconstruction
in single-particle Cryo-EM. We have laid out a roadmap for future improvements
that should get us to this stage, and may eventually help us reconstruct dynamic
structures.
Future Work
The current experiments have been conducted on synthetic data with simple dynamics.
In the future, the method could be deployed on real data. We believe that,
with a better incorporation of state-of-the-art GAN architectures [210, 212] and the
future work intended for CryoGAN, this method could provide excellent results on
challenging datasets. Moreover, we also hope that this work will attract the interest
of computer-vision researchers to the problem of Cryo-EM. In conclusion, with
suitable extensions this method could, for the first time, let biologists reconstruct
the true dynamics of biomolecules in a user-friendly and robust manner.
Appendix A
Appendices
A.1 Chapter 2
A.1.1 Proof of Theorem 2.4.1
Abstract Representer Theorem
The result presented in this section is preparatory to Theorem 2.4.1. It is classical
for Hilbert spaces [47, Theorem 16.1]. We give its proof for the sake of completeness.
Theorem A.1.1. Let $\mathcal{X}$ be a Hilbert space equipped with the inner product $\langle \cdot, \cdot \rangle_{\mathcal{X}}$
and let $h_1, \ldots, h_M \in \mathcal{X}'$ be linear and continuous functionals. Let $\mathcal{C} \subset \mathbb{R}^M$ be a
feasible convex compact set, meaning that there exists at least one function $f \in \mathcal{X}$
such that $\mathbf{H}\{f\} \in \mathcal{C}$. Then, the minimizer
Proof. The feasibility of the set $\mathcal{C}$ implies that the set $\mathcal{C}_{\mathcal{X}} = \mathbf{H}^{-1}(\mathcal{C}) = \{f \in \mathcal{X} :
\mathbf{H}\{f\} \in \mathcal{C}\} \subset \mathcal{X}$ is nonempty. Since $\mathbf{H}$ is linear and bounded and since $\mathcal{C}$ is convex
and compact, its preimage $\mathcal{C}_{\mathcal{X}}$ is also convex and closed. By Hilbert's projection
theorem [214], the solution $f^{*}$ exists and is unique, as it is the projection of the null
function onto $\mathcal{C}_{\mathcal{X}}$. Let the measurement of this unique point $f^{*}$ be $\mathbf{z}_0 = \mathbf{H}\{f^{*}\}$.
The Riesz representation theorem states that $\langle h_m, f\rangle = \langle h_m^{\#}, f\rangle_{\mathcal{X}}$ for every $f \in \mathcal{X}$,
where $h_m^{\#} \in \mathcal{X}$ is the unique Riesz conjugate of the functional $h_m$. We then uniquely
decompose $f^{*}$ as $f^{*} = f^{*\perp} + \sum_{m=1}^{M} a_m h_m^{\#}$, where $f^{*\perp}$ is orthogonal to the span of
the $h_m^{\#}$ with respect to the inner product on $\mathcal{X}$, i.e., $\mathbf{H}\{f^{*\perp}\} = \mathbf{0}$. The orthogonality
also implies that
$$\|f^{*}\|_{\mathcal{X}}^{2} = \|f^{*\perp}\|_{\mathcal{X}}^{2} + \Big\|\sum_{m=1}^{M} a_m h_m^{\#}\Big\|_{\mathcal{X}}^{2}. \qquad (\mathrm{A.3})$$
This means that the minimum norm is reached when $f^{*\perp} = 0$ while keeping
$\mathbf{H}\{f^{*}\} = \mathbf{z}_0$, implying the form (A.2) of the solution.
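For concreteness, the coefficients in this representation can be made explicit; the following is a sketch, under the additional assumption that the Gram matrix of the Riesz conjugates is invertible:

\begin{equation*}
  f^{*} = \sum_{m=1}^{M} a_m\, h_m^{\#},
  \qquad
  \mathbf{G}\,\mathbf{a} = \mathbf{z}_0,
  \qquad
  [\mathbf{G}]_{k,m} = \langle h_k^{\#},\, h_m^{\#}\rangle_{\mathcal{X}},
\end{equation*}

since $z_{0,k} = \langle h_k, f^{*}\rangle = \langle h_k^{\#}, f^{*}\rangle_{\mathcal{X}} = \sum_{m=1}^{M} a_m \langle h_k^{\#}, h_m^{\#}\rangle_{\mathcal{X}}$.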
for every $p \in \mathcal{N}_{\mathrm{L}}$. As presented in the supplementary material (see [46] for more
details), the search space $\mathcal{X}_2$ is a Hilbert space for the Hilbertian norm
where we used respectively the triangle inequality in (A.8), the inequalities (A.4)
and (A.5) in (A.9), and the relations $\|p_m\|_{\mathcal{N}_{\mathrm{L}}} = \|f_m\|_{\mathcal{X}_2} - \|\mathrm{L}f_m\|_{L_2}$ and $\|g_m\|_{\mathcal{X}_2} =
\|\mathrm{L}f_m\|_{L_2}$ in (A.10). Since $\|\mathrm{L}f_m\|_{L_2}$ is bounded and $\|f_m\|_{\mathcal{X}_2} \to \infty$, we deduce that
$\|\mathbf{H}\{f_m\}\|_2 \to \infty$, which is known to be false. We thus obtain a contradiction,
proving the coercivity.
Since $J_2(\cdot|\mathbf{z})$ is proper, lsc, convex, and coercive on $\mathcal{X}_2$, it has at least
one minimizer.
Uniqueness of the Solution. We now prove that if $E(\mathbf{z}, \cdot)$ satisfies Assumption
2', then the solution is unique. We first show this for the case when Assumption
2'.i) is satisfied. We already know that the solution set is nonempty. It is then
clear that uniqueness is achieved if $J_2(\cdot|\mathbf{z})$ is strictly convex. We now prove that the
convex functional $J_2(\cdot|\mathbf{z})$ is actually strictly convex.
For $\lambda \in (0, 1)$ and $f_A, f_B \in \mathcal{X}_2$, we denote $f_{AB} = \lambda f_A + (1-\lambda) f_B$. Then, the
equality case $J_2(f_{AB}|\mathbf{z}) = \lambda J_2(f_A|\mathbf{z}) + (1-\lambda) J_2(f_B|\mathbf{z})$ implies that $E(\mathbf{z}, f_{AB}) =
\lambda E(\mathbf{z}, f_A) + (1-\lambda) E(\mathbf{z}, f_B)$ and $\|\mathrm{L}f_{AB}\|_{L_2} = \lambda\|\mathrm{L}f_A\|_{L_2} + (1-\lambda)\|\mathrm{L}f_B\|_{L_2}$, since the
two parts of the functional are themselves convex. The strict convexity of $E(\mathbf{z}, \cdot)$
and of the norm $\|\cdot\|_{L_2}$ then implies that
$\mathcal{Q} = \{f \in \mathcal{X}_2 : \langle f, p\rangle_{\mathcal{X}_2} = 0, \ \forall p \in \mathcal{N}_{\mathrm{L}}\}$
is the Hilbert space with norm $\|\mathrm{L}\cdot\|_{L_2}$. In particular, we have that $f_2^{*} = q^{*} + p^{*}$
with $q^{*} \in \mathcal{Q}$ and $p^{*} \in \mathcal{N}_{\mathrm{L}}$.
Consider the optimization problem
$\Pi q' = q' * \rho_{\mathrm{L}^{*}\mathrm{L}} + p_{q'}$ for some $p_{q'} \in \mathcal{N}_{\mathrm{L}}$ [47, 52]. Here, $\rho_{\mathrm{L}^{*}\mathrm{L}}$ is the Green's function
of the operator $(\mathrm{L}^{*}\mathrm{L})$ (see Definition 2.2.1). Therefore,
$$f_2^{*} = p_0 + \rho_{\mathrm{L}^{*}\mathrm{L}} * \Big(\sum_{m=1}^{M} a_m h_m\Big), \qquad (\mathrm{A.14})$$
where $p_0 = (p_{q'} + p^{*}) \in \mathcal{N}_{\mathrm{L}}$ and where $\sum_m a_m \langle h_m, p\rangle = 0$ for every $p \in \mathcal{N}_{\mathrm{L}}$.
The component $\rho_{\mathrm{L}^{*}\mathrm{L}} * \big(\sum_{m=1}^{M} a_m h_m\big)$ in (A.14) can be written as $\sum_{m=1}^{M} a_m \varphi_m$,
provided that $\varphi_m = \rho_{\mathrm{L}^{*}\mathrm{L}} * h_m = \mathcal{F}^{-1}\big\{\widehat{h}_m / |\widehat{\mathrm{L}}|^{2}\big\}$ is well-defined. To show that this is
the case, we decompose $h_m = \mathrm{Proj}_{\mathcal{Q}'}\{h_m\} + \mathrm{Proj}_{\mathcal{N}_{\mathrm{L}}'}\{h_m\}$, where $\mathrm{Proj}_{\mathcal{Q}'}$ and $\mathrm{Proj}_{\mathcal{N}_{\mathrm{L}}'}$
are the projection operators on $\mathcal{Q}'$ and $\mathcal{N}_{\mathrm{L}}'$, respectively. Since $\mathrm{Proj}_{\mathcal{Q}'}\{h_m\} \in \mathcal{Q}'$,
as discussed earlier, $\rho_{\mathrm{L}^{*}\mathrm{L}} * \mathrm{Proj}_{\mathcal{Q}'}\{h_m\}$ is well-defined.
Now, one can always select a basis $\{p_n\}_{n=1}^{N_0}$ such that $\mathcal{N}_{\mathrm{L}}' = \mathrm{Span}\{\phi_n\}_{n=1}^{N_0}$ with
Note that, even in the absence of convexity of E(z, ·), results on the form of the
solution can still be obtained.
Existence of Solutions. We first show that $\mathcal{V} = \arg\min_{f \in \mathcal{X}_1} J_1(f|\mathbf{z})$ is nonempty,
convex, and weak*-compact.
We rely on the generalized Weierstrass theorem presented in [215]: any proper,
lower semi-continuous (lsc) functional over a compact topological vector space
reaches its minimum, from which we deduce the following result. We recall that the
dual space $\mathcal{B}'$ of a Banach space $\mathcal{B}$ can be endowed with the weak*-topology, and
that one can define a norm $\|f\|_{\mathcal{B}'} = \sup_{\|x\|_{\mathcal{B}} \leq 1} \langle f, x\rangle$ for which $\mathcal{B}'$ is a Banach space.
Proof. Let $\alpha > \inf J$. The coercivity implies that there exists $r > 0$ such that
$J(f) \geq \alpha$ as soon as $\|f\|_{\mathcal{B}'} > r$. The infimum of $J$ can only be reached on $\mathcal{B}_r = \{f \in
\mathcal{B}' : \|f\|_{\mathcal{B}'} \leq r\}$; hence, we restrict our analysis to it. The Banach-Alaoglu theorem
implies that $\mathcal{B}_r$ is weak*-compact. As a consequence, the functional $J$ is proper
and lsc on the compact space $\mathcal{B}_r$ endowed with the weak*-topology. According to
the generalized Weierstrass theorem [215, Theorem 7.3.1], $J$ reaches its infimum on
$\mathcal{B}_r$, hence on $\mathcal{X}'$.
Let $\mathcal{V} = \arg\min J$ and $\alpha_0 = \min J$. The convexity of $J$ directly implies that
of $\mathcal{V}$. The set $\mathcal{V}$ is included in the ball $\mathcal{B}_{\alpha_0}$, which is weak*-compact. Therefore,
it suffices to show that $\mathcal{V}$ is weak*-closed to deduce that it is weak*-compact.
Moreover, the weak*-lower semi-continuity of $J$ is equivalent to the weak*-closedness of
the level sets $\{f \in \mathcal{B}' : J(f) \leq \alpha\}$. Applying this to $\alpha = \alpha_0$, we
deduce that $\mathcal{V} = \{f \in \mathcal{B}' : J(f) \leq \alpha_0\}$ is weak*-closed, as expected.
$$\mathcal{V}_e = \arg\min_{f \in \mathbf{H}^{-1}\{\mathbf{z}_e\}} \|\mathrm{L}f\|_{\mathcal{M}}. \qquad (\mathrm{A.16})$$
Since $\{\mathbf{z}_e\}$ is convex and compact, and the set $\mathbf{H}^{-1}\{\mathbf{z}_e\}$ is nonempty, we can apply
Theorem 2 of [46] to deduce that $\mathcal{V}_e$ is convex and weak*-compact, together with
the general form (2.19) of the extreme points of $\mathcal{V}_e$.
Since $\mathcal{V}_e \subseteq \mathcal{V}_1$ and $f_e \in \mathcal{V}_e$, it can be easily shown that $f_e$ is also an extreme
point of $\mathcal{V}_e$. This proves that the extreme points of $\mathcal{V}_1$ admit the form (2.19).
Measurement of the solution set. We now show that, in the case of Assumption
2', the measurement of the solution set is unique. We first prove this for the case
of Assumption 2'.i). Let $J_1^{*}$ be the minimum value attained by the solutions. Let
$f_A^{*}$ and $f_B^{*}$ be two solutions, let $e_A, e_B$ be their corresponding $E$-functional values,
and let $r_A, r_B$ be their corresponding regularization-functional values. Since the cost
functional is convex, any convex combination $f_{AB} = \lambda f_A^{*} + (1-\lambda) f_B^{*}$ is also a solution
for $\lambda \in [0, 1]$, with functional value $J_1^{*}$. Let us assume that $\mathbf{H}\{f_A^{*}\} \neq \mathbf{H}\{f_B^{*}\}$. Since
$E(\mathbf{z}, \cdot)$ is strictly convex and $\mathcal{R}_1(f) = \|\mathrm{L}f\|_{\mathcal{M}}$ is convex, we get that
is a compact convex set and $\|\mathbf{a}\|_0 \leq M$, $\forall \mathbf{a} \in \alpha_{E,\lambda}$, where $\alpha_{E,\lambda}$ is the set of the
extreme points of $\alpha_\lambda$.
Proposition A.1.4. Let the convex compact set $\alpha_\lambda$ be the solution set of Problem
(2.48) and let $\alpha_{E,\lambda}$ be the set of its extreme points. Let the operator $\mathrm{T} : \alpha_\lambda \to \mathbb{R}^N$
be such that $\mathrm{T}\mathbf{a} = \mathbf{u}$ with $u_m = |a_m|$, $m \in [1, \ldots, N]$. Then, the operator is linear
and invertible over the domain $\alpha_\lambda$, and the range $\mathrm{T}\alpha_\lambda$ is convex compact such that
the image of any extreme point $\mathbf{a}_E \in \alpha_{E,\lambda}$ is also an extreme point of the set $\mathrm{T}\alpha_\lambda$.
The linear program corresponding to (2.50) is
$$(\mathbf{a}^{*}, \mathbf{u}^{*}) = \arg\min_{\mathbf{a}, \mathbf{u}} \sum_{n=1}^{N} u_n, \quad \text{subject to} \quad \mathbf{u} + \mathbf{a} \geq \mathbf{0}, \ \ \mathbf{u} - \mathbf{a} \geq \mathbf{0}, \ \ \mathbf{P}\mathbf{a} = \mathbf{z}. \qquad (\mathrm{A.18})$$
Any solution $\mathbf{a}^{*}$ of (A.18) is equal to $(\mathbf{s}_1^{*} - \mathbf{s}_2^{*})$ for some solution pair of (A.19). We
denote the concatenation of any two independent points $\mathbf{s}_{r1}, \mathbf{s}_{r2} \in \mathbb{R}^N$ by the variable
$\mathbf{s}_r = (\mathbf{s}_{r1}, \mathbf{s}_{r2}) \in \mathbb{R}^{2N}$. Then, the concatenation of the feasible pairs $\mathbf{s}_f = (\mathbf{s}_{f1}, \mathbf{s}_{f2})$
that satisfies the constraints of the linear program (A.19) forms a polytope in $\mathbb{R}^{2N}$.
Given that (A.19) is solvable, it is known that at least one of the extreme points
of this polytope is also a solution. The simplex algorithm is devised such that
its solution $\mathbf{s}_{\mathrm{SLP}}^{*} = (\mathbf{s}_{1,\mathrm{SLP}}^{*}, \mathbf{s}_{2,\mathrm{SLP}}^{*})$ is an extreme point of this polytope [68]. Our
remaining task is to prove that $\mathbf{a}_{\mathrm{SLP}}^{*} = \mathbf{s}_{1,\mathrm{SLP}}^{*} - \mathbf{s}_{2,\mathrm{SLP}}^{*}$ is an extreme point of the
set $\alpha_\lambda$, the solution set of Problem (2.48).
Proposition A.1.3 claims that the solution set $\alpha_\lambda$ of the LASSO problem is a
convex set with extreme points $\alpha_{E,\lambda} \subset \mathbb{R}^N$. As $\alpha_\lambda$ is convex and compact, the
concatenated set $\zeta = \{\mathbf{w} \in \mathbb{R}^{2N} : \mathbf{w} = (\mathbf{a}^{*}, \mathbf{u}^{*}), \ \mathbf{a}^{*} \in \alpha_\lambda\}$ is convex and compact
by Proposition A.1.4. The transformation $(\mathbf{a}^{*}, \mathbf{u}^{*}) = (\mathbf{s}_1^{*} - \mathbf{s}_2^{*}, \mathbf{s}_1^{*} + \mathbf{s}_2^{*})$ is linear
and invertible. This means that the solution set of (A.19) is convex and compact,
too. The simplex solution corresponds to one of the extreme points of this convex
compact set.
Since the map $(\mathbf{a}^{*}, \mathbf{u}^{*}) = (\mathbf{s}_1^{*} - \mathbf{s}_2^{*}, \mathbf{s}_1^{*} + \mathbf{s}_2^{*})$ is linear and invertible, it also implies
that an extreme point of the solution set of (A.19) corresponds to an extreme point
of $\zeta$. Proposition A.1.4 then claims that this extreme point of $\zeta$ corresponds to an
extreme point $\mathbf{a}_{\mathrm{SLP}} \in \alpha_{E,\lambda}$.
for some $\mathbf{z}_{0,\lambda}$. The solution of the problem akin to (A.20) has been discussed in [45]
and is proven to be convex and compact such that the extreme points $\alpha_{E,\lambda}$ of the
convex set $\alpha_\lambda$ satisfy $\|\mathbf{a}\|_0 \leq M$ for any $\mathbf{a} \in \alpha_{E,\lambda}$.
there exists $\boldsymbol{\eta} \in \mathbb{R}^N$ such that
$$\mathbf{H}^{\mathsf T}(\mathbf{z} - \mathbf{H}\mathbf{a}) = \lambda\boldsymbol{\eta}, \quad \text{and} \qquad (\mathrm{A.21})$$
$$\eta_m \in \begin{cases} \{\mathrm{sign}(a_m)\}, & \text{if } a_m \neq 0,\\ [-1, 1], & \text{if } a_m = 0,\end{cases} \qquad (\mathrm{A.22})$$
for any $m \in [1, \ldots, N]$. The vector $\boldsymbol{\eta}$ is unique since $\mathbf{H}\mathbf{a}$ is unique for all $\mathbf{a} \in \alpha_\lambda$. Condition
(A.22) implies that $|a_m| = \eta_m a_m$ for any $m \in [1, \ldots, N]$ and $\mathbf{a} \in \alpha_\lambda$.
Therefore, for any $\mathbf{a} \in \alpha_\lambda$, $\mathrm{T}\mathbf{a} = \mathbf{R}\mathbf{a}$, where $\mathbf{R} \in \mathbb{R}^{N \times N}$ is a diagonal matrix
with entries $R_{mm} = \eta_m$. Thus, the operation of $\mathrm{T}$ is linear on the domain $\alpha_\lambda$. Also,
$\mathbf{a} = \mathbf{R}\mathbf{R}\mathbf{a}$ for $\mathbf{a} \in \alpha_\lambda$, implying that the operator $\mathrm{T}$ is invertible.
This ensures that the image $\mathrm{T}\alpha_\lambda$ of the convex compact set $\alpha_\lambda$ is also convex compact
and that the image of any extreme point $\mathbf{a}_E \in \alpha_{E,\lambda}$ is also an extreme point of the set
$\mathrm{T}\alpha_\lambda$. Similarly, it can be proved that the concatenated set $\zeta = \{\mathbf{w} \in \mathbb{R}^{2N} : \mathbf{w} =
(\mathbf{a}, \mathrm{T}\mathbf{a}), \ \mathbf{a} \in \alpha_\lambda\}$ is the image of a linear and invertible concatenation operation on
$\alpha_\lambda$. Thus, it is convex and compact, and the image of any extreme point $\mathbf{w}_E \in \zeta_{E,\lambda}$
through the inverse of the concatenation operation is also an extreme point of
$\alpha_\lambda$.
Predual of $\mathcal{X}_1$. The space $\mathcal{M}(\mathbb{R})$ is the topological dual of the space $C_0(\mathbb{R})$ of
continuous and vanishing functions. The space $\mathcal{X}_1$ inherits this property: it is the
topological dual of $C_{\mathrm{L}}(\mathbb{R})$, defined as the image of $C_0(\mathbb{R})$ by the adjoint $\mathrm{L}^{*}$ of $\mathrm{L}$,
according to [46, Theorem 6].
We can therefore define a weak*-topology on $\mathcal{X}_1$: it is the topology for which
$f_n \to 0$ if $\langle f_n, \varphi\rangle \to 0$ for every $\varphi \in C_{\mathrm{L}}(\mathbb{R})$. The weak*-topology is crucial to ensure
the existence of solutions of (2.18); see [46] for more details.
Weak*-continuity of $h_m$. The weak*-continuity of $h_m$ is equivalent to its inclusion
in the predual space $C_{\mathrm{L}}(\mathbb{R})$ [216, Theorem IV.20, p. 114].
A.2 Chapter 3
A.2.1 Proof of Theorem 3.3.1
(i) Set $\mathbf{r}_k = (\mathbf{x}_{k+1} - \mathbf{x}_k)$. On one hand, it is clear that
$$\mathbf{r}_k = (1 - \alpha_k)\mathbf{x}_k + \alpha_k \mathbf{z}_k - \mathbf{x}_k = \alpha_k(\mathbf{z}_k - \mathbf{x}_k). \qquad (\mathrm{A.28})$$
On the other hand, from the construction of $\{\alpha_k\}$,
$$\alpha_k \|\mathbf{z}_k - \mathbf{x}_k\|_2 \leq c_k\, \alpha_{k-1}\|\mathbf{z}_{k-1} - \mathbf{x}_{k-1}\|_2 \ \Leftrightarrow\ \|\mathbf{r}_k\|_2 \leq c_k\|\mathbf{r}_{k-1}\|_2. \qquad (\mathrm{A.29})$$
We now show that $\{\mathbf{x}_k\}$ is a Cauchy sequence. Since $\{c_k\}$ is asymptotically upper-bounded
by $C < 1$, there exists $K$ such that $c_k \leq C$, $\forall k > K$. Let $m, n$ be two
integers such that $m > n > K$. By using (A.30) and the triangle inequality,
$$\|\mathbf{x}_m - \mathbf{x}_n\|_2 \leq \sum_{k=n}^{m-1}\|\mathbf{r}_k\|_2 \leq \|\mathbf{r}_0\|_2 \Big(\prod_{i=1}^{K} c_i\Big)\sum_{k=n}^{m-1} C^{k-K} \leq \|\mathbf{r}_0\|_2 \Big(\prod_{i=1}^{K} c_i\Big)\frac{C^{n-K} - C^{m-K}}{1 - C}. \qquad (\mathrm{A.31})$$
By plugging (A.34) into (A.33), we get that $\mathbf{x}^{*} = G_\gamma(\mathbf{x}^{*})$, which means that $\mathbf{x}^{*}$ is
a fixed point of the operator $G_\gamma$.
(iii) Now that $F = P_{\mathcal{S}}$ satisfies (3.4), we invoke Proposition 3.2.1 to infer that
$\mathbf{x}^{*}$ is a local minimizer of (3.1), thus completing the proof.
where $\mathbf{W} \in \mathbb{R}^{M \times M}$ is a diagonal matrix with $[\mathrm{diag}(\mathbf{W})]_m = w_m = p_m$, $\mathbf{H}' = \mathbf{W}^{\frac{1}{2}}\mathbf{H}$,
and $\mathbf{y}' = \mathbf{W}^{\frac{1}{2}}\mathbf{y}$.
Imposing the data-manifold prior, we get the equivalent of Problem (3.1) as
$$\min_{\mathbf{x} \in \mathcal{S}} \frac{1}{2}\|\mathbf{H}'\mathbf{x} - \mathbf{y}'\|^2. \qquad (\mathrm{A.42})$$
Note that all the results discussed in Sections 3.2 and 3.3 apply to Problem (A.42).
As a consequence, we use Algorithm 1 to solve the problem with the following small
change in the gradient step:
$$0 \geq \langle \mathbf{z} - P_{\mathcal{S}}\mathbf{x}, \mathbf{x} - P_{\mathcal{S}}\mathbf{x}\rangle = \gamma\big\langle \mathbf{z} - \mathbf{x}^{*},\, \mathbf{H}^{\mathsf T}\mathbf{y} - \mathbf{H}^{\mathsf T}\mathbf{H}\mathbf{x}^{*}\big\rangle = \frac{\gamma}{2}\Big(\|\mathbf{H}\mathbf{x}^{*} - \mathbf{y}\|_2^2 - \|\mathbf{H}\mathbf{z} - \mathbf{y}\|_2^2 + \|\mathbf{H}\mathbf{x}^{*} - \mathbf{H}\mathbf{z}\|_2^2\Big).$$
Since $\gamma > 0$, the last inequality implies that
$$\|\mathbf{H}\mathbf{x}^{*} - \mathbf{y}\|_2^2 \leq \|\mathbf{H}\mathbf{z} - \mathbf{y}\|_2^2, \quad \forall \mathbf{z} \in \mathcal{S} \cap B_\varepsilon(\mathbf{x}^{*}),$$
$$\langle \mathbf{x}_i - P_{\mathcal{S}}\mathbf{x}, \mathbf{x} - P_{\mathcal{S}}\mathbf{x}\rangle > 0,$$
$$\langle \mathbf{z} - \hat{\mathbf{x}}, \mathbf{x} - \hat{\mathbf{x}}\rangle \leq 0, \quad \forall \mathbf{z} \in \mathcal{C}_i, \ \forall i \leq n,$$
Let $d$ be the distance from $\hat{\mathbf{x}}$ to the set $\mathcal{T} = \bigcup_{i=m+1}^{n}\mathcal{C}_i$. Since each $\mathcal{C}_i$ is closed, $\mathcal{T}$
must be closed too and, so, $d > 0$. We now choose $0 < \varepsilon < d$. Then, $B_\varepsilon(\hat{\mathbf{x}}) \cap \mathcal{T} = \emptyset$.
Therefore,
$$\mathcal{S} \cap B_\varepsilon(\hat{\mathbf{x}}) = \bigcup_{i=1}^{m}\big(\mathcal{C}_i \cap B_\varepsilon(\hat{\mathbf{x}})\big) = \bigcup_{i=1}^{m}\tilde{\mathcal{C}}_i, \qquad (\mathrm{A.45})$$
where $\tilde{\mathcal{C}}_i = \mathcal{C}_i \cap B_\varepsilon(\hat{\mathbf{x}})$ is clearly a convex set, for all $i \leq m$. It is straightforward
that $\hat{\mathbf{x}}$ is the orthogonal projection of $\mathbf{x}$ onto the set $\bigcup_{i=1}^{m}\tilde{\mathcal{C}}_i$ and that $\hat{\mathbf{x}} \in \bigcap_{i=1}^{m}\tilde{\mathcal{C}}_i$.
We are back to Case 1 and, therefore,
From (A.45) and (A.46), (3.4) is fulfilled for the chosen $\varepsilon$.
$$\|\mathbf{x} - \gamma\mathbf{H}^{\mathsf T}\mathbf{H}\mathbf{x}\|_2 \leq \|\mathbf{I} - \gamma\mathbf{H}^{\mathsf T}\mathbf{H}\|_2\,\|\mathbf{x}\|_2, \qquad (\mathrm{A.47})$$
where $\lambda_i$ denotes the eigenvalues of $\mathbf{H}^{\mathsf T}\mathbf{H}$ and $\gamma = 2/(\lambda_{\max} + \lambda_{\min})$, so that
$$\frac{2\lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} \leq \gamma\lambda_i \leq \frac{2\lambda_{\max}}{\lambda_{\max} + \lambda_{\min}}, \quad \forall i$$
$$\Leftrightarrow\ |1 - \gamma\lambda_i| \leq \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}, \quad \forall i. \qquad (\mathrm{A.49})$$
$$\|\mathbf{x} - \gamma\mathbf{H}^{\mathsf T}\mathbf{H}\mathbf{x}\|_2 \leq \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}\|\mathbf{x}\|_2, \quad \forall \mathbf{x}. \qquad (\mathrm{A.50})$$
Since $L < (\lambda_{\max} + \lambda_{\min})/(\lambda_{\max} - \lambda_{\min})$, (A.51) implies that $G_\gamma$ is a contractive
mapping. By the Banach-Picard fixed-point theorem [219, Thm. 1.48], $\{\mathbf{x}_k\}$ defined
by $\mathbf{x}_{k+1} = G_\gamma(\mathbf{x}_k)$ converges to the unique fixed point $\mathbf{x}^{*}$ of $G_\gamma$, for every
initialization $\mathbf{x}_0$. Finally, since $P_{\mathcal{S}}$ satisfies (3.4), by Proposition 3.2.1, $\mathbf{x}^{*}$ is also a
local minimizer of (3.1).
A.2.8 Experiments
1) Experiment 1
Figure A.1 shows the difference images between the reconstructions and the
ground truth for the lung and abdomen images in the ×16 case.
Fig. A.3 compares the reconstructions for the ×5 case when the noise levels
are ∞ dB (first and second columns), 45 dB (third column), and 35 dB (fourth
Figure A.2: Profiles of the high- and low-contrast regions marked by the solid
and dashed line segments, respectively, inside the original image in the first
column of Fig. A.3. This corresponds to the ×5, 35-dB noise case.
[Figure: results, zoomed regions, and difference images for the 25-dB, 30-dB, and 35-dB noise cases.]
A.3 Chapter 5
A.3.1 Synthetic Data (Figure 5.3)
Dataset. We construct a synthetic Cryo-EM dataset that mimics the experimental
β-galactosidase dataset (EMPIAR-10061) from [158]. We generate 41,000 synthetic
β-galactosidase projections of size 192 × 192 using our Cryo-EM image-formation
model (see Online Methods). For the ground-truth volume, we generate a 2.5
Å density map from the PDB-5a1a atomic model using Chimera [220]. This gives a
volume of size (302 × 233 × 163) with voxel size 0.637 Å. The volume is then padded,
averaged, and downsampled to a size of (180 × 180 × 180) with voxel size 1.274 Å.
This corresponds to a Nyquist resolution of 2.548 Å for the reconstructed volume.
The projection poses are sampled from a uniform distribution over SO(3), where
SO(3) is the group of 3D rotations around the origin of $\mathbb{R}^3$.
In order to apply random CTFs and noise, we randomly pick a micrograph in the
EMPIAR-10061 dataset. We extract its CTF parameters using CTFFIND4 [221]
and apply them to a clean projection. The parameter B of the envelope function of
the CTF (see (5.13)) is chosen such that the envelope decays to 0.2 at the Nyquist frequency.
We then randomly select a background patch from the same micrograph to simulate
noise. The noise is downsampled to size 192 × 192, normalized to zero mean,
scaled, and added to the projection. The scaling is such that the ratio of the energy
of the signal to the energy of the noise (SNR) is kept at 0.01, which is equivalent
to -20 dB. The dataset is then randomly divided into two halves. The algorithm
is applied separately on both halves to generate the half-maps. The FSC between
the two half-maps is then reported using FOCUS [222, 223].
We apply a binary spherical mask of size (184 × 184 × 184) to the learned volume.
To handle the sharp transitions at the mask borders, we restrict the voxel
values to lie above a certain value. This value changes as a function of position
and iteration number: it increases linearly with the distance from the center of the
volume to the border of the mask, from $V_{\min}$ to 0, and the value $V_{\min}$ changes as
a function of the iteration number, starting at 0 and decreasing to -2% of the current
maximum voxel value. This promotes nonnegativity during the initial phases of
the reconstruction and increases the stability of the algorithm.
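A minimal NumPy sketch of this masking and clipping strategy is given below; the linear schedules follow the description above, while the helper name and the remaining implementation details are assumptions.

# Sketch: voxel values are clamped from below by a bound that increases linearly
# from V_min at the center to 0 at the mask border; V_min itself decreases over
# the run from 0 to -2% of the current maximum voxel value.
import numpy as np

def clip_volume(volume, mask_radius, iteration, total_iterations):
    n = volume.shape[0]
    grid = np.indices(volume.shape) - (n - 1) / 2.0
    dist = np.sqrt((grid ** 2).sum(axis=0))            # distance to the volume center
    v_min = -0.02 * volume.max() * min(iteration / total_iterations, 1.0)
    lower = v_min * (1.0 - np.clip(dist / mask_radius, 0.0, 1.0))  # V_min -> 0
    clipped = np.maximum(volume, lower)                # enforce the lower bound
    clipped[dist > mask_radius] = 0.0                  # binary spherical mask
    return clipped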
For the back-propagations, the norm of the gradients for the discriminator is
clipped to a maximal value of $10^8$. For the generator, the gradients for each pixel
are clipped to a maximal value of $10^4$. The clipping values increase linearly from
zero to those maxima over the first two epochs. Doing so improves the stability
of the adversarial learning scheme at the start, in particular that of the discriminator.
All parameters are tuned for a fixed value range that follows from the
normalization of all projections.
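A short PyTorch sketch of this clipping schedule is given below; the helper name and the use of a fractional epoch counter are assumptions.

# Sketch of the gradient-clipping schedule: global-norm clipping for the
# discriminator (up to 1e8) and per-voxel value clipping for the volume (up to
# 1e4), with both limits ramped linearly from zero over the first two epochs.
import torch

def clip_gradients(discriminator, volume, progress_epochs, ramp_epochs=2,
                   max_norm_disc=1e8, max_value_vol=1e4):
    ramp = min(progress_epochs / ramp_epochs, 1.0)       # 0 -> 1 over two epochs
    torch.nn.utils.clip_grad_norm_(discriminator.parameters(), ramp * max_norm_disc)
    torch.nn.utils.clip_grad_value_([volume], ramp * max_value_vol)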
Reconstruction Settings. The reconstruction settings for both cases are the
same as the ones used in the main experiment, except for the few following differences.
For the second case, translations are also imposed, and the translation
distribution is kept the same as the one used for generating the dataset. Furthermore,
in both cases, the lower bound of the clipping value at the center reaches
-5% of the maximum voxel value of the volume. Finally, the algorithm is run for
100 epochs in both experiments.
are clipped to a maximal value of $10^3$. The clipping values increase linearly from
zero to those maxima over the first two epochs. Doing so improves the stability of
the adversarial learning scheme at the start, in particular that of the discriminator.
The gradients that correspond to the learning of the scaling ratios between
the noise and the projection images are clipped to a value of 10.
A.4 Chapter 6
A.4.1 Neural Network Architectures
Layer id  Layer    Resample  Output Shape
0         Input    -         1 × 32 × 32
1         Conv2d   MaxPool   96 × 16 × 16
2         Conv2d   MaxPool   192 × 8 × 8
3         Conv2d   MaxPool   384 × 4 × 4
4         Conv2d   MaxPool   768 × 2 × 2
5         Flatten  -         3072 × 1 × 1
6         FC       -         50 × 1 × 1
7         FC       -         1 × 1 × 1
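For illustration, the tabulated discriminator could be written in PyTorch as follows. The kernel size, padding, and activation functions are assumptions; the table only specifies the layer types, the resampling, and the output shapes.

# Sketch of the tabulated discriminator: four Conv2d + MaxPool stages
# (96, 192, 384, 768 channels), a flatten, and two fully connected layers.
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.MaxPool2d(kernel_size=2),
    )

discriminator = nn.Sequential(
    conv_block(1, 96),      # 96 x 16 x 16
    conv_block(96, 192),    # 192 x 8 x 8
    conv_block(192, 384),   # 384 x 4 x 4
    conv_block(384, 768),   # 768 x 2 x 2
    nn.Flatten(),           # 3072
    nn.Linear(3072, 50),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Linear(50, 1),
)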
Bibliography
[9] S. Ramani and J.A. Fessler, “Parallel MR image reconstruction using aug-
mented Lagrangian methods,” IEEE Trans. Med. Imag., vol. 30, no. 3, pp.
694–706, 2011.
[10] M. A. T. Figueiredo and R. D. Nowak, “An EM algorithm for wavelet-based
image restoration,” IEEE Transactions on Image Processing, vol. 12, no. 8,
pp. 906–916, Aug. 2003.
[11] Ingrid Daubechies, Michel Defrise, and Christine De Mol, “An iterative
thresholding algorithm for linear inverse problems with a sparsity constraint,”
Commun. Pure Appl. Math, vol. 57, no. 11, pp. 1413–1457, 2004.
[12] Amir Beck and Marc Teboulle, “A fast iterative shrinkage-thresholding algo-
rithm for linear inverse problems,” SIAM J. Imaging Sciences, vol. 2, no. 1,
pp. 183–202, 2009.
[13] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein,
“Distributed optimization and statistical learning via the alternating direction
method of multipliers,” Foundations and Trends in Machine Learning, vol.
3, no. 1, pp. 1–122, 2011.
[14] Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep learning, MIT
Press, 2016.
[15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, “ImageNet classification
with deep convolutional neural networks,” in Advances in neural
information processing systems, 2012, pp. 1097–1105.
[16] Michael T. McCann, Kyong Hwan Jin, and Michael Unser, “Convolutional
neural networks for inverse problems in imaging: A review,” IEEE Signal
Process. Mag., vol. 34, no. 6, pp. 85–95, Nov. 2017.
[17] Kyong Hwan Jin, Michael T McCann, Emmanuel Froustey, and Michael
Unser, “Deep convolutional neural network for inverse problems in imaging,”
IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
[18] Yo Seob Han, Jaejun Yoo, and Jong Chul Ye, “Deep learning with domain
adaptation for accelerated projection reconstruction MR,” arXiv:1703.01135
[cs.CV], 2017.
[19] Stephan Antholzer, Markus Haltmeier, and Johannes Schwab, “Deep learning
for photoacoustic tomography from sparse data,” arXiv:1704.04587 [cs.CV],
2017.
[20] Shanshan Wang, Zhenghang Su, Leslie Ying, Xi Peng, Shun Zhu, Feng Liang,
Dagan Feng, and Dong Liang, “Accelerating magnetic resonance imaging via
deep learning,” in Proc. IEEE Int. Symp. Biomed. Imaging (ISBI), 2016, pp.
514–517.
[21] Jonas Adler and Ozan Öktem, “Solving ill-posed inverse problems using iter-
ative deep neural networks,” arXiv:1704.04058 [math.OC], 2017.
[22] Kerem C Tezcan, Christian F Baumgartner, Roger Luechinger, Klaas P
Pruessmann, and Ender Konukoglu, “MR image reconstruction using deep
density priors,” IEEE Trans. on Med. Imag., vol. 38, no. 7, pp. 1633–1642,
July 2019.
[23] Yong Chun and Jeffrey A Fessler, “Deep BCD-net using identical encoding-decoding
CNN structures for iterative image recovery,” in 2018 IEEE 13th
Image, Video, and Multidimensional Signal Processing Workshop (IVMSP).
IEEE, 2018, pp. 1–5.
[24] K. Hammernik, T. Klatzer, E. Kobler, M.P. Recht, D.K. Sodickson, T. Pock,
et al., “Learning a variational network for reconstruction of accelerated MRI
data,” Magn. Reson. in Med., vol. 79, no. 6, pp. 3055–3071, June 2018.
[25] JH Rick Chang, Chun-Liang Li, Barnabas Poczos, BVK Vijaya Kumar, and
Aswin C Sankaranarayanan, “One network to solve them all–solving linear
inverse problems using deep projection models,” in Proceedings of the IEEE
International Conference on Computer Vision, 2017, pp. 5888–5897.
[26] Harshit Gupta, Kyong Hwan Jin, Ha Q Nguyen, Michael T McCann, and
Michael Unser, “CNN-based projected gradient descent for consistent CT image
reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 6,
pp. 1440–1453, 2018.
[27] V. Lempitsky, A. Vedaldi, and D. Ulyanov, “Deep image prior,” in Proc. of
the IEEE Comput. Soc. Conf. on Comput. Vision and Pattern Recognition
(CVPR), Salt Lake City UT, USA, July 18-23 2018, pp. 9446–9454.
[28] Harshit Gupta, Michael T McCann, Laurene Donati, and Michael Unser,
“CryoGAN: A new reconstruction paradigm for single-particle cryo-EM via deep
adversarial learning,” BioRxiv, 2020.
[29] Harshit Gupta, Julien Fageot, and Michael Unser, “Continuous-domain solutions
of linear inverse problems with Tikhonov versus generalized TV regularization,”
IEEE Transactions on Signal Processing, vol. 66, no. 17, pp.
4670–4684, 2018.
[33] Michael Elad and Michal Aharon, “Image denoising via sparse and redundant
representations over learned dictionaries,” IEEE Trans. Image Process., vol.
15, no. 12, pp. 3736–3745, 2006.
[34] Emmanuel J Candès, Yonina C Eldar, Deanna Needell, and Paige Randall,
“Compressed sensing with coherent and redundant dictionaries,” Applied and
Computational Harmonic Analysis, vol. 31, no. 1, pp. 59–73, 2011.
[36] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-
Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative ad-
versarial nets,” in Advances in neural information processing systems, 2014,
pp. 2672–2680.
[41] M.A.T. Figueiredo, R.D. Nowak, and S.J. Wright, “Gradient projection for
sparse reconstruction: Application to compressed sensing and other inverse
problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no.
4, pp. 586–597, Dec. 2007.
[42] Arthur E. Hoerl and Robert W. Kennard, “Ridge regression: Biased estima-
tion for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67,
Feb. 1970.
[43] R. Tibshirani, “Regression shrinkage and selection via the Lasso,” Journal of
the Royal Statistical Society. Series B, vol. 58, no. 1, pp. 265–288, 1996.
[44] Bradley Efron, Trevor Hastie, and Robert Tibshirani, “Discussion: The
Dantzig selector: Statistical estimation when p is much larger than n,” The
Annals of Statistics, vol. 35, no. 6, pp. 2358–2364, Dec. 2007.
[46] Michael Unser, Julien Fageot, and John Paul Ward, “Splines are universal so-
lutions of linear inverse problems with generalized TV regularization,” SIAM
Review, vol. 59, no. 4, pp. 769–793, Dec. 2017.
[47] Holger Wendland, Scattered Data Approximation, vol. 17, Cambridge Uni-
versity press, 2004.
[48] Bernhard Schölkopf and Alexander J. Smola, Learning with Kernels: Support
Vector Machines, Regularization, Optimization, and Beyond, MIT Press,
Cambridge, MA, USA, 2001.
[49] Bernhard Schölkopf, Ralf Herbrich, and Alex J Smola, “A generalized repre-
senter theorem,” Lecture Notes in Computer Science, vol. 2111, pp. 416–426,
2001.
[50] Grace Wahba, Spline Models for Observational Data, vol. 59, SIAM, 1990.
[51] Grace Wahba, “Support vector machines, reproducing kernel Hilbert spaces
and the randomized GACV,” Advances in Kernel Methods-Support Vector
Learning, vol. 6, pp. 69–87, 1999.
[52] Anatoly Yu Bezhaev and Vladimir Aleksandrovich Vasilenko, Variational
theory of splines, Springer, 2001.
[53] J. Kybic, T. Blu, and M. Unser, “Generalized sampling: A variational
approach—Part I: Theory,” IEEE Transactions on Signal Processing, vol.
50, no. 8, pp. 1965–1976, Aug. 2002.
[54] J. Kybic, T. Blu, and M. Unser, “Generalized sampling: A variational
approach—Part II: Applications,” IEEE Transactions on Signal Processing,
vol. 50, no. 8, pp. 1977–1985, Aug. 2002.
[55] Leonid I. Rudin, Stanley Osher, and Emad Fatemi, “Nonlinear total variation
based noise removal algorithms,” Physics D, vol. 60, no. 1-4, pp. 259–268,
Nov. 1992.
[56] Gabriele Steidl, Stephan Didas, and Julia Neumann, “Splines in higher order
TV regularization,” International Journal of Computer Vision, vol. 70, no.
3, pp. 241–255, Dec. 2006.
[57] SD Fisher and JW Jerome, “Spline solutions to L1 extremal problems in one
and several variables,” Journal of Approximation Theory, vol. 13, no. 1, pp.
73–83, Jan. 1975.
[58] K. Bredies and H.K. Pikkarainen, “Inverse problems in spaces of measures,”
ESAIM: Control, Optimisation and Calculus of Variations, vol. 19, no. 1, pp.
190–218, Jan. 2013.
[60] Quentin Denoyelle, Vincent Duval, and Gabriel Peyré, “Support recovery for
sparse super-resolution of positive measures,” Journal of Fourier Analysis
and Applications, vol. 23, no. 5, pp. 1153–1194, Oct. 2017.
[61] Antonin Chambolle, Vincent Duval, Gabriel Peyré, and Clarice Poon, “Geo-
metric properties of solutions to the total variation denoising problem,” In-
verse Problems, vol. 33, no. 1, pp. 015002, Dec. 2016.
[64] Andrea Braides, Gamma-convergence for Beginners, vol. 22, Clarendon Press,
2002.
[65] Vincent Duval and Gabriel Peyré, “Sparse regularization on thin grids I: the
Lasso,” Inverse Problems, vol. 33, no. 5, pp. 055008, 2017.
[66] Gongguo Tang, Badri Narayan Bhaskar, and Benjamin Recht, “Sparse recov-
ery over continuous dictionaries-just discretize,” in Asilomar Conference on
Signals, Systems and Computers. IEEE, 2013, pp. 1043–1047.
[67] George B Dantzig, Alex Orden, and Philip Wolfe, “The generalized simplex
method for minimizing a linear form under linear inequality restraints,” Pa-
cific Journal of Mathematics, vol. 5, no. 2, pp. 183–195, Oct. 1955.
[69] Vincent Duval and Gabriel Peyré, “Exact support recovery for sparse spikes
deconvolution,” Foundations of Computational Mathematics, vol. 15, no. 5,
pp. 1315–1355, 2015.
[70] Ryan J Tibshirani, “The LASSO problem and uniqueness,” Electronic Journal
of Statistics, vol. 7, pp. 1456–1490, 2013.
[71] Holger Rauhut, Karin Schnass, and Pierre Vandergheynst, “Compressed sens-
ing and redundant dictionaries,” IEEE Transactions on Information Theory,
vol. 54, no. 5, pp. 2210–2219, Apr. 2008.
[72] Simon Foucart and Holger Rauhut, A Mathematical Introduction to Com-
pressive Sensing, Springer, 2013.
[73] Antonin Chambolle and Charles Dossal, “On the convergence of the iterates
of FISTA,” Journal of Optimization Theory and Applications, vol. 166, no.
3, pp. 25, 2015.
[74] E. Mammen and S. van de Geer, “Locally adaptive regression splines,” Annals
of Statistics, vol. 25, no. 1, pp. 387–413, 1997.
[75] Emmanuel J Candès and Carlos Fernandez-Granda, “Towards a mathematical
theory of super-resolution,” Communications on Pure and Applied Mathemat-
ics, vol. 67, no. 6, pp. 906–956, 2014.
[76] Gongguo Tang, Badri Narayan Bhaskar, Parikshit Shah, and Benjamin Recht,
“Compressed sensing off the grid,” IEEE Transactions on Information The-
ory, vol. 59, no. 11, pp. 7465–7490, 2013.
[77] Yohann De Castro, Fabrice Gamboa, Didier Henrion, and J-B Lasserre, “Ex-
act solutions to super resolution on semi-algebraic domains in higher dimen-
sions,” IEEE Transactions on Information Theory, vol. 63, no. 1, pp. 621–630,
2017.
[78] M. Unser and T. Blu, “Generalized smoothing splines and the optimal dis-
cretization of the Wiener filter,” IEEE Transactions on Signal Processing,
vol. 53, no. 6, pp. 2146–2159, Jun. 2005.
[79] M. Unser and P. D. Tafti, “Stochastic models for sparse and piecewise-smooth
signals,” IEEE Transactions on Signal Processing, vol. 59, no. 3, pp. 989–
1006, Mar. 2011.
[80] M. Unser and P. D. Tafti, An Introduction to Sparse Stochastic Processes,
Cambridge University Press, 2014.
[81] J. Fageot, V. Uhlmann, and M. Unser, “Gaussian and sparse processes are
limits of generalized Poisson processes,” arXiv:1702.05003 [math.PR], 2017.
[82] Ali Mousavi and Richard G Baraniuk, “Learning to invert: Signal recovery
via deep convolutional networks,” arXiv:1701.03891 [stat.ML], 2017.
[83] Karol Gregor and Yann LeCun, “Learning fast approximations of sparse
coding,” in Proc. Int. Conf. Mach. Learn. (ICML), 2010, pp. 399–406.
[84] Yan Yang, Jian Sun, Huibin Li, and Zongben Xu, “Deep ADMM-Net for
compressive sensing MRI,” in Adv. Neural Inf. Process. Syst. (NIPS), pp.
10–18. 2016.
[85] Patrick Putzky and Max Welling, “Recurrent inference machines for solving
inverse problems,” arXiv:1706.04008 [cs.NE], 2017.
[86] Jo Schlemper, Jose Caballero, Joseph V Hajnal, Anthony Price, and Daniel
Rueckert, “A deep cascade of convolutional neural networks for MR image
reconstruction,” in International Conference on Information Processing in
Medical Imaging. Springer, 2017, pp. 647–658.
[88] Stanley H Chan, Xiran Wang, and Omar A Elgendy, “Plug-and-Play ADMM
for image restoration: Fixed-point convergence and applications,” IEEE
Trans. Comput. Imaging, vol. 3, no. 1, pp. 84–98, 2017.
[90] Yaniv Romano, Michael Elad, and Peyman Milanfar, “The little engine that
could: Regularization by denoising (red),” SIAM Journal on Imaging Sci-
ences, vol. 10, no. 4, pp. 1804–1844, 2017.
[91] JH Chang, Chun-Liang Li, Barnabás Póczos, BVK Kumar, and Aswin C
Sankaranarayanan, “One network to solve them all—Solving linear inverse
problems using deep projection models,” arXiv:1703.09912 [cs.CV], 2017.
[92] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis, “Compressed
sensing using generative models,” arXiv:1703.03208 [stat.ML], 2017.
[93] Brendan Kelly, Thomas P Matthews, and Mark A Anastasio, “Deep learning-
guided image reconstruction from incomplete data,” arXiv:1709.00584
[cs.CV], 2017.
[96] Qiong Xu, Hengyong Yu, Xuanqin Mou, Lei Zhang, Jiang Hsieh, and
Ge Wang, “Low-dose X-ray CT reconstruction via dictionary learning,” IEEE
Trans. Med. Imag., vol. 31, no. 9, pp. 1682–1697, 2012.
[97] Shanzhou Niu, Yang Gao, Zhaoying Bian, Jing Huang, Wufan Chen, Gaohang
Yu, Zhengrong Liang, and Jianhua Ma, “Sparse-view X-ray CT reconstruction
via total generalized variation regularization,” Phys. Med. Biol., vol. 59, no.
12, pp. 2997–3017, 2014.
[98] Lars Gjesteby, Qingsong Yang, Yan Xi, Ye Zhou, Junping Zhang, and
Ge Wang, “Deep learning methods to guide CT image reconstruction and
reduce metal artifacts,” in Medical Imaging 2017: Physics of Medical Imag-
ing, Orlando, Fl, 2017.
[99] Hu Chen, Yi Zhang, Mannudeep K. Kalra, Feng Lin, Yang Chen, Peixi Liao,
Jiliu Zhou, and Ge Wang, “Low-dose CT with a residual encoder-decoder
convolutional neural network,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp.
2524–2535, 2017.
[100] Eunhee Kang, Junhong Min, and Jong Chul Ye, “A deep convolutional neural
network using directional wavelets for low-dose X-ray CT reconstruction,”
Med. Phys., vol. 44, no. 10, pp. e360–e375, 2017.
[101] Yoseop Han, Jaejoon Yoo, and Jong Chul Ye, “Deep residual learning for
compressed sensing CT reconstruction via persistent homology analysis,”
arXiv:1611.06391 [cs.CV], 2016.
[102] Bertolt Eicke, “Iteration methods for convexly constrained ill-posed problems
in Hilbert space,” Numer. Funct. Anal. Optim., vol. 13, no. 5-6, pp. 413–429,
1992.
[103] L. Landweber, “An iteration formula for Fredholm integral equations of the
first kind,” Amer. J. Math., vol. 73, no. 3, pp. 615–624, 1951.
[108] A. Aldroubi and R. Tessera, “On the existence of optimal unions of subspaces
for data modeling and clustering,” Found. Comput. Math., vol. 11, no. 3, pp.
363–379, 2011.
[109] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional
networks for biomedical image segmentation,” in Proc. Med. Image. Comput.
Comput. Assist. Interv. (MICCAI), 2015, pp. 234–241.
[110] A. Emin Orhan and Xaq Pitkow, “Skip connections eliminate singularities,”
arXiv:1701.09175 [cs.NE], 2017.
[111] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual
learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. (CVPR), 2016, pp. 770–778.
[112] C. McCollough, “TU-FG-207A-04: Overview of the Low Dose CT Grand
Challenge,” Med. Phys., vol. 43, no. 6-part-35, pp. 3759–3760, 2016.
[113] Zhou Wang, Eero P Simoncelli, and Alan C Bovik, “Multiscale structural sim-
ilarity for image quality assessment,” in Proc. IEEE Asilomar Conf. Signals,
Syst., Comput., Pacific Grove, CA, Nov. 2003, vol. 2, pp. 1398–1402.
[114] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro, “Online
learning for matrix factorization and sparse coding,” Journal of Machine
Learning Research (JMLR), vol. 11, pp. 19–60, 2010.
[115] Joel A Tropp and Anna C Gilbert, “Signal recovery from random measure-
ments via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53,
no. 12, pp. 4655–4666, 2007.
[116] Kyong Hwan Jin, Harshit Gupta, Jerome Yerly, Matthias Stuber, and Michael
Unser, “Time-dependent deep image prior for dynamic MRI,” arXiv preprint
arXiv:1910.01684, 2019.
[117] M.A. Griswold, P.M. Jakob, R.M. Heidemann, M. Nittka, V. Jellus,
J. Wang, et al., “Generalized autocalibrating partially parallel acquisitions
(GRAPPA),” Magn. Reson. in Med., vol. 47, no. 6, pp. 1202–1210, June 2002.
[118] K.P. Pruessmann, M. Weiger, M.B. Scheidegger, and P. Boesiger, “SENSE:
Sensitivity encoding for fast MRI,” Magn. Reson. in Med., vol. 42, no. 5, pp.
952–962, November 1999.
[119] M. Lustig, D. Donoho, and J.M. Pauly, “Sparse MRI: The application of
compressed sensing for rapid MR imaging,” Magn. Reson. in Med., vol. 58,
no. 6, pp. 1182–1195, December 2007.
[120] J. Fessler, “Model-based image reconstruction for MRI,” IEEE Sig. Process.
Mag., vol. 27, no. 4, pp. 81–89, July 2010.
[121] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no.
7553, pp. 436–444, May 27 2015.
[131] E.J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact
signal reconstruction from highly incomplete frequency information,” IEEE
Trans. on Info. Theory, vol. 52, no. 2, pp. 489–509, February 2006.
[132] S. Poddar and M. Jacob, “Dynamic MRI using smoothness regularization
on manifolds (SToRM),” IEEE Trans. on Med. Imag., vol. 35, no. 4, pp.
1106–1115, April 2016.
[133] U. Nakarmi, Y. Wang, J. Lyu, D. Liang, and L. Ying, “A kernel-based low-
rank (KLR) model for low-dimensional manifold recovery in highly accel-
erated dynamic MRI,” IEEE Trans. on Med. Imag., vol. 36, no. 11, pp.
2297–2307, November 2017.
[134] J. Yerly, G. Ginami, G. Nordio, A.J. Coristine, S. Coppo, P. Monney, et al.,
“Coronary endothelial function assessment using self-gated cardiac cine MRI
and k-t sparse SENSE,” Magn. Reson. in Med., vol. 76, no. 5, pp. 1443–1454,
November 2016.
[135] J. Chaptinel, J. Yerly, Y. Mivelaz, M. Prsa, L. Alamo, Y. Vial, et al., “Fetal
cardiac cine magnetic resonance imaging in utero,” Scientific Reports, vol. 7,
no. 15540, pp. 1–10, November 14 2017.
[136] J.A. Fessler and B.P. Sutton, “Nonuniform fast Fourier transforms using min-
max interpolation,” IEEE Trans. on Sig. Proc., vol. 51, no. 2, pp. 560–574,
February 2003.
[137] A.P. Yazdanpanah, O. Afacan, and S.K. Warfield, “Non-learning based deep
parallel MRI reconstruction (NLDpMRI),” in Proc. of the SPIE Conf. on
Med. Imag.: Imag. Process., San Diego, CA, USA, February 16-21 2019, Inter-
national Society for Optics and Photonics, vol. 10949, pp. 1094904–1094910.
[138] K. Gong, C. Catana, J. Qi, and Q. Li, “PET image reconstruction using deep
image prior,” IEEE Trans. on Med. Imag., vol. 38, no. 7, pp. 1655–1665, July
2019.
[139] C. Qin, J. Schlemper, J. Caballero, A.N. Price, J.V. Hajnal, and D. Rueckert,
“Convolutional recurrent neural networks for dynamic MR image reconstruc-
tion,” IEEE Trans. on Med. Imag., vol. 38, no. 1, pp. 280–290, January
2019.
[156] Tamir Bendory, Alberto Bartesaghi, and Amit Singer, “Single-particle cryo-
electron microscopy: Mathematical theory, computational challenges, and
opportunities,” IEEE Signal Processing Magazine, vol. 37, no. 2, pp. 58–76,
2020.
[157] Amit Singer and Fred J Sigworth, “Computational methods for single-particle
electron cryomicroscopy,” Annual Review of Biomedical Data Science, vol. 3,
2020.
[158] Alberto Bartesaghi, Alan Merk, Soojay Banerjee, Doreen Matthies, Xiongwu
Wu, Jacqueline LS Milne, and Sriram Subramaniam, “2.2 Å resolution cryo-
EM structure of β-galactosidase in complex with a cell-permeant inhibitor,”
Science, vol. 348, no. 6239, pp. 1147–1151, 2015.
[161] Miloš Vulović, Raimond B.G. Ravelli, Lucas J. van Vliet, Abraham J. Koster,
Ivan Lazić, Uwe Lücken, Hans Rullgård, Ozan Öktem, and Bernd Rieger, “Im-
age formation modeling in cryo-electron microscopy,” Journal of Structural
Biology, vol. 183, no. 1, pp. 19–32, July 2013.
[162] Hans Rullgård, L-G Öfverstedt, Sergey Masich, Bertil Daneholt, and Ozan
Öktem, “Simulation of transmission electron microscope images of biological
specimens,” Journal of microscopy, vol. 243, no. 3, pp. 234–256, 2011.
[163] M. Unser, “Sampling—50 years after Shannon,” Proceedings of the IEEE, vol.
88, no. 4, pp. 569–587, April 2000.
[164] Martin Arjovsky, Soumith Chintala, and Léon Bottou, “Wasserstein genera-
tive adversarial networks,” in International conference on machine learning,
2017, pp. 214–223.
[165] Cédric Villani, Optimal transport: old and new, vol. 338, Springer Science &
Business Media, 2008.
[176] Zvi Kam, “The reconstruction of structure from electron micrographs of ran-
domly oriented particles,” in Electron Microscopy at Molecular Dimensions,
pp. 270–277. Springer, 1980.
[177] Nir Sharon, Joe Kileel, Yuehaw Khoo, Boris Landa, and Amit Singer,
“Method of moments for 3-d single particle ab initio modeling with non-
uniform distribution of viewing angles,” Inverse Problems, 2019.
[178] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud
Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm
Van Der Laak, Bram Van Ginneken, and Clara I Sánchez, “A survey on deep
learning in medical image analysis,” Medical image analysis, vol. 42, pp.
60–88, 2017.
[179] Michael T. McCann, Kyong Hwan Jin, and Michael Unser, “Convolutional
neural networks for inverse problems in imaging: A review,” IEEE Signal
Processing Magazine, vol. 34, no. 6, pp. 85–95, Nov. 2017.
[180] George Barbastathis, Aydogan Ozcan, and Guohai Situ, “On the use of deep
learning for computational imaging,” Optica, vol. 6, no. 8, pp. 921–943, 2019.
[181] Tristan Bepler, Alex J Noble, and Bonnie Berger, “Topaz-Denoise: General
deep denoising models for cryoEM,” bioRxiv, p. 838920, 2019.
[182] Feng Wang, Huichao Gong, Gaochao Liu, Meijing Li, Chuangye Yan, Tian
Xia, Xueming Li, and Jianyang Zeng, “DeepPicker: A deep learning approach
for fully automated particle picking in cryo-EM,” Journal of structural biology,
vol. 195, no. 3, pp. 325–336, 2016.
[183] Yanan Zhu, Qi Ouyang, and Youdong Mao, “A deep convolutional neural
network approach to single-particle recognition in cryo-electron microscopy,”
BMC bioinformatics, vol. 18, no. 1, pp. 348, 2017.
[184] Dimitry Tegunov and Patrick Cramer, “Real-time cryo-EM data pre-
processing with Warp,” bioRxiv, p. 338558, 2018.
[185] Thorsten Wagner, Felipe Merino, Markus Stabrin, Toshio Moriya, Claudia
Antoni, Amir Apelbaum, Philine Hagel, Oleg Sitsel, Tobias Raisch, Daniel
Prumbaum, et al., “SPHIRE-crYOLO is a fast and accurate fully automated
particle picker for cryo-EM,” Communications Biology, vol. 2, no. 1, pp. 218,
2019.
[186] Tristan Bepler, Andrew Morin, Micah Rapp, Julia Brasch, Lawrence Shapiro,
Alex J Noble, and Bonnie Berger, “Positive-unlabeled convolutional neural
networks for particle picking in cryo-electron micrographs,” Nature methods,
pp. 1–8, 2019.
[187] Ellen D. Zhong, Tristan Bepler, Joseph H. Davis, and Bonnie Berger, “Re-
constructing continuous distributions of 3D protein structure from cryo-EM
images,” in International Conference on Learning Representations, 2020.
[188] Nina Miolane, Frédéric Poitevin, Yee-Ting Li, and Susan Holmes, “Estima-
tion of orientation and camera parameters from cryo-electron microscopy im-
ages with variational autoencoders and generative adversarial networks,” in
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition Workshops, 2020, pp. 970–971.
[189] Ashish Bora, Eric Price, and Alexandros G Dimakis, “AmbientGAN: Gener-
ative models from lossy measurements.,” ICLR, vol. 2, pp. 5, 2018.
[190] Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lom-
bardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason
Saragih, Matthias Nießner, et al., “State of the art on neural rendering,”
arXiv preprint arXiv:2004.03805, 2020.
[191] Shubham Tulsiani, Alexei A Efros, and Jitendra Malik, “Multi-view consis-
tency as supervisory signal for learning shape and pose prediction,” in Pro-
ceedings of the IEEE conference on computer vision and pattern recognition,
2018, pp. 2897–2905.
[192] Matheus Gadelha, Subhransu Maji, and Rui Wang, “3D shape induction
from 2D views of multiple objects,” in 2017 International Conference on 3D
Vision (3DV). IEEE, 2017, pp. 402–411.
[194] Harshit Gupta, Thong H. Phan, Jaejun Yoo, and Michael Unser, “Multi-
CryoGAN: Reconstruction of continuous conformations in cryo-EM using gen-
erative adversarial networks,” in Proc. European Conference on Computer
Vision Workshops (ECCVW August 23-28), 2020.
[196] Abbas Ourmazd, “Cryo-EM, XFELs and the structure conundrum in structural
biology,” Nature methods, vol. 16, no. 10, pp. 941–944, 2019.
[197] Ellen D Zhong, Tristan Bepler, Bonnie Berger, and Joseph H Davis, “Cryo-
DRGN: Reconstruction of heterogeneous structures from cryo-electron micro-
graphs using neural networks,” bioRxiv, 2020.
[198] Kurt Hornik, Maxwell Stinchcombe, and Halbert White, “Multilayer feed-
forward networks are universal approximators,” Neural Networks, vol. 2, no.
5, pp. 359–366, 1989.
[199] Carlos Oscar S Sorzano, A Jiménez, Javier Mota, José Luis Vilas, David
Maluenda, M Martínez, E Ramírez-Aportela, T Majtner, J Segura, Ruben
Sánchez-García, et al., “Survey of the analysis of continuous conformational
variability of biological macromolecules by electron microscopy,” Acta Crys-
tallographica Section F: Structural Biology Communications, vol. 75, no. 1,
pp. 19–32, 2019.
[200] Joakim Andén, Eugene Katsevich, and Amit Singer, “Covariance estimation
using conjugate gradient for 3D classification in cryo-EM,” pp. 200–204.
[201] Ali Dashti, Peter Schwander, Robert Langlois, Russell Fung, Wen Li, Ah-
mad Hosseinizadeh, Hstau Y Liao, Jesper Pallesen, Gyanesh Sharma, Vera A
Stupina, et al., “Trajectories of the ribosome as a Brownian nanomachine,”
Proceedings of the National Academy of Sciences, vol. 111, no. 49, pp. 17492–
17497, 2014.
[202] Amit Moscovich, Amit Halevi, Joakim Andén, and Amit Singer, “Cryo-EM
reconstruction of continuous heterogeneity by Laplacian spectral volumes,”
Inverse Problems, vol. 36, no. 2, pp. 024003, 2020.
[204] Evan Seitz, Francisco Acosta-Reyes, Peter Schwander, and Joachim Frank,
“Simulation of cryo-EM ensembles from atomic models of molecules exhibiting
continuous conformations,” bioRxiv, p. 864116, 2019.
[205] Thomas Debarre, Julien Fageot, Harshit Gupta, and Michael Unser, “B-
spline-based exact discretization of continuous-domain inverse problems with
generalized TV regularization,” IEEE Transactions on Information Theory,
vol. 65, no. 7, pp. 4457–4470, 2019.
[206] Shayan Aziznejad, Harshit Gupta, Joaquim Campos, and Michael Unser,
“Deep neural networks with trainable activations and controlled Lipschitz con-
stant,” arXiv preprint arXiv:2001.06263, 2020.
[207] Mohammad Zalbagi Darestani and Reinhard Heckel, “Can un-trained neural
networks compete with trained neural networks at image reconstruction?,”
arXiv preprint arXiv:2007.02471, 2020.
[208] Gauri Jagatap and Chinmay Hegde, “Algorithmic guarantees for inverse imag-
ing with untrained network priors,” in Advances in Neural Information Pro-
cessing Systems, 2019, pp. 14832–14842.
[209] Reinhard Heckel and Mahdi Soltanolkotabi, “Compressive sensing with un-
trained neural networks: Gradient descent finds the smoothest approxima-
tion,” arXiv preprint arXiv:2005.03991, 2020.
[210] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen, “Pro-
gressive growing of GANs for improved quality, stability, and variation,”
arXiv:1710.10196, 2017.
[211] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee, “Accurate image super-
resolution using very deep convolutional networks,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2016, pp. 1646–
1654.
[212] Tero Karras, Samuli Laine, and Timo Aila, “A style-based generator ar-
chitecture for generative adversarial networks,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–
4410.
[213] Miloš Vulović, Raimond BG Ravelli, Lucas J van Vliet, Abraham J Koster,
Ivan Lazić, Uwe Lücken, Hans Rullgård, Ozan Öktem, and Bernd Rieger,
“Image formation modeling in cryo-electron microscopy,” Journal of structural
biology, vol. 183, no. 1, pp. 19–32, 2013.
[214] Walter Rudin, Real and Complex Analysis, Tata McGraw-Hill Education,
1987.
[216] Michael Reed and Barry Simon, Methods of Modern Mathematical Physics I:
Functional Analysis, Academic Press, 1980.
[217] Ken Sauer and Charles Bouman, “A local update strategy for iterative recon-
struction from projections,” IEEE Trans. Signal Process., vol. 41, no. 2, pp.
534–548, 1993.
[218] Idris A Elbakri and Jeffrey A Fessler, “Statistical image reconstruction for
polyenergetic X-ray computed tomography,” IEEE Trans. Med. Imag., vol.
21, no. 2, pp. 89–99, 2002.
[219] Heinz H. Bauschke and Patrick L. Combettes, Convex Analysis and Monotone
Operator Theory in Hilbert Spaces, Springer, New York, NY, 2011.
[221] Alexis Rohou and Nikolaus Grigorieff, “CTFFIND4: Fast and accurate defocus
estimation from electron micrographs,” Journal of structural biology, vol. 192,
no. 2, pp. 216–221, 2015.
HARSHIT GUPTA
[email protected]
BM 4.134, EPFL, Lausanne CH-1015, Switzerland
Homepage · Google Scholar
EDUCATION
July 2015 - September 2020 École polytechnique fédérale de Lausanne (EPFL), Switzerland
Ph.D. in Electrical Engineering
Thesis: “From Classical to Unsupervised-Deep-Learning Methods
for Solving Inverse Problems in Imaging”.
Advisor: Prof. Michael Unser
July 2011 - May 2015 Indian Institute of Technology (IIT), Guwahati, India
B. Tech in Electronics and Communications Engineering
RESEARCH EXPERIENCES
July 2014 - May 2015 Indian Institute of Technology (IIT), Guwahati, India
Bachelor Thesis Project
Topic: “Blind Image Quality Assessment”
Advisor: Prof. Kannan Karthik
May 2014 - July 2014 École polytechnique fédérale de Lausanne (EPFL), Switzerland
Research Internship
Topic: “Interpolation using Derivatives”
Advisor: Prof. Michael Unser
May 2013 - July 2013 Indian Institute of Science (IISc), Bangalore, India
Research Internship
Topic: “Building a MATLAB GUI on Optic Disk
Localization using ℓ1-minimization”
Advisor: Prof. Chandra Sekhar Seelamantula
PUBLICATIONS
Preprints
8. Gupta H* , McCann M T* , Donati L, Unser M, “CryoGAN: A New Reconstruction Paradigm for Single-
particle Cryo-EM Via Deep Adversarial Learning,” bioRxiv 2020.03.20.001016, March 2020. * Co-first
authors. [PDF]
7. Jin K H* , Gupta H* , Yerly J, Stuber M, Unser M, “Time-Dependent Deep Image Prior for Dynamic
MRI,” IEEE Transactions on Medical Imaging, in Revision. * Co-first authors. [PDF]
Journals
6. Aziznejad S, Gupta H, Campos J, Unser M, “Deep Neural Networks with Trainable Activations and
Controlled Lipschitz Constant,” IEEE Transactions on Signal Processing, vol. 68, pp. 4688 - 4699,
August 2020. [PDF]
5. Yang F, Pham T, Gupta H, Unser M, “Deep-learning projector for optical diffraction tomography,”
Optics Express, vol. 28(3), pp. 3905-3921, February 2020. [PDF]
4. Debarre T, Fageot J, Gupta H, Unser M, “B-spline-based exact discretization of continuous-domain
inverse problems with generalized TV regularization,” IEEE Transactions on Information Theory, vol.
65(7), pp.4457-4470, March 2019. [PDF]
3. Gupta H, Jin K H, Nguyen H Q, McCann M T, Unser M, “CNN-based projected gradient descent for
consistent CT image reconstruction,” IEEE Transactions on Medical Imaging, vol. 37(6), pp. 1440-1453,
May 2018. [PDF]
2. Gupta H, Fageot J, Unser M, “Continuous-domain solutions of linear inverse problems with Tikhonov
versus generalized TV regularization,” IEEE Transactions on Signal Processing, vol. 66(17), pp. 4670-
4684, July 2018. [PDF]
1. Unser M, Fageot J, Gupta H, “Representer Theorems for Sparsity-Promoting ℓ1 Regularization,” IEEE
Transactions on Information Theory, vol. 62(9), pp. 5167-5180, August 2016. [PDF]
Conference and Workshop Proceedings
4. Gupta H, Phan T H, Yoo J, Unser M, “Multi-CryoGAN: Reconstruction of continuous conformations
in Cryo-EM using Generative Adversarial Networks,” Proc. European Conference on Computer Vision
Workshops (ECCVW 2020) (Online, August 23-28), in press. [PDF]
3. Debarre T, Fageot J, Gupta H, Unser M, “Solving Continuous-domain Problems Exactly with Mul-
tiresolution B-splines,” Proc. IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2019) (Brighton, UK, May 12-17), pp. 5122-5126. [PDF]
2. Gupta H, Schmitter D, Uhlmann V, Unser M, “General surface energy for spinal cord and aorta segmen-
tation,” IEEE Proc. International Symposium on Biomedical Imaging (ISBI 2017), (Sydney, Australia,
April 18-21), pp. 319-322. [PDF]
1. Uhlmann V, Fageot J, Gupta H, Unser M, “Statistical optimality of Hermite splines,” Proc. International
Conference on Sampling Theory and Applications (SampTA 2015), (Washington, DC, US, May 25-29),
pp. 226-230. [PDF]
TEACHING EXPERIENCES
TECHNICAL STRENGTHS
HONOURS
• Selected for the second round of the Texas Instruments Innovation Challenge: India Analog Design Contest, 2014.
• Selected among the national top 30 in Manthan (CAG), 2014, out of more than 150 teams.
• Placed in the top 0.5% of the 2011 IIT Joint Entrance Exam (for admission to the undergraduate program), taken
by 500,000 students.
• Placed in the national top 1% of the National Standard Examination in Physics, 2010–11, organized by the
Indian Association of Physics Teachers.
• Secured All-India Rank 171 (AIR-171) in the National Level Science Talent Search Examination, 2009.
• Secured 3rd position in SBM Inter School Science and Environment Quiz, 2008.
• Awarded the Talent Scholarship Award by Saraswati Siksha Sansthan, 2008.