
From Classical to

Unsupervised-Deep-Learning
Methods for Solving Inverse
Problems in Imaging
Harshit Gupta
Thèse N° 7360 (septembre 2020)

Thèse présentée à la faculté des sciences et techniques de l’ingénieur


pour l’obtention du grade de docteur ès sciences
et acceptée sur proposition du jury

Prof. Dimitri Van De Ville, président


Prof. Michael Unser, directeur de thèse
Prof. François Fleuret, rapporteur
Prof. Ender Konukoglu, rapporteur
Dr. Sjors Scheres, rapporteur

École polytechnique fédérale de Lausanne—2020

Cover design by Annette Unser


Printing and binding by Repro-EPFL
Typeset with LaTeX
Copyright © 2020 by Harshit Gupta
Available at http://bigwww.epfl.ch/
Abstract

In this thesis, we propose new algorithms to solve inverse problems in the context of
biomedical images. Due to ill-posedness, solving these problems requires some prior
knowledge of the statistics of the underlying images. The traditional algorithms in
the field assume prior knowledge related to the smoothness or sparsity of these images.
Recently, they have been outperformed by second-generation algorithms which
harness the power of neural networks to learn required statistics from training data.
Even more recently, last generation deep-learning-based methods have emerged
which require neither training nor training data.
This thesis devises algorithms which progress through these generations. It ex-
tends these generations to novel formulations and applications while bringing more
robustness. In parallel, it also progresses in terms of complexity, from proposing
algorithms for problems with 1D data and an exactly known forward model to the
ones with 4D data and an unknown parametric forward model.
We introduce five main contributions. The last three of them propose deep-
learning-based latest-generation algorithms that require no prior training.
1) We develop algorithms to solve the continuous-domain formulation of inverse
problems with both classical Tikhonov and total-variation regularizations. We for-
malize the problems, characterize the solution set, and devise numerical approaches
to find the solutions.
2) We propose an algorithm that improves upon end-to-end neural-network-
based second-generation algorithms. In our method, a neural network is first trained
as a projector on a training set, and is then plugged in as a projector inside the
projected gradient descent (PGD). Since the problem is nonconvex, we relax the
PGD to ensure convergence to a local minimum under some constraints. This
method outperforms all the previous-generation algorithms for computed tomography (CT).
3) We develop a novel time-dependent deep-image-prior algorithm for modalities
that involve a temporal sequence of images. We parameterize them as the output
of an untrained neural network fed with a sequence of latent variables. To impose
temporal directionality, the latent variables are assumed to lie on a 1D manifold.
The network is then tuned to minimize the data fidelity. We obtain state-of-the-art
results in dynamic magnetic resonance imaging (MRI) and even recover intra-frame
images.
4) We propose a novel reconstruction paradigm for cryo-electron-microscopy
(CryoEM) called CryoGAN. Motivated by generative adversarial networks (GANs),
we reconstruct a biomolecule’s 3D structure such that its CryoEM measurements
resemble the acquired data in a distributional sense. The algorithm is free of pose and
likelihood estimation, needs no ab initio model, and is proven to have a theoretical
guarantee of recovery of the true structure.
5) We extend CryoGAN to reconstruct continuously varying conformations of
a structure from heterogeneous data. We parameterize the conformations as the
output of a neural network fed with latent variables on a low-dimensional manifold.
The method is shown to recover continuous protein conformations and their energy
landscape.

Key words: inverse problems, deep learning, cryo-electron microscopy, continuous conformations, dynamic magnetic resonance, computed tomography, deep image prior, total variation
Résumé

Dans cette thèse, nous proposons de nouveaux algorithmes pour résoudre des pro-
blèmes inverses dans le cadre d’images biomédicales. En raison de leur caractère mal posé,
la résolution de ces problèmes nécessite une connaissance préalable des statistiques
des images sous-jacentes. Les algorithmes traditionnels du domaine supposent des
connaissances préalables liées à la régularité ou à la parcimonie de ces images. Récemment,
ils ont été dépassés par les algorithmes de deuxième génération qui exploitent la
puissance des réseaux de neurones pour apprendre les statistiques requises à partir
des données d’entraînement. Plus récemment encore, des méthodes basées sur l’ap-
prentissage en profondeur de dernière génération sont apparues, qui ne nécessitent
ni entraînement ni données d’entraînement.
Cette thèse conçoit des algorithmes qui progressent à travers ces générations. Elle
étend ces générations à de nouvelles formulations et applications tout en apportant
plus de robustesse. En parallèle, elle progresse également en termes de complexité, de
la proposition d’algorithmes pour des problèmes avec des données 1D et un modèle
direct exact connu à ceux avec des données 4D et un modèle direct paramétrique
inconnu.
Nous introduisons cinq contributions principales. Les trois dernières proposent
des algorithmes de dernière génération basés sur le deep learning qui ne nécessitent
aucun entraînement préalable.
1) Nous développons des algorithmes pour résoudre la formulation dans le do-
maine continu de problèmes inverses avec des régularisations Tikhonov classiques
et à variation totale. Nous formalisons les problèmes, caractérisons l’ensemble de
solutions et concevons des approches numériques pour trouver les solutions.
2) Nous proposons un algorithme qui améliore les algorithmes de deuxième gé-
nération basés sur un réseau de neurones de bout en bout. Dans notre méthode,


un réseau de neurones est d’abord entraîné en tant que projecteur sur un ensemble
d’entraînement, puis branché en tant que projecteur à l’intérieur de la descente
en gradient projeté (PGD). Comme le problème n’est pas convexe, nous relâchons
la PGD pour assurer la convergence vers un minimum local sous certaines
contraintes. Cette méthode surpasse tous les algorithmes de génération précédente
pour la tomodensitométrie (CT).
3) Nous développons un nouvel algorithme de deep image prior dépendant du temps
pour les modalités qui impliquent une séquence temporelle d’images.
Nous les paramétrons comme la sortie d’un réseau neuronal non entraîné alimenté par
une séquence de variables latentes. Pour imposer une directionnalité temporelle, les
variables latentes sont supposées se trouver sur une variété 1D. Le réseau est ensuite
réglé pour minimiser la fidélité des données. Nous obtenons des résultats de pointe
en imagerie par résonance magnétique dynamique (IRM) et récupérons même des
images intra-trame.
4) Nous proposons un nouveau paradigme de reconstruction pour la cryo-microscopie
électronique (CryoEM) appelé CryoGAN. Motivés par les réseaux antagonistes
génératifs (GAN), nous reconstruisons la structure 3D d’une biomolécule
de telle sorte que ses mesures CryoEM ressemblent aux données acquises dans un
sens distributionnel. L’algorithme ne nécessite ni estimation de pose ni estimation de
vraisemblance, ne requiert aucun modèle ab initio, et il est prouvé qu’il a une garantie
théorique de récupération de la véritable structure.
5) Nous étendons CryoGAN pour reconstruire des conformations variant conti-
nuellement d’une structure à partir de données hétérogènes. Nous paramétrons les
conformations comme la sortie d’un réseau neuronal alimenté avec des variables la-
tentes sur une variété de faible dimension. Il est démontré que la méthode récupère
les conformations protéiques continues et leur paysage énergétique.

Mots clés : problèmes inverses, deep learning, cryo-microscopie électronique, conformations continues, résonance magnétique dynamique, tomodensitométrie, deep image prior, variation totale
Then even non-existence was not there, nor existence,
whence all creation had its origin,
the creator, whether he fashioned it or whether he did not,
the creator, who surveys it all from highest heaven,
he knows or maybe even he does not know.

-Nasadiya Sukt, Rigved (10:129), 1200 BC


Dedicated to Bittu (Parikshit) and my family.
Acknowledgement

This thesis is the result of support from my friends, family, and mentors who showed
faith in me throughout this journey. Firstly, I thank Prof. Michael Unser for super-
vising this thesis. His intuition, passion, and curiosity helped me build my research
instinct. I also thank him for trusting me during the tough times and encouraging
me with both words and actions. In future, I would like to emulate the culture he
has cultivated in the group.

I express sincere thanks to jury president Prof. Dimitri Van de Ville, the jury
members, Prof. François Fleuret, Prof. Ender Konukoglu, and Dr. Sjors Scheres
for reviewing and accepting this thesis.

I would like to thank Dr. Michael T. McCann (Mike), the creative genius, who
with his beautiful mind helped me solve many personal and research problems. I
tried to learn his technique of brutally analyzing concepts and problems. This was
immensely useful during the course of this thesis. I thank Dr. Soham Basu for all
his advice and moral support. I thank Dr. Daniel Schmitter, Dr. Denis Fortun,
and Luc Zheng for being there in the beginning years of my PhD.

I was fortunate to share my office with Thanh-an Pham from whom I learnt
to have fun while carrying out a PhD. His humor has been a good company for
the last three years. I am glad to have Shayan Aziznejad as my lab mate and a
dear friend. Over the course of the thesis I learnt many life perspectives from him,
shared numerous views, and had a lot of fun. I thank Pablo Garcia for all the fun
coffee meetings and gaming sessions. I thank Dr. Kyong Hwan Jin for initiating
my journey into deep learning.


I would like to thank Dr. Daniel Sage for his help with almost all the aspects
of my research life at the lab and Dr. Philippe Thevenaz for all the help in writing
the research papers. I thank Dr. Emrah Bostan and Dr. Pedram Pad for being
really patient mentors and office mates during the initial part of my thesis. I thank
Dr. Masih Nilchian for all the life wisdom he shared with me.
I thank Dr. Laurene Donati, Dr. Anais Bodoul, Thomas Debarre, Fangshu
Yang, Pakshal Bohra, Dr. Quentin Denoyelle, and Dr. Jaejun Yoo for all the fun,
humor, and discussions we had in the past few years. I thank all the past members
of our lab Dr. Ferreol Soulez, Prof. Adrien Depeursinge, Dr. Emmanuel Soubies,
Dr. Zsuzsanna Püspöki, Dr. Virginie Uhlmann, Prof. Arash Amini, Leello Dadi,
and Carlos Garcia for the shared memories. I thank the recent members or affiliates
of the lab Joaquim Campos, Thong Huy Phan, Alexis Goujon, Dr. Pol del Aguila
Pla, and Yan Liu for bringing fresh perspectives.

I would like to especially thank the Indian community in Lausanne. I had the fun of
a lifetime with Ranjith, Sai, Harshal, Kunhal, Maneesha, Sanket, Sourabh, Sagar,
Salil, Tejal, Anjali, Venkat, Anand, Rishikesh, Teju, Aparajitha, Yanisha, Sean,
Kavitha, Nithin, Chethana, Shravan, Mohit, Murali, Amrita, and Mayank. I thank
my friends Kaushal, Ravi, Gagandeep, Arpit, Pawan, Saurabh, Shashank, Puru,
Lakshman, Nikit, Rupam, Rajan from IIT Guwahati and Kirti, Khushal, Arpit,
Rishiraj, Rishabh, Rajwardhan, and Rahul from childhood for all their support.

Lastly, I thank my grandparents, parents, brother, and my entire long and wide
family without whom I could not have completed this thesis.
Contents

Abstract i

Résumé iii

Acknowledgement ix

Introduction 1

1 Linear Inverse Problems for Biomedical Imaging 11


1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Generation I - Classical Methods . . . . . . . . . . . . . . . . . . . . 12
1.2.1 Tikhonov-Prior-based Classical Methods . . . . . . . . . . . 13
1.2.2 Sparsity-Prior-based Classical Methods . . . . . . . . . . . . 13
1.3 Generation II - Supervised-Deep-Learning-Based methods . . . . . . 15
1.3.1 Direct Feedforward Reconstruction . . . . . . . . . . . . . . . 15
1.3.2 Iterative Reconstruction . . . . . . . . . . . . . . . . . . . . . 16
1.4 Generation III - Unsupervised Deep-Learning-based methods . . . . 16
1.4.1 Deep Image Prior . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.4.2 Deep Generative Models . . . . . . . . . . . . . . . . . . . . . 17
1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

I First Generation 19
2 Continuous-Domain Extension of Classical Methods 21


2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Continuous-Domain Formulation of Inverse Problems . . . . . . . . . 24
2.2.1 Measurement Operator . . . . . . . . . . . . . . . . . . . . . 25
2.2.2 Data-Fidelity Term . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.3 Regularization Operator . . . . . . . . . . . . . . . . . . . . 25
2.2.4 Regularization Norms . . . . . . . . . . . . . . . . . . . . . . 26
2.2.5 Search Space . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Theoretical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4.1 Inverse Problem with Tikhonov/L2 Regularization . . . . . . 30
2.4.2 Inverse Problem with gTV Regularization . . . . . . . . . . . 31
2.4.3 Illustration with Ideal Sampling . . . . . . . . . . . . . . . . 32
2.5 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Discretization and Algorithms . . . . . . . . . . . . . . . . . . . . . . 35
2.6.1 Tikhonov Regularization . . . . . . . . . . . . . . . . . . . . . 35
2.6.2 gTV Regularization . . . . . . . . . . . . . . . . . . . . . . . 37
2.6.3 Alternative Grid-free Techniques . . . . . . . . . . . . . . . . 42
2.7 Illustrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.7.1 Random Sampling . . . . . . . . . . . . . . . . . . . . . . . . 44
2.7.2 Multiple Solutions . . . . . . . . . . . . . . . . . . . . . . . . 44
2.7.3 Random Fourier Sampling . . . . . . . . . . . . . . . . . . . . 45
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

II Second Generation 53
3 Deep-Learning-based PGD for Iterative Reconstruction 55
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.1.2 Related and Prior Work . . . . . . . . . . . . . . . . . . . . . 58
3.1.3 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Theoretical Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.2 Constrained Least Squares . . . . . . . . . . . . . . . . . . . 61
3.2.3 Projected Gradient Descent . . . . . . . . . . . . . . . . . . . 61

3.3 Relaxation with Guaranteed Convergence . . . . . . . . . . . . . . . 63


3.4 Training a CNN as a Projector . . . . . . . . . . . . . . . . . . . . . 66
3.4.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.4.2 Sequential Training Strategy . . . . . . . . . . . . . . . . . . 67
3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5.2 Experimental Setups . . . . . . . . . . . . . . . . . . . . . . . 69
3.5.3 Comparison Methods . . . . . . . . . . . . . . . . . . . . . . 70
3.5.4 Training and Selection of Parameters . . . . . . . . . . . . . . 71
3.6 Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.6.1 Experiment 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.6.2 Experiment 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.6.3 Experiment 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.7 Behavior of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.7.1 Convergence of RPGD . . . . . . . . . . . . . . . . . . . . . . 79
3.7.2 Advantages of Sequential Training . . . . . . . . . . . . . . . 81
3.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

III Third Generation 85


4 Time-Dependent Deep Image Prior 87
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1.1 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.1.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2.1 Static Discretization . . . . . . . . . . . . . . . . . . . . . . . 90
4.2.2 Spoke Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2.3 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.2.4 Deep Image Prior with Interpolated Latent Variables . . . . . 93
4.2.5 Architectures, Datasets, and Training . . . . . . . . . . . . . 97
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.3.1 Retrospective Simulation . . . . . . . . . . . . . . . . . . . . 101
4.3.2 Golden-Angle Reconstruction of Fetal Cardiac Motion . . . . 101
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.4.1 Latent Encoding for Acyclic Data . . . . . . . . . . . . . . . 105

4.4.2 Smoothness in the Manifold of Latent Variables . . . . . . . . 106


4.4.3 Size of Latent Variables . . . . . . . . . . . . . . . . . . . . . 106
4.4.4 Variations on Latent Variables . . . . . . . . . . . . . . . . . 106
4.4.5 Memory Savings . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.4.6 Benefits of Our Approach . . . . . . . . . . . . . . . . . . . . 110
4.4.7 Limitations and Future Work . . . . . . . . . . . . . . . . . . 111
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

5 CryoGAN: Cryo-EM Reconstruction using GAN Framework 113


5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.2 Image-Formation Model in Single-Particle Cryo-EM . . . . . . . . . 115
5.2.1 Image Formation in Continuous-Domain . . . . . . . . . . . . 117
5.3 Mathematical Framework of CryoGAN . . . . . . . . . . . . . . . . . 119
5.3.1 Connection with Wasserstein GANs . . . . . . . . . . . . . . 121
5.4 The CryoGAN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 121
5.4.1 The Cryo-EM Physics Simulator . . . . . . . . . . . . . . . . 122
5.4.2 The CryoGAN Discriminator Network . . . . . . . . . . . . . 124
5.4.3 Overall Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.5 Theoretical Guarantee of Recovery . . . . . . . . . . . . . . . . . . . 125
5.5.1 Recovery in the Absence of CTF and Noise . . . . . . . . . . 129
5.5.2 Recovery in the Presence of CTF and Absence of Noise . . . 131
5.5.3 Recovery in the presence of CTF and Noise . . . . . . . . . . 135
5.6 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.6.1 CryoGAN vs. Likelihood-based Methods . . . . . . . . . . . . 138
5.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.7.1 Results on Synthetic Data . . . . . . . . . . . . . . . . . . . . 140
5.7.2 Results on Additional Synthetic Data . . . . . . . . . . . . . 142
5.7.3 Results on Experimental Data (EMPIAR-10061) . . . . . . . 142
5.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

6 Reconstructing Continuous Conformations in CryoEM using GANs 145
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.3 Background and Preliminaries . . . . . . . . . . . . . . . . . . . . . . 149
6.3.1 Image-Formation Model . . . . . . . . . . . . . . . . . . . . . 149

6.3.2 CryoGAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151


6.4 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
6.4.1 Parameterization of the Conformation Manifold . . . . . . . . 152
6.4.2 Optimization Scheme . . . . . . . . . . . . . . . . . . . . . . 153
6.5 Theoretical Guarantee of Recovery . . . . . . . . . . . . . . . . . . . 154
6.6 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.6.1 Continuous Conformations . . . . . . . . . . . . . . . . . . . 158
6.6.2 Discrete Conformations . . . . . . . . . . . . . . . . . . . . . 160
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

7 Conclusion and Outlook 165

A Appendices 173
A.1 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
A.1.1 Proof of Theorem 2.4.1 . . . . . . . . . . . . . . . . . . . . . 173
A.1.2 Proof of Theorem 2.4.2 . . . . . . . . . . . . . . . . . . . . . 178
A.1.3 Proof of Theorem 2.6.2 . . . . . . . . . . . . . . . . . . . . . 180
A.1.4 Proof of Proposition A.1.3 . . . . . . . . . . . . . . . . . . . . 181
A.1.5 Proof of Proposition A.1.4 . . . . . . . . . . . . . . . . . . . . 181
A.1.6 Structure of the Search Spaces . . . . . . . . . . . . . . . . . 182
A.2 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
A.2.1 Proof of Theorem 3.3.1 . . . . . . . . . . . . . . . . . . . . . 183
A.2.2 RPGD for Poisson Noise in CT . . . . . . . . . . . . . . . . . 185
A.2.3 Proof of Proposition 3.2.1 . . . . . . . . . . . . . . . . . . . . 186
A.2.4 Proof of Proposition 3.2.2 . . . . . . . . . . . . . . . . . . . . 187
A.2.5 Proof of Proposition 3.2.3 . . . . . . . . . . . . . . . . . . . . 187
A.2.6 Proof of Theorem 3.2.4 . . . . . . . . . . . . . . . . . . . . . 188
A.2.7 Proof of Theorem 3.2.5 . . . . . . . . . . . . . . . . . . . . . 189
A.2.8 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
A.3 Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
A.3.1 Synthetic Data (Figure 5.3) . . . . . . . . . . . . . . . . . . . 195
A.3.2 Additional Synthetic Data (Figure 5.4) . . . . . . . . . . . . . 197
A.3.3 Experimental Data (Figure 5.5) . . . . . . . . . . . . . . . . . 197
A.4 Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
A.4.1 Neural Network Architectures . . . . . . . . . . . . . . . . . . 200

Bibliography 200

Curriculum Vitæ 227


Introduction

Imagine we have an old weighing machine and a sealed paper bag that contains
two apples. We would like to know the weight of each apple. However, the bag
must remain sealed. When weighed, the total weight appears to be 501 grams.
This problem of finding the individual weights of the apples, defined here as our
signal, from an indirect hint about them, called measurement, can be regarded as an
inverse problem. The measurement is obtained from the signal through a forward
model. In this case, the forward model is the operation of summation.
This particular inverse problem is ill-posed since one cannot find the exact
weights of the two apples based solely on this one measurement. This is for two
reasons. Firstly, there is an uncertainty on the exact total weight because
the old machine can have an error margin and the bag has
some unknown weight. These latter factors are called noise; they corrupt the true
measurement. Secondly, even if the true total weight were known, there would still be
infinitely many combinations of weights that sum up to this exact total.
To address this problem, we could take advantage of prior knowledge about
the apples. For example, if we know that in this particular season all the apples have
almost the same weight, then we can estimate that each apple weighs around 250.5
grams. We could also factor in the weight of the bag and the error margin of the
machine to get an even better estimate. If we know that both of these contribute
together at most 1 gram of variation in the total weight, then we can claim that the
weight of each apple now lies in an interval, somewhere between 250 and 251 grams.
The resolution of ill-posed inverse problems is fundamental to modern imag-
ing [1] such as fluorescence microscopy, computed tomography (CT), magnetic
resonance imaging (MRI), cryo-electron microscopy (Cryo-EM), single-molecule lo-
calization microscopy (SMLM), etc. (see Figure 1). These techniques are essential


in understanding the structure of internal organs, cells, biomolecules, etc., which
is important for diagnosing diseases, discovering drugs, and deepening the knowledge
about life mechanisms. A shared trait of these techniques is that they do not give
direct access to the desirable image (signal) but only provide information about it
through an indirect physical mechanism (forward model). This information (mea-
surement) then has to be processed adequately to obtain the required signal. Unlike
the apple problem, the signal in these cases is numerically large and the forward
model is much more complicated.
For example, in CT, x-rays propagate through the organ to be imaged [2] before
being measured by a detector. The intensity of these rays is altered depending on
the absorption properties of the organ tissues. This measured data thus contains
the information about the organ absorption, albeit in a very convoluted form. The
task then is to recover the image of the absorption index of the organ (signal)
from the collected information (measurement). This is done by solving an inverse
problem, which requires numerical inversion of the forward model associated with
the device. This inversion is challenging since these problems are ill-posed. This
ill-posedness stems from the following factors.

• The measurements are corrupted by noise. This can be thermal noise in the
detector or intrinsic to the forward model or imaging setup.

• The measurements may not be fully informative because the forward model
has an intrinsic null space.

• The measurements can be substantially fewer than the unknown variables in


the signal. This means that there can be infinitely many signals that map to
the same measurement. In fact, a reduction in the number of measurements
is desirable. This is useful, for example, to reduce the radiation dose.

Therefore, solving these problems requires prior knowledge about the true signal.
For example, it can be assumed that the CT image of the organ is piecewise-
constant. The quality of reconstruction therefore substantially hinges on the ability
to inject accurate prior knowledge during the reconstruction.
In all techniques (CT, MRI, Cryo-EM, SMLM, among others), the approaches to
solve the underlying inverse problems have dramatically shifted from the classical
methods introduced in the 1970s to the recently introduced deep-learning-based
methods.
In the classical (first-generation) methods, the reconstruction task is formulated
as an optimization problem [3, 4]. Given a vector of measurements, the signal is
reconstructed by solving this optimization problem. There are two components in
this formulation. The first is called data fidelity. It ensures that the recovered signal
is such that its simulated measurements are close to the acquired measurements.
The second component is called regularization; it is required to counteract the ill-
posedness. This component is based on the prior knowledge about the true signal.
The two components ensure that the reconstructed signal is consistent with the
measurements, while being compatible with the prior knowledge.
Depending on the type of prior knowledge used, the classical reconstruction methods
can be broadly classified into two categories.
1. Tikhonov-Based: These methods use quadratic regularization (Tikhonov)
which assumes that the true signal has low energy in some domain. The
resulting optimization problem is easy to solve, often through a linear op-
eration on the measurements. However, this prior results in reconstructed
images being too smooth.
2. Sparsity-Based: Non-quadratic priors became popular in the mid-2000s [5–9].
They assume images to be sparse in some domain and result in more realistic
reconstruction since they are better than Tikhonov-based priors at capturing
the statistics of the true images. However, the resulting optimization problem
is harder to solve. The solution is obtained by a nonlinear iterative proce-
dure that has necessitated the development of new specialized optimization
routines [10–13].
The main shortcoming of classical methods is that their priors are hand-picked
from a small set of well-behaved priors so that the resulting optimization problem
can be solved via traditional optimization routines. This limits the type of prior
knowledge that can be injected into the reconstruction.
In the first half of the 2010s, deep learning [14] became an integral part of many
computer-vision applications such as image segmentation, classification, and object
detection [15]. Deep learning has then been utilized to solve inverse problems since
2015 and has resulted in state-of-the-art reconstructions [16–20]. These second-
generation methods utilize a training set containing measurement-signal pairs.

Depending on the way the network is used in the reconstruction, this second gen-
eration of methods can itself be divided into two categories.
1. Direct Feedforward Reconstruction: In this category, the training set is first
used to learn a neural network that maps the measurements to their corre-
sponding training signal [17–20]. Once trained, the neural network is fed with
the actual measurements and the output of the network then yields the re-
construction. These methods reconstruct images with unprecedented quality,
owing to the capacity of neural networks to faithfully learn complicated map-
pings. However, these methods do not actively inject the information from
the measurements and rely only on the training data to understand the inver-
sion of the underlying physics-based forward model. This might yield reconstructed
images that lack consistency with the actual measurements.
2. Iterative Reconstruction with Consistency Constraints: Many approaches
have been proposed to enforce data consistency in the solution [21–25], in-
cluding one of the contributions of this thesis [26]. In these approaches, recon-
struction is performed iteratively with information from the forward model
and the measurements being injected in conjunction with the ability of the
network to reconstruct quality images. In essence, methods of this category
combine the power of neural networks with the variational formulation of the
classical methods. They produce better results and favor robustness when
there is a mismatch between the training data and the image to be recon-
structed.
The main disadvantage of these supervised learning-based methods is that they
require training data.
Recently, there has been an emergence of third-generation methods that use
deep learning without requiring training or training data. The most prominent
representatives of this generation are methods based on deep image priors [27],
which use an untrained network to solve inverse problems. In this scheme, a fixed
random input is fed to the network. The network is then optimized to ensure
that its output is consistent with the acquired measurements. The success of this
scheme is explained by the neural network architecture which imposes an implicit
regularization that favors the reconstruction of natural-looking images. However,
this deep image prior needs more theoretical and experimental analysis to
understand its effect, applicability, and limits. Another category has also
emerged. It is one of the contributions of this thesis; there, generative adversarial
networks (GANs) are used for the reconstruction [28].

Main Contributions
This thesis brings five main contributions to the field of inverse problems in imaging.
These contributions progress from the classical methods of the first generation to the
deep-learning-based methods of the last generation. They extend these generations
to novel formulations and applications, all the while bringing more robustness. In
parallel, they also progress in terms of complexity, from algorithms for problems
with 1D data and an exactly known forward model to problems with 4D data
and an unknown parametric forward model. We summarize these contributions in
Figure 2.

1. Continuous-Domain Extension of Classical Methods (Chapter 2)
In order to further our understanding of the classical methods of reconstruction,
we formulate and solve 1D linear inverse problems in the continuous domain using
Tikhonov-and-sparsity-based regularizations. Our object of recovery is a function
resulting from the minimization of a convex objective functional composed of a
data-fidelity term and regularization. For the latter, we consider and compare the
continuous-domain Tikhonov and generalized total-variation (gTV) regularizations.
Using representer theorems, we derive the parametric form of the solutions. This
form is then used to discretize the problems and to find the numerical solutions.
For the Tikhonov case, we obtain a smooth solution that lives in a fixed subspace
determined by the forward model. In the gTV case, the solution is sparse and
composed of a few functions that depend on the regularization operator. The
number of these functions is upper-bounded by the number of measurements. These
results are in resonance with the discrete counterparts of the two cases, $\ell_2$ and $\ell_1$
regularizations. We illustrate these results through experiments in 1D. Moreover,
for the scenario of multiple solutions in the gTV case, we devise a scheme to find
the extreme points of the solution set which is theoretically ensured to be sparse.
Related Publication: H. Gupta, J. Fageot, M. Unser, “Continuous-domain
solutions of linear inverse problems with Tikhonov versus generalized TV regular-
ization,” IEEE Transactions on Signal Processing, vol. 66(17), pp. 4670-84, July
2018.

2. Deep-Learning-Based PGD for Iterative Reconstruction (Chapter 3)
Many second-generation approaches learn a neural-network-based measurement-to-
image regressor from the training data. This learnt regressor is then directly used
to reconstruct the image from a given measurement. However, these approaches
lack a feedback mechanism to enforce consistency between the reconstructed im-
age and the actual measurements. Our proposal is a plug-and-play scheme where
the projector in the projected gradient descent (PGD) is replaced by a trained
neural network. This iterative approach alternatively enforces measurement con-
sistency while a convolutional neural network (CNN) recursively projects the input
to the space of desirable images. In order to ensure convergence, we propose a
relaxed PGD and prove that, under certain conditions, the scheme converges to
some local minimum of the non-convex optimization problem. Our experiments on
sparse-view CT show that the scheme produces state-of-the-art results and brings
more robustness with respect to noise and training-data mismatch. This method
has become a fundamental contributor in the emergence of CNN-based iterative
algorithms belonging to the second generation.
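
For illustration, the plug-and-play idea can be sketched in a few lines of Python; the names below (the forward matrix H, the trained network projector_cnn, and the step size gamma) are placeholders, and the actual relaxed scheme (RPGD) of Chapter 3 adds further safeguards to guarantee convergence.

```python
import torch

def learned_pgd(H, y, projector_cnn, x0, gamma, n_iter=50):
    """Schematic projected gradient descent with a trained CNN acting as the projector.

    This is only a sketch of the plug-and-play idea; the relaxation used by RPGD to
    guarantee convergence (Chapter 3) is omitted here.
    """
    x = x0
    for _ in range(n_iter):
        grad = H.T @ (H @ x - y)              # gradient of the least-squares data fidelity
        x = projector_cnn(x - gamma * grad)   # CNN projects onto the set of desirable images
    return x
```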
Related Publication: H. Gupta, K.H. Jin, H.Q. Nguyen, M.T. McCann, M.
Unser, “CNN-based projected gradient descent for consistent CT image reconstruc-
tion,” IEEE Transactions on Medical Imaging, vol. 37(6), pp. 1440-53, May 2018.

3. Time-Dependent Deep Image Prior (Chapter 4)


As part of our thesis, we participated in the development of a third-generation
deep-learning-based method for dynamic MRI and other similar time-dependent
inverse problems. Our method requires neither prior training nor additional data
and extends the deep image prior to temporal sequences. For this, we first assume a
time-ordered sequence of latent variables that are forced to lie on a low-dimensional
manifold. A neural network is then optimized to transform this sequence of latent
variables into a sequence of images such that the data-fidelity with the sequence
of acquired measurements is maximized. The semantic prior knowledge is injected
by enforcing constraints on the latent-variable sequence. For non-periodic data,
we assume the latent variable sequence to lie on a line segment. For approximately
periodic data like cardiac motion, we assume it to lie on a helix. We show that our
scheme results in state-of-the-art performance for both synthetic and experimental
data. Moreover, the interpolation of latent variables even lets us recover intra-
frame images. To the best of our knowledge, this is the first deep-learning method
applicable to dynamic MRI that requires neither prior training nor training data.
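
For illustration, the two latent manifolds mentioned above could be generated as in the following NumPy sketch; the number of frames, the latent dimension, and the helix parameters are arbitrary values chosen for the example, not the settings used in Chapter 4.

```python
import numpy as np

T, d = 200, 3                     # number of frames and latent dimension (illustrative)
t = np.linspace(0.0, 1.0, T)

# Non-periodic data: latent variables on a line segment in R^d.
z_line = np.outer(t, np.ones(d))

# Approximately periodic data (e.g., cardiac motion): latent variables on a helix,
# whose circular component encodes the cardiac phase and whose axis encodes slow drift.
n_cycles = 10
z_helix = np.stack([np.cos(2 * np.pi * n_cycles * t),
                    np.sin(2 * np.pi * n_cycles * t),
                    t], axis=1)
```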
Related Preprint: K.H. Jin*, H. Gupta*, J. Yerly, M. Stuber, M. Unser,
“Time-dependent deep image prior for dynamic MRI,” arXiv [eess.IV], October,
2019. *cofirst authors

4. CryoGAN: Cryo-EM Reconstruction using GAN Framework (Chapter 5)

In Cryo-EM, we obtain noisy 2D tomographic projections of separate instances
of the same but randomly oriented biomolecule. The task is to reconstruct the 3D
structure from these projection data. The current methods are likelihood-based
and, hence, require estimation of either pose or the conditional distribution over
the space of poses for each projection, a computationally expensive routine. In our
thesis, we introduce a novel paradigm that we name CryoGAN. It requires neither
pose estimation, an ab initio model, additional training data, nor prior training. We aim to
reconstruct a biomolecule whose simulated Cryo-EM projections are distribution-
ally similar to the acquired data. Similarly to the GAN framework, this is done by
training a neural network called a discriminator which is optimized to distinguish
the two distributions. We then learn a biomolecule such that the discriminator
is unable to distinguish the two distributions. We mathematically prove that this
quest for distribution matching results in the recovery of the true structure. Our
scheme reaches a resolution of 8.6 Å on realistic synthetic Cryo-EM data. Our results on exper-
imental data are promising; the next step will be to improve the neural network
architecture and to rely on multiresolution approaches to become competitive with
the state-of-the-art in the field.
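
For illustration, one optimization step of this distribution-matching idea could be sketched as below in a Wasserstein-GAN style; the simulator, the discriminator, the data batch, and the optimizers are placeholders, and practical details such as gradient penalties and the full Cryo-EM physics are omitted.

```python
import torch

def cryogan_style_step(volume, simulator, discriminator, real_batch, opt_vol, opt_disc):
    """One schematic step of matching simulated and acquired projection distributions.

    'simulator' stands for a differentiable cryo-EM physics simulator that projects the
    current 3D volume with random poses, CTFs, and noise; all names are placeholders.
    """
    # Discriminator step: learn to separate acquired projections from simulated ones.
    fake_batch = simulator(volume).detach()
    d_loss = discriminator(fake_batch).mean() - discriminator(real_batch).mean()
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # Volume step: update the 3D structure so that its simulated projections
    # become indistinguishable from the acquired data.
    g_loss = -discriminator(simulator(volume)).mean()
    opt_vol.zero_grad()
    g_loss.backward()
    opt_vol.step()
```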
Related Preprint: H. Gupta*, M.T. McCann*, L. Donati, M. Unser, “CryoGAN: A new
reconstruction paradigm for single-particle Cryo-EM via deep adversarial learning,”
bioRxiv, March 2020. *cofirst authors

5. Reconstructing Continuous Conformations in CryoEM using GANs (Chapter 6)
In the field of structural biology, the determination of the continuously varying con-
formations of a biomolecule is crucial in understanding its behaviour. The major
challenge in Cryo-EM is to reconstruct these conformations of a biomolecule from
their Cryo-EM data. To this end, we propose an extension of CryoGAN, termed
Multi-CryoGAN. In addition to the advantages of CryoGAN, our method sidesteps
the conformation estimation for each projection, a step that is necessary, implicitly
or explicitly, in the current methods. We propose to parameterize the conformation
manifold using a neural network that is driven by latent variables sampled from a
distribution. The task then is to optimize the network such that the Cryo-EM pro-
jections of the generated conformations are distributionally similar to the acquired
data. We provide a mathematical guarantee of recovery of the true conformation
landscape. Our method can successfully reconstruct conformations of a heat-shock
protein from their realistic synthetic data in both the continuous and discrete con-
formation cases. To the best of our knowledge, this is the first method that can
recover continuous conformations of a biomolecule in a standalone manner (without
the need of external routines).
Related Publication: H. Gupta, T.H. Phan, J. Yoo, M. Unser, “Multi-CryoGAN:
Reconstruction of continuous conformations in Cryo-EM using Generative Adversarial
Networks,” Proc. European Conference on Computer Vision Workshops (ECCVW 2020)
(Online, August 23-28), in press.

Organization of the Thesis


Chapter 1 deals with the mathematical formulation of inverse problems and in-
vestigates the three generations of methods that solve these problems. Chapter 2
to Chapter 6 focus on each of the mentioned contributions. This is followed by
the conclusion and outlook of this thesis.

Figure 1: Inverse problems in biomedical imaging. The imaging de-


vice (forward model) is used to obtain the measurement from an unknown
signal. This measurement is then used to reconstruct the signal. In con-
ventional microscopy, the measurement is the blurred version of the signal.
In CT, the scanner obtains the tomographic projection of the signal. An
MRI scanner acquires the k-space data of the signal.¹ Cryo-EM provides
the tomographic projection of separate instances of the same but randomly
oriented biomolecule. (¹The COVID-19 image has been obtained from the
website of the Centers for Disease Control.)

Figure 2: Contributions of this thesis in the context of the three gener-


ations of reconstruction methods for inverse problems.
Chapter 1

Linear Inverse Problems for Biomedical Imaging
In this chapter, we will provide the mathematical formulation of inverse problems
and the reconstruction methods.

1.1 Overview
Solving inverse problems implicitly requires inverting the forward model which in-
corporates the physical acquisition process by which the measurement is acquired
from a signal. In the imaging applications dealt with in this thesis, the acquisition
physics is such that the forward model is linear and hence can be represented by a
linear operator $\mathbf{H} : \mathbb{R}^N \to \mathbb{R}^M$. The measurement $\mathbf{y} \in \mathbb{R}^M$ is then given by

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad (1.1)$$

where $\mathbf{x} \in \mathbb{R}^N$ is the space-domain signal² that we are interested in recovering and $\mathbf{n} \in \mathbb{R}^M$


is the noise intrinsic to the acquisition process. The noise can be assumed to
be a random variable sampled from an appropriate probability distribution; for
example, a Gaussian distribution. Depending upon the imaging application, the
signal can be an image, a movie, or a volume, but here we describe it in the vectorized
form. Moreover, the signal and the forward model in reality are continuous-domain
entities which need to be properly discretized in order to reach the formulation in
(1.1). This aspect is discussed in detail in Chapter 2.

¹ This chapter uses content from our work [26, 29].
² In this thesis, we use the terms image and signal interchangeably.
entities which need to be properly discretized in order to reach the formulation in
(1.1). This aspect is discussed in detail in Chapter 2.
To find a suitable x, the task is formulated as an optimization problem
$$\mathbf{x}^* = \arg\min_{\mathbf{x}\in\mathbb{R}^N} J(\mathbf{y}, \mathbf{x}), \qquad (1.2)$$

where $J$ is a suitable function on the space of signals and the given measurement.
Hence, the reconstruction procedure requires carefully choosing the function $J$ and
then finding the signal which minimizes it. Solving these problems when M > N
is easy and the algorithms that deal with this scenario are mature and efficient. In
this thesis, we deal with the current trends in imaging. These trends may include
significantly fewer measurements than the number of unknowns ($M \ll N$). For example,
this is useful in decreasing either the radiation dose in computed tomography (CT)
or the scanning time in MRI. Moreover, the measurements are typically very noisy
due to short integration times, which calls for some form of denoising.
This gives rise to an ill-posed problem in the sense that there may be an infinity
of consistent signals (or images) that map to the same measurements y. Thus, one
challenge of the reconstruction algorithm is to select the best solution among a mul-
titude of potential candidates. To counteract this ill-posedness, appropriate prior
knowledge about the signal needs to be injected. The quality of the reconstruction
method is therefore highly dependent on the data-fidelity and regularization used.
The available reconstruction methods can be broadly arranged in three genera-
tions, which represent the continued efforts of the research community to address
the aforementioned challenges.

1.2 Generation I - Classical Methods


In this generation, the reconstruction task is formulated as solving

$$\mathbf{x}^* = \arg\min_{\mathbf{x}\in\mathbb{R}^N} \big(\mathrm{E}(\mathbf{H}\mathbf{x}, \mathbf{y}) + \lambda R(\mathbf{x})\big), \qquad (1.3)$$

where $\mathrm{E} : \mathbb{R}^M \times \mathbb{R}^M \to \mathbb{R}^+$ is a data-fidelity term that favors solutions that are
consistent with the measurements, $R : \mathbb{R}^N \to \mathbb{R}^+$ is a suitable regularizer that
encodes prior knowledge about the image $\mathbf{x}$ to be reconstructed, and $\lambda \in \mathbb{R}^+$ is
a tradeoff parameter. For example, $\mathrm{E}$ could be the weighted least-squares $\|\mathbf{H}\mathbf{x} - \mathbf{y}\|^2$
and $R$ is either a Tikhonov- or sparsity-based regularization.
We discuss these two categories in detail.

1.2.1 Tikhonov-Prior-based Classical Methods


In these methods, the prior is expressed by a quadratic functional that is easy to
differentiate, for example $R(\mathbf{x}) = \|\mathbf{L}\mathbf{x}\|^2$, which yields

$$\mathbf{x}^* = \arg\min_{\mathbf{x}\in\mathbb{R}^N} \big(\mathrm{E}(\mathbf{H}\mathbf{x}, \mathbf{y}) + \lambda\|\mathbf{L}\mathbf{x}\|^2\big). \qquad (1.4)$$

Due to the simplicity of the objective function, the reconstruction is easy to perform.
In fact, for the case when $\mathrm{E}(\mathbf{H}\mathbf{x}, \mathbf{y}) = \|\mathbf{H}\mathbf{x} - \mathbf{y}\|^2$, the solution is given by

$$\mathbf{x}^* = \left(\mathbf{H}^T\mathbf{H} + \lambda\mathbf{L}^T\mathbf{L}\right)^{-1}\mathbf{H}^T\mathbf{y}. \qquad (1.5)$$

Tikhonov-based classical methods are fast and provide excellent results when
the number of measurements is large and the noise is small [30]. However, in the case of
extreme imaging, when $M$ is smaller than $N$, they reconstruct overly smooth images.
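
For illustration, a minimal NumPy sketch of this closed-form reconstruction on a toy 1D problem is given below; the random forward matrix, the finite-difference regularization operator, and the value of the tradeoff parameter are arbitrary choices made only for the example.

```python
import numpy as np

def tikhonov_reconstruction(H, L, y, lam):
    """Solve min_x ||Hx - y||^2 + lam * ||Lx||^2 in closed form, as in (1.5)."""
    A = H.T @ H + lam * (L.T @ L)
    return np.linalg.solve(A, H.T @ y)

# Toy example: recover a smooth 1D signal from a few noisy random measurements.
rng = np.random.default_rng(0)
N, M = 128, 32
x_true = np.cumsum(rng.standard_normal(N))        # smooth-ish ground truth
H = rng.standard_normal((M, N)) / np.sqrt(M)      # random forward model (placeholder)
L = np.eye(N) - np.eye(N, k=1)                    # finite-difference regularization operator
y = H @ x_true + 0.05 * rng.standard_normal(M)    # noisy measurements
x_rec = tikhonov_reconstruction(H, L, y, lam=1.0)
```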

1.2.2 Sparsity-Prior-based Classical Methods


These methods avoid the shortcomings of the Tikhonov-based ones by assuming sparsity-
based priors, which are better at enforcing the true image statistics. For example,
they could enforce the image to be sparse in some appropriate domain:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}\in\mathbb{R}^N} \big(\mathrm{E}(\mathbf{H}\mathbf{x}, \mathbf{y}) + \lambda\|\mathbf{L}\mathbf{x}\|_1\big). \qquad (1.6)$$

Here, $\mathbf{L}$ could be a difference operator such that the regularization enforces the image to
be piecewise constant. Due to the non-differentiability of the regularization term,
the solution to these problems cannot be found using simple gradient-based meth-
ods such as gradient descent. Instead, the solutions are typically found iteratively
by alternately enforcing the data fidelity and the regularization. The latter is
done using a proximal operation. The overall skeleton of these iterative algorithms can be
summarized as follows.

• Data-fidelity update. At iteration k, the estimate is updated to make it more


consistent with its measurements. This is done by

$$\tilde{\mathbf{x}}^k = \mathbf{x}^k - \gamma\nabla_{\mathbf{x}^k}\mathrm{E}(\mathbf{H}\mathbf{x}^k, \mathbf{y}), \qquad (1.7)$$

where $\gamma$ is an appropriate step size.

• Regularization update. In this step, the prior is enforced indirectly by solving

$$\mathbf{x}^{k+1} = \arg\min_{\mathbf{x}} \tfrac{1}{2}\|\tilde{\mathbf{x}}^k - \mathbf{x}\|_2^2 + \tau R(\mathbf{x}), \qquad (1.8)$$

where $\tau$ is an appropriate step size. This step finds an image, which, on the


one hand, minimizes the regularization cost and, on the other hand, maintains
its similarity with the more data-consistent x̃k . Finding this solution is itself
not easy since the non-differentiability of R is still an issue. However, this
formulation decouples the data-fidelity term, which depends on the forward
model, from the regularization term, rendering the latter task easier.
Iterative schemes can be deployed to find the solution of (1.8). However, in
some special regularization cases, an analytical form exists. For example, when
$\mathbf{L} = \mathbf{I}$ such that $R(\mathbf{x}) = \|\mathbf{x}\|_1$, the solution of (1.8) is the soft-thresholding
operation given by $\mathbf{x}^* = \mathrm{sign}(\tilde{\mathbf{x}}^k)\,(|\tilde{\mathbf{x}}^k| - \tau)_+$. However, the solution for other
$\mathbf{L}$ is generally not analytical and therefore requires additional computational
overhead.

This alternate mechanism is the backbone of many schemes like Iterative Soft
Thresholding Algorithm (ISTA) [31], Fast Iterative Soft Thresholding Algorithm
(FISTA) [32], and the Alternating Direction Method of Multipliers (ADMM) [13]. In
ISTA, the $\tau$ and $\gamma$ are kept the same through the iterations. However, the rate of
global convergence is sublinear, $O(1/k)$. In FISTA, these parameters are updated
based on Nesterov acceleration. This brings a faster speed of convergence than ISTA,
with convergence rate $O(1/k^2)$. A similar convergence rate is achieved in ADMM by
decoupling the data-fidelity and regularization using auxiliary variables, and having
a separate step to explicitly update these auxiliary variables.
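
For illustration, a minimal NumPy sketch of ISTA for the special case $\mathrm{E}(\mathbf{H}\mathbf{x},\mathbf{y}) = \|\mathbf{H}\mathbf{x}-\mathbf{y}\|_2^2$ and $\mathbf{L} = \mathbf{I}$ is given below, where the proximal step reduces to soft-thresholding; the step size and the number of iterations are illustrative choices.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(H, y, lam, n_iter=500):
    """Minimize ||Hx - y||_2^2 + lam * ||x||_1 with ISTA (case L = I)."""
    gamma = 1.0 / (2.0 * np.linalg.norm(H, 2) ** 2)  # step size <= 1 / Lipschitz constant
    x = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * H.T @ (H @ x - y)                     # gradient of the data-fidelity term
        x = soft_threshold(x - gamma * grad, gamma * lam)  # proximal (regularization) update
    return x
```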
Under the assumption that the functionals E and R are convex, the solution of
(1.6) also satisfies
$$\mathbf{x}^* = \arg\min_{\mathbf{x}\in\mathcal{S}_R} \mathrm{E}(\mathbf{H}\mathbf{x}, \mathbf{y}) \qquad (1.9)$$

with $\mathcal{S}_R = \{\mathbf{x} \in \mathbb{R}^N : R(\mathbf{x}) \leq \tau\}$ for some unique $\tau$ that depends on the regulariza-
tion parameter $\lambda$. Therefore, the solution has the best data fidelity among all images
in the set $\mathcal{S}_R$, which is implicitly defined by $R$. This shows that the quality of the
reconstruction depends heavily on the prior encoder $R$. As discussed, these priors
are either handpicked (e.g., total variation (TV) or the $\ell_1$-norm of the wavelet co-
efficients of the image [5–9]) or learned through a dictionary [33–35]. However, in
either case, they are restricted to well-behaved functionals that can be minimized
via a convex routine [10–13]. This limits the type of prior knowledge that can be
injected into the algorithm.

1.3 Generation II - Supervised-Deep-Learning-Based methods
Recently, researchers have obtained new state-of-the-art results for inverse problems
with the help of neural networks [16–20]. This has resulted in the emergence
of a new generation of deep-learning-based algorithms. A neural network $\mathrm{CNN}_{\theta}$ is a
function of the form

$$\mathrm{CNN}_{\theta}(\mathbf{x}) = f_L(f_{L-1}(\ldots f_1(\mathbf{x}))), \qquad (1.10)$$

where $f_l(\mathbf{x}) = \rho(\mathbf{W}_l\mathbf{x} + \mathbf{a}_l)$ and $\theta = (\mathbf{W}_1, \mathbf{a}_1, \ldots, \mathbf{W}_L, \mathbf{a}_L)$. Here, $\rho$ is a pointwise non-
linear function; thus, $\mathrm{CNN}_{\theta}$ is composed of $L$ layers, each of which consists
of a linear map followed by a nonlinear function. The parameters $\theta$ are learnable and
thus tune the function $\mathrm{CNN}_{\theta}$ to behave in a desirable way. If the linear part
is implemented by a convolution filter, then the network is called a convolutional
neural network (CNN). These networks are more efficient for processing images
since they are able to extract their spatial correlation in a shift-invariant fashion.
This generation can be further divided into two categories.

1.3.1 Direct Feedforward Reconstruction


In these methods, it is assumed that we have access to training data

$$\{(\mathbf{x}_1, \mathbf{y}_1), \ldots, (\mathbf{x}_Q, \mathbf{y}_Q)\}$$

which constitute pairs of signal and measurement. The network is first trained to
map these measurements to their true corresponding images, and then is deployed
on the new unseen measurement. This is given by

$$\text{Training}: \quad \theta^* = \arg\min_{\theta} \sum_{q=1}^{Q} \|\mathrm{CNN}_{\theta}(\mathbf{y}_q) - \mathbf{x}_q\|^2, \qquad (1.11)$$

$$\text{Reconstruction}: \quad \mathbf{x}^* = \mathrm{CNN}_{\theta^*}(\mathbf{y}). \qquad (1.12)$$

Often, the measurement is processed by a fixed linear step like backprojection
(BP) or filtered backprojection (FBP) before being fed to the CNN [17, 18]. This
helps with convergence and implicitly injects the physical knowledge during
the learning.
Although it is now well documented that these methods have the capacity to
outperform the classical algorithms, there is still limited theoretical understanding
regarding them. Moreover, these methods are extremely sensitive to the mismatch
between the statistics of the training and testing data. This is because the mea-
surement information is injected into the reconstruction only once.
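
For illustration, a schematic PyTorch sketch of this supervised pipeline is given below; the tiny network, the random tensors standing in for the training pairs, and the optimizer settings are placeholders, and in practice the measurements would first be mapped back to the image domain by a fixed step such as FBP, as mentioned above.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: 'inputs' holds backprojected (or FBP) measurements, 'targets' the true images.
inputs = torch.randn(100, 1, 64, 64)
targets = torch.randn(100, 1, 64, 64)

cnn = nn.Sequential(                          # toy CNN; real methods use, e.g., a U-Net
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(cnn.parameters(), lr=1e-3)

# Training, as in (1.11): fit the network to map measurements to their ground-truth images.
for epoch in range(10):
    optimizer.zero_grad()
    loss = ((cnn(inputs) - targets) ** 2).mean()
    loss.backward()
    optimizer.step()

# Reconstruction, as in (1.12): feed a new (backprojected) measurement through the trained network.
x_rec = cnn(torch.randn(1, 1, 64, 64))
```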

1.3.2 Iterative Reconstruction


In these methods, the advantages of classical and deep-learning methods are com-
bined to obtain improved performance with robustness guarantees. Here the recon-
struction is performed iteratively. Each iteration constitutes a sub-step to
maximize the data fidelity and a sub-step to clean the image using a CNN. Our contri-
bution in Chapter 3 discusses this framework in more depth.

1.4 Generation III - Unsupervised Deep-Learning-based methods
The second-generation algorithms rely on the power of neural networks to
produce state-of-the-art results, but they require rich training data. The latter
may not be available for many modalities. The latest generation algorithms have
been devised to address this issue.

1.4.1 Deep Image Prior


In this algorithm [27], a neural network is optimized (not trained) to map a random
vector z to an image under the constraint that the latter is the most consistent with
the measurement y. This is formulated by

$$\text{Optimization}: \quad \theta^* = \arg\min_{\theta} \|\mathbf{H}\{\mathrm{CNN}_{\theta}(\mathbf{z})\} - \mathbf{y}\|^2, \qquad (1.13)$$

$$\text{Solution}: \quad \mathbf{x}^* = \mathrm{CNN}_{\theta^*}(\mathbf{z}), \qquad (1.14)$$

where $\theta$ are the parameters of the CNN and $\mathbf{H}\{\mathrm{CNN}_{\theta^*}(\mathbf{z})\} = \mathbf{H}\mathbf{x}^*$ are the measure-
ments associated with the reconstruction. CNNs have the ability to represent a
large set of images. Therefore, they are susceptible to reconstructing undesirable im-
ages while minimizing the data fidelity in (1.13). Yet, it has been observed that,
when the optimization is carried out using standard techniques, the algorithm recon-
structs desirable natural and biomedical images. In the case when the measurement
is corrupted with noise, early stopping is required.
This bias or prior towards desirable images has been ascribed to the architecture
of the network, which not only lets it represent natural and biomedical images more
easily than other images, but even helps it reach these desirable images faster.
In the literature, this bias is called the deep image prior.
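
For illustration, a minimal PyTorch sketch of this optimization for a generic linear forward matrix is given below; the toy generator, the random $\mathbf{H}$ and $\mathbf{y}$, and the iteration count are placeholders, and early stopping would be added when the measurements are noisy.

```python
import torch
import torch.nn as nn

N, M = 64 * 64, 1024
H = torch.randn(M, N) / M ** 0.5          # placeholder linear forward model
y = torch.randn(M)                        # placeholder measurement vector

z = torch.randn(1, 8, 64, 64)             # fixed random input; it is never optimized
cnn = nn.Sequential(                      # toy generator; deep image prior typically uses an encoder-decoder
    nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(cnn.parameters(), lr=1e-3)

for it in range(2000):                    # early stopping would act as a regularizer for noisy y
    optimizer.zero_grad()
    x = cnn(z).reshape(-1)                # candidate image, flattened to match H
    loss = ((H @ x - y) ** 2).sum()       # data fidelity only; no explicit prior term, as in (1.13)
    loss.backward()
    optimizer.step()

x_star = cnn(z).detach()                  # reconstruction, as in (1.14)
```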

1.4.2 Deep Generative Models


Methods like GANs [36] and variational autoencoders (VAEs) [37] have become quite
popular in generative modelling. Until recently, their application to inverse problems
was limited to learning priors from the training data. In this thesis, we show that
there are certain modalities like Cryo-EM where deep generative models could play
a central role in reconstruction without the need of training data. We shall delve
deeper into this aspect in Chapters 5 and 6.

1.5 Summary
In this chapter, we give an overview of the reconstruction methods to solve linear
inverse problems. We describe the optimization problems that are formulated for
these methods and the routines deployed to solve them numerically. This will act
as the background for understanding the next parts of this thesis.
Part I

First Generation

Chapter 2

Continuous-Domain Extension of Classical Methods
In Chapter 1, we formulated linear inverse problems. A subtle point to note in that
formulation was that the signal, measurement, and the forward model were repre-
sented in discrete-domain (vectors). It is important to understand that the true
signal and the forward model are continuous-domain entities. The measurements
are generally discrete since they are obtained from a finite number of detectors.
For example, MRI measurements are samples of the Fourier transform at a finite
number of different frequencies of a continuous-domain signal. In this chapter, we
further our understanding of classical methods by appropriate extension to solve
continuous-domain linear inverse problems.

¹ The content of this chapter is based on our work [29].

2.1 Overview
Although the signals and forward model that one encounters are in continuous-
domain, they are discretized in order to numerically solve the inverse problems.
This is done by choosing an arbitrary but suitable basis $\{\varphi_n\}$ and representing the
signal in the continuous domain as

$$f(x) = \sum_{n=1}^{N} f_n \varphi_n(x), \qquad (2.1)$$

where $\mathbf{f} = (f_1, \ldots, f_N) \in \mathbb{R}^N$. Given the measurements $\mathbf{y} \in \mathbb{R}^M$, the task then is to find the expansion coefficients $\mathbf{f}$ by minimizing
$$\mathbf{f}^* = \arg\min_{\mathbf{f} \in \mathbb{R}^N} \Big( \underbrace{\|\mathbf{y} - \mathbf{H}\mathbf{f}\|_2^2}_{\mathrm{I}} + \lambda \underbrace{\|\mathbf{L}\mathbf{f}\|_2^2}_{\mathrm{II}} \Big), \qquad (2.2)$$

where $\mathbf{H} \in \mathbb{R}^{M \times N}$ has elements $[\mathbf{H}]_{m,n} = \langle h_m, \varphi_n \rangle$. The analysis functions $\{h_m\}_{m=1}^{M}$ specify the forward model, which encodes the physics of the measurement process. Term I in (2.2) is the data fidelity. It ensures that the recovered signal is close to the measurements. Term II is the regularization, which encodes the prior knowledge about the signal. The regularization is imposed on some transformed version of the signal coefficients using the matrix $\mathbf{L}$. Linear reconstruction algorithms [38, 39] can be used to solve Problem (2.2). As discussed earlier, the notion that real-world signals are sparse in some basis (e.g., wavelets) has resulted in better signal reconstruction in some fields. This prior is imposed by using the sparsity-promoting $\ell_1$-regularization norm [40], [8] and results in the minimization problem
$$\mathbf{f}^* = \arg\min_{\mathbf{f} \in \mathbb{R}^N} \big( \|\mathbf{y} - \mathbf{H}\mathbf{f}\|_2^2 + \lambda \|\mathbf{L}\mathbf{f}\|_1 \big), \qquad (2.3)$$
which can be efficiently solved using iterative algorithms [7], [41]. The solutions to (2.2), (2.3), and their variants with generalized data-fidelity terms are well known [42], [43], [44], [45].
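For reference, the discrete Tikhonov problem (2.2) admits the closed-form solution $\mathbf{f}^* = (\mathbf{H}^T\mathbf{H} + \lambda\mathbf{L}^T\mathbf{L})^{-1}\mathbf{H}^T\mathbf{y}$, which follows from setting the gradient of the criterion to zero. The following NumPy sketch, given only as a point of comparison for the continuous-domain formulation developed below, assumes that the matrices and the value of $\lambda$ are provided.

```python
import numpy as np

def discrete_tikhonov(H, L, y, lam):
    # Normal equations of (2.2): (H^T H + lam L^T L) f = H^T y
    return np.linalg.solve(H.T @ H + lam * (L.T @ L), H.T @ y)
```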
While those discretization paradigms are well studied and used successfully in practice, it remains that the use of a prescribed basis $\{\varphi_n\}$, as in (2.1), is somewhat arbitrary.
In this chapter, we propose to bypass this limitation by reformulating and solv-
ing the linear inverse problem directly in the continuous domain. To that end, we
impose the regularization in the continuous domain, too, and restate the recon-
struction task as a functional minimization. We show that this new formulation
leads to the identification of an optimal basis for the solution which then suggests
a natural way to discretize the problem.

2.1.1 Contributions
Our contributions are twofold and are summarized as follows:

Theoretical.
• Given $\mathbf{y} \in \mathbb{R}^M$, we formalize the 1D inverse problem in the continuous domain as
$$f_R^* = \arg\min_{f \in \mathcal{X}} \underbrace{\big( \|\mathbf{y} - \mathrm{H}\{f\}\|_2^2 + \lambda R(f) \big)}_{J_R(f|\mathbf{y})}, \qquad (2.4)$$
where $f$ is a function that belongs to a suitable function space $\mathcal{X}$. Similarly to the discrete regularization terms $\|\mathbf{L}\mathbf{f}\|_{\ell_2}^2$ and $\|\mathbf{L}\mathbf{f}\|_{\ell_1}$ in (2.2) and (2.3), we focus on their continuous-domain counterparts $R(f) = \|\mathrm{L}f\|_{L_2}^2$ and $R(f) = \|\mathrm{L}f\|_{\mathcal{M}}$, respectively. There, $\mathrm{L}$ and $\mathrm{H}$ are the continuous-domain versions of $\mathbf{L}$ and $\mathbf{H}$, while $\|\mathrm{L}f\|_{\mathcal{M}}$ is the proper continuous-domain counterpart of the discrete $\ell_1$ norm. We show that the effect of these regularizations is similar to the one of their discrete counterparts.
• We provide the parametric form of the solution (representer theorem) that minimizes $J_R(f|\mathbf{y})$ in (2.4) for the Tikhonov regularization $R(f) = \|\mathrm{L}f\|_{L_2}^2$ and the generalized total-variation (gTV) regularization $R(f) = \|\mathrm{L}f\|_{\mathcal{M}}$. Our results underline how the discrete regularization resonates with the continuous-domain one. The optimal solution for the Tikhonov case is smooth, while it is sparse for the gTV case. The optimal bases in the two cases are intimately connected to the operators $\mathrm{L}$ and $\mathrm{H}$.

• We present theoretical results that are valid for any convex, coercive, and lower-semicontinuous data-fidelity term which is proper in the range of $\mathrm{H}$. This includes the case when the data-fidelity term is $\|\mathbf{y} - \mathrm{H}\{f\}\|_2^2$. In this sense, for the gTV case, our work extends the results in [46], which only deal with an indicator function over a feasible convex-compact set as the data-fidelity term.

Algorithmic.
• We propose a discretization scheme to minimize $J_R(f|\mathbf{y})$ in the continuous domain. Even though the minimization of $J_R(f|\mathbf{y})$ is an infinite-dimensional problem, the knowledge of the optimal basis of the solution makes the problem finite-dimensional: it boils down to the search for a set of optimal expansion coefficients.

• We devise an algorithm to find a sparse solution when the gTV solution is non-unique. In this case, the optimization problem turns out to be a LASSO [43] minimization with a non-unique solution. We introduce a combination of FISTA [32] and the simplex algorithm to find a sparse solution, which we prove to be an extreme point of the solution set.

This chapter is organized as follows: In Sections 2.2 and 2.4, we present the formulation and the theoretical results of the inverse problem for the two regularization cases. In Section 2.5, we compare the solutions of the two cases. We present our numerical algorithm in Section 2.6 and illustrate its behavior with various examples in Section 2.7. The mathematical proofs of the main theorems are given in the appendices.

2.2 Continuous-Domain Formulation of Inverse Problems
In our formulation of a linear inverse problem, the signal $f$ is a function of the continuous-domain variable $x \in \mathbb{R}$. The task is then to recover $f$ from the vector of measurements $\mathbf{y} = \mathrm{H}\{f\} + \mathbf{n} \in \mathbb{R}^M$, where $\mathbf{n}$ is an unknown noise component that is typically assumed to be i.i.d. Gaussian.
In the customary discrete formulation, the basis of the recovered function is already chosen and, therefore, all that remains is to recover the expansion coefficients of the signal representation (2.1). In this scenario, one often includes matrices $\mathbf{H}$ and $\mathbf{L}$ that directly operate on these coefficients. However, for our continuous-domain formulation, the operations have to act directly on the function $f$. For this reason, we also need the continuous-domain counterparts of the measurement and regularization operators. The entities that enter our formulation are described next.

2.2.1 Measurement Operator


The system matrix $\mathbf{H}$ in (2.2) and (2.3) is henceforth replaced by the operator $\mathrm{H} : \mathcal{X} \to \mathbb{R}^M$ that maps the continuous-domain functions living in a Banach space $\mathcal{X}$ to the linear measurements $\mathbf{y} \in \mathbb{R}^M$. This operator is described as
$$\mathrm{H}\{f\} = \big( \langle h_1, f \rangle, \ldots, \langle h_M, f \rangle \big) = (y_1, \ldots, y_M) = \mathbf{y}, \qquad (2.5)$$
where $\langle h, g \rangle = \int_{\mathbb{R}} h(x)\,g(x)\,\mathrm{d}x$, which in the case of generalized functions should be interpreted as the duality product. Furthermore, the map $h_m : f \mapsto \langle h_m, f \rangle$ is assumed to be continuous $\mathcal{X} \to \mathbb{R}$. For example, the components of the measurement operator that samples a function at the locations $x_1, \ldots, x_M$ are represented by $h_m = \delta(\cdot - x_m)$, such that $\langle \delta(\cdot - x_m), f \rangle = f(x_m)$. Similarly, Fourier measurements at pulsations $\omega_1, \ldots, \omega_M$ are obtained by taking $h_m = \mathrm{e}^{-\mathrm{j}\omega_m \cdot}$.

2.2.2 Data-Fidelity Term


As an extension of the conventional quadratic data-fidelity term $\|\mathbf{y} - \mathbf{H}\mathbf{f}\|_2^2$, we consider a general cost functional $E(\mathbf{y}, \cdot) : \mathbb{R}^M \to \mathbb{R}^+ \cup \{\infty\}$ with some assumptions (see Assumption 2 in Section 2.4) that measures the discrepancy between the given measurements $\mathbf{y}$ and the values $\mathrm{H}\{f\}$ predicted from the reconstruction. A relevant example is the weighted quadratic data-fidelity term, which is often used when the measurement noise is Gaussian with diagonal covariance. Similarly, we can use $\|\mathbf{y} - \mathrm{H}\{f\}\|_1$, for example, when the additive noise is Laplacian. Alternatively, when the measurements are noiseless, we use the indicator function
$$I(\mathbf{y}, \mathrm{H}\{f\}) = \begin{cases} 0, & \mathbf{y} = \mathrm{H}\{f\} \\ \infty, & \mathbf{y} \neq \mathrm{H}\{f\}, \end{cases} \qquad (2.6)$$
which imposes an exact fit.

2.2.3 Regularization Operator


Since the underlying signal is continuously defined, we need to replace the regularization matrix $\mathbf{L}$ in (2.2) and (2.3) by a regularization operator $\mathrm{L} : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ are appropriate (generalized) function spaces to be defined in Section 2.2.5. The typical example that we have in mind is the derivative operator $\mathrm{L} = \mathrm{D} = \frac{\mathrm{d}}{\mathrm{d}x}$.
The continuous-domain regularization is then imposed on $\mathrm{L}f$. We assume that the operator $\mathrm{L}$ is admissible in the sense of Definition 2.2.1.

Definition 2.2.1. The operator $\mathrm{L} : \mathcal{X} \to \mathcal{Y}$ is called spline-admissible if

• it is linear and shift-invariant;

• its null space $\mathcal{N}_{\mathrm{L}} = \{p \in \mathcal{X} : \mathrm{L}p = 0\}$ is finite-dimensional;

• it admits the Green's function $\rho_{\mathrm{L}} : \mathbb{R} \to \mathbb{R}$ with the property that $\mathrm{L}\rho_{\mathrm{L}} = \delta$.

Given that $\widehat{L}$ is the frequency response of $\mathrm{L}$, the Green's function can be calculated through the inverse Fourier transform $\rho_{\mathrm{L}} = \mathcal{F}^{-1}\big\{\tfrac{1}{\widehat{L}}\big\}$. For example, if $\mathrm{L} = \mathrm{D}$, then $\rho_{\mathrm{D}}(x) = \tfrac{1}{2}\,\mathrm{sign}(x)$. Here, the Fourier transform $\mathcal{F} : f \mapsto \mathcal{F}f = \int_{\mathbb{R}} f(x)\,\mathrm{e}^{-\mathrm{j}x(\cdot)}\,\mathrm{d}x$ is defined when the function is integrable and can be extended in the usual manner to $f \in \mathcal{S}'(\mathbb{R})$, where $\mathcal{S}'(\mathbb{R})$ is the Schwartz space of tempered distributions. In cases such as $\rho_{\mathrm{L}} = \mathcal{F}^{-1}\big\{\tfrac{1}{\widehat{L}}\big\}$ when the argument is non-integrable, the definition should be seen in terms of the generalized Fourier transform [47, Definition 8.9], which treats the argument as a distribution.

2.2.4 Regularization Norms


Since the optimization is done in the continuous domain, we also have to specify the proper counterparts of the $\ell_2$ and $\ell_1$ norms, as well as the corresponding vector spaces.

i. Quadratic (or Tikhonov) regularization: $R_2(f) = \|\mathrm{L}f\|_{L_2}^2$, where
$$\|w\|_{L_2}^2 = \int_{\mathbb{R}} |w(x)|^2\,\mathrm{d}x. \qquad (2.7)$$

ii. Generalized total variation: $R_1(f) = \|\mathrm{L}f\|_{\mathcal{M}}$, where
$$\|w\|_{\mathcal{M}} = \sup_{\varphi \in \mathcal{S}(\mathbb{R}),\, \|\varphi\|_{\infty} = 1} \langle w, \varphi \rangle. \qquad (2.8)$$
There, $\mathcal{S}(\mathbb{R})$ is the Schwartz space of smooth and rapidly decaying functions, which is also the dual of $\mathcal{S}'(\mathbb{R})$. Moreover, $\mathcal{M} = \{w \in \mathcal{S}'(\mathbb{R}) \,|\, \|w\|_{\mathcal{M}} < \infty\}$. In particular, when $w \in L_1 \subset \mathcal{M}$, we have that
$$\|w\|_{\mathcal{M}} = \int_{\mathbb{R}} |w(x)|\,\mathrm{d}x = \|w\|_{L_1}. \qquad (2.9)$$
Yet, we note that $\mathcal{M}$ is slightly larger than $L_1$ since it also includes the Dirac distribution $\delta$ with $\|\delta\|_{\mathcal{M}} = 1$. The popular TV norm is recovered by taking $\|f\|_{\mathrm{TV}} = \|\mathrm{D}f\|_{\mathcal{M}}$ [46].

2.2.5 Search Space


The Euclidean search space $\mathbb{R}^N$ is replaced by spaces of functions, namely,
$$\mathcal{X}_2 = \{f : \mathbb{R} \to \mathbb{R} \,|\, \|\mathrm{L}f\|_{L_2} < +\infty\}, \qquad (2.10)$$
$$\mathcal{X}_1 = \{f : \mathbb{R} \to \mathbb{R} \,|\, \|\mathrm{L}f\|_{\mathcal{M}} < +\infty\}. \qquad (2.11)$$
In other words, our search (or native) space is the largest space over which the regularization is well defined. It turns out that $\mathcal{X}_2$ and $\mathcal{X}_1$ are Hilbert and Banach spaces, respectively. However, this is nontrivial to see since these spaces contain the null space, which makes $\|\mathrm{L}f\|_{L_2}$ and $\|\mathrm{L}f\|_{\mathcal{M}}$ semi-norms. This null space can be taken care of by using an appropriate inner product $\langle \cdot, \cdot \rangle_{\mathcal{N}_{\mathrm{L}}}$ (norm $\|\cdot\|_{\mathcal{N}_{\mathrm{L}}}$, respectively) such that $\langle \cdot, \cdot \rangle_{\mathcal{X}_2} = \langle \mathrm{L}\cdot, \mathrm{L}\cdot \rangle + \langle \cdot, \cdot \rangle_{\mathcal{N}_{\mathrm{L}}}$ ($\|\cdot\|_{\mathcal{X}_1} = \|\mathrm{L}\cdot\|_{\mathcal{M}} + \|\cdot\|_{\mathcal{N}_{\mathrm{L}}}$, respectively) is the inner product (norm, respectively) on $\mathcal{X}_2$ ($\mathcal{X}_1$, respectively). The structure of these spaces has been studied in [46] and is recalled in Appendix A.1.6.
As we shall see in Section 2.4, the solution of (2.4) will be composed of splines; therefore, we also review the definition of splines.

Definition 2.2.2 (Nonuniform L-spline). A function $f : \mathbb{R} \to \mathbb{R}$ is called a nonuniform L-spline with spline knots $(x_1, \ldots, x_K)$ and weights $(a_1, \ldots, a_K)$ if
$$\mathrm{L}f = \sum_{k=1}^{K} a_k\, \delta(\cdot - x_k). \qquad (2.12)$$

By solving the differential equation in (2.12), we find that the generic form of the nonuniform spline $f$ is
$$f = p_0 + \sum_{k=1}^{K} a_k\, \rho_{\mathrm{L}}(\cdot - x_k), \qquad (2.13)$$
where $p_0 \in \mathcal{N}_{\mathrm{L}}$.

2.3 Related Work


The use of $R(f) = \|\mathrm{L}f\|_{L_2}^2$ goes back to Tikhonov's theory of regularization [38] and to kernel methods in machine learning [48]. In the learning community, representer theorems (RT) as in [49], [50] use the theory of reproducing-kernel Hilbert spaces (RKHS) to state the solution of the problem for the restricted case where the measurements are samples of the function. For the generalized-measurement case, there are also tight connections between these techniques and variational splines and radial-basis functions [51], [52], [47]. These representer theorems, however, either have restrictions on the empirical risk functional or on the class of measurement operators.
Specific spline-based methods with quadratic regularization have been developed for inverse problems. In particular, [53], [54] used variational calculus. Here, we strengthen these results by proving the uniqueness and existence of the solution of (2.4) for $R(f) = \|\mathrm{L}f\|_{L_2}^2$. We revisit the derivation of the result using the theory of RKHS.
Among more recent non-quadratic techniques, the most popular ones rely on total-variation (TV) regularization, which was introduced as a noise-removal technique in [55] and is widely used in computational imaging and compressed sensing, although always in discrete settings. Splines as solutions of TV problems for restricted scenarios have been discussed in [56]. More recently, a RT for the continuous-domain $R(f) = \|\mathrm{L}f\|_{\mathcal{M}}$ in a general setting has been established in [46], extending the seminal work of Fisher and Jerome [57]. The solution has been shown to be composed of splines that are directly linked to the differential operator $\mathrm{L}$. Other recent contributions on inverse problems in the space of measures include [58–62]. In particular, in this chapter, we extend the result of [46] to an unconstrained version of the problem. The unconstrained formulation is useful in devising numerical algorithms, which are one of the main contributions of our work. In addition, our results are valid for a much larger set of data-fidelity terms than [46]. This is useful in practical scenarios where one may choose the data-fidelity term depending on factors like the distribution of the noise.

2.4 Theoretical Results


To state our theorems, we need some technical assumptions.

Assumption 1. Let the search space $\mathcal{X}$ and the regularization space $\mathcal{Y}$ be Banach spaces such that the following holds.

i. The functionals $h_m$ for $m \in \{1, \ldots, M\}$ are linear and continuous over $\mathcal{X}$, and the vector-valued functional $\mathrm{H} : \mathcal{X} \to \mathbb{R}^M$ gives the linear measurements $f \mapsto \mathrm{H}\{f\} = (\langle h_1, f \rangle, \ldots, \langle h_M, f \rangle)$.

ii. The regularization operator $\mathrm{L} : \mathcal{X} \to \mathcal{Y}$ is spline-admissible. Its finite-dimensional null space $\mathcal{N}_{\mathrm{L}}$ has the basis $\mathbf{p} = (p_1, \ldots, p_{N_0})$.

iii. The inverse problem is well posed over the null space. This means that, for any pair $p_1, p_2 \in \mathcal{N}_{\mathrm{L}}$, we have that
$$\mathrm{H}\{p_1\} = \mathrm{H}\{p_2\} \Leftrightarrow p_1 = p_2. \qquad (2.14)$$
In other words, different null-space functions result in different measurements.

In particular, Condition iii) is equivalent to $\mathcal{N}_{\mathrm{L}} \cap \mathcal{N}_{\mathrm{H}} = \{0\}$, where $\mathcal{N}_{\mathrm{H}}$ is the null space of the vector-valued measurement functional. This property prevents from having a nonzero $f_0 \in \mathcal{N}_{\mathrm{L}} \cap \mathcal{N}_{\mathrm{H}}$ whose addition to any $f \in \mathcal{X}$ can be detected neither by the data-fidelity term nor by the regularization term. This is essential in ensuring the boundedness of the set of minimizers.

Assumption 2. For a given $\mathbf{y} \in \mathbb{R}^M$, the functional $E(\mathbf{y}, \cdot) : \mathbb{R}^M \to \mathbb{R}^+ \cup \{\infty\}$ is convex, coercive, and lower semi-continuous on the whole of $\mathbb{R}^M$, and is proper (has a finite value for at least one input) in the range of $\mathrm{H}$.

Assumption 2'. For a given $\mathbf{y} \in \mathbb{R}^M$, the functional $E(\mathbf{y}, \cdot)$ satisfies Assumption 2 as well as one of the following.

i. It is strictly convex; or

ii. it is an indicator function $I(\mathbf{y}, \cdot)$.

As we shall see later, stronger results can be derived for the $E(\mathbf{y}, \cdot)$ that satisfy Assumption 2'.
Two remarks are in order. Firstly, the condition of being proper in the range of $\mathrm{H}$ implies that there exists an $f \in \mathcal{X}$ such that $E(\mathbf{y}, \mathrm{H}\{f\})$ is finite. Secondly, when $E(\mathbf{y}, \cdot)$ is strictly convex or is such that its range does not include $\infty$, it is redundant to ensure that it is proper in the range of $\mathrm{H}$.
We now state our two main results. Their proofs are given in Appendix A.1.1 and Appendix A.1.2, respectively.

2.4.1 Inverse Problem with Tikhonov/$L_2$ Regularization

Theorem 2.4.1. Let Assumptions 1 and 2 hold with the search space $\mathcal{X} = \mathcal{X}_2$ and regularization space $\mathcal{Y} = L_2$. Then, the set
$$\mathcal{V}_2 = \arg\min_{f \in \mathcal{X}_2} \big( E(\mathbf{y}, \mathrm{H}\{f\}) + \lambda \|\mathrm{L}f\|_{L_2}^2 \big) \qquad (2.15)$$
of minimizers is nonempty, convex, and such that any $f_2^* \in \mathcal{V}_2$ is of the form
$$f_2^*(x) = \sum_{m=1}^{M} a_m \varphi_m(x) + \sum_{n=1}^{N_0} b_n p_n(x), \qquad (2.16)$$
where $\varphi_m = \mathcal{F}^{-1}\big\{\tfrac{\widehat{h}_m}{|\widehat{L}|^2}\big\}$, and $\mathbf{a} = (a_1, \ldots, a_M)$ and $\mathbf{b} = (b_1, \ldots, b_{N_0})$ are expansion coefficients such that
$$\sum_{m=1}^{M} a_m \langle h_m, p_n \rangle = 0 \qquad (2.17)$$
for all $n \in \{1, \ldots, N_0\}$. Moreover, if $E(\mathbf{y}, \cdot)$ satisfies Assumption 2', then the minimizer is unique (the set $\mathcal{V}_2$ is a singleton).

2.4.2 Inverse Problem with gTV Regularization

Theorem 2.4.2. Let Assumptions 1 and 2 hold for the search space $\mathcal{X} = \mathcal{X}_1$ and regularization space $\mathcal{Y} = \mathcal{M}$. Moreover, assume that $\mathrm{H}$ is weak*-continuous (see Appendix A.1.6). Then, the set
$$\mathcal{V}_1 = \arg\min_{f \in \mathcal{X}_1} \big( E(\mathbf{y}, \mathrm{H}\{f\}) + \lambda \|\mathrm{L}f\|_{\mathcal{M}} \big) \qquad (2.18)$$
of minimizers is nonempty, closed-convex, weak*-compact, and its extreme points are nonuniform L-splines of the form
$$f_1^*(x) = \sum_{k=1}^{K} a_k\, \rho_{\mathrm{L}}(x - x_k) + \sum_{n=1}^{N_0} b_n\, p_n(x) \qquad (2.19)$$
for some $K \leq (M - N_0)$. The unknown knots $(x_1, \ldots, x_K)$ and the expansion coefficients $\mathbf{a} = (a_1, \ldots, a_K)$ and $\mathbf{b} = (b_1, \ldots, b_{N_0})$ are the parameters of the solution, with $\|\mathrm{L}f_1^*\|_{\mathcal{M}} = \|\mathbf{a}\|_1$. The solution set $\mathcal{V}_1$ is the closed convex hull of these extreme points. Moreover, if Assumption 2' is satisfied, then all the solutions have the same measurement; i.e., $\mathbf{y}_{\mathcal{V}_1} = \mathrm{H}\{f\}$, $\forall f \in \mathcal{V}_1$.

A sufficient condition for the weak*-continuity of $h_m$ is $\int_{\mathbb{R}} |h_m(x)|(1 + |x|)^{D}\,\mathrm{d}x < \infty$ ([46, Theorem 6]), meaning that $h_m$ should exhibit some minimal decay at infinity (see Appendix A.1.6). Here, $D = \inf\{n \in \mathbb{N} : (\mathrm{ess\,sup}_{x \in \mathbb{R}}\, \rho_{\mathrm{L}}(x)(1 + |x|)^{-n}) < +\infty\}$. Ideal sampling is feasible as well, provided that $\rho_{\mathrm{L}}$ is continuous; a detailed proof of the weak*-continuity of $\delta(\cdot - x_n)$ for the case $\mathrm{L} = \mathrm{D}^2$ can be found in [63, Proposition 6].
We remark that [46, Theorem 2] is a special case of Theorem 2.4.2. The former states the same result as Theorem 2.4.2 for the minimization problem
$$\mathcal{V}_1 = \arg\min_{\mathrm{H}\{f\} \in \mathcal{C}} \|\mathrm{L}f\|_{\mathcal{M}}, \qquad (2.20)$$
where $\mathcal{C}$ is feasible, convex, and compact. Feasibility of $\mathcal{C}$ means that the set $\mathcal{C}_{\mathcal{X}_1} = \{f \in \mathcal{X}_1 : \mathrm{H}\{f\} \in \mathcal{C}\}$ is nonempty. In our setting, Problem (2.20) can be obtained by using an indicator function over the feasible set $\mathcal{C}$ as the data-fidelity term. However, Theorem 2.4.2 covers other more useful cases of $E$; for example, $\|\mathbf{y} - \mathrm{H}\{f\}\|_1$ and $\|\mathbf{y} - \mathrm{H}\{f\}\|_2^2$. Moreover, as discussed earlier, when data-fidelity terms are strictly convex or do not have $\infty$ in their range, they are proper in the range of $\mathrm{H}$ for any $\mathbf{y} \in \mathbb{R}^M$. This means that they do not require a careful selection of $\mathcal{C}$ in order to satisfy the feasibility condition. This is helpful in directly devising and deploying algorithms to find the minimizers.
Also, fundamentally, (2.20) only penalizes the regularization value, whereas Theorem 2.4.2 additionally penalizes a data-fidelity term that can recover more desirable solutions. In fact, Theorem 2.4.2 also covers cases such as
$$\mathcal{V}_1 = \arg\min_{\mathrm{H}\{f\} \in \mathcal{C}} \big( E(\mathbf{y}, \mathrm{H}\{f\}) + \lambda \|\mathrm{L}f\|_{\mathcal{M}} \big), \qquad (2.21)$$
which allow more control than (2.20) over the data fidelity of the recovered solution.

2.4.3 Illustration with Ideal Sampling

Here, we discuss the regularized case where noisy data points $((x_1, y_1), \ldots, (x_M, y_M))$ are fitted by a function. The measurement functionals in this case are the shifted Dirac impulses $h_m = \delta(\cdot - x_m)$, whose Fourier transform is $\widehat{h}_m(\omega) = \mathrm{e}^{-\mathrm{j}\omega x_m}$. We choose $\mathrm{L} = \mathrm{D}^2$ and $E = \|\mathbf{y} - \mathrm{H}\{f\}\|_2^2$, which satisfies Assumption 2'.i). Here, $\mathrm{D}^2$ is the generalized second-order derivative. For the $L_2$ problem, we have that
$$f_2^* = \arg\min_{f \in \mathcal{X}_2} \bigg( \sum_{m=1}^{M} |y_m - f(x_m)|^2 + \lambda \|\mathrm{D}^2 f\|_{L_2}^2 \bigg). \qquad (2.22)$$
As given in Theorem 2.4.1, $f_2^*$ is unique and has the basis function $\varphi_m(x) = \mathcal{F}^{-1}\big\{\tfrac{\mathrm{e}^{-\mathrm{j}(\cdot)x_m}}{|(\cdot)^2|^2}\big\}(x) = \tfrac{1}{12}|x - x_m|^3$. The resulting solution is piecewise cubic. It can be expressed as
$$f_2^*(x) = b_1 + b_2 x + \sum_{m=1}^{M} a_m \tfrac{1}{12}|x - x_m|^3, \qquad (2.23)$$
where $b_1 + b_2 x \in \mathcal{N}_{\mathrm{D}^2}$ is a linear function.

We contrast (2.22) with the gTV version
$$f_1^* = \arg\min_{f \in \mathcal{X}_1} \bigg( \sum_{m=1}^{M} |y_m - f(x_m)|^2 + \lambda \underbrace{\|\mathrm{D}^2 f\|_{\mathcal{M}}}_{\|\mathrm{D}f\|_{\mathrm{TV}}} \bigg). \qquad (2.24)$$
In this scenario, the term $\|\mathrm{D}^2 f\|_{\mathcal{M}}$ is the total variation of the function $\mathrm{D}f$. It penalizes solutions whose slope varies too much from one point to the next.
The Green's function in this case is $\rho_{\mathrm{D}^2}(x) = \tfrac{|x|}{2}$. Based on Theorem 2.4.2, any extreme point of (2.24) is of the form
$$f_1^*(x) = b_1 + b_2 x + \frac{1}{2}\sum_{k=1}^{K} a_k' |x - \tau_k|, \qquad (2.25)$$
which is a piecewise-linear function composed of a linear term $b_1 + b_2 x$ and $K \leq (M - 1)$ basis functions $\{|x - \tau_k|\}_{k=1}^{K}$. The knots (or locations) $\{\tau_k\}_{k=1}^{K}$ are not fixed a priori and usually differ from the measurement points $\{x_m\}_{m=1}^{M}$.
The two solutions and their basis functions are illustrated in Figure 2.1 for specific data. This example demonstrates that the mere replacement of the $L_2$ penalty with the gTV norm has a fundamental effect on the solution: piecewise-cubic functions having knots at the sampling locations are replaced by piecewise-linear functions with a lesser number of adaptive knots. Moreover, in the gTV case, the regularization has been imposed on the generalized second-order derivative of the function, $\|\mathrm{D}^2 f\|_{\mathcal{M}}$, which uncovers the innovations $\mathrm{D}^2 f_1^* = \sum_{k=1}^{K} a_k'\, \delta(\cdot - \tau_k)$. By contrast, when $R_2(f) = \|\mathrm{D}^2 f\|_{L_2}^2 = \langle \mathrm{D}^{2*}\mathrm{D}^2 f, f \rangle$, the recovered solution is such that $\mathrm{D}^{2*}\mathrm{D}^2 f_2^* = \sum_{m=1}^{M} a_m\, \delta(\cdot - x_m)$, where $\mathrm{D}^{2*} = \mathrm{D}^2$ is the adjoint operator of $\mathrm{D}^2$. Thus, in both cases, the recovered functions are composed of the Green's functions of the corresponding active operators: $\mathrm{D}^2$ vs. $\mathrm{D}^{2*}\mathrm{D}^2 = \mathrm{D}^4$.

2.5 Comparison
We now discuss and contrast the results of Theorems 2.4.1 and 2.4.2. In either
case, the solution is composed of a primary component and a null-space component
whose regularization cost vanishes.
Figure 2.1: Reconstructions of a signal from nonuniform samples for $\mathrm{L} = \mathrm{D}^2$: (a) Tikhonov ($L_2$) vs. gTV solution, and (b) corresponding basis functions $\rho_{\mathrm{D}^2}$ vs. $\rho_{\mathrm{D}^{2*}\mathrm{D}^2}$.

Nature of the Primary Component. The solutions for the gTV regularization are composed of atoms within the infinitely large dictionary $\{\rho_{\mathrm{L}}(\cdot - \tau)\}$, $\forall \tau \in \mathbb{R}$, whose shapes depend only on $\mathrm{L}$. In contrast, the $L_2$ solutions are composed of fixed atoms $\{\varphi_m\}_{m=1}^{M}$ whose shapes depend on both $\mathrm{L}$ and $\mathrm{H}$. As the shape of the atoms of the gTV solutions does not depend on $\mathrm{H}$, this makes it easier to inject
prior knowledge in that case. The weights and the location of the atoms of the gTV
solution are adaptive and found through a data-dependent procedure which results
in a sparse solution that turns out to be a nonuniform spline. By contrast, the L2
solution lives in a fixed finite-dimensional space.
Null-Space Component. The second component in either solution belongs to
the null space of the operator L. As its contribution to regularization vanishes, the
solutions tend to have large null-space components in both instances.
Oscillations. The modulus of the Fourier transform of the basis function of the gTV case, $\tfrac{1}{\widehat{L}}$, typically decays faster than that of the $L_2$ case, $\tfrac{\widehat{h}_m}{|\widehat{L}|^2}$. Therefore, the gTV solution exhibits weaker Gibbs oscillations at edges.
Uniqueness of the Solution. Our hypotheses guarantee existence. Moreover,
the minimizer of the L2 problem is unique when Assumption 2’ is true. By contrast,
even for this special category of E(y, ·), the gTV problem can have infinitely many
solutions, despite all having the same measurements. Remarkably, however, when
the gTV solution is unique, it is guaranteed to be an L-spline.
Nature of the Regularized Function. One of the main differences between the reconstructions $f_2^*$ and $f_1^*$ is their sparsity. Indeed, $\mathrm{L}f_1^*$ uncovers Dirac impulses situated at $(M - 1)$ locations for the gTV case, with $\mathrm{L}f_1^* = \sum_{m=1}^{M-1} a_m\, \delta(\cdot - \tau_m)$. In return, $\mathrm{L}f_2^*$ is a nonuniform L-spline convolved with the measurement functions, whose temporal support is not localized. This allows us to say that the gTV solution is sparser than the Tikhonov solution.

2.6 Discretization and Algorithms


We now lay down the discretization procedure that translates the continuous-domain optimization into a more tractable finite-dimensional problem. Theorems 2.4.1 and 2.4.2 imply that the infinite-dimensional solution lives in a finite-dimensional space that is characterized by the basis functions $\{\varphi_m\}_{m=1}^{M}$ for $L_2$ and $\{\rho_{\mathrm{L}}(\cdot - \tau_k)\}_{k=1}^{K}$ for gTV, in addition to $\{p_n\}_{n=1}^{N_0}$ as basis of the null space. Therefore, the solutions can be uniquely expressed with respect to the finite-dimensional parameters $\mathbf{a} \in \mathbb{R}^M$ or $\mathbf{a} \in \mathbb{R}^K$, respectively, and $\mathbf{b} \in \mathbb{R}^{N_0}$. Thus, the objective functional $J_{R_i}(f|\mathbf{y}, \lambda)$, for a given $i \in \{1, 2\}$, can be discretized to get the objective functional $J_{R_i}(\mathbf{a}, \mathbf{b}|\mathbf{y}, \lambda)$. Its minimization is done numerically, by expressing $\mathrm{H}\{f\}$ and $\|\mathrm{L}f\|_{L_2}^2$ or $\|\mathrm{L}f\|_{\mathcal{M}}$ in terms of $\mathbf{a}$ and $\mathbf{b}$. We discuss the strategy to obtain $J_{R_i}(\mathbf{a}, \mathbf{b}|\mathbf{y}, \lambda)$ and its minima for the two cases. From now onwards, we will use $J_i$ for $J_{R_i}$, where $i \in \{1, 2\}$.

2.6.1 Tikhonov Regularization


For the $L_2$ regularization, given $\lambda > 0$, the solution
$$f_2^* = \arg\min_{f \in \mathcal{X}_2} \underbrace{\big( E(\mathbf{y}, \mathrm{H}\{f\}) + \lambda \|\mathrm{L}f\|_{L_2}^2 \big)}_{J_2(f|\mathbf{y},\lambda)} \qquad (2.26)$$
can be expressed as
$$f_2^* = \sum_{m=1}^{M} a_m \varphi_m + \sum_{n=1}^{N_0} b_n p_n. \qquad (2.27)$$
Recall that $\mathrm{L}^*\mathrm{L}\varphi_m = h_m$, so that
$$\mathrm{L}^*\mathrm{L} f_2^* = \sum_{m=1}^{M} a_m h_m. \qquad (2.28)$$
The corresponding $J_2(\mathbf{a}, \mathbf{b}|\mathbf{y}, \lambda)$ is then found by expressing $\mathrm{H}\{f_2^*\}$ and $\|\mathrm{L}f_2^*\|_{L_2}^2$ in terms of $\mathbf{a}$ and $\mathbf{b}$. Due to the linearity of the model,
$$\mathrm{H}\{f_2^*\} = \sum_{m=1}^{M} a_m \mathrm{H}\{\varphi_m\} + \sum_{n=1}^{N_0} b_n \mathrm{H}\{p_n\} = \mathbf{V}\mathbf{a} + \mathbf{W}\mathbf{b}, \qquad (2.29)$$
where $[\mathbf{V}]_{m,n} = \langle h_m, \varphi_n \rangle$ and $[\mathbf{W}]_{m,n} = \langle h_m, p_n \rangle$. Similarly,
$$\langle \mathrm{L}f_2^*, \mathrm{L}f_2^* \rangle = \langle \mathrm{L}^*\mathrm{L}f_2^*, f_2^* \rangle = \Big\langle \sum_{m=1}^{M} a_m h_m,\, f_2^* \Big\rangle \qquad (2.30)$$
$$= \mathbf{a}^T\mathbf{V}\mathbf{a} + \mathbf{a}^T\mathbf{W}\mathbf{b} = \mathbf{a}^T\mathbf{V}\mathbf{a}, \qquad (2.31)$$
where (2.30) uses (2.28) and where (2.31) uses the orthogonality property (2.17), which we can restate as $\mathbf{a}^T\mathbf{W} = \mathbf{0}$. By substituting these reduced forms in (2.26), the discretized problem becomes
$$(\mathbf{a}^*, \mathbf{b}^*) = \arg\min_{\mathbf{a},\mathbf{b}} \underbrace{\big( E(\mathbf{y}, \mathbf{V}\mathbf{a} + \mathbf{W}\mathbf{b}) + \lambda\, \mathbf{a}^T\mathbf{V}\mathbf{a} \big)}_{J_2(\mathbf{a},\mathbf{b}|\mathbf{y},\lambda)}. \qquad (2.32)$$
Due to Assumption 2, this problem is convex. If $E$ is differentiable with respect to the parameters, the solution can be found by gradient descent.
When $E(\mathbf{y}, \mathrm{H}\{f\}) = \|\mathbf{y} - \mathrm{H}\{f\}\|_2^2$, the problem reduces to
$$\arg\min_{\mathbf{a},\mathbf{b}} \underbrace{\big( \|\mathbf{y} - (\mathbf{V}\mathbf{a} + \mathbf{W}\mathbf{b})\|_2^2 + \lambda\, \mathbf{a}^T\mathbf{V}\mathbf{a} \big)}_{J_2(\mathbf{a},\mathbf{b}|\mathbf{y},\lambda)}, \qquad (2.33)$$
which is very similar to (2.2). This criterion is convex with respect to the coefficients $\mathbf{a}$ and $\mathbf{b}$. Setting the gradient of $J_2$ with respect to $\mathbf{a}$ and $\mathbf{b}$ to $\mathbf{0}$ yields $M$ linear equations in the $M + N_0$ variables, while the orthogonality property (2.17) gives $N_0$ additional constraints. The combined equations correspond to the linear system
$$\begin{bmatrix} \mathbf{V} + \lambda\mathbf{I} & \mathbf{W} \\ \mathbf{W}^T & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0} \end{bmatrix}. \qquad (2.34)$$
The system matrix so obtained can be proven to be invertible, due to the positive definiteness of Gram matrices generated in an RKHS and the admissibility condition of the measurement functionals (Assumption 1). The consequence is that the reconstructed signal can be obtained by solving a linear system of equations, for instance by QR decomposition or by simple matrix inversion. The derived solution is the same as the least-squares solution in [54].
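As a concrete illustration, the following NumPy sketch assembles and solves the linear system (2.34); the matrices $\mathbf{V}$ and $\mathbf{W}$ are assumed to be precomputed from the basis functions and the measurement functionals. For the ideal-sampling example of Section 2.4.3, one would set $[\mathbf{V}]_{m,n} = |x_m - x_n|^3/12$ and the $m$th row of $\mathbf{W}$ to $(1, x_m)$.

```python
import numpy as np

def solve_tikhonov_system(V, W, y, lam):
    M, N0 = W.shape
    A = np.block([[V + lam * np.eye(M), W],
                  [W.T,                 np.zeros((N0, N0))]])
    rhs = np.concatenate([y, np.zeros(N0)])
    sol = np.linalg.solve(A, rhs)          # direct solve of (2.34); QR is also possible
    return sol[:M], sol[M:]                # expansion coefficients a and b of (2.27)
```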

2.6.2 gTV Regularization


In the case of gTV regularization, the problem to solve is
$$f_1^* = \arg\min_{f \in \mathcal{X}_1} \underbrace{\big( E(\mathbf{y}, \mathrm{H}\{f\}) + \lambda \|\mathrm{L}f\|_{\mathcal{M}} \big)}_{J_1(f|\mathbf{y},\lambda)}. \qquad (2.35)$$
According to Theorem 2.4.2, an extreme-point solution of (2.35) is
$$f_1^*(x) = \sum_{k=1}^{K} a_k\, \rho_{\mathrm{L}}(x - \tau_k) + \sum_{n=1}^{N_0} b_n\, p_n(x) \qquad (2.36)$$
and satisfies
$$\mathrm{L}f_1^* = w_1 = \sum_{k=1}^{K} a_k\, \delta(\cdot - \tau_k) \qquad (2.37)$$
with $K \leq (M - N_0)$. Theorem 2.4.2 implies that we only have to recover $a_k$, $\tau_k$, and the null-space component $p$ to recover $f_1^*$.
In our experiments, we shall consider the case of measurement functionals whose support is limited to $[0, T]$. We therefore only reconstruct the restriction of the signal to this interval. Since we usually know neither $K$ nor $\tau_k$ beforehand, our solution is to quantize the $x$-axis and look for $\tau_k$ in the range $[0, T]$ on a grid with $N \gg K$ points. We control the quantization error with the grid step $\Delta = T/N$.
The discretized problem is then to find $\mathbf{a} \in \mathbb{R}^N$ with fewer than $(M - N_0)$ nonzero coefficients and $\mathbf{b} \in \mathbb{R}^{N_0}$ such that
$$f_{1,\Delta}(x) = \sum_{n=0}^{N-1} a_n\, \rho_{\mathrm{L}}(x - n\Delta) + \sum_{n=1}^{N_0} b_n\, p_n(x), \qquad (2.38)$$
with $K \leq (M - N_0) \ll N$ nonzero coefficients $a_n$, satisfies a computationally feasible variant of (2.35). In other words, we solve the restricted version of (2.35)
$$\min_{f \in \mathcal{X}_{1,\Delta}} \underbrace{\big( E(\mathbf{y}, \mathrm{H}\{f\}) + \lambda \|\mathrm{L}f\|_{\mathcal{M}} \big)}_{J_{1,\Delta}(f|\mathbf{y},\lambda)}, \qquad (2.39)$$
where
$$\mathcal{X}_{1,\Delta} = \bigg\{ \sum_{n=0}^{N-1} a_n\, \rho_{\mathrm{L}}(\cdot - n\Delta) + \sum_{n=1}^{N_0} b_n\, p_n \;:\; (\mathbf{a}, \mathbf{b}) \in \mathbb{R}^{N+N_0} \bigg\}.$$
Similarly to the $L_2$ case, $J_{1,\Delta}(\mathbf{a}, \mathbf{b}|\mathbf{y}, \lambda)$ is found by expressing $\mathrm{H}\{f_{1,\Delta}\}$ and $\|\mathrm{L}f_{1,\Delta}\|_{\mathcal{M}}$ in terms of $\mathbf{a}$ and $\mathbf{b}$. For this, we use the properties that $\mathrm{L}\rho_{\mathrm{L}} = \delta$, $\|\delta\|_{\mathrm{TV}} = 1$, and $\mathrm{L}p_n = 0$ for $n \in [1 \ldots N_0]$. This results in
$$\mathrm{H}\{f_{1,\Delta}\} = \mathbf{P}\mathbf{a} + \mathbf{Q}\mathbf{b}, \qquad (2.40)$$
$$\|\mathrm{L}f_{1,\Delta}\|_{\mathcal{M}} = \|\mathbf{a}\|_1, \qquad (2.41)$$
where $\mathbf{a} = (a_0, \ldots, a_{N-1})$, $[\mathbf{P}]_{m,n} = \langle h_m, \rho_{\mathrm{L}}(\cdot - n\Delta) \rangle$ for $n \in [0 \ldots N-1]$, $[\mathbf{Q}]_{m,n} = \langle h_m, p_n \rangle$ for $n \in [1 \ldots N_0]$, $\|\mathbf{a}\|_1 = \sum_{n} |a_n|$, and where $N$ is the initial number of Green's functions of our dictionary. The new discretized objective functional is minimized by
$$(\mathbf{a}^*, \mathbf{b}^*) = \arg\min_{\mathbf{a},\mathbf{b}} \underbrace{\big( E(\mathbf{y}, \mathbf{P}\mathbf{a} + \mathbf{Q}\mathbf{b}) + \lambda \|\mathbf{a}\|_1 \big)}_{J_{1,\Delta}(\mathbf{a},\mathbf{b}|\mathbf{y},\lambda)}. \qquad (2.42)$$
Note that (2.42) is the exact discretization of the infinite-dimensional problem (2.39). However, additional theory, such as a convergence analysis [64–66], is needed to show that the recovered signal $f_{1,\Delta}^*$ converges (in the weak sense) to one of the solutions of the original problem (2.35) when the discretization step $\Delta$ goes to 0 (or when $N$ is large enough). We leave this analysis for future work.
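As an illustration of the quantities that enter (2.40)–(2.42), the following NumPy sketch builds the matrices $\mathbf{P}$ and $\mathbf{Q}$ for the case of ideal sampling with $\mathrm{L} = \mathrm{D}^2$ on $[0, T]$, for which $\langle h_m, \rho_{\mathrm{L}}(\cdot - n\Delta)\rangle = \rho_{\mathrm{D}^2}(x_m - n\Delta) = |x_m - n\Delta|/2$ and the null-space basis is $\{1, x\}$; the sample locations, the window size, and the number of grid points are assumed to be given.

```python
import numpy as np

def build_dictionary_matrices(xs, T, N):
    delta = T / N                                   # grid step Delta = T/N
    grid = delta * np.arange(N)                     # candidate knot locations n*Delta
    P = 0.5 * np.abs(xs[:, None] - grid[None, :])   # [P]_{m,n} = rho_{D^2}(x_m - n*Delta)
    Q = np.stack([np.ones_like(xs), xs], axis=1)    # null-space basis {1, x} at the x_m
    return P, Q
```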
When $E$ is differentiable with respect to the parameters, a minimum of (2.42) can be found by using proximal algorithms, where the nonsmooth term $\lambda\|\mathbf{a}\|_1$ is handled by its proximal operator. We discuss the two special cases when $E$ is either an indicator function or a quadratic data-fidelity term.

Exact Fit with $E = I(\mathbf{y}, \mathrm{H}\{f\})$

To perfectly recover the measurements, we impose an infinite penalty when the recovered measurements differ from the given ones. In view of (2.40) and (2.41), this corresponds to solving
$$(\mathbf{a}^*, \mathbf{b}^*) = \arg\min_{\mathbf{a},\mathbf{b}} \|\mathbf{a}\|_1 \quad \text{subject to} \quad \mathbf{P}\mathbf{a} + \mathbf{Q}\mathbf{b} = \mathbf{y}. \qquad (2.43)$$
We then recast Problem (2.43) as the linear program
$$(\mathbf{a}^*, \mathbf{u}^*, \mathbf{b}^*) = \arg\min_{\mathbf{a},\mathbf{u},\mathbf{b}} \sum_{n=1}^{N} u_n \quad \text{subject to} \quad \mathbf{u} + \mathbf{a} \succeq \mathbf{0},\;\; \mathbf{u} - \mathbf{a} \succeq \mathbf{0},\;\; \mathbf{P}\mathbf{a} + \mathbf{Q}\mathbf{b} = \mathbf{y}, \qquad (2.44)$$
where the inequality $\mathbf{x} \succeq \mathbf{y}$ between any two vectors $\mathbf{x} \in \mathbb{R}^N$ and $\mathbf{y} \in \mathbb{R}^N$ means that $x_n \geq y_n$ for $n \in [1 \ldots N]$. This linear program can be solved by a conventional simplex or dual-simplex approach [67], [68].
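A minimal sketch of this linear program with scipy.optimize.linprog is given below. The variable ordering $(\mathbf{a}, \mathbf{u}, \mathbf{b})$ and the use of the library's default solver are illustrative choices, not the exact implementation used for the experiments.

```python
import numpy as np
from scipy.optimize import linprog

def exact_fit_l1(P, Q, y):
    M, N = P.shape
    N0 = Q.shape[1]
    c = np.concatenate([np.zeros(N), np.ones(N), np.zeros(N0)])   # minimize sum_n u_n
    # Constraints u + a >= 0 and u - a >= 0 of (2.44), written as A_ub x <= 0.
    A_ub = np.block([[-np.eye(N), -np.eye(N), np.zeros((N, N0))],
                     [ np.eye(N), -np.eye(N), np.zeros((N, N0))]])
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([P, np.zeros((M, N)), Q])                     # P a + Q b = y
    b_eq = y
    bounds = [(None, None)] * N + [(0, None)] * N + [(None, None)] * N0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:N], res.x[2 * N:]                                # (a*, b*)
```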

Least-Squares Fit with $E = \|\mathbf{y} - \mathrm{H}\{f\}\|_2^2$

When $E$ is a quadratic data-fidelity term, the problem becomes
$$(\mathbf{a}^*, \mathbf{b}^*) = \arg\min_{\mathbf{a},\mathbf{b}} \|\mathbf{y} - (\mathbf{P}\mathbf{a} + \mathbf{Q}\mathbf{b})\|_2^2 + \lambda \|\mathbf{a}\|_1, \qquad (2.45)$$
which is more suitable when the measurements are noisy. The discrete version (2.45) is similar to (2.3), the fundamental difference being in the nature of the underlying basis functions.
The problem is converted into a LASSO formulation [43] by decoupling the computation of $\mathbf{a}^*$ and $\mathbf{b}^*$. Suppose that $\mathbf{a}^*$ is fixed; then, $\mathbf{b}^*$ is found by differentiating (2.45) and equating the gradient to $\mathbf{0}$. This leads to
$$\mathbf{b}^* = \big( \mathbf{Q}^T\mathbf{Q} \big)^{-1} \mathbf{Q}^T (\mathbf{y} - \mathbf{P}\mathbf{a}^*). \qquad (2.46)$$
Upon substitution in (2.45), we get that
$$\mathbf{a}^* = \arg\min_{\mathbf{a}} \|\mathbf{Q}'\mathbf{y} - \mathbf{Q}'\mathbf{P}\mathbf{a}\|_2^2 + \lambda \|\mathbf{a}\|_1, \qquad (2.47)$$
where $\mathbf{Q}' = \big( \mathbf{I} - \mathbf{Q}(\mathbf{Q}^T\mathbf{Q})^{-1}\mathbf{Q}^T \big)$ and $\mathbf{I}$ is the $(M \times M)$ identity matrix. Problem (2.47) can be solved using a variety of optimization techniques, such as interior-point methods or proximal-gradient methods, among others. We employ the popular iterative algorithm FISTA [32], which has an $O(1/t^2)$ convergence rate with respect to its iteration number $t$. However, in our case, the system matrices are formed by the measurements of the shifted Green's functions on a fine grid. This leads to high correlations among the columns and introduces two issues.

• If the LASSO has multiple solutions, then FISTA can converge to a solution within the solution set whose sparsity index is greater than $M$. A similar type of limitation has been discussed in [69].

• If the LASSO has a unique solution, then the convergence to the exact solution can be slow. The convergence rate is inversely proportional to the Lipschitz constant of the gradient of the quadratic loss function, $\max\big(\mathrm{Eig}\big(\mathbf{H}^T\mathbf{H}\big)\big)$, which is typically high for the system matrix obtained through our formulation.

We address these issues by using a combination of FISTA and the simplex algorithm, governed by the following Lemma 2.6.1 and Theorem 2.6.2. The properties of the solution of the LASSO problem have been discussed in [70], [71], [72]. We quickly recall one of the main results from [70].

Lemma 2.6.1 ([70, Lemmas 1 and 11]). Let $\mathbf{y} \in \mathbb{R}^M$ and $\mathbf{H} \in \mathbb{R}^{M \times N}$, where $M < N$. Then, the solution set
$$\alpha_\lambda = \arg\min_{\mathbf{a} \in \mathbb{R}^N} \|\mathbf{y} - \mathbf{H}\mathbf{a}\|_2^2 + \lambda \|\mathbf{a}\|_1 \qquad (2.48)$$
has the same measurement $\mathbf{H}\mathbf{a}^* = \mathbf{y}_0$ for any $\mathbf{a}^* \in \alpha_\lambda$. Moreover, if the solution is not unique, then any two solutions $\mathbf{a}^{(1)}, \mathbf{a}^{(2)} \in \alpha_\lambda$ are such that their $m$th elements satisfy $\mathrm{sign}\big(a_m^{(1)}\big)\,\mathrm{sign}\big(a_m^{(2)}\big) \geq 0$ for $m \in [1 \ldots M]$. In other words, any two solutions have the same sign over their common support.

We use Lemma 2.6.1 to infer Theorem 2.6.2, whose proof is given in the appendix.

Theorem 2.6.2. Let $\mathbf{y} \in \mathbb{R}^M$ and $\mathbf{H} \in \mathbb{R}^{M \times N}$, where $M < N$. Let $\mathbf{y}_{0,\lambda} = \mathbf{H}\mathbf{a}^*$, $\forall \mathbf{a}^* \in \alpha_\lambda$, be the measurement of the solution set $\alpha_\lambda$ of the LASSO formulation
$$\mathbf{a}^* = \arg\min_{\mathbf{a} \in \mathbb{R}^N} \|\mathbf{y} - \mathbf{H}\mathbf{a}\|_2^2 + \lambda \|\mathbf{a}\|_1. \qquad (2.49)$$
Then, the solution $\mathbf{a}^*_{\mathrm{SLP}}$ (obtained using the simplex algorithm) of the linear program corresponding to the problem
$$\mathbf{a}^*_{\mathrm{SLP}} = \arg\min_{\mathbf{a}} \|\mathbf{a}\|_1 \quad \text{subject to} \quad \mathbf{H}\mathbf{a} = \mathbf{y}_{0,\lambda} \qquad (2.50)$$
is an extreme point of $\alpha_\lambda$. Moreover, $\|\mathbf{a}^*_{\mathrm{SLP}}\|_0 \leq M$.

Theorem 2.6.2 helps us find an extreme point of the solution set $\alpha_\lambda$ of a given LASSO problem in the case when its solution is non-unique. To that end, we first use FISTA to solve the LASSO problem until it converges to a solution $\mathbf{a}^*_{\mathrm{F}}$. By setting $\mathbf{y}_{0,\lambda} = \mathbf{H}\mathbf{a}^*_{\mathrm{F}}$, Lemma 2.6.1 then implies that $\mathbf{H}\mathbf{a} = \mathbf{y}_{0,\lambda}$, $\forall \mathbf{a} \in \alpha_\lambda$. We then run the simplex algorithm to find
$$\mathbf{a}^*_{\mathrm{SLP}} = \arg\min_{\mathbf{a}} \|\mathbf{a}\|_1 \quad \text{subject to} \quad \mathbf{H}\mathbf{a} = \mathbf{H}\mathbf{a}^*_{\mathrm{F}},$$
which yields an extreme point of $\alpha_\lambda$ by Theorem 2.6.2.
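The following NumPy sketch implements the FISTA step of this two-step procedure for the LASSO (2.49); the step size is derived from the spectral norm of the system matrix, and the fixed iteration count is an illustrative stopping rule rather than the $\epsilon$- and sparsity-based criteria discussed below. The subsequent simplex refinement (2.50) amounts to re-using a linear program like the one sketched after (2.44), with the equality constraint $\mathbf{H}\mathbf{a} = \mathbf{H}\mathbf{a}^*_{\mathrm{F}}$.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_lasso(H, y, lam, n_iter=5000):
    # Minimizes ||y - H a||_2^2 + lam * ||a||_1 with the iteration of [32].
    L = 2.0 * np.linalg.norm(H, 2) ** 2              # Lipschitz constant of the gradient
    a = np.zeros(H.shape[1])
    v, t = a.copy(), 1.0
    for _ in range(n_iter):
        grad = 2.0 * H.T @ (H @ v - y)               # gradient of the quadratic term at v
        a_next = soft_threshold(v - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        v = a_next + ((t - 1.0) / t_next) * (a_next - a)
        a, t = a_next, t_next
    return a                                         # candidate a*_F, to be refined by (2.50)
```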


An example where the LASSO problem has a non-unique solution is shown in Figure 2.2.b. In this case, FISTA converges to a non-sparse solution with $\|\mathbf{a}^*_{\mathrm{F}}\|_0 > M$, shown as solid stems. This implies that it is not an extreme point of the solution set. The simplex algorithm is then deployed to minimize the $\ell_1$ norm such that the measurement $\mathbf{y}_0 = \mathbf{H}\mathbf{a}^*_{\mathrm{F}}$ is preserved. The final solution, shown as dashed stems, is an extreme point with the desirable level of sparsity. The continuous-domain relation of this example is discussed later.
The solution of the continuous-domain formulation is a convex set whose extreme points are composed of at most $M$ shifted Green's functions. To find the position of these Green's functions, we discretize the continuum into a fine grid and then run the proposed two-step algorithm. If the discretization is fine enough, then the continuous-domain function that corresponds to the extreme point of the LASSO formulation is a good proxy for the actual extreme point of the convex-set solution of the original continuous-domain problem. This makes the extreme-point solutions of the LASSO a natural choice among the solution set. For the case when there is a unique solution but the convergence is too slow owing to the high value of
the Lipschitz constant of the gradient of the quadratic loss, the simplex algorithm is used after the FISTA iterations are stopped using an appropriate convergence criterion. For FISTA, the convergence behavior is ruled by the number of iterations $t$ as
$$F(\mathbf{a}_t) - F(\mathbf{a}^*) \leq \frac{C}{(t+1)^2}, \qquad (2.51)$$
where $F$ is the LASSO functional and
$$C = 2\|\mathbf{a}_0 - \mathbf{a}^*\|_2^2\, \max\big(\mathrm{Eig}\big(\mathbf{H}^T\mathbf{H}\big)\big) \qquad (2.52)$$
(see [32]). This implies that an $\epsilon$-neighborhood of the minimum of the functional is reached in at most $t = \sqrt{C/\epsilon}$ iterations. To ensure convergence, it is also advisable to rely on the modified version of FISTA proposed in [73].
However, there is no direct relation between the functional value and the sparsity index of the iterative solution. Using the simplex algorithm as the next step guarantees the upper bound $M$ on the sparsity index of the solution. Also, $F(\mathbf{a}^*_{\mathrm{SLP}}) \leq F(\mathbf{a}^*_{\mathrm{F}})$. This implies that an $\epsilon$-based convergence criterion, in addition to a sparsity-index-based criterion like $\|\mathbf{a}^*_{\mathrm{F}}\|_0 \leq M$, can be used to stop FISTA. Then, the simplex scheme is deployed to find an extreme point of the solution set with a reduced sparsity index.
Note that, when $E(\mathbf{y}, \cdot)$ is not strictly convex, the solution set can have non-unique measurements. In that case, it is still possible to further sparsify a recovered solution by using the discussed simplex approach.

2.6.3 Alternative Grid-free Techniques


Our proposed method relies on a grid-based discretization of the infinite-dimensional problem. For the sake of completeness, we discuss here alternative techniques for reconstructing continuous-domain sparse signals that employ grid-free optimization. Although elegant, these techniques have a more restricted range of applicability. The taut-string algorithm (see [74]) can fit L-splines for $\mathrm{L} = \mathrm{D}^n$ but is devised for ideal sampling only. In [59, 60, 69, 75–77], the dual problem is considered for the optimization, with an added emphasis on recovering the ground-truth signal. These methods, however, only deal with $\mathrm{L} = \mathrm{Id}$ and limited measurement operators.
Recently, in [62], motivated by [46], results for more general $\mathrm{L}$ and $\mathrm{H}$ have been derived. There, the optimization is carried out in two steps: first, a finite-dimensional dual problem involving two infinite-dimensional convex constraint sets is solved; second, the support of this solution is identified, which is finally used to solve a finite-dimensional primal problem. Remarkably, for some specific cases, solving each of these steps is feasible, which results in an exact finite-dimensional formulation (see, for example, [62, Sections 2.4.2 and 2.4.3]).

2.7 Illustrations
We discuss the results obtained for the cases when the measurements are random samples either of the signal itself or of its continuous-domain Fourier transform. The operators of interest are $\mathrm{L} = \mathrm{D}$ and $\mathrm{L} = \mathrm{D}^2$. The ground-truth (GT) signal $f_{\mathrm{GT}}$ is the solution of the stochastic differential equation $\mathrm{L}f_{\mathrm{GT}} = w_{\mathrm{GT}}$ [78] for the two cases when $w_{\mathrm{GT}}$ is

• Impulsive Noise. Here, the innovation $w_{\mathrm{GT}}$ is a compound-Poisson noise with Gaussian jumps, which corresponds to a sum of Dirac impulses whose amplitudes follow a Gaussian distribution. The corresponding process $f_{\mathrm{GT}}$ then has the particularity of being piecewise smooth [79]. This case is matched to the regularization $\|\mathrm{L}f\|_{\mathcal{M}}$ and is covered by Theorem 2.4.2, which states that the minimizer $f_1^*$ for this regularization is such that
$$w_1^* = \mathrm{L}f_1^* = \sum_{k=1}^{K} a_k\, \delta(\cdot - x_k), \qquad (2.53)$$
which is a form compatible with a realization of an impulsive white noise.

• Gaussian White Noise. This case is matched to the regularization $\|\mathrm{L}f\|_{L_2}$. Unlike the impulsive noise, $w_2^* = \mathrm{L}f_2^*$ is not localized to finitely many points and therefore is a better model for the realization of a Gaussian white noise.

In all experiments, we also constrain the test signals to be compactly supported. This can be achieved by putting linear constraints on the innovations of the signal. In Sections 2.7.1 and 2.7.3, we confirm experimentally that matched regularization recovers the test signals better than non-matched regularization. While reconstructing the Tikhonov and gTV solutions when the measurements are noisy, the parameter $\lambda$ in (2.34) and (2.45) is tuned using a grid search to give the best SNR of the recovered signal.

2.7.1 Random Sampling


In this experiment, the measurement functionals are Dirac impulses at the random locations $\{x_m\}_{m=1}^{M}$. The regularization operator is $\mathrm{L} = \mathrm{D}^2$. It corresponds to $\rho_{\mathrm{D}^2}(x) = \tfrac{|x|}{2}$ and $\varphi_{\mathrm{D}^2,m}(x) = (\rho_{\mathrm{L}^*\mathrm{L}} * h_m)(x) = |x - x_m|^3/12$. The null space for this operator is $\mathcal{N}_{\mathrm{D}^2} = \mathrm{span}\{1, x\}$. This means that the gTV-regularized solution is piecewise linear and that the $L_2$-regularized solution is piecewise cubic. We compare in Figures 2.3.a and 2.3.b the recovery from noiseless samples of a second-order process, referred to as ground truth (GT). It is composed of sparse (impulsive Poisson) and non-sparse (Gaussian) innovations, respectively [80]. The sparsity index—the number of impulses or non-zero elements—of the original sparse signal is 9. The solution for the gTV case is recovered with $\lambda = 0.05$ and $N = 200$. The sparsity indices of the gTV solution for the sparse and Gaussian cases are 9 and 16, respectively. As expected, the recovery of the gTV-regularized reconstruction is better than that of the $L_2$-regularized solution when the signal is sparse. For the Gaussian case, the situation is reversed.

2.7.2 Multiple Solutions


We discuss the case when the gTV solution is non-unique. We show in Figure 2.2.a examples of solutions of the gTV-regularized random-sampling problem obtained using FISTA alone ($f_{\mathrm{F}}^*$) and FISTA + simplex (linear programming, $f_{\mathrm{SLP}}^*$). In this case, $M = 30$, $\mathrm{L} = \mathrm{D}^2$, and $\lambda = 0.182$. The continuous-domain functions $f_{\mathrm{F}}^*$ and $f_{\mathrm{SLP}}^*$ have basis functions whose coefficients are the (non-unique) solutions of a given LASSO problem, as shown in Figure 2.2.b. The $\ell_1$ norms of the corresponding coefficients are the same. Also, it holds that
$$\|\mathrm{D}^2 f_{\mathrm{F}}^*\|_{\mathcal{M}} = \|\mathrm{D}^2 f_{\mathrm{SLP}}^*\|_{\mathcal{M}} = \|\mathrm{D}f_{\mathrm{F}}^*\|_{\mathrm{TV}} = \|\mathrm{D}f_{\mathrm{SLP}}^*\|_{\mathrm{TV}}, \qquad (2.54)$$
which implies that the TV norms of the slopes of $f_{\mathrm{F}}^*$ and $f_{\mathrm{SLP}}^*$ are the same. This is evident from Figure 2.2.c: the arc lengths of the two curves are the same. The signal $f_{\mathrm{SLP}}^*$ is piecewise linear with $21 \,(< M)$ knots, carries a piecewise-constant slope, and is, by definition, a nonuniform spline of degree 1. By contrast, $f_{\mathrm{F}}^*$ has many more knots and even sections whose slope appears to be piecewise-linear.
Theorem 2.4.2 asserts that the extreme points of the solution set of the gTV regularization need to have fewer than $M$ knots. Remember that $f_{\mathrm{SLP}}^*$ is obtained by combining FISTA and simplex; this ensures that the basis coefficients of $f_{\mathrm{SLP}}^*$ form an extreme point of the solution set of the corresponding LASSO problem (Theorem 2.6.2) and guarantees that the number of knots is smaller than $M$.
This example shows an intuitive relationship between the continuous-domain and the discrete-domain formulations of inverse problems with gTV and $\ell_1$ regularization, respectively. The nature of the continuous-domain solution set and its extreme points resonates with its corresponding discretized version. In both cases, the solution set is convex and the extreme points are sparse.

                                L = D              L = D^2
No. of impulses   Sparsity      TV      L2         TV      L2
10                Strong       19.60   15.70      52.08   41.54
100               Medium       16.58   16.10      41.91   41.26
2000              Low          14.45   16.14      39.68   41.40
-                 Gaussian     14.30   16.32      40.05   41.23
(a) Noiseless case.

                                L = D              L = D^2
No. of impulses   Sparsity      TV      L2         TV      L2
10                Strong       17.06   11.52      25.55   24.60
100               Medium       13.24   10.94      24.44   24.24
2000              Low          10.61   11.13      25.80   26.19
-                 Gaussian     10.40   11.10      24.95   25.48
(b) Noisy case.

Table 2.1: Comparison of TV and L2 recovery from their (top table) noiseless and (bottom table) noisy (with 40 dB SNR) random Fourier samples. The results have been averaged over 40 realizations.
2.7.3 Random Fourier Sampling


Let now the measurement functions be $h_m(x) = \mathrm{rect}\big(\tfrac{x}{T}\big)\,\mathrm{e}^{-\mathrm{j}\omega_m x}$, where $T$ is the window size. The samples are thus random samples of the continuous-domain Fourier transform of a signal restricted to a window. For the regularization operator $\mathrm{L} = \mathrm{D}$, the Green's function is $\rho_{\mathrm{D}}(x) = \tfrac{1}{2}\,\mathrm{sign}(x)$ and the basis is $\varphi_{\mathrm{D},m}(x) = \tfrac{1}{2}|\cdot| * h_m(x)$. Figures 2.4.a and 2.4.b correspond to a first-order process with sparse and Gaussian innovations, respectively. The grid step is $\Delta = 0.05$, $M = 41$, and $N = 200$. The sparsity indices of the gTV solution for the sparse and Gaussian cases are 36 and 39, respectively. For the original sparse signal (GT), it is 7. The oscillations of the solution in the $L_2$-regularized case are induced by the sinusoidal form of the measurement functionals. This also makes the $L_2$ solution intrinsically smoother than its gTV counterpart. Also, the quality of the recovery depends on the frequency band used to sample.
In Figures 2.4.c and 2.4.d, we show the zoomed version of the recovered second-order process with sparse and Gaussian innovations, respectively. The grid step is $\Delta = 0.05$, $M = 41$, and $N = 200$. The operator $\mathrm{L} = \mathrm{D}^2$ is used for the regularization. This corresponds to $\rho_{\mathrm{D}^2}(x) = \tfrac{|x|}{2}$ and $\varphi_{\mathrm{D}^2,m}(x) = \tfrac{1}{12}|\cdot|^3 * h_m(x)$. The sparsity index of the gTV solution in the sparse and Gaussian cases is 10 and 36, respectively. For the original sparse signal (GT), it is 10. Once again, the recovery by gTV is better than by $L_2$ when the signal is sparse. In the Gaussian case, the $L_2$ solution is better.
The effect of sparsity on the recovery of signals from their noiseless and noisy (40 dB SNR) Fourier samples is shown in Table 2.1. The sample frequencies are kept the same for all the cases. Here, $M = 41$, $N = 200$, $T = 10$, and the grid step is $\Delta = 0.05$. We observe that the reconstruction performance for random processes based on impulsive noise becomes comparable to that of Gaussian processes when the number of impulses increases. This is reminiscent of the fact that generalized-Poisson processes with Gaussian jumps converge in law to corresponding Gaussian processes [81].

2.8 Summary
In this chapter, we consider 1D linear inverse problems that are formulated in the continuous domain. The object of recovery is a function that is assumed to minimize a convex objective functional. The solutions are constrained by imposing a continuous-domain regularization. We derive the parametric form of the solutions (representer theorems) for Tikhonov (quadratic) and generalized total-variation (gTV) regularizations. We show that, in both cases, the solutions are splines that are intimately related to the regularization operator. In the Tikhonov case, the solution is smooth and constrained to live in a fixed subspace that depends on the measurement operator. By contrast, the gTV regularization results in a sparse solution composed of only a few dictionary elements, whose number is upper-bounded by the number of measurements and independent of the measurement operator. Our findings for the gTV regularization resonate with the minimization of the $\ell_1$ norm, which is its discrete counterpart and also produces sparse solutions. The formulations and the results of this chapter are summarized in Figure 2.5. Finally, we compute experimental solutions for some measurement models in one dimension. We discuss the special case when the gTV regularization results in multiple solutions and devise an algorithm to find an extreme point of the solution set, which is guaranteed to be sparse.

Figure 2.2: Illustration of the inability of FISTA to deliver a sparse solution: (a) comparison of the solutions $f_{\mathrm{F}}^*$ vs. $f_{\mathrm{SLP}}^*$ of the continuous-domain gTV problem, (b) signal innovations $\mathbf{a}^*_{\mathrm{F}}$ and $\mathbf{a}^*_{\mathrm{SLP}}$ with sparsity indices 64 ($> M$) and 21 ($< M$), respectively, and (c) derivative of the two solutions. The two signal innovations in (b) are solutions of the same LASSO problem, but only $\mathbf{a}^*_{\mathrm{SLP}}$ is an extreme point of the solution set. The original signal is a second-order process ($\mathrm{L} = \mathrm{D}^2$) and the measurements are $M = 30$ nonuniform noisy samples (SNR = 40 dB). The parameters are $\lambda = 0.182$, $N = 400$, and grid step $\Delta = 1/80$.

Figure 2.3: Recovery of sparse (a) and Gaussian (b) second-order processes (GT) using $\mathrm{L} = \mathrm{D}^2$ from their nonuniform samples corrupted with 40 dB measurement noise.

Figure 2.4: Recovery of first-order (first row: (a) sparse signal, (b) Gaussian signal) and second-order (second row: (c) sparse signal, (d) Gaussian signal) processes from their random noiseless Fourier samples. In all the cases, $M = 41$ and $N = 200$. In the interest of clarity, (c) and (d) contain the zoomed versions of the actual signals.

Figure 2.5: Summary of the whole scheme. The regularization operator with a given norm {4.a} defines the search space for the solution {1.a, 4.b}. Representer theorems then give the parametric representation of the solution {1.b}. The numerical solution is then recovered by optimizing over the parameters to minimize $J_R(f|\mathbf{y})$ {1.c}.
Part II

Second Generation

Chapter 3

Deep-Learning-based PGD for Iterative Reconstruction

¹ The content of this chapter is based on our work [26].
As discussed earlier, supervised deep-learning-based reconstruction methods out-
perform the classical methods. However, this comes at the cost of robustness. In
this chapter, we introduce an iterative framework inspired from classical methods
in order to bring more robustness and better performance to the deep-learning
methods.

3.1 Overview
While medical imaging is a fairly mature area, there is recent evidence that it may still be possible to reduce the radiation dose and/or speed up the acquisition process without compromising image quality. This can be accomplished with the help of sophisticated reconstruction algorithms that incorporate some prior knowledge (e.g., sparsity) on the class of underlying images [7]. The reconstruction task is usually formulated as an inverse problem where the image-formation physics is modeled by an operator $\mathbf{H} : \mathbb{R}^N \to \mathbb{R}^M$ (called the forward model). The measurement equation is $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \in \mathbb{R}^M$, where $\mathbf{x} \in \mathbb{R}^N$ is the space-domain image that we are interested in recovering and $\mathbf{n} \in \mathbb{R}^M$ is the noise intrinsic to the acquisition process.

process.
In the case of extreme imaging, the number of measurements is reduced as much
as possible to decrease either the radiation dose in computed tomography (CT) or
the scanning time in MRI. Moreover, the measurements are typically very noisy due
to short integration times, which calls for some form of denoising. Indeed, there
may be significantly fewer measurements than the number of unknowns (M << N ).
This gives rise to an ill-posed problem in the sense that there may be an infinity
of consistent images that map to the same measurements y. Thus, one challenge
of the reconstruction algorithm is to select the best solution among a multitude of
potential candidates.
Recently, a surge in using deep learning to solve inverse problems in imaging [16–20] has established new state-of-the-art results for tasks such as sparse-view CT reconstruction [17]. Rather than reconstructing the image from the measurements $\mathbf{y}$ directly, the most successful strategies have been to train a CNN as a regressor between a rough initial reconstruction $\mathbf{A}\mathbf{y}$, where $\mathbf{A} : \mathbb{R}^M \to \mathbb{R}^N$, and the final, desired reconstruction [17, 18]. This initial reconstruction could be obtained using classical algorithms (e.g., FBP, BP) or by some other linear operation. Once the training is complete, the reconstruction for a new measurement $\mathbf{y}$ is given by $\mathbf{x}^* = \mathrm{CNN}_{\boldsymbol{\theta}^*}(\mathbf{A}\mathbf{y})$, where $\mathrm{CNN}_{\boldsymbol{\theta}} : \mathbb{R}^N \to \mathbb{R}^N$ denotes the CNN as a function and $\boldsymbol{\theta}^*$ denotes the internal parameters of the CNN after training. These schemes exploit the fact that the structure of images can be learned from representative examples. CNNs are favored because of the way they encode the data in their hidden layers. In this sense, a CNN can be seen as a good prior encoder.
Although the results reported so far are remarkable in terms of image quality, there is still some concern as to whether or not they can be trusted, especially in the context of diagnostic imaging. The main limitation of direct algorithms such as [17] is that they do not provide any guarantee on the worst-case performance. Moreover, even in the case of noiseless (or low-noise) measurements, there is no assurance that the reconstructed image is consistent with the measurements because, unlike for the iterative schemes, there is no feedback mechanism that imposes this consistency.

3.1.1 Contributions
In this work, we propose a simple yet effective iterative scheme (see Figure 3.1), which tries to incorporate the advantages of the existing algorithms and side-steps their disadvantages.

Figure 3.1: (a) Block diagram of projected gradient descent using a CNN as the projector and $E$ as the data-fidelity term. The gradient step promotes consistency with the measurements and the projector forces the solution to belong to the set of desired solutions. If the CNN is only an approximate projector, the scheme may diverge. (b) Block diagram of the proposed relaxed projected gradient descent. The $\alpha_k$'s are updated in such a way that the algorithm always converges (see Algorithm 1 for more details).

Specifically:

• We first propose to learn a CNN that acts as a projector onto a set S which
can be intuitively thought of as the manifold of the data (e.g., biomedical
images). In this sense, our CNN encodes the prior knowledge of the data. Its
purpose is to map an input image to an output image that is more similar to
the training data.

• Given a measurement y, we initialize our reconstruction using a classical


algorithm.

• We then iteratively alternate between minimizing the data-fidelity term and


projecting the result onto the set S by applying a suitable variant of the
projected gradient descent (PGD) which ensures convergence.

Besides the design of the implementation, our contribution is the proposal of a relaxed form of PGD that is guaranteed to converge and, under certain conditions, can also find a local minimum of a nonconvex inverse problem. Moreover, as we shall see later, this method outperforms existing algorithms on low-dose X-ray CT reconstructions.

3.1.2 Related and Prior Work


Deep learning has already shown promising results in image denoising, superreso-
lution, and deconvolution. Recently, it has also been used to solve inverse problems
in imaging using limited data [17–20], and in compressed sensing [82]. However,
as discussed earlier, these regression-based approaches lack a feedback mechanism
that could be beneficial in solving inverse problems.
Another usage of deep learning is to complement iterative algorithms. This in-
cludes learning a CNN as an unrolled version of the iterative shrinkage-thresholding
algorithm (ISTA) [83] or ADMM [84]. In [21], inverse problems involving non-linear
forward models are solved by partially learning the gradient descent. In [85], the
iterative algorithm is replaced by a recurrent neural network (RNN). Recently,
in [86], a cascade of CNNs is used to reconstruct images. Within this cascade
the data-fidelity is enforced at multiple steps. However, in all of these approaches
the training is performed end-to-end, meaning that the network parameters are
dependent on the iterative scheme chosen.

These approaches differ from plug-and-play ADMM [87–89], where an inde-


pendent off-the-shelf denoiser or a trained operator is plugged into the iterative
scheme of the alternating-direction method of multipliers (ADMM) [13]. ADMM is
an iterative optimization technique that alternates between (i) a linear solver that
reinforces consistency with respect to the measurements; and (ii) a nonlinear oper-
ation that re-injects the prior. The idea of plug-and-play ADMM is to replace (ii),
which resembles denoising, with an off-the-shelf denoiser. Plug-and-play ADMM
is more general than the optimization framework (1.6) but still lacks theoretical
justifications. In fact, there is little understanding yet of the connection between
the use of a given denoiser and the regularization it imposes (though this link has
recently been explored in [90]).
In [91], a generative adversarial network (GAN) trained as a projector onto a set has been used with the plug-and-play ADMM. Similarly, in [92], the inverse problem is solved over a set parameterised by a generative model. However, it requires a precise initialization of the parameters. In [93], similarly to us, the projector in PGD is replaced with a neural network. However, the scheme lacks a convergence guarantee and a rigorous theoretical analysis.
Our scheme is similar in spirit to plug-and-play ADMM, but is simpler to an-
alyze. Although our methodology is generic and can be applied in principle to
any inverse problem, our experiments here involve sparse-view x-ray CT recon-
struction. For a recent overview of the field, see [94]. Current approaches to
sparse-view CT reconstruction follow the formulation (1.6), e.g., using a penalized
weighted least-squares data term and sparsity-promoting regularizer [95], dictionary
learning-based regularizer [96], or generalized total variation regularizer [97]. There
are also prior works on the direct application of CNNs to CT reconstruction. These
methods generally use the CNN to denoise the sinogram [98] or the reconstruction
obtained from a standard technique [17, 99–101]; as such, they do not perform the
reconstruction directly.

3.1.3 Roadmap
The chapter is organized as follows: In Section 3.2, we discuss the mathematical
framework that motivates our approach and justify the use of a projector onto a
set as an effective strategy to solve inverse problems. In Section 3.3, we present
our algorithm, which is a relaxed version of PGD. It has been modified so as to
converge in practical cases where the projection property is only approximate. We
discuss in Section 3.4 a novel technique to train the CNN as a projector onto
a set, especially when the training data is small. This is followed by experiments
(Section 3.5), results and discussions (Section 3.6 and Section 3.7), and conclusions
(Section 3.8).

3.2 Theoretical Framework


Our goal is to use a trained CNN iteratively inside PGD to solve an inverse prob-
lem. To understand why this scheme will be effective, we first analyze how using
a projector onto a set, combined with gradient descent, can be helpful in solving
inverse problems. Properties of PGD using an orthogonal projector onto a convex
set are known [102]. Here, we extend these results for any projector onto a non-
convex set. This extension is required because there is no guarantee that the set of
desirable reconstruction images is convex. Proofs of all the results in this section
can be found in the Appendix.

3.2.1 Notation
We consider the finite-dimensional Hilbert space $\mathbb{R}^N$ equipped with the scalar product $\langle \cdot\,,\cdot \rangle$ that induces the $\ell_2$ norm $\|\cdot\|_2$. The spectral norm of the matrix $\mathbf{H}$, denoted by $\|\mathbf{H}\|_2$, is equal to its largest singular value. For $\mathbf{x} \in \mathbb{R}^N$ and $\varepsilon > 0$, we denote by $B_\varepsilon(\mathbf{x})$ the $\ell_2$-ball centered at $\mathbf{x}$ with radius $\varepsilon$, i.e.,
$$B_\varepsilon(\mathbf{x}) = \left\{ \mathbf{z} \in \mathbb{R}^N : \|\mathbf{z} - \mathbf{x}\|_2 \leq \varepsilon \right\}.$$
The operator $T : \mathbb{R}^N \to \mathbb{R}^N$ is Lipschitz-continuous with constant $L$ if
$$\|T(\mathbf{x}) - T(\mathbf{z})\|_2 \leq L\, \|\mathbf{x} - \mathbf{z}\|_2, \qquad \forall \mathbf{x}, \mathbf{z} \in \mathbb{R}^N.$$
It is contractive if it is Lipschitz-continuous with constant $L < 1$ and non-expansive if $L = 1$. A fixed point $\mathbf{x}^*$ of $T$ (if any) satisfies $T(\mathbf{x}^*) = \mathbf{x}^*$.
Given the set $\mathcal{S} \subset \mathbb{R}^N$, the mapping $P_\mathcal{S} : \mathbb{R}^N \to \mathcal{S}$ is called a projector if it satisfies the idempotent property $P_\mathcal{S} P_\mathcal{S} = P_\mathcal{S}$. It is called an orthogonal projector if
$$P_\mathcal{S}(\mathbf{x}) \in \arg\min_{\mathbf{z} \in \mathcal{S}} \|\mathbf{x} - \mathbf{z}\|_2, \qquad \forall \mathbf{x} \in \mathbb{R}^N.$$

3.2.2 Constrained Least Squares


Consider the problem of the reconstruction of the image $\mathbf{x} \in \mathbb{R}^N$ from its noisy measurements $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$, where $\mathbf{H} \in \mathbb{R}^{M \times N}$ is the linear forward model and $\mathbf{n} \in \mathbb{R}^M$ is additive white Gaussian noise. The framework is also applicable to Poisson noise model-based CT via a suitable transformation, as shown in Appendix A.2.2.
Our reconstruction incorporates a strong form of prior knowledge about the original image: We assume that $\mathbf{x}$ must lie in a set $\mathcal{S} \subset \mathbb{R}^N$ that contains all objects of interest. The proposed way to make the reconstruction consistent with the measurements as well as with the prior knowledge is to solve the constrained least-squares problem
$$\min_{\mathbf{x} \in \mathcal{S}} \tfrac{1}{2} \|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2. \qquad (3.1)$$
The condition $\mathbf{x} \in \mathcal{S}$ in (3.1) plays the role of a regularizer. If no two points in $\mathcal{S}$ have the same measurements and if $\mathbf{y}$ is noiseless, then, out of all the points in $\mathbb{R}^N$ that are consistent with the measurement $\mathbf{y}$, (3.1) selects a unique point $\mathbf{x}^* \in \mathcal{S}$. In this way, the ill-posedness of the inverse problem is bypassed. When the measurements are noisy, (3.1) returns a point $\mathbf{x}^* \in \mathcal{S}$ such that $\mathbf{y}^* = \mathbf{H}\mathbf{x}^*$ is as close as possible to $\mathbf{y}$. Thus, it also denoises the measurement, and the quantity $\mathbf{y}^*$ can be regarded as the denoised version of $\mathbf{y}$.
The point $\mathbf{x}^* \in \mathcal{S}$ is called a local minimizer of (3.1) if
$$\exists \varepsilon > 0 : \|\mathbf{H}\mathbf{x}^* - \mathbf{y}\|_2 \leq \|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2, \qquad \forall \mathbf{x} \in \mathcal{S} \cap B_\varepsilon(\mathbf{x}^*).$$

3.2.3 Projected Gradient Descent


When $\mathcal{S}$ is a closed convex set, it is well known [102] that a solution of (3.1) can be found by PGD
$$\mathbf{x}_{k+1} = P_\mathcal{S}(\mathbf{x}_k - \gamma\mathbf{H}^T\mathbf{H}\mathbf{x}_k + \gamma\mathbf{H}^T\mathbf{y}), \qquad (3.2)$$
where $\gamma$ is a step size chosen such that $\gamma < 2/\|\mathbf{H}^T\mathbf{H}\|_2$. This algorithm combines the orthogonal projection onto $\mathcal{S}$ with the gradient descent with respect to the quadratic objective function, also called the Landweber update [103]. PGD [104, Section 2.3] is a subclass of the forward-backward splitting [105, 106], which is known in the $\ell_1$-minimization literature as iterative shrinkage/thresholding algorithms (ISTA) [10, 11, 107].
In our problem, $\mathcal{S}$ is presumably non-convex, but we propose to still use the update (3.2) with some projector $P_\mathcal{S}$ that may not be orthogonal. In the rest of this section, we provide sufficient conditions on the projector $P_\mathcal{S}$ (not on $\mathcal{S}$ itself) under which (3.2) leads to a local minimizer of (3.1). Similarly to the convex case, we characterize the local minimizers of (3.1) by the fixed points of the combined operator
$$G_\gamma(\mathbf{x}) = P_\mathcal{S}(\mathbf{x} - \gamma\mathbf{H}^T\mathbf{H}\mathbf{x} + \gamma\mathbf{H}^T\mathbf{y}) \qquad (3.3)$$
and then show that some fixed point of that operator must be reached by the iteration $\mathbf{x}_{k+1} = G_\gamma(\mathbf{x}_k)$ as $k \to \infty$, regardless of the initial point $\mathbf{x}_0$. We first state a sufficient condition for each fixed point of $G_\gamma$ to be a local minimizer of (3.1).

Proposition 3.2.1. Let $\gamma > 0$ and $P_\mathcal{S}$ be such that, for all $\mathbf{x} \in \mathbb{R}^N$,
$$\langle \mathbf{z} - P_\mathcal{S}\mathbf{x}\,,\, \mathbf{x} - P_\mathcal{S}\mathbf{x} \rangle \leq 0, \qquad \forall \mathbf{z} \in \mathcal{S} \cap B_\varepsilon(P_\mathcal{S}\mathbf{x}), \qquad (3.4)$$
for some $\varepsilon > 0$. Then, any fixed point of the operator $G_\gamma$ in (3.3) is a local minimizer of (3.1). Furthermore, if (3.4) is satisfied globally, in the sense that
$$\langle \mathbf{z} - P_\mathcal{S}\mathbf{x}\,,\, \mathbf{x} - P_\mathcal{S}\mathbf{x} \rangle \leq 0, \qquad \forall \mathbf{x} \in \mathbb{R}^N,\ \mathbf{z} \in \mathcal{S}, \qquad (3.5)$$
then any fixed point of $G_\gamma$ is a solution of (3.1).

Two remarks are in order. First, (3.5) is a well-known property of orthogonal projections onto closed convex sets. It actually implies the convexity of $\mathcal{S}$ (see Proposition 3.2.2). Second, (3.4) is much more relaxed and easily achievable, for example, as stated in Proposition 3.2.3, by orthogonal projections onto unions of closed convex sets. (Special cases are unions of subspaces, which have found some applications in data modeling and clustering [108].)

Proposition 3.2.2. If $P_\mathcal{S}$ is a projector onto $\mathcal{S} \subset \mathbb{R}^N$ that satisfies (3.5), then $\mathcal{S}$ must be convex.

Proposition 3.2.3. If $\mathcal{S}$ is a union of a finite number of closed convex sets in $\mathbb{R}^N$, then the orthogonal projector $P_\mathcal{S}$ onto $\mathcal{S}$ satisfies (3.4).

Propositions 3.2.1–3.2.3 suggest that, when $\mathcal{S}$ is non-convex, the best we can hope for is to find a local minimizer of (3.1) through a fixed point of $G_\gamma$. Theorem 3.2.4 provides a sufficient condition for PGD to converge to a unique fixed point of $G_\gamma$.

Theorem 3.2.4. Let $\lambda_{\max}$ and $\lambda_{\min}$ be the largest and smallest eigenvalues of $\mathbf{H}^T\mathbf{H}$, respectively. If $P_\mathcal{S}$ satisfies (3.4) and is Lipschitz-continuous with constant $L < (\lambda_{\max} + \lambda_{\min})/(\lambda_{\max} - \lambda_{\min})$, then, for $\gamma = 2/(\lambda_{\max} + \lambda_{\min})$, the sequence $\{\mathbf{x}_k\}$ generated by (3.2) converges to a local minimizer of (3.1), regardless of the initialization $\mathbf{x}_0$.

It is important to note that the projector $P_\mathcal{S}$ can never be contractive since it preserves the distance between any two points on $\mathcal{S}$. Therefore, when $\mathbf{H}$ has a non-trivial null space, the condition $L < (\lambda_{\max} + \lambda_{\min})/(\lambda_{\max} - \lambda_{\min})$ of Theorem 3.2.4 is not feasible. The smallest possible Lipschitz constant of $P_\mathcal{S}$ is $L = 1$, which means that $P_\mathcal{S}$ is non-expansive. Even with this condition, it is not guaranteed that the combined operator $G_\gamma$ has a fixed point. This limitation can be overcome when $G_\gamma$ is assumed to have a nonempty set of fixed points. Indeed, we state in Theorem 3.2.5 that one of them must be reached by iterating the averaged operator $\alpha\,\mathrm{Id} + (1-\alpha)G_\gamma$, where $\alpha \in (0, 1)$ and $\mathrm{Id}$ is the identity operator. We call this scheme averaged PGD (APGD).

Theorem 3.2.5. Let $\lambda_{\max}$ be the largest eigenvalue of $\mathbf{H}^T\mathbf{H}$. If $P_\mathcal{S}$ satisfies (3.4) and is a non-expansive operator such that $G_\gamma$ in (3.3) has a fixed point for some $\gamma < 2/\lambda_{\max}$, then the sequence $\{\mathbf{x}_k\}$ generated by APGD, with
$$\mathbf{x}_{k+1} = (1-\alpha)\mathbf{x}_k + \alpha\, G_\gamma(\mathbf{x}_k) \qquad (3.6)$$
for any $\alpha \in (0, 1)$, converges to a local minimizer of (3.1), regardless of the initialization $\mathbf{x}_0$.
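To make the updates (3.2) and (3.6) concrete, the following is a minimal NumPy sketch of PGD and of its averaged variant APGD on a toy problem. The projector proj_S (here, the orthogonal projection onto the nonnegative orthant, a closed convex set), the problem sizes, and the iteration count are illustrative assumptions only; they stand in for the CNN-based projector used later in this chapter.

import numpy as np

def pgd(H, y, proj_S, gamma, alpha=1.0, n_iter=100, x0=None):
    # PGD update (3.2); choosing alpha < 1 gives the averaged variant APGD (3.6).
    x = np.zeros(H.shape[1]) if x0 is None else x0.copy()
    for _ in range(n_iter):
        grad = H.T @ (H @ x - y)          # gradient of the quadratic data term
        z = proj_S(x - gamma * grad)      # Landweber step followed by the projection
        x = (1 - alpha) * x + alpha * z   # alpha = 1 recovers plain PGD
    return x

# Toy example with an underdetermined forward model and a nonnegativity constraint.
rng = np.random.default_rng(0)
H = rng.standard_normal((30, 60))
x_true = np.maximum(rng.standard_normal(60), 0)
y = H @ x_true
lam_max = np.linalg.eigvalsh(H.T @ H).max()
x_rec = pgd(H, y, proj_S=lambda v: np.maximum(v, 0),
            gamma=1.0 / lam_max, alpha=0.5, n_iter=500)
print("relative error:", np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true))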

3.3 Relaxation with Guaranteed Convergence


Despite their elegance, Theorems 3.2.4 and 3.2.5 are not directly productive when we construct the projector $P_\mathcal{S}$ by training a CNN because it is unclear how to enforce the Lipschitz continuity of $P_\mathcal{S}$ on the CNN architecture. Without putting any constraints on the CNN, however, we can still achieve the convergence of the reconstruction sequence by modifying PGD as described in Algorithm 1; we name
it relaxed projected gradient descent (RPGD). In Algorithm 1, the projector $P_\mathcal{S}$ is replaced by the general nonlinear operator $F$. We also introduce a sequence $\{c_k\}$ that governs the rate of convergence of the algorithm and a sequence $\{\alpha_k\}$ of relaxation parameters that evolves with the algorithm. The convergence of RPGD is guaranteed by Theorem 3.3.1. More importantly, if the nonlinear operator $F$ is actually a projector and the relaxation parameters do not go all the way to 0, then RPGD converges to a meaningful point.

Algorithm 1 Relaxed projected gradient descent (RPGD)

Input: $\mathbf{H}$, $\mathbf{y}$, $\mathbf{A}$, nonlinear operator $F$, step size $\gamma > 0$, positive sequence $\{c_n\}_{n \geq 1}$, $\mathbf{x}_0 = \mathbf{A}\mathbf{y} \in \mathbb{R}^N$, $\alpha_0 \in (0, 1]$.
Output: reconstructions $\{\mathbf{x}_k\}$, relaxation parameters $\{\alpha_k\}$.

  $k \leftarrow 0$
  while not converged do
    $\mathbf{z}_k = F(\mathbf{x}_k - \gamma\mathbf{H}^T\mathbf{H}\mathbf{x}_k + \gamma\mathbf{H}^T\mathbf{y})$
    if $k \geq 1$ then
      if $\|\mathbf{z}_k - \mathbf{x}_k\|_2 > c_k \|\mathbf{z}_{k-1} - \mathbf{x}_{k-1}\|_2$ then
        $\alpha_k = \left( c_k \|\mathbf{z}_{k-1} - \mathbf{x}_{k-1}\|_2 / \|\mathbf{z}_k - \mathbf{x}_k\|_2 \right) \alpha_{k-1}$
      else
        $\alpha_k = \alpha_{k-1}$
      end if
    end if
    $\mathbf{x}_{k+1} = (1 - \alpha_k)\mathbf{x}_k + \alpha_k \mathbf{z}_k$
    $k \leftarrow k + 1$
  end while
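For illustration, a minimal NumPy transcription of Algorithm 1 is sketched below. The operator F is any callable (for instance, a trained CNN wrapped as a function), A is the linear reconstruction used for the initialization, and a fixed iteration count replaces the stopping criterion; all of these are simplifying assumptions of the sketch rather than the actual implementation.

import numpy as np

def rpgd(H, y, A, F, gamma, c=0.99, alpha0=1.0, n_iter=100):
    # Relaxed PGD (Algorithm 1) with a constant sequence c_k = c.
    x = A(y)                                        # initialization x_0 = A y (e.g., an FBP)
    alpha, prev_res = alpha0, None
    for _ in range(n_iter):
        z = F(x - gamma * (H.T @ (H @ x - y)))      # nonlinear operator applied to the Landweber update
        res = np.linalg.norm(z - x)                 # ||z_k - x_k||_2
        if prev_res is not None and res > c * prev_res:
            alpha = (c * prev_res / res) * alpha    # shrink the relaxation parameter
        x = (1 - alpha) * x + alpha * z             # relaxed update x_{k+1}
        prev_res = res
    return x, alpha

In the experiments of this chapter, F is the CNN trained as a projector and A is the FBP; here both are left abstract.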

Theorem 3.3.1. Let the input sequence $\{c_k\}$ of Algorithm 1 be asymptotically upper-bounded by $C < 1$. Then, the following statements hold true for the reconstruction sequence $\{\mathbf{x}_k\}$:
(i) $\mathbf{x}_k \to \mathbf{x}^*$ as $k \to \infty$, for all choices of $F$;
(ii) if $F$ is continuous and the relaxation parameters $\{\alpha_k\}$ are lower-bounded by $\varepsilon > 0$, then $\mathbf{x}^*$ is a fixed point of
$$G_\gamma(\mathbf{x}) = F(\mathbf{x} - \gamma\mathbf{H}^T\mathbf{H}\mathbf{x} + \gamma\mathbf{H}^T\mathbf{y}); \qquad (3.7)$$

Table 3.1: Convergence of different algorithms for different cases. Here, $G_\gamma$ and $F$ are related by (3.7), and the terms global and local minimum are in the context of the problem (3.1). Note that, when $F$ is $\mathrm{CNN}_{\boldsymbol{\theta}}$, only RPGD offers a convergence guarantee.

Algorithm | $\mathcal{S}$ | $F$ | Converges | Converged-Point Property
PGD  | Convex    | $P_\mathcal{S}$                        | Always               | Global minimum
PGD  | NonConvex | $P_\mathcal{S}$                        | Not always           | Fixed point (FP) of $G_\gamma$
PGD  | NonConvex | $P_\mathcal{S}$ with (3.4)             | Not always           | Local minimum
PGD  | NonConvex | $\mathrm{CNN}_{\boldsymbol{\theta}}$   | Not always           | Unknown
APGD | NonConvex | $P_\mathcal{S}$ with (3.4) and $L = 1$ | If $G_\gamma$ has FP | Local minimum
APGD | NonConvex | $\mathrm{CNN}_{\boldsymbol{\theta}}$   | Not always           | Unknown
RPGD | NonConvex | $P_\mathcal{S}$ with (3.4)             | Always               | Local minimum if $\alpha_k > 0$
RPGD | NonConvex | $\mathrm{CNN}_{\boldsymbol{\theta}}$   | Always               | FP of $G_\gamma$ if $\alpha_k > 0$

(iii) if, in addition to (ii), $F$ is indeed a projector onto $\mathcal{S}$ that satisfies (3.4), then $\mathbf{x}^*$ is a local minimizer of (3.1).

We prove Theorem 3.3.1 in Appendix A.2.1. Note that the weakest statement here is (i); it guarantees that RPGD always converges, albeit not necessarily to a fixed point of $G_\gamma$. Moreover, the assumption about the continuity of $F$ in (ii) is automatically satisfied when $F$ is a CNN.
In summary, we have described three algorithms: PGD, APGD, and RPGD. PGD is a standard algorithm which, in the event of convergence, finds a local minimum of (3.1); however, it does not always converge. APGD ensures convergence under the broader set of conditions given in Theorem 3.2.5; but, in order to have these properties, both PGD and APGD necessarily need a projector. While we shall train our CNN to act like a projector, it may not exactly fulfill the required conditions. This is the motivation for RPGD which, unlike PGD and APGD, is guaranteed to converge. It also retains the desirable properties of PGD and APGD: it finds a local minimum of (3.1), given that the conditions (ii) and (iii) of Theorem 3.3.1 are satisfied. Note, however, that when the set $\mathcal{S}$ is nonconvex, this local minimum may not be a global minimum. The results of Sections 3.2 and 3.3
are summarized in Table 3.1.

3.4 Training a CNN as a Projector


For any point $\mathbf{x} \in \mathcal{S}$, a projector onto $\mathcal{S}$ should satisfy $P_\mathcal{S}\mathbf{x} = \mathbf{x}$. Moreover, we want that
$$\mathbf{x} = P_\mathcal{S}(\tilde{\mathbf{x}}), \qquad (3.8)$$
where $\tilde{\mathbf{x}}$ is any perturbed version of $\mathbf{x}$. Given the training set $\{\mathbf{x}^1, \ldots, \mathbf{x}^Q\}$ of $Q$ points drawn from the set $\mathcal{S}$, we generate the ensemble
$$\{\{\tilde{\mathbf{x}}^{1,1}, \ldots, \tilde{\mathbf{x}}^{Q,1}\}, \ldots, \{\tilde{\mathbf{x}}^{1,N}, \ldots, \tilde{\mathbf{x}}^{Q,N}\}\}$$
of $N \times Q$ perturbed points and train the CNN by minimizing the loss function
$$J(\boldsymbol{\theta}) = \sum_{n=1}^{N} \underbrace{\sum_{q=1}^{Q} \|\mathbf{x}^q - \mathrm{CNN}_{\boldsymbol{\theta}}(\tilde{\mathbf{x}}^{q,n})\|_2^2}_{J_n(\boldsymbol{\theta})}. \qquad (3.9)$$
The optimization proceeds by stochastic gradient descent for $T$ epochs, where an epoch is defined as one pass through the training data.
It remains to select the perturbations that generate the $\tilde{\mathbf{x}}^{q,n}$. Our goal here is to create a diverse set of perturbations so that the CNN does not overfit one specific type. In our experiments, while training for the $t$th epoch, we chose
$$\tilde{\mathbf{x}}^{q,1} = \mathbf{x}^q, \qquad (3.10)$$
$$\tilde{\mathbf{x}}^{q,2} = \mathbf{A}\mathbf{H}\mathbf{x}^q, \qquad (3.11)$$
$$\tilde{\mathbf{x}}^{q,3} = \mathrm{CNN}_{\boldsymbol{\theta}_{t-1}}(\tilde{\mathbf{x}}^{q,2}), \qquad (3.12)$$
where $\mathbf{A}$ is a classical linear reconstruction algorithm (FBP in our experiments), and $\boldsymbol{\theta}_t$ are the CNN parameters after $t$ epochs. Equations (3.10), (3.11), and (3.12) correspond to no perturbation, a linear perturbation, and a dynamic nonlinear perturbation, respectively. We now comment on each perturbation in detail.
Keeping $\tilde{\mathbf{x}}^{q,1}$ in the training ensemble will train the CNN with the defining property of the projector: the projector maps a point in the set $\mathcal{S}$ onto itself. If the CNN were trained only with (3.10), it would be an autoencoder [14].

To understand the perturbation $\tilde{\mathbf{x}}^{q,2}$ in (3.11), recall that $\mathbf{A}\mathbf{H}\mathbf{x}^q$ is the classical linear reconstruction of $\mathbf{x}^q$ from its measurement $\mathbf{y} = \mathbf{H}\mathbf{x}^q$. Perturbation (3.11) is indeed useful because we initialize RPGD with $\mathbf{A}\mathbf{H}\mathbf{x}^q$. Using only (3.11) for training would return the same CNN as in [17].
The perturbation $\tilde{\mathbf{x}}^{q,3}$ in (3.12) is the output of the CNN whose parameters $\boldsymbol{\theta}_t$ change with every epoch $t$; thus, it is a nonlinear and dynamic (epoch-dependent) perturbation of $\mathbf{x}^q$. The rationale for using (3.12) is that it greatly increases the training diversity by allowing the network to see $T$ new perturbations of each training point, without greatly increasing the total training size, since it only requires $Q$ additional gradient computations per epoch. Moreover, (3.12) is in sync with the iterative scheme of RPGD, where the output of the CNN is processed with a gradient descent and is again fed back into itself.
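To illustrate how the ensembles (3.10)-(3.12) and the loss (3.9) fit together, here is a small Python sketch. The functions radon and fbp (playing the roles of H and A), the callable cnn, and the NumPy-only formulation are placeholder assumptions of this sketch, not the actual training implementation.

import numpy as np

def build_perturbations(x_train, radon, fbp, cnn):
    # Return the three perturbed ensembles of (3.10)-(3.12) for one epoch.
    x1 = [x.copy() for x in x_train]            # (3.10): no perturbation
    x2 = [fbp(radon(x)) for x in x_train]       # (3.11): linear perturbation A H x
    x3 = [cnn(x) for x in x2]                   # (3.12): dynamic perturbation CNN_{theta_{t-1}}(A H x)
    return x1, x2, x3

def projector_loss(cnn, x_train, ensembles):
    # Loss (3.9): sum of squared errors between CNN outputs and the clean targets.
    loss = 0.0
    for perturbed in ensembles:                 # ensembles is a subset of (x1, x2, x3)
        for x_clean, x_pert in zip(x_train, perturbed):
            loss += np.sum(np.abs(cnn(x_pert) - x_clean) ** 2)
    return loss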

3.4.1 Architecture
Our CNN architecture is the same as in [17], which is a U-net [109] with intrinsic
skip connections among its layers and an extrinsic skip connection between the
input and the output. The intrinsic skip connections help to eliminate singularities
during the training [110]. The extrinsic skip connections make this network a
residual net; i.e., $\mathrm{CNN} = \mathrm{Id} + \mathrm{Unet}$, where $\mathrm{Id}$ denotes the identity operator and $\mathrm{Unet} : \mathbb{R}^N \to \mathbb{R}^N$ denotes the U-net as a function. Therefore, the U-net actually provides
the projection error (negative perturbation) that should be added to the input to
get the projection.
Residual nets have been shown to be effective for image recognition [111] and for
solving inverse problems [17]. While the residual-net architecture does not increase
the capacity or the approximation power of the CNN, it does help in learning
functions that are close to an identity operator, as is the case in our setting.

3.4.2 Sequential Training Strategy


We train the CNN in three stages. In Stage 1, we train it for $T_1$ epochs with respect to the partial-loss function $J_2$ in (3.9), which only uses the ensemble $\{\tilde{\mathbf{x}}^{q,2}\}$ generated by (3.11). In Stage 2, we add the ensemble $\{\tilde{\mathbf{x}}^{q,3}\}$ according to (3.12) at every epoch and then train the CNN with respect to the loss function $J_2 + J_3$; we repeat this procedure for $T_2$ epochs. Finally, in Stage 3, we train the CNN for $T_3$ epochs with all three ensembles $\{\tilde{\mathbf{x}}^{q,1}, \tilde{\mathbf{x}}^{q,2}, \tilde{\mathbf{x}}^{q,3}\}$ to minimize the original loss function $J = J_1 + J_2 + J_3$ from (3.9).
We shall see in Section 3.7.2 that this sequential procedure speeds up the training without compromising the performance. The parameters of the U-net are initialized by a normal distribution with a very low variance. Since $\mathrm{CNN} = \mathrm{Id} + \mathrm{Unet}$, this function acts close to an identity operator in the initial epochs and makes it redundant to use $\{\tilde{\mathbf{x}}^{q,1}\}$ for the initial training stages. Therefore, $\{\tilde{\mathbf{x}}^{q,1}\}$ is only added at the last stage, when the CNN is no longer close to an identity operator. After training with only $\{\tilde{\mathbf{x}}^{q,2}\}$ in Stage 1, $\tilde{\mathbf{x}}^{q,3}$ will be close to $\mathbf{x}^q$ since it is the output of the CNN for the input $\tilde{\mathbf{x}}^{q,2}$. This eases the training for $\{\tilde{\mathbf{x}}^{q,3}\}$ in the second and third stages.
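The three-stage schedule can be summarized as in the sketch below, which reuses the ensemble construction of the previous sketch. The helper train_one_epoch is assumed to perform one pass of stochastic gradient descent on the chosen ensembles; it and the epoch counts T1, T2, T3 are placeholders.

def sequential_training(cnn, x_train, radon, fbp, train_one_epoch, T1, T2, T3):
    # Three-stage training of the CNN projector (Section 3.4.2).
    for stage, n_epochs in enumerate((T1, T2, T3), start=1):
        for _ in range(n_epochs):
            x1 = x_train                                  # ensemble (3.10)
            x2 = [fbp(radon(x)) for x in x_train]         # ensemble (3.11)
            x3 = [cnn(x) for x in x2]                     # ensemble (3.12), regenerated every epoch
            if stage == 1:
                ensembles = [x2]                          # Stage 1: minimize J2
            elif stage == 2:
                ensembles = [x2, x3]                      # Stage 2: minimize J2 + J3
            else:
                ensembles = [x1, x2, x3]                  # Stage 3: minimize J1 + J2 + J3
            train_one_epoch(cnn, x_train, ensembles)
    return cnn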

3.5 Experiments
We validate the proposed method on the challenging case of sparse-view CT recon-
struction. Conventionally, CT imaging requires many views to obtain good quality
reconstruction. We call this scenario full-dose reconstruction. Our main aim in
these experiments is to reduce the number of views (or dose) for CT imaging while
retaining the quality of full-dose reconstructions. We denote a $k$-times reduction in views by $\times k$.
The measurement operator $\mathbf{H}$ for our experiments is the Radon transform. It maps an image to the values of its integrals along a known set of lines [2]. In 2D, the measurements are indexed by the angle and offset of each line and arranged in a 2D sinogram. We implemented $\mathbf{H}$ and $\mathbf{H}^T$ with Matlab's radon and iradon (normalized to satisfy the adjoint property), respectively. The Matlab code for RPGD and for the sequential-strategy-based training is made publicly available².

3.5.1 Datasets
We use two datasets for our experiments.
1) Mayo Clinic Dataset. It consists of 500 clinically realistic, (512 × 512) CT images from the lower lungs to the lower abdomen of 10 patients. Those were obtained from the Mayo Clinic AAPM Low Dose CT Grand Challenge [112].
2) Rat Brain Dataset. We use a real (1493 px × 720 view × 377 slice) sinogram from a CT scan of a single rat brain. The data acquisition was performed at the
² https://github.com/harshit-gupta-epfl/CNN-RPGD

Paul Scherrer Institute in Villigen, Switzerland at the TOMCAT beam line of the
Swiss Light Source. During pre-processing, we split this sinogram slice-by-slice
and downsampled it to create a dataset of 377 (729 px × 720 view) sinograms. CT images of size (512 × 512) were then generated from these full-dose sinograms (using the FBP, see Section 3.5.3). For the $q$th z-slice, we denote the corresponding image $\mathbf{x}^q_{\mathrm{FD}}$. For experiments based on this dataset, the first 327 and the last 25 slices are
used for training and testing, respectively. This left a gap of 25 slices in between
the training and testing data.

3.5.2 Experimental Setups


We now describe three experimental setups. We use the first dataset for the first
experiment and the second for the last two.
1) Experiment 1. We split the Mayo dataset into 475 images from 9 patients for training and 25 images from the remaining patient for testing. We assume these images to be the ground truth. From the $q$th image $\mathbf{x}^q$, we generated the sparse-view sinogram $\mathbf{y}^q = \mathbf{H}\mathbf{x}^q$ under several different experimental conditions. Our task is to reconstruct the image from the sinogram.
The sinograms always have 729 offsets per view, but we varied the number of views and the level of measurement noise for the different cases. We took 144 views and 45 views, which correspond to ×5 and ×16 dosage reductions (assuming that a full-view sinogram has 720 views). We added Gaussian noise to the sinograms to make the SNR equal to 35, 40, 45, 70, and infinity dB, where we refer to the first three as high measurement noise and the last two as low measurement noise. The SNR of the sinogram $\mathbf{y} + \mathbf{n}$ is defined as
$$\mathrm{SNR}(\mathbf{y} + \mathbf{n}, \mathbf{y}) = 20 \log_{10}\left( \|\mathbf{y}\|_2 / \|\mathbf{n}\|_2 \right) \qquad (3.13)$$
(a small sketch of how the noise is scaled to reach a target SNR is given after the experiment descriptions). For testing with the low and high measurement noise, we trained the CNNs without noise and at the 40-dB level of noise, respectively (see Section 3.5.4 for details).
To make the experiments more realistic and to reduce the inverse crime, the sinograms were generated by slightly perturbing the angles of the views with a zero-mean additive white Gaussian noise (AWGN) with a standard deviation of 0.05 degrees. This creates a deliberate mismatch between the actual measurement process and the forward model.
2) Experiment 2. We used the images $\mathbf{x}^q_{\mathrm{FD}}$ from the rat-brain dataset to generate Poisson-noise-corrupted sinograms $\mathbf{y}^q$ with 144 views. Just as in Experiment 1, the task is to reconstruct $\mathbf{x}^q_{\mathrm{FD}}$ back from $\mathbf{y}^q$. Sinograms were generated with 25, 30, and 35 dB SNR with respect to $\mathbf{H}\mathbf{x}^q_{\mathrm{FD}}$. To achieve this, in (A.35) and (A.36), we assume the readout noise to be zero and $\{b_1, \ldots, b_m\} = b_0 = 1.66 \times 10^5$, $5.24 \times 10^5$, and $1.66 \times 10^6$, respectively. More details about this process are given in Appendix A.2.2. The CNNs were trained at only the 30-dB level of noise. Again, our task is to reconstruct the images from the sinograms.
3) Experiment 3. We downsampled the views of the original, (729 × 720) rat-brain sinograms by 5 to obtain sparse-view sinograms of size (729 × 144). For the $q$th z-slice, we denote the corresponding sparse-view sinogram $\mathbf{y}^q_{\mathrm{Real}}$. Note that,
unlike in Experiments 1 and 2, the sinogram was not generated from an image but
was obtained experimentally.
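As referenced in the description of Experiment 1, additive Gaussian noise can be rescaled so that the sinogram reaches a prescribed SNR in the sense of (3.13). The following is a generic sketch of that step, not the exact code used for the experiments; the sinogram sizes are illustrative.

import numpy as np

def add_noise_at_snr(y, snr_db, rng=None):
    # Add white Gaussian noise n to y such that SNR(y + n, y) = snr_db, cf. (3.13).
    rng = np.random.default_rng() if rng is None else rng
    n = rng.standard_normal(y.shape)
    n *= np.linalg.norm(y) / (np.linalg.norm(n) * 10 ** (snr_db / 20.0))
    return y + n

sinogram = np.random.rand(729, 45)          # e.g., a 45-view sinogram (the x16 case)
noisy = add_noise_at_snr(sinogram, snr_db=40.0)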

3.5.3 Comparison Methods


Given the ground truth $\mathbf{x}$, our figure of merit for the reconstructed $\mathbf{x}^*$ is the regressed SNR, given by
$$\mathrm{SNR}(\mathbf{x}^*, \mathbf{x}) = \max_{a,b}\; \mathrm{SNR}(a\mathbf{x}^* + b, \mathbf{x}), \qquad (3.14)$$
where the purpose of $a$ and $b$ is to adjust for contrast and offset (a small sketch of this computation is given after the list of methods). We also evaluate the performance using the structural similarity index (SSIM) [113]. We compare five reconstruction methods.
1) FBP. FBP is the classical direct inversion of the Radon transform H, here
implemented in Matlab by the iradon command with the ram-lak filter and linear
interpolation as options.
2) Total-Variation Reconstruction. TV solves
$$\mathbf{x}_{\mathrm{TV}} = \arg\min_{\mathbf{x}} \left( \tfrac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2 + \lambda \|\mathbf{x}\|_{\mathrm{TV}} \right) \ \ \mathrm{s.t.}\ \ \mathbf{x} \geq 0, \qquad (3.15)$$
where
$$\|\mathbf{x}\|_{\mathrm{TV}} = \sum_{i=1}^{N-1} \sum_{j=1}^{N-1} \sqrt{(D_{h;i,j}(\mathbf{x}))^2 + (D_{v;i,j}(\mathbf{x}))^2},$$
$D_{h;i,j}(\mathbf{x}) = [\mathbf{x}]_{i,j+1} - [\mathbf{x}]_{i,j}$, and $D_{v;i,j}(\mathbf{x}) = [\mathbf{x}]_{i+1,j} - [\mathbf{x}]_{i,j}$. The optimization is carried out via ADMM [13].

3) Dictionary Learning (DL). DL [96] solves


$$\mathbf{x}_{\mathrm{DL}} = \arg\min_{\mathbf{x},\boldsymbol{\alpha}} \left( \|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2 + \lambda \sum_{j=1}^{J} \left( \|\mathbf{E}_j \mathbf{x} - \mathbf{D}\boldsymbol{\alpha}_j\|_2^2 + \nu_j \|\boldsymbol{\alpha}_j\|_0 \right) \right), \qquad (3.16)$$
where $\mathbf{E}_j : \mathbb{R}^{N \times N} \to \mathbb{R}^{L^2}$ extracts and vectorizes the $j$th patch of size $(L \times L)$ from the image $\mathbf{x}$, $\mathbf{D} \in \mathbb{R}^{L^2 \times 256}$ is the dictionary, $\boldsymbol{\alpha}_j$ is the $j$th column of $\boldsymbol{\alpha} \in \mathbb{R}^{256 \times R}$, and $R = (N - L + 1)^2$. Note that the patches are extracted with a sliding distance of one pixel.
For a given y, the dictionary D is learned from the corresponding ground truth
using the procedure described in [114]. The objective (3.16) is then solved iteratively
by first minimizing it with respect to x using gradient descent as described in [96]
and then with respect to $\boldsymbol{\alpha}$ using orthogonal matching pursuit (OMP) [115]. Since
D is learned from the testing ground truth itself, the performance that we report
here is an upper bound to the one that would be achieved by learning it using the
training images.
4) FBPconv. FBPconv [17] is a state-of-the-art deep-learning technique, in
which a residual CNN with U-net architecture is trained to directly denoise the FBP. It has been shown to outperform other deep-learning-based direct reconstruction
methods for sparse-view CT. In our proposed method, we use a CNN with the same
architecture as in FBPconv. As a result, in our framework, FBPconv corresponds
to training with only the ensemble in (3.11). In the testing phase, the FBP of the
measurements is fed into the trained CNN to output the reconstruction image.
5) RPGD. RPGD is our proposed method. It is described in Algorithm 1.
There the nonlinear operator F is the CNN trained as a projector (as discussed
in Section 3.4). For experiments with Poisson noise, we use the slightly modified
RPGD described in Appendix A.2.2. For all the experiments, FBP is used for the
operator A.
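As promised above, a minimal sketch of the regressed-SNR computation (3.14) follows. It relies on the fact that the optimal contrast a and offset b are obtained by a linear least-squares fit of the reconstruction to the ground truth; the function names are illustrative.

import numpy as np

def snr_db(x_hat, x):
    # Plain SNR in dB between a reconstruction x_hat and the ground truth x.
    return 20 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))

def regressed_snr_db(x_hat, x):
    # Regressed SNR (3.14): fit a * x_hat + b to x in the least-squares sense first.
    u, v = x_hat.ravel(), x.ravel()
    A = np.stack([u, np.ones_like(u)], axis=1)      # columns: reconstruction and constant offset
    (a, b), *_ = np.linalg.lstsq(A, v, rcond=None)  # optimal contrast a and offset b
    return snr_db(a * x_hat + b, x)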

3.5.4 Training and Selection of Parameters


1) Experiment 1. For TV, the regularization parameter $\lambda$ is selected via a golden-section search over 20 values so as to maximize the SNR of $\mathbf{x}_{\mathrm{TV}}$ with respect to the ground truth. We set the additional penalty parameter inside ADMM (see Equation (2.6) in [13]) equal to $\lambda$. The rationale for this heuristic is that it puts the soft-threshold parameter in the same order of magnitude as the image gradients. We set the number of iterations to 100, which was enough to show good empirical convergence.
For DL, the parameters are selected via a parameter sweep, roughly following
the approach described in [96, Table 1]. Specifically: The patch size is L = 8.
During dictionary learning, the sparsity level is set to 5 and 10. During recon-
struction, the sparsity level for OMP is set to 5, 8, 10, 12, 20, and 25, while the
tolerance level is taken to be 10, 100, and 1000. This, in effect, is the same as sweeping over $\nu_j$ in (3.16). For each of these $2 \times 6 \times 3 = 36$ parameter settings, $\lambda$ in (3.16) is chosen by a golden-section search over 7 values.
As discussed earlier, the CNNs for both the ×5 and ×16 cases are trained separately for high and low measurement noise.
i) Training with Noiseless Measurements. The training of the projector for RPGD follows the sequential procedure described in Section 3.4, with the configurations

• ×5, no noise: $T_1 = 80$, $T_2 = 49$, $T_3 = 5$;

• ×16, no noise: $T_1 = 71$, $T_2 = 41$, $T_3 = 11$.

We use the CNN obtained right after the first stage for FBPconv, since during this
stage, only the training ensemble in (3.11) is taken into account. We empirically
found that the training error J2 converged in T1 epochs of Stage 1, yielding an
optimal performance for FBPconv.
ii) Training with 40-dB Measurement Noise. This includes replacing the ensemble in (3.11) with $\{\mathbf{A}\mathbf{y}^q\}$, where $\mathbf{y}^q = \mathbf{H}\mathbf{x}^q + \mathbf{n}$ has a 40-dB SNR with respect to $\mathbf{H}\mathbf{x}^q$. With 20% probability, we also perturb the views of the measurements with an AWGN of 0.05 standard deviation so as to enforce robustness to model mismatch. These CNNs are initialized with the ones obtained after the first stage of the noiseless training and are then trained with the configurations

• ×5, 40-dB noise: $T_1 = 35$, $T_2 = 49$, $T_3 = 5$;

• ×16, 40-dB noise: $T_1 = 32$, $T_2 = 41$, $T_3 = 11$.



Similarly to the previous case, the CNNs obtained after the first and the third
training stage are used in FBPconv and RPGD, respectively. For clarity, these
variants will be referred to as FBPconv40 and RPGD40.
The learning rate is decreased in a geometric progression from $10^{-2}$ to $10^{-3}$ in Stage 1 and kept at $10^{-3}$ for Stages 2 and 3. Recall that the last two stages contain the ensemble with the dynamic perturbation (3.12), which changes in every epoch. The lower learning rate, therefore, avoids drastic changes in the parameters between epochs. The batch size is fixed to 2. The other hyper-parameters follow [17]. For stability, gradients above $10^{-2}$ are clipped and the momentum is set to 0.99. The total training time for the noiseless case is around 21.5 hours on a Titan X GPU (Pascal architecture).
The hyper-parameters for RPGD are chosen as follows: The relaxation parameter $\alpha_0$ is initialized with 1, and the sequence $\{c_k\}$ is set to the constant $C = 0.99$ for RPGD and $C = 0.8$ for RPGD40. For each noise level and number of views, the only free parameter $\gamma$ is swept over 20 values geometrically spaced between $10^{-2}$ and $10^{-5}$. We pick the $\gamma$ which gives the best average SNR over the 25 test images. Note that, for TV and DL, the value of the optimal $\lambda$ generally increases as the measurement noise increases; however, no such obvious relation exists for $\gamma$. This is mainly because $\gamma$ is the step size of the gradient descent in RPGD and not a regularization parameter. In all experiments, the gradient step is skipped during the first iteration.
On the GPU, one iteration of RPGD takes less than 1 second. The algorithm is stopped when the residual $\|\mathbf{x}_{k+1} - \mathbf{x}_k\|_2$ reaches a value less than 1, which is sufficiently small compared to the dynamic range $[0, 350]$ of the image. It takes around 1-2 minutes to reconstruct an image with RPGD.
2) Experiment 2. For this case, the CNNs are trained similarly to the CNN for RPGD40 in Experiment 1. Perturbations (3.10)-(3.12) are used, with $\mathbf{A}\mathbf{H}\mathbf{x}^q_{\mathrm{FD}}$ in (3.11) replaced by $\mathbf{A}\mathbf{y}^q$, where $\mathbf{y}^q$ has 30-dB Poisson noise. The $\mathbf{x}^q_{\mathrm{FD}}$ and $\mathbf{A}\mathbf{y}^q_{\mathrm{Real}}$ are multiplied with a constant so that their maximum pixel value is 480.
The CNN obtained after the first stage is used as FBPconv.
While testing, we keep $C = 0.4$. The other training hyper-parameters and testing parameters of RPGD are kept the same as those of RPGD40 for the ×5 case in Experiment 1.
Experiment 1.
3) Experiment 3. The CNNs are trained using the perturbations (3.10)-(3.12) with two modifications: (i) $\mathbf{x}^q$ is replaced with $\mathbf{x}^q_{\mathrm{FD}}$ because the actual ground truth
Table 3.2: Reconstruction results for Experiment 1 with low measurement noise (Gaussian). Gray cells indicate that the method was tuned/trained for the corresponding noise level.

Case | Measurement SNR (dB) | Quality Index | FBP | TV | DL | FBPconv | RPGD
×16 | ∞  | SNR  | 12.74 | 24.21 | 23.11 | 26.19 | 27.02
×16 | ∞  | SSIM | 0.178 | 0.277 | 0.231 | 0.323 | 0.374
×16 | 70 | SNR  | 12.73 | 24.20 | 23.10 | 26.18 | 26.94
×16 | 70 | SSIM | 0.178 | 0.277 | 0.231 | 0.324 | 0.325
×5  | ∞  | SNR  | 24.19 | 30.80 | 29.36 | 32.09 | 32.62
×5  | ∞  | SSIM | 0.434 | 0.511 | 0.424 | 0.480 | 0.554
×5  | 70 | SNR  | 24.15 | 30.74 | 29.24 | 32.08 | 32.56
×5  | 70 | SSIM | 0.432 | 0.507 | 0.422 | 0.483 | 0.553

was unavailable; and (ii) $\mathbf{A}\mathbf{H}\mathbf{x}^q$ in (3.11) is replaced with $\mathbf{A}\mathbf{y}^q_{\mathrm{Real}}$ because we now have access to the actual sinogram.
All other training hyper-parameters and testing parameters are kept the same as those of RPGD for the ×5 case in Experiment 1. As in Experiment 1, the CNN obtained after the first stage of the sequential training is used as FBPconv.

3.6 Results and Discussions


3.6.1 Experiment 1
We report in Tables 3.2 and 3.3 the results for low and high measurement noise,
respectively. FBPconv and RPGD are used for low noise, while FBPconv40 and
RPGD40 are used for high noise. The reconstruction SNRs and SSIMs are averaged
over the 25 test images. The gray cells indicate that the method was optimized for
that level of noise. As discussed earlier, adjusting $\lambda$ for TV and DL indirectly implies
tuning for the measurement noise; therefore, all of the cells in these columns are
gray. This is different for the learning methods, where tuning for the measurement
noise requires retraining.
1) Low Measurement Noise. In the low-noise cases (Table 3.2), the proposed

Figure 3.2: Comparison of reconstructions using different methods for the ×16 case in Experiment 1. First column: reconstruction from noiseless measurements of a lung image. Second column: zoomed version of the area marked by the box in the original in the first column. Third and fourth columns: zoomed versions for the cases of 45 and 35 dB, respectively. Fifth to eighth columns: corresponding results for an abdomen image. Seventh and eighth columns correspond to 45 and 40 dB, respectively.
Table 3.3: Reconstruction results for Experiment 1 with high measurement noise (Gaussian). Gray cells indicate that the method was tuned/trained for the corresponding noise level.

Case | Measurement SNR (dB) | Quality Index | FBP | TV | DL | FBPconv40 | RPGD40
×16 | 40+5 | SNR  | 11.08 | 22.59 | 22.74 | 20.87 | 24.16
×16 | 40+5 | SSIM | 0.127 | 0.238 | 0.222 | 0.161 | 0.262
×16 | 40   | SNR  | 9.09  | 21.40 | 22.13 | 23.26 | 23.73
×16 | 40   | SSIM | 0.096 | 0.210 | 0.209 | 0.205 | 0.252
×16 | 40−5 | SNR  | 6.51  | 20.01 | 20.93 | 16.20 | 22.59
×16 | 40−5 | SSIM | 0.066 | 0.179 | 0.187 | 0.128 | 0.221
×5  | 40+5 | SNR  | 18.85 | 27.18 | 27.82 | 22.56 | 27.17
×5  | 40+5 | SSIM | 0.241 | 0.367 | 0.364 | 0.201 | 0.384
×5  | 40   | SNR  | 14.96 | 25.46 | 26.26 | 28.24 | 27.61
×5  | 40   | SSIM | 0.167 | 0.314 | 0.315 | 0.324 | 0.361
×5  | 40−5 | SNR  | 10.76 | 23.44 | 22.24 | 18.90 | 24.58
×5  | 40−5 | SSIM | 0.110 | 0.261 | 0.263 | 0.193 | 0.300

RPGD method outperforms all the others for both ×5 and ×16 reductions in terms of SNR and SSIM indices. FBP performs the worst but is able to retain enough information to be utilized by FBPconv and RPGD. Due to the convexity of its iterative scheme, TV is able to perform well but tends to smooth textures and edges. DL performs worse than TV for the ×16 case but is equivalent to it for the ×5 case. FBPconv outperforms both TV and DL, but it is surpassed by RPGD. This is mainly due to the feedback mechanism in RPGD, which lets RPGD use the information in the given measurements to increase the quality of the reconstruction. In fact, for the ×16, no-noise case, the SNRs of the sinograms of the reconstructed images for TV, FBPconv, and RPGD are around 47 dB, 57 dB, and 62 dB, respectively. This means that reconstruction using RPGD has both better image quality and more reliability, since it is consistent with the given noiseless measurement.
2) High Measurement Noise. In the noisier cases (Table 3.3), RPGD40 yields a
better SNR than other methods in the low-view cases (⇥16) and is more consistent
in performance than the others in the high-view (×5) cases. In terms of the SSIM index, it outperforms all of them. The performances of DL and TV are robust to the noise level, with DL performing better than the others in terms of SNR for the 45-dB, ×5 case. FBPconv40 substantially outperforms DL and TV in the two scenarios with 40-dB measurement noise, over which it was actually trained. For this noise level and the ×5 case, it even performs slightly better than RPGD40, but only in terms of SNR. However, as the level of noise deviates from 40 dB, the performance of FBPconv40 degrades significantly. Surprisingly, its performances in the 45-dB cases are much worse than those in the corresponding 40-dB cases. In fact, its SSIM index for the 45-dB, ×5 case is even worse than that of FBP. This implies that FBPconv40 is highly sensitive to the difference between the training and testing conditions. By contrast, RPGD40 is more robust to this difference due to its iterative correction. In the ×16 case with 45-dB and 35-dB noise levels, it outperforms FBPconv40 by around 3.5 dB and 6 dB, respectively.
3) Case Study. The reconstructions of lung and abdomen images for the case of ×16 downsampling and noiseless measurements are illustrated in Figure 3.2 (first and fifth columns). FBP is dominated by line artifacts, while TV and DL satisfactorily remove those but blur the fine structures. FBPconv and RPGD are able to reconstruct these details. The zoomed versions (second and sixth columns) suggest that RPGD is able to reconstruct the fine details better than the other methods. This observation remains the same when the measurement quality degrades. The remaining columns contain the reconstructions for different noise levels. For the abdomen image, it is noticeable that only TV is able to retain the small bone structure marked by an arrow in the zoomed version of the lung image (seventh column). A possible reason could be that structures similar to this one were rare in the training set. Increasing the size of the training data with suitable images could be a solution.
Figure 3.3 contains the profiles of high- and low-contrast regions of the recon-
structions for the two images. These regions are marked by line segments inside the
original image in the first column of Figure 3.2. The FBP profile is highly noisy and
the TV and DL profiles overly smooth the details. FBPconv40 is able to accommo-
date the sudden transitions in the high-contrast case. RPGD40 is slightly better
in this regard. For the low-contrast case, RPGD40 is able to follow the structures
of the original (GT) profile better than the others. A similar analysis holds for the
×5 case (see Figure A.3 in the Appendix).

Figure 3.3: Profiles of the high- and low-contrast regions marked in the first and fifth columns of Figure 3.2 by solid and dashed line segments, respectively. First and second columns: ×16, 45-dB noise case for the lung image. Third and fourth columns: ×16, 40-dB noise case for the abdomen image.

3.6.2 Experiment 2

We show in Table 3.4 the regressed SNR and SSIM indices averaged over the 25
reconstructed slices. RPGD outperforms both FBP and FBPconv in terms of SNR
and SSIM. As in Experiment 1, its performance is also more robust with
respect to noise mismatch. Fig. A.4 in the Appendix compares the reconstructions
for a given test slice.
Table 3.4: Reconstruction results for Experiment 2 with Poisson noise and ×5 views reduction. Gray cells indicate that the method was trained for the corresponding noise level.

Measurement SNR (dB) | Quality Index | FBP | FBPconv | RPGD
30−5 | SNR  | 4.61 | 7.45 | 8.21
30−5 | SSIM | 0.112 | 0.134 | 0.154
30   | SNR  | 5.96 | 9.18 | 9.22
30   | SSIM | 0.200 | 0.174 | 0.246
30+5 | SNR  | 7.75 | 9.50 | 9.75
30+5 | SSIM | 0.305 | 0.132 | 0.332

3.6.3 Experiment 3
In Figure 3.4, we show the reconstruction result for one slice for $\gamma = 10^{-5}$. Since the ground truth is unavailable, we show the reconstructions without a quantitative comparison. It can be seen that the proposed method is able to reconstruct images with reasonable perceptual quality.

3.7 Behavior of Algorithms


We now explore the behavior of the proposed method in more detail, including its empirical convergence and the effect of sequential training.

3.7.1 Convergence of RPGD


In Figure 3.5, we show the behavior of RPGD with respect to the iteration number $k$ for Experiment 1. The evolution of the SNR of the images $\mathbf{x}_k$ and of their measurements $\mathbf{H}\mathbf{x}_k$, computed with respect to the ground-truth image and the ground-truth measurement, is shown in Figures 3.5 (a) and (b), respectively. We give $\alpha_k$ with respect to the iteration $k$ in Figure 3.5 (c). The results are averaged over 25 test images for the ×16, no-noise case with $C = 0.99$. RPGD outperforms all the other methods in the context of both image quality and measurement consistency.

Figure 3.4: Reconstruction results for a test slice in Experiment 3. Full-


dose image is obtained by taking FBP of the full-view sinogram. The rest
of the reconstructions are obtained from the sparse-view (×5) sinogram.
The last column shows the difference between the reconstruction and the
full-dose image.

Due to the high value of the step size ($\gamma = 2 \times 10^{-3}$) and the large difference $(\mathbf{H}\mathbf{x}_k - \mathbf{y})$, the initial few iterations have large gradients and result in the instability of the algorithm. The reason is that the CNN is fed with $(\mathbf{x}_k - \gamma\mathbf{H}^T(\mathbf{H}\mathbf{x}_k - \mathbf{y}))$, which is drastically different from the perturbations on which it was trained. In this situation, $\alpha_k$ decreases steeply and stabilizes the algorithm. At convergence, $\alpha_k \neq 0$; therefore, according to Theorem 3.3.1, $\mathbf{x}_{100}$ is a fixed point of (3.7), where $F = \mathrm{CNN}_{\boldsymbol{\theta}}$.

3.7.2 Advantages of Sequential Training


Here, we experimentally verify the advantages of the sequential-training strategy
discussed in Section 3.5. Using the setup of Experiment 1, we compare the training
time and performance of the CNNs trained with and without this strategy for the
×16 downsampling and no-noise case. For the gold standard (systematic training of the CNN), we train a CNN as a projector with the 3 types of perturbation in every epoch. We use 135 epochs for training, which is roughly equal to $(T_1 + T_2 + T_3)$ used during training for the corresponding sequential-training-based CNN. This
number was sufficient for the convergence of the training error. The reconstruction
performance of RPGD using this gold standard CNN is 26.86 dB, compared to
27.02 dB for RPGD using the sequentially trained CNN. The total training times
are 48 and 22 hours, respectively. This demonstrates that the sequential strategy
reduces the training time (in this case more than 50%), while preserving (or even
slightly increasing) the reconstruction performance.

3.8 Summary
In this chapter, we present a new image reconstruction method that replaces the
projector in a projected gradient descent (PGD) with a convolutional neural net-
work (CNN). Recently, CNNs trained as image-to-image regressors have successfully
been used to solve inverse problems in imaging. However, unlike existing iterative
image reconstruction algorithms, these CNN-based approaches usually lack a feed-
back mechanism to enforce that the reconstructed image is consistent with the
measurements. We propose a relaxed version of PGD wherein gradient descent
enforces measurement consistency, while a CNN recursively projects the solution
closer to the space of desired reconstruction images. We show that this algorithm is

Figure 3.5: Convergence with iteration $k$ of RPGD for the Experiment 1, ×16, no-noise case when $C = 0.99$. Results are averaged over 25 test images. (a) SNRs of $\mathbf{x}_k$ with respect to the ground-truth image. (b) SNRs of $\mathbf{H}\mathbf{x}_k$ with respect to the ground-truth sinogram. (c) Evolution of the relaxation parameters $\alpha_k$. In (a) and (b), the FBP, FBPconv, and TV results are independent of the RPGD iteration $k$ but are shown for the sake of comparison.

guaranteed to converge and, under certain conditions, converges to a local minimum of a non-convex inverse problem. Finally, we propose a simple scheme to train the
CNN to act like a projector. Our experiments on sparse-view computed-tomography
reconstruction show an improvement over total variation-based regularization, dic-
tionary learning, and a state-of-the-art deep learning-based direct reconstruction
technique.
Part III

Third Generation

Chapter 4

Time-Dependent Deep Image Prior
Although widely popular, the classical and supervised deep-learning-based methods discussed this far have limitations. The quality of the reconstruction of the former is limited, and the latter require training data. Recently, new unsupervised deep-learning methods have emerged that are based on the deep image prior. These methods require neither additional training nor training data, while their reconstruction quality surpasses that of the classical methods and is comparable to that of the supervised deep-learning approaches. In this chapter, we extend the deep-image-prior methods to time-dependent inverse problems. Although the chapter targets the application of dynamic MRI, the method is applicable to other modalities as well.

4.1 Overview
There are currently three main approaches to accelerate the magnetic resonance
imaging (MRI) of a static image. All three methods rely on a partial sampling of
the k-space to reduce the acquisition time. The resulting partial loss of data must
then be compensated to maintain the quality of the image. Once compensation is
¹ The content of this chapter is based on our work [116].


achieved, the accelerated methods capture accurate motions of fast moving organs
such as the heart.

i. In parallel MRI (pMRI), the simultaneous use of several hardware coils re-
sults in spatial redundancy that enables algorithms to reconstruct clean im-
ages [117, 118].

ii. In compressed sensing (CS) MRI, the data are assumed to be sparse in cer-
tain transform domains [119, 120]. This ultimately leads to regularized com-
putational methods that compensate for the partial sampling. Their success
suggests that, in particular, a Fourier-based forward model matches well the
assumption of sparsity.

iii. In the context of trainable deep artificial neural networks, learning approaches
have already achieved fast and accurate reconstructions of partially sampled
MRI data [15,121]. Similarly to CS MRI, dynamic accelerated reconstructions
have also been proposed in the literature [122–124], possibly in combination
with pMRI in the learning loop [24]. These approaches depend on training
datasets [17, 20, 22, 84, 125, 126].

In the context of dynamic MRI, the approach that consists in the acquisition of
a sequence of frames is suboptimal. Instead, it is more efficient to take advantage of
the time dependencies between frames to gain additional improvements in terms of
temporal resolution [127, 128]. For instance, [129, 130] design non-overlapping sam-
pling masks at each frame to restore a dynamic volume—a method that demands
far fewer k-space samples than would a sequence of static MRI. Indeed, the CS
theory explains the possibility of perfect reconstruction despite a sub-Nyquist sam-
pling [131]. The capture of temporal redundancies has also been handled through
low-dimensional manifolds [132, 133]. In the specific case of cardiac applications,
the overall motion of the heart is expected to be approximately cyclic. This peri-
odicity can be exploited to bring additional gains in terms of temporal resolution,
but the length and phase of the cycles must be determined first. This is usually
achieved either through electrocardiograms (ECG) or self-gating [134, 135]. Under
the restrictive assumption of ideal periodicity, these methods allow one to prefix
the cardiac phases and to reorder temporally the acquired frames, effectively pro-
ducing a stroboscope-inspired analysis of the motion. Motion irregularities are not
captured by those methods.
4.1 Overview 89

4.1.1 Contribution
In this chapter, we propose an unsupervised learning framework in which a neural network reconstructs fast dynamic MRI from golden-angle radial lines in k-space, also called spokes. To reconstruct a single image, we feed one realization of low-dimensional latent variables to an artificial neural network. A nonuniform fast Fourier transform² (NuFFT) is then applied to the output of the neural network and simulates the MRI measurement process [136]. Inspired by deep image priors [27], we fit the simulated measurements to the real measurements. The fit is controlled by adjusting the weights of the neural network until the Euclidean loss between the simulated and real measurements is minimized; this fitting process is referred to as the learning stage.
In the context of dynamic MRI, we extend the fitting process in such a way that
the weights of the network are learned by taking simultaneously into consideration
the joint collection of all acquisitions, which yields time-independent weights. Time
dependence is recovered by controlling the latent variables. Given some temporal
interval, we synthesize two independent random realizations of low-dimensional la-
tent variables and associate one to the initial and one to the final bound of the
interval. Timestamped contributions to learning are obtained by taking as ground-
truth the real measurements acquired over a quasi-instantaneous observation period
(say, five spokes), while we let the activation of the neural network be the inter-
mediate realization of the latent variables obtained by linear interpolation of the
two latent endpoints. This approach allows us to impose and exploit temporal
dependencies in the latent space.
In short, the action of our neural network is to map the manifold of latent
variables onto a manifold of dynamic images. Importantly, our approach is purely
unsupervised; moreover, priors are imposed only indirectly, arising from the mere
structure of a convolutional network. We demonstrate the performance of our
neural network by comparing its reconstructions to those obtained from CS algo-
rithms [129, 130].

4.1.2 Related Works


Deep image priors have been introduced in [27]. They capitalize on the structure of
convolutional artificial neural networks to adjust priors to the data, thus avoiding
² https://github.com/marchdf/python-nufft
the limitations and pitfalls of hand-crafted priors. They have been deployed in [137]
to build an unsupervised learning scheme for accelerated MRI; but, contrarily to
ours, the task addressed therein is static. Other researchers have used deep image
priors to reconstruct positron emission tomography images, albeit again in a non-
dynamic fashion [138].

4.2 Methods
Let $\mathcal{R}$ be the Radon transform of the complex-valued, continuously defined image $x : \mathbb{R}^2 \to \mathbb{C}$, so that
$$\mathcal{R}\{x\}(r, \vartheta) = \int_{\mathbb{R}^2} x(\boldsymbol{\xi})\, \delta(\mathbf{u}_\vartheta^T \boldsymbol{\xi} - r)\, \mathrm{d}\boldsymbol{\xi}, \qquad (4.1)$$
where $r$ and $\vartheta$ are the spatial and angular Radon arguments, respectively, and where $\mathbf{u}_\vartheta$ is a unit vector in the $\vartheta$ direction. Moreover, let $\mathcal{F}$ denote the one-dimensional continuous Fourier transform that follows the convention
$$\mathcal{F}\{x\}(\omega) = \int_{\mathbb{R}} x(r)\, \mathrm{e}^{-\mathrm{j}\omega r}\, \mathrm{d}r \qquad (4.2)$$
at spatial pulsation $\omega$, otherwise known as a coordinate in k-space. Then, our conceptual model of non-Cartesian MRI is the concatenated linear transform
$$H_\vartheta\{x\}(\omega) = \mathcal{F}\{\mathcal{R}\{x\}(\cdot, \vartheta)\}(\omega), \qquad (4.3)$$
which maps a two-dimensional static image onto its measurements at continuously defined direction $\vartheta$ and pulsation $\omega$. Mathematically, $H_\vartheta\{x\}(\omega)$ is invertible because so are $\mathcal{F}$ and $\mathcal{R}$. Consequently, provided we know $H_\vartheta\{x\}(\omega)$ for every direction $\vartheta$ and every pulsation $\omega$, we can in principle recover the value of $x(\boldsymbol{\xi})$ at every spatial argument $\boldsymbol{\xi}$.

4.2.1 Static Discretization


Unfortunately, $H_\vartheta\{x\}(\omega)$ can be known in practice only at finitely many discrete directions and pulsations. This discretization, which amounts to modeling MRI by a NuFFT, makes the discrete transform an unfaithful surrogate of our conceptual model, in particular because of aliasing concerns, and also because the discrete version is no more invertible.

Figure 4.1: Nonuniform fast Fourier transform with golden-angle scheme and spoke sharing.

Formally, let $\mathbf{x} \in \mathbb{C}^N$ be a vectorized version of the samples of $x$ seen as an image of finite size $(N_1 \times N_2)$, with $N = N_1 N_2$. Likewise, let $\mathbf{y} \in \mathbb{C}^M$ be a vectorized version of the samples of the sinogram $H_\vartheta\{x\}(\omega)$, with measurements of finite size $(M_\vartheta \times M_\omega)$ taken over $M_\vartheta$ orientations and $M_\omega$ elements of the k-space, with $M = M_\vartheta M_\omega$. Then, by linearity of the transformation, we write that
$$\mathbf{y} = \mathbf{G}\,\mathbf{x}, \qquad (4.4)$$
where $\mathbf{G}$ is an $M$-rows by $N$-columns matrix that combines discrete Fourier and discrete Radon transforms.
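For intuition, the discrete operator G of (4.4) can be emulated, inefficiently and without the speed of a true NuFFT, by evaluating the two-dimensional Fourier transform of the image directly at radial k-space locations; by the projection-slice theorem this equals the one-dimensional Fourier transform of the Radon projection. The sketch below is a naive direct evaluation with illustrative sizes and is not the implementation used in this chapter.

import numpy as np

def radial_dft(x, thetas, omegas):
    # Evaluate the 2-D DFT of image x at radial k-space samples (one spoke per angle).
    n1, n2 = x.shape
    # Pixel coordinates, centered on the image.
    i, j = np.meshgrid(np.arange(n1) - n1 / 2, np.arange(n2) - n2 / 2, indexing="ij")
    spokes = []
    for theta in thetas:
        kx = omegas * np.cos(theta)          # k-space coordinates along the spoke
        ky = omegas * np.sin(theta)
        phase = np.outer(kx, i.ravel()) + np.outer(ky, j.ravel())
        spokes.append(np.exp(-1j * phase) @ x.ravel())
    return np.stack(spokes)                  # shape: (number of spokes, M_omega)

x = np.random.rand(64, 64)
omegas = np.linspace(-np.pi, np.pi, 128, endpoint=False)
y = radial_dft(x, thetas=[0.0, 1.9417], omegas=omegas)   # two spokes, 111.25 degrees apart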

4.2.2 Spoke Sharing


In dynamic MRI, it is acknowledged that the image $x$ changes continuously through time. We assume however that the measurements of a spoke (a radial line in k-space, as suggested in Figure 4.1) are instantaneous and indexed at discrete times $t_0 \in \mathbb{Z}\,\Delta t$, taken regularly at temporal interval $\Delta t$. The spoke orientations follow the golden-angle strategy
$$\vartheta_t = \vartheta_0 + \omega_0\, t, \qquad (4.5)$$
where $\vartheta_t$ gives the orientation of a spoke at continuous time $t \in \mathbb{R}$, with $\omega_0$ its angular velocity. The golden-angle specificity is the irrationality condition $(\omega_0 \Delta t / \pi) \notin \mathbb{Q}$, which is approximated by setting $(\omega_0 \Delta t) \approx 111.25°$ [129]. We finally denote an image frame at time $t$ as $x_t$ and its spatially discretized version of length $N$ as $\mathbf{x}_t$. Its associated sinogram is $\mathbf{s}_t$. As it turns out, however, it is only natural to set $M_\vartheta = 1$ for the discretization of $\mathbf{s}_t$ because of our assumption of ideal temporal sampling. Then, $\mathbf{s}_t$ is a direct representation of a spoke and has length $M = M_\omega$. Its dependence on $\mathbf{x}_t$ is encoded in the time-dependent $M_\omega$-rows by $N$-columns system matrix $\mathbf{G}_t$, with $\mathbf{s}_t = \mathbf{G}_t \mathbf{x}_t$.
In accelerated dynamic MRI, one acquires $\mathbf{s}_{t_0}$ for $t_0 \in \mathbb{Z}\,\Delta t$ and wants to reconstruct $\mathbf{x}_{t_0}$ or, possibly, $\mathbf{x}_t$ for $t \in \mathbb{R}$. Clearly, however, a single orientation does not provide sufficient data for the recovery of the two-dimensional $\mathbf{x}_{t_0}$. To overcome this issue, we assume next that the changes are slow over some small $n$, so that $\mathbf{x}_t \approx \mathbf{x}_{t_0}$ for all $t$ in the half-open interval $\mathcal{T}_{t_0} = [t_0 - n\,\Delta t/2,\, t_0 + n\,\Delta t/2)$. In practice, the odd $n \in 2\mathbb{N} + 1$ corresponds to the number of radial lines used for reconstruction and is related to the temporal resolution of dynamic imaging. We then collect $n$ neighboring spokes³ and concatenate them in the vector $\mathbf{y}_{t_0} = (\mathbf{s}_{m \Delta t})_{m \in \mathbb{Z} \cap (\mathcal{T}_{t_0}/\Delta t)}$ of length $(n\,M_\omega)$. Through this mechanism, there are spokes $\mathbf{s}_{t_0}$ that are shared between, say, $\mathbf{y}_{t_0}$ and $\mathbf{y}_{t_0 + \Delta t}$.
The dependence of $\mathbf{y}_{t_0}$ on $\mathbf{x}_{t_0}$ is encoded now in the time-dependent $(n\,M_\omega)$-rows by $N$-columns matrix $\mathbf{H}_{t_0} = (\mathbf{G}_{m \Delta t})_{m \in \mathbb{Z} \cap (\mathcal{T}_{t_0}/\Delta t)}$, so that (4.4) becomes time-dependent by writing that
$$\mathbf{y}_{t_0} = \mathbf{H}_{t_0} \mathbf{x}_{t_0}. \qquad (4.6)$$
Because of the irrationality condition of the golden-angle approach, no direction will ever be measured twice and $\mathbf{H}_{t_0}$ acquires effective time dependence.

³ In the practical case of finite-time acquisitions, it may happen that no $\mathbf{s}_{t_0}$ is available for some $t_0 \in \mathcal{T}_0$. In this case, which happens near the beginning and the end of the temporal range, we use one-sided nearest spokes.
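A small sketch of the golden-angle bookkeeping described above: it generates the spoke orientations (4.5) and collects, for every reconstruction time, the indices of the n nearest spokes that form y_{t0}, with one-sided grouping at the temporal borders. The variable names and the number of spokes are illustrative assumptions.

import numpy as np

GOLDEN_ANGLE = np.deg2rad(111.25)   # approximation of the golden-angle increment

def spoke_angles(n_spokes, theta0=0.0):
    # Golden-angle orientations (4.5) for spokes acquired at t = 0, dt, 2 dt, ...
    return (theta0 + GOLDEN_ANGLE * np.arange(n_spokes)) % np.pi

def shared_spoke_indices(t0_index, n, n_spokes):
    # Indices of the n spokes nearest to frame t0 (spoke sharing), clipped at the borders.
    half = n // 2
    start = min(max(t0_index - half, 0), n_spokes - n)   # one-sided near the temporal borders
    return np.arange(start, start + n)

angles = spoke_angles(600)
print(shared_spoke_indices(t0_index=0, n=5, n_spokes=600))    # -> [0 1 2 3 4]
print(shared_spoke_indices(t0_index=300, n=5, n_spokes=600))  # -> [298 299 300 301 302]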

4.2.3 Regularization
Even with $n > 1$, it is observed that $(n\,M_\omega) \ll N$, which makes severely ill-posed the recovery of $\mathbf{x}_{t_0}$ given $\mathbf{y}_{t_0}$. To truly resolve this issue, practitioners often choose to regularize the problem over some extended temporal range. From a notational perspective, $K$ vectors $\mathbf{y}_{t_0}$ are concatenated over a large duration $(K\,\Delta t)$ to build $\mathbf{Y} = (\mathbf{y}_{k \Delta t})_{k \in [0 \ldots K-1]}$. Likewise, we write that $\mathbf{X} = (\mathbf{x}_{k \Delta t})_{k \in [0 \ldots K-1]}$ and $\mathbf{H} = [\mathbf{H}_{k \Delta t}]_{k \in [0 \ldots K-1]}$. The lengths of $\mathbf{Y}$ and $\mathbf{X}$ are $(K\,n\,M_\omega)$ and $(K\,N)$, respectively. The size of $\mathbf{H}$ ensues.
In the context of CS dynamic imaging, the traditional regularization of the forward model (4.6) is established as a search for the solution
$$\mathbf{X}^* = \arg\min_{\mathbf{X}} \left( \|\mathbf{H}\,\mathbf{X} - \mathbf{Y}\|_2^2 + \lambda \|\mathbf{D}\,\mathbf{X}\|_p \right), \qquad (4.7)$$
where $\mathbf{D}$ is a sparsifying transform along the temporal domain. Typically, this transform is a finite-difference operator, used as a surrogate of the conceptual first-order derivative $\mathrm{D}$. The corresponding regularization term encourages temporal dependency and counterbalances the ill-posedness of (4.6).
By contrast, there exists no explicit regularizer in the context of dynamic MRI reconstruction by traditional neural networks [122, 139]. In return, image priors are data-driven, imposed by the supervised learning of ground-truth pairs $(\mathbf{Y}, \mathbf{X})$. Letting $f_{\boldsymbol{\theta}} : \mathbb{C}^M \to \mathbb{C}^N$ represent the function that the network implements, where $\boldsymbol{\theta}$ gives network parameters such as weights and biases, learning will return the solution
$$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}}\; \mathbb{E}_{\{(\mathbf{Y},\mathbf{X})\}} \left\{ \|f_{\boldsymbol{\theta}}(\mathbf{H}^H \mathbf{Y}) - \mathbf{X}\|_2^2 \right\}. \qquad (4.8)$$
There, $\mathbf{H}^H$ is the Hermitian transpose of $\mathbf{H}$. With some abuse of notation, the learning process is represented by the expectation operator $\mathbb{E}$. The set $\{(\mathbf{Y}, \mathbf{X})\}$ provides the learning data. After learning has converged to some $\boldsymbol{\theta}^*$, reconstructed images are obtained as $f_{\boldsymbol{\theta}^*}(\mathbf{H}^H \mathbf{Y})$.

4.2.4 Deep Image Prior with Interpolated Latent Variables


Supervised learning cannot be performed in the absence of ground truth; unsu-
pervised learning methods must be used instead. To that effect, neural networks
associated with deep image priors have been proposed in [140], while a cost func-
tion that is appropriate to unsupervised learning has been developed in [27]. In

Figure 4.2: Flowchart of our framework. Conv.: convolutional layers,


BN: batch normalization. The details are described in Table 4.1.

this chapter, we propose an extension whereby we address the needs of dynamic MRI by performing temporal interpolation of the latent variables. The sketch of our approach is provided in Figure 4.2.

Inpainting and Denoising


As an introduction, we address first the static applications of inpainting and denoising. There, the purpose of deep-image-prior algorithms is to reconstruct a clean signal $\mathbf{x}^* = f_{\boldsymbol{\theta}^*}(\mathbf{z})$ given the perturbed measurement $\mathbf{x}$. The neural network with optimal parameter
$$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} \|A(f_{\boldsymbol{\theta}}(\mathbf{z})) - \mathbf{x}\|_2^2 \qquad (4.9)$$
minimizes a data-fidelity term characterized by the forward model $A$, which is assumed to be known. The $Z$-dimensional latent variable $\mathbf{z} \in \mathbb{R}^Z$ is maintained at a fixed value during the whole optimization process. Often, the minimizer $\boldsymbol{\theta}^*$ is calculated using some stochastic gradient-descent method with a random initial parameter [141]. In summary, deep-image-prior methods achieve strong priors from the structure of a generator network and capture advanced image statistics in a purely unsupervised fashion.
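A minimal PyTorch sketch of the static deep-image-prior fit (4.9) for denoising (i.e., A is the identity) is shown below. The tiny network, the image size, and the number of iterations are illustrative placeholders and not the architecture of Table 4.1.

import torch

torch.manual_seed(0)
x_noisy = torch.rand(1, 1, 64, 64)                 # perturbed measurement x (here: A = Id)
z = 0.1 * torch.rand(1, 8, 64, 64)                 # fixed latent variable, kept constant

net = torch.nn.Sequential(                         # small convolutional generator f_theta
    torch.nn.Conv2d(8, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(200):                               # minimize ||A(f_theta(z)) - x||^2 over theta
    optimizer.zero_grad()
    loss = ((net(z) - x_noisy) ** 2).sum()
    loss.backward()
    optimizer.step()

x_clean = net(z).detach()                          # x* = f_{theta*}(z)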

Accelerated Dynamic MRI


Our main contribution in this chapter is to use interpolated latent variables as in-
puts of a neural network. We start by obtaining two realizations of random discrete
images; these realizations are kept for the entire duration of the procedure. The first realization $\mathbf{z}_0$ takes the role of the latent variable associated to the beginning of the dynamic MRI sequence. The second realization is denoted by $\mathbf{z}_{(K-1)\Delta t}$ and is associated to the end of the sequence. Intermediate realizations are built as shown in Figure 4.3 and are also used as latent variables.
Choosing the dimension Z of the latent variables to be small, we conjecture that
the linearly interpolated latent variables span a low-dimensional, smooth manifold.
In other words, we interpret our proposal as a way to impose data-driven temporal
dependency in the latent space. A convolutional neural network will then transfer
this low-dimensional manifold to some corresponding low-dimensional manifold in
the MRI space, the central tenet of this work being that the frames from dynamic
imaging do span a low-dimensional manifold, too.
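As a sketch of this construction (with hypothetical sizes; the two endpoints stand for the fixed random realizations z_0 and z_{(K−1)Δt}), the intermediate latent variables can be obtained by simple linear interpolation:

import torch

K, Z = 23, 64                               # number of frames, latent dimension (8 x 8 = 64)
z_start = 0.1 * torch.rand(Z)               # realization z_0
z_end   = 0.1 * torch.rand(Z)               # realization z_{(K-1)Δt}

# Linearly interpolated latent variables z_{kΔt}, k = 0, ..., K-1;
# they span a one-dimensional segment, i.e., a smooth, low-dimensional manifold.
latents = [(1 - k / (K - 1)) * z_start + (k / (K - 1)) * z_end for k in range(K)]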

Figure 4.3: Interpolated latent variables.



For a single coil, our deep prior minimizes a Euclidean loss and results in the solution

θ* = arg min_θ Σ_{k=0}^{K−1} ‖H_{kΔt} g_θ(z_{kΔt}) − y_{kΔt}‖²₂,  (4.10)

where g_θ : C^Z → C^N represents the deep neural network of parameter θ. For C coils in pMRI, we establish the solution

θ* = arg min_θ Σ_{c=1}^{C} Σ_{k=0}^{K−1} ‖H_{kΔt} (C_c ⊙ g_θ(z_{kΔt})) − y_{c,kΔt}‖²₂,  (4.11)

where C_c gives the sensitivity map of the cth coil, ⊙ is a pixel-wise multiplication operator in the spatial domain which relates the true magnetization image to the coil sensitivities, and y_{c,t} concatenates n instantaneous acquisitions of spokes for the cth coil. Once an optimal θ* has been found in either (4.10) or (4.11), we can produce the final estimate x*_t for all values of t, including for t ∉ ZΔt if desired, as

x*_t = g_{θ*}(z_t).  (4.12)
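For illustration, a sketch of the multi-coil data-fidelity term of (4.11) is given below. The generator g_theta, the coil maps C, the measurements y, and the NuFFT-based forward operator forward_nufft are assumed given; their names are placeholders, not the exact interfaces of our implementation.

import torch

def pmri_loss(g_theta, latents, C, y, forward_nufft):
    """Empirical version of (4.11): sum over coils c and frames k of
    ||H_{kΔt}(C_c ⊙ g_theta(z_{kΔt})) - y_{c,kΔt}||^2."""
    loss = 0.0
    for k, z_k in enumerate(latents):
        x_k = g_theta(z_k)                        # complex-valued frame (two-channel output)
        for c in range(C.shape[0]):
            y_hat = forward_nufft(C[c] * x_k, k)  # coil-weighted frame through H_{kΔt}
            loss = loss + ((y_hat - y[c, k]).abs() ** 2).sum()
    return loss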

4.2.5 Architectures, Datasets, and Training


Architectures
We design our neural network as shown in Table 4.1. The neural network consists
of convolutional layers, batch normalization layers, ReLU, and nearest-neighbor
interpolations. We apply zero-padding before convolution to let the size of the
output mirror that of the input. At the last layer, ReLU is not used. The input of
the network is a small-size random variable generated from the uniform distribution U(0, 0.1), as explained in Section 4.2.4. The output has two channels because
MRI images take complex values.

Datasets
All experimental datasets are breath-hold. We use golden-angle radial sparse par-
allel (GRASP) MRI as a common baseline [129]. Spoke-sharing is not applied for

Table 4.1: Architecture of our convolutional network. Conv.: convolution;


BN: batch normalization; NN interp.: nearest-neighbor interpolation.

Operation            Number of   Size of Each    Strides   Zero Padding   Size of Output
                     Filters     Filter (XYC)    (XY)      (XY)           Image (XYC)

Input                                                                      8 × 8
Conv+BN+ReLU         128         3 × 3 × 1       1 × 1     1 × 1          8 × 8 × 128
Conv+BN+ReLU         128         3 × 3 × 128     1 × 1     1 × 1          8 × 8 × 128
NN interp.                                       2 × 2                    16 × 16 × 128
2×(Conv+BN+ReLU)     128         3 × 3 × 128     1 × 1     1 × 1          16 × 16 × 128
NN interp.                                       2 × 2                    32 × 32 × 128
2×(Conv+BN+ReLU)     128         3 × 3 × 128     1 × 1     1 × 1          32 × 32 × 128
NN interp.                                       2 × 2                    64 × 64 × 128
2×(Conv+BN+ReLU)     128         3 × 3 × 128     1 × 1     1 × 1          64 × 64 × 128
NN interp.                                       2 × 2                    128 × 128 × 128
2×(Conv+BN+ReLU)     128         3 × 3 × 128     1 × 1     1 × 1          128 × 128 × 128
Conv.                2           3 × 3 × 128     1 × 1     1 × 1          128 × 128 × 2
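A PyTorch sketch consistent with Table 4.1 is given below. It is an illustrative re-implementation that follows the layer description above (zero-padded 3×3 convolutions, batch normalization, ReLU, nearest-neighbor upsampling, no ReLU at the last layer), not the exact code used for the experiments.

import torch.nn as nn

def conv_block(c_in, c_out):
    # Conv (3x3, zero-padding 1) + BN + ReLU, as in Table 4.1
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=1, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU())

class Generator(nn.Module):
    """Maps an (8 x 8) latent map to a two-channel (real/imaginary) 128 x 128 image."""
    def __init__(self, z_channels=1, width=128):
        super().__init__()
        layers = [conv_block(z_channels, width), conv_block(width, width)]
        for _ in range(4):                        # four nearest-neighbor upsamplings: 8 -> 128
            layers += [nn.Upsample(scale_factor=2, mode='nearest'),
                       conv_block(width, width), conv_block(width, width)]
        layers += [nn.Conv2d(width, 2, 3, stride=1, padding=1)]   # last layer: no ReLU
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)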

Algorithm 2 Time-Dependent Deep-Image-Prior

Input: Number of iterations n_iter, batch size B, measurement sequence {y_k}_{k=0}^{K−1}, untrained neural network g_θ, latent-variable sequence {z_k}_{k=0}^{K−1}.
Output: Reconstructed image sequence {x*_k}_{k=0}^{K−1}.
for n_iter do

  • Sample a batch {b_0, . . . , b_{B−1}} of size B from {0, . . . , K − 1}.

  • Compute the batch loss

      L_B(θ) = (1/B) Σ_{k=0}^{B−1} ‖y_{b_k} − H_{b_k} g_θ(z_{b_k})‖².

  • Update θ with the gradient ∇_θ L_B(θ).

end for
return the reconstructed image series {x*_k = g_{θ*}(z_k)}_{k=0}^{K−1}.
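A minimal PyTorch rendering of Algorithm 2 is sketched below; the forward operators H, the measurements y, the latent variables z, and the untrained generator g_theta are assumed given, and single-coil measurements are used for brevity. This is a sketch under those assumptions rather than the implementation used for the experiments.

import random
import torch

def time_dependent_dip(g_theta, H, y, z, n_iter=10000, batch_size=1, lr=1e-3):
    """Algorithm 2: fit g_theta so that H_k g_theta(z_k) matches y_k for all frames k."""
    K = len(y)
    optimizer = torch.optim.Adam(g_theta.parameters(), lr=lr)
    for _ in range(n_iter):
        batch = random.sample(range(K), batch_size)   # sample a batch of frame indices
        loss = 0.0
        for k in batch:
            loss = loss + ((y[k] - H[k](g_theta(z[k]))).abs() ** 2).sum()
        loss = loss / batch_size
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return [g_theta(z_k).detach() for z_k in z]        # reconstructed image series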

GRASP. We assume two-fold upsampling of measurements for every dataset. There-


fore, the size of the reconstructed fields of view is half that of the first dimension
of the measurements.

• Retrospective Simulation A cardiac cine data set was acquired using a 3T


whole-body MRI scanner (Siemens; Tim Trio) equipped with a 32-element
cardiac coil array. The acquisition sequence was bSSFP and prospective car-
diac gating was used. The imaging parameters were as follows: FOV=300 ⇥
300 mm2 , acquisition matrix size=128 ⇥ 128, TE/TR=1.37/2.7 ms, receiver
bandwidth=1184 Hz/pixel, and flip angle = 40°. The number of cardiac phases was 23 and the temporal resolution was 43.2 ms. We then used the NuFFT to implement the forward model in a golden-angle context, starting from the fully sampled Cartesian data. Then, sinograms were obtained as shown in Figure 4.1. The number of spokes per phase is 13. The dimension of the sinograms is (K × M_ω × C) = 23 · 13 × 256 × 32. In this simulation, the cardiac
motion is discrete; thus, no spoke-sharing strategy is applied.

• Golden-angle Reconstruction of Fetal Cardiac Motion Fetal cardiac


MRI data were acquired on a 1.5 T clinical MR scanner (MAGNETOM Aera,
Siemens AG, Healthcare Sector, Erlangen, Germany) with an 18-channel body
array coil and a 32-channel spine coil for signal reception. Images were ac-
quired with an untriggered continuous 2D bSSFP sequence that was modified
to acquire radial readouts with a golden-angle trajectory [135]. The acqui-
sition parameters were: FOV = 260 ⇥ 260 mm2 , acquisition matrix size =
256 × 256 pixels, slice thickness = 4.0 mm, TE/TR = 1.99/4.1 ms, RF excitation angle = 70°, radial readouts = 1600, acquisition time = 6.7 s, and
bandwidth = 1028 Hz/pixel.

Reconstruction Experiments
We use an Intel i7-7820X (3.60GHz) CPU and an Nvidia Titan X (Pascal) GPU.
Pytorch 1.0.0 on Python 3.6 is used to implement our method. All experiments
are performed in single-batch mode. The input size is (8 × 8). The cost function used to train our neural network is (4.11). The learning rate is 10⁻³, with Adam [141] as the optimizer.

• Retrospective Simulation The number of iterations is 10,000. The schedul-


ing of the learning rate applies a 0.5 multiplier every 2,000 iterations. We set U(0, 0.1).

• Golden-angle Reconstruction of Fetal Cardiac Motion The number of


iterations is 20,000. No scheduling for the learning rate is applied. We used 14 cycles of latent variables. We set U(0, 10) to obtain the results shown
in Figure 4.6.

We use the regressed signal-to-noise ratio (RSNR) as a quantitative metric in our retrospective simulations. With x the oracle and x* the reconstructed image, the RSNR is given by

RSNR = max_{a,b∈R} 20 log₁₀ ( ‖x‖₂ / ‖x − (a x* + b)‖₂ ),  (4.13)

where a higher RSNR corresponds to a better reconstruction.
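In practice, the maximization over the affine parameters a and b in (4.13) reduces to a least-squares fit of the reconstruction to the oracle. A NumPy sketch of this computation is given below; it mirrors the definition above and is not the exact evaluation script used for the experiments.

import numpy as np

def rsnr(x, x_rec):
    """Regressed SNR (4.13): best affine fit a*x_rec + b to the oracle x, then SNR in dB."""
    x, x_rec = x.ravel(), x_rec.ravel()
    A = np.stack([x_rec, np.ones_like(x_rec)], axis=1)   # columns: reconstruction, constant
    (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)        # least-squares estimate of (a, b)
    err = x - (a * x_rec + b)
    return 20 * np.log10(np.linalg.norm(x) / np.linalg.norm(err))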
For real datasets, a CS approach with self-gated signals takes the role of a
state-of-the-art baseline [134, 135].

4.3 Results
4.3.1 Retrospective Simulation
In this experiment, the acquisition process is simulated, which allows us to build
the ground truth from a fully sampled k-space. We use n = 13 spokes for the
reconstruction and present the results in Figure 4.4. There, the bandpass method
(BP) corresponds to a zero-filled DFT while GRASP is the baseline against which
we compare the performance of our method. We see in Figure 4.4 (A) that GRASP
leads to blurring artifacts, while the residual map discloses the occurrence of errors
around the wall of the heart. By contrast, our proposed method gives better results.
This is confirmed in Figure 4.4 (B), where the cardiac motions are captured better
by our model than by GRASP. The systolic phase of our reconstruction is well
described and very close to the ground-truth, whereas the systolic phase captured
by GRASP is too flat.

4.3.2 Golden-Angle Reconstruction of Fetal Cardiac Motion


In this experiment, the acquisition process is real, so that no ground truth is avail-
able. We use n = 5 spokes for most reconstructions (n = 15 for GRASP) and
present the results in Figure 4.5. The synthesis of the sequence of latent vari-
ables is explained in Section 4.4.1. The overlapped method (OV) corresponds to
frames generated from all spokes, while the reordered method (RD) is able to re-
construct multiple cardiac phases by reordering k-t sparse SENSE by self-gated
signals [134, 135]. Self-gating, driven by correlation coefficients between approxi-
mately reconstructed images, is necessary because it is impractical to capture the
electrocardiogram signals of a fetal heart. This ultimately prevents the reordering
of golden angles in accordance with their phase.
In the absence of ground truth, we shall take OV to be the reference image for
navigation purposes. However, because OV considers all spokes simultaneously, it
is a static image that is of high quality only in regions that are not moving. We
see in Figure 4.5 (A) that BP and GRASP are noisier than RD and our method.
Comparing now RD to our approach, we see that ours produces better-resolved
features, particularly for the hyper-intense dot-like structures.
In Figure 4.5 (B), it becomes apparent that BP fails altogether to capture the
fetal cardiac beats, while GRASP is less noisy but still mostly fails to reconstruct

Figure 4.4: Retrospective reconstructions of cine dynamic MRI, with


n = 13. Top rows (A), from left to right: ground truth; field of view of
reconstructed frames from BP, GRASP, and ours. Bottom row (B), from
left to right: ground-truth with a white line indicating the (y-t) location
of cross sections; cross sections from BP, GRASP, and ours.


Figure 4.5: Dynamic reconstructions of a fetal heart for one beating cycle.
Top rows (A), from left to right: field of view from OV, BP, GRASP,
RD, and ours. Bottom row (B), from left to right: OV with a white
line indicating the (y-t) location of cross sections; cross sections from BP,
GRASP, RD, and ours.


Figure 4.6: Top (A): Series of (y-t) cross sections of our reconstruction
from the region of interest in Figure 4.5 (B). Bottom (B): t-SNE embedding
from image frames (left) and latent variables (right). The temporal index
is color-coded.

motions. RD fares better; unfortunately, its reordering process may lead it to


superpose in the same frame spokes that may indeed belong to different cardiac cycles. By contrast, our method⁴ reconstructs each frame with data from just a few
neighboring spokes, thus avoiding the mingling of different cycles. Consequently,
we expect our reconstructed systolic phase to capture well the true motion of the
heart. The cross section from our method is similar to that of RD but the motion
is smoother in our case, which is the expected behavior of a beating heart.
We provide in Figure 4.6 (A) our complete reconstructed sequence of cardiac
cycles, in the form of a (y-t) cross section. The quasi-periodicity of the cardiac
motion is clearly visible along the temporal axis, while motion variations can be
discerned from cycle to cycle. In Figure 4.6 (B), we also explore over the image and
latent domains the structure of the manifold of the cardiac motion. The visualiza-
tion proceeds through a (t-SNE) embedding [142]. We observe that the continuous
trajectory for the image manifold is well aligned with the temporal index, while
the input latent variables lie on a smooth manifold in the latent space. The latent
variables that correspond to Figure 4.6 (B) have been cut in fourteen chunks; the
reason for this is explained in Section 4.4.1. The importance of smoothness in the
latent variables is described in Section 4.4.2.

4.4 Discussion
4.4.1 Latent Encoding for Acyclic Data
Letting K in Section 4.2.3 be such that truly all data in Figure 4.6 (A) were taken jointly, and interpolating the latent variables between the only two endpoint realizations z_0 and z_{(K−1)Δt}, we observed that the reconstruction x*_t of the fetal cardiac motion took a constant, time-independent value. We surmise that this failure is due to the overly strong presence of non-periodic components in the data. To cope with them, we adapted our scheme slightly and proceeded by piecewise interpolation, the pieces being made of temporal chunks in the latent space. More precisely, we generated fourteen realizations {z^(τ)}_{τ∈[0...13]} of the latent variables, equi-spaced in time; then, instead of building z_t as a linear combination of z_0 = z^(0)
4 For display purposes, we show only one cycle of our cross section. In fact, our reconstructed data have as many frames (K = 1,400) as there are spokes, because of the spoke-sharing mechanism of Section 4.2.2.

and z_{(K−1)Δt} = z^(13), we built z_t as a linear combination of z^(τ) and z^(τ+1), with an appropriate τ that depends on t. Note that, while the latent variables now evolve chunk-wise, the network is still time-independent and trained over all data jointly. The chunk boundaries are made visible in Figure 4.6 (B).
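A sketch of this chunk-wise construction is given below (hypothetical sizes, with the fourteen anchor realizations of the text); it illustrates the piecewise-linear interpolation between consecutive anchors z^(τ) and z^(τ+1).

import torch

n_chunks, K, Z = 14, 1400, 64
anchors = [0.1 * torch.rand(Z) for _ in range(n_chunks)]   # realizations z^(0), ..., z^(13)

def latent_at(t):
    """Piecewise-linear interpolation between consecutive anchors z^(tau), z^(tau+1)."""
    pos = t / (K - 1) * (n_chunks - 1)                      # continuous position in [0, 13]
    tau = min(int(pos), n_chunks - 2)
    w = pos - tau
    return (1 - w) * anchors[tau] + w * anchors[tau + 1]

latents = [latent_at(k) for k in range(K)]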

4.4.2 Smoothness in the Manifold of Latent Variables


To explore the importance of smoothness in the manifold of latent variables, we
have trained separately two neural networks, each with a different configuration
for the latent variables. On one hand, we have considered independent realizations
of random latent variables at each frame, which resulted in an uncorrelated, non-
smooth latent spatio-temporal manifold. On the other hand, we have followed the
interpolation procedure of Figure 4.3. After examination of the outcome shown in
Figure 4.7, we conclude that the non-smooth manifold fails to reconstruct dynamic
images, while our proposed approach succeeds.

4.4.3 Size of Latent Variables


Until now, we have fixed the size of the latent variables as (8 ⇥ 8), which corre-
sponds to Z = 64. In this section instead, we let Z vary and report the resulting
RSNR in Table 4.2, where we observe an acceptable quality of reconstruction for
latent variables of size ranging between (2 ⇥ 2) and (16 ⇥ 16). For larger sizes, the
reconstructions are corrupted by significant artifacts. We expected the smallest,
(1 ⇥ 1) size to reflect the fact that time is a one-dimensional variable. Yet, the
learning process failed to converge to a desirable solution in this case, the corre-
sponding sequence of reconstructed images taking the constant, time-independent
value shown in the last row of Figure 4.8. In conclusion, this experiment suggests
that the (8 ⇥ 8) size provides a good tradeoff between the dimension of the mani-
fold spanned by the latent variables and the convergence issues inherent with any
optimization procedure.

4.4.4 Variations on Latent Variables


In this section, we explore various scenarii beyond linear interpolation, which we
summarize in Figure 4.8.

[Figure 4.7, panels: 4th–7th reconstructed frames when trained with non-smooth latent variables (12.26) and when trained with smooth/interpolated latent variables (19.29).]
Figure 4.7: Importance of smoothness in a latent manifold.

Table 4.2: Regressed SNR in terms of the size of the latent variables.

Latent size 1⇥1 2⇥2 4⇥4 8⇥8 16 ⇥ 16 32 ⇥ 32 64 ⇥ 64 128 ⇥ 128


RSNR 16.9 19.1 18.9 19.1 18.9 17.5 13.7 13.6

Figure 4.8: Latent-variable scenarii. Here, T = (K − 1)Δt.



i. Temporal Interpolation In the first scenario, we consider t ∉ ZΔt, which


corresponds to a fine interpolation of the intermediate latent variables. This
gives us access to temporally interpolated images at the output of our neural
network.

ii. Temporal Extrapolation In the second scenario, we extrapolate the latent


variables outside of the temporal range that was available for learning. This
gives distorted images.

iii. New Latent Variables In the third scenario, we make use of random real-
izations of latent variables that differ from those used while learning. This
gives severely perturbed images.

iv. Perturbations In the fourth scenario, we perturb the latent variables by


adding uniform noise whose energy amounts to 10 %. This setting outputs
clean images.

v. Scalar Latent Variables In the fifth and last scenario, we deploy scalar
latent variables. This results in a time-independent, non-moving sequence of
images.

4.4.5 Memory Savings


In a CS context, the gradient updates of the iterative optimization process would
need one to allocate enough memory to hold a whole target reconstruction volume.
For example, the reconstruction of 5,000 frames with spatial size (256 × 256) would require one to handle data of size (256 × 256 × 5000), which demands over a gigabyte of memory.
By contrast, our approach is much less memory-hungry. It optimizes the neural
network using batches, which requires the simultaneous handling of only those
frames that correspond to the batch size. In short, the fact that our proposed
approach handles few 2D images whereas CS handles a 2D+t extended sequence
leads to substantial savings, particularly for golden-angle dynamic MRI with many
frames. In our approach, we only save a nerual network; for example, its memory
demands for the spatial size (256 ⇥ 256) are about half-a-dozen megabytes. This
cost is negligible compared to that of the CS approach.

4.4.6 Benefits of Our Approach

Several competing methods aim at synthesizing a single cardiac cycle out of spokes
collected over several cycles at various cardiac phases. There, a synchronized ECG
could in principle allow one to associate a phase to each spoke; however, the de-
ployment of this ancillary ECG would make the practical apparatus more involved,
which is often undesirable. Furthermore, there are applications such as fetal car-
diology where no ECG can be deployed at all. In traditional ECG-free approaches
instead, one deploys self-gating methods for the purpose of phase assignment. They
proceed on the basis of either human inspection or heuristic decisions, which makes
them arduous, non-reproducible, and prone to errors. (Sometimes, the assign-
ment is no more advanced than a simple manual sorting.) One specific additional
difficulty that self-gating methods must deal with originates with the necessary
Cartesian-to-polar conversion inherent in radial sampling trajectories, which ul-
timately results in streaking artifacts that tend to confound phase assignments,
particularly those based on visual assessments in the spatial domain.

By contrast, in our proposed approach we relax the hypothesis of ideal pe-


riodicity. As presented in Section 4.4.1, this allows us to take into consideration
cycle-to-cycle variations, thus providing access to clinical insights that are not avail-
able with traditional accelerated methods. Moreover, our ECG-free approach has
the major benefit that no phase assignment is needed, thus providing access to a
continuously defined time variable. Being fully automated, our approach is also
reproducible. Another advantage is that the streaking artifacts associated to ra-
dial trajectories play no role since the reference data used for training in (4.10)
or (4.11) are the spokes themselves, as opposed to reconstructed images. Finally,
when compared to CS, our approach achieves better reconstruction, with a gain of
0.69 dB, as seen in Figure 4.4. It also leads to a simpler optimization task with
fewer hyper-parameters. For instance, k-t SENSE requires three interdependent
hyper-parameters whose optimal value is found only after some substantial grid-
search effort, while the two hyper-parameters of our approach are easier to interpret
since they trivially consist of just an initial learning rate, along with a number of
iterations.

4.4.7 Limitations and Future Work


One major limitation of our proposed approach is its computational complexity. At
times, no fewer than 10,000 iterations are required before convergence is observed.
This imposes a large computational burden since every iteration involves both for-
ward and adjoint operations. From a technological perspective, we implemented
the forward model in Python 3.6, with Pytorch (v1.0.0) as main library. Unfortu-
nately, NuFFT is currently optimized neither for Python nor for GPU usage, which
leads to a marked slowdown of our implementation. For instance, our method spent
a whole day letting one GPU (TITAN X) process the cardiac dataset presented in
Section 4.2.5, whereas GRASP terminated within ten minutes. This bottleneck will
be improved by a better integration of the NuFFT library.
In our future work, we want to explore the dynamics of the latent variables. In
this work indeed, we simply took advantage of linear interpolation to build trivial
intermediate states of the latent variables; however, refined approaches may better
capture the temporal correlation between frames.

4.5 Summary
In this chapter, we develop a novel unsupervised deep-learning-based algorithm to
solve the inverse problem found in dynamic magnetic resonance imaging (MRI).
Our method needs neither prior training nor additional data; in particular, it requires neither an electrocardiogram nor spoke reordering in the context of cardiac
images. It generalizes to sequences of images the recently introduced deep-image-
prior approach. The essence of the proposed algorithm is to proceed in two steps to
fit k-space synthetic measurements to sparsely acquired dynamic MRI data. In the
first step, we deploy a convolutional neural network (CNN) driven by a sequence of
low-dimensional latent variables to generate a dynamic series of MRI images. In the
second step, we submit the generated images to a nonuniform fast Fourier transform
that represents the forward model of the MRI system. By manipulating the weights
of the CNN, we fit our synthetic measurements to the acquired MRI data. The
corresponding images from the CNN then provide the output of our system; their
evolution through time is driven by controlling the sequence of latent variables
whose interpolation gives access to the sub-frame—or even continuous—temporal
control of reconstructed dynamic images. We perform experiments on simulated

and real cardiac images of a fetus acquired through 5-spoke-based golden-angle


measurements. Our results show improvement over the current state-of-the-art. To
the best of our knowledge, this work is the first approach of unsupervised learning
in accelerated dynamic MRI.
Chapter 5

CryoGAN: Cryo-EM
Reconstruction using GAN
Framework
In the previous chapter, we discussed the deep image prior, which, in principle, is applicable to any inverse problem where the forward model is exactly known. In this chapter, we propose a new unsupervised deep-learning-based algorithm for Cryo-EM, an imaging problem where only a parametric form of the forward model is known. Our method, as we shall see, approaches this problem from a distributional perspective.

5.1 Overview
Single-particle cryo-electron microscopy (Cryo-EM) is a powerful method to deter-
mine the atomic structure of macro-molecules by imaging them with electron rays at
cryogenic temperatures [143–145]. Its popularity has rocketed in recent years, cul-
minating in 2017 with the Nobel Prizes of Jacques Dubochet, Richard Henderson,
1 This chapter uses content from our work [28].


and Joachim Frank. In Cryo-EM, one obtains many noisy 2D tomographic projections from separate instances of the same, but randomly oriented, 3D biomolecule².
There exists a multitude of software packages to produce high-resolution 3D struc-
ture(s) from these 2D measurements [146–153]. These sophisticated algorithms,
which include projection-matching approaches and maximum-likelihood optimiza-
tion frameworks, enable the determination of structures with unprecedented atomic
resolution.
Yet reconstruction procedures in single-particle Cryo-EM still face complex ob-
stacles. The task involves a high-dimensional, nonconvex optimization problem
with numerous local minima. Hence, the outcome of the global process is pred-
icated on the quality of the initial reconstruction [154, 155]. Moreover, one still
often relies on the input of an expert user for appropriate processing decisions and
parameter tuning [156]. Even for more automated methods, the risk of outputting
incorrect and misleading 3D reconstructions is ever-present. A key reason behind
such complexity is that the imaged particles have unknown poses. To handle this,
most software packages rely on a marginalized maximum-likelihood (ML) formula-
tion [157] that is solved through an expectation-maximization algorithm [151, 153].
The latter involves calculations over the discretized space of poses for each projec-
tion, a computationally demanding procedure.
To bypass these limitations, we introduce CryoGAN, an unsupervised recon-
struction algorithm for single-particle Cryo-EM that exploits the remarkable ability
of generative adversarial networks (GANs) to capture data distributions [36]. Sim-
ilar to GANs, CryoGAN is driven by the competitive training of two entities: one
that tries to capture the distribution of real data, and another that discriminates
between generated samples and samples from the real dataset (Figure 5.1). In a clas-
sical GAN, the two entities are each a convolutional neural network (CNN). They
are known as the generator and the discriminator and are trained simultaneously
using backpropagation. The important twist with CryoGAN is that we replace the
generator network by a Cryo-EM physics simulator. By doing so, CryoGAN learns
the 3D density map whose simulated projections are the most consistent with a
given dataset of 2D measurements in a distributional sense.
The CryoGAN architecture represents a complete change of paradigm for single-
particle Cryo-EM reconstruction. No estimation of the poses is attempted during
2 In this work, we consider the homogeneous case where the biomolecule exhibits only a single conformation. The heterogeneous (multiple-conformation) case is the topic of the next chapter.

the learning procedure; rather, the reconstruction is obtained through distributional


matching performed in a likelihood-free manner. Hence, CryoGAN sidesteps many
of the computational drawbacks associated with likelihood-based methods.
In practice, CryoGAN requires no prior knowledge of the 3D structure; its
learning process is purely unsupervised. The user needs only to feed the particle
images and estimates of the contrast transfer function (CTF) parameters. No initial
volume is needed: the algorithm starts with a volume initialized with zeros. The
CryoGAN framework is backed up by a comprehensive mathematical framework
that provides guarantees on the recovery of the correct structure under a given set of
assumptions that are often met in practice, at least to some degree of approximation.
We first assessed the performance and stability of CryoGAN on a synthetic β-galactosidase dataset, where we generated noisy projections in silico. The results demonstrate that our unsupervised reconstruction paradigm permits the recovery of an 8.6 Å structure (Figure 5.3). We then deployed CryoGAN on a real experimental β-galactosidase dataset (EMPIAR-10061) [158], reaching a resolution of 12.1 Å in
150 minutes in far more challenging conditions (Figure 5.5). These preliminary
results provide a strong indication of the suitability of the CryoGAN framework
for the reconstruction of real structures. On the implementation side, we expect
to be able to improve the resolution of the reconstructions by taking advantage of
the many recent technical developments and advances in the area of GANs. In the
meantime, the preliminary results obtained with CryoGAN are encouraging and
demonstrate the potential of adversarial-learning schemes in image reconstruction.
The proposed paradigm opens many new perspectives in single-particle Cryo-EM
and paves the way for more applications beyond the present one.

5.2 Image-Formation Model in Single-Particle Cryo-


EM
We detail here the Cryo-EM image formation model used in our implementation of
the CryoGAN physics simulator.
We use the standard model for the single-particle Cryo-EM imaging procedure [159],

y = H_φ x + n,  (5.1)

where y ∈ R^M is a (vectorized) 2D projection; x ∈ R^V is the (vectorized) 3D

Figure 5.1: A schematic comparison between (a) a classical GAN ar-


chitecture and (b) the CryoGAN architecture. Both frameworks rely on
a deep adversarial learning scheme to capture the distribution of the real
data. CryoGAN exploits this ability to look for the volume whose simulated
projections have a distribution that matches the real data distribution.
This is achieved by adding a “Cryo-EM physics simulator” that produces
measurements following a mathematical model of the Cryo-EM imaging
procedure. Importantly, CryoGAN does not rely on a first low-resolution
volume estimate, but is initialized with a zero-valued volume. Note that,
for both architectures, the updates involve backpropagating through the
neural networks; those actions are not indicated here for the sake of clarity.

density; H_φ ∈ R^{M×V} is the forward operator with parameters φ ∼ p_φ; and n ∈ R^M is an additive noise following a distribution p_n. In Cryo-EM, one obtains 10⁴–10⁷ measurements of the biomolecule. Each of these measurements is obtained with an unknown φ. The imaging parameters φ comprise the projection (Euler) angles θ = (θ₁, θ₂, θ₃), the projection shifts t = (t₁, t₂), and the CTF parameters c = (d₁, d₂, α_ast), where d₁ is the defocus-major, d₂ is the defocus-minor, and α_ast is the angle of astigmatism.
The forward operator H_φ is given by

H_φ = C_c S_t P_θ,  (5.2)

where P_θ : R^V → R^M is a projection operator (mathematically speaking, the X-ray transform [160]), S_t : R^M → R^M is a shift operator, and C_c : R^M → R^M is a convolution operator. We discuss in more detail the continuous-domain physics behind the image-formation model H_φ.

5.2.1 Image Formation in Continuous-Domain


Our continuous-domain forward model is based on [159] (2.1-2.10), [161], and [162],
which model the relationship between the 3D density map and a 2D measurement
as linear.
The continuous-domain measurement can be given in the form

y = ℋ_φ{f} + n,  (5.3)

where y : R² → R is the intensity measured on the image plane, n : R² → R is the noise, f : R³ → R is the density map we aim to recover, and ℋ_φ is the measurement operator that depends on the imaging parameters φ. This model can be further decomposed as

y = C_c S_t P_θ{f} + n,  (5.4)

where φ = (θ, t, c). We now discuss the three involved operators in more detail.

Projection Operator
The projection operation is given by [160]

P_θ{f}(x₁, x₂) = ∫_{−∞}^{∞} R_θ{f}(x₁, x₂, x₃) dx₃,  (5.5)

where R_θ is the rotation matrix associated with θ and R_θ{f}(x) = f(R_θ⁻¹ x).

Shift Operator
The projection measurements are picked from the micrographs and can thus be off-centered. This is modelled via the shift operator which, for any y_s : R² → R, yields

S_t{y_s}(x₁, x₂) = y_s(x₁ − t₁, x₂ − t₂),  (5.6)

where t = (t₁, t₂).

Convolution by CTF
The effect of the operator C_c on any y_c : R² → R is given in the Fourier domain as

F{C_c{y_c}}(ω) = Ĉ_c(ω) · ŷ_c(ω),  (5.7)

where F is the Fourier transform and ŷ_c = F{y_c}. Its Fourier transform Ĉ_c (i.e., the CTF) is given by

Ĉ_c(ω) = Ĉ_cp(ω) Ê(ω) Â(ω).  (5.8)

There, Ĉ_cp : R² → R is the phase-contrast transfer function that takes the form

Ĉ_cp(ω) = √(1 − A²) sin(γ_c(ω)) − A² cos(γ_c(ω)),  (5.9)

with

γ_c(ω) = π ( d_c(α) λ ‖ω‖² − (1/4) c_s λ³ ‖ω‖⁴ ),  (5.10)

where λ is the electron wavelength, c_s is the third-order spherical-aberration constant, α is the phase of the vector ω, and d_c(α) is the defocus arising at the phase α. This defocus is given as

d_c(α) = d₁ cos²(α − α_ast) + d₂ sin²(α − α_ast),  (5.11)



where d₁ and d₂ are the horizontal and vertical defocus, respectively, and α_ast is the reference angle that defines the azimuthal direction of axial astigmatism. The objective aperture function Â : R² → R is given by

Â(ω) = { 1, ‖ω‖ ≤ ω_cutoff ;  0, ‖ω‖ > ω_cutoff },  (5.12)

where ω_cutoff = 2π d_ap / (λ f₀) is the cutoff frequency, f₀ is the focal length of the objective lens, and d_ap corresponds to the diameter of the aperture. The spatial and chromatic envelope function Ê : R² → R is given by

Ê(ω) = exp(−B(‖ω‖²)),  (5.13)

where B : R² → R is a function influenced by chromatic aberration and spatial incoherence.

Discretization
The discretization of H' results in H' . This discretized measurement operator
is itself decomposed of the discretized projection, shift, and convolution operation
which are denoted by P✓ , St , and Cc , respectively. The input to the operator H'
is a discretized version of the continuous-domain 3D volume. This discretization of
the 3D volume is done using a suitable basis function [163].
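As an illustration of the discretized operator H_φ = C_c S_t P_θ, the following NumPy/SciPy sketch rotates a voxelized volume, sums along one axis, shifts the result, and multiplies by a user-supplied CTF in the Fourier domain. This is a purely illustrative stand-in (the three in-plane rotations are a simple substitute for the Euler rotation); the implementation used in this thesis relies on the ASTRA toolbox for the projection step, as described in Section 5.4.1.

import numpy as np
from scipy.ndimage import rotate, shift

def forward_model(volume, euler_deg, t, ctf_2d):
    """Sketch of y = C_c S_t P_theta {x} on a cubic voxel grid."""
    # P_theta: rotate the volume (three successive rotations as a stand-in for the
    # Euler rotation) and integrate along the projection axis.
    rot = rotate(volume, euler_deg[0], axes=(1, 2), reshape=False)
    rot = rotate(rot, euler_deg[1], axes=(0, 2), reshape=False)
    rot = rotate(rot, euler_deg[2], axes=(1, 2), reshape=False)
    proj = rot.sum(axis=0)
    # S_t: in-plane shift of the projection.
    proj = shift(proj, t)
    # C_c: multiplication by the CTF in the Fourier domain.
    return np.real(np.fft.ifft2(np.fft.fft2(proj) * ctf_2d))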

5.3 Mathematical Framework of CryoGAN


The goal of single-particle Cryo-EM reconstruction is to estimate a 3D density map
xrec whose projections are consistent with the observed projections (data) of the
true density map xtrue .3
We write the conditional probability density function of a measurement y given a volume x by marginalizing over the imaging parameters φ,

p(y|x) = ∫_{φ∈Φ} p_n(y − H_φ x) p_φ(φ) dφ,  (5.14)

3 In this forward model, we assume conformational homogeneity of the underlying structure. The extension of CryoGAN to multiple conformations is discussed in the next chapter.

where Φ is the set of all the possible imaging parameters. We denote a noiseless projection as y_noiseless = H_φ x. In our formulation, the projections in the real dataset are samples of a distribution p_data; hence, p(y|x_true) = p_data(y), assuming that the forward model is correct.
We demonstrate in Theorem 5.5.5 in Section 5.5 that two 3D volumes have identical conditional distributions if and only if they are identical up to rotation and reflection. Hence, Theorem 5.5.5 implies that, for the reconstruction x_rec to satisfy x_rec = x_true, it must also satisfy p(y|x_rec) = p(y|x_true). Thus, we can formulate the reconstruction task as the minimization problem

x_rec = arg min_x D(p(y|x), p(y|x_true)),  (5.15)

where D is some distance between two distributions. In essence, (5.15) states that
the appropriate reconstruction is the 3D density map whose projection distribution
is the most similar to the real dataset in a distributional sense. For the sake of
conciseness, we shall henceforth use the notation p(y|x) = px (y).
As distance in (5.15), we use the Wasserstein distance defined as

D(p₁, p₂) = inf_{γ∈Π(p₁,p₂)} E_{(y₁,y₂)∼γ}[‖y₁ − y₂‖],  (5.16)

where Π(p₁, p₂) is the set of all the joint distributions γ(y₁, y₂) whose marginals are p₁ and p₂, respectively. Our choice is driven by works demonstrating that the Wasserstein distance is more amenable to minimization than other popular distances (e.g., total variation or Kullback-Leibler divergence) for this kind of application [164]. Using (5.16), the minimization problem (5.15) expands as

x_rec = arg min_x inf_{γ∈Π(p_x,p_data)} E_{(y₁,y₂)∼γ}[‖y₁ − y₂‖].  (5.17)

By using the formalism of [164–166], this minimization problem can also be stated in its dual form

x_rec = arg min_x max_{f : ‖f‖_L ≤ 1} ( E_{y∼p_data}[f(y)] − E_{y∼p_x}[f(y)] ),  (5.18)

where ‖f‖_L denotes the Lipschitz constant of the function f : R^M → R.



5.3.1 Connection with Wasserstein GANs


Equation (5.18) falls under the framework of the generative adversarial networks (GANs) [36] called Wasserstein GANs (WGANs) [164]. In the standard WGAN representation, the function f is parameterized by a neural network D_θ with parameters θ that is called the discriminator. The task of this discriminator is to learn to differentiate between real samples (typically coming from an experimental dataset) and fake samples. The latter are produced by another neural network, called the generator, which aims at producing samples that are realistic enough to fool the discriminator. This adversarial-learning scheme progressively drives the WGAN to capture the distribution of the experimental data.
In CryoGAN, we exploit this capability of adversarial schemes to learn the volume x whose simulated projections follow the captured real-data distribution. To do so, we rely on a Cryo-EM physics simulator, whose role is to produce projections of a volume estimate x using (5.1). These simulated projections then follow a distribution y ∼ p_x. Hence, (5.18) translates into

x_rec = arg min_x max_{D_θ : ‖D_θ‖_L ≤ 1} ( E_{y∼p_data}[D_θ(y)] − E_{y∼p_x}[D_θ(y)] ).  (5.19)

As proposed in [167], the Lipschitz constraint ‖D_θ‖_L ≤ 1 can be enforced by penalizing the norm of the gradient of D_θ with respect to its input. This gives the final formulation of our reconstruction problem as

x_rec = arg min_x max_{D_θ} ( E_{y∼p_data}[D_θ(y)] − E_{y∼p_x}[D_θ(y)] − λ · E_{y∼p_int}[(‖∇_y D_θ(y)‖ − 1)²] ).  (5.20)

Here, p_int denotes the uniform distribution along the straight line between points sampled from p_data and p_x, while λ ∈ R⁺ is an appropriate penalty coefficient (see [167], Section 4).
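For reference, a PyTorch sketch of the discriminator objective in (5.20), with the gradient penalty of [167], is given below. Here D is the discriminator network and y_real, y_sim are batches of experimental and simulated projections; this is illustrative code under those assumptions, not the exact CryoGAN implementation.

import torch

def discriminator_loss(D, y_real, y_sim, lam=10.0):
    """Negative of the inner maximization in (5.20), to be minimized over the
    discriminator parameters: -(E[D(y_real)] - E[D(y_sim)] - lam * GP)."""
    wasserstein = D(y_real).mean() - D(y_sim).mean()
    # Gradient penalty: evaluate the gradient norm on points interpolated
    # between real and simulated projections (the distribution p_int).
    alpha = torch.rand(y_real.size(0), 1, 1, 1, device=y_real.device)
    y_int = (alpha * y_real + (1 - alpha) * y_sim).requires_grad_(True)
    grad = torch.autograd.grad(D(y_int).sum(), y_int, create_graph=True)[0]
    gp = ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    return -(wasserstein - lam * gp)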

5.4 The CryoGAN Algorithm


Equation (5.20) is a min-max optimization problem. By replacing the expected values with their empirical counterparts (sums) [167], we reformulate it as the minimization of

L_S(x, D_θ) = Σ_{n∈S} D_θ(y_data^n) − Σ_{n∈S} D_θ(y_sim^n) − λ Σ_{n∈S} (‖∇_y D_θ(y_int^n)‖ − 1)²,  (5.21)

where S consists of either the full dataset S_full = {1, . . . , N_tot} or a batch B ⊆ S_full; y_data^n is a projection sampled from the acquired experimental dataset; y_sim^n ∼ p_x is a projection of the current estimate x generated by the Cryo-EM physics simulator; and y_int^n = α_n · y_data^n + (1 − α_n) · y_sim^n, where α_n is sampled from a uniform distribution between 0 and 1.
In practice, we minimize (5.21) through stochastic gradient descent (SGD) using batches. We alternately update the discriminator D_θ (for n_discr iterations) using an Adam optimizer [141] with the gradient

∇_θ L_B(x, D_θ) = ∇_θ ( Σ_{n=1}^{N} D_θ(y_batch^n) − Σ_{n=1}^{N} D_θ(y_sim^n) − λ Σ_{n=1}^{N} (‖∇_y D_θ(y_int^n)‖ − 1)² ),  (5.22)

and the volume x (for 1 iteration) using the batch gradient

∇_x L_B(x, D_θ) = −∇_x ( Σ_{n=1}^{N} D_θ(y_sim^n) ).  (5.23)

The pseudocode and a schematic view of the CryoGAN algorithm are given in Algo-
rithm 3 and Figure 5.1b, respectively. We provide further details of the CryoGAN
physics simulator and discriminator network in the next two sections.

5.4.1 The Cryo-EM Physics Simulator


The goal of the physics simulator is to sample y_sim ∼ p_x(y); this is done in three steps. First, we sample the imaging parameters φ from the distribution p_φ: φ ∼ p_φ. Second, we generate noiseless CTF-modulated and shifted projections from the current volume estimate with H_φ(x). Third, we sample the noise model to simulate noisy projections y = H_φ(x) + n, where n ∼ p_n. We detail these steps in the following paragraphs, and a pseudocode of this Cryo-EM physics simulator is given in Algorithm 4.

Algorithm 3 Pseudocode for CryoGAN

Parameters: number of training iterations, n_train; number of iterations of the discriminator per training iteration, n_discr; size of the batches used for SGD, N; penalty parameter, λ

1: for n_train do
2:   for n_discr do
3:     sample real projections: {y_batch^1, . . . , y_batch^N} = {y_data^n}_{n∈B}
4:     sample projections simulated from the current x: {y_sim^1, . . . , y_sim^N} ∼ p_x (see Algorithm 4)
5:     sample {α_1, . . . , α_N} ∼ U[0, 1]
6:     for all n ∈ {1, . . . , N}, compute y_int^n = α_n · y_batch^n + (1 − α_n) · y_sim^n
7:     update the parameters θ of the discriminator D_θ using (5.22)
8:   sample {y_sim^1, . . . , y_sim^N} ∼ p_x
9:   update the volume x using (5.23)

Recall that the set of imaging parameters is given by φ = (θ₁, θ₂, θ₃, t₁, t₂, d₁, d₂, α_ast). We first sample the Euler angles θ = (θ₁, θ₂, θ₃) from a distribution p_θ decided a priori based on the acquired dataset. Similarly, the projection shifts t = (t₁, t₂) are sampled from the prior distribution p_t. The CTF parameters c = (d₁, d₂, α_ast) are sampled from the prior distribution p_c. In practice, we exploit the fact that the CTF parameters can often be efficiently estimated for all micrographs. We then uniformly sample from the whole set of extracted CTF parameters.
We generate noiseless projections y_noiseless by applying H_φ to the current volume estimate x. The projection operator P_θ in (5.2) is implemented using the ASTRA toolbox [168].
The precise modeling of the noise is a particularly challenging feat in single-
particle Cryo-EM. To produce noise realizations that are as realistic as possible,
we extract random background patches directly from the micrographs themselves,
at locations where particles do not appear. For consistency, the noise patch added
to a given noiseless projection is taken from the same micrograph that was used
to estimate the CTF parameters previously applied to that specific projection.
Additional details for this implementation are given in Section A.3.

Algorithm 4 Pseudocode for Cryo-EM Physics Simulator

Parameters: current volume estimate, x
1: sample the Euler angles θ = (θ₁, θ₂, θ₃) ∼ p_θ.
2: sample the 2D shifts t = (t₁, t₂) ∼ p_t.
3: sample the CTF parameters c = (d₁, d₂, α_ast) ∼ p_c.
4: generate a synthetic noiseless projection based on (5.2): y_noiseless = H_φ x.
5: sample the noise n ∼ p_n and add it to the projection: y_sim = y_noiseless + n.

5.4.2 The CryoGAN Discriminator Network


The role of the discriminator is to differentiate between projections from the ex-
perimental dataset and projections generated by the Cryo-EM physics simulator.
The gradients from the discriminator (see (5.23) in Algorithm 3) carry information
on the difference between the real and simulated projections at a given run-time.
Those gradients are used by the simulator to update itself, thus improving on the
realism of the simulated projections.
The discriminator network takes an image as input and outputs a scalar value.
Its architecture is illustrated in Figure 5.2. It is composed of 8 layers: 6 convolu-
tional blocks followed by 2 fully connected (FC) layers. Each convolutional block
is made up of a convolutional layer followed by a max-pooling and a leaky ReLU
(with negative slope of 0.1). The number of channels in each convolutional layer is
96, 192, 384, 768, 1536, and 3072, respectively. The filters in these layers are of size
3, and the padding size is 1. The max-pooling layer uses a kernel of size 2 with a
stride of 2. This leads to a downsampling by a factor of 2. The output of the final
convolutional block is then reshaped, fed into the FC layer with 10 neurons, and
finally processed by a leaky ReLU. The resulting activations are fed to the last FC
layer to output a scalar.
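A PyTorch sketch consistent with this description is shown below. It is an illustrative re-implementation: the single input channel and the 128 × 128 input size are assumptions made here to keep the example self-contained, and the fully connected layer sizes follow from them.

import torch.nn as nn

def conv_block(c_in, c_out):
    # Conv (3x3, padding 1) + max-pooling (2x2, stride 2) + leaky ReLU (slope 0.1)
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.MaxPool2d(2, stride=2), nn.LeakyReLU(0.1))

class Discriminator(nn.Module):
    """Maps a projection image to a scalar score."""
    def __init__(self, C=96, img_size=128):
        super().__init__()
        chans = [1, C, 2 * C, 4 * C, 8 * C, 16 * C, 32 * C]
        self.features = nn.Sequential(*[conv_block(chans[i], chans[i + 1])
                                        for i in range(6)])
        feat = 32 * C * (img_size // 64) ** 2    # six poolings downsample by 2^6 = 64
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(feat, 10),
                                  nn.LeakyReLU(0.1), nn.Linear(10, 1))

    def forward(self, y):
        return self.head(self.features(y))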

5.4.3 Overall Scheme


In summary, CryoGAN is like a standard GAN, except that the generator net-
work is replaced by a Cryo-EM physics simulator (Figure 5.1b). This simulator
implements a mathematical model of the imaging procedure to produce a sim-
ulated measurement based on 1) the current volume estimate and 2) a random
projection orientation. This image-formation model considers that the Cryo-EM

2D measurement is the projection of the volume at that orientation, modulated by


microscopy-related effects and corrupted by substantial additive noise.
The Cryo-EM physics simulator is paired with a discriminator network whose
architecture is similar to those used in standard GANs. The role of this discrimina-
tor in CryoGAN is to encourage the simulator to learn the 3D volume xrec whose
simulated dataset distribution p(y|xrec ) matches that of the real dataset, pdata (y).
Mathematically, this equates to
x_rec = arg min_x D(p(y|x), p_data(y)),  (5.24)

where D is an appropriate measure of distance between distributions, which the


discriminator allows us to compute; in our implementation, D is the Wasserstein
distance [165]. This formulation is based on a sound mathematical framework that
provides guarantees on the recovery of the volume. In Theorem 5.5.5 (Section 5.5),
we show that, under certain constraints, the reconstructed volume is the same as
the true volume up to rotation and reflection.
To perform the minimization (5.24), the CryoGAN algorithm alternates between
updates of the discriminator and of the volume with stochastic gradient descent
(SGD). The code for our implementation of CryoGAN is written in Python using
the PyTorch [169] package.

5.5 Theoretical Guarantee of Recovery


The CryoGAN paradigm is supported by Theorem 5.5.5, which states that perfect recovery is possible from continuous-domain measurements. In the continuous domain, we have y = H_φ f + n, where y : R² → R is the 2D measurement obtained from the 3D volume f and n is the noise. Here, H_φ = C_c S_t P_θ is the continuous-domain forward operator, P_θ is the projection operator, S_t is the shift operator, and C_c is the operator for convolution with the CTF (see Image Formation). We assume that θ ∼ p_θ, c ∼ p_c, t ∼ p_t, and we let P_n be the probability measure associated with n. We assume that

i. the characteristic functional P̂_n of the noise probability measure P_n is non-zero everywhere in its domain and n is pointwise defined everywhere in R²;

ii. the support of p_c is such that, for any c₁, c₂ ∈ Support{p_c} and c₁ ≠ c₂, the Fourier transform F{C_c₁ + C_c₂} is non-zero everywhere;

Figure 5.2: Architecture of the discriminator. The parameter for the


channel size is C = 96 in every experiment. The input image with size
H ⇥ W is successively processed and downsampled to output a scalar.

iii. the volume f is nonnegative everywhere and has a bounded support; and

iv. the probability distributions p_θ, p_c, and p_t are bounded.

Before proving Theorem 5.5.5, we first comment on the assumptions. Assump-


tion 1) is true for many noise distributions, including a white Gaussian noise filtered
with any kernel of arbitrarily non-zero compact support. Assumption 2) is gener-
ally true as well. In fact, it is used to justify the application of Wiener filter to the
clustered projections in classical Cryo-EM reconstruction pipelines. Assumption 3)
is true since the volume represents the density map, which is nonnegative. Also,
the biological structures considered in Cryo-EM have finite sizes.
In the following two sections, we build up to Theorem 5.5.5, first by analyzing the problem in the absence of the CTF and noise, and then by adding the CTF.

Figure 5.3: CryoGAN is applied on a synthetic projection dataset gen-


erated from a 2.5Å β-galactosidase volume. We refer to these synthetic
projections as “real,” in contrast to the projections coming from CryoGAN,
which we term “simulated.” (a) The volume is initialized with zeros and is
progressively updated to produce projections whose distribution matches
that of the real projections. (b) Evolution during training of some clean
projections (i.e., before CTF and noise) generated by the Cryo-EM physics
simulator. (c) Row 1 : Clean, simulated projections (before CTF and noise)
generated at the final stage of training. Row 2 : CTF-modulated simulated
projections (before noise) generated at the final stage of training. Row
4 : Real projections, for comparison. (d) FSC curves between the two
reconstructed half-maps at different points during training.

Notations and Preliminaries


Let SO(3) be the space of special orthonormal matrices and D be the Borel algebra induced using the standard Riemannian metric on SO(3). Then, (SO(3), D) describes the measurable space of orthonormal matrices. Let Ω_W^N = {x ∈ R^N : ‖x‖₂ ≤ W} for some W ∈ R⁺. By (L₂, B), we denote the measurable space of all the square-integrable functions supported in Ω_W², with Borel algebra B induced by the L₂-norm. We denote by F the set of all the functions supported in Ω_W³ which are nonnegative and essentially bounded.
For any f ∈ F and A ∈ SO(3), we denote y = P_A{f} = ∫_{−∞}^{∞} Af(x₁, x₂, x₃) dx₃, where Af(x) = f(A⁻¹x). Let p_A be a probability density on the space (SO(3), D). Note that there is a bijective mapping between θ in Theorem 5.5.5 and A. In fact, A represents the rotation matrix associated with the projection angle θ.
We denote by μ the normalized Haar measure on (SO(3), D) and by μ_A the measure associated with p_A such that μ_A[·] = ∫_{(a∈·)} p_A(a) μ[da].
For a given f ∈ F, the density p_A induces a probability measure P_proj(·|f) on the space (L₂, B) through the mapping P_A{f} such that

P_proj(·|f) = μ_A[{A ∈ SO(3) : P_A{f} ∈ ·}].  (5.25)

When p_A is uniform on SO(3), one has that

P_proj(·|f) = P_proj(·|Rf),  ∀f ∈ F and R ∈ O(3),  (5.26)

where O(3) is the space of all orthogonal matrices A such that det A ∈ {−1, 1}. The invariance in (5.26) is true since

P_proj(·|f) = μ[{A ∈ SO(3) : P_A{f} ∈ ·}]
            = μ[{A ∈ SO(3) : P_{R⁻¹A}{Rf} ∈ ·}]
            = μ[{RA′ ∈ SO(3) : P_{A′}{Rf} ∈ ·}]
            = μ[{A′ ∈ SO(3) : P_{A′}{Rf} ∈ ·}],  (5.27)

where A′ = R⁻¹A and the last equality follows from the right invariance of the Haar measure. We define G{F} = {Γ_A : A ∈ O(3)} such that

(Γ_A f)(·) = f(A⁻¹ ·),  ∀A ∈ O(3), f ∈ F.  (5.28)

We define the shape [f] as an orbit of f under the influence of G, such that [f] = {Γ_A f : Γ_A ∈ G}. When p_A is uniform, the shape [f] is composed of all the rotations and reflections of f.

5.5.1 Recovery in the Absence of CTF and Noise


In the absence of CTF and shifts, the recoverability of f : R³ → R from its 2D projections obtained at unknown random poses is guaranteed by [170, Theorem
3.1]. We first go through the notations described in [170] before we state the
required foundational result. We then extend [170, Theorem 3.1] to the case when
the CTF and shifts are present.
We can now restate [170, Theorem 3.1]. We discuss here the sketch of the proof
given in [170].

Theorem 5.5.1 ([170, Theorem 3.1]). Let p_A be any bounded distribution on SO(3) and let the assumptions of Theorem 5.5.5 hold; then, ∀f, g ∈ F,

[f] ≠ [g] ⟹ P_proj(·|f) ⊥ P_proj(·|g).  (5.29)

Sketch of the Proof. Without loss of generality, we provide the sketch of the proof for the case when p_A is uniform. For the case when p_A is nonuniform, the argument remains the same provided that the measure μ_A associated with the nonuniform distribution p_A is absolutely continuous with respect to μ (μ_A ≪ μ). This has been stated in [170]. Since we assume p_A to be bounded, this condition is satisfied. The only difference here with respect to the uniform distribution is that the orbits of f and g are more restricted than O(3).
The proof first uses [171, Proposition 7.8], which we restate here as Proposition 5.5.2.

Proposition 5.5.2 ([171, Proposition 7.8]). Let f ∈ F and let S_A be an uncountably infinite subset of SO(3); then f is determined by the collection {P_A{f}}_{A∈S_A}, ordered with respect to A ∈ S_A.

Note that this proposition assumes that the angles of the projections are known.
Although in our case the angles are unknown, we shall see that this proposition
will be useful.

We now want to determine how different P_proj(·|f) and P_proj(·|g) are for any given f and g. For this, we use the equality

TV(P₁, P₂) = 2 inf_{γ∈Π(P₁,P₂)} E_{(y₁,y₂)∼γ}[1_{y₁≠y₂}],  (5.30)

where TV is the total-variation distance and Π(P₁, P₂) is the set of all the joint distributions γ(y₁, y₂) whose marginals are P₁ and P₂ [165]. In fact, E[1_{y₁≠y₂}] is equal to the probability of the event y₁ ≠ y₂. In our context, this translates into

TV(P_proj(·|f), P_proj(·|g)) = 2 inf_{γ∈Π(P_proj(·|f), P_proj(·|g))} Prob(y₁ ≠ y₂),  where (y₁, y₂) ∼ γ.  (5.31)

The optimum is achieved at the extrema, which are sparse joint distributions such that the variable y₂ is a function of y₁. For any arbitrary joint distribution (or coupling) of this form, the proof then assigns a measurable function h : SO(3) → SO(3) such that (y₁, y₂) = (P_A{f}, P_{h(A)}{g}) for A ∼ p_A.
We can then write that

μ[{A ∈ SO(3) : P_{h(A)}{g} ∈ ·}] = P_proj(·|g).  (5.32)

The task now is to estimate Prob(y₁ ≠ y₂), where (y₁, y₂) = (P_A{f}, P_{h(A)}{g}) for A ∼ p_A.
(Continuous h). When h is continuous, Proposition 5.5.2 implies that, if [f] ≠ [g], then

μ[{A ∈ SO(3) : ‖P_A{f} − P_{h(A)}{g}‖₂ > 0}] = 1.  (5.33)

(General h). When the function h is discontinuous, the proof uses Lusin's theorem to approximate h by a continuous function. Lusin's theorem states that, for any δ > 0, there exists a continuous h_δ such that h(A) = h_δ(A), ∀A ∈ H_δ, and μ[SO(3)∖H_δ] < δ. This then leads to

μ[{A ∈ SO(3) : ‖P_A{f} − P_{h(A)}{g}‖₂ > 0}] ≥ μ(H_δ) ≥ 1 − δ.  (5.34)

Since δ is arbitrarily small, the event {P_A{f} ≠ P_{h(A)}{g}} has probability 1.

In conclusion, for any arbitrary coupling, the event {P_A{f} ≠ P_{h(A)}{g}} has probability 1 if [f] ≠ [g]. This implies that, when [f] and [g] are not the same, the total-variation distance between P_proj(·|f) and P_proj(·|g) is 2. This ensures that the two probability measures are mutually singular, meaning that the intersection of their supports has zero measure. This concludes the proof.

5.5.2 Recovery in the Presence of CTF and Absence of Noise


We now extend the previous result to the case when the CTF is present. For the sake of simplicity, we do not take shifts into account in the forward model. However, it is trivial to generalize the results to them since shifts do not change the information content of the projections but only their location.
We assume that c ∼ p_c such that the support of p_c is in some bounded region C ⊂ R³. We denote by μ_c[·] the measure associated with p_c on the space C. We denote by (SO(3) × C) the product space of SO(3) and C, while we denote by μ_{A,c} the measure on this product space. We then define

P_proj,CTF(·|f) = μ_{A,c}[{(A, c) ∈ (SO(3) × C) : C_c ∗ P_A{f} ∈ ·}],  (5.35)

where C_c is the space-domain CTF given in (5.4).

Theorem 5.5.3. Let p_A be a bounded probability distribution on SO(3), let p_c be a distribution of the CTF with parameters c ∈ C, and let the assumptions of Theorem 5.5.5 hold; then, ∀f, g ∈ F,

[f] ≠ [g] ⟹ P_proj,CTF(·|f) ⊥ P_proj,CTF(·|g).  (5.36)

Proof. Similarly to the previous proof, we show that the TV distance between P_proj,CTF(·|f) and P_proj,CTF(·|g) is 2 when [f] and [g] are distinct. For simplification, we assume that p_A is uniform. (When this is not the case, the proof essentially remains the same.) We need to show that Prob(y₁ ≠ y₂) = 1, where (y₁, y₂) ∼ γ, for any arbitrary coupling γ of P_proj,CTF(·|f) and P_proj,CTF(·|g). For an arbitrary coupling such that Prob(y₁ ≠ y₂) is minimum, we again assign h : (SO(3) × C) → (SO(3) × C) such that

(y₁, y₂) = (C_c ∗ P_A{f}, C_{h₁(A,c)} ∗ P_{h₀(A,c)}{g}),  (5.37)

where A ∼ p_A, c ∼ p_c, and where h₀ : (SO(3) × C) → SO(3) and h₁ : (SO(3) × C) → C are such that h(A, c) = (h₀(A, c), h₁(A, c)). This implies that

P_proj,CTF(·|g) = μ_{A,c}[{(A, c) ∈ (SO(3) × C) : C_{h₁(A,c)} ∗ P_{h₀(A,c)}{g} ∈ ·}].  (5.38)

We now show that, for any h, the event {y₁ ≠ y₂} has probability 1.
(Continuous h). We first assume that h is continuous and use the same kind of technique as in the proof of [170, Theorem 3.1]. Since SO(3) is transitive, we can write that

h(A, c) = (A Λ_{A,c}, h₁(A, c)).  (5.39)

As h is continuous, so is Λ_{A,c}. Let {A_n^m × C_n^m}_{m=1}^{n} be a collection of n disjoint sets which creates a partition of (SO(3) × C). These partitions are such that, for any m, there exists a k_m such that {A_{n+1}^m × C_{n+1}^m} ⊂ {A_n^{k_m} × C_n^{k_m}}. This means that, as n increases, the partitions become finer. We now define

h_n(A, c) = (A Λ_n^m, h_{n,1}(A, c)),  ∀(A, c) ∈ {A_n^m × C_n^m},  (5.40)

such that

Λ_n^m = arg min_{Λ ∈ {Λ_{A,c} : (A,c) ∈ {Ā_n^m × C̄_n^m}}}  min_{(A,c) ∈ {Ā_n^m × C̄_n^m}} ‖P_A{f} − P_{AΛ}{g}‖,  (5.41)

where Ā_n^m and C̄_n^m are the closures of A_n^m and C_n^m, respectively. The sequence h_n converges to h as n → ∞. We denote

K = {(A, c) ∈ (SO(3) × C) : ‖C_c ∗ P_A{f} − C_{h₁(A,c)} ∗ P_{AΛ_{A,c}}{g}‖ > 0},  (5.42)
K_n^m = {(A, c) ∈ (A_n^m × C_n^m) : ‖C_c ∗ P_A{f} − C_{h₁(A,c)} ∗ P_A{Λ_n^m g}‖ > 0}.  (5.43)

Similarly to [170, Theorem 3.1], we can then show that

μ_{A,c}[K] = lim_{n→∞} Σ_{m=1}^{n} μ_{A,c}[K_n^m].  (5.44)

We invoke Proposition 5.5.4, which gives that μ_{A,c}[K_n^m] = μ_{A,c}[(A_n^m × C_n^m)]. Therefore, μ_{A,c}[K] = μ_{A,c}[(SO(3) × C)] = 1. This means that, when h is continuous, the event {y₁ ≠ y₂} has probability 1 if [f] ≠ [g].

(General h). When h is discontinuous, we can invoke Lusin's theorem to claim the same, similarly to Theorem 5.5.1. This means that, for any h, if [f] ≠ [g], then the probability of the event {y₁ ≠ y₂} is 1. Therefore, the TV distance between P_proj,CTF(·|f) and P_proj,CTF(·|g) is 2, yielding that P_proj,CTF(·|f) ⊥ P_proj,CTF(·|g). This concludes the proof.
Proposition 5.5.4. Let f, g ∈ F, A′ ⊆ SO(3), C′ ⊆ C, Λ ∈ SO(3), and

K′ = {(A, c) ∈ (A′ × C′) : ‖C_c ∗ P_A{f} − C_{h₁(A,c)} ∗ P_A{Λg}‖ > 0}.  (5.45)

Let the assumptions of Theorem 5.5.5 hold. Then, if [f] ≠ [g], it holds that

μ_{A,c}[K′] = μ_{A,c}[(A′ × C′)].  (5.46)

Proof. We show that $\mu_{A,c}[\mathcal{K}'^{c}] = 0$, where $(\mathcal{K}'^{c} \cup \mathcal{K}') = (\mathcal{A}' \times \mathcal{C}')$. We define the set $S_A = \{c \in \mathcal{C}' : \|C_c * P_A\{f\} - C_{h_1(A,c)} * P_A\{\Theta g\}\| = 0\}$. We define $S_{\mathcal{A}''} = \cup_{A \in \mathcal{A}''} S_A$ for any $\mathcal{A}'' \subseteq \mathcal{A}'$. We define
\begin{align}
\mathcal{A}'_1 &= \{A \in \mathcal{A}' : S_A \text{ is an uncountable set}\}, \tag{5.47}\\
\mathcal{A}'_2 &= \{A \in \mathcal{A}' : S_A \text{ is a countable non-empty set}\}. \tag{5.48}
\end{align}
Note that $\mathcal{K}'^{c} = \cup_{k=1}^{2} \cup_{A \in \mathcal{A}'_k} (A \times S_A)$. Then,
\begin{equation}
\mu_{A,c}[\mathcal{K}'^{c}] = \sum_{k=1}^{2} \mu_{A,c}\big[\cup_{A \in \mathcal{A}'_k} (A \times S_A)\big]. \tag{5.49}
\end{equation}

We now look at the two cases.


• (When $S_A$ is uncountable). For this case, we show that $\mu_A[\mathcal{A}'_1] = 0$. The main argument is that, if this is not true, then it contradicts $[f] \neq [g]$.
For the sake of conciseness, we denote $P_A\{f\}$ by $I_f$ and $P_A\{\Theta g\}$ by $I_g$. For any $A \in \mathcal{A}'_1$, it holds that
\begin{align}
C_c * I_f &= C_{h_1(A,c)} * I_g, \quad \forall c \in S_A, \tag{5.50}\\
\hat{C}_c \cdot \hat{I}_f &= \hat{C}_{h_1(A,c)} \cdot \hat{I}_g, \quad \forall c \in S_A, \tag{5.51}
\end{align}
where $\hat{C}$, $\hat{I}_f$, and $\hat{I}_g$ are the Fourier transforms of $C$, $I_f$, and $I_g$, respectively.

We define $\mathrm{ze}(\hat{I}) = \{\boldsymbol{\omega} \in \mathbb{R}^2 : \hat{I}(\boldsymbol{\omega}) = 0\}$, $\omega_\alpha = \{[(r\cos\alpha, r\sin\alpha)] : r > 0\}$, and $\mathrm{ze}_\alpha(\hat{I}) = \mathrm{ze}(\hat{I}) \cap \omega_\alpha$. From (5.51), we can write that
\begin{equation}
\mathrm{ze}(\hat{C}_c) \cup \mathrm{ze}(\hat{I}_f) = \mathrm{ze}(\hat{C}_{h_1(A,c)}) \cup \mathrm{ze}(\hat{I}_g), \quad \forall c \in S_A. \tag{5.52}
\end{equation}
Two remarks are in order. Firstly, by assumption 2 of Theorem 5.5.5, $\mathrm{ze}(\hat{C}_{c_1}) \cap \mathrm{ze}(\hat{C}_{c_2}) = \emptyset$ for $c_1 \neq c_2$. (Remember that $\mathrm{ze}_\alpha(\hat{C}_c)$ for any $\alpha \in [0, \pi]$ is nonempty; see "Image Formation Theory".) Secondly, by assumption 3 of Theorem 5.5.5, the supports of $f$ and $g$ are compact and nontrivial, and so are the supports of $I_f$ and $I_g$. This means that their Fourier transforms $\hat{I}_f$ and $\hat{I}_g$ are analytic functions, which implies that there are infinitely many $\alpha$ such that the cardinality of the sets $\mathrm{ze}_\alpha(\hat{I}_f)$ and $\mathrm{ze}_\alpha(\hat{I}_g)$ is countable. We call the set of such $\alpha$ as $S_\alpha$. Now, we have that
\begin{align}
\mathrm{ze}_\alpha(\hat{C}_c) \cap \big(\mathrm{ze}_\alpha(\hat{C}_c) \cup \mathrm{ze}_\alpha(\hat{I}_f)\big) &= \mathrm{ze}_\alpha(\hat{C}_c) \cap \big(\mathrm{ze}_\alpha(\hat{C}_{h_1(A,c)}) \cup \mathrm{ze}_\alpha(\hat{I}_g)\big), \nonumber\\
\mathrm{ze}_\alpha(\hat{C}_c) \cup \big(\mathrm{ze}_\alpha(\hat{C}_c) \cap \mathrm{ze}_\alpha(\hat{I}_f)\big) &= \big(\mathrm{ze}_\alpha(\hat{C}_c) \cap \mathrm{ze}_\alpha(\hat{C}_{h_1(A,c)})\big) \cup \big(\mathrm{ze}_\alpha(\hat{C}_c) \cap \mathrm{ze}_\alpha(\hat{I}_g)\big), \nonumber\\
\mathrm{ze}_\alpha(\hat{C}_c) \cup \big(\mathrm{ze}_\alpha(\hat{C}_c) \cap \mathrm{ze}_\alpha(\hat{I}_f)\big) &= \mathrm{ze}_\alpha(\hat{C}_c) \cap \mathrm{ze}_\alpha(\hat{I}_g) \tag{5.53}
\end{align}
for all $c \in S_A$ and $\alpha \in [0, \pi]$.
We can now write that
\begin{equation}
\bigcup_{c \in S_A} \mathrm{ze}_\alpha(\hat{C}_c) \cup \big(\mathrm{ze}_\alpha(\hat{C}_c) \cap \mathrm{ze}_\alpha(\hat{I}_f)\big) = \bigcup_{c \in S_A} \mathrm{ze}_\alpha(\hat{C}_c) \cap \mathrm{ze}_\alpha(\hat{I}_g) \tag{5.54}
\end{equation}
for any $\alpha \in S_\alpha$. The set on the left-hand side of (5.54) has an uncountably infinite cardinality since there are uncountably many $c \in S_A$ and, for each $c$, the $\mathrm{ze}_\alpha(\hat{C}_c)$ are distinct. In return, the set on the right-hand side of (5.54) is countable for a given $\alpha \in S_\alpha$. Therefore, for any $\alpha \in S_\alpha$, the two sets have different cardinality, which raises a contradiction. The only possible scenario in which (5.52) is true is when $h_1(A, c) = c$. Using (5.51), we infer that $P_A\{f\} = P_A\{\Theta g\}$. Therefore, for any $A \in \mathcal{A}'_1$, $P_A\{f\} = P_A\{\Theta g\}$. However, $\mu_A[\mathcal{A}'_1] = 0$ since, if this is not true, then $[f] = [g]$ by Proposition 5.5.2.
Now note that
\begin{equation}
\mu_{A,c}\Big[\bigcup_{A \in \mathcal{A}'_1} (A \times S_A)\Big] \le \underbrace{\mu_A[\mathcal{A}'_1]}_{0}\; \underbrace{\mu_c\Big[\bigcup_{A \in \mathcal{A}'_1} S_A\Big]}_{\text{finite}} = 0. \tag{5.55}
\end{equation}

• (When $S_A$ is countable and nonempty). Since $S_A$ is a countable set in this case, its elements have a bijection with the natural numbers. We denote this bijection by $b : \mathbb{Z} \times \mathcal{A}'_2 \to S_A$. We denote by $q(z) = \cup_{A \in \mathcal{A}'_2} (A, b_A(z))$, $\forall z \in \mathbb{Z}$. Note that $q(z)$ is a graph of the function $b(z, \cdot)$. Since it is a graph, $\mu_{A,c}[q(z)] = 0$.
We also have that $\mu_{A,c}\big[\cup_{A \in \mathcal{A}'_2} (A \times S_A)\big] = \mu_{A,c}\big[\cup_{z \in \mathbb{Z}} q(z)\big]$. The latter vanishes since it is the measure of a countable union of sets of measure zero. Hence, $\mu_{A,c}\big[\cup_{A \in \mathcal{A}'_2} (A \times S_A)\big] = 0$.

This gives that $\mu_{A,c}[\mathcal{K}'^{c}] = \sum_{k=1}^{2} \mu_{A,c}\big[\cup_{A \in \mathcal{A}'_k} (A \times S_A)\big] = 0$, which concludes the proof.

5.5.3 Recovery in the presence of CTF and Noise


Theorem 5.5.5. Let $y = H_{\boldsymbol{\varphi}} f + n$ as given in (5.4) with $\boldsymbol{\varphi} = (\boldsymbol{\theta}, \mathbf{t}, \mathbf{c})$, where $\boldsymbol{\theta} = (\theta_1, \theta_2, \theta_3)$ are the projection angles, $\mathbf{t} = (t_1, t_2)$ are the shifts, and $\mathbf{c} = (d_1, d_2, \alpha_{\mathrm{ast}})$ are the CTF parameters (defocus-major, defocus-minor, and angle of astigmatism, respectively), $f$ is the continuous-domain 3D volume, and $y, n$ are continuous-domain 2D images. Then, it holds that
\begin{equation}
P(\cdot|f_1) = P(\cdot|f_2) \Leftrightarrow f_1 = G(f_2), \tag{5.56}
\end{equation}
where $G$ is some member of the set of rotation-reflection operations.

Proof of Theorem 5.5.5. We denote by $P_{\mathrm{noiseless}}(\cdot|f)$ the probability measure of $y_{\mathrm{noiseless}} = H_{\boldsymbol{\varphi}} f$. We shall prove the following in sequence:

i. $P(\cdot|f_1) = P(\cdot|f_2) \Leftrightarrow P_{\mathrm{noiseless}}(\cdot|f_1) = P_{\mathrm{noiseless}}(\cdot|f_2)$;

ii. $P_{\mathrm{noiseless}}(\cdot|f_1) = P_{\mathrm{noiseless}}(\cdot|f_2) \Leftrightarrow f_2 = G(f_1)$.

For the first part, we progress by noting that $y = y_{\mathrm{noiseless}} + n$. Recall that the characteristic function of the probability measure associated to the sum of two independent random variables is the product of their characteristic functions. Mathematically,
\begin{equation}
\hat{P}(\cdot|f) = \hat{P}_{\mathrm{noiseless}}(\cdot|f)\, \hat{P}_n. \tag{5.57}
\end{equation}



By Assumption (1, Theorem 5.5.5), we can now write that
\begin{equation}
\hat{P}_{\mathrm{noiseless}}(\cdot|f) = \frac{\hat{P}(\cdot|f)}{\hat{P}_n}. \tag{5.58}
\end{equation}
From (5.58), it is easy to see that $P(\cdot|f_1) = P(\cdot|f_2) \Leftrightarrow P_{\mathrm{noiseless}}(\cdot|f_1) = P_{\mathrm{noiseless}}(\cdot|f_2)$.
This concludes the first part.
For the second part, we now invoke the result from Theorem 5.5.3. It states that, if $f_2 \neq G(f_1)$ for any $G$ in the set of rotation and reflection operations, then the corresponding $P_{\mathrm{noiseless}}(\cdot|f_1)$ and $P_{\mathrm{noiseless}}(\cdot|f_2)$ are mutually singular. This means that the support of their intersection has zero measure. Since we have $P_{\mathrm{noiseless}}(\cdot|f_1) = P_{\mathrm{noiseless}}(\cdot|f_2)$, they are not mutually singular. This implies that $f_2 = G(f_1)$ for some $G$ in the set of rotation and reflection operations. This concludes the proof.

Discrete-domain Extension. In practice, the Cryo-EM measurements are


acquired on a detector grid and are therefore discrete. CryoGAN reconstructs a
voxel-domain xrec such that p(y|xrec ) = p(y|xGT ), meaning that the corresponding
probability measures are equal as well. Theorem 5.5.5 then holds approximately, up
to some error that results from discretization of the measurements, forward model,
and 3D density map. We leave a more thorough analysis of this error for future
work.

5.6 Related Works


Related works fall into two main categories: current Cryo-EM reconstruction meth-
ods and deep learning techniques that may apply to the Cryo-EM pipeline; we now
discuss each of these.

Cryo-EM Reconstruction
The main challenge in Cryo-EM reconstruction is that every particle has an un-
known pose in its micrograph—if the poses were known, maximum-likelihood (ML)
or maximum a posteriori (MAP) estimation of the volume could be performed by
solving a standard linear inverse problem, where robustness would result from the

large number of measurements which would counteract the low SNR of each mea-
surement. One approach is to attempt to estimate the unknown poses iteratively.
Pose estimation can be achieved with a variety of strategies, including the popular
projection-matching approach [172, 173]. Whatever the method used, pose estima-
tion is challenging because the SNR of individual projection images is extremely
low. It also requires the estimation of additional parameters and the projection of
the current reconstructed volume at a large number of poses and at every iteration
of the reconstruction pipeline; ultimately, this is very computationally demanding.

Another approach is to formulate the reconstruction as a ML (or MAP) esti-


mation problem in which the unknown poses are marginalized away [151, 153, 174].
This is attractive in that no extra parameters need to be estimated. The problem
can then be solved using the expectation-maximization algorithm (e.g., [151, 153]),
where marginalization over poses during the so-called E-step is computationally
expensive. Alternatively, the problem can be minimized using stochastic gradient
descent (e.g., during the ab initio phase of [153]); here, the challenge is that the
involved gradients require computations over all poses. For a more in-depth dis-
cussion, see [156, 157, 175]. For additional mathematical details on the relationship
between likelihood-based methods and CryoGAN, see CryoGAN vs. Likelihood-
based Methods.

Likelihood-free methods for Cryo-EM reconstruction are relatively few. An early


approach is [176], which proposes to reconstruct an ab initio structure such that
the first few moments of the distribution of its theoretical Cryo-EM measurements
match the ones of the particles. However, the method assumes that the poses of
the particles have a uniform distribution. This moment-matching technique has
been recently extended in [177] to reconstruct an ab initio structure in the case of
nonuniform pose distributions.

By contrast, our CryoGAN framework proposes to match the distribution of


the theoretical Cryo-EM measurements to that of the real projections, by which
we mean all the moments and not just the first few. Moreover, our method works
for any pose distribution of the particles provided it is known beforehand. Alter-
natively, one could rely on a parametric model of the pose distribution and use the
backpropagation mechanism of neural networks to learn its parameters during the
CryoGAN run; this technique is explored in [177].

Deep Learning for Cryo-EM


Deep learning has already had a profound impact in a wide range of image-reconstruction
applications [178–180]; however, its current utilization in Cryo-EM is mostly re-
stricted to preprocessing steps such as micrograph denoising [181] or particle pick-
ing [182–186]. A recent work uses neural networks to model continuous generative
factors of structural heterogeneity [187]. However, the algorithm necessitates a
pose-estimation procedure that relies on a merely conventional approach. Another
recent work [188] uses a variational autoencoder trained using a discriminator-based
objective to find a low-dimensional latent representation of the particles. These rep-
resentations are then used to estimate the poses.
Deep learning is now extensively used to solve inverse problems in imaging [17,
21,26,179]. However, most methods are based on supervised learning and thus rely
on training data. A GAN-based scheme that recovers the underlying distribution of
the data from its noisy partial observations through a forward model was recently
proposed in [189]. Finally, the reconstruction of a 3D structure (implicitly or
explicitly) from its 2D viewpoints (and not projections) is an important problem
in computer vision [190]. Many recent deep-learning algorithms have been used
in this regard [191, 192]. While this problem is ostensibly similar to Cryo-EM
reconstruction, the measurement model for these problems is much less complicated
than it is for Cryo-EM and bears no relation to this modality.

5.6.1 CryoGAN vs. Likelihood-based Methods


As discussed in [157], most reconstruction approaches in SPA currently rely on a maximum-likelihood (ML) formulation [174] written as
\begin{equation}
\mathbf{x}_{\mathrm{rec}} = \arg\max_{\mathbf{x}} \sum_{n=1}^{N} \log p(\mathbf{y}_n^{\mathrm{data}} \,|\, \mathbf{x}). \tag{5.59}
\end{equation}

This formulation can be solved by using an expectation-maximization (ML-EM) algorithm or gradient descent. The former is preferred for iterative refinement [151], while a stochastic version of the latter is used for generating initial volume estimates in [153]. These techniques all involve the likelihood estimation for each projection given the current volume estimate, which requires marginalizing the joint distribution $p(\mathbf{y}, \boldsymbol{\theta}|\mathbf{x})$ over the space of poses. In ML-EM, this is side-stepped in the

expectation step (E-step) by computing the conditional distribution on the poses


for each projection given the volume estimate. This conditional distribution is then
used to update the volume in the marginalization step (M-step). Hence, all ML
techniques require, either implicitly or explicitly, computations over a large number
of poses for each projection.
For another perspective, ML techniques can be viewed as distribution-matching approaches. Specifically, (5.59) minimizes an empirical estimate of the Kullback-Leibler (KL) divergence between the real distribution $p_{\mathrm{data}}(\mathbf{y})$ and the simulated distribution $p(\mathbf{y}|\mathbf{x})$, such that
\begin{align}
\mathbf{x}_{\mathrm{rec}} &= \arg\min_{\mathbf{x}} \mathrm{KL}\big(p_{\mathrm{data}}(\mathbf{y}) \,\|\, p(\mathbf{y}|\mathbf{x})\big) \tag{5.60}\\
&= \arg\min_{\mathbf{x}} \mathbb{E}_{\mathbf{y}\sim p_{\mathrm{data}}}\left[\log \frac{p_{\mathrm{data}}(\mathbf{y})}{p(\mathbf{y}|\mathbf{x})}\right] \tag{5.61}\\
&= \arg\min_{\mathbf{x}} \mathbb{E}_{\mathbf{y}\sim p_{\mathrm{data}}}\big[-\log p(\mathbf{y}|\mathbf{x})\big] \tag{5.62}\\
&\approx \arg\max_{\mathbf{x}} \sum_{n=1}^{N} \log p(\mathbf{y}_n^{\mathrm{data}}|\mathbf{x}). \tag{5.63}
\end{align}

Hence, ML methods aim at finding the reconstruction whose simulated projection


distribution matches that of the real data. In practice, this specific goal can be
achieved by minimizing any suitable distance between these two distributions. By
changing the distance, one can avoid the challenging likelihood computations that
are inherent to the current ML methods, while preserving the theoretical guarantees
that come with distribution matching (e.g., Theorem 5.5.5).
This is precisely the philosophy behind CryoGAN, which relies on the Wasserstein distance to formulate the distribution-matching task as
\begin{equation}
\mathbf{x}_{\mathrm{rec}} = \arg\min_{\mathbf{x}} \max_{D : \|D\|_L \le 1} \Big(\mathbb{E}_{\mathbf{y}\sim p_{\mathrm{data}}}[D(\mathbf{y})] - \mathbb{E}_{\mathbf{y}\sim p_{\mathbf{x}}}[D(\mathbf{y})]\Big). \tag{5.64}
\end{equation}

In this formulation, the two distributions indirectly interact through a common


function: the discriminator network $D$. As a result, likelihood estimation is
avoided; only a reliable sampler for each of the two distributions [193] is required.
The samples from the real data distribution are readily available in the form of
the acquired projection dataset, while the ones from the simulated distribution are
generated by the Cryo-EM physics simulator. Hence, CryoGAN is a likelihood-free

technique with theoretical properties that are at least as good—if not better—than
the ML ones. In fact, the Wasserstein distance is often easier to minimize than the
KL divergence (e.g., due to the smoothness of the former) [164].
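For concreteness, the mini-batch estimate of the distribution-matching objective in (5.64) can be written in a few lines of PyTorch. The sketch below is illustrative only: the discriminator network, the batch of acquired projections, and the batch of projections produced by the Cryo-EM physics simulator are assumed to be provided by the surrounding training loop.

```python
import torch

def wasserstein_critic_objective(D, y_real, y_sim):
    """Mini-batch estimate of E_data[D(y)] - E_sim[D(y)] in (5.64).

    D is (approximately) 1-Lipschitz; it is maximized over its parameters,
    while the volume estimate is updated by descending the same quantity
    through the simulated projections y_sim.
    """
    return D(y_real).mean() - D(y_sim).mean()

# Hypothetical usage inside the alternating scheme:
#   loss_D = -wasserstein_critic_objective(D, y_real, y_sim)  # ascent on D
#   loss_x =  wasserstein_critic_objective(D, y_real, y_sim)  # descent on the volume x
```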

5.7 Results
5.7.1 Results on Synthetic Data
We first assessed the viability and performance of CryoGAN on a synthetic dataset
that consists of 41,000 β-galactosidase projections, designed to mimic the EMPIAR-
10061 data [158] in terms of noise level and CTF parameters. To create this dataset,
we generated a 2.5 Å-resolution density map from the PDB entry (5a1a) of the pro-
tein and applied the forward model described in Online Methods to obtain projec-
tions modulated by CTF effects and corrupted by noise. We then randomly divided
this dataset in two and applied the CryoGAN algorithm separately on both halves
to generate half-maps. In the context of this experiment, we refer to these synthetic
projections as “real,” in contrast to the projections coming from CryoGAN, which
we term “simulated.” The details of the experimental setup are given in Appendix
A.3.
We ran the CryoGAN algorithm for 400 minutes on an NVIDIA V100 GPU and
obtained a reconstruction with a resolution of 8.64 Å (Figure 5.3a). Starting from
a zero-valued volume, CryoGAN progressively updates the 3D structure so that
its simulated projections (Figure 5.3b) reach a distribution that matches that of
the real projections. These gradual updates are at the core of the deep adversarial
learning scheme of CryoGAN. At each iteration of the algorithm, the gradients
from the discriminator carry information about the current difference between the
real projections and the simulated projections. These gradients are used by the
Cryo-EM physics simulator to update the volume so as to improve the fidelity
of the simulated projections. Hence, at the end of its run, the volume learned by
CryoGAN has simulated projections (Figure 5.3.c, Rows 1-3) that are similar to the
real projections (Figure 5.3.c, Row 4) in a distributional sense. The evolution of the
Fourier-shell correlation (FSC) between the reconstructed half-maps (Figure 5.3.d)
testifies to the progressive increase in resolution that derives from this adversarial
learning scheme.

Figure 5.4: Additional CryoGAN reconstructions for synthetic datasets


with different imaging conditions. (a) Reconstruction for a low noise case
(-5.2 dB SNR). The corresponding evolution of the FSC curves with time
is shown on the right. (b) Reconstruction for a realistic noise level (-20
dB SNR) and with translations (3% of the image size) in the data. The
corresponding evolution of the FSC curves with time is shown on the right.

5.7.2 Results on Additional Synthetic Data


The main synthetic experiment given in Figure 5.3 considers imaging conditions
with a realistic noise level (-20 dB SNR). We performed additional experiments
on synthetic β-galactosidase datasets to understand the effect of different imaging
conditions on the quality of CryoGAN reconstruction. More precisely, we considered
1) the case with a lower noise level (-5.2 dB SNR) and 2) the case with a realistic
noise level (-20 dB SNR) and translations (3% of the image size) in the projections.
All the other conditions are identical to the main synthetic experiment. More
details on these two additional experiments are provided in Appendix A.3.
The reconstructions obtained in the two cases reach a resolution of 7.53 Å and
10.8 Å, respectively, as shown in Figure 5.4. As expected, the decrease of the noise
level from -20 dB to -5.2 dB improves the reconstruction quality (from 8.64 Å to
7.53 Å). For the second case, the challenging presence of translations results in a
slightly lower resolution. We believe that using a discriminator architecture that is
invariant to shifts in the input image would improve the result in this case. Indeed,
the discriminator would then be blind to such shifts in the real and simulated data,
which would yield a reconstruction with a quality similar to the no-shift case.

5.7.3 Results on Experimental Data (EMPIAR-10061)


We then deployed CryoGAN on 41,123 β-galactosidase projections (obtained from
EMPIAR-10061 [158]) to assess its capacity to reconstruct real, experimental data.
Here as well, we randomly divided the dataset in two and applied CryoGAN sepa-
rately on both halves. The details of this experimental setup are given in Appendix
A.3.
We ran CryoGAN for 150 minutes to obtain a 12.1 Å-resolution reconstruction
using an NVIDIA V100 GPU. The results are displayed in Figure 5.5. The flexible
architecture of CryoGAN permits the straightforward injection of prior knowledge
on this specific imaging procedure into the reconstruction pipeline (e.g., the assump-
tion of uniform pose distribution). Using this prior knowledge and its adversarial
learning scheme, CryoGAN converges toward the reconstruction that best explains
the statistics of the dataset (Figure 5.5a). As with the synthetic experiments, this
is achieved by exploiting the gradients of the discriminator to update the simu-
lator and the current volume estimate, so that, at later iterations, the simulated
projections (Figure 5.5b) follow a distribution that better approaches that of the

real dataset. Higher-resolution details are thus progressively introduced in the es-
timated volume throughout the run, as illustrated by the evolution of the FSC
curves between the reconstructed half-maps (Figure 5.5d). This resulted in a 12.08
Å β-galactosidase structure whose simulated projections closely resemble the real
ones (Figure 5.5c).

5.8 Summary
In this chapter, we present CryoGAN, a new paradigm for single-particle Cryo-
EM reconstruction based on unsupervised deep adversarial learning. The major
challenge in single-particle Cryo-EM is that the imaged particles have unknown
poses. Current reconstruction techniques are based on a marginalized maximum-
likelihood formulation that requires calculations over the set of all possible poses
for each projection image, a computationally demanding procedure. CryoGAN
sidesteps this problem by using a generative adversarial network (GAN) to learn the
3D structure that has simulated projections that most closely match the real data
in a distributional sense. The architecture of CryoGAN resembles that of a standard
GAN, with the twist that the generator network is replaced by a model of the
Cryo-EM image acquisition process. CryoGAN is an unsupervised algorithm that
only demands projection images and an estimate of the contrast transfer function
parameters. No initial volume estimate or prior training is needed. Moreover,
CryoGAN requires minimal user interaction and can provide reconstructions in a
matter of hours on a high-end GPU. In addition, we provide sound mathematical
guarantees on the recovery of the correct structure. CryoGAN currently achieves
an 8.6 Å resolution on a realistic synthetic dataset. Preliminary results on real β-galactosidase data demonstrate CryoGAN's ability to exploit data statistics under
standard experimental imaging conditions. We believe that this paradigm opens
the door to a family of novel likelihood-free algorithms for Cryo-EM reconstruction.

Figure 5.5: Evolution of CryoGAN while reconstructing the experimental


β-galactosidase dataset (EMPIAR-10061) from [158]. (a) The volume is
initialized with zeros and is progressively updated to produce projections
whose distribution matches that of the experimental dataset. (b) Evo-
lution during the training of the clean projections (i.e., before CTF and
noise) generated by the Cryo-EM physics simulator. (c) Row 1 : Clean,
simulated projections generated at the final stage of training. Row 2 : CTF-
modulated, simulated projections (before noise) generated at the final stage
of training. Row 3 : Simulated projections (with CTF and noise) gener-
ated at the final stage of training. Row 4 : Real projections, for comparison.
(d) FSC curves between the two reconstructed half-maps at different points
during training.
Chapter 6

Reconstructing Continuous
Conformations in CryoEM
using GANs

In the previous chapters, we have only discussed inverse problems in which at
least the signal or the forward model is deterministic. In this chapter, we deal
with the challenging case of heterogeneous Cryo-EM. In this case, the 3D structure exhibits an unknown conformation variability while being imaged by the stochastic forward model. In order to solve this problem, we devise a third-generation algorithm that is based on an extension of the distributional perspective proposed in CryoGAN.

6.1 Overview
The determination of the structure of nonrigid macromolecules is an important
aspect of structural biology and is fundamental in our understanding of biological
mechanisms and in drug discovery [156]. Among other popular techniques such
as X-ray crystallography and nuclear magnetic resonance spectroscopy, Cryo-EM
1 This chapter uses content from our work [194].


Figure 6.1: Reconstruction task of Cryo-EM. Many samples of a


biomolecule (which may exhibit continuously varying conformations) are
frozen in vitreous ice. These are then imaged/projected using an electron
beam to get 2D micrographs. The 2D images containing projection of a sin-
gle sample are then picked out (black-box). The task then is to reconstruct
the conformations of the biomolecule from these measurements.

has emerged as a unique method to determine molecular structures at unprece-


dented high resolutions. It is widely applicable to proteins that are difficult to
crystallize or have large structures. Cryo-EM produces a large number (from 104
to 107 ) of tomographic projections of the molecules dispersed in a frozen solution.
The reconstruction of 3D molecular structures from these data involves three main
challenges: possible structural heterogeneity of the molecule, random locations and
orientations of the molecules in the ice, and an extremely poor signal-to-noise ratio
(SNR), which can be as low as -20 dB (Figure 6.1). In fact, the reconstruction of
continuously varying conformations of a nonrigid molecule is still an open problem
in the field [195, 196]. A solution would considerably enhance our understanding of
the functions and behaviors of many biomolecules.
Most current methods [151, 153] find the 3D structure by maximizing the likeli-
hood of the data. They employ an iterative optimization scheme that alternatively
estimates the distribution of poses (or orientations) and reconstructs a 3D structure

Figure 6.2: Schematic overview of the reconstruction methods in Cryo-


EM. (a) Current methods; (b) CryoGAN; (c) proposed method (Multi-
CryoGAN).

until a criterion is satisfied (Figure 6.2(a)). To address the structural variability


of protein complexes, these methods typically use discrete clustering approaches.
However, the pose estimation and clustering steps are computationally heavy and
include heuristics. This makes these methods inefficient when the molecule has
continuous variations or a large set of discrete conformations.
Recently, two deep-learning-based reconstruction methods that require neither prior training nor additional training data have been introduced. On one hand, Cryo-DRGN [197] uses a variational auto-encoder (VAE) to model continuous structural variability, avoiding the heuristic clustering step. It is a likelihood-based method that requires pose estimation using an external routine like a branch-and-bound method [153]. This additional processing step can complicate the reconstruction
procedure and limit the flexibility of the model. On the other hand, Gupta et
al. [28] have recently proposed CryoGAN. It addresses the problem under a gen-
erative adversarial framework [36]. CryoGAN learns to reconstruct a 3D structure

whose randomly projected 2D Cryo-EM images match the acquired data in a distri-
butional sense (Figure 6.2(b)). Due to this likelihood-free characteristic, CryoGAN
does not require any additional processing step such as pose estimation, while it can
be directly deployed on the Cryo-EM measurements. This largely helps simplify the
reconstruction procedure. However, its application is limited to the reconstruction
of a single conformation.
In this work, we combine the advantages of CryoDRGN and CryoGAN. We
propose an unsupervised deep-learning-based method, called Multi-CryoGAN. It
can reconstruct continuously varying conformations of a molecule in a truly stan-
dalone and likelihood-free manner. Using a convolutional neural network (CNN), it
directly learns a mapping from a latent space to the 3D conformation distribution.
Unlike current methods, it requires neither pose nor conformation estimation for each projection, while it has the capacity to reconstruct low-dimensional
but complicated conformation manifolds [198].
Using synthetic Cryo-EM data as our benchmark, we show that our method
can reconstruct the conformation manifold for both continuous and discrete con-
formation distributions. In the discrete case, it also reconstructs the corresponding
probabilities. To the best of our knowledge, this is the first standalone method that can reconstruct the whole manifold of biomolecule conformations.

6.2 Related Work


Traditional Cryo-EM Image Reconstruction. A detailed survey of the classi-
cal methods is provided in [157,199]. Most of them fall into the maximum-likelihood
(ML) framework and rely on either expectation-maximization (ML-EM) [151] or
gradient descent (the first stage of [153]). In the context of heterogeneous confor-
mation reconstruction, a conjugate-gradient descent is used to estimate the volume
covariance matrix [200]. The eigendecomposition of this matrix contains informa-
tion about the conformation distribution which is then input to the ML framework.
In [201], a conformation manifold is generated for each group of projections with
similar poses. This data-clustering approach assumes orientation rather than struc-
tural heterogeneity to be the dominant cause for variations among the projection im-
ages, a strong constraint. In addition, the reconstruction of 3D movies from multi-
ple 2D manifolds can be computationally expensive. In another method, Moscovich
et al. [202] compute the graph Laplacian eigenvectors of the conformations using

covariance estimation. In [203], the problem of heterogeneous reconstructions is


reformulated as the search for a homogeneous high-dimensional structure that rep-
resents all the states, called a hypermolecule, which is characterized by a basis of
hypercomponents. This allows for reconstruction of high-dimensional conformation
manifolds but requires assumptions on the variations of the conformations as a prior
in their Bayesian formulation.
One of the main drawbacks of these methods is that they require marginalization
over the space of poses for each projection image, which is computationally demand-
ing and potentially inaccurate. In addition, because they rely on 3D clustering to
deal with structural variations of protein complexes, these methods become ineffi-
cient for a large set of discrete conformations and struggle to recover a continuum
of conformations.
Deep Learning for Cryo-EM Reconstructions. In addition to CryoDRGN
and CryoGAN that have already been discussed in the introduction, there is a
third described in [188]. It uses a VAE and a framework based on a generative
adversarial network (GAN) to learn the latent distribution of the acquired data.
This representation is then used to estimate the orientation and other important
parameters for each projection image.
Deep Learning to Recover a 3D Object from 2D Projections. The implicit
or explicit recovery of 3D shapes from 2D views is an important problem in com-
puter vision. Many deep-learning algorithms have been proposed for this [190–192].
Taking inspiration from compressed sensing, Bora et al. [189] have recently intro-
duced a GAN framework that can recover an original distribution from the mea-
surements through a forward model. While these approaches would in principle be
applicable, they consider a forward model that is too simple for Cryo-EM, where a
contrast transfer function (CTF) must be taken into account and where the noise
is orders of magnitude stronger (e.g., with a typical SNR of -10 to -20 dB).

6.3 Background and Preliminaries


6.3.1 Image-Formation Model
The aim of Cryo-EM is to reconstruct the 3D molecular structure from the measurements $\{\mathbf{y}_1^{\mathrm{data}}, \ldots, \mathbf{y}_Q^{\mathrm{data}}\}$, where $Q$ is typically between $10^4$ and $10^7$. Each measurement $\mathbf{y}^q \in \mathbb{R}^{N \times N}$ is given by
\begin{equation}
\mathbf{y}^q = \underbrace{\mathbf{C}_{\mathbf{c}^q} * \mathbf{S}_{\mathbf{t}^q} \mathbf{P}_{\boldsymbol{\theta}^q}\{\mathbf{x}^q\}}_{\mathbf{H}_{\boldsymbol{\varphi}^q}\{\mathbf{x}^q\}} + \mathbf{n}^q, \tag{6.1}
\end{equation}

where
• $\mathbf{x}^q \in \mathbb{R}^{N \times N \times N}$ is a separate instance of the 3D molecular structure;
• $\mathbf{n}^q \in \mathbb{R}^{N \times N}$ is the noise;
• $\mathbf{H}_{\boldsymbol{\varphi}^q}$ is the measurement operator, which depends on the imaging parameters $\boldsymbol{\varphi}^q = (\boldsymbol{\theta}^q, \mathbf{t}^q, \mathbf{c}^q) \in \mathbb{R}^8$ and involves three operations.
  – The term $\mathbf{P}_{\boldsymbol{\theta}^q}\{\mathbf{x}^q\}$ is the tomographic projection of $\mathbf{x}^q$ rotated by $\boldsymbol{\theta}^q = (\theta_1^q, \theta_2^q, \theta_3^q)$.
  – The operator $\mathbf{S}_{\mathbf{t}^q}$ shifts the projected image by $\mathbf{t}^q = (t_1^q, t_2^q)$. This shift arises from off-centered particle picking.
  – The Fourier transform of the resulting image is then modulated by the CTF $\hat{C}_{\mathbf{c}^q}$ with defocus parameters $\mathbf{c}^q = (d_1^q, d_2^q, \alpha_{\mathrm{ast}}^q)$ and thereafter subjected to an inverse Fourier transform.
For more details, please see the section on image formation in the previous chapter.
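To make this measurement operator concrete, the following toy NumPy/SciPy sketch mimics the three operations of (6.1) on a voxelized volume: rotation and projection, in-plane shift, CTF modulation, and additive noise. The CTF expression, the parameter values, and the function names are illustrative placeholders rather than the calibrated model used in our experiments.

```python
import numpy as np
from scipy import ndimage

def toy_ctf(N, defocus_um=1.5, pixel_size_A=5.0, wavelength_A=0.025):
    """Very simplified CTF: only the defocus term of the phase aberration."""
    freqs = np.fft.fftfreq(N, d=pixel_size_A)               # cycles per Angstrom
    fx, fy = np.meshgrid(freqs, freqs, indexing="ij")
    k2 = fx**2 + fy**2
    gamma = np.pi * wavelength_A * (defocus_um * 1e4) * k2  # defocus converted to Angstrom
    return -np.sin(gamma)

def toy_forward_model(x, euler_deg=(0.0, 0.0, 0.0), shift_px=(0, 0),
                      defocus_um=1.5, noise_std=1.0, rng=None):
    """y = CTF * Shift(Project(Rotate(x))) + noise, as in (6.1)."""
    rng = np.random.default_rng() if rng is None else rng
    vol = ndimage.rotate(x, euler_deg[0], axes=(0, 1), reshape=False)   # crude Euler
    vol = ndimage.rotate(vol, euler_deg[1], axes=(0, 2), reshape=False) # rotation
    vol = ndimage.rotate(vol, euler_deg[2], axes=(0, 1), reshape=False) # composition
    proj = vol.sum(axis=0)                                   # tomographic projection
    proj = np.roll(proj, shift_px, axis=(0, 1))              # in-plane shift
    ctf = toy_ctf(proj.shape[0], defocus_um)
    y_clean = np.real(np.fft.ifft2(np.fft.fft2(proj) * ctf)) # CTF modulation
    return y_clean + noise_std * rng.standard_normal(proj.shape)
```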
The challenge of Cryo-EM is that, for each measurement $\mathbf{y}^q$, the structure $\mathbf{x}^q$ and the imaging parameters $(\boldsymbol{\theta}^q, \mathbf{t}^q)$ are unknown, the CTF is a band-pass filter with multiple radial zero frequencies that incur an irretrievable loss of information, and the energy of the noise is multiple times (∼10 to 100 times) that of the signal, which corresponds to SNRs of -10 to -20 dB. In the homogeneous case (single conformation), all $\mathbf{x}^q$ are identical. But in the heterogeneous case (multiple conformations), each $\mathbf{x}^q$ represents a different conformation of the same biomolecule.
Stochastic Modeling. We denote by $p_{\mathrm{conf}}(\mathbf{x})$ the probability distribution over the conformation landscape from which a conformation $\mathbf{x}^q$ is assumed to be sampled. We assume that the imaging parameters and the noise are sampled from known distributions $p_{\boldsymbol{\varphi}} = p_{\boldsymbol{\theta}}\, p_{\mathbf{t}}\, p_{\mathbf{c}}$ and $p_{\mathbf{n}}$, respectively. For a given conformation distribution $p_{\mathrm{conf}}(\mathbf{x})$, this stochastic forward model induces a distribution over the measurements, which we denote by $p(\mathbf{y})$. We denote by $p_{\mathrm{conf}}^{\mathrm{data}}(\mathbf{x})$ the true conformation distribution from which the data distribution $p_{\mathrm{data}}(\mathbf{y})$ is acquired, such that $\{\mathbf{y}_1^{\mathrm{data}}, \ldots, \mathbf{y}_Q^{\mathrm{data}}\} \sim p_{\mathrm{data}}(\mathbf{y})$. The distribution $p_{\mathrm{conf}}^{\mathrm{data}}(\mathbf{x})$ is unknown and needs to be recovered.
The classical methods are likelihood-based and rely on the estimation of the imaging parameters $(\boldsymbol{\theta}^q, \mathbf{t}^q)$ (or a distribution over them) and of the conformation class for each measurement image $\mathbf{y}^q$. This information is then utilized to reconstruct the multiple discrete conformations. Our method, in contrast, is built upon the insight that, to recover $p_{\mathrm{conf}}^{\mathrm{data}}(\mathbf{x})$, it is sufficient to find a $p_{\mathrm{conf}}^{\mathrm{gen}}(\mathbf{x})$ whose corresponding measurement distribution $p_{\mathrm{gen}}(\mathbf{y})$ is equal to $p_{\mathrm{data}}(\mathbf{y})$ (see Theorem 6.5.1). This does away with pose estimation (or estimation of distributions over the poses) and conformation clustering for each measurement.

6.3.2 CryoGAN
Our scheme is an extension of the CryoGAN [28] method, which is applicable only to the homogeneous case $p_{\mathrm{conf}}^{\mathrm{data}}(\mathbf{x}) = \delta(\mathbf{x} - \mathbf{x}_{\mathrm{data}})$, where $\mathbf{x}_{\mathrm{data}}$ is the true 3D structure. CryoGAN tackles the challenge by casting the reconstruction problem as a distribution-matching problem (Figure 6.2(b)). More specifically, it learns to reconstruct the 3D volume $\mathbf{x}^*$ whose simulated projection set (measurement distribution) is most similar to the real projection data in a distributional sense, such that
\begin{equation}
\mathbf{x}^* = \arg\min_{\mathbf{x}} \mathrm{WD}\big(p_{\mathrm{data}}(\mathbf{y}) \,\|\, p_{\mathrm{gen}}(\mathbf{y}; \mathbf{x})\big). \tag{6.2}
\end{equation}
Here, $p_{\mathrm{gen}}(\mathbf{y}; \mathbf{x})$ is the distribution generated from the Cryo-EM physics simulator and WD refers to the Wasserstein distance [165]. This goal is achieved by solving the min-max optimization problem
\begin{equation}
\mathbf{x}^* = \arg\min_{\mathbf{x}} \underbrace{\max_{D : \|D\|_L \le 1} \big(\mathbb{E}_{\mathbf{y}\sim p_{\mathrm{data}}(\mathbf{y})}[D(\mathbf{y})] - \mathbb{E}_{\mathbf{y}\sim p_{\mathrm{gen}}(\mathbf{y};\mathbf{x})}[D(\mathbf{y})]\big)}_{\mathrm{WD}(p_{\mathrm{data}}(\mathbf{y})\|p_{\mathrm{gen}}(\mathbf{y};\mathbf{x}))}, \tag{6.3}
\end{equation}
where $D$ is a neural network whose parameters are constrained so that its Lipschitz constant satisfies $\|D\|_L \le 1$ [167] (Figure 6.3(b)). Here, $D$ learns to differentiate between the real projection $\mathbf{y}$ and the simulated projection $\mathbf{H}_{\boldsymbol{\varphi}}\{\mathbf{x}\}$ and scores the realness of given samples. As the discriminative power of $D$ becomes stronger (maximization) and the underlying volume estimate $\mathbf{x}$ is updated accordingly (minimization), $p_{\mathrm{data}}(\mathbf{y})$ and $p_{\mathrm{gen}}(\mathbf{y}; \mathbf{x})$ become indistinguishable, so that the algorithm recovers $\mathbf{x}^* = \mathbf{x}_{\mathrm{data}}$.

Figure 6.3: Schematic illustration of Multi-CryoGAN and its components.


(a) Conformation manifold mapper; (b) CryoGAN.

6.4 Method

6.4.1 Parameterization of the Conformation Manifold

CryoGAN successfully reconstructs the volumetric structure of a protein by finding


a single volume x that explains the entire set of projections, which is adequate when
all the imaged particles are identical (homogeneous case). However, in reality, many
biomolecules have nonrigid structures, which carry vital information.
To address this, we introduce a manifold-learning module $G$ that uses a CNN with learnable weights (Figure 6.3(a)). Sampling from $p_{\mathrm{conf}}(\mathbf{x})$ is then equivalent to evaluating $G(\mathbf{z})$, where $\mathbf{z}$ is sampled from a prior distribution. Therefore, $\mathbf{y} \sim p_{\mathrm{gen}}(\mathbf{y})$ is obtained by evaluating $\mathbf{H}_{\boldsymbol{\varphi}}\{G(\mathbf{z})\} + \mathbf{n}$, where $(\mathbf{n}, \mathbf{z}, \boldsymbol{\varphi})$ are sampled from their respective distributions (see Algorithm 5 and Figure 6.3). To make this dependency explicit, we hereafter denote the generated distribution of projection data by $p_{\mathrm{gen}}(\mathbf{y}; G)$.

Algorithm 5 Samples from the generated distribution $p_{\mathrm{gen}}(\mathbf{y}; G)$.
Input: Latent distribution $p_{\mathbf{z}}$; angle distribution $p_{\boldsymbol{\theta}}$; translation distribution $p_{\mathbf{t}}$; CTF-parameter distribution $p_{\mathbf{c}}$; noise distribution $p_{\mathbf{n}}$
Output: Simulated projection $\mathbf{y}_{\mathrm{gen}}$
1. Sample $\mathbf{z} \sim p_{\mathbf{z}}$.
2. Feed $\mathbf{z}$ into the generator network to get $\mathbf{x} = G(\mathbf{z})$.
3. Sample the imaging parameters $\boldsymbol{\varphi} = [\boldsymbol{\theta}, \mathbf{t}, \mathbf{c}]$.
   • Sample the Euler angles $\boldsymbol{\theta} = (\theta_1, \theta_2, \theta_3) \sim p_{\boldsymbol{\theta}}$.
   • Sample the 2D shifts $\mathbf{t} = (t_1, t_2) \sim p_{\mathbf{t}}$.
   • Sample the CTF parameters $\mathbf{c} = (d_1, d_2, \alpha_{\mathrm{ast}}) \sim p_{\mathbf{c}}$.
4. Sample the noise $\mathbf{n} \sim p_{\mathbf{n}}$.
5. Generate $\mathbf{y}_{\mathrm{gen}} = \mathbf{H}_{\boldsymbol{\varphi}} \mathbf{x} + \mathbf{n}$ based on (6.6).
return $\mathbf{y}_{\mathrm{gen}}$
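A PyTorch-style sketch of Algorithm 5 is given below. The objects `generator` (the CNN $G$), `simulator` (the physics simulator implementing $\mathbf{H}_{\boldsymbol{\varphi}}$), and the distribution objects are hypothetical placeholders that stand in for the actual Multi-CryoGAN modules.

```python
import torch

def sample_generated_projection(generator, simulator, z_dist, theta_dist,
                                t_dist, c_dist, noise_std, batch_size=16):
    """Draw a batch y_gen ~ p_gen(y; G), following the steps of Algorithm 5."""
    z = z_dist.sample((batch_size,))              # 1. latent codes z ~ p_z
    x = generator(z)                              # 2. conformations x = G(z)
    theta = theta_dist.sample((batch_size, 3))    # 3. Euler angles theta ~ p_theta
    t = t_dist.sample((batch_size, 2))            #    in-plane shifts t ~ p_t
    c = c_dist.sample((batch_size, 3))            #    CTF parameters c ~ p_c
    y_clean = simulator(x, theta, t, c)           # 5. clean measurements H_phi{G(z)}
    n = noise_std * torch.randn_like(y_clean)     # 4. noise n ~ p_n (Gaussian here)
    return y_clean + n                            #    y_gen = H_phi{G(z)} + n
```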

6.4.2 Optimization Scheme


We now find $G^*$ such that the distance between $p_{\mathrm{data}}(\mathbf{y})$ and $p_{\mathrm{gen}}(\mathbf{y}; G)$ is minimized, which results in the min-max optimization problem [164]
\begin{align}
G^* &= \arg\min_{G} \mathrm{WD}\big(p_{\mathrm{data}}(\mathbf{y}) \,\|\, p_{\mathrm{gen}}(\mathbf{y}; G)\big) \tag{6.4}\\
&= \arg\min_{G} \underbrace{\max_{D : \|D\|_L \le 1} \big(\mathbb{E}_{\mathbf{y}\sim p_{\mathrm{data}}}[D(\mathbf{y})] - \mathbb{E}_{\mathbf{y}\sim p_{\mathrm{gen}}(\mathbf{y};G)}[D(\mathbf{y})]\big)}_{\mathrm{WD}(p_{\mathrm{data}}(\mathbf{y})\|p_{\mathrm{gen}}(\mathbf{y};G))}. \tag{6.5}
\end{align}
As will be discussed in Theorem 6.5.1, the global minimizer $G^*$ of (6.5) indeed captures the true conformation landscape $p_{\mathrm{conf}}(\mathbf{x})$, which is achieved when $D$ is no longer able to differentiate the samples from $p_{\mathrm{data}}(\mathbf{y})$ and $p_{\mathrm{gen}}(\mathbf{y}; G^*)$.

It is crucial to note the difference between Multi-CryoGAN and conventional generative adversarial frameworks [36]. In the latter, $G$ directly outputs the samples from $p_{\mathrm{gen}}(\mathbf{y})$, whereas ours outputs samples $\mathbf{x}$ from the conformation distribution $p_{\mathrm{conf}}(\mathbf{x})$ whose stochastic projections are the samples of $p_{\mathrm{gen}}(\mathbf{y})$. The conventional schemes only help to generate samples that are similar to the real data but do not recover the underlying conformation landscape. Our proposed scheme includes the physics of Cryo-EM, which ties $p_{\mathrm{gen}}(\mathbf{y})$ with the conformation landscape $p_{\mathrm{conf}}(\mathbf{x})$ and is thus able to recover it (see Theorem 6.5.1 in Section 6.5).

6.5 Theoretical Guarantee of Recovery


Our experiments illustrate that enforcing a match between the distribution of the
simulated measurements and that of the data is sufficient to reconstruct the true
conformations. We now prove this mathematically. For the homogeneous case,
the proof is already discussed in [28, Theorem 1] which we now extend to the
heterogeneous case. We switch to a continuous-domain formulation of the Cryo-
EM problem while noting that the result is transferable to the discrete-domain as
well, albeit with some discretization error.
Notations and Preliminaries. We denote by $L_2(\mathbb{R}^3)$ the space of 3D structures $f : \mathbb{R}^3 \to \mathbb{R}$ with finite energy $\|f\|_{L_2} < \infty$. The imaging parameters $\boldsymbol{\varphi}$ are assumed to lie in $\mathcal{B} \subset \mathbb{R}^8$. We denote by $L_2(\mathbb{R}^2)$ the space of 2D measurements with finite energy. Each individual continuous-domain Cryo-EM measurement $y \in L_2(\mathbb{R}^2)$ is given by
\begin{equation}
y = H_{\boldsymbol{\varphi}}\{f\} + n, \tag{6.6}
\end{equation}
where $f \in L_2(\mathbb{R}^3)$ is some conformation of the biomolecule sampled from the probability measure $\tilde{P}_{\mathrm{conf}}$ on $L_2(\mathbb{R}^3)$, the imaging parameters $\boldsymbol{\varphi}$ are sampled from $p_{\boldsymbol{\varphi}}$, and $n$ is sampled from the noise probability measure $P_{\mathbf{n}}$ on $L_2(\mathbb{R}^2)$.

We define $[f] := \{r_A\{f\} : r_A \in \mathcal{O}\}$ as the set of all the rotated-reflected versions of $f$. Here, $\mathcal{O}$ is the set of all rotation-reflection operators on $L_2(\mathbb{R}^3)$. We define the quotient space of shapes $L_2(\mathbb{R}^3)/\mathcal{O}$. For any $\tilde{P}_{\mathrm{conf}}$ defined over $L_2(\mathbb{R}^3)$, an equivalent $P_{\mathrm{conf}}$ exists over $L_2(\mathbb{R}^3)/\mathcal{O}$. Since we are interested only in the shape of the conformations of the biomolecule, we will only focus on recovering $P_{\mathrm{conf}}$. We denote by $\mu$ the probability measure on $\mathcal{B} \subset \mathbb{R}^8$ associated with the density function $p_{\boldsymbol{\varphi}}$. Both of these induce a probability measure $P_{\mathrm{clean}}$ on the space of noiseless 2D measurements $L_2(\mathbb{R}^2)$ through the forward operator. This is given by $P_{\mathrm{clean}}[\mathcal{A}] = (P_{\mathrm{conf}} \times \mu)\big[\{([f], \boldsymbol{\varphi}) \in (L_2(\mathbb{R}^3)/\mathcal{O} \times \mathcal{B}) : H_{\boldsymbol{\varphi}} f \in \mathcal{A}\}\big]$ for any Borel set $\mathcal{A} \subset L_2(\mathbb{R}^2)$. We denote by $P_{\mathrm{meas}}$ the probability measure of the noisy measurements.

Theorem 6.5.1. Let $P_{\mathrm{conf}}^{\mathrm{data}}$ and $P_{\mathrm{conf}}^{\mathrm{gen}}$ be the true and the reconstructed conformation probability measures on the quotient space of 3D structures $L_2(\mathbb{R}^3)/\mathcal{O}$, respectively. We assume that they are atomic and that they are supported only on nonnegative-valued shapes. Let $P_{\mathrm{meas}}^{\mathrm{data}}$ and $P_{\mathrm{meas}}^{\mathrm{gen}}$ be the probability measures of the noisy Cryo-EM measurements obtained from $P_{\mathrm{conf}}^{\mathrm{data}}$ and $P_{\mathrm{conf}}^{\mathrm{gen}}$, respectively.
Make the following physical assumptions:

i. the noise probability measure $P_{\mathbf{n}}$ is such that its characteristic functional vanishes nowhere in its domain and that its sample $n$ is pointwise-defined everywhere;

ii. the distributions $p_{\boldsymbol{\theta}}$, $p_{\mathbf{t}}$, and $p_{\mathbf{c}}$ are bounded;

iii. for any two $\mathbf{c}_1, \mathbf{c}_2 \sim p_{\mathbf{c}}$, $\mathbf{c}_1 \neq \mathbf{c}_2$, the CTFs $\hat{C}_{\mathbf{c}_1}$ and $\hat{C}_{\mathbf{c}_2}$ share no common zero frequencies.

Then, it holds that
\begin{equation}
P_{\mathrm{meas}}^{\mathrm{data}} = P_{\mathrm{meas}}^{\mathrm{gen}} \Rightarrow P_{\mathrm{conf}}^{\mathrm{gen}} = P_{\mathrm{conf}}^{\mathrm{data}}. \tag{6.7}
\end{equation}

Proof. We first prove that $P_{\mathrm{meas}}^{\mathrm{data}} = P_{\mathrm{meas}}^{\mathrm{gen}} \Rightarrow P_{\mathrm{clean}}^{\mathrm{data}} = P_{\mathrm{clean}}^{\mathrm{gen}}$. Note that, due to the independence of clean measurements and noise, we have that
\begin{align}
\hat{P}_{\mathrm{meas}}^{\mathrm{data}} &= \hat{P}_{\mathrm{clean}}^{\mathrm{data}}\, \hat{P}_{\mathbf{n}}, \nonumber\\
\hat{P}_{\mathrm{meas}}^{\mathrm{gen}} &= \hat{P}_{\mathrm{clean}}^{\mathrm{gen}}\, \hat{P}_{\mathbf{n}}. \tag{6.8}
\end{align}
From the assumption that $\hat{P}_{\mathbf{n}}$ is nonzero everywhere, we deduce that $\hat{P}_{\mathrm{clean}}^{\mathrm{data}} = \hat{P}_{\mathrm{clean}}^{\mathrm{gen}}$. This proves the first step.

To prove the next step, we invoke Theorem 4 in [28], which states that any two probability measures $P_{\mathrm{clean}}^{1}$ and $P_{\mathrm{clean}}^{2}$ that correspond to Dirac probability measures $P_{\mathrm{conf}}^{1}$ and $P_{\mathrm{conf}}^{2}$ on $L_2(\mathbb{R}^3)/\mathcal{O}$, respectively, are mutually singular (zero measure of the common support) if and only if the latter are distinct. We denote the relation of mutual singularity by $\perp$.

Since $P_{\mathrm{conf}}^{\mathrm{data}}$ is an atomic measure (countable weighted sum of distinct Dirac measures), the corresponding $P_{\mathrm{clean}}^{\mathrm{data}}$ is composed of a countable sum of mutually singular measures. The same is true for $P_{\mathrm{clean}}^{\mathrm{gen}}$ since it is equal to $P_{\mathrm{clean}}^{\mathrm{data}}$.

We proceed by contradiction. We denote by $\mathrm{Supp}\{P\}$ the support of the measure $P$. Assume that $\mathrm{Supp}\{P_{\mathrm{conf}}^{\mathrm{data}}\} \neq \mathrm{Supp}\{P_{\mathrm{conf}}^{\mathrm{gen}}\}$. Let us define $S_1 = \mathrm{Supp}\{P_{\mathrm{conf}}^{\mathrm{gen}}\} \cap \mathrm{Supp}\{P_{\mathrm{conf}}^{\mathrm{data}}\}^{C}$. For any $[f] \in S_1$, we denote by $P_{\mathrm{clean}}^{f}$ its noiseless probability measure. Since $f \in S_1$, it is distinct from any constituent Dirac measure in $P_{\mathrm{conf}}^{\mathrm{data}}$. Therefore, by using [28, Theorem 4], $P_{\mathrm{clean}}^{f}$ is mutually singular to each of the constituent mutually singular measures of $P_{\mathrm{clean}}^{\mathrm{data}}$, implying that $P_{\mathrm{clean}}^{f} \perp P_{\mathrm{clean}}^{\mathrm{data}}$. From $\mathrm{Supp}\{P_{\mathrm{clean}}^{f}\} \subset \mathrm{Supp}\{P_{\mathrm{clean}}^{\mathrm{gen}}\}$, it follows that $P_{\mathrm{clean}}^{\mathrm{gen}} \neq P_{\mathrm{clean}}^{\mathrm{data}}$, which raises a contradiction. Therefore, the set $S_1$ is empty. The same can be proved for the set $S_2 = \mathrm{Supp}\{P_{\mathrm{conf}}^{\mathrm{data}}\} \cap \mathrm{Supp}\{P_{\mathrm{conf}}^{\mathrm{gen}}\}^{C}$. Therefore, $\mathrm{Supp}\{P_{\mathrm{conf}}^{\mathrm{data}}\} = \mathrm{Supp}\{P_{\mathrm{conf}}^{\mathrm{gen}}\}$, which means that the locations of their constituent Dirac measures are the same. To maintain $P_{\mathrm{clean}}^{\mathrm{data}} = P_{\mathrm{clean}}^{\mathrm{gen}}$, the weights of their constituent Dirac measures have to be the same, too. This concludes the proof.
In essence, Theorem 6.5.1 claims that a reconstructed manifold of conformations
recovers the true conformations if its measurements match the acquired data in a
distributional sense. Though the result assumes the true conformation landscape
to be discrete (atomic measure), it holds for an infinite number of discrete confor-
mations which could be arbitrarily close/similar to each other and is thus relevant
to continuously varying conformations. We leave the proof of the latter case to
future works.

6.6 Experiments and Results


We evaluate the performance of the proposed algorithm on synthetic datasets ob-
tained from a protein with multiple conformations. We synthesize two datasets: one with a continuum of configurations and one where the particles can only take a discrete number of states. During reconstruction, no assumption is made about their continuous
or discrete nature, which suggests that our method is capable of learning different
conformation distribution behaviors.
Dataset. For each dataset, we generate 100,000 simulated projections from the in vivo conformation variation of the heat-shock protein Hsp90. The Coulomb density maps of each conformation are created by the code provided in [204] with slight modifications. The conformation variation of this protein is represented by the bond angle, which describes the work cycle of the molecule, where the two subunits continuously vary between fully closed (bond angle 0°, protein database entry 2cg9) and fully opened (20°). We sample the bond angle from $\mathrm{Uniform}(0°, 20°)$ for the continuous case and from $20° \times \mathrm{Bernoulli}(0.75)$ for the discrete case. Here, $\mathrm{Uniform}(a, b)$ is the uniform distribution between $a$ and $b$, and $\mathrm{Bernoulli}(p)$ denotes the Bernoulli distribution with parameter $p$. A conformation is generated with $(32 \times 32 \times 32)$ voxels, where the size of each voxel is 5 Å. A 2D projection with random orientation of this conformation is obtained ($(32 \times 32)$ image, Figure 6.4b). The orientation is sampled from a uniform distribution over $SO(3)$. Then, the CTF is applied to this projection image with a defocus uniformly sampled between $[1.0\,\mu\mathrm{m}, 2.0\,\mu\mathrm{m}]$, assuming that the horizontal and vertical defocus values are the same and that there is no astigmatism. Translations/shifts are disabled in these experiments. Finally, Gaussian noise was added to the CTF-modulated images, resulting in an SNR of approximately -10 dB.
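As a minimal sketch, the two bond-angle samplers described above can be implemented as follows (NumPy assumed; angles are expressed in degrees).

```python
import numpy as np

def sample_bond_angle(n, continuous=True, rng=None):
    """Continuous case: Uniform(0, 20) degrees; discrete case: 20 * Bernoulli(0.75)."""
    rng = np.random.default_rng() if rng is None else rng
    if continuous:
        return rng.uniform(0.0, 20.0, size=n)
    return 20.0 * rng.binomial(1, 0.75, size=n)   # 0 deg w.p. 0.25, 20 deg w.p. 0.75
```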
Implementation Details. The reconstruction of the conformations is done by solving (6.5) using Algorithm 6. For both continuous and discrete conformations, we use the same distribution $p_{\mathbf{z}} \sim \mathrm{Uniform}(\mathbf{z}_0, \mathbf{z}_1)$, where $\mathbf{z}_0, \mathbf{z}_1 \in \mathbb{R}^{32 \times 32 \times 32}$ are randomly chosen from $\mathrm{Uniform}(0, 0.025)$ and fixed throughout the process. Thus, we do not impose any prior knowledge on whether the landscape is continuous or discrete. As we shall see later, this latent distribution is sufficiently rich to represent the variation of interest in the synthetic datasets. The architectures of $D$ and $G$ and the training details are provided in Appendix A.4.
Optimization Details. The models are trained end-to-end on the synthetic datasets with the usual WGAN loss (gradient-penalty regularization parameter set to 0.6) on a TITAN X GPU. In all experiments, $G$, $D$, and the noise parameters are optimized using three separate Adam optimizers with a learning rate of $10^{-3}$ and gradient-norm clipping values of 1, $10^3$, and 1, respectively. Between each generator step, there are $n_{\mathrm{disc}} = 5$ discriminator steps. The batch size is kept at 16 samples and the algorithm is run for 30 epochs, which was sufficient for convergence.
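This optimizer setup translates, for instance, into the following PyTorch sketch. The concrete network classes are placeholders; only the learning rates, clipping values, and their assignment to the three groups of parameters follow the settings reported above.

```python
import torch
import torch.nn as nn

def make_optimizers(G: nn.Module, D: nn.Module, noise_params: nn.Module):
    """Three separate Adam optimizers (lr 1e-3), paired with their clipping norms."""
    return {
        "G": (torch.optim.Adam(G.parameters(), lr=1e-3), 1.0),
        "D": (torch.optim.Adam(D.parameters(), lr=1e-3), 1e3),
        "noise": (torch.optim.Adam(noise_params.parameters(), lr=1e-3), 1.0),
    }

def clipped_step(module: nn.Module, optimizer, max_norm):
    """Clip the gradient norm, take one optimizer step, and reset the gradients."""
    torch.nn.utils.clip_grad_norm_(module.parameters(), max_norm)
    optimizer.step()
    optimizer.zero_grad()
```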
Metric. We deploy two metrics based on the Fourier-shell correlation (FSC). The FSC between two structures $\mathbf{x}_1$ and $\mathbf{x}_2$ is given by
\begin{equation}
\mathrm{FSC}(\omega, \mathbf{x}_1, \mathbf{x}_2) = \frac{\langle \mathbf{V}^{\omega}_{\hat{\mathbf{x}}_1}, \mathbf{V}^{\omega}_{\hat{\mathbf{x}}_2}\rangle}{\|\mathbf{V}^{\omega}_{\hat{\mathbf{x}}_1}\|\, \|\mathbf{V}^{\omega}_{\hat{\mathbf{x}}_2}\|}, \tag{6.9}
\end{equation}
where $\mathbf{V}^{\omega}_{\hat{\mathbf{x}}}$ is the vectorization of the shell of $\hat{\mathbf{x}}$ at radius $\omega$ and $\hat{\mathbf{x}}$ is the 3D Fourier transform of $\mathbf{x}$. As a first metric, we use the FSC between a reconstructed conformation and the corresponding ground-truth conformation. This metric encapsulates the structural quality of an individual reconstructed conformation.
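For reference, a minimal NumPy implementation of the FSC curve in (6.9) is sketched below; it correlates the Fourier coefficients of the two volumes over spherical shells. It is illustrative only and not the exact code used to produce the reported numbers.

```python
import numpy as np

def fsc(x1, x2, n_shells=16):
    """Fourier-shell correlation between two cubic volumes of identical size."""
    X1, X2 = np.fft.fftn(x1), np.fft.fftn(x2)
    N = x1.shape[0]
    freqs = np.fft.fftfreq(N)
    fx, fy, fz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    radius = np.sqrt(fx**2 + fy**2 + fz**2)
    edges = np.linspace(0.0, 0.5, n_shells + 1)   # up to the normalized Nyquist frequency
    curve = np.zeros(n_shells)
    for i in range(n_shells):
        shell = (radius >= edges[i]) & (radius < edges[i + 1])
        num = np.real(np.sum(X1[shell] * np.conj(X2[shell])))
        den = np.sqrt(np.sum(np.abs(X1[shell])**2) * np.sum(np.abs(X2[shell])**2))
        curve[i] = num / (den + 1e-12)
    return edges[1:], curve
```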
To evaluate the landscape of the conformations, we propose a second metric that we call the matrix $\mathbf{M} \in \mathbb{R}^{L \times L}$ of FSC cross conformations (FSCCC). Its entries are given by
\begin{equation}
\mathbf{M}[m, n] = \mathrm{AreaFSC}(\mathbf{x}_m, \mathbf{x}_n) = \int_{0 < \omega \le \omega_c} \mathrm{FSC}(\omega, \mathbf{x}_m, \mathbf{x}_n)\, \mathrm{d}\omega, \tag{6.10}
\end{equation}
where $\mathbf{x}_m$ and $\mathbf{x}_n$ are samples in the reconstructed conformation manifold and $\omega_c$ is the normalized Nyquist frequency. We determine it for the reconstructed landscape by setting $\mathbf{x}_m = G^*\big((1 - \alpha_m)\mathbf{z}_0 + \alpha_m \mathbf{z}_1\big)$, where $\alpha_m = m/L$ for $m \in \{0, \ldots, L\}$. The matrix $\mathbf{M}$ encapsulates how similar $\mathbf{x}_m$ is compared to other structures $\mathbf{x}_n$ across the manifold ($\mathbf{M}[m, n]$ is proportional to the similarity between $\mathbf{x}_m$ and $\mathbf{x}_n$), hence allowing for a visualization of the manifold.

For the continuous conformation, it is useful to compare the FSCCC of our reconstructions with that of the ground truth. To that end, we also evaluate $\mathbf{M}[m, n]$ when $\mathbf{x}_m$ corresponds to the bond angle $20°\,(m/L)$, where $m \in \{0, \ldots, L\}$. In our experiments, we used $L = 20$ for all FSCCC calculations.
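Building on the fsc() helper sketched above, the FSCCC matrix of (6.10) can be assembled as follows; G_star denotes the trained generator and (z0, z1) the two fixed latent endpoints. This is again a sketch under those assumptions.

```python
import numpy as np

def fsccc_matrix(G_star, z0, z1, L=20):
    """FSC cross-conformation matrix: area under the FSC curve for every pair (m, n)."""
    alphas = np.arange(L + 1) / L
    volumes = [G_star((1 - a) * z0 + a * z1) for a in alphas]
    M = np.zeros((L + 1, L + 1))
    for m in range(L + 1):
        for n in range(L + 1):
            omegas, curve = fsc(volumes[m], volumes[n])
            M[m, n] = np.sum(curve) * (omegas[1] - omegas[0])  # integral over 0 < omega <= omega_c
    return M
```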

6.6.1 Continuous Conformations


We give in Figure 6.4 a qualitative comparison between the ground-truth conformation variation, as the bond angle goes from 0° to 20°, and the reconstructions $G^*(\mathbf{z})$, where $\mathbf{z} = (1 - \alpha)\mathbf{z}_0 + \alpha\mathbf{z}_1$ and $\alpha$ goes from 0 to 1. Our method successfully reconstructs a manifold that exhibits smooth continuous conformation variation (Figure 6.4(a)), where the input parameter $\alpha$ has direct control over the bond angle of the reconstruction. This shows that not only has the true conformation landscape been captured, but its factor of variation has been meaningfully encoded by the latent variables $\mathbf{z}$. The similarity between simulated projections and the ground-truth data in Figure 6.4(b) suggests that the algorithm has achieved $p_{\mathrm{data}} = p_{\mathrm{gen}}$. Moreover, their underlying distributions of noiseless projections are also similar, in accordance with the property discussed in Theorem 6.5.1.

We also evaluate the structural quality of the reconstruction for certain representative individual conformations. In Figure 6.4(c), the extreme conformations for the ground truth (bond angles 0° and 20°) and the reconstructions ($\alpha = 0$ and $\alpha = 1$) are shown. Their FSC plots reach the value 0.5 after the normalized frequency of 0.25, so that at least half of the Nyquist resolution is achieved (Figure 6.4(d)). All these results are further confirmed by the very similar FSCCC matrices for ground truth and reconstruction in Figure 6.5(a). This implies that the reconstructed manifold successfully approximates the continuous ground truth.

Figure 6.4: Continuous conformations experiment. (a) Comparison between the ground-truth conformation manifold and the reconstructed conformation manifold $G^*(\mathbf{z})$, where $\mathbf{z} = (1 - \alpha)\mathbf{z}_0 + \alpha\mathbf{z}_1$, $\alpha \in [0, 1]$. (b, left column) Clean projections of random samples from the ground-truth and reconstructed manifolds; (b, right column) their CTF-modulated and noise-corrupted projections. These are the real and generated samples that are fed to $D$. (c) Ground truth with bond angles 0° and 20°, and the reconstructions corresponding to the endpoints in the latent space. (d) The FSC between them.

Figure 6.5: The FSC cross-conformation (FSCCC) matrix in (6.10). (a)


Continuous conformation with (left) ground truth and (right) reconstruc-
tion. It shows that the reconstructed conformations smoothly vary (with-
out forming clusters) similar to the ground truth case. (b) Discrete con-
formation case with (left) ideal reconstruction and (right) obtained recon-
struction. The ideal reconstruction describes the case where 25% and 75%
of latent space would have mapped to the two distinct conformations with-
out any transitions. The obtained reconstruction case can be seen to be
very similar to the ideal case with 25% and 70% being the latent space
occupied by the two conformations.

6.6.2 Discrete Conformations


We present in Figure 6.6 the reconstruction results for the discrete case, where our proposed method successfully recovers not only the conformations but also their probabilities. About 70% of the reconstructed landscape matches the configuration for a bond angle of 20°, while 25% matches 0°. The remaining 5% of the landscape corresponds to a relatively abrupt transition between them. This suggests that our model distribution $p_{\mathrm{conf}}(\mathbf{x})$ closely follows the ground-truth Bernoulli distribution. This is further supported in Figure 6.5(b), where the reconstructed FSCCC matrix greatly resembles the ideal reconstruction case. Ideally, one would expect the two conformations to occupy 25% and 75% of the latent space without having

Figure 6.6: Discrete conformations experiment. (a) Comparison between the ground truth (GT), taking only two conformations with probabilities 0.25 and 0.75, and the reconstructed conformations $G^*(\mathbf{z})$, where $\mathbf{z} = (1 - \alpha)\mathbf{z}_0 + \alpha\mathbf{z}_1$, $\alpha \in [0, 1]$. (b) The GT with bond angles 0° and 20° and their reconstructions. (c) FSC of the structures in (b).

any transition conformations. The structural quality of these two recovered configurations with respect to the corresponding ground truth is given in Figure 6.6(b). Their FSCs show that at least half of the Nyquist resolution is achieved (Figure 6.6(c)).
The FSCCC reconstruction matrix (Figure 6.5(b)) validates the fact that the reconstructed structures cluster into two main conformations. We use it to determine the probabilities of these clusters/conformations. We determine the probability of a conformation using the first and last rows of the matrix (similarity of the conformations with respect to the extreme conformations). We consider that a conformation $\mathbf{x}_n$ belongs to the first cluster/conformation if $\mathbf{M}[0, n] > 0.5$ and $\mathbf{M}[20, n] < 0.5$. If the case is reversed ($\mathbf{M}[0, n] < 0.5$ and $\mathbf{M}[20, n] > 0.5$), then it belongs to the second cluster/conformation. Otherwise, it is considered as a transitioning conformation. This yields that the first 25% and the last 70% of the structures cluster together to form the 0° and 20° conformations, respectively, and that the middle 5% are the transitioning conformations.
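This cluster-assignment rule can be written compactly; the sketch below takes the FSCCC matrix M (e.g., from the fsccc_matrix sketch) and the 0.5 similarity threshold stated above.

```python
def assign_clusters(M, threshold=0.5):
    """Label each conformation as closed (0 deg), open (20 deg), or transitioning."""
    labels = []
    for n in range(M.shape[1]):
        if M[0, n] > threshold and M[-1, n] < threshold:
            labels.append("closed")       # similar to the first extreme conformation
        elif M[0, n] < threshold and M[-1, n] > threshold:
            labels.append("open")         # similar to the last extreme conformation
        else:
            labels.append("transition")
    return labels
```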

6.7 Summary
In this chapter, we propose a deep-learning-based reconstruction method for cryo-
electron microscopy (Cryo-EM) that can model multiple conformations of a nonrigid
biomolecule in a standalone manner. Cryo-EM produces many noisy projections
from separate instances of the same but randomly oriented biomolecule. Current methods rely on pose and conformation estimation, which is inefficient for the reconstruction of continuous conformations that carry valuable information. We
introduce Multi-CryoGAN, which sidesteps the additional processing by casting the
volume reconstruction into the distribution matching problem. By introducing a
manifold mapping module, Multi-CryoGAN can learn continuous structural hetero-
geneity without pose estimation or clustering. We also give a theoretical guarantee
of recovery of the true conformations. Our method can successfully reconstruct 3D
protein complexes on synthetic 2D Cryo-EM datasets for both continuous and dis-
crete structural variability scenarios. To the best of our knowledge, Multi-CryoGAN
is the first model that can reconstruct continuous conformations of a biomolecule
from Cryo-EM images in a fully unsupervised and standalone manner.

Algorithm 6 Reconstruction of multiple conformations using Multi-CryoGAN.
Input: Dataset $\{\mathbf{y}_1^{\mathrm{data}}, \ldots, \mathbf{y}_Q^{\mathrm{data}}\}$; training parameters: number of steps $k$ to apply to the discriminator and gradient-penalty parameter
Output: A mapping $G$ from the latent space to the 3D conformation space.
for $n_{\mathrm{train}}$ training iterations do
  for $k$ steps do
    • sample from the real data: $\{\mathbf{y}_1^{\mathrm{data}}, \ldots, \mathbf{y}_B^{\mathrm{data}}\}$.
    • sample from the generated data: $\{\mathbf{y}_1^{\mathrm{gen}}, \ldots, \mathbf{y}_B^{\mathrm{gen}}\} \sim p_{\mathrm{gen}}(\mathbf{y}; G)$ (see Algorithm 5).
    • sample $\{\epsilon_1, \ldots, \epsilon_B\} \sim \mathcal{U}[0, 1]$.
    • compute $\mathbf{y}_{\mathrm{int}}^{b} = \epsilon_b \cdot \mathbf{y}_b^{\mathrm{data}} + (1 - \epsilon_b) \cdot \mathbf{y}_b^{\mathrm{gen}}$ for all $b \in \{1, \ldots, B\}$.
    • update the discriminator $D$ by gradient ascent on the loss (6.5), complemented with the gradient-penalty term from [167].
  end for
  • sample generated data: $\{\mathbf{y}_1^{\mathrm{gen}}, \ldots, \mathbf{y}_B^{\mathrm{gen}}\} \sim p_{\mathrm{gen}}(\mathbf{y}; G)$ (see Algorithm 5).
  • update the generator $G$ by gradient descent on the loss (6.5).
end for
return $G^*$
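For completeness, a hedged PyTorch sketch of the discriminator update in the inner loop of Algorithm 6 is given below; it uses the gradient-penalty formulation of [167] with the weight 0.6 reported in our experiments. The tensors y_real and y_gen are batches of acquired and generated projections of shape (B, C, H, W), and all names are placeholders rather than the exact Multi-CryoGAN code.

```python
import torch

def discriminator_step(D, y_real, y_gen, opt_D, gp_weight=0.6):
    """One gradient-ascent step on the loss (6.5) for D, with gradient penalty."""
    y_gen = y_gen.detach()                                # do not backprop into G here
    eps = torch.rand(y_real.size(0), 1, 1, 1, device=y_real.device)
    y_int = (eps * y_real + (1.0 - eps) * y_gen).requires_grad_(True)
    grad = torch.autograd.grad(D(y_int).sum(), y_int, create_graph=True)[0]
    penalty = ((grad.flatten(1).norm(dim=1) - 1.0) ** 2).mean()
    loss = -(D(y_real).mean() - D(y_gen).mean()) + gp_weight * penalty
    opt_D.zero_grad()
    loss.backward()
    opt_D.step()
    return loss.item()
```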
Chapter 7

Conclusion and Outlook

Reconstruction methods for linear problems range from classical to unsupervised-


deep-learning methods. This thesis brings contributions across these methods for
inverse problems ranging from 1D sampling to heterogeneous Cryo-EM.

Continuous-domain solutions of inverse problems

In chapter 2, we develop the continuous-domain extension of classical methods and devise numerical approaches to solve the resulting problems. We have shown that the formulation of continuous-domain linear inverse problems with Tikhonov- and total-variation-based regularizations leads to spline solutions. The nature of these splines is dependent on the Green's function of the regularization operator, $\mathrm{L}^{*}\mathrm{L}$ and $\mathrm{L}$ for Tikhonov and total variation, respectively. The former is better to reconstruct smooth signals; the latter is an attractive choice to reconstruct signals with sparse innovations. Representer theorems for the two cases come in handy in the numerical reconstruction of the solution. They allow us to reformulate the infinite-dimensional optimization as a finite-dimensional parameter search.


Future Work
We expect that similar results exist in higher dimensions since the theory can be generalized. However, the computations can also be expected to be challenging for signals defined over $\mathbb{R}^d$ with $d > 1$, for example, when considering images rather than signals. Our scheme to solve the gTV-regularized 1D inverse problems has been made more efficient in [205]. This formulation can be useful for cases when a function needs to be optimized with a constraint on its sparsity. For example, in [206] this numerical scheme has been used to learn the activations of a neural network in order to increase its capacity while maintaining its stability.

Deep-learning-based iterative methods


We propose a deep-learning-based iterative framework in chapter 3 to solve imaging
problems such as CT in order to bring more robustness and quality to the recon-
struction. The purpose of this chapter is to develop a simple yet effective iterative
scheme (RPGD) where one step of enforcing measurement consistency is followed
by a CNN that tries to project the solution onto the set of desired reconstruction
images. The whole scheme is ensured to be convergent. We also introduced a
novel method to train a CNN that acts like a projector using a reasonably small
dataset. For sparse-view CT reconstruction, our method outperforms the previous
techniques for both noiseless and noisy measurements.
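As a minimal sketch of this scheme (not the exact implementation of chapter 3), one iteration of the relaxed projected gradient descent with a CNN projector can be written as follows; cnn_projector, H, and Ht are placeholders for the trained projector and the forward/adjoint operators, and the relaxation rule only illustrates the spirit of the update of the relaxation parameter that guarantees convergence.

# Minimal sketch of relaxed projected gradient descent (RPGD) with a CNN projector.
# `cnn_projector`, `H`, and `Ht` are placeholders; `gamma` is the gradient step size and
# alpha is shrunk whenever the residual ||z_k - x_k|| would otherwise grow.
import torch

def rpgd(y, H, Ht, cnn_projector, gamma, n_iter=100, c=0.99):
    x = Ht(y)                           # initialization, e.g., a backprojection
    alpha, prev_res = 1.0, None
    for _ in range(n_iter):
        grad = Ht(H(x) - y)             # gradient of the data-fidelity term
        z = cnn_projector(x - gamma * grad)
        res = torch.norm(z - x)
        if prev_res is not None and alpha * res > c * prev_res:
            alpha = c * prev_res / res  # enforce a geometrically decreasing residual
        prev_res = alpha * res
        x = (1 - alpha) * x + alpha * z
    return x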

Future Work
The proposed framework is generic and can be used to solve a variety of inverse problems, including superresolution, deconvolution, and accelerated MRI. It can bring more robustness and reliability to direct deep-learning-based techniques. In fact, this method has contributed fundamentally to the emergence of the family of such deep-learning-based iterative algorithms [21–25].

Time-dependent deep image prior


In chapter 4, we develop an unsupervised deep-learning method for time-dependent inverse problems. To this end, we introduce the time-dependent deep image prior, which is the first deep-learning method to reconstruct dynamic MRI without the requirement of additional data or training. This framework fits well the learning of spatio-temporal manifolds that are temporally smooth; it is purely unsupervised. It is also particularly appropriate in the context of inverse problems where no ground truth is available. In practice, it results in significant memory savings when compared to compressed-sensing (CS) approaches. Our study shows that the proposed method has the potential to reconstruct dynamic magnetic resonance images (MRI) in the absence of an electrocardiogram signal.
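A minimal sketch of the idea, under the assumption of a generic untrained generator network, a differentiable forward operator (e.g., the NuFFT), and one latent code per frame; the actual architecture and latent-variable design are those described in chapter 4.

# Hedged sketch of the time-dependent deep image prior: an untrained generator maps a
# smooth sequence of latent codes to an image series that is fit to the k-space data.
# `generator`, `forward_model`, `latents`, and `measurements` are placeholders.
import torch

def td_dip(measurements, forward_model, generator, latents, n_iter=5000, lr=1e-4):
    # measurements[k] holds the k-th frame measurements; latents[k] its latent code.
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(n_iter):
        loss = 0.0
        for z_k, y_k in zip(latents, measurements):
            x_k = generator(z_k)                          # image of frame k
            loss = loss + torch.sum(torch.abs(forward_model(x_k) - y_k) ** 2)
        opt.zero_grad(); loss.backward(); opt.step()
    return [generator(z_k).detach() for z_k in latents]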

Future Work
The method results in state-of-the-art reconstruction and could be utilized for other
similar temporal inverse problems. Recently, the research community has started
attempts to theoretically understand deep image prior [207–209]. We believe that
this continued effort could bring perspectives that could even further improve the
proposed method.
The current bottleneck of our method is the slow forward model, the NuFFT. Currently, the NuFFT package (https://github.com/marchdf/python-nufft) is optimized neither for Python nor for GPU usage, which is a major reason for the slowdown in our implementation. With an efficient implementation, we could speed up the algorithm several-fold.
In future, we could explore more architectures to further improve the perfor-
mance. Because the network architecture explored in this work is never exhaustive,
there can be some architectural variations that would bring more improvement than
the currently reported results. For example, we can explore advanced architectures
or good initialization techniques of the network parameters. Similar to [210], which
have proposed a progressive learning strategy, we can also progressively increase
the spatial resolution of the network and the forward model to achieve better re-
construction. This would not only bring faster convergence but could also result
in improved performance and stability. Similar improvements could be achieved by
applying the strategy in temporal dimension. This could be achieved, for example,
by initializing with a high number of spoke-sharing and decreasing it over the course
of optimization. This could be done in parallel with a similar aggregating strategy
being followed in the latent variable domain.
In addition, instead of finding an entire image, we can let the network focus on

finding the residual of the static gold standard image. This facilitates to find the
dynamics of the dataset, which can bring better reconstruction [211]. This could
result in faster convergence to the solution. The convergence could even further be
accelerated by employing temporal and spatial multi-resolution approaches for this
routine.
Currently, our framework optimizes the network on the measurement space from the beginning. Instead, as a warm start, we could first optimize the network in the image space to fit the gold-standard image. Because this initial routine would not require the repetitive use of the forward model, it could even speed up the overall reconstruction. Similarly, the network is currently initialized randomly, so that it outputs a random image series. Instead, it could be initialized to output a more coherent image series; for example, the network could first be optimized directly to output the precomputed backprojection images $\mathbf{H}_k^T \mathbf{y}_k$. This could result in faster convergence to the solution.

CryoGAN for Cryo-EM reconstruction


In chapter 5, we propose CryoGAN which reconstructs the 3D structure of a
biomolecule from its multiple noisy tomographic projections obtained from un-
known random orientations. The work utilizes the GAN framework to learn a
biomolecule whose measurements resemble the acquired measurements in a distri-
butional sense. This method sidesteps the need for any calculation involving poses. We believe that, with a better network architecture and multi-resolution approaches, this paradigm could produce excellent reconstructions. Moreover, we believe that this work could lead to a completely novel family of algorithms dedicated to improving or even removing certain stages of the Cryo-EM pipeline, for example, particle picking.
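At the heart of the method is a differentiable Cryo-EM physics simulator whose output distribution is matched to that of the data. The snippet below is a hedged sketch of one simulation step (random pose, CTF modulation, additive noise); rotate_and_project and random_ctf are hypothetical helpers, the pose sampling is only illustrative, and the noise model is simplified to additive Gaussian.

# Hedged sketch of one CryoGAN physics-simulator step: given the current volume
# estimate, draw a random pose, project, apply a CTF in Fourier space, and add noise.
# `rotate_and_project` and `random_ctf` are hypothetical helpers, not the thesis code.
import math
import torch

def simulate_projection(volume, rotate_and_project, random_ctf, noise_std):
    pose = 2 * math.pi * torch.rand(3)           # Euler angles (illustrative sampling only)
    proj = rotate_and_project(volume, pose)      # 2D tomographic projection
    ctf = random_ctf(proj.shape)                 # CTF drawn from the dataset statistics
    proj_ctf = torch.fft.ifft2(torch.fft.fft2(proj) * ctf).real
    return proj_ctf + noise_std * torch.randn_like(proj_ctf)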

Future Work
Our implementation of CryoGAN is bound to further improve. Beyond simple
engineering tweaks (e.g., tuning the number of layers in the discriminator, testing

different optimization strategies, or using Fourier methods to accelerate projection),


we expect that several interesting developmental steps lie ahead.
A promising direction of research is the use of a coarse-to-fine strategy to re-
construct the volume progressively as the resolution improves. The motivation is
that increased robustness during the low-resolution regime tends to have a positive
impact on the convergence of the higher-resolution steps. Several GAN architec-
tures rely on such approaches, such as the progressive GANs [210] and the style-
GANs [212]. The benefits of multi-scale reconstruction could be considerable for
CryoGAN, given the extremely difficult imaging conditions that prevail in single-
particle Cryo-EM and that make the convergence of optimization algorithms to
good solutions particularly challenging. The core idea here would be to have the
discriminator learn to differentiate between real and simulated distributions at a
low resolution first, and then at successively higher ones. The impact on CryoGAN
could be as important as the one it had on GANs, which progressed in just a few
years from generating blurry facial images [36] to simulated images indistinguish-
able from real facial images [210, 212]. More generally, the upcoming tools and
extensions in GAN architectures could bring significant gain in resolution to the
CryoGAN implementation.
The performance of the Cryo-EM physics simulator should also improve hand-
in-hand with our ability to precisely model the physics behind single-particle Cryo-
EM with computationally tractable entities. At the moment, CryoGAN assumes
that the noise is additive in its image-formation model. One could alternatively
consider a Poisson-noise-based forward model [162,213]. This would however require
backpropagating through a Poisson distribution, a nontrivial operation.
Another interesting extension of the simulator would be to directly simulate the
patches of nonaligned micrographs/frames (rather than the individual projections)
and match their distribution to that of the raw dataset. Doing so would allow
CryoGAN to bypass additional preprocessing tasks, in particular particle picking.
Similar to likelihood-based methods, the CryoGAN algorithm requires the spec-
ification of the distribution of poses. In the case of CryoGAN, one could also
parameterize this distribution and learn its parameters during the reconstruction
procedure, along the lines of [177]. The same approach could be used to calibrate
the distribution of the translations of the projections.
On the theoretical side, we currently have mathematical guarantees on the re-
covery of volumes for which the assumed distribution of poses (be it uniform or
not) matches the distribution of the real data. We have prior mathematical indica-

tions that this can also be achieved when there is a certain mismatch between the
assumed distribution of poses and the actual one, given that an appropriate GAN
loss is used.
Like all reconstruction algorithms, CryoGAN can fail if the dataset contains too
many corrupted particle images, e.g., those with broken structures or strong optical
aberrations. Several solutions could be deployed to handle excessive outliers in the
data distribution. One approach would be to include a step that automatically
spots and discards corrupted data so that the discriminator never sees them. Recent
deep-learning-based approaches able to track outliers in data could prove useful in
this regard [188].
While the spatial resolution of the CryoGAN reconstructions from real data
is not yet competitive with the state-of-the-art, the algorithm is already able to
steadily perform the hardest part of the job, which is to obtain a reasonable struc-
ture by using nothing save the particle dataset and CTF estimations. We believe
that the aforementioned developments will help to bring the CryoGAN algorithm to
the stage where it becomes a relevant contributor for high-resolution reconstruction
in single-particle Cryo-EM. We have laid out a roadmap for future improvements
that should get us to this stage, and may eventually help us reconstruct dynamic
structures.

Multi-CryoGAN for reconstructing continuous conformations in Cryo-EM
We propose an extension of CryoGAN called Multi-CryoGAN in chapter 6, which is applicable to the much-sought-after problem of heterogeneous Cryo-EM. This method reconstructs continuous conformations of a biomolecule without conformation estimation. It is the first method that can perform this task in a standalone manner, and it places no limit on the complexity of the conformation landscape it can reconstruct. By matching the simulated Cryo-EM projections with the acquired data distribution, this method naturally learns to generate a set of 3D conformations in a likelihood-free way. This allows us to reconstruct both continuous and discrete conformations without any prior assumption on the conformation landscape, data-preprocessing steps, or external algorithms such as pose estimation. Our experiments show that Multi-CryoGAN successfully recovers the molecular conformation

manifold, including the underlying distribution.

Future Work
The current experiments have been conducted on synthetic data with simple dynamics. In the future, the method could be deployed on real data. We believe that, with a better incorporation of state-of-the-art GAN architectures [210, 212] and the future work intended for CryoGAN, this method could provide excellent results on challenging datasets. Moreover, we also hope that this work will draw the interest of computer-vision researchers to the problem of Cryo-EM. In conclusion, with suitable extensions, this method could, for the first time, let biologists reconstruct the true dynamics of biomolecules in a user-friendly and robust manner.
Appendix A

Appendices

A.1 Chapter 2
A.1.1 Proof of Theorem 2.4.1
Abstract Representer Theorem
The result presented in this section is preparatory to Theorem 2.4.1. It is classical
for Hilbert spaces [47, Theorem 16.1]. We give its proof for the sake of completeness.
Theorem A.1.1. Let $\mathcal{X}$ be a Hilbert space equipped with the inner product $\langle\cdot,\cdot\rangle_{\mathcal{X}}$ and let $h_1,\dots,h_M \in \mathcal{X}'$ be linear and continuous functionals. Let $\mathcal{C} \subset \mathbb{R}^M$ be a feasible convex compact set, meaning that there exists at least one function $f \in \mathcal{X}$ such that $\mathrm{H}\{f\} \in \mathcal{C}$. Then, the minimizer
$$f^* = \arg\min_{f\in\mathcal{X}} \|f\|^2_{\mathcal{X}} \quad \text{s.t.} \quad \mathrm{H}\{f\}\in\mathcal{C} \tag{A.1}$$
exists, is unique, and can be written as
$$f^* = \sum_{m=1}^{M} a_m\, h_m^{\#} \tag{A.2}$$
for some $\{a_m\}_{m=1}^M \in \mathbb{R}$, where $h_m^{\#} = \Pi h_m$ and $\Pi : \mathcal{X}' \to \mathcal{X}$ is the Riesz map of $\mathcal{X}$.


Proof. The feasibility of the set $\mathcal{C}$ implies that the set $\mathcal{C}_{\mathcal{X}} = \mathrm{H}^{-1}(\mathcal{C}) = \{f\in\mathcal{X} : \mathrm{H}\{f\}\in\mathcal{C}\} \subseteq \mathcal{X}$ is nonempty. Since $\mathrm{H}$ is linear and bounded and since $\mathcal{C}$ is convex and compact, its preimage $\mathcal{C}_{\mathcal{X}}$ is also convex and closed. By Hilbert's projection theorem [214], the solution $f^*$ exists and is unique as it is the projection of the null function onto $\mathcal{C}_{\mathcal{X}}$. Let the measurement of this unique point $f^*$ be $z_0 = \mathrm{H}\{f^*\}$. The Riesz representation theorem states that $\langle h_m, f\rangle = \langle h_m^{\#}, f\rangle_{\mathcal{X}}$ for every $f\in\mathcal{X}$, where $h_m^{\#}\in\mathcal{X}$ is the unique Riesz conjugate of the functional $h_m$. We then uniquely decompose $f^*$ as $f^* = f^{*\perp} + \sum_{m=1}^{M} a_m h_m^{\#}$, where $f^{*\perp}$ is orthogonal to the span of the $h_m^{\#}$ with respect to the inner product on $\mathcal{X}$, i.e., $\mathrm{H}\{f^{*\perp}\} = 0$. The orthogonality also implies that
$$\|f^*\|^2_{\mathcal{X}} = \big\|f^{*\perp}\big\|^2_{\mathcal{X}} + \Big\|\sum_{m=1}^{M} a_m h_m^{\#}\Big\|^2_{\mathcal{X}}. \tag{A.3}$$
This means that the minimum norm is reached when $f^{*\perp} = 0$ while keeping $\mathrm{H}\{f^*\} = z_0$, implying the form (A.2) of the solution.

Proof of Theorem 2.4.1

The proof of Theorem 2.4.1 has two steps. We first show that if Assumption 2 holds, then there is at least one solution and, moreover, if Assumption 2' holds, then the solution is unique. After this, we use Theorem A.1.1 to deduce the form of the solution.
Existence of the Solution. We use the classical result on Hilbert spaces which states that a proper, coercive, lsc, and convex objective functional over a Hilbert space has a nonempty and convex set of minimizers [215].
Properness: By Assumption 2, $E(z,\cdot)$ is proper. The regularization $\|\mathrm{L}f\|^2_{L_2}$ is proper by the definition of $\mathcal{X}_2$. This means that $\mathcal{J}_2(\cdot\,|z)$ is proper in $\mathcal{X}_2$.
Lower semi-continuity: $E(z,\cdot)$ is lsc in $\mathbb{R}^M$, and $\mathrm{H}:\mathcal{X}_2\to\mathbb{R}^M$ is continuous. Therefore, $E(z,\mathrm{H}\{\cdot\})$ is lsc over $\mathcal{X}_2$. Similarly, by composition, $f\mapsto\|\mathrm{L}f\|_{L_2}$ is continuous, hence lsc over $\mathcal{X}_2$. Since $\mathcal{J}_2(\cdot\,|z)$ is the sum of two lsc functionals, it is lsc as well.
Convexity: $E(z,\cdot)$ and $\|\cdot\|^2_{L_2}$ are convex, and $\mathrm{H}$ and $\mathrm{L}$ are linear. Therefore, $\mathcal{J}_2(\cdot\,|z) = E(z,\mathrm{H}\{\cdot\}) + \|\mathrm{L}\cdot\|^2_{L_2}$ is convex too.
Coercivity: The measurement operator $\mathrm{H}$ is continuous and linear from $\mathcal{X}_2$ to $\mathbb{R}^M$; hence, there exists a constant $A$ such that
$$\|\mathrm{H}\{f\}\|_2 \le A\,\|f\|_{\mathcal{X}_2} \tag{A.4}$$
for every $f\in\mathcal{X}_2$. Likewise, the condition $\mathrm{H}\{p\} = \mathrm{H}\{q\} \Rightarrow p = q$ for $p,q\in\mathcal{N}_\mathrm{L}$ implies the existence of $B>0$ such that [46, Proposition 8]
$$\|\mathrm{H}\{p\}\|_2 \ge B\,\|p\|_{\mathcal{N}_\mathrm{L}} \tag{A.5}$$
for every $p\in\mathcal{N}_\mathrm{L}$. As presented in the supplementary material (see [46] for more details), the search space $\mathcal{X}_2$ is a Hilbert space for the Hilbertian norm
$$\|f\|_{\mathcal{X}_2} = \|\mathrm{L}f\|_{L_2} + \|\mathrm{P}f\|_{\mathcal{N}_\mathrm{L}}, \tag{A.6}$$
with $\mathrm{P}$ being the projector on $\mathcal{N}_\mathrm{L}$ introduced in (A.23). We set $p = \mathrm{P}f$ and $g = f - p$. Then, $g\in\mathcal{X}_2$ satisfies $\mathrm{L}g = \mathrm{L}f$ and $\mathrm{P}g = 0$, and hence
$$\|g\|_{\mathcal{X}_2} = \|\mathrm{L}g\|_{L_2} + \|\mathrm{P}g\|_{\mathcal{N}_\mathrm{L}} = \|\mathrm{L}f\|_{L_2}. \tag{A.7}$$
Now consider a sequence of (generalized) functions $f_m\in\mathcal{X}_2$, $m\in\mathbb{N}$, such that $\|f_m\|_{\mathcal{X}_2}\to\infty$. We set $p_m = \mathrm{P}f_m$ and $g_m = f_m - p_m$. Assume by contradiction that $\mathcal{J}_2(f_m|z)$ is bounded. Then, $\|\mathrm{L}f_m\|_{L_2}$ and $\|\mathrm{H}\{f_m\}\|_2$ are bounded (for the latter, we use that $E(z,\cdot)$ is coercive). However, we have
$$\|\mathrm{H}\{f_m\}\|_2 \ge \|\mathrm{H}\{p_m\}\|_2 - \|\mathrm{H}\{g_m\}\|_2 \tag{A.8}$$
$$\ge B\,\|p_m\|_{\mathcal{N}_\mathrm{L}} - A\,\|g_m\|_{\mathcal{X}_2} \tag{A.9}$$
$$= B\,\|f_m\|_{\mathcal{X}_2} - (A+B)\,\|\mathrm{L}f_m\|_{L_2}, \tag{A.10}$$
where we used respectively the triangular inequality in (A.8), the inequalities (A.4) and (A.5) in (A.9), and the relations $\|p_m\|_{\mathcal{N}_\mathrm{L}} = \|f_m\|_{\mathcal{X}_2} - \|\mathrm{L}f_m\|_{L_2}$ and $\|g_m\|_{\mathcal{X}_2} = \|\mathrm{L}f_m\|_{L_2}$ in (A.10). Since $\|\mathrm{L}f_m\|_{L_2}$ is bounded and $\|f_m\|_{\mathcal{X}_2}\to\infty$, we deduce that $\|\mathrm{H}\{f_m\}\|_2\to\infty$, which is known to be false. We thus obtain a contradiction, proving the coercivity.
Since $\mathcal{J}_2(\cdot\,|z)$ is proper, lsc, convex, and coercive on $\mathcal{X}_2$, it has at least one minimizer.
Uniqueness of the Solution. We now prove that if E(z, ·) satisfies Assumption
2’ then the solution is unique. We first show this for the case when Assumption

2’.i) is satisfied. We already know that the solution set is nonempty. It is then
clear that the uniqueness is achieved if J2 (·|z) is strictly convex. We now prove the
convex functional J2 (·|z) is actually strictly convex.
For 2 (0, 1), fA , fB 2 X2 , we denote fAB = fA + (1 )fB . Then, the
equality case J2 (fAB |z) = J2 (fA |z) + (1 )J2 (fB |z) implies that E(z, fAB ) =
E(z, fA )+(1 )E(z, fB ) and kLfAB kL2 = kLfB kL2 +(1 )kLfB kL2 , since the
two parts of the functional are themselves convex. The strict convexity of E(z, ·)
and the norm k·k2 then implies that

LfA = LfB and H{fA } = H{fB } (A.11)

and, therefore, (fA fB ) 2 NL \ NH . Since NL \ NH = 0 by Assumption 1,


therefore, fA = fB . This demonstrates that J2 (·|z) is strictly convex.
For Assumption 2'.ii), that is, when $E(z,\cdot) = I(z,\cdot)$, the solution set can be written as
$$\mathcal{V}_2 = \arg\min_{f\in\mathrm{H}^{-1}\{z\}} \|\mathrm{L}f\|^2_{L_2}, \tag{A.12}$$
where the set $\mathrm{H}^{-1}\{z\} = \{f\in\mathcal{X}_2 : \mathrm{H}\{f\} = z\}$ is nonempty since we assumed $I(z,\cdot)$ to be proper in the range of $\mathrm{H}$.
According to [52, Theorems 1.1 and 1.2], given that the range of $\mathrm{L}:\mathcal{X}_2\to L_2$ is closed in $L_2$, the set $\mathcal{V}_2$ in (A.12) is a singleton. As discussed in the supplementary material, given any $w\in L_2$, we can always find an $f\in\mathcal{X}_2$ such that $\mathrm{L}f = w$. This means that the range of $\mathrm{L}$ is the whole of $L_2$, which is clearly closed in $L_2$.
Form of the Minimizer. We first take the case when $E$ satisfies Assumption 2'. Let $f_2^*$ be the unique solution and $z_0 = \mathrm{H}\{f_2^*\}$. One decomposes again $\mathcal{X}_2$ as the direct sum $\mathcal{X}_2 = \mathcal{Q} \oplus \mathcal{N}_{\mathrm{L}}$, where
$$\mathcal{Q} = \{f \in \mathcal{X}_2 : \langle f, p\rangle_{\mathcal{X}_2} = 0,\ \forall p \in \mathcal{N}_{\mathrm{L}}\}$$
is the Hilbert space with norm $\|\mathrm{L}\cdot\|_{L_2}$. In particular, we have that $f_2^* = q^* + p^*$ with $q^* \in \mathcal{Q}$ and $p^* \in \mathcal{N}_{\mathrm{L}}$.
Consider the optimization problem
$$\min_{g \in \mathcal{Q}} \|\mathrm{L}g\|_{L_2}^2 \quad \text{s.t.} \quad \mathrm{H}\{g\} = (z_0 - \mathrm{H}\{p^*\}). \tag{A.13}$$
According to Theorem A.1.1, this problem admits a unique minimizer $g^*$ such that $\Pi^{-1}g^* \in \mathcal{Q}' \cap \mathrm{Span}\{h_m\}_{m=1}^{M}$, where $\Pi^{-1} : \mathcal{X} \to \mathcal{X}'$ is the inverse of the Riesz map $\Pi : \mathcal{X}' \to \mathcal{X}$ and $\mathcal{Q}' = \Pi^{-1}\mathcal{Q}$. The set $\mathcal{Q}' \cap \mathrm{Span}\{h_m\}_{m=1}^{M}$ is represented by $\sum_{m=1}^{M} a_m h_m$, where $\sum_m a_m \langle h_m, p\rangle = 0$ for every $p \in \mathcal{N}_{\mathrm{L}}$.
However, by definition, the function $q^*$ also satisfies $\mathrm{H}\{q^*\} = (z_0 - \mathrm{H}\{p^*\})$. Moreover, $\|\mathrm{L}q^*\|_{L_2}^2 \le \|\mathrm{L}g^*\|_{L_2}^2$; otherwise, the function $\tilde{f} = g^* + p^* \in \mathcal{X}_2$ would satisfy $\mathcal{J}_2(\tilde{f}\,|z) < \mathcal{J}_2(f_2^*\,|z)$, which is impossible. However, since (A.13) has a unique solution, we have $q^* = g^*$.
This proves that $f_2^* = \Pi\big\{\sum_{m=1}^{M} a_m h_m\big\} + p^*$. For any $q' \in \mathcal{Q}'$, the Riesz map gives $\Pi q' = q' * \rho_{\mathrm{L}^*\mathrm{L}} + p_{q'}$ for some $p_{q'} \in \mathcal{N}_{\mathrm{L}}$ [47, 52]. Here, $\rho_{\mathrm{L}^*\mathrm{L}}$ is the Green's function of the operator $(\mathrm{L}^*\mathrm{L})$ (see Definition 2.2.1). Therefore,
$$f_2^* = p_0 + \rho_{\mathrm{L}^*\mathrm{L}} * \Big\{\sum_{m=1}^{M} a_m h_m\Big\}, \tag{A.14}$$
where $p_0 = (p_{q'} + p^*) \in \mathcal{N}_{\mathrm{L}}$ and where $\sum_m a_m \langle h_m, p\rangle = 0$ for every $p \in \mathcal{N}_{\mathrm{L}}$.
The component $\rho_{\mathrm{L}^*\mathrm{L}} * \big\{\sum_{m=1}^{M} a_m h_m\big\}$ in (A.14) can be written as $\sum_{m=1}^{M} a_m \varphi_m$, provided that $\varphi_m = \rho_{\mathrm{L}^*\mathrm{L}} * h_m = \mathcal{F}^{-1}\big\{\widehat{h}_m / |\widehat{L}|^2\big\}$ is well-defined. To show that this is the case, we decompose $h_m = \mathrm{Proj}_{\mathcal{Q}'}\{h_m\} + \mathrm{Proj}_{\mathcal{N}_{\mathrm{L}}'}\{h_m\}$, where $\mathrm{Proj}_{\mathcal{Q}'}$ and $\mathrm{Proj}_{\mathcal{N}_{\mathrm{L}}'}$ are the projection operators on $\mathcal{Q}'$ and $\mathcal{N}_{\mathrm{L}}'$, respectively. Since $\mathrm{Proj}_{\mathcal{Q}'}\{h_m\} \in \mathcal{Q}'$, as discussed earlier, $\rho_{\mathrm{L}^*\mathrm{L}} * \mathrm{Proj}_{\mathcal{Q}'}\{h_m\}$ is well-defined.
Now, one can always select a basis $\{p_n\}_{n=1}^{N_0}$ such that $\mathcal{N}_{\mathrm{L}}' = \mathrm{Span}\{\phi_n\}_{n=1}^{N_0}$ with $\phi_n = \delta(\cdot - x_n)$ and $\langle \phi_m, p_n\rangle = \delta[m-n]$. The other component is $\mathrm{Proj}_{\mathcal{N}_{\mathrm{L}}'}\{h_m\} = \sum_{n=1}^{N_0} c_n \phi_n$, where $c_n = \langle h_m, p_n\rangle$. Therefore, $\rho_{\mathrm{L}^*\mathrm{L}} * \mathrm{Proj}_{\mathcal{N}_{\mathrm{L}}'}\{h_m\}$ is a linear combination of shifted Green's functions, which proves that $\varphi_m = \mathcal{F}^{-1}\big\{\widehat{h}_m / |\widehat{L}|^2\big\}$ is well-defined.
For the general case, when Assumption 2 is satisfied, we see that any solution $f_2^* \in \mathcal{V}_2$ also minimizes
$$\min_{f \in \mathrm{H}^{-1}\{\mathrm{H}\{f_2^*\}\}} \|\mathrm{L}f\|_{L_2}. \tag{A.15}$$
As discussed earlier, the minimizer of (A.15) is unique, so that it is clearly $f_2^*$. We now use the same reasoning as for the case of Assumption 2' to show that $f_2^*$ takes the form (2.16). This concludes the proof.

Note that, even in the absence of convexity of E(z, ·), results on the form of the
solution can still be obtained.

A.1.2 Proof of Theorem 2.4.2


Similarly to the L2 case, the proof has two steps. We first show that the set of min-
imizers is nonempty. We then connect the optimization problem to the one studied
in [46, Theorem 2] to deduce the form of the extreme points. The functional to
minimize is J1 (f |z) = E(z, H{f }) + kLf kM , defined over the Banach space X1 .

Existence of Solutions. We first show that V = arg minf 2X1 J1 (f |z) is nonempty,
convex, and weak*-compact.
We rely on the generalized Weierstrass theorem presented in [215]: Any proper,
lower semi-continuous (lsc) functional over a compact topological vector space
reaches its minimum, from which we deduce the following result. We recall that the
dual space B 0 of a Banach space B can be endowed with the weak*-topology, and
that one can define a norm kf kB0 = supkxkB hf, xi for which B 0 is a Banach space.

Proposition A.1.2. Let B be a Banach space. Then, a functional J : B 0 !


R+ [ {1} which is proper, convex, coercive, and weak*-lsc is lower bounded and
reaches its infimum. Moreover, the set V = arg min J is convex and weak*-compact.

Proof. Let ↵ > inf J. The coercivity implies that there exists r > 0 such that
J(f ) ↵ as soon as kf kB0 > r. The infimum of J can only be reached on Br = {f 2
B 0 , kf kB0  r}, hence we restrict our analysis to it. The Banach-Alaoglu theorem
implies that Br is weak*-compact. As a consequence, the functional J is proper
and lsc on the compact space Br endowed with the weak*-topology. According to
the generalized Weierstrass theorem [215, Theorem 7.3.1], J reaches its infimum on
Br , hence on X 0 .
Let V = arg min J and ↵0 = min J. The convexity of J directly implies the one
of V. The set V is included in the ball B↵0 which is weak*-compact. Therefore,
it suffices to show that V is weak*-closed to deduce that it is weak*-compact.
Moreover, the weak*-lower semi-continuity is equivalent to the weak*-closedness of
the level sets {f 2 B 0 : J(f )  ↵} are weak*-closed. Applying this to ↵ = ↵0 , we
deduce that V = {f 2 B 0 : J(f )  ↵0 } is weak*-closed, as expected.

We apply Proposition A.1.2 to B 0 = X1 , which is the dual of the Banach space



B = CL (R) introduced in [46] and recapped in the supplementary material. One


has to show that the functional J = J1 (·|z) is coercive and weak*-lsc over X1 .
The coercivity is deduced exactly in the same way as for Theorem 2.4.1. The
weak*-lower semi-continuity is deduced as follows. First, H is weak*-continuous by
assumption and E(z, ·) is lsc; hence, the composition f 7! E(z, H{f }) is weak*-lsc.
Similarly, the norm k·kM is weak*-lsc on M and L : X1 ! M is continuous, hence
f 7! kLf kM is weak*-continuous, and therefore weak*-lsc over X1 . Finally, J1 (·|z)
is weak*-lsc over X1 as it is a sum of two weak*-lsc functionals.
Form of the Extreme Points. Let $f_e$ be an extreme point of the set $\mathcal{V}_1$ and $z_e = \mathrm{H}f_e$. Then $f_e$ is also a member of the solution set
$$\mathcal{V}_e = \arg\min_{f\in\mathrm{H}^{-1}\{z_e\}} \|\mathrm{L}f\|_{\mathcal{M}}. \tag{A.16}$$
Since $\{z_e\}$ is convex and compact, and the set $\mathrm{H}^{-1}\{z_e\}$ is nonempty, we can apply Theorem 2 of [46] to deduce that $\mathcal{V}_e$ is convex and weak*-compact, together with the general form (2.19) of the extreme points of $\mathcal{V}_e$.
Since $\mathcal{V}_e \subseteq \mathcal{V}_1$ and $f_e \in \mathcal{V}_e$, it can be easily shown that $f_e$ is also an extreme point of $\mathcal{V}_e$. This proves that the extreme points of $\mathcal{V}_1$ admit the form (2.19).
Measurement of the solution set. We now show that in the case of Assumption
2’ the measurement of the solution set is unique. We first prove this for the case
of Assumption 2’.i). Let J1⇤ be the minimum value attained by the solutions. Let
fA⇤ and fB⇤ be two solutions. Let eA , eB be their corresponding E functional value
and let rA , rB be their corresponding regularization functional value. Since the cost
function is convex, any convex combination fAB = fA⇤ +(1 )fB⇤ is also a solution
for 2 [0, 1] with functional value J1 . Let us assume that H{fA⇤ } =

6 H{fB⇤ }. Since
E(z, ·) is strictly convex and R1 (f ) = kLf kM is convex, we get that

rclJ1⇤ = E(z, H{ fA⇤ + (1 )fB⇤ }) + R1 ( fA⇤ + (1 )fB⇤ )


< eA + (1 )eB + rA + (1 )rB .
| {z }
J1⇤

This is a contradiction. Therefore, H{fA⇤ } = H{fB⇤ } = H{fAB }.


In the case of Assumption 2’.ii), E(z, ·) is an indicator function. It is therefore
obvious that all the solutions have the same measurement z.

A.1.3 Proof of Theorem 2.6.2


We first state two propositions that are needed for the proof. Their proofs are given
after this section.
Proposition A.1.3 (Adapted from [45, Theorem 5]). Let z 2 RM and H 2 RM ⇥N ,
where M < N . Then, the solution set ↵ of

a⇤ = arg min kz Hak22 + kak1 (A.17)


a2RN

is a compact convex set and kak0  M, 8a 2 ↵E, , where ↵E, is the set of the
extreme points of ↵ .
Proposition A.1.4. Let the convex compact set ↵ be the solution set of Problem
(2.48) and let ↵E, be the set of its extreme points. Let the operator T : ↵ ! RN
be such that Ta = u with um = |am |, m 2 [1, . . . , N ]. Then, the operator is linear
and invertible over the domain ↵ and the range T↵ is convex compact such that
the image of any extreme point aE 2 ↵E, is also an extreme point of the set T↵ .
The linear program corresponding to (2.50) is
N
X
(a⇤ , u⇤ ) = min un , subject to u + a 0,
a,u
n=1 u a 0,
Pa = z. (A.18)

By putting u + a = s1 and (u a) = s2 , the standard form of this linear program


is
N
!
X
(s⇤1 , s⇤2 ) = min s1n + s2n , s.t. s1 0,
s1 ,s2
n=1 s2 0,
Ps1 Ps2  z
Ps1 + Ps2  z. (A.19)

Any solution a⇤ of (A.18) is equal to (s⇤1 s⇤2 ) for some solution pair (A.19). We
denote the concatenation of any two independent points sr1 , sr2 2 RN by the ⇣variable ⌘
sr = (sr1 , sr2 ) 2 R2N . Then, the concatenation of the feasible pairs sf = sf1 , sf2

that satisfies the constraints of the linear program (A.19) forms a polytope in R2N .
Given that (A.19) is solvable, it is known that at least one of the extreme points
of this polytope is also a solution. The simplex algorithm is devised such that
its solution s⇤SLP = s⇤1,SLP , s⇤2,SLP is an extreme point of this polytope [68]. Our
remaining task is to prove that a⇤SLP = s⇤1,SLP s⇤2,SLP is an extreme point of the
set ↵ , the solution set of the problem (2.48).
Proposition A.1.3 claims that the solution set ↵ of the LASSO problem is a
convex set with extreme points ↵E, 2 RN . As ↵ is convex and compact, the
concatenated set ⇣ = {w 2 R2N : w = (a⇤ , u⇤ ) , a⇤ 2 ↵ } is convex and compact
by Proposition A.1.4. The transformation (a⇤ , u⇤ ) = (s⇤1 s⇤2 , s⇤1 + s⇤2 ) is linear
and invertible. This means that the solution set of (A.19) is convex and compact,
too. The simplex solution corresponds to one of the extreme points of this convex
compact set.
Since the map (a⇤ , u⇤ ) = (s⇤1 s⇤2 , s⇤1 + s⇤2 ) is linear and invertible, it also implies
that an extreme point of the solution set of (A.19) corresponds to an extreme point
of ⇣. Proposition A.1.4 then claims that this extreme point of ⇣ corresponds to an
extreme point aSLP 2 ↵ ,E .

A.1.4 Proof of Proposition A.1.3

Using Lemma 2.6.1, it is clear that $\alpha_\lambda$ is also the solution set of
$$\alpha_\lambda = \arg\min_{\mathbf{a}} \|\mathbf{a}\|_1 \quad \text{s.t.} \quad \mathbf{Ha} = \mathbf{z}_{0,\lambda} \tag{A.20}$$
for some $\mathbf{z}_{0,\lambda}$. The solution of the problem akin to (A.20) has been discussed in [45] and is proven to be convex and compact, such that the extreme points $\alpha_{E,\lambda}$ of the convex set $\alpha_\lambda$ satisfy $\|\mathbf{a}\|_0 \le M$ for any $\mathbf{a} \in \alpha_{E,\lambda}$.

A.1.5 Proof of Proposition A.1.4


Proof. We use the Karush-Kuhn-Tucker (KKT) conditions for the lasso problem
derived in [70]. For a given a 2 ↵ , these conditions state that their exists a

2 RN , such that
HT (z Ha) = , and (A.21)
(
sign(am ), if am 6= 0
m 2 (A.22)
[ 1, 1], if am = 0,
for any m 2 [1, . . . , N ]. The is unique since Ha is unique for all a 2 ↵ . Condition
(A.22) implies that |am | = m .am for any m 2 [1, . . . , N ] and a 2 ↵ .
Therefore, for any a 2 ↵ , Ta = Ra, where R 2 RN ⇥N is a diagonal matrix
with entries Rmm = m . Thus, the operation of T is linear in the domain ↵ . Also,
a = RRa for a 2 ↵ implying that the operator T is invertible.
This ensures that the image of the convex compact set T↵ is also convex compact
and the image of any extreme point aE 2 ↵E, is also an extreme point of the set
T↵ . Similarly, it can be proved that the concatenated set ⇣ = {w 2 R2N : w =
(a, Ta) , a 2 ↵ } is the image of a linear and invertible concatenation operation on
↵. Thus, it is convex and compact, and the image of any extreme point through
the inverse operation of the concatenation wE 2 ⇣E, is also an extreme point of
↵ .

A.1.6 Structure of the Search Spaces

Decomposition of $\mathcal{X}_1$ and $\mathcal{X}_2$. The set $\mathcal{X}_1$ is the search space, or native space, for the gTV case. It is defined and studied in [46, Section 6], from which we recap the main results. Note that the same construction is at work for $\mathcal{X}_2$, which is then a Hilbert space.
Let $\mathbf{p} = (p_1,\dots,p_{N_0})$ be a basis of the finite-dimensional null space $\mathcal{N}_{\mathrm{L}}$ of $\mathrm{L}$. If $\boldsymbol{\phi} = (\phi_1,\dots,\phi_{N_0})$ and $\mathbf{p} = (p_1,\dots,p_{N_0})$ form a biorthonormal system such that $\langle \phi_{n_1}, p_{n_2}\rangle = \delta[n_1 - n_2]$, and if $\phi_n$ is in $\mathcal{S}(\mathbb{R})$, then
$$\mathrm{P}f = \sum_{n=1}^{N_0} \langle f, \phi_n\rangle\, p_n \tag{A.23}$$
is a well-defined projector from $\mathcal{X}_1$ to $\mathcal{N}_{\mathrm{L}}$. The finite-dimensional null space of $\mathrm{L}$ is a Banach (and even a Hilbert) space for the norm
$$\|p\|_{\mathcal{N}_{\mathrm{L}}} = \Big(\sum_{n=1}^{N_0} \langle p, \phi_n\rangle^2\Big)^{1/2}. \tag{A.24}$$
Moreover, $f\in\mathcal{X}_1$ is uniquely determined by $w = \mathrm{L}f \in \mathcal{M}(\mathbb{R})$ and $p = \mathrm{P}f \in \mathcal{N}_{\mathrm{L}}$. More precisely, there exists a right-inverse operator $\mathrm{L}^{-1}$ of $\mathrm{L}$ such that [46, Theorem 4]
$$f = \mathrm{L}^{-1}w + p. \tag{A.25}$$
In other words, $\mathcal{X}_1$ is isomorphic to the direct sum $\mathcal{M}(\mathbb{R}) \oplus \mathcal{N}_{\mathrm{L}}$, from which we deduce that it is a Banach space for the norm [46, Theorem 5]
$$\|f\|_{\mathcal{X}_1} = \|\mathrm{L}f\|_{\mathcal{M}} + \|\mathrm{P}f\|_{\mathcal{N}_{\mathrm{L}}} = \|w\|_{\mathcal{M}} + \|p\|_{\mathcal{N}_{\mathrm{L}}}. \tag{A.26}$$
The same construction is at work for $\mathcal{X}_2$ by replacing $\mathcal{M}(\mathbb{R})$ with $L_2(\mathbb{R})$. Then, $\mathcal{X}_2$ is a Hilbert space for the Hilbertian norm
$$\|f\|_{\mathcal{X}_2} = \|\mathrm{L}f\|_{L_2} + \|\mathrm{P}f\|_{\mathcal{N}_{\mathrm{L}}}. \tag{A.27}$$
Predual of $\mathcal{X}_1$. The space $\mathcal{M}(\mathbb{R})$ is the topological dual of the space $C_0(\mathbb{R})$ of continuous and vanishing functions. The space $\mathcal{X}_1$ inherits this property: It is the topological dual of $C_{\mathrm{L}}(\mathbb{R})$, defined as the image of $C_0(\mathbb{R})$ by the adjoint $\mathrm{L}^*$ of $\mathrm{L}$ according to [46, Theorem 6].
We can therefore define a weak*-topology on $\mathcal{X}_1$: It is the topology for which $f_n \to 0$ if $\langle f_n, \varphi\rangle \to 0$ for every $\varphi \in C_{\mathrm{L}}(\mathbb{R})$. The weak*-topology is crucial to ensure the existence of solutions of (2.18); see [46] for more details.
Weak*-continuity of $h_m$. The weak*-continuity of $h_m$ is equivalent to its inclusion in the predual space $C_{\mathrm{L}}(\mathbb{R})$ [216, Theorem IV.20, p. 114].

A.2 Chapter 3
A.2.1 Proof of Theorem 3.3.1
(i) Set $\mathbf{r}_k = (\mathbf{x}_{k+1} - \mathbf{x}_k)$. On one hand, it is clear that
$$\mathbf{r}_k = (1-\alpha_k)\mathbf{x}_k + \alpha_k \mathbf{z}_k - \mathbf{x}_k = \alpha_k(\mathbf{z}_k - \mathbf{x}_k). \tag{A.28}$$
On the other hand, from the construction of $\{\alpha_k\}$,
$$\alpha_k \|\mathbf{z}_k - \mathbf{x}_k\|_2 \le c_k\, \alpha_{k-1}\|\mathbf{z}_{k-1} - \mathbf{x}_{k-1}\|_2 \;\Leftrightarrow\; \|\mathbf{r}_k\|_2 \le c_k \|\mathbf{r}_{k-1}\|_2. \tag{A.29}$$
Iterating (A.29) gives
$$\|\mathbf{r}_k\|_2 \le \|\mathbf{r}_0\|_2 \prod_{i=1}^{k} c_i, \quad \forall k \ge 1. \tag{A.30}$$
We now show that $\{\mathbf{x}_k\}$ is a Cauchy sequence. Since $\{c_k\}$ is asymptotically upper-bounded by $C < 1$, there exists $K$ such that $c_k \le C$, $\forall k > K$. Let $m, n$ be two integers such that $m > n > K$. By using (A.30) and the triangle inequality,
$$\|\mathbf{x}_m - \mathbf{x}_n\|_2 \le \sum_{k=n}^{m-1} \|\mathbf{r}_k\|_2 \le \|\mathbf{r}_0\|_2 \prod_{i=1}^{K} c_i \sum_{k=n}^{m-1} C^{k-K} \le \|\mathbf{r}_0\|_2 \Big(\prod_{i=1}^{K} c_i\Big) \frac{C^{n-K} - C^{m-K}}{1-C}. \tag{A.31}$$
The last inequality proves that $\|\mathbf{x}_m - \mathbf{x}_n\|_2 \to 0$ as $m \to \infty$, $n \to \infty$; that is, $\{\mathbf{x}_k\}$ is a Cauchy sequence in the complete metric space $\mathbb{R}^N$. As a consequence, $\{\mathbf{x}_k\}$ must converge to some point $\mathbf{x}^* \in \mathbb{R}^N$.
(ii) Assume from now on that $\{\alpha_k\}$ is lower-bounded by $\varepsilon > 0$. By definition, $\{\alpha_k\}$ is also non-increasing and, thus, convergent to $\alpha^* > 0$. Next, we rewrite the update of $\mathbf{x}_k$ in Algorithm 1 as
$$\mathbf{x}_{k+1} = (1-\alpha_k)\mathbf{x}_k + \alpha_k\, G_\gamma(\mathbf{x}_k), \tag{A.32}$$
where $G_\gamma$ is defined by (3.7). Taking the limit of both sides of (A.32) leads to
$$\mathbf{x}^* = (1-\alpha^*)\mathbf{x}^* + \alpha^* \lim_{k\to\infty} G_\gamma(\mathbf{x}_k). \tag{A.33}$$
Moreover, since the nonlinear operator $F$ is continuous, $G_\gamma$ is also continuous. Hence,
$$\lim_{k\to\infty} G_\gamma(\mathbf{x}_k) = G_\gamma\Big(\lim_{k\to\infty}\mathbf{x}_k\Big) = G_\gamma(\mathbf{x}^*). \tag{A.34}$$
By plugging (A.34) into (A.33), we get that $\mathbf{x}^* = G_\gamma(\mathbf{x}^*)$, which means that $\mathbf{x}^*$ is a fixed point of the operator $G_\gamma$.
(iii) Now that $F = P_S$ satisfies (3.4), we invoke Proposition 3.2.1 to infer that $\mathbf{x}^*$ is a local minimizer of (3.1), thus completing the proof.

A.2.2 RPGD for Poisson Noise in CT

In the case where the CT measurements are corrupted by Poisson noise, the data-fidelity term in (3.1) should be replaced by weighted least squares [96, 217, 218]. For the sake of completeness, we show a sketch of the derivation. Let $\mathbf{x}$ represent the distribution of the linear attenuation coefficient of an object and $[\mathbf{Hx}]_m$ represent its line integrals. The $m$th CT measurement, $y_m$, is a Poisson random variable with parameters
$$p_m \sim \mathrm{Poisson}\big(b_m\, \mathrm{e}^{-[\mathbf{Hx}]_m} + r_m\big), \tag{A.35}$$
$$y_m = -\log\Big(\frac{p_m}{b_m}\Big), \tag{A.36}$$
where $b_m$ is the blank scan factor and $r_m$ is the readout noise. Since the logarithm is bijective, the negative log-likelihood of $\mathbf{y}$ given $\mathbf{x}$ is equal to the one of $\mathbf{p}$ given $\mathbf{x}$. After removing the constants, we use this negative log-likelihood as the data-fidelity term
$$E(\mathbf{Hx}, \mathbf{y}) = \sum_{m=1}^{M} \big(\hat{p}_m - p_m \log \hat{p}_m\big), \tag{A.37}$$
where $\hat{p}_m = b_m\,\mathrm{e}^{-[\mathbf{Hx}]_m} + r_m$ is the expected value of $p_m$. We then perform a quadratic approximation of $E$ with respect to $\mathbf{Hx}$ around the point $\big(-\ln\big(\tfrac{\hat{p}_m - r_m}{b_m}\big)\big)$ using a Taylor expansion. After ignoring the higher-order terms, this yields
$$E(\mathbf{Hx}, \mathbf{y}) = \sum_{m=1}^{M} \frac{w_m}{2}\Big([\mathbf{Hx}]_m - \log\Big(\frac{b_m}{p_m - r_m}\Big)\Big)^2, \tag{A.38}$$
where $w_m = \frac{(p_m - r_m)^2}{p_m}$.
In the case when the readout noise $r_m$ is insignificant, (A.38) can be written as
$$E(\mathbf{Hx}, \mathbf{y}) = \sum_{m=1}^{M} \frac{w_m}{2}\big([\mathbf{Hx}]_m - y_m\big)^2 \tag{A.39}$$
$$= \frac{1}{2}\big\|\mathbf{W}^{\frac{1}{2}}\mathbf{Hx} - \mathbf{W}^{\frac{1}{2}}\mathbf{y}\big\|^2 \tag{A.40}$$
$$= \frac{1}{2}\|\mathbf{H}'\mathbf{x} - \mathbf{y}'\|^2, \tag{A.41}$$
where $\mathbf{W} \in \mathbb{R}^{M\times M}$ is a diagonal matrix with $[\mathrm{diag}(\mathbf{W})]_m = w_m = p_m$, $\mathbf{H}' = \mathbf{W}^{\frac{1}{2}}\mathbf{H}$, and $\mathbf{y}' = \mathbf{W}^{\frac{1}{2}}\mathbf{y}$.
Imposing the data-manifold prior, we get the equivalent of Problem (3.1) as
$$\min_{\mathbf{x}\in S} \frac{1}{2}\|\mathbf{H}'\mathbf{x} - \mathbf{y}'\|^2. \tag{A.42}$$
Note that all the results discussed in Sections 3.2 and 3.3 apply to Problem (A.42). As a consequence, we use Algorithm 1 to solve the problem with the following small change in the gradient step:
$$\mathbf{z}_k = F\big(\mathbf{x}_k - \gamma\,\mathbf{H}'^{T}\mathbf{H}'\mathbf{x}_k + \gamma\,\mathbf{H}'^{T}\mathbf{y}'\big). \tag{A.43}$$
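A hedged sketch of this modified gradient step (H, Ht, and projector are placeholder callables for the forward operator, its adjoint, and the projector F; w holds the diagonal of W):

# Illustrative weighted gradient step for RPGD under the Poisson-noise model:
# the gradient of 0.5 * || W^(1/2) (Hx - y) ||^2 is H^T W (Hx - y).
import torch

def weighted_gradient_step(x, y, H, Ht, w, gamma, projector):
    residual = H(x) - y                 # residual in the measurement domain
    grad = Ht(w * residual)             # H^T W (Hx - y), with w = diag(W)
    return projector(x - gamma * grad)  # z_k of (A.43), up to the change of variables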

A.2.3 Proof of Proposition 3.2.1

Suppose that (3.4) is fulfilled and let $\mathbf{x}^* \in S$ be a fixed point of $G_\gamma$. We show that $\mathbf{x}^*$ is also a local minimizer of (3.1). Indeed, setting $\mathbf{x} = \mathbf{x}^* - \gamma\mathbf{H}^T\mathbf{Hx}^* + \gamma\mathbf{H}^T\mathbf{y}$ leads to $P_S\mathbf{x} = \mathbf{x}^*$. Then, there exists $\varepsilon > 0$ such that, for all $\mathbf{z} \in S \cap B_\varepsilon(\mathbf{x}^*)$,
$$0 \ge \langle \mathbf{z} - P_S\mathbf{x},\ \mathbf{x} - P_S\mathbf{x}\rangle = \gamma\,\langle \mathbf{z} - \mathbf{x}^*,\ \mathbf{H}^T\mathbf{y} - \mathbf{H}^T\mathbf{Hx}^*\rangle = \frac{\gamma}{2}\Big(\|\mathbf{Hx}^* - \mathbf{y}\|_2^2 - \|\mathbf{Hz} - \mathbf{y}\|_2^2 + \|\mathbf{Hx}^* - \mathbf{Hz}\|_2^2\Big).$$
Since $\gamma > 0$, the last inequality implies that
$$\|\mathbf{Hx}^* - \mathbf{y}\|_2^2 \le \|\mathbf{Hz} - \mathbf{y}\|_2^2, \quad \forall \mathbf{z} \in S \cap B_\varepsilon(\mathbf{x}^*),$$
which means that $\mathbf{x}^*$ is a local minimizer of (3.1).
Assume now that $P_S$ satisfies (3.5). By just removing the $\varepsilon$-ball in the above argument, one easily verifies that
$$\|\mathbf{Hx}^* - \mathbf{y}\|_2^2 \le \|\mathbf{Hz} - \mathbf{y}\|_2^2, \quad \forall \mathbf{z} \in S,$$
which means that $\mathbf{x}^*$ is a solution of (3.1).



A.2.4 Proof of Proposition 3.2.2

We prove by contradiction. Assuming that $S$ is nonconvex, there must exist $\mathbf{x}_1, \mathbf{x}_2 \in S$ and $\alpha \in (0,1)$ such that $\mathbf{x} = \alpha\mathbf{x}_1 + (1-\alpha)\mathbf{x}_2 \notin S$. Since $P_S\mathbf{x} \in S$, it must be that
$$0 < \|\mathbf{x} - P_S\mathbf{x}\|_2^2 = \langle \mathbf{x} - P_S\mathbf{x},\ \mathbf{x} - P_S\mathbf{x}\rangle = \alpha\,\langle \mathbf{x}_1 - P_S\mathbf{x},\ \mathbf{x} - P_S\mathbf{x}\rangle + (1-\alpha)\,\langle \mathbf{x}_2 - P_S\mathbf{x},\ \mathbf{x} - P_S\mathbf{x}\rangle.$$
Thus, there exists $i \in \{1, 2\}$ such that
$$\langle \mathbf{x}_i - P_S\mathbf{x},\ \mathbf{x} - P_S\mathbf{x}\rangle > 0,$$
which violates (3.5). So, $S$ is convex.

A.2.5 Proof of Proposition 3.2.3

Suppose that $S = \bigcup_{i=1}^{n} C_i$, where $C_i$ is a closed convex set for all $i = 1,\dots,n$. The statement is trivial when $n = 1$; assume now that $n \ge 2$. Let $\mathbf{x} \in \mathbb{R}^N$ and $\hat{\mathbf{x}}$ be the orthogonal projection of $\mathbf{x}$ onto $S$. Consider two cases.
Case 1: $\hat{\mathbf{x}} \in \bigcap_{i=1}^{n} C_i$.
It is then clear that
$$\|\hat{\mathbf{x}} - \mathbf{x}\|_2 \le \|\mathbf{z} - \mathbf{x}\|_2, \quad \forall \mathbf{z} \in C_i,\ \forall i.$$
This means that $\hat{\mathbf{x}}$ is the orthogonal projection of $\mathbf{x}$ onto each $C_i$. Consequently,
$$\langle \mathbf{z} - \hat{\mathbf{x}},\ \mathbf{x} - \hat{\mathbf{x}}\rangle \le 0, \quad \forall \mathbf{z} \in C_i,\ \forall i \le n,$$
which implies that (3.4) holds true for all $\varepsilon > 0$.
Case 2: $\hat{\mathbf{x}} \notin \bigcap_{i=1}^{n} C_i$.
Without loss of generality, there exists $m < n$ such that
$$\hat{\mathbf{x}} \in \bigcap_{i=1}^{m} C_i, \qquad \hat{\mathbf{x}} \notin \bigcup_{i=m+1}^{n} C_i. \tag{A.44}$$
Let $d$ be the distance from $\hat{\mathbf{x}}$ to the set $T = \bigcup_{i=m+1}^{n} C_i$. Since each $C_i$ is closed, $T$ must be closed too and, so, $d > 0$. We now choose $0 < \varepsilon < d$. Then, $B_\varepsilon(\hat{\mathbf{x}}) \cap T = \emptyset$. Therefore,
$$S \cap B_\varepsilon(\hat{\mathbf{x}}) = \bigcup_{i=1}^{m} \big(C_i \cap B_\varepsilon(\hat{\mathbf{x}})\big) = \bigcup_{i=1}^{m} \tilde{C}_i, \tag{A.45}$$
where $\tilde{C}_i = C_i \cap B_\varepsilon(\hat{\mathbf{x}})$ is clearly a convex set for all $i \le m$. It is straightforward that $\hat{\mathbf{x}}$ is the orthogonal projection of $\mathbf{x}$ onto the set $\bigcup_{i=1}^{m} \tilde{C}_i$ and that $\hat{\mathbf{x}} \in \bigcap_{i=1}^{m} \tilde{C}_i$. We are back to Case 1 and, therefore,
$$\langle \mathbf{z} - \hat{\mathbf{x}},\ \mathbf{x} - \hat{\mathbf{x}}\rangle \le 0, \quad \forall \mathbf{z} \in \tilde{C}_i,\ \forall i \le m. \tag{A.46}$$
From (A.45) and (A.46), (3.4) is fulfilled for the chosen $\varepsilon$.

A.2.6 Proof of Theorem 3.2.4

Let $\{\lambda_i\}$ denote the set of eigenvalues of $\mathbf{H}^T\mathbf{H}$. We first have that, for all $\mathbf{x} \in \mathbb{R}^N$,
$$\|\mathbf{x} - \gamma\mathbf{H}^T\mathbf{Hx}\|_2 \le \|\mathbf{I} - \gamma\mathbf{H}^T\mathbf{H}\|_2\, \|\mathbf{x}\|_2, \tag{A.47}$$
where the spectral norm of $\mathbf{I} - \gamma\mathbf{H}^T\mathbf{H}$ is given by
$$\|\mathbf{I} - \gamma\mathbf{H}^T\mathbf{H}\|_2 = \max_i\{|1 - \gamma\lambda_i|\}. \tag{A.48}$$
On the other hand, choosing $\gamma = 2/(\lambda_{\max} + \lambda_{\min})$ yields
$$\frac{2\lambda_{\min}}{\lambda_{\max}+\lambda_{\min}} \le \gamma\lambda_i \le \frac{2\lambda_{\max}}{\lambda_{\max}+\lambda_{\min}},\ \forall i \;\Leftrightarrow\; |1 - \gamma\lambda_i| \le \frac{\lambda_{\max}-\lambda_{\min}}{\lambda_{\max}+\lambda_{\min}},\ \forall i. \tag{A.49}$$
By combining (A.47), (A.48), and (A.49),
$$\|\mathbf{x} - \gamma\mathbf{H}^T\mathbf{Hx}\|_2 \le \frac{\lambda_{\max}-\lambda_{\min}}{\lambda_{\max}+\lambda_{\min}}\,\|\mathbf{x}\|_2, \quad \forall \mathbf{x}. \tag{A.50}$$
Combining (A.50) with the Lipschitz continuity of $P_S$ gives
$$\|G_\gamma(\mathbf{x}) - G_\gamma(\mathbf{z})\|_2 \le L\,\|(\mathbf{x}-\mathbf{z}) - \gamma\mathbf{H}^T\mathbf{H}(\mathbf{x}-\mathbf{z})\|_2 \le L\,\frac{\lambda_{\max}-\lambda_{\min}}{\lambda_{\max}+\lambda_{\min}}\,\|\mathbf{x}-\mathbf{z}\|_2, \quad \forall \mathbf{x}, \forall \mathbf{z}. \tag{A.51}$$
Since $L < (\lambda_{\max}+\lambda_{\min})/(\lambda_{\max}-\lambda_{\min})$, (A.51) implies that $G_\gamma$ is a contractive mapping. By the Banach-Picard fixed-point theorem [219, Thm. 1.48], $\{\mathbf{x}_k\}$ defined by $\mathbf{x}_{k+1} = G_\gamma(\mathbf{x}_k)$ converges to the unique fixed point $\mathbf{x}^*$ of $G_\gamma$, for every initialization $\mathbf{x}_0$. Finally, since $P_S$ satisfies (3.4), by Proposition 3.2.1, $\mathbf{x}^*$ is also a local minimizer of (3.1).

A.2.7 Proof of Theorem 3.2.5

Again, let $\{\lambda_i\}$ be the set of eigenvalues of $\mathbf{H}^T\mathbf{H}$. With $\gamma < 2/\lambda_{\max}$, one readily verifies that, $\forall \mathbf{x} \in \mathbb{R}^N$,
$$\|\mathbf{x} - \gamma\mathbf{H}^T\mathbf{Hx}\|_2 \le \max_i\{|1 - \gamma\lambda_i|\}\cdot\|\mathbf{x}\|_2 \le \|\mathbf{x}\|_2.$$
Combining this with the non-expansiveness of $P_S$ leads to
$$\|G_\gamma(\mathbf{x}) - G_\gamma(\mathbf{z})\|_2 \le \|(\mathbf{x}-\mathbf{z}) - \gamma\mathbf{H}^T\mathbf{H}(\mathbf{x}-\mathbf{z})\|_2 \le \|\mathbf{x}-\mathbf{z}\|_2, \quad \forall \mathbf{x}, \mathbf{z} \in \mathbb{R}^N.$$
Now that $G_\gamma$ is a non-expansive operator with a nonempty fixed-point set, we invoke the Krasnosel'skiĭ-Mann theorem [219, Thm. 5.14] to deduce that the iteration (3.6) must converge to a fixed point of $G_\gamma$ which is, by Proposition 3.2.1, also a local minimizer of (3.1).

A.2.8 Experiments
1) Experiment 1
Figure A.1 shows the difference images between the reconstructions and the ground truth for the lung and abdomen images for the ×16 case.
Fig. A.3 compares the reconstructions for the ×5 case when the measurements are noiseless (first and second columns) or corrupted with 45-dB (third column) and 35-dB (fourth column) noise. It is visible that FBPconv40 results in a noisy image and that TV is again blurred. The DL and RPGD40 reconstructions for the 45-dB case have similar reconstruction performance and outperform the others. For the other cases, RPGD40 is the best performer. Fig. A.2 shows the profiles of the reconstructions for the ×5, 45-dB noise case.
2) Experiment 2
Fig. A.4 shows the reconstructions from measurements corrupted with Poisson noise, as discussed in Section 3.6.2.

Figure A.1: Comparison of reconstructions using different methods for the ×16 case in Experiment 1 for lung and abdomen images. The first and fifth columns show the reconstruction from noiseless measurements for the lung and abdomen images, respectively. Second to fourth columns: difference between the reconstructions and the lung image for the noiseless, 45-dB noise, and 35-dB noise cases, respectively. Sixth to eighth columns: difference between the reconstructions and the abdomen image for the noiseless, 45-dB noise, and 40-dB noise cases, respectively.

Figure A.2: Profiles of the high-contrast and low-contrast regions marked by the solid and dashed line segments, respectively, inside the original image in the first column of Fig. A.3. This case corresponds to the ×5, 35-dB noise case.

Figure A.3: Comparison of reconstructions using different methods for the ×5 case. The first column shows the reconstruction from noiseless measurements. The second column shows the zoomed version of the box area marked in the original image in the first column. The third and fourth columns show the zoomed versions for the 45-dB and 35-dB cases, respectively. The next three columns show the difference between the reconstruction and the ground truth for the noiseless, 45-dB noise, and 35-dB noise cases, respectively.


Figure A.4: Reconstruction results for Experiment 2. The first column


shows the result when the measurement noise is 25 dB. Second column
shows the zoomed version of the box area given in the full-dose image in
the first column. Third and fourth columns show the zoomed version for
the case of 30 and 35 dB respectively. The next three columns show the
difference between the reconstruction and the ground truth for 25dB, 30dB
and 35 dB respectively.

A.3 Chapter 5
A.3.1 Synthetic Data (Figure 5.3)
Dataset. We construct a synthetic Cryo-EM dataset that mimics the experimental β-galactosidase dataset (EMPIAR-10061) from [158]. We generate 41,000 synthetic β-galactosidase projections of size 192 × 192 using our Cryo-EM image-formation model (see Online Methods). For the ground-truth volume, we generate a 2.5 Å density map from the PDB-5a1a atomic model using Chimera [220]. This gives a volume of size (302 × 233 × 163) with voxel size 0.637 Å. The volume is then padded, averaged, and downsampled to a size of (180 × 180 × 180) with voxel size 1.274 Å. This corresponds to a Nyquist resolution of 2.548 Å for the reconstructed volume. The projection poses are sampled from a uniform distribution over SO(3), where SO(3) is the group of 3D rotations around the origin of R^3.
In order to apply random CTFs and noise, we randomly pick a micrograph in the EMPIAR-10061 dataset. We extract its CTF parameters using CTFFIND4 [221] and apply them to a clean projection. The parameter B of the envelope function of the CTF (see (5.13)) is chosen such that it decays to 0.2 at the Nyquist frequency. We then randomly select a background patch from the same micrograph to simulate noise. The noise is downsampled to size 192 × 192, normalized to zero mean, scaled, and added to the projection. The scaling is such that the ratio of the energy of the signal to the energy of the noise (SNR) is kept at 0.01, which is equivalent to −20 dB. The dataset is then randomly divided into two halves. The algorithm is applied separately on both halves to generate the half-maps. The FSC between the two half-maps is then reported using FOCUS [222, 223].

Generator Settings. We reconstruct a volume of size (184 × 184 × 184) voxels for each half dataset. The pixel size is 1.274 Å. The volumes are initialized with zeros. The D2 symmetry of β-galactosidase is enforced during reconstruction.
We use our image-formation model to simulate realistic projections from the
current volume estimate at every CryoGAN iteration. The distribution of the
imaging parameters is identical to the one used to generate the dataset. To add the
noise on the CTF-modulated projections, we also keep the same approach as the
one used to generate the dataset. However, we assume that the final SNR of each
projection is unknown, leading us to learn the scaling parameter that controls the
ratio between the projections and the noise patches.

We apply a binary spherical mask of size (184 × 184 × 184) on the learned volume. To handle the sharp transitions at the mask borders, we restrict the voxel values to lie above a certain value. This value changes as a function of position and iteration number: it increases linearly with the distance from the center of the volume to the border of the mask, from V_min to 0, and the value V_min changes as a function of iteration number, starting at 0 and decreasing to −2% of the current maximum voxel value. This promotes nonnegativity during the initial phases of reconstruction and increases the stability of the algorithm.
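A minimal sketch of this position-dependent lower bound (illustrative only; the exact schedule of V_min over the iterations is the one described above and is passed in here as a number):

# Illustrative spherical mask with a radially varying lower bound: voxels are clipped
# from below at a value that rises linearly from v_min at the center to 0 at the mask
# border; the mask itself is a binary sphere.
import torch

def masked_clip(volume, v_min):
    n = volume.shape[-1]
    coords = torch.arange(n, dtype=volume.dtype) - (n - 1) / 2
    zz, yy, xx = torch.meshgrid(coords, coords, coords, indexing="ij")
    radius = torch.sqrt(xx**2 + yy**2 + zz**2) / ((n - 1) / 2)  # 0 at center, 1 at border
    lower = v_min * (1 - radius.clamp(max=1.0))                 # v_min -> 0 with radius
    volume = torch.maximum(volume, lower)                       # enforce the lower bound
    return volume * (radius <= 1.0).to(volume.dtype)            # binary spherical mask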

Discriminator Architecture. The architecture of the discriminator network is


detailed in Online Methods. The discriminator is initialized identically for the two
half datasets. All projections (i.e., the real projections and the ones generated by
the simulator) are normalized to zero-mean and a unit standard-deviation before
being fed to the discriminator.

General Settings. The adversarial learning scheme is implemented in PyTorch [169]. For the optimization, we use [141] (β₁ = 0.5, β₂ = 0.9, ε = 10⁻⁸) with a learning rate of 10⁻³ and a batch size of 8. The algorithm is run for 40 epochs and the learning rate decreases by 1% at every epoch. The parameter of the gradient-penalty term (see (5.20)) is kept at λ = 0.001. The discriminator is trained 4 times for every training of the generator (n_discr = 4 in Algorithm 3).
For the back-propagations, the norm of the gradients of the discriminator is clipped to a maximal value of 10⁸. For the generator, the gradients for each pixel are clipped to a maximal value of 10⁴. The clipping values increase linearly from zero to those maxima in the first two epochs. Doing so improves the stability of the adversarial learning scheme at the start, in particular, that of the discriminator. All parameters are tuned for a fixed value range that follows from the normalization of all projections.

Computational Resources. The reconstruction is run on a Nvidia V100 GPU


with 18GB memory. Each epoch lasts 10 minutes. The algorithm is run for 40
epochs which, in the current implementation, takes 400 minutes.

A.3.2 Additional Synthetic Data (Figure 5.4)


Synthetic dataset. The data is generated similarly to the main synthetic exper-
iment, with the exception of some changes to the imaging conditions. In a first
case, the noise level is set to a SNR of -5.2 dB. This corresponds to the energy of
the noise being almost four times that of the signal. In the second case, the SNR
is kept at -20 dB (the same as for the main experiment), but the projections are
also translated. The translation (both horizontal and vertical) for each projection
is sampled from a zero-mean symmetric triangular distribution whose total width
is 6% of the image size from the centre. This corresponds to a shift of maximum 5
pixels from the centre in each direction.

Reconstruction Settings. The reconstruction settings for both cases are the
same than the ones used in the main experiment, except for the few following dif-
ferences. For the second case, the translations are also imposed, and the translation
distribution is kept the same as the one used for generating the dataset. Further-
more, in both cases, the lower bound of the clipping value at the centre reaches
-5% of the maximum voxel value of the volume. Finally, the algorithm is run for
100 epochs for both experiments.

A.3.3 Experimental Data (Figure 5.5)


Dataset. The dataset consists of 41,123 β-galactosidase (EMPIAR-10061) projections extracted from 1,539 micrographs [158]. The projections of size (384 × 384) are downsampled to (192 × 192), with a pixel size of 1.274 Å. This corresponds to a Nyquist resolution of 2.548 Å for a reconstructed volume of size (180 × 180 × 180). The dataset is randomly divided into two halves. The algorithm is applied separately on both halves to generate the half-maps. The defocus and astigmatism parameters of the CTF are estimated from each micrograph using Relion [151].

Generator Settings. We reconstruct a volume of size (180 × 180 × 180) voxels for each half dataset. The pixel size is 1.274 Å. The volumes are initialized with zeros. The D2 symmetry of β-galactosidase is enforced during reconstruction. A uniform distribution is assumed for the poses. The CTF parameters estimated with CTFFIND4 [221] are used in the forward model of the Cryo-EM physics simulator. The parameter B of the envelope function of the CTF (see (5.13)) decays to 0.4 at

the Nyquist frequency. The translations are set to zero.


To handle the noise, we randomly extract (prior to the learning procedure)
41,123 patches of size (384 ⇥ 384) from the background of the micrographs at lo-
cations where particles do not appear; this is done by identifying patches with the
lowest variance. We extract as many noise patches per micrograph as we have
particle images. Each noise patch is then downsampled to size (192 ⇥ 192) and
normalized. Then, during run-time, the noise patches are sampled from this col-
lection, scaled, and added to the simulated projections. For consistency, the noise
patch added to a given simulated projection is taken from the same micrograph as
the one that was used to estimate the CTF parameters previously applied to that
specific projection. The scaling operation weights the contribution of the noise with
respect to the projection signal. This is handled by multiplying the pixel values of
the noise patches and the projections by two scalars that are learnt throughout the
procedure. These two scalar values are the same for every pair of noise/projection
images, so that the same amount of extracted noise is added to every simulated
projection.
We apply a binary spherical mask of size (171⇥171⇥171) on the learned volume.
To handle the sharp transitions at the mask borders, we enforce constraints on the
masked volume. These are similar to those used in the synthetic experiment, with
difference that the lower bound, Vmin decreases to -5% of the maximum voxel value.

Discriminator Architecture. The architecture of the discriminator network is


detailed in Online Methods. The discriminator is initialized identically for the two
half datasets. The projection images (real and simulated) are smoothed with a
Gaussian kernel before being fed to the discriminator. The standard deviation of
the kernel is initially set at 2 and changes in every iteration so that it decreases by
a total of 2% in each epoch.

General Settings. The adversarial learning scheme is implemented in PyTorch [169]. For the optimization, we use [141] (β₁ = 0.5, β₂ = 0.9, ε = 10⁻⁸) with a learning rate of 10⁻³ and a batch size of 8. The algorithm is run for 15 epochs and the learning rate decreases by 8% at every epoch. The parameter of the gradient-penalty term is kept at λ = 1 (see (5.20)). The discriminator is trained 4 times for every training of the generator (n_discr = 4 in Algorithm 3).
For the back-propagations, the norm of the gradients of the discriminator is clipped to a maximal value of 10⁷. For the generator, the gradients for each pixel are clipped to a maximal value of 10³. The clipping values increase linearly from zero to those maxima in the first two epochs. Doing so improves the stability of the adversarial learning scheme at the start, in particular, that of the discriminator. The gradients that correspond to the learning of the scaling ratios between the noise and projection images are clipped to a value of 10.

Computational Resources. The reconstruction is run on a Nvidia V100 GPU


with 18GB memory. Each epoch lasts 10 minutes. The algorithm is run for 15
epochs, which takes 150 minutes.

A.4 Chapter 6
A.4.1 Neural Network Architectures
Layer id Layer Resample Output Shape
0 Input - 1 × 32 × 32
1 Conv2d MaxPool 96 × 16 × 16
2 Conv2d MaxPool 192 × 8 × 8
3 Conv2d MaxPool 384 × 4 × 4
4 Conv2d MaxPool 768 × 2 × 2
5 Flatten - 3072 × 1 × 1
6 FC - 50 × 1 × 1
7 FC - 1 × 1 × 1

Table A.1: 2D Discriminator architecture. LeakyReLU(0.1) is used after every MaxPool and in layer 6.
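A hedged PyTorch sketch of the Table A.1 architecture; the kernel size and padding are not specified by the table and are assumptions chosen here for illustration.

# Illustrative PyTorch module following Table A.1: four Conv2d + MaxPool stages with
# LeakyReLU(0.1), then two fully connected layers. Kernel size 3 and padding 1 are
# assumptions; the table only fixes the channel counts and spatial sizes.
import torch.nn as nn

class Discriminator2D(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [1, 96, 192, 384, 768]
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.MaxPool2d(2), nn.LeakyReLU(0.1)]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(768 * 2 * 2, 50),
                                  nn.LeakyReLU(0.1), nn.Linear(50, 1))

    def forward(self, x):            # x: (B, 1, 32, 32) -> (B, 1)
        return self.head(self.features(x))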

Layer id Layer Resample Norm Output Shape (C, D, H, W)


0 Input - - 1 × 32 × 32 × 32
1 Conv3d - BN 16 × 32 × 32 × 32
2 Conv3d MaxPool BN 16 × 16 × 16 × 16
3 Conv3d - BN 32 × 16 × 16 × 16
4 Conv3d MaxPool BN 32 × 8 × 8 × 8
5 Conv3d - BN 64 × 8 × 8 × 8
6 Conv3d MaxPool BN 64 × 4 × 4 × 4
7 Conv3d - BN 128 × 4 × 4 × 4
8 Conv3d MaxPool BN 128 × 2 × 2 × 2
9 Conv3d - BN 256 × 2 × 2 × 2
10 Conv3d - BN 256 × 2 × 2 × 2
11 Conv3d Upsample BN 128 × 4 × 4 × 4
12 Concat(layer 8) - - 256 × 4 × 4 × 4
13 Conv3d - BN 128 × 4 × 4 × 4
14 Conv3d - BN 128 × 4 × 4 × 4
15 Conv3d Upsample BN 64 × 8 × 8 × 8
16 Concat(layer 6) - - 128 × 8 × 8 × 8
17 Conv3d - BN 64 × 8 × 8 × 8
18 Conv3d - BN 64 × 8 × 8 × 8
19 Conv3d Upsample BN 32 × 16 × 16 × 16
20 Concat(layer 4) - - 64 × 16 × 16 × 16
21 Conv3d - BN 32 × 16 × 16 × 16
22 Conv3d - BN 32 × 16 × 16 × 16
23 Conv3d Upsample BN 16 × 32 × 32 × 32
24 Concat(layer 2) - - 32 × 32 × 32 × 32
25 Conv3d - BN 16 × 32 × 32 × 32
26 Conv3d - BN 16 × 32 × 32 × 32
27 Conv3d - BN 1 × 32 × 32 × 32

Table A.2: 3D Generator architecture. ReLU is used after every BatchNorm (BN). Concatenation is with the values before pooling.
Bibliography

[1] Mario Bertero and Patrizia Boccacci, Introduction to inverse problems in


imaging, CRC press, 1998.
[2] A. Kak and M. Slaney, Principles of Computerized Tomographic Imaging,
Classics in Applied Mathematics. Society for Industrial and Applied Mathe-
matics, New York, NY, 2001.
[3] Otmar Scherzer, Handbook of mathematical methods in imaging, Springer
Science & Business Media, 2010.
[4] M Bertero and M Piana, “Inverse problems in biomedical imaging: modeling
and methods of solution,” in Complex systems in biomedicine, pp. 1–33.
Springer, 2006.
[5] C. Bouman and K. Sauer, “A generalized Gaussian image model for edge-
preserving MAP estimation,” IEEE Trans. Image Process., vol. 2, no. 3, pp.
296–310, 1993.
[6] P. Charbonnier, L. Blanc-Féraud, G. Aubert, and M. Barlaud, “Deterministic
edge-preserving regularization in computed imaging,” IEEE Trans. Image
Process., vol. 6, no. 2, pp. 298–311, 1997.
[7] M. Lustig, D. L. Donoho, and J. M. Pauly, “Sparse MRI: The application of
compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine,
vol. 58, no. 6, pp. 1182–1195, Dec. 2007.
[8] E. Candès and J. Romberg, “Sparsity and incoherence in compressive sam-
pling,” Inverse Problems, vol. 23, no. 3, pp. 969–985, Jun. 2007.


[9] S. Ramani and J.A. Fessler, “Parallel MR image reconstruction using aug-
mented Lagrangian methods,” IEEE Trans. Med. Imag., vol. 30, no. 3, pp.
694–706, 2011.
[10] M. A. T. Figueiredo and R. D. Nowak, “An EM algorithm for wavelet-based
image restoration,” IEEE Transactions on Image Processing, vol. 12, no. 8,
pp. 906–916, Aug. 2003.
[11] Ingrid Daubechies, Michel Defrise, and Christine De Mol, “An iterative
thresholding algorithm for linear inverse problems with a sparsity constraint,”
Commun. Pure Appl. Math, vol. 57, no. 11, pp. 1413–1457, 2004.

[12] Amir Beck and Marc Teboulle, “A fast iterative shrinkage-thresholding algo-
rithm for linear inverse problems,” SIAM J. Imaging Sciences, vol. 2, no. 1,
pp. 183–202, 2009.
[13] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein,
“Distributed optimization and statistical learning via the alternating direction
method of multipliers,” Foundations and Trends in Machine Learning, vol.
3, no. 1, pp. 1–122, 2011.
[14] Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep learning, MIT
Press, 2016.
[15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, “Imagenet clas-
sification with deep convolutional neural networks,” in Advances in neural
information processing systems, 2012, pp. 1097–1105.
[16] Michael T. McCann, Kyong Hwan Jin, and Michael Unser, “Convolutional
neural networks for inverse problems in imaging: A review,” IEEE Signal
Process. Mag., vol. 34, no. 6, pp. 85–95, Nov. 2017.

[17] Kyong Hwan Jin, Michael T McCann, Emmanuel Froustey, and Michael
Unser, “Deep convolutional neural network for inverse problems in imaging,”
IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
[18] Yo Seob Han, Jaejun Yoo, and Jong Chul Ye, “Deep learning with domain
adaptation for accelerated projection reconstruction MR,” arXiv:1703.01135
[cs.CV], 2017.

[19] Stephan Antholzer, Markus Haltmeier, and Johannes Schwab, “Deep learning
for photoacoustic tomography from sparse data,” arXiv:1704.04587 [cs.CV],
2017.
[20] Shanshan Wang, Zhenghang Su, Leslie Ying, Xi Peng, Shun Zhu, Feng Liang,
Dagan Feng, and Dong Liang, “Accelerating magnetic resonance imaging via
deep learning,” in Proc. IEEE Int. Symp. Biomed. Imaging (ISBI), 2016, pp.
514–517.
[21] Jonas Adler and Ozan Öktem, “Solving ill-posed inverse problems using iter-
ative deep neural networks,” arXiv:1704.04058 [math.OC], 2017.
[22] Kerem C Tezcan, Christian F Baumgartner, Roger Luechinger, Klaas P
Pruessmann, and Ender Konukoglu, “MR image reconstruction using deep
density priors,” IEEE Trans. on Med. Imag., vol. 38, no. 7, pp. 1633–1642,
July 2019.
[23] Yong Chun and Jeffrey A Fessler, “Deep bcd-net using identical encoding-
decoding cnn structures for iterative image recovery,” in 2018 IEEE 13th
Image, Video, and Multidimensional Signal Processing Workshop (IVMSP).
IEEE, 2018, pp. 1–5.
[24] K. Hammernik, T. Klatzer, E. Kobler, M.P. Recht, D.K. Sodickson, T. Pock,
et al., “Learning a variational network for reconstruction of accelerated MRI
data,” Magn. Reson. in Med., vol. 79, no. 6, pp. 3055–3071, June 2018.
[25] JH Rick Chang, Chun-Liang Li, Barnabas Poczos, BVK Vijaya Kumar, and
Aswin C Sankaranarayanan, “One network to solve them all–solving linear
inverse problems using deep projection models,” in Proceedings of the IEEE
International Conference on Computer Vision, 2017, pp. 5888–5897.
[26] Harshit Gupta, Kyong Hwan Jin, Ha Q Nguyen, Michael T McCann, and
Michael Unser, “CNN-based projected gradient descent for consistent CT im-
age reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 6,
pp. 1440–1453, 2018.
[27] V. Lempitsky, A. Vedaldi, and D. Ulyanov, “Deep image prior,” in Proc. of
the IEEE Comput. Soc. Conf. on Comput. Vision and Pattern Recognition
(CVPR), Salt Lake City UT, USA, July 18-23 2018, pp. 9446–9454.

[28] Harshit Gupta, Michael T McCann, Laurene Donati, and Michael Unser,
“CryoGAN: A new reconstruction paradigm for single-particle cryo-EM via deep
adversarial learning,” bioRxiv, 2020.

[29] Harshit Gupta, Julien Fageot, and Michael Unser, “Continuous-domain so-
lutions of linear inverse problems with Tikhonov versus generalized TV reg-
ularization,” IEEE Transactions on Signal Processing, vol. 66, no. 17, pp.
4670–4684, 2018.

[30] X. Pan, E. Y. Sidky, and M. Vannier, “Why do commercial CT scanners


still employ traditional, filtered back-projection for image reconstruction?,”
Inverse Probl., vol. 25, no. 12, pp. 123009, 2009.

[31] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algo-


rithm for linear inverse problems with a sparsity constraint,” Communications
on Pure and Applied Mathematics, vol. 57, no. 11, pp. 1413–1457, 2004.

[32] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm


for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no.
1, pp. 183–202, Jan. 2009.

[33] Michael Elad and Michal Aharon, “Image denoising via sparse and redundant
representations over learned dictionaries,” IEEE Trans. Image Process., vol.
15, no. 12, pp. 3736–3745, 2006.

[34] Emmanuel J Candès, Yonina C Eldar, Deanna Needell, and Paige Randall,
“Compressed sensing with coherent and redundant dictionaries,” Applied and
Computational Harmonic Analysis, vol. 31, no. 1, pp. 59–73, 2011.

[35] S. Ravishankar, R. R. Nadakuditi, and J. Fessler, “Efficient sum of outer prod-


ucts dictionary learning (SOUP-DIL) and its application to inverse problems,”
IEEE Trans. Comput. Imaging, vol. 3, no. 4, pp. 694–709, 2017.

[36] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-
Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative ad-
versarial nets,” in Advances in neural information processing systems, 2014,
pp. 2672–2680.

[37] Diederik P Kingma and Max Welling, “Auto-encoding variational bayes,”


arXiv preprint arXiv:1312.6114, 2013.

[38] A. N. Tikhonov, “Solution of incorrectly formulated problems and the regu-


larization method,” Soviet Mathematics, vol. 4, pp. 1035–1038, 1963.

[39] Mario Bertero and Patrizia Boccacci, Introduction to Inverse Problems in


Imaging, CRC press, 1998.

[40] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information


Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[41] M.A.T. Figueiredo, R.D. Nowak, and S.J. Wright, “Gradient projection for
sparse reconstruction: Application to compressed sensing and other inverse
problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no.
4, pp. 586–597, Dec. 2007.

[42] Arthur E. Hoerl and Robert W. Kennard, “Ridge regression: Biased estima-
tion for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67,
Feb. 1970.

[43] R. Tibshirani, “Regression shrinkage and selection via the Lasso,” Journal of
the Royal Statistical Society. Series B, vol. 58, no. 1, pp. 265–288, 1996.

[44] Bradley Efron, Trevor Hastie, and Robert Tibshirani, “Discussion: The
Dantzig selector: Statistical estimation when p is much larger than n,” The
Annals of Statistics, vol. 35, no. 6, pp. 2358–2364, Dec. 2007.

[45] M. Unser, J. Fageot, and H. Gupta, “Representer theorems for sparsity-


promoting ℓ1-regularization,” IEEE Transactions on Information Theory,
vol. 62, no. 9, pp. 5167–5180, Sep. 2016.

[46] Michael Unser, Julien Fageot, and John Paul Ward, “Splines are universal so-
lutions of linear inverse problems with generalized TV regularization,” SIAM
Review, vol. 59, no. 4, pp. 769–793, Dec. 2017.

[47] Holger Wendland, Scattered Data Approximation, vol. 17, Cambridge Uni-
versity press, 2004.

[48] Bernhard Schölkopf and Alexander J. Smola, Learning with Kernels: Support
Vector Machines, Regularization, Optimization, and Beyond, MIT Press,
Cambridge, MA, USA, 2001.
[49] Bernhard Schölkopf, Ralf Herbrich, and Alex J Smola, “A generalized repre-
senter theorem,” Lecture Notes in Computer Science, vol. 2111, pp. 416–426,
2001.
[50] Grace Wahba, Spline Models for Observational Data, vol. 59, SIAM, 1990.
[51] Grace Wahba, “Support vector machines, reproducing kernel Hilbert spaces
and the randomized GACV,” Advances in Kernel Methods-Support Vector
Learning, vol. 6, pp. 69–87, 1999.
[52] Anatoly Yu Bezhaev and Vladimir Aleksandrovich Vasilenko, Variational
theory of splines, Springer, 2001.
[53] J. Kybic, T. Blu, and M. Unser, “Generalized sampling: A variational
approach—Part I: Theory,” IEEE Transactions on Signal Processing, vol.
50, no. 8, pp. 1965–1976, Aug. 2002.
[54] J. Kybic, T. Blu, and M. Unser, “Generalized sampling: A variational
approach—Part II: Applications,” IEEE Transactions on Signal Processing,
vol. 50, no. 8, pp. 1977–1985, Aug. 2002.
[55] Leonid I. Rudin, Stanley Osher, and Emad Fatemi, “Nonlinear total variation
based noise removal algorithms,” Physics D, vol. 60, no. 1-4, pp. 259–268,
Nov. 1992.
[56] Gabriele Steidl, Stephan Didas, and Julia Neumann, “Splines in higher order
TV regularization,” International Journal of Computer Vision, vol. 70, no.
3, pp. 241–255, Dec. 2006.
[57] SD Fisher and JW Jerome, “Spline solutions to L1 extremal problems in one
and several variables,” Journal of Approximation Theory, vol. 13, no. 1, pp.
73–83, Jan. 1975.
[58] K. Bredies and H.K. Pikkarainen, “Inverse problems in spaces of measures,”
ESAIM: Control, Optimisation and Calculus of Variations, vol. 19, no. 1, pp.
190–218, Jan. 2013.

[59] E.J. Candès and C. Fernandez-Granda, “Super-resolution from noisy data,”


Journal of Fourier Analysis and Applications, vol. 19, no. 6, pp. 1229–1254,
Dec. 2013.

[60] Quentin Denoyelle, Vincent Duval, and Gabriel Peyré, “Support recovery for
sparse super-resolution of positive measures,” Journal of Fourier Analysis
and Applications, vol. 23, no. 5, pp. 1153–1194, Oct. 2017.

[61] Antonin Chambolle, Vincent Duval, Gabriel Peyré, and Clarice Poon, “Geo-
metric properties of solutions to the total variation denoising problem,” In-
verse Problems, vol. 33, no. 1, pp. 015002, Dec. 2016.

[62] A. Flinth and P. Weiss, “Exact solutions of infinite dimensional total-variation


regularized problems,” arXiv:1708.02157 [math.OC], 2017.

[63] Michael Unser, “A representer theorem for deep neural networks,”


arXiv:1802.09210 [stat.ML], 2018.

[64] Andrea Braides, Gamma-convergence for Beginners, vol. 22, Clarendon Press,
2002.

[65] Vincent Duval and Gabriel Peyré, “Sparse regularization on thin grids I: the
Lasso,” Inverse Problems, vol. 33, no. 5, pp. 055008, 2017.

[66] Gongguo Tang, Badri Narayan Bhaskar, and Benjamin Recht, “Sparse recov-
ery over continuous dictionaries-just discretize,” in Asilomar Conference on
Signals, Systems and Computers. IEEE, 2013, pp. 1043–1047.

[67] George B Dantzig, Alex Orden, and Philip Wolfe, “The generalized simplex
method for minimizing a linear form under linear inequality restraints,” Pa-
cific Journal of Mathematics, vol. 5, no. 2, pp. 183–195, Oct. 1955.

[68] David G Luenberger, Introduction to Linear and Nonlinear Programming,


vol. 28, Addison-Wesley Reading, MA, 1973.

[69] Vincent Duval and Gabriel Peyré, “Exact support recovery for sparse spikes
deconvolution,” Foundations of Computational Mathematics, vol. 15, no. 5,
pp. 1315–1355, 2015.

[70] Ryan J Tibshirani, “The LASSO problem and uniqueness,” Electronic Journal
of Statistics, vol. 7, pp. 1456–1490, 2013.
[71] Holger Rauhut, Karin Schnass, and Pierre Vandergheynst, “Compressed sens-
ing and redundant dictionaries,” IEEE Transactions on Information Theory,
vol. 54, no. 5, pp. 2210–2219, Apr. 2008.
[72] Simon Foucart and Holger Rauhut, A Mathematical Introduction to Com-
pressive Sensing, Springer, 2013.
[73] Antonin Chambolle and Charles Dossal, “On the convergence of the iterates
of FISTA,” Journal of Optimization Theory and Applications, vol. 166, no.
3, pp. 25, 2015.
[74] E. Mammen and S. van de Geer, “Locally adaptive regression splines,” Annals
of Statistics, vol. 25, no. 1, pp. 387–413, 1997.
[75] Emmanuel J Candès and Carlos Fernandez-Granda, “Towards a mathematical
theory of super-resolution,” Communications on Pure and Applied Mathemat-
ics, vol. 67, no. 6, pp. 906–956, 2014.
[76] Gongguo Tang, Badri Narayan Bhaskar, Parikshit Shah, and Benjamin Recht,
“Compressed sensing off the grid,” IEEE Transactions on Information The-
ory, vol. 59, no. 11, pp. 7465–7490, 2013.
[77] Yohann De Castro, Fabrice Gamboa, Didier Henrion, and J-B Lasserre, “Ex-
act solutions to super resolution on semi-algebraic domains in higher dimen-
sions,” IEEE Transactions on Information Theory, vol. 63, no. 1, pp. 621–630,
2017.
[78] M. Unser and T. Blu, “Generalized smoothing splines and the optimal dis-
cretization of the Wiener filter,” IEEE Transactions on Signal Processing,
vol. 53, no. 6, pp. 2146–2159, Jun. 2005.
[79] M. Unser and P. D. Tafti, “Stochastic models for sparse and piecewise-smooth
signals,” IEEE Transactions on Signal Processing, vol. 59, no. 3, pp. 989–
1006, Mar. 2011.
[80] M. Unser and P. D. Tafti, An Introduction to Sparse Stochastic Processes,
Cambridge University Press, 2014.

[81] J. Fageot, V. Uhlmann, and M. Unser, “Gaussian and sparse processes are
limits of generalized Poisson processes,” arXiv:1702.05003 [math.PR], 2017.

[82] Ali Mousavi and Richard G Baraniuk, “Learning to invert: Signal recovery
via deep convolutional networks,” arXiv:1701.03891 [stat.ML], 2017.

[83] Karol Gregor and Yann LeCun, “Learning fast approximations of sparse
coding,” in Proc. Int. Conf. Mach. Learn. (ICML), 2010, pp. 399–406.

[84] Yan Yang, Jian Sun, Huibin Li, and Zongben Xu, “Deep ADMM-Net for
compressive sensing MRI,” in Adv. Neural Inf. Process. Syst. (NIPS), pp.
10–18. 2016.

[85] Patrick Putzky and Max Welling, “Recurrent inference machines for solving
inverse problems,” arXiv:1706.04008 [cs.NE], 2017.

[86] Jo Schlemper, Jose Caballero, Joseph V Hajnal, Anthony Price, and Daniel
Rueckert, “A deep cascade of convolutional neural networks for MR image
reconstruction,” in International Conference on Information Processing in
Medical Imaging. Springer, 2017, pp. 647–658.

[87] Singanallur V Venkatakrishnan, Charles A Bouman, and Brendt Wohlberg,


“Plug-and-play priors for model based reconstruction,” in Proc. IEEE Glob.
Conf. Signal Inform. Process. (GlobalSIP), 2013, pp. 945–948.

[88] Stanley H Chan, Xiran Wang, and Omar A Elgendy, “Plug-and-Play ADMM
for image restoration: Fixed-point convergence and applications,” IEEE
Trans. Comput. Imaging, vol. 3, no. 1, pp. 84–98, 2017.

[89] Suhas Sreehari, S Venkat Venkatakrishnan, Brendt Wohlberg, Gregery T Buz-


zard, Lawrence F Drummy, Jeffrey P Simmons, and Charles A Bouman,
“Plug-and-play priors for bright field electron tomography and sparse inter-
polation,” IEEE Tran. Comput. Imaging, vol. 2, no. 4, pp. 408–423, 2016.

[90] Yaniv Romano, Michael Elad, and Peyman Milanfar, “The little engine that
could: Regularization by denoising (red),” SIAM Journal on Imaging Sci-
ences, vol. 10, no. 4, pp. 1804–1844, 2017.

[91] JH Chang, Chun-Liang Li, Barnabás Póczos, BVK Kumar, and Aswin C
Sankaranarayanan, “One network to solve them all—Solving linear inverse
problems using deep projection models,” arXiv:1703.09912 [cs.CV], 2017.

[92] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis, “Compressed
sensing using generative models,” arXiv:1703.03208 [stat.ML], 2017.

[93] Brendan Kelly, Thomas P Matthews, and Mark A Anastasio, “Deep learning-
guided image reconstruction from incomplete data,” arXiv:1709.00584
[cs.CV], 2017.

[94] Jerome Zhengrong Liang, Patrick J. La Riviere, Georges El Fakhri, Stephen J.


Glick, and Jeff Siewerdsen, “Guest editorial low-dose CT: What has been
done, and what challenges remain?,” IEEE Trans. Med. Imag., vol. 36, no.
12, pp. 2409–2416, 2017.

[95] S. Ramani and J. A. Fessler, “A splitting-based iterative algorithm for accel-


erated statistical X-ray CT reconstruction,” IEEE Trans. Med. Imag., vol.
31, no. 3, pp. 677–688, 2012.

[96] Qiong Xu, Hengyong Yu, Xuanqin Mou, Lei Zhang, Jiang Hsieh, and
Ge Wang, “Low-dose X-ray CT reconstruction via dictionary learning,” IEEE
Trans. Med. Imag., vol. 31, no. 9, pp. 1682–1697, 2012.

[97] Shanzhou Niu, Yang Gao, Zhaoying Bian, Jing Huang, Wufan Chen, Gaohang
Yu, Zhengrong Liang, and Jianhua Ma, “Sparse-view X-ray CT reconstruction
via total generalized variation regularization,” Phys. Med. Biol., vol. 59, no.
12, pp. 2997–3017, 2014.

[98] Lars Gjesteby, Qingsong Yang, Yan Xi, Ye Zhou, Junping Zhang, and
Ge Wang, “Deep learning methods to guide CT image reconstruction and
reduce metal artifacts,” in Medical Imaging 2017: Physics of Medical Imag-
ing, Orlando, Fl, 2017.

[99] Hu Chen, Yi Zhang, Mannudeep K. Kalra, Feng Lin, Yang Chen, Peixi Liao,
Jiliu Zhou, and Ge Wang, “Low-dose CT with a residual encoder-decoder
convolutional neural network,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp.
2524–2535, 2017.

[100] Eunhee Kang, Junhong Min, and Jong Chul Ye, “A deep convolutional neural
network using directional wavelets for low-dose X-ray CT reconstruction,”
Med. Phys., vol. 44, no. 10, pp. e360–e375, 2017.

[101] Yoseop Han, Jaejoon Yoo, and Jong Chul Ye, “Deep residual learning for
compressed sensing CT reconstruction via persistent homology analysis,”
arXiv:1611.06391 [cs.CV], 2016.

[102] Bertolt Eicke, “Iteration methods for convexly constrained ill-posed problems
in Hilbert space,” Numer. Funct. Anal. Optim., vol. 13, no. 5-6, pp. 413–429,
1992.

[103] L. Landweber, “An iteration formula for Fredholm integral equations of the
first kind,” Amer. J. Math., vol. 73, no. 3, pp. 615–624, 1951.

[104] Dimitri P. Bertsekas, Nonlinear Programming, Athena Scientific, Cambridge,


MA, 2 edition, 1999.

[105] Patrick L. Combettes and V. Wajs, “Signal recovery by proximal forward-


backward splitting,” Multiscale Modeling and Simulation, vol. 4, no. 4, pp.
1168–1200, 2005.

[106] Patrick L. Combettes and Jean-Christophe Pesquet, Proximal Splitting Meth-


ods in Signal Processing, pp. 185–212, Springer, New York, NY, 2011.

[107] J. Bect, L. Blanc-Feraud, G. Aubert, and A. Chambolle, “A ℓ1-unified vari-


ational framework for image restoration,” in Proc. Eur. Conf. Comput. Vis.
(ECCV), 2004, pp. 1–13.

[108] A. Aldroubi and R. Tessera, “On the existence of optimal unions of subspaces
for data modeling and clustering,” Found. Comput. Math., vol. 11, no. 3, pp.
363–379, 2011.

[109] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional
networks for biomedical image segmentation,” in Proc. Med. Image. Comput.
Comput. Assist. Interv. (MICCAI), 2015, pp. 234–241.

[110] A. Emin Orhan and Xaq Pitkow, “Skip connections eliminate singularities,”
arXiv:1701.09175 [cs.NE], 2017.

[111] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual
learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. (CVPR), 2016, pp. 770–778.
[112] C. McCollough, “TU-FG-207A-04: Overview of the Low Dose CT Grand
Challenge,” Med. Phys., vol. 43, no. 6-part-35, pp. 3759–3760, 2016.
[113] Zhou Wang, Eero P Simoncelli, and Alan C Bovik, “Multiscale structural sim-
ilarity for image quality assessment,” in Proc. IEEE Asilomar Conf. Signals,
Syst., Comput., Pacific Grove, CA, Nov. 2003, vol. 2, pp. 1398–1402.
[114] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro, “Online
learning for matrix factorization and sparse coding,” Journal of Machine
Learning Research (JMLR), vol. 11, pp. 19–60, 2010.
[115] Joel A Tropp and Anna C Gilbert, “Signal recovery from random measure-
ments via orthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53,
no. 12, pp. 4655–4666, 2007.
[116] Kyong Hwan Jin, Harshit Gupta, Jerome Yerly, Matthias Stuber, and Michael
Unser, “Time-dependent deep image prior for dynamic MRI,” arXiv preprint
arXiv:1910.01684, 2019.
[117] M.A. Griswold, P.M. Jakob, R.M. Heidemann, M. Nittka, V. Jellus,
J. Wang, et al., “Generalized autocalibrating partially parallel acquisitions
(GRAPPA),” Magn. Reson. in Med., vol. 47, no. 6, pp. 1202–1210, June 2002.
[118] K.P. Pruessmann, M. Weiger, M.B. Scheidegger, and P. Boesigner, “SENSE:
Sensitivity encoding for fast MRI,” Magn. Reson. in Med., vol. 42, no. 5, pp.
952–962, November 1999.
[119] M. Lustig, D. Donoho, and J.M. Pauly, “Sparse MRI: The application of
compressed sensing for rapid MR imaging,” Magn. Reson. in Med., vol. 58,
no. 6, pp. 1182–1195, December 2007.
[120] J. Fessler, “Model-based image reconstruction for MRI,” IEEE Sig. Process.
Mag., vol. 27, no. 4, pp. 81–89, July 2010.
[121] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no.
7553, pp. 436–444, May 27 2015.

[122] J. Schlemper, J. Caballero, J.V. Hajnal, A.N. Price, and D. Rueckert, “A


deep cascade of convolutional neural networks for dynamic MR image recon-
struction,” IEEE Trans. on Med. Imag., vol. 37, no. 2, pp. 491–503, February
2018.
[123] A. Hauptmann, S. Arridge, F. Lucka, V. Muthurangu, and J.A. Steeden,
“Real-time cardiovascular MR with spatio-temporal artifact suppression using
deep learning—Proof of concept in congenital heart disease,” Magn. Reson.
in Med., vol. 81, no. 2, pp. 1143–1156, February 2019.
[124] S. Biswas, H.K. Aggarwal, and M. Jacob, “Dynamic MRI using model-based
deep learning and SToRM priors: MoDL-SToRM,” Magn. Reson. in Med.,
vol. 82, no. 1, pp. 485–494, July 2019.
[125] Y. Han, J. Yoo, H.H. Kim, H.J. Shin, K. Sung, and J.C. Ye, “Deep learn-
ing with domain adaptation for accelerated projection-reconstruction MR,”
Magn. Reson. in Med., vol. 80, no. 3, pp. 1189–1205, September 2018.
[126] M. Mardani, E. Gong, J.Y. Cheng, S.S. Vasanawala, G. Zaharchuk, L. Xing,
et al., “Deep generative adversarial neural networks for compressive sensing
MRI,” IEEE Trans. on Med. Imag., vol. 38, no. 1, pp. 167–179, January 2019.
[127] H. Jung, K. Sung, K.S. Nayak, E.Y. Kim, and J.C. Ye, “k-t FOCUSS: A
general compressed sensing framework for high resolution dynamic MRI,”
Magn. Reson. in Med., vol. 61, no. 1, pp. 103–116, January 2009.
[128] S.G. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI
exploiting sparsity and low-rank structure: k-t SLR,” IEEE Trans. on Med.
Imag., vol. 30, no. 5, pp. 1042–1054, 2011.
[129] L. Feng, R. Grimm, K.T. Block, H. Chandarana, S. Kim, J. Xu, et al.,
“Golden-angle radial sparse parallel MRI: Combination of compressed sens-
ing, parallel imaging, and golden-angle radial sampling for fast and flexible
dynamic volumetric MRI,” Magn. Reson. in Med., vol. 72, no. 3, pp. 707–717,
September 2014.
[130] L. Feng, L. Axel, H. Chandarana, K.T. Block, D.K. Sodickson, and R. Otazo,
“XD-GRASP: Golden-angle radial MRI with reconstruction of extra motion-
state dimensions using compressed sensing,” Magn. Reson. in Med., vol. 75,
no. 2, pp. 775–788, February 2016.

[131] E.J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact
signal reconstruction from highly incomplete frequency information,” IEEE
Trans. on Info. Theory, vol. 52, no. 2, pp. 489–509, February 2006.
[132] S. Poddar and M. Jacob, “Dynamic MRI using smoothness regularization
on manifolds (SToRM),” IEEE Trans. on Med. Imag., vol. 35, no. 4, pp.
1106–1115, April 2015.
[133] U. Nakarmi, Wang Y., J. Lyu, D. Liang, and L. Ying, “A kernel-based low-
rank (KLR) model for low-dimensional manifold recovery in highly accel-
erated dynamic MRI,” IEEE Trans. on Med. Imag., vol. 36, no. 11, pp.
2297–2307, November 2017.
[134] J. Yerly, G. Ginami, G. Nordio, A.J. Coristine, S. Coppo, P. Monney, et al.,
“Coronary endothelial function assessment using self-gated cardiac cine MRI
and k-t sparse SENSE,” Magn. Reson. in Med., vol. 76, no. 5, pp. 1443–1454,
November 2016.
[135] J. Chaptinel, J. Yerly, Y. Mivelaz, M. Prsa, L. Alamo, Y. Vial, et al., “Fetal
cardiac cine magnetic resonance imaging in utero,” Scientific Reports, vol. 7,
no. 15540, pp. 1–10, November 14 2017.
[136] J.A. Fessler and B.P. Sutton, “Nonuniform fast Fourier transforms using min-
max interpolation,” IEEE Trans. on Sig. Proc., vol. 51, no. 2, pp. 560–574,
February 2003.
[137] A.P. Yazdanpanah, O. Afacan, and S.K. Warfield, “Non-learning based deep
parallel MRI reconstruction (NLDpMRI),” in Proc. of the SPIE Conf. on
Med. Imag.: Imag. Process., San Diego CA, USA, February 16-21 2019, Inter-
national Society for Optics and Photonics, vol. 10949, pp. 1094904–1094910.
[138] K. Gong, C. Catana, J. Qi, and Q. Li, “PET image reconstruction using deep
image prior,” IEEE Trans. on Med. Imag., vol. 38, no. 7, pp. 1655–1665, July
2019.
[139] C. Qin, J. Schlemper, J. Caballero, A.N. Price, J.V. Hajnal, and D. Rueckert,
“Convolutional recurrent neural networks for dynamic MR image reconstruc-
tion,” IEEE Trans. on Med. Imag., vol. 38, no. 1, pp. 280–290, January
2019.

[140] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning


with deep convolutional generative adversarial networks,” in Int. Conf. on
Learn. Representations (ICLR), San Diego CA, USA, May 7-9 2015.
[141] D.P. Kingma and J. Ba, “ADAM: A method for stochastic optimization,” in
Int. Conf. on Learn. Representations (ICLR), San Diego CA, USA, May 7-9
2015.
[142] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. of
Mach. Learn. Res., vol. 9, pp. 2579–2605, November 2008.
[143] Jacques Dubochet, Marc Adrian, Jiin-Ju Chang, Jean-Claude Homo, Jean
Lepault, Alasdair W McDowall, and Patrick Schultz, “Cryo-electron mi-
croscopy of vitrified specimens,” Quarterly Reviews of Biophysics, vol. 21,
no. 2, pp. 129–228, 1988.
[144] Richard Henderson, J. M. Baldwin, T. A. Ceska, F. Zemlin, E. A. Beckmann, and
Kenneth H Downing, “Model for the structure of bacteriorhodopsin based on
high-resolution electron cryo-microscopy,” Journal of molecular biology, vol.
213, no. 4, pp. 899–929, 1990.
[145] Joachim Frank, Three-dimensional electron microscopy of macromolecular
assemblies: visualization of biological molecules in their native state, Oxford
University Press, 2006.
[146] Joachim Frank, Brian Shimkin, and Helen Dowse, “Spider—A modular
software system for electron image processing,” Ultramicroscopy, vol. 6, no.
4, pp. 343–357, 1981.
[147] COS Sorzano, Roberto Marabini, Javier Velázquez-Muriel, José Román
Bilbao-Castro, Sjors HW Scheres, José M Carazo, and Alberto Pascual-
Montano, “Xmipp: A new generation of an open-source image processing
package for electron microscopy,” Journal of Structural Biology, vol. 148, no.
2, pp. 194–204, 2004.
[148] Guang Tang, Liwei Peng, Philip R Baldwin, Deepinder S Mann, Wen Jiang,
Ian Rees, and Steven J Ludtke, “Eman2: An extensible image processing
suite for electron microscopy,” Journal of Structural Biology, vol. 157, no. 1,
pp. 38–46, 2007.

[149] Nikolaus Grigorieff, “Frealign: High-resolution refinement of single particle


structures,” Journal of Structural Biology, vol. 157, no. 1, pp. 117–125, 2007.
[150] Michael Hohn, Grant Tang, Grant Goodyear, Philip R Baldwin, Zhong
Huang, Pawel A Penczek, Chao Yang, Robert M Glaeser, Paul D Adams,
and Steven J Ludtke, “Sparx, a new environment for cryo-em image process-
ing,” Journal of Structural Biology, vol. 157, no. 1, pp. 47–55, 2007.
[151] Sjors HW Scheres, “Relion: Implementation of a bayesian approach to cryo-
em structure determination,” Journal of Structural Biology, vol. 180, no. 3,
pp. 519–530, 2012.

[152] J.M. de la Rosa-Trevín, A. Quintana, L. del Cano, A. Zaldívar, I. Foche,
J. Gutiérrez, J. Gómez-Blanco, J. Burguet-Castell, J. Cuenca-Alba,
V. Abrishami, J. Vargas, J. Otón, G. Sharov, J.L. Vilas, J. Navas, P. Conesa,
M. Kazemi, R. Marabini, C.O.S. Sorzano, and J.M. Carazo, “Scipion: A
software framework toward integration, reproducibility and validation in 3D
electron microscopy,” Journal of Structural Biology, vol. 195, no. 1, pp. 93 –
99, 2016.
[153] Ali Punjani, John L Rubinstein, David J Fleet, and Marcus A Brubaker,
“cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determi-
nation,” Nature Methods, vol. 14, no. 3, pp. 290–296, Feb. 2017.

[154] Carlos Oscar Sánchez Sorzano, Roberto Marabini, Alberto Pascual-Montano,


Sjors HW Scheres, and José María Carazo, “Optimization problems in electron
microscopy of single particles,” Annals of Operations Research, vol. 148, no.
1, pp. 133–165, 2006.
[155] Richard Henderson, Andrej Sali, Matthew L. Baker, Bridget Carragher,
Batsal Devkota, Kenneth H. Downing, Edward H. Egelman, Zukang Feng,
Joachim Frank, Nikolaus Grigorieff, Wen Jiang, Steven J. Ludtke, Ohad
Medalia, Pawel A. Penczek, Peter B. Rosenthal, Michael G. Rossmann,
Michael F. Schmid, Gunnar F. Schröder, Alasdair C. Steven, David L.
Stokes, John D. Westbrook, Willy Wriggers, Huanwang Yang, Jasmine Young,
Helen M. Berman, Wah Chiu, Gerard J. Kleywegt, and Catherine L. Law-
son, “Outcome of the first electron microscopy validation task force meeting,”
Structure, vol. 20, no. 2, pp. 205–214, 2012.

[156] Tamir Bendory, Alberto Bartesaghi, and Amit Singer, “Single-particle cryo-
electron microscopy: Mathematical theory, computational challenges, and
opportunities,” IEEE Signal Processing Magazine, vol. 37, no. 2, pp. 58–76,
2020.

[157] Amit Singer and Fred J Sigworth, “Computational methods for single-particle
electron cryomicroscopy,” Annual Review of Biomedical Data Science, vol. 3,
2020.

[158] Alberto Bartesaghi, Alan Merk, Soojay Banerjee, Doreen Matthies, Xiongwu
Wu, Jacqueline LS Milne, and Sriram Subramaniam, “2.2 Å resolution cryo-
EM structure of β-galactosidase in complex with a cell-permeant inhibitor,”
Science, vol. 348, no. 6239, pp. 1147–1151, 2015.

[159] Joachim Frank, Electron tomography: methods for three-dimensional visual-


ization of structures in the cell, Springer Science & Business Media, 2008.

[160] F. Natterer, The mathematics of computerized tomography, Society for In-


dustrial and Applied Mathematics, Jan. 2001.

[161] Miloš Vulović, Raimond B.G. Ravelli, Lucas J. van Vliet, Abraham J. Koster,
Ivan Lazić, Uwe Lücken, Hans Rullgård, Ozan Öktem, and Bernd Rieger, “Im-
age formation modeling in cryo-electron microscopy,” Journal of Structural
Biology, vol. 183, no. 1, pp. 19–32, July 2013.

[162] Hans Rullgård, L-G Öfverstedt, Sergey Masich, Bertil Daneholt, and Ozan
Öktem, “Simulation of transmission electron microscope images of biological
specimens,” Journal of microscopy, vol. 243, no. 3, pp. 234–256, 2011.

[163] M. Unser, “Sampling—50 years after Shannon,” Proceedings IEEE, vol. 88,
no. 4, pp. 569–587, Apr. 2000.

[164] Martin Arjovsky, Soumith Chintala, and Léon Bottou, “Wasserstein genera-
tive adversarial networks,” in International conference on machine learning,
2017, pp. 214–223.

[165] Cédric Villani, Optimal transport: old and new, vol. 338, Springer Science &
Business Media, 2008.

[166] Gabriel Peyré, Marco Cuturi, et al., “Computational optimal transport,”


Foundations and Trends in Machine Learning, vol. 11, no. 5-6, pp. 355–
607, 2019.
[167] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and
Aaron C Courville, “Improved training of wasserstein gans,” in Advances in
neural information processing systems, 2017, pp. 5767–5777.
[168] Wim van Aarle, Willem Jan Palenstijn, Jan De Beenhouwer, Thomas Al-
tantzis, Sara Bals, K Joost Batenburg, and Jan Sijbers, “The astra toolbox:
A platform for advanced algorithm development in electron tomography,” Ul-
tramicroscopy, vol. 157, pp. 35–47, 2015.
[169] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury,
Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca
Antiga, et al., “Pytorch: An imperative style, high-performance deep learning
library,” in Advances in Neural Information Processing Systems, 2019, pp.
8024–8035.
[170] Victor M Panaretos et al., “On random tomography with unobservable pro-
jection angles,” The Annals of Statistics, vol. 37, no. 6A, pp. 3272–3306,
2009.
[171] Sigurdur Helgason, The radon transform, vol. 2, Springer, 1980.
[172] Pawel A. Penczek, Robert A. Grassucci, and Joachim Frank, “The ribosome at
improved resolution: New techniques for merging and orientation refinement
in 3D cryo-electron microscopy of biological particles,” Ultramicroscopy, vol.
53, no. 3, pp. 251 – 270, 1994.
[173] T.S. Baker and R.H. Cheng, “A model-based approach for determining ori-
entations of biological macromolecules imaged by cryoelectron microscopy,”
Journal of Structural Biology, vol. 116, no. 1, pp. 120–130, 1996.
[174] Fred J Sigworth, “A maximum-likelihood approach to single-particle image
refinement,” Journal of structural biology, vol. 122, no. 3, pp. 328–339, 1998.
[175] Fred J Sigworth, Peter C Doerschuk, Jose-Maria Carazo, and Sjors HW
Scheres, “An introduction to maximum-likelihood methods in cryo-em,” in
Methods in enzymology, vol. 482, pp. 263–294. Elsevier, 2010.

[176] Zvi Kam, “The reconstruction of structure from electron micrographs of ran-
domly oriented particles,” in Electron Microscopy at Molecular Dimensions,
pp. 270–277. Springer, 1980.
[177] Nir Sharon, Joe Kileel, Yuehaw Khoo, Boris Landa, and Amit Singer,
“Method of moments for 3-d single particle ab initio modeling with non-
uniform distribution of viewing angles,” Inverse Problems, 2019.
[178] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud
Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm
Van Der Laak, Bram Van Ginneken, and Clara I Sánchez, “A survey on deep
learning in medical image analysis,” Medical image analysis, vol. 42, pp.
60–88, 2017.
[179] Michael T. McCann, Kyong Hwan Jin, and Michael Unser, “Convolutional
neural networks for inverse problems in imaging: A review,” IEEE Signal
Processing Magazine, vol. 34, no. 6, pp. 85–95, Nov. 2017.

[180] George Barbastathis, Aydogan Ozcan, and Guohai Situ, “On the use of deep
learning for computational imaging,” Optica, vol. 6, no. 8, pp. 921–943, 2019.
[181] Tristan Bepler, Alex J Noble, and Bonnie Berger, “Topaz-denoise: general
deep denoising models for cryoem,” bioRxiv, p. 838920, 2019.
[182] Feng Wang, Huichao Gong, Gaochao Liu, Meijing Li, Chuangye Yan, Tian
Xia, Xueming Li, and Jianyang Zeng, “Deeppicker: a deep learning approach
for fully automated particle picking in cryo-em,” Journal of structural biology,
vol. 195, no. 3, pp. 325–336, 2016.
[183] Yanan Zhu, Qi Ouyang, and Youdong Mao, “A deep convolutional neural
network approach to single-particle recognition in cryo-electron microscopy,”
BMC bioinformatics, vol. 18, no. 1, pp. 348, 2017.
[184] Dimitry Tegunov and Patrick Cramer, “Real-time cryo-em data pre-
processing with warp,” BioRxiv, p. 338558, 2018.
[185] Thorsten Wagner, Felipe Merino, Markus Stabrin, Toshio Moriya, Claudia
Antoni, Amir Apelbaum, Philine Hagel, Oleg Sitsel, Tobias Raisch, Daniel
Prumbaum, et al., “Sphire-cryolo is a fast and accurate fully automated
particle picker for cryo-em,” Communications Biology, vol. 2, no. 1, pp. 218,
2019.

[186] Tristan Bepler, Andrew Morin, Micah Rapp, Julia Brasch, Lawrence Shapiro,
Alex J Noble, and Bonnie Berger, “Positive-unlabeled convolutional neural
networks for particle picking in cryo-electron micrographs,” Nature methods,
pp. 1–8, 2019.

[187] Ellen D. Zhong, Tristan Bepler, Joseph H. Davis, and Bonnie Berger, “Re-
constructing continuous distributions of 3D protein structure from cryo-em
images,” in International Conference on Learning Representations, 2020.

[188] Nina Miolane, Frédéric Poitevin, Yee-Ting Li, and Susan Holmes, “Estima-
tion of orientation and camera parameters from cryo-electron microscopy im-
ages with variational autoencoders and generative adversarial networks,” in
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition Workshops, 2020, pp. 970–971.

[189] Ashish Bora, Eric Price, and Alexandros G Dimakis, “AmbientGAN: Gener-
ative models from lossy measurements,” ICLR, vol. 2, pp. 5, 2018.

[190] Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lom-
bardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason
Saragih, Matthias Nießner, et al., “State of the art on neural rendering,”
arXiv preprint arXiv:2004.03805, 2020.

[191] Shubham Tulsiani, Alexei A Efros, and Jitendra Malik, “Multi-view consis-
tency as supervisory signal for learning shape and pose prediction,” in Pro-
ceedings of the IEEE conference on computer vision and pattern recognition,
2018, pp. 2897–2905.

[192] Matheus Gadelha, Subhransu Maji, and Rui Wang, “3D shape induction
from 2D views of multiple objects,” in 2017 International Conference on 3D
Vision (3DV). IEEE, 2017, pp. 402–411.

[193] Shakir Mohamed and Balaji Lakshminarayanan, “Learning in implicit gener-


ative models,” arXiv preprint arXiv:1610.03483, 2016.

[194] Harshit Gupta, Thong H. Phan, Jaejun Yoo, and Michael Unser, “Multi-
CryoGAN: Reconstruction of continuous conformations in cryo-EM using gen-
erative adversarial networks,” in Proc. European Conference on Computer
Vision Workshops (ECCVW August 23-28), 2020.

[195] Joachim Frank and Abbas Ourmazd, “Continuous changes in structure


mapped by manifold embedding of single-particle data in cryo-em,” Meth-
ods, vol. 100, pp. 61–67, 2016.

[196] Abbas Ourmazd, “Cryo-em, xfels and the structure conundrum in structural
biology,” Nature methods, vol. 16, no. 10, pp. 941–944, 2019.

[197] Ellen D Zhong, Tristan Bepler, Bonnie Berger, and Joseph H Davis, “Cryo-
drgn: Reconstruction of heterogeneous structures from cryo-electron micro-
graphs using neural networks,” bioRxiv, 2020.

[198] Kurt Hornik, Maxwell Stinchcombe, and Halbert White, “Multilayer feed-
forward networks are universal approximators,” Neural Networks, vol. 2, no. 5,
pp. 359–366, 1989.

[199] Carlos Oscar S Sorzano, A Jiménez, Javier Mota, José Luis Vilas, David
Maluenda, M Martínez, E Ramírez-Aportela, T Majtner, J Segura, Ruben
Sánchez-García, et al., “Survey of the analysis of continuous conformational
variability of biological macromolecules by electron microscopy,” Acta Crys-
tallographica Section F: Structural Biology Communications, vol. 75, no. 1,
pp. 19–32, 2019.

[200] Joakim Andén, Eugene Katsevich, and Amit Singer, “Covariance estimation
using conjugate gradient for 3d classification in cryo-EM,” pp. 200–204.

[201] Ali Dashti, Peter Schwander, Robert Langlois, Russell Fung, Wen Li, Ah-
mad Hosseinizadeh, Hstau Y Liao, Jesper Pallesen, Gyanesh Sharma, Vera A
Stupina, et al., “Trajectories of the ribosome as a brownian nanomachine,”
Proceedings of the National Academy of Sciences, vol. 111, no. 49, pp. 17492–
17497, 2014.

[202] Amit Moscovich, Amit Halevi, Joakim Andén, and Amit Singer, “Cryo-em
reconstruction of continuous heterogeneity by laplacian spectral volumes,”
Inverse Problems, vol. 36, no. 2, pp. 024003, 2020.

[203] Roy R. Lederman, Joakim Andén, and Amit Singer, “Hyper-Molecules: on


the Representation and Recovery of Dynamical Structures, with Application
to Flexible Macro-Molecular Structures in Cryo-EM,” Inverse Problems, vol.
36, Apr. 2020.

[204] Evan Seitz, Francisco Acosta-Reyes, Peter Schwander, and Joachim Frank,
“Simulation of cryo-em ensembles from atomic models of molecules exhibiting
continuous conformations,” BioRxiv, p. 864116, 2019.

[205] Thomas Debarre, Julien Fageot, Harshit Gupta, and Michael Unser, “B-
spline-based exact discretization of continuous-domain inverse problems with
generalized TV regularization,” IEEE Transactions on Information Theory,
vol. 65, no. 7, pp. 4457–4470, 2019.

[206] Shayan Aziznejad, Harshit Gupta, Joaquim Campos, and Michael Unser,
“Deep neural networks with trainable activations and controlled Lipschitz con-
stant,” arXiv preprint arXiv:2001.06263, 2020.

[207] Mohammad Zalbagi Darestani and Reinhard Heckel, “Can un-trained neural
networks compete with trained neural networks at image reconstruction?,”
arXiv preprint arXiv:2007.02471, 2020.

[208] Gauri Jagatap and Chinmay Hegde, “Algorithmic guarantees for inverse imag-
ing with untrained network priors,” in Advances in Neural Information Pro-
cessing Systems, 2019, pp. 14832–14842.

[209] Reinhard Heckel and Mahdi Soltanolkotabi, “Compressive sensing with un-
trained neural networks: Gradient descent finds the smoothest approxima-
tion,” arXiv preprint arXiv:2005.03991, 2020.

[210] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen, “Pro-
gressive growing of gans for improved quality, stability, and variation,”
arXiv:1710.10196, 2017.

[211] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee, “Accurate image super-
resolution using very deep convolutional networks,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2016, pp. 1646–
1654.

[212] Tero Karras, Samuli Laine, and Timo Aila, “A style-based generator ar-
chitecture for generative adversarial networks,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401–
4410.

[213] Miloš Vulović, Raimond BG Ravelli, Lucas J van Vliet, Abraham J Koster,
Ivan Lazić, Uwe Lücken, Hans Rullgård, Ozan Öktem, and Bernd Rieger,
“Image formation modeling in cryo-electron microscopy,” Journal of structural
biology, vol. 183, no. 1, pp. 19–32, 2013.

[214] Walter Rudin, Real and Complex Analysis, Tata McGraw-Hill Education,
1987.

[215] Andrew J Kurdila and Michael Zabarankin, Convex functional analysis,


Springer Science & Business Media, 2006.

[216] Reed Michael and Barry Simon, Methods of modern mathematical physics I:
Functional analysis, Academic Press, 1980.

[217] Ken Sauer and Charles Bouman, “A local update strategy for iterative recon-
struction from projections,” IEEE Trans. Signal Process., vol. 41, no. 2, pp.
534–548, 1993.

[218] Idris A Elbakri and Jeffrey A Fessler, “Statistical image reconstruction for
polyenergetic X-ray computed tomography,” IEEE Trans. Med. Imag., vol.
21, no. 2, pp. 89–99, 2002.

[219] Heinz H. Bauschke and Patrick L. Combettes, Convex Analysis and Monotone
Operator Theory in Hilbert Spaces, Springer, New York, NY, 2011.

[220] Eric F Pettersen, Thomas D Goddard, Conrad C Huang, Gregory S Couch,


Daniel M Greenblatt, Elaine C Meng, and Thomas E Ferrin, “UCSF
Chimera—A visualization system for exploratory research and analysis,”
Journal of computational chemistry, vol. 25, no. 13, pp. 1605–1612, 2004.

[221] Alexis Rohou and Nikolaus Grigorieff, “Ctffind4: Fast and accurate defocus
estimation from electron micrographs,” Journal of structural biology, vol. 192,
no. 2, pp. 216–221, 2015.

[222] Nikhil Biyani, Ricardo D Righetto, Robert McLeod, Daniel Caujolle-Bert,


Daniel Castano-Diez, Kenneth N Goldie, and Henning Stahlberg, “Focus: The
interface between data collection and data processing in cryo-em,” Journal
of structural biology, vol. 198, no. 2, pp. 124–133, 2017.
[223] Ricardo D Righetto, Nikhil Biyani, Julia Kowal, Mohamed Chami, and Hen-
ning Stahlberg, “Retrieving high-resolution information from disordered 2d
crystals by single-particle cryo-em,” Nature communications, vol. 10, no. 1,
pp. 1–10, 2019.
Curriculum Vitæ

HARSHIT GUPTA
[email protected]
BM 4.134, EPFL, Lausanne CH-1015, Switzerland
Homepage · Google Scholar

CURRENT RESEARCH FOCUS


My research is aimed at designing mathematically backed deep-learning algorithms for solving inverse problems
in imaging. I am interested in the modalities of Cryo-Electron Microscopy (Cryo-EM), Computed Tomog-
raphy (CT), and Magnetic Resonance Imaging (MRI).

EDUCATION

July 2015 - September 2020 École polytechnique fédérale de Lausanne (EPFL), Switzerland
Ph.D. in Electrical Engineering
Thesis: “From Classical to Unsupervised-Deep-Learning Methods
for Solving Inverse Problems in Imaging”.
Advisor: Prof. Michael Unser

July 2011 - May 2015 Indian Institute of Technology (IIT), Guwahati, India
B. Tech in Electronics and Communications Engineering

RESEARCH EXPERIENCES

July 2014 - May 2015 Indian Institute of Technology (IIT), Guwahati, India
Bachelor Thesis Project
Topic: “Blind Image Quality Assessment”
Advisor: Prof. Kannan Karthik

May 2014 - July 2014 École polytechnique fédérale de Lausanne (EPFL), Switzerland
Research Internship
Topic: “Interpolation using Derivatives”
Advisor: Prof. Michael Unser

May 2013 - July 2013 Indian Institute of Science (IISc), Bangalore, India
Research Internship
Topic: “Building a MATLAB GUI on Optic Disk
Localization using ℓ1-minimization”
Advisor: Prof. Chandra Sekhar Seelamantula

PUBLICATIONS
Preprints

8. Gupta H* , McCann M T* , Donati L, Unser M, “CryoGAN: A New Reconstruction Paradigm for Single-
particle Cryo-EM Via Deep Adversarial Learning,” bioRxiv 2020.03.20.001016, March 2020. * Co-first
authors. [PDF]

7. Jin K H* , Gupta H* , Yerly J, Stuber M, Unser M, “Time-Dependent Deep Image Prior for Dynamic
MRI,” IEEE Transactions on Medical Imaging, in Revision. * Co-first authors. [PDF]
Journals
6. Aziznejad S, Gupta H, Campos J, Unser M, “Deep Neural Networks with Trainable Activations and
Controlled Lipschitz Constant,” IEEE Transactions on Signal Processing, vol. 68, pp. 4688 - 4699,
August 2020. [PDF]
5. Yang F, Pham T, Gupta H, Unser M, Ma J, “Deep-learning projector for optical diffraction tomography,”
Optics Express, vol. 28(3), pp. 3905-3921, February 2020. [PDF]
4. Debarre T, Fageot J, Gupta H, Unser M, “B-spline-based exact discretization of continuous-domain
inverse problems with generalized TV regularization,” IEEE Transactions on Information Theory, vol.
65(7), pp.4457-4470, March 2019. [PDF]
3. Gupta H, Jin K H, Nguyen H Q, McCann M T, Unser M, “CNN-based projected gradient descent for
consistent CT image reconstruction,” IEEE Transactions on Medical Imaging, vol. 37(6), pp. 1440-1453,
May 2018. [PDF]
2. Gupta H, Fageot J, Unser M, “Continuous-domain solutions of linear inverse problems with Tikhonov
versus generalized TV regularization,” IEEE Transactions on Signal Processing, vol. 66(17), pp. 4670-
4684, July 2018. [PDF]
1. Unser M, Fageot J, Gupta H, “Representer Theorems for Sparsity-Promoting ℓ1 Regularization,” IEEE
Transactions on Information Theory, vol. 62(9), pp. 5167-5180, August 2016. [PDF]
Conference and Workshop Proceedings
4. Gupta H, Phan T H, Yoo J, Unser M, “Multi-CryoGAN: Reconstruction of continuous conformations
in Cryo-EM using Generative Adversarial Networks,” Proc. European Conference on Computer Vision
Workshops (ECCVW 2020) (Online, August 23-28), in press. [PDF]
3. Debarre T, Fageot J, Gupta H, Unser M, “Solving Continuous-domain Problems Exactly with Mul-
tiresolution B-splines,” Proc. IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2019) (Brighton, UK, May 12-17), pp. 5122-5126. [PDF]
2. Gupta H, Schmitter D, Uhlmann V, Unser M, “General surface energy for spinal cord and aorta segmen-
tation,” IEEE Proc. International Symposium on Biomedical Imaging (ISBI 2017), (Sydney, Australia,
April 18-21), pp. 319-322. [PDF]
1. Uhlmann V, Fageot J, Gupta H, Unser M, “Statistical optimality of Hermite splines,” Proc. International
Conference on Sampling Theory and Applications (SampTA 2015), (Washington, DC, US, May 25-29),
pp. 226-230. [PDF]
TEACHING EXPERIENCES

September 2015 - August 2020 Teaching Assistant at EPFL


Image Processing I - Autumn 2015, 2016, 2017, 2018, 2019
Image Processing II - Spring 2016, 2017, 2018, 2019, 2020

September 2019 - February 2020 Supervisor for Master Semester Project


Student: Huy Thong, EPFL
Topic: “Reconstructing multiple-conformations of particles
in Cryo-Electron Microscopy with deep learning”
January 2019 - June 2019 Co-supervisor for Master Semester Project
Student: Huy Thong, EPFL
Topic: “Implementing Deep-learning-based
iterative algorithm to solve inverse problem of MRI”

January 2019 - June 2019 Supervisor for Master Semester Project


Student: Joaquim Campos, EPFL
Topic: “Learning Spline-based activations for very deep learning”

September 2018 - February 2019 Co-supervisor for Master Thesis


Student: Matthieu Broisin, EPFL in collaboration with MIT, USA
Topic: “Segmentation of images using a Deep-Learning-based approach”

April 2017 - September 2017 Co-supervisor for Master Thesis


Student: Thomas Debarre, ENS Paris Saclay, Cachan, France
Topic: “B-spline-based exact discretization of continuous-domain
inverse problems with generalized TV regularization”

TECHNICAL STRENGTHS

Programming Languages Python, Matlab, Java, C, C++


Libraries PyTorch, MatConvNet
Software ImageJ, Fiji, Chimera

HONOURS
• Selected for the second round of the Texas Instruments Innovation Challenge: India Analog Design Contest, 2014.
• Selected among the national top 30 in Manthan, CAG, 2014, from more than 150 teams.
• Placed among the top 0.5% in the 2011 IIT Joint Entrance Exam (for admission to the undergraduate
program), taken by 500,000 students.
• Placed among the national top 1% in the National Standard Examination in Physics, 2010-11, organized by
the Indian Association of Physics Teachers.
• Secured AIR-171 in National Level Science Talent Search Examination, 2009.
• Secured 3rd position in SBM Inter School Science and Environment Quiz, 2008.
• Awarded the Talent Scholarship Award by Saraswati Siksha Sansthan, 2008.
