
———————————————————————————————————————

On Inexact Newton Methods


for Inverse Problems
in Banach Spaces
——————————————————–
For the attainment of the academic degree of a

DOCTOR OF NATURAL SCIENCES

approved by the Faculty of Mathematics of the

Karlsruhe Institute of Technology (KIT)

DISSERTATION

by
M.Sc. Fábio J. Margotti
from Nova Veneza

————————————————————————————————————————

Date of the oral examination: 15 July 2015


Referee: Prof. Dr. Andreas Rieder
Co-referee: Prof. Dr. Andreas Kirsch
I dedicate this work to my
beloved wife PATRÍCIA
Contents

1 Introduction 1

2 Geometry of Banach Spaces 7


2.1 The subdifferential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Smoothness of Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Convexity of Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Relations between smoothness and convexity . . . . . . . . . . . . . . . . . 14
2.5 Duality mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6 Bregman distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 The Inexact Newton Method K-REGINN 21


3.1 Solving the inner iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.1 Primal gradient methods . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1.2 Dual gradient methods . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.3 Tikhonov methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1.4 Mixed gradient-Tikhonov methods . . . . . . . . . . . . . . . . . . . 47

4 Convergence Analysis of K-REGINN 53


4.1 Termination and qualified decreasing residual . . . . . . . . . . . . . . . . . 53
4.2 Decreasing error and weak convergence . . . . . . . . . . . . . . . . . . . . . 58
4.3 Convergence without noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 Regularization property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5 Numerical Experiments 75
5.1 EIT - Continuum Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.1.1 Fréchet-differentiability of the forward operator . . . . . . . . . . . . 77
5.1.2 Computational implementation . . . . . . . . . . . . . . . . . . . . . 79
5.2 EIT - Complete Electrode Model . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2.1 Fréchet-differentiability of the forward operator . . . . . . . . . . . . 92
5.2.2 Computational implementation . . . . . . . . . . . . . . . . . . . . . 95

6 Conclusions and Final Considerations 103

A Stability Property of the Regularizing Sequences 107


Acknowledgements

I am very glad to express my most sincere gratitude to my advisor Prof. Dr. Andreas
Rieder for trusting my work from the very beginning. His excellent mathematical skills have
strongly contributed to the improvement of my knowledge, and his constant explanations
about the German language and culture greatly facilitated my everyday life. I would like to
thank Prof. Dr. Andreas Kirsch too, for being a referee of this work, for kindly writing
recommendations to support the extension of my scholarship on two occasions and for
helping to complement my mathematical education with his useful lectures as well as with
his fruitful discussions in our work group on inverse problems.
This work would have been impossible without the important financial support provided
by the Deutscher Akademischer Austauschdienst. I am also very thankful to the DAAD for
introducing me to the interesting German culture. In particular, I would like to say thank
you to Mrs. Maria Salgado for being so supportive every time I needed it. I am much obliged
to my previous advisor Prof. Dr. Antonio Leitão, who introduced me to this fascinating
area of inverse problems and provided strong support for my coming to Germany.
I also want to thank all my colleagues in the research group for scientific computing
IANM3 for creating a very pleasant and productive work environment. Special thanks
to Sonja Becker, Nathalie Sonnefeld, Johannes Ernesti, Julian Krämmer, Daniel Weiß,
Marcel Mikl, Ramin S. Nejad, Lydia Wagner, professors Tobias Jahnke and Christian Wieners
and my ex-colleagues Tudor Udrescu and Tim Kreutzmann.
My gratitude also goes to my colleague Robert Winkler for his always useful
discussions concerning the inverse problem of Electrical Impedance Tomography, as well as
for providing me with his very organized and well-written code for EIT-CEM. Further, I
want to thank Ekkachai Thawinan and Andreas Schulz for being good friends in these last
four years.
Last but not least, I am deeply thankful to my dear wife Patrícia, who has always been
very considerate and understanding and with whom I have shared each little moment of
this amazing experience in Germany. Muito obrigado!
Abstract

This monograph is concerned with the investigation of an inexact-Newton algorithm and
its adaptation to Kaczmarz methods. The new iterative methods aim at the construction
of regularized solutions for nonlinear ill-posed inverse problems in Banach spaces.
To perform an iteration, the Newton-type algorithm linearizes the original problem
around the current iterate and then applies a regularization technique in order to stably
solve the resulting linear system. Various methods are proposed for the task of solving the
generated linear system and a complete convergence analysis is carried out. Furthermore,
stability and regularization properties of the computed solutions are established.
The new methods are tested on the nonlinear and highly ill-posed inverse problem of
Electrical Impedance Tomography in numerical implementations presented at the end of
the work. The experiments simultaneously provide the necessary support for the theoretical
results and contribute a relevant improvement to the solutions of this inverse problem.

———————————————————————
Key words: Inverse problems, Ill-posed problems, Regularization theory, Iterative meth-
ods, Inexact-Newton methods, Electrical Impedance Tomography, Banach spaces.
Chapter 1

Introduction

This work is dedicated to the study of regularization methods for obtaining stable solutions
of nonlinear inverse ill-posed problems in Banach spaces. It is focused on iterative methods
and places particular emphasis on Newton-type algorithms. Further, a convergence analysis
of the investigated methods is provided and some numerical implementations support the
theoretical results.
In order to properly introduce the classical definition of ill-posedness due to Hadamard,
we start by defining a forward problem as a function which associates the cause of a specific
phenomenon with the effect produced by it. The effect of this phenomenon is usually called
the solution of the forward problem. To each forward problem belongs a correlated inverse
problem.
Inverse problems form a class of mathematical problems in which one aims to
determine the cause of a particular phenomenon (the solution of the inverse problem) by means
of observing the effect produced by it (the data). This kind of procedure encompasses an
extensive number of real-world problems described by mathematical models in the most diverse
fields, such as medicine, astronomy, engineering, geology, meteorology and a large variety of
physical problems, as well as image restoration and several other applications [30, 45, 48].
The biggest challenge in the resolution of such problems is related to the fact that they are
frequently ill-posed in the sense of Hadamard.
The French mathematician Jacques Hadamard [19] defined a problem as well-posed
if for this problem:

1. A solution exists (existence);

2. There exists at most one solution (uniqueness);

3. The solution depends continuously on the data (stability).

A problem which is not well-posed is called ill-posed.


If the data space is defined as the set of solutions of the forward problem, the existence of
a solution of the inverse problem is clear. However, a solution corresponding to measured
data may fail to exist if the data are corrupted by noise. In this case, the solution space
should be enlarged in order to guarantee the existence of a solution. Typical examples of
enlargement of the solution space are constructed by using the Moore-Penrose generalized
inverse for linear inverse problems, and by accepting distributional solutions of partial
differential equations in Sobolev spaces.
In contrast, a restriction of the set of admissible solutions can be imposed with
the purpose of ensuring uniqueness of a solution (the use of a minimal norm solution is
a common example). In this case, some a priori information about the solution must be
available. Non-uniqueness of solutions might also mean that insufficient data have been
gathered and consequently more data need to be observed.
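As a small illustration of both remedies (not from the thesis), the Moore-Penrose generalized inverse of a matrix simultaneously provides a least-squares solution (existence) of minimal norm (uniqueness); the matrix and right-hand side below are arbitrary placeholders.

```python
import numpy as np

# Underdetermined linear problem A x = b: infinitely many solutions exist.
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

x_dagger = np.linalg.pinv(A) @ b   # Moore-Penrose solution: minimal-norm least-squares
print(A @ x_dagger - b)            # residual ~ 0 (existence in the least-squares sense)
print(np.linalg.norm(x_dagger))    # smallest norm among all solutions (uniqueness)
```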


The third item in the definition of Hadamard is certainly the most delicate to deal
with. The non-fulfillment of the stability statement implies that unavoidable measurement
and round-off errors can be amplified by an arbitrarily large factor, severely compromising
the reliability of the computed solutions. Moreover, in contrast to the first two items of
Hadamard's definition, a reformulation of the problem in order to achieve stability is not
a trivial issue. The stability property depends on the topology of the involved spaces, and
an alteration of these topologies often modifies the original features of the problem and
deprives it of its fundamental characteristics; in other words, the reformulated problem
becomes meaningless.
The fulfillment of item 3 of Hadamard's definition, on the other hand, means that
the solution of the inverse problem corresponding to data corrupted by low-level noise
cannot be far from the searched-for solution. Hadamard believed, just as many of
his contemporaries, that a mathematical model could only correctly represent a natural
phenomenon if the associated inverse problem was well-posed (natura non facit saltum).
If this was not the case, the model was considered incorrectly formulated and the related
problem was called ill-posed. This was a controversial perception until the beginning of the last
century, when it was finally realized that an enormous number of real problems are actually
ill-posed when translated into any reasonable mathematical model. This realization initiated,
in the second half of the last century, a huge amount of research targeting methods capable of
stably solving ill-posed inverse problems. At this time, regularization theory was born.
In practical situations one usually has access only to a perturbed version of the data,
which is invariably contaminated by noise. As a consequence, the exact reconstruction of a
solution is unattainable and therefore the best that can be done is to find an approximate
solution. However, unless a regularization method¹ is employed, the ill-posedness phenomenon
combined with even very low noise levels can easily ruin the process of computing a solution.
Due to both its technological relevance and the challenging difficulties involved in the
development of regularization methods, inverse problems are still an active area of research
nowadays. This fact is reflected in the large number of journals, books and monographs
devoted to this subject. A particularly interesting group, widely applied in the resolution of
large-scale ill-posed inverse problems, is that of iterative regularization methods. The literature
in this area is vast, and the books [30, 15, 45] can be duly counted among the most complete
references concerning the subject in Hilbert spaces. The work [29] deserves to be mentioned
too. For more recent progress, including regularization theory in Banach spaces, consult
[48] and the references therein.
Known for their fast convergence properties, Newton-type methods are often regarded
as robust algorithms for solving nonlinear inverse problems. Among all the members of this
large group of remarkable iterative methods, we highlight the REGINN method. Introduced
in 1999 by A. Rieder [43], the REGularization based on INexact Newton iteration is a class
of inexact-Newton methods, which solves nonlinear inverse ill-posed problems by means
of solving a sequence of linearizations of the original problem. This algorithm linearizes
the original problem around the current iterate and then applies an iterative regularization
technique in the so-called inner iteration to find an approximate solution of the resulting
linearized system. Afterwards, this approximate solution is added to the current iterate
in the outer iteration in order to generate an update. Finally, the discrepancy principle is
employed to terminate the outer iteration.
The properties of REGINN in Hilbert spaces are well-known. Convergence results, regular-
ization property and convergence rates have been established under standard requirements
and with different iterative regularization methods in the inner iteration [43, 44, 46, 25]. A
¹If for each sequence of positive noise levels converging to zero, the corresponding sequence of approximate
solutions converges to an exact solution of the problem, then the related method is called a regularization
method and it is accordingly said to satisfy the regularization property.

few examples of possible inner iterations are given by gradient methods like the Conjugate
Gradient, Steepest Descent and Landweber methods and by Tikhonov-like methods such
as the Iterated-Tikhonov and Tikhonov-Phillips methods.
Perhaps the biggest disadvantage of Newton-like methods is the necessity of calculating
the derivative of the forward problem. This is normally an expensive process and it is
generally the bottleneck in numerical implementations. A wise approach to overcome this
obstacle is the utilization of a loping strategy. This procedure, proposed by S. Kaczmarz in
[27], splits the problem into a finite number of sub-problems, which are cyclically processed,
using one sub-problem at a time, in order to find a solution of the original problem.
Kaczmarz's technique drastically reduces the computational effort necessary to perform
a single iteration. The expected effect is an acceleration of the original method with a
consequent gain in convergence speed. Kaczmarz methods have an additional advantage:
they more appropriately describe problems whose data are gathered through a set of individual
measurements, which is the case in Electrical Impedance Tomography, for example (see the
numerical experiments in Chapter 5).
There are many papers concerning Kaczmarz methods in the context of ill-posed problems,
but, similar to REGINN, most results are proven in the Hilbert space framework
[31, 4, 20, 7]. Several inverse problems, however, are naturally described in more general
contexts than Hilbert spaces. The Lebesgue spaces L^p(Ω), as well as the Sobolev
spaces W^{n,p}(Ω) and the space of p-summable sequences ℓ^p(ℕ), for different values of
p ∈ [1, ∞], are classical examples of Banach spaces which model numerous inverse problems
more appropriately. Further, describing an inverse problem in more suitable Banach
spaces brings supplementary benefits, such as preserving the sparsity characteristics of the
searched-for solution and increasing data accuracy when dealing with impulsive noise².
These improvements are commonly obtained by replacing, in the solution
and data spaces, the parameter p = 2, which characterizes Hilbert spaces, with
p ≈ 1; see Daubechies et al. [12].
Figure 1.1 illustrates the advantage of using more general Banach spaces in the description
of inverse problems. The searched-for (sparse) electrical conductivity is reconstructed
in the Electrical Impedance Tomography problem using 1% impulsive noise, and two
different frameworks are compared: in the first case, the solution space X and the data space Y are
both the Hilbert space L², while in the second one, the conductivity is reconstructed in the
Banach spaces X = Y = L^{1.01}.
For more examples of inverse problems modelled in Banach spaces, see the numerical
experiments in Chapter 5 and the book of Schuster et al [48].
The difficulties of carrying out a convergence analysis in Banach spaces grow massively if
the solution and data spaces have poor smoothness/convexity properties. It is not
straightforward to modify classical methods from Hilbert spaces in order to adjust them to more
complicated Banach spaces. As far as we know, the first version of REGINN in Banach
spaces was published by Q. Jin in [24], where a weak convergence result was proven.
The first strong convergence result was accomplished for the combination
Kaczmarz/inexact-Newton/Banach spaces in our previous work [40], where the
Landweber method was employed as inner iteration. The most recent progress has been
made using the Iterated-Tikhonov method as inner iteration [39].
The present thesis contributes to extending the results mentioned above, providing a
relatively general convergence analysis of a Kaczmarz version of REGINN (in short K-REGINN) in
Banach spaces, valid at the same time for different methods in the inner iteration. This
work is structured as follows:

• In Chapter 2, the necessary preliminaries concerning the geometry of Banach spaces


²Impulsive noise is a kind of sparsely distributed noise, common in many practical applications, see e.g. [11].

[Figure 1.1: Sparse conductivity (picture on the left) reconstructed in the Electrical
Impedance Tomography problem with 1% impulsive noise. The Hilbert spaces X = Y = L²
and the Banach spaces X = Y = L^{1.01} were used to reconstruct the pictures displayed
in the middle and on the right, respectively.]

are presented. First, the concepts of convexity and smoothness are discussed. These
concepts determine ”how favorable” the geometry of a Banach space is. Hilbert
spaces are usually regarded as the Banach spaces with the most favorable geometrical
features, being at the same time the most convex and the smoothest Banach spaces.
After a discussion about the connections between convexity and smoothness, the
definition of duality mapping is introduced. Inspired by the Riesz Representation
Theorem, this mapping associates each element x in the solution space with a subset
Jp (x) of its dual space so that the resulting duality pairing hx∗ , yi , with x∗ ∈ Jp (x) ,
mimics the properties of the inner product hx, yi in Hilbert spaces. At the end of this
chapter, the Bregman distance is presented. It serves to simulate in Banach spaces
the polarization identity and replaces the norm in some specific situations. Finally,
using the Xu-Roach famous Theorems [53], some connections between the Bregman
distance and the standard norm are proven.

• The Kaczmarz version of REGINN is introduced in Chapter 3. In order to solve the


inner iteration in Banach spaces, we carry out in Section 3.1 an adaptation of dif-
ferent regularization techniques from Hilbert to more general Banach spaces. The
gradient methods are first considered. In Hilbert spaces, these algorithms update the
current iterate adding a multiple of the gradient of the residual-squared-norm func-
tional. The primal and dual gradient methods, presented respectively in Subsections
3.1.1 and 3.1.2, are the Banach space versions. The fundamental difference between
these two kinds of gradient methods is the space where the iteration occurs: while the
primal gradient methods have an iteration occurring in the original space, the dual
version performs this iteration in the dual space. Among all the dual gradient meth-
ods of Subsection 3.1.2, we highlight the important Landweber, Modified Steepest
Descent and Decreasing Error methods, the last one being a new strategy, first
presented in this work. In Subsection 3.1.3 the Tikhonov-like methods are introduced
and special attention is paid to the study of the relevant Iterated-Tikhonov and
Tikhonov-Phillips methods. It is well known that Tikhonov-type methods provide
further stability to the computations; however, the necessity of minimizing functionals
in order to perform the inner iteration substantially increases the effort of these
methods.
methods. To simplify this task, a mixed version of Tikhonov and dual gradient meth-
5

ods is suggested in Subsection 3.1.4. This novel algorithm yields extra stability to the
computations of the inner iteration without requiring the solution of an optimization
problem.

• Most methods presented in Section 3.1 have a similar nature and share similar prop-
erties. This particular aspect makes possible a general convergence analysis, which
is implemented in Chapter 4. The convergence analysis of K-REGINN presented in
Chapter 4 is carried out without considering any specific method in the inner itera-
tion. It is made assuming only specific properties of the sequence generated in the
inner iteration, which makes this convergence analysis valid for every method having
the required properties. Since many of the requested properties have already been
proven for the methods presented in Section 3.1, the results of Chapter 4 hold true
in particular for these methods. We highlight the main results: In Section 4.1 it is
proven that REGINN terminates and has a decreasing residual behavior whenever a
primal gradient method or a Tikhonov-Phillips-like method is used as inner itera-
tion. For the remaining methods, we further provide in the subsequent sections the
proofs of strong convergence of REGINN in the noiseless situation and of the regu-
larization property, see Theorems 43 and 47. Moreover, a decreasing error behavior
and a weak convergence result for the Kaczmarz version are provided in Theorem 38
and Corollary 41 respectively. Additionally, for the dual gradient Landweber method,
Iterated-Tikhonov and Tikhonov-Phillips methods, strong convergence in the noise-
less situation and the regularization property are proven for the Kaczmarz versions
too.

• In Chapter 5, the performance of K-REGINN is tested for solving the inverse problem of
Electrical Impedance Tomography. Section 5.1 presents the Continuum Model [8],
while the Complete Electrode Model [50] is presented in Section 5.2. At the beginning
of each section, a brief explanation of the mathematical model is given, and in the
subsequent subsection the evaluation of the derivatives is discussed. Further, some
numerical experiments are performed in Subsections 5.1.2 and 5.2.2 and the respective
results are discussed.
Chapter 2

Geometry of Banach Spaces

This chapter lists the most relevant concepts and facts concerning the geometry of Banach
spaces. The main goals are to point out and explain the main results about convexity
and smoothness of Banach spaces, duality mappings, Bregman distances and the properties
connecting these notions. We prove only a few results and suggest references
where the remaining proofs can be found.
We consider the book of Cioranescu [10] a good reference. It presents in a very clear
and understandable way the main ideas concerning the uniform convexity and uniform
smoothness of Banach spaces and their connections with duality mappings. However, it
lacks the important concepts of convexity and smoothness of power type. Moreover,
Cioranescu's book was written before the famous paper of Xu and Roach [53]
was published, which means that the main relations between convexity/smoothness and the
Bregman distances are not present. To understand this significant issue and learn its main
results, we suggest, beyond the paper of Xu and Roach itself, the article [47], the books
[48, 9] and the references therein.
The theoretical results of this work cover simultaneously real and complex Banach
spaces. However, as we will discuss in a moment, their dual spaces have analogous properties,
which permits us to assume without loss of generality that the Banach space in
question is always a real Banach space. To make this clear, we give the following definition.

Definition 1 Let X be a normed space defined over the field k (either R or C). We call
the set
X ∗ := {x∗ : X → k : x∗ linear and continuous}
the dual space of X. We write XR to represent the vector space X regarded as a real
vector space. Accordingly,

XR∗ = {x∗ : XR → R : x∗ linear and continuous}


= {x∗ : X → R : x∗ R − linear and continuous}

denotes the dual space of XR . An element of X ∗ or XR∗ is called a functional.

As R and C are Banach spaces, so is X ∗ , independent of X being a Banach space itself.


Let X be a complex Banach space. It is easy to see that if x∗ ∈ X ∗ , then the function
Re x∗ : X → R defined by

hRe x∗ , xi := Re hx∗ , xi , x ∈ X,

belongs to XR∗ . Further, the operator T : X ∗ → XR∗ defined by T x∗ = Re x∗ satisfies

T (x∗ + λy ∗ ) = T x∗ + λT y ∗ for all x∗ , y ∗ ∈ X ∗ and λ ∈ R,


which means that T is R−linear.


Let $x^* \in X^*$ be fixed. We claim that for each $x \in X$, there exists a vector $\hat{x} \in X$ such that
$$\|\hat{x}\| = \|x\| \quad\text{and}\quad \langle \operatorname{Re} x^*, \hat{x}\rangle = |\langle x^*, x\rangle| . \qquad (2.1)$$
In fact, this is obviously true if $\langle x^*, x\rangle = 0$. Suppose that $\langle x^*, x\rangle \neq 0$ and define the vector
$\hat{x} := \overline{\operatorname{sgn}(\langle x^*, x\rangle)}\, x$, with $\operatorname{sgn}(z) := z/|z|$ being the sign function. Thus $\|\hat{x}\| = \|x\|$ and
$$\langle x^*, \hat{x}\rangle = \frac{\langle x^*, x\rangle\, \overline{\langle x^*, x\rangle}}{|\langle x^*, x\rangle|} = |\langle x^*, x\rangle| \in \mathbb{R},$$
which proves the claim. Using this property, it is not so hard to prove that (see e.g. [6, Lemma 6.39])
$$\|\operatorname{Re} x^*\|_{L(X,\mathbb{R})} = \|x^*\|_{L(X,\mathbb{C})}$$
for all $x^* \in X^*$ and consequently $T$ is an isometric $\mathbb{R}$-linear operator. In particular $T$ is
injective. Writing now $\langle x^*, x\rangle = a + ib$, with $a, b \in \mathbb{R}$, we see that $a = \operatorname{Re}\langle x^*, x\rangle$ and
$$\operatorname{Re}\langle x^*, ix\rangle = \operatorname{Re}(i\langle x^*, x\rangle) = \operatorname{Re}(i(a + ib)) = -b.$$
Hence
$$\langle x^*, x\rangle = \langle \operatorname{Re} x^*, x\rangle - i\,\langle \operatorname{Re} x^*, ix\rangle, \quad x \in X,$$
which implies that $T$ is surjective. Indeed, if $\hat{x}^* \in X_R^*$, then the operator $x^* : X \to \mathbb{C}$ defined by
$$\langle x^*, x\rangle = \langle \hat{x}^*, x\rangle - i\,\langle \hat{x}^*, ix\rangle, \quad x \in X,$$
is $\mathbb{C}$-linear and continuous, which means that $x^* \in X^*$. Further, it is clear that
$$\langle \operatorname{Re} x^*, x\rangle = \operatorname{Re}\langle x^*, x\rangle = \langle \hat{x}^*, x\rangle \quad\text{for all } x \in X$$
and consequently $\operatorname{Re} x^* = \hat{x}^*$. This leads to the conclusion that $T$ is an $\mathbb{R}$-linear isometric
isomorphism.

Remark 2 The above result implies in particular that

• the spaces (X ∗ )R and XR∗ are isometrically isomorphic;


• if X and Y are complex Banach spaces and A : X → Y is a bounded linear operator,
then the operator AR : XR → YR defined by AR x = Ax is linear and bounded with the
same operator norm and its Banach adjoint operator satisfies A∗R = T A∗ T −1 ;
• the complex pre-Hilbert space H is a Hilbert space if and only if HR is a Hilbert space
with the inner product hx, yiHR := Re hx, yiH ;

All these considerations show that it suffices to consider only real Banach spaces (resp.
real Hilbert spaces). For this reason, we assume without loss of generality that
X is always a real Banach space for the rest of this work. Accordingly, the Hilbert space
H is always considered a real Hilbert space.

2.1 The subdifferential


Definition 3 Let X be a vector space and R := (−∞, ∞]. The function f : X → R is said
to be convex if
f (λx + (1 − λ) y) ≤ λf (x) + (1 − λ) f (y)
for all x, y ∈ X and λ ∈ (0, 1) . If this inequality is strict whenever x 6= y, then f is said
to be strictly convex. The set D(f ) := {x ∈ X : f (x) < ∞} is called the effective
domain of f . Finally, the function f is said to be proper if D(f ) 6= ∅.

A convex function f always has a convex effective domain, i.e., λx + (1 − λ) y ∈D(f )


whenever x, y ∈D(f ) and λ ∈ (0, 1) . Further, a proper convex function f is continuous
in the interior of its effective domain int(D (f )) if X is finite dimensional. If X is infinite
dimensional, the same result holds whenever f is bounded from above on a neighborhood
of an interior point x0 ∈int(D (f )) [10, Theo. 1.10 and Cor. 1.11, Ch. I].

Definition 4 Let $X$ and $Y$ be normed vector spaces, $C \subset X$ open, $F : C \to Y$ a function
and $x_0 \in C$. The directional derivative of $F$ at $x_0$ in the direction $x \in X$ is defined by
the limit
$$F_+(x_0, x) := \lim_{t\to 0^+} \frac{F(x_0 + tx) - F(x_0)}{t},$$
if it exists.
If there exists a linear and bounded operator $A : X \to Y$ satisfying
$$Ax = \lim_{t\to 0} \frac{F(x_0 + tx) - F(x_0)}{t}$$
for each $x \in X$, then the function $F$ is said to be Gateaux-differentiable (in short
G-differentiable) at $x_0$ and we denote by $F'(x_0) := A$ the G-derivative of $F$ at $x_0$.
We say that $F$ is Fréchet-differentiable (in short F-differentiable) at $x_0$, if it is
G-differentiable at this point and
$$\lim_{t\to 0}\, \sup_{\|x\|=1}\left\| \frac{F(x_0 + tx) - F(x_0)}{t} - F'(x_0)x \right\| = 0.$$
In this case, the G-derivative of $F$ at $x_0$, $F'(x_0)$, is also called the F-derivative. Finally, $F$
is said to be continuously F-differentiable (resp. continuously G-differentiable)
in $C$ if the F-derivative (resp. G-derivative) $F' : C \to L(X, Y)$ is a continuous function.
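As a quick numerical illustration (not part of the thesis), the one-sided difference quotient in this definition can be evaluated directly; the functional $f(x) = \frac{1}{2}\|x\|^2$, whose G-derivative at $x_0$ acts as $x \mapsto \langle x_0, x\rangle$ (see Example 6 below), serves as a simple test case. All numerical values are illustrative.

```python
import numpy as np

def directional_derivative(F, x0, x, t=1e-6):
    """One-sided difference quotient approximating F_+(x0, x)."""
    return (F(x0 + t * x) - F(x0)) / t

# Test case: f(v) = 0.5*||v||^2 has G-derivative f'(x0)x = <x0, x>.
f = lambda v: 0.5 * np.dot(v, v)
x0 = np.array([1.0, 2.0, -1.0])
x = np.array([0.5, -1.0, 2.0])
print(directional_derivative(f, x0, x), np.dot(x0, x))  # approximately equal
```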

It is clear from the definition that $F$ F-differentiable at $x_0$ implies $F$ G-differentiable at $x_0$.
The second condition, in turn, implies that the directional derivative of $F$ at $x_0$ exists in all
directions $x \in X$ and
$$F_+(x_0, x) = F'(x_0)x = F_-(x_0, x),$$
where
$$F_-(x_0, x) := \lim_{t\to 0^-} \frac{F(x_0 + tx) - F(x_0)}{t} = -F_+(x_0, -x).$$
Under appropriate assumptions, the important chain rule holds true for F-differentiable
functions $F : X \to Y$ and $G : Y \to Z$ (see [10, Theo. 1.14, Ch. I]): $(G(F(x)))' = G'(F(x))\,F'(x)$
for all $x \in X$.
If $f : X \to \overline{\mathbb{R}}$ is a proper convex function, then for any $x_0 \in \operatorname{int}(D(f))$, the directional
derivatives $f'_+(x_0, x)$ and $f'_-(x_0, x)$ exist in all directions $x \in X$ and $f'_-(x_0, x) \le f'_+(x_0, x)$.
It is clear that this inequality becomes an equality if $f$ is G-differentiable at $x_0$. Reciprocally,
$f'_-(x_0, x) = f'_+(x_0, x)$ for all $x \in X$ implies the G-differentiability of $f$ and, in this case,
$f'_-(x_0, x) = f'_+(x_0, x) = f'(x_0)x$.

Definition 5 Let X be a Banach space, f : X → R be a function and let 2X denote the set
of all subsets of X. We say that f is subdifferentiable at a point x0 ∈ X if there exists
a functional x∗0 ∈ X ∗ , called subgradient of f at x0 , such that

f (x) − f (x0 ) ≥ hx∗0 , x − x0 i for all x ∈ X.



The set of all subgradients of f at x0 is denoted by ∂f (x0 ) and the mapping ∂f : X → 2X
is called the subdifferential of f.

The equivalence
$$\alpha f(x) - \alpha f(x_0) \ge \langle x_0^*, x - x_0\rangle \iff f(x) - f(x_0) \ge \left\langle \tfrac{1}{\alpha}x_0^*,\, x - x_0\right\rangle$$
for any $\alpha > 0$, leads to the conclusion: $x_0^* \in \partial(\alpha f)(x_0) \iff \frac{1}{\alpha}x_0^* \in \partial f(x_0)$, that is,
$\partial(\alpha f)(x_0) = \alpha\,\partial f(x_0)$ for each $\alpha > 0$.
A subdifferentiable function f is convex and lower semi-continuous in any open convex
set C ⊂D(f ) . Reciprocally, a proper convex and lower semi-continuous function f is always
subdifferentiable on int(D (f )) .
The optimality condition 0 ∈ ∂f (x0 ) is equivalent to f (x0 ) ≤ f (x) for all x ∈ X, which
means that x0 is a minimizer of f.
Let f be proper and convex and let x0 ∈int(D (f )) . Then x∗0 ∈ ∂f (x0 ) if and only if
f−0 (x0 , x) ≤ hx∗0 , xi ≤ f+0 (x0 , x) , for all x ∈ X,
see [10, Prop. 2.5, Ch. I]. From it, we conclude that f has a unique subgradient at
x0 ∈int(D (f )) if and only if f is G-differentiable at x0 . In this case, f 0 (x0 ) x = hx∗0 , xi for
all x ∈ X, this is, ∂f (x0 ) = {f 0 (x0 )} .
If f1 and f2 are two convex functions defined on X such that there is a point x0 ∈D(f1 ) ∩D(f2 )
where f1 is continuous, then [10, Theo. 2.8, Ch. I]
∂ (f1 + f2 ) (x) = ∂f1 (x) + ∂f2 (x) for all x ∈ X.
If A : X → Y is a bounded linear operator between the Banach spaces X and Y and
f : Y → R is convex and continuous at some point of the range of A, then
∂ (f ◦ A) (x) = A∗ (∂f (Ax)) for all x ∈ X,
where A∗ : Y ∗ → X ∗ represents the Banach adjoint of A (cf. [49]). Finally, for f : Y → R
convex and b ∈ Y fixed, see [48, Theo. 2.24],
∂ (f (· − b)) (y) = (∂f ) (y − b) for all y ∈ Y.

Example 6 Let $X$ and $Y$ be Banach spaces. Define the Tikhonov functional
$$T_\alpha(x) := \frac{1}{r}\|Ax - b\|^r + \frac{\alpha}{p}\|x\|^p,$$
with $p, r > 1$, $\alpha > 0$, $A : X \to Y$ linear and bounded and $b \in Y$. Using the optimality
condition, we find that $x_0 \in X$ minimizes $T_\alpha$ if and only if
$$0 \in \partial T_\alpha(x_0) = A^*\left(\partial\left(\tfrac{1}{r}\|\cdot\|^r\right)(Ax_0 - b)\right) + \alpha\,\partial\left(\tfrac{1}{p}\|\cdot\|^p\right)(x_0). \qquad (2.2)$$
We concentrate on the particular case $p = r = 2$. In a Hilbert space, making use of
the definition of the G-derivative and the polarization identity (2.5), it is not too difficult to
prove that the convex function $f(x) = \frac{1}{2}\|x\|^2$ satisfies $f'(x_0)x = \langle x_0, x\rangle$ for all $x \in X$.
Consequently $\partial f(x_0)$ is single valued and $\partial f(x_0) = f'(x_0) = x_0$, that is, $\partial\left(\frac{1}{2}\|\cdot\|^2\right)$ is just
the identity operator. Equality (2.2) assumes in this case the form (for $p = r = 2$)
$$0 = A^*(Ax_0 - b) + \alpha x_0 \implies x_0 = (A^*A + \alpha I)^{-1}A^*b.$$
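To make the Hilbert space case above concrete, here is a minimal numerical sketch in Python/NumPy; the matrix A, data b and parameter alpha are arbitrary placeholders. It forms the regularized normal equations and checks the optimality condition numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))   # bounded linear operator (a matrix here)
b = rng.standard_normal(20)         # data
alpha = 0.1                         # regularization parameter

# Tikhonov solution for p = r = 2:  x0 = (A^T A + alpha I)^{-1} A^T b
x0 = np.linalg.solve(A.T @ A + alpha * np.eye(10), A.T @ b)

# Optimality condition 0 = A^T (A x0 - b) + alpha x0 holds up to round-off.
print(np.linalg.norm(A.T @ (A @ x0 - b) + alpha * x0))
```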

Using the definition of subdifferential, one can prove that for all x ∈ X\ {0} [10, Prop.
3.4, Ch. I],
∂ kxk = {x∗ ∈ X ∗ : hx∗ , xi = kxk and kx∗ k = 1} . (2.3)
This result motivates the definition of a smooth Banach space.

2.2 Smoothness of Banach spaces


Definition 7 A Banach space X is called smooth if for every x 6= 0, there is a unique
x∗ ∈ X ∗ such that hx∗ , xi = kxk and kx∗ k = 1.

The existence of x∗ in the above definition is guaranteed by the Theorem of Hahn-


Banach. Thus, smoothness of Banach spaces is only a matter of uniqueness of x∗ .

Remark 8 We want to mention the fact that from the isomorphism $T$ constructed at the
beginning of this chapter it follows that a complex Banach space is smooth if and only if its
corresponding real Banach space is smooth. This is a direct consequence of the equivalence
$$\langle x^*, x\rangle = \|x\| \text{ and } \|x^*\| = 1 \iff \langle Tx^*, x\rangle = \|x\| \text{ and } \|Tx^*\| = 1, \qquad (2.4)$$
which we prove now. Indeed, first remember that $\|Tx^*\| = \|x^*\|$. Now, if $\langle x^*, x\rangle = \|x\| \in \mathbb{R}$, then
$$\|x\| = \langle x^*, x\rangle = \operatorname{Re}\langle x^*, x\rangle = \langle Tx^*, x\rangle.$$
Reciprocally, assuming $\langle Tx^*, x\rangle = \|x\|$, we define the vector $\hat{x} := \overline{\operatorname{sgn}(\langle x^*, x\rangle)}\,x$ and obtain from (2.1),
$$\|x\| = \langle Tx^*, x\rangle = \operatorname{Re}\langle x^*, x\rangle \le |\langle x^*, x\rangle| = \operatorname{Re}\langle x^*, \hat{x}\rangle = \langle Tx^*, \hat{x}\rangle \le \|Tx^*\|\,\|\hat{x}\| = \|\hat{x}\| = \|x\|.$$
Consequently, $\operatorname{Re}\langle x^*, x\rangle = |\langle x^*, x\rangle|$, which implies that $\langle x^*, x\rangle = \operatorname{Re}\langle x^*, x\rangle = \|x\|$ and the
proof is complete.
As the subdifferential of $\|\cdot\| : X \to \mathbb{R}$ is always a subset of $X_R^*$, we see that in a complex
Banach space $X$, the set in (2.3) is actually described by
$$\partial\|x\| = \{x^* \in X_R^* : \langle x^*, x\rangle = \|x\| \text{ and } \|x^*\| = 1\}.$$
But as these two sets can be identified with each other using the isomorphism $T$ (see (2.4)
above), (2.3) can be used, in a slight abuse of notation, even in complex Banach spaces.

Using the last definition and (2.3) , we conclude that a Banach space is smooth if and
only if the subdifferential of the convex function f (x) = kxk is single valued for all x ∈
X\ {0} . This is in turn, an equivalent condition to the G-differentiability of f in X\ {0} ,
i.e., the Banach space X is smooth if and only if the norm-function is G-differentiable in
X\ {0} .

Definition 9 Let $X$ be a Banach space. The function $\rho_X : \mathbb{R}^+ \to \mathbb{R}$ defined by
$$\rho_X(\tau) := \frac{1}{2}\sup\{\|x + \tau y\| + \|x - \tau y\| - 2 : \|x\| = \|y\| = 1\},$$
is called the modulus of smoothness of $X$. The space $X$ is said to be uniformly
smooth if $\lim_{\tau\to 0^+}\rho_X(\tau)/\tau = 0$ and it is called $p$-smooth, for $p > 1$ fixed, if $\rho_X(\tau) \le C_p\tau^p$,
where $C_p > 0$ is a constant independent of $\tau$.

$X$ $p$-smooth implies that $\rho_X(\tau)/\tau \le C_p\tau^{p-1} \to 0$ as $\tau \to 0$, which in turn implies that
$X$ is uniformly smooth.
Define the function $f(x) = \|x\|$ and observe that
$$\frac{\rho_X(\tau)}{\tau} = \frac{1}{2}\sup\left\{ \frac{f(x + \tau y) - f(x)}{\tau} + \frac{f(x + \tau(-y)) - f(x)}{\tau} : \|x\| = \|y\| = 1 \right\}.$$

Assume now that $X$ is uniformly smooth. Then for all $x, y \in X$ with $\|x\| = \|y\| = 1$,
$$0 = \lim_{\tau\to 0^+}\left( \frac{f(x + \tau y) - f(x)}{\tau} + \frac{f(x + \tau(-y)) - f(x)}{\tau}\right) = f'_+(x, y) - f'_-(x, y).$$
It is not so difficult to extend this equality to all $y \in X$, which implies that $f$ is
G-differentiable at $x$ for all $x \in X$ satisfying $\|x\| = 1$. Finally, one can prove that the
result actually holds for all $x \in X\setminus\{0\}$, which means that $X$ is smooth. Hence, the
uniform smoothness of $X$ implies the smoothness of this space. The converse is however
not true, as can be seen in [10, Theo. 3.12, Ch. I], where a proof of the equivalence
between uniform smoothness and uniform F-differentiability of the norm-function on the
unit sphere is given. In particular, the norm-function is F-differentiable in $X\setminus\{0\}$ provided
$X$ is uniformly smooth.

Example 10 With the help of the polarization identity
$$\frac{1}{2}\|x - y\|^2 = \frac{1}{2}\|x\|^2 - \langle x, y\rangle + \frac{1}{2}\|y\|^2, \qquad (2.5)$$
which holds true for all vectors $x$ and $y$ in an arbitrary Hilbert space $H$, one can prove that
$$(\|x + \tau y\| + \|x - \tau y\|)^2 \le 2\left(\|x + \tau y\|^2 + \|x - \tau y\|^2\right) = 4\left(\|x\|^2 + \tau^2\|y\|^2\right), \qquad (2.6)$$
which implies that
$$\frac{1}{2}(\|x + \tau y\| + \|x - \tau y\|) \le \sqrt{1 + \tau^2}$$
for $\|x\| = \|y\| = 1$. Consequently, $\rho_H(\tau) \le \sqrt{1 + \tau^2} - 1 < \frac{1}{2}\tau^2$, which proves that a Hilbert
space is always 2-smooth. Further, as $\tau > 0$, the equality in (2.6) holds if and only if
$\|x + \tau y\| = \|x - \tau y\|$, i.e., $\langle x, y\rangle = 0$. This is possible if and only if $\dim H \ge 2$, and in this
case we obtain $\rho_H(\tau) = \sqrt{1 + \tau^2} - 1$.
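A small Monte-Carlo sketch (illustrative only, not from the thesis) approximates $\rho_H(\tau)$ for $H = \mathbb{R}^2$ by sampling pairs of unit vectors and compares it with the closed form $\sqrt{1+\tau^2} - 1$ derived above; the sample size is an arbitrary choice and the supremum is approximated from below.

```python
import numpy as np

rng = np.random.default_rng(1)

def rho_R2(tau, samples=20000):
    """Monte-Carlo approximation (from below) of the modulus of smoothness of R^2."""
    best = 0.0
    for _ in range(samples):
        x = rng.standard_normal(2); x /= np.linalg.norm(x)   # random unit vectors
        y = rng.standard_normal(2); y /= np.linalg.norm(y)
        val = 0.5 * (np.linalg.norm(x + tau * y) + np.linalg.norm(x - tau * y)) - 1.0
        best = max(best, val)
    return best

for tau in (0.1, 0.5, 1.0):
    print(tau, rho_R2(tau), np.sqrt(1.0 + tau**2) - 1.0)   # the two values nearly agree
```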

2.3 Convexity of Banach spaces


Definition 11 The Banach space X is said to be strictly convex if for all x, y ∈ X
with x 6= y and kxk = kyk = 1 it holds kλx + (1 − λ) yk < 1 for all λ ∈ (0, 1) .

The above definition states that in a strictly convex Banach space, the line segment
connecting two points in the unit sphere has only points lying inside this sphere, except for
the extremal points themselves. It is possible to prove (see [10, Prop. 1.2, Ch. II]) that a
Banach space is strictly convex if and only if the unit sphere has no line segments. It is also
an equivalent condition that $\left\|\frac{1}{2}(x + y)\right\| < 1$ for all $x, y \in X$ with $x \neq y$ and $\|x\| = \|y\| = 1$
(the midpoint of a line segment with extremal points lying in the unit sphere does not
belong to this sphere).
The strict convexity of the Banach space X is also equivalent to the strict convexity of
the function h (x) = kxkp , for any p > 1 fixed. In fact, let p > 1 and x, y ∈ X be fixed. As
the function f : R+ + p
0 → R0 , t 7−→ t is strictly convex, we get from triangle inequality,

h (λx + (1 − λ) y) = kλx + (1 − λ) ykp ≤ (λ kxk + (1 − λ) kyk)p (2.7)


= f (λ kxk + (1 − λ) kyk) ≤ λf (kxk) + (1 − λ) f (kyk)
= λh (x) + (1 − λ) h (y) ,
for all λ ∈ (0, 1) . Hence h is convex. Suppose now that h is not strictly convex. Then
there exist x, y ∈ X with x 6= y and λ0 ∈ (0, 1) such that h (λ0 x + (1 − λ0 ) y) = λ0 h (x) +
(1 − λ0 ) h (y). From (2.7) ,
f (λ0 kxk + (1 − λ0 ) kyk) = λ0 f (kxk) + (1 − λ0 ) f (kyk) ,

which implies that $\|x\| = \|y\| \neq 0$ because $f$ is strictly convex. Again from (2.7),
$$\|\lambda_0 x + (1-\lambda_0)y\| = \lambda_0\|x\| + (1-\lambda_0)\|y\| = \|x\| = \|y\|,$$
that is, $\left\|\lambda_0\frac{x}{\|x\|} + (1-\lambda_0)\frac{y}{\|y\|}\right\| = 1$, which implies that $X$ is not strictly convex. Conversely,
suppose that $X$ is not strictly convex. Then there exist $\lambda \in (0,1)$ and $x, y \in X$ with $x \neq y$
and $\|x\| = \|y\| = 1$ such that
$$h(\lambda x + (1-\lambda)y) = \|\lambda x + (1-\lambda)y\|^p = 1 = \lambda\|x\|^p + (1-\lambda)\|y\|^p = \lambda h(x) + (1-\lambda)h(y),$$
which implies that $h$ is not strictly convex.

Definition 12 Let $X$ be a Banach space. The function $\delta_X : (0, 2] \to \mathbb{R}$ defined by
$$\delta_X(\varepsilon) := \inf\left\{ 1 - \left\|\tfrac{1}{2}(x + y)\right\| : \|x\| = \|y\| = 1 \text{ and } \|x - y\| \ge \varepsilon \right\},$$
is called the modulus of convexity of $X$. The space $X$ is said to be uniformly convex
if $\delta_X(\varepsilon) > 0$ for all $0 < \varepsilon \le 2$ and it is called $p$-convex, for $p > 1$ fixed, if $\delta_X(\varepsilon) \ge K_p\varepsilon^p$,
where $K_p > 0$ is a constant independent of $\varepsilon$.

It is easy to prove that $p$-convexity implies uniform convexity.
Fix $x, y \in X$ with $x \neq y$ and $\|x\| = \|y\| = 1$. If $X$ is uniformly convex, then for
$\varepsilon := \|x - y\| \in (0, 2]$ it holds $\delta_X(\varepsilon) > 0$, which implies, in view of the definition of $\delta_X$, that
$1 - \left\|\frac{1}{2}(x + y)\right\| > 0$, which in turn implies that $X$ is strictly convex.
In [9, Example 1.7], the author presents an interesting example of two different Banach
spaces which have equivalent norms while, at the same time, the first one is strictly convex
and the second one is not. Further, the strictly convex space is not uniformly convex, which
proves that these are not equivalent concepts.
An interesting property of uniformly convex Banach spaces is the following: if $x_n \rightharpoonup x$
and $\|x_n\| \to \|x\|$ as $n \to \infty$, then $x_n \to x$ as $n \to \infty$ [10, Prop. 2.8, Ch. II].
For an arbitrary Hilbert space $H$ and a Banach space $X$, the inequalities $\delta_H(\varepsilon) \ge \delta_X(\varepsilon)$
for all $0 < \varepsilon \le 2$ and $\rho_H(\tau) \le \rho_X(\tau)$ for all $\tau > 0$ always hold. Moreover $X$ $p$-convex
implies $p \ge 2$ and $X$ $p$-smooth implies $p \le 2$. Since Hilbert spaces are 2-smooth and
2-convex (see Example 10 and Example 13 below), we conclude that they are the "most
convex" and "smoothest" Banach spaces.

Example 13 In a Hilbert space $H$, the polarization identity (2.5) shows that
$$\|x + y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right) - \|x - y\|^2.$$
With $\|x\| = \|y\| = 1$, it results in
$$\left\|\tfrac{1}{2}(x + y)\right\|^2 = 1 - \left(\tfrac{\|x - y\|}{2}\right)^2$$
and with $\|x - y\| \ge \varepsilon$ we find $\delta_H(\varepsilon) \ge 1 - \sqrt{1 - \left(\frac{\varepsilon}{2}\right)^2} > \frac{1}{8}\varepsilon^2$, which proves that Hilbert spaces
are 2-convex. If $\dim H \ge 2$, then choosing two sequences in the unit sphere satisfying
$\|x_n - y_n\| \to \varepsilon$ as $n \to \infty$, we obtain the equality $\delta_H(\varepsilon) = 1 - \sqrt{1 - \left(\frac{\varepsilon}{2}\right)^2}$.

2.4 Relations between smoothness and convexity


In this section we always consider X a Banach space and X ∗ its dual space. The numbers
p, p∗ > 1 represent conjugate numbers, i.e., 1/p + 1/p∗ = 1.
The strict convexity of X is equivalent to the smoothness of its dual space and vice-versa
in case of X being reflexive, as shown in [10, Cor. 1.4, Ch. II]. X uniformly smooth or
uniformly convex implies X reflexive [10, Theo. 2.9 and 2.15, Ch. II].
A very important result is the so-called Lindenstrauss duality formulas, which connect
the modulus of smoothness of $X$ with the modulus of convexity of its dual space $X^*$ and
vice-versa (see [10, Prop. 2.12, Ch. II]): For each $\tau > 0$ it holds
$$\rho_X(\tau) = \sup\left\{ \frac{\tau\varepsilon}{2} - \delta_{X^*}(\varepsilon) : 0 < \varepsilon \le 2 \right\}$$
and
$$\rho_{X^*}(\tau) = \sup\left\{ \frac{\tau\varepsilon}{2} - \delta_X(\varepsilon) : 0 < \varepsilon \le 2 \right\}.$$
A consequence of the Lindenstrauss formulas is the following important result: $X$ is uniformly
convex (resp. $p$-convex) if and only if $X^*$ is uniformly smooth (resp. $p^*$-smooth)
and vice-versa (cf [38, 143, Vol II 1.e]).

Example 14 The Lebesgue space $L^p(\Omega)$ satisfies
$$\delta_{L^p}(\varepsilon) = \begin{cases} \dfrac{p-1}{8}\,\varepsilon^2 + o(\varepsilon^2) > \dfrac{p-1}{8}\,\varepsilon^2 & : 1 < p < 2\\[2mm] 1 - \left(1 - \left(\dfrac{\varepsilon}{2}\right)^p\right)^{1/p} > \dfrac{1}{p}\left(\dfrac{\varepsilon}{2}\right)^p & : 2 \le p < \infty \end{cases}$$
and
$$\rho_{L^p}(\tau) = \begin{cases} (1 + \tau^p)^{1/p} - 1 < \dfrac{1}{p}\,\tau^p & : 1 < p < 2\\[2mm] \dfrac{p-1}{2}\,\tau^2 + o(\tau^2) < \dfrac{p-1}{2}\,\tau^2 & : 2 \le p < \infty \end{cases},$$
which means that this space is¹ $p\vee 2$-convex and $p\wedge 2$-smooth. In particular, it is uniformly
smooth and uniformly convex. The space of the $p$-summable sequences $\ell^p(\mathbb{N})$ and the Sobolev
spaces $W^{n,p}(\Omega)$, $n \in \mathbb{N}$, are also $p\vee 2$-convex and $p\wedge 2$-smooth for $1 < p < \infty$. As these
spaces are not reflexive for $p = 1$ and $p = \infty$, we conclude that they cannot be uniformly
smooth nor uniformly convex. One can actually prove that they are not even strictly convex
or smooth Banach spaces.

2.5 Duality mapping


The Riesz Representation Theorem states that in an arbitrary Hilbert space H, for each
element x ∈ H, there exists a unique linear and continuous functional x∗ : H → R such that

hx∗ , yiH ∗ ×H = hx, yiH , for all y ∈ H

and kx∗ kL(H,R) = kxkH . This implies that

hx∗ , yi ≤ kxk . kyk and hx∗ , xi = kxk2 . (2.8)

This reasoning suggests we could cover the lack of an inner product in a general Banach
space X associating each element x ∈ X to a functional x∗ ∈ X ∗ and then replacing the
inner product hx, yi with hx∗ , yiX ∗ ×X . In the ideal case, the dual pairing h·, ·iX ∗ ×X would
have inner-product-like properties similar to (2.8) above.
¹$a \vee b := \max\{a, b\}$ and $a \wedge b := \min\{a, b\}$.

Definition 15 Let $X$ be a Banach space. A continuous and strictly increasing function
$\varphi : \mathbb{R}_0^+ \to \mathbb{R}_0^+$ such that $\varphi(0) = 0$ and $\lim_{t\to\infty}\varphi(t) = \infty$ is called a gauge function. The
set-valued mapping $J_\varphi : X \to 2^{X^*}$ defined by
$$J_\varphi(x) := \{x^* \in X^* : \langle x^*, x\rangle = \|x^*\|\cdot\|x\| \text{ and } \|x^*\| = \varphi(\|x\|)\}$$
is called the duality mapping associated with the gauge function $\varphi$. The duality mapping
associated with the gauge function $\varphi(t) = t$ is called the normalized duality mapping.
Finally, a selection of the duality mapping $J_\varphi$ is a single-valued function $j_\varphi : X \to X^*$
satisfying $j_\varphi(x) \in J_\varphi(x)$ for each $x \in X$.

Suppose we are given x ∈ X. From Hahn-Banach Theorem it follows that there exists
x∗∈ X ∗ such that kx∗ k = 1 and hx∗ , xi = kxk, which means that y ∗ := x∗ ϕ (kxk) ∈ Jϕ (x) .
Hence Jϕ (x) 6= ∅ for any x ∈ X.

Remark 16 The same reasoning of the previous paragraph implies, in view of (2.4) , that
the vector x∗ belongs to the set

{x∗ ∈ X ∗ : hx∗ , xi = kx∗ k . kxk and kx∗ k = ϕ (kxk)}

if and only if the vector Re x∗ belongs to

{x∗ ∈ XR∗ : hx∗ , xi = kx∗ k . kxk and kx∗ k = ϕ (kxk)} ,

which means that the two above sets can be identified to each other by the use of the iso-
morphismus T x∗ = Re x∗ . Therefore, one is allowed to write

hjϕ (x) , yi ≤ kjϕ (x)k . kyk

even if X is a complex Banach space. The above inequality should actually be interpreted
as
Re hjϕ (x) , yi ≤ kRe jϕ (x)k . kyk = kjϕ (x)k . kyk .
The Asplund’s Theorem (2.9) below and the Xu-Roach inequalities in Theorem 18 should
be interpreted in the same way.

With the special notation $J_p$, where $p > 1$ is fixed, we denote the duality mapping
associated with the gauge function $\varphi(t) = t^{p-1}$. In particular, $J_2$ is the normalized duality
mapping. From the definition, we conclude that, for all $x, y \in X$,
$$\|j_p(x)\| = \varphi(\|x\|) = \|x\|^{p-1}, \qquad \langle j_p(x), x\rangle = \|j_p(x)\|\cdot\|x\| = \|x\|^p$$
$$\text{and}\qquad \langle j_p(x), y\rangle \le \|j_p(x)\|\cdot\|y\| = \|x\|^{p-1}\|y\|.$$
Further, each selection $j_2$ of the normalized duality mapping has the inner-product properties
shown in (2.8).
The connection between the subdifferential and the duality mapping is given by the
very important Asplund's Theorem [10, Lemma 4.3 and Theo. 4.4, Ch. I]: Let $X$ be a
Banach space, $x \in X$ an arbitrary vector and $\varphi$ a gauge function. Then the function
$\psi(t) := \int_0^t \varphi(s)\,\mathrm{d}s$, $t \ge 0$, is convex in $\mathbb{R}_0^+$ and
$$J_\varphi(x) = \partial(\psi(\|x\|)). \qquad (2.9)$$
For the gauge function $\varphi(t) = t^{p-1}$ we have $\psi(\|x\|) = \frac{1}{p}\|x\|^p$ and conclude that $J_p = \partial\left(\frac{1}{p}\|\cdot\|^p\right)$.
In particular, for the normalized duality mapping it holds $J_2 = \partial\left(\frac{1}{2}\|\cdot\|^2\right)$. In
a Hilbert space, $\partial\left(\frac{1}{2}\|\cdot\|^2\right)(x) = \left(\frac{1}{2}\|\cdot\|^2\right)'(x) = x$, which means that $J_2 = I$ is the identity
operator. Unfortunately, this very nice property is true only in Hilbert spaces. In fact,
one can prove that the normalized duality mapping $J_2$ is linear in $X$ if and only if $X$ is a
Hilbert space [10, Prop. 4.8, Ch. I].
Asplund's Theorem is the key to connecting the properties of the duality mappings
with convexity and smoothness properties of a Banach space. For instance, an interesting
consequence of Asplund's Theorem is the fact that a Banach space $X$ is smooth if and only
if each duality mapping $J_\varphi$ is single valued (cf [10, Cor. 4.5, Ch. I]). In this case,
$$\langle J_\varphi(x), y\rangle = \left.\frac{\mathrm{d}}{\mathrm{d}t}\,\psi(\|x + ty\|)\right|_{t=0}, \quad\text{for all } x, y \in X. \qquad (2.10)$$
Further, $X$ is uniformly smooth if and only if each duality mapping is single-valued and
uniformly continuous on the unit sphere [10, Theo. 2.16, Ch. II].
The next properties of the duality mapping $J_\varphi$ are collected from [10, Prop. 4.7, Ch. I]:
Let $x, y \in X$. The duality mapping inherits the monotonicity property of the subdifferential:
$$\langle j_\varphi(x) - j_\varphi(y), x - y\rangle \ge 0.$$
Further, $J_\varphi(-x) = -J_\varphi(x)$ and
$$J_\varphi(\lambda x) = \frac{\varphi(\lambda\|x\|)}{\varphi(\|x\|)}\, J_\varphi(x) \quad\text{for all } \lambda > 0.$$
In particular, $J_2(\lambda x) = \lambda J_2(x)$ is homogeneous. The inverse of $\varphi$ is a gauge function too
and if $J^*_{\varphi^{-1}} : X^* \to X^{**}$ is the duality mapping on $X^*$ associated with the gauge function
$\varphi^{-1}$, then
$$x^* \in J_\varphi(x) \implies x \in J^*_{\varphi^{-1}}(x^*). \qquad (2.11)$$
Finally, if $\varphi_1$ and $\varphi_2$ are gauge functions, then
$$\varphi_2(\|x\|)\, J_{\varphi_1}(x) = \varphi_1(\|x\|)\, J_{\varphi_2}(x).$$
In particular, for $\varphi_1(t) = t^{r-1}$ and $\varphi_2(t) = t^{p-1}$ with $p, r > 1$ it holds
$$J_r(x) = \|x\|^{r-p}\, J_p(x). \qquad (2.12)$$

From [10, Cor. 3.13, Ch. II], we see that the range² of $J_\varphi$ is dense in $X^*$, i.e., $\overline{R(J_\varphi)} = X^*$.
If $X$ is reflexive, then this result becomes $R(J_\varphi) = X^*$ (actually, this is an equivalent
condition for the reflexivity of $X$, see [10, Cor 3.4, Ch. II]). We conclude that in case of $X$
being reflexive, the reciprocal of (2.11) is true and $R(J^*_{\varphi^{-1}}) = X^{**} \cong X$. Assuming that
$X$ is smooth, then each duality mapping is single valued and if $X$ is additionally reflexive
(this is the case, for instance, if $X$ is uniformly smooth), then $J_\varphi$ is invertible and satisfies
$$J_\varphi^{-1} = J^*_{\varphi^{-1}} : X^* \to X^{**} \cong X. \qquad (2.13)$$
In particular, $J_p^{-1} = J^*_{p^*}$. Lastly, if the norm-functions in $X$ and $X^*$ are F-differentiable,
then $X$ is reflexive, $J_\varphi$ is single-valued, continuous and invertible with continuous inverse
satisfying (2.13). This result is valid, for example, if $X$ is uniformly smooth (then the norm
in $X$ is uniformly F-differentiable on the unit sphere) and uniformly convex (then $X^*$ is
uniformly smooth and the same result holds true in this space).
The strict monotonicity of the duality mapping is equivalent to the strict convexity of
X, i.e., X is strictly convex if and only if, each duality mapping satisfies
hjϕ (x) − jϕ (y) , x − yi > 0 for all x, y ∈ X with x 6= y.

²We define the range of $J_\varphi : X \to 2^{X^*}$ as $R(J_\varphi) := \bigcup_{x\in X} J_\varphi(x)$, which makes sense even if $J_\varphi$ is not
single-valued.

Example 17 The dual space of the Lebesgue space $L^p(\Omega)$, $1 < p < \infty$, is given by $(L^p(\Omega))^* = L^{p^*}(\Omega)$
and the duality mapping $J_p : L^p(\Omega) \to L^{p^*}(\Omega)$ can be calculated using the formula
(2.10). In fact, the duality mapping $J_p$ is associated with the gauge function $\varphi(t) = t^{p-1}$,
which means that $\psi(t) = \int_0^t \varphi(s)\,\mathrm{d}s = \frac{1}{p}t^p$. Let $f, g \in L^p(\Omega)$ be given. Then by (2.10),
$$\langle J_p(f), g\rangle_{L^{p^*}\times L^p} = \left.\frac{\mathrm{d}}{\mathrm{d}t}\,\psi(\|f + tg\|_{L^p})\right|_{t=0} = \left.\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{1}{p}\|f + tg\|_{L^p}^p\right)\right|_{t=0}$$
$$= \left.\frac{1}{p}\,\frac{\mathrm{d}}{\mathrm{d}t}\int_\Omega |f(x) + tg(x)|^p\,\mathrm{d}x\right|_{t=0} = \left.\int_\Omega |f(x) + tg(x)|^{p-1}\operatorname{sgn}(f(x) + tg(x))\, g(x)\,\mathrm{d}x\right|_{t=0}$$
$$= \int_\Omega |f(x)|^{p-1}\operatorname{sgn}(f(x))\, g(x)\,\mathrm{d}x = \left\langle |f|^{p-1}\operatorname{sgn}(f),\, g\right\rangle_{L^{p^*}\times L^p}.$$
This means that $J_p(f) = |f|^{p-1}\operatorname{sgn}(f)$, where the equality is understood pointwise. Using
now (2.12) we conclude that the duality mapping $J_r$ in $L^p(\Omega)$ is given by
$$J_r(f) = \|f\|_{L^p}^{r-p}\, |f|^{p-1}\operatorname{sgn}(f). \qquad (2.14)$$
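For a concrete feel, here is a minimal sketch (not from the thesis) that evaluates (2.14) for a function sampled on a uniform grid; the discretization, the quadrature weight h and the example values are illustrative assumptions.

```python
import numpy as np

def duality_map_Lp(f, p, r, h):
    """Pointwise evaluation of J_r(f) = ||f||_{L^p}^{r-p} |f|^{p-1} sgn(f), cf. (2.14).
    f : samples of the function on a uniform grid, h : cell volume (quadrature weight)."""
    norm_p = (h * np.sum(np.abs(f)**p)) ** (1.0 / p)   # ||f||_{L^p} by simple quadrature
    return norm_p**(r - p) * np.abs(f)**(p - 1) * np.sign(f)

# illustrative check of <J_p(f), f> = ||f||_{L^p}^p  (take r = p)
h = 1.0 / 1000
x = np.linspace(0.0, 1.0, 1000, endpoint=False)
f = np.sin(2.0 * np.pi * x)
p = 1.5
jpf = duality_map_Lp(f, p, p, h)
print(h * np.sum(jpf * f), h * np.sum(np.abs(f)**p))   # the two values agree
```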

2.6 Bregman distances


In 1991, Xu and Roach proved in their famous paper [53] two very important results, which
are nowadays known as Xu-Roach inequalities. We start this section formulating these
results in form of a theorem.

Theorem 18 (Xu-Roach) Let $X$ be a Banach space and $p > 1$.
(A) If $X$ is uniformly convex, then there exists a positive constant $\widetilde{K}_p$ such that for all
$x, y \in X$ and $j_p(x) \in J_p(x)$,
$$\|x - y\|^p \ge \|x\|^p - p\,\langle j_p(x), y\rangle + \sigma_p(x, y)$$
with
$$\sigma_p(x, y) := \widetilde{K}_p \int_0^1 \frac{(\|x - ty\| \vee \|x\|)^p}{t}\, \delta_X\!\left(\frac{t\|y\|}{2(\|x - ty\| \vee \|x\|)}\right)\mathrm{d}t,$$
where $\delta_X$ is the modulus of convexity, see Definition 12.
(B) If $X$ is uniformly smooth, then there exists a positive constant $\widetilde{C}_p$ such that for all
$x, y \in X$
$$\|x - y\|^p \le \|x\|^p - p\,\langle J_p(x), y\rangle + \widetilde{\sigma}_p(x, y)$$
with
$$\widetilde{\sigma}_p(x, y) := \widetilde{C}_p \int_0^1 \frac{(\|x - ty\| \vee \|x\|)^p}{t}\, \rho_X\!\left(\frac{t\|y\|}{\|x - ty\| \vee \|x\|}\right)\mathrm{d}t,$$
where $\rho_X$ is the modulus of smoothness (Definition 9).

Assuming that the space $X$ is $s$-convex for some $s > 1$ and using the definition of
$s$-convexity,
$$\sigma_p(x, y) \ge \frac{\widetilde{K}_p K_s}{2^s}\, \|y\|^s \int_0^1 (\|x - ty\| \vee \|x\|)^{p-s}\, t^{s-1}\,\mathrm{d}t.$$
Since for all $t \in [0, 1]$,
$$\|x - ty\| \vee \|x\| \le \|x\| + \|y\| \le 2(\|x\| \vee \|y\|),$$
it follows that for $p \le s$,
$$\sigma_p(x, y) \ge pK_{p,s}\,(\|x\| \vee \|y\|)^{p-s}\|y\|^s, \qquad (2.15)$$
where $K_{p,s} = \widetilde{K}_p K_s 2^{p-2s}/(ps) > 0$. Similarly, if the Banach space $X$ is assumed to be
$s$-smooth and $p \ge s$, then there exists a positive constant $C_{p,s}$ such that for all $x, y \in X$
$$\widetilde{\sigma}_p(x, y) \le pC_{p,s}\,(\|x\| \vee \|y\|)^{p-s}\|y\|^s. \qquad (2.16)$$
In particular, if $p = s$, then
$$\frac{1}{p}\|x - y\|^p \ge \frac{1}{p}\|x\|^p - \langle j_p(x), y\rangle + \frac{\overline{K}_p}{p}\|y\|^p, \qquad (2.17)$$
and
$$\frac{1}{p}\|x - y\|^p \le \frac{1}{p}\|x\|^p - \langle J_p(x), y\rangle + \frac{\overline{C}_p}{p}\|y\|^p, \qquad (2.18)$$
respectively, with $\overline{K}_p := pK_{p,s}$ and $\overline{C}_p := pC_{p,s}$.
In [9, Cor. 4.17 and Cor. 5.8] it is shown that the existence of $\overline{K}_p$ and $\overline{C}_p$ in inequalities
(2.17) and (2.18) are actually equivalent conditions to $p$-convexity and $p$-smoothness of
$X$, respectively.
Note that inequalities (2.17) and (2.18) reduce to the polarization identity (2.5) in
Hilbert spaces (for $p = s = 2$, $\overline{K}_p = \overline{C}_p = 1$). Trying to mimic this identity in a general
Banach space, we introduce the Bregman distance.
Banach space, we introduce the Bregman distance.

Definition 19 Let X be a Banach space and Ω : X → R a convex and subdifferentiable


functional. The Bregman distance associated to Ω is the function ∆Ω : X × X → R
defined by
∆Ω (x, y) := Ω (x) − Ω (y) − inf {hξ, x − yi : ξ ∈ ∂Ω (y)} .

Despite its name, the Bregman distance is not a metric because, for example, it is not
symmetric ($\Delta_\Omega(x, y) \neq \Delta_\Omega(y, x)$ in general). It does not satisfy the important
triangle inequality either. But from the definition of the subdifferential it immediately follows that
$\Delta_\Omega(x, y) \ge 0$ for all $x, y \in X$. Additionally, $x = y$ implies $\Delta_\Omega(x, y) = 0$.
Let $\varphi$ be a gauge function. Then, from Asplund's Theorem, the function $\Omega(x) := \psi(\|x\|) = \int_0^{\|x\|}\varphi(s)\,\mathrm{d}s$
is convex and $\partial\Omega(x) = J_\varphi(x)$. It follows that in this case,
$$\Delta_\Omega(x, y) = \psi(\|x\|) - \psi(\|y\|) - \inf\{\langle\xi, x - y\rangle : \xi \in J_\varphi(y)\}.$$
We denote by $\Delta_p$, $p > 1$, the Bregman distance associated to the particular gauge function
$\varphi(t) = t^{p-1}$. This means
$$\Delta_p(x, y) = \frac{1}{p}\|x\|^p - \frac{1}{p}\|y\|^p - \inf\{\langle\xi, x - y\rangle : \xi \in J_p(y)\}. \qquad (2.19)$$
Assume from now on that the duality mapping is single-valued (this is the case in smooth
Banach spaces, for instance). Hence, the above equality becomes
$$\Delta_p(x, y) = \frac{1}{p}\|x\|^p - \frac{1}{p}\|y\|^p - \langle J_p(y), x - y\rangle = \frac{1}{p}\|x\|^p - \frac{1}{p}\|y\|^p + \|y\|^p - \langle J_p(y), x\rangle$$
$$= \frac{1}{p}\|x\|^p + \frac{1}{p^*}\|y\|^p - \langle J_p(y), x\rangle = \frac{1}{p}\|x\|^p - \langle J_p(y), x\rangle + \frac{1}{p^*}\|J_p(y)\|^{p^*}.$$
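As an illustration (not part of the thesis), the following sketch evaluates $\Delta_p$ in the finite-dimensional sequence space $\ell^p$, where $J_p$ acts componentwise as in Example 17; for $p = 2$ the result coincides with $\frac{1}{2}\|x - y\|^2$, as remarked just below. The vectors are arbitrary placeholders.

```python
import numpy as np

def Jp(x, p):
    # duality mapping on the sequence space l^p: acts componentwise, cf. Example 17
    return np.abs(x)**(p - 1) * np.sign(x)

def bregman(x, y, p):
    # Delta_p(x, y) = ||x||^p/p - ||y||^p/p - <J_p(y), x - y>, cf. (2.19)
    return (np.sum(np.abs(x)**p) - np.sum(np.abs(y)**p)) / p - np.dot(Jp(y, p), x - y)

x = np.array([1.0, -2.0, 0.5])
y = np.array([0.3, 1.0, -1.0])
print(bregman(x, y, 1.5), bregman(x, y, 2.0), 0.5 * np.linalg.norm(x - y)**2)
# the last two numbers agree: Delta_2 reduces to half the squared norm distance
```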

Observe the similarity between this formula and the polarization identity (2.5). Since in
Hilbert spaces the normalized duality mapping is the identity operator, we conclude that
in these spaces $\Delta_2(x, y) = \frac{1}{2}\|x - y\|^2$. Further,
$$\Delta_p(x, y) \ge \frac{1}{p}\|x\|^p + \frac{1}{p^*}\|y\|^p - \|y\|^{p-1}\|x\|.$$
Now, if $(x_n)_{n\in\mathbb{N}} \subset X$ is a sequence and $x \in X$ is a fixed vector, then the inequality
$\Delta_p(x, x_n) \le \rho$ implies
$$\|x_n\|^{p-1}\left(\frac{1}{p^*}\|x_n\| - \|x\|\right) \le \rho.$$
Considering now the cases $\frac{1}{p^*}\|x_n\| - \|x\| \le \frac{1}{2p^*}\|x_n\|$ and $\frac{1}{p^*}\|x_n\| - \|x\| > \frac{1}{2p^*}\|x_n\|$, we
conclude the implication
$$\Delta_p(x, x_n) \le \rho \implies \|x_n\| \le 2p^*\left(\|x\| \vee \rho^{1/p}\right). \qquad (2.20)$$
Therefore, $(x_n)_{n\in\mathbb{N}}$ is bounded whenever $\Delta_p(x, x_n)$ is bounded. A similar result can be
proven if $\Delta_p(x_n, x) \le \rho$. If the duality mapping is single-valued and continuous (this is
the case, for instance, in a uniformly smooth Banach space) then the continuity is handed
down to both arguments of the Bregman distance $\Delta_p$.
If $X$ is strictly convex, then $x = y$ whenever $\Delta_p(x, y) = 0$, because $X$ strictly convex
implies that $x \mapsto \frac{1}{p}\|x\|^p$ is strictly convex, which in turn implies that $\Delta_p$ is strictly convex
in its first argument. Now, if $x \neq y$, we find for $\lambda \in (0, 1)$
$$0 \le \Delta_p(\lambda x + (1-\lambda)y,\, y) < \lambda\Delta_p(x, y) + (1-\lambda)\Delta_p(y, y) = \lambda\Delta_p(x, y),$$
which implies that $\Delta_p(x, y) \neq 0$.


A straightforward calculation leads to the three points identity:
$$\Delta_p(z, y) - \Delta_p(z, x) = \Delta_p(x, y) + \langle J_p(y) - J_p(x),\, x - z\rangle \qquad (2.21)$$
for all $x, y, z \in X$. It is also easy to verify the identity $\Delta_p(x, y) = \Delta_{p^*}(J_p(y), J_p(x))$.
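For the record, the straightforward calculation behind (2.21) can be spelled out as follows: expand both Bregman distances via the single-valued form of (2.19) and regroup the duality pairings.
$$\begin{aligned}
\Delta_p(z, y) - \Delta_p(z, x)
&= \Big(\tfrac{1}{p}\|z\|^p - \tfrac{1}{p}\|y\|^p - \langle J_p(y), z - y\rangle\Big) - \Big(\tfrac{1}{p}\|z\|^p - \tfrac{1}{p}\|x\|^p - \langle J_p(x), z - x\rangle\Big)\\
&= \Big(\tfrac{1}{p}\|x\|^p - \tfrac{1}{p}\|y\|^p - \langle J_p(y), x - y\rangle\Big) + \langle J_p(y), x - y\rangle - \langle J_p(y), z - y\rangle + \langle J_p(x), z - x\rangle\\
&= \Delta_p(x, y) + \langle J_p(y), x - z\rangle - \langle J_p(x), x - z\rangle
 = \Delta_p(x, y) + \langle J_p(y) - J_p(x),\, x - z\rangle.
\end{aligned}$$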
The Xu-Roach Theorem states that in an arbitrary uniformly convex Banach space it
holds
$$\frac{1}{p}\,\sigma_p(y, y - x) \le \Delta_p(x, y),$$
for all $x, y \in X$. Using (2.15) we conclude that in an $s$-convex Banach space there exists,
for each $p \le s$, a constant $K_{p,s}$ such that
$$K_{p,s}\,(\|y\| \vee \|x - y\|)^{p-s}\|x - y\|^s \le \Delta_p(x, y) \qquad (2.22)$$
for all $x, y \in X$. In particular, if the sequence $(x_n)_{n\in\mathbb{N}} \subset X$ is bounded (or $p = s$), then
there exists a constant $C > 0$ such that
$$\|x_n - x_m\|^s \le C\,\Delta_p(x_n, x_m). \qquad (2.23)$$
In the same way, in a uniformly smooth Banach space it is true that
$$\Delta_p(x, y) \le \frac{1}{p}\,\widetilde{\sigma}_p(y, y - x)$$
for all $x, y \in X$, which in view of (2.16) implies that in an $s$-smooth Banach space it holds,
for each $p \ge s$,
$$\Delta_p(x, y) \le C_{p,s}\,(\|y\| \vee \|x - y\|)^{p-s}\|x - y\|^s = C_{p,s}\left(\|y\|^{p-s}\|x - y\|^s \vee \|x - y\|^p\right) \qquad (2.24)$$
20 CHAPTER 2. GEOMETRY OF BANACH SPACES

for all x, y ∈ X. One can additionally prove for an arbitrary s−smooth Banach space, that
for each p > 1, there exists a positive constant C p,s satisfying

kJp (x) − Jp (y)k ≤ 2s−p C p,s (kxk ∨ kyk)p−s kx − yks−1 ,

for all x, y ∈ X. Since

kxk ∨ kyk ≤ kx − yk + kyk ≤ 2 (kx − yk ∨ kyk) ,

we additionally have for p ≥ s,

kJp (x) − Jp (y)k ≤ C p,s (kx − yk ∨ kyk)p−s kx − yks−1 (2.25)


 
= C p,s kx − ykp−1 ∨ kykp−s kx − yks−1 .
Chapter 3

The Inexact Newton Method


K-REGINN

We aim to find an approximate solution to the nonlinear ill-posed problem

F (x) = y (3.1)

with F operating between Banach spaces X and Y , that is, F : D(F ) ⊂ X → Y, where
D(F ) denotes the domain of definition of F . We suppose to have full knowledge of this
operator. An approximation y δ for y satisfying

y − y δ ≤ δ,

and the noise level δ > 0 are assumed to be known as well. Suppose now that a solution
x+ of (3.1) exists and for the ease of presentation, assume for now that it is unique. We
aim to find for each pair y δ , δ satisfying the above inequality, a vector xδ such that the


regularization property holds1 :


xδ → x+ as δ → 0. (3.2)
The basis of our work is the Newton-type algorithm REGINN (REGularization based on
INexact Newton iteration). We first explain the original idea of this method as introduced
in [43] and later present our Kaczmarz version K-REGINN, which is a generalization of the
original algorithm. REGINN, as described in [43], improves the current iterate xn via

xn+1 = xn + sn (3.3)

by a correction step sn obtained from approximately solving a local linearization of (3.1):

An s = bδn (3.4)

where An := F 0 (xn ) is the Fréchet derivative of F at xn and bδn := y δ − F (xn ) is the


corresponding nonlinear residual. For a fixed number n, REGINN typically applies an iterative
solver to (3.4), called inner iteration, to generate a sequence (sn,k )k∈N which approximates
a solution of this system. The inner iteration terminates in the first index k = kn satisfying

An sn,k − bδn < µ bδn (3.5)

with a pre-defined constant 0 < µ < 1. The Newton iteration (3.3), also called the outer
iteration, is now realized defining sn := sn,kn . The algorithm is finally terminated with the
1
The approximate solution xδ actually depends on δ and y δ : xδ = x(δ,yδ ) .

21
22 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

discrepancy principle: Stop in the first iteration n = N (δ) satisfying2



δ
y − F xN (δ) ≤ τ δ,

with τ > 1 being a constant3 .


In order to define REGINN in a more general framework, we assume that (3.1) splits
into d ∈ N ”smaller” sub-problems, that is, Y factorizes into Banach spaces Y0 , . . . , Yd−1 :
Y = Y0 × Y1 × · · · × Yd−1 . Accordingly, F = (F0 , F1 , . . . , Fd−1 )> , Fj : D(F ) ⊂ X → Yj , and
y = (y0 , y1 , . . . , yd−1 )> . The equation (3.1) is now equivalent to the system

j = 0, . . . , d − 1.
Fj (x) = yj , (3.6)
 
δ
Our task can be recast as: for each d pairs yj j , δj satisfying

δ
kyj − yj j k ≤ δj , j = 0, ..., d − 1, (3.7)

find a vector xδ such that the regularization property (3.2) holds for

δ := max {δj : j = 0, . . . , d − 1} > 0. (3.8)


δ
The approximations yj j to yj and the respective positive noise levels δj as well as the
operators Fj , j = 0, . . . , d − 1 are assumed to be known.
We emphasize that systems like (3.6) arise quite naturally in applications where the
data is measured by d individual experiments or observations. For instance, in electrical
impedance tomography, one wants to find the conductivity of an object by applying, say, d
current patterns at the boundary and measuring the resulting voltages at the boundary as
well (see the numerical experiments in Chapter 5).
Our goal now is to introduce a Kaczmarz variant of REGINN (in short K-REGINN). In
contrast to traditional iterative methods, the Kaczmarz strategy, also known as loping
strategy, aims to find a solution of the original problem (3.1) processing the equations (3.6)
cyclically, using one equation each time stead of using all of them at the same time. This
kind of cycling strategy was initiated by Kaczmarz [27] in the context of linear systems in
finite dimensional spaces, first analyzed in the context of inverse problems by Kowar and
Scherzer [31] and further investigated by several authors [20, 7, 40, 39, 37].
The idea of K-REGINN is to determine sn from (3.4) similarly as explained in (3.5) but
with
0 δ[n]
An := F[n] (xn ) and bδn := y[n] − F[n] (xn ) ,
where [n] := n mod d denotes the remainder of integer division. Thus, the subsystems are
processed cyclically, breaking the large-scale system (3.1) into handy pieces. Observe that
the inner iteration works with a fixed equation of (3.6) . When the inner iteration terminates,
the current vector xn is updated in the outer iteration (3.3) generating the vector xn+1 and
the current equation is replaced by the next one. Finally, the outer iteration is stopped
using a variant of the discrepancy principle.
We go now into details and explain K-REGINN more precisely: Start the outer iteration
with x0 ∈D(F ) and the inner iteration setting sn,0 := 0. With n fixed, generate the sequence
(sn,k )k∈N . Update now the outer iteration using xn+1 := xn + sn,kn , where the final (inner)
index kn is determined as follows: choose τ > 1 and µ ∈ (0, 1). Define kn = 0 in case of

δ
bn ≤ τ δ[n] . (3.9)
2
The number N is chosen by a posteriori strategy, it thus depends actually on δ and y δ : N = N δ, y δ .


But we stick to the simpler notation N = N (δ).


3
The idea of using (3.5) was originally introduced in 1982 by Dembo et al [13] in the context of nonlinear
well-posed problems in finite dimensional spaces.
23

Algorithm 1 K-REGINN
Input: xN ; (y δj , δj ); Fj ; Fj0 , j = 0, . . . , d − 1; µ; τ ;
δ
Output: xN with kyj j − Fj (xN )k ≤ τ δj , j = 0, . . . , d − 1;
` := 0; x0 := xN ; c := 0;
while c < d do
for j = 0 : d − 1 do
n := `d + j;
δ
bδn := yj j − Fj (xn ); An := Fj0 (xn );
if kbδn k ≤ τ δj then
xn+1 := xn ; c := c + 1;
else
k := 0; sn,0 := 0; choose kmax,n ∈ N;
repeat
calculate sn,k+1 := f (sn,k ) from (3.4) % The meaning of f is explained in
% Remark 20
k := k + 1;

until bδn − An sn,k < µ bδn or k = kmax,n
xn+1 := xn + sn,k ; c := 0;
end if
end for
` := ` + 1;
end while
xN := x`d−c ;

Otherwise choose arbitrarily kn ∈ {1, ..., kREG } with


n o
kREG := min k ∈ N : An sn,k − bδn < µ bδn . (3.10)

Note that the definition of kn can be seen as

kn = kREG ∧ kmax,n , (3.11)

where (kmax,n )n∈N is an arbitrary sequence of natural numbers and kREG := 0 if (3.9) is
verified. Observe further that if kmax < ∞, where

kmax := max {kmax,n : n ∈ N} , (3.12)

then the sequence (kn )n∈N is bounded: kn ≤ kmax for all n ∈ N.


The equality xn+1 = xn holds whenever (3.9) holds, which means that the algorithm
does not alter xn anymore in case of (3.9) being verified d times in a row. Stop therefore the
outer iteration as soon as the discrepancy principle (3.9) is satisfied d consecutively times.
Our approximate solution of (3.1) is then xN where N = N (δ) is the smallest number
satisfying
δj
yj − Fj (xN ) ≤ τ δj , j = 0, . . . , d − 1. (3.13)

See Algorithm 1 for an implementation in pseudocode. See also Remark 20.

Remark 20 The function f : X → X in the repeat looping of Algorithm 1 represents a


general procedure to generate the inner iteration sequence (sn,k )0≤k≤kn from (3.4). Although
24 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

the function f depends on the Banach spaces X and Y and on the particular method used
to approximate the solution of (3.4), we do not consider any particular method to perform
this task in our convergence analysis in Chapter 4. Instead of this, we only assume that
some properties of this sequence hold true, without caring about how it is generated. In
next section however, we adapt some classical and well-known methods from Hilbert to
more general Banach spaces to provide some practical examples of how this sequence could
be generated in order to have the required properties used in the convergence analysis of
Chapter 4.

3.1 Solving the inner iteration


In this section we present various iterative methods which can be employed to approximately
solve the linear system An s = bδn in order to find the vector sn,kn (see (3.10) and (3.11))
used to update the outer iteration of K-REGINN. We recall that
0 δ
An := F[n] (xn ) and bδn := y[n]
[n]
− F[n] (xn )

represent respectively the F-derivative of the forward operator F[n] at the current iterate
xn and the nonlinear residual. The numbers p, r > 1 are fixed and p∗ and r∗ represent their
conjugate numbers respectively.
We suppose in the whole of this section that exact data (δ = 0 in (3.8)) is given. The
objectives are to avoid unnecessary complications at this point of the text as well as to ease
the notation, and for this last reason, we temporarily omit the superscript δ. We would like
to stress however, that all the results presented here can similarly be proven for the noisy
data case. Later, in Chapter 4 we turn back to the old notation and consider noisy data
again.
The concept in next definition is essential to understand the ideas of the primal gradient
methods of Subsection 3.1.1 below.

Definition 21 Let X be a vector space and ϕ : D(ϕ) ⊂ X → R a functional with domain


D(ϕ) having non-empty interior. The vector v ∈ X is a descent direction for ϕ from
x ∈int(D (ϕ)) if there exists a positive number t such that ϕ (x + tv) < ϕ (x) for all 0 < t ≤ t.

K-REGINN updates the current vector xn adding the vector sn,kn found in the inner
iteration: xn+1 = xn + sn,kn . The vector xn+1 is now a new approximation for a solution
of (3.1) . For this reason,
we would like the vectors sn,k to be descent directions for the
1 r
functional ψn (x) := r F[n] (x) − y[n] from xn .

The function 1r k·kr is F-differentiable if the uniform smoothness of Y[n] is assumed, see
Section 2.2. Assume now the F-differentiability of F[n] . In this case the chain rule applies
to the auxiliary function ϕn (t) := ψn (xn + tsn,k ):
D E
ϕ0n (0) = Jr F[n] (xn + tsn,k ) − y[n] , F[n]
 0
(xn + tsn,k ) sn,k

t=0
= hJr (−bn ) , An sn,k i = hJr (−bn ) , Asn,k − bn i − hJr (−bn ) , −bn i
≤ kbn kr−1 (kAn sn,k − bn k − kbn k) .

Assuming additionally that kAn sn,k − bn k < kbn k , it follows that


ϕn (t) − ϕn (0)
lim = ϕ0n (0) < 0,
t→0 t
which implies that there exists t > 0 such that ϕn (t) < ϕn (0) for all 0 < t < t. This is
equivalent to ψn (xn + tsn,k ) < ψ (xn ) for all 0 < t < t, which means that sn,k is a descent
direction for ψn from xn .
3.1. SOLVING THE INNER ITERATION 25

Though the assumption under the space Y[n] facilitates the above proof through the
use of the chain rule in the F-differentiable functions 1r k·kr and F[n] , it is not an essential
condition. The result actually holds true under weaker restrictions on the space Y[n] , which
guarantees only G-differentiability of norm-functions, as the next proposition shows.

Proposition 22 Let X and Y be Banach spaces with Y being smooth and let F : D(F ) ⊂
X → Y be a Gâteaux-differentiable function in x ∈int(D (F )) . Further, let y ∈ Y be a
fixed vector and define A = F 0 (x) and b = y − F (x) . If s ∈ X satisfies the inequality
kAs − bk < kbk , then s is a descent direction for the functional ψ (·) := kF (·) − yk from x.

Proof. Define the auxiliary functions ψ0 (t) := 1r kb − tAskr , r > 1, ψ1 (t) := kF (x + ts) − yk
and ψ2 (t) := kb − tAsk . As Y is smooth, the duality mapping Jr : Y → Y ∗ is single-valued
and satisfies (see (2.10))
 
d 1 r

hJr (v) , wi = kv + twk . (3.14)
dt r t=0

Therefore,

ψ0 (t) − ψ0 (0)
lim = ψ00 (0) = hJr (b) , −Asi = hJr (b) , b − Asi − hJr (b) , bi
t→0 t
≤ kbkr−1 (kAs − bk − kbk) < 0.

The result implies that there exist small numbers t, γ > 0 such that

ψ0 (t) − ψ0 (0)
≤ −γ < 0
t

for all 0 < t ≤ t, which in turn implies that


1
ψ2 (t) ≤ (−γtr + ψ2 (0)r ) r .

Now,

ψ1 (t) ≤ F (x + ts) − F (x) − tF 0 (x) s + ψ2 (t)



1
≤ F (x + ts) − F (x) − tF 0 (x) s + (ψ2 (0)r − γtr) r

and as ψ1 (0) = ψ2 (0) ,

1
ψ1 (t) − ψ1 (0) kF (x + ts) − F (x) − tF 0 (x) sk (ψ2 (0)r − γtr) r − ψ2 (0)
≤ +
t t t

for all 0 < t ≤ t. Finally, kF (x + ts) − F (x) − tF 0 (x) sk /t → 0 as t → 0 because F is


Gâteaux-differentiable at x and
1
(ψ2 (0)r − γtr) r − ψ2 (0) L’Hospital 1 γ
lim = lim − γ (ψ2 (0)r − γtr)− r∗ = − r < 0.
t→0 t t→0 ψ2 (0) r∗

Hence, there exists a number t1 > 0 such that ψ1 (t) < ψ1 (0) for all 0 < t ≤ t1 , i.e.,
kF (x + ts) − yk < kF (x) − yk .
26 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

3.1.1 Primal gradient methods


We forget K-REGINN for a while, skip the index n of the outer iteration and concentrate
only in the inner iteration. A : X → Y represents a linear operator and b ∈ Y is a fixed
vector.
In Hilbert spaces, the gradient of the functional ϕ (s) := 12 kAs − bk2 is given by
∇ϕ (s) = A∗ (As − b) and the vector −∇ϕ (sk ) is a descent direction for ϕ from sk whenever
it is not zero. We prove in the next proposition a similar result for smooth Banach spaces.
Proposition 23 Let X and Y be Banach spaces with Y being smooth. Further, let A : X →
Y be linear, b ∈ Y a fixed vector and ϕ : X → R the convex functional ϕ (s) := 1r kAs − bkr
with r > 1 fixed. Then ϕ is G-differentiable in X and if ∇k represents the G-derivative of
ϕ at sk , the vector −jp∗∗ (∇k ) with p∗ > 1 fixed is either equal zero or a descent direction
for ϕ from sk .
Proof. Like in (2.2) , we see that
 
∗ 1 r
∂ϕ (s) = A ∂ k·k (As − b) = A∗ Jr (As − b) .
r
As Y is smooth, the duality mapping Jr : Y → Y ∗ is single-valued, and so is the subdiffer-
ential ∂ϕ : X → X ∗ , which implies that ϕ is G-differentiable in X. Further, its G-derivative
at sk is given by ∇k = A∗ Jr (Ask − b) . Suppose that ∇k 6= 0. Using the auxiliary function
 1  r
Φ (t) := ϕ sk + t −jp∗∗ (∇k ) = (Ask − b) + t −Ajp∗∗ (∇k )
r
we find applying (3.14) ,
Φ (t) − Φ (0) (3.14)

= Φ0 (0) = Jr (Ask − b) , −Ajp∗∗ (∇k )



lim
t→0 t

= − ∇k , jp∗∗ (∇k ) = − k∇k kp < 0.

It follows that there exists anumber t > 0 such that for all 0 < t ≤ t it holds Φ (t) < Φ (0) ,
this is, ϕ sk + t −jp∗∗ (∇k ) < ϕ (sk ) .
Proposition 23 shows that in a smooth Banach space Y, the sequence generated by the
iterative method
sk+1 := sk − λk jp∗∗ (∇k ) , (3.15)
with p∗ > 1, s0 ∈ X and λk > 0 small enough, satisfies the inequality ϕ (sk+1 ) < ϕ (sk ) ,
i.e.,
kAsk+1 − bk < kAsk − bk (3.16)
as long as ∇k 6= 0. If additionally s0 := 0, then kAsk − bk < kbk for all4 k ∈ N. The
iterative methods defined in this way are called primal gradient 5 methods. Algorithm 2
codifies K-REGINN with a primal gradient method in the inner iteration in a smooth Banach
space Y . The pieces highlighted in red represent the part of the algorithm exclusively
related to (3.15).
If the step-size λk in (3.15) can be chosen independently on k, the associated gradient
method is called Landweber method6 (in short LW), that is, λLW =constant. The Steepest
Descent method (SD) is defined choosing a step-size λSD satisfying
λSD ∈ arg minϕ sk − λjp∗∗ (∇k ) ,

λ∈R+
4
In view of Proposition 22,
we see that for A = An and b = bn , the vectors sk = sn,k are descent
directions for the functional F[n] (·) − y[n] from xn .
5
Here the iteration occurs in the primal space X, in contrast with the so-called dual methods where the
iteration happens in the dual space X ∗ , see Subsection 3.1.2.
6
To homage the relevant work of L. Landweber [33].
3.1. SOLVING THE INNER ITERATION 27

Algorithm 2 K-REGINN with primal gradient inner iteration


Input: xN ; (y δj , δj ); Fj ; Fj0 , j = 0, . . . , d − 1; µ; τ ; p, r > 1;
δ
Output: xN with kyj j − Fj (xN )k ≤ τ δj , j = 0, . . . , d − 1;
` := 0; x0 := xN ; c := 0;
while c < d do
for j = 0 : d − 1 do
n := `d + j;
δ
bδn := yj j − Fj (xn ); An := Fj0 (xn );
if kbδn k ≤ τ δj then
xn+1 := xn ; c := c + 1;
else
k := 0; sn,0 := 0; choose kmax,n ∈ N;
repeat
∇n,k := A∗n Jr (An sn,k − bδn );
choose λn,k > 0 and jp∗∗ (∇n,k ) ∈ Jp∗∗ (∇n,k );
sn,k+1 := sn,k − λn,k jp∗∗ (∇n,k );
k := k + 1;

until bδn − An sn,k < µ bδn or k = kmax,n
xn+1 := xn + sn,k ; c := 0;
end if
end for
` := ` + 1;
end while
xN := x`d−c ;

if such a minimizer exists. Assuming this is the case and additionally assuming that the
function ϕ is F-differentiable (the second assumption is true for instance, if Y is uniformly
smooth), one can apply the chain rule to find


d ∗
= ϕ0 sk − λjp∗∗ (∇k ) , −jp∗∗ (∇k ) λ=λ


ϕ sk − λjp∗ (∇k )
dλ λ=λSD
SD

= − A Jr (Ask+1 − b) , jp∗∗ (∇k ) = − ∇k+1 , jp∗∗ (∇k ) .




It follows that, similarly to Hilbert spaces, the gradient of two consecutive iterates are
”orthogonal” in the sense that jp∗∗ (∇k ) , ∇k+1 = 0. Further, the inequality ϕ (sk+1 ) ≤

ϕ sk − λjp∗∗ (∇k ) is immediately verified for all λ ≥ 0 and consequently (3.16) holds. Due


to the nonlinearity of the duality mapping, an explicit formula for λSD is nevertheless
difficult to be achieved. The uniqueness of a minimizer is guaranteed for instance, if A is
injective and Y is strictly convex because in this case the function 1r k·kr is strictly convex
and Ax1 − b 6= Ax2 − b whenever x1 6= x2 , which implies the strict convexity of ϕ and
consequently the desired result.
Suppose now that Y is a r−smooth Banach space, then there exists a positive number
28 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

C r (see (2.18)) such that for all λ ≥ 0 and sk+1 = sk − λjp∗∗ (∇k ) ,

1 1 r
kAsk+1 − bkr = (Ask − b) − λAjp∗∗ (∇k )
r r
1 Cr
λAjp∗∗ (∇k ) r
≤ kAsk − bkr − Jr (Ask − b) , λAjp∗∗ (∇k ) +


r r
1 ∗ C r r r
= kAsk − bkr − λ k∇k kp + λ Ajp∗∗ (∇k ) ,

r r
which implies that

∗ Cr r r
ϕ (sk+1 ) − ϕ (sk ) ≤ −λ k∇k kp + λ Ajp∗∗ (∇k ) =: f (λ) . (3.17)
r
The above inequality is verified for each method defined via sk+1 = sk − λjp∗∗ (∇k ) and
in particular, since the step-size λSD minimizes the difference ϕ (sk+1 ) − ϕ (sk ) , inequality
(3.17) holds for this method using an arbitrary λ ≥ 0 in the rightmost side. The optimality
condition f 0 (λ) = 0 drives to the step-size

k∇k kp
λr−1
M SD := C0 r , (3.18)

Ajp∗ (∇k )

with C0 := 1/C r . The associated gradient method is called Modified Steepest Descent (MSD)
method. If Y is a Hilbert space and r = 2, then the polarization identity (2.5) shows that
C r = 1 can be chosen. In this case, the SD and MSD methods coincide and have the same
step-size:

k∇k kp
λ= 2 .

Ajp∗ (∇k )

If X is a Hilbert space too, then this step-size is given by the expression (for p = r = 2):

k∇k k2 kA∗ (Ask − b)k2


λ= = . (3.19)
kA∇k k2 kAA∗ (Ask − b)k2
Changing the definition of C0 with an arbitrary number satisfying the inequality 0 <
C0 < r/C r , we observe that the choice λk ∈ (0, λM SD ] implies that f (λk ) < 0, which in
view of (3.17) implies (3.16). The inequalities 0 < λk ≤ λM SD in combination with (3.17)
additionally imply that

∗ C r r−1 r
ϕ (sk+1 ) − ϕ (sk ) ≤ f (λk ) ≤ −λk k∇k kp + λk λM SD Ajp∗∗ (∇k ) (3.20)
r
∗ C r C0 ∗ ∗
= −λk k∇k kp + λk k∇k kp = −C1 λk k∇k kp < 0,
r
where C1 := 1 − C r C0 /r > 0. The above result is true for each primal gradient method with
step-size satisfying λk ∈ (0, λM SD ] with 0 < C0 < r/C r . It holds for SD too (in principle
not for λk = λSD in the rightmost term because we do not know whether 0 < λSD ≤ λM SD ,
but for an arbitrary λk ∈ (0, λM SD ]). Now, if λk ∈ [cλM SD , λM SD ] with 0 < c < 1 being
independent of k,
∗ r−1 ∗ r−1
   
λk k∇k kp ≥ cλM SD k∇k kp
∗ ∗ −1)
k∇k kp −r(p ∗ (r−1) cr−1 C0
≥c r−1
C0 k∇k kp = k∇k kr ,
kAkr kAkr
3.1. SOLVING THE INNER ITERATION 29

∗ ∗
which implies that λk k∇k kp & k∇k kr . From (3.20) ,
∞ ∞ ∞
r∗ p∗
X X X
k∇k k . λk k∇k k . ϕ (sk ) − ϕ (sk+1 ) ≤ ϕ (s0 ) < ∞.
k=0 k=0 k=0

It follows that
k∇k k → 0 as k → ∞. (3.21)
Hence, there exists a constant C2 > 0 independent of k such that k∇k k ≤ C2 . The result is
true for each primal gradient method with step-size λk in the interval [cλM SD , λM SD ] with
0 < c < 1 fixed but arbitrary. Although we do not know if the inequality cλM SD ≤ λSD ≤
λM SD is true, (3.21) is ensured for SD method too because (3.20) holds for this method
with an arbitrary λk ∈ (0, λM SD ] . In particular, (3.20) holds for SD with λk = λM SD for
example, which implies (3.21).
Finally, the inequality p ≤ r implies that p∗ − r (p∗ − 1) ≤ 0, which in turn implies
∗ ∗ −1) p∗ −r(p∗ −1)
r−1 k∇k kp −r(p C2
λM ≥ C0 ≥ C0 .
SD
kAkr kAkr
Choosing
∗ −1 ∗ ∗
C0r C2r −p
λLW := ∗ , (3.22)
kAkr
we conclude that λLW ∈ (0, λM SD ] and due to (3.20) , the inequality

ϕ (sk+1 ) − ϕ (sk ) ≤ −C2 λLW k∇k kp

is valid for LW method. As λLW is constant, the last inequality immediately implies (3.21).
In summary, we have proven that:

• If it is well-defined, the SD method always satisfies (3.16) . Additionally, if Y is


r−smooth, then (3.21) holds true.

• The LW method is well-defined, satisfies (3.16) , (3.20) and (3.21) whenever Y is


r−smooth and p ≤ r.

• Y r−smooth implies that each primal gradient method with step-size λk ∈ [cλM SD , λM SD ]
and 0 < c < 1 (in particular, MSD itself) satisfies (3.16) , (3.20) and (3.21).

Lemma 24 Let X and Y be Banach spaces and (sk )k∈N be a sequence generated by the
iterative method (3.15) with s0 = 0. Suppose that (3.21) and (3.20) hold. Then there exists
a constant C ≥ 0 such that
1
lim kAsk − bkr ≤ (kAs − bkr + C kbkr ) , for all s ∈ X. (3.23)
k→∞ 1+C

Proof. From (3.21) , it is possible to choose a subsequence ∇kj j∈N such that ∇kj ≤

k∇m k for m ≤ kj . Let s ∈ X be an arbitrary vector. As ∇kj ∈ ∂ϕ skj , it follows from
definition of subgradient that
kj −1
(3.15) X
λm ∇kj , jp∗∗ (∇m ) − ∇kj , s





ϕ skj ≤ ϕ (s) + ∇kj , skj − ∇kj , s = ϕ (s) −
m=0
kj −1
X ∗
λm k∇m kp + ∇kj . ksk ,

≤ ϕ (s) +
m=0
30 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

and employing (3.20) we arrive at


 
ϕ skj ≤ ϕ (s) + C ϕ (s0 ) − ϕ skj + ∇kj . ksk ,

with C := 1/C1 > 0. Hence



(1 + C) ϕ skj ≤ ϕ (s) + Cϕ (s0 ) + ∇kj . ksk .

Now, (3.20) implies that (ϕ (sk ))k∈N is a positive decreasing sequence, hence convergent. It
follows that
 1  
lim ϕ (sk ) = lim ϕ skj ≤ lim ϕ (s) + Cϕ (0) + ∇kj . ksk
k→∞ j→∞ j→∞ 1 + C
1
= [ϕ (s) + Cϕ (0)] .
1+C

3.1.2 Dual gradient methods


We start this subsection using a motivation in Hilbert spaces to introduce a new method,
which we call Decreasing Error method (in short DE). Using the best of our knowledge,
a similar method was introduced by Fridman in [17] in the context of linear problems in
Hilbert spaces. In this particular situation, this method results in the gradient method
with the fastest decreasing error and this is the reason why it was called later, the Minimal
Error method, see e.g. [41]. As far as we know, the adaptation of DE method to nonlinear
problems in Banach spaces is novel.
Suppose momentarily that there exists a solution s+ of the linear equation As = b.
Define now the gradient method sk+1 = sk − λk ∇k , with λk > 0 and ∇k being the gradient
of ϕ (s) = 12 kAs − bk2 at sk , i.e., ∇k = A∗ (Ask − b) . The main idea consists of finding a
calculable upper bound λk > 0 for the step-size λk such that the inequality λk ≤ λk implies
the monotonically reduction of the error in each step.
Using the polarization identity (2.5) we find
1 sk+1 − s+ 2 = 1 sk − s+ − λk ∇k 2 = 1 sk − s+ 2 − λk sk − s+ , ∇k + 1 λ2 k∇k k2 .


2 2 2 2 k
Thus,
1 sk+1 − s+ 2 − 1 sk − s+ 2 = −λk sk − s+ , ∇k + 1 λ2 k∇k k2 := g (λk ) .


(3.24)
2 2 2 k
Observe that g (λk ) < 0 if and only if

hsk − s+ , ∇k i hA (sk − s+ ) , Ask − bi kAsk − bk2


λk < 2 = 2 = 2 .
k∇k k2 k∇k k2 k∇k k2

We conclude that the choice λk ∈ 0, λk with the calculable step-size

kAsk − bk2
λk := C0 (3.25)
k∇k k2

and C0 < 2 implies that g (λk ) < 0, as we wanted. Observe that C0 = 1 transforms λk in the
optimal step-size (in the sense that the resulting method has, among all gradient
 methods,
the error which decreases with maximal speed), which is obtained from g 0 λk = 0. The
gradient method associated with the step-size λk with C0 = 1 is just the Minimal Error
3.1. SOLVING THE INNER ITERATION 31

method introduced in [17]. Note further that the step-size (3.25) is simpler to be computed
than the one of Steepest Descent method (3.19).
To enlarge the above results in order to guarantee their validity for nonlinear problems in
Banach spaces, more general results as those shown in last subsection are required. For the
proper adjustment of DE method to K-REGINN, we suppose for the rest of this subsection
that X is an uniformly smooth and uniformly convex Banach space. Both restrictions
together ensure that the duality mapping Jp : X → X ∗ , 1 < p < ∞, is single valued,
continuous, invertible and with continuous inverse given by Jp−1 = Jp∗∗ : X ∗ → X ∗∗ ∼ = X.

This result provides free access to the dual space X in the sense that it is always possible
to transfer a vector from X to its dual space X ∗ , perform an iteration and then come back
to the original space in a stable way.
We further assume the next assumption on the inverse problem F (x) = y :

Assumption 1 (a) There exists a solution x+ ∈ X of equation (3.1) satisfying

Bρ x+ , ∆p := v ∈ X : ∆p x+ , v < ρ ⊂ D (F ) ,
  

where ρ > 0 and p > 1 are fixed numbers and the Bregman distance ∆p is defined in (2.19) .
(b) All the functions Fj , j = 0, . . . , d − 1, are continuously Fréchet differentiable in
Bρ (x+ , ∆p ) and their derivatives satisfy
0
Fj (v) ≤ M for all v ∈ Bρ x+ , ∆p and j = 0, . . . , d − 1,


where M > 0 is a constant.


(c) (Tangential Cone Condition (TCC)): There exists a constant 0 ≤ η < 1 such that

Fj (v) − Fj (w) − Fj0 (w) (v − w) ≤ η kFj (v) − Fj (w)k ,


for all v, w ∈ Bρ (x+ , ∆p ) and j = 0, . . . , d − 1.

Before starting the analysis in more general Banach spaces, we stay a little longer
in Hilbert spaces and observe how could be possible to employ the DE method as inner
iteration of K-REGINN. Similar to before, we define the iteration sn,k+1 = sn,k − λn,k ∇n,k ,
where λn,k > 0 and ∇n,k = A∗n (An sn,k − bn ) , with An := F[n]
0 (x ) and b := y −F
n n [n] [n] (xn ) .
Observe that the most suitable vector to be approximated in the inner iteration is not s+ ,
but en := x+ − xn . The reason is quite simple: if sn,kn = en , then xn+1 = xn + en = x+ .
Applying an idea similar to (3.24) , we derive the equality

1 1 1
ksn,k+1 − en k2 − ksn,k − en k2 = −λn,k hsn,k − en , ∇n,k i + λ2n,k k∇n,k k2 (3.26)
2 2 2
:= g (λn,k )

and conclude that g (λn,k ) < 0 if and only if

hsn,k − en , ∇n,k i
λn,k < 2 .
k∇n,k k2

Note that the term hsn,k − en , ∇n,k i depends on the unavailable information en . But,

hsn,k − en , ∇n,k i = hAn (sn,k − en ) , An sn,k − bn i (3.27)


= hAn sn,k − bn , An sn,k − bn i − hAn en − bn , An sn,k − bn i
≥ kAn sn,k − bn k2 − kAn sn,k − bn k . kAn en − bn k .
32 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

Applying now the TCC, Assumption 1(c) , and observing that kbn k ≤ µ1 kAn sn,k − bn k for
k = 0, ..., kn , , see (3.11),

0
kAn en − bn k = F[n] x+ − F[n] (xn ) − F[n] (xn ) x+ − xn

(3.28)

η
≤ η kbn k ≤ kAn sn,k − bn k .
µ

Therefore
hsn,k − en , ∇n,k i ≥ K1 kAn sn,k − bn k2 ,
η 
with K1 := 1 − µ (which is positive for µ ∈ (η, 1)). Thus, λn,k ∈ 0, λn,k with

kAn sn,k − bn k2
λn,k := 2K1
k∇n,k k2

implies ksn,k+1 − en k < ksn,k − en k .


At this point we clearly see why is important to observe the outer and inner iteration
of K-REGINN simultaneously for this method. An arbitrary linear operator A : X → Y and
an arbitrary vector b ∈ Y do not necessarily verify (3.28). We actually need to use A = An
and b = bn .
See that the monotonicity of the error in the outer iteration is inherited from the inner
iteration because

xn+1 − x+ = ksn,k − en k < ... < ksn,0 − en k = xn − x+ .



n

Remark 25 To expand the above ideas to more general Banach spaces, we need to assume
that X is s−convex for some s > 1, which leave us with two possible approaches. In the
first one, we assume that s = p, which means that the index s coincides with the index used
to define the duality mapping Jp . In this case, the numbers p and s in inequalities (2.22) ,
(2.23) , (2.24) and (2.25) coincide. This framework strongly facilitates the convergence
analysis of K-REGINN presented in Chapter 4. However, when performing the numerical
experiments in Chapter 5 we are interested in the use of the Lebesgue spaces Lp (Ω) with
1 < p < 2, which are not p−convex but 2−convex spaces. This structure forces the use of
the normalized duality mapping J2 in Lp (Ω) instead of the standard option Jp . At first, this
is not a big problem, because J2 can be calculated in Lp (Ω) via (2.14) . But the numerical
experiments we have done suggested that this is not the best approach to guarantee good
reconstructions.
In order to fix this problem, an alternative and more general approach can be applied
assuming the space X to be s−convex with p ≤ s, which makes possible the use of the duality
mapping Jp in the s−convex space Lp (Ω) (s = max {p, 2}). For this reason, we have chosen
to employ this approach to develop our theory. This procedure actually results in a much
more complicated theory, but as a reward we are free to use a more suitable duality mapping
and expect to achieve better reconstructions in our numerical experiments in Chapter 5.

We now clarify how to engage the dual gradient methods in combination with K-REGINN
in Banach spaces. Assume that X is uniformly smooth and s−convex with some s ≥ p.
Suppose that the current outer iterate xn of K-REGINN is well-defined and define the convex
functional ϕn (s) := 1r kAn s − bn kr , r > 1. The dual gradient methods are now defined by
the following iteration:
Jp (zn,k+1 ) := Jp (zn,k ) − λn,k ∇n,k , (3.29)
where λn,k > 0, ∇n,k ∈ ∂ϕn (sn,k ) = A∗n Jr (An sn,k − bn ) , sn,k := zn,k − xn and sn,0 := 0.
Observe that zn,0 = xn + sn,0 = xn as well as zn,kn = xn + sn,kn = xn+1 . We codify
3.1. SOLVING THE INNER ITERATION 33

Algorithm 3 K-REGINN with dual gradient inner iteration


Input: xN ; (y δj , δj ); Fj ; Fj0 , j = 0, . . . , d − 1; µ; τ ; p, r > 1;
δ
Output: xN with kyj j − Fj (xN )k ≤ τ δj , j = 0, . . . , d − 1;
` := 0; x0 := xN ; c := 0;
while c < d do
for j = 0 : d − 1 do
n := `d + j;
δ
bδn := yj j − Fj (xn ); An := Fj0 (xn );
if kbδn k ≤ τ δj then
xn+1 := xn ; c := c + 1;
else
k := 0; sn,0 := 0; choose kmax,n ∈ N;
repeat
choose λn,k > 0 and ∇n,k ∈ A∗n Jr (An sn,k − bδn );
zn,k := sn,k + xn ;
Jp (zn,k+1 ) := Jp (zn,k ) − λn,k ∇n,k ;
sn,k+1 := Jp∗∗ (Jp (zn,k+1 )) − xn ;
k := k + 1;

until bδn − An sn,k < µ bδn or k = kmax,n
xn+1 := xn + sn,k ; c := 0;
end if
end for
` := ` + 1;
end while
xN := x`d−c ;

K-REGINN with a dual gradient method in the inner iteration in Algorithm 3. The fragments
highlighted in red represent here the part of the algorithm corresponding to (3.29).
To imitate the polarization identity, which is necessary to derive (3.26) , we change the
2
functional 12 kx+ − ·k into the Bregman distance ∆p (x+ , ·) . Assume that7 xn ∈ Bρ (x+ , ∆p ) ,
that is, ∆p (x+ , xn ) < ρ. In view of (2.20) , this inequality implies that kxn k ≤ Cρ,x+ , with
 
Cρ,x+ := 2p∗ x+ ∨ ρ1/p . (3.30)

Our goal now is to define a calculable step-size λDE = λDE (n, k) > 0 such that the
inequalities 0 < λn,k ≤ λDE will imply the monotonicity of the error in the inner iteration:

∆p x+ , zn,k+1 < ∆p x+ , zn,k .


 
(3.31)

Further, we prove that the step-size λLW of the Landweber method and λM SD of the
Modified Steepest Descent method satisfy the required inequality and consequently the
monotonicity property (3.31) is ensured for these methods.
The definition of λDE depends on an uniformly bound in n and k for the generated
sequence (zn,k )0≤k≤kn , n ∈ N, see (3.32) and (3.39). But in order to prove that this
7
In Theorem 38, we will prove that if K-REGINN starts with x0 ∈ Bρ x+ , ∆p , then all the outer iterations


xn belong to the same ball, see 4.10.


34 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

sequence is uniformly bounded, it is necessary to use the property (3.31) , which in turn
depends on the definition of λDE . This reasoning suggests that an induction argument is
necessary to prove all these properties at the same time. Note that zn,0 = xn ∈ Bρ (x+ , ∆p )
and therefore kzn,0 k ≤ Cρ,x+ . We prove now by induction that

kzn,k k ≤ Cρ,x+ (3.32)

for k = 0, ..., kn . Assume that (3.32) holds true for some k < kn as well as ∆p (x+ , zn,` ) <
∆p (x+ , zn,`−1 ) for ` = 1, ..., k.
The three points identity (2.21) is applied to replace (3.26) . Definition (3.29) together
with en = x+ − xn yields

∆p x+ , zn,k+1 − ∆p x+ , zn,k = ∆p (zn,k , zn,k+1 )


 
(3.33)
+ hJp (zn,k+1 ) − Jp (zn,k ) , sn,k − en i
= ∆p (zn,k , zn,k+1 ) − λn,k h∇n,k , sn,k − en i =: g (λn,k ) .

Making use of the properties of the duality mapping, we obtain, similarly to (3.27) ,

h∇n,k , sn,k − en i = hjr (An sn,k − bn ) , An (sn,k − en )i (3.34)


= hjr (An sn,k − bn ) , An sn,k − bn i − hjr (An sn,k − bn ) , An en − bn i
≥ kAn sn,k − bn kr − kAn sn,k − bn kr−1 kAn en − bn k .

As X is s−convex, X ∗ is s∗ −smooth and since p∗ ≥ s∗ , there exists a positive constant


Cp∗ ,s∗ (see (2.24)) such that

∆p (zn,k , zn,k+1 ) = ∆p∗ (Jp (zn,k+1 ) , Jp (zn,k )) (3.35)


 ∗ ∗ ∗
≤ Cp∗ ,s∗ kJp (zn,k )kp −s kJp (zn,k+1 ) − Jp (zn,k )ks


∨ kJp (zn,k+1 ) − Jp (zn,k )kp
p−s∗ (p−1) s∗ ∗
 ∗ ∗

≤ Cp∗ ,s∗ Cρ,x+ λn,k k∇n,k ks ∨ λpn,k k∇n,k kp .

The last inequality is the exact point where the bound (3.32) is used. We impose now a
restriction on λn,k : if it is possible to find λn,k satisfying
 !
p−s∗ (p−1) s∗ ∗
 ∗ ∗
Cp∗ ,s∗ Cρ,x+ λn,k k∇n,k ks ∨ λpn,k k∇n,k kp ≤ C0 λn,k kAn sn,k − bn kr (3.36)

for some 0 < C0 < 1, then putting it together with (3.34) in (3.33) we arrive at

g (λn,k ) ≤ λn,k kAn sn,k − bn kr−1 [kAn en − bn k − (1 − C0 ) kAn sn,k − bn k] . (3.37)

The above inequality is the key to prove that g (λn,k ) < 0 and it is fundamental for our con-
vergence analysis in Chapter 4. Observe that it does not depend on the TCC (Assumption
1(c) , page 31), but if we use it, together with k < kn and the definition (3.11), we obtain
like in (3.28) the bound

g (λn,k ) ≤ −λn,k C3 kAn sn,k − bn kr < 0, (3.38)

with C3 := 1 − C0 − η/µ (which is positive if η < 1 − C0 and η/ (1 − C0 ) < µ < 1). This
inequality ensure that (3.31) holds and accordingly

∆p x+ , zn,k+1 < ∆p x+ , zn,k < ... < ∆p x+ , zn,0 < ρ.


  

It follows that kzn,k+1 k ≤ Cρ,x+ , which completes the induction proof.


3.1. SOLVING THE INNER ITERATION 35

It remains only to find λDE > 0 such that (3.36) holds for all λn,k ≤ λDE . Looking at
(3.36) , we see that this is the case if

λDE := C1 λDE,s ∧ C2 λDE,p (3.39)

where
kAn sn,k − bn kr(`−1)
λDE,` := ,
k∇n,k k`
s−p
with C1 := C0s−1 /Cps−1
∗ ,s∗ C
ρ,x+
and C2 := C0p−1 /Cpp−1
∗ ,s∗ .

Remark 26 As zn,k − x+ = sn,k − en , it comes from (3.34) and (3.28) ,


 
η
k∇n,k k zn,k − x+ ≥ h∇n,k , sn,k − en i ≥ kAn sn,k − bn kr .

1−
µ

Since the sequences (zn,k − x+ )k≤kn , n ∈ N, are uniformly bounded, we obtain

kAn sn,k − bn kr . k∇n,k k .

Finally, p ≤ s implies kAn sn,k − bn kr(s−p) . k∇n,k ks−p and consequently

kAn sn,k − bn kr(s−1) kAn sn,k − bn kr(p−1)


. .
k∇n,k ks k∇n,k kp

Therefore, λDE,s . λDE,p for all n and k. This means that λDE,s . λDE . λDE,p . Further,
if a small enough constant replaces the constant C1 in definition of λDE , then λDE = λDE,s
can be chosen and the property

λn,k ≤ λDE =⇒ ∆p x+ , zn,k+1 < ∆p x+ , zn,k


 

still holds.

1 r
Remark 27 As long as a minimizer s+ n of ϕn (s) = r kAn s − bn k exists and λn,k ∈
(0, λDE ], the sequence (∆p (zn+ , zn,k ))k≤kn with zn+ := xn + s+
n is also monotonically de-
creasing. In fact, proceeding like in (3.33) and (3.35) ,

∆p zn+ , zn,k+1 − ∆p zn+ , zn,k ≤ h (λn,k )


 

with
p−s∗ (p−1) s∗ ∗
 ∗ ∗

λn,k k∇n,k ks ∨ λpn,k k∇n,k kp − λn,k ∇n,k , sn,k − s+


h (λn,k ) := Cp∗ ,s∗ Cρ,x+ n .

Now, as kAn s+
n − bn k ≤ kAn en − bn k , we obtain like in (3.34) and (3.28) ,
 
η
∇n,k , sn,k − s+ kAn sn,k − bn kr


n ≥ 1−
µ

From (3.36) , it follows that for all λn,k ∈ (0, λDE ] it is true that
 
η
h (λn,k ) ≤ − 1 − C0 − λn,k kAn sn,k − bn kr < 0.
µ
36 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

Notice that
kAn sn,k − bn kr(s−1) 1
λDE,s & ≥ s kAn sn,k − bn ks−r ,
k∇n,k ks M
which implies that
λDE & kAn sn,k − bn kt , with t := s − r > −r. (3.40)
Employing the TCC (Assumption 1(c) , page 31), we see that
kbn k − kAn en k ≤ kbn − An en k ≤ η kbn k .
Hence
M M x+ + Cρ,x+ .

kbn k ≤ ken k ≤ (3.41)
1−η 1−η
Thus, the sequence of residuals (bn )n∈N is bounded. Since
ksn,k k ≤ kzn,k k + kxn k ≤ 2Cρ,x+ , (3.42)
the sequences (sn,k )k≤kn and consequently (An sn,k − bn )k≤kn are uniformly bounded for all
n ∈ N. This implies that there exists a constant C4 > 0 independent on n and k such that
kAn sn,k − bn k ≤ C4 for all k ≤ kn and n ∈ N. Thus, as kAn sn,k − bn k ≥ µ kbn k for all
k = 0, ..., kn − 1,
C −r µs
λDE & kAn sn,k − bn ks−r ≥ 4 s kbn ks .
M
In particular, the Landweber method with step-size (independent of k) defined by λLW :=
C5 kbn ks , where C5 > 0 is a small constant, is well-defined as inner iteration of K-REGINN
and satisfies λLW ∈ (0, λDE ]. Consequently, inequality (3.37) is satisfied for this method.
Even a small constant in n and k can be used as Landweber step-size if s ≤ r, because
in this case
λDE & kAn sn,k − bn ks−r ≥ C4s−r . (3.43)
From definition (3.29) , we can see that
sn,k+1 = Jp∗∗ (Jp (xn + sn,k ) − λn,k ∇n,k ) − xn
and since the Steepest Descent method is the gradient method whose the step-size minimizes
the residual, the most natural manner to define the Steepest Descent method for the dual
gradient methods seems to be choosing a step-size satisfying
λSD ∈ arg minϕn Jp∗∗ (Jp (xn + sn,k ) − λ∇n,k ) − xn .

(3.44)
λ∈R+

If such a number exists, it follows immediately that


ϕn (sn,k+1 ) ≤ ϕ Jp∗∗ (Jp (xn + sn,k ) − λ∇n,k ) − xn


for all λ ≥ 0, and in particular, kAn sn,k+1 − bn k ≤ kAn sn,k − bn k is achieved by picking
λ = 0. However, an explicit expression for λSD is hard to find. Even simple inequalities
involving λSD are not easy to be obtained. Unfortunately we were not able to prove that
λSD ∈ (0, λDE ] to include it in our convergence analysis of Chapter 4. But, the Modified
Steepest Descent method defined in (3.18) can be useful here, because it already has an
explicit step-size. For the dual methods it is nevertheless convenient to alter its exponents.
Notice that for ` ∈ {p, s} ,
∗ ∗ r∗ (`−1)
r∗ (`−1)
k∇n,k kp r (`−1) = Jp∗∗ (∇n,k ) , ∇n,k = An Jp∗∗ (∇n,k ) , jr (An sn,k − bn )

r∗ (`−1)
≤ An Jp∗∗ (∇n,k ) kAn sn,k − bn kr(`−1) ,

3.1. SOLVING THE INNER ITERATION 37

and therefore,
∗ r ∗ (`−1)−`
k∇n,k kp kAn sn,k − bn kr(`−1)
r∗ (`−1) ≤ .
k∇n,k k`


A J ∗ (∇ )

n p∗ n,k

Thus, defining8
λM SD := K1 λM SD,s ∧ K2 λM SD,p (3.45)
with ∗ r ∗ (`−1)−`
k∇n,k kp
λM SD,` := r∗ (`−1) ,

An Jp∗ (∇n,k )

0 < K1 ≤ C1 and 0 < K2 ≤ C2 , we conclude that λM SD ≤ λDE and inequality (3.37)


automatically holds for this method. We remark that this step-size still coincide with the
one of Steepest Descent method (3.19) in Hilbert spaces if p = s = r = 2 and K1 = K2 = 1.9
Further ∗ ∗
k∇n,k kp r (`−1)−`
λM SD,` ≥ ∗ (`−1) (p∗ −1)r∗ (`−1)
& k∇n,k kt ,
M r k∇n,k k
with t := r∗ (` − 1) − ` ≥ −1 > −r. Now, if ` ≤ r (i.e., if p ≤ s ≤ r), we conclude that
t ≤ 0, which implies that

λM SD & kAn sn,k − bn kt , with − r < t ≤ 0. (3.46)

Using the same argument as that used for the DE method (see (3.43)), we conclude that

λM SD & C4t , (3.47)

where C4 > 0 is an uniform upper bound to kAn sn,k − bn k .


We investigate now what should be the optimal step-size for the dual gradient methods.
Defining the vector zn,λ ∈ X as

Jp (zn,λ ) := Jp (zn,k ) − λ∇n,k ,

we intend to find a step-size λ = λopt such that the corresponding dual gradient method
minimizes f (λ) := ∆p (x+ , zn,λ ) − ∆p (x+ , zn,k ) , yielding a dual gradient method with the
fastest decreasing error in the inner iteration of K-REGINN. Looking to the three points
identity (2.21), we see that this difference can be written as

f (λ) = ∆p (zn,k , zn,λ ) + Jp (zn,λ ) − Jp (zn,k ) , zn,k − x+




 
1 p 1 p∗
− λ ∇n,k , zn,k − x+


= kzn,k k − hJp (zn,λ ) , zn,k i + ∗ kJp (zn,λ )k
p p
 
1 1 ∗
= kzn,k kp − hJp (zn,k ) − λ∇n,k , zn,k i + ∗ kJp (zn,k ) − λ∇n,k kp
p p
+


− λ ∇n,k , zn,k − x .

The uniformly smoothness of X ∗ implies the Fréchet-differentiability of p1∗ k·kp . Applying
the chain rule, we obtain

f 0 (λ) = h∇n,k , zn,k i − hJp∗ (Jp (zn,k ) − λ∇n,k ) , ∇n,k i − ∇n,k , zn,k − x+

= − ∇n,k , Jp∗∗ (Jp (zn,k ) − λ∇n,k ) − x+ ,



8
The step-size of the MSD method is defined slightly differently for the primal and dual gradient methods.
The same occurs with the LW method. We however keep the same notation λM SD and λLW for these step-
sizes in both situations.
9
In Hilbert spaces, Cp∗ ,s∗ = 1/2 can be chosen and therefore, the unique restriction we have is 1 = K` ≤
C` = C0 /Cp∗ ,s∗ = 2C0 , ` = 1, 2. This means that the inequality 1/2 ≤ C0 < 1 needs to be observed.
38 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

which makes evident that f 0 is continuous. From (3.28) and (3.34) ,


 
η
0 +
kAn sn,k − bn kr < 0


f (0) = − ∇n,k , zn,k − x ≤ − 1 −
µ
in the inner iteration of K-REGINN. Further,
kJp (zn,k ) − λ∇n,k k ≥ λ k∇n,k k − kJp (zn,k )k → ∞ as λ → ∞,
which in view of (2.20) implies that ∆p (x+ , zn,λ ) = ∆p∗ (Jp (zn,λ ) , Jp (x+ )) cannot remain
unbounded as λ grows to infinite. It follows that f (λ) → ∞ as λ → ∞ and since f is
continuous, it needs to be increasing at least in a small open interval. The derivative of
f is therefore positive in this interval and since it is a continuous function and f 0 (0) < 0,
there exists a λ = λopt > 0 such that f 0 (λopt ) = 0. The uniqueness of λopt follows from the
injectivity of f 0 : let λ1 , λ2 ∈ R satisfy f 0 (λ1 ) = f 0 (λ2 ). Assuming that λ1 6= λ2 we derive
from the strictly monotonicity of the duality mapping in strictly convex Banach spaces the
contradiction
0 = (λ1 − λ2 ) f 0 (λ1 ) − f 0 (λ2 )


= (λ1 − λ2 ) ∇n,k , Jp∗∗ (Jp (zn,λ2 )) − Jp∗∗ (Jp (zn,λ1 ))



= (Jp (zn,k ) − λ2 ∇n,k ) − (Jp (zn,k ) − λ1 ∇n,k ) , Jp∗∗ (Jp (zn,λ2 )) − Jp∗∗ (Jp (zn,λ1 ))

= Jp (zn,λ2 ) − Jp (zn,λ1 ) , Jp∗∗ (Jp (zn,λ2 )) − Jp∗∗ (Jp (zn,λ1 )) > 0.



Of course λopt is a minimizer of f because f 0 is negative in zero and positive for large values.
Additionally, the unique minimizer λopt solves the nonlinear equation
∇n,k , Jp∗∗ (Jp (zn,k ) − λopt ∇n,k ) − x+ = 0.


(3.48)
Using again the three points identity, we conclude that for zn,k+1 := zn,λopt ,
∆p x+ , zn,k+1 − ∆p x+ , zn,k = −∆p (zn,k+1 , zn,k ) + Jp (zn,k+1 ) − Jp (zn,k ) , zn,k+1 − x+
 

= −∆p (zn,k+1 , zn,k ) − λopt ∇n,k , zn,k+1 − x+



and consequently
∆p x+ , zn,k+1 − ∆p x+ , zn,k = −∆p (zn,k+1 , zn,k ) .
 

An explicit formula for λopt is hard to achieve and to calculate λopt , even numerically, is
very challenging since the nonlinear equation (3.48) depends on the unavailable information
x+ . It is easy to confirm that λopt = h∇n,k , sn,k − en i / k∇n,k k2 in Hilbert spaces, which still
depends on the unavailable information x+ . This leaves little hope to exactly determine λopt
for nonlinear problems using only available information10 . However, using the property
(3.48) a lower bound for λopt can be achieved:
λopt ∇n,k , zn,k − x+ = − hλopt ∇n,k , zn,k+1 − zn,k i

= hJp (zn,k+1 ) − Jp (zn,k ) , zn,k+1 − zn,k i


= ∆p (zn,k+1 , zn,k ) + ∆p (zn,k, zn,k+1 )
= ∆p∗ (Jp (zn,k ) , Jp (zn,k+1 )) + ∆p∗ (Jp (zn,k+1 ) , Jp (zn,k )) .
The inequality (3.31) obviously hold for k = 0, ..., kn − 1, and accordingly, the uniform
bound (3.32) is true. Proceeding like in (3.35) , we obtain
λopt h∇n,k , sn,k − en i = λopt ∇n,k , zn,k − x+

p−s∗ (p−1) s∗ ∗
 ∗ ∗

≤ 2Cp∗ ,s∗ Cρ,x+ λopt k∇n,k ks ∨ λpopt k∇n,k kp .
10
In Hilbert spaces, the last expression reduces to the calculable step-size λopt = kAsk − bk2 / k∇k k2 if
the problem (3.1) is linear, see (3.25).
3.1. SOLVING THE INNER ITERATION 39

Thus
! !
1 h∇n,k , sn,k − en is−1 1 h∇n,k , sn,k − en ip−1
λopt ≥ ∧ .
2s−1 Cps−1
∗ ,s∗ C
s−p k∇n,k ks 2p−1 Cpp−1
∗ ,s∗ k∇n,k kp
ρ,x+

Finally, from (3.34) and (3.28) we obtain, like in Remark 26,


 s−1  p−1  s−1
1 − η/µ 1 − η/µ 1 − η/µ
λopt ≥ C1 λDE.s ∧ C2 λDE,p ≥ λDE .
2 2 2

Remark 28 Unfortunately, we do not know if λopt ≤ λDE to include this method in the
convergence analysis of Chapter 4. However, as λopt minimizes f, we can conclude that

(3.38)
∆p x+ , zn,λopt − ∆p x+ , zn,k = f (λopt ) ≤ f (λDE ) ≤ −C3 λDE kAn sn,k − bn kr ,
 

which is enough to prove that the inner iteration of K-REGINN terminates (kn < ∞) and
consequently the monotonicity property (3.31) as well as the bound (3.32) is transferred to
the outer iteration, see Theorem 38. This implies that at least weak convergence holds for
this method, see Corollary 41.

We finish this subsection discussing how the dual gradient methods look like when
kmax = 1 is used (see (3.12)). In this case, K-REGINN performs only one inner iteration each
outer iteration whenever the inequality (3.9) is not verified. It follows that

Jp (xn+1 ) = Jp (zn,1 ) = Jp (zn,0 ) − λn,0 A∗n jr (An sn,0 − bn )


0
(xn )∗ jr F[n] (xn ) − y[n] .

= Jp (xn ) − λn,0 F[n]

We conclude that this procedure is equivalent to apply a dual gradient method directly to
the nonlinear system (3.6). Thus the convergence of K-REGINN using these methods as inner
iteration implies in particular the convergence of a Kaczmarz version of the dual gradient
methods themselves. Analogously, the iteration
 
xn+1 = xn − λn,0 jp∗∗ F[n]
0
(xn )∗ Jr F[n] (xn ) − y[n]

represents the primal gradient methods in case kmax = 1.

3.1.3 Tikhonov methods


The linear system An s = bn often inherits the ill-posedness of the nonlinear equation
F[n] (xn ) = y[n] . Therefore, it is crucial to apply a regularization technique to reconstruct
stable approximations. The plan in this subsection is to employ Tikhonov regularization
[51] to the referred linear system in order to generate the inner iteration of K-REGINN. To
alleviate the notation, we forget K-REGINN once again and skip the index n of the outer
iteration for a while. Let X and Y be Banach spaces, A : X → Y linear and b ∈ Y fixed.
Define for each k ∈ N0 , the Tikhonov functional Tk : X → R+ 0 as

1
Tk (s) := kAs − bkr + αk Ω (s) , (3.49)
r
with r > 1, αk > 0 and Ω : X → R+
0 being subdifferentiable and satisfying (sm )m∈N bounded
whenever (Ω (sm ))m∈N bounded. Define now the Tikhonov iteration as

sk+1 ∈ arg minTk (s) , s0 = 0. (3.50)


s∈X
40 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

Algorithm 4 K-REGINN with Tikhonov inner iteration


Input: xN ; (y δj , δj ); Fj ; Fj0 , j = 0, . . . , d − 1; µ; τ ; r > 1;
δ
Output: xN with kyj j − Fj (xN )k ≤ τ δj , j = 0, . . . , d − 1;
` := 0; x0 := xN ; c := 0;
while c < d do
for j = 0 : d − 1 do
n := `d + j;
δ
bδn := yj j − Fj (xn ); An := Fj0 (xn );
if kbδn k ≤ τ δj then
xn+1 := xn ; c := c + 1;
else
k := 0; sn,0 := 0; choose kmax,n ∈ N;
repeat
choose αn,k > 0 and Ωk : X → R+0 subdifferentiable;
 r 
1
choose sn,k+1 ∈ arg min r An s − bδn + αn,k Ωk (s) ;
s∈X
k := k + 1;

until bδn − An sn,k < µ bδn or k = kmax,n
xn+1 := xn + sn,k ; c := 0;
end if
end for
` := ` + 1;
end while
xN := x`d−c ;

Algorithm 4 displays the combination of K-REGINN with a Tikhonov method in the


inner iteration with the parts corresponding to the Tikhonov methods highlighted in red.
To guarantee the well-definedness of the inner iteration, the space X is assumed to be
reflexive, which is a sufficient condition for the existence of a minimizer of Tk in (3.49) as
we will see in a moment.

Remark 29 The penalization term Ω is defined above in a relatively general form.


Therefore, the Tikhonov functional (3.49) can represent many different functionals and the
associated iteration (3.50) accordingly plays the role of various Tikhonov-based iterations.
In this subsection we analyze four different methods, namely, the Tikhonov-Phillips and
Iterated-Tikhonov methods, where Ω (s) := p1 ks − xkp for some fixed vector x ∈ X
and Ω (s) := p1 ks − sk kp respectively, as well as their Bregman variations, Ω (s) :=
∆p (x + s, x + x) for some x ∈ X fixed and Ω (s) = ∆p (x + s, x + sk ) respectively. Although
our main intention is to prove some properties of these methods, we keep the analysis as
general as possible and only introduce the referred methods at the specific points we need
them.

The existence of a minimizer of Tk is assured in case of X being reflexive as we prove


now. Indeed, following ideas from [48, Prop. 4.1], let (sm )m∈N ⊂ X be a sequence sat-
isfying Tk (sm ) → inf {Tk (s) : s ∈ X} as m → ∞. Thus the sequence (Tk (sm ))m∈N , and
consequently Ω (sm ) ≤ Tk (sm ) /αk is bounded. It follows that (sm )m∈N is bounded and
since X is reflexive, there exists a subsequence smj j∈N and a vector s+ ∈ X such that

3.1. SOLVING THE INNER ITERATION 41

smj * s+ as j → ∞. Hence Asmj − b * As+ − b and since the norm-function is lower


semi-continuous, +
As − b ≤ lim inf Asm − b .
j
j→∞

As Ω is subdifferentiable, it is also lower semi-continuous. Therefore

Ω s+ ≤ lim inf Ω smj .


 
j→∞

Both results together prove that

Tk s+ ≤ lim inf Tk smj = lim Tk (sm ) = inf Tk (s) .


 
j→∞ m→∞ s∈X

Therefore s+ is a minimizer of Tk as we wanted.


Utilizing the optimality condition 0 ∈ ∂Tk (sk+1 ) we find

0 ∈ A∗ Jr (Ask+1 − b) + αk ∂Ω (sk+1 ) ,

and conclude that there exist a selection jr : Y → Y ∗ and s∗k+1 ∈ ∂Ω (sk+1 ) such that

1 ∗
s∗k+1 = − A jr (Ask+1 − b) .
αk
If the functional Ω is strictly convex, the minimizer of Tk is unique and consequently, sk+1
is unique in (3.50). If Ω is Gâteaux-differentiable in sk+1 , the above equality becomes
1 ∗
Ω0 (sk+1 ) = − A jr (Ask+1 − b) ,
αk
where Ω0 represents the G-derivative of Ω.
The Tikhonov-Phillips method (TP) is defined by choosing a strictly decreasing zero-
sequence (αk )k∈N and Ω (s) := p1 ks − xkp , with p > 1 and x ∈ X being independent on
k.
Assume from now on that X is smooth and strictly convex. The first restriction is
equivalent to the G-differentiability of p1 k·kp and consequently Ω0 (s) = Jp (s − x) . The
second restriction in turn, is equivalent to the strict convexity of p1 k·kp , which implies that
Tk is strictly convex and therefore the minimizer of this functional is unique. Choosing
x := 0, the TP method results in the implicit iteration
1 ∗
Jp (sk+1 ) = − A jr (Ask+1 − b) . (3.51)
αk

It is easy to confirm that sk+1 = (A∗ A + αk I)−1 A∗ b in Hilbert spaces (using p = r = 2).
Definition (3.50) immediately implies that Tk−1 (sk ) ≤ Tk−1 (0) for all k ∈ N and as
Ω (0) = 0,
kAsk − bk ≤ kbk . (3.52)
Used as inner iteration of K-REGINN, the iteration (3.51) results in a Kaczmarz variation
of the well-known Levenberg-Marquardt method [21]. K-REGINN with the use of TP method
in the inner iteration and the choice x := x0 − xn is transformed into a Kaczmarz version
of the IRGN (Iteratively Regularized Gauss-Newton method, due to Bakushinskii [3], see also
[29]). Observe that, if x 6= 0, then Ω (0) = p1 kxkp 6= 0, which means that the inequality
Tk−1 (sk ) ≤ Tk−1 (0) alone is not enough to ensure (3.52). However, inequality (3.16) can
be proven for this method, as Lemma 30 below shows.
A variation of TP method is defined employing the Bregman distance instead of the
standard norm in X. The functional p1 k· − xkp is then replaced by Ω (s) := ∆p (x + s, x + x)
42 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

where x ∈ X is a fixed vector. The restrictions on X still guarantee the strict convexity
and G-differentiability of Ω in this case. Moreover, Ω0 (s) = Jp (x + s) − Jp (x + x) and we
have for this variation,
1 ∗
Jp (x + sk+1 ) = Jp (x + x) − A jr (Ask+1 − b) . (3.53)
αk
The two variations of TP method coincide for p = 2 whenever X is a Hilbert space, because
in this situation 21 ks − xk2 = ∆2 (x + s, x + x) . Similar to before, x = 0 implies Ω (0) = 0
and the property (3.52) follows again from Tk−1 (sk ) ≤ Tk−1 (0). Inequality (3.16) also
holds for this method, even in case of x 6= 0, as the next lemma demonstrate.
The first part of this lemma is a generalization to Banach spaces of [30, Theo. 2.16].
Lemma 30 Let X and Y be Banach spaces with X being reflexive, A : X → Y linear,
b ∈ Y and (αk )k∈N ⊂ R a strictly decreasing positive sequence with αk → 0 as k → ∞. Let
s0 ∈ X and
sk+1 ∈ arg minTk (s)
s∈X
with
1
Tk (s) :=kAs − bkr + αk Ω (s) ,
r
where r > 1, Ω : X → R+ 0 is subdifferentiable and satisfies (sm )m∈N bounded whenever
(Ω (sm ))m∈N bounded. Then (sk )k∈N is well-defined and satisfies
kAsk+1 − bk ≤ kAsk − bk and Ω (sk+1 ) ≥ Ω (sk ) (3.54)
for all k ∈ N, where the above inequalities are strict in case of Y being strictly convex.
Furthermore,
lim kAsk − bk = inf kAs − bk (3.55)
k→∞ s∈X
and consequently, inequality (3.23) in Proposition 24 holds for C = 0.
Proof. As shown at the beginning of this subsection, a minimizer of Tk exists for each k ∈
N0 . Thus sk and sk+1 are well-defined and fill the optimality conditions 0 ∈ ∂Tk−1 (sk ) and
0 ∈ ∂Tk (sk+1 ) . Therefore, there exist a selection jr of the duality mapping Jr , s∗k ∈ ∂Ω (sk )
and s∗k+1 ∈ ∂Ω (sk+1 ) such that
αk−1 s∗k = −A∗ jr (Ask − b) and (3.56)
αk s∗k+1 ∗
= −A jr (Ask+1 − b) .
Subtracting the second equation from the first and applying both sides of the resulting
equality in the vector sk − sk+1 ,
αk−1 s∗k − s∗k+1 , sk − sk+1 + (αk−1 − αk ) s∗k+1 , sk − sk+1


= hjr (Ask+1 − b) − jr (Ask − b) , A (sk − sk+1 )i .


Observe now that A (sk − sk+1 ) = − [(Ask+1 − b) − (Ask − b)] . From the monotonicity of
the duality mapping (respectively the strict monotonicity in case of Y being a strictly
convex space), we conclude that the first term in the left-hand side and the one in the
right-hand side of above equality are non-negative (resp. positive)

∗ and non-positive
(resp.
negative)

∗ respectively.
As α k−1 > α k , we conclude that s ,
k+1 ks − s k+1 ≤ 0 (resp.
sk+1 , sk − sk+1 < 0). We proceed applying both sides of second equation of (3.56) in
the vector sk − sk+1 ,
0 ≥ αk s∗k+1 , sk − sk+1 = hjr (Ask+1 − b) , A (sk+1 − sk )i

= hjr (Ask+1 − b) , Ask+1 − bi − hjr (Ask+1 − b) , Ask − bi


≥ kAsk+1 − bkr − kAsk+1 − bkr−1 kAsk − bk ,
3.1. SOLVING THE INNER ITERATION 43

which proves the first inequality in (3.54) . To prove the second one, we only use Tk−1 (sk ) ≤
Tk−1 (sk+1 ) and apply the just proved inequality kAsk+1 − bk ≤ kAsk − bk.
Now we turn to (3.55) . Because (kAsk − bk)k∈N is a non-increasing sequence, it con-
verges. Let  > 0 be given and (sj )j∈N ⊂ X be a sequence satisfying

1 1
kAsj − bkr → a := inf kAs − bkr as j → ∞.
r s∈X r

Then there exists a number J ∈ N such that for all j ≥ J and all k ∈ N,
1 1
a≤ kAsk − bkr ≤ kAsk − bkr + αk−1 Ω (sk )
r r
1 r
≤ kAsj − bk + αk−1 Ω (sj ) ≤ a +  + αk−1 Ω (sj ) .
r
In particular
1
a≤ kAsk − bkr ≤ a +  + αk−1 Ω (sJ )
r
for all k ∈ N. Since αk → 0 as k → ∞,
1
a ≤ lim kAsk − bkr ≤ a + .
k→∞ r

As  > 0 is arbitrary, (3.55) follows. Inequality (3.23) with C = 0 is just (3.55).


We now impose some restrictions on the sequence (αk )k∈N in order to prove an inequality
similar to (3.67) for the Bregman variation of TP method (3.53) . Choose 1 < θ < (r + 1) /r
and assume that
αk−1
1< ≤ θ, for all k ∈ N. (3.57)
αk
Define zk := x + sk and observe that the inequality Tk−1 (sk ) ≤ Tk−1 (sk+1 ) implies

− kAsk+1 − bkr ≤ − kAsk − bkr + rαk−1 ∆p (zk+1 , x + x)


≤ − kAsk − bkr + rθαk ∆p (zk+1 , x + x) ,

Now, as Tk (sk+1 ) ≤ Tk (x) ,

1
αk ∆p (zk+1 , x + x) ≤ kAx − bkr
r
and with help of (2.21) and (3.54) , we obtain for f (k) := ∆p (x+ , zk ) − ∆p (x+ , x + x) with
e := x+ − x,

f (k + 1) = Jp (zk+1 ) − Jp (x + x) , zk+1 − x+ − ∆p (zk+1 , x + x)



1
= − hjr (Ask+1 − b) , A (sk+1 − e)i − ∆p (zk+1 , x + x)
αk
1  
≤ kAsk+1 − bkr−1 kAe − bk − kAsk+1 − bkr − αk ∆p (zk+1 , x + x)
αk
1  
≤ kAsk+1 − bkr−1 kAe − bk − kAsk − bkr + (rθ − 1) αk ∆p (zk+1 , x + x)
αk
1  
≤ kAsk − bkr−1 kAe − bk − kAsk − bkr + C2 kAx − bkr ,
αk

with C2 := (rθ − 1) /r < 1. For x := 0, x := xn , sk := sn,k , A := An and b := bn , the TP


method becomes
1 ∗
Jp (zn,k+1 ) = Jp (xn ) − A jr (An sn,k+1 − bn ) (3.58)
αk n
44 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

and the above inequality takes the form

∆p x+ , zn,k+1 − ∆p x+ , xn
 
(3.59)
1 h i
≤ kAn sn,k − bn kr−1 (kAn en − bn k − kAn sn,k − bn k) + C2 kbn kr .
αk

Moreover, the vector zn,k+1 in (3.58) is the unique minimizer of the Tikhonov functional

1
Tn,k (z) := kAn (z − xn ) − bn kr + αk ∆p (z, xn ) . (3.60)
r

Remark 31 Inequality (3.59) was achieved using only the conditions for existence and
uniqueness of a minimizer of (3.49) together with the hypotheses of Lemma 30. These are
actually very few requirements. The s−convexity of X, for instance, is not necessary either
to the well-definedness of the TP method nor to prove (3.59) above. The TCC (Assumption
1(c) , page 31) is not necessary as well.

The main difficulties in studying the convergence of K-REGINN using the Tikhonov-
Phillips-type methods as inner iteration arise from the fact that the iteration sk+1 is actually
independent of the previous iteration sk . This fact also indicates that some information is
wasted during the inner iteration. It would be helpful if we could count on an iteration
similar to (3.29) of the dual gradient methods, where the available information sk is used
to generate the update sk+1 . Motivated by these facts, we introduce the Iterated-Tikhonov
method (IT), where the sequence (αk )k∈N is chosen being independent of k , i.e., αk = α
for all k ∈ N. In contrast, the functional Ω is dependent on this variable: Ω (s) = Ωk (s) :=
1 p
p ks − sk k . Then from (3.50) ,

1
Jp (sk+1 − sk ) = − A∗ jr (Ask+1 − b) ,
α
for some selection jr : Y → Y ∗ . Adopting the Bregman distance variation

Ωk (s) := ∆p (x + s, x + sk ) ,

with x ∈ X fixed, we arrive at

1 ∗
Jp (x + sk+1 ) = Jp (x + sk ) − A jr (Ask+1 − b) . (3.61)
α
The inequality kAsk+1 − bk ≤ kAsk − bk is an immediate consequence of Tk (sk+1 ) ≤
Tk (sk ). Moreover, Tk (sk+1 ) < Tk (sk ) for sk+1 6= sk because in this case Ωk (sk+1 ) > 0 and
accordingly (3.16) applies. Since s0 = 0, inequality (3.52) is also true.
Our target now is to prove a similar inequality to (3.37) for the Bregman variation of
IT method. Similarly to what occurs with the dual gradient methods, this property needs
to be proven by induction at the same time with (3.31) and (3.32) . For this reason, it is
necessary to observe the inner and outer iteration of K-REGINN simultaneously. Regarded
as inner iteration of K-REGINN, IT method has a constant regularization parameter α in
the inner iteration, but it can be chosen dependent on the index n of the outer iteration:
α = αn . Let Assumption 1, page 31, hold true and assume that X is uniformly smooth and
s−convex with p ≤ s ≤ r. Define now x = xn , sk = sn,k , A = An , b = bn and α = αn in
(3.61), which results in

1 ∗
Jp (zn,k+1 ) = Jp (zn,k ) − A jr (An sn,k+1 − bn ) , (3.62)
αn n
3.1. SOLVING THE INNER ITERATION 45

with zn,k := xn +sn,k . With these definitions, the vector zn,k+1 ∈ X is the unique minimizer
of the Tikhonov functional
1
Tn,k (z) := kAn (z − xn ) − bn kr + αn ∆p (z, zn,k ) . (3.63)
r
In order to prove (3.37) , and based in the proof of this inequality for the dual gradient
method DE presented in the last subsection, we assume that11 xn ∈ Bρ (x+ , ∆p ) and observe
that this implies that ∆p (x+ , zn,0 ) = ∆p (x+ , xn ) < ρ and consequently kzn,0 k ≤ Cρ,x+ , with
the constant Cρ,x+ > 0 being defined in (3.30) . Suppose now by induction that for some
k ∈ N, the inner iterates zn,0 , ..., zn,k are well-defined and satisfy the inequalities (3.31)
and (3.32) . Everything we need to complete the induction proof is that the inequality
∆p (x+ , zn,k+1 ) < ∆p (x+ , zn,k ) holds. To this end, we adapt ideas of our previous work
[39]. Applying the three points identity (2.21) :

∆p x+ , zn,k+1 − ∆p x+ , zn,k = −∆p (zn,k+1 , zn,k )


 
(3.64)
+


+ Jp (zn,k+1 ) − Jp (zn,k ) , zn,k+1 − x
≤ Jp (zn,k+1 ) − Jp (zn,k ) , zn,k+1 − x+

1
=− hjr (An sn,k+1 − bn ) , An (sn,k+1 − en )i
αn
1
= hjr (An sn,k+1 − bn ) , (An en − bn ) − (An sn,k+1 − bn )i
αn
1  
≤ kAn sn,k+1 − bn kr−1 kAn en − bn k − kAn sn,k+1 − bn kr .
αn
We would like to have the linearized residual in the k−th inner iterate, kAn sn,k − bn k,
instead of kAn sn,k+1 − bn k in the rightmost term of above inequality. Observe however
that, since the functional k·kr is convex,

kAn sn,k − bn kr ≤ 2r−1 (kAn sn,k+1 − bn kr + kAn (sn,k+1 − sn,k )kr ) ,

which implies that


1
− kAn sn,k+1 − bn kr ≤ − kAn sn,k − bn kr + M r kzn,k+1 − zn,k kr . (3.65)
2r−1
To estimate the last term in the right-hand side of (3.65) we recall the definition ∇n,k =
A∗n jr (An sn,k − bn ) and make use of the s−convexity of X. Since X ∗ is s∗ −smooth and as
p∗ ≥ s∗ , it follows from (2.25) and (3.32) that

kzn,k+1 − zn,k k = Jp∗∗ (Jp (zn,k+1 )) − Jp∗∗ (Jp (zn,k ))



 ∗
≤ C p∗ ,s∗ kJp (zn,k+1 ) − Jp (zn,k )kp −1
∗ ∗ ∗

∨ kJp (zn,k )kp −s kJp (zn,k+1 ) − Jp (zn,k )ks −1
p−s∗ (p−1)
 
1 ∗ C ρ,x+ ∗
≤ C p∗ ,s∗  p∗ −1 k∇n,k+1 kp −1 ∨ s∗ −1 k∇n,k+1 ks −1  .
αn αn

As p ≤ s ≤ r, the inequality (r − 1) (p∗ − 1) − 1 ≥ 0 holds true and as

kAn sn,k+1 − bn k ≤ kAn sn,k − bn k ≤ kbn k ,


11
Similar to the DE method, this property needs to be assumed in order to prove inequality (3.32) . This
fact will be proven later in Theorem 38.
46 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

it follows that
∗ −1 ∗ −1 ∗ −1)
k∇n,k+1 kp ≤ Mp kAn sn,k+1 − bn k(r−1)(p (3.66)
p∗ −1 (r−1)(p∗ −1)−1
≤M kbn k kAn sn,k − bn k .

Proceeding similarly, replacing p∗ with s∗ and using at this time the inequality (r − 1) (s∗ − 1)−
1 ≥ 0 we arrive at
M r kzn,k+1 − zn,k kr ≤ C (n)r kAn sn,k − bn kr ,
with
p−s∗ (p−1) ∗ ∗ −1)−1

C p∗ ,s∗ M p kbn k(r−1)(p
∗ −1)−1
C p∗ ,s∗ Cρ,x+ M s kbn k(r−1)(s
C (n) := ∗ −1 ∨ ∗ .
αnp αns −1

Choose 0 < C0 < 2−1/r and observe that C (n) ≤ C0 if and only if αn ≥ αmin,n with
p−1 s−1 s−p s r−s
C p∗ ,s∗ M p kbn kr−p C p∗ ,s∗ Cρ,x + M kbn k
αmin,n := ∨ .
C0p−1 C0s−1

Hence, for αn ≥ αmin,n ,

M r kzn,k+1 − zn,k kr ≤ C0r kAn sn,k − bn kr .

Inserting it in (3.65), (3.65) in (3.64) and using the inequality

kAn sn,k+1 − bn k ≤ kAn sn,k − bn k

once again, we obtain

∆p x+ , zn,k+1 − ∆p x+ , zn,k
 
(3.67)
1
≤ kAn sn,k − bn kr−1 (kAn en − bn k − C1 kAn sn,k − bn k) ,
αn

with C1 := 1/2r−1 −C0r > 0, which has the same form as (3.37) with λn,k = 1/αn . Employing
the TCC and choosing carefully the constant µ, we conclude, like in (3.28) and (3.38), that
the right-hand side of (3.67) is negative, this is, ∆p (x+ , zn,k+1 ) < ∆p (x+ , zn,k ) , completing
the induction proof.
Notice that, from TCC, the sequence of the residuals can be proven to be bounded, this
is, kbn k ≤ C2 (see (3.41)) and since p ≤ s ≤ r,
p−1 s−1 s−p s r−s
C p∗ ,s∗ M p C2r−p C p∗ ,s∗ Cρ,x + M C2
αmin,n ≤ ∨ =: αmin ,
C0p−1 C0s−1

which means that the inequality αn ≥ αmin implies αn ≥ αmin,n . Choosing, for instance,
αmax := Kαmin with K > 1, we conclude that all the above results hold true for

αn ∈ [αmin , αmax ] . (3.68)

In particular, the choice αn =constant is possible.

Remark 32 See that, if a single step is given in each inner iteration, i.e., if kn = 1
whenever (3.9) does not hold (this means that kmax = 1 in (3.12)), then xn+1 minimizes
1 0
r
Tn (x) = y[n] − F[n] (xn ) − F[n] (xn ) (x − xn ) + αn ∆p (x, xn ) .

r
3.1. SOLVING THE INNER ITERATION 47

In Hilbert spaces this functional reads (with p = r = 2)


1 0
2 α
n
Tn (x) = y[n] − F[n] (xn ) − F[n] (xn ) (x − xn ) + kx − xn k2

2 2
revealing K-REGINN-IT as a Kaczmarz version of the Levenberg-Marquardt method in Ba-
nach spaces. In case of a linear problem (Fj = Aj are linear for all j’s) we have
1 y[n] − A[n] x r + αn ∆p (x, xn )

Tn (x) =
r
and the method is now a Kaczmarz version of the Iterated-Tikhonov method defined in [26].

3.1.4 Mixed gradient-Tikhonov methods


To apply a Tikhonov method in order to generate the inner iteration of K-REGINN as done
in the last subsection, one needs to minimize a Tikhonov functional on each single step of
the inner iteration. But, even finding an approximate minimizer for a Tikhonov functional
is normally not a simple task and finding it exactly is almost impossible in general Banach
spaces. Motivated by this reasoning, we propose to employ a dual gradient method to
iteratively minimize a fixed Tikhonov functional and then use the resulting iteration as
inner iteration of K-REGINN. The advantage of this procedure over a classical Tikhonov
iteration as presented in Subsection 3.1.3 is clear: we do not intend to find a minimizer of
a Tikhonov functional. We only apply a dual gradient method to iteratively improve the
current iterate until finding an approximate minimizer good enough.
Assume Assumption 1 in page 31, and let X be uniformly smooth and s−convex with
p ≤ s. Define the Tikhonov functional
1
Tn (s) := kAn s − bn kr + αn Ωn (s) , (3.69)
r
where r > 1, αn > 0 and Ωn : X → R+ 0 is a subdifferentiable functional satisfying the
condition: for each n ∈ N fixed, the sequence (sm )m∈N is bounded whenever (Ωn (sm ))m∈N
is bounded. This condition guarantees the existence of a minimizer of Tn (see the reasoning
in the beginning of last subsection). Define now

Jp (zn,k+1 ) := Jp (zn,k ) − λn,k ∇Tn,k , (3.70)

with zn,0 := xn , λn,k > 0 and

∇Tn,k ∈ ∂Tn (sn,k ) = A∗n Jr (An sn,k − bn ) + αn ∂Ωn (sn,k )

with sn,k := zn,k − xn . See Algorithm 5 for an implementation in pseudocode.


Since in each inner iteration the update zn,k+1 is obtained following the direction of the
gradient of a Tikhonov functional, we hope to obtain extra stability in comparison to the
regular gradient methods, where the direction of the gradient of the linearized residual is
chosen to update the current iterate. Compare (3.70) and (3.29) .
We now apply a similar idea to that used to derive the Decreasing Error method in
Subsection 3.1.2, in order to prove inequalities (3.31) and (3.32) at the same time. Assume
that xn ∈ Bρ (x+ , ∆p ) , which implies that kzn,0 k ≤ Cρ,x+ , see (3.30) . Suppose by induction
that for some k ∈ N, the iterates zn,0 , ..., zn,k are well-defined and satisfy the inequalities
(3.31) and (3.32) . Again, everything we need in order to complete the induction proof is
to prove that the inequality ∆p (x+ , zn,k+1 ) < ∆p (x+ , zn,k ) holds. Making use of the three
points identity (2.21) and inequality (2.24) we find, like in (3.33) and (3.35) ,

∆p x+ , zn,k+1 − ∆p x+ , zn,k ≤ g (λn,k ) ,


 
48 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

Algorithm 5 K-REGINN with mixed gradient-Tikhonov inner iteration


Input: xN ; (y δj , δj ); Fj ; Fj0 , j = 0, . . . , d − 1; µ; τ ; p, r > 1;
δ
Output: xN with kyj j − Fj (xN )k ≤ τ δj , j = 0, . . . , d − 1;
` := 0; x0 := xN ; c := 0;
while c < d do
for j = 0 : d − 1 do
n := `d + j;
δ
bδn := yj j − Fj (xn ); An := Fj0 (xn );
if kbδn k ≤ τ δj then
xn+1 := xn ; c := c + 1;
else
k := 0; sn,0 := 0; choose kmax,n ∈ N; αn > 0 and Ωn : X → R+
0 subdifferentiable;
repeat
choose λn,k > 0 and ∇Tn,k ∈ A∗n Jr (An sn,k − bδn ) + αn ∂Ωn (sn,k );
zn,k := sn,k + xn ;
Jp (zn,k+1 ) := Jp (zn,k ) − λn,k ∇Tn,k ;
sn,k+1 := Jp∗∗ (Jp (zn,k+1 )) − xn ;
k := k + 1;

until bδn − An sn,k < µ bδn or k = kmax,n
xn+1 := xn + sn,k ; c := 0;
end if
end for
` := ` + 1;
end while
xN := x`d−c ;

with
p−s∗ (p−1) s∗ ∗
 ∗ ∗

g (λn,k ) := −λn,k h∇Tn,k , sn,k − en i + Cp∗ ,s∗ Cρ,x+ λn,k k∇Tn,k ks ∨ λpn,k k∇Tn,k kp

Observe now that,

h∇Tn,k , sn,k − en i = hA∗n jr (An sn,k − bn ) , sn,k − en i + αn s∗n,k , sn,k − en ,



with some vector s∗n,k ∈ ∂Ωn (sn,k ) . Like in (3.34),

hA∗n jr (An sn,k − bn ) , sn,k − en i ≥ kAn sn,k − bn kr − kAn sn,k − bn kr−1 kAn en − bn k ,

and as s∗n,k ∈ ∂Ωn (sn,k ) , it follows from definition of subgradient that,




sn,k , sn,k − en ≥ Ωn (sn,k ) − Ωn (en ) ≥ −Ωn (en ) .

If an upper bound C3 > 0 for Ωn (en ) is known, i.e., if

Ωn (en ) ≤ C3 (3.71)

for all n ∈ N, then choosing 0 < αn ≤ (C4 /C3 ) kbn kr with 0 < C4 < 1,

h∇Tn,k , sn,k − en i ≥ kAn sn,k − bn kr − kAn sn,k − bn kr−1 kAn en − bn k − C4 kbn kr .


3.1. SOLVING THE INNER ITERATION 49

Proceeding similarly to (3.39) , we define λDE = λDE (n, k) as

λDE := C1 λDE,s ∧ C2 λDE,p (3.72)

with C1 and C2 like in (3.39) and λDE,` being the same as λDE,` but with ∇Tn,k replacing
∇n,k , i.e.,
kAn sn,k − bn kr(`−1)
λDE,` := .
k∇Tn,k k`

Thus, λn,k ∈ 0, λDE implies that
h i
r−1 r
g (λn,k ) ≤ λn,k kAn sn,k − bn k (kAn en − bn k − C5 kAn sn,k − bn k) + C4 kbn k . (3.73)

where C5 = 1 − C0 . See that C5 > C4 > 0 if 0 < C0 < 1 − C4 .


Inequality (3.73) above was achieved using neither the TCC nor the definition of kn .
However, as we will see later in Theorem 38, if the constant η in the TCC is small enough
and the constant µ which stops the inner iteration of K-REGINN is well chosen, then the
right-hand side of (3.73) is negative. Consequently, g (λn,k ) < 0 and the induction proof is
complete.
We assume now that the functionals Ωn satisfy the property: Ωn (sn,k ) is uniformly
bounded in n and k whenever (sn,k ) is uniformly bounded in n and k. In this case, since
the sequences (sn,k )0≤k≤kn , n ∈ N are uniformly bounded (see (3.42)), it follows that,

1
αn kΩn (sn,k )k . αn . kbn kr ≤ kAn sn,k − bn kr
µr
for k = 0, ..., kn − 1. Thus,

k∇Tn,k k ≤ M kAn sn,k − bn kr−1 + αn kΩn (sn,k )k


. kAn sn,k − bn kr−1 ∨ kAn sn,k − bn kr
. kAn sn,k − bn kr−1 ,

because kAn sn,k − bn k is uniformly bounded (see (3.41) and (3.42)). Consequently,

λDE,` & kAn sn,k − bn k`−r

and therefore,

λDE & kAn sn,k − bn ks−r ∧ kAn sn,k − bn kp−r & kAn sn,k − bn ks−r (3.74)

for k = 0, ..., kn − 1.
Similarly to (3.43) , one can prove for s ≤ r that λDE,` & C6−r , where C6 > 1 is an
upper bound to kAn sn,k − bn k . This reasoning implies that the Landweber method, with
constant step-size given by
λLW := C6−r , (3.75)
satisfies λLW ≤ λDE , which implies that inequality (3.73) holds for this method.
Suppose from now on that the functional Ωn is defined by Ωn (s) := p1 ks − xn kp , with
(xn )n∈N ⊂ X being independent of k. Then the iteration (3.70) assumes the form

Jp (zn,k+1 ) = Jp (zn,k ) − λn,k (A∗n jr (An sn,k − bn ) + αn Jp (sn,k − xn )) . (3.76)

In this case, as X is s−convex and hence strictly convex, Ωn is strictly convex too and the
minimizer of Tn , defined in (3.69) , is unique.
50 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN

In order to find an upper bound C3 for Ωn (en ) = p1 ken − xn kp , an upper bound for
kx+ k normally needs to be known. For example, if xn := −xn , i.e., Ωn (s) = p1 ks + xn kp ,
p
then we need a bound for Ωn (en ) = p1 kx+ k . If the functional Ωn (s) = p1 ks − (x0 − xn )kp
is considered, then
1 x+ − x0 p ≤ 1 x+ + kx0 k p .

Ωn (en ) =
p p
p
The bound Ωn (en ) ≤ kx+ k + Cx+ ,ρ /p holds in case of Ωn (s) = p1 kskp is chosen and


xn ∈ Bρ (x+ , ∆p ) is assumed.
We finish this subsection giving a practical example: consider the iteration (3.76) with
xn := 0, which is equivalent to the dual gradient iteration (3.70) applied to the Tikhonov
functional
1 1
Tn (s) = kAn s − bn kr + αn kskp .
r p
If one wants to use SD method to iteratively minimize this functional, it is necessary to
define the step-size λn,k in (3.70) such that the number Tn (sn,k+1 ) is as small as possible.
As
sn,k+1 = Jp∗∗ (Jp (sn,k + xn ) − λn,k ∇Tn,k ) − xn ,
we conclude that λn,k ∈ arg minh (λ) , with h : R+ +
0 → R0 defined by
λ∈R+
0

h (λ) := Tn Jp∗∗ (Jp (sn,k + xn ) − λ∇Tn,k ) − xn .




It is possible to employ now an iterative method to find a minimizer of h if it exists.


Although D(h) ⊂ R, this optimization problem is often difficult to be solved and can be
computationally expensive, particularly if the Banach space Y has poor convexity and
smoothness properties.
In Hilbert spaces however, the use of polarization identity (with p = r = 2) yields
1  
h (λ) = Tn (sn,k − λ∇Tn,k ) = Tn (sn,k ) − λ k∇Tn,k k2 + λ2 kA∇Tn,k k2 + αn k∇Tn,k k2 .
2
Requiring h0 (λSD ) = 0, we obtain the explicit step-size

k∇Tn,k k2
λSD = .
kA∇Tn,k k2 + αn k∇Tn,k k2

The result implies in particular that 1/ M 2 + αn ≤ λSD ≤ 1/αn . Finally, from the modi-


fied version of Young’s inequality12 , it follows that for all a, b ≥ 0 and  > 0,
 2
b2

2 2 a 1+ 2
(a + b) ≤ a + 2 + + b2 = a + (1 + ) b2 .
2 2 
Thus,

k∇Tn,k k4 = h∇Tn,k , ∇Tn,k i2 = (hA∇Tn,k , An sn,k − bn i + αn h∇Tn,k , sn,k i)2


1+
≤ kA∇Tn,k k2 kAn sn,k − bn k2 + (1 + ) αn2 k∇Tn,k k2 ksn,k k2 ,

for all  > 0. Now, if
kAn sn,k − bn k2
αn ≤ ,
 ksn,k k2
12
For all a, b ≥ 0 and  > 0 it holds: ab ≤ a2 /2 + b2 /2.
3.1. SOLVING THE INNER ITERATION 51

it follows that
1+  
k∇Tn,k k4 ≤ kAn sn,k − bn k2 kA∇Tn,k k2 + αn k∇Tn,k k2 ,

and consequently,
1 +  kAn sn,k − bn k2
λSD ≤ ,
 k∇Tn,k k2
for all  > 0. Choosing 0 <  ≤ 1/ (1 − C0 ) , we find (1 + ) / ≤ C0 , which implies that
λSD ≤ λDE , see (3.72) .
52 CHAPTER 3. THE INEXACT NEWTON METHOD K-REGINN
Chapter 4

Convergence Analysis of K-REGINN

Although we do not consider in this chapter any specific method to generate the inner
iteration of K-REGINN, some properties of the inner iteration sequence (sn,k )k∈N are required
in form of assumptions. Only these assumptions are then used to prove general results for
the outer iteration sequence (xn )n∈N . Many of these required properties were previously
proven for the sequences generated by the methods presented in Section 3.1, which means
that these sequences can be used as inner iteration of K-REGINN and the respective results
proved in this chapter are assured for these methods. We point out however, that any
sequence satisfying the necessary requirements assumed in this chapter can be used as
inner iteration of K-REGINN and all the respective results will accordingly hold for them.
We assume again that only noisy data is available, that is, δ > 0 in (3.8) is given. To
δ[n]
stress this fact, we accordingly add a superscript δ in the residual bn : bδn = y[n] − F[n] (xn ) .

Lemma 33 Let X and Y be Banach spaces with Y being smooth. Let Assumption 1 in
page 31 hold true and kn = kREG . If xn is well-defined and kREG < ∞, then sn,kn is a
δ[n]
descent direction for the functional ψn (·) := F[n] (·) − y[n] from xn .


Proof. The result follows from Proposition 22 and An sn,k − bδn < µ bδn < bδn .
n

δ[n] δ[n]
The above lemma shows that F[n] (xn + tsn,kn ) − y[n] < F[n] (xn ) − y[n] for t > 0

small enough. But, as we will see later in Theorem
35, if the right
assumptions are required,
δ[n] δ[n]
then t = 1 can be used and consequently F[n] (xn+1 ) − y[n] < F[n] (xn ) − y[n] .

4.1 Termination and qualified decreasing residual


To show termination of K-REGINN, we need only some few restrictions on the sequence
(sn,k )k∈N . Keeping this goal in mind, we use for the first results a level set based analysis.
The level set associated to the point x ∈D(F ) is defined as
n o
£ (x) := x ∈ D (F ) : Fj (x) − yjδ ≤ Fj (x) − yjδ , j = 0, ..., d − 1 .

For any x ∈D(F ) , it holds x ∈ £ (x) and x ∈ £ (x) =⇒ £ (x) ⊂ £ (x) .


In the first step towards the termination of K-REGINN, we prove that kn is finite. For
that, we require the first assumption on the sequence (sn,k )k∈N .

Assumption 2 If the iterate xn of K-REGINN is well-defined, then there exists a constant


C ≥ 0 independent on n and k such that
r 1  r r 
lim An sn,k − bδn ≤ An s − bδn + C bδn , for all s ∈ X.

k→∞ 1+C

53
54 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN

The Assumption 2 holds true if the sequence (sn,k )k∈N is generated from the following
methods:

• Each primal gradient method, defined by iteration (3.15) , presented in Subsection


3.1.1, with step-size λk ∈ [cλM SD , λM SD ] and 0 < c < 1, where λM SD is defined in
(3.18) (this includes the MSD method itself) and assuming that Y[n] is r−smooth (see
Lemma 24).

• The primal gradient LW method, defined by iteration (3.15) with the constant step-
size (3.22) and with the additional requirement p ≤ r. For this method, we also assume
the hypotheses of Lemma 24. In particular, the space Y[n] needs to be r−smooth.

• The Tikhonov-Phillips method presented in Subsection 3.1.3 under the hypotheses of


Lemma 30.

We did not prove this property for the dual gradient methods, Iterated-Tikhonov and
mixed gradient-Tikhonov methods. Therefore, the results shown in this section, which are
based on Assumption 2 above are in principle not guaranteed for these methods. However,
the inequalities (3.37) , (3.67) and (3.73) are strong enough to guarantee not only the results
of this section, but even the stronger results of the next one (see Theorem 38 below).
Assumption 2 is a weaker version of

lim An sn,k − bδn = inf An s − bδn ,

k→∞ s∈X

which is equivalent to1


lim An sn,k = PR(An ) bδn
k→∞

in Hilbert spaces. This last property is exactly the Assumption (2.2) in [36], used to prove
[36, Lemma 2.3], which is a similar version of next proposition. Assumption 2 combined
with a careful choice of µ implies that the inner iteration stops after finitely many iterations.

Proposition 34 Assume Assumption 2 and that the iterate xn is well-defined. Assume


further that the TCC (Assumption 1(c) , page 31) holds for xn , x+ and F[n] . Then for
τ > (1 + η) / (1 − η) and µ ∈ (µmin , 1) with
 r
1+η
τ + η +C
µrmin := ,
1+C
the stop index kn is finite.

Proof. η < 1 together with the restriction on τ imply (1 + η) /τ + η < 1, which in turn
implies that the interval of picking the parameter µ is non-empty. If inequality (3.9) is
satisfied, then kn = 0. Otherwise, bδn > τ δ[n] and we define en := x+ − xn , use (3.7) and
the TCC to obtain

δ δ[n] 0 +
A e − b = y − F (x ) − F (x ) x − x (4.1)

n n n [n] [n] n [n] n n

δ[n] 0
− y[n] + F[n] x+ − F[n] (xn ) − F[n] (xn ) x+ − xn
 
≤ y[n]

≤ δ[n] + η F[n] x+ − F[n] (xn )



   1 + η 
δ
≤ δ[n] + η δ[n] + bn < + η bδn .

τ
1
PR(An ) : Y → Y is the orthogonal projection in R (An ).
4.1. TERMINATION AND QUALIFIED DECREASING RESIDUAL 55

Now, we apply Assumption 2 with s = en to prove


r 1  r r 
lim An sn,k − bδn ≤ An en − bδn + C bδn

k→∞ 1+C
 r
1+η
τ + η +C
δ r
r
≤ bn < µr bδn ,

1+C
which means that kn ≤ kREG < ∞.
Under the hypotheses of above proposition, we conclude that xn well-defined implies
xn+1 well-defined, this is, the inner iteration terminates. The next theorem shows that the
outer iteration also terminates after finitely many iterations if d = 1 (only a single equation
in (3.6) is considered) and kn = kREG . This theorem is an adaptation of [36, Theo. 2.8].

Theorem 35 (Termination) Let X and Y be Banach spaces with Y being smooth. Let
D(F ) be open, d = 1, kn = kREG and suppose that Assumption 2 holds true. Suppose
additionally that Assumption 1 in page 31 holds true with Bρ (x+ , ∆p ) replaced by £ (x) for
some x ∈ X fixed and start with x0 ∈ £ (x) . Further, assume that the constant η in the
TCC (Assumption 1(c)) is small enough to satisfy the inequality
ηr + C
< (1 − 2η)r (4.2)
1+C
and the constant τ in the discrepancy principle (3.13) is large enough to verify the inequality
 r
1+η
+ η + C < (1 + C) (1 − 2η)r . (4.3)
τ
Define now the constants 0 < µmin < µmax < 1 as
 r
1+η
τ + η +C
µrmin := , µmax := 1 − 2η (4.4)
1+C
and pick up µ ∈ (µmin , µmax ) . Then, there exists a constant 0 < Λ < 1 such that:

1. all the iterates of REGINN are well-defined and belong to the level set £ (x0 ) as long as
the discrepancy principle (3.13) is not satisfied;

2. there exists N = N (δ) ∈ N satisfying (3.13) and



ln(τ δ/ bδ0 )
N< + 1; (4.5)
ln Λ
3. the residual has the qualified decreasing behavior

δ
bn+1 ≤ Λ bδn (4.6)

for n = 0, ..., N − 1.

Proof. We first discuss the choices of the constants. See that


 r 1  1
η +C r C r
lim + 2η = < 1,
η→0+ 1+C 1+C
which implies that (4.2) is verified provided η small enough. Now, due to the restriction on
η,  r
1+η
τ + η +C ηr + C
lim = < (1 − 2η)r
τ →∞ 1+C 1+C
56 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN

and therefore, inequality (4.3) is verified whenever τ is large enough. Further, the restriction
on τ implies that the constants µmin and µmax in (4.4) are well-defined and thus the interval
for choosing the tolerance µ is non-empty. Finally, choosing Λ ∈ R satisfying
η (1 + µ)
µ+ < Λ < 1,
1−η
we observe that the restriction µ < µmax implies that µ + η (1 + µ) / (1 − η) < 1. Hence Λ
is well-defined. Further, as 1 − 2η < 1, we also have τ > (1 + η) / (1 − η) and Proposition
34 applies.
We use now an inductive argument: x0 is well-defined and belongs to £ (x0 ) . Suppose
that the iterates x0 , ..., xn with n ≤ N − 1, are well-defined, belong to £ (x0 ) and satisfy
inequality (4.6) . From Proposition 34, kn < ∞ and consequently xn+1 = xn + sn,kn is
well-defined. We prove next that xn+1 ∈D(F ) and inequality (4.6) holds for this vector,
which will prove that xn+1 ∈ £ (x0 ) . In fact, for each t ∈ R define the vector

xn,t := xn + tsn,kn ∈ X

and see that for each t ∈ R such that xn,t ∈ £ (x) ⊂D(F ) , it is true that (see Assumption
1(c))

F (xn,t ) − F (xn ) − F 0 (xn ) (xn,t − xn ) ≤


η F 0 (xn ) (xn,t − xn )

1−η
η F 0 (xn ) sn,k ,

=t n
1−η
and from definition (3.10) of kREG ,
0 δ
δ 0

δ
F (xn ) sn,k − b ≤ b
n n − F (x ) s

n n,kn < µ bn ,
n

which implies that kF 0 (xn ) sn,kn k < (1 + µ) bδn . It follows that



F (xn,t ) − y δ ≤ (1 − t) bδn + t bδn − F 0 (xn ) sn,kn (4.7)

+ F (xn,t ) − F (xn ) − F 0 (xn ) tsn,kn



η (1 + µ)
< (1 − t) bδn + tµ bδn + t
δ
bn

1−η
   
η (1 + µ) δ

δ
= 1−t 1− µ+ b
n ≤ [1 − t (1 − Λ)] bn .
1−η

for each t ∈ [0, 1] such that xn,t ∈ £ (x). Let

tmax := sup {t ∈ [0, 1] : xn,t ∈ £ (x0 )} .

We want to prove that tmax = 1. Observe first that the above set where the supremum is
taken is
non-empty because from Lemma 33, sn,kn is a descent direction for the functional
ψ (·) := F (·) − y δ from xn and since D(F ) is open, there exists a constant t > 0 such
that for all 0 < t ≤ t, xn,t ∈D(F ) and
Induction
F (xn,t ) − y δ < F (xn ) − y δ ≤ F (x0 ) − y δ .

This implies that xn,t ∈ £ (x0 ) for all 0 < t ≤ t. Assume next that tmax < 1, i.e., xn,tmax ∈
∂£ (x0 ) . As x0 ∈ £ (x) , it follows that

xn,tmax ∈ ∂£ (x0 ) ⊂ £ (x0 ) ⊂ £ (x) ⊂ D (F ) .


4.1. TERMINATION AND QUALIFIED DECREASING RESIDUAL 57

Thus,
F (xn,tmax ) − y δ < [1 − tmax (1 − Λ)] bδn < bδn ≤ bδ0 ,

which xn,tmax ∈ ∂£ (x0 ) . Choosing t = 1 in (4.7) we find (4.6) . Now, from (4.6) ,
δ contradicts
bn ≤ Λn bδ → 0 as n → ∞, which implies that (3.13) is satisfied for n large enough.
0
Finally, from
τ δ < bδN −1 ≤ ΛN −1 bδ0

we obtain 
ln τ δ/ bδ0
N −1< .
ln (Λ)

Note that the level set £ (x) can be replaced by £ (x0 ) or X in the above theorem.

Corollary 36 Assume all the hypotheses of Theorem 35. Then, F xN (δ) → F (x+ ) as


δ → 0.

Proof. The result follows from


 δ 
y − y + y δ − F xN (δ) ≤ (1 + τ ) δ.

y − F xN (δ) ≤
(4.8)

From Tangential Cone Condition, Assumption 1(c) , page 31, it follows easily that

1 Fj0 (w) (v − w)

kFj (v) − Fj (w)k ≤
1−η

for all v, w ∈ Bρ (x+ , ∆p ) and j = 0, . . . , d − 1. Now, if d = 1 and x+ is the unique solution


of (3.1) in a neighborhood of x+ , then the null space of A := F 0 (x+ ) satisfies N (A) = {0}
because if 0 6= v ∈ N (A) , then for t > 0 small enough,

F (x+ − tv) − y = F (x+ − tv) − F (x+ ) ≤


1
t kAvk = 0.
1−η

In general, the function k·kA : X → R, x 7−→ kAxk is a semi-norm in X and it is a


norm in case of A being injective (it is a weaker norm than the standard norm, because
kxkA = kAxk ≤ kAk kxk , for all x ∈ X).

Corollary 37 Assume all the hypotheses of Theorem 35. Then, xN (δ) − x+ A → 0 as
δ→0

Proof. From inequality (4.8) of Corollary 36 and Assumption 1(c) ,

xN (δ) − x+ = F 0 x+ xN (δ) − x+
 
A
≤ (η + 1) F xN (δ) − F x+ ≤ (η + 1) (1 + τ ) δ.
 

The boundedness of £ (x0 ) would certainly facilitate a convergence analysis of the family
xN (δ) δ>0 ⊂ £ (x0 ) using for instance, a weak convergence argument in reflexive Banach
spaces. There is no reason however, to believe that this is the case. On the contrary, this
set is expected to be unbounded for ill-posed problems, which stimulate us to concentrate
in a local convergence analysis restricted to the bounded set Bρ (∆p , x+ ) .
58 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN

4.2 Decreasing error and weak convergence


Assumption 3 If the iterate xn of K-REGINN is well-defined and the definitions en :=
x+ − xn and zn,k := sn,k + xn with sn,0 := 0 are adopted, then there exist constants 0 ≤
K0 < K1 ≤ 1 and positive numbers λn,k satisfying
t
λn,k & An sn,k − bδn , with t > −r

such that

∆p x+ , zn,k+1 − ∆p x+ , zbn,k
 
 r−1   r 
δ
≤ λn,k An sn,k − bn An en − bn − K1 An sn,k − bn + K0 bδn ,
δ δ

where either zbn,k = zn,k or zbn,k = xn .

The following methods satisfy the required properties of Assumption 3, with zbn,k = zn,k ,
provided X is uniformly smooth and s−convex with p ≤ s:

• The dual gradient method DE, presented in (3.29) and (3.39) , Subsection 3.1.2 with
K1 = 1 − C0 , 0 < C0 < 1, K0 = 0 (see (3.37)) and t = s − r (see (3.40));

• Each dual gradient method,


h defined by iteration
i (3.29) in Subsection 3.1.2, with step-
δ
t
size satisfying λn,k ∈ c An sn,k − bn , λDE where c > 0 and t > −r (K1 and K0

are the same as the DE method). In particular, the MSD method as defined in (3.45) ,
with −r < t ≤ 0, see (3.46), and LW method with step-size being given by a small
constant (see (3.43)) and with s ≤ r. For this last method, it obviously holds t = 0;

• The Bregman variation of the IT method defined by the implicit iteration (3.62)
in Subsection 3.1.3 with K1 = 1/2r−1 − C0r , 0 < C0r < 1/2r−1 , λn,k = 1/αn and
K0 = t = 0 (see (3.67) and (3.68));

• Each of the mixed gradient-Tikhonov methods presented in h Subsection 3.1.4 andi


t
defined by iteration (3.76) , with step-size satisfying λn,k ∈ c An sn,k − bδn , λDE
with c > 0 and t > −r, where λDE is defined in (3.72). In particular, the DE method
itself (see (3.74)) and the LW method with a small constant step-size (see (3.75)).
For these methods, 0 < K0 < 1 − C0 = K1 with 0 < C0 < 1, as we can see in (3.73).

The Assumption 3 is verified for zbn,k = xn if X is uniformly smooth and uniformly


convex for:

• The Bregman variation of TP method as presented in (3.58) , Subsection 3.1.3, with


K1 = 1, K0 = (rθ − 1) /r < 1, λn,k = 1/αk and t = 0 (see (3.59) and (3.57)).

Assumption 3 is a generalization of [36, Assumption (3.2)] and the next Theorem cor-
responds to Theorem 3.1 in the same reference.

Theorem 38 (Decreasing error) Let X and Y be Banach spaces with X being uniformly
smooth and uniformly convex. Assume that Assumption 3 and Assumption 1 in page 31
hold true with the constant η in the TCC satisfying

0 ≤ η < K 1 − K0 ,
4.2. DECREASING ERROR AND WEAK CONVERGENCE 59

and the constant τ in (3.13) satisfying


η+1
τ> . (4.9)
K1 − K0 − η
Assume additionally that Assumption 2 in page 53 holds true with C = 0 in case of zbn,k = xn
and start with x0 ∈ Bρ (x+ , ∆p ) . Then for µ ∈ (µmin , 1) with
η+1
+ η + K0
µrmin := τ
,
K1
1. all the iterates of K-REGINN are well-defined and belong to Bρ (x+ , ∆p ) ⊂D(F ) as long
as the discrepancy principle (3.13) is not satisfied;
2. the outer iteration terminates, this is, there exists a number N = N (δ) ∈ N satisfying
(3.13) ;
3. the iterates have the error decreasing behavior

∆p x+ , xn+1 ≤ ∆p x+ , xn
 
(4.10)

for n = 0, ..., N − 1, where the equality holds if and only if (3.9) holds;
4. the generated sequence is bounded:

kxn k ≤ Cρ,x+ for all δ > 0 and n = 0, ..., N, (4.11)

where Cρ,x+ > 0 is defined in (3.30) ;


5. there exists a constant C0 > 0 such that the inequality
n −1
kX r
λn,k An sn,k − bδn ≤ ∆p x+ , xn − ∆p x+ , xn+1
 
C0 (4.12)

k=0

holds if zbn,k := zn,k and


r
C0 λn,kn −1 An sn,kn −1 − bδn ≤ ∆p x+ , xn − ∆p x+ , xn+1
 
(4.13)

if zbn,k := xn .

Proof. The restrictions on the constant η imply the well-definedness of τ and µmin < 1. We
use induction: as x0 ∈ Bρ (x+ , ∆p ) , the bound kx0 k ≤ Cρ,x+ holds (see (2.20)). Suppose
that the iterates x0 , ..., xn , with n < N are well-defined, belong to Bρ (x+ , ∆p ) and satisfy
(4.10) as well as (4.11) . If (3.9) is verified, then xn+1 = x n and all the required properties
hold true for xn+1 . Otherwise, as bδn ≤ µ1 An sn,k − bδn for k = 0, ..., kn − 1, we proceed

like in (4.1) to validate


1 + η 
δ
An en − bn < + η bδn ≤ Cη,τ,µ An sn,k − bδn ,

τ
 
with Cη,τ,µ := 1+η τ + η /µ. Using now µmin < µ < 1, we get from Assumption 3,
h r r i
+ +
< λn,k (Cη,τ,µ − K1 ) An sn,k − bn + K0 bδn
δ
 
∆p x , zn,k+1 − ∆p x , zbn,k (4.14)

 
K0 r

≤ λn,k Cη,τ,µ − K1 + r An sn,k − bδn
µ
r
≤ −C0 λn,k An sn,k − bδn ,

60 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN
 
1+η
with C0 := K1 − τ + η + K0 /µr > 0.
t
We study first the case zbn,k = zn,k . Using the inequality λn,k & An sn,k − bδn , with
t > −r (see Assumption 3), we obtain for all ` ≤ kn ,
 t+r `−1
X t+r
µ bδn `≤ A s − bδ

n n,k n
k=0
`−1
X `−1
r (4.14) X
C0 λn,k An sn,k − bδn ≤ ∆p x+ , zn,k − ∆p x+ , zn,k+1
 
.

k=0 k=0
= ∆p x , zn,0 − ∆p x , zn,` ≤ ∆p x+ , zn,0 = ∆p x+ , xn < ∞,
+ +
 
 

which implies that ` < ∞ and consequently kn < ∞. This means that the inner iteration
terminates. Using ` = kn in the above inequality we obtain
n −1
kX r
C0 λn,k An sn,k − bδn ≤ ∆p x+ , zn,0 −∆p x+ , zn,kn = ∆p x+ , xn −∆p x+ , xn+1 ,
   
0<

k=0
which implies (4.12) and (4.10) . Using now (4.10) , we conclude that
∆p x+ , xn+1 ≤ ∆p x+ , xn ≤ ... ≤ ∆p x+ , x0 < ρ,
  

which means that xn+1 ∈ Bρ (x+ , ∆p ) ⊂D(F ) and that (4.11) is true(see (2.20)).
It remains
to prove that the outer iteration terminates. Define the set I := n ∈ N0 : bδn > τ δ[n]

and let n represent the largest number in I (which in principle could possibly be ∞). For any
t+r
n ∈ I the inequality bδn ≥ (τ δmin )t+r holds, where δmin := min {δj : j = 0, ..., d − 1} >
0. Now, as kn ≥ 1 for all n ∈ I and kn = 0 otherwise, we obtain from (4.12),
n  
X δ t+r t+r
X  X
t+r
(µτ δmin ) ≤ µ bn kn ≤ µ bδn kn
n∈I n∈I n=0
X n −1
n kX r
δ
. λn,k An sn,k − bn

n=0 k=0
Xn
∆p x+ , xn − ∆p x+ , xn+1 ≤ ∆p x+ , x0 < ∞,
  
.
n=0
which can only be true if n < ∞. Then N (δ) = n + 1 < ∞.
We turn now to the proof in the case zbn,k = xn . Observe first that τ > (1 + η) / (1 − η) .
Now, as C = 0 in Assumption 2, 0 < K1 ≤ 1 and K0 ≥ 0, we conclude that µmin is larger
than that one in Proposition 34, which implies that the results of this proposition are valid
here and therefore kn is finite. Using k = kn − 1 in (4.14) ,
∆p x+ , xn+1 − ∆p x+ , xn = ∆p x+ , zn,kn − ∆p x+ , zbn,k
   
r
δ
≤ −C0 λn,kn −1 An sn,kn −1 − bn ,

which is (4.13) . Consequently, xn+1 ∈ Bρ (x+ , ∆p ) and (4.10) as well as (4.11) is true.
Defining the set I and the number δmin as before,
t+r X
X X  t+r
µt+r (τ δmin )t+r ≤ µ bδn ≤ An sn,kn −1 − bδn

n∈I n∈I n∈I
Xn r
. λn,kn −1 An sn,kn −1 − bδn

n=0
Xn
∆p x+ , xn − ∆p x+ , xn+1 ≤ ∆p x+ , x0 < ∞.
  
.
n=0
4.2. DECREASING ERROR AND WEAK CONVERGENCE 61

As before, we conclude that N (δ) < ∞.

Remark 39 Assuming the hypotheses of Theorem 38, the results of Corollaries 36 and 37
are clearly true. Furthermore, the monotonicity estimate (4.10) can be proven in a more
general setting:
∆p (ϑn , xn+1 ) ≤ ∆p (ϑn , xn ) , (4.15)
where ϑn represents a solution of the [n]-th equation in Bρ (x+ , ∆p ): y[n] = F[n] (ϑn ) . We
remark further that, for those iterations satisfying kn = kREG and where inequality (3.9) is
violated by xn , we find following [21],

δ[n] δ[n] 0
y[n] − F[n] (xn+1 ) ≤ y[n] − F[n] (xn ) − F[n] (xn ) (xn+1 − xn )


0
+ F[n] (xn+1 ) − F[n] (xn ) − F[n] (xn ) (xn+1 − xn )


δ[n]
≤ µ y[n] − F[n] (xn ) + η F[n] (xn+1 ) − F[n] (xn )


δ[n] δ[n]
≤ (µ + η) y[n] − F[n] (xn ) + η y[n] − F[n] (xn+1 ) ,

so that
δ[n]

δ[n]
µ+η
y[n] − F[n] (xn+1 ) ≤ Λ y[n] − F[n] (xn ) with Λ := .

1−η
Finally, if η satisfies η + K0 < (1 − 2η)r K1 (this is possible for a small η because K0 < K1 )
and
η+1
τ> r
(1 − 2η) K1 − (η + K0 )
then µrmin < (1 − 2η)r and restricting µ to (µmin , 1 − 2η) yields Λ < 1. In particular, if
d = 1 (only one equation in (3.6) is considered), (4.6) and (4.5) are true.

Remark 40 Inequality (4.14) together with zbn,k = zn,k implies

∆p x+ , zn,k+1 ≤ ∆p x+ , zn,k ≤ ... ≤ ∆p x+ , zn,0 = ∆p x+ , xn < ρ,


   

which is the monotonicity of the inner iteration. In view of (2.20) ,

kzn,k k ≤ Cρ,x+ . (4.16)

Thus the sequences (zn,k )0≤k≤kn , n ≤ N (δ) , are uniformly bounded in n, k and δ.
The monotonicity in the inner iteration does not follow directly from (4.14) if zbn,k = xn ,
but the uniform bound (4.16) still holds in this case because

∆p x+ , zn,k+1 ≤ ∆p x+ , xn < ρ.
 

Weak convergence of K-REGINN follows immediately from (4.11) and the reflexivity of
X.

Corollary 41 (Weak convergence) Let all the assumptions of Theorem 38 hold true.
If the operators
  Fj , j = 0, . . . , d − 1, are weakly sequentially closed then for any se-
(δj )i (i)

quence yj with δ = max (δj )i : j = 0, . . . , d − 1 → 0 as i → ∞, the se-
 i∈N

quence xN (δ(i) ) contains a subsequence that converges weakly to a solution of (3.1)
i∈N
in Bρ (x+ , ∆p ) . If x+ is the unique solution of (3.1) in Bρ (x+ , ∆p ) , then xN (δ) δ>0 con-


verges weakly to x+ as δ = max {δj : j = 0, ..., d − 1} → 0.

Proof. This is a standard proof. See, e.g. [36, Cor. 3.5].


62 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN

4.3 Convergence without noise


From now on, we need to clearly differ between the noisy (δ > 0) and the noise-free (δ = 0)
situations. For this reason we exclusively mark quantities by a superscript δ when the data
is corrupted by noise: xδn , bδn , Aδn , etc. Thus, xn , bn , An , etc. originate from exact data.
Note that the starting guess is chosen independently of δ: xδ0 = x0 .
Algorithm 1 is well defined in the noiseless situation when we set δj = 0, τ = ∞,
and τ δj = 0. Then, the discrepancy principle (3.9) is replaced by kbn k = 0, in which
case xn+1 = xn . Termination only occurs in the unlikely event that an iterate xN satisfies
kyj − Fj (xN )k = 0 for j = 0, . . . , d − 1, i.e., xN solves (3.6). In general, Algorithm 1 does
not stop but produces a sequence which converges strongly to a solution of (3.1) as we will
prove in this section, see Theorem 43 below.
Except for the termination, all results of Theorem  38 hold true with an even larger
interval for the selection of the tolerances: µ ∈ η+K K1
0
, 1 . Accordingly, the constant in
r
(4.14) is replaced by C0 := K1 − (η + K0 ) /µ > 0. Further, N (δ) = ∞ in case we have no
premature termination.
In the next assumption, we define how the regularizing sequence is generated in the
inner iteration.

Assumption 4 If the iterate xn of K-REGINN is well-defined, then the inner iteration se-
quence is generated by

zn,k ) − λn,k (A∗n jr (vn,k ) + γn Jp (sn,k − xn )) ,


Jp (zn,k+1 ) = Jp (b

where (vn,k )k≤kn ⊂ Y[n] satisfies kvn,k k ≤ kAn sn,k − bn k. The sequence (γn )n∈N ⊂ R obeys
the inequalities 0 ≤ γn ≤ K2 kbn kr with K2 ≥ 0. Further, (xn )n∈N ⊂ X is an arbitrary
sequence and K2 = 0 whenever this sequence is unbounded or in case zbn,k = xn .

The following methods satisfy the above assumption:

• All the dual gradient methods, defined by iteration (3.29) in Subsection 3.1.2 and
satisfying λn,k ≤ λDE . In particular, the DE method itself, the MSD and LW methods.
Here, zbn,k = zn,k , vn,k = An sn,k − bn and K2 = 0;

• The Bregman variation of the Iterated-Tikhonov method (3.62), with zbn,k = zn,k ,
vn,k = An sn,k+1 − bn and K2 = 0;

• The Bregman variation of Tikhonov-Phillips method (3.58) with zbn,k = xn , vn,k =


An sn,k+1 − bn (see (3.54)) and K2 = 0;

• The mixed gradient-Tikhonov methods, defined by iteration (3.76) , with zbn,k = zn,k ,
vn,k = An sn,k − bn , γn = αn and K2 = K0 /K3 , where
 K0 is defined
 in Assumption 3,
page 58, and K3 is an upper bound to the sequence 1
p ken − xn kp (see Subsection
n∈N
3.1.4).

With the next lemma we prepare our convergence proof for the exact data case.
 
Lemma 42 Assume all the hypotheses from Theorem 38 but with µ ∈ η+K K1
0
, 1 and let
Assumption 4 hold true. Then,

∆p (xn , xn+1 ) . ∆p x+ , xn − ∆p x+ , xn+1


 
(4.17)

for all n ∈ N.
4.3. CONVERGENCE WITHOUT NOISE 63

Proof. From (2.21),

∆p (xn , xn+1 ) = ∆p x+ , xn+1 − ∆p x+ , xn + Jp (xn+1 ) − Jp (xn ) , x+ − xn .


 

(4.18)

We first analyze the case zbn,k = zn,k . From Assumption 4,

Jp (xn+1 ) − Jp (xn ) , x+ − xn = hJp (zn,kn ) − Jp (zn,0 ) , en i

n −1
kX
= hJp (zn,k+1 ) − Jp (zn,k ) , en i
k=0
n −1
kX
= λn,k (hjr (vn,k ) , −An en i + γn hJp (sn,k − xn ) , −en i) .
k=0

If (xn )n∈N is not bounded, then γn ≡ 0 (K2 = 0), otherwise, since the sequences (en ) and
(sn,k ) are uniformly bounded, see (4.11) and (4.16) ,

γn hJp (sn,k − xn ) , −en i ≤ γn ksn,k − xn kp−1 ken k . γn . kbn kr . (4.19)


1
From properties of jr , the TCC (Assumption 1(c)) and kbn k ≤ µ kbn − An sn,k k for k =
0, ..., kn − 1, we obtain by making use of (4.12),

n −1
kX  
Jp (xn+1 ) − Jp (xn ) , x+ − xn . λn,k kAn sn,k − bn kr−1 kAn en k + kbn kr

k=0
n −1
kX  
≤ λn,k kAn sn,k − bn kr−1 (η + 1) kbn k + kbn kr
k=0
  kn −1
η+1 1 X
≤ + r λn,k kAn sn,k − bn kr
µ µ
k=0
. ∆p x , xn − ∆p x+ , xn+1 .
+
 

Inserting this result in (4.18), we arrive at (4.17) . We proceed now proving the result for
the case zbn,k = xn (consequently, K2 = 0). Similarly to above, but using (4.13) instead of
(4.12) , the inequality

Jp (xn+1 ) − Jp (xn ) , x+ − xn = hJp (zn,kn ) − Jp (xn ) , en i



= −λn,kn −1 hjr (vn,kn −1 ) , An en i


η+1
≤ λn,kn −1 kAn sn,kn −1 − bn kr
µ
. ∆p x+ , xn − ∆p x+ , xn+1 ,
 

is achieved, which concludes the proof.


Assumptions 1 to 4 are strong enough to prove the convergence of the sequence (xn )n∈N
to a solution of (3.1) as n → ∞ if a single equation in (3.6) is considered (d = 1). However,
it is very difficult to prove the same result for the Kaczmarz version of REGINN (d > 1)
using only these assumptions. The boundedness of kn and of λn,k seems to be necessary
conditions to this end. For this reason we enunciate the next assumption.

Assumption 5 If s ≤ r and kmax < ∞ in (3.12) , then there exist constants 0 < λmin ≤
λmax such that λmin ≤ λn,k ≤ λmax for all k = 0, ..., kn and n ∈ N.

The above assumption is verified for:


64 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN

• The dual gradient method LW with a small constant step-size. This method clearly
satisfies λmin = λLW = λmax ;
• The Bregman variation of Iterated-Tikhonov method, where the required inequalities
are satisfied for λn,k = 1/αn . Consequently, λmin = 1/αmax and λmax = 1/αmin (see
(3.68));
• The Bregman variation of Tikhonov-Phillips method with λn,k = 1/αk . In this case,
λmin = 1/α0 and λmax = 1/αkmax (see (3.57)).
It is possible to prove the existence of λmin for the dual gradient methods DE and MSD
(see (3.43) and (3.47)). But unfortunately, this seems not to be the case for λmax . However,
if we modify the definitions of these methods to λnew = λold ∧ λ with a (possibly very
large) constant λ ≥ λmin , Assumption 5 is immediately verified for these methods with
λmax := λ. Anyway, some results of next Theorem hold true for these methods in their
original definition. In the following convergence proof we adapt ideas from our previous
work [40].
Theorem 43 (Convergence without noise) Let X and Y be Banach spaces with X
being uniformly smooth and s−convex with p ≤ s. Let Assumption 1 in page 31 hold true
and start with x0 ∈ Bρ (x+ , ∆p ) . Assume that Assumptions 3, page 58 and 4, page 62
hold true and that Assumption 2, page 53 is verified with C = 0 in case of zbn,k = xn .
Suppose that the constant η in the TCC satisfy η < K1 − K0 and that µ ∈ (µmin , 1) with
µmin := (η + K0 ) /K1 < 1. If d = 1, then REGINN either stops after finitely many iterations
with a solution of (3.1) or the generated sequence (xn )n∈N ⊂ Bρ (x+ , ∆p ) converges strongly
in X to a solution of (3.1) . If x+ is the unique solution in Bρ (x+ , ∆p ), then xn → x+ as
n → ∞.
If d > 1 then the same results hold true for K-REGINN in case of Assumption 5 holds
true as well as kmax < ∞ and s ≤ r.
Proof. As this proof is relatively long, it is divided in four parts. In the first and second
parts, the case d > 1 is analyzed and Assumption 5 is employed to prove inequality (4.31)
below, for the case zbn,k = zn,k in the first part of the proof and for zbn,k = xn in the second
one. In the third part, an inequality similar to (4.31) is established for the case d = 1.
Finally, we prove in the fourth part of the proof that the residual converges to zero as n
goes to infinity, which together with (4.31) will ensure the desired result.
If Algorithm 1 stops after a finite number of iterations then the current iterate is a
solution of (3.6) and consequently of (3.1) . Otherwise, (xn )n∈N is a Cauchy sequence, as
we will prove now. Let m, l ∈ N with m ≤ l. The trick of the whole proof is to use the
triangle inequality in order to ”cut” the norm kxm − xl k in an appropriate vector xz and
then estimate each of the resulting norms using the properties of this vector.
Part 1: We consider first the case d > 1. In this case Assumption 5 holds true as well
as kmax < ∞ and s ≤ r. Write m = m0 d + m1 and l = l0 d + l1 with m0 , l0 ∈ N and
m1 , l1 ∈ {0, . . . , d − 1} . Of course m0 ≤ l0 . Choose z0 ∈ {m0 , . . . , l0 } such that
d−1
P
(kyj − Fj (xz0 d+j )k + kxz0 d+j+1 − xz0 d+j k) (4.20)
j=0
d−1
P
≤ (kyj − Fj (xn0 d+j )k + kxn0 d+j+1 − xn0 d+j k)
j=0

for all n0 ∈ {m0 , . . . , l0 }. Define z := z0 d + z1 , where z1 = l1 if z0 = l0 and z1 = d − 1


otherwise. This setting guarantees m ≤ z ≤ l. As X is s−convex and the sequence (xn )n∈N
is bounded (see (4.11)), it follows from (2.23) that
kxm − xl ks ≤ 2s−1 (kxm − xz ks + kxz − xl ks ) . ∆p (xz , xm ) + ∆p (xz , xl ) .
4.3. CONVERGENCE WITHOUT NOISE 65

Three points identity (2.21) implies now that

kxm − xl ks . βm,z + βl,z + f (z, m, l) (4.21)

with
βm,z := ∆p x+ , xm − ∆p x+ , xz
 

and

f (z, m, l) := Jp (xz ) − Jp (xm ) , xz − x+ + Jp (xz ) − Jp (xl ) , xz − x+ .




By monotonicity (4.10), we conclude that ∆p (x+ , xn ) → γ ≥ 0 as n → ∞. Thus, βm,z and


βl,z converge to zero as m → ∞ (which causes z → ∞ and l → ∞). Further,

l−1
X
Jp (xn+1 ) − Jp (xn ) , xz − x+ .


f (z, m, l) ≤ (4.22)
n=m

We want to show now that f (z, m, l) is bounded from above by a term that converges to
zero as m → ∞, which together with (4.21) will prove that (xn )n∈N is a Cauchy sequence.
Our first goal is to prove this property for the case zbn,k = zn,k . As in the proof of Lemma
42,

Jp (xn+1 ) − Jp (xn ) , xz − x+


(4.23)
n −1
kX  
≤ λn,k kAn sn,k − bn kr−1 kAn ez k + γn |hJp (sn,k − xn ) , ez i| .
k=0

Similarly to (4.19),
γn |hJp (sn,k − xn ) , ez i| . kbn kr . (4.24)
We proceed applying Assumption 1(c) to estimate kAn ez k :

kAn ez k ≤ An x+ − xn + kAn (xz − xn )k



(4.25)
≤ kbn k + bn − An x+ − xn + F[n] (xz ) − F[n] (xn )


0
+ F[n] (xz ) − F[n] (xn ) − F[n] (xn ) (xz − xn )


≤ (η + 1) kbn k + F[n] (xz ) − F[n] (xn )

≤ (η + 1) 2 kbn k + y[n] − F[n] (xz ) .

Note that in the last norm, the operator F[n] is applied in the ”wrong” vector xz . To
estimate this norm, we use Assumption 1. Write n = n0 d + n1 with n0 ∈ {m0 , . . . , l0 } and
n1 ∈ {0, . . . , d − 1} . Then,

y[n] − F[n] (xz ) = kyn − Fn (xz d+z )k
1 1 0 1
d−1
X
≤ kyn1 − Fn1 (xz0 d+n1 )k + kFn1 (xz0 d+j+1 ) − Fn1 (xz0 d+j )k
j=0
d−1
1 X Fn0 (xz d+j ) (xz d+j+1 − xz d+j )

≤ kyn1 − Fn1 (xz0 d+n1 )k + 0 0 0
1−η 1
j=0
 d−1
X
M
≤ 1+ (kyj − Fj (xz0 d+j )k + kxz0 d+j+1 − xz0 d+j k) .
1−η
j=0
66 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN

This inequality was the motivation for the definition of z. From (4.20),
 d−1
X
y[n] − F[n] (xz ) ≤ 1 + M

(kyj − Fj (xn0 d+j )k + kxn0 d+j+1 − xn0 d+j k) . (4.26)
1−η
j=0

Inserting (4.26) in (4.25), (4.25) and (4.24) in (4.23), (4.23) in (4.22), and using the defini-
tion of kn , we arrive at
n −1
l−1 kX
X
f (z, m, l) . λn,k kAn sn,k − bn kr + g (z, m, l) + h (z, m, l) , (4.27)
n=m k=0

where
X n −1
l−1 kX d−1
X
g (z, m, l) := λn,k kAn sn,k − bn kr−1 kyj − Fj (xn0 d+j )k ,
n=m k=0 j=0

X n −1
l−1 kX d−1
X
h (z, m, l) := λn,k kAn sn,k − bn kr−1 kxn0 d+j+1 − xn0 d+j k .
n=m k=0 j=0

The first term on the right-hand side of (4.27) can be estimated by (4.12). It remains to
estimate g and h. At this point we need the boundedness of kn . From kn ≤ kmax ,
!v k −1
n −1
kX n −1
kX n
X
v
kAn sn,k − bn k . kAn sn,k − bn k . kAn sn,k − bn kv
k=0 k=0 k=0
for any v > 0. Then defining
n −1
kX
wn := kAn sn,k − bn k ,
k=0

and making use of the inequality2


 
d−1
X d−1
X Xd−1 d−1
X
ar−1
i bj ≤ d2  ari + brj  for every ai , bj ≥ 0,
i=0 j=0 i=0 j=0

we find applying Assumption 5 and definition of kn (3.11),


l−1
X d−1
X
g (z, m, l) . λmax wnr−1 kyj − Fj (xn0 d+j )k (4.28)
n=m j=0
 
l0
X d−1
X d−1
X
≤ λmax  wnr−1
0 d+n1
kyj − Fj (xn0 d+j )k
n0 =m0 n1 =0 j=0
l0 d−1 d−1
!
X X X
. λmax wnr 0 d+n1 + kyn1 − Fj (xn0 d+n1 )kr
n0 =m0 n1 =0 n1 =0
l0 d+d−1
X l0 d+d−1
X
≤ λmax wnr + kbn kr
n=m0 n=m0
l0 d+d−1
X kX n −1 l d+d−1kn −1
λmax 0 X X
r
. λmax kAn sn,k − bn k ≤ λn,k kAn sn,k − bn kr .
n=m0 k=0
λmin n=m
0 k=0
d−1
X d−1
X
2 r−1
ai bj ≤ ari if bj ≤ ai , and ar−1
i bj ≤ brj otherwise. Thus, ar−1
i bj ≤ ari + brj ≤ ari + brj for
i=0 j=0
d−1 d−1 d−1 d−1
!
X X X X
i, j = 0, ..., d − 1 and consequently ar−1
i bj ≤ d 2 ari + brj .
i=0 j=0 i=0 j=0
4.3. CONVERGENCE WITHOUT NOISE 67

Similarly,
l0 d+d−1
X kX n −1 l0 d+d−1
X
r
h (z, m, l) . λn,k kAn sn,k − bn k + kxn+1 − xn kr . (4.29)
n=m0 k=0 n=m0

Making use of the s−convexity of X once again,


(4.17)
kxn+1 − xn ks . ∆p (xn , xn+1 ) . ∆p x+ , xn − ∆p x+ , xn+1 .
 
(4.30)

As r ≥ s, we have for m, l large enough that


l0 d+d−1
X l0 d+d−1
X
r
kxn+1 − xn k ≤ kxn+1 − xn ks
n=m0 n=m0
∆p x+ , xm0 − ∆p x+ , xl0 d+d = βm0 ,l0 d+d .
 
.

Plugging this bound into (4.29), inserting then inequalities (4.29) and (4.28) in (4.27), (4.27)
in (4.21), and using (4.12), we end up with

kxm − xl ks . βm,z + βl,z + βm0 ,l0 d+d . (4.31)

Part 2: Now we consider the case zbn,k = xn (in this case K2 = 0 and consequently
γn ≡ 0 in Assumption 4) together with d > 1. Inequality (4.23) reads here
Jp (xn+1 ) − Jp (xn ) , xz − x+ ≤ λn,k −1 kAn sn,k −1 − bn kr−1 kAn ez k .


n n (4.32)

Proceeding like above, we find, similarly to (4.27) ,


l−1
X
f (z, m, l) . λn,kn −1 kAn sn,kn −1 − bn kr + g (z, m, l) + h (z, m, l)
n=m

with
l−1
X d−1
X
r−1
g (z, m, l) := λn,kn −1 kAn sn,kn −1 − bn k kyj − Fj (xn0 d+j )k ,
n=m j=0
l−1
X d−1
X
r−1
h (z, m, l) := λn,kn −1 kAn sn,kn −1 − bn k kxn0 d+j+1 − xn0 d+j k .
n=m j=0

In the same way as done in (4.28) and (4.29) above, the inequalities
l0 d+l−1
X
g (z, m, l) . λn,kn −1 kAn sn,kn −1 − bn kr ,
n=m0
l0 d+l−1
X l0
X
r
h (z, m, l) . λn,kn −1 kAn sn,kn −1 − bn k + kxn+1 − xn kr
n=m0 n=m0

can be proven. Using now (4.13) instead of (4.12) we arrive again at (4.31) .
Part 3: The case d = 1 is considered now (therefore kmax = ∞ is possible). This
situation is easier because everything we need is to change the definition of xz in (4.20) to
the vector with the smallest residual in the outer iteration, i.e., choose z ∈ {m, . . . , l} such
that kbz k ≤ kbn k , for all n ∈ {m, . . . , l} . Then, from (4.25) ,

kAn ez k ≤ (η + 1) (2 kbn k + kbz k) ≤ 3 (η + 1) kbn k .


68 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN

The inequality kbn k ≤ µ1 kAn sn,k − bn k for k = 0, ..., kn − 1, together with (4.32) (respec-
tively (4.23) and (4.24)) and (4.22) , results in

n −1
l−1 kX
X
f (z, m, l) . λn,k kAn sn,k − bn kr
n=m k=0

in the case zbn,k = zn,k and

l−1
X
f (z, m, l) . λn,kn −1 kAn sn,kn −1 − bn kr
n=m

in the case zbn,k = xn . Plugging now these inequalities into (4.21) and using again (4.12)
(respectively (4.13)), we arrive at (4.31) with m0 and l0 d + d replaced by m and l, respec-
tively.
Part 4: In any case the right-hand side of inequality (4.31) converges to zero as m → ∞
revealing (xn )n∈N to be a Cauchy sequence. As X is complete, it converges to some x∞ ∈ X.
Observe that kn ≥ 1 if kbn k 6= 0 and since µ kbn k ≤ kAn sn,k − bn k for all k ≤ kn − 1, it
follows that for the case zbn,k = zn,k ,

∞ ∞ n −1
∞ kX
X X X (4.12)
r+t r+t
µ r+t
kbn k ≤ kn (µ kbn k) . λn,k kAn sn,k − bn kr < ∞
n=0 n=0 n=0 k=0

Similarly, ∞ r+t
P
n=0 kb n k < ∞ for z
b n,k = x n . Then, y[n] − F[n] (xn ) = kbn k → 0 as n → ∞
and since the Fj ’s are continuous for all j = 0, . . . , d − 1, we have yj = Fj (x∞ ). If (3.1) has
only one solution in Bρ (x+ , ∆p ) then x∞ = x+ .

4.4 Regularization property


In this section we validate that, under appropriate conditions, K-REGINN is a regularization
scheme for solving (3.1) with noisy data y δ . Indeed, we show that the family (xδN (δ) )δ>0
of outputs of Algorithm 1 relative to the inputs (y δ )δ>0 converges strongly to solutions of
(3.1) with exact data y as δ → 0.
To avoid possible wrong interpretations, we will not use the notation δj , j = 0, . . . , d−1,
as in (3.7) any more. Instead, when we write δi , we mean a positive number in a sequence
of δ’s as defined in (3.8) , i.e., δi := max (δj )i : j = 0, . . . , d − 1 > 0.
In order to prove the regularization property, a standard argument of
three steps is fre-

quently used. Firstly, one proves the monotonicity of the error: xδn − x+ ≤ xδn−1 − x+
for n = 1, ..., N (δ) . Secondly, the convergence in the noiseless situation is proven: for  > 0
+
δ kxn − x k < /2 provided n large enough. Thirdly, a stability result is necessary:
fixed,
xn − xn < /2 provided δ small enough. Now, disregarding the details of pathological
cases, N (δ) → ∞ as δ → 0. Thus, for n ∈ N sufficiently large but fixed and δ small enough
to guarantee N (δ) ≥ n, it follows that

δ
xN (δ) − x+ ≤ xδn − x+ ≤ xδn − xn + xn − x+ < .

In this subsection, we apply a similar reasoning with the necessary modifications to fit
it in with our framework. We follow ideas from [36] and [40]. In order to facilitate the
comprehension, we summarize the main results:

1. From (3.11) , it follows that kn is an arbitrary number less than or equal kREG ,
which means that the sequence (xn )n∈N , generated from a run of K-REGINN using
4.4. REGULARIZATION PROPERTY 69

data without noise, can change if the sequence (kmax,n )n∈N is chosen differently. In
Definition 44 below, we observe all the sequences which are possibly generated from
a run of K-REGINN in the noiseless situation and collect all the n−th iterates in the
set Xn .

2. In Lemma 45, we prove that if the sequences xδni n∈N , with (δi )i∈N ⊂ R being a


zero-sequence, are generated by different runs of K-REGINN with the different noise
δ

levels δi , then for each n ∈ N fixed, the sequence of the n−th iterates xn i∈N splits
i

into convergent subsequences, all of which converge to elements of Xn (xδn → Xn as


δ → 0).

3. Further, in Lemma 46 it is proven that for each  > 0, there exists a M ∈ N such that,
for each n ≥ M, there exists an element ξn ∈ Xn and a solution x∞ of (3.1) satisfying
kξn − x∞ k <  (the sets Xn converge uniformly to the set of solutions of (3.1)).
 
4. Finally, in Theorem 47, we provide a proof that the sequence xδNi (δi ) of the final
i∈N
iterates of K-REGINN (generated with the different noise levels δi , where (δi )i∈N ⊂ R
is again a zero-sequence) splits into convergent subsequences, all of which converge
strongly to solutions of (3.1) as i → ∞.

In a first step we investigate the stability of the scheme, i.e., we study the behavior of
the n-th iterate xδn as δ approaches zero. The sets Xn defined below play an important role,
see the Item 1 above.

Definition 44 Let X0 := {x0 } and define Xn+1 from Xn by the following procedure: for
each ξ ∈ Xn , change xn in definition of K-REGINN with ξ and change the respective inner
iterate zn,k with σn,k , resulting from using ξ instead of xn . More precisely, define σn,0 =
σn,0 (ξ) := ξ and σn,k+1 = σn,k+1 (ξ) as
    
σn,k ) − λξn,k F[n]
Jp (σn,k+1 ) := Jp (b 0
(ξ)∗ jr vn,k
ξ
+ γnξ Jp (σn,k − ξ) − xξn ,

where σbn,k , λξn,k , vn,k


ξ
, γnξ and xξn replace zbn,k , λn,k , vn,k , γn and xn in Assumption 4, page
62, respectively, when the vectors zn,k and xn are replaced by σn,k and ξ respectively.
Let ebn := y[n] − F[n] (ξ) and define kREG (ξ) := 0 in case of ebn = 0 and
n o
0
kREG (ξ) := min k ∈ N : F[n] (ξ) (σn,k − ξ) − bn < µ ebn (4.33)
e

otherwise. Then σn,k (ξ) ∈ Xn+1 for k = 1, . . . , kREG (ξ) in case kREG (ξ) ≥ 1 and only for
k = 0 in case kREG (ξ) = 0. We call ξ ∈ Xn the predecessor of the vectors σn,k (ξ) ∈ Xn+1
and these ones successors of ξ.

Note that xn is a possible outer iterate of K-REGINN if and only if xn ∈ Xn . Moreover,


Xn is finite for each n ∈ N and from inequality (4.15) follows that, if ξn+1 ∈ Xn+1 is as
successor of ξn ∈ Xn ,    
∆p x† , ξn+1 ≤ ∆p x† , ξn

whenever x† is a solution of (3.1) in Bρ (x+ , ∆p ) . We emphasize that the sets Xn , n ∈ N0 ,


are defined with respect to exact data y.
The following stability property of inner iteration of K-REGINN is crucial to prove con-
vergence with noisy data. This assumption is a variation of [36, Assumption (3.12)].
70 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN
 
δi
Assumption 6 Let (δi )i∈N ⊂ R be a zero-sequence. Assume that the sequence zn,k
i∈N
converges to σn,k (ξ) for k = 0, ..., kREG (ξ) whenever the sequence xδni i∈N converges to


ξ ∈ Xn . This is,
δi
lim xδni = ξ ∈ Xn =⇒ lim zn,k = σn,k (ξ) for k = 0, ..., kREG (ξ) .
i→∞ i→∞

Assumption 6 is verified for the following methods:

• All the dual gradient methods, defined by iteration (3.29) in Subsection 3.1.2 and
satisfying λδn,k ≤ λδDE , where λδn,k depends continuously on δ. In particular, the DE
method itself, the MSD and LW methods. For these methods, we assume that X is
uniformly smooth and s−convex with p ≤ s. Additionally, we suppose that the spaces
Yj , j = 0, ..., d − 1 are uniformly smooth;

• The Bregman variation of the Iterated-Tikhonov method (3.62), assuming that X is


uniformly smooth and s−convex with p ≤ s ≤ r;

• The Bregman variation of Tikhonov-Phillips method (3.58) . We assume that X is


uniformly smooth and uniformly convex;

• The mixed gradient-Tikhonov methods, defined by iteration (3.76) . Here the uniform
smoothness and the s−convexity of X, with p ≤ s, as well as the uniform smoothness
of Yj , j = 0, ..., d − 1 is assumed.

As these proofs are long, we transfer them to Appendix A.


Based on Assumption 6, we prove the next lemma, which basically adapts ideas of [36,
Lemma 3.11].

Lemma 45 Let all assumptions of Theorem 38 hold true and assume additionally Assump-
tion 6 and that X is s−convex with p ≤ s. If δi → 0 as i → ∞ then for n ≤ N (δi ) with
δi > 0 sufficiently small, the sequence xδni i∈N splits into convergent subsequences, all of
which converge to elements of Xn .

Proof. We prove the statement by induction. For n = 0, xδ0i = x0 → x0 ∈X0 as i → ∞.


Now, suppose that for some n ∈ N with n < N (δi ) for i large enough, xδni i∈N splits into
 subsequences, all of which converge to elements of Xn . To simplify the notation,
convergent
δ
let xn i∈N itself be a subsequence which converges to an element of Xn , say
i

lim xδni = ξ ∈ Xn . (4.34)


i→∞
 
We have to prove that the sequence xδn+1
i
splits into convergent subsequences, each
i∈N
one converging to an element of Xn+1 . From Assumption 6 and (4.34) ,

lim z δi = σn,k (ξ) for k = 0, ..., kREG (ξ) . (4.35)


i→∞ n,k

As the functions Fj and Fj0 are continuous for j = 0, ..., d − 1, it follows from (3.7) , (4.35)
and (4.34) that

lim bδni = ebn and (4.36)

i→∞
 
xδni sδn,k
0 0
lim F[n] i
− bδni = F[n] (ξ) (σn,k − ξ) − ebn

i→∞

for all k ≤ kREG (ξ) . Now, we have to differ two cases.


4.4. REGULARIZATION PROPERTY 71

Case 1: ebn 6= 0. From definition of kREG (ξ) (see (4.33)),



0 
F (ξ) σ − ξ − b < µ bn
e
n,kREG (ξ) n
[n] e

and we conclude, making use of (4.36) that for i large enough


 
0 δi δi δi δi
F x s
[n] n n,kREG (ξ) − bn < µ bn ,
δi
which implies in view of (3.10) that knδi ≤ kREG ≤ kREG (ξ) . Hence knδi ∈ {0, . . . , kREG (ξ)}
for i large enough. This means that for i large enough, the sequence knδi i∈N splits into at
 δ  δi
i
most kREG (ξ) + 1 constant subsequences kn ` satisfying kn ` = k ∈ {0, . . . , kREG (ξ)} .
`∈N
Accordingly,
δi δi ` δi (4.35)
`
lim xn+1 = lim z δ = lim zn,k` = σn,k ∈ Xn+1 .
`→∞ `→∞ n,kni` `→∞

Case 2: ebn = 0. In this case, y[n] = F[n] (ξ) and σn,0 = ξ ∈ Xn+1 (see Definition 44).
As the sequence xδni n≤N (δ ) is uniformly bounded (see (4.11)) and X is s−convex, we

i

conclude that (2.23) applies. We prove now that xδn+1


i
→ ξ as i → ∞. Assume the contrary,
δi` s
then there exist an  > 0 and a subsequence (δi` )`∈N such that  < ξ − xn+1 . It follows

that
δi` s (2.23)
 δ  (4.15)  δ 
i` i
 < ξ − xn+1 ≤ C∆p ξ, xn+1 ≤ C∆p ξ, xn ` → C∆p (ξ, ξ) = 0

as ` → ∞, contradicting  > 0.
In the second step towards establishing the regularization property we provide a kind
of uniform convergence of the set sequence (Xn )n to solutions of (3.1). For the rigorous
formulation in Lemma 46 below we need to introduce further notation: Let l ∈ N and set
(l) (l) (l) (l) (l)
ξ0 := x0 . Now define ξn+1 := σn,k(l) (ξn ) by choosing kn ∈ {1, . . . , kREG (ξn )} in case of
n
(l) (l) (l) (l) (l)
kREG (ξn ) ≥ 1 and kn = 0 otherwise. Then ξn+1 is a successor of ξn . Of course ξn ∈ Xn
(l)
for all n ∈ N and reciprocally, each element in Xn can be written as ξn for some l ∈ N.
(l)
Observe that (ξn )n∈N represents a sequence generated by K-REGINN in the case of exact
(l)
data is given and with the inner iteration stopped with an arbitrary stop index kn less
(l) (l)
than or equal kREG (ξn ). Due to this fact, we call the sequence (kn )n∈N a stop rule. The
(l)
sequence (ξn )n∈N is therefore one of many possible sequences which can be generated from
a run of K-REGINN with initial vector x0 . Since the results of Theorem 43 hold true for all
(l)
these sequences, it applies in particular to (ξn )n∈N . Thus, the limit
x(l) (l)
∞ := lim ξn (4.37)
n→∞

exists and is a solution of (3.1) in Bρ (x+ , ∆p ) .


The following result was first presented in [22] in the context of Hilbert spaces and later
generalized to the Banach spaces framework in Proposition 19 of [40], where only the case
of a unique solution of (3.1) was analyzed. The present version, appeared first in [39].
(l)
Lemma 46 Let all assumptions of Theorem 43 hold true and let (ξn )n∈N denote the se-
(l)
quence generated by the stop rule (kn )n∈N . Then, for each  > 0 there exists an M =
M () ∈ N such that

(l) (l)
ξn − x∞ <  for all n ≥ M and all l ∈ N.
(l)
In particular, if x+ is the unique solution of (3.1) in Bρ (x+ , ∆p ) then kξn − x+ k <  for
all n ≥ M and all l ∈ N.
72 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN

Proof. Assume the statement is not true. Then, there exist an  > 0 and sequences
(nj )j , (lj )j ⊂ N with (nj )j strictly increasing such that

(lj ) (lj )
ξnj − x∞ ≥  for all j ∈ N

(l ) (l )
where (ξn j )n represents the sequence generated by the stop rule (kn j )n . We stress the fact
(l )
that the iterates ξnjj must be generated by infinitely many different sequences of stop rules
(l) (l)
(otherwise, as ξnj → x∞ as j → ∞ for each l and as the lj ’s attain only a finite number of
(l ) (l )
values, we would have kξnjj − x∞j k <  for nj large enough). Next we reorder the numbers
lj (excluding some iterates if necessary) such that

(l) (l)
ξnl − x∞ ≥  for all l ∈ N. (4.38)

(l) (l) (l)


Set ξb0 := x0 (= ξ0 for all l ∈ N). As kREG (ξb0 ) < ∞ and k0 ∈ {0, . . . , kREG (ξ0 )} =
(l)
{0, . . . , kREG (ξb0 )} for all l ∈ N, there exists a number b k0 in this set such that b
k0 = k0
for infinitely many l ∈ N. Let £0 ⊂ N be the set of those indices l. Fix now b k0 as the
(l)
stop index of the first inner iteration, i.e., ξ1 := σ0,bk0 (ξ0 ), see Definition 44. Then ξb1 = ξ1
b b
for all l ∈ £0 and as kREG (ξb1 ) < ∞, we conclude like before, that there exists a number
(l)
k1 ∈ {0, . . . , kREG (ξb1 )} such that b
b k1 = k1 for infinitely many l ∈ £0 \ {1} . Those l’s are
collected in £1 . Proceeding by induction, we find a sequence (ξbn )n , generated by the stop
rule (b kn )n as well as a sequence of unbounded sets (£n )n satisfying £n ⊂ N\ {1, . . . , n}
with £n+1 ⊂ £n and
(l)
ξn+1 = ξbn+1 for all l ∈ £n , n ∈ N0 . (4.39)

In view of (4.37) the limit xb∞ := limn→∞ ξbn exists in Bρ (x+ , ∆p ) and solves (3.1). It
follows that the sequence (ξbn )n is bounded and since X is s−convex, inequality (2.23)
holds. Additionally, there exists M = M () ∈ N such that
  s
∆p xb∞ , ξbn < s for all n > M, (4.40)
2 C

where C > 0 is the constant from (2.23) . We can additionally suppose that ξbn ∈ Bρ (x+ , ∆p )
for all n > M. In fact, as lim n→∞ξn = Dx
b b∞ and the mappings Jp and E ∆p (b x∞ , ·) are
continuous, we have that ∆p x b∞ , ξbn and Jp (ξbn ) − Jp (b b∞ − x+ converge to zero
x∞ ) , x
as n → ∞. From three points identity (2.21),
     D E
∆p x+ , ξbn = ∆p xb∞ , ξbn + ∆p x+ , xb∞ + Jp (ξbn ) − Jp (b b∞ − x+
x∞ ) , x
 
and as ∆p (x+ , x
b∞ ) < ρ, we conclude that ∆p x+ , ξbn < ρ for n large enough.
Now, for l0 ∈ £M fixed,
  (4.39)   (4.40) s
(l )
∆p xb∞ , ξM0+1 = ∆p x b∞ , ξbM +1 < s .
2 C
(l )
As xb∞ is a solution of (3.1) and ξM0+1 = ξbM +1 ∈ Bρ (x+ , ∆p ) , inequality (4.10) applies and
 
(l )
the errors ∆p x b∞ , ξn 0 are monotonically decreasing in n for all n ≥ M + 1. In particular,
nl0 ≥ l0 ≥ M + 1 (because l0 ∈ £M ⊂ N\ {1, . . . , M }). Then
  
(l )
 s
∆p xb∞ , ξn(ll0 ) ≤ ∆p xb∞ , ξM0+1 < s .
0 2 C
4.4. REGULARIZATION PROPERTY 73

(l ) (l )
Since ξn 0 → x∞0 as n → ∞, we conclude that

(l0 )
   (4.10)   s
∆p xb∞ , x∞ = lim ∆p xb∞ , ξn(l0 ) ≤ ∆p x b∞ , ξn(ll0 ) < s .
n→∞ 0 2 C
From the s−convexity of X,
s  s s 
(l0 )
ξnl0 − x(l 0)
≤ 2 s−1 (l0 )
ξ − x + x − x (l0 )

∞ nl0 b∞ b∞ ∞
    
≤ 2s−1 C ∆p x b∞ , ξn(ll0 ) + ∆p x b∞ , x(l ∞
0)
< s ,
0

contradicting (4.38) .
We are now in position to prove our main result. The result of the next theorem was
first established in our work [39].

Theorem 47 (Regularization property) Assume all the hypotheses of Theorem 38, As-
sumptions 4, page 62, and 6, page 70, and suppose that X is s−convex with p ≤ s. If d > 1,
 holds true, as well as kmax < ∞ and s ≤ r.
assume additionally that Assumption 5, page 63,
δi
If δi → 0 as i → ∞, then the sequence xN (δi ) splits into convergent subsequences, all
i∈N
of which converge strongly to solutions of (3.1) as i → ∞. In particular, if x+ is the unique
solution of (3.1) in Bρ (x+ , ∆p ) then

lim xδNi (δi ) − x+ = 0.

i→∞

Proof. If N (δi ) ≤ I as i → ∞ for some I ∈ N, then the sequence (xδNi (δi ) )i∈N splits
δi
into subsequences of the form (xn ` )`∈N where n is a fixed number less than or equal to I.
According to Lemma 45, each of these subsequences splits into convergent subsequences.
Hence each limit of such a subsequence must be a solution of (3.1) due to the discrepancy
principle (3.13). In fact, if xδni → a as i → ∞, then using (3.7) ,
 
kyj − Fj (a)k = lim yj − Fj xδni ≤ lim (1 + τ ) δi = 0, j = 0, . . . , d − 1.

i→∞ i→∞

Suppose now that N (δi ) → ∞ as i → ∞ and let  > 0 be given. As the Bregman distance
is a continuous function in both arguments, there exists γ = γ () > 0 such that
  s
∆p x, xδni < whenever x − xδni < γ, (4.41)

C
where C > 0 is the constant from (2.23). From Lemma 46, there is an M ∈ N such that,
(l) (l)
for each ξM ∈ XM , there exists a solution x∞ of (3.1) satisfying
γ
(l) (l)
x∞ − ξM < .
2
According to Lemma 45, the sequence xδMi splits into convergent subsequences, each one
δi
converging to an element of XM . Let (xMj )j∈N be a generic convergent subsequence, which
converges to an element of XM , say
δi (l )
lim xM` = ξM0 ∈ XM .
`→∞

δi (l )
We prove now that the subsequence (xN`(δi ) )`∈N converges to the solution x∞0 . In fact,
`
δi (l )
since xM` → ξM0 as ` → ∞, there exists a L1 = L1 () such that
γ
(l0 ) δi
ξM − xM` < for all ` ≥ L1 .
2
74 CHAPTER 4. CONVERGENCE ANALYSIS OF K-REGINN

As N (δi` ) → ∞ as ` → ∞, we have N (δi` ) ≥ M for all ` ≥ L where L ≥ L1 is a sufficiently


large number. Then, for all ` ≥ L,

(l0 ) δi` (l0 ) (l0 ) (l0 ) δi`
x∞ − xM ≤ x∞ − ξM + ξM − xM < γ.

Finally, the s−convexity of X leads to


s (2.23)  (4.10)  (4.41)
δi ` (l0 ) δi` (l0 ) δi`
 
xN (δi ) − x(l

0)
≤ C∆ p x ∞ , xN (δi ) ≤ C∆ p x ∞ , xM < s .
` `
Chapter 5

Numerical Experiments

To test the performance of K-REGINN, we have chosen a severely ill-posed problem, namely,
the Electrical Impedance Tomography (in short EIT). In this non-invasive problem, one
aims to reconstruct specific features of the interior of an object collecting information on
its boundary. The procedure consists in applying different electric current configurations
on the boundary of a bounded set and then measuring the resulting potentials on the
boundary as well. The objective is to access information of the interior of this set in order
to reconstruct the electrical conductivity.

This idea was originally introduced by Calderón in his famous paper [8]. The problem
proposed by him is currently known as the Continuum Model of EIT (EIT-CM), where
electric current is applied on the whole boundary and the corresponding potential is also
read in all the point of the boundary. The Calderón’s method is more of theoretical than
practical interest because in real situations is actually impossible to apply or record this
information in the whole boundary. This method is however important in the mathematical
point of view. More details and some numerical experiments using K-REGINN to solve this
problem are presented in Section 5.1 below.

To modify the EIT-CM in order to adjust it to a more concrete and realistic framework,
many attempts have been made and different methods have been suggested. One of the
most promising and currently considered a very realistic framework is the so-called Complete
Electrode Model (EIT-CEM), which was first presented by Somersalo et al in [50]. In the
EIT-CEM, electrodes are attached on the boundary of an object and the electric current is
injected on this object through these electrodes. The resulting voltage is then read in the
same electrodes. The inverse problem of reconstructing the electrical conductivity in the
entire set from the application of different configurations of currents and the measure of
the respective voltages on its boundary is considered in Section 5.2, where we additionally
present some numerical experiments and discuss the results.

Concerning practical situations, EIT has a vast range of potential applications in several
fields, including non-invasive medical imaging, non-destructive testing for locating resistiv-
ity anomalies, monitoring of oil and gas mixtures in oil pipelines, geophysics, environmental
sciences, among others. See [5, 23] and the references therein.

Before starting, we would like to point out that both EIT variations are appropriated
and naturally suitable to the use of Kaczmarz methods. The reason is simple: the electrical
conductivity is reconstructed from a set of individual measurements. Each of those mea-
surements can be regarded as an individual operator in a system of equations to be solved.
The details will be explained later in a convenient moment.

75
76 CHAPTER 5. NUMERICAL EXPERIMENTS

5.1 EIT - Continuum Model


Let Ω ⊂ R2 be a simply connected Lipschitz domain. Applying some electric currents
f : ∂Ω → R on the boundary of Ω and recording the voltages g : ∂Ω → R on the same set,
we aim to find the electrical conductivity γ : Ω → R in whole of Ω. The set Ω is assumed to
have no electrical sources or drains, which means that the electric flux γ∇u is divergence
free, where u : Ω → R represents the potential distribution in Ω. This condition is written
as
div (γ∇u) = 0 in Ω. (5.1)
The electric flux is assumed to be completely transferred to the boundary, which means
that
∂u
γ = g on ∂Ω, (5.2)
∂ν
where ∂u/∂ν is the outward normal derivative of u. In order to prove existence and unique-
−1/2
ness of a solution u, we write the weak formulation of (5.1) and (5.2) : Given g ∈ H♦ (∂Ω)
and γ ∈ L∞ + (Ω) , find u ∈ H♦
1 (Ω) such that

Z Z
γ∇u∇ϕ = gϕ for all ϕ ∈ H♦1 (Ω) . (5.3)
Ω ∂Ω

The symbol ♦ means that the integral of the function over the boundary of Ω is zero:
 Z 
1/2 1/2
H♦ (∂Ω) := v ∈ H (∂Ω) : v=0 .
∂Ω

If the function is defined in Ω, this integral is understood in the sense of the trace theorem:
for u ∈ H 1 (Ω) , its trace f = u|∂Ω belongs to H 1/2 (∂Ω) , and we define
 Z 
1 1
H♦ (Ω) := u ∈ H (Ω) : f =0 .
∂Ω

−1/2 1/2
Finally, the set H♦ (∂Ω) is defined as the dual space of H♦ (∂Ω) and

L∞ ∞
+ (Ω) := {v ∈ L (Ω) : v ≥ C a.e. in Ω} ,

where C > 0 is a constant.


R −1/2
The condition ∂Ω g = 0 (g ∈ H♦ (∂Ω)) is interpreted as the law of conservation of
charge. Using it in combination with the Lemma of Lax Milgram, one can prove that there
exists a vector u ∈ H 1 (Ω) satisfying (5.3) and that it is unique up to a constant. The
condition ∂Ω f = 0 (u ∈ H♦1 (Ω)) is therefore required to guarantee uniqueness of solutions
R

in (5.3). It can be interpreted as the grounding of potential. Further, the lower bound
γ ≥ C a.e. in Ω (γ ∈ L∞+ (Ω)) ensure that the electric flux can flow through the entire set
Ω.
With the existence and uniqueness of solutions of the variational problem (5.3) , the
Neumann-to-Dirichlet (NtD) operator Λγ , which associates the current with the respective
voltage,
−1/2 1/2
Λγ : H♦ (∂Ω) → H♦ (∂Ω) , g 7−→ f
is well-defined. Moreover, it can be proven to be an invertible linear bounded operator. Its
bounded linear inverse is called the Dirichlet-to-Neumann operator (DtN).
The forward operator associated with EIT-CM problem is now defined as the nonlinear
function  
−1/2 1/2
F : D (F ) ⊂ L∞ (Ω) → L H♦ (∂Ω) , H♦ (∂Ω) , γ 7−→ Λγ , (5.4)
5.1. EIT - CONTINUUM MODEL 77

with D(F ) := L∞ + (Ω) . The EIT-CM inverse problem consists therefore in recovering γ
from a partial knowledge of the NtD operator Λγ , which is a nonlinear and highly ill-posed
problem [1].
In 2006, Astala and Päivärinta [2] proved the injectivity of F in (5.4) , which means
that the EIT-CM is uniquely solvable. In practice however, one cannot expect to have
full knowledge of the NtD operator. The best it can be done is to apply a finite number
of currents = := (g0 , ..., gd−1 ) and then recover the corresponding voltages, fj = Λγ gj ,
j = 0, ..., d − 1. This fact suggests that the operator
 d
1/2
F= : D (F ) ⊂ L∞ (Ω) → H♦ (∂Ω) , γ 7−→ (f0 , ..., fd−1 ),

is a natural substitute for (5.4) in practical applications.


Observe that by defining the operators
1/2
Fj : D (F ) ⊂ L∞ (Ω) → H♦ (∂Ω) , γ 7−→ fj , j = 0, ..., d − 1, (5.5)

we immediately see that F= = (F0 , ..., Fd−1 )> and Y = Y0 × ... × Yd−1 , with Y :=
 d
1/2 1/2
H♦ (∂Ω) and Yj := H♦ (∂Ω) for j = 0, ..., d − 1. This is exactly the structure pre-
sented in (3.6) , which makes this problem suitable to the application of a Kaczmarz method.
Moreover, each of the operators in (5.5) is Fréchet-differentiable as we will discuss now.

5.1.1 Fréchet-differentiability of the forward operator


The F-differentiability of the forward operators Fj in (5.5) is a well-known result, see
e.g. [35], and we do not intend to prove it here. The goal of this subsection is only to
give a roughly explanation of how the vectors Fj0 (γ) σ and Fj0 (γ)∗ η for γ ∈ int (D (F )) ,
1/2
σ ∈ L∞ (Ω) and η ∈ H♦ (∂Ω) can be calculated, which is important for our numerical
implementations in the next subsection.
−1/2
Fix gj ∈ H♦ (∂Ω) and define Gj (γ) := uγ , where uγ ∈ H♦1 (Ω) is the unique solution
of (5.3) with g := gj , i.e.,
Z Z
γ∇uγ ∇ϕ = gj ϕ for all ϕ ∈ H♦1 (Ω) . (5.6)
Ω ∂Ω

Thus Gj (γ)|∂Ω = Fj (γ) . Further, let t ∈ R\ {0} be small enough to satisfy γ + tσ ∈


D (F ) and define uγ+tσ := Gj (γ + tσ) , which is the unique vector in H♦1 (Ω) satisfying the
equation Z Z
(γ + tσ) ∇uγ+tσ ∇ϕ = gj ϕ for all ϕ ∈ H♦1 (Ω) . (5.7)
Ω ∂Ω

Subtracting (5.7) from (5.6) we obtain


Z Z
γ∇ (uγ+tσ − uγ ) ∇ϕ = −t σ∇uγ+tσ ∇ϕ for all ϕ ∈ H♦1 (Ω) .
Ω Ω

Dividing both sides of above equality by t and letting t → 0 we obtain


Z Z
γ∇wσ ∇ϕ = − σ∇uγ ∇ϕ for all ϕ ∈ H♦1 (Ω) , (5.8)
Ω Ω

where
uγ+tσ − uγ Gj (γ + tσ) − Gj (γ)
wσ := lim = lim = G0j (γ) σ
t→0 t t→0 t
78 CHAPTER 5. NUMERICAL EXPERIMENTS

is the Fréchet derivative of Gj evaluated in γ and applied in the vector σ. As the trace
operator is linear and continuous,
Fj0 (γ) σ = G0j (γ) σ ∂Ω = wσ |∂Ω .

Therefore, in order to calculate Fj0 (γ) σ, one needs first to find uγ ∈ H♦1 (Ω) in (5.6) , use it
1/2
to find wσ ∈ H♦1 (Ω) in (5.8) and finally evaluate its trace wσ |∂Ω ∈ H♦ (∂Ω) .
To calculate the vector Fj0 (γ)∗ η, we use the following procedure: let ϑ ∈ L∞ (Ω) be
fixed and let ψη ∈ H♦1 (Ω) be the unique solution of (5.3) for g := η, i.e.,
Z Z
γ∇ψη ∇ϕ = ηϕ for all ϕ ∈ H♦1 (Ω) . (5.9)
Ω ∂Ω

Now, let wϑ ∈ H♦1 (Ω)


denote the unique solution of (5.8) for σ := ϑ, which implies that
0
Fj (γ) ϑ = wϑ |∂Ω . Consequently,
Z Z

0 ∗
0
(5.9)
Fj (γ) η, ϑ = η, Fj (γ) ϑ = ηwϑ = γ∇ψη ∇wϑ
Z ∂Ω Z Ω
(5.8)
= γ∇wϑ ∇ψη = − ϑ∇uγ ∇ψη = h−∇uγ ∇ψη , ϑi .
Ω Ω

Thus,
Fj0 (γ)∗ η = −∇uγ ∇ψη , (5.10)
with uγ , ψη ∈ H♦1 (Ω) being given in (5.6) and (5.9) respectively.
Unfortunately, the space L∞ (Ω) has too poor smoothness/convexity properties to be
included in the convergence analysis of K-REGINN in Chapter 4 (it is not uniformly smooth,
for example). But, as Ω is bounded, L∞ (Ω) ⊂ Lp (Ω) for 1 < p < ∞, and since these spaces
have the necessary properties (remember that for 1 < p < ∞, the Lebesgue spaces Lp are
p∨2−convex and p∧2−smooth (see Example 14)), a possible and immediate solution would
be redefine the operators Fj in different spaces:
1/2
Fj : D (F ) ⊂ Lp (Ω) → H♦ (∂Ω) , 1 < p < ∞.

The duality mapping Jp : Lp (Ω) → Lp (Ω) can be now easily calculated via (2.14).
Using this strategy however, a new problem becomes evident: D(F ) has no interior
points in the Lp −topology1 , which means that differentiability or even the continuity of Fj
are compromised. To overcome this technical obstacle, we suggest restricting the searched-
for conductivity space X to a finite dimensional space V ⊂ L∞ (Ω) , that is, redefine the
functions Fj ’s as
Fj : V+ ⊂ Vp → L2 (∂Ωj ) , γ 7−→ fj , V+ := D (F ) ∩ V, (5.11)
where ∂Ωj is the part of the boundary where the experiments are actually taken. The
subscript index p in Vp := (V, k·kLp ) highlights the fact that the Lp −topology is used
in2 V = span {v1 , ..., vM }. This is a reasonable framework because from a computational
point of view, the best one can do is to find the coefficients of the conductivity vector in a
specific basis of a finite dimensional subspace of L∞ (Ω). Moreover, using a finite number of
experiments, only a finite number of degrees of freedom of the conductivity can be restored.
Since in finite dimensional spaces all the norms are equivalent, the F-derivative of Fj
remain the same3 .
n o
1
For γ ∈ D (F ) fixed, the ball γ e ∈ L∞ (Ω) : kγ − γ ekLp (Ω) < ρ is not entirely contained in D (F ) for
any ρ > 0.
2
The vectors vi ∈ L∞ (Ω) , i = 1, ..., M are naturally assumed to be linearly independent.
3 1/2
The change of the data space H♦ (∂Ω) with L2 (∂Ωj ) in (5.11) does not alter the derivative of Fj
1/2 2
either, since H (∂Ω) ,→ L (∂Ω) , see e.g. [14, Theo. 2.72].
5.1. EIT - CONTINUUM MODEL 79

5.1.2 Computational implementation


In order to improve the reconstructions in our numerical experiments, we propose the use
of a weight-function as done in [52]. To formalize the results and properly adjust the ideas
to our scheme, the next proposition needs to be proven.

Proposition 48 Let X and Y be Banach spaces defined over the same field k. If there ex-

ists a linear, isometric and surjective operator T : X → Y, then the operator T −1 : X ∗ →
Y ∗ is well-defined and shares the same properties.

Proof. Suppose that T : X → Y has the required properties. From its isometry follows
that T is injective and hence bijective. Thus T −1 : Y → X is well-defined and it is clearly
linear, invertible and isometric too. From the isometry of T −1follows now the boundedness

of this operator, which implies that its Banach-adjoint T −1 : X ∗ → Y ∗ is well-defined.

Further, T −1 is a linear operator. It remains to prove that this operator is isometric and


surjective. Let therefore x∗ ∈ X ∗ be given. As T −1 is invertible and isometric,


D E
−1 ∗ ∗ ∗
x ∗ = sup T −1 x∗ , y =

∗ −1
T

sup x , T y = kx∗ k ∗ ,
X
Y kykY =1 kT −1 ykX =1
∗
which proves that T −1 is isometric. Lastly, to prove surjectivity, let y ∗ ∈ Y ∗ be given
and define the operator x∗ : X → k as

hx∗ , xi := hy ∗ , T xi , x ∈ X.
∗
Then it is clear that x∗ ∈ X ∗ . We will prove that T −1 x∗ = y ∗ . In fact, for each y0 ∈ Y,
define x0 := T −1 y0 . Then
D ∗ E
hy ∗ , y0 i = hy ∗ , T x0 i = hx∗ , x0 i = x∗ , T −1 y0 = T −1 x∗ , y0 .

Note that if T has the required properties of above proposition, then it is a linear,
bounded, isometric and invertible operator. Further, its inverse is also a bounded operator,
because it is isometric too.
∗ This means that T is an isomorphismus of Banach spaces and
the same applies to T −1 . Moreover, it obviously holds, for all x ∈ X ∗ and x ∈ X,

D ∗ E
hx∗ , xiX ∗ ×X = T −1 x∗ , T x ∗ ,
Y ×Y

−1 ∗

which implies that T is actually an isomorphismus of dual spaces. The result implies
in particular that ∗
x∗ ∈ Jϕ (x) ⇐⇒ T −1 x∗ ∈ Jϕ (T x) ,
∗
this is, T −1 Jϕ (x) = Jϕ (T x) . Then, for all x, y ∈ X,
D ∗ E
hJϕ (T x) , T yiY ∗ ×Y = T −1 Jϕ (x) , T y ∗ = hJϕ (x) , yiX ∗ ×X .
Y ×Y

Each property of X which only depends either on the norm k·kX or on the duality pair-
ing h·, ·iX ∗ ×X of its dual space X ∗ is therefore preserved. In particular, smoothness and
convexity properties of X and Y are exactly the same. For instance, the equivalence

1 1 Cp
kx − ykp ≤ kxkp − hJp (x) , yi + kykp
p p p
if and only if
1 1 Cp
kT x − T ykp ≤ kT xkp − hJp (T x) , T yi + kT ykp ,
p p p
80 CHAPTER 5. NUMERICAL EXPERIMENTS

implies that X is p−smooth if and only if Y is p−smooth. Furthermore, the constant C p


remains unchanged.
The motivation for proving Proposition 48 is to define the weighted Banach space
p
Lω (Ω) . Let ω : Ω → R be a positive function and define the vector space
 Z 
p p
Lω (Ω) := f : Ω → R : |f | ω < ∞ .

Observe that f ∈ Lpω (Ω) if and only if f ω 1/p ∈ Lp (Ω) . We can therefore define the norm

kf kLpω (Ω) := f ω 1/p p , f ∈ Lpω (Ω) ,

L (Ω)

which transforms Lpω (Ω) in a normed space. If ωmin ≤ ω (x) ≤ ωmax for all x ∈ Ω, where
0 < ωmin ≤ ωmax are constants independent of x, then

1 1
kf kpLp (Ω) ≤ kf kpLp (Ω) ≤ kf kpLp (Ω) .
ωmax ω ωmin ω

Thus, the two vector spaces have equivalent norms and the same elements. Further, the
completeness of Lp (Ω) is transmitted to Lpω (Ω) and the operator T : Lpω (Ω) → Lp (Ω) ,
f 7−→ f ω 1/p is well-defined and satisfies all the hypotheses of Proposition 48. Therefore∗ the

Banach space Lpω (Ω) inherits the properties of Lp (Ω) . In particular, (Lpω (Ω)) = Lpω (Ω)
and for all f, g ∈ Lpω (Ω) ,
D E
hJp (f ) , giLp∗ ×Lp = hJp (T f ) , T giLp∗ ×Lp = |T f |p−1 sgn (T f ) , T g p∗ p
ω ω L ×L
D E Z
p−1 ∗ p−1
= |f | sgn (f ) ω 1/p , gω 1/p p∗ p = |f | sgn (f ) gω
L ×L Ω
D E
= |f |p−1 sgn (f ) , g p∗ p .
Lω ×Lω

Hence, the duality mapping Jp in Lpω (Ω) is still given by Jp (f ) = |f |p−1 sgn(f ) .
The adjoint operator Fj0 (γ)∗ changes slightly in the new space because for the new
0
adjoint operator F j (γ)∗ we want to have for each γ ∈ int (D (F )) , σ ∈ Lp (Ω) and η ∈
L2 (∂Ωj ) ,
D E
0 !

F j (γ)∗ η, σ = η, Fj0 (γ) σ L2 (∂Ω ) = Fj0 (γ)∗ η, σ Lp∗ (Ω)×Lp (Ω) ,





Lpω (Ω)×Lpω (Ω) j

that is Z Z
0 ∗
Fj (γ) ησω = Fj0 (γ)∗ ησ.
Ω Ω

Thus (see (5.10)),


0
F j (γ)∗ = ω −1 Fj0 (γ)∗ = −ω −1 ∇uγ ∇ψη . (5.12)

We keep using the old notation Fj instead of F j for the new operator.
The idea of using the function ω is to give different weights for different regions of Ω.
With this procedure we hope to increase stability in some regions where is important to
have it. We come back to this subject later to introduce an appropriated weight-function
for our framework. Before doing that however, some preliminaries are needed.
For our numerical experiments in this section, we observe only the Kaczmarz version
of REGINN and let the comparison between Kaczmarz and non-Kaczmarz methods for the
EIT-CEM problem, which will be analyzed in next section.
5.1. EIT - CONTINUUM MODEL 81

In order to perform the experiments, we have chosen Ω as the unit square (0, 1) × (0, 1)
and d = 4m (m ∈ N) independent currents

cos (2j0 πx) cos (2j0 πy) : (x, y) ∈ Γj1
gj := ,
0 : otherwise

where j = 4 (j0 − 1) + (j1 − 1) , j0 = 1, ..., m and j1 = 1, ..., 4. The sets Γj1 represent the
faces of Ω : Γ1 := [0, 1] × {1} , Γ2 := {1} × [0, 1] , Γ3 := [0, 1] × {0} and Γ4 := {0} × [0, 1] .
The voltages fj = Λγ gj are measured in ∂Ωj = ∂Ω\Γj1 , which means that we do not read
the voltages where we apply the currents.
For implementing K-REGINN numerically, the computational evaluation of the vectors
Fj (γ), Fj0 (γ) σ and Fj0 (γ)∗ η for γ ∈ int (V+ ) , σ ∈ Vp and η ∈ L2 (∂Ω) are necessary. Since
an analytical solution is in general not available, we apply the Finite Element Method
(FEM), constructing a Delaunay triangulation

Υ := {Ti : i = 1, ..., M } (5.13)

for Ω, provided by MATLAB’s pde toolbox with M = 2778. The same triangulation is then
used to solve the elliptic problems (5.6) , (5.8) and (5.9) and to reconstruct the conductivity.
The elements of the basis used to define the finite dimensional space V are defined as

1 : x ∈ Ti
vi (x) := χTi (x) = , x ∈ Ω, (5.14)
0 :x∈ / Ti

which means that we are looking for piecewise constant conductivities. This choice of
V guarantees injectivity of Fj0 (γ) . Moreover, Fj satisfies the Tangential Cone Condition
(Assumption 1(c) , page 31), see [35, Subsection 3.1].
We want to test the performance of K-REGINN to reconstruct sparse conductivities in
the {v1 , ...vM } basis, and for this reason, we define the exact solution as
M 
X 0.9 : x ∈ B
γ + (x) := γ0 (x) + αi vi (x) , αi := , x∈Ω (5.15)
0 : otherwise
i=1

where γ0 (x) ≡ 0.1 represents a known background and it is also the first iterate. The set
B represents two open balls inclusions with radii equal 0.15 and centers (0.35, 0.35) and
(0.65, 0.65). Figure 5.1 shows γ + and the triangulation Υ of Ω.
To properly compare the results in different spaces, we always calculate the error in the
2
L −norm. The relative iteration error of the n−th iterate γn is therefore defined by

kγn − γ + kL2 (Ω)


en := 100 , (5.16)
kγ + kL2 (Ω)

which implies that the initial error is e0 ≈ 87, 4%. The (final) relative iteration error eN (δ)
is denominated the reconstruction error. The corresponding data fj = Λγ + gj have been
synthetically computed using again the FEM, but with a different and much more refined
mesh than Υ to avoid an inverse crime. The generated data is then corrupted by artificial
random noise of relative noise level δ, that is, the perturbed data are

fjδ = fj + δ kfj kL2 (∂Ωj ) perj , j = 0, ..., d − 1, (5.17)

where perj is an uniformly distributed random variable with kperj kL2 (∂Ωj ) = 1. in contrast
to the previous chapter, δ denotes here a relative noise level. The other input parameter of
Algorithm 1 are chosen as τ = 1.8, µ = 0.8 and kmax = 50.
82 CHAPTER 5. NUMERICAL EXPERIMENTS

0.8

0.6

0.4

0.2

0
0 0.2 0.4 0.6 0.8 1

Figure 5.1: Left: the searched-for conductivity γ + , defined in (5.15) and modelled by a
resistive background (in blue) and two balls-like conductive inclusions (in dark red). Right:
the triangulation Υ of Ω.

An appropriate choice of the weight-function ω is crucial for the quality of the recon-
structions. Based
PMon the explanations from [52], we have defined ω as a piecewise constant
function: ω := i=1 ωi χTi , with
r 2
Pd−1 0
F (γ ) χ

j=0 j 0 Ti 2
L (∂Ωj )
ωi := , (5.18)
|Ti |
where |Ti | represents the area of the triangle Ti . In spite of being only a heuristic choice for
the EIT-CM, this definition of ω has a special meaning for the complete electrode model
in Hilbert spaces, see (5.43) and Remark 50 in next section.
Since Yj = L2 (∂Ωj ) is a Hilbert space, we choose r = 2 because in this case, j2 (f ) = f
for all f ∈ L2 (∂Ωj ) . Since we are interested in studying sparse conductivities, we follow
ideas from [12] and always choose 1 < p ≤ 2. For the first test, we have used p = 1.1.
The goal of the first experiment is to test the quality of the weight-function ω. Figure 5.2
illustrates the results obtained using d = 8 and different
 relative
 noise levels δ. It compares
the influence of the Banach spaces X = Vp,ω := V, k·kLpω (Ω) , weighted with ω and the
standard space X = Vp in the reconstruction γN (δ) . The dual gradient DE method with
C1 = C2 = 0.1 (see (3.29) and (3.39)) is employed as inner iteration of K-REGINN to generate
the results. Below each picture it is shown the number of outer iterations N = N (δ) , the
reconstruction error eN and the number kall , which represents the sum of all inner iterations
performed until convergence. All the pictures are in the same scale of colors and below all of
them, a color bar is exhibited. Observe that for all noise levels, the reconstructions in first
row, obtained with the weighted-norm k·kL1.1 ω
, are both qualitatively and quantitatively
superior than those found using the L1.1 −norm and displayed in the second row. Moreover,
in the weighted space, Algorithm 1 requires less iterations until convergence. Further, it
is clear that the use of the weight-function ω brings more stability to the solutions in the
sense that it reduces the oscillation near the boundary, which is constantly present in the
reconstructions in the non-weighted space L1.1 . This figure also makes perceptible in both
spaces, the behavior N (δ) → ∞ as δ → 0 and the regularization property, γN (δ) → γ + as
δ → 0, proved in Theorem 47.
Due to the improvement provided by the use of ω, this weight-function is used in all
the remaining experiments of this  subsection.
 For this reason, we skip the dependence of
Vp on ω, i.e., from now on Vp = V, k·kLpω (Ω) .
5.1. EIT - CONTINUUM MODEL 83

δ = 4% δ = 2% δ = 1%

V1.1,ω

N = 30, eN = 81.57% N = 63, eN = 79.28% N = 96, eN = 77.20%


kall = 15 kall = 52 kall = 133

V1.1

N = 63, eN = 85.19% N = 142, eN = 83.25% N = 598, eN = 81.24%


kall = 204 kall = 465 kall = 3018
0.1 0.15 0.2 0.25

Figure 5.2: Reconstructed conductivity γN (δ) in two different Banach spaces and with dif-
ferent noise levels. The first and second rows display the results obtained using respectively
the weighted-norm k·kL1.1ω
and the standard norm k·kL1.1 .

The purpose of the second test is to check the results of Theorem 43. We fix all the same
parameters of last experiment (inclusively p = 1.1 and d = 8) and analyze the behavior
of the Bregman variation of TP method (3.58) in the inner iteration of K-REGINN with
maximal number of inner iterations is fixed to kmax = 1. This configuration results in a
variation of the Levenberg-Marquardt method. The regularization parameter in (3.57) is
chosen as α0 = 0.01. Now, in order to carry out the inner iteration, either the solution
of the nonlinear equation (3.58) needs to be found or the minimization of the functional
(3.60) must be executed. A fixed-point iteration for zn,k+1 in (3.58) could be employed to
find this vector. Convergence is guaranteed if the spaces X and Y have enough regularity
and the constant α0 is large enough, see e.g. [39, Appendix A]. However, these conditions
are very restrictive: convergence is only ensured for large values of α0 and this constant
needs to be very large for small values of p, which in a practical point of view, seems to
render this method unfeasible. In the other hand, minimizing the functional (3.60) , even
in Hilbert spaces, is not a trivial task and it is far from simple in more general Banach
spaces. In order to achieve this goal, we have utilized the DE method ((3.29) and (3.39))
with C1 = C2 = 0.01, which showed itself robust enough to reach a satisfactory precision
in the minimization of (3.60). As Theorem 43 refers to the noiseless situation, no artificial
noise is added in the generated data (δ = 0). Table 5.1 presents the results. It reinforces
the fact that in the noiseless situation, both the residual bn and the relative iteration error
en converge to zero as the number of outer iterations n grows to infinity.
84 CHAPTER 5. NUMERICAL EXPERIMENTS

n 0 1 10 100 1000 10000


en 87.40 86.27 80.50 74.57 69.29 60.17
kbn k 0.0715 0.0690 0.0291 0.0030 0.0014 0.0008

Table 5.1: Levenberg-Marquardt method employed to confirm the results from Theorem
43: the relative iteration error en as well as the nonlinear residual bn approaches zero as
the outer iteration index n grows to infinity.

In the next experiment we aim to confirm the results of Theorem 38, which states that
the iteration error in the Bregman distance is monotonically non-increasing in the outer
iteration of K-REGINN, this is, ∆p (γ + , γn ) ≤ ∆p (γ + , γn−1 ) for n = 1, ..., N (δ) . Figure 5.3
exhibits the evolution of the iteration error versus the outer iteration index n for three
different dual gradient methods (3.29) engaged as inner iteration: the DE, MSD and LW
methods with C1 = C2 = 0.1 in (3.39) , K1 = K2 = 0.1 in (3.45) and λLW = 0.1 (see (3.43)).
The noise level is set to δ = 1%, kmax = 50 and τ = 1.5. The other parameters remain the
same as in the last experiment. See that the (final) iteration errors are comparable for all
methods. This behavior is somehow expected because the quality of the approximations
in the inner iteration are not only dependent on the utilized method itself, but it is also
controlled by the stop criteria of inner iteration (3.5) , which remains the same for all
methods. Among these three methods however, the DE method is by far the method which
results in the fastest convergence while the LW method seems to be the slowest. See that
although the iteration error is strictly decreasing in most iterates, it remains constant in
some of them. This behavior is in accordance with the results of Theorem 38, which states
that the iteration error in the Bregman distance remains the same in the iterates where the
discrepancy principle 3.9 is satisfied and it is strictly decreasing otherwise.

0.24
Iteration Error in the Bregman Distance

DE
MSD
0.22
LW

0.2

0.18

0.16

0.14

0.12
0 40 80 120 160 200
Outer Iteration

Figure 5.3: The decreasing error behavior of the outer iteration of K-REGINN (shown in
Theorem 38) observed with three different dual gradient methods in the inner iteration.

In order to observe the influence of different Banach spaces in the reconstructions of γ + ,


we apply the LW method with the same configuration of last experiment to generate the
Figure 5.4. The reconstruction error eN (δ) is displayed in the vertical axis and confronted
with the noise level δ. The Banach spaces Vp , with p = 2, p = 1.1 and p = 1.01 are tested in
this situation and the results are compared using a discrete set of four different noise levels:
δ = 4%, δ = 2%, δ = 1% and δ = 0, 5%. The regularization property γN (δ) → γ + as δ → 0,
is somehow observed in all the tested Banach spaces. Note however that the reconstruction
errors are higher for p = 2 in all noise levels and become lower as p approaches 1. This
quantitative improvement in the reconstructions for small values of p is expected and can
be credited to the Lp Banach space norm for suitable values of p. in contrast to the
Hilbert space L2 , which has the tendency of producing over-smoothed reconstructions and
5.1. EIT - CONTINUUM MODEL 85

0.84
p=2
0.82 p=1.1

Reconstruction error
p=1.01
0.8

0.78

0.76

0.74
4 2 1 0.5
Noise level

Figure 5.4: Reconstruction error confronted with the noise level and observed in three
different Banach spaces norms: L2 (in red) L1.1 (in blue) and L1.01 (in green).

consequently destroying the sparsity properties of the solution γ + , the Lp −norm with p
assuming values close to 1 preserves these sparsity features and it is therefore more capable
to reconstruct sparse solutions, see e.g., [12].
One of the main advantages of using the Kaczmarz methods is the fact that this kind
of algorithms does not work with all the equations of (3.6) in all cycles. Only the equations
whose the current iterate is not a good enough approximation4 are used to perform an
iteration. The remaining equations whose the current iterate is already considered good
enough, are not used. This procedure only updates the current iterate in the necessary
directions, accelerating the method until convergence and rendering better reconstructions.
This behavior is evident in Figure 5.5, where the number of active equations is compared
on each cycle of K-REGINN. The dual gradient MSD method ((3.29) and (3.45)) is employed
as inner iteration with the same parameters of last experiment but with the fixed noise
level δ = 1%, p = 1.1 and d = 16. Observe that after some few initial cycles, the number of
active equations drops to relative small levels, where only the relevant equations are worked
until termination. The average of active equations in this example is roughly half of the
whole number.
Number of active equations

16

12

0
0 10 20 30 40 50 60 70
Cycles

Figure 5.5: Behavior of Kaczmarz methods: the average of active equations becomes lower
with the time and the current iterate is corrected using only the relevant information.

The regularization property proved in Theorem 47 is once more evident in Figure 5.6,
where different numbers of electric currents are applied on the boundary of Ω (different
values of d, or equivalently, different numbers of equations in (3.6)) are tested. We want to
illustrate both behaviors in this experiment: more information improves the reconstructions
independently on the noise levels and the results become better whenever the noise levels
are reduced. Each row in this Figure presents the reconstructions for a different value of
d : d = 4, d = 8 and d = 12. The columns exhibit and compare the different noise levels
δ = 2%, δ = 1% and δ = 0.5%. The pictures have been generated using the Bregman
variation of the IT method (3.62) with the same parameters of the last experiment but
4
Called the active equations, they are those equations, whose inequality (3.9) is not satisfied.
86 CHAPTER 5. NUMERICAL EXPERIMENTS

with kmax = 10. Further, the regularization parameter in (3.68) is defined as the constant
αn ≡ 0.1 and the dual gradient DE method ((3.29) and (3.39)) with C1 = C2 = 0.01 is
used to find the minimizer of the associated functional (3.63). All the pictures are in the
same scale of colors and below all of them, a color bar shows the values represented by each
of those colors. A clear improvement in the reconstructions is seen moving towards the
rightmost column (where the noise levels are lower) and moving downwards (where more
information is available).

δ = 2.0% δ = 1.0% δ = 0.5%

d=4

d=8

d = 12

0.1 0.15 0.2 0.25 0.3 0.35

Figure 5.6: Reconstruction γN (δ) in the space V1.1 obtained with the noise levels δ = 4%,
δ = 2% and δ = 1% and with different number of equations: d = 4, d = 8 and d = 12.

The norm of the noise can vary if it is regarded as a vector in different spaces. For
instance, the so-called impulsive noise, which consists of standard uniform noise superim-
posed with some few highly inconsistent data points called outliers, has a small Lr −norm
for r small and becomes larger if r increases. In contrast, the so-called Gaussian noise,
which is an uniformly distributed noise, is less sensitive to the chosen norm and has more
similar values in different Lr −norms. In the first row of Figure 5.7, three completely differ-
ent kinds of noise are presented. The above referred Gaussian an impulsive noises are the
first and second picture respectively. Below each kind noise, the value of the corresponding
Lr −norm is shown for different values of r. The noises are scaled such that the (relative)
5.1. EIT - CONTINUUM MODEL 87

0.01 0.01 0.01

0.005 0.005 0.005

N oise 0 0 0

−0.005 −0.005 −0.005

−0.01 −0.01 −0.01

0 1 2 0 1 2 0 1 2

r=2 6.0e−3 6.0e−3 6.0e−3


r = 1.01 8.1e−3 3.9e−3 10.2e−3
r = 50 8.4e−3 13.2e−3 3.5e−3

Figure 5.7: Norm of three different kinds of noise, measured in the Banach space norms
L2 (∂Ω1 ) (first row), L1.01 (∂Ω1 ) (second row) and L50 (∂Ω1 ) (third row).

L2 −noise is δ = 1% in all cases.


The same noises shown in the top of Figure 5.7 are used to compute the reconstructions
presented in Figure 5.8. The pictures exposed in this new figure are organized in columns
and rows in the same way as in the last figure. Thus, each column and each row of Figure
5.8 represents respectively the same kind of noise and space of Figure 5.7. We have used
d = 4, τ = 1.8 and kmax = 50. The dual gradient DE method have been engaged again as
inner iteration of K-REGINN with and C1 = C2 = 0.1 in (3.39). The parameter p is fixed
in 2, which means that X is a Hilbert space in this experiment. On the other hand, we
have chosen different values for r expecting to obtain better reconstructions in those spaces
where the noise, measured in the corresponding norm, has smaller values.
Since the set ∂Ωj is an uni-dimensional Lipschitz manifold and H 1/2 ((a, b)) is contin-
uously embedded in Lr ((a, b)) for (a, b) ⊂ R and 1 < r < ∞, see e.g. [14, Theo. 2.72], it
follows that H 1/2 (∂Ωj ) ,→ Lr (∂Ωj ) [18] and we have permission to set Yj = Lr (∂Ωj ) for
an arbitrary r ∈ (1, ∞) . Since kgj kH 1/2 (∂Ωj ) . kgj kLr (∂Ωj ) for 1 < r < ∞, the new opera-
tors Fj are still Fréchet differentiable and have the same derivatives as before. We want to
point out however, that the standard choice Jr for the duality mapping is in principle not
allowed if r < 2. Indeed, we do not have any convergence results for the Kaczmarz methods
in this case. Theorem 47 actually requires that s ≤ r for the case d > 1, but as X is a
Hilbert space, it follows that s = 2, and accordingly, the index r of the duality mapping
should be larger than or equal 2. This technical problem could be overcome for instance,
using the normalized duality mapping J2 in the Lr space, which can be realized via (2.14).
However, this seems not to be the best solution because the reconstructions found with this
framework are not as good as the ones found when the duality mapping Jr is used in the
space Lr (see also Remark 25). Seemingly, the duality mapping Jr is the right one for the
space Lr and for this reason, we have chosen to use it, even without having the complete
convergence proof in this situation.
As expected, all the reconstructions in the L2 space (first row of Figure 5.8) are similar
because all the noises have the same L2 −norm. The results shown in the first column are
also similar because the Gaussian noise has similar norms in the three different investigated
spaces. But, a slightly superior quality of the picture in the first row can be observed. This
behavior can be explained looking again to the first column of Figure 5.7, which shows
that the smallest value for the Gaussian noise is obtained with the use of the L2 −norm.
The most interesting results however, are shown in the two last rows intersected with the
two last columns. Comparing the results of Figure 5.8 with those shown in Figure 5.7, we
clearly see that an improvement is achieved when the kind of noise matches with the used
space, in the sense that the resulting noise-norm has a small value (this is the case in the
intersection between the second row with the second column and the third row with the
third column). On the other hand, the contrary effect occurs if the combination of kind of
88 CHAPTER 5. NUMERICAL EXPERIMENTS

X = V2

X = V1.01

X = V50

0.05 0.1 0.15 0.2 0.25

Figure 5.8: Reconstruction γN (δ) obtained in different Banach spaces norm (compared by
rows) and with different kinds of noise (compared by columns). All the noises are scaled
and have the same L2 −norm.

noise and used space results in a large noise-norm (see the second row intersected with the
third column and the intersection between the third row and the second column).
To finish this section we explain how to take advantage of the weight-function ω to
incorporate a priori information about the solution γ + . If the inclusions B are expected
to be located in a specific region A of Ω, one way to get better reconstructions is giving
less weight for the points within this region and ”penalizing” the distance to this set: let
A ⊂ Ω be a closed subset in Ω where the inclusions are expect to be located. Define the
new weight-function
ω (x) := ω (x) h (x) , x ∈ Ω,
where
h (x) := (c0 + dist (x, A)) . (5.19)
The function dist (·, A) : Ω → R measures the distance between a point of Ω and the set
A:
dist (x, A) := inf {kx − yk : y ∈ A} , x ∈ Ω.
5.2. EIT - COMPLETE ELECTRODE MODEL 89

0.8

0.6

0.4

0.2

0.35

0.3

0.25

0.2

0.15

N = 934, eN = 74.60% N = 662, eN = 71.86% N = 278, eN = 70.66% 0.1

kall = 5162 kall = 4750 kall = 1747

Figure 5.9: Reconstructed solution γN (δ) for different weight-functions. The first row shows
the function h from (5.19).

The small constant c0 > 0 is used to avoid zero-weights and to define the ”contrast” between
the points which belong and the ones which do not belong to the set A. This new weight-
function is still bounded from above and from below: c0 ωmin ≤ ω ≤ (c0 + |Ω|) ωmax , where
0 < ωmin ≤ ωmax are respectively lower and upper bounds for ω.
Figure 5.9 collects the results obtained using different sets A and the same constant
c0 = 0.03. The first row shows the function h, defined in (5.19) , while the second one
exhibits the respective reconstructions with the number of outer iteration N, the overall
number of inner iteration kall and the reconstruction error eN being highlighted below each
picture. At the end of each row, a color bar is displayed. The first column presents the
result for the set A = Ω, which means that the new weight ω (x) = c0 ω (x) is just a multiple
of the original weight-function ω. For the reconstruction in the second column the strip-like
set
A = {(x, y) ∈ Ω : y − 0.25 ≤ x ≤ y + 0.25}
have been used, and the surface delimited by a square

A = [0.2, 0.8] × [0.2, 0.8]

is the chosen set in the third column. The pictures have been generated using the DE
method with C1 = C2 = 0.1 and the following configuration: d = 8, p = 1.1, r = 2, τ = 1.5
and δ = 0.5%. The other parameters are the same of last experiment.
Note that better results are found in both the second and third columns, where more
accurate information about the location of the inclusions is provided. Further, the overall
number of outer and inner iterations are lower in both cases.

5.2 EIT - Complete Electrode Model


In this new procedure, introduced by Somersalo et al. in [50] to build a more realistic frame-
work for the EIT problem, electric currents are injected in the simply connected Lipschitz
90 CHAPTER 5. NUMERICAL EXPERIMENTS

domain Ω ⊂ R2 via L ∈ N electrodes attached to its boundary ∂Ω and the resulting voltages
are measured in the same electrodes with the goal of restoring the electrical conductivity
γ : Ω → R. For the correct translation in an appropriate mathematical model, we suppose
that the electrodes e1 , ..., eL are identified with the part of the surface of Ω they contact.
Thus, ei ⊂ ∂Ω is open and has positive measure |ei | > 0 for i = 1, ..., L. Additionally,
the electrodes are connected and separated: ei ∩ ej = ∅ for i 6= j. Similarly to the EIT-
CM model, we suppose that Ω has no sources or drains and thus (5.1) is satisfied, where
u : Ω → R represents again the potential distribution in Ω. The electrodes are modelled as
perfect conductors, which means that the electric current in the electrode ei is a constant
Ii ∈ R, which agrees with the total electric flux over the same electrode:
Z
∂u
γ dS = Ii , i = 1, ..., L.
ei ∂ν

The vector I := (I1 , ..., IL )> ∈ RL , which collects in a single vector the electric currents of
all electrodes, is called a current pattern. As the electric flux does not flow in the electrodes
gaps we have
∂u
γ = 0 in ∂Ω\ ∪L i=1 ei .
∂ν
At the contact of the electrodes with ∂Ω, there is an electro-chemical effect which gives rise
to a thin, highly resistive layer characterized here by the quantities zi > 0, i = 1, ..., L and
called the contact impedances. The product of the contact impedance zi times the electric
flux γ (∂u/∂ν) results in the drop of the voltage in the electrode ei . Thus
∂u
u + zi γ = Ui , on ei , i = 1, ..., L,
∂ν
where Ui is the measured voltage in the i−th electrode ei . We denote by U := (U1 , ..., UL )> ∈
RL , the vector of the voltages associated with the current pattern I.
In [50, Prop. 3.1], it is demonstrated that if the set Ω, the electrical conductivity γ and
the electrodes ei have enough regularity, the above conditions can be equivalently replaced
by the variational problem
L
X
B ((u, U ) , (v, V )) = Ii Vi for all (v, V ) ∈ H, (5.20)
i=1

where H := H 1 (Ω) ⊕ RL and the bi-linear form B : H × H → R is defined by


Z L Z
X 1
B ((u, U ) , (v, V )) := γ∇u∇v + (u − Ui ) (v − Vi ) dS. (5.21)
Ω zi ei
i=1

Aiming now to apply the Lemma of Lax Milgram [32] in (5.20) in order to prove the
existence of a vector (u, U ) for fixed contact impedances zi > 0 and a fixed current pattern
I ∈ RL , it was suggested in [50, Theo. 3.3] changing the space H with H e = H/R, which
is essentially the same space but with the difference that two vectors from H which differ
only by a constant function are regarded as the same vector in H, e i.e.,
 
(v, V ) ∼ ve, Ve ⇔ v − ve = const = V1 − Ve1 = ... = VL − VeL . (5.22)

In this new vector space, the bi-linear form B in (5.21) is coercive whenever γ ∈ L∞
+ (Ω).
Now defining f : H → R as
e
L
X
f ((v, V )) := Ii V i ,
i=1
5.2. EIT - COMPLETE ELECTRODE MODEL 91
 
we see that, in order to have a well-defined function, the equality f ((v, V )) = f ve, Ve
must be verified whenever the right-hand side of (5.22) holds true. Thus, the equality

  L
X L
X L
X
0 = f ((v, V )) − f ve, Ve = Ii Vi − Ii (Vi + const) = const Ii ,
i=1 i=1 i=1

must be verified for an arbitrary constant. We thus assume that I ∈ RL


♦ , where

L
( )
X
RL
♦ := I ∈ RL : Ii = 0 .
i=1

In this case, f is a well-defined, bounded linear functional and the Lemma of Lax Milgram
applies. It follows that there exists a unique solution of (5.20) if H is replaced with H. e
Therefore, there exist infinite solutions in H, differing from each other only by a constant.
To obtain uniqueness in H, we suppose that U ∈ RL L
♦ . The conditions I, U ∈ R♦ can be
understood as the law of the conservation of the charge and the grounding of the potential
respectively.
The above reasoning leads to the conclusion that the following related variational prob-
lem is well-defined: fixed a current pattern I ∈ RL ♦ and positive contact impedances
z1 , ..., zL , find the unique pair (u, U ) ∈ H 1 (Ω) ⊕ RL
♦ satisfying

L
X
B ((u, U ) , (v, V )) = Ii Vi for all (v, V ) ∈ H 1 (Ω) ⊕ RL
♦, (5.23)
i=1

b ×H
where B : H b := H 1 (Ω) ⊕ RL is defined in (5.21) .
b → R, with H

Remark 49 The variational problem (5.23) has a unique PL solution even if the fixed current
pattern I belongs to5 RL \RL♦ . Indeed, defining a := i=1 Ii and changing I with I − C,
>
where C := (a/L, ..., a/L) , we see that I − C ∈ R♦ and for all V ∈ RL
L
♦,

L L L L
X X aX X
(Ii − Ci ) Vi = Ii Vi − Vi = Ii Vi .
L
i=1 i=1 i=1 i=1
| {z }
=0

Since a unique solution of (5.23) exists with the current pattern I − C, the same occurs with
I.

Using (5.23), it is not difficult to prove that the Neumann-to-Dirichlet (NtD) operator

Λγ : RL → RL , I 7→ U, (5.24)

which associates a current pattern I ∈ RL with the respective voltage vector U ∈ RL


♦ ⊂R
L

is linear. The Complete Electrode Model of the EIT (in short EIT-CEM) forward problem
is now defined by the function

F : D (F ) ⊂ L∞ (Ω) → L RL , RL , γ 7→ Λγ ,

(5.25)

with D(F ) := L∞
+ (Ω) . Recovering γ from a partial knowledge of Λγ is the associated inverse
problem to be solved.
5
The current pattern vector I ∈ RL \RL
♦ does not satisfy the principle of the conservation of the charge.
Therefore, it has no physical meaning.
92 CHAPTER 5. NUMERICAL EXPERIMENTS

As can be seen from Remark 49, the NtD operator (5.24) is not injective. In fact, the
null-space of Λγ is given by

N (Λγ ) = I ∈ RL : I1 = ... = IL = const 6= 0.




But, since it is a linear operator, the knowledge of the vectors U j = Λγ I j , j = 1, ..., L, where
I 1 , ..., I L is a basis for RL , implies in the knowledge of the NtD operator Λγ itself6 .


In practical situations, one fixes ` ∈ N (not necessarily linearly independent) current


patterns I j ∈ RL ♦ , j = 1, ..., `, which for notational reasons we put together in a single vector
 >
= := I 1 , ..., I ` ∈ R`L , (5.26)

and reads in the EIT-CEM experiment a noisy version of


 >
Γγ := U 1 , ..., U ` ∈ R`L , (5.27)

where uj , U j ∈ H 1 (Ω) ⊕ RL

♦ is the unique solution of (5.23) associated with the current
pattern I = I , that is, U := Λγ I j , j = 1, ..., `. Observe that the noisy versions of U j
j j

belong to the space RL but not necessarily to RL ♦.


We now reformulate (5.25) as

F= : D (F ) ⊂ L∞ (Ω) → R`L , γ 7→ Γγ , (5.28)

and regard this operator as the EIT-CEM forward problem for the rest of this section.
The determination of an approximation for γ from partial knowledge of Γγ is therefore the
associated inverse problem we want to solve.
Similarly to the EIT-CM, presented in the last section, the space Y = R`L factorizes into
the spaces Y = Y1 × ... × Y` , with Yj := RL , j = 1, ..., ` and accordingly, F= = (F1 , ..., F` )>
with
Fj : D (F ) ⊂ L∞ (Ω) → RL , γ 7→ U j , j = 1, ..., `, (5.29)
which replaces the problem (5.28) with a more suitable version for an application of a
Kaczmarz method.

5.2.1 Fréchet-differentiability of the forward operator


The Fréchet differentiability of the operators Fj in (5.29) can be proven similarly to the
EIT-CM, see [28] and [34, Theo 4.1]. Further, for γ ∈ int (D (F )) and η ∈ L∞ (Ω) , the
F-derivative Fj0 (γ) η =: W j is the second element of the pair wj , W j ∈ H 1 (Ω) ⊕ RL ♦,
which is the unique solution of the following variational problem:
Z
B w , W , (v, V ) = − η∇uj ∇vdx for all (v, V ) ∈ H 1 (Ω) ⊕ RL
j j
 
♦, (5.30)

where uj , U ∈ H 1 (Ω) ⊕ RL
j

♦ is the unique solution of (5.23) with the current pattern
I = I j , this is, U j = Fj (γ) .
Proceeding similarly to (5.10) , one can prove that for each Z ∈ RL , the adjoint operator
of the F-derivative of Fj satisfies

Fj0 (γ)∗ Z = −∇uj ∇y, (5.31)

where (y, Y ) ∈ H 1 (Ω) ⊕ RL


♦ is the unique solution of (5.23) with the current pattern I = Z.
6
Since dim (N (Λγ )) 6= 0 and Λγ is symmetric [50], less linearly independent measures are actually needed.

The F-derivative of the forward operator (5.28) is now given by F_𝓘'(γ) = (F_1'(γ), ..., F_ℓ'(γ))^⊤, i.e., for each η ∈ L^∞(Ω),

    F_𝓘'(γ)η = (F_1'(γ)η, ..., F_ℓ'(γ)η)^⊤ ∈ R^{ℓL}.                             (5.32)

Since only a finite number of experiments can be made, only a finite number of degrees of freedom of the conductivity can be restored. Thus, it makes perfect sense to restrict the searched-for conductivities to a finite dimensional space. Proceeding similarly to the last section, we define the triangulation Υ := {T_i : i = 1, ..., M} of Ω as in (5.13) and, analogously to (5.11) and (5.14), we restrict the space X to the finite dimensional space V and the domain of definition of F to

    V_+ := V ∩ L^∞_+(Ω) ⊂ V_p,                                                   (5.33)

where V := span{χ_{T_1}, ..., χ_{T_M}} ⊂ L^∞(Ω) and V_p := (V, ‖·‖_{L^p(Ω)}). The discrete versions of the operators defined in (5.28) and (5.29) are

    F : V_+ ⊂ V_p → R^{ℓL},   γ ↦ Γ_γ,                                           (5.34)

and

    F_j : V_+ ⊂ V_p → R^L,   γ ↦ U^j,   j = 1, ..., ℓ,                           (5.35)

respectively. Identifying an arbitrary vector γ = Σ_{i=1}^M α_i χ_{T_i} of V_+ ⊂ L^∞(Ω) with the vector of its coordinates (α_1, ..., α_M)^⊤ ∈ R^M, the functions in (5.34) and (5.35) above can now be seen as nonlinear operators from (a subset of) R^M to R^{ℓL} and from (a subset of) R^M to R^L respectively. Their derivatives F'(γ) and F_j'(γ), evaluated at a vector γ ∈ int(V_+), can accordingly be regarded as two matrices (called the Jacobian matrices) with dimensions M × ℓL and M × L respectively. In this framework, the adjoint operators of these derivatives are simply the transposed Jacobian matrices.
We now explain how to calculate the Jacobian matrix efficiently: let γ ∈ int(V_+) be given and observe that

    F_j'(γ) = (F_j'(γ)χ_{T_1}, ..., F_j'(γ)χ_{T_M}) ∈ R^{L×M},   j = 1, ..., ℓ.

Further, using (5.30), the vector F_j'(γ)χ_{T_i} =: W_i^j can be evaluated as part of (w_i^j, W_i^j) ∈ H^1(Ω) ⊕ R^L_♦, the unique solution of

    B((w_i^j, W_i^j), (v, V)) = −∫_Ω χ_{T_i} ∇u^j · ∇v dx   for all (v, V) ∈ H^1(Ω) ⊕ R^L_♦,   (5.36)

where (u^j, U^j) ∈ H^1(Ω) ⊕ R^L_♦ is the unique solution of (5.23) for the current pattern I = I^j. This relatively straightforward procedure for calculating the Jacobian matrix is, however, highly expensive. In fact, it is necessary to solve problem (5.23) for I = I^j in order to obtain the vector u^j and then to solve problem (5.36) for i = 1, ..., M in order to calculate F_j'(γ). Similarly, ℓ solutions of (5.23) and Mℓ solutions of (5.36) are required for the evaluation of F'(γ). But, using a simple trick, as explained in [42], we are able to strongly reduce these numbers. For each i ∈ {1, ..., L}, let (v^i, V^i) ∈ H^1(Ω) ⊕ R^L_♦ be the unique solution of (5.23) with the current pattern I = J^i := (δ_{i,j})_{j=1}^L, which is the vector having the value 1 in the i-th coordinate and zero elsewhere. That is,

    B((v^i, V^i), (z, Z)) = ⟨J^i, Z⟩   for all (z, Z) ∈ H^1(Ω) ⊕ R^L_♦.          (5.37)

Thus,

    F_j'(γ)χ_{T_i} = W_i^j = (⟨J^1, W_i^j⟩, ..., ⟨J^L, W_i^j⟩)^⊤                                        (5.38)
        = (B((v^1, V^1), (w_i^j, W_i^j)), ..., B((v^L, V^L), (w_i^j, W_i^j)))^⊤        [by (5.37)]
        = (B((w_i^j, W_i^j), (v^1, V^1)), ..., B((w_i^j, W_i^j), (v^L, V^L)))^⊤
        = (−∫_Ω χ_{T_i} ∇u^j · ∇v^1 dx, ..., −∫_Ω χ_{T_i} ∇u^j · ∇v^L dx)^⊤            [by (5.36)]
        = (−∫_{T_i} ∇u^j · ∇v^1 dx, ..., −∫_{T_i} ∇u^j · ∇v^L dx)^⊤.
Thus, in order to calculate the Jacobian matrix, which represents the derivative of (5.34), all one needs to do is solve problem (5.23) for each of the ℓ current patterns in (5.26), solve (5.37) for i ∈ {1, ..., L} and assemble the results as explained in (5.38) and (5.32). This procedure demands the solution of only ℓ + L variational problems and, since the number of electrodes L is normally much smaller than the number of triangles M in Υ, a tremendous saving in computational effort can be achieved with this strategy. If we further observe that in each inner iteration of REGINN the vectors u^j, j = 1, ..., ℓ, have already been calculated in the outer iteration in order to evaluate F(γ), then the calculation of the Jacobian actually requires only the additional L solutions of (5.37). Analogously, the evaluation of the Jacobian F_j'(γ) for the Kaczmarz version (5.35) demands L solutions of (5.37) if the information F_j(γ) is already available.
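To make the bookkeeping above concrete, the following minimal sketch (in Python) assembles the Jacobian from the ℓ forward solutions u^j and the L auxiliary solutions v^i of (5.37) according to (5.38). The helpers solve_cem and grad_dot_on_triangle are hypothetical placeholders for a finite element implementation such as the one described in [42]; only the assembly logic of (5.38) and (5.32) is meant to be illustrated here, and the matrix is stored with one row per measurement and one column per triangle.

    import numpy as np

    def assemble_jacobian(gamma, currents, solve_cem, grad_dot_on_triangle, L, M):
        # solve_cem(gamma, I): hypothetical FEM solver returning the H^1 part u of the
        #                      solution of (5.23) for the current pattern I
        # grad_dot_on_triangle(u, v, i): hypothetical helper returning the integral of
        #                                grad(u) . grad(v) over the triangle T_i
        ell = len(currents)
        u = [solve_cem(gamma, I) for I in currents]             # ell solves of (5.23)
        v = [solve_cem(gamma, np.eye(L)[i]) for i in range(L)]  # L solves of (5.37)
        jac = np.empty((ell * L, M))
        for j in range(ell):            # experiment index
            for k in range(L):          # electrode (measurement) index
                for i in range(M):      # triangle index
                    # (5.38): k-th entry of F_j'(gamma) chi_{T_i}
                    jac[j * L + k, i] = -grad_dot_on_triangle(u[j], v[k], i)
        return jac                      # blocks stacked as in (5.32)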
All the methods presented in Section 3.1 demand the calculation of at least one derivative applied to a vector (A_n s, s ∈ X) or of the adjoint of the derivative applied to a vector (A_n^* b, b ∈ Y^*) in order to compute the first vector s_{n,1} in each inner iteration of REGINN. Thus, if one wants to calculate s_{n,1} for problem (5.34) without assembling the Jacobian matrix, we conclude in view of (5.32) that at least L solutions of (5.30) or (5.31) need to be found. Considering that the process of obtaining the solution of a variational problem is the only relevant computational effort in a run of Algorithm 1, and recalling that L solutions of (5.37) are all one needs in order to calculate the Jacobian for this problem, we conclude that it is always worthwhile to evaluate this matrix.
The situation is a little different for the Kaczmarz version of REGINN, i.e., if problem (5.35) is analyzed instead of (5.34). If the Jacobian matrix F_j'(γ) is not available and, for instance, a dual gradient method (3.29) or a mixed gradient-Tikhonov method (3.76) is employed as inner iteration, the solution of two variational problems is required to compute each single vector: one for the calculation of the derivative applied to s_{n,k} and a second one for the adjoint of the derivative applied to j_r(A_n s_{n,k} − b_n^δ). This results in a total of 2k_n required solutions for the entire execution of an inner iteration. In the first iteration, however, the calculation of A_n s_{n,0} can be avoided because s_{n,0} = 0. But, if the inner iteration is not terminated by the maximal number k_{max,n}, the additional derivative A_n s_{n,k_n} needs to be calculated only to verify the stopping criterion (3.5) in the last iteration, which leaves us with the same number 2k_n. The calculation of this last derivative is not necessary, however, if the maximal number of inner iterations is reached. In this case, the inner iteration requires the solution of 2k_{max,n} − 1 variational problems. Since the evaluation of the Jacobian matrix demands a total of L solutions of (5.37), we conclude that it is not worthwhile to calculate this matrix if 2k_{max,n} − 1 ≤ L or 2k_n ≤ L, for the cases when the maximal number of inner iterations is reached or not reached respectively. But, as an a priori estimate for the number k_n is not available, it is difficult to decide whether the Jacobian should be calculated. Note, however, that if k_max ≤ L/2, then both inequalities are satisfied and therefore the calculation of the Jacobian is more expensive. See a detailed comparison in Tables 5.2 and 5.3 at the end of Subsection 5.2.2 below.

5.2.2 Computational implementation


To implement our experiments, we define Ω as the circle centered at the origin with radius 4/π. To reconstruct the conductivity, we use the same triangulation Υ of Ω defined in the last subsection, which is also used to calculate the Jacobian matrix of the forward operator. Since an analytical solution of (5.23) is in general not available, the FEM is used to find an approximate solution. For this approximate solution, however, a different and more refined triangulation Θ of Ω is used⁷. Further, we fix d = 1 (non-Kaczmarz method), which means that we analyze the problem defined in (5.34).
In [35], Lechleiter and Rieder have proven that once Υ is fixed, there exists a number L_min ∈ N depending on Υ such that if the number of electrodes satisfies L ≥ L_min, then the Fréchet-derivative of the (discrete) forward operator (5.34) is injective and satisfies the Tangential Cone Condition (Assumption 1(c), page 31) in a small ball centered at an arbitrary element γ ∈ int(D(F)). Further, the same result remains true if the vector (u, U) ∈ H^1(Ω) ⊕ R^L_♦, which is the exact solution of (5.23), is replaced by its FEM-approximation (u_Θ, U_Θ) in the mesh Θ, provided this triangulation is sufficiently fine, see [35, Theo. 4.5 and 4.9]. In particular, the inequality dim Υ ≤ L(L − 1)/2 is a necessary condition for the injectivity of F'.
Once the meshes are fixed, the forward operator (5.34) is well-defined and acts between finite dimensional spaces, and since all norms are equivalent in such spaces, the derivative of the forward operator does not change if the norms in X and Y are modified. As explained in the last subsection, we choose the norm in X to be the L^p-norm and denote this space by X := V_p, see (5.33) above. We now turn the vector space Y = R^{ℓL} into a Banach space by equipping it with the norm

    ‖U‖_r := (Σ_{i=1}^{ℓL} |U_i|^r)^{1/r},   U ∈ R^{ℓL},                         (5.39)

where r > 1 is a fixed number. A calculation similar to that made in Example 17 shows that the duality mapping is given by

    (J_r(U))_i = |U_i|^{r−1} sgn(U_i),   i = 1, ..., ℓL,                          (5.40)

where J_r(U) := (J_r(U)_1, ..., J_r(U)_{ℓL})^⊤, U ∈ Y.
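The norm (5.39) and the duality mapping (5.40) are straightforward to evaluate; a minimal sketch in Python (with NumPy) could read as follows.

    import numpy as np

    def norm_r(U, r):
        # r-norm (5.39) of a data vector U in R^{ell*L}
        return np.sum(np.abs(U) ** r) ** (1.0 / r)

    def duality_map_r(U, r):
        # duality mapping (5.40): componentwise |U_i|^{r-1} * sgn(U_i);
        # for r = 2 this reduces to the identity
        return np.abs(U) ** (r - 1) * np.sign(U)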
In our experiments, we have chosen Υ and Θ with dim Υ = 1200 and dim Θ = 14340. We highlight the fact that, due to the relatively large dimension of the triangulation Υ, the inequality dim Υ ≤ L(L − 1)/2, necessary for the injectivity of F', is only satisfied for L ≥ 50 (indeed, 50·49/2 = 1225 ≥ 1200, while 49·48/2 = 1176 < 1200). However, most experiments we have performed in this subsection use a moderate number of electrodes, in general much smaller than 50. The result is that a solution of an under-determined system

    A_n s = b_n                                                                  (5.41)

needs to be approximated in each inner iteration of K-REGINN. Among all the possible solutions we could approximate, we then pick a specific one which is best suited to our purposes. We follow ideas from [52] and select an appropriate weight-function ω : Ω → R, replacing the L^p-norm in X with the L^p_ω-norm, as done in the last section. In [52], Winkler and Rieder have proposed choosing a different weight-function ω = ω_n for each (outer) iteration. They argue that a usual choice for the solution of (5.41) in Hilbert spaces is

    s := arg min_{s ∈ N(A_n)^⊥} ‖A_n s − b_n‖,
⁷For a full explanation of how to employ the FEM in order to find a solution of (5.23) we recommend [42].

where the condition s ∈ N(A_n)^⊥ resolves the under-determinedness. The authors' idea is to define a suitable piecewise constant and positive weight

    ω_n := Σ_{i=1}^M ω_{n,i} χ_{T_i},

such that the weighted inner product ⟨·,·⟩_{ω_n} := ⟨·, ω_n ·⟩_{L²(Ω)} substitutes the original L² inner product, and then to redefine

    s := arg min_{s ∈ N(A_n)^{⊥_{ω_n}}} ‖A_n s − b_n‖,                           (5.42)

where N(A_n)^{⊥_{ω_n}} represents the orthogonal complement of N(A_n) with respect to the new inner product ⟨·,·⟩_{ω_n}. They have suggested a strategy which updates indistinguishable coefficients of the current (outer) iterate γ_n := Σ_{i=1}^M γ_{n,i} χ_{T_i} by the same amount, as follows: the definition of the coefficients

    ω_{n,i} := (‖S_i‖_2 / |T_i|) γ_{n,i}^{−1},   for i = 1, ..., M,              (5.43)

where |T_i| is the area of the triangle T_i and

    ‖S_i‖_2 := ( Σ_{j=1}^ℓ ‖F_j'(γ_n) χ_{T_i}‖_2^2 )^{1/2}

is the 2-norm of the i-th column of the Jacobian matrix A_n = F'(γ_n), results in a weight-function ω_n which provides a solution s = Σ_{i=1}^M s_i χ_{T_i} of (5.42) whose coefficients s_i and s_j are proportional to the local updates γ_{n,i} and γ_{n,j} whenever the columns S_i and S_j of the Jacobian matrix A_n are linearly dependent. More precisely, in [52, Theo. 1], it is proven that if the coefficients of ω_n are defined as in⁸ (5.43), then the equality

    s_i / γ_{n,i} = sgn(β) · s_j / γ_{n,j}

is guaranteed whenever the condition

    S_j = β S_i,   β ∈ R \ {0}

holds true.

Remark 50 Since γ_0 was chosen as a constant in the last section, the weight function ω used in that section is just given by the first weight (n = 0 in definition (5.43) above) times a constant, that is, ω = γ_0 ω_0, see (5.18).

Note that the use of the weighted L² inner product ⟨f, g⟩_{ω_n} = ∫_Ω f g ω_n dx results in the weighted space X_n = V_{2,ω_n} = (V, ‖·‖_{L²_{ω_n}(Ω)}). Unfortunately, our framework does not allow the use of different spaces X in different iterations, and for this reason we proceed as in the last section and fix the same weight-function for all iterations. We therefore define

    ω := Σ_{i=1}^M ω_i χ_{T_i}   with   ω_i := ( Σ_{j=1}^ℓ ‖F_j'(γ_0) χ_{T_i}‖_2^2 )^{1/2} / |T_i| · γ_{0,i}^{−1}.   (5.44)

⁸The authors of [52] actually proposed the use of a slightly different weight-function ω̄_n, whose coefficients ω̄_{n,i} correspond to |T_i| ω_{n,i} in (5.43). But, since they have used the Euclidean inner product in R^M in place of the L² inner product, both choices result in equivalent approaches.

Figure 5.10: Left: example of a conductivity γ^+, modeled by a background (in blue) and a sparsely distributed inclusion (in dark red). Middle and right: triangulations Θ and Υ respectively.
Even without having any results for Banach spaces at hand, we use this weight-function in all the experiments. In particular, replacing the space X = V_p with X = V_{p,ω} results in the new norm

    ‖f‖_{L^p_ω(Ω)} = ‖f ω^{1/p}‖_{L^p(Ω)},   f ∈ X.

Further, this modification of X implies a slight modification in the way the adjoint operator of the derivative is calculated. In this new space, the adjoint A_n^* of the Jacobian matrix is not simply the transpose of the matrix A_n, but the transpose of the weighted matrix A_{n,ω}, defined by multiplying each element in the i-th row of the matrix A_n by ω_i. Indeed, for all u ∈ R^M and v ∈ R^{ℓL},

    ⟨u, (A_{n,ω})^⊤ v⟩_Y = ⟨A_{n,ω} u, v⟩_{V_p} = ⟨ω A_n u, v⟩_{V_p} = ⟨A_n u, v⟩_{V_{p,ω}} = ⟨u, A_n^* v⟩_Y,

resulting in A_n^* = (A_{n,ω})^⊤.
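A minimal sketch of how the weights (5.44) could be computed from the initial Jacobian is given below. It assumes, in line with the column notation S_i used in (5.43), that the Jacobian A_0 = F'(γ_0) is stored with one column per triangle, and that the triangle areas |T_i| and the coefficients of γ_0 are available as vectors; the weighted L^p_ω-norm is included for completeness.

    import numpy as np

    def weights(A0, gamma0, areas):
        # (5.44): omega_i = ||S_i||_2 / |T_i| * gamma_{0,i}^{-1},
        # where S_i is the i-th column of A0 = F'(gamma_0)
        col_norms = np.linalg.norm(A0, axis=0)
        return col_norms / (areas * gamma0)

    def weighted_p_norm(f, omega, areas, p):
        # ||f||_{L^p_omega} for a piecewise constant f = sum_i f_i chi_{T_i}:
        # (sum_i |f_i|^p * omega_i * |T_i|)^{1/p}
        return np.sum(np.abs(f) ** p * omega * areas) ** (1.0 / p)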


We point out that no additional computational effort is required to calculate the weights in (5.44), since all the needed information has already been computed in order to evaluate the Jacobian matrix A_0 = F'(γ_0), which is necessary to perform the first inner iteration.
Since the weighted space V_{p,ω} is always used in our experiments, we suppress the dependence of V_p on ω, denoting V_p = (V, ‖·‖_{L^p_ω(Ω)}) from now on.
In each experiment we have performed, all the electrodes have the same measure, are uniformly distributed on the boundary of Ω and cover 50% of ∂Ω. We have fixed L = ℓ = 16 for the first experiments, and the current patterns in (5.26) are defined as the vectors I^i := (0, ..., 0, 1, −1, 0, ..., 0)^⊤, with 1 in the i-th coordinate, −1 in the immediately following one and zero elsewhere. The contact impedances are the known constants z_j = 0.04/π for j = 1, ..., L, which is 1% of the radius of Ω.
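A short sketch of how these adjacent-electrode current patterns could be generated follows; the wrap-around for i = L (injecting between the last and the first electrode) is an assumption made here only to obtain L patterns of this form and is not spelled out in the text.

    import numpy as np

    def adjacent_current_patterns(L):
        # I^i = (0,...,0,1,-1,0,...,0)^T with +1 at electrode i and -1 at the next one
        # (wrap-around for the last electrode is an assumption made here)
        patterns = np.zeros((L, L))
        for i in range(L):
            patterns[i, i] = 1.0
            patterns[i, (i + 1) % L] = -1.0
        return patterns  # every row sums to zero, i.e. lies in R^L_diamond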
The exact solution γ^+ we are looking for has the form shown in (5.15). However, different inclusions B ⊂ Ω are considered in different experiments. The first iterate γ_0 ≡ 0.1 matches the background again, and the relative iteration error e_n as well as the reconstruction error is defined in the same way as in (5.16). In all the tests we have performed, the data Γ_γ in (5.27) have been synthetically generated using a very fine mesh with more than 230000 triangles. Figure 5.10 illustrates an example of γ^+, where the inclusion B is modelled by a four-squares-like inclusion. The triangulations Θ and Υ, used respectively to solve the elliptic problem (5.23) and to reconstruct the conductivity, are also exhibited.

Figure 5.11: Convergence in the noiseless situation observed in two different Banach space norms. The first and second rows show the reconstructions obtained using the Hilbert space X = V_2 and the Banach space X = V_{1.1} respectively. (First row, p = 2: e_10 = 68.80%, e_100 = 66.40%, e_1000 = 64.62%. Second row, p = 1.1: e_10 = 65.10%, e_100 = 59.89%, e_1000 = 46.57%.)

To confirm the important noiseless convergence result, proved in Theorem 43, we add no artificial noise to the generated data (δ = 0) and observe what happens to the approximation γ_n as n grows to infinity. For this first experiment, the IT method (3.62) has been employed as inner iteration of REGINN, and the results, which have been computed in the spaces X = V_2 (p = 2) and X = V_{1.1} (p = 1.1), are displayed in the first and second rows of Figure 5.11 respectively. The Euclidean norm ‖·‖_2 (definition (5.39) with r = 2) is used to turn the vector space Y into a Hilbert space. With this configuration, the duality mapping in (5.40) is just the identity operator. The parameter α_n in (3.68) is given by the constant value 0.1 and the dual gradient DE method (see (3.29) and (3.39)) with C_1 = C_2 = 0.1 is applied to find the minimizer of the functional (3.63), which is necessary to realize the inner iteration. The other parameters of REGINN are µ = 0.8 and k_max = 10. Each picture displayed in Figure 5.11 is actually a linear interpolation of the coefficients of the piecewise constant reconstructions. All these pictures use the same color scale and below each of them the relative iteration error e_n is shown. The first column illustrates the iterate γ_10, while the second and third ones refer to γ_100 and γ_1000 respectively. As expected, the reconstructions in the Hilbert space X = V_2 are over-smoothed, with relatively large oscillations in the background and too-low inclusions. On the other hand, the second row clearly shows thinner and higher inclusions with larger slopes and lower oscillation levels in the background, which is a typical behavior of solutions restored in the Banach space norms L^p for small values of p, see e.g. [12].
In the next experiment we use artificially generated noise to contaminate the data Γ_γ in (5.27) with the relative noise level δ:

    Γ_γ^δ = Γ_γ + δ ‖Γ_γ‖_2 per,

where the perturbation vector per ∈ R^{L²} is a uniformly distributed random variable with ‖per‖_2 = 1. We fix δ = 0.1% and compare the performance of REGINN for reconstructing sparsely located inclusions when different norms in X are employed for this task. We use p = 2 and p = 1.01. The parameter k_max is now set to 500 and a mixed gradient-Tikhonov method is used as inner iteration, combining the Tikhonov-Phillips method with the dual gradient DE method (x_n = 0 in (3.76) with λ_{n,k} = λ_DE being defined in (3.72) with C_1 = C_2 = 0.1). The constant τ has the value 1.5 and the other parameters are the same as in the last experiment.

Figure 5.12: Conductivity with sparsely located inclusions (first row) and the respective reconstructed solutions: using the Hilbert space norm L² (second row) and the Banach space norm L^{1.01} (third row). (Color scale from 0 to 1.)

Figure 5.12 presents the searched-for solutions γ^+ in the first row
and exhibits in the second and third rows the respective V_2- and V_{1.01}-reconstructions. All the pictures use the same color scale and after the third row a color bar is displayed. We clearly see a lower level of oscillation in the background for the reconstructions in the third row. Additionally, an unmistakable improvement in the reconstructions is achieved if the Hilbert space V_2 is replaced with the Banach space V_{1.01}, resulting in sharper inclusions with more accurate values, shapes and locations.
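The contaminated data used in this and the following experiments obey the additive model Γ_γ^δ = Γ_γ + δ‖Γ_γ‖_2 per with ‖per‖_2 = 1 introduced above; a minimal sketch of this perturbation, drawing the direction per with uniformly distributed entries and normalizing afterwards (one possible reading of the description above), is the following.

    import numpy as np

    def add_relative_noise(Gamma, delta, rng=None):
        # Gamma^delta = Gamma + delta * ||Gamma||_2 * per with ||per||_2 = 1
        rng = np.random.default_rng() if rng is None else rng
        per = rng.uniform(-1.0, 1.0, size=np.shape(Gamma))
        per = per / np.linalg.norm(per)
        return Gamma + delta * np.linalg.norm(Gamma) * per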
In order to examine the behavior of REGINN for reconstructing a sparse conductivity
together with impulsive noise9 , we choose different norms in both X and Y. The values
p = 2 and p = 1.01 as well as r = 2 and r = 1.01 have been designated to perform this task.
Since d = 1 (non-Kaczmarz method), inequality s ≤ r in Theorem 47 is an unnecessary
condition and therefore, the index r of the duality mapping in Y can be freely chosen.
Figure 5.13 illustrates the searched-for conductivity γ^+, modelled by a ring-like inclusion, followed by the noise Γ_γ^δ − Γ_γ (see (5.27)), displayed from two different angles¹⁰ and with noise level δ = 1%.

⁹Impulsive noise in the context of the Complete Electrode Model corresponds to a low error level at most electrodes in the measurement of the voltage for a fixed current pattern, but with very high error levels at a few of them.

Figure 5.13: Left: the searched-for sparsely distributed conductivity γ^+. Middle and right: impulsive noise Γ_γ^δ − Γ_γ.
Figure 5.14 complements Figure 5.13 and shows the reconstructions in the framework referred to above. The dual gradient DE method ((3.29) and (3.39) with C_1 = C_2 = 0.1) is used as inner iteration of REGINN to generate the results. The remaining parameters of REGINN are the same as in the last experiment. In the first and second rows we present the results found using the Hilbert space X = V_2 and the Banach space X = V_{1.01} respectively. The first column shows the reconstructions when r = 2 is used in the norm (5.39), while the second column displays the results for r = 1.01. Below each picture, the number of outer iterations N = N(δ), the reconstruction error e_N and the overall number of inner iterations k_all are shown. Further, a color bar is presented on the right. It is clear that the pictures in the last column have a superior quality compared to the ones in the first column, which means that the norm ‖·‖_{1.01} is more appropriate for dealing with the impulsive noise than the standard Euclidean norm ‖·‖_2. The price to be paid for better reconstructions is a significant increase in the overall number of inner and outer iterations. Observe, however, that a small value for p produces the additional effect of less variation in the background while at the same time demanding less computational effort until convergence. Figure 5.14 makes clear that the combination p = r = 1.01 results in a very satisfactory framework for this specific situation.
To finish this chapter, we confront the non-Kaczmarz version of REGINN (5.34) (where the case d = 1 is considered) with its corresponding Kaczmarz version (5.35), where d = ℓ is used (i.e., each single current pattern in (5.26) results in a different equation used by K-REGINN). The goal is to compare the reconstruction errors and the computational effort necessary to perform an entire run of Algorithm 1. For the case d = 1, only the variant where the Jacobian is calculated is considered, because this is always the most advantageous situation (see the discussion at the end of the last subsection). Further, for the Kaczmarz version, the computational effort necessary to perform all the iterations with and without the calculation of the Jacobian matrix is compared. Different numbers of electrodes, and correspondingly of current patterns, are used: L = ℓ (= d for the Kaczmarz version) with L = 8, L = 16, L = 32 and L = 64. Trying to be as fair as possible in the comparisons, we have not used any weight-function in the L^p-norms of X. The constant τ has been chosen as the smallest constant such that Algorithm 1 terminates in all cases¹¹. We found the values τ = 1.3 for the non-Kaczmarz version and τ = 3 for the Kaczmarz version to be
the optimal values. The conductivity function γ^+ is the one shown in the intersection of the first row and the first column of Figure 5.12.

¹⁰The vector-noise Γ_γ^δ − Γ_γ ∈ R^{L²} is actually displayed in Figure 5.13 in matrix form, where the j-th column corresponds to the vector U^{j,δ} − U^j ∈ R^L generated from the current pattern I^j in (5.26).
¹¹The constant η in the TCC can be significantly different for the problems (5.34) and (5.35) if the number of experiments ℓ ∈ N is large. Since the constant τ depends on η, see (4.9), it can be different for the Kaczmarz and non-Kaczmarz versions.

Figure 5.14: Conductivity function γ^+ from Figure 5.13, reconstructed in different Banach spaces with 1% of impulsive noise: Y = (R^{L²}, ‖·‖_2) in the first column and Y = (R^{L²}, ‖·‖_{1.01}) in the second one. The first and second rows correspond to the spaces X = V_2 and X = V_{1.01} respectively. (Results: p = 2, r = 2: N = 11, e_N = 74.85%, k_all = 67; p = 2, r = 1.01: N = 69, e_N = 64.71%, k_all = 27815; p = 1.01, r = 2: N = 16, e_N = 75.50%, k_all = 34; p = 1.01, r = 1.01: N = 26, e_N = 62.72%, k_all = 4632.)

For the inner iteration, the dual gradient
DE method with C_1 = C_2 = 0.1 is used again. Further, all the tests have been made with the same noise level δ = 0.1% and with p = 1.1. Since a large constant k_max represents an extra advantage for the case where the Jacobian matrix is calculated in comparison with the case where this matrix is not calculated, this constant must be carefully chosen. We fixed it at the value 50, which we consider a reasonable choice. All the other parameters are the same as in the last experiment. The results of this experiment are collected in Tables 5.2 and 5.3. In the first table, the number of outer iterations N(δ) as well as the overall number of inner iterations k_all and the reconstruction error e_N(δ) are displayed for each electrode configuration.
    L                        8        16        32        64
    N(δ)      Regular       39       149       109       561
              Kaczmarz     558     28174     33111    202499
    k_all     Regular     3700     26294     14788     18643
              Kaczmarz    2003    103227     93392    535148
    e_N(δ) %  Regular     79.20     70.38     67.10     65.48
              Kaczmarz    79.08     69.16     65.78     64.32

Table 5.2: Comparison between the regular version of REGINN and its Kaczmarz version with different numbers L of electrodes.

    L                            8        16        32        64
    AI                         331     14954     18093     77693
    CE   Regular               616      4752      6944     71744
         Kaczmarz Jac         3206    267438    612087   5174851
         Kaczmarz             4564    234628    219895   1272795

Table 5.3: Computational effort CE (overall number of required solutions of variational problems for an entire run of K-REGINN). The row denoted by "Regular" represents the case d = 1 (non-Kaczmarz version) with the evaluation of the Jacobian matrix. The rows denoted by "Kaczmarz Jac" and "Kaczmarz" represent the Kaczmarz versions (d = L) when the Jacobian matrix is respectively evaluated and not evaluated.

As already discussed at the end of the last subsection, the regular version of REGINN requires

the solution of L variational problems in each outer iteration in order to calculate F(γ_n). Each inner iteration in turn requires the solution of L additional problems for the evaluation of the Jacobian matrix. Since the last outer iteration is the only iteration which does not perform an inner iteration, we conclude that the overall number of variational problems which need to be solved in an entire run of Algorithm 1 is (2N − 1)L. For the Kaczmarz version of REGINN, one solution of a variational problem is required in each outer iteration in order to calculate F_{[n]}(γ_n). If the Jacobian is calculated, L additional solutions are necessary in each active iteration¹², which results in a total of N + L·AI, where the number AI represents the overall number of active iterations. On the other hand, each vector calculated in the inner iteration of K-REGINN demands the solution of approximately two variational problems if the Jacobian matrix is not calculated, which results in a total of N + 2k_all required solutions.
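These three operation counts can be stated directly; the small sketch below merely restates them and reproduces, for instance, the values CE = 616 and CE = 3206 of the L = 8 column of Table 5.3 from the corresponding entries of Table 5.2.

    def cost_regular(N, L):
        # regular REGINN (d = 1) with Jacobian evaluation: (2N - 1) * L solves
        return (2 * N - 1) * L

    def cost_kaczmarz_with_jacobian(N, L, AI):
        # Kaczmarz version with Jacobian evaluation: N + L * AI solves
        return N + L * AI

    def cost_kaczmarz_without_jacobian(N, k_all):
        # Kaczmarz version without Jacobian: approximately N + 2 * k_all solves
        return N + 2 * k_all

    # cost_regular(39, 8) == 616 and cost_kaczmarz_with_jacobian(558, 8, 331) == 3206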
Turning again to the results, we see in Table 5.2 that a small improvement in the reconstruction error e_N(δ) is obtained if the Kaczmarz version of REGINN is considered. The price to be paid is however high, as Table 5.3 makes evident. This table compares the computational effort necessary to achieve the results. It shows the overall number of active iterations AI and the "computational effort" CE for each electrode configuration¹³. Further, the regular version of REGINN (where d = 1 and the Jacobian matrix is calculated) and the Kaczmarz versions (where d = L) with and without the calculation of the Jacobian matrix are compared. It is clear that the regular version of REGINN requires less work than its Kaczmarz versions in all situations. Comparing only the Kaczmarz versions, we see that the calculation of the Jacobian matrix demands less computational effort only for a small number of electrodes. If a relatively large number of electrodes is used, the computational effort necessary to calculate the Jacobian matrix becomes higher than the direct calculation of the solutions of (5.23) and (5.30), and the evaluation of this matrix is no longer worthwhile.

¹²An active iteration is an outer iteration where the discrepancy principle (3.9) is not satisfied and, consequently, the inner iteration needs to be performed.
¹³The number CE actually represents the overall number of required solutions of variational problems used by Algorithm 1 until termination.
Chapter 6

Conclusions and Final


Considerations

We consider the convergence analysis of Chapter 4 to be the main accomplishment of this work. In this chapter, a Kaczmarz version of the inexact-Newton method REGINN [43] is analyzed in Banach spaces and the proofs are carried out considering an inner iteration defined in a relatively general way, which fits various methods. The result is a convergence analysis of K-REGINN, valid at the same time for several different regularization methods in the inner iteration. This analysis is a generalization to Banach spaces and to Kaczmarz methods of ideas previously discussed in [36]. In order to properly develop this general convergence analysis, however, strong restrictions needed to be imposed. We think it is possible to weaken these hypotheses, especially those required on the solution space X. For instance, in the cases where the non-Kaczmarz version (d = 1 in (3.6)) is considered, the first inequality in (4.30) becomes unnecessary to prove convergence in the noiseless situation. But this inequality seems to be exactly the crucial point where the s-convexity of X is most needed. In the remaining cases, the s-convexity is apparently avoidable and could be handled with techniques similar to those employed in [47]. Assuming the condition d = 1, therefore, the uniform smoothness and s-convexity of X could be weakened to mere smoothness and uniform convexity. Of course, without the s-convexity, the verification of Assumption 3, page 58, for the methods investigated in Section 3.1 would be much more arduous.
In our opinion, the uniform smoothness of data space Y, required to show the stability
property of the dual gradient methods in Appendix A, could also be avoided using similar
arguments to those employed in [47]. However, in order to perform this modification,
the proof of the stability property should be made simultaneously with the proof of the
regularization property, which means that, at least in principle, Assumption 6, page 70,
could not be proven separately, and consequently the general convergence analysis could be
harmed.
The Tikhonov-Phillips method (3.58) has a peculiar iteration form, which is somewhat different from the other methods of Section 3.1. This characteristic complicates the convergence analysis of Chapter 4 substantially, forcing the recurrent separation into two cases: ẑ_{n,k} = z_{n,k} and ẑ_{n,k} = x_n, see Assumptions 3 and 4, page 62. But, looking on the bright side, no geometrical property of the space Y is required for this method in order to prove Assumption 6, see the stability proof in Appendix A. This suggests that the space L¹ could be used as data space in further numerical experiments, building up a more appropriate framework to deal with impulsive noise, for example. If Y = L¹, however, although the functional (3.60) remains strictly convex, it is no longer differentiable, and the process of finding its minimizer, necessary to perform the inner iteration, becomes much more challenging.
At the end of Subsection 3.1.2, the possibility of obtaining the optimal step-size of the dual gradient methods is discussed and a lower bound for this step-size is shown. Though an explicit formula for the optimal step-size is hard to achieve, sharper bounds could possibly be determined. An upper bound like λ_opt ≲ λ_DE, for instance, would be especially interesting and could facilitate the verification of Assumption 3, including the gradient method associated with the optimal step-size in our convergence analysis of Chapter 4.
A similar situation is observed with the dual gradient Steepest Descent method. Finding an explicit formula for λ_SD in (3.44) involves the differentiation of the duality mapping, which makes the determination of this step-size difficult. To overcome this technical obstacle, we have replaced this method with a similar version, the Modified Steepest Descent method (3.45), which already has an explicit step-size satisfying the desired inequality λ_MSD ≤ λ_DE, necessary to include it in our convergence analysis. Although an explicit formula for λ_SD is not available, it is not unconditionally necessary in order to prove Assumption 3, since only the inequality λ_SD ≤ λ_DE needs to be verified. Once proven, Assumption 3 would imply the convergence of the Steepest Descent method. The implicitly defined step-size λ_SD could be numerically approximated with the help of an optimization algorithm on the real line.
In addition to several classical regularization techniques, which have been adapted from Hilbert to more general Banach spaces in this work, a few novel approaches are presented. Among the methods which have been first introduced in this thesis, we highlight the mixed gradient-Tikhonov methods (Subsection 3.1.4) and the dual gradient Decreasing Error method ((3.29) and (3.39)). Both algorithms have proven to be useful methods to reconstruct stable solutions of ill-posed problems, as can be seen in Figures 5.12 and 5.14 for example. The additional (regularization) term of the mixed gradient-Tikhonov methods confers further stability to regular gradient methods and results in a more stable inner iteration. The Decreasing Error method, in turn, has the advantage of playing the role of the gradient method with the fastest decreasing error when a linear problem in a Hilbert space is considered. This behavior seems to be somehow transmitted to nonlinear problems in Banach spaces, as Figure 5.3 shows.
The nonlinearity of the duality mapping is possibly the biggest complication in the convergence analysis of K-REGINN in general Banach spaces. The extra effort to demonstrate the theorems in these more complicated spaces is, however, counterbalanced by the improvements provided by the use of more convenient norms. The use of Banach spaces can therefore pay off in achieving better reconstructions of inverse problems in some specific situations, especially if sparsity constraints on the searched-for solution or on the data-noise are present. This enhancement of quality becomes evident in the numerical experiments performed in Chapter 5; however, many questions concerning the mathematical modelling of the EIT problem with L^p-norms remain open. In its original (infinite dimensional) version, differentiability or even continuity of the forward operator with respect to the Lebesgue spaces L^p is in principle not valid for any p ∈ (1, ∞), and legitimizing the use of these spaces does not seem to be a trivial issue. For a proper adaptation, further research is therefore needed.
A superficial examination of the computational implementations of Subsection 5.2.2, and particularly of Tables 5.2 and 5.3, could erroneously lead to the conclusion that the Kaczmarz version of REGINN is not competitive. These results, however, are in large part due to the manipulation described in (5.38), which considerably reduces the computational effort necessary to calculate the Jacobian matrix, especially for the non-Kaczmarz version of REGINN. Without this strategy, the regular REGINN would be much more "expensive" and its Kaczmarz version could become more advantageous. Thus, to obtain a more accurate answer about the real utility of the Kaczmarz version of REGINN, supplementary numerical experiments, preferably with a different problem, should be performed.


To finish this work, we want to compare all the algorithms introduced in Section 3.1. The primal gradient methods from Subsection 3.1.1 are the easiest to implement and are by far the methods with the fewest prerequisites on the space X. But they require strong hypotheses on Y, and only termination of (regular) REGINN can be shown if such a method is used in the inner iteration. The dual gradient methods from Subsection 3.1.2, on the other hand, can be employed as inner iteration of REGINN to show convergence in the noiseless case and the regularization property when only noisy data is available. These results hold true even for the Kaczmarz version of REGINN if the Landweber method is considered. However, the iteration (3.29) needs to be performed using the current (outer) iterate x_n, which makes this version of the gradient methods a little more complicated. Additionally, the requirements on the spaces X and Y constitute a considerable disadvantage. Among all the methods introduced in Section 3.1, these are the ones which demand the most restrictive Banach spaces to work: uniform smoothness of X and Y and p-convexity of X are required. Tikhonov methods have the convenient property of working without any restrictions on the space Y, as shown in Subsection 3.1.3 and Appendix A. Moreover, convergence and the regularization property of K-REGINN can be proven whenever either the Iterated-Tikhonov or the Tikhonov-Phillips method is the chosen inner iteration. However, the obligation of solving an optimization problem in order to generate a new vector in the inner iteration represents a big obstacle: the poorer the convexity/smoothness properties of the space Y, the more difficult this optimization problem becomes. To combine the advantages of dual gradient and Tikhonov methods, we have introduced in Subsection 3.1.4 the mixed gradient-Tikhonov methods. Though this kind of algorithm requires the same (strong) assumptions on the spaces X and Y as the dual gradient methods, it confers extra stability to the inner iteration by incorporating a regularization parameter α_n in its iteration, see (3.76). Further, it does not require the solution of any optimization problem, although, similarly to Tikhonov methods, a suitable regularization parameter α_n needs to be determined, which in the particular case of mixed methods depends on a priori information about the norm of the solution, see (3.71).
Appendix A

Stability Property of the


Regularizing Sequences

This appendix provides, for different methods, the proof of the stability property given in Assumption 6, page 70. For ease of presentation, we assume all the hypotheses of Theorem 47 except for Assumption 6 itself. For some methods, other hypotheses are necessary to complete the proof and these additional hypotheses will be required at the specific points where they are needed.
Let (δ_i)_{i∈N} ⊂ R be a zero-sequence and suppose that x_n^{δ_i} → ξ ∈ X_n as i → ∞. We proceed by giving a proof by induction: for k = 0, z_{n,0}^{δ_i} = x_n^{δ_i} → ξ = σ_{n,0}(ξ) as i → ∞. Assume now that lim_{i→∞} z_{n,k}^{δ_i} = σ_{n,k}(ξ) for k < k_REG(ξ). Our task is now to prove that z_{n,k+1}^{δ_i} → σ_{n,k+1} as i → ∞.

• Dual gradient methods DE, LW and MSD

For these methods we have ẑ_{n,k}^{δ_i} = z_{n,k}^{δ_i} and v_{n,k}^{δ_i} = A_n^{δ_i} s_{n,k}^{δ_i} − b_n^{δ_i}. Accordingly, σ̂_{n,k} = σ_{n,k}.
This proof is a slightly different version of that given in [40, Lemma 10]. Assume that the Banach spaces Y_j, j = 0, ..., d − 1, are uniformly smooth. As the functions F_j and F_j' are continuous, using the induction hypothesis and (3.7), it is clear that the vector

    v_{n,k}^{δ_i} = A_n^{δ_i} s_{n,k}^{δ_i} − b_n^{δ_i} = F_{[n]}'(x_n^{δ_i})(z_{n,k}^{δ_i} − x_n^{δ_i}) − (y_{[n]}^{δ_i} − F_{[n]}(x_n^{δ_i}))

converges to

    v_{n,k}^{ξ} = F_{[n]}'(ξ)(σ_{n,k}(ξ) − ξ) − (y_{[n]} − F_{[n]}(ξ))

as i → ∞. Similarly, λ_{n,k}^{δ_i} → λ_{n,k}^{ξ} as i → ∞ (see (3.39), (3.43) and (3.45)). As the spaces Y_j are uniformly smooth, the selection j_r : Y_j → Y_j^* is unique and continuous and, since the mappings J_p, F_j and F_j' are continuous too,

    J_p(z_{n,k+1}^{δ_i}) = J_p(z_{n,k}^{δ_i}) − λ_{n,k}^{δ_i} F_{[n]}'(x_n^{δ_i})^* j_r(A_n^{δ_i} s_{n,k}^{δ_i} − b_n^{δ_i})          (A.1)

converges to

    J_p(σ_{n,k+1}) = J_p(σ_{n,k}) − λ_{n,k}^{ξ} F_{[n]}'(ξ)^* j_r(v_{n,k}^{ξ}),          (A.2)

as i → ∞. This means that lim_{i→∞} z_{n,k+1}^{δ_i} = σ_{n,k+1} because the duality mapping J_p^{−1} = J_{p*}^* is also continuous.


• Bregman variation of the Iterated-Tikhonov method (3.62)

Here ẑ_{n,k}^{δ_i} = z_{n,k}^{δ_i} and v_{n,k}^{δ_i} = A_n^{δ_i} s_{n,k+1}^{δ_i} − b_n^{δ_i}.
This proof was first presented in [39] in its current version. It is an adaptation of [26, Lemma 3.4], which in turn uses ideas from [16]. The stability proof in this case is more complicated because we do not assume any condition on the Banach spaces Y_j.
Note that the vectors z_{n,k+1}^{δ_i} and σ_{n,k+1} respectively minimize the functionals

    T_{n,k}^{δ_i}(z) := (1/r) ‖F_{[n]}'(x_n^{δ_i})(z − x_n^{δ_i}) − b_n^{δ_i}‖^r + α_n Δ_p(z, z_{n,k}^{δ_i})

and

    W_{n,k}(z) := (1/r) ‖F_{[n]}'(ξ)(z − ξ) − b̃_n‖^r + α_n Δ_p(z, σ_{n,k}),

where b̃_n := y_{[n]} − F_{[n]}(ξ). As the family (z_{n,k+1}^{δ_i})_{δ_i>0} is uniformly bounded (see (4.16)) and X is reflexive, there exists, by picking a subsequence if necessary, some z ∈ X such that z_{n,k+1}^{δ_i} ⇀ z as i → ∞. We first prove that z = σ_{n,k+1} and later that z_{n,k+1}^{δ_i} → z as i → ∞. For all g ∈ Y_{[n]}^*,

    ⟨g, F_{[n]}'(x_n^{δ_i}) s_{n,k+1}^{δ_i}⟩ = ⟨g, (F_{[n]}'(x_n^{δ_i}) − F_{[n]}'(ξ)) s_{n,k+1}^{δ_i}⟩ + ⟨g, F_{[n]}'(ξ) s_{n,k+1}^{δ_i}⟩.

But as s_{n,k+1}^{δ_i} = z_{n,k+1}^{δ_i} − x_n^{δ_i} ⇀ z − ξ =: s as i → ∞ and F_{[n]}'(ξ)^* g ∈ X^*,

    ⟨g, F_{[n]}'(ξ) s_{n,k+1}^{δ_i}⟩ = ⟨F_{[n]}'(ξ)^* g, s_{n,k+1}^{δ_i}⟩ → ⟨F_{[n]}'(ξ)^* g, s⟩ = ⟨g, F_{[n]}'(ξ) s⟩.

Now, as F_{[n]}' is continuous and x_n^{δ_i} → ξ,

    |⟨g, (F_{[n]}'(x_n^{δ_i}) − F_{[n]}'(ξ)) s_{n,k+1}^{δ_i}⟩| ≤ ‖g‖_{Y_{[n]}^*} ‖F_{[n]}'(x_n^{δ_i}) − F_{[n]}'(ξ)‖_{L(X,Y_{[n]})} ‖s_{n,k+1}^{δ_i}‖_X → 0

as i → ∞, because ‖s_{n,k+1}^{δ_i}‖ ≤ ‖z_{n,k+1}^{δ_i}‖ + ‖x_n^{δ_i}‖ is uniformly bounded (see (4.11) and (4.16)). Then,

    ⟨g, F_{[n]}'(x_n^{δ_i}) s_{n,k+1}^{δ_i}⟩ → ⟨g, F_{[n]}'(ξ) s⟩          (A.3)

and, as g ∈ Y_{[n]}^* is arbitrary,

    F_{[n]}'(x_n^{δ_i}) s_{n,k+1}^{δ_i} ⇀ F_{[n]}'(ξ) s.

From (3.7) we conclude that

    b_n^{δ_i} − F_{[n]}'(x_n^{δ_i}) s_{n,k+1}^{δ_i} ⇀ b̃_n − F_{[n]}'(ξ) s,

and then

    ‖b̃_n − F_{[n]}'(ξ) s‖ ≤ lim inf ‖b_n^{δ_i} − F_{[n]}'(x_n^{δ_i}) s_{n,k+1}^{δ_i}‖.          (A.4)

Now, as J_p is continuous, we have, similarly to (A.3),

    ⟨J_p(z_{n,k}^{δ_i}), z_{n,k+1}^{δ_i}⟩ = ⟨J_p(z_{n,k}^{δ_i}) − J_p(σ_{n,k}), z_{n,k+1}^{δ_i}⟩ + ⟨J_p(σ_{n,k}), z_{n,k+1}^{δ_i}⟩ → ⟨J_p(σ_{n,k}), z⟩,

which in turn implies

    Δ_p(z, σ_{n,k}) = (1/p)‖z‖^p + (1/p*)‖σ_{n,k}‖^p − ⟨J_p(σ_{n,k}), z⟩          (A.5)
                    ≤ lim inf ( (1/p)‖z_{n,k+1}^{δ_i}‖^p + (1/p*)‖z_{n,k}^{δ_i}‖^p − ⟨J_p(z_{n,k}^{δ_i}), z_{n,k+1}^{δ_i}⟩ )
                    = lim inf Δ_p(z_{n,k+1}^{δ_i}, z_{n,k}^{δ_i}).

From x_n^{δ_i} → ξ, (A.4), (A.5) and the minimality property of z_{n,k+1}^{δ_i},

    W_{n,k}(z) ≤ lim inf T_{n,k}^{δ_i}(z_{n,k+1}^{δ_i}) ≤ lim inf T_{n,k}^{δ_i}(σ_{n,k+1}) = lim_{i→∞} T_{n,k}^{δ_i}(σ_{n,k+1}) = W_{n,k}(σ_{n,k+1}).

Using minimality and uniqueness of σ_{n,k+1}, we conclude that σ_{n,k+1} = z and then z_{n,k+1}^{δ_i} ⇀ σ_{n,k+1}. Accordingly, s_{n,k+1}^{δ_i} ⇀ σ_{n,k+1} − ξ, which implies that s = σ_{n,k+1} − ξ.
We prove now that

    Δ_p(z_{n,k+1}^{δ_i}, z_{n,k}^{δ_i}) → Δ_p(σ_{n,k+1}, σ_{n,k}) as i → ∞.          (A.6)

Define

    a_i := Δ_p(z_{n,k+1}^{δ_i}, z_{n,k}^{δ_i}),   a := lim sup a_i,   c := Δ_p(σ_{n,k+1}, σ_{n,k}),
    r̃_i := (1/r) ‖b_n^{δ_i} − F_{[n]}'(x_n^{δ_i}) s_{n,k+1}^{δ_i}‖^r,   and   r̃ := lim inf r̃_i.

In view of (A.5), it is enough to prove that a ≤ c. Suppose that a > c. From the definition of lim sup there exists, for all M ∈ N, some index i > M such that

    a_i > a − (a − c)/4.          (A.7)

From the definition of lim inf, there exists N_1 ∈ N such that

    r̃_i ≥ r̃ − α_n(a − c)/4,          (A.8)

for all i ≥ N_1. As above, lim_{i→∞} T_{n,k}^{δ_i}(σ_{n,k+1}) = W_{n,k}(σ_{n,k+1}) and then there is an N_2 ∈ N such that

    T_{n,k}^{δ_i}(σ_{n,k+1}) < W_{n,k}(σ_{n,k+1}) + α_n(a − c)/2          (A.9)

for all i ≥ N_2. Using (A.4) and setting M = N_1 ∨ N_2, there exists some index i > M such that

    W_{n,k}(σ_{n,k+1}) ≤ r̃ + α_n c = r̃ + α_n a − α_n(a − c)
                       ≤ r̃_i + α_n(a − c)/4 + α_n a_i + α_n(a − c)/4 − α_n(a − c)
                       = r̃_i + α_n a_i − α_n(a − c)/2 = T_{n,k}^{δ_i}(z_{n,k+1}^{δ_i}) − α_n(a − c)/2
                       ≤ T_{n,k}^{δ_i}(σ_{n,k+1}) − α_n(a − c)/2,

where the second inequality comes from (A.8) and (A.7) and the last one from the minimality of z_{n,k+1}^{δ_i}. From (A.9) we obtain the contradiction W_{n,k}(σ_{n,k+1}) < W_{n,k}(σ_{n,k+1}). Thus, a ≤ c and (A.6) holds. From the definition of the Bregman distance Δ_p we obtain ‖z_{n,k+1}^{δ_i}‖ → ‖σ_{n,k+1}‖. As z_{n,k+1}^{δ_i} ⇀ σ_{n,k+1}, we conclude that z_{n,k+1}^{δ_i} → σ_{n,k+1} as i → ∞, because X is uniformly convex. So far, we have shown that each positive zero-sequence (δ_i)_{i∈N} contains a subsequence (δ_{i_j})_{j∈N} such that z_{n,k+1}^{δ_{i_j}} → σ_{n,k+1} as j → ∞, which is enough to prove the statement.

• Bregman variation of the Tikhonov-Phillips method (3.58)

This method uses ẑ_{n,k}^{δ_i} = x_n^{δ_i} and v_{n,k}^{δ_i} = A_n^{δ_i} s_{n,k+1}^{δ_i} − b_n^{δ_i}.
We omit this proof because it is very similar to the one given above for the IT method.

• Mixed gradient-Tikhonov methods presented in Subsection 3.1.4

For these methods, ẑ_{n,k}^{δ_i} = z_{n,k}^{δ_i}, v_{n,k}^{δ_i} = A_n^{δ_i} s_{n,k}^{δ_i} − b_n^{δ_i}, γ_n^{δ_i} = α_n^{δ_i} and K_2^{δ_i} = K_0/K_3^{δ_i}, where K_0 is defined in Assumption 3, page 58, and K_3^{δ_i} is an upper bound for the sequence ((1/p)‖e_n^{δ_i} − x_n^{δ_i}‖^p)_{n∈N} (see Assumption 4, page 62, and Subsection 3.1.4). In addition to the hypotheses of Theorem 47, we need to assume in this case that Y_j, j = 0, ..., d − 1, are uniformly smooth Banach spaces, which guarantees that the duality mapping j_r is unique and continuous.
The proof is very similar to that given above for the dual gradient methods, with the difference that here K_2 ≠ 0 (see Assumption 4). Since x_n^{δ_i} → ξ and x_n^{δ_i} → x_n^{ξ} as i → ∞, the bound K_3^{δ_i} converges to a bound K_3^{ξ} of the sequence ((1/p)‖e_n^{ξ} − x_n^{ξ}‖^p)_{n∈N} and, accordingly, K_2^{δ_i}‖b_n^{δ_i}‖^r → K_2^{ξ}‖b̃_n‖^r as i → ∞. It follows that 0 ≤ lim_{i→∞} γ_n^{δ_i} ≤ K_2^{ξ}‖b̃_n‖^r, which implies, similarly to (A.1) and (A.2), that

    J_p(z_{n,k+1}^{δ_i}) = J_p(z_{n,k}^{δ_i}) − λ_{n,k}^{δ_i} [ F_{[n]}'(x_n^{δ_i})^* j_r(A_n^{δ_i} s_{n,k}^{δ_i} − b_n^{δ_i}) + γ_n^{δ_i} J_p(s_{n,k}^{δ_i} − x_n^{δ_i}) ]

converges to

    J_p(σ_{n,k+1}) = J_p(σ_{n,k}) − λ_{n,k}^{ξ} [ F_{[n]}'(ξ)^* j_r(v_{n,k}^{ξ}) + γ_n^{ξ} J_p(σ_{n,k} − ξ − x_n^{ξ}) ],

as i → ∞. Since J_p^{−1} = J_{p*}^* is a continuous function, the vector z_{n,k+1}^{δ_i} converges to σ_{n,k+1} as i → ∞ and the proof is complete.
Bibliography

[1] Giovanni Alessandrini. Stable determination of conductivity by boundary measure-


ments. Appl. Anal., 27(1-3):153–172, 1988.

[2] Kari Astala and Lassi Päivärinta. Calderón’s inverse conductivity problem in the plane.
Ann. of Math. (2), 163(1):265–299, 2006.

[3] A. B. Bakushinskiı̆. On a convergence problem of the iterative-regularized Gauss-


Newton method. Zh. Vychisl. Mat. i Mat. Fiz., 32(9):1503–1509, 1992.

[4] Johann Baumeister, Barbara Kaltenbacher, and Antonio Leitão. On Levenberg-


Marquardt-Kaczmarz iterative methods for solving systems of nonlinear ill-posed equa-
tions. Inverse Probl. Imaging, 4(3):335–350, 2010.

[5] Liliana Borcea. Electrical impedance tomography. Inverse Problems, 18(6):R99–R136,


2002.

[6] Kristian Bredies and Dirk Lorenz. Mathematische Bildverarbeitung. Einführung in


Grundlagen und moderne Theorie. Vieweg+Teubner, Berlin, 2011.

[7] Martin Burger and Barbara Kaltenbacher. Regularizing Newton-Kaczmarz methods


for nonlinear ill-posed problems. SIAM J. Numer. Anal., 44(1):153–182 (electronic),
2006.

[8] Alberto-P. Calderón. On an inverse boundary value problem. In Seminar on Numerical


Analysis and its Applications to Continuum Physics (Rio de Janeiro, 1980), pages 65–
73. Soc. Brasil. Mat., Rio de Janeiro, 1980.

[9] Charles Chidume. Geometric properties of Banach spaces and nonlinear iterations,
volume 1965 of Lecture Notes in Mathematics. Springer-Verlag London, Ltd., London,
2009.

[10] Ioana Cioranescu. Geometry of Banach spaces, duality mappings and nonlinear prob-
lems, volume 62 of Mathematics and its Applications. Kluwer Academic Publishers
Group, Dordrecht, 1990.

[11] Christian Clason and Bangti Jin. A semismooth Newton method for nonlinear param-
eter identification problems with impulsive noise. SIAM J. Imaging Sci., 5(2):505–538,
2012.

[12] Ingrid Daubechies, Michel Defrise, and Christine De Mol. An iterative thresholding
algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl.
Math., 57(11):1413–1457, 2004.

[13] Ron S. Dembo, Stanley C. Eisenstat, and Trond Steihaug. Inexact Newton methods.
SIAM J. Numer. Anal., 19(2):400–408, 1982.


[14] Françoise Demengel and Gilbert Demengel. Functional spaces for the theory of elliptic
partial differential equations. Universitext. Springer, London; EDP Sciences, Les Ulis,
2012. Translated from the 2007 French original by Reinie Erné.

[15] Heinz W. Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse prob-
lems, volume 375 of Mathematics and its Applications. Kluwer Academic Publishers
Group, Dordrecht, 1996.

[16] Heinz W. Engl, Karl Kunisch, and Andreas Neubauer. Convergence rates for Tikhonov
regularisation of nonlinear ill-posed problems. Inverse Problems, 5(4):523–540, 1989.

[17] V. M. Fridman. On the convergence of methods of steepest descent type. Uspehi Mat.
Nauk, 17(3 (105)):201–204, 1962.

[18] P. Grisvard. Elliptic problems in nonsmooth domains, volume 24 of Monographs and


Studies in Mathematics. Pitman (Advanced Publishing Program), Boston, MA, 1985.

[19] Jacques Hadamard. Le problème de Cauchy et les équations aux dérivées partielles
linéaires hyperboliques. Hermann, Paris, 1932.

[20] Markus Haltmeier, Antonio Leitão, and Otmar Scherzer. Kaczmarz methods for regu-
larizing nonlinear ill-posed equations. I. Convergence analysis. Inverse Probl. Imaging,
1(2):289–298, 2007.

[21] Martin Hanke. A regularizing Levenberg-Marquardt scheme, with applications to in-


verse groundwater filtration problems. Inverse Problems, 13(1):79–95, 1997.

[22] Martin Hanke. Regularizing properties of a truncated Newton-CG algorithm for non-
linear inverse problems. Numer. Funct. Anal. Optim., 18(9-10):971–993, 1997.

[23] Bangti Jin, Taufiquar Khan, and Peter Maass. A reconstruction algorithm for electrical
impedance tomography based on sparsity regularization. Internat. J. Numer. Methods
Engrg., 89(3):337–353, 2012.

[24] Qinian Jin. Inexact Newton-Landweber iteration for solving nonlinear inverse problems
in Banach spaces. Inverse Problems, 28(6):065002, 15, 2012.

[25] Qinian Jin. On the order optimality of the regularization via inexact Newton iterations.
Numer. Math., 121(2):237–260, 2012.

[26] Qinian Jin and Linda Stals. Nonstationary iterated Tikhonov regularization for ill-
posed problems in Banach spaces. Inverse Problems, 28(10):104011, 15, 2012.

[27] S. Kaczmarz. Approximate solution of systems of linear equations. Internat. J. Control,


57(6):1269–1271, 1993. Translated from the German.

[28] Jari P. Kaipio, Ville Kolehmainen, Erkki Somersalo, and Marko Vauhkonen. Statistical
inversion and Monte Carlo sampling methods in electrical impedance tomography.
Inverse Problems, 16(5):1487–1522, 2000.

[29] Barbara Kaltenbacher, Andreas Neubauer, and Otmar Scherzer. Iterative regulariza-
tion methods for nonlinear ill-posed problems, volume 6 of Radon Series on Computa-
tional and Applied Mathematics. Walter de Gruyter GmbH & Co. KG, Berlin, 2008.

[30] Andreas Kirsch. An introduction to the mathematical theory of inverse problems, vol-
ume 120 of Applied Mathematical Sciences. Springer, New York, second edition, 2011.

[31] R. Kowar and O. Scherzer. Convergence analysis of a Landweber-Kaczmarz method for


solving nonlinear ill-posed problems. In Ill-posed and inverse problems, pages 253–270.
VSP, Zeist, 2002.

[32] Erwin Kreyszig. Introductory functional analysis with applications. Wiley Classics
Library. John Wiley & Sons, Inc., New York, 1989.

[33] L. Landweber. An iteration formula for Fredholm integral equations of the first kind.
Amer. J. Math., 73:615–624, 1951.

[34] Armin Lechleiter and Andreas Rieder. Newton regularizations for impedance tomog-
raphy: a numerical study. Inverse Problems, 22(6):1967–1987, 2006.

[35] Armin Lechleiter and Andreas Rieder. Newton regularizations for impedance tomog-
raphy: convergence by local injectivity. Inverse Problems, 24(6):065009, 18, 2008.

[36] Armin Lechleiter and Andreas Rieder. Towards a general convergence theory for inex-
act Newton regularizations. Numer. Math., 114(3):521–548, 2010.

[37] A. Leitão and M. Marques Alves. On Landweber-Kaczmarz methods for regularizing


systems of ill-posed equations in Banach spaces. Inverse Problems, 28(10):104008, 15,
2012.

[38] Joram Lindenstrauss and Lior Tzafriri. Classical Banach spaces. II, volume 97 of
Ergebnisse der Mathematik und ihrer Grenzgebiete [Results in Mathematics and Related
Areas]. Springer-Verlag, Berlin-New York, 1979. Function spaces.

[39] Fábio Margotti and Andreas Rieder. An inexact Newton regularization in Banach
spaces based on the nonstationary iterated Tikhonov method. Journal of inverse and
Ill-posed Problems, Ahead of print, 2014.

[40] Fábio Margotti, Andreas Rieder, and Antonio Leitão. A Kaczmarz version of the
REGINN-Landweber iteration for ill-posed problems in Banach spaces. SIAM J. Numer.
Anal., 52(3):1439–1465, 2014.

[41] A. Neubauer and O. Scherzer. A convergence rate result for a steepest descent method
and a minimal error method for the solution of nonlinear ill-posed problems. Z. Anal.
Anwendungen, 14(2):369–377, 1995.

[42] Nick Polydorides and William R. B. Lionheart. A Matlab toolkit for three-dimensional
electrical impedance tomography: a contribution to the Electrical Impedance and Dif-
fuse Optical Reconstruction Software project. Measurement Science and Technology,
13(12):1871–1873, 2002.

[43] Andreas Rieder. On the regularization of nonlinear ill-posed problems via inexact
Newton iterations. Inverse Problems, 15(1):309–327, 1999.

[44] Andreas Rieder. On convergence rates of inexact Newton regularizations. Numer.


Math., 88(2):347–365, 2001.

[45] Andreas Rieder. Keine Probleme mit inversen Problemen. Friedr. Vieweg & Sohn,
Braunschweig, 2003. Eine Einführung in ihre stabile Lösung. [An introduction to their
stable solution].

[46] Andreas Rieder. Inexact Newton regularization using conjugate gradients as inner
iteration. SIAM J. Numer. Anal., 43(2):604–622 (electronic), 2005.

[47] F. Schöpfer, A. K. Louis, and T. Schuster. Nonlinear iterative methods for linear
ill-posed problems in Banach spaces. Inverse Problems, 22(1):311–329, 2006.

[48] Thomas Schuster, Barbara Kaltenbacher, Bernd Hofmann, and Kamil S. Kazimierski.
Regularization methods in Banach spaces, volume 10 of Radon Series on Computational
and Applied Mathematics. Walter de Gruyter GmbH & Co. KG, Berlin, 2012.

[49] R. E. Showalter. Monotone operators in Banach space and nonlinear partial differential
equations, volume 49 of Mathematical Surveys and Monographs. American Mathemat-
ical Society, Providence, RI, 1997.

[50] Erkki Somersalo, Margaret Cheney, and David Isaacson. Existence and uniqueness
for electrode models for electric current computed tomography. SIAM J. Appl. Math.,
52(4):1023–1040, 1992.

[51] A. N. Tikhonov. On the solution of incorrectly put problems and the regularisation
method. In Outlines Joint Sympos. Partial Differential Equations (Novosibirsk, 1963),
pages 261–265. Acad. Sci. USSR Siberian Branch, Moscow, 1963.

[52] Robert Winkler and Andreas Rieder. Model-aware newton-type inversion scheme for
electrical impedance tomography. Inverse Problems, 31(4):045009, 2015.

[53] Zong Ben Xu and G. F. Roach. Characteristic inequalities of uniformly convex and
uniformly smooth Banach spaces. J. Math. Anal. Appl., 157(1):189–210, 1991.
