A Bayesian approach with Gaussian priors to the inverse problem of source identification in elliptic PDEs

Matteo Giordano

ESOMAS Department, University of Turin,
Corso Unione Sovietica 218 bis, Turin, Italy

Abstract

We consider the statistical linear inverse problem of making inference on an unknown source function in an elliptic partial differential equation from noisy observations of its solution. We employ nonparametric Bayesian procedures based on Gaussian priors, leading to convenient conjugate formulae for posterior inference. We review recent results providing theoretical guarantees on the quality of the resulting posterior-based estimation and uncertainty quantification, and we discuss the application of the theory to the important classes of Gaussian series priors defined on the Dirichlet-Laplacian eigenbasis and Matérn process priors. We provide an implementation of posterior inference for both classes of priors, and investigate its performance in a numerical simulation study. The reproducible code is available at: https://github.com/MattGiord/Bayesian-Source-Identification

Keywords. Parameter identification; semiparametric inference; uncertainty quantification; frequentist analysis of posterior distributions; simulation study.

1 Introduction

Linear inverse problems consist in the task of recovering unknown objects or physical quantities from linear indirect noisy measurements. A widespread mathematical formulation for these problems postulates that the recovery target be an element $f$ of a Hilbert space $\mathbb{H}_{1}$ , and that the data arise according to the equation

Y^{\varepsilon}=G(f)+\varepsilon W,

(1)

where $G:\mathbb{H}_{1}\to\mathbb{H}_{2}$ is a linear operator between $\mathbb{H}_{1}$ and another Hilbert space $\mathbb{H}_{2}$ , $W$ is additive observational noise and $\varepsilon>0$ is the noise level. In view of the central limit theorem, normality of the measurement errors can often be maintained, whereby $W$ is assumed to be a white noise process indexed by $\mathbb{H}_{2}$ . The goal is then to estimate $f$ from an observed realisation of $Y^{\varepsilon}$ .

Observation schemes as in equation (1) are found in a variety of scientific fields and engineering applications, including medical imaging [9], geophysics [40], acoustics [12] and finance [7]. For example, Computerised Tomography (CT), a technique to obtain detailed images of the human body and information about the density variation of the tissues, is based on a mathematical model (related to the ‘Radon transform’) for the absorption of X-rays. Similar concepts underpin many other medical imaging techniques, such as Magnetic Resonance (MR) and Positron Emission Tomography (PET); see [9] for further details and references.

In many such applications, the unknown $f$ may be characterised as a functional coefficient governing a partial differential equation (PDE), while the observed object $G(f)$ is the corresponding PDE solution. The ‘forward operator’ is then the coefficient-to-solution map $G:f\mapsto G(f)$ . See the monograph [24] for an extensive overview on PDE-based inverse problems.

In the present article, we shall mostly focus on the representative example of ‘source identification’ in elliptic PDEs. Many of the ideas developed hereafter have a natural application to other linear inverse problems. Let $\mathcal{O}\subset\mathbb{R}^{d}$ be an open and bounded set with smooth boundary $\partial\mathcal{O}$ . Let the unknown (square-integrable) function $f\in\mathbb{H}_{1}\equiv L^{2}(\mathcal{O})$ be the ‘source term’ in the elliptic PDE with zero Dirichlet boundary conditions,

\begin{split}\nabla\cdot(c\nabla u)-f&=0,\ \ \text{on}\ \ \mathcal{O},\\ u&=0,\ \ \text{on}\ \ \partial\mathcal{O},\end{split}

(2)

where $\nabla\cdot$ and $\nabla$ denote, respectively, the divergence and gradient operators, and where the smooth and positive ‘diffusion coefficient’ $c\in C^{\infty}(\overline{\mathcal{O}})$ , $\inf_{x\in\mathcal{O}}c(x)>0$ , is assumed to be known. By standard elliptic theory, e.g. [33, Chapter 2] and [14, Chapter 6], for any $f\in L^{2}(\mathcal{O})$ there exists a unique (weak) solution $G(f)\equiv u$ in the Sobolev space $H^{1}(\mathcal{O})\subset L^{2}(\mathcal{O})\equiv\mathbb{H}_{2}$ , giving rise to a linear (injective, self-adjoint and compact) operator $G:L^{2}(\mathcal{O})\to L^{2}(\mathcal{O})$ ; see Appendix B in [18] for further details. We then consider the problem of estimating the source $f$ from observations $Y^{\varepsilon}$ arising as in (1), with $G$ the solution map associated to the PDE (2) and $W$ a Gaussian white noise process indexed by $L^{2}(\mathcal{O})$ . An illustration of the problem with synthetic data is provided in Figure 1 below. Among the numerous application areas, inverse problems based on elliptic PDEs are important building blocks in oil reservoir modelling [47]. Source identification problems have been extensively investigated in the applied mathematics and statistics communities; see [2, 13, 21, 18] and the many references therein.

Figure 1: Left: a source function $f$ on a rotated elliptically-shaped domain, modelling three heat sources centred at the points $(-0.5,0)$ , $(0,0)$ and $(0.5,0)$ . Right: noisy observations of the associated PDE solution $G(f)$ (with fixed diffusivity $c$ ).

Here, we shall pursue the popular nonparametric Bayesian approach to inverse problems [41]. We shall assign to the unknown source $f$ a prior distribution $\Pi(\cdot)$ supported on the function space $L^{2}(\mathcal{O})$ and then, following the Bayesian paradigm, combine it with the (Gaussian) data likelihood induced by the statistical model (1) to form the posterior distribution $\Pi(\cdot|Y^{\varepsilon})$ of $f|Y^{\varepsilon}$ , representing our updated belief about the inferential target and providing us with point estimates and uncertainty quantification. In particular, Gaussian priors represent a natural choice for inference in the context of the observation scheme (1) due to their conjugacy, which leads to convenient explicit formulae for the posterior distribution.

The Bayesian approach to inverse problems dates back at least to the 1980s, with early seminal work laid out in [43, 31, 32] among others, and has since gained enormous popularity across several applied fields. See the monographs [25, 42], as well as the more recent reviews [8, 6], where further references can be found. Over the last decade, a large number of articles have investigated the theoretical recovery performance of Gaussian priors in linear inverse problems. We refer in particular to [29, 30, 4, 27, 20], and also mention [1, 36, 19, 35, 17] for results in nonlinear problems. Recently, Giordano and Kekkonen [18], building on earlier findings by [34], identified general conditions on the forward operator, on the ground truth and on the prior distribution under which ‘semiparametric’ Bernstein-von Mises theorems can be obtained, characterising the asymptotic shape of the posterior distribution of a large collection of linear functionals of the unknown. For the problem of estimating the source $f$ in the elliptic PDE (2) from observations $Y^{\varepsilon}$ arising as in (1), they showed that a wide class of ‘standard’ centred Gaussian process priors (such as the one associated to the commonly used Matérn kernel) yield, in the small noise limit $\varepsilon\to 0$ and under the frequentist assumption that the data have been generated by a fixed ground truth $f_{0}$ , valid and optimal estimation and uncertainty quantification, via the posterior mean estimate and credible intervals centred around it, for one-dimensional aspects of the unknown source function.

For this study, we shall focus on the implementation of the Bayesian procedures with Gaussian priors for the inverse problem of source identification that were investigated in [18], corroborating the theory developed therein with a numerical simulation study. To this end, we will first review, in Section 2.2, the asymptotic results derived in [18]. We will then provide, in Section 2.3, two examples of Gaussian priors satisfying the assumptions of the general theory; in particular, we will consider centred Gaussian series priors defined on the eigenbasis of the Dirichlet-Laplacian, as well as centred stationary Gaussian process priors associated to the Matérn covariance kernel. Both classes of priors are of practical interest and widely used, and they will lead to two different discretisation strategies for the implementation.

We will present the numerical simulation study in Section 3, where we will provide an implementation of posterior inference in the source identification problem for Gaussian series priors and for the Matérn process priors. The series approach will hinge on a natural discretisation of the parameter space via high-dimensional basis expansions, while for the Matérn priors we will employ piecewise linear functions defined on the elements of a deterministic triangular mesh. Under a suitable discretisation of the statistical model (1), we will provide the explicit formulae for the Gaussian conjugate posterior distributions, which we will exploit to efficiently compute the posterior mean estimates and to efficiently implement, via posterior sampling, credible sets for uncertainty quantification. In the study, we will numerically investigate the asymptotic concentration of the posterior distribution around the ground truth generating the data, the efficiency of the posterior mean estimator (in terms of minimality of its asymptotic variance), and further the frequentist coverage of the obtained credible intervals. Overall, the study shows a close correspondence between the numerical results and the expected performance predicted by the theory developed in [18]. The posterior mean estimate (relative to a Gaussian series prior) and the associated uncertainty quantification (obtained via the cross-section of 2500 posterior samples along the $x$ -axis) are depicted in Figure 2, to be compared to the true source function shown in Figure 1. The reproducible (MATLAB) code used for the study is available at: https://github.com/MattGiord/Bayesian-Source-Identification.

2 A nonparametric Bayesian approach with Gaussian priors

In this section, we formally introduce the statistical model and the Bayesian procedures of interest for the present article. We will provide details on the likelihood, the prior and the posterior distribution in Section 2.1. We will then review the asymptotic results of [18] in Section 2.2, and further present two classes of Gaussian priors to which the theory applies in Section 2.3.

2.1 Details on the Bayesian model

Throughout, let $\mathcal{O}\subset\mathbb{R}^{d}$ , $d\in\mathbb{N}$ , be a non-empty, open and bounded set with smooth boundary $\partial\mathcal{O}$ . For a fixed, smooth and positive ‘diffusion coefficient’ $c\in C^{\infty}(\overline{\mathcal{O}})$ , $\inf_{x\in\mathcal{O}}c(x)>0$ , let $G:L^{2}(\mathcal{O})\to L^{2}(\mathcal{O})$ be the (linear) source-to-solution map associated to the elliptic PDE (2); see Appendix B in [18] for details. For such $G$ , and a given noise level $\varepsilon>0$ , consider observations $Y^{\varepsilon}$ arising as in (1) for some unknown $f\in L^{2}(\mathcal{O})$ , where $W$ is a white noise process indexed by $L^{2}(\mathcal{O})$ defined on some probability space $(\Omega,\mathcal{S},\Pr)$ , that is the centred Gaussian process $(W(g),\ g\in L^{2}(\mathcal{O}))$ with covariance $\textnormal{E}[W(g_{1})W(g_{2})]=\langle g_{1},g_{2}\rangle_{2}$ . Throughout most of the article, we will regard $\varepsilon$ as known. In practice, it can often be replaced by an estimate (cf. Section 3.3.2). As described, for example, in Chapter 1 of [16], observing $Y^{\varepsilon}$ is understood as observing a realisation of the Gaussian process $(Y^{\varepsilon}(g),\ g\in L^{2}(\mathcal{O}))$ with mean $\textnormal{E}[Y^{\varepsilon}(g)]=\langle G(f),g\rangle_{2}$ and covariance $\mathbbm{C}\textnormal{ov}[Y^{\varepsilon}(g_{1}),Y^{\varepsilon}(g_{2})]=\varepsilon^{2}\langle g_{1},g_{2}\rangle_{2}$ . Such an observation scheme serves as a convenient continuous counterpart of the inverse regression model

Y_{i}=G(f)(X_{i})+\sigma W_{i},\qquad i=1,\dots,n,

(3)

comprising noisy point evaluations of the PDE solution $G(f)$ over a set of points $X_{1},\dots,X_{n}\in\mathcal{O}$ , corrupted by Gaussian measurement errors $\sigma W_{1},\dots,\sigma W_{n}\overset{\textnormal{iid}}{\sim}N(0,\sigma^{2})$ , $\sigma>0$ , known to be asymptotically equivalent (in the sense of [10]) to the white noise model (1) under suitable assumptions on the grid and the calibration $n/\sigma^{2}\simeq\varepsilon^{-2}$ .

For continuous observations $Y^{\varepsilon}$ arising as in (1), for any $f\in L^{2}(\mathcal{O})$ , the (cylindrically-defined) law $P^{\varepsilon}_{f}$ of $Y^{\varepsilon}$ is absolutely continuous with respect to the law $P_{0}^{\varepsilon}$ of the scaled white noise $\varepsilon W$ , with log-likelihood

\ell_{\varepsilon}(f):=\log\frac{dP^{\varepsilon}_{f}}{dP^{\varepsilon}_{0}}(Y^{\varepsilon})=\frac{1}{\varepsilon^{2}}Y^{\varepsilon}[G(f)]-\frac{1}{2\varepsilon^{2}}\|G(f)\|_{2}^{2}.

In view of the joint measurability of $\ell_{\varepsilon}$ , regarding $f$ as a random function with values in $L^{2}(\mathcal{O})$ and assigning to it any prior distribution in the form of a Borel probability measure $\Pi(\cdot)$ supported on $L^{2}(\mathcal{O})$ then induces, via Bayes’ formula (for example, [15, p.7]), the posterior distribution

\Pi(A|Y^{\varepsilon})=\frac{\int_{A}e^{\ell_{\varepsilon}(f)}d\Pi(f)}{\int_{L^{2}(\mathcal{O})}e^{\ell_{\varepsilon}(f^{\prime})}d\Pi(f^{\prime})},\qquad A\subseteq L^{2}(\mathcal{O})\ \textnormal{measurable},

that is, the conditional distribution of $f|Y^{\varepsilon}$ . In particular, it will be of interest to consider Gaussian priors which, in view of the linearity of the forward operator $G$ and the normal assumption on the noise $W$ , will lead to conjugate Gaussian posteriors. Concrete formulae are provided in Section 3. In the following, we will repeatedly use elements of the theory of Gaussian processes and measures on Hilbert spaces, and we refer to [16, Chapter 2] for the necessary background. For a Gaussian prior $\Pi(\cdot)$ on $L^{2}(\mathcal{O})$ , the ‘information geometry’ is encoded within an associated reproducing kernel Hilbert space (RKHS) of functions defined on the domain $\mathcal{O}$ , strictly contained inside $L^{2}(\mathcal{O})$ . Popular prior choices in applications and theoretical studies typically model functions belonging to a ‘smoothness scale’, with associated RKHS equal to (or included in) a Sobolev space $H^{\alpha}(\mathcal{O})$ , for some regularity level $\alpha>0$ . These include Gaussian series priors defined on bases spanning the Sobolev scale, as well as stationary Gaussian processes with the Matérn covariance kernel; see [15, Chapter 11].

2.2 Theoretical guarantees for estimation and uncertainty quantification

The asymptotic properties of nonparametric Bayesian procedures with Gaussian priors in inverse problems have recently been investigated by Giordano and Kekkonen [18], resulting, under general ‘regularity conditions’ for the forward operator, for the ground truth and for the prior distribution, in semiparametric Bernstein-von Mises theorems that entail the convergence of certain one-dimensional posterior distributions to limiting Gaussian probability measures with minimal variance, in the small noise limit and under the frequentist assumption that the data have been generated by a fixed ground truth. These results were then leveraged in [18] to prove the asymptotic efficiency of the associated posterior mean estimators, as well as to derive theoretical guarantees certifying that credible intervals centred around them are asymptotically valid confidence intervals with minimal width. In this section, we provide a review of the findings of [18] for the inverse problem of source identification. These will later be corroborated by the results of the numerical simulation study presented in Section 3.

The investigation of [18] builds on the semiparametric approach to the Bernstein-von Mises theorem in infinite-dimensional statistical models developed by Castillo and Nickl [11] and later refined by Monard et al. [34] in the inverse problem setting. It is based on the study of the posterior distributions of a class of scaled and centred one-dimensional functionals of the unknown which, in the context of the source identification problem, take the form $\varepsilon^{-1}\langle f-\bar{f}_{\varepsilon},\psi\rangle_{2}$ for test functions $\psi\in L^{2}(\mathcal{O})$ , where $\bar{f}_{\varepsilon}:=E^{\Pi}[f|Y^{\varepsilon}]$ is the posterior mean. Let $\mathcal{L}(\varepsilon^{-1}\langle f-\bar{f}_{\varepsilon},\psi\rangle_{2}|Y^{\varepsilon})$ denote the associated scaled and centred posterior distribution.

Theorem 1 (Theorem 4.1 in [18]).

Let $\Pi(\cdot)$ be a centred Gaussian Borel probability measure supported on $L^{2}(\mathcal{O})$ with RKHS equal to $H^{\alpha}(\mathcal{O})$ for some $\alpha>d/2$ . Let $f_{0}\in H^{\beta}(\mathcal{O})$ , for some $\beta>\alpha-d/2$ , be compactly supported inside $\mathcal{O}$ , and consider observations $Y^{\varepsilon}\sim P^{\varepsilon}_{f_{0}}$ from the statistical model (1) with $G(f)$ the solution to the PDE (2) and $f=f_{0}$ . Then, for any $\gamma>2+d/2$ and any compactly supported test function $\psi\in H^{\gamma}(\mathcal{O})$ , we have

\mathcal{L}(\varepsilon^{-1}\langle f-\bar{f}_{\varepsilon},\psi\rangle_{2}|Y^{\varepsilon})\overset{\mathcal{L}}{\longrightarrow}N(0,\|\nabla\cdot(c\nabla\psi)\|^{2}_{2}),

(4)

in $P^{\varepsilon}_{f_{0}}$ -probability as $\varepsilon\to 0$ .

The result asserts that the random (data-dependent) one-dimensional probability distribution $\mathcal{L}(\varepsilon^{-1}\langle f-\bar{f}_{\varepsilon},\psi\rangle_{2}|Y^{\varepsilon})$ converges (in the topology of weak convergence) in probability to a centred normal distribution with variance $\|\nabla\cdot(c\nabla\psi)\|^{2}_{2}$ . The latter can be shown to be minimal [18, Remark 2.4], as it coincides with the Cramér-Rao lower bound for estimating the one-dimensional quantity $\langle f,\psi\rangle_{2}$ from data $Y^{\varepsilon}$ arising as in (1). Furthermore, the class of test functions $\psi$ for which the convergence (4) is obtained is to be understood to be maximal, in the sense that the infinite-dimensional Gaussian probability measure with marginals identified by the right-hand side of (4) is tight (a necessary condition for weak convergence) only when $\gamma>2+d/2$ ; see Lemma 4.2 in [18] and the related discussion.

A first important consequence of Theorem 1 is a central limit theorem for the ‘plug-in’ posterior mean estimators $\langle\bar{f}_{\varepsilon},\psi\rangle_{2}$ of the one-dimensional aspects $\langle f,\psi\rangle_{2}$ of the unknown. Note that, for Gaussian priors, these can be efficiently computed via the explicit formulae for the conjugate Gaussian posteriors. The central limit theorem follows, as argued in Remark 2.4 in [18], from the convergence of moments in the limit (4). In particular, under the assumptions of Theorem 1, it holds that

\varepsilon^{-1}\left(\langle\bar{f}_{\varepsilon},\psi\rangle_{2}-\langle f_{0},\psi\rangle_{2}\right)\overset{d}{\longrightarrow}N(0,\|\nabla\cdot(c\nabla\psi)\|^{2}_{2}),

(5)

as $\varepsilon\to 0$ . In view of the aforementioned minimality of the asymptotic variance $\|\nabla\cdot(c\nabla\psi)\|^{2}_{2}$ , the result indeed implies the asymptotic efficiency of the plug-in estimators $\langle\bar{f}_{\varepsilon},\psi\rangle_{2}$ .

The second implication of the Bernstein-von Mises result stated in Theorem 1 concerns the coverage and width of credible intervals built around the efficient estimators $\langle\bar{f}_{\varepsilon},\psi\rangle_{2}$ , which can be shown to be asymptotically valid frequentist confidence intervals and to have diameter shrinking at the optimal parametric rate $\varepsilon$ . For any level $a\in(0,1)$ , consider the $100(1-a)\%$ -credible interval

C_{\varepsilon,a}:=\{z\in\mathbb{R}:|z-\langle\bar{f}_{\varepsilon},\psi\rangle_{2}|\leq R_{\varepsilon,a}\},

(6)

where $R_{\varepsilon,a}>0$ is the $(1-a/2)$ -quantile of the one-dimensional (centred Gaussian) posterior distribution of $\langle f-\bar{f}_{\varepsilon},\psi\rangle_{2}|Y^{\varepsilon}$ , so that

\Pi\left(f:\langle f,\psi\rangle_{2}\in C_{\varepsilon,a}|Y^{\varepsilon}\right)=1-a.

Then, in the setting of Theorem 1, the asymptotic frequentist coverage of $C_{\varepsilon,a}$ is given by

P^{\varepsilon}_{f_{0}}\left(\langle f_{0},\psi\rangle_{2}\in C_{\varepsilon,a}\right)\to 1-a,

(7)

as $\varepsilon\to 0$ , while its radius $R_{\varepsilon,a}$ satisfies

R_{\varepsilon,a}=O_{P^{\varepsilon}_{f_{0}}}(\varepsilon).

See Corollary 2.5 in [18]. Note that although an analytic formulation of the credible intervals $C_{\varepsilon,a}$ requires the derivation of the quantiles of the one-dimensional posterior distributions of $\langle f,\psi\rangle_{2}|Y^{\varepsilon}$ , these can typically be numerically approximated by efficiently sampling from the explicitly available conjugate posterior distributions.
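The sampling-based approximation of the radius $R_{\varepsilon,a}$ can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB implementation from the repository; the posterior mean `m` and standard deviation `s` of $\langle f,\psi\rangle_{2}$ are hypothetical values chosen only for the example, and the helper name `credible_radius` is ours.

```python
import random
from statistics import NormalDist

def credible_radius(samples, centre, a=0.05):
    """Empirical radius R such that a fraction (1 - a) of the samples
    lies within distance R of the centre."""
    deviations = sorted(abs(z - centre) for z in samples)
    # Index of the empirical (1 - a)-quantile of the absolute deviations.
    k = min(len(deviations) - 1, int((1 - a) * len(deviations)))
    return deviations[k]

# Hypothetical one-dimensional Gaussian posterior for <f, psi>_2.
m, s = 0.3, 0.02
rng = random.Random(0)
samples = [rng.gauss(m, s) for _ in range(20000)]

radius_mc = credible_radius(samples, m, a=0.05)
# Exact radius for a Gaussian posterior: the (1 - a/2)-quantile of N(0, s^2).
radius_exact = NormalDist(0.0, s).inv_cdf(1 - 0.05 / 2)
interval = (m - radius_mc, m + radius_mc)
```

For a Gaussian posterior the two radii should agree up to Monte Carlo error, so the empirical quantile is mainly useful as a check or when the functional of interest has no closed-form posterior.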

2.3 Examples of Gaussian priors

In this section, we provide two concrete examples of Gaussian priors to which Theorem 1 applies. For both instances, an implementation of the resulting posterior inference will be presented in Section 3 below, based on two different discretisation strategies. The first example concerns Gaussian series priors.

Example 2 (Gaussian series priors on the Dirichlet-Laplacian eigenbasis).

Let $(\phi_{j},\ j\in\mathbb{N})\subset H^{1}(\mathcal{O})\cap C^{\infty}(\overline{\mathcal{O}})$ be the orthonormal basis of the space $L^{2}(\mathcal{O})$ formed by the eigenfunctions of the (negative) Dirichlet-Laplacian,

\begin{split}-\Delta\phi_{j}-\lambda_{j}\phi_{j}&=0,\ \ \textnormal{on}\ \ \mathcal{O},\\ \phi_{j}&=0,\ \ \textnormal{on}\ \ \partial\mathcal{O},\end{split}\qquad j\in\mathbb{N},

(8)

with associated eigenvalues $0<\lambda_{1}<\lambda_{2}\leq\lambda_{3}\leq\dots,$ satisfying $\lambda_{j}\to\infty$ as $j\to\infty$ according to Weyl’s asymptotics, namely $\lambda_{j}\simeq j^{2/d}$ as $j\to\infty$ . We refer to Example 6.3 and Section 7.4 in [23] for details. The associated Hilbert scale

\mathbb{H}^{\alpha}:=\Bigg\{f\in L^{2}(\mathcal{O}):\|f\|^{2}_{\mathbb{H}^{\alpha}}:=\sum_{j=1}^{\infty}\lambda_{j}^{\alpha}|\langle f,\phi_{j}\rangle_{2}|^{2}<\infty\Bigg\},\qquad\alpha\geq 0,

then satisfies $\mathbb{H}^{0}=L^{2}(\mathcal{O})$ (with equality of norms) and the continuous (strict) embedding $\mathbb{H}^{\alpha}\subset H^{\alpha}(\mathcal{O})$ for all $\alpha>0$ [44, p. 472]. In fact, it holds that $\|f\|_{\mathbb{H}^{\alpha}}\simeq\|f\|_{H^{\alpha}}$ for all $f\in\mathbb{H}^{\alpha}$ and $\alpha\geq 0$ .

For any $\alpha>d/2$ , consider the Gaussian random series

F:=\sum_{j=1}^{\infty}\lambda_{j}^{-\alpha/2}F_{j}\phi_{j},\qquad F_{j}\overset{\textnormal{iid}}{\sim}N(0,1),

(9)

corresponding to the Karhunen-Loève expansions of certain commonly used Gaussian process priors with covariance kernel given by an inverse power of the Laplacian [41, Section 2.4]. By Weyl’s asymptotics, we have

\textnormal{E}[\|F\|_{2}^{2}]=\sum_{j=1}^{\infty}\lambda_{j}^{-\alpha}\simeq\sum_{j=1}^{\infty}j^{-2\alpha/d}<\infty,

since $2\alpha/d>1$ , showing that $F$ takes values almost surely in $L^{2}(\mathcal{O})$ . By Lemma I.5 in [15], the law $\Pi(\cdot)$ of $F$ is then seen to define a Gaussian Borel probability measure supported on $L^{2}(\mathcal{O})$ . Furthermore, by Theorem I.12 in [15], its RKHS is equal to $\mathbb{H}^{\alpha}$ . Noting that, for any compactly supported test function $\psi\in H^{\gamma}(\mathcal{O})$ , $\gamma>2+d/2$ , the approximation argument in the proof of Theorem 4.1 in [18] can be carried out with minimal modifications with $H^{\alpha}(\mathcal{O})$ replaced by $\mathbb{H}^{\alpha}$ , we conclude that Theorem 1 applies by modelling the unknown source function $f$ via the Gaussian series (9).

Although the Dirichlet-Laplacian eigenbasis is not explicitly available for general domains $\mathcal{O}$ , we note that it can be numerically computed via efficient finite element methods for elliptic eigenvalue problems, offering a broadly applicable framework for implementation. More details will be provided in Section 3 below.
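For intuition, the construction in Example 2 can be reproduced in a case where the eigenpairs are explicit. The sketch below assumes the unit square $(0,1)^{2}$ instead of a general domain, for which the Dirichlet-Laplacian eigenvalues are $\pi^{2}(k^{2}+l^{2})$ , $k,l\in\mathbb{N}$ ; the function name, the truncation level and the value of $\alpha$ are illustrative choices, and Python with NumPy is used rather than MATLAB.

```python
import numpy as np

def square_dirichlet_eigenvalues(J, K=80):
    """First J Dirichlet-Laplacian eigenvalues on the unit square (0,1)^2,
    lambda_{k,l} = pi^2 (k^2 + l^2), sorted increasingly."""
    k = np.arange(1, K + 1)
    lam = np.sort((np.pi ** 2 * (k[:, None] ** 2 + k[None, :] ** 2)).ravel())
    # Every eigenvalue below pi^2 ((K+1)^2 + 1) is captured by the K x K grid.
    if lam[J - 1] >= np.pi ** 2 * ((K + 1) ** 2 + 1):
        raise ValueError("increase K to resolve the first J eigenvalues")
    return lam[:J]

alpha, J = 1.5, 2000                      # alpha > d/2 = 1
lam = square_dirichlet_eigenvalues(J)

# Weyl's asymptotics in d = 2: lambda_j / j approaches 4*pi / |O| = 4*pi.
weyl_ratio = lam[-1] / J

# One draw of the coefficients of the series (9), truncated at level J.
rng = np.random.default_rng(0)
coeffs = lam ** (-alpha / 2) * rng.standard_normal(J)

# By orthonormality, the squared L2-norm of the truncated series is the sum
# of the squared coefficients; its expectation sum_j lambda_j^(-alpha)
# stays bounded as J grows because 2*alpha/d > 1.
sq_norm = np.sum(coeffs ** 2)
expected_sq_norm = np.sum(lam ** (-alpha))
```

Evaluating the draw on a grid would additionally require the eigenfunctions $2\sin(k\pi x)\sin(l\pi y)$ ; they are omitted here to keep the example short.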

A second example of interest involves stationary Gaussian processes defined via a covariance kernel of choice. In particular, the Matérn kernel is widely used in applications [37, Section 4.2].

Example 3 (Matérn process priors).

Let $F=(F(x),\ x\in\mathcal{O})$ be the centred and stationary Gaussian process with Matérn covariance kernel

C_{\alpha,\ell}(x,y)=\frac{2^{1-\alpha}}{\Gamma(\alpha)}\left(\frac{|x-y|\sqrt{2\alpha}}{\ell}\right)^{\alpha}B_{\alpha}\left(\frac{|x-y|\sqrt{2\alpha}}{\ell}\right),\qquad x,y\in\mathcal{O},

(10)

with regularity parameter $\alpha>d/2$ and length scale $\ell>0$ . Above, $\Gamma$ denotes the gamma function and $B_{\alpha}$ is the modified Bessel function of the second kind. The finite dimensional distributions of $F$ are identified by the relation

(F(x_{1}),\dots,F(x_{M}))^{T}\sim N_{M}(0,\mathbf{C}),

(11)

with $\mathbf{C}:=(C_{\alpha,\ell}(x_{h},x_{m}))_{h,m=1}^{M}\in\mathbb{R}^{M,M}$ , holding for any $M\in\mathbb{N}$ and any $x_{1},\dots,x_{M}\in\mathcal{O}$ . By Lemma I.4 in [15], a version of $F$ can be identified with sample paths belonging almost surely to the Hölder space $C^{\alpha^{\prime}}(\mathcal{O})\subset L^{2}(\mathcal{O})$ for any $0<\alpha^{\prime}<\alpha-d/2$ , and therefore, in view of Lemma I.7 in [15], the law $\Pi(\cdot)$ of such version defines a Gaussian Borel probability measure supported on $L^{2}(\mathcal{O})$ . Moreover, the results in Section 11.4.4 in [15] imply that the RKHS of $F$ equals, with norm equivalence, the set of restrictions to the domain $\mathcal{O}$ of functions in the Sobolev space $H^{\alpha}(\mathbb{R}^{d})$ . Since $\mathcal{O}$ is assumed to have smooth boundary, the latter is indeed equal to $H^{\alpha}(\mathcal{O})$ . Thus, Theorem 1 applies with $\Pi(\cdot)$ a Matérn process prior with covariance kernel (10).
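The finite-dimensional relation (11) suggests a direct way to simulate the process at a finite set of points. The following sketch assumes the special case $\alpha=3/2$ in (10), where the Bessel function has a closed form and the kernel reduces to $(1+\sqrt{3}r/\ell)e^{-\sqrt{3}r/\ell}$ with $r=|x-y|$ ; the evaluation points (drawn in the unit disc), the length scale and the function name are hypothetical, and Python with NumPy is used rather than MATLAB.

```python
import numpy as np

def matern32_covariance(points, length_scale):
    """Covariance matrix C[h, m] = C(x_h, x_m) for the kernel (10) in the
    special case of Bessel order 3/2, where it reduces to
    (1 + sqrt(3) r / l) * exp(-sqrt(3) r / l), with r = |x_h - x_m|."""
    diff = points[:, None, :] - points[None, :, :]
    r = np.sqrt(np.sum(diff ** 2, axis=-1))
    scaled = np.sqrt(3.0) * r / length_scale
    return (1.0 + scaled) * np.exp(-scaled)

rng = np.random.default_rng(1)
# Hypothetical evaluation points x_1, ..., x_M, uniform in the unit disc.
M = 200
angles = rng.uniform(0.0, 2.0 * np.pi, M)
radii = np.sqrt(rng.uniform(0.0, 1.0, M))
points = np.column_stack((radii * np.cos(angles), radii * np.sin(angles)))

C = matern32_covariance(points, length_scale=0.25)

# Draw (F(x_1), ..., F(x_M)) ~ N_M(0, C) as in (11) via a Cholesky factor;
# a small diagonal jitter guards against round-off.
L = np.linalg.cholesky(C + 1e-8 * np.eye(M))
draw = L @ rng.standard_normal(M)
```

For general $\alpha$ the kernel requires a numerical routine for the modified Bessel function of the second kind, which is not shown here.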

3 Numerical simulation study

For illustration, we take as working domain the area $\mathcal{O}$ contained inside a rotated ellipse with horizontal semi-axis of unit length, vertical semi-axis of length $3/4$ , and rotation angle $\theta=\pi/6$ ,

\{(\cos(t)\cos(\theta)-3/4\sin(t)\sin(\theta),3/4\sin(t)\cos(\theta)+\cos(t)\sin(\theta)),\ t\in[0,2\pi)\}.
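As a small aid for working with this domain, the sketch below evaluates the boundary parametrisation and tests whether a point lies in $\mathcal{O}$ by rotating it back by $-\theta$ and checking the axis-aligned ellipse inequality. The function names are ours, and Python is used rather than MATLAB.

```python
import math

THETA = math.pi / 6        # rotation angle
A, B = 1.0, 0.75           # horizontal and vertical semi-axes

def boundary_point(t):
    """Point on the boundary curve for parameter t in [0, 2*pi)."""
    x = A * math.cos(t) * math.cos(THETA) - B * math.sin(t) * math.sin(THETA)
    y = B * math.sin(t) * math.cos(THETA) + A * math.cos(t) * math.sin(THETA)
    return x, y

def in_domain(x, y):
    """True if (x, y) lies in the open region enclosed by the rotated ellipse."""
    # Rotate by -theta to align the ellipse with the coordinate axes.
    u = x * math.cos(THETA) + y * math.sin(THETA)
    v = -x * math.sin(THETA) + y * math.cos(THETA)
    return (u / A) ** 2 + (v / B) ** 2 < 1.0
```

A function of this kind can be used, for instance, to mask a rectangular plotting grid or to filter candidate design points.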

For an unknown source function $f\in L^{2}(\mathcal{O})$ , we assume in practice that we are given $n$ noisy point evaluations $\mathbf{Y}:=(Y_{1},\dots,Y_{n})^{T}\in\mathbb{R}^{n}$ of the solution $G(f)$ to the PDE (2) generated according to the equivalent discrete statistical model (3), for a given deterministic grid of points $x_{1},\dots,x_{n}\in\mathcal{O}$ comprising the nodes of a triangular mesh covering $\mathcal{O}$ (Figure 3, top-left). We then seek to estimate $f$ from data $\mathbf{Y}$ .

3.1 Posterior inference with Gaussian series priors

3.1.1 Methodology

For the Gaussian series priors defined via the Dirichlet-Laplacian eigenpairs $\{(\phi_{j},\lambda_{j}),\ j\in\mathbb{N}\}$ considered in Example 2, we discretise the parameter space by modelling the unknown source function $f$ as the finite sum

f=\sum_{j=1}^{J}f_{j}\phi_{j},\qquad f_{1},\dots,f_{J}\in\mathbb{R},\qquad J\in\mathbb{N}.

(12)

For any such $f$ , the linearity of the forward map $G$ then implies that the discrete observations are given by

Y_{i}=\sum_{j=1}^{J}f_{j}G(\phi_{j})(x_{i})+\sigma W_{i}\equiv(\mathbf{G}\mathbf{f})_{i}+\sigma W_{i},

where $\mathbf{G}:=[G(\phi_{j})(x_{i}),\ i=1,\dots,n,\ j=1,\dots,J]\in\mathbb{R}^{n,J}$ and $\mathbf{f}:=(f_{1},\dots,f_{J})^{T}\in\mathbb{R}^{J}$ , whereby the inverse regression model (3) can be written in matrix notation as

\mathbf{Y}=\mathbf{G}\mathbf{f}+\sigma\mathbf{W}

(13)

with $\mathbf{W}:=(W_{1},\dots,W_{n})^{T}\sim N_{n}(0,\mathbf{I}_{n})$ . Thus, for any given $\mathbf{f}\in\mathbb{R}^{J}$ , $\mathbf{Y}|\mathbf{f}\sim N_{n}(\mathbf{G}\mathbf{f},\sigma^{2}\mathbf{I}_{n})$ .
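To make the discretised model (13) concrete, the sketch below builds $\mathbf{G}$ in a setting where it is available in closed form. It assumes the unit square $(0,1)^{2}$ and constant diffusivity $c\equiv 1$ , neither of which is the setting of the simulation study: there the PDE (2) reads $\Delta u=f$ , so that $G(\phi_{j})=-\phi_{j}/\lambda_{j}$ for the explicit eigenpairs $\phi_{k,l}(x,y)=2\sin(k\pi x)\sin(l\pi y)$ , $\lambda_{k,l}=\pi^{2}(k^{2}+l^{2})$ . The design points, noise level and coefficient vector are hypothetical.

```python
import numpy as np

def eigenvalue(k, l):
    """Dirichlet-Laplacian eigenvalue on the unit square."""
    return np.pi ** 2 * (k ** 2 + l ** 2)

def eigenfunction(k, l, x, y):
    """L2-normalised Dirichlet-Laplacian eigenfunction on the unit square."""
    return 2.0 * np.sin(k * np.pi * x) * np.sin(l * np.pi * y)

def forward_matrix(points, modes):
    """G[i, j] = G(phi_j)(x_i). With c = 1, the PDE (2) reads
    Laplacian(u) = f, so G(phi_j) = -phi_j / lambda_j."""
    cols = [-eigenfunction(k, l, points[:, 0], points[:, 1]) / eigenvalue(k, l)
            for k, l in modes]
    return np.column_stack(cols)

rng = np.random.default_rng(2)
n, sigma = 500, 1e-3                          # illustrative sample size and noise level
points = rng.uniform(0.0, 1.0, size=(n, 2))   # random design (the study uses mesh nodes)
modes = [(1, 1), (1, 2), (2, 1), (2, 2)]      # J = 4 basis functions
G = forward_matrix(points, modes)

f_true = np.array([1.0, -0.5, 0.25, 0.1])     # hypothetical coefficient vector
Y = G @ f_true + sigma * rng.standard_normal(n)   # data from model (13)
```

The columns of $\mathbf{G}$ shrink like $1/\lambda_{j}$ , which illustrates why higher-frequency coefficients are harder to recover from noisy data.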

In practice, outside certain special cases for the domain $\mathcal{O}$ (such as square or circular ones), the Dirichlet-Laplacian eigenpairs are not explicitly available. For general domains, we then resort to numerical methods to solve the elliptic eigenvalue problem (8). In particular, we used the MATLAB PDE Toolbox, which takes as input a search range $[0,\lambda_{\textnormal{max}}]$ , $\lambda_{\textnormal{max}}>0$ , for the eigenvalues, and returns numerical approximations, obtained via finite element methods, of the eigenvalues in the prescribed interval and of the corresponding eigenfunctions. For the considered rotated elliptically-shaped domain, Figure 3 shows the first, the second and the last eigenfunction returned by the elliptic PDE solver. The range was set to $[0,\lambda_{\textnormal{max}}]=[0,500]$ , for which $J=84$ eigenvalues were found. The (numerical approximations to the) eigenvalues are displayed in Figure 4. They exhibit linear growth, as expected from Weyl’s asymptotics for two-dimensional domains. The computation of the matrix $\mathbf{G}$ in the discretised observation model (13) is performed by numerically solving, again using the MATLAB PDE Toolbox, the elliptic PDE (2) with $f$ replaced by $\phi_{j}$ (or, more precisely, by the finite element approximation of $\phi_{j}$ ) and then evaluating $\mathbf{G}_{ij}:=G(\phi_{j})(x_{i})$ for all $i=1,\dots,n$ and $j=1,\dots,J$ .
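The reported number of eigenvalues can be compared with the leading-order Weyl law, which in dimension $d=2$ predicts $N(\lambda)\sim|\mathcal{O}|\lambda/(4\pi)$ eigenvalues below $\lambda$ . The short computation below is only a back-of-the-envelope check under this leading-order approximation; the variable names are ours.

```python
import math

semi_axes = (1.0, 0.75)                 # ellipse semi-axes from Section 3
area = math.pi * semi_axes[0] * semi_axes[1]
lambda_max = 500.0

# Leading-order Weyl law in d = 2: N(lambda) ~ |O| * lambda / (4 * pi).
predicted_count = area * lambda_max / (4.0 * math.pi)   # equals 93.75
# Corresponding leading-order growth of the eigenvalues: lambda_j ~ slope * j.
slope = 4.0 * math.pi / area                            # equals 16/3
```

The leading-order count of about 94 is of the same order as the $J=84$ eigenvalues found numerically; the gap may plausibly be attributed to lower-order boundary terms and to the finite element approximation, neither of which is accounted for in this estimate.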

Under the discretisation (12), the Gaussian series prior described in Example 2 is approximately implemented by truncating the random series (9) at level $J$ , and then assigning to the coefficients $f_{1},\dots,f_{J}$ in (12) independent Gaussian priors $N(0,\lambda_{j}^{-\alpha})$ , $j=1,\dots,J$ . In the discretised observation model (13), this corresponds to assigning to the vector $\mathbf{f}$ the $J$ -dimensional Gaussian prior with diagonal covariance matrix

\mathbf{f}\sim N_{J}(0,\mathbf{\Lambda}),\qquad\mathbf{\Lambda}:=\textnormal{diag}(\lambda_{1}^{-\alpha},\dots,\lambda_{J}^{-\alpha})\in\mathbb{R}^{J,J}.

(14)

Thus, recalling that, according to (13), $\mathbf{Y}|\mathbf{f}\sim N_{n}(\mathbf{G}\mathbf{f},\sigma^{2}\mathbf{I}_{n})$ , a standard conjugate computation for multivariate models with Gaussian likelihood and prior yields the posterior distribution

\mathbf{f}|\mathbf{Y}\sim N_{J}(\bar{\mathbf{f}}_{n},\mathbf{\Lambda}_{n}),

(15)

where

\bar{\mathbf{f}}_{n}:=\frac{1}{\sigma^{2}}\mathbf{\Lambda}_{n}\mathbf{G}^{T}\mathbf{Y};\qquad\mathbf{\Lambda}_{n}:=(\sigma^{-2}\mathbf{G}^{T}\mathbf{G}+\mathbf{\Lambda}^{-1})^{-1}.

(16)

Using the conjugate formulae (15) and (16), it is straightforward to compute posterior mean estimates and to draw posterior samples. In turn, this allows for the efficient implementation of credible sets centred around the posterior mean, replacing the theoretical posterior quantiles (for example, the ones used in the definition of the credible intervals (6)) with the empirical quantiles associated to a sufficiently large sample from the posterior distribution.
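The conjugate update (15)-(16) and the sampling-based credible intervals can be sketched in a few lines. The example below is written in Python with NumPy rather than MATLAB and is not the repository code: the matrix `G` is a random stand-in for the PDE-based matrix, and the eigenvalues `lam`, the coefficient vector `f_true` and all numerical values are hypothetical.

```python
import numpy as np

def conjugate_posterior(G, Y, sigma, prior_var):
    """Posterior mean and covariance from (16) for the prior N(0, diag(prior_var))."""
    precision = G.T @ G / sigma ** 2 + np.diag(1.0 / prior_var)
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0                    # symmetrise against round-off
    mean = cov @ (G.T @ Y) / sigma ** 2
    return mean, cov

rng = np.random.default_rng(3)
n, J, sigma, alpha = 400, 6, 0.05, 1.5
G = rng.standard_normal((n, J))                  # stand-in for the PDE-based matrix
lam = 5.0 * np.arange(1, J + 1)                  # stand-in eigenvalues with linear growth
f_true = lam ** (-1.0) * np.array([1.0, -1.0, 0.5, 0.5, -0.25, 0.25])
Y = G @ f_true + sigma * rng.standard_normal(n)

mean, cov = conjugate_posterior(G, Y, sigma, prior_var=lam ** (-alpha))

# Posterior draws and empirical 95% credible intervals for each coefficient.
draws = rng.multivariate_normal(mean, cov, size=2500)
lower, upper = np.quantile(draws, [0.025, 0.975], axis=0)
```

For larger $J$ , solving linear systems with the posterior precision matrix (for instance via a Cholesky factorisation) is generally preferable to forming the explicit inverse; the inverse is used here only to mirror the formulae in (16).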

3.1.2 Experiments

Throughout the numerical simulation study, the true source function (shown in Figure 1, left) was taken to be

f_{0}(x,y)=e^{-(5x-2.5)^{2}-(5y)^{2}}+e^{-(7.5x)^{2}-(2.5y)^{2}}+e^{-(5x+2.5)^{2}-(5y)^{2}},\qquad(x,y)\in\mathcal{O}.

Figure 1 (right) shows $n=4500$ discrete noisy observations, over the nodes of the triangular mesh depicted in Figure 3 (top-left), of the corresponding PDE solution $G(f_{0})$ arising as in the inverse regression model (3) with noise standard deviation $\sigma=0.0005$ (with corresponding signal-to-noise ratio $\|G(f_{0})\|_{2}/\sigma=37.55$ ). The diffusion coefficient was taken to be $c(x,y):=2+5e^{-(5x-2)^{2}-(5y-2)^{2}}+5e^{-(5x+2)^{2}-(5y+2)^{2}}$ , $(x,y)\in\mathcal{O}$ . The PDE solution $G(f_{0})$ was calculated using MATLAB PDE Toolbox, which also contains the routine to create the triangular mesh.

The posterior mean estimate $\bar{f}_{n}:=\sum_{j=1}^{J}\bar{\mathbf{f}}_{n,j}\phi_{j}$ shown in Figure 2 (left) was obtained by computing the vector of coefficients $\bar{\mathbf{f}}_{n}$ according to the conjugate formula in (16). A diagonal Gaussian prior as in (14) was used, with regularity parameter $\alpha=3/4$ . The parameter space was discretised using $J=84$ basis functions. The obtained $L^{2}$ -estimation error was $\|\bar{f}_{n}-f_{0}\|_{2}=0.060077$ . For comparison, $\|f_{0}\|_{2}=0.4764$ (with corresponding relative error $\|\bar{f}_{n}-f_{0}\|_{2}/\|f_{0}\|_{2}=12.5\%$ ), while the $L^{2}$ -approximation error incurred by projecting $f_{0}$ onto the linear space spanned by the employed set of basis functions (furnishing a lower bound for the $L^{2}$ -estimation error) is $0.0486$ . The $2500$ posterior draws whose cross-sections along the $x$ -axis are shown in Figure 2 (right) were sampled from the conjugate Gaussian posterior distribution in (15).

Figure 5 provides an illustration of asymptotic convergence in the infinitely-informative data limit, showing the posterior mean estimates obtained for increasing sample sizes. The (decreasing) $L^{2}$-estimation errors for sample sizes ranging between $n=50$ and $n=4500$ are reported in Table 1. Across the experiments, the same discretisation with $J=84$ basis functions and the same diagonal Gaussian prior with regularity $\alpha=3/4$ were used.

Table 1: $L^{2}$-estimation errors achieved by the posterior mean estimator $\bar{f}_{n}$ for increasing sample sizes.

$n$	50	100	250	500	750	1000	2000	3000	4500
$\|\bar{f}_{n}-f_{0}\|_{2}$	0.22	0.18	0.13	0.099	0.088	0.078	0.076	0.069	0.060
$\|\bar{f}_{n}-f_{0}\|_{2}/\|f_{0}\|_{2}$	45.8%	37.5%	27.1%	20.6%	18.3%	16.3%	15.8%	14.3%	12.5%

We next consider semiparametric inference for one-dimensional linear functionals $\langle f,\psi\rangle_{2}$, $\psi\in L^{2}(\mathcal{O})$, and provide a numerical illustration of the asymptotic results presented in Section 2.2. In particular, we focus on test functions $\psi=\phi_{j}$, $j\in\{1,\dots,J\}$, belonging to the Dirichlet-Laplacian eigenbasis, for which, under the discretisation (12), $\langle f,\phi_{j}\rangle_{2}=f_{j}$. Accordingly, for $\bar{\mathbf{f}}_{n}$ and $\mathbf{\Lambda}_{n}$ as in (16), the plug-in posterior estimators are given by $\langle\bar{f}_{n},\phi_{j}\rangle_{2}=\bar{\mathbf{f}}_{n,j}$, with corresponding posterior variances $\mathbf{\Lambda}_{n,jj}$, the diagonal entries of the posterior covariance matrix. Thus, the $95\%$-credible interval for $\langle f,\phi_{j}\rangle_{2}$ is given by $\bar{\mathbf{f}}_{n,j}\pm 1.96\sqrt{\mathbf{\Lambda}_{n,jj}}$.

In order to compute the asymptotic variances $\|\nabla\cdot(c\nabla\psi)\|_{2}^{2}$ appearing in the right hand side of (4) and (5), we obtain the singular value decomposition (SVD) of the forward operator $G$ , corresponding to finding the eigenfunctions $(\xi_{k},\ k\in\mathbb{N})\subset L^{2}(\mathcal{O})$ and eigenvalues $(\eta_{k},\ k\in\mathbb{N})\subset[0,\infty)$ solving the problem

\begin{split}-\nabla\cdot(c\nabla\xi)-\eta\xi&=0,\ \ \textnormal{on}\ \ \mathcal{O}\\ \xi&=0,\ \ \textnormal{on}\ \ \partial\mathcal{O},\end{split}

(17)

whereupon there follow the identities

G(f)=\sum_{k=1}^{\infty}\eta_{k}^{-1}\langle f,\xi_{k}\rangle_{2}\xi_{k},

and

\nabla\cdot(c\nabla u)=\sum_{k=1}^{\infty}\eta_{k}\langle u,\xi_{k}\rangle_{2}\xi_{k};\qquad\|\nabla\cdot(c\nabla u)\|_{2}^{2}=\sum_{k=1}^{\infty}\eta_{k}^{2}\langle u,\xi_{k}\rangle_{2}^{2}.
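As a sanity check of the last identity, the truncated series can be evaluated in a case where the eigenpairs are known in closed form. The sketch below assumes a one-dimensional analogue that is not part of the simulation study: $\mathcal{O}=(0,1)$, $c\equiv 1$, so that $\xi_{k}(x)=\sqrt{2}\sin(k\pi x)$ and $\eta_{k}=(k\pi)^{2}$, with test function $\psi(x)=x(1-x)$, for which $\|\psi''\|_{2}^{2}=4$. The function name `asymptotic_variance` is hypothetical.

```python
import math


def asymptotic_variance(eta, coeffs):
    """Truncated series sum_k eta_k^2 <psi, xi_k>^2."""
    return sum((e * a) ** 2 for e, a in zip(eta, coeffs))


# Assumed 1D example: eigenvalues (k*pi)^2 of -d^2/dx^2 on (0, 1) with
# Dirichlet boundary conditions, truncated at K terms.
K = 2000
eta = [(k * math.pi) ** 2 for k in range(1, K + 1)]

# Coefficients <psi, xi_k> for psi(x) = x(1 - x):
# 4*sqrt(2)/(k*pi)^3 for odd k, and 0 for even k.
coeffs = [
    4.0 * math.sqrt(2.0) / (k * math.pi) ** 3 if k % 2 == 1 else 0.0
    for k in range(1, K + 1)
]

variance = asymptotic_variance(eta, coeffs)  # approaches 4 as K grows
```

In the two-dimensional setting of the paper, the eigenpairs and the inner products are only available numerically, so `eta` and `coeffs` would be replaced by their finite element approximations.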

Table 2: Observed coverages for increasing sample sizes of the $95\%$-credible intervals for the linear functionals $\langle f,\phi_{j}\rangle_{2}$, with $j=2,4,8,16$.

$n$	50	100	250	500	750	1000
$\phi_{2}$	0.885	0.921	0.954	0.969	0.947	0.962
$\phi_{4}$	0.904	0.936	0.943	0.96	0.953	0.958
$\phi_{8}$	0.92	0.934	0.951	0.944	0.963	0.952
$\phi_{16}$	0.949	0.922	0.92	0.946	0.945	0.96

In practice, we tackle the eigenvalue problem (17) via finite element methods exactly as outlined in Section 3.1.1 for the computation of the Dirichlet-Laplacian eigenbasis, obtaining numerical approximations of the eigenpairs. We note that, while used here as a convenient computational device to evaluate the asymptotic variances, knowledge of the SVD of the forward operator $G$ is not assumed for the theoretical results of Section 2.2, nor is it required for the specification of the two classes of Gaussian priors introduced in Examples 2 and 3 respectively. The theory and methodology investigated in the present article are indeed generally applicable to inverse problems where the SVD might be challenging or infeasible to compute, or to settings where the properties of the associated eigenpairs might be unknown.

Figure 6 shows the (approximate) distributions of the plug-in posterior mean estimators $\langle\bar{f}_{n},\psi\rangle_{2}$ for four representative test functions $\psi=\phi_{2},\phi_{4},\phi_{8},\phi_{16}$ . The plots present the histograms relative to $1000$ realisations of the estimators, obtained by drawing $1000$ independent collections of observations from the inverse regression model (3). For each experiment, a sample of size $n=1000$ was drawn, with noise standard deviation $\sigma=0.0005$ . As expected from the central limit theorem (5), the distributions of the plug-in estimators $\langle\bar{f}_{n},\psi\rangle_{2}$ exhibit a normal shape, are approximately centred around the true parameter $\langle f_{0},\psi\rangle_{2}$ , and their spread is mostly captured by the asymptotic variance $\|\nabla\cdot(c\nabla\psi)\|^{2}_{2}$ .

Finally, Table 2 reports the coverage, for increasing sample sizes, of the $95\%$-credible intervals (6) for the same linear functionals $\langle f,\phi_{j}\rangle_{2}$, with $j=2,4,8,16$, considered in the previous set of experiments. For each sample size $n$ in Table 2, the results were obtained by drawing $1000$ independent collections of $n$ observations from the inverse regression model (3), with noise standard deviation $\sigma=0.0005$. For each random sample, a realisation of the $95\%$-credible intervals $\bar{\mathbf{f}}_{n,j}\pm 1.96\sqrt{\mathbf{\Lambda}_{n,jj}}$ was obtained, with $\mathbf{\Lambda}_{n}$ the posterior covariance matrix in (16), and the coverage was computed as the fraction of samples for which the true parameter $\langle f_{0},\phi_{j}\rangle_{2}$ was contained in the resulting interval. As expected from the theoretical convergence result in (7), the observed coverages stabilise, as the sample size increases, around the prescribed credibility level $95\%$.
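The coverage computation just described can be sketched on a toy conjugate model. The following is an illustration under stated assumptions rather than the code used for Table 2: the forward matrix `G_toy` is a hypothetical random matrix standing in for the finite element discretisation, and the function name `coverage`, the truth `f_true` and the prior variances are chosen only for the example.

```python
import numpy as np


def coverage(G, f_true, prior_var, sigma, n_rep=500, seed=0):
    """Fraction of replications in which the 95% interval covers each true coefficient."""
    rng = np.random.default_rng(seed)
    n, J = G.shape
    # The posterior covariance does not depend on the data, so it is computed once.
    cov = np.linalg.inv(G.T @ G / sigma**2 + np.diag(1.0 / prior_var))
    half_width = 1.96 * np.sqrt(np.diag(cov))
    hits = np.zeros(J)
    for _ in range(n_rep):
        # Fresh synthetic data set from the discretised regression model.
        Y = G @ f_true + sigma * rng.standard_normal(n)
        mean = cov @ (G.T @ Y) / sigma**2
        hits += np.abs(mean - f_true) <= half_width
    return hits / n_rep


# Hypothetical toy problem: n = 200 observations, J = 3 coefficients.
G_toy = np.random.default_rng(1).standard_normal((200, 3))
f_true = np.array([1.0, -0.5, 0.25])
cov_rates = coverage(G_toy, f_true, prior_var=np.ones(3), sigma=1.0)
```

Because the data here are far more informative than the prior, the observed coverages should fall close to the nominal level; in less informative regimes the prior-induced bias can push them away from it.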

3.2 Posterior inference with the Matérn process prior

Next, we consider the Matérn process priors introduced in Example 3. We discretise the parameter space by assuming that $f$ is given by the finite sum

f=\sum_{m=1}^{M}f_{m}\varphi_{m},\qquad f_{1},\dots,f_{M}\in\mathbb{R},\qquad M\in\mathbb{N},

(18)

where $\varphi_{1},\dots,\varphi_{M}$ are piecewise linear functions on the nodes $z_{1},\dots,z_{M}\in\mathcal{O}$ of a deterministic triangular mesh, uniquely identified by the property $\varphi_{m}(z_{m^{\prime}})=1_{\{m=m^{\prime}\}}$ ; see Figure 7. Accordingly, $f$ in (18) satisfies $f(z_{m})=f_{m}$ , and for any $x\in\mathcal{O}$ the value $f(x)$ is obtained by linearly interpolating the pairs $\{(z_{m},f_{m}),\ m=1,\dots,M\}$ .

Under the discretisation (18), the inverse regression model (3) can be written in matrix notation exactly as in (13), now with

\mathbf{G}:=[G(\varphi_{m})(x_{i}),\ i=1,\dots,n,\ m=1,\dots,M]\in\mathbb{R}^{n,M},

and with

\mathbf{f}:=(f_{1},\dots,f_{M})^{T}\in\mathbb{R}^{M}.

Thus, again, $\mathbf{Y}|\mathbf{f}\sim N_{n}(\mathbf{G}\mathbf{f},\sigma^{2}\mathbf{I}_{n})$ . Similarly to Section 3.1.1, the numerical computation of the matrix $\mathbf{G}$ can be carried out with finite element methods for elliptic PDEs.

Recalling that, under the discretisation (18), $f_{m}=f(z_{m})$ for $m=1,\dots,M$, and using the finite-dimensional distributions property (11), assigning to $f$ a Matérn process prior with covariance $C_{\alpha,\ell}$ as in (10) corresponds to assigning to the vector $\mathbf{f}$ the $M$-dimensional Gaussian prior

\mathbf{f}\sim N_{M}(0,\mathbf{C}),\qquad\mathbf{C}:=[C_{\alpha,\ell}(z_{h},z_{m})]_{h,m=1}^{M}\in\mathbb{R}^{M,M}.

The same conjugate computation as the one outlined in Section 3.1.1 can then be carried out, leading to the Gaussian posterior distribution $\mathbf{f}|\mathbf{Y}\sim N_{M}(\bar{\mathbf{f}}_{n},\mathbf{C}_{n})$ , with posterior mean and covariance matrix respectively given by

\bar{\mathbf{f}}_{n}:=\frac{1}{\sigma^{2}}\mathbf{C}_{n}\mathbf{G}^{T}\mathbf{Y};\qquad\mathbf{C}_{n}:=(\sigma^{-2}\mathbf{G}^{T}\mathbf{G}+\mathbf{C}^{-1})^{-1}.

Using the above conjugate formulae, posterior inference for the source function $f$ based on the Matérn process prior can efficiently be implemented.
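For smooth covariance kernels the matrix $\mathbf{C}$ may be numerically ill-conditioned, in which case forming $\mathbf{C}^{-1}$ explicitly can be delicate. One possible workaround, offered here as a remark rather than as the approach taken in the accompanying code, is to factor $\mathbf{C}^{-1}$ out of the posterior precision matrix. This gives the algebraically equivalent expressions

```latex
\mathbf{C}_{n}
  =\bigl(\mathbf{C}^{-1}(\mathbf{I}_{M}+\sigma^{-2}\mathbf{C}\mathbf{G}^{T}\mathbf{G})\bigr)^{-1}
  =(\mathbf{I}_{M}+\sigma^{-2}\mathbf{C}\mathbf{G}^{T}\mathbf{G})^{-1}\mathbf{C},
\qquad
\bar{\mathbf{f}}_{n}
  =\sigma^{-2}\mathbf{C}_{n}\mathbf{G}^{T}\mathbf{Y}
  =(\sigma^{2}\mathbf{I}_{M}+\mathbf{C}\mathbf{G}^{T}\mathbf{G})^{-1}\mathbf{C}\mathbf{G}^{T}\mathbf{Y}.
```

Both expressions involve only an $M\times M$ linear system and never require the inverse of the prior covariance matrix.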

Table 3: $L^{2}$-estimation errors achieved by the posterior mean estimator $\bar{f}_{n}$ arising from the Matérn process prior for increasing sample sizes.

$n$	50	100	250	500	750	1000	2000	3000	4500
$\|\bar{f}_{n}-f_{0}\|_{2}$	0.30	0.30	0.18	0.12	0.13	0.10	0.086	0.076	0.067
$\|\bar{f}_{n}-f_{0}\|_{2}/\|f_{0}\|_{2}$	62.5%	62.5%	37.5%	25%	27.1%	20.8%	17.9%	15.6%	13.9%

For the ground truth displayed in Figure 1 (left), Table 3 reports the $L^{2}$ -estimation errors attained by the posterior mean estimate $\bar{f}_{n}=\sum_{m=1}^{M}\bar{\mathbf{f}}_{n,m}\varphi_{m}$ based on an increasing number of observations from the inverse regression model (3), with noise standard deviation $\sigma=0.0005$ (with corresponding signal-to-noise ratio $\|G(f_{0})\|_{2}/\sigma=37.55$ ). Across the experiments, the parameter space was discretised using a triangular mesh with $M=1169$ nodes. The prior regularity parameter for the Matérn process prior was set to $\alpha=10$ , and the length-scale parameter to $\ell=0.25$ . Figure 8 shows the posterior mean estimate resulting from $n=4500$ observations.

The results are based on the same collection of synthetic data sets with increasing sample size employed in Section 3.1.2, allowing a direct comparison with the results obtained with the Gaussian series priors considered therein. Overall, the achieved $L^{2}$-estimation errors are comparable in magnitude for each sample size, although the Gaussian series priors performed slightly better throughout. It is plausible that this small discrepancy is caused by finite sample effects, prior tuning and the various numerical approximations.

3.3 Further numerical experiments

3.3.1 Sensitivity to the noise variance

Table 4: $L^{2}$-estimation errors achieved by the posterior mean estimator $\bar{f}_{n}$ arising from the Gaussian series prior for decreasing noise standard deviation.

$\sigma$	0.01	0.005	0.0025	0.001	0.0005	0.0001
$\|G(f_{0})\|_{2}/\sigma$	1.88	3.75	7.51	18.77	37.55	187.74
$\|\bar{f}_{n}-f_{0}\|_{2}$	0.21	0.16	0.14	0.078	0.060	0.049
$\|\bar{f}_{n}-f_{0}\|_{2}/\|f_{0}\|_{2}$	43.75%	33.33%	29.17%	16.25%	12.5%	10.20%

For the empirical results presented in Sections 3.1 and 3.2, a fixed noise standard deviation $\sigma=0.0005$ in the inverse regression model (3) was used (with corresponding signal-to-noise ratio $\|G(f_{0})\|_{2}/\sigma=37.55$ ). Here, we provide a brief investigation of the sensitivity of the considered methodology to the value of $\sigma$ , performing a set of experiments with decreasing standard deviation from $\sigma=0.01$ (for which $\|G(f_{0})\|_{2}/\sigma=1.8774$ ) to $\sigma=0.0001$ (for which $\|G(f_{0})\|_{2}/\sigma=187.74$ ), based on the same domain and ground truth used previously. Across the experiments, the sample size was kept fixed at $n=4500$ .

For concreteness, we focus on the Gaussian series priors from Section 3.1, with the same prior tuning employed therein; similar results may be obtained with the Matérn process priors. The $L^{2}$-estimation errors associated with the resulting posterior mean estimates are shown in Table 4. Unsurprisingly, these were observed to decrease monotonically as the signal-to-noise ratio increased. In particular, at the lowest value $\sigma=0.0001$, the $L^{2}$-estimation error may be seen to approach the $L^{2}$-approximation error resulting from projecting $f_{0}$ onto the employed basis, which is equal to $0.0486$.

3.3.2 Inference with unknown noise variance

We conclude the simulation study by considering the important practical scenario where the noise standard deviation $\sigma$ is itself unknown and needs to be estimated from the data. Given observations $\{(Y_{i},X_{i})\}_{i=1}^{n}$ from the inverse regression model (3), we undertake the simple ‘empirical Bayes’ approach of obtaining a preliminary estimate $\hat{\sigma}_{n}$ of $\sigma$, and then applying the methodology laid out in Sections 3.1 and 3.2 with $\sigma$ replaced by $\hat{\sigma}_{n}$. Alternatively, we note that a joint Bayesian model for $f$ and $\sigma$ in (3) could be considered by endowing the noise variance $\sigma^{2}$ with a prior distribution. For example, an independent inverse-gamma prior on $\sigma^{2}$ would lead (conditionally given $f$) to a conjugate posterior distribution, whereupon joint posterior samples for the pair $(f,\sigma^{2})$ could readily be obtained via a Gibbs sampler, alternating draws from the Gaussian posterior distribution of $f|\{(Y_{i},X_{i})\}_{i=1}^{n},\sigma^{2}$ and draws from the inverse-gamma posterior distribution of $\sigma^{2}|\{(Y_{i},X_{i})\}_{i=1}^{n},f$. For brevity, we will not pursue this approach further here.
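Although the joint model is not pursued in the paper, the alternating scheme it alludes to can be sketched in a few lines. The code below is a hypothetical illustration only: it assumes a diagonal Gaussian prior on the coefficient vector, an inverse-gamma prior with shape `a` and scale `b` on the noise variance $\sigma^{2}$ (both values arbitrary), and a random toy forward matrix in place of the finite element discretisation.

```python
import numpy as np


def gibbs_sampler(G, Y, prior_var, a=2.0, b=1.0, n_iter=2000, seed=0):
    """Alternate draws of f | Y, sigma^2 and sigma^2 | Y, f.

    Assumes f ~ N(0, diag(prior_var)) and sigma^2 ~ Inverse-Gamma(a, b),
    independent a priori; a and b are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n, J = G.shape
    GtG, GtY = G.T @ G, G.T @ Y
    prior_prec = np.diag(1.0 / prior_var)
    sigma2 = float(np.var(Y))  # crude starting value
    f_draws = np.empty((n_iter, J))
    sigma_draws = np.empty(n_iter)
    for t in range(n_iter):
        # f | Y, sigma^2 is Gaussian, with the usual conjugate mean and covariance.
        cov = np.linalg.inv(GtG / sigma2 + prior_prec)
        cov = 0.5 * (cov + cov.T)
        f = rng.multivariate_normal(cov @ GtY / sigma2, cov)
        # sigma^2 | Y, f is Inverse-Gamma(a + n/2, b + ||Y - G f||^2 / 2),
        # sampled as the reciprocal of a gamma variable.
        rss = float(np.sum((Y - G @ f) ** 2))
        sigma2 = 1.0 / rng.gamma(shape=a + 0.5 * n, scale=1.0 / (b + 0.5 * rss))
        f_draws[t], sigma_draws[t] = f, np.sqrt(sigma2)
    return f_draws, sigma_draws


# Synthetic check with a hypothetical 200 x 3 forward matrix and sigma = 0.5.
rng = np.random.default_rng(1)
G = rng.standard_normal((200, 3))
f_true = np.array([1.0, -0.5, 0.25])
Y = G @ f_true + 0.5 * rng.standard_normal(200)
f_draws, sigma_draws = gibbs_sampler(G, Y, prior_var=np.ones(3))
burn_in = 500
sigma_post_mean = sigma_draws[burn_in:].mean()
f_post_mean = f_draws[burn_in:].mean(axis=0)
```

After discarding an initial burn-in, the retained draws approximate the joint posterior, and their averages give estimates of the coefficients and of the noise standard deviation.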

Table 5: Inferential results for the difference-based estimator $\hat{\sigma}_{n}$ of the noise standard deviation and the empirical Bayes posterior mean estimator $\hat{f}_{n}$, for increasing sample sizes.

$n$	1000	2000	3000	4500
$\hat{\sigma}_{n}$	0.0034	0.0024	0.0023	0.00072
$\|\hat{f}_{n}-f_{0}\|_{2}$	0.18	0.14	0.12	0.063
$\|\hat{f}_{n}-f_{0}\|_{2}/\|f_{0}\|_{2}$	37.5%	29.17%	25%	13.12%

Several strategies have been proposed in the literature for variance estimation in nonparametric regression models, ranging from residual-based estimators using kernel smoothing [22] and splines [46], to difference-based estimators [38]. See [5] for an overview. Here, we will consider the difference-based method proposed in [38], estimating $\sigma$ in model (3) by

\hat{\sigma}_{n}:=\sqrt{\hat{\sigma}^{2}_{n}},\qquad\hat{\sigma}^{2}_{n}:=\frac{1}{2(n-1)}\sum_{i=2}^{n}(Y_{i}-Y_{i-1})^{2}.

Based on $\hat{\sigma}_{n}$ , the ‘empirical Bayes posterior mean’ estimate $\hat{f}_{n}$ arising from a Gaussian series prior or a Matérn process prior can then be readily computed exactly as described in Sections 3.1 and 3.2 respectively, replacing $\sigma$ with $\hat{\sigma}_{n}$ in the relevant conjugate formulae.
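The difference-based estimator displayed above is simple to compute. The sketch below is a direct transcription of the formula into Python; the function name `difference_based_sigma` is an assumption, and the observations are taken in the order in which they are stored.

```python
import math


def difference_based_sigma(y):
    """Difference-based estimate of the noise standard deviation.

    Computes sqrt( sum_{i=2}^n (Y_i - Y_{i-1})^2 / (2 (n - 1)) ).
    """
    n = len(y)
    if n < 2:
        raise ValueError("at least two observations are required")
    squared_diffs = sum((y[i] - y[i - 1]) ** 2 for i in range(1, n))
    return math.sqrt(squared_diffs / (2.0 * (n - 1)))


# Small worked example: four differences of size 2, so the variance
# estimate is 16 / (2 * 4) = 2 and the estimate of sigma is sqrt(2).
sigma_hat = difference_based_sigma([0.0, 2.0, 0.0, 2.0, 0.0])
```

The returned value can then be passed in place of $\sigma$ to any routine implementing the conjugate formulae. Note that the successive differences also contain increments of the regression function, so the estimate depends on how the observations are ordered.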

Table 5 summarises the inferential results obtained with the difference-based estimation procedure for increasing sample sizes. For these experiments, the noise standard deviation was set to $\sigma=0.0005$, and a Gaussian series prior with the same tuning as in Sections 3.1.2 and 3.3.1 was used. The results show a progressive improvement in the reconstruction quality for both the noise standard deviation $\sigma$ and the unknown source function $f$. In particular, for the largest considered sample size $n=4500$, the $L^{2}$-estimation error $\|\hat{f}_{n}-f_{0}\|_{2}$ was only marginally higher than the one obtained under the same experimental conditions (and prior tuning) in Section 3.1.2 (cf. Table 1), where the value of $\sigma$ was assumed to be known.

4 Summary and discussion

In this article we have considered the nonparametric Bayesian approach with Gaussian priors to linear inverse problems, focusing on the important example of source identification in elliptic PDEs. The main advantages of the considered methodology lie in its modelling flexibility, its ease of implementation (cf. the conjugate formulae (15) and (16)), as well as its theoretical guarantees on estimation and uncertainty quantification (cf. Section 2.2). The performance of the approach has been investigated in a numerical simulation study (cf. Section 3) under two distinct prior models (Gaussian series and Matérn process priors), for both of which excellent reconstruction results have been obtained.

The present work also raises various related research questions. Firstly, it is of interest and practical importance to further explore the setting where the noise standard deviation $\sigma$ in the inverse regression model (3) is unknown. While the simple difference-based estimator considered in Section 3.3.2 has proved effective, several competing approaches, including the joint conjugate Bayesian model outlined in Section 3.3.2, could be investigated. Furthermore, a related interesting question concerns the extension of the theoretical results presented in Section 2.2 to the setting with unknown variance; see e.g. [26] for related results in a direct regression model.

Lastly, let us mention the important issue of specifying the hyperparameter values for the considered prior distributions, namely the truncation level and regularity in the Gaussian series (14), and the smoothness and length-scale parameters in the Matérn covariance kernel (10). There is by now a vast literature investigating the methodological and theoretical aspects of empirical and hierarchical Bayesian approaches to fully data-driven selection of the hyperparameters; see [28, 39, 45, 3] and the many references therein. Investigating the implications and performance of these methods in the context of the observation model and prior distributions considered in the present article is an interesting problem for future research.

Acknowledgement.

The Author is grateful to three anonymous referees for many helpful comments that led to an improvement of the article. This research has been partially supported by MUR, PRIN project 2022CLTYP4. The Author also gratefully acknowledges the support of “de Castro” Statistics Initiative, Collegio Carlo Alberto, Torino. There are no conflicts of interest to declare that are relevant to the content of this chapter.

References

[1] Abraham, K., and Nickl, R. On statistical Calderón problems. Math. Stat. Learn. 2, 2 (2019), 165–216.
[2] Adavani, S. S., and Biros, G. Fast algorithms for source identification problems with elliptic PDE constraints. SIAM Journal on Imaging Sciences 3, 4 (2010), 791–808.
[3] Agapiou, S., Bardsley, J. M., Papaspiliopoulos, O., and Stuart, A. M. Analysis of the Gibbs sampler for hierarchical inverse problems. SIAM/ASA Journal on Uncertainty Quantification 2, 1 (2014), 511–544.
[4] Agapiou, S., Stuart, A. M., and Zhang, Y.-X. Bayesian posterior contraction rates for linear severely ill-posed inverse problems. J. Inverse Ill-Posed Probl. 22, 3 (2014), 297–321.
[5] Alharbi, Y. F. M. Error variance estimation in nonparametric regression models. PhD thesis, University of Birmingham, 2013.
[6] Arridge, S., Maass, P., Öktem, O., and Schönlieb, C.-B. Solving inverse problems using data-driven models. Acta Numer. 28 (2019), 1–174.
[7] Baumeister, J. Inverse problems in finance. In Recent developments in computational finance: Foundations, algorithms and applications. World Scientific, 2013, pp. 81–157.
[8] Benning, M., and Burger, M. Modern regularization methods for inverse problems. Acta Numerica 27 (2018), 1–111.
[9] Bertero, M., and Piana, M. Inverse problems in biomedical imaging: modeling and methods of solution. In Complex systems in biomedicine. Springer Italia, Milan, 2006, pp. 1–33.
[10] Brown, L. D., and Low, M. G. Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist. 24, 6 (1996), 2384–2398.
[11] Castillo, I., and Nickl, R. Nonparametric Bernstein–von Mises Theorems in Gaussian white noise. Ann. Statist. 41, 4 (2013), 1999–2028.
[12] Collins, M., and Kuperman, W. Inverse problems in ocean acoustics. Inverse Problems 10, 5 (1994), 1023.
[13] Elvetun, O. L., and Nielsen, B. F. A regularization operator for source identification for elliptic PDEs. Inverse Problems & Imaging 15, 4 (2021).
[14] Evans, L. C. Partial differential equations, second ed., vol. 19 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2010.
[15] Ghosal, S., and van der Vaart, A. W. Fundamentals of Nonparametric Bayesian Inference. Cambridge University Press, New York, 2017.
[16] Giné, E., and Nickl, R. Mathematical foundations of infinite-dimensional statistical models. Cambridge University Press, New York, 2016.
[17] Giordano, M. Bayesian nonparametric inference in PDE models: asymptotic theory and implementation. In 2023 JSM Proceedings. Zenodo, 2023, pp. 1–17.
[18] Giordano, M., and Kekkonen, H. Bernstein–von Mises theorems and uncertainty quantification for linear inverse problems. SIAM/ASA J. Uncertain. Quantif. 8, 1 (2020), 342–373.
[19] Giordano, M., and Nickl, R. Consistency of Bayesian inference with Gaussian process priors in an elliptic inverse problem. Inverse Problems 36, 8 (2020), 085001–85036.
[20] Giordano, M., and Ray, K. Nonparametric Bayesian inference for reversible multidimensional diffusions. The Annals of Statistics 50, 5 (2022), 2872–2898.
[21] Gugushvili, S., van der Vaart, A., and Yan, D. Bayesian linear inverse problems in regularity scales. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 56, 3 (2020), 2081 – 2107.
[22] Hall, P., and Marron, J. On variance estimation in nonparametric regression. Biometrika 77, 2 (1990), 415–419.
[23] Haroske, D. D., and Triebel, H. Distributions, Sobolev Spaces, Elliptic Equations. EMS Press, 2007.
[24] Isakov, V. Inverse problems for partial differential equations, third ed., vol. 127 of Applied Mathematical Sciences. Springer, Cham, 2017.
[25] Kaipio, J., and Somersalo, E. Statistical and Computational Inverse Problems. No. 160 in Applied Mathematical Sciences. Springer-Verlag New York, 2004.
[26] Kejzlar, V., Son, M., Bhattacharya, S., and Maiti, T. A fast and calibrated computer model emulator: an empirical Bayes approach. Statistics and Computing 31, 4 (2021), 49.
[27] Kekkonen, H., Lassas, M., and Siltanen, S. Posterior consistency and convergence rates for Bayesian inversion with hypoelliptic operators. Inverse Problems 32, 8 (2016), 085005, 31.
[28] Knapik, B., Szabò, B., van der Vaart, A. W., and van Zanten, H. Bayes procedures for adaptive inference in inverse problems for the white noise model. Probab. Theory Relat. Fields 164 (2015), 771–813.
[29] Knapik, B., van der Vaart, A. W., and van Zanten, J. H. Bayesian inverse problems with Gaussian priors. Ann. Statist. 39, 5 (2011), 2626–2657.
[30] Knapik, B. T., van der Vaart, A. W., and van Zanten, J. H. Bayesian recovery of the initial condition for the heat equation. Comm. Statist. Theory Methods 42, 7 (2013), 1294–1313.
[31] Lehtinen, M. S. On statistical inversion theory. Theory and applications of inverse problems 67 (1988), 46–57.
[32] Lehtinen, M. S., Paivarinta, L., and Somersalo, E. Linear inverse problems for generalised random variables. Inverse Problems 5, 4 (1989), 599.
[33] Lions, J. L., and Magenes, E. Non-Homogeneous Boundary Value Problems and Applications, 1 ed. Grundlehren der mathematischen Wissenschaften. Springer-Verlag Berlin Heidelberg, 1972.
[34] Monard, F., Nickl, R., and Paternain, G. P. Efficient nonparametric Bayesian inference for $X$ -ray transforms. Ann. Statist. 47, 2 (2019), 1113–1147.
[35] Monard, F., Nickl, R., and Paternain, G. P. Consistent inversion of noisy non-Abelian X-ray transforms. Comm. Pure Appl. Math. 74, 5 (2021), 1045–1099.
[36] Nickl, R., van de Geer, S., and Wang, S. Convergence rates for penalized least squares estimators in PDE constrained regression problems. SIAM/ASA J. Uncertain. Quantif. 8, 1 (2020), 374–413.
[37] Rasmussen, C. E., and Williams, C. K. I. Gaussian processes for machine learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 2006.
[38] Rice, J. Bandwidth choice for nonparametric regression. The Annals of Statistics (1984), 1215–1230.
[39] Rousseau, J., and Szabo, B. Asymptotic behaviour of the empirical Bayes posteriors associated to maximum marginal likelihood estimator. The Annals of Statistics 45, 2 (2017), 833 – 865.
[40] Snieder, R., and Trampert, J. Inverse problems in geophysics. In Wavefield Inversion (Vienna, 1999), A. Wirgin, Ed., Springer Vienna, pp. 119–190.
[41] Stuart, A. M. Inverse problems: a Bayesian perspective. Acta Numer. 19 (2010), 451–559.
[42] Tarantola, A. Inverse problem theory and methods for model parameter estimation. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2005.
[43] Tarantola, A., Valette, B., et al. Inverse problems= quest for information. Journal of geophysics 50, 1 (1982), 159–170.
[44] Taylor, M. E. Partial Differential Equations I. Springer New York, NY, 2011.
[45] Teckentrup, A. L. Convergence of Gaussian process regression with estimated hyper-parameters and applications in Bayesian inverse problems. SIAM/ASA Journal on Uncertainty Quantification 8, 4 (2020), 1310–1337.
[46] Wahba, G. Improper priors, spline smoothing and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society Series B: Statistical Methodology 40, 3 (1978), 364–372.
[47] Yeh, W. W.-G. Review of parameter identification procedures in groundwater hydrology: The inverse problem. Water Resources Research 22, 2 (1986), 95–108.