A novel model reduction method to solve inverse problems of parabolic type

Wenlong Zhang (corresponding author). Department of Mathematics & National Center for Applied Mathematics Shenzhen, Southern University of Science and Technology (SUSTech), 1088 Xueyuan Boulevard, University Town of Shenzhen, Xili, Nanshan, Shenzhen, Guangdong Province, P.R. China ([email protected]).

Zhiwen Zhang (corresponding author). Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong SAR, P.R. China ([email protected]).

Abstract

In this paper, we propose novel proper orthogonal decomposition (POD)–based model reduction methods that effectively address the issue of inverse crime in solving parabolic inverse problems. Both inverse initial value problems and inverse source problems are studied. By leveraging the inherent low-dimensional structures present in the data, our approach reduces the complexity of the forward model without compromising the accuracy of the inverse problem solution. We also provide a convergence analysis of the proposed methods for solving parabolic inverse problems. Through extensive experimentation and comparative analysis, we demonstrate the effectiveness of our method in overcoming inverse crime and achieving improved inverse problem solutions. The proposed POD model reduction method offers a promising direction for improving the reliability and applicability of inverse problem-solving techniques in various domains.

AMS subject classification: 35R30, 65J20, 65M12, 65N21, 78M34.

Keywords: parabolic inverse problem; regularization method; model reduction method; inverse crime; convergence analysis.

1 Introduction

Inverse crime, the phenomenon where the forward model used for solving an inverse problem is the same as the one used for generating the data, poses a significant challenge to obtaining accurate and reliable inverse problem solutions.

Inverse problems arise in various fields of science and engineering, ranging from medical imaging and geophysics to material science and finance. Inverse problems require the estimation of an unknown parameter or field of interest from indirect measurements, which are often noisy and incomplete. The solution of inverse problems is challenging due to the ill-posedness of the problem, which leads to unstable and non-unique solutions. To overcome these challenges, various regularization techniques have been proposed to impose constraints on the solution space. However, the accuracy and reliability of inverse problem solutions can be significantly impacted by inverse crime.

Inverse crime refers to a situation where the forward model used for generating the data is the same as the one used for solving the inverse problem. This scenario leads to overly optimistic results and underestimates the uncertainties associated with the solution. Inverse crime can be a significant issue in practical applications, where the forward model is often an approximation of the underlying physical system and contains modeling errors and uncertainties.

To overcome the issue of inverse crime, we propose a novel Proper Orthogonal Decomposition (POD) model reduction method for solving inverse problems. The POD method is a data-driven technique that enables the identification of the dominant modes of variability in the data and the construction of a low-dimensional representation of the data. By leveraging the inherent low-dimensional structures present in the data, the POD method enables the reduction of the forward model complexity without compromising the accuracy of the inverse problem solution.

In this paper, we outline our new POD model reduction method for solving inverse problems and demonstrate its effectiveness in overcoming inverse crime. We first introduce the basic principles of the POD method and its application in inverse problems. We then present our new method for addressing the issue of inverse crime by incorporating the POD method into the inverse problem solution process. We demonstrate the performance of our method through extensive experimentation and comparative analysis with state-of-the-art methods. The results show that our proposed POD model reduction method outperforms existing methods in terms of accuracy and reliability, and offers a promising avenue for enhancing the applicability of inverse problem-solving techniques in various domains.

One of the successful model reduction ideas in solving time-evolution problems is the proper orthogonal decomposition (POD) method [14, 4]. The POD method uses the data from an experiment or an accurate numerical simulation and extracts the most energetic modes in the system by using the singular value decomposition. This approach generates low-dimensional structures that can approximate the solutions to the time-evolution problem with high accuracy. The POD method has been used to solve many types of PDEs, including linear parabolic equations [16, 11], Navier‐Stokes equations [11], viscous G-equations [8], Hamilton–Jacobi–Bellman (HJB) equations [12], and optimal control problems [2]. The interested reader is referred to [13, 3, 9] for a comprehensive introduction to the model reduction methods.

In this paper, we will develop a novel POD method to solve the forward and inverse problems of the parabolic type.

To start with, we consider a parabolic equation as follows:

\left\{\begin{aligned} u_{t}+\mathcal{L}u&=f(x)&\mbox{in }\Omega\times(0,T),\\ u(x,t)&=0&\mbox{on }\partial\Omega\times(0,T),\\ u(x,0)&=g(x)&\mbox{in }\Omega\,,\end{aligned}\right.

(1.1)

where $\Omega\subset\mathbb{R}^{d}$ $(d=1,2,3)$ is a bounded domain with a $C^{2}$ boundary or a convex domain satisfying the uniform cone condition, $\mathcal{L}$ denotes a second-order elliptic operator given by $\mathcal{L}u=-\nabla\cdot(q(x)\nabla u)+c(x)u$, and $g(x)$ is the initial condition. We assume the elliptic operator $\mathcal{L}$ is uniformly elliptic, i.e., there exist $q_{\min},q_{\max}>0$ such that $q_{\min}<q(x)<q_{\max}$ for all $x\in\Omega$. Additionally, we assume $q(x)\in C^{1}(\bar{\Omega})$, $c(x)\in C(\bar{\Omega})$ and $c(x)\geq 0$.

Let $u$ represent the solution of the parabolic equation (1.1). We define the forward operator $\mathcal{S}$ by $\mathcal{S}(f,g)=u(\cdot,T)$. The forward problem involves computing the solution $u(\cdot,t)$ for $t>0$ given the source term $f(x)$ and initial condition $g(x)$. The inverse problem, on the other hand, aims to reconstruct $f(x)$ or $g(x)$ from the final time measurement $m=u(\cdot,T)$. Typically, iterative methods are employed to solve the inverse problem. During each iterative step, one may need to solve the forward problem one or more times. Consequently, the majority of the computational expense is attributed to the solution of the forward problems.

In this paper, we will solve two types of inverse problems:

1. Inverse source problem: recover the source term $f(x)$ using the final time measurement $m=u(\cdot,T)$ and the known initial term $g(x)$.

2. Backward problem: recover the initial term $g(x)$ using the final time measurement $m=u(\cdot,T)$ and the known source term $f(x)$.

Iterative methods are usually used to solve these inverse problems. Each iterative step requires one or more forward solves, so the forward problem accounts for most of the computational cost.

To solve the inverse problem faster, the authors of [17] construct the POD basis functions from snapshot solutions of the parabolic equation (1.1) with fixed source functions. That method accelerates the computation of the inverse source problem. In this paper, we develop a novel POD method to solve the forward and inverse problems of the parabolic type. We give a brief review of the traditional POD method in Appendix A, including the construction of the POD basis functions.

The rest of the paper is organized as follows. In Section 2, we introduce the Adjoint-POD method for solving parabolic inverse source problems and provide error estimates for the proposed method. Similarly, in Section 3, we propose the Adjoint-POD method for solving the parabolic backward problem and provide the corresponding error estimate. In Section 4, we present numerical results to demonstrate the accuracy of our method. Finally, concluding remarks are made in Section 5.

2 Adjoint-POD method for parabolic inverse source problems

The traditional POD method has a drawback: to construct the POD basis functions, one needs to know the source term $f(x)$ or the initial term $g(x)$ in advance. However, in inverse problems, the source term or the initial term is precisely what we want to find. This can lead to the so-called inverse crime, which should be avoided in practice. In [17], the authors studied this issue by assuming that the true source term belongs to a known function class, thus avoiding the inverse crime. However, this approach does not completely address the issue of the inverse crime.

To tackle this challenge, we propose a novel model reduction method for this type of inverse problem: the Adjoint-POD method in this paper. Our new method efficiently solves inverse problems without requiring a priori information about the source term or initial term. By combining the Adjoint method’s strengths with the POD method’s model reduction capabilities, the Adjoint-POD method can efficiently and quickly solve inverse problems while avoiding the inverse crime issue.

2.1 Adjoint-POD method

To demonstrate the idea of the Adjoint-POD method, we will first apply it to solve the inverse source problem. For the inverse source problem of the parabolic equation, the objective is to recover the unknown source term $f(x)$ , given the final time measurement $m(x)=\mathcal{S}(f)=u(\cdot,T)$ . In this case, $u$ satisfies the following equation:

\left\{\begin{aligned} u_{t}+\mathcal{L}u&=f(x)&\mbox{in }\Omega\times(0,T),\\ u(x,t)&=0&\mbox{on }\partial\Omega\times(0,T),\\ u(x,0)&=0&\mbox{in }\Omega\,.\end{aligned}\right.

(2.1)

Here we assume homogeneous boundary and initial conditions, $u(x,t)=0$ on $\partial\Omega\times(0,T)$ and $u(x,0)=0$, for simplicity; otherwise, one just needs to subtract the background solution from the measurement $m(x)$. Since the source term $f(x)$ is unknown, we cannot use the traditional POD method to obtain snapshots. Instead, we will acquire the snapshots from the following adjoint equation:

\left\{\begin{aligned} \tilde{u}_{t}+\mathcal{L}\tilde{u}&=m(x)&\mbox{in }\Omega\times(0,T),\\ \tilde{u}(x,t)&=0&\mbox{on }\partial\Omega\times(0,T),\\ \tilde{u}(x,0)&=0&\mbox{in }\Omega\,.\end{aligned}\right.

(2.2)

Denote the snapshots $\tilde{y}_{k}=\tilde{u}(\cdot,t_{k-1})$, $k=1,\ldots,M+1$, with $M=\frac{T}{\Delta t}$, and $\tilde{y}_{k}=\overline{\partial}\tilde{u}(\cdot,t_{k-M-1})$, $k=M+2,\ldots,2M+1$, with the difference quotient $\overline{\partial}\tilde{u}(\cdot,t_{k})=\frac{\tilde{u}(\cdot,t_{k})-\tilde{u}(\cdot,t_{k-1})}{\Delta t}$, $k=1,\ldots,M$. Then we construct the new POD basis $\{\psi_{1},...,\psi_{N_{\text{pod}}}\}$ from these snapshots of the adjoint equation (2.2), using the method described in Appendix A. Denote $V_{\text{pod}}=\operatorname{span}\{\psi_{1},...,\psi_{N_{\text{pod}}}\}$.
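The snapshot construction above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `pod_basis` is hypothetical, and the Euclidean inner product on a uniform grid is assumed as a stand-in for the $L^{2}(\Omega)$ inner product used in Appendix A (a finite element code would weight by the mass matrix).

```python
import numpy as np

def pod_basis(u_snapshots, dt, n_pod):
    """Build a POD basis from solution snapshots and their difference quotients.

    u_snapshots : (N, M+1) array, column k holds the nodal values of u~(., t_k)
    dt          : time step used to form the difference quotients
    n_pod       : number of POD modes to keep
    Returns the (N, n_pod) basis and the neglected-energy ratio rho.
    Assumption: Euclidean inner product (uniform grid), not a mass-matrix one.
    """
    # Difference quotients (u~(t_k) - u~(t_{k-1})) / dt for k = 1..M
    diff_quot = (u_snapshots[:, 1:] - u_snapshots[:, :-1]) / dt
    # Snapshot matrix with 2M+1 columns: solutions followed by quotients
    Y = np.hstack([u_snapshots, diff_quot])
    # Left singular vectors are the POD modes; squared singular values
    # play the role of the eigenvalues lambda_k
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    energy = s**2
    rho = energy[n_pod:].sum() / energy.sum()
    return U[:, :n_pod], rho
```

In practice one would increase `n_pod` until the returned ratio `rho` falls below a chosen tolerance.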

We consider using these new POD basis functions $\{\psi_{1},...,\psi_{N_{\text{pod}}}\}$ to approximate the forward problem and accelerate the computation. The fully discrete scheme is constructed on $V_{\text{pod}}$ and the solution is denoted by $U_{k}$ for $k=1,\cdots,M$. To be precise, we seek numerical solutions $U_{k}$ such that

(\bar{\partial}U_{k},\psi)+a(U_{k},\psi)=(f,\psi),\quad\forall\psi\in{V_{\text{pod}}}.

(2.3)

Here the bilinear form $a(u,v)=(q\nabla u,\nabla v)+(cu,v)$ . We define the solution operator from the source term $f$ to the final time solution $U_{M}$ as $\mathcal{S}_{\text{pod}}$ , i.e., $\mathcal{S}_{\text{pod}}f=U_{M}$ . Using the new POD basis functions and the reduced-order model represented by $\mathcal{S}_{\text{pod}}$ , we can efficiently solve the forward problem for each time step, significantly reducing the computational cost compared to the full-scale model. This approach is particularly useful when solving inverse problems, where multiple forward problem evaluations are required.
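As an illustration of the reduced scheme (2.3), the following sketch performs the backward Euler time stepping in POD coordinates. It is a sketch under stated assumptions: `reduced_source_solve` is a hypothetical name, `A` is any symmetric matrix discretizing $\mathcal{L}$ with the boundary condition built in, and the basis `Psi` is assumed to have orthonormal columns in the Euclidean inner product.

```python
import numpy as np

def reduced_source_solve(Psi, A, f, dt, n_steps):
    """Backward Euler for (2.3) in POD coordinates; returns U_M on the full grid.

    Psi : (N, r) POD basis with orthonormal columns (assumed)
    A   : (N, N) discretization of the elliptic operator (assumed given)
    f   : (N,) nodal values of the source term
    """
    r = Psi.shape[1]
    A_r = Psi.T @ A @ Psi            # reduced stiffness matrix, r x r
    f_r = Psi.T @ f                  # reduced source vector
    lhs = np.eye(r) + dt * A_r       # (I + dt*A_r) c_k = c_{k-1} + dt*f_r
    c = np.zeros(r)                  # zero initial condition, U_0 = 0
    for _ in range(n_steps):
        c = np.linalg.solve(lhs, c + dt * f_r)
    return Psi @ c                   # lift back to the full grid
```

Each step solves an $r\times r$ system instead of an $N\times N$ one, which is where the savings come from when the forward map is evaluated many times inside an iterative inversion.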

2.2 Convergence of the Adjoint-POD method

We will first revisit an important property of the eigenvalue distribution for the classical elliptic operator $\mathcal{L}$ [1, 7].

Proposition 2.1

Suppose $\Omega$ is a bounded domain in $\mathbb{R}^{d}$ and $q(x),c(x)\in C^{0}(\bar{\Omega})$, $c(x)\geq 0$. Then the eigenvalue problem

\displaystyle\mathcal{L}\psi=\mu\,\psi~{}~{}\text{with}~{}~{}\psi|_{\partial\Omega}=0

(2.4)

has a countable set of positive eigenvalues $\mu_{1}\leq\mu_{2}\leq\cdots$ , with its corresponding eigenfunctions $\{\phi_{k}\}_{k=1}^{\infty}$ forming an orthogonal basis of $L^{2}(\Omega)$ . Moreover, there exist constants $C_{1},C_{2}>0$ such that $C_{1}k^{2/d}\leq\mu_{k}\leq C_{2}k^{2/d}$ for all $k=1,2,\cdots.$

From the proposition above, the eigenfunction set $\{\phi_{k}\}_{k=1}^{\infty}$ forms an orthogonal basis of $L^{2}(\Omega)$. Then for any $f\in L^{2}(\Omega)$, we write $f=\sum_{k=1}^{\infty}f_{k}\phi_{k}$ for a set of coefficients $f_{k}$. Let $u=\sum_{k=1}^{\infty}u_{k}(t)\phi_{k}$ be the solution of the problem (2.1). Substituting these two expressions of $f$ and $u$ into the first equation of (2.1), noting that $\mathcal{L}\phi_{k}=\mu_{k}\phi_{k}$, and comparing the coefficients of $\phi_{k}$ on both sides of the equation, we get $u_{k}(0)=0$ and

u^{\prime}_{k}(t)+\mu_{k}u_{k}=f_{k}\quad\quad\text{in}~{}(0,T)\,.

(2.5)

This equation expresses the time evolution of the coefficients $u_{k}(t)$ in terms of the coefficients $f_{k}$ of the source term $f$. We can write the solution as $u_{k}(T)=\alpha_{k}\,f_{k}$, with $\alpha_{k}=e^{-\mu_{k}T}\int_{0}^{T}e^{\mu_{k}s}ds=\frac{1}{\mu_{k}}(1-e^{-\mu_{k}T})$. Noting that $Sf=u(\cdot,T)=\sum^{\infty}_{k=1}u_{k}(T)\phi_{k}$, we can formally write

S\Big{(}\sum_{k=1}^{\infty}f_{k}\phi_{k}\Big{)}=\sum_{k=1}^{\infty}\alpha_{k}f_{k}\phi_{k}.

This representation of the solution operator $S$ provides a convenient way to compute the solution $u(\cdot,T)$ using the eigenfunctions $\phi_{k}$ and the coefficients $\alpha_{k}$ . For simplicity, we approximate the source term $f(x)$ by a finite-dimensional truncation, i.e.

f_{\text{app}}=\sum_{k=1}^{L}f_{k}\phi_{k}.

(2.6)

Then, the solution $u(x,t)$ of the parabolic equation has the form:

u(\cdot,T)=\sum^{L}_{k=1}\frac{1}{\mu_{k}}(1-e^{-\mu_{k}T})f_{k}\phi_{k}.

(2.7)
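The truncated formula (2.7) is straightforward to evaluate once the eigenvalues are known. The sketch below is illustrative only: the helper names are hypothetical, and the usage assumes the model case $\mathcal{L}=-d^{2}/dx^{2}$ on $(0,1)$, for which $\mu_{k}=(k\pi)^{2}$.

```python
import numpy as np

def alpha(mu, T):
    """Multiplier alpha_k = (1 - exp(-mu_k T)) / mu_k from the text.

    np.expm1 is used so that the expression stays accurate for small mu*T.
    """
    return -np.expm1(-mu * T) / mu

def forward_coefficients(f_coeffs, mu, T):
    """Coefficients of u(., T) in the eigenbasis, i.e. alpha_k * f_k as in (2.7)."""
    return alpha(mu, T) * f_coeffs

# Assumed model problem: L = -d^2/dx^2 on (0, 1), so mu_k = (k*pi)^2
mu = (np.arange(1, 6) * np.pi) ** 2
a = alpha(mu, 1.0)
```

Since $\alpha_{k}$ behaves like $1/\mu_{k}$ for large $k$, high-frequency components of $f$ are damped in the data, which is one way to see why inverting $S$ amplifies noise and requires regularization.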

Since the source term of the adjoint equation (2.2) is $m=u(\cdot,T)$, whose coefficients are $\alpha_{k}f_{k}$, the same calculation gives

\tilde{u}(x,t)=\sum^{L}_{k=1}\frac{1}{\mu_{k}^{2}}(1-e^{-\mu_{k}T})(1-e^{-\mu_{k}t})f_{k}\phi_{k}.

(2.8)

In fact, computing the POD basis (A.4) amounts to a singular value decomposition of the matrix $\tilde{A}=(\tilde{y}_{1},...,\tilde{y}_{M})$, where $\tilde{y}_{j}=(\tilde{u}(x_{1},t_{j}),...,\tilde{u}(x_{N},t_{j}))^{T}$. Here $x_{1},...,x_{N}$ are the finite element nodes in $\Omega$. Suppose $\tilde{A}$ has the singular value decomposition $\tilde{A}=U\Sigma V$; then the $\psi_{k}$'s are exactly the first $N_{\text{pod}}$ columns of $U$.

Let us denote $A=(y_{1},...,y_{M})$, where $y_{j}$ collects the nodal values of $u(\cdot,t_{j})$, and $\tilde{A}=(\tilde{y}_{1},...,\tilde{y}_{M})$, the matrix $\Phi=(\phi_{1}(\vec{x}),...,\phi_{L}(\vec{x}))$, $F=\text{diag}(f_{1},...,f_{L})$ and $D=\text{diag}(\frac{1}{\mu_{1}}(1-e^{-\mu_{1}T}),...,\frac{1}{\mu_{L}}(1-e^{-\mu_{L}T}))$. Additionally, let us define an $L\times M$ matrix $J$ with entries $J(i,j)=\frac{1}{\mu_{i}}(1-e^{-\mu_{i}t_{j}})$. $\Phi$ is a column orthogonal matrix due to the orthonormality of the eigenfunctions $\phi_{k}$. Utilizing the formulations of $u$ and $\tilde{u}$, we can represent the matrices $A$ and $\tilde{A}$ as follows:

A=\Phi FJ,~{}~{}\text{and}~{}~{}\tilde{A}=\Phi DFJ.

(2.9)

Proposition A.1 demonstrates that the low-rank space $V_{\text{pod}}$ is the best $N_{\text{pod}}$ -rank approximation of the column space of $\tilde{A}$ . Our objective is to show that $V_{\text{pod}}$ is also a good approximation of the column space of $A$ , which will validate the efficiency of the new POD method. To begin, let us establish the relationship between the matrices $A$ and $\tilde{A}$ .

Lemma 2.2

If $L\leq M$ , then $span\{y_{1},...,y_{M}\}=span\{\tilde{y}_{1},...,\tilde{y}_{M}\}$ , i.e. $C(A)=C(\tilde{A})$ .

Proof. Here we provide a concise proof for the case when the eigenvalues $\mu_{j}$ are distinct from each other. The proof for the case of repeated eigenvalues follows a similar approach. To demonstrate the desired result, we need to show the existence of matrices $P$ and $\tilde{P}$ such that

\Phi DFJP=\Phi FJ,

\Phi DFJ=\Phi FJ\tilde{P}.

We will only present a brief proof for the first equality, as the second can be derived in a similar manner.

Since the columns of $\Phi$ are independent and the diagonal matrix $F$ is invertible, it suffices to prove the existence of a matrix $P$ such that

JP=DJ.

First, we show that the matrix $J_{L\times L}^{\prime}$ with entries $J^{\prime}(i,j)=1-e^{-\mu_{i}t_{j}}$ is invertible. Denote the vector $\textbf{e}=(1,...,1)^{T}$ . Then, we can express $J^{\prime}$ as

J^{\prime}=\textbf{e}\textbf{e}^{T}-V_{L},

where $V_{L}(i,j)=e^{-\mu_{i}t_{j}}$ is a Vandermonde matrix. To prove the invertibility of $J^{\prime}$ , we assume, by contradiction, that $J^{\prime}$ is singular. In that case, there exists a nonzero vector $\textbf{c}=(c_{1},...,c_{L})^{T}$ such that

J^{\prime}\textbf{c}=0,

or equivalently,

V_{L}\textbf{c}=\textbf{e}\textbf{e}^{T}\textbf{c}.

Now, consider the function $f(x)=\sum_{j=1}^{L}c_{j}e^{-xt_{j}}$. Under this assumption, we have that

f(0)=f(\mu_{1})=f(\mu_{2})=\cdots=f(\mu_{L}),

which implies that the function $f(x)-f(0)$ has $L+1$ distinct zeros. By Rolle's theorem, the derivative $f^{\prime}(x)=-\sum_{j=1}^{L}c_{j}t_{j}e^{-xt_{j}}$ has $L$ distinct zeros. Since $\textbf{c}$ is a nonzero vector and the $\mu_{j}$'s are all nonzero, this implies that the Vandermonde matrix $V_{L}$ is singular, which contradicts the fact that $V_{L}$ is an invertible matrix. Consequently, $J^{\prime}$ must be a nonsingular matrix.

By the invertibility of $J^{\prime}$, the first $L$ columns of $J$ are independent and thus form a basis for $\mathbb{R}^{L}$. Similarly, the matrix $DJ$ also has independent columns that form a basis for $\mathbb{R}^{L}$. Consequently, there must exist a matrix $P$ such that

JP=DJ.

This result establishes that the spans of the sets $\{y_{1},...,y_{M}\}$ and $\{\tilde{y}_{1},...,\tilde{y}_{M}\}$ are equal, i.e., $\operatorname{span}\{y_{1},\dots,y_{M}\}=\operatorname{span}\{\tilde{y}_{1},\dots,\tilde{y}_{M}\}$. This ends the proof.
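The statement of Lemma 2.2 can be checked numerically on a small example. The sketch below is only a sanity check under assumptions that are not part of the paper: the model operator $\mathcal{L}=-d^{2}/dx^{2}$ on $(0,1)$ with $\mu_{k}=(k\pi)^{2}$, discrete sine vectors as the columns of $\Phi$, hypothetical coefficients $f_{k}$, and a relative tolerance of $10^{-10}$ for the numerical rank.

```python
import numpy as np

def numerical_rank(mat, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(mat, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

L, M, N, T = 3, 6, 40, 0.1                    # L <= M as required by the lemma
x = np.arange(1, N + 1) / (N + 1)             # interior grid nodes
k = np.arange(1, L + 1)
mu = (k * np.pi) ** 2                         # assumed eigenvalues
t = T * np.arange(1, M + 1) / M               # snapshot times t_1..t_M

# Discrete sine vectors: orthonormal columns in the Euclidean inner product
Phi = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(x, k))
F = np.diag([1.0, 0.5, 0.25])                 # hypothetical nonzero f_k
D = np.diag((1.0 - np.exp(-mu * T)) / mu)
J = (1.0 - np.exp(-np.outer(mu, t))) / mu[:, None]

A = Phi @ F @ J                               # snapshots of u, as in (2.9)
A_tilde = Phi @ D @ F @ J                     # snapshots of u~, as in (2.9)

# Equal column spaces: stacking the two matrices does not raise the rank
ranks = (numerical_rank(A), numerical_rank(A_tilde),
         numerical_rank(np.hstack([A, A_tilde])))
```

All three ranks equal $L$, which is what $C(A)=C(\tilde{A})$ predicts. For larger $L$ or $T$ the matrix $J$ becomes severely ill-conditioned, so a rank test in floating point is only meaningful for small examples like this one.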

Based on Proposition A.1, the new POD basis effectively approximates the set $\{\tilde{y}_{1},...,\tilde{y}_{M}\}$ . Given the previous results, we can now demonstrate that the new POD basis also serves as a good approximation for the original set $\{y_{1},...,y_{M}\}$ .

Theorem 2.3

Using the same notation as in Proposition A.1, if a sufficient number of snapshots are available, i.e. $L\leq M$ , then the following approximation error bound holds:

\frac{\sum_{i=1}^{M}\left|\left|y_{i}-P_{\text{pod}}y_{i}\right|\right|_{L^{2}(\Omega)}^{2}}{\sum_{i=1}^{M}\left|\left|y_{i}\right|\right|_{L^{2}(\Omega)}^{2}}\leq CL^{4/d}\rho,

(2.10)

where $P_{\text{pod}}$ is the projection operator onto the adjoint-POD space $\operatorname{span}\{\psi_{1},\dots,\psi_{N_{\text{pod}}}\}$ and $\rho=\frac{\sum_{k={N_{\text{pod}}}+1}^{2M+1}\lambda_{k}}{\sum_{k=1}^{2M+1}\lambda_{k}}$.

Proof. In the following proof, we assume $L=M$ for simplicity. For the case $L<M$ , the proof is similar.

Using the same notation as in Lemma 2.2, $\Phi$ and $J$ are both invertible square matrices. Then there exists a unique matrix $P$ such that

\Phi DFJP=\Phi FJ,

and $P=J^{-1}D^{-1}J$ .

Hence $y_{j}=\sum^{L}_{i=1}P_{ij}\tilde{y}_{i}$ . Then by using the Cauchy-Schwarz inequality, for any $1\leq j\leq L$ , we have that

\displaystyle\|y_{j}-P_{\text{pod}}y_{j}\|^{2}\leq\sum_{i=1}^{L}P^{2}_{ij}\sum_{i=1}^{L}\|\tilde{y}_{i}-P_{\text{pod}}\tilde{y}_{i}\|^{2}.

(2.11)

Hence,

\displaystyle\sum_{j=1}^{L}\|y_{j}-P_{\text{pod}}y_{j}\|^{2}\leq\sum_{i,j=1}^{L}P^{2}_{ij}\sum_{i=1}^{L}\|\tilde{y}_{i}-P_{\text{pod}}\tilde{y}_{i}\|^{2}=\|P\|^{2}_{F}\sum_{i=1}^{L}\|\tilde{y}_{i}-P_{\text{pod}}\tilde{y}_{i}\|^{2}.

(2.12)

The rest is to estimate the Frobenius norm of $P$ . Since $P=J^{-1}D^{-1}J$ , we define $\|P\|_{d}=\|D^{-1}\|_{2}$ . It is easy to verify that $\|\cdot\|_{d}$ is a matrix norm. Then,

\|P\|_{F}\leq C\|P\|_{d}=C\|D^{-1}\|_{2}\leq C\mu_{L}.

On the other hand, since $\Phi$ is an orthogonal matrix, we have,

\begin{aligned}\sum_{j=1}^{L}\|\tilde{y}_{j}\|^{2}&=\|\Phi DFJ\|_{F}^{2}\\ &=\|DFJ\|_{F}^{2}\leq\|D\|_{F}^{2}\|FJ\|_{F}^{2}\\ &\leq C\|FJ\|_{F}^{2}\\ &\leq C\sum_{j=1}^{L}\|y_{j}\|^{2}.\end{aligned}

(2.13)-(2.16)

Combining this with Proposition A.1, we finally have

\begin{aligned}\frac{\sum_{i=1}^{M}\left|\left|y_{i}-P_{\text{pod}}y_{i}\right|\right|_{L^{2}(\Omega)}^{2}}{\sum_{i=1}^{M}\left|\left|y_{i}\right|\right|_{L^{2}(\Omega)}^{2}}&\leq C\|P\|^{2}_{F}\frac{\sum_{i=1}^{M}\left|\left|\tilde{y}_{i}-P_{\text{pod}}\tilde{y}_{i}\right|\right|_{L^{2}(\Omega)}^{2}}{\sum_{i=1}^{M}\left|\left|\tilde{y}_{i}\right|\right|_{L^{2}(\Omega)}^{2}}\\ &\leq C\mu_{L}^{2}\rho.\end{aligned}

(2.17)-(2.18)

The conclusion comes with the estimation $\mu_{i}\leq Ci^{2/d}$ .

2.3 Convergence of inverse parabolic source problem

To solve this inverse source problem, we use the well-established Tikhonov regularization method, expressed as

\displaystyle\mathop{\rm min}\limits_{f\in X}\|\mathcal{S}(f)-m\|_{L^{2}(% \Omega)}^{2}+\lambda\|f\|_{L^{2}(\Omega)}^{2}.

(2.19)

However, in the conventional application of the POD method, the source term $f$ and the initial condition $g$ must be determined initially to generate snapshots and obtain the POD basis functions. In the context of inverse problems, the only available information is the measurement $m(x)$ . This predicament, referred to as the inverse crime, makes this method impossible to implement in practice. Our new method could overcome this vital drawback by setting the forward solver to be our new POD forward solver.

In the general discrete approximation of problem (2.19), we seek to solve the following least-squares regularized optimization problem:

\displaystyle\mathop{\rm min}\limits_{f\in V_{\text{pod}}}\|\mathcal{S}_{\text{pod}}(f)-m\|_{L^{2}(\Omega)}^{2}+\lambda\|f\|_{L^{2}(\Omega)}^{2}.

(2.20)

Consider the functional $\mathcal{J}_{\text{pod}}[f]=\|\mathcal{S}_{\text{pod}}f-m\|_{L^{2}(\Omega)}^{2}+\lambda\|f\|_{L^{2}(\Omega)}^{2}$. By computing the Fréchet derivative of $\mathcal{J}_{\text{pod}}[f]$, we can derive the following iterative scheme:

\displaystyle f_{k+1}=f_{k}-\beta d\mathcal{J}_{\text{pod}}[f_{k}],\quad\forall k\in\mathbb{N},

(2.21)

where $\beta$ is the step size, $d\mathcal{J}_{\text{pod}}[f]=\mathcal{S}_{\text{pod}}^{*}(\mathcal{S}_{\text{pod}}f-m)+\lambda f$ denotes the Fréchet derivative, and $f_{0}$ is an initial guess [17].
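The iteration (2.21) can be sketched for a generic matrix representation of the forward operator. This is a minimal illustration, not the authors' code: `S` stands for any matrix representing $\mathcal{S}_{\text{pod}}$ in a chosen basis (an assumption), its transpose plays the role of the adjoint $\mathcal{S}_{\text{pod}}^{*}$, and the function name is hypothetical.

```python
import numpy as np

def tikhonov_gradient_descent(S, m, lam, beta, n_iter, f0=None):
    """Fixed-step iteration f <- f - beta * (S^T (S f - m) + lam * f).

    S    : (n, p) matrix standing in for the reduced forward operator
    m    : (n,) measurement vector
    lam  : Tikhonov regularization parameter
    beta : step size; the iteration converges when
           beta < 2 / (sigma_max(S)^2 + lam)
    """
    f = np.zeros(S.shape[1]) if f0 is None else np.array(f0, dtype=float)
    for _ in range(n_iter):
        grad = S.T @ (S @ f - m) + lam * f   # derivative as written in the text
        f = f - beta * grad
    return f
```

The fixed point of this iteration solves the normal equations $(S^{T}S+\lambda I)f=S^{T}m$, so on small problems the result can be compared against a direct solve. The factor 2 from differentiating the squared norms is absorbed into the step size $\beta$, following the form of the derivative given in the text.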

The above theory is based on the noise-free case, i.e., the final time measurement $m=u(\cdot,T)$ is assumed to be precisely known. However, in practical applications, measurement data often contain uncertainties. We assume the measurement data is corrupted by noise and takes the discrete form

m^{n}_{i}=u(d_{i},T)+e_{i},\quad i=1,\cdots,n,

where the $d_{i}$'s represent the positions of detectors and $\{e_{i}\}^{n}_{i=1}$ are independent and identically distributed (i.i.d.) random variables on an appropriate probability space $(\mathfrak{X},\mathcal{F},\mathbb{P})$. Based on [6] and the analysis therein, we know that $\|u\|_{C([0,T];H^{2}(\Omega))}\leq C\|f\|_{L^{2}(\Omega)}$. According to the embedding theorem of Sobolev spaces, $H^{2}(\Omega)$ is continuously embedded into $C(\bar{\Omega})$, so that $u(\cdot,T)$ is well defined pointwise for all $d_{i}\in\Omega$. Without loss of generality, we assume that the scattered locations $\{d_{i}\}_{i=1}^{n}$ are uniformly distributed in $\Omega$. That is, there exists a constant $B>0$ such that ${d_{\max}}/{d_{\min}}\leq B$, where ${d_{\max}}$ and ${d_{\min}}$ are defined by

\displaystyle d_{\max}=\mathop{\rm sup}\limits_{x\in\Omega}\mathop{\rm inf}\limits_{1\leq i\leq n}|x-d_{i}|~{}~{}~{}\mbox{and}~{}~{}~{}d_{\min}=\mathop{\rm inf}\limits_{1\leq i\neq j\leq n}|d_{i}-d_{j}|.

(2.22)

We will first use the technique developed in [5] to recover the final time measurement $u(\cdot,T)$ from the noisy data $m^{n}_{i}$ for $i=1,...,n$ . We approximate $u(\cdot,T)$ by solving the following minimization problem:

\displaystyle m=\mathop{\rm argmin}\limits_{u\in X}\frac{1}{n}\sum_{i=1}^{n}(u(d_{i})-m^{n}_{i})^{2}+\alpha|u|_{H^{2}(\Omega)}^{2}.

(2.23)
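A one-dimensional discrete analogue of the smoothing problem (2.23) is sketched below. It is not the algorithm of [5]: it assumes detectors on a uniform grid of spacing `h`, replaces the $H^{2}$ seminorm by a second-difference penalty, and the function name is hypothetical.

```python
import numpy as np

def smooth_measurements(m_noisy, h, alpha):
    """Minimize (1/n) * sum (u_i - m_i)^2 + alpha * h * sum (second difference)^2.

    m_noisy : (n,) noisy point values at uniformly spaced detectors
    h       : detector spacing (uniform grid is an assumption)
    alpha   : regularization parameter
    """
    n = m_noisy.size
    # (n-2) x n second-difference operator approximating u''
    D2 = (np.eye(n - 2, n) - 2.0 * np.eye(n - 2, n, k=1)
          + np.eye(n - 2, n, k=2)) / h**2
    # Normal equations of the quadratic functional
    lhs = np.eye(n) / n + alpha * h * (D2.T @ D2)
    return np.linalg.solve(lhs, m_noisy / n)
```

Data that are exactly linear in the detector position have zero discrete second derivative and are returned unchanged, while oscillatory noise is damped more strongly as `alpha` grows.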

Assume the pointwise noise $e_{i}$ has variance bounded by $\sigma^{2}$, where $\sigma$ is referred to as the noise level. The authors of [5] analyzed this problem and provided optimal convergence results. Moreover, they proposed an a posteriori algorithm to obtain the best approximation without knowing the true solution $m$ and the noise level $\sigma$. Here, we list their main results. If one chooses the optimal regularization parameter

\alpha^{1/2+d/8}=O(\sigma n^{-1/2}\|u(\cdot,T)\|^{-1}_{H^{2}(\Omega)}),

then the solution $m$ of (2.23) achieves the optimal convergence

\mathbb{E}\big{[}\|u(\cdot,T)-m\|_{L^{2}(\Omega)}\big{]}\leq C\alpha^{1/2}\|u(\cdot,T)\|_{H^{2}(\Omega)}.

(2.24)

If, in addition, the noise variables $\{e_{i}\}^{n}_{i=1}$ are independent Gaussian random variables with variance $\sigma^{2}$, we further have

\mathbb{P}(\|u(\cdot,T)-m\|_{L^{2}(\Omega)}\geq\alpha^{1/2}\|u(\cdot,T)\|_{H^{2}(\Omega)}z)\leq 2e^{-Cz^{2}}.

(2.25)

Using this recovered function $m(x)$, we generate the adjoint POD basis functions as in Section 2.1. It can be easily shown that, even with this uncertainty, the POD basis functions still give a good low-rank approximation of the snapshots $\{y_{1},...,y_{M}\}$. Combining Theorem 2.3 and (2.24), we have that for any $1\leq i\leq M$,

\left|\left|y_{i}-P_{\text{pod}}y_{i}\right|\right|_{L^{2}(\Omega)}^{2}\leq C(ML^{4/d}\rho+\alpha)\|f\|^{2}_{L^{2}(\Omega)}.

(2.26)

Since we replace the source term by the finite truncation (2.6), a truncation error is also introduced. If $f\in H^{1}(\Omega)$, then

\|f-f_{\text{app}}\|_{L^{2}}\leq C\frac{\|\nabla f\|_{L^{2}}}{\sqrt{\mu_{L}}}\leq C\frac{\|\nabla f\|_{L^{2}}}{{L^{1/d}}}.

(2.27)

If $f\in L^{2}(\Omega)$ , then $f_{\text{app}}\rightarrow f$ as $L\rightarrow+\infty$ . We assume

\|f-f_{\text{app}}\|^{2}_{L^{2}}\leq\varepsilon,

(2.28)

where $\varepsilon$ depends on $L$. With these results, and using a technique similar to the proof of Theorem 4.1 in [17], we obtain the following convergence results.

Theorem 2.4

Let $\{e_{i}\}^{n}_{i=1}$ be independent random variables satisfying $\mathbb{E}[e_{i}]=0$ and $\mathbb{E}[e^{2}_{i}]\leq\sigma^{2}$ for $i=1,\cdots,n$ . Set $\alpha^{1/2+d/8}=O(\sigma n^{-1/2}\|u^{*}(\cdot,T)\|^{-1}_{H^{2}(\Omega)})$ in (2.23), and $\lambda=O(ML^{4/d}\rho+\alpha)$ , then

\displaystyle\mathbb{E}\big{[}\|\mathcal{S}f^{*}-\mathcal{S}_{\text{pod}}f_{\text{pod}}\|_{L^{2}(\Omega)}^{2}\big{]}\leq C\lambda\|f^{*}\|^{2}_{L^{2}(\Omega)}+C\varepsilon,

(2.29)

\displaystyle\mathbb{E}\big{[}\|f^{*}-f_{\text{pod}}\|_{L^{2}(\Omega)}^{2}\big{]}\leq C\|f^{*}\|^{2}_{L^{2}(\Omega)}+C\varepsilon,

(2.30)

and

\displaystyle\mathbb{E}\big{[}\|f^{*}-f_{\text{pod}}\|_{H^{-1}(\Omega)}^{2}\big{]}\leq C\lambda^{1/2}\|f^{*}\|^{2}_{L^{2}(\Omega)}+C\varepsilon.

(2.31)

Furthermore, if the noise variables $\{e_{i}\}^{n}_{i=1}$ are independent Gaussian random variables with variance $\sigma^{2}$, a stronger type of convergence holds. We refer to [6] for a similar proof and only list the results here.

Theorem 2.5

Let $\{e_{i}\}^{n}_{i=1}$ be independent Gaussian random variables with variance $\sigma^{2}$. Set $\alpha^{1/2+d/8}=O(\sigma n^{-1/2}\|u^{*}(\cdot,T)\|^{-1}_{H^{2}(\Omega)})$ in (2.23), and $\lambda=O(ML^{4/d}\rho+\alpha)$. Then there exists a constant $C$ such that, for any $z>0$,

\displaystyle\mathbb{P}(\|S_{\text{pod}}f_{\text{pod}}-Sf^{*}\|_{L^{2}(\Omega)}\geq(\lambda^{1/2}\|f^{*}\|_{L^{2}}+\varepsilon)z)\leq 2e^{-Cz^{2}},

(2.32)

\displaystyle\mathbb{P}(\|f_{\text{pod}}-f^{*}\|_{L^{2}(\Omega)}\geq(\|f^{*}\|_{L^{2}}+\varepsilon)z)\leq 2e^{-Cz^{2}},

(2.33)

and

\displaystyle\mathbb{P}(\|f_{\text{pod}}-f^{*}\|_{H^{-1}(\Omega)}\geq(\lambda^{1/4}\|f^{*}\|_{L^{2}}+\varepsilon)z)\leq 2e^{-Cz^{2}}.

(2.34)

3 Parabolic backward problem

For the backward problem of the parabolic equation, our goal is to recover the initial term $g(x)$ , given the final time measurement $m=\mathcal{S}(g)=u(\cdot,T)$ . In this case, $u$ satisfies the following equation:

\left\{\begin{aligned} u_{t}+\mathcal{L}u&=0&\mbox{in }\Omega\times(0,T),\\ u(x,t)&=0&\mbox{on }\partial\Omega\times(0,T),\\ u(x,0)&=g(x)&\mbox{in }\Omega\,,\end{aligned}\right.

(3.1)

Unlike the traditional POD method, we will get the snapshots from the following adjoint equation:

\left\{\begin{aligned} \tilde{u}_{t}+\mathcal{L}\tilde{u}&=0&\mbox{in }\Omega\times(0,T),\\ \tilde{u}(x,t)&=0&\mbox{on }\partial\Omega\times(0,T),\\ \tilde{u}(x,0)&=m(x)&\mbox{in }\Omega\,,\end{aligned}\right.

(3.2)

In this case, the snapshots are generated by solving the adjoint equation (3.2) with the given final time measurement $m(x)$ as the initial condition.

Repeating the standard procedure in Section 2.2, we generate the new POD basis functions $\psi_{k}$ from the snapshots $\big{\{}\tilde{u}(\cdot,t_{0}),\tilde{u}(\cdot,t_{1}),\ldots,\tilde{u}(\cdot,t_{M})\big{\}}$, where $t_{k}=k\Delta t$ with $\Delta t=\frac{T}{M}$ and $k=0,\ldots,M$. Then we have the following error formula similar to Proposition A.1:

\frac{\sum_{i=1}^{2M+1}\left|\left|\tilde{y}_{i}-\sum_{k=1}^{{N_{\text{pod}}}}\big{(}\tilde{y}_{i},\psi_{k}(\cdot)\big{)}_{L^{2}(\Omega)}\psi_{k}(\cdot)\right|\right|_{L^{2}(\Omega)}^{2}}{\sum_{i=1}^{2M+1}\left|\left|\tilde{y}_{i}\right|\right|_{L^{2}(\Omega)}^{2}}=\rho,

(3.3)

where the number $N_{\text{pod}}$ is determined according to the decay of the ratio $\rho=\frac{\sum_{k={N_{\text{pod}}}+1}^{2M+1}\lambda_{k}}{\sum_{k=1}^{2M+1}\lambda_{k}}$.

We consider using these new POD basis functions to approximate the forward problem and accelerate the computation. The fully discrete scheme is constructed on $V_{\text{pod}}$ and the solution is denoted by $U_{k}$ for $k=1,\cdots,M$ with $M=\frac{T}{\Delta t}$. To be precise, we seek numerical solutions $U_{k}$ such that

(\bar{\partial}U_{k},\psi)+a(U_{k},\psi)=0,\quad\forall\psi\in{V_{\text{pod}}},

(3.4)

with $U_{0}=g(x)$. We define the solution operator from the initial term $g$ to the final time solution $U_{M}$ as $\mathcal{S}_{\text{pod}}$, i.e., $\mathcal{S}_{\text{pod}}g=U_{M}$.
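The whole pipeline of this section (adjoint snapshots from (3.2), POD basis, reduced scheme (3.4)) can be illustrated on a small one-dimensional example. The sketch below rests on assumptions that are not from the paper: a finite difference discretization of $\mathcal{L}=-d^{2}/dx^{2}$ on $(0,1)$, a hypothetical two-mode initial term, solution snapshots only (no difference quotients), and the Euclidean inner product in place of $L^{2}(\Omega)$.

```python
import numpy as np

def backward_euler_snapshots(A, u0, dt, n_steps):
    """Return an (N, n_steps+1) array of backward Euler iterates of u_t + A u = 0."""
    lhs = np.eye(A.shape[0]) + dt * A
    snaps = [u0]
    for _ in range(n_steps):
        snaps.append(np.linalg.solve(lhs, snaps[-1]))
    return np.column_stack(snaps)

N, M, T = 30, 20, 0.1
h, dt = 1.0 / (N + 1), T / M
x = h * np.arange(1, N + 1)
# Finite difference Laplacian with homogeneous Dirichlet conditions (assumed model)
A = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2

# Hypothetical "unknown" initial term with two eigenmodes, and its final-time data
g = np.sin(np.pi * x) + 0.5 * np.sin(2 * np.pi * x)
m = backward_euler_snapshots(A, g, dt, M)[:, -1]

# Adjoint snapshots: solve (3.2) with the measurement m as initial condition
Y = backward_euler_snapshots(A, m, dt, M)
U, s, _ = np.linalg.svd(Y, full_matrices=False)
Psi = U[:, :2]                                # N_pod = 2 modes

# Reduced scheme (3.4) started from the projection of g onto the POD space
A_r = Psi.T @ A @ Psi
c = Psi.T @ g
for _ in range(M):
    c = np.linalg.solve(np.eye(2) + dt * A_r, c)
rel_err = np.linalg.norm(Psi @ c - m) / np.linalg.norm(m)
```

Here the basis is built from the data $m$ alone, yet the reduced solver reproduces the full-order final state for the true $g$ up to rounding. This works because $g$ has only two eigenmodes and $M\geq 2$, which is the situation described by the condition $L\leq M$; for general $g$ the error is governed by the truncation and the ratio $\rho$.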

3.1 Convergence of the adjoint-POD method

For any $g\in L^{2}(\Omega)$, we write $g=\sum_{k=1}^{\infty}g_{k}\phi_{k}$ for a set of coefficients $g_{k}$. Let $u=\sum_{k=1}^{\infty}u_{k}(t)\phi_{k}$ be the solution of the problem (3.1). Substituting these two expressions of $g$ and $u$ into (3.1), noting that $\mathcal{L}\phi_{k}=\mu_{k}\phi_{k}$, and comparing the coefficients of $\phi_{k}$ on both sides of the equation, we get $u_{k}(0)=g_{k}$ and

u^{\prime}_{k}(t)+\mu_{k}u_{k}=0\ \ \ \ \mbox{in }~{}(0,T)\,.

We can write the solution as $u_{k}(T)=\alpha_{k}\,g_{k}$ , with $\alpha_{k}=e^{-\mu_{k}T}$ . Noting that $Sg=u(\cdot,T)=\sum^{\infty}_{k=1}u_{k}(T)\phi_{k}$ , we can formally write

S\Big{(}\sum_{k=1}^{\infty}g_{k}\phi_{k}\Big{)}=\sum_{k=1}^{\infty}\alpha_{k}g_{k}\phi_{k}.

This representation shows the relationship between the initial condition $g$ and the solution $u(\cdot,T)$ at the final time $T$ . The operator $S$ maps the initial condition to the solution at time $T$ through the coefficients $\alpha_{k}$ , which depend on the eigenvalues $\mu_{k}$ of the operator $L$ and the final time $T$ . This relationship can be used to analyze the properties of the solution and the backward problem.

For simplicity, we approximate the initial term $g(x)$ by a finite-dimensional truncation, i.e.,

g_{\text{app}}=\sum_{k=1}^{L}g_{k}\phi_{k}.

(3.5)

Then the solution of the parabolic equation at the final time has the form $u(\cdot,T)=\sum^{L}_{k=1}e^{-\mu_{k}T}g_{k}\phi_{k}$. A simple calculation also gives $\tilde{u}(x,t)=\sum^{L}_{k=1}e^{-\mu_{k}T}e^{-\mu_{k}t}g_{k}\phi_{k}$. Choosing the POD basis amounts to computing the singular value decomposition of the matrix $\tilde{A}=(\tilde{y}_{1},...,\tilde{y}_{M})$, where $\tilde{y}_{j}=(\tilde{u}(x_{1},t_{j}),...,\tilde{u}(x_{N},t_{j}))^{T}$. Here $x_{1},...,x_{N}$ are the finite element nodes in $\Omega$. Suppose $\tilde{A}$ has the singular value decomposition $\tilde{A}=U\Sigma V^{T}$; then the $\psi_{k}$'s are exactly the first $N_{\text{pod}}$ columns of $U$.

Denote $A=(y_{1},...,y_{M})$ and $\tilde{A}=(\tilde{y}_{1},...,\tilde{y}_{M})$, and introduce the matrix $\Phi=(\phi_{1}(\vec{x}),...,\phi_{L}(\vec{x}))$, the diagonal matrices $F=\text{diag}(g_{1},...,g_{L})$ and $D=\text{diag}(e^{-\mu_{1}T},...,e^{-\mu_{L}T})$, and the $L\times M$ matrix $J$ with entries $J(i,j)=e^{-\mu_{i}t_{j}}$. The matrix $\Phi$ has orthogonal columns owing to the orthonormality of the eigenfunctions $\phi_{k}$. With the above expressions for $u$ and $\tilde{u}$, the matrices $A$ and $\tilde{A}$ can be written as

A=\Phi FJ,\quad\quad\text{and}\quad\quad\tilde{A}=\Phi DFJ.

(3.6)

These matrix representations of $A$ and $\tilde{A}$ provide a compact way to express the relationship between the coefficients of the eigenfunctions $\phi_{k}$ and the solutions $u(x,t)$ and $\tilde{u}(x,t)$ .
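The factorization (3.6) can be checked numerically on a small example. The sketch below is only an illustration under assumptions: a 1D setting with sampled sine functions as eigenfunctions, eigenvalues $\mu_{k}=k^{2}$, and hand-picked expansion coefficients (the diagonal of $F$); none of these values come from the experiments in this paper.

```python
import numpy as np

L, M, N = 3, 5, 200                  # modes, snapshots, grid points (toy sizes)
T = 0.5
x = np.linspace(0.0, np.pi, N + 2)[1:-1]
t = T * np.arange(1, M + 1) / M      # snapshot times t_1, ..., t_M
mu = np.arange(1, L + 1, dtype=float) ** 2     # assumed eigenvalues k^2
coef = np.array([1.0, -0.5, 0.25])             # hypothetical diagonal of F

# Columns of Phi: sampled eigenfunctions, scaled to unit Euclidean norm.
Phi = np.stack([np.sin(k * x) for k in range(1, L + 1)], axis=1)
Phi /= np.linalg.norm(Phi, axis=0)

F = np.diag(coef)
D = np.diag(np.exp(-mu * T))
J = np.exp(-np.outer(mu, t))         # J[i, j] = exp(-mu_i * t_j), shape (L, M)

A = Phi @ F @ J                      # forward snapshots
A_tilde = Phi @ D @ F @ J            # adjoint snapshots

# Direct evaluation of the adjoint snapshots, one time instant per column:
# sum_k exp(-mu_k T) exp(-mu_k t_j) coef_k phi_k.
A_tilde_direct = np.stack(
    [Phi @ (np.exp(-mu * (T + tj)) * coef) for tj in t], axis=1)
```

The only difference between the two snapshot matrices is the diagonal factor $D$, which damps the $k$-th mode by $e^{-\mu_{k}T}$.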

Proposition A.1 shows that the low-rank space $V_{\text{pod}}$ is the best rank-$N_{\text{pod}}$ approximation of the column space of $\tilde{A}$. We want to show that $V_{\text{pod}}$ is also a good approximation of the column space of $A$, which justifies the new POD method. First, we establish the connection between the matrices $A$ and $\tilde{A}$.

Lemma 3.1

If $L\leq M$ , then $span\{y_{1},...,y_{M}\}=span\{\tilde{y}_{1},...,\tilde{y}_{M}\}$ . This means the column spaces of $A$ and $\tilde{A}$ are identical, i.e., $C(A)=C(\tilde{A})$ .

Proof. The proof is similar to that of Lemma 2.2 and uses the fact that the matrix $J$ is a Vandermonde matrix.
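Lemma 3.1 can be illustrated numerically. The sketch below is a toy check under assumptions (random orthonormal columns standing in for $\Phi$, eigenvalues $1,4,9$, hand-picked nonzero coefficients, and hypothetical helper names); it compares the column spaces of $A$ and $\tilde{A}$ through orthonormal bases obtained from the SVD.

```python
import numpy as np

def snapshot_matrices(coef, mu, T, M, N=50, seed=0):
    """Return (A, A_tilde) = (Phi F J, Phi D F J) for a random orthonormal Phi."""
    rng = np.random.default_rng(seed)
    L = len(mu)
    Phi, _ = np.linalg.qr(rng.standard_normal((N, L)))  # stand-in for eigenfunctions
    t = T * np.arange(1, M + 1) / M
    J = np.exp(-np.outer(mu, t))
    F, D = np.diag(coef), np.diag(np.exp(-mu * T))
    return Phi @ F @ J, Phi @ D @ F @ J

def orth_basis(A, rtol=1e-10):
    """Orthonormal basis of the column space of A via the SVD."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, s > rtol * s[0]]

def same_column_space(A, B, tol=1e-8):
    """True if the two column spaces coincide up to the tolerance tol."""
    Qa, Qb = orth_basis(A), orth_basis(B)
    if Qa.shape[1] != Qb.shape[1]:
        return False
    return bool(np.linalg.norm(Qb - Qa @ (Qa.T @ Qb)) < tol)

mu = np.array([1.0, 4.0, 9.0])
coef = np.array([1.0, -0.5, 0.25])

A, A_tilde = snapshot_matrices(coef, mu, T=0.5, M=5)     # L = 3 <= M = 5
enough_snapshots = same_column_space(A, A_tilde)

A2, A2_tilde = snapshot_matrices(coef, mu, T=0.5, M=2)   # L = 3 > M = 2
too_few_snapshots = same_column_space(A2, A2_tilde)
```

In this toy configuration the two column spaces agree when $L\leq M$ and differ when only two snapshots are used for three modes, which is consistent with the assumption of the lemma.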

From (3.3), the new POD basis is a good approximation of $\{\tilde{y}_{1},...,\tilde{y}_{M}\}$ . With the above preparation, we will show the new POD basis is also a good approximation of $\{y_{1},...,y_{M}\}$ .

Theorem 3.2

Using the same notation in this section, if one has enough snapshots, i.e. $L\leq M$ , then

\frac{\sum_{i=1}^{M}\left|\left|y_{i}-P_{pod}y_{i}\right|\right|_{L^{2}(\Omega)}^{2}}{\sum_{i=1}^{M}\left|\left|y_{i}\right|\right|_{L^{2}(\Omega)}^{2}}\leq Ce^{2\mu_{L}T}\rho,

(3.7)

where $P_{pod}$ is the projection operator onto the adjoint-POD space $\text{span}\{\psi_{1},...,\psi_{N_{pod}}\}$ and $\rho=\frac{\sum_{k={N_{pod}}+1}^{M}\lambda_{k}}{\sum_{k=1}^{M}\lambda_{k}}$.

Proof. In the following proof, we assume $L=M$ for simplicity. For the case $L<M$ , the proof is similar.

Using the same notation of Lemma 3.1, $\Phi$ and $J$ are both invertible square matrices. Then, there exists a unique matrix $P$ such that,

\Phi DFJP=\Phi FJ,

and $P=J^{-1}D^{-1}J$ .

Hence

y_{j}=\sum^{L}_{i=1}P_{ij}\tilde{y}_{i}.

Then by Cauchy-Schwarz inequality, for any $1\leq j\leq L$ , we have the estimate

\displaystyle\|y_{j}-P_{\text{pod}}y_{j}\|^{2}\leq\sum_{i=1}^{L}P^{2}_{ij}\sum_{i=1}^{L}\|\tilde{y}_{i}-P_{\text{pod}}\tilde{y}_{i}\|^{2}.

(3.8)

Hence,

\displaystyle\sum_{j=1}^{L}\|y_{j}-P_{\text{pod}}y_{j}\|^{2}\leq\sum_{i,j=1}^{L}P^{2}_{ij}\sum_{i=1}^{L}\|\tilde{y}_{i}-P_{\text{pod}}\tilde{y}_{i}\|^{2}=\|P\|^{2}_{F}\sum_{i=1}^{L}\|\tilde{y}_{i}-P_{\text{pod}}\tilde{y}_{i}\|^{2}.

(3.9)

The rest is to estimate the Frobenius norm of $P$ . Since $P=J^{-1}D^{-1}J$ , we define $\|P\|_{d}=\|D^{-1}\|_{2}$ . It is easy to verify that $\|\cdot\|_{d}$ is a matrix norm. Then,

\|P\|_{F}\leq C\|P\|_{d}=C\|D^{-1}\|_{2}\leq Ce^{\mu_{L}T}.

On the other hand, since $\Phi$ is an orthogonal matrix, we have

\displaystyle\sum_{j=1}^{L}\|\tilde{y}_{j}\|^{2}=\|\Phi DFJ\|_{F}^{2}=\|DFJ\|_{F}^{2}\leq\|D\|_{F}^{2}\|FJ\|_{F}^{2}

(3.10)

\displaystyle\leq C\|FJ\|_{F}^{2}

(3.11)

\displaystyle\leq C\sum_{j=1}^{L}\|y_{j}\|^{2}.

(3.12)

Combining (3.9), (3.12), and the POD error estimate (3.3), we finally have

\displaystyle\frac{\sum_{i=1}^{M}\left|\left|y_{i}-P_{pod}y_{i}\right|\right|_{L^{2}(\Omega)}^{2}}{\sum_{i=1}^{M}\left|\left|y_{i}\right|\right|_{L^{2}(\Omega)}^{2}}\leq C\|P\|^{2}_{F}\frac{\sum_{i=1}^{M}\left|\left|\tilde{y}_{i}-P_{pod}\tilde{y}_{i}\right|\right|_{L^{2}(\Omega)}^{2}}{\sum_{i=1}^{M}\left|\left|\tilde{y}_{i}\right|\right|_{L^{2}(\Omega)}^{2}}

(3.13)

\displaystyle\leq Ce^{2\mu_{L}T}\rho.

(3.14)

The conclusion comes with the estimation $\mu_{i}\leq Ci^{2/d}$ .
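The key steps of the proof, namely the identity $A=\tilde{A}P$ with $P=J^{-1}D^{-1}J$ and the estimate (3.9), can be verified on a toy example. The sketch below assumes $L=M=3$, eigenvalues $1,4,9$, hand-picked nonzero coefficients, Euclidean norms in place of $L^{2}(\Omega)$ norms, and random orthonormal columns in place of $\Phi$; it is a sanity check and not part of the analysis.

```python
import numpy as np

L = M = 3
T = 0.5
mu = np.array([1.0, 4.0, 9.0])                 # assumed eigenvalues
coef = np.array([1.0, -0.5, 0.25])             # hypothetical nonzero coefficients
t = T * np.arange(1, M + 1) / M

rng = np.random.default_rng(0)
Phi, _ = np.linalg.qr(rng.standard_normal((40, L)))  # orthonormal stand-in for Phi
F, D = np.diag(coef), np.diag(np.exp(-mu * T))
J = np.exp(-np.outer(mu, t))                   # square and invertible here

A, A_tilde = Phi @ F @ J, Phi @ D @ F @ J
P = np.linalg.solve(J, np.linalg.inv(D) @ J)   # P = J^{-1} D^{-1} J

# POD basis of the adjoint snapshots: leading left singular vectors of A_tilde.
n_pod = 2
U, s, _ = np.linalg.svd(A_tilde, full_matrices=False)
Psi = U[:, :n_pod]

def proj_error(Y):
    """Sum over the columns y of Y of ||y - P_pod y||^2 (Euclidean norm)."""
    return np.linalg.norm(Y - Psi @ (Psi.T @ Y)) ** 2

lhs = proj_error(A)                                        # left side of (3.9)
rhs = np.linalg.norm(P, "fro") ** 2 * proj_error(A_tilde)  # right side of (3.9)
rho = np.sum(s[n_pod:] ** 2) / np.sum(s ** 2)              # discarded energy ratio
```

The factor $\|P\|_{F}^{2}$ can be large because $P$ is similar to $D^{-1}$, whose largest entry is $e^{\mu_{L}T}$; this is the origin of the factor $e^{2\mu_{L}T}$ in (3.7).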

3.2 Convergence of backward problem

To solve this inverse problem, we use the traditional Tikhonov regularization method,

\displaystyle\mathop{\rm min}\limits_{g\in X}\|\mathcal{S}(g)-m\|_{L^{2}(\Omega)}^{2}+\lambda\|g\|_{L^{2}(\Omega)}^{2}.

(3.15)

In the general discrete approximation of the problem (3.15), we solve the following least-squares regularized optimization problem:

\displaystyle\mathop{\rm min}\limits_{g\in V_{\text{pod}}}\|\mathcal{S}_{\text{pod}}(g)-m\|_{L^{2}(\Omega)}^{2}+\lambda\|g\|_{L^{2}(\Omega)}^{2}.

(3.16)

In the traditional setting of the POD method, however, one must know the source term $f$ and the initial condition $g$ in advance in order to compute the snapshots from which the POD basis functions are derived, whereas the only information available in an inverse problem is the measurement $m(x)$. Generating the basis from the unknown itself is an instance of the inverse crime, and it makes the traditional method impossible to implement in practice.

Define the functional $\mathcal{J}_{\text{pod}}[g]=\|\mathcal{S}_{\text{pod}}g-m\|_{L^{2}(\Omega)}^{2}+\lambda\|g\|_{L^{2}(\Omega)}^{2}$. We can compute the Fréchet derivative of $\mathcal{J}_{\text{pod}}[g]$ and obtain the following iterative scheme:

\displaystyle g_{k+1}=g_{k}-\beta d\mathcal{J}_{pod}[g_{k}],\quad\forall k\in\mathbb{N},

(3.17)

where $\beta$ is the step size, $d\mathcal{J}_{\text{pod}}[g]=\mathcal{S}_{\text{pod}}^{*}(\mathcal{S}_{\text{pod}}g-m)+\lambda g$, and $g_{0}$ is an initial guess.
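The iteration (3.17) can be sketched in reduced coordinates as follows. This is a minimal sketch under assumptions: the reduced forward operator is replaced by a hypothetical diagonal matrix `S` with entries $e^{-\mu_{k}T}$, the factor 2 of the gradient is absorbed into the step size $\beta$ as in the text, and the result is compared with the closed-form minimizer of the same quadratic functional.

```python
import numpy as np

def tikhonov_gradient_descent(S, m, lam, beta, n_iter, g0=None):
    """Iterate g <- g - beta * (S^T (S g - m) + lam * g)."""
    g = np.zeros(S.shape[1]) if g0 is None else g0.copy()
    for _ in range(n_iter):
        g = g - beta * (S.T @ (S @ g - m) + lam * g)
    return g

# Hypothetical reduced forward operator: diagonal decay exp(-mu_k * T).
mu = np.array([1.0, 4.0, 9.0])
T = 0.05
S = np.diag(np.exp(-mu * T))
g_true = np.array([1.0, -0.5, 0.25])
m = S @ g_true                       # noise-free final-time data
lam = 1e-3

# Any step size below 2 / (||S||_2^2 + lam) makes this iteration converge.
beta = 1.0 / (np.linalg.norm(S, 2) ** 2 + lam)
g_rec = tikhonov_gradient_descent(S, m, lam, beta, n_iter=500)

# Closed-form minimiser of the same functional, for comparison.
g_exact = np.linalg.solve(S.T @ S + lam * np.eye(3), S.T @ m)
```

Because the functional is quadratic, the iteration converges linearly to the regularized solution, which differs from `g_true` by a bias of order $\lambda$.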

The above theory is based on the noise-free case, i.e., the final time measurement $m=u(\cdot,T)$ is known exactly. In practice, the measurement data always contain uncertainty. We assume that the measurement data are corrupted by noise and take the discrete form $m^{n}_{i}=u(d_{i},T)+e_{i}$, $i=1,\cdots,n$, where the $d_{i}$'s are the positions of the detectors and $\{e_{i}\}^{n}_{i=1}$ are independent and identically distributed (i.i.d.) random variables on a proper probability space $(\mathfrak{X},\mathcal{F},\mathbb{P})$. From the regularity theory of parabolic equations, we know that $\|u\|_{C([0,T];H^{2}(\Omega))}\leq C\|g\|_{L^{2}(\Omega)}$. By the Sobolev embedding theorem, $H^{2}(\Omega)$ is continuously embedded into $C(\bar{\Omega})$, so that $u(\cdot,T)$ is well defined pointwise at every $d_{i}\in\Omega$. We recover an approximation of the final time data, still denoted by $m$, from the noisy observations by solving the regularized least-squares problem

\displaystyle m=\mathop{\rm argmin}\limits_{u\in X}\frac{1}{n}\sum_{i=1}^{n}(u(d_{i})-m^{n}_{i})^{2}+\alpha|u|_{H^{2}(\Omega)}^{2}.

(3.18)

Assume that the pointwise noise $e_{i}$ has variance bounded by $\sigma^{2}$, where $\sigma$ is the so-called noise level. The authors of [5] analyzed this problem and obtained optimal convergence results; moreover, they proposed an a posteriori algorithm that gives the best approximation without knowing the true solution or the noise level $\sigma$. Here we list their main results. If one chooses the optimal regularization parameter

\alpha^{1/2+d/8}=O(\sigma n^{-1/2}\|u(\cdot,T)\|^{-1}_{H^{2}(\Omega)}),

then the solution $m$ of (3.18) achieves the optimal convergence

\mathbb{E}\big{[}\|u(\cdot,T)-m\|_{L^{2}(\Omega)}\big{]}\leq C\alpha^{1/2}\|u(\cdot,T)\|_{H^{2}(\Omega)}.

(3.19)

And if the noise $\{e_{i}\}^{n}_{i=1}$ are independent Gaussian random variables with variance $\sigma$ , we further have,

\mathbb{P}(\|u(\cdot,T)-m\|_{L^{2}(\Omega)}\geq\alpha^{1/2}\|u(\cdot,T)\|_{H^{2}(\Omega)}z)\leq 2e^{-Cz^{2}}.

(3.20)
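To give a feeling for the smoothing step (3.18), the following sketch solves a 1D analogue. It is an illustration under strong assumptions and not the method analyzed in [5]: the $H^{2}$ seminorm is replaced by a discrete second-difference penalty on a uniform grid, the regularization parameter `alpha` is chosen by hand rather than by the optimal rule above, and the function name `smooth_1d` is hypothetical.

```python
import numpy as np

def smooth_1d(m_noisy, h, alpha):
    """Minimise (1/n) * sum (u_i - m_i)^2 + alpha * h * ||D2 u||^2 over u."""
    n = m_noisy.size
    # Second-difference matrix approximating u'' at the interior nodes.
    D2 = (np.eye(n - 2, n) - 2.0 * np.eye(n - 2, n, k=1)
          + np.eye(n - 2, n, k=2)) / h**2
    # Normal equations of the quadratic functional.
    lhs = np.eye(n) / n + alpha * h * (D2.T @ D2)
    return np.linalg.solve(lhs, m_noisy / n)

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, np.pi, n)
h = x[1] - x[0]
u_true = np.sin(x)                       # stand-in for u(., T)
sigma = 0.1                              # assumed noise level
m_noisy = u_true + sigma * rng.standard_normal(n)

m_rec = smooth_1d(m_noisy, h, alpha=1e-3)
err_noisy = np.sqrt(np.mean((m_noisy - u_true) ** 2))
err_rec = np.sqrt(np.mean((m_rec - u_true) ** 2))
```

In this toy setting the smoothed data `m_rec` is markedly closer to `u_true` than the raw observations, which is the role played by $m$ in the subsequent construction of the adjoint POD basis.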

Using this recovered function $m(x)$, we generate the adjoint POD basis as in Section 2.1. It can be shown that, in the presence of uncertainty, the POD basis is still a good low-rank approximation of the snapshots $\{y_{1},...,y_{M}\}$. Combining Theorem 3.2 and (3.19), we have for any $1\leq i\leq M$,

\left|\left|y_{i}-P_{\text{pod}}y_{i}\right|\right|_{L^{2}(\Omega)}^{2}\leq C(Me^{2\mu_{L}T}\rho+\alpha)\|g\|^{2}_{L^{2}(\Omega)}.

(3.21)

Since we replace the initial term by the finite truncation (3.5), we assume

\|g-g_{\text{app}}\|^{2}_{L^{2}}\leq\varepsilon(L).

(3.22)

With these results, using a technique similar to the proof of Theorem 3.7 in [18], we obtain the following convergence results.

Theorem 3.3

Let $\{e_{i}\}^{n}_{i=1}$ be independent random variables satisfying $\mathbb{E}[e_{i}]=0$ and $\mathbb{E}[e^{2}_{i}]\leq\sigma^{2}$ for $i=1,\cdots,n$ . Set $\alpha^{1/2+d/8}=O(\sigma n^{-1/2}\|u(\cdot,T)\|^{-1}_{H^{2}(\Omega)})$ in (2.23), and $\lambda=O(Me^{2\mu_{L}T}\rho+\alpha)$ , then

\displaystyle\mathbb{E}\big{[}\|\mathcal{S}g^{*}-\mathcal{S}_{\text{pod}}g_{\text{pod}}\|_{L^{2}(\Omega)}^{2}\big{]}\leq C\lambda\|g^{*}\|^{2}_{L^{2}(\Omega)}+C\varepsilon,

(3.23)

\displaystyle\mathbb{E}\big{[}\|g^{*}-g_{\text{pod}}\|_{L^{2}(\Omega)}^{2}\big{]}\leq C\|g^{*}\|^{2}_{L^{2}(\Omega)}+C\varepsilon.

(3.24)

Furthermore, if the noise $\{e_{i}\}^{n}_{i=1}$ are independent Gaussian random variables with variance $\sigma$, we obtain a stronger type of convergence. We only list the result here and refer to [18] for a similar proof.

Theorem 3.4

Let $\{e_{i}\}^{n}_{i=1}$ be independent Gaussian random variables with variance $\sigma$. Set $\alpha^{1/2+d/8}=O(\sigma n^{-1/2}\|u(\cdot,T)\|^{-1}_{H^{2}(\Omega)})$ in (2.23), and $\lambda=O(Me^{2\mu_{L}T}\rho+\alpha)$. Then there exists a constant $C$ such that, for any $z>0$,

\displaystyle\mathbb{P}(\|S_{\text{pod}}g_{\text{pod}}-Sg^{*}\|_{L^{2}(\Omega)}\geq(\lambda^{1/2}\|g^{*}\|_{L^{2}}+\varepsilon)z)\leq 2e^{-Cz^{2}},

(3.25)

\displaystyle\mathbb{P}(\|g_{\text{pod}}-g^{*}\|_{L^{2}(\Omega)}\geq(\|g^{*}\|_{L^{2}}+\varepsilon)z)\leq 2e^{-Cz^{2}}.

(3.26)

4 Numerical examples

In this section, we present several numerical examples to demonstrate the reconstruction results for the inverse source problem and the backward problem discussed in this paper. We consider the domain $\Omega=[0,\pi]^{2}$ . For each observation data set, we first apply the backward Euler scheme in time and the linear finite element method (specifically, the P1 element) in space with a mesh size of $h=1/50$ and a time step of $\Delta t=T/400$ . We select 9 POD basis functions to compute the inverse problems, whereas, for the finite element method, there are approximately 2500 basis functions.

In [17], the authors have already compared the efficiency of the POD method and the finite element method for solving this inverse problem. They demonstrate that the POD method achieves a speed-up of at least 6 times, even with 400 finite element basis functions. Therefore, we do not include a comparison with the finite element method in this paper. As the number of finite element basis functions increases, the potential for the POD method to achieve greater speed-up also grows correspondingly.

4.1 Inverse source examples

In the following examples, we apply the adjoint POD method to recover the source term $f$ as described in Section 2. We obtain the data for the forward problem with the exact source term $f$ at the final time $T=1$ .

Example 4.1 We first demonstrate the importance of choosing the appropriate POD basis functions. For the same source term, we apply different right-hand sides of equation (2.1) to obtain the POD basis functions. Subsequently, we solve the inverse source problem using these different POD basis functions.

Figure 1 illustrates this process. The true source term is given by $\sin(2x)\sin(2y)e^{\frac{x+y}{\pi}}$ , and its surface plot is shown in Figure 1 (a). Figure 1 (b) presents the reconstruction result obtained using the adjoint POD method proposed in this paper, indicating that our new method effectively recovers the source term in an efficient manner. Figure 1 (c) displays the result when an incorrect right-hand side is used to generate the POD basis functions. In this case, we use $\sin(x)\sin(y)$ as the right-hand side in (2.1) to generate the POD basis functions. Figure 1 (d) shows the result when we use an A-shaped function as the right-hand side to derive the POD basis.

As can be seen, Figure 1 (b) provides a good reconstruction, whereas Figures 1 (c) and (d) yield inaccurate results. The result in Figure 1 (d) is particularly striking, as the recovered image deviates significantly from the exact source term.

In this example, we demonstrate the importance of selecting the appropriate basis functions for solving inverse problems. Utilizing an unsuitable set of basis functions can lead to wrong results. Our proposed adjoint POD method offers a set of suitable basis functions for such problems. In the following examples, we will compare our adjoint POD basis functions with the original true POD basis to validate Theorem 2.3.

Figure caption: (a) Exact source term.

Example 4.2 In this example, we first use the true source term as the right-hand side in (2.1) to generate the POD basis. Then, we generate the POD basis using our proposed adjoint POD method. To validate Theorem 2.3, we compare both sets of basis functions and assess their similarity. Figure 2 shows the results when using the exact source term $f^{*}=\sin(2x)\sin(2y)$. Figure 2 (b) and Figure 2 (d) show that both the traditional POD and our adjoint POD work well to recover the true source term. However, our adjoint POD method does not require prior knowledge of the exact source term, while the traditional POD method does, leading to the so-called inverse crime. The basis functions for each method are depicted in Figure 2 (c) and Figure 2 (e), demonstrating that our adjoint POD basis is highly similar to the traditional one, even though we derived it solely from measured data.

Figure 3 presents the results when using an exact source term $f^{*}$ in the form of a Z-shaped function. This example also illustrates the efficiency of the POD method in solving inverse problems compared to the finite element method. Figure 3 (c) and Figure 3 (e) show that both the basis functions of the traditional POD method and those of our adjoint POD method contain the critical information of the exact source term that we aim to recover. In contrast, the basis functions of the finite element method do not contain any prior information about the true function we need to recover.

In the aforementioned cases, all the measured data were noise-free. We will now test the denoising method described in Section 2.3 by examining highly challenging cases with noise levels ranging from $10\%$ to $50\%$ .

Example 4.3 In this example, we will evaluate the robustness of the adjoint POD method in the presence of noise. We consider the measurement data to be $m^{n}_{i}=u(d_{i},T)+\sigma e_{i}$ , $i=1,\cdots,n$ , where $d_{i}$ represents positions within the domain $\Omega$ , and $\{e_{i}\}^{n}_{i=1}$ are independent standard normal random variables. We will take 2500 positions $d_{i}$ uniformly distributed over the domain $\Omega$ .

Figure 4 demonstrates the robustness of our method in the presence of significant noise. Even with a 50% noise level, where the measured data is entirely obscured by noise as shown in Figure 4 (c), our method is still able to recover the source term as depicted in Figure 4 (f).

4.2 Examples for backward problem

In this subsection, we will apply the new POD method to recover the initial term $g$ as discussed in Section 3. We will collect the data at the time $T=0.05$ .

Example 4.4 We demonstrate the importance of selecting the appropriate POD basis functions. For the same initial term, we apply different right-hand sides of equation (3.1) to derive the POD basis functions. Then, we solve the backward problem using the different POD basis functions.

Figure 5 illustrates the corresponding results. The true initial term is $\sin(2x)\sin(2y)e^{\frac{x+y}{\pi}}$, and its surface plot is shown in Figure 5 (a). Figure 5 (b) presents the reconstruction result using the adjoint POD method proposed in this paper, which demonstrates the effectiveness and efficiency of our new POD method in recovering the initial term. Figure 5 (c) shows the result when we use an incorrect right-hand side to generate the POD basis functions. In this case, we use $\sin(x)\sin(y)$ as the right-hand side in (3.1) to generate the POD basis functions. Figure 5 (d) displays the result when we use an A-shaped function as the right-hand side to generate the POD basis functions. It can be observed that Figure 5 (b) provides a good reconstruction, while Figure 5 (c) and Figure 5 (d) yield incorrect results. Particularly in Figure 5 (d), the recovered image is entirely different from the exact initial term.

In this example, we demonstrate the importance of choosing the correct basis to solve the inverse problem. Using an inappropriate set of basis functions may lead to unsatisfactory results. Our proposed adjoint POD provides a set of suitable basis functions. In the upcoming examples, we will compare our adjoint POD basis functions with the original true POD basis functions to verify Theorem 3.2.

Example 4.5 In the following two examples, we first use the true source term as the right-hand side in Eq. (3.1) to generate the POD basis functions. Then, we generate the POD basis functions using our proposed adjoint POD method. To verify Theorem 3.2, we plot the two sets of basis functions to determine whether they are closely related. Figure 6 shows the results using the exact initial term $g^{*}=\sin(2x)\sin(2y)e^{\frac{x+y}{\pi}}$. Figures 6 (b) and 6 (d) demonstrate that both the traditional POD and our adjoint POD methods work well in recovering the true initial term. However, the difference is that our adjoint POD method does not require prior knowledge of the exact initial term, while the traditional POD method does, which is known as the inverse crime. Figures 6 (c) and 6 (e) display the basis functions of each method. We can conclude that the basis functions of our adjoint POD method are very close to the traditional ones, even though we obtain them solely from the measured data.

Figure 7 shows the results using the exact initial term $g^{*}$ given by the A-shaped function. This example also illustrates why the POD method is more efficient in solving inverse problems compared to the finite element method. Figures 7 (c) and 7 (e) reveal that the basis functions of both the traditional POD method and our adjoint POD method contain critical information about the exact initial term we aim to recover. In contrast, the finite element basis lacks any a priori information about the true function we need to recover.

In the above cases, all the measured data are noise-free. We will now test the denoising method discussed in Section 3.2 by examining very challenging cases with noise levels ranging from $10\%$ to $50\%$ .

Example 4.6 In this example, we test the robustness of the adjoint POD method against noise. We take the measurement data as $m^{n}_{i}=u(d_{i},T)+\sigma e_{i}$, for $i=1,\cdots,n$, where the $d_{i}$'s represent positions inside $\Omega$, and $\{e_{i}\}^{n}_{i=1}$ are independent standard normal random variables. We use 2500 positions $d_{i}$ uniformly distributed over the domain $\Omega$. Figure 8 demonstrates that our method is robust even in the presence of large noise. Remarkably, even with $50\%$ noise, when the measured data is completely obscured by noise as shown in Figure 8 (c), we can still recover the initial term, as seen in Figure 8 (f).

Example 4.7 Finally, we study a more interesting case. While the inverse source problem and the backward problem are two distinct problems, we have observed from the previous numerical examples that they share some commonalities. Specifically, the POD basis for both problems contains critical information about the functions one wants to recover. As a result, we employ the POD basis functions derived from the inverse source problem to solve the backward problem. Please refer to Figure 9 for the reconstruction results.

5 Conclusion

We have developed a data-driven and model-based approach for solving parabolic inverse source problems with uncertain data. The key idea is to exploit the model-based intrinsic low-dimensional structure of the underlying parabolic PDEs and construct data-based POD basis functions to achieve significant dimension reduction in the solution space. Equipped with the POD basis functions, we develop a fast algorithm that can compute the optimization problem in the inverse source problems. Hence, we obtain an effective data-driven and model-based approach for the inverse source problems and overcome the typical computational bottleneck of FEM in solving these problems. Under a weak assumption on the regularity of the solution, we provide the convergence analysis of our POD algorithm in solving the forward parabolic PDEs and thus obtain the error estimate of the POD algorithm for the parabolic inverse source problems. Finally, we carry out numerical experiments to demonstrate the accuracy and efficiency of the proposed method. We also study other issues of the POD algorithm, such as the dependence of the error on the mesh size, the regularization parameter in the least-squares regularized minimization problems, and the number of POD basis functions. Through numerical results, we find that our POD algorithm provides significant computational savings over the FEM while yielding as good approximations as the FEM. We expect an even better performance of efficiency can be obtained for 3D problems, which will be studied in our future works.

Acknowledgement

The research of W. Zhang is supported by the National Natural Science Foundation of China No. 12371423 and 12241104. The research of Z. Zhang is supported by Hong Kong RGC grant project 17307921, National Natural Science Foundation of China No. 12171406, Seed Funding for Strategic Interdisciplinary Research Scheme 2021/22 (HKU), and Seed Funding from the HKU-TCL Joint Research Centre for Artificial Intelligence.

Appendix A Proper orthogonal decomposition (POD) method

Assuming that $u\in H^{1}_{0}(\Omega)$ is the solution to the weak formulation of the parabolic equation (1.1), the construction of POD basis functions requires solution snapshots. These solution snapshots can be obtained by the appropriate technological means related to a specific application, such as experimental data or numerical methods.

Given a set of solutions at different time instances $\big{\{}u(\cdot,t_{0}),u(\cdot,t_{1}),\ldots,u(\cdot,t_{M})\big{\}}$, where $t_{k}=k\Delta t$ with $\Delta t=\frac{T}{M}$ and $k=0,\ldots,M$, we first obtain the solution snapshots $\{y_{1},\ldots,y_{M+1},y_{M+2},\ldots,y_{2M+1}\}$, where $y_{k}=u(\cdot,t_{k-1})$, $k=1,\ldots,M+1$, and $y_{k}=\overline{\partial}u(\cdot,t_{k-M-1})$, $k=M+2,\ldots,2M+1$, with $\overline{\partial}u(\cdot,t_{k})=\frac{u(\cdot,t_{k})-u(\cdot,t_{k-1})}{\Delta t}$, $k=1,\ldots,M$.

Then, the POD basis functions $\{\psi_{k}\}_{k=1}^{{N_{\text{pod}}}}$ are constructed by minimizing the following projection error:

\displaystyle\frac{1}{2M+1}\Big(\sum_{j=0}^{M}\big\|u(t_{j})-\sum_{k=1}^{N_{\text{pod}}}(u(t_{j}),\psi_{k})_{L^{2}(\Omega)}\psi_{k}\big\|_{L^{2}(\Omega)}^{2}

(A.1)

\displaystyle+\sum_{j=1}^{M}\big\|\overline{\partial}u(t_{j})-\sum_{k=1}^{N_{\text{pod}}}(\overline{\partial}u(t_{j}),\psi_{k})_{L^{2}(\Omega)}\psi_{k}\big\|_{L^{2}(\Omega)}^{2}\Big)

(A.2)

subject to the constraints that $\big{(}\psi_{k_{1}}(\cdot),\psi_{k_{2}}(\cdot)\big{)}_{L^{2}(\Omega)}=\delta_{k_{1}k_{2}}$, $1\leq k_{1},k_{2}\leq N_{\text{pod}}$, where $\delta_{k_{1}k_{2}}=1$ if $k_{1}=k_{2}$ and $\delta_{k_{1}k_{2}}=0$ otherwise. Here, we use $N_{\text{pod}}$ to denote the number of POD basis functions that will be extracted from the solution snapshots.

Let ${V_{\text{pod}}}=\text{span}\{\psi_{1},\ldots,\psi_{N_{\text{pod}}}\}$ denote the finite-dimensional space spanned by the POD basis functions. Using the method of snapshot proposed by Sirovich [15], we know that the minimizing problem can be reduced to the following eigenvalue problem:

Kv=\lambda v,

(A.3)

where the correlation matrix $K$ is computed from the solution snapshots $\{y_{1},y_{2},\ldots,y_{2M+1}\}$ with entries $K_{ij}=(y_{i},y_{j})_{L^{2}(\Omega)}$, $i,j=1,\ldots,2M+1$, and $K$ is symmetric and positive semi-definite. We sort the eigenvalues in decreasing order as $\lambda_{1}\geq\lambda_{2}\geq...\geq\lambda_{2M+1}$ and denote the corresponding eigenvectors by $v_{k}$, $k=1,...,2M+1$. It can be shown that if the POD basis functions are constructed by

\psi_{k}(\cdot)=\frac{1}{\sqrt{\lambda_{k}}}\sum_{j=1}^{2M+1}(v_{k})_{j}y_{j}(\cdot),\quad 1\leq k\leq N_{\text{pod}},

(A.4)

where $(v_{k})_{j}$ is the $j$ -th component of the eigenvector $v_{k}$ , they minimize the projection error.

The approximation error for the POD method has been studied extensively in the literature, particularly in the works [10] and [3].

Proposition A.1 (Sec. 3.3.2, [10] or p. 502, [3])

Let $\lambda_{1}\geq\lambda_{2}\geq...\geq\lambda_{2M+1}\geq 0$ denote the non-negative eigenvalues of $K$ in the eigenvalue problem (A.3). Then, $\{\psi_{k}\}_{k=1}^{N_{\text{pod}}}$ constructed according to the method of snapshots (A.4) is the set of POD basis functions, and we have the following error formula:

\frac{\sum_{i=1}^{2M+1}\left|\left|\tilde{y}_{i}-\sum_{k=1}^{{N_{\text{pod}}}}\big{(}\tilde{y}_{i},\psi_{k}(\cdot)\big{)}_{L^{2}(\Omega)}\psi_{k}(\cdot)\right|\right|_{L^{2}(\Omega)}^{2}}{\sum_{i=1}^{2M+1}\left|\left|\tilde{y}_{i}\right|\right|_{L^{2}(\Omega)}^{2}}=\frac{\sum_{k={N_{\text{pod}}}+1}^{2M+1}\lambda_{k}}{\sum_{k=1}^{2M+1}\lambda_{k}},

(A.5)

where the number $N_{\text{pod}}$ is determined according to the decay of the ratio $\rho=\frac{\sum_{k={N_{\text{pod}}}+1}^{2M+1}\lambda_{k}}{\sum_{k=1}^{2M+1}\lambda_{k}}$.
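The method of snapshots and the error formula (A.5) can be sketched as follows. This is an illustrative sketch under assumptions: the $L^{2}(\Omega)$ inner product is replaced by the Euclidean one on a uniform 1D grid, the snapshots come from a hypothetical three-mode solution with eigenvalues $1,4,9$, and the function name and the tolerance `tol` on $\rho$ are choices made for this example only.

```python
import numpy as np

def pod_method_of_snapshots(Y, tol):
    """POD basis from the snapshot matrix Y (one snapshot per column).

    Returns (Psi, lam, n_pod, rho), where n_pod is the smallest number of
    modes whose discarded eigenvalue ratio rho does not exceed tol.
    """
    K = Y.T @ Y                                    # K_ij = (y_i, y_j), Euclidean
    lam, V = np.linalg.eigh(K)                     # eigenvalues in ascending order
    lam, V = np.clip(lam[::-1], 0.0, None), V[:, ::-1]   # sort decreasing
    tail = np.cumsum(lam[::-1])[::-1]              # tail[i] = sum_{k >= i} lam_k
    rho_all = np.append(tail[1:], 0.0) / tail[0]   # ratio when keeping i + 1 modes
    n_pod = int(np.argmax(rho_all <= tol)) + 1
    # psi_k = lam_k^{-1/2} * sum_j (v_k)_j y_j
    Psi = Y @ V[:, :n_pod] / np.sqrt(lam[:n_pod])
    return Psi, lam, n_pod, rho_all[n_pod - 1]

# Toy snapshots: three decaying sine modes plus their difference quotients.
N, M, T = 100, 5, 0.5
x = np.linspace(0.0, np.pi, N + 2)[1:-1]
t = np.linspace(0.0, T, M + 1)
dt = T / M
mu = np.array([1.0, 4.0, 9.0])
modes = np.stack([np.sin(k * x) for k in (1, 2, 3)], axis=1)
U = modes @ np.exp(-np.outer(mu, t))               # u(., t_0), ..., u(., t_M)
Y = np.hstack([U, np.diff(U, axis=1) / dt])        # 2M + 1 snapshots in total

Psi, lam, n_pod, rho = pod_method_of_snapshots(Y, tol=1e-3)
rel_err = np.linalg.norm(Y - Psi @ (Psi.T @ Y)) ** 2 / np.linalg.norm(Y) ** 2
```

Here `rel_err` is the left-hand side of (A.5) and `rho` the right-hand side, so the two agree up to round-off; only a $(2M+1)\times(2M+1)$ eigenvalue problem is solved, regardless of the number of grid points.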

References

[1] S. Agmon. Lectures on Elliptic Boundary Value Problems. Van Nostrand, Princeton, NJ, 1965.
[2] A. Alla and M. Falcone. A time-adaptive POD method for optimal control problems. IFAC Proceedings Volumes, 46(26):245–250, 2013.
[3] P. Benner, S. Gugercin, and K. Willcox. A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Review, 57(4):483–531, 2015.
[4] G. Berkooz, P. Holmes, and J. Lumley. The proper orthogonal decomposition in the analysis of turbulent flows. Annual review of fluid mechanics, 25(1):539–575, 1993.
[5] Z. Chen, R. Tuo, and W. Zhang. Stochastic convergence of a nonconforming finite element method for the thin plate spline smoother for observational data. SIAM Journal on Numerical Analysis, 56(2):635–659, 2018.
[6] Z. Chen, W. Zhang, and J. Zou. Stochastic convergence of regularized solutions and their finite element approximations to inverse source problems. SIAM Journal on Numerical Analysis, 60(2):751–780, 2022.
[7] J. Fleckinger and M. Lapidus. Eigenvalues of elliptic boundary value problems with an indefinite weight function. Transactions of the American Mathematical Society, 295(1):305–324, 1986.
[8] H. Gu, J. Xin, and Z. Zhang. Error estimates for a POD method for solving viscous G-equations in incompressible cellular flows. SIAM Journal on Scientific Computing, 43(1):A636–A662, 2021.
[9] J. Hesthaven, G. Rozza, and B. Stamm. Certified reduced basis methods for parametrized partial differential equations. Springer, 2016.
[10] P. Holmes, J. Lumley, and G. Berkooz. Turbulence, coherent structures, dynamical systems and symmetry. Cambridge University Press, 1998.
[11] K. Kunisch and S. Volkwein. Galerkin proper orthogonal decomposition methods for parabolic problems. Numerische Mathematik, 90(1):117–148, 2001.
[12] K. Kunisch, S. Volkwein, and L. Xie. HJB-POD-based feedback design for the optimal control of evolution problems. SIAM Journal on Applied Dynamical Systems, 3(4):701–722, 2004.
[13] A. Quarteroni, A. Manzoni, and F. Negri. Reduced basis methods for partial differential equations: an introduction, volume 92. Springer, 2015.
[14] L. Sirovich. Turbulence and the dynamics of coherent structures. I. Coherent structures. Quarterly of applied mathematics, 45(3):561–571, 1987.
[15] L. Sirovich. Turbulence and the dynamics of coherent structures. I. Coherent structures. Quarterly of applied mathematics, 45(3):561–571, 1987.
[16] S. Volkwein. Proper orthogonal decomposition: Theory and reduced-order modelling. Lecture Notes, University of Konstanz, 4(4), 2013.
[17] Z. Wang, W. Zhang, and Z. Zhang. A data-driven model reduction method for parabolic inverse source problems and its convergence analysis. Journal of Computational Physics, 487:112156, 2023.
[18] Z. Wang, W. Zhang, and Z. Zhang. Stochastic convergence of regularized solutions for backward heat conduction problems. arXiv:2311.03623, 2023.
