Filtering SPDEs with Spatio-Temporal Point Process Observations

Jan Szalankiewicz, Institute of Mathematics, TU Berlin, 10623 Berlin, Germany
Cristina Martinez-Torres, Institute of Physics and Astronomy, University of Potsdam, 14476 Potsdam, Germany
Wilhelm Stannat, Institute of Mathematics, TU Berlin, 10623 Berlin, Germany
e-mail: [email protected], [email protected], [email protected]

Abstract

In this paper, we develop the mathematical framework for filtering problems arising from biophysical applications where data is collected from confocal laser scanning microscopy recordings of the space-time evolution of intracellular wave dynamics of biophysical quantities. In these applications, signals are described by stochastic partial differential equations (SPDEs) and observations can be modelled as functionals of marked point processes whose intensities depend on the underlying signal. We derive both the unnormalized and normalized filtering equations for these systems, and we establish asymptotic consistency and approximation results for finite-dimensional observation schemes and for partial observations. Our theoretical results are validated through extensive simulations using synthetic and real data. These findings contribute to a deeper understanding of filtering with point process observations and provide a robust framework for future research in this area.

Keywords and phrases: Stochastic partial differential equations, Marked point processes, Stochastic Filtering

1 Introduction

Reaction-diffusion systems are fundamental models in biophysics, representing spatially extended systems where dynamics at each location involve nonlinear reaction kinetics, coupled by diffusive transport of reacting species [10, 21]. The motivating example for this paper is the spatially extended stochastic FitzHugh-Nagumo-type model of actin wave formation in the social amoeba Dictyostelium discoideum [2], modeled by a stochastic partial differential equation (SPDE) of the following type:

\quad\begin{cases}\begin{aligned} \textnormal{d}X(t,x)&=(AX(t,x)+F(X(t,x)))\textnormal{d}t+B(X(t,x))\textnormal{d}W(t,x),\\ X(0,x)&=\xi(x),\end{aligned}\end{cases}

(1)

$t\in(0,T]$ , on a suitable domain $\mathcal{D}\subset\mathbb{R}^{d}$ , where $A$ denotes diffusion, and $F$ the reaction-kinetics; see [23]. We will give precise conditions on the above terms in Section 2.1.1.

In practice, information on actin wave dynamics is obtained from confocal laser scanning microscopy (CLSM) recordings given as a time series of digital grey-scale images. Inferring accurate statistical information from these data on the basis of the SPDE model (1) requires careful selection of the model parameters, guided by experimental data obtained from CLSM recordings of giant D. discoideum cells. In addition, even if detailed simulations based on (1) align well with experimental data, questions about the robustness and plausibility of the model parameters remain [23].

In order to gather data using CLSM in the experiments, cells are tagged with fluorescent biomarkers, allowing researchers to count photon emissions correlated with the actin concentration rather than measuring actin concentration directly. Each pixel in the CLSM images corresponds to a specific region of the cell, with pixel values representing the number of emitted photons. Consequently, CLSM recordings provide data as sequences of digital images, where the photon counts are approximately Poisson distributed with intensity related to the fluorescent material concentration. This introduces an additional layer of stochasticity known as observation noise.

We use marked point processes (MPPs) as a mathematical model for this type of observation. MPPs are a well-established class of point processes capable of modeling random events at random positions, in this case the times and locations of photon emissions. This approach allows us to infer information on the underlying signal, the actin concentration modelled by the SPDE (1), from MPP observations using stochastic filtering, a comprehensive Bayesian framework for sequential estimation in a model-based setting.

More specifically, let $\mathcal{K}$ be the mark space modeling the area of point positions. The evolution of the photon emissions in a given subset $\Gamma\subseteq\mathcal{K}$ over time can be written in integral form as the dynamics of a stochastic jump process $Y$ as follows:

\quad\begin{cases}\begin{aligned} \textnormal{d}Y(\Gamma,t)&=\int_{\Gamma}\lambda(t,x\,|\,X(t))\textnormal{d}x\,\textnormal{d}t+\textnormal{d}N(\Gamma,t),\quad t\in(0,T],\\ Y(\Gamma,0)&=0,\end{aligned}\end{cases}

(2)

where $(N(\Gamma,t))_{t\geq 0}$ is the jump martingale corresponding to $Y$ restricted to $\Gamma$ .

In this paper, we develop the statistical filtering theory for the stochastic signal $X$ described by the SPDE in (1) with observation schemes arising from (2). Our work includes the derivation of the Kallianpur-Striebel formula, as well as the Zakai and Kushner-Stratonovich equations for the posterior distribution of $X$ . Although filtering problems are often formulated with Gaussian observations [3], the study of filtering with point process observations has gained significant attention across various disciplines, including statistics and engineering [15, 26, 29].

The foundational work by Snyder [27] was the first to rigorously address point process observations in stochastic filtering, a framework later extended to MPPs by Brémaud [5, 6]. Filtering for SPDEs with Gaussian observations was initially explored by Pardoux [22], and further developed by Ahmed, Fuhrmann, and Zabczyk [1]. Florchinger made contributions by analyzing SPDE signals with one-dimensional temporal point process observations [13], though this line of inquiry was not extensively pursued. More recently, Sun, Zeng, and Zhang investigated filtering with MPPs in the context of abstract Hilbert-space valued Markov processes [28], albeit without deriving the Kushner-Stratonovich equation and without giving an explicit functional analytical framework for the signal process.

To the best of our knowledge, the filtering of SPDEs with multivariate point process observations, or with more general MPP observations, has not previously been addressed in the literature.

Furthermore, we explore the relationship between observations represented as marked point processes and their lower-resolution multivariate point process approximations, which contain reduced spatial information. We prove weak convergence of the multivariate point process observations to their underlying MPP counterparts and establish convergence in total variation for both the unnormalized and normalized posterior distributions in the high-resolution limit. Additionally, we address the case of partial observations. To the best of our knowledge, error bounds for estimates based on low-resolution point process observations have not yet been derived in the context of filtering. Finally, we report on extensive numerical experiments, providing further insights into our theoretical findings.

The structure of this paper is as follows. In Section 2 we provide a concise overview of key concepts of SPDEs in the variational setting and MPPs, followed by the precise mathematical modeling of the stochastic filtering problems including both infinite- and finite-dimensional spatio-temporal point process observation schemes.

Section 3 is devoted to deriving the filtering equations. Specifically, we present the Kallianpur-Striebel formulas in Lemma 3.3 and Lemma 3.8, the Zakai equations for the time-evolution of the unnormalized conditional distributions in Theorem 3.5 and Theorem 3.9 and the Kushner-Stratonovich equations in Corollary 3.6 and Corollary 3.10.

In Section 4, we study the convergence of the multivariate point process observations to their underlying MPP counterparts in the high-resolution limit, analyze the convergence of both the unnormalized and normalized posterior distributions, and establish approximation errors. Additionally, we introduce a specific model of partial observations designed to replicate the setting of CLSM data and derive corresponding error bounds.

The final Section 5 presents numerical simulation results.

2 Mathematical setting of the filtering model

The filtering theory for SPDE signals with Gaussian observations has been extensively studied in the literature; see [22, 1]. The only known work analyzing SPDE signals with point process observations is the conference paper [13], which considers a one-dimensional Poisson process with intensity dependent on the SPDE state. The recent paper [28] introduces marked point process (MPP) observations but deals with a very abstract, Hilbert space-valued Markov process.

Our objective of explicitly modeling the CLSM observations of actin wave dynamics leads to a new filtering problem for SPDEs observed through MPPs. First, this approach introduces a novel method for modeling spatio-temporal shot noise via generalized Cox processes steered by an SPDE. Second, new questions about limits of statistical estimators arise, which we partly answer in Section 4.

2.1 The signal process

We will model the signal process as an SPDE within the variational framework as introduced in [19, 22], employing their terminology. Although our analysis primarily focuses on the variational solution concept, it can be adapted to accommodate other concepts, such as mild solutions. This adaptation is a technical matter that necessitates changes to the functional analytical framework, resulting in different conditions for the SPDE coefficients and a different Itô formula than the one we employ; see for example [25, Thm. 4.17].

2.1.1 Variational solutions to SPDE

Let $\mathcal{H}$ be a Hilbert space with inner product $(\cdot,\cdot)_{\mathcal{H}}$ and $\mathcal{V}$ a reflexive Banach space, both on $\mathcal{D}\subset\mathbb{R}^{d}$ , and let $\mathcal{V}^{\ast}$ denote the dual space of $\mathcal{V}$ . By ${}_{\mathcal{V}^{\ast}}\langle\cdot,\cdot\rangle_{\mathcal{V}}$ we denote the dual pairing between $\mathcal{V}$ and $\mathcal{V}^{\ast}$ . We impose that $(\mathcal{V},\mathcal{H},\mathcal{V}^{\ast})$ forms a Gelfand triple which implies that $\mathcal{V}\subset\mathcal{H}\approx\mathcal{H}^{\ast}\subset\mathcal{V}^{\ast}$ continuously and densely and that

{}_{\mathcal{V}^{\ast}}\langle h,v\rangle_{\mathcal{V}}=(h,v)_{\mathcal{H}},\quad\text{ for all }h\in\mathcal{H},\,v\in\mathcal{V},

see e.g. [19, p. 69].

Let $T>0$ and let $(\Omega,\mathcal{F},(\mathcal{F}_{t})_{t\geq 0},\mathbb{P})$ be a complete filtered probability space whose filtration $(\mathcal{F}_{t})_{t\geq 0}$ satisfies the usual conditions. For some given separable real Hilbert space $\mathcal{U}$ we consider $(W(t))_{t\geq 0}$ to be a $\mathcal{U}$ -valued $(\mathcal{F}_{t})_{t\geq 0}$ -adapted $Q$ -Wiener process. We assume that $Q$ is a self-adjoint, positive semidefinite linear operator on $\mathcal{U}$ , with finite trace $\textnormal{tr}_{\mathcal{U}}Q<+\infty$ .

We consider stochastic partial differential equations (SPDEs) on $\mathcal{H}$ of the following type

\hypertarget{introduction_spde_basic_formulation}{\textnormal{(S)}}\quad\begin{cases}\begin{aligned} \textnormal{d}X(t)&=A(X(t))\textnormal{d}t+B(X(t))\textnormal{d}W(t),\quad t\in(0,T],\\ X(0)&=\xi\in\mathcal{H}\end{aligned}\end{cases}

with $B\in L_{2}(\mathcal{U},\mathcal{H})$ , where $L_{2}(\mathcal{U},\mathcal{H})$ denotes the space of Hilbert-Schmidt operators from $\mathcal{U}$ to $\mathcal{H}$ , and $A:\mathcal{V}\rightarrow\mathcal{V}^{\ast}$ . Such a general form of an SPDE covers cases such as stochastic heat and reaction-diffusion equations, see [19]. In order to work with an analytically weak solution to (S), we make the standard assumptions:

Assumption 1.

We assume that the following conditions hold on the coefficients $\xi,\,A,B$ in (S).

(A0)

Initial condition: Let $\xi\in L^{2}(\Omega,\mathcal{F}_{0},\mathbb{P};\mathcal{H})$ .

(A1)

Hemicontinuity: For $u,v,w\in\mathcal{V}$ , $t\in[0,T]$ the map

\delta\mapsto\,_{\mathcal{V}^{\ast}}\langle A(u+\delta v),w\rangle_{\mathcal{V}}

is continuous.

(A2)

Weak monotonicity: There exists a constant $\mathcal{C}_{1}\in\mathbb{R}$ s.t. for $u,v\in\mathcal{V}$

\displaystyle 2\,_{\mathcal{V}^{\ast}}\langle A(u)-A(v),u-v\rangle_{\mathcal{V}}

\displaystyle+\|(B(u)-B(v))\,\sqrt{Q}\|^{2}_{L_{2}(\mathcal{U},\mathcal{H})}\leq\mathcal{C}_{1}\|u-v\|^{2}_{\mathcal{H}}

on $[0,T]$ .

(A3)

Coercivity: There exist constants $\mathcal{C}_{2}\in\mathbb{R}$ , $\mathcal{C}_{3},\mathcal{C}_{4}\in(1,\infty)$ , $\tilde{p}\in(1,\infty)$ , such that for all $v\in\mathcal{V}$

\displaystyle 2\,_{\mathcal{V}^{\ast}}\langle A(v),v\rangle_{\mathcal{V}}

\displaystyle+\|B(v)\,\sqrt{Q}\|^{2}_{L_{2}(\mathcal{U},\mathcal{H})}\leq\mathcal{C}_{2}\|v\|^{2}_{\mathcal{H}}-\mathcal{C}_{3}\|v\|^{\tilde{p}}_{\mathcal{V}}+\mathcal{C}_{4}.

(A4)

Boundedness: There exists a constant $\mathcal{C}_{5}>0$ s.t. for all $v\in\mathcal{V}$

\displaystyle\|A(v)\|_{\mathcal{V}^{\ast}}\leq\mathcal{C}_{5}(1+\|v\|_{\mathcal{V}}).

Under Assumption 1 it is known that (S) admits a unique analytically weak, or variational, solution; see for example [19, Thm. 4.2.4]. In particular, this means that there exists a unique $\mathcal{H}$ -valued, $(\mathcal{F}_{t})$ -adapted process $X=(X(t))_{t\in[0,T]}$ , where

X\in L^{2}([0,T]\times\Omega,\textnormal{d}t\otimes\mathbb{P};\mathcal{H})\cap L^{\tilde{p}}([0,T]\times\Omega,\textnormal{d}t\otimes\mathbb{P};\mathcal{V})

with $\tilde{p}$ from (A3), such that for any $v\in\mathcal{V}$ we have the $\mathbb{P}$ -a.s. equality

(X(t),v)_{\mathcal{H}}=(X(0),v)_{\mathcal{H}}+\int_{0}^{t}\,{}_{\mathcal{V}^{\ast}}\langle A(X(s)),v\rangle_{\mathcal{V}}\textnormal{d}s+\int_{0}^{t}(v,B(X(s))\textnormal{d}W(s))_{\mathcal{H}},

(3)

for any $t\in[0,T]$ . Additionally, one can show that the solution is an $\mathcal{H}$ -valued Markov process [19, Proposition 4.3.5]. Such a variational solution to (S) represents the signal in our filtering problem.
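To make the notion of a signal path more concrete, the following is a minimal numerical sketch, not the scheme used in this paper. It assumes a one-dimensional domain $(0,1)$ with Dirichlet boundary conditions, $A(u)=\Delta u+F(u)$ with the hypothetical bounded Lipschitz reaction term $F=\tanh$, a constant noise coefficient `sigma`, and a trace-class covariance $Q$ with eigenvalues $k^{-2}$ in the sine basis. The function name `simulate_signal` and all default values are illustrative choices.

```python
import numpy as np

def simulate_signal(n_x=64, n_t=2000, T=0.1, sigma=0.5, seed=0):
    """Explicit finite-difference / Euler-Maruyama sketch of
    dX = (X_xx + tanh(X)) dt + sigma dW on (0, 1), X = 0 on the boundary."""
    rng = np.random.default_rng(seed)
    dx = 1.0 / (n_x + 1)
    dt = T / n_t
    # The explicit scheme for the Laplacian is only stable for small steps.
    if dt > 0.5 * dx**2:
        raise ValueError("explicit scheme needs dt <= dx^2 / 2")
    x = np.linspace(dx, 1.0 - dx, n_x)           # interior grid points
    X = np.sin(np.pi * x)                         # initial condition xi
    # Q-Wiener process: W(t) = sum_k sqrt(q_k) beta_k(t) e_k with q_k = k^-2
    # (finite trace) and e_k(x) = sqrt(2) sin(k pi x).
    k = np.arange(1, n_x + 1)
    basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(k, x))
    sqrt_q = 1.0 / k
    for _ in range(n_t):
        padded = np.concatenate(([0.0], X, [0.0]))   # Dirichlet boundary values
        lap = (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx**2
        dW = np.sqrt(dt) * (sqrt_q * rng.standard_normal(n_x)) @ basis
        X = X + (lap + np.tanh(X)) * dt + sigma * dW
    return x, X

x, X_T = simulate_signal()
```

The returned vector `X_T` approximates $X(T)$ on the grid `x`; an implicit or spectral scheme would usually be preferable when finer grids are required.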

2.1.2 Itô functions and the infinitesimal generator

For deriving the filtering equations in Section 3, it will be of great use to have a version of Itô’s lemma for variational solutions. The suitable function class is given as follows.

Definition 2.1.

[22, p. 136] We call a function $\psi:\mathcal{H}\rightarrow\mathbb{R}$ an Itô function, if it fulfills the following conditions, where all derivatives have to be understood w.r.t. $\mathcal{H}$ .

(i)

$\psi$ is twice Fréchet-differentiable with derivatives $\text{D}^{1}\psi$ and $\text{D}^{2}\psi$ .
(ii)

$\psi$ , $\text{D}^{1}\psi$ and $\text{D}^{2}\psi$ are locally bounded.
(iii)

For any trace-class operator $\Theta:\mathcal{H}\rightarrow\mathcal{H}$ , the functional $u\rightarrow\textnormal{tr}\big{(}\Theta\text{D}^{2}\psi(u)\big{)}$ is continuous on $\mathcal{H}$ .
(iv)

For $v\in\mathcal{V}$ we have $\text{D}^{1}\psi(v)\in\mathcal{V}$ , and the map $\text{D}^{1}\psi|_{\mathcal{V}}:\mathcal{V}\rightarrow\mathcal{V}$ is continuous when the domain is equipped with the strong and the image with the weak topology.
(v)

There is a constant $\mathcal{C}_{\mathcal{V}}>0$ such that $\|\text{D}^{1}\psi(v)\|_{\mathcal{V}}\leq\mathcal{C}_{\mathcal{V}}(1+\|v\|_{\mathcal{V}})$ for all $v\in\mathcal{V}$ .

Moreover, if $\psi$ , $\text{D}^{1}\psi$ and $\text{D}^{2}\psi$ are globally bounded, we call $\psi$ a globally bounded Itô function.

Under Assumption 1, the infinitesimal generator $\mathcal{L}$ of the signal $X$ is given by

\mathcal{L}\psi=\,_{\mathcal{V}^{\ast}}\langle A(\cdot),\text{D}^{1}\psi\rangle_{\mathcal{V}}+\frac{1}{2}\textnormal{tr}\{\text{D}^{2}\psi\;B(\cdot)QB(\cdot)^{\ast}\},

(4)

for any Itô function $\psi$ .
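As a brief illustration of (4), consider two simple test functions (our own choice, not taken from the references): the linear functional $\psi_{v}(u):=(u,v)_{\mathcal{H}}$ for a fixed $v\in\mathcal{V}$ , with $\text{D}^{1}\psi_{v}=v$ and $\text{D}^{2}\psi_{v}=0$ , and the squared norm $\psi(u):=\|u\|^{2}_{\mathcal{H}}$ , with $\text{D}^{1}\psi(u)=2u$ and $\text{D}^{2}\psi(u)=2\,\mathrm{Id}$ . Both satisfy conditions (i)-(v) of Definition 2.1, although neither is globally bounded. Inserting them into (4) gives, for $u\in\mathcal{V}$ :

```latex
\mathcal{L}\psi_{v}(u)
  = {}_{\mathcal{V}^{\ast}}\langle A(u),v\rangle_{\mathcal{V}},
\qquad
\mathcal{L}\psi(u)
  = 2\,{}_{\mathcal{V}^{\ast}}\langle A(u),u\rangle_{\mathcal{V}}
    + \textnormal{tr}\{B(u)QB(u)^{\ast}\}
  = 2\,{}_{\mathcal{V}^{\ast}}\langle A(u),u\rangle_{\mathcal{V}}
    + \|B(u)\sqrt{Q}\|^{2}_{L_{2}(\mathcal{U},\mathcal{H})}.
```

The first expression is the drift integrand in (3), and the second is exactly the left-hand side of the coercivity condition (A3).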

2.2 The observation process

In the biophysical application we can only measure the actin concentration indirectly in the form of photon emissions of certain fluorescent biomarkers attached to actin. These measurements are given as sequences of digital grey-scale images at given times $t_{1},\dots,t_{n}$ . In particular, the pixel value of an image at time $t_{i}$ corresponds to a (transformed) photon emission count in the corresponding area under the microscope, recorded in the time interval $(t_{i-1},t_{i}]$ . In practice, our analysis has shown that in the given experiments these photon counts follow Poisson statistics. Hence, it is justified to model the photon count of an individual pixel as a Poisson distributed random variable, where the intensity is given as a function of the concentration of fluorescent molecules available at the time of recording in the corresponding area.

Now, for a sequence of images, an intuitive approach to modeling such an observation scheme is to assign a point process in time to each pixel, resulting in a multivariate point process as described in (O$^{M}$), where $M$ is the number of pixels. This is referred to as the finite-dimensional model because it only involves a finite number of sets, or pixels.

A more general approach is to move away from the analogy of digital images with a fixed number of pixels and instead look at (theoretical) recordings of the exact space-time locations of each single photon count. An analytically manageable way to formalize such an observation is by employing the notion of marked point processes, which can be seen either as random space-time point clouds or as random space-time counting measures. This leads to the scheme (O), termed the infinite-dimensional observation.

We choose to first construct the more general version (O), as this observation includes the exact times and locations of photon emissions. From this, we derive (O$^{M}$), which records only the pixel area of photon emissions, not their exact positions. This distinction will become clearer once all technical details are elaborated.

In the first half of this section, we provide a brief overview of point process theory, as several of the tools discussed are crucial for the analyses in Sections 3 and 4. The second half introduces the two observation schemes we intend to investigate and outlines the filtering problem.

For a comprehensive introduction to point processes, we refer the reader to [6, 8, 18], which serve as our primary references regarding MPPs.

2.2.1 Fundamentals of marked point processes

Simple point processes and MPPs

A point process $\chi$ on some state space $\mathcal{S}$ is defined as a measurable mapping from $(\Omega,\mathcal{F},\mathbb{P})$ into $(\mathcal{N}^{\#}_{\mathcal{S}},\mathcal{B}(\mathcal{N}^{\#}_{\mathcal{S}}))$ , where $\mathcal{N}^{\#}_{\mathcal{S}}$ denotes the space of boundedly finite counting measures; see [8, Ch. 9]. Motivated by our application, we choose $\mathcal{S}:=[0,T]\times\mathcal{K}$ for $T$ from Section 2.1.1 and a compact set $\mathcal{K}\subset\mathbb{R}^{d_{O}}$ , $d_{O}\in\mathbb{N}$ . Let $\mu_{\mathcal{K}}$ denote the $d_{O}$ -dimensional Lebesgue measure. We call the measure space $(\mathcal{K},\mathcal{B}(\mathcal{K}),\mu_{\mathcal{K}})$ the mark space, and we refer to $([0,T]\times\mathcal{K},\mathcal{B}([0,T]\times\mathcal{K}),\textnormal{d}t\times\mu_{\mathcal{K}})$ as the product measure space. The following definitions and notations are taken from [8, Ch. 9].

Definition 2.2.

(i)

By $\mathcal{N}^{\#\ast}_{[0,T]}$ we denote the family of all simple counting measures on $[0,T]$ , meaning that for any $\zeta\in\mathcal{N}^{\#\ast}_{[0,T]}$ we have

\zeta(\{t\})\in\{0,1\}\text{ for all }t\in[0,T].

(5)

(ii)

By $\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ we denote the family of boundedly finite counting measures on the product measure space such that for any $\chi\in\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ the associated ground measure $\chi^{g}$ defined by

\chi^{g}(L):=\chi(L\times\mathcal{K}),\text{ for any }L\in\mathcal{B}([0,T]),

(6)

is an element of $\mathcal{N}^{\#\ast}_{[0,T]}$ .

Note that $\mathcal{N}^{\#\ast}_{[0,T]}$ is not a closed subset of $\mathcal{N}^{\#}_{[0,T]}$ , and similarly, $\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ is not a closed subset of $\mathcal{N}^{\#}_{[0,T]\times\mathcal{K}}$ , as in general the existence of so-called accumulation points cannot be ruled out. Let $(\Omega,\mathcal{F},(\mathcal{F}_{t}),\mathbb{P})$ be the filtered probability space from Section 2.1.1.

Definition 2.3.

(i)

A point process $\nu$ on the state space $[0,T]\times\mathcal{K}$ is a measurable mapping from $(\Omega,\mathcal{F},\mathbb{P})$ into $(\mathcal{N}^{\#}_{[0,T]\times\mathcal{K}},\mathcal{B}(\mathcal{N}^{\#}_{[0,T]\times\mathcal{K}}))$ .
(ii)

A point process $\bar{\nu}$ on $[0,T]$ is called simple when $\bar{\nu}\in\mathcal{N}^{\#\ast}_{[0,T]}\;\mathbb{P}$ -a.s.
(iii)

A point process $\nu$ on $[0,T]\times\mathcal{K}$ is called marked point process (MPP) on $[0,T]$ with mark space $\mathcal{K}$ if $\nu\in\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}\;\mathbb{P}$ -a.s.
(iv)

An MPP $\nu$ on $[0,T]\times\mathcal{K}$ is called marked Poisson process on $[0,T]$ with mark space $\mathcal{K}$ if its ground process is a Poisson process on $[0,T]$ .

As throughout the paper $\mathcal{K}$ will always be the mark space, we are simply going to refer to any MPP on $[0,T]$ with mark space $\mathcal{K}$ as an MPP on $[0,T]\times\mathcal{K}$ . Sometimes it is also demanded that a marked Poisson process has a mark distribution which, given $\alpha$ is independent of $\lambda^{g}$ ; see [6, p. 243].

Remark 2.4 (Finite boundedness on compact spaces).

For any complete separable metric space $\mathcal{S}$ , denote by $\mathcal{M}^{\#}_{\mathcal{S}}$ the space of all boundedly finite measures on $\mathcal{S}$ , i.e., all countably additive, real-valued set functions $\xi$ with the property

\xi(A)<\infty\text{ for any bounded }A\in\mathcal{B}(\mathcal{S}),

(7)

and by $\mathcal{M}_{\mathcal{S}}$ the family of all totally finite measures on $\mathcal{S}$ . It is known that under the weak topology, $\mathcal{M}_{\mathcal{S}}$ is itself a complete separable metric space and that the family of all totally finite counting measures $\mathcal{N}_{\mathcal{S}}$ is a closed subset of $\mathcal{M}_{\mathcal{S}}$ . Analogously, $\mathcal{M}^{\#}_{\mathcal{S}}$ is a complete separable metric space under the weak-hash topology, and the space of boundedly finite counting measures $\mathcal{N}^{\#}_{\mathcal{S}}$ is a closed subset of $\mathcal{M}^{\#}_{\mathcal{S}}$ ; see [8, Ch. 9] for details.

It is evident that by compactness of $[0,T]\times\mathcal{K}$ the families $\mathcal{M}_{[0,T]\times\mathcal{K}}$ and $\mathcal{M}^{\#}_{[0,T]\times\mathcal{K}}$ , and thus also $\mathcal{N}_{[0,T]\times\mathcal{K}}$ and $\mathcal{N}^{\#}_{[0,T]\times\mathcal{K}}$ , coincide. This implication will play a role in Section 4, where we are going to exploit the fact that weak convergence on $\mathcal{N}_{[0,T]\times\mathcal{K}}$ is metrizable to derive convergence rates; see [9]. However, keeping this identity in mind we will stick to the notation using the $\#$ -symbol for the measure spaces to be in line with point process literature.

Doob-Meyer decomposition of MPPs

For an MPP $\nu$ , let us denote $\nu_{\Gamma}(t):=\nu((0,t]\times\Gamma)$ (and $\nu_{\Gamma}(0):=\nu(\{0\}\times\Gamma)$ ) for any $t\in[0,T]$ and $\Gamma\in\mathcal{B}(\mathcal{K})$ . Under mild assumptions, in particular boundedly finite first moment measure and absolute continuity of the so-called Campbell measure associated to $\nu$ , see [8, Ch. 13-14], we have the existence of a $\mathbb{P}$ -a.s. unique nonnegative conditional intensity $\lambda$ w.r.t. $(\mathbb{P},\mathcal{F}_{t})$ , such that we have the integral representation

\displaystyle\textnormal{d}\nu_{\Gamma}(t)

\displaystyle=\int_{\Gamma}\lambda(t,x)\;\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}t+\textnormal{d}N_{\Gamma}(t),

(8)

where the process $(N_{\Gamma}(t))_{t\geq 0}$ , defined in terms of the compensator $\Lambda_{\Gamma}(t):=\int_{0}^{t}\int_{\Gamma}\lambda(s,x)\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s$ by

\textnormal{d}N_{\Gamma}(t):=\textnormal{d}\nu_{\Gamma}(t)-\textnormal{d}\Lambda_{\Gamma}(t),\quad t\in(0,T],

(9)

is a right-continuous local $\mathcal{F}_{t}$ -martingale for any $\Gamma\in\mathcal{B}(\mathcal{K})$ .
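A minimal worked instance of the decomposition (8)-(9), under the illustrative assumption of a constant intensity $\lambda(t,x)\equiv c>0$ (a homogeneous marked Poisson process, which is our choice and not a case singled out in the text): writing $\Lambda_{\Gamma}$ for the integrated intensity over $(0,t]\times\Gamma$ , the compensator is deterministic and linear in $t$ , and $\nu_{\Gamma}(t)$ is Poisson distributed with mean $c\,\mu_{\mathcal{K}}(\Gamma)\,t$ .

```latex
\Lambda_{\Gamma}(t)
  = \int_{0}^{t}\int_{\Gamma} c\;\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s
  = c\,\mu_{\mathcal{K}}(\Gamma)\,t,
\qquad
N_{\Gamma}(t) = \nu_{\Gamma}(t) - c\,\mu_{\mathcal{K}}(\Gamma)\,t,
\qquad
\mathbb{E}[N_{\Gamma}(t)] = 0,
\quad
\mathrm{Var}[N_{\Gamma}(t)] = c\,\mu_{\mathcal{K}}(\Gamma)\,t .
```

In this special case $N_{\Gamma}$ is the familiar compensated Poisson process, which is a true martingale rather than only a local one.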

The analogous decomposition can be done for the ground measure $\nu^{g}$ of an MPP. There we simply introduce the ground process $(\nu^{g}(t))_{t\geq 0}$ by

\nu^{g}(t):=\nu^{g}((0,t])=\sum\limits_{(\tau_{i},\kappa_{i})\in\nu((0,t]\times\mathcal{K})}\mathds{1}\{(\tau_{i},\kappa_{i})\in(0,t]\times\mathcal{K}\},\quad t\in(0,T],

(10)

which defines a right-continuous $\mathcal{F}_{t}$ -adapted stochastic process. This leads to the integral representation

\displaystyle\textnormal{d}\nu^{g}(t)

\displaystyle=\int_{\mathcal{K}}\lambda(t,x)\;\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}t+\textnormal{d}N_{\mathcal{K}}(t).

(11)

It is often useful to factorize $\lambda$ into the intensity $\lambda^{g}$ of the ground process $\nu^{g}$ , defined $\mathbb{P}$ -a.s. by

\lambda^{g}(t):=\int_{\mathcal{K}}\lambda(t,x)\,\mu_{\mathcal{K}}(\textnormal{d}x),\quad t\in[0,T],

and the stochastic kernel of the so-called conditional mark distribution $\Phi(\textnormal{d}x\,|\,t):=\phi(x\,|\,t)\mu_{\mathcal{K}}(\textnormal{d}x)$ on $\mathcal{K}$ , leading to the pair $\{\lambda^{g}(\cdot)\,,\,\Phi(\textnormal{d}x\,|\,\cdot)\}$ , called $(\mathbb{P},\mathcal{F}_{t})$ -local characteristics in [6]. The existence and uniqueness of such a factorization follows directly from the assumptions we made on the point process; see [8, Prop. 14.3.II]. As they are derived directly from the compensator, the conditional intensity and, equivalently, the local characteristics suffice to completely characterize an MPP w.r.t. $(\mathcal{F}_{t})$ .
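The factorization of $\lambda$ into ground intensity and mark density can be sketched numerically as follows. This is only an illustration under assumed simplifications: a one-dimensional mark space $\mathcal{K}=[0,1]$ , a uniform midpoint grid, a fixed time $t$ , and the hypothetical intensity $\lambda(t,x)=1+x$ . The helper name `local_characteristics` is ours.

```python
import numpy as np

def local_characteristics(lam, cell_volume):
    """Split intensity values lam[i] on a uniform grid of marks into the
    ground intensity lambda^g and the mark density phi."""
    lam = np.asarray(lam, dtype=float)
    lam_g = lam.sum() * cell_volume   # Riemann sum for the integral of lambda over K
    phi = lam / lam_g                 # density of Phi w.r.t. Lebesgue measure
    return lam_g, phi

# Hypothetical intensity lambda(t, x) = 1 + x on K = [0, 1], 100 midpoint cells.
n = 100
x = (np.arange(n) + 0.5) / n
lam_g, phi = local_characteristics(1.0 + x, 1.0 / n)
```

By construction `lam_g * phi` recovers the original intensity values, and `phi` integrates to one over the grid.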

Cox processes

Finally, all of the concepts in this paragraph can be easily extended to (marked) point processes whose intensities $\lambda$ are functions of some underlying random element $\zeta$ . We provide a heuristic definition and again refer to the standard books [6, 7] for further details.

Definition 2.5.

Let $\zeta$ be a random measure on some measurable space $(S,\mathcal{B}(S))$ .

(i)

An MPP $\nu$ on $[0,T]\times\mathcal{K}$ is a generalized marked Cox process directed by $\zeta$ , when its conditional intensity $\lambda$ is a measurable function of $\zeta$ .
(ii)

An MPP $\nu$ on $[0,T]\times\mathcal{K}$ is a marked Cox process directed by $\zeta$ , when it is a generalized marked Cox process whose ground process given $\zeta$ is a Poisson process on $[0,T]$ ; equivalently, given $\zeta$ the MPP $\nu$ is a marked Poisson process.

We want to note that the notion of a generalized Cox process is not used consistently in the literature. In filtering theory it is standard procedure to let the random measure $\zeta$ be given as a nonnegative function of the state $\xi(t)$ of some Markov process $(\xi(t))_{t\geq 0}$ . Equivalently, one can then say that the generalized Cox process is directed by $(\xi(t))_{t\geq 0}$ . An explicit construction will be given in the next section.

2.2.2 Observation schemes

Let $(\Omega,\mathcal{F},(\mathcal{F}_{t}),\mathbb{P})$ be the filtered probability space and $X$ be the signal from Section 2.1.1. We impose the following assumptions.

Assumption 2.

Let $\mathcal{K}\subset\mathbb{R}^{d_{O}}$ be compact. The observation process $Y$ is given as a generalized Cox process on $[0,T]\times\mathcal{K}$ directed by $X$ , with boundedly finite first moment. Moreover, the conditional $(\mathbb{P},\mathcal{F}_{t})$ -intensity $\lambda$ of $Y$ is a strictly positive, bounded, measurable mapping $\lambda:[0,T]\times\mathcal{K}\times\mathcal{H}\longrightarrow\mathbb{R}_{+}$ such that there exist constants $\underline{C},\,\overline{C}$ with

0<\underline{C}\leq\int_{\mathcal{K}}\lambda(t,x\,|\,u)\,\mu_{\mathcal{K}}(\textnormal{d}x)\leq\overline{C}<\infty,\quad\mathbb{P}\text{-a.s.},\;t\in[0,T].

(12)

As discussed in the previous section, with fixed $T>0$ , an MPP on $[0,T]\times\mathcal{K}$ is not only $\mathbb{P}$ -a.s. boundedly finite but even $\mathbb{P}$ -a.s. totally finite. Therefore, assuming the boundedness of the stochastic intensity is not overly restrictive in this context.

Remark 2.6.

Using the notion of local characteristics introduced in the last section, condition (12) is equivalent to saying that the $(\mathbb{P},\mathcal{F}_{t})$ -local characteristics $(\lambda^{g}(t\,|\,X(t)),\Phi(\textnormal{d}x\,|\,t,X(t)))$ of $Y$ are uniformly bounded, $\mathcal{H}$ -measurable mappings such that

0<\underline{C}\leq\lambda^{g}(t\,|\,X(t))\leq\overline{C}<\infty,

(13)

\int_{\mathcal{K}}\Phi(\textnormal{d}x\,|\,t,X(t))=1

(14)

$\mathbb{P}$ -a.s., for any $t\in[0,T]$ .

Infinite-dimensional observations

Given $Y$ as in Assumption 2, the observation (O) is a realization of the MPP $Y$ on $[0,T]\times\mathcal{K}$ given a signal path of $X$ , meaning that for any Borel set $\Gamma\in\mathcal{B}(\mathcal{K})$ , by using the form of the semimartingale decomposition in (11), we have a path of the jump process

\hypertarget{introduction_MPP_observation}{\text{(O)}}\quad\begin{cases}\begin{aligned} \textnormal{d}Y_{\Gamma}(t)&=[\lambda^{g}(t\,|\,X(t))\Phi(\Gamma\,|\,t,X(t))]\,\textnormal{d}t+\textnormal{d}N_{\Gamma}(t),\quad t\in(0,T],\\ Y_{\Gamma}(0)&=0.\end{aligned}\end{cases}

Finite-dimensional observations

In (O), for any $t\in[0,T]$ , given $X$ the observation $Y_{\cdot}(t)$ is a measure on $(\mathcal{K},\mathcal{B}(\mathcal{K}))$ . In practice we often have a finite-dimensional observation vector (think of pixels in an image from fluorescence microscopy), which dictates a specific partition of the mark space $\mathcal{K}$ , thereby limiting the available spatial information and hence the choice of test sets. Such a spatial discretization can be formalized as follows: for any $M\in\mathbb{N}$ we denote by

\mathcal{K}^{M}:=\{K^{M}_{1},\dots,K^{M}_{M}\}

a partition of the mark space $\mathcal{K}$ consisting of nonempty Borel sets. Such a collection of sets $\mathcal{K}^{M}$ can always be found for any $M\in\mathbb{N}$ due to the separability assumption on $\mathcal{K}$ .

Given any partition $\mathcal{K}^{M}$ and a realization of the signal $X$ , we define

\lambda^{M}_{i}(t\,|\,X(t)):=\lambda^{g}(t\,|\,X(t))\Phi(K^{M}_{i}\,|\,t,X(t)),\quad i=1,\dots,M,

for any $t\in[0,T]$ . We now introduce a multivariate $M$ -dimensional point process $(Y^{M}(t))_{t\in[0,T]}$ on $[0,T]$ , with $Y^{M}(t):=(Y^{M}_{1}(t),\dots,Y^{M}_{M}(t))$ , $t\in[0,T]$ , where each of the $Y^{M}_{i}$ has $(\mathbb{P},\mathcal{F}_{t})$ -intensity $\lambda^{M}_{i}(t\,|\,X(t))$ . Exactly as in (11), any of the processes $Y^{M}_{i}$ can be written as a semimartingale with associated jump martingale part

\textnormal{d}N^{M}_{i}(t):=\textnormal{d}Y^{M}_{i}(t)-\lambda^{M}_{i}(t\,|\,X(t))\,\textnormal{d}t.

The finite-dimensional observation is then given as the system

\hypertarget{introduction_MPP_discretized_observation_multivariate}{\textnormal{(O${}^{M}$)}}\quad\begin{cases}\begin{aligned} \textnormal{d}Y^{M}_{i}(t)&=\lambda^{M}_{i}(t\,|\,X(t))\,\textnormal{d}t+\textnormal{d}N^{M}_{i}(t),\quad t\in(0,T],\\ Y^{M}_{i}(0)&=0,\end{aligned}\end{cases}

for $i=1,\dots,M$ .
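The construction of the pixel intensities $\lambda^{M}_{i}$ and of one recorded frame can be sketched as follows. This is a simplified illustration, not the simulation code of this paper: it assumes a one-dimensional mark grid, pixels formed by equally sized blocks of neighbouring grid cells, and a signal that is frozen over the frame so that the counts are Poisson with mean $\lambda^{M}_{i}\,\Delta t$ . The names `pixel_intensities` and `sample_frame` are hypothetical.

```python
import numpy as np

def pixel_intensities(lam_g, phi, cell_volume, n_pixels):
    """lambda_i^M = lambda^g * Phi(K_i^M) for a partition of a 1D mark grid
    into n_pixels blocks (len(phi) must be divisible by n_pixels)."""
    phi = np.asarray(phi, dtype=float)
    mass = phi * cell_volume                            # Phi of each fine grid cell
    Phi_pixels = mass.reshape(n_pixels, -1).sum(axis=1) # Phi(K_i^M)
    return lam_g * Phi_pixels

def sample_frame(lam_pixels, dt, rng):
    """Photon counts of one frame of length dt, with the intensity held
    constant over the frame (an approximation for short frames)."""
    return rng.poisson(lam_pixels * dt)

# Uniform mark density on [0, 1], ground intensity 200, four pixels.
rng = np.random.default_rng(0)
lam_pixels = pixel_intensities(200.0, np.ones(100), 1.0 / 100, 4)
counts = sample_frame(lam_pixels, 0.01, rng)
```

Since the sets $K^{M}_{i}$ partition $\mathcal{K}$ , the pixel intensities sum to the ground intensity, which the example reproduces numerically.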

Remark 2.7.

Although we could also introduce a general multivariate point process in the form of (O$^{M}$), we choose to explicitly construct the finite-dimensional observation from the MPP, as this approach allows us to utilize the more general methods in both settings from the outset. Moreover, we do not need to introduce additional assumptions on the multivariate point process $Y^{M}$ , as they carry over from the properties of $Y$ . We will furthermore have the advantage of being able to embed the multivariate point process $Y^{M}$ on $[0,T]$ into the space of counting measures on $[0,T]\times\mathcal{K}$ in Section 4. This way we characterize $Y$ as a weak limit of multivariate point processes and show how the filtering equations for (O) can be seen as the limit case of the ones corresponding to (O$^{M}$).

We end this section with a simple practical example of our observation schemes.

Example 2.8 (Reaction-Diffusion SPDE with Marked Cox process observations).

For a given compact domain $\mathcal{D}\subset\mathbb{R}^{d}$ and a globally Lipschitz continuous and bounded function $F$, we define the operator $A(u):=\Delta u+F(u)$ with Dirichlet boundary conditions, such that LABEL:introduction_spde_basic_formulation becomes

\textnormal{d}X(t)=(\Delta X(t)+F(X(t)))\textnormal{d}t+B\textnormal{d}W(t),

(15)

which represents a typical reaction-diffusion SPDE. We choose $\mathcal{V}:=W^{1,2}_{0}(\mathcal{D})$ , $\mathcal{H}:=L^{2}(\mathcal{D})$ , so $\mathcal{V}^{\ast}:=(W^{1,2}_{0}(\mathcal{D}))^{\ast}$ ; see [19, Ch. 4.1] for a detailed discussion.

Now, we explicitly construct a simple example for a marked Cox process observation of $X$ . To this end, let $\mathcal{K}=\mathcal{D}$ and let $0<c_{1}<c_{2}$ . We define

\lambda^{g}(u):=\max\big{\{}\|u\|_{\mathcal{H}}+c_{1},c_{2}\big{\}},\quad u\in% \mathcal{H}.

For some given mollifier $\varphi_{\varepsilon}:\mathbb{R}^{d}\rightarrow\mathbb{R}$ with radius $\varepsilon>0$ (see for example [4, Chapter 4.4]) we have

u^{\varepsilon}:=u\ast\varphi_{\varepsilon}\in\mathcal{C}^{\infty}(\mathcal{D}),

and by defining

\phi(x\,|\,u):=u^{\varepsilon}(x)\,\Big(\int_{\mathcal{D}}u^{\varepsilon}(y)\,\textnormal{d}y\Big)^{-1},

we get a probability density on $\mathcal{K}$ with corresponding distribution $\Phi(\,\cdot\,|u)=\int_{\cdot}\phi(x|u)\,\textnormal{d}x$ for any $u\in\mathcal{H}$ .

Given a signal path $X$ according to (15), we define the observation $Y$ as the marked Cox process with $\mathbb{P}$-local characteristics $(\lambda^{g}(X(t)),\Phi(\textnormal{d}x\,|\,X(t)))$. The ground process $Y^{g}$ is indeed a Cox process in time, as $\lambda^{g}(X(\,\cdot\,))$ is continuous and $\mathcal{F}_{0}$-measurable by construction, hence [8, Theorem 14.6.I.] applies.
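The construction in this example can be sketched in code as follows. This is a minimal sketch under simplifying assumptions that are not taken from the example itself: the signal path is frozen and piecewise constant in time, the mark space is $[0,1)$ split into equal cells, and the mollified signal enters only through nonnegative cell weights. The function name `simulate_marked_cox` and its arguments are hypothetical.

```python
import random


def simulate_marked_cox(ground_rates, mark_weights, dt, seed=0):
    """Sample (time, mark) pairs conditionally on a frozen signal path.

    ground_rates[k] : value of lambda^g on [k*dt, (k+1)*dt), assumed
                      piecewise constant and strictly positive
    mark_weights[k] : nonnegative weights over equal cells of K = [0, 1),
                      proportional to the mollified signal on that step
    """
    rng = random.Random(seed)
    events = []
    for k, rate in enumerate(ground_rates):
        t, end = k * dt, (k + 1) * dt
        n_cells = len(mark_weights[k])
        while True:
            # Exponential waiting times are exact for a constant rate;
            # restarting at each step is valid by memorylessness.
            t += rng.expovariate(rate)
            if t >= end:
                break
            cell = rng.choices(range(n_cells), weights=mark_weights[k])[0]
            mark = (cell + rng.random()) / n_cells  # uniform inside the cell
            events.append((t, mark))
    return events


# Hypothetical usage: constant ground rate 2 on [0, 5), three mark cells.
events = simulate_marked_cox([2.0] * 50, [[1.0, 2.0, 1.0]] * 50, dt=0.1)
```

Conditionally on the signal the ground process is an inhomogeneous Poisson process, which is why the time stepping with exponential waiting times suffices here.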

3 The Filtering Equations

In this section we derive the classical equations of the unnormalized and normalized filters for the observation scheme LABEL:introduction_MPP_observation. The main techniques have been known since Snyder’s seminal paper [27] and have been generalized to the MPP case by Brémaud, see [6]. Further references on the topic include [8, 18]. Our paper is the first to tackle the case of an SPDE signal and thus, in comparison to the rather recent paper [28], we do know the explicit form of the generator $\mathcal{L}$ and the functional analytical framework of $X$. For the rest of this section we assume that Assumptions 1 and 2 hold true.

3.1 The Kallianpur-Striebel formula

As usual in filtering, our first step is to show the existence of a reference measure $\mathbb{Q}$ on $(\Omega,\mathcal{F})$ under which the process $Y$ has $(\mathbb{Q},\mathcal{F}_{t})$ -local characteristics $(1,|\mathcal{K}|^{-1}\mu_{\mathcal{K}}(\textnormal{d}x))$ , in other words under which $Y$ has a unit rate Poisson-distributed ground process and uniformly distributed marks in $\mathcal{K}$ . By $\mathbb{P}_{t}$ and $\mathbb{Q}_{t}$ we denote the restrictions of the respective measures to $\mathcal{F}_{t}$ , for any $t\in[0,T]$ .

First we define the process $(\hat{Z}(t))_{t\geq 0}$ via

\begin{aligned}\hat{Z}(t):=\exp\bigg\{&-\int_{0}^{t}\int_{\mathcal{K}}\log\{\lambda(s-,x\,|\,X(s))\}\,Y(\textnormal{d}s,\textnormal{d}x)\\ &+\int_{0}^{t}\int_{\mathcal{K}}\big(\lambda(s,x\,|\,X(s))-1\big)\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\bigg\},\quad t\in[0,T],\end{aligned}

(16)

which is well-defined as $\lambda$ is strictly positive and measurable. An application of Itô’s formula shows that $\hat{Z}$ is a stochastic exponential satisfying the integral equation

\begin{aligned}\hat{Z}(t)=1+\int_{0}^{t}\int_{\mathcal{K}}&\hat{Z}(s-)\big(\lambda(s-,x\,|\,X(s))^{-1}-1\big)\\ &\times\big(Y(\textnormal{d}s,\textnormal{d}x)-\lambda(s,x\,|\,X(s))\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\big).\end{aligned}

(17)

The following result is crucial for the filtering equations:

Lemma 3.1.

The process $\hat{Z}$ given by (16) is a $(\mathbb{P},\mathcal{F}_{t})$ -martingale.

$\diamond$

We omit a detailed proof, as it is standard and widely available in the literature, see [6, 8, 18]. Furthermore, the proof does not hinge on the specifics of the underlying signal. The general strategy relies on the fact that, as a consequence of the boundedness of $Y$ , $\hat{Z}$ is a local $(\mathbb{P},\mathcal{F}_{t})$ -martingale, and by nonnegativity also a $(\mathbb{P},\mathcal{F}_{t})$ -supermartingale. In conclusion, it suffices to show that $\mathbb{E}_{\mathbb{P}}[\hat{Z}(t)]=1$ for any $t\in[0,T]$ , under the conditions outlined in [6, VIII.T11], which are fulfilled in our case.

This lets us introduce the reference probability measure $\textnormal{d}\mathbb{Q}_{t}:=\hat{Z}(t)\textnormal{d}\mathbb{P}_{t}$ , which can be extended to a probability measure $\mathbb{Q}$ on $(\Omega,\mathcal{F})$ by standard methods. Under $\mathbb{Q}$ , the processes $X$ and $Y$ are independent as $Y$ has $(\mathbb{Q},\mathcal{F}_{t})$ -local characteristics $(1,|\mathcal{K}|^{-1}\mu_{\mathcal{K}}(\textnormal{d}x))$ , see [6, VIII.T10] and [8, Prop. 14.4.III]. Furthermore, the notion of Radon-Nikodym derivatives is justified and we define $\frac{\textnormal{d}\mathbb{Q}_{t}}{\textnormal{d}\mathbb{P}_{t}}:=\hat{Z}(t)$ .

Moreover, as $\hat{Z}$ is strictly positive, we can define $Z(t):=(\hat{Z}(t))^{-1}$, $t\in[0,T]$, and by (17) obtain the associated integral equation

Z(t)=1+\int_{0}^{t}\int_{\mathcal{K}}Z(s-)(\lambda(s-,x\,|\,X(s))-1)\,(Y(% \textnormal{d}s,\textnormal{d}x)-\mu_{\mathcal{K}}(\textnormal{d}x)\textnormal% {d}s),

(18)

for $t\in[0,T].$ Furthermore, the above results imply $\mathbb{E}_{\mathbb{Q}}[Z(t)]=1$ , $t\in[0,T]$ and that the converse Radon-Nikodym derivative is given by $\frac{\textnormal{d}\mathbb{P}_{t}}{\textnormal{d}\mathbb{Q}_{t}}:=Z(t)$ .
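Since $Z=\hat{Z}^{-1}$, inverting (16) gives $\log Z(t)$ as a sum of log-intensities over the observed points minus a compensator integral. The following minimal sketch evaluates this quantity numerically. It assumes, for illustration only, a mark space $[0,1]$ with Lebesgue measure, an intensity function with the signal path already plugged in, and a midpoint rule for the compensator; the name `log_z` is hypothetical.

```python
import math


def log_z(events, intensity, t_end, n_time=100, n_space=100):
    """Approximate log Z(t_end) for observed points `events`.

    events    : list of (time, mark) pairs with marks in [0, 1]
    intensity : function (s, x) -> lambda(s, x | X(s)) > 0, with the
                signal path already inserted (an assumption of this sketch)
    """
    # Jump part: sum of log-intensities at the observed points up to t_end.
    jump_part = sum(math.log(intensity(s, x)) for s, x in events if s <= t_end)
    # Compensator part: midpoint rule for the double integral of (lambda - 1).
    hs, hx = t_end / n_time, 1.0 / n_space
    compensator = 0.0
    for a in range(n_time):
        s = (a + 0.5) * hs
        for b in range(n_space):
            x = (b + 0.5) * hx
            compensator += (intensity(s, x) - 1.0) * hs * hx
    return jump_part - compensator
```

Working on the log scale avoids overflow when many points are observed; for the unit-rate reference intensity the function returns zero, in line with $Z\equiv 1$.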

Remark 3.2.

Depending on the range of the values of $\lambda$ , the canonical choice of $(\mathbb{Q},\mathcal{F}_{t})$ -local characteristics $(1,|\mathcal{K}|^{-1}\mu_{\mathcal{K}}(\textnormal{d}x))$ can be adjusted to $(c_{g},|\mathcal{K}|^{-1}\mu_{\mathcal{K}}(\textnormal{d}x))$ for some $c_{g}>0$ , without any limitations to the theory developed in this paper. All objects derived in this and the subsequent sections can be configured to hold with respect to the adjusted characteristics.

From a numerical perspective, it might be useful to choose $c_{g}$ in such a way that the difference $\lambda(s-,x\,|\,X(s))-c_{g}$ remains within a numerically feasible range in (16) and forthcoming analogous Radon-Nikodym densities.

From a statistical standpoint, it could be beneficial to choose a $c_{g}$ much larger than the actual intensity, analogous to using a reference process with a much higher expected number of points and interpreting the actual observation as a thinned point process.

$\diamond$

For any Itô-function $\psi$ we define the normalized filter $(\eta_{t}(\psi))_{t\geq 0}$ by

\eta_{t}(\psi):=\mathbb{E}_{\mathbb{P}}[\psi(X(t))|\mathcal{Y}_{t}],\quad t\in% [0,T],

where $(\mathcal{Y}_{t})_{t\geq 0}$ is the filtration generated by the observation process $Y$. The starting point for deriving an explicit form of $(\eta_{t}(\psi))_{t\geq 0}$ is the following Bayes-type formula.

Theorem 3.3.

The following Kallianpur-Striebel formula holds for any Itô-function $\psi$ :

\eta_{t}(\psi)=\frac{\mathbb{E}_{\mathbb{Q}}[\psi(X(t))Z(t)|\mathcal{Y}_{t}]}{% \mathbb{E}_{\mathbb{Q}}[Z(t)|\mathcal{Y}_{t}]}\quad\mathbb{P}\text{-a.s.},% \quad t\in[0,T],

(19)

where $Z(t)$ is given by

\begin{aligned}Z(t)=\exp\bigg\{&\int_{0}^{t}\int_{\mathcal{K}}\log\{\lambda(s-,x\,|\,X(s))\}\,Y(\textnormal{d}s,\textnormal{d}x)\\ &-\int_{0}^{t}\int_{\mathcal{K}}\big(\lambda(s,x\,|\,X(s))-1\big)\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\bigg\},\quad t\in[0,T].\end{aligned}

$\diamond$

Proof. Let $\psi$ be globally bounded. For any test set $U\in\mathcal{Y}_{t}$ we have, since $Z(t)=\frac{\textnormal{d}\mathbb{P}_{t}}{\textnormal{d}\mathbb{Q}_{t}}$,

\mathbb{E}_{\mathbb{Q}}\left[\mathds{1}_{U}\,\mathbb{E}_{\mathbb{Q}}[\psi(X(t))Z(t)\,|\,\mathcal{Y}_{t}]\right]=\mathbb{E}_{\mathbb{Q}}\left[\mathds{1}_{U}\,\psi(X(t))Z(t)\right]=\mathbb{E}_{\mathbb{P}}\left[\mathds{1}_{U}\,\psi(X(t))\right]

(20)

by definition and

\begin{aligned}\mathbb{E}_{\mathbb{Q}}\left[\mathds{1}_{U}\,\mathbb{E}_{\mathbb{P}}[\psi(X(t))\,|\,\mathcal{Y}_{t}]\,\mathbb{E}_{\mathbb{Q}}[Z(t)\,|\,\mathcal{Y}_{t}]\right]&=\mathbb{E}_{\mathbb{P}}\left[\mathds{1}_{U}\,\mathbb{E}_{\mathbb{P}}[\psi(X(t))\,|\,\mathcal{Y}_{t}]\right]&&\text{(21)}\\ &=\mathbb{E}_{\mathbb{P}}\left[\mathds{1}_{U}\,\psi(X(t))\right],&&\text{(22)}\end{aligned}

by $\mathcal{Y}_{t}$-measurability of $\mathbb{E}_{\mathbb{P}}[\psi(X(t))\,|\,\mathcal{Y}_{t}]$. In order to get the equality in ratio form, we observe that for any $\mathcal{Y}_{t}$-measurable set $N$ on which $\mathbb{E}_{\mathbb{Q}}[Z(t)\,|\,\mathcal{Y}_{t}]=0$ we have

\mathbb{P}(N)=\mathbb{E}_{\mathbb{Q}}[\mathds{1}_{N}Z(t)]=\mathbb{E}_{\mathbb{Q}}[\mathds{1}_{N}\mathbb{E}_{\mathbb{Q}}[Z(t)\,|\,\mathcal{Y}_{t}]]=0,

(23)

implying that (19) holds $\mathbb{P}$-a.s. The statement for general $\psi$ follows with monotone-class arguments and approximations.

$\square$

Remark 3.4.

To ensure clarity in the notation for regular conditional expectations used in subsequent sections, we define the functional $z:[0,T]\times\mathcal{C}([0,T];\mathcal{H})\times\mathcal{N}^{\#g}_{[0,T]% \times\mathcal{K}}\rightarrow\mathbb{R}$ by

\begin{aligned}z(t;\mathbf{x},\xi):=\exp\bigg\{&\int_{0}^{t}\int_{\mathcal{K}}\log\big(\lambda(s,x\mid\mathbf{x}(s))\big)\,\xi(\textnormal{d}s,\textnormal{d}x)\\ &-\int_{0}^{t}\int_{\mathcal{K}}\big(\lambda(s,x\mid\mathbf{x}(s))-1\big)\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\bigg\},\end{aligned}

for $t\in[0,T]$ , $\mathbf{x}\in\mathcal{C}([0,T];\mathcal{H})$ , and $\xi\in\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ . Given the signal $X$ and observation $Y$ , we have

Z(t)=z(t;X,Y).

Consequently, the unnormalized posterior distribution is given by

\rho_{t}(A):=\rho_{t}(\mathds{1}_{A})=\mathbb{E}_{\mathbb{Q}}[\mathds{1}_{A}(X(t))z(t;X,Y)\,|\,\mathcal{Y}_{t}],\quad A\in\mathcal{B}(\mathcal{H}).

This gives rise to the definition $\tilde{\rho}:[0,T]\times\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}\rightarrow% \mathcal{M}^{+}_{\mathcal{H}}$ as follows:

\tilde{\rho}_{t}\{\xi\}(A):=\mathbb{E}_{X}[\mathds{1}_{A}(X(t))z(t;\cdot,\xi)]=\mathbb{E}_{\mathbb{P}}[\mathds{1}_{A}(X(t))\,|\,Y_{0:t}=\xi_{0:t}],\quad A\in\mathcal{B}(\mathcal{H}),

for any $\xi\in\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$, where $\mathbb{E}_{X}$ denotes the expectation with respect to the law $\mathbb{P}_{X}$ of $X$. In other words, $\tilde{\rho}_{t}$ is a regular version of the unnormalized conditional expectation $\rho_{t}$. Therefore, for a typical observation $Y$, we have $\tilde{\rho}_{t}\{Y\}(A)=\rho_{t}(A)$, $t\in[0,T]$.

$\diamond$
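The representation via $\mathbb{E}_{X}$ suggests a plain Monte Carlo approximation of the ratio in (19): draw signal paths independently of the observation, weight each by $z(t;\cdot,\xi)$, and normalize. The sketch below is an illustration under assumptions, not a method proposed in the text. It takes the values $\psi(X^{j}(t))$ and the log-weights $\log z(t;X^{j},\xi)$ of hypothetical sampled paths $X^{j}$ as given, and the name `kallianpur_striebel_estimate` is made up for this example.

```python
import math


def kallianpur_striebel_estimate(psi_values, log_weights):
    """Self-normalized Monte Carlo estimate of eta_t(psi).

    psi_values[j]  : psi evaluated at the time-t value of sampled path j
    log_weights[j] : log z(t; path j, observed points), e.g. computed
                     by a quadrature of the exponent in z
    """
    # Subtract the maximum before exponentiating to avoid overflow.
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    return sum(p * w for p, w in zip(psi_values, weights)) / total
```

The shift by the maximal log-weight cancels in the ratio, so it changes nothing mathematically but keeps the exponentials in a representable range.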

3.2 The Zakai equation

As usual in Bayesian estimation theory, we denote the numerator of (19) as

\rho_{t}(\psi):=\mathbb{E}_{\mathbb{Q}}[\psi(X(t))Z(t)|\mathcal{Y}_{t}],\quad t% \in[0,T],

and call the process $(\rho_{t}(\psi))_{t\geq 0}$ the unnormalized filter. We have the following theorem for the associated filtering equation:

Theorem 3.5 (Zakai equation).

For any Itô-function $\psi$ the following equation for the unnormalized filter holds

\begin{aligned}\rho_{t}(\psi)&=\rho_{0}(\psi)+\int_{0}^{t}\rho_{s}(\mathcal{L}\psi)\,\textnormal{d}s\\ &\quad+\int_{0}^{t}\int_{\mathcal{K}}\rho_{s-}\big((\lambda(s-,x\,|\,\cdot\,)-1)\psi\big)\,\big(Y(\textnormal{d}s,\textnormal{d}x)-\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\big),\quad\mathbb{Q}\text{-a.s.},\end{aligned}

(24)

for any $t\in[0,T]$ , where $\mathcal{L}$ is given by (4).

$\diamond$

Proof. Let $\psi$ be a globally bounded Itô function. For $t\in[0,T]$ we have by Itô’s lemma for variational solutions of SPDE (see [22, Thm. 1.2]) and by (18) that

\begin{aligned}\psi(X(t))Z(t)&=\psi(X(0))+\int_{0}^{t}Z(s)\,{}_{\mathcal{V}^{\ast}}\langle A(X(s)),\textnormal{D}^{1}\psi(X(s))\rangle_{\mathcal{V}}\,\textnormal{d}s\\ &\quad+\int_{0}^{t}Z(s)\,\textnormal{tr}\big\{\textnormal{D}^{2}\psi(X(s))\,(B(X(s))Q^{\frac{1}{2}})(B(X(s))Q^{\frac{1}{2}})^{\ast}\big\}\,\textnormal{d}s\\ &\quad+\int_{0}^{t}\big(Z(s)\,\textnormal{D}^{1}\psi(X(s)),B(X(s))\,\textnormal{d}W(s)\big)_{\mathcal{H}}\\ &\quad+\int_{0}^{t}\int_{\mathcal{K}}Z(s-)(\lambda(s-,x\,|\,X(s))-1)\psi(X(s))\,\big[Y(\textnormal{d}s,\textnormal{d}x)-\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\big].\end{aligned}

(25)

We take conditional expectations w.r.t. $\mathcal{Y}_{t}$ on both sides and use the definition of the infinitesimal generator in (4) to arrive at

\begin{aligned}\mathbb{E}_{\mathbb{Q}}[\psi(X(t))Z(t)\,|\,\mathcal{Y}_{t}]&=\mathbb{E}_{\mathbb{Q}}\left[\psi(X(0))\,|\,\mathcal{Y}_{t}\right]\\ &\quad+\mathbb{E}_{\mathbb{Q}}\Big[\int_{0}^{t}Z(s)\,\mathcal{L}\psi(X(s))\,\textnormal{d}s\,\Big|\,\mathcal{Y}_{t}\Big]\\ &\quad+\mathbb{E}_{\mathbb{Q}}\Big[\int_{0}^{t}\int_{\mathcal{K}}Z(s-)(\lambda(s-,x\,|\,X(s))-1)\psi(X(s))\\ &\qquad\qquad\times\big(Y(\textnormal{d}s,\textnormal{d}x)-\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\big)\,\Big|\,\mathcal{Y}_{t}\Big],\end{aligned}

(26)

where the stochastic integral with respect to $W$ drops out, as it is a local $\mathbb{Q}$-martingale. Applying the standard stochastic Fubini argument and then inserting the definition of $\rho_{t}(\psi)$ finishes the proof for globally bounded $\psi$. The assertion for general $\psi$ follows from monotone-class arguments and approximations.

$\square$
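To make the structure of (24) concrete, the following toy sketch replaces the SPDE signal by a finite-state Markov chain and drops the marks, so that $\rho_{t}$ becomes a vector over states. This is a deliberate simplification and not the setting of the theorem: the generator matrix, the state-wise intensities `lam`, the explicit Euler step, and the name `zakai_step` are all assumptions chosen for illustration.

```python
def zakai_step(rho, generator, lam, dt, jump):
    """One explicit Euler step of a finite-state, unmarked Zakai analogue.

    rho       : unnormalized weights over the states
    generator : rate matrix of the chain (rows sum to zero)
    lam       : intensity lam[i] > 0 when the signal is in state i
    jump      : True if an observation point falls into this step
    """
    n = len(rho)
    # Between points: adjoint generator minus the compensator (lam - 1).
    new = [
        rho[i]
        + dt * (sum(generator[j][i] * rho[j] for j in range(n))
                - (lam[i] - 1.0) * rho[i])
        for i in range(n)
    ]
    # At a point: rho jumps by (lam - 1) * rho, i.e. it is multiplied by lam.
    if jump:
        new = [lam[i] * new[i] for i in range(n)]
    return new
```

In this toy setting the normalized filter is recovered by dividing the vector by its sum.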

3.3 The Kushner-Stratonovich equation

Now that we have proven Zakai’s equation for the unnormalized filter $(\rho_{t}(\psi))_{t\in[0,T]}$ in our setting, we can derive an equivalent equation for the normalized filter $(\eta_{t}(\psi))_{t\in[0,T]}$ from (19).

Corollary 3.6 (Kushner-Stratonovich equation).

For any Itô-function $\psi$ the following equation for the normalized filter holds

\begin{aligned}\eta_{t}(\psi)&=\eta_{0}(\psi)+\int_{0}^{t}\eta_{s}(\mathcal{L}\psi)\,\textnormal{d}s\\ &\quad+\int_{0}^{t}\int_{\mathcal{K}}\frac{\eta_{s-}(\psi\,\lambda(s-,x\,|\,\cdot))-\eta_{s-}(\psi)\,\eta_{s-}(\lambda(s-,x\,|\,\cdot))}{\eta_{s-}(\lambda(s-,x\,|\,\cdot))}\\ &\qquad\qquad\times\big(Y(\textnormal{d}s,\textnormal{d}x)-\eta_{s-}(\lambda(s-,x\,|\,\cdot))\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\big),\end{aligned}

(27)

for any $t\in[0,T]$ , where $\mathcal{L}$ is given by (4).

$\diamond$

Proof. Let $\psi$ be a globally bounded Itô function. As usual in filtering theory, we are going to use

\eta_{t}(\psi)=\frac{\rho_{t}(\psi)}{\rho_{t}(\mathds{1})},\quad t\in[0,T].

(28)

As $Z(t)^{-1}=\hat{Z}(t)$ , by (17) we have

\begin{aligned}(Z(t))^{-1}&=1-\int_{0}^{t}\int_{\mathcal{K}}\frac{\lambda(s-,x\,|\,X(s))-1}{Z(s-)\,\lambda(s-,x\,|\,X(s))}\\ &\qquad\qquad\times\big(Y(\textnormal{d}s,\textnormal{d}x)-\lambda(s,x\,|\,X(s))\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\big).\end{aligned}

(29)

From here it can be easily derived that the denominator in (28) satisfies

\begin{aligned}\textnormal{d}\rho_{t}(\mathds{1})^{-1}&=-\int_{\mathcal{K}}\frac{\eta_{t-}(\lambda(t-,x\,|\,\cdot))-1}{\rho_{t-}(\mathds{1})\,\eta_{t-}(\lambda(t-,x\,|\,\cdot))}\\ &\qquad\qquad\times\big(Y(\textnormal{d}t,\textnormal{d}x)-\eta_{t}(\lambda(t,x\,|\,\cdot))\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}t\big),\end{aligned}

(30)

see e.g. [6, 8] for detailed discussions on restrictions of stochastic intensities to smaller filtrations. Now, an application of Itô’s lemma yields

\begin{aligned}\textnormal{d}\big(\rho_{t}(\psi)\rho_{t}(\mathds{1})^{-1}\big)&=\rho_{t-}(\mathds{1})^{-1}\textnormal{d}\rho_{t}(\psi)+\rho_{t-}(\psi)\,\textnormal{d}\rho_{t}(\mathds{1})^{-1}+\Delta\rho_{t}(\psi)\,\Delta\rho_{t}(\mathds{1})^{-1}\\ &=\rho_{t}(\mathds{1})^{-1}\rho_{t}(\mathcal{L}\psi)\,\textnormal{d}t\\ &\quad+\int_{\mathcal{K}}\rho_{t-}(\mathds{1})^{-1}\rho_{t-}\big((\lambda(t-,x\,|\,\cdot\,)-1)\psi\big)\,\big(Y(\textnormal{d}t,\textnormal{d}x)-\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}t\big)\\ &\quad-\int_{\mathcal{K}}\rho_{t-}(\psi)\,\frac{\eta_{t-}(\lambda(t-,x\,|\,\cdot))-1}{\rho_{t-}(\mathds{1})\,\eta_{t-}(\lambda(t-,x\,|\,\cdot))}\\ &\qquad\qquad\times\big(Y(\textnormal{d}t,\textnormal{d}x)-\eta_{t}(\lambda(t,x\,|\,\cdot))\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}t\big)\\ &\quad-\int_{\mathcal{K}}\frac{\eta_{t-}(\lambda(t-,x\,|\,\cdot))-1}{\rho_{t-}(\mathds{1})\,\eta_{t-}(\lambda(t-,x\,|\,\cdot))}\,\rho_{t-}\big((\lambda(t-,x\,|\,\cdot\,)-1)\psi\big)\,Y(\textnormal{d}t,\textnormal{d}x),\end{aligned}

(31)

where all terms are well-defined due to our boundedness assumptions on $\lambda$ and $\psi$ . Rearranging terms and inserting the equality (28) lead to (27). The claim for any $\psi$ follows from monotone-class reasoning and approximation methods.

$\square$
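As a worked reading of (27), assuming the ground process has no simultaneous points, the equation splits into a jump update at an observed point $(\tau,\kappa)$ of $Y$ and a deterministic flow between points. Both lines below follow by inserting a single Dirac mass, respectively no mass, for $Y(\textnormal{d}s,\textnormal{d}x)$ in (27); the symbols $\tau$ and $\kappa$ denote an observed time and mark and are used here only for this illustration.

```latex
\begin{aligned}
\eta_{\tau}(\psi)&=\eta_{\tau-}(\psi)+\frac{\eta_{\tau-}(\psi\,\lambda(\tau-,\kappa\,|\,\cdot))-\eta_{\tau-}(\psi)\,\eta_{\tau-}(\lambda(\tau-,\kappa\,|\,\cdot))}{\eta_{\tau-}(\lambda(\tau-,\kappa\,|\,\cdot))}
=\frac{\eta_{\tau-}(\psi\,\lambda(\tau-,\kappa\,|\,\cdot))}{\eta_{\tau-}(\lambda(\tau-,\kappa\,|\,\cdot))},\\
\frac{\textnormal{d}}{\textnormal{d}t}\eta_{t}(\psi)&=\eta_{t}(\mathcal{L}\psi)-\int_{\mathcal{K}}\big(\eta_{t}(\psi\,\lambda(t,x\,|\,\cdot))-\eta_{t}(\psi)\,\eta_{t}(\lambda(t,x\,|\,\cdot))\big)\,\mu_{\mathcal{K}}(\textnormal{d}x)\quad\text{between points.}
\end{aligned}
```

The first line is a Bayes update with likelihood $\lambda(\tau-,\kappa\,|\,\cdot)$, and the second line shows that the absence of points is informative as well, through the covariance between $\psi$ and $\lambda$ under $\eta_{t}$.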

3.4 The filtering equations for finite dimensional observations

Fix $M\in\mathbb{N}\backslash\{0\}$ and let $Y^{M}$ be the process from the observation scheme (O$^{M}$). As mentioned in Section 2, $Y^{M}=(Y^{M}_{1},\dots,Y^{M}_{M})$ is a multivariate point process on $[0,T]$ with conditional intensities $(\lambda^{M}_{1}(t),\dots,\lambda^{M}_{M}(t))$ under $\mathbb{P}$. By construction, any properties which follow from Assumption 2 carry over to their counterparts for multivariate point processes.

Generally speaking, the theory of filtering for point processes (without marks) is well-established. However, since there is no known literature addressing the filtering of multivariate point processes with SPDE signals, except for the conference paper by Florchinger [13], we present the main results in this section for the sake of completeness. As a notational convention, we will use the superscript $M$ to distinguish between finite- and infinite-dimensional objects.

For any $i=1,\dots,M$ , we define

\begin{aligned}\hat{Z}^{M}_{i}(t):=\exp\bigg\{&-\int_{0}^{t}\log\left\{\frac{\lambda^{M}_{i}(s\,|\,X(s))}{\mu_{\mathcal{K}}(K^{M}_{i})}\right\}\,\textnormal{d}Y^{M}_{i}(s)\\ &+\int_{0}^{t}\big(\lambda^{M}_{i}(s\,|\,X(s))-\mu_{\mathcal{K}}(K^{M}_{i})\big)\,\textnormal{d}s\bigg\},\end{aligned}

and

\hat{Z}^{M}(t):=\prod_{i=1}^{M}\hat{Z}^{M}_{i}(t)

for any $t\in[0,T]$ .

Analogous to Lemma 3.1, we have

Lemma 3.7.

The process $(\hat{Z}^{M}(t))_{t\in[0,T]}$ is a $(\mathbb{P},\mathcal{F}_{t})$ -martingale.

$\diamond$

As this can be shown by standard techniques, we again omit the proof and refer to [6, 18, 8].

Using the above lemma, analogously to Section 3.1 we define the reference probability measure $\textnormal{d}\mathbb{Q}_{t}^{M}:=\hat{Z}^{M}(t)\,\textnormal{d}\mathbb{P}_{t}$, which can be extended to a probability measure $\mathbb{Q}^{M}$ on $(\Omega,\mathcal{F})$. Under $\mathbb{Q}^{M}$ the process $(Y^{M}(t))_{t\in[0,T]}$ is an $M$-dimensional Poisson process on $[0,T]$, independent of $X$, whose $i$-th component has rate $\mu_{\mathcal{K}}(K^{M}_{i})$.

Furthermore, Lemma 3.7 implies the existence of the reverse Radon-Nikodym derivative $(Z^{M}(t))_{t\in[0,T]}$ by setting $Z^{M}(t):=(\hat{Z}^{M}(t))^{-1}$, and that $(Z^{M}(t))_{t\in[0,T]}$ is a $(\mathbb{Q}^{M},\mathcal{F}_{t})$-martingale with $\mathbb{E}_{\mathbb{Q}^{M}}[Z^{M}(t)]=1$ for any $t\in[0,T]$.

Denote by $(\mathcal{Y}_{t}^{M})_{t\in[0,T]}$ the filtration generated by $(Y^{M}(t))_{t\in[0,T]}$ . We have

Theorem 3.8.

The following Kallianpur-Striebel formula holds $\mathbb{P}$ -a.s. for any Itô-function $\psi$ as in Definition 2.1:

\eta^{M}_{t}(\psi):=\mathbb{E}_{\mathbb{P}}[\psi(X(t))\,|\,\mathcal{Y}_{t}^{M}% ]=\frac{\mathbb{E}_{\mathbb{Q}^{M}}[\psi(X(t))Z^{M}(t)|\mathcal{Y}^{M}_{t}]}{% \mathbb{E}_{\mathbb{Q}^{M}}[Z^{M}(t)|\mathcal{Y}^{M}_{t}]},\quad t\in[0,T],

(32)

where $Z^{M}$ is given by

\begin{aligned}Z^{M}(t):=\exp\bigg\{\sum\limits_{i=1}^{M}\bigg[&\int_{0}^{t}\log\left\{\frac{\lambda^{M}_{i}(s\,|\,X(s))}{\mu_{\mathcal{K}}(K^{M}_{i})}\right\}\,\textnormal{d}Y^{M}_{i}(s)\\ &-\int_{0}^{t}\big(\lambda^{M}_{i}(s\,|\,X(s))-\mu_{\mathcal{K}}(K^{M}_{i})\big)\,\textnormal{d}s\bigg]\bigg\},\quad t\in[0,T].\end{aligned}

$\diamond$

The proof works exactly as the one for Theorem 3.3 after replacing $Z$, $\mathbb{Q}$ and $\mathcal{Y}_{t}$ with their corresponding counterparts with superscript $M$.

We define $\rho^{M}_{t}(\psi):=\mathbb{E}_{\mathbb{Q}^{M}}[\psi(X(t))Z^{M}(t)\,|\,\mathcal{Y}^{M}_{t}]$ and have the analogous results:

Theorem 3.9 (Zakai equation for multivariate point processes).

For any Itô-function $\psi$ the following equation for the unnormalized filter holds

\begin{aligned}\rho^{M}_{t}(\psi)&=\rho^{M}_{0}(\psi)+\int_{0}^{t}\rho^{M}_{s}(\mathcal{L}\psi)\,\textnormal{d}s\\ &\quad+\sum_{i=1}^{M}\int_{0}^{t}\rho^{M}_{s-}\big((\lambda^{M}_{i}(s-\,|\,\cdot\,)-1)\psi\big)\,\big(Y^{M}_{i}(\textnormal{d}s)-\mu_{\mathcal{K}}(K^{M}_{i})\,\textnormal{d}s\big),\quad\mathbb{Q}^{M}\text{-a.s.},\end{aligned}

(33)

for any $t\in[0,T]$ , where $\mathcal{L}$ is given by (4).

$\diamond$

Corollary 3.10 (Kushner-Stratonovich equation for multivariate point processes).

For any Itô-function $\psi$ the following equation for the normalized filter holds $\mathbb{P}$ -a.s.

\begin{aligned}\eta^{M}_{t}(\psi)&=\eta^{M}_{0}(\psi)+\int_{0}^{t}\eta^{M}_{s}(\mathcal{L}\psi)\,\textnormal{d}s\\ &\quad+\sum_{i=1}^{M}\int_{0}^{t}\frac{\eta^{M}_{s-}(\psi\,\lambda^{M}_{i}(s-\,|\,\cdot\,))-\eta^{M}_{s-}(\psi)\,\eta^{M}_{s-}(\lambda^{M}_{i}(s-\,|\,\cdot\,))}{\eta^{M}_{s-}(\lambda^{M}_{i}(s-\,|\,\cdot\,))}\\ &\qquad\qquad\times\big(Y^{M}_{i}(\textnormal{d}s)-\eta^{M}_{s-}(\lambda^{M}_{i}(s-\,|\,\cdot\,))\,\mu_{\mathcal{K}}(K^{M}_{i})\,\textnormal{d}s\big),\end{aligned}

(34)

for any $t\in[0,T]$ , where $\mathcal{L}$ is given by (4).

$\diamond$

Both Theorem 3.9 and Corollary 3.10 can be proven analogously to Theorem 3.5 and Corollary 3.6 by replacing the MPP objects with their multivariate counterparts. For further details we refer to [6] and [18].
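For binned data, the exponent of $Z^{M}$ in Theorem 3.8 is a sum over cells of log-ratio terms at the jump times minus a time integral. The following minimal sketch evaluates it under assumptions made only for illustration: the signal path is already inserted into the cell intensities, the time integral is approximated by a midpoint rule, and the name `log_z_multivariate` and its arguments are hypothetical.

```python
import math


def log_z_multivariate(event_times, cell_intensity, cell_measure, t_end, n_time=1000):
    """Approximate log Z^M(t_end) from per-cell jump times.

    event_times[i]       : jump times of the i-th component Y^M_i
    cell_intensity(i, s) : lambda^M_i at time s, signal already inserted
    cell_measure[i]      : reference rate mu_K(K^M_i) of the i-th cell
    """
    h = t_end / n_time
    total = 0.0
    for i, times in enumerate(event_times):
        mu_i = cell_measure[i]
        # Jump part: log of intensity over reference rate at each jump.
        total += sum(math.log(cell_intensity(i, s) / mu_i) for s in times if s <= t_end)
        # Compensator part: midpoint rule for the time integral.
        total -= sum(cell_intensity(i, (a + 0.5) * h) - mu_i for a in range(n_time)) * h
    return total
```

If every cell intensity equals its reference rate, the function returns zero, which matches $Z^{M}\equiv 1$ in that case.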

4 Consistency of finite-dimensional approximations and error bounds

In this section, we explore the relationship between the observations from LABEL:introduction_MPP_observation and (O$^{M}$), as well as the corresponding estimators for the unnormalized and normalized posterior distributions. If we regard LABEL:introduction_MPP_observation as an observation scheme with "infinitely high" resolution and (O$^{M}$) as an approximation with limited spatial information, the question of error bounds between them arises naturally. To address this question, we introduce the concept of dissecting systems, which are nested partitions commonly used in measure theory.

Using this framework, we construct a nested series of multivariate observations that can be embedded into the MPPs. We demonstrate that this series weakly converges to the process corresponding to LABEL:introduction_MPP_observation in the space of MPPs.

Additionally, we examine the convergence of the corresponding estimators for the normalized and unnormalized posterior distributions. We establish convergence in total variation and provide error bounds.

In the third subsection, we introduce the concept of partial finite-dimensional observation, motivated by the application to CLSM data, where we never observe the entire spatial area but only a fixed subset of partition sets. We derive error bounds for the unnormalized and normalized posterior distributions given these partial observations.

Nested partitioning of the markspace

In order to investigate convergence properties of a family of observation paths according to (O$^{M}$), $M\in\mathbb{N}$, we have to make assumptions about the underlying partitions $\mathcal{K}^{M}$ introduced in Section 2. The concept of dissecting systems, introduced below, is particularly useful for this purpose. It defines a system of nested partitions that interacts well with point process theory and is intuitive to understand. The following definition is taken from [8].

Definition 4.1.

A sequence $(\mathcal{K}^{M})_{M\in\mathbb{N}}$ of partitions $\mathcal{K}^{M}=\{K^{M}_{1},\dots,K^{M}_{n_{M}}\}$ , $M\in\mathbb{N}$ , consisting of sets in $\mathcal{B}(\mathcal{K})$ , is a dissecting system for $\mathcal{K}$ iff

(i)

The sets $K^{M}_{1},\dots,K^{M}_{n_{M}}$ are disjoint and $\bigcup\limits_{i=1}^{n_{M}}K^{M}_{i}=\mathcal{K}$ for any $M\in\mathbb{N}$.
(ii)

The $\mathcal{K}^{M}$ are nested with increasing $M$ , i.e. $K^{M-1}_{i}\cap K^{M}_{j}=K^{M}_{j}$ or $\emptyset$ .
(iii)

Given any distinct $x_{1},x_{2}\in\mathcal{K}$ , there exists a $\tilde{M}\in\mathbb{N}$ , such that $x_{1}\in K^{\tilde{M}}_{i}$ implies $x_{2}\notin K^{\tilde{M}}_{i}$ .

$\diamond$

The last property is called the point-separation property of the dissecting system. It implies that for any $x\in\mathcal{K}$ there exists a uniquely determined nested sequence of sets $(K^{M}\{x\})_{M\in\mathbb{N}}$ with

x\in K^{M}\{x\}\textnormal{ and }K^{M}\{x\}\in\mathcal{K}^{M}\textnormal{ for % any }M\in\mathbb{N},

such that $\bigcap\limits_{M=1}^{\infty}K^{M}\{x\}=\{x\}$ .

As $(K^{M}\{x\})_{M\in\mathbb{N}}$ is a monotonic sequence, for any measure $\xi$ on $(\mathcal{K},\mathcal{B}(\mathcal{K}))$ we get by continuity from above that

\xi(K^{M}\{x\})\rightarrow\xi(\{x\})\text{ for }M\rightarrow\infty.

(35)

The markspace $\mathcal{K}$ admits a dissecting system, as any Polish space does, see [7, Proposition A2.1.IV.]. Moreover, as $\mathcal{K}$ is compact and hence bounded, all sets in its dissecting systems are bounded.

A practical interpretation of such a dissecting system is seeing $M$ as a theoretically increasing resolution of an image and $\mathcal{K}^{M}$ as the corresponding collection of pixels.

Let $(\mathcal{K}^{M})_{M\in\mathbb{N}}$ be a fixed dissecting system of $\mathcal{K}$ for the rest of this section. Definition 4.1 implies that we may assume the existence of some strictly decreasing real positive sequence $(D_{M})_{M\in\mathbb{N}}$ converging to zero such that

\overline{\textnormal{diam}}(\mathcal{K}^{M}):=\max\limits_{i\leq n_{M}}\,% \textnormal{diam}_{\mathcal{K}}(K^{M}_{i})\leq D_{M}\quad\text{ for all }M\in% \mathbb{N},

(36)

where the separability of the underlying space assures the existence of such a dissecting system and $\mathcal{K}$ is equipped with the standard metric on $\mathbb{R}^{d}$ .
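A concrete instance of Definition 4.1 and the bound (36) is the dyadic partition of the unit interval, sketched below. The choice $\mathcal{K}=[0,1)$, the half-open cells, and the function names are assumptions for illustration; in this case one may take $D_{M}=2^{-M}$.

```python
def dyadic_partition(level):
    """Cells [i / 2^level, (i + 1) / 2^level) covering [0, 1)."""
    n = 2 ** level
    return [(i / n, (i + 1) / n) for i in range(n)]


def cell_index(x, level):
    """Index of the dyadic cell of the given level that contains x."""
    n = 2 ** level
    return min(int(x * n), n - 1)


def first_separating_level(x1, x2, max_level=60):
    """Smallest level at which x1 and x2 fall into different cells."""
    for level in range(max_level + 1):
        if cell_index(x1, level) != cell_index(x2, level):
            return level
    return None  # not separated up to max_level (e.g. x1 == x2)
```

Nestedness corresponds to the fact that the parent of cell `j` at level `M` is cell `j // 2` at level `M - 1`, and the point-separation property says that `first_separating_level` is finite for distinct points.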

4.1 Convergence of finite-dimensional observations

Induced MPP

In order to discuss the convergence of measures, we need to specify a common measure space. For a fixed $M\in\mathbb{N}$ let $Y^{M}$ be the process from the multivariate observation scheme (O$^{M}$), with corresponding sets $K^{M}_{1},\dots,K^{M}_{n_{M}}\in\mathcal{K}^{M}$. In particular, $Y^{M}$ is an $M$-variate counting measure on $[0,T]$, whereas $Y$ is a measure on $[0,T]\times\mathcal{K}$. We will demonstrate how the explicit construction in Section 2.2.2 induces a marked point process (MPP) on the product measure space.

First, we observe that for any $M\in\mathbb{N}$ we can choose a set of points

\mathbf{k}^{M}:=\{k^{M}_{1},\dots,k^{M}_{{n_{M}}}\in\mathcal{K}\,\,|\,\,k^{M}_% {i}\in K^{M}_{i},\;i=1,\dots,n_{M}\}

which we call representative points of the corresponding sets. We assume these representative points are chosen by some deterministic rule and that they lie in the interior of the corresponding sets, e.g. by choosing the center of each set. By the above assumption on the diameter of the partition sets we have

d_{\mathcal{K}}(x,k^{M}_{i})\leq\textnormal{diam}_{\mathcal{K}}(K^{M}_{i})\leq D% _{M}\quad\text{ for all }x\in K^{M}_{i},

(37)

for any $i=1,\dots,n_{M}$. Now, let $(\mathbf{k}^{M})_{M\in\mathbb{N}}$ be a fixed sequence of representative points for $(\mathcal{K}^{M})_{M\in\mathbb{N}}$. Given $Y^{M}$ from (O$^{M}$), we define the MPP $\tilde{Y}^{M}$ using the representative points by setting

\tilde{Y}^{M}(\textnormal{d}t,\textnormal{d}x):=\sum_{i=1}^{M}\sum_{\tau_{i}% \in Y^{M}_{i}([0,T])}\delta_{\tau_{i}\times k^{M}_{i}}(\textnormal{d}t,% \textnormal{d}x),\quad\mathbb{P}\text{-a.s.},

(38)

with $k^{M}_{1},\dots,k^{M}_{n_{M}}\in\mathbf{k}^{M}$ .

The process $\tilde{Y}^{M}$ is a re-embedding of the MPP $Y$ and it can be easily seen that it is indeed an MPP according to Def. 2.3(iii). In particular, we observe that the ground processes coincide, i.e.

(\tilde{Y}^{M})^{g}=Y^{g}.

(39)

One can view $\tilde{Y}^{M}$ as an approximation of $Y$ , where $\tilde{Y}^{M}$ does not capture the exact positions of the marks $\kappa_{i}$ but only identifies the partition set $K^{M}(\kappa_{i})$ in which they lie.
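The construction (38) amounts to attaching to every jump time of the $i$-th component the representative point of the $i$-th cell. A minimal sketch, with hypothetical names and plain Python lists standing in for the counting measures:

```python
def induced_mpp(cell_event_times, representatives):
    """Build the points of the induced MPP from per-cell jump times.

    cell_event_times[i] : jump times of the i-th component Y^M_i
    representatives[i]  : representative point k^M_i of the i-th cell
    Returns the (time, mark) pairs sorted by time.
    """
    points = [
        (tau, representatives[i])
        for i, times in enumerate(cell_event_times)
        for tau in times
    ]
    return sorted(points)
```

The number of points and their times are unchanged by this embedding, which reflects the identity (39) for the ground processes.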

Weak convergence of observations

In the following we explicitly use that $Y$, and hence also $\tilde{Y}^{M}$, are $\mathbb{P}$-a.s. totally bounded, which implies that we can use the notion of weak convergence instead of weak-hash convergence for boundedly finite measures. In order to derive explicit convergence rates, we follow [9] and introduce the space $\textnormal{BL}([0,T]\times\mathcal{K})$ of all bounded Lipschitz functions on $[0,T]\times\mathcal{K}$ with the norm

\|f\|_{\textnormal{BL}}:=\|f\|_{L}+\|f\|_{\infty},\quad f\in\textnormal{BL}([0% ,T]\times\mathcal{K}),

where

\|f\|_{L}:=\sup\left\{\frac{|f(s,x)-f(t,y)|}{d((s,x),(t,y))}\,\bigg{|}\,d((s,x% ),(t,y))\neq 0\right\}.

Furthermore, each bounded finite signed measure $\mu$ on $([0,T]\times\mathcal{K},\,\mathcal{B}([0,T]\times\mathcal{K}))$ defines an element of the dual space of $\textnormal{BL}([0,T]\times\mathcal{K})$ with the norm

\|\mu\|^{\ast}_{\textnormal{BL}}:=\sup\limits_{f\in\textnormal{BL}([0,T]\times% \mathcal{K})}\left\{|\int_{[0,T]\times\mathcal{K}}\,f\,\textnormal{d}\mu|\;% \big{|}\;\|f\|_{\textnormal{BL}}=1\right\}.

(40)

By [9, Theorem 12], the weak topology on the space $\mathcal{M}^{+}_{[0,T]\times\mathcal{K}}$ of all nonnegative totally bounded Borel measures on the product measure space coincides with the topology defined by $\|\cdot\|^{\ast}_{\textnormal{BL}}$; as a direct implication, the same applies to the corresponding topologies on $\mathcal{N}^{\#}_{[0,T]\times\mathcal{K}}$.

The following result shows that after the re-embedding of the multivariate point processes according to (O$^{M}$), the approximations converge weakly to the underlying MPP $Y$.

Proposition 4.2.

Let $(\mathcal{K}^{M})_{M\in\mathbb{N}}$ be a dissecting system and $(\mathbf{k}^{M})_{M\in\mathbb{N}}$ be a sequence of corresponding representative points. Furthermore, let $Y$ be an MPP on $[0,T]\times\mathcal{K}$ . For any $M$ we define the approximating MPP $\tilde{Y}^{M}$ via the explicit construction in (38). Then $\tilde{Y}^{M}\stackrel{{\scriptstyle w}}{{\rightarrow}}Y$ $\mathbb{P}$ -a.s. in $\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ for $M\rightarrow\infty$ . Furthermore, we have the approximation error

\|\tilde{Y}^{M}-Y\|^{\ast}_{\textnormal{BL}}\leq Y^{g}([0,T])\,\overline{% \textnormal{diam}}(\mathcal{K}^{M}).

(41)

$\diamond$

Proof. As discussed above, the process $\tilde{Y}^{M}\in\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ $\mathbb{P}$ -a.s. for any $M\in\mathbb{N}$ , thus it is also $\mathbb{P}$ -a.s. an element of $\mathcal{N}^{\#}_{[0,T]\times\mathcal{K}}$ .

Let $f\in\textnormal{BL}([0,T]\times\mathcal{K})$ with Lipschitz constant $L_{f}$ . Then we have $\mathbb{P}$ -a.s. that

	$\displaystyle\Big{|}\int_{[0,T]\times\mathcal{K}}f(t,x)Y(\textnormal{d}t,\textnormal{d}x)-\int_{[0,T]\times\mathcal{K}}f(t,x)\tilde{Y}^{M}(\textnormal{d}t,\textnormal{d}x)\Big{|}$		(42)
	$\displaystyle=\Big{|}\sum\limits_{(\tau_{i},\kappa_{i})\in Y([0,T]\times\mathcal{K})}f(\tau_{i},\kappa_{i})-\sum\limits_{(\tau_{i},\kappa_{i})\in Y([0,T]\times\mathcal{K})}f(\tau_{i},k^{M}\{\kappa_{i}\})\Big{|}$		(43)
	$\displaystyle\leq\sum\limits_{(\tau_{i},\kappa_{i})\in Y([0,T]\times\mathcal{K})}|f(\tau_{i},\kappa_{i})-f(\tau_{i},k^{M}\{\kappa_{i}\})|$		(44)
	$\displaystyle\leq\sum\limits_{(\tau_{i},\kappa_{i})\in Y([0,T]\times\mathcal{K})}L_{f}\,d_{\mathcal{K}}(\kappa_{i},k^{M}\{\kappa_{i}\})\leq\sum\limits_{\tau_{i}\in Y^{g}([0,T])}\|f\|_{\textnormal{BL}}\,\overline{\textnormal{diam}}(\mathcal{K}^{M})$		(45)
	$\displaystyle\leq Y^{g}([0,T])\|f\|_{\textnormal{BL}}\,D_{M}\longrightarrow 0,\quad M\rightarrow\infty,$		(46)

where we used (37) and the dominating sequence $(D_{M})_{M\in\mathbb{N}}$ . Thus $\tilde{Y}^{M}\stackrel{{\scriptstyle w}}{{\rightarrow}}Y$ $\mathbb{P}$ -a.s. in $\mathcal{N}_{[0,T]\times\mathcal{K}}$ by the Portmanteau theorem [16, Thm. 13.16(ii)].

In particular, we have that the limit process $Y\in\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ $\mathbb{P}$ -a.s. by assumption and hence $\tilde{Y}^{M}\stackrel{{\scriptstyle w}}{{\rightarrow}}Y$ $\mathbb{P}$ -a.s. in $\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ .

The approximation error follows directly by choosing $f\in\textnormal{BL}([0,T]\times\mathcal{K})$ from the subset of functions in $\textnormal{BL}([0,T]\times\mathcal{K})$ with $\|f\|_{\textnormal{BL}}=1$ and taking the supremum as in (40).

$\square$
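
The bound (41) can be illustrated numerically. The following is a minimal sketch under simplifying assumptions that are not part of the construction in (38): a one-dimensional mark space $\mathcal{K}=[0,1]$ with a uniform partition into $M$ cells, cell midpoints as representative points, a simulated point pattern, and a hand-picked test function with $\|f\|_{\textnormal{BL}}\leq 1$. The helper names `snap_to_midpoint` and `bl_gap` are hypothetical.

```python
import math
import random


def snap_to_midpoint(x, M):
    """Replace a mark x in [0, 1] by the midpoint of its cell (uniform partition into M cells)."""
    i = min(int(x * M), M - 1)  # cell index; x = 1.0 would fall into the last cell
    return (i + 0.5) / M


def f(t, x):
    """Test function with Lipschitz constant <= 0.5 and sup norm <= 0.5, hence BL norm <= 1."""
    return 0.5 * math.sin(x)


def bl_gap(points, M):
    """|int f dY - int f dY~^M| for a finite point pattern given as (time, mark) pairs."""
    exact = sum(f(t, x) for t, x in points)
    snapped = sum(f(t, snap_to_midpoint(x, M)) for t, x in points)
    return abs(exact - snapped)


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(25)]  # simulated (time, mark) pairs

# For each M: (observed gap, right-hand side of (41) with diameter 1/M)
results = {M: (bl_gap(points, M), len(points) / M) for M in (2, 8, 32, 128)}
for M, (gap, bound) in results.items():
    print(f"M={M:4d}  gap={gap:.5f}  bound={bound:.5f}")
```

Since only one test function is evaluated, the printed gap is a lower estimate of the dual norm in (40); the sketch therefore checks consistency with (41) rather than sharpness.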

We remind the reader that $\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ is in general not closed under weak convergence, as accumulation points might appear in the limit even if every element of a sequence is an MPP. However, in our particular setting, we know that the limit process $Y\in\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ $\mathbb{P}$ -a.s., which allows us to state the weak convergence in $\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ .

4.2 Asymptotic consistency of posterior distributions

In this subsection, we investigate the limiting behavior of the unnormalized and normalized posterior distributions with increasing spatial resolution of the underlying partition. Using our explicit construction, we are able to show convergence in total variation. Additionally, we prove that the approximation error decreases linearly with respect to the size of the partition sets.

For the next results we denote by $\mathcal{M}^{+}_{\mathcal{H}}$ the space of all totally bounded positive measures on $\mathcal{H}$ and make the following additional assumption on the stochastic intensity of $Y$ .

Assumption 3.

In addition to all properties from Assumption 2, the stochastic intensity $\lambda$ of $Y$ is a continuous function on $[0,T]\times\mathcal{K}\times\mathcal{H}$ and for all $u\in\mathcal{H}$ the bounds

\lambda_{-}:=\inf\limits_{(t,x)\in[0,T]\times\mathcal{K}}\lambda(t,x\,|\,u),% \quad\lambda_{+}:=\sup\limits_{(t,x)\in[0,T]\times\mathcal{K}}\lambda(t,x\,|\,u)

exist, such that

0<\lambda_{-}\leq\lambda(t,x\,|\,u)\leq\lambda_{+}<\infty,

(47)

for all $(t,x)\in[0,T]\times\mathcal{K}$ .

$\diamond$

We now present the main theorem of this section.

Theorem 4.3.

Let Assumptions 1, 2, 3 hold true. Furthermore, let $(\mathcal{K}^{M})_{M\in\mathbb{N}}$ be a dissecting system and $(\mathbf{k}^{M})_{M\in\mathbb{N}}$ be a sequence of corresponding representative points. Given a signal $X$ according to LABEL:introduction_spde_basic_formulation let $Y$ be the MPP from observation scheme LABEL:introduction_MPP_observation and $(Y^{M})_{M\in\mathbb{N}}$ a family of multivariate point processes on $[0,T]$ , where each $Y^{M}$ is the process from LABEL:introduction_MPP_discretized_observation_multivariate given $\mathcal{K}^{M}$ and $\mathbf{k}^{M}$ .

Moreover, let $\rho_{t}$ and $\rho^{M}_{t}$ be the unnormalized posterior distributions from Theorem 3.5 and Theorem 3.9, respectively, corresponding to LABEL:introduction_MPP_observation and LABEL:introduction_MPP_discretized_observation_multivariate, for any $M\in\mathbb{N}$ and $t\in[0,T]$ . Then, we have the following result:

(i)

$\|\rho^{M}_{t}-\rho_{t}\|_{\textnormal{TV}}\longrightarrow 0$ $\mathbb{P}$ -a.s. in $\mathcal{M}^{+}_{\mathcal{H}}$ for $M\longrightarrow\infty$ ;

(ii)

Let in addition the stochastic intensity $\lambda(\cdot,\cdot\,|\,X(\cdot))\in\textnormal{BL}([0,T]\times\mathcal{K})$ $\mathbb{P}$ -a.s. with deterministic Lipschitz constant $L_{\lambda}>0$ such that

L_{\lambda}:=\sup\left\{\frac{|\lambda(s,x\,|\,u)-\lambda(t,y\,|\,u)|}{d((s,x)% ,(t,y))}\,\bigg{|}\,d((s,x),(t,y))\neq 0\right\},

for all $u\in\mathcal{H}$ .

Then, we have the approximation error

\sup\limits_{t\in(0,T]}\|\rho^{M}_{t}-\rho_{t}\|_{\textnormal{TV}}\leq\kappa_{% \rho}(\lambda,Y)\,\max\Big{\{}\overline{\textnormal{diam}}(\mathcal{K}^{M}),\,% \overline{\textnormal{diam}}(\mathcal{K}^{M})^{Y^{g}((0,T])}\Big{\}}

(48)

with

	$\displaystyle\kappa_{\rho}(\lambda,Y)$	$\displaystyle:=\frac{1}{2}\big{(}(1+L_{\lambda}\,\lambda_{-}^{-1})^{Y^{g}((0,T% ])}-1\big{)}\cdot$
		$\displaystyle\cdot\max\Big{\{}\lambda_{+}^{Y^{g}((0,T])}\,\exp\{-T\,(\lambda_{% -}-1)\,\mu_{\mathcal{K}}(\mathcal{K})\},$
		$\displaystyle\qquad\quad\lambda_{+}^{Y^{g}((0,T])},\,\exp\{-T\,(\lambda_{-}-1)% \,\mu_{\mathcal{K}}(\mathcal{K})\}\Big{\}}$

where $\lambda_{-}$ and $\lambda_{+}$ are the bounds from Assumption 3.

$\diamond$

Proof. Fix $M\in\mathbb{N}$ and $t\in[0,T]$ . Moreover, let $X$ be a signal path and let $Y$ be the MPP given $X$ . The statement is clear for $t=0$ , so let $t\in(0,T]$ . First, we want to show convergence of the Radon-Nikodym densities. For any given continuous path $\mathbf{x}\in\mathcal{C}([0,T];\mathcal{H})$ we denote by $\mathbf{x_{0:t}}$ the restriction of $\mathbf{x}$ up to time $t$ . Recall that the Radon-Nikodym density $Z^{M}(t)$ associated to $Y^{M}$ from Section 3.4 is given as

	$\displaystyle Z^{M}(t)$	$\displaystyle=\exp\bigg{\{}\sum\limits_{i=1}^{M}\Big{[}\int_{0}^{t}\log\Big{\{}(\mu_{\mathcal{K}}(K^{M}_{i}))^{-1}\lambda^{M}_{i}(s\,|\,X(s))\Big{\}}\,Y^{M}_{i}(\textnormal{d}s)\Big{]}$
		$\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad-\int_{0}^{t}\int_{\mathcal{K}}(\lambda(s,x\,|\,X(s))-1)\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\bigg{\}}.$

To improve readability, for a given typical path $\mathbf{x}\in\mathcal{C}([0,T];\mathcal{H})$ we define

I(t\,|\,\mathbf{x}_{0:t}):=\int_{0}^{t}\int_{\mathcal{K}}(\lambda(s,x\,|\,% \mathbf{x}(s))-1)\,\mu_{\mathcal{K}}(\textnormal{d}x)\textnormal{d}s,\quad t% \in[0,T].

and

\theta_{M}(s,x\,|\,\mathbf{x}(s)):=(\mu_{\mathcal{K}}(K^{M}\{x\}))^{-1}\int_{K% ^{M}\{x\}}\lambda(s,y\,|\,\mathbf{x}(s))\,\mu_{\mathcal{K}}(\textnormal{d}y).

Analogously to the unnormalized regular conditional expectations in Remark 3.4, we define the functionals $z^{M}:[0,T]\times\mathcal{C}([0,T];\mathcal{H})\times\mathcal{N}^{\#g}_{[0,T]% \times\mathcal{K}}\rightarrow\mathbb{R}$ , $M\in\mathbb{N}$ , by

\displaystyle z^{M}(t;\mathbf{x},\xi)

\displaystyle:=\exp\bigg{\{}\int_{0}^{t}\int_{\mathcal{K}}\log\{\theta_{M}(s,x% \,|\,\mathbf{x}(s))\}\xi(\textnormal{d}s,\textnormal{d}x)-I(t\,|\,\mathbf{x_{0% :t}})\bigg{\}},

for $t\in[0,T]$ , $\mathbf{x}\in\mathcal{C}([0,T];\mathcal{H})$ , $\xi\in\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}$ . We can write $Z^{M}(t)$ in terms of the underlying MPP $Y$ as

	$\displaystyle Z^{M}(t)$	$\displaystyle=\exp\bigg{\{}\int_{0}^{t}\int_{\mathcal{K}}\log\Big{\{}\theta_{M}(s,x\,|\,X(s))\Big{\}}\,Y(\textnormal{d}s,\textnormal{d}x)-I(t\,|\,X_{0:t})\bigg{\}}$		(49)
		$\displaystyle=z^{M}(t;X,Y),$

and note that $I(t\,|\,X_{0:t})$ already coincides with the $\mu_{\mathcal{K}}(\textnormal{d}x)\textnormal{d}t$ -integral in $Z(t)$ ; compare to the derivation in Section 3.1. By continuity and boundedness of $\lambda$ we have

\|\theta_{M}(\cdot,\,\cdot\,|\,X(\cdot))-\lambda(\cdot,\,\cdot\,|\,X(\cdot))\|% _{\infty}\rightarrow 0\text{ for }M\rightarrow\infty.

Hence, as $\lambda(\cdot\,,\cdot\,|\,X(\cdot))$ is uniformly bounded away from zero by Assumption 3, we also have $\log\{\theta_{M}(\cdot,\,\cdot\,|\,X(\cdot))\}\rightarrow\log\{\lambda(\cdot,\,\cdot\,|\,X(\cdot))\}$ uniformly for $M\rightarrow\infty$ , yielding the convergence

	$\displaystyle\int_{0}^{t}\int_{\mathcal{K}}$	$\displaystyle\log\{\theta_{M}(s,x\,|\,X(s))\}Y(\textnormal{d}s,\textnormal{d}x)$		(50)
		$\displaystyle\longrightarrow\int_{0}^{t}\int_{\mathcal{K}}\log\{\lambda(s,x\,|\,X(s))\}\,Y(\textnormal{d}s,\textnormal{d}x),$

and hence also

\displaystyle z^{M}(t;X,Y)

\displaystyle\longrightarrow z(t;X,Y)

(51)

$\mathbb{P}$ -a.s. and in $L^{1}$ for $M\rightarrow\infty$ , where $z(t;X,Y)$ was introduced in Remark 3.4 and represents a functional form of the Radon-Nikodym density $Z(t)$ .

Analogously to the definition of $\tilde{\rho}_{t}$ in Remark 3.4, given the signal path $X$ , we define the measure-valued functionals $\tilde{\rho}^{M}:[0,T]\times\mathcal{N}^{\#g}_{[0,T]\times\mathcal{K}}% \rightarrow\mathcal{M}^{+}_{\mathcal{H}}$ , $M\in\mathbb{N}$ , by

\displaystyle\tilde{\rho}^{M}_{t}\{\chi\}(A)

\displaystyle:=\mathbb{E}_{X}[\mathds{1}_{A}(X(t))\,z^{M}(t;\,\cdot\,,\chi)]% \quad A\in\mathcal{B}(\mathcal{H}).

We denote for the observation $Y^{M}$ corresponding to LABEL:introduction_MPP_discretized_observation_multivariate by $\tilde{Y}^{M}$ the embedding into the MPPs as explained in (38) and used in the proof of Proposition 4.2. We see that

\tilde{\rho}^{M}_{t}\{Y\}(A)=\tilde{\rho}^{M}_{t}\{\tilde{Y}^{M}\}(A)=\rho^{M}% _{t}(A),

(52)

hence $\tilde{\rho}^{M}_{t}(A)$ is equivalent to a regular version of the unnormalized conditional expectation $\rho^{M}_{t}(A)$ .

The total variation of $\tilde{\rho}^{M}_{t}\{{\chi}\}$ and $\tilde{\rho}_{t}\{{\chi}\}$ is given w.r.t. the dominating measure $\mathbb{P}_{X}$ , so that we have

\displaystyle\|\tilde{\rho}^{M}_{t}\{{\chi}\}-\tilde{\rho}_{t}\{\chi\}\|_{% \textnormal{TV}}=\dfrac{1}{2}\,\mathbb{E}_{X}[|z^{M}(t;\,\cdot\,,{\chi})-z(t;% \,\cdot\,,\chi)|]\longrightarrow 0

(53)

for $M\rightarrow\infty$ by (51). Hence, for any typical observation path $Y$ , we have

\displaystyle\|\rho_{t}-\rho^{M}_{t}\|_{\textnormal{TV}}\leq\|\tilde{\rho}^{M}% _{t}\{Y\}-\rho_{t}\|_{\textnormal{TV}}=\|\tilde{\rho}^{M}_{t}\{{Y}\}-\tilde{% \rho}_{t}\{Y\}\|_{\textnormal{TV}}\longrightarrow 0

(54)

$\mathbb{P}$ -a.s. for $M\rightarrow\infty$ , proving (i).

For the rest of the proof let us denote $\lambda(t,x):=\lambda(t,x\,|\,X(t))$ and $\theta_{M}(t,x):=\theta_{M}(t,x\,|\,X(t))$ for better readability. To prove the approximation error in (ii), we first note that for any $(t,x)\in[0,T]\times\mathcal{K}$ and $M\in\mathbb{N}$ we have

$\displaystyle\log\{$	$\displaystyle\theta_{M}(t,x)\}-\log\{\lambda(t,x)\}=\log\left\{\frac{\int_{K^{% M}\{x\}}\lambda(t,y)\mu_{\mathcal{K}}(\textnormal{d}y)}{\mu_{\mathcal{K}}(K^{M% }\{x\})\,\lambda(t,x)}\right\}$	(55)
	$\displaystyle\leq\log\Bigg{\{}\frac{\sup\limits_{y\in K^{M}\{x\}}\lambda(t,y)}% {\inf\limits_{y\in K^{M}\{x\}}\lambda(t,y)}\Bigg{\}}=\log\bigg{\{}1+\frac{% \omega_{\lambda(t,\cdot)}(K^{M}\{x\})}{\inf\limits_{y\in K^{M}\{x\}}\lambda(t,% y)}\bigg{\}}$	(56)
	$\displaystyle\leq\log\left\{1+\frac{L_{\lambda}\,\overline{\textnormal{diam}}(% \mathcal{K}^{M})}{\lambda_{-}}\right\},$	(57)

where $\omega_{f}(K):=\sup_{x\in K}f(x)-\inf_{x\in K}f(x)$ is the oscillation of a function $f$ on the set $K$ , and similarly

$\displaystyle\log\{$	$\displaystyle\lambda(t,x)\}-\log\{\theta_{M}(t,x)\}=-\log\Bigg{\{}\frac{\int_{% K^{M}\{x\}}\lambda(t,y)\mu_{\mathcal{K}}(\textnormal{d}y)}{\mu_{\mathcal{K}}(K% ^{M}\{x\})\,\lambda(t,x)}\Bigg{\}}$	(58)
	$\displaystyle\leq-\log\Bigg{\{}\frac{\inf\limits_{y\in K^{M}\{x\}}\lambda(t,y)% }{\sup\limits_{y\in K^{M}\{x\}}\lambda(t,y)}\Bigg{\}}=\log\Bigg{\{}\frac{\sup% \limits_{y\in K^{M}\{x\}}\lambda(t,y)}{\inf\limits_{y\in K^{M}\{x\}}\lambda(t,% y)}\Bigg{\}}$	(59)
	$\displaystyle\leq\log\left\{1+\frac{L_{\lambda}\overline{\textnormal{diam}}(% \mathcal{K}^{M})}{\lambda_{-}}\right\}.$	(60)

Combining both inequalities yields

\displaystyle|\log\{\theta_{M}(t,x)\}-\log\{\lambda(t,x)\}|\leq\log\left\{1+% \frac{L_{\lambda}\overline{\textnormal{diam}}(\mathcal{K}^{M})}{\lambda_{-}}% \right\},

(61)

and finally

	$\displaystyle\bigg{|}\int_{0}^{t}\int_{\mathcal{K}}\big{(}\log\{\theta_{M}(s,x)\}$	$\displaystyle-\log\{\lambda(s,x)\}\big{)}Y(\textnormal{d}s,\textnormal{d}x)\bigg{|}$
		$\displaystyle\leq Y^{g}((0,t])\,\log\left\{1+\frac{L_{\lambda}\overline{\textnormal{diam}}(\mathcal{K}^{M})}{\lambda_{-}}\right\}.$		(62)
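
The pointwise estimate (61) can be checked on a concrete example. The sketch below assumes, purely for illustration, the mark space $\mathcal{K}=[0,1]$ with Lebesgue measure, a uniform partition into $M$ cells, and the time-independent intensity $\lambda(x)=2+\sin(2\pi x)$ with $L_{\lambda}=2\pi$ and $\lambda_{-}=1$; none of these choices come from the paper, and the function names are hypothetical.

```python
import math

L_LAMBDA = 2 * math.pi   # Lipschitz constant of the example intensity below
LAMBDA_MINUS = 1.0       # its minimum on [0, 1]


def lam(x):
    """Example intensity on K = [0, 1] (an assumption made only for this illustration)."""
    return 2.0 + math.sin(2 * math.pi * x)


def theta(i, M):
    """Exact cell average of lam over the i-th cell [i/M, (i+1)/M]."""
    a, b = i / M, (i + 1) / M
    integral = 2.0 * (b - a) + (math.cos(2 * math.pi * a) - math.cos(2 * math.pi * b)) / (2 * math.pi)
    return integral / (b - a)


def max_log_gap(M, samples_per_cell=50):
    """Largest |log theta_M(x) - log lam(x)| over a grid of sample points in every cell."""
    worst = 0.0
    for i in range(M):
        th = theta(i, M)
        for j in range(samples_per_cell + 1):
            x = (i + j / samples_per_cell) / M
            worst = max(worst, abs(math.log(th) - math.log(lam(x))))
    return worst


def rhs_61(M):
    """Right-hand side of (61) with partition diameter 1/M."""
    return math.log(1.0 + L_LAMBDA / (M * LAMBDA_MINUS))


# For each M: (largest observed log-gap, bound from (61))
checks = {M: (max_log_gap(M), rhs_61(M)) for M in (4, 16, 64)}
for M, (gap, bound) in checks.items():
    print(f"M={M:3d}  max log-gap={gap:.5f}  bound={bound:.5f}")
```

Multiplying the bound by the number of observed points $Y^{g}((0,t])$ gives the right-hand side of (62).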

To construct an upper bound for $Z(t)$ , we define the function $\overline{\vartheta}(t,\lambda,Y)$ by

	$\displaystyle\overline{\vartheta}(t,\lambda,Y):=\max\Big{\{}$	$\displaystyle\lambda_{+}^{Y^{g}((0,t])}\,\exp\{-t\,(\lambda_{-}-1)\,\mu_{% \mathcal{K}}(\mathcal{K})\},$
		$\displaystyle\lambda_{+}^{Y^{g}((0,t])},\,\exp\{-t\,(\lambda_{-}-1)\,\mu_{% \mathcal{K}}(\mathcal{K})\}\Big{\}}.$

The case distinction for the $\max$ function depends on the values of the bounds $\lambda_{-}$ and $\lambda_{+}$ from Assumption 3. In particular,

\overline{\vartheta}(t,\lambda,Y)=\begin{cases}\begin{aligned} &\exp\{-t\,(% \lambda_{-}-1)\,\mu_{\mathcal{K}}(\mathcal{K})\},&&0<\lambda_{-}\leq\lambda_{+% }\leq 1,\\ &\lambda_{+}^{Y^{g}((0,t])}\,\exp\{-t\,(\lambda_{-}-1)\,\mu_{\mathcal{K}}(% \mathcal{K})\},&&0<\lambda_{-}\leq 1\leq\lambda_{+}<\infty,\\ &\lambda_{+}^{Y^{g}((0,t])},&&1\leq\lambda_{-}\leq\lambda_{+}<\infty.\end{% aligned}\end{cases}

(63)

By definition we have that $\overline{\vartheta}(\cdot,\lambda,Y)$ is monotonically increasing on $[0,T]$ . From here, we can impose the bound

Z(t)\leq\overline{\vartheta}(t,\lambda,Y),

(64)

which is immediately evident from (16) and the above case distinction.
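
As a sanity check of the case distinction (63), the following sketch evaluates $\overline{\vartheta}$ once through the $\max$ definition and once through the three cases and compares the results. The parameter values are arbitrary assumptions, `n_points` stands in for $Y^{g}((0,t])$ and `mu_K` for $\mu_{\mathcal{K}}(\mathcal{K})$; the function names are hypothetical.

```python
import math


def vartheta_bar_max(t, lam_minus, lam_plus, n_points, mu_K):
    """Upper bound via the max definition; n_points plays the role of Y^g((0, t])."""
    jump_part = lam_plus ** n_points
    drift_part = math.exp(-t * (lam_minus - 1.0) * mu_K)
    return max(jump_part * drift_part, jump_part, drift_part)


def vartheta_bar_cases(t, lam_minus, lam_plus, n_points, mu_K):
    """The same quantity via the three cases in (63)."""
    jump_part = lam_plus ** n_points
    drift_part = math.exp(-t * (lam_minus - 1.0) * mu_K)
    if lam_plus <= 1.0:        # 0 < lam_minus <= lam_plus <= 1
        return drift_part
    if lam_minus <= 1.0:       # 0 < lam_minus <= 1 <= lam_plus
        return jump_part * drift_part
    return jump_part           # 1 <= lam_minus <= lam_plus


# (t, lam_minus, lam_plus, n_points, mu_K): one arbitrary parameter set per case
params = [
    (1.0, 0.2, 0.8, 5, 1.0),
    (1.0, 0.5, 3.0, 5, 2.0),
    (0.5, 1.5, 4.0, 3, 1.0),
]
agree = [math.isclose(vartheta_bar_max(*p), vartheta_bar_cases(*p)) for p in params]
print(agree)
```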

Thus, by using above bounds we have

	$\displaystyle|Z(t)-Z^{M}(t)|$
	$\displaystyle=Z(t)\;\big{|}1-\exp\Big{\{}\int_{0}^{t}\int_{\mathcal{K}}\big{(}\log\{\theta_{M}(s,x)\}-\log\{\lambda(s,x)\}\big{)}Y(\textnormal{d}s,\textnormal{d}x)\Big{\}}\big{|}$		(65)
	$\displaystyle\leq\overline{\vartheta}(t,\lambda,Y)\,\Big{(}\exp\Big{\{}\int_{0}^{t}\int_{\mathcal{K}}\big{|}\log\{\theta_{M}(s,x)\}-\log\{\lambda(s,x)\}\big{|}Y(\textnormal{d}s,\textnormal{d}x)\Big{\}}-1\Big{)}$		(66)
	$\displaystyle\leq\overline{\vartheta}(t,\lambda,Y)\Big{(}\Big{(}1+\frac{L_{\lambda}\overline{\textnormal{diam}}(\mathcal{K}^{M})}{\lambda_{-}}\Big{)}^{Y^{g}((0,t])}-1\Big{)},$		(67)

where we again utilized (49) to obtain (65). Using the elementary estimate

\displaystyle(1+uv)^{n}-1=\sum\limits_{k=1}^{n}\binom{n}{k}(uv)^{k}\leq\sum% \limits_{k=1}^{n}\binom{n}{k}(u)^{k}v\vee v^{n}=((1+u)^{n}-1)v\vee v^{n},

for $u,v\geq 0,\,n\in\mathbb{N}$ we obtain

	$\displaystyle\Big{(}\Big{(}1+\frac{L_{\lambda}\overline{\textnormal{diam}}(% \mathcal{K}^{M})}{\lambda_{-}}\Big{)}^{Y^{g}((0,t])}-1\Big{)}$
	$\displaystyle\quad\leq\Big{(}\Big{(}1+\frac{L_{\lambda}}{\lambda_{-}}\Big{)}^{% Y^{g}((0,t])}-1\Big{)}\,\max\Big{\{}\overline{\textnormal{diam}}(\mathcal{K}^{% M}),\,\overline{\textnormal{diam}}(\mathcal{K}^{M})^{Y^{g}((0,t])}\Big{\}},$		(68)

leading to

	$\displaystyle|Z(t)-Z^{M}(t)|\leq\overline{\vartheta}(t,\lambda,Y)\,\Big{(}$	$\displaystyle\Big{(}1+\frac{L_{\lambda}}{\lambda_{-}}\Big{)}^{Y^{g}((0,t])}-1\Big{)}\cdot$		(69)
		$\displaystyle\cdot\max\Big{\{}\overline{\textnormal{diam}}(\mathcal{K}^{M}),\,\overline{\textnormal{diam}}(\mathcal{K}^{M})^{Y^{g}((0,t])}\Big{\}}.$

The right hand side does not depend on $X$ , thus we can also bound $\mathbb{E}_{X}[|Z^{M}(t)-Z(t)|]$ by (69).
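
The elementary estimate behind (68), $(1+uv)^{n}-1\leq((1+u)^{n}-1)\,(v\vee v^{n})$, can be verified on a grid using exact rational arithmetic. This is only a finite spot check under arbitrarily chosen grid values, not a proof; the names `lhs`, `rhs` and `violations` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def lhs(u, v, n):
    """Left-hand side (1 + u v)^n - 1."""
    return (1 + u * v) ** n - 1


def rhs(u, v, n):
    """Right-hand side ((1 + u)^n - 1) * max(v, v^n)."""
    return ((1 + u) ** n - 1) * max(v, v ** n)


# u, v range over 0, 1/4, ..., 3 and n over 1, ..., 7; Fractions avoid rounding issues
grid = [Fraction(k, 4) for k in range(13)]
violations = [
    (u, v, n)
    for u, v, n in product(grid, grid, range(1, 8))
    if lhs(u, v, n) > rhs(u, v, n)
]
print(len(violations))
```

In (68) the estimate is applied with $u=L_{\lambda}/\lambda_{-}$ , $v=\overline{\textnormal{diam}}(\mathcal{K}^{M})$ and $n=Y^{g}((0,t])$ .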

Finally, as $Y^{g}$ grows monotonically and all components are bounded on $[0,T]$ we take the supremum to conclude

$\displaystyle\sup\limits_{t\in[0,T]}\|\rho^{M}_{t}-\rho_{t}\|_{\textnormal{TV}}$	$\displaystyle=\sup\limits_{t\in[0,T]}\dfrac{1}{2}\,\mathbb{E}_{X}[|Z^{M}(t)-Z(t)|]$
	$\displaystyle\,\leq\dfrac{1}{2}\,\overline{\vartheta}(T,\lambda,Y)\,\Big{(}\Big{(}1+\frac{L_{\lambda}}{\lambda_{-}}\Big{)}^{Y^{g}((0,T])}-1\Big{)}\cdot$	(70)
	$\displaystyle\qquad\cdot\max\Big{\{}\overline{\textnormal{diam}}(\mathcal{K}^{M}),\,\overline{\textnormal{diam}}(\mathcal{K}^{M})^{Y^{g}((0,T])}\Big{\}},$

whereby assertion (ii) is proven.

$\square$

As a direct corollary we obtain the analogous result for the normalized posterior distribution. We denote by $\mathcal{P}_{\mathcal{H}}$ the space of all probability measures on $\mathcal{H}$ .

Corollary 4.4.

Let the assumptions from Theorem 4.3 hold true. Moreover, let $\eta_{t}$ and $\eta^{M}_{t}$ be the normalized posterior distributions from Theorem 3.6 corresponding to LABEL:introduction_MPP_observation and from Theorem 3.10 corresponding to (O^M), respectively, for any $M\in\mathbb{N}$ and $t\in[0,T]$ .

Then we have the following result:

(i)

$\|\eta^{M}_{t}-\eta_{t}\|_{\textnormal{TV}}\longrightarrow 0$ $\mathbb{P}$ -a.s. in $\mathcal{P}_{\mathcal{H}}$ for $M\rightarrow\infty$ .

(ii)

Under the assumptions of Theorem 4.3(ii) we have the approximation error

\sup\limits_{t\in[0,T]}\|\eta^{M}_{t}-\eta_{t}\|_{\textnormal{TV}}\leq\kappa_{% \eta}(\lambda,Y)\,\max\Big{\{}\overline{\textnormal{diam}}(\mathcal{K}^{M}),\,% \overline{\textnormal{diam}}(\mathcal{K}^{M})^{Y^{g}((0,T])}\Big{\}}

(71)

with

	$\displaystyle\kappa_{\eta}(\lambda,Y):=$	$\displaystyle\big{(}(1+L_{\lambda}\,\lambda_{-}^{-1})^{Y^{g}((0,T])}-1\big{)}\cdot$
		$\displaystyle\cdot\max\Big{\{}\lambda_{-}^{-Y^{g}((0,T])}\exp\{-2T\,(\lambda_{% -}-1)\,\mu_{\mathcal{K}}(\mathcal{K})\},$
		$\displaystyle\qquad\quad\Big{(}\frac{\lambda_{+}^{2}}{\lambda_{-}}\Big{)}^{Y^{% g}((0,T])}\,\exp\{T\,(\lambda_{+}+1-2\lambda_{-})\,\mu_{\mathcal{K}}(\mathcal{% K})\},$
		$\displaystyle\qquad\quad\lambda_{+}^{2Y^{g}((0,T])}\exp\{T\,(\lambda_{+}-1)\,% \mu_{\mathcal{K}}(\mathcal{K})\}\Big{\}}$

with $\lambda_{-}$ and $\lambda_{+}$ from Assumption 3.

$\diamond$

Proof. Let $\psi\in\mathcal{C}_{b}(\mathcal{H})$ . By definition

\eta_{t}(\psi):=\frac{\rho_{t}(\psi)}{\rho_{t}(\mathds{1})},\quad\eta^{M}_{t}(% \psi):=\frac{\rho^{M}_{t}(\psi)}{\rho^{M}_{t}(\mathds{1})},\;M\in\mathbb{N},

are probability measures on $\mathcal{H}$ for all $t\in[0,T]$ . By Theorem 4.3 we get the convergence $\rho^{M}_{t}(\psi)\rightarrow\rho_{t}(\psi)$ and $\rho^{M}_{t}(\mathds{1})\rightarrow\rho_{t}(\mathds{1})$ in total variation for $M\rightarrow\infty$ . Moreover, $\rho^{M}_{t}(\mathds{1})>0$ $\mathbb{P}$ -a.s., thus $\eta^{M}_{t}(\psi)\rightarrow\eta_{t}(\psi)$ in total variation by $\eta_{t}(\psi)$ being the quotient of two converging sequences. As $\psi$ was chosen arbitrarily assertion (i) follows.

For the proof of the rate in (ii), we again denote $\lambda(t,x):=\lambda(t,x\,|\,X(t))$ and $\theta_{M}(t,x):=\theta_{M}(t,x\,|\,X(t))$ for better readability. As $\eta_{t},\eta^{M}_{t}\in\mathcal{P}_{\mathcal{H}}$ , $M\in\mathbb{N}$ ,

\|\eta^{M}_{t}-\eta_{t}\|_{\textnormal{TV}}=\sup\limits_{A\in\mathcal{B}(% \mathcal{H})}|\eta^{M}_{t}(A)-\eta_{t}(A)|.

(72)

We know from (64) in the proof of Theorem 4.3 that we have the upper bound

Z(t)\leq\overline{\vartheta}(t,\lambda,Y)\quad t\in[0,T].

(73)

In order to impose a lower bound we use similar arguments and define

	$\displaystyle\underline{\vartheta}(t,\lambda,Y):=\min\Big{\{}$	$\displaystyle\lambda_{-}^{Y^{g}((0,t])}\,\exp\{-t\,(\lambda_{+}-1)\,\mu_{% \mathcal{K}}(\mathcal{K})\},$
		$\displaystyle\lambda_{-}^{Y^{g}((0,t])},\,\exp\{-t\,(\lambda_{+}-1)\,\mu_{% \mathcal{K}}(\mathcal{K})\}\Big{\}},$

where an analogous case distinction to (63) is given by

\underline{\vartheta}(t,\lambda,Y)=\begin{cases}\begin{aligned} &\lambda_{-}^{% Y^{g}((0,t])},&&0<\lambda_{-}\leq\lambda_{+}\leq 1,\\ &\lambda_{-}^{Y^{g}((0,t])}\,\exp\{-t\,(\lambda_{+}-1)\,\mu_{\mathcal{K}}(% \mathcal{K})\},&&0<\lambda_{-}\leq 1\leq\lambda_{+}<\infty,\\ &\exp\{-t\,(\lambda_{+}-1)\,\mu_{\mathcal{K}}(\mathcal{K})\},&&1\leq\lambda_{-% }\leq\lambda_{+}<\infty.\end{aligned}\end{cases}

(74)

Hence $\underline{\vartheta}(\cdot,\lambda,Y)$ is monotonically decreasing on $[0,T]$ . This allows us to impose the lower bound

\displaystyle Z(t)\geq\underline{\vartheta}(t,\lambda,Y),

(75)

for any $t\in[0,T]$ . One can easily verify that for any $M\in\mathbb{N}$ we also have

\underline{\vartheta}(t,\lambda,Y)\leq Z^{M}(t)\leq\overline{\vartheta}(t,% \lambda,Y),

(76)

and as the bounds do not depend on $X$ it also follows that

\underline{\vartheta}(t,\lambda,Y)\leq\rho_{t}(\mathds{1}),\rho_{t}^{M}(% \mathds{1})\leq\overline{\vartheta}(t,\lambda,Y).

(77)

Hence, for any $A\in\mathcal{B}(\mathcal{H})$ , we conclude

$\displaystyle|\eta^{M}_{t}(A)-\eta_{t}(A)|$	$\displaystyle=\Big{|}\frac{\rho^{M}_{t}(A)}{\rho^{M}_{t}(\mathds{1})}-\frac{\rho_{t}(A)}{\rho_{t}(\mathds{1})}\Big{|}$
	$\displaystyle=\Big{|}\frac{\rho^{M}_{t}(A)\,\rho_{t}(\mathds{1})-\rho_{t}(A)\,\rho^{M}_{t}(\mathds{1})}{\rho^{M}_{t}(\mathds{1})\,\rho_{t}(\mathds{1})}\Big{|}$	(78)
	$\displaystyle\leq(\rho^{M}_{t}(\mathds{1})\,\rho_{t}(\mathds{1}))^{-1}\,\Big{(}\,\big{|}\,\rho^{M}_{t}(A)\,\rho_{t}(\mathds{1})-\rho_{t}(A)\,\rho_{t}(\mathds{1})\,\big{|}$	(79)
	$\displaystyle\qquad\qquad\qquad\qquad\qquad+\big{|}\,\rho_{t}(A)\,\rho^{M}_{t}(\mathds{1})-\rho_{t}(A)\,\rho_{t}(\mathds{1})\,\big{|}\Big{)}$
	$\displaystyle\leq\rho^{M}_{t}(\mathds{1})^{-1}\Big{(}\,\big{|}\,\rho^{M}_{t}(A)-\rho_{t}(A)\big{|}+\big{|}\rho^{M}_{t}(\mathds{1})-\rho_{t}(\mathds{1})\,\big{|}\Big{)}$	(80)
	$\displaystyle\leq 2\,(\underline{\vartheta}^{-1}\overline{\vartheta})(t,\lambda,Y)\|\rho^{M}_{t}-\rho_{t}\|_{\textnormal{TV}},$	(81)

where we used that $\rho_{t}(\mathds{1})\geq\rho_{t}(A)$ for all $A\in\mathcal{B}(\mathcal{H})$ and the notation $(\underline{\vartheta}^{-1}\overline{\vartheta})(t,\lambda,Y):=\underline{% \vartheta}^{-1}(t,\lambda,Y)\overline{\vartheta}(t,\lambda,Y)$ for better readability. Taking the supremum over all $A\in\mathcal{B}(\mathcal{H})$ shows that we can bound $\|\eta^{M}_{t}-\eta_{t}\|_{\textnormal{TV}}$ by (81).
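
The algebraic step from (78) to (80) only uses $\rho_{t}(A)\leq\rho_{t}(\mathds{1})$ and the triangle inequality, so it can be spot-checked on finite measures. The sketch below replaces $\rho_{t}$ and $\rho^{M}_{t}$ by random positive weight vectors on a six-point set, which is an assumption made only for illustration; the function name `quotient_gap_and_bound` is hypothetical.

```python
from fractions import Fraction
import random


def quotient_gap_and_bound(w, w_M, A):
    """Return (|eta^M(A) - eta(A)|, right-hand side of (80)) for weights on a finite set."""
    rho_A, rho_1 = sum(w[i] for i in A), sum(w)
    rho_M_A, rho_M_1 = sum(w_M[i] for i in A), sum(w_M)
    gap = abs(rho_M_A / rho_M_1 - rho_A / rho_1)
    bound = (abs(rho_M_A - rho_A) + abs(rho_M_1 - rho_1)) / rho_M_1
    return gap, bound


rng = random.Random(1)
n = 6
failures = 0
for _ in range(200):
    # Integer weights stored as Fractions keep all comparisons exact
    w = [Fraction(rng.randint(1, 20)) for _ in range(n)]
    w_M = [Fraction(rng.randint(1, 20)) for _ in range(n)]
    A = [i for i in range(n) if rng.random() < 0.5]
    gap, bound = quotient_gap_and_bound(w, w_M, A)
    failures += gap > bound
print(failures)
```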

From here, we estimate using the right hand side of (69) from the proof of Theorem 4.3 to arrive at

	$\displaystyle\|\eta^{M}_{t}-\eta_{t}\|_{\textnormal{TV}}$	$\displaystyle\leq(\underline{\vartheta}^{-1}\overline{\vartheta}^{2})(t,\lambda,Y)\big{(}(1+L_{\lambda}\lambda_{-}^{-1})^{Y^{g}((0,t])}-1\big{)}\cdot\,$		(82)
		$\displaystyle\qquad\qquad\qquad\cdot\max\Big{\{}\overline{\textnormal{diam}}(\mathcal{K}^{M}),\,\overline{\textnormal{diam}}(\mathcal{K}^{M})^{Y^{g}((0,t])}\Big{\}}.$

Now, for the combined term $(\underline{\vartheta}^{-1}\overline{\vartheta}^{2})(t,\lambda,Y)$ we have the case distinction

(\underline{\vartheta}^{-1}\overline{\vartheta}^{2})(t,\lambda,Y)=\begin{cases% }\begin{aligned} &\lambda_{-}^{-Y^{g}((0,t])}\exp\{-2t\,(\lambda_{-}-1)\,\mu_{% \mathcal{K}}(\mathcal{K})\},&&0<\lambda_{-}\leq\lambda_{+}\leq 1,\\ &\Big{(}\frac{\lambda_{+}^{2}}{\lambda_{-}}\Big{)}^{Y^{g}((0,t])}\,\exp\{t\,(% \lambda_{+}+1-2\lambda_{-})\,\mu_{\mathcal{K}}(\mathcal{K})\},&&0<\lambda_{-}% \leq 1\leq\lambda_{+}<\infty,\\ &\lambda_{+}^{2Y^{g}((0,t])}\exp\{t\,(\lambda_{+}-1)\,\mu_{\mathcal{K}}(% \mathcal{K})\},&&1\leq\lambda_{-}\leq\lambda_{+}<\infty.\end{aligned}\end{cases}

Hence, as $(\underline{\vartheta}^{-1}\overline{\vartheta}^{2})(\cdot,\lambda,Y)$ is monotonically increasing on $[0,T]$ , we have

\displaystyle(\underline{\vartheta}^{-1}\overline{\vartheta}^{2})(t,\lambda,Y)% \big{(}(1+L_{\lambda}\lambda_{-}^{-1})^{Y^{g}((0,t])}-1\big{)}\leq\kappa_{\eta% }(\lambda,Y).

(83)

Using this bound and taking the supremum over $t$ in (82) finishes the proof of assertion (ii).

$\square$

4.3 Partial observations

As opposed to our observation schemes LABEL:introduction_MPP_observation and LABEL:introduction_MPP_discretized_observation_multivariate, where we always have information about the whole mark space $\mathcal{K}$ , CLSM data only contain information about a subset of the partition. To model such a partial observation scheme, let $(\mathcal{K}^{M})_{M\in\mathbb{N}}$ again be the fixed dissecting system for the mark space $\mathcal{K}$ from the last section. For some fixed $M\in\mathbb{N}$ , let $\mathcal{I}_{M}:=\{i_{1},\dots,i_{|\mathcal{I}_{M}|}\}\subseteq\{1,\dots,M\}$ be some subset of indices with $|\mathcal{I}_{M}|<M$ and let $\mathcal{K}^{M}_{\mathcal{I}_{M}}$ be the collection of all sets $K^{M}_{i}$ with $i\in\mathcal{I}_{M}$ . Since $|\mathcal{I}_{M}|<M$ , the family $\mathcal{K}^{M}_{\mathcal{I}_{M}}$ is no longer a partition of $\mathcal{K}$ .

Partial filtering problem

We can use the tools from Section 3.4 to derive the analogous filtering equations for a partial observation, as the partition property of $\mathcal{K}^{M}$ is not explicitly required in this context.

For the signal process $X$ from LABEL:introduction_spde_basic_formulation we again introduce the $M$ -variate observation $Y^{M}$ from (O^M) given $\mathcal{K}^{M}$ . Now, in addition to that we define the partial observation $Y^{M}|_{\mathcal{I}_{M}}$ given the collection of sets $\mathcal{K}^{M}_{\mathcal{I}_{M}}$ . Analogously to $Y^{M}$ , we can introduce a reference measure $\mathbb{Q}^{M}_{\mathcal{I}_{M}}$ under which $Y^{M}|_{\mathcal{I}_{M}}$ is a $|\mathcal{I}_{M}|$ -dimensional Poisson process with rate $\mu_{\mathcal{K}}(K^{M}_{i})$ in each component, with Radon-Nikodym derivative $Z^{M}_{\mathcal{I}_{M}}(t):=\frac{\textnormal{d}\mathbb{P}|_{t}}{\textnormal{d% }\mathbb{Q}^{M}_{\mathcal{I}_{M}}|_{t}}$ given by

	$\displaystyle Z^{M}_{\mathcal{I}_{M}}(t):=\exp$	$\displaystyle\Big{\{}\sum\limits_{i\in\mathcal{I}_{M}}\Big{[}\int_{0}^{t}\log\left\{\frac{\lambda^{M}_{i}(s\,|\,X(s))}{\mu_{\mathcal{K}}(K^{M}_{i})}\right\}\,\textnormal{d}Y^{M}_{i}(s)$
		$\displaystyle\qquad\qquad-\int_{0}^{t}\big{(}\lambda^{M}_{i}(s\,|\,X(s))-\mu_{\mathcal{K}}(K^{M}_{i})\big{)}\,\textnormal{d}s\Big{]}\Big{\}},$

for any $t\in[0,T]$ . By introducing the filtration $(\mathcal{Y}^{\mathcal{I}_{M}}_{t})_{t\in[0,T]}$ generated by $Y^{M}|_{\mathcal{I}_{M}}$ one can derive the unnormalized and normalized posterior distributions $\rho^{\mathcal{I}_{M}}_{t}$ and $\eta^{\mathcal{I}_{M}}_{t}$ , respectively, in the exact same way as we did in Section 3.4.

The partial observation $Y^{M}_{\mathcal{I}_{M}}$ does not inherit all jumps of $Y$ , only those with marks in the sets of $\mathcal{K}^{M}_{\mathcal{I}_{M}}$ . We were able to interpret the process $Y^{M}$ as an approximation of the MPP $Y$ with uncertainty about the exact mark positions. A crucial property of the embedding $\tilde{Y}^{M}$ was the identity of the ground processes, i.e. $(\tilde{Y}^{M})^{g}=Y^{g}$ , and that we had $\tilde{Y}^{M}([0,T]\times\mathcal{K})=Y([0,T]\times\mathcal{K})$ . As opposed to that, in general for the partial observation we have

Y^{M}_{\mathcal{I}_{M}}([0,T])\leq Y([0,T]\times\mathcal{K}),

(84)

meaning that we may always miss some points.
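
Inequality (84) can be made concrete with a small simulation. The sketch below assumes a mark space $[0,1]$ with a uniform partition into $M=8$ cells, a simulated set of marks, and an arbitrarily chosen observed index set; all of these, including the names `cell_counts` and `observed_cells`, are hypothetical and serve only as an illustration.

```python
import random


def cell_counts(marks, M):
    """Number of marks falling into each cell of a uniform partition of [0, 1] into M cells."""
    counts = [0] * M
    for x in marks:
        counts[min(int(x * M), M - 1)] += 1
    return counts


rng = random.Random(2)
marks = [rng.random() for _ in range(40)]   # marks of one simulated realisation of Y
M = 8
observed_cells = [0, 1, 2, 5]               # hypothetical index set I_M (0-based indices)

full = cell_counts(marks, M)                  # component-wise counts of Y^M on [0, T]
partial = [full[i] for i in observed_cells]   # counts seen by the partial observation
missed = sum(full) - sum(partial)             # points with marks in unobserved cells
print(sum(partial), sum(full), missed)
```

The quantity `missed` corresponds to $Y((0,T]\times\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M}))$ , which enters the additional error terms below.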

Although we cannot expect convergence of the estimators in general, we can still derive approximation errors for the total variation distances $\|\rho_{t}-\rho^{\mathcal{I}_{M}}_{t}\|_{\textnormal{TV}}$ and $\|\eta_{t}-\eta^{\mathcal{I}_{M}}_{t}\|_{\textnormal{TV}}$ , as demonstrated in the next theorem. As we trivially have

\|\rho_{t}-\rho^{\mathcal{I}_{M}}_{t}\|_{\textnormal{TV}}\leq\|\rho_{t}-\rho^{% M}_{t}\|_{\textnormal{TV}}+\|\rho^{M}_{t}-\rho^{\mathcal{I}_{M}}_{t}\|_{% \textnormal{TV}},

(85)

for any $t\in[0,T]$ , and the analogous inequality for $\|\eta_{t}-\eta^{\mathcal{I}_{M}}_{t}\|_{\textnormal{TV}}$ , the first terms on the right hand sides of the bounds in (i) and (ii) follow directly from Theorem 4.3 and Corollary 4.4, respectively. Hence, the errors comprise two components: the discretization errors $\kappa_{\rho}$ from Theorem 4.3 and $\kappa_{\eta}$ from Corollary 4.4, and additional errors $\varepsilon_{\rho}$ and $\varepsilon_{\eta}$ , respectively, that depend exponentially on the size of the unobserved area. The latter account for the information loss due to observing only a subset of the partition.

For better readability we define for any index set $\mathcal{I}_{M}\subset\{1,\dots,M\}$

	$\displaystyle\mathcal{I}_{M}^{\tcomplement}$	$\displaystyle:=\{1,\dots,M\}\backslash\mathcal{I}_{M},$
	$\displaystyle\mathcal{K}^{M}(\mathcal{I}_{M})$	$\displaystyle:=\bigcupplus_{i\in\mathcal{I}_{M}}K^{M}_{i}.$

Proposition 4.5.

Let the assumptions from Theorem 4.3(ii) hold true. Then we have the following approximation errors.

(i)

We have

	$\displaystyle\|\rho_{t}-\rho^{\mathcal{I}_{M}}_{t}\|_{\textnormal{TV}}\leq$	$\displaystyle\,\kappa_{\rho}(\lambda,Y)\max\Big{\{}\overline{\textnormal{diam}}(\mathcal{K}^{M}),\,\overline{\textnormal{diam}}(\mathcal{K}^{M})^{Y^{g}((0,T])}\Big{\}}$
		$\displaystyle+\varepsilon_{\rho}(\lambda,Y)(\mathcal{K}(\mathcal{I}_{M}^{\tcomplement})),$

with $\kappa_{\rho}(\lambda,Y)$ being the constant from Theorem 4.3 (ii) and

	$\displaystyle\varepsilon_{\rho}(\lambda,Y)$	$\displaystyle(\mathcal{K}(\mathcal{I}_{M}^{\tcomplement})):=$
		$\displaystyle\frac{1}{2}\max\Big{\{}\lambda_{+}^{Y^{g}((0,T])}\,\exp\{-T\,(\lambda_{-}-1)\,\mu_{\mathcal{K}}(\mathcal{K})\},$
		$\displaystyle\qquad\qquad\lambda_{+}^{Y^{g}((0,T])},\,\exp\{-T\,(\lambda_{-}-1)\,\mu_{\mathcal{K}}(\mathcal{K})\}\Big{\}}\cdot$
		$\displaystyle\cdot\Big{(}\exp\Big{\{}\max\big{\{}|\log\{\lambda_{-}\}|,|\log\{\lambda_{+}\}|\big{\}}Y((0,T]\times\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M}))+$
		$\displaystyle\qquad\quad\qquad+T\mu_{\mathcal{K}}(\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M}))\max\{|\lambda_{-}-1|,|\lambda_{+}-1|\}\Big{\}}-1\Big{)}.$

(ii)

We have

	$\displaystyle\|\eta_{t}-\eta^{\mathcal{I}_{M}}_{t}\|_{\textnormal{TV}}\leq$	$\displaystyle\,\kappa_{\eta}(\lambda,Y)\max\Big{\{}\overline{\textnormal{diam}}(\mathcal{K}^{M}),\,\overline{\textnormal{diam}}(\mathcal{K}^{M})^{Y^{g}((0,T])}\Big{\}}$
		$\displaystyle+\varepsilon_{\eta}(\lambda,Y)(\mathcal{K}(\mathcal{I}_{M}^{\tcomplement})),$

with $\kappa_{\eta}(\lambda,Y)$ being the constant from Corollary 4.4 (ii) and

	$\displaystyle\varepsilon_{\eta}(\lambda,Y)$	$\displaystyle(\mathcal{K}(\mathcal{I}_{M}^{\tcomplement}))$
		$\displaystyle:=\max\Big{\{}\lambda_{-}^{-Y((0,t]\times\mathcal{K}^{M}(\mathcal{I}_{M}))}\exp\{-2t\,(\lambda_{-}-1)\,\mu_{\mathcal{K}}(\mathcal{K})\},$
		$\displaystyle\qquad\qquad\lambda_{+}^{2Y^{g}((0,t])}\lambda_{-}^{-Y((0,t]\times\mathcal{K}^{M}(\mathcal{I}_{M}))}\cdot$
		$\displaystyle\qquad\qquad\quad\cdot\exp\{t\,\big{(}(\lambda_{+}+1)\mu_{\mathcal{K}}(\mathcal{K}^{M}(\mathcal{I}_{M}))-2\lambda_{-}\,\mu_{\mathcal{K}}(\mathcal{K})\big{)}\},$
		$\displaystyle\qquad\qquad\lambda_{+}^{2Y^{g}((0,t])}\exp\{t\,(\lambda_{+}-1)\,\mu_{\mathcal{K}}(\mathcal{K}^{M}(\mathcal{I}_{M}))\}\Big{\}}\cdot$
		$\displaystyle\quad\cdot\Big{(}\exp\Big{\{}\max\big{\{}|\log\{\lambda_{-}\}|,|\log\{\lambda_{+}\}|\big{\}}Y((0,t]\times\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M}))$
		$\displaystyle\qquad\qquad\qquad+t\mu_{\mathcal{K}}(\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M}))\max\{|\lambda_{-}-1|,|\lambda_{+}-1|\}\Big{\}}-1\Big{)}.$

$\diamond$

Proof. Analogously to the proofs of the preceding approximation errors, for (i) we rewrite

$\displaystyle\|\rho^{M}_{t}$	$\displaystyle-\rho^{\mathcal{I}_{M}}_{t}\|_{\textnormal{TV}}=\frac{1}{2}\mathbb{E}_{X}[|Z^{M}(t)-Z^{M}_{\mathcal{I}_{M}}(t)|]$
	$\displaystyle=\frac{1}{2}\mathbb{E}_{X}\Big{[}Z^{M}(t)\,\big{|}1-\exp\big{\{}-\int_{0}^{t}\int_{\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M})}\log\{\theta_{M}(s,x)\}\,Y(\textnormal{d}s,\textnormal{d}x)$	(86)
	$\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad-\int_{0}^{t}\int_{\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M})}(\lambda(s,x)-1)\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\big{\}}\big{|}\Big{]},$

where $\theta_{M}(s,x)$ is defined as in the proof of Theorem 4.3.

Since $\lambda_{-}\leq\theta_{M}(s,x)\leq\lambda_{+}$, we conclude that

	$\displaystyle\|\int_{0}^{t}\int_{\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M})}$	$\displaystyle\log\{\theta_{M}(s,x)\}\,Y(\textnormal{d}s,\textnormal{d}x)\|$
		$\displaystyle\leq\max\big{\{}\|\log\{\lambda_{-}\}\|,\|\log\{\lambda_{+}\}\|\big{% \}}Y((0,t]\times\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M})).$

Hence,

	$\displaystyle\|\int_{0}^{t}\int_{\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M})}\log\{\theta_{M}(s,x)\}\,Y(\textnormal{d}s,\textnormal{d}x)+\int_{0}^{t}\int_{\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M})}(\lambda(s,x)-1)\,\mu_{\mathcal{K}}(\textnormal{d}x)\,\textnormal{d}s\|$
	$\displaystyle\quad\leq\max\big{\{}\|\log\{\lambda_{-}\}\|,\|\log\{\lambda_{+}\}\|\big{\}}Y((0,t]\times\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M}))$		(87)
	$\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad+t\mu_{\mathcal{K}}(\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M}))\max\{\|\lambda_{-}-1\|,\|\lambda_{+}-1\|\}.$

Using an estimate similar to the one in the proof of Theorem 4.3, we now obtain, for given $X$,

	$\displaystyle Z^{M}(t)\,\big{\|}1-\exp\big{\{}-\int_{0}^{t}\int_{\mathcal{K}^{M% }(\mathcal{I}^{\tcomplement}_{M})}\log\{\theta_{M}(s,x)\}\,Y(\textnormal{d}s,% \textnormal{d}x)$
	$\displaystyle\qquad\qquad\qquad\qquad\qquad\qquad-\int_{0}^{t}\int_{\mathcal{K% }^{M}(\mathcal{I}^{\tcomplement}_{M})}(\lambda(s,x)-1)\,\mu_{\mathcal{K}}(% \textnormal{d}x)\,\textnormal{d}s\big{\}}\big{\|}$
	$\displaystyle\leq\overline{\vartheta}(t,\lambda,Y)\,\Big{(}\exp\Big{\{}\|\int_{% 0}^{t}\int_{\mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M})}\log\{\theta_{M}(s% ,x)\}\,Y(\textnormal{d}s,\textnormal{d}x)$		(88)
	$\displaystyle\qquad\qquad\qquad\qquad\qquad+\int_{0}^{t}\int_{\mathcal{K}^{M}(% \mathcal{I}^{\tcomplement}_{M})}(\lambda(s,x)-1)\,\mu_{\mathcal{K}}(% \textnormal{d}x)\,\textnormal{d}s\|\Big{\}}-1\Big{)}$
	$\displaystyle\leq\overline{\vartheta}(t,\lambda,Y)\,\Big{(}\exp\Big{\{}\max% \big{\{}\|\log\{\lambda_{-}\}\|,\|\log\{\lambda_{+}\}\|\big{\}}Y((0,t]\times% \mathcal{K}^{M}(\mathcal{I}^{\tcomplement}_{M}))$		(89)
	$\displaystyle\qquad\qquad\qquad\qquad\qquad+t\mu_{\mathcal{K}}(\mathcal{K}^{M}% (\mathcal{I}^{\tcomplement}_{M}))\max\{\|\lambda_{-}-1\|,\|\lambda_{+}-1\|\}\Big{% \}}-1\Big{)}$

All terms on the right-hand side of (89) are independent of $X$, bounded, and monotonically increasing in $t$ on $[0,T]$; hence the expectation in (86) is bounded by (89). Taking the supremum over $t\in[0,T]$ proves assertion (i).
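For convenience, the elementary inequality behind the step from the left-hand side to (88) can be spelled out as follows. The symbol $a$ is introduced only for this remark and stands for the exponent appearing in (86); the bound $Z^{M}(t)\leq\overline{\vartheta}(t,\lambda,Y)$ is assumed here to be the one from the proof of Theorem 4.3.

```latex
% Case a >= 0:  |1 - e^{-a}| = 1 - e^{-a} <= e^{a} - 1,  because e^{a} + e^{-a} >= 2.
% Case a <  0:  |1 - e^{-a}| = e^{-a} - 1 = e^{|a|} - 1.
\[
  \bigl| 1 - e^{-a} \bigr| \;\le\; e^{|a|} - 1 , \qquad a \in \mathbb{R},
\]
% so that, with Z^M(t) <= \overline{\vartheta}(t,\lambda,Y),
\[
  Z^{M}(t)\,\bigl| 1 - e^{-a} \bigr|
  \;\le\; \overline{\vartheta}(t,\lambda,Y)\,\bigl( e^{|a|} - 1 \bigr).
\]
```

Since $r\mapsto e^{r}-1$ is increasing, inserting the bound (87) on $|a|$ then yields (89).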

For the proof of the bound in (ii), we observe that

$\displaystyle Z^{M}_{\mathcal{I}_{M}}(t)\geq\underline{\vartheta}_{\mathcal{I}_{M}}(t,\lambda,Y),$

with

	$\displaystyle\underline{\vartheta}_{\mathcal{I}_{M}}(t,\lambda,Y):=\min\Big{\{}$	$\displaystyle\lambda_{-}^{Y((0,t]\times\mathcal{K}^{M}(\mathcal{I}_{M}))}\,\exp\{-t\,(\lambda_{+}-1)\,\mu_{\mathcal{K}}(\mathcal{K}^{M}(\mathcal{I}_{M}))\},$
		$\displaystyle\lambda_{-}^{Y((0,t]\times\mathcal{K}^{M}(\mathcal{I}_{M}))},\,\exp\{-t\,(\lambda_{+}-1)\,\mu_{\mathcal{K}}(\mathcal{K}^{M}(\mathcal{I}_{M}))\}\Big{\}},$

and where

1\geq\underline{\vartheta}_{\mathcal{I}_{M}}(t,\lambda,Y)\geq\underline{% \vartheta}(t,\lambda,Y).

The remainder of the proof is analogous to the proof of Corollary 4.4 (ii) and is therefore omitted.

$\square$

5 Simulations

In this section, we compare our theoretical results with numerical experiments. The Python code used for the simulations and plots is publicly available at https://github.com/jszala/SPDE_Poisson_filtering.git.

5.1 Synthetic data

Both the signal and the observation processes are simulated using explicit Euler schemes in time and finite differences in space. The Git repository also provides the data necessary for reproducing the experiments.

In the experiments, the observation process will be given as a multivariate Poisson process according to the scheme LABEL:introduction_MPP_discretized_observation_multivariate. The intensity is chosen as

\lambda(t,x):=e^{-at}(cx)^{2}\vee\mathcal{C}_{\max}

(90)

where $a>0$ is an optional, sufficiently small decay parameter, $c>0$ is a scaling parameter, and $\mathcal{C}_{\max}$ is a sufficiently large upper bound. We note that since $\lambda$ is not Lipschitz continuous, the error bounds provided in Corollary 4.4(ii) are not directly applicable in this case.

Motivated by the application, we investigate the case where $\mathcal{D}=\mathcal{K}\subset\mathbb{R}^{2}$. For computational reasons, we discretize the spatial domain into $1024$ sets or, from an image-analysis viewpoint, into $32\times 32$ pixels. The observations are given at decreasing spatial resolutions of $32\times 32$, $16\times 16$, …, $1\times 1$ pixels; see Figure 2 for an example.
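The following minimal sketch illustrates how one Euler step of such a pixel-wise Poisson observation could be simulated and then coarsened by summing counts over blocks of pixels. It is not taken from the repository: the function names, the time step, the unit-area domain, and the reading of $\mathcal{C}_{\max}$ as a cap on the intensity are assumptions made for illustration only.

```python
import numpy as np


def intensity(t, x, a=0.0, c=10.0, c_max=1e6):
    """Intensity of the form (90); c_max is read as an upper bound (assumption)."""
    return np.minimum(np.exp(-a * t) * (c * x) ** 2, c_max)


def coarsen(counts, factor):
    """Sum the counts over non-overlapping factor x factor blocks of pixels."""
    n = counts.shape[0] // factor
    return counts.reshape(n, factor, n, factor).sum(axis=(1, 3))


rng = np.random.default_rng(0)
dt = 1e-3                                        # hypothetical time step
signal = rng.normal(10.0, 1.0, size=(32, 32))    # stand-in for X(t, .) on the pixel grid
pixel_area = 1.0 / signal.size                   # assumes a domain of unit area

# One explicit Euler step: the increment in each pixel is Poisson distributed
# with mean intensity * pixel area * dt.
increments = rng.poisson(intensity(0.0, signal) * pixel_area * dt)

for factor in (1, 2, 4, 8, 16, 32):
    y = coarsen(increments, factor)
    print(y.shape, int(y.sum()))                 # same total count at every resolution
```

Because coarsening only sums counts, the total number of emissions is identical at every resolution; only the spatial information is reduced.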

5.1.1 Particle filter estimations

Particle filters provide a numerical approximation of the solution of the Kushner-Stratonovich equation from Theorem 3.10; see [3, Ch. 8-10] for details. A critical component of this approach is the forward computation of the Radon-Nikodym density $Z^{M}$, which, like the signal and observation processes, is carried out with an explicit Euler scheme in our implementation.

Let $Y^{M}$ be a given observation according to LABEL:introduction_MPP_discretized_observation_multivariate. The ensemble size $L\in\mathbb{N}$ determines the number of particles, denoted by $X_{L,1}^{M},\dots,X^{M}_{L,L}$, used in the particle filter. The algorithm iteratively simulates the particles’ forward steps, assesses their likelihood, and then resamples them. For a given time discretization $t_{1},\dots,t_{N}$ of $[0,T]$, the corresponding empirical distribution $\frac{1}{L}\sum_{i=1}^{L}\delta_{X^{M}_{L,i}(t_{j})}$ yields an approximation of the posterior distribution $\eta^{M}_{t_{j}}$.

The empirical mean of the particles provides an estimate of the signal:

\overline{X}^{M}_{L}(t_{j}):=\frac{1}{L}\sum_{i=1}^{L}X^{M}_{L,i}(t_{j})% \approx X(t_{j}).

(91)

We assess the corresponding estimation error in (91) by computing the empirical mean squared errors.

The accuracy of $\overline{X}^{M}_{L}$ depends on various factors, such as signal and observation noise amplitudes, the spatial resolution $M$ , and the Monte Carlo sampling error, which decreases with larger ensemble sizes $L$ .
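The propagate, weight, and resample loop described above can be sketched as a generic bootstrap particle filter with a Poisson likelihood for the count increments. This is a simplified stand-in rather than the implementation from the repository: it weights particles by the Poisson likelihood of each increment instead of stepping $Z^{M}$ with an Euler scheme, it uses multinomial resampling, and the grid size, time step, intensity, and the crude white-noise discretization are hypothetical choices.

```python
import numpy as np


def poisson_log_likelihood(dy, mean):
    """Poisson log-likelihood of increments dy, up to a term depending only on dy
    (that term cancels when the weights are normalised)."""
    return np.sum(dy * np.log(mean) - mean)


def particle_filter_step(particles, dy, dt, rng, propagate, intensity):
    """One propagate / weight / resample step of a bootstrap particle filter."""
    particles = propagate(particles, dt, rng)                    # forward simulation
    log_w = np.array([poisson_log_likelihood(dy, intensity(p) * dt)
                      for p in particles])
    w = np.exp(log_w - log_w.max())                              # numerically stable weights
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)   # multinomial resampling
    return particles[idx]


# Hypothetical toy setting: 4 x 4 pixels, L = 10 particles.
L, shape, dt, n_steps = 10, (4, 4), 1e-2, 200
area = 1.0 / (shape[0] * shape[1])


def intensity(x):
    # (c x)^2 with c = 10, integrated over one pixel; floored to keep the log finite
    return np.maximum((10.0 * x) ** 2, 1e-12) * area


def propagate(x, dt, rng):
    # crude discretisation of dX = 0.01 dW that ignores the spatial noise scaling
    return x + 0.01 * np.sqrt(dt) * rng.standard_normal(x.shape)


rng = np.random.default_rng(1)
truth = rng.normal(10.0, 1.0, size=shape)
particles = rng.normal(10.0, 1.0, size=(L,) + shape)
for _ in range(n_steps):
    truth = propagate(truth, dt, rng)
    dy = rng.poisson(intensity(truth) * dt)                      # observed increments
    particles = particle_filter_step(particles, dy, dt, rng, propagate, intensity)

estimate = particles.mean(axis=0)                                # empirical mean as in (91)
```

With so few particles the ensemble degenerates quickly after resampling, which is one way the Monte Carlo error mentioned above shows up in practice.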

White noise signal

Our first experiment investigates a space-time white noise signal

\textnormal{d}X(t)=0.01\textnormal{d}W(t),\quad X(0,x)\sim\mathcal{N}(10,1),

(92)

and the particle filter estimates $\overline{X}^{M}_{10}$ corresponding to LABEL:introduction_MPP_discretized_observation_multivariate with spatial resolutions $M=32^{2},16^{2},\dots,1$ . We chose $a=0$ , $c=10$ in (90).

Denote the number of pixels in the signal’s spatial discretization by $|\text{pixels}|$ . As a measure of the estimations’ accuracy, in each time step $t_{j}$ we compute the spatial empirical MSE given by

\text{MSE}(\overline{X}^{M}_{10},X)(t_{j}):=\sqrt{\frac{1}{|\text{pixels}|}\sum_{\text{pixels}}(\overline{X}^{M}_{10}(t_{j},\text{pixel})-X(t_{j},\text{pixel}))^{2}}.

(93)
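As a small illustration, the error measure (93) could be computed per time step as below; the function name is hypothetical. Note that, because of the square root, the quantity is a root mean squared error over the pixels.

```python
import numpy as np


def spatial_error(estimate, truth):
    """Error measure (93): square root of the mean squared pixel-wise difference."""
    return np.sqrt(np.mean((estimate - truth) ** 2))


# Example: an estimate that is off by exactly 1 in every pixel has error 1.
print(spatial_error(np.ones((32, 32)), np.zeros((32, 32))))
```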

Figure 1: Empirical MSE for particle filter estimates with Poisson process observations and signal given by (92). The different colors correspond to the observation scheme resolution.

In Figure 1 we can see that the empirical MSE only stabilizes for the highest resolution $32\times 32$ around time step $3500$ . Furthermore, as expected, we can observe that the error increases with decreasing $M$ .

SPDE signal

We consider a class of stochastic reaction-diffusion SPDEs, specifically of the form:

\begin{cases}\textnormal{d}u(t)=\Big{(}\Delta u(t)+\varepsilon(u(t)-\alpha_{1}% )(u(t)-\alpha_{2})(\alpha_{3}-u(t))-v(t)+I\Big{)}\textnormal{d}t+B\textnormal{% d}W_{1}(t),\\ \textnormal{d}v(t)=\Big{(}\Delta v(t)+\gamma\big{(}\beta u(t)-v(t)\big{)}\Big{% )}\textnormal{d}t+\vartheta\textnormal{d}W_{2}(t),\end{cases}

(94)

which are commonly referred to as spatially extended stochastic FitzHugh-Nagumo (FHN) dynamics, where $\Delta$ is the Neumann-Laplacian on $\mathcal{D}$ and $W_{1}$ and $W_{2}$ are two independent cylindrical $Q$-Wiener processes. In [23, 12], the stochastic FHN system was introduced as a spatially extended stochastic two-phase dynamics to model and further analyze the actin dynamics in D. discoideum. We set $X:=u$ as the signal process in our filtering problem, so that $v$ acts as an additional hidden process in the simulations. The parameters required to reproduce this experiment are available in the associated Git repository.
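A minimal sketch of one explicit Euler step for (94) with a five-point finite-difference Neumann-Laplacian is given below. All parameter values are hypothetical placeholders (the values used in the experiments are those in the repository), and the noise is simplified to space-time white noise, that is, the covariance operator $Q$ is assumed to be the identity.

```python
import numpy as np


def neumann_laplacian(f, h):
    """Five-point Laplacian with homogeneous Neumann (no-flux) boundary conditions."""
    g = np.pad(f, 1, mode="edge")                 # ghost cells copy the boundary values
    return (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4.0 * f) / h**2


def fhn_step(u, v, dt, h, rng, p):
    """One explicit Euler step for system (94), white-noise forcing assumed."""
    drift_u = (neumann_laplacian(u, h)
               + p["eps"] * (u - p["a1"]) * (u - p["a2"]) * (p["a3"] - u)
               - v + p["I"])
    drift_v = neumann_laplacian(v, h) + p["gamma"] * (p["beta"] * u - v)
    scale = np.sqrt(dt) / h                       # white-noise increment per cell of area h^2
    u_new = u + dt * drift_u + p["B"] * scale * rng.standard_normal(u.shape)
    v_new = v + dt * drift_v + p["theta"] * scale * rng.standard_normal(v.shape)
    return u_new, v_new


# Hypothetical parameters, chosen only so that the explicit scheme is stable.
params = {"eps": 1.0, "a1": 0.0, "a2": 0.25, "a3": 1.0, "I": 0.0,
          "gamma": 0.1, "beta": 1.0, "B": 0.01, "theta": 0.01}
rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=(32, 32))
v = np.zeros((32, 32))
for _ in range(200):
    u, v = fhn_step(u, v, dt=0.05, h=1.0, rng=rng, p=params)
```

For the explicit scheme, the time step has to respect the usual diffusion restriction $\textnormal{d}t\leq h^{2}/4$ in two space dimensions; the values above satisfy it.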

We applied a particle filter with $20$ particles to observations at various resolutions, from $32\times 32$ and $16\times 16$ down to $1\times 1$ (see Figures 2(b) and 2(d) for examples of different resolutions).

Using the MSE defined in (93), we observe that the estimation accuracy remains high even for relatively low-dimensional observations, as shown in Figure 3. One possible explanation is that, since the Laplacian is a strongly dissipative operator, its influence can still be captured effectively at lower resolutions, leading to accurate predictions of the signal state.

5.1.2 Implications of wrong assumptions on observation noise

In this subsection, we compare the particle filter estimates with estimates produced by an Ensemble Kalman filter (EnKF). The EnKF, introduced in [11], is a widely applied particle-based implementation of the well-known Kalman filter for the case of additive Gaussian observation noise.

For a given signal path $X$ based on the reaction-diffusion system described above, we simulate two different observations at the maximum resolution of $1024=32\times 32$ . The first observation, denoted $Y^{1024}_{l}$ , is generated with a low intensity $\lambda_{l}$ by setting $c=100$ in (90), resulting in fewer point emissions. The second observation, denoted $Y^{1024}_{h}$ , is produced with a higher intensity $\lambda_{h}$ by setting $c=2000$ , leading to a significantly higher number of point emissions at each time step; see Figure 4.

To construct the "wrong" observation model, we estimate the empirical variances $\hat{\sigma}^{2}_{l}$ and $\hat{\sigma}^{2}_{h}$ of $Y^{1024}_{l}$ and $Y^{1024}_{h}$, where, since the observations are Poisson distributed, we have

\hat{\sigma}^{2}_{i}\textnormal{d}t\approx\lambda^{k}_{i}(t,X(t))\textnormal{d% }t,\quad i\in\{l,h\}.

Hence, we assume the observation dynamics

\textnormal{d}\tilde{Y}^{1024,k}_{i}(t)=\lambda^{k}_{i}(t,X(t))\textnormal{d}t% +\hat{\sigma}_{i}\textnormal{d}B_{i}^{k}(t),\quad k=1,\dots,1024,\;i\in\{l,h\},

(95)

with $B_{l}$ , $B_{h}$ being multivariate Brownian motions on $[0,T]$ . While the particle filter assumes the correct Poisson noise dynamics, the EnKF is run with the wrong model assumptions (95).

Using the empirical MSE as a measure of estimation accuracy, Figure 5 demonstrates that for the high-intensity observation $Y_{h}^{1024}$, the EnKF provides a slightly better estimate of the signal state. In contrast, the particle filter performs significantly better with the low-intensity observation $Y_{l}^{1024}$. This difference can be attributed to the normal approximation of the Poisson distribution, which becomes more accurate as $\lambda$ grows; see for example [20] for error estimates. Our experiment highlights that, particularly in scenarios with low point emission counts, the correct observation noise assumption is crucial for the accuracy of the filter estimate.
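The dependence of the normal approximation on the Poisson mean can be checked numerically. The sketch below compares a Poisson distribution with mean `lam` to a normal distribution with the same mean and variance, discretized to the integers with a continuity correction, and reports their total variation distance. The two means are illustrative values and not the intensities used in the experiment.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass function, evaluated in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def tv_distance_to_normal(lam):
    """Total variation distance between Poisson(lam) and N(lam, lam) discretised
    to the integers; all normal mass below 0.5 is assigned to the value 0."""
    k_max = int(lam + 12.0 * math.sqrt(lam) + 20)   # truncation with negligible tails
    total = 0.0
    for k in range(k_max + 1):
        lower = normal_cdf(k - 0.5, lam, lam) if k > 0 else 0.0
        q = normal_cdf(k + 0.5, lam, lam) - lower
        total += abs(poisson_pmf(k, lam) - q)
    return 0.5 * total


for lam in (0.5, 5.0, 50.0):                        # illustrative mean counts per step
    print(lam, tv_distance_to_normal(lam))
```

The distance shrinks as the mean count grows, which is consistent with the EnKF becoming competitive only for the high-intensity observation.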

5.1.3 Partial observations

Using the signal dynamics from (94), we conducted experiments to simulate the partial observation schemes discussed in Section 4.3. Unlike the observations in LABEL:introduction_MPP_discretized_observation_multivariate, where the number of point emissions remains stable across all resolutions due to summing the emission counts, the partial observation scheme progressively loses information as the resolution decreases. As anticipated, this results in less accurate estimates with lower resolution; see Figure 6.
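The difference between the two schemes can be illustrated as follows. The sketch assumes, purely for illustration, that a partial observation keeps the counts of a single pixel per block and discards the rest; which index set $\mathcal{I}_{M}$ is used in the experiments is not specified here, and the function names and the synthetic counts are hypothetical.

```python
import numpy as np


def coarsen(counts, factor):
    """Full scheme: sum the counts over factor x factor blocks (no emissions lost)."""
    n = counts.shape[0] // factor
    return counts.reshape(n, factor, n, factor).sum(axis=(1, 3))


def subsample(counts, factor):
    """Partial scheme (assumed form): keep the top-left pixel of each block only."""
    return counts[::factor, ::factor]


rng = np.random.default_rng(2)
counts = rng.poisson(3.0, size=(32, 32))          # synthetic photon counts

for factor in (1, 2, 4, 8):
    print(factor, int(coarsen(counts, factor).sum()), int(subsample(counts, factor).sum()))
```

In the summed scheme the total count is unchanged, whereas under subsampling the number of retained emissions drops roughly by the square of the block size.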

5.2 Outlook: Filtering CLSM data

The application of a Poisson particle filter to real CLSM data of D. discoideum will be explored in future work. We plan to investigate parameter estimation under Poisson observation noise, expanding upon the theoretical framework established in [24, 23]. While a detailed analysis will be provided in a forthcoming paper, we offer a brief overview of the intended applications.

Confocal Laser Scanning Microscopy (CLSM) is an optical imaging technique that enhances image clarity by selectively excluding out-of-focus light, effectively sectioning a three-dimensional object into thin two-dimensional optical slices. In CLSM, a laser beam is focused on single points within the sample, exciting fluorescent molecules that are present in the illuminated region. The sample is scanned point-by-point, and the emitted fluorescence passes through a pinhole aperture that blocks out-of-focus light, allowing only the fluorescence from the focal plane to reach the detector. This process results in an integer-valued photon count, which is typically transformed into a pixel value in a nonlinear fashion. In the analyzed data, we had access to the raw photon counts before their transformation into pixel values, allowing for more direct analysis of the imaging data.

5.2.1 Data acquisition

Experimental CLSM data was acquired using a laser scanning microscope (LSM780, Zeiss, Jena) equipped with a 20x objective lens and a 488 nm Argon laser. In order to access the raw photon counts, all recordings were performed under the ”Photon Counting” acquisition mode.

For the control experiments with fluorescein, a solution of 100 nM fluorescein sodium salt in Sørensen’s buffer (14.7 mM KH₂PO₄, 2 mM Na₂HPO₄, pH 6.0) was freshly prepared and further diluted to the desired final concentration before imaging. All fluorescein solutions were protected from light until imaging was performed. Timelapse recordings were acquired for $16\times 16$ pixel frames, using a pixel dwell time of 16 $\mu$s, 40 $\mu$s, or 81 $\mu$s, without any time delay between frames.

For live cell imaging, we worked with giant D. discoideum cells, produced through the electric pulse-induced fusion of individual cells [14]. The cells (strain DdB NF1 KO, transformed with a plasmid for fluorescent labeling, SF108 as described in [12]) contain a green fluorescent protein that labels the intracellular actin (LifeAct-GFP). In all cases, samples were contained in a small petri dish with a glass bottom.

5.2.2 Poisson statistics in CLSM microscopy

To validate the assumption that the observation noise in our data follows a Poisson distribution, commonly referred to as "shot noise" in the statistical literature [17], we analyzed images of solutions containing varying concentrations of the fluorescent dye fluorescein. Due to minimal diffusion over short time periods and within localized regions, it is reasonable to assume that the fluorescein concentration remains approximately constant during the observations. An example of an image from such a "static" sample is shown in Figure 7(a). Each pixel in these images can be treated as a photon count sample from the same underlying fluorescein concentration. We then compared the distribution of photon counts across all pixels with a Poisson probability mass function (pmf) whose intensity parameter is given by the mean photon count, as illustrated in Figure 7(b). This analysis was performed on more than 30 datasets and consistently showed that the bar plots of photon counts closely match Poisson distributions. The intensity of these distributions varied according to the microscope settings and sample properties, such as dwell time, laser intensity, and fluorescein concentration. Further analysis revealed no significant correlation between photon counts, which further supports the Poisson noise assumption.
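A check of this kind can be sketched as follows, here on a synthetic $16\times 16$ frame that stands in for a recorded photon count image. The comparison of the empirical frequencies with a Poisson pmf at the mean count follows the description above; the variance-to-mean ratio is an additional, commonly used diagnostic and is not claimed to be part of the analysis performed for the paper.

```python
import math

import numpy as np


def poisson_pmf(k, lam):
    """Poisson probability mass function, evaluated in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


rng = np.random.default_rng(3)
frame = rng.poisson(4.0, size=(16, 16))           # synthetic stand-in for one frame

lam_hat = frame.mean()                            # intensity estimate: mean photon count
fano = frame.var(ddof=1) / lam_hat                # close to 1 for Poisson-distributed counts

values, freq = np.unique(frame, return_counts=True)
empirical = freq / frame.size                     # bar heights of the count histogram
model = np.array([poisson_pmf(int(k), lam_hat) for k in values])
max_gap = np.abs(empirical - model).max()         # largest deviation from the Poisson pmf

print(lam_hat, fano, max_gap)
```

For real recordings, `frame` would be replaced by the raw photon counts of a static sample.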

5.2.3 Filtering CLSM data

In a final experiment, we applied our filtering method to data obtained from confocal laser scanning microscopy recordings of giant D. discoideum cells. Given that the datasets typically capture the entire cell, we began by extracting an area of interest (AOI) focused exclusively on the cell’s interior to omit boundary effects [23]. The SPDE model (94) was used as the signal model, with parameters calibrated to ensure that $u$ takes concentration values between $0$ and $1$ with high probability, in good agreement with the observed data of actin concentrations. We assumed Poisson-distributed observation noise with an intensity of the form (90), adjusting the scaling factor $c$ to align the model’s photon counts with those observed in the data.

Figure 8(b) shows a data sample alongside the estimated state of the underlying actin dynamics. The experiments demonstrate that the filter effectively tracks wave-like actin movements, providing a satisfactory proof of concept across four different cell recordings.

While the initial results are promising, a significant challenge persists: the parameters must be manually selected, with no definitive method to ensure their accuracy beyond phenomenological validation. In future research, we aim to expand our theory and address this limitation by exploring parameter estimation techniques for SPDEs under point process noise, with a focus on potential applications in biophysics.

Acknowledgments

This research has been partially funded by the Deutsche Forschungsgemeinschaft (DFG), Project-ID 318763901, SFB1294, Project A01 "Statistics for Stochastic Partial Differential Equations" (JS and WS) and Project B02 "Inferring the dynamics underlying protrusion-driven cell motility" (CM-T).

References

Ahmed et al. [1997] N. Ahmed, M. Fuhrmann, and J. Zabczyk. On filtering equations in infinite dimensions. Journal of Functional Analysis, 143:180–204, 1997.
Annesley and Fisher [2009] S. J. Annesley and P. R. Fisher. Dictyostelium discoideum — a model for many reasons. Mol Cell Biochem, 329(1-2):73–91, 2009.
Bain and Crisan [2009] A. Bain and D. Crisan. Fundamentals of Stochastic Filtering. Springer, New York, 2009.
Brezis [2011] H. Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, New York, 2011.
Brémaud [1972] P. Brémaud. A Martingale Approach to Point Processes, PhD Thesis. University of California, 1972.
Brémaud [1981] P. Brémaud. Point Processes and Queues: Martingale Dynamics. Springer, New York, 1981.
Daley and Vere-Jones [2003] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Volume I: Elementary Theory and Methods. Springer, New York, 2003.
Daley and Vere-Jones [2008] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Volume II: General Theory and Structure. Springer, New York, 2008.
Dudley [1966] R. Dudley. Convergence of Baire measures. Stud. Math., 27:251–268, 1966.
Edelstein-Keshet [2005] L. Edelstein-Keshet. Mathematical Models in Biology: SIAM Classics in Applied Mathematics 46. Society for Industrial and Applied Mathematics, Philadelphia, 2005.
Evensen [1994] G. Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99(C5):10143–10162, 1994.
Flemming et al. [2020] S. Flemming, F. Font, S. Alonso, and C. Beta. How cortical waves drive fission of motile cells. Proceedings of the National Academy of Sciences, 117(12):6330–6338, 2020.
Florchinger [1999] P. Florchinger. Filtering equations in infinite dimensional spaces with counting observation. Proceedings of the 38th IEEE Conference on Decision and Control, 2:1895–1896, 1999.
Gerisch et al. [2013] G. Gerisch, M. Ecke, R. Neujahr, J. Prassler, A. Stengl, M. Hoffmann, U. S. Schwarz, and E. Neumann. Membrane and actin reorganization in electropulse-induced cell fusion. J Cell Sci, 126(9):2069–2078, 2013.
Gonçalves and Gamerman [2018] F. B. Gonçalves and D. Gamerman. Exact Bayesian inference in spatiotemporal Cox processes driven by multivariate Gaussian processes. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 80(1):157–175, 2018.
Klenke [2020] A. Klenke. Probability Theory. A Comprehensive Course. Springer, Cham, 2020.
Krull et al. [2023] A. Krull, H. Basevi, B. Salmon, A. Zeug, F. Müller, S. Tonks, L. Muppala, and A. Leonardis. Image denoising and the generative accumulation of photons. pre-print, 2023. arXiv:2307.06607.
Liptser and Shiryaev [2001] R. S. Liptser and A. N. Shiryaev. Statistics of Random Processes II. Applications. Springer, Berlin, Heidelberg, 2001.
Liu and Röckner [2015] W. Liu and M. Röckner. Stochastic Partial Differential Equations: An Introduction. Springer, Cham, 2015.
Molenaar [1970] W. Molenaar. Normal approximations to the Poisson distribution. Random Counts in Scientific Work, 2:237–254, 1970.
Murray [2003] J. D. Murray. Mathematical Biology II: Spatial Models and Biomedical Applications. Springer, New York, 2003.
Pardoux [1979] E. Pardoux. Stochastic partial differential equations and filtering of diffusion processes. Stochastics, 3:127–167, 1979.
Pasemann et al. [2021] G. Pasemann, S. Flemming, S. Alonso, C. Beta, and W. Stannat. Diffusivity estimation for activator-inhibitor models: Theory and application to intracellular dynamics of the actin cytoskeleton. Journal of Nonlinear Science, 31(59):1–34, 2021.
Pasemann et al. [2023] G. Pasemann, C. Beta, and W. Stannat. Stochastic reaction-diffusion systems in biophysics: Towards a toolbox for quantitative model evaluation. pre-print, 2023. arXiv:2307.06655.
Da Prato and Zabczyk [2014] G. Da Prato and J. Zabczyk. Stochastic Equations in Infinite Dimensions. Cambridge University Press, Cambridge, 2014.
Santitissadeekorn et al. [2020] N. Santitissadeekorn, D. J. Lloyd, M. B. Short, and S. Delahaies. Approximate filtering of conditional intensity process for Poisson count data: Application to urban crime. Computational Statistics & Data Analysis, 144:106850, 2020.
Snyder [1972] D. Snyder. Filtering and detection for doubly stochastic Poisson processes. IEEE Trans. Inform. Theory, IT-18:91–102, 1972.
Sun et al. [2013] W. Sun, Y. Zeng, and S. Zhang. Filtering with marked point process observations via Poisson chaos expansion. Appl Math Optim, 67:323–351, 2013.
Venugopal et al. [2016] M. Venugopal, R. M. Vasu, and D. Roy. An ensemble Kushner-Stratonovich-Poisson filter for recursive estimation in nonlinear dynamical systems. IEEE Transactions on Automatic Control, 61(3):823–828, 2016.
