Latent-KalmanNet: Learned Kalman Filtering For

Abstract—The Kalman filter (KF) is a widely-used algorithm for tracking dynamic systems that are captured by state space (SS) models. The need to fully describe a SS model limits its applicability under complex settings, e.g., when tracking based on visual data, and the processing of high-dimensional signals often induces notable latency. These challenges can be treated by mapping the measurements into latent features obeying some postulated closed-form SS model, and applying the KF in the latent space. However, the validity of this approximated SS model may constitute a limiting factor. In this work, we study tracking from high-dimensional measurements under complex settings using a hybrid model-based/data-driven approach. By gradually tackling the challenges in handling the observations model and the task, we develop Latent-KalmanNet, which implements tracking from high-dimensional measurements by leveraging data to jointly learn the KF along with the latent space mapping. Latent-KalmanNet combines a learned encoder with data-driven tracking in the latent space using the recently proposed KalmanNet, while identifying the ability of each of these trainable modules to assist its counterpart via providing a suitable prior (by KalmanNet) and by learning a latent representation that facilitates data-aided tracking (by the encoder). Our empirical results demonstrate that the proposed Latent-KalmanNet achieves improved accuracy and run-time performance over both model-based and data-driven techniques by learning a surrogate latent representation that most facilitates tracking, while operating with limited complexity and latency.

Parts of this work were accepted for presentation at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2023 as the paper [1]. I. Buchnik, T. Routtenberg, and N. Shlezinger are with the School of ECE, Ben-Gurion University of the Negev, Be'er Sheva, Israel (e-mail: [email protected]; {tirzar; nirshl}@bgu.ac.il). T. Routtenberg is also with the ECE Department, Princeton University, Princeton, NJ. D. Steger and G. Revach are with the D-ITET, ETH Zürich, Switzerland (e-mail: [email protected]; [email protected]). R. J. G. van Sloun is with the EE Dpt., Eindhoven University of Technology, The Netherlands (e-mail: [email protected]). This work is partially supported by the Israeli Ministry of National Infrastructure, Energy, and Water Resources. We thank Hans-Andrea Loeliger for the helpful discussions.

I. INTRODUCTION

Tracking the hidden state of dynamic systems is a fundamental problem in various fields, including signal processing, control, and finance. In many real-world applications, such as autonomous driving, smart city monitoring, and visual surveillance, tracking is based on noisy high-dimensional observations, e.g., visual data. The classic Kalman filter (KF) [2] algorithm and its variants [3, Ch. 10] have been the go-to approach for tracking, relying on the representation of the dynamics as a state space (SS) model that describes the state evolution and the sensing model. The KF is widely used due to its computational efficiency and optimality properties. However, the reliance of the KF and its variants on an accurate description of the underlying dynamics as a closed-form SS model with Gaussian noise restricts its applicability when tracking from complex high-dimensional data.

In particular, the KF assumes linear dynamics with Gaussian noise of a known distribution. Variations of the KF, such as the extended Kalman filter (EKF) [4] and the unscented Kalman filter [5], can cope with nonlinear Gaussian SS models, yet they require an accurate description of the nonlinearities, which is often unavailable when dealing with visual data, and their complexity grows when processing high-dimensional observations. Alternative tracking methods based on Bayesian filtering [6]-[8] do not assume Gaussian modeling, yet are often computationally complex. While for certain families of high-dimensional observations, such as graph signals, one can leverage structures in the data to notably reduce tracking complexity [9]-[11], such approaches do not naturally extend to other domains of high-dimensional data. Moreover, all of the aforementioned techniques are model-based, relying on full knowledge of the SS model, which is likely to be unavailable when tracking based on high-dimensional measurements such as visual data.

In recent years, the combination of large-scale datasets and advancements in deep learning has led to the development of several data-driven filtering methods; see the review in [12]. These methods have shown empirical success in processing visual data, and typically involve deep neural network (DNN) architectures, such as recurrent neural networks (RNNs) [13], attention mechanisms [14], and deep Markov models [15], for state tracking tasks. While these methods are based on architectures designed for generic time sequence processing, several DNN architectures were proposed specifically for tracking in SS models, being inspired by model-based tracking algorithms [16]-[24], resulting in, e.g., DNNs whose internal interconnection follows the flow of the EKF. Among these existing works, the systems of [21]-[24] were specifically designed to cope with high-dimensional measurements, with a leading basis architecture being the recurrent Kalman network (RKN) of [21]. However, those methods suffer from difficulty in training, sensitivity to initialization, and generalization problems. Moreover, they do not leverage domain knowledge regarding the state evolution, even when such is available, as is the case in various applications including, e.g., localization and navigation [25, Ch. 6].
While the above data-driven approaches do not use knowledge about the SS model, one can combine model-agnostic deep-learning tools with SS-aware processing. A candidate approach to do so in the context of high-dimensional data is to use a DNN decoder to capture the complex observation model [26], thus overcoming the need to analytically describe it, yet preserving the complexity associated with tracking using high-dimensional data. Alternatively, a widely adopted approach encodes the observations into a latent space via a DNN, i.e., using instead the inverse of the observations model. These latent features are then used to track the state with, e.g., a conventional KF. This approach assumes that the latent features obey a simple SS model, typically a known Gaussian one in the latent domain, as in [26]-[31]. However, the resulting latent SS model is often non-Gaussian, which can impact the tracking accuracy in the latent space.

In this work, we propose Latent-KalmanNet, which addresses the difficulties of tracking high-dimensional data by simultaneously learning to track along with the latent space representation. To achieve this, we utilize the recently proposed KalmanNet [32]-[34], which learns from data to perform Kalman filtering in partially known SS models as a form of model-based deep learning [35], [36]. KalmanNet relies on a (possibly approximated) description of the sensing function. However, this information is unavailable in the setting considered in this work of complicated high-dimensional data, and the solution complexity grows with the dimensions of the measurements. Therefore, in Latent-KalmanNet we combine KalmanNet with latent-space encoding, and propose a novel training method which jointly learns the latent representation and the filter operation. In Latent-KalmanNet, the latent transformation is assisted by its subsequent data-aided tracking method. The resulting latent representation is suitable for tracking while maintaining the interpretability and low complexity of the KF.

In particular, we first identify the main challenges associated with tracking from high-dimensional measurements: (1) the need to model stochasticity; (2) the operation with a possibly intractable measurements model; (3) the need to be applicable in real-time; and (4) the presence of possible mismatches in the state evolution model. Based on these challenges, we derive Latent-KalmanNet by gradually addressing each specific challenge, while accounting for settings where the state can be either partially observable or fully observable. The resulting Latent-KalmanNet combines two trainable components: an encoder that maps the observations into a latent representation, and KalmanNet, which tracks based on the latent features. Instead of designing these components separately, we exploit the ability of each module to facilitate the operation of its counterpart. Specifically, the tracking module is used to provide a prior for encoding, while the encoder generates a latent representation that is most suitable for tracking, and this desired behavior is learned from data using a dedicated alternating training mechanism. Our experimental study evaluates Latent-KalmanNet for tracking in challenging settings with high-dimensional visual data, identifying the benefits of each of the components incorporated in Latent-KalmanNet, while showing that the synergistic design of latent encoding and tracking yields notable performance improvements. Furthermore, we demonstrate that Latent-KalmanNet outperforms classic model-based nonlinear tracking algorithms as well as state-of-the-art deep architectures in terms of state estimation accuracy as well as inference speed.

The rest of the paper is organized as follows: Section II details the problem formulation and briefly describes the EKF and KalmanNet. Section III presents Latent-KalmanNet in a step-by-step manner, along with the proposed training method. Our numerical evaluation is provided in Section IV, while Section V concludes the paper.

Throughout the paper, we use boldface lower-case letters for vectors and boldface upper-case letters for matrices. The transpose, l2 norm, and gradient operator are denoted by {·}^T, ‖·‖, and ∇(·), respectively. Finally, R and Z are the sets of real and integer numbers, respectively.

II. SYSTEM MODEL AND PRELIMINARIES

In this section, we present the system model and relevant preliminaries needed to derive Latent-KalmanNet in Section III. We start by formulating the problem of tracking in partially known SS models with high-dimensional observations in Subsection II-A. Then, in Subsection II-B, we briefly recall the model-based EKF and KalmanNet of [32], and identify their shortcomings for the considered setting.

A. Problem Formulation

We consider a dynamic system characterized by a (possibly) nonlinear, continuous SS model in discrete-time t ∈ Z. Let x_t be the m × 1 state vector, which evolves by a nonlinear state evolution function f(·), and is driven by an additive zero-mean noise e_t. The n × 1 observation vectors y_t, t ∈ Z, are high-dimensional, and in particular n ≫ m; they can be, e.g., the vector representation of an image/tensor^1. The observed y_t is related to the state x_t via a complex and possibly unknown measurement function h(·) with additive zero-mean noise v_t. The resulting SS model is given by:

x_t = f(x_{t−1}) + e_t,  x_t ∈ R^m,  (1a)
y_t = h(x_t) + v_t,  y_t ∈ R^n.  (1b)

We consider a case where at least some of the state variables can be estimated from y_t, which is related to the notion of observability, typically used in the context of deterministic systems [25, Ch. 3]. In particular, we use the term fully observable to denote measurement models where y_t is affected by all variables in x_t, and all variables in the state x_t can be recovered from y_t (i.e., the mapping h(·) is injective). We use the term partially observable for models in which some of the entries of x_t cannot be recovered from y_t (though they may be dependent on {y_τ}_{τ≤t}). That is, we examine both the fully observable case and the partially observable setting, where in the latter a single y_t can be used to recover only a subset of p ≤ m variables in x_t, denoted as the p × 1 vector Px_t, with P being a p × m selection matrix.

^1 For mathematical simplicity, we formulate our high-dimensional observations in vector form, which also represents tensor data by stacking their elements in vector form. The size n considered is the total number of elements in the observation.
We henceforth focus on the partially observable setting, as it also includes the fully observable setting by writing P = I and p = m.

Our goal is to develop a filtering algorithm for real-time state estimation, i.e., for the recovery of x_t from {y_τ}_{τ≤t} for each time instance t [37]. This algorithm should work effectively in both fully and partially observable SS models, while we assume that one has knowledge of which state variables are observable, i.e., P is known. The performance of a given estimator (obtained by the filtering approach) x̂_t is measured using the mean-squared error (MSE), which is defined as E{‖x̂_t − x_t‖²}.

While various methods have been proposed for tracking in SS models, our setting is associated with several challenges:
C.1 The distribution of the noises, e_t and v_t in (1), is unknown and may be non-Gaussian as, e.g., stochasticity in visual data is often non-Gaussian.
C.2 The available state-evolution function, f(·), may be mismatched, e.g., obtained via a first-order linear approximation of complex physical dynamics, as is often the case in navigation and localization tasks [25, Ch. 6].
C.3 The observations are high-dimensional (n ≫ m), leading to high complexity and affecting real-time applicability.
C.4 The sensing function h(·) is unknown and possibly analytically intractable.

To cope with the various unknown characteristics of (1), we are given access to a labeled data set comprised of D trajectories of length T of paired observations and states,

D ≜ { {(x_t^(d), y_t^(d))}_{t=1}^{T} }_{d=1}^{D}.  (2)

Our proposed algorithm for tackling C.1-C.4 is detailed in Section III. Our design follows the model-based deep-learning methodology [35], [36], where deep-learning tools are used to augment and empower model-based algorithms rather than replace them. Our method builds upon the KalmanNet architecture of [32], which augments the classic EKF, as briefly recalled in the next subsection.

B. EKF and KalmanNet

Various model-based filters have been developed for tracking in SS models (see, e.g., [3, Ch. 10]). One of the most common algorithms, which is suitable when the noises are Gaussian and the SS model is fully known, i.e., in the absence of Challenges C.1-C.4, is the EKF [38]. The EKF follows the operation of the KF, combining prediction based on the previous estimate with an update based on the current observation, while extending it to nonlinear SS models.

In particular, the EKF first predicts the next state and observation based on x̂_{t−1} via

x̂_{t|t−1} = f(x̂_{t−1});  ŷ_{t|t−1} = h(x̂_{t|t−1}).  (3)

Then, the initial prediction is updated with a matrix K_t, known as the Kalman gain, which dictates the balance between relying on the state evolution function f(·) through (3) and the current observation y_t. The estimate is computed as

x̂_t = K_t · ∆y_t + x̂_{t|t−1};  ∆y_t ≜ y_t − ŷ_{t|t−1}.  (4)

The Kalman gain K_t is calculated via

K_t = Σ̂_{t|t−1} · Ĥ_t^T · Ŝ_{t|t−1}^{−1},  (5)

where Σ̂_{t|t−1} and Ŝ_{t|t−1} are the covariance matrices of the state prediction x̂_{t|t−1} and the observation prediction ŷ_{t|t−1}, respectively. These matrices are calculated via

Σ̂_{t|t−1} = F̂_t · Σ̂_{t−1} · F̂_t^T + Q,  (6)
Ŝ_{t|t−1} = Ĥ_t · Σ̂_{t|t−1} · Ĥ_t^T + R,  (7)

where Q and R are the known covariance matrices of e_t and v_t, respectively. The matrices F̂_t and Ĥ_t are instantaneous linearizations of f(·) and h(·), respectively, obtained using their Jacobian matrices evaluated at x̂_{t−1} and x̂_{t|t−1} (see [3, Ch. 10]), i.e.,

F̂_t = ∇_x f(x̂_{t−1});  Ĥ_t = ∇_x h(x̂_{t|t−1}).  (8)
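To make the recursion concrete, the following is a minimal NumPy sketch of a single EKF step implementing (3)-(8); the finite-difference Jacobian and the function names are illustrative choices rather than the paper's implementation.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Finite-difference Jacobian of func evaluated at x."""
    fx = func(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (func(x + dx) - fx) / eps
    return J

def ekf_step(x_prev, Sigma_prev, y, f, h, Q, R):
    """One EKF recursion implementing (3)-(8)."""
    # Prediction (3)
    x_pred = f(x_prev)
    y_pred = h(x_pred)
    # Linearization (8)
    F = numerical_jacobian(f, x_prev)
    H = numerical_jacobian(h, x_pred)
    # Prediction covariances (6)-(7)
    Sigma_pred = F @ Sigma_prev @ F.T + Q
    S_pred = H @ Sigma_pred @ H.T + R
    # Kalman gain (5) and update (4)
    K = Sigma_pred @ H.T @ np.linalg.inv(S_pred)
    x_est = x_pred + K @ (y - y_pred)
    Sigma_est = (np.eye(x_prev.size) - K @ H) @ Sigma_pred
    return x_est, Sigma_est
```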
Challenges C.1-C.4 notably limit the applicability of the EKF for the setup detailed in the previous subsection. KalmanNet, proposed in [32], is designed to leverage data as in (2) to tackle Challenges C.1 and C.2 (but not C.3-C.4). In particular, KalmanNet builds on the insight that the missing and mismatched domain knowledge of the noise statistics and the linear approximations are encapsulated in the computation of the Kalman gain K_t in (5). Consequently, it augments the EKF with a deep-learning component by replacing the computation of the Kalman gain with an RNN, while preserving the filtering operation via (3)-(4). By doing so, KalmanNet converts the EKF into a trainable discriminative model [39], where the data D is used to directly learn the Kalman gain, bypassing the need to enforce any model over the noise statistics and handling domain knowledge mismatches as in Challenges C.1-C.2. Moreover, KalmanNet preserves the interpretability of the KF, while being operable in partially known SS models; it therefore allows one to deduce uncertainty, as shown in [33], and is amenable to training in an unsupervised manner [34].

Despite the ability of the KalmanNet architecture of [32] to learn from data to cope with Challenges C.1 and C.2, it is not suitable to be applied in our setting under C.3-C.4. In particular, the high dimension of the observations notably increases the complexity of its Kalman gain RNN and the resulting filter. Moreover, KalmanNet requires knowledge of h(·), which is not analytically available in the current setting. This motivates the derivation of the proposed Latent-KalmanNet in the sequel.

III. LATENT-KALMANNET

In this section, we present the proposed Latent-KalmanNet algorithm, which tackles Challenges C.1-C.4. Our derivation of Latent-KalmanNet is presented in a step-by-step manner, where each step tackles an additional challenging aspect, while building upon its preceding stages.

As noted in Subsection II-B, the main added challenges considered here as compared to the setting for which KalmanNet is formulated in [32] are associated with the observation model in (1b), i.e., Challenges C.3 and C.4. Therefore, our first step, detailed in Subsection III-A, considers an instantaneous
Fig. 2: Encoder with prior and EKF in cascade block diagram. Fig. 3: Latent-KalmanNet block diagram.

applying an EKF in cascade with the pretrained DNN encoder of Step 2. The rationale here is to assume that the DNN is properly trained such that its estimate approaches the minimal MSE estimator of Px_t from y_t. In such cases, the DNN output can be approximated as obeying

z_t = g_ψ^e(y_t) ≈ Px_t + ṽ_t,  (11)

where ṽ_t is zero-mean and mutually independent of x_t. If ṽ_t is also Gaussian and temporally independent, then (1a) along with (11) represent a (possibly nonlinear) Gaussian SS model, from which x_t can be tracked using the EKF. The second-order moment of ṽ_t, which is necessary for the Kalman gain computation (5), can be estimated from the validation error R of the DNN encoder. The measurement matrix Ĥ_t in this setting is set to Ĥ_t = P. The system is illustrated in Fig. 2.

To apply the EKF in latent space while treating (11) as the observation model, one should have knowledge of the distribution of the state noise e_t, i.e., the matrix Q. This can be estimated from the data D. For instance, one can tune the dynamic noise variance to optimize performance by, e.g., assuming that Q is a scaled identity matrix and employing a grid search to identify the variance parameter which yields the best performance on the available data. Alternatively, one can incorporate parametric estimation mechanisms, e.g., expectation maximization iterations, into the EKF [42].

The proposed cascaded operation allows utilizing a DNN to cope with the challenging observations model while systematically incorporating the state evolution model. This is achieved by separating the instantaneous estimation from the tracking task, in which the temporal correlation is exploited. Jointly treating the instantaneous estimation task along with its subsequent tracking allows for improving the overall performance, as shown in Section IV. Nonetheless, the fact that an EKF is utilized implies that the SS model described via (1a) and (11) is inherently assumed to be fully known and Gaussian. This is not necessarily the case here, not only due to Challenges C.1 and C.2, but also since there is no guarantee that the DNN estimation error ṽ_t is indeed Gaussian. This motivates our final step, which formulates Latent-KalmanNet.
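A minimal sketch of this Step 3 cascade is given below, reusing the ekf_step helper from the sketch in Subsection II-B: a pretrained encoder output is treated as a noisy observation of Px_t per (11), R is estimated from validation residuals, and the EKF is run in the latent space with Ĥ_t = P. The encoder interface and the covariance-estimation routine are assumptions for illustration.

```python
import numpy as np

def estimate_R_from_validation(encoder, val_images, val_states, P):
    """Estimate the latent noise covariance R from encoder residuals on validation data."""
    residuals = np.stack([encoder(y) - P @ x for y, x in zip(val_images, val_states)])
    return np.cov(residuals.T)

def track_in_latent_space(encoder, images, x0, Sigma0, f, P, Q, R):
    """Step 3 cascade: encode each observation, then run the EKF with h(x) = Px."""
    x_est, Sigma = x0, Sigma0
    estimates = []
    for y in images:
        z = encoder(y)                                   # latent observation, as in (11)
        x_est, Sigma = ekf_step(x_est, Sigma, z, f, lambda x: P @ x, Q, R)
        estimates.append(x_est)
    return np.stack(estimates)
```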
D. Step 4 - Latent-KalmanNet

The system detailed in Step 3 builds upon the insight that the relationship between the latent z_t and the state x_t obeys an (approximated) SS model given by (1a) and (11). We conclude our design by accounting for Challenges C.1 and C.2, and the fact that the error term in (11) is likely to obey an unknown distribution. This motivates using KalmanNet instead of the EKF, which is particularly suitable for filtering in such settings, and bypasses the need to impose a specific distribution on the noise terms in the SS model. The resulting algorithm, encompassing the architecture and its training procedure detailed next, is coined Latent-KalmanNet.

Architecture: To formulate the system operation, we let θ be the internal RNN parameters of KalmanNet, which implements a mapping g_θ^f : R^p → R^m with state-evolution function f(·) and observation function given by h(x) = Px. As detailed in Subsection II-B, KalmanNet uses the previous estimate x̂_{t−1} to predict the next state as x̂_{t|t−1} = f(x̂_{t−1}). This prediction is then used as the prior provided to the encoder of Step 2, producing the latent z_t via (10). The estimate of x_t is written as

x̂_t = g_θ^f(z_t).  (12)

The resulting architecture is illustrated in Fig. 3, where the two modules, the encoder and KalmanNet, aid one another by providing a low-dimensional latent representation (by the encoder), and a prior for obtaining the latent (by KalmanNet). Once trained, the estimation procedure on each time step, during inference, is summarized as Algorithm 1.
Algorithm 1: Latent-KalmanNet Inference
Init: Trained encoder ψ; Trained KalmanNet θ
Input: Observations y_t; previous estimate x̂_{t−1}
1. Predict x̂_{t|t−1} = f(x̂_{t−1});
2. Predict ẑ_{t|t−1} = P·x̂_{t|t−1};
3. Encode observations via z_t = g_ψ^e(y_t, x̂_{t|t−1});
4. Apply the RNN θ to compute the Kalman gain K_t;
5. Estimate via x̂_t = g_θ^f(z_t) = K_t · (z_t − ẑ_{t|t−1}) + x̂_{t|t−1};
6. return x̂_t
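The per-step inference of Algorithm 1 translates directly into code; in the sketch below, encoder and kalman_gain_rnn stand in for the trained g_ψ^e and the Kalman-gain RNN of KalmanNet, and their exact signatures (in particular, feeding the latent innovation to the RNN) are assumptions.

```python
import numpy as np

def latent_kalmannet_step(y_t, x_prev, rnn_state, f, P, encoder, kalman_gain_rnn):
    """One inference step of Latent-KalmanNet, following Algorithm 1."""
    x_pred = f(x_prev)                          # line 1: state prediction
    z_pred = P @ x_pred                         # line 2: latent prediction
    z_t = encoder(y_t, x_pred)                  # line 3: prior-aided encoding
    K_t, rnn_state = kalman_gain_rnn(z_t - z_pred, rnn_state)  # line 4: learned gain
    x_est = x_pred + K_t @ (z_t - z_pred)       # line 5: update
    return x_est, rnn_state
```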
Training: The proposed architecture is a concatenation of two modules: a DNN estimator g_ψ^e(·) and KalmanNet g_θ^f(·). Both are differentiable [32], allowing the overall architecture, parameterized by (θ, ψ), to be trained end-to-end as a discriminative model [39]. We use the l2 regularized MSE loss, which for a given data set D is evaluated as

L_D(θ, ψ) = (1 / (|D|·T)) Σ_{t=1}^{T} Σ_{d=1}^{|D|} L_t^(d)(θ, ψ) + λ1·‖θ‖² + λ2·‖ψ‖²,  (13)

where λ1, λ2 > 0 are regularization coefficients. The loss term for each time step in a given trajectory in (13) is computed as

L_t^(d)(θ, ψ) = ‖x̂_t^(d) − x_t^(d)‖²,  (14)

where

x̂_t^(d) = g_θ^f( g_ψ^e( y_t^(d), f(x̂_{t−1}^(d)) ) ) = x̂_{t|t−1}^(d) + K_t · (z_t − ẑ_{t|t−1}).  (15)

The loss measure in (13) builds upon the ability to backpropagate the loss to the computation of the Kalman gain K_t [43]. In particular, one can obtain the loss gradient of a given trajectory d at a given time step t with respect to the Kalman gain from the output x̂_t^(d) of Latent-KalmanNet, since

∂L_t^(d)(θ, ψ) / ∂K_t = ∂‖K_t·∆z_t − ∆x_t‖² / ∂K_t = 2(K_t·∆z_t − ∆x_t) · ∆z_t^T,  (16)

where ∆x_t = x_t^(d) − x̂_{t|t−1}^(d). The gradient computation in (16) indicates that one can learn the computation of the Kalman gain by training Latent-KalmanNet end-to-end. This allows training the overall filtering system, including both the latent encoding and its tracking into the state, without having to externally provide ground truth values of the Kalman gain or of the latent features for training purposes. The fact that the MSE loss in (14) is computed based on the output of KalmanNet rather than that of g_ψ^e(·) implies that the latter will not necessarily learn to estimate the observable state variables, as when training via (9). Instead, it is trained to encode the high-dimensional observations y_t (along with the prior x̂_{t|t−1}) into latent features from which KalmanNet can most reliably recover the state. For this reason, we coin the algorithm Latent-KalmanNet.

Latent-KalmanNet enables joint learning of (θ, ψ) via gradient-based optimization, e.g., SGD and its variants. However, carrying this out in practice can be challenging and often unstable, as the learning procedure needs to simultaneously tune the latent representation and the corresponding Kalman gain computation. Nonetheless, the fact that the architecture is decomposable into distinct trainable building blocks with concrete tasks facilitates training via alternating optimization. This is achieved by iteratively optimizing the filter θ while freezing ψ, followed by training of the latent representation ψ which best fits the filter with fixed weights θ based on (13). Additionally, one can initially train the encoder module separately to estimate the observable state variables, via the regularized l2 norm loss (9). This form of modular training [44] constitutes a warm start which is empirically shown to facilitate learning. The resulting procedure is summarized as Algorithm 2.

Algorithm 2: Latent-KalmanNet Alternating Training
Init: Fix learning rates µ1, µ2 > 0 and epochs i_max
Input: Training set D
Warm start:
1. for i = 0, 1, . . . , i_max − 1 do
2.   Randomly divide D into Q batches {D_q}_{q=1}^{Q};
3.   for q = 1, . . . , Q do
4.     Compute batch loss L̃_{D_q}(ψ) by (9);
5.     Update ψ ← ψ − µ1·∇_ψ L̃_{D_q}(ψ);
Alternating minimization:
6. for i = 0, 1, . . . , i_max − 1 do
7.   Randomly divide D into Q batches {D_q}_{q=1}^{Q};
8.   for q = 1, . . . , Q do
9.     Compute batch loss L_{D_q}(θ, ψ) by (13);
10.    Update θ ← θ − µ2·∇_θ L_{D_q}(θ, ψ);
11.  for q = 1, . . . , Q do
12.    Compute batch loss L_{D_q}(θ, ψ) by (13);
13.    Update ψ ← ψ − µ1·∇_ψ L_{D_q}(θ, ψ);
14. return (θ, ψ)
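The alternating minimization of Algorithm 2 maps naturally onto standard deep-learning tooling. The following PyTorch sketch assumes a model(traj_y, x0) call that unrolls Latent-KalmanNet over a batch of trajectories and returns the state estimates; the SGD steps and weight decay realize the gradient updates and the l2 terms of (13), while the encoder warm start (lines 1-5 of Algorithm 2) is assumed to have been performed beforehand.

```python
import torch

def alternating_training(model, theta_params, psi_params, loader, epochs, mu1, mu2, lam1, lam2):
    """Alternating minimization of Algorithm 2 (after the encoder warm start)."""
    opt_theta = torch.optim.SGD(theta_params, lr=mu2, weight_decay=lam1)  # KalmanNet weights
    opt_psi = torch.optim.SGD(psi_params, lr=mu1, weight_decay=lam2)      # encoder weights
    mse = torch.nn.MSELoss()
    for _ in range(epochs):
        for opt in (opt_theta, opt_psi):          # theta pass (lines 8-10), then psi pass (lines 11-13)
            for traj_y, traj_x in loader:         # mini-batches of trajectories from D
                x_hat = model(traj_y, traj_x[:, 0])   # unrolled filter estimates
                loss = mse(x_hat, traj_x)             # MSE term of (13)-(14)
                opt.zero_grad()
                loss.backward()
                opt.step()                            # weight decay realizes the l2 terms of (13)
    return model
```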
E. Discussion

The proposed Latent-KalmanNet is designed to tackle the challenges of tracking from complex high-dimensional observations. It leverages data to enable reliable tracking, overcoming the missing knowledge of the sensing function and the noise statistics. Latent-KalmanNet is derived in gradual steps obtained from pinpointing the specific challenges associated with the filtering problem detailed in Subsection II-A. In particular, the usage of a DNN trained in a supervised manner for coping with the complex observations model in Step 1 is a straightforward approach for instantaneous estimation. Its cascading with an EKF is a natural extension for incorporating temporal correlation, and a similar approach of applying an EKF to data-driven extracted features was also proposed in previous works, e.g., [26], [27]. However, the usage of the evolution model f(·) in Step 2 for improving the instantaneous estimate via temporal correlation, the replacement of the EKF with the trainable KalmanNet in Step 4, and the formulation of a suitable training procedure which encourages both modules to facilitate their counterpart's operation in tracking, are novel aspects of our design. These components are particularly tailored to cope with the challenging partially known SS model in (1), without enforcing a model on the noise statistics and the observations function, and while being geared towards real-time applications with low-latency inference demands.

Compared with the preliminary findings of this research reported in [1], the Latent-KalmanNet algorithm presented here is not restricted to using instantaneous estimators for latent feature extraction, and can in fact learn to contribute to the latent
Fig. 6: Pendulum: State estimation of the angle variable for a single trajectory realization. (a) Trajectory of 400 time instances; (b) zoom in on 50 time instances.

φ_t. The state evolution model of the pendulum is defined by mechanical system laws, making it highly nonlinear in nature, as given in the following equation:

x_t = [[1, ∆t], [0, 1]] · x_{t−1} − (g/ℓ) · [∆t²/2, ∆t]^T · sin(φ_{t−1}) + e_t.  (17)

In (17), ∆t denotes the sampling interval, dictating the time difference between consecutive observations, and e_t is an i.i.d. zero-mean Gaussian noise with covariance Q = q²·I, where q² = 0.1. The gravitational acceleration is set to a constant value of g = 9.81 m/sec², and the length of the string is represented by ℓ. Fig. 4 illustrates the physical pendulum setup.
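For concreteness, a short sketch of generating state trajectories according to (17) is given below; q² = 0.1 and g = 9.81 m/s² follow the text, whereas the sampling interval, string length, and initial state are illustrative placeholders.

```python
import numpy as np

def simulate_pendulum(T=200, dt=0.05, q2=0.1, g=9.81, ell=1.0, rng=np.random.default_rng(0)):
    """Generate one state trajectory x_t = [angle, angular rate] following (17).
    dt, ell, and the initial state are illustrative; the text specifies q^2 = 0.1, g = 9.81."""
    F = np.array([[1.0, dt], [0.0, 1.0]])
    b = (g / ell) * np.array([0.5 * dt ** 2, dt])
    x = np.array([np.pi / 2, 0.0])          # assumed initial angle and angular rate
    traj = []
    for _ in range(T):
        e = rng.normal(scale=np.sqrt(q2), size=2)
        x = F @ x - b * np.sin(x[0]) + e
        traj.append(x)
    return np.stack(traj)
```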
The observations y_t are 28×28 gray-scale images generated from the sampled trajectories of the pendulum. The images capture the pendulum's dynamic movements as if they were taken by a camera set in front of the system, corrupted by i.i.d. Gaussian observation noise v_t with covariance R = r²·I, where r² ∈ {0.001, . . . , 0.25}. Fig. 5 shows several representative visual observations of a given trajectory, with different added noise variances r². As only the angle can be recovered from a single image, this setting represents a partially observable SS model (see Subsection II-A) with P = [1, 0]. We use this model to generate D = 1,000 trajectories of length T = 200, which comprise the data set

an estimate of the observable state variable φ_t.
• Encoder + Prior (Step 2): We modify the encoder architecture, detailed in Table I, by incorporating a prior,

TABLE I: Encoder Architecture

Layer      | Filter Size | Stride | Channels | Output size
Input      | -           | -      | -        | 1x28x28
conv2D     | 3x3         | 2      | 8        | 8x14x14
ReLU       | -           | -      | -        | 8x14x14
Batch Norm | -           | -      | 8        | 8x14x14
conv2D     | 3x3         | 2      | 16       | 16x7x7
ReLU       | -           | -      | -        | 16x7x7
Batch Norm | -           | -      | 16       | 16x7x7
conv2D     | 3x3         | 2      | 32       | 32x4x4
ReLU       | -           | -      | -        | 32x4x4
Batch Norm | -           | -      | 32       | 32x4x4
Flatten    | -           | -      | -        | 512
FC         | -           | -      | 32       | 32
ReLU       | -           | -      | -        | 32
FC         | -           | -      | p        | p

Fig. 7: Pendulum: Design steps contribution - MSE vs. different Gaussian noise variance added to the images, 1/r².
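A PyTorch rendering of the encoder of Table I is sketched below. It follows the listed layers (three strided convolution/ReLU/batch-norm stages, a flatten, and two fully connected layers producing the p observable variables, with p = 1 for the pendulum angle); the padding value is an assumption chosen so that the intermediate sizes match the table, and the prior-aided variant of Step 2 is not shown.

```python
import torch
import torch.nn as nn

class PendulumEncoder(nn.Module):
    """Encoder of Table I: three strided conv blocks, then two FC layers producing p outputs."""
    def __init__(self, p=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),   # 1x28x28 -> 8x14x14
            nn.ReLU(),
            nn.BatchNorm2d(8),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # -> 16x7x7
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), # -> 32x4x4
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Flatten(),                                          # -> 512
        )
        self.head = nn.Sequential(nn.Linear(512, 32), nn.ReLU(), nn.Linear(32, p))

    def forward(self, y):
        return self.head(self.features(y))
```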
Fig. 10: Lorenz Attractor: Performance with full domain knowledge. MSE vs. observations S&P noise level. (a) Same trajectories length: T_train = T_test = 200; (b) different trajectories length: T_train = 200, T_test = 2000.

we use

h(c; x) = 10 · exp( −(1/(2x_3)) · ‖c − [x_1, x_2]^T‖² ).  (23)

The observations are corrupted by Salt and Pepper (S&P) noise, modeled as an i.i.d. scaled Bernoulli vector with probability p_r. This type of noise is common in digital images and can be caused by sharp and sudden disturbances in the image signal when transmitting images over noisy digital channels. Representative visual observations of a given trajectory are depicted in Fig. 9. All considered tracking algorithms have access to the same dataset (2). Unless otherwise stated, the data was generated from the Lorenz attractor SS model with Taylor order of J = 5, sampling interval of ∆t = 0.02, and a trajectory length of T = 200.
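A sketch of the S&P corruption applied to the image observations is given below; the corrupted-pixel intensities are illustrative assumptions (the text only specifies the corruption probability p_r), and the helper reproduces the log(1/p_r) axis of Figs. 10-11 under the assumption of a base-10 logarithm, consistent with p_r = 0.5 mapping to 0.3.

```python
import numpy as np

def add_salt_and_pepper(image, pr, rng=np.random.default_rng(0), low=0.0, high=1.0):
    """Corrupt each pixel independently with probability pr, setting it to the
    minimal or maximal intensity (an i.i.d. scaled Bernoulli corruption)."""
    mask = rng.random(image.shape) < pr
    values = rng.choice([low, high], size=image.shape)
    return np.where(mask, values, image)

def noise_axis(pr):
    """x-axis value used in the figures: log10(1/pr), e.g., pr = 0.001 -> 3."""
    return np.log10(1.0 / pr)
```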
TABLE II: Lorenz Attractor: Numeric MSE values for the setting reported in Fig. 10a, including standard deviation in the MSE.

Noise level −log(p_r) | 0.3        | 1           | 2           | 3
Encoder               | 5.8 ±1.1   | −0.5 ±1.3   | −3.7 ±0.85  | −5.6 ±0.9
Encoder+Prior         | 2.58 ±0.5  | −2.67 ±0.58 | −6.1 ±0.48  | −6.84 ±0.52
Encoder+Prior+EKF     | 0.51 ±0.35 | −3 ±0.41    | −6.31 ±0.38 | −7.16 ±0.32
RKN                   | −1 ±2.1    | −4.2 ±1.5   | −6.9 ±1.3   | −7.8 ±1.1
Latent-KalmanNet      | −1.91 ±0.06| −4.92 ±0.1  | −7.2 ±0.07  | −7.94 ±0.12

Fig. 11: Lorenz attractor: Performance with partial domain knowledge. MSE vs. observations S&P noise level. (a) Mismatch due to state-evolution Taylor expansion; (b) mismatch due to coarse sampling.

3) Results: To assess Latent-KalmanNet in terms of tracking performance, latency, and robustness, we evaluate the MSE as well as the latency and complexity. As the S&P noise is characterized by the probability p_r, when evaluating how performance scales with the noise level we report the MSE values versus log(1/p_r) for ease of visualization, where p_r ∈ {0.5, . . . , 0.001} is mapped into log(1/p_r) ∈ {0.3, . . . , 3}. We consider both the case of full information, where the SS model parameters (e.g., the state evolution function f(·)) are the same as those used for generating the trajectories, and the case of partial information, where this domain knowledge is mismatched.

Full Information: Here, we compare Latent-KalmanNet to the benchmark algorithms, where all algorithms have access to the state-evolution function f(·) used during data generation.

In our first experiment, the trajectory length presented during training was the same as in the test set, T_train = T_test = 200. The resulting MSEs, reported in Fig. 10a, demonstrate that the proposed Latent-KalmanNet achieves the lowest MSE for all considered noise levels. The improvement due to adding the EKF on top of the pre-trained encoder, tracking in the newly learned latent space, is much less notable here compared with the improvement in the pendulum setup noted in Fig. 7. This follows since the S&P noise yields a latent representation in which the distribution of the distortion cannot be faithfully approximated as being Gaussian, while the EKF, as opposed to KalmanNet, is designed for Gaussian SS models. The purely data-driven RKN improves upon the model-based EKF, and is only slightly outperformed by the proposed Latent-KalmanNet.

The superiority of Latent-KalmanNet is also evident in Table II, which reports the MSEs along with their standard deviations, representing the confidence intervals of the estimators. It is observed that the improved estimates of Latent-KalmanNet are also consistently achieved more confidently, i.e., with smaller standard deviation, compared with the competitor RKN. This behavior is also illustrated when observing a single filtered trajectory in Fig. 8, where the original state being tracked is the one depicted on the left side, and the different models' predictions are on the right.

Next, we examine the generalization of the filters to different trajectory lengths. This is done by training the systems with trajectories of length T_train = 200 while testing with notably longer trajectories of length T_test = 2000. Fig. 10b demonstrates how the purely data-driven RKN struggles to generalize, achieving performance that is similar to the stand-alone encoder. However, our Latent-KalmanNet successfully generalizes to a much longer trajectory, as it learns how to track based on data, latent features, and domain knowledge, allowing it to cope with the SS model and not overfit to the trajectory length.

Partial Information: To evaluate the performance of Latent-KalmanNet under partial model information, we consider two sources of model mismatch in the Lorenz attractor setup. First, we examine state-evolution mismatch due to the use of a Taylor series approximation of insufficient order. In this study, both Latent-KalmanNet and the benchmark algorithms (Encoder + Prior and Encoder + Prior + EKF) use a crude approximation of the evolution dynamics obtained by computing (21) with J = 2, while the data was generated with an order J = 5 Taylor series expansion. We set the trajectory length to T = 200 (both T_train and T_test), and ∆t = 0.02. The results, depicted in Fig. 11a, demonstrate that applying a model-based EKF achieves performance which coincides with that of the Encoder + Prior. This stems from the fact that the mismatched model resulted in the EKF being unable to incorporate the state evolution to improve tracking, and the grid search for identifying the most suitable noise variance led to a Kalman gain computation such that the estimation relies almost solely on the instantaneous observation. More interestingly, Latent-KalmanNet with partial knowledge (of J = 2) learns to overcome this model mismatch and manages to come within a small gap of the performance of Latent-KalmanNet with full knowledge (of J = 5), outperforming its benchmark counterparts operating with the same level of partial information and the data-driven RKN. These findings suggest that Latent-KalmanNet is robust and effective, even when operating under partial model information of the system dynamics.
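The mismatch studied above is governed by the Taylor order J of the state-evolution approximation in (21); since (21) itself appears on a page not included here, the sketch below assumes the standard J-th order Taylor-series discretization of the Lorenz dynamics commonly used in the KalmanNet literature, which allows instantiating the mismatched (J = 2) and data-generating (J = 5) evolution functions of this experiment.

```python
import numpy as np
from math import factorial

def lorenz_evolution(x, dt=0.02, J=5):
    """J-th order Taylor-series approximation of the Lorenz state evolution,
    f(x) = (sum_{j=0}^{J} (A(x) dt)^j / j!) x, with the standard Lorenz coefficients.
    This is a sketch of the assumed construction behind (21), not the paper's code."""
    A = np.array([[-10.0, 10.0, 0.0],
                  [28.0, -1.0, -x[0]],
                  [0.0, x[0], -8.0 / 3.0]])
    F = np.eye(3)
    term = np.eye(3)
    for j in range(1, J + 1):
        term = term @ (A * dt)
        F = F + term / factorial(j)
    return F @ x

# Mismatched (J = 2) vs. data-generating (J = 5) evolution functions
f_mismatched = lambda x: lorenz_evolution(x, J=2)
f_true = lambda x: lorenz_evolution(x, J=5)
```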
Next, we evaluate the performance of Latent-KalmanNet in the presence of sampling mismatch. We generate data from the Lorenz attractor SS model using a dense sampling rate ∆t = 0.001, and then sub-sample the corresponding observations by a ratio of 1/20 to obtain a decimated process with sample spacing ∆t = 0.02. This results in a mismatch between the SS model and the discrete-time sequence, as the nonlinearity of the SS model results in a difference in distribution between the decimated data and data generated directly with sampling interval ∆t = 0.02. Such scenarios correspond to the practical setting of mismatches due to the processing of continuous-time signals using discrete-time approximations. For this setting we use the identity mapping for the sensing function h(·) and set T = 200 (both T_train and T_test). The results, shown in Fig. 11b, demonstrate that Latent-KalmanNet outperforms both model-based tracking and the data-driven RKN, as its combination of learning capabilities, along with the available knowledge of the model dynamics, allows it to overcome the mismatch induced by representing a continuous-time SS model in discrete time. As in the case with mismatched state evolution, we again observe that the model mismatches result in the EKF being unable to improve performance, achieving the same performance as that of instantaneous detection using an Encoder + Prior.
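A sketch of how the decimated data could be produced is given below, reusing the lorenz_evolution helper from the previous sketch: trajectories are simulated with the fine step ∆t = 0.001 and every 20th sample is kept, yielding samples spaced ∆t = 0.02 apart; the noise injection per fine step is an illustrative assumption.

```python
import numpy as np

def generate_decimated_trajectory(x0, T=200, factor=20, dt_fine=0.001, q2=0.01,
                                  rng=np.random.default_rng(0)):
    """Simulate the Lorenz state at a fine rate and keep every `factor`-th sample.
    q2 and the per-step noise injection are assumptions for illustration."""
    x = np.array(x0, dtype=float)
    kept = []
    for t in range(T * factor):
        x = lorenz_evolution(x, dt=dt_fine, J=5) + rng.normal(scale=np.sqrt(q2), size=3)
        if (t + 1) % factor == 0:
            kept.append(x.copy())
    return np.stack(kept)  # samples spaced dt = 0.02 apart
```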
Complexity and Latency: We conclude our numerical study by demonstrating that the performance benefits of Latent-KalmanNet do not come at the cost of increased computational complexity and latency, as is often the case when using deep models. In fact, we show that it can achieve faster inference compared with both the model-based EKF in latent space and the data-driven RKN. To that aim, we provide an analysis of the average inference time of these tracking algorithms computed over a test set comprised of 100 trajectories, with T_test = 200 time steps each. The inference time is computed on the same platform for all methods, which is an 11th Gen Lenovo laptop with an Intel Core i7 2.80 GHz processor, 16 GB of RAM, and the Windows 11 operating system. We also report the number of floating point operations required by each method, where the number of operations in applying a DNN is given by its number of trainable parameters.

The resulting complexity and latency measures of Latent-KalmanNet compared with Encoder + Prior + EKF and RKN are reported in Table III. These results reveal that Latent-KalmanNet achieves the fastest inference time, being notably faster not only than the purely DNN-based RKN, but also than the model-based EKF. The latter stems from the fact that the EKF needs to compute Jacobians and matrix inversions at each time instance to produce its Kalman gain, which turns out to be slower compared with applying the compact RNN used by KalmanNet for the same purpose, while also being amenable to parallelization and acceleration. The compactness of the internal RNN of Latent-KalmanNet results in it having a similar computational complexity compared with applying the EKF in latent space. These results, combined with the performance and robustness gains of Latent-KalmanNet noted in the previous studies, showcase the potential of Latent-KalmanNet in leveraging both data and domain knowledge for tracking with high-dimensional data while coping with the challenging C.1-C.4.

TABLE III: Latency and complexity comparison

                | Encoder+EKF | RKN   | Latent-KalmanNet
Complexity (FP) | 15573 + 243 | 42426 | 15573 + 2712
Latency (sec)   | 0.36        | 0.39  | 0.09

V. CONCLUSIONS

In this work, we proposed a method for tracking based on complex observations with unknown noise statistics. Our proposed Latent-KalmanNet combines DNN-aided encoding with learned Kalman filtering based on KalmanNet in the latent space, and designs these modules to mutually benefit one another in a synergistic manner. The training scheme of Latent-KalmanNet exploits its interpretable architecture to formulate alternating training between the two learnable components in order to learn a surrogate latent representation which most facilitates tracking. Our empirical evaluations demonstrate that the proposed Latent-KalmanNet successfully tracks from high-dimensional observations and generalizes to trajectories of different lengths. It also succeeds in working with partial domain knowledge of the state evolution function or sampling mismatches, and is shown to infer with low latency.
REFERENCES

[1] I. Buchnik, D. Steger, G. Revach, R. J. G. van Sloun, T. Routtenberg, and N. Shlezinger, "Learned Kalman filtering in latent space with high-dimensional data," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[2] R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82, no. 1, pp. 35-45, 1960.
[3] J. Durbin and S. J. Koopman, Time Series Analysis by State Space Methods. OUP Oxford, 2012, vol. 38.
[4] R. E. Larson, R. M. Dressler, and R. S. Ratner, "Application of the extended Kalman filter to ballistic trajectory estimation," Stanford Research Inst., Menlo Park, CA, Tech. Rep., 1967.
[5] E. A. Wan and R. Van Der Merwe, "The unscented Kalman filter," Kalman Filtering and Neural Networks, pp. 221-280, 2001.
[6] X. Lin and G. Terejanu, "EnLLVM: Ensemble based nonlinear Bayesian filtering using linear latent variable models," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 5222-5226.
[7] P. M. Djuric, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, and J. Miguez, "Particle filtering," IEEE Signal Process. Mag., vol. 20, no. 5, pp. 19-38, 2003.
[8] N. J. Gordon, D. J. Salmond, and A. F. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," in IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 2. IET, 1993, pp. 107-113.
[9] E. Isufi, P. Banelli, P. Di Lorenzo, and G. Leus, "Observing and tracking bandlimited graph processes from sampled measurements," Signal Processing, vol. 177, p. 107749, 2020.
[10] G. Sagi and T. Routtenberg, "GSP-based MAP estimation of graph signals," arXiv preprint arXiv:2209.11638, 2022.
[11] G. Sagi, N. Shlezinger, and T. Routtenberg, "Extended Kalman filter for graph signals in nonlinear dynamic systems," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[12] S. Cheng, C. Quilodran-Casas, S. Ouala, A. Farchi, C. Liu, P. Tandeo, R. Fablet, D. Lucor, B. Iooss, J. Brajard et al., "Machine learning with data assimilation and uncertainty quantification for dynamical systems: A review," arXiv preprint arXiv:2303.10462, 2023.
[13] J. Gu, X. Yang, S. De Mello, and J. Kautz, "Dynamic facial analysis: From Bayesian filtering to recurrent neural network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1548-1557.
[14] B. Tang and D. S. Matteson, "Probabilistic transformer for time series analysis," Advances in Neural Information Processing Systems, vol. 34, pp. 23592-23608, 2021.
[15] T. Zhi-Xuan, H. Soh, and D. Ong, "Factorized inference in deep Markov models for incomplete multimodal time series," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 06, 2020, pp. 10334-10341.
[16] S. S. Rangapuram, M. W. Seeger, J. Gasthaus, L. Stella, Y. Wang, and T. Januschowski, "Deep state space models for time series forecasting," Advances in Neural Information Processing Systems, vol. 31, 2018.
[17] B. Millidge, A. Tschantz, A. Seth, and C. Buckley, "Neural Kalman filtering," arXiv preprint arXiv:2102.10021, 2021.
[18] S. Jouaber, S. Bonnabel, S. Velasco-Forero, and M. Pilte, "NNAKF: A neural network adapted Kalman filter for target tracking," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4075-4079.
[19] D. Ruhe and P. Forré, "Self-supervised inference in state-space models," arXiv preprint arXiv:2107.13349, 2021.
[20] D. Ruhe and P. Forré, "Self-supervised inference in state-space models," in International Conference on Learning Representations, 2021.
[21] P. Becker, H. Pandya, G. Gebhardt, C. Zhao, C. J. Taylor, and G. Neumann, "Recurrent Kalman networks: Factorized inference in high-dimensional deep feature spaces," in International Conference on Machine Learning. PMLR, 2019, pp. 544-552.
[22] G. Nguyen-Quynh, P. Becker, C. Qiu, M. Rudolph, and G. Neumann, "Switching recurrent Kalman networks," arXiv preprint arXiv:2111.08291, 2021.
[23] A. Klushyn, R. Kurle, M. Soelch, B. Cseke, and P. van der Smagt, "Latent matters: Learning deep state-space models," Advances in Neural Information Processing Systems, vol. 34, pp. 10234-10245, 2021.
[24] Y. Wang, X. Luo, L. Ding, and S. Hu, "Visual tracking via robust multi-task multi-feature joint sparse representation," Multimedia Tools and Applications, vol. 77, pp. 31447-31467, 2018.
[25] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software. John Wiley & Sons, 2004.
[26] L. Zhou, Z. Luo, T. Shen, J. Zhang, M. Zhen, Y. Yao, T. Fang, and L. Quan, "KFNet: Learning temporal camera relocalization using Kalman filtering," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4919-4928.
[27] H. Coskun, F. Achilles, R. DiPietro, N. Navab, and F. Tombari, "Long short-term memory Kalman filters: Recurrent neural estimators for pose regularization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5524-5532.
[28] M. Fraccaro, S. Kamronn, U. Paquet, and O. Winther, "A disentangled recognition and nonlinear dynamics model for unsupervised learning," Advances in Neural Information Processing Systems, vol. 30, 2017.
[29] A. H. Li, P. Wu, and M. Kennedy, "Replay overshooting: Learning stochastic latent dynamics with the extended Kalman filter," in IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 852-858.
[30] R. G. Krishnan, U. Shalit, and D. Sontag, "Deep Kalman filters," arXiv preprint arXiv:1511.05121, 2015.
[31] B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "A hybrid approach for speaker tracking based on TDOA and data-driven models," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 26, no. 4, pp. 725-735, 2018.
[32] G. Revach, N. Shlezinger, X. Ni, A. L. Escoriza, R. J. Van Sloun, and Y. C. Eldar, "KalmanNet: Neural network aided Kalman filtering for partially known dynamics," IEEE Trans. Signal Process., vol. 70, pp. 1532-1547, 2022.
[33] I. Klein, G. Revach, N. Shlezinger, J. E. Mehr, R. J. G. van Sloun, and Y. C. Eldar, "Uncertainty in data-driven Kalman filtering for partially known state-space models," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 3194-3198.
[34] G. Revach, N. Shlezinger, T. Locher, X. Ni, R. J. van Sloun, and Y. C. Eldar, "Unsupervised learned Kalman filtering," in European Signal Processing Conference (EUSIPCO), 2022, pp. 1571-1575.
[35] N. Shlezinger, J. Whang, Y. C. Eldar, and A. G. Dimakis, "Model-based deep learning," Proc. IEEE, 2023, early access.
[36] N. Shlezinger, Y. C. Eldar, and S. P. Boyd, "Model-based deep learning: On the intersection of deep learning and optimization," IEEE Access, vol. 10, pp. 115384-115398, 2022.
[37] J. Durbin and S. J. Koopman, "A simple and efficient simulation smoother for state space time series analysis," Biometrika, vol. 89, no. 3, pp. 603-616, 2002.
[38] M. Gruber, "An approach to target tracking," MIT Lexington Lincoln Lab, Tech. Rep., 1967.
[39] N. Shlezinger and T. Routtenberg, "Discriminative and generative learning for linear estimation of random signals [lecture notes]," arXiv preprint arXiv:2206.04432, 2022.
[40] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[41] J. Park and S. Boyd, "General heuristics for nonconvex quadratically constrained quadratic programming," arXiv preprint arXiv:1703.07870, 2017.
[42] S. Gannot and A. Yeredor, "The Kalman filter," Springer Handbook of Speech Processing, pp. 135-160, 2008.
[43] L. Xu and R. Niu, "EKFNet: Learning system noise statistics from measurement data," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4560-4564.
[44] T. Raviv, S. Park, O. Simeone, Y. C. Eldar, and N. Shlezinger, "Online meta-learning for hybrid model-based deep receivers," IEEE Trans. Wireless Commun., 2023, early access.
[45] W. Gilpin, "Chaos as an interpretable benchmark for forecasting and data-driven modelling," arXiv preprint arXiv:2110.05266, 2021.
[46] V. Garcia Satorras, Z. Akata, and M. Welling, "Combining generative and discriminative models for hybrid inference," Advances in Neural Information Processing Systems, vol. 32, 2019.
[47] V. Bayle, J.-B. Fiche, C. Burny, M. P. Platre, M. Nollmann, A. Martinière, and Y. Jaillais, "Single-particle tracking photoactivated localization microscopy of membrane proteins in living plant tissues," Nature Protocols, vol. 16, no. 3, pp. 1600-1628, 2021.

APPENDIX

A. Additional Numerical Results

The results presented in these tables provide a numerical description of the experiments that were demonstrated visually in Section IV. The tables offer a concise and precise way of presenting the MSE for different noise levels, allowing the findings to be reconstructed. Each table points to the corresponding figure. The tables also include confidence intervals (standard deviations), providing a measure of the precision of the estimates. In all tables, our Latent-KalmanNet has narrow intervals, supporting our empirical superiority and strengthening the reliability of our results.

TABLE IV: Pendulum: Design steps contribution, Fig. 7

Noise level −log(r²)     | 6          | 15.2        | 23          | 30
Encoder after alternating| 1.6 ±0.7   | −0.2 ±0.66  | −1.8 ±0.61  | −4.5 ±0.56
Encoder                  | 0 ±0.51    | −1.2 ±0.41  | −3.1 ±0.48  | −4.9 ±0.43
Encoder+Prior            | −4 ±0.37   | −5.1 ±0.32  | −6.5 ±0.38  | −8.28 ±0.34
Encoder+Prior+EKF        | −4.8 ±0.22 | −6.1 ±0.34  | −7.3 ±0.25  | −9.3 ±0.21
Latent-KalmanNet         | −8.2 ±0.18 | −8.3 ±0.2   | −9.8 ±0.14  | −11.1 ±0.11

TABLE V: Lorenz: Performance with full domain knowledge. Different trajectories length, Fig. 10b

Noise level −log(p_r) | 0.3         | 1           | 2           | 3
RKN                   | 5.7 ±1.6    | −0.2 ±2.4   | −3.5 ±2.8   | −5.8 ±1.5
Encoder               | 5.4 ±0.79   | −0.58 ±0.71 | −3.9 ±0.75  | −6 ±0.8
Encoder+Prior         | 1.63 ±0.51  | −3.74 ±0.6  | −6.72 ±0.58 | −7.57 ±0.45
Encoder+Prior+EKF     | −0.61 ±0.21 | −4.34 ±0.28 | −7.11 ±0.35 | −7.91 ±0.3
Latent-KalmanNet      | −2.4 ±0.2   | −5.45 ±0.16 | −7.91 ±0.08 | −8.54 ±0.12