
Latent-KalmanNet: Learned Kalman Filtering for Tracking from High-Dimensional Signals

Itay Buchnik, Student Member, IEEE, Damiano Steger, Student Member, IEEE, Guy Revach, Student Member, IEEE, Ruud J. G. van Sloun, Member, IEEE, Tirza Routtenberg, Senior Member, IEEE, and Nir Shlezinger, Member, IEEE
arXiv:2304.07827v2 [eess.SP] 20 Apr 2023

Abstract—The Kalman filter (KF) is a widely-used algorithm for tracking dynamic systems that are captured by state space (SS) models. The need to fully describe a SS model limits its applicability under complex settings, e.g., when tracking based on visual data, and the processing of high-dimensional signals often induces notable latency. These challenges can be treated by mapping the measurements into latent features obeying some postulated closed-form SS model, and applying the KF in the latent space. However, the validity of this approximated SS model may constitute a limiting factor. In this work, we study tracking from high-dimensional measurements under complex settings using a hybrid model-based/data-driven approach. By gradually tackling the challenges in handling the observations model and the task, we develop Latent-KalmanNet, which implements tracking from high-dimensional measurements by leveraging data to jointly learn the KF along with the latent space mapping. Latent-KalmanNet combines a learned encoder with data-driven tracking in the latent space using the recently proposed KalmanNet, while identifying the ability of each of these trainable modules to assist its counterpart via providing a suitable prior (by KalmanNet) and by learning a latent representation that facilitates data-aided tracking (by the encoder). Our empirical results demonstrate that the proposed Latent-KalmanNet achieves improved accuracy and run-time performance over both model-based and data-driven techniques by learning a surrogate latent representation that most facilitates tracking, while operating with limited complexity and latency.

Parts of this work were accepted for presentation at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2023 as the paper [1]. I. Buchnik, T. Routtenberg, and N. Shlezinger are with the School of ECE, Ben-Gurion University of the Negev, Be'er Sheva, Israel (e-mail: [email protected]; {tirzar; nirshl}@bgu.ac.il). T. Routtenberg is also with the ECE Department, Princeton University, Princeton, NJ. D. Steger and G. Revach are with the D-ITET, ETH Zürich, Switzerland (e-mail: [email protected]; [email protected]). R. J. G. van Sloun is with the EE Dpt., Eindhoven University of Technology, The Netherlands (e-mail: [email protected]). This work is partially supported by the Israeli Ministry of National Infrastructure, Energy, and Water Resources. We thank Hans-Andrea Loeliger for the helpful discussions.

I. INTRODUCTION

Tracking the hidden state of dynamic systems is a fundamental problem in various fields, including signal processing, control, and finance. In many real-world applications, such as autonomous driving, smart city monitoring, and visual surveillance, tracking is based on noisy high-dimensional observations, e.g., visual data. The classic Kalman filter (KF) [2] algorithm and its variants [3, Ch. 10] have been the go-to approach for tracking, relying on the representation of the dynamics as a state space (SS) model that describes the state evolution and the sensing model. The KF is widely used due to its computational efficiency and optimality properties. However, the reliance of the KF and its variants on an accurate description of the underlying dynamics as a closed-form SS model with Gaussian noise restricts its applicability when tracking from complex high-dimensional data.

In particular, the KF assumes linear dynamics with Gaussian noise of a known distribution. Variations of the KF, such as the extended Kalman filter (EKF) [4] and the unscented Kalman filter [5], can cope with nonlinear Gaussian SS models, yet they require an accurate description of the nonlinearities, which is often unavailable when dealing with visual data, and their complexity grows when processing high-dimensional observations. Alternative tracking methods based on Bayesian filtering [6]–[8] do not assume Gaussian modeling, yet are often computationally complex. While for certain families of high-dimensional observations, such as graph signals, one can leverage structures in the data to notably reduce tracking complexity [9]–[11], such approaches do not naturally extend to other domains of high-dimensional data. Moreover, all of the aforementioned techniques are model-based, relying on full knowledge of the SS model, which is likely to be unavailable when tracking based on high-dimensional measurements such as visual data.

In recent years, the combination of large-scale datasets and advancements in deep learning has led to the development of several data-driven filtering methods; see the review in [12]. These methods have shown empirical success in processing visual data, and typically involve deep neural network (DNN) architectures, such as recurrent neural networks (RNNs) [13], attention mechanisms [14], and deep Markov models [15], for state tracking tasks. While these methods are based on architectures designed for generic time-sequence processing, several DNN architectures were proposed specifically for tracking in SS models, being inspired by model-based tracking algorithms [16]–[24] and resulting in, e.g., DNNs whose internal interconnection follows the flow of the EKF. Among these existing works, the systems of [21]–[24] were specifically designed to cope with high-dimensional measurements, with a leading basis architecture being the recurrent Kalman network (RKN) of [21]. However, those methods suffer from difficulty in training, sensitivity to initialization, and generalization problems. Moreover, they do not leverage domain knowledge regarding the state evolution, even when such knowledge is available, as is the case in various applications including, e.g., localization and navigation [25, Ch. 6].
While the above data-driven approaches do not use knowledge about the SS model, one can combine model-agnostic deep-learning tools with SS-aware processing. A candidate approach to do so in the context of high-dimensional data is to use a DNN decoder to capture the complex observation model [26], thus overcoming the need to analytically describe it, yet preserving the complexity associated with tracking using high-dimensional data. Alternatively, a widely adopted approach encodes the observations into a latent space via a DNN, i.e., using instead the inverse of the observations model. These latent features are then used to track the state with, e.g., a conventional KF. This approach assumes that the latent features obey a simple SS model, typically a known Gaussian one in the latent domain, as in [26]–[31]. However, the resulting latent SS model is often non-Gaussian, which can impact the tracking accuracy in the latent space.

In this work, we propose Latent-KalmanNet, which addresses the difficulties of tracking high-dimensional data by simultaneously learning to track along with the latent space representation. To achieve this, we utilize the recently proposed KalmanNet [32]–[34], which learns from data to perform Kalman filtering in partially known SS models as a form of model-based deep learning [35], [36]. KalmanNet relies on a (possibly approximated) description of the sensing function. However, this information is unavailable in the setting considered in this work of complicated high-dimensional data, and the solution complexity grows with the dimensions of the measurements. Therefore, in Latent-KalmanNet we combine KalmanNet with latent-space encoding, and propose a novel training method which jointly learns the latent representation and the filter operation. Latent-KalmanNet uses a latent transformation that is assisted by its subsequent data-aided tracking method. The resulting latent representation is suitable for tracking while maintaining the interpretability and low complexity of the KF.

In particular, we first identify the main challenges associated with tracking from high-dimensional measurements: (1) the need to model stochasticity; (2) the operation with a possibly intractable measurements model; (3) the need to be applicable in real-time; and (4) the presence of possible mismatches in the state evolution model. Based on these challenges, we derive Latent-KalmanNet by gradually addressing each specific challenge, while accounting for settings where the state can be either partially observable or fully observable. The resulting Latent-KalmanNet combines two trainable components – an encoder that maps the observations into a latent representation, and KalmanNet, which tracks based on the latent features. Instead of designing these components separately, we exploit the ability of each module to facilitate the operation of its counterpart. Specifically, the tracking module is used to provide a prior for encoding, while the encoder generates a latent representation that is most suitable for tracking, and this desired behavior is learned from data using a dedicated alternating training mechanism. Our experimental study evaluates Latent-KalmanNet for tracking in challenging settings with high-dimensional visual data, identifying the benefits of each of the components incorporated in Latent-KalmanNet, while showing that the synergistic design of latent encoding and tracking yields notable performance improvements. Furthermore, we demonstrate that Latent-KalmanNet outperforms classic model-based nonlinear tracking algorithms as well as state-of-the-art deep architectures in terms of state estimation accuracy as well as inference speed.

The rest of the paper is organized as follows: Section II details the problem formulation and briefly describes the EKF and KalmanNet. Section III presents Latent-KalmanNet in a step-by-step manner, along with the proposed training method. Our numerical evaluation is provided in Section IV, while Section V concludes the paper.

Throughout the paper, we use boldface lower-case letters for vectors and boldface upper-case letters for matrices. The transpose, ℓ2 norm, and gradient operator are denoted by {·}^⊤, ‖·‖, and ∇(·), respectively. Finally, R and Z are the sets of real and integer numbers, respectively.

II. SYSTEM MODEL AND PRELIMINARIES

In this section, we present the system model and relevant preliminaries needed to derive Latent-KalmanNet in Section III. We start by formulating the problem of tracking in partially known SS models with high-dimensional observations in Subsection II-A. Then, in Subsection II-B, we briefly recall the model-based EKF and KalmanNet of [32], and identify their shortcomings for the considered setting.

A. Problem Formulation

We consider a dynamic system characterized by a (possibly) nonlinear, continuous SS model in discrete-time t ∈ Z. Let x_t be the m × 1 state vector, which evolves by a nonlinear state evolution function f(·), and is driven by an additive zero-mean noise e_t. The n × 1 observation vectors y_t, t ∈ Z, are high-dimensional, with in particular n ≫ m; these can be, e.g., the vector representation of an image/tensor¹. The observed y_t is related to the state x_t via a complex and possibly unknown measurement function h(·) with additive zero-mean noise v_t. The resulting SS model is given by:

x_t = f(x_{t−1}) + e_t,   x_t ∈ R^m,   (1a)
y_t = h(x_t) + v_t,   y_t ∈ R^n.   (1b)

¹For mathematical simplicity, we formulate our high-dimensional observations in vector form, which also represents tensor data by stacking their elements in vector form. The size n considered is the total number of elements in the observation.

We consider a case where at least some of the state variables can be estimated from y_t, which is related to the notion of observability, typically used in the context of deterministic systems [25, Ch. 3]. In particular, we use the term fully observable to denote measurement models where y_t is affected by all variables in x_t and all variables in x_t can be recovered from y_t (i.e., the mapping h(·) is injective), and partially observable for models in which some of the entries of x_t cannot be recovered from y_t (though they may be dependent on {y_τ}_{τ≤t}). That is, we examine both the fully observable case and the partially observable setting, where in the latter a single y_t can be used to recover only a subset of p ≤ m variables in x_t, denoted as the p × 1 vector P·x_t, with P being a p × m selection matrix. We henceforth focus on the partially observable setting, as it also includes the fully observable setting by writing P = I and p = m.

Our goal is to develop a filtering algorithm for real-time state estimation, i.e., for the recovery of x_t from {y_τ}_{τ≤t} for each time instance t [37]. This algorithm should work effectively in both fully and partially observable SS models, while we assume that one has knowledge of which state variables are observable, i.e., P is known. The performance of a given estimator (obtained by the filtering approach) x̂_t is measured using the mean-squared error (MSE), which is defined as E{‖x̂_t − x_t‖²}.

While various methods have been proposed for tracking in SS models, our setting is associated with several challenges:

C.1 The distribution of the noises e_t and v_t in (1) is unknown and may be non-Gaussian as, e.g., stochasticity in visual data is often non-Gaussian.
C.2 The available state-evolution function f(·) may be mismatched, e.g., obtained via a first-order linear approximation of complex physical dynamics, as is often the case in navigation and localization tasks [25, Ch. 6].
C.3 The observations are high-dimensional (n ≫ m), leading to high complexity and affecting real-time applicability.
C.4 The sensing function h(·) is unknown and possibly analytically intractable.

To cope with the various unknown characteristics of (1), we are given access to a labeled data set comprised of D trajectories of length T of paired observations and states,

D ≜ { {(x_t^{(d)}, y_t^{(d)})}_{t=1}^{T} }_{d=1}^{D}.   (2)
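As a concrete illustration of the data structure in (2), the following minimal Python sketch (our own illustration, not taken from the paper's released code) simulates trajectories from a generic SS model of the form (1). The Gaussian noises and the caller-supplied toy f(·) and h(·) are placeholder assumptions; the method itself imposes no such model.

```python
import numpy as np

def generate_dataset(f, h, m, n, T, D, q2, r2, rng=None):
    """Simulate D trajectories of length T from the SS model (1).

    f : state-evolution function R^m -> R^m
    h : measurement function R^m -> R^n
    q2, r2 : variances of the (here Gaussian, for illustration) noises e_t, v_t
    Returns x of shape (D, T, m) and y of shape (D, T, n), i.e., the
    paired states/observations comprising the dataset in (2).
    """
    rng = np.random.default_rng(rng)
    x = np.zeros((D, T, m))
    y = np.zeros((D, T, n))
    for d in range(D):
        x_prev = rng.standard_normal(m)  # random initial state x_0
        for t in range(T):
            x_t = f(x_prev) + np.sqrt(q2) * rng.standard_normal(m)  # (1a)
            y_t = h(x_t) + np.sqrt(r2) * rng.standard_normal(n)     # (1b)
            x[d, t], y[d, t] = x_t, y_t
            x_prev = x_t
    return x, y
```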
Our proposed algorithm for tackling C.1-C.4 is detailed in Section III. Our design follows model-based deep-learning methodology [35], [36], where deep-learning tools are used to augment and empower model-based algorithms rather than replace them. Our method builds upon the KalmanNet architecture of [32], which augments the classic EKF, as briefly recalled in the next subsection.

B. EKF and KalmanNet

Various model-based filters have been developed for tracking in SS models (see, e.g., [3, Ch. 10]). One of the most common algorithms, which is suitable when the noises are Gaussian and the SS model is fully known, i.e., in the absence of Challenges C.1-C.4, is the EKF [38]. The EKF follows the operation of the KF, combining prediction based on the previous estimate with an update based on the current observation, while extending it to nonlinear SS models.

In particular, the EKF first predicts the next state and observation based on x̂_{t−1} via

x̂_{t|t−1} = f(x̂_{t−1});   ŷ_{t|t−1} = h(x̂_{t|t−1}).   (3)

Then, the initial prediction is updated with a matrix K_t, known as the Kalman gain, which dictates the balance between relying on the state evolution function f(·) through (3) and the current observation y_t. The estimate is computed as

x̂_t = K_t · Δy_t + x̂_{t|t−1};   Δy_t ≜ y_t − ŷ_{t|t−1}.   (4)

The Kalman gain K_t is calculated via

K_t = Σ̂_{t|t−1} · Ĥ_t^⊤ · Ŝ_{t|t−1}^{−1},   (5)

where Σ̂_{t|t−1} and Ŝ_{t|t−1} are the covariance matrices of the state prediction x̂_{t|t−1} and the observation prediction ŷ_{t|t−1}, respectively. These matrices are calculated via

Σ̂_{t|t−1} = F̂_t · Σ̂_{t−1} · F̂_t^⊤ + Q,   (6)
Ŝ_{t|t−1} = Ĥ_t · Σ̂_{t|t−1} · Ĥ_t^⊤ + R,   (7)

where Q and R are the known covariance matrices of e_t and v_t, respectively. The matrices F̂_t and Ĥ_t are instantaneous linearizations of f(·) and h(·), respectively, obtained using their Jacobian matrices evaluated at x̂_{t−1} and x̂_{t|t−1} (see [3, Ch. 10]), i.e.,

F̂_t = ∇_x f(x̂_{t−1});   Ĥ_t = ∇_x h(x̂_{t|t−1}).   (8)

Challenges C.1-C.4 notably limit the applicability of the EKF for the setup detailed in the previous subsection. KalmanNet, proposed in [32], is designed to leverage data as in (2) to tackle Challenges C.1 and C.2 (but not C.3-C.4). In particular, KalmanNet builds on the insight that the missing and mismatched domain knowledge of the noise statistics and the linear approximations is encapsulated in the computation of the Kalman gain K_t (5). Consequently, it augments the EKF with a deep-learning component by replacing the computation of the Kalman gain with an RNN, while preserving the filtering operation via (3)-(4). By doing so, KalmanNet converts the EKF into a trainable discriminative model [39], where the data D is used to directly learn the Kalman gain, bypassing the need to enforce any model over the noise statistics, and is able to handle domain knowledge mismatches as in Challenges C.1-C.2. Moreover, KalmanNet preserves the interpretability of the KF, while being operable in partially known SS models; therefore, it allows one to deduce uncertainty, as shown in [33], and is amenable to training in an unsupervised manner [34].

Despite the ability of the KalmanNet architecture of [32] to learn from data to cope with Challenges C.1 and C.2, it is not suitable to be applied in our setting under C.3-C.4. In particular, the high dimension of the observations notably increases the complexity of its Kalman gain RNN and the resulting filter. Moreover, KalmanNet requires knowledge of h(·), which is not analytically available in the current setting. This motivates the derivation of the proposed Latent-KalmanNet in the sequel.
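To make the recursion (3)-(8) concrete, here is a minimal NumPy sketch of a single EKF step. It is a generic illustration under the stated Gaussian assumptions; the covariance update in the last line uses the standard (I − K_t Ĥ_t)·Σ̂_{t|t−1} form, which is implied but not written out above.

```python
import numpy as np

def ekf_step(x_prev, Sigma_prev, y, f, h, F_jac, H_jac, Q, R):
    """One EKF recursion following (3)-(8).

    f, h         : state-evolution and measurement functions
    F_jac, H_jac : callables returning the Jacobians in (8)
    Q, R         : process/measurement noise covariances
    """
    x_pred = f(x_prev)                       # state prediction, (3)
    y_pred = h(x_pred)                       # observation prediction, (3)
    F = F_jac(x_prev)                        # linearizations, (8)
    H = H_jac(x_pred)
    Sigma_pred = F @ Sigma_prev @ F.T + Q    # covariance prediction, (6)
    S_pred = H @ Sigma_pred @ H.T + R        # innovation covariance, (7)
    K = Sigma_pred @ H.T @ np.linalg.inv(S_pred)   # Kalman gain, (5)
    x_est = x_pred + K @ (y - y_pred)        # update, (4)
    Sigma_est = (np.eye(len(x_prev)) - K @ H) @ Sigma_pred  # standard form
    return x_est, Sigma_est
```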
III. LATENT-KALMANNET

In this section, we present the proposed Latent-KalmanNet algorithm, which tackles Challenges C.1-C.4. Our derivation of Latent-KalmanNet is presented in a step-by-step manner, where each step tackles an additional challenging aspect while building upon its preceding stages.

As noted in Subsection II-B, the main added challenges considered here, as compared to the setting for which KalmanNet is formulated in [32], are associated with the observation model in (1b), i.e., Challenges C.3 and C.4. Therefore, our first step, detailed in Subsection III-A, considers an instantaneous estimation setting based solely on the observations model. The second step, described in Subsection III-B, incorporates the state evolution function (1a) by adding a prior as an additional input, which accounts for temporal correlation and partial observability. Then, we unite the instantaneous estimate with the model-based EKF for tracking in Step 3 (Subsection III-C) to face Challenges C.3-C.4. Our fourth step, detailed in Subsection III-D, incorporates Challenges C.1 and C.2 by converting the joint instantaneous estimation and tracking algorithm into Latent-KalmanNet, replacing the EKF with KalmanNet. Latent-KalmanNet is learned as a trainable discriminative model, where data-driven tracking is done based on jointly learned latent features. We conclude our derivation with a discussion in Subsection III-E.
A. Step 1 - Instantaneous Estimate

We begin by considering the observations model (1b) solely. The resulting task boils down to the instantaneous estimation of (the observable entries of) x_t from the observed y_t. The fact that the observation model is unknown (C.4) and high-dimensional (C.3), combined with the availability of labeled data (2), motivates the usage of DNNs. A natural approach here is to use a regression DNN with parameters ψ, denoted g_ψ^e : R^n → R^p, and train it to map y_t into an estimate of the observable state variables P·x_t.

When properly trained with sufficient data, regression DNNs are often capable of learning to provide reliable estimates from high-dimensional data with complex statistical models [40, Ch. 13]. In particular, the DNN parameters ψ can be learned in a supervised manner via gradient-based optimization, e.g., stochastic gradient descent (SGD) and its variants. We adopt the regularized ℓ2 norm MSE loss, which, for a given data set D, is computed as:

L_D^e(ψ) = (1/(|D|·T)) · Σ_{d=1}^{|D|} Σ_{t=1}^{T} ‖g_ψ^e(y_t^{(d)}) − P·x_t^{(d)}‖² + λ‖ψ‖²,   (9)

where λ > 0 is a regularization coefficient.

The resulting instantaneous estimation system represents a straightforward data-driven approach to tackle the challenges associated with the observation model (1b). However, it does not account for the temporal correlation induced by the state evolution model (1a). Moreover, the full reliance on black-box deep-learning architectures implies that the resulting system is sensitive to generalization problems, and the model can easily over-fit to the training set.
B. Step 2 - Incorporating the Evolution Model

The instantaneous estimator is oblivious to the partially known state evolution model in (1a). Therefore, by integrating the model in (1a), i.e., the (possibly approximated) state evolution function f(·), one can potentially improve the estimation of the observable state variables provided by g_ψ^e(·). Our rationale stems from viewing the inference task carried out by the DNN, i.e., recovering P·x_t from y_t, as solving a non-convex optimization problem (see [36]). While tackling non-convex optimization is in general highly challenging, it can be greatly facilitated by providing a good initial guess, which hopefully lies in the proximity of the global optimum [41].

To incorporate the state evolution as a form of a prior, we apply f(·) to the previous estimate x̂_{t−1}. The previous estimate x̂_{t−1} can be obtained from the previous encoder output, i.e., g_ψ^e(y_{t−1}), fused with an estimate of the unobservable variables; the latter can be provided as an initial guess and improves the instantaneous estimate by exploiting the temporal correlation. An example of the resulting high-level architecture, where the prior prediction x̂_{t|t−1} is provided to the DNN-based instantaneous estimator (implemented as a convolutional neural network (CNN)) as additional features, is illustrated in Fig. 1. The DNN-based estimator now takes two multivariate inputs, x̂_{t|t−1} and y_t, and is thus trained for mapping R^n × R^m into R^p. Namely, here

z_t = g_ψ^e(y_t, x̂_{t|t−1}).   (10)

Fig. 1: Illustration of an encoder with prior implemented using a CNN (convolutional + ReLU, max pooling, batch norm, flatten, concatenation with the prior, and fully connected layers), following the implementation utilized in the numerical study in Section IV.

The encoder is trained using the regularized MSE loss as in (9). The incorporation of past outputs as a prior enables leveraging domain knowledge in the sense of the state evolution model, thus guiding the learning procedure towards a more desirable solution. Yet, it also impacts the stability of the training procedure, as we have empirically observed. Nonetheless, the learning procedure can be facilitated by exploiting the interpretability of the architecture, building upon the ability to view the prior x̂_{t|t−1} as a noisy version of the desired state. One can thus set the prior during training to be the ground truth state with added noise, while choosing the noise magnitude based on an ablation study, as done in our numerical study reported in Section IV. Here, care should be taken not to push the encoder to learn solely from the observation when the noise is too large, while not relying only upon the prior when the noise is too small.
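A possible PyTorch realization of the encoder with prior in (10) is sketched below, following the layer sizes of Table I in Section IV and the concatenation structure of Fig. 1. The width of the FC branch processing the prior (32 here) is our own choice, as it is not specified in the text; the last lines illustrate the supervised pre-training with the loss (9), where the optimizer's weight decay plays the role of λ‖ψ‖².

```python
import torch
import torch.nn as nn

class EncoderWithPrior(nn.Module):
    """Sketch of g^e_psi(y_t, x_hat_{t|t-1}) after Table I / Fig. 1."""
    def __init__(self, m=2, p=1, prior_feat=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(8),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(16),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(32),
            nn.Flatten(),                       # 28x28 input -> 32*4*4 = 512
        )
        self.prior_fc = nn.Linear(m, prior_feat)   # FC branch for the prior
        self.head = nn.Sequential(
            nn.Linear(512 + prior_feat, 32), nn.ReLU(), nn.Linear(32, p),
        )

    def forward(self, y, x_prior):
        feats = self.conv(y)                                     # image features
        z = torch.cat([feats, self.prior_fc(x_prior)], dim=-1)   # concat, Fig. 1
        return self.head(z)                                      # latent z_t, (10)

# Supervised pre-training with the regularized MSE loss (9), toy stand-in batch
enc = EncoderWithPrior()
opt = torch.optim.Adam(enc.parameters(), lr=1e-3, weight_decay=1e-4)
y_batch = torch.randn(16, 1, 28, 28)
prior_batch, target = torch.randn(16, 2), torch.randn(16, 1)  # noisy-GT prior, P x_t
opt.zero_grad()
loss = ((enc(y_batch, prior_batch) - target) ** 2).mean()
loss.backward()
opt.step()
```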

C. Step 3 - Joint Instantaneous Estimation and Tracking

Next, we assume that the relationship between the estimator output and the observable state variables can be represented as obeying a Gaussian distribution. This approach allows tracking in the latent space, without accounting for Challenges C.1 and C.2. In such cases, one can account for the temporal correlation by applying an EKF in cascade with the pretrained DNN encoder of Step 2. The rationale here is to assume that the DNN is properly trained such that its estimate approaches the minimal MSE estimator of P·x_t from y_t. In such cases, the DNN output can be approximated as obeying

z_t = g_ψ^e(y_t) ≈ P·x_t + ṽ_t,   (11)

where ṽ_t is zero-mean and mutually independent of x_t. If ṽ_t is also Gaussian and temporally independent, then (1a) along with (11) represent a (possibly nonlinear) Gaussian SS model, from which x_t can be tracked using the EKF. The second-order moment of ṽ_t, which is necessary for the Kalman gain computation (5), can be estimated from the validation error R of the DNN encoder. The measurement matrix Ĥ_t in this setting is set to Ĥ_t = P. The system is illustrated in Fig. 2.

Fig. 2: Encoder with prior and EKF in cascade block diagram.

To apply the EKF in the latent space while treating (11) as the observation model, one should have knowledge of the distribution of the state noise e_t, i.e., the matrix Q. This can be estimated from the data D. For instance, one can tune the dynamic noise variance to optimize performance by, e.g., assuming that Q is a scaled identity matrix and employing grid search to identify the variance parameter which yields the best performance on the available data. Alternatively, one can incorporate parametric estimation mechanisms, e.g., expectation-maximization iterations, into the EKF [42].

The proposed cascaded operation allows utilizing a DNN to cope with the challenging observations model while systematically incorporating the state evolution model. This is achieved by separating instantaneous estimation from the tracking task, where the temporal correlation is exploited. Jointly treating the instantaneous estimation task along with its subsequent tracking allows for improving the overall performance, as shown in Section IV. Nonetheless, the fact that an EKF is utilized implies that the SS model described via (1a) and (11) is inherently assumed to be fully known and Gaussian. This is not necessarily the case here, not only due to Challenges C.1 and C.2, but also since there is no guarantee that the DNN estimation error ṽ_t is indeed Gaussian. This motivates our final step, which formulates Latent-KalmanNet.

D. Step 4 - Latent-KalmanNet

The system detailed in Step 3 builds upon the insight that the relationship between the latent z_t and the state x_t obeys an (approximated) SS model given by (1a) and (11). We conclude our design by accounting for Challenges C.1 and C.2, and the fact that the error term in (11) is likely to obey an unknown distribution. This motivates using KalmanNet instead of the EKF, which is particularly suitable for filtering in such settings, and bypasses the need to impose a specific distribution on the noise terms in the SS model. The resulting algorithm, encompassing the architecture and its training procedure detailed next, is coined Latent-KalmanNet.

Architecture: To formulate the system operation, we let θ be the internal RNN parameters of KalmanNet, which implements a mapping g_θ^f : R^p → R^m with state-evolution function f(·) and observation function given by h(x) = P·x. As detailed in Subsection II-B, KalmanNet uses the previous estimate x̂_{t−1} to predict the next state as x̂_{t|t−1} = f(x̂_{t−1}). This prediction is then used as the prior provided to the encoder of Step 2, producing the latent z_t via (10). The estimate of x_t is written as

x̂_t = g_θ^f(z_t).   (12)

The resulting architecture is illustrated in Fig. 3, where the two modules, the encoder and KalmanNet, aid one another by providing a low-dimensional latent representation (by the encoder) and a prior for obtaining the latent (by KalmanNet). Once trained, the estimation procedure at each time step during inference is summarized as Algorithm 1.

Fig. 3: Latent-KalmanNet block diagram.
Algorithm 1: Latent-KalmanNet Inference
Init: Trained encoder ψ; trained KalmanNet θ
Input: Observation y_t; previous estimate x̂_{t−1}
1: Predict x̂_{t|t−1} = f(x̂_{t−1});
2: Predict ẑ_{t|t−1} = P·x̂_{t|t−1};
3: Encode observations via z_t = g_ψ^e(y_t, x̂_{t|t−1});
4: Apply the RNN θ to compute the Kalman gain K_t;
5: Estimate via x̂_t = g_θ^f(z_t) = K_t·(z_t − ẑ_{t|t−1}) + x̂_{t|t−1};
6: return x̂_t
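The following sketch mirrors Algorithm 1 for a single time step. The interface of the Kalman-gain RNN, here fed only with the latent innovation, is our simplification of Architecture 2 of [32], so it should be read as illustrative rather than as the exact released implementation.

```python
import torch

def latent_kalmannet_step(y_t, x_prev, f, P, encoder, gain_rnn, rnn_state):
    """One inference step of Latent-KalmanNet, following Algorithm 1.

    f        : (possibly approximate) state-evolution function
    P        : p x m selection matrix mapping states to observable entries
    encoder  : trained g^e_psi(y_t, prior) -> z_t
    gain_rnn : trained recurrent module returning (K_t, new hidden state)
    """
    x_pred = f(x_prev)                         # line 1: prior x_hat_{t|t-1}
    z_pred = P @ x_pred                        # line 2: predicted latent
    z_t = encoder(y_t, x_pred)                 # line 3: encode with prior
    dz = z_t - z_pred                          # latent innovation
    K_t, rnn_state = gain_rnn(dz, rnn_state)   # line 4: learned Kalman gain
    x_est = x_pred + K_t @ dz                  # line 5: update
    return x_est, rnn_state
```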
Training: The proposed architecture is a concatenation of two modules: a DNN estimator g_ψ^e(·) and KalmanNet g_θ^f(·). Both are differentiable [32], allowing the overall architecture, parameterized by (θ, ψ), to be trained end-to-end as a discriminative model [39]. We use the ℓ2 regularized MSE loss, which for a given data set D is evaluated as

L_D(θ, ψ) = (1/(|D|·T)) · Σ_{d=1}^{|D|} Σ_{t=1}^{T} L_t^{(d)}(θ, ψ) + λ₁‖θ‖² + λ₂‖ψ‖²,   (13)

where λ₁, λ₂ > 0 are regularization coefficients. The loss term for each time step in a given trajectory of (13) is computed as

L_t^{(d)}(θ, ψ) = ‖x̂_t^{(d)} − x_t^{(d)}‖²,   (14)

where

x̂_t^{(d)} = g_θ^f( g_ψ^e( y_t^{(d)}, f(x̂_{t−1}^{(d)}) ) ) = x̂_{t|t−1}^{(d)} + K_t·(z_t − ẑ_{t|t−1}).   (15)

The loss measure in (13) builds upon the ability to backpropagate the loss to the computation of the Kalman gain K_t [43]. In particular, one can obtain the loss gradient of a given trajectory d at a given time step t with respect to the Kalman gain from the output x̂_t^{(d)} of Latent-KalmanNet, since

∂L_t^{(d)}(θ, ψ)/∂K_t = ∂‖K_t·Δz_t − Δx_t‖²/∂K_t = 2(K_t·Δz_t − Δx_t)·Δz_t^⊤,   (16)

where Δx_t = x_t^{(d)} − x̂_{t|t−1}^{(d)}. The gradient computation in (16) indicates that one can learn the computation of the Kalman gain by training Latent-KalmanNet end-to-end. This allows training the overall filtering system, including both the latent encoding and its tracking into the state, without having to externally provide ground truth values of the Kalman gain or of the latent features for training purposes. The fact that the MSE loss in (14) is computed based on the output of KalmanNet, rather than that of g_ψ^e(·), implies that the latter will not necessarily learn to estimate the observable state variables, as when training via (9). Instead, it is trained to encode the high-dimensional observations y_t (along with the prior x̂_{t|t−1}) into latent features from which KalmanNet can most reliably recover the state. For this reason, we coin the algorithm Latent-KalmanNet.

Latent-KalmanNet enables joint learning of (θ, ψ) via gradient-based optimization, e.g., SGD and its variants. However, carrying this out in practice can be challenging and often unstable, as the learning procedure needs to simultaneously tune the latent representation and the corresponding Kalman gain computation. Nonetheless, the fact that the architecture is decomposable into distinct trainable building blocks with concrete tasks facilitates training via alternating optimization. This is achieved by iteratively optimizing the filter θ while freezing ψ, followed by training of the latent representation ψ which best fits the filter with fixed weights θ, based on (13). Additionally, one can initially train the observable variables of the encoder module separately, via the regularized ℓ2 norm loss (9). This form of modular training [44] constitutes a warm start, which is empirically shown to facilitate learning. The resulting procedure is summarized as Algorithm 2.

Algorithm 2: Latent-KalmanNet Alternating Training
Init: Fix learning rates μ₁, μ₂ > 0 and epochs i_max
Input: Training set D
Warm start:
1: for i = 0, 1, ..., i_max − 1 do
2:   Randomly divide D into Q batches {D_q}_{q=1}^{Q};
3:   for q = 1, ..., Q do
4:     Compute batch loss L_{D_q}^e(ψ) by (9);
5:     Update ψ ← ψ − μ₁·∇_ψ L_{D_q}^e(ψ);
Alternating minimization:
6: for i = 0, 1, ..., i_max − 1 do
7:   Randomly divide D into Q batches {D_q}_{q=1}^{Q};
8:   for q = 1, ..., Q do
9:     Compute batch loss L_{D_q}(θ, ψ) by (13);
10:    Update θ ← θ − μ₂·∇_θ L_{D_q}(θ, ψ);
11:  for q = 1, ..., Q do
12:    Compute batch loss L_{D_q}(θ, ψ) by (13);
13:    Update ψ ← ψ − μ₁·∇_ψ L_{D_q}(θ, ψ);
14: return (θ, ψ)
E. Discussion
The proposed Latent-KalmanNet is designed to tackle the challenges of tracking from complex high-dimensional observations. It leverages data to enable reliable tracking, overcoming the missing knowledge of the sensing function and the noise statistics. Latent-KalmanNet is derived in gradual steps obtained from pinpointing the specific challenges associated with the filtering problem detailed in Subsection II-A. In particular, the usage of a DNN trained in a supervised manner for coping with the complex observations model in Step 1 is a straightforward approach for instantaneous estimation. Its cascading with an EKF is a natural extension for incorporating temporal correlation, and a similar approach of applying an EKF to data-driven extracted features was also proposed in previous works, e.g., [26], [27]. However, the usage of the evolution model f(·) in Step 2 for improving the instantaneous estimate via temporal correlation; the replacement of the EKF with the trainable KalmanNet in Step 4; and the formulation of a suitable training procedure which encourages both modules to facilitate their counterpart's operation in tracking, are novel aspects of our design. These components are particularly tailored to cope with the challenging partially known SS model in (1), without enforcing a model on the noise statistics and the observations function, and while being geared towards real-time applications with low-latency inference demands.

Compared with the preliminary findings of this research reported in [1], the Latent-KalmanNet algorithm presented here is not restricted to using instantaneous estimators for latent feature extraction, and can in fact learn to contribute to the latent state encoding. Furthermore, while [1] was only applicable in fully observable SS models, Latent-KalmanNet presented here is designed to leverage its access to the state evolution model to track the state also in partially observable settings. This enables its application in various challenging scenarios, as also demonstrated in our numerical study reported in Section IV.

Our design of Latent-KalmanNet improves estimation performance by breaking the separation between feature extraction and filtering. In principle, one can claim that providing the prior x̂_{t|t−1} effectively delegates the filtering task to the instantaneous estimation DNN for fully observable models, and renders the following filtering step meaningless. However, our numerical findings reported in Section IV demonstrate that this is not the case, and that the system benefits both from prior-aided instantaneous estimation and from the filtering operation based on that estimate. Step 4 allows the resulting algorithm to operate without enforcing a Gaussian SS model on the latent representation, as opposed to [26]. This is obtained as Algorithm 1 bypasses the need to model the stochasticity in the SS model by using KalmanNet [32]. Unlike state-of-the-art DNN-aided filters, such as the RKN of [21], we do not replace all the KF procedures with DNNs, and we preserve the operation of the model-based filter. This allows us to systematically incorporate the available domain knowledge on the state evolution, improving performance and generalization, as demonstrated in Section IV.

The hybrid model-based/data-driven design of Latent-KalmanNet yields gains not only in accuracy and interpretability. It can also achieve faster inference speed compared to other model-based solutions or highly parameterized data-driven models, while supporting training with relatively limited datasets, as demonstrated in Section IV. The computation complexity for each time step is linear in the dimensions of the trainable modules, being the complexity order of inferring using a DNN, while its augmentation with the classic EKF enables using relatively compact architectures, as we do in Section IV. Moreover, while Latent-KalmanNet follows the operation of the EKF, it does not involve Jacobian computations as in (8) or matrix inversion as in (5) on each time step. This implies that Latent-KalmanNet is a good candidate for high-dimensional SS models and for computationally limited devices, compared to other model-based solutions, as well as to data-driven approaches with a large volume of weights. Preserving the model-based operation of the KF was shown to bring operational gains beyond accuracy and training complexity. For instance, it was shown in [33] to enable extracting uncertainty on the estimates, and in [34] to facilitate unsupervised learning. We leave the exploration of these properties for latent-space learned filtering for future work.

IV. EMPIRICAL STUDY

In this section, a comprehensive numerical analysis is performed² on the proposed Latent-KalmanNet to assess its performance. We consider two setups involving the tracking of a dynamic system from visual measurements: the first study, detailed in Subsection IV-A, considers a partially observable dynamic system representing the tracking of a pendulum. It is used to critically examine the design steps of Latent-KalmanNet and to evaluate the contribution of each of its components. The second study, presented in Subsection IV-B, considers the Lorenz attractor chaotic system, which is an observable dynamic system. This setup is used to compare our proposed Latent-KalmanNet against both model-based and data-driven techniques across various scenarios, with both full and partial domain knowledge.

²The source code and hyperparameters used are available at https://github.com/KalmanNet/Latent_KalmanNet_TSP.git

A. Pendulum Data

We commence our numerical study by comprehensively evaluating the impact of each design step outlined in Section III over the pendulum setting, further detailed in the following.

Fig. 4: Pendulum: physical setup and state variables.

Fig. 5: Pendulum: several representative gray-scale observations along with their corresponding angle variable φ, set to be the ground truth, the variance r² of the Gaussian noise that was added to the image, and the MSE achieved by Latent-KalmanNet.
1) SS Model: In the considered SS model, the vector x_t represents the state of an oscillating pendulum, encompassing both the angle φ_t and the angular velocity ω_t, i.e., x_t = [φ_t, ω_t]^⊤. We focus on tracking the angle φ_t along the trajectory of a pendulum movement that is released from rest at a pre-defined point, i.e., the MSE is reported with respect to φ_t. The state evolution model of the pendulum is defined by mechanical system laws, making it highly nonlinear in nature, as given in the following equation:

x_t = [1, Δ_t; 0, 1] · x_{t−1} − (g/ℓ) · [Δ_t²/2; Δ_t] · sin(φ_{t−1}) + e_t.   (17)

In (17), Δ_t denotes the sampling interval, dictating the time difference between consecutive observations, and e_t is an i.i.d. zero-mean Gaussian noise with covariance Q = q²·I, where q² = 0.1. The gravitational acceleration is set to a constant value of g = 9.81 m/sec², and the length of the string is represented by ℓ. Fig. 4 illustrates the physical pendulum setup.

The observations y_t are 28×28 gray-scale images generated from the sampled trajectories of the pendulum. The images capture the pendulum's dynamic movements as if they were taken by a camera set in front of the system, corrupted by i.i.d. Gaussian observation noise v_t with covariance R = r²·I, where r² ∈ [0.001, 0.25]. Fig. 5 shows several representative visual observations of a given trajectory, with different added noise variances r². As only the angle can be recovered from a single image, this setting represents a partially observable SS model (see Subsection II-A) with P = [1, 0]. We use this model to generate D = 1,000 trajectories of length T = 200, which comprise the data set D as in (2), with an additional 100 trajectories for evaluation.
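For concreteness, a minimal simulation of the pendulum evolution (17) may look as follows. The sampling interval Δ_t and the release angle are assumptions, as they are not specified in the text, and the rendering of states into the 28×28 frames of Fig. 5 is omitted.

```python
import numpy as np

def pendulum_step(x, dt=0.05, g=9.81, ell=1.0, q2=0.1, rng=None):
    """One step of the pendulum state evolution (17); dt is a guessed value."""
    rng = rng or np.random.default_rng()
    A = np.array([[1.0, dt], [0.0, 1.0]])
    b = np.array([0.5 * dt**2, dt])
    e = np.sqrt(q2) * rng.standard_normal(2)   # process noise e_t, Q = q^2 I
    return A @ x - (g / ell) * b * np.sin(x[0]) + e

# Release from rest at an assumed pre-defined angle and roll out T = 200 steps
x = np.array([np.pi / 4, 0.0])
traj = []
for _ in range(200):
    x = pendulum_step(x)
    traj.append(x.copy())
```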

2) Tracking Methods: We evaluate the following approaches, representing the steps in designing Latent-KalmanNet:

• Encoder (Step 1): We implement a purely data-driven, model-agnostic convolutional encoder, comprised of 3 convolutional layers, followed by 2 fully connected (FC) layers, and p = 1 output neuron, as detailed in Table I. The encoder is trained in a supervised manner on the dataset D (2) with loss function (9) to map each y_t into an estimate of the observable state variable φ_t.

TABLE I: Encoder Architecture

Layer      | Filter Size | Stride | Channels | Output Size
Input      | -           | -      | -        | 1x28x28
conv2D     | 3x3         | 2      | 8        | 8x14x14
ReLU       | -           | -      | -        | 8x14x14
Batch Norm | -           | -      | 8        | 8x14x14
conv2D     | 3x3         | 2      | 16       | 16x7x7
ReLU       | -           | -      | -        | 16x7x7
Batch Norm | -           | -      | 16       | 16x7x7
conv2D     | 3x3         | 2      | 32       | 32x4x4
ReLU       | -           | -      | -        | 32x4x4
Batch Norm | -           | -      | 32       | 32x4x4
Flatten    | -           | -      | -        | 512
FC         | -           | -      | 32       | 32
ReLU       | -           | -      | -        | 32
FC         | -           | -      | p        | p

• Encoder + Prior (Step 2): We modify the encoder architecture detailed in Table I by incorporating a prior, f(x̂_{t−1}), as an additional input to the observed image y_t. The prior undergoes an FC layer and is concatenated to the flattened version of the extracted features, as illustrated in Fig. 1. The output of this encoder still represents the estimate of φ_t, but it is fused with an estimate of the unobservable angular velocity variable, obtained by applying the state evolution f(x̂_{t−1}).

• Encoder + Prior + EKF (Step 3): On top of the trained encoder with prior of Step 2, we apply an EKF with the observation function set to h(x) = P·x. The variance of the state evolution noise is selected through grid search, while the observation noise is determined by the empirical estimation loss at the output of the encoder.

• Latent-KalmanNet (Step 4): The proposed Latent-KalmanNet uses the Encoder + Prior of Step 2, and combines it with KalmanNet implemented using Architecture 2 of [32]. Latent-KalmanNet is trained using Algorithm 2.
Fig. 6: Pendulum: state estimation of the angle variable for a single trajectory realization. (a) Trajectory of 400 time instances; (b) zoom-in on 50 time instances.

Fig. 7: Pendulum: design steps contribution - MSE vs. the Gaussian noise variance added to the images (plotted against 1/r² [dB]).

3) Results: In Fig. 7 we compare the MSE, averaged over 100 test trajectories, achieved by the considered methods in recovering the angle, i.e., the observable entry of the state variables. The findings in Fig. 7 reveal the individual contribution of each of the design steps comprising Latent-KalmanNet. There, it is shown that including a prior based on the state evolution model notably improves the performance of an encoder in recovering the observable state variables. Moreover, using this estimate as latent features and employing EKF tracking in the latent space further improves performance, though less dramatically. However, replacing the EKF with KalmanNet that is jointly trained along with the latent representation, as in Latent-KalmanNet, achieves substantial improvements in MSE. We also depict in Fig. 7 the MSE computed at the output of the encoder of Latent-KalmanNet, where it is shown that the latent representation learned is not an accurate estimate of the state, being in fact worse than an instantaneous encoder. However, this representation is learned such that it facilitates tracking in the latent space, as evidenced by the superior performance of Latent-KalmanNet.

While Fig. 7 reports the averaged MSE, in Fig. 6 the superiority of Latent-KalmanNet is showcased when tracking a single trajectory. There, the improved tracking quality is highlighted by observing a trajectory spanning 400 time instances in Fig. 6a, and we also zoom in on 50 time instances in Fig. 6b to improve visualization. These results demonstrate that Latent-KalmanNet's principled incorporation of the state evolution model knowledge, while jointly learning the latent representation with the tracking algorithm, results in improved angle estimation and smoother tracking.

Fig. 8: Lorenz attractor: ground truth trajectory of the state vs. trajectories estimated with the different methods ((a) ground truth; (b) Encoder; (c) RKN; (d) Latent-KalmanNet).

B. Comparative Evaluation of Latent-KalmanNet

Next, we present a comprehensive numerical evaluation of Latent-KalmanNet and its performance in terms of MSE and latency. To that aim, we simulate various scenarios involving the nonlinear Lorenz attractor SS model detailed in the following.

1) SS Model: Here, the state vector x_t is a three-dimensional chaotic solution to the Lorenz system of ordinary differential equations. The system describes chaotic particle movement sampled into discrete time intervals [45]. The result is a nonlinear state evolution model of the continuous-time process, showcasing the dynamic interplay of forces shaping the chaotic particle's movement. The noise-free state evolution is obtained from the differential equation

dx_t/dt = A(x_t)·x_t;   A(x_t) = [−10, 10, 0; 28, −1, −x₁; 0, x₁, −8/3].   (18)

The model is converted into a discrete-time state-evolution model by repeating the steps used in [46]. First, we sample the noiseless process with sampling interval Δ_t and assume that A(x_t) can be kept constant in a small neighborhood of x_t, i.e.,

A(x_t) ≈ A(x_{t+Δ_t}).   (19)
Then, the continuous-time solution of the differential system (18), which is valid in the neighborhood of x_t for a short time interval Δ_t, is

x_{t+Δ_t} = exp(A(x_t)·Δ_t)·x_t.   (20)

Finally, we take the Taylor series expansion of (20) and a finite series approximation (with J coefficients), which results in

F(x_t) ≜ exp(A(x_t)·Δ_t) ≈ I + Σ_{j=1}^{J} (A(x_t)·Δ_t)^j / j!.   (21)

The resulting discrete-time evolution process is given by

x_{t+1} = f(x_t) = F(x_t)·x_t.   (22)

The discrete-time state-evolution model presented in (22) is augmented with additional zero-mean Gaussian noise with i.i.d. entries of variance q² = 0.005, obtaining a noisy state-evolution representation as in (1a). An illustration of the state trajectory is given in the left side of Fig. 8.
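The discretization (21)-(22) translates directly into code. The sketch below accumulates the J-term Taylor series of the matrix exponential and rolls out a noiseless trajectory, using the values J = 5 and Δ_t = 0.02 employed in this study; adding zero-mean Gaussian noise of variance q² = 0.005 per entry would give the noisy evolution (1a).

```python
import numpy as np

def lorenz_A(x):
    """State-dependent matrix A(x) from (18)."""
    return np.array([[-10.0, 10.0, 0.0],
                     [28.0, -1.0, -x[0]],
                     [0.0, x[0], -8.0 / 3.0]])

def lorenz_f(x, dt=0.02, J=5):
    """Discrete-time evolution (22) via the J-term Taylor series (21)."""
    M = lorenz_A(x) * dt
    F = np.eye(3)
    term = np.eye(3)
    for j in range(1, J + 1):
        term = term @ M / j       # accumulates (A(x) dt)^j / j!
        F = F + term
    return F @ x

x = np.ones(3)
for _ in range(200):              # noiseless rollout of T = 200 steps
    x = lorenz_f(x)
```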
We emulate a visual representation of the movement described by x_t in the form of 28 × 28 matrices. To represent visual observations used in particle tracking, e.g., [47], the sensing function evaluated at coordinate c ∈ R² for state value x = [x₁, x₂, x₃]^⊤ is modeled as a Gaussian point spread function whose intensity depends on the lateral state coordinate, where we use

h(c; x) = 10 · exp( −(1/(2x₃)) · ‖c − [x₁, x₂]^⊤‖² ).   (23)

The observations are corrupted by salt-and-pepper (S&P) noise, modeled as an i.i.d. scaled Bernoulli vector with probability p_r. This type of noise is common in digital images and can be caused by sharp and sudden disturbances in the image signal when transmitting images over noisy digital channels. Representative visual observations of a given trajectory are depicted in Fig. 9. All considered tracking algorithms have access to the same dataset (2). Unless otherwise stated, the data was generated from the Lorenz attractor SS model with a Taylor order of J = 5, a sampling interval of Δ_t = 0.02, and a trajectory length of T = 200.

Fig. 9: Lorenz attractor: several representative gray-scale observations along with their corresponding state value x, set to be the ground truth (GT), the S&P noise probability that was added to the image, and the MSE achieved by Latent-KalmanNet.

2) Tracking Methods: The following experiments aim to assess the efficacy of Latent-KalmanNet in comparison to benchmark data-driven algorithms as well as to model-based tracking in the latent space. In particular, we evaluate the following tracking algorithms:

1) Encoder: A data-driven convolutional encoder as in Table I, i.e., comprised of three convolutional layers and two FC layers with p = 3 output neurons.
2) Encoder + Prior: The convolutional encoder with a prior estimate stacked to the features provided to the first FC layer.
3) Encoder + Prior + EKF: Model-based tracking in the latent space by applying an EKF with the observation function set to the identity matrix, i.e., h(x) = x. The variance of the state evolution noise is selected through a grid search, while the observation noise is determined by the empirical MSE at the output of the encoder.
4) RKN: The RKN of [21], which is a leading data-driven tracking algorithm that utilizes a high-dimensional factorized latent state representation.
5) Latent-KalmanNet: The proposed Latent-KalmanNet implemented using the above Encoder + Prior along with KalmanNet based on Architecture 2 of [32].
Fig. 10: Lorenz attractor: performance with full domain knowledge; MSE vs. observations S&P noise level. (a) Same trajectory lengths: Ttrain = Ttest = 200; (b) different trajectory lengths: Ttrain = 200, Ttest = 2000.

TABLE II: Lorenz attractor: numeric MSE values [dB] for the setting reported in Fig. 10a, including the standard deviation of the MSE.

Noise level −log(p_r) | 0.3         | 1           | 2           | 3
Encoder               | 5.8 ±1.1    | −0.5 ±1.3   | −3.7 ±0.85  | −5.6 ±0.9
Encoder+Prior         | 2.58 ±0.5   | −2.67 ±0.58 | −6.1 ±0.48  | −6.84 ±0.52
Encoder+Prior+EKF     | 0.51 ±0.35  | −3 ±0.41    | −6.31 ±0.38 | −7.16 ±0.32
RKN                   | −1 ±2.1     | −4.2 ±1.5   | −6.9 ±1.3   | −7.8 ±1.1
Latent-KalmanNet      | −1.91 ±0.06 | −4.92 ±0.1  | −7.2 ±0.07  | −7.94 ±0.12

Fig. 11: Lorenz attractor: performance with partial domain knowledge; MSE vs. observations S&P noise level. (a) Mismatch due to state-evolution Taylor expansion; (b) mismatch due to coarse sampling.
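Before turning to the results, the following sketch ties the observation model together: it renders a 28×28 frame via the point spread function (23) and applies the S&P corruption with probability p_r described above. The mapping from state coordinates to the pixel grid (the field-of-view limits below) is an assumption, as it is not specified in the text.

```python
import numpy as np

def render_observation(x, size=28, pr=0.1, rng=None):
    """28x28 observation: Gaussian PSF (23) at (x1, x2), then S&P noise."""
    rng = rng or np.random.default_rng()
    g = np.linspace(-20.0, 20.0, size)        # assumed field of view
    cx, cy = np.meshgrid(g, g)
    d2 = (cx - x[0]) ** 2 + (cy - x[1]) ** 2  # ||c - [x1, x2]||^2
    img = 10.0 * np.exp(-d2 / (2.0 * x[2]))   # PSF intensity, (23)
    # i.i.d. scaled-Bernoulli salt-and-pepper corruption with probability pr
    mask = rng.random((size, size)) < pr
    peak = img.max()
    img[mask] = peak * rng.integers(0, 2, int(mask.sum()))  # salt or pepper
    return img
```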

3) Results: To assess Latent-KalmanNet in terms of tracking performance, latency, and robustness, we evaluate the MSE as well as latency and complexity. As the S&P noise is characterized by the probability p_r, when evaluating how performance scales with the noise level we report the MSE values versus log(1/p_r) for ease of visualization, where p_r ∈ {0.5, ..., 0.001} is mapped into log(1/p_r) ∈ {0.3, ..., 3}. We consider both the case of full information, where the SS model parameter (e.g., the state evolution function f(·)) is the same as that used for generating the trajectories, and the case of partial information, where this domain knowledge is mismatched.

Full Information: Here, we compare Latent-KalmanNet to the benchmark algorithms, where all algorithms have access to the state-evolution function f(·) used during data generation.

In our first experiment, the trajectory length presented during training was the same as in the test set, Ttrain = Ttest = 200. The resulting MSEs, reported in Fig. 10a, demonstrate that the proposed Latent-KalmanNet achieves the lowest MSE for all considered noise levels. The improvement due to adding an EKF on top of the pre-trained encoder, tracking in the newly learned latent space, is much less notable here compared with the improvement in the pendulum setup noted in Fig. 7. This follows since the S&P noise yields a latent representation in which the distribution of the distortion cannot be faithfully approximated as being Gaussian, while the EKF, as opposed to KalmanNet, is designed for Gaussian SS models. The purely data-driven RKN improves upon the model-based EKF, and is only slightly outperformed by the proposed Latent-KalmanNet.

The superiority of Latent-KalmanNet is also evident in Table II, which reports the MSEs along with their standard deviations, representing the confidence intervals of the estimators. It is observed that the improved estimates of Latent-KalmanNet are also consistently achieved more confidently, i.e., with a smaller standard deviation, compared with the competitor RKN. This behavior is also illustrated when observing a single filter trajectory in Fig. 8, where the original state tracked is the one depicted on the left side, and the different models' predictions are on the right.

Next, we examine the generalization of the filters to different trajectory lengths. This is done by training the systems with trajectories of length Ttrain = 200 while testing with notably longer trajectories of length Ttest = 2000. Fig. 10b demonstrates how the purely data-driven RKN struggles to generalize, achieving performance that is similar to the stand-alone encoder. However, our Latent-KalmanNet successfully generalizes to a much longer trajectory, as it learns how to track based on data, latent features, and domain knowledge, allowing it to cope with the SS model and not overfit to the trajectory length.

Partial Information: To evaluate the performance of Latent-KalmanNet under partial model information, we consider two sources of model mismatch in the Lorenz attractor setup. First, we examine state-evolution mismatch due to the use of a Taylor series approximation of insufficient order. In this study, both Latent-KalmanNet and the benchmark algorithms (Encoder + Prior and Encoder + Prior + EKF) use a crude approximation of the evolution dynamics obtained by computing (21) with J = 2, while the data was generated with an order J = 5 Taylor series expansion. We set the trajectory length to T = 200 (both Ttrain and Ttest), and Δ_t = 0.02. The results, depicted in Fig. 11a, demonstrate that applying a model-based EKF achieves performance which coincides with that of the Encoder + Prior. This stems from the fact that the mismatched model resulted in the EKF being unable to incorporate the state evolution to improve tracking, while the grid search for identifying the most suitable noise variance led to a Kalman gain computation such that the estimation relies almost solely on the instantaneous observation. More interestingly, Latent-KalmanNet with partial knowledge (of J = 2) learns to overcome this model mismatch and manages to come within a small gap of the performance of Latent-KalmanNet with full knowledge (of J = 5), outperforming its benchmark counterparts operating with the same level of partial information as well as the data-driven RKN. These findings suggest that Latent-KalmanNet is robust and effective, even when operating under partial model information of the system dynamics.
12

Next, we evaluate the performance of Latent-KalmanNet in the presence of sampling mismatch. We generate data from the Lorenz attractor SS model using a dense sampling rate ∆t = 0.001, and then sub-sample the corresponding observations by a ratio of 1/20 to obtain a decimated process with sample spacing of ∆t = 0.02. This results in a mismatch between the SS model and the discrete-time sequence, as the nonlinearity of the SS model induces a difference in distribution between the decimated data and data generated directly with sampling interval ∆t = 0.02. Such scenarios correspond to the practical setting of mismatches due to the processing of continuous-time signals using discrete-time approximations. For this setting we use the identity mapping for the sensing function h(·) and set T = 200 (both Ttrain and Ttest). The results, shown in Fig. 11b, demonstrate that Latent-KalmanNet outperforms both model-based tracking and the data-driven RKN, as its combination of learning capabilities with the available knowledge of the model dynamics allows it to overcome the mismatch induced by representing a continuous-time SS model in discrete time. As in the case of mismatched state evolution, we again observe that the model mismatch renders the EKF unable to improve performance, achieving the same performance as instantaneous detection using an Encoder + Prior.
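A minimal sketch of this data-generation procedure is given below; it uses a simple forward-Euler integrator and our own function names as illustrative assumptions (the actual simulator may differ):

```python
import numpy as np

def lorenz_step(x, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One forward-Euler step of the continuous-time Lorenz dynamics.
    dx = np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])
    return x + dt * dx

def decimated_trajectory(x0, T=200, dt_dense=0.001, ratio=20):
    # Simulate on the dense grid, then keep every `ratio`-th sample so the
    # decimated sequence has spacing ratio * dt_dense = 0.02.
    x, dense = np.asarray(x0, dtype=float), []
    for _ in range(T * ratio):
        x = lorenz_step(x, dt_dense)
        dense.append(x)
    return np.stack(dense)[ratio - 1 :: ratio]  # T samples at dt = 0.02
```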
Complexity and Latency: We conclude our numerical study by demonstrating that the performance benefits of Latent-KalmanNet do not come at the cost of increased computational complexity and latency, as is often the case when using deep models. In fact, we show that it achieves faster inference compared with both the model-based EKF in latent space and the data-driven RKN. To that aim, we provide an analysis of the average inference time of these tracking algorithms, computed over a test set comprised of 100 trajectories with Ttest = 200 time steps each. Inference time is computed on the same platform for all methods: an 11th Gen Lenovo laptop with an Intel Core i7 2.80 GHz processor, 16 GB of RAM, and the Windows 11 operating system. We also report the number of floating point operations required by each method, where the number of operations in applying a DNN is given by its number of trainable parameters.

The resulting complexity and latency measures of Latent-KalmanNet compared with Encoder + Prior + EKF and RKN are reported in Table III. These results reveal that Latent-KalmanNet achieves the fastest inference time, being not only notably faster than the purely DNN-based RKN, but also faster than the model-based EKF. The latter stems from the fact that the EKF needs to compute Jacobians and matrix inversions at each time instance to produce its Kalman gain, which turns out to be slower than applying the compact RNN used by KalmanNet for the same purpose, with the RNN also being amenable to parallelization and acceleration. The compactness of the internal RNN of Latent-KalmanNet results in it having a computational complexity similar to that of applying the EKF in latent space. These results, combined with the performance and robustness gains of Latent-KalmanNet noted in the previous studies, showcase the potential of Latent-KalmanNet in leveraging both data and domain knowledge for tracking with high-dimensional data while coping with the challenging C.1-C.4.

TABLE III: Latency and complexity comparison

                  Encoder+EKF    RKN      Latent-KalmanNet
Complexity (FP)   15573 + 243    42426    15573 + 2712
Latency (sec)     0.36           0.39     0.09
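The measurement protocol behind Table III can be scripted along the following lines. This is a sketch with placeholder names, assuming PyTorch-style models, and follows the convention stated above of counting a DNN's operations by its trainable parameters:

```python
import time
import torch

def count_flops(model):
    # Convention used in Table III: a DNN's operation count is taken as
    # its number of trainable parameters.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def mean_inference_time(tracker, test_set):
    # Average wall-clock inference time over the test trajectories
    # (e.g., 100 trajectories of T_test = 200 steps each).
    times = []
    for traj in test_set:
        start = time.perf_counter()
        tracker(traj)  # full filtering pass over the trajectory
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```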
V. CONCLUSIONS

In this work, we proposed a method for tracking based on complex observations with unknown noise statistics. Our proposed Latent-KalmanNet combines DNN-aided encoding with learned Kalman filtering based on KalmanNet in the latent space, and designs these modules to mutually benefit one another in a synergistic manner. The training scheme of Latent-KalmanNet exploits its interpretable architecture to formulate alternating training between the two learnable components, in order to learn a surrogate latent representation which best facilitates tracking. Our empirical evaluations demonstrate that the proposed Latent-KalmanNet successfully tracks from high-dimensional observations and generalizes to trajectories of different lengths. It also succeeds in operating with partial domain knowledge of the state-evolution function or under sampling mismatches, and is shown to infer with low latency.
APPENDIX

A. Additional Numerical Results

The results presented in the following tables provide a numerical account of the experiments that were demonstrated visually in Section IV. The tables offer a concise and precise presentation of the MSE at different noise levels, allowing the findings to be reconstructed; each table points to its corresponding figure. The tables also include confidence intervals (standard deviations), providing a measure of the precision of each estimate. In all tables, our Latent-KalmanNet exhibits the narrowest intervals, supporting its empirical superiority and strengthening the reliability of our results.

TABLE IV: Pendulum: contribution of the design steps (Fig. 7)

Noise level −log(r²)        6             15.2          23            30
Encoder after alternating   1.6 ± 0.7     −0.2 ± 0.66   −1.8 ± 0.61   −4.5 ± 0.56
Encoder                     0 ± 0.51      −1.2 ± 0.41   −3.1 ± 0.48   −4.9 ± 0.43
Encoder+Prior               −4 ± 0.37     −5.1 ± 0.32   −6.5 ± 0.38   −8.28 ± 0.34
Encoder+Prior+EKF           −4.8 ± 0.22   −6.1 ± 0.34   −7.3 ± 0.25   −9.3 ± 0.21
Latent-KalmanNet            −8.2 ± 0.18   −8.3 ± 0.2    −9.8 ± 0.14   −11.1 ± 0.11

TABLE V: Lorenz: performance with full domain knowledge and different trajectory lengths (Fig. 10b)

Noise level −log(pr)   0.3            1              2              3
RKN                    5.7 ± 1.6      −0.2 ± 2.4     −3.5 ± 2.8     −5.8 ± 1.5
Encoder                5.4 ± 0.79     −0.58 ± 0.71   −3.9 ± 0.75    −6 ± 0.8
Encoder+Prior          1.63 ± 0.51    −3.74 ± 0.6    −6.72 ± 0.58   −7.57 ± 0.45
Encoder+Prior+EKF      −0.61 ± 0.21   −4.34 ± 0.28   −7.11 ± 0.35   −7.91 ± 0.3
Latent-KalmanNet       −2.4 ± 0.2     −5.45 ± 0.16   −7.91 ± 0.08   −8.54 ± 0.12
TABLE VI: Lorenz: performance with partial domain knowledge; mismatched Taylor expansion of the state-evolution function (Fig. 11a)

Noise level −log(pr)     0.3            1              2              3
Encoder                  5.8 ± 1.5      −0.5 ± 0.9     −3.7 ± 1.1     −5.6 ± 1.3
Encoder+Prior            3.19 ± 0.83    −1.1 ± 0.8     −5.44 ± 0.73   −6.28 ± 0.7
Encoder+Prior+EKF        2.8 ± 0.59     −1.2 ± 0.7     −5.42 ± 0.61   −6.29 ± 0.56
RKN                      −1 ± 2.3       −4.2 ± 2.1     −6.9 ± 1.5     −7.8 ± 1.1
Latent-KalmanNet J = 2   −1.61 ± 0.3    −4.86 ± 0.29   −6.85 ± 0.42   −7.53 ± 0.31
Latent-KalmanNet J = 5   −1.91 ± 0.38   −4.92 ± 0.2    −7.2 ± 0.41    −7.94 ± 0.3

TABLE VII: Lorenz: performance with partial domain knowledge; decimation (sampling-rate mismatch, Fig. 11b)

Noise level −log(pr)   0.3           1              2              3
Encoder                7.7 ± 1.2     1.78 ± 1.3     −1.95 ± 0.9    −3.11 ± 0.95
Encoder+Prior          6.1 ± 0.72    0.9 ± 0.7      −2.8 ± 0.53    −3.2 ± 0.6
Encoder+Prior+EKF      5.6 ± 0.49    0.8 ± 0.51     −2.83 ± 0.38   −3.21 ± 0.4
RKN                    4.3 ± 1       0.3 ± 1.3      −3.1 ± 1.5     −3.5 ± 2.1
Latent-KalmanNet       3.3 ± 0.09    −0.46 ± 0.21   −3.4 ± 0.18    −3.6 ± 0.16
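The mean ± std entries in Tables IV-VII are obtained per trajectory. As a minimal sketch of how such interval statistics can be computed (our own code; the array layout, and the assumption that the MSE is reported in dB as the negative entries suggest, are ours):

```python
import numpy as np

def mse_db_with_std(x_true, x_hat):
    # x_true, x_hat: arrays of shape (num_trajectories, T, state_dim).
    # One MSE value per trajectory, averaged over time steps and state entries.
    per_traj_mse = np.mean((x_true - x_hat) ** 2, axis=(1, 2))
    per_traj_db = 10.0 * np.log10(per_traj_mse)
    # Mean MSE in dB and its standard deviation across trajectories.
    return per_traj_db.mean(), per_traj_db.std()
```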