
KalmanNet: Neural Network Aided Kalman Filtering for Partially Known Dynamics

Guy Revach, Nir Shlezinger, Xiaoyong Ni, Adrià López Escoriza, Ruud J. G. van Sloun, and Yonina C. Eldar

Abstract—State estimation of dynamical systems in real-time is a fundamental task in signal processing. For systems that are well-represented by a fully known linear Gaussian state space (SS) model, the celebrated Kalman filter (KF) is a low complexity optimal solution. However, both linearity of the underlying SS model and accurate knowledge of it are often not encountered in practice. Here, we present KalmanNet, a real-time state estimator that learns from data to carry out Kalman filtering under non-linear dynamics with partial information. By incorporating the structural SS model with a dedicated recurrent neural network module in the flow of the KF, we retain data efficiency and interpretability of the classic algorithm while implicitly learning complex dynamics from data. We numerically demonstrate that KalmanNet overcomes non-linearities and model mismatch, outperforming classic filtering methods operating with both mismatched and accurate domain knowledge.

(Parts of this work focusing on linear Gaussian state space models were presented in the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021 [1]. G. Revach, X. Ni and A. L. Escoriza are with the Institute for Signal and Information Processing (ISI), D-ITET, ETH Zürich, Switzerland (e-mail: [email protected]; [email protected]; [email protected]). N. Shlezinger is with the School of ECE, Ben-Gurion University of the Negev, Beer Sheva, Israel (e-mail: [email protected]). R. J. G. van Sloun is with the EE Dpt., Eindhoven University of Technology, and with Phillips Research, Eindhoven, The Netherlands (e-mail: [email protected]). Y. C. Eldar is with the Faculty of Math and CS, Weizmann Institute of Science, Rehovot, Israel (e-mail: [email protected]).)

I. INTRODUCTION

Estimating the hidden state of a dynamical system from noisy observations in real-time is one of the most fundamental tasks in signal processing and control, with applications in localization, tracking, and navigation [2]. In a pioneering work from the early 1960s [3]–[5], based on work by Wiener from 1949 [6], Rudolf Kalman introduced the Kalman filter (KF), a minimum mean-squared error (MMSE) estimator that is applicable to time-varying systems in discrete-time, which are characterized by a linear state space (SS) model with additive white Gaussian noise (AWGN). The low-complexity implementation of the KF, combined with its sound theoretical basis, resulted in it quickly becoming the leading workhorse of state estimation in systems that are well described by SS models in discrete-time. The KF has been applied to problems such as radar target tracking [7], trajectory estimation of ballistic missiles [8], and estimating the position and velocity of a space vehicle in the Apollo program [9].

While the original KF assumes linear SS models, many problems encountered in practice are governed by non-linear dynamical equations. Therefore, shortly after the introduction of the original KF, non-linear variations of it were proposed, such as the extended Kalman filter (EKF) [7], [8] and the unscented Kalman filter (UKF) [10]. Methods based on sequential Monte-Carlo (MC) sampling, such as the family of particle filters (PFs) [11]–[13], were introduced for state estimation in non-linear, non-Gaussian SS models. To date, the KF and its non-linear variants are still widely used for online filtering in numerous real world applications involving tracking and localization [14].

The common thread among these aforementioned filters is that they are model-based (MB) algorithms; namely, they rely on accurate knowledge and modeling of the underlying dynamics as a fully characterized SS model. As such, the performance of these MB methods critically depends on the validity of the domain knowledge and model assumptions. MB filtering algorithms designed to cope with some level of uncertainty in the SS models, e.g., [15]–[17], are rarely capable of achieving the performance of MB filtering with full domain knowledge, and rely on some knowledge of how much their postulated model deviates from the true one. In many practical use cases the underlying dynamics of the system are non-linear, complex, and difficult to accurately characterize as a tractable SS model, in which case degradation in the performance of the MB state estimators is expected.

Recent years have witnessed remarkable empirical success of deep neural networks (DNNs) in real-life applications. These data-driven (DD) parametric models were shown to be able to catch the subtleties of complex processes and replace the need to explicitly characterize the domain of interest [18], [19]. Therefore, an alternative strategy to implement state estimation—without requiring explicit and accurate knowledge of the SS model—is to learn this task from data using deep learning. DNNs such as recurrent neural networks (RNNs), i.e., long short-term memory (LSTM) [20] and gated recurrent unit (GRU) [21], and attention mechanisms [22] have been shown to perform very well for time series related tasks, mostly in intractable environments, by training these networks in an end-to-end model-agnostic manner from a large quantity of data. Nonetheless, DNNs do not incorporate domain knowledge such as structured SS models in a principled manner. Consequently, these DD approaches require many trainable parameters and large data sets even for simple sequences [23] and lack the interpretability of MB methods. These constraints limit the usage of highly parametrized DNNs for real-time state estimation in applications embedded in hardware-limited mobile devices such as drones and vehicular systems.

The limitations of MB Kalman filtering and DD state estimation motivate a hybrid approach that exploits the best of both worlds; i.e., the soundness and low complexity of the classic KF, and the model-agnostic nature of DNNs. Therefore,
we build upon the success of our previous work in MB deep learning for signal processing and digital communication applications [24]–[27] to propose a hybrid MB/DD online recursive filter, coined KalmanNet. In particular, we focus on real-time state estimation for continuous-value SS models for which the KF and its variants are designed. We assume that the noise statistics are unknown and the underlying SS model is partially known or approximated from a physical model of the system dynamics. To design KalmanNet, we identify the Kalman gain (KG) computation of the KF as a critical component encapsulating the dependency on the noise statistics and domain knowledge, and replace it by a compact RNN of limited complexity, which is integrated into the KF flow. The resulting system uses labeled data to learn to carry out Kalman filtering in a supervised manner.

Our main contributions are summarized as follows:
1) We design KalmanNet, which is an interpretable, low complexity, and data-efficient DNN-aided real-time state estimator. KalmanNet builds upon the flow and theoretical principles of the KF, incorporating partial domain knowledge of the underlying SS model in its operation.
2) By learning the KG, KalmanNet circumvents the dependency of the KF on knowledge of the underlying noise statistics, thus bypassing numerically problematic matrix inversions involved in the KF equations and overcoming the need for tailored solutions for non-linear systems; e.g., approximations to handle non-linearities as in the EKF.
3) We show that KalmanNet learns to carry out Kalman filtering from data in a manner that is invariant to the sequence length. Specifically, we present an efficient supervised training scheme that enables KalmanNet to operate with arbitrarily long trajectories while only training using short trajectories.
4) We evaluate KalmanNet in various SS models. The experimental scenarios include synthetic setups, tracking the chaotic Lorenz system, and localization using the Michigan NCLT data set [28]. KalmanNet is shown to converge much faster compared with purely DD systems, while outperforming the MB EKF, UKF, and PF when facing model mismatch and dominant non-linearities.

The proposed KalmanNet leverages data and partial domain knowledge to learn the filtering operation, rather than using data to explicitly estimate the missing SS model parameters. Although there is a large body of work that combines SS models with DNNs, e.g., [29]–[35], these approaches are sometimes for different SS related tasks (e.g., smoothing, imputation); with a different focus, e.g., incorporating high-dimensional visual observations into a KF; or under different assumptions, as we discuss in detail below.

The rest of this paper is organized as follows: Section II reviews the SS model and its associated tasks, and discusses related works. Section III details the proposed KalmanNet. Section IV presents the numerical study. Section V provides concluding remarks and future work.

Throughout the paper, we use boldface lower-case letters for vectors, and boldface upper-case letters for matrices. The transpose, ℓ2 norm, and stochastic expectation are denoted by {·}^T, ‖·‖, and E[·], respectively. The Gaussian distribution with mean µ and covariance Σ is denoted by N(µ, Σ). Finally, R and Z are the sets of real and integer numbers, respectively.

II. SYSTEM MODEL AND PRELIMINARIES

A. State Space Model

We consider dynamical systems characterized by a SS model in discrete-time [36]. We focus on (possibly) non-linear, Gaussian, and continuous SS models, which for each t ∈ Z are represented via

x_t = f(x_{t-1}) + e_t,   e_t ~ N(0, Q),   x_t ∈ R^m,   (1a)
y_t = h(x_t) + v_t,       v_t ~ N(0, R),   y_t ∈ R^n.   (1b)

In (1a), x_t is the latent state vector of the system at time t, which evolves from the previous state x_{t-1} by a (possibly) non-linear state-evolution function f(·) and by an AWGN e_t with covariance matrix Q. In (1b), y_t is the vector of observations at time t, which is generated from the current latent state vector by a (possibly) non-linear observation (emission) mapping h(·) corrupted by AWGN v_t with covariance R. For the special case where the evolution or the observation transformations are linear, there exist matrices F, H such that

f(x_{t-1}) = F · x_{t-1},   h(x_t) = H · x_t.   (2)

In practice, the state-evolution model (1a) is determined by the complex dynamics of the underlying system, while the observation model (1b) is dictated by the type and quality of the observations. For instance, x_t can determine the location, velocity, and acceleration of a vehicle, while y_t are measurements obtained from several sensors. The parameters of these models may be unknown and often require the introduction of dedicated mechanisms for their estimation in real-time [37], [38]. In some scenarios, one is likely to have access to an approximated or mismatched characterization of the underlying dynamics.

SS models are studied in the context of several different tasks; these tasks are different in their nature, and can be roughly classified into two main categories: observation approximation and hidden state recovery. The first category deals with approximating parts of the observed signal y_t. This can correspond to, e.g., the prediction of future observations given past observations; the generation of missing observations in a given block via imputation; and the denoising of the observations. The second category considers the recovery of a hidden state vector x_t. This family of state recovery tasks includes offline recovery, also referred to as smoothing, where one must recover a block of hidden state vectors given a block of observations, e.g., [35]. The focus of this paper is filtering; i.e., online recovery of x_t from past and current noisy observations {y_τ}_{τ=1}^{t}. For a given x_0, filtering involves the design of a mapping from y_t to x̂_t for every t ∈ {1, 2, ..., T} ≜ T, where T is the time horizon.
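To make the SS model (1) concrete, the following minimal sketch (not from the paper; the function names and the example 2 × 2 linear model are placeholders) draws a single ground-truth state trajectory and its noisy observations. Such state/observation pairs are exactly the kind of labeled trajectories used later for training and evaluation.

```python
import numpy as np

def simulate_trajectory(f, h, Q, R, x0, T, rng=None):
    """Draw one labeled trajectory (X, Y) from the SS model (1).

    f : state-evolution function, R^m -> R^m
    h : observation function,     R^m -> R^n
    Q : state-noise covariance (m x m), R : observation-noise covariance (n x n)
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = Q.shape[0], R.shape[0]
    X = np.zeros((T + 1, m))          # ground-truth states x_0, ..., x_T
    Y = np.zeros((T, n))              # observations y_1, ..., y_T
    X[0] = x0
    for t in range(1, T + 1):
        e_t = rng.multivariate_normal(np.zeros(m), Q)   # state noise, (1a)
        v_t = rng.multivariate_normal(np.zeros(n), R)   # observation noise, (1b)
        X[t] = f(X[t - 1]) + e_t
        Y[t - 1] = h(X[t]) + v_t
    return X, Y

# Example: a 2x2 linear model as in (2); F and H here are arbitrary placeholders.
F = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.eye(2)
X, Y = simulate_trajectory(lambda x: F @ x, lambda x: H @ x,
                           Q=0.01 * np.eye(2), R=0.1 * np.eye(2),
                           x0=np.zeros(2), T=100)
```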
B. Data-Aided Filtering Problem Formulation

The filtering problem is at the core of real-time tracking. Here, one must provide an instantaneous estimate of the state x_t based on each incoming observation y_t in an online manner. Our main focus is on scenarios where one has partial knowledge of the SS model that describes the underlying dynamics. Namely, we know (or have an approximation of) the state-evolution (transition) function f(·) and the state-observation (emission) function h(·). For real world applications, this knowledge is derived from our understanding of the system dynamics, its physical design, and the model of the sensors. As opposed to the classical assumptions in KF, the noise statistics Q and R are not known. More specifically, we assume:
• Knowledge of the distribution of the noise signals e_t and v_t is not available.
• The functions f(·) and h(·) may constitute an approximation of the true underlying dynamics. Such approximations can correspond to, for instance, the representation of continuous time dynamics in discrete time, acquisition using misaligned sensors, and other forms of mismatches.

While we focus on filtering in partially-known SS models, we assume that we have access to a labeled data set containing a sequence of observations and their corresponding ground truth states. In various scenarios of interest, one can assume access to some ground truth measurements in the design stage. For example, in field experiments it is possible to add extra sensors, both internally and externally, to collect the ground truth needed for training. It is also possible to compute the ground truth data using offline and more computationally intensive algorithms. Finally, the inference complexity of the learned filter should be of the same order as (and preferably smaller than) that of MB filters, such as the EKF.

C. Related Work

A key ingredient in recursive Bayesian filtering is the update operation; namely, the need to update the prior estimate using new observed information. For linear Gaussian SS models using the KF, this boils down to computing the KG. While the KF assumes linear SS models, many problems encountered in practice are governed by non-linear dynamics, for which one should resort to approximations. Several extensions of the KF were proposed to deal with non-linearities. The EKF [7], [8] is a quasi-linear algorithm based on an analytical linearization of the SS model. More recent non-linear variations are based on numerical integration: the UKF [10], the Gauss-Hermite quadrature [39], and the cubature KF [40]. For more complex SS models, and when the noise cannot be modeled as Gaussian, multiple variants of the PF were proposed that are based on sequential MC [11]–[13], [41]–[45]. These MC algorithms are considered to be asymptotically exact but relatively computationally heavy when compared to Kalman-based algorithms. These MB algorithms require accurate knowledge of the SS model, and their performance is typically degraded in the presence of model mismatch.

The combination of machine learning and SS models, and specifically Kalman-based algorithms, is the focus of growing research attention. To frame the current work in the context of existing literature, we focus on the approaches that preserve the general structure of the SS model. The conventional approach to deal with partially known SS models is to impose a parametric model and then estimate its parameters. This can be achieved by jointly learning the parameters and state sequence using expectation maximization [46]–[48] and Bayesian probabilistic algorithms [37], [38], or by selecting from a set of a priori known models [49]. When training data is available, it is commonly used to tune the missing parameters in advance, in a supervised or an unsupervised manner, as done in [50]–[52]. The main drawback of these strategies is that they are restricted to an imposed parametric model on the underlying dynamics (e.g., Gaussian noises).

When one can bound the uncertainty in the SS model in advance, an alternative approach to learning is to minimize the worst-case estimation error among all expected SS models. Such robust variations were proposed for various state estimation algorithms, including Kalman variants [15]–[17], [53] and particle filters [54], [55]. The fact that these approaches aim to design the filter to be suitable for multiple different SS models typically results in degraded performance compared to operating with known dynamics.

When the underlying system's dynamics are complex and only partially known, or the emission model is intractable and cannot be captured in a closed form—e.g., visual observations as in a computer vision task [56]—one can resort to approximations and to the use of DNNs. Variational inference [57]–[59] is commonly used in connection with SS models, as in [29]–[31], [33], [34], by casting the Bayesian inference task to optimization of a parameterized posterior and maximizing an objective. Such approaches cannot typically be applied directly to state recovery in real-time, as we consider here, and the learning procedure tends to be complex and prone to approximation errors.

A common strategy when using DNNs is to encode the observations into some latent space that is assumed to obey a simple SS model, typically a linear Gaussian one, and track the state in the latent domain as in [56], [60], [61], or to use DNNs to estimate the parameters of the SS model as in [62], [63]. Tracking in the latent space can also be extended by applying a DNN decoder to the estimated state to return to the observations domain, while training the overall system end-to-end [31], [64]. The latter allows designing trainable systems for recovering missing observations and predicting future ones by assuming that the temporal relationship can be captured as an SS model in the latent space. This form of DNN-aided systems is typically designed for unknown or highly complex SS models, while we focus in this work on setups with partial domain knowledge, as detailed in Subsection II-B. Another line of work combines RNNs [65], or variational inference [32], [66], with MC based sampling. Also related is the work [35], which used learned models in parallel with MB algorithms operating with full knowledge of the SS model, applying a graph neural network in parallel to the Kalman smoother to improve its accuracy via neural augmentation. Estimation was performed by an iterative message passing over the entire time horizon. This approach is suitable for the smoothing task and is computationally intensive, and therefore may not be suitable for real-time filtering [67].
D. Model-Based Kalman Filtering

Our proposed KalmanNet, detailed in the following section, is based on the MB KF, which is a linear recursive estimator. In every time step t, the KF produces a new estimate x̂_t using only the previous estimate x̂_{t-1} as a sufficient statistic and the new observation y_t. As a result, the computational complexity of the KF does not grow in time. We first describe the original algorithm for linear SS models, as in (2), and then discuss how it is extended into the EKF for non-linear SS models.

The KF can be described by a two-step procedure: prediction and update, where in each time step t ∈ T it computes the first- and second-order statistical moments.
1) The first step predicts the current a priori statistical moments based on the previous a posteriori estimates. Specifically, the moments of x are computed using the knowledge of the evolution matrix F as

x̂_{t|t-1} = F · x̂_{t-1|t-1},   (3a)
Σ_{t|t-1} = F · Σ_{t-1|t-1} · F^T + Q,   (3b)

and the moments of the observations y are computed based on the knowledge of the observation matrix H as

ŷ_{t|t-1} = H · x̂_{t|t-1},   (4a)
S_{t|t-1} = H · Σ_{t|t-1} · H^T + R.   (4b)

2) In the update step, the a posteriori state moments are computed based on the a priori moments as

x̂_{t|t} = x̂_{t|t-1} + K_t · Δy_t,   (5a)
Σ_{t|t} = Σ_{t|t-1} − K_t · S_{t|t-1} · K_t^T.   (5b)

Here, K_t is the KG, and it is given by

K_t = Σ_{t|t-1} · H^T · S_{t|t-1}^{-1}.   (6)

The term Δy_t is the innovation; i.e., the difference between the predicted observation and the observed value, and it is the only term that depends on the observed data:

Δy_t = y_t − ŷ_{t|t-1}.   (7)

The EKF extends the KF for non-linear f(·) and/or h(·), as in (1). Here, the first-order statistical moments (3a) and (4a) are replaced with

x̂_{t|t-1} = f(x̂_{t-1}),   (8a)
ŷ_{t|t-1} = h(x̂_{t|t-1}),   (8b)

respectively. The second-order moments, however, cannot be propagated through the non-linearity, and must thus be approximated. The EKF linearizes the differentiable f(·) and h(·) in a time-dependent manner using their partial derivative matrices, also known as Jacobians, evaluated at x̂_{t-1|t-1} and x̂_{t|t-1}. Namely,

F̂_t = J_f(x̂_{t-1|t-1}),   (9a)
Ĥ_t = J_h(x̂_{t|t-1}),   (9b)

where F̂_t is plugged into (3b) and Ĥ_t is used in (4b) and (6). When the SS model is linear, the EKF coincides with the KF, which achieves the MMSE for linear Gaussian SS models.

Fig. 1: EKF block diagram. Here, Z^{-1} is the unit delay.

An illustration of the EKF is depicted in Fig. 1. The resulting filter admits an efficient linear recursive structure. However, it requires full knowledge of the underlying model and notably degrades in the presence of model mismatch. When the model is highly non-linear, the local linearity approximation may not hold, and the EKF can result in degraded performance. This motivates the augmentation of the EKF into the deep learning-aided KalmanNet, detailed next.
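For reference, here is a compact sketch of the EKF recursion (3)-(9) described above. It is a generic textbook-style implementation under the stated assumptions (known f, h, their Jacobians, and the noise covariances Q, R), not the authors' code.

```python
import numpy as np

def ekf_step(x_post, Sigma_post, y, f, h, Jf, Jh, Q, R):
    """One EKF time step: predict (3)-(4) with the Jacobians (9), then update (5)-(7)."""
    # Predict the first- and second-order moments.
    F = Jf(x_post)                          # (9a), evaluated at the previous posterior
    x_prior = f(x_post)                     # (8a)
    Sigma_prior = F @ Sigma_post @ F.T + Q  # (3b)

    H = Jh(x_prior)                         # (9b), evaluated at the prior
    y_prior = h(x_prior)                    # (8b)
    S = H @ Sigma_prior @ H.T + R           # (4b)

    # Update using the Kalman gain (6) and the innovation (7).
    K = Sigma_prior @ H.T @ np.linalg.inv(S)        # (6)
    dy = y - y_prior                                # (7)
    x_post_new = x_prior + K @ dy                   # (5a)
    Sigma_post_new = Sigma_prior - K @ S @ K.T      # (5b)
    return x_post_new, Sigma_post_new

def ekf_filter(Y, x0, Sigma0, f, h, Jf, Jh, Q, R):
    """Run the EKF over a trajectory of observations Y = [y_1, ..., y_T]."""
    x, Sigma = x0, Sigma0
    X_hat = []
    for y in Y:
        x, Sigma = ekf_step(x, Sigma, y, f, h, Jf, Jh, Q, R)
        X_hat.append(x)
    return np.stack(X_hat)
```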
III. KALMANNET

Here, we present KalmanNet; a hybrid, interpretable, data efficient architecture for real-time state estimation in non-linear dynamical systems with partial domain knowledge. KalmanNet combines MB Kalman filtering with an RNN to cope with model mismatch and non-linearities. To introduce KalmanNet, we begin by explaining its high level operation in Subsection III-A. Then we present the features processed by its internal RNN and the specific architectures considered for implementing and training KalmanNet in Subsections III-B–III-D. Finally, we provide a discussion in Subsection III-E.

A. High Level Architecture

We formulate KalmanNet by identifying the specific computations of the EKF that are based on unavailable knowledge. As detailed in Subsection II-B, the functions f(·) and h(·) are known (though perhaps inaccurately); yet the covariance matrices Q and R are unavailable. These missing statistical moments are used in MB Kalman filtering only for computing the KG (see Fig. 1). Thus, we design KalmanNet to learn the KG from data, and combine the learned KG in the overall KF flow. This high level architecture is illustrated in Fig. 2.

Fig. 2: KalmanNet block diagram.

In each time instance t ∈ T, similarly to the EKF, KalmanNet estimates x̂_t in two steps: prediction and update.
1) The prediction step is the same as in the MB EKF, except that only the first-order statistical moments are predicted. In particular, a prior estimate for the current state x̂_{t|t-1} is computed from the previous posterior x̂_{t-1} via (8a). Then, a prior estimate for the current observation ŷ_{t|t-1} is computed from x̂_{t|t-1} via (8b). As opposed to its MB counterparts, KalmanNet does not rely on knowledge of the noise distribution, and does not maintain an explicit estimate of the second-order statistical moments.
2) In the update step, KalmanNet uses the new observation y_t to compute the current state posterior x̂_t from the previously computed prior x̂_{t|t-1} in a similar manner to the MB KF as in (5a), i.e., using the innovation term Δy_t computed via (7) and the KG K_t. As opposed to the MB EKF, here the computation of the KG is not given explicitly; rather, it is learned from data using an RNN, as illustrated in Fig. 2. The inherent memory of RNNs allows to implicitly track the second-order statistical moments without requiring knowledge of the underlying noise statistics.

Designing an RNN to learn how to compute the KG as part of an overall KF flow requires answers to three key questions:
1) From which input features (signals) will the network learn the KG?
2) What should be the architecture of the internal RNN?
3) How will this network be trained from data?
In the following sections we address these questions.

B. Input Features

The MB KF and its variants compute the KG from knowledge of the underlying statistics. To implement such computations in a learned fashion, one must provide input (features) that capture the knowledge needed to evaluate the KG to a neural network. The dependence of K_t on the statistics of the observations and the state process indicates that in order to track it, in every time step t ∈ T, the RNN should be provided with input containing statistical information of the observations y_t and the state estimate x̂_{t-1}. Therefore, the following quantities that are related to the unknown statistical relationship of the SS model can be used as input features to the RNN:
F1 The observation difference Δỹ_t = y_t − y_{t-1}.
F2 The innovation difference Δy_t = y_t − ŷ_{t|t-1}.
F3 The forward evolution difference Δx̃_t = x̂_{t|t} − x̂_{t-1|t-1}. This quantity represents the difference between two consecutive posterior state estimates, where in time instance t, the available feature is Δx̃_{t-1}.
F4 The forward update difference Δx̂_t = x̂_{t|t} − x̂_{t|t-1}, i.e., the difference between the posterior state estimate and the prior state estimate, where again in time instance t we use Δx̂_{t-1}.
Features F1 and F3 encapsulate information about the state-evolution process, while features F2 and F4 encapsulate the uncertainty of our state estimate. The difference operation removes the predictable components, and thus the time series of differences is mostly affected by the noise statistics that we wish to learn. The RNN described in Fig. 2 can use all the features, although extensive empirical evaluation suggests that the specific choice of combination of features depends on the problem at hand. Our empirical observations indicate that good combinations are {F1, F2, F4} and {F1, F3, F4}.

C. Neural Network Architecture

The internal DNN of KalmanNet uses the features discussed in the previous section to compute the KG. It follows from (6) that computing the KG K_t involves tracking the second-order statistical moments Σ_t. The recursive nature of the KG computation indicates that its learned module should involve an internal memory element, such as an RNN, to track it.

We consider two architectures for the KG computing RNN. The first, illustrated in Fig. 3, aims at using the internal memory of RNNs to jointly track the underlying second-order statistical moments required for computing the KG in an implicit manner. To that aim, we use GRU cells [21] whose hidden state is of the size of some integer product of m² + n², which is the joint dimensionality of the tracked moments Σ̂_{t|t-1} in (3b) and Ŝ_t in (4b). In particular, we first use a fully connected (FC) input layer whose output is the input to the GRU. The GRU state vector h_t is mapped into the estimated KG K_t ∈ R^{m×n} using an output FC layer with m · n neurons. While the illustration in Fig. 3 uses a single GRU layer, one can also utilize multiple layers to increase the capacity and abstractness of the network, as we do in the numerical study reported in Subsection IV-E. The proposed architecture does not directly design the hidden state of the GRU to correspond to the unknown second-order statistical moments that are tracked by the MB KF. As such, it uses a relatively large number of state variables that are expected to provide the required tracking capacity. For example, in the numerical study in Section IV we set the dimensionality of h_t to be 10 · (m² + n²). This often results in substantial over-parameterization, as the number of GRU parameters grows quadratically with the number of state variables [68].

Fig. 3: KalmanNet RNN block diagram (architecture #1). The architecture is comprised of a fully connected input layer, followed by a GRU layer (whose internal division into gates is illustrated [21]), and an output fully connected layer. Here, the input features are F2 and F4.
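The following PyTorch-style sketch illustrates architecture #1 as just described: a fully connected input layer, a GRU whose hidden state has dimension 10·(m² + n²), and an output layer with m·n neurons producing the KG. It is an illustrative reimplementation, not the released code; the choice of input features (here F2 and F4, as in Fig. 3), their normalization, and other details may differ from the official implementation.

```python
import torch
import torch.nn as nn

class KalmanGainRNN(nn.Module):
    """Architecture #1: FC input layer -> GRU -> FC output layer producing K_t."""

    def __init__(self, m, n, hidden_mult=10):
        super().__init__()
        self.m, self.n = m, n
        hidden = hidden_mult * (m * m + n * n)   # joint dimensionality of the tracked moments
        in_dim = n + m                           # features F2 (innovation diff.) and F4 (update diff.)
        self.fc_in = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.gru = nn.GRU(input_size=hidden, hidden_size=hidden)
        self.fc_out = nn.Linear(hidden, m * n)   # m * n neurons -> entries of the Kalman gain

    def forward(self, dy, dx_prev, h):
        """dy: (n,) innovation difference, dx_prev: (m,) update difference, h: GRU hidden state."""
        feat = torch.cat([dy, dx_prev]).reshape(1, 1, -1)   # (seq=1, batch=1, in_dim)
        z = self.fc_in(feat)
        out, h = self.gru(z, h)
        K = self.fc_out(out).reshape(self.m, self.n)        # learned Kalman gain K_t
        return K, h

    def init_hidden(self):
        return torch.zeros(1, 1, self.gru.hidden_size)
```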
The second architecture uses separate GRU cells for each tracked second-order statistical moment. The division of the architecture into separate GRU cells and FC layers and their interconnection is illustrated in Fig. 4. As shown in the figure, the network is composed of three GRU layers, connected in a cascade with dedicated input and output FC layers. The first GRU layer tracks the unknown state noise covariance Q, thus tracking m² variables. Similarly, the second and third GRUs track the predicted moments Σ̂_{t|t-1} (3b) and Ŝ_t (4b), thus having m² and n² hidden state variables, respectively. The GRUs are interconnected such that the learned Q is used to compute Σ̂_{t|t-1}, which in turn is used to obtain Ŝ_t, while both Σ̂_{t|t-1} and Ŝ_t are involved in producing K_t (6). This architecture, which is composed of a non-standard interconnection between GRUs and FC layers, is more directly tailored towards the formulation of the SS model and the operation of the MB KF compared with the simpler first architecture. As such, it provides less abstraction; i.e., it is expected to be more constrained in the family of mappings it can learn compared with the first architecture, while as a result also requiring fewer trainable parameters. For instance, in the numerical study reported in Subsection IV-D, utilizing the first architecture requires on the order of 5 · 10^5 trainable parameters, while the second architecture utilizes merely 2.5 · 10^4 parameters.

Fig. 4: KalmanNet RNN block diagram (architecture #2). The input features are used to update three GRUs with dedicated FC layers, and the overall interconnection between the blocks is based on the flow of the KG computation in the MB KF.

D. Training Algorithm

KalmanNet is trained using the available labeled data set in a supervised manner. While we use a neural network for computing the KG rather than for directly producing the estimate x̂_{t|t}, we train KalmanNet end-to-end. Namely, we compute the loss function L based on the state estimate x̂_t, which is not the output of the internal RNN. Since this vector takes values in a continuous set R^m, we use the squared-error loss

L = ‖x_t − x̂_{t|t}‖²,   (10)

which is also used to evaluate the MB KF. By doing so, we build upon the ability to backpropagate the loss to the computation of the KG. One can obtain the loss gradient with respect to the KG from the output of KalmanNet since

∂L/∂K_t = ∂‖K_t · Δy_t − Δx_t‖² / ∂K_t = 2 · (K_t · Δy_t − Δx_t) · Δy_t^T,   (11)

where Δx_t ≜ x_t − x̂_{t|t-1}. The gradient computation in (11) indicates that one can learn the computation of the KG by training KalmanNet end-to-end using the squared-error loss. In particular, this allows to train the overall filtering system without having to externally provide ground truth values of the KG for training purposes.

The data set used for training comprises N trajectories that can be of varying lengths. Namely, by letting T_i be the length of the ith training trajectory, the data set is given by D = {(Y_i, X_i)}_{i=1}^{N}, where

Y_i = [y_1^(i), ..., y_{T_i}^(i)],   X_i = [x_0^(i), x_1^(i), ..., x_{T_i}^(i)].   (12)

By letting Θ denote the trainable parameters of the RNN, and γ be a regularization coefficient, we then construct an ℓ2-regularized mean-squared error (MSE) loss measure

ℓ_i(Θ) = (1 / T_i) · Σ_{t=1}^{T_i} ‖x̂_t(y_t^(i); Θ) − x_t^(i)‖² + γ · ‖Θ‖².   (13)

To optimize Θ, we use a variant of mini-batch stochastic gradient descent in which for every batch indexed by k, we choose M < N trajectories indexed by i_1^k, ..., i_M^k, computing the mini-batch loss as

L_k(Θ) = (1 / M) · Σ_{j=1}^{M} ℓ_{i_j^k}(Θ).   (14)

Since KalmanNet is a recursive architecture with both an external recurrence and an internal RNN, we use the backpropagation through time (BPTT) algorithm [69] to train it. Specifically, we unfold KalmanNet across time with shared network parameters, and then compute a forward and backward gradient estimation pass through the network. We consider three different variations of applying the BPTT algorithm for training KalmanNet (a sketch of the resulting training loop follows the list):
V1 Direct application of BPTT, where for each training iteration the gradients are computed over the entire trajectory.
V2 An application of the truncated BPTT algorithm [70]. Here, given a data set of long trajectories (e.g., T = 3000 time steps), each long trajectory is divided into multiple short trajectories (e.g., T = 100 time steps), which are shuffled and used during training.
V3 An alternative application of truncated BPTT, where we truncate each trajectory to a fixed (and relatively short) length, and train using these short trajectories.
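The sketch below illustrates the supervised end-to-end training described above: KalmanNet is unrolled over each trajectory, the per-trajectory MSE (13) and the mini-batch loss (14) are accumulated on the state estimates, and BPTT propagates the gradient back through the KG computation as in (11). The `kalman_net(y_t, state)` interface and the `init_state` helper are hypothetical placeholders for one KalmanNet predict/update step; the ℓ2 term of (13) is delegated here to the optimizer's weight decay (the γ of (13)).

```python
import torch

def train_epoch(kalman_net, optimizer, dataset, batch_size):
    """One epoch of mini-batch SGD on the MSE loss (13)-(14) with BPTT over each trajectory."""
    kalman_net.train()
    perm = torch.randperm(len(dataset)).tolist()
    for start in range(0, len(dataset), batch_size):
        batch = [dataset[i] for i in perm[start:start + batch_size]]
        optimizer.zero_grad()
        batch_loss = 0.0
        for Y, X in batch:                          # Y: (T, n) observations, X: (T + 1, m) ground truth
            state = kalman_net.init_state(X[0])     # hypothetical helper: start the recursion from x_0
            traj_loss = 0.0
            for t in range(Y.shape[0]):
                x_hat, state = kalman_net(Y[t], state)        # hypothetical one predict/update step
                traj_loss = traj_loss + torch.sum((x_hat - X[t + 1]) ** 2)
            batch_loss = batch_loss + traj_loss / Y.shape[0]  # per-trajectory average, as in (13)
        batch_loss = batch_loss / len(batch)                  # mini-batch average, as in (14)
        batch_loss.backward()                                 # BPTT through the unrolled filter
        optimizer.step()

# optimizer = torch.optim.Adam(kalman_net.parameters(), lr=1e-3, weight_decay=gamma)
```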
Overall, directly applying BPTT via V1 may be computationally expensive and unstable. Therefore, a favorable approach is to first use the truncated BPTT as in V2 as a warm-up phase (train first on short trajectories) in order to stabilize the learning process, after which KalmanNet is tuned using V1. The procedure in V3 is most suitable for systems that are known to be likely to quickly converge to a steady-state (e.g., linear SS models). In our numerical study reported in Section IV we utilize all three approaches.

E. Discussion

KalmanNet is designed to operate in a hybrid DD/MB manner, combining deep learning with the classical EKF procedure. By identifying the specific noise-model-dependent computations of the EKF and replacing them with a dedicated RNN integrated in the EKF flow, KalmanNet benefits from the individual strengths of both DD and MB approaches. The augmentation of the EKF with dedicated deep learning modules results in several core differences between KalmanNet and its MB counterpart. Unlike the MB EKF, KalmanNet does not attempt to linearize the SS model, and does not impose a statistical model on the noise signals. In addition, KalmanNet filters in a non-linear manner, as its KG matrix depends on the input y_t. Due to these differences, compared to MB Kalman filtering, KalmanNet is more robust to model mismatch and can infer more efficiently, as demonstrated in Section IV. In particular, the MB EKF is sensitive to inaccuracies in the underlying SS model, e.g., in f(·) and h(·), while KalmanNet can overcome such uncertainty by learning an alternative KG that yields accurate estimation.

Furthermore, KalmanNet is derived for SS models whose noise statistics are not specified explicitly. A MB approach to tackle this without relying on data employs the robust Kalman filter [15]–[17], which designs the filter to minimize the maximal MSE within some range of assumed SS models, at the cost of a performance loss compared to knowing the true model. When one has access to data, the direct strategy to implement the EKF in such setups is to use the data to estimate Q and R, either directly from the data or by backpropagating through the operation of the EKF as in [51], and utilize these estimates to compute the KG. As covariance estimation can be a challenging task when dealing with high-dimensional signals, KalmanNet bypasses this need by directly learning the KG, and by doing so approaches the MSE of MB Kalman filtering with full knowledge of the SS model, as demonstrated in Section IV. Finally, the computation complexity for each time step t ∈ T is also linear in the RNN dimensions and does not involve matrix inversion. This implies that KalmanNet is a good candidate to apply for high dimensional SS models and on computationally limited devices.

Compared to purely DD state estimation, KalmanNet benefits from its model-awareness and the fact that its operation follows the flow of MB Kalman filtering rather than being utilized as a black box. As numerically observed in Section IV, KalmanNet achieves improved MSE compared to utilizing RNNs for end-to-end state estimation, and also approaches the MMSE performance achieved by the MB KF in linear Gaussian SS models. Furthermore, the fact that KalmanNet preserves the flow of the EKF implies that the intermediate features exchanged between its modules have a specific operational meaning, providing interpretability that is often scarce in end-to-end deep learning systems. Finally, the fact that KalmanNet learns to compute the KG indicates the possibility of providing not only estimates of the state x_t, but also a measure of confidence in this estimate, as the KG can be related to the covariance of the estimate, as initially explored in [71].

These combined gains of KalmanNet over purely MB and DD approaches were recently observed in [72], which utilized an early version of KalmanNet for real-time velocity estimation in an autonomous racing car. In such a setup, a non-linear MB mixed KF was traditionally used, and suffered from performance degradation due to inherent mismatches in the formulation of the SS model describing the problem. Nonetheless, previously proposed DD techniques relying on RNNs for end-to-end state estimation were not operable at the desired frequencies on the hardware-limited vehicle control unit. It was shown in [72] that the application of KalmanNet allowed to achieve improved real-time velocity tracking compared to MB techniques while being deployed on the control unit of the vehicle.

Our design of KalmanNet gives rise to many interesting future extensions. Since we focus here on SS models where the mappings f(·) and h(·) are known up to some approximation errors, a natural extension of KalmanNet is to use the data to pre-estimate them, as demonstrated briefly in the numerical study. Another alternative to cope with these approximation errors is to utilize dedicated neural networks to learn these mappings while training the entire model in an end-to-end fashion. Doing so is expected to allow KalmanNet to be utilized in scenarios with analytically intractable SS models, as often arises when tracking based on unstructured observations, e.g., visual observations as in [56].

While we train KalmanNet in a supervised manner using labeled data, the fact that it preserves the operation of the MB EKF, which produces a prediction of the next observation ŷ_{t|t-1} at each time instance, indicates the possibility of using this intermediate feature for unsupervised training. One can thus envision KalmanNet being trained offline in a supervised manner, while tracking variations in the underlying SS model at run-time by online self-supervision, following a similar rationale to that used in [24], [25] for deep symbol detection in time-varying communication channels.

Finally, we note that while we focus here on filtering tasks, SS models are used to represent additional related problems such as smoothing and prediction, as discussed in Subsection II-A. The fact that KalmanNet does not explicitly estimate the SS model implies that it cannot simply substitute these parameters into an alternative algorithm capable of carrying out tasks other than filtering. Nonetheless, one can still design DNN-aided algorithms for these tasks operating with partially known SS models as extensions of KalmanNet, in the same manner as many MB algorithms build upon the KF. For instance, as the MB KF constitutes the first part of the
Rauch-Tung-Striebel smoother [73], one can extend KalmanNet to implement high-performance smoothing in partially known SS models, as we have recently begun investigating in [67]. Nonetheless, we leave the exploration of extensions of KalmanNet to alternative tasks associated with SS models for future work.

IV. EXPERIMENTS AND RESULTS

In this section we present an extensive numerical study of KalmanNet¹, evaluating its performance in multiple setups and comparing it to various benchmark algorithms:
(a) In our first experimental study we consider multiple linear SS models, and compare KalmanNet to the MB KF, which is known to minimize the MSE in such a setup. We also confirm our design and architectural choices by comparing KalmanNet with alternative RNN based end-to-end state estimators.
(b) We next consider two non-linear SS models, a sinusoidal model, and the chaotic Lorenz attractor. We compare KalmanNet with the common non-linear MB benchmarks; namely, the EKF, UKF, and PF.
(c) In our last study we consider a localization use case based on the Michigan NCLT data set [28]. Here, we compare KalmanNet with a MB KF that assumes a linear Wiener kinematic model [36] and with a vanilla RNN based end-to-end state estimator, and demonstrate the ability of KalmanNet to track real world dynamics that were not synthetically generated from an underlying SS model.

¹The source code used in our numerical study along with the complete set of hyperparameters used in each numerical evaluation can be found online at https://github.com/KalmanNet/KalmanNet_TSP.

A. Experimental Setting

Throughout the numerical study and unless stated otherwise, in the experiments involving synthetic data, the SS model is generated using diagonal noise covariance matrices; i.e.,

Q = q² · I,   R = r² · I,   ν ≜ q² / r².   (15)

By (15), setting ν to be 0 dB implies that both the state noise and the observation noise have the same variance. For consistency, we use the term full information for cases where the SS model available to KalmanNet and its MB counterparts accurately represents the underlying dynamics. More specifically, KalmanNet operates with full knowledge of f(·) and h(·), and without access to the noise covariance matrices, while its MB counterparts operate with an accurate knowledge of Q and R. The term partial information refers to the case where KalmanNet and its MB counterparts operate with some level of model mismatch, where the SS model design parameters do not represent the underlying dynamics accurately (i.e., are not equal to the SS parameters from which the data was generated). Unless stated otherwise, the metric used to evaluate performance is the MSE in [dB] scale. In the figures we depict the MSE in [dB] versus the inverse observation noise level, i.e., 1/r², also in [dB] scale. In some of our experiments, we evaluate both the MSE and its standard deviation, where we denote these measures by µ̂ and σ̂, respectively.

1) KalmanNet Setting: In Section III we present several architectures and training mechanisms that can be used when implementing KalmanNet. In our experimental study we consider four different configurations of KalmanNet:
C1 KalmanNet architecture #1 with input features {F2, F4} and with training algorithm V3.
C2 KalmanNet architecture #1 with input features {F2, F4} and with training algorithm V1.
C3 KalmanNet architecture #1 with input features {F1, F3, F4} and with training algorithm V2.
C4 KalmanNet architecture #2 with all input features and with training algorithm V1.
In all our experiments KalmanNet was trained using the Adam optimizer [74].

2) Model-Based Filters: In the following experimental study we compare KalmanNet with several MB filters. For the UKF we used the software package [75], while the PF is implemented based on [76] using 100 particles and without parallelization. During our numerical study, when model uncertainty was introduced, we optimized the performance of the MB algorithms by carefully tuning the covariance matrices, usually via a grid search. For long trajectories (e.g., T > 1500) it was sometimes necessary to tune these matrices even in the case of full information, to compensate for inaccurate uncertainty propagation due to non-linear approximations and to avoid divergence.

B. Linear State Space Model

Our first experimental study compares KalmanNet to the MB KF for different forms of synthetically generated linear system dynamics. Unless stated otherwise, here F takes the controllable canonical form.

1) Full Information: We start by comparing KalmanNet of setting C1 to the MB KF for the case of full information, where the latter is known to minimize the MSE. Here, we set H to take the inverse canonical form, and ν = 0 [dB]. To demonstrate the applicability of KalmanNet to various linear systems, we experimented with systems of different dimensions; namely, m × n ∈ {2 × 2, 5 × 5, 10 × 1}, and with trajectories of different lengths; namely, T ∈ {50, 100, 150, 200}. In Fig. 5a we can clearly observe that KalmanNet achieves the MMSE of the MB KF. Moreover, to further evaluate the gains of the hybrid architecture of KalmanNet, we check that its learning is transferable. Namely, in some of the experiments, we test KalmanNet on longer trajectories than those it was trained on, and with different initial conditions. The fact that KalmanNet achieves the MMSE lower bound also for these cases indicates that it indeed learns to implement Kalman filtering, and that it is not tailored to the trajectories presented during training, with dependency only on the SS model.
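As a small aid for reproducing the synthetic setups, the following sketch builds the diagonal covariances of (15) from ν (given in dB) and r², and evaluates the MSE in dB as reported in the figures and tables; the concrete values are placeholders.

```python
import numpy as np

def noise_covariances(nu_db, r2, m, n):
    """Build Q = q^2 I and R = r^2 I from (15), where nu = q^2 / r^2 is given in dB."""
    q2 = (10.0 ** (nu_db / 10.0)) * r2
    return q2 * np.eye(m), r2 * np.eye(n)

def mse_db(x_hat, x_true):
    """Empirical MSE of the state estimates, reported in dB as in the figures and tables."""
    mse = np.mean(np.sum((x_hat - x_true) ** 2, axis=-1))
    return 10.0 * np.log10(mse)

Q, R = noise_covariances(nu_db=0.0, r2=1.0, m=2, n=2)   # nu = 0 dB: equal noise variances
```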
Fig. 5: Linear SS model with full information. (a) KalmanNet converges to MMSE. (b) Learning curves for DD state estimation.
2) Neural Model Selection: Next, we evaluate and confirm our design and architectural choices by considering a 2 × 2 setup (similar to the previous one), and by comparing KalmanNet with setting C1 to two RNN based architectures of similar capacity applied for end-to-end state estimation:
• Vanilla RNN directly maps the observed y_t to an estimate of the state x̂_t.
• MB RNN imitates the Kalman filtering operation by first recovering x̂_{t|t-1} using domain knowledge, i.e., via (3a), and then uses the RNN to estimate an increment Δx̂_t from the prior to the posterior.
All RNNs utilize the same architecture as in KalmanNet with a single GRU layer and the same learning hyperparameters. In this experiment we test the trained models on trajectories with the same length as they were trained on, namely T = 20. We can clearly observe how each of the key design considerations of KalmanNet affects the learning curves depicted in Fig. 5b:
• The incorporation of the known SS model allows the MB RNN to outperform the vanilla RNN, although both converge slowly and fail to achieve the MMSE.
• Using the sequences of differences as input notably improves the convergence rate of the MB RNN, indicating the benefits of using the differences as features, as discussed in Subsection III-B.
• Learning is further improved by using the RNN for recovering the KG as part of the KF flow, as done by KalmanNet, rather than for directly estimating x_t.
To further evaluate the gains of KalmanNet over end-to-end RNNs, we compare the pre-trained models using trajectories with different initial conditions and a longer time horizon (T = 200) than the one on which they were trained (T = 20). The results, summarized in Table I, show that KalmanNet maintains achieving the MMSE, as already observed in Fig. 5a. The MB RNN and vanilla RNN are more than 50 [dB] from the MMSE, implying that their learning is not transferable and that they do not learn to implement Kalman filtering. However, when provided with the difference features as we proposed in Subsection III-B, the DD systems are shown to be applicable to longer trajectories, with KalmanNet achieving MSE within a minor gap of that achieved by the MB KF. The results of this study validate the considerations used in designing KalmanNet for the DD filtering problem discussed in Subsection II-B.

TABLE I: Test MSE in [dB] when trained using T = 20.
Test T | Vanilla RNN | MB RNN | MB RNN, diff. | KalmanNet | KF
20     | -20.98      | -21.53 | -21.92        | -21.92    | -21.97
200    | 58.14       | 36.8   | -21.88        | -21.90    | -21.91

3) Partial Information: To conclude our study on linear models, we next evaluate the robustness of KalmanNet to model mismatch as a result of partial model information. We simulate a 2 × 2 SS model with mismatches in either the state-evolution model (F) or in the state-observation model (H).

State-Evolution Mismatch: Here, we set T = 20 and ν = 0 [dB], and use a rotated evolution matrix F_{α°}, α ∈ {10°, 20°}, for data generation. The state-evolution matrix available to the filters, denoted F_0, is again set to take the controllable canonical form. The mismatched design matrix F_0 is related to the true F_{α°} via

F_{α°} = R^{xy}_{α°} · F_0,   R^{xy}_{α°} = [cos α  −sin α; sin α  cos α].   (16)

Such scenarios represent a setup in which the analytical approximation of the SS model differs from the true generative model. The resulting MSE curves depicted in Fig. 6a demonstrate that KalmanNet (with setting C2) achieves a 3 [dB] gain over the MB KF. In particular, despite the fact that KalmanNet implements the KF with an inaccurate state-evolution model, it learns to apply an alternative KG, resulting in MSE within a minor gap from the MMSE; i.e., from the KF with the true F_{α°} plugged in.

State-Observation Mismatch: Next, we simulate a setup with state-observation mismatch while setting T = 100 and ν = −20 [dB]. The model mismatch is achieved by using a rotated observation matrix H_{α=10°} for data generation, while using H = I as the observation design matrix. Such scenarios represent a setup in which a slight misalignment (≈ 5%) of the sensors exists. The resulting MSE depicted in Fig. 6b demonstrates that KalmanNet (with setting C2) converges to within a minor gap from the MMSE. Here, we also performed an additional experiment, where we first estimated the observation matrix from data, and then had KalmanNet use the estimated matrix, denoted Ĥ_α. In this case it is observed in Fig. 6b that KalmanNet achieves the MMSE lower bound. These results imply that KalmanNet converges also in distribution to the KF.
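The state-evolution mismatch of (16) can be emulated as in the following sketch, which rotates a nominal design matrix F_0 by α degrees to obtain the generative F_α; the particular 2 × 2 F_0 below is a placeholder and not the exact matrix used in the paper.

```python
import numpy as np

def rotated_evolution(F0, alpha_deg):
    """Return F_alpha = R^{xy}_alpha · F_0 as in (16)."""
    a = np.deg2rad(alpha_deg)
    R_xy = np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])
    return R_xy @ F0

F0 = np.array([[0.0, 1.0],       # placeholder 2x2 design matrix in companion (canonical) structure
               [-0.5, 1.0]])
F_true = rotated_evolution(F0, alpha_deg=10.0)   # generative model rotated by 10 degrees
```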
Fig. 6: Linear SS model, partial information. (a) State-evolution mismatch. (b) State-observation mismatch.
C. Synthetic Non-Linear Model 20

Next, we consider a non-linear SS model, where the state- 10

evolution model takes a sinusoidal form, while the state- 0

observation model is a second order polynomial. The resulting


-10
SS model is given by
MSE [dB]
-20

f (x) = α · sin (β · x + φ) + δ, x ∈ R2 , (17a) -30


-2

-3

2 2
h (x) = a · (b · x + c) , y∈R . (17b) -40
-4

-5

-6

In the following we generate trajectories of T = 100 time steps -50


-7

from the noisy SS model in (1), with ν = −20 [dB], while


-8
-12 -11.9 -11.8 -11.7 -11.6 -11.5 -11.4 -11.3 -11.2 -11.1 -11

-60
-10 -5 0 5 10 15 20 25 30 35 40

using f (·) and h (·) as in (17) computed in a component-wise


manner, with parameters as in Table II. KalmanNet is used Fig. 7: Non-linear SS model. KalmanNet outperforms EKF.
with setting C4.
The MSE values for different levels of observations noise TABLE II: Non-linear toy problem parameters.
achieved by KalmanNet compared with the MB EKF are
depicted in Fig. 7 for both full and partial model information. α β φ δ a b c
The full evaluation with the MB EKF, UKF, and PF is given Full 0.9 1.1 0.1π 0.01 1 1 0
in Table III for the case of full information, and in Table IV for Partial 1 1 0 0 1 1 0
the case of partial information. We first observe that the EKF
operation and low complexity of the KF.
achieves the lowest MSE values among the MB filters, there-
fore serving as our main MB benchmark in our experimental
studies. For full information and in the low noise regime, EKF
achieves the lowest MSE values due to its ability to approach D. Lorenz Attractor
the MMSE in such setups, and KalmanNet achieves similar The Lorenz attractor is a three-dimensional chaotic so-
performance. For higher noise levels; i.e., for r12 = −12.04 lution to the Lorenz system of ordinary differential equa-
[dB], the MB EKF suffers from degraded performance due tions in continuous-time. This synthetically generated system
to a non-linear effect. Nonetheless, by learning to compute demonstrates the task of online tracking a highly non-linear
the KG from data, KalmanNet manages to overcome this and trajectory and a real world practical challenge of handling
achieves superior MSE. mismatches due to sampling a continuous-time signal into
In the presence of partial model information, the state- discrete-time [77].
evolution parameters used by the filters differs slightly from
the true model, resulting in a notable degradation in the TABLE III: MSE [dB] - Synthetic non-linear SS model; full
performance of the MB filters due to the model mismatch. information.
In all experiments, KalmanNet overcomes such mismatches, 1/r2 [dB] −12.04 −6.02 0 20 40
EKF µ̂ -6.23 -13.41 -19.58 -39.78 -59.67
and its performance is within a small gap of that achieved σ̂ ±0.89 ±0.53 ±0.47 ±0.43 ±0.44
when using full information for such setups. We thus conclude UKF µ̂ -6.48 -13.14 -18.43 -27.24 -37.27
σ̂ ±0.69 ±0.49 ±0.50 ±0.55 ±0.31
that in the presence of harsh non-linearities as well as model PF µ̂ -6.59 -13.33 -18.78 -26.70 -30.98
uncertainty due to inaccurate approximation of the underlying σ̂ ±0.74 ±0.48 ±0.39 ±0.07 ±0.02
KalmanNet µ̂ -7.25 -13.19 -19.22 -39.13 -59.10
dynamics, where MB variations of the KF fail, KalmanNet σ̂ ±0.49 ±0.52 ±0.55 ±0.49 ±0.53
learns to approach the MMSE while maintaining the real-time

10
TABLE V: MSE [dB] - Lorenz attractor with noisy state observations.

  1/r² [dB]    0        10       20       30       40
  EKF          -10.45   -20.37   -30.40   -40.39   -49.89
  UKF          -5.62    -12.04   -20.45   -30.05   -40.00
  PF           -9.78    -18.13   -23.54   -30.16   -33.95
  KalmanNet    -9.79    -19.75   -29.37   -39.68   -48.99

TABLE VI: MSE [dB] - Lorenz attractor with non-linear observations.

  1/r² [dB]    −10      0        10       20       30
  EKF          26.38    21.78    14.50    4.84     -4.02
  UKF          nan      nan      nan      nan      nan
  PF           24.85    20.91    14.23    11.93    4.35
  KalmanNet    14.55    6.77     -1.77    -10.57   -15.24

In particular, the noiseless state-evolution of the continuous-time process xτ with τ ∈ R+ is given by

  ∂/∂τ xτ = A(xτ) · xτ,   A(xτ) = [ −10  10  0 ;  28  −1  −x_{1,τ} ;  0  x_{1,τ}  −8/3 ].   (18)

To get a discrete-time state-evolution model, we repeat the steps used in [35]. First, we sample the noiseless process with sampling interval ∆τ and assume that A(xτ) can be kept constant in a small neighborhood of xτ; i.e.,

  A(xτ) ≈ A(xτ+∆τ).

Then, the continuous-time solution of the differential system (18), which is valid in the neighborhood of xτ for a short time interval ∆τ, is

  xτ+∆τ = exp(A(xτ) · ∆τ) · xτ.   (19)

Finally, we take the Taylor series expansion of (19) and a finite series approximation (with J coefficients), which results in

  F(xτ) ≜ exp(A(xτ) · ∆τ) ≈ I + Σ_{j=1}^{J} (A(xτ) · ∆τ)^j / j!.   (20)

The resulting discrete-time evolution process is given by

  x_{t+1} = f(x_t) = F(x_t) · x_t.   (21)

The discrete-time state-evolution model in (21), with additional process noise, is used for generating the simulated Lorenz attractor data. Unless stated otherwise the data was generated with J = 5 Taylor order, and ∆τ = 0.02 sampling interval. In the following experiments, KalmanNet is consistently invariant of the distribution of the noise signals, with the models it uses for f(·) and h(·) varying between the different studies, as discussed in the sequel.
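A minimal sketch of the discretization in (18)-(21) is given below; the function names are ours, and the defaults mirror the ∆τ = 0.02 and J = 5 used to generate the data (J = 2 reproduces the crude approximation used later to emulate state-evolution mismatch).

import numpy as np
from math import factorial

def A(x):
    # Continuous-time Lorenz system matrix of (18), evaluated at the current state.
    return np.array([[-10.0, 10.0, 0.0],
                     [28.0, -1.0, -x[0]],
                     [0.0, x[0], -8.0 / 3.0]])

def F_taylor(x, dt=0.02, J=5):
    # Finite Taylor-series approximation (20) of the matrix exponential exp(A(x)*dt).
    Adt = A(x) * dt
    F = np.eye(3)
    for j in range(1, J + 1):
        F = F + np.linalg.matrix_power(Adt, j) / factorial(j)
    return F

def f(x, dt=0.02, J=5):
    # Discrete-time state evolution (21): x_{t+1} = F(x_t) @ x_t.
    return F_taylor(x, dt, J) @ x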
1) Full Information: We first compare KalmanNet to the MB filters when using the state-evolution matrix F computed via (20) with J = 5.

Noisy state observations: Here, we set h(·) to be the identity transformation, such that the observations are noisy versions of the true state. Further, we set ν = −20 [dB] and T = 2000. As observed in Fig. 8a, despite being trained on short trajectories of T = 100, KalmanNet (with setting C3) achieves excellent MSE performance, comparable to the EKF, and outperforms the UKF and PF. The full details of the experiment are given in Table V. All the MB algorithms were optimized for performance; e.g., applying the EKF with full model information achieves unstable state tracking performance, with MSE values surpassing 30 [dB]. To stabilize the EKF, we had to perform a grid search using the available data set to optimize the process noise Q used by the filter.

Noisy non-linear observations: Next, we consider the case where the observations are given by a non-linear function of the current state, setting h to take the form of a transformation from a cartesian coordinate system to spherical coordinates. We further set T = 20 and ν = 0 [dB]. From the results depicted in Fig. 8b and reported in Table VI we observe that in such non-linear setups, the sub-optimal MB approaches operating with full information of the SS model are substantially outperformed by KalmanNet (with setting C4).
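As a rough sketch, such an observation function could be implemented as follows; the (radius, inclination, azimuth) ordering and the angle conventions are assumptions for illustration, since the text only states that h maps cartesian to spherical coordinates.

import numpy as np

def h_spherical(x):
    # Cartesian-to-spherical observation map; convention assumed for illustration.
    r = np.linalg.norm(x)
    theta = np.arccos(x[2] / r)      # inclination measured from the z-axis
    phi = np.arctan2(x[1], x[0])     # azimuth in the x-y plane
    return np.array([r, theta, phi])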
2) Partial Information: We proceed to evaluate KalmanNet and compare it to its MB counterparts under partial model information. We consider three possible sources of model mismatch arising in the Lorenz attractor setup:
• State-evolution mismatch due to the use of a Taylor series approximation of insufficient order.
• State-observation mismatch as a result of misalignment due to rotation.
• State-observation mismatch as a result of sampling from continuous-time to discrete-time.
Since the EKF produced the best results in the full information case among all non-linear MB filtering algorithms, we use it as a baseline for the MSE lower bound.

State-evolution mismatch: In this study, both KalmanNet and the MB algorithms operate with a crude approximation of the evolution dynamics obtained by computing (20) with J = 2, while the data is generated with an order J = 5 Taylor series expansion. We again set h to be the identity mapping, T = 2000, and ν = −20 [dB]. The results, depicted in Fig. 9a and reported in Table VII, demonstrate that KalmanNet (with setting C4) learns to partially overcome this model mismatch, outperforming its MB counterparts operating with the same level of partial information.

State-observation rotation mismatch: Here, the presence of mismatch in the observations model is simulated by generating data with an identity observation matrix rotated by merely θ = 1°. This rotation is equivalent to a sensor misalignment of ≈ 0.55%. The results depicted in Fig. 9b and reported in Table VIII clearly demonstrate that this allegedly minor rotation can cause a severe performance degradation for the MB filters, while KalmanNet (with setting C3) is able to learn from data to overcome such mismatches and to notably outperform its MB counterparts, which are sensitive to model uncertainty. Here, we trained KalmanNet on short trajectories with T = 100 time steps, tested it on longer trajectories with T = 1000 time steps, and set ν = −20 [dB]. This again demonstrates that the learning of KalmanNet is transferable.
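For illustration, a mismatch of this kind can be emulated as sketched below; the choice of rotation axis is an assumption, since the text only specifies a 1° rotation of the identity observation matrix.

import numpy as np

def rotated_identity(theta_deg=1.0):
    # Identity observation matrix rotated by a small angle, here about the z-axis
    # (an assumption; any fixed axis yields a comparable small misalignment).
    t = np.deg2rad(theta_deg)
    Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0,        0.0,       1.0]])
    return Rz @ np.eye(3)

H_data = rotated_identity(1.0)   # observation matrix used to generate the data
H_filter = np.eye(3)             # observation matrix assumed by the filters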

Fig. 8: Lorenz attractor, full information; MSE [dB] curves. (a) T = 2000, ν = −20 [dB], h(·) = I. (b) T = 20, ν = 0 [dB], h(·) non-linear.

TABLE VII: MSE [dB] - Lorenz attractor with state-evolution mismatch J = 2.

  1/r² [dB]                10       20       30       40
  EKF (J = 5)        µ̂    -20.37   -30.40   -40.39   -49.89
                     σ̂    ±0.25    ±0.24    ±0.24    ±0.20
  EKF (J = 2)        µ̂    -19.47   -23.63   -33.51   -41.15
                     σ̂    ±0.25    ±0.11    ±0.18    ±0.12
  UKF (J = 2)        µ̂    -11.95   -20.45   -30.05   -39.98
                     σ̂    ±0.87    ±0.27    ±0.09    ±0.09
  PF (J = 2)         µ̂    -17.95   -23.47   -30.11   -33.81
                     σ̂    ±0.18    ±0.09    ±0.10    ±0.13
  KalmanNet (J = 2)  µ̂    -19.71   -27.07   -35.41   -41.74
                     σ̂    ±0.29    ±0.18    ±0.20    ±0.11

TABLE VIII: MSE [dB] - Lorenz attractor with observation rotation.

  1/r² [dB]                0        10       20       30
  EKF (θ = 0°)       µ̂    -10.40   -20.41   -30.50   -40.45
                     σ̂    ±0.35    ±0.37    ±0.34    ±0.34
  EKF (θ = 1°)       µ̂    -9.80    -16.50   -18.19   -18.57
                     σ̂    ±0.54    ±6.51    ±0.22    ±0.21
  UKF (θ = 1°)       µ̂    -2.08    -6.92    -7.89    -8.09
                     σ̂    ±1.73    ±0.53    ±0.59    ±0.62
  PF (θ = 1°)        µ̂    -8.48    -0.18    15.24    19.87
                     σ̂    ±3       ±8.21    ±3.50    ±0.80
  KalmanNet (θ = 1°) µ̂    -9.63    -18.17   -27.32   -34.04
                     σ̂    ±0.53    ±0.42    ±0.67    ±0.77

TABLE IX: Lorenz attractor with sampling mismatch.

  Metric           EKF      UKF      PF       KalmanNet   MB-RNN
  MSE [dB]         -6.432   -5.683   -5.337   -11.284     17.355
  σ̂                ±0.093   ±0.166   ±0.190   ±0.301      ±0.527
  Run-time [sec]   5.440    6.072    62.946   4.699       2.291

State-observations sampling mismatch: We conclude our experimental study of the Lorenz attractor setup with an evaluation of KalmanNet in the presence of sampling mismatch. Here, we generate data from the Lorenz attractor SS model with an approximate continuous-time evolution process using a dense sampling rate, set to ∆τ = 10⁻⁵. We then sub-sample the noiseless observations from the evolution process by a ratio of 2000 and get a decimated process with ∆τ_d = 0.02. This procedure results in an inherent mismatch in the SS model due to representing an (approximately) continuous-time process using a discrete-time sequence. In this experiment, no process noise was applied, and the observations are again obtained with h set to identity and T = 3000.
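A minimal sketch of this decimation procedure is shown below, assuming the Taylor-series propagation sketched earlier is supplied as the step callable; note that 10⁻⁵ · 2000 = 0.02 recovers the quoted ∆τ_d.

import numpy as np

def decimated_lorenz(step, x0, dt_dense=1e-5, ratio=2000, T=3000):
    # Propagate the (noiseless) dynamics on a dense grid and keep every ratio-th
    # sample, yielding T samples with effective spacing dt_dense * ratio = 0.02.
    x = np.asarray(x0, dtype=float)
    kept = []
    for k in range(T * ratio):
        x = step(x, dt_dense)
        if (k + 1) % ratio == 0:
            kept.append(x.copy())
    return np.stack(kept)

# Example usage, reusing the Taylor-series propagation f(.) sketched earlier:
# trajectory = decimated_lorenz(lambda x, dt: f(x, dt=dt, J=5), x0=np.array([1.0, 1.0, 1.0]))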
The resulting MSE values for 1/r² = 0 [dB] of KalmanNet with configuration C4 compared with the MB filters and with the end-to-end neural network termed MB-RNN (see Subsection IV-B) are reported in Table IX. The results demonstrate that KalmanNet overcomes the mismatch induced by representing a continuous-time SS model in discrete-time, achieving a substantial processing gain over the MB alternatives due to its learning capabilities. The results also demonstrate that KalmanNet significantly outperforms a straightforward combination of domain knowledge, i.e., a state-transition function f(·), with end-to-end RNNs. A fully model-agnostic RNN was shown to diverge when trained for this task. In Fig. 10 we visualize how this gain is translated into clearly improved tracking of a single trajectory. To show that these gains of KalmanNet do not come at the cost of computationally slow inference, we detail the average inference time for all filters (without parallelism). The stopwatch timings were measured on the same platform, Google Colab, with CPU: Intel(R) Xeon(R) CPU @ 2.20GHz and GPU: Tesla P100-PCIE-16GB. We see that KalmanNet infers faster than the classical methods, thanks to the highly efficient neural network computations and the fact that, unlike the MB filters, it does not involve linearization and matrix inversions on each time step.

E. Real World Dynamics: Michigan NCLT Data Set

In our final experiment we evaluate KalmanNet on the Michigan NCLT data set [28]. This data set comprises different labeled trajectories, with each one containing noisy sensor readings (e.g., GPS and odometer) and the ground truth locations of a moving Segway robot.
the noiseless observations from the evolution process by a ratio labeled trajectories, with each one containing noisy sensor

Fig. 9: Lorenz attractor, partial information; MSE [dB] curves. (a) State-evolution mismatch, identity h, T = 2000. (b) Observation mismatch, ∆θ = 1°, T = 1000.

Given these noisy readings, the goal of the tracking algorithm is to localize the Segway from the raw measurements at any given time.

TABLE X: Numerical MSE in [dB] for the NCLT experiment.

  Baseline   EKF      KalmanNet   Vanilla RNN
  25.47      25.385   22.2        40.21

To tackle this problem we model the Segway kinematics (in each axis separately) using the linear Wiener velocity model, where the acceleration is modeled as a white Gaussian noise process wτ with variance q² [36]:

  xτ = (p, v)ᵀ ∈ R²,   ∂/∂τ xτ = [ 0  1 ; 0  0 ] · xτ + ( 0 , wτ )ᵀ.   (22)

Here, p and v are the position and velocity, respectively. The discrete-time state-evolution with sampling interval ∆τ is approximated as a linear SS model in which the evolution matrix F and noise covariance Q are given by

  F = [ 1  ∆τ ; 0  1 ],   Q = q² · [ (∆τ)³/3  (∆τ)²/2 ; (∆τ)²/2  ∆τ ].   (23)

Since KalmanNet does not rely on knowledge of the noise covariance matrices, Q is given here for the use of the MB KF and for completeness.

The goal is to track the underlying state vector in both axes using solely odometry data; i.e., the observations are given by noisy velocity readings. In this case the observations obey a noisy linear model:

  y ∈ R,   H = (0, 1).   (24)

Such settings where one does not have access to direct measurements for positioning are very challenging yet practical and typical for many applications where positioning technologies are not available indoors, and one must rely on noisy odometer readings for self-localization. Odometry-based estimated positions typically start drifting away at some point.

In the assumed model, the x-axis (in cartesian coordinates) is decoupled from the y-axis, and the linear SS model used for Kalman filtering is given by

  F̃ = [ F  0 ; 0  F ] ∈ R^{4×4},   Q̃ = [ Q  0 ; 0  Q ] ∈ R^{4×4},   (25a)
  H̃ = [ H  0 ; 0  H ] ∈ R^{2×4},   R̃ = [ r²  0 ; 0  r² ] ∈ R^{2×2}.   (25b)

This model is equivalent to applying two independent KFs in parallel. Unlike the MB KF, KalmanNet does not rely on noise modeling, and can thus accommodate dependency in its learned KG.

We arbitrarily use the session with date 2012-01-22 that consists of a single trajectory. Sampling at 1 [Hz] results in 5,850 time steps. We removed unstable readings and were left with 5,556 time steps. The trajectory was split into three sections: 85% for training (23 sequences of length T = 200), 10% for validation (2 sequences, T = 200), and 5% for testing (1 sequence, T = 277). We compare KalmanNet with setting C1 to an end-to-end vanilla RNN and to the MB KF, where for the latter the matrices Q and R were optimized through a grid search.

Fig. 11 and Table X demonstrate the superiority of KalmanNet for such scenarios. The KF blindly follows the odometer trajectory and is incapable of accounting for the drift, producing a very similar or even worse estimation than the integrated velocity. The vanilla RNN, which is agnostic of the motion model, fails to localize. KalmanNet overcomes the errors induced by the noisy odometer observations, and provides the most accurate real-time locations, demonstrating the gains of combining MB KF-based inference with integrated DD modules for real world applications.
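A minimal sketch of the per-axis matrices in (23)-(24) and the block-diagonal stacking in (25) is given below; the default ∆τ corresponds to the 1 [Hz] sampling, while q2 and r2 are placeholders standing in for the grid-searched noise parameters.

import numpy as np

def wiener_velocity_model(dt=1.0, q2=1.0, r2=1.0):
    # Per-axis Wiener velocity (constant-velocity) model of (23)-(24).
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    Q = q2 * np.array([[dt**3 / 3.0, dt**2 / 2.0],
                       [dt**2 / 2.0, dt]])
    H = np.array([[0.0, 1.0]])   # odometry: only the velocity component is observed
    R = np.array([[r2]])
    return F, Q, H, R

def stacked_xy_model(dt=1.0, q2=1.0, r2=1.0):
    # Block-diagonal stacking of (25): the x and y axes are decoupled, so the
    # combined linear SS model is equivalent to two independent per-axis KFs.
    F, Q, H, R = wiener_velocity_model(dt, q2, r2)
    blockdiag = lambda M: np.block([[M, np.zeros_like(M)],
                                    [np.zeros_like(M), M]])
    return blockdiag(F), blockdiag(Q), blockdiag(H), blockdiag(R)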
V. CONCLUSIONS

In this work we presented KalmanNet, a hybrid combination of deep learning with the classic MB EKF. Our design identifies the SS-model-dependent computations of the MB EKF, replacing them with a dedicated RNN operating on specific features encapsulating the information needed for its operation. Our numerical study shows that doing so enables KalmanNet to carry out real-time state estimation in the same manner as MB Kalman filtering, while learning to overcome model mismatches and non-linearities. KalmanNet uses a relatively compact RNN that can be trained with a relatively small data set and infers with reduced complexity, making it applicable for high dimensional SS models and computationally limited devices.

Fig. 10: Lorenz attractor with sampling mismatch (decimation), T = 3000; single-trajectory tracking of the ground truth, shown with the noisy and decimated observations and the EKF, UKF, PF, MB-RNN, and KalmanNet estimates.

Fig. 11: NCLT data set: ground truth vs. integrated velocity, trajectory from session with date 2012-01-22 sampled at 1 Hz.

ACKNOWLEDGEMENTS

We would like to thank Prof. Hans-Andrea Loeliger for his helpful comments and discussions, and Jonas E. Mehr for his assistance with the numerical study.

REFERENCES
[1] G. Revach, N. Shlezinger, R. J. G. van Sloun, and Y. C. Eldar, “KalmanNet: Data-driven Kalman filtering,” in Proc. IEEE ICASSP, 2021, pp. 3905–3909.
[2] J. Durbin and S. J. Koopman, Time Series Analysis by State Space Methods. Oxford University Press, 2012.
[3] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[4] R. E. Kalman and R. S. Bucy, “New results in linear filtering and prediction theory,” 1961.
[5] R. E. Kalman, “New methods in Wiener filtering theory,” 1963.
[6] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications. MIT Press, Cambridge, MA, 1949, vol. 8.
[7] M. Gruber, “An approach to target tracking,” MIT Lexington Lincoln Lab, Tech. Rep., 1967.
[8] R. E. Larson, R. M. Dressler, and R. S. Ratner, “Application of the extended Kalman filter to ballistic trajectory estimation,” Stanford Research Institute, Tech. Rep., 1967.
[9] J. D. McLean, S. F. Schmidt, and L. A. McGee, Optimal Filtering and Linear Prediction Applied to a Midcourse Navigation System for the Circumlunar Mission. National Aeronautics and Space Administration, 1962.
[10] S. J. Julier and J. K. Uhlmann, “New extension of the Kalman filter to nonlinear systems,” in Signal Processing, Sensor Fusion, and Target Recognition VI, vol. 3068. International Society for Optics and Photonics, 1997, pp. 182–193.
[11] N. J. Gordon, D. J. Salmond, and A. F. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” in IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 2. IET, 1993, pp. 107–113.
[12] P. Del Moral, “Nonlinear filtering: Interacting particle resolution,” Comptes Rendus de l’Académie des Sciences - Series I - Mathematics, vol. 325, no. 6, pp. 653–658, 1997.
[13] J. S. Liu and R. Chen, “Sequential Monte Carlo methods for dynamic systems,” Journal of the American Statistical Association, vol. 93, no. 443, pp. 1032–1044, 1998.
[14] F. Auger, M. Hilairet, J. M. Guerrero, E. Monmasson, T. Orlowska-Kowalska, and S. Katsura, “Industrial applications of the Kalman filter: A review,” IEEE Trans. Ind. Electron., vol. 60, no. 12, pp. 5458–5471, 2013.
[15] M. Zorzi, “Robust Kalman filtering under model perturbations,” IEEE Trans. Autom. Control, vol. 62, no. 6, pp. 2902–2907, 2016.
[16] ——, “On the robustness of the Bayes and Wiener estimators under model uncertainty,” Automatica, vol. 83, pp. 133–140, 2017.
[17] A. Longhini, M. Perbellini, S. Gottardi, S. Yi, H. Liu, and M. Zorzi, “Learning the tuned liquid damper dynamics by means of a robust EKF,” arXiv preprint arXiv:2103.03520, 2021.
[18] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
[19] Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[20] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[21] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014.
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
[23] M. Zaheer, A. Ahmed, and A. J. Smola, “Latent LSTM allocation: Joint clustering and non-linear dynamic modeling of sequence data,” in International Conference on Machine Learning, 2017, pp. 3967–3976.
[24] N. Shlezinger, N. Farsad, Y. C. Eldar, and A. J. Goldsmith, “ViterbiNet: A deep learning based Viterbi algorithm for symbol detection,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3319–3331, 2020.
[25] N. Shlezinger, R. Fu, and Y. C. Eldar, “DeepSIC: Deep soft interference cancellation for multiuser MIMO detection,” IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 1349–1362, 2021.
[26] N. Shlezinger, N. Farsad, Y. C. Eldar, and A. J. Goldsmith, “Learned factor graphs for inference from stationary time sequences,” IEEE Trans. Signal Process., early access, 2022.
[27] N. Shlezinger, J. Whang, Y. C. Eldar, and A. G. Dimakis, “Model-based deep learning,” arXiv preprint arXiv:2012.08405, 2020.
[28] N. Carlevaris-Bianco, A. K. Ushani, and R. M. Eustice, “University of Michigan North Campus long-term vision and lidar dataset,” The International Journal of Robotics Research, vol. 35, no. 9, pp. 1023–1035, 2016.
[29] R. G. Krishnan, U. Shalit, and D. Sontag, “Deep Kalman filters,” arXiv preprint arXiv:1511.05121, 2015.
[30] M. Karl, M. Soelch, J. Bayer, and P. Van der Smagt, “Deep variational Bayes filters: Unsupervised learning of state space models from raw data,” arXiv preprint arXiv:1605.06432, 2016.
[31] M. Fraccaro, S. D. Kamronn, U. Paquet, and O. Winther, “A disentangled recognition and nonlinear dynamics model for unsupervised learning,” in Advances in Neural Information Processing Systems, 2017.
[32] C. Naesseth, S. Linderman, R. Ranganath, and D. Blei, “Variational sequential Monte Carlo,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2018, pp. 968–977.
[33] E. Archer, I. M. Park, L. Buesing, J. Cunningham, and L. Paninski, “Black box variational inference for state space models,” arXiv preprint arXiv:1511.07367, 2015.
[34] R. Krishnan, U. Shalit, and D. Sontag, “Structured inference networks for nonlinear state space models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.
[35] V. G. Satorras, Z. Akata, and M. Welling, “Combining generative and discriminative models for hybrid inference,” in Advances in Neural Information Processing Systems, 2019, pp. 13802–13812.
[36] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software. John Wiley & Sons, 2004.
[37] K.-V. Yuen and S.-C. Kuok, “Online updating and uncertainty quantification using nonstationary output-only measurement,” Mechanical Systems and Signal Processing, vol. 66, pp. 62–77, 2016.
[38] H.-Q. Mu, S.-C. Kuok, and K.-V. Yuen, “Stable robust extended Kalman filter,” Journal of Aerospace Engineering, vol. 30, no. 2, p. B4016010, 2017.
[39] I. Arasaratnam, S. Haykin, and R. J. Elliott, “Discrete-time nonlinear filtering algorithms using Gauss–Hermite quadrature,” Proc. IEEE, vol. 95, no. 5, pp. 953–977, 2007.
[40] I. Arasaratnam and S. Haykin, “Cubature Kalman filters,” IEEE Trans. Autom. Control, vol. 54, no. 6, pp. 1254–1269, 2009.
[41] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking,” IEEE Trans. Signal Process., vol. 50, no. 2, pp. 174–188, 2002.
[42] N. Chopin, P. E. Jacob, and O. Papaspiliopoulos, “SMC²: An efficient algorithm for sequential analysis of state space models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 75, no. 3, pp. 397–426, 2013.
[43] L. Martino, V. Elvira, and G. Camps-Valls, “Distributed particle Metropolis-Hastings schemes,” in IEEE Statistical Signal Processing Workshop (SSP), 2018, pp. 553–557.
[44] C. Andrieu, A. Doucet, and R. Holenstein, “Particle Markov chain Monte Carlo methods,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 72, no. 3, pp. 269–342, 2010.
[45] J. Elfring, E. Torta, and R. van de Molengraft, “Particle filters: A hands-on tutorial,” Sensors, vol. 21, no. 2, p. 438, 2021.
[46] R. H. Shumway and D. S. Stoffer, “An approach to time series smoothing and forecasting using the EM algorithm,” Journal of Time Series Analysis, vol. 3, no. 4, pp. 253–264, 1982.
[47] Z. Ghahramani and G. E. Hinton, “Parameter estimation for linear dynamical systems,” 1996.
[48] J. Dauwels, A. Eckford, S. Korl, and H.-A. Loeliger, “Expectation maximization as message passing - part I: Principles and Gaussian messages,” arXiv preprint arXiv:0910.2832, 2009.
[49] L. Martino, J. Read, V. Elvira, and F. Louzada, “Cooperative parallel particle filters for online model selection and applications to urban mobility,” Digital Signal Processing, vol. 60, pp. 172–185, 2017.
[50] P. Abbeel, A. Coates, M. Montemerlo, A. Y. Ng, and S. Thrun, “Discriminative training of Kalman filters,” in Robotics: Science and Systems, vol. 2, 2005, p. 1.
[51] L. Xu and R. Niu, “EKFNet: Learning system noise statistics from measurement data,” in Proc. IEEE ICASSP, 2021, pp. 4560–4564.
[52] S. T. Barratt and S. P. Boyd, “Fitting a Kalman smoother to data,” in 2020 American Control Conference (ACC). IEEE, 2020, pp. 1526–1531.
[53] L. Xie, Y. C. Soh, and C. E. De Souza, “Robust Kalman filtering for uncertain discrete-time systems,” IEEE Transactions on Automatic Control, vol. 39, no. 6, pp. 1310–1314, 1994.
[54] C. M. Carvalho, M. S. Johannes, H. F. Lopes, and N. G. Polson, “Particle learning and smoothing,” Statistical Science, vol. 25, no. 1, pp. 88–106, 2010.
[55] I. Urteaga, M. F. Bugallo, and P. M. Djurić, “Sequential Monte Carlo methods under model uncertainty,” in 2016 IEEE Statistical Signal Processing Workshop (SSP). IEEE, 2016, pp. 1–5.
[56] L. Zhou, Z. Luo, T. Shen, J. Zhang, M. Zhen, Y. Yao, T. Fang, and L. Quan, “KFNet: Learning temporal camera relocalization using Kalman filtering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4919–4928.
[57] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
[58] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in International Conference on Machine Learning. PMLR, 2014, pp. 1278–1286.
[59] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, “Variational inference: A review for statisticians,” Journal of the American Statistical Association, vol. 112, no. 518, pp. 859–877, 2017.
[60] T. Haarnoja, A. Ajay, S. Levine, and P. Abbeel, “Backprop KF: Learning discriminative deterministic state estimators,” in Advances in Neural Information Processing Systems, 2016, pp. 4376–4384.
[61] B. Laufer-Goldshtein, R. Talmon, and S. Gannot, “A hybrid approach for speaker tracking based on TDOA and data-driven models,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 26, no. 4, pp. 725–735, 2018.
[62] H. Coskun, F. Achilles, R. DiPietro, N. Navab, and F. Tombari, “Long short-term memory Kalman filters: Recurrent neural estimators for pose regularization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5524–5532.
[63] S. S. Rangapuram, M. W. Seeger, J. Gasthaus, L. Stella, Y. Wang, and T. Januschowski, “Deep state space models for time series forecasting,” in Advances in Neural Information Processing Systems, 2018, pp. 7785–7794.
[64] P. Becker, H. Pandya, G. Gebhardt, C. Zhao, C. J. Taylor, and G. Neumann, “Recurrent Kalman networks: Factorized inference in high-dimensional deep feature spaces,” in International Conference on Machine Learning. PMLR, 2019, pp. 544–552.
[65] X. Zheng, M. Zaheer, A. Ahmed, Y. Wang, E. P. Xing, and A. J. Smola, “State space LSTM models with particle MCMC inference,” arXiv preprint arXiv:1711.11179, 2017.
[66] T. Salimans, D. Kingma, and M. Welling, “Markov chain Monte Carlo and variational inference: Bridging the gap,” in International Conference on Machine Learning. PMLR, 2015, pp. 1218–1226.
[67] X. Ni, G. Revach, N. Shlezinger, R. J. van Sloun, and Y. C. Eldar, “RTSNet: Deep learning aided Kalman smoothing,” in Proc. IEEE ICASSP, 2022.
[68] R. Dey and F. M. Salem, “Gate-variants of gated recurrent unit (GRU) neural networks,” in Proc. IEEE MWSCAS, 2017, pp. 1597–1600.
[69] P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proc. IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
[70] I. Sutskever, Training Recurrent Neural Networks. University of Toronto, Toronto, Canada, 2013.
[71] I. Klein, G. Revach, N. Shlezinger, J. E. Mehr, R. J. van Sloun, Y. Eldar et al., “Uncertainty in data-driven Kalman filtering for partially known state-space models,” in Proc. IEEE ICASSP, 2022.
[72] A. López Escoriza, G. Revach, N. Shlezinger, and R. J. G. van Sloun, “Data-driven Kalman-based velocity estimation for autonomous racing,” in Proc. IEEE ICAS, 2021.
[73] H. E. Rauch, F. Tung, and C. T. Striebel, “Maximum likelihood estimates of linear dynamic systems,” AIAA Journal, vol. 3, no. 8, pp. 1445–1450, 1965.
[74] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[75] R. Labbe, FilterPy - Kalman and Bayesian Filters in Python, 2020. [Online]. Available: https://filterpy.readthedocs.io/en/latest/
[76] J. Nordh, pyParticleEst - Particle based methods in Python, 2015. [Online]. Available: https://pyparticleest.readthedocs.io/en/latest/index.html
[77] W. Gilpin, “Chaos as an interpretable benchmark for forecasting and data-driven modelling,” arXiv preprint arXiv:2110.05266, 2021.

