
Machine learning for predictive estimation of qubit dynamics subject to dephasing

Riddhi Swaroop Gupta∗ and Michael J. Biercuk


ARC Centre of Excellence for Engineered Quantum Systems, School of Physics,
The University of Sydney, New South Wales 2006, Australia

∗ [email protected]

arXiv:1712.01291v1 [quant-ph] 4 Dec 2017

Decoherence remains a major challenge in quantum computing hardware and a variety of physical-layer controls provide opportunities to mitigate the impact of this phenomenon through feedback and feedforward control. In this work, we compare a variety of machine learning algorithms derived from diverse fields for the task of state estimation (retrodiction) and forward prediction of future qubit state evolution for a single qubit subject to classical, non-Markovian dephasing. Our approaches involve the construction of a dynamical model capturing qubit dynamics via autoregressive or Fourier-type protocols using only a historical record of projective measurements. A detailed comparison of achievable prediction horizons, model robustness, and measurement-noise-filtering capabilities for Kalman Filters (KF) and Gaussian Process Regression (GPR) algorithms is provided. We demonstrate superior performance from the autoregressive KF relative to Fourier-based KF approaches and focus on the role of filter optimization in achieving suitable performance. Finally, we examine several realizations of GPR using different kernels and discover that these approaches are generally not suitable for forward prediction. We highlight the underlying failure mechanism in this application and identify ways in which the output of the algorithm may consist of misidentified numerical artefacts.

I. INTRODUCTION

In predictive estimation, a dynamically evolving system is observed and any temporal correlations encoded in the observations are used to predict the future state of the system. This generic problem is well studied in diverse fields such as engineering, econometrics, meteorology, and seismology [1–5], and is addressed in the control-theoretic literature as a form of filtering. Applying these approaches to state estimation on qubits is complicated by a variety of factors; dominant among these is the violation of the assumption of linearity inherent in most filtering applications, as qubit states are formally bilinear. The case of an idling, or freely evolving, qubit subject to dephasing is more complicated still, as an a priori model of system evolution suitable for implementation within standard filtering algorithms will not in general be available.

Fortunately there are many lessons to learn from classical control, even in the presence of such complications. For classical systems, machine learning techniques have enabled state tracking, control, and forecasting for highly non-linear and noisy dynamical trajectories or complex measurement protocols (e.g. [6–10]). These demonstrations move far beyond the simplified assumptions underlying many basic filtering tasks, such as linear dynamics and white (uncorrelated) noise processes. For instance, so-called particle-based Bayesian frameworks (e.g. particle filtering, unscented or sigma-point filtering) allow state estimation and tracking in the presence of non-linearities in system dynamics or measurement protocols [11]. Further extensions approach the needs of a stochastically evolving system; recently, an ensemble of so-called unscented Kalman filters, named after the underlying mathematical transformation, demonstrated state estimation and forward predictions for chaotic, non-linear systems in the absence of a prescribed model [10]. For non-chaotic, multi-component stationary random signals, other algorithmic approaches have been particularly useful for tracking instantaneous frequency and phase information [12, 13], enabling short-run forecasting.

In the field of quantum control, work has begun to incorporate the additional challenges faced when considering state estimation on qubits, notably quantum-state collapse under projective measurement. Under such circumstances, in which the measurement backaction strongly influences the quantum state (in contrast with the classical case), it is not straightforward to extend machine learning predictive estimation techniques. Work to date has approached the analysis of projective measurement records on qubits as pattern recognition or image reconstruction problems, for example, in characterising the initial or final state of a quantum system (e.g. [14–16]) or reconstructing the historical evolution of a quantum system based on large measurement records (e.g. [17–22]). In adaptive or sequential Bayesian learning applications, a projective measurement protocol may be designed or adaptively manipulated to efficiently yield noise-filtered information about a quantum system (e.g. [23–26]).

The demonstrations above typically assume the object of interest is either static, or stochastically evolves in a manner which is dynamically uncorrelated in time (white) as measurement protocols are repeated. This simplifying assumption falls well short of typical laboratory-based experiments, where noise processes are frequently correlated in time, and evolution may also occur rapidly relative to a measurement protocol. In such a circumstance, further complexity is introduced as the Markov condition commonly assumed in Bayesian learning frameworks [11] is immediately violated. Even in the

classical case, the problem of designing an appropriate representation of non-Markovian dynamics in Bayesian learning frameworks is an active area of research (e.g. [27]). Hence, the canonical real-time tracking and prediction problem - where a non-linear, stochastic trajectory of a system is tracked using noisy measurements and short-run forecasts are made - is under-explored for quantum systems with projective measurements.

In this manuscript, we develop and explore a broad class of predictive estimation algorithms allowing us to track a qubit state undergoing stochastic but temporally correlated evolution using a record of projective measurements, and to forecast its future evolution. Our approaches employ machine learning algorithms to extract temporal correlations from the measurement record and use this information to build an effective dynamical model of the system's evolution. We design a deterministic protocol to correlate Markovian processes such that a certain general class of non-Markovian dynamics can be approximately tracked without violating the assumptions of a machine learning protocol, based on the theoretically accessible and computationally efficient frameworks of Kalman Filtering (KF) and Gaussian Process Regression (GPR). Both frameworks provide a mechanism by which temporal correlations (equally, dynamics) are encoded into an algorithm's structure such that projection of data-sets onto this structure enables meaningful learning, white-noise filtering, and effective forward prediction. We perform numerical simulations to test the effectiveness of these algorithms in maximizing the prediction horizon under various conditions, and quantify the role of the measurement sampling rate relative to the noise dynamics in defining the prediction horizon. Simulations incorporate a variety of measurement models, including pre-processed data yielding a continuous measurement outcome and discretised outcomes commonly associated with single-shot projective qubit measurements. We find that in most circumstances an autoregressive Kalman framework yields the best performance, providing model-robust forward prediction horizons and effective filtering of measurement noise. Finally, we demonstrate that standard GPR-based protocols employing a variety of kernels, while effective for the problem of filtering (fitting) a measurement record, are not suitable for real-time forecasting beyond the measurement record.

In what follows, we describe in detail the physical setting for our problem in Section II and explain how this leads to a specific choice of algorithm which may be deployed for the task of tracking non-Markovian state dynamics in the absence of a dynamical model for system evolution. We provide an overview of the central GPR and KF frameworks in Section III, and we specify a series of algorithms under consideration in this paper tailored to different measurement processes. For pre-processed measurement records, we consider four algorithmic approaches: a Least Squares Filter (LSF) from [28]; an Autoregressive Kalman Filter (AKF); a so-called Liska Kalman Filter from [29] adapted for a Fixed oscillator Basis (LKFFB); and a suitably designed GPR learning protocol. For binary measurement outcomes, we extend the AKF to a Quantised Kalman Filter (QKF). In Section IV A, we present optimisation procedures for tuning all algorithms. Numerical investigations of algorithmic performance are presented in Section IV and a comparative analysis of all algorithms is provided in Section V.

II. PHYSICAL SETTING

Our physical setting considers a sequence of projective measurements performed on a qubit. Each projective measurement yields a 0 or 1 outcome representing the state of the qubit. The qubit is then reset, and the exact procedure is repeated. By considering a qubit state initialized in a superposition of the measurement basis (for us, Pauli σ̂z eigenstates), we gain access to a direct probe of qubit phase evolution. If, for instance, no dephasing is present, then the probability of obtaining a binary outcome remains static in time as sequential qubit measurements are performed. If slowly drifting environmental dephasing is present, then the probability of obtaining a given binary outcome also drifts stochastically. In essence, the qubit probes dephasing noise and our procedure encodes a continuous-time non-Markovian dephasing process into time-stamped, discrete binary samples through the nonlinear projective measurement, carrying the underlying correlations in the noise. It is this series of measurements which we seek to process in our algorithmic approaches to qubit state tracking and prediction.

Formally, an arbitrary environmental dephasing process manifests as a time-dependent stochastic detuning, δω(t), between the qubit frequency and the system master clock. This detuning is an experimentally measurable quantity in a Ramsey protocol, as shown schematically in Fig. 1(a). A non-zero detuning over a measurement period τ (starting from t = 0) induces a stochastic relative phase accumulation (in the rotating frame) for a qubit superposition state as $|0\rangle + e^{-if(0,\tau)}|1\rangle$ between the qubit basis states. The accumulated f(0, τ) at the end of a single Ramsey experiment is mapped to a probability of obtaining a particular outcome in the measurement basis via the form of the Ramsey sequence.

In a sequence of n Ramsey measurements spaced ∆t apart with a fixed duration, τ, the change in the statistics of measured outcomes over this measurement record depends solely on the dephasing δω(t). We assume that the measurement action over τ is much faster than the temporal dynamics of the dephasing process, and ∆t ≳ τ. The resulting measurement record is a set of binary outcomes, {dn}, determined probabilistically from n true stochastic qubit phases, f := {fn}. Here the accumulated phase in each Ramsey experiment is $f(n\Delta t,\, n\Delta t+\tau) \equiv \int_{n\Delta t}^{n\Delta t+\tau} \delta\omega(t')\,dt'$, and we use the shorthand f(n∆t, n∆t + τ) ≡ fn. We define the statistical likelihood for observing a single shot, dn, using Born's rule [30]:

$$Pr(d_n = d \,|\, f_n, \tau, n\Delta t) = \begin{cases} \cos^2(f_n/2) & \text{for } d = 1 \\ \sin^2(f_n/2) & \text{for } d = 0 \end{cases} \quad (1)$$

The notation Pr(dn|fn, τ, n∆t) refers to the conditional probability of obtaining measurement outcome dn given a true stochastic phase, fn, accumulated over τ, beginning at time t = n∆t. In the noiseless case, Pr(dn = 1|fn, τ, n∆t) = 1, ∀n, such that a qubit exhibits no additional phase accumulation due to environmental dephasing. Following a single measurement the qubit state is reset, but the dephasing noise correlations manifest again via Born's rule for another random value of the bias at time-step n + 1. A detailed discussion of Eq. (1) can be found in Appendix A.

FIG. 1. (a) A Ramsey experiment at t = n∆t with fixed wait time τ and time-steps, n, spaced ∆t > τ apart. A π/2 pulse rotates the qubit state to a superposition of |d⟩ states, d ∈ {0, 1}; the qubit evolves via ĤN(t), accumulating a relative stochastic phase fn for non-zero environmental dephasing δω(t). Jittering arrows depict potential qubit state vectors permitted for (unknown) random fn. The qubit state is measured as dn = d in the σ̂z basis after a second π/2 rotation. (b) Black dots depict {dn} against time steps, n; data collection stops at n = 0, separating past state estimation from future prediction [blue region]. The black solid line shows the true qubit state likelihood ∝ h(fn); the red solid line shows the state estimate (prediction) for n < 0 (n > 0). A prediction horizon is n < n* ∈ [0, NP] for which the dark-grey region between the red and black lines is minimised (Bayes prediction risk) relative to predicting the mean of the dephasing noise; algorithmic tuning occurs by minimising the light-grey region (Bayes state estimation risk). Q quantises the black line into noisy qubit measurements, dn, under Gaussian uncertainty vn. (c) Single-shot outcomes in (b) are pre-processed to yield noisy measurements {yn} [black dots]; yn is linear in fn and vn represents additive white Gaussian measurement noise.

The action of measurement, expressed as h(fn), is given by Pr(dn = d|fn, τ, n∆t) ≡ 1/2 − (−1)^d h(fn) and is depicted in Fig. 1(b) as a probability of seeing the qubit in the d = 1 state. We begin by describing here a 'raw' non-linear measurement record, {dn}, where each dn [black dots] corresponds to a binary outcome derived from a single projective measurement on a qubit. The sequence {dn} can be treated as a sequence of biased coin flips, where the underlying bias of the coin is a non-Markovian, discrete-time process and the value of the bias is given by Eq. (1) at each n. The non-linearity of the measurement, h(fn), is defined with respect to fn, where Eq. (1) is interpreted as a non-linear measurement action for Bayesian learning frameworks.

This data series is contrasted with a linear measurement record, {yn}, depicted in Fig. 1(c). Each value yn is derived from the sum of a true qubit phase, fn, and Gaussian white measurement noise, vn. The sequence {yn} is generated by pre-processing raw binary measurements, {dn}, via a range of experimental techniques subject to a separation of timescales such that the measurement duration ∼τ is much faster than the drift of δω(t). In the most common case, one performs M runs of the experiment over which δω(t) is approximately constant, giving an estimate of fn at t = n∆t using averaging, a Bayesian scheme, or Fourier analysis. A more complex linearization protocol involves the use of low-pass or decimation filtering on a sequence {dn} to yield an estimate P̂r(dn|fn, τ, n∆t), from which the accumulated phase corrupted by measurement noise, {yn}, can be obtained from Eq. (1).

We impose properties on the environmental dephasing such that our theoretical designs can enable meaningful predictions. We assume dephasing is non-Markovian, covariance stationary and mean-square ergodic. That is, a single realisation of the process f is drawn from a power spectral density of arbitrary, but non-Markovian, form. We further assume that f is a Gaussian process and that the separation of timescales between measurement protocols and dephasing dynamics articulated above is met.

Given these conditions, our task is to build a dynamical model to approximately track f over past measurements (n < 0), and enable qubit state predictions at future times (n > 0). This prediction is represented by the red line in Fig. 1(b-c), and differs from the truth by the so-called estimation (prediction) risk for past (future) times as indicated by shading. We represent our estimate of f for all times using a hat in both the linear and nonlinear measurement models. The major challenge we face in developing this estimate, f̂ (equivalently P̂r(dn|fn, τ, n∆t)), is that for a qubit evolving under stochastic dephasing (true state given by the black solid line in Fig. 1(b) and (c)), we have no a priori dynamical model for the underlying evolution of f. In the next section, we define the theoretical structure of the KF and GPR algorithms which allow us to build that dynamical model directly from the historical measurement record.
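The measurement models above translate directly into a simple numerical recipe. The following Python sketch is our own illustration (not the authors' code): it draws single-shot outcomes {dn} from Eq. (1) for a placeholder phase trajectory, and forms a pre-processed linear record {yn} by inverting the averaged statistics of M repeated shots, assuming the phase stays inside (0, π) so the inversion is single-valued.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder phase trajectory standing in for correlated dephasing; it is
# kept near pi/2 so the arccos inversion below is single-valued. This is
# purely illustrative and not the paper's zero-mean noise model.
n_steps = 200
f = np.pi / 2 + 0.3 * np.cumsum(rng.normal(0.0, 0.05, n_steps))

# Born's rule, Eq. (1): Pr(d_n = 1 | f_n, tau, n*dt) = cos^2(f_n / 2).
p1 = np.cos(f / 2.0) ** 2

# Non-linear measurement record {d_n}: one biased coin flip per time step.
d = rng.binomial(n=1, p=p1)

# Linear measurement record {y_n}: average M shots per time step (assuming
# the detuning drifts slowly over the M runs) and invert Eq. (1); y_n then
# approximates f_n up to residual, approximately Gaussian noise v_n.
M = 50
p1_hat = rng.binomial(n=M, p=p1) / M
y = 2.0 * np.arccos(np.sqrt(np.clip(p1_hat, 0.0, 1.0)))
```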

III. OVERVIEW OF PREDICTIVE METHODOLOGIES

Our objective is to implement an algorithm permitting learning of underlying qubit dynamics in such a way as to maximize the forward prediction horizon for a given qubit data record. We first quantify the quality of our state estimation procedure. The fidelity of any underlying algorithm during state estimation and prediction, relative to the true state, is expressed by the mathematical quantity known as a Bayes Risk, where zero risk corresponds to perfect estimation. At each time-step, n, the Bayes risk is a mean-square distance between truth, f, and prediction, f̂, calculated over an ensemble of M different realisations of the true f and noisy data-sets D:

$$\mathcal{L}_{BR}(n|I) \equiv \langle (f_n - \hat{f}_n)^2 \rangle_{f,D} \quad (2)$$

The notation LBR(n|I) expresses that the Bayes Risk value at n is conditioned on I, a placeholder for free parameters in the design of the predictor, f̂n. State estimation risk is the Bayes Risk incurred during n ∈ [−NT, 0]; prediction risk is the Bayes Risk incurred during n ∈ [0, NP]. State estimation and prediction risk regions for one realisation of dephasing noise are shaded in Figs. 1-3. We therefore define the forward prediction horizon as the number of time-steps n* ∈ [0, NP] during which a predictive algorithm incurs a lower Bayes prediction risk than naively predicting f̂n ≡ µf = 0 ∀n, the mean qubit behaviour under zero-mean dephasing noise.

With this concept in mind, we introduce two general approaches for algorithmic learning relevant to the strictures of the problem we have introduced. Our general approach is shared between all algorithms employed and is represented schematically for the KF and GPR in Fig. 2. Stochastic qubit evolution is depicted for one realisation of f [black solid line] given noisy linear measurements [black dots] corrupted by Gaussian white measurement noise vn. Our overall task is to produce an estimate, given by the red line, which minimizes risk for the prediction period. Ideally, both estimation risk and prediction risk are minimized simultaneously for well-performing implementations.

Examining the insets in both panels of Fig. 2, both frameworks start with a prior Gaussian distribution over qubit states [purple] that is constrained by the measurement record to yield a posterior Gaussian distribution of the qubit state [red]. The prior captures assumptions about the qubit state before any data is seen and the posterior expresses our best knowledge of the qubit state under a Bayesian framework. The posterior distribution in both KF and GPR is used to generate qubit state estimates (n < 0) and predictions (n > 0) [red solid line]. However, the computational process by which this posterior is inferred differs significantly between the two methods; we provide an overview of the central features of these algorithms below.

The key feature of a Kalman filter is the recursive learning procedure shown in the inset to Fig. 2(a). Our knowledge of the qubit state is summarised by prior and posterior Gaussian probability distributions, and these are created and collapsed recursively at each time step. The mean of these distributions is the true Kalman state, xn, and the covariance of these distributions, Pn, captures the uncertainty in our knowledge of xn; together both define the Gaussian distribution. The Kalman filter produces an estimate of the state, x̂n, at each step through this recursive procedure, taking into account two factors. First, the Kalman gain, γn, updates our knowledge of (xn, Pn) within each time step n and serves as a weighting factor for the difference between incoming data and our best estimate for an observation based on x̂n, suitably transformed via the measurement action, h(x̂n). Next, the dynamical model Φn propagates the state and covariance, (xn, Pn), to the next time step, such that the posterior moments at n define the prior at n + 1. This process occurs for each time step and an estimate of the true state xn is built up recursively based on all of our existing knowledge, namely, a linear combination of all past measurements and all previously generated state estimates. Beyond n = 0 we perform predictions in the absence of further measurement data by simply propagating the dynamic model with the Kalman gain set to zero. Full details of the KF algorithm appear below in Section III A.

In our application, we define the Kalman state, xn, the dynamical model Φn, and a measurement action h(xn) such that the Kalman Filtering framework can track a non-Markovian qubit state trajectory due to an arbitrary realisation of f. In standard KF implementations, the discrete-time sequence {xn} defines a "hidden" signal that cannot be observed, and the dynamic model Φn is known. We deviate from this standard construction such that our true Kalman state and its uncertainty, (xn, Pn), do not have a direct physical interpretation. The Kalman xn has no a priori deterministic component and corresponds to arbitrary power spectral densities describing f. Hence, the role of the Kalman xn is to represent an abstract correlated process that, upon measurement, yields physically relevant quantities governing qubit dynamics. Moreover, a key challenge described in detail below is to construct an effective Φn from the measurement record.

In contrast to the recursive approach taken in the KF, a GPR learning protocol, illustrated schematically in Fig. 2(b), selects a random process to best describe the overall dynamical behaviour of the qubit state under one realisation of f. The key point is that sampling the prior or posterior distribution in GPR yields random realisations of discrete time sequences, rather than individual random variables, and GPR considers the entire measurement record at once. In a sense, it corresponds to a form of fitting over the entire data set. The output of a GPR protocol is a predictive distribution which we can evaluate at an arbitrarily chosen sequence of test-points, where the test points can exist for n < 0 (n > 0) such that we extract state estimates (forward predictions) from the predictive distribution. Due to the nature of this procedure, we wish to distinguish the set of test points (in units of time-steps) using a ‡, namely, that we are evaluating the predictive posterior distribution of a GPR protocol at desired time labels. In this notation, {n‡}, n‡ ∈ [−NT, NP], are test-points; N‡ is the total length of an array of test points; state estimation occurs if n‡ ≤ 0 and prediction occurs if n‡ > 0.
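The Bayes Risk of Eq. (2) and the prediction-horizon criterion above are straightforward to evaluate numerically. The sketch below is our own illustration: it Monte Carlo averages the squared error over an ensemble of realisations and reads off n* as the number of leading steps on which the algorithm beats the naive zero-mean prediction.

```python
import numpy as np

def bayes_risk(truths, predictions):
    """Eq. (2): mean-square error at each time step, averaged over an
    ensemble of M realisations of the truth f and noisy data-sets D.

    truths, predictions: arrays of shape (M, N_P) over the prediction region.
    """
    return np.mean((truths - predictions) ** 2, axis=0)

def prediction_horizon(truths, predictions):
    """Largest n* such that the algorithm's risk stays below the risk of
    naively predicting the zero mean of the dephasing noise at every step."""
    risk = bayes_risk(truths, predictions)
    naive = np.mean(truths ** 2, axis=0)   # risk of predicting mu_f = 0
    below = risk < naive
    return len(below) if below.all() else int(np.argmin(below))
```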

FIG. 2. Comparison of the algorithmic structure of the KF and GPR, superposing the lower panels of Fig. 1 with the KF and GPR predictive frameworks. (a) Kalman Filtering (KF): the purple distribution represents a prior, with mean xn and covariance Pn; it is propagated between time-steps, n, using the Kalman dynamics Φn, and updated within each n by the Kalman gain γn to yield the posterior distribution (red) at n. The posterior at n is the prior at n + 1. The mean of the posterior distribution at each n is used to derive the predictions given by the red line using h(xn). In the blue region, the red posterior predictive distribution is propagated using Φn but γn ≡ 0. Gaussian white Kalman 'process' noise, wn, is coloured by Φn to yield dynamics for xn. (b) Gaussian Process Regression (GPR): the purple prior distribution defined over sequences, f, with mean µf and variance Σf, is constrained by the entire measurement record. The resulting posterior predictive distribution (red) is evaluated at test-points in time, n‡ ∈ [−NT, NP]; the state estimate (prediction) is the mean, µf‡, at n‡ < 0 (n‡ > 0). A choice of kernel defines each element in Σf, Σf‡. In both (a)-(b), the purple shadow represents the posterior state variance (diagonal Pn or Σf‡ elements) constrained by data and filtered measurement noise vn.

The process of building the posterior distribution is implemented using a kernel, or basis, from which to construct the effective fit. In standard GPR implementations, the correlation between any two observations depends only on the separation distance of the index of these observations, and correlations are captured in the covariance matrix, Σf. Each element, Σf^{n1,n2}, describes this correlation for observations at arbitrary time-steps indexed by n1 and n2; this quantity is given in a form set by the selected kernel.

In our application, the non-Markovian dynamics of f are not specified explicitly but are encoded in a general way through the choice of kernel, prescribing how Σf^{n1,n2} should be calculated. The Fourier transform of the kernel represents a power spectral density in Fourier space. A general design of Σf^{n1,n2} allows one to probe arbitrary stochastic dynamics and, equivalently, explore arbitrary regions in the Fourier domain. For example, Gaussian kernels (RBF) and mixtures of Gaussian kernels (RQ) capture the continuity assumption that correlations die out as separation in time increases. We choose to employ an infinite basis of oscillators implemented by the so-called periodic kernel to enable us to represent arbitrary power spectral densities for f. Prediction occurs simply by extending the GPR fit by choosing test-points n‡ > 0.

In the following subsections we provide details of the specific classes of learning algorithm employed here with an eye towards evaluating their predictive performance on qubit-measurement records. We introduce a series of KF algorithms capable of handling both linear and non-linear measurement records, and restrict our analysis of GPR to linear measurement records.

A. Kalman Filtering (KF)

In order for a Kalman Filter to track a stochastically evolving qubit state in our application, the hidden true Kalman state at time-step n, xn, must mimic the stochastic dynamics of a qubit under environmental dephasing. We propagate the hidden state xn according to a dynamical model Φn corrupted by Gaussian white process noise, wn:

$$x_n = \Phi_n x_{n-1} + \Gamma_n w_n \quad (3)$$
$$w_n \sim \mathcal{N}(0, \sigma^2) \quad \forall n \quad (4)$$

Process noise has no physical meaning in our application; wn is shaped by Γn and deterministically colored by the dynamical model Φn to yield a non-Markovian xn representing qubit dynamics under generalised environmental dephasing. In addition to coloring via the dynamical model, the process noise covariance matrix, Qn ≡ σ²ΓnΓn^T (cf. Eq. (9)), offers an additional mechanism to shape the input white noise by designing Γn.

We measure xn using an ideal measurement protocol, h(xn), and incur additional Gaussian white measurement noise vn with scalar covariance strength R, yielding scalar noisy observations yn:

$$y_n = z_n + v_n \quad (5)$$
$$z_n \equiv h(x_n) \quad (6)$$
$$v_n \sim \mathcal{N}(0, R) \quad \forall n \quad (7)$$

The measurement procedure, h(xn), can be linear or non-linear, allowing us to explore both regimes in our physical application.

With appropriate definitions, the Kalman equations below specify all Kalman algorithms in this paper. At each time step, n, we denote estimates of the moments of the prior and posterior distributions (equivalently, estimates of the true Kalman state) with (x̂n(−), P̂n(−)) and (x̂n(+), P̂n(+)) respectively. The Kalman update equations take a generic form (c.f. [31]):

$$\hat{x}_n(-) = \Phi_{n-1} \hat{x}_{n-1}(+) \quad (8)$$
$$Q_{n-1} = \sigma^2 \Gamma_{n-1} \Gamma_{n-1}^T \quad (9)$$
$$\hat{P}_n(-) = \Phi_{n-1} \hat{P}_{n-1}(+) \Phi_{n-1}^T + Q_{n-1} \quad (10)$$
$$\gamma_n = \hat{P}_n(-) H_n^T \left( H_n \hat{P}_n(-) H_n^T + R_n \right)^{-1} \quad (11)$$
$$\hat{y}_n(-) = h(\hat{x}_n(-)) \quad (12)$$
$$\hat{x}_n(+) = \hat{x}_n(-) + \gamma_n \left( y_n - \hat{y}_n(-) \right) \quad (13)$$
$$\hat{P}_n(+) = \left[ 1 - \gamma_n H_n \right] \hat{P}_n(-) \quad (14)$$

To reiterate, Eq. (8) and Eq. (10) bring the best state of knowledge from the previous time step into the current time step, n, as a prior distribution. Dynamical evolution is modified by features of process noise, as encoded in Eq. (9), and propagated in Eq. (10). The propagation of the moments of the a priori distribution, as outlined thus far, does not depend on the incoming measurement, yn, but is determined entirely by the a priori (known) dynamical model, in our case Φ ≡ Φn, ∀n.

The Kalman gain in Eq. (11) depends on the uncertainty in the true state, P̂n(−), and is modified by features of the measurement model, Hn, and measurement noise, Rn ≡ R, ∀n. It serves as an effective weighting function for each incoming observation. Before seeing any new measurement data, the filter predicts an observation ŷn(−) corresponding to the best available knowledge at n in Eq. (12). This value is compared to the actual noisy measurement yn received at n, and the difference is used to update our knowledge of the true state via Eq. (13). If measurement data is noisy and unreliable (high R), then γ has a small value, and the algorithm propagates Kalman state estimates according to the dynamical model and effectively ignores data. In particular, only the second terms in both Eq. (13) and Eq. (14) represent the Bayesian update of the moments of a prior distribution ((−) terms) to the posterior distribution ((+) terms) at n. If γn ≡ 0, then the prior and posterior moments at any time step are exactly identical by Eqs. (13) and (14), and only dynamical evolution occurs using Eqs. (8) to (10). This is the condition we employ when we seek to make forward predictions beyond a single time-step, and hence we set γ ≡ 0 during future prediction.

Since we do not have a known dynamical model Φ for describing stochastic qubit dynamics under f, we will need to make design choices for {x, Φ, h(x), Γ} such that f can be approximately tracked. These design choices will completely specify the algorithms introduced below and represent key findings with respect to our work in this manuscript. For a linear measurement record, h(x) ↦ Hx, and we compare predictive performance for Φ modeling stochastic dynamics either via so-called 'autoregressive' processes in the AKF, or via projection onto a collection of oscillators in the LKFFB. In addition, we use the dynamics of the AKF to define a Quantised Kalman Filter (QKF) with a non-linear, quantised measurement model such that the filter can act directly on binary qubit outcomes. We provide the relevant details in the sub-sections below.
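For concreteness, the update cycle of Eqs. (8)-(14) can be written compactly in code. The following Python sketch is our own illustration for the linear-measurement case h(x) = Hx (not the authors' implementation); here H is a length-q array and R a scalar, matching the scalar-observation setting above.

```python
import numpy as np

def kalman_step(x_post, P_post, Phi, Q, H, R, y_n=None):
    """One cycle of Eqs. (8)-(14) for a scalar, linear measurement y = Hx + v.

    Passing y_n = None sets the Kalman gain to zero, so the prior moments are
    returned unchanged: the pure dynamical propagation used for forward
    prediction beyond n = 0.
    """
    # Predict, Eqs. (8)-(10): prior moments from the previous posterior.
    x_prior = Phi @ x_post
    P_prior = Phi @ P_post @ Phi.T + Q
    if y_n is None:
        return x_prior, P_prior
    # Update, Eqs. (11)-(14): fold in the new observation.
    S = H @ P_prior @ H + R                  # innovation variance (scalar)
    gain = (P_prior @ H) / S                 # Eq. (11)
    residual = y_n - H @ x_prior             # Eq. (12) with h(x) = Hx
    x_post = x_prior + gain * residual       # Eq. (13)
    P_post = (np.eye(len(x_prior)) - np.outer(gain, H)) @ P_prior  # Eq. (14)
    return x_post, P_post
```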

1. Autoregressive Kalman Filter (AKF)

Recursive autoregressive methods are well-studied in classical control applications (c.f. [32]), presenting opportunities to leverage existing engineering knowledge in developing quantum control strategies. In our application, we use an autoregressive Kalman filter to probe arbitrary, covariance-stationary qubit dynamics such that the dynamic model is constructed as a weighted sum of q past values driven by white noise, i.e. an autoregressive process of order q, AR(q). Using Wold's decomposition, it can be shown that any zero-mean covariance stationary process representing qubit dynamics has a representation in the mean-square limit by an autoregressive process of finite order, as in Appendix B.

The study of AR(q) processes falls under a general class of techniques based on autoregressive moving average (ARMA) models in adaptive control engineering and econometrics (e.g. [33, 34] respectively). For high-q models in a typical time-series analysis, it is possible to decompose an AR(q) into an ARMA model with a small number of parameters [35, 36]. However, we retain a high-q model to probe arbitrary power spectral densities. Further, the literature suggests that employing a high-q model is relatively easier than a full ARMA estimation problem and enables lower prediction errors [35, 37].

To construct the Kalman dynamical operator Φ for the AKF, we introduce a set of q coefficients {φq′≤q}, q′ = 1, ..., q, to specify the dynamical model:

$$f_n = \phi_1 f_{n-1} + \phi_2 f_{n-2} + \dots + \phi_q f_{n-q} + w_n \quad (15)$$

We thus see that the dynamical model is constructed as a weighted sum of time-retarded samples of f, with weighting factors given by the autoregressive coefficients up to order (and hence time lag) q. For small q < 3, it is possible to extract simple conditions on the coefficients, {φq′≤q}, that guarantee properties of f: for example, that f is covariance stationary and mean-square ergodic. In our application, we freely employ arbitrary-q models via machine learning in order to improve our approximation of an arbitrary f. Any AR(q) process can be recast (non-uniquely) into state space form ([4]), and we define the AKF by the following substitutions into the Kalman equations:

$$x_n \equiv \begin{bmatrix} f_n & \dots & f_{n-q+1} \end{bmatrix}^T \quad (16)$$
$$\Gamma_n w_n \equiv \begin{bmatrix} w_n & 0 & \dots & 0 \end{bmatrix}^T \quad (17)$$
$$\Phi_{AKF} \equiv \begin{bmatrix} \phi_1 & \phi_2 & \dots & \phi_{q-1} & \phi_q \\ 1 & 0 & \dots & 0 & 0 \\ 0 & 1 & \ddots & \vdots & \vdots \\ \vdots & \vdots & \ddots & 0 & 0 \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix} \quad \forall n \quad (18)$$
$$H \equiv \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \end{bmatrix} \quad \forall n \quad (19)$$

The matrix ΦAKF is the dynamical model used to recursively propagate the unknown state during state estimation in the AKF, as represented schematically in the upper half of Fig. 3. In general, the {φq′≤q} employed in ΦAKF must be learned through an optimisation procedure using the measurement record, where the set of parameters to be optimised is {φ1, ..., φq, σ², R}. This procedure yields the optimal configuration of the autoregressive Kalman filter, but at the computational cost of a (q + 2)-dimensional Bayesian learning problem for arbitrarily large q.

FIG. 3. Approaches to construction of the KF dynamical model. Panel (a) from Fig. 2 is superimposed with the Kalman dynamical models, Φ ≡ Φn, ∀n. (a) AKF/QKF: a set of autoregressive coefficients, {φq′≤q}, defines Φ to yield fn as a weighted sum of q past measurements. (b) LKFFB: red arrows with heights ||xn^j|| depict a set of basis oscillators for j = 1, ..., J^(B) probing the true (purple) spectrum of fn; this yields the time-domain dynamics of fn as a stacked system of resonators, Θj. Black L-shaped arrows depict a single instance of fn at n = 0 based on the history {fn−1, fn−2, ...}.

The Least Squares Filter (LSF) in [28] considers a weighted sum of past measurements to predict the i-th step-ahead measurement outcome, i ∈ [0, NP]. A gradient descent algorithm learns the weights, {φq′≤q}, for the previous q past measurements, and a constant offset value for non-zero-mean processes, to calculate the i-th step-ahead prediction. The set of NP LSF models, collectively, defines the set of predicted qubit states under an LSF acting on a measurement record. For i = 1, equivalent to the single-step update employed in the Kalman filter, we assert that the learned {φq′≤q} in the LSF effectively implement an AR(q) process (we validate this numerically in Section IV). Under this condition, and for zero-mean wn, the LSF in [28] by definition searches for the coefficients of the weighted linear sum of the past q measurements, as described in Eq. (15).

We use the parameters {φq′≤q} learned in the LSF to define Φ in Eq. (18), therefore reducing the computational complexity of the remaining optimisation from (q + 2)-dimensional to 2-dimensional for an AKF of order q. Since the Kalman noise parameters (σ², R) are subsequently auto-tuned using a Bayes Risk optimisation procedure (see Section IV A), we optimise over potential remaining model errors and measurement noise.

In general, LSF performance improves as q increases, and a full characterisation of model-selection decisions for the LSF is given in [28]. Defining an absolute value for the optimal q is somewhat arbitrary, as it is defined relative to the extent to which a true f is oversampled in the measurement routine and the finite size of the data. For all analyses presented here, we fix the ratios q∆t = 0.1 [a.u.] and q/NT = 0.05 [a.u.], where the experimental sampling rate is 1/∆t, and NT and {φq′≤q} are identical in the AKF and LSF. In practice this ensures numerical convergence of the LSF during training.
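The substitutions of Eqs. (16)-(19) amount to the standard companion-matrix construction, sketched below in Python (illustrative only; the coefficient values in the example are arbitrary rather than learned from data).

```python
import numpy as np

def akf_model(phi):
    """Companion-form state-space model of an AR(q) process, Eqs. (16)-(19).

    phi: 1-D array of autoregressive coefficients [phi_1, ..., phi_q]
         (in practice learned from the measurement record, e.g. via the LSF).
    """
    q = len(phi)
    Phi = np.zeros((q, q))
    Phi[0, :] = phi                # first row carries the AR weights, Eq. (18)
    Phi[1:, :-1] = np.eye(q - 1)   # sub-diagonal shifts past values down
    Gamma = np.zeros(q)
    Gamma[0] = 1.0                 # w_n enters only the newest slot, Eq. (17)
    H = np.zeros(q)
    H[0] = 1.0                     # measure the newest phase f_n, Eq. (19)
    return Phi, Gamma, H

# Example: an AR(2) model with illustrative coefficients.
Phi, Gamma, H = akf_model(np.array([1.6, -0.8]))
```

Pairing akf_model with the kalman_step sketch above yields a minimal AKF: the first component of the propagated state is the phase estimate f̂n.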

2. Liska Kalman Filter with Fixed Basis (LKFFB)

In the LKFFB, we effectively perform a Fourier decomposition of the underlying f in order to build the dynamic model, Φ, for the Kalman filter. Here, we project our measurement record on J^(B) oscillators with fixed frequencies ωj ≡ jω0^(B), with j an integer, j = 1, ..., J^(B). The temporal resolution of the state tracking procedure is set by the maximum frequency in the selected basis and the properties of the spacing between adjacent basis frequencies. The superscript (B) indicates Fourier-domain information about an algorithmic basis, as opposed to information about the true (unknown) dephasing process. The LKFFB allows instantaneous amplitude and phase tracking for each basis oscillator, directly enabling forward prediction from the learned dynamics. The structure of this Kalman filter, referred to as the Liska Kalman Filter (LKF), was developed in [29]; adding a fixed basis in this application yields the Liska Kalman Filter with a Fixed Basis (LKFFB).

For our application, the true hidden Kalman state, x, is encoded as a collection of sub-states, x^j, for the j-th oscillator. For clarity, we remind the reader that the superscript is used as an index rather than a power. Each sub-state is labeled by a real and an imaginary component, which we represent in vector notation:

$$x_n \equiv \begin{bmatrix} x_n^1 & \dots & x_n^j & \dots & x_n^{J^{(B)}} \end{bmatrix} \quad (20)$$
$$A_n^j \equiv \mathrm{Re}(x_n^j) \quad (21)$$
$$B_n^j \equiv \mathrm{Im}(x_n^j) \quad (22)$$
$$x_n^j \equiv \begin{bmatrix} A_n^j \\ B_n^j \end{bmatrix} \quad (23)$$

The algorithm tracks the real and imaginary parts of each Kalman sub-state simultaneously in order to calculate the instantaneous amplitudes (||xn^j||) and phases (θn^j) for each Fourier component:

$$||x_n^j|| \equiv \sqrt{(A_n^j)^2 + (B_n^j)^2} \quad (24)$$
$$\theta_n^j \equiv \tan^{-1}\!\left( \frac{B_n^j}{A_n^j} \right) \quad (25)$$

The dynamical model for the LKFFB is now constructed as a stacked collection of these independent oscillators. The sub-state dynamics match the formalism of a Markovian stochastic process defined on a circle for each basis frequency, ωj^(B), as in Ref. [38]. We stack Θ(jω0^(B)∆t) for all ωj along the diagonal to obtain the full dynamical matrix Φn:

$$\Phi_n \equiv \begin{bmatrix} \Theta(\omega_0^{(B)} \Delta t) & \dots & 0 \\ \dots & \Theta(j\omega_0^{(B)} \Delta t) & \dots \\ 0 & \dots & \Theta(J^{(B)}\omega_0^{(B)} \Delta t) \end{bmatrix} \quad (26)$$
$$\Theta(j\omega_0^{(B)} \Delta t) \equiv \begin{bmatrix} \cos(j\omega_0^{(B)} \Delta t) & -\sin(j\omega_0^{(B)} \Delta t) \\ \sin(j\omega_0^{(B)} \Delta t) & \cos(j\omega_0^{(B)} \Delta t) \end{bmatrix} \quad (27)$$

We obtain a single estimate of the true hidden state by defining the measurement model, H, by concatenating J^(B) copies of the row vector [1 0]:

$$H \equiv \begin{bmatrix} 1 & 0 & \dots & 1 & 0 & \dots & 1 & 0 \end{bmatrix} \quad (28)$$

Here, the unity values of H pick out and sum the Kalman estimates for the real components of f while ignoring the imaginary components; namely, we sum An^j over all J^(B) basis oscillators.

In [29], a state-dependent process-noise-shaping matrix is introduced to enable potentially non-stationary instantaneous amplitude tracking in the LKFFB for each individual oscillator:

$$\Gamma_{n-1} \equiv \Phi_{n-1} \frac{x_{n-1}}{||x_{n-1}||} \quad (29)$$

For the scope of this manuscript, we retain this form of Γn in our application even if the true qubit dynamics are covariance stationary. As such, Γn depends on the state estimates x. For this choice of Γn, we deviate from classical Kalman filters because the recursive equations for P cannot be propagated in the absence of measurement data. Consequently, Kalman gains cannot be pre-computed prior to experimental data collection. Details of gain pre-computation in classical Kalman filtering can be found in standard textbooks (e.g. [31]).

There are two ways to conduct forward prediction for the LKFFB, and both are numerically equivalent for an appropriate choice of basis: (i) we set the Kalman gain to zero and recursively propagate using Φ; (ii) we define a harmonic sum using the basis frequencies and the learned {||xn^j||, θn^j}. This harmonic sum can be evaluated for all future time to yield forward predictions in a single calculation. The choice of basis for an LKFFB and its implications for optimal predictive performance are discussed in Appendix C 2.
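A minimal construction of the LKFFB dynamical and measurement matrices of Eqs. (26)-(28) is sketched below (our own illustration; the basis size and spacing in the example are arbitrary placeholders).

```python
import numpy as np

def lkffb_model(J, w0, dt):
    """Stacked-resonator dynamics of Eqs. (26)-(28) for J basis oscillators.

    Each 2x2 block rotates the (A_n^j, B_n^j) pair of the j-th oscillator by
    an angle j * w0 * dt per time step; H sums the real components.
    """
    Phi = np.zeros((2 * J, 2 * J))
    for j in range(1, J + 1):
        c, s = np.cos(j * w0 * dt), np.sin(j * w0 * dt)
        Theta = np.array([[c, -s], [s, c]])          # Theta(j w0 dt), Eq. (27)
        Phi[2*(j-1):2*j, 2*(j-1):2*j] = Theta        # stack on diagonal, Eq. (26)
    H = np.tile([1.0, 0.0], J)                       # concatenated [1 0] rows, Eq. (28)
    return Phi, H

# Example with illustrative (not paper-specific) basis parameters.
Phi, H = lkffb_model(J=10, w0=2 * np.pi * 0.01, dt=1.0)
```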

3. Quantised Kalman Filter (QKF)

In the QKF, we implement a Kalman filter that acts directly on discretised measurement outcomes, d ∈ {0, 1}. To reiterate the discussion of Fig. 1(a), this means that the measurement action in the QKF must be non-linear and take as input quantised measurement data. This holds true irrespective of our dynamical model, Φ. In our application we set the dynamical model to be identical to that employed in the AKF, allowing isolation of the effect of the non-linear, quantised measurement action.

With unified notation across the AKF and QKF, we define a non-linear measurement model h(x) and its Jacobian, H, as:

$$z_n \equiv h(x_n[0]) \equiv \frac{1}{2} \cos(f_n) \quad (30)$$
$$\implies H_n \equiv \frac{dh(f_n)}{df_n} = -\frac{1}{2} \sin(f_n) \quad (31)$$

During filtering, zn = h(xn[0]) is used to compute measurement residuals when updating the true Kalman state, xn, whereas the state variance estimate, Pn, is propagated using the Jacobian, Hn. Further, the Jacobian is used to compute the Kalman gain. Hence the filter can quickly destabilise if the linearisation of h(·) by Hn doesn't hold during dynamical propagation, resulting in a rapid build-up of errors.

In this construction, the entity zn is associated with an abstract 'signal': a sequence formed by repeated applications of the likelihood function for single-qubit measurements in Eq. (1). The true stochastic qubit phase, fn, is our Kalman hidden state, xn. Subsequently, we extract an estimate of the true bias, zn, as an unnatural association of the Kalman measurement model with Born's rule. The sequence {zn} is not observable, but can only be inferred over a large number of experimental runs.

To complete the measurement action, we implement a biased coin flip within the QKF filter given ỹn. While the qubit provides measurement outcomes which are naturally quantised, we require a theoretical model, Q, to generate quantised measurement outcomes with statistics that are consistent with Born's rule in order to propagate the dynamic Kalman filtering equations appropriately. In order to build this machinery we modify the procedure in [39] to quantise zn using biased coin flips. In our notation, we represent a black-box quantiser, Q, that gives only a 0 or a 1 outcome based on ỹn:

$$d_n = Q(\tilde{y}_n) \quad (32)$$
$$= Q(h(f_n) + v_n) \quad (33)$$

The use of the notation ỹn is meant to indicate a correspondence with the yn introduced earlier, while the physical meaning differs due to the discretised nature of the QKF. Therefore, the stochastic changes in {ỹn} are represented in the bias of a coin flip, subject to proper normalisation constraints which maintain |ỹn| ≤ 0.5:

$$Pr(d_n | \tilde{y}_n, f_n, \tau) \equiv \mathcal{B}(n_B = 1;\, p_B = \tilde{y}_n + 0.5) \quad (34)$$

The QKF uses Eq. (34) to define a biased coin flip during filtering, where nB represents a single coin flip and pB represents the stochastically drifting bias on the coin. Kalman filtering with the coin-flip quantisation defined by Eq. (34) presents a departure from the classical amplitude quantisation procedures in [39, 40].

From a computational perspective, we modify the definition of the process noise features from the AKF to the QKF: we set Q ≡ σ²ΓΓ^T → σ²I ∀n, where I is the q × q identity matrix. The rationale for this modification is that it smears out the effect of white process noise in a way that stabilizes inversions in the gain calculation in the Kalman filter, but does not correlate any two Kalman states in time (diagonal matrix). In practice, this modification only yields mild improvements over the original AKF process noise features matrix.

The definitions of {Q, h(xn), Hn, Q} in this subsection, and the dynamics {xn, Φ} from the AKF, now completely specify the QKF algorithm for application to a discrete, single-shot measurement record as depicted in Fig. 1(a).
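The QKF measurement pipeline of Eqs. (30)-(34) can be summarised as follows (an illustrative sketch only; the noise strength and phase value are placeholders).

```python
import numpy as np

rng = np.random.default_rng(7)

def h_qkf(f_n):
    """Non-linear QKF measurement model, Eq. (30): z_n = cos(f_n) / 2."""
    return 0.5 * np.cos(f_n)

def jacobian_qkf(f_n):
    """Linearisation used in the gain and covariance updates, Eq. (31)."""
    return -0.5 * np.sin(f_n)

def quantise(y_tilde):
    """Black-box quantiser Q of Eqs. (32)-(34): a single biased coin flip
    with bias p_B = y_tilde + 0.5, after clipping so that |y_tilde| <= 0.5."""
    p = np.clip(y_tilde + 0.5, 0.0, 1.0)
    return rng.binomial(1, p)

# Example: one noisy, quantised outcome for an illustrative phase value.
f_n = 0.3
v_n = rng.normal(0.0, 0.05)          # Gaussian measurement noise
d_n = quantise(h_qkf(f_n) + v_n)
```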

B. Gaussian Process Regression (GPR)

In GPR, correlations in the measurement record can be learned if one projects data on a distribution of Gaussian processes, Pr(f), with an appropriate encoding of their covariance relations via a kernel, Σf^{n1,n2}. We return to the linear measurement record and the definition of scalar noisy observations yn corrupted by Gaussian measurement noise, vn, as considered previously for the AKF, LSF, and LKFFB. Under linear operations, the distribution of measured outcomes, yn, is also Gaussian. The mean and variance of Pr(y) depend on the mean µf and variance Σf of the prior Pr(f), and the mean µv ≡ 0 and variance R of the measurement noise:

$$f \sim Pr_f(\mu_f, \Sigma_f) \quad (35)$$
$$y \sim Pr_y(\mu_f, \Sigma_f + R) \quad (36)$$

For covariance stationary f, correlation relationships depend solely on the time lag, ν ≡ ∆t|n1 − n2|, between any two time points n1, n2 ∈ [−NT, NP]. An element of the covariance matrix, Σf^{n1,n2}, corresponds to one value of lag, ν, and the correlation for any given ν is specified by the covariance function, R(ν):

$$\Sigma_f^{n_1,n_2} \equiv R(\nu) \quad (37)$$

Any unknown parameters in the encoding of correlation relations via R(ν) are learned by solving the optimisation problem outlined in Section IV A. The optimised GPR model is then applied to datasets corresponding to new realisations of f. Let indices n ∈ NT ≡ [−NT, 0] denote training points, and let a length-N‡ vector contain arbitrary testing points n‡ ∈ [−NT, NP]. These testing points, in machine learning language, encompass both state estimation and prediction points in our notation. We now define the joint distribution Pr(y, f‡), where f‡ represents the true process evaluated by GPR at the desired test points:

$$\begin{bmatrix} f^\ddagger \\ y \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \mu_{f^\ddagger} \\ \mu_y \end{bmatrix},\; \begin{bmatrix} K(N^\ddagger, N^\ddagger) & K(N^\ddagger, N_T) \\ K(N_T, N^\ddagger) & K(N_T, N_T) + R \end{bmatrix} \right) \quad (38)$$

The additional 'kernel' notation Σf ≡ K(NT, NT) is ubiquitous in GPR. Time-domain correlations specified by R(ν) populate each element of a matrix K(·, ··), where the dimensions of the matrix depend on the vector length of each argument. For example, for K(NT, NT), the notation defines a square matrix where the diagonal corresponds to ν = 0 and off-diagonal elements correspond to separations of two arbitrary points in time, i.e. ν ≠ 0.

Following [41], the moments of the conditional predictive distribution Pr(f‡|y) can be derived from the joint distribution Pr(y, f‡) via standard Gaussian identities:

$$\mu_{f^\ddagger|y} = \mu_f + K(N^\ddagger, N_T)\left( K(N_T, N_T) + R \right)^{-1} (y - \mu_y) \quad (39)$$
$$\Sigma_{f^\ddagger|y} = K(N^\ddagger, N^\ddagger) - K(N^\ddagger, N_T)\left( K(N_T, N_T) + R \right)^{-1} K(N_T, N^\ddagger) \quad (40)$$

The prediction procedure outlined above holds true for any choice of kernel, R(ν). In any GPR implementation, the dataset, y, constrains the prior model, yielding an a posteriori predictive distribution. The mean values of this predictive distribution, µf‡|y, are the state predictions for the qubit under dephasing at the test points in N‡.

In our work we focus on a 'periodic kernel' to encode a covariance function which is theoretically guaranteed to approximate any zero-mean covariance stationary process, f, in the mean-square limit, by having the same structure as a covariance function for trigonometric polynomials with infinite harmonic terms [38, 42]. The sine-squared exponential kernel represents an infinite basis of oscillators and is defined as:

$$R(\nu) \equiv \sigma^2 \exp\!\left( -\frac{2 \sin^2(\omega_0^{(B)} \nu / 2)}{l^2} \right) \quad (41)$$

This kernel is described using just two key hyper-parameters: the frequency-comb spacing for our infinite basis of oscillators, ω0, and a dimensionless length scale, l. We use physical sampling considerations to approximate their initial conditions prior to an optimisation procedure, namely, that the longest correlation length encoded in the data sets the frequency resolution of the comb, and the scale at which changes in f are resolved is limited physically by the minimum time taken between sequential Ramsey measurements:

$$\frac{\omega_0^{(B)}}{2\pi} \sim \frac{1}{\Delta t N} \quad (42)$$
$$l \sim \Delta t \quad (43)$$

Because the periodic kernel can be shown to be formally equivalent to the basis of oscillators employed in the LKFFB algorithm in a limiting case (see Appendix C for a discussion using results in [42]), the inclusion of GPR using this kernel permits a comparison of the underlying algorithmic structures for the task of predictive estimation using spectral methods.

For the analysis of covariance stationary time series under a GPR framework, we de-emphasise popular kernel choices such as a Gaussian kernel (RBF), a scale mixture of Gaussian kernels (RQ), and Matern kernels (e.g. MAT32) [41, 43]. An arbitrary-scale mixture of zero-mean Gaussian kernels will probe an arbitrary area around zero in the Fourier domain, as schematically depicted in Fig. 2(a). While such kernels capture the continuity assumption ubiquitous in machine learning, they are structurally inappropriate for probing a process characterized by an arbitrary power spectral density (e.g. ohmic noise). Another common kernel for time-series analysis is a quasi-periodic kernel (QPER), defined by a product of an RBF with a periodic kernel [44]. This corresponds to a convolution in the Fourier domain giving rise to a comb of Gaussians, at the expense of an increase in the number of parameters required for kernel tuning. One can also consider specific types of AR(q) processes using Matern kernels of order q + 1/2, but with increased restrictions on the form of the coefficients [41, 45]. A simple consideration of autoregressive approaches suggests that a Matern kernel for q = 1 (MAT32) can be briefly trialed under GPR, whereas high-q autoregressive processes are naturally and generally treated under a KF framework. Further discussion of kernel choice appears in Sec. V.
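Eqs. (38)-(41) combine into a compact numerical recipe. The following Python sketch is our own illustration of GPR with the periodic kernel; the hyper-parameter initialisation follows Eqs. (42)-(43), while all other values are placeholder choices.

```python
import numpy as np

def periodic_kernel(t1, t2, sigma2, w0, l):
    """Sine-squared exponential ('periodic') kernel of Eq. (41)."""
    nu = np.abs(t1[:, None] - t2[None, :])     # time lags between points
    return sigma2 * np.exp(-2.0 * np.sin(0.5 * w0 * nu) ** 2 / l ** 2)

def gpr_posterior(t_train, y, t_test, R, sigma2, w0, l, mu=0.0):
    """Posterior mean and covariance at the test points, Eqs. (39)-(40)."""
    K_yy = periodic_kernel(t_train, t_train, sigma2, w0, l) + R * np.eye(len(t_train))
    K_sy = periodic_kernel(t_test, t_train, sigma2, w0, l)
    K_ss = periodic_kernel(t_test, t_test, sigma2, w0, l)
    # Solve linear systems rather than forming an explicit inverse.
    mean = mu + K_sy @ np.linalg.solve(K_yy, y - mu)        # Eq. (39)
    cov = K_ss - K_sy @ np.linalg.solve(K_yy, K_sy.T)       # Eq. (40)
    return mean, cov

# Illustrative initial hyper-parameters following Eqs. (42)-(43).
N, dt = 500, 1.0
w0, l = 2 * np.pi / (N * dt), dt
t_train = np.arange(-N, 0) * dt
rng = np.random.default_rng(0)
y = np.sin(0.05 * t_train) + 0.1 * rng.normal(size=N)       # toy record
t_test = np.arange(0, 50) * dt                              # forward test points
mean, cov = gpr_posterior(t_train, y, t_test, R=0.01, sigma2=1.0, w0=w0, l=l)
```

Forward prediction corresponds simply to choosing test points with n‡ > 0, which is the regime in which the failure modes of GPR identified in this work arise.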

IV. ALGORITHM PERFORMANCE CHARACTERISATION

In the results to follow, our metric for characterising the performance of optimally tuned algorithms will be the normalised Bayes prediction risk:

$$\tilde{\mathcal{L}}_{BR} \equiv \frac{\mathcal{L}_{BR}(n|I)}{\langle (f_n - \mu_f)^2 \rangle_{f,D}}, \quad \mu_f \equiv 0 \quad (44)$$

A desirable forward prediction horizon corresponds to a maximal n* ∈ [0, NP] for which the normalised Bayes prediction risk at all time-steps n ≤ n* is less than unity. We compare the difference in maximal forward prediction horizons between algorithms in the context of realistic operating scenarios. We begin here by introducing the numerical methods employed for generating the data-sets on which predictive estimation is performed.

We simulate environmental dephasing through a Fourier-domain procedure described in Appendix A 2 [46] in order to simulate an f which is mean-square ergodic and covariance stationary. For the results in this manuscript, we choose a flat-top spectrum with a sharp high-frequency cutoff for simplicity, as this choice of power spectral density theoretically favors no particular choice of algorithm but violates the Markov property.

In our simulations we must also mimic a measurement process which samples the underlying "true" dephasing process. The algorithmic parameters {NT, ∆t} represent a sampling rate and Fourier resolution set by the simulated measurement protocol; we choose regimes where the Nyquist rate, r ≫ 2. In generating noisy simulated measurement records, we corrupt a noiseless measurement by additive Gaussian white noise. Since f is Gaussian, the measurement noise level, N.L., is defined as the ratio between the standard deviation of the additive Gaussian measurement noise, √R, and the maximal spread of random variables in any realisation of f. We approximate the maximal spread of f as three sample standard deviations of one realisation of the true f, N.L. = √R / (3√Σ̂f^{n,n}). The use of a hat in this notation denotes sample statistics. This computational procedure enables a consistent application of measurement noise for f from arbitrary, non-Markovian power spectral densities. For the case where binary outcomes are required, we apply a biased coin flip using Eq. (34).

A. Algorithmic Optimisation

All algorithms in this manuscript employ machine learning principles to tune unknown design parameters based on training data-sets. The physical intuition associated with optimising our filters is that we are cycling through a large class of general models for environmental dephasing and seeking the model(s) which best fit the data subject to various constraints. This allows each filter to track stochastic qubit dynamics under arbitrary covariance-stationary, non-Markovian dephasing. We elected to deploy an optimisation routine with minimal computational complexity to enable nimble deployment of KF and GPR algorithms in realistic laboratory settings, particularly since LSF optimisation is extremely rapid for our application [28].

Kalman filtering in our setting poses a significant challenge for general optimisers, as the lack of theoretical bounds on the values of (σ, R) results in large, flat regions of the Bayes Risk function. Further, the recursive structure of the Kalman filter means that no analytical gradients are accessible for optimising a choice of cost function, and a large computational burden is incurred for any optimisation procedure. We randomly distribute (σk, Rk) pairs for k = 1, ..., K over ten orders of magnitude in two dimensions in order to sample the optimisation space. We then generate a sequence of loss values L(σk, Rk) for each k by considering a small region around n = 0, where the size of the region is nL, the number of time steps we look forwards or backwards from n = 0:

$$L(\sigma_k, R_k) \equiv \sum_{n=1}^{n_L} \mathcal{L}_{BR}(n \,|\, I = \{\sigma_k, R_k\}) \quad (45)$$

Here, LBR(n|I = {σk, Rk}) is given by Eq. (2) and is summed over 0 ≤ nL ≤ |NT| (0 ≤ nL ≤ |NP|) backwards (forwards) time-steps for state estimation (prediction). In the notation for I above, we omit the Kalman dynamical-model design parameters for ease of reading. Typically I would include, for instance, the set of autoregressive coefficients in the AKF and the set of fixed basis frequencies in the LKFFB. Values of nL are chosen such that the sequence {L(σk, Rk)} defines sensible shapes of the total loss function over the parameter space and the numerical experiments in this manuscript. A choice of small nL in state estimation ensures that data near the prediction horizon are employed - a region where the Kalman filter is most likely to have converged. Similarly, in state prediction, large nL will flatten the true prediction loss function, as long-term prediction errors dominate the smaller loss values occurring during the short-term prediction period. In addition, one can weight state estimation and state prediction loss functions differently by choosing different values of nL for state estimation and prediction, though we set nL to be the same in both regions. While simple and by no means optimal, our tuning approach is computationally tractable and efficient compared to the application of standard optimisation routines, where each loss value calculation requires a recursive filter to act on a long measurement record. Further, our approach ensures tuning procedures are performed off-line, such that a tuned algorithm is simple in its recursive structure and performs rapid calculations at each time-step.

An ideal parameter pair (σ*, R*) minimises the Bayes risk over K trials for both state estimation and prediction. We define acceptable low-loss regions for state estimation and prediction as the set which returns loss less than 10% of the median risk over K trials. In the event that low-risk regions do not exist for both state estimation and prediction for a given parameter pair, we deem the optimisation to have failed, as state estimation performance is then uncorrelated with forward prediction (for illustration, see panel (h) of Fig. 7).

In GPR the set of parameters I = {σ, R, ω0^(B), l} requires optimisation. However, in contrast to the KF, no recursion exists and analytic gradients are accessible to simplify the overall optimisation problem. Instead of minimising the Bayes state-estimation risk, we follow the popular practice of maximising the Bayesian likelihood. Initial conditions and optimisation constraints are derived from physical arguments as described in Section III.

AKF LKFFB LSF True Msmts straint that the true dynamics of f cannot be perfectly
projected onto the basis used in LKFFB (the latter sit-
(a)
uation corresponding to substantial a priori knowledge
(a.u.)

of the dynamics of f ). The role of undersampling in


the LKFFB becomes pronounced as predictive estimates
lead to unstable behavior relative to the naive predic-
tion of µf = 0 in the case Jω0 /ω (B) = 2 in Fig. 4(d).
Past Future The AKF and LSF share autoregressive coefficients and
therefore both algorithms demonstrate comparable L̃BR
prediction risk in the ensemble average.
2.0 A key implied benefit of the use of Kalman filter-
(b) LSF (c) AKF (d) LKFFB ing vs the LSF with high-order autoregressive dynam-
(a.u.)

ics alone is the addition of robustness against measure-


ment noise. In order to probe this numerically, we
perform direct comparisons of filter performance un-
der varying measurement-noise strength for both the
AKF and LSF. Since autoregressive coefficients learned
in (noisy) environments are re-cast in Kalman form, we
test measurement-noise filtering in Kalman frameworks
enabled by the design parameter R. In Fig. 5 (a), we
Time Steps (num)
plot L̃BR prediction risk for AKF and LSF as a ratio
FIG. 4. (a) Solid dots depict yn against time-steps n and
data collection ceases at n = 0. Optimised LSF, AKF and
LKFFB yield predictions n > 0 in the blue region plotted (a) (i) N.L.
as open, coloured markers. A black solid line shows one re-
alisation of true fn , drawn from a flat top spectrum with
J true Fourier components spaced ω0 apart and uniformly LSF
(B)
randomised phases. Other parameters: ω0 /ω0 ∈ / Z (nat- outperforms AKF
AKF/LSF (a.u.)

ural numbers), J = 45000, ω0 /2π = 9 × 10−3 Hz such


8

that > 500 number of true components fall between adja-


cent LKFFB oscillators; N.L. = 10%. (b)-(d) Procedure in (ii)
(a) is repeated for ensemble M different realisations of f and
noisy datasets to compute L̃BR for LSF, AKF, and LKFFB.
L̃BR v. n ∈ [0, NP ] is plotted; dark-grey horizontal line (iv)
marks L̃BR ≡ 1 for predicting the mean µf ≡ 0. Vertical AKF
dashed lines mark the forward prediction horizon, n∗ , where (iii) outperforms LSF
L̃BR . 0.8 < 1 for all prediction time steps 0 < n ≤ n∗
in out-performing predicting the noise mean. Marker color
(b) AKF LSF
(dark indigo to pink) depicts true f cutoff, Jω0 varied rela-
(B)
tive to ω (B) ≡ ω0 J (B) ≈ rω(S) , with fixed Nyquist r  2;
(a.u.)

ω0 /2π = 0.497 Hz, J = 20, 40, 60, 80, 200; N.L. = 1%. For all
(B)
(a)-(d), a trained LKFFB is implemented with ω0 /2π = 0.5 (i) (ii) (iii) (iv)
(B)
Hz and J = 100 oscillators; trained AKF / LSF models
are q = 100; with NT = 2000, NP = 50 steps, ∆t = 0.001s,
M = 50 runs, K = 75 optimisation trials. Time Steps (num)

FIG. 5. Measurement noise filtering in AKF v. LSF.


(µf ≡ 0), indicated by a dark-grey horizontal line. (a) Dashed-lines with markers depict the ratio of L̃BR for
AKF to LSF against time-steps n > 0; for cases (i)-(iv) with
The prediction horizon, indicated approximately by
N.L. = 0.1, 1.0, 10.0, 25.0%. Green trajectory shows LSF out-
dashed vertical lines, for all algorithms increases as the performs AKF with ratio > 1 for n ≤ n∗ ; crimson trajectories
measurement becomes sufficiently fast to sample the show AKF outperforms LSF with ratio < 1 for n ≤ n∗ . (b)
highest frequency dynamics of f . We confirm numerically L̃BR against n is plotted for cases (i)-(iv) confirms a maximal
that absolute prediction horizons for any algorithm are forward prediction horizon marked by n∗ , exists for all ratios
arbitrary and adjustable through the sample rate, allow- in (a) for both LSF and AKF. In (a) and (b), AKF and LSF
ing us to restrict our analysis to comparative statements share identical {φq }. True f is drawn from a flat top spectrum
between algorithms for future results. While differences with ω0 /2π = 98 × 10−3 Hz, J = 45000, NT = 2000, NP = 100
between protocols appear reasonably small we note that steps, ∆t = 0.001s, r = 20 such that Fig. 6(c) corresponds
in most cases examined the AKF demonstrates superior to case (ii) in this figure. AKF is optimised with q = 100,
performance to the LKFFB subject to the realistic con- M = 50 runs, K = 75 trials.
13

True
LKFFB

LKFFB (a) (b) (c) (d)


(a.u.)

AKF
LSF

Time Steps (num)


LKFFB (e) (f) (g) (h)

AKF (i) (j) (k) (l)

FIG. 6. Comparison of KF performance under various imperfect learning scenarios. (a)-(d) True noise properties are varied to
introduce pathological learning with respect to fixed algorithmic configuration: ω0 /2π = 0.5, 0.499, 89 × 10−3 , 89 × 10−3 Hz and
J = 80, 80, 45000, 80000 respectively. The relationship between LKFFB basis and true noise spectrum is shown schematically
above columns: (a) perfect learning; (b) imperfect projection on LKFFB basis; (c) finite computational Fourier resolution;
(d) relaxed basis bandwidth assumption. (a)-(d) L̃BR against time-steps n > 0 is shown for LKFFB, AKF, and LSF. (e)-(l)
Optimisation results for LKFFB [top row] and AKF [bottom row] in each of the four regimes in (a)-(d). Grey dots depict K
random (σ 2 , R) pairs; where M realisations of f, D are used to calculate L̃BR for each pair. Purple (crimson) circles represent
low loss regions where risk value in Eq. (45), for (σ 2 , R) is < 10% of the median risk value during state estimation (prediction)
for −nL < n < 0 (nL > n > 0), with nL = 50. Black star, (σ ∗ , R∗ ), minimises risk values over purple circles during
state estimation. A KF filter is ‘tuned’ if optimal (σ ∗ , R∗ ) lies in the overlap of low loss regions for state estimation [purple]
and prediction [crimson]; disjoint regions in (h) show LKFFB tuning failure. KF algorithms set up with q = 100 for AKF;
(B)
J (B) = 100, ω0 /2π = 0.5 Hz for LKFFB; with NT = 2000, NP = 100 steps, ∆t = 0.001s, r = 20; N.L. = 1%.

such that a value greater than unity implies LSF out- ensemble-averaged L̃BR in Fig. 5 (b) demonstrate that
performs AKF. In cases (i)-(iv), we increase the applied all ratios reported in (a) correspond to a useful forward
noise level to our data-sets {yn } representing simulated prediction horizon.
measurements on f . For applied measurement noise level In machine learning or optimal control settings, the
N.L. > 1% in (ii)-(iv), we find that AKF/LSF < 1 and robustness of the learning procedure to small changes
AKF outperforms LSF for the conditions studied here, in the underlying system is an essential characteristic of
with a general trend towards increasing benefits as noise the algorithm. In our case, we have already seen that
increases until the noise becomes so large (iv) that the the quality of projection of the true dynamics of f onto
benefits fluctuate as a function of n. Calculations of the the LKFFB basis can have a significant impact on the
14

quality of learning and predictive estimation. We explore


this initial finding in more detail. LKFFB AKF True
In Fig. 6, we simulate various learning conditions in-
(a) (b)
cluding (a) perfect learning in LKFFB; (b) imperfect pro-
jection relative to the LKFFB basis; (c) imperfect pro-
jection combined with finite algorithm resolution; and
(d) imperfect learning and undersampling relative to true
noise bandwidth. The ordering of figure presentation
highlights the degree of impact of the introduced patholo-
gies on LKFFB. By contrast we find reasonable model (c) (d)
robustness in AKF/LSF at the expense of performance
in the somewhat unrealistic perfect learning case.
We expose the underlying optimisation results for
choosing an optimal (σ ∗ , R∗ ) for LKFFB in Fig. 6 (e)-(h)
and for AKF in Fig. 6 (i)-(l). Individual sample points
are highlighted as solid dots while low-loss pairs in this
2D space are highlighted for giving low state-estimation
[purple] or prediction [crimson] risk via shaded circles.
As the model pathologies indicated above increase, these
FIG. 7. (a)-(d) Blue (red) open markers plot LKFFB (AKF)
data demonstrate a divergence between regions of the op-
spectrum estimates; true spectrum (flat top) of f plotted
timisation space which permit low-loss state estimation in black solid line. Dashed black vertical line marks true
and forward prediction for LKFFB. In contrast, overlap noise cutoff, Jω0 , and this is varied relative to a measure-
of low loss Bayes Risk regions do not change for AKF (B)
ment sampling rate, ω(S) , and ω (B) ≡ ω0 J (B) ≈ ω(S) /r in
across Fig. 6 (i)-(l). LKFFB; such that ω0 /2π = 0.497 Hz, J = 20, 40, 80, 200. For
Kalman filtering algorithms employed here combine re- LKFFB, blue open markers are ∝ ||x̂jn ||2 in a single run with
cursive state estimation with the establishment of a dy- (B)
ω0 /2π = 0.5 Hz for j ∈ J (B) = 100 oscillators; dashed blue
namical model in the Fourier domain. Therefore, one vertical line marks edge of LKFFB basis. For AKF, red mark-
way to explore algorithmic performance is to look di- ers are Ŝ(ω) computed using learned {φq0 ≤q } and optimised
rectly at the efficacy of spectral estimation relative to σ ∗ , with order q = 100. In all plots, the zeroth Fourier com-
the true (here numerically engineered) hidden dynam- ponent is omitted on the log scale; and NT = 2000, NP = 50
ics of f . For both the LKFFB and AKF we plot the steps, ∆t = 0.001s, r = 20, with M = 50 runs, K = 75 trials;
extracted power spectral density, S(ω), as a function of N.L. = 1%.
angular frequency ω, for different measurement sampling
conditions in Fig. 7 against the true spectrum used to de-
fine f . These simulated experimental conditions match ing given the generally superior performance of the AKF
those introduced in Fig. 4 (b). in predictive estimation, but does highlight the practi-
In the case of LKFFB, we plot the learned instan- cal difference between Fourier-domain spectral estima-
taneous amplitudes from a single run [blue markers] tion and time-domain prediction.
and for AKF we extract optimised algorithm parameters
as described above [red markers]. Under the assertion
that the LSF implements an AR(q) process, the set of C. Performance of the quantised Kalman filter
trained parameters, {{φq0 ≤q }, σ 2 } from AKF allows us
to derive experimentally measurable quantities, includ-
ing the power spectral density of the dephasing process: The discrete nature of projective measurement out-
 Pq 0
−1 comes in quantum systems poses a potential challenge
S(ω) = σ 2 2π|1 − q0 =1 φq0 e−iωq |2 ) [35]. for Kalman filters in the event that measurement pre-
The critical feature in these data-sets is the existence processing as in Fig. 1(b) is not performed. We test filter
of a flat-top spectrum possessing a sharp high frequency performance for predictive estimation when only binary
cutoff. Both classes of Kalman filtering algorithm suc- measurement outcomes are available via the QKF. To re-
cessfully identify this structure and locate this high- iterate, QKF estimates and tracks hidden information,
frequency cutoff. In general, however, the LKFFB pro- fn , using the Kalman true state xn . In our construction
vides superior spectral estimation relative to the AKF, the associated probability for a projective qubit measure-
and enables better estimation of the signal strength in ment outcome, ∝ zn is not inferred or measured directly
the Fourier domain even in the presence of imperfect but given deterministically by Born’s rule encoded in the
projection of f onto the basis used in LKFFB. The only non-linear measurement model, zn = h(fn ). The mea-
case in which the LKFFB fails is in Fig. 7(d), where surement action is completed by performing a biased coin
the LKFFB basis is ill-specified relative to the true noise flip, where zn determines the bias of the coin.
bandwidth. The observed behavior is somewhat surpris- For QKF, the normalised ensemble-averaged predic-
tion risk, h(zn − ẑn )2 if,D /h(zn − µz )2 if,D , is calculated
15

with respect to z as the relevant quantity parameterising


qubit-state evolution, instead of the stochastic underly-
(a) Desired Perf.
ing f . This quantity is labeled as Norm. Risk in Fig. 8
and we test if h(zn − ẑn )2 if,D /h(zn − µz )2 if,D < 1 for
0 < n < n∗ can be achieved for numerical experiments
considered previously in the linear regime. In particular,

Norm. Risk (a.u.)


we generate true f defined in numerical experiments in
Fig. 4(b) (and Fig. 7) for q = 100 and varying sample True
rates.
We isolate the role of the measurement action by first
(b)
inputting into the QKF a true dynamical model rather
than a dynamical model learned as in the standard AKF.
To specify true dynamics, we begin with a set of {φq0 ≤q }
and exactly derive a new f 0 . As a result the full set of
parameters relevant to the filter, {{φq0 ≤q }, σ, R}, are per-
fectly defined and known, and the filter simply acts on Learned
single shot qubit measurements. These simulations re-
veal that subject to generic measurement oversampling
Time Steps (num)
conditions introduced above the QKF is able to success-
fully enable predictive estimation. As in the linear case,
the absolute forward prediction horizon is arbitrary rela- FIG. 8. Norm. Risk against n > 0 plotted for QKF in open
markers; dark-grey line at µf ≡ 0 depicts performance un-
tive to ω0 J/ω (B) and implicitly, an optimisation over the
der predicting the noise mean. QKF outperforms predict-
choice of q for a finite data size, NT , in our application. ing the mean if open markers lie in green regions. Marker
Our simulations reveal that the QKF is considerably colour (dark indigo to pink) depicts true noise cutoff varied
more sensitive to measurement noise, model errors, and Jω0 /ω(B) = 0.2, 0.4, 0.6, 0.8 for f defined identically in Fig. 7
the degree of undersampling than the linear model as with ω0 /2π = 0.497 Hz, J = 20, 40, 60, 80; N.L. = 1%. (a)
shown in Fig. 8 (b). Here the QKF incorporates a learned We obtain {φq0 ≤q }, q = 100 coefficients from AKF/LSF act-
dynamical model from AKF in the linear regime and we ing on a linear measurement record generated from true f .
tune (σ, R) for use in the QKF. In particular, we ex- A new truth, f 0 , is generated from an AR(q) process using

plore σ ≥ σAKF to incorporate model errors as {φq0 ≤q } {φq0 ≤q }, q = 100 as true coefficients and by defining a known,
were learned in the linear regime. We also incorporate true σ. Quantised measurements from f 0 are obtained; data is
increased measurement noise via R ≥ RAKF ∗
as QKF re- corrupted by measurement noise of a true, known strength R.
(b) We use {φq0 ≤q }, q = 100 coefficients from (a) but we gener-
ceives raw data that has not been pre-processed or low-
ate quantised measurements from the original, true f . QKF
pass filtered. The underlying optimisation problems are noise design parameters are optimised for (σAKF ∗
≤ σQKF ,
well behaved for all cases in Fig. 8(b) [not shown]. As ∗
RAKF ≤ RQKF ) with M = 50 runs, K = 75 trials. For
the sampling rate is reduced, the QKF forward predic- (a)-(b), NT = 2000, NP = 50 steps, ∆t = 0.001s, r  2.
tion horizon collapse rapidly i.e h(zn − ẑn )2 if,D /h(zn −
µz )2 if,D > 1 prediction risk for all n > 0.
implications of the choice of kernels in our application,
rather than making comparative statements about kernel
D. Failure of GPR in predictive estimation performance.
The results we have assembled demonstrate that the
Under a GPR framework, we test whether predictive implementation of GPR with a periodic kernel critically
(B)
performance can be improved by considering the entire depends on the frequency basis comb spacing, ω0 , or
measurement record (at once) and projecting this record equivalently, a deterministic quantity, κ:
on an infinite basis of oscillators summarised by a pe- 2π
riodic kernel. We investigate several different types of κ≡ (B)
− NT (46)
GPR models for M = 50 realisations of f in the top ∆tω0
panel of Fig. 9. For the results shown, we use a popular (B)
choice of a maximum-likelihood optimisation procedure The term 2π/∆tω0 is the theoretical number of mea-
implemented via L-BFGS in GPy [47]. surements that, in principle, would be required to phys-
We find that the underlying optimisation procedure ically achieve the Fourier resolution set by the kernel
(B)
for training on our measurement records remains diffi- hyper-parameter, ω0 , and the fundamentally discrete
cult despite having access to an analytical calculation for nature of a sequential Ramsey measurement record, ex-
the cost function. For all results in Fig. 9(a) and (b), pressed by ∆t. Hence, if κ = 0, the physical Fourier
we use significant manual tuning prior to deploying the resolution determined by the data set matches the comb
automated procedures in GPy. Hence, we focus on us- spacing in the periodic kernel. For κ > 0, the comb spac-
ing numerical results under GPR to illuminate structural ing in the periodic kernel is less than the Fourier spac-
16

ing defined by the experimental data collection protocol, (a)


with total measurements NT .
In Fig. 9(a), we see that GPR predictive perfor-
mance for the periodic kernel improves as the Ker-
nel’s comb spacing is reduced. For each value of κ
we plot L̃BR against time-steps forward, n‡ , where the

(a.u.)
MAT32
‡ corresponds to the evaluation of a predictive GPR QPER
distribution on arbitrarily chosen test points, n‡ =
−NT , . . . , −1, 0, 1 . . . , NP . Here, the optimiser is con-
(B) PER
strained to a region in 2π/ω0 parameter space that RBF
corresponds to the order of magnitude for κ. Grey mark- RQ
ers correspond to κ ≤ 0, where the algorithm operates
above (or at) the Fourier resolution. In this physically
motivated parameter regime, prediction fully fails. It is
not until we set κ ∼ 103 – a nominally unphysical operat- (smooth)
ing regime where the algorithm’s frequency-comb spacing Perfect Projection Imperfect Projection
is smaller than the Fourier resolution – that prediction (b) (c)
succeeds [red traces]. This latter case is physically dif-
ficult to interpret given that in this regime we find the
best ensemble-averaged predictive performance only by
providing unphysical freedom to the algorithm. We note
that the optimised length scale for the periodic kernel

(a.u.)
remains on of order ∆t ∼ 10∆t, such that for all red tra-
(B) High Kernel Resolution Both (c) and (d)
jectories in panel (a), we are operating in a high 2π/ω0 ,
(d) (e)
low l limit.
We contextualise the predictive performance of the
GPR periodic kernel (PER) [red solid line] in the high-
κ, low-l limit by comparing against predictions derived
using other standard kernels [dotted lines] in the in-
set to Fig. 9(a). In such circumstances the predictive
performance of the periodic kernel predictive is on par -NT
with an application of a Gaussian kernel (RBF) and a Time Steps (num)
.
scale mixture of zero mean Gaussians with different de-
cay lengths (RQ). A Matern kernel (MAT32) and a quasi FIG. 9. (a) L̃BR v. n‡ (in units of number of time steps)
periodic kernel (QPER) yield lower-than-anticipated per- are plotted for GPR with a periodic kernel. Dark-grey hori-
formance. Further discussion of the choice of kernel ap- zontal line at unity for µf ≡ 0 marks L̃BR under predicting
pears in Sec. V. For each individual time-trace contribut- the mean; GPR outperforms predicting the mean if data falls
ing to the ensemble averages appearing here, we observe below this line. Grey-black markers correspond to optimi-
that all kernels (PER, RBF, RQ, MAT32, QPER) yield sation within physical bounds for κ ≤ 0 (kernel resolution
good state estimation and the state estimate at n‡ = −1 at or above Fourier resolution); crimson markers and lines
depict optimisation within unphysical regimes, κ > 0; with
agrees well with the truth. For GPR with a PER, RBF,
solid lines in high κ  0 regime. Remaining {R, σ, l} opti-
and RQ kernels, the state estimate at n‡ = −1 smoothly
mised for non-negative values. Inset (a) L̃BR v. n‡ of pe-
decays to the mean value (zero) for n‡ ≥ 0 and this ef- riodic kernel (PER) with κ ≈ 103 is plotted against results
fect yields a favourable normalised Bayes prediction risk from naively trained Gaussian kernels (RBF, RQ); a Matern
immediately after n‡ > 0 depicted by the solid lines in kernel (MAT32) and a quasi-periodic kernel (QPER). (b)-(d)
inset of Fig. 9(a). True state fn v. n [black solid line] and GPR predictions µ̂f ‡
In order to illustrate the operating mechanism for the v. n‡ [open markers] plotted for periodic kernel for tracking
periodic kernel, we dramatically simplify the model used a sinusoid with frequency, ω0 ; noisy data record [not shown]
for f in Fig. 9 (a) and replace it with a single-frequency ceases at n = 0. We fix κ = 0, 70; triangles plot predictions for
sine curve. Fig. 9 (b)-(e) demonstrates the prediction manually tuned {R, σ, l}; circles plot predictions for optimised
routine for GPR using a periodic kernel on a simplified {R, σ, l}. Vertical dashed lines mark n = κ, where we overlay
version of f , and as before, prediction is always con- true f at the beginning of the data record as a red dashed
(B)
ducted from time-step zero. For this simple example, the line. (b) Perfection projection is possible ω0 /ω0 ∈ Z (nat-
periodic kernel learns Fourier information in the mea- ural numbers), ω0 /2π = 3 Hz. (c) Imperfect projection, with
(B)
surement record enabling interpolation using test-points ω0 /ω0 ∈ / Z, ω0 /2π = 3 13 Hz, κ = 0. (d) Moderately raise
(B)
n‡ ∈ [−NT , 0] for all cases (b)-(e) in Fig. 9, and atyp- κ > 0, such that ω0 /ω0  0 ∈ / Z for original ω0 /2π = 3
(B)
ical features are seen only for test-points in the predic- Hz. (e) Test (c) and (d) for κ > 0, ω0 /ω0 ∈ / Z, ω0 /2π = 3 13
tion region [blue shaded region]. We consider predictions Hz. For (b)-(e), NT = 2000, NP = 150 steps, ∆t = 0.001s;
N.L. = 1%.
17

from a manually tuned model [triangles] and an opti- discretised projective measurement models via what we
mised GPR model where remaining free {σ, R, l} param- refer to as the QKF. In QKF, we employ single-shot,
eters are tuned using GPy [circles]. discretised qubit data while enabling model-robust qubit
An examination of different cases for imperfect learn- state tracking and increased measurement noise filtering
ing reveal that this discontinuity exhibits deterministic via the underlying AKF algorithm. However we find that
behavior linked to the underlying structure of the algo- the QKF is vulnerable to the buildup of errors for arbi-
rithm, namely, to the value of κ. In our numerical ex- trary applications and we provide three explanatory re-
periments, we find that in all cases of imperfect learning marks from a theoretical perspective. First, the Kalman
under GPR with a periodic kernel, a discontinuity in the gains are recursively calculated using a set of linear equa-
prediction sequence arises at n‡ = κ. This is marked tions of motion which incorporate the Jacobian Hn of
by the vertical dashed lines in all panels of Fig. 9(b)- h(xn ) at each n. All non-linear Kalman filters perform
(e). However, another feature appears which we identify well if errors during filtering remain small such that the
as being linked to oversampling of the underlying pro- linearisation assumption holds at all time-steps. Second,
cess determining f . In such cases, the algorithm simply measurements are quantised and hence residuals must be
predicts zero out to n‡ = κ before discontinuously pre- {−1, 0, 1} rather than continuously represented floating-
dicting future evolution which does not appear similar point numbers. In our case, the Kalman update to xn
to the true value of f . By contrast an optimised model at n, mediated by the Kalman gain cannot benefit from
gives smoothly varying predictions, which still adhere to a gradual reduction in residuals. A third effect incor-
the underlying behaviour set by κ for n‡ > 0. porates consequences of both quantised residuals and a
In Fig. 9(b)-(e), we also plot the value of f as given non-linear measurement action. In linear Kalman filter-
from n = −NT , the start of the data set, on top of the ing, Kalman gains can be pre-calculated in advance of
prediction from n‡ = κ. Here we see that the prediction the acquisition of any measurement data: the recursion
provided by GPR matches the earliest stages of the un- of Kalman state-variances Pn , can be decoupled from the
derlying data set well. Through various numeric experi- recursion of Kalman state-means, xn [31]. In our appli-
ments we find that the action of GPR in such parameter cation, quantised residuals affect the Kalman update of
regimes (moderately positive κ > 0) appears to be to xn , and further, they affect the recursion for the Kalman
simply repeat the learned values of f from n = −NT be- gain via the state dependent Jacobian, Hn .
ginning at n‡ = κ. Accordingly these predictions rarely In this context, we demonstrate numerically that the
describe the underlying forward dynamics of f well. QKF achieves a desirable forward prediction horizon
As we enter the high κ regime, κ  0, the features when the build of errors during filtering is minimised, for
in Fig. 9(b)-(e) disappear, and GPR predictions begin to example, by specifying Kalman state dynamics and noise
track the (slow moving) ‘truth’ for n‡  0. Analogously strengths perfectly, and/or by severely oversampling rel-
to inset (a), we see the performance of PER approach ative to the true dynamics of f . At present, we sim-
that of standard Gaussian kernels in this simplified case. ply interpret our results on the QKF as demonstration
that one may in principle track stochastic qubit dynamics
using single shot measurements under a Kalman frame-
V. DISCUSSION work. The QKF also has the benefit, as constructed, of
reverting to the AKF if suitable pre-processing of data
is performed prior to execution of the iterative state-
The numeric simulations we have performed probe a
estimation algorithm. In common laboratory settings
wide variety of operating conditions in order to explore
the measurement protocol may be effectively linearised
the algorithmic pathologies of leading forecasting tech-
through simple averaging of multiple single-shot mea-
niques drawn from engineering, econometrics, and ma- surements, application of Bayesian estimation protocols,
chine learning communities. Our central finding is that or other pre-processing as identified above. So long as
overall the autoregressive Kalman filter provides an ef- the pre-processing takes place on timescales fast relative
fective path to perform both state estimation and for- to the underlying qubit dynamics, the measurement lin-
ward prediction for non-Markovian qubit dynamics. Re- earization has no impact other than to change the ef-
casting dynamics into an AKF filter, importantly, pro- fective sample rate of the measurements. Thus it is our
vides model robustness against details of the underlying view that full implementation of the QKF is not essential
dynamics as well as filtering of noise that allows it to if improved optimization routines are not accessible.
outperform the simpler LSF in [28]. Measurement noise It is possible that QKF forward prediction hori-
filtering is enabled in the Kalman framework through zons in realistic learning environments can be improved
the optimisation procedure for R and has a regularising by solving the full q + 2 optimisation problem for
(smoothing) effect. Additionally optimisation of the im- {{φq0 ≤q }, σ, R}, rather than employing the approach
perfectly learned dynamical model is provided through taken in this manuscript. However, this poses its own
the tuning of σ. The joint optimisation procedure over challenges given the observations we make about the op-
(σ, R) ensures that the relative strength of noise param- timisation landscape even for the 2D optimisation prob-
eters is also optimised. lem faced in the AKF. More sophisticated, data-driven
AKF has also been demonstrated to work well with
18

model selection schemes are described for both KF and mance of the AKF or LKFFB (refer to panel Fig. 5(b- ii),
kernel learning machines (such as GPR) in literature equivalently, Fig. 6(c)). This difference is somewhat sur-
(e.g. [48, 49]). Beyond standard local-gradient and sim- prising because in the limit that Γn is set to the identity
plex optimisers, we consider coordinate ascent [50] and in LKFFB and an infinite basis of oscillators in the pe-
particle swarm optimisation techniques [51] as promis- riodic kernel is truncated at the finite value, J (B) , both
ing, nascent candidates and their application remains an LKFFB and the GPR-PER are formally equivalent to
open research question. One may also consider switch- classical Kalman filtering for a collection of J (B) inde-
ing from a high order AR(q) to an ARMA model with pendent state-space resonators [42]. In this limit, the
a smaller number of optimisation parameters. Typically, true f is described by theoretically identical covariance
this is accomplished by incorporating either greater a pri- functions in both KF and GPR frameworks. While we do
ori information about the underlying dynamic process in not operate in this regime, one would expect predictive
the design of the ARMA model and/or using model-less capabilities of these two algorithms to be comparable.
particle-based / unscented filtering techniques to over- In contrast to our observations for the various flavors
come non-linearities in an ARMA representation (e.g. of KF tested here, we observe that GPR predictions with
[2]). The latter set of techniques are well adapted for non- a periodic kernel are useful for filtering/retrodiction but
linear models but are likely to require a modification to appear to have limited meaning for forward predictions
allow for non-Markovian dynamics (e.g. by designing an for time-steps n = n‡ > 0. In our application, predictive
appropriate transition probability for otherwise Markov performance of GPR with a periodic kernel for κ = 0 is
re-sampling procedures); in contrast, a typical recursive shown to yield poor predictive performance over the en-
ARMA formulation for our application may track tempo- semble average (Fig. 9(a)). For the unexpected regime of
ral correlations but be ill-equipped for non-linear, coin- κ  0 and relatively small fixed l, predictive performance
flip measurements. One expects that a straightforward improves and the periodic kernel performs similarly to
application of such procedures will be complicated. RBF and RQ. In this a high κ and a low l regime, the sin
Our general results on the use of autoregressive models term of the periodic kernel is slowly moving (sin(x) ≈ x)
for building Kalman dynamical models stand in contrast and hence the argument of the exponential in the peri-
to Fourier-domain approaches in LKFFB and GPR us- odic kernel approximates a Gaussian, reducing to an RBF
ing a periodic kernel; both show significant performance kernel. Our numerical investigations show that an opti-
degradation in cases when learning of state dynamics was mised RQ kernel consistently chooses parameter regimes
imperfect. In investigating the loss of performance for where an RQ also converges to an RBF. For the operat-
LKFFB, we find that the efficacy of this approach de- ing regimes pertinent to our application, it appears that
pends on a careful choice of a probe (i.e. a fixed compu- the choice of the periodic, RBF, and RQ kernels will pro-
tational basis) for the dynamics of f capturing the effect duce theoretically equivalent results for forward predic-
of dephasing noise on the qubit. In the imperfect learn- tions of the qubit state. In our analysis, these ‘forward
ing regime of Fig. 4 and identically, Fig. 7, LKFFB re- predictions’ simply arise from a smoothed decay of state
constructs Fourier domain information to a high fidelity estimates starting from test-point n‡ = −1 to the noise
across a range of sampling regimes but is outperformed mean for test-points n‡ > 0; and are difficult to interpret
by AKF in the time domain (Fig. 4). Since LKFFB compared to their Kalman counterparts.
tracks instantaneous amplitude and phase information Our numerical characterisation of the periodic kernel
explicitly for each basis frequency, the loss of LKFFB for a simple, noiseless f demonstrates that this kernel
time-domain predictive performance must accrue from learns Fourier domain amplitude information in a way
difficulty in tracking instantaneous phase, rather than that is better suited for pattern fitting than forward pre-
amplitude, information. diction. The predictive time domain sequence of state
While difficulty of instantaneous phase estimation is estimates is repetitive at n = n‡ = κ, and can be in-
likely to disadvantage the time-domain predictive per- terpreted as successful qubit-state predictions only when
formance of LKFFB, our results show that a Fourier- f is perfectly learned (no discontinuities appear). When
domain approach yields high fidelity reconstructions of learning is imperfect, however, GPR with a periodic ker-
power spectral density describing f . These reconstruc- nel is able to learn Fourier amplitudes to provide good
tions appear robust against imperfect projection on the retrodictive state estimates for n‡ < 0, but forward pre-
LKFFB oscillator basis even as oversampling is reduced. dictions for n‡ > 0 typically fail. Unlike LKFFB, we
This suggests that an application of LKFFB outside of believe the periodic kernel does not permit actively ex-
predictive estimation could be tested against standard tracting and updating phase information for each individ-
spectral estimation techniques in future work. ual basis oscillators at n‡ = κ. Since phase information
The challenge in adapting GPR for the task of time- can be recast as amplitude information for any fixed-
domain predictive estimation has proved more striking. frequency oscillator, one would naively expect that for-
In our numerical simulations, under conditions compara- ward predictions can be improved by increasing κ moder-
ble to those tested in the AKF, the values of normalised ately, such that the higher order terms in a series expan-
Bayes prediction risk for all GPR models are at least an sion of the sin term are non trivial and sin(x) ≈ x cannot
order of magnitude greater than the comparable perfor- apply. However, any positive value of κ means that we
19

are probing dynamics at frequencies lower than appear- linear measurement routines and validate the utility of
ing in the data-set. As such, a GPR-PER model predicts the Kalman filtering framework for both. In contrast,
zero for n‡ ∈ [0, κ], κ > 0, before reviving at κ. The use under GPR, we found numerical evidence that this ap-
of a procedure optimising kernel noise parameters {σ, R} proach enables retrodiction but not forward predictions
does not change the behavior as n‡ → κ, but does smooth beyond the measurement record.
the discontinuities, as illustrated in Fig. 9(f). In letting There are exciting opportunities for machine learning
κ  0 (extremely large), we lose the uniqueness of the algorithms to increase our understanding of dynamically
periodic kernel in summarising an infinite basis of oscil- evolving quantum systems in real time using projective
lators, and standard Gaussian kernels (e.g. RBF, RQ) measurements. Quantum systems coupled to classical
are likely to apply. spatially or temporally varying fields may benefit from
It is possible that the choice of more complex kernels classical algorithms to analyse correlation information
could enhance forward time series predictions via GPR, and enable predictive control of qubits for applications
but they bring additional complications which thus far re- in quantum information, sensing, and the like. Moving
main unresolved in relation to the current application. As beyond a single qubit, we anticipate that measurement
one example, our ability to use numerical investigations records will grow in complexity allowing us to exploit the
to inform kernel design is further distorted by the need natural scalability offered by machine learning for min-
for a robust optimisation procedure, as illustrated by ing large datasets. In realistic laboratory environments,
lower-than anticipated predictive performance observed the success of algorithmic approaches will be contingent
for QPER. Another class of GPR methods, namely, spec- on robust and computationally efficient algorithmic op-
tral mixture kernels and sparse spectrum approximation timisation procedures as well as the extensions beyond
using GPR have been explored in [52, 53]. However, Markovian dynamics studied here. The pursuit of these
these techniques also require efficient optimisation proce- opportunities is the subject of ongoing research.
dures to learn many unknown kernel parameters, whereas
the sine-squared exponential in the periodic kernel is pa-
rameterised only by two hyper-parameters. Aside from VII. ACKNOWLEDGMENTS
spectral methods, the generalisation of MAT32 to higher
q + 1/2 models probes only a subset of all possible AR(q) The LSF filter is written by V. Frey and S. Mavadia
processes, as the restrictions on autoregressive coeffi- [28]. The GPR framework is implemented and optimised
cients in Matern kernels are greater than the general using standard protocols in GPy [47]. Authors thank C.
case considered under an AKF in this manuscript. A Granade, K. Das, V. Frey, S. Mavadia, H. Ball, C. Fer-
detailed investigation of the application of such methods rie and T. Scholten for useful comments. This work par-
for forward prediction beyond pattern recognition and tially supported by the ARC Centre of Excellence for En-
with limited computational resources, remains an area of gineered Quantum Systems CE110001013, the US Army
future investigation. Research Office under Contract W911NF-12-R-0012, and
a private grant from H. & A. Harley.

VI. CONCLUSION

In this manuscript, we provided a detailed survey of


machine learning and filtering techniques applied to the
problem of tracking the state of a qubit undergoing non-
Markovian dephasing via a record of projective measure-
ments. We specifically considered the task of performing
predictive estimation: learning dynamics of the system
from the measurement record and then predicting evolu-
tion forward in time. To accommodate stochastic dynam-
ics under arbitrary dephasing, and without an a priori dy-
namical model, we chose two Bayesian learning protocols
- Gaussian Process Regression (GPR) and Kalman Fil-
tering (KF). All Kalman algorithms predicted the qubit
state forward in time better than predicting mean qubit
behaviour, indicating successful prediction, though an
autoregressive approach to building the Kalman dynam-
ical model demonstrated enhanced robustness relative to
Fourier-domain approaches. Forward prediction horizons
could be arbitrarily increased for all Kalman algorithms
by oversampling the underlying dephasing noise. Our
investigations included studies of both linear and non-
20

Appendix A: Physical Setting

In this Appendix, we derive Eq. (1). We consider a qubit under environmental dephasing. For any two level
system, a quantum mechanical description of physical quantities of interest can be provided in terms of the Pauli
spin operators {σ̂x , σ̂y , σ̂z }. If ~ωA corresponds to an energy difference separating these two qubit states, then the
Hamiltonian for a single qubit in free evolution can be written in the Pauli representation. We consider a qubit states
in the σ̂z basis, |0i or |1i with energies E0 , E1 in our notation, corresponding to a 0 or 1 outcome upon measurement.
This yields a Hamiltonian for a single qubit as:

σ̂z ≡ |1ih1| − |0ih0| (A1)


Î ≡ |0ih0| + |1ih1| (A2)
1
E0,1 ≡ ∓ ~ωA (A3)
2
1
Ĥ0 = (E0 |0ih0| + E1 |1ih1|) (A4)
2
1
+ [(E1 − E0 )σ̂z + E0 |1ih1| + E1 |1ih1|] (A5)
2
1
= ~ωA σ̂z (A6)
2
In this representation, the effect of dephasing noise on a free qubit system is that any initially prepared qubit
superposition of |0i and |1i states will decohere over time in the presence of dephasing noise. This physical effect is
modelled as a stochastically fluctuating process δω(t) that couples with the σ̂z operator. The noise Hamiltonian is
described as:
~
ĤN (t) ≡ δω(t)σ̂z (A7)
2
In the formula above, δω(t) is a classical, stochastically fluctuating parameter that models environmental dephasing
and ~/2 appears as a convenient scaling factor. The total Hamiltonian for a single qubit under dephasing is:
Ĥ(t) ≡ Ĥ0 + ĤN (t) (A8)
Since ĤN (t) commutes with Ĥ0 , we can transform away Ĥ0 by moving to a rotating frame with respect to H0 . Let
|ψ(t)i be a state in the lab frame, let Û define a transformation to a rotating frame, and let |ψ̃(t)i be the state in
the rotating frame. The notation,˜, indicates operators and states in the transformed frame. In this simple case, the
transformed Hamiltonian governing the evolution of |ψ̃(t)i will just be ĤN (t):

Û ≡ e−iĤ0 t/~ (A9)



|ψ̃(t)i ≡ Û |ψ(t)i (A10)
d d
i~ |ψ̃(t)i ≡ i~ Û † |ψ(t)i (A11)
dt dt
d
= −Ĥ0 Û † |ψ(t)i + i~Û † |ψ(t)i (A12)
dt
= (Û † H(t)Û − Ĥ0 )|ψ̃(t)i (A13)
ˆ ≡ Û † H(t)Û − Ĥ
=⇒ H̃ (A14)
0
† †
= Û Ĥ0 Û + Û ĤN (t)Û − Ĥ0 (A15)
= ĤN (t), [Û , Ĥ0 ] = [Û , ĤN (t)] = 0 (A16)
(A17)
In the semiclassical approximation, ĤN (t) commutes with itself at different t, and hence we can write a unitary time
evolution operator in the rotating frame as:
ˆ (t, t + τ ) ≡ e− ~i Rtt+τ ĤN (t0 )dt0 = e− 2i f (t,t+τ )σ̂z
Ũ (A18)
Z t+τ
f (t, t + τ ) ≡ δω(t0 )dt0 (A19)
t
21

In the rotating frame, we prepare an initial state that is a superposition of |0i and |1i states. This state evolves
under ĤN (t) during a Ramsey experiment for duration τ . Subsequently, the qubit state is rotated before a projective
measurement is performed with respect to the σ̂z axis i.e. the measurement action resets the qubit.
Without loss of generality, define the initial state as |ψ̃(0)i ≡ √12 |0i + √12 |1i in the rotating frame. Then, the
probability of measuring the same state after time τ in a single shot measurement, dn as:
ˆ (0, τ )|ψ̃(0)i|2
P r(dn = 1|f (0, τ ), τ ) = |hψ̃(0)|Ũ (A20)
P r(dn = 0|f (0, τ ), τ ) ≡ 1 − P r(dn = 1|f (0, τ ), τ ) (A21)

The second π/2 control pulse rotates the state vector such that a measurement in σ̂z basis is possible, and the
probabilities correspond to observing the qubit in the |1i state. Hence, Eq. (A20) defines the likelihood for single
shot qubit measurement. Further, Eq. (A20) defines the non linear measurement action on phase noise jitter, f (0, τ ).
We impose a condition that f (0, τ )/2 ≤ π such that accumulated phase over τ can be inferred from a projective
measurement on the σ̂z axis.

1. Experimentally Controlled Discretisation of Dephasing Noise

In this section, we consider a sequence of Ramsey measurements. At time t, the Eq. (A20) describes the qubit
measurement likelihood at one instant under dephasing noise. We assume that the dephasing noise is slowly drifting
with respect to a fast measurement action on timescales of order τ . In this regime, Eq. (A19) discretises the continuous
time process δω(t), at time t, for a number of n = 0, 1, ..., N equally spaced measurements with t = n∆t. Performing
the integral for τ  ∆t and slowly drifting noise such that we substitute the following terms in Eq. (A19):

δ ω̄n ≡ δω(t0 )|t0 =n∆t (A22)


fn ≡ f (n∆t, n∆t + τ ) (A23)
~ n∆t+τ
Z
~
= δ ω̄n dt0 = σ̂z δ ω̄n τ (A24)
2 n∆t 2
In this notation, δ ω̄n is a random variable realised at time, t = n∆t, and it remains constant over short duration of
the measurement action, τ . We use the shorthand fn ≡ f (n∆t, n∆t + τ ) to label a sequence of stochastic, temporally
correlated qubit phases f ≡ {fn }.
Since the qubit is reset by each projective measurement at n, the unitary operator governing qubit evolution is
also reset such that {Ũˆ ≡ Ũ ˆ (n∆t, n∆t + τ )} are a collection of N unitary operators describing qubit evolution for
n
each new Ramsey experiment. They are not to be interpreted, for example, as describing qubit free evolution without
re-initialising the system. Hence, for each stochastic qubit phase fn , the true probability for observing the |1i in a
single shot is given by substituting fn for f (0, 1) in Eq. (A20).
(
cos( f2n )2 for d = 1
P r(dn = d|fn , τ, n∆t) = (A25)
sin( f2n )2 for d = 0

The last line follows from the fact that total probability of the qubit occupying either state must add to unity. This
yields Eq. (1) in the main text.

2. True Dephasing Noise Engineering

In the absence of an a priori model for describing qubit dynamics under dephasing noise, we impose the following
properties on a sequence of stochastic phases, f ≡ {fn } such that we can design meaningful predictors of qubit state
dynamics. We assert that a stochastic process, fn , indexed by a set of values, n = 0, 1, . . . N satisfies:

E[fn ] = µf ∀n (A26)
E[fn2 ]
< ∞ ∀n (A27)
E[(fn1 − µ)(fn2 − µ)] = R(ν), ν = |n1 − n2 |, ∀n1 , n2 ∈ N (A28)
2
R(ν) 6= σ δ(ν) (A29)
22

Covariance stationarity of f is established by satisfying Eqs. (A26) to (A28) , namely that the mean is independent
of n, the second moments are finite, and the covariance of any two stochastic phases at arbitrary time-steps, n1 , n2 ,
do not depend on time steps but only on the separation distance, ν. The δ(ν) in the last condition, Eq. (A29), is the
Dirac-delta function and establishes that f is not delta-correlated (white). This condition captures the slowly drifting
assumption for environmental dephasing noise.
We also require that correlations in f eventually die off as ν → ∞ otherwise any sample statistics inferred from
noise-corrupted measurements are not theoretically guaranteed to converge to the true moments. Let M be the
number of runs for an experiment with M different realisations of the random process f , µf be the true mean, µ̂f its
estimate, DM denote the dataset of M experiments, and R(ν) define the correlation function for the true process, f .
Then mean square ergodicity states that estimators approach true moments only if the correlations die off over long
temporal separations:
M −1
1 X
lim R(ν) = 0 ⇐⇒ lim E[(µ̂f − µf )2 ]DM = 0
M →∞ M M →∞
ν=0
for ν = |nm1 − nm2 |, ∀m1 , m2 ∈ M, nm1 , nm2 ∈ N
M
1 X
with µ̂f = fnm (A30)
M m=0

The statement above means that a true R(ν) associated with f is bandlimited for sufficiently large (but unknown)
M . If correlations never ‘die out’, then any designed predictors for one realisation of dephasing noise will fail for a
different realisation of the same true dephasing. For the purposes of experimental noise engineering, we satisfy the
assumptions above by engineering discretised process, f , as:
J
X
fn = αω0 jF (j) cos(ωj n∆t + ψj ) (A31)
j=1
η
F (j) = j 2 −1 (A32)

As described in [46], α is an arbitrary scaling factor, ω0 is the fundamental spacing between true adjacent discrete
frequencies, such that ωj = 2πf0 j = ω0 j, j = 1, 2, ...J. For each frequency component, there exists a uniformly
distributed random phase, ψj ∈ [0, π]. The free parameter η allows one to specify an arbitrary shape of the true
power spectral density of f . In particular, the free parameters α, J, ω0 , η are true dephasing noise parameters which
any prediction algorithm cannot know beforehand.
It is straightforward to show that f is covariance stationary. To show mean square ergodicity of f , one requires
phases are randomly uniformly distributed over one cycle for each harmonic component of f [54]. Subsequently,
one shows that an ensemble average and a long time average of multi-component engineered f are equal. For the
evaluation of the long time average, we use product-to-sum formulae and observe that the case j 6= j 0 has a zero
contribution as any finite contribution from cosine terms over a symmetric integral are reduced to zero as N → ∞.
For j = j 0 , only a single cosine term survives. The surviving term depends on ν and N cancels to yield a finite,
non-zero contribution that matches the ensemble average.
We briefly comment that f is Gaussian by the central limit theorem in the regimes considered in this manuscript.
The probability density function of a sum of random variables is a convolution of the individual probability density
functions. The central limit theorem grants that each element of fn at n appears Gaussian distributed for large
J, irrespective of the underlying properties of the constituent terms or the distribution of the phases ψ. Numerical
analysis shows that J > 15 results in each fn appearing approximately Gaussian distributed.
There is an important difference between fn - defined here in Appendix A and - and fn in Appendices B and C.
In subsequent Appendices B and C, the term fn defines the ‘true model’ for an algorithmic representation of an
arbitrary covariance stationary process - either by invoking Wold’s decomposition theorem (AKF, QKF) or the
spectral representation theorem (LKFFB, GPR with Periodic Kernel). This means that fn in subsequent Appendices
only approximates the true covariance stationary stochastic qubit phases, {fn } of the Appendix A in the limit where
total size of available sample data increases to infinity. Our notation, fn , fails to distinguish these two different
interpretations as such a difference does not arise in typical applications - in our case, we have no a priori true
model of describing stochastic qubit phases, and must rely on mean square approximations. Henceforth, we retain
fn to be the true model for an algorithm with an understanding that this refers to an approximate representation
of an arbitrary, covariance stationary sequence of stochastic qubit phases. We reserve the use of the fˆn for the state
estimates and predictions that an algorithm makes having considered a single noisy measurement record.
23

Appendix B: Autoregressive Representation of f in AKF (and QKF)

Our objective in this Appendix is to justify the representation of fn assumed by the AKF. In particular, we justify
any fn drawn from any arbitrary power spectral density satisfying the properties in Appendix A 2 can be approximated
by a high order autoregressive process.
Such results are well known, if dispersed among standard engineering and econometrics textbooks [4, 11, 33–35, 55].
We struggled to find standard references that explicitly link high q AR models in approximating arbitrary covariance
stationary time series of arbitrary power spectral densities, though some general comments are made in [55]. In the
discussion below, we summarise relevant background, and link a high q AR process to a theorem that guarantees
arbitrary representation of zero mean covariance stationary processes, and provide explicit references for proofs out
of scope of introductory remarks in this Appendix. In order to achieve this, we will consider autoregressive (AR)
processes of order q, (AR(q)), and moving average processes of order, p (MA(p)). A model incorporating both types
of processes is known as an ARMA(q, p) model in our notation.
First, we define the lag operator, L. This operator defines a map between time series sequences and enables a
compact description of ARMA processes. For an infinite time series {fn }∞ n=−∞ and a constant scalar, c, the lag
operator is defined by the following properties:

Lfn = fn−1 (B1)


Lq fn = fn−q (B2)
L(cfn ) = cLfn = cfn−1 (B3)
Lfn = c, ∀n, =⇒ Lq fn = c (B4)

Next, we define a Gaussian white noise sequence, ξ, under the strong condition than what is stated simply in Eq. (B6),
that ξn1 , ξn2 are independent ∀n1 , n2 :

E[ξ] ≡ 0 (B5)
2
E[ξn1 ξn2 ] ≡ σ δ(n1 − n2 ) (B6)

With these definitions, we can define an autoregressive process and a moving average process of unity order. Eq. (B7)
defines an AR(q = 1) process and dynamics of fn are given as lagged values of the variable f . The second definition
in Eq. (B8) depicts a MA(p = 1) process where dynamics are given by lagged values of Gaussian white noise ξ.

(1 − φ1 L)fn = c + ξn (B7)
fn = c0 + (Ψ1 L + 1)ξn (B8)

Here, Ψ1 , φ1 are known scalars defining dynamics of fn ; wn is a white noise Gaussian process, and c, c0 are fixed
scalars. It is well known that an MA(∞) representation is equivalently an AR(1) process, and the reverse relationship
also applies. For example, we can re-write Eq. (B7) as:

fn = c + ξn + φ1 fn−1 (B9)
= wn + φ1 fn−1 (B10)
= wn + φ1 (wn−1 + φ1 fn−2 ) (B11)
..
. (B12)
= φn+1
1 F0 + φn1 w0 + φn−1
1 w1 + . . . wn (B13)
n+1
= φ 1 F0 + φn1 (c
+ ξ0 ) + . . . + (c + ξn ) (B14)
X n
n+1 n n−1
= φ1 F0 + c(φ1 + φ1 + . . . + 1) + φk1 ξn−k (B15)
k=0
wn ≡ c + ξn (B16)
F0 ≡ fn=−1 (B17)

In the last line (and for all subsequent analysis in this Appendix), k should only be interpreted as a index variable for
compactly re-writing terms in an equation as summations. We restrict |φ1 | < 1 such that f is covariance stationary
[34]. Under these conditions, we take the limit of f capturing an infinite past, namely, as n → ∞. The initial state
24

F0 is eventually forgotten, φn+1


1 F0 ≈ 0 if n is large and |φ1 | < 1. Similarly, the terms c(φn1 + φn−1
1 + . . . + 1) can be
summarised as a geometric series in φ1 . The remaining terms satisfy the definition of an MA(∞) process:


1 X
fn = c + φk1 ξn−k , |φ1 | < 1 (B18)
1 − |φ1 |
k=0

It is straightforward to show that the reverse is true, namely, an MR(1) is equivalent to an AR(∞) representation
[34].
The consideration of an MA(∞) process leads us directly to Wold’s decomposition for arbitrary covariance stationary
processes, namely, that any covariance stationary f can be represented as:


X
fn ≡ c0 + Ψk Lk ξn (B19)
k=0
c0 ≡ E[fn |fn−1 , fn−2 , . . .] (B20)
Ψ0 ≡ 1 (B21)

X
Ψ2k < ∞ (B22)
k=0

Eq. (B19) defines an MA(∞) process derived previously as an AR(1) process. This process is ergodic for Gaussian
ξ. However, such a representation of f requires fitting data to an infinite number of parameters {Ψ1 , Ψ2 , . . .} and
approximations must be made.
We approximate an arbitrary covariance stationary f using finite but high order AR(q) processes. Below we show
that any finite order AR(q) process has an MA(∞) representation satisfying Wold’s theorem.
We define an arbitrary AR(q) process as:

ξ_n ≡ (1 − φ_1 L − φ_2 L^2 − … − φ_q L^q)(f_n − c) (B23)

In particular, we define λ_i, i = 1, …, q as the eigenvalues of the dynamical model, Φ:

Φ ≡
( φ_1  φ_2  φ_3  …  φ_{q−1}  φ_q
  1    0    0    …  0        0
  0    1    0    …  0        0
  0    0    1    …  0        0      (B24)
  ⋮    ⋮    ⋮    ⋱  ⋮        ⋮
  0    0    0    …  1        0 )

λ ≡ (λ_1 … λ_q) s.t. |Φ − λI_q| = 0 (B25)
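A minimal numerical sketch of this construction follows (our illustration; the AR(3) coefficients are assumed example values). It builds the companion matrix Φ of Eq. (B24), extracts the eigenvalues λ_i, and cross-checks the factorisation of Eqs. (B27)–(B28) below by noting that the λ_i are the inverse roots of the lag polynomial:

```python
import numpy as np

def companion(phis):
    """Companion (dynamical model) matrix Phi of an AR(q) process, Eq. (B24)."""
    q = len(phis)
    Phi = np.zeros((q, q))
    Phi[0, :] = phis                  # first row carries the AR coefficients
    Phi[1:, :-1] = np.eye(q - 1)      # sub-diagonal of ones shifts the state
    return Phi

phis = [0.5, -0.3, 0.1]               # assumed example AR(3) coefficients
Phi = companion(phis)
lam = np.linalg.eigvals(Phi)          # the lambda_i of Eq. (B25)
print(np.all(np.abs(lam) < 1))        # all inside the unit circle: stationary

# Cross-check Eqs. (B27)-(B28): the lambda_i are the inverse roots of the
# lag polynomial 1 - phi_1 z - phi_2 z^2 - ... - phi_q z^q.
roots = np.roots([-p for p in phis[::-1]] + [1.0])
print(np.sort(1.0 / roots), np.sort(lam))
```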

We use, without proof, the following result from [34], namely that the above implies:

1 − φ_1 L − φ_2 L^2 − … − φ_q L^q (B27)
≡ (1 − λ_1 L) … (1 − λ_q L) (B28)

This yields:

ξ_n = (1 − λ_1 L) … (1 − λ_q L)(f_n − c) (B29)

For us to invert this problem and recover an MA process, we need to show that the inverse of each (1 − λ_{q′} L) term exists for q′ = 1, …, q. To do this, we start by defining the operator Λ_q(L):

Λ_q(L) ≡ lim_{k→∞} (1 + λ_q L + … + λ_q^k L^k) (B30)

We consider an arbitrary q′-th eigenvalue term in the process and multiply by Λ_{q′}(L):

Λ_{q′}(L) ξ_n = Λ_{q′}(L)(1 − λ_1 L) … (1 − λ_{q′} L) … (1 − λ_q L)(f_n − c) (B31)
= lim_{k→∞} (1 + λ_{q′} L + … + λ_{q′}^k L^k)(1 − λ_{q′} L)(1 − λ_1 L) … (1 − λ_{q′−1} L)(1 − λ_{q′+1} L) … (1 − λ_q L)(f_n − c) (B32)
= lim_{k→∞} (1 + λ_{q′} L + … + λ_{q′}^k L^k)(1 − λ_1 L) … (1 − λ_{q′−1} L)(1 − λ_{q′+1} L) … (1 − λ_q L)(f_n − c) (B33)
− lim_{k→∞} (λ_{q′} L + … + λ_{q′}^{k+1} L^{k+1})(1 − λ_1 L) … (1 − λ_{q′−1} L)(1 − λ_{q′+1} L) … (1 − λ_q L)(f_n − c) (B34)
= lim_{k→∞} (1 − λ_{q′}^{k+1} L^{k+1})(1 − λ_1 L) … (1 − λ_{q′−1} L)(1 − λ_{q′+1} L) … (1 − λ_q L)(f_n − c) (B35)

Each of the residual terms λ_{q′}^{k+1} L^{k+1} → 0 for large k if |λ_{q′}| < 1, and in this case Λ_{q′}(L) defines the inverse (1 − λ_{q′} L)^{−1}.
This procedure is repeated for all q eigenvalues to invert Eq. (B29) and subsequently perform a partial fraction expansion as follows:

f_n − c = [ 1 / ((1 − λ_1 L) … (1 − λ_q L)) ] ξ_n (B36)
= Σ_{q′=1}^{q} [ a_{q′} / (1 − λ_{q′} L) ] ξ_n (B37)
a_{q′} ≡ λ_{q′}^{q−1} / Π_{q″=1, q″≠q′}^{q} (λ_{q′} − λ_{q″}) (B38)
The coefficients a_{q′} are obtained via the partial fraction expansion method, during which L is treated as an ordinary polynomial variable. At this point, we have represented f via a finite, q-term weighted average of values of ξ. However, on substituting the definition Λ_{q′}(L) ≡ (1 − λ_{q′} L)^{−1} from Eq. (B30) into Eq. (B37) and regrouping terms in powers of L, we recover the form of an MA representation (setting c ≡ f̃_n = 0, ∀n for simplicity):
f_n = [ Σ_{q′=1}^{q} a_{q′} L^0 + lim_{k→∞} Σ_{k′=1}^{k} ( Σ_{q′=1}^{q} a_{q′} λ_{q′}^{k′} ) L^{k′} ] ξ_n (B39)
    = ( Ψ_0 + Σ_{k=1}^{∞} Ψ_k L^k ) ξ_n (B40)
Ψ_0 ≡ Σ_{q′=1}^{q} a_{q′} L^0 (B41)
Ψ_k ≡ Σ_{q′=1}^{q} a_{q′} λ_{q′}^k (B42)
By examining the properties of Φ raised to arbitrary powers, it can be shown that Σ_{q′=1}^{q} a_{q′} ≡ 1 and that Ψ_k is the first element of Φ raised to the k-th power [34], yielding absolute summability of Ψ_k if |λ_{q′}| < 1 for all q′. This ensures that Wold's theorem is fully satisfied and that a finite order AR(q) process has an MA(∞) representation. In moving to arbitrarily high q, we enable the approximation of any covariance stationary f.
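The sketch below (our illustration, with assumed AR(3) coefficients and assumed distinct eigenvalues) checks both identities numerically: the partial fraction coefficients a_{q′} of Eq. (B38) sum to one, and Σ_{q′} a_{q′} λ_{q′}^k reproduces the first element of Φ^k:

```python
import numpy as np

phis = np.array([0.5, -0.3, 0.1])     # assumed example AR(3) coefficients
q = len(phis)
Phi = np.zeros((q, q))                # companion matrix of Eq. (B24)
Phi[0, :] = phis
Phi[1:, :-1] = np.eye(q - 1)
lam = np.linalg.eigvals(Phi)          # assumed distinct eigenvalues

# Partial fraction coefficients a_{q'} of Eq. (B38).
a = np.array([lam[i]**(q - 1)
              / np.prod([lam[i] - lam[j] for j in range(q) if j != i])
              for i in range(q)])
print(np.sum(a))                      # ~1 (+0j), as claimed

# MA(infinity) weights two ways: Eq. (B42) vs the (1,1) element of Phi^k.
for k in range(5):
    psi_pf = np.sum(a * lam**k).real
    psi_mat = np.linalg.matrix_power(Phi, k)[0, 0]
    print(k, psi_pf, psi_mat)
```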
Proofs that high-q AR approximations of a covariance stationary f improve with q appear, for example, in [37]. The key correspondence is that the q finite lag terms in an AR(q) model contribute to the first q values of the covariance function. This approximation improves with q even if f is not a true AR process [37, 55]. Asymptotically efficient coefficient estimates for any MA(∞) representation of f are obtained by letting the order of a purely AR(q) process tend to infinity while increasing the total data size, N [37].
When the data size is fixed at N, we expect a high-q model to gradually saturate in predictive estimation performance; performance can be improved arbitrarily only by increasing both q and N [37]. In our application with finite data N, we increase q to settle on a high order AR model while training the LSF to track arbitrary covariance stationary power spectral densities [35].
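A minimal sketch of such a high-q AR fit is given below (our illustration; the synthetic signal, q and N are assumed values, and an ordinary least squares regression stands in for the LSF training described in the main text):

```python
import numpy as np

rng = np.random.default_rng(1)
N, q = 1000, 30                     # assumed data size and (high) AR order

# Synthetic covariance-stationary "truth": slow tones plus white noise,
# standing in for drifting dephasing noise; deliberately not a true AR process.
t = np.arange(N)
f = (np.sin(2 * np.pi * 0.010 * t) + 0.5 * np.sin(2 * np.pi * 0.033 * t)
     + 0.1 * rng.normal(size=N))

# Least squares fit of AR(q) coefficients: regress f_n on (f_{n-1}, ..., f_{n-q}).
X = np.column_stack([f[q - k - 1: N - k - 1] for k in range(q)])
y = f[q:]
phis, *_ = np.linalg.lstsq(X, y, rcond=None)

# Iterated one-step-ahead forward prediction beyond the data record.
history = list(f[-q:])
predictions = []
for _ in range(20):
    recent = history[::-1][:q]      # f_{n-1}, ..., f_{n-q}, most recent first
    nxt = float(np.dot(phis, recent))
    predictions.append(nxt)
    history.append(nxt)
print(np.round(predictions[:5], 3))
```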
A high-q AR model is often the first step in developing models with a smaller number of parameters, for example, by considering a mixture of finite order AR(q) and MA(p) models and estimating the p + q coefficients using a range of standard protocols [35, 55]. The design of potential ARMA models for our application requires further investigation beyond the scope of this manuscript.
Appendix C: Spectral Representation of f in GPR (Periodic Kernel) and LKFFB

The well-known spectral representation theorem guarantees that any covariance stationary random process (real or complex) can be represented in a generalised harmonic basis. We defer a detailed treatment of the spectral analysis of covariance stationary processes to standard textbooks, for example [34, 38], and present background and key results to provide insight into the choice of LKFFB and GPR (periodic kernel).
The spectral representation theorem states that any covariance stationary random process f_n has a representation, and correspondingly a probability distribution F(ω) over [−π, π] in the dual domain, such that:

f_n = μ_f + ∫_0^π [ a(ω) cos(ωn) + b(ω) sin(ωn) ] dω (C1)
R(ν) = ∫_{−π}^{π} e^{−iων} dF(ω) (C2)

Here, μ_f is the true mean of the process f. The processes a(ω) and b(ω) are zero mean and serially and mutually uncorrelated, namely, ∫_{ω_1}^{ω_2} a(ω) dω is uncorrelated with ∫_{ω_3}^{ω_4} a(ω) dω and with ∫_{ω_j}^{ω_{j′}} b(ω) dω for any ω_1 < ω_2 < ω_3 < ω_4 and any choice of j, j′ within the half cycle [0, π].
The distribution F(ω) exists as the limiting case of considering cumulative probability density functions for f_n at each n and letting n → ∞, such that the sequence of these density functions approaches F(ω) [38]. If F(ω) is differentiable with respect to ω, then the power spectral density S(ω) and R(ν) are Fourier duals [38]:

R(ν) = ∫_{−π}^{π} e^{−iων} S(ω) dω (C3)
S(ω) ≡ dF(ω)/dω (C4)

The duality of the covariance function and the spectral density is formally expressed in the literature by the Wiener–Khinchin theorem.
We consider the finite sample analogue of the spectral representation theorem above by following [34]. To proceed, we define mean square convergence as a distance metric: a sequence of random variables {f̂_n} converges to a random variable f_n in the mean square limit if:

E[f̂_n^2] < ∞ ∀n (C5)
lim_{n→∞} E[(f̂_n − f_n)^2] = lim_{n→∞} ||f̂_n − f_n|| = 0 (C6)
The statement ||f̂_n − f_n|| = 0 measures the closeness between the random variables f̂_n and f_n, even though the mean square limit is defined for terms of a sequence of random variables, {f̂_n}, where convergence improves as n → ∞. In the context of this study, we define f̂_n as a linear predictor of f_n belonging to a covariance stationary f. Hence, each f̂_n for large n is a linear combination of the set of random variables comprising all past noisy observations (and, in Kalman filtering, all past state predictions). Mean square convergence ||f̂_n − f_n|| → 0 in our context is a statement of the quality of the predictor f̂_n in predicting f_n as the total measurement data grows.
Next, we account for finite data and define the finite sample analogue of the spectral representation theorem. We suppose there exists a set of arbitrary, fixed frequencies {ω_j} for j = 1, …, J. We let n denote finite time steps for observing f_n at n = 1, …, N. Further, we define sets of zero mean, mutually and serially uncorrelated random processes {a_j} and {b_j} as finite sample analogues of the true a(ω) and b(ω) for the j-th spectral component. In particular, these processes are constant over n by covariance stationarity of f. The finite sample analogue of the spectral representation theorem then becomes [34]:
f_n = μ_f + Σ_{j=1}^{J} [ a_j cos(ω_j n) + b_j sin(ω_j n) ] (C7)
E[a_j] = E[b_j] = 0 (C8)
E[a_j a_{j′}] = E[b_j b_{j′}] = σ_j^2 δ(j − j′) (C9)
E[a_j b_{j′}] = 0 ∀j, j′ (C10)
μ_f ≡ 0 (C11)

The last line enforces a zero mean stochastic process and simplifies the analysis without loss of generality; δ(·) is the Dirac delta function.
To illustrate, the first two moments are of the form:

E[f_n] = μ_f + Σ_{j=1}^{J} ( E[a_j] cos(ω_j n) + E[b_j] sin(ω_j n) ) = 0 (C12)
R(ν) = Σ_{j=1}^{J} Σ_{j′=1}^{J} σ_j^2 δ_{j,j′} [ cos(ω_j n) cos(ω_{j′}(n + ν)) + sin(ω_j n) sin(ω_{j′}(n + ν)) ] (C13)
= σ^2 Σ_{j=1}^{J} p_j cos(ω_j ν) (C14)
p_j ≡ σ_j^2 / σ^2 ≡ σ_j^2 / Σ_j σ_j^2 (C15)
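The following sketch (our illustration; the frequencies, variances, and ensemble size are assumed values) draws the random coefficients a_j, b_j over many realisations and verifies the covariance structure of Eq. (C14) empirically:

```python
import numpy as np

rng = np.random.default_rng(2)
J, N, trials = 4, 200, 4000                 # assumed sizes
omegas = np.array([0.1, 0.4, 0.9, 1.7])     # assumed fixed frequencies omega_j
sig_j = np.array([1.0, 0.7, 0.5, 0.3])      # per-component standard deviations

n = np.arange(N)
acc = np.zeros(N)                           # accumulates E[f_0 * f_nu]
for _ in range(trials):
    a = rng.normal(0.0, sig_j)              # a_j, b_j drawn once per realisation
    b = rng.normal(0.0, sig_j)
    f = (a[:, None] * np.cos(omegas[:, None] * n)
         + b[:, None] * np.sin(omegas[:, None] * n)).sum(axis=0)
    acc += f[0] * f                         # sample covariance at lags nu = n
acc /= trials

R_theory = (sig_j[:, None]**2 * np.cos(omegas[:, None] * n)).sum(axis=0)
print(np.round(acc[:5], 2))                 # approximately matches Eq. (C14):
print(np.round(R_theory[:5], 2))            # R(nu) = sum_j sigma_j^2 cos(w_j nu)
```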

We introduce process noise, w_n, into the formula for the true f_n; this establishes a commonality with the state dynamics in Kalman filtering for a covariance stationary process:

f_n = μ_f + Σ_{j=1}^{J} [ a_j cos(ω_j (n − 1)) + b_j sin(ω_j (n − 1)) ] + w_n (C16)

In the absence of measurement noise and operating in the oversampling regime, an ordinary least squares (OLS) regression can be constructed by providing a collection of J^{(B)} basis frequencies {ω_j^{(B)}}, as in [34]. The OLS problem is constructed by separating the set of coefficients {μ̂_f, â_1, b̂_1, …, â_J, b̂_J} from the regressors {1, cos(ω_1(n − 1)), sin(ω_1(n − 1)), …, cos(ω_J^{(B)}(n − 1)), sin(ω_J^{(B)}(n − 1))}. For the particular choice of basis J^{(B)} = (N − 1)/2 (odd N) and ω_j^{(B)} ≡ 2πj/N, we state the key result from [34] that the coefficient estimates are obtained as:
f̂_n = μ̂_f + Σ_{j=1}^{J^{(B)}} [ â_j cos(ω_j^{(B)}(n − 1)) + b̂_j sin(ω_j^{(B)}(n − 1)) ] (C17)
â_j ≡ (2/N) Σ_{n′=1}^{N} f_{n′} cos(ω_j^{(B)}(n′ − 1)) (C18)
b̂_j ≡ (2/N) Σ_{n′=1}^{N} f_{n′} sin(ω_j^{(B)}(n′ − 1)) (C19)
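A minimal numerical sketch of this result is shown below (our illustration; the signal placed on two basis tones is an assumed example). For odd N the regressors are orthogonal, so the sums in Eqs. (C18)–(C19) recover the coefficients and the reconstruction is exact to machine precision:

```python
import numpy as np

N = 101                                      # odd N, as this basis requires
J = (N - 1) // 2
n = np.arange(1, N + 1)
omega = 2 * np.pi * np.arange(1, J + 1) / N  # omega_j^(B) = 2*pi*j/N

# Noiseless synthetic signal placed on two basis tones plus a constant.
f = 1.3 * np.cos(omega[4] * (n - 1)) + 0.6 * np.sin(omega[11] * (n - 1)) + 0.2

C = np.cos(np.outer(n - 1, omega))           # cosine regressors
S = np.sin(np.outer(n - 1, omega))           # sine regressors
mu_hat = f.mean()
a_hat = (2.0 / N) * (f @ C)                  # Eq. (C18)
b_hat = (2.0 / N) * (f @ S)                  # Eq. (C19)

f_rec = mu_hat + C @ a_hat + S @ b_hat       # Eq. (C17)
print(np.max(np.abs(f - f_rec)))             # ~1e-15: exact reconstruction
print(a_hat[4], b_hat[11])                   # recovers amplitudes 1.3 and 0.6
```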

This choice of basis results in the number of regressors matching the length of the measurement record. Further, the term (â_j^2 + b̂_j^2) is proportional to the contribution of the j-th spectral component to the total sample variance of f, or, in other words, to the amplitude estimate for the power spectral density of the true f.
Next, we depart from the OLS problem above in several ways: firstly, by introducing measurement noise, and secondly, by changing the basis oscillators considered in the problem above. As in the main text, the linear measurement record is defined as:

y_n ≡ f_n + v_n (C20)
v_n ∼ N(0, R) (C21)

The link to GPR (periodic kernel) is direct, and the link to LKFFB is made by setting f_n ≡ H_n x_n. In both frameworks, we incorporate the effect of measurement noise through the measurement noise variance, R, which has the effect of regularising the least squares estimation process discussed above.

1. Infinite Basis of Oscillators in a GPR Periodic Kernel

The departure from the simple OLS plus measurement noise problem (above) to GPR (periodic kernel) arises from the fact that data is projected onto an infinite basis of oscillators, namely, J^{(B)} → ∞.

We follow the sketch of a proof provided in [42] to show that the sine squared exponential (periodic) kernel used in Gaussian Process Regression satisfies the covariance function of trigonometric polynomials. Here, the index j labels an infinite comb of oscillators and m indexes the higher order terms in the power reduction formulae in the last line of the definition below:

ω_0^{(B)} ≡ ω_j^{(B)}/j, j ∈ {1, …, J^{(B)}} (C22)
R(ν) ≡ σ^2 exp( −2 sin^2(ω_0^{(B)} ν / 2) / l^2 ) (C23)
= σ^2 exp(−1/l^2) exp( cos(ω_0^{(B)} ν) / l^2 ) (C24)
= σ^2 exp(−1/l^2) Σ_{m=0}^{M→∞} (1/m!) cos^m(ω_0^{(B)} ν) / l^{2m} (C25)

Next, we expand each cosine using the power reduction formulae for odd and even powers respectively, and we re-group terms. For example, expanding the terms for m = 0, 1, 2, 3, 4, 5, … gives (with C(n, k) denoting a binomial coefficient):

R(ν) = σ^2 exp(−1/l^2) cos(ω_0^{(B)} ν) [ 2 C(1,0)/(2l^2) + 2 C(3,1)/((2l^2)^3 3!) + 2 C(5,2)/((2l^2)^5 5!) + … ] (C26)
+ σ^2 exp(−1/l^2) cos(2ω_0^{(B)} ν) [ 2 C(2,0)/((2l^2)^2 2!) + 2 C(4,1)/((2l^2)^4 4!) + … ] (C27)
+ σ^2 exp(−1/l^2) cos(3ω_0^{(B)} ν) [ 2 C(3,0)/((2l^2)^3 3!) + 2 C(5,1)/((2l^2)^5 5!) + … ] (C28)
+ σ^2 exp(−1/l^2) cos(4ω_0^{(B)} ν) [ 2 C(4,0)/((2l^2)^4 4!) + … ] (C29)
+ σ^2 exp(−1/l^2) cos(5ω_0^{(B)} ν) [ 2 C(5,0)/((2l^2)^5 5!) + … ] (C30)
⋮
+ σ^2 exp(−1/l^2) [ C(2,1)/((2l^2)^2 2!) + C(4,2)/((2l^2)^4 4!) + … ] + σ^2 exp(−1/l^2) (C31)
In the expansion above, the vertical and horizontal dots represent contributions from m > 5 terms. The key message is that truncating m at a finite number of terms M also truncates j, leaving a finite number of basis oscillators. In the example above, if the power reduction expansion indexed by m were truncated at M = 5 terms, then the number of basis oscillators (the number of rows) would also be truncated. We now summarise the amplitudes of Eqs. (C26) to (C30) in the second term of R(ν) below; Eq. (C31) corresponds to the p_{0,M} term:
R(ν) = σ^2 ( p_{0,M} + Σ_{j=1}^{J^{(B)}} p_{j,M} cos(jω_0^{(B)} ν) ) (C32)
p_{j,M} ≡ exp(−1/l^2) Σ_{β=0}^{β_{j,M}^{MAX}} [ 2/((2l^2)^{j+2β} (j + 2β)!) ] C(j + 2β, β) (C33)
β ≡ 0, 1, …, β_{j,M}^{MAX} (C34)
p_{0,M} = exp(−1/l^2) Σ_{α=0}^{α_M^{MAX}} [ 1/((2l^2)^{2α} (2α)!) ] C(2α, α) (C35)
α ≡ 0, 1, …, α_M^{MAX} (C36)
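These truncated weights can be checked numerically, as in the sketch below (our illustration; the hyperparameters l, σ^2 and the base frequency are assumed values). It evaluates p_{0,M} and p_{j,M} from Eqs. (C35) and (C33) and compares the resulting cosine series of Eq. (C32) against a direct evaluation of the kernel in Eq. (C23):

```python
import numpy as np
from math import comb, exp, factorial

l, sigma2, w0 = 1.2, 1.0, 0.7    # assumed hyperparameters and base frequency
M = 20                           # truncation of the power-series expansion

def p_j(j, M, l):
    """Weight of cos(j * w0 * nu), following Eq. (C33); valid for 1 <= j <= M."""
    b_max = (M - j) // 2         # beta_max = floor((M - j) / 2)
    return exp(-1.0 / l**2) * sum(
        2.0 * comb(j + 2 * b, b) / ((2 * l**2)**(j + 2 * b) * factorial(j + 2 * b))
        for b in range(b_max + 1))

def p_0(M, l):
    """Constant weight p_{0,M}, following Eq. (C35)."""
    a_max = M // 2               # alpha_max = floor(M / 2)
    return exp(-1.0 / l**2) * sum(
        comb(2 * a, a) / ((2 * l**2)**(2 * a) * factorial(2 * a))
        for a in range(a_max + 1))

nu = 0.8
series = sigma2 * (p_0(M, l)
                   + sum(p_j(j, M, l) * np.cos(j * w0 * nu) for j in range(1, M + 1)))
direct = sigma2 * np.exp(-2.0 * np.sin(w0 * nu / 2.0)**2 / l**2)   # Eq. (C23)
print(series, direct)            # the two agree once M is large enough
```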

By examining the cosine expansion, one sees that a truncation at (M, J^{(B)}) means our summarised formulae require β_{j,M}^{MAX} = ⌊(M − j)/2⌋ and α_M^{MAX} = ⌊M/2⌋, where ⌊·⌋ denotes the floor function. If we truncate with M ≡ J^{(B)} such that α_M^{MAX} = ⌊J^{(B)}/2⌋ and β_{j,M}^{MAX} = ⌊(J^{(B)} − j)/2⌋, and re-adjust the kernel for the zero-th frequency term, then we agree with the final result in [42].

We compare the covariance function of the periodic kernel in Eq. (C32) with the covariance function of trigonometric polynomials in Eq. (C14). The weights p_{j,M} for the periodic kernel are not, in general, identical to those arising under the spectral representation theorem, but they otherwise retain the structure of a cosine basis in which the correlation between two random variables in a sequence depends only on the separation between them. For a constant mean Gaussian process, the form of the periodic kernel therefore allows the underlying process to satisfy covariance stationarity and appears to permit an interpretation via the spectral representation theorem.

2. Amplitude and Phase Extraction for Finite Oscillator Basis in LKFFB

In LKFFB, we depart from the simple OLS plus measurement noise problem considered earlier by specifying a fixed
basis of oscillators at the physical Fourier resolution established by the measurement record. Using a specific state
space model, we can track amplitudes and phases for each basis oscillator individually to enable forward prediction
at any time-step of our choosing. The design of a fixed basis necessarily incorporates a priori assumptions about the
extent to which a fast measurement action over-samples slowly drifting non-Markovian noise, that is, a (potentially
incorrect) assumption about dephasing noise bandwidth.
The efficacy of the Liška Kalman Filter in our application assumes an appropriate choice of the 'Kalman basis' oscillators. The choice of basis can affect the forward prediction of state estimates. To illustrate, consider the choice of Bases A–C defined below. Basis A depicts a constant spacing at or above the Fourier resolution (e.g. ω_0^{(B)} ≥ 2π/(N_T ∆t)). Basis B introduces a minimum Fourier resolution and effectively creates an irregular spacing if one wishes to consider a basis frequency comb coarser than the experimentally established Fourier spacing over the course of the experiment. Basis C is identical to Basis B but allows a projection onto a zero frequency component.
Basis A ≡ { 0, ω_0^{(B)}, 2ω_0^{(B)}, …, J^{(B)} ω_0^{(B)} } (C37)
Basis B ≡ { 2π/(N∆t), 2π/(N∆t) + ω_0^{(B)}, …, 2π/(N∆t) + J^{(B)} ω_0^{(B)} } (C38)
Basis C ≡ { 0, 2π/(N∆t), 2π/(N∆t) + ω_0^{(B)}, …, 2π/(N∆t) + J^{(B)} ω_0^{(B)} } (C39)
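A short sketch constructing the three bases is given below (our illustration; the sampling parameters N, ∆t and the comb size J^{(B)} are assumed values):

```python
import numpy as np

N, dt = 2000, 0.001              # assumed experiment length and sampling step
J_B = 50                         # assumed comb size J^(B)
fourier = 2 * np.pi / (N * dt)   # experimentally established Fourier resolution
w0 = fourier                     # Basis A spacing; any w0 >= fourier is allowed

basis_A = w0 * np.arange(0, J_B + 1)              # Eq. (C37)
basis_B = fourier + w0 * np.arange(0, J_B + 1)    # Eq. (C38)
basis_C = np.concatenate(([0.0], basis_B))        # Eq. (C39)
print(basis_A[:3], basis_B[:3], basis_C[:4])
```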
While one can propagate the LKFFB with zero gain, it may be advantageous for predictive control applications to generate predictions in one calculation rather than recursively. This means we sum contributions over all j ∈ J^{(B)} oscillators and reconstruct the signal for all future time values in one calculation, without having to propagate the filter recursively with zero gain. The interpretation of the predicted signal, f̂_n, requires an additional (but time-constant) phase correction term ψ_C that arises as a byproduct of the computational basis (i.e. Basis A, B or C). The phase correction term corrects for a gradual misalignment between the Fourier and computational grids which occurs if one specifies the non-regular spacing inherent in Basis B or C. Let n_C denote the time-step at which the instantaneous amplitude ||x̂_{n_C}^j|| and instantaneous phase θ_{x̂_{n_C}^j} are extracted for the oscillator represented by the j-th state space resonator, x_n^j, where the superscript j denotes an oscillator of frequency ω_j^{(B)} ≡ jω_0^{(B)} (not a power):
f̂ = Σ_{j=0}^{J^{(B)}} ||x̂_{n_C}^j|| cos( m∆t ω_j^{(B)} + θ_{x̂_{n_C}^j} + ψ_C ), n_C ∈ N_T, m ∈ N_P (C40)

ψ_C ≡ 0 (Basis A); ψ_C ≡ (2π/ω_0^{(B)}) ( ω_0^{(B)} − 2π/(N∆t) ) (Basis B or C) (C41)
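The sketch below (our illustration) assembles the harmonic sum of Eq. (C40) for a batch of future time-steps in a single vectorised calculation; the amplitudes and phases are randomly generated stand-ins for quantities that would be learned by a trained LKFFB, and Basis A is used so that ψ_C = 0:

```python
import numpy as np

rng = np.random.default_rng(4)
dt, N, J_B = 0.001, 2000, 20
w0 = 2 * np.pi / (N * dt)                     # Basis A at the Fourier resolution
omegas = w0 * np.arange(0, J_B + 1)           # omega_j^(B) = j * w0
amps = rng.uniform(0.0, 1.0, J_B + 1)         # stand-ins for ||x^j_{n_C}||
phases = rng.uniform(-np.pi, np.pi, J_B + 1)  # stand-ins for theta_{x^j_{n_C}}

def psi_C(w0, N, dt, basis="A"):
    """Phase correction of Eq. (C41) for the chosen computational basis."""
    if basis == "A":
        return 0.0
    return (2 * np.pi / w0) * (w0 - 2 * np.pi / (N * dt))   # Basis B or C

# Eq. (C40): predictions for all future steps m computed in one harmonic sum.
m = np.arange(1, 101)                         # assumed prediction horizon
phase_tot = (phases + psi_C(w0, N, dt, "A"))[:, None]
f_hat = (amps[:, None] * np.cos(np.outer(omegas, m * dt) + phase_tot)).sum(axis=0)
print(f_hat[:5])
```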

The output predictions obtained by calculating a harmonic sum using the learned instantaneous amplitudes and phases and the LKFFB Bases A–C agree with the zero-gain predictions if ψ_C is specified as above. The calculation of ψ_C is determined entirely by the choice of computational and experimental sampling procedures, and assumes no information about the true dephasing.
Next, we define an analytical ratio for the optimal training time, n_C, at which LKFFB predictions should commence, irrespective of whether the prediction procedure recursively propagates the Kalman Filter with zero gain or calculates a harmonic sum for all prediction points in one go:
n_C ≡ 1/(∆t ω_0^{(B)}) = f_s/ω_0^{(B)} (C42)

Consider an arbitrarily chosen training period, N_T ≠ n_C. For fixed f_s, the choice N_T > n_C means we achieve a Fourier resolution which exceeds the resolution of the LKFFB basis. Now consider N_T < n_C. This means that we have extracted information prematurely, and have not waited long enough to project onto the smallest basis frequency, namely ω_0^{(B)}. In the case where the data is perfectly projected onto our basis, this has no impact. For imperfect learning, we see that instantaneous amplitude and phase information slowly degrades for N_T > n_C, while trajectories for the smallest basis frequency have not stabilised for N_T < n_C.
Of these choices, Basis A with ω_0^{(B)} ≡ 2π/(N_T ∆t) is expected to yield the best performance, at the expense of computational load, and this is confirmed in numerical experiments. All results in this manuscript are reported for Basis A with N_T ≡ 1/(∆t ω_0^{(B)}) = f_s/ω_0^{(B)}.

3. Equivalent Spectral Representation of f in LKFFB and GPR Periodic Kernel

In this section, we consider the structural similarities between LKFFB and GPR with a periodic kernel. We show that the LKFFB has an analogous structure to a stack of stochastic processes on a circle [38], and that in moving from discrete to continuous time, we recover a covariance function with the same structure as a periodic kernel truncated to a finite basis of J^{(B)} oscillators. For zero mean Gaussian random variables, covariance stationarity is established, completing the link between LKFFB and the periodic kernel. For the case Γ_n w_n → w_n in LKFFB, the use of stacked Kalman resonators as an approximation to the infinite basis of oscillators in a periodic kernel is documented in [42].
At time step n, the posterior Kalman state at n − 1 acts as the initial state at n, with ν = ∆t for a small ∆t such that a linearised trajectory is approximately true for each basis frequency. We show this using the following correlation relations and a Gaussian assumption for process noise, where n, m ∈ N are indices for time steps and j = 0, 1, …, J^{(B)} indexes the set of basis oscillators:

E[w_n] = 0, ∀j ∈ J^{(B)}, n ∈ N (C43)
E[w_n w_m] = σ^2 δ(n − m), n, m ∈ N (C44)
E[A_0^j] = E[B_0^j] = 0, ∀j, j′ ∈ J^{(B)} (C45)
E[A_n^j B_m^{j′}] = 0, ∀j, j′ ∈ J^{(B)}, n, m ∈ N (C46)
E[A_n^j A_m^{j′}] = E[B_n^j B_m^{j′}] = σ_j^2 δ(n − m) δ(j − j′), ∀j, j′ ∈ J^{(B)}, n, m ∈ N (C47)
E[w_n A_m^j] = E[w_n B_m^j] ≡ 0, ∀j, j′ ∈ J^{(B)}, n, m ∈ N (C48)

Consider the j-th state space resonator, x_n^j, in the LKFFB, where the superscript j denotes an oscillator (not a power); we obtain:

Θ(jω_0^{(B)}∆t) = ( cos(jω_0^{(B)}∆t)  −sin(jω_0^{(B)}∆t) ; sin(jω_0^{(B)}∆t)  cos(jω_0^{(B)}∆t) ) (C49)

x_n^j ≡ ( A_n^j ; B_n^j ) = Θ(jω_0^{(B)}∆t) [ ( A_{n−1}^j ; B_{n−1}^j ) + ( w_{n−1} / √((A_{n−1}^j)^2 + (B_{n−1}^j)^2) ) ( A_{n−1}^j ; B_{n−1}^j ) ] (C50)–(C51)
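The sketch below (our illustration; the frequency, time-step and noise strength are assumed values) propagates a single resonator according to our reading of Eq. (C50), in which process noise is injected along the instantaneous state direction (perturbing the instantaneous amplitude) before the rotation Θ is applied, and then extracts the instantaneous amplitude and phase:

```python
import numpy as np

rng = np.random.default_rng(5)
w_j, dt = 2 * np.pi * 12.0, 0.001       # assumed oscillator frequency and step
sigma = 0.05                            # assumed process noise strength

c, s = np.cos(w_j * dt), np.sin(w_j * dt)
Theta = np.array([[c, -s], [s, c]])     # rotation matrix of Eq. (C49)

x = np.array([1.0, 0.0])                # initial resonator state (A_0, B_0)
traj = [x.copy()]
for _ in range(500):
    w = rng.normal(0.0, sigma)
    # Eq. (C50), as we read it: noise enters along the current state
    # direction, then the rotated state becomes the new (A_n, B_n).
    x = Theta @ (x + (w / np.linalg.norm(x)) * x)
    traj.append(x.copy())
traj = np.array(traj)

amp = np.linalg.norm(traj, axis=1)          # instantaneous amplitude ||x^j_n||
phase = np.arctan2(traj[:, 1], traj[:, 0])  # instantaneous phase
print(np.round(amp[:5], 3), np.round(phase[:5], 3))
```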
From these definitions it follows that:

⟹ E[x_n^j] = 0 (C52)
⟹ E[x_n^j (x_m^j)^T] = Θ(jω_0^{(B)}∆t) E[ O_{n−1,m−1}^j ] Θ(jω_0^{(B)}∆t)^T (C53)
+ Θ(jω_0^{(B)}∆t) E[ ( w_{n−1}/√((A_{n−1}^j)^2 + (B_{n−1}^j)^2) + w_{m−1}/√((A_{m−1}^j)^2 + (B_{m−1}^j)^2) ) O_{n−1,m−1}^j ] Θ(jω_0^{(B)}∆t)^T (C54)
+ Θ(jω_0^{(B)}∆t) E[ ( w_{n−1} w_{m−1} / (√((A_{n−1}^j)^2 + (B_{n−1}^j)^2) √((A_{m−1}^j)^2 + (B_{m−1}^j)^2)) ) O_{n−1,m−1}^j ] Θ(jω_0^{(B)}∆t)^T (C55)
= σ_j^2 δ(n − m) ( 1  0 ; 0  1 ) (C56)

where O_{n−1,m−1}^j denotes the matrix of outer products,

O_{n−1,m−1}^j ≡ ( A_{n−1}^j A_{m−1}^j   A_{n−1}^j B_{m−1}^j ; B_{n−1}^j A_{m−1}^j   B_{n−1}^j B_{m−1}^j ).

The cross correlation terms vanish under the temporal correlation functions so defined: if we assume n ≥ m, then the states A_{m−1}^j, B_{m−1}^j at time m − 1 contain at most a w_{n−2} term (in the case n = m) and cannot be correlated with the future noise term w_{n−1}.
The dynamical trajectory in LKFFB is linearised for small ∆t. The linearisation approximates a true, continuous time deterministic trajectory defining a stochastic process on a circle.
We briefly visit this continuous time trajectory to specify the link between LKFFB and GPR (periodic kernel). Let t denote continuous time for the deterministic dynamics with random initial state given by a_0^j, b_0^j, where the superscript j denotes an oscillator with frequency ω_j ≡ jω_0^{(B)} (not a power):
E[a_0^j] = E[b_0^j] = 0, ∀j, j′ ∈ J^{(B)} (C57)
E[a_0^j b_0^{j′}] = 0, ∀j, j′ ∈ J^{(B)} (C58)
E[a_0^j a_0^{j′}] = E[b_0^j b_0^{j′}] = σ_j^2 δ(j − j′), ∀j, j′ ∈ J^{(B)} (C59)

x^j(t) ≡ ( cos(ω_j t)  −sin(ω_j t) ; sin(ω_j t)  cos(ω_j t) ) ( a_0^j ; b_0^j ) (C60)
E[x^j(t)] = 0 (C61)
E[x^j(t) x^j(t′)^T] = ( cos(ω_j t′)  −sin(ω_j t′) ; sin(ω_j t′)  cos(ω_j t′) ) E[ ( a_0^j ; b_0^j ) ( a_0^j  b_0^j ) ] ( cos(ω_j t)  −sin(ω_j t) ; sin(ω_j t)  cos(ω_j t) )^T (C62)
= σ_j^2 ( cos(ω_j ν)  0 ; 0  cos(ω_j ν) ), ν ≡ |t′ − t| (C63)

We see that the initial state variables, a_0^j, b_0^j, must be zero mean, independent and identically distributed for each j such that x^j(t) is covariance stationary. If a_0^j, b_0^j are Gaussian, then the joint distribution of x^j(t) remains Gaussian under the linear operations above. Hence, the continuous time limit of the dynamics in LKFFB for J^{(B)} independent substates, x^j(t), describes a process with the same first and second moments as a periodic kernel truncated at J^{(B)}. For Gaussian processes, this yields an approximate equivalence between LKFFB with J^{(B)} stacked resonators and an expansion of the periodic kernel truncated at J^{(B)}.
While the formalism of LKFFB shares a common structure with GPR (periodic kernel) in a particular limit, the
physical interpretation of Ajn , Bnj is that these are components of the Hilbert transform of the original signal [29].
This gives us the ability to track and extract instantaneous amplitude and phase associated with each basis oscillator
in LKFFB. In contrast, the coefficients of the periodic kernel are always contingent on the arbitrary truncation of the
infinite basis, as seen in Eqs. (C32), (C33) and (C35). Hence, tracking (or extracting) amplitudes and phases for
individual oscillators does not seem appropriate for the periodic kernel, as these values would change depending on
the arbitrary choice of a truncation point.

[1] J. J. J. Groen, R. Paap, and F. Ravazzolo, Journal of Business & Economic Statistics 31, 29 (2013).
[2] Y. Dong, Y. Li, M. Xiao, and M. Lai, Applied Mathematical Modelling 33, 398 (2009).
[3] J. Ko and D. Fox, Autonomous Robots 27, 75 (2009).
[4] A. C. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter (Cambridge University Press, Cambridge, United Kingdom, 1990).
[5] C. Cheng, A. Sa-Ngasoongsong, O. Beyca, T. Le, H. Yang, Z. Kong, and S. T. Bukkapatnam, IIE Transactions 47, 1053 (2015).
[6] J. D. Garcia and G. C. Amaral, in Sensor Array and Multichannel Signal Processing Workshop (SAM) (Rio de Janeiro, 2016) pp. 1–5.
[7] F. R. Bach and M. I. Jordan, IEEE Transactions on Signal Processing 52, 2189 (2004).
[8] S. Tatinati and K. C. Veluvolu, The Scientific World Journal 2013, 548370 (2013).
[9] J. Hall, C. E. Rasmussen, and J. Maciejowski, in Decision and Control and European Control Conference (CDC-ECC) (Orlando, 2011) pp. 6019–6024.
[10] F. Hamilton, T. Berry, and T. Sauer, Physical Review X 6, 011021 (2016).
[11] J. V. Candy, Bayesian Signal Processing: Classical, Modern, and Particle Filtering Methods, Vol. 54 (John Wiley & Sons, Hoboken, New Jersey, 2016).
[12] B. Boashash, Proceedings of the IEEE 80, 540 (1992).
[13] L. Ji and Z. Tie, in Proceedings of the IEEE 13th International Conference on Signal Processing (ICSP) (Curran Associates, New York, 2016) pp. 320–325.
[14] G. Struchalin, I. Pogorelov, S. Straupe, K. Kravtsov, I. Radchenko, and S. Kulik, Physical Review A 93, 012103 (2016).
[15] A. Sergeevich, A. Chandran, J. Combes, S. D. Bartlett, and H. M. Wiseman, Physical Review A 84, 052315 (2011).
[16] D. Mahler, L. A. Rozema, A. Darabi, C. Ferrie, R. Blume-Kohout, and A. Steinberg, Physical Review Letters 111, 183601 (2013).
[17] M. P. Stenberg, O. Köhn, and F. K. Wilhelm, Physical Review A 93, 012122 (2016).
[18] A. Shabani, R. Kosut, M. Mohseni, H. Rabitz, M. Broome, M. Almeida, A. Fedrizzi, and A. White, Physical Review Letters 106, 100401 (2011).
[19] Z. Shen, W.-X. Wang, Y. Fan, Z. Di, and Y.-C. Lai, Nature Communications 5 (2014).
[20] L. E. de Clercq, R. Oswald, C. Flühmann, B. Keitch, D. Kienzler, H.-Y. Lo, M. Marinelli, D. Nadlinger, V. Negnevitsky, and J. P. Home, Nature Communications 7 (2016).
[21] D. Tan, S. Weber, I. Siddiqi, K. Moelmer, and K. Murch, Physical Review Letters 114, 090403 (2015).
[22] Y. Huang and J. E. Moore, arXiv:1701.06246.
[23] C. Bonato, M. S. Blok, H. T. Dinani, D. W. Berry, M. L. Markham, D. J. Twitchen, and R. Hanson, Nature Nanotechnology 11, 247 (2016).
[24] N. Wiebe, C. Granade, A. Kapoor, and K. M. Svore, arXiv:1511.06458.
[25] M. D. Shulman, S. P. Harvey, J. M. Nichol, S. D. Bartlett, A. C. Doherty, V. Umansky, and A. Yacoby, Nature Communications 5 (2014).
[26] C. Granade, J. Combes, and D. Cory, New Journal of Physics 18, 033024 (2016).
[27] P. E. Jacob, S. M. M. Alavi, A. Mahdi, S. J. Payne, and D. A. Howey, IEEE Transactions on Control Systems Technology 99, 1 (2017).
[28] S. Mavadia, V. Frey, J. Sastrawan, S. Dona, and M. J. Biercuk, Nature Communications 8 (2017).
[29] J. Liška and E. Janeček, in Robotics Automation and Control (InTech, Vienna, 2008) pp. 28–38.
[30] C. Ferrie, C. E. Granade, and D. G. Cory, Quantum Information Processing 12, 611 (2013).
[31] M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice Using MATLAB, 2nd ed. (John Wiley & Sons, Hoboken, New Jersey, 2001).
[32] S.-M. Moon, D. G. Cole, and R. L. Clark, Journal of Sound and Vibration 294, 82 (2006).
[33] I. D. Landau, R. Lozano, M. M'Saad, and A. Karimi, Adaptive Control, Vol. 51 (Springer, Berlin, 1998).
[34] J. D. Hamilton, Time Series Analysis, Vol. 2 (Princeton University Press, Princeton, New Jersey, 1994).
[35] P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting (Springer-Verlag, New York, 1996).
[36] M. Salzmann, P. Teunissen, and M. Sideris, in Kinematic Systems in Geodesy, Surveying, and Remote Sensing, Vol. 107 (Springer-Verlag, New York, 1991) pp. 251–260.
[37] B. Wahlberg, Journal of Time Series Analysis 10, 283 (1989).
[38] S. Karlin and H. Taylor, A First Course in Stochastic Processes (Academic Press Inc, New York, 1975).
[39] R. Karlsson and F. Gustafsson, Filtering and Estimation for Quantized Sensor Information, Tech. Rep. LiTH-ISY-R-2674 (Linköping University, 2005).
[40] B. Widrow, I. Kollar, and M.-C. Liu, IEEE Transactions on Instrumentation and Measurement 45, 353 (1996).
[41] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) (MIT Press, Cambridge, 2005).
[42] A. Solin and S. Särkkä, in Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 33, edited by S. Kaski and J. Corander (PMLR, Reykjavik, 2014) pp. 904–912.
[43] F. Tobar, T. D. Bui, and R. E. Turner, in Advances in Neural Information Processing Systems, Vol. 28 (Curran Associates, New York, 2015) pp. 3501–3509.
[44] S. Roberts, M. Osborne, M. Ebden, S. Reece, N. Gibson, and S. Aigrain, Phil. Trans. R. Soc. A 371, 20110550 (2013).
[45] M. L. Stein, Interpolation of Spatial Data: Some Theory for Kriging (Springer Science & Business Media, 2012).
[46] A. Soare, H. Ball, D. Hayes, X. Zhen, M. Jarratt, J. Sastrawan, H. Uys, and M. Biercuk, Physical Review A 89, 042329 (2014).
[47] GPy, "GPy: A Gaussian process framework in Python," http://github.com/SheffieldML/GPy (2012).
[48] S. Arlot and P. Massart, Journal of Machine Learning Research 10, 245 (2009).
[49] K. Vu, J. C. Snyder, L. Li, M. Rupp, B. F. Chen, T. Khelif, K.-R. Müller, and K. Burke, International Journal of Quantum Chemistry 115, 1115 (2015).
[50] P. Abbeel, A. Coates, M. Montemerlo, A. Y. Ng, and S. Thrun, in Robotics: Science and Systems (MIT Press, Cambridge, 2005) pp. 289–296.
[51] A. Robertson and C. Granade, (2017).
[52] A. Wilson and R. Adams, in Proceedings of the 30th International Conference on Machine Learning (ICML-13), Vol. 28 (Journal of Machine Learning Research, Atlanta, 2013) pp. 1067–1075.
[53] J. Quiñonero Candela, C. E. Rasmussen, A. R. Figueiras-Vidal, and M. Lázaro-Gredilla, Journal of Machine Learning Research 11, 1865 (2010).
[54] A. Gelb, Applied Optimal Estimation (MIT Press, Cambridge, 1974).
[55] M. West and J. Harrison, Bayesian Forecasting and Dynamic Models (Springer-Verlag, New York, 1996).