Motion Code: Robust Time Series Classification and Forecasting via Sparse Variational Multi-Stochastic Processes Learning

Chandrajit Bajaj∗ Minh Nguyen∗

University of Texas at Austin University of Texas at Austin

Abstract

Despite extensive research, time series classification and forecasting on noisy data
remain highly challenging. The main difficulties lie in finding suitable mathematical
concepts to describe time series and effectively separate noise from the true
signals. Unlike traditional methods treating time series as static vectors or fixed
sequences, we propose a novel framework that views each time series, regardless
of length, as a realization of a continuous-time stochastic process. This mathematical
approach captures dependencies across timestamps and detects hidden,
time-varying signals within the noise. However, real-world data often involves
multiple distinct dynamics, making it insufficient to model the entire process with
a single stochastic model. To address this, we assign each dynamic a unique
signature vector and introduce the concept of "most informative timestamps" to
infer a sparse approximation of the individual dynamics from these vectors. The
resulting model, called Motion Code, includes parameters that fully capture diverse
underlying dynamics in an integrated manner, enabling simultaneous classification
and forecasting of time series. Extensive experiments on noisy datasets, including
real-world Parkinson’s disease sensor tracking, demonstrate Motion Code’s
strong performance against established benchmarks for time series classification
and forecasting.

1 INTRODUCTION
Noisy time series analysis is a challenging problem due to the difficulty in finding appropriate
mathematical models to represent and study such data, unlike images or text. For example, consider
two groups of time series, each representing the audio data of a word pronounced by speakers with
different accents, and with varying lengths between 80 and 95 data points (see Figure 1). Common
methods, like distance-based [Jeong et al., 2011] or shapelet-based [Bostrom and Bagnall, 2017]
approaches, treat time series as ordered vectors, but fail when those vectors are highly mixed, with
many red series resembling blue ones (see Figure 1a). Deep learning methods, such as recurrent
neural networks (e.g., LSTM-FCN [Karim et al., 2019]) or convolutional networks (e.g., ROCKET
[Dempster et al., 2020]), struggle to capture higher-order correlations in datasets with many data
points but a limited number of individual series, as in this example. Techniques that rely on empirical
statistics, such as dictionary-based [Schäfer, 2015] or interval-based methods [Deng et al., 2013], are
also unreliable because noise can distort the collected statistics, making it difficult to separate noise
from true signals. These challenges motivate our approach, which models time series collections
using stochastic processes to recover underlying dynamics from noisy data.
We propose modeling each time series as an instance of a common stochastic process governing
the group’s dynamics. While individual series may be noisy, our method recovers the underlying
stochastic process, capturing key patterns such as increases, decreases, or stable phases, as shown by
the skeleton approximations in Figure 1b and Figure 1c. These approximations reveal core signals
and the statistical relationships between data points.

∗ Equal contribution

Figure 1: (a): Two Collections Of Time Series Representing Pronunciation Audio Data For The Words
Absorptivity And Anything. (b) And (c): Most Informative Timestamps For The Pronunciation Of
Absorptivity (Red) And Anything (Blue). Panels: (a) 2 Time Series Collections, (b) Absorptivity, (c) Anything.

Modeling time series with multiple dynamics, however, requires more than a single stochastic process.
Unlike previous methods [Qi et al., 2010, Durbin, 2012, Cao et al., 2013, Moss et al., 2023] that focus
on a single series, our framework introduces the most informative timestamps (see Definition 2) to
identify key features across multiple series. In addition, each dynamic model is assigned a signature
vector, or motion code, which is jointly optimized using sparse learning techniques to prevent
overfitting. This enables us to accurately distinguish between different dynamics. For instance, in
Figure 1, scatter points reveal two distinct patterns that would otherwise be hidden in the noisy, mixed
time series collections.
Combining these innovations, our proposed model, Motion Code, effectively learns from multiple
underlying processes in a robust and comprehensive way. Our key contributions include:

1. Motion Code: A model that jointly learns from noisy time series collections, explicitly
modeling the underlying stochastic process to separate noise from core signals.

2. Irregular Data Handling: Motion Code handles out-of-sync, varying-length, and missing
data directly without interpolation, preserving temporal structure and avoiding alignment
distortions.

3. Most Informative Timestamps: An interpretable feature of Motion Code that employs
variational inference to capture the essential dynamics in noisy time series.

1.1 Related Works

Time Series Classification: Common techniques include distance-based methods [Jeong et al.,
2011], interval-based [Deng et al., 2013], dictionary-based [Schäfer, 2015], shapelet-based [Bostrom
and Bagnall, 2017], feature-based [Lubba et al., 2019], and ensemble models [Lines et al., 2016,
Middlehurst et al., 2021]. In deep learning, popular approaches involve convolutional neural networks
[Karim et al., 2019, Dempster et al., 2020], residual networks [Ma et al., 2016], and autoencoders [Hu
et al., 2016].
Time Series Forecasting: Methods for forecasting include exponential smoothing [Holt, 2004],
TBATS [De Livera et al., 2011], ARIMA [Malki et al., 2021], probabilistic state-space models
[Durbin, 2012], and deep learning frameworks [Lim and Zohren, 2021, Liu et al., 2021].
Stochastic Modeling: Gaussian processes [Rasmussen and Williams, 2006] are widely used for
continuous-time series. To reduce computational cost, sparse Gaussian processes [Titsias, 2009] have
been developed. Building on these approaches, advanced generative models with either approximate
or exact inference [Qi et al., 2010, Cao et al., 2013, Moss et al., 2023] have been introduced. However,
these methods are typically limited to individual time series, whereas our approach extends to
multi-time series collections, enabling joint modeling across different series.

The rest of the paper is organized as follows: Section 2 details the mathematical and algorithmic
framework for Motion Code. Section 3 presents experiments and benchmarking for classification and
forecasting tasks. Section 4 discusses the benefits of our framework.

2 MOTION CODE: JOINT LEARNING ON COLLECTIONS OF TIME SERIES

2.1 Stochastic Process Formulation and Data Assumption

We formulate the time series problem in the context of stochastic processes.

Input: The training data consists of samples from $L$ underlying stochastic processes $\{G_k\}_{k=1}^{L}$, where
$L \in \mathbb{N}$ and $L \ge 2$. For each $k \in \{1, \dots, L\}$, let $C_k$ represent the sample set of $B_k$ time series $\{y^{i,k}\}_{i=1}^{B_k}$ drawn
from process $G_k$. Each time series $y^{i,k}$ has the timestamps $T_{i,k} \subset \mathbb{R}_+$, and at each timestamp
$t \in T_{i,k}$, the corresponding data point is denoted as $y^{i,k}(t) \in \mathbb{R}$. The full dataset consists of $L$
collections of time series $\{C_k\}_{k=1}^{L}$.
Tasks And Required Outputs: The primary objective is to construct a model M that jointly learns
the dynamics of the processes $\{G_k\}_{k=1}^{L}$ from the dataset $\{C_k\}_{k=1}^{L}$. The model parameters must be
transferable to the following tasks:

1. Classification: Given a new time series $y = (y(t))_{t \in T}$ with timestamps $T$, classify it into
one of the $L$ possible groups.
2. Forecasting: For a time series $y$ generated by $G_k$ ($k \in \{1, \dots, L\}$), predict future values at new
timestamps $T$, i.e., predict $(y(t))_{t \in T}$.

Stochastic Process And Notations: Recall that a stochastic process $G$ is defined as $\{g(t)\}_{t \ge 0}$, where
$g$ is a random function, and $g(t)$ is the random data point at time $t$. The random function $g$ is referred
to as the underlying signal of the process $G$. For any timestamps set $T$, let $g_T$ denote the signal
vector $(g(t))_{t \in T} \in \mathbb{R}^{|T|}$.
Data Assumptions: We assume that the observed time series data points $y$ are normally distributed
around the underlying signal of their respective stochastic processes. Specifically, let $g_k$ represent
the underlying signal of the stochastic process $G_k$, i.e., $G_k = \{g_k(t)\}_{t \ge 0}$. Then, for $k \in \{1, \dots, L\}$,
the data $(y^{i,k}(t))_{t \in T_{i,k}}$ follows a Gaussian distribution with mean $(g_k)_{T_{i,k}} = (g_k(t))_{t \in T_{i,k}} \in \mathbb{R}^{|T_{i,k}|}$
and covariance matrix $\sigma^2 I_{|T_{i,k}|}$, where $I_n$ is the $n \times n$ identity matrix. The constant $\sigma \in \mathbb{R}_+$ is the
unknown noise standard deviation, so that $\sigma^2$ is the noise variance of the sample data around the underlying signals, consistent with Equation (5).
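
As a rough illustration of this data assumption, the sketch below draws a small collection of noisy realizations of one process, each on its own irregular timestamps. The sinusoidal `signal`, the noise level `0.3`, the series lengths, and the helper name `sample_series` are hypothetical choices made only for this example; none of them come from the paper.

```python
import math
import random

def sample_series(signal, num_points, noise_std, rng):
    """Draw one noisy realization y(t) = g(t) + Gaussian noise on random timestamps."""
    # Every series gets its own sorted, irregular timestamps in [0, 1]
    timestamps = sorted(rng.random() for _ in range(num_points))
    values = [signal(t) + rng.gauss(0.0, noise_std) for t in timestamps]
    return timestamps, values

rng = random.Random(0)
signal = lambda t: math.sin(2.0 * math.pi * t)  # hypothetical underlying signal g_k

# A collection C_k of B_k = 3 series with different lengths and out-of-sync timestamps
collection = [sample_series(signal, n, 0.3, rng) for n in (80, 90, 95)]
for ts, ys in collection:
    print(len(ts), round(ts[0], 3), round(ys[0], 3))
```

All series share the same signal but have no timestamps in common, which is the setting the rest of Section 2 is designed for.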

2.2 The Most Informative Timestamps

In this sub-section, we develop the core mathematical concept behind Motion Code called the most
informative timestamps. The most informative timestamps of a time series collection generalize the
concept of inducing points of a single time series introduced in [Titsias, 2009]. They are a small
subset of timestamps that minimizes the mismatch between the original data and the information
reconstructed using only this subset. Visualizations of the most informative timestamps are provided
in Figure 2, Figure 3, and Figure 4, and are further discussed in Section 4.
To concretely define the most informative timestamps, we first introduce the generalized evidence lower
bound (GELB) function in Definition 1. We then define the most informative timestamps as the
maximizers of this GELB function in Definition 2.
Figure 2: Forecasting With Uncertainty For Time Series In Chinatown (Pedestrian Count On
Weekends Vs Weekdays) And MoteStrain (Humidity Vs Temperature Sensor Values). Motion Code
Is Trained On [0, 0.8] And Predicted On [0.8, 1]. Panels: (a) Weekend, (b) Weekday, (c) Humidity Sensor, (d) Temperature Sensor.

Definition 1. Suppose we are given a stochastic process $G = \{g(t)\}_{t \ge 0}$ and a collection of time
series $C = \{y^i\}_{i=1}^{B}$ consisting of $B$ independent time series $y^i$ sampled from $G$. Each series
$y^i = (y^i_t)_{t \in T_i}$ consists of $N_i = |T_i|$ data points and is called a realization of $G$. Let $m$ be a fixed
positive integer. We define the generalized evidence lower bound function $\mathcal{L} = \mathcal{L}(C, G, S^m, \phi)$
as a function of the data collection $C$, the stochastic process $G$, the $m$-element timestamp set
$S^m = \{s_1, \dots, s_m\} \subset \mathbb{R}_+$, and a variational distribution $\phi$ on $\mathbb{R}^m$ as follows:

$$\mathcal{L}(C, G, S^m, \phi) := \frac{1}{B} \sum_{i=1}^{B} \int p(g_{T_i} \mid g_{S^m})\, \phi(g_{S^m}) \log \frac{p(y^i \mid g_{T_i})\, p(g_{S^m})}{\phi(g_{S^m})}\, dg_{T_i}\, dg_{S^m} \tag{1}$$

Recall that the vectors $g_{T_i}$ and $g_{S^m}$ are the signal vectors $(g(t))_{t \in T_i} \in \mathbb{R}^{|T_i|}$ and $(g(t))_{t \in S^m} \in \mathbb{R}^{|S^m|}$
on timestamps $T_i$ and $S^m$.
Definition 2. For a fixed $m \in \mathbb{N}$, the $m$-element set $(S^m)^* \subset \mathbb{R}_+$ is said to be the most informative
timestamps with respect to a noisy time series collection $C$ of a stochastic process $G$ if there exists a
variational distribution $\phi^*$ on $\mathbb{R}^m$ so that:

$$\left((S^m)^*, \phi^*\right) = \arg\max_{S^m,\, \phi} \mathcal{L}(C, G, S^m, \phi) \tag{2}$$

Also define the function $\mathcal{L}^{\max}$ such that $\mathcal{L}^{\max}(C, G, S^m) := \max_{\phi} \mathcal{L}(C, G, S^m, \phi)$. Hence, $(S^m)^*$
can be found by maximizing $\mathcal{L}^{\max}$ over all possible $S^m$.

2.3 Approximate Formula for Lmax

To compute the training loss function for Motion Code, we need to computationally approximate the
function $\mathcal{L}^{\max}$, which defines the most informative timestamps (see Algorithm 1). Specifically, for
a given set of $m$ timestamps $S^m$, a stochastic process $G$, and a collection $C$ of $B$ time series $\{y^i\}_{i=1}^{B}$
sampled from $G$, our goal is to approximate $\mathcal{L}^{\max}(C, G, S^m)$. This is achieved by approximating $G$
with a kernelized Gaussian process (see Definition 3), denoted as $H$, with a kernel function $K$.
Definition 3. A kernelized Gaussian process [Rasmussen and Williams, 2006] $H := \{h(t)\}_{t \ge 0}$
with underlying signal $h$ is a stochastic process defined by the mean function $\mu : \mathbb{R} \to \mathbb{R}$ and the
positive-definite kernel function $K : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. For the timestamps $T$, the joint distribution of the
signal vector $h_T = (h(t))_{t \in T}$ is Gaussian and characterized by:

$$p(h_T) = p((h(t))_{t \in T}) = \mathcal{N}(\mu_T, K_{TT}) \tag{3}$$

Here $\mu_T$ is the mean vector $(\mu(t))_{t \in T}$, and $K_{TT}$ is the positive-definite $|T| \times |T|$ kernel matrix
$(K(t, s))_{t, s \in T}$. $\mathcal{N}(\mu, \Sigma)$ denotes a Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$.

With the kernel $K$ of the approximate process $H$, for each $i \in \{1, \dots, B\}$, define the kernel matrices
$K_{T_i T_i}$, $K_{S^m T_i}$, and $K_{T_i S^m}$ as follows: $K_{T_i T_i} = (K(t, s))_{t \in T_i, s \in T_i}$, $K_{S^m T_i} = (K(t, s))_{t \in S^m, s \in T_i}$,
and $K_{T_i S^m} = (K(t, s))_{t \in T_i, s \in S^m}$. From these, define the $|T_i|$-by-$|T_i|$ matrix $Q_{T_i T_i} :=
K_{T_i S^m} (K_{S^m S^m})^{-1} K_{S^m T_i}$ for $i \in \{1, \dots, B\}$. Lastly, define the vector $Y$ and the joint matrix $Q^{C,G}$
as follows:

$$Y = \begin{bmatrix} y^1 \\ \vdots \\ y^B \end{bmatrix}, \qquad Q^{C,G} = \begin{bmatrix} Q_{T_1 T_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & Q_{T_B T_B} \end{bmatrix} \tag{4}$$

With these definitions, the function $\mathcal{L}^{\max}$ can be approximated as:

$$\mathcal{L}^{\max}(C, G, S^m) \approx \log p_{\mathcal{N}}\left(Y \mid 0,\, B\sigma^2 I + Q^{C,G}\right) - \frac{1}{2\sigma^2 B} \sum_{i=1}^{B} \mathrm{Tr}\left(K_{T_i T_i} - Q_{T_i T_i}\right) \tag{5}$$

where $p_{\mathcal{N}}(X \mid \mu, \Sigma)$ denotes the density function of a Gaussian random variable $X$ with mean $\mu$ and
covariance matrix $\Sigma$. The detailed proof for this approximation is given in the Appendix.
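
To make the approximation in Equation (5) concrete, the following NumPy sketch evaluates its right-hand side for one collection. It is a minimal illustration and not the authors' implementation: the helper names `rbf_kernel` and `approx_lmax`, the single-component kernel with fixed `alpha` and `beta`, the small `jitter` added for numerical stability, and the toy data are all assumptions made here.

```python
import numpy as np

def rbf_kernel(t, s, alpha=1.0, beta=10.0):
    """K(t, s) = alpha * exp(-0.5 * beta * |t - s|^2), evaluated for all pairs."""
    diff = np.asarray(t, dtype=float)[:, None] - np.asarray(s, dtype=float)[None, :]
    return alpha * np.exp(-0.5 * beta * diff**2)

def approx_lmax(timestamps, series, inducing, sigma, jitter=1e-8):
    """Evaluate the right-hand side of Equation (5) for one collection C."""
    B = len(series)
    S = np.asarray(inducing, dtype=float)
    K_ss = rbf_kernel(S, S) + jitter * np.eye(S.size)   # jitter is an assumed stabilizer
    Y = np.concatenate([np.asarray(y, dtype=float) for y in series])
    cov = B * sigma**2 * np.eye(Y.size)                 # B * sigma^2 * I
    trace_term, start = 0.0, 0
    for T in timestamps:
        K_ts = rbf_kernel(T, S)
        Q = K_ts @ np.linalg.solve(K_ss, K_ts.T)        # Q_{T_i T_i}
        n = len(T)
        cov[start:start + n, start:start + n] += Q      # i-th diagonal block of Q^{C,G}
        trace_term += np.trace(rbf_kernel(T, T) - Q)
        start += n
    # Log-density of a zero-mean Gaussian with covariance `cov`, evaluated at Y
    _, logdet = np.linalg.slogdet(cov)
    quad = Y @ np.linalg.solve(cov, Y)
    log_density = -0.5 * (Y.size * np.log(2.0 * np.pi) + logdet + quad)
    return log_density - trace_term / (2.0 * sigma**2 * B)

# Two toy series of different lengths, each on its own timestamps
rng = np.random.default_rng(0)
timestamps = [np.sort(rng.uniform(0, 1, 8)), np.sort(rng.uniform(0, 1, 12))]
series = [np.sin(2 * np.pi * T) + 0.1 * rng.normal(size=T.size) for T in timestamps]
value = approx_lmax(timestamps, series, inducing=np.linspace(0.1, 0.9, 5), sigma=0.1)
print(float(value))
```

Because the joint covariance is block diagonal, a practical implementation could factor each block separately instead of forming the full matrix; the dense version is used here only to mirror Equations (4) and (5) directly.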

2.4 Motion Code Learning

With the core concept of the most informative timestamps outlined in Section 2.2 and the approximation
formula for Lmax in Section 2.3, we can now describe the Motion Code learning framework in detail:
Model And Parameters: We approximate each stochastic process $G_k$ using a kernelized Gaussian
process with a kernel function $K^{\eta_k}$, parameterized by $\eta_k$, for each $k \in \{1, \dots, L\}$. All timestamps are
normalized to the interval [0, 1], and we select $m \in \mathbb{N}$ as the number of the most informative
timestamps, as well as a fixed latent dimension $d \in \mathbb{N}$.
We jointly model the most informative timestamps $S^{m,k}$ for each stochastic process $G_k$ (with
corresponding data collection $C_k$) through a common mapping $\mathcal{G} : \mathbb{R}^d \to \mathbb{R}^m$. Specifically, we define
$L$ distinct $d$-dimensional vectors $z_1, \dots, z_L \in \mathbb{R}^d$, referred to as motion codes, and use them to
model $S^{m,k}$ as:

$$\hat{S}^{m,k} := \mathrm{sigmoid}(\mathcal{G}(z_k)) \in \mathbb{R}^m \tag{6}$$

where sigmoid is the standard sigmoid function.

We approximate the map $\mathcal{G}$ with a linear transformation parameterized by a matrix $\Theta$, such that
$\mathcal{G}(z_k) \approx \Theta z_k$. Thus, the Motion Code model involves three types of parameters:

1. Kernel parameters $\eta := (\eta_1, \dots, \eta_L)$ to approximate the underlying stochastic processes $\{G_k\}_{k=1}^{L}$.
2. Motion codes $z := (z_1, \dots, z_L)$ with $z_i \in \mathbb{R}^d$.
3. The joint map parameter $\Theta$ with dimension $m \times d$.
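
The mapping from a motion code to predicted timestamps can be sketched in a few lines. The example below applies sigmoid(Θ z_k) using the initialization described in Algorithm 1 (motion codes set to ones, every column of Θ an arithmetic sequence between 0.1 and 0.9). The sizes `m = 5` and `d = 2` and the helper name `predicted_timestamps` are assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predicted_timestamps(theta, z):
    """Map a motion code z (length d) to m timestamps in (0, 1) via sigmoid(theta @ z)."""
    return [sigmoid(sum(w * zj for w, zj in zip(row, z))) for row in theta]

m, d = 5, 2  # illustrative sizes only
# Every column of theta is the arithmetic sequence 0.1, 0.3, ..., 0.9
column = [0.1 + 0.8 * i / (m - 1) for i in range(m)]
theta = [[c] * d for c in column]  # m x d matrix stored as a list of rows
z_k = [1.0] * d                    # motion code of process k, initialized to ones

s_hat = predicted_timestamps(theta, z_k)
print([round(s, 3) for s in s_hat])
```

The sigmoid keeps every predicted timestamp inside (0, 1), matching the normalization of all timestamps to the interval [0, 1].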

Training Loss Function: The goal is to have $\hat{S}^{m,k}$ closely approximate the true $S^{m,k}$, which
maximizes $\mathcal{L}^{\max}$. To achieve this, we maximize $\mathcal{L}^{\max}(C_k, G_k, \hat{S}^{m,k})$ for all $k$, leading to the
following loss function:

$$U(\eta, z, \Theta) = -\sum_{k=1}^{L} \mathcal{L}^{\max}(C_k, G_k, \hat{S}^{m,k}) + \lambda \sum_{k=1}^{L} \|z_k\|_2^2 \tag{7}$$

The first term is computed using the approximation formula for Lmax in Equation (5). The second
term is a regularization term for the motion codes zk , controlled by the hyperparameter λ. The full
training procedure is detailed in Algorithm 1.

Algorithm 1 Motion Code training algorithm
Input: $L$ collections of time series data $C_k = \{y^{i,k}\}_{i=1}^{B_k}$, where the series $y^{i,k}$ has timestamps $T_{i,k}$,
for $k \in \{1, \dots, L\}$. Additional hyperparameters include the number of most informative timestamps $m$,
motion code dimension $d$, regularization parameter $\lambda$, maximum number of iterations $M$, and stopping threshold $\epsilon$.
Output: Parameters $\eta, z, \Theta$ that optimize the loss function $U(\eta, z, \Theta)$ (see Section 2.4).
1: Initialize $\eta$ and $z$ to be constant vectors of ones, and $\Theta$ to be the matrix in which each column is
the arithmetic sequence between 0.1 and 0.9.
2: repeat
3: Use the current parameters $\Theta, z$ to calculate the predicted most informative timestamps for the
$k$-th stochastic process: $\hat{S}^{m,k} = \mathrm{sigmoid}(\Theta z_k)$.
4: Calculate $K^{\eta_k}_{S^{m,k} S^{m,k}}$, $K^{\eta_k}_{S^{m,k} T_{i,k}}$, and $K^{\eta_k}_{T_{i,k} S^{m,k}}$ for $k \in \{1, \dots, L\}$, $i \in \{1, \dots, B_k\}$. Then calculate the
corresponding matrices $Q$ and $Q^{C,G}$ defined in Section 2.3.
5: Use the above calculations to compute $\mathcal{L}^{\max}(C_k, G_k, \hat{S}^{m,k})$, approximated by Equation (5), via an
automatic differentiation framework for each $k \in \{1, \dots, L\}$.
6: Calculate the loss $U(\eta, z, \Theta)$ and its gradient via automatic differentiation.
7: Update parameters $(\eta, z, \Theta)$ using the limited-memory Broyden-Fletcher-Goldfarb-Shanno
(L-BFGS) algorithm [Liu and Nocedal, 1989].
8: until the number of iterations exceeds $M$ or the training loss decreases by less than $\epsilon$.
9: Output the final $(\eta, z, \Theta)$.

2.5 Classification and Forecasting with Motion Code

We use the trained parameters $\eta, z, \Theta$ from Algorithm 1 to perform both time series forecasting and
classification. The first step is to compute preliminary predictions that yield the predicted mean
signal $p_k = \mathbb{E}[(g_k)_T] \in \mathbb{R}^{|T|}$, which forms the basis for these tasks.
Preliminary Predictions: For a given $k \in \{1, \dots, L\}$, the predicted distribution of the signal vector $(g_k)_T$
is obtained by marginalizing over the signal $(g_k)_{S^{m,k}}$ at the most informative timestamps $S^{m,k}$ for
process $G_k$:

$$p((g_k)_T) = \int p\left((g_k)_T \mid (g_k)_{S^{m,k}}\right) \phi^*\left((g_k)_{S^{m,k}}\right)\, d(g_k)_{S^{m,k}} \tag{8}$$

where the optimal variational distribution $\phi^*$ is defined in Equation (2). A detailed calculation of
the distribution $p((g_k)_T)$ and its mean $p_k = \mathbb{E}[(g_k)_T]$, referred to as the predicted mean signal, is
provided in the Appendix.
Forecasting: For the stochastic process Gk , the predicted mean signal pk = E[(gk )T ] serves as the
forecast for the process.
Classification: To classify a series $y$ with timestamps $T$, we compute the predicted mean signal
$p_k \in \mathbb{R}^{|T|}$ for each $k \in \{1, \dots, L\}$. Motion Code outputs the label of the closest $p_k$ under
the Euclidean distance $\|\cdot\|_2$ on $\mathbb{R}^{|T|}$:

$$k_{\text{predicted}} = \arg\min_{k \in \{1, \dots, L\}} \|y - p_k\|_2 \tag{9}$$
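
The classification rule amounts to a nearest-mean lookup: pick the class whose predicted mean signal is closest to the observed series in Euclidean distance. The sketch below assumes the predicted mean signals have already been computed at the series' own timestamps; the function name `classify` and the toy values are hypothetical.

```python
import math

def classify(y, predicted_means):
    """Return the index k whose predicted mean signal p_k is closest to y."""
    def distance(p):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)))
    distances = [distance(p) for p in predicted_means]
    # Closest predicted mean wins, i.e. an arg min over the distances
    return min(range(len(distances)), key=distances.__getitem__)

# Hypothetical predicted mean signals for L = 2 classes, evaluated at the timestamps of y
p = [[0.0, 0.5, 1.0, 0.5], [1.0, 0.5, 0.0, 0.5]]
y = [0.1, 0.4, 0.9, 0.6]
print(classify(y, p))
```

Since each p_k is evaluated at the timestamps of the query series itself, no resampling or alignment of y is needed before comparing.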

Time Complexity: Matrix multiplication between an $m$-by-$m$ matrix and an $m$-by-$|T_{i,k}|$ or $|T_{i,k}|$-by-$m$
matrix is the most computationally expensive operation in Algorithm 1. As a result, the
time complexity of Algorithm 1 is $O\left(\sum_{k=1}^{L} \sum_{i=1}^{B_k} m^2 |T_{i,k}| \times M\right) = O(m^2 N M)$, where $N =
\sum_{k=1}^{L} \sum_{i=1}^{B_k} |T_{i,k}|$ represents the total number of data points, $M$ is the maximum number of iterations,
and $m$ is the number of most informative timestamps. For the downstream tasks, by the same argument, the
cost of predicting a single mean vector $p_k$ is $O(m^2 |T|)$. Thus, the cost of forecasting at timestamps
$T$ is also $O(m^2 |T|)$. For classification across $L$ groups of time series, classifying a time series
with timestamps $T$ has a complexity of $O(m^2 |T| L)$. Since $m$ is typically chosen to be small, these
complexities are approximately linear in the number of data points in the time series input.

3 EXPERIMENTS

3.1 Datasets

We prepared three datasets for experimentation:

Basic Sensor And Device Data: Twelve publicly available time-series datasets were sourced from
the UCR archive [Bagnall et al., 2017], focusing on sensor and device data with corresponding ID
from 1 to 12. Gaussian noise was added to simulate real-world conditions, with a standard deviation
of 30% of the maximum absolute value of the data points.
Pronunciation Audio: This dataset includes pronunciation audio from speakers with different accents
(American, British, and Malaysian), focusing on two words: “absorptivity” and “anything.” These
audio samples were sourced from publicly available recordings [Media, 2022].

Table 1: Classification Accuracy (Percentage) For 7 Time Series Algorithms On Noisy Basic Sensor
Datasets. The Highest Accuracy Is Highlighted In Red, The Second Highest In Blue.

ID  DTW    TSF    RISE   BOSS   BOSS-E  catch22  Motion Code
1   54.23  61.22  65.6   47.81  41.69   55.39    66.47
2   54.47  58.07  59.35  50.06  58.42   52.85    66.55
3   52.42  54.28  53.79  50     50.95   53.58    70.25
4   92.7   99.05  98.73  87.62  93.33   98.41    91.11
5   57.98  57.14  42.02  52.1   57.98   45.38    70.59
6   100    100    83.13  99.2   91.97   95.98    100
7   57.05  68.71  65.79  52.77  53.26   55.88    72.5
8   21.92  28.77  26.03  12.33  28.77   24.66    31.51
9   56.47  61.1   61.5   53.83  53.51   57.19    72.68
10  78.33  92.22  85.56  65.56  77.22   80       92.78
11  63.27  67.68  69.78  48.06  61.91   64.43    75.97
12  78.25  83.67  79.79  12.23  74.87   47.38    80.18

Table 2: Classification Accuracy (Percentage) For 7 Time Series Algorithms On Noisy Basic Sensor
Datasets. "Error" Indicates Failure To Run.

ID  Shapelet  Teaser  SVC    LSTM-FCN  Rocket  Hive-Cote 2  Motion Code
1   61.22     Error   56.27  66.47     62.97   61.52        66.47
2   52.61     Error   49.71  53.54     56.79   55.75        66.55
3   50        50.11   50     50        52.67   58.18        70.25
4   74.6      89.52   52.38  52.38     90.48   98.73        91.11
5   57.14     53.78   45.38  57.98     58.82   59.66        70.59
6   44.58     100     85.94  100       46.18   100          100
7   62.49     63.17   49.85  61.61     70.36   72.98        72.5
8   20.55     21.92   26.03  17.81     27.4    32.88        31.51
9   47.76     Error   50.64  56.55     68.85   56.95        72.68
10  74.44     51.11   77.78  68.33     87.22   90           92.78
11  69.36     68.84   61.7   63.27     74.71   78.49        75.97
12  49.5      26.52   Error  12.67     83.45   78.5         80.18

Parkinson’s Disease Sensor Data: The Parkinson data are derived from the Clinician Input Study
(CIS-PD) [Elm et al., 2019, Raykov et al., 2019], a 6-month project using Apple Watch devices
to monitor patients during clinic visits and at home. For two days before each clinic visit, patients
reported symptoms every 30 minutes, focusing on medication state and tremor severity. The
accelerometer data was segmented into 20-minute intervals (10 minutes before and after each symptom
report). These Parkinson data were obtained from the Biomarker & Endpoint Assessment
to Track Parkinson’s disease DREAM Challenge. For up-to-date information on the study, visit
https://www.synapse.org/Synapse:syn20825169/wiki/600898.

We used two experimental settings for Parkinson’s monitoring. The first tracks patients fully on
medication state, distinguishing between no tremor and mild tremor to assess whether the patient
has fully recovered or is still symptomatic. The second setting adds a third category for moderate to
severe tremor, independent of medication state, aiming to capture broader tremor patterns, including
cases where symptoms persist despite medication. This offers a more comprehensive assessment of
tremor severity beyond recovery stages.
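
For the basic sensor and device datasets described above, the noise injection can be sketched as follows. The text specifies Gaussian noise with a standard deviation of 30% of the maximum absolute value of the data points; whether that maximum is taken per series or per dataset is not stated, so the per-series choice below, the function name `add_gaussian_noise`, and the fixed seed are assumptions.

```python
import random

def add_gaussian_noise(series, ratio=0.3, seed=0):
    """Add zero-mean Gaussian noise with std = ratio * max |value| of the series."""
    rng = random.Random(seed)
    # Assumption: the maximum absolute value is computed per series
    noise_std = ratio * max(abs(v) for v in series)
    return [v + rng.gauss(0.0, noise_std) for v in series]

clean = [0.0, 1.0, -2.0, 0.5, 1.5]  # toy series; here noise_std = 0.3 * 2.0 = 0.6
noisy = add_gaussian_noise(clean)
print([round(v, 3) for v in noisy])
```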

3.2 Experimental Setups

Motion Code was applied to three datasets: 12 basic datasets with added noise, pronunciation audio
data, and Parkinson’s sensor data, focusing on classification tasks. Forecasting was performed on the
basic datasets and the audio dataset. All experiments were run on an Nvidia A100 GPU.
Data Preprocessing: For the Parkinson’s dataset, we downsampled each segment by averaging per
second, calculated the absolute differences between consecutive points, and applied an exponential
moving average filter. We interpolated the data to 1,600 points for benchmark algorithms that require
same-length time series, though Motion Code can handle misaligned data directly without the need
for interpolation. More details are given in the Appendix.
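
The three preprocessing steps for the Parkinson's segments can be sketched as below. The sampling rate `rate_hz`, the smoothing factor `ema_alpha`, the use of a single accelerometer channel, and the function name `preprocess_segment` are assumptions for illustration; the actual settings are described in the Appendix rather than here.

```python
def preprocess_segment(samples, rate_hz, ema_alpha=0.1):
    """Per-second averaging, absolute consecutive differences, then an EMA filter."""
    # 1. Downsample: average the samples that fall into each one-second window
    per_second = []
    for i in range(0, len(samples), rate_hz):
        window = samples[i:i + rate_hz]
        per_second.append(sum(window) / len(window))
    # 2. Absolute differences between consecutive per-second averages
    diffs = [abs(b - a) for a, b in zip(per_second, per_second[1:])]
    # 3. Exponential moving average, seeded with the first difference
    smoothed = []
    for d in diffs:
        prev = smoothed[-1] if smoothed else d
        smoothed.append(ema_alpha * d + (1.0 - ema_alpha) * prev)
    return smoothed

# Toy single-channel signal sampled at an assumed 2 Hz
result = preprocess_segment([0, 0, 2, 2, 0, 0, 4, 4], rate_hz=2, ema_alpha=0.5)
print(result)
```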
Kernel Choice: We used a spectral kernel defined as $K^{\eta}(t, s) := \sum_{j=1}^{J} \alpha_j \exp(-0.5\, \beta_j |t - s|^2)$ with
parameters $\eta = (\alpha_1, \dots, \alpha_J, \beta_1, \dots, \beta_J)$.
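
A direct transcription of this kernel is short enough to show in full. The parameter values used in the example call and the helper name `kernel_matrix` are illustrative assumptions; in Motion Code the values of α and β are learned during training.

```python
import math

def spectral_kernel(t, s, alphas, betas):
    """K(t, s) = sum_j alpha_j * exp(-0.5 * beta_j * |t - s|^2)."""
    return sum(a * math.exp(-0.5 * b * (t - s) ** 2) for a, b in zip(alphas, betas))

def kernel_matrix(ts, ss, alphas, betas):
    """Kernel matrix (K(t, s)) for t in ts (rows) and s in ss (columns)."""
    return [[spectral_kernel(t, s, alphas, betas) for s in ss] for t in ts]

# Hypothetical parameters for J = 2 components
alphas, betas = [1.0, 0.5], [10.0, 100.0]
K = kernel_matrix([0.0, 0.25, 0.5], [0.0, 0.25, 0.5], alphas, betas)
for row in K:
    print([round(v, 4) for v in row])
```

Each diagonal entry equals the sum of the α values, and larger β values make the corresponding component decay faster with the time gap |t − s|.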
Hyperparameters: For experiments, we set $d = 2$, $\lambda = 1$, $\epsilon = 10^{-5}$, and $M = 10$. For the basic and
pronunciation audio datasets, we selected 10 most informative timestamps ($m = 10$) and 1 kernel
component ($J = 1$). For Parkinson’s disease (PD), we used $m = 6$, $J = 2$ for the first setting, and
$m = 12$, $J = 2$ for the second setting.

3.3 Evaluation on Time Series Classification

We compared Motion Code’s performance on time series classification against 12 algorithms: DTW
[Jeong et al., 2011], TSF [Deng et al., 2013], RISE [Lines et al., 2016], BOSS [Schäfer, 2015], BOSS-
E [Schäfer, 2015], catch22 [Lubba et al., 2019], Shapelet [Bostrom and Bagnall, 2017], Teaser
[Schäfer and Leser, 2020], SVC [Löning et al., 2019], LSTM-FCN [Karim et al., 2019], Rocket
[Dempster et al., 2020], and Hive-Cote 2 [Middlehurst et al., 2021]. We evaluated performance based
on classification accuracy (measured in percentage).
As shown in Table 1 and Table 2, Motion Code achieves the highest accuracy (including ties) on 7 of the 12
noisy basic datasets (IDs 1, 2, 3, 5, 6, 9, and 10) and ranks second, behind only the ensemble model
Hive-Cote 2, on three more (IDs 7, 8, and 11). It falls outside the top 2 only on datasets 4 and 12.
This demonstrates the robustness of our method in handling collections of noisy time series.

Table 3: Classification Accuracy For 7 Time Series Algorithms On Pronunciation Audio And
Parkinson Data.

Data sets            Shapelet  Teaser  SVC    LSTM-FCN  Rocket  Hive-Cote 2  Motion Code
Pronunciation Audio  68.75     Error   62.5   56.25     75      75           87.5
Parkinson setting 1  52.80     59.94   63.96  43.48     61.49   59.63        70.81
Parkinson setting 2  44.99     37.53   48.02  24.01     51.52   50.82        54.31

Table 4: Classification Accuracy For 7 Time Series Algorithms On Pronunciation Audio And
Parkinson Data.

Data sets            DTW    TSF    RISE   BOSS   BOSS-E  catch22  Motion Code
Pronunciation Audio  50     87.5   62.5   68.75  62.5    50       87.5
Parkinson setting 1  63.35  63.98  70.81  61.80  65.53   68.94    70.81
Parkinson setting 2  43.12  51.98  53.61  45.92  36.83   51.52    54.31

For real-world datasets, Table 3 and Table 4 show competitive performance from Motion Code
compared to 12 other algorithms, highlighting its effectiveness in handling noise inherent in
real-world data.

3.4 Evaluation on Time Series Forecasting

For forecasting, each dataset was split into two parts: 80% of the data points were used for training,
while the remaining 20% of future data points were reserved for testing. For Motion Code, we
generated a single prediction for all series within the same collection. We selected 5 algorithms as
baselines for comparison: Exponential Smoothing [Holt, 2004], ARIMA [Malki et al., 2021], State
Space Model [Durbin, 2012], TBATS [De Livera et al., 2011], and Last Seen, a basic method that
uses previous values to predict the next time steps.
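
The evaluation protocol for a single series can be sketched as follows, using the simplest baseline. Splitting by index position and implementing Last Seen as a repetition of the final training value are assumptions about details the text leaves open; the function names are hypothetical.

```python
import math

def split_80_20(values):
    """First 80% of the points for training, the remaining 20% for testing."""
    cut = (4 * len(values)) // 5
    return values[:cut], values[cut:]

def last_seen_forecast(train, horizon):
    """Assumed form of the Last Seen baseline: repeat the last observed value."""
    return [train[-1]] * horizon

def rmse(actual, predicted):
    """Root mean-square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
train, test = split_80_20(values)
forecast = last_seen_forecast(train, len(test))
error = rmse(test, forecast)
print(round(error, 4))
```

Averaging this per-series RMSE over all series of a dataset gives a number comparable to the entries reported in Table 5.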

Table 5: Average Root Mean-Square Error (RMSE) For 6 Time Series Forecasting Algorithms.

ID     Exp. Smoothing  ARIMA   State space  Last seen  TBATS   Motion Code
1      Error           1079    775.96       723.1      633.04  518.49
2      0.34            0.43    1.58         0.19       0.17    0.27
3      0.88            0.58    0.93         0.57       0.56    0.74
4      60.38           128.44  59.83        41.51      20.94   417.94
5      1117            3386    730.51       497.3      560.88  648.27
6      0.043           0.095   0.25         0.019      0.02    0.048
7      Error           2.02    2.37         1.24       0.96    0.67
8      1.7             2.85    1.7          1.35       1.35    1.08
9      1.11            1.52    1.09         1.01       0.88    0.82
10     3.38            4.85    4.41         1.77       1.72    1.15
11     2.79            2.01    3.21         1.39       1.52    2.26
12     4.37            5       4.45         0.98       1.44    0.98
Audio  0.087           0.27    0.086        0.1        0.059   0.085

We ran the 5 baseline algorithms with individual predictions and compared the results, as shown in
Table 5. Despite not making individual predictions for each series, Motion Code achieved the lowest
RMSE (including one tie) on 6 of the 13 datasets (IDs 1, 7, 8, 9, 10, and 12), more than any single baseline.
Code: The implementation is available at https://github.com/mpnguyen2/motion_code.

4 MOTION CODE’S BENEFITS

4.1 Interpretable Features

Despite having several noisy time series that deviate from the common mean, the points at the most
informative timestamps S m,k form a skeleton approximation of the underlying stochastic process.
All the important twists and turns are consistently captured by the corresponding points at these
timestamps (see Figure 2). These points create a feature that helps visualize the underlying dynamics
with explicit global behaviors such as increasing, decreasing, or staying still, unlike the original complex
time series collections, which show no visible common patterns among series.

(a) Absorptivity (b) Anything

Figure 3: (a) And (b): Most Informative Timestamps For The Pronunciation Of Absorptivity And
Anything, Highlighting Key Linguistic Components.

Pronunciation Audio: For pronunciation audio data, where speakers of different nationalities
pronounce complex words, Motion Code highlights key linguistic features. For “absorptivity”
(ab-sorp-ti-vi-ty), the most informative timestamps align with significant phonetic components,
identifying emphasis on “ab” and “sorp” followed by a notable silent pause before proceeding to “ti”
and then “vi-ty” (see Figure 3a). Similarly, for “anything”, it captures a strong vocal raise on “a-ny”
and emphasis on “thing”, preserving core pronunciation patterns across accents despite variations
(see Figure 3b). This ability to focus on key moments reveals common speech dynamics shared
across accents.
Parkinson’s Disease Data: When tracking normal movement, the series appears random with no
clear pattern. This reflects the unpredictable nature of normal motion, where no consistent behavior
or tremor can be observed (see Figure 4a). In contrast, for patients with light tremor, the extracted
timestamps reveal a more consistent, repetitive pattern, characterized by slight oscillations that
correspond to minor, controlled hand swings. These small fluctuations, captured at key timestamps,
represent typical behavior in light tremor (see Figure 4b). For more severe tremors, the timestamps
highlight a progression from smaller, repeated movements to larger, more exaggerated swings.
Initially, the differences between consecutive data points are minimal, but as the tremor worsens,
the fluctuations become more pronounced, with larger variations visible at critical timestamps (see
Figure 4c). This interpretable feature allows us to track the severity and progression of tremors over
time, offering valuable insights into patients’ conditions.

(a) Normal (b) Light Tremor (c) Noticeable Tremor

Figure 4: Interpretable Features Showing Tremor Patterns And Disease Stages For Parkinson Data:
(a) Normal, (b) Light Tremor, And (c) Noticeable Tremor.

4.2 Uneven Length and Missing Data

Motion Code processes each data point and its timestamp independently, allowing the algorithm
to handle time series with different timestamps. Despite the time series having uneven lengths and
out-of-sync timestamps, Motion Code maintains accurate skeleton approximations (see Figure 1),
demonstrating its effectiveness with incomplete and varying-length data.
For the Parkinson’s dataset, Motion Code efficiently handles out-of-sync timestamps and missing values.
The time series from wearable sensors vary in length from 200 to 1660 points, with intermediate
lengths such as 500 and 1000 points. Traditional methods often require interpolation to standardize
these lengths, which can distort the data, especially when dealing with large disparities. Motion Code
bypasses this need, processing time series of different lengths directly and learning across classes
without interpolation, preserving the original data’s integrity.
This capability is particularly useful for monitoring Parkinson’s disease, where tremors and bradykinesia
fluctuate, and sensor readings are irregular due to patient activities. Unlike other methods
that struggle with asynchronous data, Motion Code treats each reading as part of an underlying
stochastic process, enabling it to handle noisy, incomplete, and unsynchronized data efficiently. This
eliminates the need for strict time-aligned monitoring, allowing patients to maintain natural schedules
while ensuring accurate symptom tracking. Clinicians also benefit from clear, actionable insights,
improving their ability to monitor disease progression and make timely interventions.

5 CONCLUSION
In this work, we developed an integrated framework called Motion Code, utilizing variational
inference and sparse stochastic process modeling. Unlike most existing methods focusing on either
classification or forecasting, Motion Code performs both tasks simultaneously across diverse time
series collections. Our model demonstrates robustness to noise and consistently achieves competitive
performance against other leading time series algorithms. As discussed in Section 4, Motion Code
offers interpretable features that capture the core dynamics of the underlying stochastic process.
This is especially useful in domains like Parkinson’s disease monitoring, where understanding key
patterns offers actionable insights for clinicians. Additionally, it handles varying-length time series
and missing data, challenges that many other methods struggle with. In future work, we aim to extend
Motion Code with non-Gaussian approximations so that it adapts to time series from different
application domains.

Acknowledgments and Disclosure of Funding

Funding information: This research was supported in part by a grant from the NIH-DK129979, in
part from the Peter O’Donnell Foundation, the Michael J. Fox Foundation, Jim Holland-Backcountry
Foundation, and in part from a grant from the Army Research Office accomplished under Cooperative
Agreement Number W911NF-19-2-0333.
Parkinson data: These data were generated by participants of The Michael J. Fox Foundation for
Parkinson’s Research Mobile or Wearable Studies. They were obtained as part of the Biomarker
& Endpoint Assessment to Track Parkinson’s Disease DREAM Challenge (through Synapse ID
syn20825169) made possible through partnership of The Michael J. Fox Foundation for Parkinson’s
Research, Sage Bionetworks, and BRAIN Commons.

References
Anthony Bagnall, Jason Lines, Aaron Bostrom, James Large, and Eamonn Keogh. The great time
series classification bake off: a review and experimental evaluation of recent algorithmic advances.
Data Min. Knowl. Discov., 31(3):606–660, 2017.
Aaron Bostrom and Anthony Bagnall. Binary shapelet transform for multiclass time series classifica-
tion. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXII, pages 24–46.
Springer Berlin Heidelberg, Berlin, Heidelberg, 2017.
Yanshuai Cao, Marcus A Brubaker, David J Fleet, and Aaron Hertzmann. Efficient optimization
for sparse gaussian process regression. In Advances in Neural Information Processing Systems,
volume 26. Curran Associates, Inc., 2013.
Alysha M De Livera, Rob J Hyndman, and Ralph D Snyder. Forecasting time series with complex
seasonal patterns using exponential smoothing. J. Am. Stat. Assoc., 106(496):1513–1527, 2011.
Angus Dempster, François Petitjean, and Geoffrey I Webb. ROCKET: exceptionally fast and accurate
time series classification using random convolutional kernels. Data Min. Knowl. Discov., 34(5):
1454–1495, 2020.
Houtao Deng, George Runger, Eugene Tuv, and Martyanov Vladimir. A time series forest for
classification and feature extraction. Inf. Sci. (Ny), 239:142–153, 2013.
James Durbin. Time Series Analysis by State Space Methods: Second Edition. Oxford University
Press, 2012.
Jordan J Elm, Margaret Daeschler, Lauren Bataille, Ruth Schneider, Amy Amara, Alberto J Espay,
Michal Afek, Chen Admati, Abeba Teklehaimanot, and Tanya Simuni. Feasibility and utility of a
clinician dashboard from wearable and mobile application parkinson’s disease data. NPJ Digit.
Med., 2(1):95, 2019.
Charles C Holt. Forecasting seasonals and trends by exponentially weighted moving averages. Int. J.
Forecast., 20(1):5–10, 2004.
Qinghua Hu, Rujia Zhang, and Yucan Zhou. Transfer learning for short-term wind speed prediction
with deep neural networks. Renew. Energy, 85:83–95, 2016.
Young-Seon Jeong, Myong K Jeong, and Olufemi A Omitaomu. Weighted dynamic time warping for
time series classification. Pattern Recognit., 44(9):2231–2240, 2011.
Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. Multivariate LSTM-
FCNs for time series classification. Neural Netw., 116:237–245, 2019.
Bryan Lim and Stefan Zohren. Time-series forecasting with deep learning: a survey. Philos. Trans. A
Math. Phys. Eng. Sci., 379(2194):20200209, 2021.
Jason Lines, Sarah Taylor, and Anthony Bagnall. HIVE-COTE: The hierarchical vote collective of
transformation-based ensembles for time series classification. In 2016 IEEE 16th International
Conference on Data Mining (ICDM). IEEE, 2016.
Dong C Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization.
Math. Program., 45(1-3):503–528, 1989.
Zhenyu Liu, Zhengtong Zhu, Jing Gao, and Cheng Xu. Forecast methods for time series data: A
survey. IEEE Access, 9:91896–91912, 2021.
Markus Löning, Anthony Bagnall, Sajaysurya Ganesh, Viktor Kazakov, Jason Lines, and Franz J.
Király. sktime: A Unified Interface for Machine Learning with Time Series. arXiv e-prints
arXiv:1909.07872, 2019.
Carl H Lubba, Sarab S Sethi, Philip Knaute, Simon R Schultz, Ben D Fulcher, and Nick S Jones.
catch22: CAnonical time-series CHaracteristics: Selected through highly comparative time-series
analysis. Data Min. Knowl. Discov., 33(6):1821–1852, 2019.
Qianli Ma, Lifeng Shen, Weibiao Chen, Jiabin Wang, Jia Wei, and Zhiwen Yu. Functional echo state
network for time series classification. Inf. Sci. (Ny), 373:1–20, 2016.
Zohair Malki, El-Sayed Atlam, Ashraf Ewis, Guesh Dagnew, Ahmad Reda Alzighaibi, Ghada
ELmarhomy, Mostafa A Elhosseini, Aboul Ella Hassanien, and Ibrahim Gad. ARIMA models for
predicting the end of COVID-19 pandemic and the risk of second rebound. Neural Comput. Appl.,
33(7):2929–2948, 2021.
Forvo Media. Forvo: The pronunciation guide. http://www.forvo.com/, 2022.
Matthew Middlehurst, James Large, Michael Flynn, Jason Lines, Aaron Bostrom, and Anthony
Bagnall. HIVE-COTE 2.0: a new meta ensemble for time series classification. Mach. Learn., 110
(11-12):3211–3243, 2021.
Henry B. Moss, Sebastian W. Ober, and Victor Picheny. Inducing point allocation for sparse gaussian
processes in high-throughput bayesian optimisation. In Proceedings of The 26th International
Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine
Learning Research, pages 5213–5230, 2023.
Yuan Qi, Ahmed H Abdel-Gawad, and Thomas P Minka. Sparse-posterior gaussian processes for
general likelihoods. In Proceedings of the 26th conference on uncertainty in artificial intelligence,
pages 450–457. Citeseer, 2010.
Carl Edward Rasmussen and Christopher K Williams. Gaussian processes for machine learning.
MIT Press, 2006.
Yordan P. Raykov, Luc J. W. Evers, Reham Badawy, Marjan J. Faber, Bastiaan R. Bloem, Kasper
Claes, and Max A. Little. Probabilistic modelling of gait for remote passive monitoring applications.
arXiv preprint arXiv:1812.02585, 2019.
Patrick Schäfer. The BOSS is concerned with time series classification in the presence of noise. Data
Min. Knowl. Discov., 29(6):1505–1530, 2015.
Patrick Schäfer and Ulf Leser. TEASER: early and accurate time series classification. Data Min.
Knowl. Discov., 34(5):1336–1362, 2020.
Michalis Titsias. Variational learning of inducing variables in sparse gaussian processes. In Proceed-
ings of the Twelth International Conference on Artificial Intelligence and Statistics, volume 5 of
Proceedings of Machine Learning Research, pages 567–574, Florida, USA, 2009.

A MATHEMATICAL THEORY AND PROOF

In this section, we provide the details deferred from the main paper, including the approximation of
the $L_{\max}$ function (Section 2.3) and the derivation of the signal distribution $p((g_k)_T)$ (Section 2.5).

A.1 Approximation of Lmax

We aim to prove the approximation formula for Lmax :

$$ L_{\max}(C, G, S^m) \approx \log p_N(Y \mid 0, B\sigma^2 I + Q^{C,G}) - \frac{1}{2\sigma^2 B} \sum_{i=1}^{B} \mathrm{Tr}(K_{T_i T_i} - Q_{T_i T_i}) \tag{10} $$

We now state the approximation result as the following lemma:

Lemma 1. Let $G = \{g(t)\}_{t \ge 0}$ be a stochastic process with the underlying signal $g$. Assume the data
collection $C$ for the process $G$ consists of $B$ noisy time series $\{y^i\}_{i=1}^{B}$ with data points $(y^i)_{T_i}$, where
$(y^i)_{T_i} \sim N(g_{T_i}, \sigma^2 I_{|T_i|})$. This assumption, from Section 2.1, implies the data points have Gaussian
noise with variance $\sigma^2$ around the signal $g$. We approximate $G$ by a Gaussian process with mean
function $\mu : \mathbb{R} \to \mathbb{R}$ and kernel $K : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. We further assume the mean vectors $\mu$ can be
approximately rescaled by $K$: $\mu_T \approx K_{TS} K_{SS}^{-1} \mu_S$.
Let $S^m$ be an $m$-element set of timestamps. Recall the kernel matrices $K_{T_i T_i}$, $K_{S^m T_i}$, and $K_{T_i S^m}$,
defined as follows: $K_{T_i T_i} = (K(t, s))_{t \in T_i, s \in T_i}$, $K_{S^m T_i} = (K(t, s))_{t \in S^m, s \in T_i}$, and
$K_{T_i S^m} = (K(t, s))_{t \in T_i, s \in S^m}$. Also, recall the $|T_i|$-by-$|T_i|$ matrix $Q_{T_i T_i} := K_{T_i S^m} (K_{S^m S^m})^{-1} K_{S^m T_i}$ for
$i \in \overline{1, B}$. Additionally, recall the data vector $Y$ and the joint matrix $Q^{C,G}$:

$$ Y = \begin{pmatrix} y^1 \\ \vdots \\ y^B \end{pmatrix}, \qquad Q^{C,G} = \begin{pmatrix} Q_{T_1 T_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & Q_{T_B T_B} \end{pmatrix} \tag{11} $$

Then Lmax defined in Section 2.2 has the approximate form:

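$$ L_{\max}(C, G, S^m) \approx \log p_N(Y \mid 0, B\sigma^2 I + Q^{C,G}) - \frac{1}{2\sigma^2 B} \sum_{i=1}^{B} \mathrm{Tr}(K_{T_i T_i} - Q_{T_i T_i}) \tag{12} $$

where $p_N(X \mid \mu, \Sigma)$ denotes the density function of a Gaussian random variable $X$ with mean $\mu$ and
covariance matrix $\Sigma$.
Furthermore, the optimal variational distribution $\phi^* = \arg\max_{\phi} L(C, G, S^m, \phi)$ (see Section 2.2)
is a Gaussian distribution of the form:

$$ \phi^*(g_{S^m}) = N\left( \sigma^{-2} K_{S^m S^m} \Sigma \left( \frac{1}{B} \sum_{i=1}^{B} K_{S^m T_i} y^i \right),\; K_{S^m S^m} \Sigma K_{S^m S^m} \right) \tag{13} $$

where $\Sigma = \Lambda^{-1}$ with $\Lambda := K_{S^m S^m} + \frac{\sigma^{-2}}{B} \sum_{i=1}^{B} K_{S^m T_i} K_{T_i S^m}$.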
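The right-hand side of Equation (12) can be evaluated directly once a kernel is fixed. The following is a minimal NumPy sketch under stated assumptions, not the authors' code: the squared-exponential kernel, the `jitter` term added for numerical stability, and the names `rbf_kernel` and `approx_l_max` are all hypothetical, and the timestamps `S` are fixed rather than optimized.

```python
import numpy as np

def rbf_kernel(t, s, lengthscale=0.1, variance=1.0):
    """Squared-exponential kernel (an assumed choice of K) on 1-D timestamp arrays."""
    diff = t[:, None] - s[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def approx_l_max(timestamps, series, S, sigma, jitter=1e-8):
    """Evaluate the right-hand side of Equation (12) for B series and timestamps S^m."""
    B = len(series)
    K_SS = rbf_kernel(S, S) + jitter * np.eye(S.size)  # jitter is an added assumption
    Y = np.concatenate(series)
    n = Y.size
    cov = B * sigma**2 * np.eye(n)  # B sigma^2 I; the blocks of Q^{C,G} are added below
    trace_sum = 0.0
    start = 0
    for T in timestamps:
        K_TS = rbf_kernel(T, S)
        Q_TT = K_TS @ np.linalg.solve(K_SS, K_TS.T)  # K_TS K_SS^{-1} K_ST
        stop = start + T.size
        cov[start:stop, start:stop] += Q_TT
        trace_sum += np.trace(rbf_kernel(T, T) - Q_TT)
        start = stop
    # log N(Y | 0, cov) from the log-determinant and the quadratic form
    _, logdet = np.linalg.slogdet(cov)
    quad = Y @ np.linalg.solve(cov, Y)
    log_density = -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
    return log_density - trace_sum / (2.0 * sigma**2 * B)

rng = np.random.default_rng(0)
timestamps = [np.sort(rng.uniform(0.0, 1.0, n)) for n in (40, 70)]
series = [np.sin(2 * np.pi * T) + 0.1 * rng.standard_normal(T.size) for T in timestamps]
S = np.linspace(0.0, 1.0, 10)
print(approx_l_max(timestamps, series, S, sigma=0.1))
```

Because the trace term is non-negative, the returned value never exceeds the Gaussian log-density term; the trace term vanishes when the timestamps in `S` reproduce the kernel matrices exactly.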

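Proof. Define the conditional mean signal vector $\alpha_i = E[g_{T_i} \mid g_{S^m}]$. From the rescaled mean signals
approximation, the conditional distribution of $g_T$ given $g_S$ is Gaussian with the following mean and
variance:

$$ p(g_T \mid g_S, T, S) = N\left( K_{TS} K_{SS}^{-1} g_S,\; K_{TT} - K_{TS} K_{SS}^{-1} K_{ST} \right) \tag{14} $$

As a result, $\alpha_i = K_{T_i S^m} (K_{S^m S^m})^{-1} g_{S^m}$. Then, following the derivation from [Titsias, 2009],
individual terms in Definition 1 can be approximated as follows:

$$
\begin{aligned}
&\int p(g_{T_i} \mid g_{S^m})\, \phi(g_{S^m}) \log \frac{p(y^i \mid g_{T_i})\, p(g_{S^m})}{\phi(g_{S^m})} \, dg_{T_i}\, dg_{S^m} \\
&\quad = \int \phi(g_{S^m}) \left[ \int p(g_{T_i} \mid g_{S^m}) \log p(y^i \mid g_{T_i}) \, dg_{T_i} + \log \frac{p(g_{S^m})}{\phi(g_{S^m})} \right] dg_{S^m} \\
&\quad \approx \int \phi(g_{S^m}) \left[ \log p_N(y^i \mid \alpha_i, \sigma^2 I_{|T_i|}) - \frac{1}{2\sigma^2} \mathrm{Tr}(K_{T_i T_i} - Q_{T_i T_i}) + \log \frac{p(g_{S^m})}{\phi(g_{S^m})} \right] dg_{S^m} \\
&\quad = \int \phi(g_{S^m}) \log \frac{p_N(y^i \mid \alpha_i, \sigma^2 I_{|T_i|})\, p(g_{S^m})}{\phi(g_{S^m})} \, dg_{S^m} - \frac{1}{2\sigma^2} \mathrm{Tr}(K_{T_i T_i} - Q_{T_i T_i})
\end{aligned}
$$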
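Let $A$ be the combined mean signal vector $A := \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_B \end{pmatrix}$. Using the above approximation for individual
terms, we upper-bound the function $L(C, G, S^m, \phi)$ (see Definition 1 in Section 2.2) as follows:

$$
\begin{aligned}
L(C, G, S^m, \phi)
&\approx \sum_{i=1}^{B} \frac{1}{B} \int \phi(g_{S^m}) \log \frac{p_N(y^i \mid \alpha_i, \sigma^2 I)\, p(g_{S^m})}{\phi(g_{S^m})} \, dg_{S^m} - \frac{1}{2\sigma^2 B} \sum_{i=1}^{B} \mathrm{Tr}(K_{T_i T_i} - Q_{T_i T_i}) \\
&= \int \phi(g_{S^m}) \log \left( \left( \prod_{i=1}^{B} p_N(y^i \mid \alpha_i, \sigma^2 I) \right)^{1/B} \frac{p(g_{S^m})}{\phi(g_{S^m})} \right) dg_{S^m} - \frac{1}{2\sigma^2 B} \sum_{i=1}^{B} \mathrm{Tr}(K_{T_i T_i} - Q_{T_i T_i}) \\
&\le \log \int \left( \prod_{i=1}^{B} p_N(y^i \mid \alpha_i, \sigma^2 I) \right)^{1/B} p(g_{S^m}) \, dg_{S^m} - \frac{1}{2\sigma^2 B} \sum_{i=1}^{B} \mathrm{Tr}(K_{T_i T_i} - Q_{T_i T_i}) \\
&= \log \int p_N(Y \mid A, B\sigma^2 I)\, p(g_{S^m}) \, dg_{S^m} - \frac{1}{2\sigma^2 B} \sum_{i=1}^{B} \mathrm{Tr}(K_{T_i T_i} - Q_{T_i T_i}) \\
&= \log p_N(Y \mid 0, B\sigma^2 I + Q^{C,G}) - \frac{1}{2\sigma^2 B} \sum_{i=1}^{B} \mathrm{Tr}(K_{T_i T_i} - Q_{T_i T_i})
\end{aligned}
$$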
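The only inequality in this bound follows from Jensen's inequality applied to the concave logarithm. The upper bound no longer depends
on the variational distribution $\phi$; it depends only on the timestamps in $S^m$. As a result, by the definition
of $L_{\max}$, we obtain Equation (12). Moreover, equality holds in this bound when:

$$
\begin{aligned}
\phi^*(g_{S^m}) &\propto \prod_{i=1}^{B} p_N(y^i \mid \alpha_i, \sigma^2 I)^{1/B}\, p(g_{S^m}) \\
&\propto \exp\left( \frac{\sigma^{-2}}{B} \sum_{i=1}^{B} (y^i)^T K_{T_i S^m} (K_{S^m S^m})^{-1} g_{S^m} - \frac{1}{2} (g_{S^m})^T \left( \frac{\sigma^{-2}}{B} \sum_{i=1}^{B} (K_{S^m S^m})^{-1} K_{S^m T_i} K_{T_i S^m} (K_{S^m S^m})^{-1} + (K_{S^m S^m})^{-1} \right) g_{S^m} \right)
\end{aligned}
$$

Hence, $\phi^*$ is (approximately) a Gaussian distribution with the following mean and variance, where $\Sigma = \Lambda^{-1}$ as defined in the statement of Lemma 1:

$$ \phi^*(g_{S^m}) = N\left( \sigma^{-2} K_{S^m S^m} \Sigma \left( \frac{1}{B} \sum_{i=1}^{B} K_{S^m T_i} y^i \right),\; K_{S^m S^m} \Sigma K_{S^m S^m} \right) \tag{15} $$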
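The mean and covariance of the optimal variational distribution in Equation (13) reduce to a few matrix products. The sketch below is illustrative only and makes several assumptions: a squared-exponential kernel, a small `jitter` on $K_{S^m S^m}$, fixed timestamps `S`, and hypothetical names `rbf_kernel` and `optimal_variational`.

```python
import numpy as np

def rbf_kernel(t, s, lengthscale=0.1, variance=1.0):
    """Squared-exponential kernel (an assumed choice of K) on 1-D timestamp arrays."""
    diff = t[:, None] - s[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def optimal_variational(timestamps, series, S, sigma, jitter=1e-8):
    """Mean and covariance of phi^* at the timestamps S^m, following Equation (13)."""
    B = len(series)
    K_SS = rbf_kernel(S, S) + jitter * np.eye(S.size)  # jitter is an added assumption
    Lam = K_SS.copy()
    weighted = np.zeros(S.size)
    for T, y in zip(timestamps, series):
        K_ST = rbf_kernel(S, T)
        Lam += K_ST @ K_ST.T / (sigma**2 * B)  # (sigma^{-2} / B) K_ST K_TS
        weighted += K_ST @ y / B               # (1 / B) sum_i K_{S T_i} y^i
    Sigma = np.linalg.inv(Lam)                 # Sigma = Lambda^{-1}
    mean = K_SS @ Sigma @ weighted / sigma**2
    cov = K_SS @ Sigma @ K_SS
    return mean, cov

rng = np.random.default_rng(0)
timestamps = [np.sort(rng.uniform(0.0, 1.0, n)) for n in (40, 70)]
series = [np.sin(2 * np.pi * T) + 0.1 * rng.standard_normal(T.size) for T in timestamps]
S = np.linspace(0.0, 1.0, 10)
mean, cov = optimal_variational(timestamps, series, S, sigma=0.1)
print(mean.shape, cov.shape)
```

The explicit inverse keeps the code close to the formula; a Cholesky-based solve would be the usual choice for larger m.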

A.2 Calculation of the Distribution p((gk )T )

Recall that $S^{m,k}$ represents the most informative timestamps for the underlying stochastic process $G_k$,
associated with the time series data collection $C_k$ (see Section 2.1). Furthermore, $G_k$ is approximated
by a Gaussian process with a parameterized kernel function $K^{\eta_k}$ (see Section 2.4).
We now provide a detailed calculation/approximation of the distribution of the underlying signal
$p((g_k)_T)$ and its mean $p_k = E[(g_k)_T]$, referred to as the predicted mean signal (see Section 2.5).
For $k \in \overline{1, L}$, the predicted distribution of the signal vector $(g_k)_T$ is obtained by marginalizing over
the signal $(g_k)_{S^{m,k}}$ at the most informative timestamps $S^{m,k}$ for process $G_k$:

$$ p((g_k)_T) = \int p((g_k)_T \mid (g_k)_{S^{m,k}})\, \phi^*((g_k)_{S^{m,k}}) \, d(g_k)_{S^{m,k}} \tag{16} $$

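Here the optimal variational distribution $\phi^*$ has the approximate form defined in Equation (13).
Specifically, $\phi^*$ can be approximated by a Gaussian distribution with the following mean $\mu_k$ and
covariance matrix $A_k$, based on Lemma 1:

$$ \mu_k = \sigma^{-2} K^{\eta_k}_{S^{m,k} S^{m,k}} \Sigma \left( \frac{1}{B_k} \sum_{i=1}^{B_k} K^{\eta_k}_{S^{m,k} T_{i,k}} y^i \right) \tag{17} $$

$$ A_k = K^{\eta_k}_{S^{m,k} S^{m,k}} \Sigma K^{\eta_k}_{S^{m,k} S^{m,k}} \tag{18} $$

where $\Sigma = \Lambda^{-1}$ with $\Lambda := K^{\eta_k}_{S^{m,k} S^{m,k}} + \frac{\sigma^{-2}}{B_k} \sum_{i=1}^{B_k} K^{\eta_k}_{S^{m,k} T_{i,k}} K^{\eta_k}_{T_{i,k} S^{m,k}}$.
With this approximation for $\phi^*$, Equation (16) simplifies to an integral of two Gaussian distributions.
An explicit calculation shows that the distribution of $(g_k)_T$ is approximated by a Gaussian distribution
with the following mean and variance:

$$ p_k = E[(g_k)_T] = K^{\eta_k}_{T S^{m,k}} (K^{\eta_k}_{S^{m,k} S^{m,k}})^{-1} \mu_k \tag{19} $$

$$
\begin{aligned}
\mathrm{Var}[(g_k)_T] ={}& K^{\eta_k}_{TT} - K^{\eta_k}_{T S^{m,k}} (K^{\eta_k}_{S^{m,k} S^{m,k}})^{-1} K^{\eta_k}_{S^{m,k} T} \\
&+ K^{\eta_k}_{T S^{m,k}} (K^{\eta_k}_{S^{m,k} S^{m,k}})^{-1} A_k (K^{\eta_k}_{S^{m,k} S^{m,k}})^{-1} K^{\eta_k}_{S^{m,k} T}
\end{aligned}
\tag{20}
$$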
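Equations (19) and (20) can be sketched numerically as follows. This is an illustration under assumptions rather than the paper's implementation: the squared-exponential kernel stands in for $K^{\eta_k}$, `mu` and `A` are placeholder values for $\mu_k$ and $A_k$ instead of quantities fitted from data, and `rbf_kernel`, `predict_signal`, and `jitter` are hypothetical names.

```python
import numpy as np

def rbf_kernel(t, s, lengthscale=0.1, variance=1.0):
    """Squared-exponential kernel (an assumed stand-in for K^{eta_k})."""
    diff = t[:, None] - s[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def predict_signal(T, S, mu, A, jitter=1e-8):
    """Predicted mean (Eq. 19) and covariance (Eq. 20) of the signal at timestamps T."""
    K_SS = rbf_kernel(S, S) + jitter * np.eye(S.size)  # jitter is an added assumption
    K_TS = rbf_kernel(T, S)
    W = np.linalg.solve(K_SS, K_TS.T).T  # W = K_TS K_SS^{-1}, using symmetry of K_SS
    mean = W @ mu
    cov = rbf_kernel(T, T) - W @ K_TS.T + W @ A @ W.T
    return mean, cov

S = np.linspace(0.0, 1.0, 10)
mu = np.sin(2 * np.pi * S)   # placeholder for mu_k from Equation (17)
A = 0.01 * np.eye(S.size)    # placeholder for A_k from Equation (18)
T = np.linspace(0.0, 1.0, 50)
mean, cov = predict_signal(T, S, mu, A)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # pointwise uncertainty band
print(mean.shape, std.shape)
```

The diagonal of the covariance gives a pointwise uncertainty band around the predicted mean signal; setting `A` equal to the prior kernel matrix at `S` recovers the prior covariance at `T`.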

B DATASETS
B.1 Basic Datasets

Twelve publicly available time-series datasets were sourced from the UCR archive [Bagnall et al.,
2017]. These datasets are the following, with their respective IDs from 1 to 12: Chinatown, ECG-
FiveDays, FreezerSmallTrain, GunPointOldVersusYoung, HouseTwenty, InsectEPGRegularTrain,
ItalyPowerDemand, Lightning7, MoteStrain, PowerCons, SonyAIBORobotSurface2, and UWaveG-
estureLibraryAll.

B.2 Pronunciation Audio Dataset

For the Pronunciation Audio dataset, the audio samples were obtained from publicly available
pronunciation recordings [Media, 2022]. The original pronunciation audio files are included in the
folder data/audio, and all processing steps related to these files can be found in the provided Python
file data_processing.py. This file contains the full preprocessing pipeline for converting raw audio
into time-series data, which was used for benchmarking and experimentation.

B.3 Parkinson’s Disease Sensor Dataset

During data processing, all personal identifying information (PII) has been thoroughly removed
from the dataset to ensure privacy and data security. The sensor data has been aggregated to a
per-second level, meaning the original, unaggregated data cannot be recovered, thereby minimizing
any risk of data exposure. The processing steps for this dataset are available in the Python file
parkinson_data_processing.py. This code generates the second-level processed data, which serves
as input for all benchmarking algorithms, including Motion Code.
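As a rough illustration of per-second aggregation, the sketch below averages all readings that fall within the same second. It is a hypothetical example, not the contents of parkinson_data_processing.py: the function name `aggregate_per_second`, the input layout (timestamps in seconds plus one value per reading), and the choice of the mean as the aggregate are all assumptions.

```python
import numpy as np

def aggregate_per_second(timestamps_s, values):
    """Average readings within each whole second (assumed aggregation rule)."""
    seconds = np.floor(timestamps_s).astype(int)
    unique_seconds, inverse = np.unique(seconds, return_inverse=True)
    sums = np.bincount(inverse, weights=values)  # sum of readings per second
    counts = np.bincount(inverse)                # number of readings per second
    return unique_seconds, sums / counts

# Four raw readings; seconds with no readings are simply absent from the output.
raw_times = np.array([0.1, 0.5, 1.2, 3.9])
raw_values = np.array([1.0, 3.0, 5.0, 7.0])
secs, means = aggregate_per_second(raw_times, raw_values)
print(secs, means)
```

Seconds without readings are left out rather than filled in, so the resulting series can have gaps and uneven lengths.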
To access the full original data and labeled datasets, researchers must apply for a separate
license by following the instructions provided at:
https://www.synapse.org/Synapse:syn20825169/wiki/600903.
In addition, we provide general information for both the Pronunciation Audio data and the Parkinson’s
sensor data as processed by us in Table 6 below:

Table 6: Descriptions of 3 Datasets Processed by Authors.

Dataset              Train  Test  Length    Description
Pronunciation Audio  18     18    80-100    Amplitude values of the audio datasets for the pronunciations of 2 words with different accents.
PD setting 1         20     322   257-1665  Parkinson’s disease sensor data focusing on understanding recovery stage.
PD setting 2         24     429   208-1665  Parkinson’s disease sensor data focusing on detecting tremor pattern.

C ADDITIONAL FIGURES
C.1 Interpretable Features

(a) October to March Period (b) April to September Period

Figure 5: Forecasting with Uncertainty and Interpretable Features for ItalyPowerDemand.

(a) Cement (b) Carpet

Figure 6: Interpretable Features for SonyAIBORobotSurface2.

(a) Class 1 (b) Class 2 (c) Class 3
Figure 7: Interpretable Features for InsectEPGRegularTrain.

C.2 Forecasting with Uncertainty

(a) October to March Period (b) April to September Period

Figure 8: Forecasting with Uncertainty on ItalyPowerDemand from 5 Forecasting Algorithms

(a) Warm Season in PowerCons (b) Cold Season in PowerCons

Figure 9: Forecasting with Uncertainty on PowerCons from 5 Forecasting Algorithms
