
Bayesian Optimization of Sample Entropy Hyperparameters for Short Time Series

Zachary Blanks and Donald E. Brown
School of Data Science, University of Virginia, Charlottesville, VA 22903
[email protected], [email protected]

arXiv:2405.06112v1 [stat.AP] 9 May 2024

Abstract
Quantifying the complexity and irregularity of time series data is a primary pursuit across various data-scientific disciplines. Sample entropy (SampEn) is a widely adopted metric for this purpose, but its reliability is sensitive to the choice of its hyperparameters, the embedding dimension (m) and the similarity radius (r), especially for short-duration signals. This paper presents a novel methodology that addresses this challenge. We introduce a Bayesian optimization framework, integrated with a bootstrap-based variance estimator tailored for short signals, to simultaneously and optimally select the values of m and r for reliable SampEn estimation. Through validation on synthetic signal experiments, our approach outperformed existing benchmarks. It achieved a 60 to 90% reduction in relative error for estimating SampEn variance and a 22 to 45% decrease in relative mean squared error for SampEn estimation itself (p ≤ 0.043). Applying our method to publicly available short-signal benchmarks yielded promising results. Unlike existing competitors, our approach was the only one to successfully identify known entropy differences across all signal sets (p ≤ 0.042). Additionally, we introduce “EristroPy,” an open-source Python package that implements our proposed optimization framework for SampEn hyperparameter selection. This work holds potential for applications where accurate estimation of entropy from short-duration signals is paramount.

Keywords Nonlinear Time Series Analysis, Hyperparameter Optimization, Bootstrap Estimation, Short-Duration
Signals

1 Introduction
The ability to quantify the complexity or irregularity inherent in time series data is a primary pursuit across diverse
data-scientific disciplines, including healthcare [1, 2, 3, 4, 5, 6], finance [7, 8], and anomaly detection systems [9, 10,
11, 12, 13]. Sample entropy (SampEn) has emerged as a widely adopted measure for this purpose [14].
SampEn estimates regularity through a process of “template matching,” partitioning the time series into overlapping
segments or “templates” of length determined by the embedding dimension (m). The similarity between templates
is then evaluated based on their proximity within a defined radius (r), typically measured in the L∞ space. A higher
number of “matches” – templates being within the radius r of one another – suggests greater regularity or predictability
in the signal.
However, obtaining reliable SampEn estimates, particularly for short-duration signals, hinges on the careful selection
of these two hyperparameters: m and r [15].
Larger embedding dimensions (m) are preferred to capture more complex time series patterns, but this tends to spread
templates further apart in L∞ space, reducing the number of matches for a fixed radius r [16]. Conversely, smaller
values of r allow more precise characterization of similarities between templates but also diminish the match count.
This dynamic illustrates a fundamental trade-off: while larger m and smaller r are individually beneficial for characterizing signal complexity, their combined effect strains our ability to obtain a sufficient number of template matches required for stable SampEn estimation. Thus, the practical process of estimating SampEn for short-duration signals amounts to making deliberate trade-offs between picking larger values of m and smaller values of r to achieve stable SampEn estimates by ensuring more template matches.

To illustrate this point, we depict a signal with N = 100 observations, computing the median L∞ distance between nearest neighbors among all m-dimensional templates and the total number of template matches for a fixed radius r. The results are shown in Fig. 1 and empirically corroborate our previous claims.

[Figure 1 plot: left panel, median L∞ nearest-neighbor distance for m = 1, 2, 3; right panel, total number of template matches for r = 0.1σ, 0.15σ, 0.2σ, 0.25σ.]

Figure 1: The interdependence of embedding dimension (m) and distance radius (r) illustrates a fundamental trade-off: as m increases, templates diverge in L∞ space, reducing the number of matches for a fixed r, making it harder to obtain stable SampEn estimates. The multiplier σ is set to one and denotes the signal standard deviation.
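To make the trade-off concrete, the short Python sketch below (our own illustration, not EristroPy code; helper names such as `embed` and `template_statistics` are ours) computes, for a white noise signal with N = 100, the median L∞ nearest-neighbor distance between templates and the number of template matches at a fixed radius, mirroring the quantities plotted in Fig. 1.

```python
import numpy as np

def embed(x, m):
    """Stack the overlapping length-m templates of x into an (N - m + 1, m) array."""
    N = len(x)
    return np.stack([x[i:i + m] for i in range(N - m + 1)])

def template_statistics(x, m, r):
    """Median L-infinity nearest-neighbor distance and match count at radius r."""
    T = embed(x, m)
    # Pairwise Chebyshev (L-infinity) distances between all templates.
    d = np.max(np.abs(T[:, None, :] - T[None, :, :]), axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-comparisons
    median_nn = np.median(d.min(axis=1))     # median nearest-neighbor distance
    n_matches = int(np.sum(d <= r)) // 2     # each unordered template pair counted once
    return median_nn, n_matches

rng = np.random.default_rng(0)
x = rng.standard_normal(100)                 # N = 100, sigma = 1
for m in (1, 2, 3):
    nn, matches = template_statistics(x, m, r=0.2)
    print(f"m={m}: median L-inf NN distance = {nn:.3f}, matches at r = 0.2*sigma: {matches}")
```

Running this on a few draws shows the pattern in Fig. 1: the nearest-neighbor distance grows with m while the match count falls as r shrinks.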
Existing approaches for selecting SampEn hyperparameters share a common limitation: the requirement to predetermine the embedding dimension m, and then pick the similarity radius r [3, 17]. This strategy, however, neglects the interdependence between the hyperparameters. Furthermore, quantifying the variance of SampEn estimates obtained from short-duration signals, a necessary condition for appropriate hyperparameter selection, is particularly challenging due to the intra-correlated and non-linear construction of SampEn. Addressing these limitations is crucial for applications where accurate assessments of signal regularity are paramount, such as real-time clinical monitoring from short-duration physiological signals [18, 19, 20].
The primary objective of this paper is to address the challenge of selecting m and r to achieve reliable SampEn
estimates. We present two main contributions to advance the field of SampEn estimation for short-duration signals:

1. A novel bootstrap-based, non-parametric estimator for SampEn variance, tailored for short signals, and a
Bayesian optimization (BO) framework for the task of concurrently selecting optimal (m, r) combinations.
2. An open-source Python package, “EristroPy” (https://zblanks.github.io/eristropy/), implementing our proposed hyperparameter selection algorithm, validated against existing benchmarks across diverse, publicly available signal datasets encompassing multiple signal types and measurement modalities.

Through synthetic experiments and real-world case studies, we demonstrate the superiority of our approach in estimating SampEn variance and automating parameter selection over existing methods, consistently identifying known
entropy differences between signal classes. Our work addresses a long-standing challenge in SampEn estimation,
enabling more accurate and reliable assessments of signal complexity.

2 Sample Entropy Overview


2.1 Sample Entropy Computation

SampEn is a measure used to estimate the complexity or regularity of time series data. Developed by Richman and
Moorman [14], it builds upon Pincus’s ApEn [21]. Like related measures (e.g., Eckmann-Ruelle entropy [22], ApEn,
etc.), estimation of SampEn focuses on assessing regularity via “template matching.”


Formally, let x ∈ R^N be a time series signal of length N. A template of length m (known as the “embedding dimension”) is denoted by x_m^(i) = (x_i, . . . , x_{i+m−1}). Implicitly, this definition of a template considers only consecutive data points from the signal. This does not have to be the case in general (formally captured through the time delay parameter, τ ∈ Z+), but we fix τ = 1 to ensure as large a sample size as possible, a common choice in the SampEn literature [15, 23].

We compare these templates and consider them a match if the distance between them is less than a defined radius, r > 0. Specifically, a match occurs when ‖x_m^(i) − x_m^(j)‖_∞ ≤ r (i ≠ j), with ‖·‖_∞ denoting the maximum absolute difference between the elements of two templates. Unlike ApEn, SampEn does not consider self-matches – i.e., ‖x_m^(i) − x_m^(i)‖_∞ = 0.

To calculate signal regularity, we compute two quantities, B^m(r) and A^m(r). B^m(r), defined as:

$$B^m(r) = \frac{1}{Z(N, m)} \sum_{i=1}^{N-m} \sum_{\substack{j=1 \\ j \neq i}}^{N-m} \mathbb{1}\left[ \left\lVert x_m^{(i)} - x_m^{(j)} \right\rVert_\infty \leq r \right], \qquad (1)$$

is the probability, in a frequentist sense, of the signal remaining within a radius, r, for m points. Similarly, A^m(r), given by:

$$A^m(r) = \frac{1}{Z(N, m)} \sum_{i=1}^{N-m} \sum_{\substack{j=1 \\ j \neq i}}^{N-m} \mathbb{1}\left[ \left\lVert x_{m+1}^{(i)} - x_{m+1}^{(j)} \right\rVert_\infty \leq r \right], \qquad (2)$$

is the probability that the signal stays within the same radius for an additional step (m + 1 points in total). In Eqns. (1) and (2), 1[·] denotes the indicator function and Z(N, m) is a normalization constant to ensure valid probabilities and like-to-like comparisons between templates of size m and m + 1. For both B^m(r) and A^m(r) with fixed hyperparameters, (m, r), their estimates become more accurate as the signal length, N, grows larger.
The ratio of these quantities, A^m(r)/B^m(r), is the conditional probability (CP) that a sequence will remain within radius r for m + 1 steps, given it has stayed within r for m steps. For fixed hyperparameters (m, r), the SampEn is defined as:

$$\text{SampEn}(m, r) = \lim_{N \to \infty} -\log\left( \frac{A^m(r)}{B^m(r)} \right). \qquad (3)$$

However, we of course do not have access to infinitely long signals and thus the best finite data estimation of SampEn for x is given by:

$$\text{SampEn}(x, m, r) = -\log\left( \frac{A^m(r)}{B^m(r)} \right). \qquad (4)$$

If no matches are found at radius r across the signal (i.e., B^m(r) = 0), SampEn(x, m, r) is undefined. Alternatively, if B^m(r) > 0 but A^m(r) = 0, then SampEn(x, m, r) = ∞.
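For readers who prefer code to notation, the following minimal NumPy sketch of Eqns. (1)–(4) may be helpful. It is our own illustrative implementation rather than the EristroPy routine; both B^m(r) and A^m(r) are computed over the first N − m templates so that the normalization constants cancel in the ratio.

```python
import numpy as np

def sample_entropy(x, m, r):
    """Estimate SampEn(x, m, r) following Eqns. (1)-(4).

    Returns np.nan when B^m(r) = 0 (SampEn undefined) and np.inf when
    B^m(r) > 0 but A^m(r) = 0, mirroring the conventions in the text.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)

    def count_matches(emb_dim):
        # Compare only the first N - m templates so the counts are like-to-like.
        templates = np.stack([x[i:i + emb_dim] for i in range(N - m)])
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=-1)
        np.fill_diagonal(d, np.inf)          # SampEn excludes self-matches
        return np.sum(d <= r)                # ordered pairs (i, j), i != j

    B = count_matches(m)
    A = count_matches(m + 1)
    if B == 0:
        return np.nan
    if A == 0:
        return np.inf
    return -np.log(A / B)                    # normalization constants cancel in the ratio

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
x = (x - x.mean()) / x.std()                 # zero mean, unit variance, as assumed later in Section 3.1
print(sample_entropy(x, m=2, r=0.2))         # white noise yields a relatively high SampEn
```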

2.2 Existing Approaches to Picking SampEn Hyperparameters

Accurate and stable estimation of SampEn depends heavily on selecting appropriate values for the hyperparameters
(m, r). These choices significantly influence the computed SampEn value and, consequently, the interpretation of the
data [24, 25]. The prevalent guideline suggests setting m = 2 and r ∈ [0.10, 0.25] × σ, where σ is the signal’s standard deviation, with (m = 2, r = 0.20σ) often adopted as a standard [23]. This protocol, originally intended for ApEn in datasets
with N = 1000 observations [26, 21], has been empirically extended to smaller datasets, given SampEn’s relative
insensitivity to signal length [23, 27]. Nonetheless, its applicability is questionable for very short time series (about
N ≤ 200 observations) [15]. Recent studies in areas such as ultra-short heart rate variability [18] and myoelectric
sensor signals in prosthetics [19] highlight these challenges.
One approach to circumvent the parameter selection challenge in SampEn estimation involves integrating it into a
supervised machine learning classification framework, where SampEn serves as an input feature. Researchers have

employed methods like grid-search over the parameters m and r, in conjunction with various cross-validation techniques, to determine optimal parameter settings for the given data and classification models [28, 29, 30, 31].
Nevertheless, this classification-focused methodology may not always be suitable. Often, the primary objective is
to analyze signal complexity to gain insights into the dynamics of the physical process, rather than to differentiate
between distinct classes. In such analyses, selecting SampEn parameters thoughtfully is crucial to accurately reflect
the irregularity of the underlying system, rather than optimizing for classification accuracy. This paper will primarily
focus on this aspect of SampEn application.
In contexts other than supervised learning, selecting appropriate parameters for SampEn remains an open challenge,
particularly for the similarity radius parameter (r). Researchers have generally explored two main approaches outside
the standard settings of (m = 2, r = 0.20σ) for SampEn parameterization: optimization-based and convergence-based
methods.
A key contribution in the optimization-based category, and a primary benchmark for this paper, is the method proposed by Lake et al. [3]. This approach involves selecting the radius that optimizes a SampEn efficiency criterion (SampEnEff) for a predetermined embedding dimension (m), usually fixed based on prior knowledge or an autoregressive analysis that uses the optimal time-order lag, determined via the Bayesian information criterion, as a proxy for the embedding dimension [3, 32]. The objective function seeks to minimize the variability in SampEn estimates across different radius values:

$$\min_{r \in R} \; \max\left( \frac{\sigma_{CP}}{CP}, \; \frac{\sigma_{CP}}{-\log(CP)\, CP} \right). \qquad (5)$$
In Eq. (5), R denotes the set of feasible radius values, with CP representing the conditional probability from the SampEn calculation (the ratio A^m(r)/B^m(r)) and σ_CP indicating the standard deviation of these estimates. The SampEnEff metric favors entropy estimates with lower variance. Their methodology was validated on neonatal heart rate data signals containing N = 4096 observations.

However, this approach may not be as effective with shorter time series. The assumptions underlying Eq. (5) that connect σ_CP to the standard error (SE) of SampEn require linearity in parameters and a Gaussian error distribution for the CP estimate (predicated on the Central Limit Theorem). These assumptions might not be applicable to shorter signals due to the limited template match sample size. Consequently, the effectiveness of σ_CP/CP as an approximation for the SE of SampEn in such cases is questionable.
Conversely, convergence-based strategies for SampEn parameter selection, originally introduced by Ramdani et al.
[17] and further developed or applied by Kedadouche et al. [33] for SampEn and Mengarelli et al. [34] for fuzzy
entropy [35], have gained traction recently. This rise in popularity may be attributed to their conceptual and computational simplicity. These methods, with the embedding dimension m held constant, aim to identify a radius at which the SampEn estimates or their variance reach a stable point. The selection of r is typically conducted through visual analysis of a convergence plot, effectively seeking the “elbow” or “knee” point where increases in radius yield diminishing returns in reducing the variance of estimates. However, we formalize this “knee”-finding approach using
the algorithm detailed by Satopaa et al. [36].
The efficacy of convergence-based approaches hinges on accurately determining the variance of SampEn estimates.
In some scenarios, such as in the study by Mengarelli et al. [34], simulating input signals simplifies the process of
estimating variance. However, it is not universally feasible to model data as derivatives of simple synthetic signals
[25]. In these instances, either longer signals are needed to informally gauge variance, or one can employ the SampEn
variance estimator proposed by Lake et al. [3] as a more formal approach.

3 Methods
3.1 Mathematical Notations

We start by defining notations.


• x ∈ R^N is a time series signal containing N observations. In general, we assume that x has zero mean and unit variance.
• S = {x_1, . . . , x_n} is a collection of n ∈ Z+ signals where the i-th signal, x_i ∈ R^{N_i}, contains N_i observations. We similarly assume that all the signals in S have zero mean and unit variance.
• m ∈ Z+ is the SampEn embedding dimension.
• r ∈ R+ is the SampEn similarity radius.
• θ̂(m, r) denotes the SampEn estimate of x for a fixed (m, r) combination (see Eq. (4)).
• B^m(r) and A^m(r) are probabilities of the signal, x, staying within r for m and m + 1 points, respectively (see Eqns. (1) and (2)).
• CP = A^m(r)/B^m(r) is the conditional probability that the signal, x, stays within r for m + 1 points given that it has remained within r for m points.
• q ∈ (0, 1) is the stationary bootstrap success probability characterizing the variable block sizes (b ∼ Geom(q)) (see Section 3.2).
• x_b ∈ R^N is a bootstrapped signal of x constructed via Algorithm 1 given q.
• {x_b}_{b=1}^B is the collection of B ∈ Z+ bootstrapped signals given q.
• θ̂_b(m, r) is the SampEn estimate obtained from the bootstrapped signal, x_b, given (m, r, q).
• {θ̂_b(m, r)}_{b=1}^B is the collection of B bootstrapped SampEn estimates obtained from the bootstrapped signal set, {x_b}_{b=1}^B, given (m, r, q).
• λ ≥ 0 is a regularization control hyperparameter (see Eq. (9)).
• Ψ_1 ⊆ Z+ and Ψ_2, Ψ_3 ⊆ (0, 1) represent the domains of m, r, and q, respectively.
• Ψ = Ψ_1 × Ψ_2 × Ψ_3 is the domain of the decision variables, ψ.
• ψ = (m, r, q) ∈ Ψ are the optimization decision variables.
• f(ψ) = MSE(θ̂(m, r)) + λ√r is the objective function for a single signal, x (see Eqns. (8) and (9)).
• f̃(ψ) is the modified objective function for a signal set, S (see Eq. (12)).
• y_t = f(ψ_t) + ϵ_t represents the t-th observation of the objective function, f : Ψ → R+, possibly observed with noise, ϵ_t.
• D = {(ψ_t, y_t)}_{t=1}^T is the set of observations.
• D^(l) and D^(g) are the sets of observations derived from D containing the better-performing and worse-performing objective function evaluations, respectively.
• γ ∈ (0, 1] is the top quantile used to construct D^(l).
• y^γ ∈ R+ is the top-γ objective value in D.
• p(ψ | D^(l)) and p(ψ | D^(g)) are the probability density functions (PDFs) of the better-performing and worse-performing groups, respectively.

3.2 Bootstrap-Based Variance Estimation for SampEn

Given a short-duration signal x ∈ RN (e.g., N ≤ 200), our aim is to select SampEn hyperparameters (m, r) to ensure
we obtain a stable estimate of SampEn(x, m, r). For notational brevity, let θ̂(m, r) denote the SampEn estimate of x
for fixed (m, r).
To ensure SampEn estimate stability, we require an accurate quantification of variance. We propose a bootstrap-based
approach for this purpose. The bootstrap, introduced by Efron [37], generates empirical distributions of arbitrary
statistics by resampling data. For SampEn, we resample x, yielding a new signal, x_b ∈ R^N, which we use to calculate the bootstrapped SampEn estimate θ̂_b(m, r). This process is repeated B ∈ Z+ times, giving the set {θ̂_b(m, r)}_{b=1}^B.
Standard bootstrapping, however, assumes samples from the data are independent, an inappropriate assumption for
time series data. We address this issue by using the stationary bootstrap [38]. The main intuition behind this algorithm
is that adjacent elements of the signal – e.g., (xi , . . . xi+b−1 ) – should be treated as connected “blocks.” We then build
the new bootstrapped signal, xb , by stitching randomly selected blocks together. Algorithm 1 details this method.
There are two noteworthy points. First, the stationary bootstrap uses variable block sizes (b ∼ Geom(q) for q ∈ (0, 1)) to mitigate overall sensitivity to block size [39]. Second, to handle the stop index, t_1, exceeding the signal length, we wrap indices around. That is, if t_1 = N + a where a > 0, then the elements we use to construct the data block are [x_{t_0}, . . . , x_N, x_1, . . . , x_a].


Algorithm 1 Stationary Bootstrap Signal Generation

Require: x ∈ R^N, N ∈ Z+, q ∈ (0, 1)
1: B ← [ ]  ▷ Placeholder for data blocks from x
2: l ← 1
3: while l ≤ N do
4:   t_0 ← U(1, . . . , N)
5:   b ← Geom(q)
6:   t_1 ← t_0 + b − 1  ▷ Handle t_1 > N by “wrapping” the index around
7:   if l + b > N then  ▷ We cannot construct bootstrapped signals greater than N observations
8:     b̃ ← b − (l + b − N)
9:     t_1 ← t_0 + b̃ − 1
10:   end if
11:   B ← append([x_{t_0}, . . . , x_{t_1}])
12:   l ← l + b
13: end while
14: x_b ← concatenate the blocks in B
15: return x_b

With a fixed q ∈ (0, 1) and bootstrapped SampEn estimates {θ̂_b(m, r)}_{b=1}^B, we compute the variance of the SampEn estimate as:

$$\mathbb{V}\left(\hat{\theta}(m, r)\right) \approx \frac{1}{B} \sum_{b=1}^{B} \left( \hat{\theta}_b(m, r) - \bar{\theta}(m, r) \right)^2, \qquad (6)$$

where θ̄(m, r) is the mean of the bootstrapped SampEn estimates.
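A compact Python sketch of Algorithm 1 and the variance estimator in Eq. (6) follows. It is our own illustration rather than the EristroPy implementation: `sample_entropy` refers to the earlier sketch, and non-finite bootstrap replicates are simply dropped, a handling choice the paper does not prescribe.

```python
import numpy as np

def stationary_bootstrap(x, q, rng):
    """Generate one stationary-bootstrap replicate of x (Algorithm 1)."""
    N = len(x)
    blocks, total = [], 0
    while total < N:
        t0 = rng.integers(N)                 # uniform random start index
        b = rng.geometric(q)                 # Geom(q) block length
        b = min(b, N - total)                # truncate so the replicate has exactly N points
        idx = (t0 + np.arange(b)) % N        # "wrap" indices past the end of the signal
        blocks.append(x[idx])
        total += b
    return np.concatenate(blocks)

def bootstrap_sampen_variance(x, m, r, q, B=100, seed=0):
    """Bootstrap SampEn estimates and their variance (Eq. (6))."""
    rng = np.random.default_rng(seed)
    reps = np.array([
        sample_entropy(stationary_bootstrap(x, q, rng), m, r) for _ in range(B)
    ])
    reps = reps[np.isfinite(reps)]           # drop undefined/infinite replicates (our choice)
    return reps, reps.var()                  # np.var divides by the replicate count, as in Eq. (6)

rng = np.random.default_rng(2)
x = rng.standard_normal(100)
x = (x - x.mean()) / x.std()
reps, var_hat = bootstrap_sampen_variance(x, m=1, r=0.2, q=0.9)
print(f"bootstrap variance estimate: {var_hat:.4f} from {len(reps)} valid replicates")
```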

3.3 Minimizing the SampEn Estimate Mean Squared Error


Recall that {θ̂_b(m, r)}_{b=1}^B is the set of B SampEn estimates obtained from the stationary bootstrap replicates of x for a fixed q ∈ (0, 1) and SampEn hyperparameters (m, r).
We seek a stable SampEn estimate of x by manipulating the hyperparameters (m, r, q), aiming for low bias – the
systematic deviation of the bootstrap estimates from the original SampEn estimate, θ̂(m, r) – and low variance –
sensitivity of the bootstrap estimates to minor signal variations.
The bias of θ̂(m, r) is approximated by the expression:

$$\text{Bias}\left(\hat{\theta}(m, r)\right) \approx \left( \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_b(m, r) \right) - \hat{\theta}(m, r). \qquad (7)$$

Conveniently, the mean squared error (MSE) of a statistic – a common loss function – corresponds to the squared bias plus the variance of that statistic. Thus, the approximate MSE of θ̂(m, r) is given by:

$$\text{MSE}\left(\hat{\theta}(m, r)\right) = \text{Bias}\left(\hat{\theta}(m, r)\right)^2 + \mathbb{V}\left(\hat{\theta}(m, r)\right) \approx \frac{1}{B} \sum_{b=1}^{B} \left( \hat{\theta}(m, r) - \hat{\theta}_b(m, r) \right)^2. \qquad (8)$$

To then select the optimal values of (m, r, q) which minimize the approximate MSE of θ̂(m, r), we solve:

$$\begin{aligned}
\underset{m,\, r,\, q}{\text{minimize}} \quad & \text{MSE}\left(\hat{\theta}(m, r)\right) + \lambda \sqrt{r} \\
\text{subject to} \quad & m \in \mathbb{Z}^+, \quad r, q \in (0, 1).
\end{aligned} \qquad (9)$$

We avoid trivial solutions of Eq. (9) where r → ∞ by introducing the regularization function Ω(r) = λ√r and constraining r ∈ (0, 1), where λ ≥ 0 acts as the regularization parameter; we assume a normalized signal with zero mean and unit variance. Values of r > σ, where σ is the empirical standard deviation of x, are likely not a meaningful characterization of similarity for short-duration signals and would notably diverge from standard recommendations [23]. Furthermore, over (0, 1), √r provides a greater degree of regularization than the more canonical L1 regularization, |r|, and the squared L2 regularization, r².
Problem (9) is difficult to solve because the objective function is non-convex and we do not have access to its gradients
due to our proposed bootstrapping procedure.
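Combining Eqns. (8) and (9), the single-signal objective f(ψ) can be written as the short helper below. This is our own sketch: `sample_entropy` and `stationary_bootstrap` are the earlier illustrative functions, and returning infinity for invalid (m, r) choices is our simplification rather than a documented EristroPy behavior.

```python
import numpy as np

def objective(x, m, r, q, lam, B=100, seed=0):
    """Regularized bootstrap MSE of the SampEn estimate: f(psi) = MSE + lam * sqrt(r)."""
    rng = np.random.default_rng(seed)
    theta_hat = sample_entropy(x, m, r)
    if not np.isfinite(theta_hat):
        return np.inf                        # this (m, r) yields no valid SampEn estimate
    boot = np.array([
        sample_entropy(stationary_bootstrap(x, q, rng), m, r) for _ in range(B)
    ])
    boot = boot[np.isfinite(boot)]
    if boot.size == 0:
        return np.inf
    mse = np.mean((theta_hat - boot) ** 2)   # squared bias plus variance, Eq. (8)
    return mse + lam * np.sqrt(r)            # sqrt(r) regularizer from Eq. (9)
```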

3.4 BO-driven SampEn Hyperparameter Search

Consider the problem min_{ψ∈Ψ} f(ψ), where ψ represents the decision variables, Ψ is their constraint set or domain, and f is a continuous, non-convex, “black-box” objective function. In our case, this problem corresponds to selecting optimal SampEn hyperparameters to achieve stable estimates.
Bayesian optimization (BO) is well-suited for such problems, finding locally optimal solutions efficiently [40]. We
employ the Tree-structured Parzen Estimator (TPE) method [41], known for its effectiveness in similar hyperparameter
optimization tasks [42].
BO iteratively explores the decision space, Ψ, intelligently selecting new points for evaluation based on previous
observations and uncertainties about unexplored regions.
In the TPE method, shown schematically for a simplified example focusing on the similarity radius, r, in Fig. 2, we
partition past objective function evaluations into better and worse-performing sets using a threshold cutoff, y^γ ∈ R+.
Then, we construct probability densities for these sets and select the next evaluation point as the argmax of the ratio
of these densities.
To apply the TPE method to the SampEn hyperparameter selection problem, we start by considering how to select the
(T + 1)-th evaluation point given T previous objective function evaluations. We summarize this process in Algorithm
2.

Algorithm 2 TPE New Point Selection Process

Require: T previous objective function evaluations, D = {(ψ_t, y_t)}_{t=1}^T, S ∈ Z+
1: Compute top-quantile cut-off, γ = Γ(T)  ▷ See Appendix A.1
2: Sort D by y_t such that y_1 ≤ y_2 ≤ · · · ≤ y_T
3: Compute T^(l) = min(⌈γT⌉, 25)
4: Construct D^(l) = {(ψ_t, y_t)}_{t=1}^{T^(l)} and D^(g) = {(ψ_t, y_t)}_{t=T^(l)+1}^{T}
5: Compute the mixture weights {w_t}_{t=0}^{T+1}  ▷ See Appendix A.3
6: Build p(ψ | D^(l)) and p(ψ | D^(g))  ▷ See Appendices A.2 and A.4
7: Sample S ≡ {ψ_s}_{s=1}^{S} ∼ p(ψ | D^(l))
8: ψ_{T+1} ← argmax_{ψ∈S} p(ψ | D^(l)) / p(ψ | D^(g))
9: return ψ_{T+1}

Suppose we have T observations obtained by evaluating f as D = {(ψ_t, y_t)}_{t=1}^T and assume these observations are sorted by y_t such that y_1 ≤ y_2 ≤ · · · ≤ y_T. Let γ ∈ (0, 1] be the top quantile of the observations, and y^γ be the top-γ objective value in D. We partition D into a better-performing set, D^(l), and a worse-performing set, D^(g), where D^(l) = {(ψ_t, y_t)}_{t=1}^{T^(l)} with T^(l) = min(⌈γT⌉, 25), and D^(g) contains the remaining observations.
Under the TPE method, the surrogate model is the conditional probability density function:

$$p(\psi \mid y, D) = \begin{cases} p\left(\psi \mid D^{(l)}\right), & \text{if } y \leq y^\gamma \\ p\left(\psi \mid D^{(g)}\right), & \text{if } y > y^\gamma \end{cases} \qquad (10)$$

[Figure 2 plot: top panel, objective evaluations y = f(r) split at the threshold y = y^γ; bottom panel, densities p(r | D^(l)) and p(r | D^(g)) over the similarity radius, with the next evaluation point r_{t+1} chosen as the argmax of their ratio (≈ 0.131 in this example).]

Figure 2: Simplified depiction of the Tree-structured Parzen Estimator (TPE) process for choosing a new evaluation point during the optimization procedure. (Top): We partition the objective function evaluations into better and worse-performing sets using a threshold cutoff, y ≤ y^γ. (Bottom): We construct the probability densities for the better and worse-performing sets, then select the new evaluation point as the argmax of the ratio of these densities.

and the acquisition function to select new evaluation points is defined as:

$$\psi_{T+1} := \operatorname*{argmax}_{\psi \in \Psi} \; \frac{p\left(\psi \mid D^{(l)}\right)}{p\left(\psi \mid D^{(g)}\right)}. \qquad (11)$$

The TPE method diverges from standard BO by modeling how decision variables, ψ, are affected by function evaluations, rather than modeling the objective function itself [41]. This strategy uses the intuition that promising evaluation
points are likely found within regions containing higher-performing observations. We construct the PDFs, p(ψ | D(l) )
and p(ψ | D(g) ) using a mixture of truncated Gaussian kernel density estimators (KDEs).
The surrogate model involves several hyperparameter choices, including γ, the kernel estimator, k, the bandwidth parameter, b, and the mixture weights, {w_t ∈ [0, 1]}_{t=0}^{T+1}. We adopt default values from the “Optuna” optimization framework [43], based on prior research (e.g., [41, 44, 42, 45]). Further details about these hyperparameters are provided in Appendix A.

3.5 SampEn Hyperparameter Optimization Algorithm

We present Algorithm 3 to outline our approach for finding a locally optimal solution to the SampEn hyperparameter
selection problem for the signal, x ∈ RN . The decision variables ψ = (m, r, q) ∈ Ψ, representing the SampEn
hyperparameters (m, r) and the stationary bootstrap parameter, q, are optimized with respect to the objective function,
f (ψ) (defined in Section 3.1).

Algorithm 3 SampEn Hyperparameter Selection Optimization

Require: x ∈ R^N, B ∈ Z+, T̃ = number of BO trials, λ ≥ 0, T_init ∈ Z+
1: D ← ∅  ▷ Empty observation set
2: for t = 1, . . . , T_init do
3:   Select ψ_t randomly
4:   Compute the SampEn estimate of x, θ̂(m_t, r_t)
5:   Generate bootstrap replicates of x, {x_b}_{b=1}^B, given q_t  ▷ See Algorithm 1
6:   Compute the bootstrap SampEn estimates, {θ̂_b(m_t, r_t)}_{b=1}^B
7:   y_t = f(ψ_t) = MSE(θ̂(m_t, r_t)) + λ√r_t
8:   D ← D ∪ (ψ_t, y_t)
9: end for
10: while t ≤ T̃ do
11:   Select ψ_{t+1} using the TPE acquisition process  ▷ See Algorithm 2
12:   Compute the SampEn estimate, θ̂(m_{t+1}, r_{t+1})
13:   Generate bootstrap replicates of x, {x_b}_{b=1}^B, given q_{t+1}  ▷ See Algorithm 1
14:   Compute the bootstrap SampEn estimates, {θ̂_b(m_{t+1}, r_{t+1})}_{b=1}^B
15:   y_{t+1} = f(ψ_{t+1}) = MSE(θ̂(m_{t+1}, r_{t+1})) + λ√r_{t+1}
16:   D ← D ∪ (ψ_{t+1}, y_{t+1})
17: end while
18: y* ← min(y_1, . . . , y_{T̃})
19: ψ* ← argmin(y_1, . . . , y_{T̃})
20: return ψ*, y*

One key aspect of Algorithm 3 is the random initialization phase, where we start the search process using T_init ∈ Z+ random trials. Throughout this manuscript, we adopt the default value provided by Optuna [43], T_init = 10. Following the initial T_init evaluations, we transition to employing the TPE method, as explained in Algorithm 2, to iteratively explore the decision space, Ψ, and select new evaluation points.
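Because Algorithm 3 follows a standard TPE loop and the paper adopts Optuna's defaults, the whole procedure can be prototyped with off-the-shelf Optuna as below. This is our own approximation of the routine (EristroPy's actual interface may differ); `objective` is the illustrative helper from Section 3.3, and the search ranges for r and q are assumptions.

```python
import numpy as np
import optuna

rng = np.random.default_rng(3)
x = rng.standard_normal(100)
x = (x - x.mean()) / x.std()                             # normalized short signal

def bo_objective(trial):
    m = trial.suggest_int("m", 1, 3)                     # embedding dimension
    r = trial.suggest_float("r", 0.01, 0.99)             # similarity radius (normalized signal)
    q = trial.suggest_float("q", 0.01, 0.99)             # stationary bootstrap success probability
    return objective(x, m, r, q, lam=1/3, B=100)         # regularized bootstrap MSE, Eq. (9)

# n_startup_trials=10 random trials corresponds to the T_init = 10 default noted above.
sampler = optuna.samplers.TPESampler(n_startup_trials=10, seed=0)
study = optuna.create_study(direction="minimize", sampler=sampler)
study.optimize(bo_objective, n_trials=100)               # roughly T-tilde = 100 BO trials
print(study.best_params, study.best_value)
```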


3.6 Optimizing SampEn Hyperparameters for Multi-signal Analysis

In SampEn-based analyses involving multiple signals, it is customary to select a fixed (m, r) combination for all
signals to ensure consistent comparisons. While our algorithm thus far has focused on optimizing (m, r) for a single
signal x ∈ RN , extending it to accommodate a signal set S = {x1 , . . . , xn } with n ∈ Z+ signals is straightforward.
Algorithm 4 in Appendix B outlines this extension. For each signal, x_i, we compute the SampEn estimate, θ̂_i(m, r), generate B bootstrap replicates, {x_{i,b}}_{b=1}^B, and calculate the bootstrapped SampEn estimates, {θ̂_{i,b}(m, r)}_{b=1}^B. Using these, we estimate the SampEn MSE of x_i given (m, r, q). The modified objective function, f̃(ψ), accounts for the n signals and is defined as:

$$\tilde{f}(\psi) = \left( \frac{1}{n} \sum_{i=1}^{n} \text{MSE}\left(\hat{\theta}_i(m, r)\right) \right) + \lambda \sqrt{r}. \qquad (12)$$

Under this scheme, the process to select new evaluation points using the TPE method remains unchanged. We simply
use the mean of the regularized MSE estimates across the signal set, S, to select a unified value of (m, r).
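A minimal sketch of Eq. (12), reusing the single-signal `objective` helper from above (our own construction, not EristroPy's API):

```python
import numpy as np

def multi_signal_objective(signals, m, r, q, lam, B=100):
    """Eq. (12): mean regularized SampEn MSE across the signal set S = {x_1, ..., x_n}.

    Each call to `objective` already adds lam * sqrt(r), so averaging the per-signal
    values equals (1/n) * sum_i MSE_i + lam * sqrt(r), i.e. f-tilde(psi).
    """
    return float(np.mean([objective(x_i, m, r, q, lam, B=B) for x_i in signals]))
```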

4 Results
All computational experiments were performed with a MacBook Air equipped with an Apple M1 chip with 16GB of RAM running macOS Sonoma 14.4.1, using Python 3.11. We developed a Python package, “EristroPy,” which provides a complete implementation of our proposed SampEn hyperparameter selection strategy. Detailed documentation for “EristroPy” can be found at https://zblanks.github.io/eristropy.

4.1 Synthetic Signal Experiments

We assessed the performance of our proposed SampEn hyperparameter optimization algorithm using synthetic white
noise and autoregressive order one (AR(1)) signals. Our evaluation focused on the following aspects:

• The accuracy of our proposed SampEn variance estimator (Eq. 6) compared to the estimator by Lake et al.
[3].
• The impact of the regularization parameter λ on the optimization procedure.
• The search behavior of the BO algorithm within the (m, r, q) decision space.
• The influence of the stationary bootstrap success probability q.
• The overall SampEn hyperparameter selection performance relative to existing benchmarks.
• The computational efficiency of the proposed method.

White noise is a series of independent, identically distributed random variables: xt ∼ N (0, σ 2 ) for σ > 0. AR(1)
processes are stochastic models where each value in the series is linearly dependent on the previous value plus a
random noise term: xt = ϕxt−1 + ϵt where ϵt ∼ N (0, σ 2 ) for σ > 0 and ϕ ∈ R. The AR(1) signal class, thus,
represents a signal containing both stochastic and deterministic components.
For our experiments, we set σ = 1 for white noise and ϕ = 0.9 with σ = 0.1 for AR(1) processes. We ensured
statistical signal stationarity according to the Augmented Dickey-Fuller (ADF) test [46] by applying a burn-in period
of 500 samples for AR(1) processes.
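The synthetic signals used here take only a few lines of NumPy to generate; the sketch below (our own, with the 500-sample burn-in mentioned above) also reports the ADF p-value used to screen for stationarity.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def white_noise(N, sigma=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    return rng.normal(0.0, sigma, size=N)

def ar1(N, phi=0.9, sigma=0.1, burn_in=500, rng=None):
    """AR(1): x_t = phi * x_{t-1} + eps_t, discarding an initial burn-in period."""
    rng = rng if rng is not None else np.random.default_rng()
    eps = rng.normal(0.0, sigma, size=N + burn_in)
    x = np.zeros(N + burn_in)
    for t in range(1, N + burn_in):
        x[t] = phi * x[t - 1] + eps[t]
    return x[burn_in:]

rng = np.random.default_rng(4)
x = ar1(100, rng=rng)
adf_p = adfuller(x)[1]                       # p-value of the Augmented Dickey-Fuller test
print(f"ADF p-value: {adf_p:.4f} (small values reject the unit-root null, i.e. stationary)")
```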

4.1.1 Evaluating SampEn Variance Estimators for Short Signals


We hypothesized that the SampEn variance estimator proposed by Lake et al. [3], relying on template match counting
and uncertainty propagation, might not be optimal for short signal settings. To test this, we compared our bootstrap-
based estimator with Lake et al.’s estimator across white noise and AR(1) signal classes, varying the signal length N
and similarity radius r, with fixed m = 1 and different q values for each signal type. We chose q = 0.5 for the AR(1)
signal class to yield an expected block size of b = 2 aligning with its time-order dependence, and q = 0.9 for Gaussian
white noise, accounting for its lack of temporal dependence.
The mean percent difference in SampEn variance estimation error, ∆ϵ(r, N ), was assessed for N ∈ {50, 100, 200}
and r ∈ {0.20σ, 0.25σ}. Positive values of ∆ϵ(r, N) indicate our bootstrap-based estimator outperforms Lake et al.’s approach in terms of SampEn variance estimation error. Summary statistics of ∆ϵ(r, N) are presented in Table 1, showing consistent superiority of our proposed method across all signal types, similarity radii, and signal lengths. Refer to Appendix C for experimental and computational specifics.

Table 1: Our proposed bootstrap-based SampEn variance estimator demonstrates statistically lower error rates compared to the “counting”-based approach by Lake et al. [3]. Across signal type, N, and r combinations, we achieve a reduction in SampEn variance estimation mean squared error typically ranging from 60% to 90% relative to the counting approach. The ∆ϵ(r, N) column is the mean posterior estimate for the relative percentage difference in SampEn variance estimation error (see Eq. (26) in Appendix C) at similarity radius r and signal length N.

Signal Type  | N   | r     | ∆ϵ(r, N) | 94% Credible Interval
-------------|-----|-------|----------|----------------------
White Noise  | 50  | 0.20σ | 69.16    | [68.36, 70.08]
White Noise  | 50  | 0.25σ | 79.44    | [78.99, 79.88]
White Noise  | 100 | 0.20σ | 90.15    | [89.95, 90.36]
White Noise  | 100 | 0.25σ | 91.46    | [91.29, 91.64]
White Noise  | 200 | 0.20σ | 94.42    | [94.32, 94.52]
White Noise  | 200 | 0.25σ | 94.93    | [94.84, 95.01]
AR(1)        | 50  | 0.20σ | 64.04    | [63.71, 64.37]
AR(1)        | 50  | 0.25σ | 70.96    | [70.73, 71.20]
AR(1)        | 100 | 0.20σ | 68.49    | [68.25, 68.73]
AR(1)        | 100 | 0.25σ | 69.27    | [69.04, 69.49]
AR(1)        | 200 | 0.20σ | 61.28    | [61.06, 61.53]
AR(1)        | 200 | 0.25σ | 60.78    | [60.53, 61.02]

However, while our method provides a more precise variance estimate, it does not guarantee superior SampEn hyperparameter choices. This aspect is further explored in Section 4.1.5.

4.1.2 Impact of Regularization on BO-driven SampEn Hyperparameter Search



In Eq. (9), we introduced the regularization function Ω(r) = λ√r for λ ≥ 0 to prevent trivial solutions where r → ∞. This regularization is essential because selecting r greater than the maximum L∞ distance between two templates would yield zero MSE and a meaningless characterization of time series complexity.

Two primary considerations arise when applying regularization to the objective function. First, determining the appropriate magnitude of λ. Second, assessing whether the level of regularization applied is sufficient.

To address these questions, we focused on selecting optimal SampEn hyperparameters (m, r) for AR(1) and Gaussian white noise signals, x ∈ R^100, each containing N = 100 observations with fixed values for ϕ and σ. We employed Algorithm 3, setting q = 0.5 for AR(1) signals and q = 0.9 for white noise signals, with B = 100 bootstrap replicates and T̃ = 100 Bayesian optimization (BO) trials. We varied λ across the range {0, 1/1000, 1/100, 1/10, 1/5, 1/3, 1/2, 1, 10}. Fig. 3 illustrates the distribution of r values explored by the BO algorithm during optimization.

Overall, it appears that λ ∈ [1/10, 1/3] encourages the BO algorithm to avoid fixating on r = 1, as observed with λ = 0 and λ = 1/1000, or fixating on the lower bound of r with λ = 1 or λ = 10. Ideally, we aim to select a radius value that yields a search within the recommended range of [0.10, 0.25] × σ [23], while allowing flexibility for achieving better overall SampEn estimate stability.

We offer the following heuristic for selecting λ: if the BO algorithm primarily explores at or near the upper bound of the radius domain, consider increasing λ to encourage tighter radii. Conversely, if the algorithm primarily focuses at or near the lower bound of the radius domain, suggesting that the objective is overly influenced by the regularization term λ√r, consider relaxing λ to promote broader exploration of radius options.

4.1.3 BO Search Dynamics for SampEn Hyperparameter Selection


We assessed the quality of the BO search over the SampEn hyperparameter space by evaluating three properties:

• The ability of the algorithm to find good, locally optimal solutions within a limited number of trials [40].
• The presence of distinct “exploration” and “exploitation” phases during the search process [42].
• The algorithm’s capability to identify the interdependence between (m, r) and make appropriate trade-offs.


[Figure 3 plot: distribution of similarity radius (r) values explored by the BO algorithm for AR(1) (left) and white noise (right) signals, for each tested value of λ.]

Figure 3: λ ∈ [1/10, 1/3] appears to effectively regularize the optimization procedure, preventing fixation at or near r = 1, while also avoiding fixation near the lower bound such as with λ = 1 or λ = 10.

[Figure 4 plot: objective function value (y_t) versus Bayesian optimization trial number (t) with fixed q = 0.50 (left) and with q free (right); the running best value y_t* is overlaid.]

Figure 4: The BO algorithm identified good SampEn hyperparameter solutions in a relatively small number of trials
for AR(1) signals, transitioning from exploration to exploitation. (Left): Without optimizing q, good solutions were
found in fewer than 40 trials. (Right): Introducing q as a decision variable increased complexity, but the algorithm still
achieved a good solution in under 60 trials.

We investigated our proposed BO search process detailed in Algorithm 3 using an AR(1) signal with N = 100 observations, ϕ = 0.9, and σ = 0.1. With T̃ = 100 BO trials, B = 100 bootstrap replicates, and λ = 1/10, we examined two scenarios: one with a fixed value, q = 0.5, and the other allowing the algorithm to determine q*. The results are depicted in Fig. 4.

While introducing q as a decision variable slightly increased the time to reach a good solution by approximately 20 trials, the BO search, in either case, transitioned to a high-performing regime after around 20 trials and shifted to an exploitation phase. Fig. 5, focusing on q = 0.5, illustrates the combinations of (m, r) explored by the algorithm over the computational trials.


[Figure 5 plot: (m, r) combinations evaluated during trials t = 1, . . . , 50 (left) and t = 51, . . . , 100 (right), colored by the objective value y_t = f(m_t, r_t); NaN values mark invalid SampEn estimates.]

Figure 5: The BO algorithm adapts its search strategy from exploration to exploitation while accounting for the
interdependence between (m, r). (Left): In the initial 50 trials, the algorithm explored various (m, r) combinations,
adjusting the radius to obtain valid SampEn estimates. (Right): Subsequently, it focused on m = 1 to explore tighter
similarity radii, effectively balancing the trade-off between larger m values and smaller r values.

The exploration and exploitation phases of the BO procedure are evident. Initially, the algorithm evaluates many
(m, r) combinations to understand the loss surface, whereas later, it primarily focuses on m = 1 while varying
r ∈ [0.10, 0.20]. Moreover, the algorithm accounts for the interdependence between (m, r). During exploration,
it tests larger m values with tighter radii, resulting in invalid SampEn estimates, and thus adjusts the radius to obtain valid estimates. Conversely, setting m = 1 enabled exploration of tighter radii, though extreme values of r led to insufficient
matches for a stable SampEn estimate. Thus, the algorithm balanced the desire for larger m values and smaller r values
to achieve stable SampEn estimates.

4.1.4 Sensitivity of SampEn Hyperparameter Selection to Bootstrap Success Probability

Thus far, we have primarily used fixed values for the stationary bootstrap success probability hyperparameter, q ∈
(0, 1). For white noise signals, we set q = 0.9 while for AR(1) signals, q = 0.5, based on heuristic assumptions
regarding the overall temporal dependence structure of each signal class. However, given the introduction of an
auxiliary variable into the optimization scheme, it becomes important to gauge the sensitivity of the optimization
process to the choice of q.
To investigate this sensitivity, we examined the problem of selecting optimal values of (r, q) for both white noise and AR(1) signals with N = 100. For AR(1) signals we set λ = 1/10 and for white noise we set λ = 1/3. Additionally, maintaining T̃ = 100 BO trials, B = 100 bootstrap replicates, and a fixed m = 1, we constructed an interpolation of the q versus r loss surface for each signal class. Fig. 6 illustrates the outcomes of this analysis.
Overall, we found that at the optimal r∗ , the BO algorithm exhibits relatively low sensitivity to variations in q within
a bounded range. For white noise signals, q ∈ [0.75, 0.97] yielded nearly identical objective values at r∗ , while for
AR(1) signals, q ∈ [0.20, 0.45] showed comparable results. Notably, even when deviating from the optimal q ∗ , as
observed in the AR(1) signal case with our heuristic suggestion of q = 0.50, the objective value remained relatively
close. This suggests that the choice of q may benefit from the stochastic nature of the signal bootstrapping scheme,
enhancing the overall robustness of the optimization process [39].


[Figure 6 plot: interpolated loss surface f(m* = 1, r, q) over the stationary bootstrap success probability (q) and similarity radius (r) for AR(1) signals (top) and white noise signals (bottom), with the locally optimal (q*, r*) marked.]

Figure 6: The BO algorithm was relatively insensitive to the block size success probability q within a bounded range at the locally optimal similarity radius r*. (Top): For AR(1) signals, q values in the range [0.20, 0.45] had approximately the same objective value at r*, and even the heuristic suggestion of q = 0.50 did not have a notably greater loss value than the optimal q*. (Bottom): For white noise signals, q values in the range [0.75, 0.97] yielded nearly the same objective value at r*.


4.1.5 Benchmarking SampEn Hyperparameter Selection Methods


In this section, we evaluate our method’s performance in selecting SampEn hyperparameters relative to existing benchmarks. We compare our approach to the optimization-based method by Lake et al. [3], the SampEn convergence-based
approach, and the canonical SampEn hyperparameter settings of (m = 2, r = 0.20σ).
We aim to select an optimal (m, r) combination from a signal set, S = {x1 , . . . , xn }, where n = 100 signals, each
containing N = 100 observations. We explore this task for both white noise signals and AR(1) signals.
We assess the SampEn estimation error using the regularized MSE objective function (Eq. 12) for each approach and
measure the computation time required to reach the locally optimal (m, r) combination. Since neither the optimization-based approach by Lake et al. nor the convergence-based approach inherently generates uncertainty bounds for the SampEn estimate, we approximate the mean regularized SampEn estimate MSE using a Gaussian distribution, as
suggested by Lake et al. [3]. See Appendix D for further details.
For the optimization-based approach, we search over a discretized grid of similarity radii values, R, given a fixed
embedding dimension value (m = 1). We search over R = {0.10σ, 0.15σ, . . . , 0.95σ, σ} where σ = 1 and use a
linear interpolation scheme to discretize to a grid with a step size of 0.01.
For the convergence-based approach, we fix m = 1 and use the SampEn variance estimator proposed by Lake et al.
[3] to calculate the median SampEn estimate variance across S for a given (m = 1, r) combination. We select the
radius value by identifying the “elbow” in the radius versus SampEn variance curve using the knee-finding algorithm
by Satopaa et al. [36].
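For the convergence-based baseline, the knee of the radius-versus-variance curve can be located with the `kneed` package, whose KneeLocator implements the Kneedle algorithm of Satopaa et al. [36]. The sketch below is our own approximation of that baseline; the variance values are placeholders standing in for the Lake et al. [3] estimator evaluated over the radius grid.

```python
import numpy as np
from kneed import KneeLocator

# Candidate radii (in units of sigma) and the corresponding median SampEn variance
# estimates across the signal set; the variance curve below is illustrative only.
radii = np.arange(0.10, 1.01, 0.05)
median_var = 0.02 + 0.5 * np.exp(-6.0 * radii)   # placeholder decreasing, convex curve

knee = KneeLocator(radii, median_var, curve="convex", direction="decreasing")
print("convergence-based radius choice: r* =", knee.knee)
```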
In contrast, for our BO procedure for determining optimal SampEn hyperparameters, we fixed λ = 1/10 for AR(1) signals and λ = 1/3 for Gaussian white noise signals. We limited m ∈ {1, 2, 3} and allotted B = 100 bootstrap samples with a computational budget of T̃ = 100 trials. We also compared the SampEn parameter selection strategies to the canonical (m = 2, r = 0.20σ) setting, using our proposed BO procedure to select q* with the same λ, B, and T̃ settings. Table 2 presents the results of this analysis.

Table 2: Our proposed BO method demonstrates statistically significantly lower regularized mean SampEn estimate MSE compared to competing benchmarks for both signal classes (p < 0.05, one-sided Mann-Whitney U test). Although the methods by Lake et al. [3] and the convergence-based approach entail lower computational costs than our BO routine, all methods achieve results within a reasonable timeframe. Notably, there seems to be little difference in computational time between our BO-based method searching for (m, r, q) and the standard SampEn approach, which focuses solely on optimizing the stationary bootstrap success probability, q.

Signal Type  | Method          | MSE            | p              | Time (sec)     | m* | r*            | SampEn(m*, r*)
-------------|-----------------|----------------|----------------|----------------|----|---------------|---------------
White Noise  | SampEnEff       | 0.282 ± 0.006  | ≤ 3.398 × 10−8 | 0.116 ± 0.006  | 1  | 0.668 ± 0.029 | 1.021 ± 0.042
White Noise  | Convergence     | 0.191 ± 0.008  | 0.043          | 0.035 ± 0.003  | 1  | 0.253 ± 0.034 | 1.974 ± 0.142
White Noise  | Standard SampEn | 0.574 ± 0.061  | 3.139 × 10−8   | 17.262 ± 0.599 | 2  | 0.2           | 2.301 ± 0.048
White Noise  | Ours            | 0.187 ± 0.003  | –              | 21.671 ± 0.354 | 1  | 0.194 ± 0.016 | 2.232 ± 0.079
AR(1)        | SampEnEff       | 0.089 ± 0.002  | ≤ 3.398 × 10−8 | 0.115 ± 0.002  | 1  | 0.775 ± 0.038 | 0.376 ± 0.028
AR(1)        | Convergence     | 0.063 ± 0.003  | ≤ 3.398 × 10−8 | 0.033 ± 0.001  | 1  | 0.308 ± 0.054 | 1.051 ± 0.147
AR(1)        | Standard SampEn | 0.138 ± 0.007  | ≤ 3.398 × 10−8 | 16.71 ± 0.20   | 2  | 0.2           | 1.450 ± 0.022
AR(1)        | Ours            | 0.056 ± 0.0005 | –              | 17.064 ± 0.371 | 1  | 0.241 ± 0.012 | 1.255 ± 0.049

The results from Table 2 highlight several findings. Our BO approach consistently outperforms the comparable benchmarks in terms of the regularized MSE for all signal types. Additionally, the SampEnEff objective selected much larger radii than either our method or the SampEn convergence-based strategy, which may risk losing too much detailed system information [26] and may be attributed to the variance estimation issue we previously identified in Section 4.1.1.

4.1.6 Computational Cost of BO-driven SampEn Hyperparameter Selection


Our BO-based algorithm outperforms existing benchmarks in terms of SampEn variance estimation and overall estimation error but comes with increased computational expense. To quantify this, we assessed the time required to
reach an optimal, unified (m, r, q) solution across a signal set S(N ), varying signal length (N ) and the number of
signals (n), and repeated the experiment ten times. Using AR(1) signals with the same parameter settings for λ, B,
and T̃ as detailed in Section 4.1.5, we observed that computation time grows linearly with the number of signals but
non-linearly with signal length likely due to the O(N 2 ) computational complexity of the SampEn algorithm [47]. Fig.
7 displays the results of this experiment.


[Figure 7 plot: mean execution time (sec) versus the number of observations (N) (left) and versus the number of signals (n) (right).]

Figure 7: Computation execution time for our proposed SampEn optimization algorithm varies with signal length and
number of signals. While execution time grows linearly with the number of signals for a fixed length, it grows non-
linearly as the signal length increases, likely due to the O(N 2 ) computational complexity of the SampEn algorithm
[47].

While our approach yields practical results within reasonable computation times, it may not be suitable beyond signal
lengths of approximately N ≈ 500 observations. Beyond this threshold, standard SampEn parameters or simpler
optimization methods may be more appropriate, considering SampEn’s relative stability.

4.2 Entropy Analysis on Short-Signal Benchmarks

To conclude our experiments, we evaluated the performance of our SampEn hyperparameter selection optimization
method against conventional benchmarks across five distinct short-signal datasets:

1. Electrocardiogram (ECG) signals differentiating myocardial infarction from normal heartbeats [48], with
N = 96 observations per signal.
2. Traffic loop sensor data from near the Los Angeles Dodgers stadium, comparing traffic during games on
weekends versus weekdays [49], with N = 288 observations per signal.
3. Accelerometer readings (roll) from a robot navigating over cement and carpet surfaces [50], with N = 70
observations per signal.
4. Sensor data from silicone wafer production, distinguishing between normal and defective items [48], with
N = 152 observations per signal.
5. Sensor readings distinguishing between humidity signals versus temperature signals [51], with N = 84
observations per signal.

Each dataset contains two classes with known ground truth. Our focus was on evaluating SampEn hyperparameter
selection methods’ ability to discern these known differences rather than employing a supervised machine learning
approach for class segregation. This approach showcases the algorithm’s potential efficacy in scenarios lacking known
ground truths or where class discrimination is not the primary objective. Furthermore, by utilizing signals across
diverse applications, including medical diagnostics, traffic analysis, and sensor-based monitoring, we highlight SampEn’s versatility in data-scientific applications.
We ensured signal stationarity, a necessary condition for valid entropy analysis [52], using the Augmented Dickey-Fuller test [46], with the Holm-Sidak method [53, 54] applied to adjust for multiple testing errors at an α = 0.05 significance level. Detailed pre-processing steps are documented in Appendix E.
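As a sketch of this screening step (our own helper, not the paper's preprocessing code), each signal's ADF p-value can be corrected with the Holm-Sidak procedure available in statsmodels:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.multitest import multipletests

def stationarity_screen(signals, alpha=0.05):
    """ADF test per signal with Holm-Sidak correction.

    Rejecting the ADF unit-root null marks a signal as stationary, so `keep` is True
    for signals that pass the screen.
    """
    pvals = np.array([adfuller(np.asarray(x, dtype=float))[1] for x in signals])
    keep, corrected, _, _ = multipletests(pvals, alpha=alpha, method="holm-sidak")
    return keep, corrected

rng = np.random.default_rng(5)
signals = [rng.standard_normal(96) for _ in range(10)]   # e.g., ten short signals of length 96
keep, p_corr = stationarity_screen(signals)
print(f"{int(keep.sum())} of {len(signals)} signals pass the stationarity screen")
```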
We compared several SampEn hyperparameter selection strategies, including our BO algorithm, the optimization-
based criterion detailed by Lake et al. [3], convergence-based methods, standard SampEn parameters (m = 2, r =

0.20σ), and Fuzzy Entropy (FuzzEn) as an alternate measure [35]. FuzzEn employs a fuzzy similarity condition,
differing from SampEn’s hard similarity criteria, with standard parameters set to (m = 2, r = 0.20σ, η = 2).
For the SampEnEff objective and the convergence-based approach, we utilized the same similarity radius grid of
R = {0.10σ, 0.15σ, . . . , 0.95σ, σ} using the linear interpolation scheme with a grid fidelity of 0.01, and selected m
using the autoregressive heuristic proposed by Lake et al. [3]. Specifically, we found the optimal median time order
lag using the Bayesian information criterion and selected that value as a proxy for embedding dimension [3, 32].
For our BO optimization approach, we set B = 100 bootstrap replicates and T̃ = 200 BO trials with λ = 1/3 for all of the signal sets.
For each signal set, we analyzed the locally optimal (m∗ , r∗ ) combination selected by each method, the median
standard error (SE) of the SampEn estimates, and the Mann-Whitney U-test p-value comparing the distribution of
SampEn estimates at (m∗ , r∗ ).
The outcome of this analysis is summarized in Table 3. We discuss three main findings.
First, our BO framework consistently identified statistically significant differences between classes across all five
signal sets at an α = 0.05 significance level, unlike competing methods. Particularly, for ECG data, Dodger’s stadium
traffic, and robot surface accelerometer readings, our algorithm was the sole approach detecting known differences.
Second, SampEnEff generally under-performed, failing to establish statistically significant distinctions between the signal groups. This could be attributed to its preference for larger similarity radii, stemming from the previously discussed SampEn variance estimation issues, especially in shorter signal sets. Conversely, the convergence-based method outperformed SampEnEff in all datasets except the ECG signal set.
Third, standard SampEn parameters and FuzzEn detected significant differences in specific datasets, like the wafer manufacturing and humidity versus temperature signal sets. However, predicting the effectiveness of these parameters without prior analysis is challenging. Moreover, these conventional approaches lack the capability to quantify the uncertainty associated with entropy estimates. Although fixing parameters, such as m = 2 and r = 0.20σ, and finding q* to estimate uncertainty is feasible, our analysis demonstrated a relatively small computational difference between optimizing (m*, r*, q*) and optimizing merely q*.

5 Discussion and Conclusion

In this study, we evaluated our proposed SampEn hyperparameter optimization approach against existing optimization-
based, convergence-based, and standard SampEn parameter selection strategies using synthetic signals and publicly
available benchmarks.
For the synthetic experiments, our main findings are as follows:

• Our bootstrap-based SampEn variance estimator consistently achieved 60 to 90 percent lower estimation error
compared to Lake et al.’s estimator across multiple signal types, similarity radii, and signal lengths.
• A regularization value of λ ∈ [1/10, 1/3] effectively regulated the BO algorithm, preventing fixation on extreme bounds of the similarity radius domain.
• The BO algorithm typically converged to satisfactory solutions in under 75 computational trials and demonstrated a balanced exploration-exploitation approach within the (m, r, q) decision space.
• Sensitivity analysis revealed the BO algorithm’s robustness to the specification of q at r∗ .
• Our proposed algorithm yielded a statistically significantly lower mean regularized SampEn estimate MSE,
with competing methods often favoring larger similarity radii.
• Our method found locally optimal, unified (m, r, q) combinations across signal sets of varying signal lengths
and signal set sizes in a reasonable timeframe. However, for signals longer than approximately N ≈ 500
observations, other methods may be more appropriate.

Across five diverse, short-signal benchmarks, our method was the only one to identify known, statistically significant
differences between signal classes. This appears to be driven by our joint consideration of (m, r) selection for stable
SampEn estimation and preference for tighter similarity radii.
Our results suggest that the key advantages of our approach are its regularization against overly large radii and more
accurate quantification of estimation uncertainty, leading to superior performance on short-signal scenarios.


Table 3: Our proposed BO method was the only approach to consistently detect known, statistically significant differences at α = 0.05 in entropy between signal classes across all benchmark datasets while achieving comparable or lower SampEn estimate standard error. Results of the entropy analysis are summarized below, including the Mann-Whitney U-test p-value (p) and the median SampEn standard error, Median SE(θ̂(m*, r*)), at the optimal parameters (m*, r*) selected by each method. Standard SampEn and FuzzEn use fixed parameters and do not provide a standard error estimate.

Data                                        | Method          | m* | r*   | Median SE(θ̂(m*, r*)) | p
--------------------------------------------|-----------------|----|------|----------------------|------------
ECG [48] (n = 56, N = 96)                   | Ours            | 1  | 0.11 | 0.19                 | 0.042
                                            | SampEnEff       | 4  | 0.50 | 0.19                 | 0.35
                                            | Convergence     | 4  | 0.41 | 0.23                 | 0.47
                                            | Standard SampEn | 2  | 0.20 | –                    | 0.21
                                            | FuzzEn          | 2  | 0.20 | –                    | 0.30
Dodgers’ Traffic [49] (n = 144, N = 288)    | Ours            | 1  | 0.18 | 0.097                | 0.002
                                            | SampEnEff       | 3  | 0.40 | 0.28                 | 0.79
                                            | Convergence     | 3  | 0.34 | 0.29                 | 0.32
                                            | Standard SampEn | 2  | 0.20 | –                    | 0.30
                                            | FuzzEn          | 2  | 0.20 | –                    | 0.73
Robot Surface [50] (n = 511, N = 70)        | Ours            | 1  | 0.05 | 0.18                 | 2.21 × 10−5
                                            | SampEnEff       | 2  | 0.55 | 0.19                 | 0.73
                                            | Convergence     | 2  | 0.10 | 0.36                 | 0.12
                                            | Standard SampEn | 2  | 0.20 | –                    | 0.12
                                            | FuzzEn          | 2  | 0.20 | –                    | 0.67
Wafer Manufacturing [48] (n = 1000, N = 152)| Ours            | 1  | 0.10 | 0.022                | 5.11 × 10−5
                                            | SampEnEff       | 1  | 1.00 | 0.021                | 0.29
                                            | Convergence     | 1  | 0.24 | 0.023                | 1.62 × 10−6
                                            | Standard SampEn | 2  | 0.20 | –                    | 0.16
                                            | FuzzEn          | 2  | 0.20 | –                    | 8.40 × 10−5
Sensor Modality [51] (n = 825, N = 84)      | Ours            | 1  | 0.10 | 0.19                 | 0.006
                                            | SampEnEff       | 1  | 1.00 | 0.056                | 0.56
                                            | Convergence     | 1  | 0.48 | 0.12                 | 0.45
                                            | Standard SampEn | 2  | 0.20 | –                    | 2.61 × 10−3
                                            | FuzzEn          | 2  | 0.20 | –                    | 0.39

There are, however, open questions regarding entropy analysis with very short signals (N ≤ 50), where SampEn’s
reliability diminishes, sometimes yielding undefined values due to lack of template matches [55]. This limitation is
particularly relevant for applications like cardiopulmonary exercise testing, where signals may contain fewer than 50
observations [56].
To address this challenge, future work could explore two main directions: improving existing entropy measures or
modifying the SampEn computation itself. Permutation entropy (PermEn) shows promise for shorter signals [57, 58],
with recent research demonstrating reliable PermEn estimates using a Bayesian framework with as few as N = 10
observations [59]. Alternatively, modifying the SampEn computation, as suggested by Richman et al. [60] to ensure
template matches for shorter signals may be beneficial.
Ultimately, addressing the challenges of entropy estimation for very short signals is necessary for expanding the
applicability of these complexity measures to a wider range of real-world, data-scientific scenarios.

References
[1] Saila Vikman, Timo H. Mäkikallio, Sinikka Yli-Mäyry, Sirkku Pikkujämsä, Anna-Maija Koivisto, Pekka
Reinikainen, K. E. Juhani Airaksinen, and Heikki V. Huikuri. Altered complexity and correlation properties
of r-r interval dynamics before the spontaneous onset of paroxysmal atrial fibrillation. Circulation, 100(20):
2079–2084, 11 1999. doi:https://doi.org/10.1161/01.CIR.100.20.2079.
[2] Xueyu Liang, Jinle Xiong, Zhengtao Cao, Xingyao Wang, Jianqing Li, and Chengyu Liu. Decreased sam-
ple entropy during sleep-to-wake transition in sleep apnea patients. Physiol. Meas., 42(4):044001, 5 2021. doi:https://doi.org/10.1088/1361-6579/abf1b2.
[3] Douglas E. Lake, Joshua S. Richman, M. Pamela Griffin, and J. Randall Moorman. Sample entropy analysis
of neonatal heart rate variability. Am. J. Physiol. Regul. Integr. Comp. Physiol., 283(3):R789–R797, 9 2002.
doi:https://doi.org/10.1152/ajpregu.00069.2002.
[4] Douglas E Lake and J Randall Moorman. Accurate estimation of entropy in very short physiological time series:
the problem of atrial fibrillation detection in implanted ventricular devices. Am. J. Physiol. Heart. Circ. Physiol.,
300(1):H319–H325, 1 2011. doi:https://doi.org/10.1152/ajpheart.00561.2010.
[5] Xuewei Wang, Xiaohu Zhao, Fei Li, Qiang Lin, and Zhenghui Hu. Sample entropy and surrogate data analysis
for alzheimer’s disease. Math. Biosci. Eng., 16(6):6892–6906, 7 2019. doi:https://doi.org/10.3934/mbe.2019345.
[6] Hsien-Jung Chan, Zhuhuang Zhou, Jui Fang, Dar-In Tai, Jeng-Hwei Tseng, Ming-Wei Lai, Bao-Yu
Hsieh, Tadashi Yamaguchi, and Po-Hsiang Tsui. Ultrasound sample entropy imaging: A new ap-
proach for evaluating hepatic steatosis and fibrosis. IEEE J. Transl. Eng. Health Med., 9:1–12, 2021.
doi:https://doi.org/10.1109/JTEHM.2021.3124937.
[7] Jiuli Yin, Cui Su, Yongfen Zhang, and Xinghua Fan. Complexity analysis of carbon market using the modified
multi-scale entropy. Entropy, 20(6), 6 2018. doi:https://doi.org/10.3390/e20060434.
[8] Joanna Olbryś and Elżbieta Majewska. Regularity in stock market indices within turbulence periods: The sample
entropy approach. Entropy, 24(7), 7 2022. doi:https://doi.org/10.3390/e24070921.
[9] Marzia Zaman and Chung-Horng Lung. Evaluation of machine learning techniques for network intrusion de-
tection. In NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium, pages 1–5, New
York, NY, USA, 4 2018. IEEE. doi:https://doi.org/10.1109/NOMS.2018.8406212.
[10] Deepali Virmani, Shweta Taneja, Tanya Chawla, Rachesh Sharma, and Rohan Kumar. Entropy deviation method
for analyzing network intrusion. In 2016 International Conference on Computing, Communication and Automa-
tion, pages 515–519, New York, NY, USA, 4 2016. IEEE. doi:https://doi.org/10.1109/CCAA.2016.7813774.
[11] Rosario Gilmary, Akila Venkatesan, and Govindasamy Vaiyapuri. Detection of automated behavior on
twitter through approximate entropy and sample entropy. Pers. Ubiquit. Comput., pages 91–105, 9 2021.
doi:https://doi.org/10.1007/s00779-021-01647-9.
[12] Wu Huachun, Zhou Jian, Xie Chunhu, Zhang Jiyang, and Huang Yiming. Two-dimensional time series sample
entropy algorithm: Applications to rotor axis orbit feature identification. Mech. Syst. Signal Process., 147:
107123, 1 2021. doi:https://doi.org/10.1016/j.ymssp.2020.107123.
[13] Zhenya Wang, Ligang Yao, and Yongwu Cai. Rolling bearing fault diagnosis using generalized refined
composite multiscale sample entropy and optimized support vector machine. Measurement, 156:107574, 5
2020. doi:https://doi.org/10.1016/j.measurement.2020.107574. URL https://www.sciencedirect.com/
science/article/pii/S0263224120301111.
[14] Joshua S. Richman and J. Randall Moorman. Physiological time-series analysis using approximate
entropy and sample entropy. Am. J. Physiol. Heart. Circ. Physiol., 278(6):H2039–H2049, 6 2000.
doi:https://doi.org/10.1152/ajpheart.2000.278.6.H2039.
[15] Jennifer M. Yentes, Nathaniel Hunt, Kendra K. Schmid, Jeffrey P. Kaipust, Denise McGrath, and Nicholas
Stergiou. The appropriate use of approximate entropy and sample entropy with short data sets. Ann. Biomed.
Eng., 41:349–365, 2 2013.
[16] K. Hauke Kraemer, Reik V. Donner, Jobst Heitzig, and Norbert Marwan. Recurrence threshold selection for
obtaining robust recurrence characteristics in different embedding dimensions. Chaos, 28(8):085720, 8 2018.
doi:https://doi.org/10.1063/1.5024914.
[17] Sofiane Ramdani, Benoît Seigle, Julien Lagarde, Frédéric Bouchara, and Pierre Louis Bernard. On the
use of sample entropy to analyze human postural sway data. Med. Eng. Phys., 31(8):1023–1031, 10 2009.
doi:https://doi.org/10.1016/j.medengphy.2009.06.004.
[18] R. Castaldo, L. Montesinos, P. Melillo, C. James, and L. Pecchia. Ultra-short term hrv features as surrogates of
short term hrv: a case study on mental stress detection in real life. BMC Med. Inform. Decis. Mak., 19, 1 2019.
doi:https://doi.org/10.1186/s12911-019-0742-y.
[19] Angkoon Phinyomark, Rami N. Khushaba, and Erik Scheme. Feature extraction and selection for myoelectric
control based on wearable emg sensors. Sensors, 18(5), 5 2018. doi:https://doi.org/10.3390/s18051615.
[20] Jagdeep Rahul, Lakhan Dev Sharma, and Vijay Kumar Bohat. Short duration vectorcardiogram based inferior
myocardial infarction detection: class and subject-oriented approach. Biomedical Engineering / Biomedizinische
Technik, 66(5):489–501, 5 2021. doi:https://doi.org/10.1515/bmt-2020-0329.


[21] Steven M Pincus. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci., 88(6):
2297–2301, 3 1991. doi:https://doi.org/10.1073/pnas.88.6.2297.
[22] J. P. Eckmann and D. Ruelle. Ergodic theory of chaos and strange attractors. Rev. Mod. Phys., 57:617–656, 7
1985. doi:https://doi.org/10.1103/RevModPhys.57.617.
[23] Alfonso Delgado-Bonal and Alexander Marshak. Approximate entropy and sample entropy: A comprehensive
tutorial. Entropy, 21(6), 2019. doi:https://doi.org/10.3390/e21060541.
[24] P. Castiglioni and M. Di Rienzo. How the threshold “r” influences approximate entropy analysis of heart-
rate variability. In 2008 Computers in Cardiology, pages 561–564, New York, NY, USA, 2008. IEEE.
doi:https://doi.org/10.1109/CIC.2008.4749103.
[25] Chengyu Liu, Changchun Liu, Peng Shao, Liping Li, Xin Sun, Xinpei Wang, and Feng Liu. Comparison of
different threshold values r for approximate entropy: application to investigate the heart rate variability between
heart failure and healthy control groups. Physiol. Meas., 32(2):167, 12 2010. doi:https://doi.org/10.1088/0967-
3334/32/2/002.
[26] Steven M Pincus and Ary L Goldberger. Physiological time-series analysis: what does reg-
ularity quantify? Am. J. Physiol. Heart. Circ. Physiol., 266(4):H1643–H1656, 4 1994.
doi:https://doi.org/10.1152/ajpheart.1994.266.4.H1643.
[27] Zachary Blanks, Donald E Brown, Dan M Cooper, Shlomit Radom Aizik, and Ronen Bar-Yoseph. Signal vari-
ability comparative analysis of healthy early- and late-pubertal children during cardiopulmonary exercise testing.
Med. Sci. Sports. Exerc., 56(2):287—296, 2 2024. doi:https://doi.org/10.1249/mss.0000000000003296.
[28] Raúl Alcaraz, Daniel Abásolo, Roberto Hornero, and José J Rieta. Optimal parameters study for sample entropy-
based atrial fibrillation organization analysis. Comput. Methods Programs Biomed., 99(1):124–132, 7 2010.
doi:https://doi.org/10.1016/j.cmpb.2010.02.009.
[29] David Cuesta-Frau, Pau Miró-Martínez, Jorge Jordán Núñez, Sandra Oltra-Crespo, and Antonio Molina Picó.
Noisy eeg signals classification based on entropy metrics. performance assessment using first and second genera-
tion statistics. Comput. Biol. Med., 87:141–151, 8 2017. doi:https://doi.org/10.1016/j.compbiomed.2017.05.028.
[30] David Cuesta-Frau, Pau Miró-Martínez, Sandra Oltra-Crespo, Jorge Jordán-Núñez, Borja Vargas, Paula
González, and Manuel Varela-Entrecanales. Model selection for body temperature signal classifi-
cation using both amplitude and ordinality-based entropy measures. Entropy, 20(11), 11 2018.
doi:https://doi.org/10.3390/e20110853.
[31] Nikhil Padhye, Denise Rios, Vaunette Fay, and Sandra K. Hanneman. Pressure injury link to entropy of abdomi-
nal temperature. Entropy, 24(8), 8 2022. doi:https://doi.org/10.3390/e24081127.
[32] Gideon Schwarz. Estimating the dimension of a model. Ann. Stat., 6(2):461–464, 3 1978. URL http://www.
jstor.org/stable/2958889.
[33] Mourad Kedadouche, Marc Thomas, Antoine Tahan, and Raynald Guilbault. Nonlinear parameters for moni-
toring gear: Comparison between lempel-ziv, approximate entropy, and sample entropy complexity. Shock and
Vibration, 2015, 11 2015. doi:https://doi.org/10.1155/2015/959380.
[34] Alessandro Mengarelli, Andrea Tigrini, Federica Verdini, Rosa Anna Rabini, and Sandro Fioretti. Mul-
tiscale fuzzy entropy analysis of balance: Evidences of scale-dependent dynamics on diabetic pa-
tients with and without neuropathy. IEEE Trans. Neural Syst. Rehabil. Eng., 31:1462–1471, 2023.
doi:https://doi.org/10.1109/TNSRE.2023.3248322.
[35] Weiting Chen, Zhizhong Wang, Hongbo Xie, and Wangxin Yu. Characterization of surface emg
signal based on fuzzy entropy. IEEE Trans. Neural Syst. Rehabil. Eng., 15(2):266–272, 2007.
doi:https://doi.org/10.1109/TNSRE.2007.897025.
[36] Ville Satopaa, Jeannie Albrecht, David Irwin, and Barath Raghavan. Finding a “kneedle” in a haystack: Detect-
ing knee points in system behavior. In 2011 31st International Conference on Distributed Computing Systems
Workshops, pages 166–171, New York, NY, USA, 2011. IEEE. doi:https://doi.org/10.1109/ICDCSW.2011.20.
[37] B. Efron. Bootstrap Methods: Another Look at the Jackknife. Ann. Stat., 7(1):1 – 26, 1 1979.
doi:https://doi.org/10.1214/aos/1176344552.
[38] Dimitris N. Politis and Joseph P. Romano. The stationary bootstrap. J. Am. Stat. Assoc., 89(428):1303–1313, 12
1994. doi:https://doi.org/10.2307/2290993.
[39] Dimitris N. Politis and Halbert White. Automatic block-length selection for the dependent bootstrap. Econom.
Rev., 23(1):53–70, 2004. doi:https://doi.org/10.1081/ETC-120028836.


[40] Peter I. Frazier. Bayesian optimization. In Recent Advances in Optimization and Modeling of
Contemporary Problems, chapter 11, pages 255–278. INFORMS, Catonsville, MD, USA, 10 2018.
doi:https://doi.org/10.1287/educ.2018.0188.
[41] James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization.
In Proceedings of the 24th International Conference on Neural Information Processing Systems, volume 24,
pages 2546–2554, Red Hook, NY, USA, 2011. Curran Associates Inc.
[42] Shuhei Watanabe. Tree-structured parzen estimator: Understanding its algorithm components and their roles for
better empirical performance. ArXiv e-Print, 5 2023. doi:https://doi.org/10.48550/arXiv.2304.11127.
[43] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation
hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference
on Knowledge Discovery & Data Mining, pages 2623–2631, New York, NY, USA, 7 2019. Association for
Computing Machinery. doi:https://doi.org/10.1145/3292500.3330701.
[44] James Bergstra, Daniel Yamins, and David Cox. Making a science of model search: Hyperparameter optimization
in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on
Machine Learning, volume 28, pages 115–123, Atlanta, GA, USA, 6 2013. PMLR.
[45] Jiaming Song, Lantao Yu, Willie Neiswanger, and Stefano Ermon. A general recipe for likelihood-free bayesian
optimization. In Proceedings of the 39th International Conference on Machine Learning, volume 162, pages
20384–20404, Atlanta, GA, USA, 7 2022. PMLR.
[46] David A. Dickey and Wayne A. Fuller. Distribution of the estimators for autoregressive time series with a unit
root. J. Am. Stat. Assoc., 74(366):427–431, 1979. doi:https://doi.org/10.2307/2286348.
[47] Ying Jiang, Dong Mao, and Yuesheng Xu. A fast algorithm for computing sample entropy. Advances in Adaptive
Data Analysis, 3(01n02):167–186, 2011. doi:https://doi.org/10.1142/S1793536911000775.
[48] Robert Thomas Olszewski, Roy Maxion, and Dan Siewiorek. Generalized feature extraction for structural
pattern recognition in time-series data. Carnegie Mellon University, Pittsburgh, PA, USA, 2001.
[49] Jon Hutchins. Dodgers Loop Sensor. UCI Machine Learning Repository, 2006. DOI:
https://doi.org/10.24432/C51P50.
[50] Abdullah Mueen, Eamonn Keogh, and Neal Young. Logical-shapelets: an expressive primitive for time se-
ries classification. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discov-
ery and Data Mining, pages 1154–1162, New York, NY, USA, 2011. Association for Computing Machinery.
doi:https://doi.org/10.1145/2020408.2020587.
[51] J. Sun, S. Papadimitriou, and C. Faloutsos. Online latent variable detection in sensor networks. In 21st In-
ternational Conference on Data Engineering (ICDE’05), pages 1126–1127, New York, NY, USA, 2005. IEEE.
doi:https://doi.org/10.1109/ICDE.2005.100.
[52] Cyril Chatain, Mathieu Gruet, Jean-Marc Vallier, and Sofiane Ramdani. Effects of nonstationarity on muscle
force signals regularity during a fatiguing motor task. IEEE Trans. Neural Syst. Rehabil. Eng., 28(1):228–237,
2020. doi:https://doi.org/10.1109/TNSRE.2019.2955808.
[53] Sture Holm. A simple sequentially rejective multiple test procedure. Scand. J. Stat., 6(2):65–70, 1979.
[54] Zbyněk Šidák. Rectangular confidence regions for the means of multivariate normal distributions. J. Am. Stat.
Assoc., 62(318):626–633, 6 1967. doi:https://doi.org/10.2307/2283989.
[55] Peng Li, Chengyu Liu, Ke Li, Dingchang Zheng, Changchun Liu, and Yinglong Hou. Assessing the complexity
of short-term heartbeat interval series by distribution entropy. Med. Biol. Eng. Comput., 53(1):77–87, 2015.
doi:https://doi.org/10.1007/s11517-014-1216-0.
[56] T Takken, A C Blank, E H Hulzebos, M Van Brussel, W G Groen, and P J Helders. Cardiopulmonary exer-
cise testing in congenital heart disease: equipment and test protocols. Neth. Heart J., 17(9):339–344, 2009.
doi:https://doi.org/10.1007%2FBF03086280.
[57] Christoph Bandt and Bernd Pompe. Permutation entropy: A natural complexity measure for time series. Phys.
Rev. Lett., 88(17):174102, 4 2002. doi:https://doi.org/10.1103/PhysRevLett.88.174102.
[58] David Cuesta-Frau, Juan Pablo Murillo-Escobar, Diana Alexandra Orrego, and Edilson Delgado-Trejos. Embed-
ded dimension and time series length. practical influence on permutation entropy and its applications. Entropy,
21(4), 4 2019. doi:https://doi.org/10.3390/e21040385.
[59] Zachary Blanks, Donald E. Brown, Marc A. Adams, and Siddhartha S. Angadi. An improved bayesian permuta-
tion entropy estimator with wasserstein-optimized hierarchical priors. In Conference on Health, Inference, and
Learning, Atlanta, GA, USA, 2024. PMLR.


[60] Joshua S. Richman, Douglas E. Lake, and J. Randall Moorman. Sample entropy. In Numerical Computer Meth-
ods, Part E, volume 384 of Methods Enzymol, pages 172–184. Academic Press, Cambridge, MA, USA, 2004.
doi:https://doi.org/10.1016/S0076-6879(04)84011-4.
[61] David W Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons Inc.,
New York, NY, USA, 1992. doi:https://doi.org/10.1002/9780470316849.
[62] Oriol Abril-Pla, Virgile Andreani, Colin Carroll, Larry Dong, Christopher J. Fonnesbeck, Maxim Kochurov,
Ravin Kumar, Junpeng Lao, Osvaldo A. Martin, Michael Osthege, Ricardo Vieira, Thomas Wiecki,
and Robert Zinkov. Pymc: a modern, and comprehensive probabilistic programming framework in python. PeerJ
Comput. Sci., 9, 9 2023. doi:http://dx.doi.org/10.7717/peerj-cs.1516.
[63] Matthew D. Homan and Andrew Gelman. The no-u-turn sampler: adaptively setting path lengths in hamiltonian
monte carlo. J. Mach. Learn. Res., 15(1):1593–1623, 1 2014.
[64] Andrew Gelman and Donald B. Rubin. Inference from Iterative Simulation Using Multiple Sequences. Stat. Sci.,
7(4):457 – 472, 11 1992. doi:https://doi.org/10.1214/ss/1177011136.

A Tree-structured Parzen Estimator Hyperparameters


In the Tree-structured Parzen Estimator (TPE) optimization framework, there are four hyperparameters of note:

• The top quantile value, γ ∈ (0, 1], which delineates the better-performing observation set, D(l), and the worse-performing observation set, D(g).
• The kernel models, k_d : Ψ_d × Ψ_d → R+, which describe the distribution of the d-th decision variable given the observation set.
• The bandwidth parameters, b(l), b(g) ∈ R+, which control the kernel estimators for the better- and worse-performing observation sets.
• The mixture weights, {w_t}_{t=0}^{T+1}, which are assigned to the Gaussian mixture components.

These hyperparameters use default values from the “Optuna” framework [43] for the TPE optimizer. The following
sections provide detailed descriptions of each component as applied to the SampEn hyperparameter selection problem.

A.1 Top Quantile Selection

The top quantile value, γ ∈ (0, 1], controls the sizes of the sets D(l) and D(g). For all iterations, we use γ = Γ(T ) =
0.10. However, when selecting the (T + 1)-th point, the size of the better-performing set D(l) is adjusted according to:

T (l) = min (⌈γT ⌉, 25) . (13)


Given an observation set D = {(ψ_t, y_t)}_{t=1}^{T}, sorted on y_t such that y_1 ≤ y_2 ≤ · · · ≤ y_T, we define D(l) = {(ψ_t, y_t)}_{t=1}^{T(l)}, and D(g) contains the remaining objective function evaluations.
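To make the partition concrete, the following is a minimal NumPy sketch of Eq. (13) and the resulting split into D(l) and D(g); the function name and array layout are illustrative assumptions rather than part of any released implementation.

import numpy as np

def split_observations(psi, y, gamma=0.10, cap=25):
    """Partition observed trials into D^(l) and D^(g) following Eq. (13).

    psi : (T, 3) array of evaluated (m, r, q) points
    y   : (T,) array of objective values (lower is better)
    """
    T = len(y)
    T_l = min(int(np.ceil(gamma * T)), cap)   # size of the better-performing set
    order = np.argsort(y)                     # sort trials by objective value
    better, worse = order[:T_l], order[T_l:]
    return psi[better], y[better], psi[worse], y[worse]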

A.2 Gaussian Kernel Functions and Bandwidth Selection

To construct a mixture of Gaussian kernels, we define kernel functions for our SampEn hyperparameters: (m, r, q).
These kernels, denoted as kd : Ψd × Ψd → R, are defined over the domains of the respective decision variables. Given
that m ∈ Ψ1 ⊆ Z+ , and r, q ∈ Ψ2 , Ψ3 ⊆ (0, 1), we choose appropriate kernels and bandwidths for each.
For the continuous variables r and q, we employ the canonical Gaussian kernel, parameterized by b ∈ R+ . The kernel
for the t-th observation of r (similarly for q) is given by:

g(r, r_t | b) = (1/√(2πb²)) · exp(−(1/2) · ((r − r_t)/b)²).   (14)

However, since r and q are bounded within the unit interval, we utilize a truncated Gaussian kernel:


k_d(r, r_t) = g(r, r_t | b) / ∫_0^1 g(r′, r_t | b) dr′.   (15)

For the embedding dimension m, a discrete parameter, we adopt a similar approach but with slight modifications due to its discrete nature. We limit m to the finite set {1, 2, . . . , U}, where U is the upper bound of possible m values. The kernel for the t-th observation of m, given the bandwidth parameter b, is computed as:

k_1(m, m_t) = (∫_{m−1/2}^{m+1/2} g(m′, m_t | b) dm′) / (∫_{1−1/2}^{U+1/2} g(m′, m_t | b) dm′).   (16)
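As a sketch of Eqs. (14)-(16), the kernels below use SciPy's Gaussian density and CDF; reading Eq. (16) as a bin integral over [m − 1/2, m + 1/2] normalized over [1 − 1/2, U + 1/2] is our reconstruction, and the function names are our own.

from scipy.stats import norm

def truncated_gaussian_kernel(r, r_t, b):
    """Truncated Gaussian kernel on (0, 1) for r or q, per Eqs. (14)-(15)."""
    mass = norm.cdf(1.0, loc=r_t, scale=b) - norm.cdf(0.0, loc=r_t, scale=b)
    return norm.pdf(r, loc=r_t, scale=b) / mass

def discrete_gaussian_kernel(m, m_t, b, U):
    """Discretized Gaussian kernel for m in {1, ..., U}, mirroring Eq. (16)."""
    bin_mass = norm.cdf(m + 0.5, loc=m_t, scale=b) - norm.cdf(m - 0.5, loc=m_t, scale=b)
    total_mass = norm.cdf(U + 0.5, loc=m_t, scale=b) - norm.cdf(0.5, loc=m_t, scale=b)
    return bin_mass / total_mass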

We select the bandwidth parameter b using Scott's rule [61] with the “magic clipping” heuristic [41]. Specifically, b is calculated as:

b = T^(−1/(d+4)),   (17)

where d is the dimensionality of the kernel. We also introduce b_min:

b_min = (U − L) / min({T, 100}),   (18)

where U and L indicate the upper and lower bounds of the decision variable. Finally, we set the bandwidth parameter
as:

b′ = max ({b, bmin }) . (19)
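A direct transcription of Eqs. (17)-(19) might look as follows; the function name is illustrative.

def tpe_bandwidth(T, d, lower, upper):
    """Scott's-rule bandwidth with the 'magic clipping' floor of Eqs. (17)-(19)."""
    b = T ** (-1.0 / (d + 4))              # Eq. (17): Scott's rule
    b_min = (upper - lower) / min(T, 100)  # Eq. (18): clipping floor
    return max(b, b_min)                   # Eq. (19)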

A.3 Gaussian Kernel Mixture Weights

In our Bayesian optimization framework, we utilize a mixture of Gaussian kernels for the surrogate model. This requires a set of weights, {w_t ∈ [0, 1]}_{t=0}^{T+1}, which we define using the “old decay” weighting scheme proposed by Bergstra et al. [44]:

w_t = 1 / (T(l) + 1),                         if t = 0, . . . , T(l),
w_t = w′_t / Σ_{k=T(l)+1}^{T+1} w′_k,         if t = T(l) + 1, . . . , T + 1.   (20)


In Eq. (20), for t = T (l) + 1, . . . , T + 1, we compute wt′ according to:


w′_t = 1,                                     if i_t > T(g) + 1 − 25,
w′_t = τ(i_t) + (1 − τ(i_t)) / (T(g) + 1),    otherwise,   (21)

where it represents the query order, with it = 1 being the oldest query and it = T (g) + 1 the most recent one. The
decay rate τ (it ) is defined as (it − 1)/(T (g) − 25) [44]. This weighting scheme assigns less weight to older queries in
D(g) and greater weight to more recent ones.
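The weighting scheme of Eqs. (20)-(21) can be sketched as below; the handling of observation sets with 25 or fewer points, where the decay rate is undefined, is an assumption on our part.

import numpy as np

def old_decay_weights(T_l, T_g):
    """'Old decay' mixture weights from Eqs. (20)-(21).

    T_l : size of the better-performing set D^(l)
    T_g : size of the worse-performing set D^(g)
    """
    # Eq. (20), first case: uniform weights over D^(l) (plus the prior slot)
    w_l = np.full(T_l + 1, 1.0 / (T_l + 1))

    idx = np.arange(1, T_g + 2)            # query order i_t = 1 (oldest), ..., T_g + 1
    if T_g <= 25:
        w_g = np.ones(T_g + 1)             # assumption: no decay for small sets
    else:
        tau = (idx - 1) / (T_g - 25)       # decay rate tau(i_t)
        w_g = np.where(idx > T_g + 1 - 25, 1.0, tau + (1.0 - tau) / (T_g + 1))
    return w_l, w_g / w_g.sum()            # Eq. (20), second case: normalize D^(g) weights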

A.4 Constructing the Surrogate Model


 
The surrogate models p(ψ | D(l)) and p(ψ | D(g)) are built upon a non-informative Gaussian prior, p_0. Considering an upper bound U for the embedding dimension m, we define the mean vector µ_0 and covariance matrix Σ_0 as:

µ_0 = ((U − 1)/2, 1/2, 1/2),   Σ_0 = diag((U − 1)², 1, 1).


The non-informative prior is then p0 ∼ N (µ0 , Σ0 ). The surrogate model for the better-performing set is expressed
as:

p(ψ | D(l)) = p_0(ψ) + ∏_{d=1}^{3} Σ_{t=1}^{T(l)} w_t · k_d(ψ_d, ψ_{d,t} | b(l)).   (22)

Similarly, for the worse-performing group, we have:

p(ψ | D(g)) = p_0(ψ) + ∏_{d=1}^{3} Σ_{t=T(l)+1}^{T} w_t · k_d(ψ_d, ψ_{d,t} | b(g)),   (23)

where ψd,t ∈ Ψd represents the d-th dimension of the t-th observation, and b(l) and b(g) are the bandwidth parameters
obtained using Eq. (19).
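Evaluating Eq. (22) or (23) at a candidate point then reduces to a prior density plus a product of weighted kernel sums. The sketch below assumes kernel callables with a common (x, x_t, b) signature (for instance, the discrete kernel partially applied with its upper bound U) and is not drawn from a particular codebase.

import numpy as np
from scipy.stats import multivariate_normal

def surrogate_density(psi, obs, weights, kernels, bandwidth, mu0, cov0):
    """Evaluate the surrogate of Eq. (22)/(23) at a candidate psi = (m, r, q).

    obs      : (T', 3) array of previously evaluated (m, r, q) points in the set
    weights  : (T',) mixture weights for those observations
    kernels  : three per-dimension kernel callables k_d(x, x_t, b)
    """
    prior = multivariate_normal(mean=mu0, cov=cov0).pdf(psi)  # non-informative prior p_0
    per_dim = [
        sum(w * kernels[d](psi[d], obs[t, d], bandwidth) for t, w in enumerate(weights))
        for d in range(3)
    ]
    return prior + np.prod(per_dim)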

B Signal Set SampEn Hyperparameter Selection Optimization Algorithm

We provide a full specification of how to select locally optimal SampEn hyperparameters, (m, r) for a signal set,
S = {x1 , . . . , xn }. This is detailed in Algorithm 4.

Algorithm 4 Signal Set SampEn Hyperparameter Selection Optimization

Require: S = {x_1, . . . , x_n}, B ∈ Z+, T̃ ∈ Z+ (number of BO trials), λ ≥ 0, T_init ∈ Z+
1: D ← ∅   ▷ Empty observation set
2: for t = 1, . . . , T_init do
3:     Select ψ_t randomly
4:     for i = 1, . . . , n do
5:         Compute the SampEn estimate of x_i, θ̂_i(m_t, r_t)
6:         Generate bootstrap replicates of x_i, {x_{i,b}}_{b=1}^{B}, given q_t   ▷ See Algorithm 1
7:         Compute bootstrap SampEn estimates, {θ̂_{i,b}(m_t, r_t)}_{b=1}^{B}
8:         y_i = MSE(θ̂_i(m_t, r_t)) + λ√r_t
9:     end for
10:    y_t = (1/n) Σ_{i=1}^{n} y_i   ▷ Mean objective value across the n signals
11:    D ← D ∪ {(ψ_t, y_t)}
12: end for
13: while t ≤ T̃ do
14:    Select ψ_{t+1} using the TPE acquisition process   ▷ See Algorithm 2
15:    for i = 1, . . . , n do
16:        Compute the SampEn estimate of x_i, θ̂_i(m_{t+1}, r_{t+1})
17:        Generate bootstrap replicates of x_i, {x_{i,b}}_{b=1}^{B}, given q_{t+1}   ▷ See Algorithm 1
18:        Compute bootstrap SampEn estimates, {θ̂_{i,b}(m_{t+1}, r_{t+1})}_{b=1}^{B}
19:        y_i = MSE(θ̂_i(m_{t+1}, r_{t+1})) + λ√r_{t+1}
20:    end for
21:    y_{t+1} = (1/n) Σ_{i=1}^{n} y_i   ▷ Mean objective value across the n signals
22:    D ← D ∪ {(ψ_{t+1}, y_{t+1})}
23:    t ← t + 1
24: end while
25: y∗ ← min{y_1, . . . , y_T̃}
26: ψ∗ ← ψ_{t∗}, where t∗ = argmin_t {y_1, . . . , y_T̃}
27: return ψ∗, y∗
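For readers who want to reproduce Algorithm 4 with off-the-shelf tooling, the following is a hedged sketch of the per-trial objective using Optuna's TPE sampler [43]. Here sampen and stationary_bootstrap are placeholders for a SampEn routine and the stationary bootstrap of Algorithm 1, the search bounds are illustrative, and the per-signal MSE is assumed to be taken over the bootstrap replicates around the point estimate.

import numpy as np
import optuna

def make_objective(signals, sampen, stationary_bootstrap, B=100, lam=0.1):
    """Regularized mean-MSE objective of Algorithm 4 for a single BO trial."""
    def objective(trial):
        m = trial.suggest_int("m", 1, 4)
        r = trial.suggest_float("r", 0.05, 1.0)
        q = trial.suggest_float("q", 0.01, 0.99)
        losses = []
        for x in signals:
            theta_hat = sampen(x, m, r)                    # point estimate
            boot = np.array([sampen(xb, m, r)              # bootstrap SampEn estimates
                             for xb in stationary_bootstrap(x, q, B)])
            mse = np.mean((boot - theta_hat) ** 2)         # per-signal MSE
            losses.append(mse + lam * np.sqrt(r))          # radius regularization
        return float(np.mean(losses))                      # mean objective over the set
    return objective

# Example usage (assumes signal_set, sampen, and stationary_bootstrap exist):
# sampler = optuna.samplers.TPESampler(n_startup_trials=10)
# study = optuna.create_study(direction="minimize", sampler=sampler)
# study.optimize(make_objective(signal_set, sampen, stationary_bootstrap), n_trials=75)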


C SampEn Variance Estimators Comparison Experiment Details


This appendix details the methodology employed to compare the two SampEn variance estimation approaches. We begin by describing the normalized set of signals for each signal class (white noise and AR(1) signals), denoted as S(N) = {x_1, . . . , x_n}, where n represents the number of generated signals, all with N observations. We fix n = 10,000 and vary N ∈ {50, 100, 200} to represent instances of short time series signals. Additionally, we evaluate r ∈ {0.20σ, 0.25σ}, where σ is the standard deviation of the signal. All signals are normalized to have zero mean and unit variance.
We denote θ̂_i(m, r) as the estimated SampEn of signal x_i at (m, r), and {θ̂_i(m, r)}_{i=1}^{n} as the set of n SampEn estimates at (m, r) from the signal set S(N). We approximate the “true” SampEn estimate variance at (m, r, N) by the expression:

σ²(m, r, N) = V_N({θ̂_i(m, r)}_{i=1}^{n}) ≈ (1/(n − 1)) Σ_{i=1}^{n} (θ̂_i(m, r) − θ̄(m, r))²,   (24)

where θ̄(m, r) is the mean of the SampEn estimate set.


We are interested in understanding how accurately a particular SampEn variance estimator performs. Given a signal x_i, for a particular estimator e, we obtain a SampEn variance estimate σ̂²_i(m, r, N, e). The mean squared error (MSE) of the SampEn variance estimator e for a particular signal class at (m, r, N) is approximated as:

ϵ(m, r, N, e) = MSE({σ̂²_i(m, r, N, e)}_{i=1}^{ñ}) ≈ (1/ñ) Σ_{i=1}^{ñ} (σ²(m, r, N) − σ̂²_i(m, r, N, e))².   (25)

For a given signal type at m = 1, and fixed values for (r, N ), we generated 20 estimates of ϵ(m, r, N, e) for both
our proposed bootstrap-based estimator and the estimator proposed by Lake et al. [3]. This process is summarized in
Algorithm 5.

Algorithm 5 SampEn Variance Estimation Error Calculation Procedure

Require: N ∈ Z+, r ∈ R+, fix m = 1, q = 0.5 (AR(1)) or q = 0.9 (white noise), B = 100, n = 10,000, ñ = 100
1: Generate S(N) = {x_1, . . . , x_n}   ▷ Either a Gaussian white noise or AR(1) signal
2: Calculate SampEn estimates from S(N): {θ̂_i(m, r)}_{i=1}^{n}
3: Compute σ²(m, r, N) using Eq. (24)
4: for i = 1, . . . , 20 do
5:     S̃(N) ← randomly select ñ signals from S(N)
6:     Calculate {σ̂_j(m, r, N, Counting)}_{j=1}^{ñ} from the signals in S̃(N)   ▷ See [3] for computation details
7:     Calculate {σ̂_j(m, r, N, Bootstrap)}_{j=1}^{ñ} from the signals in S̃(N) given B bootstrap replicates and q   ▷ See Section 3.2 and Eq. (6) for details
8:     Calculate ϵ_i(m, r, N, Counting) using Eq. (25)
9:     Calculate ϵ_i(m, r, N, Bootstrap) using Eq. (25)
10: end for
11: return {ϵ_i(m, r, N, Counting)}_{i=1}^{20} and {ϵ_i(m, r, N, Bootstrap)}_{i=1}^{20}

In Algorithm 5, “Counting” refers to the SampEn variance estimation approach detailed by Lake et al. [3] and “Bootstrap” corresponds to our proposed estimation procedure. We repeated this process for N ∈ {50, 100, 200} observations and r ∈ {0.15σ, 0.20σ, 0.25σ}.
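For concreteness, the empirical quantities in Eqs. (24)-(25) reduce to a sample variance and a mean of squared deviations; the helper below is an illustrative sketch.

import numpy as np

def variance_estimator_mse(theta_hats, var_hats):
    """MSE of a SampEn variance estimator against the empirical variance (Eqs. (24)-(25)).

    theta_hats : (n,) SampEn point estimates over the full signal set S(N)
    var_hats   : (n_tilde,) variance estimates from one estimator on a random subset
    """
    sigma2 = np.var(theta_hats, ddof=1)        # Eq. (24): "true" SampEn estimate variance
    return float(np.mean((sigma2 - np.asarray(var_hats)) ** 2))  # Eq. (25)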
We model the distribution of SampEn variance estimator errors for signal type t using the following setup:


µ_{t,r,N,e} ∼ N(0, 4)
σ_{t,r,N,e} ∼ U(0.01, 2.0)
ϵ_i(t, r, N, e) ∼ logN(µ_{t,r,N,e}, σ_{t,r,N,e})   (26)
∆ϵ(r, N) ≡ [(exp(µ_{t,r,N,1}) − exp(µ_{t,r,N,2})) / exp(µ_{t,r,N,1})] × 100.

In Eq. (26), µt,r,N,e and σt,r,N,e represent the mean and standard deviation of log-transformed SampEn variance MSE
values, respectively.
We are particularly interested in the posterior distribution of ∆ϵ(r, N ), representing the mean percent difference in
SampEn variance MSE between our bootstrap-based estimator (µt,r,N,2 ) and the counting-based approach (µt,r,N,1 )
detailed by Lake et al. [3]. We use the exp(·) function to transform the values out of the log-error domain. Positive
values of ∆ϵ(r, N) indicate that our proposed estimator has relatively lower SampEn variance estimation error.
We approximate the posterior distribution of µ_{t,r,N,e} using Markov chain Monte Carlo (MCMC) via the No-U-Turn Sampler in the PyMC probabilistic programming framework [62, 63]. To ensure
MCMC convergence, we used trace plots, evaluated the effective sample size (ESS), and checked the Gelman-Rubin
“R-hat” statistic [64]. Trace plots visually assessed the chains’ behavior, ensuring stability, good mixing, and a lack
of trends, signifying successful exploration of the parameter space and convergence. ESS measurements quantified
the number of independent samples, indicating efficient sampling and exploration of the posterior distribution. R-hat
values close to one suggested convergence.
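A PyMC sketch of the model in Eq. (26) for a single (signal type, r, N) cell could look as follows; we read the N(0, 4) prior as having variance 4 (standard deviation 2), and the function and variable names are our own.

import pymc as pm

def fit_error_model(eps_counting, eps_bootstrap, draws=2000, tune=1000):
    """Fit the log-normal error model of Eq. (26) and return the posterior samples."""
    with pm.Model():
        mu = pm.Normal("mu", mu=0.0, sigma=2.0, shape=2)               # N(0, 4) prior, as variance
        sigma = pm.Uniform("sigma", lower=0.01, upper=2.0, shape=2)
        pm.LogNormal("eps_counting", mu=mu[0], sigma=sigma[0], observed=eps_counting)
        pm.LogNormal("eps_bootstrap", mu=mu[1], sigma=sigma[1], observed=eps_bootstrap)
        # Posterior mean percent difference between the counting and bootstrap estimators
        pm.Deterministic(
            "delta_eps",
            (pm.math.exp(mu[0]) - pm.math.exp(mu[1])) / pm.math.exp(mu[0]) * 100,
        )
        idata = pm.sample(draws=draws, tune=tune, chains=4)            # NUTS by default
    return idata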

D Gaussian Approximation of SampEn Estimate Uncertainty


Let S = {x_1, . . . , x_n} be a signal set containing n signals, each with N observations, and let {θ̂_i(m, r)}_{i=1}^{n} be the set of SampEn estimates obtained from S at a fixed (m, r). We are interested in generating an approximation of the mean squared error (MSE) of the set of SampEn estimates to compare alternative SampEn hyperparameter selection approaches against our proposed method. Approximating the SampEn MSE requires a way to reason about the uncertainty of the estimate.
Lake et al. [3] assert that provided m is small enough and r large enough to ensure a sufficient number of matches,
SampEn can be assumed to be normally distributed. Specifically, the standard error (SE) of the SampEn estimate of
xi is approximated as:

s_i ≈ σ_CP / CP,   (27)

where CP is the conditional probability that a signal will remain within r for m + 1 points given that it has remained within r for m points, expressed as CP = A^m(r) / B^m(r) (see (1) and (2) for details on calculating B^m(r) and A^m(r), respectively).
Lake et al. [3] provide a mechanism to compute σCP .
 
Under the Gaussian assumption, the SampEn of x_i at (m, r) is then θ_i(m, r) ∼ N(θ̂_i(m, r), s_i). Using this approximation, we can obtain an estimate of the mean SampEn MSE for all n signals by sampling from its respective distribution. Algorithm 6 details the approach.
Using this estimate, we can obtain an approximate comparison of the objective value realized for an (m, r) combina-
tion selected by an alternate approach compared to our proposed BO algorithm.

E Constructing Weakly Stationary Signal Sets



Let S_j = {x_1, . . . , x_{n_j}}, where x_i ∈ R^{N_j}, correspond to the j-th signal set analyzed in Section 4.2. To ensure valid SampEn analysis [52], we require that all signals contained in the signal set are weakly stationary. We evaluate their statistical stationarity via the Augmented Dickey-Fuller (ADF) test [46] and correct for multiple testing error at significance level α = 0.05 using the Holm-Sidak method [53, 54]. Algorithm 7 summarizes this process for a particular signal set.


Algorithm 6 Gaussian Approximation of Signal Set SampEn Estimate MSE

Require: S = {x_1, . . . , x_n}, m ∈ Z+, r ∈ R+, D ∈ Z+, λ ≥ 0
1: for i = 1, . . . , n do
2:     Compute the SampEn estimate of x_i given (m, r): θ̂_i(m, r)
3:     Calculate the SampEn estimate standard error, s_i, using Eq. (27)
4:     Sample {θ̃_{i,d}(m, r)}_{d=1}^{D} ∼ N(θ̂_i(m, r), s_i)
5:     ϵ̂_i(m, r) = (1/D) Σ_{d=1}^{D} (θ̃_{i,d}(m, r) − θ̂_i(m, r))²
6: end for
7: return (1/n) Σ_{i=1}^{n} ϵ̂_i(m, r) + λ√r
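A NumPy transcription of Algorithm 6 might look as follows; the function name and default values are illustrative.

import numpy as np

def gaussian_mse_approximation(theta_hats, std_errs, r, D=1000, lam=0.1, seed=0):
    """Monte Carlo approximation of the regularized signal-set SampEn MSE (Algorithm 6).

    theta_hats : (n,) SampEn estimates at a fixed (m, r)
    std_errs   : (n,) standard errors from Eq. (27)
    """
    rng = np.random.default_rng(seed)
    per_signal = []
    for theta, s in zip(theta_hats, std_errs):
        draws = rng.normal(loc=theta, scale=s, size=D)    # Gaussian SampEn approximation (step 4)
        per_signal.append(np.mean((draws - theta) ** 2))  # per-signal MSE (step 5)
    return float(np.mean(per_signal) + lam * np.sqrt(r))  # regularized objective (step 7)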

Algorithm 7 Weakly Stationary Signal Set Construction

Require: S = {x_1, . . . , x_n}, α ∈ (0, 1)
1: S̃ ← ∅   ▷ Placeholder for modified signal set
2: for i = 1, . . . , n do
3:     Calculate the differenced signal: x̃_i = (x_2 − x_1, x_3 − x_2, . . . , x_N − x_{N−1})
4:     Normalize x̃_i to have zero mean and unit variance
5:     Compute the ADF test p-value, p_i   ▷ See [46]
6:     S̃ ← S̃ ∪ {x̃_i}
7: end for
8: Given the n ADF p-values {p_i}_{i=1}^{n}, compute the multiple-testing-adjusted p-values {p̃_i}_{i=1}^{n} using the Holm-Sidak method   ▷ See [53, 54]
9: for i = 1, . . . , n do
10:    if p̃_i > α then
11:        Remove x̃_i from S̃
12:    end if
13: end for
14: return S̃
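Algorithm 7 maps directly onto standard statsmodels routines; the sketch below is one possible realization rather than a reference implementation.

import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.multitest import multipletests

def build_stationary_set(signals, alpha=0.05):
    """Difference, normalize, and retain signals passing the ADF test (Algorithm 7)."""
    diffed, pvals = [], []
    for x in signals:
        xd = np.diff(np.asarray(x, dtype=float))    # first difference (step 3)
        xd = (xd - xd.mean()) / xd.std()            # zero mean, unit variance (step 4)
        diffed.append(xd)
        pvals.append(adfuller(xd)[1])               # ADF p-value (step 5)
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method="holm-sidak")
    # Keep signals whose unit-root null is rejected (adjusted p <= alpha)
    return [xd for xd, keep in zip(diffed, reject) if keep]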
