(2017) Bayesian Microsaccade Detection
Citation: Mihali, A., van Opheusden, B., & Ma, W. J. (2017). Bayesian microsaccade detection. Journal of Vision, 17(1):13, 1–23,
doi:10.1167/17.1.13.
Accepted October 26, 2016; published January 11, 2017. ISSN 1534-7362
Figure 1. Microsaccades under different noise levels. Example single-trial eye position data from two subjects, measured with the
EyeLink eye tracker with the ‘‘Heuristic filter’’ option turned off. (A) Measured eye position in the plane (left) and horizontal and
vertical position as a function of time (right) for an easily detectable microsaccade. (B) Another trace, which contains an apparent
microsaccade buried in measurement noise. EK with the threshold multiplier set at six identifies a microsaccade in (A) but not in (B).
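The EK detector invoked in this caption flags a microsaccade when the velocity magnitude exceeds an adaptive, noise-scaled threshold for long enough. The following is a simplified sketch of that idea, not the published implementation: the function name, defaults, and the use of raw finite-difference velocities (Engbert and Kliegl use a smoothed velocity estimate) are ours.

```python
import numpy as np

def ek_detect(x, y, dt=0.001, lam=6.0, min_samples=6):
    """Sketch of an EK-style detector: flag samples whose velocity lies
    outside an ellipse whose axes are lam times a median-based velocity SD,
    and keep only runs that last at least min_samples samples."""
    vx = np.gradient(x, dt)            # horizontal velocity, deg/s
    vy = np.gradient(y, dt)            # vertical velocity, deg/s
    # median-based SD estimates are robust to the saccades themselves
    sx = np.sqrt(np.median(vx ** 2) - np.median(vx) ** 2)
    sy = np.sqrt(np.median(vy ** 2) - np.median(vy) ** 2)
    above = (vx / (lam * sx)) ** 2 + (vy / (lam * sy)) ** 2 > 1.0
    events, start = [], None
    for t, flag in enumerate(above):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_samples:
                events.append((start, t))
            start = None
    if start is not None and len(above) - start >= min_samples:
        events.append((start, len(above)))
    return events
```

Raising `lam` from three to six trades false alarms for misses, which is the ambiguity illustrated in the figure.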
video-based infrared eye trackers, which are less invasive than magnetic scleral search coils, but noisier (Hermens, 2015; Träisk, Bolzani, & Ygge, 2005). For example, the popular EyeLink II video-based infrared eye tracker reports a precision of 0.01 deg of visual angle; however, in practice this precision can be worse (Holmqvist et al., 2011). The low sensitivity, precision, and resolution of video-based eye trackers can cause difficulties in resolving microsaccades (Nyström, Hansen, Andersson, & Hooge, 2016; Poletti & Rucci, 2016).

How can microsaccades be reliably detected in the presence of other fixational eye movements and measurement noise? The most commonly used microsaccade detection method, especially in human studies, is a velocity-threshold algorithm proposed by Engbert and Kliegl (Engbert, 2006; Engbert & Kliegl, 2003). This method, which we refer to as EK, detects a microsaccade when the magnitude of the eye velocity exceeds a given threshold for a sufficiently long duration. Because a fixed threshold would ignore differences in noise across trials and individuals, the threshold was adaptively chosen to be a multiple of the standard deviation of the velocity distribution (Engbert & Kliegl, 2003). However, the value of the multiplier is arbitrary and affects the algorithm's performance, as expected from signal detection theory: If the multiplier is too high, the algorithm misses microsaccades, while too low a multiplier causes false alarms. For example, EK with the threshold multiplier set to its standard value of 6 (Engbert & Kliegl, 2003) labels the eye position data in Figure 1A as a microsaccade, but not the data in Figure 1B. However, lowering the threshold multiplier to three causes EK to label both examples as microsaccades. This ambiguity in the identification of microsaccades can cause ambiguity in conclusions about their functional roles.

More recent algorithms have tried to eliminate the need for an arbitrary velocity threshold by taking into account more details of the statistics of fixational eye movements. Bettenbuehl et al. (2010) assumed that microsaccades are discontinuities embedded in drift and used wavelet analysis to detect them. Otero-Millan, Castro, Macknik, and Martinez-Conde (2014) have proposed an unsupervised clustering algorithm based on three features: peak velocity, initial acceleration peak, and final acceleration peak.

Here we propose to detect microsaccades with a Bayesian algorithm. Bayesian algorithms have been used previously for saccade detection. Salvucci and colleagues (Salvucci & Anderson, 1998; Salvucci & Goldberg, 2000) used a hidden Markov model to separate fixations from saccades. However, their algorithm requires the user to specify a set of fixation targets, which are regions of interest based on a cognitive process model of the task. By contrast, our algorithm is entirely task independent. More recently, Daye and Optican (2014) used particle filters to estimate the posterior over a hidden position variable in a generic and simple model for eye velocity. Whenever this model fails to capture the data, their algorithm
Figure 3. Prior distributions used in the algorithm. (A) Prior distributions over the durations of drift and microsaccade states. These
priors are fixed in the inference process. (B) Priors for eye velocity for drift and microsaccade states. The inference process estimates
the parameters of these priors from data; here we show the priors estimated for one example subject (EyeLink S1; Table 2). Note that
these distributions are not normalized.
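The velocity priors in (B) follow the generalized gamma form of Equation 4 in the text. As a sketch (function names ours; the parameter values σ0 = 0.3 deg/s, d1 = 4.4, σ1 = 30 deg/s are taken from the simulated-data section and are illustrative here), the density can be evaluated directly; with d = 1 it reduces to a Rayleigh distribution, the speed distribution of a circularly symmetric two-dimensional Gaussian, as the text notes for the drift state.

```python
import math

def gen_gamma_pdf(r, d, sigma):
    """Generalized gamma density over speed r >= 0 (Equation 4):
    p(r) = 2 / (Gamma((d+1)/2) * (sigma*sqrt(2))**(d+1)) * r**d * exp(-r**2 / (2*sigma**2))."""
    norm = 2.0 / (math.gamma((d + 1.0) / 2.0) * (sigma * math.sqrt(2.0)) ** (d + 1.0))
    return norm * r ** d * math.exp(-r * r / (2.0 * sigma * sigma))

def rayleigh_pdf(r, sigma):
    """Speed density of a circularly symmetric 2-D Gaussian velocity."""
    return (r / sigma ** 2) * math.exp(-r * r / (2.0 * sigma * sigma))

# drift prior (d0 = 1, sigma0 = 0.3 deg/s) coincides with a Rayleigh density;
# microsaccade prior (d1 = 4.4, sigma1 = 30 deg/s) peaks near sigma1 * sqrt(d1)
drift = gen_gamma_pdf(0.4, 1.0, 0.3)
micro = gen_gamma_pdf(63.0, 4.4, 30.0)
```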
" #
noise with covariance matrix $\Sigma_x$ (independent across time points), yielding the measured eye position $x_t$.

We define a change point of $C$ as a time $t$ where $C_t \neq C_{t-1}$, and denote the $i$th change point as $s_i$. The duration for which the eye stays in a given state is then $\Delta s_i = s_{i+1} - s_i$. We assume that $C$ is a semi-Markov process, which means that these durations are independent. In a hidden Markov model (Bishop, 2006), the probability of $C_t$ depends only on the previous state, $C_{t-1}$; however, in a hidden semi-Markov model (also called an explicit-duration hidden Markov model; Yu, 2010), the durations over which the state remains unchanged are independent. Then the prior probability of $C$ is

$$p(C) = \prod_i p(\Delta s_i \mid C_{s_i}), \qquad (1)$$

where $p(\Delta s_i \mid C_{s_i})$ is the state-specific probability of the duration. Specifically, we use a gamma distribution with shape parameter 2 and scale parameter $k$:

$$p(\Delta s \mid C) \propto \Delta s\, e^{-k \Delta s}, \qquad (2)$$

where $k = k_0$ if $C = 0$ (drift/tremor) and $k = k_1$ if $C = 1$ (microsaccade). We choose this distribution because it makes very short and very long durations unlikely, consistent with previously reported distributions of durations for drift and microsaccades (Engbert, 2006). Assumptions about the frequency and duration of microsaccades are reflected in the choices of the parameters $k_0$ and $k_1$.

At each change point $s_i$, the eye draws a new velocity $v_{s_i}$ from a state-specific probability distribution $p(v_{s_i} \mid C_{s_i})$; this velocity remains constant until the eye switches state at $s_{i+1}$. The distribution of the velocity time series $v$ given an eye state time series $C$ is

$$p(v \mid C) = \prod_i \left[ p(v_{s_i} \mid C_{s_i}) \prod_{t = s_i + 1}^{s_{i+1} - 1} \delta(v_t - v_{t-1}) \right]. \qquad (3)$$

To define the state-specific velocity distribution, we write $v$ in polar coordinates, $v = (r\cos\theta, r\sin\theta)^T$, and assume that in both states the direction of the velocity $\theta$ is uniformly distributed and its magnitude $r$ follows a generalized gamma distribution

$$p(r \mid C) = \frac{2}{\Gamma\!\left(\frac{d+1}{2}\right)\left(\sigma\sqrt{2}\right)^{d+1}}\, r^{d}\, e^{-\frac{r^2}{2\sigma^2}}, \qquad (4)$$

where $d = d_0$ and $\sigma = \sigma_0$ if $C = 0$, and $d = d_1$ and $\sigma = \sigma_1$ if $C = 1$. Note that our definition of the generalized gamma distribution differs from that of Stacy (1962) by a reparametrization $d \to d + 1$, $\sigma \to \sigma\sqrt{2}$. We fix $d_0$ to 1, which is equivalent to assuming that the distribution of the two-dimensional velocity in the drift/tremor state is a circularly symmetric Gaussian with standard deviation $\sigma_0$. The other parameters $d_1$ and $\sigma_1$ control the shape and scale, respectively, of the distribution of microsaccade velocities. Figure 3B shows examples of these velocity distributions.

The eye position time series $z$ is piecewise linear with velocity $v$, plus motor noise, which follows a Gaussian distribution with covariance matrix $\Sigma_z$:

$$p(z \mid v) = \prod_{t=1}^{T} p(z_t \mid z_{t-1}, v_t) = \prod_{t=1}^{T} \mathcal{N}(z_t;\, z_{t-1} + v_t,\, \Sigma_z). \qquad (5)$$

Finally, measurement noise is added independently at each time point:

$$p(x \mid z) = \prod_{t=1}^{T} p(x_t \mid z_t) = \prod_{t=1}^{T} \mathcal{N}(x_t;\, z_t,\, \Sigma_x). \qquad (6)$$

Motor and measurement noise are in principle distinguishable, because changes in the eye position due to motor noise are added over time, whereas measurement noise is independently added at each time point. We assume that both covariance matrices are isotropic: $\Sigma_z = \sigma_z^2 I$ and $\Sigma_x = \sigma_x^2 I$. Before we analyze data, we rescale the vertical dimension of the measured eye positions so that the isotropy assumption is approximately satisfied (see Preprocessing, later).

Step 0. Initialize $C$, $\hat\sigma_1$, $\hat d_1$, $\hat\sigma_0$.
Step 1. Estimate the motor and measurement noise: $\hat\sigma_z$, $\hat\sigma_x$.
Step 2. Estimate $\hat z$ from observations $x$: Kalman smoother.
Step 3. Sample from the posterior over $C$: MCMC.
Step 4. Estimate the velocity distribution parameters: maximum-likelihood estimation (MLE).
Return to Step 1.

Table 1. BMD algorithm.

Inference of the eye state time series C

Our goal is to infer the eye state time series $C$ given a time series of measured eye positions $x$, using the generative model. To perform optimal inference, we need to compute the posterior distribution over $C$. By Bayes's rule, this posterior is proportional to the product of the prior $p(C)$ and the likelihood $p(x \mid C)$:

$$p(C \mid x) \propto p(C)\, p(x \mid C). \qquad (7)$$

The prior can be directly evaluated using Equations 1 and 2, but computing the likelihood requires marginalization over nuisance parameters, the velocity time series $v$ and the eye position time series $z$, using the dependencies given by the generative model:

$$p(x \mid C) = \iint p(x \mid z)\, p(z \mid C, v)\, p(v \mid C)\, dv\, dz. \qquad (8)$$

Plugging in the functional form of these distributions, and performing some algebra (see Computation of the likelihood in the Appendix), yields the likelihood of $C$:

$$p(x \mid C) \propto \int \left\{ \prod_t e^{-\frac{(z_t - z_{t-1})^T (z_t - z_{t-1})}{2\sigma_z^2} - \frac{(x_t - z_t)^T (x_t - z_t)}{2\sigma_x^2}} \right\} \prod_i \left\{ \int p(v_{s_i} \mid C_{s_i})\, e^{-\frac{(s_{i+1} - s_i)\, v_{s_i}^T v_{s_i}}{2\sigma_z^2} + \frac{(z_{s_{i+1}} - z_{s_i})^T v_{s_i}}{\sigma_z^2}}\, dv_{s_i} \right\} dz. \qquad (9)$$

Approximate inference

The goal of our algorithm is to draw samples from the posterior $p(C \mid x)$. First we need to evaluate the likelihood in Equation 9. This is difficult, because we need to integrate both over the velocities at all change points, $\{v_{s_i}\}$, and over the eye position time series $z$. The velocity integral is numerically tractable, but the eye position one is not. Moreover, the likelihood also depends on the unknown parameters $\sigma_0$, $\sigma_1$, $d_1$, $\sigma_z$, and $\sigma_x$. A fully Bayesian algorithm would require priors over these parameters to jointly infer the parameters together with the eye state time series $C$. This too is intractable.

Instead, we use a multistep approximate inference algorithm, which we name Bayesian microsaccade detection (BMD), outlined in Table 1. A key idea in this algorithm is to replace the marginalization over $z$ by a single estimate, reminiscent of expectation maximization. Our algorithm then alternates between estimating $C$, $z$, and the parameters for six iterations, which in practice suffices for the estimates to converge. To run BMD on an eye position time series of 1 min (60,000 time steps) takes approximately 40 s on a MacBook Pro with an Intel Core i7 2.9 GHz processor and 8 GB of RAM.

Although BMD returns a probability over eye state at every time point, for most of the following analyses we will threshold these probabilities at 0.5 in order to obtain binary judgments. We now describe the details of the steps of the BMD algorithm.

Preprocessing

We split the eye position data into blocks of ~1 min, which we process independently. Before we perform inference, we preprocess the data to match the isotropy assumption of the measurement and motor noise in our generative model. To do so, we observe that within our model, eye velocity is piecewise constant, and therefore its derivative is zero except at change points. This means that the acceleration of the measured eye position depends only on the motor and measurement noise, except at change points. For this reason, we use the median absolute deviation of the acceleration to estimate the noise level. We calculate this quantity separately in the horizontal and vertical dimensions, and rescale the vertical-position time series by the ratio of the outcomes. After rescaling, the noise distribution is approximately isotropic.

The algorithm utilizes measured eye position at boundary unobserved time points $x_0$ and $x_{T+1}$. For these, we choose $x_0 = x_1 - \epsilon$ and $x_{T+1} = x_T + \epsilon$, where $\epsilon = (10^{-4}, 10^{-4})$ deg. We need to include the offset $\epsilon$ to avoid numerical instabilities in our implementation. Finally, we subtract the resulting value of $x_0$ from every point in the time series, so that $x_0 = 0$; this has no effect on the detected microsaccades.
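The preprocessing described above can be sketched as follows. This is a minimal sketch assuming the two dimensions arrive as separate horizontal and vertical arrays; the function name is ours and the boundary-point bookkeeping is omitted.

```python
import numpy as np

def preprocess(x, y):
    """Rescale the vertical trace so motor-plus-measurement noise is
    approximately isotropic: the acceleration of the measured position
    reflects only noise except at change points, so its median absolute
    deviation (MAD) gives a robust per-dimension noise estimate."""
    ax = np.diff(x, n=2)                      # discrete acceleration
    ay = np.diff(y, n=2)
    mad_x = np.median(np.abs(ax - np.median(ax)))
    mad_y = np.median(np.abs(ay - np.median(ay)))
    y_scaled = y * (mad_x / mad_y)            # match vertical noise to horizontal
    # start both series at zero; this does not affect detected microsaccades
    return x - x[0], y_scaled - y_scaled[0]
```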
Step 0: Initialize C, $\hat\sigma_1$, $\hat d_1$, $\hat\sigma_0$

We fix $d_0$ to 1. We initialize $\hat\sigma_0$, $\hat\sigma_1$, and $\hat d_1$ to random values drawn from reasonable ranges ($\hat\sigma_0$: [0.1, 5] deg/s, $\hat\sigma_1$: [5, 100] deg/s, and $\hat d_1$: [1.1, 5]). We initialize $C$ by setting $C_t$ to 1 for time points $t$ where $\|x_t - x_{t-1}\|$ is in the highest percentile, and to 0 otherwise.

Step 1: Estimate the motor and measurement noise

Our first goal is to estimate $\sigma_x$ and $\sigma_z$ given a measured eye position time series $x$ and an estimated eye state time series $\hat C$. As stated before, we can disentangle motor and measurement noise because, in our generative model, motor noise accumulates over time while measurement noise does not. Specifically, the autocovariance function of $x$ conditioned on $v$ at time lag $s$ is

$$\mathrm{cov}(x_t, x_{t-s}) = 2\sigma_z^2 s + 4\sigma_x^2. \qquad (10)$$

To use this relationship, we first estimate $v$ by fitting $x$ as a piecewise linear function with discontinuities at the change points of $C$. Then we calculate the empirical autocovariance function of the residual $\tilde{x}$,

$$c_{\mathrm{emp}}(s) = \frac{1}{T} \sum_{t = s + 1}^{T} \tilde{x}_t^T\, \tilde{x}_{t-s}, \qquad (11)$$

and fit this as a linear function of $s$; this gives a slope and a y-intercept. Our estimates of the motor noise and measurement noise are $\hat\sigma_z = \sqrt{\mathrm{slope}/2}$ and $\hat\sigma_x = \sqrt{\mathrm{intercept}/4}$.

Step 2: Estimate z from observations x with Kalman smoother

We cannot compute the likelihood of the eye state time series, $p(x \mid C)$ in Equation 9, because the integral over $z$ is both analytically and numerically intractable. However, the integral over $v_{s_i}$ depends only on $z_{s_{i+1}} - z_{s_i}$. The expected value of this difference is equal to $(s_{i+1} - s_i)\bar{v}$, where $\bar{v}$ is the average velocity between the change points; its standard deviation is of the order of $\sigma_z$. Therefore, if either $\bar{v}$ or $s_{i+1} - s_i$ is sufficiently large (we expect the former to hold for microsaccades and the latter for drift), we can neglect the uncertainty in $z_{s_{i+1}} - z_{s_i}$ and approximate it by a point estimate.

We obtain the point estimate of $z$ given $x$ by maximizing the first integral in Equation 9. This maximization turns out to be equivalent to applying a Kalman smoother to $x$ (Kalman, 1960; Welch & Bishop, 2006). In general, a Kalman smoother estimates the system state in a time interval from noisy observations during the same interval. The optimal estimate turns out to be a linear filter. We implement the Kalman smoother with the Rauch–Tung–Striebel (RTS) algorithm, which applies a Kalman filter to $x$ followed by another Kalman filter to the output of the first filter, backward in time (Rauch, Tung, & Striebel, 1965; Terejanu, 2008). The Kalman filter estimates the system state at each time only from earlier observations. In our case, the RTS algorithm reduces to

$$\hat{y}_t = \hat{y}_{t-1} + K_{\mathrm{forward}}\,(x_t - \hat{y}_{t-1}), \qquad (12)$$

with $K_{\mathrm{forward}} = \left(1 + \sqrt{1 + 4R^2}\right) / \left(1 + 2R^2 + \sqrt{1 + 4R^2}\right)$, where $R = \sigma_x / \sigma_z$, and

$$\hat{z}_t = \hat{y}_t + K_{\mathrm{backward}}\,(\hat{z}_{t+1} - \hat{y}_t), \qquad (13)$$

with $K_{\mathrm{backward}} = \left(\sqrt{1 + 4R^2} - 1\right) / \left(\sqrt{1 + 4R^2} + 1\right)$. For more details, see Kalman smoother in the Appendix.

Given our generative model, the Kalman smoother is the optimal filter to denoise the measured eye position. The EyeLink eye tracker software also has a denoising option, called ''Heuristic filter,'' which is based on an algorithm by Stampe (1993). This filter is suboptimal given our generative model and therefore, assuming that our generative model is realistic, will perform worse in separating signal from noise than the Kalman smoother.

Step 3: Sample from the posterior over the eye state time series C

We draw samples from the posterior $p(C \mid \hat{z}, \sigma_0, \sigma_1, d_1, \sigma_z)$ using MCMC sampling with Metropolis–Hastings acceptance probabilities. Using the prior over velocities, Equation 3, and the property of the delta function, we can compute the posterior as

$$p(C \mid \hat{z}) \propto \prod_i \int p(v_{s_i} \mid C_{s_i})\, e^{-\frac{(s_{i+1} - s_i)\, v_{s_i}^T v_{s_i}}{2\sigma_z^2} + \frac{(\hat{z}_{s_{i+1}} - \hat{z}_{s_i})^T v_{s_i}}{\sigma_z^2}}\, dv_{s_i}. \qquad (14)$$

Each term in this product is an independent integral over $v_{s_i}$, which depends only on $\hat{z}_{s_{i+1}} - \hat{z}_{s_i}$, $s_{i+1} - s_i$, and implicitly the eye state $C_{s_i}$ through the parameters $d$ and $\sigma$ in the prior $p(v_{s_i})$. We can therefore write

$$p(\hat{z} \mid C) = \prod_i I\!\left(\hat{z}_{s_{i+1}} - \hat{z}_{s_i};\; s_{i+1} - s_i;\; d_{C_{s_i}};\; \sigma_{C_{s_i}}\right), \qquad (15)$$

with

$$I(\Delta z; \Delta s; d; \sigma) = \frac{1}{2\pi} \iint p(r \mid d, \sigma)\, e^{-\frac{\Delta s\, r^2}{2\sigma_z^2} + \frac{1}{\sigma_z^2}\, \Delta z^T \binom{r\cos\theta}{r\sin\theta}}\, dr\, d\theta. \qquad (16)$$
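The Step 2 recursions amount to a steady-state forward-backward smoother. The following one-dimensional sketch uses the closed-form gains from Equations 12 and 13; the boundary initializations and the arrangement of the backward pass as $\hat z_t = \hat y_t + K_{\mathrm{backward}}(\hat z_{t+1} - \hat y_t)$ are our reading of the steady-state RTS recursion. Run it once per dimension.

```python
import numpy as np

def rts_smooth(x, sigma_x, sigma_z):
    """Steady-state RTS smoother for a random-walk state observed in noise:
    forward Kalman pass with gain K_f, backward pass with gain K_b,
    where R = sigma_x / sigma_z."""
    R2 = (sigma_x / sigma_z) ** 2
    root = np.sqrt(1.0 + 4.0 * R2)
    K_f = (1.0 + root) / (1.0 + 2.0 * R2 + root)   # forward gain
    K_b = (root - 1.0) / (root + 1.0)              # backward gain
    y = np.empty_like(x)
    y[0] = x[0]                                    # our initialization
    for t in range(1, len(x)):                     # forward filter
        y[t] = y[t - 1] + K_f * (x[t] - y[t - 1])
    z = np.empty_like(x)
    z[-1] = y[-1]
    for t in range(len(x) - 2, -1, -1):            # backward filter
        z[t] = y[t] + K_b * (z[t + 1] - y[t])
    return z
```

As a sanity check on the gains: with no measurement noise (R = 0), K_f = 1 and K_b = 0, so the smoother returns the data unchanged; as R grows, both passes smooth more heavily.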
according to the manual (but see Holmqvist et al., 2011; Poletti & Rucci, 2016). We acquired eye position data with the EyeLink software. We set the ''Heuristic filter'' option to off to obtain the raw data.

Procedure

Subjects performed a delayed-estimation-of-orientation task, as introduced by Wilken and Ma (2004). A trial sequence started with the appearance of a central white fixation cross subtending a visual angle of 0.3 deg, which lasted for 500 ms or until the subject successfully fixated. We defined fixation to be successful when the eye position remained within a 2 deg circle centered at the fixation cross. Next, two stimuli appeared 6 deg to the right and left of the central fixation cross. The stimuli were circularly windowed gratings with radius 0.35 deg, spatial frequency 2.5 cycles/deg, and uniformly drawn orientation. The stimuli stayed on the screen for 11 frames (about 110 ms), followed by a delay period of 1000 ms. If the subject broke fixation at any point during the stimulus or delay period, the trial was aborted and a new trial sequence started. We eliminated these trials from our data set. After the delay period, the subject was probed about one of the locations and responded by using the mouse to estimate the orientation. More precisely, when the subject moved the mouse, a windowed grating appeared inside that circle. The subject had to rotate it using the mouse to match the orientation of the grating that had been in that location, and then press the space bar to submit a response. The experiment consisted of eight blocks, each consisting of 60 completed (nonaborted) trials, with 30-s breaks in between blocks.

Dual Purkinje Image experimental methods

The Dual Purkinje Image (DPI) eye tracker data were made available by Martina Poletti and Michele Rucci. Their study was approved by the institutional review board of Boston University. The method and data were described in detail elsewhere (Cherici, Kuang, Poletti, & Rucci, 2012); we summarize them here.

Apparatus

Stimuli were presented on a custom-developed system for flexible gaze-contingent display control on a fast-phosphor CRT monitor (Iiyama HM204DT) with a vertical refresh rate of 150 Hz. The movements of the right eye were measured with a Generation 6 DPI eye tracker (Fourward Technologies, Buena Vista, VA) at a 1-kHz sampling rate. While most video-based eye trackers detect only the first corneal reflection (Purkinje reflection), DPI eye trackers detect both the first and fourth Purkinje reflections, allowing discrimination between eye movements of rotation and translation. The DPI eye tracker has a high precision, of 0.006 deg (Cherici et al., 2012; Crane & Steele, 1985).

Procedure

Subjects observed the screen with the right eye while wearing an eye patch on their left eye. A dental-imprint bite bar and a headrest prevented head movements. Subjects were asked to maintain sustained fixation while looking at a marker displayed on the screen. Two subjects performed the task.

Results

Comparison of algorithms on simulated data

We created 36 data sets with eye position time series of length T = 60,000 ms according to the generative model. We created every combination of six chosen values of motor noise and six values of measurement noise. We fixed the velocity distribution parameters at σ0 = 0.3 deg/s, d1 = 4.4, and σ1 = 30 deg/s, to approximate realistic microsaccade kinematics (Engbert, 2006). We inferred the eye state time series with the BMD algorithm and the standard EK algorithm, which uses a velocity threshold multiplier of 6 (referred to as EK6). After thresholding the BMD inferences, we evaluated their performance in terms of the hit rate (defined as the proportion of 1s correctly identified in the C time series) and the false-alarm rate (the proportion of 1s wrongly identified in the C time series; Figure 4). While the velocity distribution parameters were not perfectly recovered (Figure A3), the BMD hit rates were very high (Figure 4A). The hit rate of the BMD algorithm decreases with increased motor noise, as in standard signal detection theory, but it is remarkably robust to increased measurement noise. By contrast, the hit rate of EK6 is lower and more affected by the noise level. In EK6, the false-alarm rate decreases with increasing noise because the threshold adapts to the noise level. Across the board, BMD has false-alarm rates comparable to EK6's but much higher hit rates, especially at high noise.

For a more comprehensive evaluation, we also compare BMD against OM and an EK variant with a velocity threshold multiplier of 3 (EK3; Figure 5). As performance metrics, we use the error rate in identifying the eye state at every time point, the number of microsaccades per unit time, and the hit and false-alarm rates. BMD has a lower error rate than all alternative algorithms in 30 out of 36 noise levels. As in Figure 4, the improvement of BMD over alternative
Figure 4. Performance of the BMD and EK6 algorithms on simulated data. (A) Hit rates of the BMD algorithm as a function of the motor noise σz for several values of measurement noise σx. Points and error bars represent means and standard errors across eight simulated data sets. (B) Hit rates of the EK6 algorithm. (C) Scatterplot comparing hit rates of both algorithms. Each point corresponds to a different pair (σz, σx). (D–F) The same as (A–C) for false-alarm rates.
algorithms is larger for higher noise. BMD has a hit rate close to 1 in all but the highest level of motor noise, whereas the false-alarm rate is comparable to those of other algorithms. The BMD algorithm is more robust than all other algorithms: Its hit rate and microsaccade rate vary only weakly with increasing measurement noise.

As expected from signal detection theory, there is a trade-off between false alarms and misses in the EK algorithm. EK6 is too conservative, leading to more misses than BMD; however, EK3 is too permissive and has more false alarms. To test whether the EK algorithm with any threshold can match BMD's performance, we compute a receiver operating characteristic (ROC; Figure 6). At low noise, both BMD and EK perform close to perfectly. Overall, BMD outperforms or matches EK at all other noise levels. However, in cases where BMD performance matches that of EK, BMD intersects the EK ROC curves for different thresholds at different noise levels. This makes choosing a single best threshold problematic.

Applications to real data

The results on simulated data suggest that BMD recovers microsaccades more faithfully than alternative algorithms, especially at high noise. This confirms that the approximations in our inference algorithm do not significantly impair its performance. However, we created data according to our generative model, so we expected the BMD algorithm to be superior. Next, we apply our algorithm to real eye-tracking data measured with two different eye trackers: EyeLink and DPI.

EyeLink data

In Figure 7, we show six example measured eye position sequences and the inferred change points by BMD and EK6. When the signal-to-noise ratio is high (Figure 7A through C), BMD generally infers the same microsaccades as EK6. Additionally, BMD returns a probabilistic judgment of the beginning and end time of the microsaccade. In some cases, BMD detects a small microsaccade immediately after a larger one, in the opposite direction (Figure 7B, C), corresponding to the overshoot. For low signal-to-noise data (Figure 7D through F), the BMD algorithm tends to detect potential microsaccades that EK6 misses, but they could be false positives. BMD assigns low confidence to its judgments in ambiguous cases like Figure 7D and F.

The microsaccades detected by BMD have similar kinematics as previously reported (Engbert, 2006; Figure A4). The inferred velocity and duration distributions of BMD and EK6 are similar, except for the duration cutoff in EK6. Most importantly, the microsaccades detected by BMD follow the main sequence: Their amplitude is monotonically related to their peak velocity (Zuber, Stark, & Cook, 1965). As in Engbert and Kliegl (2003), we consider the approximate recovery of the main-sequence relationship to be evidence for the validity of our detection algorithm. Our algorithm estimates the mean velocity for drift as 0.1253 deg/s for all but one subject, and 22.64 ± 8.4 deg/s (mean and
Figure 5. Performance of several algorithms on simulated data. Colors represent four different algorithms: OM, two versions of EK, and BMD. We evaluate performance with four different metrics: (A) error rate, (B) microsaccade rate, (C) hit rate, and (D) false-alarm rate. The motor noise σz increases across columns, and the measurement noise σx increases within each subplot. BMD has the lowest error rates at high noise levels and is the most robust against increases in both σz and σx. BMD hit rates and microsaccade rates are the most robust against increases in either σz or σx, without a major increase in false-alarm rates.
standard error across subjects) for microsaccades. These values are in line with literature reports: mean drift velocity of 0.85 deg/s (Poletti, Aytekin, & Rucci, 2015) and below 0.5 deg/s (Engbert, 2006; Rolfs, 2009), and mean microsaccade velocity of approximately 30 deg/s.

Overall, BMD detects more microsaccades than EK6 for all five subjects (Figure A5). This difference can be dramatic: For two subjects (S3 and S4), EK6 infers no microsaccades at all, whereas BMD infers microsaccade rates up to 2.1 per second. This further suggests that EK6 is too conservative and misses microsaccades when the measurement noise is high. The other algorithms (OM and EK3) are less conservative, but their inferred microsaccade rates vary widely, reinforcing the need for a more principled microsaccade detection algorithm.

Finally, we ask how dependent the microsaccade rate inferred by BMD is on the choice of parameters in the priors over the frequency and duration of microsaccades. We vary both k0 and k1 by an order of magnitude and show that the inferred microsaccade rate is approximately constant (Figure A6), making the BMD algorithm robust to the choice of the prior in a plausible range.

These results suggest that BMD outperforms EK6 with real data. Specifically, BMD detects many plausible microsaccades that EK6 misses, especially when their amplitude is small and the noise is high. However, an alternative interpretation is that BMD detects false positives. We cannot distinguish these possibilities because, in contrast to the simulated data, we do not know the ground truth. In general, we know that all four algorithms give different inferences, but without ground truth we have no way of establishing which one is better.

DPI data

To address this problem, we use another data set, provided by Poletti and Rucci (Cherici et al., 2012).
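The prior-sensitivity check above rescales k0 and k1 in the duration prior of Equation 2, a gamma distribution with shape 2 and scale 1/k, whose mean is 2/k; scaling k by an order of magnitude therefore scales typical state durations by the same factor. A minimal sketch, with illustrative values of our own choosing:

```python
import numpy as np

def sample_durations(k, n, rng):
    """Draw state durations (ms) from the Equation 2 prior,
    p(dur) proportional to dur * exp(-k * dur): a gamma distribution
    with shape 2 and scale 1/k, hence mean 2/k."""
    return rng.gamma(shape=2.0, scale=1.0 / k, size=n)

rng = np.random.default_rng(0)
base = sample_durations(0.01, 100000, rng).mean()    # mean near 2/0.01 = 200 ms
scaled = sample_durations(0.1, 100000, rng).mean()   # k times 10: mean near 20 ms
```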
Figure 6. Performance of the algorithms on simulated data visualized relative to the EK ROC curve. The red and green dots represent
the combination of hit rate and false-alarm rate for BMD and OM, respectively. The EK ROC curves were created with different values
of the threshold multiplier k. EK3 and EK6 correspond to points on the curve. For all noise levels tested, including the ones presented
here, BMD either outperforms both OM and EK or matches EK.
These eye movements were measured with the more precise DPI eye tracker (Cherici et al., 2012; Crane & Steele, 1985). Indeed, BMD infers that the geometric mean of the measurement noise level in DPI data is almost an order of magnitude lower than in EyeLink data (Table 2). In simulated data with the same noise level as BMD infers for DPI, all algorithms perform close to perfectly. In view of this high performance, we can treat the microsaccades inferred from the raw DPI data (averaged across algorithms) as ground truth. Our strategy is to artificially add increasing amounts of measurement noise to the raw data and see how much the inference of each algorithm degrades as a result. This allows us to compare the robustness of the algorithms with an objective metric.
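This noise-addition protocol is straightforward to sketch; the helper names and the per-sample agreement metric below are ours, not the paper's implementation.

```python
import numpy as np

def add_measurement_noise(x, sigma, rng):
    """Add i.i.d. Gaussian measurement noise (deg) to a position trace,
    mimicking the robustness test applied to the raw DPI data."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def agreement(labels_a, labels_b):
    """Fraction of time points at which two binary eye-state series agree;
    comparing against the clean-data labels gives an objective score."""
    return np.mean(np.asarray(labels_a) == np.asarray(labels_b))
```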
Figure 7. Inferences of microsaccades by BMD and EK6 on example eye position sequences measured with the EyeLink eye tracker.
The black and white shading represents the probability that the eye is in a microsaccade state, with black indicating certainty. Every
subplot shows the BMD inference in the top half and the EK6 inference in the bottom half. (A–C) Often, BMD and EK6 infer nearly
identical microsaccade sequences. (D–F) BMD infers potential microsaccades that EK6 misses, especially when they are small or noisy.
Table 2. Estimated model parameters for each subject: σ̂_z (deg/√ms), σ̂_x (deg), d̂_1, σ̂_0 (deg/s), and σ̂_1 (deg/s).
We compare the error rates as well as the microsaccade rates, hit rates, and false-alarm rates between BMD, OM, EK3, and EK6 (Figure 8). BMD outperforms EK3, EK6, and OM at all except the lowest noise levels. In particular, at measurement noise levels comparable to the ones inferred in EyeLink data (0.02 deg), the error rate for EK6 is 3.22% (averaged across subjects), while BMD achieves 1.48%, a 54% improvement. Note that all algorithms have low error rates, primarily because microsaccades are rare. As in simulated data, we compare BMD to EK with different thresholds by plotting an ROC curve; BMD outperforms EK regardless of its threshold (Figure 9).

Variants of BMD

A common risk in Monte Carlo methods is that the samples aggregate near potential local maxima of the posterior and miss the global maximum. One method to mitigate this problem, albeit at increased computational cost, is parallel tempering (Earl & Deem, 2005; Newman & Barkema, 1999). BMD with parallel tempering does not significantly outperform BMD either for simulated data (Figures 10 and 11) or for real DPI data with added noise (Figure 12), suggesting that the posterior probability landscape did not contain many local maxima.

To investigate which components of our method are necessary for its performance, we compare BMD against three reduced variants. We obtain the first variant by reducing the number of iterations in the approximate inference method from six to two. The second variant has only one iteration, which is equivalent to applying a Kalman smoother to obtain ẑ from x, then sampling from p(C|ẑ). Finally, a third version, BMDreduced + threshold, starts with Steps 0–2 of the BMD algorithm. However, instead of sampling from the posterior p(C|ẑ) in Step 3, it estimates C by applying a Kalman smoother (after the Kalman smoother of Step 2) to ẑ to obtain a smoothed eye position time series, differentiating that to obtain eye velocities, and thresholding the velocity
Figure 8. Performance of the algorithms on DPI data. We took DPI data from two subjects (rows), collected by Cherici et al. (2012),
and artificially added measurement noise to the eye position traces. Colors represent algorithms. BMD shows the highest robustness
to adding measurement noise; specifically, error rates are lowest and hit rates tend to stay the same.
Figure 9. Performance of the algorithms on DPI data with added noise relative to the EK ROC curve. We show the same two subjects
(S1 and S2) as in Figure 8. The level of the added measurement noise varies across columns. The EK ROC curves were created with
different values of the threshold multiplier k. EK3 and EK6 correspond to points on the curve. As more measurement noise is added,
BMD outperforms EK and OM by larger amounts.
time series (Figure 13). We fix the window size of the Kalman smoother to 5.32 ms and use a threshold which scales linearly with the inferred motor noise level: threshold = a·σ̂_z + b, with a = 32 s^{-1/2} and b = 1 deg/s. We chose these values to approximately match the output of BMD and BMDreduced + threshold in real and simulated data. This method performs about as well as the full inference algorithm. However, it is unprincipled, does not return a probabilistic estimate, and cannot be directly extended to more sophisticated generative models.

Discussion

We developed a Bayesian algorithm for detecting microsaccades among drift/tremor; it returns probabilistic rather than binary judgments. Given our assumptions about the statistical process generating a measured eye position time series, this algorithm is optimal. BMD has lower error rates than the algorithms proposed by Engbert and Kliegl (2003) and Otero-Millan et al. (2014), especially at high noise. This is a particularly useful feature given the relatively high measurement noise of current infrared eye trackers. However, a hybrid between BMD and velocity-threshold algorithms, BMDreduced + threshold, can sometimes approach BMD's performance.
In our model, microsaccades are defined through prior probability distributions over velocity and duration that are different from those for drift/tremor (Figure 2). This definition contrasts with the more common one that uses an arbitrary velocity threshold. The BMD algorithm (and the actual code) allows researchers to easily build in their own prior beliefs and state clearly which of their findings depend on those beliefs.
We designed the BMD algorithm for off-line analysis of eye tracker data. An online detection method—for example for closed-loop experiments that require real-time detection of microsaccades, such as in Chen and Hafed (2013) and Yuval-Greenberg et al. (2014)—would require a modified inference algorithm. If it is crucial to detect microsaccades online, we recommend using BMDreduced + threshold, with a Kalman filter (only the forward filter) instead of the Kalman smoother.
We designed and tested BMD for detecting microsaccades in fixational eye movement data obtained under head-fixed conditions, where the fixation point does not move. Would the algorithm readily apply to other kinds of eye movement data? First, head-free recordings are sometimes used in order to better mimic naturalistic conditions (Benedetto, Pedrotti, & Bridgeman, 2011; Martinez-Conde, Macknik, Troncoso, & Dyar, 2006; Poletti et al., 2015). In theory, our algorithm is suitable for inferring microsaccades in head-free recordings. However, studies have reported higher velocities for drift in head-free fixation (Poletti et al., 2015; Skavenski, Hansen, Steinman, & Winterson, 1979)—for example, on average 1.5 deg/s for head-free versus 0.85 deg/s for head-fixed in Poletti et al. (2015). Therefore, we expect the velocity distributions presented in Figure 3B to be less separable, which in turn would impair microsaccade detection. Second, our algorithm is not immediately applicable to smooth pursuit, in which the eye continuously tracks the motion of an object. Santini, Fuhl, Kubler, and Kasneci (2016) used a Bayesian classification algorithm to separate drift, saccades, and smooth pursuit based on features derived from the eye position data, but this
Figure 10. Performance of BMD variants on simulated data. The variants we examine are BMD with parallel tempering, BMD with fewer iterations (two and one), and a reduced variant of BMD with a threshold (BMDreduced + threshold). In the latter model, the threshold is dependent on motor noise through the equation threshold = 32 s^{-1/2} σ̂_z + 1 deg/s, chosen because it gave the lowest error rates on DPI data. The motor noise σ_z increases across columns, and the measurement noise σ_x increases within each subplot. BMD with parallel tempering is only a slight improvement over BMD, while BMD performs slightly better than BMD with two and one iterations. BMDreduced + threshold only performs comparably with BMD under high motor and measurement noise.
Figure 11. Performance of BMD variants on simulated data visualized relative to the ROC curves for BMDreduced + threshold. Here we show hit rates and false-alarm rates as points for the BMD variants, and ROC curves for BMDreduced + threshold for several values of the threshold multiplier. In contrast to Figure 10, where we choose one threshold, here we see that the BMD-variant points are on the ROC curves at low noise (first two subplots). However, as the motor and measurement noise increase (last two subplots), the full ROC curve can reach higher performance than the variant BMD algorithms.
Figure 12. Performance of BMD variants on DPI data to which we add measurement noise. We show the same two subjects (S1 and S2) as in Figure 8. We measure performance on the same metrics as before. For brevity, we show in (A) the error rates with a fixed threshold. In (B), we show the hit rates and false-alarm rates for the variant BMD algorithms relative to the ROC curve for BMDreduced + threshold. Adding parallel tempering to BMD makes little difference. Using fewer iterations negatively affects the hit rate. BMDreduced + threshold gives slightly lower error rates and seems to match the performance of BMD.
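The reduced variant evaluated in Figures 10 through 12 can be sketched compactly. The following is a minimal Python illustration, not the authors' MATLAB implementation: it uses the steady-state Kalman gain of Equation 27, a simple forward-backward averaging pass as a stand-in for the smoother, and the velocity threshold 32 s^{-1/2} σ_z + 1 deg/s. The function names, parameter names, and unit conventions are illustrative assumptions.

```python
import numpy as np

def steady_state_gain(sigma_x, sigma_z):
    """Steady-state Kalman gain for a random-walk state observed in
    Gaussian noise (Equation 27), with R = sigma_x / sigma_z."""
    R2 = (sigma_x / sigma_z) ** 2
    root = np.sqrt(1.0 + 4.0 * R2)
    return (1.0 + root) / (1.0 + 2.0 * R2 + root)

def smooth(x, K):
    """Average of a forward and a backward steady-state Kalman filter:
    a crude stand-in for the forward-backward (RTS) smoother."""
    def filt(sig):
        y = np.empty_like(sig)
        y[0] = sig[0]
        for t in range(1, len(sig)):
            y[t] = y[t - 1] + K * (sig[t] - y[t - 1])
        return y
    return 0.5 * (filt(x) + filt(x[::-1])[::-1])

def detect(x, dt, sigma_x, sigma_z, a=32.0, b=1.0):
    """Label samples whose smoothed speed exceeds a * sigma_z' + b,
    where sigma_z' = sigma_z / sqrt(dt) converts the per-sample motor
    noise SD (deg) to deg * s^(-1/2); a is in s^(-1/2), b in deg/s.
    x: eye position in deg, dt: sample interval in s."""
    x = np.asarray(x, dtype=float)
    z_hat = smooth(x, steady_state_gain(sigma_x, sigma_z))
    speed = np.abs(np.gradient(z_hat, dt))   # smoothed velocity, deg/s
    return speed > a * sigma_z / np.sqrt(dt) + b
```

On a synthetic 1-kHz trace containing slow drift plus a brief 30 deg/s excursion, only the excursion samples exceed the threshold.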
Figure 13. Schematic comparison of microsaccade detection algorithms. All algorithms first perform a filtering operation to eliminate the noise from the measured eye position time series x. EK removes noise with a heuristically chosen filter; in contrast, BMD and BMDreduced + threshold use a Kalman smoother, which optimally eliminates measurement noise in our generative model. EK estimates the eye state time series by taking the derivative of the eye position to yield the eye-velocity time series, then thresholding those velocities. BMD, on the other hand, marginalizes over velocity and samples from the posterior distribution over eye states. BMDreduced + threshold uses a second Kalman smoother to eliminate some of the motor noise and ultimately uses a velocity threshold which depends on the motor noise.

Conceptual extensions of the algorithm

The inferred microsaccades depend on assumptions in our generative model, which are simplistic and incorrect. We can flexibly adjust these assumptions in the generative model and modify the BMD algorithm accordingly.

Correlated state durations

Our generative model assumes that the durations over which the eye remains in either state are independent. We can relax this assumption by changing the duration prior; this does not affect the likelihood.
Martinez-Conde, S., Macknik, S. L., Troncoso, X. G., & Dyar, T. A. (2006). Microsaccades counteract visual fading during fixation. Neuron, 49(2), 297–305, doi:10.1016/j.neuron.2005.11.033.
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21, 1087–1092.
Newman, M. E. J., & Barkema, G. T. (1999). Monte Carlo methods in statistical physics. Oxford, UK: Clarendon Press.
Nyström, M., Hansen, D. W., Andersson, R., & Hooge, I. (2016). Why have microsaccades become larger? Investigating eye deformations and detection algorithms. Vision Research, 118, 17–24, doi:10.1016/j.visres.2014.11.007.
Otero-Millan, J., Castro, J., Macknik, S. L., & Martinez-Conde, S. (2014). Unsupervised clustering method to detect microsaccades. Journal of Vision, 14(2):18, 1–17, doi:10.1167/14.2.18.
Otero-Millan, J., Troncoso, X. G., Macknik, S. L., Serrano-Pedraza, I., & Martinez-Conde, S. (2008). Saccades and microsaccades during visual fixation, exploration, and search: Foundations for a common saccadic generator. Journal of Vision, 8(14):21, 1–18, doi:10.1167/8.14.21.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10(4), 437–442, doi:10.1163/156856897x00366.
Poletti, M., Aytekin, M., & Rucci, M. (2015). Head-eye coordination at a microscopic scale. Current Biology, 25(24), 3253–3259, doi:10.1016/j.cub.2015.11.004.
Poletti, M., Listorti, C., & Rucci, M. (2013). Microscopic eye movements compensate for nonhomogeneous vision within the fovea. Current Biology, 23(17), 1691–1695, doi:10.1016/j.cub.2013.07.007.
Poletti, M., & Rucci, M. (2016). A compact field guide to the study of microsaccades: Challenges and functions. Vision Research, 118, 83–97, doi:10.1016/j.visres.2015.01.018.
Ratliff, F., & Riggs, L. A. (1950). Involuntary motions of the eye during monocular fixation. Journal of Experimental Psychology, 40(6), 687–701.
Rauch, H. E., Tung, F., & Striebel, C. T. (1965). Maximum likelihood estimates of linear dynamic systems. AIAA Journal, 3(8), 1445–1450.
Rolfs, M. (2009). Microsaccades: Small steps on a long way. Vision Research, 49(20), 2415–2441, doi:10.1016/j.visres.2009.08.010.
Rolfs, M., Engbert, R., & Kliegl, R. (2004). Microsaccade orientation supports attentional enhancement opposite a peripheral cue: Commentary on Tse, Sheinberg, and Logothetis (2003). Psychological Science, 15(10), 705–707, doi:10.1111/j.0956-7976.2004.00744.x.
Rucci, M., Iovin, R., Poletti, M., & Santini, F. (2007). Miniature eye movements enhance fine spatial detail. Nature, 447(7146), 852–855, doi:10.1038/nature05866.
Salvucci, D. D., & Anderson, J. R. (1998). Tracing eye movement protocols with cognitive process models. In Proceedings of the Twentieth Annual Conference of the Cognitive Science Society (pp. 923–928). Hillsdale, NJ: Lawrence Erlbaum Associates.
Salvucci, D. D., & Goldberg, J. H. (2000). Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the Symposium on Eye-Tracking Research & Applications (pp. 71–78). New York: ACM.
Santini, T., Fuhl, W., Kubler, T., & Kasneci, E. (2016). Bayesian identification of fixations, saccades, and smooth pursuits. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, doi:10.1145/2857491.2857512.
Skavenski, A., Hansen, R., Steinman, R., & Winterson, B. (1979). Quality of retinal image stabilization during small natural and artificial body rotations in man. Vision Research, 19(6), 675–683, doi:10.1016/0042-6989(79)90243-8.
Stacy, E. W. (1962). A generalization of the gamma distribution. The Annals of Mathematical Statistics, 33(3), 1187–1192.
Stampe, D. M. (1993). Heuristic filtering and reliable calibration methods for video-based pupil-tracking systems. Behavior Research Methods, Instruments, & Computers, 25(2), 137–142, doi:10.3758/bf03204486.
Terejanu, G. (2008). Crib sheet: Linear Kalman smoothing [Online tutorial]. Retrieved from https://cse.sc.edu/~terejanu/files/tutorialKS.pdf
Träisk, F., Bolzani, R., & Ygge, J. (2005). A comparison between the magnetic scleral search coil and infrared reflection methods for saccadic eye movement analysis. Graefe's Archive for Clinical and Experimental Ophthalmology, 243(8), 791–797, doi:10.1007/s00417-005-1148-3.
Welch, G., & Bishop, G. (2006). An introduction to the Kalman filter [Online tutorial]. Retrieved from http://www.cs.unc.edu/~welch/media/pdf/kalman_intro.pdf
Figure A1. Details of solving the integral $A(d, a) = \int f(s)\, ds$, with $f(s) = s^d e^{-as^2} I_0(s)$. (A) $\log f(s)$ for several combinations of s, d, and a. For larger values of a, f(s) is concentrated at lower values of s. For such values, we use the Taylor series expansion of $I_0(s)$. However, for smaller values of a, the larger values of s contribute substantially to the integral and therefore we use the large-s approximation of $I_0(s)$. These analytical approximations are much faster than interpolation, though they come at the cost of approximation errors. (B) We limit the usage of approximations to ensure that the total approximation error of the integral $A(d, a)$ is less than 0.003. In white and gray, we show the parameter regions that satisfy this criterion.
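The regime split that Figure A1 describes can be checked numerically. Below is a sketch in Python (assuming NumPy and SciPy; the authors used MATLAB's integral command): it evaluates A(d, a) by quadrature and compares it against the two asymptotic approximations, Equations 31 and 32 in the text.

```python
import numpy as np
from scipy import integrate, special

def A(d, a):
    """A(d, a) = integral_0^inf s^d exp(-a s^2) I0(s) ds, evaluated
    numerically. i0e(s) = exp(-s) I0(s) keeps the integrand finite."""
    peak = max(1.0, 1.0 / (2.0 * a))      # rough location of the peak
    f = lambda s: s**d * np.exp(-a * s * s + s) * special.i0e(s)
    val, _ = integrate.quad(f, 0.0, 40.0 * peak, points=[peak], limit=500)
    return val

def logA_taylor(d, a):
    """Large-a approximation: first two terms of the Taylor series of
    I0 around s = 0 (Equation 32)."""
    return (-np.log(2.0) - 0.5 * (d + 1.0) * np.log(a)
            + special.gammaln(0.5 * (d + 1.0))
            + np.log1p((d + 1.0) / (8.0 * a)))

def logA_large_s(d, a):
    """Small-a approximation: I0(s) ~ exp(s)/sqrt(2 pi s) together with
    a Laplace (saddle-point) evaluation of the integral (Equation 31)."""
    return 1.0 / (4.0 * a) - d * np.log(2.0 * a)

# each approximation tracks the numerical value in its own regime
print(np.log(A(4.4, 10.0)) - logA_taylor(4.4, 10.0))
print(np.log(A(4.4, 0.01)) - logA_large_s(4.4, 0.01))
```

In these spot checks the Taylor-based approximation is accurate to within about 0.01 in log A at a = 10, and the large-s approximation to within about 0.2 at a = 0.01; pushing a further toward its limit shrinks the respective errors, which is what motivates the bounds a_0(d) and a_∞(d).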
across time. Plugging this Kalman gain into Equation 12 allows us to express the state estimation equation for $\hat{y}_t$ in terms of the previous estimate $\hat{y}_{t-1}$, current observation $x_t$, and process and noise standard deviations:

$$\hat{y}_t = \hat{y}_{t-1} + \frac{\sigma_z^2 + \sqrt{\sigma_z^4 + 4\sigma_z^2\sigma_x^2}}{2\sigma_x^2 + \sigma_z^2 + \sqrt{\sigma_z^4 + 4\sigma_z^2\sigma_x^2}} \left( x_t - \hat{y}_{t-1} \right). \qquad (26)$$

We denote $R = \sigma_x/\sigma_z$ and then get

$$\hat{y}_t = \hat{y}_{t-1} + \frac{1 + \sqrt{1 + 4R^2}}{1 + 2R^2 + \sqrt{1 + 4R^2}} \left( x_t - \hat{y}_{t-1} \right). \qquad (27)$$

$$= \frac{2^{\frac{1-d}{2}}}{\Gamma\!\left(\frac{d+1}{2}\right)\sigma^{d+1}} \int_0^\infty r^d e^{-\frac{r^2}{2}\left(\frac{1}{\sigma^2} + \frac{\Delta s^2}{\sigma_z^2}\right)} \times \frac{1}{2\pi} \int_0^{2\pi} e^{\frac{r\|\Delta z\|}{\sigma_z^2} \cos(\theta - \phi)} \, d\theta \, dr$$

$$= \frac{2^{\frac{1-d}{2}}}{\Gamma\!\left(\frac{d+1}{2}\right)\sigma^{d+1}} \int_0^\infty r^d e^{-\frac{r^2}{2}\left(\frac{1}{\sigma^2} + \frac{\Delta s^2}{\sigma_z^2}\right)} I_0\!\left(\frac{r\|\Delta z\|}{\sigma_z^2}\right) dr, \qquad (29)$$

where $I_0$ is the modified Bessel function of the first kind of order zero. Next we change variables from $r$ to $s = \frac{r\|\Delta z\|}{\sigma_z^2}$ and write the final form of the integral
Figure A3. Parameter recovery in simulated data. In all simulated data sets, we fixed the velocity distribution parameters at $d_1 = 4.4$, $\sigma_0 = 0.0003$ deg/ms, and $\sigma_1 = 0.03$ deg/ms. For every combination of six motor-noise values and six measurement-noise values (colors), we created eight data sets. Here we show the median across the eight data sets of the inferred parameter values as a function of the true value of the same parameter—(A) motor noise, (B) measurement noise—or as a function of the true measurement noise $\sigma_x$, in the case of the velocity distribution parameters (C) $d_1$, (D) $\sigma_0$, and (E) $\sigma_1$. The dashed black lines correspond to perfect parameter recovery. While these parameters are not always faithfully recovered, the inferred eye state time series C is recovered to a great degree of accuracy (Figure 4).
$$\times \int_0^\infty s^d \exp\left(-s^2 \, \frac{\sigma_z^4}{2\|\Delta z\|^2}\left(\frac{1}{\sigma^2} + \frac{\Delta s^2}{\sigma_z^2}\right)\right) I_0(s) \, ds. \qquad (30)$$

This expression shows that we can calculate $I(\Delta z, \Delta s, d, \sigma)$ by evaluating the integral

$$A(d, a) = \int_0^\infty s^d e^{-as^2} I_0(s) \, ds,$$

and plugging in $a = \frac{\sigma_z^4}{2\|\Delta z\|^2}\left(\frac{1}{\sigma^2} + \frac{\Delta s^2}{\sigma_z^2}\right)$.

Unfortunately, this integral appears to have no general analytic solution. However, in the limit of small or large $a$, we can replace the Bessel function with asymptotic approximations and solve the resulting integrals. Specifically, we define upper and lower bounds $a_\infty(d)$ and $a_0(d)$. For $a < a_0(d)$, we use the large-$s$ approximation to the Bessel function (Abramowitz & Stegun, 1965), $I_0(s) \approx \frac{e^s}{\sqrt{2\pi s}}$, so that

$$\log A(d, a) \approx \log \int_0^\infty s^d e^{-as^2} \frac{e^s}{\sqrt{2\pi s}} \, ds \approx \frac{1}{4a} - d \log a - d \log 2. \qquad (31)$$

When $a > a_\infty(d)$, we approximate $I_0(s)$ by its Taylor series around $s = 0$ (Abramowitz & Stegun, 1965):
Figure A4. Microsaccade kinematics in EyeLink data. (A) BMD: (left) peak velocity distributions, (middle) main-sequence linear relationship between peak velocity and amplitude, and (right) duration distributions. (B) EK6. Mostly we notice similarities between the kinematics of the sequences detected with the two different algorithms. We can spot the velocity threshold in the peak velocity distribution for the microsaccades detected by EK6.
Figure A5. Inferred microsaccade rates in EyeLink data vary across algorithms. Colors for the four algorithms are the same as in
previous figures. S1–S5 represent the five subjects.
$I_0(s) \approx \sum_{i=0}^\infty \frac{1}{\Gamma(i+1)^2} \left(\frac{s}{2}\right)^{2i}$, so that $A(d, a) \approx \sum_{i=0}^\infty \frac{2^{-2i}}{\Gamma(i+1)^2} \int_0^\infty s^{d+2i} e^{-as^2} \, ds$. Keeping the first two terms and evaluating the integrals, we obtain

$$\log A(d, a) \approx -\log 2 - \frac{d+1}{2} \log a + \log \Gamma\!\left(\frac{d+1}{2}\right) + \log\left(1 + \frac{d+1}{8a}\right). \qquad (32)$$

We also build a lookup table with a million pairs of $a$ and $d$, and the corresponding value of $\log A(d, a)$, which we compute numerically using MATLAB's integral command. For $a_0(d) < a < a_\infty(d)$, we evaluate $\log A(d, a)$ by linearly interpolating between entries in the table. Interpolation is a slow operation, so we replace $I_0(s)$ with asymptotic approximations in the limit of small and large $a$. This causes some error, which grows as $a$ deviates from $0$ or $\infty$. We choose $a_\infty(d)$ and $a_0(d)$ such that the total error in $\log A(d, a)$ is less than 0.003 (Figure A1).

MCMC sampling

The goal of Step 3 in the BMD algorithm is to sample possible eye state time series C from $p(C \mid \hat{z}, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)$. We use an MCMC method (Newman & Barkema, 1999), which performs a biased random walk in the space of all such time series. On each step, we generate a new time series $C_{new}$ by randomly mutating the current C in one of six possible steps (Figure A2). To concisely express these steps, we reparametrize each time series C in terms of its change points s, and separately keep track of time points where the eye state changes from drift to microsaccade ($s_{01}$) and from microsaccade back to drift ($s_{10}$). The six steps in our MCMC sampling scheme are as follows:

1. $s_{01} \to s_{01} + 1$
2. $s_{01} \to s_{01} - 1$
3. $s_{10} \to s_{10} + 1$
4. $s_{10} \to s_{10} - 1$
5. Create a new pair $s_{01}, s_{10}$
6. Create a new pair $s_{10}, s_{01}$
Figure A6. Inferred microsaccade rate in EyeLink data is robust to prior parameters. (A) As we vary $k_0$, the parameter that controls the drift-duration prior, the inferred microsaccade rate varies only slightly. The highest value, $k_0 = 0.012$ ms$^{-1}$, corresponds to a drift-duration distribution with median 80 ms, and the lowest value, $k_0 = 0.00133$ ms$^{-1}$, to 760 ms. (B) The inferred microsaccade rate does not depend too much on $k_1$ (with the exception of subject S4). The highest and lowest values of $k_1$ correspond to median microsaccade durations of 3.3 and 30.3 ms, respectively. The somewhat larger dependence of the microsaccade rate on $k_1$ makes intuitive sense, as increasing $k_1$ allows for very short high-velocity sequences to be labeled as microsaccades.
Each of these steps is selected with probability $g(C \to C_{new})$, which does not necessarily equal $g(C_{new} \to C)$. The MCMC algorithm accepts any of these steps with an acceptance probability $A(C \to C_{new})$. To sample from the correct posterior distribution, the Markov chain in a Monte Carlo algorithm has to satisfy detailed balance, which ensures that the system makes transitions in and out of every state with compatible probabilities:

$$\frac{P(C \to C_{new})}{P(C_{new} \to C)} = \frac{g(C \to C_{new})\, A(C \to C_{new})}{g(C_{new} \to C)\, A(C_{new} \to C)} = \frac{p(C_{new} \mid \hat{z}, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)}{p(C \mid \hat{z}, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)}. \qquad (33)$$

We guarantee detailed balance using a modified Metropolis–Hastings acceptance probability (Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953; Newman & Barkema, 1999):

$$A(C \to C_{new}) = \min\left(1,\ \frac{g(C_{new} \to C)\, p(C_{new} \mid \hat{z}, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)}{g(C \to C_{new})\, p(C \mid \hat{z}, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)}\right). \qquad (34)$$

Coarsely, this rule accepts all steps which increase the posterior probability of the new time series, and accepts some steps which decrease its posterior. However, the acceptance probability also contains a term $\frac{g(C_{new} \to C)}{g(C \to C_{new})}$, which compensates for any mismatch in selection probabilities between transitions and their reverse. This compensation term allows the Metropo-

Figure A7. Typical failure mode of BMD in low-noise simulations. Instead of detecting the microsaccade labeled by EK6, BMD detects a microsaccade right before and another microsaccade right after. This error occurs because the Kalman smoother (Step 2) converts the discontinuities at the beginning and end of the change points into more gradual slopes, and the subsequent eye state estimation algorithm (Step 3) infers that these slopes are low-velocity microsaccades. A truly optimal inference algorithm, which marginalizes over the eye position, will not make this error.

Parallel tempering

We have a high-dimensional problem with a complicated probability landscape that can be hard for Metropolis algorithms to navigate without getting stuck in local maxima. To avoid this, we performed parallel tempering (Earl & Deem, 2005; Newman & Barkema, 1999), also called replica-exchange MCMC sampling, which entails performing the Metropolis–Hastings algorithm concurrently at several inverse temperatures $\beta$, which modify the acceptance probability to

$$A(C \to C_{new}) = \min\left(1,\ \frac{g(C_{new} \to C)\, p(C_{new})}{g(C \to C_{new})\, p(C)} \left(\frac{p(\hat{z} \mid C_{new}, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)}{p(\hat{z} \mid C, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)}\right)^{\beta}\right), \qquad (35)$$

where we have split up the posterior into a prior and a likelihood. The lower the temperature (increased $\beta$), the less likely the Markov chain will accept steps which reduce the likelihood. Therefore, low-temperature chains are strongly attracted by likelihood maxima (local or global), whereas high-temperature chains explore the space more widely. In the infinite-temperature limit, the Markov chain samples from the prior $p(C)$. The strength of parallel tempering consists in allowing neighboring chains to exchange information by attempting to swap their configurations and accepting swaps with a probability

$$A(\{C_1, C_2\} \to \{C_2, C_1\}) = \min\left(1,\ \left(\frac{p(\hat{z} \mid C_1, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)}{p(\hat{z} \mid C_2, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)}\right)^{\beta_2 - \beta_1}\right). \qquad (36)$$

This acceptance probability ensures that we always swap if a hotter chain has stumbled on a state with a higher posterior, thus giving the algorithm a very high chance of not getting stuck in a local maximum, while ensuring that the chain at $\beta = 1$ samples from the correct posterior probability distribution $p(C \mid \hat{z}, \hat{\sigma}_x, \hat{\sigma}_z, \hat{\sigma}_0, \hat{\sigma}_1, \hat{d}_1)$. We choose the set of temperatures in our simulation to span the full range between $\beta = 0$ and $\beta = 1$, with significant overlap in the distribution of posterior values between successive chains, so that swaps are accepted with a nonzero probability.
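The swap rule of Equation 36 reduces to a one-liner in code. A minimal sketch (Python; the function name and the example log-likelihood values are illustrative):

```python
import numpy as np

def swap_acceptance(loglik_cold, loglik_hot, beta_cold, beta_hot):
    """Probability of swapping the configurations of two neighboring
    chains (Equation 36), written with log-likelihoods: the chain at
    inverse temperature beta_cold > beta_hot holds the state with
    log-likelihood loglik_cold, and vice versa."""
    return min(1.0, float(np.exp((beta_cold - beta_hot)
                                 * (loglik_hot - loglik_cold))))

# If the hotter chain holds the better state, the swap is certain:
print(swap_acceptance(-120.0, -100.0, beta_cold=1.0, beta_hot=0.5))  # 1.0
# Otherwise the swap succeeds with probability exp(-(dbeta) * (dloglik)):
print(swap_acceptance(-100.0, -120.0, beta_cold=1.0, beta_hot=0.5))  # exp(-10)
```

In practice one attempts such swaps between neighboring rungs of the temperature ladder; the overlap requirement described in the text keeps these probabilities away from zero.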