
KALMAN FILTER CHANGEPOINT DETECTION AND TREND CHARACTERIZATION

Scott Kuzdeba¹, Brandon Hombs², Jeremy D. Greenlee³*, Frank H. Guenther¹,⁴,⁵†

¹Boston University, ²Eigen LLC, ³University of Iowa Hospitals and Clinics, ⁴Massachusetts General Hospital, ⁵Massachusetts Institute of Technology. Corresponding Email: [email protected]

ABSTRACT

In this paper we describe a data-driven change detection algorithm based on the Kalman filter. The algorithm models trends in the underlying data with a Kalman filter. A learning rate is applied to the Kalman filter gain to allow trends to be locked in, forcing the estimates to rely more on the prediction and less on the observation as time goes on. Statistical thresholds are set to detect a change, i.e. a changepoint, and a new Kalman filter is started to track the new trend in the data. We show results using neural electrocorticography time-series data.

Index Terms— Change Detection, Kalman Filter, changepoint, electrocorticography (ECoG), trend analysis

1. INTRODUCTION

Understanding trends in time-series data is critically important in many applications, including financial trading, weather prediction, communications, and electrophysiology, to name a few. Often the underlying trends in the time-series generative process have discrete changes, referred to as changepoints. Many purposes require both detecting the changes and characterizing the trends; this helps with predicting future trends and with understanding what has happened in the past.

Changepoint detection algorithms have been applied in several different applications [1, 2, 3, 4]. They are based on the premise of modeling underlying states in sequential data and detecting discrete changes, or changepoints, when the state changes. See [5] or [6] for a comprehensive review.

Kalman filters are used extensively to track trends in data, especially in time-series analysis. The Kalman filter is based on the notion of tracking the states of a system by jointly using predictions and observations, while accounting for several sources of uncertainty. It provides a data-driven methodology for modeling trends in data, capable of capturing changes in the observations as well as relying on priors.

Kalman filters are traditionally applied to model and track an entire time-series, rather than disjoint parts of it. Some work has been done with Kalman filters for change detection, through the use of extreme value theory [7], by looking for changes in the residuals of the predictions [8], or through explicit use of priors for the expected nature of the change [9]. In [9] the detector becomes a matched filter when the change uncertainty is zero and an energy detector when the uncertainty in the change goes to infinity.

We focus our efforts on developing a novel Kalman filter changepoint detector for use on neural time-series data. We analyze electrophysiological data, in particular data recorded by electrocorticography (ECoG), to look for and characterize changes in the underlying neural process.

Kalman filters have been used extensively in ECoG Brain-Computer Interface (BCI) applications, including predicting movement trajectories [10]. Outside of BCI, ECoG applications are sparser. The Kalman filter has been used in limited settings to help with modeling and prediction of ECoG signals, such as predicting seizure onset [11].

We are not interested in modeling the entire time-series, as in most Kalman filter applications. Instead, we wish to model the discrete trends that appear throughout the time-series. This is important for our ECoG dataset, as it allows us to start to break down and understand the discrete differences in neural activity during a task. This requires two additions to the Kalman filter model: 1) the ability to detect when a trend has changed, and 2) the ability to start a new Kalman filter at each change to model the new trend. In this paper we describe how we modify the Kalman filter with a learning rate. The learning rate puts more emphasis on the prediction step as time goes on, allowing the estimate to deviate from the observation when the trend no longer fits, enabling change detection.

Our approach uses the residuals between predictions and observations to detect change. This is similar to previous methods, but instead of specifying trend definitions ahead of time, we use the learning rate and a data-driven approach to adaptively lock in learned trends as time progresses. Learning rates have been used in Kalman filters for ECoG in the past [12]; however, we apply them specifically for changepoint detection and trend characterization. Further, we use the learning rate in a novel way by applying it to the Kalman gain factor.

∗ This research was supported by funding from the National Institute for Deafness and other Communication Disorders (grant R01 DC015260).
† This research was supported by funding from the National Institute for Deafness and other Communication Disorders (grants R01 DC002852 and R01 DC007683).
The paper is organized as follows. Section 2 describes the data used for development and test. Section 3 provides the details of Kalman filter changepoint detection, and Section 4 describes how trends are characterized. Section 5 presents results and Section 6 provides a discussion.

2. DATA

In this work we use electrocorticographic (ECoG) recordings obtained from five neurosurgical patients (4 males, 1 female) undergoing surgical treatment of medically intractable epilepsy. Informed consent was provided by all subjects, and research protocols were approved by the appropriate institutional review board.

For more complete information on the data, task, preprocessing, and clustering steps, refer to [13], which we briefly summarize here. ECoG data is collected from electrodes placed on the surface of the brain. Recordings were made while subjects were awake and performing a speech task, in which they repeated words visually presented to them on a screen. Data was preprocessed to remove bad channels and artifacts through manual inspection and kurtosis analysis. Next we extracted gamma frequency (70-150 Hz) power, which has been linked to local neural processing [14]. Only electrodes showing a significant response to the speech task were kept.

High gamma power responses from individual electrodes were segmented into trials based on the task. Each trial was 3 seconds long and sampled at 1 kHz, with the first 0.5 seconds capturing baseline, non-speech, activity. Trials were then aligned to get average responses for each electrode. Average responses were z-scored based on the baseline periods, which can be assumed to be Gaussian. Responses were then scaled to fall between [0, 1]. Hierarchical agglomerative clustering was performed across electrodes to get a set of characteristic activity patterns. Kalman filter changepoint detection was performed on these patterns to perform characterization, as sketched below.
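As a rough sketch of the normalization just described (trial averaging, z-scoring against the 0.5 s baseline, and scaling to [0, 1]), assuming a trials-by-samples NumPy array; the function and variable names are ours, not from the original pipeline:

```python
import numpy as np

FS = 1000               # sample rate in Hz
N_BASE = int(0.5 * FS)  # first 0.5 s of each trial is baseline

def normalize_response(trials):
    """trials: (n_trials, n_samples) array of high gamma power, 3 s at 1 kHz."""
    avg = trials.mean(axis=0)                    # average response across trials
    base = avg[:N_BASE]                          # baseline period, assumed Gaussian
    z = (avg - base.mean()) / base.std()         # z-score against the baseline
    return (z - z.min()) / (z.max() - z.min())   # scale to [0, 1]
```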
The focus of this paper is on characterizing the results from this dataset. While the focus is on this particular dataset, the method is applicable to other domains and applications where the data is formatted similarly or can be transformed into a similar representation. For simplicity, we will refer to the collection of activity patterns as the dataset and to an individual pattern as an example moving forward.

3. METHOD

Kalman filters can be broken up into two primary steps: a prediction step and an update step. The prediction step uses a model of the state of the system to predict what the next state will be. Then, in the update step, this prediction is compared to the observation of the system to update the estimate. Noise sources are included within the model to capture the uncertainty in the measurement of the observed state (the measurement variance) and the model noise. These account for the unknown or un-modeled dynamics of the system, amongst other things. The Kalman filter is an iterative algorithm that performs the prediction and update steps at each iteration to estimate the true state of the system.

One of the big design questions is what form the Kalman filter should take. This is heavily dependent on the domain and on what is being tracked. In our application, the dataset consists of neural recordings that all start with a baseline, or resting, period during which large amplitude high gamma activity is not expected. Thus, each example in our dataset is expected to have a half-second period where the signal is centered around zero, so to start, our initial trend should be flat and centered around zero. We want anything that deviates from this to trigger a changepoint.

When activity does change from this initial trend, we want both to note that it changed and to track the new trend, or state. We wish to track the dynamics of the trend, not just a simple bias, and thus we set up our Kalman filter to track both the value and the rate of change of the observations, or simply the observation and rate. Going beyond this linear representation, to polynomials or other nonlinear representations, might better characterize some individual trends, but at the cost of the ability to compare and understand the differences across trends. Therefore, we use a linear representation in our application. Further, our approach differs from others that capture the value of the observations but not the rate; those approaches struggle to track transients, which are important for our dataset. Lastly, by using a data-driven methodology we allow the recorded data to drive the parameters of the model, moving away from priors and thus being more representative of the actual dataset instead of expectations.

In the rest of this section we walk through the parts of the model and the extensions that we have added to it, starting with an illustrative example of the approach to provide intuition.

3.1. Illustrative Example

Fig. 1. Illustration of Kalman filter changepoint detection. Left: example time-series. Right: Kalman filter tracks (red) used to detect changepoints (green).

Here we provide a quick illustration of the methodology to build intuition before jumping into the details. An example from the dataset is shown on the left of Figure 1.
On the right, the yellow blocked-off portion is the baseline period where there is no large amplitude activity, as can be seen from the flat Kalman filter predictions (red) and observations (blue). As time progresses, the algorithm assesses new data to see if it fits within this trend. A change is detected just after a time of zero (green dashed line). At this point, the old trend no longer captures the underlying dynamics of the data, so a new trend is learned to capture the transient behavior of the rise in activity (sloped red and blue lines). As time proceeds, this trend is learned and does not adapt as much, thus locking in the trend (the slope of the red line becomes fixed). Once this trend no longer fits the data, just after a time of 0.5, another changepoint is declared and a new trend is learned. This progresses until the end of the example.

3.2. Setup

As mentioned at the beginning of this section, we model linear trends in order to provide descriptive power. We model the true trends as:

X(i + 1, :) = AX(i, :) + w
Z(i) = HX(i, :) + v

where X is the state vector for the activity and activity rate, or rate of change, i.e. X = [activity, activity rate]^T. The state transition matrix, A, is set equal to a linear equation modeling the activity and activity rate, A = [1, 1; 0, 1], and the realized process noise is captured in w. Since the activity is the only observation, the measurement function, H, is set to H = [1, 0]. The observed activity, Z, is the measurement of the activity by the ECoG sensor, and v is the measurement noise. The activity rate, r_est, is not observed, but rather derived from the observed activity. Q and R are the covariances of the process noise and measurement noise, respectively.

To initialize, we set the state estimate, X̂, to [0, 0]^T to model activity, or the lack thereof, during the baseline period. The estimated rate of the model will be used to describe the trend. The process noise covariance, Q, is set as the variance of the difference of the samples in the baseline period. The baseline period is assumed to be Gaussian, as noted above, and hence we transform the data to be difference-stationary to get a noise estimate assuming whitened noise, which we use for the measurement noise covariance, R. This provides an estimate of the uncertainty in the true dynamics of the neural processing.
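A minimal sketch of this setup with NumPy, under our reading of the section: the state is [activity, rate], A is the constant-velocity transition matrix, and both noise variances come from the first differences of the baseline. The diagonal form of Q and the initial P are our assumptions; the paper specifies only the variance estimate:

```python
import numpy as np

def init_filter(baseline):
    """Build the Kalman filter terms from the baseline period of one example."""
    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])             # activity integrates the activity rate
    H = np.array([[1.0, 0.0]])             # only the activity itself is observed
    x = np.array([0.0, 0.0])               # flat, zero-centered initial trend
    noise_var = np.var(np.diff(baseline))  # differenced (whitened) baseline noise
    Q = noise_var * np.eye(2)              # process noise covariance (assumed form)
    R = np.array([[noise_var]])            # measurement noise covariance
    P = Q.copy()                           # initial estimate covariance (assumption)
    return A, H, x, P, Q, R
```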
3.3. Prediction Step

The first step of a Kalman filter is to perform prediction. We do not make any modifications to the prediction step and use it as is:

X̂(i, :) = AX̂(i − 1, :)
P = APA^T + Q

where P is the covariance matrix for the estimate of the model and i indexes the current time point.
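In code, the unmodified prediction step is one line each for the state and the covariance (continuing the sketch above):

```python
def predict(A, x, P, Q):
    """Standard Kalman prediction: propagate the state and its covariance."""
    x = A @ x              # X̂(i, :) = A X̂(i − 1, :)
    P = A @ P @ A.T + Q    # P = A P A^T + Q
    return x, P
```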
3.4. Update Step

After making a prediction, Kalman filters update the prediction given the observation, or measurement. Since we only directly observe the activity, we need to recalculate an estimate of the activity rate at each step. We use the slope of the observations since the last changepoint, i.e. within the current trend, as our rate estimate. For simplicity, we calculate this as the difference between the current measured activity value and the initial activity of the trend, divided by the duration of the current trend. For the initial trend, we leave the rate estimate set to zero.

The prediction is updated using the Kalman gain term, K. This term weights between the observation and the prediction. The update step is shown in the following set of equations, where α is a learning rate that is discussed next:

K = αPH^T(HPH^T + R)^-1
X̂(i, :) = X̂(i, :) + K(Z(i) − HX̂(i, :))
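A sketch of this update with the learning rate α scaling the gain. We also include the covariance update P = (I − KH)P, which the paper does not write out but a standard Kalman update requires; treat that line as our assumption:

```python
import numpy as np

def update(H, x, P, R, z, alpha):
    """Kalman update with a learning-rate-scaled gain."""
    S = H @ P @ H.T + R                       # innovation covariance
    K = alpha * (P @ H.T) @ np.linalg.inv(S)  # K = α P H^T (H P H^T + R)^-1
    resid = float((z - H @ x)[0])             # residual used for change detection
    x = x + K[:, 0] * resid                   # corrected state estimate
    P = (np.eye(len(x)) - K @ H) @ P          # covariance update (standard form)
    return x, P, resid
```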
3.5. Learning Rate

Our novel contribution is the addition of a learning rate, α, to the Kalman gain term. The goal of this addition is to allow trends to get locked in once they are learned. Since the Kalman gain weights between the prediction and the observation, by adaptively modulating it we can turn down the influence of the observation over time and rely more heavily on the prediction. This allows trends to be locked into the model and, importantly, when the dynamics of the example start to change, the model's prediction will start to deviate from the observation, which allows a changepoint to be declared.

The goal is then to set an appropriate learning rate that allows the model to get enough observations to learn the activity rate and then rely on this learned value as time progresses. This approach relies on a data-driven discovery of what the activity rate should be, rather than setting it a priori through priors. An exponential decay was used for the learning rate, as in the following equation:

α = exp(−Δ / (δ · Fs))

where Δ is the number of time points seen since the start of this trend, δ is a time constant, and Fs is the sample rate. The choice of time constant should be domain specific. Since we are modeling signals where the activity is a measure of human speech, we set δ equal to 100 ms, which is on the order of English phonemic expressions [15]. Thus, the learning rate, α, acts as a decay factor for the Kalman gain, K.
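For example, with δ = 100 ms and Fs = 1 kHz, δ·Fs = 100 samples, so the gain decays by a factor of e every 100 ms into a trend:

```python
import numpy as np

def learning_rate(n_since_change, delta_s=0.1, fs=1000):
    """α = exp(−Δ / (δ · Fs)): decay with time since the last changepoint."""
    return np.exp(-n_since_change / (delta_s * fs))

# α = 1.0 at a changepoint, ≈ 0.37 after 100 ms, ≈ 0.05 after 300 ms
```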
3.6. Change Detection Threshold

The learning rate enables the model to learn discrete activity trends. The next addition needed is a detector to identify when the prediction deviates too far from the observations and a changepoint should be declared. Again, a domain-specific determination of the detector threshold is used.

We construct a threshold using the statistics of the baseline period, in particular the measurement noise of the observed activity, R. As already noted, this period can be assumed to be Gaussian distributed. We apply an inverse Q-function, Q^-1(·), to the individual examples to get threshold bounds for the individual responses. We used a 95% bound, bound = 0.95, after some experimentation, finding this to be a good trade-off between too many and too few changepoints declared on individual examples. The threshold for individual examples is set with the following equation:

threshold = √R · Q^-1((1 − bound) / 2)

To get a global threshold across all examples and patients, we set a confidence interval around the distribution of individual responses. Since all example thresholds are taken over the baseline period, and all baseline periods have been z-scored, they should all be drawn from the same distribution; thus, taking a global threshold over the collection helps control for multiple comparisons. The distribution of individual thresholds was beta distributed. We constructed a 99% confidence interval of this distribution to set a fairly conservative threshold and reduce false positive changepoints, which were not desired in our application. This provided the threshold that we used across all examples to declare a changepoint.
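Since the Gaussian Q-function is the upper-tail probability, its inverse is SciPy's inverse survival function for the standard normal; a sketch of the per-example threshold (with bound = 0.95 this is roughly 1.96·√R):

```python
import numpy as np
from scipy.stats import norm

def example_threshold(R_var, bound=0.95):
    """threshold = sqrt(R) · Q^-1((1 − bound) / 2) for one example."""
    return np.sqrt(R_var) * norm.isf((1.0 - bound) / 2.0)
```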
3.7. Re-initialization

The last addition needed to enable Kalman filter changepoint detection is to re-initialize a new filter when a changepoint is detected. To re-initialize a filter, all parameters are reset to their initial values, as stated above, with a few exceptions, now described. First, the estimated activity rate, r_est, is calculated from the local observations around the changepoint and is used to seed the estimated activity rate. In our application we do not need to perform online estimation, so we have observations available beyond the current time point. We therefore use 100 ms of data to recalculate a trend, for reasons similar to those for setting α. To do so, we use 10% of the data from the past and 90% from the future of the changepoint, or whatever data is available if we approach boundary conditions (which was not the case in our application). If we were performing online calculation, we would have to allow time to pass to gather enough observations to reinitialize the next filter, and hence no changepoints could occur during this time. The estimated covariance, P, is recalculated using the same data as r_est, with the same caveat for extending to an online application. We reset the state estimate, X̂(i, :), to [Z(i), r_est]^T.
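Putting the pieces together, a sketch of the whole detection loop built from the helpers above, with the threshold coming from example_threshold or its global version. The 100 ms re-seeding window (10 ms past, 90 ms future) follows the text; re-estimating P from that window is simplified here to a reset, so treat this as illustrative rather than the authors' exact procedure:

```python
import numpy as np

def detect_changepoints(y, threshold, fs=1000, delta_s=0.1):
    """Run Kalman filter changepoint detection over one example y (1-D array)."""
    A, H, x, P, Q, R = init_filter(y[: int(0.5 * fs)])  # 0.5 s baseline period
    changepoints, last_cp = [], 0
    for i in range(1, len(y)):
        x, P = predict(A, x, P, Q)
        alpha = learning_rate(i - last_cp, delta_s, fs)
        x, P, resid = update(H, x, P, R, y[i], alpha)
        if abs(resid) > threshold:                   # trend no longer fits
            changepoints.append(i)
            last_cp = i
            lo = max(i - int(0.01 * fs), 0)          # 10% of 100 ms in the past
            hi = min(i + int(0.09 * fs), len(y))     # 90% in the future
            r_est = (y[hi - 1] - y[lo]) / (hi - lo)  # rate seed from local slope
            x = np.array([y[i], r_est])              # restart on the new trend
            P = Q.copy()                             # simplified covariance reset
    return changepoints
```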

4. CHARACTERIZING

Besides detecting when changepoints occur, the other goal of this method is to describe the trends between changepoints, and thus provide a description of the activity patterns and shapes that emerge from the individual examples, allowing cross-example comparisons. Our data started at a baseline level and returned to this level of activity once the speech task was complete. Further, in our primary dataset all activity was positive. Therefore, our characterization of the activity becomes a description of the rise and fall of activity and its peak plateau structure.

We saw two primary shapes result, which we will call symmetric and ramp. We characterize a symmetric shape as one whose rise and fall transient activity rates are within 10% of each other; the ramp shape is characterized by a rise rate that is faster than its fall rate (a sketch of this labeling follows below). Examples of these shapes are shown in Figure 2, where summary rates for each trend are illustrated as simple linear overlay lines (red). This also allows us to perform comparisons across responses to see which activities have the same rise times, fall times, or have a rise time similar to another activity's fall time, or vice versa.

Fig. 2. Characterized shapes. Left: ramp. Right: symmetric.
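A sketch of the shape labeling implied by these definitions, given the rise and fall rates recovered from the fitted trends. The 10% tolerance is the paper's; the function name and the "other" label for a fall that is faster than the rise are ours:

```python
def classify_shape(rise_rate, fall_rate, tol=0.10):
    """Label a response as symmetric or ramp from its transient activity rates."""
    rise, fall = abs(rise_rate), abs(fall_rate)
    if abs(rise - fall) <= tol * max(rise, fall):
        return "symmetric"  # rise and fall rates within 10% of each other
    return "ramp" if rise > fall else "other"
```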
As a further test, we ran the algorithm over other ECoG data sources that were characterized by more than just the symmetric and ramp shapes. We will not detail the characterizations of those shapes here, but we provide results from these datasets to show the validity of the method in Section 5.
5. RESULTS

In Figure 3, results from the primary dataset are shown. It can be seen that the ramp and symmetric shapes are the two characteristic shapes displayed. Some of the shapes have a brief transient period where activity slowly starts ramping up before it reaches its full rate of increase. In these cases, deviations between the Kalman filter trace (red) and the examples (blue) can be seen prior to the first changepoint (green), indicating that the trend captured in the model is becoming less indicative of the underlying process. Thus, changes occur and two trends are needed to fully characterize the rising activity.

Fig. 3. Results on the main dataset: examples in blue, Kalman filter traces in red, changepoints in green.

In Figure 4, results from more variable ECoG traces are shown. These results show that the method still holds when there are changing trends during an increase, multiple peaks, broad plateaus, and changing trends during the decrease. The method applied to this dataset was trained on the above dataset with no changes, since the data is preprocessed the same way. This provides strong evidence that the developed Kalman filter changepoint detection works well with added data complexity.

Fig. 4. Results on more variable data: examples in blue, Kalman filter traces in red, changepoints in green.
5.1. Comparison

We also compare our method to another state-based changepoint detector. In Figure 5 we plot our Kalman filter changepoint detection results on the top and a Bayesian changepoint detection (a modified version of [3]) on the bottom. We set both methods up similarly, using the statistical bounds, training periods, and parameter settings described in Section 3. The results from the two methods have been aligned in time. It can be seen that our method better finds the points of trend changes, while the Bayesian approach has a bit more of a lag. Further, looking at the tracked states of each method (red), it is seen that our method captures the rate of the underlying trend (as seen by the slopes in the traces), while the Bayesian approach tries to keep things relatively flat and does not do a good job of capturing slope.

Fig. 5. Top: Kalman filter changepoint detection. Bottom: Bayesian changepoint detection.
6. DISCUSSION

The Kalman filter changepoint detection described in this paper provides a way to a) discover changes in underlying trends of data, i.e. changepoint detection, and b) characterize the trends through the explicit modeling of activity rates between changepoints. The method is based on the Kalman filter, allowing it to take a data-driven approach to track the trends of the underlying process. This has the added benefit of providing a descriptive language to characterize and compare various results.

As an example, Figure 6 shows three different examples from the dataset. When they were characterized through the Kalman filter changepoint detection algorithm, they were found to all have the same rise and fall activity rates, within 10% of each other. This allowed for the identification of time-delayed versions of the same activity pattern, as can be seen on the right-hand side of the figure, where they are all shifted in time to align their shapes. Insights can be gained from this, and it motivates new questions that can help direct future research toward better understanding human speech and why different functional processing areas exhibit the same activity patterns.

Fig. 6. Results characterized with the same rise and fall trends. Left: raw results. Right: aligned to illustrate similar trends. Green - Motor Planning, Red - Motor Execution, Blue - Auditory Processing.

Figure 7 shows another example of the power of the resulting characterization. In this case, two separate shapes were found, (a) symmetric and (b) ramp. When comparing the individual trends, it was found that the fall activity rates of the two were the same. Further, it was found that the changepoints for the start of the fall trends occurred at the same time, showing that the underlying processes of these two are initially different, but converge to exhibit the same deactivation, as shown in (c). Again, this helps to motivate and direct future research to understand why different speech processing stages have a time-aligned activity decay.

Fig. 7. (a) Symmetric and (b) ramp shapes, shown to have the same decay activity when plotted on top of each other in (c). Blue - Motor Execution, Red - Auditory Feedback.

We also looked at methods beyond Kalman filters to fit the shapes. Methods explored in this category included fitting splines and polynomials, amongst others. With an explicit expression for the function, e.g. a spline, it was thought that trends could be discussed in common terms, i.e., in the form of the function parameters, and hence a common language would be provided for description. None of these methods produced satisfactory results without heavy additional preprocessing, which was undesirable. Additionally, the fine timing information we desired was smoothed out and lost its specificity.

In addition to other methods, different variants of the Kalman filter were explored, including the extended Kalman filter (EKF) and unscented Kalman filter (UKF), which are nonlinear variants. We decided to stay with the base Kalman filter model for its descriptive ability. Our application was described well by linear trends, so this was the best fit. Other applications may want to revisit this choice, but care will need to be given to allow for characterization as well as changepoint detection, if that is important as it was in our application.

7. REFERENCES

[1] E. S. Page, “Controlling the standard deviation by cusums and warning lines,” Technometrics, vol. 5, no. 3, pp. 307–315, 1963.

[2] M. Lavielle, “Using penalized contrasts for the change-point problem,” Signal Processing, vol. 85, no. 8, pp. 1501–1510, 2005.

[3] R. Prescott Adams and D. J. C. MacKay, “Bayesian online changepoint detection,” arXiv e-prints, p. arXiv:0710.3742, Oct 2007.

[4] K. Haynes, P. Fearnhead, and I. A. Eckley, “A computationally efficient nonparametric approach for changepoint detection,” Statistics and Computing, vol. 27, no. 5, pp. 1293–1305, Sep 2017.

[5] S. Aminikhanghahi and D. J. Cook, “A survey of methods for time series change point detection,” Knowl. Inf. Syst., vol. 51, no. 2, pp. 339–367, May 2017.

[6] C. Truong, L. Oudre, and N. Vayatis, “Selective review of offline change point detection methods,” arXiv e-prints, p. arXiv:1801.00718, Jan 2018.

[7] H. Lee and S. J. Roberts, “On-line novelty detection using the Kalman filter and extreme value theory,” in 2008 19th International Conference on Pattern Recognition, Dec 2008, pp. 1–4.

[8] M. Severo and J. Gama, “Change detection with Kalman filter and CUSUM,” in Discovery Science, 2006.

[9] B. Moussakhani, J. Flåm, T. Ramstad, and I. Balasingham, “On change detection in a Kalman filter based tracking problem,” Signal Processing, vol. 105, pp. 268–276, 2014.

[10] A. Eliseyev and T. Aksenova, “Penalized multi-way partial least squares for smooth trajectory decoding from electrocorticographic (ECoG) recording,” PLoS One, vol. 11, no. 5, 2016.

[11] O. V. Lie and P. van Mierlo, “Seizure-onset mapping based on time-variant multivariate functional connectivity analysis of high-dimensional intracranial EEG: A Kalman filter approach,” Brain Topogr, vol. 30, no. 1, pp. 46–59, 2017.

[12] H.-L. Hsieh and M. M. Shanechi, “Optimizing the learning rate for adaptive estimation of neural encoding models,” PLoS Comput Biol, vol. 14, no. 5, 2018.

[13] S. Kuzdeba, A. Daliri, J. A. Tourville, F. H. Guenther, and J. D. Greenlee, “Characteristic time courses of neural activity during speech production,” in preparation.

[14] S. Ray, N. E. Crone, E. Niebur, P. J. Franaszczuk, and S. S. Hsiao, “Neural correlates of high-gamma oscillations (60-200 Hz) in macaque local field potentials and their potential implications in electrocorticography,” J. Neurosci., vol. 28, no. 45, pp. 11526–11536, 2008.

[15] K. N. Stevens, Acoustic Phonetics. MIT Press, 2000.
