Exact Indexing of Dynamic Time Warping
All content following this page was uploaded by Chotirat Ann Ratanamahatana on 23 December 2014.
Abstract. The problem of indexing time series has attracted much interest. Most algorithms
used to index time series utilize the Euclidean distance or some variation thereof. However, it
has been forcefully shown that the Euclidean distance is a very brittle distance measure.
Dynamic time warping (DTW) is a much more robust distance measure for time series, allowing
similar shapes to match even if they are out of phase in the time axis. Because of this
flexibility, DTW is widely used in science, medicine, industry and finance. Unfortunately,
DTW does not obey the triangle inequality and thus has resisted attempts at exact indexing.
Instead, many researchers have introduced approximate indexing techniques or abandoned the
idea of indexing and concentrated on speeding up sequential searches. In this work, we
introduce a novel technique for the exact indexing of DTW. We prove that our method guarantees
no false dismissals and we demonstrate its vast superiority over all competing approaches in
the largest and most comprehensive set of time series indexing experiments ever undertaken.
Keywords: Dynamic time warping; Indexing; Lower bounding; Time series
1. Introduction
The indexing of very large time series databases has attracted the attention of the
database community in recent years. The vast majority of work in this area has
focused on indexing under the Euclidean distance metric (Agrawal et al. 1995; Chan
et al. 2003; Das et al. 1998; Debregeas and Hebrail 1998; Faloutsos et al. 1994;
Keogh et al. 2000, 2001; Korn et al. 1997; Yi and Faloutsos 2000). However, there is
an increasing awareness that the Euclidean distance is a very brittle distance measure
(Aach and Church 2001; Bar-Joseph et al. 2002; Berndt and Clifford 1994; Chu et
al. 2002; Diez and Gonzalez 2000; Kadous 1999; Keogh and Pazzani 2000; Kollios
et al. 2002; Schmill et al. 1999; Yi et al. 1998). What is needed is a method that
allows an elastic shifting of the time axis, to accommodate sequences that are similar
but out of phase, as shown in Fig. 1. Just such a technique, based on dynamic
Fig. 1. Note that, while the two time series have an overall similar shape, they are not aligned in the time axis.
Euclidean distance, which assumes the i th point in one sequence is aligned with the i th point in the other, will
produce a pessimistic dissimilarity measure. The nonlinear dynamic time warped alignment allows a more
intuitive distance measure to be calculated
Although applications listed above are quite diverse, one thing they have in com-
mon is that they need to find the best match to a query time series, from a (possibly
very large) pool of candidates. This can trivially be achieved by sequential scanning,
comparing each and every candidate to the query, in an arbitrary order. The problem
with sequential scanning is that it is simply too slow for most applications. What
we really need is a technique to index the data, that is, to find the best match with-
out having to examine every candidate (Roussopoulos et al. 1995; Seidl and Kriegel
1998). Indexing techniques can be exact, guaranteeing to return the same result as
sequential scanning (Faloutsos et al. 1994; Keogh et al. 2000, 2001), or approxi-
mate, returning good matches but not necessarily the best matches (Faloutsos and
Lin 1995; Park et al. 2001; Yi et al. 1998).
More than two-dozen techniques have been introduced to index time series under
the Euclidean distance (Chan et al. 2003; Faloutsos et al. 1994; Keogh et al. 2000;
Yi and Faloutsos 2000) (see Keogh et al. (2001) for a more comprehensive listing).
In addition, several researchers have shown techniques to approximately index DTW
(Park, personal communication) or introduced methods to reduce its demanding CPU
time (Berndt and Clifford 1994; Chu et al. 2002). However, only two researchers have
claimed to introduce an exact indexing technique for DTW (Kim et al. 2001; Park et
al. 1999). In the case of Park et al. (1999), the claim of no false dismissals was later
retracted (Park et al. 1999, 2001). We have carefully implemented the only technique
to correctly claim the ability to exactly index DTW and the only other lower bound-
ing approximation of DTW for detailed comparison with our proposed approach.
In contrast with the approaches above, we will prove the no-false-dismissal prop-
erty of our approach and demonstrate its superiority with the most comprehensive
set of time series indexing experiments ever undertaken. In particular, in terms of
number and diversity of datasets, size of datasets, range of query lengths and index-
ing parameters, our experiments are one to two orders of magnitude more than all
previous papers combined.
The rest of the paper is organized as follows. In Sect. 2, we will consider the util-
ity of time series similarity search, review the DTW algorithm, and consider related
work. In Sect. 3, we will introduce a novel lower bounding technique that tightly
approximates the true DTW distance. Section 4 introduces a method that allows the
exact indexing using our lower bounding function. In Sect. 5, we conduct an ex-
haustive empirical comparison of our method with competing techniques. Finally, in
Sect. 6, we offer conclusions and suggestions for extensions.
2. Background
A similarity search in time series is useful in its own right as a tool for interactive
exploration of very large databases; it is also a subroutine in many data mining appli-
cations, including rule discovery (Das et al. 1998), clustering (Debregeas and Hebrail
1998) and classification (Diez and Gonzalez 2000; Kadous 1999). The superiority
of DTW over Euclidean distance for these tasks has been demonstrated by several
authors (Aach and Church 2001; Bar-Joseph et al. 2002; Caiani et al. 1998; Chu et
al. 2002; Keogh and Pazzani 2000; Yi et al. 1998). However, for completeness, we
include a simple experiment to illustrate the point.
The most studied time series classification/clustering problem is the cylinder–bell–
funnel dataset (Diez and Gonzalez 2000; Kadous 1999); it is a deceptively simple-
looking three-class problem. All classes are of length 128. The cylinder class consists
of a short flat section, followed by a sudden jump to an elevated plateau, and a return
to a short flat section. Both the location of the onset of the plateau and the length
of the plateau itself are controlled by random variables. The bell class replaces the
plateau with a ramp and the funnel class is simply the mirror image of the bell class.
Finally, all instances are corrupted by Gaussian noise. More details about the dataset
and a data generator can be downloaded from the UCR Time Series Data Mining
Archive (http://www.cs.ucr.edu/~eamonn/TSDMA). Figure 2 shows typical examples
of each class.
The problem has been attacked with sophisticated techniques, including rule-based
learners (Diez and Gonzalez 2000; Kadous 1999), boosting, Bayesian techniques, and
decision trees. We performed a simple classification experiment on this dataset, using
the one-nearest neighbor algorithm. Our dataset consists of ten instances of each class
and the classifier was evaluated using the leave-one-out strategy. Because we had
the luxury of unlimited data, we averaged the results over 1,000 runs. The mean error
rate for the Euclidean distance metric on the problem was 0.2734, but for DTW, it
was only 0.0269, an order of magnitude lower. This off-the-shelf result is competitive
with the highly tuned, sophisticated techniques enumerated above. The lower error
rate came with a cost, however; classification with DTW took approximately 230
times longer than that with Euclidean distance.
E. Keogh, C.A. Ratanamahatana
Fig. 2. Typical examples from the cylinder–bell–funnel dataset (cylinder 3 & 4, bell 5 & 6, funnel 1 & 2).
When clustered using Euclidean distance, the variability of the time axis often causes the classes to be con-
fused; in contrast, DTW compensates for the variability in the time axis and does a much better job at correctly
grouping the classes
This result reiterates the utility of DTW and motivates the necessity of indexing
it.
Suppose we have two time series, Q and C, of length n and m, respectively, where
\[ Q = q_1, q_2, \ldots, q_i, \ldots, q_n \tag{1} \]
\[ C = c_1, c_2, \ldots, c_j, \ldots, c_m. \tag{2} \]
To align two sequences using DTW, we construct an n-by-m matrix where the
(ith, jth) element of the matrix contains the distance $d(q_i, c_j)$ between the two points
$q_i$ and $c_j$ (i.e. $d(q_i, c_j) = (q_i - c_j)^2$). Each matrix element (i, j) corresponds to
the alignment between the points $q_i$ and $c_j$. This is illustrated in Fig. 3. A warping
path W is a contiguous (in the sense stated below) set of matrix elements that defines
a mapping between Q and C. The kth element of W is defined as $w_k = (i, j)_k$. So
we have
\[ W = w_1, w_2, \ldots, w_k, \ldots, w_K, \qquad \max(m, n) \le K \le m + n - 1. \tag{3} \]
The warping path is typically subject to several constraints.
Fig. 3. A) Two sequences Q and C that are similar but out of phase. B) To align the sequences, we con-
struct a warping matrix and search for the optimal warping path, shown with solid squares. C) The resulting
alignment
• Boundary conditions: $w_1 = (1, 1)$ and $w_K = (n, m)$. This requires the warping
path to start and finish in diagonally opposite corner cells of the matrix.
• Continuity: Given $w_k = (a, b)$, then $w_{k-1} = (a', b')$, where $a - a' \le 1$ and
$b - b' \le 1$. This restricts the allowable steps in the warping path to adjacent
cells (including diagonally adjacent cells).
• Monotonicity: Given $w_k = (a, b)$, then $w_{k-1} = (a', b')$, where $a - a' \ge 0$ and
$b - b' \ge 0$. This forces the points in W to be monotonically spaced in time.
There are exponentially many warping paths that satisfy the above conditions.
However, we are only interested in the path that minimizes the warping cost:
\[ DTW(Q, C) = \min \left\{ \sqrt{\sum_{k=1}^{K} w_k} \right\}. \tag{4} \]
This path can be found using dynamic programming to evaluate the following
recurrence, which defines the cumulative distance γ(i, j) as the sum of the distance
d(i, j) found in the current cell and the minimum of the cumulative distances of the
adjacent elements:
\[ \gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}. \tag{5} \]
The Euclidean distance between two sequences can be seen as a special case of
DTW where the kth element of W is constrained such that wk = (i, j)k , i = j = k.
Note that it is only defined in the special case where the two sequences have the
same length. The time and space complexity of DTW is O(nm).
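As an illustration only, the recurrence of Eq. (5) can be sketched in a few lines of Python. The function names `dtw` and `euclidean` are ours, not the paper's, and the square root applied at the end assumes the radical form of the warping cost that the later proof of lower bounding refers to.

```python
import math

def euclidean(q, c):
    """Euclidean distance; only defined for sequences of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))

def dtw(q, c):
    """Unconstrained DTW distance, filling the cumulative matrix of Eq. (5)."""
    n, m = len(q), len(c)
    inf = math.inf
    # gamma[i][j] is the cheapest cumulative cost of a path ending at cell (i, j).
    # Row 0 and column 0 are padding so the recurrence needs no special cases.
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    return math.sqrt(gamma[n][m])

# Two copies of the same bump, one shifted by a single time step.
q = [0, 0, 1, 2, 1, 0]
c = [0, 1, 2, 1, 0, 0]
print(euclidean(q, c), dtw(q, c))
```

The two toy sequences have identical shape but are out of phase; the Euclidean distance penalizes the shift, whereas the warped alignment does not. The nested loops make the O(nm) time and space cost visible.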
This review of DTW is necessarily brief; we refer the interested reader to Krus-
kall and Liberman (1983) and Rabiner and Juang (1993) for a more detailed treat-
ment.
There has also been some work in which attempts at indexing and/or lower
bounding are abandoned, and instead, efforts are concentrated on fast approximation
of the DTW distance using a lower resolution approximation of the data. The idea
was introduced by Keogh and Pazzani (2000), who use a piecewise linear approxi-
mation of the data. The method shows significant speedup with few false dismissals.
A similar idea was suggested by Chan et al. (2003). Here, the authors obtain the
lower resolution of the data approximation with wavelets and use their approximate
distance measure instead of that of Yi et al., i.e. the lower bounding measure, within
the FastMap framework. The method improves on the speedup of Yi et al.
at the expense of introducing more false dismissals.
Finally, there has been some work on obtaining warping alignments by methods
other than DTW (Bar-Joseph et al. 2002; Kwong et al. 1996). For example, Kwong
et al. consider a genetic algorithm-based approach (Kwong et al. 1996), and recent
work by Bar-Joseph et al. considers a technique based on linear transformations of
spline-based approximations (Bar-Joseph et al. 2002). However, both methods are
stochastic and require multiple runs (possibly with parameter changes) to achieve
an acceptable alignment. In addition, both methods are clearly nonindexable. How-
ever, both works do reiterate the superiority of warping over nonwarping for pattern
matching.
Table 2. An algorithm that uses a lower bounding distance measure to speed up the sequential scan search
for the query Q
Algorithm Lower_Bounding_Sequential_Scan(Q)
1. best_so_far = infinity;
2. for each sequence Ci in database
3.   LB_dist = lower_bound_distance(Ci,Q);
4.   if LB_dist < best_so_far
5.     true_dist = DTW(Ci,Q);
6.     if true_dist < best_so_far
7.       best_so_far = true_dist;
8.       index_of_best_match = i;
9.     endif
10.   endif
11. endfor
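A minimal Python sketch of the scan in Table 2 follows. The lower bound and the true distance are passed in as functions so that the sketch stays self-contained; the parameter names and the extra counter of full distance computations are our own additions for illustration.

```python
import math

def lower_bounding_sequential_scan(query, database, lower_bound, distance):
    """Best match to `query`, skipping the expensive `distance` call
    whenever the cheap `lower_bound` already rules a candidate out."""
    best_so_far = math.inf
    index_of_best_match = None
    full_computations = 0  # illustrative counter, not part of Table 2
    for i, candidate in enumerate(database):
        lb_dist = lower_bound(candidate, query)
        if lb_dist < best_so_far:
            full_computations += 1
            true_dist = distance(candidate, query)
            if true_dist < best_so_far:
                best_so_far = true_dist
                index_of_best_match = i
    return index_of_best_match, best_so_far, full_computations
```

The result matches a plain sequential scan only if `lower_bound(c, q) <= distance(c, q)` holds for every pair; that property is what a no-false-dismissal guarantee rests on.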
While lower bounding functions for string edit, graph edit, and tree edit distance
have been studied extensively (Kruskall and Liberman 1983), there has been far less
work on DTW, which is very similar in spirit to its discrete cousins. Below, we will
consider the existing DTW lower bounding techniques.
To the best of our knowledge, there are only two existing lower bounding functions
available for DTW (not including Park et al. (1999), which incorrectly claims to be
lower bounding, or Park et al. (2000), which has a time complexity equal to the full
algorithm). While referring the interested reader to the original papers for detailed
explanations, below, we give a visual intuition and brief explanation of each.
The lower bounding function introduced by Kim et al. (2001) (hereafter, referred
to as LB_Kim), works by extracting a four-tuple feature vector from each sequence.
The features are the first and last elements of the sequence, together with the
maximum and minimum values. The maximum of the squared differences between
corresponding features is reported as the lower bound. Figure 4 illustrates the idea.
Fig. 4. A visual intuition of the lower bounding measure introduced by Kim et al. The maximum squared
difference between the two sequences' first (A), last (D), minimum (B) and maximum (C) points is returned
as the lower bound
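The four-feature bound described above can be sketched as follows. This is our reading of the prose description rather than the implementation of Kim et al., and the function name `lb_kim` is ours.

```python
def lb_kim(q, c):
    """Maximum squared difference between the (first, last, max, min)
    feature vectors of the two sequences, as described in the text."""
    features_q = (q[0], q[-1], max(q), min(q))
    features_c = (c[0], c[-1], max(c), min(c))
    # The value is on the squared scale. If the DTW distance is reported
    # under a square root, take the square root of this value before
    # comparing (an assumption about scaling, not stated in the text).
    return max((fq - fc) ** 2 for fq, fc in zip(features_q, features_c))

print(lb_kim([1, 3, 2], [2, 7, 1]))
```

In the toy call the largest feature difference comes from the maxima (3 versus 7). Because only four numbers per sequence are compared, the bound is cheap but discards most of the shape information.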
Fig. 5. A visual intuition of the lower bounding measure introduced by Yi et al. The sum of the squared
lengths of the gray lines represents the minimum contribution of the corresponding points to the overall DTW
distance, and thus can be returned as the lower bounding measure
Before introducing our lower bounding technique, we must review additional details
of the DTW algorithm that we deliberately omitted until now.
In addition to the constraints on the warping path enumerated in Sect. 2.1, virtually
all practitioners using DTW also constrain the warping path in a global sense by
limiting how far it may stray from the diagonal (Berndt and Clifford 1994; Chu et al.
2002; Gollmer and Posten 1995; Itakura 1975; Keogh and Pazzani 2000; Myers et
al. 1980; Sakoe and Chiba 1978; Tappert and Das 1978). The subset of the matrix
that the warping path is allowed to visit is called the warping window. Figure 6
illustrates two of the most frequently used global constraints, the Sakoe-Chiba band
(Sakoe and Chiba 1978) and the Itakura parallelogram (Itakura 1975).
Fig. 6. Global constraints limit the scope of the warping path, restricting them to the gray areas. The two
most common constraints in the literature are the Sakoe-Chiba band and the Itakura parallelogram
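To make the notion of a warping window concrete, the sketch below restricts the dynamic programming recurrence to cells with |i − j| ≤ r, which is one common way of expressing a Sakoe-Chiba band. The function name `dtw_band` and the half-width parameter `r` are illustrative assumptions; the Itakura parallelogram would need a different cell test.

```python
import math

def dtw_band(q, c, r):
    """DTW restricted to cells (i, j) with |i - j| <= r."""
    n, m = len(q), len(c)
    inf = math.inf
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are visited; the rest stay at infinity.
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j - 1],
                                  gamma[i - 1][j],
                                  gamma[i][j - 1])
    # The result is infinite when the band excludes the final corner cell.
    return math.sqrt(gamma[n][m])

q = [0, 0, 1, 2, 1, 0]
c = [0, 1, 2, 1, 0, 0]
print(dtw_band(q, c, 0), dtw_band(q, c, 1))
```

With r = 0 only the diagonal is allowed and the result coincides with the Euclidean distance; widening the band to r = 1 is enough to absorb the one-step shift in this toy pair.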
There are several reasons for using global constraints, one of which is that they
slightly speed up the DTW distance calculation. However, the most important rea-
son is to prevent pathological warpings, where a relatively small section of one se-
quence maps onto a relatively large section of another. The importance of global
constraints was documented by the originators of the DTW algorithm, who were
exclusively interested in aligning speech patterns (Sakoe and Chiba 1978). However,
it has been empirically confirmed in other settings, including finance, medicine, bio-
metrics, chemistry, astronomy, robotics, and industry.
Fig. 7. a to d Four local constraints on dynamic time warping, as suggested by Sakoe and Chiba. a) cor-
responds to the trivial case of no constraint, and is therefore equivalent to Eq. (5), γ(i, j) = d(i, j) +
min {γ(i−1, j−1), γ(i−1, j), γ(i, j−1)}. In contrast, c) corresponds to γ(i, j) = d(i, j)+min {γ(i−1, j−1),
γ(i − 1, j − 2), γ(i − 2, j − 1)}. Local constraints can be reinterpreted as global constraints, as an example,
d) can be reinterpreted as the global constraint shown in e), which looks superficially like the Itakura paral-
lelogram constraint
DTW matrix, the cell (i, j) of the shadow matrix can be labeled as reachable. The
convex hull of all the reachable cells forms a band, which can be interpreted as
a global constraint. Note that we only have to do this once and we can then store
the resulting constraint for future use. While this reinterpretation of local constraints
may be obvious, we state it explicitly because it has not appeared in the literature
to our knowledge. Finally, we note that global and local constraints can be used
together; the interpretation being that, where they conflict, the most restrictive con-
straint (i.e. the one that forces the path closest to the diagonal line) is used (Kruskall
and Liberman 1983).
Fig. 8. An illustration of the sequences U and L created for sequence Q (shown dotted). A was created using
the Sakoe-Chiba band and B using the Itakura parallelogram
Having defined U and L, we now use them to define a lower bounding measure
for DTW.
\[
LB\_Keogh(Q,C) = \sqrt{\sum_{i=1}^{n}
\begin{cases}
(c_i - U_i)^2 & \text{if } c_i > U_i \\
(c_i - L_i)^2 & \text{if } c_i < L_i \\
0 & \text{otherwise}
\end{cases}} \tag{9}
\]
This function can be readily visualized as the Euclidean distance between any
part of the candidate matching sequence not falling within the envelope and the
nearest (orthogonal) corresponding section of the envelope. Figure 9 illustrates the
idea.
Fig. 9. An illustration of the lower bounding function LB_Keogh(Q,C). The original sequence Q (shown
dotted) is enclosed in the bounding envelope of U and L. The squared sum of the distances from every part
of the candidate sequence C not falling within the bounding envelope, to the nearest orthogonal edge of the
bounding envelope is returned as the lower bound. Bounding envelope A was created using the Sakoe-Chiba
band and bounding envelope B using the Itakura parallelogram
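The envelope and the bound of Eq. (9) can be sketched as below for a band of constant half-width r. The upper envelope follows the form max(q_{i−r} : q_{i+r}) used in the proof; taking the lower envelope as the corresponding minimum, and clipping the window at the ends of the sequence, are our assumptions. The function names are ours.

```python
import math

def envelope(q, r):
    """Upper and lower envelopes of q for a band of half-width r."""
    n = len(q)
    upper, lower = [], []
    for i in range(n):
        window = q[max(0, i - r):min(n, i + r + 1)]  # clipped at the ends
        upper.append(max(window))
        lower.append(min(window))
    return upper, lower

def lb_keogh(q, c, r):
    """Distance from the parts of c outside the envelope of q to the envelope."""
    if len(q) != len(c):
        raise ValueError("LB_Keogh is only defined for equal-length sequences")
    upper, lower = envelope(q, r)
    total = 0.0
    for ci, ui, li in zip(c, upper, lower):
        if ci > ui:
            total += (ci - ui) ** 2
        elif ci < li:
            total += (ci - li) ** 2
    return math.sqrt(total)

q = [0, 0, 1, 2, 1, 0]
print(envelope(q, 1))
print(lb_keogh(q, [3, 3, 3, 3, 3, 3], 1))
```

With r = 0 the envelope collapses onto q itself and the bound equals the Euclidean distance; as r grows the envelope widens and the bound can only decrease.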
Because the tightness of the bounds is proportional to the number and length of
the gray hatch lines, we can see, in this example at least, that the Itakura parallel-
ogram provides a tighter bound than the Sakoe-Chiba band does, and both appear
tighter than LB_Kim or LB_Yi in Figs. 4 and 5, respectively.
We will now prove the claim of lower bounding.
Proposition 1. For any two sequences Q and C of the same length n, for any global
constraint on the warping path of the form $j - r \le i \le j + r$, the following inequality
holds: $LB\_Keogh(Q,C) \le DTW(Q,C)$.
Our strategy will be to assume the opposite and show that it leads to a contradiction.
Assume
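\[
\sqrt{\sum_{i=1}^{n}
\begin{cases}
(c_i - U_i)^2 & \text{if } c_i > U_i \\
(c_i - L_i)^2 & \text{if } c_i < L_i \\
0 & \text{otherwise}
\end{cases}}
\; > \; \sqrt{\sum_{k=1}^{K} w_k}.
\]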
Because the terms under the radicals are positive, we can square both sides:
\[
\sum_{i=1}^{n}
\begin{cases}
(c_i - U_i)^2 & \text{if } c_i > U_i \\
(c_i - L_i)^2 & \text{if } c_i < L_i \\
0 & \text{otherwise}
\end{cases}
\; > \; \sum_{k=1}^{K} w_k.
\]
We will map the ith term on the LHS to one of the ith terms on the RHS (recall
that $w_k$ is defined as $(i, j)_k$, cf. Sect. 2.1). There may be several values of j for
a single i; so to enforce the desired one-to-one mapping, we will map to the one with
the lowest value for j. All the other $w_k$'s are placed in the unmatched summation.
For the moment, let us ignore the unmatched terms and see what relationship
exists between just the matched terms and the LHS. There are three cases to consider;
let us consider the case when ci > Ui :
$(c_i - U_i)^2 \overset{?}{\le} w_k$
$(c_i - U_i)^2 \overset{?}{\le} (c_i - q_j)^2$   By definition (cf. Sect. 2.1).
$(c_i - U_i) \overset{?}{\le} (c_i - q_j)$   Because $c_i > U_i$, we can take square roots.
$-U_i \overset{?}{\le} -q_j$   Add $-c_i$ to both sides.
$q_j \overset{?}{\le} U_i$   Add $U_i + q_j$ to both sides.
$q_j \overset{?}{\le} \max(q_{i-r} : q_{i+r})$   By definition, Eq. (6).
Because we have n = m (recall LB_Keogh is only defined when |Q| = |C|), then
$j - r \le i \le j + r \Rightarrow i - r \le j \le i + r$, so we can rewrite the RHS as
$q_j \overset{?}{\le} \max(q_{i-r}, q_{(i+1)-r}, \ldots, q_j, \ldots, q_{i+r})$.
If we remove all terms except $q_j$ from the RHS, we are left with
$q_j \le \max(q_j)$.
The case when ci < Li yields to a similar argument. The third case yields
4. Indexing DTW
Virtually all approaches to indexing time series under the Euclidean distance that
guarantee no false dismissals use the GEMINI framework of Faloutsos et al. (Chan
et al. 2003; Faloutsos et al. 1994; Keogh et al. 2000, 2001; Korn et al. 1997; Yi
and Faloutsos 2000). Using the GEMINI framework, all one has to do is to choose
a high level representation of the data and define a lower bounding measure on it
(Faloutsos et al. 1994). Many such representations have been suggested, including
Fourier transforms (Faloutsos et al. 1994), wavelets (Chan et al. 2003), singular
value decomposition (Korn et al. 1997), adaptive piecewise aggregate approximation
(Keogh et al. 2001), and a simple technique independently introduced by two authors
called piecewise aggregate approximation (PAA) (Keogh et al. 2000; Yi and Falout-
sos 2000). This technique is attractive because it is simple, intuitive, and competitive
with the other more complex approaches. In this section, we will show that PAA can
be adapted to allow indexing under DTW. We begin with a brief review of PAA.
Simply stated, to reduce the time series from n dimensions to N dimensions, the
data is divided into N equal-sized frames. The mean value of the data falling within
a frame is calculated, and a vector of these values becomes the data-reduced
representation. The complicated subscripting in Eq. (10) just ensures that the original
sequence is divided into the correct number and size of frames. The representation
can best be visualized as an attempt to model the original time series with a linear
combination of box basis functions as shown in Fig. 10.
Fig. 10. The PAA representation can be readily visualized as an attempt to model a sequence with a linear
combination of box basis functions. In this case, a sequence of length 256 is reduced to 16 dimensions
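The frame-averaging step described above can be sketched as follows. The sketch assumes that the sequence length n is an exact multiple of the number of frames N, which sidesteps the subscripting of Eq. (10); the function name `paa` is ours.

```python
def paa(series, n_frames):
    """Reduce `series` to `n_frames` values by averaging equal-sized frames."""
    n = len(series)
    if n_frames <= 0 or n % n_frames != 0:
        raise ValueError("this sketch assumes n is a multiple of N")
    size = n // n_frames  # points per frame, n / N
    return [sum(series[k * size:(k + 1) * size]) / size
            for k in range(n_frames)]

print(paa([1, 3, 5, 7, 2, 4, 6, 8], 4))
```

Each output value is the height of one box basis function in the visualization of Fig. 10.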
Given two original sequences Q and C, we can transform them into Q̄ and C̄
using Eq. (10), and approximate their Euclidean distance by:
\[ DR(\bar{Q}, \bar{C}) \equiv \sqrt{\frac{n}{N}} \sqrt{\sum_{i=1}^{N} (\bar{q}_i - \bar{c}_i)^2}. \tag{11} \]
A proof that DR( Q̄, C̄) lower bounds the true Euclidean distance is in Keogh et al.
(2000) (a different proof appears in Yi and Faloutsos (2000)).
We can visualize Û and L̂ as the piecewise constant functions that bound, without
intersecting, U and L, respectively. Figure 11 illustrates this intuition.
Fig. 11. We can readily visualize Û and L̂ as the piecewise constant functions that bound, without intersecting,
U and L, respectively
We are now able to define the low dimension, lower bounding function, which
we denote as LB_PAA. Given a candidate sequence C, transformed to C̄ by Eq. (10),
and a query sequence Q, with its companion PAA functions Û and L̂, the following
function lower bounds LB_Keogh:
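\[
LB\_PAA(Q, \bar{C}) = \sqrt{\sum_{i=1}^{N} \frac{n}{N}
\begin{cases}
(\bar{c}_i - \hat{U}_i)^2 & \text{if } \bar{c}_i > \hat{U}_i \\
(\bar{c}_i - \hat{L}_i)^2 & \text{if } \bar{c}_i < \hat{L}_i \\
0 & \text{otherwise}
\end{cases}} \tag{14}
\]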
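A self-contained sketch of Eq. (14) is given below. It assumes that Û and L̂ are the frame-wise maximum of U and minimum of L (so that they bound U and L without intersecting them), that the envelope is built from a constant-width band r, and that n is a multiple of N; these choices and the function name `lb_paa` are ours.

```python
import math

def lb_paa(q, c, r, n_frames):
    """Reduced-dimensionality lower bound between query q and candidate c."""
    n = len(q)
    if len(c) != n or n % n_frames != 0:
        raise ValueError("this sketch assumes |Q| = |C| and n a multiple of N")
    size = n // n_frames  # n / N
    # Envelope of the query (slices past the end are clipped by Python).
    upper = [max(q[max(0, i - r):i + r + 1]) for i in range(n)]
    lower = [min(q[max(0, i - r):i + r + 1]) for i in range(n)]
    total = 0.0
    for k in range(n_frames):
        frame = slice(k * size, (k + 1) * size)
        c_bar = sum(c[frame]) / size   # PAA value of the candidate
        u_hat = max(upper[frame])      # assumed piecewise constant bound on U
        l_hat = min(lower[frame])      # assumed piecewise constant bound on L
        if c_bar > u_hat:
            total += size * (c_bar - u_hat) ** 2
        elif c_bar < l_hat:
            total += size * (c_bar - l_hat) ** 2
    return math.sqrt(total)

print(lb_paa([0, 0, 1, 2, 1, 0], [3, 3, 3, 3, 3, 3], 1, 3))
```

For this toy pair the reduced bound evaluates to the square root of 12, below the full-resolution LB_Keogh value (square root of 20) for the same r, as the lower bounding relationship requires.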
Fig. 12. A) A representation of a minimum bounding rectangle (MBR). B) A subsection of the query shown
in Fig. 11, with its attendant functions Û and L̂. C) An illustration of the MINDIST function. The lengths of
the arrow lines, squared, scaled by n/N, summed and square rooted, are returned as the minimum distance
between Q and any sequence contained within R
Table 3. K-NN algorithm to compute the exact K nearest neighbors of a query time series Q using a multi-
dimensional index structure
Algorithm KNNSearch(Q,K)
  Variable queue: MinPriorityQueue;
  Variable temp: list;
  queue.push(root_node_of_index, 0);
  while not queue.IsEmpty() do
    top = queue.Top();
    for each time series C in temp such that DTW(Q,C) ≤ top.dist
      Remove C from temp;
      Add C to result;
      if |result| = K return result;
    queue.Pop();
    if top is a PAA point C
      Retrieve full sequence C from database;
      temp.insert(C, DTW(Q,C));
    else if top is a leaf node
      for each data item C in top
        queue.push(C, LB_PAA(Q,C̄));
    else // top is a non-leaf node
      for each child node U in top
        queue.push(U, MINDIST(Q,R)); // R is the MBR associated with U
By inserting the time series in temp (i.e. previously seen objects) into result in in-
creasing order of their distances DTW(Q,C) (by keeping temp sorted by DTW(Q,C)),
we ensure that there exists no explored object E such that DTW(Q,E) <
DTW(Q,C).
The definitions of LB_Keogh, LB_PAA, and MINDIST proposed in this work
are also needed for answering range queries using a multidimensional index struc-
ture. We can use a classic R-tree-style recursive search algorithm. Because both
MINDIST(Q,R) and LB_PAA(Q,C̄) lower bound DTW(Q,C), the algorithm shown
in Table 4 is correct (Faloutsos and Lin 1995).
Table 4. Range search algorithm to retrieve all the time series within a range of ε from query time series Q.
The function is invoked as RangeSearch(Q, ε, root_node_of_index)
Algorithm RangeSearch(Q,ε,T)
  if T is a non-leaf node
    for each child U of T
      if MINDIST(Q,R) ≤ ε RangeSearch(Q,ε,U); // R is MBR of U
  else // T is a leaf node
    for each PAA point C in T
      if LB_PAA(Q,C̄) ≤ ε
        Retrieve full sequence C from database;
        if DTW(Q,C) ≤ ε Add C to result;
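The recursive structure of Table 4 can be sketched with a toy tree. The `Node` class, the three distance functions passed as parameters, and the one-dimensional example data are all hypothetical stand-ins; a real index would use MBRs over PAA points together with MINDIST, LB_PAA and DTW.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical index node: non-leaf nodes hold children, leaves hold items."""
    children: list = field(default_factory=list)
    items: list = field(default_factory=list)

def range_search(query, eps, node, mindist, lower_bound, distance, result=None):
    """Collect every item within eps of query, pruning by the two lower bounds."""
    if result is None:
        result = []
    if node.children:  # non-leaf: descend only into promising subtrees
        for child in node.children:
            if mindist(query, child) <= eps:
                range_search(query, eps, child, mindist, lower_bound,
                             distance, result)
    else:              # leaf: filter with the cheap bound, then verify
        for item in node.items:
            if lower_bound(query, item) <= eps and distance(query, item) <= eps:
                result.append(item)
    return result

# Toy 1-D stand-ins: items are numbers and a node's "MBR" is its value range.
def span(node):
    if node.items:
        return min(node.items), max(node.items)
    lows, highs = zip(*(span(child) for child in node.children))
    return min(lows), max(highs)

def toy_mindist(q, node):
    low, high = span(node)
    return max(low - q, 0.0, q - high)

root = Node(children=[Node(items=[1.0, 2.0]), Node(items=[10.0, 12.0])])
hits = range_search(2.5, 1.0, root, toy_mindist,
                    lambda q, x: abs(q - x) / 2,   # crude lower bound
                    lambda q, x: abs(q - x))       # "true" distance
print(hits)
```

In the toy run the second leaf is never opened because its minimum possible distance already exceeds ε, which is the same pruning argument that makes the algorithm in Table 4 correct when both bounds never overestimate the true distance.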
5. Experimental evaluation
In this section, we test our proposed approach with a comprehensive set of experi-
ments.
T is in the range [0, 1], with larger values being better. To estimate T for each of the 32
datasets, we did the following: We randomly extracted 50 sequences of length 256.
We compared each sequence to the 49 others, using the true DTW distance, and the
three lower bounding functions. For each dataset, we report T as the average ratio
from the 1,225 (50*49/2) comparisons made.
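The averaging procedure can be sketched as follows, under the assumption that T for a pair is the lower bound estimate divided by the true DTW distance (consistent with T lying in [0, 1] for a valid lower bound). The function name and the skipping of zero-distance pairs are our choices.

```python
from itertools import combinations

def mean_tightness(sequences, lower_bound, distance):
    """Average of lower_bound / distance over all unordered pairs."""
    ratios = []
    for a, b in combinations(sequences, 2):
        true_dist = distance(a, b)
        if true_dist > 0:  # skip identical pairs to avoid dividing by zero
            ratios.append(lower_bound(a, b) / true_dist)
    if not ratios:
        raise ValueError("no pair with a non-zero distance")
    return sum(ratios) / len(ratios)

# Number of unordered pairs among 50 sequences, as used in the experiment.
pair_count = sum(1 for _ in combinations(range(50), 2))
print(pair_count)
```

Counting unordered pairs among 50 sequences reproduces the 1,225 comparisons quoted in the text.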
Figure 13 shows the results of the experiments. On 24 out of 32 datasets, LB_Yi
produces tighter bounds than LB_Kim, and its average value is approximately 1.38
times larger. The most obvious result from the experiment, however, is the dominance
of LB_Keogh. It wins on every dataset, and its average value is approximately 3.11
times larger than its nearest rival. Because the efficiency of indexing has a (much)
greater than linear dependence on the tightness of the lower bounding function, these
results augur well for our approach.
Fig. 13. The mean value of T (tightness of lower bound) for the three lower bounding functions under con-
sideration for 32 datasets from finance, medicine, biometrics, chemistry, physics, astronomy, robotics, net-
working, and industry. Appendix A contains a key to the datasets
We choose to report results from a query length of 256 because this is about the
midrange of queries reported in the literature (Chan et al. 2003; Chu et al. 2002; Park
et al. 1999; Yi et al. 1998). However, we also experimented with queries in the range
of 32-1,024. This range was chosen to include the longest and shortest reported in the
literature (Chan et al. 2003; Park, personal communication). All techniques perform
better for short queries; however, while both LB_Kim and LB_Yi degrade rapidly
for longer queries, LB_Keogh stays almost constant for longer queries. This effect
was observed on all datasets. For brevity, we just present results for the random walk
dataset in Fig. 14.
Fig. 14. The effect of query length on the tightness of lower bounds for the three techniques under consid-
eration
lower bounding functions to prune away the quadratic-time computation of the full
DTW algorithm. For fairness, we visit the 49 sequences in the same order for each
approach. The value P reported is averaged over all 50 runs.
Note the value of P depends only on the data and is completely independent
of any implementation choices, including spatial access method, buffer size, com-
puter language, or hardware platform. A similar idea for evaluating indexing schemes
appears in Hellerstein et al. (1997).
The results are summarized in Fig. 15. On 25 out of 32 datasets, LB_Yi is
more efficient at pruning than LB_Kim. On average, it was able to prune 1.53 times
as many items. Once again, however, the most obvious result is the dominance of
LB_Keogh. It wins on every dataset and was able to prune 3.95 times as many items
as LB_Yi and 6.06 times as many items as LB_Kim.
Fig. 15. The mean value of P (pruning power) for the three lower bounding functions under consideration
for 32 datasets from finance, medicine, biometrics, chemistry, physics, astronomy, robotics, networking, and
industry. Appendix A contains a key to the datasets
The second reason why the results may be pessimistic predictors of indexing
performance is the relatively small size of the datasets. We should expect the fraction
of pruned sequences to increase on larger datasets. This is because the
larger the dataset, the greater the chance there is of a good match being found, and
a good match allows us to extract the maximum benefit from the pruning conditional
LB_dist < best_so_far in line 4 of the algorithm in Table 2. To demonstrate
this effect, we ran the same experiment above on increasingly larger subsets of the
random walk dataset. The results are shown in Fig. 16.
Fig. 16. The effect of database size on pruning power. Note that, as the size of the database increases, we
are able to prune a larger fraction of the data
found that it never beat the linear scan, and we therefore decided to exclude it from
graphic presentation.
We tested over a range of query lengths and dimensionalities, but show just one
typical result for brevity. Figure 17 shows the normalized CPU cost of linear scan and
LB_Keogh, for queries of length 256, with a 16-dimensional index, for increasingly
large databases.
Fig. 17. The normalized CPU cost of linear scan and LB_Keogh, for queries of length 256, with a 16-
dimensional index, for increasingly large databases. Note that the X-axis is in logarithmic scale and denotes
the number of items in the database
In Sect. 3.3.1, we justified using warping windows by noting that researchers who
use DTW to solve real-world problems have documented their utility (Aach and
Church 2001; Caiani et al. 1998; Gavrila and Davis 1995; Gollmer and Posten 1995;
Itakura 1975; Kovacs-Vajna 2000; Munich and Perona 1999; Rath and Manmatha
2002). However, because warping windows are the cornerstone of our lower bound-
ing technique, we will conduct experiments to explicitly justify their use. As we are
Fig. 18. The effect of warping window width on the tightness of lower bounds for various query lengths. Note
that, even with an extremely loose warping window, equal to 20% of the query length, the lower bounds for
LB_Keogh are tighter than the two competing approaches over the entire range of query lengths
Fig. 19. A) Top: examples of the two classes from the TCB dataset. Bottom: Examples of the four classes
from the ASL dataset. B) The effect of varying the warping window width on accuracy for the two datasets
in question
Appendix A
The raw numbers obtained from the experiments discussed in Sects. 5.2 and 5.3 are
shown in Table 5. These numbers may be visualized in Figs. 13 and 15, respectively.
Table 5. The raw numbers obtained from the experiments discussed in Sects. 5.2 and 5.3
References
Aach J, Church G (2001) Aligning gene expression time series with time warping algorithms. Bioinformatics
17:495–508
Agrawal R, Lin KI, Sawhney HS, Shim K (1995) Fast similarity search in the presence of noise, scaling, and
translation in time-series databases. In: Proceedings of the 21st international conference on very large
databases, pp 490–501
Bar-Joseph Z, Gerber G, Gifford D, Jaakkola T, Simon I (2002) A new approach to analyzing gene expression
time series data. In: Proceedings of the 6th annual international conference on research in computational
molecular biology, pp 39–48
Berndt D, Clifford J (1994) Using dynamic time warping to find patterns in time series. AAAI-94 workshop
on knowledge discovery in databases, pp 229–248
Caiani EG, Porta A, Baselli G, Turiel M, Muzzupappa S, Pieruzzi F, Crema C, Malliani A, Cerutti S (1998)
Warped-average template technique to track on a cycle-by-cycle basis the cardiac filling phases on left
ventricular volume. IEEE Comput Cardiol 25:73–76
Chan KP, Fu A, Yu C (2003) Haar wavelets for efficient similarity search of time-series: with and without
time warping. IEEE Trans Knowl Data Eng 15(3):686–705
Chu S, Keogh E, Hart D, Pazzani M (2002) Iterative deepening dynamic time warping for time series. In:
Proceedings of the 2nd SIAM international conference on data mining
Das G, Lin K, Mannila H, Renganathan G, Smyth P (1998) Rule discovery from time series. Proceedings of
the 4th international conference of knowledge discovery and data mining. AAAI Press, pp 16–22
Debregeas A, Hebrail G (1998) Interactive interpretation of Kohonen maps applied to curves. Proceedings
of the 4th international conference of knowledge discovery and data mining, pp 179–183
Diez JJR, Gonzalez CA (2000) Applying boosting to similarity literals for time series classification. Multiple
classifier systems, 1st international workshop, pp 210–219
Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast subsequence matching in time-series databases.
In: Proceedings of the ACM SIGMOD conference, Minneapolis, MN, pp 419–429
Faloutsos C, Lin K (1995) FastMap: A fast algorithm for indexing, data-mining and visualization of tradi-
tional and multimedia datasets. SIGMOD conference, pp 163–174
Gavrila DM, Davis LS (1995) Towards 3-d model-based tracking and recognition of human movement:
a multi-view approach. In: International workshop on automatic face- and gesture-recognition, pp 272–
277
Gollmer K, Posten C (1995) Detection of distorted pattern using dynamic time warping algorithm and ap-
plication for supervision of bioprocesses. On-line fault detection and supervision in chemical process
industries
Guttman A (1984) R-trees: A dynamic index structure for spatial searching. In: Proceedings ACM SIGMOD
conference, pp 47–57
Hellerstein JM, Papadimitriou CH, Koutsoupias E (1997) Towards an analysis of indexing schemes. 16th
ACM symposium on principles of database systems, pp 249–256
Itakura F (1975) Minimum prediction residual principle applied to speech recognition. IEEE Trans Acoustics
Speech Signal Process ASSP 23:52–72
Kadous MW (1999) Learning comprehensible descriptions of multivariate time series. In: Proceedings of the
16th international machine learning conference, pp 454–463
Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2000) Dimensionality reduction for fast similarity search
in large time series databases. J Knowl Inf Syst 3(3):263–286
Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Locally adaptive dimensionality reduction for in-
dexing large time series databases. In: Proceedings of ACM SIGMOD conference on management of
data, May, pp 151–162
Keogh E, Pazzani M (2000) Scaling up dynamic time warping for data mining applications. In: 6th ACM
SIGKDD international conference on knowledge discovery and data mining, Boston
Kim S, Park S, Chu W (2001) An index-based approach for similarity search supporting time warping in
large sequence databases. In: Proceedings of the 17th international conference on data engineering, pp
607–614
Kollios G, Vlachos M, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceed-
ings of the 18th international conference on data engineering
Korn F, Jagadish H, Faloutsos C (1997) Efficiently supporting ad hoc queries in large datasets of time se-
quences. In: Proceedings of SIGMOD ’97, pp 289–300
Kovacs-Vajna ZM (2000) A fingerprint verification system based on triangular matching and dynamic time
warping. IEEE Trans Pattern Anal Mach Intell 22(11):1266–1276
Kruskal JB, Liberman M (1983) The symmetric time warping algorithm: from continuous to discrete. In:
Time warps, string edits and macromolecules. Addison
Kwong S, He Q, Man K (1996) Genetic time warping for isolated word recognition. Int J Patt Recogn Artif
Intell 10(7):849–865
Munich M, Perona P (1999) Continuous dynamic time warping for translation-invariant curve alignment
with applications to signature verification. In: Proceedings of 7th international conference on computer
vision, Corfu, Greece, pp 108–115
Myers C, Rabiner L, Rosenberg A (1980) Performance tradeoffs in dynamic time warping algorithms for
isolated word recognition. IEEE Trans Acoustics Speech Signal Process ASSP-28:623–635
Park S, Lee D, Chu W (1999) Fast retrieval of similar subsequences in long sequence databases. In: 3rd IEEE
knowledge and data engineering exchange workshop
Park S, Kim S, Chu W (2001) Segment-based approach for subsequence searches in sequence databases. In:
Proceedings of the 16th ACM symposium on applied computing, Las Vegas, NV, pp 248–252
Park S, Chu W, Yoon J, Hsu C (2000) Efficient searches for similar subsequences of different lengths in
sequence databases. In: Proceedings of the 16th IEEE international conference on data engineering, pp
23–32
Rabiner L, Juang B (1993) Fundamentals of speech recognition. Prentice, Englewood Cliffs, NJ
Rabiner L, Rosenberg A, Levinson S (1978) Considerations in dynamic time warping algorithms for discrete
word recognition. IEEE Trans Acoustics Speech Signal Process ASSP-26:575–582
Rath T, Manmatha R (2002) Word image matching using dynamic time warping, Technical Report MM-38. Center
for Intelligent Information Retrieval, University of Massachusetts Amherst
Roussopoulos N, Kelley S, Vincent F (1995) Nearest neighbor queries. SIGMOD Conference, pp 71–79
Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE
Trans Acoustics Speech Signal Process ASSP 26:43–49
Schmill M, Oates T, Cohen P (1999) Learned models for continuous planning. In: 7th international workshop
on artificial intelligence and statistics
Seidl T, Kriegel H (1998) Optimal multi-step k-nearest neighbor search. SIGMOD Conference, pp 154–165
Strik H, Boves L (1988) Averaging physiological signals with the use of a DTW algorithm. In: Proceedings
SPEECH’88, 7th FASE symposium, Edinburgh, Book 3, pp 883–890
Tappert C, Das S (1978) Memory and time improvements in a dynamic programming algorithm for matching
speech patterns. IEEE Trans Acoustics Speech Signal Process ASSP 26:583–586
Walker J (2001) HotBits: genuine random numbers generated by radioactive decay,
www.fourmilab.ch/hotbits/
Yi BK, Jagadish HV, Faloutsos C (1998) Efficient retrieval of similar time sequences under time warping. In:
ICDE 98, pp 23–27
Yi BK, Faloutsos C (2000) Fast time sequence indexing for arbitrary Lp norms. Proceedings of the 26th
international conference on very large databases, pp 385–394
Author biographies
Eamonn Keogh is an assistant professor of computer science at the Univer-
sity of California, Riverside. His research interests are in data mining, machine
learning and information retrieval. Several of his papers have won best-paper
awards, including papers at SIGKDD and SIGMOD. Dr. Keogh is the recip-
ient of a 5-year NSF Career Award for Efficient Discovery of Previously Un-
known Patterns and Relationships in Massive Time Series Databases.
Correspondence and offprint requests to: Eamonn Keogh, University of California–Riverside, Computer Sci-
ence & Engineering Department, Riverside, CA 92521, USA. Email: [email protected]