Dynamic Time Warping
Dynamic time warping (DTW) is a method that calculates an optimal match between two given sequences (e.g. time series)
under certain restrictions. The sequences are "warped" non-linearly in the time dimension to determine a
measure of their similarity that is independent of certain non-linear variations in the time dimension. This sequence
alignment method is often used in time series classification. Although DTW measures a distance-like quantity
between two given sequences, it does not guarantee that the triangle inequality holds.
In addition to a similarity measure between the two sequences, DTW produces a so-called "warping path";
warping the two signals according to this path aligns them in time. A signal with an original set of points
X(original), Y(original) is transformed to X(warped), Y(original). This finds applications in genetic sequence
analysis and audio synchronisation. In a related technique, sequences of varying speed may be averaged; see the
Average sequence section.
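The warping path can be recovered by backtracking through the matrix of cumulative costs. The Python sketch below is an illustration, not part of the original text: it assumes the absolute-difference cost |x − y|, breaks ties in favour of the diagonal step, and `dtw_with_path` is a hypothetical helper name. The path is a list of 0-based index pairs (i, j) matching s[i] with t[j]; reading s along the path gives s warped onto the time axis of t.

```python
import math


def dtw_with_path(s, t):
    """Return the DTW distance and one optimal warping path between s and t.

    Assumption for this sketch: the local cost is |x - y|.
    """
    n, m = len(s), len(t)
    # D[i][j] = cheapest cost of aligning the prefixes s[:i] and t[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Walk back from (n, m) to (0, 0), always moving to the cheapest
    # predecessor cell; min() keeps the first of equal candidates, so
    # ties favour the diagonal step.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))  # 0-based indices into s and t
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda cell: D[cell[0]][cell[1]])
    path.reverse()
    return D[n][m], path


s, t = [1, 2, 3], [1, 2, 2, 3]
dist, path = dtw_with_path(s, t)
print(dist, path)               # 0.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
print([s[i] for i, _ in path])  # [1, 2, 2, 3]: s warped onto the time axis of t
```

Here the middle sample of s is repeated so that it covers both 2s in t, which is exactly the non-linear stretching the warping path encodes.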
Contents
1 Implementation
2 Fast computation
3 Average sequence
4 Supervised Learning
5 Alternative approach
6 Open Source software
7 Applications
7.1 Spoken word recognition
7.2 Correlation Power Analysis
8 See also
9 References
10 Further reading
Implementation
This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s
and t are strings of discrete symbols. For two symbols x and y, d(x, y) is a distance between the symbols, e.g.
$d(x, y) = |x - y|$.
DTWDistance(s: array [1..n], t: array [1..m]) {
    DTW := array [0..n, 0..m]
    for i := 1 to n
        DTW[i, 0] := infinity
    for i := 1 to m
        DTW[0, i] := infinity
    DTW[0, 0] := 0
    for i := 1 to n
        for j := 1 to m
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match
    return DTW[n, m]
}
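The pseudocode translates almost line for line into Python. The sketch below is an illustration rather than part of the original text: `dtw_distance` is a hypothetical name, indices are shifted to Python's 0-based lists, and d defaults to |x − y| but can be replaced, for example by a 0/1 mismatch cost for discrete symbols. The last lines also check the remark from the introduction that DTW need not satisfy the triangle inequality.

```python
import math
from typing import Any, Callable, Sequence


def dtw_distance(s: Sequence[Any], t: Sequence[Any],
                 d: Callable[[Any, Any], float] = lambda x, y: abs(x - y)) -> float:
    """DTW distance between s and t, following the pseudocode.

    DTW[i][j] holds the cheapest cost of aligning the prefixes s[:i] and t[:j].
    """
    n, m = len(s), len(t)
    # Every cell starts at infinity; row 0 and column 0 keep that value.
    DTW = [[math.inf] * (m + 1) for _ in range(n + 1)]
    DTW[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = d(s[i - 1], t[j - 1])  # shift the 1-based pseudocode indices
            DTW[i][j] = cost + min(DTW[i - 1][j],      # insertion
                                   DTW[i][j - 1],      # deletion
                                   DTW[i - 1][j - 1])  # match
    return DTW[n][m]


# Discrete symbols with a 0/1 mismatch cost: repeated symbols are absorbed at no cost.
print(dtw_distance("abc", "aabbc", d=lambda x, y: 0.0 if x == y else 1.0))  # 0.0

# The triangle inequality can fail: DTW(a, c) exceeds DTW(a, b) + DTW(b, c).
a, b, c = [1], [1, 2], [2, 2, 2]
print(dtw_distance(a, b) + dtw_distance(b, c), dtw_distance(a, c))  # 2.0 3.0
```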
We sometimes want to add a locality constraint. That is, we require that if s[i] is matched with t[j], then
$|i - j|$ is no larger than w, a window parameter.
We can easily modify the above algorithm to add a locality constraint: every cell of DTW is initialized to
infinity, and the inner loop visits only the columns j with $|i - j| \le w$. However, this modification works only
if $|n - m|$ is no larger than w, i.e. the end point is within the window length from the diagonal. To make the
algorithm work in general, the window parameter w must be adapted so that $|n - m| \le w$ (see the line marked
with (*) in the code).
DTWDistance(s: array [1..n], t: array [1..m], w: int) {
    DTW := array [0..n, 0..m]
    w := max(w, abs(n-m)) // adapt window size (*)
    for i := 0 to n
        for j := 0 to m
            DTW[i, j] := infinity
    DTW[0, 0] := 0
    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match
    return DTW[n, m]
}
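A Python sketch of the windowed version, written as an illustration under stated assumptions: 0-based lists, a default cost of |x − y|, and a hypothetical function name `dtw_distance_window`. Because Python's `range` excludes its stop value, the inner loop runs to `min(m, i + w) + 1`; the first assignment to `w` plays the role of the line marked (*).

```python
import math


def dtw_distance_window(s, t, w, d=lambda x, y: abs(x - y)):
    """Windowed DTW: s[i] may only be matched with t[j] when |i - j| <= w."""
    n, m = len(s), len(t)
    w = max(w, abs(n - m))  # (*) widen the window so that the end cell (n, m) is reachable
    # Cells outside the band are never updated and stay infinite.
    DTW = [[math.inf] * (m + 1) for _ in range(n + 1)]
    DTW[0][0] = 0.0
    for i in range(1, n + 1):
        # range() excludes its stop value, hence the + 1
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = d(s[i - 1], t[j - 1])
            DTW[i][j] = cost + min(DTW[i - 1][j],      # insertion
                                   DTW[i][j - 1],      # deletion
                                   DTW[i - 1][j - 1])  # match
    return DTW[n][m]


print(dtw_distance_window([1, 2, 3, 4], [2, 3, 4, 5], w=0))  # 4.0: only the diagonal is allowed
print(dtw_distance_window([1, 2, 3, 4], [2, 3, 4, 5], w=1))  # 2.0
print(dtw_distance_window([1, 2, 3], [1, 1, 1, 2, 3], w=0))  # 0.0: w is widened to 2
```

A narrow window both speeds up the computation (only about (2w + 1) cells per row are filled) and forbids pathological alignments, at the price of a possibly larger distance, as the first two calls show.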
Fast computation
Computing the DTW requires $O(nm)$ time in general for sequences of lengths n and m, since the algorithm fills
every cell of the (n+1) × (m+1) matrix. Fast techniques for computing DTW include PrunedDTW,[1] SparseDTW,[2]
FastDTW,[3] and MultiscaleDTW.[4][5] A common task, retrieval of similar time series, can be accelerated by using
lower bounds such as LB_Keogh[6] or LB_Improved.[7] In a survey, Wang et al. reported slightly better results with
the LB_Improved lower bound than with the LB_Keogh bound, and found that other techniques were inefficient.[8]
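A lower bound speeds up retrieval because the cheap bound is computed first and the full DTW only when the bound could still beat the best match found so far. The Python sketch below illustrates this with an LB_Keogh-style envelope bound, under loud assumptions: Keogh and Ratanamahatana formulate LB_Keogh with a Euclidean cost, whereas this sketch uses the absolute-difference cost |x − y| from the Implementation section; the query and candidates must have equal length; and the envelope radius r must equal the DTW window w. The bound holds because each c[j] is matched to some q[i] with |i − j| ≤ r and therefore costs at least its distance to the band [lower[j], upper[j]]. All function names are hypothetical.

```python
import math


def envelope(q, r):
    """Upper and lower envelopes of q: max and min of q over each window [i - r, i + r]."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(c, upper, lower):
    """Sum of the distances from each c[j] to the band [lower[j], upper[j]]."""
    total = 0.0
    for x, hi, lo in zip(c, upper, lower):
        if x > hi:
            total += x - hi
        elif x < lo:
            total += lo - x
    return total


def dtw_window(s, t, w):
    """Windowed DTW with cost |x - y|, restricted to |i - j| <= w."""
    n, m = len(s), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            D[i][j] = abs(s[i - 1] - t[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def nearest(query, candidates, r):
    """Index and DTW distance of the candidate closest to query."""
    upper, lower = envelope(query, r)
    best_dist, best_idx = math.inf, -1
    for k, c in enumerate(candidates):
        if lb_keogh(c, upper, lower) >= best_dist:
            continue  # the bound already rules this candidate out: skip the full DTW
        dist = dtw_window(query, c, r)
        if dist < best_dist:
            best_dist, best_idx = dist, k
    return best_idx, best_dist


candidates = [[3, 3, 3, 3], [0, 1, 2, 3], [1, 2, 3, 4]]
print(nearest([0, 1, 2, 3], candidates, r=1))  # (1, 0.0); candidate 2 is pruned (bound 1.0)
```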
Average sequence
Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences.
The average sequence is the sequence that minimizes the sum of the squared DTW distances to the sequences in
the set. NLAAF[9] is the exact method for two sequences. For more than two sequences, the problem is related to
multiple sequence alignment and requires heuristics. DBA[10] is currently the reference method for averaging a set
of sequences consistently with DTW. COMASA[11] efficiently randomizes the search for the average sequence,
using DBA as a local optimization process.
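DBA (DTW Barycenter Averaging) refines an initial average iteratively: each sequence is aligned to the current average with DTW, every point of the average collects the points aligned to it, and each point is replaced by the mean of its collection. The Python sketch below follows that outline under stated assumptions: it uses the squared cost (x − y)² so that taking the mean matches the sum-of-squares objective, it starts from the first sequence, and it runs a fixed number of iterations instead of testing for convergence. The names `dtw_path` and `dba` are hypothetical, and the implementation by Petitjean et al. may differ in details such as initialization.

```python
import math


def dtw_path(s, t):
    """DTW with the squared cost (x - y)**2; returns (distance, path of 0-based pairs)."""
    n, m = len(s), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (s[i - 1] - t[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m), moving to the cheapest predecessor (diagonal wins ties).
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda cell: D[cell[0]][cell[1]])
    path.reverse()
    return D[n][m], path


def dba(sequences, iterations=10):
    """DTW Barycenter Averaging (sketch)."""
    average = [float(x) for x in sequences[0]]  # assumption: start from the first sequence
    for _ in range(iterations):
        # buckets[i] collects every point of every sequence aligned with average[i].
        buckets = [[] for _ in average]
        for seq in sequences:
            _, path = dtw_path(average, seq)
            for i, j in path:
                buckets[i].append(seq[j])
        # No bucket is empty: a warping path visits every index of the average.
        average = [sum(b) / len(b) for b in buckets]
    return average


sequences = [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0], [0, 1, 2, 2, 1, 0]]
print(dba(sequences))
```

For fixed alignments the mean minimizes the squared cost, and re-aligning can only lower each DTW cost, so the total squared DTW distance from the average to the set does not increase from one iteration to the next.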
Supervised Learning
A nearest-neighbour classifier can achieve state-of-the-art performance when using dynamic time warping
as a distance measure.[12]
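A one-nearest-neighbour classifier simply assigns a query the label of the training sequence with the smallest DTW distance. The Python sketch below assumes unconstrained DTW with the cost |x − y| and a training set given as (sequence, label) pairs; `classify_1nn` and the toy "rise"/"fall" data are hypothetical and only illustrate the idea.

```python
import math


def dtw(s, t):
    """Unconstrained DTW with cost |x - y|."""
    n, m = len(s), len(t)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def classify_1nn(query, training_set):
    """Return the label of the training sequence closest to query under DTW."""
    best_label, best_dist = None, math.inf
    for sequence, label in training_set:
        dist = dtw(query, sequence)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


training_set = [([0, 0, 1, 2, 3], "rise"), ([3, 2, 1, 0, 0], "fall")]
# The query has a different length and pace, but DTW aligns it with "rise" at zero cost.
print(classify_1nn([0, 1, 1, 2, 3, 3], training_set))  # rise
```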
Alternative approach
An alternative technique for DTW is based on functional data analysis, in which the time series are regarded as
discretizations of smooth (differentiable) functions of time and therefore continuous mathematics is applied.[13]
Optimal nonlinear time warping functions are computed by minimizing a measure of distance of the set of
functions to their warped average. Roughness penalty terms for the warping functions may be added, e.g., by
constraining the size of their curvature. The resultant warping functions are smooth, which facilitates further
processing. This approach has been successfully applied to analyze patterns and variability of speech
movements.[14][15]
Applications
Spoken word recognition
Because speaking rates differ, speech patterns fluctuate non-linearly along the time axis, and this fluctuation
must be eliminated before patterns can be compared.[16] DP-matching is a pattern-matching algorithm based on
dynamic programming (DP) that applies time normalization: the fluctuations along the time axis are modeled with a
non-linear time-warping function. Given any two speech patterns, their timing differences can be removed by warping
the time axis of one so that maximum coincidence with the other is attained. However, if the warping function is
allowed to take any possible value, little distinction can be made between words belonging to different categories.
To enhance the distinction between such words, restrictions were imposed on the slope of the warping function.
Correlation Power Analysis
Unstable clocks are used to defeat naive power analysis. Several techniques are used to counter this defense,
one of which is dynamic time warping.
See also
Levenshtein distance
Elastic matching
References
1. Silva, D.F. & Batista, G.E.A.P.A. (2015). Speeding Up All-Pairwise Dynamic Time Warping Matrix Calculation
(http://sites.labic.icmc.usp.br/dfs/pdf/SDM_PrunedDTW.pdf).
2. Al-Naymat, G., Chawla, S., & Taheri, J. (2012). SparseDTW: A Novel Approach to Speed up Dynamic Time Warping
(http://arxiv.org/abs/1201.2969).
3. Stan Salvador & Philip Chan, FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space. KDD
Workshop on Mining Temporal and Sequential Data, pp. 70-80, 2004
4. Meinard Müller, Henning Mattes, and Frank Kurth (2006). An Efficient Multiscale Approach to Audio Synchronization
(https://www.audiolabs-erlangen.de/fau/professor/mueller/publications/2006_MuellerMattesKurth_MultiscaleAudioSynchronization_ISMIR.pdf).
Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 192-197.
5. Thomas Prätzlich, Jonathan Driedger, and Meinard Müller (2016). Memory-Restricted Multiscale Dynamic Time
Warping. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.
569-573.
6. Keogh, E.; Ratanamahatana, C. A. (2005). "Exact indexing of dynamic time warping". Knowledge and Information
Systems. 7 (3): 358-386. doi:10.1007/s10115-004-0154-9.
7. Lemire, D. (2009). "Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound". Pattern Recognition. 42
(9): 2169-2180. arXiv:0811.3301. doi:10.1016/j.patcog.2008.11.030.
8. Wang, Xiaoyue; et al. "Experimental comparison of representation methods and distance measures for time series data".
Data Mining and Knowledge Discovery. 2010: 1-35.
9. Gupta, L.; Molfese, D. L.; Tammana, R.; Simos, P. G. (1996). "Nonlinear alignment and averaging for estimating the
evoked potential". IEEE Transactions on Biomedical Engineering. 43 (4): 348-356. PMID 8626184. doi:10.1109/10.486255.
10. Petitjean, F. O.; Ketterlin, A.; Gançarski, P. (2011). "A global averaging method for dynamic time warping, with
applications to clustering". Pattern Recognition. 44 (3): 678. doi:10.1016/j.patcog.2010.09.013.
11. Petitjean, F. O.; Gançarski, P. (2012). "Summarizing a set of time series by averaging: From Steiner sequence to
compact multiple alignment". Theoretical Computer Science. 414: 76. doi:10.1016/j.tcs.2011.09.029.
12. Ding, Hui; Trajcevski, Goce; Scheuermann, Peter; Wang, Xiaoyue; Keogh, Eamonn (2008). "Querying and mining of
time series data: experimental comparison of representations and distance measures". Proc. VLDB Endow. 1 (2):
1542-1552. doi:10.14778/1454159.1454226.
13. Lucero, J. C.; Munhall, K. G.; Gracco, V. G.; Ramsay, J. O. (1997). "On the Registration of Time and the Patterning of
Speech Movements" (http://jslhr.pubs.asha.org/article.aspx?doi=10.1044/jslhr.4005.1111). Journal of Speech, Language,
and Hearing Research. 40: 1111-1117. doi:10.1044/jslhr.4005.1111.
14. Howell, P.; Anderson, A.; Lucero, J. C. (2010). "Speech motor timing and fluency". In Maassen, B.; van Lieshout, P.
Speech Motor Control: New Developments in Basic and Applied Research. Oxford University Press. pp. 215-225.
ISBN 978-0199235797.
15. Koenig, Laura L.; Lucero, Jorge C.; Perlman, Elizabeth (2008). "Speech production variability in fricatives of children
and adults: Results of functional data analysis" (http://scitation.aip.org/content/asa/journal/jasa/124/5/10.1121/1.2981639).
The Journal of the Acoustical Society of America. 124 (5): 3158-3170. ISSN 0001-4966. PMC 2677351. PMID 19045800.
doi:10.1121/1.2981639.
16. Sakoe, Hiroaki; Chiba, Seibi (1978). "Dynamic programming algorithm optimization for spoken word recognition". IEEE
Transactions on Acoustics, Speech and Signal Processing. 26 (1): 43-49. doi:10.1109/tassp.1978.1163055.
Further reading
Vintsyuk, T.K. (1968). "Speech discrimination by dynamic programming". Kibernetika. 4: 81-88.
Sakoe, H.; Chiba, S. (1978). "Dynamic programming algorithm optimization for spoken word recognition".
IEEE Transactions on Acoustics, Speech and Signal Processing. 26 (1): 43-49.
doi:10.1109/tassp.1978.1163055.
Myers, C. S.; Rabiner, L. R. (1981). "A Comparative Study of Several Dynamic Time-Warping
Algorithms for Connected-Word Recognition". Bell System Technical Journal. 60 (7): 1389-1409.
ISSN 0005-8580. doi:10.1002/j.1538-7305.1981.tb00272.x.
Rabiner, Lawrence; Juang, Biing-Hwang (1993). "Chapter 4: Pattern-Comparison Techniques".
Fundamentals of speech recognition. Englewood Cliffs, N.J: PTR Prentice Hall. ISBN 978-0-13-015157-5.
Müller, Meinard (2007). Dynamic Time Warping. In Information Retrieval for Music and Motion, chapter
4, pages 69-84. Springer. ISBN 978-3-540-74047-6. doi:10.1007/978-3-540-74048-3.
Rakthanmanon, Thanawin (September 2013). "Addressing Big Data Time Series: Mining Trillions of
Time Series Subsequences Under Dynamic Time Warping". ACM Transactions on Knowledge Discovery
from Data. 7 (3): 10:1-10:31. doi:10.1145/2510000/2500489.