Matrix Profile Tutorial Part1

Abdullah Mueen Eamonn Keogh

Time Series Data Mining Using the Matrix Profile:
A Unifying View of Motif Discovery, Anomaly Detection, Segmentation, Classification, Clustering and Similarity Joins

We will start at 8:25am to allow stragglers to find the room


To get these slides in PPT or PDF, go to www.cs.ucr.edu/~eamonn/MatrixProfile.html
This tutorial is based on work by:

Chin-Chia Michael Yeh, Yan Zhu, Abdullah Mueen

Nurjahan Begum, Yifei Ding, Hoang Anh Dau

Diego Furtado Silva, Liudmila Ulanova, Eamonn Keogh

Zachary Zimmerman, Nader S. Senobari, Philip Brisk

Shaghayegh Gharghabi, Kaveh Kamgar, [Your face here]

..and others that have inspired us, forgive any omissions.
This tutorial is funded by:

• NSF IIS-1161997 II

• NSF IIS 1510741

• NSF 544969

• CNS 1544969

• SHF-1527127

• AFRL FA9453-17-C-0024

Any errors or controversial statements are due solely to Mueen and Keogh
Disclaimer:

Time series is an inherently visual domain, and we exploit that fact in this tutorial.
We therefore keep formal notations and proofs to an absolute minimum.
If you want them, you can read the relevant papers [a]

---

All the datasets used in this tutorial are freely available, and all experiments are reproducible.
[a] www.cs.ucr.edu/~eamonn/MatrixProfile.html
If you enjoy this tutorial,
please check out our
other tutorials..

www.cs.ucr.edu/~eamonn/public/SDM_How_to_do_Research_Keogh.pdf www.cs.unm.edu/~mueen/DTW.pdf
Outline

Act 1
• Our Fundamental Assumption
• What is the (MP) Matrix Profile?
• Properties of the MP
• Developing a Visual Intuition for MP
• Basic Algorithms
    • MP Motif Discovery
    • MP Time Series Chains
    • MP Anomaly Discovery
    • MP Joins (self and AB)
    • MP Semantic Segmentation
• From Domain Agnostic to Domain Aware: The Annotation Vector (A simple way to use domain knowledge to adjust your results)
• The “Matrix Profile and ten lines of code is all you need” philosophy.
• Break

Act 2
• Background on time series mining
    • Similarity Measures
    • Normalization
• Distance Profile
    • Definition and Trivial Approach
    • Just-in-time Normalization
    • The MASS Algorithm
    • Weighted Distance Profile
    • Distance Profile with Gaps
• Matrix Profile
    • STAMP
    • STOMP
    • GPU-STOMP
• Open problems to solve
Fundamental Assumption: Conservation is Key

If a pattern is conserved, there must be some mechanism that conserves it. This is true in linguistics, music, genetics, literature, religions….

Much of our work asks what is conserved in time series, when is it conserved, and why was an expected conservation not observed…

For discrete strings, conserved is easy to define, for example papa = *a*a. For time series it requires a distance function, here we will use Euclidean Distance.

[Figure: a time series (x-axis from 2000 to 8000) with three conserved patterns, motif 1, motif 2 and motif 3, each shown on an axis from 0 to 200 (2 seconds)]

Words for “father” (en.wikipedia.org/wiki/Mama_and_papa):
* Bengali: Bābā
* Mandarin: baba
* Norwegian: papa
* Spanish: papá
* Polish: tata
* Swahili: baba
* Turkish: baba
* English: papa
* Hindi: papa
* Xhosa: -tata
* Indonesian: bapa
What is the Matrix Profile?
• The Matrix Profile (MP) is a data structure that annotates a time series.
• Key Claim: Given the MP, most time series data mining problems are trivial or easy!
• We will show about ten problems that are trivial given the MP, including motif
discovery, density estimation, anomaly detection, rule discovery, joins, segmentation,
clustering etc. However, you can use the MP to solve your problems, or to solve a
problem listed above, but in a different way, tailored to your interests/domain.
• Key Insight: The MP has many highly desirable properties, and any algorithm you build on top of it will inherit those properties.
• Say you use the MP to create: An Algorithm to Segment Sleep States
• Then, for free, you have also created:
    • An Anytime Algorithm to Segment Sleep States
    • An Online Algorithm to Segment Sleep States
    • A Parallelizable Algorithm to Segment Sleep States
    • A GPU Accelerated Algorithm to Segment Sleep States
    • An Algorithm to Segment Sleep States with Missing Data
    • etc.
The Highly Desirable Properties of the Matrix Profile I
• It is exact: For motif discovery, discord discovery, time series joins etc., the Matrix Profile
based methods provide no false positives or false dismissals.
• It is simple and parameter-free: In contrast, the more general algorithms in this space
typically require building and tuning spatial access methods and/or hash functions.
• It is space efficient: Matrix Profile construction algorithms require an inconsequential
space overhead, just linear in the time series length with a small constant factor, allowing
massive datasets to be processed in main memory (for most data mining, disk is death).
• It allows anytime algorithms: While exact MP algorithms are extremely scalable, for
extremely large datasets we can compute the Matrix Profile in an anytime fashion, allowing
ultra-fast approximate solutions and real-time data interaction.
• It is incrementally maintainable: Having computed the Matrix Profile for a dataset,
we can incrementally update it very efficiently. In many domains this means we can effectively
maintain exact joins/motifs/discords on streaming data forever.
The Highly Desirable Properties of the Matrix Profile II
• It can leverage hardware: Matrix Profile construction is embarrassingly parallelizable,
on multicore processors, GPUs, distributed systems etc.
• It is free of the curse of dimensionality: That is to say, it has time complexity that is
constant in subsequence length. This is a very unusual and desirable property; virtually all
existing time series algorithms scale poorly as the subsequence length grows.
• It can be constructed in deterministic time: Almost all algorithms for time series
data mining can take radically different times to finish on two (even slightly) different datasets.
In contrast, given only the length of the time series, we can precisely predict in advance how
long it will take to compute the Matrix Profile. (this allows resource planning)
• It can handle missing data: Even in the presence of missing data, we can provide
answers which are guaranteed to have no false negatives.
• Finally, and subjectively: Simplicity and Intuitiveness: Seeing the world through
the MP lens often invites/suggests simple and elegant solutions.
Developing a Visual Intuition for the Matrix Profile
In the following slides we are
going to develop a visual intuition
for the matrix profile, without
regard to how we obtain it.
We are ignoring the elephant in
the room; the MP seems to be
much too slow to compute to be
practical. We will address this in
Part II of the tutorial.
Note that algorithms that use the
MP do not require us to visualize
the MP, but it happens that just
visualizing the MP can be 99% of
solving a problem.
Intuition behind the Matrix Profile: Assume we have a time series T, let's start with a synthetic one...

[Figure: synthetic time series T, x-axis from 0 to 3000]

|T| = n = 3,000
Note that for most time series data mining tasks, we are not interested in any global properties of the time
series; we are only interested in small local subsequences, of this length, m.

These subsequences might be about the length of individual heartbeats (for ECGs), individual days (for
social media behavior), individual words (for speech analysis) etc.

[Figure: the time series T (x-axis from 0 to 3000) with a subsequence of length m = 100 highlighted]


We can create a companion “time series”, called a Matrix Profile or MP.

The matrix profile at the ith location records the distance of the subsequence in T, at the ith location, to its nearest
neighbor under z-normalized Euclidean Distance.

For example, in the below, the subsequence starting at 921 happens to have a distance of 177.0 to its nearest
neighbor (wherever it is).

[Figure: Matrix Profile, x-axis from 0 to 3000, with the value 177 marked at location 921]
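A minimal sketch of the distance named above, z-normalized Euclidean Distance between two subsequences of equal length. The function names `znorm` and `znorm_euclidean` are illustrative, and mapping a constant subsequence to all zeros is a convention assumed here, not something the slides specify.

```python
import numpy as np

def znorm(x):
    """Rescale a subsequence to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    # Assumed convention: a constant subsequence has no shape, so map it to zeros.
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def znorm_euclidean(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    return float(np.linalg.norm(znorm(a) - znorm(b)))

# The same shape at a different offset and amplitude is (almost) at distance zero.
base = np.sin(np.linspace(0, 2 * np.pi, 100))
shifted_scaled = 3.0 * base + 5.0
noise = np.random.default_rng(0).standard_normal(100)

d_same = znorm_euclidean(base, shifted_scaled)
d_diff = znorm_euclidean(base, noise)
```

Because each subsequence is normalized before comparison, the distance reflects shape only, which is why a shifted and rescaled copy of a pattern still counts as a near-perfect match.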
Another example. In the below, the subsequence starting at 378 happens to have a distance of
34.1 to its nearest neighbor (wherever it is).

[Figure: Matrix Profile, x-axis from 0 to 3000, with the value 34.1 marked at location 378]
For the rest of this tutorial….
The Matrix Profile is always shown in blue.

The real time series data is generally shown in red.


We can create another companion sequence, called a matrix profile index (MPI).

The MPI contains integers that are used as pointers. As a practical matter, even a signed 32-bit integer will let us have a MP of
length 2,147,483,647, over one year of data at 60Hz (an unsigned 32-bit integer gives over two years). An unsigned 64-bit integer gives us about ten billion years at 60Hz.

In the following slides we won’t bother to show the matrix profile index, but be aware it exists,
and it allows us to find the nearest neighbor to any subsequence in constant time.
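The index-width arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch; `years_indexable` is an illustrative name, and a year is taken as 365.25 days.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_indexable(max_index, sample_rate_hz=60):
    """Years of data addressable when the largest storable pointer is max_index."""
    return max_index / sample_rate_hz / SECONDS_PER_YEAR

signed_32 = years_indexable(2**31 - 1)    # pointer limit 2,147,483,647
unsigned_32 = years_indexable(2**32 - 1)  # pointer limit 4,294,967,295
unsigned_64 = years_indexable(2**64 - 1)

print(f"signed 32-bit:   {signed_32:.2f} years at 60Hz")
print(f"unsigned 32-bit: {unsigned_32:.2f} years at 60Hz")
print(f"unsigned 64-bit: {unsigned_64:.2e} years at 60Hz")
```

Under these assumptions a signed 32-bit pointer covers a little over one year at 60Hz, an unsigned one a little over two years, and an unsigned 64-bit pointer on the order of ten billion years.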

[Figure: Matrix Profile (x-axis from 0 to 3000, with the values 200 and 34.1 marked) and a zoom-in of the matrix profile index: 1373 1375 1389 … .. 368 378 378 234 …]
Note that the pointers in the matrix profile index are not necessarily symmetric.
If A points to B, then B may or may not point to A

An interesting exception: the two smallest values in the MP must have the same value, and
their pointers must be mutual. This is the classic time series motif.

[Figure: Matrix Profile (x-axis from 0 to 3000) with the matrix profile index: 1373 1375 1389 … .. 368 378 378 234 … 2000 2001 2002 2003 2003]
Why is it called the Matrix Profile?

One naïve way to compute it would be to construct a distance matrix of all pairs of
subsequences of length m.

For each column, we could then “project” down the smallest (non diagonal) value to a vector, and
that vector would be the Matrix Profile.

While in general we could never afford the memory to do this (4TB for just |T| = one million),
for most applications the Matrix Profile is the only thing we need from the full matrix, and we can
compute and store it very efficiently (as we will see later).

[Figure: distance matrix of all pairs of subsequences of length m, with the Matrix Profile projected below it]

Key:
Small distances are blue
Large distances are red
Dark stripe is excluded
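The naïve construction described above can be sketched in a few lines, for small inputs only, since it really does build the full distance matrix. The function name `naive_matrix_profile` and the half-width of the excluded stripe (`m // 2`) are assumptions made for this illustration. This is not one of the efficient algorithms covered in Part II.

```python
import numpy as np

def znorm(x):
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def naive_matrix_profile(t, m):
    """Matrix Profile and index via the full distance matrix (O(n^2) memory)."""
    t = np.asarray(t, dtype=float)
    n_sub = len(t) - m + 1
    subs = np.array([znorm(t[i:i + m]) for i in range(n_sub)])
    # Distance between every pair of z-normalized subsequences.
    dist = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=-1)
    # Exclude the stripe around the diagonal (trivial matches).
    # Assumption: half-width m // 2.
    idx = np.arange(n_sub)
    dist[np.abs(idx[:, None] - idx[None, :]) <= m // 2] = np.inf
    # "Project" the smallest value of each row (the matrix is symmetric).
    mpi = dist.argmin(axis=1)
    mp = dist[idx, mpi]
    return mp, mpi

# Plant the same pattern twice in noise; the MP should be lowest there.
rng = np.random.default_rng(0)
t = rng.standard_normal(300)
pattern = rng.standard_normal(20)
t[50:70] = pattern
t[200:220] = pattern
mp, mpi = naive_matrix_profile(t, 20)
```

On this toy series the two planted copies point at each other through the index, and their MP values are the smallest in the profile.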
How to “read” a Matrix Profile
Where you see relatively low values, you know that the subsequence in the original time
series must have (at least one) relatively similar subsequence elsewhere in the data (such
regions are “motifs” or reoccurring patterns)
Where you see relatively high values, you know that the subsequence in the original time
series must be unique in its shape (such areas are “discords” or anomalies).

[Figure: Matrix Profile, x-axis from 0 to 3000. The highest region is labeled “Must be an anomaly in the original data, in this region. We call these Time Series Discords”. The three lowest regions are labeled “Must be conserved shapes (motifs) in the original data, in these three regions”.]
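Reading a Matrix Profile programmatically follows the same rules. The sketch below assumes the MP and its index are already available as arrays; `motif_and_discord` is an illustrative name and the toy values are made up.

```python
import numpy as np

def motif_and_discord(mp, mpi):
    """Return the motif pair, whether its pointers are mutual, and the discord."""
    mp = np.asarray(mp, dtype=float)
    mpi = np.asarray(mpi)
    a = int(np.argmin(mp))        # lowest MP value: one half of the motif pair
    b = int(mpi[a])               # its nearest neighbor: the other half
    mutual = int(mpi[b]) == a     # the motif pair should point at each other
    discord = int(np.argmax(mp))  # highest MP value: the top discord
    return (a, b), mutual, discord

# Toy profile: locations 1 and 4 tie for the lowest value, location 3 is highest.
toy_mp = [3.0, 0.5, 2.0, 9.0, 0.5, 2.5]
toy_mpi = [2, 4, 0, 5, 1, 3]
pair, mutual, discord = motif_and_discord(toy_mp, toy_mpi)
```

Here the motif pair is (1, 4) with mutual pointers, and the discord is at location 3.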
How to “read” a Matrix Profile: Synthetic Anomaly Example

Where you see relatively high values, you know that the subsequence in the original time
series must be unique in its shape. In fact, the highest point is exactly the definition of Time
Series Discord, perhaps the best anomaly detector for time series*

[Figure: Matrix Profile, x-axis from 0 to 3000; the peak is labeled “Must be an anomaly in the original data, in this region”]

* Vipin Kumar performed an extensive empirical evaluation and noted that “..on 19 different publicly available data sets, comparing 9 different techniques (time series discords) is the best overall technique.” V. Chandola, D. Cheboli, V. Kumar. Detecting Anomalies in a Time Series Database. UMN TR09-004
How to “read” a Matrix Profile: Synthetic Motif Example

Where you see relatively low values, you know that the subsequence in the original time
series must have (at least one) relatively similar subsequence elsewhere in the data.
In fact, the lowest points must be a tying pair, and correspond exactly to the classic definition
of time series motifs.

[Figure: Matrix Profile, x-axis from 0 to 3000; a low region is labeled “The corresponding subsequence in the raw data at this location must have at least one similar subsequence somewhere”]
How to “read” a Matrix Profile:

Now that we understand what a Matrix Profile is, and we have some practice
interpreting them on synthetic data, let us spend the next five minutes to see
some examples on real data.

Note that we will typically create algorithms that use the Matrix Profile,
without actually having humans look at it.

Nevertheless, in many exploratory time series data mining tasks, just looking at
the Matrix Profile can give us unexpected and actionable insights.

Ready to begin?
Taxi Example: Part I
Given a long time series, where should you examine carefully?
The problem is called “Attention Prioritization”; a group at Stanford is working on this [a].
However, we think that the Matrix Profile can be used for this, “for free”.

Below is the data, the hourly average of the number of NYC taxi passengers over 75 days
in Fall of 2014.

Let's compute the Matrix Profile for it; we choose a subsequence length corresponding to
two days…. (next slide)

[Figure: NYC taxi passenger time series, x-axis from 500 to 3500]

[a] http://futuredata.stanford.edu/ASAP/extended.pdf
Taxi Example: Part II

• The highest value corresponds to Thanksgiving (the uniqueness of Thanksgiving was the only thing
the Stanford Team noted)
• We find a secondary peak around Nov 6th, what could it be? Daylight Saving Time! The clock going
backwards one hour gives an apparent doubling of taxi load.
• We find a tertiary peak around Oct 13th, what could it be? Columbus Day! Columbus Day is largely
ignored in much of America, but still a big deal in NY, with its large Italian American community.

[Figure: taxi time series above its Matrix Profile, x-axis from 500 to 3500]
Taxi Example: Part III

• What about the lowest values? (the best motifs)


• They are exactly seven days apart, suggesting that in this dataset, there might be a periodicity of
seven days, in addition to the more obvious periodicity of one day.

[Figure: taxi time series above its Matrix Profile, x-axis from 500 to 3500]

The top motif is a typical work week, starting from Tuesday
Italy Power Demand
(1995 to 1998)

[Figure inset: a short snippet of the data (x-axis from 0 to 160), with the weekend labeled]

The Taxi example was easy to solve by manual inspection of the raw data, but with just an order of magnitude more data,
the problem becomes much harder. Let's try a similar, but larger example, Italian Power Demand 1995 to 1998.
Note that the matrix profile is very low on average; most weeks are similar to the previous week (persistence) or the same
week in a different year (history).
All the high values can be explained by Italian holidays, most of which fall on different days in consecutive years.

[Figure: Matrix Profile of the Italian power demand data; the peaks are annotated with holidays such as New Year, Good Friday, Easter, Ferragosto and Xmas]
Electrocardiogram
(MIT-BIH Long-Term ECG Database)

In this case there are two anomalies annotated by MIT cardiologists.


The Matrix Profile clearly indicates them.

Here the subsequence length was set to 150, but we still find these
anomalies if we halve or triple that length.

[Figure: ECG time series (x-axis from 1000 to 7000) above its Matrix Profile (y-axis from 0 to 20). The first discord is a premature ventricular contraction; the second discord is an ectopic beat.]
Zebra Finch
(Zebra Finch Vocalizations in MFCC, 100 day old male)

Motif discovery can often surprise you.

While it is clear that this time series is not random, we did
not expect the motifs to be so well conserved or repeated
so many times. There is evidence of a vocabulary, and
maybe even a grammar…

[Figure: Zebra Finch time series (x-axis from 1000 to 8000) with three motifs (motif 1, motif 2, motif 3), each shown on an axis from 0 to 200 (2 seconds)]
Seismology

If we see low values in the MP of a seismograph, it means there must have been a repeated earthquake.
Repeated earthquakes can happen decades apart.
Many fundamental problems in seismology, including the discovery of foreshocks, aftershocks, triggered
earthquakes, swarms, volcanic activity and induced seismicity, can be reduced to the discovery of these
repeated patterns.

[Figure: Seismic Time Series above its Matrix Profile (x-axis from 0 to 9,000). A low value in the Matrix Profile is labeled “The corresponding subsequence in the raw data at this location must have at least one similar earthquake somewhere”.]

Zoom-In (0 to 10 seconds) of the two matching earthquakes:
Time: 19:23:48.44 Latitude: 37.57 Longitude: -118.86 Depth: 5.60 Magnitude: 1.29
Time: 20:08:01.13 Latitude: 37.58 Longitude: -118.86 Depth: 4.93 Magnitude: 1.09

Thanks to C. Yoon, O. O’Reilly, K. Bergen and G. Beroza of Stanford for this data
Chimp DNA
Y-chromosome converted to time series

It is possible to convert DNA strings to real-valued time series, in a lossless fashion:

T_1 = 0;
for i = 1 to length(chromosome)
    if chromosome_i = A, then T_{i+1} = T_i + 2
    if chromosome_i = G, then T_{i+1} = T_i + 1
    if chromosome_i = C, then T_{i+1} = T_i - 1
    if chromosome_i = T, then T_{i+1} = T_i - 2
end

Let us search the Chimp’s DNA for repeated structure, of length 60,000…
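The conversion above translates almost directly into Python. This is a sketch: the names `dna_to_series` and `series_to_dna` are illustrative, and it assumes the input contains only the four bases A, G, C and T (any other symbol raises a `KeyError`).

```python
import numpy as np

# Step sizes taken from the pseudocode: A = +2, G = +1, C = -1, T = -2.
STEP = {"A": 2, "G": 1, "C": -1, "T": -2}

def dna_to_series(dna):
    """Convert a DNA string into a time series via a cumulative walk from 0."""
    steps = np.array([STEP[base] for base in dna.upper()], dtype=np.int64)
    return np.concatenate(([0], np.cumsum(steps)))

def series_to_dna(series):
    """Invert the conversion: consecutive differences identify each base."""
    inverse = {step: base for base, step in STEP.items()}
    return "".join(inverse[int(d)] for d in np.diff(series))

series = dna_to_series("AGCTTA")
recovered = series_to_dna(series)
```

The round trip through `series_to_dna` is what “lossless” means here: each base maps to a distinct step size, so the differences of the series recover the original string.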

[Figure: Pan troglodytes Y-chromosome as a time series (x-axis from 0 to 1,000,000). Zoom-In (x-axis from 0 to 60,000) of two matching regions: 12,749,475 to 14,249,474 bp and 622,725 to 2,122,724 bp]

“much of the Y (Chimp chromosome) consists of lengthy, highly similar repeat units, or ‘amplicons’”*

*J. Hughes et al., “Chimpanzee and human Y chromosomes are remarkably divergent in structure” Nature 463, (2010).
Music

While motifs usually point to chorus, discords point to bridges or solos

[Figure: x-axis Time (s), ticks at 0, 60, 120 and 180; y-axis ticks from 15 to 60. Discord at 1m54: {instrumental bridge}. Motifs at 3m9s and 3m23s: “let it be, let it be, yeah let it be And there will be an answer, let it …”]
Summary

We could do this all day! We have applied the MP to hundreds of datasets.


It is worth reminding ourselves of the following:
• The MP can find structure in taxi demand, seismology, DNA, power demand, heartbeats,
music and bird vocalization data. However, the MP does not “know” anything about these
domains; it was not tweaked, tuned or adjusted in any way.
• This domain-agnostic nature of the MP is a great strength; you typically get very good results
“out-of-the-box” no matter what your domain.
The following is also worth stating again:
• We spent time looking at the MP to gain confidence that it is doing something useful.
However, most of the time, only a higher-level algorithm will look at the MP, with no
human intervention or inspection (we will make that clearer in the following slides).
A Minor Visual Mapping Trick

It is sometimes useful to think of time series subsequences as points in m-dimensional space.

In this view, dense regions in the m-dimensional space correspond to regions of the
time series that have a low corresponding MP.

[Figure: a time series (x-axis from 0 to 1500) and its subsequences drawn as points]


The Top-K Motifs I

Here we show a sensible way to extract the top-K motifs. However,
there is nothing stopping you from inventing a different way. If you do,
the MP will let you compute it in milliseconds.

We need a parameter R.
1 < R < (small number, say 3)
Let's make R = 2 for now.

We begin by finding the nearest pair of points, the motif pair….

(This is the pair of subsequences corresponding to the lowest pair of
values in the MP)

Next slide…


The Top-K Motifs II

We find the nearest pair of points are D1 apart.

Let's draw a circle, D1 times R, around both points.

Any points that are within either of these circles are added to this motif;
in this case there is just one… See next slide…
The Top-K Motifs III

The Top-1 motif has three members, it is done.

Now let's find the Top-2 motif. We begin by finding the nearest pair of
points, excluding anything from the top motif.

The nearest pair of points are D2 apart.

Let's draw a circle, D2 times R, around both points.

Any points that are within either of these circles are added to this motif; in
this case there are two… See next slide…
The Top-K Motifs IV

Any points that are within either of the two circles (radius D2 times R) are added to this motif;
in this case there are two, for a total
of four items in the Top-2 Motif
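The circle-drawing procedure can be sketched directly in the “points in space” view used on these slides. The code below works on generic points rather than on subsequences; for a real time series one would additionally have to exclude trivial matches (overlapping subsequences), which this sketch does not do. The function name `top_k_motifs` and the toy coordinates are made up for illustration.

```python
import numpy as np

def top_k_motifs(points, k, r=2.0):
    """Greedy top-K motif extraction on points; returns (pair distance, member indices)."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    available = np.ones(len(pts), dtype=bool)
    motifs = []
    for _ in range(k):
        if available.sum() < 2:
            break
        # Nearest pair among points not yet claimed by an earlier motif.
        masked = np.where(available[:, None] & available[None, :], d, np.inf)
        i, j = np.unravel_index(np.argmin(masked), masked.shape)
        pair_dist = float(masked[i, j])
        # Circle of radius (pair distance * R) around both points of the pair.
        near = (d[i] <= r * pair_dist) | (d[j] <= r * pair_dist)
        near[[i, j]] = True
        members = np.flatnonzero(near & available)
        motifs.append((pair_dist, members.tolist()))
        available[members] = False
    return motifs

toy_points = [(0, 0), (1, 0), (2.5, 0),                # first group
              (10, 10), (10, 12), (13, 10), (10, 15.5),  # second group
              (30, 30)]                                  # isolated point
motifs = top_k_motifs(toy_points, k=2, r=2.0)
```

With R = 2, the first motif has pair distance D1 = 1 and three members, and the second has pair distance D2 = 2 and four members, mirroring the slide example; the isolated point joins neither.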
The Top-K Motifs V

We are done with the Top-2 Motif

Note that we will always have:
D1 < D2 < D3 …

When to stop? (what is K?)
We could use MDL etc.
As a practical matter, we can pull out all K, and use
eyeballing to judge the quality of motifs.
For example, in the below, Motif 1 is stunningly well
conserved, Motif 2 is somewhat conserved, Motif 3
may be getting close to random…. So here we would
say we have a strong Top-2 set of motifs.
From Motifs to Time Series Chains

Take a look at the blue “subsequences”.

They would not form a single motif (but
perhaps they could form a set of motifs).

From Motifs to Time Series Chains

However, if we label them by arrival time (1 to 10),
you can see that they are drifting, or
evolving in time.

This is actionable, for example, where will
the 11th item land? Surely just Northeast of
the 10th item.

We call such patterns chains, with the first
item as the anchor.

Do such patterns exist in the real world?

Can we find them?
Arterial Blood Pressure

[Figure: arterial blood pressure time series (x-axis from 0 to 3), with a region marked “We will zoom-in to here in the next slide”]

We ran time series chain discovery on the dataset. The only thing we tell it is the
length of the subsequence to use (about one heartbeat long).

Zoom In
[Figure: zoom-in, y-axis mmHg (20 to 60), x-axis from 0 to 5000, with “tilt begins” marked]

As the chain progresses, the depth of the dicrotic notch decreases….

[Figure: chain members starting at 2040, 2220, 2440, 2620, 3040 and 3220. An annotated heartbeat labels the systolic uptake, peak systolic pressure, systolic decline, dicrotic notch and dicrotic runoff.]

More Time Series Chains
We looked at the Google query volume for Kohl’s, an American retail chain.
The discovered chain shows that over the decade, the bump transitions from a smooth bump covering most of the period between
Thanksgiving and Xmas, to a more sharply focused bump centered on Thanksgiving. This seems to reflect the growing importance of Cyber
Monday, a marketing term for the Monday after Thanksgiving. The phrase was created by marketing companies to persuade people to shop
online. The term made its debut on November 28th, 2005 in a press release entitled “Cyber Monday Quickly Becoming One of the Biggest
Online Shopping Days of the Year”. Note that this date coincides with the first glimpse of the sharpening peak in our chain.

[Figure: query volume from 2004 to 2014 (x-axis from 0 to 500 weeks). The chain members span weeks 45 to 55, 95 to 105, 150 to 165, 305 to 315, 410 to 420 and 460 to 475, with Thanksgiving and Xmas marked.]

One Last Time Series Chain
Magellanic penguins regularly dive to depths of up to 50m to hunt prey.
Penguins have typical body densities for a bird, but just before diving they
take a very deep breath that makes them exceptionally buoyant. This
positive buoyancy is difficult to overcome near the surface, but at depth,
the compression of water pressure cancels it. In order to get down to
their hunting ground below sea level it is clear that “locomotory muscle
workload, varies significantly at the beginning of dives”*.
The snippet of time series shown does not
suggest much of a change in stroke-rate;
however, penguins are able to vary the thrust of
their flapping by twisting their wings. The
chain we discovered shows this dramatic and
evolving sprint downwards leveling off to a
comfortable cruise.

[Figure: 3-minute snippet of X-Axis Acceleration, with pressure, and a Zoom-In (x-axis from 0 to 18 seconds)]

*Williams, C.L. et al. Muscle energy stores and stroke rates of emperor penguins: implications for muscle metabolism and dive
performance. Physiological and Biochemical Zoology.85.2(2011):120-133 Photo by Paul J. Ponganis
• There are literally 100’s of time series anomaly detectors.
• However, many claim that Time Series Discords is among the best.
  “..on 19 different publicly available data sets, comparing 9 different
  techniques (time series discords) is the best overall technique among all
  techniques.” Vipin Kumar* (ACM SIGKDD 2012 Innovation Award Winner)

• This is good news for us, because if you compute the matrix profile,
you have the discords “for free”. In fact, you have all the top K-
discords, for any K.
• Why are discords so effective? (our subjective opinion)
    • They make no assumptions about the data (so no wrong assumptions).
    • They don’t need to learn a bunch of parameters; with no parameters to fit, it
    is hard to overfit.
• There is one pathological (but fixable) case where they don’t work
(next slide)

* https://www.cs.umn.edu/research/technical_reports/view/09-004
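One plausible way to read the top-K discords off an existing Matrix Profile is to take the highest value repeatedly while masking the neighborhood of each pick, so that the same event is not reported K times. This is a sketch; the function name `top_k_discords` and the `exclusion` half-width are assumptions, not part of the slides.

```python
import numpy as np

def top_k_discords(mp, k, exclusion):
    """Locations of the K highest MP values, at least `exclusion` + 1 apart."""
    work = np.asarray(mp, dtype=float).copy()
    found = []
    for _ in range(k):
        if np.all(np.isneginf(work)):
            break  # everything is masked, no more discords to report
        i = int(np.argmax(work))
        found.append(i)
        # Mask the pick and its neighbors (trivial matches of the same event).
        work[max(0, i - exclusion): i + exclusion + 1] = -np.inf
    return found

toy_mp = [1, 2, 9, 8, 1, 1, 7, 1, 1, 3]
discords = top_k_discords(toy_mp, k=3, exclusion=1)
```

In the toy profile, location 3 (value 8) is skipped because it sits next to the first pick at location 2, so the three reported discords are at locations 2, 6 and 9.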
The twin freak problem (see next slide)

The definition of a discord is:
The subsequence D that has the maximum distance from its (non-trivial match) nearest neighbor.

[Figure label: This is the discord.]
It is far from its nearest neighbor.
Let us say it was caused by a valve being stuck one day..

The twin freak problem

The definition of a discord is:
The subsequence D that has the maximum distance from its (non-trivial match) nearest neighbor.

..but suppose that the anomaly happened twice?
Once on Monday, once on Friday…

The problem is that it is no longer the discord, under our classic definition ;-(

[Figure label: This is now the discord]

There is a simple fix, a minor change to the definition..
The twin freak problem

The new definition of a discord is:
The subsequence D that has the maximum distance from its (non-trivial match) second nearest neighbor.

The new definition solves the problem.

However, what about the triple freak, or quadruple freak problem etc.…

If an “anomaly” happens many times, it is probably not an anomaly, and we probably
know about it anyway.

Nevertheless, it can be useful to generalize to the Kth nearest neighbor, for a small K,
say 3:

The subsequence D that has the maximum distance from its (non-trivial match) Kth
nearest neighbor.

This is a trivial change/addition to the MP
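The effect of moving from the nearest to the Kth nearest neighbor can be seen on a toy distance matrix. The sketch below uses one-dimensional points as stand-ins for subsequences; `kth_nn_discord` is an illustrative name, and a real implementation would also need to make sure the K neighbors are not trivial matches of each other.

```python
import numpy as np

def kth_nn_discord(dist, k=1):
    """Index with the largest distance to its kth nearest neighbor, plus all kth distances."""
    dist = np.asarray(dist, dtype=float)
    kth = np.sort(dist, axis=1)[:, k - 1]  # kth smallest distance in each row
    return int(np.argmax(kth)), kth

# Five "normal" points and a pair of twin freaks at 100 and 100.5.
points = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 100.0, 100.5])
dist = np.abs(points[:, None] - points[None, :])
np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

classic, _ = kth_nn_discord(dist, k=1)  # the twins hide each other
fixed, _ = kth_nn_discord(dist, k=2)    # second nearest neighbor exposes them
```

With k = 1 each twin has a neighbor only 0.5 away, so a normal point is reported; with k = 2 the second nearest neighbor of a twin is far away, and a twin becomes the discord.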


We have already seen examples of
time series discords (although we
did not explicitly call them that) so
we will not revisit this here.

Discords are simply high values in the Matrix Profile.

There are many other algorithms to find discords. But why bother
with them, when the Matrix
Profile gives them to you for free?
Generalizing to Joins
• We can think of the MP as a type of similarity self-join. For every subsequence in $T_A$, we
join it with its nearest (non-trivial) neighbor in $T_A$, or $J_{T_A T_A}$ or $T_A \bowtie_{1nn} T_A$

• This is also known as all-pairs-similarity-search (or similarity join).

• However, we can generalize to an AB-similarity join. For every subsequence in A, we join
it with its nearest neighbor in B, or $J_{T_A T_B}$ or $T_A \bowtie_{1nn} T_B$

• Note that in general: $J_{T_A T_B} \neq J_{T_B T_A}$

• Note that A and B can be radically different sizes

• We may be interested in:
    • What is conserved between two time series (the join motifs)
    • What is different between two time series (the join discords)
• The tricks for understanding and reading join-based MPs are the same as before, we will
see some examples to make that clear.
Generalizing to Joins

Two scenarios of interest: we do a $J_{T_A T_B}$ …

1) The Golden Batch: Here we have two time series
that we think should be about the same. But when
we join them, there is a join discord, a subsequence
that appears only in A, but not in B, but
why? (spoken word example below)

2) The Suspicious Similarity: Here we have two time
series that we have no reason to think should be
the same. But when we join them, there are join
motifs, some subsequences from A appear in B, but
why? (music example below. Another example
would result from “meter swapping”)

[Figure: two pairs of time series, one with a join discord marked and one with join motifs marked]
Now let us consider the join of two time series.

Assume we have two time series TA and TB ... Note that they can be of different lengths

[Figure: TA (x-axis from 0 to 1000) and TB (x-axis from 0 to 2000)]

|TA| = 1,000    |TB| = 2,000
As before, we are not interested in any global properties of the time series; we are only interested in small
local subsequences, of this length, m.

These subsequences might be about the length of individual heartbeats (for ECGs), individual days (for
social media behavior), individual words (for speech analysis) etc.

[Figure: TA and TB (x-axis from 0 to 2000), each with a subsequence of length m = 100 highlighted]

We can create a companion Matrix Profile of TA.

For every subsequence in TA, we look for its nearest neighbor in TB.

The Matrix Profile at the ith location records the distance of the subsequence in TA, at the ith location, to its
nearest neighbor in TB, under z-normalized Euclidean Distance.

The Matrix Profile is almost the same length as TA; it is shorter by just m - 1.

For example, in the below, the subsequence of length 100 starting at 362 happens to have a distance of 1.24 to
its nearest neighbor (wherever it is) in TB.

Informally: how far is each subsequence in TA from its nearest neighbor in TB?

[Figure: TA above its join Matrix Profile (x-axis from 0 to 1000), with the value 1.24 marked at location 362]
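A brute-force sketch of the AB-join described above, suitable only for small inputs. The names `znorm_subsequences` and `ab_join` are illustrative. Note that, unlike the self-join, no region needs to be excluded, because a subsequence of TA cannot trivially match itself in TB.

```python
import numpy as np

def znorm_subsequences(t, m):
    """All length-m subsequences of t, each z-normalized (one per row)."""
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(t, dtype=float), m)
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant windows: avoid division by zero
    return (windows - mu) / sd

def ab_join(ta, tb, m):
    """For each subsequence of ta: distance to, and location of, its nearest neighbor in tb."""
    a = znorm_subsequences(ta, m)
    b = znorm_subsequences(tb, m)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    mpi = dist.argmin(axis=1)
    mp = dist[np.arange(len(a)), mpi]
    return mp, mpi

# Two unrelated noise series that share one planted pattern.
rng = np.random.default_rng(1)
ta = rng.standard_normal(100)
tb = rng.standard_normal(200)
shared = rng.standard_normal(20)
ta[30:50] = shared
tb[120:140] = shared

mp_ab, mpi_ab = ab_join(ta, tb, 20)  # one entry per subsequence of ta
mp_ba, mpi_ba = ab_join(tb, ta, 20)  # one entry per subsequence of tb
```

The two directions produce profiles of different lengths (|TA| - m + 1 versus |TB| - m + 1), which is one concrete sense in which the join is not symmetric. The lowest value of `mp_ab` marks a join motif; `np.argmax(mp_ab)` would give the join discord.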
Recall that the Matrix Profile at the ith location records the distance of the subsequence in TA, at the ith location, to
its nearest neighbor in TB, under z-normalized Euclidean Distance.

However, it does not tell us the location of the nearest neighbor in TB. To store this information, we can
create another companion sequence, called a matrix profile index.

The green arrow points from the subsequence of TA starting at 362 to its nearest neighbor in TB. The nearest
neighbor is located at 359 in TB.

This is $J_{T_A T_B}$

Informally: For each subsequence in TA, point to its nearest neighbor in TB

[Figure: TA (x-axis from 0 to 1000, value 1.24 at location 362) and TB (x-axis from 0 to 2000, location 359), with a zoom-in of the matrix profile index: 357 359 1401 …]
We can reverse the direction of the join…
Here the matrix profile index tells us the location of the nearest neighbor of each subsequence in TB.

The green arrow points from the subsequence of TB starting at 1340 to its nearest neighbor in TA. The nearest neighbor
is located at 395 in TA.

This is $J_{T_B T_A}$

[Figure: TA (x-axis from 0 to 1000, location 395) and TB (x-axis from 0 to 2000, value 1.05 at location 1340), with a zoom-in of the matrix profile index: 394 395 396 …]
Music I (join case)
Can you see any common structure between the two time
series below?
Hint, it is probably about this length

[Figure: two time series, x-axis from 0 to 20,000]
Music II (join case)

The data is the 2nd MFCC of two songs, Under Pressure and Ice Ice Baby

[Figure: the two time series, Queen-Bowie and Vanilla Ice, x-axis up to 20,000]

A zoom-in of the best conserved region between the two time series (the similarity join)

[Figure: zoom-in of Queen-Bowie and Vanilla Ice, x-axis from 0 to 500]
I
In the previous example we asked you to find “common structure between the two time
series”. Now I am going to ask you the opposite question.
What is different between the two time series?

Hint, it is probably about this length

Hint, it can’t be the regions in the matching boxes, since they have matches…

[Figure: two time series, labeled UK and US, x-axis from 0 to 800]
II

Here the difference is due to a unique phrase that only appears in the USA version of the Harry Potter books.

Closest Match (ED = 2.8):
since his first year at Hogwarts and owned a Fire..
since his first year at Hogwarts and owned on..

Furthest Match (ED = 10.7):
…indor house Quidditch team ever since his first ye…
Harry had been on the Gryffindor House Quidditch te..

[Figure: the matched subsequences, x-axis from 0 to 100 (1.6 seconds)]

UK version : Harry was passionate about Quidditch. He had played as Seeker
on the Gryffindor house Quidditch team ever since his first year at Hogwarts
and owned a Firebolt, one of the best racing brooms in the world...

USA version : Harry had been on the Gryffindor House Quidditch team
ever since his first year at Hogwarts and owned one of the best racing
DNA (join case)
We consider two strains of Legionella bacteria, L. pneumophila Paris and L. pneumophila Lens, which consist of 3,503,504 and 3,345,567 bp respectively. We consider a subsequence length of 100,000.

However, we flipped one of the time series “backwards” before computing the join.

[Figure: L. pneumophila Paris and L. pneumophila Lens, axis 0 to 3,000,000]
[Figure: Zoom-In, axis 0 to 200,000. Lens: 1591412 to 1691411 bp; Paris: 1769196 to 1869195 bp (plotted in reverse)]

Laura Gomez-Valero et al. Comparative and functional genomics of Legionella identified eukaryotic like proteins as key players in host–
Time Series Semantic Segmentation
Sometimes the system we are monitoring changes regimes. Can we detect such changes?

PigInternalBleedingDatasetArtPressureFluidFilled: ..sedated pig, has bleeding induced…

PulsusParadoxusSP02: ..regular beat, then Pulsus Paradoxus..

SuddenCardiacDeath2: very irregular beat, then ventricular tachyarrhythmia

RoboticDogActivityX: ..walking, then playing…

TiltECG: ..lying horizontal, tilting begins …
FLOSS: Matrix Profile Segmentation
What do we want in a Semantic Segmentation Algorithm?

• Handle fast online, or huge batch data
• Domain agnosticism. It would be nice to have a single algorithm that works on all kinds of data
• Parameter-Lite. (Tuning parameters almost guarantees overfitting).
• High accuracy.
• Able to report degree of segmentation or confidence (A binary decision is too brittle in most cases).

• Claim: We can do all this by looking at the Matrix Profile Index…


[Figure: a time series, axis 0 to 5000, with arcs from the matrix profile index; labeled index values 1892, 1270, 4039, 4607]

Key Observation

Recall that the Matrix Profile Index has pointers (arrows, arcs) that point to the nearest neighbor of each subsequence.

If we have a change of behavior, say from walk to run, we should expect very few arrows to span that change of behavior.

• Most walk subsequences will point to another walk subsequence
• Most run subsequences will point to another run subsequence
• Rarely, if ever, will a run point to a walk.

So, if we slide across the Matrix Profile Index, and count how many arrows cross each particular point, we expect to find few that span the change of behavior.

Let's try this, next slide…

[Figure: arcs over the time series, axis 0 to 5000; labeled positions 1269, 1270, 1892, 3450, 4039, 4040; annotation: "No arcs seem to cross here"]
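The arc counting described above can be sketched as follows. This is a minimal Python illustration, assuming an arc "crosses" a position when the position lies strictly between the arc's two endpoints; the function name `arc_curve` and the toy index are made up for this example.

```python
def arc_curve(mp_index):
    """Count, for every position, how many nearest-neighbor arcs pass over it.

    mp_index[i] = j means there is an arc between positions i and j.
    An arc is counted at position p when min(i, j) < p < max(i, j).
    """
    n = len(mp_index)
    delta = [0] * (n + 1)   # difference array: +1 where an arc starts covering, -1 where it stops
    for i, j in enumerate(mp_index):
        lo, hi = min(i, j), max(i, j)
        if hi - lo > 1:
            delta[lo + 1] += 1
            delta[hi] -= 1
    counts, running = [], 0
    for p in range(n):
        running += delta[p]
        counts.append(running)
    return counts

# Toy index (made up): positions 0-4 ("walk") only point within 0-4,
# positions 5-9 ("run") only point within 5-9.
toy_index = [3, 4, 0, 0, 1, 8, 9, 5, 5, 6]
ac = arc_curve(toy_index)
# ac is [0, 3, 4, 2, 0, 0, 3, 4, 2, 0]: zero at the walk/run boundary (positions 4 and 5)
```

The count is also zero at the two ends of the toy series, for the trivial reason that no arc can pass over the first or last position.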
This works!
If we use the sliding arc count to produce an arc-curve, we find it is near zero at the point of system transition. This low value signals the location of the system change.

[Figure: the time series with its arcs, and the arc-curve (vertical 0 to 1500), axis 0 to 5000; annotations: "The arc count here is almost zero!" and "But the beginning (and end) are also near zero."]

There is one flaw. The arc-curve tends to be low near the beginning and end of the time series, just because there are fewer arcs that could cross at those locations.

What we can do is calculate what the arc-curve would look like if there was no system transition, and use that to correct the arc-curve.
If there was no system transition structure, the arc-curve would be an inverted parabola, with a height ½ the time series length.
Let's try this, next slide…

[Figure: Empirical and Theoretical curves (vertical 0 to 2500), axis 0 to 5000. The number of arcs that cross a given index, if the links are assigned randomly]
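A sketch of the correction in Python. If each of the n pointers is assigned uniformly at random, the expected number of arcs over position i works out to 2·i·(n−i)/n, an inverted parabola that peaks at n/2 in the middle, which matches the "height ½ the time series length" statement. Dividing the empirical curve by this and capping at 1 is how I recall the FLUSS paper doing it, so treat that step, the `edge` parameter, and all names here as assumptions.

```python
def idealized_arc_curve(n):
    """Expected arc count at each position if all n links were assigned at random."""
    return [2.0 * i * (n - i) / n for i in range(n)]

def corrected_arc_curve(arc_counts, edge=0):
    """Divide the empirical arc curve by the idealized one, capped at 1.

    The first and last `edge` positions are set to 1, since too few arcs
    can reach them for the ratio to be meaningful (an assumed convention).
    """
    n = len(arc_counts)
    iac = idealized_arc_curve(n)
    cac = []
    for i, (count, expected) in enumerate(zip(arc_counts, iac)):
        if i < edge or i >= n - edge or expected == 0:
            cac.append(1.0)
        else:
            cac.append(min(count / expected, 1.0))
    return cac

# Empirical arc counts (made up) for a series whose behavior changes between positions 4 and 5.
ac = [0, 3, 4, 2, 0, 0, 3, 4, 2, 0]
cac = corrected_arc_curve(ac, edge=2)
boundary = cac.index(min(cac))   # location of the lowest corrected value
```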
This works even better!
The corrected arc-curve is minimized in the right place, and does not have spurious minima at the beginning and end.

How robust is the corrected arc-curve? Let's add some distortions to the data, and see what it does.

Let's try this, next slide…

[Figure: the time series with its arcs; the arc-curve (vertical 0 to 1500); the corrected arc-curve (vertical 0 to 1); axis 0 to 5000; annotation: "The corrected arc-curve here is almost zero"]
FLOSS is very robust to the data’s properties
Consider the following distortions
• Downsampling from the original 250 Hz to 125 Hz (red).
• Reducing the bit depth from 64-bit to 8-bit (blue).
• Adding a linear trend of ten degrees (cyan).
• Adding 20dB of white noise (black).
• Smoothing, with MATLAB’s default settings (pink).
• Randomly deleting 3% of the data, and filling it back in with simple linear interpolation (green).

Most distortions make almost no difference. The only one that does move the CAC significantly was adding noise. But even then we still find the correct segmentation, and we added a lot of noise.

[Figure: the noisy data (annotation: "We added a lot of noise") and the CAC for each distortion (vertical 0 to 1), axis 0 to 5000]
FLOSS is very robust to its only parameter
The CAC has a single parameter, the subsequence length m.
But we can typically change it by an order of magnitude, and get very good results.

The CAC computed for:
(top) TiltABP with m = {100, 150, 200, 250, 300, 350, 400}
(bottom) DutchFactory for m = {25, 50, 200, 250}.
Even for this huge range of values for m, the output of FLOSS is essentially unchanged.

[Figure: CAC (vertical 0 to 1) for Tilt ABP, axis 0 to 40,000, and for Dutch Factory, axis 0 to 8,000]
Great Barbet (Psilopogon virens)

One individual Great Barbet sings…, ….another takes over…, …yet another takes over
[Figure: the data in MFCC space (vertical -5 to 10) above a second panel (vertical 0 to 0.5), axis 0 to 5000. File: GreatBarbet2_50_1900_3700.txt]
Asian citrus psyllid (Diaphorina citri)
This dataset was hand annotated by an entomologist. The insect changes its feeding behavior at about time 1,800.
[Figure: the data (vertical -1 to 1) above a second panel (vertical 0 to 1), axis 0 to 12000. File: InsectEPG2_50_1800.txt]
Pulsus Paradoxus is often visually apparent in the SP02 trace. Here we deliberately ignore this fact, and look only in the ECG trace, which is normally considered as not predictive of Pulsus Paradoxus.

Note that the clinician who annotated this data was in the room at the time and may have had access to information that is simply not available in this signal.

Pulsus paradoxus (PP), also paradoxic pulse or paradoxical pulse, is an abnormally large decrease in systolic blood pressure and pulse wave amplitude during inspiration.
See also https://www.youtube.com/watch?v=7AXIYQK5BBM

[Figure: the ECG trace (vertical -5 to 10) above a second panel (vertical 0 to 1), axis 0 to 18000. File: PulsusParadoxusECG2_30_10000.txt]
Summary for Time Series Segmentation
The Matrix Profile allows a simple algorithm, FLUSS/FLOSS, for time series segmentation.

Because it is built on the MP, it inherits all of the MP’s properties:

• It is incrementally computable, i.e. online (This variant is called FLOSS)
• It is fast (at least 40 times faster than real-time for typical accelerometer data)
• It is domain agnostic (But you can use the AV to add domain knowledge, see below)
• It is parameter-lite (only one parameter, and it is not sensitive to its value)

• It has been tested on the largest and most diverse collection of time series ever considered
for this problem, and in spite of (or perhaps, because of) its simplicity, it is state-of-the-art.
Better than rival methods, and better than humans (details offline).
From Domain Agnostic to Domain Aware*
• The great strength of the MP is that it is domain agnostic. A single black box algorithm works for taxi demand, seismology, DNA, power demand, heartbeats, music, bird vocalizations....
• However, in a handful of cases, there is a need to, or some utility in, incorporating some domain knowledge/constraints.
• There is a simple, generic and elegant way to do this, using the Annotation Vector (AV).
• In the following slides we will show you the annotation vector in the context of motif discovery, but you can use it with any MP algorithm.
• We will begin by showing you some examples of spurious motifs that can be discovered in particular domains, then we will show you how the AV mitigates them.
* Hoang Anh Dau and Eamonn Keogh. Matrix Profile V: A Generic Technique to Incorporate Domain Knowledge into Motif Discovery. KDD'17, Halifax, Canada.
Motivating Example 1: Stop-word Motif Bias

In some medical datasets, the true motifs may be “swamped” by more frequent, but biologically meaningless patterns. Much like the stop words “and” and “the” in text mining.

Here the approximate square wave is just a calibration signal, sent when the sensor has weak contact with the skin. It is a frequent, but spurious motif.

[Figure: ECG trace, axis 0 to 3000, with regions labeled "Calibration Signal" and "True ECG Signal"; insets: "Top-1 Motif (all data)" and "Top-1 Motif, if we ignore the first 1,000 data points"]
A snippet of ECG data from the LTAF-71 Database. The top motifs come from regions of the calibration signal because they are much more similar than the motifs discovered if we search only data that contains true ECGs.
Motivating Example 2: Simplicity Bias
Euclidean Distance has a bias toward simple shapes.
“Pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under (Euclidean) distance than pairs of simple objects.” [1]

Surprisingly, the top-motif does not correspond to the motion artifacts, but to simple regions of “drift”

[Figure: ECG time series, axis 1 to 8000, with the two "Top-1 Motif" regions and the two "Motion Artifact" regions marked]
A snippet of ECG time series in which two motion artifacts were deliberately introduced by the attending physician.

[1] Batista et al. “CID: an efficient complexity-invariant distance for time series.” Data Mining and Knowledge Discovery, 2014
Motivating Example 3: Actionability Bias
In many cases a domain expert wants to find not simply the best motif, but regularities in the data which are exploitable or actionable in some domain specific ways.
“I want to find motifs in this web-click data, preferably occurring on or close to the weekend.”
“I want to find motifs in this oil pressure data, but they would be more useful if they end with a rising trend.”

Such queries can almost be seen as a hybrid between motif search and similarity search.
Key Insight/Claim
• We could try to have different definitions of motifs, for different
domains/people/preferences/situations.
• However, we might have to devise a new search algorithm for each
one, and maybe some such algorithms could be hard to speed up.
• That would mean having to abandon our nice, fast, one-size-fits-all
matrix profile.
• Instead, we can do the following:
• Use our one-size-fits-all matrix profile algorithm to find the basic matrix profile
• Then use a domain dependent function, the Annotation Vector, to “nudge” the matrix
profile to better suit the individual desired domains/people/preferences/situations
The annotation vector framework
The annotation vector (AV) is a time series consisting of real-valued numbers in the range [0, 1].
A lower value indicates the subsequence starting at that index is less desirable, and therefore should be biased against.
Conversely, higher values mean the corresponding subsequences should be favored for the potential motif pool.

[Figure: Dodgers Loop data (subset), axis 0 to 5000, above the Annotation Vector, with days labeled m t w t f s su, m t w t f s su, m t w]
top) Seventeen days from the Dodger Loop dataset.
bottom) The AV that encodes a preference for motifs occurring on or near the weekend.
The annotation vector framework
Combines the matrix profile (MP) with the annotation vector (AV) to produce a
new “adjusted” matrix profile.
We refer to this as the “Corrected” MP (CMP), as it correctly incorporates the
contextual bias for the problem.

If $AV_i = 0$: raises the MP value in order to remove the subsequence from the potential motif pool
If $AV_i = 1$: retains the original MP value, to allow the motifs that best balance the fidelity of conservation with the user’s constraints to rise to the top

This only leaves the question of how do we create such an AV for our domain of
interest?
Key Claim: For most problems, a domain expert can design an appropriate AV with
5 minutes of introspection, and implement it in 2 or 3 lines of code or an excel
script.
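As a hedged illustration of how small such a correction can be, here is a Python sketch. The combining rule used, CMP[i] = MP[i] + (1 − AV[i]) × max(MP), is how I recall the Matrix Profile V paper defining it; the slide text above does not spell it out, so treat the formula, the function name and the toy numbers as assumptions.

```python
def corrected_matrix_profile(mp, av):
    """CMP[i] = MP[i] + (1 - AV[i]) * max(MP).

    AV[i] = 1 leaves the MP value untouched; AV[i] = 0 raises it by max(MP),
    which pushes that subsequence out of the pool of low (motif) values.
    """
    if len(mp) != len(av):
        raise ValueError("mp and av must have the same length")
    mp_max = max(mp)
    return [d + (1.0 - a) * mp_max for d, a in zip(mp, av)]

mp = [4.0, 0.5, 3.0, 0.8, 5.0]      # made-up matrix profile
av = [1.0, 0.0, 1.0, 1.0, 0.5]      # made-up annotation vector
cmp_ = corrected_matrix_profile(mp, av)
# The smallest MP value was at position 1; after correction the minimum moves to position 3.
```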
Case study: Stop-word motif bias

[Figure, axis 0 to 3000: the stop-word motif; its distance profile with the threshold; the annotation vector (0 to 1), with an extended exclusion zone for each data point below threshold]
top) We annotated a single stop-word from the LTAF-71 dataset. middle) The stop-word distance profile to the entire dataset was thresholded to create an exclusion zone, which was used to create an AV (bottom).

[Figure, axis 1 to 3000: the original MP and the corrected MP; motif zoom-in, axis 1 to 150]
By correcting the MP to bias away from stop-word motifs, we can discover medically meaningful motifs.
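The thresholding step might look like the following Python sketch. It assumes the distance profile of the annotated stop-word has already been computed, and that the exclusion zone extends m positions on each side of every below-threshold point; the zone width, the function name and the toy numbers are assumptions.

```python
def av_from_stop_word(distance_profile, threshold, m):
    """Build an AV that is 0 near every match to the stop-word, 1 elsewhere.

    distance_profile[i] is the distance from the annotated stop-word to the
    subsequence starting at i. Every position below `threshold` zeroes out
    an exclusion zone of m positions on each side (assumed zone width).
    """
    n = len(distance_profile)
    av = [1.0] * n
    for i, d in enumerate(distance_profile):
        if d < threshold:
            for k in range(max(0, i - m), min(n, i + m + 1)):
                av[k] = 0.0
    return av

dp = [9.0, 8.5, 0.4, 7.9, 9.1, 8.8, 9.3, 8.7]   # made-up distance profile
av = av_from_stop_word(dp, threshold=1.0, m=2)
# av is [0, 0, 0, 0, 0, 1, 1, 1]: everything within 2 positions of the match at index 2 is suppressed
```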
Case study: Actionability bias (i)
Suppressing motion artifacts

[Figure: Functional near-infrared spectroscopy (fNIRS) data, 690 nm intensity (subset of record fNIRS3), axis 0 to 12000]
A snippet of fNIRS searched for motifs of length 600. The motifs correspond to an atypical region, which (using external data, see Fig. 7 below) we know is due to a sensor artifact.

[Figure: fNIRS data above the acceleration time series, axis 1 to 25000]
The synchronization between the fNIRS data and the accompanying acceleration data.

How to make the AV
• Slide a window of length m across the acceleration time series.
• Compare the STD of each subsequence with the mean of all the subsequences’ STDs, and assign the corresponding AV value to be either 0 or 1.

[Figure: acceleration time series; STD vector with the mean of the STD vector; the resulting AV; axis 1 to 50000]
Points above the mean of all subsequences’ standard deviation are well aligned with regions of motion artifacts. The corresponding AV values for these points are 0, and 1 for the rest.
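The two steps above translate almost directly into code. A minimal Python sketch follows; the function name and the toy acceleration trace are made up, and the population standard deviation is used where the original may have used the sample version.

```python
import statistics

def av_from_motion(acceleration, m):
    """AV is 0 where a window's STD exceeds the mean of all window STDs, else 1."""
    stds = [statistics.pstdev(acceleration[i:i + m])
            for i in range(len(acceleration) - m + 1)]
    mean_std = statistics.fmean(stds)
    return [0.0 if s > mean_std else 1.0 for s in stds]

# Made-up acceleration trace: quiet, then a burst of motion, then quiet again.
accel = [0.0, 0.1, 0.0, 0.1, 0.0, 5.0, -4.0, 6.0, 0.1, 0.0, 0.1, 0.0]
av = av_from_motion(accel, m=3)
# Windows that overlap the burst get AV = 0, the quiet windows get AV = 1.
```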
Case study: Actionability bias (ii)
Suppressing motion artifacts

Motifs in fNIRS data discovered using classic motif search tend to be spurious motion artifacts, because the matrix profile is minimized by the highly conserved but specious patterns.

If we use an AV to correct the MP, then that CMP allows us to find medically significant motifs.

[Figure, top to bottom, axis 0 to 15000: motifs discovered by the classic approach; original matrix profile; domain-specific annotation vector (0 to 1); corrected matrix profile; motifs discovered by the CMP (zoom-in of motifs)]


Case Study: Actionability Bias
Suppressing hard-limited artifacts

[Figure: a snippet of a left-eye EOG sampled at 50 Hz from an individual with sleep disorder, axis 1 to 8000, with the limit of 8-bit precision marked; zoom-in, axis 0 to 400]
The flat top is not medically true data, the true value simply exceeds the 8-bit precision.

How to make the AV
• Record the maximum and minimum values of the time series (the constant values touching the red bars)
• Slide a window across the time series to extract subsequences
• Count the number of constant values (from being hard-limited above or below) in each subsequence
• This number over the subsequence length is used as the bias function.
• This takes 3 minutes, and five lines of MATLAB.

[Figure, axis 0 to 400: left) Without Correction, the true value exceeds the 8-bit precision; right) With Correction, the true motif, suggestive of REM sleep]
Motifs that are discovered using classic motif search tend to include hard-limited data (left), because the matrix profile is minimized by having long constant regions. By creating an AV to correct the MP, we can find true motifs corresponding to ponto-geniculo-occipital waves (right)
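A Python sketch of the recipe above. Turning the bias (the fraction of hard-limited samples in a window) into an AV as 1 minus that fraction is my assumption, as are the function name and the toy data; note also that treating every sample equal to the global maximum or minimum as hard-limited is only reasonable when the signal really does saturate.

```python
def av_from_clipping(ts, m):
    """AV = 1 - (fraction of hard-limited samples in each length-m subsequence)."""
    hi, lo = max(ts), min(ts)
    clipped = [1 if v == hi or v == lo else 0 for v in ts]
    av = []
    count = sum(clipped[:m])                # clipped samples in the first window
    for i in range(len(ts) - m + 1):
        if i > 0:                           # slide the window by one sample
            count += clipped[i + m - 1] - clipped[i - 1]
        av.append(1.0 - count / m)
    return av

# Made-up 8-bit signal that saturates at 127 and at -128.
eog = [3, 10, 127, 127, 127, 40, -5, 12, -128, -128, 0, 7]
av = av_from_clipping(eog, m=4)
```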
Case study: Simplicity bias

[Figure: time series, axis 10000 to 60000, with the motifs discovered by the classic approach; inset, axis 0 to 3500]
A short snippet of a time series of the flexion of a subject’s little finger. Subjectively, most people would expect two occurrences of consecutive multiple flexions to be the top motif (inset). Instead we find the simple “ramp-up” pattern.

[Figure: time series above its complexity estimation, axis 1 to 60000]
The complexity measure shown in parallel to the raw data. We simply normalize this complexity vector to be in the range [0, 1] to obtain the final AV.

[Figure: motifs discovered by the CMP, axis 10000 to 60000]
By correcting the matrix profile with an AV based on a complexity measure, we discover the true motifs of the finger flexion pattern.

[Figure] A visual intuition of the complexity estimation of three time series subsequences of different complexity levels.
Summary of the last ten minutes: annotation vector
Most of the time, the plain vanilla MP is going to be all you need to find
motifs/discords/chains etc. for your data.

In some cases, you may get spurious results. That is to say, mathematically
correct results, but not what you want/need/expect for your domain.

In those cases, you can just invent a simple function to suppress the spurious
motifs, code it up as an annotation vector in a handful of lines of code, use it to
“correct” the MP, and then run the motifs/discords/chains algorithms as before.

Once you invent an AV, say AVdiesel_engine or AVTurkish_folk_music, you can reuse it on
similar datasets, share it with a friend, publish it etc.
The “Matrix Profile and ten lines of code is all you need” philosophy

Key Idea:
• We should think of the Matrix Profile as a black box, a primitive.
• As we will later see, in most cases we can think of it as being obtained
essentially for free.
• We claim that given this primitive, and at most ten lines of additional code, we can reproduce the results of hundreds of papers.
• This suggests that other people may be able to take this primitive, add ten lines of code, and do amazing things that have not occurred to us. We look forward to seeing what you come up with!
• In the meantime, let's see an example: with ten lines of additional code, we can reproduce the results of a published paper….
Motifs under Uniform Scaling

We took two exemplars from the same class from the MALLET dataset, and imbedded them into a random walk dataset. Even without the color-coded clue brushed onto the data by the Matrix Profile discovery tool, the repeated pattern is visually obvious.

[Figure: the two imbedded examples in the time series, axis 1 to 10,048]

Suggestion: Toggle back and forth with last slide
We stretched the left half of the time series by just 5%, and now the pair of imbedded patterns are no longer the top-1 motif, an unexpected and disquieting result.

5% stretching means the shapes begin to go out of phase, accumulating more and more error…

There is one paper* that offers a solution, but it is approximate, complicated, has lots of parameters, is slow..

[Figure: the original time series (axis 1 to 10,048) and the stretched one (axis 1 to 10,313); the pattern at 100% and at 105%]
*D. Yankov, et al (2007). Detecting Motifs Under Uniform Scaling. SIGKDD 2007.
This issue is easy to fix with our “Matrix Profile and ten lines of code is all you need” philosophy.
For example, suppose you suspect that there are motifs in your dataset where one occurrence is 164% the length of the other.

Take the original dataset T, and copy a stretched version of it into T2, simply by using:
T2 = T(round(1 : 100/164 : end)); % Unofficial MATLAB way to resample; round() keeps the indices integer-valued
Now call:
[JMP, JMPindex] = computeMatrixProfileJoin(T,T2,500);
The resulting Matrix Profile will discover the motifs with the appropriate uniform scaling invariance.
We did this for the electric power demand example below…

[Figure: electric power demand, January 14 (axis labels 326,100 and 327,100) and January 18 (axis labels 367,000 and 367,400)]
The January 14th pattern is a near perfect match to the January 18th pattern, after the latter is uniformly stretched to 164% of its original length.

What if you don’t know the scaling factor?

It is trivial to search over all possibilities.
for scale_factor = 101 : 170
    T2 = T(round(1 : 100/scale_factor : end)); % round() keeps the indices integer-valued
    [JMP, JMPindex] = computeMatrixProfileJoin(T,T2,500);
    <trivial code to record best motifs omitted>
end

So, with the Matrix Profile and ten lines of code, we can reproduce the contribution of a SIGKDD published research effort.
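For readers not using MATLAB, the resampling step can be sketched in Python as below. It is a nearest-index resampler in the same spirit as the one-liner above, not an interpolating one; the function name `stretch` is an assumption, and the join itself (which each stretched copy would be fed into, together with the original T) is deliberately left out.

```python
def stretch(ts, percent):
    """Nearest-index resampling of ts to `percent` % of its original length.

    Output sample k is taken from input position k * 100 / percent, rounded
    half up (done in integer arithmetic to avoid floating-point surprises).
    """
    n = len(ts)
    new_len = (n - 1) * percent // 100 + 1
    return [ts[min(n - 1, (200 * k + percent) // (2 * percent))]
            for k in range(new_len)]

T = list(range(101))                      # stand-in for a real time series
T2 = stretch(T, 164)                      # copy stretched to 164% of the original length

# Unknown scaling factor: one stretched copy per candidate, 101% to 170%.
candidates = {p: stretch(T, p) for p in range(101, 171)}
```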
The Great Divorce
• In the last hour or so we have discussed the many
wonderful things you can do with the Matrix
Profile, without explaining how to compute it!
• This divorce is deliberate, and very useful.
• We can completely separate the fun, interesting
part of time series data mining, from the more
challenging backend part.
• The divorce lets us use appropriate computational
resources. For example, exploring a million
datapoints in real-time with approximate anytime
matrix profiles on a desktop, but then outsourcing
to a GPU or the cloud, when we need exact
answers, or we need to explore a billion
datapoints.
• All will be revealed, after the coffee break…
Coffee
Break
Dear Conference Attendee

• At this point, please switch to Mueen’s Slides

• There are some slides below, they are mostly back-up and bonus slides

• See you soon!


The End!

Visit the Matrix Profile Page
www.cs.ucr.edu/~eamonn/MatrixProfile.html

Visit the MASS Page
www.cs.unm.edu/~mueen/FastestSimilaritySearch.html

Questions?
Please fill out an evaluation form, available in the back of the room.
Below are Bonus Slides/Back up
Slides
Why does the MP use Euclidean Distance instead of DTW?
www.cs.unm.edu/~mueen/DTW.pdf

1. Pragmatically, we have not yet figured out how to do DTW with the MP efficiently.

2. Having said that, it may not be that useful. DTW is very useful for..

A. …One-to-All matching (i.e. similarity search). But the MP is All-to-All matching*.

B. …small datasets: for example, building an ECG classifier with just five representative heartbeats. But here we are interested in large datasets.

* The difference this makes is a real-valued version of the birthday paradox. See Mueen’s thesis
Comparing Motifs of Different
Lengths
If we find motifs of different lengths, we need to be able to rank motifs of different lengths. A similar problem occurs in string processing, and the common solution is to replace the edit-
distance by the length-normalized edit-distance, which is simply the classic distance measure divided by the length of the strings in question [a]. This correction would find the pair
{concatenation, concameration} more similar than {cat, cot}, matching our intuitions. Researchers have suggested this length-normalized correction for time series, but as we will show, it is the wrong correction factor.
To see this, consider the following thought experiment. Imagine some process in the system we are monitoring occasionally “injects” a pattern into the time series. As a concrete example,
washing machines typically have a prototypic signature, but the signatures express themselves more slowly on a cold day, when it takes longer to heat the cooler water supplied from the city
[b]. We would like all equal length instances of the signature to have approximately the same distance. In Figure 1. left we show two examples from the TRACE dataset which will act as
proxies for a variable length signature. We produced the variable lengths by down sampling. In Figure 1. center we show the distances between the patterns as their length changes. With no
correction, the Euclidean distance is obviously biased to the shortest length. The length-normalized Euclidean distance looks “flatter” and suggests itself as the proper correction. However,
this is something of an optical illusion due to its smaller scale. In Figure 1.right we show all measures after dividing them by their largest value. We can now see that the length-normalized
Euclidean distance has a strong bias to the shortest pattern. In contrast to the other two approaches, the “sqrt(1/length)” correction factor provides a near perfectly invariant distance over a
huge range of values.
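The effect of the sqrt(1/length) factor can be checked on a toy case. The Python sketch below lengthens a pair of sequences by repeating every sample (a simple stand-in for the downsampling used in the experiment above, chosen because the arithmetic is exact) and prints the three measures; the sequences, function names and the omission of z-normalization are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def length_corrected(a, b):
    """Euclidean distance multiplied by sqrt(1/length)."""
    return euclidean(a, b) * math.sqrt(1.0 / len(a))

def repeat_samples(x, k):
    """Make a k-times longer copy of x by repeating every sample k times."""
    return [v for v in x for _ in range(k)]

a = [0.0, 1.0, 3.0, 2.0, 0.5]
b = [0.2, 1.5, 2.0, 2.5, 0.0]

# Columns: length, raw ED, ED / length, ED * sqrt(1/length)
for k in (1, 2, 4, 8):
    al, bl = repeat_samples(a, k), repeat_samples(b, k)
    print(len(al),
          round(euclidean(al, bl), 3),
          round(euclidean(al, bl) / len(al), 3),
          round(length_corrected(al, bl), 3))
```

In this construction the raw distance grows with length and the distance divided by length shrinks, while the last column stays the same for every k.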

[Figure 1: left) two TRACE examples at the original length and downsampled 1 in 2, 1 in 3, 1 in 4, 1 in 5 and 1 in 6, axis 0 to 250; center) Euclidean Distance, Euclidean Distance / Length, and Euclidean Distance * Sqrt(1/Length) as the length changes, vertical 0 to 12, axis 0 to 200; right) the same three measures after dividing each by its largest value, vertical 0 to 1, axis 0 to 200]


Applications of the MP: Meter-Swapping
Electricity theft is a multi-billion-dollar problem worldwide. There are dozens of ways to steal power, but some modern wireless meters offer a surprisingly easy method with little chance of detection. Suppose customer A is a heavy consumer of electricity; perhaps he has several electric cars, or a machine shop (or marijuana nursery) in his garage. Further suppose that he notes that one of his neighbors, customer B, an elderly widow living alone, consumes very little power. It is possible for A to surreptitiously switch his meter with B, and thus only have to pay for her meager consumption, while she unwittingly gets lumbered with paying for his extravagant consumption. This crime is called meter-swapping, and has become increasingly prevalent as power companies have reduced meter reading staff in favor of wireless meter reading.
It might be imagined that this would be easy for the power company to detect, as there would be a significant change in the average power consumed by two houses. However, as Fig.top hints, power consumption is often bursty anyway. For example, as families take vacations, welcome a new baby, or have children return from college for a few weeks.
Our intuition to solve this problem is to note that while volume of consumption is not a good feature, some households may have a unique “shape” of the consumption over a day. Note that we do not expect all days to be conserved and unique, it is sufficient for our purposes that the household occasionally produces a well-conserved pattern, perhaps corresponding to a low-power use on the Sabbath for an orthodox family, or a once every seven-week all-day obligation to wash and dry the soccer kits for the entire team.
We consider a dataset of household electrical power demand collected from twenty houses in the UK in 2013. To simulate a meter-swapping event, we randomly chose two of these time series, and swapped their traces starting at November 10th. As we can see in Fig.top, this change is not readily visually obvious.
To find the swapped time series pair, we propose the following simple algorithm. We divide all the time series into two sections, the “Head”, prior to November 10th, and the “Tail”, subsequent to November 10th. We join all possible combinations of Heads and Tails, and record the pair Hi, Hj that minimizes the following score:
Swap-Score(i,j) = min(HeadHi ⋈1nn TailHj) / (min(HeadHi ⋈1nn TailHi) + eps)
In our simple experiment, this score was minimized by i = H11 and j = H4, which, as it happens, are our swapped pair. As Fig.bottom shows, the motif spanning these two apparently distinct time series is suspiciously similar, perhaps similar enough to warrant a visit by a meter reader/fraud prevention officer.

[Figure, top: the Head and Tail of houses H1, H2, H3, H4, …, H11, …, from Jan 1st to Dec 31st, split at Nov 10th]
[Figure, middle: min(HeadH11 ⋈1nn TailH11), Euclidean Distance = 9.56, axis 0 to 24 hours. This pair is not particularly similar, given that they are allegedly from the same household….]
[Figure, bottom: min(HeadH11 ⋈1nn TailH4), Nov 8th at 4:12pm and Dec 17th at 3:44pm, Euclidean Distance = 2.85, axis 0 to 24 hours. This pair are suspiciously similar, given that they are allegedly from different households….]
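Once the join minima are available, the scoring step is a few lines. The Python sketch below assumes the value min(Head_i ⋈ Tail_j) has already been computed for every pair and stored in a dictionary; the house names and all the numbers are made up for illustration, as are the function name and the choice of eps.

```python
def swap_scores(join_min, eps=1e-9):
    """Swap-Score(i, j) = min(Head_i join Tail_j) / (min(Head_i join Tail_i) + eps).

    join_min[(i, j)] holds the smallest join distance between the Head of
    house i and the Tail of house j. Scores are returned for i != j only.
    """
    scores = {}
    for (i, j), cross in join_min.items():
        if i != j:
            scores[(i, j)] = cross / (join_min[(i, i)] + eps)
    return scores

# Made-up join minima for three houses. House A looks like itself before and
# after the split; B's Head looks like C's Tail and vice versa (a swap).
join_min = {
    ("A", "A"): 2.0, ("A", "B"): 8.0, ("A", "C"): 9.0,
    ("B", "A"): 7.5, ("B", "B"): 9.0, ("B", "C"): 2.7,
    ("C", "A"): 8.8, ("C", "B"): 3.5, ("C", "C"): 10.0,
}
scores = swap_scores(join_min)
suspect = min(scores, key=scores.get)    # the pair with the lowest score
```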
Appendix A: A worked example of an AV
(next 6 slides)
• Perhaps the most common problem people face with motif
discovery, is finding “too-simple” motifs.
• This is not a “bug”, just a property of Euclidean Distance.
• In the next six slides we will show you a concrete example of
our fix.

See also: Hoang Anh Dau and Eamonn Keogh. Matrix Profile V: A Generic Technique to Incorporate Domain
Knowledge into Motif Discovery. KDD'17, Halifax, Canada
Let's start by making a test dataset

In a smoothed random walk of length 50,000, we imbedded the reverse of one Mallet-6 at location 10,000, and the reverse of a different Mallet-6 at location 40,000.

We imbedded one Mallet-2 at location 15,000, and a different Mallet-2 at location 25,000, and yet another Mallet-6 at location 35,000.

Then we added noise to the entire thing:
TAG = (TAG + randn(size(TAG))/4)

Here we would definitely expect to find the following …

A) Two motifs, one of size 2, one of size 3.

[Figure: the exemplars (taken from UCI Mallet) and the resulting time series TAG, axis 0 to 5 ×10^4]
TAG: Pure motif search
[Tool output: "We are 100.0% done: The input time series: The best-so-far motifs are color coded (see bottom panel)"; the best-so-far corrected matrix profile, axis 0 to 5 ×10^4]
The best-so-far 1st motifs are located at 5503 (green) and 14051 (cyan)
The best-so-far 2nd motifs are located at 37756 (green) and 47458 (cyan)
The best-so-far 3rd motifs are located at 19769 (green) and 36016 (cyan)

We did not find the imbedded motifs ;-(
Instead, we just found these simple patterns

TAG: Motif search, corrected for simplicity
[Tool output: "We are 100.0% done: The input time series: The best-so-far motifs are color coded (see bottom panel)"; the best-so-far corrected matrix profile, axis 0 to 5 ×10^4]
The best-so-far 1st motifs are located at 14988 (green) and 24987 (cyan)
The best-so-far 2nd motifs are located at 9995 (green) and 39995 (cyan)
The best-so-far 3rd motifs are located at 37759 (green) and 47461 (cyan)

We correctly found the two imbedded motifs, one of size 3, one of size 2.
It is important to note that pure motif search “almost” works.

As you can see below, the true motif is low in the MP, just not quite low enough.

That means if we can just nudge the relevant section down a little (or equivalently, nudge everything else up), we would find the right motifs.

[Figure: zoom-in of part of the MP (vertical 0 to 20), with the corresponding subsequences, axis 1 to 500]
What we need is a function that recognizes that one of these patterns is too simple to be of interest.

The number of zero-crossings would probably work, but that can fail in pathological circumstances.

One such function is complexity, here it is in all its glory, for a time series subsequence x

function [complexity] = check_complexity(x)
    x = zscore(x);
    complexity = sqrt(sum(diff(x).^2));
end

[Figure: two subsequences, axis 1 to 500, one with Complexity = 3.2 and one with Complexity = 1.4]

The exact values of complexity (shown to the left) are not important, only the relative values will matter.
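For comparison, here is the same complexity estimate as a Python sketch, applied to a plain ramp and to a wigglier made-up signal. It uses the sample standard deviation, which I believe is what MATLAB's zscore does by default; the function name and the test signals are assumptions.

```python
import math

def complexity(x):
    """sqrt(sum(diff(z)^2)) of the z-normalized subsequence z."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / (len(x) - 1))  # sample std
    if sd == 0:
        return 0.0                      # a constant subsequence has no shape at all
    z = [(v - mu) / sd for v in x]
    return math.sqrt(sum((z[i + 1] - z[i]) ** 2 for i in range(len(z) - 1)))

ramp = [float(i) for i in range(50)]                      # simple "ramp-up" shape
wiggle = [math.sin(i) + 0.02 * i for i in range(50)]      # same trend, many more turns

simple_score = complexity(ramp)
complex_score = complexity(wiggle)
# Only the ordering matters: the wiggly signal scores higher than the ramp.
```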
Note that the complexity function is high where the potential motif is, but low elsewhere. Just what we want.

[Figure: the raw TAG (snippet) above the complexity function, axis 9200 to 10600]


We are almost done. However, there is one caveat. We have to have a parameter.

The complexity function might be too strong, and might push too hard for more complex motifs, even if they are
not really similar.
However, we can control its strength.
The dilution_factor is a number greater than or equal to zero. If it is zero, there is no dilution. If it is large
enough, say over 40, we begin to degenerate to classic motif search.
At least for this problem, values in the range 2 to 16 work great.
% Makes annotation vector that favors complexity
% Dau Hoang Anh and Eamonn Keogh
% [annotationVector] = make_AV_complexity(data, subsequenceLength);
% Output: annotationVector: annotation vector (vector)
% Input:  data: input time series (vector)
%         subsequenceLength: motif length (scalar)
%
function [AV] = make_AV_complexity(data, subsequenceLength)
    data = zscore(data);                      % data is a row vector
    profile_length = length(data) - subsequenceLength + 1;
    AV = zeros(profile_length,1);
    for j = 1: profile_length
        AV(j) = check_complexity(data(j:j+subsequenceLength-1));
    end

    AV = zeroOneNorm(AV);                     % zero-one normalize the AV (helper not shown on the slide)

    % Select dilution factor, 0 is no dilution,
    dilution_factor = 5;                      % ..larger numbers are more dilution
    AV = AV + dilution_factor;
    AV = AV/(dilution_factor+1);
end
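The last two steps of the function above, normalization and dilution, can be tried out in isolation with the Python sketch below. It assumes `zeroOneNorm` is ordinary min-max scaling (the helper is not shown on the slide) and starts from a made-up list of raw complexity values.

```python
def zero_one_norm(values):
    """Min-max scale values into [0, 1] (assumed behavior of zeroOneNorm)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def dilute(av, dilution_factor):
    """(AV + d) / (d + 1): d = 0 leaves the AV alone, large d flattens it toward 1."""
    return [(a + dilution_factor) / (dilution_factor + 1.0) for a in av]

raw_complexity = [1.4, 3.2, 2.3, 1.4, 2.9]     # made-up complexity values
av = zero_one_norm(raw_complexity)             # spans the full range 0 to 1
av_mild = dilute(av, 5)                        # now spans 5/6 to 1
av_flat = dilute(av, 40)                       # nearly all ones: close to classic motif search
```

With a dilution factor of 5, even the simplest subsequence keeps an AV value of 5/6, so the correction can reorder close candidates without overwhelming genuinely well-conserved motifs.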
Music Revisited 1
The MP is a useful tool for various music analysis tasks

Eagles – Hotel California
The MP can be used to create arc plots, giving a good visualization of the music structure

Led Zeppelin – Stairway to Heaven

If there's a bustle in your hedgerow, don't be alarmed now

Yes, there are two paths you can go by, but in the long run

These links can also be used to create an “infinite song”


Music Revisited 2
Repeated patterns can be applied in different scenarios

The “most repeated” subsequence can be used as thumbnail


It is given by the mode of the MP-index

The plot is a histogram of the MP-index. The values record how many times a
subsequence was considered the NN of some other subsequence. The
subsequence that maximizes this plot was used as the audio thumbnail
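Taking the mode of the MP-index is a one-liner with a counter. A minimal Python sketch, with a made-up index and an assumed function name:

```python
from collections import Counter

def thumbnail_start(mp_index):
    """Return (start, votes): the position most often chosen as a nearest neighbor."""
    counts = Counter(mp_index)
    start, votes = counts.most_common(1)[0]
    return start, votes

# Made-up MP-index: position 4 is the nearest neighbor of four other subsequences.
mp_index = [4, 4, 5, 4, 0, 4, 2, 5]
start, votes = thumbnail_start(mp_index)
```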

We could have had it all


Rolling in the deep
You had my heart inside of your
hand
And you played it to the beat
Notes on Artwork
Most are images by Dürer (but colored by others) and other artists, from
Triumph of Emperor Maximilian I, King of Hungary, Dalmatia and Croatia, Archduke of Austria
• Provenance: National Library of Spain (Biblioteca Nacional de España)
• Identifier: 108150 Institution: National Library of Spain Provider: The European Library

• http://bdh-rd.bne.es/viewer.vm?id=0000012553&page=1

• The elephant 1: Elephants via some Paintings of the Mughal Era, http://ranasafvi.com/mughal-elephants/

• Elephant number 2: http://collections.vam.ac.uk/item/O15706/one-of-six-figures-from-gouache-mazhar-ali-khan/
