Time Series
[Figure: data pipeline with the stages data collection, data processing, exploratory analysis, data visualization, data analytics, and insights & decision making]
High/Multi-dimensional data
o Text Data
n Vs. structured data
Time-Series and Sequence Data
o Time-series database
n Consists of sequences of values or events changing with
time (high-dimensional data)
n Data is recorded at regular intervals
n Characteristic time-series components
o Trend, cycle, seasonal, irregular
o Applications
n Financial: stock price, inflation
n Biomedical: blood pressure
n Meteorological: precipitation
Time-Series and Sequence Data
[Figure: time-series plot]
Time-Series: Trend analysis
o Consists of sequences of values or events obtained
by repeated measurements over time (e.g., hourly,
weekly)
o Time-series data can be analyzed to:
n Identify correlations
n Find similar or regular patterns and trends
n Forecast future values
Time-Series: Trend analysis
o A time series can be illustrated as a time-series graph
which describes a point moving with the passage of
time
o Categories of Time-Series Movements
1. Long-term or trend movements (trend curve)
2. Cyclic movements or cycle variations, e.g., business cycles
3. Seasonal movements or seasonal variations
i.e., almost identical patterns that a time series appears to
follow during corresponding months of successive years
4. Irregular or random movements
[Figure: a time series over years 1 to 13, annotated with its trend, cyclical, seasonal, and irregular fluctuations]
Estimation of Trend Curve
o The freehand method
n Fit the curve by looking at the graph
n Costly and barely reliable for large-scale data
analysis
o Moving Average Method
n Series of arithmetic means
n Used for smoothing
n Provides overall impression of data over time
Trend Analysis
Moving Average Method
Example: Moving Average (MA)
[Figure: the original series and its moving average plotted over positions 1 to 9]
o Example: 3 7 2 0 4 5 9 7 2
o Moving average of window = 3:
4 3 2 3 6 7 6
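The window-3 example can be reproduced with a few lines of Python. This is a minimal sketch of a simple (unweighted) moving average; the function name `moving_average` is chosen here for illustration and does not come from the slides.

```python
def moving_average(values, window):
    """Return the simple moving averages of `values` for the given window size."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Sum of the first window, then slide: add the new value, drop the oldest.
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


series = [3, 7, 2, 0, 4, 5, 9, 7, 2]   # series from the slide
smoothed = moving_average(series, 3)
print(smoothed)  # [4.0, 3.0, 2.0, 3.0, 6.0, 7.0, 6.0]
```

A series of n values with window w yields n - w + 1 averages, which is why 9 inputs give 7 outputs here.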
Estimation of Trend Curve
o The least-squares method
n Find the curve minimizing the sum of the squares
of the deviations of points on the curve from the
corresponding data points
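As an illustration of the least-squares idea, the sketch below fits the simplest possible trend curve, a straight line y = a + b·t. Restricting the curve to a line, using time indices t = 1, 2, ..., n, and the function name `fit_linear_trend` are all assumptions made for this example; the slides do not fix the curve family.

```python
def fit_linear_trend(values):
    """Least-squares fit of y = intercept + slope * t, with t = 1..n.

    Returns (intercept, slope), the pair minimizing
    sum((intercept + slope * t - y_t) ** 2) over all data points.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    times = range(1, n + 1)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    # Closed-form solution of the normal equations for a straight line.
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    s_tt = sum((t - mean_t) ** 2 for t in times)
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    return intercept, slope


intercept, slope = fit_linear_trend([2, 4, 6, 8])
print(intercept, slope)  # a perfectly linear series is recovered exactly
```

Unlike the moving average, this produces a single global trend described by two parameters, so it can also be extrapolated to future time indices.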
Similarity Search in Time-Series Analysis
o Normal database query finds exact match
o Similarity search finds data sequences that differ only
slightly from the given query sequence
o Typical Applications
n Financial market
n Scientific databases
n Medical diagnosis
Time Series Problems (from a data perspective)
[Figure: price time series ($price vs. day, days 1 to 365) compared using a distance function (e.g., Euclidean distance)]
Problems
o Define the similarity (or distance) function
o Find an efficient algorithm to retrieve similar time
series from a database
n Faster than sequential scan?
Lp distance between two series x and y of n datapoints:
$L_p = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
Euclidean model
Euclidean distance between two time series Q = {q1, q2, …, qn}
and S = {s1, s2, …, sn}:
$D(Q, S) \equiv \sqrt{\sum_{i=1}^{n} (q_i - s_i)^2}$
[Figure: a query Q of n datapoints compared against four database series of n datapoints each]
Distance   Rank
0.98       4
0.07       1
0.21       2
0.43       3
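The distance definitions above translate directly into code. The following is a minimal sketch; the function names and the toy `database` dictionary are hypothetical and only serve to show how database series can be ranked by their distance to a query.

```python
import math


def lp_distance(x, y, p=2):
    """L_p distance: (sum |x_i - y_i|^p)^(1/p) for equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have the same number of datapoints")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def euclidean_distance(q, s):
    """Euclidean distance, i.e. the L_p distance with p = 2."""
    if len(q) != len(s):
        raise ValueError("series must have the same number of datapoints")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def rank_by_distance(query, database):
    """Return (distance, name) pairs sorted from most to least similar."""
    return sorted((euclidean_distance(query, s), name)
                  for name, s in database.items())


# Toy data, for illustration only (not the series from the slide).
database = {"a": [1, 2, 3], "b": [1, 2, 4], "c": [5, 5, 5]}
for rank, (dist, name) in enumerate(rank_by_distance([1, 2, 3], database), 1):
    print(rank, name, round(dist, 2))
```

Rank 1 is the series with the smallest distance to the query, matching the Distance/Rank listing above.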
Similarity Retrieval
o Range Query
n Find all time series S where D(Q, S) ≤ ε
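A range query can be answered naively with a sequential scan, which is the baseline that an efficient retrieval algorithm would need to beat. The sketch below assumes Euclidean distance as D and an in-memory dictionary of equal-length series; the names `range_query` and `database` are illustrative only.

```python
import math


def euclidean_distance(q, s):
    """Euclidean distance between two equal-length series."""
    if len(q) != len(s):
        raise ValueError("series must have the same number of datapoints")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def range_query(query, database, epsilon):
    """Sequential scan: names of all series S with D(Q, S) <= epsilon."""
    return [name for name, series in database.items()
            if euclidean_distance(query, series) <= epsilon]


# Toy data, for illustration only.
database = {"a": [1, 2, 3], "b": [1, 2, 4], "c": [5, 5, 5]}
print(range_query([1, 2, 3], database, epsilon=1.0))  # ['a', 'b']
```

The scan computes one distance per stored series, so its cost grows linearly with the size of the database.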
DBMS versus DSMS
o DBMS: persistent relations
o DSMS: transient streams (and persistent relations)
The (Simplified) Big Picture
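[Figure: DSMS architecture. Queries are registered with the DSMS, which consumes input streams and produces a streamed result and a stored result; it is backed by a scratch store, an archive, and stored relations.]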
(Simplified) Network Monitoring
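[Figure: DSMS for network monitoring. Monitoring queries are registered with the DSMS, which consumes network measurements and packet traces and produces intrusion warnings and online performance metrics; it is backed by a scratch store, an archive, and lookup tables.]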
Example
o One might be interested in only the color and
itemnames...
o They might want to see what is known as a cross-
tabulation of the aggregates corresponding to
Itemname and Color, but how for data streams?

         Blue   Red   …   Total
Jacket   23     45    …   234
Jeans    24     28    …   462
…        …      …     …   …
Total    89     132   …   2384

o What does this remind you of?
o How could you do this via a GROUP BY?
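One possible answer for the streaming case is to maintain the grouped aggregates incrementally, updating the affected cell and totals as each tuple arrives instead of re-running a GROUP BY over stored data. The sketch below is an assumption-laden illustration: the class `StreamingCrossTab`, the tuple layout (itemname, color, quantity), and the sample stream are all hypothetical and not taken from the slides.

```python
from collections import defaultdict


class StreamingCrossTab:
    """Cross-tabulation of a summed quantity by (itemname, color),
    maintained one stream tuple at a time."""

    def __init__(self):
        self.cells = defaultdict(int)       # (itemname, color) -> sum
        self.row_totals = defaultdict(int)  # itemname -> sum over colors
        self.col_totals = defaultdict(int)  # color -> sum over itemnames
        self.grand_total = 0

    def update(self, itemname, color, quantity=1):
        """Fold one arriving tuple into every aggregate it contributes to."""
        self.cells[(itemname, color)] += quantity
        self.row_totals[itemname] += quantity
        self.col_totals[color] += quantity
        self.grand_total += quantity


# Hypothetical stream of (itemname, color, quantity) tuples.
stream = [("Jacket", "Blue", 2), ("Jeans", "Red", 1), ("Jacket", "Red", 3),
          ("Jeans", "Blue", 4), ("Jacket", "Blue", 1)]

tab = StreamingCrossTab()
for itemname, color, quantity in stream:
    tab.update(itemname, color, quantity)

print(tab.cells[("Jacket", "Blue")], tab.row_totals["Jacket"], tab.grand_total)
```

The cells correspond to GROUP BY itemname, color; the row and column totals correspond to grouping by one attribute only, and the grand total to no grouping at all. The state grows with the number of distinct (itemname, color) pairs, not with the length of the stream.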
Processing time windows