
Time Series

The document discusses time series data and analysis. It begins by defining time series data as sequences of values or events changing over repeated measurements of time. Common applications include financial, biomedical, and meteorological data. Methods for analyzing time series data include identifying trends, correlations, and patterns using techniques like moving averages and least squares estimation. Similarity search is also important for finding similar sequences in large time series databases. Data streams pose additional challenges for continuous querying over transient data.

Uploaded by

padsj
Copyright © All Rights Reserved

The Data LifeCycle

Data collection → Data processing → Exploratory analysis & Data visualization → Data Analytics → Insights & Decision Making

High/Multi-dimensional data
o  Text Data
n  Vs. structured data

o  Time Series Data


n  Time series analytics
n  Search, forecasting

o  Dynamic time series


n  Streamed data analysis

Time-Series and Sequence Data
o  Time-series database
n  Consists of sequences of values or events changing with
time (high-dimensional data)
n  Data is recorded at regular intervals
n  Characteristic time-series components
o  Trend, cycle, seasonal, irregular
o  Applications
n  Financial: stock price, inflation
n  Biomedical: blood pressure
n  Meteorological: precipitation

Time-Series and Sequence Data

[Figure: time-series plot]

Time-Series: Trend analysis
o  Consists of sequences of values or events obtained
over repeated measurements of time (weekly,
hourly...)
o  Time-Series data can be analyzed to:
n  Identify correlations
n  Similar / Regular patterns, trends
n  Forecasting time series

Time-Series: Trend analysis
o  A time series can be illustrated as a time-series graph
which describes a point moving with the passage of
time
o  Categories of Time-Series Movements
1.  Long-term or trend movements (trend curve)
2.  Cyclic movements or cycle variations, e.g., business cycles
3.  Seasonal movements or seasonal variations
i.e., almost identical patterns that a time series appears to
follow during corresponding months of successive years.
4.  Irregular or random movements

[Figure: a time series over years 1 to 13 illustrating trend, cyclical, seasonal, and irregular fluctuations]

Estimation of Trend Curve
o  The freehand method
n  Fit the curve by looking at the graph
n  Costly and hardly reliable for large-scale data analysis
o  Moving Average Method
n  Series of arithmetic means
n  Used for smoothing
n  Provides overall impression of data over time

Trend Analysis
o  Moving Average Method
n  Smooths the data
n  Eliminates cyclic, seasonal and irregular movements
n  Loses the data at the beginning or end of a series
n  Sensitive to outliers (can be reduced by Weighted Moving Average)
n  Weighted Moving Average assigns greater weight to center elements to reduce the smoothing effect
o  Given a series of measurements y1, y2, y3, ..., and a window of length n
o  Example: 3 7 2 0 4 5 9 7 2
n  Moving average of order 3: 4 3 2 3 6 7 6
n  Weighted average (1 4 1): 5.5 2.5 1 3.5 5.5 8 6.5

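A minimal Python sketch of the two smoothing methods, reproducing the example above. With window length n, the k-th simple average is (y_k + ... + y_{k+n-1}) / n; for the weighted version each window is multiplied by the weights and divided by their sum, which is the normalization that reproduces the (1 4 1) figures. The function names are illustrative, not from any library.

```python
from typing import Sequence


def moving_average(ys: Sequence[float], n: int) -> list[float]:
    """Simple moving average of order n: mean of each run of n consecutive values."""
    # A series of length m yields only m - n + 1 averages: values at the ends are lost.
    return [sum(ys[i:i + n]) / n for i in range(len(ys) - n + 1)]


def weighted_moving_average(ys: Sequence[float], weights: Sequence[float]) -> list[float]:
    """Weighted moving average: weights applied to each window, result divided by sum(weights)."""
    n, total = len(weights), sum(weights)
    return [
        sum(w * y for w, y in zip(weights, ys[i:i + n])) / total
        for i in range(len(ys) - n + 1)
    ]


series = [3, 7, 2, 0, 4, 5, 9, 7, 2]
print(moving_average(series, 3))                   # [4.0, 3.0, 2.0, 3.0, 6.0, 7.0, 6.0]
print(weighted_moving_average(series, [1, 4, 1]))  # [5.5, 2.5, 1.0, 3.5, 5.5, 8.0, 6.5]
```

Both outputs have seven values for nine inputs, which is the loss of data at the ends of the series noted above.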
Example: Moving Average (MA)
[Figure: original series and its moving average plotted over positions 1 to 9]

o  Given a series of measurements y1, y2, y3, ..., and a window of length n
o  Example: 3 7 2 0 4 5 9 7 2
o  Moving average of window = 3: 4 3 2 3 6 7 6
Estimation of Trend Curve
o  The least-square method
n  Find the curve minimizing the sum of the squares
of the deviation of points on the curve from the
corresponding data points
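As a sketch of the least-square method, the snippet below fits a straight trend line y = a + b·t to a series indexed t = 1, ..., n, using the closed-form solution of the normal equations. Restricting the trend curve to a straight line is an assumption made for illustration; the same criterion applies to other curve families.

```python
from typing import Sequence, Tuple


def least_squares_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a + b*t (t = 1..n) minimizing sum((y_t - (a + b*t))**2); returns (a, b)."""
    n = len(ys)
    ts = range(1, n + 1)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    # Slope from the normal equations: sum of cross-deviations over sum of squared t-deviations
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    b = sxy / sxx
    a = y_mean - b * t_mean  # the fitted line passes through (t_mean, y_mean)
    return a, b


series = [3, 7, 2, 0, 4, 5, 9, 7, 2]
a, b = least_squares_line(series)
trend = [a + b * t for t in range(1, len(series) + 1)]  # fitted trend value at each t
```

Unlike the freehand method, this gives the same curve every time for the same data, and unlike a moving average it produces a value for every point of the series.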
Similarity Search in Time-Series Analysis
o  A normal database query finds exact matches
o  Similarity search finds data sequences that differ only
slightly from the given query sequence

o  Two categories of similarity queries


n  Whole matching: find a sequence that is similar to the query
sequence
n  Subsequence matching: find all pairs of similar sequences

o  Typical Applications
n  Financial market
n  Scientific databases
n  Medical diagnosis
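To make the two query categories concrete, here is a small sketch under stated assumptions: similarity is taken to be Euclidean distance below a threshold `eps` (the slides leave the measure open), whole matching compares the query against entire sequences of the same length, and subsequence matching is read in its common sense of sliding the query along a longer sequence and reporting every offset whose window is close enough. All names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def whole_match(query: Sequence[float], sequences: List[Sequence[float]], eps: float) -> List[int]:
    """Indices of whole sequences (same length as the query) within distance eps."""
    return [i for i, s in enumerate(sequences)
            if len(s) == len(query) and euclidean(query, s) <= eps]


def subsequence_match(query: Sequence[float], sequence: Sequence[float], eps: float) -> List[int]:
    """Offsets where the window of len(query) values starting there is within eps of the query."""
    m = len(query)
    return [off for off in range(len(sequence) - m + 1)
            if euclidean(query, sequence[off:off + m]) <= eps]


print(subsequence_match([1, 2, 3], [0, 1, 2, 3, 9, 1, 2, 4], eps=1.0))  # [1, 5]
```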
Time Series Problems (from a data perspective)

o  The Similarity Problem
X = x1, x2, …, xn and Y = y1, y2, …, yn
o  Define and compute Sim(X, Y)
n  E.g., do stocks X and Y have similar movements?
o  Retrieve similar time series efficiently
n  Vs. database search!

[Figure: daily price series ($price vs. day, days 1 to 365) compared by a distance function chosen by an expert, e.g., Euclidean distance]

Problems
o  Define the similarity (or distance) function
o  Find an efficient algorithm to retrieve similar time
series from a database
n  Faster than sequential scan?

The Similarity function depends on the Application


Euclidean Similarity Measure

o  View each sequence as a point in n-dimensional Euclidean space (n = length of each sequence)
o  Define (dis-)similarity between sequences X and Y as

$$L_p = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

n  p = 1: Manhattan distance
n  p = 2: Euclidean distance
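The L_p formula translates directly into code. A minimal sketch (the function name is hypothetical) that treats both sequences as points in n-dimensional space:

```python
from typing import Sequence


def lp_distance(x: Sequence[float], y: Sequence[float], p: float = 2) -> float:
    """L_p distance (sum |x_i - y_i|^p)^(1/p) between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length n")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)


x = [1, 2, 3, 4]
y = [2, 2, 5, 1]
print(lp_distance(x, y, p=1))  # Manhattan: 1 + 0 + 2 + 3 = 6.0
print(lp_distance(x, y, p=2))  # Euclidean: sqrt(1 + 0 + 4 + 9) ≈ 3.742
```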


Euclidean model
[Figure: a query Q and the time series in a database, each consisting of n datapoints]

Euclidean distance between two time series Q = {q1, q2, …, qn} and S = {s1, s2, …, sn}:

$$D(Q, S) \equiv \sqrt{\sum_{i=1}^{n} (q_i - s_i)^2}$$

Distance of each database series to Q, and its rank:

Distance | Rank
0.98     | 4
0.07     | 1
0.21     | 2
0.43     | 3
Similarity Retrieval
o  Range Query
n  Find all time series S where D(Q, S) ≤ ε
o  Nearest Neighbor query
n  Find the k most similar time series to Q
o  A method to answer the above queries: linear scan
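A sketch of answering both queries by linear scan, assuming the database is a plain dictionary from a hypothetical series id to an equal-length list of values and that D is the Euclidean distance. Every series is compared with Q, so the cost grows with (number of series) × n; that cost is what the earlier question "faster than sequential scan?" asks to beat.

```python
import heapq
import math
from typing import Dict, List, Sequence


def euclidean(q: Sequence[float], s: Sequence[float]) -> float:
    """D(Q, S) = sqrt(sum((q_i - s_i)^2)) for equal-length series."""
    return math.sqrt(sum((qi - si) ** 2 for qi, si in zip(q, s)))


def range_query(q: Sequence[float], database: Dict[str, List[float]], eps: float) -> List[str]:
    """Linear scan: ids of every series S with D(Q, S) <= eps."""
    return [sid for sid, s in database.items() if euclidean(q, s) <= eps]


def knn_query(q: Sequence[float], database: Dict[str, List[float]], k: int) -> List[str]:
    """Linear scan: ids of the k series closest to Q, nearest first."""
    return heapq.nsmallest(k, database, key=lambda sid: euclidean(q, database[sid]))


db = {"A": [1, 2, 3], "B": [1, 2, 4], "C": [5, 5, 5], "D": [0, 2, 5]}
q = [1, 2, 3]
print(range_query(q, db, eps=2.5))  # ['A', 'B', 'D']
print(knn_query(q, db, k=2))        # ['A', 'B']
```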


Data Streams
o  Continuous, unbounded, rapid, time-varying streams
of data elements
o  Occur in a variety of modern applications
n  Network monitoring and traffic engineering
n  Sensor networks
n  Telecom call records
n  Financial applications
n  Web logs and click-streams
n  Manufacturing processes

o  DSMS = Data Stream Management System

DBMS versus DSMS

DBMS                 | DSMS
Persistent relations | Transient streams (and persistent relations)
One-time queries     | Continuous queries

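The difference between one-time and continuous queries can be sketched in a few lines of Python: a one-time query runs once over data that is fully stored, while a continuous query is set up once and emits an updated answer each time a stream element arrives, keeping only a small amount of state. The running average and the function names are illustrative assumptions, not part of any DSMS API.

```python
from typing import Iterable, Iterator, List


def one_time_avg(relation: List[float]) -> float:
    """DBMS style: evaluate once over persistent, fully available data."""
    return sum(relation) / len(relation)


def continuous_avg(stream: Iterable[float]) -> Iterator[float]:
    """DSMS style: constant state, one new answer per arriving element."""
    count, total = 0, 0.0
    for value in stream:  # the stream may be unbounded; elements are not stored
        count += 1
        total += value
        yield total / count


print(one_time_avg([3, 7, 2, 0]))          # 3.0
print(list(continuous_avg([3, 7, 2, 0])))  # [3.0, 5.0, 4.0, 3.0]
```

Because `continuous_avg` is a generator, it also works on an endless source; the caller simply reads answers as they are produced.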
The (Simplified) Big Picture

[Figure: input streams enter a DSMS, where users register queries; the DSMS returns streamed results and stored results, and uses a scratch store, an archive, and stored relations]

(Simplified) Network Monitoring
[Figure: network measurements and packet traces stream into a DSMS with registered monitoring queries, which produce intrusion warnings and online performance metrics; the DSMS uses a scratch store, an archive, and lookup tables]

Example
o  One might be interested in only the color and itemnames...
o  They might want to see what is known as a cross-tabulation of the aggregates corresponding to Itemname and Color, but how can this be done for data streams?

Itemname | Blue | Red | … | Total
Jacket   | 23   | 45  | … | 234
Jeans    | 24   | 28  | … | 462
…        | …    | …   | … | …
Total    | 89   | 132 | … | 2384

What does this remind you of? How could you do this via a GROUP BY?
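One possible answer for streams, sketched under assumptions: each stream element is taken to be a hypothetical `(itemname, color, quantity)` tuple, and the aggregate is taken to be a sum of quantity. Instead of rerunning a `GROUP BY Itemname, Color` over stored data, the cross-tab cells and the row, column, and grand totals are updated incrementally as each element arrives.

```python
from collections import defaultdict
from typing import Iterable, Tuple


class StreamingCrossTab:
    """Cross-tabulation of summed quantity by (itemname, color), updated per element."""

    def __init__(self) -> None:
        self.cells = defaultdict(int)       # (itemname, color) -> sum of quantity
        self.row_totals = defaultdict(int)  # itemname -> sum of quantity
        self.col_totals = defaultdict(int)  # color -> sum of quantity
        self.grand_total = 0

    def update(self, itemname: str, color: str, quantity: int) -> None:
        self.cells[(itemname, color)] += quantity
        self.row_totals[itemname] += quantity
        self.col_totals[color] += quantity
        self.grand_total += quantity


def consume(stream: Iterable[Tuple[str, str, int]]) -> StreamingCrossTab:
    tab = StreamingCrossTab()
    for itemname, color, quantity in stream:
        tab.update(itemname, color, quantity)
    return tab


# Toy input for illustration only
tab = consume([("Jacket", "Blue", 5), ("Jeans", "Red", 4), ("Jacket", "Blue", 2)])
print(tab.cells[("Jacket", "Blue")], tab.row_totals["Jacket"], tab.grand_total)  # 7 7 11
```

Memory grows with the number of distinct (itemname, color) pairs rather than with the length of the stream, which is what makes this kind of aggregate feasible over unbounded data.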
Processing Time Windows

[Figure: processing-time windows. Image: Tyler Akidau]

• System waits for x time units
– System decides on stream partitioning
– Simple, easy to implement
– Ignores any time information carried in the stream, so any aggregation can be arbitrary
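A minimal sketch of processing-time windows, assuming fixed (tumbling) windows of `x` time units and input given as hypothetical `(arrival_time, value)` pairs, where `arrival_time` is the system clock when the element was received. A live system would read its own clock instead; the point is that any timestamp inside the data plays no role in which window an element lands in.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def processing_time_windows(arrivals: Iterable[Tuple[float, float]], x: float) -> Dict[int, float]:
    """Sum values per tumbling window of x time units, keyed by window index.

    Window k covers arrival times in [k*x, (k+1)*x). Event times carried inside
    the data are ignored, so results depend on when elements happen to arrive.
    """
    sums: Dict[int, float] = defaultdict(float)
    for arrival_time, value in arrivals:
        sums[int(arrival_time // x)] += value
    return dict(sums)


print(processing_time_windows([(0.5, 1), (1.2, 2), (2.1, 3), (4.9, 4)], x=2.0))
# {0: 3.0, 1: 3.0, 2: 4.0}
```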
• Similar: Counting Windows
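Counting windows partition the stream by number of elements instead of elapsed time. A sketch assuming fixed, non-overlapping windows of `size` elements (the name `count_windows` is hypothetical):

```python
from typing import Iterable, Iterator, List


def count_windows(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Tumbling count window: emit a window every `size` elements, regardless of time."""
    window: List[float] = []
    for value in stream:
        window.append(value)
        if len(window) == size:
            yield window
            window = []
    # A trailing partial window is not emitted; a live system would keep waiting for it.


print([sum(w) for w in count_windows([3, 7, 2, 0, 4, 5, 9], size=3)])  # [12, 9]
```

Like processing-time windows, this is simple to implement, and it also ignores any time information in the data: a window closes only when enough elements have arrived, however long that takes.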