Lecture 1 - Time Series Fundamentals - Introduction

The document outlines a course on Machine Learning for Time Series, detailing its structure, topics, and evaluation methods. It covers fundamental concepts, various models, and applications of machine learning in time series analysis, emphasizing its growing importance due to increasing data generation. The course includes lectures, exercises, and a project, with assessments based on written exams and participation.

Machine Learning for Time Series

(MLTS or MLTS-Deluxe Lectures)

Dr. Dario Zanca


Machine Learning and Data Analytics (MaD) Lab
Friedrich-Alexander-Universität Erlangen-Nürnberg
18.10.2022
Organisational Information

Machine Learning for Time Series


• 5 ECTS
• Lectures + Exercises

Machine Learning for Time Series (Deluxe)


• 7.5 ECTS
• Lectures + Exercises + Project
Topics overview

• Time series fundamentals and definitions (2 lectures)


• Bayesian Inference (1 lecture)
• Gaussian processes (2 lectures)
• State space models (2 lectures)
• Autoregressive models (1 lecture)
• Data mining on time series (1 lecture)
• Deep learning on time series (4 lectures)
• Domain adaptation (1 lecture)
Course times

Lectures (online)
A new lecture recording is generally released every Thursday on FAU.TV
Consultation hours by appointment, write to [email protected]

Exercises (online)
Live Zoom Session starting on November 3rd
Recordings from previous editions are available at https://www.fau.tv/course/id/3178

StudOn 2023-2024:
https://www.studon.fau.de/crs5276833.html
Exams and evaluation

Written Exam (5 ECTS)


• 70% from lectures, 30% from exercises
• On-campus
Course organizers
Lecturers

Machine Learning and Data Analytics (MaD) Lab


• Dr. Dario Zanca, [email protected] *
• Prof. Dr. Björn Eskofier, [email protected]

* Please, address all your correspondence about the course to Dr. Dario Zanca
Course organizers
Teaching assistants

Responsible for the exercises:
• Richard Dirauf (M.Sc.), [email protected]
• Philipp Schlieper (M.Sc.), [email protected]
References

• Machine Learning: A Probabilistic Perspective, by Kevin Murphy (2012)
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009)
• Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016)
Time series fundamentals
Motivations
An old history of time series analysis: Babylonian astronomical diaries

VII century B.C.


“[…] Night of the 5th, beginning of the night,
the moon was 2 ½ cubits behind Leonis […]
Night of the 17th, last part of the night, the
moon stood 1 ½ cubits behind Mars, Venus
was below.”
• Babylonians collected the earliest evidence of periodic planetary phenomena
• Applied their mathematics for systematic astronomical predictions
An old history of time series analysis: Babylonian astronomical diaries

Nowadays, thousands of ground-based and space-based telescopes(a) generate new knowledge every night.
• The Vera C. Rubin Observatory in Chile is geared up to collect 20 terabytes per night from 2022(b).
• The Square Kilometre Array, the world's largest radio telescope, will generate up to 2 petabytes daily, starting in 2028.
• The next-generation Very Large Array (ngVLA) will generate hundreds of petabytes annually.

(a) https://research.arizona.edu/stories/space-versus-ground-telescopes
(b) https://www.nature.com/articles/d41586-020-02284-7
An old history of time series analysis: The Birth of Epidemiology

1662, John Graunt describes the data collection:

"When anyone dies, […] the same is known to the Searchers, corresponding with the said Sexton. The Searchers hereupon...examine by what Disease, or Casualty the corps died. Hereupon they make their Report to the Parish-Clerk, and he, every Tuesday night, carries in an Accompt of all the Burials, and Christnings, hapning that Week, to the Clerk of the Hall."

• Rudimentary conclusions about the mortality and morbidity of certain diseases
• Graunt's work is still used today to study population trends and mortality
Importance of time series

Machine learning on time series is becoming increasingly important because of the massive production of time series data from diverse sources, e.g.,
• Digitalization in healthcare
• Internet of things
• Smart cities
• Process monitoring

The amount of created data increased from two zettabytes in 2010 to 47 zettabytes in 2020.

https://www.statista.com
Example: Predicting demand of products

Amazon sells 400 million products in over 185 countries(a). Some products are sold depending on the season.
→ Maintaining surplus inventory levels for every product is cost-prohibitive.
→ Predict future demand of products

Methods over time:
• 2007: Statistical methods
• 2009: Random forests
• 2015: Feedforward neural networks
• 2017: RNN/CNN
• 2020: Transformers

□ First models required manual feature engineering
□ New methods are fully data-driven
Example: Duplex makes tedious phone calls

Long-standing goal of making humans have a natural conversation with machines, as they would with each other.
→ Carry out real-world tasks over the phone

■ Additional audio features
■ Automatic speech recognition
■ Desired service, time/day

E.g., Duplex calling a restaurant.

https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
Example: Duplex makes tedious phone calls

Method: An RNN with several features. A combination of a text-to-speech (TTS) engine and a synthesis TTS engine is used to control intonation (e.g., "hmm"s and "uh"s).

■ Additional audio features
■ Automatic speech recognition
■ Desired service, time/day

Limitations: trained on specific tasks; cannot handle general conversations.

E.g., Duplex calling a restaurant.

https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
Example: Activity recognition in sports (FAU Erlangen)

Many injuries in sports are caused by overuse.
→ These injuries are a major cause for reduced performance of professional and non-professional beach volleyball players.
→ Monitoring of player actions could help identify and understand risk factors and prevent such injuries.

Sensor attachment at the wrist of the dominant hand with a soft, thin wristband.

https://doi.org/10.1007/s10618-017-0495-0
Example: Activity recognition in sports (FAU Erlangen)

Method: A CNN is used to classify players' activities. The classifications allow creating players' profiles.

Actions:
• Underhand serve
• Overhand serve
• Jump serve
• Underarm set
• Overhead set
• Shot attack
• Spike
• Block
• Dig
• Null class

https://doi.org/10.1007/s10618-017-0495-0
Time series fundamentals
Definitions and basic properties
What is a time series?
A time series can be described as a set of observations, taken sequentially in time,

S = {s_1, …, s_T}

where s_i ∈ ℝ^d is the measured state of the observed process at time t_i.

Typically, observations are dependent:
• Studying the nature of this dependency is of particular interest
• Time series analysis is concerned with techniques for the analysis of these dependencies
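As an illustrative sketch (not from the slides), a time series can be stored as a plain array, and the dependence between consecutive observations can be quantified with a lag-1 sample correlation. The autoregressive-style recursion below is an assumed example, chosen only because it produces visibly dependent observations:

```python
import numpy as np

rng = np.random.default_rng(0)

# A univariate time series S = {s_1, ..., s_T} where each observation
# depends on the previous one (assumed example process).
T = 1000
s = np.zeros(T)
for t in range(1, T):
    s[t] = 0.8 * s[t - 1] + rng.normal()

# The lag-1 sample correlation quantifies the dependence between
# consecutive observations; it is far from 0 for this process.
lag1_corr = np.corrcoef(s[:-1], s[1:])[0, 1]
print(round(lag1_corr, 2))
```

For independent observations the same statistic would hover near zero.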
Terminology: Regularly Sampled vs Irregularly Sampled

Discrete time series are regularly sampled if their observations are equally spaced in time:

∀i ∈ {1, …, T−1}: Δ_i = t_{i+1} − t_i = const.

In contrast, for irregularly sampled time sequences, the observations are not equally spaced.
• They are generally defined as a collection of pairs

S = {(s_1, t_1), …, (s_T, t_T)}
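The constant-spacing condition is easy to check when the timestamps are available explicitly. A minimal sketch, with a hypothetical helper `is_regularly_sampled` and a floating-point tolerance, since timestamps rarely compare exactly equal:

```python
import numpy as np

def is_regularly_sampled(timestamps, tol=1e-9):
    """Check whether all sampling intervals t_{i+1} - t_i are constant."""
    deltas = np.diff(timestamps)
    return bool(np.all(np.abs(deltas - deltas[0]) < tol))

t_regular = np.array([0.0, 0.5, 1.0, 1.5, 2.0])    # equally spaced
t_irregular = np.array([0.0, 0.3, 1.1, 1.5, 2.9])  # not equally spaced

print(is_regularly_sampled(t_regular))    # True
print(is_regularly_sampled(t_irregular))  # False
```

For irregular series, keeping the (s_i, t_i) pairs together, as in the definition above, is what preserves the information that the spacing carries.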
Terminology: Univariate vs Multivariate

Let S = {s_1, …, s_T} be a time series, where s_i ∈ ℝ^d, ∀i ∈ {1, …, T}.

If d = 1, S is said to be univariate.
• Only one variable is varying over time.

If d > 1, S is said to be multivariate.
• Multiple variables are varying over time
• E.g., tri-axial accelerometer measurements
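In array terms, the distinction is simply the dimensionality of each observation. A sketch, with the tri-axial accelerometer replaced by random numbers as a stand-in:

```python
import numpy as np

T = 100

# Univariate (d = 1): one scalar value per time step, shape (T,).
s_uni = np.sin(np.linspace(0, 10, T))

# Multivariate (d = 3): e.g., a tri-axial accelerometer producing one
# observation s_i in R^3 per time step, shape (T, 3). Random values
# stand in for real sensor data here.
s_multi = np.random.default_rng(0).normal(size=(T, 3))

print(s_uni.ndim, s_multi.shape)  # 1 (100, 3)
```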
Terminology: Discrete vs Continuous

A time series is said to be continuous if observations are made at each instant of time, even
when its measurements consist only of a discrete set of values.
• E.g., the number of people in a room.

A time series is said to be discrete if observations are taken at specific times. Discrete time
series can arise in different ways:
• Sampled (e.g., daily rainfall)
• Aggregated (e.g., monthly reports of daily rainfalls)
Terminology: Discrete vs Continuous

We will denote as mixed-type a multivariate time series consisting of both continuous and
discrete observations
• E.g., a time series consisting of continuous sensor values and discrete event log
for the monitoring of an industrial machine
Terminology: Periodic

A time series is said to be periodic if there exists a number τ ∈ ℝ, called the period, such that

s_i = s_{i+τ}, ∀i ∈ {1, …, T − τ}

E.g., the continuous time series defined by the trigonometric function f(x) = sin(x).

Is the biological signal of a heartbeat a periodic function?
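For a sampled version of f(x) = sin(x), the defining condition s_i = s_{i+τ} can be verified directly, provided the sampling grid is chosen so that the period 2π corresponds to a whole number of samples (an assumption made for this sketch):

```python
import numpy as np

# Sample f(x) = sin(x) with exactly 100 samples per period, so the
# period tau = 2*pi corresponds to a shift of 100 indices.
samples_per_period = 100
x = np.arange(5 * samples_per_period) * (2 * np.pi / samples_per_period)
s = np.sin(x)

tau = samples_per_period
# Check s_i == s_{i+tau} for all valid i (up to floating-point error).
print(np.allclose(s[:-tau], s[tau:]))  # True
```

A real heartbeat signal would fail this exact check: it is only approximately repetitive, with beat-to-beat variation in both period and shape.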
Terminology: Deterministic vs Non-Deterministic
A deterministic time series is one that could be expressed explicitly by an analytical
expression.
• Observations are generated from a system with no randomness.

In contrast, a non-deterministic time series cannot be described by an analytic expression. A time series may be non-deterministic because:
• The information necessary to describe the process is not fully observable, or
• The process generating the time series is inherently random
Stochastic Process

Non-deterministic time series can be regarded as manifestations (equivalently, realizations) of a stochastic process, which is defined as a set of random variables {X_t}_{t∈{1,…,T}}.

Even if we were to imagine having observed the process for an infinite period of time, the infinite sequence

S = {…, s_{t−1}, s_t, s_{t+1}, …} = {s_t}_{t=−∞}^{+∞}

would still be a single realization from that process.


Stochastic Process

Still, if we had a battery of N computers generating series S^(1), …, S^(N), and considered selecting the observation at time t from each series,

s_t^(1), …, s_t^(N)

this would be described as a sample of N realizations of the random variable X_t.

This random variable X_t is associated with an unconditional density, denoted by

f_{X_t}(s_t)

• E.g., for the Gaussian white noise process: f_{X_t}(s_t) = (1/√(2πσ²)) e^(−s_t²/(2σ²))
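A quick numerical sanity check of this density, assuming the reconstructed formula above (the original slide equation was garbled in extraction): the density should integrate to 1, and the empirical mass of samples inside [−σ, σ] should match the familiar ≈68.3%:

```python
import numpy as np

sigma = 1.5

def f(s_t, sigma):
    """Unconditional Gaussian white noise density at time t (assumed form)."""
    return np.exp(-s_t**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

rng = np.random.default_rng(0)
N = 100_000
# N realizations of the random variable X_t: one draw per "computer".
samples = rng.normal(0.0, sigma, size=N)

# Riemann-sum integral of the density over a wide grid (~1), and the
# empirical fraction of samples within one sigma (~0.683).
grid = np.linspace(-10 * sigma, 10 * sigma, 10_001)
dx = grid[1] - grid[0]
mass = f(grid, sigma).sum() * dx
frac = np.mean(np.abs(samples) <= sigma)
print(round(mass, 3), round(frac, 3))
```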
Stochastic Process

The unconditional mean is the expectation, provided it exists, of the t-th observation, i.e.,

E[X_t] = ∫_{−∞}^{+∞} s_t f_{X_t}(s_t) ds_t = μ_t

Similarly, the variance of the random variable X_t is defined as

E[(X_t − μ_t)²] = ∫_{−∞}^{+∞} (s_t − μ_t)² f_{X_t}(s_t) ds_t
Stochastic Process

Given any particular realization S^(i) of a stochastic process (i.e., a time series), we can define the vector of the j + 1 most recent observations

x_t^(i) = [s_{t−j}^(i), …, s_t^(i)]

We want to know the probability distribution of this vector x_t across realizations. We can calculate the j-th autocovariance

γ_{jt} = E[(X_t − μ_t)(X_{t−j} − μ_{t−j})]

Stationarity

If neither the mean μ_t nor the autocovariance γ_{jt} depends on the temporal variable t, then the process is said to be (weakly) stationary.

E.g., let the stochastic process {X_t}_{t=−∞}^{+∞} represent the sum of a constant μ with a Gaussian white noise process {ε_t}_{t=−∞}^{+∞}, such that

X_t = μ + ε_t

Then, its mean is constant: E[X_t] = μ + E[ε_t] = μ
and its j-th autocovariance: E[(X_t − μ)(X_{t−j} − μ)] = γ_j

In other words: A process is said to be stationary if the process statistics do not depend on time.
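The X_t = μ + ε_t example can be simulated directly: across realizations, the mean at every time step is the same μ, and the lag-1 autocovariance is the same (here 0) at every t. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# N independent realizations of X_t = mu + eps_t, eps_t ~ N(0, 1).
mu, sigma = 3.0, 1.0
N, T = 100_000, 8
X = mu + rng.normal(0.0, sigma, size=(N, T))

# Mean at each time t, estimated across realizations: all ~mu.
means = X.mean(axis=0)

# Lag-1 autocovariance at each time t: all ~0 for white noise.
gam1 = [np.mean((X[:, t] - mu) * (X[:, t - 1] - mu)) for t in range(1, T)]

print(np.round(means, 2))               # roughly [3.0, 3.0, ...]
print(round(max(abs(g) for g in gam1), 2))  # roughly 0.0
```

Neither statistic drifts with t, which is exactly what (weak) stationarity asserts.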
Ergodicity

Given a time series, denoted by S^(i) = {s_1^(i), …, s_T^(i)}, we can compute the sample temporal average as

s̄ = (1/T) Σ_{t=1}^{T} s_t^(i)

The ergodicity of a time series binds the concept of the process mean with that of the temporal sample mean:
• A process is said to be ergodic if s̄ converges to μ_t as T → ∞

In other words: A process is said to be ergodic if its time statistics equal the process statistics, provided that the process is observed long enough.
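For the ergodic process X_t = μ + ε_t from the stationarity example, the temporal average of a single realization approaches the process mean μ as the observation window T grows, a sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3.0

# Temporal average s_bar of ONE realization of X_t = mu + eps_t,
# for increasing lengths T: it converges to the process mean mu.
s_bar = {}
for T in (10, 1_000, 100_000):
    s = mu + rng.normal(size=T)  # one realization of length T
    s_bar[T] = s.mean()

print({T: round(v, 2) for T, v in s_bar.items()})
```

The error shrinks roughly like 1/√T, so "observed long enough" is a quantitative statement here.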
Example: Stationarity and Ergodicity

To clarify the concept, we give an example of a stationary but not ergodic process. Suppose the mean μ^(i) of the i-th realization of {X_t}_{t=−∞}^{+∞} is sampled from the normal distribution 𝒩(0, λ²) and, similarly to the previous example, X_t^(i) = μ^(i) + ε_t.

We have that the process is stationary because:

μ_t = E[μ^(i)] + E[ε_t] = 0
γ_{jt} = E[(μ^(i) + ε_t)(μ^(i) + ε_{t−j})] = λ²
Example: Stationarity and Ergodicity

However, its sample temporal mean converges to a different value than the process mean, i.e.,

s̄ = (1/T) Σ_{t=1}^{T} (μ^(i) + ε_t) → μ^(i), as T → ∞
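This stationary-but-not-ergodic behaviour is easy to reproduce: the ensemble mean across many realizations is near the process mean 0, while the temporal mean of one long realization converges to that realization's own random offset μ^(i). A sketch under the distributional assumptions above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each realization i draws its own mean mu_i ~ N(0, lambda^2) once,
# then X_t = mu_i + eps_t for all t.
lam = 5.0
N, T = 10_000, 10_000
mu_i = rng.normal(0.0, lam, size=N)

# Ensemble mean at one fixed time t across N realizations: ~0.
x_t = mu_i + rng.normal(size=N)
print(round(x_t.mean(), 1))

# Temporal mean of ONE long realization: converges to its own mu_i,
# generally far from the process mean 0.
s = mu_i[0] + rng.normal(size=T)
print(round(s.mean(), 2), round(mu_i[0], 2))  # these two agree
```

Observing one realization forever thus never reveals the process mean, which is the failure of ergodicity.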


Time series fundamentals
i.i.d. observations and central limit theorems
Time series and i.i.d. data

Observations collected in a time series S = {s_1, …, s_T} are generally not i.i.d.
• Observation s_i could depend on previous observations s_j, with j < i
• The distribution of the underlying data generation process could change over time, i.e., it is not identically distributed

For example:
• The price of a stock today depends on its price yesterday (dependence)
• The volatility of the stock, i.e., its dispersion of returns, might change over time (change in the underlying distribution)
Time series and i.i.d. data
The structure of this dependence imposes challenges on the statistical data analysis of time
series.
• Many tools for statistical inference are valid only for i.i.d. data
Time series and i.i.d. data

It might be useful to assess the structure of the dependence between random variables. For this reason we make use of their correlation.
• Generally, we measure the correlation between two variables X_i and X_j with their covariance Cov(X_i, X_j).
• Cov(X_i, X_j) = 0 → uncorrelated
• We measure the dependence of an entire time series with a similar concept, the long-run variance:

σ² = Σ_{k∈ℤ} Cov(X_i, X_{i+k})
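For a stationary series the long-run variance can be estimated from one realization by truncating the sum over lags at some maximum lag K. A sketch (the truncation and the helper name `long_run_variance` are choices made here, not from the slides), checked on an MA(1) example where the true value is γ_0 + 2γ_1 = 1.25 + 2·0.5 = 2.25:

```python
import numpy as np

rng = np.random.default_rng(0)

def long_run_variance(x, K):
    """Truncated estimate of sum_k Cov(X_t, X_{t+k}) over |k| <= K."""
    x = x - x.mean()
    n = len(x)
    gamma = [np.dot(x[: n - k], x[k:]) / n for k in range(K + 1)]
    return gamma[0] + 2 * sum(gamma[1:])  # gamma_{-k} = gamma_k

# MA(1) example X_t = e_t + 0.5 * e_{t-1}: gamma_0 = 1.25, gamma_1 = 0.5,
# all higher-lag covariances 0, so the long-run variance is 2.25.
e = rng.normal(size=500_001)
x = e[1:] + 0.5 * e[:-1]
print(round(long_run_variance(x, K=5), 2))
```

Choosing K well is a real estimation problem (kernel/bandwidth methods exist for it); a fixed small K is only adequate when dependence dies out quickly, as it does here.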
The Central Limit Theorem

The Central Limit Theorem (CLT) states that the (suitably normalized) sum of random variables converges to a normal distribution, under precise conditions.

More precisely, for a sequence of i.i.d. random variables {X_t}_{t∈{1,…,T}} with μ = E[X_t] and σ² = E[(X_t − μ)²], by the CLT it holds:

√T ((1/T) Σ_{t=1}^{T} X_t − μ) → 𝒩(0, σ²)

For stationary time series with mean μ and long-run variance σ², the CLT holds as before.
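The i.i.d. statement can be checked by simulation: for uniform draws on [0, 1] (μ = 0.5, σ² = 1/12), the statistic √T·(sample mean − μ) over many repetitions should have mean ≈ 0 and variance ≈ 1/12. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Many independent experiments, each averaging T i.i.d. uniforms.
reps, T = 10_000, 400
X = rng.uniform(0.0, 1.0, size=(reps, T))  # mu = 0.5, sigma^2 = 1/12

# CLT statistic, one value per experiment.
z = np.sqrt(T) * (X.mean(axis=1) - 0.5)

print(round(z.mean(), 2), round(z.var(), 3))  # ~0.0 and ~0.083
```

Despite each X_t being uniform (nothing Gaussian in sight), the distribution of z is already close to 𝒩(0, 1/12) at T = 400.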
Why is the CLT important?

If the CLT holds for a time series, we can draw from a larger range of methods.
• Statistical inference depends on the possibility to generalize results from a sample to the population.
• The CLT legitimizes the assumption of normality of the error terms in linear regression.

However,
• Many time series we encounter in the real world do not satisfy the CLT assumptions of independence and stationarity
• But they can often be transformed into stationary time series, e.g., by differencing or other transformations

It is a good idea to start by checking whether the data is independent or stationary.


Insight: CLT for dependent random variables

Different versions of the CLT exist for dependent random variables. For example, under the assumption of an M-dependent random process(a), the following limit theorem holds:

Let {X_t}_{t∈{1,…,n}} be an M-dependent stationary process with mean μ and covariances γ_j, and denote by V_M the limiting (rescaled) variance of the mean of n observations,

V_M := Σ_{j=−M}^{M} γ_j

If V_M > 0, then

√n (X̄_n − μ) → 𝒩(0, V_M)

(a) A stochastic process {X_t} is said to be M-dependent if the variables {X_t}_{t≤k} are independent of the variables {X_t}_{t≥k+M+1}, for every k.
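A quick simulation of this dependent-data CLT, using the moving average X_t = (e_t + e_{t−1})/2 as an assumed 1-dependent example: here γ_0 = 1/2, γ_1 = 1/4, so V_M = γ_0 + 2γ_1 = 1, and √n·X̄_n should be approximately standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-dependent stationary process: X_t = (e_t + e_{t-1}) / 2 with
# i.i.d. standard normal e_t. Mean mu = 0, V_M = 0.5 + 2 * 0.25 = 1.
reps, n = 5_000, 1_000
e = rng.normal(size=(reps, n + 1))
X = (e[:, 1:] + e[:, :-1]) / 2.0

# sqrt(n) * (X_bar - mu) across many experiments: variance ~V_M = 1.
z = np.sqrt(n) * X.mean(axis=1)
print(round(z.mean(), 2), round(z.var(), 2))  # ~0.0 and ~1.0
```

Note the variance is V_M = 1, not the marginal variance γ_0 = 0.5: ignoring the dependence would understate the uncertainty of the mean.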
Time series fundamentals
Recap
Recap

Time series have long been studied in history
• Recent digitalization increases the importance of time series analysis

Properties of time series
• Regularly vs irregularly sampled
• Univariate vs multivariate
• Discrete vs continuous
• Periodic
• Deterministic vs non-deterministic
• Stationarity
• Ergodicity

The central limit theorem only holds for stationary time series
• Less restrictive CLT versions exist
• Need to properly learn dependences
