Introduction to Gaussian Process Models
César Lincoln Cavalcante Mattos
November 2015
Agenda
1 Introduction to GPs
- Why GPs?
- Basic Definitions
- GPs for Regression
- Covariance Function and Hyperparameters Optimization
- From Feature Space to GPs
2 Dynamical System Identification
3 Advanced Topics
- Sparse Models
- Classification
- Robust Learning
- Unsupervised Learning
- Deep Models
- More Nonlinear Dynamical Models
4 Conclusion
Beira Mar Avenue. Iracema Guerreira statue.

UFC
- 8 campi.
- 114 undergraduate courses.
- 146 graduate courses.
- 2,150 professors.
- 26,800 undergraduate students.
- 6,000 graduate students.

PPGETI
- 200 master's dissertations.
- 75 PhD theses.
Basics
Definition
If a vector of random variables $\mathbf{f} \in \mathbb{R}^N$ follows a multivariate Gaussian distribution, we can express it by

$$p(\mathbf{f} \mid \boldsymbol{\mu}, K) = \frac{1}{(2\pi)^{N/2}\,|K|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{f} - \boldsymbol{\mu})^\top K^{-1} (\mathbf{f} - \boldsymbol{\mu})\right), \tag{1}$$
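As a quick illustration, here is a minimal numpy sketch (function name of my own choosing) that draws samples from Eq. (1) via the Cholesky factorization of $K$:

```python
import numpy as np

def sample_mvn(mu, K, n_samples=5, jitter=1e-8):
    """Draw samples from N(mu, K) using the Cholesky factor of K."""
    N = mu.shape[0]
    # Small diagonal jitter keeps the factorization numerically stable.
    L = np.linalg.cholesky(K + jitter * np.eye(N))
    z = np.random.randn(N, n_samples)   # z ~ N(0, I)
    return mu[:, None] + L @ z          # mu + L z ~ N(mu, K)
```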
Marginalization
Given the joint distribution

$$\begin{bmatrix} \mathbf{f}_1 \\ \mathbf{f}_2 \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix} \right), \tag{2}$$

the observation of a larger collection of variables does not affect the distribution of smaller subsets, which implies that $\mathbf{f}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, K_{11})$ and $\mathbf{f}_2 \sim \mathcal{N}(\boldsymbol{\mu}_2, K_{22})$.
Conditioning
Conditioning on Gaussians results in a new Gaussian distribution, given by

$$p(\mathbf{f}_1 \mid \mathbf{f}_2 = \mathbf{y}) = \mathcal{N}\left(\mathbf{f}_1 \,\big|\, \boldsymbol{\mu}_1 + K_{12} K_{22}^{-1} (\mathbf{y} - \boldsymbol{\mu}_2),\; K_{11} - K_{12} K_{22}^{-1} K_{21}\right). \tag{3}$$
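Eq. (3) translates directly into a few lines of numpy; this is only a sketch, with $K_{21}$ taken as $K_{12}^\top$:

```python
import numpy as np

def condition_gaussian(mu1, mu2, K11, K12, K22, y):
    """Eq. (3): posterior of f1 after observing f2 = y."""
    # Solve linear systems instead of forming the inverse of K22 explicitly.
    mean = mu1 + K12 @ np.linalg.solve(K22, y - mu2)
    cov = K11 - K12 @ np.linalg.solve(K22, K12.T)
    return mean, cov
```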
Gaussian Process
Definition
A GP defines a distribution over functions

$$f : \mathcal{X} \to \mathbb{R}, \tag{4}$$

such that any finite collection of function values $\mathbf{f} = f(X)$ jointly follows a multivariate Gaussian distribution.
Samples from a GP
Nonlinear Regression
$$\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N} = (X, \mathbf{y}), \tag{6}$$
$$y_i = f(\mathbf{x}_i) + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma_n^2). \tag{7}$$
Standard GP modeling
Choose a multivariate Gaussian prior for the unknown function:

$$\mathbf{f} = f(X) \sim \mathcal{N}(\mathbf{f} \mid \mathbf{0}, K), \tag{8}$$
$$K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j), \tag{9}$$

where $K \in \mathbb{R}^{N \times N}$ is the covariance matrix, obtained with a kernel function $k(\cdot, \cdot)$.

A common choice is the squared exponential function:

$$k(\mathbf{x}_i, \mathbf{x}_j) = \sigma_f^2 \exp\left(-\frac{1}{2} \sum_{d=1}^{D} w_d^2 (x_{id} - x_{jd})^2\right). \tag{10}$$
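The kernel of Eq. (10) and prior samples from Eq. (8) can be sketched as follows (for brevity, a single shared length-scale parameter w replaces the per-dimension weights w_d):

```python
import numpy as np

def se_kernel(X1, X2, sigma_f=1.0, w=1.0):
    """Squared exponential kernel, Eq. (10), with one shared weight w."""
    sq_dists = np.sum((X1[:, None, :] - X2[None, :, :])**2, axis=-1)
    return sigma_f**2 * np.exp(-0.5 * w**2 * sq_dists)

# Prior samples, Eq. (8): f = f(X) ~ N(0, K).
X = np.linspace(-5, 5, 100)[:, None]
K = se_kernel(X, X)
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(X)))
f_prior = L @ np.random.randn(len(X), 5)   # five sample functions
```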
Standard GP modeling
Likelihood
Considering the observation of a Gaussian noisy version of $\mathbf{f}$, we have

$$p(\mathbf{y} \mid \mathbf{f}) = \mathcal{N}(\mathbf{y} \mid \mathbf{f}, \sigma_n^2 I). \tag{11}$$
Marginal likelihood
The marginal distribution of $\mathbf{y}$ is calculated by integrating out $\mathbf{f}$:

$$p(\mathbf{y} \mid X) = \int p(\mathbf{y} \mid \mathbf{f})\, p(\mathbf{f} \mid X)\, \mathrm{d}\mathbf{f} = \int \mathcal{N}(\mathbf{y} \mid \mathbf{f}, \sigma_n^2 I)\, \mathcal{N}(\mathbf{f} \mid \mathbf{0}, K)\, \mathrm{d}\mathbf{f} \tag{12}$$
$$= \mathcal{N}(\mathbf{y} \mid \mathbf{0}, K + \sigma_n^2 I). \tag{13}$$
Standard GP modeling
Inference
Inference for $f_*$, given a new input $\mathbf{x}_*$, is obtained by conditioning:

$$\begin{bmatrix} \mathbf{y} \\ f_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} K + \sigma_n^2 I & \mathbf{k}_{f*} \\ \mathbf{k}_{*f} & k_{**} \end{bmatrix}\right), \tag{14}$$
$$p(f_* \mid \mathbf{x}_*, X, \mathbf{y}) = \mathcal{N}\left(\mathbf{k}_{*f} (K + \sigma_n^2 I)^{-1} \mathbf{y},\; k_{**} - \mathbf{k}_{*f} (K + \sigma_n^2 I)^{-1} \mathbf{k}_{f*}\right), \tag{15}$$

where

$$K = K(X, X), \tag{16}$$
$$\mathbf{k}_{*f} = [K(\mathbf{x}_*, \mathbf{x}_1), \cdots, K(\mathbf{x}_*, \mathbf{x}_N)], \tag{17}$$
$$\mathbf{k}_{f*} = \mathbf{k}_{*f}^\top, \tag{18}$$
$$k_{**} = K(\mathbf{x}_*, \mathbf{x}_*). \tag{19}$$
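A direct implementation of Eqs. (15)-(19) might look as follows (a sketch; `kernel` is any covariance function, such as the se_kernel above):

```python
import numpy as np

def gp_predict(X, y, X_star, sigma_n2, kernel):
    """Posterior mean and variance at test inputs X_star, Eqs. (15)-(19)."""
    K = kernel(X, X)                             # Eq. (16)
    k_sf = kernel(X_star, X)                     # Eq. (17), one row per test input
    k_ss = kernel(X_star, X_star)                # Eq. (19)
    L = np.linalg.cholesky(K + sigma_n2 * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = k_sf @ alpha                          # posterior mean, Eq. (15)
    v = np.linalg.solve(L, k_sf.T)
    var = np.diag(k_ss) - np.sum(v**2, axis=0)   # posterior variance, Eq. (15)
    return mean, var
```

The Cholesky-based solves avoid forming $(K + \sigma_n^2 I)^{-1}$ explicitly, which is both cheaper and numerically safer.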
Samples from a GP
Figure 4: Samples from the posterior (after the observation of $\mathbf{y}$), without noise ($\sigma_n^2 = 0$).
Figure 5: Samples from the posterior (after the observation of $\mathbf{y}$), with noise ($\sigma_n^2 = 0.01$).
Covariance Matrix
Hyperparameters Optimization
The noise variance $\sigma_n^2$ is included in the vector of hyperparameters $\boldsymbol{\theta}$, which is optimized by maximizing the marginal log-likelihood $\mathcal{L}(\boldsymbol{\theta}) = \log p(\mathbf{y} \mid X, \boldsymbol{\theta})$, the so-called evidence of the model:

$$\mathcal{L}(\boldsymbol{\theta}) = \underbrace{-\frac{1}{2} \log\left|K + \sigma_n^2 I\right|}_{\text{model capacity}} \underbrace{-\frac{1}{2} \mathbf{y}^\top (K + \sigma_n^2 I)^{-1} \mathbf{y}}_{\text{data fitting}} - \frac{N}{2} \log(2\pi). \tag{20}$$
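Eq. (20) is typically maximized with a gradient-based optimizer. A minimal sketch using scipy, reusing the se_kernel sketch from Eq. (10); log-parameters keep the hyperparameters positive, and training data X, y as in Eq. (6) are assumed:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_evidence(log_theta, X, y):
    """Negative of Eq. (20), with theta = (sigma_f, w, sigma_n)."""
    sigma_f, w, sigma_n = np.exp(log_theta)
    K = se_kernel(X, X, sigma_f, w) + sigma_n**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (np.sum(np.log(np.diag(L)))      # model capacity: 0.5 * log|K|
            + 0.5 * y @ alpha               # data fitting
            + 0.5 * len(X) * np.log(2 * np.pi))

res = minimize(neg_log_evidence, x0=np.zeros(3), args=(X, y))
sigma_f, w, sigma_n = np.exp(res.x)         # optimized hyperparameters
```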
Kernel Function

Alternative View

Dynamical System Identification
Introduction
Black-box modeling
The model is obtained only from the system's inputs and outputs.
NARX model

$$y_i = f(\mathbf{x}_i) + \epsilon_i, \quad \mathbf{x}_i = [y_{i-1}, \cdots, y_{i-L_y}, u_{i-1}, \cdots, u_{i-L_u}]^\top,$$

where $\mathbf{x}_i$ is the regressor vector (or state), $L_y$ and $L_u$ are some specified lags, $f(\cdot)$ is the transition function and $\epsilon_i$ is a Gaussian noise.
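Building the NARX regressor matrix from an input/output record is mechanical; a small sketch (function name of my own choosing):

```python
import numpy as np

def build_narx_regressors(u, y, Ly, Lu):
    """Stack x_i = [y_{i-1},...,y_{i-Ly}, u_{i-1},...,u_{i-Lu}] and targets y_i."""
    start = max(Ly, Lu)
    X, targets = [], []
    for i in range(start, len(y)):
        X.append(np.concatenate([y[i - Ly:i][::-1],    # y_{i-1}, ..., y_{i-Ly}
                                 u[i - Lu:i][::-1]]))  # u_{i-1}, ..., u_{i-Lu}
        targets.append(y[i])
    return np.array(X), np.array(targets)
```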
GP-NARX
Experiments
Validation
- 1-step ahead prediction: the prediction is performed based on past inputs and known observed outputs.
- Infinite-step ahead prediction (free simulation): the prediction is performed based on past inputs and past predictions; a minimal loop is sketched below.
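A minimal free-simulation loop, assuming a `predict` callable that maps a regressor vector to a point prediction (e.g. the GP posterior mean):

```python
import numpy as np

def free_simulation(u, y_init, Ly, Lu, predict):
    """Feed predictions back as future regressors (infinite-step ahead).
    y_init holds the first max(Ly, Lu) observed outputs."""
    y_sim = list(y_init)
    for i in range(len(y_init), len(u)):
        x_i = np.concatenate([np.array(y_sim[i - Ly:i])[::-1],  # past predictions
                              u[i - Lu:i][::-1]])               # past inputs
        y_sim.append(predict(x_i))
    return np.array(y_sim)
```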
Metrics
- Root Mean Square Error: $\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu_i)^2}$.
- Negative Log Density Error: $\mathrm{NLD} = \frac{1}{2N} \sum_{i=1}^{N} \left[ \log(2\pi) + \log(\sigma_i^2) + \frac{(y_i - \mu_i)^2}{\sigma_i^2} \right]$,

where $\mu_i$ and $\sigma_i^2$ are the predictive mean and variance.
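Both metrics are one-liners given the predictive means and variances:

```python
import numpy as np

def rmse(y, mu):
    """Root Mean Square Error between observations y and predictions mu."""
    return np.sqrt(np.mean((y - mu)**2))

def nld(y, mu, var):
    """Average negative log density of y under N(mu_i, sigma_i^2)."""
    return 0.5 * np.mean(np.log(2 * np.pi) + np.log(var) + (y - mu)**2 / var)
```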
Artificial datasets¹

Dataset 1: $y_i = \dfrac{y_{i-1}\, y_{i-2}\, (y_{i-1} + 2.5)}{1 + y_{i-1}^2 + y_{i-2}^2} + u_{i-1}$
- Estimation: $u_i = \mathcal{U}(-2, 2)$, 300 samples. Test: $u_i = \sin(2\pi i/25)$, 100 samples. Noise: $\mathcal{N}(0, 0.29)$.

Dataset 2: $y_i = \dfrac{y_{i-1}}{1 + y_{i-1}^2} + u_{i-1}^3$
- Estimation: $u_i = \mathcal{U}(-2, 2)$, 300 samples. Test: $u_i = \sin(2\pi i/25) + \sin(2\pi i/10)$, 100 samples. Noise: $\mathcal{N}(0, 0.65)$.

Dataset 3: $y_i = 0.8\, y_{i-1} + (u_{i-1} - 0.8)\, u_{i-1}\, (u_{i-1} + 0.5)$
- Estimation: $u_i = \mathcal{U}(-1, 1)$, 300 samples. Test: $u_i = \sin(2\pi i/25)$, 100 samples. Noise: $\mathcal{N}(0, 0.07)$.

Dataset 4: $y_i = y_{i-1} - 0.5 \tanh\left(y_{i-1} + u_{i-1}^3\right)$
- Estimation: $u_i = \mathcal{N}(u_i \mid 0, 1)$, $-1 \le u_i \le 1$, 150 samples. Test: same distribution, 150 samples. Noise: $\mathcal{N}(0, 0.0025)$.

Dataset 5: $y_i = 0.3\, y_{i-1} + 0.6\, y_{i-2} + 0.3 \sin(3\pi u_{i-1}) + 0.1 \sin(5\pi u_{i-1})$
- Estimation: $u_i = \mathcal{U}(-1, 1)$, 500 samples. Test: $u_i = \sin(2\pi i/250)$, 500 samples. Noise: $\mathcal{N}(0, 0.18)$.

¹ Narendra, K. S. and Parthasarathy, K., "Identification and control of dynamical systems using neural networks", 1990; Kocijan, J. et al., "Dynamic systems identification with Gaussian processes", 2005.
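For concreteness, the estimation data of Dataset 1 can be simulated as below (a sketch; treating the quoted noise level as a variance is my assumption):

```python
import numpy as np

def artificial_1(u, noise_var):
    """Simulate dataset 1 driven by input u, plus Gaussian output noise."""
    n = len(u)
    y = np.zeros(n)
    for i in range(2, n):
        y[i] = (y[i-1] * y[i-2] * (y[i-1] + 2.5)
                / (1 + y[i-1]**2 + y[i-2]**2) + u[i-1])
    return y + np.sqrt(noise_var) * np.random.randn(n)

u_est = np.random.uniform(-2, 2, 300)   # estimation input, 300 samples
y_est = artificial_1(u_est, 0.29)
```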
Artificial-1 dataset
Advanced Topics

Sparse Models
Sparse GP Approximations
Classification
Robust Learning
$$x_i = f(x_{i-1}, \cdots, x_{i-L_x}, u_{i-1}, \cdots, u_{i-L_u}) + \epsilon_i^{(x)}, \quad \epsilon_i^{(x)} \sim \mathcal{N}\left(\epsilon_i^{(x)} \mid 0, \sigma_x^2\right), \tag{34}$$
$$y_i = x_i + \epsilon_i^{(y)}, \quad \epsilon_i^{(y)} \sim \mathcal{N}\left(\epsilon_i^{(y)} \mid 0, \tau_i^{-1}\right), \quad \tau_i \sim \Gamma(\tau_i \mid \alpha, \beta). \tag{35}$$
³ Mattos, C. L. C. et al., "Latent Autoregressive Gaussian Process Models for Robust System Identification", submitted to DYCOPS 2016.
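The Gamma prior on the precision $\tau_i$ is what buys robustness: marginally, the output noise of Eq. (35) becomes Student-t, whose heavy tails absorb outliers. A tiny demonstration (the values of alpha and beta below are illustrative, not taken from the paper):

```python
import numpy as np

alpha, beta = 2.0, 1.0                        # illustrative shape/rate values
tau = np.random.gamma(alpha, 1.0 / beta, 100000)
eps = np.random.randn(100000) / np.sqrt(tau)  # eps ~ N(0, 1/tau), as in Eq. (35)
# Marginally, eps is Student-t with 2*alpha degrees of freedom; it produces
# far more 3-sigma events than a Gaussian of matched variance:
print(np.mean(np.abs(eps) > 3 * eps.std()))
```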
GP-RLARX for System Identification
Figure: RMSE values for free simulation with different levels of contamination by outliers (panels: Artificial 4 and Artificial 5).
Unsupervised Learning
Deep Models
Deep GPs⁶

⁶ Damianou, A. and Lawrence, N., "Deep Gaussian Processes", 2013.
RGP (Recurrent GP)⁹

where

$$\hat{\mathbf{x}}_i^{(1)} = \left[\mathbf{x}_{i-1}^{(1)\top}, \mathbf{u}_{i-1}^\top\right]^\top = \left[\left[x_{i-1}^{(1)}, \cdots, x_{i-L}^{(1)}\right], \left[u_{i-1}, \cdots, u_{i-L_u}\right]\right]^\top, \quad \text{if } h = 1,$$
$$\hat{\mathbf{x}}_i^{(h)} = \left[\mathbf{x}_{i-1}^{(h)\top}, \mathbf{x}_i^{(h-1)\top}\right]^\top = \left[\left[x_{i-1}^{(h)}, \cdots, x_{i-L}^{(h)}\right], \left[x_i^{(h-1)}, \cdots, x_{i-L+1}^{(h-1)}\right]\right]^\top, \quad \text{if } 1 < h \le H,$$
$$\hat{\mathbf{x}}_i^{(H)} = \left[x_i^{(H)}, \cdots, x_{i-L+1}^{(H)}\right]^\top, \quad \text{if } h = H + 1.$$

⁹ Work in progress!
RGP for System Identification (free simulation)
Conclusion

Final Remarks
References