
Process Convolutions

Process convolutions provide a convenient representation of Gaussian processes. A Gaussian process can be defined by convolving a kernel function with white noise. This results in a stationary process whose covariance is related to the Fourier transform of the kernel. Discrete approximations of the kernel convolution can be used for practical applications and allow for non-stationary processes by using location-dependent kernels or latent processes. The model can be fitted by treating it as a linear model with a specially structured design matrix defined by the kernel. Generalizations allow for non-Gaussian latent processes and spatially-varying kernels.

Uploaded by Felipe Leiva
Copyright © All Rights Reserved

Process Convolutions

A convenient representation of a Gaussian process is given by process convolutions. Consider a kernel function $k(s)$ and a white noise process $w(s)$, with $E(w(s)) = 0$, $\mathrm{var}(w(s)) = \sigma^2$ and $\mathrm{cov}(w(s), w(s')) = 0$ for $s \neq s'$. Then define

$$x(s) = \int_S k(s - u)\, w(u)\, du.$$

More formally, we define the process as

$$x(s) = \int_S k(s - u)\, dB(u), \quad \text{where } \int_A dB(u) = B(A) \sim N(0, \sigma^2 |A|)$$

and $\mathrm{cov}(B(A), B(C)) = \sigma^2 |A \cap C|$, which corresponds to a Brownian motion.
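A minimal NumPy sketch of the formal definition, assuming a Gaussian kernel $k(t) = \exp(-t^2/(2\ell^2))$ and truncating $S$ to an interval: the integral is replaced by a sum over cells of width `du`, each carrying an independent increment $B(A) \sim N(0, \sigma^2 |A|)$. The function names and default values are illustrative, not taken from the slides.

```python
import numpy as np


def gaussian_kernel(t, ell=1.0):
    # Illustrative kernel choice (an assumption): k(t) = exp(-t^2 / (2 ell^2)).
    return np.exp(-0.5 * (t / ell) ** 2)


def simulate_convolution(s, sigma2=1.0, ell=1.0, lo=-10.0, hi=10.0, du=0.05,
                         n_rep=1, rng=None):
    """Approximate x(s) = int_S k(s - u) dB(u) with S truncated to [lo, hi].

    S is split into cells of width du; each cell A receives an independent
    increment B(A) ~ N(0, sigma2 * |A|), as in the Brownian-motion definition.
    Returns an array of shape (n_rep, len(s)).
    """
    rng = np.random.default_rng() if rng is None else rng
    u = np.arange(lo, hi, du) + du / 2                       # cell midpoints
    dB = rng.normal(0.0, np.sqrt(sigma2 * du), size=(n_rep, u.size))
    K = gaussian_kernel(np.asarray(s, dtype=float)[:, None] - u[None, :], ell)
    return dB @ K.T


# Usage: 5000 replicates at two locations one unit apart.
rng = np.random.default_rng(0)
x = simulate_convolution(np.array([0.0, 1.0]), lo=-8.0, hi=8.0, n_rep=5000, rng=rng)
```

With many replicates the sample variance should approach $\sigma^2 \int k^2(u)\,du = \sigma^2 \ell \sqrt{\pi}$ for this kernel.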

Process Convolutions
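We have that $E(x(s)) = 0$,

$$\mathrm{var}(x(s)) = \sigma^2 \int k^2(s - u)\, du$$

and

$$\mathrm{cov}(x(s), x(s')) = \sigma^2 \int k(s - u)\, k(s' - u)\, du = \sigma^2 \int k(t)\, k(t - d)\, dt,$$

where $d = s - s'$ and $t = s - u$. Since the covariance depends on $s$ and $s'$ only through $d$, the process $x(s)$ is stationary.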
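The moment formulas can be checked numerically. This sketch (Gaussian kernel assumed; the integral is approximated by a Riemann sum on a wide grid; names are illustrative) evaluates $\sigma^2 \int k(s-u)\,k(s'-u)\,du$ for two pairs of locations with the same separation. For this kernel the closed form is $\sigma^2 \ell \sqrt{\pi}\, \exp(-d^2/(4\ell^2))$.

```python
import numpy as np


def gaussian_kernel(t, ell=1.0):
    # Illustrative kernel choice (an assumption): k(t) = exp(-t^2 / (2 ell^2)).
    return np.exp(-0.5 * (t / ell) ** 2)


def conv_cov(s, s_prime, sigma2=1.0, ell=1.0, half_width=30.0, du=0.005):
    # sigma2 * int k(s - u) k(s' - u) du, by a Riemann sum on [-half_width, half_width).
    u = np.arange(-half_width, half_width, du)
    return sigma2 * np.sum(gaussian_kernel(s - u, ell) * gaussian_kernel(s_prime - u, ell)) * du


# Two pairs with the same separation d = 0.7 at different positions.
c_a = conv_cov(0.0, 0.7, ell=1.5)
c_b = conv_cov(3.0, 3.7, ell=1.5)
closed_form = 1.5 * np.sqrt(np.pi) * np.exp(-0.7 ** 2 / (4 * 1.5 ** 2))
```

The implied correlation $\exp(-d^2/(4\ell^2))$ is itself Gaussian, in line with the statement that the Gaussian correlation corresponds to a Gaussian kernel.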


The Fourier transform of the covariance of $x(s)$ is $\sigma^2$ times the squared modulus of the Fourier transform of $k$. Thus, for a given covariance $C$ (taking $\sigma^2 = 1$), the corresponding kernel is the inverse transform of the square root of the spectrum of $C$:

$$k = \mathrm{IFT}\left(\sqrt{\mathrm{FT}(C)}\right)$$
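A discrete version of $k = \mathrm{IFT}(\sqrt{\mathrm{FT}(C)})$ with NumPy's FFT, offered as a sketch: it assumes a symmetric kernel with a nonnegative spectrum, covariance values sampled on a centred grid wide enough that $C$ is negligible at the edges, and it absorbs the grid spacing into the scaling. The helper name is hypothetical. Recovering the Gaussian kernel from the Gaussian covariance $\sqrt{\pi}\,\ell\,e^{-t^2/(4\ell^2)}$ (with $\sigma^2 = 1$) serves as a check.

```python
import numpy as np


def kernel_from_covariance(C_vals, dt, sigma2=1.0):
    """Recover a symmetric kernel k from covariance values on a centred grid.

    On a grid with spacing dt, C is approximately sigma2 * dt times the discrete
    autocorrelation of k, so DFT(C) = sigma2 * dt * DFT(k)^2 for symmetric k and
    DFT(k) = sqrt(DFT(C) / (sigma2 * dt)).
    """
    spec = np.fft.fft(np.fft.ifftshift(C_vals)).real   # put t = 0 at index 0
    spec = np.clip(spec, 0.0, None)                    # remove tiny negative round-off
    k = np.fft.ifft(np.sqrt(spec / (sigma2 * dt))).real
    return np.fft.fftshift(k)                          # back to the centred grid


# Gaussian covariance sqrt(pi) * ell * exp(-t^2 / (4 ell^2)) with sigma2 = 1.
N, ell = 1024, 1.0
dt = 40.0 / N
t = (np.arange(N) - N // 2) * dt
C = np.sqrt(np.pi) * ell * np.exp(-t ** 2 / (4 * ell ** 2))
k_rec = kernel_from_covariance(C, dt)
k_true = np.exp(-0.5 * (t / ell) ** 2)
```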

Process Convolutions Examples
The Gaussian correlation corresponds to a Gaussian kernel.
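The Matérn correlation corresponds to the kernel

$$k(s) = \left(\frac{|s|}{\phi}\right)^{(\nu-1)/2} K_{(\nu-1)/2}\left(\frac{|s|}{\phi}\right), \quad \phi > 0,\ \nu > 1,$$

where $K_{(\nu-1)/2}$ is the modified Bessel function of the second kind.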

The exponential correlation corresponds to a spike: in the limiting case $\nu = 1$ the kernel is $K_0(|s|/\phi)$, which diverges at the origin.


The covariance has twice the number of derivatives that the kernel
has.
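For the special case $\nu = 2$ of the one-dimensional Matérn kernel $k(s) = (|s|/\phi)^{(\nu-1)/2} K_{(\nu-1)/2}(|s|/\phi)$, the Bessel function has the closed form $K_{1/2}(r) = \sqrt{\pi/(2r)}\, e^{-r}$, so the kernel is proportional to $e^{-|s|/\phi}$. The sketch below (NumPy only, quadrature on a truncated grid; names are illustrative) checks that the implied correlation is $(1 + |d|/\phi)\, e^{-|d|/\phi}$, the Matérn form with smoothness $3/2$.

```python
import numpy as np


def matern_kernel_nu2(t, phi=1.0):
    # Matern kernel with nu = 2: order (nu - 1)/2 = 1/2 and K_{1/2}(r) = sqrt(pi/(2r)) e^{-r},
    # so r^{1/2} K_{1/2}(r) = sqrt(pi/2) e^{-r} with r = |t| / phi.
    return np.sqrt(np.pi / 2) * np.exp(-np.abs(t) / phi)


def implied_correlation(d, phi=1.0, half_width=40.0, dt=0.001):
    # corr(d) = int k(t) k(t - d) dt / int k(t)^2 dt, by a Riemann sum on a truncated grid.
    t = np.arange(-half_width, half_width, dt)
    k0 = matern_kernel_nu2(t, phi)
    return np.sum(k0 * matern_kernel_nu2(t - d, phi)) / np.sum(k0 ** 2)


phi = 2.0
ds = np.array([0.5, 1.0, 2.0])
numeric = np.array([implied_correlation(d, phi) for d in ds])
matern_32 = (1 + ds / phi) * np.exp(-ds / phi)
```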
A kernel offering varying degrees of smoothing and compact
support is the Bezier kernel

$$k(s) = \begin{cases} (1 - \|s\|^2)^{\nu} & \text{if } \|s\| < 1, \\ 0 & \text{otherwise,} \end{cases} \qquad \nu > 0.$$
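A sketch of the Bezier kernel as a function of $r = \|s\|$, plus a one-dimensional quadrature of the implied covariance (with $\sigma^2 = 1$). Because $k$ vanishes outside the unit ball, the covariance is exactly zero for separations $|d| \geq 2$. The names and the default $\nu = 2$ are illustrative choices.

```python
import numpy as np


def bezier_kernel(r, nu=2.0):
    # Bezier kernel as a function of r = ||s||: (1 - r^2)^nu for r < 1, and 0 otherwise.
    r = np.asarray(r, dtype=float)
    base = np.clip(1.0 - r ** 2, 0.0, None)   # clip avoids negative bases outside the support
    return np.where(r < 1.0, base ** nu, 0.0)


def bezier_cov_1d(d, nu=2.0, dt=1e-3):
    # 1-D covariance (sigma^2 = 1) by the midpoint rule; the integrand is zero outside
    # [-1, 1] because the kernel has compact support.
    t = np.arange(-1.0, 1.0, dt) + dt / 2
    return np.sum(bezier_kernel(np.abs(t), nu) * bezier_kernel(np.abs(t - d), nu)) * dt
```

For $\nu = 2$, the variance is $\int_{-1}^{1} (1 - t^2)^4\, dt = 256/315$.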

Discrete Approximations

In practice we consider a grid of points in $S$ to approximate the kernel convolution. So, for grid points $u_1, \ldots, u_p$, we have that

$$x(s) = \sum_{j=1}^{p} k(s - u_j)\, w(u_j).$$

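We observe that

$$\mathrm{var}(x(s)) = \sigma^2 \sum_{j=1}^{p} k^2(s - u_j)$$

and

$$\mathrm{cov}(x(s), x(s')) = \sigma^2 \sum_{j=1}^{p} k(s - u_j)\, k(s' - u_j),$$

which imply that the resulting covariance is NOT stationary: it depends on the positions of $s$ and $s'$ relative to the grid, not only on $s - s'$.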
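To see the non-stationarity concretely, this sketch compares $\mathrm{var}(x(s))$ at a knot and halfway between knots. The knots at $0, 1, \ldots, 10$ and the Gaussian kernel with bandwidth `ell` are assumptions for illustration.

```python
import numpy as np


def discrete_cov(s, s_prime, knots, sigma2=1.0, ell=0.3):
    # sigma2 * sum_j k(s - u_j) k(s' - u_j) with a Gaussian kernel of bandwidth ell
    # (kernel and bandwidth are illustrative assumptions).
    def k(t):
        return np.exp(-0.5 * (t / ell) ** 2)
    return sigma2 * np.sum(k(s - knots) * k(s_prime - knots))


knots = np.arange(0.0, 11.0)                    # hypothetical grid u_j = 0, 1, ..., 10
v_knot = discrete_cov(5.0, 5.0, knots)          # on a knot
v_mid = discrete_cov(5.5, 5.5, knots)           # halfway between knots
v_knot_wide = discrete_cov(5.0, 5.0, knots, ell=1.0)
v_mid_wide = discrete_cov(5.5, 5.5, knots, ell=1.0)
```

With `ell = 0.3` the variance at a knot is several times larger than between knots; with `ell = 1.0`, equal to the knot spacing, the two agree to within about 0.1%, so a kernel that is wide relative to the grid makes the discrete approximation nearly stationary.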

Non-Stationarity

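The continuous version of the kernel convolution can be used to obtain non-stationary Gaussian processes by:

- Using a kernel that depends on location-varying parameters:

$$x(s) = \int_S k(s - u; \theta(u))\, w(u)\, du$$

- Convolving processes whose covariance functions depend on location-varying parameters:

$$x(s) = \int_S k(s - u)\, w_{\theta(u)}(s)\, du,$$

where $w_{\theta(u)}$ is a process whose covariance function has parameters $\theta(u)$.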
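A sketch of the first construction, assuming a Gaussian kernel whose bandwidth $\ell(u)$ (playing the role of $\theta(u)$) increases linearly from 0.5 to 2 over $[0, 10]$; the correlation is computed by quadrature on a truncated grid. The bandwidth function and names are hypothetical.

```python
import numpy as np


def ell_of_u(u):
    # Hypothetical location-varying bandwidth: 0.5 for u <= 0, rising linearly to 2.0 at u >= 10.
    return 0.5 + 0.15 * np.clip(u, 0.0, 10.0)


def nonstat_corr(s, s_prime, lo=-20.0, hi=30.0, du=0.005):
    # Correlation of x(s) = int k(s - u; theta(u)) w(u) du for a Gaussian kernel whose
    # bandwidth depends on the integration variable u (sigma^2 cancels in the ratio).
    u = np.arange(lo, hi, du)
    ell = ell_of_u(u)
    k1 = np.exp(-0.5 * ((s - u) / ell) ** 2)
    k2 = np.exp(-0.5 * ((s_prime - u) / ell) ** 2)
    return np.sum(k1 * k2) / np.sqrt(np.sum(k1 ** 2) * np.sum(k2 ** 2))


r_left = nonstat_corr(1.0, 2.0)    # narrow kernels near s = 1
r_right = nonstat_corr(8.0, 9.0)   # wide kernels near s = 8
```

Pairs with the same separation are more strongly correlated where the kernel is wider, so the correlation depends on location and not only on $s - s'$.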

Fitting the Model
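Given a set of observations $y_1, \ldots, y_m$ at locations $s_1, \ldots, s_m$, we fit the model

$$y_i = \mu(s_i) + \sum_{j=1}^{p} k(s_i - u_j; \theta)\, w_j + \epsilon_i, \quad \epsilon_i \sim N(0, \tau^2),$$

with priors $w_j \sim N(0, \sigma^2)$, $p(\sigma^2)$, $p(\theta)$ and $p(\tau^2)$.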


This is a hierarchical linear model whose design matrix is defined by the kernel:

$$Y = \mu + K(\theta) w + \epsilon,$$

where $K(\theta)$ is the $m \times p$ matrix with entries $k(s_i - u_j; \theta)$. The dimension of $w$ is $p$, the size of the grid used for $w(s)$. When $p$ is much smaller than $m$, this results in an important reduction in the dimension of the problem for computational purposes. Usually $\mu(s) = z(s)^T \beta$.
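Conditional on $\theta$, $\sigma^2$ and $\tau^2$ the model is conjugate, so the posterior of $w$ is Gaussian. This sketch assumes $\mu(s) = 0$ (for example, after removing the drift) and a Gaussian kernel; it solves only $p \times p$ systems, which is where the dimension reduction pays off. It is one building block of a sampler, not the full fitting procedure, and the function names are hypothetical.

```python
import numpy as np


def design_matrix(sites, knots, ell):
    # K(theta)_{ij} = k(s_i - u_j; theta), here a Gaussian kernel with bandwidth ell (assumption).
    return np.exp(-0.5 * ((sites[:, None] - knots[None, :]) / ell) ** 2)


def posterior_w(y, K, sigma2, tau2):
    """Posterior of w in Y = K w + eps with w ~ N(0, sigma2 I), eps ~ N(0, tau2 I).

    V = (K^T K / tau2 + I / sigma2)^{-1} and m = V K^T y / tau2.
    """
    p = K.shape[1]
    prec = K.T @ K / tau2 + np.eye(p) / sigma2
    V = np.linalg.inv(prec)
    m = np.linalg.solve(prec, K.T @ y / tau2)
    return m, V


rng = np.random.default_rng(1)
sites = np.sort(rng.uniform(0.0, 10.0, 50))     # m = 50 hypothetical locations
knots = np.linspace(0.0, 10.0, 8)               # p = 8 knots
K = design_matrix(sites, knots, ell=1.5)
y = rng.normal(size=50)
m, V = posterior_w(y, K, sigma2=2.0, tau2=0.3)
```

The same mean can be written as $\sigma^2 K^T(\sigma^2 K K^T + \tau^2 I)^{-1} y$, which requires an $m \times m$ solve instead.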

Non-Gaussian Processes

In many situations we need to consider spatial processes that are not Gaussian. Here are several options:

- Generalized linear models. Use a Gaussian process to model the mean function of the observations after transformation by the link function.
- Non-linear transformations and clipping. This is useful, for example, for binary data.
- Process convolutions for non-Gaussian latent processes. The $w(u_i)$ can be taken as realizations of distributions other than the Gaussian. For example, a positive distribution combined with a nonnegative kernel will provide a positive-valued random field.

Non-Gaussian Process Convolutions
Elaborating on the previous slide
$$x(s) = \sum_{j=1}^{p} k(s - u_j; \theta)\, w_j, \quad w_j \sim F,$$

where $F$ is a distribution with support in $\mathbb{R}^+$; then, for a nonnegative kernel $k$, $x(s) \geq 0$ for all $s$.
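A sketch with gamma-distributed latent values; the gamma choice, the knot grid, and the Gaussian kernel are assumptions. Since both $k$ and the $w_j$ are nonnegative, the field is nonnegative everywhere.

```python
import numpy as np


def convolve_latent(s, knots, w, ell=1.0):
    # x(s) = sum_j k(s - u_j) w_j with a nonnegative Gaussian kernel (illustrative choice).
    K = np.exp(-0.5 * ((s[:, None] - knots[None, :]) / ell) ** 2)
    return K @ w


rng = np.random.default_rng(2)
knots = np.linspace(0.0, 10.0, 11)                        # hypothetical knots
w = rng.gamma(shape=2.0, scale=1.0, size=knots.size)      # w_j ~ Gamma(2, 1), support R+
s = np.linspace(0.0, 10.0, 201)
x = convolve_latent(s, knots, w)
```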


Consider the normalized kernel

$$k^*(s - u_j; \theta) = \frac{k(s - u_j; \theta)}{\sum_l k(s - u_l; \theta)};$$

then

$$x(s) = \sum_{j=1}^{p} k^*(s - u_j; \theta)\, w_j, \quad w_j \sim F.$$

If $F$ has support $(0, 1)$, like a beta distribution, then $x(s) \in (0, 1)$ for all $s$, as $x(s)$ is a convex combination of the $w_j$.
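A sketch of the normalized construction with beta-distributed $w_j$; the beta parameters, knots, and Gaussian kernel are assumptions. Each row of the normalized design matrix sums to one, so every $x(s)$ lies between the smallest and largest $w_j$. The normalization requires each $s$ to be close enough to some knot that the denominator does not underflow.

```python
import numpy as np


def convolve_normalized(s, knots, w, ell=1.0):
    # k*(s - u_j) = k(s - u_j) / sum_l k(s - u_l): each row sums to 1, so x(s) is a
    # convex combination of the w_j. Assumes every s is near at least one knot.
    K = np.exp(-0.5 * ((s[:, None] - knots[None, :]) / ell) ** 2)
    K_star = K / K.sum(axis=1, keepdims=True)
    return K_star @ w, K_star


rng = np.random.default_rng(5)
knots = np.linspace(0.0, 10.0, 11)              # hypothetical knots
w = rng.beta(2.0, 5.0, size=knots.size)         # w_j ~ Beta(2, 5), support (0, 1)
s = np.linspace(0.0, 10.0, 201)
x, K_star = convolve_normalized(s, knots, w)
```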

Spatially-Varying Kernels

A process with a spatially varying kernel can be written as

$$x(s) = \sum_{i=1}^{p} b(s - u_i; \theta(s))\, w_i$$

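where

$$b(s - u_j; \theta) = \begin{cases} (1 - \|s - u_j\|^2)^{\theta_1} & \text{if } \|s - u_j\| < 1, \\ 0 & \text{otherwise,} \end{cases}$$

and $\theta = (\theta_1, \ldots, \theta_4)$. The distance is given as

$$\|s - u_j\| = \sqrt{\big((x_s - x_j), (y_s - y_j)\big)\, \Sigma^{-1} \big((x_s - x_j), (y_s - y_j)\big)^T},$$

with $s = (x_s, y_s)$ and $u_j = (x_j, y_j)$.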
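A sketch of the anisotropic Bezier kernel for two-dimensional locations; `Sigma_inv` stands for $\Sigma^{-1}$ and is supplied by the caller. The function name is hypothetical.

```python
import numpy as np


def bezier_aniso(s, u_j, theta1, Sigma_inv):
    """b(s - u_j; theta) = (1 - ||s - u_j||^2)^theta1 where ||s - u_j|| < 1, else 0.

    ||s - u_j||^2 = (x_s - x_j, y_s - y_j) Sigma^{-1} (x_s - x_j, y_s - y_j)^T.
    s may hold several points as the rows of an (n, 2) array.
    """
    diff = np.atleast_2d(s) - np.asarray(u_j, dtype=float)      # (n, 2)
    d2 = np.einsum("ni,ij,nj->n", diff, Sigma_inv, diff)       # squared distances
    return np.where(d2 < 1.0, np.clip(1.0 - d2, 0.0, None) ** theta1, 0.0)


# Support is an ellipse with semi-axes 2 along x and 1 along y.
Si = np.diag([1.0 / 4.0, 1.0])
vals = bezier_aniso(np.array([[1.0, 0.0], [0.0, 1.5], [1.9, 0.0]]), [0.0, 0.0], 2.0, Si)
```

The point $(1.9, 0)$ is inside the support while $(0, 1.5)$ is outside, reflecting the elongation along $x$.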

Bezier Kernels

[Figure: Bezier kernels. Panel (a): kernel values K(., w) between 0 and 1; panel (b): kernels on a latitude/longitude map. Horizontal axes: longitude.]
Spatially-Varying Kernels
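The ellipsoidal shape is controlled by the parameters in

$$\Sigma^{-1} = \begin{pmatrix} \psi_1 + \psi_2 \cos 2\theta_4 & \psi_2 \sin 2\theta_4 \\ \psi_2 \sin 2\theta_4 & \psi_1 - \psi_2 \cos 2\theta_4 \end{pmatrix}, \qquad \psi_1 = \frac{1}{2}\left(\frac{1}{a^2} + \frac{1}{A^2}\right), \quad \psi_2 = \frac{1}{2}\left(\frac{1}{a^2} - \frac{1}{A^2}\right),$$

$$a = L + \theta_2 (U - L), \qquad A = a + \theta_3 (U - a), \qquad \theta_2, \theta_3 \in (0, 1).$$

So the semi-minor and semi-major axes $a$ and $A$ belong to $(L, U)$.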
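A sketch building $\Sigma^{-1}$ from $(\theta_2, \theta_3, \theta_4)$ and the bounds $L, U$, assuming the form $\Sigma^{-1} = \begin{pmatrix}\psi_1 + \psi_2\cos 2\theta_4 & \psi_2 \sin 2\theta_4\\ \psi_2\sin 2\theta_4 & \psi_1 - \psi_2 \cos 2\theta_4\end{pmatrix}$ with $\psi_{1,2} = \frac{1}{2}(1/a^2 \pm 1/A^2)$. Under this form, the point at distance $a$ along the direction $\theta_4$ and the point at distance $A$ in the perpendicular direction both sit exactly at kernel distance 1, so $a$ and $A$ are the semi-axes of the support. The function name is hypothetical.

```python
import numpy as np


def sigma_inv_from_theta(theta2, theta3, theta4, L, U):
    """Sigma^{-1} for the ellipse with semi-axis a along angle theta4 and A perpendicular.

    a = L + theta2 (U - L) and A = a + theta3 (U - a), so L < a < A < U when
    theta2, theta3 lie in (0, 1).
    """
    a = L + theta2 * (U - L)
    A = a + theta3 * (U - a)
    psi1 = 0.5 * (1.0 / a ** 2 + 1.0 / A ** 2)
    psi2 = 0.5 * (1.0 / a ** 2 - 1.0 / A ** 2)
    c, s = np.cos(2 * theta4), np.sin(2 * theta4)
    Sigma_inv = np.array([[psi1 + psi2 * c, psi2 * s],
                          [psi2 * s, psi1 - psi2 * c]])
    return Sigma_inv, a, A


Si, a, A = sigma_inv_from_theta(0.3, 0.5, 0.6, L=0.5, U=3.0)
v = np.array([np.cos(0.6), np.sin(0.6)])         # direction theta4
v_perp = np.array([-np.sin(0.6), np.cos(0.6)])   # perpendicular direction
# Squared kernel distances of a*v and A*v_perp from the centre (both should equal 1).
q_minor = (a * v) @ Si @ (a * v)
q_major = (A * v_perp) @ Si @ (A * v_perp)
```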
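The spatial variation of $\theta$ is obtained, with a normalized $b$, as

$$\theta(s) = \sum_{i=1}^{p} b(s - u_i; \tilde{\theta})\, \theta(u_i), \qquad \tilde{\theta} = (2, 1, 0, 0),$$

with appropriate uniform priors on each $\theta_k(u_i)$, $k = 1, \ldots, 4$. With $\tilde{\theta}_2 = 1$ and $\tilde{\theta}_3 = 0$ we get $a = A = U$, so the smoothing kernel is circular with radius $U$.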
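A sketch of the smoothing step for $\theta(s)$, assuming the fixed kernel parameters $(2, 1, 0, 0)$: with the second entry equal to 1 and the third equal to 0, both semi-axes equal $U$, so $b$ is a circular Bezier kernel of radius $U$ with exponent 2. Knot locations and knot values are supplied by the caller; the names are hypothetical.

```python
import numpy as np


def smooth_parameter(s_pts, knots, theta_knots, U, exponent=2.0):
    """theta(s) = sum_i b*(s - u_i) theta(u_i) with a normalized circular Bezier kernel.

    s_pts: (n, 2) locations; knots: (p, 2); theta_knots: (p, 4) knot values.
    Assumes every location lies within distance U of at least one knot.
    """
    d2 = np.sum((s_pts[:, None, :] - knots[None, :, :]) ** 2, axis=-1) / U ** 2
    B = np.where(d2 < 1.0, np.clip(1.0 - d2, 0.0, None) ** exponent, 0.0)
    B = B / B.sum(axis=1, keepdims=True)     # normalize: rows sum to 1
    return B @ theta_knots


g = np.arange(5.0)
knots = np.array([(x, y) for x in g for y in g])   # hypothetical 5 x 5 knot grid
rng = np.random.default_rng(3)
theta_knots = rng.uniform(0.0, 1.0, size=(25, 4))
s_pts = rng.uniform(0.0, 4.0, size=(30, 2))
theta_s = smooth_parameter(s_pts, knots, theta_knots, U=1.5)
```

Because the weights are convex, each $\theta_k(s)$ stays within the range of its knot values, which keeps the smoothed parameters inside the support of their uniform priors.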

Scallops Data

[Figure: four contour maps of the scallops data region, latitude against longitude.]

Using the DPC (discrete process convolution) with fixed-parameter Bezier kernels on the scallops data. We use a fixed ellipsoidal kernel that follows the coastline.

Space-Varying Kernels

[Figure: four contour maps of the scallops data region, latitude against longitude.]

Letting the four parameters be DPCs with spherical kernels, we obtain space-varying ellipsoidal shapes and smoothness. Lower panels: eccentricity (left) and smoothness (right).

Modeling issues

- In the previous model we add measurement error and, usually, a linear drift.
- We need to specify a prior for $w$. Usually $p(w) \propto 1$ works fine.
- The compact support of $b(\cdot)$ makes the design matrix of the resulting linear model sparse. This can be used to speed up calculations.
- How many knots do we use? Where do we put them? There are no definitive answers to these questions. We can use model comparison methods to choose between different configurations. Nevertheless, intuitively, the space-varying nature of the support should compensate for small or sparse grids.
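A sketch of the sparsity gain, with hypothetical random sites, a regular knot grid, and a circular Bezier kernel of radius 3: only knots within the radius contribute, so most entries of the design matrix are exactly zero. All names and sizes are illustrative.

```python
import numpy as np


def bezier_design(sites, knots, radius, exponent=2.0):
    # Entries b(s_i - u_j) of a circular Bezier kernel; zero whenever ||s_i - u_j|| >= radius.
    d2 = np.sum((sites[:, None, :] - knots[None, :, :]) ** 2, axis=-1) / radius ** 2
    return np.where(d2 < 1.0, np.clip(1.0 - d2, 0.0, None) ** exponent, 0.0)


rng = np.random.default_rng(4)
sites = rng.uniform(0.0, 20.0, size=(500, 2))       # hypothetical observation sites
g = np.arange(0.0, 21.0, 2.0)
knots = np.array([(x, y) for x in g for y in g])    # 11 x 11 = 121 knots
K = bezier_design(sites, knots, radius=3.0)
density = np.count_nonzero(K) / K.size              # fraction of nonzero entries
```

In practice such a matrix would be stored in a sparse format, for example `scipy.sparse.csr_matrix(K)`, so that products such as $K^T K$ skip the zero entries.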
