
MLPR Tutorial Sheet 7

1. Pre-processing for Bayesian linear regression and Gaussian processes:


We have a dataset of inputs and outputs {x^(n), y^(n)}_{n=1}^N, describing N preparations of
cells from some lab experiments. The output of interest, y^(n), is the fraction of cells that
are alive in preparation n. The first input feature of each preparation indicates whether
the cells were created in lab A, B, or C. That is, x_1^(n) ∈ {A, B, C}. The other features are real
numbers describing experimental conditions, such as temperature and concentrations
of chemicals and nutrients.

a) Describe how you might represent the first input feature and the output when
learning a regression model to predict the fraction of alive cells in future preparations
from these labs. Explain your reasoning. (One possible encoding of the lab feature is
sketched in the code after this question.)

b) Compare using the lab identity as an input to your regression (as you've discussed
above) with two baseline approaches: i) ignore the lab feature, treating the data from
all labs as if they came from one lab; ii) split the dataset into three parts, one for
lab A, one for B, and one for C, and train three separate regression models.
Discuss both simple linear regression and Gaussian process regression. Is it
possible for these models, when given the lab identity as in a), to learn to emulate
either or both of the two baselines?

c) There’s a debate in the lab about how to represent the other input features: should
the model use log-temperature or temperature, and should temperature be measured
in Fahrenheit, Celsius, or Kelvin? There is a similar debate about whether to use
log-concentration or concentration as inputs to the regression.
Discuss ways in which these issues could be resolved.
Harder: there is also a debate between two different representations of the output.
Describe how this debate could be resolved.
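One common choice for the categorical lab feature, among other defensible options, is a one-hot encoding. The snippet below is a minimal NumPy sketch of that idea; the lab labels and feature values are made up for illustration and are not part of the original question.

```python
import numpy as np

# Hypothetical raw inputs: lab identity plus two real-valued conditions
# (e.g. temperature in Celsius, a nutrient concentration). Values are made up.
labs = np.array(['A', 'C', 'B', 'A'])
conditions = np.array([[37.0, 0.8],
                       [35.5, 1.2],
                       [36.2, 0.5],
                       [38.1, 0.9]])

# One-hot encode the categorical lab feature: one binary column per lab.
lab_levels = np.array(['A', 'B', 'C'])
one_hot = (labs[:, None] == lab_levels[None, :]).astype(float)

# Stack the encoded lab columns with the real-valued features.
X = np.hstack([one_hot, conditions])
print(X)  # shape (4, 5): three lab indicator columns + two condition columns
```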

2. Gaussian processes with non-zero mean:


In the lectures we assumed that the prior over any vector of function values was
zero mean: f ∼ N(0, K). We focussed on the covariance or kernel function k(x^(i), x^(j)),
which evaluates the elements K_ij of the covariance matrix (also called the ‘Gram
matrix’).
If we know in advance that the distribution of outputs should be centered around
some other mean m, we could put that into the model. Instead, we usually subtract the
known mean m from the y data, and just use the zero mean model.
Sometimes we don’t really know the mean m, but look at the data to estimate it. A
fully Bayesian treatment puts a prior on m and, because it’s an unknown, considers all
possible values when making predictions. A flexible prior on the mean vector could be
another Gaussian process(!). Our model for our noisy observations is now:

    m ∼ N(0, K_m),      K_m from kernel function k_m,
    f ∼ N(m, K_f),      K_f from kernel function k_f,
    y ∼ N(f, σ_n² I),   noisy observations.
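To make the generative structure concrete, here is a minimal sketch of sampling from this three-stage model on a 1-D grid. The squared-exponential kernels, their hyperparameters, and the noise level are stand-in assumptions for illustration; the question does not fix k_m or k_f.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 100)

def sq_exp(xi, xj, variance=1.0, lengthscale=1.0):
    # Stand-in squared-exponential kernel; k_m and k_f are left unspecified in the question.
    return variance * np.exp(-0.5 * ((xi - xj) / lengthscale)**2)

jitter = 1e-8 * np.eye(len(x))
K_m = sq_exp(x[:, None], x[None, :], variance=4.0, lengthscale=2.0) + jitter
K_f = sq_exp(x[:, None], x[None, :], variance=1.0, lengthscale=0.5) + jitter

# Sample the three stages of the model: m, then f given m, then noisy y.
m = np.linalg.cholesky(K_m) @ rng.standard_normal(len(x))
f = m + np.linalg.cholesky(K_f) @ rng.standard_normal(len(x))   # f | m ~ N(m, K_f)
sigma_n = 0.1
y = f + sigma_n * rng.standard_normal(len(x))                   # y ~ N(f, sigma_n^2 I)
```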

Show that, despite our efforts, the function values f still come from a function
drawn from a zero-mean Gaussian process (if we marginalize out m). Identify the
covariance function of the zero-mean process for f.
Identify the mean’s kernel function k_m for two restricted types of mean: 1) an unknown
constant m_i = b, with b ∼ N(0, σ_b²); 2) an unknown linear trend: m_i = m(x^(i)) =
w^T x^(i) + b, with Gaussian priors w ∼ N(0, σ_w² I) and b ∼ N(0, σ_b²).

Sketch three typical draws from a GP prior with kernel:

    k(x^(i), x^(j)) = 0.1² exp(−(x^(i) − x^(j))²/2) + 1.

Hints in footnote 1.
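Footnote 1 suggests making a tiny tweak to the Gaussian process demo code from the class notes. As a self-contained alternative (not the course demo itself), the following NumPy sketch draws three functions from a zero-mean GP with this kernel on a 1-D grid.

```python
import numpy as np
import matplotlib.pyplot as plt

def kernel(xi, xj):
    # k(x^(i), x^(j)) = 0.1^2 exp(-(x^(i) - x^(j))^2 / 2) + 1
    return 0.1**2 * np.exp(-0.5 * (xi - xj)**2) + 1.0

# Evaluate the Gram matrix on a grid of 1-D inputs.
x_grid = np.linspace(-5, 5, 200)
K = kernel(x_grid[:, None], x_grid[None, :])

# Draw three functions from the zero-mean GP prior.
# A small jitter keeps the Cholesky factorization numerically stable.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x_grid)))
draws = L @ rng.standard_normal((len(x_grid), 3))

plt.plot(x_grid, draws)
plt.xlabel('x'); plt.ylabel('f(x)')
plt.show()
```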

3. Laplace approximation:
The Laplace approximation fits a Gaussian to a target distribution by matching the
mode of the log density and the second derivatives of the log density at that mode.
See the w8b lecture notes for more pointers.
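As a reminder of the general recipe, here is a generic 1-D numerical sketch (not the answer to this exercise): locate the mode of the log density, then take the Gaussian's mean to be that mode and its variance to be the negative inverse of the second derivative of the log density there. The example density at the bottom is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def laplace_approx(log_density, bracket):
    """Return (mean, variance) of a 1-D Laplace approximation.

    log_density: unnormalized log density of the target distribution.
    bracket: triple used by the scalar optimizer to locate the mode.
    """
    # Mode of the density = maximizer of the log density.
    res = minimize_scalar(lambda t: -log_density(t), bracket=bracket)
    mode = res.x
    # Curvature from a finite-difference second derivative at the mode.
    h = 1e-4
    d2 = (log_density(mode + h) - 2 * log_density(mode) + log_density(mode - h)) / h**2
    return mode, -1.0 / d2

# Hypothetical unnormalized log density (not the tutorial's posterior):
# mode at t = 1, curvature -1 there, so the approximation is N(1, 1).
mean, var = laplace_approx(lambda t: -np.cosh(t - 1.0), bracket=(-2.0, 0.0, 3.0))
print(mean, var)
```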
Exercise 27.1 in MacKay’s textbook (p342) is about inferring the parameter λ of a
Poisson distribution based on an observed count r. The likelihood function and prior
distribution for the parameter are:

    P(r | λ) = exp(−λ) λ^r / r!,      p(λ) ∝ 1/λ.
Find the Laplace approximation to the posterior over λ given an observed count r.
Now reparameterize the model in terms of ℓ = log λ. After performing the change of
variables (see footnote 2), the improper prior on log λ becomes uniform, that is, p(ℓ) is constant. Find
the Laplace approximation to the posterior over ℓ = log λ.
Which version of the Laplace approximation is better? It may help to plot the true and
approximate posteriors of λ and ℓ for different values of the integer count r.

1. The covariances of two independent Gaussians add, so think about the two Gaussian processes that are being summed to give
this kernel. You can get the answer to this question by making a tiny tweak to the Gaussian process demo code
provided with the class notes.
2. Some review of how probability densities work: conservation of probability mass means that p(ℓ) dℓ = p(λ) dλ
for small corresponding elements dℓ and dλ. Dividing and taking limits: p(ℓ) = p(λ) |dλ/dℓ|, evaluated at
λ = exp(ℓ). The size of the derivative, |dλ/dℓ|, is referred to as a Jacobian term. Here, dλ/dℓ = exp(ℓ) = λ.

MLPR:tut7 Iain Murray, http://www.inf.ed.ac.uk/teaching/courses/mlpr/2018/
