Tut7 Questions
a) Describe how you might represent the first input feature and the output when learning a regression model to predict the fraction of alive cells in future preparations from these labs. Explain your reasoning.
b) Compare using the lab identity as an input to your regression (as you’ve discussed above) with two baseline approaches: i) ignore the lab feature, treating the data from all labs as if they came from one lab; ii) split the dataset into three parts, one for lab A, one for lab B, and one for lab C, and then train three separate regression models.
Discuss both simple linear regression and Gaussian process regression. Is it
possible for these models, when given the lab identity as in a), to learn to emulate
either or both of the two baselines?
c) There is a debate in the lab about how to represent the other input features: should the model use temperature or log-temperature, and should temperature be measured in Fahrenheit, Celsius, or Kelvin? There is a similar debate over whether to use concentration or log-concentration as an input. Discuss ways in which these issues could be resolved.
Harder: there is a debate between two different representations of the output.
Describe how this debate could be resolved.
Show that, despite our efforts, the function values f still come from a function drawn from a zero-mean Gaussian process (if we marginalize out m). Identify the covariance function of the zero-mean process for f.
Identify the mean’s kernel function k_m for two restricted types of mean: 1) an unknown constant, m_i = b, with b ∼ N(0, σ_b^2); 2) an unknown linear trend, m_i = m(x^{(i)}) = w^⊤ x^{(i)} + b, with Gaussian priors w ∼ N(0, σ_w^2 I) and b ∼ N(0, σ_b^2).
Hints in footnote 1.
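In the spirit of footnote 1, whatever kernel you derive can be checked numerically. The sketch below illustrates the check for the constant-mean case only, and assumes a squared-exponential kernel for the zero-mean process (the class demo code’s kernel and variable names may differ): draw many samples of f = g + b and compare the empirical covariance of f with your candidate k(x, x′) + k_m(x, x′).

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 5)[:, None]  # a few test inputs

def rbf(A, B, ell=0.3):
    # Squared-exponential kernel (an assumed choice for this sketch)
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ell**2)

sigma_b = 2.0           # prior std of the unknown constant b
K = rbf(X, X)
S = 100_000             # number of sampled functions

# Draw g ~ GP(0, k) at the test inputs, then add b ~ N(0, sigma_b^2)
L = np.linalg.cholesky(K + 1e-9 * np.eye(len(X)))
g = L @ rng.standard_normal((len(X), S))
b = sigma_b * rng.standard_normal(S)
f = g + b[None, :]

# Empirical covariance of f; compare the offset from K with your derived k_m
emp_cov = f @ f.T / S
print(np.round(emp_cov - K, 2))
```

Swapping in draws of a random linear trend w^⊤ x + b lets you check the second case the same way.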
3. Laplace approximation:
The Laplace approximation fits a Gaussian distribution to a distribution by matching
the mode of the log density function and the second derivatives at that mode. See the
w8b lecture notes for more pointers.
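As a concrete illustration of that recipe (a generic numerical sketch, not the course’s code): find the mode of the log density, estimate the second derivative there by finite differences, and read off the Gaussian’s mean and variance. The `bracket` default and step size `h` are arbitrary choices for this example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def laplace_fit(log_p, bracket=(0.1, 1.0, 10.0)):
    """Fit N(mu, s2) to an unnormalized 1D log-density by matching
    the mode of log_p and its curvature at that mode."""
    mu = minimize_scalar(lambda x: -log_p(x), bracket=bracket).x
    h = 1e-4  # finite-difference step for the second derivative
    curv = (log_p(mu + h) - 2 * log_p(mu) + log_p(mu - h)) / h**2
    return mu, -1.0 / curv

# Sanity check on a known Gaussian log-density: should recover mu=1, s2=4
mu, s2 = laplace_fit(lambda x: -0.5 * (x - 1.0) ** 2 / 4.0)
print(mu, s2)
```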
Exercise 27.1 in MacKay’s textbook (p342) is about inferring the parameter λ of a
Poisson distribution based on an observed count r. The likelihood function and prior
distribution for the parameter are:
P(r | λ) = (λ^r / r!) exp(−λ),    p(λ) ∝ 1/λ.
Find the Laplace approximation to the posterior over λ given an observed count r.
Now reparameterize the model in terms of ℓ = log λ. After performing the change of variables (footnote 2), the improper prior on log λ becomes uniform, that is, p(ℓ) is constant. Find the Laplace approximation to the posterior over ℓ = log λ.
Which version of the Laplace approximation is better? It may help to plot the true and
approximate posteriors of λ and ` for different values of the integer count r.
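One way to make that comparison concrete (a numerical sketch; the exercise itself asks for the algebra): fit both Laplace approximations by locating each mode and its curvature numerically, and compare with the exact posterior over λ, which here is a Gamma distribution with shape r and rate 1. The count r = 3 and the optimizer brackets are arbitrary choices for this example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

r = 3  # example observed count; repeat for other values of r

# Unnormalized log-posteriors implied by the model above
log_post_lam = lambda lam: (r - 1) * np.log(lam) - lam  # over lambda
log_post_ell = lambda ell: r * ell - np.exp(ell)        # over ell = log(lambda)

def laplace_fit(log_p, bracket):
    """Match the mode and curvature of log_p with a Gaussian N(mu, s2)."""
    mu = minimize_scalar(lambda x: -log_p(x), bracket=bracket).x
    h = 1e-4  # finite-difference step for the second derivative
    curv = (log_p(mu + h) - 2 * log_p(mu) + log_p(mu - h)) / h**2
    return mu, -1.0 / curv

mu_lam, s2_lam = laplace_fit(log_post_lam, (0.5, 2.0, 20.0))
mu_ell, s2_ell = laplace_fit(log_post_ell, (-2.0, 1.0, 5.0))

# Exact posterior over lambda is Gamma(shape=r, rate=1): mean r, variance r
print("lambda-space fit: mean %.3f, var %.3f (exact mean %d, var %d)"
      % (mu_lam, s2_lam, r, r))
print("ell-space fit:    mean %.3f, var %.3f" % (mu_ell, s2_ell))
```

Plotting each Gaussian against the exact Gamma posterior (remembering the Jacobian when mapping the ℓ-space fit back to λ) shows how the two fits differ, and checking the printed numbers against your algebra confirms the derivation.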
1. The covariances of two independent Gaussian processes add, so think about the two Gaussian processes that are being added to give this kernel. You can get the answer to this question by making a tiny tweak to the Gaussian process demo code provided with the class notes.
2. Some review of how probability densities work: conservation of probability mass means that p(ℓ) dℓ = p(λ) dλ for small corresponding elements dℓ and dλ. Dividing and taking limits: p(ℓ) = p(λ) |dλ/dℓ|, evaluated at λ = exp(ℓ). The size of the derivative, |dλ/dℓ|, is referred to as a Jacobian term.