Essay
probability distributions - we assumed that the likelihood for a single data point was a
Gaussian (or “normal”) distribution centered on a y_pred with known variance sigma^2,
and the likelihood for the whole data set was the product of the likelihoods for the
individual data points. Now, consider an experiment where we believe the data points
will be distributed according to the Poisson distribution. For example, our model might
be that the amount of traffic on a street increases linearly in time (say, during the
morning rush hour). Our observations are counts of how many cars go by during
1-minute periods, so our observations are integers (number of cars from 8:00 to 8:01,
number of cars from 8:01 to 8:02, etc.), and our y_pred value is the “rate”. The Poisson
probability function is P(k | rate) = rate^k e^(-rate) / k!, where k is the count (observed
number of cars).
● Please sketch what the log-likelihood function for this model would look like (that
is, please give code, but we don’t care about getting all the punctuation exactly
right; pseudo-code is fine).
Hint: Your log-likelihood function will have arguments (b, m, counts, times) where (b,m)
are the parameters of the linear model, “counts” are the (integer) counts of cars per time
period, and the “times” will be the number of minutes since the start of the experiment;
the model is that the rate increases linearly with time. Assume the log-likelihood
function will be used for MCMC, so you can drop any terms that are constant for fixed
data (counts).
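A minimal sketch of one possible answer, in Python with NumPy (an assumption; the question accepts pseudo-code). Taking the log of the Poisson probability gives k log(rate) - rate - log(k!), and the last term depends only on the counts, so it can be dropped. The function name and the choice to return -inf for non-positive rates are illustrative and not prescribed by the question.

```python
import numpy as np

def poisson_log_likelihood(b, m, counts, times):
    """Log-likelihood (up to a constant) for counts ~ Poisson(b + m * times)."""
    counts = np.asarray(counts, dtype=float)
    times = np.asarray(times, dtype=float)

    # Linear model for the expected number of cars per one-minute interval.
    rate = b + m * times

    # A Poisson rate must be positive; reject parameters that violate this.
    if np.any(rate <= 0):
        return -np.inf

    # Sum of k*log(rate) - rate; the -log(k!) term is constant for fixed data.
    return np.sum(counts * np.log(rate) - rate)
```

Returning -inf for a non-positive rate means an MCMC sampler would never accept such a step, which acts like a hard prior boundary on (b, m).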
Question 2: We looked at one way of handling outliers in a data set – by using a “mixture
model”, where we said that there were two probability distributions the data points
could be drawn from – the “good” distribution, where there was a y_pred and the data
points were drawn from a Gaussian distribution around that y_pred – and a “bad”
distribution, where the data points could be drawn from the whole range of the y data,
with a flat distribution. And then we had a p_good parameter that said what fraction of
the data points were drawn from the “good” distribution. Now, assume that, instead of
outliers that are just junk, we have a data set that contains observations from two
distinct populations. (Maybe they’re two different kinds of supernovae, for example.)
And for each data point, we don’t know which population it comes from. Assume that
each population follows a linear model, but with different line parameters (different b
and m values).
● Please describe a mixture model that will allow you to infer the line parameters
for both populations at the same time, and also infer what fraction of the whole
data set comes from each population.
● Sketch the code for the log-likelihood function.
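One possible sketch, in Python with NumPy. It assumes each data point has a known Gaussian uncertainty sigma (as in the single-line model), that population 1 contributes a fraction p1 of the points, and that population 2 contributes the rest. The per-point likelihood is then p1 * N(y | b1 + m1 x, sigma^2) + (1 - p1) * N(y | b2 + m2 x, sigma^2). The names b1, m1, b2, m2, p1 and both function names are illustrative.

```python
import numpy as np

def gaussian_log_pdf(y, mu, sigma):
    """Log of a normal density with mean mu and standard deviation sigma."""
    return -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def two_line_log_likelihood(b1, m1, b2, m2, p1, x, y, sigma):
    """Log-likelihood for data drawn from a mixture of two straight lines."""
    # The mixing fraction must lie strictly between 0 and 1.
    if not 0.0 < p1 < 1.0:
        return -np.inf

    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    # Log of each weighted component, evaluated for every data point.
    log_pop1 = np.log(p1) + gaussian_log_pdf(y, b1 + m1 * x, sigma)
    log_pop2 = np.log1p(-p1) + gaussian_log_pdf(y, b2 + m2 * x, sigma)

    # log(exp(a) + exp(b)) computed stably, then summed over data points.
    return np.sum(np.logaddexp(log_pop1, log_pop2))
```

Swapping the two sets of line parameters while replacing p1 with 1 - p1 leaves this likelihood unchanged, so in practice some convention (for example, a prior ordering the slopes) may be needed to tell the populations apart.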
3
Question 1: A linear system A x = b with a given square matrix A and given right-hand
side vector b has a unique solution iff the matrix has no zero eigenvalues. What if the matrix A
is rectangular, and the system imposes too many conditions on x? Describe a method
that you can use to still find the “best” solution to such a system. Describe the main
ideas underlying your approach, and sketch a relevant algorithm that uses methods
from linear algebra.
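One common approach to an overdetermined system is least squares: choose x to minimize the squared residual |A x - b|^2. A sketch in Python with NumPy follows, assuming A has full column rank. It solves the problem through a reduced QR factorization and, for comparison, through the normal equations A^T A x = A^T b. The function name and the small example data are made up for illustration.

```python
import numpy as np

def least_squares_qr(A, b):
    """Minimize |A x - b|^2 using a reduced QR factorization of A."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # A = Q R with orthonormal columns in Q and upper-triangular square R.
    Q, R = np.linalg.qr(A)
    # The minimizer satisfies R x = Q^T b.
    return np.linalg.solve(R, Q.T @ b)

# Four conditions on two unknowns: fit y = x0 + x1 * t to four points.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.1, 2.9, 5.2, 6.8])

x_qr = least_squares_qr(A, b)
# Same minimizer from the normal equations (less stable if A is ill-conditioned).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
# At the minimum, the residual is orthogonal to every column of A.
residual = A @ x_qr - b
```

The QR route avoids forming A^T A, whose condition number is the square of that of A. If A may be rank deficient, a singular value decomposition is the usual alternative.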
Question 2: Numerical methods for analysis have a deep connection to linear algebra.
For example, when solving a differential equation, one is looking for a function (e.g.
`f(x)`) that satisfies certain conditions. This requires being able to *represent* a function
numerically, i.e. to describe a function in a computer code with a finite number of real
numbers. Evaluating this function at a given point `x` is then a linear operation. In
addition, differential operators (i.e. derivatives) are linear operators.
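As one concrete illustration of these statements, the sketch below assumes the function is represented by its values on a uniform grid, which is only one of several possible representations. The derivative then becomes a matrix acting on the vector of grid values. The function name and the choice of second-order central differences (with one-sided differences at the two ends) are illustrative.

```python
import numpy as np

def differentiation_matrix(n, h):
    """Finite-difference first-derivative matrix on n uniformly spaced points."""
    D = np.zeros((n, n))
    # Central differences in the interior: (f[i+1] - f[i-1]) / (2 h).
    for i in range(1, n - 1):
        D[i, i - 1] = -0.5 / h
        D[i, i + 1] = 0.5 / h
    # One-sided differences at the two boundary points.
    D[0, 0], D[0, 1] = -1.0 / h, 1.0 / h
    D[-1, -2], D[-1, -1] = -1.0 / h, 1.0 / h
    return D

x = np.linspace(0.0, 2.0 * np.pi, 101)
h = x[1] - x[0]
D = differentiation_matrix(len(x), h)

f = np.sin(x)   # the function, stored as a finite vector of numbers
df = D @ f      # differentiation is a matrix-vector product

# Compare with the exact derivative cos(x) away from the boundaries.
interior_error = np.max(np.abs(df[1:-1] - np.cos(x[1:-1])))
```

Because D is a matrix, differentiating a linear combination of functions gives the same linear combination of their derivatives, and a differential equation on the grid turns into a linear system for the vector of function values.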
4
Question 2: In the affine invariant ensemble sampler, we saw that the algorithm moves a
group of walkers around the state space. One of the parameters we have to set when
using this sampler is the number of walkers. Please describe any strict limits on the
number of walkers that we can choose in order to ensure correct results. Please
describe practical considerations that would lead us to choose a certain number of
walkers.
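One way to see where a strict lower limit can come from is the form of the stretch-move proposal, y = x_j + z (x_k - x_j). Every proposal is an affine combination of existing walker positions, so an ensemble with too few walkers cannot leave the affine subspace it starts in. The sketch below, in Python with NumPy, only generates proposals: there is no target density and no accept/reject step. It then measures how many independent directions the proposals span. The helper names are hypothetical, and the stretch scale a = 2 is an assumed value.

```python
import numpy as np

def stretch_proposals(walkers, n_proposals, rng, a=2.0):
    """Draw stretch-move proposals from a fixed ensemble (no accept step)."""
    n_walkers = len(walkers)
    proposals = []
    for _ in range(n_proposals):
        # Pick the walker to move (k) and a different partner walker (j).
        k, j = rng.choice(n_walkers, size=2, replace=False)
        # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a].
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        proposals.append(walkers[j] + z * (walkers[k] - walkers[j]))
    return np.array(proposals)

def reachable_dimension(walkers, rng):
    """Number of independent directions spanned by the proposals."""
    proposals = stretch_proposals(walkers, 200, rng)
    return np.linalg.matrix_rank(proposals - walkers[0], tol=1e-9)

rng = np.random.default_rng(0)
ndim = 3
few = rng.normal(size=(ndim, ndim))        # as many walkers as dimensions
many = rng.normal(size=(2 * ndim, ndim))   # twice as many walkers

dim_few = reachable_dimension(few, rng)
dim_many = reachable_dimension(many, rng)
```

With ndim walkers in ndim dimensions the proposals span only ndim - 1 directions, while the larger ensemble spans all of them. This illustrates the geometry only; it does not settle how many walkers are best in practice.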