Essay

Log likelihood

Uploaded by

Shahnewaz Ahmed
Copyright
© All Rights Reserved

Question 1: In class, we looked at a likelihood function that had independent Gaussian
probability distributions: we assumed that the likelihood for a single data point was a
Gaussian (or “normal”) distribution centered on a y_pred with known variance sigma^2,
and the likelihood for the whole data set was the product of the likelihoods for the
individual data points. Now, consider an experiment where we believe the data points
will be distributed according to the Poisson distribution. For example, our model might
be that the amount of traffic on a street increases linearly in time (say, during the
morning rush hour). Our observations are counts of how many cars go by during
1-minute periods, so our observations are integers (number of cars during 8:00 to 8:01,
number of cars from 8:01 to 8:02, etc.), and our y_pred value is the “rate”. The Poisson
probability function is P(k | rate) = rate^k e^(-rate) / k!, where k is the count (the observed
number of cars).

● Please sketch what the log-likelihood function for this model would look like (that
is, please give code, but we don’t care about getting all the punctuation exactly
right; pseudo-code is fine).

Hint: Your log-likelihood function will have arguments (b, m, counts, times) where (b,m)
are the parameters of the linear model, “counts” are the (integer) counts of cars per time
period, and the “times” will be the number of minutes since the start of the experiment;
the model is that the rate increases linearly with time. Assume the log-likelihood
function will be used for MCMC, so you can drop any terms that are constant for fixed
data (counts).

● Discuss why there is no “y_error” or uncertainty estimate on the counts.
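
One possible sketch of such a log-likelihood, written in Python and assuming the linear model rate = b + m * times. The function name and the choice to return minus infinity for non-positive rates are assumptions made for illustration, not part of the question.

```python
import math

def poisson_log_likelihood(b, m, counts, times):
    """Poisson log-likelihood, up to a data-only constant, for a rate
    that grows linearly in time: rate(t) = b + m * t."""
    total = 0.0
    for k, t in zip(counts, times):
        rate = b + m * t
        # A Poisson rate must be positive; reject parameters that break this.
        if rate <= 0:
            return -math.inf
        # log P(k | rate) = k*log(rate) - rate - log(k!).
        # log(k!) depends only on the data, so it is dropped for MCMC.
        total += k * math.log(rate) - rate
    return total
```

For a Poisson distribution the variance equals the rate, so the model itself fixes the expected scatter of each count, which is why no separate y_error argument appears.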



Question 2: We looked at one way of handling outliers in a data set – by using a “mixture
model”, where we said that there were two probability distributions the data points
could be drawn from – the “good” distribution, where there was a y_pred and the data
points were drawn from a Gaussian distribution around that y_pred – and a “bad”
distribution, where the data points could be drawn from the whole range of the y data,
with a flat distribution. And then we had a p_good parameter that said what fraction of
the data points were drawn from the “good” distribution. Now, assume that, instead of
outliers that are just junk, we have a data set that contains observations from two
distinct populations. (Maybe they’re two different kinds of supernovae, for example.)
And for each data point, we don’t know which population it comes from. Assume that
each population follows a linear model, but with different line parameters (different B
and M values).

● Please describe a mixture model that will allow you to infer the line parameters
for both populations at the same time, and also infer what fraction of the whole
data set comes from each population.
● Sketch the code for the log-likelihood function.
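
A minimal Python sketch of one such mixture likelihood, assuming Gaussian scatter with known per-point sigma around each line. The parameter name frac1 (fraction of points from population 1) and the helper log_gaussian are hypothetical names chosen here.

```python
import math

def log_gaussian(y, mean, sigma):
    """Log of a normal density with the given mean and standard deviation."""
    return -0.5 * ((y - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def two_line_log_likelihood(b1, m1, b2, m2, frac1, x, y, sigma):
    """Mixture of two linear models; frac1 is the fraction of points
    drawn from population 1 (hypothetical parameter name)."""
    if not 0.0 < frac1 < 1.0:
        return -math.inf
    total = 0.0
    for xi, yi, si in zip(x, y, sigma):
        # Weighted log-probability of this point under each population.
        a = math.log(frac1) + log_gaussian(yi, b1 + m1 * xi, si)
        c = math.log(1.0 - frac1) + log_gaussian(yi, b2 + m2 * xi, si)
        # log(exp(a) + exp(c)), computed stably.
        top = max(a, c)
        total += top + math.log(math.exp(a - top) + math.exp(c - top))
    return total
```

Swapping the two populations while replacing frac1 by 1 - frac1 leaves this likelihood unchanged, so a sampler may need a constraint (for example on the ordering of the slopes) to keep the labels distinct.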

Question 1: A linear system A x = b with a given square matrix A and given right-hand
side vector b can be solved uniquely iff the matrix has no zero eigenvalues. What if the matrix A
is rectangular, and the system imposes too many conditions on x? Describe a method
that you can use to still find the “best” solution to such a system. Describe the main
ideas underlying your approach, and sketch a relevant algorithm that uses methods
from linear algebra.
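
One common approach to an overdetermined system is least squares: choose x to minimize |A x - b|^2. The sketch below, in plain Python, does this through the normal equations (A^T A) x = A^T b. This is shown for brevity only; QR or SVD based solvers are numerically preferable in practice. The function names are assumptions.

```python
def solve_square(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def least_squares(A, b):
    """Minimize |A x - b|^2 for a tall matrix A via the normal equations
    (A^T A) x = A^T b."""
    rows, cols = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(rows)) for j in range(cols)]
           for i in range(cols)]
    Atb = [sum(A[k][i] * b[k] for k in range(rows)) for i in range(cols)]
    return solve_square(AtA, Atb)
```

For example, with rows [1, t] for t = 0, 1, 2, 3 this fits a straight line through four data points, which is four conditions on two unknowns.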

Question 2: Numerical methods for analysis have a deep connection to linear algebra.
For example, when solving a differential equation, one is looking for a function (e.g.
`f(x)`) that satisfies certain conditions. This requires being able to *represent* a function
numerically, i.e. to describe a function in a computer code with a finite number of real
numbers. Evaluating this function at a given point `x` is then a linear operation. In
addition, differential operators (i.e. derivatives) are linear operators.

● Describe one way in which a function can be represented numerically. (We
discussed two such methods in detail in the course.)
● Explain how this represents a function, i.e. how to evaluate `f(x)` for a given `x`.
● Sketch (on a blackboard, not on a computer) a (Julia) function that calculates the
derivative of the function. This function doesn't need to be 100% correct
(computers are picky), but it needs to convey the main ideas for evaluating the
derivative.
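
As an illustration, here is a sketch of one possible representation: samples of f on a uniform grid, evaluated by linear interpolation and differentiated with finite differences. It is written in Python rather than Julia, and the grid-based choice is an assumption; the course may have used other representations.

```python
def evaluate(xs, fs, x):
    """Evaluate a function stored as samples fs on a sorted grid xs,
    using linear interpolation between neighbouring grid points."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the grid")
    i = 0
    while i < len(xs) - 2 and xs[i + 1] < x:
        i += 1
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1.0 - w) * fs[i] + w * fs[i + 1]

def derivative(xs, fs):
    """Return samples of f' on the same uniform grid: centred differences
    inside, one-sided differences at the two ends."""
    n = len(fs)
    h = xs[1] - xs[0]
    dfs = [0.0] * n
    dfs[0] = (fs[1] - fs[0]) / h
    dfs[-1] = (fs[-1] - fs[-2]) / h
    for i in range(1, n - 1):
        dfs[i] = (fs[i + 1] - fs[i - 1]) / (2.0 * h)
    return dfs
```

Both operations are linear in the coefficients fs, so each could equally be written as a matrix acting on the vector of samples.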

Question 1: In the tutorial session we looked at a variant of the Metropolis-Hastings
algorithm where insteme. Please describe the advantages and disadvantages of these
two approaches, and a situation where one would be more useful than the other.

Question 2: In the affine invariant ensemble sampler, we saw that the algorithm moves a
group of walkers around the state space. One of the parameters we have to set when
using this sampler is the number of walkers. Please describe any strict limits on the
number of walkers that we can choose in order to ensure correct results. Please
describe practical considerations that would lead us to choose a certain number of
walkers.

Question 1: Ordinary differential equations:

1. What are ordinary differential equations (ODEs)? Give an example. The standard form in
which an explicit ODE is formulated when it is to be solved numerically is y'(x) = f(y, x),
where x is the independent variable and y(x) is the solution that is sought. The function f
is called the right hand side.
2. Describe a simple method that can be used to solve an ODE numerically. Sketch the
respective algorithm as a Julia function on the blackboard.
3. ODEs that contain higher (higher than first) derivatives can be reduced to first order.
Describe how to do this for an ODE of your choice. This leads to a system of ODEs. How do
you solve such a system of ODEs?
4. What is a disadvantage of very simple ODE integrators? How can this disadvantage be
overcome?
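
A sketch of the simplest such integrator, the forward Euler method, written in Python rather than Julia. It uses the standard form y'(x) = f(y, x) from above; the function names and the harmonic oscillator example u'' = -u are assumptions chosen for illustration.

```python
import math

def euler(f, y0, x0, x1, nsteps):
    """Integrate y'(x) = f(y, x) from x0 to x1 with the forward Euler
    method. y is a list, so the same code handles systems of ODEs."""
    h = (x1 - x0) / nsteps
    x, y = x0, list(y0)
    for _ in range(nsteps):
        dy = f(y, x)
        y = [yi + h * dyi for yi, dyi in zip(y, dy)]
        x += h
    return y

def oscillator(y, x):
    """u'' = -u rewritten as a first-order system with y = [u, v], v = u'."""
    u, v = y
    return [v, -u]

# Usage: u(0) = 1, u'(0) = 0 has the exact solution u(x) = cos(x).
u_end, v_end = euler(oscillator, [1.0, 0.0], 0.0, 1.0, 1000)
error = abs(u_end - math.cos(1.0))
```

Forward Euler is only first-order accurate, so the error shrinks linearly with the step size; higher-order schemes such as Runge-Kutta methods reach the same accuracy with far fewer steps.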

Question 2:

1. What is an elliptic partial differential equation? Give an example.
2. When given just an elliptic PDE, one needs additional conditions to make up a well-defined
system. What conditions are these? Why are they necessary? Sketch the domain on the
blackboard.
3. What ansatz could you choose to convert an elliptic equation to a linear equation of the
form A x = b? How are the operator A and the right hand side vector b defined?
4. Give a simple argument why A will be square in your ansatz.
5. What is a sparse matrix? How is it stored, and why can this be much more efficient than a
regular (dense) matrix? Why are sparse matrices useful for a finite difference discretization?
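
To make the A x = b form concrete, here is a Python sketch for an assumed one-dimensional example: u''(x) = f(x) on [0, 1] with boundary values u(0) = u(1) = 0, discretized by finite differences. The matrix A is tridiagonal, so only its three diagonals are stored and the system is solved with the Thomas algorithm. The function name and the choice of example are illustrative assumptions.

```python
def solve_poisson_1d(f, n):
    """Solve u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0 using n interior
    grid points and the second-order stencil (u[i-1] - 2u[i] + u[i+1]) / h^2."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    # Sparse storage: only the three non-zero diagonals of A are kept.
    lower = [1.0 / h**2] * n   # lower[0] is unused
    diag = [-2.0 / h**2] * n
    upper = [1.0 / h**2] * n   # upper[n-1] is unused
    rhs = [f(x) for x in xs]
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return xs, u
```

There is one unknown and one equation per interior grid point, which is why A is square here. Storing three diagonals costs O(n) memory, whereas the dense matrix would cost O(n^2).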

Question 3:

1. What is a hyperbolic partial differential equation? Give an example.
2. When given just a hyperbolic PDE, one needs additional conditions to make up a
well-defined system. What conditions are these? Why are they necessary? Sketch the
domain on the blackboard.
3. While it is possible to convert a hyperbolic PDE to a linear equation of the form A x = b
(in the same way as an elliptic equation), this is typically not done. Describe on the
blackboard how a hyperbolic PDE could be discretized and solved.
4. What is the Courant–Friedrichs–Lewy condition?
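
As a sketch of explicit time stepping for a hyperbolic problem, the Python code below advances the advection equation u_t + c u_x = 0 (an assumed example, with c > 0 and periodic boundaries) using the first-order upwind scheme. The function name is hypothetical. For this scheme the Courant–Friedrichs–Lewy condition reads c * dt / dx <= 1: in one time step, information must not travel further than one grid cell.

```python
def advect_upwind(u, c, dx, dt, nsteps):
    """Advance u_t + c u_x = 0 (c > 0) on a periodic grid with the
    first-order upwind scheme."""
    courant = c * dt / dx
    if not 0.0 < courant <= 1.0:
        raise ValueError("CFL condition violated: need 0 < c*dt/dx <= 1")
    u = list(u)
    for _ in range(nsteps):
        # u[i - 1] with i = 0 wraps to the last point: periodic boundary.
        u = [u[i] - courant * (u[i] - u[i - 1]) for i in range(len(u))]
    return u
```

With c * dt / dx equal to 1 the scheme shifts the profile by exactly one cell per step; smaller values remain stable but smear the profile through numerical diffusion.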

Home · eschnett/2018-computational-physics-course Wiki · GitHub
