Notes On Random Variables, Expectations, Probability Densities, and Martingales
Sims
For many or most of you, parts of these notes will be review. If you have had
multivariable calculus and econometrics or statistics, it should all be review until the
section on martingales. If the math used in these notes and in the exercise at the end
seems very unfamiliar or difficult to you, be sure to let me know and to raise questions
in class when the math starts to go by too fast. These notes use integrals with respect
to several variables, which may be new if you’ve not had multivariable calculus.
The first part of these notes is material that should be in most introductory undergraduate probability or econometrics textbooks. The sections on the law of iterated
expectations and on martingales overlap the assigned material in Campbell, Lo and
MacKinlay.
Parts of the exercise and the notes assume that you can work with matrix multiplication, inversion, determinants, etc. If this is not true for you, let me know and we
will go over these concepts in class.
1. Random Variables
A random variable is a mathematical model for something we do not know but
which has a range of possible values, possibly some more likely than others. Examples
include the sum of the dots on rolled dice, the value of a share of Microsoft stock 6
months from now, and the name of the prime minister of Russia 3 weeks from now.
For most of us the value of a share of Microsoft stock 6 months ago is also a random
variable, though once we have looked it up somewhere we could treat it as a simple
number. If dice are fair, each number of dots, from 1 through 6, is equally likely on each
die. We formalize this by saying that each of the numbers from 1 to 6 has probability 1/6. The set of numbers from 1 to 6, the range of possible values, is what we call the probability space of the random variable, and the set of numbers, adding up to 1, that we attach to the elements of the probability space (in this case 1/6 on each of the six numbers in the space) is the probability distribution on the space. When we roll
two dice together and consider their sum, the probability space becomes the integers
from 2 to 12, and it no longer makes sense to give them all equal probability.
A random variable like the value of Microsoft shares in 6 months does not have a
finite list of possible values. It could lie anywhere between zero and some very large
positive number. (Actually, stock prices are always quoted in discrete increments, so
that there really are only finitely many positive values, but the number of possible
values is very large, so we are going to pretend, as does most financial modeling, that
the price could actually be any real number.) For a random variable like this, every
individual real number has zero probability, but intervals of real numbers can have
positive probability. We usually specify the probability of such intervals by specifying
a probability density function or pdf. A pdf for a single random variable X taking
on real values is a function f (·) defined on the real line that is everywhere non-negative
and satisfies

    ∫_{−∞}^{∞} f(x) dx = 1 .    (1)
The probability of an interval (a, b) of values for the random variable is then
    P[X ∈ (a, b)] = ∫_a^b f(x) dx .    (2)
Random variables that take on no single numerical value with positive probability, but
have a pdf over the real line are called continuously distributed, while those that
take on a list of possible values, each with positive probability, are called discretely
distributed. There can also be random variables that mix these two categories.
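As a concrete check of (1) and (2), the short sketch below numerically integrates an assumed pdf (the standard normal density, chosen purely for illustration) over the whole real line and over the interval (−1, 1):

    import numpy as np
    from scipy.integrate import quad

    def f(x):
        # standard normal pdf, an assumed example density
        return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

    total, _ = quad(f, -np.inf, np.inf)   # should be 1, as in (1)
    p_ab, _ = quad(f, -1.0, 1.0)          # P[X in (-1, 1)], as in (2)
    print(total, p_ab)                    # 1.0 and about 0.6827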
A set of random variables {X1, . . . , Xn} may have a joint distribution. The simplest sort of example would be the joint distribution of the values of two dice rolled
together. Each can take on the values 1 to 6. The joint probability space for them is
the set of pairs of numbers (n1 , n2 ), with each of n1 and n2 taking on values 1 to 6.
We can display this probability space graphically as:
                  die 2
              1   2   3   4   5   6
          1   •   •   •   •   •   •
          2   •   •   •   •   •   •
  die 1   3   •   •   •   •   •   •
          4   •   •   •   •   •   •
          5   •   •   •   •   •   •
          6   •   •   •   •   •   •
If each side of a single die has probability 1/6 and they are thrown fairly, we usually assume that each of the 36 dots in this diagram has the same probability. Since there are 36 dots and the probabilities add to one, each has probability 1/36. Note that now we can see why it does not make sense to give equal probability to all possible sums of values on the dice. The sum of the two dice is the same along diagonal rows of the diagram, running from upper right to lower left. The sum is two just at the diagram’s upper left corner, 12 just at the lower right corner, and 7 along the longest diagonal, running from lower left to upper right. So a sum of 7 occurs at 6 points and has a total probability of 6 · 1/36 = 1/6, while 2 has probability 1/36.
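These probabilities are easy to verify by enumerating the 36 equally likely points in the diagram; a minimal sketch in Python:

    from fractions import Fraction
    from collections import Counter

    # count how many of the 36 equally likely (die 1, die 2) points give each sum
    counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
    dist = {s: Fraction(n, 36) for s, n in sorted(counts.items())}
    print(dist[7], dist[2])    # 1/6 and 1/36, as claimed above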
For a pair of continuously distributed random variables {X1, X2} the joint distribution is described by a joint pdf f(·, ·). The probability of the region in which X1 lies in (a, b) and X2 lies in (c, d) is the integral of f(x, y) over the rectangle defined by those intervals:

    P[X1 ∈ (a, b), X2 ∈ (c, d)] = ∫_a^b ∫_c^d f(x, y) dy dx .
When instead X and Y are jointly continuously distributed, so they have a pdf f (x, y),
the marginal pdf g(x) of X is found from
    g(x) = ∫ f(x, y) dy .    (8)
At least in classroom problem sets, the simplest way to calculate conditional expec-
tations for continuous distributions is usually to use (11) to form a conditional pdf,
then to integrate the product of the conditional pdf with the thing whose expectation
is being taken. That is, use
    E[g(X, Y) | x] = ∫ g(x, y) f(y|x) dy .    (13)
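As an illustration of (8) and (13), the sketch below works through a small example symbolically. The joint pdf f(x, y) = x + y on the unit square is an assumed example, not one from the text:

    import sympy as sp

    x, y = sp.symbols("x y", positive=True)
    f_joint = x + y                          # assumed joint pdf on [0,1]^2

    # marginal pdf of X, as in (8): integrate the joint pdf over y
    g_x = sp.integrate(f_joint, (y, 0, 1))   # = x + 1/2

    # conditional pdf of Y given X = x: joint divided by marginal
    f_cond = f_joint / g_x

    # conditional expectation E[Y | x], i.e. (13) with g(X, Y) = Y
    E_y_given_x = sp.simplify(sp.integrate(y * f_cond, (y, 0, 1)))
    print(E_y_given_x)                       # equivalent to (3x + 2)/(6x + 3)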
Often instead of starting from a joint pdf, we start with a model that describes a
conditional pdf. Then we might want to construct a joint pdf from a given marginal
pdf for X and conditional pdf for Y |X. It is easy to see that (12) can be turned around
to read
    f(x, y) = f(y|x) f(x) .    (15)
Note that here we are relying on the different argument lists for f to make it clear that
there are in fact three different functions that are all denoted f . The left-hand side
of (15) is the joint pdf of Y and X, the first f on the right is the conditional pdf of
Y |X, and the second f on the right is the marginal pdf of X. This is sometimes handy
notation, but it can lead to confusion, especially if, say, we are dealing with both a
Y|X pdf and an X|Y pdf in the same discussion.
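One practical use of (15) is simulation: to draw from the joint distribution, draw X from its marginal and then draw Y from the conditional given that draw. A minimal sketch, with an assumed marginal (exponential) and conditional (normal) chosen only for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    x = rng.exponential(scale=1.0, size=n)   # draw X from its marginal pdf f(x)
    y = rng.normal(loc=x, scale=1.0)         # draw Y from the conditional pdf f(y|x)

    # (x, y) pairs are now draws from the joint pdf f(x, y) = f(y|x) f(x).
    # Sanity check: E[Y] = E[E[Y|X]] = E[X] = 1 for these assumed distributions.
    print(x.mean(), y.mean())                # both approximately 1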
Note that for market participants, it must be that Pt is in the information set available
at t—they couldn’t trade at the price Pt if they didn’t know what it was. This means
that we always include Pt in the Xt vector, and that therefore Et[Pv] = Pv for any v ≤ t.
Note also that the “Et” notation, while it spares us the tedium of writing out the information set, can be dangerous: the same equation, written in Et notation, can have different implications according to what is being implicitly assumed about the information set.
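For a concrete instance of the law of iterated expectations, the sketch below averages the conditional expectation of the two-dice sum S given the first die over the distribution of the first die, recovering the unconditional expectation:

    from fractions import Fraction

    total = Fraction(0)
    for d1 in range(1, 7):
        # E[S | die 1 = d1]: average of d1 + d2 over the six faces of die 2
        cond_exp = sum(Fraction(d1 + d2, 6) for d2 in range(1, 7))
        # weight by P[die 1 = d1] = 1/6 and accumulate
        total += Fraction(1, 6) * cond_exp
    print(total)    # 7, the unconditional E[S]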
where σ^{ij} is the element in the i’th row, j’th column of Σ^{-1}. In (20) the vector µ is the vector of means of the elements of X, and Σ is the covariance matrix of X, defined in matrix notation by Σ = E[(X − µ)(X − µ)′]. Avoiding matrix product notation, we can write this out by stating that σij, the i’th row, j’th column element of Σ, is
    σij = E[(Xi − µi)(Xj − µj)] .    (23)
One reason the normal distribution occurs so often is that it allows calculation of
marginal and conditional distributions directly, without explicitly evaluating integrals
of pdf’s. So in practice it is less important to know (20) than to know that if X1
and X2 are two jointly normal random variables with 2 × 1 mean vector µ and 2 × 2
covariance matrix Σ,
    E[X1 | X2] = µ1 + (σ12/σ22)(X2 − µ2) ,    (24)
and that, furthermore, the conditional distribution of X1 |X2 is itself normal. The
marginal distribution of X1 in this case is just N (µ1 , σ11 ).
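A quick Monte Carlo check of (24) is easy to run. The mean vector and covariance matrix below are assumed values chosen only for illustration; the empirical regression coefficient of X1 on X2 should match σ12/σ22:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, 2.0])                 # assumed mean vector
    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])            # assumed covariance matrix
    x1, x2 = rng.multivariate_normal(mu, Sigma, size=200_000).T

    # the coefficient on (X2 - mu2) in (24) versus its empirical counterpart
    print(Sigma[0, 1] / Sigma[1, 1],                       # sigma12/sigma22 = 0.8
          np.cov(x1, x2, ddof=0)[0, 1] / np.var(x2))       # approximately 0.8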
When X1 and X2 are themselves vectors, it is again true that the conditional dis-
tributions of one given the other are normal. We split the mean vector of the joint
distribution into two pieces, and the covariance matrix into four pieces, corresponding
to the division of the full vector made up of X1 and X2 into its two components. Then
in matrix notation we can write
    E[X1 | X2] = µ1 + Σ12 Σ22^{-1} (X2 − µ2) .    (25)
In this expression we are using the notation that
    Σij = E[(Xi − µi)(Xj − µj)′] ,  i, j = 1, 2 .    (26)
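As a sketch of how (25) and (26) are used in practice, the following computes a conditional mean from an assumed 3 × 1 mean vector and 3 × 3 covariance matrix, with X1 taken to be the first element; all numbers are illustrative assumptions:

    import numpy as np

    mu = np.array([0.0, 1.0, 2.0])           # stacked means of (X1, X2)
    Sigma = np.array([[4.0, 1.0, 0.5],
                      [1.0, 2.0, 0.3],
                      [0.5, 0.3, 1.0]])

    k = 1                                     # X1 is the first k elements
    mu1, mu2 = mu[:k], mu[k:]
    S12 = Sigma[:k, k:]                       # Sigma_12 block, as in (26)
    S22 = Sigma[k:, k:]                       # Sigma_22 block

    x2 = np.array([1.5, 1.8])                 # an observed value of X2
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)   # formula (25)
    print(cond_mean)                          # E[X1 | X2 = x2]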
7. Exercises
(1) Suppose that time is discrete, i.e. that t takes on only integer values, and that
we replace (27) by the apparently weaker assertion that for all t, Et [Pt+1 ] = Pt .
Use the law of iterated expectations to show that this implies (27).
(2) There are only four time periods, t = 1, 2, 3, 4. We know that there are only
four possible time paths for the asset price Pt , and we can list them as
Pt paths:

    path no.   t = 1   t = 2   t = 3   t = 4
        1        2       3       3       3
        2        2       1       2       3
        3        2       1       2       1
        4        2       1       0       0
Here each row represents a possible time path for Pt . All paths have P1 = 2,
reflecting the fact that there is no uncertainty about P1 , but after that the
course of prices is uncertain. The probabilities of the four paths are π1 , π2 ,
π3 , π4 . The information known at t consists only of values of Ps for s ≤ t. If
the four probabilities are π1 = .5, π2 = .125, π3 = .125, and π4 = .25, show
that P is a martingale. Is P also a martingale if we change π3 to .25 and π4
to .125? If not, what is the profit opportunity implied by this change in the
probabilities, and at what time would it appear? Note that what you have
been given here is the joint distribution of P1 , . . . , P4 , so that you will have to
do some summing to get marginal distributions, form conditional distributions,
and form expectations. (A sketch of the mechanics of this summing appears after these exercises.)
(3) Suppose an asset price Pt at three dates, t = 1, 2, 3, has a joint normal distribution with mean 2 at all three dates (i.e. a mean vector consisting of three 2’s
stacked up) and one of the following four covariance matrices:
    (a)  1  1  0        (b)  3  3  3
         1  2  2             3  5  5
         0  2  5             3  5  6

    (c)  6  5  3        (d)  3  3  3
         5  5  3             3  4  3
         3  3  3             3  3  5
We again assume that information available at t is only current and past
prices. For each of these covariance matrices, determine whether Pt is a martingale. Here you will want to use (25). (The distributions we assume here
imply nonzero probabilities of negative P , which is unrealistic. Ignore this lack
of realism in doing the exercise.)
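For exercise (2), the following sketch (a starting point, not a full solution) illustrates the mechanics of the summing: it enumerates the price histories observable at each date, restricts attention to the paths consistent with each history, and compares the conditional expectation of the next price with the current price. The paths and probabilities are the ones given in the exercise; rerun it with the altered π3 and π4 to investigate the second part.

    from fractions import Fraction

    paths = [(2, 3, 3, 3), (2, 1, 2, 3), (2, 1, 2, 1), (2, 1, 0, 0)]
    probs = [Fraction(1, 2), Fraction(1, 8), Fraction(1, 8), Fraction(1, 4)]

    for t in range(3):                        # dates 1, 2, 3 (0-indexed here)
        for h in sorted({p[: t + 1] for p in paths}):
            # paths (with probabilities) consistent with the observed history h
            consistent = [(pi, p) for pi, p in zip(probs, paths) if p[: t + 1] == h]
            total = sum(pi for pi, _ in consistent)
            cond_exp = sum(pi * p[t + 1] for pi, p in consistent) / total
            print(f"date {t + 1}, history {h}: "
                  f"E[next price] = {cond_exp}, current price = {h[-1]}")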