0% found this document useful (0 votes)

102 views

Conditional Probability

This document provides an introduction to conditional probabilities and expectations. It defines conditional expectation rigorously using measure theory concepts like the Lebesgue-Radon-Nikodym theorem. The document explains the intuition behind conditional expectations and how they can be viewed as averages when observing random variables through a coarser information structure. It also covers properties of conditional expectations like linearity and the law of total probability.

Uploaded by

albertopianoflauta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

102 views

Conditional Probability

Uploaded by

albertopianoflauta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 18

Introduction to Conditional Probabilities and

Expectations
Steve Cheng
March 2, 2008

Contents
1 Basic definition

2 Intuitive explanations

3 Conditional expectation given a random variable

4 Basic properties of the conditional expectation

5 Conditional probabilities

6 Convergence properties of the conditional expectation

7 Conditional probability measures

8 Calculating conditional expectations

9 Change of variable

10 What is being conditioned on can be set constant

11 Hilbert space theory of conditional expectations

12 Bibliography

Purpose
In this note, we give a rigorous definition of conditional probabilities and expectations,
and some fundamental results about them. We assume that the reader already is familiar
with the intuitive notion of conditional probabilities (P(A | B) for P(B) > 0)). For
our exposition, we will also depend on, of course, some measure theory, including the
Lebesgue-Radon-Nikodym theorem.
The author has written this note because he still does not readily encounter introductions to conditional probability that are theoretically rigorous and yet not afraid to delve
into, explain and justify the intuition behind the concepts. (Though J. Michael Steeles

book referenced in the bibliography comes close, even as that author remarks that the
abstract definition of conditional probability is not easy to love; fortunately, love is not
required.)

Copyright matters
Permission is granted to copy, distribute and/or modify this document under the terms of
the GNU Free Documentation License, Version 1.2 or any later version published by the
Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and
with no Back-Cover Texts.

Basic definition

Let (, , P) be a probability space. Suppose we are given G , a -algebra of , and

Y : R, a random variable with E|Y | < . We define the conditional expectation of
Y given G by the following procedure.
Let be a signed measure given by
(A) = E[Y 1A ] ,

for A G.

Clearly () = 0, while if A1 , A2 , . . . are disjoint events, then

(A1 A2 ) = E[Y 1A1 A2 ]
= E[Y (1A1 + 1A2 + )]
= E[Y 1A1 ] + E[Y 1A2 ] +
= (A1 ) + (A2 ) + .
Let P|G denote the restriction of P to G. If P|G (A) = 0 then E[Y 1A ] = 0, so the
measure is absolutely continuous with respect to P|G . By the Lebesgue-Radon-Nikodym
theorem, there exists a G-measurable function g : R, such that
Z
E[Y 1A ] = (A) =
g dP|G = E[g 1A ] , for all A G.
(1)
A

Moreover, the theorem says that the function g is actually unique up to P|G -null sets.
Definition 1.1. The conditional expectation of Y given G, denoted by E[Y | G], is defined
to be any of the G-measurable functions g : R that satisfy equation (1).
Though there are many functions g that are candidates, each pair of them only differ
on a set of P|G -measure zero.
As the conditional expectation can only be defined when Y is integrable (E|Y | < ),
we will tacitly make such assumptions in our work unless stated otherwise.

Intuitive explanations

We now want to give some intuitions about the strange definition of conditional expectation just given.

Firstly, a -algebra G on the sample space , roughly speaking, is a lens or filter

through which we observe what is happening in the probability space. If G is coarse and
contains few events the extreme example being G = {, } then we can calculate
probabilities for those events only. To say that a function g : R is G-measurable for
a smaller -algebra G, means that it is blurred by looking at it through G. This vague
statement is made precise by this simple fact:
Proposition 2.1. Let (, G) be a measurable space, and g : R be a G-measurable
function. If 1 , 2 are never separated in G, i.e. there is no set in G that contains
one of these points but not the other, then g(1 ) = g(2 ).
Proof. Let B = {g(1 )}. As g 1 (B) is G-measurable and contains 1 , it must contain
2 .

We can now say that, E[Y | G] is an average value of the random variable Y when
it is viewed through the lens of a more constricting -algebra G. It is an average value,
because, when its expected value is taken in equation (1), it is the same as the expectation
of Y , across all subsets of G. The function E[Y | G] is a blurred version of Y where we
cannot tell the difference in the value of Y , at sample points that are not separated in G.
To further illustrate the intuitive explanations, consider conditioning Y on the trivial
-algebra G = {, }, which separates no points. The G-measurable function E[Y | G]
must clearly be a constant. By equation (1), the constant must be EY , i.e. the average
value of Y on the whole sample space.
Another important case to consider is when Y is already G-measurable that is,
when
G (Y ) = {Y 1 (B) : B is a Borel set in R} ,
or that G separates points finely enough for Y . Then E[Y | G] = Y immediately from the
definition. So constricting Y to G just amounts to doing nothing at all.

Conditional expectation given a random variable

We commonly condition random variables given the -algebra

G = (X) = {X 1 (B) : B is measurable}
that is generated by another random variable X.
If X : R takes the same value at two sample points 1 , 2 , then G = (X)
does not separate 1 and 2 . Therefore, by Proposition 2.1, E[Y | G]() takes on the
same values wherever X() are at the same values. In other words, E[Y | G] is really a
function of X; that is, E[Y | G] = h X for some h : R R.
We will want to ascertain some measurability properties of h, and to do this it is best
to go modify the original construction of E[Y | G] to produce the function h directly.
Given a measurable function X : 0 , where (0 , 0 ) is another measurable space1 ,
define the signed measure 0 on (0 , 0 ) by
0 (B) = E[Y 1XB ] = E[Y 1X 1 (B) ] ,

for all B 0 .

1 The most simplest case is 0 = R and 0 = B , but 0 could also be Rn . Usually we will assume
R
that 0 contains all singleton sets; otherwise Proposition 2.1 might fail to hold with R replaced by 0 .
Also we want to be allowed to talk about seemingly simple sets like [X = x].

Let PX be the pull-back measure B 7 P(X 1 (B)). If PX (B) = 0, then 0 (B) = 0,

so 0 is absolutely continuous with respect to PX . By the Lebesgue-Radon-Nikodym
theorem, we obtain a measurable function h : 0 R, unique up to PX -null sets, such
that
Z
E[Y 1XB ] = 0 (B) =
h dPX , for all B 0 .
(2)
B

By a change of variables with A = X 1 (B) G = (X), we have

Z
Z
h X dP|G =
h dPX
A
B
Z
= E[Y 1XB ] = E[Y 1E ] =
g dP|G ,
g = E[Y | G] .
A

Since the integrals on the extreme left and right are equal on all A G, the integrands
h X and g must be equal P|G -almost surely.
Definition 3.1. The symbol E[Y | X] is to mean the same thing as E[Y | (X)]. It is
called the conditional expectation of Y given the random variable X.
There always exists a measurable function h : 0 R such that h(X()) = E[Y |
X]() for P|G -almost all . The value h(x) is often denoted by E[Y | X = x], even
though it may not always be well-defined as a single number. When we use this notation
we will usually have a specific version of the function h in mind.
Equation (2) stated in the new notation becomes:
Z

E[Y 1XB ] = E E[Y | X] 1XB =
E[Y | X = x] dPX ,

for all B 0 .

(3)

To summarize, the construction just given is basically the same as the first construction
used for E[Y | G], only that (0 , 0 , PX ) replaces (, , P|G ) in the original. Since they
are so similar, when we discuss properties about E[Y | G] and E[Y | X = x] hereafter,
we will usually state and prove the properties for only E[Y | G], as the modifications for
E[Y | X = x] are trivial.
Intuitively, the function E[Y | X] answers the question: What is the average value of
Y given the values that the random variable X takes on?
Or, the operation E[ | X] can be thought of as extracting the part of a random variable
that can be predicted as a function of X (a (X)-measurable random variable).
Note the arguments in this section also give an easy-to-digest characterization of realvalued integrable functions2 that are (X)-measurable for some measurable X: they
are exactly those functions that can be expressed as an integrable function of X almost
everywhere.

Basic properties of the conditional expectation

The following property is trivial but well-known in nave probability theory (when G =
(X)):
2 This

characterization is also valid in general measure theory, provided the measure involved is -finite.

Proposition 4.1 (The law of total probability). For any real-valued random variable Y ,

E E[Y | G] = EY .
Proof. Take A to be the entire probability space G in equation (1).

In a sense, the conditional expectation has been defined in such a way that functions that are conditional expectations must satisfy a generalization of the law of total
probability that it must hold on all G-measurable subsets and not just on . The generalized law of total probability makes the satisfying functions unique up to sets of measure
zero, whereas lots of other G-measurable functions Z can satisfy only E[Z1 ] = E[Y 1 ],
including of course the constant Z = EY .
It is easy to check the linearity property of conditional expectations, as suggested by
the notation:
Proposition 4.2. For constants a, b, c R, almost surely
E[aX + bY + c | G] = a E[X | G] + b E[Y | G] + c .
Proof. The Radon-Nikodym derivative operates linearly on signed measures, and obviously E[c | G] = c.

And the comparison property:
Proposition 4.3. If Y 0, then almost surely
E[Y | G] 0 ,
which in combination of Proposition 4.2 leads to the generalized triangle inequality:

E[Y | G] E[|Y | | G] .
Proof. The Radon-Nikodym derivative of a positive measure is almost everywhere nonnegative.

The same properties hold when G is replaced with a random variable X.
We end this section with the tower property of conditional expectation, that is so
frequently used that it would not do justice to omit:
Proposition 4.4. If G1 and G2 are two -algebras with G1 G2 , then for any random
variable Y ,

E E[Y | G1 ]G2 = E[Y | G1 ] = E E[Y | G2 ]G1 .
Explanation: Filtering a random variable on a coarser -algebra, then a sharper/finer
one, or vice versa, is the same as filtering just on the coarser one. Or, conditioning can
only subtract from, and not add to, the randomness of a random variable.
Proof. The first equality is immediate: as E[Y | G1 ] is G1 -measurable by definition, and
hence is already G2 measurable, so re-conditioning on G2 changes nothing.
For the second equality, we must show that E E[Y | G2 ] | G1 is just another version
of E[Y | G1 ], satisfying its defining equation (1). All we need to do is to let B G1 G2
and push some symbols around:

E E[ E[Y | G2 ] G1 1B ] = E E[Y | G2 ] 1B = E[Y 1B ] .

Proposition 4.5. If X is G-measurable, then X can be factored out of a conditional

expectation with respect to G:
E[XY | G] = X E[Y | G] .
Proof. We have to prove that, for all B G,

E X E[Y | G] 1B = E[XY 1B ] .
If X = 1A for some A G, this is trivial. By linearity, if X is a simple random variable
of the form
n
X
X=
xi 1Ai , xi R , Ai G ,
i=1

the conclusion also holds.

Now assume, for the moment, that X and Y are non-negative. Take a sequence of
random variables Xn that increase to X, and apply the monotone convergence theorem
twice:

E (XY ) 1B = lim E (Xn Y ) 1B = lim E Xn E[Y | G] 1B = E X E[Y | G] 1B .
n

Finally, if X and Y are not necessarily non-negative, write them in terms of their
positive and negative parts, and apply linearity to obtain the same conclusion.

Conditional probabilities

We have delayed the discussion of conditional probabilities, because they are defined by
a similar process as conditional expectations, and we want to avoid repeating proofs of
what are mostly the same thing. But with conditional expectations available, they are
easy to define:
Definition 5.1. The conditional probability of an event A given G is the function
P(A | G) = E[1A | G] .
Definition 5.2. The conditional probability of an event A given a random variable
X : 0 is the function
P(A | X) = E[1A | X] .
Intuitively, the right-hand sides are the average value of 1A given G (or X). This
average value of the indicator function 1A is the probability of the event A occurring.
Also, if given a value X() = x 0 , evaluating P(A | X = x) = E[1A | X =
x] = E[1A | X]() gives the conditional probability of A occurring when we know that
X = x. However, we cannot take this too literally, because if the event [X = x] occurs
with probability zero, P(A | X = x) might not have a single value across all versions of
the function P(A | X).
Reinterpreting equations (1) and (3), we arrive at the following defining relations:
Z
P(A E) =
P(A | E) dP|G ,
E G,
(4)
ZE
P(A [X B]) =
P(A | X = x) dPX ,
B 0 .
(5)
xB

But these are just the Bayes rules for conditional probabilities!
6

Definition 5.3. For any event B with positive probability of occurrence, we also
define
P(A | B) = P(A | 1B = 1)
E(Y | B) = E(Y | 1B = 1) .
Applying equation (3) shows that P(A | B) just defined agrees with the usual definition:
Z
P(A B) = E[1A 1B ] =
P(A | 1B ) dP1B
{1}

= P(A | 1B = 1) P(1B = 1) = P(A | B) P(B) ,

so that
P(A | B) =

P(A B)
.
P(B)

(6)

If B is a null set, P(A | B) is not well-defined, and this is of course where the traditional
definition of P(A | B) fails.
If B is the event [X = x], the definition just made is (fortunately!) consistent with
that of P(A | X = x).
For the sake of completeness, we mention the following intuitive fact, obviously true
for the nave definition of conditional probability:
Proposition 5.1. Let A be an event. Then P(A | G) = P(A) almost surely if and
only if A is independent from all events in G,
Proof. For the if direction: for all E G,
E[P(A) 1E ] = P(A) P(E) = P(A E) = E[1A 1E ] = E[P(A | G) 1E ] ,
so P(A) satisfies the condition for being P(A | G). The converse follows from the same
equation.

The proof of the following theorem requires a result from the next section. But rest
assured, the next section does not depend on the results here, nor is the present result
crucial to understand immediately. The material is put into this order only to make it
easier to digest, and to build intuition on how independence and conditional expectations
are related.
Corollary 5.2. E[f (Y ) | G] = E[f (Y )] for all PY -integrable functions f , if and only if Y
is a random variable independent of all the events in G.
(The random variable Y can have a codomain 0 that is not necessarily R.)
Proof. Note that Y being independent of all events in G means exactly that the events
[Y B] are independent of all events in G.
For the if direction, Proposition 5.1 already proves the case for f = 1B for measurable sets B 0 , setting A = [Y B]. For arbitrary integrable f , approximate it
with a sequence of linear combinations fn of indicator functions converging to it pointwise
everywhere and dominated by |f | itself. Employing dominated convergence (Proposition
6.4) to take limits, we find E[f (Y ) | G] = E[f (Y )].
For the only if direction, simply select the particular cases f = 1B for measurable
sets B 0 , and apply the only if direction from Proposition 5.1.

7

For a little intuition of Corollary 5.2, take the example of G = (X) and h = identity.
If Y is independent from X, then Y cannot be expressed as a function of X at all, unless
it is constant.3 Thus knowing the value of X cannot possibly give additional information
on Y this is the content of the equation E[Y | X] = E[Y ], the constant average value.

Convergence properties of the conditional expectation

This section is devoted to developing the analogues of the convergence theorems for normal
expectations or integrals.
Theorem 6.1. Taking conditional expectations is a

E E[Y | G] E|Y | or equivalently,

L1 contraction:

E[Y | G] 1 kY kL1 .
L

Proof. E E[Y | G] E E[|Y | | G] = E|Y | using the generalized triangle inequality
(Proposition 4.3).

Corollary 6.2. If the real-valued random variables X1 , X2 , . . . converge to X in L1 , then
E[Xn | G] converge to E[X | G] in L1 .
Proof. Apply the previous theorem to Y = Yn = Xn X and take n .

Theorem 6.3 (Monotone convergence). If the non-negative random variables Xn are

increasing almost surely to a random variable X, and then the conditional expectations
E[Xn | G] increase almost surely to E[X | G].
Direct proof. First of all, we know that almost surely E[Xn | G] are increasing and bounded
by E[X | G] (justified by Proposition 4.3), but whether they increase to the bound E[X | G]
is still in question. To settle this, let B be the event where supn E[Xn | G] is less than
E[X | G] , for some fixed > 0.
Notice that B G because both E[Xn | G] and E[X | G] are, by definition, Gmeasurable. Then:
E[X 1B ] = lim E[Xn 1B ]
n

= lim E E[Xn | G] 1B
n

= E lim E[Xn | G] 1B
n

E lim E[X | G] 1B

normal monotone convergence

definition of conditional expectation
normal monotone convergence

= E[X 1B ] P(B) ,
and this is impossible unless P (B) = 0. Letting & 0, we see that the event where
E[Xn | G] does not increase to E[X | G] must have probability zero.

3 This

conclusion also follows if Y is merely uncorrelated with all functions of X. For a proof, refer to
the last remarks in Section 2.

Simpler proof. Apply the next result on dominated convergence, as Xn are dominated by
X. Of course, normally we cannot reduce the Dominated Convergence Theorem to the
Monotone Convergence Theorem this way, because E|X| in general might be infinite, but
in that case E[X | G] cannot be defined anyway.

Another formulation of the monotone convergence is possible that perhaps does not so
trivially reduce to dominated convergence. Suppose we do not assume that X has finite
mean, but that Z = limn E[Xn | G] has finite mean. Then we can conclude that X has
finite mean and Z = E[X | G].
Theorem 6.4 (Dominated convergence). If the random variables Xn converge to a random variable X almost surely, and |Xn | are dominated by another random variable Z with
finite expectation, then almost surely,

lim E |Xn X| | G = 0 and hence lim E[Xn | G] = E[X | G] .
n

= lim E[Yn 1B ]
n

dominated convergence on Yn .

Thus the G-measurable random variable limn E[Yn | G] is equal to 0 almost surely.

Conditional probability measures

Suppose X is a discrete random variable. If we look at P(A | X = x) = P(A [X =

x])/P([X = x]), as a function of A, it describes a probability measure with each fixed x.
So we are led to ask: is A 7 P(A | G)() a probability measure for each fixed ?
The answer is no, generally. For one thing, P(A | G)() is not always well-defined for
pointwise . We can see in more detail what goes wrong when we attempt to prove that
it is a probability measure.
Proposition 7.1. Let (, , P) be a probability space, and G be a -algebra.
i. If A is any event, then 0 P(A | G) 1 almost surely (with respect to P|G ).
ii. P( | G) = 0 and P( | G) = 1 almost surely.
iii. Given disjoint events A1 , A2 , . . . , we have almost surely
P(A1 A2 | G) = P(A1 | G) + P(A2 | G) + .
Proof.

i. Let E = [P(A | G) 0]. Then

Z
0 P(A E) =

P(A | G) dP|G 0 ,
E

so P(A | G) = 0 almost surely on E. i.e. P(A | G) almost never takes on negative

values. Similarly, if E = [P(A | G) 1],
Z
Z

P (E) P (A E) =
P(A | G) dP|G P(E) ,
P(A | G) 1 dP|G = 0 .
E

i.e. P(A | G) is almost never greater than one.

ii.

1 P( | G) dP|G = 0 .

P( | G) dP|G ,

1 = P() =

By (i), the integrand of the last integral is non-negative, and therefore P( | G) = 1

almost surely. Similarly,
Z
0 = P() = P( ) =
P( | G) dP|G

implies that P( | G) must be zero almost surely.

iii. Since for all E G,
Z
[
[

X
P
An | G dP|G = P
An E =
P(An E)
E

XZ
n

Z
P(An | G) dP|G =

!
X

P(An | G)

dP|G ,

the first and final integrands are equal almost surely. (The exchange of summation
and integration is allowed since the integrands are non-negative.)

The key phrase in Proposition 7.1 is almost surely. There is a P|G -null set N
where the (in)equalities may fail. The set N depends on the events A, because each
conditional probability is separately constructed for each A. There may not necessarily
exist a null set N for which the (in)equalities hold everywhere else for every event A .
On the other hand, if we were to identify a countable set of events A that we are
interested in, then we avoid this problem. For each A there is an exceptional null set,
and the countable union of all of these is again a null set; everywhere else on all the
relevant relations hold.
Given a random variable Y : R, a category of events that we can look at are
those of the form Y 1 (, y] . If we restrict y Q {, +}. There are at most
countably many of these, and yet knowing only their probabilities already determines the
probability distribution of Y . So the idea is to construct a version of B 7 P([Y B] | G),
where almost at all sample points in it becomes a probability distribution.
In our formal construction, we will also generalize to random variables that are Rn valued.

Theorem 7.2. Let (, , P) be a probability space, G be a -algebra, and Y : Rn

a random variable. For each , there exists a probability measure on (Rn , BRn ),
such that for P|G -almost all ,
(B) = P([Y B] | G)() .
Proof.
Construction of the cumulative distribution function. Let D be the countable collection of all events Dy = (, y1 ] (, yn ] for y Q {, +}.
By taking countable unions of the exceptional null sets, we can obtain a null set N
for which properties (i) and (ii) of Proposition 7.1 hold everywhere else for events
in Y 1 (D).
If A, B D and A B, then we have
P([Y A] | G) P([Y B] | G) ,

(7)

except on a null set. For each of the countably many pairs (A, B) D D there is
such a null set. Taking their union, we obtain a null set M for which relation (7)
holds everywhere except on M .
For each y Qn , define

F (y) = P Y Dy | G () ,

\ (N M )

(8)

for one of the versions of P( | G). And for z Rn \ Qn , set

F (z) =

inf

yQn : zj yj

F (y) ,

\ (N M ) .

(9)

We claim that F is a multi-dimensional cumulative distribution function, for each

\ (N M ). Clearly F (z) 0 for all z Rn . Also, by relation (7), F is
an increasing function of each argument if they are all rational. For non-rational
arguments it is also seen that F is increasing by virtue of equation (9).
These facts also mean that relation (9) can be broadened to:
F (z) =

inf

yRn : z6=y,zj yj

F (y) ,

(10)

for z Rn \ Qn .
Actually equation (10) holds for z Qn as well. Firstly, because F is increasing
in each variable, the infimum can be taken over only points of the form yn =
z + (1/n, . . . , 1/n), for n N, with no change. And secondly, by Fatous Lemma,
Z

0
lim inf F (yn ) F (z) dP|G
n
Z

lim inf
F (yn ) F (z) dP|G
n

= lim inf P(Y Dyn ) P(Y Dz ) = 0 .

Thus, lim inf n F (yn ) F (z) = inf n F (yn ) F (z) = 0 for all except on a
null set Nz . Provided we strip away these null sets too (for z Qn ) at the beginning,
equation (10), equivalent to right-continuity of F , holds in general.
With the same sort of argument using Fatous Lemma, we can prove that, save for a
null set, F 0 if one of the variables tends to , and F 1 if all the variables
tend to +.
For those which are on the exceptional null sets, we can define F (z) arbitrarily as P([Y Dz ]).
Thus we now know F is a cumulative distribution function, which then has a
corresponding probability measure .
(B) is the conditional probability. All that remains is to show that (B) = P([Y
B] | G)().
The first point is that 7 (B) should be G-measurable. This is unfortunately
somewhat technical: it involves the monotone class theorem, the same sort of argument used to prove measurability in Fubinis Theorem.
Let 0 = {B BRn : 7 (B) is G-measurable}. By equation (8), (D) is
measurable for all the sets in D D. For finite disjoint unions and complements B
of sets from D, (B) is measurable too, because it can be obtained by addition and
subtraction of various functions (D) for D D. And if Bn are sets in 0 increasing
or decreasing to B, then (B) = limn (Bn ) is a limit of G-measurable functions
and hence is measurable. This shows 0 is a monotone class, containing the algebra
generated D; by the monotone class theorem, 0 equals the -algebra generated by
D, that is, BRn .
The rest is easy. Consider
Z
B 7

(B) dP|G ,
E

which defines a positive measure, and, by definition, agrees with the measure B 7
P([Y B] E) for B D. Since D generates BRn , the two measures are ultimately
equal. As this is true for all E G, we have (B) = P([Y B] | G) as desired.
Definition 7.1. Let Y : 0 be measurable, for a measurable space (0 , 0 ). Any
function : 0 [0, 1] such that
(i) for each , : 0 [0, 1] is a probability measure, and
(ii) for each B 0 , (B) = P([Y B] | G) P|G -almost surely
is called a conditional probability measure for Y given G. In general, we denote these
functions by PY |G .
A conditional probability measure for Y given a random variable X is similarly defined,
and denoted PY |X .
Theorem 7.2 says that PY |G (or PY |X ) exists at least if 0 = Rn .
We make a brief note, that it exists also if 0 = RN . That is, given a a countable
number of random variables Yn : R, we can still construct PY |G for Y = (Y1 , Y2 , . . . ).
This is done by the same kind of procedures used to construct sample spaces for an infinite
12

number of random variables namely, by the Kolmogorov Existence Theorem. For it to

work, we only need to verify the consistency conditions:
P (E) = P (E R|||| ) ,

(11)

where , are ordered finite subsets of N (with no repetition of members), and P stands
for the finite-dimensional conditional probability measures for (Y(1) , Y(2) , . . . , Y(||) )
constructed in Theorem 7.2. For each , , equation (11) is found to hold for every
measurable E BR|| except for a null set. But there are only countably many possible
pairs of , , so we can obtain a single null set where equation (11) holds everywhere else.
Then the Kolmogorov Existence Theorem allows us to construct
P(a1 < Y1 b1 , a2 < Y2 b2 , . . . | G)()
as a probability measure for each .
(We cannot go much further than this, to construct conditional probability measures
for an uncountable number of variables, because by taking the variables 1E for every event
E, we would be able to construct P(E | G)().)

Calculating conditional expectations

In elementary courses of probability theory, one learns a definition of the conditional

probability measure x (B) = P(Y B | X = x), and defines E[Y | X = x] as the
expectation of a random variable whose distribution is given by x .
Since our definition of conditional expectation does not invoke conditional probability
measures, the above definition though quite convenient for practical computations
has to be proven. With well-defined conditional probability measures at our disposal,
we can do this.
Theorem 8.1. Let (, , P) be a probability space, G be a -algebra, and Y : R
be a random variable. Then
Z
Z
E[Y | G] =
y dPY |G i.e., E[Y | G]() =
y dPY |G() for .
yR

Proof. The approximation theorem for measurable functions furnishes a sequence of random variables Y1+ , Y2+ , . . . , such that Yn+ 0 and Yn+ % Y + = max(0, Y ). In fact they
have the explicit expression:
n

n2
1
X
k k+1
k
+
1
+
n
1
,
D
=
,
, Dn, = [n, ) .
Yn =
n,k
[Y Dn,k ]
[Y Dn, ]
2n
2n 2n
k=1

Then we have
E[Yn+

| G] =

n
n2
1
X

k=1

k
P(Y Dn,k | G) + n P(Y Dn, | G)
2n

n
n2
1
X

k
PY |G (Dn,k ) + n PY |G (Dn, )
2n
k=1
!
n
Z
n2
1
X
k
=
1D + n 1Dn, dPY |G .
2n n,k
R

k=1

(linearity of E[ | G])

y(0,)

We apply a similar argument for Y = max(0, Y ): by taking limits through a

sequence of functions Y1 , Y2 , . . . such that Yn 0 and Yn % Y , we have
Z
E[Y | G] = lim E[Yn | G] =
y dPY |G .
n

y(,0)

Hence
E[Y | G] = E[Y + | G] E[Y | G] =

Z
y dPY |G .

y(,)

Example 8.1. If X, Y are real-valued random variables, with a joint probability density
fX,Y , then we calculate that
Z
Z
fX,Y (x, y)
dy , fX (x) =
fX,Y (x, y) dy .
E[Y | X = x] =
y
fX (x)

Not suprisingly, the conditional probability density that is, the Radon-Nikodym
derivative appears to be the infinitesimal version of the elementary equation (6) for
the conditional probability.

Change of variable

Suppose X is a random variable on a probability space (, , P), and f : R R is a

measurable function. Let Y = f (X). Then EY can be computed in any of these three
ways:
Z
Z
Z
f (X()) dP ,

f (x) dPX ,

y dPY .

(12)

So what about E[Y | G]?

Theorem 8.1 says that it can be calculated as
Z
y dPY |G ,
yR

which is the counterpart to the last integral in (12). The first integral in (12) has no
counterpart for conditional expectations, since we do not have available a conditional
measure B 7 P(B | G)() that is defined for all events B. But the second integral in
(12) ought to have an analogy for conditional expectations, namely:
Z
f (x) dPX|G .
xR

We shall prove this.

Theorem 9.1. Let (, , P) be a probability space, and G be a -algebra. Let

X : 0 be a measurable function to another measurable space 0 , and f : 0 R be
measurable also. Then
Z
E[f (X) | G] =
f (x) dPX|G .
x0

Proof. Let Y = f (X). We show PY |G = PX|G f 1 almost surely. For each B BR ,

almost surely we have

PY |G (B) = P Y 1 (B) | G = P (f X)1 (B) | G

= P X 1 (f 1 (B)) | G = PX|G f 1 (B) .
There is a single P|G -null set on which the above equations are true for every B = (, y],
with y Q {, +}. Since the left- and right-side expressions define measures, the
equations must then hold for every B BR , on the same P|G -null set. The result now
follows from the change-of-variables theorem for ordinary integrals.

What is being conditioned on can be set constant

The aim of this section is to rigorously generalize two facts well known from the nave
definition of conditional probability:
1. For any events A, B (with P(B) > 0), P(A B | B) = P(A | B). i.e. if we are given
B, and asked to calculate conditional probabilities, then of course B happens for
certain. For the same reason, P(A B c | B) = 0.
2. This is related to the first fact. Suppose f (X, Y ) is a measurable function of two
random variables X and Y , and we want to compute E[f (X, Y ) | X]. Since X is a
given in the conditional probability, in the integral calculations X may be assumed
to be constant. So, for instance (Proposition 4.5), E[XY | X] = X E[Y | X] .
In what follows, (, , P) is a probability space, G is a -algebra, and 1 , 2 are
two other measurable spaces. Also X : 1 will be G-measurable, while Y : 2
will be -measurable.
Theorem 10.1. Let be a version of the conditional probability measure PY |G . Then the
product measure = X gives a version of the conditional probability measure PX,Y |G .
(Here x denotes the point-mass measure at x 1 .)
Proof. Clearly, is a probability measure everywhere on .
We prove that (S) is G-measurable for every measurable S 1 2 by appealing
to the monotone class theorem (again). Taking S of the form A B, where A 1 ,
B 2 are measurable, the function (S) = X (A) (B) = 1[XA] (B) is G-measurable
because X and (B) are. And G-measurability is preserved under finite disjoint unions
of sets A B, and under increasing and decreasing limits. So it follows that (S) is
G-measurable for every S in the product -algebra of 1 2 .

Also, for each E G, consider

the measure S 7 E[(S) 1E ]. It agrees with the
measure S 7 P [(X, Y ) S] E on sets S of the form A B:

E (A B) 1E = E 1[XA] P([Y B] | G) 1E

= E P([Y B] | G) 1[XA]E

= P [X A] [Y B] E
(note [X A] E G).
And hence the two measures agree on all measurable S
1 2 . Since E G is
arbitrary, it follows that (S) is a version of P (X, Y ) S | G .

Theorem 10.2. Let f : 1 2 R be measurable in the product measure space. Then
Z
E[f (X, Y ) | G] =
f (X, y) dPY |G .
y2

Corollary 10.3. Under the same hypotheses as Theorem 10.2,

Z
E[f (X, Y ) | X = x] =
f (x, y) dPY |X=x = E[f (x, Y ) | X = x] .
y2

Hilbert space theory of conditional expectations

In section 2, we observed that, in general, the conditional expectation E[Y | G] extracts

the part of Y that is G-measurable.
This is literally true if Y is a L2 random variable: then Y can be decomposed uniquely
into random variables U and V such that
Y = U + V (almost surely),
U M = {all G-measurable, L2 random variables} ,

V M .

The U part of Y can be obtained by projecting Y orthogonally onto M, a closed

subspace of the Hilbert space L2 , with inner product hX, Y i = E[XY ]. The part left over,
the random variable V , will be in M , orthogonal to M.
The projection operator is exactly E[ | G] because,
E[Y 1B ] = E[U 1B ] + E[V 1B ] ,

for every B G,

and E[V 1B ] = 0 by the definition of the orthogonal projection. Since the conditional
expectation is unique, we must have U = E[Y | G].
16

It is also not hard to give an intuitive description of M : it consists of all L2 random variables, with zero mean, that are uncorrelated with every G-measurable random
variable. Indeed, since 1 M, we must have E(V 1) = EV = 0 for every V M .
Then E[U V ] = E[(U EU )(V EV )], so V M is orthogonal to U if and only if V is
uncorrelated to U .
A slicker way of recognizing that U = E[Y | G] is to recall that the image of the
orthogonal projection onto M can be characterized as the unique vector in M closest in
norm to the pre-image:
Proposition 11.1. If Y is a L2 real-valued random variable, then the best estimate of
Y , in the least-squares sense, using only G-measurable functions, is E[Y | G]. That is,

E (Y X)2 E (Y E[Y |G])2
for all G-measurable random variables X, with equality ifand only if X = E[Y | G].
(The lower bound can also be written as: E Var(Y |G) = Var(Y ) Var E[Y |G] .)
Proof. The proof is completely analogous to the proof of the well-known case when X are
restricted to constants. We have:

E (Y X)2 ] = E(X 2 ) 2E(XY ) + E(Y 2 )

= E(X 2 ) 2E XE(Y | G) + E E(Y 2 | G)

= E X 2 2E(Y | G) X + E(Y 2 | G) .
The outermost integrand is a quadratic in X, which is minimized when X equals the
G-measurable function E[Y | G].

In fact, this Hilbert-space argument can be turned around, to prove the existence of
E[ | G] without recourse to the Lebesgue-Radon-Nikodym theorem!4
Example 11.1 (Orthonormal basis expansion of conditional expectation). Let X and Y
be two L2 random variables, with some joint distribution that is known, and we want to
compute the conditional expectation E[Y | X] = E[Y | (X)].
Recall, from linear algebra, that an orthogonal projection can be evaluated from its
known actions on an orthonormal basis {Zn } that spans (X).
X
E[Y | (X)] =
hY, Zn i Zn .
n

Let F be the cumulative distribution function of X. Then U = F (X) takes on values

in [0, 1] and has the uniform distribution. Since every element of (X) = (U ) can be
represented as a function of U , we can set Zn = n (U ) for some n L2 [0, 1] with
Lebesgue measure. This latter Hilbert space L2 [0, 1] is separable, so the orthonormal
basis is countable.
Thus, for any orthonormal basis {n } of L2 [0, 1], we can expand:
X
E[Y | X] =
E[Y n (U )] n (U ) , U = F (X) ,
n

the series of random variables being convergent in L2 .

Incidentally, the Lebesgue-Radon-Nikodym theorem has a nice proof using Hilbert-space methods

also.

Example 11.2 (Fourier expansion of conditional expectation). One popular orthonormal

basis is the complex Fourier basis n (u) = e2inu . So:
E[Y | X] =

E[Y e2inU ] e2inU ,

U = F (X) .

Example 11.3 (Conditional expectation for discrete random variables). The only time
that the orthonormal basis in Example 11.1 can be taken to be a finite set is when
X has finite range
p{x1 , . . . , xn }. In this case, the obvious orthonormal basis to use is
Zn = 1(X = xn )/ P(X = xn ). Then we arrive at the familiar expression:
E[Y | X] =

N
X
E[Y 1(X = xn )]
1(X = xn ) .
P(X = xn )
n=1

Bibliography

References
[Bouleau]

Nicolas Bouleau, Dominique Lepingle. Numerical Methods for Stochastic

Processes. Wiley-Interscience, 1994.

[Folland]

Gerald B. Folland, Real Analysis: Modern Techniques and Their Applications, second ed. Wiley-Interscience, 1999.

[Rosenthal]

Jeffrey S. Rosenthal, A First Look at Rigorous Probability Theory. World

Scientific, 2000.

[Schmetterer]

Leopold Schmetterer, Introduction to Mathematical Statistics. Trans.

Kenneth Wickwise. Springer-Verlag, 1974.

[Steele]

J. Michael Steele, Stochastic Calculus and Financial Applications.

Springer-Verlag, 2001.

Applied Stochastic Processes: M. Ottobre
No ratings yet
Applied Stochastic Processes: M. Ottobre
164 pages
Wanous (1992) PDF
No ratings yet
Wanous (1992) PDF
10 pages
Statistic and Probability
100% (4)
Statistic and Probability
21 pages
Lesson 1 - Money, Money, Money With Student Responses and Added Reflection
No ratings yet
Lesson 1 - Money, Money, Money With Student Responses and Added Reflection
7 pages
7 - The Stock Market
No ratings yet
7 - The Stock Market
20 pages
Appendix A 2
No ratings yet
Appendix A 2
18 pages
The Definition and Existence of Conditional Expectation
No ratings yet
The Definition and Existence of Conditional Expectation
17 pages
Prerequis Esp Cond
No ratings yet
Prerequis Esp Cond
6 pages
Marcin Pitera. Stochastic Processes.
No ratings yet
Marcin Pitera. Stochastic Processes.
45 pages
Martingales
No ratings yet
Martingales
15 pages
Handout For The Quantitative Finance Course: Conditional Expectation and Discrete Martingale
No ratings yet
Handout For The Quantitative Finance Course: Conditional Expectation and Discrete Martingale
4 pages
Conditional Probability and Expectation
No ratings yet
Conditional Probability and Expectation
19 pages
Summary Notes 1
No ratings yet
Summary Notes 1
4 pages
ST7 1
No ratings yet
ST7 1
21 pages
درسي
No ratings yet
درسي
2 pages
APA Lecture Notes Part2
No ratings yet
APA Lecture Notes Part2
21 pages
Instructor: DR - Saleem AL Ashhab Al Ba'At University Mathmatical Class Second Year Master Dgree
No ratings yet
Instructor: DR - Saleem AL Ashhab Al Ba'At University Mathmatical Class Second Year Master Dgree
13 pages
n-2 Cond-Exp
No ratings yet
n-2 Cond-Exp
31 pages
Conditional Expectation
No ratings yet
Conditional Expectation
7 pages
SDENotes 2
No ratings yet
SDENotes 2
140 pages
Probabilityspace
No ratings yet
Probabilityspace
5 pages
Advanced Probabiliy
No ratings yet
Advanced Probabiliy
80 pages
Markov
No ratings yet
Markov
46 pages
MTSP - TA Problems Solutions 3 PDF
No ratings yet
MTSP - TA Problems Solutions 3 PDF
5 pages
Discrete Time
No ratings yet
Discrete Time
106 pages
Spring School Stochastic
No ratings yet
Spring School Stochastic
14 pages
Chapter 4
No ratings yet
Chapter 4
26 pages
lec 1
No ratings yet
lec 1
19 pages
Measure Theoretic Probability Theory Notes
No ratings yet
Measure Theoretic Probability Theory Notes
3 pages
Notes Mainimp
No ratings yet
Notes Mainimp
164 pages
Stochastic Processes Lecture Notes Ucb Stat150 Itebooks download
No ratings yet
Stochastic Processes Lecture Notes Ucb Stat150 Itebooks download
44 pages
lecture2
No ratings yet
lecture2
6 pages
Probability Theory: 1 Heuristic Introduction
No ratings yet
Probability Theory: 1 Heuristic Introduction
17 pages
Lecture23 Conditional Expectation
No ratings yet
Lecture23 Conditional Expectation
4 pages
Wattle Lecture 15
No ratings yet
Wattle Lecture 15
6 pages
6710 Notes
No ratings yet
6710 Notes
118 pages
Martingale Limit Theory and Stochastic Regression Theory: Ching-Zong Wei
No ratings yet
Martingale Limit Theory and Stochastic Regression Theory: Ching-Zong Wei
155 pages
Stochastic Calculus
No ratings yet
Stochastic Calculus
217 pages
Probability
100% (1)
Probability
145 pages
An Introduction To Stochastic Control
No ratings yet
An Introduction To Stochastic Control
134 pages
ST302
No ratings yet
ST302
58 pages
Introduction To Financial Mathematics
No ratings yet
Introduction To Financial Mathematics
47 pages
REU Project: Topics in Probability: Trevor Davis August 14, 2006
No ratings yet
REU Project: Topics in Probability: Trevor Davis August 14, 2006
12 pages
116 - Vggray R. - Probability, Random Processes, and Ergodic Properties
No ratings yet
116 - Vggray R. - Probability, Random Processes, and Ergodic Properties
5 pages
Proba 2
No ratings yet
Proba 2
88 pages
Kỳ Vọng Có Điều Kiện
No ratings yet
Kỳ Vọng Có Điều Kiện
12 pages
Martingale Theory and Applications: DR Nic Freeman June 4, 2015
No ratings yet
Martingale Theory and Applications: DR Nic Freeman June 4, 2015
40 pages
Probability Theory: STAT310/MATH230 September 12, 2010
No ratings yet
Probability Theory: STAT310/MATH230 September 12, 2010
151 pages
Probability Theory: STAT310/MATH230 September 12, 2010
No ratings yet
Probability Theory: STAT310/MATH230 September 12, 2010
151 pages
Condition A Ex of Convergence Theorem
No ratings yet
Condition A Ex of Convergence Theorem
8 pages
R300 MT Class 1 Slides
No ratings yet
R300 MT Class 1 Slides
68 pages
Paolo Baldi, Laurent Mazliak, Pierre Priouret - Martingales and Markov Chains - Solved Exercises and Elements of Theory (2002, Chapman and Hall - CRC)
No ratings yet
Paolo Baldi, Laurent Mazliak, Pierre Priouret - Martingales and Markov Chains - Solved Exercises and Elements of Theory (2002, Chapman and Hall - CRC)
194 pages
Lecturenotes5 6 Probability
No ratings yet
Lecturenotes5 6 Probability
10 pages
Stochastic Lectures
No ratings yet
Stochastic Lectures
8 pages
Stochastic Dynamic Programming: 4.1 The Axiomatic Approach To Probability: Basic Con-Cepts of Measure Theory
No ratings yet
Stochastic Dynamic Programming: 4.1 The Axiomatic Approach To Probability: Basic Con-Cepts of Measure Theory
17 pages
Lecture 5
No ratings yet
Lecture 5
17 pages
Ouka Cha
No ratings yet
Ouka Cha
16 pages
Elements of Probability Theory - Lecture Notes
No ratings yet
Elements of Probability Theory - Lecture Notes
58 pages
Lnotes
No ratings yet
Lnotes
409 pages
Relating Two Definitions of Expectation
No ratings yet
Relating Two Definitions of Expectation
2 pages
Lecture 2: Conditional Expectation: ! ! ! ! ! ! ! G (X) " - Y Residual
No ratings yet
Lecture 2: Conditional Expectation: ! ! ! ! ! ! ! G (X) " - Y Residual
4 pages
Theory of Approximation
From Everand
Theory of Approximation
N. I. Achieser
No ratings yet
Set Theory Essentials
From Everand
Set Theory Essentials
Emil Milewski
No ratings yet
A Short Course in Discrete Mathematics
From Everand
A Short Course in Discrete Mathematics
Edward A. Bender
3/5 (1)
SVM Rmarkdown Article Example
No ratings yet
SVM Rmarkdown Article Example
9 pages
Markdown
No ratings yet
Markdown
12 pages
Decision Making and Bayes Linear Methods: Module Leader
No ratings yet
Decision Making and Bayes Linear Methods: Module Leader
70 pages
Decision Making and Bayes Linear Methods: Module Leader
No ratings yet
Decision Making and Bayes Linear Methods: Module Leader
70 pages
Prob Level Sets
No ratings yet
Prob Level Sets
8 pages
Notes 14
No ratings yet
Notes 14
189 pages
SevenNationArmy Lyrics
No ratings yet
SevenNationArmy Lyrics
1 page
Applications of The Double Integral
No ratings yet
Applications of The Double Integral
13 pages
Proba
No ratings yet
Proba
23 pages
Chapter 5 - Risk and Return
No ratings yet
Chapter 5 - Risk and Return
50 pages
Buffon's Needle, The Easy Way
No ratings yet
Buffon's Needle, The Easy Way
9 pages
X Y X X Y y X X, Y y Y Y: Conditional Expectation
No ratings yet
X Y X X Y y X X, Y y Y Y: Conditional Expectation
21 pages
KNHS Entrepreneurship Q2 W4 D4
No ratings yet
KNHS Entrepreneurship Q2 W4 D4
4 pages
DOC-20250420-WA0010.
No ratings yet
DOC-20250420-WA0010.
12 pages
8th NTRCA College Level
50% (2)
8th NTRCA College Level
42 pages
Tasting Atmospherics Taste Associations With Colour Parameters of Coffee Shop Interiors
No ratings yet
Tasting Atmospherics Taste Associations With Colour Parameters of Coffee Shop Interiors
9 pages
Montecarlo in Aerodynamics
No ratings yet
Montecarlo in Aerodynamics
32 pages
3RD Week3
No ratings yet
3RD Week3
25 pages
Probability & Statistics
No ratings yet
Probability & Statistics
3 pages
Discrete Probability Distributions Problem Set
No ratings yet
Discrete Probability Distributions Problem Set
7 pages
Department of Electrical Engineering EE 322 - Analog and Digital Communication
No ratings yet
Department of Electrical Engineering EE 322 - Analog and Digital Communication
6 pages
Random Variables
No ratings yet
Random Variables
82 pages
Instant download (Ebook) Finite Markov Chains and Algorithmic Applications by Olle Häggström ISBN 9780521813570, 9780511019418, 9780521890014, 0511019416, 0521813573, 0521890012 pdf all chapter
100% (3)
Instant download (Ebook) Finite Markov Chains and Algorithmic Applications by Olle Häggström ISBN 9780521813570, 9780511019418, 9780521890014, 0511019416, 0521813573, 0521890012 pdf all chapter
76 pages
Learning Module 4_Probability Trees and Conditional Expectations
No ratings yet
Learning Module 4_Probability Trees and Conditional Expectations
20 pages
CT3 Past Exams 2005 - 2009
No ratings yet
CT3 Past Exams 2005 - 2009
175 pages
MDM4U NOTES Week 5
No ratings yet
MDM4U NOTES Week 5
16 pages
Hybrid Math 11 Stat Q1 M2 W2 V2
No ratings yet
Hybrid Math 11 Stat Q1 M2 W2 V2
13 pages
Bagus
No ratings yet
Bagus
56 pages
A First Course in Probability 9th Edition, (Ebook PDF) Full Chapter Instant Download
100% (3)
A First Course in Probability 9th Edition, (Ebook PDF) Full Chapter Instant Download
44 pages
Probability and Statistics Lpu
No ratings yet
Probability and Statistics Lpu
227 pages
AB1202 Statistics and Analysis
No ratings yet
AB1202 Statistics and Analysis
16 pages
Statistics Lecture Notes
No ratings yet
Statistics Lecture Notes
15 pages
John F. Nash, JR.: Econometrica, Vol. 18, No. 2. (Apr., 1950), Pp. 155-162
No ratings yet
John F. Nash, JR.: Econometrica, Vol. 18, No. 2. (Apr., 1950), Pp. 155-162
9 pages

Conditional Probability

Uploaded by

Conditional Probability

Uploaded by

Introduction to Conditional Probabilities and

3 Conditional expectation given a random variable

4 Basic properties of the conditional expectation

6 Convergence properties of the conditional expectation

7 Conditional probability measures

8 Calculating conditional expectations

10 What is being conditioned on can be set constant

11 Hilbert space theory of conditional expectations

Let (, , P) be a probability space. Suppose we are given G , a -algebra of , and

Clearly () = 0, while if A1 , A2 , . . . are disjoint events, then

Firstly, a -algebra G on the sample space , roughly speaking, is a lens or filter

Conditional expectation given a random variable

We commonly condition random variables given the -algebra

Let PX be the pull-back measure B 7 P(X 1 (B)). If PX (B) = 0, then 0 (B) = 0,

By a change of variables with A = X 1 (B) G = (X), we have

Basic properties of the conditional expectation

Proposition 4.5. If X is G-measurable, then X can be factored out of a conditional

the conclusion also holds.

= P(A | 1B = 1) P(1B = 1) = P(A | B) P(B) ,

Convergence properties of the conditional expectation

Theorem 6.3 (Monotone convergence). If the non-negative random variables Xn are

normal monotone convergence

Conditional probability measures

Suppose X is a discrete random variable. If we look at P(A | X = x) = P(A [X =

i. Let E = [P(A | G) 0]. Then

so P(A | G) = 0 almost surely on E. i.e. P(A | G) almost never takes on negative

i.e. P(A | G) is almost never greater than one.

By (i), the integrand of the last integral is non-negative, and therefore P( | G) = 1

implies that P( | G) must be zero almost surely.

Theorem 7.2. Let (, , P) be a probability space, G be a -algebra, and Y : Rn

for one of the versions of P( | G). And for z Rn \ Qn , set

We claim that F is a multi-dimensional cumulative distribution function, for each

= lim inf P(Y Dyn ) P(Y Dz ) = 0 .

number of random variables namely, by the Kolmogorov Existence Theorem. For it to

Calculating conditional expectations

In elementary courses of probability theory, one learns a definition of the conditional

We apply a similar argument for Y = max(0, Y ): by taking limits through a

Suppose X is a random variable on a probability space (, , P), and f : R R is a

So what about E[Y | G]?

We shall prove this.

Theorem 9.1. Let (, , P) be a probability space, and G be a -algebra. Let

Proof. Let Y = f (X). We show PY |G = PX|G f 1 almost surely. For each B BR ,

What is being conditioned on can be set constant

Also, for each E G, consider

Corollary 10.3. Under the same hypotheses as Theorem 10.2,

Hilbert space theory of conditional expectations

In section 2, we observed that, in general, the conditional expectation E[Y | G] extracts

The U part of Y can be obtained by projecting Y orthogonally onto M, a closed

Let F be the cumulative distribution function of X. Then U = F (X) takes on values

the series of random variables being convergent in L2 .

Example 11.2 (Fourier expansion of conditional expectation). One popular orthonormal

E[Y e2inU ] e2inU ,

Nicolas Bouleau, Dominique Lepingle. Numerical Methods for Stochastic

Jeffrey S. Rosenthal, A First Look at Rigorous Probability Theory. World

Leopold Schmetterer, Introduction to Mathematical Statistics. Trans.

J. Michael Steele, Stochastic Calculus and Financial Applications.

You might also like