
Addis Ababa University

College of Business and Economics


Department of Economics
FE 606: Mathematics for Finance
2. Revision of Probability and Statistics

Fantu Guta Chemrie (PhD)


F. Guta (CoBE) FE 606 September, 2023 1 / 167
2. Revision of Probability and Statistics (6 hours)
2.1 Introduction to Probability Theory Concepts
2.2 Discrete Probability Distributions
2.3 Continuous Probability Distributions
2.4 Expectation of a Random Variable
2.5 Jointly Distributed Random Variables
2.6 Moment Generating Functions
2.7 Limit Theorems
2.8 Conditional Probability and Conditional Expectation

F. Guta (CoBE) FE 606 September, 2023 2 / 167


Revision of Probability and Statistics

2.1 Introduction to Probability Theory Concepts

Any realistic model of a real-world phenomenon must take into account the possibility of randomness.
The quantities we are interested in will not be predictable in advance but, rather, will exhibit an inherent variation that should be taken into account by the model.
This is usually accomplished by allowing the model to be probabilistic in nature, which requires probability theory concepts.

2.1.1 Sample Space and Events

The set of all possible outcomes of an experiment is


known as the sample space of the experiment and is
denoted by S.
Some examples are the following:
F. Guta (CoBE) FE 606 September, 2023 4 / 167
i). If the experiment consists of the flipping of a coin, then
S = {H, T}.

where H means that the outcome of the toss is a


head and T that it is a tail.
ii). If the experiment consists of rolling a die, then the
sample space is

S = {1, 2, 3, 4, 5, 6}

where the outcome i means that i appeared on the


F. Guta (CoBE) FE 606 September, 2023 5 / 167
die, i = 1, 2, 3, 4, 5, 6.
iii). If the experiment consists of flipping two coins, then the sample space consists of the following four points:

S = {(H, H), (H, T), (T, H), (T, T)}

iv). If the experiment consists of measuring the lifetime


of a car, then the sample space consists of all
nonnegative real numbers. That is,

S = [0, ∞)
F. Guta (CoBE) FE 606 September, 2023 6 / 167
Any subset A of the sample space S is known as an event. Some examples of events are:
In the experiment of flipping a coin, A = {H} is the event that a head appears on the flip of the coin.
In the experiment of flipping two coins, A = {(H, H), (H, T)} is the event that a head appears on the first coin.

For any two events A and B of a sample space S we define the new event A ∪ B to consist of all outcomes that are either in A or in B or in both A and B.

For any two events A and B, we may also define the new event A ∩ B, referred to as the intersection of A and B, consisting of all outcomes which are both in A and in B.
If A ∩ B = ∅, then A and B are said to be mutually exclusive.
If A₁, A₂, ... are events, then the union of these events, denoted by ∪_{n=1}^∞ A_n, is defined to be the event that consists of all outcomes that are in A_n for at least one value of n = 1, 2, ....
Similarly, the intersection of the events A_n, denoted by ∩_{n=1}^∞ A_n, is defined to be the event consisting of those outcomes that are in all of the events A_n, n = 1, 2, ....
For any event A we define the new event A′, referred to as the complement of A, to consist of all outcomes in the sample space S that are not in A.
2.1.2 Probabilities Defined on Events

Probability Distribution/Measure: A measure is a mapping from events to the reals, i.e. P : 𝒜 → ℝ, that satisfies the so-called axioms of probability.

i). 0 ≤ P(A) ≤ 1
ii). P(S) = 1
iii). For any sequence of events A₁, A₂, ... that are mutually exclusive, i.e., A_n ∩ A_m = ∅ when n ≠ m, then

P(∪_{n=1}^∞ A_n) = ∑_{n=1}^∞ P(A_n)

We refer to P(A) as the probability of the event A.
Example (2.1)
In the die rolling experiment, if we supposed that all six numbers were equally likely to appear, then the probability of getting an even number would equal

P({2, 4, 6}) = P({2}) + P({4}) + P({6}) = 1/6 + 1/6 + 1/6 = 1/2.

Since the events A and A′ are mutually exclusive and since A ∪ A′ = S, by (ii) and (iii) we have that
1 = P(S) = P(A ∪ A′) = P(A) + P(A′), or
P(A′) = 1 − P(A)

Theorem (2.1)
If A and B are two events in a sample space S, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof.
Using the Venn diagram,
A ∪ B = (A ∩ B′) ∪ (A′ ∩ B) ∪ (A ∩ B), so
P(A ∪ B) = P(A ∩ B′) + P(A′ ∩ B) + P(A ∩ B)

However, A = (A ∩ B′) ∪ (A ∩ B), so
P(A) = P(A ∩ B′) + P(A ∩ B)

Similarly, B = (A′ ∩ B) ∪ (A ∩ B), so
P(B) = P(A′ ∩ B) + P(A ∩ B)

Finally,
P(A ∪ B) = P(A ∩ B′) + P(A′ ∩ B) + P(A ∩ B)
= P(A) − P(A ∩ B) + P(B) − P(A ∩ B) + P(A ∩ B)
= P(A) + P(B) − P(A ∩ B)

Example (2.2)
Suppose that we toss two coins, and suppose that we assume that each of the four outcomes in the sample space
S = {(H, H), (H, T), (T, H), (T, T)}
is equally likely and hence has probability 1/4.
Let A be the event that the first coin falls heads, and B the event that the second coin falls heads, i.e.,
A = {(H, H), (H, T)}, B = {(H, H), (T, H)}
The probability of A ∪ B is

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/2 + 1/2 − 1/4 = 3/4

This probability could, of course, have been computed directly since
P(A ∪ B) = P({(H, H), (H, T), (T, H)}) = 3/4.

The probability that any one of the three events A or B or C occurs may be calculated as:

P(A ∪ B ∪ C) = P((A ∪ B) ∪ C)
= P(A ∪ B) + P(C) − P((A ∪ B) ∩ C)
= P(A) + P(B) − P(A ∩ B) + P(C) − P((A ∩ C) ∪ (B ∩ C))
= P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)

In fact, it can be shown by induction that, for any n events A₁, A₂, A₃, ..., A_n,

P(A₁ ∪ A₂ ∪ ⋯ ∪ A_n) = ∑_i P(A_i) − ∑_{i<j} P(A_i ∩ A_j) + ∑_{i<j<k} P(A_i ∩ A_j ∩ A_k)
− ∑_{i<j<k<l} P(A_i ∩ A_j ∩ A_k ∩ A_l) + ⋯ + (−1)^{n+1} P(A₁ ∩ A₂ ∩ ⋯ ∩ A_n)

2.1.3 Conditional Probability

Often we want to calculate the probability that an event A occurs given that an event B has occurred.
We use the notation P(A | B). This is only defined when P(B) > 0.
The rule for conditional probability:
P(A | B) = P(A ∩ B) / P(B)    (2.1)

Example (2.3)
Roll a fair die once and let
A = {even number}, B = {1, 2, 3, 5}.
What is P(A | B)? We do this by computing:
P(A ∩ B) = 1/6, and P(B) = 4/6
So we obtain that
P(A | B) = (1/6) / (4/6) = 1/4

The Multiplication rule:

P(A ∩ B) = P(A) P(B | A) = P(B) P(A | B)

The Law of Total Probability:

P(A) = P(A | B) P(B) + P(A | B′) P(B′)

Example (2.4)
A bank is considering extending credit to a new
customer and is interested in the probability that
the client will default on the loan. Based on
historical data, the bank knows that there is a 5%

F. Guta (CoBE) FE 606 September, 2023 20 / 167


Example (2.4 continued. . . )
chance that a customer who has overdrawn an
account will default, while there is only a 0.5%
chance that a customer who has never overdrawn
an account will default.

Unfortunately, the bank does not know for sure if


the customer will overdraw her account.
Based on background checks the bank believes
there is a 30% chance that the customer will

F. Guta (CoBE) FE 606 September, 2023 21 / 167


Example (2.4 continued. . . )
overdraw the account.

Calculate the probability that she will default if


credit is extended.
Let

A = {customer defaults on the loan}, and
B = {customer overdraws her account}.

P(A | B) = 0.05, P(A | B′) = 0.005, P(B) = 0.3


F. Guta (CoBE) FE 606 September, 2023 22 / 167
Example (2.4 continued. . . )
So via the law of total probability we have

P(A) = P(A | B) P(B) + P(A | B′) P(B′)
= (0.05)(0.3) + (0.005)(0.7) = 0.0185.
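As a quick numerical check of Example 2.4, the law of total probability can be evaluated directly. The sketch below is my own minimal Python calculation; it uses only the probabilities given in the example, and the variable names are merely illustrative.

```python
# Law of total probability, Example 2.4: P(A) = P(A|B)P(B) + P(A|B')P(B').
p_default_given_overdraw = 0.05      # P(A | B)
p_default_given_no_overdraw = 0.005  # P(A | B')
p_overdraw = 0.30                    # P(B)

p_default = (p_default_given_overdraw * p_overdraw
             + p_default_given_no_overdraw * (1 - p_overdraw))
print(p_default)  # 0.0185
```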

The law of total probability can be generalized by defining the notion of a partition of the sample space.
We say that B₁, B₂, ..., B_n is a partition of S if

∪_{i=1}^n B_i = S
B_i ∩ B_j = ∅, for i ≠ j

Given a partition of S we have that

P(A) = ∑_{i=1}^n P(A | B_i) P(B_i).

Two events A and B are said to be independent if

P(A ∩ B) = P(A) P(B)

By Equation (2.1) this implies that A and B are independent if

P(A | B) = P(A)

which also implies that P(B | A) = P(B).

Events A₁, A₂, ..., A_n are independent if for every collection of distinct indices {i₁, i₂, ..., i_k},

P(A_{i₁} ∩ A_{i₂} ∩ ⋯ ∩ A_{i_k}) = P(A_{i₁}) P(A_{i₂}) ⋯ P(A_{i_k})    (2.2)

for any k ≥ 2 and i₁ < i₂ < ⋯ < i_k.

Independent events have the properties that:

i). A, B independent implies A, B′ independent.
ii). Any A_{i_j} can be replaced by A′_{i_j} in equation (2.2).

Example (2.5)
Consider a sequence of n independent trials, each of which has a probability 1/n of being a “success”.
What is the probability of zero successes in n trials? What if the number of trials is doubled?
The probability is simply:

P(failure on trial 1, ..., failure on trial n) = ∏_{i=1}^n P(failure on trial i) = (1 − 1/n)^n ≈ e^{−1}

If we double the number of trials then:

P(failure on trial 1, ..., failure on trial 2n) = ∏_{i=1}^{2n} P(failure on trial i) = (1 − 1/n)^{2n} ≈ e^{−2}
F. Guta (CoBE) FE 606 September, 2023 27 / 167
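The two limits used in Example 2.5 can be checked numerically. This is a small sketch of mine (plain Python, no external libraries assumed), not part of the lecture itself.

```python
import math

# (1 - 1/n)^n -> e^{-1} and (1 - 1/n)^(2n) -> e^{-2} as n grows.
for n in (10, 100, 1000):
    p_zero = (1 - 1 / n) ** n
    p_zero_doubled = (1 - 1 / n) ** (2 * n)
    print(n, round(p_zero, 4), round(math.exp(-1), 4),
          round(p_zero_doubled, 4), round(math.exp(-2), 4))
```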


2.1.4 Bayes' Rule

Let A and B be events. Then

P(B | A) = P(A ∩ B) / P(A)

We may express A as

A = (A ∩ B) ∪ (A ∩ B′)

Since A ∩ B and A ∩ B′ are mutually exclusive, we have that

P(A) = P(A ∩ B) + P(A ∩ B′)
P(A) = P(A | B) P(B) + P(A | B′) P(B′)    (2.3)

From the preceding two expressions it follows:
P(B | A) = P(A | B) P(B) / [P(A | B) P(B) + P(A | B′) P(B′)]

Example (2.6)
In answering a question on a multiple-choice test a
student either knows the answer or guesses. Let p
be the probability that she knows the answer and
1 p the probability that she guesses. Assume that
a student who guesses at the answer will be correct
F. Guta (CoBE) FE 606 September, 2023 29 / 167
Example (2.6 continued. . . )
with probability 1/m, where m is the number of
multiple-choice alternatives.

What is the conditional probability that a student


knew the answer to a question given that she
answered it correctly?
Let A and B denote, respectively, the event that the
student answers the question correctly and the
event that she actually knows the answer.

F. Guta (CoBE) FE 606 September, 2023 30 / 167


Example (2.6 continued. . . )
Now

P(B | A) = P(A ∩ B) / P(A)
= P(A | B) P(B) / [P(A | B) P(B) + P(A | B′) P(B′)]
= p / [p + (1/m)(1 − p)]
= pm / [1 + p(m − 1)]

Equation (2.3) may be generalized in the following manner.
Suppose that B₁, B₂, ..., B_n are mutually exclusive events such that ∪_{i=1}^n B_i = S. By writing

A = ∪_{i=1}^n (A ∩ B_i)

and using the fact that the events A ∩ B_i, i = 1, ..., n, are mutually exclusive, we obtain that

P(A) = ∑_{i=1}^n P(A ∩ B_i) = ∑_{i=1}^n P(A | B_i) P(B_i)    (2.4)

F. Guta (CoBE) FE 606 September, 2023 32 / 167


Suppose now that A has occurred and we are interested in determining which one of the B_j also occurred. By Equation (2.4) we have that

P(B_j | A) = P(A | B_j) P(B_j) / ∑_{i=1}^n P(A | B_i) P(B_i)    (2.5)

Equation (2.5) is known as Bayes' formula.
Example (2.7)
If a person is lying, the probability that this is
correctly detected by the polygraph is 0.88, whereas

F. Guta (CoBE) FE 606 September, 2023 33 / 167


Example (2.7 continued. . . )
if the person is telling the truth, this is correctly
detected with probability 0.86.

Suppose we consider a question for which 99% of all subjects tell the truth. Our polygraph machine says a subject is lying on this question.
What is the probability that the polygraph is incorrect?
Let A = fpolygraph says the subject is lyingg;

F. Guta (CoBE) FE 606 September, 2023 34 / 167


Example (2.7 continued. . . )
B = {subject is actually lying}

Then our goal is to compute P(B′ | A); we are given

P(A | B) = 0.88, P(A′ | B′) = 0.86, P(B) = 0.01

P(B′ | A) = P(A ∩ B′) / P(A)
= P(A | B′) P(B′) / [P(A | B′) P(B′) + P(A | B) P(B)]
= (0.14 × 0.99) / (0.14 × 0.99 + 0.88 × 0.01) ≈ 0.94.
F. Guta (CoBE) FE 606 September, 2023 35 / 167
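The polygraph posterior in Example 2.7 is easy to reproduce in code. The following is a minimal sketch of mine using only the numbers stated in the example.

```python
# Bayes' rule, Example 2.7: probability the polygraph is wrong given it says "lying".
p_lying = 0.01                    # P(B)
p_alarm_given_lying = 0.88        # P(A | B)
p_alarm_given_truth = 1 - 0.86    # P(A | B') = 1 - P(A' | B')

p_alarm = (p_alarm_given_truth * (1 - p_lying)
           + p_alarm_given_lying * p_lying)          # law of total probability
p_truth_given_alarm = p_alarm_given_truth * (1 - p_lying) / p_alarm
print(round(p_truth_given_alarm, 4))  # ~0.9403
```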
2.1.5 Random Variables

Properties of ℱ: The class of events ℱ (called a σ-algebra or σ-field) should be such that the operations normally conducted on events, for example countable unions or intersections, or complements, keep us within that class. In particular it is such that

a). Ω ∈ ℱ, where Ω is the sample space.
b). If A ∈ ℱ then A′ ∈ ℱ.
c). If A_n ∈ ℱ for all n = 1, 2, ..., then ∪_{n=1}^∞ A_n ∈ ℱ.

It follows from these properties that ℱ is also closed under countable intersections, or countable intersections of unions, etc.
Definition (2.1)
Let X be a function from a probability space (Ω, ℱ, P) into the real numbers.
We say that the function is measurable (in which case we call it a random variable) if for x ∈ ℝ the set {ω : X(ω) ≤ x} ∈ ℱ.

Since events in ℱ are those to which we can attach a probability, this permits us to obtain probabilities for the event that the random variable X is less than or equal to any number x.

Definition (2.2)
For an arbitrary set A ∈ ℱ define I_A(ω) = 1 if ω ∈ A and 0 otherwise. This is called an indicator random variable.
F. Guta (CoBE) FE 606 September, 2023 38 / 167
Example (2.8)
Suppose that our experiment consists of tossing two fair coins. Letting X denote the number of heads appearing, then X is a random variable taking on one of the values 0, 1, 2 with respective probabilities

P{X = 0} = P{(T, T)} = 1/4
P{X = 1} = P{(H, T), (T, H)} = 1/2
P{X = 2} = P{(H, H)} = 1/4

F. Guta (CoBE) FE 606 September, 2023 39 / 167


Example (2.9)
Suppose that we toss a coin having a probability p of coming up heads, until the first head appears. Letting X denote the number of flips required, then assuming that the outcomes of successive flips are independent, X is a random variable taking on one of the values 1, 2, 3, ..., with respective probabilities:

P{X = 1} = P{H} = p
P{X = 2} = P{(T, H)} = (1 − p) p
P{X = 3} = P{(T, T, H)} = (1 − p)² p
⋮
P{X = x} = P{(T, T, ..., T, H)} = (1 − p)^{x−1} p, x ≥ 1

As a check, note that

P(∪_{x=1}^∞ {X = x}) = ∑_{x=1}^∞ P{X = x} = ∑_{x=1}^∞ (1 − p)^{x−1} p = p / [1 − (1 − p)] = 1
F. Guta (CoBE) FE 606 September, 2023 41 / 167
Example (2.10)
Suppose that our experiment consists of seeing how long a battery can operate before wearing down.
Suppose also that we are concerned only about whether or not the battery lasts at least two years.
In this case, we may define the random variable I by
I = 1 if the lifetime of the battery is two or more years, and I = 0 otherwise.
Then the random variable I is known as the indicator random variable for the event E that the lifetime of the battery is two or more years.

We will often denote the event {ω ∈ Ω : X(ω) ≤ x} more compactly by [X ≤ x].
In general, functions of one or more random variables give us another random variable (provided that the function is measurable).
For example, if X₁, X₂ are random variables, so are X₁ + X₂, X₁X₂, and min{X₁, X₂}.
The cumulative distribution function (or more simply the distribution function) F(·) of a random variable X is defined to be the function F(x) = P(X ≤ x), for x ∈ ℝ.
Properties of the cumulative distribution function:

i). A cumulative distribution function F(x) is non-decreasing, i.e. F(x) ≤ F(y) whenever x ≤ y.
ii). F(x) → 0 as x → −∞.
iii). F(x) → 1 as x → ∞.
iv). F(x) is right continuous, i.e. F(x) = lim_{h→0⁺} F(x + h) (the limit as h decreases to 0).

There are two types of distributions considered here, discrete distributions and continuous ones.
Discrete distributions are those whose cumulative distribution function at any point x can be expressed as a finite or countable sum of values. For example

F(x) = ∑_{i: x_i ≤ x} p_i

for some probabilities p_i which sum to one.
In this case the cumulative distribution is piecewise constant, with jumps at the values x_i that the random variable can assume.
The values of those jumps are the individual probabilities p_i. For example, P(X = x) is equal to the size of the jump in the graph of the cumulative distribution function at the point x.
We refer to the function f(x) = P(X = x) as the probability function of the distribution when the distribution is discrete.

2.2 Discrete Distributions

A random variable that can take on at most a


countable number of possible values is said to be
discrete.
For a discrete random variable X, we define the probability mass function p(x) of X by

p(x) = P{X = x}

The probability mass function p(x) is positive for at most a countable number of values of x.
That is, if X must assume one of the values x₁, x₂, ..., then

p(x_i) > 0, i = 1, 2, ...;  p(x) = 0 for all other values of x.

Since X must take on one of the values x_i, we have
∑_{i=1}^∞ p(x_i) = 1.
The cumulative distribution function F can be expressed in terms of p(x) by

F(x) = ∑_{i: x_i ≤ x} p(x_i)

For instance, suppose X has a probability mass function given by
p(1) = 1/2, p(2) = 1/3, p(3) = 1/6
F. Guta (CoBE) FE 606 September, 2023 49 / 167
then the cumulative distribution function F of X is given by

F(x) = 0,    x < 1
F(x) = 1/2,  1 ≤ x < 2
F(x) = 5/6,  2 ≤ x < 3
F(x) = 1,    x ≥ 3

2.2.1 The Bernoulli Distribution

Consider an experiment with an outcome classified as either a “success” or as a “failure”.
F. Guta (CoBE) FE 606 September, 2023 50 / 167
If we let X = 1 if the outcome is a success and X = 0 if it is a failure, then the probability mass function of X is given by

p(0) = P(X = 0) = 1 − p
p(1) = P(X = 1) = p    (2.6)

where p, 0 ≤ p ≤ 1, is the probability that the trial is a “success”.
A random variable X is said to be a Bernoulli r.v if its probability mass function is given by equation (2.6) for some p ∈ (0, 1).

2.2.2 The Binomial Distribution

Consider n independent trials, each of which results in a “success” with probability p and in a “failure” with probability 1 − p.
If X represents the number of successes that occur in the n trials, then X is said to be a binomial random variable with parameters (n, p).
The probability mass function of a binomial random variable having parameters (n, p) is given by

p(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, 2, ..., n    (2.7)

where the binomial coefficient
C(n, x) = n! / [x!(n − x)!]
equals the number of different groups of x objects that can be chosen from a set of n objects.
The validity of equation (2.7) may be verified by first noting that the probability of any particular sequence of the n outcomes containing x successes and n − x failures is, by the assumed independence of trials, p^x (1 − p)^{n−x}.
Equation (2.7) then follows since there are C(n, x) different sequences of the n outcomes leading to x successes and n − x failures.
For instance, if n = 3, x = 2, then there are C(3, 2) = 3 ways in which the three trials can result in two successes.
Note that, by the binomial theorem, the probabilities sum to one, that is,

∑_{x=0}^n p(x) = ∑_{x=0}^n C(n, x) p^x (1 − p)^{n−x} = (p + (1 − p))^n = 1

Example (2.11)
It is known that any item produced by a certain
machine will be defective with probability 0.1,
independently of any other item. What is the
probability that in a sample of three items, at most
one will be defective?

F. Guta (CoBE) FE 606 September, 2023 55 / 167


Example (2.11 continued. . . )
Let X be the number of defective items in the sample; then X is a binomial r.v with parameters (3, 0.1). Hence, the desired probability is given by

F(1) = p(0) + p(1) = P(X = 0) + P(X = 1)
= C(3, 0) (0.1)⁰ (1 − 0.1)³ + C(3, 1) (0.1)¹ (1 − 0.1)² ≈ 0.972.

Note: If X is a binomial r.v with parameters (n, p ), then


we say that X has a binomial distribution with
F. Guta (CoBE) FE 606 September, 2023 56 / 167
parameters (n, p ).
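The calculation in Example 2.11 can be reproduced directly from the pmf in equation (2.7). The snippet below is a minimal sketch of mine in plain Python (the helper name binom_pmf is just illustrative).

```python
from math import comb

def binom_pmf(x, n, p):
    """p(x) = C(n, x) p^x (1 - p)^(n - x), equation (2.7)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 3, 0.1
# P(X <= 1) = p(0) + p(1) for a binomial(3, 0.1) random variable.
print(sum(binom_pmf(x, n, p) for x in (0, 1)))  # 0.972
```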

2.2.3 The Negative Binomial Distribution

The binomial distribution was generated by assuming that we repeated trials a fixed number n of times and then counted the total number of successes X in those n trials.
Suppose we decide in advance that we wish a fixed number (k) of successes instead, and sample repeatedly until we obtain exactly this number.
Then the number of trials X is random, with probability mass function

p(x) = C(x − 1, k − 1) p^k (1 − p)^{x−k}, x = k, k + 1, ...
2.2.4 The Geometric Distribution

Consider independent trials, each having probability p of being a success, performed until a success occurs.
If we let X be the number of trials required until the first success, then X is said to be a geometric r.v with parameter p.
F. Guta (CoBE) FE 606 September, 2023 58 / 167
Its probability mass function is given by

p(x) = (1 − p)^{x−1} p, x = 1, 2, ...    (2.8)

Equation (2.8) follows since in order for X to equal x it is necessary and sufficient that the first x − 1 trials be failures and the x-th trial a success, and since the outcomes of the successive trials are assumed to be independent.
To check that p(x) is a probability mass function, we note that

∑_{x=1}^∞ p(x) = ∑_{x=1}^∞ (1 − p)^{x−1} p = p ∑_{x=1}^∞ (1 − p)^{x−1} = 1

2.2.5 The Poisson Distribution

A random variable X, taking on one of the values 0, 1, 2, ..., is said to be a Poisson random variable with parameter λ if, for some λ > 0,

p(x) = P(X = x) = e^{−λ} λ^x / x!, x = 0, 1, ...    (2.9)

Equation (2.9) defines a probability mass function since
∑_{x=0}^∞ p(x) = e^{−λ} ∑_{x=0}^∞ λ^x / x! = e^{−λ} e^{λ} = 1
F. Guta (CoBE) FE 606 September, 2023 60 / 167
An important property of the Poisson r.v is that it may be used to approximate a binomial r.v when the binomial parameter n is large and p is small.
To see this, suppose that X is a binomial r.v with parameters (n, p), and let λ = np. Then

P{X = x} = [n! / (x!(n − x)!)] p^x (1 − p)^{n−x}
= [n! / (x!(n − x)!)] (λ/n)^x (1 − λ/n)^{n−x}
= [n(n − 1)⋯(n − x + 1) / n^x] (λ^x / x!) (1 − λ/n)^n / (1 − λ/n)^x

Now, for n large and p small,

(1 − λ/n)^n ≈ e^{−λ},  n(n − 1)⋯(n − x + 1) / n^x ≈ 1,  (1 − λ/n)^x ≈ 1

Hence, for n large and p small,

P{X = x} ≈ e^{−λ} λ^x / x!

Example (2.12)
Consider an experiment that consists of counting the number of α-particles given off in a one-second interval by one gram of radioactive material.
If we know from past experience that, on the average, 3.2 such α-particles are given off, what is a good approximation to the probability that no more than two α-particles will appear?
The number of α-particles given off will be a Poisson r.v with parameter λ = 3.2. Hence the desired probability is

P{X ≤ 2} = e^{−3.2} (1 + 3.2/1! + 3.2²/2!) ≈ 0.38
F. Guta (CoBE) FE 606 September, 2023 63 / 167
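A small numeric sketch of mine (plain Python) reproduces the Example 2.12 probability and also illustrates the Poisson approximation to the binomial discussed above; the choices of n below are merely illustrative.

```python
from math import exp, factorial, comb

lam = 3.2
# P{X <= 2} for a Poisson(3.2) random variable, as in Example 2.12.
poisson_le_2 = sum(exp(-lam) * lam**x / factorial(x) for x in range(3))
print(round(poisson_le_2, 4))  # ~0.3799

# Binomial(n, p) with np = 3.2 approaches the Poisson probability as n grows.
for n in (20, 200, 2000):
    p = lam / n
    binom_le_2 = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(3))
    print(n, round(binom_le_2, 4))
```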
2.3 Continuous Probability Distributions

In this section, we are concerned with r.v whose set of possible values is uncountable.
Let X be such a r.v. We say that X is a continuous r.v if there exists a nonnegative function f(x), defined for all real x ∈ (−∞, ∞), having the property that for any set B of real numbers

P{X ∈ B} = ∫_B f(x) dx    (2.10)

The function f(x) is called the probability density function of the r.v X.
Since X must assume some value, f(x) must satisfy

P{X ∈ (−∞, ∞)} = ∫_{−∞}^∞ f(x) dx = 1

Letting B = [a, b], we obtain from Equation (2.10) that

P{X ∈ [a, b]} = P{a ≤ X ≤ b} = ∫_a^b f(x) dx    (2.11)

If we let a = b in the preceding, then

P{X = a} = ∫_a^a f(x) dx = 0
F. Guta (CoBE) FE 606 September, 2023 65 / 167
The relationship between the cumulative distribution F(·) and the probability density f(·) is expressed by

F(x) = P{X ∈ (−∞, x]} = ∫_{−∞}^x f(u) du

Differentiating both sides of the preceding yields

dF(x)/dx = f(x)

A somewhat intuitive interpretation of the density function may be obtained from equation (2.11) as follows:

P{a − ε/2 ≤ X ≤ a + ε/2} = ∫_{a−ε/2}^{a+ε/2} f(x) dx ≈ ε f(a)

From this, we see that f(a) is a measure of how likely it is that the r.v will be near a.

2.3.1 The Uniform Distribution

A random variable is said to be uniformly distributed over the interval (α, β) if its probability density function (pdf) is given by

f(x) = 1/(β − α) if α < x < β, and f(x) = 0 otherwise.

For any α < a < b < β,

P{a ≤ X ≤ b} = ∫_a^b 1/(β − α) dx = (b − a)/(β − α)

Example (2.13)
Calculate the cdf of a r.v uniformly distributed over (α, β). Since F(x) = ∫_{−∞}^x f(u) du, we obtain

F(x) = 0,                 x ≤ α
F(x) = (x − α)/(β − α),   α < x < β
F(x) = 1,                 x ≥ β
F. Guta (CoBE) FE 606 September, 2023 68 / 167
2.3.2 Exponential Distribution

A continuous r.v whose pdf is given, for some λ > 0, by

f(x) = λ e^{−λx} for x ≥ 0, and f(x) = 0 for x < 0,

is said to be an exponential r.v with parameter λ.
The cumulative distribution function F:

F(x) = ∫_0^x λ e^{−λu} du = 1 − e^{−λx}, x ≥ 0

Note that F(∞) = ∫_0^∞ λ e^{−λx} dx = 1.
F. Guta (CoBE) FE 606 September, 2023 69 / 167
2.3.3 Gamma Distribution

A continuous r.v whose pdf is given by

f(x) = λ e^{−λx} (λx)^{α−1} / Γ(α) for x ≥ 0, and f(x) = 0 for x < 0,

for some λ > 0, α > 0, is said to be a gamma r.v with parameters α, λ.
The quantity Γ(α) is called the gamma function and is defined by

Γ(α) = ∫_0^∞ e^{−x} x^{α−1} dx

It is easy to show by induction that for integer α, say α = n, Γ(n) = (n − 1)!.

2.3.4 Normal Distribution

We say that X is a normal r.v (or simply that X is normally distributed) with parameters µ and σ² if the density of X is given by

f(x) = (1/(σ√(2π))) exp{−(1/2)((x − µ)/σ)²}, −∞ < x < ∞

An important fact about normal r.v is that if X is normally distributed with parameters µ and σ² then Y = αX + β is normally distributed with parameters αµ + β and α²σ².

To prove this, suppose first that α > 0 and note that F_Y(·), the cdf of the random variable Y, is given by

F_Y(y) = P{Y ≤ y} = P{αX + β ≤ y} = P{X ≤ (y − β)/α} = F_X((y − β)/α)
= ∫_{−∞}^{(y−β)/α} (1/(σ√(2π))) exp{−(1/2)((x − µ)/σ)²} dx

Differentiating,

dF_Y(y)/dy = (1/(ασ√(2π))) exp{−(1/2)((y − (αµ + β))/(ασ))²}, −∞ < y < ∞

Hence, Y is normally distributed with parameters αµ + β and (ασ)².
A similar result is also true when α < 0.
One implication of the preceding result is that if X is normally distributed with parameters µ and σ² then Y = (X − µ)/σ is normally distributed with parameters 0 and 1.
Such a random variable Y is said to have the standard normal distribution.
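Standardization is also how normal probabilities are computed in practice. The sketch below is my own illustration, assuming SciPy is available; the numbers µ = 100, σ = 15 and the interval are purely hypothetical.

```python
from scipy.stats import norm  # assumption: SciPy is installed

# For X ~ N(mu, sigma^2), P(a <= X <= b) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma),
# where Phi is the standard normal cdf.
mu, sigma = 100.0, 15.0
a, b = 85.0, 130.0
z_a, z_b = (a - mu) / sigma, (b - mu) / sigma
prob = norm.cdf(z_b) - norm.cdf(z_a)
print(round(prob, 4))  # equals norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma)
```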

2.4 Expectation of a Random Variable


2.4.1 The Discrete Case

If X is a discrete r.v having a probability mass function p(x), then the expected value of X is defined by

E(X) = ∑_{x: p(x)>0} x p(x)

F. Guta (CoBE) FE 606 September, 2023 74 / 167


Example (2.14)
Find E(X) where X is the outcome when we roll a fair die. Since p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6, we obtain

E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 7/2.

Example (2.15)
Find E(X) when X is binomially distributed with parameters n and p.

E(X) = ∑_{x=0}^n x C(n, x) p^x (1 − p)^{n−x} = ∑_{x=0}^n x [n!/(x!(n − x)!)] p^x (1 − p)^{n−x}
= np ∑_{x=1}^n [(n − 1)!/((x − 1)!(n − x)!)] p^{x−1} (1 − p)^{n−x}
= np ∑_{x=1}^n C(n − 1, x − 1) p^{x−1} (1 − p)^{n−x}
= np ∑_{k=0}^{n−1} C(n − 1, k) p^k (1 − p)^{(n−1)−k}
= np [p + (1 − p)]^{n−1}
= np
F. Guta (CoBE) FE 606 September, 2023 76 / 167
Example (2.16)
What is the expected value of a geometric r.v with parameter p?

E(X) = ∑_{x=1}^∞ x p (1 − p)^{x−1} = p ∑_{x=1}^∞ x (1 − p)^{x−1} = p ∑_{x=1}^∞ x q^{x−1}

where q = 1 − p.

E(X) = p ∑_{x=1}^∞ (d/dq)(q^x) = p (d/dq)(∑_{x=1}^∞ q^x)
= p (d/dq)(q/(1 − q)) = p / (1 − q)² = 1/p.

Example (2.17)
Find E(X) if X is a Poisson r.v with parameter λ.

E(X) = ∑_{x=0}^∞ x e^{−λ} λ^x / x! = ∑_{x=1}^∞ x e^{−λ} λ^x / x! = e^{−λ} ∑_{x=1}^∞ λ^x / (x − 1)!
= λ e^{−λ} ∑_{x=1}^∞ λ^{x−1} / (x − 1)! = λ e^{−λ} ∑_{k=0}^∞ λ^k / k! = λ e^{−λ} e^{λ} = λ

2.4.2 The Continuous Case

2.4.2 The Continuous Case


F. Guta (CoBE) FE 606 September, 2023 78 / 167
The expected value of a continuous r.v X having a pdf f(x) is defined by

E(X) = ∫_{−∞}^∞ x f(x) dx

Example (2.18)
Let X be exponentially distributed with parameter λ. Find E(X).

E(X) = ∫_0^∞ x λ e^{−λx} dx

Integrating by parts (dv = λ e^{−λx} dx, u = x) yields

E(X) = [−x e^{−λx}]_0^∞ + ∫_0^∞ e^{−λx} dx = 0 + [−e^{−λx}/λ]_0^∞ = 1/λ

Example (2.19)
Let X be a gamma r.v with parameters α and λ. Find E(X).

E(X) = ∫_0^∞ x λ e^{−λx} (λx)^{α−1} / Γ(α) dx = [Γ(α + 1)/(λ Γ(α))] ∫_0^∞ λ e^{−λx} (λx)^α / Γ(α + 1) dx
= α Γ(α) / (λ Γ(α)) = α/λ.

F. Guta (CoBE) FE 606 September, 2023 80 / 167


Example (2.20)
Calculate E(X) when X is normally distributed with parameters µ and σ².

E(X) = (1/(σ√(2π))) ∫_{−∞}^∞ x e^{−(1/2)((x−µ)/σ)²} dx

Writing x as (x − µ) + µ yields

E(X) = (1/(σ√(2π))) ∫_{−∞}^∞ (x − µ) e^{−(1/2)((x−µ)/σ)²} dx + µ (1/(σ√(2π))) ∫_{−∞}^∞ e^{−(1/2)((x−µ)/σ)²} dx

Letting y = x − µ leads to

E(X) = (1/(σ√(2π))) ∫_{−∞}^∞ y e^{−y²/2σ²} dy + µ ∫_{−∞}^∞ f(x) dx

where f(x) is the normal density. By symmetry, the first integral must be 0, and so

E(X) = µ ∫_{−∞}^∞ f(x) dx = µ.

2.4.3 Expectation of a Function of a Random Variable

2.4.3 Expectation of a Function of a Random


Variable
F. Guta (CoBE) FE 606 September, 2023 82 / 167
Suppose we are given a r.v X and its probability
distribution (that is, its probability mass function in
the discrete case or its pdf in the continuous case).
Suppose also that we are interested in the expected
value of some function of X , say, g (X ).
How do we go about doing this?
Proposition 2.1
a). If X is a discrete r.v with probability mass function f(x), then for any real-valued function g,
E[g(X)] = ∑_{x: f(x)>0} g(x) f(x)
b). If X is a continuous r.v with pdf f(x), then for any real-valued function g,
E[g(X)] = ∫_{−∞}^∞ g(x) f(x) dx

Corollary (2.1)
If a and b are constants, then

E(aX + b) = a E(X) + b

The expected value of a r.v X, E(X), is also referred to as the mean or the first moment of X.

The quantity E(Xⁿ), n ≥ 1, is called the n-th moment of X.
By Proposition 2.1, we note that

E(Xⁿ) = ∑_{x: f(x)>0} xⁿ f(x) if X is discrete, and E(Xⁿ) = ∫_{−∞}^∞ xⁿ f(x) dx if X is continuous.

Another quantity of interest is the variance of a r.v X, denoted by var(X), which is defined by

var(X) = E[(X − E(X))²] = E(X²) − E(X)²

Example (2.21)
Let X be normally distributed with parameters µ and σ². Find var(X).
Recalling that E(X) = µ, we have that

var(X) = E[(X − µ)²] = (1/(σ√(2π))) ∫_{−∞}^∞ (x − µ)² e^{−(x−µ)²/2σ²} dx

Substituting y = (x − µ)/σ yields

var(X) = (σ²/√(2π)) ∫_{−∞}^∞ y² e^{−y²/2} dy

Integrating by parts (u = y, dv = y e^{−y²/2} dy) gives

var(X) = (σ²/√(2π)) ([−y e^{−y²/2}]_{−∞}^∞ + ∫_{−∞}^∞ e^{−y²/2} dy) = (σ²/√(2π)) ∫_{−∞}^∞ e^{−y²/2} dy = σ²

Example (2.22)
Calculate var(X) where X represents the outcome when a fair die is rolled.
As previously noted in Example 2.14, E(X) = 7/2. Also,

E(X²) = 1²·(1/6) + 2²·(1/6) + 3²·(1/6) + 4²·(1/6) + 5²·(1/6) + 6²·(1/6) = 91/6.

Hence,

var(X) = E(X²) − E(X)² = 91/6 − (7/2)² = 35/12

2.5 Jointly Distributed Random Variables


F. Guta (CoBE) FE 606 September, 2023 88 / 167
2.5.1 Joint Distribution Functions

We are often interested in probability statements concerning two or more r.v.
To deal with such probabilities, we define, for any two r.v X and Y, the joint cumulative probability distribution function of X and Y by

F(x, y) = P{X ≤ x, Y ≤ y}, −∞ < x, y < ∞

The distribution of X can be obtained from the joint distribution of X and Y as follows:

F_X(x) = P{X ≤ x} = P{X ≤ x, Y < ∞} = F(x, ∞)

Similarly, the cumulative distribution function of Y is given by

F_Y(y) = P{Y ≤ y} = F(∞, y)

In the case where X and Y are both discrete random variables, it is convenient to define the joint probability mass function of X and Y by

p(x, y) = P{X = x, Y = y}

The probability mass function of X may be obtained from p(x, y) by

p_X(x) = ∑_{y: p(x,y)>0} p(x, y)

Similarly,

p_Y(y) = ∑_{x: p(x,y)>0} p(x, y)

We say that X and Y are jointly continuous if there exists a function f(x, y), defined for all real x and y, having the property that for all sets A and B of real numbers

P{X ∈ A, Y ∈ B} = ∫_B ∫_A f(x, y) dx dy

The function f(x, y) is called the joint probability density function of X and Y.
The probability density of X can be obtained from a knowledge of f(x, y) as follows:

P{X ∈ A} = P{X ∈ A, Y ∈ (−∞, ∞)} = ∫_A ∫_{−∞}^∞ f(x, y) dy dx = ∫_A f_X(x) dx

where f_X(x) = ∫_{−∞}^∞ f(x, y) dy.
Similarly, the probability density function of Y is given by

f_Y(y) = ∫_{−∞}^∞ f(x, y) dx

Since

F(x, y) = P{X ≤ x, Y ≤ y} = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du

differentiation yields

∂²F(x, y)/∂x∂y = f(x, y)
A variation of Proposition 2.1 states that if X and Y are r.v and g is a function of two variables, then

E[g(X, Y)] = ∑_y ∑_x g(x, y) p(x, y) in the discrete case, and
E[g(X, Y)] = ∫_{−∞}^∞ ∫_{−∞}^∞ g(x, y) f(x, y) dx dy in the continuous case.

If X₁, X₂, ..., X_n are n r.v, then for any n constants a₁, a₂, ..., a_n,

E[a₁X₁ + a₂X₂ + ⋯ + a_nX_n] = a₁E(X₁) + a₂E(X₂) + ⋯ + a_nE(X_n)

F. Guta (CoBE) FE 606 September, 2023 94 / 167


Example (2.23)
At a party N men throw their hats into the center
of a room. The hats are mixed up and each man
randomly selects one. Find the expected number of
men who select their own hats.
Let X denote the number of men that select their
own hats, we can best compute E (X ) by noting
that
X = X1 + X2 + + XN

where
F. Guta (CoBE) FE 606 September, 2023 95 / 167
Example (2.23 continued. . . )
X_i = 1 if the i-th man selects his own hat, and X_i = 0 otherwise.
Now, because the i-th man is equally likely to select any of the N hats, it follows that

P{X_i = 1} = P{the i-th man selects his own hat} = 1/N

and so,

E(X) = E(X₁) + E(X₂) + ⋯ + E(X_N) = N(1/N) = 1

F. Guta (CoBE) FE 606 September, 2023 96 / 167
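The hat-matching result in Example 2.23 is easy to confirm by simulation. The snippet below is my own sketch (plain Python, the trial count is arbitrary): the average number of matches stays close to 1 regardless of N.

```python
import random

def simulate(N, trials=100_000):
    """Average number of men who draw their own hat over many random permutations."""
    total = 0
    for _ in range(trials):
        hats = list(range(N))
        random.shuffle(hats)
        total += sum(1 for i, h in enumerate(hats) if i == h)
    return total / trials

print(simulate(10))   # ~1.0
print(simulate(100))  # ~1.0
```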


2.5.2 Independent Random Variables

The r.v X and Y are said to be independent if, for all x, y,

P{X ≤ x, Y ≤ y} = P{X ≤ x} P{Y ≤ y}    (2.12)

In terms of the joint distribution function F of X and Y, we have that X and Y are independent if

F(x, y) = F_X(x) F_Y(y), for all x and y

When X and Y are discrete, the condition of independence reduces to

p(x, y) = p_X(x) p_Y(y)

while if X and Y are jointly continuous, independence reduces to

f(x, y) = f_X(x) f_Y(y)

Proposition 2.2 If X and Y are independent, then for any functions h and g,

E[h(X) g(Y)] = E[h(X)] E[g(Y)]
F. Guta (CoBE) FE 606 September, 2023 98 / 167
Proof.
Suppose that X and Y are jointly continuous. Then

E[h(X) g(Y)] = ∫_{−∞}^∞ ∫_{−∞}^∞ h(x) g(y) f(x, y) dx dy
= ∫_{−∞}^∞ ∫_{−∞}^∞ h(x) g(y) f_X(x) f_Y(y) dx dy
= ∫_{−∞}^∞ h(x) f_X(x) dx ∫_{−∞}^∞ g(y) f_Y(y) dy
= E[h(X)] E[g(Y)]

The proof in the discrete case is similar.

F. Guta (CoBE) FE 606 September, 2023 99 / 167


2.5.3 Covariance and Variance of Sums of R.V.

The covariance of any two r.v X and Y, denoted by cov(X, Y), is defined by

cov(X, Y) = E[(X − E(X))(Y − E(Y))]
= E[XY − X E(Y) − Y E(X) + E(X) E(Y)]
= E(XY) − E(X) E(Y) − E(X) E(Y) + E(X) E(Y)
= E(XY) − E(X) E(Y)

Note that if X and Y are independent, then by Proposition 2.2 it follows that cov(X, Y) = 0.
Example (2.24)
The joint density function of X, Y is

f(x, y) = (1/y) e^{−(y + x/y)}, 0 < x, y < ∞

a). Verify that the preceding is a joint density function.
b). Find cov(X, Y).

Solution
a). To show that f(x, y) is a joint density function we need to show it is nonnegative, which is immediate, and that ∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y) dx dy = 1.
We prove the latter as follows:

∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y) dx dy = ∫_0^∞ ∫_0^∞ (1/y) e^{−(y + x/y)} dx dy
= ∫_0^∞ e^{−y} (∫_0^∞ (1/y) e^{−x/y} dx) dy
= ∫_0^∞ e^{−y} dy = 1
F. Guta (CoBE) FE 606 September, 2023 102 / 167


Solution
b). To obtain cov(X, Y), note that the density function of Y is

f_Y(y) = e^{−y} ∫_0^∞ (1/y) e^{−x/y} dx = e^{−y}

Thus, Y is an exponential r.v with parameter 1, implying that

E(Y) = 1

We compute E(X) and E(XY) as follows:
F. Guta (CoBE) FE 606 September, 2023 103 / 167


Solution
E(X) = ∫_{−∞}^∞ ∫_{−∞}^∞ x f(x, y) dx dy = ∫_0^∞ ∫_0^∞ (x/y) e^{−(y + x/y)} dx dy
= ∫_0^∞ e^{−y} (∫_0^∞ (x/y) e^{−x/y} dx) dy

Now, ∫_0^∞ (x/y) e^{−x/y} dx is the expected value of an exponential r.v with parameter 1/y, and thus is equal to y. Consequently,

E(X) = ∫_0^∞ y e^{−y} dy = 1
Also,

E(XY) = ∫_{−∞}^∞ ∫_{−∞}^∞ x y f(x, y) dx dy = ∫_0^∞ ∫_0^∞ x e^{−(y + x/y)} dx dy
= ∫_0^∞ y e^{−y} (∫_0^∞ (x/y) e^{−x/y} dx) dy
= ∫_0^∞ y² e^{−y} dy

F. Guta (CoBE) FE 606 September, 2023 105 / 167


Solution
Integration by parts (dv = e^{−y} dy, u = y²) gives

E(XY) = ∫_0^∞ y² e^{−y} dy = [−y² e^{−y}]_0^∞ + ∫_0^∞ 2y e^{−y} dy = 2E(Y) = 2

Consequently,

cov(X, Y) = E(XY) − E(X) E(Y) = 2 − (1)(1) = 1

Properties of Covariance

Properties of Covariance

F. Guta (CoBE) FE 606 September, 2023 106 / 167


For any r.v X , Y , Z and constant c,

i). cov (X , X ) = var (X ),


ii). cov (X , Y ) = cov (Y , X ),
iii). cov (cX , Y ) = c cov (X , Y )
iv). cov (X , Y + Z ) = cov (X , Y ) + cov (X , Z )

Whereas the first three properties are immediate, the final one is easily proven as follows:

cov(X, Y + Z) = E[X(Y + Z)] − E(X) E(Y + Z)
= E[XY + XZ] − E(X)[E(Y) + E(Z)]
= E(XY) − E(X) E(Y) + E(XZ) − E(X) E(Z)
= cov(X, Y) + cov(X, Z)

The fourth property listed generalizes to give the following result:

cov(∑_{i=1}^n X_i, ∑_{j=1}^m Y_j) = ∑_{i=1}^n ∑_{j=1}^m cov(X_i, Y_j)    (2.13)

A useful expression for the variance of the sum of r.v can be obtained from (2.13) as follows:

var(∑_{i=1}^n X_i) = cov(∑_{i=1}^n X_i, ∑_{j=1}^n X_j) = ∑_{i=1}^n ∑_{j=1}^n cov(X_i, X_j)
= ∑_{i=1}^n var(X_i) + ∑_{i=1}^n ∑_{j≠i} cov(X_i, X_j)
= ∑_{i=1}^n var(X_i) + 2 ∑_{i=1}^n ∑_{j<i} cov(X_i, X_j)    (2.14)

If X_i, i = 1, ..., n, are independent random variables, then (2.14) reduces to

var(∑_{i=1}^n X_i) = ∑_{i=1}^n var(X_i)
F. Guta (CoBE) FE 606 September, 2023 109 / 167
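A quick Monte Carlo sketch of mine (assuming NumPy is available) illustrates the independent case of equation (2.14): the variance of a sum of independent draws is close to the sum of the individual variances. The three distributions chosen are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000
x1 = rng.exponential(scale=2.0, size=n_samples)      # variance 4
x2 = rng.normal(loc=1.0, scale=3.0, size=n_samples)  # variance 9
x3 = rng.uniform(0.0, 6.0, size=n_samples)           # variance 3
total = x1 + x2 + x3
print(round(total.var(), 2), 4 + 9 + 3)  # sample variance of the sum ~ 16
```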
Definition (2.3)
If X₁, ..., X_n are independent and identically distributed, then the random variable X̄ = (1/n) ∑_{i=1}^n X_i is called the sample mean.

Proposition 2.3: Suppose that X₁, ..., X_n are independent and identically distributed with expected value µ and variance σ². Then,

i). E(X̄) = µ.
ii). var(X̄) = σ²/n.
iii). cov(X̄, X_i − X̄) = 0.
Proof.
Parts (i) and (ii) are easily established as follows:

E(X̄) = (1/n) ∑_{i=1}^n E(X_i) = µ
var(X̄) = (1/n)² var(∑_{i=1}^n X_i) = (1/n)² ∑_{i=1}^n var(X_i) = σ²/n

To prove part (iii) we reason as follows:

cov(X̄, X_i − X̄) = cov(X̄, X_i) − cov(X̄, X̄)
= (1/n) cov(X_i + ∑_{j≠i} X_j, X_i) − var(X̄)
= (1/n) cov(X_i, X_i) + (1/n) cov(∑_{j≠i} X_j, X_i) − σ²/n
= σ²/n − σ²/n = 0

Example (2.25)
Compute the variance of a binomial r.v X with parameters n and p.

Example (2.25)
Compute the variance of a binomial r.v X with
parameters n and p.
F. Guta (CoBE) FE 606 September, 2023 112 / 167
Solution
Since a binomial r.v represents the number of successes in n independent trials when each trial has a common probability p of being a success, we may write

X = X₁ + X₂ + ⋯ + X_n

where the X_i are independent Bernoulli r.v such that X_i = 1 if the i-th trial is a success and X_i = 0 otherwise.
Hence, from (2.14) we obtain

var(X) = var(X₁) + var(X₂) + ⋯ + var(X_n)

But

var(X_i) = E(X_i²) − E(X_i)² = p − p² = p(1 − p)

and thus
var(X) = np(1 − p)
F. Guta (CoBE) FE 606 September, 2023 114 / 167
It is often important to be able to calculate the distribution of X + Y from the distributions of X and Y when X and Y are independent.
Suppose that X and Y are continuous, X having pdf f and Y having pdf g.
Then, letting F_{X+Y}(a) be the cumulative distribution function of X + Y, we have

F_{X+Y}(a) = P{X + Y ≤ a} = ∫∫_{x+y≤a} f(x) g(y) dx dy
= ∫_{−∞}^∞ ∫_{−∞}^{a−y} f(x) g(y) dx dy = ∫_{−∞}^∞ (∫_{−∞}^{a−y} f(x) dx) g(y) dy
= ∫_{−∞}^∞ F_X(a − y) g(y) dy    (2.15)

By differentiating (2.15), we obtain that the pdf f_{X+Y}(a) of X + Y is given by

f_{X+Y}(a) = (d/da) ∫_{−∞}^∞ F_X(a − y) g(y) dy = ∫_{−∞}^∞ (d/da) F_X(a − y) g(y) dy
= ∫_{−∞}^∞ f_X(a − y) g(y) dy

Example (2.26)
Let X and Y be independent Poisson r.v with

F. Guta (CoBE) FE 606 September, 2023 116 / 167


Example (2.26 continued. . . )
respective means λ 1 and λ 2 . Calculate the
distribution of X + Y .
Solution
Since the event {X + Y = n} may be written as the union of the disjoint events {X = k, Y = n − k}, 0 ≤ k ≤ n, we have

P{X + Y = n} = ∑_{k=0}^n P{X = k, Y = n − k} = ∑_{k=0}^n P{X = k} P{Y = n − k}
= ∑_{k=0}^n e^{−λ₁} (λ₁^k / k!) e^{−λ₂} (λ₂^{n−k} / (n − k)!)
= [e^{−(λ₁+λ₂)} / n!] ∑_{k=0}^n [n! / (k!(n − k)!)] λ₁^k λ₂^{n−k}
= e^{−(λ₁+λ₂)} (λ₁ + λ₂)^n / n!

Thus, X + Y has a Poisson distribution with mean λ₁ + λ₂.
F. Guta (CoBE) FE 606 September, 2023 118 / 167
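The closure result in Example 2.26 can be checked by simulation. This is my own sketch, assuming NumPy is available; the rates 1.5 and 2.5 are arbitrary.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(1)
lam1, lam2 = 1.5, 2.5
# Empirical pmf of X + Y versus the Poisson(lam1 + lam2) pmf.
s = rng.poisson(lam1, 100_000) + rng.poisson(lam2, 100_000)
for n in range(5):
    empirical = (s == n).mean()
    exact = exp(-(lam1 + lam2)) * (lam1 + lam2)**n / factorial(n)
    print(n, round(empirical, 4), round(exact, 4))
```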
2.5.4 Joint Probability Distribution of Functions of
Random Variables.

Let X₁ and X₂ be jointly continuous r.v with joint pdf f(x₁, x₂).
It is sometimes necessary to obtain the joint distribution of the r.v Y₁ = g₁(X₁, X₂) and Y₂ = g₂(X₁, X₂) for some functions g₁ and g₂.
Assume that the functions g₁ and g₂ satisfy the following conditions:

1). The equations y₁ = g₁(x₁, x₂) and y₂ = g₂(x₁, x₂) can be uniquely solved for x₁ and x₂ in terms of y₁ and y₂, with solutions given by, say, x₁ = h₁(y₁, y₂), x₂ = h₂(y₁, y₂).
2). The functions h₁ and h₂ have continuous partial derivatives at all points (y₁, y₂) and are such that the 2×2 Jacobian determinant

J(y₁, y₂) = (∂h₁/∂y₁)(∂h₂/∂y₂) − (∂h₁/∂y₂)(∂h₂/∂y₁) ≠ 0

at all points (y₁, y₂).

Under these two conditions it can be shown that the random variables Y₁ and Y₂ are jointly continuous with joint density function given by:

f_{Y₁,Y₂}(y₁, y₂) = f_{X₁,X₂}(h₁(y₁, y₂), h₂(y₁, y₂)) |J(y₁, y₂)|    (2.16)

Example (2.27)
If X and Y are independent gamma r.v with parameters (α, λ) and (β, λ), respectively, compute the joint density of U = X + Y and V = X/(X + Y).

Solution
The joint density of X and Y is given by

f_{X,Y}(x, y) = f_X(x) f_Y(y) = [λ e^{−λx} (λx)^{α−1} / Γ(α)] [λ e^{−λy} (λy)^{β−1} / Γ(β)]
= [λ^{α+β} / (Γ(α) Γ(β))] e^{−λ(x+y)} x^{α−1} y^{β−1}

Now, if x = h₁(u, v) = uv and y = h₂(u, v) = u(1 − v), then

∂h₁/∂u = v, ∂h₁/∂v = u, ∂h₂/∂u = 1 − v, ∂h₂/∂v = −u
F. Guta (CoBE) FE 606 September, 2023 122 / 167
Solution
Finally, since |J(u, v)| = u, we see that

f_{U,V}(u, v) = f_{X,Y}(uv, u(1 − v)) u
= [λ^{α+β} / (Γ(α) Γ(β))] e^{−λu} (uv)^{α−1} (u(1 − v))^{β−1} u
= [λ^{α+β} / (Γ(α) Γ(β))] e^{−λu} u^{α+β−1} v^{α−1} (1 − v)^{β−1}
= [λ e^{−λu} (λu)^{α+β−1} / Γ(α + β)] [Γ(α + β) / (Γ(α) Γ(β))] v^{α−1} (1 − v)^{β−1}

Hence X + Y and X/(X + Y) are independent, with X + Y having a gamma distribution with parameters (α + β, λ) and X/(X + Y) having density function

f_V(v) = [Γ(α + β) / (Γ(α) Γ(β))] v^{α−1} (1 − v)^{β−1}, 0 < v < 1

This is called the beta density with parameters (α, β).

2.6 Moment Generating Functions

The moment generating function (MGF) M_X(t) of the r.v X is defined for all values t by

M_X(t) = E(e^{tX}) = ∑_x e^{tx} p(x) if X is discrete, and M_X(t) = ∫_{−∞}^∞ e^{tx} f(x) dx if X is continuous.

We call M_X(t) the MGF because all of the moments of X can be obtained by successively differentiating M_X(t). For example,

M′_X(t) = (d/dt) E(e^{tX}) = E((d/dt) e^{tX}) = E(X e^{tX})
F. Guta (CoBE) FE 606 September, 2023 125 / 167
Hence,
M′_X(0) = E(X)

Similarly,

M″_X(t) = (d/dt) E(X e^{tX}) = E(X² e^{tX})

and so
M″_X(0) = E(X²)

In general, the n-th derivative of M_X(t) evaluated at t = 0 equals E(Xⁿ), i.e.,

M_X^{(n)}(0) = E(Xⁿ), n ≥ 1

We now compute M_X(t) for some common distributions.
Example (2.28)
MGF of the Binomial distribution with parameters n and p:

M_X(t) = E(e^{tX}) = ∑_{x=0}^n e^{tx} C(n, x) p^x (1 − p)^{n−x}
= ∑_{x=0}^n C(n, x) (pe^t)^x (1 − p)^{n−x} = (pe^t + 1 − p)^n

Hence, M′_X(t) = n(pe^t + 1 − p)^{n−1} pe^t, and so M′_X(0) = E(X) = np.

Differentiating a second time yields

M″_X(t) = n(n − 1)(pe^t + 1 − p)^{n−2} (pe^t)² + n(pe^t + 1 − p)^{n−1} pe^t

and so

E(X²) = M″_X(0) = n(n − 1)p² + np

Thus, the variance of X is given by

var(X) = E(X²) − E(X)² = n(n − 1)p² + np − n²p² = np(1 − p)

Example (2.29)
MGF of the Poisson distribution with mean λ:

M_X(t) = E(e^{tX}) = ∑_{x=0}^∞ e^{tx} e^{−λ} λ^x / x!
= e^{−λ} ∑_{x=0}^∞ (λe^t)^x / x! = e^{−λ} e^{λe^t} = exp{λ(e^t − 1)}

Differentiation yields

M′_X(t) = λe^t exp{λ(e^t − 1)}
M″_X(t) = λ²e^{2t} exp{λ(e^t − 1)} + λe^t exp{λ(e^t − 1)}

and so E(X) = M′_X(0) = λ, E(X²) = M″_X(0) = λ² + λ,

var(X) = E(X²) − E(X)² = λ

Thus, both the mean and the variance of the Poisson distribution equal λ.
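The moment-from-MGF recipe can also be carried out symbolically. The sketch below is my own illustration, assuming SymPy is available, applied to the Poisson MGF just derived.

```python
import sympy as sp

# Moments of Poisson(lambda) from its MGF exp(lambda*(e^t - 1)) by differentiating at t = 0.
t, lam = sp.symbols('t lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))
m1 = sp.diff(M, t, 1).subs(t, 0)   # E(X) = lambda
m2 = sp.diff(M, t, 2).subs(t, 0)   # E(X^2) = lambda^2 + lambda
print(sp.simplify(m1), sp.simplify(m2), sp.simplify(m2 - m1**2))  # variance = lambda
```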

Example (2.30)
MGF of the Gamma distribution with parameters α and λ:

M_X(t) = E(e^{tX}) = ∫_0^∞ e^{tx} λ e^{−λx} (λx)^{α−1} / Γ(α) dx
= [λ^α / (λ − t)^α] ∫_0^∞ (λ − t) e^{−(λ−t)x} ((λ − t)x)^{α−1} / Γ(α) dx
= (λ / (λ − t))^α, for t < λ

Differentiation of M_X(t) yields

M′_X(t) = αλ^α / (λ − t)^{α+1},  M″_X(t) = α(α + 1)λ^α / (λ − t)^{α+2}

Hence,

E(X) = M′_X(0) = α/λ,  E(X²) = M″_X(0) = α(α + 1)/λ²

The variance of X is thus given by

var(X) = E(X²) − E(X)² = α(α + 1)/λ² − (α/λ)² = α/λ²
Example (2.31)
MGF of the Normal distribution with parameters µ and σ²:
The MGF of a standard normal r.v Z is obtained as follows.

M_Z(t) = E(e^{tZ}) = ∫_{−∞}^∞ e^{tz} (1/√(2π)) e^{−z²/2} dz
= ∫_{−∞}^∞ (1/√(2π)) e^{−(z² − 2tz)/2} dz
= e^{t²/2} ∫_{−∞}^∞ (1/√(2π)) e^{−(z−t)²/2} dz = e^{t²/2}

If Z is a standard normal, then X = σZ + µ is normal with parameters µ and σ²; therefore,

M_X(t) = E(e^{tX}) = E(e^{t(σZ + µ)}) = E(e^{tµ} e^{tσZ})
= e^{tµ} M_Z(tσ) = e^{tµ} e^{t²σ²/2} = exp{tµ + (1/2)t²σ²}

By differentiating we obtain

M′_X(t) = (µ + tσ²) exp{tµ + (1/2)t²σ²}
M″_X(t) = (µ + tσ²)² exp{tµ + (1/2)t²σ²} + σ² exp{tµ + (1/2)t²σ²}

and so E(X) = M′_X(0) = µ, E(X²) = M″_X(0) = µ² + σ², implying that

var(X) = E(X²) − E(X)² = σ²

Note: The MGF of the sum of independent r.v is just the product of the individual moment generating functions.

Example (2.32)
Show that if X and Y are independent normal r.v with parameters (µ₁, σ₁²) and (µ₂, σ₂²), respectively, then X + Y is normal with mean µ₁ + µ₂ and variance σ₁² + σ₂².

Solution
The MGF of X + Y is found as follows:

M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = M_X(t) M_Y(t)
= exp{tµ₁ + (1/2)t²σ₁²} exp{tµ₂ + (1/2)t²σ₂²}
= exp{t(µ₁ + µ₂) + (1/2)t²(σ₁² + σ₂²)}

This is the MGF of a normal r.v with mean µ₁ + µ₂ and variance σ₁² + σ₂².
Hence, the result follows since the MGF uniquely determines the distribution.

2.6.1 Joint Distribution of the Sample Mean and


Sample Variance from a Normal Population

Let X₁, ..., X_n be independent and identically distributed r.v, each with mean µ and variance σ².
The random variable S² defined by

S² = (1/(n − 1)) ∑_{i=1}^n (X_i − X̄)²

is called the sample variance of these data.
To compute E(S²) we use the identity

∑_{i=1}^n (X_i − X̄)² = ∑_{i=1}^n (X_i − µ)² − n(X̄ − µ)²    (2.17)

This can be shown as follows:

∑_{i=1}^n (X_i − X̄)² = ∑_{i=1}^n ((X_i − µ) − (X̄ − µ))²
= ∑_{i=1}^n (X_i − µ)² + n(X̄ − µ)² − 2(X̄ − µ) ∑_{i=1}^n (X_i − µ)
= ∑_{i=1}^n (X_i − µ)² − n(X̄ − µ)²

Using identity (2.17) gives

E[(n − 1)S²] = ∑_{i=1}^n E[(X_i − µ)²] − n E[(X̄ − µ)²]
= nσ² − n var(X̄) = (n − 1)σ²

Thus, we obtain from the preceding that

E(S²) = σ²
F. Guta (CoBE) FE 606 September, 2023 140 / 167
We will now determine the joint distribution of the sample mean X̄ = ∑_{i=1}^n X_i / n and the sample variance S² when the X_i have a normal distribution.
To begin we need the concept of a chi-squared random variable.

Definition (2.4)
If Z₁, ..., Z_n are independent standard normal r.v, then the random variable ∑_{i=1}^n Z_i² is said to be a chi-squared random variable with n degrees of freedom.
F. Guta (CoBE) FE 606 September, 2023 141 / 167
We shall now compute the moment generating function of ∑_{i=1}^n Z_i². To begin, note that

E(e^{tZ_i²}) = ∫_{−∞}^∞ e^{tz_i²} (1/√(2π)) e^{−z_i²/2} dz_i
= ∫_{−∞}^∞ (1/√(2π)) e^{−(1−2t)z_i²/2} dz_i
= σ ∫_{−∞}^∞ (1/(σ√(2π))) e^{−z_i²/2σ²} dz_i, where σ² = (1 − 2t)^{−1}
= σ = (1 − 2t)^{−1/2}

Hence,

E[exp(t ∑_{i=1}^n Z_i²)] = ∏_{i=1}^n E[exp(tZ_i²)] = (1 − 2t)^{−n/2}

Now, let X₁, ..., X_n be independent normal r.v, each with mean µ and variance σ², and let X̄ = ∑_{i=1}^n X_i / n and S² denote their sample mean and sample variance.
Since the sum of independent normal r.v is also a normal r.v, it follows that X̄ is a normal r.v with mean µ and variance σ²/n.
In addition, from Proposition 2.3,

cov(X̄, X_i − X̄) = 0, i = 1, 2, ..., n    (2.18)

For normal r.v, zero correlation implies independence; thus, since X̄ is independent of the sequence of deviations X_i − X̄, i = 1, 2, ..., n, it follows that it is also independent of the sample variance S².
To determine the distribution of S², use identity (2.17) to obtain

(n − 1)S² = ∑_{i=1}^n (X_i − µ)² − n(X̄ − µ)²
Dividing both sides of this equation by σ² yields

(n − 1)S²/σ² + ((X̄ − µ)/(σ/√n))² = ∑_{i=1}^n ((X_i − µ)/σ)²    (2.19)

Now, ∑_{i=1}^n ((X_i − µ)/σ)² is the sum of the squares of n independent standard normal r.v, and so is a chi-squared r.v with n degrees of freedom; it thus has moment generating function (1 − 2t)^{−n/2}.
Also, ((X̄ − µ)/(σ/√n))² is the square of a standard normal r.v and so is a chi-squared r.v with one df; it thus has MGF (1 − 2t)^{−1/2}.
In addition, we have already seen that the two r.v on the left side of (2.19) are independent.
Therefore, because the MGF of the sum of independent r.v is equal to the product of their individual MGFs, we obtain that

E[e^{t(n−1)S²/σ²}] (1 − 2t)^{−1/2} = (1 − 2t)^{−n/2}
E[e^{t(n−1)S²/σ²}] = (1 − 2t)^{−(n−1)/2}
F. Guta (CoBE) FE 606 September, 2023 146 / 167


But because (1 − 2t)^{−(n−1)/2} is the MGF of a χ² r.v with n − 1 df, we can conclude, since the MGF uniquely determines the distribution of the r.v, that this is the distribution of (n − 1)S²/σ².
Proposition 2.4 If X₁, ..., X_n are iid normal r.v with mean µ and variance σ², then the sample mean X̄ and the sample variance S² are independent. X̄ is a normal r.v with mean µ and variance σ²/n; (n − 1)S²/σ² is a χ² r.v with n − 1 df.

F. Guta (CoBE) FE 606 September, 2023 147 / 167
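Proposition 2.4 can be illustrated by simulation. This is my own sketch (NumPy assumed; the values of µ, σ, n are arbitrary): (n − 1)S²/σ² averages to n − 1, the mean of a χ² with n − 1 df, and X̄ and S² are essentially uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 5.0, 2.0, 10, 50_000
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)          # sample variance with divisor n - 1
stat = (n - 1) * s2 / sigma**2
print(round(stat.mean(), 2), n - 1)               # ~9
print(round(np.corrcoef(xbar, s2)[0, 1], 3))      # ~0
```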


2.7 Limit Theorems

Proposition 2.5 (Markov's Inequality) If X is a r.v that takes only nonnegative values, then for any value a > 0, P{X ≥ a} ≤ E(X)/a.

Proof.
We give a proof for the case where X is continuous with density f.

E(X) = ∫_0^∞ x f(x) dx = ∫_0^a x f(x) dx + ∫_a^∞ x f(x) dx
≥ ∫_a^∞ x f(x) dx ≥ a ∫_a^∞ f(x) dx = a P{X ≥ a}

Hence E(X) ≥ a P{X ≥ a}, i.e. P{X ≥ a} ≤ E(X)/a.

Proposition 2.6 (Chebyshev's Inequality) If X is a r.v with mean µ and variance σ², then, for any value k > 0,

P{|X − µ| ≥ k} ≤ σ²/k²

Proof.
Since (X − µ)² is a nonnegative random variable, we can apply Markov's inequality (with a = k²) to obtain

P{(X − µ)² ≥ k²} ≤ E[(X − µ)²]/k² = σ²/k²

Example (2.33)
Suppose we know that the number of items
produced in a factory during a week is a r.v with
F. Guta (CoBE) FE 606 September, 2023 150 / 167
Example (2.33 continued. . . )
mean 500.

a). What is the probability that this week’s production


will be at least 1000?
b). If the variance of a week’s production is known to
equal 100, then what is the probability that this
week’s production will be between 400 and 600?

Solution
Let X be the number of items that will be produced
in a week.
F. Guta (CoBE) FE 606 September, 2023 151 / 167
Solution
a). By Markov's inequality,

P{X ≥ 1000} ≤ E(X)/1000 = 500/1000 = 1/2.

b). By Chebyshev's inequality,

P{|X − 500| ≥ 100} ≤ σ²/100² = 1/100

Hence

P{|X − 500| < 100} ≥ 1 − 1/100 = 99/100

and so the probability that this week's production will be between 400 and 600 is at least 0.99.
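A short simulation sketch of mine shows how loose these bounds can be; it assumes a hypothetical weekly production distribution, here normal with mean 500 and variance 100, which is not specified in the example itself.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(500, 10, size=200_000)   # assumed distribution: mean 500, variance 100
print((x >= 1000).mean(), "<=", 0.5)               # Markov bound from part (a)
print((abs(x - 500) >= 100).mean(), "<=", 0.01)    # Chebyshev bound from part (b)
```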
Theorem (2.1: Strong Law of Large Numbers)
Let X₁, X₂, ... be a sequence of independent r.v having a common distribution, and let E(X_i) = µ.
Then, with probability 1, X̄ → µ as n → ∞.

Example (2.34)
Suppose that a sequence of independent trials is performed. Let E be a fixed event and denote by P(E) the probability that E occurs on any particular trial. Let
X_i = 1 if E occurs on the i-th trial, and X_i = 0 if E does not occur on the i-th trial.
We have by the strong law of large numbers that, with probability 1,

(X₁ + X₂ + ⋯ + X_n)/n → E(X) = P(E)
F. Guta (CoBE) FE 606 September, 2023 154 / 167
Theorem (2.2: Central Limit Theorem)
Let X₁, X₂, ... be a sequence of independent, identically distributed r.v, each with mean µ and variance σ². Then

(X̄ − µ)/(σ/√n) →ᵈ N(0, 1), as n → ∞.

Example (2.35: Normal Approximation to the Binomial)
Let X be the number of times that a fair coin, flipped 40 times, lands heads. Find the probability that X = 20. Use the normal approximation and then compare it to the exact solution.

Solution
Since the binomial is a discrete r.v and the normal a continuous r.v, it leads to a better approximation to write the desired probability as

P{X = 20} = P{19.5 < X < 20.5}
= P{(19.5 − 20)/√10 < (X − np)/√(np(1 − p)) < (20.5 − 20)/√10}
≈ P{−0.16 < Z < 0.16} ≈ Φ(0.16) − Φ(−0.16) ≈ 0.1272

The exact result is

P{X = 20} = C(40, 20) (1/2)^40 ≈ 0.1254
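Both numbers in Example 2.35 are easy to reproduce. The sketch below is my own, assuming SciPy is available for the standard normal cdf.

```python
from math import comb, sqrt
from scipy.stats import norm  # assumption: SciPy is installed

# Normal approximation with continuity correction vs. exact binomial, n = 40, p = 1/2.
n, p = 40, 0.5
mu, sd = n * p, sqrt(n * p * (1 - p))
approx = norm.cdf((20.5 - mu) / sd) - norm.cdf((19.5 - mu) / sd)
exact = comb(40, 20) * 0.5**40
print(round(approx, 4), round(exact, 4))  # ~0.1272 vs ~0.1254
```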

2.8 Conditional Probability and Expectation

One of the most useful concepts in probability


F. Guta (CoBE) FE 606 September, 2023 157 / 167
theory is that of conditional probability and
conditional expectation.

2.8.1 The Discrete Case

If X and Y are discrete random variables, then it is natural to define the conditional probability mass function of X given that Y = y by

p_{X|Y}(x | y) = P{X = x | Y = y} = P{X = x, Y = y} / P{Y = y} = p(x, y) / p_Y(y)

for all values of y such that P{Y = y} > 0.
Similarly, the conditional probability distribution function of X given that Y = y is defined, for all y such that P{Y = y} > 0, by

F_{X|Y}(x | y) = P{X ≤ x | Y = y} = ∑_{a ≤ x} p_{X|Y}(a | y)

Finally, the conditional expectation of X given that Y = y is defined by

E[X | Y = y] = ∑_x x P{X = x | Y = y} = ∑_x x p_{X|Y}(x | y)

F. Guta (CoBE) FE 606 September, 2023 159 / 167


If X is independent of Y, then the conditional mass function, distribution, and expectation are the same as the unconditional ones.
This follows since, if X is independent of Y, then

p_{X|Y}(x | y) = P{X = x | Y = y} = P{X = x}

Example (2.36)
If X and Y are independent Poisson random
variables with respective means λ 1 and λ 2 ,
calculate the conditional expected value of X given
F. Guta (CoBE) FE 606 September, 2023 160 / 167
Example (2.36 continued. . . )
that X + Y = n.
Solution
Let us first calculate the conditional probability mass function of X given that X + Y = n. We obtain

P{X = x | X + Y = n} = P{X = x, X + Y = n} / P{X + Y = n}
= P{X = x, Y = n − x} / P{X + Y = n}
= P{X = x} P{Y = n − x} / P{X + Y = n}

F. Guta (CoBE) FE 606 September, 2023 161 / 167


Solution
Recalling that X + Y has a Poisson distribution with mean λ₁ + λ₂, the preceding equation equals

P{X = x | X + Y = n} = [e^{−λ₁} λ₁^x / x!] [e^{−λ₂} λ₂^{n−x} / (n − x)!] [e^{−(λ₁+λ₂)} (λ₁ + λ₂)^n / n!]^{−1}
= C(n, x) λ₁^x λ₂^{n−x} / (λ₁ + λ₂)^n
= C(n, x) (λ₁/(λ₁ + λ₂))^x (λ₂/(λ₁ + λ₂))^{n−x}

That is, the conditional distribution of X given that X + Y = n is the binomial distribution with parameters n and λ₁/(λ₁ + λ₂). Hence,

E[X | X + Y = n] = n λ₁/(λ₁ + λ₂).

2.8.2 The Continuous Case

If X and Y have a joint pdf f(x, y), then the conditional pdf of X, given that Y = y, is defined for all values of y such that f_Y(y) > 0, by

f_{X|Y}(x | y) = f(x, y) / f_Y(y)

The conditional expectation of X, given that Y = y, is defined for all values of y such that f_Y(y) > 0, by

E[X | Y = y] = ∫_{−∞}^∞ x f_{X|Y}(x | y) dx

Example (2.37)
Suppose the joint density of X and Y is given by

f(x, y) = 4y(x − y) e^{−(x+y)} for 0 < x < ∞, 0 ≤ y ≤ x, and f(x, y) = 0 otherwise.

Compute E[X | Y = y].
F. Guta (CoBE) FE 606 September, 2023 164 / 167
Solution
The conditional density of X, given that Y = y, is given by

f_{X|Y}(x | y) = f(x, y) / f_Y(y)
= 4y(x − y) e^{−(x+y)} / ∫_y^∞ 4y(x − y) e^{−(x+y)} dx
= (x − y) e^{−x} / ∫_y^∞ (x − y) e^{−x} dx
= (x − y) e^{−x} / ∫_0^∞ s e^{−(y+s)} ds, by letting s = x − y
= (x − y) e^{−(x−y)}, x > y

Hence,

E[X | Y = y] = ∫_y^∞ x (x − y) e^{−(x−y)} dx
= ∫_0^∞ (s + y) s e^{−s} ds
= ∫_0^∞ s² e^{−s} ds + y ∫_0^∞ s e^{−s} ds
= 2 + y
F. Guta (CoBE) FE 606 September, 2023 166 / 167
Note 1: Law of iterated expectations:

E(X) = E[E(X | Y)]

F. Guta (CoBE) FE 606 September, 2023 167 / 167
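The law of iterated expectations can be illustrated numerically. The sketch below is my own toy construction (NumPy assumed), loosely echoing Example 2.37: Y is drawn first and, given Y = y, X has conditional mean 2 + y, so E(X) should equal 2 + E(Y) = 3.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.exponential(1.0, size=500_000)       # assumed: Y ~ Exponential(1)
x = rng.normal(2.0 + y, 1.0)                 # assumed: X | Y = y ~ Normal(2 + y, 1)
print(round(x.mean(), 3))                    # direct E(X), ~3
print(round((2.0 + y).mean(), 3))            # E[E(X | Y)] = E(2 + Y), ~3
```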
