
Introduction to Probability Theory

K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay

October 1, 2017

LECTURES 18-19

Real valued transformations of multi random variables: In this subsection, we look at transformations of (X, Y) of the form ϕ ◦ (X, Y), i.e. ϕ(X, Y) where ϕ : R² → R is a real-valued function of (X, Y); for example X + Y, XY, etc. One can write down the distribution function of Z = ϕ ◦ (X, Y) using the distribution µ of (X, Y) as follows. Set

Az = {(x, y) | ϕ(x, y) ≤ z},  z ∈ R.
Then
FZ(z) = P({ϕ(X, Y) ∈ (−∞, z]}) = P({(X, Y) ∈ Az}) = µ(Az),

where µ denotes the joint distribution of (X, Y) and FZ denotes the distribution function of Z = ϕ(X, Y). Hence it is all about computing Az and then µ(Az).

Example 0.1 (Distribution of sum) Let X, Y be random variables with joint pdf f. Then one can compute the distribution of the sum Z = X + Y as follows. Note

Az = {(x, y) | −∞ < x < ∞, −∞ < y ≤ z − x}.

Hence

FZ(z) = ∫_{−∞}^{∞} ∫_{−∞}^{z−x} f(x, y) dy dx
      = ∫_{−∞}^{∞} ∫_{−∞}^{z} f(x, t − x) dt dx      (put t = y + x)
      = ∫_{−∞}^{z} ∫_{−∞}^{∞} f(x, t − x) dx dt.     (change order of integration)

Hence X + Y has pdf given by

fZ(z) = ∫_{−∞}^{∞} f(x, z − x) dx.

Corollary 0.1 Let X, Y be independent random variables with joint pdf f. Then the pdf of X + Y is given by

fX+Y(z) = fX ∗ fY(z),

where fX ∗ fY denotes the convolution of fX and fY and is defined as

fX ∗ fY(z) = ∫_R fX(y) fY(z − y) dy = ∫_R fX(z − y) fY(y) dy.

Proof. The proof (exercise) follows immediately from f (x, y) = fX (x)fY (y)
and the above example.
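
For illustration, here is a minimal Python sketch (assuming NumPy and SciPy are available; the standard normal choice and the integration grid are arbitrary) that approximates the convolution integral on a grid and compares it with the known N(0, 2) density of the sum of two independent standard normals.

```python
import numpy as np
from scipy.stats import norm

# Grid approximation of the convolution integral
# f_{X+Y}(z) = ∫ f_X(y) f_Y(z - y) dy for X, Y ~ N(0, 1) independent.
y = np.linspace(-10, 10, 2001)
dy = y[1] - y[0]

def conv_pdf(z):
    return np.sum(norm.pdf(y) * norm.pdf(z - y)) * dy

for z in (-1.0, 0.0, 2.5):
    exact = norm.pdf(z, scale=np.sqrt(2))   # X + Y ~ N(0, 2)
    print(z, conv_pdf(z), exact)            # the two values should agree closely
```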

Example 0.2 Let X, Y be independent exponential random variables with parameters λ1 and λ2 respectively. Then

fX(x) = { λ1 e^{−λ1 x}   if x ≥ 0
        { 0              if x < 0,

and fY is given similarly. Now for z ≤ 0, clearly fX ∗ fY(z) = 0. For z > 0,

fX ∗ fY(z) = ∫_0^z λ1 e^{−λ1 x} λ2 e^{−λ2 (z−x)} dx
           = { (λ1 λ2 / (λ2 − λ1)) (e^{−λ1 z} − e^{−λ2 z})   if λ1 ≠ λ2
             { λ² z e^{−λ z}                                  if λ1 = λ2 = λ.

The above gives the pdf of X + Y.
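
As a quick numerical sanity check of this formula, the following sketch (illustrative parameters λ1 = 1, λ2 = 2; NumPy assumed) simulates X + Y and compares an empirical density estimate with the derived expression for the case λ1 ≠ λ2.

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2 = 1.0, 2.0                      # illustrative parameters
z = rng.exponential(1/lam1, 200_000) + rng.exponential(1/lam2, 200_000)

# Derived density for lam1 != lam2
t = np.linspace(0.05, 6, 10)
pdf = lam1*lam2/(lam2 - lam1) * (np.exp(-lam1*t) - np.exp(-lam2*t))

# Empirical density from the simulated sums, evaluated near the same points
hist, edges = np.histogram(z, bins=200, range=(0, 6), density=True)
centers = 0.5*(edges[:-1] + edges[1:])
emp = np.interp(t, centers, hist)
print(np.round(pdf, 3))
print(np.round(emp, 3))                    # should roughly match the formula
```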

Linear transformations of multi random variables: In this section we look at the case ϕ(x, y) = (x, y)A, where A is a 2 × 2 invertible (i.e. non-singular) matrix, and derive the distribution of ϕ(X, Y). Note that

A = ( cos θ   −sin θ
      sin θ    cos θ )

rotates (X, Y) through an angle θ in the counter-clockwise direction. To find the distribution of (U, V) = (X, Y)A, we use the following change of variable formula. Let ϕ = (ϕ1, ϕ2) : R² → R² be a map with continuously differentiable partial derivatives such that the Jacobian at (x, y),

J(ϕ1(x, y), ϕ2(x, y)) = det ( ∂ϕ1/∂x   ∂ϕ1/∂y
                              ∂ϕ2/∂x   ∂ϕ2/∂y ) ≠ 0.

Then the area element under the mapping (x, y) ↦ (u = ϕ1(x, y), v = ϕ2(x, y)) transforms as dx dy ↦ du dv = |J(x, y)| dx dy.¹

¹ Here note that the infinitesimal (small) rectangle [x, x + dx] × [y, y + dy], i.e. dx × dy, is approximately mapped to the parallelogram du × dv. Now du is the vector joining the points (ϕ1(x, y), ϕ2(x, y)) and (ϕ1(x + dx, y), ϕ2(x + dx, y)), and hence

du = (ϕ1(x + dx, y) − ϕ1(x, y), ϕ2(x + dx, y) − ϕ2(x, y)) ∼ ((∂ϕ1(x, y)/∂x) dx, (∂ϕ2(x, y)/∂x) dx).

Similarly

dv ∼ ((∂ϕ1(x, y)/∂y) dy, (∂ϕ2(x, y)/∂y) dy).

Regarding these as vectors in three dimensions,

du = (∂ϕ1(x, y)/∂x) dx i + (∂ϕ2(x, y)/∂x) dx j + 0 k,
dv = (∂ϕ1(x, y)/∂y) dy i + (∂ϕ2(x, y)/∂y) dy j + 0 k,

and the area of the parallelogram is

|du × dv| = |(∂ϕ1(x, y)/∂x · ∂ϕ2(x, y)/∂y − ∂ϕ2(x, y)/∂x · ∂ϕ1(x, y)/∂y) dx dy| = |J(ϕ1(x, y), ϕ2(x, y))| dx dy.

Theorem 0.1 Let D be an elementary region in R² and f : D → R be continuous. Let ϕ = (ϕ1, ϕ2) : O → R² be such that

(i) ϕ is one to one,

(ii) ϕ1, ϕ2 have continuous partial derivatives on O, where O is an open set in R²,

(iii) J(ϕ1(x, y), ϕ2(x, y)) ≠ 0 on O,

(iv) there exists E ⊆ O such that E is elementary and ϕ(E) = D.

Then

∫∫_D f(u, v) du dv = ∫∫_E f(ϕ1(x, y), ϕ2(x, y)) |J(ϕ1(x, y), ϕ2(x, y))| dx dy.

As an application of the above change of variable formula, we have the following result.

Theorem 0.2 Let (X, Y) be a random vector in R² with joint pdf f and let A be a non-singular 2 × 2 matrix. Then (U, V) = (X, Y)A has pdf g given by

g(u, v) = (1/|det A|) f((u, v)A⁻¹).

Proof: For u, v ∈ R, set Ruv = (−∞, u] × (−∞, v] and I = Ruv A⁻¹. One can see that I will be an elementary set. Now the distribution function F(U,V) of (U, V) is given by

F(U,V)(u, v) = P{(U, V) ∈ Ruv}
             = P{(X, Y) ∈ I}                                [since (x, y) ↦ (x, y)A is a bijection]
             = ∫∫_I f(x, y) dx dy
             = ∫∫_{Ruv} f((u, v)A⁻¹) (1/|det A|) du dv.     [Theorem 0.1 applied to (u, v) ↦ (u, v)A⁻¹]

Hence (U, V) has a pdf, say g, given by

g(u, v) = (1/|det A|) f((u, v)A⁻¹).
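
As an illustration of Theorem 0.2, the following sketch (assuming NumPy and SciPy; the rotation angle, the covariance matrix of (X, Y) and the evaluation point are arbitrary choices) compares the formula g(u, v) = f((u, v)A⁻¹)/|det A| with a crude Monte Carlo estimate of the density of (U, V) = (X, Y)A.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
theta = 0.7                                 # arbitrary angle for the example
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# (X, Y) with a known joint pdf f; here a bivariate normal, purely for illustration.
cov = [[2.0, 0.5], [0.5, 1.0]]
f = multivariate_normal(mean=[0, 0], cov=cov).pdf
XY = rng.multivariate_normal([0, 0], cov, size=500_000)
UV = XY @ A                                 # (U, V) = (X, Y) A, row-vector convention

# Theorem 0.2: g(u, v) = f((u, v) A^{-1}) / |det A|
u, v = 0.5, -0.3
g_formula = f(np.array([u, v]) @ np.linalg.inv(A)) / abs(np.linalg.det(A))

# Crude Monte Carlo estimate of g at (u, v): fraction of samples in a small box
h = 0.05
in_box = (np.abs(UV[:, 0] - u) < h) & (np.abs(UV[:, 1] - v) < h)
g_mc = in_box.mean() / (2*h)**2
print(g_formula, g_mc)                      # should be close
```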

Multinormal distribution. X is said to be a non-degenerate multinormal with parameters µ = (µ1, µ2) and

Σ = ( σ1²   σ12
      σ21   σ2² )

if the joint pdf fX is given by

fX(x1, x2) = (1 / (2π √(det Σ))) e^{−(1/2)(x−µ) Σ⁻¹ (x−µ)^⊥},

where Σ is symmetric positive definite and (x − µ)^⊥ is the column vector corresponding to the row vector (x1 − µ1, x2 − µ2). Also, Σ being positive definite implies that |σ12| ≤ σ1 σ2. Rewrite Σ as

Σ = ( σ1²       ρ σ1 σ2
      ρ σ1 σ2   σ2²     )

Then (exercise)

fX(x1, x2) = (1 / (2π σ1 σ2 √(1 − ρ²))) exp{ −(1 / (2(1 − ρ²))) [ (x1−µ1)²/σ1² − 2ρ(x1−µ1)(x2−µ2)/(σ1 σ2) + (x2−µ2)²/σ2² ] }.

Now we will see the marginal pdfs of the multinormal distribution.

The essence of the calculation is 'completing the square'; I will do it separately. Consider

(x1−µ1)²/σ1² − 2ρ(x1−µ1)(x2−µ2)/(σ1 σ2) + (x2−µ2)²/σ2²
  = [ (x2−µ2)²/σ2² − 2ρ(x1−µ1)(x2−µ2)/(σ1 σ2) + (ρ(x1−µ1)/σ1)² ] + (1 − ρ²) ((x1−µ1)/σ1)²
  = ( (x2−µ2)/σ2 − ρ(x1−µ1)/σ1 )² + (1 − ρ²) ((x1−µ1)/σ1)²
  = ( (x2−µ2)/σ2 − a )² + (1 − ρ²) ((x1−µ1)/σ1)²,

where

a = ρ(x1 − µ1)/σ1.
Now

∫_{−∞}^{∞} fX(x1, x2) dx2
  = ( e^{−(1/2)((x1−µ1)/σ1)²} / (2π σ1 σ2 √(1−ρ²)) ) ∫_{−∞}^{∞} e^{−(1/(2(1−ρ²))) ((x2−µ2)/σ2 − a)²} dx2

  [put x = ((x2−µ2)/σ2 − a) / √(1−ρ²)]

  = ( e^{−(1/2)((x1−µ1)/σ1)²} / (√(2π) σ1) ) · (1/√(2π)) ∫_{−∞}^{∞} e^{−x²/2} dx

  = (1 / (√(2π) σ1)) e^{−(1/2)((x1−µ1)/σ1)²}.

Hence X1 ∼ N(µ1, σ1²). Similarly X2 ∼ N(µ2, σ2²).
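
The marginal computation can be checked numerically; the sketch below (NumPy/SciPy assumed, parameters chosen arbitrarily) integrates the bivariate pdf over x2 and compares the result with the N(µ1, σ1²) density.

```python
import numpy as np
from scipy.stats import norm

mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.8, 0.6   # illustrative parameters

def f(x1, x2):
    # bivariate normal pdf, written exactly as in the notes
    q = ((x1-mu1)**2/s1**2 - 2*rho*(x1-mu1)*(x2-mu2)/(s1*s2) + (x2-mu2)**2/s2**2)
    return np.exp(-q/(2*(1-rho**2))) / (2*np.pi*s1*s2*np.sqrt(1-rho**2))

x1 = 0.3
x2 = np.linspace(mu2 - 10*s2, mu2 + 10*s2, 20001)
marginal = np.sum(f(x1, x2)) * (x2[1] - x2[0])       # integrate out x2 numerically
print(marginal, norm.pdf(x1, loc=mu1, scale=s1))     # should agree: X1 ~ N(mu1, s1^2)
```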
Theorem 0.3 Let X = (X1, X2) be a non-degenerate multinormal random variable with parameters µ and Σ. Then for any α, β ∈ R, αX1 + βX2 is a normal random variable.

Proof: Since X − µ is multinormal with parameters 0 and Σ, it is enough to prove the theorem when µ = 0 (exercise).

Let A be a 2 × 2 symmetric matrix such that AA^⊥ = Σ. [Here the choice of A is Σ^{1/2}, where Σ^{1/2} = P Λ^{1/2} P⁻¹, Λ is the diagonal matrix of eigenvalues of Σ (so that Σ = P Λ P⁻¹), and Λ^{1/2} is the diagonal matrix whose diagonal entries are the square roots of the eigenvalues of Σ.] Define Y = XA⁻¹; then using Theorem 0.2, the pdf g of Y exists and is given by

g(y1, y2) = |det A| f(yA)
          = √(det Σ) · (1 / (2π √(det Σ))) e^{−(1/2) y A Σ⁻¹ A y^⊥}
          = (1 / (2π)) e^{−(1/2) ‖y‖²},

since A Σ⁻¹ A = I.

Hence Y is multinormal with parameters 0 and I. Therefore g(y1, y2) = gY1(y1) gY2(y2). This implies that Y1 and Y2 are independent normal random variables.

Now we can see that αX1 + βX2 = aY1 + bY2 for some a, b ∈ R and hence is a normal random variable (exercise).

This completes the proof.

Remark 0.1 The proof of Theorem 0.3 tells us something more. Let X ∼ N(µ, Σ), i.e. X is a multinormal random variable with parameters µ and Σ. Set X̄ = X − µ; then X̄ ∼ N(0, Σ) and Ȳ = X̄ Σ^{−1/2} ∼ N(0, I). Hence

Y := X Σ^{−1/2} = µ Σ^{−1/2} + X̄ Σ^{−1/2} ∼ N(µ Σ^{−1/2}, I).

Note that the above is a generalization to the multidimensional case of the following result for normal random variables:

if X ∼ N(µ, σ²), then aX ∼ N(aµ, a²σ²).
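
The following sketch illustrates Remark 0.1 by simulation (NumPy assumed; µ and Σ are arbitrary illustrative choices): samples of X are multiplied on the right by Σ^{−1/2}, built from the spectral decomposition as above, and the resulting sample mean and covariance are compared with µΣ^{−1/2} and I.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, 2.0])                       # illustrative parameters
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])

# Sigma^{1/2} via the spectral decomposition Sigma = P Lambda P^{-1}
lam, P = np.linalg.eigh(Sigma)
Sigma_half = P @ np.diag(np.sqrt(lam)) @ P.T
Sigma_half_inv = np.linalg.inv(Sigma_half)

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are samples of X
Y = X @ Sigma_half_inv                                  # Y = X Sigma^{-1/2}

print(Y.mean(axis=0))            # approx. mu Sigma^{-1/2}
print(mu @ Sigma_half_inv)
print(np.cov(Y.T))               # approx. identity matrix
```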

Theorem 0.3 leads to a more general definition of multinormal, one which also includes degenerate multinormal random variables.

Definition 6.7 A random vector X = (X1 , X2 ) is said to be multinormal if


αX1 + βX2 is normal for all α, β ∈ R.

Example 0.3 Let X1 ∼ N(0, 1) and X = (X1, −X1). Then any linear combination of the components of X is normally distributed. However, X does not have a joint density function. To see this, let L = {(x, y) | x + y = 0}, the graph of x + y = 0. Then

P{X ∈ L} = P(Ω) = 1.

Now suppose X has a joint pdf f; then for Ln = L ∩ ([−n, n] × [−n, n]),

P{X ∈ Ln} = ∫∫_{Ln} f(x, y) dx dy = 0, for all n ≥ 1,

since Ln has zero area. Now

P{X ∈ L} = lim_{n→∞} P{X ∈ Ln} = 0,

a contradiction to P{X ∈ L} = 1. Hence X does not have a density, i.e. X is an example of a degenerate multinormal distribution.
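
A small simulation (NumPy assumed) makes the degeneracy concrete: every sample of X = (X1, −X1) lies exactly on the line L, while linear combinations of the components remain normal.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.standard_normal(100_000)
X = np.column_stack([x1, -x1])        # X = (X1, -X1)

# Every sample lies exactly on the line L = {x + y = 0} ...
print(np.max(np.abs(X.sum(axis=1))))  # 0.0, consistent with P{X in L} = 1

# ... yet any linear combination a*X1 + b*X2 = (a - b)*X1 is still normal.
a, b = 2.0, 0.5
print(np.mean(a*x1 + b*(-x1)), np.var(a*x1 + b*(-x1)), (a - b)**2)
```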

Chapter 7: Expectation and other moments


In this chapter, we introduce the expected value, or mean, of a random variable. First we define expectation for discrete random variables and then for general random variables. Finally we introduce other moments and comment on the moment problem.

First we give a useful representation of discrete random variables.


Theorem 0.4 Let X be a discrete random variable defined on a probability space (Ω, F, P). Then there exists a partition {An | n = 1, 2, . . . , N} ⊆ F of Ω and {an | n = 1, 2, . . . , N} ⊆ R such that the an's are distinct and

X = Σ_{n=1}^{N} an IAn  a.s.,

where N may be ∞.

Proof. Let F be the distribution function of X. Let {a1, a2, . . . , aN} be the set of all discontinuities of F. Here N may be ∞. Since X is discrete, we have

Σ_{n=1}^{N} (F(an) − F(an−)) = 1.

Set

An = {X = an}.

Then {An} is pairwise disjoint and ∪_{n=1}^{N} An = Ω a.s., i.e. {An} is a partition of Ω and

X = Σ_{n=1}^{N} an IAn  a.s.

Remark 0.2 If X is a discrete random variable on a probability space (Ω, F, P), then the 'effective' range of X is at most countable. Here the 'effective' range means those values taken by X which have positive probability. This leads to the name 'discrete' random variable.

In fact, in the above proof {An} is a partition of Ω0, a subset of Ω obtained by excluding the non-probable values of X, and hence P(Ω0) = 1.

Remark 0.3 If X is a discrete random variable, then one can assume without loss of generality that

X = Σ_{n=1}^{∞} an IAn .

For if N < ∞, then set An = ∅ for n ≥ N + 1 and choose an, n ≥ N + 1, so that they are distinct.

Theorem 0.5 Let {Bn} be a countable partition of Ω from F and {bn} be a sequence of real numbers which are not necessarily distinct. Then if

Σ_{n=1}^{∞} an IAn = Σ_{n=1}^{∞} bn IBn ,

then

Σ_{n=1}^{∞} an P(An) = Σ_{n=1}^{∞} bn P(Bn).

Proof. (Reading exercise) Note that the an's are distinct and hence it follows that each Bm is contained in An for some n. For each n ≥ 1, set

In = {m ≥ 1 | An Bm ≠ ∅}.

Then clearly

An = ∪_{m∈In} Bm , n ≥ 1.

Also, if m0 ∈ In then an = bm0. Therefore

Σ_{m=1}^{∞} bm P(Bm) = Σ_{n=1}^{∞} Σ_{m∈In} bm P(Bm)
                     = Σ_{n=1}^{∞} Σ_{m∈In} an P(Bm)
                     = Σ_{n=1}^{∞} an P(An).

This completes the proof.


Definition 7.1. Let X be a discrete random variable represented by {(An, an) | n ≥ 1}. Then the expectation of X, denoted by EX, is defined as

EX = Σ_{n=1}^{∞} an P(An),

provided the series on the right hand side converges absolutely.

Remark 0.4 In view of Remark 6.0.5., if X has range a1, a2, . . . , aN, then

EX = Σ_{n=1}^{N} an P{X = an}.

Example 0.4 Let X be a Bernoulli(p) random variable. Then

X = IA , where A = {X = 1} .

Hence
EX = P (A) = p .

Example 0.5 Let X be a Binomial(n, p) random variable. Then

X = Σ_{k=0}^{n} k IAk , where Ak = {X = k}.

Hence

EX = Σ_{k=0}^{n} k P(Ak) = Σ_{k=0}^{n} k (n choose k) p^k (1 − p)^{n−k}
   = n Σ_{k=1}^{n} (n−1 choose k−1) p^k (1 − p)^{n−k} = np.

Here we used the identity

k (n choose k) = n (n−1 choose k−1).

Example 0.6 Let X be a Poisson(λ) random variable. Then

X = Σ_{n=0}^{∞} n IAn , where An = {X = n}.

Hence

EX = Σ_{n=0}^{∞} n e^{−λ} λ^n / n! = λ.

Example 0.7 Let X be a Geometric(p) random variable. Then

X = Σ_{n=1}^{∞} n IAn , where An = {X = n}.

Hence

EX = Σ_{n=1}^{∞} n p (1 − p)^{n−1} = p / (1 − (1 − p))² = 1/p.
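
The expectations in Examples 0.5-0.7 can be verified directly from the defining series EX = Σ an P{X = an}; the sketch below (SciPy assumed; n, p, λ are arbitrary illustrative values, and the infinite series are truncated where the tail mass is negligible) compares the numerical sums with np, λ and 1/p.

```python
import numpy as np
from scipy.stats import binom, poisson, geom

n, p, lam = 10, 0.3, 2.5                    # illustrative parameters

k = np.arange(0, n + 1)
print(np.sum(k * binom.pmf(k, n, p)), n * p)        # Binomial: EX = np

k = np.arange(0, 200)                       # truncated series; tail is negligible
print(np.sum(k * poisson.pmf(k, lam)), lam)         # Poisson: EX = lambda

k = np.arange(1, 2000)
print(np.sum(k * geom.pmf(k, p)), 1 / p)            # Geometric: EX = 1/p
```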

Theorem 0.6 (Properties of expectation) (Reading exercise) Let X and Y be discrete random variables with finite means. Then

(i) If X ≥ 0, then EX ≥ 0.

(ii) For a ∈ R,

E(aX + Y) = a EX + EY.

Proof. (i) Let {(An, an) | n ≥ 1} be a representation of X. Then X ≥ 0 implies an ≥ 0 for all n ≥ 1. Hence

EX = Σ_{n=1}^{∞} an P(An) ≥ 0.

(ii) Let Y have a representation {(Bn, bn) | n ≥ 1}. Now, by setting

Cnm = An Bm, n, m ≥ 1,  anm = an, m ≥ 1,  bnm = bm, n ≥ 1,

one can use the same partition for X and Y. Therefore

aX + Y = Σ_{n,m=1}^{∞} (a anm + bnm) ICnm .

Hence (by Theorem 0.5, since the values a anm + bnm need not be distinct)

E(aX + Y) = Σ_{n,m=1}^{∞} (a anm + bnm) P(Cnm)
          = a Σ_{n=1}^{∞} Σ_{m=1}^{∞} anm P(Cnm) + Σ_{m=1}^{∞} Σ_{n=1}^{∞} bnm P(Cnm)
          = a Σ_{n=1}^{∞} Σ_{m=1}^{∞} an P(An Bm) + Σ_{m=1}^{∞} Σ_{n=1}^{∞} bm P(An Bm)
          = a Σ_{n=1}^{∞} an P(An) + Σ_{m=1}^{∞} bm P(Bm)
          = a EX + EY.
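
The refinement argument in (ii) can be reproduced on a small finite probability space; in the sketch below (a hypothetical example, with the An and Bm taken independent purely to fix the numbers P(An Bm)) the sum over the common partition Cnm equals a EX + EY exactly.

```python
import numpy as np
from itertools import product

# Exact check of E(aX + Y) = a EX + EY on a small finite probability space.
p_i = np.array([0.2, 0.5, 0.3]); a_vals = np.array([-1.0, 0.0, 4.0])   # X: (A_n, a_n)
p_j = np.array([0.6, 0.4]);      b_vals = np.array([2.0, 5.0])         # Y: (B_m, b_m)
a = 3.0

# Common refinement C_nm = A_n ∩ B_m; here P(C_nm) = P(A_n)P(B_m) by construction.
lhs = sum((a*a_vals[i] + b_vals[j]) * p_i[i]*p_j[j]
          for i, j in product(range(3), range(2)))
rhs = a*np.dot(a_vals, p_i) + np.dot(b_vals, p_j)
print(lhs, rhs)       # identical up to floating point
```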

Definition 7.2. (Simple random variable) A random variable is said to be


simple if it is discrete and the distribution function has only finitely many
discontinuities.
Theorem 0.7 Let X be a random variable on (Ω, F, P) such that X ≥ 0. Then there exists a sequence of simple random variables {Xn} satisfying
(i) For each n ≥ 1, Xn ≥ 0, Xn ≤ Xn+1 ≤ X.
(ii) For each ω ∈ Ω, Xn (ω) → X(ω) as n → ∞.

Proof. For n ≥ 1, define the simple random variable Xn as follows:

Xn(ω) = { k/2ⁿ   if k/2ⁿ ≤ X(ω) < (k+1)/2ⁿ, k = 0, 1, . . . , n·2ⁿ − 1
        { 0      if X(ω) ≥ n.

Then the Xn's satisfy

Xn ≤ Xn+1, n ≥ 1,

lim_{n→∞} Xn(ω) = X(ω), ω ∈ Ω.
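
The following sketch implements the approximating sequence exactly as defined in the proof (NumPy assumed; the sample values of X(ω) are arbitrary) and prints Xn at a few points, showing that the values are non-decreasing in n and converge to X(ω).

```python
import numpy as np

def X_n(x, n):
    # Simple approximation from the proof: floor(2^n x)/2^n on [0, n), and 0 when x >= n.
    x = np.asarray(x, dtype=float)
    return np.where(x < n, np.floor(2**n * x) / 2**n, 0.0)

x = np.array([0.3, 1.7, np.pi, 12.4])       # a few sample values X(omega)
for n in (1, 2, 4, 8, 16):
    print(n, X_n(x, n))                     # non-decreasing in n, converging to x
```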

Lemma 0.1 Let X be a non-negative random variable and {Xn} be a sequence of simple random variables satisfying (i) and (ii) of Theorem 0.7. Then lim_{n→∞} EXn exists and is given by

lim_{n→∞} EXn = sup{EY | Y is simple and Y ≤ X}.

Proof. (Reading exercise) Since Xn ≤ Xn+1, we have EXn ≤ EXn+1, n ≥ 1 (see exercise). Hence lim_{n→∞} EXn exists. Also, since the Xn's are simple, clearly

EXn ≤ sup{EY | Y is simple and Y ≤ X}, n ≥ 1.
Therefore

lim_{n→∞} EXn ≤ sup{EY | Y is simple and Y ≤ X}.

Hence to complete the proof, it suffices to show that for Y simple and Y ≤ X,

EY ≤ lim_{n→∞} EXn .

Let

Y = Σ_{k=1}^{m} ak IAk ,

where {Ak | k = 1, . . . , m} is a partition of Ω. Fix ε > 0 and set, for 1 ≤ k ≤ m and n ≥ 1,

Akn = {ω ∈ Ak | Xn(ω) ≥ ak − ε}.

Since Xn ≤ Xn+1, n ≥ 1, we have for each k,

Akn ⊆ Ak,n+1 , n ≥ 1.

Also,

ω ∈ Ak ⟹ X(ω) ≥ Y(ω) = ak
       ⟹ lim_{n→∞} Xn(ω) = X(ω) ≥ ak
       ⟹ Xn0(ω) ≥ ak − ε for some n0
       ⟹ ω ∈ Akn0 ⊆ ∪_{n=1}^{∞} Akn .

Hence

∪_{n=1}^{∞} Akn = Ak , 1 ≤ k ≤ m.

From the definition of Akn we have

Xn ≥ Σ_{k=1}^{m} (ak − ε) IAkn .

Hence

EXn ≥ Σ_{k=1}^{m} (ak − ε) P(Akn).        (0.1)

Using the continuity property of probability, we have

lim_{n→∞} P(Akn) = P(Ak), 1 ≤ k ≤ m.

Now, letting n → ∞ in (0.1), we get

lim_{n→∞} EXn ≥ Σ_{k=1}^{m} (ak − ε) P(Ak) = EY − ε.

Since ε > 0 is arbitrary, we get

lim_{n→∞} EXn ≥ EY.

This completes the proof.


Definition 7.3. The expectation of a non-negative random variable X is defined as

EX = lim_{n→∞} EXn ,        (0.2)

where {Xn} is a sequence of simple random variables as in Theorem 0.7.

Remark 0.5 One can also define the expectation of a non-negative random variable X as

EX = sup{EY | Y is simple and Y ≤ X}.

By Lemma 0.1 the two definitions agree, but we use Definition 7.3 since it is more handy.
