11. Week 11
Remark 11.1. While considering a Bernoulli or a Binomial RV, we looked at random experiments
with exactly two outcomes. We now consider random experiments with two or more than two
outcomes. Suppose a random experiment terminates in one of k possible outcomes A1 , A2 , · · · , Ak
for k ≥ 2. More generally, we may also consider random experiments which terminate in one of
k mutually exclusive and exhaustive events A1 , A2 , · · · , Ak with k ≥ 2. Write pj = P(Aj ), j =
1, 2, · · · , k, which does not change from trial to trial. Then, p1 + p2 + · · · + pk = 1. Suppose n trials
are conducted independently and let Xj , j = 1, 2, · · · , k denote the number of times event Aj has
occured, respectively. Then the RVs X1 , X2 , · · · , Xk satisfy the relation X1 + X2 + · · · + Xk = n
and we have
P(X_1 = x_1, \cdots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k}
for x1, · · · , xk ∈ {0, 1, · · · , n} with x1 + · · · + xk = n. The probability is zero otherwise. Removing the redundancy xk = n − x1 − · · · − xk−1, we have the joint p.m.f. of (X1, X2, · · · , Xk−1) given by

P(X_1 = x_1, \cdots, X_{k-1} = x_{k-1}) = \frac{n!}{x_1! \cdots x_{k-1}! \, (n - x_1 - \cdots - x_{k-1})!} \, p_1^{x_1} \cdots p_{k-1}^{x_{k-1}} \, p_k^{\,n - x_1 - \cdots - x_{k-1}}

for x1, · · · , xk−1 ∈ {0, 1, · · · , n} with x1 + · · · + xk−1 ≤ n, and zero otherwise.
(a) The joint MGF of (X1, X2, · · · , Xk−1) is

M_X(t_1, t_2, \cdots, t_{k-1}) = \sum_{\substack{x_1, \cdots, x_{k-1} \in \{0,1,\cdots,n\} \\ x_1 + \cdots + x_{k-1} \le n}} \frac{n!}{x_1! \cdots x_{k-1}! \, (n - x_1 - \cdots - x_{k-1})!} \left(p_1 e^{t_1}\right)^{x_1} \cdots \left(p_{k-1} e^{t_{k-1}}\right)^{x_{k-1}} p_k^{\,n - x_1 - \cdots - x_{k-1}}
= \left(p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k\right)^n, \quad \forall (t_1, \cdots, t_{k-1}) \in \mathbb{R}^{k-1}.
(b) If t = (t1, 0, · · · , 0) ∈ R^{k−1}, then MX(t) = E exp(t1 X1) = MX1(t1). But, using the above expression for the joint MGF, we have MX1(t1) = MX(t) = (p1 e^{t1} + p2 + · · · + pk−1 + pk)^n = (p1 e^{t1} + 1 − p1)^n. Therefore, X1 ∼ Binomial(n, p1). Similarly, Xi ∼ Binomial(n, pi), ∀i = 2, · · · , k − 1. In particular, EXi = npi, Var(Xi) = npi(1 − pi).
(c) For distinct indices i, j ∈ {1, 2, · · · , k − 1},

M_{X_i, X_j}(t_i, t_j) = M_X(0, \cdots, 0, t_i, 0, \cdots, 0, t_j, 0, \cdots, 0) = \left(p_i e^{t_i} + p_j e^{t_j} + 1 - p_i - p_j\right)^n, \quad \forall (t_i, t_j) \in \mathbb{R}^2.
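As a quick numerical sanity check of items (b) and (c) (not part of the derivation), the following Python sketch simulates multinomial counts; it assumes NumPy is available, and n, the probabilities and the sample size below are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, p = 20, np.array([0.5, 0.3, 0.2])            # k = 3 outcomes; arbitrary choices
counts = rng.multinomial(n, p, size=100_000)    # each row is (X1, X2, X3) with X1 + X2 + X3 = n

x1 = counts[:, 0]
print("empirical mean of X1:", x1.mean(), " vs n*p1 =", n * p[0])
print("empirical var  of X1:", x1.var(),  " vs n*p1*(1-p1) =", n * p[0] * (1 - p[0]))

The empirical mean and variance of X1 should be close to np1 and np1(1 − p1), in line with X1 ∼ Binomial(n, p1).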
Note 11.3. We now look at distributions that arise in practice from random samples. Such
distributions are usually referred to as sampling distributions. More specifically, if X1 , X2 , · · · , Xn
is a random sample from N (µ, σ 2 ) distribution, we shall look at various statistics involving the
sample mean X̄ = (1/n) Σ_{i=1}^{n} Xi.
Note 11.4 (Distribution of square of a standard Normal RV). Let X ∼ N (0, 1). Recall that the
p.d.f. of X is

f_X(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right), \quad \forall x \in \mathbb{R}.
We consider the distribution of Y = X 2 by first computing the MGF. We have,
M_Y(t) = E e^{tX^2} = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{tx^2} e^{-\frac{x^2}{2}} \, dx = \sqrt{\frac{2}{\pi}} \int_{0}^{\infty} e^{(t - \frac{1}{2})x^2} \, dx = (1 - 2t)^{-\frac{1}{2}}, \quad \forall t < \frac{1}{2}.
Comparing with the MGF of the Gamma(α, β) distribution, we conclude that X 2 ∼ Gamma( 12 , 2).
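As an illustration (not part of the argument), the sketch below compares simulated values of X² with the Gamma(1/2, 2) distribution, whose mean is αβ = 1 and variance is αβ² = 2; it assumes NumPy and SciPy are available, and the sample sizes are arbitrary.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)
y = x**2                                        # squares of standard Normal draws

print("mean of X^2:", y.mean(), " (Gamma(1/2, 2) mean = 1)")
print("var  of X^2:", y.var(),  " (Gamma(1/2, 2) variance = 2)")
# Kolmogorov-Smirnov comparison against Gamma with shape 1/2 and scale 2
print(stats.kstest(y[:5000], stats.gamma(a=0.5, scale=2).cdf))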
Note 11.5. If X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1) and hence ((X − µ)/σ)² ∼ Gamma(1/2, 2).
Remark 11.6 (Distribution of the sample mean for a random sample from the Normal distribution).
If X1 , X2 , · · · , Xn is a random sample from N (µ, σ 2 ) distribution, then for Y = X1 + X2 + · · · + Xn ,
using independence of Xi ’s we have
M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t) = \exp\left( n\mu t + \frac{1}{2} n \sigma^2 t^2 \right)
and hence X1 + X2 + · · · + Xn ∼ N(nµ, nσ²). Now, X̄ = (1/n) Σ_{i=1}^{n} Xi ∼ N(µ, σ²/n) and consequently, √n (X̄ − µ)/σ ∼ N(0, 1) and n ((X̄ − µ)/σ)² ∼ Gamma(1/2, 2).
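A short simulation sketch of the statement about X̄ (assuming NumPy; µ, σ, n and the number of repetitions are arbitrary choices):

import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 3.0, 2.0, 25                      # arbitrary parameter choices
samples = rng.normal(mu, sigma, size=(200_000, n))
xbar = samples.mean(axis=1)                      # one sample mean per simulated sample

print("mean of Xbar:", xbar.mean(), " vs mu =", mu)
print("var  of Xbar:", xbar.var(),  " vs sigma^2/n =", sigma**2 / n)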
Note 11.9. Using Note 11.7, we conclude that Σ_{i=1}^{n} ((Xi − µi)/σi)² ∼ χ²_n, where X1, X2, · · · , Xn are independent RVs with Xi ∼ N(µi, σi²), i = 1, 2, · · · , n.
Note 11.10. As argued in Note 11.7, using Remark 9.32(h) we conclude that X + Y ∼ χ2m+n ,
where X, Y are independent RVs with X ∼ χ2m and Y ∼ χ2n .
Note 11.11. If X ∼ χ²_n, then using properties of the Gamma(n/2, 2) distribution, we have EX = n, Var(X) = 2n and MX(t) = (1 − 2t)^{−n/2}, ∀t < 1/2.
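The moments and the MGF value can be checked numerically as follows (a sketch assuming NumPy/SciPy; the degrees of freedom n and the point t < 1/2 are arbitrary choices):

import numpy as np
from scipy import stats

n, t = 7, 0.2                                    # arbitrary choices with t < 1/2
chi2 = stats.chi2(df=n)
print("mean:", chi2.mean(), " expected:", n)
print("var :", chi2.var(),  " expected:", 2 * n)

# Monte Carlo estimate of E exp(tX) versus the closed form (1 - 2t)^(-n/2)
rng = np.random.default_rng(3)
x = chi2.rvs(size=500_000, random_state=rng)
print("E exp(tX) approx:", np.exp(t * x).mean(), " closed form:", (1 - 2 * t) ** (-n / 2))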
Remark 11.12 (Sample Mean and Sample Variance are independent corresponding to a random
sample drawn from a Normal distribution). Let X1 , X2 , · · · , Xn be a random sample from N (µ, σ 2 )
distribution. Consider the sample mean X̄ = (1/n)(X1 + X2 + · · · + Xn) and the sample variance Sn² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)². We show that X̄ and Sn² are independent.
Look at the joint MGF of (X1 − X̄, X2 − X̄, · · · , Xn − X̄, X̄) given by

M(t_1, t_2, \cdots, t_n, t_{n+1}) = E \exp\Big( \sum_{j=1}^{n} t_j (X_j - \bar{X}) + t_{n+1} \bar{X} \Big) = E \exp\Big( \sum_{j=1}^{n} s_j X_j \Big), \quad \forall (t_1, t_2, \cdots, t_n, t_{n+1}) \in \mathbb{R}^{n+1},

where s_j = t_j + \frac{t_{n+1} - \sum_{i=1}^{n} t_i}{n}. Using the independence of the Xj’s, we have

M(t_1, t_2, \cdots, t_n, t_{n+1}) = \prod_{j=1}^{n} E \exp(s_j X_j) = \prod_{j=1}^{n} \exp\Big( \mu s_j + \frac{1}{2} \sigma^2 s_j^2 \Big) = \exp\Big( \mu \sum_{j=1}^{n} s_j + \frac{1}{2} \sigma^2 \sum_{j=1}^{n} s_j^2 \Big)
= \exp\Big( \mu t_{n+1} + \frac{1}{2} \sigma^2 \frac{t_{n+1}^2}{n} \Big) \exp\Big( \frac{1}{2} \sigma^2 \sum_{j=1}^{n} \Big( t_j - \frac{1}{n} \sum_{i=1}^{n} t_i \Big)^2 \Big),

since \sum_{j=1}^{n} s_j = t_{n+1} and \sum_{j=1}^{n} s_j^2 = \frac{t_{n+1}^2}{n} + \sum_{j=1}^{n} \big( t_j - \frac{1}{n} \sum_{i=1}^{n} t_i \big)^2. Hence,

M_{X_1 - \bar{X}, X_2 - \bar{X}, \cdots, X_n - \bar{X}}(t_1, t_2, \cdots, t_n) = M_{X_1 - \bar{X}, \cdots, X_n - \bar{X}, \bar{X}}(t_1, t_2, \cdots, t_n, 0) = \exp\Big( \frac{1}{2} \sigma^2 \sum_{j=1}^{n} \Big( t_j - \frac{1}{n} \sum_{i=1}^{n} t_i \Big)^2 \Big)

and

M_{\bar{X}}(t_{n+1}) = M_{X_1 - \bar{X}, \cdots, X_n - \bar{X}, \bar{X}}(0, 0, \cdots, 0, t_{n+1}) = \exp\Big( \mu t_{n+1} + \frac{1}{2} \sigma^2 \frac{t_{n+1}^2}{n} \Big).
Since the joint MGF factors as M(t1, t2, · · · , tn, tn+1) = M_{X1−X̄,··· ,Xn−X̄}(t1, · · · , tn) M_{X̄}(tn+1), we conclude that (X1 − X̄, X2 − X̄, · · · , Xn − X̄) and X̄ are independent. Consequently, the sample variance Sn² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)², being a function of (X1 − X̄, · · · , Xn − X̄), and X̄ are independent.
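A simulation cannot prove independence, but as a sanity check of Remark 11.12 the following sketch (assuming NumPy; all parameter values arbitrary) estimates the correlation between X̄ and Sn², which should be close to 0 for a Normal sample; for a skewed sample (e.g. Exponential) it is typically far from 0.

import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 1.0, 2.0, 10, 200_000        # arbitrary choices
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)                        # sample variance with divisor n - 1

print("corr(Xbar, S^2), Normal sample:", np.corrcoef(xbar, s2)[0, 1])

y = rng.exponential(scale=2.0, size=(reps, n))    # contrast: a non-Normal sample
print("corr(Ybar, S^2), Exponential sample:", np.corrcoef(y.mean(axis=1), y.var(axis=1, ddof=1))[0, 1])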
Remark 11.13 (Distribution of the sample variance for a random sample from the Normal distribu-
tion). Let X1 , X2 , · · · , Xn be a random sample from N (µ, σ 2 ) distribution. By looking at the joint
MGF of X1 − X̄, · · · , Xn − X̄ and X̄, Remark 11.12 shows that Σ_{i=1}^{n} (Xi − X̄)² and X̄ are independent.
Now,
\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X} + \bar{X} - \mu)^2 = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X})^2 + \frac{n(\bar{X} - \mu)^2}{\sigma^2},
where (1/σ²) Σ_{i=1}^{n} (Xi − µ)² ∼ χ²_n and n(X̄ − µ)²/σ² ∼ χ²_1. Since (1/σ²) Σ_{i=1}^{n} (Xi − X̄)² and n(X̄ − µ)²/σ² are independent by Remark 11.12, comparing the MGFs of both sides we conclude that (1/σ²) Σ_{i=1}^{n} (Xi − X̄)² ∼ χ²_{n−1}. Taking the sample variance as Sn² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)², we conclude that (n − 1)Sn²/σ² ∼ χ²_{n−1}.
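Remark 11.13 can be illustrated numerically (a sketch assuming NumPy/SciPy; parameter values arbitrary): the scaled sample variance (n − 1)Sn²/σ² should behave like a χ²_{n−1} sample.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, n, reps = 0.0, 3.0, 8, 100_000         # arbitrary choices
x = rng.normal(mu, sigma, size=(reps, n))
w = (n - 1) * x.var(axis=1, ddof=1) / sigma**2    # (n-1) S_n^2 / sigma^2

print("mean:", w.mean(), " expected:", n - 1)
print("var :", w.var(),  " expected:", 2 * (n - 1))
print(stats.kstest(w[:5000], stats.chi2(df=n - 1).cdf))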
Note 11.14. Given a random sample X1 , X2 , · · · , Xn from N (µ, σ 2 ) distribution, the sample mean
X̄ = (1/n) Σ_{i=1}^{n} Xi ∼ N(µ, σ²/n) and the sample variance Sn² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)² has the property that (n − 1)Sn²/σ² ∼ χ²_{n−1}. The distribution of (X̄ − µ)/Sn is of interest.
Definition 11.15 (Student’s t-distribution with n degrees of freedom). Let n be a positive integer.
Let X ∼ N (0, 1) and Y ∼ χ2n be independent RVs. Then,
T = \frac{X}{\sqrt{Y/n}}
is said to follow the t-distribution with n degrees of freedom. In this case, we write T ∼ tn. The p.d.f. is given by

f_T(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n}\, \Gamma\left(\frac{n}{2}\right) \Gamma\left(\frac{1}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}, \quad \forall t \in \mathbb{R}.
Here, ET^k exists if k < n. Since the distribution is symmetric about 0, we have ET^k = 0 for all odd k with k < n. If k is even and k < n, then

E T^k = n^{\frac{k}{2}} \, \frac{\Gamma\left(\frac{k+1}{2}\right) \Gamma\left(\frac{n-k}{2}\right)}{\Gamma\left(\frac{1}{2}\right) \Gamma\left(\frac{n}{2}\right)}.
In particular, if n > 2, then ET = 0 and Var(T) = n/(n − 2). The t-distribution appears in tests for statistical significance.
Note 11.16. If X1, X2, · · · , Xn is a random sample from N(µ, σ²) distribution, then √n (X̄ − µ)/Sn ∼ t_{n−1}.
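The following sketch (assuming NumPy/SciPy; parameter values arbitrary) compares the statistic √n (X̄ − µ)/Sn with the t_{n−1} distribution; for n = 6 its variance should be close to (n − 1)/(n − 3) = 5/3.

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu, sigma, n, reps = 5.0, 2.0, 6, 100_000         # arbitrary choices
x = rng.normal(mu, sigma, size=(reps, n))
t_stat = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)

print("sample var of T:", t_stat.var(), " expected (n-1)/(n-3):", (n - 1) / (n - 3))
print(stats.kstest(t_stat[:5000], stats.t(df=n - 1).cdf))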
Note 11.17. Let X1 , X2 , · · · , Xm and Y1 , Y2 , · · · , Yn be independent random samples from N (µ1 , σ12 )
and N(µ2, σ2²) distributions, respectively. Consider the sample variances S1² := (1/(m − 1)) Σ_{i=1}^{m} (Xi − X̄)² and S2² := (1/(n − 1)) Σ_{j=1}^{n} (Yj − Ȳ)². The distribution of S1²/S2² is of interest. Note that (m − 1)S1²/σ1² ∼ χ²_{m−1} and (n − 1)S2²/σ2² ∼ χ²_{n−1}.
Definition 11.18 (F -distribution with degrees of freedom m and n). Let m and n be positive
integers. Let X ∼ χ2m and Y ∼ χ2n be independent RVs. Then,
F = \frac{X/m}{Y/n}
is said to follow the F -distribution with degrees of freedom m and n. In this case, we write
F ∼ Fm,n . The p.d.f. is given by
f_F(x) = \begin{cases} \dfrac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \left(\dfrac{m}{n}\right)^{\frac{m}{2}} x^{\frac{m}{2}-1} \left(1 + \dfrac{m}{n}x\right)^{-\frac{m+n}{2}}, & \text{if } x > 0, \\ 0, & \text{otherwise} \end{cases}
= \begin{cases} \dfrac{1}{B\left(\frac{m}{2}, \frac{n}{2}\right)} \left(\dfrac{m}{n}\right)^{\frac{m}{2}} x^{\frac{m}{2}-1} \left(1 + \dfrac{m}{n}x\right)^{-\frac{m+n}{2}}, & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases}
Note 11.19. If F ∼ Fm,n, then 1/F ∼ Fn,m.
Note 11.20. Let X1 , X2 , · · · , Xm and Y1 , Y2 , · · · , Yn be independent random samples from N (µ1 , σ12 )
and N(µ2, σ2²) distributions, respectively. Consider the sample variances S1² = (1/(m − 1)) Σ_{i=1}^{m} (Xi − X̄)² and S2² = (1/(n − 1)) Σ_{j=1}^{n} (Yj − Ȳ)². Then (σ2² S1²)/(σ1² S2²) ∼ F_{m−1,n−1}.
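A simulation sketch for Note 11.20 (assuming NumPy/SciPy; all parameter values arbitrary): the scaled variance ratio is compared with the F_{m−1,n−1} distribution, whose mean is (n − 1)/(n − 3).

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
m, n, reps = 9, 13, 100_000                       # arbitrary sample sizes
mu1, sigma1, mu2, sigma2 = 0.0, 1.5, 2.0, 0.5     # arbitrary parameter choices

x = rng.normal(mu1, sigma1, size=(reps, m))
y = rng.normal(mu2, sigma2, size=(reps, n))
ratio = (sigma2**2 * x.var(axis=1, ddof=1)) / (sigma1**2 * y.var(axis=1, ddof=1))

print("mean of ratio:", ratio.mean(), " F_{m-1,n-1} mean:", (n - 1) / (n - 3))
print(stats.kstest(ratio[:5000], stats.f(dfn=m - 1, dfd=n - 1).cdf))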
We now look at the Normal distribution and construct some generalization in higher dimensions.
This will give us an example of a continuous random vector which, under additional hypotheses, becomes absolutely continuous.
We give multiple ways of defining this generalization, all of which, at the end, turn out to be
equivalent.
Remark 11.22. We now discuss basic properties of the RV Y constructed in Definition 11.21.
In the above simplification, we have used the fact that Cov(Xi , Xj ) = 0, due to the inde-
pendence of Xi and Xj .
(iii) Consider a d × d real matrix K = (kij )d×d with kij = Cov(Yi , Yj ), ∀i, j. Then, by the
computations above, we conclude
k_{ii} = Cov(Y_i, Y_i) = Var(Y_i) = \sum_{l=1}^{d} a_{il}^2, \quad i = 1, 2, \cdots, d
and for i ̸= j,
k_{ij} = Cov(Y_i, Y_j) = \sum_{l=1}^{d} a_{il} a_{jl}.
The matrix K shall be referred to as the variance-covariance matrix of the random vector
Y or simply, the covariance matrix. It is also called the dispersion matrix.
(iv) Continue with the matrix K defined above. For i, j = 1, 2, · · · , d, observe that
k_{ij} = \sum_{l=1}^{d} a_{il} a_{jl} = (AA^t)_{ij}, \quad \text{i.e., } K = AA^t.
(v) We now compute the joint MGF of Y. For u = (u1, u2, · · · , ud)^t ∈ R^d, since Y = AX + b, we have u^t Y = u^t b + Σ_{j=1}^{d} (Σ_{i=1}^{d} a_{ij} u_i) X_j. Here, u^t b = Σ_{j=1}^{d} u_j b_j. Since the Xi’s are independent, with marginal MGFs M_{X_i}(x) = exp(x²/2), ∀x ∈ R, i = 1, 2, · · · , d, we have

E \exp(u_1 Y_1 + u_2 Y_2 + \cdots + u_d Y_d) = \exp(u^t b) \exp\left( \frac{1}{2} \sum_{j=1}^{d} \Big( \sum_{i=1}^{d} a_{ij} u_i \Big)^2 \right)
= \exp(u^t b) \exp\left( \frac{1}{2} \sum_{j=1}^{d} \sum_{i,l=1}^{d} a_{ij} u_i \, a_{lj} u_l \right)
= \exp(u^t b) \exp\left( \frac{1}{2} \sum_{i,l=1}^{d} k_{il} u_i u_l \right)
= \exp\left( u^t b + \frac{1}{2} u^t K u \right).
Hence, we have computed the joint MGF of Y . Note that this MGF exists for all points
u = (u1 , · · · , ud )t ∈ Rd .
(vi) We now compute the Characteristic function of Y . We have
\Phi_Y(u) = \exp\left( i u^t b - \frac{1}{2} u^t K u \right), \quad \forall u \in \mathbb{R}^d.
Notation 11.23. We usually refer to Y , as defined in Definition 11.21, as a multivariate Normal
random vector with mean vector b and variance-covariance matrix K and write Y ∼ Nd (b, K).
Example 11.24. Consider the two-dimensional case. Then with A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} and b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, we have
Y1 = b1 + a11 X1 + a12 X2 , Y2 = b2 + a21 X1 + a22 X2 .
Moreover, EY1 = b1 , V ar(Y1 ) = a211 + a212 , EY2 = b2 , V ar(Y2 ) = a221 + a222 and Cov(Y1 , Y2 ) =
a11 a21 + a12 a22 . Here,
K = \begin{pmatrix} a_{11}^2 + a_{12}^2 & a_{11}a_{21} + a_{12}a_{22} \\ a_{11}a_{21} + a_{12}a_{22} & a_{21}^2 + a_{22}^2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix} = AA^t.
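As a numerical check of Example 11.24 (a sketch assuming NumPy; the entries of A and b are arbitrary choices), the empirical covariance matrix of simulated Y = AX + b should be close to AAᵗ.

import numpy as np

rng = np.random.default_rng(8)
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])                        # arbitrary 2 x 2 matrix
b = np.array([1.0, -2.0])                         # arbitrary vector

X = rng.standard_normal(size=(500_000, 2))        # i.i.d. N(0, 1) components
Y = X @ A.T + b                                   # each row is a sample of Y = A X + b

print("empirical mean:", Y.mean(axis=0), " vs b =", b)
print("empirical covariance:\n", np.cov(Y, rowvar=False))
print("A A^t:\n", A @ A.T)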
Note 11.25. By Remark 11.22(iv), K = AAt . Then, det(K) = det(AAt ) = (det(A))2 . Therefore,
det(K) ̸= 0 if and only if det(A) ̸= 0. In other words, K is invertible if and only if A is invertible.
Example 11.26. Continue with Example 11.24. Assume that σ1² := Var(Y1) > 0, σ2² := Var(Y2) > 0 and set ρ := Cov(Y1, Y2)/(σ1 σ2). If |ρ| ≠ 1, then K is invertible. Furthermore, in this case, we can show that Y is absolutely continuous by computing the joint p.d.f..
So far, we gave a constructive definition of a multivariate Normal random vector. We have also
computed the corresponding MGF. Recall that the distribution of any random vector is determined
by its MGF, provided the MGF exists. This leads us to an alternative definition of a multivariate
Normal random vector.
Definition 11.27 (Multivariate Normal distribution through its MGF). Let b = (b1 , · · · , bd )t ∈ Rd
and let K be a d × d real symmetric non-negative definite matrix. A d-dimensional random vector
Y is said to be multivariate Normal if its MGF is given by
M_Y(u) = \exp\left( u^t b + \frac{1}{2} u^t K u \right), \quad \forall u \in \mathbb{R}^d.
Note 11.28. If K is as in the definition above, then there exists a d × d real matrix A such that
K = AAt . Then taking X = (X1 , X2 , · · · , Xd )t with Xi ’s as independent N (0, 1) RVs and setting
Y = AX + b gives us the MGF above. This is the connection between the constructive definition
11.21 and the definition 11.27 through its MGF.
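Note 11.28 in code: for a positive-definite K one such factor A with K = AAᵗ is the Cholesky factor, which gives a way to simulate Y ∼ Nd(b, K) (a sketch assuming NumPy; the K and b below are arbitrary choices).

import numpy as np

rng = np.random.default_rng(9)
K = np.array([[2.0, 0.6, 0.0],
              [0.6, 1.0, 0.3],
              [0.0, 0.3, 0.5]])                   # arbitrary positive-definite covariance matrix
b = np.array([1.0, 0.0, -1.0])                    # arbitrary mean vector

A = np.linalg.cholesky(K)                         # lower-triangular A with A A^t = K
X = rng.standard_normal(size=(200_000, 3))        # i.i.d. N(0, 1) components
Y = X @ A.T + b                                   # rows are samples of Y = A X + b ~ N_3(b, K)

print("empirical mean:", Y.mean(axis=0))
print("empirical covariance:\n", np.cov(Y, rowvar=False))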
Note 11.29. In terms of notation 11.23, the random vector Y in definition 11.27 above is a
multivariate Normal random vector with mean vector b and variance-covariance matrix K, i.e.
Y ∼ Nd (b, K).
Example 11.30. Let Y ∼ Nd (b, K). Then for any c ∈ Rn and any n × d real matrix B, consider
the n-dimensional random vector Z = c + BY . A straight-forward verification yields Z ∼ Nn (c +
Bb, BKB t ).
Remark 11.31. Considering the case n = 1 in the above example 11.30, any linear combination
Σ_{j=1}^{d} aj Yj, with a1, a2, · · · , ad ∈ R and Y = (Y1, · · · , Yd)^t ∼ Nd(b, K), is univariate Normal.
With the above observation at hand, we are ready to prove the following important result on
multivariate Normal distributions. We shall see that this result yields yet another definition of
multivariate Normal distribution.
Theorem 11.32. A random vector Y = (Y1 , · · · , Yd )t is multivariate Normal if and only if any
linear combination Σ_{j=1}^{d} aj Yj, with a1, a2, · · · , ad ∈ R, is univariate Normal.
Proof. The ‘only if’ part follows from Remark 11.31 above.
Now, consider the ‘if’ part of the statement. Assume that Σ_{j=1}^{d} aj Yj is univariate Normal for all choices of a1, a2, · · · , ad ∈ R. In particular, the Yj’s are univariate Normal RVs.
Suppose that Yj ∼ N(µj, σj²). Then, with a = (a1, a2, · · · , ad)^t and µ = (µ1, µ2, · · · , µd)^t,

E\left( \sum_{j=1}^{d} a_j Y_j \right) = \sum_{j=1}^{d} a_j \mu_j = a^t \mu

and

Var\left( \sum_{j=1}^{d} a_j Y_j \right) = a^t K a,

where K = (Cov(Yi, Yj))_{d×d} is the variance-covariance matrix of Y. By the assumption, Σ_{j=1}^{d} aj Yj ∼ N(a^t µ, a^t K a), so that

M_Y(a) = E \exp\left( \sum_{j=1}^{d} a_j Y_j \right) = \exp\left( a^t \mu + \frac{1}{2} a^t K a \right), \quad \forall a \in \mathbb{R}^d.

This is the MGF in Definition 11.27 with b = µ, and hence Y is multivariate Normal. □
Motivated by Theorem 11.32, we make the following alternative definition for multivariate Nor-
mal distribution.
Definition 11.33. We say that a d-dimensional random vector Y follows multivariate Normal
distribution if at Y is univariate Normal for all a ∈ Rd .
Remark 11.34. Recall the notion of Characteristic function of a random vector. Like the MGFs,
the Characteristic functions also determine the distributions uniquely. One may also define mul-
tivariate Normal distributions by specifying the Characteristic function.
We have discussed multiple equivalent ways of defining the multivariate Normal distribution.
Now, we shall focus on the question of existence of a joint p.d.f. of a multivariate Normal random
vector.
Theorem 11.35. Let Y ∼ Nd (b, K). Suppose K is invertible. Then the following statements hold.
(i) Σ_{j=1}^{d} aj (Yj − bj) = 0 with probability 1 if and only if aj = 0, ∀j = 1, 2, · · · , d, i.e., the components Yj − bj, j = 1, 2, · · · , d are linearly independent.
(ii) Y has a joint p.d.f. given by
f_Y(y) = (2\pi)^{-\frac{d}{2}} (\det(K))^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (y - b)^t K^{-1} (y - b) \right), \quad \forall y \in \mathbb{R}^d.
Proof. For the first statement, if aj = 0, ∀j, then we have Σ_{j=1}^{d} aj (Yj − bj) = 0 with probability 1.
To prove the converse, note that K is a d × d real symmetric positive-definite matrix and hence diagonalizable: there exists a d × d orthogonal matrix A such that D = A^t K A is a diagonal matrix with the eigen-values of K as the diagonal entries. Note that the eigen-values are strictly positive, since K is positive-definite.
Consider Z = A^t (Y − b). By Example 11.30, Z ∼ Nd(0, D). If, with probability 1, a^t (Y − b) = 0, then

Var(a^t (Y - b)) = E\left[ \Big( \sum_{j=1}^{d} a_j (Y_j - b_j) \Big)^2 \right] = 0.

On the other hand,

E\left[ \Big( \sum_{j=1}^{d} a_j (Y_j - b_j) \Big)^2 \right] = E\left[ a^t (Y - b)(Y - b)^t a \right] = a^t K a > 0,

unless aj = 0, ∀j, since K is positive-definite. Hence, we must have aj = 0, ∀j. This proves the first statement.
We now obtain the joint p.d.f. as stated in the second statement. Continue with the notations of the proof of the first part. If λ1, · · · , λd are the eigen-values of K, then, using the constructive definition of the multivariate Normal distribution, the components Z1, · · · , Zd of Z ∼ Nd(0, D) are independent with Zi ∼ N(0, λi), and hence Z has the joint p.d.f.

f_Z(z) = (2\pi)^{-\frac{d}{2}} (\lambda_1 \cdots \lambda_d)^{-\frac{1}{2}} \exp\left( -\frac{1}{2} z^t D^{-1} z \right), \quad \forall z \in \mathbb{R}^d.
But, Y = AZ + b with det(A) = ±1, since A is orthogonal. Using change of variables, we have the
p.d.f. of Y as stated above. This completes the proof. □
Theorem 11.36. Let Y = (Y1, · · · , Yd)^t ∼ Nd(b, K). If Y1, Y2, · · · , Yd are uncorrelated, i.e., K is a diagonal matrix, then Y1, Y2, · · · , Yd are independent.
Proof. If Yj’s are uncorrelated, then K = (kij)d×d is a diagonal matrix. If a diagonal entry kii = 0,
then V ar(Yi ) = kii = 0. This implies Yi is degenerate.
If the non-degenerate Yj ’s are independent, then along with the degenerate Yi ’s we obtain the
required independence. Thus without loss of generality, we assume that the diagonal entries are
non-zero.
Since the diagonal entries are the eigen-values of K, we get det(K) ̸= 0, i.e., K is invertible. By
Theorem 11.35, the joint p.d.f. of Y is
f_Y(y) = (2\pi)^{-\frac{d}{2}} (\det(K))^{-\frac{1}{2}} \exp\left( -\frac{1}{2} \sum_{j=1}^{d} \frac{(y_j - b_j)^2}{k_{jj}} \right), \quad \forall y = (y_1, \cdots, y_d)^t \in \mathbb{R}^d.
The joint p.d.f. factors into terms depending separately on the different yj’s and hence the Yj’s are independent with Yj ∼ N(bj, kjj). This completes the proof. □
Remark 11.37. If K is a singular matrix, then using arguments similar to Theorem 11.35, we
can show that (Yj − bj ), j = 1, · · · , d are linearly dependent. In fact, any linearly independent
subcollection of (Yj − bj ), j = 1, · · · , d is a multivariate Normal random vector of the appropriate
dimension with a joint p.d.f..
We now discuss a special case: a 2-dimensional Normal random vector with a p.d.f., revisiting most of the results which we already studied in the general d-dimensional framework.
Definition 11.38. A bivariate random vector X = (X1 , X2 ) is said to follow bivariate Normal
distribution N2(µ1, µ2, σ1², σ2², ρ) for µ1 ∈ R, µ2 ∈ R, σ1 > 0, σ2 > 0, ρ ∈ (−1, 1), if the joint p.d.f. is given by

f_{X_1, X_2}(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2(1 - \rho^2)} \left\{ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right)\left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right\} \right], \quad \forall (x_1, x_2) \in \mathbb{R}^2.
Remark 11.39. Let X = (X1 , X2 ) ∼ N2 (µ1 , µ2 , σ12 , σ22 , ρ) for some µ1 ∈ R, µ2 ∈ R, σ1 > 0, σ2 >
0, ρ ∈ (−1, 1).
(a) The marginal p.d.f. of X2 is

f_{X_2}(x_2) = \int_{-\infty}^{\infty} f_{X_1, X_2}(x_1, x_2) \, dx_1, \quad \forall x_2 \in \mathbb{R}
= \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left[ -\frac{1}{2} \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right] \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_1 \sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2\sigma_1^2 (1 - \rho^2)} \left\{ x_1 - \left( \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (x_2 - \mu_2) \right) \right\}^2 \right] dx_1
= \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left[ -\frac{1}{2} \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right]
and hence X2 ∼ N (µ2 , σ22 ). Similarly, X1 ∼ N (µ1 , σ12 ). Thus the parameters µ1 =
EX1 , σ12 = V ar(X1 ), µ2 = EX2 , σ22 = V ar(X2 ) have their own interpretation.
(b) The covariance Cov(X1, X2) is given by

Cov(X_1, X_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_1 - \mu_1)(x_2 - \mu_2) f_{X_1, X_2}(x_1, x_2) \, dx_1 \, dx_2.

By changing variables to y1 = (x1 − µ1)/σ1, y2 = (x2 − µ2)/σ2, and simplifying the above expression, we have Cov(X1, X2) = ρ σ1 σ2. Consequently, the correlation ρ(X1, X2) = ρ. We now have the interpretation of the parameter ρ.
(c) The conditional distribution of X1 given X2 = x2 ∈ R is described by the conditional p.d.f.
f_{X_1 | X_2}(x_1 \mid x_2) = \frac{f_{X_1, X_2}(x_1, x_2)}{f_{X_2}(x_2)} = \frac{1}{\sqrt{2\pi}\,\sigma_1 \sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2\sigma_1^2 (1 - \rho^2)} \left\{ x_1 - \left( \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (x_2 - \mu_2) \right) \right\}^2 \right], \quad \forall x_1 \in \mathbb{R}

and hence X1 | X2 = x2 ∼ N(µ1 + ρ (σ1/σ2)(x2 − µ2), σ1²(1 − ρ²)). Similarly, for x1 ∈ R, X2 | X1 = x1 ∼ N(µ2 + ρ (σ2/σ1)(x1 − µ1), σ2²(1 − ρ²)).
(d) Using the conditional distributions obtained above, we conclude
E[X_1 \mid X_2 = x_2] = \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (x_2 - \mu_2), \qquad Var[X_1 \mid X_2 = x_2] = \sigma_1^2 (1 - \rho^2),
E[X_2 \mid X_1 = x_1] = \mu_2 + \rho \frac{\sigma_2}{\sigma_1} (x_1 - \mu_1), \qquad Var[X_2 \mid X_1 = x_1] = \sigma_2^2 (1 - \rho^2).

(A simulation sketch based on this conditional description appears at the end of this remark.)
(e) If X1 and X2 are independent, then they are uncorrelated and in particular ρ = ρ(X1 , X2 ) =
0. Conversely, if ρ = 0, then

f_{X_1, X_2}(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2} \exp\left[ -\frac{1}{2} \left\{ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right\} \right] = f_{X_1}(x_1) f_{X_2}(x_2), \quad \forall (x_1, x_2) \in \mathbb{R}^2,

and hence X1 and X2 are independent.
The joint MGF of X = (X1, X2) is given by

M_X(t_1, t_2) = E \exp(t_1 X_1 + t_2 X_2)
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp(t_1 x_1 + t_2 x_2) f_{X_1, X_2}(x_1, x_2) \, dx_1 \, dx_2
= \int_{-\infty}^{\infty} \exp(t_2 x_2) f_{X_2}(x_2) \left( \int_{-\infty}^{\infty} \exp(t_1 x_1) f_{X_1 | X_2}(x_1 \mid x_2) \, dx_1 \right) dx_2
= \int_{-\infty}^{\infty} \exp(t_2 x_2) f_{X_2}(x_2) \exp\left( \mu_1 t_1 + \rho \frac{\sigma_1}{\sigma_2} (x_2 - \mu_2) t_1 + \frac{1}{2} \sigma_1^2 (1 - \rho^2) t_1^2 \right) dx_2
= \exp\left( \mu_1 t_1 - \rho \frac{\sigma_1}{\sigma_2} \mu_2 t_1 + \frac{1}{2} \sigma_1^2 (1 - \rho^2) t_1^2 \right) \int_{-\infty}^{\infty} \exp\left( \Big( t_2 + \rho \frac{\sigma_1}{\sigma_2} t_1 \Big) x_2 \right) f_{X_2}(x_2) \, dx_2
= \exp\left( \mu_1 t_1 - \rho \frac{\sigma_1}{\sigma_2} \mu_2 t_1 + \frac{1}{2} \sigma_1^2 (1 - \rho^2) t_1^2 \right) \exp\left( \mu_2 \Big( t_2 + \rho \frac{\sigma_1}{\sigma_2} t_1 \Big) + \frac{1}{2} \sigma_2^2 \Big( t_2 + \rho \frac{\sigma_1}{\sigma_2} t_1 \Big)^2 \right)
= \exp\left( \mu_1 t_1 + \mu_2 t_2 + \frac{1}{2} \sigma_1^2 t_1^2 + \frac{1}{2} \sigma_2^2 t_2^2 + \rho \sigma_1 \sigma_2 t_1 t_2 \right), \quad \forall (t_1, t_2) \in \mathbb{R}^2.
(h) Let c1, c2 ∈ R such that at least one of c1, c2 is not zero and take Y = c1 X1 + c2 X2. Now,

M_Y(t) = E \exp(c_1 t X_1 + c_2 t X_2) = \exp\left( (\mu_1 c_1 + \mu_2 c_2) t + \Big( \frac{1}{2} \sigma_1^2 c_1^2 + \frac{1}{2} \sigma_2^2 c_2^2 + \rho \sigma_1 \sigma_2 c_1 c_2 \Big) t^2 \right), \quad \forall t \in \mathbb{R},

and hence Y = c1 X1 + c2 X2 ∼ N(c1 µ1 + c2 µ2, c1² σ1² + c2² σ2² + 2ρ c1 c2 σ1 σ2).
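The conditional description in items (c)–(d) gives a convenient way to simulate from N2(µ1, µ2, σ1², σ2², ρ): draw X2 first and then X1 given X2. The sketch below (assuming NumPy; parameter values arbitrary) does this and checks Cov(X1, X2) ≈ ρσ1σ2, as in item (b).

import numpy as np

rng = np.random.default_rng(10)
mu1, mu2, s1, s2, rho = 1.0, -1.0, 2.0, 0.5, 0.7  # arbitrary parameter choices
reps = 300_000

x2 = rng.normal(mu2, s2, size=reps)                       # X2 ~ N(mu2, s2^2)
cond_mean = mu1 + rho * (s1 / s2) * (x2 - mu2)            # E[X1 | X2]
x1 = rng.normal(cond_mean, s1 * np.sqrt(1 - rho**2))      # X1 | X2 ~ N(E[X1 | X2], s1^2 (1 - rho^2))

print("empirical Cov(X1, X2):", np.cov(x1, x2)[0, 1], " vs rho*s1*s2 =", rho * s1 * s2)
print("empirical Var(X1):", x1.var(), " vs s1^2 =", s1**2)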
Remark 11.40. The above statement for the linear combination of X1, X2 actually characterizes the bivariate Normal distribution. If X = (X1, X2) is such that EX1 = µ1, EX2 = µ2, Var(X1) = σ1² > 0, Var(X2) = σ2² > 0, ρ(X1, X2) = ρ ∈ (−1, 1) and c1 X1 + c2 X2 ∼ N(c1 µ1 + c2 µ2, c1² σ1² + c2² σ2² + 2ρ c1 c2 σ1 σ2) for all (c1, c2) ≠ (0, 0), then

M_X(t_1, t_2) = E \exp(t_1 X_1 + t_2 X_2) = \exp\left( \mu_1 t_1 + \mu_2 t_2 + \frac{1}{2} \sigma_1^2 t_1^2 + \frac{1}{2} \sigma_2^2 t_2^2 + \rho \sigma_1 \sigma_2 t_1 t_2 \right), \quad \forall (t_1, t_2) \in \mathbb{R}^2,

which is the MGF computed above for the N2(µ1, µ2, σ1², σ2², ρ) distribution, and hence X ∼ N2(µ1, µ2, σ1², σ2², ρ).
Remark 11.41 (Interpretation of parameters appearing in the p.d.f. of a Continuous RV). In the
examples of continuous RVs discussed in this course, we have seen that certain parameters appear
in the description of p.d.fs. If we specify the values of these parameters, then we obtain a specific
example of distribution from a family of possible distributions. In certain cases, we have already
been able to interpret them in terms of properties of the distribution of the RV. For example, if
X ∼ N (µ, σ 2 ), then µ = EX and σ 2 = V ar(X). We list some interpretation of these parameters.
(b) (Scale parameter) If we have a family of p.d.f.s fθ with θ > 0 and if fθ(x) = (1/θ) f1(x/θ), ∀x ∈ R, then we say that θ is a scale parameter for the family of distributions given by the p.d.f.s fθ. In this case, the family is called a scale family and the p.d.f. f1 is free of θ, i.e. does not depend on θ. We can restate this fact in terms of the corresponding RVs Xθ as follows: the p.d.f./distribution of (1/θ) Xθ does not depend on θ.
(c) (Location-scale parameter) If we have a family of p.d.f.s fµ,σ with σ > 0 and if fµ,σ(x) = (1/σ) f0,1((x − µ)/σ), ∀x ∈ R, then we say that (µ, σ) is a location-scale parameter for the family of distributions given by the p.d.f.s fµ,σ. In this case, the family is called a location-scale family and the p.d.f. f0,1 is free of (µ, σ), i.e. does not depend on (µ, σ). We can restate this fact in terms of the corresponding RVs Xµ,σ as follows: the p.d.f./distribution of (Xµ,σ − µ)/σ does not depend on (µ, σ).
(d) (Shape parameter) Some families of p.d.f.s also have a shape parameter, where changing the value of the parameter affects the shape of the graph of the p.d.f..
Example 11.42. (a) The family of RVs Xµ,θ ∼ Cauchy(µ, θ), µ ∈ R, θ > 0 with the p.d.f.

f_{\mu, \theta}(x) = \frac{1}{\pi} \, \frac{\theta}{\theta^2 + (x - \mu)^2}, \quad \forall x \in \mathbb{R},

is a location-scale family; here (µ, θ) is a location-scale parameter.
α is a shape parameter.
Definition 11.43 (Weibull distribution). We say that an RV X follows the Weibull distribution
with shape parameter α > 0 and scale parameter β > 0, if its p.d.f. is given by
f_X(x) = \begin{cases} \dfrac{\alpha x^{\alpha - 1}}{\beta^{\alpha}} \exp\left[ -\left( \dfrac{x}{\beta} \right)^{\alpha} \right], & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases}
Note 11.44. Let X ∼ Exponential(β^α) for some α, β > 0. Then Y = X^{1/α} follows the Weibull distribution with shape parameter α and scale parameter β.
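Note 11.44 suggests a simple way to simulate Weibull variates (a sketch assuming NumPy/SciPy, and assuming that Exponential(θ) denotes the mean/scale-θ parameterization, consistent with the Weibull density above; α, β and the sample size are arbitrary choices).

import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
alpha, beta, reps = 1.7, 2.5, 200_000
x = rng.exponential(scale=beta**alpha, size=reps)  # Exponential with mean beta^alpha
y = x ** (1 / alpha)                               # should follow Weibull(alpha, beta)

wb = stats.weibull_min(c=alpha, scale=beta)
print(stats.kstest(y[:5000], wb.cdf))
print("empirical mean:", y.mean(), " theoretical Weibull mean:", wb.mean())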
Definition 11.45 (Pareto distribution). We say that an RV X follows the Pareto distribution
with scale parameter θ > 0 and shape parameter α > 0, if its p.d.f. is given by
f_X(x) = \begin{cases} \dfrac{\alpha \theta^{\alpha}}{(x + \theta)^{\alpha + 1}}, & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases}