12. Week 12

Remark 12.1 (Descriptive Measures of Probability Distributions). The distribution of an RV provides numerical values through which we can quantify/understand the manner in which the RV takes values in various subsets of the real line. However, at times, it is difficult to grasp the features of the RV from the distribution. As an alternative, we typically use four types of numerical quantities associated with the distribution to summarize the information. We refer to them as descriptive measures of the probability distribution.

(a) Measures of Central Tendency or Location: here, we try to find a ‘central’ value around
which the possible values of the RV are distributed.
(b) Measures of Dispersion: once we have an idea of the ‘central’ value of the RV (equivalently,
of the probability distribution), we check the scattering/dispersion of all the possible
values of the RV around this ‘central’ value.
(c) Measures of Skewness: here, we try to quantify the asymmetry of the probability distribution.
(d) Measures of Kurtosis: here, we try to measure the thickness of the tails of the RV (equivalently,
of the probability distribution) in comparison with the Normal distribution.

We describe these measures along with examples.

Example 12.2 (Measures of Central Tendency). (a) The mean of an RV is a good example
of a measure of central tendency. It also has the useful property of linearity. However, it
may be affected by a few extreme values, referred to as outliers. The mean may not exist
for all distributions.
(b) The median, i.e. a quantile of order 1/2, of an RV is always defined and is usually not affected by
a few outliers. However, the median lacks the linearity property, i.e. a median of X + Y
has no general relationship with the medians of X and Y. Further, a median focuses on
the probabilities with which the values of the RV occur rather than the exact numerical
values. A median need not be unique.

(c) The mode m0 of a probability distribution is a value occurring with the ‘highest probability’,
and is defined by fX(m0) = sup{fX(x) : x ∈ SX}, where fX denotes the p.m.f./p.d.f. of X,
as appropriate, and SX denotes the support of X. A mode need not be unique. Distributions
with one, two or multiple modes are called unimodal, bimodal or multimodal distributions,
respectively. Usually, the mode is easy to calculate. However, it may so happen that a distribution
has several modes situated far apart, in which case the mode may not be suitable as
a measure of central tendency. A small numerical illustration of these three measures follows.
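As a quick numerical check of the above measures, the following sketch (assuming the SciPy library is available; the rate λ = 2 is an arbitrary choice) reads off the mean, median and mode of an Exponential(λ) distribution.

from scipy import stats

lam = 2.0
X = stats.expon(scale=1.0 / lam)   # Exponential distribution with rate lam

print(X.mean())     # mean = 1/lam = 0.5
print(X.median())   # median = ln(2)/lam, approximately 0.3466
# The mode is the maximiser of the p.d.f.; for the exponential distribution it is 0.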

Example 12.3 (Measures of Dispersion). (a) If the support SX of an RV X is contained in
the interval [a, b] and this is the smallest such interval, then we define b − a to be the range
of X. This measure of dispersion does not take into account the probabilities with which
the values of X are distributed.
(b) Mean Deviation about a point c ∈ R: If E|X − c| exists, we define it to be the mean
deviation of X about the point c. Usually, we take c to be the mean (if it exists) or the
median, and obtain the mean deviation about the mean or the median, respectively. However, it
may be difficult to compute and may not even exist. The mean deviations are also affected
by a few outliers.
(c) Standard Deviation: As defined earlier, the standard deviation of an RV X is √Var(X),
if it exists. Compared to the mean deviation, the standard deviation is usually easier to
compute. The standard deviation is affected by a few outliers.
(d) Quartile Deviation: Recall that z_{0.25} and z_{0.75} denote the lower and upper quartiles. We
define z_{0.75} − z_{0.25} to be the inter-quartile range and refer to (1/2)[z_{0.75} − z_{0.25}] as the semi-inter-quartile
range or the quartile deviation. This measures the spread in the middle half of the
distribution and is therefore not influenced by extreme values. However, it does not take
into account the numerical values of the RV.
(e) Coefficient of Variation: The coefficient of variation of X is defined as √Var(X)/EX, provided
EX ≠ 0. This aims to measure the variation per unit of mean. It, by definition, does not
depend on the unit of measurement. However, it may be sensitive to small changes in the
mean, if the mean is close to zero. A small numerical illustration of these dispersion measures follows.
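The following sketch (again assuming SciPy, with the same arbitrary rate λ = 2) computes some of these dispersion measures for the Exponential(λ) distribution.

from scipy import stats

lam = 2.0
X = stats.expon(scale=1.0 / lam)

sd = X.std()                          # standard deviation = 1/lam
q1, q3 = X.ppf(0.25), X.ppf(0.75)     # lower and upper quartiles z_{0.25}, z_{0.75}
quartile_dev = 0.5 * (q3 - q1)        # semi-inter-quartile range (quartile deviation)
cv = X.std() / X.mean()               # coefficient of variation (= 1 for every exponential)
print(sd, quartile_dev, cv)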

Note 12.4 (A Measure of Skewness). If the distribution of an RV X is symmetric about the mean
µ, then fX(µ + x) = fX(µ − x), ∀x ∈ R, where fX denotes the p.m.f./p.d.f. of X. If this is not the
case, then two cases may occur.

(a) (Positively skewed) more of the probability mass may be spread out towards the right
hand side of the graph of fX; in this case, the tail on the right hand side is longer.
(b) (Negatively skewed) more of the probability mass may be spread out towards the left hand
side of the graph of fX; in this case, the tail on the left hand side is longer.

To measure this asymmetry, we usually look at EZ^3, where Z = (X − EX)/√Var(X), provided the moments
exist. Note that Z does not depend on the units of measurement, and

EZ^3 = E(X − EX)^3 / (Var(X))^{3/2} = µ_3(X) / (µ_2(X))^{3/2}.

We refer to a distribution as positively or negatively skewed according as the above quantity
is positive or negative. If X ∼ Exponential(λ), then EZ^3 = 2 and hence the distribution of
X is positively skewed.
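As a quick check of this value (a sketch assuming SciPy; the rate λ is irrelevant because Z is standardized and unit-free):

from scipy import stats

mean, var, skew = stats.expon.stats(moments='mvs')
print(skew)   # 2.0, so the exponential distribution is positively skewed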

Note 12.5. There are many other measures of skewness used in practice. However, we do not
discuss them in this course.

Note 12.6 (A Measure of Kurtosis). The probability distribution of X is said to have higher
(respectively, lower) kurtosis than the Normal distribution if its p.m.f./p.d.f., in comparison with
the p.d.f. of a Normal distribution, has a sharper (respectively, more rounded) peak and longer/fatter
(respectively, shorter/thinner) tails. To measure the kurtosis of X, we look at EZ^4, where
Z = (X − EX)/√Var(X), provided the moments exist. Note that Z does not depend on the units of measurement,
and

EZ^4 = E(X − EX)^4 / (Var(X))^2 = µ_4(X) / (µ_2(X))^2.

If X ∼ N(µ, σ^2), then Z ∼ N(0, 1) and hence EZ^4 = 3 (see Remark 8.1). For a general RV X, the
quantity µ_4(X)/(µ_2(X))^2 − 3 is referred to as the excess kurtosis of X. If the excess kurtosis is zero, positive
or negative, then we refer to the corresponding probability distribution as mesokurtic, leptokurtic
or platykurtic, respectively. If X ∼ Exponential(λ), then EZ^4 = 9 and hence the distribution of
X is leptokurtic.
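A Monte Carlo sketch of this value (assuming NumPy; any rate gives the same answer because Z is standardized):

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10**6)   # samples from Exponential(1)
z = (x - x.mean()) / x.std()                 # empirically standardized values
print(np.mean(z**4))   # close to 9, i.e. excess kurtosis close to 6 > 0 (leptokurtic)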

Definition 12.7 (Quantile function of an RV). Let X be an RV with the DF FX . The function
QX : (0, 1) → R defined by

QX (p) := inf{x ∈ R : FX (x) ≥ p}, ∀p ∈ (0, 1)

is called the quantile function of X.

Proposition 12.8 (Probability integral transform). Let X be a continuous RV with the DF FX,
p.d.f. fX and quantile function QX.

(a) We have FX(X) ∼ Uniform(0, 1).
(b) For any U ∼ Uniform(0, 1), we have QX(U) =ᵈ X, i.e. QX(U) and X have the same distribution.

Proof. We prove only the first statement. The proof of the second statement is similar. Take
Y = FX(X). Since FX takes values in [0, 1], we have

FY(y) = P(Y ≤ y) = P(FX(X) ≤ y) = 0, if y < 0,   and   FY(y) = 1, if y ≥ 1.

Now fix y ∈ [0, 1). Since FX is continuous and non-decreasing, the event {FX(X) = y} means that X
lies in an interval with endpoints x1 ≤ x2 on which FX is constant and equal to y, so that

P(FX(X) = y) = P(x1 ≤ X ≤ x2) = FX(x2) − FX(x1) = 0,

where we have used the fact that X is a continuous RV. In particular, FY(0) = P(FX(X) ≤ 0) = P(FX(X) = 0) = 0.
Now, for y ∈ (0, 1),

P(FX(X) ≤ y) = P(FX(X) < y)
= 1 − P(FX(X) ≥ y)
= 1 − P(X ≥ QX(y))
= 1 − P(X > QX(y))
= P(X ≤ QX(y))
= FX(QX(y))
= y.

Here we have used that P(FX(X) = y) = 0, that {FX(X) ≥ y} = {X ≥ QX(y)} (by the definition of QX and the continuity of FX), that P(X = QX(y)) = 0, and that FX(QX(y)) = y.
Hence, Y = FX(X) ∼ Uniform(0, 1). This completes the proof. □
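An empirical sketch of part (a) (assuming NumPy and SciPy, with X taken to be Exponential(1) for concreteness): the transformed samples FX(X) should look Uniform(0, 1).

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(size=10**5)      # samples from X ~ Exponential(1)
u = stats.expon.cdf(x)               # F_X applied to the samples
res = stats.kstest(u, 'uniform')     # Kolmogorov-Smirnov test against Uniform(0, 1)
print(res.statistic)                 # tiny KS distance: consistent with Uniform(0, 1)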

Note 12.9. Let X be an RV with the quantile function QX. If we can generate random samples
U1, U2, · · · , Un from the Uniform(0, 1) distribution, then QX(U1), QX(U2), · · · , QX(Un) are random samples
from the distribution of X. This observation may be used in practice to generate random samples
for known distributions from the Uniform(0, 1) distribution.
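A minimal sketch of this idea (assuming NumPy): for X ∼ Exponential(λ), the quantile function is QX(p) = −ln(1 − p)/λ, so applying it to Uniform(0, 1) samples produces Exponential(λ) samples.

import numpy as np

rng = np.random.default_rng(2)
lam = 2.0
u = rng.uniform(size=10**5)       # U_1, ..., U_n ~ Uniform(0, 1)
x = -np.log(1.0 - u) / lam        # Q_X(U_1), ..., Q_X(U_n)
print(x.mean(), 1.0 / lam)        # the sample mean is close to the true mean 1/lam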

Note 12.10 (Moments do not determine the distribution of an RV). Let X ∼ N(0, 1) and consider
Y = e^X. The distribution of Y is usually called the lognormal distribution, since ln Y = X ∼
N(0, 1). Using standard techniques, we can compute the p.d.f. of Y:

fY(y) = (1/√(2π)) y^{−1} exp[−(ln y)^2/2], if y > 0,   and   fY(y) = 0, otherwise.

It can be shown that the continuous RVs Xα, α ∈ [−1, 1], with the p.d.f.s

fXα(y) = fY(y) [1 + α sin(2π ln y)], ∀y ∈ R,

have the same moments as Y. However, the distributions are different. This shows that the
moments of an RV do not determine the distribution. (See the article ‘On a property of the
lognormal distribution’ by C.C. Heyde, published in Journal of the Royal Statistical Society:
Series B, volume 29 (1963).)
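A numerical sketch of this fact (assuming SciPy): after the change of variables t = ln y, the k-th moment of Xα becomes an integral over the real line, and it agrees with the lognormal moment EY^k = exp(k^2/2), whatever the value of α.

import numpy as np
from scipy.integrate import quad

def moment(k, alpha=0.5):
    # integrand of E[X_alpha^k] after substituting t = ln y
    integrand = lambda t: (np.exp(k * t - 0.5 * t**2) / np.sqrt(2 * np.pi)
                           * (1 + alpha * np.sin(2 * np.pi * t)))
    return quad(integrand, -np.inf, np.inf)[0]

for k in range(1, 5):
    print(k, moment(k), np.exp(k**2 / 2))   # the two columns agree up to integration error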

Note 12.11 (Operations on DFs). Recall that a DF F : R → [0, 1] is characterized by the
properties that it is right continuous, non-decreasing and limx→∞ F(x) = 1, limx→−∞ F(x) = 0.
Given two DFs F, G : R → [0, 1] and α ∈ [0, 1], we make the following observations.

(a) (Convex combination of DFs) The function H : R → [0, 1] defined by H(x) := αF(x) +
(1 − α)G(x), ∀x ∈ R, has the relevant properties and hence is a DF.
(b) (Product of DFs) The function H : R → [0, 1] defined by H(x) := F(x)G(x), ∀x ∈ R, has
the relevant properties and hence is a DF. In particular, F^2 is a DF, if F is so (see the sketch below).

In fact, a general DF can be written as a convex combination of discrete DFs and some special
continuous DFs. We do not discuss such results in this course.
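A small simulation sketch of observation (b) (assuming NumPy): for independent X1, X2 with common DF F, the DF of max(X1, X2) is F^2; taking F to be the Uniform(0, 1) DF makes this easy to check.

import numpy as np

rng = np.random.default_rng(3)
m = rng.uniform(size=(10**5, 2)).max(axis=1)   # samples of max(X1, X2), with X1, X2 ~ Uniform(0, 1)
x = 0.7
print(np.mean(m <= x), x**2)   # empirical P(max <= x) is close to F(x)^2 = x^2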

Remark 12.12. In practice, given a known RV X, we often need to find the distribution
of h(X) for some function h : R → R or, more simply, compute expectations of the form
Eh(X). As already discussed earlier in the course, we can theoretically (i.e., in principle) compute
Eh(X) as ∫_{−∞}^{∞} h(x) fX(x) dx, when X is a continuous RV with p.d.f. fX, for example. However, in
practice, it may happen that this integral does not have a closed form expression, which makes it
challenging to evaluate. The problem becomes more intractable when we look at similar problems
where X is a random vector and the joint/marginal distributions need to be considered. In such
situations, as an alternative, we try to find ‘good’ approximations for the quantities of interest,
where the approximation terms are easier to compute than the original expression. This motivation
leads to the various notions of convergence of RVs. If some quantity of interest involving an RV
X, say EX, is difficult to compute, then we find an appropriate ‘approximating’ sequence of RVs
{Xn}n for X and use the values EXn as an approximation for EX.
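A sketch of this approximation idea (assuming NumPy): the quantity E[exp(sin X)] for X ∼ N(0, 1) has no obvious closed form, but the average of h(x) = exp(sin x) over a large simulated sample approximates it well.

import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(10**6)       # a large sample from N(0, 1)
print(np.mean(np.exp(np.sin(x))))    # Monte Carlo approximation of E[exp(sin X)]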

Remark 12.13. Given a random sample X1, X2, · · · , Xn from the N(µ, σ^2) distribution, consider the
sample mean X̄n = (1/n) Σ_{i=1}^n Xi. Here, we have written X̄n, instead of just X̄, to highlight the
dependence of the sample mean on the sample size n. Recall that X̄n ∼ N(µ, σ^2/n). The behaviour
of X̄n for large n is of interest. This is also another motivation for us to study the convergence of
sequences of RVs.

We now discuss concepts for convergence of sequences of RVs.

Definition 12.14 (Convergence in r-th mean). Let X, X1, X2, · · · be RVs defined on the same
probability space (Ω, F, P). Let r ≥ 1. If E|X|^r < ∞, E|Xn|^r < ∞, ∀n, and if

lim_{n→∞} E|Xn − X|^r = 0,

then we say that the sequence {Xn}n converges to X in r-th mean.



Note 12.15. (a) If a sequence {Xn}n converges to X in r-th mean for some r ≥ 1, then we have

lim_{n→∞} E|Xn|^r = E|X|^r   and   lim_{n→∞} EXn^r = EX^r,

i.e., we have the convergence of the r-th moments.


(b) The sequence {Xn }n converges to X in r-th mean if and only if the sequence {Xn − X}n
converges to 0 in r-th mean.

Remark 12.16. Even though we have defined the r-th order moments for 0 < r < 1, for technical
reasons we do not consider the convergence in r-th mean in this case. The details are beyond
the scope of this course. In what follows, whenever we consider the convergence in r-th mean, we
assume r ≥ 1.

Definition 12.17 (Convergence in Probability). Let X, X1, X2, · · · be RVs defined on the same
probability space (Ω, F, P). If for all ϵ > 0, we have

lim_{n→∞} P(|Xn − X| ≥ ϵ) = 0,

then we say that the sequence {Xn}n converges to X in probability and write Xn →P X as n → ∞.

Note 12.18. (a) Suppose that a sequence {Xn}n converges to X in probability. For all
ϵ > 0, note that

P(|Xn − X| ≥ 2ϵ) ≤ P(|Xn − X| > ϵ) ≤ P(|Xn − X| ≥ ϵ).

Since ϵ > 0 is arbitrary, letting n → ∞ in these inequalities shows that convergence in probability is equivalent to the condition

lim_{n→∞} P(|Xn − X| > ϵ) = 0, ∀ϵ > 0.

(b) The sequence {Xn}n converges to X in probability if and only if the sequence {Xn − X}n
converges to 0 in probability.

Proposition 12.19. Let X, X1, X2, · · · be RVs defined on the same probability space (Ω, F, P). If
the sequence {Xn}n converges to X in r-th mean for some r ≥ 1, then Xn →P X as n → ∞.

Proof. By Markov’s inequality (Corollary 8.9), for every ϵ > 0 we have

P(|Xn − X| > ϵ) ≤ ϵ^{−r} E|Xn − X|^r.

Since lim_{n→∞} E|Xn − X|^r = 0, we have the result. □

Corollary 12.20. Let {Xn}n be a sequence of RVs with finite second moments. If limn EXn = µ
and limn Var(Xn) = 0, then {Xn}n converges to µ in 2nd mean and, in particular, in probability.

Proof. Write µn := EXn. We have E|Xn − µ|^2 = E[(Xn − µn) + (µn − µ)]^2 = E(Xn − µn)^2 + (µn − µ)^2 = Var(Xn) +
(µn − µ)^2, where the cross term vanishes because E(Xn − µn) = 0. By our hypothesis, limn E|Xn − µ|^2 = 0. Hence, {Xn}n converges to µ in 2nd mean.
By Proposition 12.19, the sequence also converges in probability. □

Example 12.21. Let X1, X2, · · · be i.i.d. Uniform(0, θ) RVs, for some θ > 0. The sequence
{Xn}n being i.i.d. means that the collection {Xn : n ≥ 1} is mutually independent and that all
the RVs have the same law/distribution. Here, the common p.d.f. and the common DF are given
by

f(x) = 1/θ, if x ∈ (0, θ),   and   f(x) = 0, otherwise,

and

F(x) = 0, if x < 0;   F(x) = x/θ, if 0 ≤ x < θ;   F(x) = 1, if x ≥ θ.

Consider X(n) = max{X1, X2, · · · , Xn}. Using Proposition 10.15, the marginal p.d.f. of
X(n) is given by

gX(n)(x) = (n/θ^n) x^{n−1}, if x ∈ (0, θ),   and   gX(n)(x) = 0, otherwise.

Then

EX(n) = ∫_0^θ x · (n/θ^n) x^{n−1} dx = nθ/(n + 1),   EX(n)^2 = ∫_0^θ x^2 · (n/θ^n) x^{n−1} dx = nθ^2/(n + 2)

and

Var(X(n)) = nθ^2/(n + 2) − [nθ/(n + 1)]^2 = θ^2 [n(n + 1)^2 − n^2(n + 2)] / [(n + 2)(n + 1)^2] = nθ^2 / [(n + 2)(n + 1)^2].

Now, limn EX(n) = θ and limn Var(X(n)) = 0. Hence, by Corollary 12.20, {X(n)}n converges in
2nd mean to θ and also in probability.
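A simulation sketch of this example (assuming NumPy, and taking θ = 3 as an arbitrary value): the sample maximum X(n) concentrates around θ as n grows.

import numpy as np

rng = np.random.default_rng(5)
theta = 3.0
for n in (10, 100, 10000):
    x_max = rng.uniform(0, theta, size=(500, n)).max(axis=1)   # 500 replications of X_(n)
    print(n, x_max.mean(), x_max.var())                        # mean -> theta, variance -> 0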

Remark 12.22 (Convergence in probability does not imply convergence in r-th mean). Consider a
sequence of discrete RVs {Xn}n with Xn ∼ Bernoulli(1/n), ∀n. Consider Yn := nXn, ∀n. Then the Yn's
are also discrete, with the p.m.f.s given by

fYn(y) = 1 − 1/n, if y = 0;   fYn(y) = 1/n, if y = n;   fYn(y) = 0, otherwise.

For all ϵ > 0, we have P(|Yn| ≥ ϵ) = 1/n → 0 as n → ∞ (for all n large enough that n ≥ ϵ) and hence Yn →P 0. But, for any r ≥ 1,
E|Yn|^r = n^r · (1/n) = n^{r−1}, ∀n, which does not converge to 0. Hence, {Yn}n does not converge to 0 in r-th mean.
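A numerical sketch of this phenomenon (assuming NumPy): the empirical P(|Yn| ≥ ϵ) shrinks like 1/n, while the empirical second moment grows like n.

import numpy as np

rng = np.random.default_rng(6)
for n in (10, 100, 1000):
    y = n * rng.binomial(1, 1.0 / n, size=10**6)           # samples of Y_n = n * Bernoulli(1/n)
    print(n, np.mean(np.abs(y) >= 0.5), np.mean(y**2))     # roughly 1/n and roughly n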

Example 12.23. Let X1, X2, · · · be i.i.d. RVs following the N(µ, σ^2) distribution. Recall that X̄n =
(1/n) Σ_{i=1}^n Xi ∼ N(µ, σ^2/n). Then limn EX̄n = limn µ = µ and limn Var(X̄n) = limn σ^2/n = 0. By
Corollary 12.20, {X̄n}n converges in 2nd mean to µ and also in probability.

The above example leads to the following result.

Theorem 12.24 (Weak Law of Large Numbers (WLLN)). Let X1, X2, · · · be i.i.d. RVs such that
EX1 exists. Then X̄n = (1/n) Σ_{i=1}^n Xi →P EX1 as n → ∞.

Remark 12.25. We only discuss the proof of Theorem 12.24 when EX1^2 exists. The proof of the
theorem when EX1^2 does not exist is beyond the scope of this course. However, we shall use this
theorem in its full generality.

Proof of WLLN (Theorem 12.24) (assuming EX1^2 < ∞). Observe that EX̄n = (1/n) Σ_{i=1}^n EXi = (1/n) · n EX1 = EX1 and, using the independence of the Xi's, we have

Var(X̄n) = (1/n^2) Var(Σ_{i=1}^n Xi) = (1/n) Var(X1) → 0 as n → ∞.
By Corollary 12.20, the result follows. □
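A simulation sketch of the WLLN (assuming NumPy), with i.i.d. Exponential(1) variables so that EX1 = 1: the proportion of replications in which the sample mean deviates from 1 by at least 0.05 shrinks as n grows.

import numpy as np

rng = np.random.default_rng(7)
for n in (10, 1000, 10000):
    xbar = rng.exponential(size=(500, n)).mean(axis=1)   # 500 replications of the sample mean
    print(n, np.mean(np.abs(xbar - 1.0) >= 0.05))        # estimate of P(|Xbar_n - 1| >= 0.05)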
