
§8 Properties of distributions

§8.1 Expectation
8.1.1 Definition. The expectation (or expected value, or mean) of a random variable X is:
(i) (for discrete case) E[X] = \sum_{x \in X(\Omega)} x P(X = x);

(ii) (for continuous case) E[X] = \int_{-\infty}^{\infty} x f(x) dx, where f is the pdf of X.

8.1.2 The expectation of X is what we would expect its value to be if we were to take an observation of X. It is a weighted average over the whole range of values attainable by X (i.e. X(Ω)), with heavier weight given to points with higher mass/density.

8.1.3 Proposition. Let X1, ..., Xn be random variables. Define Y = g(X1, ..., Xn), for some given real-valued function g. Then

(i) (for discrete case) E[Y] = \sum_{x_1} \cdots \sum_{x_n} g(x_1, ..., x_n) P(X_1 = x_1, ..., X_n = x_n);

(ii) (for continuous case) E[Y] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, ..., x_n) f(x_1, ..., x_n) dx_1 \cdots dx_n, where f is the joint pdf of (X_1, ..., X_n).

This allows us to compute the expectation of any transformation of random variables using
their joint distribution.
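As a quick illustration of Proposition 8.1.3, the following sketch (assuming NumPy is available; the choice g(x1, x2) = x1 x2 with independent U[0, 1] inputs is purely illustrative) estimates E[g(X1, X2)] by simulation and compares it with the exact value 1/4.

```python
import numpy as np

rng = np.random.default_rng(0)

# g(x1, x2) = x1 * x2 applied to independent U[0, 1] variables;
# Proposition 8.1.3 says E[g(X1, X2)] is the integral of g against the
# joint pdf, which here equals (1/2) * (1/2) = 1/4.
x1 = rng.uniform(0.0, 1.0, size=1_000_000)
x2 = rng.uniform(0.0, 1.0, size=1_000_000)
print(np.mean(x1 * x2))   # ~0.25, matching the exact value 1/4
```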

8.1.4 Examples.

(i) X ∼ U [a, b]:


E[X] = \int_a^b x \frac{1}{b − a} dx = \frac{b^2 − a^2}{2(b − a)} = \frac{a + b}{2}.
(ii) For any A ⊂ R,

E[1{X ∈ A}] = 1 × P(1{X ∈ A} = 1) + 0 × P(1{X ∈ A} = 0) = P(X ∈ A).

Alternatively,

E[1{X ∈ A}] = \int_{-\infty}^{\infty} 1{x ∈ A} f(x) dx = \int_A f(x) dx = P(X ∈ A)

(discrete case similar).

Special case: A = (−∞, x] ⇒ E[1{X ≤ x}] = P(X ≤ x) = F(x) (cdf at x).
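A minimal numerical check of the special case E[1{X ≤ x}] = F(x), assuming NumPy and SciPy are available; taking X standard normal and x = 0.5 is an illustrative choice.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

x0 = 0.5
sample = rng.standard_normal(1_000_000)   # X ~ N(0, 1)
indicator_mean = np.mean(sample <= x0)    # estimates E[1{X <= x0}]
print(indicator_mean, norm.cdf(x0))       # both ~0.6915 = F(x0)
```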

(iii) (X, Y, Z) has joint pdf:

f (x, y, z) = 40 xz 1 {x, y, z ≥ 0, x + z ≤ 1, y + z ≤ 1}.

What is E[XY Z]?


E[XYZ] = 40 \int_0^1 \int_0^1 \int_0^1 (xyz) xz 1{x ≤ 1 − z, y ≤ 1 − z} dx dy dz

       = 40 \int_0^1 \int_0^{1−z} \int_0^{1−z} x^2 y z^2 dx dy dz

       = (20/3) \int_0^1 z^2 (1 − z)^5 dz = 5/126.
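The value 5/126 can be checked numerically, for example with SciPy's tplquad; the sketch below simply encodes the integration region used above.

```python
from scipy.integrate import tplquad

# Integrand 40 * x^2 * y * z^2 over 0 <= z <= 1, 0 <= y <= 1-z, 0 <= x <= 1-z.
# tplquad integrates func(innermost, middle, outermost); here the outermost
# variable is z, the middle one is y and the innermost one is x.
val, err = tplquad(lambda x, y, z: 40 * x**2 * y * z**2,
                   0, 1,                                 # z from 0 to 1
                   lambda z: 0, lambda z: 1 - z,         # y from 0 to 1-z
                   lambda z, y: 0, lambda z, y: 1 - z)   # x from 0 to 1-z
print(val, 5 / 126)   # both ~0.03968
```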

8.1.5 For constants α, β and random variables X, Y , we have E[αX + β Y ] = α E[X] + β E[Y ].

8.1.6 X ≥ 0 ⇒ E[X] ≥ 0 (trivial!)

8.1.7 X ≥ Y ⇒ E[X] ≥ E[Y ] (since X − Y ≥ 0.)

8.1.8 |EX| ≤ E|X| (since |X| ≥ both X and −X.)

8.1.9 If g(x1, ..., xn) = a for all x1, ..., xn ∈ R, i.e. g(·) is a constant function, and (X1, ..., Xn) has joint probability function f(·), then E[g(X1, ..., Xn)] = a.
(Reason — for continuous case,

\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, ..., x_n) f(x_1, ..., x_n) dx_1 \cdots dx_n = a \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, ..., x_n) dx_1 \cdots dx_n = a;

discrete case similar.)

8.1.10 X, Y independent random variables ⇒ E[XY ] = E[X] E[Y ]


(Reason — for continuous case,
\iint xy f(x, y) dx dy = \iint xy f_X(x) f_Y(y) dx dy = \left( \int x f_X(x) dx \right) \left( \int y f_Y(y) dy \right);

discrete case similar.)
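A small simulation sketch illustrating 8.1.5 and 8.1.10, assuming NumPy is available; the exponential and uniform inputs and the constants a, b are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X ~ exp(1) and Y ~ U[0, 2], generated independently (illustrative choices).
X = rng.exponential(1.0, n)
Y = rng.uniform(0.0, 2.0, n)

# 8.1.5: E[aX + bY] = a E[X] + b E[Y]
a, b = 3.0, -2.0
print(np.mean(a * X + b * Y), a * np.mean(X) + b * np.mean(Y))

# 8.1.10: independence gives E[XY] = E[X] E[Y] (here both ~ 1 * 1 = 1)
print(np.mean(X * Y), np.mean(X) * np.mean(Y))
```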

8.1.11 More examples.

(i) X ∼ Poisson (λ):


E[X] = \sum_{k=0}^{\infty} k P(X = k) = \sum_{k=1}^{\infty} k \frac{\lambda^k e^{-\lambda}}{k!} = \lambda \sum_{k=1}^{\infty} \frac{\lambda^{k-1} e^{-\lambda}}{(k-1)!} = \lambda \sum_{k=1}^{\infty} P(X = k - 1) = \lambda.

(ii) X ∼ geometric with success probability p:
E[X] = \sum_{k=0}^{\infty} k (1 − p)^k p = − p(1 − p) \sum_{k=0}^{\infty} \frac{d}{dp} (1 − p)^k = − p(1 − p) \frac{d}{dp} \frac{1}{p} = \frac{1 − p}{p}.
(iii) X ∼ negative binomial (no. of failures before kth success with success probability p ):
Write X = X1 + · · · + Xk , where each Xi ∼ geometric with success probability p. Then

E[X] = E[X1] + · · · + E[Xk] = \frac{k(1 − p)}{p}.
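The means derived in (i)–(iii) can be sanity-checked by simulation; a sketch assuming NumPy, with arbitrary parameter values. Note that NumPy's geometric counts trials rather than failures, so 1 is subtracted to match the convention used here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, p, k = 1_000_000, 3.0, 0.3, 5

print(rng.poisson(lam, n).mean(), lam)                         # Poisson: lambda
# NumPy's geometric counts trials (support 1, 2, ...); subtract 1 for failures
print((rng.geometric(p, n) - 1).mean(), (1 - p) / p)           # geometric: (1-p)/p
print(rng.negative_binomial(k, p, n).mean(), k * (1 - p) / p)  # neg. binomial: k(1-p)/p
```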

(iv) N men and M women are seated randomly round a table. Let X = no. of men with a
woman seated immediately to their right. Calculate E[X].
Solution:
It is hard to find the mass function of X explicitly. In fact,

P(X = k) = \frac{N + M}{k} \binom{N − 1}{k − 1} \binom{M − 1}{k − 1} \Big/ \binom{N + M}{N}.

It is easier to write X = X1 + · · · + XN, where

Xi = 1 {ith man has a woman to his right}.

All E[Xi], i = 1, . . . , N, are equal, as the labelling of the men has no effect on their expected seating situations. Consider a particular man. There are (M + N − 1) ways of assigning a person to his right, M of which assign a woman. Thus
P(Xi = 1) = \frac{M}{M + N − 1} = 1 − P(Xi = 0) ⇒ E[Xi] = \frac{M}{M + N − 1},
so that

E[X] = \sum_{i=1}^{N} E[Xi] = \frac{MN}{M + N − 1}.

Moral of (iii) and (iv): sometimes E[X] is easier to derive by decomposing X into a sum of simpler random variables than from first principles (a simulation check of (iv) is sketched below).
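A simulation sketch of example (iv), assuming NumPy; N = 6 and M = 4 are arbitrary, and the check confirms E[X] = MN/(M + N − 1).

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, trials = 6, 4, 200_000

count = 0
for _ in range(trials):
    # 1 = man, 0 = woman; a random circular seating is just a random permutation
    seats = rng.permutation(np.r_[np.ones(N, int), np.zeros(M, int)])
    right = np.roll(seats, -1)   # person immediately to each one's right
    count += np.sum((seats == 1) & (right == 0))

print(count / trials, M * N / (M + N - 1))   # both ~2.666...
```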
(v) (St. Petersburg Paradox)
Game:
You are given $8 before each game. Then toss a coin until the first head turns up. You have to concede $2^k, where k = no. of tosses needed to produce the first head.
Argument:
On average a head turns up once in every two tosses. So we expect it to take 2 tosses to get a head, and hence the expected amount to concede = $2^2 = $4. So you have a net gain of $4 per game. Very attractive, and you will accumulate a fortune?!
Reality: NOT QUITE!
Solution: Let X be amount conceded after a game. Then

P(X = 2^k) = 2^{−k}, k = 1, 2, . . . ,

so that

E[X] = \sum_{k=1}^{\infty} 2^k 2^{−k} = 1 + 1 + 1 + · · · = ∞.

But you earn only $8 before each game. Expected net loss per game is ∞ !!
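A simulation sketch of the paradox, assuming NumPy: the sample mean of the conceded amounts does not settle near any finite value but typically keeps growing with the number of games.

```python
import numpy as np

rng = np.random.default_rng(0)

# Amount conceded in one game: 2^k, where k = number of tosses to the first head,
# i.e. k ~ geometric with p = 1/2 (support 1, 2, ...).
k = rng.geometric(0.5, size=10_000_000)
conceded = 2.0 ** k

# The running sample mean does not stabilise near any finite value:
for n in (10**3, 10**5, 10**7):
    print(n, conceded[:n].mean())   # typically keeps drifting upwards as n grows
```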

(vi) X ∼ Gamma (α, β):


E[X] = \int_0^{\infty} x \frac{\beta^{\alpha} x^{\alpha−1} e^{−\beta x}}{\Gamma(\alpha)} dx = \frac{\alpha}{\beta} \int_0^{\infty} \frac{\beta^{\alpha+1} x^{(\alpha+1)−1} e^{−\beta x}}{\Gamma(\alpha + 1)} dx = \frac{\alpha}{\beta}.

Note: The last integral = 1 because the integrand is the pdf of Gamma (α + 1, β).

Special cases:
X ∼ exp(β) ⇒ X ∼ Gamma(1, β) ⇒ E[X] = 1/β;
X ∼ χ²_m ⇒ X ∼ Gamma(m/2, 1/2) ⇒ E[X] = m.

(vii) X ∼ Beta (α, β):


E[X] = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 x^{\alpha} (1 − x)^{\beta−1} dx

     = \frac{\alpha}{\alpha + \beta} \cdot \frac{\Gamma(\alpha + \beta + 1)}{\Gamma(\alpha + 1)\Gamma(\beta)} \int_0^1 x^{(\alpha+1)−1} (1 − x)^{\beta−1} dx = \frac{\alpha}{\alpha + \beta}.

(viii) X ∼ Cauchy ⇒ E[X] does not exist.


(ix) X ∼ N (µ, σ 2 ):

E[X] = E[X − µ] + µ = \int_{−\infty}^{\infty} (x − µ) \frac{1}{\sigma} \phi\left(\frac{x − µ}{\sigma}\right) dx + µ

     = \sigma \int_{−\infty}^{\infty} u \phi(u) du + µ = \sigma \left[ − \phi(u) \right]_{−\infty}^{\infty} + µ = µ.
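The means in (vi)–(ix) agree with SciPy's built-in distributions; a sketch with arbitrary parameter values (note SciPy parameterises the Gamma and exponential families by a scale = 1/β).

```python
from scipy import stats

alpha, beta_, mu, sigma, m = 2.5, 1.5, -1.0, 2.0, 4

print(stats.gamma(alpha, scale=1/beta_).mean(), alpha / beta_)   # Gamma(alpha, beta)
print(stats.expon(scale=1/beta_).mean(), 1 / beta_)              # exp(beta)
print(stats.chi2(m).mean(), m)                                   # chi-square_m
print(stats.beta(alpha, beta_).mean(), alpha / (alpha + beta_))  # Beta(alpha, beta)
print(stats.norm(mu, sigma).mean(), mu)                          # N(mu, sigma^2)
```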

§8.2 Variance and moments
8.2.1 Definition. The rth moment of a random variable X is E[X^r].

8.2.2 Definition. The rth central moment of a random variable X is E[(X − E[X])^r].

8.2.3 Definition. The variance of a random variable X, denoted by Var(X), is its second central
moment.

8.2.4 Moments of most frequent use are the mean (i.e. E[X]) and variance (i.e. Var(X)). The mean
measures centrality of the distribution of X, whereas the variance measures the amount by
which X deviates from its mean, i.e. dispersion (or spread ) of the distribution of X.

[Figure: two density functions, one with small variance and one with large variance, each centred at its mean.]
8.2.5 Definition. The standard deviation of X, usually denoted by σ, is \sqrt{Var(X)}.

8.2.6 Standard deviation has the same unit as the random variable X.

8.2.7 In general, moments, variance and standard deviation are properties used to describe the
distribution of X and are hence non-random constants.
8.2.8 Var(X) = E[X²] − (E[X])²

8.2.9 For any constants a, b, Var(aX + b) = a2 Var(X).

8.2.10 Var(X) ≥ 0

8.2.11 Var(X) = 0 ⇒ X is a non-random constant, i.e. P(X = a) = 1 for some constant a.

8.2.12 Theorem. If X1 , . . . , Xn are independent random variables, then

Var(X1 + · · · + Xn ) = Var(X1 ) + · · · + Var(Xn ).

Proof: For X, Y independent,


Var(X + Y) = E[(X + Y − E[X + Y])²] = E[((X − E[X]) + (Y − E[Y]))²]
           = E[(X − E[X])²] + E[(Y − E[Y])²] + 2 E[(X − E[X])(Y − E[Y])]
           = Var(X) + Var(Y) + 2 E[X − E[X]] E[Y − E[Y]] = Var(X) + Var(Y).

Now put X = X1 + · · · + Xn−1 and Y = Xn, and apply the above result inductively to prove the theorem.
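A quick simulation check of 8.2.9 and 8.2.12, assuming NumPy; the distributions and constants are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X = rng.exponential(2.0, n)   # independent samples
Y = rng.uniform(0.0, 3.0, n)
a, b = 4.0, 7.0

print(np.var(a * X + b), a**2 * np.var(X))   # 8.2.9: Var(aX + b) = a^2 Var(X)
print(np.var(X + Y), np.var(X) + np.var(Y))  # 8.2.12: variances add under independence
```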

8.2.13 Examples.

(i) X ∼ Binomial (n, p):


X = X1 + · · · + Xn for X1 , . . . , Xn independent Bernoulli trials with success probability
p. Since Xi² = Xi always, we have

E[Xi²] = E[Xi] = p ⇒ Var(Xi) = E[Xi²] − (E[Xi])² = p − p² = p(1 − p).

Thus

Var(X) = \sum_{i=1}^{n} Var(Xi) = np(1 − p).

(ii) X ∼ Poisson (λ):


E[X(X − 1)] = \sum_{k=0}^{\infty} k(k − 1) \frac{\lambda^k e^{−\lambda}}{k!} = \lambda^2 \sum_{k=2}^{\infty} \frac{\lambda^{k−2} e^{−\lambda}}{(k − 2)!} = \lambda^2.

Recalling that E[X] = λ, we have


E[X²] = E[X] + λ² = λ + λ² ⇒ Var(X) = E[X²] − (E[X])² = λ.

Thus, a Poisson (λ) random variable has its mean and variance both equal to λ.

(iii) X ∼ U [a, b]: Var(X) = (b − a)2 /12. (Exercise)
(iv) X ∼ Gamma (α, β):
E[X²] = \int_0^{\infty} x^2 \frac{\beta^{\alpha} x^{\alpha−1} e^{−\beta x}}{\Gamma(\alpha)} dx = \frac{\alpha(\alpha + 1)}{\beta^2} \int_0^{\infty} \frac{\beta^{\alpha+2} x^{(\alpha+2)−1} e^{−\beta x}}{\Gamma(\alpha + 2)} dx = \frac{\alpha(\alpha + 1)}{\beta^2}.

Recalling that E[X] = α/β, we have


Var(X) = E[X²] − (E[X])² = \frac{\alpha}{\beta^2}.

Special cases:
X ∼ exp(β) ≡ Gamma(1, β) ⇒ Var(X) = 1/β²;
X ∼ χ²_m ≡ Gamma(m/2, 1/2) ⇒ Var(X) = 2m.

(v) X ∼ Beta (α, β):


E[X²] = \frac{\alpha(\alpha + 1)}{(\alpha + \beta)(\alpha + \beta + 1)} \cdot \frac{\Gamma(\alpha + \beta + 2)}{\Gamma(\alpha + 2)\Gamma(\beta)} \int_0^1 x^{(\alpha+2)−1} (1 − x)^{\beta−1} dx = \frac{\alpha(\alpha + 1)}{(\alpha + \beta)(\alpha + \beta + 1)}.

Recalling that EX = α/(α + β), we have


Var(X) = E[X²] − (E[X])² = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.

(vi) A Cauchy random variable has infinite second moment and undefined variance.
(vii) X ∼ N (µ, σ 2 ):
Var(X) = \int_{−\infty}^{\infty} (x − µ)^2 \frac{1}{\sigma} \phi\left(\frac{x − µ}{\sigma}\right) dx = \sigma^2 \int_{−\infty}^{\infty} y^2 \phi(y) dy = \sigma^2 \left[ \Phi(y) − y \phi(y) \right]_{−\infty}^{\infty} = \sigma^2.
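The variances in (i)–(v) and (vii) can be checked against SciPy's built-in distributions; a sketch with arbitrary parameter values.

```python
from scipy import stats

n, p, lam, a, b = 10, 0.3, 2.0, 1.0, 5.0
alpha, beta_, mu, sigma = 2.5, 1.5, -1.0, 2.0

print(stats.binom(n, p).var(), n * p * (1 - p))                   # Binomial
print(stats.poisson(lam).var(), lam)                              # Poisson
print(stats.uniform(a, b - a).var(), (b - a)**2 / 12)             # U[a, b]
print(stats.gamma(alpha, scale=1/beta_).var(), alpha / beta_**2)  # Gamma
print(stats.beta(alpha, beta_).var(),
      alpha * beta_ / ((alpha + beta_)**2 * (alpha + beta_ + 1))) # Beta
print(stats.norm(mu, sigma).var(), sigma**2)                      # N(mu, sigma^2)
```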

§8.3 Covariance and correlation


8.3.1 Definition. The covariance of a pair of random variables X and Y is
 
Cov(X, Y ) = E (X − E[X])(Y − E[Y ]) .

8.3.2 Definition. Suppose that Var(X), Var(Y) > 0. The correlation coefficient, or simply correlation, of X and Y is

ρ(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}}.

8.3.3 Both covariance and correlation are properties of the joint distribution of (X, Y ), and are
non-random constants. They measure how much X and Y are linearly related to each other.

8.3.4 Cov(X, Y ) = E[XY ] − E[X] E[Y ].

8.3.5 Var(X + Y ) = Var(X) + Var(Y ) + 2 Cov(X, Y )

8.3.6 Definition. X, Y are uncorrelated if Cov(X, Y ) = 0 (i.e. E[XY ] = E[X] E[Y ].)

8.3.7 The magnitude of Cov(X, Y ) is affected by the spread of the distributions of X and Y , and
may not reflect their relationship solely. The correlation ρ(X, Y ) eliminates this effect by
normalising Cov(X, Y ), and is more honest in reflecting the relationship between X and Y .

8.3.8 |ρ(X, Y )| ≤ 1 always, so ρ(X, Y ) gives a universal measure of correlation between two random
variables, and is useful for comparing relationships of different pairs of random variables.

8.3.9 X, Y are positively correlated if ρ(X, Y ) > 0, and negatively correlated if ρ(X, Y ) < 0, whereas
ρ(X, Y ) = 0 if and only if X and Y are uncorrelated.

8.3.10 The bigger |ρ(X, Y)| is, the more highly correlated X and Y are.
Extreme cases:
ρ(X, Y) = 1 if and only if X increases linearly with Y;
ρ(X, Y) = −1 if and only if X decreases linearly with Y.

8.3.11 Examples.

(i) Toss a coin n times, giving X heads and Y = n − X tails. With P(head) = p, we have
X ∼ Binomial (n, p) and Y ∼ Binomial (n, 1 − p).
Thus

Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[(X − np)(n − X − n(1 − p))]
          = − E[(X − np)²] = − Var(X) = − np(1 − p).

Here Cov(X, Y ) depends both on n and p, and therefore is not a good measure of how X
and Y are related: shouldn’t X, Y be related in the same way irrespective of the values of
n, p?
Consider now

ρ(X, Y) = \frac{− np(1 − p)}{\sqrt{np(1 − p) × n(1 − p)[1 − (1 − p)]}} = −1 (free of n, p).
Note: ρ(X, Y ) = −1 implies that X decreases linearly with Y .

(ii) X, Y independent Poisson (λ) random variables.
Set Z = X + Y , so that Z ∼ Poisson (2λ). Then
Cov(X, Z) = E[X(X + Y)] − E[X] E[X + Y] = E[X²] − (E[X])² = Var(X) = λ.

Note that Var(X) = λ and Var(Z) = 2λ. Thus


ρ(X, Z) = \frac{\lambda}{\sqrt{\lambda(2\lambda)}} = \frac{1}{\sqrt{2}} (free of λ).

Here X, Z are positively correlated, i.e. the bigger X is, the bigger Z tends to be.



We can say that (X, Y) in (i) are more strongly correlated than (X, Z) in (ii) since |−1| > 1/√2. Such a comparison is meaningful only if we compare correlation coefficients, but NOT covariances!
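A simulation sketch of examples (i) and (ii), assuming NumPy; n, p and λ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 1_000_000

# (i) X ~ Binomial(n, p) heads, Y = n - X tails
n, p = 10, 0.3
X = rng.binomial(n, p, trials)
Y = n - X
print(np.corrcoef(X, Y)[0, 1])   # ~ -1

# (ii) X, Y independent Poisson(lam), Z = X + Y
lam = 2.0
X = rng.poisson(lam, trials)
Z = X + rng.poisson(lam, trials)
print(np.corrcoef(X, Z)[0, 1], 1 / np.sqrt(2))   # ~ 0.707
```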

8.3.12 Independent ⇒ uncorrelated

8.3.13 Uncorrelated ⇏ independent


Example. Let V, W be independent Bernoulli trials with success probability 1/2.
Set X = V + W and Y = |V − W |. Then

E[X] = E[V ] + E[W ] = 1/2 + 1/2 = 1,


E[Y] = \sum_{v=0}^{1} \sum_{w=0}^{1} |v − w| P(V = v, W = w) = (1/4) \sum_{v=0}^{1} \sum_{w=0}^{1} |v − w| = 1/2,

E[XY] = E[(V + W)|V − W|] = (1/4) \sum_{v=0}^{1} \sum_{w=0}^{1} (v + w)|v − w| = 1/2 = E[X] E[Y],

so that X, Y are uncorrelated.


On the other hand, noting that

P(X = 0) = P(V = 0, W = 0) = 1/4,


P(Y = 0) = P(V = 0, W = 0) + P(V = 1, W = 1) = 1/2,
P(X = 0, Y = 0) = P(V + W = 0, |V − W | = 0) = P(V = 0, W = 0) = 1/4,

we have
P(X = 0, Y = 0) ≠ P(X = 0) P(Y = 0),
so that X, Y are not independent.
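A simulation sketch of this example, assuming NumPy: the sample covariance is close to 0, yet P(X = 0, Y = 0) clearly differs from P(X = 0) P(Y = 0).

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 1_000_000

V = rng.integers(0, 2, trials)   # Bernoulli(1/2)
W = rng.integers(0, 2, trials)
X = V + W
Y = np.abs(V - W)

# Sample covariance is ~0 (uncorrelated) ...
print(np.cov(X, Y)[0, 1])
# ... but P(X=0, Y=0) = 1/4 differs from P(X=0) * P(Y=0) = 1/4 * 1/2 = 1/8,
# so X and Y are not independent.
print(np.mean((X == 0) & (Y == 0)), np.mean(X == 0) * np.mean(Y == 0))
```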

§8.4 Quantiles
8.4.1 Definition. Given a random variable X with cdf F , the quantile of order α (or, more simply,
the αth quantile) of X is inf{x ∈ R : F (x) > α}.
Note: There is no unique definition of quantile. For example, it can be defined as inf{x ∈ R : F (x) ≥ α}
or sup{x ∈ R : F (x) < α} etc.

8.4.2 Practical applications are concerned only with α ∈ (0, 1).

8.4.3 If F is continuous and strictly increasing (i.e. x < y ⇒ F(x) < F(y)), then the quantile of order α is F^{−1}(α), provided that α ∈ (0, 1).

8.4.4 Terminology. αth quantile = (100 α)th percentile


(0.5)th quantile = median or 2nd quartile
(0.25)th quantile = 1st quartile
(0.75)th quantile = 3rd quartile
(3rd quartile) − (1st quartile) = interquartile range

8.4.5 The median, like the mean, measures the centrality of the distribution of X.
The interquartile range, like the variance (or standard deviation), measures the dispersion or
spread of the distribution of X.
They are, however, more robust than the mean and variance (or standard deviation), in the
sense that they are less sensitive to the tails of the distribution.

8.4.6 Examples. Suppose α ∈ (0, 1).

(i) X ∼ U[a, b]: cdf F(x) = (x − a)/(b − a) (for x ∈ [a, b])

⇒ quantile of order α is F^{−1}(α) = a + α(b − a) = (1 − α)a + αb.

(ii) X ∼ Bernoulli(p):

cdf F(x) = 0 for x < 0;  1 − p for 0 ≤ x < 1;  1 for x ≥ 1

⇒ quantile of order α is 0 if 0 < α < 1 − p, and 1 if 1 − p ≤ α < 1.

(iii) X ∼ exp(λ):

cdf F(x) = 1 − e^{−λx} ⇒ quantile of order α is −ln(1 − α)/λ.
(iv) X ∼ Cauchy with pdf f(x) = \frac{1}{\pi (1 + (x − θ)^2)}:

cdf F(x) = \frac{1}{2} + \frac{\tan^{−1}(x − θ)}{\pi} ⇒ quantile of order α is θ + tan(π(α − 1/2)).
Note: The above Cauchy distribution has
median = θ + tan(0) = θ, interquartile range = tan(π/4) − tan(−π/4) = 2,

BUT its mean and variance are undefined.
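The quantiles in (i), (iii) and (iv) match SciPy's ppf (inverse cdf); a sketch with arbitrary parameter values (SciPy's exponential uses scale = 1/λ).

```python
import numpy as np
from scipy import stats

alpha = 0.9
a, b, lam, theta = 1.0, 5.0, 2.0, 3.0

# U[a, b]: (1 - alpha) a + alpha b
print(stats.uniform(a, b - a).ppf(alpha), (1 - alpha) * a + alpha * b)
# exp(lambda): -ln(1 - alpha) / lambda
print(stats.expon(scale=1/lam).ppf(alpha), -np.log(1 - alpha) / lam)
# Cauchy with location theta: theta + tan(pi (alpha - 1/2))
print(stats.cauchy(loc=theta).ppf(alpha), theta + np.tan(np.pi * (alpha - 0.5)))
```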

§8.5 *** More challenges ***


8.5.1 (Umbrella problem)
n people go to a party on a rainy night. At the end of the party, they take umbrellas at random.

(a) What is the probability that exactly k (≤ n) people get correct umbrellas?
(b) Find the expected number of correctly claimed umbrellas.
(c) Describe the distribution of the number of correctly claimed umbrellas as n → ∞.
What is the mean of this distribution? How does it relate to your answer in (b)?

8.5.2 Let X be a positive random variable and ε be an arbitrary positive constant.

(a) Verify that


1{X ≥ ε} + 1{X < ε} ≡ 1,
where 1{·} denotes the indicator function.
(b) Suppose that X ≤ M with probability one and that E[X] = K, for some positive constants
M and K. Show that for ε < M ,
P(X ≥ ε) ≥ \frac{K − ε}{M − ε}.

(c) Suppose now that X is a positive random variable bounded in the interval [0, 1] and E[X] =
1/2. Deduce from (b) that
P(X ≥ 1/4) ≥ 1/3.
Obtain an upper bound for P(X > 3/4).
[Hint: consider the random variable 1 − X.]
