
5 Expectation and variance

In Sections 3 and 4 we encountered a wide range of random variables. It is often useful to describe
or summarise their behaviour, and in this section we introduce several quantities for doing so. We pay
particular attention to two of key importance: the expectation and the variance of a random variable.

5.1 Expectation

Definition 5.1 (Expectation – discrete case). Let X be a discrete random variable. The expectation
of X is given by

E[X] := Σ_{x∈S_X} x · P(X = x),

provided the series Σ_{x∈S_X} |x| · P(X = x) converges. If the series Σ_{x∈S_X} |x| · P(X = x) diverges
then the expectation is undefined¹.

Remark 5.2. Let X be a discrete random variable.

• The expectation of X is often called the mean or the average of X.

• If S_X is a finite set then Σ_{x∈S_X} |x| · P(X = x) converges, and so the expectation always exists.

This might not be true if S_X is infinite (see Example 5.10 below).

Example 5.3. Let X satisfy P(X = 2) = 0.4, P(X = 4) = 0.1 and P(X = 10) = 0.5. Then

E[X] = 2 · (0.4) + 4 · (0.1) + 10 · (0.5) = 6.2.

When betting on a game of chance, an important random variable is the amount of money gained over
the game. The expectation is then often used to decide if the game is worth playing.

Example 5.4. Take two kings and four aces, shuffle the six cards thoroughly and then draw two
without replacement. We win £1.25 if we draw two aces; otherwise, we pay £1. Should we play?
Let X denote the amount gained in a game (negative if we lose). The question gives P(X = 1.25) =
C(4, 2)/C(6, 2) = 0.4 and P(X = −1) = 0.6, so

E[X] = (1.25) · P(X = 1.25) + (−1) · P(X = −1) = (1.25) · (0.4) + (−1) · (0.6) = −0.1.

As the expected gain is negative, on average we will lose money and so should not play².
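As a quick sanity check (a Python illustration of our own, not part of the notes), we can enumerate all C(6, 2) = 15 equally likely draws and average the gain directly:

```python
from itertools import combinations

# Exhaustive check of Example 5.4: 4 aces and 2 kings, draw 2 without
# replacement; gain 1.25 on two aces, otherwise -1.
deck = ['A', 'A', 'A', 'A', 'K', 'K']
draws = list(combinations(deck, 2))                  # 15 equally likely pairs
gains = [1.25 if pair == ('A', 'A') else -1.0 for pair in draws]
expected_gain = sum(gains) / len(gains)              # matches E[X] = -0.1
```

Six of the fifteen pairs are two aces, recovering P(X = 1.25) = 0.4 and the expected gain −0.1.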

Let us calculate the expectations of some familiar random variables.


¹ If a series is absolutely convergent then the order in which we sum its elements does not affect the answer, as proven
in 1SAS. This is why absolute convergence is important here.
² Hopefully this sounds like a good criterion for deciding whether to play, but at the moment this relies
only on our intuition. We will obtain a more rigorous justification in Section 6.

Example 5.5 (Constant). If X has P(X = c) = 1 for some c ∈ R then E[X] = c · P(X = c) = c.

Example 5.6 (Discrete uniform). Let X follow the uniform distribution on {1, . . . , n}, that is P(X =
k) = 1/n for all k = 1, . . . , n. Then,

E[X] = Σ_{i=1}^n i · P(X = i) = Σ_{i=1}^n i/n = n(n + 1)/(2n) = (n + 1)/2.

Example 5.7 (Binomial). On problem sheet 4 you will show that if X ∼ binn,p then E[X] = np.

Example 5.8 (Geometric). Let X ∼ geo_p with p ∈ (0, 1). Here there is a small trick. Taking q = 1 − p,

p · E[X] = E[X] − q · E[X] = Σ_{k=1}^∞ kpq^{k−1} − Σ_{k=1}^∞ kpq^k = Σ_{k=1}^∞ kpq^{k−1} − Σ_{k=2}^∞ (k − 1)pq^{k−1} = Σ_{k=1}^∞ pq^{k−1} = 1.

The third equality here follows by a change of variable and the final equality follows by summing a
geometric series. Rearranging we get E[X] = 1/p.
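The value 1/p can be checked numerically (a Python sketch of our own; the truncation length is arbitrary) by summing the series Σ k · p · q^{k−1} directly:

```python
# Numerical check of E[X] = 1/p for X ~ geo_p by truncating the series
# sum over k of k * p * q^(k-1); the tail beyond 10,000 terms is negligible
# for moderate p.
def geometric_mean_truncated(p, terms=10_000):
    q = 1 - p
    return sum(k * p * q ** (k - 1) for k in range(1, terms + 1))

approx = geometric_mean_truncated(0.25)   # close to 1/0.25 = 4
```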

Example 5.9 (Poisson). Let X ∼ Poi_λ with λ > 0. Then,

E[X] = Σ_{k=0}^∞ k · P(X = k) = Σ_{k=1}^∞ k e^{−λ} λ^k/k! = e^{−λ} Σ_{k=1}^∞ λ^k/(k − 1)! = λe^{−λ} Σ_{ℓ=0}^∞ λ^ℓ/ℓ! = λe^{−λ} e^λ = λ,

where we substituted ℓ = k − 1 in the fourth equality.

Example 5.10 (No expectation). Let X be a discrete random variable with P(X = n) = 1/(n(n + 1)) for
n ∈ ℕ. These probabilities do indeed sum to one since:

Σ_{n=1}^m 1/(n(n + 1)) = Σ_{n=1}^m 1/n − Σ_{n=1}^m 1/(n + 1) = 1 − 1/(m + 1) → 1, as m → ∞.

However Σ_{n=1}^∞ n · P(X = n) = Σ_{n=1}^∞ 1/(n + 1), which diverges³. Therefore E[X] is not defined in this case.

The expectation of continuous random variables is defined very similarly.

Definition 5.11 (Expectation – continuous case). Let X be a continuous random variable with density
f_X. Then the expectation of X is given by

E[X] := ∫_{−∞}^∞ x · f_X(x) dx,

provided ∫_{−∞}^∞ |x| · f_X(x) dx exists. If ∫_{−∞}^∞ |x| · f_X(x) dx does not exist then E[X] is undefined.

Example 5.12 (Continuous uniform). Let X ∼ unif[a, b] with density f_X(x) = (b − a)^{−1} for x ∈ [a, b]
and f_X(x) = 0 for x ∉ [a, b]. Then,

E[X] = ∫_{−∞}^∞ x · f_X(x) dx = (b − a)^{−1} ∫_a^b x dx = (b² − a²)/(2(b − a)) = (a + b)/2.
³ As the harmonic series Σ_{n=1}^∞ 1/n diverges.

Example 5.13 (Exponential). Let X ∼ exp_λ where λ > 0. Then X has density function f_X(x) =
λ · e^{−λx} for x > 0 and f_X(x) = 0 otherwise. Then,

E[X] = ∫_{−∞}^∞ x · f_X(x) dx = ∫_0^∞ x · λe^{−λx} dx = [x(−e^{−λx})]_0^∞ + ∫_0^∞ e^{−λx} dx = 1/λ,

where we used integration by parts in the second-to-last equality.

Example 5.14. Let f(x) = (1/4)|sin(x)| for x ∈ [0, 2π] and f(x) = 0 elsewhere. Then f gives a density
function (check this!) and if X is a random variable with this density

E[X] = ∫_{−∞}^∞ x · f(x) dx = (1/4) ∫_0^{2π} x · |sin(x)| dx = (1/4) ∫_0^π x · sin(x) dx − (1/4) ∫_π^{2π} x · sin(x) dx.

Integration by parts (with u = x and v = −cos(x)) gives ∫_a^b x sin(x) dx = [−x · cos(x) + sin(x)]_a^b. Thus

E[X] = (1/4) · [−x · cos(x) + sin(x)]_0^π − (1/4) · [−x · cos(x) + sin(x)]_π^{2π} = (π + 0)/4 − (−2π − π)/4 = π.

5.2 Key properties of expectation

The next theorem, together with Remark 5.16, is absolutely central to why expectation is so useful.

Theorem 5.15. Let X, Y be discrete or continuous random variables with well-defined expectations.

(i) Given any a, b ∈ R, we have E[aX + bY ] = a · E[X] + b · E[Y ]. (linearity of expectation)

(ii) If X, Y are independent then E[X · Y ] = E[X] · E[Y ].

We postpone the proof of Theorem 5.15 to the end of the section so that we can highlight some aspects
of the theorem and see some applications.

Remark 5.16. The following points are of key importance.

• Linearity of expectation (Theorem 5.15 (i )) extends by induction on n to give that for any n ≥ 1,
a1 , . . . , an ∈ R and random variables X1 , . . . , Xn with well-defined expectations, we have

E[a1 X1 + · · · + an Xn ] = a1 E[X1 ] + · · · + an E[Xn ].

• From Section 3 we know that if X, Y are independent random variables then f(X), g(Y) are also
independent random variables for any functions f, g : ℝ → ℝ. It follows from Theorem 5.15 (ii)
that E[X²Y³] = E[X²] · E[Y³], that E[sin(X) · e^Y] = E[sin(X)] · E[e^Y], etc.⁴

• Independence is key in (ii); in general, for random variables X, Y we have E[X · Y] ≠ E[X] · E[Y].

⁴ Provided all these expectations are well defined.

Example 5.17 (Dice). Let X1 , . . . , Xn be random variables with the uniform distribution on {1, . . . , 6}
(representing fair dice rolls). Then the random variable Y = X1 + · · · + Xn has SY = {n, n +
1, . . . , 6n − 1, 6n} but P(Y = k) is quite tricky to calculate. However E[Y ] = E[X1 + · · · + Xn ] =
E[X1 ] + · · · + E[Xn ] = nE[X1 ] by Theorem 5.15(i). But E[X1 ] = 3.5 by Example 5.6, so E[Y ] = 3.5n.

Example 5.18 (Sum of digits). Let X be uniformly distributed on {0, 1, . . . , 999}. Let Y be its
sum of digits. What is E[Y ]? We can write X = X1 + 10X2 + 100X3 , where Xi ∈ {0, 1, . . . , 9}
follows the uniform distribution on this set. In particular, E[Xi ] = (0 + 1 + 2 + · · · + 9)/10 = 4.5. As
Y = X1 + X2 + X3 , we find E[Y ] = E[X1 ] + E[X2 ] + E[X3 ] = 13.5 using linearity of expectation.
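The answer 13.5 can be confirmed exactly by enumeration (a Python illustration of our own, not part of the notes):

```python
# Exact check of Example 5.18: average digit sum of X uniform on {0, ..., 999}.
digit_sums = [sum(int(d) for d in str(x)) for x in range(1000)]
mean_digit_sum = sum(digit_sums) / len(digit_sums)   # equals 13.5
```

Each of the three digit positions contributes an average of 4.5, in line with the linearity argument above.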

Example 5.19 (Expectation of the normal distribution). We first focus on the standard normal,
N ∼ N(0, 1). Using that xe^{−x²/2} is an odd function, by symmetry

E[N] = (1/√(2π)) ∫_{−∞}^∞ xe^{−x²/2} dx = (1/√(2π)) ∫_{−∞}^0 xe^{−x²/2} dx + (1/√(2π)) ∫_0^∞ xe^{−x²/2} dx = 0.   (1)

Now consider X ∼ N(µ, σ²). By Proposition 4.19 we have µ + σN ∼ N(µ, σ²) and it follows from
Theorem 5.15 (i) that E[X] = E[µ + σN] = E[µ] + σ · E[N] = µ, by (1).

To further exploit linearity of expectation we need the notion of a Bernoulli random variable.

Definition 5.20 (Bernoulli distribution). The Bernoulli distribution with parameter p, with p ∈ [0, 1],
is the probability distribution on {0, 1} given by

Berp (1) = p, Berp (0) = 1 − p.

A random variable X follows the Bernoulli distribution with parameter p if SX = {0, 1} and P(X =
k) = Berp (k) for all k ∈ SX = {0, 1}. In this case we write X ∼ Berp . (Note that bin1,p = Berp .)

Example 5.21 (Bernoulli). If X ∼ Berp then E[X] = 1 · P(X = 1) + 0 · P(X = 0) = p.

Example 5.22. Let X ∼ Ber_{0.5} and set Y = 1 − X. Then Y ∼ Ber_{0.5} and X · Y = 0. This gives
E[X · Y] = 0 ≠ 1/4 = E[X] · E[Y]; see the third bullet point in Remark 5.16.

In computing expectations, it is often possible to 'break up' a random variable X into a sum of simpler
ones, X = Σ_{i=1}^n X_i, and then use linearity of expectation to find E[X] = Σ_{i=1}^n E[X_i]. The following
three examples illustrate this method, which is widely applicable.

Example 5.23 (Binomial). Let X ∼ bin_{n,p}. We will show E[X] = np. We know that X counts the
number of occurrences of independent events A_1, . . . , A_n, where each event happens with probability
p. Thus, X = X_1 + X_2 + · · · + X_n with Bernoulli random variables X_1, . . . , X_n, where X_i = 1 if and
only if A_i occurs. Linearity of expectation and Example 5.21 give E[X] = Σ_{i=1}^n E[X_i] = Σ_{i=1}^n p = np.
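The identity E[X] = np can also be checked directly against the binomial mass function (a Python sketch of our own; the function name is ours):

```python
from math import comb

# Direct check of E[X] = np for X ~ bin_{n,p}: sum k * P(X = k) over the
# support {0, ..., n} using the binomial mass function.
def binomial_mean(n, p):
    return sum(k * comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))
```

For instance, binomial_mean(10, 0.3) returns a value numerically equal to np = 3.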

Example 5.24 (Hypergeometric). We show that if X ∼ hyp_{n,r,t} where n, r ≤ t then E[X] = n · r/t.
One could compute this by hand, exploiting identities for binomial coefficients (exercise!). Instead we
present a computation-free proof.
Recall that X counts the number of red balls in a sample of size n from an urn containing r red balls
and t balls in total. Label the red balls from 1 to r and the remaining balls from r + 1 to t. Let Xi = 1

if the i-th ball is in the sample and Xi = 0 otherwise. Then the number of red balls X = X1 + · · · + Xr .
By linearity of expectation,

E[X] = E[X1 ] + · · · + E[Xr ]. (2)


But E[X_i] = P(X_i = 1) = C(t − 1, n − 1)/C(t, n) = n/t, as we draw n balls without replacement and are interested
in the event that we draw a specific ball. Putting this value into (2) gives E[X] = n · r/t.


Example 5.25 (Birthday paradox). Place 50 people in a room and assume that their birthdays are
uniformly distributed over the whole year, independently of each other. Let X denote the number of
days in the year on which someone in the group has their birthday. What is E[X]?
The distribution of X is intricate, and we will not even think of studying it. Instead, for i = 1, . . . , 365,
let Ai be the event that somebody celebrates on day i. Then, P (Aci ) = (364/365)50 . Hence, P (Ai ) =
1 − (364/365)50 . As we can write X = X1 + X2 + · · · + X365 with Bernoulli random variables
X1 , . . . , X365 , where Xi = 1 if and only if Ai occurs, linearity of expectation gives

E[X] = 365 · E[X_1] = 365 · P(A_1) = 365 · (1 − (364/365)^50) = 46.786 [3dp].

Remark 5.26. The fact that linearity of expectation works without independence is a huge strength.
Note that in the previous two examples, the random variables {Xi } were not independent.
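The birthday computation above lends itself to a Monte Carlo check (a Python sketch of our own; the trial count and seed are arbitrary choices):

```python
import random

# Estimate the expected number of distinct birthdays among 50 people in a
# 365-day year; should be close to 365 * (1 - (364/365)**50), about 46.79.
def mean_distinct_days(people=50, days=365, trials=20_000, seed=1):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(days) for _ in range(people)})
    return total / trials
```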

Example 5.27 (Dice). We roll a fair die repeatedly and let X be the number of rolls necessary for
all faces to show up at least once. What is E[X]?
To calculate this, for each i ∈ {1, . . . , 6} let T_i denote the roll when the i-th new face first appears⁵.
In particular, we have T1 = 1 and T6 = X. Then we can write

X = T1 + (T2 − T1 ) + (T3 − T2 ) + (T4 − T3 ) + (T5 − T4 ) + (T6 − T5 ).

By linearity of expectation, our calculation now reduces to calculating E[Ti+1 − Ti ] for i ∈ {1, . . . , 5}.
What is the distribution of Ti+1 − Ti if i ∈ {1, . . . , 5}? At time Ti , we have seen i faces. The number of
rolls until the next new face appears follows the geometric distribution with parameter pi = (6 − i)/6.
Hence, Ti+1 − Ti ∼ geopi and E [Ti+1 − Ti ] = 1/pi = 6/(6 − i). By linearity of expectation, we find

E[X] = 1 + Σ_{i=1}^5 E[T_{i+1} − T_i] = 1 + Σ_{i=1}^5 6/(6 − i) = 1 + 6 Σ_{i=1}^5 1/i = 14.7.

This calculation is a variant of the coupon collector problem, which tends to appear quite often⁶.
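The value 14.7 is easy to confirm by simulation (a Python illustration of our own; trial count and seed are arbitrary):

```python
import random

# Average number of fair-die rolls until all six faces have appeared;
# Example 5.27 gives expectation 14.7.
def rolls_until_all_faces(rng):
    seen, rolls = set(), 0
    while len(seen) < 6:
        seen.add(rng.randrange(1, 7))
        rolls += 1
    return rolls

rng = random.Random(0)
trials = 20_000
estimate = sum(rolls_until_all_faces(rng) for _ in range(trials)) / trials
```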

The following result is also very useful in calculating expectations.

Lemma 5.28. Let g : R → R be a function.


⁵ Note: T_i is not the first time that face i appears.
⁶ You might like to see the Wikipedia article for more information here.

(i) If X is a discrete random variable, we have

E[g(X)] = Σ_{x∈S_X} g(x) · P(X = x),

provided Σ_{x∈S_X} |g(x)| · P(X = x) converges.

(ii) If X is a continuous random variable with density f_X then

E[g(X)] = ∫_{−∞}^∞ g(x) · f_X(x) dx,

provided ∫_{−∞}^∞ |g(x)| · f_X(x) dx converges.

Again we postpone the proof so that we can first see how the lemma applies.

Example 5.29. Lemma 5.28 might look a little odd, but it is very convenient. To see why, note that
if X is a discrete random variable then by Definition 5.1 we have

E[X²] = Σ_{x∈S_{X²}} x · P(X² = x).

This might require determining S_{X²} and the probabilities P(X² = x). Lemma 5.28 shows this is not
necessary; taking g(x) = x² we also have E[X²] = Σ_{x∈S_X} x² · P(X = x).

Example 5.30 (Karate). We chop a stick of length 1 into two pieces at a uniformly chosen point.
What is the expected length of the longer of the two pieces?

Letting X ∼ unif[0, 1] be the breaking point, we want E[g(X)] where g(x) = max(x, 1 − x). Thus,

E[g(X)] = ∫_0^1 max(x, 1 − x) · 1 dx = ∫_0^{0.5} (1 − x) dx + ∫_{0.5}^1 x dx = [−(1 − x)²/2]_0^{0.5} + [x²/2]_{0.5}^1 = 3/4.
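The integral can be checked numerically with a midpoint Riemann sum (a Python illustration of our own; the grid size is arbitrary):

```python
# Midpoint Riemann sum for E[max(X, 1-X)] with X ~ unif[0, 1].
# The integrand is piecewise linear with a kink at 0.5, so with an even
# number of cells the midpoint rule is exact; the answer is 3/4.
n = 100_000
estimate = sum(max((i + 0.5) / n, 1 - (i + 0.5) / n) for i in range(n)) / n
```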

5.2.1 The proofs of Theorem 5.15 and Lemma 5.28.

Proof of Theorem 5.15. We only consider the case of discrete random variables in the proof. To
simplify the notation, we write SX = {x1 , x2 , . . .} and SY = {y1 , y2 , . . .}, noting that these sets could
be finite or infinite.

To prove (i) let Z = aX + bY with S_Z = {ax_i + by_k : i ≥ 1, k ≥ 1}. We have

E[Z] = Σ_{z∈S_Z} z · P(Z = z) = Σ_{z∈S_Z} z · P(⋃_{i≥1,k≥1: ax_i+by_k=z} {X = x_i, Y = y_k})
     = Σ_{z∈S_Z} z · (Σ_{i≥1,k≥1: ax_i+by_k=z} P(X = x_i, Y = y_k))
     = Σ_{i≥1} Σ_{k≥1} (ax_i + by_k) · P(X = x_i, Y = y_k).

The series for E[X] and E[Y] are absolutely convergent, so we can rearrange this sum by 1SAS results:

E[Z] = a · Σ_{i≥1} x_i · (Σ_{k≥1} P(X = x_i, Y = y_k)) + b · Σ_{k≥1} y_k · (Σ_{i≥1} P(X = x_i, Y = y_k))
     = a · Σ_{i≥1} x_i · P(X = x_i) + b · Σ_{k≥1} y_k · P(Y = y_k)
     = a · E[X] + b · E[Y].

To prove (ii), note that by replacing aX + bY with X · Y in (i), the analogue of the calculation for
E[Z] gives

E[X · Y] = Σ_{i≥1} Σ_{k≥1} x_i y_k · P(X = x_i, Y = y_k) = Σ_{i≥1} Σ_{k≥1} x_i y_k · P(X = x_i) · P(Y = y_k)
         = (Σ_{i≥1} x_i · P(X = x_i)) · (Σ_{k≥1} y_k · P(Y = y_k))
         = E[X] · E[Y].

The second equality above is the key point at which independence is used (recalling Definition 3.33).
The third equality is clear if the series are finite, but holds for infinite series by results from 1SAS, as
both series are absolutely convergent. The final equality holds by definition of E[X] and E[Y ].

Proof of Lemma 5.28. We only prove (i). By definition of E[g(X)], we have

E[g(X)] = Σ_{y∈S_{g(X)}} y · P(g(X) = y) = Σ_{y∈S_{g(X)}} y · P(X ∈ {x ∈ S_X : g(x) = y})
        = Σ_{y∈S_{g(X)}} y · (Σ_{x∈S_X: g(x)=y} P(X = x))
        = Σ_{x∈S_X} (Σ_{y∈S_{g(X)}: g(x)=y} g(x) · P(X = x))
        = Σ_{x∈S_X} g(x) · P(X = x).

The fourth equality holds by switching the order of summation (i.e. the order of x and y) and the
final equality holds by noting that {y ∈ S_{g(X)} : g(x) = y} = {g(x)}, a set of size 1.

5.3 Variance

Definition 5.31 (Variance – discrete case). Let X be a discrete random variable with well-defined
expectation. The variance of X is given by

Var(X) := E[(X − E[X])²] = Σ_{x∈S_X} (x − E[X])² · P(X = x),   (3)

provided the series in (3) converges. The standard deviation of X is then given by

σ_X := √Var(X).

If the series in (3) diverges then both Var(X) and σ_X are undefined.

Remark 5.32. Let X be a discrete random variable with well-defined expectation and variance.

• The variance of X is always non-negative, by definition in (3).

• Both the variance and the standard deviation give a measure of the typical distance of the
random variable X from E[X]. These quantities have different benefits:

(i) typically |X − E[X]| is not much larger than σX (proven in Section 6), while
(ii) variance behaves much better algebraically (see Proposition 5.39 and Theorem 5.50 below).

Example 5.33. We return to the game from Example 5.4. As X satisfies P(X = 1.25) = 0.4,
P(X = −1) = 0.6 and E[X] = −0.1, we deduce

Var(X) = (1.25 − (−0.1))2 · 0.4 + (−1 − (−0.1))2 · 0.6 = 1.215.

Example 5.34 (Constant random variable). We return to Example 5.5 with a random variable X
satisfying P(X = c) = 1 for some c ∈ ℝ. Then, E[X] = c and Var(X) = (c − c)² · P(X = c) = 0. This
makes sense as there is no variation in the values which X can attain.

Example 5.35. Let X follow the uniform distribution on {49, 50, 51} and Y follow the uniform
distribution on {1, . . . , 99}. Then E[X] = E[Y] = 50. However the random variable X attains values
very close to E[X] and has a small standard deviation, σ_X = √(2/3). The random variable Y is evenly
spread on {1, . . . , 99}, and σ_Y is much larger; σ_Y = 28.57 . . . as shown in Example 5.40.

Definition 5.36 (Variance – continuous case). Let X be a random variable with density f_X and
well-defined expectation. Then the variance of X is given by

Var(X) = ∫_{−∞}^∞ (x − E[X])² · f_X(x) dx,

provided the integral exists, and the corresponding standard deviation by σ_X = √Var(X).

Remark 5.37. More generally, for g : ℝ → ℝ, if g(X) has a well-defined expectation, then

Var(g(X)) = ∫_{−∞}^∞ (g(x) − E[g(X)])² · f_X(x) dx,   σ_{g(X)} = √Var(g(X)).

Example 5.38. Coming back to Example 5.30, recall that we had X ∼ unif[0, 1] and were interested
in g(X) where g(x) = max(x, 1 − x). Our calculation gave E[g(X)] = 3/4.
Var(g(X)) = ∫_0^1 (max(x, 1 − x) − 3/4)² dx = ∫_0^{1/2} (1 − x − 3/4)² dx + ∫_{1/2}^1 (x − 3/4)² dx
          = [−(1/4 − x)³/3]_0^{1/2} + [(x − 3/4)³/3]_{1/2}^1 = (4 · (1/4)³)/3 = 1/48.

5.4 Key properties of variance

Proposition 5.39. Let X be a discrete or continuous random variable with well-defined expectation
and variance. Then the following hold:

(i) Var(X) = E[X 2 ] − (E[X])2 ,

(ii) given a, b ∈ R we have Var(aX + b) = a2 · Var(X),

(iii) Var(X) = 0 if and only if P(X = x0 ) = 1 for some x0 ∈ R.

Proof. We will prove (i)–(iii) assuming X is a discrete random variable. For (i) note that

Var(X) = E[(X − E[X])²] = E[X² − 2 · X · E[X] + E[X]²]
       = E[X²] − 2E[X] · E[X] + E[X]²
       = E[X²] − E[X]².

The second equality holds by expanding (X − E[X])² and the third holds by Theorem 5.15 (i).
To see (ii) note that by Theorem 5.15 (i) we have E[aX + b] = a · E[X] + b. It follows that

Var(aX + b) = E[(aX + b − E[aX + b])²] = E[(aX − a · E[X])²] = a² Var(X).

To prove (iii), note that if P(X = x_0) = 1 then given y ∈ ℝ \ {x_0} we have P(X = y) ≤ P(X ≠ x_0) = 0.
It follows that E[X] = x_0 · 1 + 0 = x_0 and Var(X) = (x_0 − E[X])² · 1 + 0 = 0.
On the other hand, suppose Var(X) = 0. Then given any y ∈ S_X we have

(y − E[X])² · P(X = y) ≤ Σ_{x∈S_X} (x − E[X])² · P(X = x) = Var(X) = 0.

Thus if y ≠ E[X] then P(X = y) = 0. It follows that P(X = E[X]) + 0 = Σ_{x∈S_X} P(X = x) = P(Ω) = 1,
and so P(X = E[X]) = 1.

Example 5.40 (Discrete uniform). Let X follow the uniform distribution on {1, . . . , n}. Then by
Example 5.6 we have E[X] = (n + 1)/2. As

E[X²] = Σ_{i=1}^n i²/n = n(n + 1)(2n + 1)/(6n) = (n + 1)(2n + 1)/6,

by Proposition 5.39 (i) we obtain that Var(X) = E[X²] − (E[X])² = (n² − 1)/12.

Example 5.41 (Bernoulli). If X ∼ Berp then E[X] = p by Example 5.21 and we find that Var(X) =
(1 − p)2 · p + p2 · (1 − p) = p(1 − p).

Example 5.42 (Binomial). Var(X) = np(1 − p) for X ∼ binn,p (see problem sheet 4).

Example 5.43 (Hypergeometric). If X ∼ hyp_{n,r,t} then Var(X) = n · (r/t) · ((t − r)/t) · ((t − n)/(t − 1)). We will not
prove this, but if you are interested, see Example 8.30 in Introduction to Probability by Anderson,
Seppäläinen and Valkó.

Example 5.44 (Geometric). Let X ∼ geo_p where p ∈ (0, 1). Letting q := 1 − p we have

E[X²] = Σ_{k=1}^∞ k²(1 − q)q^{k−1} = Σ_{k=0}^∞ (k + 1)²q^k − Σ_{k=1}^∞ k²q^k = 2 Σ_{k=1}^∞ kq^k + Σ_{k=0}^∞ q^k = 2 · (q/p) · E[X] + 1/p.

Since E[X] = 1/p by Example 5.8, we find Var(X) = E[X²] − (E[X])² = 2q/p² + 1/p − 1/p² = q/p² = (1 − p)/p².

Example 5.45 (Poisson). Let X ∼ Poi_λ with λ > 0. In Example 5.9 we saw that E[X] = λ. A direct
computation of E[X²] is cumbersome, and it is better to consider E[X(X − 1)]:

E[X(X − 1)] = Σ_{k=2}^∞ k(k − 1)e^{−λ} λ^k/k! = e^{−λ} Σ_{k=2}^∞ λ^k/(k − 2)! = λ²e^{−λ} Σ_{ℓ=0}^∞ λ^ℓ/ℓ! = λ²e^{−λ} e^λ = λ²,

where we substituted ℓ = k − 2. We obtain Var(X) = E[X²] − (E[X])² = E[X(X − 1)] + E[X] − (E[X])² = λ² + λ − λ² = λ.

Example 5.46. (Exercise!) Taking Y = g(X) as in Example 5.30, you can check that Var(Y ) = 1/48.
Similarly for the random variable X from Example 5.14, we have Var(X) = π 2 /2 − 2.

Example 5.47 (Continuous uniform distribution). Let X ∼ unif[a, b]. In Example 5.12 we had
already shown that E[X] = (a + b)/2. To find the variance, compute

E[X²] = (b − a)^{−1} ∫_a^b x² dx = (b³ − a³)/(3(b − a)) = (a² + ab + b²)/3.

Thus, Var(X) = E[X²] − (E[X])² = (b − a)²/12.

Example 5.48 (Exponential). Let X ∼ exp_λ where λ > 0. Then

E[X²] = ∫_0^∞ λx² e^{−λx} dx = [−x² e^{−λx}]_0^∞ + 2 ∫_0^∞ x e^{−λx} dx = 0 + 2E[X]/λ = 2/λ²,

using Example 5.13. This gives Var(X) = E[X²] − (E[X])² = 1/λ².
Example 5.49 (Normal). Again let N ∼ N(0, 1) with density f(x) = (2π)^{−1/2} e^{−x²/2}, x ∈ ℝ, recalling
that E[N] = 0 from Example 5.19. To compute Var(N) = E[N²] − (E[N])² = E[N²], we perform integration
by parts with u = x, v′ = xe^{−x²/2}:

Var(N) = E[N²] = (1/√(2π)) ∫_{−∞}^∞ x · xe^{−x²/2} dx = (1/√(2π)) ([−xe^{−x²/2}]_{−∞}^∞ + ∫_{−∞}^∞ e^{−x²/2} dx) = ∫_{−∞}^∞ f(x) dx = 1.

As µ + σN ∼ N(µ, σ²), it follows from Proposition 5.39 (ii) that Var(X) = σ² for X ∼ N(µ, σ²).

The following theorem shows that variance is additive for independent random variables.

Theorem 5.50. Let X_1, . . . , X_n be discrete or continuous random variables with well-defined expectations
and variances. If X_1, . . . , X_n are independent, then

Var(X_1 + · · · + X_n) = Σ_{i=1}^n Var(X_i).

Proof. We again prove this only for discrete random variables. It is convenient to set µ_i = E[X_i] for
each i ∈ [n]. By linearity of expectation we have E[Σ_{i=1}^n X_i] = Σ_{i=1}^n E[X_i] = Σ_{i=1}^n µ_i. Secondly,
given 1 ≤ i ≠ j ≤ n the random variables X_i and X_j are independent, so the random variables X_i − µ_i
and X_j − µ_j are also independent by Proposition 3.38. Thus

Var(Σ_{i=1}^n X_i) = E[(Σ_{i=1}^n X_i − E[Σ_{i=1}^n X_i])²] = E[(Σ_{i=1}^n (X_i − µ_i))²]
                  = E[Σ_{i=1}^n Σ_{j=1}^n (X_i − µ_i) · (X_j − µ_j)]
                  = Σ_{i=1}^n E[(X_i − µ_i)²] + Σ_{i=1}^n Σ_{j=1, j≠i}^n E[(X_i − µ_i) · (X_j − µ_j)]
                  = Σ_{i=1}^n Var(X_i) + Σ_{i=1}^n Σ_{j=1, j≠i}^n E[X_i − µ_i] · E[X_j − µ_j]
                  = Σ_{i=1}^n Var(X_i).

The second-to-last equality uses that Var(X_i) = E[(X_i − µ_i)²] for all i = 1, . . . , n and, by
Theorem 5.15 (ii), that X_i − µ_i and X_j − µ_j are independent for i ≠ j. The final equality holds as
E[X_i − µ_i] = 0 for all i = 1, . . . , n.

Example 5.51. Let X, Y be independent random variables with Var(X) = 2 and Var(Y ) = 5. What
is Var(3X − 5Y )? By Proposition 3.38 the random variables 3X, −5Y are independent and so

Var(3X − 5Y ) = Var(3X) + Var(−5Y ) = 32 Var(X) + (−5)2 Var(Y ) = 18 + 125 = 143.

Remark 5.52. Let X, Y be random variables with well-defined variances. In general, we do not have
Var(X + Y) = Var(X) + Var(Y). For example, let X be an arbitrary random variable with well-defined
variance Var(X) > 0 and set Y = −X. Then 0 = Var(X + Y) ≠ Var(X) + Var(Y) = 2Var(X), where
we used Proposition 5.39 (ii) with a = −1, b = 0 in the last step.

Example 5.53 (Binomial). Let X ∼ bin_{n,p}. As in Example 5.23 we have X = Σ_{i=1}^n X_i with
X_i ∼ Ber_p for all i = 1, . . . , n. By additivity of variances for independent random variables, since the
random variables X_1, . . . , X_n are independent (the events A_1, . . . , A_n are independent), we obtain

Var(X) = Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i) = np(1 − p),

where we used the expression for the variance of a Bernoulli random variable from Example 5.41.

For fixed p ∈ [0, 1], the value E[X] in Example 5.53 is proportional to n as n → ∞, while the standard
deviation σ_X grows like √n. In particular, we find that the fluctuations of X are much smaller than
E[X] for large n.

5.5 Median

Another quantity which plays an important role in statistics is the median.

Definition 5.54 (Median). Let X be a discrete or continuous random variable. A value m ∈ ℝ is
called a median of X if P(X ≥ m) ≥ 1/2 and P(X ≤ m) ≥ 1/2.

Remark 5.55.

• In general, the median of a random variable is not necessarily unique.

• Like the expectation, the median gives an estimate of the ‘typical value’ of X. Statisticians tend
to have a soft spot for the median and often choose it over the expectation to represent data.
One reason is that it is more stable and less sensitive to extreme values than the expectation.

• On the downside, the median doesn’t behave very well algebraically and the median analogues of
the useful properties in Theorem 5.15 all tend to fail. In particular, the median is rarely linear.

Example 5.56 (Continuous uniform distribution). Let X ∼ unif[a, b] with distribution function

F_X(t) = 0                 if t < a,
         (t − a)/(b − a)   if a ≤ t ≤ b,
         1                 if t > b.

Solving F_X(t) = 1/2 shows that m = (a + b)/2 = E[X] is the unique median of X.

Example 5.57 (Discrete uniform distribution). Let X have the uniform distribution on {1, . . . , n}.
Given x ∈ ℝ with 1 ≤ x ≤ n, we have P(X ≤ x) = ⌊x⌋/n and P(X ≥ x) = (⌊n − x⌋ + 1)/n. If n is
odd, then E[X] = (n + 1)/2 is the unique median of X. If n is even, however, then every number in the
interval [n/2, n/2 + 1] is a median of X.

Example 5.58. We roll a fair die and as usual let Ω = {1, . . . , 6} and P denote the uniform distribution
on Ω. We also let X : Ω → ℝ denote the outcome of the roll, so that X(i) = i. As seen above,
any number in [3, 4] is a median of X.
Now instead assign a new value K to the face 6, where K is some very large number (maybe K = 1000).
Let Y be the outcome of rolling the modified die. Then, for K ≥ 4, any median of Y still lies in
the interval [3, 4]. However, we find E[Y] = (K + 15)/6 and so the mean changes drastically⁷.

⁷ One can show that, for any median m, we have |E[X] − m| ≤ σ_X, so median and expectation are close to each other
if σ_X is small. (A proof of this inequality is difficult.)

5.6 Statistical applications

Hypothesis tests. Randomised double-blind clinical trials work as follows: a group of patients is
randomly subdivided into two subgroups T (for treatment) and C (for control). Patients in group T
are given a certain treatment while group C patients receive placebos. To minimise any kind of bias
neither participants nor doctors know the partition into groups T and C.
In 1948 the first randomised clinical trial was held in the UK leading to a breakthrough in tuberculosis
treatment using the antibiotic streptomycin. Six months of treatment saw the following results:

                   improvement   no improvement   total
treatment (T)           39             16           55
no treatment (C)        17             35           52
total                   56             51          107

The numbers look convincing, but one might still ask if such an extreme result could have occurred
by coincidence. Assuming that the conditions of 56 of 107 patients were determined to improve (inde-
pendently of any treatment), the number X of those patients in group T follows the hypergeometric
distribution with n = 55, r = 56 and t = 107. Thus, E[X] = 55 · 56/107 = 28.78 . . . The probability
for a deviation of at least 39 − 28.78 . . . from the mean is
P(X ≤ 18) + P(X ≥ 39) = Σ_{k=4}^{18} C(56, k) · C(51, 55 − k)/C(107, 55) + Σ_{k=39}^{55} C(56, k) · C(51, 55 − k)/C(107, 55) ≈ 0.001.

This probability is called the p-value associated with the data. In hypothesis testing, one fixes a
significance level α (often 0.01, 0.05 or 0.10) and rejects the null hypothesis (here: treatment has no
effect) if the p-value is smaller than α.⁸
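The tail sum above can be reproduced with the standard library alone (a Python sketch of our own; `hyp_pmf` is our helper name):

```python
from math import comb

# Hypergeometric tail probabilities for the streptomycin table:
# n = 55 patients in group T drawn from t = 107, of whom r = 56 improve.
def hyp_pmf(k, n=55, r=56, t=107):
    return comb(r, k) * comb(t - r, n - k) / comb(t, n)

# The support runs from k = n - (t - r) = 4 up to k = 55.
p_value = (sum(hyp_pmf(k) for k in range(4, 19))
           + sum(hyp_pmf(k) for k in range(39, 56)))   # small p-value
```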

Mean estimation. A central problem in statistics is the following: given independent realisations
X̂_1, . . . , X̂_n (that is, data) from an unknown distribution which has mean µ and variance σ² ≥ 0, we
would like to estimate µ. The obvious choice is the sample mean

µ̂_n = n^{−1}(X̂_1 + · · · + X̂_n).

By linearity of expectation, we find E[µ̂_n] = µ. Further, by Proposition 5.39 (ii) and Theorem 5.50
we have Var(µ̂_n) = σ²/n. As n → ∞, the fluctuations of µ̂_n around its expectation µ become smaller,
giving that µ̂_n is extremely likely⁹ to be a good estimator for the true mean µ.
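The relation Var(µ̂_n) = σ²/n is easy to see in a small simulation (a Python illustration of our own; parameters and seed are arbitrary), here with unif[0, 1] data so that σ² = 1/12:

```python
import random

# Empirical variance of the sample mean of n uniform[0, 1] variables;
# theory predicts (1/12)/n.
def sample_mean_variance(n, trials=20_000, seed=2):
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    grand = sum(means) / trials
    return sum((m - grand) ** 2 for m in means) / trials
```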

⁸ This line of argument is known as Fisher's exact test after Sir Ronald Fisher (1890–1962), the founder of modern
statistical science. You will study hypothesis tests in much greater detail in the Year 2 module 2S.
⁹ The law of large numbers, discussed in Section 6, will make this statement more precise.

Variance estimation. In the setting of the previous example, we sometimes also want to estimate
σ². The obvious choice would be n^{−1}((X̂_1 − µ)² + · · · + (X̂_n − µ)²). Indeed, this quantity is a good
approximation for σ²; however, in applications, it is useless since we do not know the true value of µ.
Instead, we work with the sample variance
σ̂_n² := n^{−1} Σ_{i=1}^n (X̂_i − µ̂_n)².

By rearranging this expression (a one-line calculation) we obtain that σ̂_n² = n^{−1} Σ_{i=1}^n X̂_i² − µ̂_n². Since
E[µ̂_n²] = Var(µ̂_n) + (E[µ̂_n])² = σ²/n + µ² and E[X̂_i²] = σ² + µ², it follows that

E[σ̂_n²] = ((n − 1)/n) · σ².   (4)
Similar to estimation of the mean, to show that σ̂_n² is a good estimator for σ², we would like to prove
that its variance tends to zero as n → ∞. A lengthy calculation (a page or two in length) gives that

Var(σ̂_n²) = (E[X̂_i⁴] − σ⁴)/n + (E[X̂_i⁴] + σ⁴)/n² + (E[X̂_i⁴] − σ⁴)/n³.

Provided E[X̂_i⁴] < ∞ we obtain Var(σ̂_n²) → 0 as n → ∞, and σ̂_n² is very likely to be close to E[σ̂_n²].

Remark 5.59. In statistics, we often seek¹⁰ an estimator θ̂ for a parameter θ. The estimator is said
to be unbiased if θ is the average value (or expectation) of θ̂. From above, E[µ̂_n] = µ and so µ̂_n is an
unbiased estimator for µ. On the other hand, from (4) we see that σ̂_n² is a biased estimator. This can
be corrected by instead taking the unbiased estimator

(n − 1)^{−1} Σ_{i=1}^n (X̂_i − µ̂_n)².
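The bias in (4) and its correction can be verified exactly in a tiny case (a Python illustration of our own): n = 2 samples from Ber_{1/2}, where σ² = 1/4, so (4) predicts E[σ̂_n²] = 1/8 while the corrected estimator averages to 1/4:

```python
from itertools import product

# Average the two variance estimators over all 2^2 equally likely samples
# of size n = 2 from Ber_{1/2} (true variance 1/4).
n = 2
samples = list(product([0, 1], repeat=n))

def var_estimate(xs, denom):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / denom

biased = sum(var_estimate(s, n) for s in samples) / len(samples)        # 1/8
unbiased = sum(var_estimate(s, n - 1) for s in samples) / len(samples)  # 1/4
```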

Most important takeaways in this chapter. You should

• know the definitions of expectation, variance, standard deviation and median for discrete and
continuous random variables,

• be familiar with the main properties of expectation and variance,

• be able to compute expectation, variance and median in simple examples,

• know the expectation of the binomial, hypergeometric, geometric, Poisson, discrete and continuous
uniform, exponential and normal distributions,

• know the variance of the binomial, Poisson, geometric and normal distributions,

• be familiar with the Bernoulli distribution and know its expectation and variance,

• be able to exploit linearity of expectation to compute the expectation of complicated random
variables,

• appreciate the concept of hypothesis tests.


¹⁰ E.g. selecting a group of voters to estimate the proportion of an electorate in favour of a referendum result.

Tables of formula and comparisons

The following table extends the previous table from Section 4 (page 3), to include expressions for the
expectation and variance.

                            discrete r.v.                    continuous r.v.

mass/density function       p_X(k), k ∈ S_X                  f_X(x), x ∈ ℝ

distribution function F_X   Σ_{k∈S_X, k≤t} p_X(k)            ∫_{−∞}^t f_X(x) dx
                            (F_X is a step function)         (F_X is continuous)

connection                  P(X = k) = F_X(k) − P(X < k)     f_X(x) = F_X′(x)

expectation E[X]            Σ_{k∈S_X} k · p_X(k)             ∫_{−∞}^∞ x · f_X(x) dx

expectation E[g(X)]         Σ_{k∈S_X} g(k) · p_X(k)          ∫_{−∞}^∞ g(x) · f_X(x) dx

variance Var(X)             E[X²] − (E[X])²                  E[X²] − (E[X])²

The second table below gives formulae for the expectation and variance of common random variables,
and a reference to the example above in which each formula was derived¹¹.

distribution           parameters   expectation   Ex. no.   variance                                      Ex. no.

(discrete) uniform     n            (n + 1)/2     5.6       (n² − 1)/12                                   5.40
Bernoulli              p            p             5.21      p(1 − p)                                      5.41
binomial               n, p         np            5.23      np(1 − p)                                     5.53
hypergeometric         n, r, t      n · r/t       5.24      n · (r/t) · ((t − r)/t) · ((t − n)/(t − 1))   5.43
geometric              p            1/p           5.8       (1 − p)/p²                                    5.44
Poisson                λ            λ             5.9       λ                                             5.45
(continuous) uniform   a < b        (a + b)/2     5.12      (b − a)²/12                                   5.47
exponential            λ            1/λ           5.13      1/λ²                                          5.48
normal                 µ, σ²        µ             5.19      σ²                                            5.49

¹¹ In the electronic version of these notes, the example number references all have hyperlinks, which might be convenient.

