Random Variables: Theory Homework
(CSE 103, Topic 3: Random variables, expectation, and variance, Winter 2010)
In other words, the outcomes ω = 1, 2 map to X = 0, while the outcomes ω = 3, 4, 5, 6 map to X = 1. The
r.v. X takes on values {0, 1}, with probabilities Pr(X = 0) = 1/3 and Pr(X = 1) = 2/3.
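To make this concrete, here is a small Python simulation of this random variable (our sketch, not part of the original notes; the seed and sample size are arbitrary choices):

```python
import random

random.seed(0)
trials = 100_000

# X = 0 if the die shows 1 or 2, and X = 1 if it shows 3, 4, 5, or 6.
counts = {0: 0, 1: 0}
for _ in range(trials):
    roll = random.randint(1, 6)
    counts[0 if roll <= 2 else 1] += 1

print(counts[0] / trials)  # close to Pr(X = 0) = 1/3
print(counts[1] / trials)  # close to Pr(X = 1) = 2/3
```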
Or say you roll this same die n times, so that the sample space is Ω = {1, 2, 3, 4, 5, 6}^n. Examples of
random variables on this larger space are X, the number of sixes rolled, and Y, the number of ones rolled.
The sample point ω = (1, 1, 1, 1, . . . , 1, 6), for instance, would map to X = 1, Y = n − 1. The variable X
takes values in {0, 1, 2, . . . , n}, with
$$\Pr(X = k) = \binom{n}{k} \Bigl(\frac{1}{6}\Bigr)^{k} \Bigl(\frac{5}{6}\Bigr)^{n-k}.$$
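As a quick numerical check of this formula (our addition; n = 10 is an arbitrary choice), math.comb gives the binomial coefficient:

```python
import math

n = 10  # number of rolls, chosen arbitrarily for illustration

def pr(k: int) -> float:
    """Pr(X = k): probability of exactly k sixes in n rolls of a fair die."""
    return math.comb(n, k) * (1 / 6) ** k * (5 / 6) ** (n - k)

print(pr(3))                             # Pr(X = 3) is about 0.155
print(sum(pr(k) for k in range(n + 1)))  # the probabilities sum to 1
```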
The expected value, or mean, of a random variable X is defined to be

$$E(X) = \sum_{x} x \Pr(X = x)$$

(where the summation is over all the possible values x that X can have). This is a direct generalization of
the notion of average (which is typically defined in situations where the outcomes are equally likely). If X
is a continuous random variable, then this summation needs to be replaced by an equivalent integral; but
we'll get to that later in the course.
Here are some examples.
1. Coin with bias (heads probability) p.
Define X to be 1 if the outcome is heads, or 0 if it is tails. Then

E(X) = 1 · p + 0 · (1 − p) = p.
Another random variable on this space is X², which also takes on values in {0, 1}. Notice that X² = X,
and in fact X^k = X for all k = 1, 2, 3, . . .! Thus, E(X²) = p as well. This simple case shows that, in
general, E(X²) ≠ (E(X))².
2. Fair die.
Define X to be the outcome of the roll, so X ∈ {1, 2, 3, 4, 5, 6}. Then

E(X) = 1 · (1/6) + 2 · (1/6) + 3 · (1/6) + 4 · (1/6) + 5 · (1/6) + 6 · (1/6) = 3.5.
3. Two dice.
Let X be their sum, so that X ∈ {2, 3, 4, . . . , 12}. We can calculate the probabilities of each possible
value of X and tabulate them as follows:
x           2     3     4     5     6     7     8     9    10    11    12
Pr(X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Then E(X) = Σ_x x · Pr(X = x) = 252/36 = 7.
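This table is easy to verify by brute force; here is a short enumeration sketch in Python (our addition):

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two fair dice.
dist = Counter(a + b for a in range(1, 7) for b in range(1, 7))

for x in sorted(dist):
    print(x, Fraction(dist[x], 36))   # reproduces the table above

mean = sum(x * Fraction(c, 36) for x, c in dist.items())
print(mean)                            # exactly 7
```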
4. Number of sixes in n rolls of a fair die.
Recall the random variable X from the start of these notes, with $\Pr(X = k) = \binom{n}{k}(1/6)^k(5/6)^{n-k}$. Then

$$E(X) = \sum_{k=0}^{n} k \binom{n}{k} \Bigl(\frac{1}{6}\Bigr)^{k} \Bigl(\frac{5}{6}\Bigr)^{n-k} = \frac{n}{6}.$$

The last step is somewhat mysterious; just take our word for it, and we'll get back to it later!
5. Toss a fair coin forever; how many tosses to the first heads?
Let X ∈ {1, 2, . . .} be the number of tosses until you first see heads. Then
$$\Pr(X = k) = \Pr((T, T, T, \ldots, T, H)) = \frac{1}{2^k}.$$
It follows that
$$E(X) = \sum_{k=1}^{\infty} \frac{k}{2^k} = 2.$$
We saw in class how to do this summation. The technique was based on the formula for the sum of a
geometric series: if |r| < 1, then
$$a + ar + ar^2 + \cdots = \frac{a}{1 - r}.$$
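For reference, here is one standard way to carry out that summation with this formula (a reconstruction, not necessarily the exact argument given in class): write k/2^k as a double sum and swap the order of summation.

```latex
\begin{align*}
E(X) &= \sum_{k=1}^{\infty} \frac{k}{2^k}
      = \sum_{k=1}^{\infty} \sum_{j=1}^{k} \frac{1}{2^k}
      = \sum_{j=1}^{\infty} \sum_{k=j}^{\infty} \frac{1}{2^k} \\
     &= \sum_{j=1}^{\infty} \frac{1/2^j}{1 - 1/2}
      = \sum_{j=1}^{\infty} \frac{1}{2^{j-1}}
      = \frac{1}{1 - 1/2} = 2,
\end{align*}
```

where the last two steps are both instances of the geometric series formula with r = 1/2.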
6. Toss a coin with bias p forever; how many tosses to the first heads?
Once again, X ∈ {1, 2, . . .}, but this time the distribution is different:

$$\Pr(X = k) = (1 - p)^{k-1} \, p,$$

since seeing the first heads on toss k means k − 1 tails followed by a heads. A similar summation then shows that E(X) = 1/p.
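A quick simulation makes E(X) = 1/p plausible (our sketch; the bias 0.3, seed, and sample size are arbitrary):

```python
import random

random.seed(1)
p = 0.3            # coin bias, chosen arbitrarily for illustration
trials = 100_000

def tosses_until_heads() -> int:
    """Toss a p-biased coin until the first heads; return the number of tosses."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

avg = sum(tosses_until_heads() for _ in range(trials)) / trials
print(avg, 1 / p)  # the sample mean should be close to 1/p ≈ 3.33
```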
A very useful fact is that expectation is linear: for any two random variables X and Y, E(X + Y) = E(X) + E(Y).
For example, recall our earlier example about two rolls of a die, in which we let X be the sum of the rolls
and derived E(X) by first computing Pr(X = x) for all x ∈ {2, 3, . . . , 12}. Well, now we can do it much
more easily: simply write X1 for the first roll and X2 for the second roll, so that X = X1 + X2. We already
know E(Xi) = 3.5, so E(X) = E(X1) + E(X2) = 7.
More generally, for any random variables X1, X2, . . . , Xn,

$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n).$$
2. Toss n coins of bias p and let X be the number of heads. What is E(X)?
Let Xi be 1 if the ith coin turns up heads, and 0 if it turns up tails. Then E(Xi) = p, and since
X = X1 + · · · + Xn, we have E(X) = np.
3. Toss n coins of bias p; what is the expected number of times HTH appears in the resulting sequence?
Let Xi be 1 if there is an occurrence of HTH starting at position i (so 1 ≤ i ≤ n − 2). The total
number of such occurrences is X = X1 + X2 + · · · + X_{n−2}. Since E(Xi) = p²(1 − p), we have
E(X) = (n − 2)p²(1 − p).
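Note that these Xi are not independent (overlapping windows share tosses), yet linearity still applies. A Monte Carlo sketch of this calculation (our addition; n = 20, p = 0.5, and the seed are arbitrary):

```python
import random

random.seed(2)
n, p = 20, 0.5     # sequence length and bias, chosen arbitrarily
trials = 200_000

def count_hth(seq) -> int:
    """Count occurrences of the pattern H, T, H (overlaps allowed)."""
    return sum(1 for i in range(len(seq) - 2)
               if seq[i:i + 3] == ['H', 'T', 'H'])

total = 0
for _ in range(trials):
    seq = ['H' if random.random() < p else 'T' for _ in range(n)]
    total += count_hth(seq)

print(total / trials)            # empirical expectation
print((n - 2) * p**2 * (1 - p))  # (n - 2) p^2 (1 - p) = 2.25 here
```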
4. Coupon collector: there are k different types of coupon, and each box you buy contains one coupon type
chosen uniformly at random; how many boxes do you need to buy, in expectation, before you have all k types?
Write the total number of boxes as X = X1 + X2 + · · · + Xk, where Xi is the number of boxes you buy
while you have exactly i − 1 distinct types. While you are waiting for the ith new type, each box succeeds
with probability (k − i + 1)/k, so Xi is geometric and

$$E(X_i) = \frac{k}{k - i + 1}.$$

Invoking linearity of expectation,

$$E(X) = \sum_{i=1}^{k} \frac{k}{k - i + 1} = k \Bigl(1 + \frac{1}{2} + \cdots + \frac{1}{k}\Bigr) \approx k \ln k.$$

This confirms our earlier observations about the coupon collector problem: you need to buy about k ln k
boxes.
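A simulation sketch of the coupon collector bound (our addition; k = 50 and the other parameters are arbitrary):

```python
import math
import random

random.seed(3)
k = 50             # number of coupon types, chosen arbitrarily
trials = 2_000

def boxes_needed(k: int) -> int:
    """Buy boxes with uniformly random coupon types until all k are seen."""
    seen, boxes = set(), 0
    while len(seen) < k:
        seen.add(random.randrange(k))
        boxes += 1
    return boxes

avg = sum(boxes_needed(k) for _ in range(trials)) / trials
harmonic = sum(1 / i for i in range(1, k + 1))
print(avg)              # empirical average
print(k * harmonic)     # exact expectation k(1 + 1/2 + ... + 1/k) ≈ 225
print(k * math.log(k))  # the k ln k approximation ≈ 196
```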
As another application, toss m balls into n bins, and for each pair of balls i < j let Xij be 1 if balls i and j
land in the same bin (a collision), so that the total number of collisions is X = Σ_{i<j} Xij. Since
E(Xij) = 1/n (do you see why?), it follows that the expected number of collisions is

$$E(X) = \binom{m}{2} \cdot \frac{1}{n} = \frac{m(m-1)}{2n}.$$
So if m < √(2n), the expected number of collisions is less than 1, which means that typically every ball goes
into a different bin. This relates back to the birthday paradox, where m is close to the threshold √(2n).
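In the birthday setting, m = 23 people and n = 365 days give m(m − 1)/2n ≈ 0.69 expected shared-birthday pairs, which a simulation confirms (our sketch; the seed and trial count are arbitrary):

```python
import random
from itertools import combinations

random.seed(4)
m, n = 23, 365     # the classic birthday parameters
trials = 20_000

total = 0
for _ in range(trials):
    bins = [random.randrange(n) for _ in range(m)]
    # Count colliding pairs: pairs of balls that landed in the same bin.
    total += sum(1 for i, j in combinations(range(m), 2) if bins[i] == bins[j])

print(total / trials)         # empirical expected number of collisions
print(m * (m - 1) / (2 * n))  # m(m-1)/2n ≈ 0.693
```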
Random variables X and Y are independent if

$$\Pr(X = x, Y = y) = \Pr(X = x) \cdot \Pr(Y = y)$$

for all x, y. In words, the joint distribution of (X, Y) factors into the product of the individual distributions.
This also implies, for instance, that

E(XY) = E(X)E(Y).

For example, here is the joint distribution of a pair of independent random variables X and Y, each taking
values in {−1, 0, 1} (the entry in row x, column y is Pr(X = x, Y = y)):

          Y = −1   Y = 0   Y = 1
X = −1     0.40     0.16    0.24
X = 0      0.05     0.02    0.03
X = 1      0.05     0.02    0.03

Every entry is the product of the corresponding marginals; for instance, Pr(X = −1) = 0.8 and
Pr(Y = 0) = 0.2, and indeed Pr(X = −1, Y = 0) = 0.16 = 0.8 × 0.2.
Another useful fact is that f (X) and g(Y ) must also be independent, for any functions f and g.
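Here is a small Python check of both facts on this table (our addition): the joint distribution factors into its marginals, and E(XY) = E(X)E(Y).

```python
from itertools import product

# The joint distribution from the table above; keys are (x, y) pairs.
joint = {
    (-1, -1): 0.40, (-1, 0): 0.16, (-1, 1): 0.24,
    ( 0, -1): 0.05, ( 0, 0): 0.02, ( 0, 1): 0.03,
    ( 1, -1): 0.05, ( 1, 0): 0.02, ( 1, 1): 0.03,
}

vals = (-1, 0, 1)
px = {x: sum(joint[x, y] for y in vals) for x in vals}  # marginal of X
py = {y: sum(joint[x, y] for x in vals) for y in vals}  # marginal of Y

# Independence: every joint entry equals the product of its marginals.
assert all(abs(joint[x, y] - px[x] * py[y]) < 1e-12
           for x, y in product(vals, repeat=2))

e_xy = sum(x * y * joint[x, y] for x, y in joint)
e_x = sum(x * p for x, p in px.items())
e_y = sum(y * p for y, p in py.items())
print(e_xy, e_x * e_y)  # both approximately 0.14: E(XY) = E(X)E(Y)
```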
3.5 Variance
If you need to summarize a probability distribution by a single number, then the mean is a reasonable choice,
although often the median is better advised (more on this later). But neither the mean nor the median
captures how spread out the distribution is.
Look at the following two distributions:
[Figure: two distributions, both centered at 100.]
They both have the same expectation, 100, but one is concentrated near the middle while the other is pretty
flat. To distinguish between them, we are interested not just in the mean μ = E(X), but also in the typical
distance from the mean, E(|X − μ|). It turns out to be mathematically convenient to work with the square
instead: the variance of X is defined to be

var(X) = E((X − μ)²).

In the above example, the distribution on the right has a higher variance than the one on the left.

3.5.1 Properties of the variance

Expanding the square and invoking linearity gives a second formula for the variance:

\begin{align*}
\mathrm{var}(X) &= E((X - \mu)^2) \\
                &= E(X^2 + \mu^2 - 2\mu X) \\
                &= E(X^2) + E(\mu^2) - E(2\mu X) && \text{(linearity)} \\
                &= E(X^2) + \mu^2 - 2\mu E(X) \\
                &= E(X^2) + \mu^2 - 2\mu^2 \;=\; E(X^2) - \mu^2.
\end{align*}
3. For any random variable X, it must be the case that E(X²) ≥ (E(X))².
This is simply because var(X) = E(X²) − (E(X))² ≥ 0.
4. E(|X − μ|) ≤ √var(X).
If you apply the previous property to the random variable |X − μ| instead of X, you get
E(|X − μ|²) ≥ (E(|X − μ|))². Therefore, E(|X − μ|) ≤ √(E(|X − μ|²)) = √var(X).
The last property tells us that √var(X) is a good measure of the typical spread of X: how far it typically
lies from its mean. We call this the standard deviation of X.
3.5.2 Examples
1. Suppose you toss a coin with bias p, and let X be 1 if the outcome is heads, or 0 if the outcome is tails.
Let's look at the distributions of X and of X²:

Prob     X    X²
p        1    1
1 − p    0    0

From this table, E(X) = p and E(X²) = p. Thus the variance is var(X) = E(X²) − (E(X))² = p − p² = p(1 − p).
2. Roll a 4-sided die (a tetrahedron) in which each face is equally likely to come up, and let the outcome
be X ∈ {1, 2, 3, 4}.
We have two formulas for the variance:

var(X) = E((X − μ)²)   and   var(X) = E(X²) − μ²,

where μ = E(X). Let's try both and make sure we get the same answer. First of all, μ = E(X) =
(1 + 2 + 3 + 4)/4 = 2.5. Now, let's tabulate the distributions of X² and (X − μ)²:

Prob    X    X²    (X − μ)²
1/4     1     1      2.25
1/4     2     4      0.25
1/4     3     9      0.25
1/4     4    16      2.25

From the table, E((X − μ)²) = (2.25 + 0.25 + 0.25 + 2.25)/4 = 1.25, while E(X²) = (1 + 4 + 9 + 16)/4 = 7.5,
so E(X²) − μ² = 7.5 − 6.25 = 1.25. The two formulas agree.
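The same double-check in exact arithmetic, as a short sketch (our addition):

```python
from fractions import Fraction

values = [1, 2, 3, 4]
p = Fraction(1, 4)  # each face of the tetrahedron is equally likely

mu = sum(x * p for x in values)                  # E(X) = 5/2
var1 = sum((x - mu) ** 2 * p for x in values)    # E((X - mu)^2)
var2 = sum(x * x * p for x in values) - mu ** 2  # E(X^2) - mu^2

print(mu, var1, var2)  # 5/2, 5/4, 5/4 -- the two formulas agree
```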
As a final example, let X be the number of fixed points of a random permutation of n elements, written as
X = X1 + · · · + Xn, where Xi is 1 if the permutation maps element i to itself and 0 otherwise; each
E(Xi) = 1/n, so E(X) = 1. For the variance we need E(X²):

\begin{align*}
E(X^2) &= E((X_1 + \cdots + X_n)^2) \\
       &= E\Bigl(\sum_{i=1}^{n} X_i^2 + \sum_{i \neq j} X_i X_j\Bigr) \\
       &= \sum_{i} E(X_i^2) + \sum_{i \neq j} E(X_i X_j) \\
       &= n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n(n-1)} \;=\; 2.
\end{align*}
Thus var(X) = E(X²) − (E(X))² = 2 − 1 = 1. This means that the number of fixed points has mean 1 and
variance 1: in short, it is quite unlikely to be very much larger than 1.
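A simulation sketch of this (our addition; n = 10 and the seed are arbitrary):

```python
import random

random.seed(5)
n = 10
trials = 100_000

def fixed_points(n: int) -> int:
    """Number of indices i with perm[i] == i in a uniform random permutation."""
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for i, v in enumerate(perm) if i == v)

samples = [fixed_points(n) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(mean, var)  # both close to 1, as computed above
```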
The standard deviation quantifies the spread of the distribution whereas the mean specifies its location. If
you increase all values of X by 10, then the distribution will shift to the right and the mean will increase by
10. But the spread of the distribution, and thus the standard deviation, will remain unchanged.
On the other hand, if you double all values of X, then its distribution becomes twice as wide, and thus
its standard deviation is doubled. This means that its variance, which is the square of the standard
deviation, gets multiplied by 4.
In summary, for any constants a, b:
var(aX + b) = a² var(X).
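A final numerical check of this rule (our sketch; the constants a = 3, b = 10 and the die-roll distribution are arbitrary choices):

```python
import random

random.seed(6)
a, b = 3.0, 10.0   # arbitrary constants for illustration
trials = 100_000

xs = [random.randint(1, 6) for _ in range(trials)]  # rolls of a fair die

def variance(data):
    m = sum(data) / len(data)
    return sum((v - m) ** 2 for v in data) / len(data)

print(variance([a * x + b for x in xs]))  # empirical var(aX + b)
print(a * a * variance(xs))               # a^2 var(X): nearly identical
```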