
CSE 103: Probability and statistics Winter 2010

Topic 3 Random variables, expectation, and variance

3.1 Random variables


A random variable (r.v.) is defined on a probability space (Ω, Pr) and is a mapping from Ω to R.
The value of the random variable is fully determined by the outcome ω ∈ Ω. Thus the underlying
probability space (probabilities Pr(ω)) induces a probability distribution over the random variable. Let's
look at some examples.
Suppose you roll a fair die. The sample space is Ω = {1, 2, 3, 4, 5, 6}, all outcomes being equally likely.
On this space we can then define a random variable

X = 1 if the die shows 3 or more,
    0 otherwise.

In other words, the outcomes ω = 1, 2 map to X = 0, while the outcomes ω = 3, 4, 5, 6 map to X = 1. The
r.v. X takes on values {0, 1}, with probabilities Pr(X = 0) = 1/3 and Pr(X = 1) = 2/3.
Or say you roll this same die n times, so that the sample space is Ω = {1, 2, 3, 4, 5, 6}^n. Examples of
random variables on this larger space are

X = the number of 6s rolled,


Y = the number of 1s seen before the first 6.

The sample point ω = (1, 1, 1, 1, . . . , 1, 6), for instance, would map to X = 1, Y = n − 1. The variable X
takes values in {0, 1, 2, . . . , n}, with
Pr(X = k) = C(n, k) · (1/6)^k · (5/6)^(n−k),

where C(n, k) denotes the binomial coefficient "n choose k"

(do you see why?).


As a third example, suppose you throw a dart at a dartboard of radius 1, and that it lands at a random
location on the board. Define random variable X to be the distance of the dart from the center of the board.
Now X takes values in [0, 1], and for any x in this range, Pr(X ≤ x) = x².
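If you'd like to see this numerically, here is a minimal Python sketch (the helper name dart_distance
and the trial count are just illustrative): it samples uniform points on the disk by rejection from the
enclosing square and compares the empirical fraction of darts within distance x against x².

```python
import random

def dart_distance():
    """Sample a uniformly random point on the unit disk (by rejection from
    the enclosing square) and return its distance from the center."""
    while True:
        u, v = random.uniform(-1, 1), random.uniform(-1, 1)
        if u * u + v * v <= 1:
            return (u * u + v * v) ** 0.5

trials = 100_000
samples = [dart_distance() for _ in range(trials)]
for x in (0.25, 0.5, 0.75):
    empirical = sum(d <= x for d in samples) / trials
    print(f"Pr(X <= {x}): empirical {empirical:.3f}, predicted {x * x:.3f}")
```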
Henceforth, we'll follow the convention of using capital letters for r.v.'s.

3.2 The mean, or expected value


For a random variable X that takes on a finite set of possible values, the mean, or expected value, is
E(X) = Σ_x x · Pr(X = x)

(where the summation is over all the possible values x that X can have). This is a direct generalization of
the notion of average (which is typically defined in situations where the outcomes are equally likely). If X

is a continuous random variable, then this summation needs to be replaced by an equivalent integral; but
well get to that later in the course.
Here are some examples.
1. Coin with bias (heads probability) p.
Define X to be 1 if the outcome is heads, or 0 if it is tails. Then

E(X) = 0 · Pr(X = 0) + 1 · Pr(X = 1) = 0 · (1 − p) + 1 · p = p.

Another random variable on this space is X², which also takes on values in {0, 1}. Notice that X² = X,
and in fact X^k = X for all k = 1, 2, 3, . . .! Thus, E(X²) = p as well. This simple case shows that in
general, E(X²) ≠ (E(X))².
2. Fair die.
Define X to be the outcome of the roll, so X ∈ {1, 2, 3, 4, 5, 6}. Then

E(X) = 1 · (1/6) + 2 · (1/6) + 3 · (1/6) + 4 · (1/6) + 5 · (1/6) + 6 · (1/6) = 3.5.

3. Two dice.
Let X be their sum, so that X ∈ {2, 3, 4, . . . , 12}. We can calculate the probabilities of each possible
value of X and tabulate them as follows:

x           2     3     4     5     6     7     8     9     10    11    12
Pr(X = x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

This gives E(X) = 7.


4. Roll n dice; how many sixes appear?
Let X be the number of 6s. We've already analyzed the distribution of X, so

E(X) = Σ_{k=0..n} k · Pr(X = k) = Σ_{k=0..n} k · C(n, k) (1/6)^k (5/6)^(n−k) = n/6.

The last step is somewhat mysterious; just take our word for it, and we'll get back to it later!
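In the meantime, you can check the claim numerically: the sketch below (using Python's math.comb; the
helper name expected_sixes is just illustrative) evaluates the sum directly and compares it with n/6.

```python
from math import comb

def expected_sixes(n):
    """Evaluate sum_{k=0}^{n} k * C(n, k) * (1/6)**k * (5/6)**(n - k) directly."""
    return sum(k * comb(n, k) * (1/6)**k * (5/6)**(n - k) for k in range(n + 1))

for n in (1, 5, 12, 60):
    print(n, expected_sixes(n), n / 6)   # the two columns agree
```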
5. Toss a fair coin forever; how many tosses to the first heads?
Let X ∈ {1, 2, . . .} be the number of tosses until you first see heads. Then

Pr(X = k) = Pr((T, T, T, . . . , T, H)) = 1/2^k.

It follows that

E(X) = Σ_{k=1..∞} k/2^k = 2.

We saw in class how to do this summation. The technique was based on the formula for the sum of a
geometric series: if |r| < 1, then
a + ar + ar² + · · · = a/(1 − r).
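A few lines of Python also make the value of the sum plausible: the partial sums of Σ k/2^k settle down
to 2 quite quickly (the cutoff of 60 terms below is arbitrary).

```python
# Partial sums of sum_{k>=1} k / 2**k; they approach 2 as more terms are added.
total = 0.0
for k in range(1, 60):
    total += k / 2**k
print(total)   # ~2.0, up to floating-point precision
```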


6. Toss a coin with bias p forever; how many tosses to the first heads?
Once again, X ∈ {1, 2, . . .}, but this time the distribution is different:

Pr(X = k) = Pr((T, T, T, . . . , T, H)) = (1 − p)^(k−1) p.

Using the same technique as before, we get E(X) = 1/p.


There's another way to derive this expectation. We always need at least one coin toss. If we're lucky
(with probability p), we're done; otherwise (with probability 1 − p), we start again from scratch.
Therefore E(X) = 1 + (1 − p)E(X), so that E(X) = 1/p.
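Here is a small simulation sketch that matches the answer 1/p (the helper tosses_until_heads and the
choice p = 0.3 are only illustrative).

```python
import random

def tosses_until_heads(p):
    """Toss a coin with heads-probability p until the first heads; return the number of tosses."""
    count = 1
    while random.random() >= p:   # tails happens with probability 1 - p
        count += 1
    return count

p, trials = 0.3, 200_000
avg = sum(tosses_until_heads(p) for _ in range(trials)) / trials
print(f"empirical mean {avg:.3f}, predicted 1/p = {1 / p:.3f}")
```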
7. Pascal's wager: does God exist?
Here was Pascal's take on the issue of God's existence: if you believe there is some chance p > 0 (no
matter how small) that God exists, then you should behave as if God exists.
Why? Well, let the random variable X denote your amount of suffering.
Suppose you behave as if God exists (that is, you are good). This behavior incurs a significant but
finite amount of suffering (you are not able to do some of the things you would like to). Say X = 10.
On the other hand, suppose you behave as if God doesn't exist, that is, you do all the things you
want to do. If God really doesn't exist, you're fine, and your suffering is X = 0. But if God exists,
then you go straight to hell and your suffering is X = ∞. Thus your expected suffering if you behave
badly is E(X) = 0 · (1 − p) + ∞ · p = ∞.
So: to minimize your expected suffering, behave as if God exists!

3.3 Linearity of expectation


If you double each value of X, then you also double its average; that is, E(2X) = 2E(X). Likewise, if you
raise each of its values by 1, you will also increase the average by 1; that is, E(X + 1) = E(X) + 1. More
generally, for any constants a, b,
E(aX + b) = aE(X) + b.
Another exceptionally useful formula says that the mean value of the sum of variables is simply the sum of
their individual means. Formally, for any random variables X, Y ,

E(X + Y ) = E(X) + E(Y ).

For example, recall our earlier example about two rolls of a die, in which we let X be the sum of the rolls
and derived E(X) by first computing Pr(X = x) for all x ∈ {2, 3, . . . , 12}. Well, now we can do it much
more easily: simply write X1 for the first roll and X2 for the second roll, so that X = X1 + X2 . We already
know E(Xi ) = 3.5, so E(X) = 7.
More generally, for any random variables X1 , X2 , . . . , Xn ,

E(X1 + · · · + Xn) = E(X1) + · · · + E(Xn).

Some quick examples:


1. Roll n dice and let X be the number of sixes. What is E(X)?
This time, let Xi be 1 if the ith roll is a six, and 0 otherwise. Thus E(Xi ) = 1/6, so E(X) = n/6.


2. Toss n coins of bias p and let X be the number of heads. What is E(X)?
Let Xi be 1 if the ith coin turns up heads, and 0 if it turns up tails. Then E(Xi ) = p and since
X = X1 + · · · + Xn, we have E(X) = np.
3. Toss n coins of bias p; what is the expected number of times HTH appears in the resulting sequence?
Let Xi be 1 if there is an occurrence of HTH starting at position i (so 1 ≤ i ≤ n − 2). The total
number of such occurrences is X = X1 + X2 + · · · + X_{n−2}. Since E(Xi) = p²(1 − p), we have
E(X) = (n − 2)p²(1 − p).
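As a sanity check of this indicator-variable argument, here is a rough simulation; the function count_hth
and the parameters n = 50, p = 0.5 are only illustrative.

```python
import random

def count_hth(n, p):
    """Toss n coins with heads-probability p; count occurrences of the pattern H, T, H."""
    seq = ['H' if random.random() < p else 'T' for _ in range(n)]
    return sum(1 for i in range(n - 2) if seq[i:i + 3] == ['H', 'T', 'H'])

n, p, trials = 50, 0.5, 50_000
avg = sum(count_hth(n, p) for _ in range(trials)) / trials
print(f"empirical {avg:.3f}, predicted (n-2) p^2 (1-p) = {(n - 2) * p * p * (1 - p):.3f}")
```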

3.3.1 Fixed points of a permutation


The fixed points of a permutation are the numbers that remain in their original position. For instance, in
the permutation
(1, 2, 3, 4, 5, 6) → (6, 2, 5, 4, 1, 3)
the fixed points are 2 and 4. Let X be the number of fixed points in a random permutation of (1, 2, . . . , n);
what is E(X)?
Linearity is very helpful here. Define the random variable Xi to be 1 if i is a fixed point, and 0 otherwise.
Then E(Xi ) = 1/n. Therefore
E(X) = E(X1 + · · · + Xn) = n · (1/n) = 1.
The expected number of fixed points is 1, regardless of n.
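A short simulation makes this (perhaps surprising) fact concrete; the helper fixed_points below is just
an illustration.

```python
import random

def fixed_points(n):
    """Number of positions i with perm[i] == i in a uniformly random permutation of n items."""
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for i, v in enumerate(perm) if i == v)

trials = 50_000
for n in (3, 10, 100):
    avg = sum(fixed_points(n) for _ in range(trials)) / trials
    print(f"n = {n}: average number of fixed points ~ {avg:.3f}")   # close to 1 for every n
```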

3.3.2 Coupon collector, again


Recall the setting: each cereal box holds one of k action figures (chosen at random), and you want to collect
all the figures. What is the expected number of cereal boxes you need to buy?
Suppose you keep buying boxes until you get all the figures, and let X be the total number of boxes you
buy. Let Xi be the number of boxes you buy to get from i − 1 distinct figures to i distinct figures.
Therefore X = X1 + X2 + · · · + Xk, and of course X1 = 1.
What is E(Xi)? Well, you already have i − 1 of the figures, so the chance of getting a new figure in a
cereal box is (k − (i − 1))/k. Call this p. Therefore, the expected amount of time you have to wait to get a
new figure is 1/p: just like waiting for a coin with bias p to turn up heads. That is,

E(Xi) = k/(k − i + 1).
Invoking linearity of expectation,

E(X) = E(X1) + · · · + E(Xk)
     = k/k + k/(k − 1) + k/(k − 2) + · · · + k/1
     = k (1 + 1/2 + · · · + 1/k)
     ≈ k ln k.

This confirms our earlier observations about the coupon collector problem: you need to buy about k ln k
boxes.
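Here is a quick simulation sketch comparing the simulated average with the exact harmonic sum
k(1 + 1/2 + · · · + 1/k) and with k ln k; the helper boxes_needed and the choice k = 50 are illustrative.

```python
import random
from math import log

def boxes_needed(k):
    """Buy boxes (each holding one of k figures, chosen uniformly at random)
    until every figure has been seen; return how many boxes that took."""
    seen, boxes = set(), 0
    while len(seen) < k:
        seen.add(random.randrange(k))
        boxes += 1
    return boxes

k, trials = 50, 2_000
avg = sum(boxes_needed(k) for _ in range(trials)) / trials
exact = k * sum(1 / i for i in range(1, k + 1))   # k (1 + 1/2 + ... + 1/k)
print(f"simulated {avg:.1f}, exact {exact:.1f}, k ln k = {k * log(k):.1f}")
```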


3.3.3 Balls in bins, again


Toss m balls in n bins; what is the expected number of collisions? Let's make this more precise. For any
1 ≤ i < j ≤ m, define the random variable Xij to be 1 if balls i and j land in the same bin, and 0 otherwise.
Then the number of collisions is defined to be

X = Σ_{1 ≤ i < j ≤ m} Xij.

Since E(Xij) = 1/n (do you see why?), it follows that the expected number of collisions is

E(X) = C(m, 2) · (1/n) = m(m − 1)/(2n).

So if m < √(2n), the expected number of collisions is less than 1, which suggests that the balls are likely
to all land in different bins. This relates back to the birthday paradox, where m is close to the threshold √(2n).
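A small simulation sketch (with m = 23 and n = 365, the birthday setting; these values are illustrative)
matches the formula m(m − 1)/(2n). Counting C(c, 2) colliding pairs within each bin of size c is the same
as summing the indicators Xij.

```python
import random

def collisions(m, n):
    """Throw m balls into n bins uniformly at random; count pairs of balls sharing a bin."""
    counts = [0] * n
    for _ in range(m):
        counts[random.randrange(n)] += 1
    return sum(c * (c - 1) // 2 for c in counts)

m, n, trials = 23, 365, 20_000
avg = sum(collisions(m, n) for _ in range(trials)) / trials
print(f"simulated {avg:.3f}, predicted m(m-1)/2n = {m * (m - 1) / (2 * n):.3f}")
```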

3.4 Independent random variables


Random variables X and Y are independent if

Pr(X = x, Y = y) = Pr(X = x)Pr(Y = y)

for all x, y. In words, the joint distribution of (X, Y ) factors into the product of the individual distributions.
This also implies, for instance, that

Pr(X = x|Y = y) = Pr(X = x).

Which of the following pairs (X, Y ) are independent?


1. Pick a random card out of a standard deck. Define X to be 1 if it is a heart; and 0 otherwise. Define
Y to be 1 if it is a jack, queen, or king; and 0 otherwise.
2. Toss a fair coin n times, and define X to be the number of heads, and Y to be 1 if the last toss is
heads (and 0 otherwise).
3. X and Y take values in {−1, 0, 1}, and their joint distribution is given by the following table of
probabilities.

                  Y
            −1     0      1
      −1    0.40   0.16   0.24
X      0    0.05   0.02   0.03
       1    0.05   0.02   0.03
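For case 3 you can settle the question mechanically: compute the marginals of X and Y from the table and
check whether every joint probability factors into the product of marginals. The sketch below is one way
to do this (the tolerance 1e-12 is an arbitrary choice).

```python
# Joint distribution from the table above: first key component is X, second is Y.
joint = {
    (-1, -1): 0.40, (-1, 0): 0.16, (-1, 1): 0.24,
    ( 0, -1): 0.05, ( 0, 0): 0.02, ( 0, 1): 0.03,
    ( 1, -1): 0.05, ( 1, 0): 0.02, ( 1, 1): 0.03,
}
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (-1, 0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (-1, 0, 1)}

independent = all(abs(joint[(x, y)] - px[x] * py[y]) < 1e-12
                  for x in (-1, 0, 1) for y in (-1, 0, 1))
print(px, py, "independent:", independent)
```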

If X, Y are independent, they satisfy the following useful product rule:

E(XY ) = E(X)E(Y ).

Another useful fact is that f (X) and g(Y ) must also be independent, for any functions f and g.


3.5 Variance
If you need to summarize a probability distribution by a single number, then the mean is a reasonable choice,
although often the median is better advised (more on this later). But neither the mean nor the median captures
how spread out the distribution is.
Look at the following two distributions:

[Figure: two distributions, both centered at 100.]

They both have the same expectation, 100, but one is concentrated near the middle while the other is pretty
flat. To distinguish between them, we are interested not just in the mean μ = E(X), but also in the typical
distance from the mean, E(|X − μ|). It turns out to be mathematically convenient to work with the square
instead: the variance of X is defined to be

var(X) = E((X − μ)²) = E((X − E(X))²).

In the above example, the distribution on the right has a higher variance than the one on the left.

3.5.1 Properties of the variance


In what follows, take μ to be E(X).
1. The variance cannot be negative.
Since each individual value (X − μ)² is ≥ 0 (since it's a square), the average value E((X − μ)²) must be
≥ 0 as well.
2. var(X) = E(X²) − μ².
This is because

var(X) = E((X − μ)²)
       = E(X² + μ² − 2μX)
       = E(X²) + E(μ²) − E(2μX)   (linearity)
       = E(X²) + μ² − 2μ E(X)
       = E(X²) + μ² − 2μ² = E(X²) − μ².

3. For any random variable X, it must be the case that E(X²) ≥ (E(X))².
This is simply because var(X) = E(X²) − (E(X))² ≥ 0.

4. E(|X − μ|) ≤ √var(X).
If you apply the previous property to the random variable |X − μ| instead of X, you get
E(|X − μ|²) ≥ (E(|X − μ|))². Therefore, E(|X − μ|) ≤ √E(|X − μ|²) = √var(X).

The last property tells us that √var(X) is a good measure of the typical spread of X: how far it typically
lies from its mean. We call this the standard deviation of X.


3.5.2 Examples
1. Suppose you toss a coin with bias p, and let X be 1 if the outcome is heads, or 0 if the outcome is tails.
Let's look at the distribution of X and of X².

Prob     X    X²
p        1    1
1 − p    0    0

From this table, E(X) = p and E(X²) = p. Thus the variance is var(X) = E(X²) − (E(X))² = p − p² = p(1 − p).
2. Roll a 4-sided die (a tetrahedron) in which each face is equally likely to come up, and let the outcome
be X ∈ {1, 2, 3, 4}.
We have two formulas for the variance:

var(X) = E((X − μ)²)
var(X) = E(X²) − μ²

where μ = E(X). Let's try both and make sure we get the same answer. First of all, μ = E(X) =
(1 + 2 + 3 + 4)/4 = 2.5. Now, let's tabulate the distribution of X² and (X − μ)².

Prob    X    X²    (X − μ)²
1/4     1    1     2.25
1/4     2    4     0.25
1/4     3    9     0.25
1/4     4    16    2.25

Reading from this table,

E(X²) = (1 + 4 + 9 + 16)/4 = 7.5
E((X − μ)²) = (2.25 + 0.25 + 0.25 + 2.25)/4 = 1.25

The first formula for variance gives var(X) = E((X − μ)²) = 1.25. The second formula gives var(X) =
E(X²) − μ² = 7.5 − (2.5)² = 1.25, the same thing.
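Both formulas are easy to check in a couple of lines (a plain Python computation, just to confirm the
arithmetic above):

```python
values = [1, 2, 3, 4]                              # each face equally likely
mu = sum(values) / 4                               # E(X) = 2.5
var1 = sum((x - mu) ** 2 for x in values) / 4      # E((X - mu)^2)
var2 = sum(x ** 2 for x in values) / 4 - mu ** 2   # E(X^2) - mu^2
print(mu, var1, var2)                              # 2.5 1.25 1.25
```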
3. Roll a k-sided die in which each face is equally likely to come up. The outcome is X ∈ {1, 2, . . . , k}.
The expected outcome is

E(X) = (1 + 2 + · · · + k)/k = (k(k + 1)/2)/k = (k + 1)/2,

using a special formula for the sum of the first k integers. There's another for the sum of the first k
squares, from which

E(X²) = (1² + 2² + · · · + k²)/k = (k(k + 1)(2k + 1)/6)/k = (k + 1)(2k + 1)/6.

Then

var(X) = E(X²) − (E(X))² = (k + 1)(2k + 1)/6 − (k + 1)²/4 = (k² − 1)/12.

The standard deviation is thus approximately k/√12.
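The closed form (k² − 1)/12 can be checked against a brute-force computation of the variance for a few
values of k; the helper die_variance below is just for illustration.

```python
def die_variance(k):
    """Exact variance of a fair k-sided die, computed directly from the distribution."""
    mu = sum(range(1, k + 1)) / k
    return sum((x - mu) ** 2 for x in range(1, k + 1)) / k

for k in (4, 6, 20):
    print(k, die_variance(k), (k * k - 1) / 12)   # the two values agree
```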


4. X is the number of fixed points of a random permutation of (1, 2, . . . , n).


Proceeding as before, let Xi be 1 if i is a fixed point of the permutation, and 0 otherwise. Then
E(Xi) = 1/n. For i ≠ j, the product Xi Xj is 1 only if both i and j are fixed points, which occurs with
probability 1/(n(n − 1)) (why?). Thus E(Xi Xj) = 1/(n(n − 1)).
Since X is the sum of the individual Xi, we have E(X) = 1 and

E(X²) = E((X1 + · · · + Xn)²)
      = E( Σ_i Xi² + Σ_{i≠j} Xi Xj )
      = Σ_i E(Xi²) + Σ_{i≠j} E(Xi Xj)
      = n · (1/n) + n(n − 1) · (1/(n(n − 1))) = 2.

Thus var(X) = E(X²) − (E(X))² = 2 − 1 = 1. This means that the number of fixed points has mean 1 and
variance 1: in short, it is quite unlikely to be very much larger than 1.
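A simulation sketch (reusing the same kind of fixed_points helper as before; the choice n = 20 is
arbitrary) shows both the empirical mean and the empirical variance hovering around 1.

```python
import random

def fixed_points(n):
    """Number of fixed points of a uniformly random permutation of n items."""
    perm = list(range(n))
    random.shuffle(perm)
    return sum(i == v for i, v in enumerate(perm))

n, trials = 20, 100_000
xs = [fixed_points(n) for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")   # both close to 1
```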

3.5.3 Another property of the variance


Here's a cartoon picture of a well-behaved distribution with mean μ and standard deviation σ (that is,
μ = E(X) and σ² = var(X)).

The standard deviation quantifies the spread of the distribution whereas the mean specifies its location. If
you increase all values of X by 10, then the distribution will shift to the right and the mean will increase by
10. But the spread of the distribution, and thus the standard deviation, will remain unchanged.
On the other hand, if you double all values of X, then its distribution becomes twice as wide, and thus
its standard deviation is doubled. This means that its variance, which is the square of the standard
deviation, gets multiplied by 4.
In summary, for any constants a, b:

var(aX + b) = a² var(X).

Contrast this with the mean: E(aX + b) = aE(X) + b.
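A quick numerical check of this rule (the choice of a normal sample and of a = 3, b = 10 below is
arbitrary; any distribution would do):

```python
import random

def var(xs):
    """Empirical variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

samples = [random.gauss(0, 1) for _ in range(200_000)]
a, b = 3.0, 10.0
print(var(samples), var([a * x + b for x in samples]))   # second is about a^2 = 9 times the first
```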
