
Variance and standard deviation
Math 217 Probability and Statistics
Prof. D. Joyce, Fall 2014

Variance for discrete random variables. The variance of a random variable X is intended to give a measure of the spread of the random variable. If X takes values near its mean µ = E(X), then the variance should be small, but if it takes values far from µ, then the variance should be large.

The measure we'll use for distance from the mean will be the square of the distance from the mean, (x − µ)², rather than the distance from the mean, |x − µ|. There are three reasons for using the square of the distance rather than the absolute value of the difference. First, the square is easier to work with mathematically. For instance, x² has a derivative, but |x| doesn't. Second, large distances from the mean become more significant, and it has been argued that this is a desirable property. Most important, though, is that the square is the right measure to use in order to derive the important theorems in the theory of probability, in particular, the Central Limit Theorem.

Anyway, we define the variance Var(X) of a random variable X as the expectation of the square of the distance from the mean, that is,

    Var(X) = E((X − µ)²).

As the square is used in the definition of variance, we'll use the square root of the variance to normalize this measure of the spread of the random variable. The square root of the variance is called the standard deviation, denoted in our text as SD(X):

    SD(X) = √Var(X) = √(E((X − µ)²)).

Just as we have a symbol µ for the mean, or expectation, of X, we denote the standard deviation of X as σ, and so the variance is σ².

Pair of dice. Let's take as an example the roll X of one fair die. We know µ = E(X) = 3.5. What's the variance and standard deviation of X?

    Var(X) = E((X − µ)²)
           = Σ_{x=1}^{6} (x − 3.5)² P(X = x)
           = (1/6) Σ_{x=1}^{6} (x − 3.5)²
           = (1/6) [(−5/2)² + (−3/2)² + (−1/2)² + (1/2)² + (3/2)² + (5/2)²]
           = 35/12

Since the variance is σ² = 35/12, the standard deviation is σ = √(35/12) ≈ 1.707.

Properties of variance. Although the definition works okay for computing variance, there is an alternative way to compute it that usually works better, namely,

    σ² = Var(X) = E(X²) − µ².

Here's why that works.

    σ² = Var(X)
       = E((X − µ)²)
       = E(X² − 2µX + µ²)
       = E(X²) − 2µE(X) + µ²
       = E(X²) − 2µ·µ + µ²
       = E(X²) − µ²

Here are a couple more properties of variance. First, if you multiply a random variable X by a constant c to get cX, the variance changes by a factor of the square of c, that is,

    Var(cX) = c² Var(X).

That's the main reason why we take the square root of variance to normalize it; the standard deviation of cX is |c| times the standard deviation of X:

    SD(cX) = |c| SD(X).
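The die computation and the scaling property are easy to replicate numerically. Here is a minimal Python sketch using exact rational arithmetic; the helper names `ev` and `var` are our own, not from the text:

```python
from fractions import Fraction
from math import isclose, sqrt

# One fair die: faces 1..6, each with probability 1/6 (kept exact).
die = {x: Fraction(1, 6) for x in range(1, 7)}

def ev(dist):
    """E(X) for a finite distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

def var(dist):
    """Var(X) = E((X - mu)^2), computed straight from the definition."""
    mu = ev(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

mu = ev(die)
assert mu == Fraction(7, 2)                # mu = 3.5
assert var(die) == Fraction(35, 12)        # variance of one fair die

# The shortcut E(X^2) - mu^2 gives the same answer.
assert sum(p * x**2 for x, p in die.items()) - mu**2 == var(die)

# sigma = sqrt(35/12) is about 1.707...
sigma = sqrt(var(die))
assert isclose(sigma, 1.7078, abs_tol=1e-3)

# Scaling: Var(cX) = c^2 Var(X) and SD(cX) = |c| SD(X), even for c < 0.
c = -3
scaled = {c * x: p for x, p in die.items()}
assert var(scaled) == c**2 * var(die)
assert isclose(sqrt(scaled_var := var(scaled)), abs(c) * sigma)
```

Exact fractions sidestep the floating-point rounding that would otherwise obscure identities such as Var(cX) = c² Var(X).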
(Absolute value is needed in case c is negative.) It's easy to show that Var(cX) = c² Var(X):

    Var(cX) = E((cX − cµ)²)
            = E(c²(X − µ)²)
            = c² E((X − µ)²)
            = c² Var(X)

The next important property of variance is that it's translation invariant, that is, if you add a constant to a random variable, the variance doesn't change:

    Var(X + c) = Var(X).

In general, the variance of the sum of two random variables is not the sum of the variances of the two random variables. But it is when the two random variables are independent.

Theorem. If X and Y are independent random variables, then Var(X + Y) = Var(X) + Var(Y).

Proof: This proof relies on the fact that E(XY) = E(X) E(Y) when X and Y are independent.

    Var(X + Y)
      = E((X + Y)²) − (µ_{X+Y})²
      = E(X² + 2XY + Y²) − (µ_X + µ_Y)²
      = E(X²) + 2E(X)E(Y) + E(Y²) − µ_X² − 2µ_X µ_Y − µ_Y²
      = E(X²) + 2µ_X µ_Y + E(Y²) − µ_X² − 2µ_X µ_Y − µ_Y²
      = E(X²) − µ_X² + E(Y²) − µ_Y²
      = Var(X) + Var(Y)

q.e.d.

Variance of the binomial distribution. Let S be the number of successes in n Bernoulli trials, where the probability of success is p. This random variable S has a binomial distribution, and we could use its probability mass function to compute its variance, but there's an easier way. The random variable S is actually a sum of n independent Bernoulli trials, S = X_1 + X_2 + ··· + X_n, where each X_i equals 1 with probability p and 0 with probability q = 1 − p. By the preceding theorem,

    Var(S) = Var(X_1) + Var(X_2) + ··· + Var(X_n).

We can determine that if we can determine the variance of one Bernoulli trial X.

Now, Var(X) = E(X²) − µ², and for a Bernoulli trial µ = p. Let's compute E(X²):

    E(X²) = P(X=0)·0² + P(X=1)·1² = p.

Therefore, the variance of one Bernoulli trial is Var(X) = p − p² = pq.

From that observation, we conclude the variance of the binomial distribution is

    Var(S) = n Var(X) = npq.

Taking the square root, we see that the standard deviation of that binomial distribution is √(npq). That gives us the important observation that the spread of a binomial distribution is proportional to the square root of n, the number of trials. The argument generalizes to other distributions: the standard deviation of a random sample is proportional to the square root of the number of trials in the sample.

Variance of a geometric distribution. Consider the time T to the first success in a Bernoulli process. Its probability mass function is f(t) = pq^{t−1}. We saw that its mean was µ = E(T) = 1/p. We'll compute its variance using the formula Var(T) = E(T²) − µ².

    E(T²) = Σ_{t=1}^{∞} t² p q^{t−1}
          = p + 2²pq + 3²pq² + ··· + n²pq^{n−1} + ···

The last power series we got when we evaluated E(T) was

    1/(1 − x)² = 1 + 2x + 3x² + ··· + nx^{n−1} + ···

Multiply it by x to get

    x/(1 − x)² = x + 2x² + 3x³ + ··· + nx^n + ···
Differentiate that to get

    (1 + x)/(1 − x)³ = 1 + 2²x + 3²x² + ··· + n²x^{n−1} + ···

Set x to q, and multiply the equation by p, and we get

    (1 + q)/p² = p + 2²pq + 3²pq² + ··· + n²pq^{n−1} + ···

Therefore E(T²) = (1 + q)/p². Finally,

    Var(T) = E(T²) − µ² = (1 + q)/p² − 1/p² = q/p².
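Both closed forms, Var(S) = npq and Var(T) = q/p², can be sanity-checked by summing the probability mass functions directly. A small Python sketch follows; the parameter values n = 20, p = 0.3 are arbitrary choices, and the infinite geometric sum is truncated far into its tail:

```python
from math import comb, isclose

n, p = 20, 0.3
q = 1 - p

# Binomial: compute Var(S) = E(S^2) - mu^2 from the pmf, compare to npq.
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
mu = sum(k * pk for k, pk in enumerate(pmf))
var_s = sum(k**2 * pk for k, pk in enumerate(pmf)) - mu**2
assert isclose(mu, n * p)
assert isclose(var_s, n * p * q)

# Geometric: f(t) = p q^(t-1); the tail beyond t = 2000 is negligible here.
terms = range(1, 2000)
mu_t = sum(t * p * q**(t - 1) for t in terms)
var_t = sum(t**2 * p * q**(t - 1) for t in terms) - mu_t**2
assert isclose(mu_t, 1 / p)
assert isclose(var_t, q / p**2, rel_tol=1e-9)
```

The truncation point matters only for p close to 0, where the geometric tail decays slowly; for moderate p the series converges long before t = 2000.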

Math 217 Home Page at http://math.clarku.edu/~djoyce/ma217/
