
4 Moment generating functions

Moment generating functions (mgf) are a very powerful computational tool.


They make certain computations much shorter. However, they are only a
computational tool. The mgf has no intrinsic meaning.

4.1 Definition and moments


Definition 1. Let X be a random variable. Its moment generating function
is
$$M_X(t) = E[e^{tX}]$$
At this point in the course we have only considered discrete RV’s. We
have not yet defined continuous RV’s or their expectation, but when we do
the definition of the mgf for a continuous RV will be exactly the same.
Example: Let X be geometric with parameter p. Find its mgf.
Recall that $f_X(k) = p(1-p)^{k-1}$. Then
$$M_X(t) = \sum_{k=1}^{\infty} e^{tk} p(1-p)^{k-1} = p e^t \sum_{k=1}^{\infty} e^{t(k-1)} (1-p)^{k-1} = \frac{p e^t}{1 - e^t(1-p)}$$
Note that the geometric series that we just summed only converges if
$e^t(1-p) < 1$. So the mgf is not defined for all t.
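As a quick numerical check of this formula, the sketch below (a minimal Python example assuming numpy; the values p = 0.3 and t = 0.1 are arbitrary choices satisfying $e^t(1-p) < 1$) estimates $E[e^{tX}]$ by simulation and compares it with the closed form.

    import numpy as np

    # Estimate E[e^{tX}] for a geometric RV by simulation and compare it
    # with the closed form p*e^t / (1 - e^t*(1-p)) derived above.
    # p and t are arbitrary choices with e^t*(1-p) < 1.
    p, t = 0.3, 0.1
    rng = np.random.default_rng(0)
    samples = rng.geometric(p, size=1_000_000)   # support {1, 2, 3, ...}
    mc_estimate = np.mean(np.exp(t * samples))
    closed_form = p * np.exp(t) / (1 - np.exp(t) * (1 - p))
    print(mc_estimate, closed_form)              # the two values should be close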
What is the point? Our first application is to show that you can get the
moments of X from its mgf (hence the name).
Proposition 1. Let X be a RV with mgf $M_X(t)$. Then
$$E[X^n] = M_X^{(n)}(0)$$
where $M_X^{(n)}(t)$ is the $n$th derivative of $M_X(t)$.
Proof.
$$\frac{d^n}{dt^n} E[e^{tX}] = \frac{d^n}{dt^n} \sum_k e^{tk} f_X(k) = \sum_k k^n e^{tk} f_X(k)$$
At t = 0 this becomes
$$\sum_k k^n f_X(k) = E[X^n]$$
There was a cheat in the proof. We interchanged derivatives and an
infinite sum. You can't always do this, and to justify doing it in the above
computation we need some assumptions on $f_X(k)$. We will not worry about
this issue.
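To see Proposition 1 in action, the sketch below (assuming sympy) differentiates the geometric mgf found above at t = 0; the printed expressions should agree with the standard geometric moments $E[X] = 1/p$ and $E[X^2] = (2-p)/p^2$.

    import sympy as sp

    # Recover moments of the geometric distribution by differentiating
    # its mgf at t = 0, as in Proposition 1.
    t, p = sp.symbols('t p', positive=True)
    M = p * sp.exp(t) / (1 - sp.exp(t) * (1 - p))    # geometric mgf from above

    EX = sp.simplify(sp.diff(M, t).subs(t, 0))       # should equal 1/p
    EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))   # should equal (2 - p)/p**2
    print(EX, EX2)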

Example: Let X be a binomial RV with n trials and probability p of success.
The mgf is
$$E[e^{tX}] = \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k (1-p)^{n-k} = \sum_{k=0}^{n} \binom{n}{k} (pe^t)^k (1-p)^{n-k} = [pe^t + (1-p)]^n$$

Now we use it to compute the first two moments.

$$M'(t) = n[pe^t + (1-p)]^{n-1} pe^t,$$
$$M''(t) = n(n-1)[pe^t + (1-p)]^{n-2} p^2 e^{2t} + n[pe^t + (1-p)]^{n-1} pe^t$$

Setting t = 0 we have
$$E[X] = M'(0) = np, \qquad E[X^2] = M''(0) = n(n-1)p^2 + np$$
So the variance is
$$\mathrm{var}(X) = E[X^2] - E[X]^2 = n(n-1)p^2 + np - n^2 p^2 = np - np^2 = np(1-p)$$
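The same differentiation can be done symbolically; the sketch below (assuming sympy) checks the two derivatives at t = 0 and the resulting variance against the expressions above.

    import sympy as sp

    # Binomial moments from the mgf M(t) = [p*e^t + (1-p)]^n computed above.
    t, p, n = sp.symbols('t p n', positive=True)
    M = (p * sp.exp(t) + 1 - p) ** n

    EX = sp.diff(M, t).subs(t, 0)            # should equal n*p
    EX2 = sp.diff(M, t, 2).subs(t, 0)        # should equal n*(n-1)*p**2 + n*p
    var = sp.simplify(EX2 - EX**2)           # should equal n*p*(1 - p)
    print(sp.simplify(EX), sp.simplify(EX2), var)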

4.2 Sums of independent random variables


Suppose X and Y are independent random variables, and we define a new
random variable by Z = X + Y. Then the pmf of Z is given by
$$f_Z(z) = \sum_{x,y:\, x+y=z} f_X(x) f_Y(y)$$
The sum is over all points (x, y) subject to the constraint that they lie on
the line x + y = z. This is equivalent to summing over all x and setting
y = z − x. Or we can sum over all y and set x = z − y. So
$$f_Z(z) = \sum_{x} f_X(x) f_Y(z-x), \qquad f_Z(z) = \sum_{y} f_X(z-y) f_Y(y)$$

Note that this formula looks like a discrete convolution. One can use this
formula to compute the pmf of a sum of independent RV's. But computing
the mgf is much easier.
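Before turning to the mgf, here is a minimal numerical illustration of the convolution formula (assuming numpy; the two fair six-sided dice are an arbitrary example, with array index k holding the probability of the value k).

    import numpy as np

    # The pmf of Z = X + Y for independent X, Y is the discrete convolution
    # of their pmfs.  Example: X and Y are fair six-sided dice; index k of
    # each array holds P(value = k), with index 0 unused and set to 0.
    f_X = np.array([0, 1, 1, 1, 1, 1, 1]) / 6.0
    f_Y = f_X.copy()

    f_Z = np.convolve(f_X, f_Y)   # index k now holds P(X + Y = k)
    print(f_Z[7])                 # P(Z = 7) = 6/36 for two fair dice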

Proposition 2. Let X and Y be independent random variables. Let Z =
X + Y. Then the mgf of Z is given by
$$M_Z(t) = M_X(t) M_Y(t)$$
If $X_1, X_2, \cdots, X_n$ are independent and identically distributed, then
$$M_{X_1 + X_2 + \cdots + X_n}(t) = [M(t)]^n$$
where $M(t) = M_{X_j}(t)$ is the common mgf of the $X_j$'s.

Proof.
$$E[e^{tZ}] = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] \, E[e^{tY}] = M_X(t) M_Y(t)$$
The third equality uses the independence of X and Y. The proof for n RV's is the same.


Computing the mgf does not give you the pmf of Z. But if you get a mgf
that is already in your catalog, then it effectively does. We will illustrate
this idea in some examples.

Example: We use the proposition to give a much shorter computation of
the mgf of the binomial. If X is binomial with n trials and probability p of
success, then we can write it as a sum of the outcome of each trial:
$$X = \sum_{j=1}^{n} X_j$$
where $X_j$ is 1 if the jth trial is a success and 0 if it is a failure. The $X_j$ are
independent and identically distributed. So the mgf of X is that of $X_j$ raised
to the n:
$$M_{X_j}(t) = E[e^{tX_j}] = pe^t + 1 - p$$
So
$$M_X(t) = [pe^t + 1 - p]^n$$


which is of course the same result we obtained before.

Example: Now suppose X and Y are independent, both are binomial with
the same probability of success, p. X has n trials and Y has m trials. We
argued before that Z = X + Y should be binomial with n + m trials. Now
we can see this from the mgf. The mgf of Z is
$$M_Z(t) = M_X(t) M_Y(t) = [pe^t + 1 - p]^n \, [pe^t + 1 - p]^m = [pe^t + 1 - p]^{n+m}$$
which is indeed the mgf of a binomial with n + m trials.
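The same conclusion can be checked numerically by convolving the two binomial pmfs; the sketch below assumes numpy and scipy, with n = 5, m = 8, p = 0.4 as arbitrary choices.

    import numpy as np
    from scipy.stats import binom

    # The pmf of the sum of independent binomial(n, p) and binomial(m, p)
    # RVs, computed by discrete convolution, matches the binomial(n+m, p) pmf.
    n, m, p = 5, 8, 0.4
    f_X = binom.pmf(np.arange(n + 1), n, p)
    f_Y = binom.pmf(np.arange(m + 1), m, p)

    f_Z = np.convolve(f_X, f_Y)
    f_direct = binom.pmf(np.arange(n + m + 1), n + m, p)
    print(np.allclose(f_Z, f_direct))   # should print True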

Example: Look at the negative binomial distribution. It has two parameters
p and n and the pmf is
$$f_X(k) = \binom{k-1}{n-1} p^n (1-p)^{k-n}, \qquad k \ge n$$
So
$$M_X(t) = \sum_{k=n}^{\infty} e^{tk} \binom{k-1}{n-1} p^n (1-p)^{k-n} = \sum_{k=n}^{\infty} e^{tk} \frac{(k-1)!}{(n-1)!(k-n)!} p^n (1-p)^{k-n}$$

Let j = k − n in the sum to get
$$\sum_{j=0}^{\infty} e^{t(n+j)} \frac{(n+j-1)!}{(n-1)! \, j!} p^n (1-p)^j = \frac{e^{tn} p^n}{(n-1)!} \sum_{j=0}^{\infty} \frac{(n+j-1)!}{j!} e^{tj} (1-p)^j$$
$$= \frac{e^{tn} p^n}{(n-1)!} \sum_{j=0}^{\infty} \frac{d^{n-1}}{dx^{n-1}} x^{n+j-1} \Big|_{x=e^t(1-p)} = \frac{e^{tn} p^n}{(n-1)!} \frac{d^{n-1}}{dx^{n-1}} \sum_{j=0}^{\infty} x^{n+j-1} \Big|_{x=e^t(1-p)}$$

The natural thing to do next is factor out an $x^{n-1}$ from the series to turn
it into a geometric series. We do something different that will save some
computation later. Note that the $(n-1)$th derivative will kill any term $x^k$
with $k < n-1$. So we can replace
$$\sum_{j=0}^{\infty} x^{n+j-1} \quad \text{by} \quad \sum_{j=0}^{\infty} x^j$$

in the above. So we have



$$\frac{e^{tn} p^n}{(n-1)!} \frac{d^{n-1}}{dx^{n-1}} \sum_{j=0}^{\infty} x^j \Big|_{x=e^t(1-p)} = \frac{e^{tn} p^n}{(n-1)!} \frac{d^{n-1}}{dx^{n-1}} \frac{1}{1-x} \Big|_{x=e^t(1-p)}$$
$$= \frac{e^{tn} p^n}{(n-1)!} \, \frac{(n-1)!}{(1-x)^n} \Big|_{x=e^t(1-p)} = \left[ \frac{e^t p}{1 - e^t(1-p)} \right]^n$$
This is of the form something to the n. The something is just the mgf of
the geometric distribution with parameter p. So the sum of n independent
geometric random variables with the same p gives the negative binomial with
parameters p and n.
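This relationship can be illustrated by simulation; the sketch below (assuming numpy and scipy, with n = 3 and p = 0.4 as arbitrary choices) compares the empirical distribution of a sum of n geometrics with the negative binomial pmf. Note that scipy's nbinom counts failures rather than trials, so its argument is shifted by n.

    import numpy as np
    from scipy.stats import nbinom

    # A sum of n i.i.d. geometric(p) RVs should follow the negative binomial
    # distribution with parameters p and n.  scipy's nbinom counts failures
    # before the n-th success, so its support is shifted down by n relative
    # to the trial count k >= n used in the notes.
    n, p, trials = 3, 0.4, 200_000
    rng = np.random.default_rng(1)
    sums = rng.geometric(p, size=(trials, n)).sum(axis=1)

    k = 7                                   # an arbitrary value with k >= n
    empirical = np.mean(sums == k)
    exact = nbinom.pmf(k - n, n, p)
    print(empirical, exact)                 # the two values should be close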

4.3 Other generating functions


The book uses the “probability generating function” for random variables
taking values in $0, 1, 2, \cdots$ (or a subset thereof). It is defined by
$$G_X(s) = \sum_{k=0}^{\infty} f_X(k) s^k$$
Note that this is just $E[s^X]$, and this is our mgf $E[e^{tX}]$ with $t = \ln(s)$.
Anything you can do with the probability generating function you can do
with the mgf, and we will not use the probability generating function.
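To illustrate the substitution t = ln(s), the small sketch below (assuming sympy) converts the binomial mgf from the earlier example into the corresponding probability generating function.

    import sympy as sp

    # The pgf is the mgf evaluated at t = ln(s).  Starting from the binomial
    # mgf M(t) = [p*e^t + (1-p)]^n derived earlier:
    s, t, p, n = sp.symbols('s t p n', positive=True)
    M = (p * sp.exp(t) + 1 - p) ** n
    G = M.subs(t, sp.log(s))                # G_X(s) = M_X(ln s)
    print(sp.simplify(G))                   # should reduce to (p*s + 1 - p)**n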
The mgf need not be defined for all t. We saw an example of this with
the geometric distribution where it was defined only if $e^t(1-p) < 1$, i.e.,
$t < -\ln(1-p)$. In fact, it need not be defined for any t other than 0. As
an example of this consider the RV X that takes on all integer values and
$P(X = k) = c(1 + k^2)^{-1}$. The constant c is given by
$$\frac{1}{c} = \sum_{k=-\infty}^{\infty} \frac{1}{1 + k^2}$$

We leave it to the reader to show that
$$\sum_{k=-\infty}^{\infty} e^{tk} \frac{1}{1 + k^2} = \infty$$
for all nonzero t.


Another moment generating function that is used is $E[e^{itX}]$. A probabilist
calls this the characteristic function of X. An analyst might call it the Fourier
transform of the distribution of X. It has the advantage that for real t it is
always defined.

End of September 28 lecture
