Ws1 - Probability Generating Functions (Solutions)
1.16

a. $\frac{1}{2}$

b. $\frac{1}{n}$

c. $E\left[\sum_{i=1}^{n} I_i\right] = \sum_{i=1}^{n} P(X_i \text{ is a record}) = \sum_{i=1}^{n} \frac{1}{i}$, by part b.
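A quick Monte Carlo check of part b (a sketch; it assumes the $X_i$ are iid continuous, taken uniform here):

    # part b: P(X_n is a record) = 1/n for iid continuous X_i
    set.seed(1)
    n <- 10
    hits <- replicate(1e5, {
      x <- runif(n)
      x[n] == max(x)   # X_n is a record iff it is the largest of X_1, ..., X_n
    })
    mean(hits)   # close to 1/n = 0.1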
1.17

a. $P(N_1 > n) = P(X_1 = \max\{X_1, \ldots, X_n\}) = \frac{1}{n}$

b.
$$P(N_1 = n) = P(N_1 > n-1) - P(N_1 > n) = \frac{1}{n-1} - \frac{1}{n} = \frac{1}{n(n-1)}, \quad n \ge 2$$
and indeed $\sum_{n=2}^{\infty} P(N_1 = n) = 1$.

c.
$$EN_1 = \sum_{n=2}^{\infty} n\,P(N_1 = n) = \sum_{n=2}^{\infty} n \cdot \frac{1}{n(n-1)} = \sum_{n=2}^{\infty} \frac{1}{n-1} = \infty$$

d. Consider the moment we reach the first record. Use this value to start a separate process; then whenever we reach the first record ($N_1$) of the separate process, we have reached the second record of the first process. Thus $N_2$ is a r.v. because $N_1$ is.

e. Yup, crazy.
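A simulation sketch for $N_1$ (uniform $X_i$ assumed; the key observation is that the first value exceeding $X_1$ is automatically the first record after $X_1$, so the hypothetical helper below only tracks $X_1$):

    # P(N_1 > n) = 1/n and EN_1 = infinity
    set.seed(2)
    first_record <- function() {
      x1 <- runif(1)   # values below x1 can never be records before N_1
      n <- 1
      repeat {
        n <- n + 1
        if (runif(1) > x1) return(n)
      }
    }
    N1 <- replicate(1e5, first_record())
    mean(N1 > 10)   # close to 1/10 (part a)
    mean(N1)        # unstable from seed to seed: EN_1 diverges (part c)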
1.18

a. $P(X_n, X_{n+1} \text{ both records}) = \frac{1}{n(n+1)}$ by symmetry.

b.
$$P(X_n, X_{n+1} \text{ both records}) = \frac{1}{n(n+1)} = \frac{1}{n} \cdot \frac{1}{n+1} = P(X_n \text{ record})\, P(X_{n+1} \text{ record}) \text{ by 1.16b}$$
so the two events are independent.

c. Let $N_A = \#$ adjacent records and
$$I_n = \begin{cases} 1 & \text{if } X_n, X_{n+1} \text{ are both records} \\ 0 & \text{o.w.} \end{cases}$$
Then
$$E[N_A] = E\left[\sum_{n=1}^{\infty} I_n\right] = \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \cdots = 1$$
Weird, right? You expect only one adjacent record pair in an infinite sequence.
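A truncated-sequence check (sketch; the tail beyond $n$ contributes $1/n$ to the series, so a long finite run should average just under 1):

    # E[# adjacent record pairs] in X_1, ..., X_n (part c)
    set.seed(3)
    n <- 2000
    adj <- replicate(5000, {
      x <- runif(n)
      rec <- x == cummax(x)      # rec[i]: X_i is a record
      sum(rec[-n] & rec[-1])     # pairs (i, i+1) that are both records
    })
    mean(adj)   # about 1 - 1/n, i.e. close to 1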
1.32

a. Monotone Convergence Theorem (MCT).

b. By Markov's inequality. Check the condition that $\sum_{i=1}^{\infty} I_{A_i} \ge 0$, thus Markov's inequality applies. Then
$$P\left(\sum_{i=1}^{\infty} I_{A_i} > m\right) \le \frac{E\left[\sum_{i=1}^{\infty} I_{A_i}\right]}{m} = \frac{\sum_{i=1}^{\infty} P(A_i)}{m} \to 0 \text{ as } m \to \infty, \text{ since } \sum_{i=1}^{\infty} P(A_i) < \infty$$
1.33

Write $\alpha = P(X \ge b)$ and assume $EX = 0$, $\mathrm{var}(X) = \sigma^2$, with density $f$ (part e removes the mean-zero assumption).

a.
$$\int_b^{\infty} x^2 f(x)\,dx \ge b^2 \int_b^{\infty} f(x)\,dx = b^2 \alpha$$

b.
$$\int_{-\infty}^{b} x f(x)\,dx \overset{(1)}{=} EX - \int_b^{\infty} x f(x)\,dx \overset{(2)}{\le} 0 - b \int_b^{\infty} f(x)\,dx = -b\alpha$$
(1): by definition; (2): $EX = 0$ and $x \ge b$ on the tail.

c. Per the hint. Minimizing
$$\int_{-\infty}^{b} x^2 f(x)\,dx$$
is the same as minimizing
$$\int_{-\infty}^{b} x^2 \frac{f(x)}{1-\alpha}\,dx$$
Let $g(x) = \frac{1}{1-\alpha} f(x)$ and let $Y \sim g$ on $(-\infty, b)$. Then we want to minimize $EY^2$ given
$$P(Y < b) = 1, \qquad EY \le -\frac{b\alpha}{1-\alpha}$$
Then
$$EY^2 \ge (EY)^2 \ge \left(\frac{b\alpha}{1-\alpha}\right)^2$$
So the minimum value for $EY^2$ is $\left(\frac{b\alpha}{1-\alpha}\right)^2$, achieved if $Y = -\frac{b\alpha}{1-\alpha}$ w.p. 1. Thus
$$\int_{-\infty}^{b} x^2 f(x)\,dx = (1-\alpha)\,EY^2 \ge \frac{(b\alpha)^2}{1-\alpha}$$

d. By parts a and c, we have
$$\sigma^2 \ge \frac{(b\alpha)^2}{1-\alpha} + b^2\alpha = b^2\alpha\left(\frac{\alpha}{1-\alpha} + 1\right) = \frac{b^2\alpha}{1-\alpha}$$
We then have $\alpha\sigma^2 + \alpha b^2 \le \sigma^2$, so
$$P(X \ge b) = \alpha \le \frac{\sigma^2}{\sigma^2 + b^2}$$

e. Let $X = Y - EY$; then everything follows.
1.33 (Alternative)

$$\mathbf{1}\{X \ge b\} \le \left(\frac{X+c}{b+c}\right)^2$$
by construction (for any $c > -b$; again take $EX = 0$, $\mathrm{var}(X) = \sigma^2$). Then
$$P(X \ge b) = E[\mathbf{1}\{X \ge b\}] \le E\left[\left(\frac{X+c}{b+c}\right)^2\right] = \frac{\sigma^2 + c^2}{(b+c)^2} \tag{1}$$
Minimize over $c$:
$$\frac{d}{dc}\,\frac{\sigma^2 + c^2}{(b+c)^2} = \frac{2c(b+c)^2 - 2(\sigma^2 + c^2)(b+c)}{(b+c)^4} \overset{\text{set}}{=} 0$$
$$0 = c(b+c) - (\sigma^2 + c^2) = bc - \sigma^2 \implies c = \frac{\sigma^2}{b}$$
Plugging into (1) we have
$$P(X \ge b) \le \frac{\sigma^2 + \left(\frac{\sigma^2}{b}\right)^2}{\left(b + \frac{\sigma^2}{b}\right)^2} = \frac{b^2\sigma^2 + \sigma^4}{(b^2 + \sigma^2)^2} = \frac{\sigma^2}{\sigma^2 + b^2}$$
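A quick numeric sanity check of the final bound (a sketch; a centered unit-variance exponential serves only as an example):

    # Cantelli: P(X - EX >= b) <= sigma^2 / (sigma^2 + b^2)
    set.seed(4)
    x <- rexp(1e6) - 1   # Exp(1) centered: mean 0, variance 1
    b <- 1.5
    mean(x >= b)         # empirical tail, about exp(-2.5) = 0.082
    1 / (1 + b^2)        # the bound with sigma^2 = 1: ~0.31, so it holds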
1.39

a.
$$P\left(|S_n - E[S_n]| \ge \varepsilon n\right) \le \frac{\mathrm{var}(S_n)}{\varepsilon^2 n^2} = \frac{\sum_{i=1}^{n} \mathrm{var}(X_i)}{\varepsilon^2 n^2} \le \frac{nA}{\varepsilon^2 n^2} = \frac{A}{\varepsilon^2 n} \to 0$$

b.
$$P\left(|S_n - E[S_n]| \ge \varepsilon n\right) \le \frac{\mathrm{var}(S_n)}{\varepsilon^2 n^2} = \frac{\sum_{i=1}^{n} \mathrm{var}(X_i)}{\varepsilon^2 n^2} \le \frac{n \cdot A n^{1-\delta}}{\varepsilon^2 n^2} = \frac{A n^{2-\delta}}{\varepsilon^2 n^2} = \frac{A}{\varepsilon^2 n^{\delta}} \to 0, \quad 0 < \delta \le 1$$
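A small empirical check of part a (sketch; uniforms have variance bounded by $A = 1/12$):

    # P(|S_n/n - EX| > eps) -> 0 when var(X_i) <= A (part a)
    set.seed(5)
    eps <- 0.05
    for (n in c(100, 1000, 10000)) {
      p <- mean(replicate(2000, abs(mean(runif(n)) - 0.5) > eps))
      cat("n =", n, " empirical P =", p, "\n")   # decreases toward 0
    }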
1.41

A proof is given in the solution handbook, available online.
1.45

The question might be a bit vague. It actually asks us to show that $\frac{S_n}{\sqrt{n}}$ does not converge to anything in probability. To show this we first use a simple lemma.

Lemma. If $X_n \overset{P}{\to} X$ and $Y_n \overset{P}{\to} Y$, then $X_n + Y_n \overset{P}{\to} X + Y$.

Proof.
$$P(|(X_n + Y_n) - (X + Y)| > \varepsilon) \le P(|X_n - X| + |Y_n - Y| > \varepsilon)$$
$$\le P\left(\left\{|X_n - X| > \frac{\varepsilon}{2}\right\} \cup \left\{|Y_n - Y| > \frac{\varepsilon}{2}\right\}\right) \le P\left(|X_n - X| > \frac{\varepsilon}{2}\right) + P\left(|Y_n - Y| > \frac{\varepsilon}{2}\right) \to 0$$

Suppose $\frac{S_n}{\sqrt{n}} \overset{P}{\to} Z$ for some r.v. $Z$; then obviously $\frac{S_{2n}}{\sqrt{2n}} \overset{P}{\to} Z$ as well, thus $\frac{S_n}{\sqrt{n}} - \frac{S_{2n}}{\sqrt{2n}} \overset{P}{\to} 0$ by the lemma. However,
$$\frac{S_n}{\sqrt{n}} - \frac{S_{2n}}{\sqrt{2n}} = \frac{S_n}{\sqrt{n}} - \frac{S_n + S_n'}{\sqrt{2n}}, \quad \text{where } S_n' = \sum_{i=n+1}^{2n} X_i \overset{D}{=} S_n$$
$$= \left(1 - \frac{1}{\sqrt{2}}\right)\frac{S_n}{\sqrt{n}} - \frac{1}{\sqrt{2}}\,\frac{S_n'}{\sqrt{n}} \tag{2}$$
Now notice $S_n$ and $S_n'$ are independent (why?), thus
$$(2) \overset{D}{\to} \left(1 - \frac{1}{\sqrt{2}}\right)Z_1 - \frac{1}{\sqrt{2}}Z_2 \sim N\left(0, \sigma^2\left[\left(1 - \frac{1}{\sqrt{2}}\right)^2 + \left(\frac{1}{\sqrt{2}}\right)^2\right]\right) \ne \delta_0$$
where $Z_1$ and $Z_2$ are iid $N(0, \sigma^2)$. This is a contradiction (convergence in probability to 0 would force the limit in distribution to be the point mass $\delta_0$), thus $\frac{S_n}{\sqrt{n}}$ cannot converge in probability.
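A simulation sketch of the contradiction (standard normal increments, so $\sigma^2 = 1$):

    # S_n/sqrt(n) - S_2n/sqrt(2n) keeps sd near sqrt((1 - 1/sqrt(2))^2 + 1/2) ~ 0.77
    set.seed(6)
    n <- 1e4
    d <- replicate(2000, {
      x <- rnorm(2 * n)
      sum(x[1:n]) / sqrt(n) - sum(x) / sqrt(2 * n)
    })
    sd(d)                # ~0.77, not shrinking toward 0
    mean(abs(d) > 0.5)   # stays bounded away from 0: no convergence in P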
1.47.

$$1 - \frac{1}{m} = \exp\left(\ln\left(1 - \frac{1}{m}\right)\right) \le \exp\left(-\frac{1}{m}\right)$$
Thus
$$\left(1 - \frac{1}{m}\right)^{mn} \le \exp\left(-mn \cdot \frac{1}{m}\right) = e^{-n} \to 0 \text{ as } n \to \infty$$
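A one-line numeric check (sketch):

    # (1 - 1/m)^(m * n) <= e^(-n)
    m <- 7; n <- 3
    (1 - 1/m)^(m * n)   # 0.0394
    exp(-n)             # 0.0498: the bound holds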
1.48.

a. Here $X = 10^{12}$ w.p. $10^{-10}$ and $X = 0$ otherwise (the setup used in parts b and c), so
$$EX = 10^{12} \cdot 10^{-10} = 100$$
$$\mathrm{var}(X) = EX^2 - (EX)^2 = 10^{14} - 100^2 \approx 10^{14}$$
$S_n = \sum_{i=1}^{n} X_i$, then
$$E[S_n] = n\,EX, \qquad \mathrm{var}[S_n] = n\,\mathrm{var}(X)$$

b. The sum of the first $10^6$ variables is no larger than $10^6$ exactly when none of them takes the value $10^{12}$; that is, the event $S_n \le 10^6$ (with $n = 10^6$) occurs iff $X_i \ne 10^{12}$ for all $i \le n$. The probability of this event is
$$P\left(S_n \le 10^6\right) = \left(1 - 10^{-10}\right)^{10^6}$$

c. We want $P(S_n > 10^6)$, which means at least one $X_i > 10^6$, so
$$P\left(S_n > 10^6\right) \le P\left(\bigcup_{i=1}^{10^6} \left\{X_i > 10^6\right\}\right) \le 10^6\, P\left(X_1 > 10^6\right) = 10^6 \cdot 10^{-10} = 10^{-4}$$
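Parts b and c describe complementary events, which a quick computation confirms (sketch):

    # parts b and c
    p_b <- (1 - 1e-10)^1e6   # ~0.9999: no 10^12 value among the first 10^6 draws
    1 - p_b                  # ~1e-4, matching the union bound 10^6 * 10^-10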
1.49.

$$P(|Y_n| \ge \varepsilon) \le \frac{E|Y_n|}{\varepsilon} \to 0$$
Worksheet Problems
1.

$$G_X(z) = \sum_{i=0}^{\infty} p_i z^i = E\left[z^X\right], \text{ by definition}$$
2.

Suppose
$$G_X(z) = G_Y(z), \text{ for some rv } Y, \quad |z| < 1$$
Thus
$$G_X^{(k)}(0) = G_Y^{(k)}(0), \quad \forall k$$
$$\Rightarrow P(X = k) = \frac{G_X^{(k)}(0)}{k!} = \frac{G_Y^{(k)}(0)}{k!} = P(Y = k), \quad \forall k$$
(Can you justify switching the derivative and the sum? Uniform convergence of power series within its radius of convergence.)

Thus $X \overset{d}{=} Y$.
3.

$$\frac{d^k}{dz^k} G_X(z) = \frac{d^k}{dz^k} \sum_{i=0}^{\infty} P(X = i)\, z^i = \sum_{i=k}^{\infty} i(i-1)(i-2)\cdots(i-k+1)\, P(X = i)\, z^{i-k}, \quad |z| < 1$$
$$G_X^{(k)}(1^-) = \sum_{i=k}^{\infty} i(i-1)(i-2)\cdots(i-k+1)\, P(X = i) = E[X(X-1)(X-2)\cdots(X-k+1)]$$

(Note 1: Why can't we just say $G_X^{(k)}(1) = E[X(X-1)(X-2)\cdots(X-k+1)]$, i.e. why is that $1^-$ necessary? What happens when $|z| > 1$ is highly dependent on the distribution, i.e. how fast $P(X = i)$ goes to 0.

Note 2: The fact that $\lim_{x \to 1^-} G_X(x) = G_X(1)$ is justified by MCT.)
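A numeric illustration of $G_X^{(k)}(1^-) = E[X(X-1)\cdots(X-k+1)]$ (sketch; the Poisson case, where $G_X(z) = e^{\lambda(z-1)}$ gives $G_X''(1^-) = \lambda^2$):

    # second factorial moment of Pois(lambda) is lambda^2
    set.seed(7)
    lam <- 2
    x <- rpois(1e6, lam)
    mean(x * (x - 1))   # ~ 4 = lambda^2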
4.

Basic arithmetic.
5.

$$G_{X+Y}(z) = E\left[z^{X+Y}\right] = E\left[z^X z^Y\right] = E\left[z^X\right] E\left[z^Y\right] = G_X(z)\, G_Y(z)$$
by independence of $X$ and $Y$.
6.

Collect terms:
$$G_X(z)\, G_Y(z) = \sum_{i=0}^{\infty} P(X = i)\, z^i \sum_{j=0}^{\infty} P(Y = j)\, z^j = \sum_{k=0}^{\infty} \left(\sum_{l=0}^{k} P(X = l)\, P(Y = k - l)\right) z^k = \sum_{k=0}^{\infty} P(X + Y = k)\, z^k$$
where the middle equality collects the coefficient of $z^k$ and the last uses independence.
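A coefficient-level check (sketch; it leans on the known fact that independent $\mathrm{Bin}(5, p) + \mathrm{Bin}(4, p) = \mathrm{Bin}(9, p)$):

    # product of PGFs <=> convolution of pmfs
    px <- dbinom(0:5, 5, 0.3)
    py <- dbinom(0:4, 4, 0.3)
    conv <- convolve(px, rev(py), type = "open")   # coefficients of G_X * G_Y
    max(abs(conv - dbinom(0:9, 9, 0.3)))           # ~0: matches Bin(9, 0.3)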
7.

Generalize 5 and 6 by induction.
8.

b.
$$G_X(z) = \sum_{n=0}^{\infty} e^{-\lambda} \frac{\lambda^n}{n!} z^n = \sum_{n=0}^{\infty} e^{-\lambda} \frac{(\lambda z)^n}{n!} = e^{\lambda(z-1)} \sum_{n=0}^{\infty} e^{-\lambda z} \frac{(\lambda z)^n}{n!} = e^{\lambda(z-1)}$$
$$G_{S_n}(z) = e^{(z-1)\sum_i \lambda_i}$$
i.e. $S_n \sim \mathrm{Poisson}\left(\sum_i \lambda_i\right)$.
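A numeric check of the Poisson PGF (sketch):

    # G_X(z) = exp(lambda * (z - 1)) for X ~ Pois(lambda)
    lam <- 2; z <- 0.7
    sum(dpois(0:100, lam) * z^(0:100))   # truncated series: ~0.5488
    exp(lam * (z - 1))                   # closed form:      ~0.5488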
d.
$$G_X(z) = \sum_{n=r}^{\infty} \binom{n-1}{r-1} p^r q^{n-r} z^n = \sum_{n=r}^{\infty} \binom{n-1}{r-1} (zp)^r (zq)^{n-r}$$
$$= \frac{(zp)^r}{(1-zq)^r} \sum_{n=r}^{\infty} \binom{n-1}{r-1} (1-zq)^r (zq)^{n-r} = \left(\frac{zp}{1-zq}\right)^r$$
(the remaining sum is 1: it is the pmf of a negative binomial with success probability $1 - zq$)
$$G_{S_n}(z) = \left(\frac{zp}{1-zq}\right)^{\sum_i r_i}$$
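A truncated-series check (sketch; here $n$ counts total trials, matching the pmf above rather than R's failures-based dnbinom):

    # G_X(z) = (z * p / (1 - z * q))^r, negative binomial on total trials
    r <- 3; p <- 0.4; q <- 1 - p; z <- 0.6
    n <- r:200
    sum(choose(n - 1, r - 1) * p^r * q^(n - r) * z^n)   # series:      ~0.0527
    (z * p / (1 - z * q))^r                             # closed form: ~0.0527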
9.

$$G_{S_N}(z) = E\left[z^{S_N}\right] = E\left[E\left[z^{\sum_{i=1}^{N} X_i} \,\Big|\, N\right]\right] = E\left[G_X(z)^N\right] = G_N(G_X(z))$$
(the inner expectation is $G_X(z)^n$ on $\{N = n\}$, by 7)
10.

Apply 9:
$$G_{S_N}(z) = G_N(G_X(z)) = G_N(1 - p + pz) = e^{\lambda(1 - p + pz - 1)} = e^{\lambda p(z-1)}$$
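So $S_N \sim \mathrm{Poisson}(\lambda p)$: Bernoulli thinning of a Poisson count is again Poisson. A simulation sketch:

    # thinning: keep each of N ~ Pois(lambda) points with prob p
    set.seed(8)
    lam <- 4; p <- 0.3
    N <- rpois(1e5, lam)
    S <- rbinom(length(N), N, p)   # S_N | N = n is Bin(n, p)
    c(mean(S), var(S))             # both ~ lam * p = 1.2, as for Pois(lam * p)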
11.

$$P(S_N = j, N - S_N = k) = \sum_{n=0}^{\infty} P(S_N = j, N - S_N = k \mid N = n)\, P(N = n)$$
$$= P(S_N = j, N - S_N = k \mid N = j + k)\, P(N = j + k)$$
since $P(S_N = j, N - S_N = k \mid N = n) = 0$ for $n \ne j + k$,
$$= \binom{j+k}{j} p^j q^k \frac{\lambda^{j+k} e^{-\lambda}}{(j+k)!} = \frac{e^{-\lambda}}{j!\,k!} (\lambda p)^j (\lambda q)^k = \left(e^{-\lambda p} \frac{(\lambda p)^j}{j!}\right)\left(e^{-\lambda q} \frac{(\lambda q)^k}{k!}\right) = P(S_N = j)\, P(N - S_N = k)$$
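The simulation idea from problem 10 also illustrates the split being independent (sketch):

    # split a Pois(lambda) count into kept / discarded parts
    set.seed(9)
    lam <- 4; p <- 0.3
    N <- rpois(1e5, lam)
    S <- rbinom(length(N), N, p)
    cov(S, N - S)             # ~0, consistent with independence
    c(var(S), var(N - S))     # ~ lam * p = 1.2 and lam * q = 2.8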
12.
Same process as 11.
Challenge Problem
1.

Suppose we draw $n = 2$ random variables; then
$$P(N_A = 0) = \frac{1}{2}, \qquad P(N_A = 1) = \frac{1}{2}$$
At $n = 3$, we have
$$P(N_A = 0) = \frac{3}{6}, \qquad P(N_A = 1) = \frac{2}{6}, \qquad P(N_A = 2) = \frac{1}{6}$$
First of all, due to iid, WLOG let us replace the $n$ $X_i$'s by a uniformly random permutation of their ranks $\{1, \ldots, n\}$. For $n = 4$, the arrangements below achieve exactly 1 adj.rec, with the two records in the first two positions:
$$\begin{pmatrix} 3 & 4 & 1 & 2 \\ 3 & 4 & 2 & 1 \\ 2 & 4 & 1 & 3 \\ 2 & 4 & 3 & 1 \\ 1 & 4 & 2 & 3 \\ 1 & 4 & 3 & 2 \end{pmatrix}$$
We can also achieve 1 adj.rec by moving one of 1, 2 between 3 and 4, or 1 between 2 and 3. Thus the total number of possible 1-adj.rec arrangements is
$$\binom{3}{2} 2! + 2 + 1 = 9$$
You can easily see that for $n = 5$ this idea follows through as well, and the total number is
$$\binom{4}{2} 2! + 3 + 2 + 1 = 18$$
Generalizing to $n$, the total number of possible $(n-3)$-adj.rec arrangements is
$$\binom{n-1}{2} 2! + \sum_{i=1}^{n-2} i = 3 \sum_{i=1}^{n-2} i$$
If one follows this pattern, for an $(n-m)$-adj.rec arrangement, $m \le n$, one needs
$$m_0 \sum_{i_1=1}^{n-m+1} \sum_{i_2=1}^{i_1} \cdots \sum_{i_{m-1}=1}^{i_{m-2}} 1$$
where $m_0 = \#$ ways to rearrange an $m$-tuple to achieve 0-adj.rec, with $1_0 = 1$. Thus the distribution is defined recursively.
To check the result, I wrote an R program to simulate the process; the values up to $n = 11$ are shown below (the entry under column $n-j$ is the number of length-$n$ permutations with exactly $n-j$ adjacent records, and the final entry $n_0$ counts those with none):

    n    # of adjacent records
         n-1  n-2  n-3   n-4    n-5    n-6     n-7      n-8      n-9       n-10      n-11
    1    1_0=1
    2    1    2_0=1
    3    1    2    3_0=3
    4    1    3    9     4_0=11
    5    1    4    18    44     5_0=53
    6    1    5    30    110    265    6_0=309
    7    1    6    45    220    765    1854    7_0=2119
    8    1    7    63    385    1855   6489    14833    8_0=16687
    9    1    8    84    616    3710   17304   59332    133496   9_0=148329
    10   1    9    108   924    6678   38934   177996   600732   1334961   10_0=1468457
    11   1    10   135   1320   11130  77868   444990   2002440  6674805   14684570  11_0=16019531

To get the probabilities one can simply divide the raw # by the number of permutations ($n!$).
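For small $n$ the exact counts can be reproduced by brute force; a sketch of the kind of R check mentioned above (helper names are illustrative):

    # count adjacent record pairs in every permutation of 1..n (small n only)
    adj_records <- function(perm) {
      rec <- perm == cummax(perm)        # left-to-right record indicators
      sum(rec[-length(rec)] & rec[-1])   # adjacent record pairs
    }
    perms <- function(v) {               # all permutations, recursively
      if (length(v) <= 1) return(list(v))
      out <- list()
      for (i in seq_along(v))
        for (p in perms(v[-i])) out <- c(out, list(c(v[i], p)))
      out
    }
    table(sapply(perms(1:4), adj_records))   # 0:11, 1:9, 2:3, 3:1 -- the n = 4 row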