
AMIT GOYAL

PROBABILITY

Copyright © 2023 Amit Goyal

Contents

1 Probability
2 Conditional Probability
3 Discrete Random Variables
4 Continuous Random Variables
5 Topics in Random Variables
6 Sampling
7 Estimation

List of Figures

4.1 The region xy > u
1 | Probability

Definition 1.1: Experiment

An experiment is an act whose outcome is not predictable with certainty. For example: tossing a fair coin, selecting a student by drawing a name from a box, rolling a fair die, or drawing 5 cards from a well-shuffled deck of 52 cards.

Definition 1.2: Sample Space

Consider an experiment. The set of all possible outcomes of


an experiment is known as the sample space of the experiment
and is denoted by S. For example - if the experiment consists
of flipping two coins, then the sample space consists of four
elements: S = {( H, H ), ( H, T ), ( T, H ), ( T, T )}.

Definition 1.3: Events

Any subset E of the sample space S is known as an event. Two


ways to describe events:
1. In words
2. As sets

Definition 1.4: High-school definition of Probability

Pr(E) = (Number of outcomes in E) / (Number of outcomes in S)

Note that this definition assumes all outcomes are equally likely and the sample space is a finite set.

Theorem 1.1: The basic principle of counting

Suppose an experiment has two stages. Then if stage 1 can re-


sult in any of m possible outcomes and if, for each outcome of
stage 1, there are n possible outcomes of stage 2, then together
there are mn possible outcomes of the experiment.

Theorem 1.2: Permutations


Suppose we have n objects. There are n! different arrange-
ments of objects possible. Each arrangement is known as a
permutation. There are (n − 1)! circular permutations.

Theorem 1.3: Combinations

Suppose we have n objects. How many different groups of r objects can be formed from a total of n objects? We define C(n, r), for r ≤ n, by

C(n, r) = n! / (r!(n − r)!)

We say that C(n, r) represents the number of possible combinations of n objects taken r at a time.

Theorem 1.4: Sampling: Choosing k objects out of n

1. Order matters
   (a) With replacement: n^k
   (b) Without replacement: P(n, k) = n! / (n − k)!
2. Order does not matter
   (a) With replacement: C(n + k − 1, k) = (n + k − 1)! / (k!(n − 1)!)
   (b) Without replacement: C(n, k) = n! / (k!(n − k)!)
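As a quick illustration (not part of the original text; the variable names are mine and Python 3.8+ is assumed for math.perm), the four counts can be evaluated directly:

from math import comb, perm

n, k = 5, 3

# Order matters
ordered_with_repl = n ** k                    # n^k = 125
ordered_without_repl = perm(n, k)             # n!/(n-k)! = 60

# Order does not matter
unordered_with_repl = comb(n + k - 1, k)      # (n+k-1)!/(k!(n-1)!) = 35
unordered_without_repl = comb(n, k)           # n!/(k!(n-k)!) = 10

print(ordered_with_repl, ordered_without_repl,
      unordered_with_repl, unordered_without_repl)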

Theorem 1.5

The following equalities hold:
1. C(n, k) = C(n, n − k)
2. n C(n − 1, k − 1) = k C(n, k)
3. C(m + n, k) = ∑_{j=0}^{k} C(m, j) C(n, k − j)
4. C(n, k) = C(n − 1, k) + C(n − 1, k − 1)
5. (a + b)^n = ∑_{k=0}^{n} C(n, k) a^k b^{n−k}

Definition 1.5: Probability

Let F denote the set of all events of an experiment. A probability is a function Pr : F → [0, 1] that satisfies the following axioms:
1. Pr(∅) = 0, Pr(S) = 1
2. Pr(∪_{i=1}^{∞} A_i) = ∑_{i=1}^{∞} Pr(A_i) for every disjoint collection {A_i ⊂ S : i ∈ N} of events.

Theorem 1.6
1. Pr(A^c) = 1 − Pr(A)
2. If A ⊂ B, then Pr(A) ≤ Pr(B).
3. Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
4. Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − Pr(A ∩ B) − Pr(A ∩ C) − Pr(B ∩ C) + Pr(A ∩ B ∩ C)
5. Pr(∪_{i=1}^{n} A_i) = ∑_{i=1}^{n} Pr(A_i) − ∑_{i<j} Pr(A_i ∩ A_j) + ∑_{i<j<k} Pr(A_i ∩ A_j ∩ A_k) − ∑_{i<j<k<l} Pr(A_i ∩ A_j ∩ A_k ∩ A_l) + · · · + (−1)^{n+1} Pr(∩_{i=1}^{n} A_i)

Definition 1.6: Independence of events

• Independence of two events. Events A and B are indepen-


dent if Pr( A ∩ B) = Pr( A) Pr( B).
• Independence of three events. Events A, B and C are in-
dependent if Pr( A ∩ B) = Pr( A) Pr( B), Pr( A ∩ C ) =
Pr( A) Pr(C ), Pr( B ∩ C ) = Pr( B) Pr(C ) and Pr( A ∩ B ∩ C ) =
Pr( A) Pr( B) Pr(C ).
• Independence of n events. A collection {A_i ⊂ S : 1 ≤ i ≤ n} of events is said to be independent if, for every I ⊂ {1, 2, . . . , n}, Pr(∩_{i∈I} A_i) = ∏_{i∈I} Pr(A_i).

Solved Problems

Example 1.1: [Click]

Out of 5 men and 2 women, a committee of 3 is to be formed.


In how many ways can it be formed if at least one woman is
included in each committee?

Solution 1.1
   
C(7, 3) − C(5, 3) = 35 − 10 = 25

Example 1.2: [Click]

An urn contains 5 red, 5 black and 10 white balls. If balls are drawn without replacement, what is the probability that in the first 7 draws at least one ball of each colour is drawn?

Solution 1.2
Let Ar be the event that there is at least one red ball drawn
in the first seven balls. Likewise, Ab be the event that there is
at least one black ball drawn in the first seven balls, and Aw
be the event that there is at least one white ball in the seven
draws. We want to find the probability of the event that at
least one ball of each color is drawn in the seven draws which
is
Pr(A_r ∩ A_b ∩ A_w) = 1 − Pr(A_r^c ∪ A_b^c ∪ A_w^c).

So to find Pr(A_r ∩ A_b ∩ A_w), we just need to find Pr(A_r^c ∪ A_b^c ∪ A_w^c). By the inclusion-exclusion principle,

Pr(A_r^c ∪ A_b^c ∪ A_w^c)
= Pr(A_r^c) + Pr(A_b^c) + Pr(A_w^c) − Pr(A_r^c ∩ A_b^c) − Pr(A_r^c ∩ A_w^c) − Pr(A_b^c ∩ A_w^c) + Pr(A_r^c ∩ A_b^c ∩ A_w^c)
= C(15, 7)/C(20, 7) + C(15, 7)/C(20, 7) + C(10, 7)/C(20, 7) − C(10, 7)/C(20, 7) − 0 − 0 + 0
= 2 × C(15, 7)/C(20, 7)
= 2 × 6435/77520
≈ 0.166

Therefore, Pr(A_r ∩ A_b ∩ A_w) = 64650/77520 ≈ 0.834.
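A short Monte Carlo check of this answer (my own sketch, not from the original text; it uses only the Python standard library):

import random

def at_least_one_of_each(trials=200_000):
    urn = ['r'] * 5 + ['b'] * 5 + ['w'] * 10
    hits = 0
    for _ in range(trials):
        draw = random.sample(urn, 7)              # 7 draws without replacement
        if {'r', 'b', 'w'} <= set(draw):          # at least one ball of each colour
            hits += 1
    return hits / trials

print(at_least_one_of_each())  # typically close to 0.834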

Example 1.3: [Click]

The probability of a contractor getting a plumbing contract is


2/3 and the probability of him getting an electricity contract
is 5/9. The probability of getting at least one contract is 4/5.
What is the probability that he will get both contracts?

Solution 1.3

Let P be the event that the contractor gets the plumbing contract, and E be the event that he gets the electricity contract. We are given

Pr(P) = 2/3, Pr(E) = 5/9 and Pr(P ∪ E) = 4/5.

To find Pr(P ∩ E), we use the following equality:

Pr(P ∩ E) = Pr(P) + Pr(E) − Pr(P ∪ E) = 2/3 + 5/9 − 4/5 = 19/45

Example 1.4: [Click]

Five players are dealt 3 cards each. What is the probability of a


player getting 3 aces?

Solution 1.4

Let E_j denote the event that player j gets three aces, where j ∈ {1, 2, 3, 4, 5}. We want to find the probability Pr(E_1 ∪ E_2 ∪ E_3 ∪ E_4 ∪ E_5). Since E_1, E_2, E_3, E_4, E_5 are mutually disjoint,

Pr(E_1 ∪ E_2 ∪ E_3 ∪ E_4 ∪ E_5) = Pr(E_1) + Pr(E_2) + Pr(E_3) + Pr(E_4) + Pr(E_5)

By symmetry, Pr(E_1) = Pr(E_2) = Pr(E_3) = Pr(E_4) = Pr(E_5). Therefore,

Pr(E_1 ∪ E_2 ∪ E_3 ∪ E_4 ∪ E_5) = 5 Pr(E_1) = 5 × C(4, 3)/C(52, 3) = 1/1105

Example 1.5: [Click]

What is the probability that no two have the same face value in
a poker hand of 5 cards?

Solution 1.5

C(13, 5) × 4^5 / C(52, 5) ≈ 0.50708

Example 1.6: [Click]

How many ways can 12 boys and 14 girls be arranged in a


line?

Solution 1.6
26!

Example 1.7: [Click]

10 balls are thrown into 5 bins uniformly at random and independently. What is the probability that there are no empty bins?

Solution 1.7

Let the five bins be numbered 1, 2, 3, 4, and 5, respectively,


and let Ai be the event that bin i is not empty. We want to
find the probability of the event that no bin is empty i.e.
A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 . Now
Pr(A_1 ∩ A_2 ∩ A_3 ∩ A_4 ∩ A_5) = 1 − Pr(A_1^c ∪ A_2^c ∪ A_3^c ∪ A_4^c ∪ A_5^c)

and by the inclusion-exclusion principle,

Pr(A_1^c ∪ A_2^c ∪ A_3^c ∪ A_4^c ∪ A_5^c)
= C(5, 1) Pr(A_1^c) − C(5, 2) Pr(A_1^c ∩ A_2^c) + C(5, 3) Pr(A_1^c ∩ A_2^c ∩ A_3^c) − C(5, 4) Pr(A_1^c ∩ A_2^c ∩ A_3^c ∩ A_4^c) + C(5, 5) Pr(A_1^c ∩ A_2^c ∩ A_3^c ∩ A_4^c ∩ A_5^c)
= C(5, 1)(4/5)^10 − C(5, 2)(3/5)^10 + C(5, 3)(2/5)^10 − C(5, 4)(1/5)^10 + C(5, 5)(0/5)^10
≈ 0.477

Therefore, Pr( A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 ) ≈ 1 − 0.477 = 0.523

Example 1.8: [Click]

In how many ways can you put 15 identical balls in 3 distinct


boxes such that each box contains at least 1 and at most 10
balls?

Solution 1.8
Notice that the number of ways to put 15 identical balls in
3 distinct boxes such that each box contains at least 1 and at
most 10 balls is equal to the number of positive integer solu-
tions to the following system of equations/inequalities:

x1 + x2 + x3 = 15
1 ≤ x1 ≤ 10
1 ≤ x2 ≤ 10
1 ≤ x3 ≤ 10

Equivalently, we can define y1 = x1 − 1, y2 = x2 − 1 and y3 =


x3 − 1, rewrite the system as

y1 + y2 + y3 = 12
0 ≤ y1 ≤ 9
0 ≤ y2 ≤ 9
0 ≤ y3 ≤ 9

and find the number of non-negative integer solutions to this


system. Doing so gives us C(14, 12) − 3 × C(4, 2), where C(14, 12) is the number of non-negative integer solutions to y_1 + y_2 + y_3 = 12 and 3 × C(4, 2) is the number of non-negative integer solutions to y_1 + y_2 + y_3 = 12 with the property that y_1, y_2 or y_3 is greater than or equal to 10.
Therefore, the number of ways to put 15 identical balls in 3 distinct boxes such that each box contains at least 1 and at most 10 balls is equal to C(14, 12) − 3 × C(4, 2) = 91 − 18 = 73.
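The count can be confirmed by brute force (an illustrative sketch of mine, not part of the original text):

# Enumerate x1, x2 in {1,...,10}; x3 = 15 - x1 - x2 must also lie in [1, 10]
count = sum(1
            for x1 in range(1, 11)
            for x2 in range(1, 11)
            if 1 <= 15 - x1 - x2 <= 10)
print(count)  # 73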

Example 1.9: [Click]

If two events are not independent, does it follow that they are mutually exclusive?

Solution 1.9

No. Consider any event A with the property that 0 < Pr(A) < 1. The pair (A, A) is not independent (since Pr(A ∩ A) = Pr(A) ≠ Pr(A)^2), yet A and A are not mutually exclusive. On the other hand, two mutually exclusive events A and B with Pr(A) > 0 and Pr(B) > 0 can't be independent, because Pr(A ∩ B) = 0 ≠ Pr(A) Pr(B).

Example 1.10: [Click]

A drawer has 5 brown socks and 4 green socks. A man takes


out 2 socks at random. What is the probability that they
match?

Solution 1.10

(C(5, 2) + C(4, 2)) / C(9, 2) = 16/36 = 4/9

Example 1.11: [Click]

Tickets numbered 1 to 10 are mixed up and two tickets are


drawn at random. What is the probability that they are multi-
ples of 3?

Solution 1.11

There are C(10, 2) = 45 ways to select two tickets from the set of tickets numbered 1 to 10. There are C(3, 2) = 3 ways to select two tickets from the set of tickets that are numbered multiples of 3. So the required probability is 3/45 = 1/15.

Example 1.12: [Click]

If Pr( A) ≥ 0.8 and Pr( B) ≥ 0.8, then Pr( A ∩ B) ≥ l. Find the


largest value of l for which the above implication is true.

Solution 1.12

Pr( A ∩ B) = Pr( B) − Pr( Ac ∩ B)


≥ Pr( B) − Pr( Ac )
= 0.8 − 0.2
= 0.6

Therefore, Pr( A ∩ B) ≥ 0.6.


2 | Conditional Probability

Definition 2.1: Conditional Probability

Conditional Probability of event A given that event B has


occurred is defined as:
Pr(A|B) = Pr(A ∩ B) / Pr(B), if Pr(B) > 0

Theorem 2.1
1. Pr(A ∩ B) = Pr(B) Pr(A|B) = Pr(A) Pr(B|A)
2. Pr(∩_{i=1}^{n} A_i) = Pr(A_1) Pr(A_2|A_1) Pr(A_3|A_1 ∩ A_2) · · · Pr(A_n|A_1 ∩ A_2 ∩ A_3 ∩ · · · ∩ A_{n−1})
3. (Bayes' Rule) Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)
4. (Law of total probability) Given a partition A_1, A_2, A_3, . . . , A_n of S,
   Pr(E) = ∑_{i=1}^{n} Pr(E ∩ A_i) = ∑_{i=1}^{n} Pr(E|A_i) Pr(A_i)
5. (Bayes' Rule) Pr(A|B) = Pr(B|A) Pr(A) / (Pr(B|A) Pr(A) + Pr(B|A^c) Pr(A^c))

Definition 2.2: Conditional independence

We say events A and B are conditionally independent given C


if
Pr( A ∩ B|C ) = Pr( A|C ) Pr( B|C )

Solved Problems

Example 2.1: [Click]

A bag contains 10 white and 3 red balls, while another bag contains 3 white and 5 red balls. Two balls are drawn at random from the first bag and put in the second bag. Then a ball is drawn at random from the second bag. What is the probability that it is a white ball?

Solution 2.1
Let EWW be the event that two balls drawn from the first bag
are both white. Likewise, EWR be the event that one white ball
and one red ball are drawn from the first bag, and ERR be the
event that two balls drawn from the first bag are both red. Let
W be the event that a white ball is drawn from the second bag.
By the Law of total probability,

Pr(W) = Pr(E_WW) Pr(W|E_WW) + Pr(E_WR) Pr(W|E_WR) + Pr(E_RR) Pr(W|E_RR)
      = [C(10, 2)/C(13, 2)] × 5/10 + [C(10, 1)C(3, 1)/C(13, 2)] × 4/10 + [C(3, 2)/C(13, 2)] × 3/10
      = (45/78) × (5/10) + (30/78) × (4/10) + (3/78) × (3/10)
      = 354/780
      = 59/130
      ≈ 0.4538

Example 2.2: [Click]

Three dice are thrown simultaneously. What is the probability that 4 has appeared on two dice given that 5 has occurred on exactly one die?

Solution 2.2
Three dice are thrown simultaneously. Let A be the event that
4 appears on two dice, and B be the event that 5 occurs on (ex-
actly) one dice. We want to find the conditional probability of
A given B.
Pr( A ∩ B) (3) 13 1
Pr( A| B) = = 13 652 =
Pr( B) ( 1 ) 63 25

Example 2.3: [Click]

A company produces light bulbs at three factories A, B and C.


• Factory A produces 40% of the total number of bulbs, of
which 2% are defective.
• Factory B produces 35% of the total number of bulbs, of
which 4% are defective.
• Factory C produces 25% of the total number of bulbs, of
which 3% are defective.
1. A defective bulb is found among the total output. Find the probability that it came from (a) Factory A, (b) Factory B, (c) Factory C.
2. Now suppose a factory is chosen at random, and one of its bulbs is randomly selected. If the bulb is defective, find the probability that it came from (a) Factory A, (b) Factory B, (c) Factory C.

Solution 2.3

1. Let D be the event that the randomly selected bulb is de-


fective. Let A be the event that the randomly selected bulb
came from factory A. Likewise, define events B and C.
We are given that Pr( A) = 0.4, Pr( B) = 0.35 and Pr(C ) =
0.25. Also, Pr( D | A) = 0.02, Pr( D | B) = 0.04, and Pr( D |C ) =
0.03. We want to find Pr( A| D ). By Bayes’ Rule,

Pr( D | A) Pr( A)
Pr( A| D ) =
Pr( D | A) Pr( A) + Pr( D | B) Pr( B) + Pr( D |C ) Pr(C )
0.008
=
0.008 + 0.014 + 0.0075
= 0.2712

Likewise, Pr( B| D ) = 0.4746 and Pr(C | D ) = 0.2542.


2. In this question we are given that Pr(A) = 1/3, Pr(B) = 1/3 and Pr(C) = 1/3. Also, Pr(D|A) = 0.02, Pr(D|B) = 0.04, and Pr(D|C) = 0.03. To find Pr(A|D), we use Bayes' Rule (as before), and we get

Pr(A|D) = Pr(D|A) Pr(A) / [Pr(D|A) Pr(A) + Pr(D|B) Pr(B) + Pr(D|C) Pr(C)] = 2/9

Similarly, Pr(B|D) = 4/9 and Pr(C|D) = 1/3.
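Both parts reduce to the same Bayes-rule computation, which can be scripted in a few lines (a sketch of mine; the helper name posterior is not from the text):

def posterior(priors, likelihoods):
    # Bayes' rule: Pr(factory | defective) for each factory
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                      # Pr(defective), by the law of total probability
    return [j / total for j in joint]

defect_rates = [0.02, 0.04, 0.03]
print(posterior([0.40, 0.35, 0.25], defect_rates))   # part 1: about [0.271, 0.475, 0.254]
print(posterior([1/3, 1/3, 1/3], defect_rates))      # part 2: [2/9, 4/9, 1/3]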
3 | Discrete Random Variables

Definition 3.1: Random Variable

A Random Variable is any real-valued function defined on


sample space X : S → R.

Definition 3.2: Discrete Random Variable

A discrete random variable is a random variable that can take a finite or countably infinite number of values.

Definition 3.3: Probability Mass Function (PMF)

Consider a discrete random variable X : S → R. Associated


with it is a probability mass function (PMF) p X : R → [0, 1] de-
fined as follows:
p X ( x ) = Pr({s ∈ S| X (s) = x }) = Pr( X = x )

Definition 3.4: Cumulative Distribution Function (CDF)

The cumulative distribution function (CDF) of a random vari-


able X is a function FX : R → [0, 1] defined as follows:
FX ( x ) = Pr({s ∈ S| X (s) ≤ x }) = Pr( X ≤ x )

Theorem 3.1: Properties of a CDF

1. FX is monotonically non-decreasing.
2. lim_{x→−∞} F_X(x) = 0 and lim_{x→∞} F_X(x) = 1
3. FX is right-continuous.

Definition 3.5: Bernoulli Random Variable

X ∼ Bern( p) i.e. X is a Bernoulli random variable with pa-


rameter p if it indicates whether a trial that results in a success
with probability p is a success or not.

p X (1) = Pr( X = 1) = p
p X (0) = Pr( X = 0) = 1− p

Definition 3.6: Binomial Random Variable

X ∼ Bin(n, p) is a Binomial random variable with parameters n


and p if it represents the number of successes in n independent
trials when each trial is a success with probability p.
 
p_X(x) = Pr(X = x) = C(n, x) p^x (1 − p)^{n−x}

where x ∈ {0, 1, . . . , n}.

Definition 3.7: Geometric Random Variable

X ∼ Geom( p) is a Geometric random variable with parameter


p if it represents the number of failures before the first success
where each trial is independently a success with probability p.

p X ( x ) = Pr( X = x ) = p(1 − p) x

where x ∈ {0, 1, 2, 3 . . .}.

Definition 3.8: Poisson Random Variable.

X ∼ Pois(λ) is used to model the number of events that occur


when these events are either independent or weakly depen-
dent and each has a small probability of occurrence. It has
parameter λ that represents the rate of occurrence.

p_X(x) = Pr(X = x) = e^{−λ} λ^x / x!

where x ∈ {0, 1, 2, 3, . . .}.

Definition 3.9: Indicator Random Variable

Let A be any event. Indicator Random Variable, 1 A : S → R is


a random variable that assigns value 1 to those outcomes when
event A occurs, and 0 otherwise.

1_A(s) = 1 if s ∈ A, and 1_A(s) = 0 if s ∉ A.

Definition 3.10: Expected Value or Expectation

Expected value of a random variable X, with PMF p X , is de-


fined by

E(X) = ∑_{x ∈ X(S)} x p_X(x) = ∑_{s ∈ S} X(s) Pr({s})

Here X(S) denotes the range of X.

Theorem 3.2

1. For X ∼ Bern( p), E( X ) = p


2. For X ∼ Bin(n, p), E( X ) = np
3. For X ∼ Geom(p), E(X) = (1 − p)/p
4. For X ∼ Pois(λ), E( X ) = λ

Theorem 3.3

Consider Xn ∼ Bin(n, pn ) and X ∼ Pois(λ) where λ = npn for


all n. If n → ∞ and pn → 0, then Xn → X in distribution i.e.
p Xn ( x ) → p X ( x ) for all x ∈ Z+ .

Theorem 3.4: Linearity of Expectation

Given two random variables X : S → R and Y : S → R, we


have

E ( X + Y ) = E ( X ) + E (Y )

and for any c ∈ R we also have

E(cX ) = cE( X )

Definition 3.11: Transformation of a Discrete Random Variable
Given a random variable X : S → R, one may generate other
random variables by applying various transformations on
X. Y : S → R is the transformation of a random variable
X : S → R if Y (s) = g ◦ X (s) = g( X (s)) for some function
g : R → R.

Theorem 3.5: PMF of a Transformed Random Variable

Suppose Y = g( X ) is a transformation of random variable X


using function g. The PMF of Y, pY can be calculated using the
PMF of X. In particular, to obtain pY (y) for any y, we add the
probabilities of all values of x such that g( x ) = y:

p_Y(y) = ∑_{x : g(x) = y} p_X(x)

Theorem 3.6: Law of the Unconscious Statistician

Law of the Unconscious Statistician (LOTUS) is

E(g(X)) = ∑_{x ∈ X(S)} g(x) p_X(x)


Definition 3.12: Variance of a Random Variable

Variance of a random variable X, denoted by V( X ), is

V( X ) = E( X − E( X ))2 = E( X 2 ) − (E( X ))2

Definition 3.13: Standard Deviation of a Random Variable

√V(X) is known as the standard deviation of X and is denoted by σ_X.

Theorem 3.7

1. For X ∼ Bern( p), V( X ) = p(1 − p)


2. For X ∼ Bin(n, p), V( X ) = np(1 − p)
3. For X ∼ Geom(p), V(X) = (1 − p)/p^2
4. For X ∼ Pois(λ), V( X ) = λ
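These formulas are easy to check empirically; the sketch below (mine, assuming NumPy is available) compares simulated moments of a Binomial and a Poisson sample with the expressions above:

import numpy as np

rng = np.random.default_rng(0)
n, p, lam, size = 12, 0.3, 4.0, 200_000

binom = rng.binomial(n, p, size)
pois = rng.poisson(lam, size)

print(binom.mean(), n * p)               # both close to 3.6
print(binom.var(), n * p * (1 - p))      # both close to 2.52
print(pois.mean(), pois.var(), lam)      # all close to 4.0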

Definition 3.14: Negative Binomial Random Variable

X ∼ NBin(r, p) if it is the number of failures before the rth


success where each trial is independently a success with prob-
ability p.

p_X(x) = Pr(X = x) = C(r + x − 1, r − 1) p^r (1 − p)^x

where x ∈ {0, 1, 2, 3, . . .}.

Definition 3.15: Hypergeometric Random Variable

X ∼ Hyper( N, n, m) if it is the number of white balls in a


random sample of n balls chosen without replacement from an
urn of N balls of which m are white.
p_X(x) = Pr(X = x) = C(m, x) C(N − m, n − x) / C(N, n)

where max(0, n + m − N) ≤ x ≤ min(m, n).

Definition 3.16: Discrete Uniform Random Variable

X ∼ DUnif( a, a + n) if

p_X(x) = Pr(X = x) = 1/(n + 1)
where x ∈ { a, a + 1, a + 2, . . . , a + n}.

Definition 3.17: Joint Probability Mass Function

Consider two discrete random variables X and Y associated


with the same experiment. The probabilities of the values that
X and Y can take are captured by the joint PMF of X and Y,
denoted p X,Y . In particular, if ( x, y) is a pair of possible values
of X and Y, the probability mass of ( x, y) is the probability of
the event X = x, Y = y:

p X,Y ( x, y) = Pr ({s ∈ S| X (s) = x } ∩ {s ∈ S|Y (s) = y})


= Pr( X = x ∩ Y = y)
or simply, Pr( X = x, Y = y)

Definition 3.18: Marginal PMF of X

Given the joint PMFs of X and Y ( p X,Y ), we can calculate the


PMF of X by using

p_X(x) = ∑_{y ∈ Y(S)} p_{X,Y}(x, y)

We also refer to p X as the marginal PMF of X.

Definition 3.19: Conditional PMF of X given event A

The conditional PMF of a random variable X, conditioned on a


particular event A with Pr( A) > 0, is defined by

p_{X|A}(x) = Pr(X = x | A) = Pr({X = x} ∩ A) / Pr(A)

Definition 3.20: Conditional PMF of X given Y

Conditional PMF of X at x given Y = y with pY (y) > 0, de-


noted by p X |Y ( x |y), is defined as:

p_{X|Y}(x|y) = Pr(X = x | Y = y) = Pr(X = x, Y = y) / Pr(Y = y), or simply p_{X,Y}(x, y) / p_Y(y)

Definition 3.21: Independence of Random Variables

We say two random variables X and Y are independent if

p X,Y ( x, y) = p X ( x ) pY (y)

for all x, y.

Definition 3.22: 2-D LOTUS (Law of the Unconscious Statistician)

For any g : R2 → R, and random variables X and Y,

E(g(X, Y)) = ∑_{x ∈ X(S)} ∑_{y ∈ Y(S)} g(x, y) p_{X,Y}(x, y)

Definition 3.23: Covariance between X and Y

Covariance between random variables X and Y, denoted by


C( X, Y ), is defined as follows:

C( X, Y ) = E(( X − E( X ))(Y − E(Y )))


= E( XY ) − (E( X )E(Y ))

Theorem 3.8

Given n + m random variables X1 , . . . , Xn and Y1 , . . . , Ym , and


n + m real numbers a1 , . . . , an and b1 , . . . , bm , the following
holds
C(∑_{i=1}^{n} a_i X_i, ∑_{j=1}^{m} b_j Y_j) = ∑_{i=1}^{n} ∑_{j=1}^{m} a_i b_j C(X_i, Y_j)

Theorem 3.9
1. For X ∼ Hyper(N, n, m), E(X) = nm/N, and V(X) = nm(N − m)(N − n) / (N^2 (N − 1)).
2. For X ∼ NBin(r, p), E(X) = r(1 − p)/p, and V(X) = r(1 − p)/p^2.

Definition 3.24: Correlation between X and Y

Correlation between random variables X and Y, denoted by


ρ_{X,Y}, is defined as follows:

ρ_{X,Y} = C((X − E(X))/√V(X), (Y − E(Y))/√V(Y)) = C(X, Y) / (σ_X σ_Y)

Theorem 3.10

For any pair of random variables X and Y, the following holds:

−1 ≤ ρ( X, Y ) ≤ 1

Theorem 3.11

If X and Y are independent, they are uncorrelated i.e.


C( X, Y ) = 0.

Definition 3.25: Conditional Expectation of X given Y = y

Conditional Expectation of X given an event Y = y, is denoted


by E( X |Y = y), and is defined as follows:

E(X|Y = y) = ∑_{x ∈ X(S)} x p_{X|Y}(x|y)

Note: E(X|Y) is a random variable.

Theorem 3.12

E( X ) = E(E( X |Y ))

Theorem 3.13

V( X ) = V(E( X |Y )) + E(V( X |Y ))

Definition 3.26: Median of a Random Variable

We say that m X is the median of random variable X if

1
Pr( X ≤ m X ) ≥
2
1
Pr( X ≥ m X ) ≥
2

Solved Problems

Example 3.1: [Click]

Assume that X is uniformly distributed on {1, 2, . . . , n} and


Y is uniformly distributed on {2, 4, . . . , 2n}. Assuming that X
and Y are independent random variables, what is the variance
of XY?

Solution 3.1

V(XY) = E(X^2 Y^2) − (E(XY))^2
      = E(X^2) E(Y^2) − (E(X) E(Y))^2
      = 4 E(X^2) E(X^2) − 4 (E(X))^4
      = 4 [ ((n + 1)(2n + 1)/6)^2 − ((n + 1)/2)^4 ]

Example 3.2: [Click]

Let X and Y be the random variables which respectively are


the number of tosses to see your first head and the number of
tosses to see your first tail. What is the covariance between X
and Y, Cov( X, Y )?

Solution 3.2

X and Y count the number of tosses up to and including the first head and the first tail respectively, so each is geometrically distributed (on {1, 2, . . .}) with parameter p = 1/2, and E(X) = E(Y) = 1/p = 2. To find Cov(X, Y), we now find E(XY). Observe that

XY = Y if X = 1, and XY = X if X > 1.

E(XY) = E(XY | X = 1) Pr(X = 1) + E(XY | X > 1) Pr(X > 1)
      = E(Y | X = 1) Pr(X = 1) + E(X | X > 1) Pr(X > 1)
      = (1 + 2)(0.5) + (1 + 2)(0.5)
      = 3

Therefore, Cov(X, Y) = 3 − 4 = −1.

Example 3.3: [Click]

Can Cov(X, X + Y) be equal to 0?

Solution 3.3

Yes, it is possible. Consider any random variable X and let Y = −X. Then X + Y = 0 is a constant, so Cov(X, X + Y) = 0.

Example 3.4: [Click]

For n independent Bernoulli trials, each with the probabil-


ity p of success, let X be the number of successes so that
X ∼ Binom(n, p) and let Y be the number of failures i.e.
Y = n − X ∼ Binom(n, 1 − p). Find the expected value E( XY )
and the covariance Cov( X, Y )?

Solution 3.4

Given that X ∼ Bin(n, p) and Y = n − X ∼ Bin(n, 1 − p), the correlation coefficient between X and Y (denoted by ρ_{X,Y}) equals −1. Consequently, the covariance is

Cov(X, Y) = ρ_{X,Y} √V(X) √V(Y) = −np(1 − p), and
E(XY) = Cov(X, Y) + E(X) E(Y) = −np(1 − p) + n^2 p(1 − p) = n(n − 1) p(1 − p)

Example 3.5: [Click]

Show that the distribution of a random variable X with possi-


ble values 0, 1, 2 is determined by µ1 = E( X ) and µ2 = E( X 2 )?

Solution 3.5

Given that µ1 = E( X ) and µ2 = E( X 2 ), and X take values in


the set {0, 1, 2}, we can write

0 · Pr( X = 0) + 1 · Pr( X = 1) + 2 · Pr( X = 2) = µ1


0 · Pr( X = 0) + 1 · Pr( X = 1) + 4 · Pr( X = 2) = µ2
Pr( X = 0) + Pr( X = 1) + Pr( X = 2) = 1

Solving the above system of equations, we get


Pr(X = 2) = (µ_2 − µ_1)/2
Pr(X = 1) = 2µ_1 − µ_2
Pr(X = 0) = 1 − (3µ_1 − µ_2)/2

Example 3.6: [Click]

X ∼ Pois(2), Y = min( X, 10). What is the probability distribu-


tion of Y?

Solution 3.6

If F_X and F_Y denote the CDFs of X and Y respectively, and Y = min(X, 10), we can use the CDF of X to get the CDF of Y in the following way: F_Y(t) = F_X(t) for t < 10, and F_Y(t) = 1 for t ≥ 10.

Example 3.7: [Click]

Suppose the occurrence of A makes it more likely that B will


occur. In that case, show that the occurrence of B makes it
more likely that A will occur i.e. show that if Pr( B| A) > Pr( B),
then it is also true that Pr( A| B) > Pr( A).

Solution 3.7

Just rewrite the equality


Pr( A ∩ B) = Pr( A) Pr( B| A) = Pr( B) Pr( A| B) as
Pr(B|A)/Pr(B) = Pr(A|B)/Pr(A)

and the result follows.

Example 3.8: [Click]

6 independent fair coins are tossed in a row. What is the ex-


pected number of consecutive HH pairs?

Solution 3.8

For j ∈ {2, 3, 4, 5, 6}, define I_j = 1 if the outcomes of the (j − 1)th and jth tosses are both heads, and I_j = 0 otherwise.

Note that E(I_j) = Pr(I_j = 1) = 1/4.

Let N denote the number of consecutive HH pairs. We can write N as N = I_2 + I_3 + I_4 + I_5 + I_6. Therefore,

E(N) = E(I_2) + E(I_3) + E(I_4) + E(I_5) + E(I_6)

By symmetry,

E(N) = 5 E(I_2) = 5/4 = 1.25
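A simulation check of this expectation (my own sketch, standard library only):

import random

def average_hh_pairs(trials=100_000, n_tosses=6):
    total = 0
    for _ in range(trials):
        tosses = [random.choice('HT') for _ in range(n_tosses)]
        # count adjacent positions where both tosses are heads
        total += sum(tosses[j - 1] == 'H' and tosses[j] == 'H'
                     for j in range(1, n_tosses))
    return total / trials

print(average_hh_pairs())  # typically close to 1.25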

Example 3.9: [Click]

A coin is weighted so that the probability of obtaining a head


in a single toss is 0.3. If the coin is tossed 35 times, then what
is the probability of obtaining between 9 and 14 heads exclu-
sively?

Solution 3.9

Here X ∼ Bin(35, 0.3); consequently the probability of the event that a ≤ X ≤ b is

Pr(a ≤ X ≤ b) = ∑_{j=a}^{b} C(35, j) 0.3^j 0.7^{35−j}

where a and b are integers satisfying 0 ≤ a ≤ b ≤ 35. For "between 9 and 14 exclusively" we take a = 10 and b = 13.
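With a = 10 and b = 13, the sum can be evaluated directly (a sketch of mine using math.comb):

from math import comb

n, p = 35, 0.3
prob = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(10, 14))
print(prob)  # probability of strictly between 9 and 14 heads, roughly 0.5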

Example 3.10: [Click]

A sample of 4 items are selected randomly from a box con-


taining 12 items of which 5 are defective, find the expected
number of defective items.

Solution 3.10

Let Ij be the indicator random variable that takes value 1 if


jth item in the sample (of size 4) is defective, and 0 other-
wise. Observe that the number of defective items is equal to
I1 + I2 + I3 + I4 . By linearity of expectation, expected number
of defective items is equal to E( I1 ) + E( I2 ) + E( I3 ) + E( I4 ).
By symmetry, it is equal to 4 E(I_1). Now, the expected value of I_1 is equal to the probability that the first item is defective, i.e. E(I_1) = Pr(I_1 = 1) = 5/12. Therefore, the expected number of defective items is equal to 4 × 5/12 = 5/3.

Example 3.11: [Click]

If X and Y are uncorrelated random variables with variance 1,


then what is the variance of X − Y?

Solution 3.11

V( X − Y ) = V( X ) + V(Y ) − 2C( X, Y ) = 1 + 1 − 2(0) = 2

Example 3.12: [Click]

Four identical objects are distributed randomly into 3 distinct


boxes. Let X denote the number of objects that end up in the
first box. What is the expected value of X?

Solution 3.12

Given that
X = number of objects that end up in the first box
Also let
Y = number of objects that end up in the second box
Z = number of objects that end up in the third box
Since four identical objects are distributed randomly into these
3 distinct boxes, we have
X+Y+Z = 4
Therefore,
E ( X ) + E (Y ) + E ( Z ) = 4
By symmetry, E(X) = E(Y) = E(Z), so we get E(X) = 4/3.

Example 3.13: [Click]

Let X_1, X_2, X_3 be the numbers on three cards drawn without replacement from a pack containing 9 cards numbered 5, 6, 7, 8, 9, 10, 11, 12 and 13. Find the probability mass function of X = max(X_1, X_2, X_3).

Solution 3.13

Define X = max(X_1, X_2, X_3). Here is the probability mass function of X: p_X(k) = C(k − 5, 2) / C(9, 3) for k = 7, 8, . . . , 13.

Example 3.14: [Click]

There are 10 items in a box, 6 of which are defective. 4 items


are selected randomly without replacement. What is the ex-
pected number of selected defective items?

Solution 3.14

Let Xi be the indicator random variable that takes value 1 if


the ith item is defective and 0 otherwise, where i ∈ {1, 2, 3, 4}.
Note that X1 , X2 , X3 and X4 are not independent, but they
have the same distribution:
Pr(X_i = 1) = 6/10 = 3/5
Pr(X_i = 0) = 1 − 6/10 = 2/5

for each i ∈ {1, 2, 3, 4}. This follows from the symmetry of positions. The item in the second position is just as likely to be defective as the one in the first position, and the same holds for the other positions. Therefore, the expected value of X_i is equal to the probability that the ith item is defective, i.e. 6/10. If X denotes the number of defective items selected, then we can write X as X = X_1 + X_2 + X_3 + X_4. By linearity of expectation, it follows that

E(X) = E(X_1) + E(X_2) + E(X_3) + E(X_4) = 12/5 = 2.4.

Example 3.15: [Click]

What is the expected number of times that heads will appear


when a fair coin is tossed three times?

Solution 3.15

Let Xi be the indicator random variable that takes value 1


if heads appear on the ith toss and 0 otherwise. Therefore,
expected value of Xi is equal to the probability that heads ap-
pears on the ith toss i.e. 12 . If X denotes the number of times
that heads will appear when a fair coin is tossed three times,
then we can write X as X = X1 + X2 + X3 . By linearity of ex-
pectation, it follows that E( X ) = E( X1 ) + E( X2 ) + E( X3 ) = 23 .

Example 3.16: [Click]

5 men and 5 women are seated randomly in a single row of chairs. What is the expected number of women sitting next to at least 1 man?

Solution 3.16

Let the seats be numbered 1, 2, 3, . . . , 10 in order of arrange-


ment. Define a random variable I1 in the following way: I1
takes value 1 if seat number 1 is occupied by a woman and
seat number 2 is occupied by a man, and 0 otherwise. Like-
wise, we define random variables Ij for 2 ≤ j ≤ 9 in the similar
way: Ij takes value 1 if seat number j is occupied by a woman
and at least one of the two seats numbered ( j − 1) and ( j + 1)
is occupied by a man, and takes value 0 otherwise. I10 takes
value 1 if seat number 10 is occupied by a woman and seat
number 9 is occupied by a man, and 0 otherwise. We define N
to be the number of women seating next to at least one man.
Therefore,

N = I1 + I2 + · · · + I9 + I10

By linearity of expectation,

E( N ) = E( I1 ) + E( I2 ) + · · · + E( I9 ) + E( I10 )

Notice that while the I_j's are not independent, this is irrelevant for E(N). By symmetry,

E(I_1) = E(I_10) and E(I_2) = · · · = E(I_9)

So, we just need to find E(I_1) and E(I_2).

E(I_1) = Pr(I_1 = 1) = C(5, 1) C(5, 1) 8! / 10! = 5/18

E(I_2) = Pr(I_2 = 1)
       = 1 − Pr(I_2 = 0)
       = 1 − [Pr(seat number 2 is occupied by a man) + Pr(seat numbers 1, 2 and 3 are all occupied by women)]
       = 1 − [C(5, 1) 9! / 10! + C(5, 3) 3! 7! / 10!]
       = 1 − [1/2 + 1/12]
       = 5/12

Therefore,

E(N) = E(I_1) + E(I_2) + · · · + E(I_9) + E(I_10)
     = 2 × E(I_1) + 8 × E(I_2) = 5/9 + 10/3 = 35/9

Example 3.17: [Click]

What is the expected number of coin flips until you get 3


heads in a row?

Solution 3.17

Let N0 denotes the number of coin flips required to get 3 heads


in a row starting from an initial state or any other state in
which the last coin flip resulted in a tail. Also let N1 denotes
the number of coin flips required to get 3 heads in a row start-
ing from a state where we have observed one head in the only
toss so far or any other state in which the last two coin flips
resulted in a tail followed by a head. Likewise, let N2 denotes
the number of coin flips required to get 3 heads in a row start-
ing from a state where we have observed one tail followed by
two consecutive heads in the last three flips, or simply two
consecutive heads if we have just flipped the coin twice so far.
Let us use ni to denote E ( Ni ), and we just need to solve the
system of equations to get the required quantity.

n0 = 1 + 0.5n0 + 0.5n1
n1 = 1 + 0.5n2 + 0.5n0
n2 = 1 + 0.5n0

Solving the above system yields:

n0 = 14, n1 = 12, n2 = 8

Therefore, the expected number of coin flips until we get three


heads in a row is 14.
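The three equations can also be solved mechanically; a small sketch (mine, assuming NumPy) reproduces n0 = 14, n1 = 12, n2 = 8:

import numpy as np

# Rearranged as a linear system A @ [n0, n1, n2] = b:
#   0.5 n0 - 0.5 n1          = 1
#  -0.5 n0 +     n1 - 0.5 n2 = 1
#  -0.5 n0          +     n2 = 1
A = np.array([[0.5, -0.5, 0.0],
              [-0.5, 1.0, -0.5],
              [-0.5, 0.0, 1.0]])
b = np.ones(3)
print(np.linalg.solve(A, b))  # [14. 12.  8.]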

Example 3.18: [Click]

Let X_1, X_2, X_3 be an i.i.d. random sample of size 3 drawn from a population with a Bern(p) distribution. What is the distribution of Y = max(X_1, X_2, X_3)?

Solution 3.18

Notice that random variable Y takes value 0 when X1 = X2 =


X3 = 0, and 1 otherwise. Therefore, Pr(Y = 0) = (1 − p)3 and
Pr(Y = 1) = 1 − (1 − p)3 .

Example 3.19: [Click]

The random variable X takes the values −1 and 1, each with


probability 0.5. What is the covariance between X and X 2 ?

Solution 3.19

Given that X takes only the two values −1 and 1, X^3 = X. Also, X takes values −1 and 1 with equal probability, so E(X) = 0. It follows that C(X, X^2) = E(X^3) − E(X^2) E(X) = E(X)(1 − E(X^2)) = 0.

Example 3.20: [Click]

A random variable X has binomial distribution with mean 4


and variance 2.4. What is the probability that X is positive?

Solution 3.20

Given that X ∼ Bin(n, p), find n and p by solving the follow-


ing:

np = 4, np − np^2 = 2.4

and you will get

n = 10, p = 0.4

So, Pr(X > 0) = 1 − Pr(X = 0) = 1 − 0.6^10.

A random variable X takes only two values, 0 and 1, with Pr(X = 0) = 0.3. What is the value of E(X^11)?
4 | Continuous Random Variables

Definition 4.1: Continuous Random Variable

A random variable X is called continuous if there is a non-


negative function f X , called the probability density function of
X, or PDF for short, such that
Pr(a < X < b) = ∫_a^b f_X(x) dx

for all a, b. Note that to qualify as a PDF, a function f_X must be non-negative, i.e., f_X(x) ≥ 0 for every x, and must also have the normalization property

∫_{−∞}^{∞} f_X(x) dx = Pr(−∞ < X < ∞) = 1

Definition 4.2: Expectation and Variance of a Continuous Random Variable

E(X) = ∫_{−∞}^{∞} x f_X(x) dx

V(X) = E(X − E(X))^2 = E(X^2) − (E(X))^2


Definition 4.3: Uniform Random Variable

X ∼ U[a, b] if

f_X(x) = 1/(b − a) if a ≤ x ≤ b, and f_X(x) = 0 otherwise.


Theorem 4.1: Transformation of a Continuous Random Variable

Let X be a continuous random variable with PDF f_X, and consider a transformation Y = g(X), where g is differentiable and monotonic. Then the PDF of Y is given by

f_Y(y) = f_X(x) |dx/dy|

where y = g(x).

Theorem 4.2: Universality property of Uniform

If X is a continuous random variable with CDF F_X, then F_X(X) ∼ U[0, 1], i.e. the transformed random variable F_X(X) follows a uniform distribution.

Definition 4.4: Normal Random Variable

X ∼ N(µ, σ^2) if it has the density function:

f_X(x) = (1/(σ√(2π))) e^{−(1/2)((x − µ)/σ)^2}

(Recall the 68%-95%-99.7% rule.)

Definition 4.5: Moment Generating Function (MGF)

A random variable X has MGF

M_X(t) = E(e^{tX}),

if this is finite on some interval (−a, a), a > 0. Not only can a moment-generating function be used to find moments of a random variable, it can also be used to identify which probability distribution a random variable follows.

Theorem 4.3

A moment-generating function uniquely determines the proba-


bility distribution of a random variable.

Theorem 4.4

If X and Y are independent then MX +Y (t) = MX (t) MY (t).

Theorem 4.5
The MGF of X ∼ N(µ, σ^2) is M_X(t) = e^{µt + σ^2 t^2 / 2}.

Definition 4.6: Exponential Random Variable

X ∼ Expo(λ) if it has the density function:

f_X(x) = λ e^{−λx} if x > 0, and f_X(x) = 0 otherwise.

Its CDF is

F_X(x) = ∫_{−∞}^{x} f_X(t) dt = 1 − e^{−λx} if x > 0, and F_X(x) = 0 otherwise.

Theorem 4.6

Consider the transformation Y = λX where X ∼ Expo(λ). Then Y ∼ Expo(1) and E(Y) = V(Y) = 1. Also, the MGF of Y is M_Y(t) = 1/(1 − t).

Theorem 4.7

The MGF of X ∼ Expo(λ) is M_X(t) = λ/(λ − t) for t < λ.

Definition 4.7: Memoryless Property

We say that random variable X satisfies memoryless property


if

Pr( X ≥ s + t| X ≥ s) = Pr( X ≥ t)

for all s, t ∈ R+ .

Theorem 4.8

A continuous positive random variable X satisfies memory-


less property if and only if it is distributed Expo(λ) for some
λ > 0.

Definition 4.8: Chi-square Random Variable

We say X ∼ χ^2(n) if X = ∑_{i=1}^{n} Z_i^2 where the Z_i's are i.i.d. N(0, 1). Note that χ^2(1) = Gamma(1/2, 1/2) and χ^2(n) = Gamma(n/2, 1/2).

Definition 4.9: Student tn (Gosset)

We say T ∼ tn if
T = Z / √(X/n)
where Z ∼ N (0, 1) and X ∼ χ2 (n). Also, Z and X are indepen-
dent.

Definition 4.10: Joint, Marginal and Conditional Distribution

A random vector ( X, Y ) is called continuous if there is a non-


negative function f X,Y , called the joint probability density
function of X, Y, or joint PDF for short, such that
Pr(a < X < b, c < Y < d) = ∫_c^d ∫_a^b f_{X,Y}(x, y) dx dy

We define the joint CDF of (X, Y) by

F_{X,Y}(x, y) = Pr(X ≤ x, Y ≤ y)

The marginal PDF of X, denoted by f_X(x), can be obtained in the following way:

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy

For y with f_Y(y) > 0, the conditional PDF of X given Y = y is defined by

f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y)

Definition 4.11: Conditional Expectation of X given Y = y

Conditional Expectation of X given Y = y, denoted by


E( X |Y = y), is defined as follows:
E(X|Y = y) = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx

Note: E(X) = E(E(X|Y)) and V(X) = V(E(X|Y)) + E(V(X|Y)) hold for the continuous case as well.

Definition 4.12: Beta Random Variable

X ∼ Beta( a, b), where a, b > 0 if its density is given by



f_X(x) = c x^{a−1} (1 − x)^{b−1} if 0 < x < 1, and f_X(x) = 0 otherwise,

where c is a normalizing constant whose value depends on a


and b.

Definition 4.13: Gamma function


Γ(a) = ∫_0^∞ x^{a−1} e^{−x} dx
for all real a > 0.

Theorem 4.9

Γ(n) = (n − 1)! where n is a positive integer.

Definition 4.14: Gamma Random Variable

We say X ∼ Gamma( a, 1) if its density is



f_X(x) = (1/Γ(a)) x^{a−1} e^{−x} if x > 0, and f_X(x) = 0 otherwise.

We say Y ∼ Gamma(a, λ) if Y = X/λ for X ∼ Gamma(a, 1). The density of Y is

f_Y(y) = (λ/Γ(a)) (λy)^{a−1} e^{−λy} if y > 0, and f_Y(y) = 0 otherwise.

Theorem 4.10: Transformations of random vectors

Given a random vector X = ( X1 , X2 ), and a differentiable func-


tion g : R2 → R2 such that a random vector Y = (Y1 , Y2 ) =
g( X1 , X2 ). The joint PDF of Y can be determined by
" ∂x ∂x1
#
1
∂y1 ∂y2
f Y (y1 , y2 ) = f X ( x1 , x2 )| ∂x2 ∂x2 |
∂y1 ∂y2

where (y1 , y2 ) = g( x1 , x2 ).

Theorem 4.11: Transformations of random vectors

Consider two independent random variables X ∼ Gamma( a, 1)


and Y ∼ Gamma(b, 1). Then X + Y ∼ Gamma( a + b, 1). If we
let T = X + Y and W = X/(X + Y); then T and W are independent. Also, W ∼ Beta(a, b). We also get the normalizing constant for Beta(a, b) as Γ(a + b)/(Γ(a)Γ(b)).

Solved Problems

Example 4.1: [Click]

Find the E(min(2X − Y, X + Y )) when X and Y are indepen-


dently and identically distributed uniform random variables
on [0, 1].

Solution 4.1


min(2X − Y, X + Y) = 2X − Y if Y > X/2, and X + Y if Y ≤ X/2.

Therefore,

E[min(2X − Y, X + Y)] = ∫_0^1 ∫_{x/2}^1 (2x − y) dy dx + ∫_0^1 ∫_0^{x/2} (x + y) dy dx = 5/12

Example 4.2: [Click]

Let X be a random variable with a uniform distribution over


[0, 1] ∪ [3, 4]. What is the Cumulative distribution function
(CDF) of X?

Solution 4.2

Since X is uniformly distributed over [0, 1] ∪ [3, 4], its probabil-


ity density function (PDF) is

f_X(x) = 1/2 for x ∈ [0, 1] ∪ [3, 4], and f_X(x) = 0 otherwise.

Now we can obtain the CDF from the PDF in this way:

F_X(t) = Pr(X ≤ t) = ∫_{−∞}^{t} f_X(x) dx =
  0 for t ≤ 0
  t/2 for 0 < t ≤ 1
  1/2 for 1 < t ≤ 3
  1/2 + (t − 3)/2 for 3 < t < 4
  1 for t ≥ 4

Example 4.3: [Click]

A random variable Y has a uniform distribution over the inter-


val (0, θ ). What is the expected value and variance of Y?

Solution 4.3

Given that Y ∼ Unif(0, θ ) we can find its expected value and


variance in the following way :

E(Y) = ∫_0^θ y (1/θ) dy = θ/2

and

E(Y^2) = ∫_0^θ y^2 (1/θ) dy = θ^2/3

Consequently, the variance is

V(Y) = E(Y^2) − (E(Y))^2 = θ^2/12

Example 4.4: [Click]

What is Pr(Y^2 > X > t), for t ∈ (0, 1), when X and Y are independent Unif[0, 1] random variables?



Solution 4.4

Pr(Y^2 > X > t) = ∫_0^1 Pr(Y^2 > X > t | Y = y) f_Y(y) dy
               = ∫_0^1 Pr(y^2 > X > t) dy
               = ∫_{√t}^1 Pr(y^2 > X > t) dy
               = ∫_{√t}^1 (y^2 − t) dy
               = 1/3 − t + (2/3) t^{3/2}

Example 4.5: [Click]

Given that X ∼ Unif(0, 1) and Y | X = p ∼ Binom(10, p). Find


the variance of Y.

Solution 4.5

We are given that X ∼ Unif(0, 1) and Y | X = p ∼ Binom(10, p).


We can find the variance of Y using the following law :
V(Y) = V(E(Y|X)) + E(V(Y|X))

Here

V(E(Y|X)) = V(10X) = 100/12
E(V(Y|X)) = E(10X(1 − X)) = 10 (1/2 − 1/3) = 10/6

So, V(Y) = 100/12 + 10/6 = 10.

Example 4.6: [Click]

Suppose X ∼ Unif(0, 3) and Y ∼ Unif(0, 4) and they are inde-


pendent random variables, what is the probability that X < Y?

Solution 4.6

Given that X ∼ Unif(0, 3) and Y ∼ Unif(0, 4) and they are


independent random variables, we can find the required prob-
ability in the following way:

Pr(X < Y) = E(Pr(X < Y | X))
          = ∫_0^3 Pr(Y > x | X = x) f_X(x) dx
          = ∫_0^3 (1/3) Pr(Y > x) dx
          = ∫_0^3 (1/3) ((4 − x)/4) dx
          = ∫_0^3 (4 − x)/12 dx = 5/8

Example 4.7: [Click]

Can two dependent random variables become independent


after you condition them on a third random variable?

Solution 4.7

Suppose that, conditional on Z = z, X and Y are independently and identically distributed Unif(0, z) random variables, and let Z ∼ Unif(0, 1). It follows from this construction that X and Y are dependent, but X | Z = z and Y | Z = z are independent.

Example 4.8: [Click]

If X ∼ N (0, 1), what is the distribution of Y = e X ?

Solution 4.8

If X ∼ N(0, 1) and Y = e^X, the density of Y is

f_Y(y) = f_X(x) |dx/dy|, where y = e^x.

Therefore, for y > 0,

f_Y(y) = f_X(ln y) (1/y) = (1/(y√(2π))) e^{−(ln y)^2/2}

Y = e^X, when X is normally distributed, is commonly referred to as a log-normal random variable (because ln Y = X has a normal distribution).

Example 4.9: [Click]

Given that Y ∼ N (µY , σY2 ), and X = aeY for a > 0, what is the
density function of X?

Solution 4.9

Given that Y ∼ N(µ_Y, σ_Y^2) and X = a e^Y for a > 0, the density function of X is

f_X(x) = f_Y(y) |dy/dx|, where x = a e^y.

Therefore, for x > 0,

f_X(x) = f_Y(ln x − ln a) (1/x) = (1/(x√(2π) σ_Y)) e^{−(ln x − ln a − µ_Y)^2/(2σ_Y^2)}

Example 4.10: [Click]

How can the mode of uniform distribution be determined?

Solution 4.10

Let f_X denote the density function of random variable X. A mode m of the distribution of X solves the following:

max_x f_X(x)    (1)

For a uniform random variable X ∼ Unif(0, 1), the density is

f_X(x) = 1 if 0 ≤ x ≤ 1, and f_X(x) = 0 elsewhere.

Therefore, every point in the interval [0, 1] solves (1). Hence all values in the interval [0, 1] are modes of the distribution of X.

Example 4.11: [Click]

Let Y be an exponential random variable with mean 1/θ where


θ is positive. The conditional distribution of X given Y = λ has
Poisson distribution with mean λ. Then the variance of X is?

Solution 4.11

Given that Y ∼ Expo(θ ), where θ > 0 and X |Y = λ ∼ Pois(λ),


variance of X can be computed as follows:
V(X) = E(V(X|Y)) + V(E(X|Y)) = E(Y) + V(Y) = 1/θ + 1/θ^2

Example 4.12: [Click]

Let X and Y be i.i.d Unif(0, 1) random variables, then what is


Pr(Y < ( X − 0.5)2 )?

Solution 4.12

Given that X and Y are iid Unif(0, 1), the joint density of X
and Y is

f_{X,Y}(x, y) = 1 if 0 < x < 1 and 0 < y < 1, and 0 elsewhere,

and the probability of the event Y < (X − 0.5)^2 is equal to

Pr(Y < (X − 0.5)^2) = ∫_0^1 ∫_0^{(x−0.5)^2} 1 dy dx
                    = ∫_0^1 (x − 0.5)^2 dx
                    = 2 ∫_{0.5}^1 (x − 0.5)^2 dx
                    = 2 ∫_0^{0.5} x^2 dx
                    = 1/12

Example 4.13: [Click]

Suppose A1 ∼ Unif(0, 1), A2 | A1 ∼ Unif(0, A1 ), A3 | A2 ∼


Unif(0, A2 ), . . ., An | An−1 ∼ Unif(0, An−1 ). Given that S =
A1 + A2 + · · · + An + · · · , find its expected value.

Solution 4.13

E(S) = ∑_{i=1}^{∞} E(A_i)

where

E(A_1) = 1/2
E(A_2) = E(E(A_2 | A_1)) = E(A_1/2) = (1/2) E(A_1) = 1/4
E(A_3) = E(E(A_3 | A_2)) = E(A_2/2) = (1/2) E(A_2) = 1/8

Likewise, by induction, if E(A_n) = 1/2^n then E(A_{n+1}) = 1/2^{n+1}. Here is the proof:

E(A_{n+1}) = E(E(A_{n+1} | A_n)) = E(A_n/2) = (1/2) E(A_n) = 1/2^{n+1}

Therefore,

E(S) = ∑_{i=1}^{∞} E(A_i) = ∑_{i=1}^{∞} 1/2^i = 1.

Example 4.14: [Click]

Given that Y | X = x ∼ Unif(x − 1, x + 1), E(X) = 1 and V(X) = 5/3, find E(Y) and V(Y).

Solution 4.14

E(Y | X = x) = x
V(Y | X = x) = 1/3

So,

E(Y) = E(E(Y|X)) = E(X) = 1

and

V(Y) = V(E(Y|X)) + E(V(Y|X)) = V(X) + E(1/3) = 5/3 + 1/3 = 2.

Example 4.15: [Click]

If a unit length stick is broken into two pieces, what is the probability that the longer portion is at least three times as long as the shorter one?

Solution 4.15

Let X ∼ Unif(0, 1) denote the point on the stick at which it is broken into two pieces. The probability that the longer portion is at least three times as long as the shorter one is equal to the probability that max(X, 1 − X) ≥ 3 min(X, 1 − X).

Pr(max(X, 1 − X) ≥ 3 min(X, 1 − X))
= Pr(X ≥ 3(1 − X)) + Pr(1 − X ≥ 3X)
= Pr(X ≥ 3/4) + Pr(X ≤ 1/4)
= 1/4 + 1/4
= 1/2

Example 4.16: [Click]

Given the joint density of X and Y,



f_{X,Y}(x, y) = 24xy if 0 < x < 1, 0 < y < 1 − x, and 0 elsewhere.

What is E( X )?

Solution 4.16

The density of X is

f_X(x) = ∫_0^{1−x} 24xy dy = 12x(1 − x)^2 if 0 < x < 1, and 0 elsewhere.

Therefore, the expected value of X is

E(X) = ∫_0^1 12x^2 (1 − x)^2 dx = 2/5

Example 4.17: [Click]

Given the joint density of X and Y,



f_{X,Y}(x, y) = 2(1 − x) if 0 < x < 1, 0 < y < 1, and 0 elsewhere.

Determine the density function of random variable U = XY.



Solution 4.17

To find the density of the random variable U = XY, we can first find its CDF. Observe that the range of U is (0, 1). For u ∈ (0, 1), the CDF of U at u is

F_U(u) = Pr(U ≤ u) = 1 − Pr(U > u)

To find Pr(U > u) we need to integrate the joint density f_{X,Y} over values of (x, y) in the region where xy > u (shown in Figure 4.1). For u ∈ (0, 1),

Pr(U > u) = ∫_u^1 ∫_{u/x}^1 2(1 − x) dy dx
          = ∫_u^1 2(1 − x)(1 − u/x) dx
          = 1 − u^2 + 2u ln u

Therefore,

F_U(u) = 1 − Pr(U > u) = u^2 − 2u ln u if u ∈ (0, 1), 0 if u ≤ 0, and 1 if u ≥ 1.

To get the density of U, we can differentiate the CDF:

f_U(u) = 2(u − ln u − 1) if u ∈ (0, 1), and 0 otherwise.

Figure 4.1: the region xy > u in the unit square.
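A quick simulation can corroborate the CDF derived above (my own sketch, assuming NumPy; X is sampled from the marginal density 2(1 − x) by the inverse-transform method):

import numpy as np

rng = np.random.default_rng(1)
size = 200_000
x = 1 - np.sqrt(rng.uniform(size=size))   # inverse transform for f_X(x) = 2(1 - x)
y = rng.uniform(size=size)                # Y ~ Unif(0, 1), independent of X
u = x * y

for t in (0.1, 0.3, 0.6):
    empirical = (u <= t).mean()
    formula = t**2 - 2 * t * np.log(t)    # F_U(t) derived above
    print(t, round(empirical, 4), round(formula, 4))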
5 | Topics in Random Variables

Definition 5.1: Mixture Random variables

Theorem 5.1: Statistical Inequalities

1. Cauchy-Schwarz Inequality. For any two random variables X and Y defined on the same sample space:

   |E(XY)| ≤ √(E(X^2) E(Y^2))

2. Jensen's Inequality. If g is a convex function, then

   E(g(X)) ≥ g(E(X))

3. Markov's Inequality. For any a > 0,

   Pr(|X| ≥ a) ≤ E(|X|)/a

4. Chebyshev's Inequality. For any a > 0,

   Pr(|X − µ| ≥ a) ≤ V(X)/a^2

   where µ = E(X).

Definition 5.2: Convergence in Probability

Let Y1 , Y2 , . . . be a sequence of random variables (not necessar-


ily independent), and let a be a real number. We say that the
sequence Yn converges to a in probability, if for every ϵ > 0, we
have

lim Pr(|Yn − a| ≥ ϵ) = 0
n→∞
6 | Sampling

Definition 6.1: Sample Mean

Consider the sequence X1 , X2 , . . . , Xn of i.i.d. random variables


with mean µ and variance σ2 , we define the sample mean by

M_n = (X_1 + X_2 + · · · + X_n) / n

Theorem 6.1
Let X1 , X2 , . . . be i.i.d random variables with mean µ and vari-
ance σ2 , we have Mn converges to µ in probability.

Theorem 6.2: Monte Carlo Integration


Let f be a complicated function whose integral ∫_a^b f(x) dx we want to approximate. Assume that 0 ≤ f(x) ≤ c for all x ∈ [a, b], so that we know the integral is finite. The technique of Monte Carlo integration uses random samples to obtain approximations of definite integrals when exact integration methods are unavailable. The procedure is the following: suppose we pick i.i.d. points (X_1, Y_1), (X_2, Y_2), . . . , (X_n, Y_n) uniformly in the rectangle [a, b] × [0, c]. Define indicator r.v.s I_1, . . . , I_n by letting I_j = 1 if Y_j ≤ f(X_j) and I_j = 0 otherwise. Then the I_j are Bernoulli r.v.s whose success probability is precisely the ratio of the area below f to the area of the rectangle [a, b] × [0, c]:

p = E(I_j) = Pr(I_j = 1) = ∫_a^b f(x) dx / (c(b − a))

We can estimate p using (1/n) ∑_{j=1}^{n} I_j, and then estimate the desired integral by

∫_a^b f(x) dx ≈ c(b − a) (1/n) ∑_{j=1}^{n} I_j

Since the I_j's are i.i.d. with mean p, it follows from the law of large numbers that the estimate (1/n) ∑_{j=1}^{n} I_j converges to p in probability as the number of points approaches infinity.
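A minimal implementation of this procedure (my own sketch, assuming NumPy; the test integrand is an arbitrary choice):

import numpy as np

def mc_integrate(f, a, b, c, n=100_000, seed=0):
    # Estimate the integral of f over [a, b], assuming 0 <= f(x) <= c there
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, n)
    y = rng.uniform(0, c, n)
    p_hat = np.mean(y <= f(x))            # fraction of points under the curve
    return c * (b - a) * p_hat

# Example: the integral of x^2 over [0, 1] is 1/3
print(mc_integrate(lambda x: x**2, 0.0, 1.0, 1.0))   # close to 0.333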

Theorem 6.3: Convergence of empirical CDF

Let X_1, X_2, . . . , X_n be i.i.d. random variables with CDF F. For each number x, let R_n(x) count how many of X_1, . . . , X_n are less than or equal to x, that is,

R_n(x) = ∑_{j=1}^{n} I(X_j ≤ x)

Since the indicators I(X_j ≤ x) are i.i.d. with probability of success F(x), we know R_n(x) is binomial with parameters n and F(x). The empirical CDF of X_1, X_2, . . . , X_n is defined as

F̂_n(x) = R_n(x) / n

considered as a function of x. By the law of large numbers, F̂_n(x) converges to F(x) in probability and therefore can serve as a reasonable estimate of F(x) when n is large.
The empirical CDF is commonly used in nonparametric statistics, a branch of statistics that tries to understand a random sample without making strong assumptions about the family of distributions from which it originated.
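The empirical CDF at a point is just a sample proportion; a tiny sketch (mine, assuming NumPy):

import numpy as np

def empirical_cdf(sample, x):
    # F_hat_n(x): fraction of observations less than or equal to x
    return np.mean(np.asarray(sample) <= x)

rng = np.random.default_rng(2)
data = rng.normal(size=5_000)          # i.i.d. draws whose true CDF is Phi
print(empirical_cdf(data, 0.0))        # close to Phi(0) = 0.5
print(empirical_cdf(data, 1.0))        # close to Phi(1), about 0.841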

Theorem 6.4: The Central Limit Theorem

Let X1 , X2 , . . . be i.i.d random variables with mean µ and vari-


ance σ2 . Define
Z_n = (∑_{i=1}^{n} X_i − nµ) / (σ√n)

Then the CDF of Z_n converges to the standard normal CDF

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−t^2/2} dt

in the sense that

lim_{n→∞} Pr(Z_n ≤ z) = Φ(z)

for every z.

Theorem 6.5: Binomial Convergence to Normal

Let Y ∼ Bin(n, p). By the Central Limit Theorem, we can consider Y to be a sum of n i.i.d. Bern(p) r.v.s. Therefore, for large n, Y is approximately N(np, np(1 − p)).

If Y is a Binomial random variable with parameters n and p, n is large, and k, l are non-negative integers, then

Pr(k ≤ Y ≤ l) ≈ Φ((l + 1/2 − np)/√(np(1 − p))) − Φ((k − 1/2 − np)/√(np(1 − p)))
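The continuity-corrected approximation can be compared with the exact binomial sum (my own sketch; it uses only the standard library, with Φ computed from math.erf):

from math import comb, erf, sqrt

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, k, l = 100, 0.3, 25, 35
exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, l + 1))
mu, sd = n * p, sqrt(n * p * (1 - p))
approx = phi((l + 0.5 - mu) / sd) - phi((k - 0.5 - mu) / sd)
print(exact, approx)   # the two values should agree closely for this n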

Theorem 6.6: Poisson Convergence to Normal

Let Y ∼ Pois(n). By the Central Limit Theorem, we can consider Y to be a sum of n i.i.d. Pois(1) r.v.s. Therefore, for large n, Y is approximately N(n, n).

Theorem 6.7: Gamma Convergence to Normal

Let Y ∼ Gamma(n, λ). By the Central Limit Theorem, we can consider Y to be a sum of n i.i.d. Expo(λ) r.v.s. Therefore, for large n, Y is approximately N(n/λ, n/λ^2).

Theorem 6.8: Volatile Stock


Each day, a very volatile stock rises 70% or drops 50% in price,
with equal probabilities and with different days independent.
Let Yn be the stock price after n days, starting from an initial
value of Y0 = 100.
1. Explain why log Yn is approximately Normal for n large,
and state its parameters.
2. What happens to E(Yn ) as n → ∞?
3. Use the law of large numbers to find out what happens to
Yn as n → ∞.
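Not part of the original text: a small simulation sketch (assuming NumPy) that can be used to explore the three questions above.

import numpy as np

rng = np.random.default_rng(3)
n, paths = 250, 10_000
factors = rng.choice([1.7, 0.5], size=(paths, n))    # equally likely daily factors
log_y = np.log(100) + np.log(factors).sum(axis=1)    # log Y_n for each simulated path

print(log_y.mean(), log_y.std())    # log Y_n looks approximately normal (CLT on the log scale)
print(np.exp(log_y).mean())         # sample mean of Y_n is dominated by a few huge paths
print(np.median(np.exp(log_y)))     # a typical path ends up extremely close to 0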

Theorem 6.9

If Z_1, Z_2, . . . , Z_n is a random sample from a standard normal distribution, then
1. Z̄_n has a normal distribution with mean 0 and variance 1/n.
2. Z̄_n and ∑_{j=1}^{n} (Z_j − Z̄_n)^2 are independent.
3. ∑_{j=1}^{n} (Z_j − Z̄_n)^2 has a chi-square distribution with n − 1 degrees of freedom.

Theorem 6.10: Distribution of the sample variance

For i.i.d. X_1, X_2, . . . , X_n ∼ N(µ, σ^2), the sample variance is the r.v.

S_n^2 = (1/(n − 1)) ∑_{j=1}^{n} (X_j − X̄_n)^2

Show that

(n − 1) S_n^2 / σ^2 ∼ χ^2_{n−1}

Solved Problems

Example 6.1: [Click]

Let X1 , X2 , . . . be the sequence of i.i.d random variables with


Pr(X_i = 1) = 1/4
Pr(X_i = 2) = 3/4

Define X̄_n = (∑_{i=1}^{n} X_i)/n. What is lim_{n→∞} Pr(X̄_n ≤ 1.8)?

Solution 6.1
Given the data above, we can find the expectations and variances of X_i and X̄_n as follows:

E(X_i) = 7/4 = 1.75
V(X_i) = 3/16
E(X̄_n) = 7/4 = 1.75
V(X̄_n) = 3/(16n)

By Chebyshev's Inequality,

Pr(|X̄_n − 1.75| > 0.05) ≤ 75/n

Therefore, lim_{n→∞} Pr(|X̄_n − 1.75| > 0.05) = 0.
Working on the desired quantity Pr(X̄_n ≤ 1.8), we get

Pr(X̄_n ≤ 1.8) = 1 − Pr(X̄_n − 1.75 > 0.05)
             ≥ 1 − Pr(|X̄_n − 1.75| > 0.05)

Since lim_{n→∞} Pr(|X̄_n − 1.75| > 0.05) = 0, the above implies that lim_{n→∞} Pr(X̄_n ≤ 1.8) = 1.
Alternatively, we can also apply the weak law of large numbers, according to which X̄_n converges in probability to 1.75 and consequently lim_{n→∞} Pr(X̄_n ≤ 1.8) = 1 holds.
7 | Estimation

Two short code asides accompany this chapter: an R snippet that sums the integers 1 to 1000 with a sign depending on a condition (first with an explicit loop, then vectorised), and a Python snippet implementing a squared-error cost function and batch gradient descent (NumPy assumed).

# R: explicit for loop
x <- 1:1000
y <- rep(0, length(x))
for (i in 1:length(x)){
  if (x[i]%%3 == 0 | x[i] > 50){
    y[i] = x[i]
  } else {
    y[i] = -x[i]
  }
}
sum(y)

# R: vectorised version of the same computation
x <- 1:1000
y <- x*(x%%3 == 0 | x > 50) - x*(1 - (x%%3 == 0 | x > 50))
sum(y)

# Python: squared-error cost and batch gradient descent
import numpy as np

def cost(X, y, theta):
    m = len(y)
    J = (1/(2*m)) * (((X @ theta - y)**2).sum())
    return J

def gradientDescent(X, y, theta, alpha, iterations):
    m = len(y)
    J_history = np.zeros(iterations)
    for it in range(iterations):
        theta = theta - (((alpha/m) * (X @ theta - y).T @ X).T)
        J_history[it] = cost(X, y, theta)
    return theta, J_history

Example 7.1: [Click]

Suppose that we know that the random variable X is uniform with support [0, b]. Suppose that we observed X = 2.5, what is an unbiased estimate of b based on this single observation?

Solution 7.1

Given that X ∼ Unif[0, b], E(X) = b/2. Therefore, E(2X) = b. In other words, 2X is an unbiased estimator of b, and the corresponding estimate of b when the realization of X equals 2.5 is 2(2.5) = 5.

Example 7.2: [Click]

Given that Y1 , Y2 , Y3 are iid Unif(0, θ ), list some unbiased esti-


mators of θ.

Solution 7.2

Given that Y_1, Y_2, Y_3 are i.i.d. Unif(0, θ), all of the following are unbiased estimators of θ:
• θ̂_1 = (2/3)(Y_1 + Y_2 + Y_3)
• θ̂_2 = Y_1 + Y_2
• θ̂_3 = 2Y_1
• θ̂_4 = (4/3) max(Y_1, Y_2, Y_3)
• θ̂_5 = max(Y_1, Y_2, Y_3) + min(Y_1, Y_2, Y_3)
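A quick simulation (my own sketch, assuming NumPy) can be used to compare these estimators: all have mean close to θ, but their variances differ:

import numpy as np

rng = np.random.default_rng(4)
theta, reps = 5.0, 100_000
y = rng.uniform(0, theta, size=(reps, 3))          # Y1, Y2, Y3 for each replication

estimators = {
    "(2/3)(Y1+Y2+Y3)": (2 / 3) * y.sum(axis=1),
    "Y1 + Y2": y[:, 0] + y[:, 1],
    "2 Y1": 2 * y[:, 0],
    "(4/3) max": (4 / 3) * y.max(axis=1),
    "max + min": y.max(axis=1) + y.min(axis=1),
}
for name, est in estimators.items():
    print(name, round(est.mean(), 3), round(est.var(), 3))   # means near 5; variances differ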
