
Probability and Statistics Notes∗

Dr. Husney Parvez Sarwar


Department of Mathematics
Indian Institute of Technology Kharagpur
[email protected]; [email protected]


Caution: there might be errors/mistakes/typos. If you see some, then please email
me.

Contents

1 Syllabus
  1.1 Probability
  1.2 Random Variables
  1.3 Special Distributions
  1.4 Function of a Random Variable
  1.5 Joint Distributions
  1.6 Transformations
  1.7 Sampling Distributions
  1.8 Estimation
  1.9 Testing of Hypotheses

2 Motivation

3 Definition of probability and examples

4 Random Variable (RV)
  4.1 Properties of CDF
  4.2 Some inequalities
  4.3 Some special discrete distributions
  4.4 Distribution of Functions of a RV X

5 Joint distribution
  5.1 Conditional distributions in two variables
  5.2 Transformation of random variables
  5.3 Bivariate normal distribution

6 Various Convergence and sampling distributions
  6.1 The Law of Large Numbers

7 Estimation of parameters

8 Testing of hypotheses

1 Syllabus
1.1 Probability
Classical, relative frequency and axiomatic definitions of probability, addi-
tion rule and conditional probability, multiplication rule, total probability,
Bayes’ Theorem and independence, problems.

1.2 Random Variables


Discrete, continuous and mixed random variables, probability mass, proba-
bility density and cumulative distribution functions, mathematical expecta-
tion, moments, moment generating function, median and quantiles, Cheby-
shev’s inequality, problems.

1.3 Special Distributions


Discrete uniform, binomial, geometric, negative binomial, hypergeometric,
Poisson, continuous uniform, exponential, gamma, Weibull, Pareto, beta,
normal, Cauchy distributons, reliability of series and parallel systems, prob-
lems.

1.4 Function of a Random Variable


Distribution of function of a random variable, problems.

1.5 Joint Distributions


Joint, marginal and conditional distributions, product moments, correlation,
independence of random variables, bivariate normal distribution, problems.

1.6 Transformations
Functions of random vectors, distributions of sums of random variables,
problems.

1.7 Sampling Distributions


The Central Limit Theorem, distributions of the sample mean and the sam-
ple variance for a normal population, Chi-Square, t and F distributions,

problems.

1.8 Estimation
Unbiasedness, consistency, the method of moments and the method of max-
imum likelihood estimation, confidence intervals for parameters in one sam-
ple and two sample problems of normal populations, confidence intervals for
proportions, problems.

1.9 Testing of Hypotheses


Null and alternative hypotheses, the critical and acceptance regions, two
types of error, power of the test, the most powerful test and Neyman-Pearson
Fundamental Lemma, tests for one sample and two sample problems for
normal populations, test for proportions, Chi-square goodness of fit test
and its applications, problems.

2 Motivation
Probability and statistics have a plethora of applications in the real world, for example the chance of winning a game. Applications include:

• Data science

• Artificial Intelligence

• Machine Learning

• Weather forecast

• Insurance company while making policy

• The drug company: whether a drug is effective

3 Definition of probability and examples


Definition 1. (Random Experiment) Random experiment (or statis-
tical experiment) is an experiment in which

1. All the outcomes of the experiment are known in advance.

2. Any performance of the experiment results in an outcome that is not known in advance.

3. The experiment can be repeated under identical conditions.

Example 1. 1. Tossing a coin: All the outcomes are H, T .

2. Rolling of a dice: all the outcomes are 1, 2, 3, 4, 5, 6.

3. Drawing a card from the deck of 52 cards.

4. The length of life of a bulb produced by certain manufacturer.

Definition 2 (Sample space). A set which is the collection of all possible outcomes of a random experiment is known as the sample space of the experiment; it is denoted by Ω.

Example 2. For the previous example, the following are the sample space
Ω.

1. Ω = {H, T }.

2. Ω = {1, 2, 3, 4, 5, 6}.

3. Ω = {π : π is any permutation of 52 cards }.

4. Ω = [0, ∞).

Definition 3 (Classical definition of Probability). [Laplace 1812] If the sample space Ω of a random experiment is a finite set and A ⊂ Ω, then the probability of A is defined as P(A) = |A|/|Ω|, under the assumption that all outcomes are equally likely. Here | · | denotes the cardinality of a set.

Example 3. The sample space of tossing a coin is Ω = {H, T}. What is the probability of getting a head?

A = {H} ⊂ Ω, so P(A) = |A|/|Ω| = 1/2.

Example 4. Three coins are tossed simultaneously. What is the probability that at least two tails occur?
Solution: Ω = {HHH, HHT, TTH, TTT, HTT, HTH, THT, THH}, |Ω| = 8, |A| = 4, P(A) = 4/8 = 1/2.
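A quick Monte Carlo check of Example 4 (a sketch using only Python's standard library; the simulation is an illustration, not part of the notes):

```python
import random

def at_least_two_tails(trials=100_000):
    """Estimate P(at least two tails in three fair coin tosses) by simulation."""
    hits = 0
    for _ in range(trials):
        tosses = [random.choice("HT") for _ in range(3)]
        if tosses.count("T") >= 2:
            hits += 1
    return hits / trials

print(at_least_two_tails())  # should be close to the exact value 4/8 = 0.5
```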

Problem 1. What is the probability that a randomly chosen number from N = {1, 2, 3, · · · } will be an even number?

Solution: Ω_n = {1, 2, · · · , 2n}, A_n = {2, 4, · · · , 2n}, lim_{n→∞} Ω_n = N, lim_{n→∞} A_n = A, |A_n| = n, |Ω_n| = 2n. So P(A) = lim_{n→∞} n/(2n) = 1/2.

Definition 4 (Frequency definition of probability). If the sample space Ω of a random experiment is a countable set and A ⊂ Ω, then the probability of A is defined as

P(A) = lim_{n→∞} |A_n|/|Ω_n|, where lim_{n→∞} A_n = A and lim_{n→∞} Ω_n = Ω.

Definition 5. A set T is said to be countable if there exists a bijective map Φ : T → N, or T is finite.

Notations: ∃ := there exists, ∀ := for all, ∪ := union (or), ∩ := intersection (and).

Definition 6. (Algebra) A collection A of the subset of Ω is called an


algebra if

1. Ω ∈ A

2. A ∈ A (A ⊆ Ω) =⇒ A^c ∈ A

3. A, B ∈ A =⇒ A ∪ B ∈ A.

Example 5. Ω = {H, T}, A = {Ω, ϕ, {H}, {T}} = the power set of Ω is an algebra.

Example 6. Ω = {1, 2, 3, 4, 5, 6}, A = {ϕ, Ω, {1, 2}, {3, 4, 5, 6}} is an alge-


bra, this is not a power set of Ω.

Example 7. A = {ϕ, Ω} is an algebra.

Definition 7 (σ-algebra/σ-field). An algebra A of subsets of Ω is called a σ-algebra (σ-field) if {A_i} ⊆ A =⇒ ∪_{i=1}^∞ A_i ∈ A.

Example 8. A1 = {ϕ, Ω} is σ-algebra.

Example 9. A2 = power set of Ω is also a σ-algebra.

Example 10. A = {ϕ, Ω, A, Ac } is a σ-algebra.

Non-examples:

1. A = {Ω}.

2. Ω = [0, ∞), A = {ϕ, Ω, A_n = [0, 1 − 1/n), A_n^c ∀n}; this is an algebra, but ∪_{n=1}^∞ A_n = [0, 1) ∉ A, so it is not a σ-algebra.

Definition 8 (Axiomatic Definition of Probability) (Kolmogorov). If A is a σ-algebra of subsets of a non-empty set Ω, then a probability P is defined to be a function P : A → [0, 1] which satisfies

1. P(Ω) = 1

2. P(A) ≥ 0 for any A ∈ A

3. {A_i} ⊆ A with A_i ∩ A_j = ϕ ∀ i ≠ j =⇒ P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

Remark: Classical definition is a particular case of the axiomatic definition
of probability.

Example 11. What is the probability that a randomly chosen number from
Ω = [0, 1] will be

1. a rational number

2. less than 0.4.

Definition 9. (Probability Space) The triple (Ω, A, P ) is called probabil-


ity space.

Definition 10. (Event) For a given probability space (Ω, A, P ) if A ⊆ Ω


and A ∈ A, then A is called an event.

Some properties of probability (function):

1. P (ϕ) = 0

2. P (Ac ) = 1 − P (A)

3. if A ⊆ B, then P (A) ≤ P (B)

4. 1 − P(∪_{i=1}^∞ A_i) = P(∩_{i=1}^∞ A_i^c)

5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

6. P(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i)

7. P(∪_{i=1}^n A_i) = Σ_{k=1}^n (−1)^{k−1} S_k, where S_k = Σ_{1≤i_1<i_2<···<i_k≤n} P(A_{i_1} ∩ · · · ∩ A_{i_k}).

Definition 11 (Conditional Probability). For a given probability space (Ω, A, P), if A and B are two events such that P(B) > 0, then the conditional probability of A given B is defined as
P(A|B) = P(A ∩ B)/P(B).

Problem 2. Consider all families with two children and assume that boys and girls are equally likely.
(i) If a family is chosen at random and is found to have a boy, then what is the probability that the other child is also a boy?
(ii) If a child is chosen at random from these families and is found to be a boy, then what is the probability that the other child in that family is also a boy?

Solution: (i) Ω = {(b, b), (b, g), (g, b), (g, g)}
A := a boy is found in the family
B := the other one is also a boy
P(A) = 3/4, P(A ∩ B) = 1/4
P(B|A) = P(A ∩ B)/P(A) = 1/3.

(ii) Ω = {bb, bg, gg, gb},
A := the chosen child is a boy,
B := the other child is also a boy.
P(A) = 1/2, P(A ∩ B) = 1/4
P(B|A) = P(A ∩ B)/P(A) = 1/2.
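A short simulation contrasting parts (i) and (ii) of Problem 2 (a sketch, not from the notes; the family and the chosen child are modelled with Python's random module):

```python
import random

random.seed(0)
families = [(random.choice("bg"), random.choice("bg")) for _ in range(200_000)]

# (i) Condition on the family containing at least one boy.
with_boy = [f for f in families if "b" in f]
both_boys = [f for f in with_boy if f == ("b", "b")]
print(len(both_boys) / len(with_boy))    # close to 1/3

# (ii) Condition on a randomly chosen child being a boy.
picks = [(f, random.randrange(2)) for f in families]
boy_picked = [(f, i) for f, i in picks if f[i] == "b"]
other_boy = [(f, i) for f, i in boy_picked if f[1 - i] == "b"]
print(len(other_boy) / len(boy_picked))  # close to 1/2
```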

Definition 12. (Independent event) For a given probability space, (Ω, A, P ),


the two events A and B are called independent if P (A ∩ B) = P (A)P (B).
This implies that P (A|B) = P (A) and P (B|A) = P (B).

Remark. If the probability function P is changed to some P1 on the same


(Ω, A), then events A and B may not be independent anymore.

Definition 13. (Pairwise independence) For a given probability space (Ω, A, P ),


consider a sequence of events {Ai }. The sequence of events are called pair-
wise independent if P (Ai ∩ Aj ) = P (Ai )P (Aj ), ∀i ̸= j.

Definition 14 (Mutual independence). For a given probability space (Ω, A, P), consider the sequence of events {A_i}. The sequence of events is called mutually independent if
P(A_{i_1} ∩ A_{i_2} ∩ · · · ∩ A_{i_k}) = Π_{j=1}^k P(A_{i_j}), for all distinct indices i_1, i_2, · · · , i_k and every k ∈ N.

[picture]

Example 12. Consider a tetrahedron with one side labelled A, the 2nd side B, the 3rd side C, and the 4th side ABC. Roll the tetrahedron; Ω = {A, B, C, ABC}.
P(A) = 1/2, P(A ∩ B) = 1/4 = P(A)P(B)
P(B) = 1/2, P(A ∩ C) = 1/4 = P(A)P(C)
P(C) = 1/2, P(B ∩ C) = 1/4 = P(B)P(C)
A_1 = seeing A, A_2 = seeing B, A_3 = seeing C.
{A_1, A_2, A_3} are pairwise independent.
P(A_1 ∩ A_2 ∩ A_3) = 1/4 and P(A_1)P(A_2)P(A_3) = 1/8, so {A_1, A_2, A_3} are not mutually independent.

Definition 15. (Mutually exclusive event) For a given probability space


(Ω, A, P ), the sequence of events {Ai } are called mutually exclusive if
Ai ∩ Aj = ϕ, ∀i ̸= j.

Definition 16 (Mutually exhaustive events). A sequence of events {A_i} is called mutually exhaustive if ∪_{i=1}^∞ A_i = Ω.

Remark. Whether events are mutually exclusive or exhaustive does not depend on the probability function.

Definition 17 (Partition). For a given probability space (Ω, A, P), a sequence of events {A_i} is called a partition of Ω if the {A_i} are mutually exclusive and mutually exhaustive.
[picture]

Theorem 1 (Bayes' Theorem). Let A_1, A_2, · · · , A_k be a partition of Ω and (Ω, A, P) a probability space with P(A_i) > 0 and P(B) > 0 for some B ⊆ Ω. Then
P(A_i|B) = P(B|A_i)P(A_i) / Σ_{i=1}^k P(B|A_i)P(A_i).

Proof.
P(A_i|B) = P(A_i ∩ B)/P(B) = P(B ∩ A_i)/P(B)      (1)
P(B ∩ A_i) = P(B|A_i)P(A_i)   [since P(B|A_i) = P(B ∩ A_i)/P(A_i)]      (2)
B = (B ∩ A_1) ∪ (B ∩ A_2) ∪ · · · ∪ (B ∩ A_k)
  = B ∩ (A_1 ∪ A_2 ∪ · · · ∪ A_k)
  = B ∩ Ω
  = B
Now (B ∩ A_i) ∩ (B ∩ A_j) = ϕ, because A_i ∩ A_j = ϕ as they are mutually exclusive. Hence
P(B) = Σ_{i=1}^k P(B ∩ A_i) = Σ_{i=1}^k P(B|A_i)P(A_i)   [from (2)]      (3)
Putting (2) and (3) in (1), we get
P(A_i|B) = P(B|A_i)P(A_i) / Σ_{i=1}^k P(B|A_i)P(A_i).

Problem 3. Let (Ω, A, P) be a probability space and A, B ∈ A with A ⊆ B. Prove that P(A) ≤ P(B).

Solution: If P(B) > 0, then P(A|B) = P(A ∩ B)/P(B) = P(A)/P(B) ≤ 1, so P(A) ≤ P(B).
Alternatively, B = A ∪ (B ∩ A^c) ∪ ϕ ∪ · · ·
=⇒ P(B) = P(A) + P(B ∩ A^c) + 0
=⇒ P(A) ≤ P(B), since P(B ∩ A^c) ≥ 0.

Problem 4. There are three drawers in a table. The first drawer contains
two gold coins. The second drawer contains a gold and a silver coin. The
third one contains two silver coins. Now a drawer is chosen at random and
then a coin is selected randomly. It is found that the gold coin has been
selected. What is the probability that the second drawer was chosen?

Solution: Ω = {D1g, D1g, D2g, D2s, D3s, D3s}

A_1 = Drawer 1 is selected
A_2 = Drawer 2 is selected
A_3 = Drawer 3 is selected
B = the gold coin is found

P(A_2|B) = P(B|A_2)P(A_2) / [P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3)]
         = (1/2 × 1/3) / (1 × 1/3 + 1/2 × 1/3 + 0 × 1/3)
         = (1/6) / (1/2)
         = 1/3.
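A simulation sketch of Problem 4 that estimates the conditional probability by repeated sampling (it assumes nothing beyond the stated setup; Python standard library only):

```python
import random

random.seed(1)
drawers = {1: ["g", "g"], 2: ["g", "s"], 3: ["s", "s"]}

gold_draws = 0
from_second = 0
for _ in range(200_000):
    d = random.choice([1, 2, 3])          # pick a drawer uniformly
    coin = random.choice(drawers[d])      # pick a coin from it uniformly
    if coin == "g":
        gold_draws += 1
        if d == 2:
            from_second += 1

print(from_second / gold_draws)  # should be close to the Bayes answer 1/3
```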
Problem 5. Prove that P(A^c) = 1 − P(A).

Solution: Ω = A ∪ A^c ∪ ϕ ∪ · · · , with A_i = ϕ ∀ i ≥ 3.
P(Ω) = P(A) + P(A^c) + P(ϕ) + · · ·
=⇒ 1 = P(A) + P(A^c) + 0
=⇒ P(A^c) = 1 − P(A).

Problem 6. Prove that P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Solution: A ∪ B = A ∪ (B ∩ A^c) ∪ ϕ ∪ · · · ∪ ϕ
P(A ∪ B) = P(A) + P(B ∩ A^c)
B = (B ∩ A^c) ∪ (A ∩ B) ∪ ϕ ∪ · · · ∪ ϕ
P(B) = P(B ∩ A^c) + P(A ∩ B)
Hence P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Fact: If A, B ∈ A and A ∩ B = ϕ, then P(A ∪ B) = P(A) + P(B).

4 Random Variable (RV)


Ω = sample space; A = σ−algebra; P is a probability function.

Definition 18. Let (Ω, A, P) be a probability space. Then a function X : Ω → R is called a random variable if X^{-1}((−∞, x]) = {ω ∈ Ω : X(ω) ≤ x} ∈ A, ∀x ∈ R.

Example 13. Ω = {HH, HT, TH, TT}
A = all subsets of Ω
X : Ω → R
X(ω) = the number of H in ω ∈ Ω
X(HH) = 2, X(TH) = 1, X(HT) = 1, X(TT) = 0

X^{-1}((−∞, x]) = ϕ for x < 0; {TT} for 0 ≤ x < 1; {TH, HT, TT} for 1 ≤ x < 2; Ω for 2 ≤ x < ∞.

Remark. Let (Ω, A, P ) be probability space such that Ω is a countable


set of points (finite, countably infinite), A = all subsets of Ω. Then every
numerical valued function on Ω is a RV.
Remark. All functions from Ω to R need not be RV.
Example 14. For any set A ⊆ Ω, define
I_A(ω) = 0 if ω ∉ A, and I_A(ω) = 1 if ω ∈ A.
I_A(ω) is called the indicator function of the set A. I_A is an RV ⇐⇒ A ∈ A.

Problem 7. Prove: I_A is an RV ⇐⇒ A ∈ A.
Definition 19. Let X be an RV on (Ω, A, P). Define a point function F : R → R as F(x) = P{ω : X(ω) ≤ x}, ∀x ∈ R. The function F(x) is called the cumulative distribution function (CDF).

Example 15. Let (Ω, A, P) be a probability space and X : Ω → R defined by X(ω) = c, ∀ω ∈ Ω, where c ∈ R is fixed.
F(x) = P{ω : X(ω) ≤ x} = P{X^{-1}((−∞, x])}.
F(x) = 0 for x < c, and F(x) = 1 for x ≥ c.

4.1 Properties of CDF.

1. F(x) is non-decreasing, i.e., x > y =⇒ F(x) ≥ F(y)

2. F is right continuous

3. F(−∞) = lim_{x→−∞} F(x) = 0

4. F(+∞) = lim_{x→+∞} F(x) = 1
[draw picture]

Want to prove: x < y =⇒ F(x) ≤ F(y).
F(x) = P{X^{-1}((−∞, x])}
F(y) = P{X^{-1}((−∞, y])}
X^{-1}((−∞, x]) ⊆ X^{-1}((−∞, y])
So P{X^{-1}((−∞, x])} ≤ P{X^{-1}((−∞, y])}. Hence F(x) ≤ F(y).

Two types of random variable (RV) in general.


1. Discrete
2. Continuous
Note that there can be a mixed type random variable as well.
Definition 20. An RV X defined on (Ω, A, P) is said to be discrete if there exists a countable set E ⊆ R such that P{ω : X(ω) ∈ E} = 1.

Example 16. If Ω is finite, then every RV is of discrete type.

Definition 21. The collection of numbers {p_i} satisfying P(X = x_i) = p_i ≥ 0 for all i and Σ_{i=1}^∞ p_i = 1 is called the probability mass function (pmf).
[picture]

Example 17. A box contains good and defective items. If an item drawn is good we assign 1 to the drawing, otherwise 0. Let p be the probability of drawing a good item at random. Then
X = 1 with P(X = 1) = p, and X = 0 with P(X = 0) = 1 − p.
F(x) = P{X ≤ x} = 0 for x < 0; 1 − p for 0 ≤ x < 1; 1 for x ≥ 1.

Exercise 1:
Let X be an RV with pmf P(X = k) = 6/(π²k²), k = 1, 2, · · · . Calculate F(x).

Definition 22. Let X be an RV defined on (Ω, A, P) with cdf F. Then X is said to be continuous if there exists a non-negative function f(x) such that for every real number x we have
F(x) = ∫_{−∞}^x f(t) dt.

Remark. The function f(t) is called the probability density function (pdf) of the RV X, and ∫_{−∞}^∞ f(t) dt = F(∞) = 1.

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a)
             = F(b) − F(a)
             = ∫_{−∞}^b f(t) dt − ∫_{−∞}^a f(t) dt
             = ∫_a^b f(t) dt.

Example 18. Let X be an RV with CDF F(x), where F(x) = 0 for x ≤ 0; x for 0 < x ≤ 1; 1 for x > 1. Calculate the pdf f(t).

F(x) = ∫_{−∞}^x f(t) dt, so F′(x) = f(x).
f(t) = 1 for 0 < t < 1, and f(t) = 0 for t ≤ 0 or t ≥ 1.
[picture]

Remark. Let X be a discrete RV with pmf {p_k}. Then the expectation or expected value of X, denoted by E(X), is defined by E(X) = Σ_{k=1}^∞ x_k p_k, provided Σ_{k=1}^∞ |x_k| p_k < ∞. It can happen that Σ x_k p_k converges but Σ |x_k| p_k diverges; in that case E(X) does not exist!
Exercise: Find such examples.

Definition 23. If X is of continuous type with pdf f, we say E(X) exists and equals ∫ x f(x) dx, provided ∫ |x| f(x) dx < ∞:
E(X) = ∫_{−∞}^∞ x f(x) dx, wherever f(x) is defined.

Exercise 2:
If X is RV, then prove that

1. aX + b, a, b ∈ R, are RV

2. X 2 is RV
3. |X| is RV.

Definition 24. We define the nth moment about the origin as m_n = E(X^n).

Example 19. Ω = {1, 2, 3, · · · , N}, P(X = k) = 1/N.
E(X) = Σ_{k=1}^N k · (1/N) = (N + 1)/2; all the moments exist.
E(X²) = Σ_{k=1}^N k² · (1/N) = (N + 1)(2N + 1)/6.

Example 20. Let X be an RV with pdf
f(x) = 2/x³ for x ≥ 1, and f(x) = 0 for x < 1.
E(X) = ∫_1^∞ (2/x²) dx = 2.
E(X²) = ∫_1^∞ (2/x) dx does not exist.
Definition 25. Let k be a positive integer and c a constant. If E(X − c)^k exists, we call it the moment of order k about the point c.
If we take c = E(X) = µ, which exists since E|X| < ∞, we call E(X − µ)^k the central moment of order k, or the moment of order k about the mean: µ_k = E(X − µ)^k.

Definition 26. µ₂ = σ² = E(X − µ)² is the variance, and σ = √variance = √µ₂ is the standard deviation.
Example 21. Let X be an RV with pmf P{X = k} = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, · · · , n.

µ = E(X) = Σ_{k=0}^n k C(n, k) p^k (1 − p)^{n−k}
         = np Σ_{k=1}^n C(n − 1, k − 1) p^{k−1} (1 − p)^{n−k}
         = np
σ² = Var(X) = E(X²) − E(X)²
E(X²) = E(X(X − 1) + X) = Σ_{k=0}^n k(k − 1) C(n, k) p^k (1 − p)^{n−k} + np = n(n − 1)p² + np
Var(X) = n(n − 1)p² + np − n²p² = np(1 − p).

Definition 27. A number Q_p satisfying P(X ≤ Q_p) ≥ p and P(X ≥ Q_p) ≥ 1 − p, 0 < p < 1, is called a quantile of order p. If p = 1/2, then Q_{1/2} is called the median.
Q_{1/4}, Q_{2/4}, Q_{3/4}: quartiles
Q_{1/10}, Q_{2/10}, · · · , Q_{9/10}: deciles
Q_{1/100}, Q_{2/100}, · · · , Q_{99/100}: percentiles.
Definition 28 (Moment generating function). Let X be an RV on (Ω, A, P). Then the function M(s) = E(e^{sX}) is known as the moment generating function of the RV X if E(e^{sX}) exists in some neighbourhood of 0.

Example 22. X with pdf f(x) = (1/2)e^{−x/2} for x > 0, and f(x) = 0 for x ≤ 0.
M(s) = ∫_{−∞}^∞ e^{sx} f(x) dx
     = (1/2) ∫_0^∞ e^{sx} e^{−x/2} dx
     = (1/2) ∫_0^∞ e^{−(x/2)(1−2s)} dx
     = 1/(1 − 2s), s < 1/2.
Theorem 2. If the MGF M (s) of an RV X exists for all s ∈ (−s0 , s0 ), say
s0 > 0, then the derivatives of all order exist at s = 0 and can be evaluated
under the integral sign i.e., M (k) (s)|s=0 = E(X k ), for positive integer k.
Remark. Alternatively, if the MGF M(s) exists for s ∈ (−s_0, s_0), s_0 > 0, one can express M(s) (uniquely) in a Maclaurin series expansion: M(s) = M(0) + sM′(0) + (s²/2)M″(0) + · · · , so E(X^k) is the coefficient of s^k/k!.

Example 23. Let X be an RV with pdf f(x) = (1/2)e^{−x/2} for x > 0, and f(x) = 0 for x ≤ 0.
M(s) = 1/(1 − 2s), s < 1/2
M′(s) = 2/(1 − 2s)², M″(s) = 8/(1 − 2s)³, so E(X) = M′(0) = 2 and E(X²) = 8.
Var(X) = E(X²) − (E(X))² = 8 − 4 = 4.
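As a sanity check of Example 23, E(X) and Var(X) can also be recovered by direct numerical integration of the pdf instead of via the MGF (a rough midpoint-rule sketch, not production quadrature):

```python
import math

def integrate(g, a, b, n=200_000):
    """Plain midpoint rule, good enough for a quick check."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: 0.5 * math.exp(-x / 2)            # pdf from Example 23
m1 = integrate(lambda x: x * f(x), 0, 200)      # E(X)
m2 = integrate(lambda x: x * x * f(x), 0, 200)  # E(X^2)
print(m1, m2 - m1 ** 2)  # close to E(X) = 2 and Var(X) = 4
```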

Example 24. Let X be an RV with pdf f(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise.
M(s) = ∫_0^1 e^{sx} dx = (e^s − 1)/s, s ≠ 0.

E(X) = M′(0)
     = lim_{s→0} M′(s)
     = lim_{s→0} (se^s − e^s + 1)/s²
     = lim_{s→0} (se^s + e^s − e^s)/(2s)   (by L'Hôpital's rule)
     = 1/2.

4.2 Some inequalities

Chebyshev's inequality: Let X be an RV with mean µ and variance σ². Then for any k > 0, P(|X − µ| ≥ k) ≤ σ²/k².

Proof. Let X be a continuous RV with pdf f(x).
σ² = Var(X) = E(X − µ)²
   = ∫_{−∞}^∞ (x − µ)² f(x) dx
   ≥ ∫_{|x−µ|≥k} (x − µ)² f(x) dx
   ≥ ∫_{|x−µ|≥k} k² f(x) dx
   = k² P(|X − µ| ≥ k)
=⇒ P(|X − µ| ≥ k) ≤ σ²/k².

Example 25. The number of customers who visit a store every day is an RV X with µ = 18 and σ = 2.5. With what probability can we assert that there will be between 8 and 28 customers?
Solution:
P(8 < X < 28) = P(−10 < X − 18 < 10)
             = P(|X − 18| < 10)
             ≥ 1 − σ²/100
             = 1 − (2.5)²/100
             = 0.9375 ≈ 0.94.
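Chebyshev only gives a lower bound. The sketch below assumes, purely for illustration, that X is normal with the stated mean and standard deviation (the notes do not specify a distribution) and shows the actual probability is well above the bound:

```python
import random

random.seed(2)
mu, sigma = 18, 2.5
# assumed model (not given in the notes): X ~ N(18, 2.5^2)
samples = [random.gauss(mu, sigma) for _ in range(100_000)]
inside = sum(1 for x in samples if abs(x - mu) < 10)
print(inside / len(samples))  # well above the Chebyshev lower bound 0.9375
```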

Problem 8. Let
f(x) = (1/β){1 − |x − α|/β} for α − β < x < α + β, and f(x) = 0 otherwise.
For what values of α and β does f(x) become a pdf?

Solution: With y = (x − α)/β,
∫_{α−β}^{α+β} (1/β){1 − |x − α|/β} dx = ∫_{−1}^1 (1 − |y|) dy = 2 ∫_0^1 (1 − y) dy = 1,
so f is a pdf for any α ∈ R and β > 0. Median = α.

Mean = E(X) = ∫_{α−β}^{α+β} x f(x) dx
            = ∫_{−1}^1 (yβ + α)(1 − |y|) dy
            = β ∫_{−1}^1 y(1 − |y|) dy + α ∫_{−1}^1 (1 − |y|) dy
            = 0 + α.

Var(X) = ∫_{α−β}^{α+β} (x − α)² f(x) dx = β² ∫_{−1}^1 y²(1 − |y|) dy
       = 2β² ∫_0^1 y²(1 − y) dy
       = β²/6.

4.3 Some special discrete distributions

1. Degenerate distribution
The simplest distribution is that of an RV X degenerate at a point k, i.e., P{X = k} = 1 and 0 otherwise. If we define
ϵ(x) = 0 for x < 0 and ϵ(x) = 1 for x ≥ 0,
then F(x) = ϵ(x − k), i.e., F(x) = 0 for x < k and F(x) = 1 for x ≥ k.
E(X) = k, E(X^l) = k^l, Var(X) = E(X²) − {E(X)}² = k² − k² = 0,
MGF = M(t) = E(e^{tX}) = e^{tk}.

2. Two-point distribution
We say that an RV X has a two-point distribution if it takes two values x_1 and x_2, with probabilities P{X = x_1} = p and P{X = x_2} = 1 − p, 0 < p < 1.
DF = F(x) = p ϵ(x − x_1) + (1 − p) ϵ(x − x_2)
E(X) = p x_1 + (1 − p) x_2
E(X^n) = x_1^n p + x_2^n (1 − p)
MGF = M(t) = E(e^{tX}) = p e^{t x_1} + (1 − p) e^{t x_2}
Var(X) = E(X²) − {E(X)}² = p x_1² + (1 − p) x_2² − {p x_1 + (1 − p) x_2}² = p(1 − p)(x_1 − x_2)².

3. Bernoulli Distribution
If x_1 = 1 and x_2 = 0, then it is called the Bernoulli distribution.
E(X) = p
M(t) = 1 + p(e^t − 1)
Var(X) = p(1 − p) = pq, where p + q = 1.

Example 26. Coin tossing with P{H} = p, 0 < p < 1, P{T} = 1 − p. Then X(H) = 1, X(T) = 0, and
P{X = 1} = p, P{X = 0} = 1 − p.

4. Uniform Distribution on n Points
X is said to have a uniform distribution on n points {x_1, x_2, · · · , x_n} if its pmf is of the form P{X = x_i} = 1/n, i = 1, 2, · · · , n. Thus we may write X = Σ_{i=1}^n x_i I_{[X=x_i]} and F(x) = (1/n) Σ_{i=1}^n ϵ(x − x_i).
E(X) = (1/n) Σ_{i=1}^n x_i,
E(X^l) = (1/n) Σ_{i=1}^n x_i^l, l = 1, 2, · · · .
Var(X) = (1/n) Σ_{i=1}^n x_i² − ((1/n) Σ_{i=1}^n x_i)²
M(t) = (1/n) Σ_{i=1}^n e^{t x_i}, ∀t.

In particular, if x_i = i, i = 1, 2, · · · , n, then E(X) = (n + 1)/2, E(X²) = (n + 1)(2n + 1)/6, and Var(X) = (n² − 1)/12.

Problem 9. A box contains tickets numbered 1 to N. Let X be the largest number drawn in n random drawings with replacement. Find P(X = k).
P(X ≤ k) = (k/N)^n
P(X ≤ k − 1) = ((k − 1)/N)^n
P(X = k) = P(X ≤ k) − P(X ≤ k − 1) = (k^n − (k − 1)^n)/N^n.
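A simulation sketch checking the formula of Problem 9 (the values N = 10 and n = 3 are arbitrary choices, not from the notes):

```python
import random

random.seed(3)
N, n = 10, 3
trials = 200_000
counts = [0] * (N + 1)
for _ in range(trials):
    k = max(random.randint(1, N) for _ in range(n))  # largest of n draws with replacement
    counts[k] += 1

for k in range(1, N + 1):
    exact = (k ** n - (k - 1) ** n) / N ** n
    print(k, round(counts[k] / trials, 4), round(exact, 4))  # empirical vs exact
```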

5. Binomial distribution
We say that X has a binomial distribution with parameters n and p if its pmf is given by
p_k = P{X = k} = C(n, k) p^k (1 − p)^{n−k}, k = 0, · · · , n; 0 ≤ p ≤ 1,
and Σ_{k=0}^n p_k = (p + (1 − p))^n = 1.
We write X ∼ b(n, p).
F(x) = Σ_{k=0}^n C(n, k) p^k (1 − p)^{n−k} ϵ(x − k)
E(X) = np, Var(X) = npq, where p + q = 1
M(t) = Σ_{k=0}^n e^{tk} C(n, k) p^k q^{n−k} = (q + pe^t)^n.

Problem 10. A fair die is rolled n times. The probability of obtaining exactly one 6 is n(1/6)(5/6)^{n−1}. The probability of obtaining no 6 is (5/6)^n and the probability of obtaining at least one 6 is 1 − (5/6)^n. Find the number of trials needed for the probability of at least one 6 to be ≥ 1/2.

Solution: 1 − (5/6)^n ≥ 1/2
=⇒ n ≥ log 2 / log(6/5) ≈ 3.8, so n = 4.
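The same answer can be found by a direct search (a small sketch):

```python
import math

# smallest n with 1 - (5/6)**n >= 1/2
n = 1
while 1 - (5 / 6) ** n < 0.5:
    n += 1
print(n, math.log(2) / math.log(6 / 5))  # 4 and about 3.8
```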

6. Negative Binomial Distribution (Pascal or Waiting-Time Distribution)
Let (Ω, S, P) be the probability space of a given statistical experiment, and let A ∈ S with P(A) = p. On any performance of the experiment, if A happens we call it a success, otherwise a failure.

Consider a succession of trials of the experiment, and let us compute the probability of observing exactly r successes, where r ≥ 1 is a fixed integer. If X denotes the number of failures that precede the rth success, then X + r is the total number of replications needed to produce r successes. This will happen if and only if the last trial results in a success and among the previous (r + X − 1) trials there are exactly X failures. It follows by independence that

P{X = x} = C(x + r − 1, x) p^r (1 − p)^x, x = 0, 1, 2, · · ·      (4)

P{X = x} = C(−r, x) p^r (−q)^x, x = 0, 1, · · ·      (5)

We see that
Σ_{x=0}^∞ C(−r, x) (−q)^x = (1 − q)^{−r} = p^{−r}.      (6)

It follows that Σ_{x=0}^∞ P{X = x} = 1.

Definition 29. For a fixed positive integer r ≥ 1 and 0 < p < 1, an RV with pmf given by (4) is said to have a negative binomial distribution. We use the notation X ∼ NB(r; p).

We may write X = Σ_{x=0}^∞ x I_{[X=x]} and F(x) = Σ_{k=0}^∞ C(k + r − 1, k) p^r (1 − p)^k ϵ(x − k).

MGF = M(t) = Σ_{x=0}^∞ C(x + r − 1, x) p^r (1 − p)^x e^{tx}
           = p^r Σ_{x=0}^∞ C(x + r − 1, x) (qe^t)^x   [q = 1 − p]
           = p^r (1 − qe^t)^{−r}, qe^t < 1.

Problem 11. An airline knows that 5% of the people making reservations do not turn up for the flight, so it sells 52 tickets for a 50-seat flight. What is the probability that every passenger who turns up will get a seat?

Solution:
X := number of passengers who turn up for the flight.
X ∼ b(n = 52, p = 0.95)

P(X ≤ 50) = 1 − P(X = 51) − P(X = 52)
          = 1 − C(52, 51)(0.95)^{51}(0.05) − (0.95)^{52}
          ≈ 0.74.
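A direct computation of the binomial tail in Problem 11 (a sketch using math.comb):

```python
from math import comb

n, p = 52, 0.95

def binom_pmf(k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(1 - binom_pmf(51) - binom_pmf(52))  # about 0.74
```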

Hypergeometric Distribution
A box contains N marbles. Of these, M are drawn at random, marked, and returned to the box. The contents of the box are then thoroughly mixed. Next, n marbles are drawn at random from the box, and the marked marbles are counted. If X denotes the number of marked marbles, then

P{X = x} = C(M, x) C(N − M, n − x) / C(N, n), x = 0, 1, 2, · · ·      (7)

Definition 30. An RV X as in (7) is called a hypergeometric RV.
Mean = µ = E(X) = Σ_{x=0}^n x C(M, x) C(N − M, n − x) / C(N, n)
     = Σ_{x=1}^n x C(M, x) C(N − M, n − x) / C(N, n)
     = (Mn/N) Σ_{y=0}^{n−1} C(M − 1, y) C(N − 1 − (M − 1), n − 1 − y) / C(N − 1, n − 1)
     = Mn/N

E(X(X − 1)) = M(M − 1)n(n − 1) / (N(N − 1)) = E(X² − X) = E(X²) − E(X)

Var(X) = E(X²) − {E(X)}² = M(M − 1)n(n − 1)/(N(N − 1)) + Mn/N − (Mn/N)²
       = nM(N − M)(N − n) / (N²(N − 1)).
Poisson Distribution
An RV is said to be a Poisson RV with parameter λ > 0 if its pmf is given by
P{X = k} = e^{−λ} λ^k / k!, k = 0, 1, 2, · · ·      (8)
Some experiments result in counting the number of times a particular event occurs in a given time or in a given physical object.

Example 27. We could count the number of customers that arrive at a ticket window between 12 noon and 2 pm.

We first check that (8) indeed defines a pmf. We have
Σ_{k=0}^∞ P{X = k} = Σ_{k=0}^∞ e^{−λ} λ^k / k! = e^{−λ} e^{λ} = 1.
If X has pmf given by (8), we will write X ∼ P(λ).

E(X) = Σ_{k=0}^∞ k e^{−λ} λ^k / k!
     = e^{−λ} λ Σ_{k=1}^∞ λ^{k−1}/(k − 1)!
     = λ e^{−λ} e^{λ} = λ

E(X²) = Σ_{k=0}^∞ k² e^{−λ} λ^k / k!
      = Σ_{k=1}^∞ k e^{−λ} λ^k / (k − 1)!
      = Σ_{k=1}^∞ (k − 1 + 1) e^{−λ} λ^k / (k − 1)!
      = e^{−λ} λ² Σ_{k=2}^∞ λ^{k−2}/(k − 2)! + e^{−λ} λ Σ_{k=1}^∞ λ^{k−1}/(k − 1)!
      = e^{−λ} λ² e^{λ} + e^{−λ} λ e^{λ}
      = λ² + λ

Var(X) = λ.
The MGF of X is given by E(e^{tX}) = exp[λ(e^t − 1)].

Problem 12. Telephone calls enter a college switchboard at the average rate of 2 every 3 minutes. Assuming an approximate Poisson process (Poisson distribution), what is the probability of five or more calls arriving in a 9-minute period?

Solution: Let X denote the number of calls in a 9-minute period.
E(X) = 3 × 2 = 6 = λ

P(X ≥ 5) = 1 − P(X ≤ 4)
         = 1 − Σ_{x=0}^4 e^{−6} 6^x / x!
         = 1 − 0.285
         = 0.715.
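A direct evaluation of the Poisson tail in Problem 12 (sketch):

```python
import math

lam = 6.0
cdf4 = sum(math.exp(-lam) * lam ** x / math.factorial(x) for x in range(5))
print(1 - cdf4)  # about 0.715
```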

Some Continuous Distributions.

Definition 31 (Uniform distribution (Rectangular Distribution)). An RV X is said to have a uniform distribution on the interval [a, b], −∞ < a < b < ∞, if its pdf is given by
f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.

We will write X ∼ U[a, b] if X has a uniform distribution on [a, b]. The endpoints a or b may both be excluded. Clearly ∫_{−∞}^∞ f(x) dx = ∫_a^b (1/(b − a)) dx = (b − a)/(b − a) = 1.
The cdf of X is given by
F(x) = 0 for x < a; (x − a)/(b − a) for a ≤ x < b; 1 for b ≤ x.

E(X) = ∫_{−∞}^∞ x f(x) dx = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (a + b)/2.
E(X^k) = ∫_{−∞}^∞ x^k f(x) dx = (1/(b − a)) ∫_a^b x^k dx = (b^{k+1} − a^{k+1})/((k + 1)(b − a)).

σ² = Var(X) = E(X²) − {E(X)}²
   = (b³ − a³)/(3(b − a)) − ((a + b)/2)²
   = (b² + ab + a²)/3 − (a² + 2ab + b²)/4
   = (a² − 2ab + b²)/12
   = (b − a)²/12.

M(t) = E(e^{tX})
     = ∫_{−∞}^∞ e^{tx} f(x) dx
     = (1/(b − a)) ∫_a^b e^{tx} dx
     = (e^{tb} − e^{ta})/(t(b − a)), t ≠ 0.
Exponential Distribution.
We say that the RV X has an exponential distribution if its pdf is given by
f(x) = (1/θ) e^{−x/θ}, 0 ≤ x < ∞,
where the parameter θ > 0.

Accordingly, the waiting time W until the first change in a Poisson process has an exponential distribution with θ = 1/λ. To determine the exact meaning of the parameter θ, we first find the moment-generating function of X. It is

M(t) = ∫_0^∞ e^{tx} (1/θ) e^{−x/θ} dx
     = lim_{b→∞} ∫_0^b (1/θ) e^{−(1−θt)x/θ} dx
     = lim_{b→∞} [−e^{−(1−θt)x/θ}/(1 − θt)]_0^b
     = 1/(1 − θt), t < 1/θ.

Thus M′(t) = θ/(1 − θt)² and M″(t) = 2θ²/(1 − θt)³.
Hence, for an exponential distribution, we have
µ = M′(0) = θ, and σ² = M″(0) − [M′(0)]² = θ².
So if λ is the mean number of changes in a unit interval, then θ = 1/λ is the mean waiting time for the first change. In particular, suppose that λ = 7 is the mean number of changes per minute; then the mean waiting time for the first change is 1/7 of a minute, a result that agrees with our intuition.

The integral Γ(α) = ∫_{0+}^∞ x^{α−1} e^{−x} dx is called the Gamma function. The integral converges for α > 0 and diverges for α ≤ 0. If α > 1, Γ(α) = (α − 1)Γ(α − 1). If α = n is a positive integer, Γ(n) = (n − 1)!. Γ(1/2) = √π.

Definition 32. The RV X has a Gamma distribution if its pdf is given by
f(x) = (1/(Γ(α)θ^α)) x^{α−1} e^{−x/θ} for x ≥ 0, and 0 for x < 0.
We will write X ∼ G(α, θ).

Problem 13. Suppose that an average of 30 customers per hour arrive at a shop in accordance with a Poisson process. That is, if a minute is our unit, then λ = 1/2. What is the probability that the shopkeeper will wait more than 5 minutes before both of the first two customers arrive?

Solution: If X denotes the waiting time in minutes until the second customer arrives, then X has a gamma distribution with α = 2, θ = 1/λ = 2.

P(X > 5) = ∫_5^∞ x^{2−1} e^{−x/2} / (Γ(2) 2²) dx
         = ∫_5^∞ (x e^{−x/2}/4) dx
         = (1/4) [−2x e^{−x/2} − 4 e^{−x/2}]_5^∞
         = (1/4) [10 e^{−5/2} + 4 e^{−5/2}]
         = (7/2) e^{−5/2}
         ≈ 0.287.
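A quick numerical check of Problem 13 using the closed-form survival function of a Gamma(2, θ = 2) variable, P(X > x) = (1 + x/θ) e^{−x/θ} (a standard fact, not derived in the notes):

```python
import math

theta, x = 2.0, 5.0
print((1 + x / theta) * math.exp(-x / theta))  # about 0.287, matching (7/2) e^{-5/2}
```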

Chi-square Distribution.
Let X ∼ G(α, θ) with θ = 2 and α = r/2, where r is a positive integer. The pdf of X is
f(x) = (1/(Γ(r/2) 2^{r/2})) x^{r/2−1} e^{−x/2}, x ≥ 0.
In this case we say that X has a χ²-distribution with r degrees of freedom and write X ∼ χ²(r).
µ = αθ = (r/2) × 2 = r
σ² = αθ² = (r/2) × 4 = 2r
M(t) = (1 − 2t)^{−r/2}, t < 1/2
F(x) = ∫_0^x (1/(Γ(r/2) 2^{r/2})) ω^{r/2−1} e^{−ω/2} dω.

Example 28. X ∼ χ²(5).
P(X > 15.09) = 1 − P(X ≤ 15.09) = 1 − F(15.09) = 1 − 0.99 = 0.01.
Problem 14. If customers arrive at a shop at the average rate of 30 per hour in accordance with a Poisson process, what is the probability that the shopkeeper will have to wait longer than 9.39 minutes for the first 9 customers to arrive?
Solution: Note the mean rate of arrivals per minute is λ = 1/2. Thus θ = 2 and α = r/2 = 9.
If X denotes the waiting time until the ninth arrival, then X ∼ χ²(18).

P(X > 9.39) = 1 − F(9.39)
            = 1 − 0.05
            = 0.95.

Normal Distribution.
The RV X has a normal distribution if its pdf is given by
f(x) = (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)], −∞ < x < ∞,
where µ and σ are parameters satisfying −∞ < µ < ∞ and 0 < σ < ∞. We write X ∼ N(µ, σ²).
Clearly f(x) > 0 because of the exponential function. We now evaluate the integral
I = ∫_{−∞}^∞ (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)] dx. We want to show I = 1.
In I, change the variable of integration by letting z = (x − µ)/σ; then
I = (1/√(2π)) ∫_{−∞}^∞ e^{−z²/2} dz.

Since I > 0, if I² = 1 then I = 1. Now

I² = (1/(2π)) (∫_{−∞}^∞ e^{−x²/2} dx)(∫_{−∞}^∞ e^{−y²/2} dy)
   = (1/(2π)) ∫_{−∞}^∞ ∫_{−∞}^∞ exp[−(x² + y²)/2] dx dy
   = (1/(2π)) ∫_0^{2π} ∫_0^∞ e^{−r²/2} r dr dθ   [x = r cos θ, y = r sin θ, 0 ≤ θ ≤ 2π, 0 ≤ r < ∞]
   = (1/(2π)) ∫_0^{2π} dθ
   = 1.

The MGF of X is

M(t) = ∫_{−∞}^∞ (e^{tx}/(σ√(2π))) exp[−(x − µ)²/(2σ²)] dx
     = ∫_{−∞}^∞ (1/(σ√(2π))) exp{−(1/(2σ²))[x² − 2(µ + σ²t)x + µ²]} dx.
To evaluate this integral, we complete the square in the exponent:
x² − 2(µ + σ²t)x + µ² = [x − (µ + σ²t)]² − 2µσ²t − σ⁴t²
Thus M(t) = exp((2µσ²t + σ⁴t²)/(2σ²)) = exp(µt + σ²t²/2)
M′(t) = (µ + σ²t) exp(µt + σ²t²/2)
M″(t) = [(µ + σ²t)² + σ²] exp(µt + σ²t²/2)
E(X) = M′(0) = µ
Var(X) = M″(0) − [M′(0)]² = µ² + σ² − µ² = σ².
Example 29. Given the pdf f(x) = (1/√(32π)) exp[−(x + 7)²/32], what are the mean and standard deviation?
µ = −7, σ² = 16, so σ = 4 and X ∼ N(−7, 16).
M(t) = exp(−7t + 8t²).
Z ∼ N(0, 1) is called the standard normal distribution.
pdf: f(z) = (1/√(2π)) e^{−z²/2}
cdf: Φ(z) = (1/√(2π)) ∫_{−∞}^z e^{−w²/2} dw
(⋆) The pdf is symmetric.
(⋆) Φ(−z) = 1 − Φ(z).

Example 30. Suppose M(t) = exp(5t + 12t²). What are µ and σ²?
f(x) = (1/√(48π)) exp[−(x − 5)²/48], µ = 5, σ² = 24.

Example 31. If Z ∼ N(0, 1), P(Z ≤ 1.24) = Φ(1.24) = (1/√(2π)) ∫_{−∞}^{1.24} e^{−z²/2} dz = 0.8925.

Problem 15. If Z ∼ N(0, 1), find the constant a such that P(Z ≤ a) = 0.9147.
Indeed, Φ(a) = 0.9147 and Φ(1.37) = 0.9147, so a = 1.37, because Φ is 1 − 1.
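Φ can be evaluated with the error function, which also lets one invert it numerically. The sketch below reproduces Example 31 and Problem 15 (bisection is an arbitrary implementation choice):

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(Phi(1.24))  # about 0.8925 (Example 31)

# invert Phi(a) = 0.9147 by bisection (Problem 15)
lo, hi = 0.0, 5.0
while hi - lo > 1e-6:
    mid = (lo + hi) / 2
    if Phi(mid) < 0.9147:
        lo = mid
    else:
        hi = mid
print(round(lo, 3))  # about 1.37
```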
Kurtosis: β₂ = µ₄/σ⁴. It measures peakedness.

Example 32. For the normal distribution β₂ = 3. N(µ, σ²) is called the standard normal distribution if µ = 0, σ = 1.
Relative (excess) kurtosis: β₂′ = µ₄/σ⁴ − 3.

4.4 Distribution of Functions of a RV X

Let X be an RV and consider the composition Ω → R → R given by X followed by u, where u is continuous or piecewise continuous. Then Y = u(X) : Ω → R is also a random variable. Note that u(X) = u ◦ X is a composition of two functions. We are interested in finding the distribution of Y:
G(y) = P(Y ≤ y) = P[u(X) ≤ y]. The probability density function (pdf) of the random variable Y is g(y) = G′(y).

Example 33. Let X have a gamma distribution with pdf
f(x) = (1/(Γ(α)θ^α)) x^{α−1} e^{−x/θ}, 0 < x < ∞,
where α > 0, θ > 0. Let Y = e^X, so that the support of Y is 1 < y < ∞.

G(y) = P(Y ≤ y) = P(e^X ≤ y) = P(X ≤ ln y), i.e.,
G(y) = ∫_0^{ln y} (1/(Γ(α)θ^α)) x^{α−1} e^{−x/θ} dx,
and thus the pdf g(y) = G′(y) of Y is
g(y) = (1/(Γ(α)θ^α)) (ln y)^{α−1} e^{−(ln y)/θ} (1/y).
Equivalently,
g(y) = (1/(Γ(α)θ^α)) (ln y)^{α−1} / y^{1 + 1/θ}, 1 < y < ∞,
which is called a loggamma pdf.


Exercise 3:
Let X have the pdf f(x) = 3(1 − x)², 0 < x < 1, and let Y = (1 − X)³ = u(X). Calculate g(y).

Example 34. A spinner is mounted at the point (0, 1). Let w be the smallest angle between the y-axis and the spinner. Assume that w is the value of a random variable W that has a uniform distribution on the interval (−π/2, π/2). That is, W is U((−π/2, π/2)), and the distribution function of W is
P(W ≤ w) = F(w) = 0 for −∞ < w < −π/2; (w + π/2)(1/π) for −π/2 ≤ w < π/2; 1 for w ≥ π/2.

The relationship between x and w is given by x = tan w; that is, x is the point on the x-axis which is the intersection of that axis and the linear extension of the spinner. To find the distribution function of the random variable X = tan W, we see that the distribution function of X is given by
G(x) = P(X ≤ x) = P(tan W ≤ x) = P(W ≤ tan⁻¹ x) = F(tan⁻¹ x) = (tan⁻¹ x + π/2)(1/π), −∞ < x < ∞.
The last equality follows because −π/2 < w = tan⁻¹ x < π/2. The pdf of X is given by
g(x) = G′(x) = 1/(π(1 + x²)), −∞ < x < ∞,

which is called the Cauchy pdf.


Theorem 3. If X is N(µ, σ²), then Z = (X − µ)/σ is N(0, 1).

Proof. The distribution function of Z is
P(Z ≤ z) = P((X − µ)/σ ≤ z)
         = P(X ≤ zσ + µ)
         = ∫_{−∞}^{zσ+µ} (1/(σ√(2π))) exp[−(x − µ)²/(2σ²)] dx.
In the integral, use the change of variable w = (x − µ)/σ to obtain
P(Z ≤ z) = ∫_{−∞}^z (1/√(2π)) e^{−w²/2} dw.
But this is the expression for Φ(z), the distribution function of a standard normal random variable, so Z ∼ N(0, 1).

Remark. If X ∼ N(µ, σ²), then

P(a ≤ X ≤ b) = P(a − µ ≤ X − µ ≤ b − µ)
             = P((a − µ)/σ ≤ (X − µ)/σ ≤ (b − µ)/σ)
             = P((a − µ)/σ ≤ Z ≤ (b − µ)/σ)
             = Φ((b − µ)/σ) − Φ((a − µ)/σ),
where Φ is the cdf of N(0, 1).

Example 35. If X is N(3, 16), then µ = 3, σ = 4.
P(4 ≤ X ≤ 8) = P((4 − 3)/4 ≤ (X − 3)/4 ≤ (8 − 3)/4)
             = Φ(1.25) − Φ(0.25)
             = 0.8944 − 0.5987
             = 0.2957.

Example 36. If X is N(25, 36), we find a constant c such that P(|X − 25| ≤ c) = 0.9544.
P(−c/6 ≤ (X − 25)/6 ≤ c/6) = 0.9544
Thus Φ(c/6) − [1 − Φ(c/6)] = 0.9544,
so Φ(c/6) = 0.9772; also Φ(2) = 0.9772, hence c/6 = 2 =⇒ c = 12.


Theorem 4. If the RV X is N(µ, σ²), σ² > 0, then the RV V = (X − µ)²/σ² = Z² is χ²(1).

Moments:
If X is N(µ, σ²), then the even central moments are µ_{2n} = 1·3·5···(2n − 1) σ^{2n}, n = 1, 2, · · · .
By definition: µ_{2n} = ∫_{−∞}^∞ (x − µ)^{2n} (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²} dx.
Letting z = (x − µ)/σ in the above integral yields
µ_{2n} = ∫_{−∞}^∞ (zσ)^{2n} (e^{−z²/2}/(σ√(2π))) σ dz
      = (2σ^{2n}/√(2π)) ∫_0^∞ z^{2n} e^{−z²/2} dz.
Fact from calculus: ∫_0^∞ x^{2n} e^{−ax²} dx = (1·3·5···(2n − 1)/(2^{n+1} a^n)) √(π/a).
So µ_{2n} = 1·3·5···(2n − 1) σ^{2n}.
µ₂ = σ², µ₄ = 3σ⁴
Kurtosis = β₂ = µ₄/µ₂² = 3σ⁴/σ⁴ = 3.
The odd central moments of N(µ, σ²) are µ_{2n+1}.
By definition: µ_{2n+1} = ∫_{−∞}^∞ (x − µ)^{2n+1} (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²} dx.
Letting z = (x − µ)/σ in the above integral, we have
µ_{2n+1} = σ^{2n+1} ∫_{−∞}^∞ (z^{2n+1} e^{−z²/2}/√(2π)) dz = 0,
since the integrand is an odd function.
Skewness = β₁ = µ₃/σ³ = 0.

Beta Distribution.
The integral
B(α, β) = ∫_{0+}^{1−} x^{α−1} (1 − x)^{β−1} dx      (9)
converges for α > 0, β > 0 and is called the beta function. For α ≤ 0 or β ≤ 0 the above integral diverges.
Properties:

1. B(α, β) = B(β, α)

2. B(α, β) = ∫_{0+}^∞ x^{α−1} (1 + x)^{−α−β} dx

3. B(α, β) = Γ(α)Γ(β)/Γ(α + β)

f(x) = (1/B(α, β)) x^{α−1} (1 − x)^{β−1} for 0 < x < 1, and 0 otherwise,      (10)
defines a pdf.

Definition 33. An RV X with pdf given by (10) is said to have a beta distribution with parameters α and β, α > 0, β > 0. We will write X ∼ B(α, β) for a beta variable with pdf given by (10).

The cdf of B(α, β) is given by
F(x) = 0 for x ≤ 0; (1/B(α, β)) ∫_{0+}^x y^{α−1}(1 − y)^{β−1} dy for 0 < x < 1; 1 for x ≥ 1.

Raw moments:
m_n = E(X^n) = (1/B(α, β)) ∫_0^1 x^{n+α−1} (1 − x)^{β−1} dx
            = B(n + α, β)/B(α, β)
            = Γ(n + α)Γ(α + β)/(Γ(n + α + β)Γ(α)).
E(X) = B(α + 1, β)/B(α, β) = (Γ(α + 1)/Γ(α))(Γ(α + β)/Γ(α + β + 1)) = α/(α + β)
Var(X) = E(X²) − E(X)² = αβ/((α + β)²(α + β + 1))
MGF = M(t) = (1/B(α, β)) ∫_0^1 e^{tx} x^{α−1} (1 − x)^{β−1} dx.
Since moments of all orders exist and E|X|^j ≤ 1 ∀j, we have M(t) = Σ_{j=0}^∞ (t^j/j!) E(X^j) = Σ_{j=0}^∞ (t^j/j!) Γ(α + j)Γ(α + β)/(Γ(α + β + j)Γ(α)).

Remark. In the special case where α = β = 1, we get uniform distribution


on (0, 1), i.e., X ∼ U (0, 1)

Remark. If X is a beta RV with parameters α and β, then 1 − X is a beta


variate with parameters β and α. In particular, X is B(α, α) ⇐⇒ 1 − X
is B(α, α)

Example 37. Let X be distributed with pdf
f(x) = 12 x²(1 − x) for 0 < x < 1, and 0 otherwise.
Then X ∼ B(3, 2) and
E(X^n) = Γ(n + 3)Γ(5)/(Γ(3)Γ(n + 5)) = 12/((n + 4)(n + 3))
E(X) = 12/20 = 3/5
Var(X) = 6/(25 × 6) = 1/25
P(0.2 < X < 0.5) = 12 ∫_{0.2}^{0.5} x²(1 − x) dx ≈ 0.285.

Cauchy Distribution.

Definition 34. An RV X is said to have a Cauchy distribution with parameters µ and θ if its pdf is given by
f(x) = (µ/π) · 1/(µ² + (x − θ)²), −∞ < x < ∞, µ > 0.      (11)
We will write X ∼ C(µ, θ) for a Cauchy RV with pdf given by (11).

Check that this is a pdf:
∫_{−∞}^∞ f(x) dx = ∫_{−∞}^∞ (µ/π) dx/(µ² + (x − θ)²)
                 = (1/π) ∫_{−∞}^∞ dy/(1 + y²)   [y = (x − θ)/µ]
                 = (2/π) ∫_0^∞ dy/(1 + y²)
                 = (2/π) [tan⁻¹ y]_0^∞
                 = (2/π)(π/2)
                 = 1.

The cdf of C(1, 0) is given by F(x) = 1/2 + (1/π) tan⁻¹ x, −∞ < x < ∞.

Theorem 5. Let X be a Cauchy RV with parameters µ and θ. Then moments of order < 1 exist, but moments of order ≥ 1 do not exist for the RV X.

Proof. It suffices to consider the pdf f(x) = (1/π) · 1/(1 + x²), −∞ < x < ∞. Then
E|X|^α = (2/π) ∫_0^∞ x^α/(1 + x²) dx,
and letting z = 1/(1 + x²) in the integral we get
E(|X|^α) = (1/π) ∫_0^1 z^{(1−α)/2 − 1} (1 − z)^{(1+α)/2 − 1} dz,
which converges for α < 1 and diverges for α ≥ 1.

Pareto Distribution
We say that the RV X has a Pareto distribution with parameters θ > 0 and α > 0 if its pdf is given by
f(x) = αθ^α/(x + θ)^{α+1} for x > 0, and 0 otherwise.

θ : scale parameter
α : shape parameter
F(x) = P(X ≤ x) = 1 − θ^α/(θ + x)^α, x > 0
E(X) = θ/(α − 1), α > 1
Var(X) = αθ²/((α − 2)(α − 1)²), α > 2.

Suppose that X has a Pareto distribution with parameters θ and α. Writing Y = ln(X/θ), we can see that Y has pdf
f_Y(y) = α e^y/(1 + e^y)^{α+1}, −∞ < y < ∞,
F_Y(y) = 1 − (1 + e^y)^{−α}, ∀y.

Weibull Distribution
Recall the Gamma distribution X ∼ G(α, β) with pdf
f(x) = (1/(Γ(α)β^α)) x^{α−1} e^{−x/β} for x > 0, and 0 for x ≤ 0.
For G(1, β) this becomes
f(x) = (1/β) e^{−x/β} for x > 0, and 0 for x ≤ 0.
Let X ∼ G(1, β) and Y = X^{1/α}, α > 0. Then

F(y) = P(Y ≤ y)
     = P(X^{1/α} ≤ y)
     = ∫_{−∞}^{y^α} f(x) dx
     = ∫_0^{y^α} (1/β) e^{−x/β} dx
     = ∫_0^{y^α/β} e^{−z} dz
     = [−e^{−z}]_0^{y^α/β}
     = 1 − e^{−y^α/β}

f(y) = F′(y):
f(y) = (α/β) y^{α−1} e^{−y^α/β} for y > 0, and 0 for y ≤ 0.      (12)
Y is said to have the Weibull distribution with pdf given by (12).

Reliability

• Let T ∈ [0, ∞) be a random variable representing the lifetime of a part/component/system/machine etc.

• Assume that T has a continuous cdf F(·) and pdf f(·) on [0, ∞).

Hazard Rate / Failure Rate

Suppose that the item has survived up to time t and we want to know the probability that it will not survive for an additional time dt. Hence
P(T ∈ (t, t + dt) | T > t) = P(T ∈ (t, t + dt))/P(T > t) = f(t)dt/(1 − F(t)).
The function
λ(t) = f(t)/(1 − F(t)) = (d/dt)[− log(1 − F(t))]      (13)
represents the conditional probability intensity that an item of age t will fail instantly.

This instantaneous failure rate is known as the hazard rate / failure rate.
The function 1 − F(t) = F̄(t) is called the survival or reliability function.
From (13), prove that
1 − F(t) = exp(−∫_0^t λ(s) ds),
F(t) = 1 − exp(−∫_0^t λ(s) ds),
f(t) = F′(t).

Example 38. λ(t) = a + bt.
F(t) = 1 − e^{−at − bt²/2}
f(t) = (a + bt) e^{−(at + bt²/2)} for t ≥ 0, and 0 for t < 0.      (14)

Remark. In (14), put a = 0; then we get a distribution which is called the Rayleigh distribution.
Example 39. Suppose the death rate of a smoker at each age is k times that of a non-smoker, i.e., at age t, λ_s(t) = k λ_n(t).
Suppose an a-year-old non-smoker will survive until age b, b > a. Then
P(a-year-old non-smoker reaches age b)
= P(non-smoker lifetime > b | non-smoker lifetime > a)
= (1 − F_n(b))/(1 − F_n(a))
= exp(−∫_0^b λ_n(t) dt) / exp(−∫_0^a λ_n(t) dt)
= exp(−∫_a^b λ_n(t) dt).
P(a-year-old smoker reaches age b)
= exp(−∫_a^b λ_s(t) dt)
= exp(−k ∫_a^b λ_n(t) dt)
= {exp(−∫_a^b λ_n(t) dt)}^k.

Problem 16. Under a certain complicated birth situation the mortality rate of a newborn child is given by z(t) = 0.5 + 2t, t > 0. If the baby survives to age one, what is the probability that he/she will survive to age 2?

Solution: Let X be the age at death.
P(X > 2 | X > 1) = P(X > 2)/P(X > 1).
P(X > t) = R_X(t) = exp(−∫_0^t z(s) ds) = exp(−(0.5t + t²)).
P(X > 2) = e^{−5}, P(X > 1) = e^{−1.5}, so P(X > 2 | X > 1) = e^{−5}/e^{−1.5} = e^{−3.5} ≈ 0.03.

Problem 17. Consider a university computer center with an average job-submission rate of λ = 0.1 jobs per second. Assuming that the number of arrivals per unit time is Poisson distributed with parameter λ, find the probability that an interval of 10 seconds elapses without a job submission.

Solution: The waiting time X to the next submission is exponential with rate 0.1, so
P(X ≥ 10) = ∫_{10}^∞ 0.1 e^{−0.1t} dt = e^{−1} ≈ 0.368.

Exercise: Prove the following:

1. E(aX) = aE(X); E(nX) = E(X + · · · + X) = nE(X); E(cX + d) = cE(X) + d.

2. Var(cX) = c² Var(X).

Recall M(t) = E(e^{tX}) = ∫_{−∞}^∞ e^{tx} f(x) dx, with M′(0) = E(X) and M″(0) = E(X²).

Problem 18. Compute the probability of obtaining 3 defectives in a sample of size 10 taken without replacement from a lot of 20 components containing 4 defectives.

Solution: The distribution is hypergeometric: h(3; 10, 4, 20) = C(4, 3)C(16, 7)/C(20, 10) ≈ 0.248.

Problem 19. The lifetime X in hours of a component is modeled by a Weibull distribution with α = 2. Starting with a large number of components, it is observed that 15% of the components that have lasted 90 hours fail before 100 hours. Determine the parameter λ.

Solution: F_X(x) = 1 − e^{−λx²}.
P(X < 100 | X > 90) = 0.15
=⇒ P(90 < X < 100)/P(X > 90) = 0.15
=⇒ (F_X(100) − F_X(90))/(1 − F_X(90)) = 0.15
=⇒ (e^{−λ·90²} − e^{−λ·100²})/e^{−λ·90²} = 0.15
=⇒ 1 − e^{−1900λ} = 0.15
=⇒ λ = −ln(0.85)/1900 ≈ 0.0000855.
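The closed-form solution of Problem 19 in one line (sketch):

```python
import math

# solve (F(100) - F(90)) / (1 - F(90)) = 0.15 with F(x) = 1 - exp(-lam * x**2)
lam = -math.log(0.85) / (100 ** 2 - 90 ** 2)
print(lam)  # about 8.55e-05
```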

5 Joint distribution
Many a time, data appear in pairs, tuples, etc. For instance, one may be interested in collecting (height, weight) data of college students for a study.
We begin with the definition of a bivariate distribution. Similarly, one can define multivariate distributions.

Definition 35. Let X and Y be two random variables of discrete type defined on a discrete probability space, and let S denote the corresponding two-dimensional space of X and Y. The probability that X = x, Y = y is denoted by f(x, y) = P(X = x, Y = y).
The function f(x, y) is called the joint probability mass function (pmf) of X and Y if it has the following properties:

1. 0 ≤ f(x, y) ≤ 1

2. Σ Σ_{(x,y)∈S} f(x, y) = 1

3. P[(X, Y) ∈ A] = Σ Σ_{(x,y)∈A} f(x, y), A ⊆ S.

Example 40. Roll a pair of unbiased dice. The sample space is S = {(1, 1), (1, 2), · · · , (6, 6)}, |S| = 36. The probability of each point (x, y) is 1/36.
Let X denote the smaller and Y the larger outcome on the dice. P(X = 2, Y = 3) = 2/36, since (2, 3) and (3, 2) are favourable outcomes; P(X = 2, Y = 2) = 1/36. In general
f(x, y) = 1/36 if 1 ≤ x = y ≤ 6, and f(x, y) = 2/36 if 1 ≤ x < y ≤ 6.

Definition 36. Let X and Y have the joint pmf f(x, y) with space S. The pmf of X alone, which is called the marginal pmf of X, is defined by
f₁(x) = Σ_y f(x, y) = P(X = x), x ∈ S₁.
Similarly, the marginal pmf of Y is
f₂(y) = Σ_x f(x, y) = P(Y = y), y ∈ S₂.

Definition 37. The random variables X and Y are independent ⇐⇒ P(X = x, Y = y) = P(X = x)P(Y = y), i.e., f(x, y) = f₁(x)f₂(y), x ∈ S₁, y ∈ S₂.

Problem 20. In the previous example, are X and Y independent?

Answer: No, f(x, y) ≠ f₁(x)f₂(y).
For instance, f₁(1)f₂(1) = (11/36) × (1/36) ≠ f(1, 1) = 1/36.
2
Example 41. The joint pmf of X and Y is given by f(x, y) = xy²/30, x = 1, 2, 3; y = 1, 2; S = {(1, 1), (2, 1), (3, 1), · · · } ⊆ R². Find the marginal pmfs. Are X and Y independent?
Let us calculate:
f₁(x) = Σ_{y=1}^2 xy²/30 = x/30 + 4x/30 = 5x/30 = x/6
f₂(y) = Σ_{x=1}^3 xy²/30 = y²/30 + 2y²/30 + 3y²/30 = 6y²/30 = y²/5
f(x, y) = xy²/30 = (x/6)(y²/5) = f₁(x)f₂(y) ∀x, y.
Hence, X and Y are independent.
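A small script confirming the factorization in Example 41 exactly, using rational arithmetic (sketch):

```python
from fractions import Fraction

f = {(x, y): Fraction(x * y * y, 30) for x in (1, 2, 3) for y in (1, 2)}
f1 = {x: sum(f[x, y] for y in (1, 2)) for x in (1, 2, 3)}   # marginal of X
f2 = {y: sum(f[x, y] for x in (1, 2, 3)) for y in (1, 2)}   # marginal of Y

print(all(f[x, y] == f1[x] * f2[y] for x in (1, 2, 3) for y in (1, 2)))  # True
```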

Definition 38. Let X₁ and X₂ be two random variables of discrete type with joint pmf f(x₁, x₂) on the space S. If u(X₁, X₂) is a function of these two random variables, then
E(u(X₁, X₂)) = Σ Σ_{(x₁,x₂)∈S} u(x₁, x₂) f(x₁, x₂),
if it exists, is called the mathematical expectation (or expected value) of u(X₁, X₂).

Remark. Existence requires that Σ Σ_{(x₁,x₂)∈S} |u(x₁, x₂)| f(x₁, x₂) converges and is finite. If Y = u(X₁, X₂) has pmf g(y), then E(Y) = Σ_{y∈S₁} y g(y) = Σ Σ_{(x₁,x₂)∈S} u(x₁, x₂) f(x₁, x₂).

Example 42. There are eight similar chips in a bowl: three marked (0, 0), two marked (0, 1), two marked (1, 0), and one marked (1, 1). A player selects a chip at random and is given the sum of the coordinates in dollars. Let X₁ and X₂ represent those coordinates. Then f(x₁, x₂) = (3 − x₁ − x₂)/8, x₁ = 0, 1; x₂ = 0, 1.
With u(X₁, X₂) = X₁ + X₂,
E(u(X₁, X₂)) = Σ_{x₂=0}^1 Σ_{x₁=0}^1 (x₁ + x₂)(3 − x₁ − x₂)/8
             = 0 · 3/8 + 1 · 2/8 + 1 · 2/8 + 2 · 1/8
             = 6/8
             = 3/4.

Remark. (a) If u(X₁, X₂) = X_i, then E(u(X₁, X₂)) = E(X_i) = µ_i is called the mean of X_i, i = 1, 2:
E(X₁) = Σ Σ_{(x₁,x₂)} x₁ f(x₁, x₂) = Σ_{x₁} x₁ f₁(x₁).
(b) If u₂(X₁, X₂) = (X_i − µ_i)², then E(u₂(X₁, X₂)) = E[(X_i − µ_i)²] = σ_i² = Var(X_i).

Definition 39. The joint pdf of two continuous type random variables is an integrable function f(x, y) with the following properties:

1. f(x, y) ≥ 0

2. ∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y) dx dy = 1

3. P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy, where {(x, y) ∈ A} is an event in the plane.

Remark. Property (3) implies that P[(X, Y) ∈ A] is the volume of the solid over the region A in the xy-plane bounded by the surface z = f(x, y).

Example 43. Let X and Y have the joint pdf f(x, y) = (3/2)x²(1 − |y|), −1 < x < 1, −1 < y < 1.
Let A = {(x, y) : 0 < y < x < 1}. The probability that (X, Y) falls in A is given by
P[(X, Y) ∈ A] = ∫_0^1 ∫_0^x (3/2)x²(1 − y) dy dx = 9/40.

The respective marginal pdfs for the continuous case are
f₁(x) = ∫_{−∞}^∞ f(x, y) dy, x ∈ S₁
f₂(y) = ∫_{−∞}^∞ f(x, y) dx, y ∈ S₂,
where S₁ and S₂ are the spaces of X and Y.
From the previous example, we calculate
f₁(x) = ∫_{−1}^1 (3/2)x²(1 − |y|) dy = (3/2)x², −1 < x < 1
f₂(y) = ∫_{−1}^1 (3/2)x²(1 − |y|) dx = 1 − |y|, −1 < y < 1.

Remark. The definitions associated with mathematical expectations are the same as those associated with the discrete case, after replacing summations with integrations.

Example 44. Let X and Y have joint pdf f(x, y) = 2, 0 ≤ x ≤ y ≤ 1. Check:
∫_{−∞}^∞ ∫_{−∞}^∞ f(x, y) dx dy = ∫_{x=0}^1 ∫_{y=x}^1 2 dy dx
                               = ∫_0^1 [2y]_x^1 dx
                               = ∫_0^1 (2 − 2x) dx
                               = [2x − x²]_0^1
                               = 1.

Problem 21. Let X and Y have the joint pdf f(x, y) = 2, 0 ≤ x ≤ y ≤ 1.
Find P(0 ≤ X ≤ 1/2, 0 ≤ Y ≤ 1/2).

Solution:
P(0 ≤ X ≤ 1/2, 0 ≤ Y ≤ 1/2) = P(0 ≤ X ≤ Y, 0 ≤ Y ≤ 1/2)
                             = ∫_0^{1/2} ∫_0^y 2 dx dy
                             = ∫_0^{1/2} 2y dy
                             = 1/4.
[Draw picture]
The marginal pdfs are given by
f₁(x) = ∫_x^1 2 dy = 2(1 − x), 0 ≤ x ≤ 1
f₂(y) = ∫_0^y 2 dx = 2y, 0 ≤ y ≤ 1.
Three illustrations of expected values are
E(X) = ∫_0^1 ∫_x^1 2x dy dx = ∫_0^1 2x(1 − x) dx = 1/3
E(Y) = ∫_0^1 ∫_0^y 2y dx dy = ∫_0^1 2y² dy = 2/3
E(Y²) = ∫_0^1 ∫_0^y 2y² dx dy = ∫_0^1 2y³ dy = 1/2.

Definition 40. X and Y are independent ⇐⇒ the joint factors into the
product of their marginal pdfs, namely,
f (x, y) = f1 (x)f2 (y), x ∈ S1 , y ∈ S2 .

Binomial distribution to trinomial distribution

If we extend: we have 3 mutually exclusive and exhaustive ways for an experiment to terminate; say perfect, seconds, and defective.

We repeat the experiment n independent times, and the probabilities p₁, p₂, p₃ = 1 − p₁ − p₂ of perfect, seconds and defective remain the same from trial to trial. In the n trials, let X₁ = number of perfect items, X₂ = number of seconds items, X₃ = number of defectives = n − X₁ − X₂.
If x₁ and x₂ are non-negative integers such that x₁ + x₂ ≤ n, then the probability of having x₁ perfect, x₂ seconds, and n − x₁ − x₂ defectives in that order is p₁^{x₁} p₂^{x₂} (1 − p₁ − p₂)^{n−x₁−x₂}.
Recall: multiple random variables X = (X₁, · · · , Xₙ) are discussed further below.

Definition 41. The distribution function is F(x, y) = P(X ≤ x, Y ≤ y).

Theorem 6. A function F of two variables is a DF of some two-dimensional RV ⇐⇒ it satisfies the following conditions:

1. F is non-decreasing and right continuous with respect to each coordinate.

2. F(−∞, y) = F(x, −∞) = 0 and F(+∞, +∞) = 1.

3. For every (x₁, y₁), (x₂, y₂) with x₁ < x₂ and y₁ < y₂, the inequality F(x₂, y₂) − F(x₂, y₁) + F(x₁, y₁) − F(x₁, y₂) ≥ 0 holds.

Remark. The conditions (1) and (2) alone are not sufficient to make a function F(x, y) a DF. For example, consider F(x, y) = 0 when x < 0 or y < 0 or x + y < 1, and F(x, y) = 1 otherwise. Then F satisfies both (1) and (2). However, check that P(1/3 < X ≤ 1, 1/3 < Y ≤ 1) = −1 ≱ 0.

µ_i := E(X_i), i = 1, 2
σ_i² = E((X_i − µ_i)²), i = 1, 2
Covariance and correlation:

1. Let u₃(X₁, X₂) = (X₁ − µ₁)(X₂ − µ₂). Then E(u₃(X₁, X₂)) = E[(X₁ − µ₁)(X₂ − µ₂)] = σ₁₂ = Cov(X₁, X₂) is called the covariance of X₁ and X₂.

2. Cov(X₁, X₂) = E(X₁X₂) − E(X₁)E(X₂)

3. If σ₁ > 0 and σ₂ > 0, then ρ = σ₁₂/(σ₁σ₂) is called the correlation coefficient of X₁ and X₂.

Note: E((X₁ − µ₁)(X₂ − µ₂))
= E(X₁X₂ − µ₁X₂ − µ₂X₁ + µ₁µ₂)
= E(X₁X₂) − µ₁µ₂
(check that E is linear and distributive), so

Cov(X₁, X₂) = E(X₁X₂) − µ₁µ₂ = σ₁₂      (15)

ρ = Cov(X₁, X₂)/(σ₁σ₂) = Cov((X₁ − µ₁)/σ₁, (X₂ − µ₂)/σ₂)      (16)

The above two equations give E(X₁X₂) = µ₁µ₂ + ρσ₁σ₂,
i.e., the expected value of the product of two random variables is equal to the product µ₁µ₂ of their expectations plus their covariance ρσ₁σ₂. Also, observe that the correlation is a covariance after standardization.
Some properties of covariance:

1. V ar(X) = Cov(X, X)

2. Cov(X, c) = 0, where c is constant.

3. Cov(X, Y ) = Cov(Y, X)

4. Cov(cX, Y ) = cCov(X, Y )

5. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)


Theorem 7.
−1 ≤ ρ ≤ 1.
x1 +2x2
Problem 22. Let X1 and X2 have the joint pmf f (x1 , x2 ) = 18 , x1 =
1, 2; x2 = 1, 2.
The marginal pmf are respectively,
f1 (x1 ) = 2x2 =1 x1 +2x = 2x18
1 +6
P
18
2
, x1 = 1, 2
P2 x1 +2x 3+4x2
f2 (x2 ) = x1 18 = 18 , x2 = 1, 2
2

Since f (x1 , x2 ) ̸= f1 (x1 )f2 (x2 ), X1 and X2 are dependent. The mean and
the variance of X1 are respectively
µ1 = 2x1 =1 x1 2x18 1 +6 8 10
= 14
P
= 18 + 2 18 9
σ 2 = 2x1 =1 x21 2x18 1 +6
− ( 14 2 = 24 − 196 = 20
P
9 ) 9 81 81

The mean and variance of X2 are respectively,


µ2 = 2x2 =1 x2 3+4x 7 11 20
P
18
2
= 18 + 2 18 = 18
P2 2 3+4x2
σ22 = x2 =1 x2 18 − ( 20 2
18 ) =
51
18 − 841
324 = 77
324

The covariance of X1 and X2 is


cov(X1 , X2 ) = 2x1 =1 2x2 =1 x1 x2 x118
+x2
− ( 14 29 1
P P
9 )( 18 ) = − 162

− 1 1
corelation coefficient ρ = q 162
20 77
= − √1540 = −0.025
( 81 )( 324 )
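The moments in Problem 22 can be reproduced mechanically from the joint pmf (a sketch with exact fractions):

```python
from fractions import Fraction
import math

f = {(x1, x2): Fraction(x1 + 2 * x2, 18) for x1 in (1, 2) for x2 in (1, 2)}
E = lambda g: sum(g(x1, x2) * p for (x1, x2), p in f.items())

mu1, mu2 = E(lambda a, b: a), E(lambda a, b: b)
var1 = E(lambda a, b: a * a) - mu1 ** 2
var2 = E(lambda a, b: b * b) - mu2 ** 2
cov = E(lambda a, b: a * b) - mu1 * mu2
print(cov, cov / math.sqrt(var1 * var2))  # -1/162 and about -0.025
```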

5.1 Conditional distributions in two variables


Let X and Y have a joint discrete distribution with pmf f (x, y) on space S.
Let the marginal pmf be f1 (x) and f2 (y) with spaces S1 and S2 respectively.
Let event A = {X = x} and B = {Y = y}, (x, y) ∈ S. Thus A ∩ B = {X =
x, Y = y},
P (A ∩ B) = P (X = x, Y = y) = f (x, y)
P (B) = P (Y = y) = f2 (y) > 0
We see that the conditional probability of event A given event B is
P(A|B) = P(A ∩ B)/P(B) = f(x, y)/f₂(y).

Definition 42. The conditional pmf of X, given that Y = y, is defined by
g(x|y) = f(x, y)/f₂(y), provided f₂(y) > 0.
Similarly, the conditional pmf of Y, given that X = x, is defined by
h(y|x) = f(x, y)/f₁(x), provided f₁(x) > 0.

Example 45. Let X and Y have the joint pmf f(x, y) = (x + y)/21, x = 1, 2, 3; y = 1, 2. We showed that f₁(x) = (2x + 3)/21, x = 1, 2, 3, and f₂(y) = (3y + 6)/21, y = 1, 2.
So the conditional pmf of X, given Y = y, is g(x|y) = ((x + y)/21)/((3y + 6)/21) = (x + y)/(3y + 6), x = 1, 2, 3; y = 1, 2.
P(X = 2|Y = 2) = g(2|2) = 4/12 = 1/3.
Similarly, the conditional pmf of Y, given X = x, is h(y|x) = (x + y)/(2x + 3), x = 1, 2, 3; y = 1, 2.

Note: h(y|x) ≥ 0. If we sum h(y|x) over y for that fixed x, we obtain
Σ_y h(y|x) = Σ_y f(x, y)/f₁(x) = f₁(x)/f₁(x) = 1.

Thus h(y|x) satisfies the conditions of a probability mass function, so we can compute conditional probabilities such as
P(a < Y < b | X = x) = Σ_{y: a<y<b} h(y|x)
and conditional expectations such as
E(u(Y)|X = x) = Σ_y u(y) h(y|x)
in a manner similar to those associated with unconditional probabilities and expectations.

Two special conditional expectations are the conditional mean of Y, given X = x, defined by
µ_{Y|x} = E(Y|x) = Σ_y y h(y|x),
and the conditional variance of Y, given X = x, defined by
σ²_{Y|x} = E((Y − E(Y|x))²|x) = Σ_y (y − E(Y|x))² h(y|x),
which can be computed using
σ²_{Y|x} = E(Y²|x) − (E(Y|x))².
Similarly, define µ_{X|y} and σ²_{X|y}.

Example 46. With the background of the earlier example, compute µ_{Y|x} and σ²_{Y|x} when x = 3.
µ_{Y|3} = E(Y|X = 3) = Σ_{y=1}^2 y h(y|3) = Σ_{y=1}^2 y (3 + y)/9 = 4/9 + 10/9 = 14/9
σ²_{Y|3} = E((Y − 14/9)²|X = 3) = Σ_{y=1}^2 (y − 14/9)²(3 + y)/9 = (25/81)(4/9) + (16/81)(5/9) = 20/81.

Binomial distribution to Trinomial distribution
Here we have 3 mutually exclusive and exhaustive ways for the experiment to terminate, say perfect, “seconds”, and defective. We repeat the experiment n independent times, and the probabilities p₁, p₂, p₃ = 1 − p₁ − p₂ of perfect, “seconds” and defective respectively remain the same in each trial. In the n trials, let X₁ = number of perfect items, X₂ = number of “seconds” items, X₃ = n − X₁ − X₂ = number of defectives.
If x₁ and x₂ are non-negative integers such that x₁ + x₂ ≤ n, then the probability of having x₁ perfect, x₂ “seconds” and n − x₁ − x₂ defectives, in a given order, is
p₁^{x₁} p₂^{x₂} (1 − p₁ − p₂)^{n−x₁−x₂}.

However, if we want P(X₁ = x₁, X₂ = x₂), then we recognize that X₁ = x₁ and X₂ = x₂ can be achieved in n!/(x₁! x₂! (n − x₁ − x₂)!) different ways. Hence the trinomial pmf is
f(x₁, x₂) = P(X₁ = x₁, X₂ = x₂) = (n!/(x₁! x₂! (n − x₁ − x₂)!)) p₁^{x₁} p₂^{x₂} (1 − p₁ − p₂)^{n−x₁−x₂}.
We know that X₁ is b(n, p₁) and X₂ is b(n, p₂), and thus X₁ and X₂ are dependent, since the product of these marginal pmfs is not equal to f(x₁, x₂).

Problem 23. In manufacturing a certain item, it is found that in normal production about 95% of the items are good ones, 4% are “seconds”, and 1% are defective.
This particular company has a program of quality control by statistical methods, and each hour an inspector observes 20 items selected at random, counting the number X of “seconds” and the number Y of defectives. If, in fact, the production is normal, find the probability that in this sample of size n = 20 at least two “seconds” or at least two defective items are found.

Solution: Let A = {(x, y) : x ≥ 2 or y ≥ 2}. Then
P(A) = 1 − P(A^c)
     = 1 − P(X = 0 or 1, and Y = 0 or 1)
     = 1 − P(X = 0, Y = 0) − P(X = 0, Y = 1) − P(X = 1, Y = 0) − P(X = 1, Y = 1)
     = 1 − (20!/(0! 0! 20!))(0.04)⁰(0.01)⁰(0.95)²⁰ − · · ·
     = 0.204.
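A direct evaluation of the complement used in Problem 23 (sketch):

```python
from math import factorial

def trinomial(x, y, n=20, p1=0.04, p2=0.01):
    """Trinomial pmf P(X1 = x, X2 = y) for 'seconds' and defectives."""
    return (factorial(n) / (factorial(x) * factorial(y) * factorial(n - x - y))
            * p1 ** x * p2 ** y * (1 - p1 - p2) ** (n - x - y))

p_complement = sum(trinomial(x, y) for x in (0, 1) for y in (0, 1))
print(1 - p_complement)  # about 0.204
```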

5.2 Transformation of random variables


We considered the transformation of one random variable X with pdf f (x).
“change of variable technique”
X continuous pdf = f (x), c1 < x < c2
Y = u(x) is a continuous increasing function of X with the inverse function

44
X = v(Y ). c1 < x < c2 maps onto d1 = u(c1 ) < y < d2 = u(c2 )
The distribution function Y is G(y) = P (Y ≤ y) = P (u(X) ≤ y) = P (X ≤
v(y)), d1 < y < d2
G(y) = 0, y ≤ d1
G(y) = 1, y ≥ d2
Then Z v(y)
G(y) = f (x)dx, d1 < y < d2 (17)
0
From calculus: from (17), we get
′ ′
G (y) = g(y) = f (v(y)v (y)), d1 < y < d2

G (y) = g(y) = 0 if y < d1 or y > d2 .
For an illustration of this change of variable technique, let Y = e^X, where X has pdf
f(x) = (1/(Γ(α)θ^α)) x^{α−1} e^{−x/θ}, 0 < x < ∞.
Here c1 = 0 and c2 = ∞, and thus d1 = 1 and d2 = ∞; also X = ln Y, i.e., v(y) = ln y.
Since v′(y) = 1/y, the pdf of Y is
g(y) = (1/(Γ(α)θ^α)) (ln y)^{α−1} e^{−(ln y)/θ} (1/y), 1 < y < ∞.
More generally, for a monotone transformation,
g(y) = |v′(y)| f(v(y)).
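A Monte Carlo sanity check of this change of variable (a sketch with illustrative values of α and θ): since {Y ≤ y0} = {X ≤ ln y0}, the DF of Y at y0 must equal the Gamma(α, θ) CDF at ln y0.

# Empirical DF of Y = e^X versus the Gamma CDF evaluated at ln(y0).
import numpy as np
from scipy.stats import gamma

alpha, theta = 2.0, 1.5
rng = np.random.default_rng(0)
x = rng.gamma(shape=alpha, scale=theta, size=200_000)
y = np.exp(x)
y0 = 5.0
print((y <= y0).mean())                              # empirical G(y0)
print(gamma.cdf(np.log(y0), a=alpha, scale=theta))   # theoretical G(y0)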

Let X be binomial with parameters n and p. Since X has a discrete
distribution, Y = u(X) will also have a discrete distribution, with the same
probabilities as those on the support of X.
For illustration, with n = 3, p = 1/4 and Y = X², we have
g(y) = C(3, √y) (1/4)^{√y} (3/4)^{3−√y}, y = 0, 1, 4, 9,
corresponding to x = 0, 1, 2, 3.
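The same pmf can be computed directly (a sketch assuming scipy): the pmf of Y = X² is just the binomial pmf re-indexed at x = √y.

# pmf of Y = X^2 when X ~ b(3, 1/4); the support of Y is {0, 1, 4, 9}.
from scipy.stats import binom

n, p = 3, 0.25
g = {x**2: binom.pmf(x, n, p) for x in range(n + 1)}
print(g)
print(sum(g.values()))   # 1.0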

Two variable case


If X1 and X2 are two continuous type random variables with joint pdf
f(x1, x2), and the transformation Y1 = u1(X1, X2), Y2 = u2(X1, X2) has a single-valued
inverse X1 = v1(Y1, Y2), X2 = v2(Y1, Y2), then
the joint pdf of Y1 and Y2 is g(y1, y2) = |J| f(v1(y1, y2), v2(y1, y2)), (y1, y2) ∈ S1,
where the Jacobian J is the determinant

J = | ∂x1/∂y1   ∂x1/∂y2 |
    | ∂x2/∂y1   ∂x2/∂y2 |

We find the support S1 of (Y1, Y2) by considering the image of the support
S of (X1, X2) under the transformation y1 = u1(x1, x2), y2 = u2(x1, x2).
This method of finding the distributions of Y1 and Y2 is called the change
of variable technique. In summary: solve for
x1 = v1(y1, y2), x2 = v2(y1, y2)
and compute

J = | ∂x1/∂y1   ∂x1/∂y2 |
    | ∂x2/∂y1   ∂x2/∂y2 |

Example 47. Let X1, X2 have the joint pdf f(x1, x2) = 2, 0 < x1 < x2 < 1.
Consider the transformation Y1 = X1/X2, Y2 = X2.

It is certainly easy enough to solve for x1 and x2, namely x1 = y1 y2, x2 = y2,
and

J = | y2   y1 |
    |  0    1 | = y2.

Note: The points for which y2 = 0, 0 ≤ y1 < 1, all map into the single
point x1 = 0, x2 = 0; i.e., the mapping is many-to-one there, and yet we are
restricting ourselves to one-to-one mappings. However, the boundaries are not part of our
support. Thus S1 = {(y1, y2) : 0 < y1 < 1, 0 < y2 < 1}, and according to the rule, the joint
pdf of Y1 and Y2 is g(y1, y2) = |y2|·2 = 2y2, 0 < y1 < 1, 0 < y2 < 1. It is
interesting to note that the marginal probability density functions are
g1(y1) = ∫_0^1 2y2 dy2 = 1, 0 < y1 < 1,
g2(y2) = ∫_0^1 2y2 dy1 = 2y2, 0 < y2 < 1.

Thus Y1 = X1/X2 and Y2 = X2 are independent. Even though the computation
of Y1 depends very much on the value of Y2, still Y1 and Y2 are independent
in the probability sense.
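A simulation check of this independence (a sketch): if U, V are iid Uniform(0, 1), then (min(U, V), max(U, V)) has exactly the joint density 2 on 0 < x1 < x2 < 1 used here, so (X1, X2) can be sampled that way.

# Empirical check that P(Y1 < 0.5, Y2 < 0.5) = P(Y1 < 0.5) P(Y2 < 0.5).
import numpy as np

rng = np.random.default_rng(1)
u, v = rng.random(500_000), rng.random(500_000)
x1, x2 = np.minimum(u, v), np.maximum(u, v)
y1, y2 = x1 / x2, x2

a = (y1 < 0.5).mean()    # ~0.5  since Y1 ~ Uniform(0, 1)
b = (y2 < 0.5).mean()    # ~0.25 since Y2 has pdf 2*y2
print(np.logical_and(y1 < 0.5, y2 < 0.5).mean(), a * b)   # nearly equal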

Example 48. Let X1 and X2 be independent random variables, each with
pdf f(x) = e^{−x}, 0 < x < ∞. Hence their joint pdf is

f(x1) f(x2) = e^{−x1−x2}, 0 < x1 < ∞, 0 < x2 < ∞.

Let us consider Y1 = X1 − X2, Y2 = X1 + X2, so that
x1 = (y1 + y2)/2, x2 = (y2 − y1)/2,
and

J = |  1/2   1/2 |
    | −1/2   1/2 | = 1/2.

[Figures 1 and 2: the supports S and S1.] The line segments on the boundary, namely x1 = 0, 0 <
x2 < ∞, and x2 = 0, 0 < x1 < ∞, map into the line segments
y1 + y2 = 0, y2 > y1, and y1 = y2, y2 > −y1, respectively.
These are shown in Figure 2, and the support S1 is depicted there.
Since the region S1 is not bounded by horizontal and vertical line segments,
Y1 and Y2 are dependent.
The joint pdf of Y1 and Y2 is

g(y1, y2) = (1/2) e^{−y2}, −y2 < y1 < y2, 0 < y2 < ∞.

The marginal pdf of Y2 is
g2(y2) = ∫_{−y2}^{y2} (1/2) e^{−y2} dy1 = y2 e^{−y2}, 0 < y2 < ∞.
That of Y1 is
g1(y1) = ∫_{−y1}^{∞} (1/2) e^{−y2} dy2 = (1/2) e^{y1},   −∞ < y1 < 0,
g1(y1) = ∫_{y1}^{∞} (1/2) e^{−y2} dy2 = (1/2) e^{−y1},   0 < y1 < ∞;

that is, the expression for g1(y1) depends on the location of y1, although
this can be written as
g1(y1) = (1/2) e^{−|y1|}, −∞ < y1 < ∞,
which is called a double exponential pdf.
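A quick simulation check that X1 − X2 is double exponential (a sketch assuming numpy and scipy; the standard Laplace distribution has pdf (1/2)e^{−|y|}):

# Empirical DF of Y1 = X1 - X2 versus the standard Laplace CDF.
import numpy as np
from scipy.stats import laplace

rng = np.random.default_rng(2)
x1 = rng.exponential(1.0, size=500_000)
x2 = rng.exponential(1.0, size=500_000)
y1 = x1 - x2

for t in (-1.0, 0.0, 1.5):
    print((y1 <= t).mean(), laplace.cdf(t))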
Theorem 8. Let (X1 , X2 , · · · , Xn ) be an n dimensional random variable of
the continuous type with pdf f (x1 , x2 , · · · , xn )

1. let y1 = g1 (x1 , x2 , · · · , xn )
y2 = g2 (x1 , x2 , · · · , xn )
..
.
yn = gn (x1 , x2 , · · · , xn )
be a one-one mapping Rn 7→ Rn , i.e., there exists the inverse transfor-
mation
x1 = h1 (y1 , y2 , · · · , yn ), x2 = h2 (y1 , y2 , · · · , yn ), · · · , xn = hn (y1 , y2 , · · · , yn )
defined over the range of the transformation.

2. Assume that both the mapping and its inverse are continuous

3. Assume that the partial derivatives ∂xi/∂yj, 1 ≤ i ≤ n, 1 ≤ j ≤ n, exist and
are continuous.

4. Assume that the Jacobian J of the inverse transformation

   J = ∂(x1, · · · , xn)/∂(y1, · · · , yn) =
       | ∂x1/∂y1   ∂x1/∂y2   · · ·   ∂x1/∂yn |
       | ∂x2/∂y1   ∂x2/∂y2   · · ·   ∂x2/∂yn |
       |    ···       ···              ···    |
       | ∂xn/∂y1   ∂xn/∂y2   · · ·   ∂xn/∂yn |

is different from zero for (y1, y2, · · · , yn) in the image of the transfor-
mation.
Then (Y1, Y2, · · · , Yn) has a joint absolutely continuous DF with pdf given
by
w(y1, y2, · · · , yn) = |J| f(h1(y1, · · · , yn), · · · , hn(y1, · · · , yn)).

Example 49. Let X1, X2, X3 be iid RVs with common exponential pdf
f(x) = e^{−x} for x > 0, and f(x) = 0 otherwise.

Also let Y1 = X1 + X2 + X3, Y2 = (X1 + X2)/(X1 + X2 + X3), Y3 = X1/(X1 + X2). Then x1 = y1 y2 y3,
x2 = y1 y2 − x1 = y1 y2 (1 − y3), x3 = y1 − y1 y2 = y1 (1 − y2). The Jacobian of the inverse
transformation is given by

J = | y2 y3        y1 y3        y1 y2  |
    | y2 (1 − y3)  y1 (1 − y3)  −y1 y2 |
    | 1 − y2       −y1          0      | = −y1² y2.

Note that 0 < y1 < ∞, 0 < y2 < 1, 0 < y3 < 1, since 0 < (X1 + X2)/(X1 + X2 + X3) < 1 and 0 < X1/(X1 + X2) < 1.
The joint pdf of Y1, Y2, Y3 is given by w(y1, y2, y3) = y1² y2 e^{−y1}.
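Since w(y1, y2, y3) factors as (y1² e^{−y1}/2) · (2y2) · 1, the three variables should be independent with Y1 ∼ Gamma(3, 1), Y2 having pdf 2y on (0, 1), and Y3 ∼ Uniform(0, 1). A simulation sketch checking a few consequences of this factorization:

# Simulate X1, X2, X3 ~ Exp(1) and check simple probabilities for Y1, Y2, Y3.
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=(500_000, 3))
y1 = x.sum(axis=1)
y2 = (x[:, 0] + x[:, 1]) / y1
y3 = x[:, 0] / (x[:, 0] + x[:, 1])

print(y1.mean(), 3.0)              # E of Gamma(3, 1) is 3
print((y2 < 0.5).mean(), 0.25)     # P(Y2 < 0.5) = 0.5^2 for pdf 2y
print((y3 < 0.5).mean(), 0.5)      # Y3 ~ Uniform(0, 1)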

5.3 Bivariate normal distribution


Definition 43. A two dimensional RV (X, Y) is said to have a bivariate
normal distribution if the joint pdf is of the form

1. f(x, y) = (1/(2π σ1 σ2 √(1 − ρ²))) e^{−Q(x,y)/2}, −∞ < x < ∞, −∞ < y < ∞,
   where σ1 > 0, σ2 > 0, |ρ| < 1 and Q is the positive definite quadratic form

2. Q(x, y) = (1/(1 − ρ²)) [ ((x − µ1)/σ1)² − 2ρ ((x − µ1)/σ1)((y − µ2)/σ2) + ((y − µ2)/σ2)² ].

Theorem 9. The function defined by (1) and (2) with σ1 > 0, σ2 > 0, |ρ| <
1 is a joint pdf. The marginal pdfs of X and Y are respectively N(µ1, σ1²)
and N(µ2, σ2²), and ρ is the correlation coefficient between X and Y.

Proof. Let f1(x) = ∫_{−∞}^{∞} f(x, y) dy. Note that

(1 − ρ²) Q(x, y) = ((y − µ2)/σ2 − ρ (x − µ1)/σ1)² + (1 − ρ²)((x − µ1)/σ1)²
                 = { (y − [µ2 + ρ(σ2/σ1)(x − µ1)])/σ2 }² + (1 − ρ²)((x − µ1)/σ1)².

It follows that

f1(x) = (1/(σ1 √(2π))) exp[−(x − µ1)²/(2σ1²)] ∫_{−∞}^{∞} exp{−(y − βx)²/(2σ2²(1 − ρ²))} / (σ2 √(1 − ρ²) √(2π)) dy,

where we have written βx = µ2 + ρ(σ2/σ1)(x − µ1).
The integrand is the pdf of an N(βx, σ2²(1 − ρ²)) RV, so that

f1(x) = (1/(σ1 √(2π))) exp[−(1/2)((x − µ1)/σ1)²], −∞ < x < ∞.

Thus ∫_{−∞}^{∞} [∫_{−∞}^{∞} f(x, y) dy] dx = ∫_{−∞}^{∞} f1(x) dx = 1,
and f(x, y) is a joint pdf of two RVs of the continuous type. It also follows
that f1 is the marginal pdf of X, so that X is N(µ1, σ1²).
In a similar manner we can show that Y is N(µ2, σ2²).
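An empirical check of Theorem 9 (a sketch with arbitrary illustrative parameters): simulate the bivariate normal and verify the marginal of X and that ρ is indeed the correlation coefficient.

# Simulate a bivariate normal and check mean, standard deviation, correlation.
import numpy as np

mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.7
rng = np.random.default_rng(4)
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
xy = rng.multivariate_normal([mu1, mu2], cov, size=500_000)

print(xy[:, 0].mean(), xy[:, 0].std())        # ~ mu1, sigma1
print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])  # ~ rho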

6 Various Convergence and sampling distributions


Definition 44. Let {Fn} be a sequence of distribution functions. If there
exists a DF F such that Fn(x) → F(x) as n → ∞
at every point x at which F is continuous, we say that Fn converges in law
(or weakly) to F, and we write Fn →w F.
If {Xn} is a sequence of RVs and {Fn} is the corresponding sequence of DFs,
we say that Xn converges in distribution (or law) to X if there exists an RV X with DF
F such that Fn →w F. We write Xn →L X.
Remark. It is quite possible for a given sequence of DFs to converge to a
function that is not a DF.
Example 50. Consider the sequence of DFs
Fn(x) = 0 for x < n, and Fn(x) = 1 for x ≥ n;

here Fn (x) is the DF of the RV Xn degenerate at x = n. We see that Fn (x)


converges to a function F that is identically equal to 0 and hence is not a
DF.
Example 51. Let X1, X2, · · · , Xn be iid RVs with common density function
f(x) = 1/θ for 0 < x < θ (0 < θ < ∞), and f(x) = 0 otherwise.

Let X(n) = max(X1, X2, · · · , Xn). Then the density function of X(n) is
fn(x) = n x^{n−1}/θ^n for 0 < x < θ, and 0 otherwise,
and the DF of X(n) is
Fn(x) = 0 for x < 0,  Fn(x) = (x/θ)^n for 0 ≤ x < θ,  Fn(x) = 1 for x ≥ θ.

We see that as n → ∞,
Fn(x) → F(x), where F(x) = 0 for x < θ and F(x) = 1 for x ≥ θ,
which is a DF (the DF degenerate at θ). Hence Fn →w F.
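A simulation sketch of Example 51 with θ = 1: P(X(n) ≤ x0) = (x0/θ)^n, which collapses towards 0 for any fixed x0 < θ as n grows, in line with Fn →w F.

# Empirical DF of the maximum of n uniforms versus (x0/theta)^n.
import numpy as np

theta = 1.0
rng = np.random.default_rng(5)
x0 = 0.95 * theta
for n in (5, 50, 500):
    xmax = rng.uniform(0, theta, size=(10_000, n)).max(axis=1)
    print(n, (xmax <= x0).mean(), (x0 / theta) ** n)   # both -> 0 as n grows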

The following example shows that the convergence in distribution does


not imply convergence in moments.

Example 52. Let Fn be a sequence of DFs defined by
Fn(x) = 0 for x < 0,  Fn(x) = 1 − 1/n for 0 ≤ x < n,  Fn(x) = 1 for x ≥ n.

Clearly Fn →w F, where F is the distribution function given by
F(x) = 0 for x < 0, and F(x) = 1 for x ≥ 0.

Note that Fn is the DF of the RV Xn with pmf P(Xn = 0) = 1 − 1/n,
P(Xn = n) = 1/n, and F is the distribution function of the RV X degenerate at 0.
Here E(Xn) = n · (1/n) = 1 for every n, while E(X) = 0, so the moments do not converge.

Example 53. Let {Xn} be a sequence of RVs with pmf
fn(x) = P(Xn = x) = 1 if x = 2 + 1/n, and 0 otherwise.

Note that none of the fn's assigns any probability to the point x = 2. It
follows that fn(x) → f(x) as n → ∞,
where f(x) = 0 for all x. However, the sequence of DFs {Fn} of the RVs Xn
converges to the function
F(x) = 0 for x < 2, and F(x) = 1 for x ≥ 2,
at all continuity points of F. Since F is the DF of the RV degenerate at
x = 2, Fn →w F.

Theorem 10. Let {Xn} be a sequence of RVs such that Xn →L X, and let c
be a constant. Then

1. Xn + c →L X + c, and

2. cXn →L cX, c ≠ 0.

A slightly stronger concept of convergence is convergence in probability, defined next.

Definition 45. Let {Xn} be a sequence of RVs defined on some probability
space (Ω, A, P). We say that the sequence {Xn} converges in probability to
the RV X if, for every ϵ > 0, P{|Xn − X| > ϵ} → 0 as n → ∞. We write
Xn →P X.

Remark. We emphasize that the definition says nothing about the convergence of the RVs Xn to the RV X in the sense in which it is understood in real
analysis/calculus. Thus Xn →P X does not imply that, given ϵ > 0, we can
find an N such that |Xn − X| < ϵ for all n ≥ N. The definition speaks only
of the convergence of the sequence of probabilities P{|Xn − X| > ϵ} to 0.
Example 54. Let {Xn} be a sequence of RVs with pmf P{Xn = 1} = 1/n
and P{Xn = 0} = 1 − 1/n. Then
P{|Xn| > ϵ} = P{Xn = 1} = 1/n for 0 < ϵ < 1, and P{|Xn| > ϵ} = 0 for ϵ ≥ 1.

It follows that P{|Xn| > ϵ} → 0 as n → ∞, and we conclude that Xn →P 0.

Some Properties

1. Xn →P X ⟺ Xn − X →P 0, since
   P{|(Xn − X) − 0| > ϵ} → 0 as n → ∞ ⟺ P{|Xn − X| > ϵ} → 0 as n → ∞.

2. Xn →P X, Xn →P Y ⟹ P{X = Y} = 1, since
   P{|X − Y| > c} ≤ P{|Xn − X| > c/2} + P{|Xn − Y| > c/2}
   ⟹ P{|X − Y| > c} = 0 for all c > 0.

3. Xn →P X ⟹ Xn − Xm →P 0 as n, m → ∞, for
   P{|Xn − Xm| > ϵ} ≤ P{|Xn − X| > ϵ/2} + P{|Xm − X| > ϵ/2}.

4. Xn →P X, Yn →P Y ⟹ Xn ± Yn →P X ± Y.

5. Xn →P X ⟹ kXn →P kX, k a constant.

6. Xn →P k ⟹ Xn² →P k².

7. Xn →P a, Yn →P b, a, b constants ⟹ Xn Yn →P ab, since
   Xn Yn = [(Xn + Yn)² − (Xn − Yn)²]/4 →P [(a + b)² − (a − b)²]/4 = ab.

8. Xn →P 1 ⟹ Xn^{−1} →P 1, for
   P{|1/Xn − 1| ≥ ϵ} = P{1/Xn ≥ 1 + ϵ} + P{1/Xn ≤ 1 − ϵ}
                     = P{1/Xn ≥ 1 + ϵ} + P{1/Xn ≤ 0} + P{0 < 1/Xn ≤ 1 − ϵ},
   and, for example, P{1/Xn ≥ 1 + ϵ} = P{1/Xn − 1 ≥ ϵ} = P{(1 − Xn)/Xn ≥ ϵ} → 0 as n → ∞;
   the remaining terms tend to 0 in the same way.

9. Xn →P a, Yn →P b, a, b constants, b ≠ 0 ⟹ Xn Yn^{−1} →P a b^{−1}.
   (Yn →P b; by (5), Yn/b →P 1; by (8), b/Yn →P 1; by (7), Xn (b/Yn) →P a;
   hence Xn Yn^{−1} →P a b^{−1}, by (5).)

10. Xn →P X, and Y an RV ⟹ Xn Y →P XY.
    Note that Y is an RV, so that given δ > 0, there exists a k > 0 such
    that P{|Y| > k} < δ/2.
    Thus P{|Xn Y − XY| > ϵ}
    = P{|Xn − X||Y| > ϵ, |Y| > k} + P{|Xn − X||Y| > ϵ, |Y| ≤ k}
    < δ/2 + P{|Xn − X| > ϵ/k}
    ⟹ Xn Y →P XY.

11. Xn →P X, Yn →P Y ⟹ Xn Yn →P XY.
    (Note that (Xn − X)(Yn − Y) →P 0. The result now follows on
    multiplication:
    Xn Yn − Xn Y − X Yn + XY →P 0
    as n → ∞, and Xn Y →P XY, X Yn →P XY by (10).
    Hence Xn Yn − XY →P 0, and by (1), Xn Yn →P XY.)
Theorem 11. Let Xn →P X, and let g be a continuous function defined on R.
Then g(Xn) →P g(X) as n → ∞.

Now we explain the relationship between weak convergence and convergence in probability.

Theorem 12. Xn →P X ⟹ Xn →L X.

Theorem 13. Let k be a constant. Then Xn →L k ⟹ Xn →P k.

Corollary 13.1. Xn →L k ⟺ Xn →P k.

6.1 The Law of Large Numbers


It is commonly believed that if a fair coin is tossed many times and the
proportion of heads is calculated, that proportion will be close to 1/2.

Theorem 14. (Law of Large Numbers)
Let X1, X2, · · · , Xi, · · · be a sequence of independent random variables with
E(Xi) = µ and Var(Xi) = σ². Let X̄n = (1/n) Σ_{i=1}^{n} Xi. Then for any ϵ > 0,
P(|X̄n − µ| > ϵ) → 0 as n → ∞.

Proof. We first find E(X̄n) and Var(X̄n):

E(X̄n) = (1/n) Σ_{i=1}^{n} E(Xi) = µ.

Since the Xi's are independent,

Var(X̄n) = (1/n²) Σ_{i=1}^{n} Var(Xi) = σ²/n.

The result follows immediately from Chebyshev's inequality, which states
that
P(|X̄n − µ| > ϵ) ≤ Var(X̄n)/ϵ² = σ²/(nϵ²) → 0 as n → ∞.

Example 55. (Example of the weak law of large numbers)
Let X1, X2, · · · be iid RVs with common law b(1, p). Then E(Xi) = p,
Var(Xi) = p(1 − p), and we have X̄n →P p as n → ∞ by the previous
theorem.
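A simulation sketch of this weak law for b(1, p) variables (illustrative values p = 0.5, ϵ = 0.05): the probability that X̄n misses p by more than ϵ shrinks as n grows.

# Estimate P(|X_bar_n - p| > eps) for increasing n by repeated sampling.
import numpy as np

p, eps = 0.5, 0.05
rng = np.random.default_rng(6)
for n in (10, 100, 1_000, 10_000):
    xbar = rng.binomial(n, p, size=20_000) / n
    print(n, (np.abs(xbar - p) > eps).mean())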

Limiting Moment Generating Functions (MGF)

Let X1, X2, · · · be a sequence of RVs. Let Fn be the distribution function of Xn,
n = 1, 2, · · · , and suppose that the MGF Mn(t) of Fn exists. What happens
to Mn(t) as n → ∞? If it converges, does it always converge to an MGF?

Example 56. Let {Xn} be a sequence of RVs with pmf P{Xn = −n} = 1,
n = 1, 2, · · · . Then the MGF is Mn(t) = E(e^{tXn}) = e^{−tn}.
Mn(t) → 0 as n → ∞ for t > 0; Mn(t) = e^{−tn} = 1 for t = 0; and Mn(t) → ∞ for t < 0.
So, as n → ∞,
Mn(t) → M(t) = 0 for t > 0,  1 for t = 0,  ∞ for t < 0.

But M(t) is not an MGF. Note that if Fn is the DF of Xn, then
Fn(x) = 0 for x < −n, and Fn(x) = 1 for x ≥ −n,
so Fn(x) → F(x) = 1 for all x as n → ∞.
Note: F(x) = 1 for all x is not a DF, since F(−∞) = lim_{x→−∞} F(x) ≠ 0.
Question: Suppose that Xn has MGF Mn and Xn →L X, where X is
an RV with MGF M. Does Mn(t) → M(t) as n → ∞?
The answer to this question is in the negative.
Example 57. Consider the DFs
Fn(x) = 0 for x < −n,  Fn(x) = 1/2 + cn tan^{−1}(nx) for −n ≤ x < n,  Fn(x) = 1 for x ≥ n,
where cn = 1/(2 tan^{−1}(n²)).
If x < 0, Fn(x) → 0 as n → ∞; if x ≥ 0, Fn(x) → 1 as n → ∞. Thus

Fn(x) → F(x), where F(x) = 0 for x < 0 and F(x) = 1 for x ≥ 0.

F(x) is a DF, and Fn(x) → F(x) at all points of continuity of the DF F. The
MGF associated with Fn is
Mn(t) = ∫_{−n}^{n} cn e^{tx} n/(1 + n²x²) dx,
which exists for all t.
The MGF corresponding to F is M(t) = 1 for all t. But Mn(t) ̸→ M(t), since
Mn(t) → ∞ as n → ∞ if t ≠ 0.
Indeed, Mn(t) > ∫_0^n cn (|t|³x³/6) n/(1 + n²x²) dx → ∞ as n → ∞.

Remark. MGFs are often useful for establishing the convergence of distribution functions. The key fact is that a distribution function Fn is uniquely determined
by its MGF Mn. The following theorem (the continuity theorem) states
that this unique determination holds for limits as well.

Theorem 15. (Continuity Theorem)
Let Fn be a sequence of cumulative distribution functions with corresponding MGFs Mn, and let F be a CDF with MGF M. If Mn(t) → M(t) for all t in an open interval containing
zero, then Fn(x) → F(x) at all continuity points of F.
Example 58. Let Xn be an RV with pmf P(Xn = 1) = 1/n, P(Xn = 0) =
1 − 1/n for each n. Then Mn(t) = (1/n)e^t + (1 − 1/n) exists for all t ∈ R, and
Mn(t) → 1 as n → ∞ for all t ∈ R. Here M(t) = 1
is the MGF of an RV X degenerate at 0. Hence Xn →L X; also Fn →w F.

Theorem 16. (Central Limit Theorem)
Let X1, X2, · · · be iid RVs with mean µ and finite variance σ² > 0, and let Sn = Σ_{i=1}^{n} Xi. Then
lim_{n→∞} P((Sn − nµ)/(σ√n) ≤ x) = Φ(x) for every x, where Φ is the standard normal DF.

Example 59. An insurance company has 25,000 automobile policy holders.
If the yearly claim of a policy holder is a random variable with mean
320 and standard deviation 540, approximate the probability that the total
yearly claim exceeds 8.3 million.
Solution: Let X denote the total yearly claim. Number the policy holders,
and let Xi denote the yearly claim of policy holder i. With n = 25,000, we
have from the central limit theorem that X = Σ_{i=1}^{n} Xi will have approximately
a normal distribution with mean 320 × 25,000 = 8 × 10^6 and standard
deviation 540√25,000 ≈ 8.5381 × 10^4. Therefore,
P(X > 8.3 × 10^6)
= P((X − 8 × 10^6)/(8.5381 × 10^4) > (8.3 × 10^6 − 8 × 10^6)/(8.5381 × 10^4))
= P((X − 8 × 10^6)/(8.5381 × 10^4) > (0.3 × 10^6)/(8.5381 × 10^4))
≈ P(Z > 3.51), Z a standard normal
≈ 0.00023.
Thus, there are only 2.3 chances out of 10,000 that the total yearly claim
will exceed 8.3 million.
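The same computation in a few lines (a sketch assuming scipy):

# Normal approximation to the total claim: mean, standard deviation, tail.
import math
from scipy.stats import norm

mean = 320 * 25_000
sd = 540 * math.sqrt(25_000)
print(sd)                              # ~85,381
print(norm.sf((8.3e6 - mean) / sd))    # ~0.0002, i.e. z ~ 3.51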

Example 60. Civil engineers believe that W , the amount of weight (in
units of 1000 pounds) that a certain span of a bridge can withstand without
structural damage resulting, is normally distributed with mean 400 and stan-
dard deviation 40. Suppose that the weight (again, in units of 1000 pounds)
of a car is a random variable with mean 3 and standard deviation 0.3. How
many cars would have to be on the bridge span for the probability of struc-
tural damage to exceed 0.1?
Solution: Let Pn denote the probability of structural damage when there
are n cars on the bridge. That is,
Pn = P(X1 + · · · + Xn ≥ W) = P(X1 + · · · + Xn − W ≥ 0),
where Xi is the weight of the ith car, i = 1, 2, · · · , n.
Now it follows from the central limit theorem that Σ_{i=1}^{n} Xi is approximately
normal with mean 3n and variance 0.09n. Since W is independent of the
Xi, i = 1, 2, · · · , n, and is also normal, it follows that Σ_{i=1}^{n} Xi − W is approximately
normal with mean and variance given by
E(Σ_{i=1}^{n} Xi − W) = 3n − 400,
Var(Σ_{i=1}^{n} Xi − W) = Var(Σ_{i=1}^{n} Xi) + Var(W) = 0.09n + 1600.
Thus,
Pn = P((X1 + · · · + Xn − W − (3n − 400))/√(0.09n + 1600) ≥ −(3n − 400)/√(0.09n + 1600)) ≈ P(Z ≥ (400 − 3n)/√(0.09n + 1600)),
where Z is a standard normal random variable. Now P(Z ≥ 1.28) ≈ 0.1,
and so if the number of cars n is such that (400 − 3n)/√(0.09n + 1600) ≤ 1.28, i.e., n ≥ 117,
then there is at least 1 chance in 10 that structural damage will occur.

Problem 24. The ideal size of a first year class at a particular college is
150 students. The college, knowing from past experience that, on the
average, only 30 percent of those accepted for admission will actually attend,
uses a policy of approving the applications of 450 students. Compute the
probability that more than 150 first year students attend this college.

Solution: Let X denote the number of students that attend; then, assuming that each accepted applicant will independently attend, it follows
that X is a binomial random variable with parameters n = 450 and p = 0.3.
Since the binomial is a discrete and the normal a continuous distribution, it is
best to compute P(X = i) as P(i − 0.5 < X < i + 0.5) when applying the
normal approximation; this is called the continuity correction. This yields
the approximation
P(X > 150.5) = P((X − 450 × 0.3)/√(450 × 0.3 × 0.7) ≥ (150.5 − 450 × 0.3)/√(450 × 0.3 × 0.7)) ≈ P(Z > 1.59) = 0.06.
Hence, only 6 percent of the time do more than 150 of the first 450 accepted
actually attend.
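Comparing the exact binomial tail with the continuity-corrected normal approximation (a sketch assuming scipy):

# Exact P(X > 150) for X ~ b(450, 0.3) versus the normal approximation.
import math
from scipy.stats import binom, norm

n, p = 450, 0.3
exact = binom.sf(150, n, p)                          # P(X > 150) = P(X >= 151)
z = (150.5 - n * p) / math.sqrt(n * p * (1 - p))
print(exact, norm.sf(z))                             # both ~0.06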

Remark. One of the most important applications of the central limit theorem is in regard to binomial random variables. Since such a random variable
X with parameters (n, p) represents the number of successes in n independent trials when each trial is a success with probability p, we can express
it as
X = X1 + · · · + Xn,
where Xi = 1 if the ith trial is a success, and Xi = 0 otherwise.
Because E(Xi) = p and Var(Xi) = p(1 − p), it follows from the central limit
theorem that for n large, (X − np)/√(np(1 − p)) will approximately be a standard normal
variable.

It should be noted that we now have two possible approximations to
binomial probabilities:
the Poisson approximation, which yields a good approximation when n is
large and p is small, and the normal approximation, which can be shown to
be quite good when np(1 − p) is large.
The normal approximation will in general be quite good for values of n
satisfying np(1 − p) ≥ 10.
Approximate distribution of the sample mean. Let X1, · · · , Xn be iid with mean µ and
variance σ². By the central limit theorem we can approximate the distribution of the sample mean
X̄ = (X1 + · · · + Xn)/n.

Since a constant multiple of a normal random variable is also normal, it follows
from the central limit theorem that X̄ is approximately normal for n large, with
E(X̄) = µ and standard deviation of X̄ equal to σ/√n.
This implies that (X̄ − µ)/(σ/√n) has approximately a standard normal distribution.
χ², t and F-distributions
Consider a statistical experiment that culminates in outcomes x which are
the values of an RV X with CDF F. In practice F will not be known completely, i.e.,
one or more parameters associated with F will be unknown.
The job is to estimate these unknown parameters or to test the validity of
certain statements about them.
Equivalently, we seek information about some numerical characteristics of a collection of
elements, called a population.

Definition 46. Let X be an RV with CDF F, and let X1, · · · , Xn be iid RVs
with common CDF F. Then the collection X1, · · · , Xn is known as a random
sample of size n from the CDF F, or simply as n independent observations
on X.

If X1, · · · , Xn is a random sample from F, then their joint DF is ∏_{i=1}^{n} F(xi).

Definition 47. Let X1, · · · , Xn be n independent observations on an RV X,
and let f : R^n → R^m be a Borel measurable function (e.g., a continuous
function, or a function with finitely many discontinuities; such functions suffice for our purposes).
Then the RV f(X1, · · · , Xn) is called a (sample) statistic, provided that it is
not a function of any unknown parameter(s).

Example 61. (Definition)
Let X1, · · · , Xn be a random sample from a distribution F. Then the statistic
X̄ = Sn/n = Σ_{i=1}^{n} Xi/n
is called the sample mean, and the statistic
S² = Σ_{i=1}^{n} (Xi − X̄)²/(n − 1) = (Σ_{i=1}^{n} Xi² − nX̄²)/(n − 1)
is called the sample variance; S is called the sample standard deviation.

Non-example: Let X ∼ N(µ, σ²), where µ is known but σ² is unknown.
Let X1, · · · , Xn be a sample from N(µ, σ²). Then, according to our
definition, Σ_{i=1}^{n} Xi/σ² is not a statistic.

Remark. 1. “Sample” means “Random sample”

2. Note that the sample statistics X̄, S², etc. are random variables, while
the population parameters µ, σ², and so on are fixed constants that may
be unknown.

Chi-square distribution
Recall that the chi-square distribution is a special case of the gamma distribution: for an integer n > 0, a G(n/2, 2) RV is a χ²(n) RV.
If X has a chi-square distribution with n degrees of freedom, we write
X ∼ χ²(n). Its pdf is given by
f(x) = x^{n/2−1} e^{−x/2} / (2^{n/2} Γ(n/2)) for x > 0, and f(x) = 0 for x ≤ 0,
the MGF by M(t) = (1 − 2t)^{−n/2} for t < 1/2, the mean by E(X) = n, and the variance
by Var(X) = 2n.
Student’s t-statistic

Definition 48. Let X ∼ N(0, 1) and Y ∼ χ²(n), and let X and Y be
independent. Then the statistic T = X/√(Y/n) is said to have a t-distribution
with n degrees of freedom, and we write T ∼ t(n).

Theorem 17. The pdf of T defined above is given by
fn(t) = (Γ[(n + 1)/2]/(Γ(n/2)√(nπ))) (1 + t²/n)^{−(n+1)/2}, −∞ < t < ∞.
[Figure: the pdf fn(t) is symmetric in t.]
Also fn(t) → 0 as t → ±∞. For large n the t-distribution is close to the normal
distribution. Indeed, (1 + t²/n)^{−(n+1)/2} → e^{−t²/2} as n → ∞. Moreover, as
t → ∞ or t → −∞, the tails of fn(t) tend to 0 much more slowly than do the tails
of the N(0, 1) pdf. Thus for small n and large t0, P(|T| > t0) ≥ P(|Z| > t0),
Z ∼ N(0, 1). The value tn,α/2 is defined by P(|T| > tn,α/2) = α.
Positive values of tn,α are tabulated for selected values of n and α;
negative values may be obtained from the symmetry relation tn,1−α = −tn,α.

Theorem 18. Let X ∼ t(n), n > 1. Then E(X^r) exists for r < n. In
particular, if r < n is odd, E(X^r) = 0, and if r < n is even,
E(X^r) = n^{r/2} Γ[(r + 1)/2] Γ[(n − r)/2] / (Γ(1/2) Γ(n/2)).

Corollary 18.1. If n > 2, E(X) = 0 and E(X²) = Var(X) = n/(n − 2).
(Gosset proposed the t-distribution; his pen name was Student.)

Definition 49. Let X and Y be independent χ² RVs with m and n degrees
of freedom respectively. Then the RV F = (X/m)/(Y/n) is said to have an F-distribution
with (m, n) degrees of freedom, and we write F ∼ F(m, n).

Theorem 19. The pdf of the F-statistic defined above is given by
g(x) = (Γ[(m + n)/2]/(Γ(m/2)Γ(n/2))) (m/n) (mx/n)^{(m/2)−1} (1 + mx/n)^{−(m+n)/2} for x > 0, and g(x) = 0 for x ≤ 0.

Remark. 1. If X ∼ F(m, n), then 1/X ∼ F(n, m) (see the definition).

2. We write Fm,n,α for the upper α percent point of the F(m, n) distribution, that is, P(F(m, n) > Fm,n,α) = α.
From (1) we have the following relation: Fm,n,1−α = 1/Fn,m,α.

Theorem 20. Let X ∼ F(m, n). Then for k > 0,
E(X^k) = (n/m)^k Γ[k + m/2] Γ[n/2 − k] / (Γ(m/2) Γ(n/2)) for n > 2k.
In particular, E(X) = n/(n − 2) for n > 2, and Var(X) = n²(2m + 2n − 4)/(m(n − 2)²(n − 4)) for n > 4.

Problem 25. Suppose that we are attempting to locate a target in 3D
and that the three coordinate errors (in meters) of the point chosen are
independent normal random variables with mean 0 and standard deviation
2.
Find the probability that the distance between the point chosen and the
target exceeds 3 meters.
Solution: If D is the distance, then D² = X1² + X2² + X3², where Xi is the
error in the ith coordinate. Since Zi = Xi/2, i = 1, 2, 3, are all standard normal
variables, it follows that

P(D² > 9) = P(Z1² + Z2² + Z3² > 9/4)
          = P(χ²_3 > 9/4)
          = 0.5222.
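This value can be confirmed directly (a sketch assuming scipy):

# Upper tail of a chi-square with 3 degrees of freedom at 9/4.
from scipy.stats import chi2

print(chi2.sf(9 / 4, df=3))   # ~0.5222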

7 Estimation of parameters
To be typed

8 Testing of hypotheses
To be typed

References
[1] John A. Rice, Mathematical Statistics and Data Analysis.

[2] R.V. Hogg, E.A. Tanis, D.L. Zimmerman, Probability and Statistical
Inference.

[3] V.K. Rohatgi, A.K. Md. Ehsanes Saleh, An Introduction to Probability
and Statistics. Second edition. Wiley Series in Probability and Statistics: Texts and References Section. Wiley-Interscience, New York, 2001.
xviii+716 pp.

