
1 Introduction

There are two approaches to probability:

• Axiomatic approach

• Classical approach

2 Random Experiment
It is an experiment in which

• The set of all possible outcomes of experiment is known in advance.

• Any specific outcome is not known in advance.

• Experiment can be repeated under identical conditions.

3 Sample Space
It is a pair (S, Ω) where

• S is set of all possible outcomes of the experiment.

• Ω is a σ - field of subsets of S.

• Example:-
If we toss a coin, then

S = {H, T }
and

Ω = {{H}, {T}, {H, T}, ϕ}

• Events:- Subsets of Sample space are called events.

4 Operations on sets
For A, B ⊆ S

Event     Set
A or B    A ∪ B
A and B   A ∩ B
not A     A^C
5 De Morgan's Laws
1. (A ∪ B)^C = A^C ∩ B^C

2. (A ∩ B)^C = A^C ∪ B^C

6 Sigma Field
Let S be a set and let Ω be a collection of subsets of S satisfying the following conditions:-

1. ϕ ∈ Ω.

2. If A ∈ Ω, then A^C ∈ Ω.

3. If A, B ∈ Ω, then A ∪ B ∈ Ω.

Example:-

S = {1, 2, 3, 4, 5, 6}
If A = {1, 2, 3}
B = {4, 5}
then

Ω = {ϕ, S, A, A^C, B, B^C, A ∪ B, A^C ∩ B^C}


7 Probability
Let S be a sample space and Ω a sigma field of subsets of S. Then probability is a function

P : Ω → [0, 1].

satisfying the following conditions:-

1. 0 ⩽ P (A) ⩽ 1; ∀A ∈ Ω

2. P (S) = 1

3. If A1, A2, . . . , An are such that Ai ∩ Aj = ϕ for all i ≠ j, then

   P(A1 ∪ A2 ∪ · · · ∪ An) = Σ_{i=1}^n P(Ai)
7.1 Properties
1. P(A^C) = 1 − P(A)
2. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
3. P (A∪B ∪C) = P (A)+P (B)+P (C)−P (A∩B)−P (B ∩C)−P (C ∩A)+P (A∩B ∩C)
Remark:- The triple (S, Ω, P ) is called probability space.

8 Conditional Probability
Let (S, Ω, P) be a probability space and let H ∈ Ω with P(H) > 0. Then for an arbitrary A ∈ Ω, the quantity

P(A/H) = P(A ∩ H) / P(H)
is called conditional probability of A given H.

9 Bayes Theorem
Let S be a sample space and B1 , B2 , . . . , Bn be a partition of S, then for any A ⊆ S

P(Bi/A) = P(Bi) · P(A/Bi) / Σ_{j=1}^n P(Bj) · P(A/Bj)
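The formula above can be computed directly once the priors P(Bi) and likelihoods P(A/Bi) are known. A minimal sketch, using hypothetical numbers (three machines producing 50%, 30%, 20% of items with defect rates 1%, 2%, 3% — none of these figures are from the notes):

```python
# Bayes' theorem for a finite partition B1..Bn:
# P(Bi | A) = P(Bi) * P(A | Bi) / sum_j P(Bj) * P(A | Bj)
def posterior(priors, likelihoods):
    """priors[i] = P(Bi), likelihoods[i] = P(A|Bi); returns the list of P(Bi|A)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(A), by the law of total probability
    return [j / total for j in joint]

# Hypothetical example: which machine produced a defective item?
post = posterior([0.5, 0.3, 0.2], [0.01, 0.02, 0.03])
```

The posteriors always sum to 1, since the denominator is exactly P(A).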

10 Independent Events
Two events, A and B are said to be independent if
P (A ∩ B) = P (A) · P (B)
Remarks:-
1. If A and B are independent, then
• AC and B are independent
• A and B C are independent
• AC and B C are independent.
2. A, B, and C are (mutually) independent if
• P(A ∩ B ∩ C) = P(A) · P(B) · P(C)
• P(A ∩ B) = P(A) · P(B)
• P(B ∩ C) = P(B) · P(C)
• P(A ∩ C) = P(A) · P(C)
The pairwise conditions (2), (3), (4) do NOT imply (1).

11 Random Variable
Definition:- A random variable X is a function that maps elements of the sample space to real numbers.

Example:- In tossing of coin twice: If X : no. of heads, then X can have values 0, 1, 2.
X Events
0 TT
1 HT, T H
2 HH
Here {0, 1, 2} is called Support of Random variable X.
Let X be a random variable having support {x1 , x2 , . . . , xn } with

P (X = xi ) = pi
then the probability distribution of X is represented as
X:     x1   x2   · · ·   xn
p(x):  p1   p2   · · ·   pn

provided Σ_{i=1}^n pi = 1 and pi ⩾ 0.
The pi's form the Probability Mass Function (PMF).
Example:- Getting first head in repeated tossing of a coin.

S = {H, T H, T T H, T T T H, . . .}
X = Number of tosses to get first head.

x:     1     2     3     4     ...
P(x):  1/2   1/4   1/8   1/16  ...

Σ_{i=1}^∞ pi = 1/2 + 1/4 + 1/8 + · · · = (1/2) / (1 − 1/2) = 1

12 Types of Random Variables


Random variables are of two types:-

• Discrete : If S is mapped onto a finite or countably infinite set. For example:- tossing a coin, throwing a die.

• Continuous : If S is mapped onto an uncountable set. For example:- velocity, length, height, etc.
13 Cumulative Distribution function: (CDF)
A CDF FX of random variable X is defined as:-
FX (x) = P (X ⩽ x); ∀x ∈ R
For example if

x 0 1 2
P (x) 1/4 1/2 1/4
then

FX(−3) = P(X ⩽ −3) = 0
FX(0.5) = P(X ⩽ 0.5) = P(X = 0) = 1/4
FX(1) = P(X ⩽ 1) = P(X = 0) + P(X = 1) = 3/4
FX(2) = P(X ⩽ 2) = 1

13.1 Properties
• FX (−∞) = P (X ≤ −∞) = 0
• FX (∞) = P (X ≤ ∞) = 1
• FX is right continuous:

  FX(a) = FX(a+)

• It is a monotonically increasing function, i.e. for x1 ⩽ x2,

  FX(x1) ⩽ FX(x2)

• Note that we can write

  P(X = a) = FX(a) − FX(a−)
Example:-


FX(x) = 0    ; x < 0
      = 1/4  ; 0 ⩽ x ⩽ 1
      = 3/4  ; 1 < x < 2
      = 1    ; x ⩾ 2

It is NOT a CDF, as

FX(1) ≠ FX(1+)

because it is not right continuous.


Remark:- The graph of FX(x) has jumps at the points x = xi, and the sizes of these jumps are the probabilities P(X = xi); from them we can recover the distribution.

13.2 Continuous Random Variable
If the CDF FX(x) of a random variable X is continuous, then X is called a continuous random variable.

• For a continuous random variable, P(X = a) = 0 for every a. For example:-

  FX(x) = 0 ; x < 0
        = x ; 0 ⩽ x < 1
        = 1 ; x ⩾ 1

• In such a case

  P(a < X < b) = P(a ⩽ X < b) = P(a < X ⩽ b) = P(a ⩽ X ⩽ b)

14 Probability density function


Since for a continuous variable we don't have a pi for each X = xi, we instead use a non-negative function fX(x), called the probability density function of the random variable X, such that

FX(x) = ∫_{−∞}^{x} fX(t) dt

Note that

∫_{−∞}^{∞} fX(t) dt = 1

For example:- fX(x) = 1 for 0 ⩽ x < 1 and 0 otherwise.

Remark:- For a continuous random variable,

d/dx FX(x) = FX′(x) = fX(x)

15 Moments
15.1 Expectation of X (1st Moment,Mean/Average)
E(X) = Σ_{i=1}^n xi pi

Ex:- X = outcome of rolling a die
X:  1    2    3    4    5    6
P:  1/6  1/6  1/6  1/6  1/6  1/6

E(X) = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + · · · + 6 · 1/6 = 3.5
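The die-roll expectation above is just a weighted sum, which can be sketched exactly with rational arithmetic:

```python
from fractions import Fraction

# E(X) = sum_i x_i * p_i for a fair die
support = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
mean = sum(x * p for x, p in zip(support, probs))
# mean == Fraction(7, 2), i.e. 3.5
```

Using `Fraction` avoids any floating-point rounding in the probabilities.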

• For a countably infinite support:

E(X) = Σ_{i=1}^∞ xi pi

exists if E(|X|) < ∞.

15.2 Expectation of a function of X


Let g(X) be a function of X, where X has probability distribution

X:  x1   x2   · · ·   xn
p:  p1   p2   · · ·   pn

Then

E[g(X)] = Σ_{i=1}^n g(xi) pi.

Ex:
X:  −1   0    1
p:  1/3  1/3  1/3

E(X) = (−1) · 1/3 + 0 · 1/3 + 1 · 1/3 = 0

Now find E(X²). Let Y = X²:
Y:  0    1
p:  1/3  2/3

E(Y) = 0 · 1/3 + 1 · 2/3 = 2/3

On the other hand we can find it as

E(X²) = Σ_{i=1}^n xi² pi = (−1)² · 1/3 + 0² · 1/3 + 1² · 1/3 = 2/3


Case 1: If g(X) = X^r, then

E(X^r) = Σ_{i=1}^n xi^r pi

This is called the rth moment about the origin, and

E[(X − a)^r] = Σ_{i=1}^n (xi − a)^r pi

is called the rth moment about the point a.
Case 2: Second moment about the mean:

E[(X − E(X))²] = Σ_{i=1}^n [xi − E(X)]² pi

Let E(X) = µ; then

E[(X − µ)²] = Σ_{i=1}^n (xi − µ)² pi

The quantity E[(X − E(X))²] is called the variance of X (a measure of uncertainty) and is denoted Var(X).

Var(X) = E[(X − µ)²] = Σ_{i=1}^n (xi − µ)² pi
       = Σ xi² pi − 2µ Σ xi pi + µ² Σ pi
       = E(X²) − [E(X)]²

Var(X) ≥ 0, E(X − µ) = 0, and E(|X − µ|) is the mean deviation.
It is usually better to work with x² instead of |x|: the square is smooth, has more convenient properties, and is easier to handle.

Var(X) = σ² = E[(X − µ)²] = E(X²) − (E(X))²

σ is known as the standard deviation.


Ex: Consider
X:  −1   0    1    2
p:  1/6  1/3  1/3  1/6

E(X) = −1/6 + 0 + 1/3 + 2/6 = 1/2

σ² = E(X²) − [E(X)]² = (1/6 + 1/3 + 4/6) − (1/2)² = 7/6 − 1/4 = 11/12
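The mean and variance in the example above can be verified exactly:

```python
from fractions import Fraction

# The distribution from the example: X in {-1, 0, 1, 2}
xs = [-1, 0, 1, 2]
ps = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 3), Fraction(1, 6)]

mu = sum(x * p for x, p in zip(xs, ps))                  # E(X) = 1/2
var = sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2   # E(X^2) - [E(X)]^2 = 11/12
```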

The Moment Generating Function (MGF) of X is defined as

MX(t) = E(e^{Xt}) = Σ_i e^{xi t} · pi

Ex:
X:  −1   0    1    2
P:  1/6  1/3  1/3  1/6

MX(t) = e^{−t} · 1/6 + e^0 · 1/3 + e^t · 1/3 + e^{2t} · 1/6
• E(aX + b) = aE(X) + b

• E(a) = a

• Var(aX + b) = a² Var(X)

• MX(t) = E(e^{Xt}) = E(1 + Xt + X²t²/2! + . . .) = 1 + tE(X) + (t²/2!)E(X²) + . . .

• d/dt MX(t) = E(X) + t E(X²) + . . .

• E(X) = d/dt MX(t) |_{t=0}

• E(X²) = d²/dt² MX(t) |_{t=0}

• E(X^r) = d^r/dt^r MX(t) |_{t=0}

The MGF of a probability distribution is uniquely defined.
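The fact that E(X) = MX′(0) can be checked numerically. A sketch for the four-point distribution used above, approximating the derivative at 0 by a central difference:

```python
import math

# M_X(t) for X: -1, 0, 1, 2 with p: 1/6, 1/3, 1/3, 1/6
def M(t):
    return math.exp(-t) / 6 + 1 / 3 + math.exp(t) / 3 + math.exp(2 * t) / 6

# E(X) = M'(0), approximated by a central difference
h = 1e-6
mean_approx = (M(h) - M(-h)) / (2 * h)   # should be close to E(X) = 1/2
```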

If the CDF of X is continuous, then we have

FX(a) = FX(a+) = FX(a−)

P(X = a) = 0.

If X is continuous, then

P(a < X < b) = P(a ⩽ X < b) = P(a < X ⩽ b) = P(a ⩽ X ⩽ b)

16 Probability Density Function
Let X be a continuous random variable with CDF FX. Then fX is called the pdf of X if

FX(x) = ∫_{−∞}^{x} fX(t) dt

In the case of a discrete random variable, we call it the probability mass function.

We have FX(∞) = 1, that is,

∫_{−∞}^{∞} fX(x) dx = 1

Here FX(x) is the probability distribution function of X.

A point probability for a continuous random variable is 0:

P(X = a) = FX(a) − FX(a−) = ∫_{−∞}^{a} fX(x) dx − ∫_{−∞}^{a−} fX(x) dx = 0

Both integrals are the same, so the point probability is zero.

P(a ≤ X ≤ b) = P(a < X < b) = FX(b) − FX(a)
             = ∫_{−∞}^{b} fX(x) dx − ∫_{−∞}^{a} fX(x) dx
             = ∫_{a}^{b} fX(x) dx

where X represents the random variable and x represents an observed value.

Characteristics of fX :-
1) fX (the pdf) should be non-negative.
2) ∫_{−∞}^{∞} fX(x) dx = 1, since FX(∞) = 1.

Question: f(x) = 3x² for 0 < x < 1 and 0 otherwise. Is it a valid density?

Solution:- Clearly 3x² is non-negative, and

∫_{−∞}^{∞} fX(x) dx = ∫_0^1 3x² dx = [x³]_0^1 = 1

So f(x) is a valid density function.

Find P(1/2 < X < 3/4):

P(1/2 < X < 3/4) = ∫_{1/2}^{3/4} 3x² dx = [x³]_{1/2}^{3/4} = 27/64 − 8/64 = 19/64

Also,

FX(a) = ∫_{−∞}^{a} fX(x) dx

is continuous, which holds because

∫_{−∞}^{a+} fX(x) dx = ∫_{−∞}^{a} fX(x) dx = ∫_{−∞}^{a−} fX(x) dx

Case:- When X is a discrete random variable:

E(X) = Σ_{i=1}^n pi xi
E(g(X)) = Σ_{i=1}^n g(xi) pi

Case:- When X is a continuous random variable:

E(X) = ∫_{−∞}^{∞} x fX(x) dx
E(g(X)) = ∫_{−∞}^{∞} g(x) fX(x) dx

provided ∫_{−∞}^{∞} |g(x)| fX(x) dx < ∞.
Question: fX(x) = k for 0 < x < 1 and 0 otherwise. Find k.

∫_0^1 k dx = 1 ⇒ k[x]_0^1 = 1 ⇒ k = 1

So with k = 1, fX(x) is a valid (normalised) density.

Question: fX(x) = kx for 0 < x < 2 and 0 otherwise.
Find (i) P(−1 < X < 1), (ii) E(X), (iii) Var(X).
Solution:
∫_{−∞}^{∞} kx dx = 1
⇒ ∫_0^2 kx dx = 1
⇒ (k/2)[x²]_0^2 = 1 ⇒ k(4 − 0)/2 = 1
⇒ k = 1/2

So we have fX(x) = x/2 for 0 < x < 2 and 0 otherwise.

(i) P(−1 < X < 1) = ∫_0^1 (x/2) dx = 1/4.

(ii) E(X) = ∫_{−∞}^{∞} x fX(x) dx = ∫_0^2 x · (x/2) dx = ∫_0^2 (x²/2) dx = (1/6)[x³]_0^2 = 4/3

(iii) Var(X) = E(X²) − [E(X)]²
            = ∫_0^2 x² · (x/2) dx − 16/9
            = (1/8)[x⁴]_0^2 − 16/9
            = 2 − 16/9 = 2/9
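The moments of fX(x) = x/2 on (0, 2) can be verified numerically with the same kind of midpoint integrator:

```python
# Midpoint-rule approximation of the integral of f over [a, b]
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

pdf = lambda x: x / 2                               # density on (0, 2)
mean = integrate(lambda x: x * pdf(x), 0, 2)        # E(X) = 4/3
second = integrate(lambda x: x * x * pdf(x), 0, 2)  # E(X^2) = 2
var = second - mean ** 2                            # Var(X) = 2/9
```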
Characteristic function:

ϕX(t) = E(e^{iXt}) = ∫_{−∞}^{∞} e^{ixt} fX(x) dx

|ϕX(t)| ≤ ∫_{−∞}^{∞} |e^{ixt}| fX(x) dx = ∫_{−∞}^{∞} fX(x) dx = 1

So we have |ϕX(t)| ≤ 1.


If MX(t) exists, then

ϕX(t) = MX(it)

Inverse Fourier transformation:

fX(x) = (1/2π) ∫_{−∞}^{∞} e^{−ixt} ϕX(t) dt

ϕX(t) is uniquely defined for a probability distribution:

ϕX(t) ←→ fX(x)

MX(it) = ϕX(t) = E(e^{iXt}) = 1 + (it)E(X) + ((it)²/2!)E(X²) + · · ·

E(X) = (1/i) d/dt ϕX(t) |_{t=0}

• MX(t) = E(e^{Xt})

• MX(0) = 1

• M_{aX+b}(t) = E(e^{(aX+b)t}) = e^{bt} E(e^{aXt}) = e^{bt} MX(at)
17 Functions of Random Variables:
Let X have probability distribution

X:   −1   0    1    2
pX:  1/3  1/6  1/6  1/3

Let Y = g(X) → can we obtain the probability distribution of Y?

Let Y = e^X:
Y:  e^{−1}  1    e    e²
p:  1/3     1/6  1/6  1/3

Let Y = X²:
Y:  0    1    4
p:  1/6  1/2  1/3
Let X be a continuous random variable with pdf fX(x). What is the pdf of Y = g(X)?

Let fX(x) = 1 for 0 < x < 1 and 0 otherwise, and let Y = e^X. Is the answer simply

fY(y) = 1 for 0 < y < e and 0 otherwise?

No: this fY does not integrate to 1, so it is not a pdf. Instead we work through the CDF:

FX(x) = ∫_{−∞}^{x} fX(t) dt

By the Leibniz rule,

d/dx ∫_{g1(x)}^{g2(x)} fX(t) dt = fX(g2(x)) g2′(x) − fX(g1(x)) g1′(x)

and in particular d/dx FX(x) = fX(x).

So we first find FY(y) and from it obtain fY(y).
FY(y) = P(Y ≤ y)
      = P(g(X) ≤ y)
      = P(X ≤ g^{−1}(y))      (for increasing g)
      = FX(g^{−1}(y))

fY(y) = d/dy FY(y)

∴ fY(y) = fX(g^{−1}(y)) · d/dy g^{−1}(y)
Let Y = e^X; find fY(y).

Solution:
FY(y) = P(Y ≤ y)
      = P(e^X ≤ y)   (we can invert here because e^x is monotonically increasing)
      = P(X ≤ ln y)
      = FX(ln y)

fY(y) = fX(ln y) · d/dy (ln y) = 1/y ;  1 < y < e.

By the formula:

fY(y) = fX(g^{−1}(y)) · d/dy g^{−1}(y) = fX(ln y) · (1/y) = 1/y ;  1 < y < e,

as fX(ln y) = 1.
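The derived law for Y = e^X can be checked by simulation: the density 1/y on (1, e) integrates to F_Y(y) = ln y, so the empirical CDF of simulated values should match ln y. A sketch:

```python
import math
import random

random.seed(0)
# X ~ U(0,1), Y = e^X; the derived CDF is F_Y(y) = ln y on (1, e)
n = 200_000
ys = [math.exp(random.random()) for _ in range(n)]

y0 = 2.0
empirical = sum(y <= y0 for y in ys) / n   # should approach ln(2) ≈ 0.693
```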
Theorem. Let X be a continuous random variable with pdf fX(x) and let Y = g(X), where g is monotonic and differentiable. Then the pdf of Y is given by

fY(y) = fX(g^{−1}(y)) |d/dy g^{−1}(y)| ,  y ∈ S(Y).

Exercise. Let X be a random variable with pdf

fX(x) = 1 ; 0 < x < 1, and 0 otherwise,

and

gX(x) = 2 ; −1 < x < 1, and 0 otherwise.

Find the density of Y = X².

Bernoulli Distribution: A Bernoulli trial is an experiment with two possible outcomes: success (1) or failure (0), with probabilities p and 1 − p respectively. Mathematically, we denote it as P(X = 1) = p and P(X = 0) = 1 − p with 0 < p < 1. Then X ∼ Bernoulli(p).

Bernoulli Trials: A sequence of n trials is said to be a sequence of Bernoulli trials if it satisfies the following conditions:
1. Each trial results in either success or failure.
2. The probability of success/failure remains constant across trials.
3. The trials are independent.
Binomial Distribution: A random variable X is said to have a binomial distribution, denoted by X ∼ Bin(n, p), if its probability mass function is given by

P(X = k) = C(n, k) p^k (1 − p)^{n−k} ,  k = 0, 1, 2, · · · , n,

where n is the number of trials, p is the success probability, and k is the number of successes.

Note:
1. i.i.d. → independent and identically distributed.
2. E[X] = Σ_{x=0}^n x P(X = x) = Σ_{x=0}^n x C(n, x) p^x (1 − p)^{n−x} = np.
3. V[X] = np(1 − p).
4. Let X ∼ Bin(n, p) and Y ∼ Bin(m, p) be independent; then X + Y ∼ Bin(n + m, p).
5. Let X ∼ Bin(n, p); then n − X ∼ Bin(n, 1 − p).
Poisson Distribution: It counts the number of successes in n trials, where n is large and the success probability is very small. A random variable X is said to have a Poisson distribution, denoted by X ∼ Poisson(λ), if its probability mass function is given by

P(X = k) = e^{−λ} λ^k / k! ,  k = 0, 1, 2, · · · .

Theorem. Let X ∼ Bin(n, p) with λ = np held constant; then X → Poisson(λ) as n → ∞.

Proof.

P(X = k) = C(n, k) p^k (1 − p)^{n−k}
         = [n! / (k!(n − k)!)] (λ/n)^k (1 − λ/n)^{n−k}
         = (λ^k / k!) · (1 − λ/n)^n · [n! / (n^k (n − k)!)] · (1 − λ/n)^{−k}

Now, as n → ∞, (1 − λ/n)^n → e^{−λ} while the last two factors tend to 1, so we arrive at

P(X = k) = e^{−λ} λ^k / k! ,  k = 0, 1, 2, ...
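The binomial-to-Poisson limit above is easy to check numerically: for a fixed λ = np and large n, the two pmfs become nearly indistinguishable. A sketch:

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# With lambda = n*p held fixed, Bin(n, p) approaches Poisson(lambda) as n grows
lam, n = 3.0, 100_000
diff = max(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k)) for k in range(20))
```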

Note:
1. E[X^k] = M_X^{(k)}(t) |_{t=0}, where MX(t) = E[e^{Xt}].

2. The Moment Generating Function (MGF) of a Poisson random variable is given as follows:

MX(t) = Σ_{x=0}^∞ e^{xt} P(X = x)
      = Σ_{x=0}^∞ e^{xt} e^{−λ} λ^x / x!
      = e^{−λ} Σ_{x=0}^∞ (e^t λ)^x / x!
      = e^{λ(e^t − 1)}.

3. MX′(t) = λ e^t e^{λ(e^t − 1)}. Thus MX′(0) = E[X] = λ.

4. MX′′(0) = E[X²] = λ² + λ.

5. Var[X] = E[X²] − (E[X])² = λ.

Theorem. Let X ∼ Poisson(λ) and Y ∼ Poisson(µ) be independent; then X + Y ∼ Poisson(λ + µ).

Geometric Distribution:

Definition 1: Let the random variable X count the number of failures before the first success in a sequence of Bernoulli trials with success probability p. Then X is said to have a geometric distribution, denoted X ∼ Geo(p), if its probability mass function is given by

P(X = k) = (1 − p)^k p ,  k = 0, 1, 2, · · · .

Definition 2: Let the random variable X count the number of trials needed to get the first success in a sequence of Bernoulli trials with success probability p. Then X ∼ Geo(p) if its probability mass function is given by

P(X = k) = (1 − p)^{k−1} p ,  k = 1, 2, 3, · · · .
Negative Binomial Distribution:

Definition 1: Let the random variable X count the number of failures before the rth success in a sequence of Bernoulli trials with success probability p. Then X is said to have a negative binomial distribution, denoted X ∼ NB(r, p), if its probability mass function is given by

P(X = k) = C(k + r − 1, r − 1) p^r (1 − p)^k ,  k = 0, 1, 2, · · · .

Definition 2: Let the random variable X count the number of trials needed to get the rth success in a sequence of Bernoulli trials with success probability p. Then X ∼ NB(r, p) if its probability mass function is given by

P(X = k) = C(k − 1, r − 1) p^r (1 − p)^{k−r} ,  k = r, r + 1, · · · .

Theorem. Let X1, X2, · · · , Xr be i.i.d. random variables with each Xi ∼ Geo(p). Define X = X1 + X2 + · · · + Xr; then X ∼ NB(r, p).
Coupon collector problem: Each box of a brand of cereal contains a coupon, and there are n different types of coupons. Assume that each time you collect a coupon it is equally likely to be any of the n types. What is the expected number of coupons needed until you have a complete set?

Solution: Let N be the total number of coupons needed to get all n types. Write N = N1 + N2 + · · · + Nn, where Ni is the number of coupons needed to get the ith new type. Then Ni − 1 ∼ Geo((n − (i − 1))/n), so E[Ni] = n/(n − i + 1), and

E[N] = E[N1] + E[N2] + · · · + E[Nn] = 1 + n/(n − 1) + n/(n − 2) + · · · + n/1.
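The coupon-collector formula can be checked by simulation; for n = 6 (a die's worth of coupon types) the exact answer is 6(1 + 1/2 + · · · + 1/6) = 14.7:

```python
import random
from fractions import Fraction

def expected_coupons(n):
    # E[N] = n * (1/1 + 1/2 + ... + 1/n), the formula derived above
    return n * sum(Fraction(1, i) for i in range(1, n + 1))

def simulate(n, trials=20_000, seed=1):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n:
            seen.add(rng.randrange(n))
            draws += 1
        total += draws
    return total / trials

n = 6
exact = float(expected_coupons(n))   # 14.7
approx = simulate(n)                 # close to 14.7
```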
Uniform Distribution: A random variable X is said to have a uniform distribution over an interval (a, b) if its probability density function is given by

fX(x) = 1/(b − a) ,  a < x < b.

Theorem. Let X be a random variable with continuous cumulative distribution function (CDF) F(x); then Y = F(X) ∼ U(0, 1).
Gamma Distribution: Let X be a random variable having density

fX(x) = (1 / (Γ(α) β^α)) x^{α−1} e^{−x/β} ,  x > 0.

Then X is said to follow a Gamma distribution with positive parameters α and β, denoted X ∼ G(α, β).

Note:

1. Γ(n) = ∫_0^∞ x^{n−1} e^{−x} dx, for n > 0.

2. Γ(1/2) = √π.

3. Γ(n) = (n − 1)Γ(n − 1), n > 1.

4. Γ(n) = (n − 1)! for positive integer n.

5. Γ(n)/a^n = ∫_0^∞ x^{n−1} e^{−ax} dx.

E(X) = ∫_0^∞ x · (1/(Γ(α) β^α)) x^{α−1} e^{−x/β} dx
     = (1/(Γ(α) β^α)) Γ(α + 1) β^{α+1}
     = αβ

Var(X) = αβ²
Exponential Distribution: If we put α = 1 in the Gamma density, we get fX(x) = (1/β) e^{−x/β}, x > 0, which is the exponential distribution.

X is said to be an exponential random variable if its pdf is given by

fX(x) = (1/β) e^{−x/β}, x > 0    OR    fX(x) = λ e^{−λx}, x > 0

FX(x) = 1 − e^{−x/β}, x > 0      OR    FX(x) = 1 − e^{−λx}, x > 0

E(X) = β, Var(X) = β²            OR    E(X) = 1/λ, Var(X) = 1/λ²

Memoryless Property:

P(X > m + n | X > m) = P(X > n)
Proof.

P(X > m + n | X > m) = P(X > m + n, X > m) / P(X > m)
                     = P(X > m + n) / P(X > m)
                     = (1 − FX(m + n)) / (1 − FX(m))
                     = (1 − (1 − e^{−(m+n)/β})) / (1 − (1 − e^{−m/β}))
                     = e^{−n/β}
                     = 1 − FX(n)
                     = P(X > n)

Chi-Square Distribution: In X ∼ G(α, β), if we put α = n/2 and β = 2, then we get

fX(x) = (1 / (Γ(n/2) 2^{n/2})) x^{n/2 − 1} e^{−x/2} ,  x > 0.

X ∼ G(n/2, 2) is called a chi-square random variable, written X ∼ χ²(n), with

E(X) = n,  Var(X) = 2n.

Normal/Gaussian Distribution: A random variable X is said to follow a normal distribution if its pdf is given by

fX(x) = (1 / (√(2π) σ)) e^{−(1/2)((x−µ)/σ)²} ,  x ∈ R.

We write X ∼ N(µ, σ²), where

E(X) = µ and Var(X) = σ².

The moment generating function of the normal distribution is

MX(t) = e^{µt + σ²t²/2}
Proof.

MX(t) = E(e^{Xt}) = (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{xt} e^{−(1/2)((x−µ)/σ)²} dx
      = (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(1/(2σ²))(x² − 2xµ + µ² − 2σ²xt)} dx
      = (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(1/(2σ²))[x² − 2x(µ + σ²t) + (µ + σ²t)² + µ² − (µ + σ²t)²]} dx
      = e^{−(1/(2σ²))[µ² − (µ + σ²t)²]} · (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(1/(2σ²))[x − (µ + σ²t)]²} dx
      = e^{µt + σ²t²/2}

since the remaining integral equals 1 (it is a normal density with mean µ + σ²t).

Standardization (Normalization): Let X ∼ N(µ, σ²); we can normalize it by subtracting µ and dividing by σ:

Z = (X − µ)/σ ,  where E(Z) = 0 and Var(Z) = 1,

and the density function of the standard normal distribution is

fZ(z) = (1/√(2π)) e^{−z²/2}.

We write Z ∼ N(0, 1); Z is the standard normal random variable.

Properties of the Normal distribution:

1. E(X) = µ and Var(X) = σ².

2. If Z = (X − µ)/σ, then Z ∼ N(0, 1).

3. aX + b ∼ N(aµ + b, a²σ²).

4. Mean = Mode = Median.

5. E(Z^{2n+1}) = 0: z^{2n+1} is an odd function, so the integral is 0.

6. E(Z³) = 0, so the skewness is zero.
   Skewness = E(((X − µ)/σ)³). {Skewness is zero for a symmetric curve.}

7. E(Z⁴) = 3, so the kurtosis is 3.

Note: A probability distribution can be determined uniquely by

i. Probability density function.

ii. Moment generating function.

Proof. 3. Let Y = aX + b. Then

MY(t) = E(e^{(aX+b)t})
      = e^{bt} E(e^{atX})
      = e^{bt} MX(at)
      = e^{bt} e^{µat + σ²a²t²/2}
      = e^{(aµ+b)t + (σ²a²)t²/2}

so aX + b ∼ N(aµ + b, a²σ²).

Example: X ∼ N(5.3, 1) and Y ∼ N(5.5, 1.5); then (X − 5.3)/1 ∼ N(0, 1) and (Y − 5.5)/√1.5 ∼ N(0, 1).

Example: Let X ∼ N (15, 9) find P (12 < X < 20).


Solution:

P(12 < X < 20) = P((12 − 15)/3 < (X − 15)/3 < (20 − 15)/3)
               = P(−1 < Z < 5/3)
               = P(Z < 5/3) − P(Z < −1)

The values of P(Z < 5/3) and P(Z < −1) come from the standard normal table.
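Instead of a printed table, the standard normal CDF can be evaluated with the error function; a sketch for the example above:

```python
import math

def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# P(12 < X < 20) for X ~ N(15, 9): standardize with mu = 15, sigma = 3
p = phi(5 / 3) - phi(-1)   # ≈ 0.794
```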

Moment Inequalities:

Theorem. Let h be a non-negative function of a random variable X such that E(h(X)) exists. Then

P(h(X) ≥ ϵ) ≤ E(h(X)) / ϵ

Proof.

E(h(X)) = ∫_{−∞}^{∞} h(x) f(x) dx, where f(x) is the pdf of X.

Let A = {x : h(x) ≥ ϵ}. Then

E(h(X)) = ∫_A h(x) f(x) dx + ∫_{A^c} h(x) f(x) dx ,

where h(x) ≥ ϵ on A and h(x) < ϵ on A^c. Hence

E(h(X)) ≥ ∫_A h(x) f(x) dx ≥ ϵ ∫_A f(x) dx

∫_A f(x) dx ≤ E(h(X)) / ϵ

P(h(X) ≥ ϵ) ≤ E(h(X)) / ϵ

Markov's Inequality:

P(|X| ≥ k) ≤ E(|X|^r) / k^r

Proof. Using the previous result with h(x) = |x|^r and ϵ = k^r,

P(|X|^r ≥ k^r) ≤ E(|X|^r) / k^r
P(|X| ≥ k) ≤ E(|X|^r) / k^r

Example: Given E(X²) = 2, what can we say about P(|X| ≥ 3)?

P(|X| ≥ 3) ≤ E(|X|²) / 3² = 2/9
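Markov's inequality with r = 2 can be illustrated on simulated data; the empirical tail frequency always sits below the empirical second-moment bound, since x²/k² ≥ 1 pointwise whenever |x| ≥ k. A sketch using standard normal samples (any distribution would do):

```python
import random

random.seed(3)
# Markov's inequality: P(|X| >= k) <= E(|X|^r) / k^r; here r = 2, k = 3
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]

second_moment = sum(x * x for x in xs) / n
tail = sum(abs(x) >= 3 for x in xs) / n
bound = second_moment / 9   # the Markov bound on the tail
```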
Chebyshev's Inequality: Let X be a random variable with E(X) = µ and Var(X) = σ². Then

P(|X − µ| ≥ kσ) ≤ 1/k²

or equivalently

P(|X − µ| ≤ kσ) ≥ 1 − 1/k²
Joint Probability Distribution: Let X and Y be random variables with supports {x1, x2, . . . , xn} and {y1, y2, . . . , ym}. The joint probability distribution can be written as the n × m table T with rows indexed by x1, . . . , xn, columns indexed by y1, . . . , ym, and entries

pij = P(X = xi, Y = yj),  where i ∈ {1, 2, ..., n}, j ∈ {1, 2, ..., m} and pij ∈ [0, 1].

Σ_i Σ_j pij = 1

pi+ = Σ_j pij = marginal distribution of X

p+j = Σ_i pij = marginal distribution of Y

Joint Information: anything involving both X and Y.
Marginal Information: anything involving only X or only Y.
Conditional Information: conditioning on (fixing) one variable and predicting the other:

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)

Continuous joint pdf: fXY(x, y) on its support, with

∬ fXY(x, y) dx dy = 1

Marginal pdf:

fX(x) = ∫ fXY(x, y) dy
fY(y) = ∫ fXY(x, y) dx

Conditional pdf:

fX|Y(x|y) = fXY(x, y) / fY(y)

Conditional expectation:

E(X|Y) = ∫ x fX|Y(x|y) dx
E(Y|X) = ∫ y fY|X(y|x) dy

Expectation of a function g(X, Y):

E(g(X, Y)) = ∬ g(x, y) fXY(x, y) dx dy

Covariance of X and Y:

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
          = E[(X − X̄)(Y − Ȳ)]
          = E(XY) − E(X)E(Y)

If we replace Y by X then we will get

Cov(X, X) = V ar(X)

Note:

1. If Cov(X,Y) is positive then X and Y are positively correlated.

2. If Cov(X,Y) is negative then X and Y are inversely correlated.

Correlation Coefficient:

ρXY = Cov(X, Y) / √(Var(X) Var(Y))
Question:
fXY (x, y) = ke−(2x+y) , x > 0, y > 0

1. P (X < 1, Y < 1)

2. P (X < Y )

3. E(X), E(Y), E(X|Y ), E(Y |X)

4. Cov(X,Y)

5. ρXY

Predictive Model:
Amount of rain (X) ⇌ Amount of crops (Y)
Ŷ = H(X)
Input (given) → System → Output (predicted)
Exercise: Are the random variables X and Y related, where

fXY(x, y) = kxy ,  0 < x < 1, 0 < y < 1 and x + y ≤ 1 ?

Solution:

∫_0^1 ∫_0^{1−y} kxy dx dy = 1
(k/2) ∫_0^1 y(1 − y)² dy = 1
(k/2) [y²/2 − 2y³/3 + y⁴/4]_0^1 = 1
k = 24

Also,

fX(x) = ∫_0^{1−x} 24xy dy = 12x(1 − x)² ,  0 < x < 1

gives

fY|X(y|x) = 2y / (1 − x)² ,  0 < y < 1 − x.

Also E(X) = 2/5, E(Y) = 2/5, E(X²) = 1/5, E(Y²) = 1/5 and E(XY) = 2/15, which give

Cov(X, Y) = E(XY) − E(X)E(Y) = 2/15 − 4/25 = −2/75

Var(X) = 1/25 ,  Var(Y) = 1/25

ρ = Cov(X, Y) / √(Var(X)Var(Y)) = −2/3.
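The marginal f_X(x) = 12x(1 − x)² derived above makes the moment claims easy to verify by one-dimensional numerical integration:

```python
# Midpoint-rule approximation of the integral of f over [a, b]
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Marginal density of X derived above, on (0, 1)
fx = lambda x: 12 * x * (1 - x) ** 2

mass = integrate(fx, 0, 1)                       # ~1, so k = 24 is correct
ex = integrate(lambda x: x * fx(x), 0, 1)        # E(X) = 2/5
ex2 = integrate(lambda x: x * x * fx(x), 0, 1)   # E(X^2) = 1/5
var = ex2 - ex ** 2                              # Var(X) = 1/25
```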
Correlation Coefficient (ρ): measures the strength of linear dependence, with −1 ≤ ρ ≤ 1.

• ρ = −1 → X and Y are perfectly negatively linearly related.

• ρ = 1 → X and Y are perfectly positively linearly related.

• ρ = 0 → X and Y are not linearly related.

E(Y|X = x) = ∫ y fY|X(y|x) dy = ∫_0^{1−x} (2y² / (1 − x)²) dy = (2/3)(1 − x)

E(Y|X = x) = (2/3)(1 − x)   (Regression)
Yp = (2/3)(1 − x)           (Regression)
Covariance Matrix (Σ):

Σ = [ Var(X)     Cov(X, Y) ]
    [ Cov(Y, X)  Var(Y)    ]

  = [ 1/25    −2/75 ]
    [ −2/75    1/25 ]   (positive semidefinite and symmetric)

Question: Find the value of k?

f (x, y) = k, 0 < x < 1


f (x, y) = ke−x , 0 < x < y < ∞.

Independence of X and Y -

• If X and Y are independent, then

fXY (x, y) = fX (x)fY (y) ∀ x, y

Or P (X = x, Y = y) = P (X = x)P (Y = y) ∀ x, y.

• If they are dependent, then we ask:

  – Are they positively or negatively dependent? (Covariance.)

  – Are they linearly dependent or not? (Correlation coefficient.)

• ρ = 0 or Cov(X, Y ) = 0 does not imply independence of X and Y .

• If ρ is close to ±1, then we fit Yp = α + βX (linear regression).

  Choose α and β so that E = Σ (Yi − α − βXi)² is minimum:

  ∂E/∂α = 0 ⟹ Σ Yi = nα + β Σ Xi
  ∂E/∂β = 0 ⟹ Σ Xi Yi = α Σ Xi + β Σ Xi²

  Solving these normal equations gives α̂ and β̂, with

  β̂ = [ (1/n) Σ xi yi − ((1/n) Σ xi)((1/n) Σ yi) ] / [ (1/n) Σ xi² − ((1/n) Σ xi)² ]   (1)
    = Cov(X, Y) / Var(X)
Theorem 1. X and Y are independent if and only if

fXY (x, y) = h(x)g(y) x ∈ S(X) and y ∈ S(Y )

Question: Are X and Y linearly dependent where

fXY (x, y) = kxy, 0 < x < 1, 0 < y < 1 and x + y < 1.

Answer: Dependent.
Question: If X and Y are independent then find

1. Cov(X,Y)

2. ρ

3. E(XY )

4. Var(X+Y)

Solution:

1. Cov(X,Y)=0,

2. ρ = 0,

3. E(XY)=E(X)E(Y),

4. Var(X+Y)=Var(X)+Var(Y).

Note:

• If X and Y are independent, then

fXY (x, y) = fX (x)fY (y) ∀ x, y

• If X, Y and Z are independent, then

fXY Z (x, y, z) = fX (x)fY (y)fZ (z) ∀ x, y, z

Integrating with respect to x, we get

fY Z (y, z) = fY (y)fZ (z)

Similarly
fXY (x, y) = fX (x)fY (y)
and
fXZ (x, z) = fX (x)fZ (z)

• Let X1, X2, ..., Xn be n random variables; they are independent iff

  fX1,X2,...,Xn(x1, x2, ..., xn) = fX1(x1) fX2(x2) · · · fXn(xn) ,  ∀ x1, x2, ..., xn.

Exercise: Show that X and Y are independent where

fXY (x, y) = kxe−y 0 < x < 1, y > 0

Solution: Since fXY (x, y) = h(x)g(y), where h(x) = kx and g(y) = e−y
Note: If X and Y are independent, then

MX+Y(t) = MX(t) MY(t), ∀t

Question: If X and Y are independent and X ∼ N (0, 1) and Y ∼ N (0, 1) then what is the
distribution of X + Y .
Solution: Given that

MX(t) = e^{t²/2} ,  MY(t) = e^{t²/2}

MX(t) MY(t) = e^{t²} = MX+Y(t)

⟹ X + Y ∼ N(0, 2).
Question: If X and Y are independent and X ∼ N (1, 4) and Y ∼ N (2, 9) then what is the
distribution of 2X + 3Y − 1?
Solution: Given that

M_{2X+3Y−1}(t) = E(e^{(2X+3Y−1)t}) = E(e^{2Xt}) E(e^{3Yt}) E(e^{−t}) = MX(2t) MY(3t) e^{−t}.
• If X ∼ N(α, β²) and Y ∼ N(γ, δ²) are independent, then
  aX + bY + c ∼ N(aα + bγ + c, a²β² + b²δ²).

• Var(aX + bY + c) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

• If X1, X2, ..., Xn are independent random variables with Xi ∼ N(µi, σi²), then

  Σ_{i=1}^n ai Xi ∼ N(Σ_{i=1}^n ai µi, Σ_{i=1}^n ai² σi²).

Question: If X ∼ P (λ1 ) and Y ∼ P (λ2 ) and X and Y are independent then what is the
distribution of X + Y ?
Solution: X + Y ∼ P (λ1 + λ2 ).
Question: If X ∼ U (0, 1) and Y ∼ U (0, 1) and X and Y are independent then what is
the distribution of X + Y ?
Solution:

MX(t) = (e^t − 1)/t

MX+Y(t) = (e^t − 1)²/t²   (not the MGF of a standard named distribution)
Lemma: Suppose X and Y have joint density fXY(x, y) and are independent. What is the distribution of X + Y?
Proof: Let U = X + Y.

FU(u) = P(U ≤ u)
      = P(X + Y ≤ u)
      = ∬_{x+y ≤ u} fXY(x, y) dx dy
      = ∫_{−∞}^{∞} ∫_{−∞}^{u−y} fX(x) fY(y) dx dy
      = ∫_{−∞}^{∞} ( ∫_{−∞}^{u−y} fX(x) dx ) fY(y) dy
      = ∫_{−∞}^{∞} FX(u − y) fY(y) dy

⟹ fU(u) = ∫_{−∞}^{∞} fX(u − y) fY(y) dy.
Question: If X ∼ Exp(λ1 ) and Y ∼ Exp(λ2 ) and X and Y are independent then what is
the distribution of X + Y ?
Solution: From the given information,

fX(x) = λ1 e^{−λ1 x}, x > 0 ;  fY(y) = λ2 e^{−λ2 y}, y > 0

fU(u) = ∫_{−∞}^{∞} fX(u − y) fY(y) dy
      = ∫_0^u λ1 e^{−λ1(u−y)} λ2 e^{−λ2 y} dy ,

where the limits become 0 to u because fX(u − y) vanishes for y > u and fY(y) vanishes for y < 0.
Transformation:

fX(x) → fY(y), Y = g(X):  fY(y) = fX(g^{−1}(y)) |d/dy g^{−1}(y)|

Let fXY(x, y) be the joint density of X and Y, and let U = g1(X, Y) and V = g2(X, Y) be two transformations such that

1. g1 and g2 are invertible, with X = h1(U, V), Y = h2(U, V).

2. The partial derivatives of h1 and h2 exist.

3. The Jacobian is nonzero:

   J = det [ ∂x/∂u  ∂x/∂v ]
           [ ∂y/∂u  ∂y/∂v ]  ≠ 0

Then the joint density is given by

fUV(u, v) = fXY(h1(u, v), h2(u, v)) |J| ,  (u, v) ∈ S(U, V)

Exercise. Let fXY(x, y) = x + y, 0 ≤ x, y ≤ 1, and U = X + Y, V = X − Y. Find fUV(u, v).

Solution:

X = (U + V)/2 ,  Y = (U − V)/2

J = det [ 1/2   1/2 ]
        [ 1/2  −1/2 ]  = −1/2 ,  |J| = 1/2

fUV(u, v) = fXY((u + v)/2, (u − v)/2) |J| = ((u + v)/2 + (u − v)/2) · (1/2) = u/2

with 0 ≤ u + v ≤ 2 and 0 ≤ u − v ≤ 2, i.e.

fUV(u, v) = u/2 ,  0 < u < 1, −u < v < u
          = u/2 ,  1 < u < 2, u − 2 < v < 2 − u
          = 0 otherwise,

and integrating out v gives the marginal

fU(u) = u² ,  0 < u < 1
      = u(2 − u) ,  1 < u < 2.

Exercise. Let X ∼ U(0, 1) and Y ∼ U(0, 1) be independent random variables, and let U = X + 2Y and V = 3X − Y. Find fXY(x, y) and fUV(u, v).

Theorem. Let X1 ∼ N(µ1, σ1²), . . . , Xn ∼ N(µn, σn²), with X1, X2, . . . , Xn independent. Then a1X1 + · · · + anXn ∼ N(Σ ai µi, Σ ai² σi²).

Let Y = X1² and fX1 → fY. Since y = x² has the two roots x = ±√y,

fY(y) = fX1(√y) (1/(2√y)) + fX1(−√y) (1/(2√y))
      = (1/(2√y)) (fX1(√y) + fX1(−√y))

Now take Z ∼ N(0, 1) and Y = Z²:

fY(y) = (1/(2√y)) ( (1/√(2π)) e^{−y/2} + (1/√(2π)) e^{−y/2} )
      = (1/√(2πy)) e^{−y/2}
      = (1/(Γ(1/2) 2^{1/2})) y^{1/2 − 1} e^{−y/2} ,  y > 0

Y ∼ χ²(1)
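The result Z² ∼ χ²(1) can be checked against the chi-square moments E(Y) = n = 1 and Var(Y) = 2n = 2 by simulation:

```python
import random

random.seed(4)
# If Z ~ N(0,1) then Y = Z^2 ~ chi-square(1): E(Y) = 1, Var(Y) = 2
n = 400_000
ys = [random.gauss(0, 1) ** 2 for _ in range(n)]

mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
```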

Note:

X ∼ N(µ, σ²),  Z = (X − µ)/σ  ⟹  Z ∼ N(0, 1)

Note:

X ∼ P(λ), x = 0, 1, . . . , but 2X + 1 ≁ Poisson:

a linear function of a Poisson random variable does not, in general, follow a Poisson distribution.
Theorem. Let Y1, Y2, . . . , Yn ∼ χ²(1) be independent. Then Σ_i Yi ∼ χ²(n).

Proof. Let Y = Σ_i Yi. Then

MY(t) = E(e^{(Σ Yi)t}) = Π_{i=1}^n E(e^{Yi t}) = Π_{i=1}^n MYi(t)

MYi(t) = ∫_0^∞ (1/(Γ(1/2) 2^{1/2})) y^{−1/2} e^{−y(1/2 − t)} dy
       = (1/(Γ(1/2) 2^{1/2})) · Γ(1/2) / (1/2 − t)^{1/2}
       = 1/√(1 − 2t)

MY(t) = 1/(1 − 2t)^{n/2}

Y ∼ χ²(n)

t-Distribution: Let X ∼ N(0, 1) and Y ∼ χ²(n) be independent random variables. Then X/√(Y/n) ∼ t(n) (the t-distribution).

Define U = X/√(Y/n) and V = Y.

fXY(x, y) = fX(x) fY(y) = (1/√(2π)) e^{−x²/2} · (1/(Γ(n/2) 2^{n/2})) y^{n/2 − 1} e^{−y/2} ,  y > 0

Then x = u√(v/n) and y = v, so

J = det [ √(v/n)  u/(2√(vn)) ]
        [ 0       1          ]  = √(v/n)

fUV(u, v) = (1/(√(2π) Γ(n/2) 2^{n/2})) e^{−u²v/(2n)} v^{n/2 − 1} e^{−v/2} √(v/n) ,  u ∈ R, v > 0

fU(u) = (1/(√(2πn) Γ(n/2) 2^{n/2})) ∫_0^∞ v^{(n/2 + 1/2) − 1} e^{−(v/2)(1 + u²/n)} dv
      = (1/(√(2πn) Γ(n/2) 2^{n/2})) · Γ(n/2 + 1/2) / ((1/2)(1 + u²/n))^{n/2 + 1/2} ,  u ∈ R

F-Distribution: Let X ∼ χ²(n) and Y ∼ χ²(m) be independent random variables. Then (X/n)/(Y/m) ∼ F(n, m) (the F-distribution).

Define U = (X/n)/(Y/m) and V = Y.

fXY(x, y) = fX(x) fY(y)

Then X = UVn/m and Y = V, so

J = det [ vn/m  un/m ]
        [ 0     1    ]  = vn/m

fUV(u, v) = fXY(uvn/m, v) |J|

Bivariate Normal Distribution: (X, Y) is said to follow a bivariate normal distribution BN(µ1, µ2, ρ, σ1², σ2²) if the joint density is given by

fXY(x, y) = (1/(2πσ1σ2√(1−ρ²))) exp( (−1/(2(1−ρ²))) [ ((x−µ1)/σ1)² − 2ρ((x−µ1)/σ1)((y−µ2)/σ2) + ((y−µ2)/σ2)² ] ) ,  x, y ∈ R.

If ρ = 0, then fXY(x, y) = fX(x) fY(y), i.e. X and Y are independent.

So if (X, Y) ∼ BN(µ1, µ2, ρ, σ1², σ2²), then ρ = 0 ⇒ independence; and we know that in general independence ⇒ ρ = 0. Therefore, for the bivariate normal,

independence ⇔ ρ = 0.

In general, ρ = 0 only means no linear dependence; but if (X, Y) ∼ BN(µ1, µ2, ρ, σ1², σ2²), then X and Y can only be linearly related.
Completing the square in y,

fXY(x, y) = (1/(2πσ1σ2√(1−ρ²))) exp( (−1/(2(1−ρ²))) [ ( (y−µ2)/σ2 − ρ(x−µ1)/σ1 )² + (1−ρ²)((x−µ1)/σ1)² ] )

          = (1/(√(2π)σ1)) e^{−(1/2)((x−µ1)/σ1)²} · (1/(√(2π)σ2√(1−ρ²))) exp( (−1/(2(1−ρ²)σ2²)) [ y − µ2 − ρ(σ2/σ1)(x − µ1) ]² )

fXY(x, y) = fX(x) fY|X(y|x)

fY|X(y|x) = (1/(√(2π)γ)) e^{−(1/2)((y−δ)/γ)²}

Y|X ∼ N(δ, γ²), with

E(Y|X) = δ = µ2 + ρ(σ2/σ1)(x − µ1) ,  γ² = σ2²(1 − ρ²)

E(Y|X) = α + βx

β̂ = ρ σ2/σ1 = (Cov(X, Y)/(σ1σ2)) · (σ2/σ1) = Cov(X, Y)/Var(X)

Convergence in Probability
A sequence of random variables {X1, X2, . . . , Xn, . . .} is said to converge in probability to another random variable X if

P(|Xn − X| > ϵ) → 0 as n → ∞,

written Xn →P X.

Example:
Let {Xn} be a sequence of random variables defined by

P(Xn = 1) = 1/n and P(Xn = 0) = 1 − 1/n.

Then, taking X = 0,

P(|Xn − X| > ϵ) = P(Xn = 1) = 1/n  if 0 < ϵ < 1, and 0 if ϵ ≥ 1.

It follows that P(|Xn| > ϵ) → 0 as n → ∞, and we conclude that Xn →P 0.

Convergence in Moments
A sequence {X₁, X₂, ...} of random variables is said to converge in rth moment (rth mean) to a random variable X if:

E|Xn − X|^r → 0 as n → ∞

(in particular, this gives E(Xn^r) → E(X^r)).

Convergence in Law
Let {Xn} be a sequence of random variables with C.D.F.s {Fn} defined on the probability space (Ω, F, P). Further, let X be another random variable with C.D.F. F(·). Then {Xn} is said to be converging in distribution (in law) to X if:

lim_{n→∞} Fn(x) = F(x) at all points x where F(·) is continuous.

We denote it by Xn →_L X.

18 Weak Law of Large Numbers

Let {Xn} be a sequence of random variables, and let

Sn = Σ_{k=1}^{n} Xk,   n = 1, 2, 3, 4, ...

We say that {Xn} obeys the weak law of large numbers with respect to the sequence of constants {Bn}, where Bn > 0 and Bn → ∞, if there exists a sequence {An} such that Bn⁻¹(Sn − An) →_P 0 as n → ∞. Here, An is called the centering constant, and Bn is called the normalizing constant.

18.1 Theorem
Let X₁, X₂, ... be a sequence of independent and identically distributed random variables, each having mean E[Xi] = µ. Then, for any ε > 0,

P( |(X₁ + ··· + Xn)/n − µ| > ε ) → 0 as n → ∞.

Proof: We shall prove the result only under the additional assumption that the random variables have a finite variance σ². Now, as

E[(X₁ + ··· + Xn)/n] = µ   and   Var[(X₁ + ··· + Xn)/n] = σ²/n,

it follows from Chebyshev's inequality that

P( |(X₁ + ··· + Xn)/n − µ| > ε ) ≤ σ²/(nε²),

and the result is proved. ■


Cor 1: If the random variables Xn are identically distributed and pairwise uncorrelated with E(Xi) = µ and Var(Xi) = σ² < ∞, we can choose An = nµ and Bn = n.

Cor 2: If the Xn are pairwise uncorrelated with variances σi², we can choose Bn = n provided that n⁻² Σ_{i=1}^{n} σi² → 0 as n → ∞.

Cor 3: Let {Xn} be a sequence of pairwise-uncorrelated identically distributed random variables with finite variance σ². Taking An = nµ and Bn = n, the condition of Cor 2 reads nσ²/n² → 0 as n → ∞, which always holds. Hence Sn/n →_P µ.

Example (Die Rolling)

Consider n rolls of a fair die. Let Xi be the outcome of the i-th roll, so Sn = X₁ + X₂ + ... + Xn is the sum of the first n rolls. The Xi are i.i.d. with E(Xi) = (1 + 2 + ··· + 6)/6 = 7/2. Thus, by the Law of Large Numbers (LLN), for any ϵ > 0,

P( |Sn/n − 7/2| ≥ ϵ ) → 0 as n → ∞.

This can be restated as: for any ϵ > 0,

P( |Sn/n − 7/2| < ϵ ) → 1 as n → ∞.
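The die example can be simulated directly. Below is a minimal Python sketch (my own, not from the notes) showing Sn/n settling near 7/2:

```python
import random

def mean_of_die_rolls(n, seed=0):
    """Average of n fair-die rolls; by the WLLN this concentrates at 7/2 = 3.5."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n

avg = mean_of_die_rolls(100_000)  # close to 3.5 for large n
```

For n = 100000 the deviation from 3.5 is already of order 1/√n.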

Example (Poisson Random Variables)
Let {Xn; n ≥ 1} be a sequence of i.i.d. Poisson random variables with parameter λ. Then we have

P(X₁ = k) = e^{−λ} λ^k / k!,   k = 0, 1, 2, ...

Thus, µ = E(X₁) = λ and Var(X₁) = λ. Hence, by the Weak Law of Large Numbers (WLLN), X̄n →_P µ.

Important
Let {Xn} be any sequence of random variables. Define Yn = n⁻¹ Σ_{k=1}^{n} Xk. A necessary and sufficient condition for the sequence {Xn} to satisfy the weak law of large numbers is that

E[ Yn² / (1 + Yn²) ] → 0 as n → ∞.

Example: Let X₁, X₂, ... be iid random variables with E(X₁^k) < ∞ for some positive integer k. Then

(1/n) Σ_{j=1}^{n} Xj^k → E(X₁^k) as n → ∞.

In particular, if E(Xi²) < ∞, then

(1/n) Σ_{i=1}^{n} Xi² → E(Xi²) as n → ∞.

Example
Let X₁, X₂, ... be iid C(1, 0) (standard Cauchy) random variables. We have seen that n⁻¹Sn →_d C(1, 0), so that n⁻¹Sn does not converge in probability to 0. It follows that the weak law of large numbers does not hold here.

Truncation: For a constant c > 0, define

Xi^c = Xi if |Xi| ≤ c,   Xi^c = 0 if |Xi| > c,   i = 1, 2, ..., n.

Define Sn^c = Σ_{i=1}^{n} Xi^c and Mn = Σ_{i=1}^{n} E[Xi^c].

Example
Let X₁, X₂, ... be iid random variables with a common probability density function (pdf) given by

f(x) = (1 + ρ)/x^{2+ρ} for x ≥ 1,   f(x) = 0 for x < 1,

where ρ > 0. Then, the expected value of X is

E(X) = ∫₁^∞ x · (1 + ρ)/x^{2+ρ} dx = (1 + ρ) ∫₁^∞ x^{−(1+ρ)} dx = (1 + ρ)/ρ < ∞,

and the law of large numbers holds, i.e., n⁻¹Sn tends to (1 + ρ)/ρ as n tends to infinity.

19 Limiting Moment Generating Function

Lemma 19.1. Let X₁, X₂, ... be a sequence of random variables. Let fn be the density function of Xn for n = 1, 2, ..., and suppose that the MGF Mn(t) of fn exists. What happens to Mn(t) as n → ∞? If it converges, does it always converge to an MGF?

Example: Let {Xn} be a sequence of random variables with PMF P{Xn = −n} = 1, n = 1, 2, ... We have

Mn(t) = E(e^{tXn}) = e^{−tn},

so Mn(t) → 0 for all t > 0, Mn(t) → ∞ for all t < 0, and Mn(t) = 1 at t = 0. Then

Mn(t) → M(t) = { 0 if t > 0;  1 if t = 0;  ∞ if t < 0 }.

But M(t) is not an MGF. Note also that the distribution function of Xn is

Fn(x) = { 0 if x < −n;  1 if x ≥ −n },

so Fn(x) → 1 for every fixed x, and the limit F ≡ 1 is not a distribution function.

Now suppose that Xn has MGF Mn and Xn → X, where X is a random variable with MGF M(t). Does Mn(t) → M(t) as n → ∞?

Lemma 19.2. Write f(x) = o(x) if f(x)/x → 0 as x → 0. We have

lim_{n→∞} (1 + a/n + o(1/n))^n = e^a for every real a.

Example 19.1. Let X ∼ P(λ); then the MGF of X is given by M(t) = exp[λ(e^t − 1)] for all t.
Let Y = (X − λ)/√λ; then the MGF of Y is given by M_Y(t) = e^{−t√λ} M(t/√λ). The logarithm of M_Y(t) is

log M_Y(t) = −t√λ + log M(t/√λ) = −t√λ + λ(e^{t/√λ} − 1).

Expanding the exponential term,

log M_Y(t) = −t√λ + λ( t/√λ + t²/(2λ) + O(λ^{−3/2}) ) = t²/2 + O(λ^{−1/2}) → t²/2 as λ → ∞,

so that M_Y(t) → e^{t²/2} as λ → ∞, which is the MGF of an N(0, 1) random variable.
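This limit can be checked numerically. The sketch below (my own check, assuming the formula for log M_Y(t) derived above) evaluates it for a large λ:

```python
import math

def log_mgf_standardized_poisson(t, lam):
    """log M_Y(t) = -t*sqrt(lam) + lam*(exp(t/sqrt(lam)) - 1)
    for Y = (X - lam)/sqrt(lam), X ~ Poisson(lam)."""
    s = math.sqrt(lam)
    return -t * s + lam * (math.exp(t / s) - 1)

# For large lam this should approach t**2 / 2.
val = log_mgf_standardized_poisson(1.0, 1e6)
```

At t = 1 and λ = 10⁶ the value is already within about 2·10⁻⁴ of 1/2.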

20 What is a Random Sample?
A collection of i.i.d. (independent, identically distributed) random variables (X₁, X₂, X₃, ..., Xn) is called a random sample of size n.
Since X₁, X₂, X₃, ..., Xn have the same distribution and are independent, the joint density factorizes as

f(x₁, x₂, x₃, ..., xn; θ) = Π_{i=1}^{n} f(xi; θ).

21 What is a Statistic?
A statistic is a function of the random sample. The two most important statistics are:

Sample Mean: X̄ = (1/n) Σ_{i=1}^{n} Xi

Sample Variance: S² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)²

22 Properties of Mean and Variance

Let X₁, X₂, X₃, ..., Xn be a random sample such that E(Xi) = µ and Var(Xi) = σ².
If n is large, then by the Central Limit Theorem, as the sample size n becomes sufficiently large, the sampling distribution of the standardized sample mean approaches a standard normal distribution with mean 0 and standard deviation 1:

Z = (X̄ − µ)/(σ/√n) → N(0, 1) as n → ∞.

For the sample mean,

E(X̄) = E( (1/n) Σ_{i=1}^{n} Xi ) = µ   and   Var(X̄) = σ²/n.

For the sample variance S² = (1/(n−1)) Σ (Xi − X̄)², write

S² = (1/(n−1)) Σ (Xi − µ + µ − X̄)²    (2)
   = (1/(n−1)) Σ [ (Xi − µ)² + (µ − X̄)² + 2(Xi − µ)(µ − X̄) ]    (3)
   = (1/(n−1)) [ Σ (Xi − µ)² − n(X̄ − µ)² ]    (4)

(the cross term collapses because Σ(Xi − µ) = n(X̄ − µ)). Taking expectations,

E(S²) = (1/(n−1)) [ Σ E(Xi − µ)² − n E(X̄ − µ)² ]    (5)
      = (1/(n−1)) [ nσ² − n·(σ²/n) ] = σ².    (6)

From equation (4),

(n−1)S²/σ² = Σ_{i=1}^{n} ((Xi − µ)/σ)² − ((X̄ − µ)/(σ/√n))²    (7)
           = Σ_{i=1}^{n} Zi² − Z²    (8)

and, for a normal sample,

(n−1)S²/σ² ∼ χ²_{n−1}.    (9)
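The unbiasedness E(S²) = σ² can be checked by simulation. A minimal Python sketch (my own; note that `statistics.variance` uses the n − 1 divisor, matching S²):

```python
import random
import statistics

def mean_sample_variance(n=5, reps=40_000, mu=0.0, sigma=2.0, seed=1):
    """Monte Carlo estimate of E(S^2) for N(mu, sigma^2) samples of size n.
    Should be close to sigma^2 = 4, illustrating that S^2 is unbiased."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        total += statistics.variance(xs)  # divides by n - 1
    return total / reps

est = mean_sample_variance()
```

Even with a tiny sample size n = 5, the average of S² over many replications sits near σ² = 4.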

23 Statistics
Suppose we are given a data set

X₁, X₂, X₃, ..., Xn ∼ f(x; θ),   e.g. Xi ∼ Exp(λ),    (10)

where the density f(x; θ) depends on an unknown parameter θ, the population parameter.

23.1 Unbiased Statistic
A statistic g(X₁, X₂, X₃, ..., Xn) is an unbiased estimator of θ if

E[g(X₁, X₂, X₃, ..., Xn)] = θ.

In general, E[g(X₁, X₂, X₃, ..., Xn)] − θ is called the bias.

23.2 Efficient Statistic
g₁(X₁, X₂, X₃, ..., Xn) is more efficient than g₂(X₁, X₂, X₃, ..., Xn) if Var[g₁(X₁, ..., Xn)] ≤ Var[g₂(X₁, ..., Xn)].

MVUE → minimum variance unbiased estimator.

Example: X₁, X₂, X₃, ..., Xn follows N(µ, σ²), given that E(Xi) = µ and Var(Xi) = σ². Consider

g₁(X₁, ..., Xn) = (X₁ + X₂ − X₃ + X₄ + X₅)/5
g₂(X₁, ..., Xn) = X̄    (11)
g₃(X₁, ..., Xn) = (X₁ + Xn)/2    (12)

Then

E[g₁] = [E(X₁) + E(X₂) − E(X₃) + E(X₄) + E(X₅)]/5 = 3µ/5 → not unbiased    (13)
E[g₂] = E(X̄) = µ → unbiased    (14)
E[g₃] = [E(X₁) + E(Xn)]/2 = µ → unbiased    (15)

Var[g₂] = Var(X̄) = σ²/n    (16)
Var[g₃] = Var((X₁ + Xn)/2) = σ²/2    (17)

Var[g₂] ≤ Var[g₃]    (18)

so g₂ = X̄ is preferred. In fact X̄ is the MVUE for µ, and S² for σ².
Sampling distribution facts:

1) X̄ ∼ N(µ, σ²/n) ⟺ (X̄ − µ)/(σ/√n) ∼ N(0, 1).

2) S² = (1/(n−1)) [ Σ (Xi − µ)² − n(X̄ − µ)² ], so

(n−1)S²/σ² = Σ_{i=1}^{n} ((Xi − µ)/σ)² − ((X̄ − µ)/(σ/√n))²,

where Σ ((Xi − µ)/σ)² ∼ χ²_n and ((X̄ − µ)/(σ/√n))² ∼ χ²_1.

3) X̄ and S² are independent.

Combining these,

t = [ (X̄ − µ)/(σ/√n) ] / √[ (n−1)S² / (σ²(n−1)) ]    (19)
  = (X̄ − µ)/(S/√n) ∼ t_{n−1}.    (20)

24 Objective
To obtain g(X₁, X₂, ..., Xn) that estimates θ.

25 Unbiased Estimator (UE)
θ̂ = g(X₁, X₂, ..., Xn) is unbiased if θ = E(θ̂) = E[g(X₁, X₂, ..., Xn)].
Bias: Bias = E(θ̂) − θ.

26 Efficiency
g₁ is more efficient than g₂ if Var(g₁) ≤ Var(g₂).

27 Minimum Variance Unbiased Estimator (MVUE)
The MVUE is the unbiased estimator with the smallest variance among all unbiased estimators.

28 Maximum Likelihood Estimator (MLE)

Likelihood function:
L(θ) = f(x₁, x₂, ..., xn; θ)

The MLE is the maximizer of the likelihood:
θ̂ = Argmax_θ L(θ)

Example: For a sample X₁, X₂, ..., Xn from an exponential distribution with mean λ:

L(λ) = Π_{i=1}^{n} f_{Xi}(xi) = Π_{i=1}^{n} (1/λ) e^{−xi/λ}

log L(λ) = −n ln(λ) − (Σ_{i=1}^{n} xi)/λ

(d/dλ) ln L(λ) = −n/λ + (Σ_{i=1}^{n} xi)/λ² = 0

λ̂ = (1/n) Σ_{i=1}^{n} xi

For large samples, the MLE is asymptotically efficient, i.e. approximately the MVUE.
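The calculus above gives λ̂ = x̄ in closed form. A small Python sketch (my own, using a crude grid search rather than calculus) confirms that the log-likelihood is maximized near the sample mean:

```python
import math
import random

def exp_log_likelihood(lam, xs):
    """log L(lambda) = -n*log(lambda) - sum(x_i)/lambda
    for the exponential distribution with mean lambda."""
    return -len(xs) * math.log(lam) - sum(xs) / lam

rng = random.Random(0)
xs = [rng.expovariate(1 / 2.0) for _ in range(500)]  # true mean lambda = 2.0
grid = [0.5 + 0.01 * k for k in range(400)]          # candidate lambdas in [0.5, 4.49]
lam_hat = max(grid, key=lambda lam: exp_log_likelihood(lam, xs))
# lam_hat agrees with the closed-form MLE (the sample mean) up to the grid spacing.
```

The grid argmax lands within one grid step of x̄, as the derivation predicts.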
Example: X₁, X₂, ..., Xn, Xi ∼ U(0, θ):

f_{Xi}(xi) = 1/θ,   0 ≤ xi ≤ θ

L(θ) = Π_{i=1}^{n} f_{Xi}(xi) = 1/θⁿ,   θ ≥ max(x₁, ..., xn)

log L(θ) = −n log(θ)

(d/dθ) log L(θ) = −n/θ = 0

has no solution: differentiation does not always work! Here L(θ) is decreasing in θ, so it is maximized at the smallest admissible value, θ̂ = max(x₁, ..., xn).
Example: X₁, X₂, ..., Xn, Xi ∼ N(µ, σ²):

L(µ, σ²) = Π_{i=1}^{n} f_{Xi}(xi) = Π_{i=1}^{n} (1/(√(2π)σ)) e^{−(1/2)((xi−µ)/σ)²}

log L(µ, σ²) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σ_{i=1}^{n} (xi − µ)²

(∂/∂µ) log L(µ, σ²) = (1/σ²) Σ_{i=1}^{n} (xi − µ) = 0  ⇒  µ̂ = (1/n) Σ_{i=1}^{n} xi

(∂/∂σ²) log L(µ, σ²) = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (xi − µ)² = 0  ⇒  σ̂² = (1/n) Σ_{i=1}^{n} (xi − µ̂)²

Joint estimators of (µ, σ²):

µ̂ = x̄,   σ̂² = (1/n) Σ_{i=1}^{n} (xi − x̄)²

E(σ̂²) = ((n−1)/n)σ² ≠ σ² (biased): σ̂² is not an unbiased estimator of σ². An unbiased estimator for σ² is

g(x₁, x₂, ..., xn) = (n/(n−1)) σ̂² = (1/(n−1)) Σ_{i=1}^{n} (xi − x̄)² = S².
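The bias of σ̂² is easy to see by simulation. A minimal Python sketch (mine, not from the notes) estimates E(σ̂²) with the 1/n divisor:

```python
import random

def mean_mle_variance(n=5, reps=40_000, sigma=1.0, seed=2):
    """Monte Carlo estimate of E(sigma_hat^2), the variance MLE with the 1/n
    divisor. Expect roughly (n-1)/n * sigma^2 = 0.8, not sigma^2 = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        total += sum((x - xbar) ** 2 for x in xs) / n
    return total / reps

est = mean_mle_variance()
```

With n = 5 the shortfall is substantial: the average sits near 0.8σ², exactly the (n−1)/n factor derived above.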

Mean squared error: choose θ̂ = g(x₁, x₂, ..., xn) to minimize the mean squared error,

θ̂ = Arg min_{θ̂} E(θ̂ − θ)².

Decompose, writing θ̂ − θ = (θ̂ − E(θ̂)) + (E(θ̂) − θ):

E(θ̂ − θ)² = E[ (θ̂ − E(θ̂))² + 2(θ̂ − E(θ̂))(E(θ̂) − θ) + (E(θ̂) − θ)² ]

where θ̂ is a random variable, while E(θ̂) and θ are constants. The cross term vanishes since E[θ̂ − E(θ̂)] = 0, so

E(θ̂ − θ)² = Var(θ̂) + [Bias(θ̂)]².

29 Confidence Interval Estimation


To obtain g1 (x1 , x2 , . . . , xn ) and g2 (x1 , x2 , . . . , xn ):

P (g1 ≤ θ ≤ g2 ) ≥ 1 − α
Where:

1. 1 − α is the level of confidence.

2. (g1 , g2 ) is the interval.

3. P (g1 ≤ θ ≤ g2 ) ≥ 1 − α.

Case 1
Let X₁, X₂, ..., Xn be a random sample from N(µ, σ²), where σ² is known.
Here α is the allowed error probability (commonly α = 0.05, 0.1, or 0.01), and 1 − α is the confidence level.
Create a confidence interval for µ:

1. The point estimator of µ is X̄.

2. X̄ ∼ N(µ, σ²/n).

3. (X̄ − µ)/(σ/√n) ∼ N(0, 1).

P(a ≤ Z ≤ b) = 1 − α

P( −z_{α/2} ≤ Z ≤ z_{α/2} ) = 1 − α

P( −z_{α/2} ≤ (X̄ − µ)/(σ/√n) ≤ z_{α/2} ) = 1 − α

P( X̄ − z_{α/2}(σ/√n) ≤ µ ≤ X̄ + z_{α/2}(σ/√n) ) = 1 − α

The (1 − α)100% confidence interval for µ is:

[ X̄ − z_{α/2}(σ/√n), X̄ + z_{α/2}(σ/√n) ]

For example, when α = 0.05, the 95% confidence interval for µ, with z_{0.025} = 1.96, is:

[ X̄ − 1.96(σ/√n), X̄ + 1.96(σ/√n) ]
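The interval is straightforward to compute. A minimal Python helper (my own naming, not from the notes):

```python
import math

def mean_ci_known_sigma(xbar, sigma, n, z=1.96):
    """(1 - alpha)100% CI for mu with known sigma; z = z_{alpha/2} (1.96 for 95%)."""
    half = z * sigma / math.sqrt(n)
    return (xbar - half, xbar + half)

lo, hi = mean_ci_known_sigma(5.6, 1.0, 100)  # half-width 0.196, about (5.404, 5.796)
```

With X̄ = 5.6, σ = 1 and n = 100, the half-width is 1.96·(1/10) = 0.196.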

Case 2
When σ is unknown:

(X̄ − µ)/(S/√n) ∼ t_{n−1}

P( −t_{α/2, n−1} ≤ (X̄ − µ)/(S/√n) ≤ t_{α/2, n−1} ) = 1 − α

The (1 − α)100% confidence interval for µ is:

[ X̄ − t_{α/2, n−1}(S/√n), X̄ + t_{α/2, n−1}(S/√n) ]

For a small data set with a random sample x₁, x₂, ..., xn from N(µ, σ²), the point estimator for σ² is S². The confidence interval for σ² is derived as follows.

Since (n−1)S²/σ² ∼ χ²_{n−1},

P( χ²_{1−α/2, n−1} ≤ (n−1)S²/σ² ≤ χ²_{α/2, n−1} ) = 1 − α,

where χ²_{α/2, n−1} and χ²_{1−α/2, n−1} are the critical values from the chi-square distribution with n − 1 degrees of freedom. Inverting,

P( (n−1)S²/χ²_{α/2, n−1} ≤ σ² ≤ (n−1)S²/χ²_{1−α/2, n−1} ) = 1 − α.

The confidence interval for σ² is

[ (n−1)S²/χ²_{α/2, n−1}, (n−1)S²/χ²_{1−α/2, n−1} ]
Interval estimator
Let X₁, X₂, X₃, ..., Xn be a random sample of size n. Then:
Case 1: When n is large, the confidence interval for µ is

[ X̄ − z_{α/2}(S/√n), X̄ + z_{α/2}(S/√n) ]

Case 2: When n is small, the confidence interval for µ is

[ X̄ − t_{α/2, n−1}(S/√n), X̄ + t_{α/2, n−1}(S/√n) ]

Confidence Interval for p: Let X₁, X₂, X₃, ..., Xn be a (large) random sample of size n such that Xi ∼ B(1, p). We know that the point estimator for p is X̄, and

(X̄ − E(X̄)) / √Var(X̄) → N(0, 1).

When the sample is large, E(X̄) = E(Xi) = p and Var(X̄) = Var(Xi)/n = p(1 − p)/n. Let

z = (X̄ − p)/√(p(1 − p)/n).

Then

P( −z_{α/2} ≤ z ≤ z_{α/2} ) = 1 − α
P( −z_{α/2} ≤ (X̄ − p)/√(p(1−p)/n) ≤ z_{α/2} ) = 1 − α
P( X̄ − z_{α/2}√(p(1−p)/n) ≤ p ≤ X̄ + z_{α/2}√(p(1−p)/n) ) = 1 − α.

Replacing p(1 − p) by X̄(1 − X̄), the confidence interval for p is

[ X̄ − z_{α/2}√(X̄(1−X̄)/n), X̄ + z_{α/2}√(X̄(1−X̄)/n) ].

Alternatively, solve the inequality exactly: −z_{α/2} ≤ (X̄ − p)/√(p(1−p)/n) ≤ z_{α/2} ⇒ |X̄ − p| ≤ z_{α/2}√(p(1−p)/n), which gives the quadratic

(1 + z²_{α/2}/n) p² − (2X̄ + z²_{α/2}/n) p + X̄² ≤ 0,

whose roots are

p = [ (2X̄ + z²_{α/2}/n) ± √( (2X̄ + z²_{α/2}/n)² − 4(1 + z²_{α/2}/n)X̄² ) ] / [ 2(1 + z²_{α/2}/n) ].

When n is large, z²_{α/2}/n ≈ 0, and we get

p = [ 2X̄ ± √( 4X̄² + 4X̄z²_{α/2}/n + z⁴_{α/2}/n² − 4X̄² − 4X̄²z²_{α/2}/n ) ] / 2 ≈ X̄ ± z_{α/2}√(X̄(1−X̄)/n).

Remark: For small samples, z_{α/2} can be replaced by t_{α/2}.
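The large-sample interval above can be wrapped as a helper. A minimal Python sketch (mine, not from the notes):

```python
import math

def wald_ci_for_p(xbar, n, z=1.96):
    """Large-sample CI for p, replacing p(1-p) by xbar(1-xbar)."""
    half = z * math.sqrt(xbar * (1 - xbar) / n)
    return (xbar - half, xbar + half)

lo, hi = wald_ci_for_p(0.4, 100)  # roughly (0.304, 0.496)
```

For X̄ = 0.4 and n = 100, the half-width is 1.96·√(0.24/100) ≈ 0.096.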

Let two samples X₁, X₂, ..., Xn ∼ N(µ₁, σ₁²) and Y₁, Y₂, ..., Ym ∼ N(µ₂, σ₂²) be given; we find a C.I. (Confidence Interval) for (µ₁ − µ₂). The point estimator of µ₁ − µ₂ is X̄ − Ȳ, where X̄ ∼ N(µ₁, σ₁²/n) and Ȳ ∼ N(µ₂, σ₂²/m), which implies X̄ − Ȳ ∼ N(µ₁ − µ₂, σ₁²/n + σ₂²/m). Hence the C.I. for µ₁ − µ₂ is

[ (X̄ − Ȳ) − z_{α/2}√(S₁²/n + S₂²/m), (X̄ − Ȳ) + z_{α/2}√(S₁²/n + S₂²/m) ]

Testing of Hypothesis:
Decision making. Example: the claim is that the average height of the population is 5.4 ft, or that p = 40%; do we accept or reject the claim?
Statistical hypothesis:
Null hypothesis: the initial claim, which we test and possibly reject,

H₀ : µ = 5.4 (simple) or p ≥ 0.4

Alternate hypothesis: this is the complement of H₀,

H₁ : H₀ is not true, i.e. µ ≠ 5.4 or p < 0.4

Specific value of parameter: a null hypothesis that fixes the parameter at a single value is a simple hypothesis,

H₀ : µ = 5.4 v/s H₁ : µ ≠ 5.4

• Collect the data (5.1, 5.5, 5.6, ...)

• If X̄ = 5.8, H₀ is rejected (not close to 5.4).

• If X̄ = 5.3 or 5.5, is it close to 5.4 or not? A formal decision rule is needed.

Decision table (an error is a deviation from the actual scenario):

Actual \ Decision    Accept H₀        Accept H₁
H₀ true              ✓                type I error
H₁ true              type II error    ✓

1. Over estimation

2. Under estimation

P(type I error) = α = P(H₀ is rejected | H₀ is true)

P(type II error) = β = P(H₀ is accepted | H₀ is not true)

Objective: minimize both α and β. A simultaneous reduction in α and β is not possible, so we first fix the value of α and then minimize β.

• Rejection region of H₀

• Critical region
  H₀ : p = 0.4, H₁ : p ≠ 0.4. Collect the data; the point estimator of p is X̄. If X̄ is close to 0.4 then accept H₀, otherwise reject H₀.

Level of significance
Objective: minimize both α and β.
Testing: H₀ : µ = µ₀ v/s H₁ : µ ≠ µ₀.
Test critical region / rejection region:

C = {(X₁, X₂, ..., Xn) : |X̄ − µ₀| > c}

where c is some significant number. We fix α:

α = P(type I error) = P_{H₀}(reject H₀)
  = P_{H₀}(C)
  = P_{H₀}(|X̄ − µ₀| > c)

⇒ P_{H₀}(|X̄ − µ₀| < c) = 1 − α
⇒ P_{H₀}(−c ≤ X̄ − µ₀ ≤ c) = 1 − α
⇒ P_{H₀}( −c/(σ/√n) ≤ Z ≤ c/(σ/√n) ) = 1 − α

Setting z_{α/2} = c/(σ/√n), we get c = z_{α/2}σ/√n; for large data, σ is replaced by S.

• Decision with (1 − α)100% level of significance

• Reject H₀ if |X̄ − µ₀| > c, i.e. reject H₀ if |X̄ − µ₀| > z_{α/2}S/√n

• Accept H₀ if |X̄ − µ₀| ≤ z_{α/2}S/√n
  ⇒ µ₀ ∈ [ X̄ − z_{α/2}S/√n, X̄ + z_{α/2}S/√n ]
  ⇒ −z_{α/2} ≤ (X̄ − µ₀)/(S/√n) ≤ z_{α/2}

Method:
Step 1 - Define H₀ and H₁.

Step 2 - Fix α.

Step 3 - Define a test statistic (under H₀); reject H₀ if X̄ is far from µ₀.

Step 4 - z_cal = (X̄ − µ₀)/(S/√n); reject H₀ if |z_cal| is large, i.e. |z_cal| > z_{α/2} → reject H₀.

Example:
Data: 5.1, 5.8, 5.7, ..., 6 (n = 100), hypothesized average height µ₀ = 5.4.
H₀ : µ = 5.4, H₁ : µ ≠ 5.4, with α = 0.05 (95% level of significance).
Suppose X̄ = 5.6 and S² = 1. Then

z_cal = (5.6 − 5.4)/(1/√100) = 2

From the table, z_{α/2} = z_{0.025} = 1.96. As |z_cal| > z_{α/2}, we reject H₀ at the 95% level of significance.
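The worked example can be reproduced in a few lines. A minimal Python sketch (the helper name is my own):

```python
import math

def two_sided_z_test(xbar, mu0, s, n, z_crit=1.96):
    """Return (z_cal, reject) for H0: mu = mu0 vs H1: mu != mu0."""
    z_cal = (xbar - mu0) / (s / math.sqrt(n))
    return z_cal, abs(z_cal) > z_crit

z_cal, reject = two_sided_z_test(5.6, 5.4, 1.0, 100)  # z_cal = 2, H0 rejected
```

Since |2| > 1.96, the function reports rejection, matching the calculation above.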

β = P(type II error) = P_{H₁}(accept H₀)
  = P_{H₁}( |X̄ − µ₀| ≤ z_{α/2}S/√n )
  = P_{H₁}( −z_{α/2}S/√n ≤ X̄ − µ₀ ≤ z_{α/2}S/√n )
  = P_{H₁}( −z_{α/2}S/√n + µ₀ − µ₁ ≤ X̄ − µ₁ ≤ z_{α/2}S/√n + µ₀ − µ₁ )
  = P_{H₁}( −z_{α/2} + (µ₀ − µ₁)/(S/√n) ≤ Z ≤ z_{α/2} + (µ₀ − µ₁)/(S/√n) )

Here 1 − β defines the power of the test.

If the data set is small → replace z_{α/2} with t_{α/2, n−1}.
One sided test: H₀ : µ ≤ µ₀ vs H₁ : µ > µ₀

C = {(X₁, X₂, ..., Xn) : X̄ − µ₀ > c}, based on α:

α = P_{H₀}(C) = P_{H₀}(X̄ − µ₀ > c)
⇒ 1 − α = P_{H₀}(X̄ − µ₀ < c)

With z = (X̄ − µ₀)/(S/√n) ∼ N(0, 1),

P_{H₀}( Z ≤ c/(S/√n) ) = 1 − α  ⇒  c/(S/√n) = z_α  ⇒  c = z_α S/√n

Reject H₀ at the (1 − α)100% level of significance if X̄ − µ₀ > z_α S/√n, i.e. X̄ > µ₀ + z_α S/√n. (Use t_α in place of z_α when n < 30.)

H₀ : µ ≥ µ₀ vs H₁ : µ < µ₀
Critical region:
C = {(X₁, X₂, ..., Xn) : X̄ − µ₀ < c}, based on α:

α = P_{H₀}(C) = P(X̄ − µ₀ < c) = P( (X̄ − µ₀)/(S/√n) < c/(S/√n) ) = P( z < c/(S/√n) )

c = (S/√n)(−z_α)

If (X̄ − µ₀)/(S/√n) < −z_α then reject H₀.

Two sample tests:

µ₁ → average salary in Jodhpur, µ₂ → average salary in Mumbai,

H₀ : µ₁ = µ₂ vs H₁ : µ₁ ≠ µ₂

Given X₁, X₂, ..., Xn and Y₁, Y₂, ..., Ym are two samples, then the critical region is
C = {(X₁, ..., Xn, Y₁, ..., Ym) : |X̄ − Ȳ| > c}

α = P_{H₀}(C) = P_{H₀}(|X̄ − Ȳ| > c)
⇒ P_{H₀}(−c ≤ X̄ − Ȳ ≤ c) = 1 − α

X̄ − Ȳ ∼ N( µ₁ − µ₂, S₁²/n + S₂²/m )

⇒ Z = [ (X̄ − Ȳ) − (µ₁ − µ₂) ] / √(S₁²/n + S₂²/m) ∼ N(0, 1), where under H₀, µ₁ − µ₂ = 0.

P( −c/√(S₁²/n + S₂²/m) ≤ z ≤ c/√(S₁²/n + S₂²/m) ) = 1 − α

c = z_{α/2}√(S₁²/n + S₂²/m)

Reject H₀ at the (1 − α)100% level of significance if |X̄ − Ȳ| > c.

One sided: H₀ : µ₁ ≤ µ₂ vs H₁ : µ₁ > µ₂.

β = P_{H₁}(accept H₀)
  = P_{H₁}(|X̄ − Ȳ| ≤ c)
  = P_{H₁}(−c ≤ X̄ − Ȳ ≤ c)
  = P_{H₁}( (−c − (µ₁ − µ₂))/√(S₁²/n + S₂²/m) ≤ z ≤ (c − (µ₁ − µ₂))/√(S₁²/n + S₂²/m) )

For fixed α:
1 − α → level of significance
1 − β → power of test

Hypothesis for σ²:
H₀ : σ² = σ₀² v/s H₁ : σ² ≠ σ₀²

Since (n−1)S²/σ² ∼ χ²_{n−1}, the test statistic (under H₀) is

Y_cal = (n−1)S²/σ₀²

Compare Y_cal with χ²_{α/2, n−1} and χ²_{1−α/2, n−1}:
Reject H₀ if Y_cal > χ²_{α/2, n−1} or Y_cal < χ²_{1−α/2, n−1}.

Hypothesis: H₀ : p = p₀, H₁ : p ≠ p₀.
Critical region (for level α):
C = {(x₁, x₂, x₃, ..., xn) : |X̄ − p₀| > c}
α = P_{H₀}(|X̄ − p₀| > c)
Under H₀, X̄ ∼ N(p₀, p₀(1 − p₀)/n) approximately, so

Z = (X̄ − p₀)/√(p₀(1 − p₀)/n) ∼ N(0, 1)

P( −c/√(p₀(1−p₀)/n) ≤ (X̄ − p₀)/√(p₀(1−p₀)/n) ≤ c/√(p₀(1−p₀)/n) ) = 1 − α

c = z_{α/2}√(p₀(1 − p₀)/n)

β(p₁) = P_{H₁}(acceptance of H₀)
      = P_{H₁}(|X̄ − p₀| ≤ c)
      = P_{H₁}(−c ≤ X̄ − p₀ ≤ c)
      = P( (−c + p₀ − p₁)/√(p₁(1−p₁)/n) ≤ Z ≤ (c + p₀ − p₁)/√(p₁(1−p₁)/n) )

Z_cal = (X̄ − p₀)/√(p₀(1 − p₀)/n)

If Z_cal > z_{α/2} or Z_cal < −z_{α/2} then reject H₀.


Note: 1 − β is known as the power of the test.

Two sample test

Hypothesis: H₀ : p₁ = p₂ = p₀, H₁ : p₁ ≠ p₂.
Here two samples (X₁, ..., Xn), (Y₁, ..., Ym) from independent populations are considered. If H₀ is true, X̄ and Ȳ estimate the same proportion.
Both X̄ and Ȳ are approximately normal, and

X̄ − Ȳ ∼ N( 0, p₀(1 − p₀)(1/n + 1/m) )

Z_cal = [ (X̄ − Ȳ) − 0 ] / √( p₀(1 − p₀)(1/n + 1/m) )

Reject H₀ if Z_cal < −z_{α/2} or Z_cal > z_{α/2}.
Note: If H₀ : p₁ = p₂ = p₃ = p₄ = ... and H₁ : H₀ is not true, this is known as analysis of variance, and here the F-distribution is used.

Goodness of fit test

χ² = Σ_{i=1}^{n} (oᵢ − eᵢ)²/eᵢ ∼ χ²_{n−1}

H₀ is rejected if χ² is large, i.e. χ² > χ²_{α, n−1}.
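A minimal Python sketch of the statistic (mine; the die counts below are made-up illustration data, and 11.07 is the tabulated χ²_{0.05, 5} value):

```python
def chi_square_stat(observed, expected):
    """Pearson statistic: sum of (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 120 rolls of a die, expected 20 per face under fairness.
obs = [18, 24, 16, 20, 22, 20]
stat = chi_square_stat(obs, [20.0] * 6)  # = 2.0
# Compare with chi^2_{0.05, 5} ≈ 11.07: stat is small, so H0 (fair die) is not rejected.
```

Here df = 6 − 1 = 5, and the statistic is far below the critical value.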

Stochastic Process
A collection of random variables {Xt : t ∈ T} varying with respect to time is called a stochastic process. Here Xt is a random variable at time t.

Categories of Stochastic processes
• Discrete time - Discrete state
  For example: number of people coming to a doctor every day.

• Continuous time - Discrete state
  For example: number of calls in every hour.

• Discrete time - Continuous state
  For example: total rainfall measured at the end of each day.

• Continuous time - Continuous state

Statistical Properties
Let {Xt : t ∈ T} be a stochastic process.

1. Distribution Function:
First order: F_t(x) = P(Xt ≤ x)
Second order: F_{t₁,t₂}(x, y) = P(X_{t₁} ≤ x, X_{t₂} ≤ y).

2. Expectation:
η(t) = E(Xt) = ∫ x f_t(x) dx.

3. Autocorrelation:
R(t₁, t₂) = E(X_{t₁} · X_{t₂}).

4. Average power:
R(t, t) = E(Xt²)
Note: This is the autocorrelation at the same time.

5. Autocovariance:
C(t₁, t₂) = E(X_{t₁} · X_{t₂}) − E(X_{t₁}) · E(X_{t₂}) = R(t₁, t₂) − η(t₁) · η(t₂).

Increments of a Stochastic process

Let t₁ ≤ t₂ ≤ t₃ ≤ ...; then (X_{t₂} − X_{t₁}), (X_{t₃} − X_{t₂}), ... are known as the increments.

• Independent increments
If (X_{t₂} − X_{t₁}), (X_{t₃} − X_{t₂}), ... are all independent, then we say the stochastic process has independent increments.

• Stationary increments
If the distribution of (X_{t+h} − X_t) does not depend on t but depends only on h, i.e. increments over intervals of the same length follow the same distribution, then we say the stochastic process has stationary increments.
For example: distribution of (X₁₅ − X₁₀) = distribution of (X₈ − X₃).

Counting process
A stochastic process {Nt : t ≥ 0} is said to be a counting process if Nt is the number of events occurring in the time interval (0, t].

Poisson process
A counting process {Nt : t ≥ 0} with N₀ = 0 is said to be a Poisson process if

(a) The increments are independent.

(b) The increments are stationary with N_{t+s} − N_s ∼ P(λt), where λ is the arrival rate:

P(N_{t+s} − N_s = r) = ((λt)^r / r!) e^{−λt}

Interarrival time
The time between two arrivals is said to be the interarrival time.
T₁ denotes the time of occurrence of the first event.
T₂ denotes the time between the first and second events.
Tn denotes the time between the (n − 1)th and nth events.
P(T₁ > t) denotes the probability that the first arrival takes more than time t:
P(T₁ > t) = P(no arrival in (0, t]) = P(N(t) = 0) = e^{−λt}.
The interarrival times follow the exponential distribution with mean 1/λ.
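The link "exponential interarrivals ⇒ Poisson counts" can be simulated. A minimal Python sketch (my own, not from the notes):

```python
import random

def poisson_process_count(lam, t, rng):
    """Number of arrivals in (0, t] when interarrival times are Exp(lam)."""
    count, clock = 0, 0.0
    while True:
        clock += rng.expovariate(lam)  # mean interarrival time 1/lam
        if clock > t:
            return count
        count += 1

rng = random.Random(3)
counts = [poisson_process_count(2.0, 5.0, rng) for _ in range(20_000)]
avg = sum(counts) / len(counts)  # should be near lam * t = 10
```

With λ = 2 and t = 5, the count N_t has mean λt = 10, which the empirical average reproduces.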

Markov process
A stochastic process {Xt : t ≥ 0} is said to be a Markov process if

P(X_{n+1} = a_{n+1} | X₀ = a₀, X₁ = a₁, ..., Xn = an) = P(X_{n+1} = a_{n+1} | Xn = an)

Transition Probability Matrix

P = | p₁₁ p₁₂ p₁₃ |
    | p₂₁ p₂₂ p₂₃ |
    | p₃₁ p₃₂ p₃₃ |

Here p_{ij} = P(X₁ = j | X₀ = i) denotes the probability of moving from state i to state j.
