
1 Introduction

There are two approaches to probability:

• Axiomatic approach

• Classical approach

2 Random Experiment
It is an experiment in which

• The set of all possible outcomes of experiment is known in advance.

• Any specific outcome is not known in advance.

• Experiment can be repeated under identical conditions.

3 Sample Space
It is a pair (S, Ω) where

• S is set of all possible outcomes of the experiment.

• Ω is a σ - field of subsets of S.

• Example:-
If we toss a coin, then

S = {H, T }
and

Ω = {{H}, {T}, {H, T}, ϕ}

• Events:- Subsets of Sample space are called events.

4 Operations on sets
For A, B ⊆ S

Event     Set
A or B    A ∪ B
A and B   A ∩ B
not A     A^C
5 De Morgan's Laws
1. (A ∪ B)^C = A^C ∩ B^C

2. (A ∩ B)^C = A^C ∪ B^C

6 Sigma Field
Let S be a set and let Ω be a collection of subsets of S satisfying the following conditions:-

1. ϕ ∈ Ω.

2. If A ∈ Ω, then A^C ∈ Ω.

3. If A, B ∈ Ω, then A ∪ B ∈ Ω.

Example:-

S = {1, 2, 3, 4, 5, 6}
If A = {1, 2, 3}
B = {4, 5}
then

Ω = {ϕ, S, A, A^C, B, B^C, A ∪ B, A^C ∩ B^C}


7 Probability
Let S be a sample space and Ω a sigma field of subsets of S. Then probability is a function

P : Ω → [0, 1].

satisfying the following conditions:-

1. 0 ⩽ P (A) ⩽ 1; ∀A ∈ Ω

2. P (S) = 1

3. If A1, A2, . . . , An are such that Ai ∩ Aj = ϕ for all i ≠ j, then

   P(A1 ∪ A2 ∪ · · · ∪ An) = Σ_{i=1}^n P(Ai)
7.1 Properties
1. P(A^C) = 1 − P(A)
2. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
3. P (A∪B ∪C) = P (A)+P (B)+P (C)−P (A∩B)−P (B ∩C)−P (C ∩A)+P (A∩B ∩C)
Remark:- The triple (S, Ω, P ) is called probability space.

8 Conditional Probability
Let (S, Ω, P) be a probability space and let H ∈ Ω with P(H) > 0. Then for an arbitrary A ∈ Ω, the quantity

P(A/H) = P(A ∩ H) / P(H)
is called conditional probability of A given H.

9 Bayes Theorem
Let S be a sample space and B1 , B2 , . . . , Bn be a partition of S, then for any A ⊆ S

P(Bi/A) = P(Bi) · P(A/Bi) / Σ_{j=1}^n P(Bj) · P(A/Bj)
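The formula above can be computed directly once the priors P(Bi) and likelihoods P(A/Bi) are known. A minimal sketch, using hypothetical numbers (three machines producing 50%, 30%, 20% of items with defect rates 1%, 2%, 3% — none of these figures are from the notes):

```python
# Bayes' theorem for a finite partition B1..Bn:
# P(Bi | A) = P(Bi) * P(A | Bi) / sum_j P(Bj) * P(A | Bj)
def posterior(priors, likelihoods):
    """priors[i] = P(Bi), likelihoods[i] = P(A|Bi); returns the list of P(Bi|A)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(A), by the law of total probability
    return [j / total for j in joint]

# Hypothetical example: which machine produced a defective item?
post = posterior([0.5, 0.3, 0.2], [0.01, 0.02, 0.03])
```

The posteriors always sum to 1, since the denominator is exactly P(A).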

10 Independent Events
Two events, A and B are said to be independent if
P (A ∩ B) = P (A) · P (B)
Remarks:-
1. If A and B are independent, then
• AC and B are independent
• A and B C are independent
• AC and B C are independent.
2. A, B, and C are (mutually) independent if
• P(A ∩ B ∩ C) = P(A) · P(B) · P(C)
• P(A ∩ B) = P(A) · P(B)
• P(B ∩ C) = P(B) · P(C)
• P(A ∩ C) = P(A) · P(C)
The pairwise conditions (2), (3), (4) do NOT imply (1).

11 Random Variable
Definition:- A random variable X is a function that maps elements of the sample space to real numbers.

Example:- In tossing of coin twice: If X : no. of heads, then X can have values 0, 1, 2.
X Events
0 TT
1 HT, T H
2 HH
Here {0, 1, 2} is called Support of Random variable X.
Let X be a random variable having support {x1 , x2 , . . . , xn } with

P (X = xi ) = pi
then the probability distribution of X is represented as
X:     x1   x2   · · ·   xn
p(x):  p1   p2   · · ·   pn

provided Σ_{i=1}^n pi = 1 and pi ⩾ 0.
The pi's form the Probability Mass Function (PMF).
Example:- Getting first head in repeated tossing of a coin.

S = {H, T H, T T H, T T T H, . . .}
X = Number of tosses to get first head.

x:     1     2     3     4     ...
P(x):  1/2   1/4   1/8   1/16  ...

Σ_{i=1}^∞ pi = 1/2 + 1/4 + 1/8 + · · · = (1/2) / (1 − 1/2) = 1

12 Types of Random Variables


Random variables are of two types:-

• Discrete : If S is mapped onto a finite or countably infinite set. For example:- tossing a coin, throwing a die.

• Continuous : If S is mapped onto an uncountable set. For example:- velocity, length, height, etc.
13 Cumulative Distribution function: (CDF)
A CDF FX of random variable X is defined as:-
FX (x) = P (X ⩽ x); ∀x ∈ R
For example if

x 0 1 2
P (x) 1/4 1/2 1/4
then

FX(−3) = P(X ⩽ −3) = 0
FX(0.5) = P(X ⩽ 0.5) = P(X = 0) = 1/4
FX(1) = P(X ⩽ 1) = P(X = 0) + P(X = 1) = 3/4
FX(2) = P(X ⩽ 2) = 1

13.1 Properties
• FX (−∞) = P (X ≤ −∞) = 0
• FX (∞) = P (X ≤ ∞) = 1
• FX is right continuous:

  FX(a) = FX(a+)

• It is a monotonically increasing function, i.e. for x1 ⩽ x2,

  FX(x1) ⩽ FX(x2)

• Note that we can write

  P(X = a) = FX(a) − FX(a−)
Example:-


FX(x) = 0    ; x < 0
      = 1/4  ; 0 ⩽ x ⩽ 1
      = 3/4  ; 1 < x < 2
      = 1    ; x ⩾ 2

It is NOT a CDF, as

FX(1) ≠ FX(1+)

because it is not right continuous.


Remark:- The graph of FX(x) has jumps at the points x = xi, and the sizes of these jumps are the probabilities P(X = xi); from them we can recover the distribution.

13.2 Continuous Random Variable
If the CDF FX(x) of a random variable X is continuous, then X is called a continuous random variable.

• For a continuous random variable, P(X = a) = 0 for every a. For example:-

  FX(x) = 0 ; x < 0
        = x ; 0 ⩽ x < 1
        = 1 ; x ⩾ 1

• In such a case

  P(a < X < b) = P(a ⩽ X < b) = P(a < X ⩽ b) = P(a ⩽ X ⩽ b)

14 Probability density function


Since for a continuous variable we don't have a pi for each X = xi, we instead use a non-negative function fX(x), called the probability density function of the random variable X, such that

FX(x) = ∫_{−∞}^{x} fX(t) dt

Note that

∫_{−∞}^{∞} fX(t) dt = 1

For example:- fX(x) = 1 for 0 ⩽ x < 1 and 0 otherwise.

Remark:- For a continuous random variable,

d/dx FX(x) = FX′(x) = fX(x)

15 Moments
15.1 Expectation of X (1st Moment,Mean/Average)
E(X) = Σ_{i=1}^n xi pi

Ex:- X = outcome of rolling a die
X:  1    2    3    4    5    6
P:  1/6  1/6  1/6  1/6  1/6  1/6

E(X) = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + · · · + 6 · 1/6 = 3.5
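The die-roll expectation above is just a weighted sum, which can be sketched exactly with rational arithmetic:

```python
from fractions import Fraction

# E(X) = sum_i x_i * p_i for a fair die
support = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
mean = sum(x * p for x, p in zip(support, probs))
# mean == Fraction(7, 2), i.e. 3.5
```

Using `Fraction` avoids any floating-point rounding in the probabilities.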

• For a countably infinite support:

E(X) = Σ_{i=1}^∞ xi pi

exists if E(|X|) < ∞.

15.2 Expectation of a function of X


Let g(X) be a function of X, where X has probability distribution

X:  x1   x2   · · ·   xn
p:  p1   p2   · · ·   pn

Then

E[g(X)] = Σ_{i=1}^n g(xi) pi.

Ex:
X:  −1   0    1
p:  1/3  1/3  1/3

E(X) = (−1) · 1/3 + 0 · 1/3 + 1 · 1/3 = 0

Now find E(X²). Let Y = X²:
Y:  0    1
p:  1/3  2/3

E(Y) = 0 · 1/3 + 1 · 2/3 = 2/3

On the other hand we can find it as

E(X²) = Σ_{i=1}^n xi² pi = (−1)² · 1/3 + 0² · 1/3 + 1² · 1/3 = 2/3


Case 1: If g(X) = X^r, then

E(X^r) = Σ_{i=1}^n xi^r pi

This is called the rth moment about the origin, and

E[(X − a)^r] = Σ_{i=1}^n (xi − a)^r pi

is called the rth moment about the point a.
Case 2: Second moment about the mean:

E[(X − E(X))²] = Σ_{i=1}^n [xi − E(X)]² pi

Let E(X) = µ; then

E[(X − µ)²] = Σ_{i=1}^n (xi − µ)² pi

The quantity E[(X − E(X))²] is called the variance of X (a measure of uncertainty) and is denoted Var(X).

Var(X) = E[(X − µ)²] = Σ_{i=1}^n (xi − µ)² pi
       = Σ xi² pi − 2µ Σ xi pi + µ² Σ pi
       = E(X²) − [E(X)]²

Var(X) ≥ 0, E(X − µ) = 0, and E(|X − µ|) is the mean deviation.
It is usually better to work with x² instead of |x|: the square is smooth, has more convenient properties, and is easier to handle.

Var(X) = σ² = E[(X − µ)²] = E(X²) − (E(X))²

σ is known as the standard deviation.


Ex: Consider
X:  −1   0    1    2
p:  1/6  1/3  1/3  1/6

E(X) = −1/6 + 0 + 1/3 + 2/6 = 1/2

σ² = E(X²) − [E(X)]² = (1/6 + 1/3 + 4/6) − (1/2)² = 7/6 − 1/4 = 11/12
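The mean and variance in the example above can be verified exactly:

```python
from fractions import Fraction

# The distribution from the example: X in {-1, 0, 1, 2}
xs = [-1, 0, 1, 2]
ps = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 3), Fraction(1, 6)]

mu = sum(x * p for x, p in zip(xs, ps))                  # E(X) = 1/2
var = sum(x * x * p for x, p in zip(xs, ps)) - mu ** 2   # E(X^2) - [E(X)]^2 = 11/12
```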

The Moment Generating Function (MGF) of X is defined as

MX(t) = E(e^{Xt}) = Σ_i e^{xi t} · pi

Ex:
X:  −1   0    1    2
P:  1/6  1/3  1/3  1/6

MX(t) = e^{−t} · 1/6 + e^0 · 1/3 + e^t · 1/3 + e^{2t} · 1/6
• E(aX + b) = aE(X) + b

• E(a) = a

• Var(aX + b) = a² Var(X)

• MX(t) = E(e^{Xt}) = E(1 + Xt + X²t²/2! + . . .) = 1 + tE(X) + (t²/2!)E(X²) + . . .

• d/dt MX(t) = E(X) + t E(X²) + . . .

• E(X) = d/dt MX(t) |_{t=0}

• E(X²) = d²/dt² MX(t) |_{t=0}

• E(X^r) = d^r/dt^r MX(t) |_{t=0}

The MGF of a probability distribution is uniquely defined.
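The fact that E(X) = MX′(0) can be checked numerically. A sketch for the four-point distribution used above, approximating the derivative at 0 by a central difference:

```python
import math

# M_X(t) for X: -1, 0, 1, 2 with p: 1/6, 1/3, 1/3, 1/6
def M(t):
    return math.exp(-t) / 6 + 1 / 3 + math.exp(t) / 3 + math.exp(2 * t) / 6

# E(X) = M'(0), approximated by a central difference
h = 1e-6
mean_approx = (M(h) - M(-h)) / (2 * h)   # should be close to E(X) = 1/2
```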

If the CDF of X is continuous, then we have

FX(a) = FX(a+) = FX(a−)

P(X = a) = 0.

If X is continuous, then

P(a < X < b) = P(a ⩽ X < b) = P(a < X ⩽ b) = P(a ⩽ X ⩽ b)

16 Probability Density Function
Let X be a continuous random variable with CDF FX. Then fX is called the pdf of X if

FX(x) = ∫_{−∞}^{x} fX(t) dt

In the case of a discrete random variable, we call it the probability mass function.

We have FX(∞) = 1, that is,

∫_{−∞}^{∞} fX(x) dx = 1

Here FX(x) is the probability distribution function of X.

A point probability for a continuous random variable is 0:

P(X = a) = FX(a) − FX(a−) = ∫_{−∞}^{a} fX(x) dx − ∫_{−∞}^{a−} fX(x) dx = 0

Both integrals are the same, so the point probability is zero.

P(a ≤ X ≤ b) = P(a < X < b) = FX(b) − FX(a)
             = ∫_{−∞}^{b} fX(x) dx − ∫_{−∞}^{a} fX(x) dx
             = ∫_{a}^{b} fX(x) dx

where X represents the random variable and x represents an observed value.

Characteristics of fX :-
1) fX (the pdf) should be non-negative.
2) ∫_{−∞}^{∞} fX(x) dx = 1, since FX(∞) = 1.

Question: f(x) = 3x² for 0 < x < 1 and 0 otherwise. Is it a valid density?

Solution:- Clearly 3x² is non-negative, and

∫_{−∞}^{∞} fX(x) dx = ∫_0^1 3x² dx = [x³]_0^1 = 1

So f(x) is a valid density function.

Find P(1/2 < X < 3/4):

P(1/2 < X < 3/4) = ∫_{1/2}^{3/4} 3x² dx = [x³]_{1/2}^{3/4} = 27/64 − 8/64 = 19/64

Also,

FX(a) = ∫_{−∞}^{a} fX(x) dx

is continuous, which holds because

∫_{−∞}^{a+} fX(x) dx = ∫_{−∞}^{a} fX(x) dx = ∫_{−∞}^{a−} fX(x) dx

Case:- When X is a discrete random variable:

E(X) = Σ_{i=1}^n pi xi
E(g(X)) = Σ_{i=1}^n g(xi) pi

Case:- When X is a continuous random variable:

E(X) = ∫_{−∞}^{∞} x fX(x) dx
E(g(X)) = ∫_{−∞}^{∞} g(x) fX(x) dx

provided ∫_{−∞}^{∞} |g(x)| fX(x) dx < ∞.
Question: fX(x) = k for 0 < x < 1 and 0 otherwise. Find k.

∫_0^1 k dx = 1 ⇒ k[x]_0^1 = 1 ⇒ k = 1

So with k = 1, fX(x) is a valid (normalised) density.

Question: fX(x) = kx for 0 < x < 2 and 0 otherwise.
Find (i) P(−1 < X < 1), (ii) E(X), (iii) Var(X).
Solution:
∫_{−∞}^{∞} kx dx = 1
⇒ ∫_0^2 kx dx = 1
⇒ (k/2)[x²]_0^2 = 1 ⇒ k(4 − 0)/2 = 1
⇒ k = 1/2

So we have fX(x) = x/2 for 0 < x < 2 and 0 otherwise.

(i) P(−1 < X < 1) = ∫_0^1 (x/2) dx = 1/4.

(ii) E(X) = ∫_{−∞}^{∞} x fX(x) dx = ∫_0^2 x · (x/2) dx = ∫_0^2 (x²/2) dx = (1/6)[x³]_0^2 = 4/3

(iii) Var(X) = E(X²) − [E(X)]²
            = ∫_0^2 x² · (x/2) dx − 16/9
            = (1/8)[x⁴]_0^2 − 16/9
            = 2 − 16/9 = 2/9
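The moments of fX(x) = x/2 on (0, 2) can be verified numerically with the same kind of midpoint integrator:

```python
# Midpoint-rule approximation of the integral of f over [a, b]
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

pdf = lambda x: x / 2                               # density on (0, 2)
mean = integrate(lambda x: x * pdf(x), 0, 2)        # E(X) = 4/3
second = integrate(lambda x: x * x * pdf(x), 0, 2)  # E(X^2) = 2
var = second - mean ** 2                            # Var(X) = 2/9
```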
Characteristic function:

ϕX(t) = E(e^{iXt}) = ∫_{−∞}^{∞} e^{ixt} fX(x) dx

|ϕX(t)| ≤ ∫_{−∞}^{∞} |e^{ixt}| fX(x) dx = ∫_{−∞}^{∞} fX(x) dx = 1

So we have |ϕX(t)| ≤ 1.


If MX(t) exists, then

ϕX(t) = MX(it)

Inverse Fourier transformation:

fX(x) = (1/2π) ∫_{−∞}^{∞} e^{−ixt} ϕX(t) dt

ϕX(t) is uniquely defined for a probability distribution:

ϕX(t) ←→ fX(x)

MX(it) = ϕX(t) = E(e^{iXt}) = 1 + (it)E(X) + ((it)²/2!)E(X²) + · · ·

E(X) = (1/i) d/dt ϕX(t) |_{t=0}

• MX(t) = E(e^{Xt})

• MX(0) = 1

• M_{aX+b}(t) = E(e^{(aX+b)t}) = e^{bt} E(e^{aXt}) = e^{bt} MX(at)
17 Functions of Random Variables:
Let X have probability distribution

X:   −1   0    1    2
pX:  1/3  1/6  1/6  1/3

Let Y = g(X) → can we obtain the probability distribution of Y?

Let Y = e^X:
Y:  e^{−1}  1    e    e²
p:  1/3     1/6  1/6  1/3

Let Y = X²:
Y:  0    1    4
p:  1/6  1/2  1/3
Let X be a continuous random variable with pdf fX(x). What is the pdf of Y = g(X)?

Let fX(x) = 1 for 0 < x < 1 and 0 otherwise, and let Y = e^X. Is the answer simply

fY(y) = 1 for 0 < y < e and 0 otherwise?

No: this fY does not integrate to 1, so it is not a pdf. Instead we work through the CDF:

FX(x) = ∫_{−∞}^{x} fX(t) dt

By the Leibniz rule,

d/dx ∫_{g1(x)}^{g2(x)} fX(t) dt = fX(g2(x)) g2′(x) − fX(g1(x)) g1′(x)

and in particular d/dx FX(x) = fX(x).

So we first find FY(y) and from it obtain fY(y).
FY(y) = P(Y ≤ y)
      = P(g(X) ≤ y)
      = P(X ≤ g^{−1}(y))      (for increasing g)
      = FX(g^{−1}(y))

fY(y) = d/dy FY(y)

∴ fY(y) = fX(g^{−1}(y)) · d/dy g^{−1}(y)
Let Y = e^X; find fY(y).

Solution:
FY(y) = P(Y ≤ y)
      = P(e^X ≤ y)   (we can invert here because e^x is monotonically increasing)
      = P(X ≤ ln y)
      = FX(ln y)

fY(y) = fX(ln y) · d/dy (ln y) = 1/y ;  1 < y < e.

By the formula:

fY(y) = fX(g^{−1}(y)) · d/dy g^{−1}(y) = fX(ln y) · (1/y) = 1/y ;  1 < y < e,

as fX(ln y) = 1.
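The derived law for Y = e^X can be checked by simulation: the density 1/y on (1, e) integrates to F_Y(y) = ln y, so the empirical CDF of simulated values should match ln y. A sketch:

```python
import math
import random

random.seed(0)
# X ~ U(0,1), Y = e^X; the derived CDF is F_Y(y) = ln y on (1, e)
n = 200_000
ys = [math.exp(random.random()) for _ in range(n)]

y0 = 2.0
empirical = sum(y <= y0 for y in ys) / n   # should approach ln(2) ≈ 0.693
```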
Theorem. Let X be a continuous random variable with pdf fX(x) and let Y = g(X), where g is monotonic and differentiable. Then the pdf of Y is given by

fY(y) = fX(g^{−1}(y)) |d/dy g^{−1}(y)| ,  y ∈ S(Y).

Exercise. Let X be a random variable with pdf

fX(x) = 1 ; 0 < x < 1, and 0 otherwise,

and

gX(x) = 2 ; −1 < x < 1, and 0 otherwise.

Find the density of Y = X².

Bernoulli Distribution: A Bernoulli trial is an experiment with two possible outcomes: success (1) or failure (0), with probabilities p and 1 − p respectively. Mathematically, we denote it as P(X = 1) = p and P(X = 0) = 1 − p with 0 < p < 1. Then X ∼ Bernoulli(p).

Bernoulli Trials: A sequence of n trials is said to be a sequence of Bernoulli trials if it satisfies the following conditions:
1. Each trial results in either success or failure.
2. The probability of success/failure remains constant across trials.
3. The trials are independent.
Binomial Distribution: A random variable X is said to have a binomial distribution, denoted by X ∼ Bin(n, p), if its probability mass function is given by

P(X = k) = C(n, k) p^k (1 − p)^{n−k} ,  k = 0, 1, 2, · · · , n,

where n is the number of trials, p is the success probability, and k is the number of successes.

Note:
1. i.i.d. → independent and identically distributed.
2. E[X] = Σ_{x=0}^n x P(X = x) = Σ_{x=0}^n x C(n, x) p^x (1 − p)^{n−x} = np.
3. V[X] = np(1 − p).
4. Let X ∼ Bin(n, p) and Y ∼ Bin(m, p) be independent; then X + Y ∼ Bin(n + m, p).
5. Let X ∼ Bin(n, p); then n − X ∼ Bin(n, 1 − p).
Poisson Distribution: It counts the number of successes in n trials, where n is large and the success probability is very small. A random variable X is said to have a Poisson distribution, denoted by X ∼ Poisson(λ), if its probability mass function is given by

P(X = k) = e^{−λ} λ^k / k! ,  k = 0, 1, 2, · · · .

Theorem. Let X ∼ Bin(n, p) with λ = np held constant; then X → Poisson(λ) as n → ∞.

Proof.

P(X = k) = C(n, k) p^k (1 − p)^{n−k}
         = [n! / (k!(n − k)!)] (λ/n)^k (1 − λ/n)^{n−k}
         = (λ^k / k!) · (1 − λ/n)^n · [n! / (n^k (n − k)!)] · (1 − λ/n)^{−k}

Now, as n → ∞, (1 − λ/n)^n → e^{−λ} while the last two factors tend to 1, so we arrive at

P(X = k) = e^{−λ} λ^k / k! ,  k = 0, 1, 2, ...
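The binomial-to-Poisson limit above is easy to check numerically: for a fixed λ = np and large n, the two pmfs become nearly indistinguishable. A sketch:

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# With lambda = n*p held fixed, Bin(n, p) approaches Poisson(lambda) as n grows
lam, n = 3.0, 100_000
diff = max(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k)) for k in range(20))
```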

Note:
1. E[X^k] = M_X^{(k)}(t) |_{t=0}, where MX(t) = E[e^{Xt}].

2. The Moment Generating Function (MGF) of a Poisson random variable is given as follows:

MX(t) = Σ_{x=0}^∞ e^{xt} P(X = x)
      = Σ_{x=0}^∞ e^{xt} e^{−λ} λ^x / x!
      = e^{−λ} Σ_{x=0}^∞ (e^t λ)^x / x!
      = e^{λ(e^t − 1)}.

3. MX′(t) = λ e^t e^{λ(e^t − 1)}. Thus MX′(0) = E[X] = λ.

4. MX′′(0) = E[X²] = λ² + λ.

5. Var[X] = E[X²] − (E[X])² = λ.

Theorem. Let X ∼ Poisson(λ) and Y ∼ Poisson(µ) be independent; then X + Y ∼ Poisson(λ + µ).

Geometric Distribution:

Definition 1: Let the random variable X count the number of failures before the first success in a sequence of Bernoulli trials with success probability p. Then X is said to have a geometric distribution, denoted X ∼ Geo(p), if its probability mass function is given by

P(X = k) = (1 − p)^k p ,  k = 0, 1, 2, · · · .

Definition 2: Let the random variable X count the number of trials needed to get the first success in a sequence of Bernoulli trials with success probability p. Then X ∼ Geo(p) if its probability mass function is given by

P(X = k) = (1 − p)^{k−1} p ,  k = 1, 2, 3, · · · .
Negative Binomial Distribution:

Definition 1: Let the random variable X count the number of failures before the rth success in a sequence of Bernoulli trials with success probability p. Then X is said to have a negative binomial distribution, denoted X ∼ NB(r, p), if its probability mass function is given by

P(X = k) = C(k + r − 1, r − 1) p^r (1 − p)^k ,  k = 0, 1, 2, · · · .

Definition 2: Let the random variable X count the number of trials needed to get the rth success in a sequence of Bernoulli trials with success probability p. Then X ∼ NB(r, p) if its probability mass function is given by

P(X = k) = C(k − 1, r − 1) p^r (1 − p)^{k−r} ,  k = r, r + 1, · · · .

Theorem. Let X1, X2, · · · , Xr be i.i.d. random variables with each Xi ∼ Geo(p). Define X = X1 + X2 + · · · + Xr; then X ∼ NB(r, p).
Coupon collector problem: Each box of a brand of cereal contains a coupon, and there are n different types of coupons. Assume that each time you collect a coupon it is equally likely to be any of the n types. What is the expected number of coupons needed until you have a complete set?

Solution: Let N be the total number of coupons needed to get all n types. Write N = N1 + N2 + · · · + Nn, where Ni is the number of coupons needed to get the ith new type. Then Ni − 1 ∼ Geo((n − (i − 1))/n), so E[Ni] = n/(n − i + 1), and

E[N] = E[N1] + E[N2] + · · · + E[Nn] = 1 + n/(n − 1) + n/(n − 2) + · · · + n/1.
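The coupon-collector formula can be checked by simulation; for n = 6 (a die's worth of coupon types) the exact answer is 6(1 + 1/2 + · · · + 1/6) = 14.7:

```python
import random
from fractions import Fraction

def expected_coupons(n):
    # E[N] = n * (1/1 + 1/2 + ... + 1/n), the formula derived above
    return n * sum(Fraction(1, i) for i in range(1, n + 1))

def simulate(n, trials=20_000, seed=1):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n:
            seen.add(rng.randrange(n))
            draws += 1
        total += draws
    return total / trials

n = 6
exact = float(expected_coupons(n))   # 14.7
approx = simulate(n)                 # close to 14.7
```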
Uniform Distribution: A random variable X is said to have a uniform distribution over an interval (a, b) if its probability density function is given by

fX(x) = 1/(b − a) ,  a < x < b.

Theorem. Let X be a random variable with continuous cumulative distribution function (CDF) F(x); then Y = F(X) ∼ U(0, 1).
Gamma Distribution: Let X be a random variable having density

fX(x) = (1 / (Γ(α) β^α)) x^{α−1} e^{−x/β} ,  x > 0.

Then X is said to follow a Gamma distribution with positive parameters α and β, denoted X ∼ G(α, β).

Note:

1. Γ(n) = ∫_0^∞ x^{n−1} e^{−x} dx, for n > 0.

2. Γ(1/2) = √π.

3. Γ(n) = (n − 1)Γ(n − 1), n > 1.

4. Γ(n) = (n − 1)! for positive integer n.

5. Γ(n)/a^n = ∫_0^∞ x^{n−1} e^{−ax} dx.

E(X) = ∫_0^∞ x · (1/(Γ(α) β^α)) x^{α−1} e^{−x/β} dx
     = (1/(Γ(α) β^α)) Γ(α + 1) β^{α+1}
     = αβ

Var(X) = αβ²
Exponential Distribution: If we put α = 1 in the Gamma density, we get fX(x) = (1/β) e^{−x/β}, x > 0, which is the exponential distribution.

X is said to be an exponential random variable if its pdf is given by

fX(x) = (1/β) e^{−x/β}, x > 0    OR    fX(x) = λ e^{−λx}, x > 0

FX(x) = 1 − e^{−x/β}, x > 0      OR    FX(x) = 1 − e^{−λx}, x > 0

E(X) = β, Var(X) = β²            OR    E(X) = 1/λ, Var(X) = 1/λ²

Memoryless Property:

P(X > m + n | X > m) = P(X > n)
Proof.

P(X > m + n | X > m) = P(X > m + n, X > m) / P(X > m)
                     = P(X > m + n) / P(X > m)
                     = (1 − FX(m + n)) / (1 − FX(m))
                     = (1 − (1 − e^{−(m+n)/β})) / (1 − (1 − e^{−m/β}))
                     = e^{−n/β}
                     = 1 − FX(n)
                     = P(X > n)

Chi-Square Distribution: In X ∼ G(α, β), if we put α = n/2 and β = 2, then we get

fX(x) = (1 / (Γ(n/2) 2^{n/2})) x^{n/2 − 1} e^{−x/2} ,  x > 0.

X ∼ G(n/2, 2) is called a chi-square random variable, written X ∼ χ²(n), with

E(X) = n,  Var(X) = 2n.

Normal/Gaussian Distribution: A random variable X is said to follow a normal distribution if its pdf is given by

fX(x) = (1 / (√(2π) σ)) e^{−(1/2)((x−µ)/σ)²} ,  x ∈ R.

We write X ∼ N(µ, σ²), where

E(X) = µ and Var(X) = σ².

The moment generating function of the normal distribution is

MX(t) = e^{µt + σ²t²/2}
Proof.

MX(t) = E(e^{Xt}) = (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{xt} e^{−(1/2)((x−µ)/σ)²} dx
      = (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(1/(2σ²))(x² − 2xµ + µ² − 2σ²xt)} dx
      = (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(1/(2σ²))[x² − 2x(µ + σ²t) + (µ + σ²t)² + µ² − (µ + σ²t)²]} dx
      = e^{−(1/(2σ²))[µ² − (µ + σ²t)²]} · (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(1/(2σ²))[x − (µ + σ²t)]²} dx
      = e^{µt + σ²t²/2}

since the remaining integral equals 1 (it is a normal density with mean µ + σ²t).

Standardization (Normalization): Let X ∼ N(µ, σ²); we can normalize it by subtracting µ and dividing by σ:

Z = (X − µ)/σ ,  where E(Z) = 0 and Var(Z) = 1,

and the density function of the standard normal distribution is

fZ(z) = (1/√(2π)) e^{−z²/2}.

We write Z ∼ N(0, 1); Z is the standard normal random variable.

Properties of the Normal distribution:

1. E(X) = µ and Var(X) = σ².

2. If Z = (X − µ)/σ, then Z ∼ N(0, 1).

3. aX + b ∼ N(aµ + b, a²σ²).

4. Mean = Mode = Median.

5. E(Z^{2n+1}) = 0: z^{2n+1} is an odd function, so the integral is 0.

6. E(Z³) = 0, so the skewness is zero.
   Skewness = E(((X − µ)/σ)³). {Skewness is zero for a symmetric curve.}

7. E(Z⁴) = 3, so the kurtosis is 3.

Note: A probability distribution can be determined uniquely by

i. Probability density function.

ii. Moment generating function.

Proof. 3. Let Y = aX + b. Then

MY(t) = E(e^{(aX+b)t})
      = e^{bt} E(e^{atX})
      = e^{bt} MX(at)
      = e^{bt} e^{µat + σ²a²t²/2}
      = e^{(aµ+b)t + (σ²a²)t²/2}

so aX + b ∼ N(aµ + b, a²σ²).

Example: X ∼ N(5.3, 1) and Y ∼ N(5.5, 1.5); then (X − 5.3)/1 ∼ N(0, 1) and (Y − 5.5)/√1.5 ∼ N(0, 1).

Example: Let X ∼ N (15, 9) find P (12 < X < 20).


Solution:

P(12 < X < 20) = P((12 − 15)/3 < (X − 15)/3 < (20 − 15)/3)
               = P(−1 < Z < 5/3)
               = P(Z < 5/3) − P(Z < −1)

The values of P(Z < 5/3) and P(Z < −1) come from the standard normal table.
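Instead of a printed table, the standard normal CDF can be evaluated with the error function; a sketch for the example above:

```python
import math

def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# P(12 < X < 20) for X ~ N(15, 9): standardize with mu = 15, sigma = 3
p = phi(5 / 3) - phi(-1)   # ≈ 0.794
```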

Moment Inequalities:

Theorem. Let h be a non-negative function of a random variable X such that E(h(X)) exists. Then

P(h(X) ≥ ϵ) ≤ E(h(X)) / ϵ

Proof.

E(h(X)) = ∫_{−∞}^{∞} h(x) f(x) dx, where f(x) is the pdf of X.

Let A = {x : h(x) ≥ ϵ}. Then

E(h(X)) = ∫_A h(x) f(x) dx + ∫_{A^c} h(x) f(x) dx ,

where h(x) ≥ ϵ on A and h(x) < ϵ on A^c. Hence

E(h(X)) ≥ ∫_A h(x) f(x) dx ≥ ϵ ∫_A f(x) dx

∫_A f(x) dx ≤ E(h(X)) / ϵ

P(h(X) ≥ ϵ) ≤ E(h(X)) / ϵ

Markov's Inequality:

P(|X| ≥ k) ≤ E(|X|^r) / k^r

Proof. Using the previous result with h(x) = |x|^r and ϵ = k^r,

P(|X|^r ≥ k^r) ≤ E(|X|^r) / k^r
P(|X| ≥ k) ≤ E(|X|^r) / k^r

Example: Given E(X²) = 2, what can we say about P(|X| ≥ 3)?

P(|X| ≥ 3) ≤ E(|X|²) / 3² = 2/9
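Markov's inequality with r = 2 can be illustrated on simulated data; the empirical tail frequency always sits below the empirical second-moment bound, since x²/k² ≥ 1 pointwise whenever |x| ≥ k. A sketch using standard normal samples (any distribution would do):

```python
import random

random.seed(3)
# Markov's inequality: P(|X| >= k) <= E(|X|^r) / k^r; here r = 2, k = 3
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]

second_moment = sum(x * x for x in xs) / n
tail = sum(abs(x) >= 3 for x in xs) / n
bound = second_moment / 9   # the Markov bound on the tail
```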
Chebyshev's Inequality: Let X be a random variable with E(X) = µ and Var(X) = σ². Then

P(|X − µ| ≥ kσ) ≤ 1/k²

or equivalently

P(|X − µ| ≤ kσ) ≥ 1 − 1/k²
Joint Probability Distribution: Let X and Y be random variables with supports {x1, x2, . . . , xn} and {y1, y2, . . . , ym}. The joint probability distribution can be written as the n × m table T with rows indexed by x1, . . . , xn, columns indexed by y1, . . . , ym, and entries

pij = P(X = xi, Y = yj),  where i ∈ {1, 2, ..., n}, j ∈ {1, 2, ..., m} and pij ∈ [0, 1].

Σ_i Σ_j pij = 1

pi+ = Σ_j pij = marginal distribution of X

p+j = Σ_i pij = marginal distribution of Y

Joint Information: anything involving both X and Y.
Marginal Information: anything involving only X or only Y.
Conditional Information: conditioning on (fixing) one variable and predicting the other:

P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)

Continuous joint pdf: fXY(x, y) on its support, with

∬ fXY(x, y) dx dy = 1

Marginal pdf:

fX(x) = ∫ fXY(x, y) dy
fY(y) = ∫ fXY(x, y) dx

Conditional pdf:

fX|Y(x|y) = fXY(x, y) / fY(y)

Conditional expectation:

E(X|Y) = ∫ x fX|Y(x|y) dx
E(Y|X) = ∫ y fY|X(y|x) dy

Expectation of a function g(X, Y):

E(g(X, Y)) = ∬ g(x, y) fXY(x, y) dx dy

Covariance of X and Y:

Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
          = E[(X − X̄)(Y − Ȳ)]
          = E(XY) − E(X)E(Y)

If we replace Y by X then we will get

Cov(X, X) = V ar(X)

Note:

1. If Cov(X,Y) is positive then X and Y are positively correlated.

2. If Cov(X,Y) is negative then X and Y are inversely correlated.

Correlation Coefficient:

ρXY = Cov(X, Y) / √(Var(X) Var(Y))
Question:
fXY (x, y) = ke−(2x+y) , x > 0, y > 0

1. P (X < 1, Y < 1)

2. P (X < Y )

3. E(X), E(Y), E(X|Y ), E(Y |X)

4. Cov(X,Y)

5. ρXY

Predictive Model:
Amount of rain (X) ⇌ Amount of crops (Y)
Ŷ = H(X)
Input (given) → System → Output (predicted)
Exercise: Are the random variables X and Y related, where

fXY(x, y) = kxy ,  0 < x < 1, 0 < y < 1 and x + y ≤ 1 ?

Solution:

∫_0^1 ∫_0^{1−y} kxy dx dy = 1
(k/2) ∫_0^1 y(1 − y)² dy = 1
(k/2) [y²/2 − 2y³/3 + y⁴/4]_0^1 = 1
k = 24

Also,

fX(x) = ∫_0^{1−x} 24xy dy = 12x(1 − x)² ,  0 < x < 1

gives

fY|X(y|x) = 2y / (1 − x)² ,  0 < y < 1 − x.

Also E(X) = 2/5, E(Y) = 2/5, E(X²) = 1/5, E(Y²) = 1/5 and E(XY) = 2/15, which give

Cov(X, Y) = E(XY) − E(X)E(Y) = 2/15 − 4/25 = −2/75

Var(X) = 1/25 ,  Var(Y) = 1/25

ρ = Cov(X, Y) / √(Var(X)Var(Y)) = −2/3.
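The marginal f_X(x) = 12x(1 − x)² derived above makes the moment claims easy to verify by one-dimensional numerical integration:

```python
# Midpoint-rule approximation of the integral of f over [a, b]
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Marginal density of X derived above, on (0, 1)
fx = lambda x: 12 * x * (1 - x) ** 2

mass = integrate(fx, 0, 1)                       # ~1, so k = 24 is correct
ex = integrate(lambda x: x * fx(x), 0, 1)        # E(X) = 2/5
ex2 = integrate(lambda x: x * x * fx(x), 0, 1)   # E(X^2) = 1/5
var = ex2 - ex ** 2                              # Var(X) = 1/25
```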
Correlation Coefficient (ρ): measures the strength of linear dependence, with −1 ≤ ρ ≤ 1.

• ρ = −1 → X and Y are perfectly negatively linearly related.

• ρ = 1 → X and Y are perfectly positively linearly related.

• ρ = 0 → X and Y are not linearly related.

E(Y|X = x) = ∫ y fY|X(y|x) dy = ∫_0^{1−x} (2y² / (1 − x)²) dy = (2/3)(1 − x)

E(Y|X = x) = (2/3)(1 − x)   (Regression)
Yp = (2/3)(1 − x)           (Regression)
Covariance Matrix (Σ):

Σ = [ Var(X)     Cov(X, Y) ]
    [ Cov(Y, X)  Var(Y)    ]

  = [ 1/25    −2/75 ]
    [ −2/75    1/25 ]   (positive semidefinite and symmetric)

Question: Find the value of k?

f (x, y) = k, 0 < x < 1


f (x, y) = ke−x , 0 < x < y < ∞.

Independence of X and Y -

• If X and Y are independent, then

fXY (x, y) = fX (x)fY (y) ∀ x, y

Or P (X = x, Y = y) = P (X = x)P (Y = y) ∀ x, y.

• If they are dependent, then we ask:

  – Are they positively or negatively dependent? (Covariance.)

  – Are they linearly dependent or not? (Correlation coefficient.)

• ρ = 0 or Cov(X, Y ) = 0 does not imply independence of X and Y .

• If ρ is close to ±1, then we fit Yp = α + βX (linear regression).

  Choose α and β so that E = Σ (Yi − α − βXi)² is minimum:

  ∂E/∂α = 0 ⟹ Σ Yi = nα + β Σ Xi
  ∂E/∂β = 0 ⟹ Σ Xi Yi = α Σ Xi + β Σ Xi²

  Solving these normal equations gives α̂ and β̂, with

  β̂ = [ (1/n) Σ xi yi − ((1/n) Σ xi)((1/n) Σ yi) ] / [ (1/n) Σ xi² − ((1/n) Σ xi)² ]   (1)
    = Cov(X, Y) / Var(X)
Theorem 1. X and Y are independent if and only if

fXY (x, y) = h(x)g(y) x ∈ S(X) and y ∈ S(Y )

Question: Are X and Y linearly dependent where

fXY (x, y) = kxy, 0 < x < 1, 0 < y < 1 and x + y < 1.

Answer: Dependent.
Question: If X and Y are independent then find

1. Cov(X,Y)

2. ρ

3. E(XY )

4. Var(X+Y)

Solution:

1. Cov(X,Y)=0,

2. ρ = 0,

3. E(XY)=E(X)E(Y),

4. Var(X+Y)=Var(X)+Var(Y).

Note:

• If X and Y are independent, then

fXY (x, y) = fX (x)fY (y) ∀ x, y

• If X, Y and Z are independent, then

fXY Z (x, y, z) = fX (x)fY (y)fZ (z) ∀ x, y, z

Integrating with respect to x, we get

fY Z (y, z) = fY (y)fZ (z)

Similarly
fXY (x, y) = fX (x)fY (y)
and
fXZ (x, z) = fX (x)fZ (z)

• Let X1, X2, ..., Xn be n random variables; they are independent iff

  fX1,X2,...,Xn(x1, x2, ..., xn) = fX1(x1) fX2(x2) · · · fXn(xn) ,  ∀ x1, x2, ..., xn.

Exercise: Show that X and Y are independent where

fXY (x, y) = kxe−y 0 < x < 1, y > 0

Solution: Since fXY (x, y) = h(x)g(y), where h(x) = kx and g(y) = e−y
Note: If X and Y are independent, then

MX+Y(t) = MX(t) MY(t), ∀t

Question: If X and Y are independent and X ∼ N (0, 1) and Y ∼ N (0, 1) then what is the
distribution of X + Y .
Solution: Given that

MX(t) = e^{t²/2} ,  MY(t) = e^{t²/2}

MX(t) MY(t) = e^{t²} = MX+Y(t)

⟹ X + Y ∼ N(0, 2).
Question: If X and Y are independent and X ∼ N (1, 4) and Y ∼ N (2, 9) then what is the
distribution of 2X + 3Y − 1?
Solution: Given that

M_{2X+3Y−1}(t) = E(e^{(2X+3Y−1)t}) = E(e^{2Xt}) E(e^{3Yt}) E(e^{−t}) = MX(2t) MY(3t) e^{−t}.
• If X ∼ N(α, β²) and Y ∼ N(γ, δ²) are independent, then
  aX + bY + c ∼ N(aα + bγ + c, a²β² + b²δ²).

• Var(aX + bY + c) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

• If X1, X2, ..., Xn are independent random variables with Xi ∼ N(µi, σi²), then

  Σ_{i=1}^n ai Xi ∼ N(Σ_{i=1}^n ai µi, Σ_{i=1}^n ai² σi²).

Question: If X ∼ P (λ1 ) and Y ∼ P (λ2 ) and X and Y are independent then what is the
distribution of X + Y ?
Solution: X + Y ∼ P (λ1 + λ2 ).
Question: If X ∼ U (0, 1) and Y ∼ U (0, 1) and X and Y are independent then what is
the distribution of X + Y ?
Solution:

MX(t) = (e^t − 1)/t

MX+Y(t) = (e^t − 1)²/t²   (not the MGF of a standard named distribution)
Lemma: Suppose X and Y have joint density fXY(x, y) and are independent. What is the distribution of X + Y?
Proof: Let U = X + Y.

FU(u) = P(U ≤ u)
      = P(X + Y ≤ u)
      = ∬_{x+y ≤ u} fXY(x, y) dx dy
      = ∫_{−∞}^{∞} ∫_{−∞}^{u−y} fX(x) fY(y) dx dy
      = ∫_{−∞}^{∞} ( ∫_{−∞}^{u−y} fX(x) dx ) fY(y) dy
      = ∫_{−∞}^{∞} FX(u − y) fY(y) dy

⟹ fU(u) = ∫_{−∞}^{∞} fX(u − y) fY(y) dy.
Question: If X ∼ Exp(λ1 ) and Y ∼ Exp(λ2 ) and X and Y are independent then what is
the distribution of X + Y ?
Solution: From the given information,

fX(x) = λ1 e^{−λ1 x}, x > 0 ;  fY(y) = λ2 e^{−λ2 y}, y > 0

fU(u) = ∫_{−∞}^{∞} fX(u − y) fY(y) dy
      = ∫_0^u λ1 e^{−λ1(u−y)} λ2 e^{−λ2 y} dy ,

where the limits become 0 to u because fX(u − y) vanishes for y > u and fY(y) vanishes for y < 0.
Transformation:

fX(x) → fY(y), Y = g(X):  fY(y) = fX(g^{−1}(y)) |d/dy g^{−1}(y)|

Let fXY(x, y) be the joint density of X and Y, and let U = g1(X, Y) and V = g2(X, Y) be two transformations such that

1. g1 and g2 are invertible, with X = h1(U, V), Y = h2(U, V).

2. The partial derivatives of h1 and h2 exist.

3. The Jacobian is nonzero:

   J = det [ ∂x/∂u  ∂x/∂v ]
           [ ∂y/∂u  ∂y/∂v ]  ≠ 0

Then the joint density is given by

fUV(u, v) = fXY(h1(u, v), h2(u, v)) |J| ,  (u, v) ∈ S(U, V)

Exercise. Let fXY(x, y) = x + y, 0 ≤ x, y ≤ 1, and U = X + Y, V = X − Y. Find fUV(u, v).

Solution:

X = (U + V)/2 ,  Y = (U − V)/2

J = det [ 1/2   1/2 ]
        [ 1/2  −1/2 ]  = −1/2 ,  |J| = 1/2

fUV(u, v) = fXY((u + v)/2, (u − v)/2) |J| = ((u + v)/2 + (u − v)/2) · (1/2) = u/2

with 0 ≤ u + v ≤ 2 and 0 ≤ u − v ≤ 2, i.e.

fUV(u, v) = u/2 ,  0 < u < 1, −u < v < u
          = u/2 ,  1 < u < 2, u − 2 < v < 2 − u
          = 0 otherwise,

and integrating out v gives the marginal

fU(u) = u² ,  0 < u < 1
      = u(2 − u) ,  1 < u < 2.

Exercise. Let X ∼ U(0, 1) and Y ∼ U(0, 1) be independent random variables, and let U = X + 2Y and V = 3X − Y. Find fXY(x, y) and fUV(u, v).

Theorem. Let X1 ∼ N(µ1, σ1²), . . . , Xn ∼ N(µn, σn²), with X1, X2, . . . , Xn independent. Then a1X1 + · · · + anXn ∼ N(Σ ai µi, Σ ai² σi²).

Let Y = X1² and fX1 → fY. Since y = x² has the two roots x = ±√y,

fY(y) = fX1(√y) (1/(2√y)) + fX1(−√y) (1/(2√y))
      = (1/(2√y)) (fX1(√y) + fX1(−√y))

Now take Z ∼ N(0, 1) and Y = Z²:

fY(y) = (1/(2√y)) ( (1/√(2π)) e^{−y/2} + (1/√(2π)) e^{−y/2} )
      = (1/√(2πy)) e^{−y/2}
      = (1/(Γ(1/2) 2^{1/2})) y^{1/2 − 1} e^{−y/2} ,  y > 0

Y ∼ χ²(1)
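The result Z² ∼ χ²(1) can be checked against the chi-square moments E(Y) = n = 1 and Var(Y) = 2n = 2 by simulation:

```python
import random

random.seed(4)
# If Z ~ N(0,1) then Y = Z^2 ~ chi-square(1): E(Y) = 1, Var(Y) = 2
n = 400_000
ys = [random.gauss(0, 1) ** 2 for _ in range(n)]

mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
```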

Note:

X ∼ N(µ, σ²),  Z = (X − µ)/σ  ⟹  Z ∼ N(0, 1)

Note:

X ∼ P(λ), x = 0, 1, . . . , but 2X + 1 ≁ Poisson:

a linear function of a Poisson random variable does not, in general, follow a Poisson distribution.
Theorem. Let Y1, Y2, . . . , Yn ∼ χ²(1) be independent. Then Σ_i Yi ∼ χ²(n).

Proof. Let Y = Σ_i Yi. Then

MY(t) = E(e^{(Σ Yi)t}) = Π_{i=1}^n E(e^{Yi t}) = Π_{i=1}^n MYi(t)

MYi(t) = ∫_0^∞ (1/(Γ(1/2) 2^{1/2})) y^{−1/2} e^{−y(1/2 − t)} dy
       = (1/(Γ(1/2) 2^{1/2})) · Γ(1/2) / (1/2 − t)^{1/2}
       = 1/√(1 − 2t)

MY(t) = 1/(1 − 2t)^{n/2}

Y ∼ χ²(n)

t-Distribution: Let X ∼ N(0, 1) and Y ∼ χ²(n) be independent random variables. Then X/√(Y/n) ∼ t(n) (the t-distribution).

Define U = X/√(Y/n) and V = Y.

fXY(x, y) = fX(x) fY(y) = (1/√(2π)) e^{−x²/2} · (1/(Γ(n/2) 2^{n/2})) y^{n/2 − 1} e^{−y/2} ,  y > 0

Then x = u√(v/n) and y = v, so

J = det [ √(v/n)  u/(2√(vn)) ]
        [ 0       1          ]  = √(v/n)

fUV(u, v) = (1/(√(2π) Γ(n/2) 2^{n/2})) e^{−u²v/(2n)} v^{n/2 − 1} e^{−v/2} √(v/n) ,  u ∈ R, v > 0

fU(u) = (1/(√(2πn) Γ(n/2) 2^{n/2})) ∫_0^∞ v^{(n/2 + 1/2) − 1} e^{−(v/2)(1 + u²/n)} dv
      = (1/(√(2πn) Γ(n/2) 2^{n/2})) · Γ(n/2 + 1/2) / ((1/2)(1 + u²/n))^{n/2 + 1/2} ,  u ∈ R

F-Distribution: Let X ∼ χ²(n) and Y ∼ χ²(m) be independent random variables. Then (X/n)/(Y/m) ∼ F(n, m) (the F-distribution).

Define U = (X/n)/(Y/m) and V = Y.

fXY(x, y) = fX(x) fY(y)

Then X = UVn/m and Y = V, so

J = det [ vn/m  un/m ]
        [ 0     1    ]  = vn/m

fUV(u, v) = fXY(uvn/m, v) |J|

Bivariate Normal Distribution: (X, Y) is said to follow a bivariate normal distribution BN(µ1, µ2, ρ, σ1², σ2²) if the joint density is given by

fXY(x, y) = (1/(2πσ1σ2√(1−ρ²))) exp( (−1/(2(1−ρ²))) [ ((x−µ1)/σ1)² − 2ρ((x−µ1)/σ1)((y−µ2)/σ2) + ((y−µ2)/σ2)² ] ) ,  x, y ∈ R.

If ρ = 0, then fXY(x, y) = fX(x) fY(y), i.e. X and Y are independent.

So if (X, Y) ∼ BN(µ1, µ2, ρ, σ1², σ2²), then ρ = 0 ⇒ independence; and we know that in general independence ⇒ ρ = 0. Therefore, for the bivariate normal,

independence ⇔ ρ = 0.

In general, ρ = 0 only means no linear dependence; but if (X, Y) ∼ BN(µ1, µ2, ρ, σ1², σ2²), then X and Y can only be linearly related.
Completing the square in y,

fXY(x, y) = (1/(2πσ1σ2√(1−ρ²))) exp( (−1/(2(1−ρ²))) [ ( (y−µ2)/σ2 − ρ(x−µ1)/σ1 )² + (1−ρ²)((x−µ1)/σ1)² ] )

          = (1/(√(2π)σ1)) e^{−(1/2)((x−µ1)/σ1)²} · (1/(√(2π)σ2√(1−ρ²))) exp( (−1/(2(1−ρ²)σ2²)) [ y − µ2 − ρ(σ2/σ1)(x − µ1) ]² )

fXY(x, y) = fX(x) fY|X(y|x)

fY|X(y|x) = (1/(√(2π)γ)) e^{−(1/2)((y−δ)/γ)²}

Y|X ∼ N(δ, γ²), with

E(Y|X) = δ = µ2 + ρ(σ2/σ1)(x − µ1) ,  γ² = σ2²(1 − ρ²)

E(Y|X) = α + βx

β̂ = ρ σ2/σ1 = (Cov(X, Y)/(σ1σ2)) · (σ2/σ1) = Cov(X, Y)/Var(X)

Convergence in Probability
A sequence of random variables {X1, X2, . . . , Xn, . . .} is said to converge in probability to another random variable X if

P(|Xn − X| > ϵ) → 0 as n → ∞,

written Xn →P X.

Example:
Let {Xn} be a sequence of random variables defined by

P(Xn = 1) = 1/n and P(Xn = 0) = 1 − 1/n.

Then, taking X = 0,

P(|Xn − X| > ϵ) = P(Xn = 1) = 1/n  if 0 < ϵ < 1, and 0 if ϵ ≥ 1.

It follows that P(|Xn| > ϵ) → 0 as n → ∞, and we conclude that Xn →P 0.

Convergence in Moments
A sequence {X₁, X₂, ...} of random variables is said to converge in rth moment (rth mean) to a random variable X if:

E|Xn − X|^r → 0 as n → ∞

(in particular, this gives E(Xn^r) → E(X^r)).

Convergence in Law
Let {Xn} be a sequence of random variables with C.D.F.s {Fn} defined on the probability space (Ω, F, P). Further, let X be another random variable with C.D.F. F(·). Then {Xn} is said to be converging in distribution (in law) to X if:

lim_{n→∞} Fn(x) = F(x) at all points x where F(·) is continuous.

We denote it by Xn →_L X.

18 Weak Law of Large Numbers

Let {Xn} be a sequence of random variables, and let

Sn = Σ_{k=1}^{n} Xk,   n = 1, 2, 3, 4, ...

We say that {Xn} obeys the weak law of large numbers with respect to the sequence of constants {Bn}, where Bn > 0 and Bn → ∞, if there exists a sequence {An} such that Bn⁻¹(Sn − An) →_P 0 as n → ∞. Here, An is called the centering constant, and Bn is called the normalizing constant.

18.1 Theorem
Let X₁, X₂, ... be a sequence of independent and identically distributed random variables, each having mean E[Xi] = µ. Then, for any ε > 0,

P( |(X₁ + ··· + Xn)/n − µ| > ε ) → 0 as n → ∞.

Proof: We shall prove the result only under the additional assumption that the random variables have a finite variance σ². Now, as

E[(X₁ + ··· + Xn)/n] = µ   and   Var[(X₁ + ··· + Xn)/n] = σ²/n,

it follows from Chebyshev's inequality that

P( |(X₁ + ··· + Xn)/n − µ| > ε ) ≤ σ²/(nε²),

and the result is proved. ■


Cor 1: If the random variables Xn are identically distributed and pairwise uncorrelated with E(Xi) = µ and Var(Xi) = σ² < ∞, we can choose An = nµ and Bn = n.

Cor 2: If the Xn are pairwise uncorrelated with variances σi², we can choose Bn = n provided that n⁻² Σ_{i=1}^{n} σi² → 0 as n → ∞.

Cor 3: Let {Xn} be a sequence of pairwise-uncorrelated identically distributed random variables with finite variance σ². Taking An = nµ and Bn = n, the condition of Cor 2 reads nσ²/n² → 0 as n → ∞, which always holds. Hence Sn/n →_P µ.

Example (Die Rolling)

Consider n rolls of a fair die. Let Xi be the outcome of the i-th roll, so Sn = X₁ + X₂ + ... + Xn is the sum of the first n rolls. The Xi are i.i.d. with E(Xi) = (1 + 2 + ··· + 6)/6 = 7/2. Thus, by the Law of Large Numbers (LLN), for any ϵ > 0,

P( |Sn/n − 7/2| ≥ ϵ ) → 0 as n → ∞.

This can be restated as: for any ϵ > 0,

P( |Sn/n − 7/2| < ϵ ) → 1 as n → ∞.
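The die example can be simulated directly. Below is a minimal Python sketch (my own, not from the notes) showing Sn/n settling near 7/2:

```python
import random

def mean_of_die_rolls(n, seed=0):
    """Average of n fair-die rolls; by the WLLN this concentrates at 7/2 = 3.5."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n)) / n

avg = mean_of_die_rolls(100_000)  # close to 3.5 for large n
```

For n = 100000 the deviation from 3.5 is already of order 1/√n.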

Example (Poisson Random Variables)
Let {Xn; n ≥ 1} be a sequence of i.i.d. Poisson random variables with parameter λ. Then we have

P(X₁ = k) = e^{−λ} λ^k / k!,   k = 0, 1, 2, ...

Thus, µ = E(X₁) = λ and Var(X₁) = λ. Hence, by the Weak Law of Large Numbers (WLLN), X̄n →_P µ.

Important
Let {Xn} be any sequence of random variables. Define Yn = n⁻¹ Σ_{k=1}^{n} Xk. A necessary and sufficient condition for the sequence {Xn} to satisfy the weak law of large numbers is that

E[ Yn² / (1 + Yn²) ] → 0 as n → ∞.

Example: Let X₁, X₂, ... be iid random variables with E(X₁^k) < ∞ for some positive integer k. Then

(1/n) Σ_{j=1}^{n} Xj^k → E(X₁^k) as n → ∞.

In particular, if E(Xi²) < ∞, then

(1/n) Σ_{i=1}^{n} Xi² → E(Xi²) as n → ∞.

Example
Let X₁, X₂, ... be iid C(1, 0) (standard Cauchy) random variables. We have seen that n⁻¹Sn →_d C(1, 0), so that n⁻¹Sn does not converge in probability to 0. It follows that the weak law of large numbers does not hold here.

Truncation: For a constant c > 0, define

Xi^c = Xi if |Xi| ≤ c,   Xi^c = 0 if |Xi| > c,   i = 1, 2, ..., n.

Define Sn^c = Σ_{i=1}^{n} Xi^c and Mn = Σ_{i=1}^{n} E[Xi^c].

Example
Let X₁, X₂, ... be iid random variables with a common probability density function (pdf) given by

f(x) = (1 + ρ)/x^{2+ρ} for x ≥ 1,   f(x) = 0 for x < 1,

where ρ > 0. Then, the expected value of X is

E(X) = ∫₁^∞ x · (1 + ρ)/x^{2+ρ} dx = (1 + ρ) ∫₁^∞ x^{−(1+ρ)} dx = (1 + ρ)/ρ < ∞,

and the law of large numbers holds, i.e., n⁻¹Sn tends to (1 + ρ)/ρ as n tends to infinity.

19 Limiting Moment Generating Function

Lemma 19.1. Let X₁, X₂, ... be a sequence of random variables. Let fn be the density function of Xn for n = 1, 2, ..., and suppose that the MGF Mn(t) of fn exists. What happens to Mn(t) as n → ∞? If it converges, does it always converge to an MGF?

Example: Let {Xn} be a sequence of random variables with PMF P{Xn = −n} = 1, n = 1, 2, ... We have

Mn(t) = E(e^{tXn}) = e^{−tn},

so Mn(t) → 0 for all t > 0, Mn(t) → ∞ for all t < 0, and Mn(t) = 1 at t = 0. Then

Mn(t) → M(t) = { 0 if t > 0;  1 if t = 0;  ∞ if t < 0 }.

But M(t) is not an MGF. Note also that the distribution function of Xn is

Fn(x) = { 0 if x < −n;  1 if x ≥ −n },

so Fn(x) → 1 for every fixed x, and the limit F ≡ 1 is not a distribution function.

Now suppose that Xn has MGF Mn and Xn → X, where X is a random variable with MGF M(t). Does Mn(t) → M(t) as n → ∞?

Lemma 19.2. Write f(x) = o(x) if f(x)/x → 0 as x → 0. We have

lim_{n→∞} (1 + a/n + o(1/n))^n = e^a for every real a.

Example 19.1. Let X ∼ P(λ); then the MGF of X is given by M(t) = exp[λ(e^t − 1)] for all t.
Let Y = (X − λ)/√λ; then the MGF of Y is given by M_Y(t) = e^{−t√λ} M(t/√λ). The logarithm of M_Y(t) is

log M_Y(t) = −t√λ + log M(t/√λ) = −t√λ + λ(e^{t/√λ} − 1).

Expanding the exponential term,

log M_Y(t) = −t√λ + λ( t/√λ + t²/(2λ) + O(λ^{−3/2}) ) = t²/2 + O(λ^{−1/2}) → t²/2 as λ → ∞,

so that M_Y(t) → e^{t²/2} as λ → ∞, which is the MGF of an N(0, 1) random variable.
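This limit can be checked numerically. The sketch below (my own check, assuming the formula for log M_Y(t) derived above) evaluates it for a large λ:

```python
import math

def log_mgf_standardized_poisson(t, lam):
    """log M_Y(t) = -t*sqrt(lam) + lam*(exp(t/sqrt(lam)) - 1)
    for Y = (X - lam)/sqrt(lam), X ~ Poisson(lam)."""
    s = math.sqrt(lam)
    return -t * s + lam * (math.exp(t / s) - 1)

# For large lam this should approach t**2 / 2.
val = log_mgf_standardized_poisson(1.0, 1e6)
```

At t = 1 and λ = 10⁶ the value is already within about 2·10⁻⁴ of 1/2.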

20 What is a Random Sample?
A collection of i.i.d. (independent, identically distributed) random variables (X₁, X₂, X₃, ..., Xn) is called a random sample of size n.
Since X₁, X₂, X₃, ..., Xn have the same distribution and are independent, the joint density factorizes as

f(x₁, x₂, x₃, ..., xn; θ) = Π_{i=1}^{n} f(xi; θ).

21 What is a Statistic?
A statistic is a function of the random sample. The two most important statistics are:

Sample Mean: X̄ = (1/n) Σ_{i=1}^{n} Xi

Sample Variance: S² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)²

22 Properties of Mean and Variance

Let X₁, X₂, X₃, ..., Xn be a random sample such that E(Xi) = µ and Var(Xi) = σ².
If n is large, then by the Central Limit Theorem, as the sample size n becomes sufficiently large, the sampling distribution of the standardized sample mean approaches a standard normal distribution with mean 0 and standard deviation 1:

Z = (X̄ − µ)/(σ/√n) → N(0, 1) as n → ∞.

For the sample mean,

E(X̄) = E( (1/n) Σ_{i=1}^{n} Xi ) = µ   and   Var(X̄) = σ²/n.

For the sample variance S² = (1/(n−1)) Σ (Xi − X̄)², write

S² = (1/(n−1)) Σ (Xi − µ + µ − X̄)²    (2)
   = (1/(n−1)) Σ [ (Xi − µ)² + (µ − X̄)² + 2(Xi − µ)(µ − X̄) ]    (3)
   = (1/(n−1)) [ Σ (Xi − µ)² − n(X̄ − µ)² ]    (4)

(the cross term collapses because Σ(Xi − µ) = n(X̄ − µ)). Taking expectations,

E(S²) = (1/(n−1)) [ Σ E(Xi − µ)² − n E(X̄ − µ)² ]    (5)
      = (1/(n−1)) [ nσ² − n·(σ²/n) ] = σ².    (6)

From equation (4),

(n−1)S²/σ² = Σ_{i=1}^{n} ((Xi − µ)/σ)² − ((X̄ − µ)/(σ/√n))²    (7)
           = Σ_{i=1}^{n} Zi² − Z²    (8)

and, for a normal sample,

(n−1)S²/σ² ∼ χ²_{n−1}.    (9)
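The unbiasedness E(S²) = σ² can be checked by simulation. A minimal Python sketch (my own; note that `statistics.variance` uses the n − 1 divisor, matching S²):

```python
import random
import statistics

def mean_sample_variance(n=5, reps=40_000, mu=0.0, sigma=2.0, seed=1):
    """Monte Carlo estimate of E(S^2) for N(mu, sigma^2) samples of size n.
    Should be close to sigma^2 = 4, illustrating that S^2 is unbiased."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        total += statistics.variance(xs)  # divides by n - 1
    return total / reps

est = mean_sample_variance()
```

Even with a tiny sample size n = 5, the average of S² over many replications sits near σ² = 4.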

23 Statistics
Suppose we are given a data set

X₁, X₂, X₃, ..., Xn ∼ f(x; θ),   e.g. Xi ∼ Exp(λ),    (10)

where the density f(x; θ) depends on an unknown parameter θ, the population parameter.

23.1 Unbiased Statistic
A statistic g(X₁, X₂, X₃, ..., Xn) is an unbiased estimator of θ if

E[g(X₁, X₂, X₃, ..., Xn)] = θ.

In general, E[g(X₁, X₂, X₃, ..., Xn)] − θ is called the bias.

23.2 Efficient Statistic
g₁(X₁, X₂, X₃, ..., Xn) is more efficient than g₂(X₁, X₂, X₃, ..., Xn) if Var[g₁(X₁, ..., Xn)] ≤ Var[g₂(X₁, ..., Xn)].

MVUE → minimum variance unbiased estimator.

Example: X₁, X₂, X₃, ..., Xn follows N(µ, σ²), given that E(Xi) = µ and Var(Xi) = σ². Consider

g₁(X₁, ..., Xn) = (X₁ + X₂ − X₃ + X₄ + X₅)/5
g₂(X₁, ..., Xn) = X̄    (11)
g₃(X₁, ..., Xn) = (X₁ + Xn)/2    (12)

Then

E[g₁] = [E(X₁) + E(X₂) − E(X₃) + E(X₄) + E(X₅)]/5 = 3µ/5 → not unbiased    (13)
E[g₂] = E(X̄) = µ → unbiased    (14)
E[g₃] = [E(X₁) + E(Xn)]/2 = µ → unbiased    (15)

Var[g₂] = Var(X̄) = σ²/n    (16)
Var[g₃] = Var((X₁ + Xn)/2) = σ²/2    (17)

Var[g₂] ≤ Var[g₃]    (18)

so g₂ = X̄ is preferred. In fact X̄ is the MVUE for µ, and S² for σ².
Sampling distribution facts:

1) X̄ ∼ N(µ, σ²/n) ⟺ (X̄ − µ)/(σ/√n) ∼ N(0, 1).

2) S² = (1/(n−1)) [ Σ (Xi − µ)² − n(X̄ − µ)² ], so

(n−1)S²/σ² = Σ_{i=1}^{n} ((Xi − µ)/σ)² − ((X̄ − µ)/(σ/√n))²,

where Σ ((Xi − µ)/σ)² ∼ χ²_n and ((X̄ − µ)/(σ/√n))² ∼ χ²_1.

3) X̄ and S² are independent.

Combining these,

t = [ (X̄ − µ)/(σ/√n) ] / √[ (n−1)S² / (σ²(n−1)) ]    (19)
  = (X̄ − µ)/(S/√n) ∼ t_{n−1}.    (20)

24 Objective
To obtain g(X₁, X₂, ..., Xn) that estimates θ.

25 Unbiased Estimator (UE)
θ̂ = g(X₁, X₂, ..., Xn) is unbiased if θ = E(θ̂) = E[g(X₁, X₂, ..., Xn)].
Bias: Bias = E(θ̂) − θ.

26 Efficiency
g₁ is more efficient than g₂ if Var(g₁) ≤ Var(g₂).

27 Minimum Variance Unbiased Estimator (MVUE)
The MVUE is the unbiased estimator with the smallest variance among all unbiased estimators.

28 Maximum Likelihood Estimator (MLE)

Likelihood function:
L(θ) = f(x₁, x₂, ..., xn; θ)

The MLE is the maximizer of the likelihood:
θ̂ = Argmax_θ L(θ)

Example: For a sample X₁, X₂, ..., Xn from an exponential distribution with mean λ:

L(λ) = Π_{i=1}^{n} f_{Xi}(xi) = Π_{i=1}^{n} (1/λ) e^{−xi/λ}

log L(λ) = −n ln(λ) − (Σ_{i=1}^{n} xi)/λ

(d/dλ) ln L(λ) = −n/λ + (Σ_{i=1}^{n} xi)/λ² = 0

λ̂ = (1/n) Σ_{i=1}^{n} xi

For large samples, the MLE is asymptotically efficient, i.e. approximately the MVUE.
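The calculus above gives λ̂ = x̄ in closed form. A small Python sketch (my own, using a crude grid search rather than calculus) confirms that the log-likelihood is maximized near the sample mean:

```python
import math
import random

def exp_log_likelihood(lam, xs):
    """log L(lambda) = -n*log(lambda) - sum(x_i)/lambda
    for the exponential distribution with mean lambda."""
    return -len(xs) * math.log(lam) - sum(xs) / lam

rng = random.Random(0)
xs = [rng.expovariate(1 / 2.0) for _ in range(500)]  # true mean lambda = 2.0
grid = [0.5 + 0.01 * k for k in range(400)]          # candidate lambdas in [0.5, 4.49]
lam_hat = max(grid, key=lambda lam: exp_log_likelihood(lam, xs))
# lam_hat agrees with the closed-form MLE (the sample mean) up to the grid spacing.
```

The grid argmax lands within one grid step of x̄, as the derivation predicts.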
Example: X₁, X₂, ..., Xn, Xi ∼ U(0, θ):

f_{Xi}(xi) = 1/θ,   0 ≤ xi ≤ θ

L(θ) = Π_{i=1}^{n} f_{Xi}(xi) = 1/θⁿ,   θ ≥ max(x₁, ..., xn)

log L(θ) = −n log(θ)

(d/dθ) log L(θ) = −n/θ = 0

has no solution: differentiation does not always work! Here L(θ) is decreasing in θ, so it is maximized at the smallest admissible value, θ̂ = max(x₁, ..., xn).
Example: X₁, X₂, ..., Xn, Xi ∼ N(µ, σ²):

L(µ, σ²) = Π_{i=1}^{n} f_{Xi}(xi) = Π_{i=1}^{n} (1/(√(2π)σ)) e^{−(1/2)((xi−µ)/σ)²}

log L(µ, σ²) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σ_{i=1}^{n} (xi − µ)²

(∂/∂µ) log L(µ, σ²) = (1/σ²) Σ_{i=1}^{n} (xi − µ) = 0  ⇒  µ̂ = (1/n) Σ_{i=1}^{n} xi

(∂/∂σ²) log L(µ, σ²) = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (xi − µ)² = 0  ⇒  σ̂² = (1/n) Σ_{i=1}^{n} (xi − µ̂)²

Joint estimators of (µ, σ²):

µ̂ = x̄,   σ̂² = (1/n) Σ_{i=1}^{n} (xi − x̄)²

E(σ̂²) = ((n−1)/n)σ² ≠ σ² (biased): σ̂² is not an unbiased estimator of σ². An unbiased estimator for σ² is

g(x₁, x₂, ..., xn) = (n/(n−1)) σ̂² = (1/(n−1)) Σ_{i=1}^{n} (xi − x̄)² = S².
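The bias of σ̂² is easy to see by simulation. A minimal Python sketch (mine, not from the notes) estimates E(σ̂²) with the 1/n divisor:

```python
import random

def mean_mle_variance(n=5, reps=40_000, sigma=1.0, seed=2):
    """Monte Carlo estimate of E(sigma_hat^2), the variance MLE with the 1/n
    divisor. Expect roughly (n-1)/n * sigma^2 = 0.8, not sigma^2 = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        total += sum((x - xbar) ** 2 for x in xs) / n
    return total / reps

est = mean_mle_variance()
```

With n = 5 the shortfall is substantial: the average sits near 0.8σ², exactly the (n−1)/n factor derived above.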

Mean squared error: choose θ̂ = g(x₁, x₂, ..., xn) to minimize the mean squared error,

θ̂ = Arg min_{θ̂} E(θ̂ − θ)².

Decompose, writing θ̂ − θ = (θ̂ − E(θ̂)) + (E(θ̂) − θ):

E(θ̂ − θ)² = E[ (θ̂ − E(θ̂))² + 2(θ̂ − E(θ̂))(E(θ̂) − θ) + (E(θ̂) − θ)² ]

where θ̂ is a random variable, while E(θ̂) and θ are constants. The cross term vanishes since E[θ̂ − E(θ̂)] = 0, so

E(θ̂ − θ)² = Var(θ̂) + [Bias(θ̂)]².

29 Confidence Interval Estimation


To obtain g1 (x1 , x2 , . . . , xn ) and g2 (x1 , x2 , . . . , xn ):

P (g1 ≤ θ ≤ g2 ) ≥ 1 − α
Where:

1. 1 − α is the level of confidence.

2. (g1 , g2 ) is the interval.

3. P (g1 ≤ θ ≤ g2 ) ≥ 1 − α.

Case 1
Let X₁, X₂, ..., Xn be a random sample from N(µ, σ²), where σ² is known.
Here α is the allowed error probability (commonly α = 0.05, 0.1, or 0.01), and 1 − α is the confidence level.
Create a confidence interval for µ:

1. The point estimator of µ is X̄.

2. X̄ ∼ N(µ, σ²/n).

3. (X̄ − µ)/(σ/√n) ∼ N(0, 1).

P(a ≤ Z ≤ b) = 1 − α

P( −z_{α/2} ≤ Z ≤ z_{α/2} ) = 1 − α

P( −z_{α/2} ≤ (X̄ − µ)/(σ/√n) ≤ z_{α/2} ) = 1 − α

P( X̄ − z_{α/2}(σ/√n) ≤ µ ≤ X̄ + z_{α/2}(σ/√n) ) = 1 − α

The (1 − α)100% confidence interval for µ is:

[ X̄ − z_{α/2}(σ/√n), X̄ + z_{α/2}(σ/√n) ]

For example, when α = 0.05, the 95% confidence interval for µ, with z_{0.025} = 1.96, is:

[ X̄ − 1.96(σ/√n), X̄ + 1.96(σ/√n) ]
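The interval is straightforward to compute. A minimal Python helper (my own naming, not from the notes):

```python
import math

def mean_ci_known_sigma(xbar, sigma, n, z=1.96):
    """(1 - alpha)100% CI for mu with known sigma; z = z_{alpha/2} (1.96 for 95%)."""
    half = z * sigma / math.sqrt(n)
    return (xbar - half, xbar + half)

lo, hi = mean_ci_known_sigma(5.6, 1.0, 100)  # half-width 0.196, about (5.404, 5.796)
```

With X̄ = 5.6, σ = 1 and n = 100, the half-width is 1.96·(1/10) = 0.196.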

Case 2
When σ is unknown:

(X̄ − µ)/(S/√n) ∼ t_{n−1}

P( −t_{α/2, n−1} ≤ (X̄ − µ)/(S/√n) ≤ t_{α/2, n−1} ) = 1 − α

The (1 − α)100% confidence interval for µ is:

[ X̄ − t_{α/2, n−1}(S/√n), X̄ + t_{α/2, n−1}(S/√n) ]

For a small data set with a random sample x₁, x₂, ..., xn from N(µ, σ²), the point estimator for σ² is S². The confidence interval for σ² is derived as follows.

Since (n−1)S²/σ² ∼ χ²_{n−1},

P( χ²_{1−α/2, n−1} ≤ (n−1)S²/σ² ≤ χ²_{α/2, n−1} ) = 1 − α,

where χ²_{α/2, n−1} and χ²_{1−α/2, n−1} are the critical values from the chi-square distribution with n − 1 degrees of freedom. Inverting,

P( (n−1)S²/χ²_{α/2, n−1} ≤ σ² ≤ (n−1)S²/χ²_{1−α/2, n−1} ) = 1 − α.

The confidence interval for σ² is

[ (n−1)S²/χ²_{α/2, n−1}, (n−1)S²/χ²_{1−α/2, n−1} ]
Interval estimator
Let X₁, X₂, X₃, ..., Xn be a random sample of size n. Then:
Case 1: When n is large, the confidence interval for µ is

[ X̄ − z_{α/2}(S/√n), X̄ + z_{α/2}(S/√n) ]

Case 2: When n is small, the confidence interval for µ is

[ X̄ − t_{α/2, n−1}(S/√n), X̄ + t_{α/2, n−1}(S/√n) ]

Confidence Interval for p: Let X₁, X₂, X₃, ..., Xn be a (large) random sample of size n such that Xi ∼ B(1, p). We know that the point estimator for p is X̄, and

(X̄ − E(X̄)) / √Var(X̄) → N(0, 1).

When the sample is large, E(X̄) = E(Xi) = p and Var(X̄) = Var(Xi)/n = p(1 − p)/n. Let

z = (X̄ − p)/√(p(1 − p)/n).

Then

P( −z_{α/2} ≤ z ≤ z_{α/2} ) = 1 − α
P( −z_{α/2} ≤ (X̄ − p)/√(p(1−p)/n) ≤ z_{α/2} ) = 1 − α
P( X̄ − z_{α/2}√(p(1−p)/n) ≤ p ≤ X̄ + z_{α/2}√(p(1−p)/n) ) = 1 − α.

Replacing p(1 − p) by X̄(1 − X̄), the confidence interval for p is

[ X̄ − z_{α/2}√(X̄(1−X̄)/n), X̄ + z_{α/2}√(X̄(1−X̄)/n) ].

Alternatively, solve the inequality exactly: −z_{α/2} ≤ (X̄ − p)/√(p(1−p)/n) ≤ z_{α/2} ⇒ |X̄ − p| ≤ z_{α/2}√(p(1−p)/n), which gives the quadratic

(1 + z²_{α/2}/n) p² − (2X̄ + z²_{α/2}/n) p + X̄² ≤ 0,

whose roots are

p = [ (2X̄ + z²_{α/2}/n) ± √( (2X̄ + z²_{α/2}/n)² − 4(1 + z²_{α/2}/n)X̄² ) ] / [ 2(1 + z²_{α/2}/n) ].

When n is large, z²_{α/2}/n ≈ 0, and we get

p = [ 2X̄ ± √( 4X̄² + 4X̄z²_{α/2}/n + z⁴_{α/2}/n² − 4X̄² − 4X̄²z²_{α/2}/n ) ] / 2 ≈ X̄ ± z_{α/2}√(X̄(1−X̄)/n).

Remark: For small samples, z_{α/2} can be replaced by t_{α/2}.
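The large-sample interval above can be wrapped as a helper. A minimal Python sketch (mine, not from the notes):

```python
import math

def wald_ci_for_p(xbar, n, z=1.96):
    """Large-sample CI for p, replacing p(1-p) by xbar(1-xbar)."""
    half = z * math.sqrt(xbar * (1 - xbar) / n)
    return (xbar - half, xbar + half)

lo, hi = wald_ci_for_p(0.4, 100)  # roughly (0.304, 0.496)
```

For X̄ = 0.4 and n = 100, the half-width is 1.96·√(0.24/100) ≈ 0.096.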

Let two samples X₁, X₂, ..., Xn ∼ N(µ₁, σ₁²) and Y₁, Y₂, ..., Ym ∼ N(µ₂, σ₂²) be given; we find a C.I. (Confidence Interval) for (µ₁ − µ₂). The point estimator of µ₁ − µ₂ is X̄ − Ȳ, where X̄ ∼ N(µ₁, σ₁²/n) and Ȳ ∼ N(µ₂, σ₂²/m), which implies X̄ − Ȳ ∼ N(µ₁ − µ₂, σ₁²/n + σ₂²/m). Hence the C.I. for µ₁ − µ₂ is

[ (X̄ − Ȳ) − z_{α/2}√(S₁²/n + S₂²/m), (X̄ − Ȳ) + z_{α/2}√(S₁²/n + S₂²/m) ]

Testing of Hypothesis:
Decision making. Example: the claim is that the average height of the population is 5.4 ft, or that p = 40%; do we accept or reject the claim?
Statistical hypothesis:
Null hypothesis: the initial claim, which we test and possibly reject,

H₀ : µ = 5.4 (simple) or p ≥ 0.4

Alternate hypothesis: this is the complement of H₀,

H₁ : H₀ is not true, i.e. µ ≠ 5.4 or p < 0.4

Specific value of parameter: a null hypothesis that fixes the parameter at a single value is a simple hypothesis,

H₀ : µ = 5.4 v/s H₁ : µ ≠ 5.4

• Collect the data (5.1, 5.5, 5.6, ...)

• If X̄ = 5.8, H₀ is rejected (not close to 5.4).

• If X̄ = 5.3 or 5.5, is it close to 5.4 or not? A formal decision rule is needed.

Decision table (an error is a deviation from the actual scenario):

Actual \ Decision    Accept H₀        Accept H₁
H₀ true              ✓                type I error
H₁ true              type II error    ✓

1. Over estimation

2. Under estimation

P(type I error) = α = P(H₀ is rejected | H₀ is true)

P(type II error) = β = P(H₀ is accepted | H₀ is not true)

Objective: minimize both α and β. A simultaneous reduction in α and β is not possible, so we first fix the value of α and then minimize β.

• Rejection region of H₀

• Critical region
  H₀ : p = 0.4, H₁ : p ≠ 0.4. Collect the data; the point estimator of p is X̄. If X̄ is close to 0.4 then accept H₀, otherwise reject H₀.

Level of significance
Objective: minimize both α and β.
Testing: H₀ : µ = µ₀ v/s H₁ : µ ≠ µ₀.
Test critical region / rejection region:

C = {(X₁, X₂, ..., Xn) : |X̄ − µ₀| > c}

where c is some significant number. We fix α:

α = P(type I error) = P_{H₀}(reject H₀)
  = P_{H₀}(C)
  = P_{H₀}(|X̄ − µ₀| > c)

⇒ P_{H₀}(|X̄ − µ₀| < c) = 1 − α
⇒ P_{H₀}(−c ≤ X̄ − µ₀ ≤ c) = 1 − α
⇒ P_{H₀}( −c/(σ/√n) ≤ Z ≤ c/(σ/√n) ) = 1 − α

Setting z_{α/2} = c/(σ/√n), we get c = z_{α/2}σ/√n; for large data, σ is replaced by S.

• Decision with (1 − α)100% level of significance

• Reject H₀ if |X̄ − µ₀| > c, i.e. reject H₀ if |X̄ − µ₀| > z_{α/2}S/√n

• Accept H₀ if |X̄ − µ₀| ≤ z_{α/2}S/√n
  ⇒ µ₀ ∈ [ X̄ − z_{α/2}S/√n, X̄ + z_{α/2}S/√n ]
  ⇒ −z_{α/2} ≤ (X̄ − µ₀)/(S/√n) ≤ z_{α/2}

Method:
Step 1 - Define H₀ and H₁.

Step 2 - Fix α.

Step 3 - Define a test statistic (under H₀); reject H₀ if X̄ is far from µ₀.

Step 4 - z_cal = (X̄ − µ₀)/(S/√n); reject H₀ if |z_cal| is large, i.e. |z_cal| > z_{α/2} → reject H₀.

Example:
Data: 5.1, 5.8, 5.7, ..., 6 (n = 100), hypothesized average height µ₀ = 5.4.
H₀ : µ = 5.4, H₁ : µ ≠ 5.4, with α = 0.05 (95% level of significance).
Suppose X̄ = 5.6 and S² = 1. Then

z_cal = (5.6 − 5.4)/(1/√100) = 2

From the table, z_{α/2} = z_{0.025} = 1.96. As |z_cal| > z_{α/2}, we reject H₀ at the 95% level of significance.
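The worked example can be reproduced in a few lines. A minimal Python sketch (the helper name is my own):

```python
import math

def two_sided_z_test(xbar, mu0, s, n, z_crit=1.96):
    """Return (z_cal, reject) for H0: mu = mu0 vs H1: mu != mu0."""
    z_cal = (xbar - mu0) / (s / math.sqrt(n))
    return z_cal, abs(z_cal) > z_crit

z_cal, reject = two_sided_z_test(5.6, 5.4, 1.0, 100)  # z_cal = 2, H0 rejected
```

Since |2| > 1.96, the function reports rejection, matching the calculation above.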

β = P(type II error) = P_{H₁}(accept H₀)
  = P_{H₁}( |X̄ − µ₀| ≤ z_{α/2}S/√n )
  = P_{H₁}( −z_{α/2}S/√n ≤ X̄ − µ₀ ≤ z_{α/2}S/√n )
  = P_{H₁}( −z_{α/2}S/√n + µ₀ − µ₁ ≤ X̄ − µ₁ ≤ z_{α/2}S/√n + µ₀ − µ₁ )
  = P_{H₁}( −z_{α/2} + (µ₀ − µ₁)/(S/√n) ≤ Z ≤ z_{α/2} + (µ₀ − µ₁)/(S/√n) )

Here 1 − β defines the power of the test.

If the data set is small → replace z_{α/2} with t_{α/2, n−1}.
One sided test: H₀ : µ ≤ µ₀ vs H₁ : µ > µ₀

C = {(X₁, X₂, ..., Xn) : X̄ − µ₀ > c}, based on α:

α = P_{H₀}(C) = P_{H₀}(X̄ − µ₀ > c)
⇒ 1 − α = P_{H₀}(X̄ − µ₀ < c)

With z = (X̄ − µ₀)/(S/√n) ∼ N(0, 1),

P_{H₀}( Z ≤ c/(S/√n) ) = 1 − α  ⇒  c/(S/√n) = z_α  ⇒  c = z_α S/√n

Reject H₀ at the (1 − α)100% level of significance if X̄ − µ₀ > z_α S/√n, i.e. X̄ > µ₀ + z_α S/√n. (Use t_α in place of z_α when n < 30.)

H₀ : µ ≥ µ₀ vs H₁ : µ < µ₀
Critical region:
C = {(X₁, X₂, ..., Xn) : X̄ − µ₀ < c}, based on α:

α = P_{H₀}(C) = P(X̄ − µ₀ < c) = P( (X̄ − µ₀)/(S/√n) < c/(S/√n) ) = P( z < c/(S/√n) )

c = (S/√n)(−z_α)

If (X̄ − µ₀)/(S/√n) < −z_α then reject H₀.

Two sample tests:

µ₁ → average salary in Jodhpur, µ₂ → average salary in Mumbai,

H₀ : µ₁ = µ₂ vs H₁ : µ₁ ≠ µ₂

Given X₁, X₂, ..., Xn and Y₁, Y₂, ..., Ym are two samples, then the critical region is
C = {(X₁, ..., Xn, Y₁, ..., Ym) : |X̄ − Ȳ| > c}

α = P_{H₀}(C) = P_{H₀}(|X̄ − Ȳ| > c)
⇒ P_{H₀}(−c ≤ X̄ − Ȳ ≤ c) = 1 − α

X̄ − Ȳ ∼ N( µ₁ − µ₂, S₁²/n + S₂²/m )

⇒ Z = [ (X̄ − Ȳ) − (µ₁ − µ₂) ] / √(S₁²/n + S₂²/m) ∼ N(0, 1), where under H₀, µ₁ − µ₂ = 0.

P( −c/√(S₁²/n + S₂²/m) ≤ z ≤ c/√(S₁²/n + S₂²/m) ) = 1 − α

c = z_{α/2}√(S₁²/n + S₂²/m)

Reject H₀ at the (1 − α)100% level of significance if |X̄ − Ȳ| > c.

One sided: H₀ : µ₁ ≤ µ₂ vs H₁ : µ₁ > µ₂.

β = P_{H₁}(accept H₀)
  = P_{H₁}(|X̄ − Ȳ| ≤ c)
  = P_{H₁}(−c ≤ X̄ − Ȳ ≤ c)
  = P_{H₁}( (−c − (µ₁ − µ₂))/√(S₁²/n + S₂²/m) ≤ z ≤ (c − (µ₁ − µ₂))/√(S₁²/n + S₂²/m) )

For fixed α:
1 − α → level of significance
1 − β → power of test

Hypothesis for σ²:
H₀ : σ² = σ₀² v/s H₁ : σ² ≠ σ₀²

Since (n−1)S²/σ² ∼ χ²_{n−1}, the test statistic (under H₀) is

Y_cal = (n−1)S²/σ₀²

Compare Y_cal with χ²_{α/2, n−1} and χ²_{1−α/2, n−1}:
Reject H₀ if Y_cal > χ²_{α/2, n−1} or Y_cal < χ²_{1−α/2, n−1}.

Hypothesis: H₀ : p = p₀, H₁ : p ≠ p₀.
Critical region (for level α):
C = {(x₁, x₂, x₃, ..., xn) : |X̄ − p₀| > c}
α = P_{H₀}(|X̄ − p₀| > c)
Under H₀, X̄ ∼ N(p₀, p₀(1 − p₀)/n) approximately, so

Z = (X̄ − p₀)/√(p₀(1 − p₀)/n) ∼ N(0, 1)

P( −c/√(p₀(1−p₀)/n) ≤ (X̄ − p₀)/√(p₀(1−p₀)/n) ≤ c/√(p₀(1−p₀)/n) ) = 1 − α

c = z_{α/2}√(p₀(1 − p₀)/n)

β(p₁) = P_{H₁}(acceptance of H₀)
      = P_{H₁}(|X̄ − p₀| ≤ c)
      = P_{H₁}(−c ≤ X̄ − p₀ ≤ c)
      = P( (−c + p₀ − p₁)/√(p₁(1−p₁)/n) ≤ Z ≤ (c + p₀ − p₁)/√(p₁(1−p₁)/n) )

Z_cal = (X̄ − p₀)/√(p₀(1 − p₀)/n)

If Z_cal > z_{α/2} or Z_cal < −z_{α/2} then reject H₀.


Note: 1 − β is known as the power of the test.

Two sample test

Hypothesis: H₀ : p₁ = p₂ = p₀, H₁ : p₁ ≠ p₂.
Here two samples (X₁, ..., Xn), (Y₁, ..., Ym) from independent populations are considered. If H₀ is true, X̄ and Ȳ estimate the same proportion.
Both X̄ and Ȳ are approximately normal, and

X̄ − Ȳ ∼ N( 0, p₀(1 − p₀)(1/n + 1/m) )

Z_cal = [ (X̄ − Ȳ) − 0 ] / √( p₀(1 − p₀)(1/n + 1/m) )

Reject H₀ if Z_cal < −z_{α/2} or Z_cal > z_{α/2}.
Note: If H₀ : p₁ = p₂ = p₃ = p₄ = ... and H₁ : H₀ is not true, this is known as analysis of variance, and here the F-distribution is used.

Goodness of fit test

χ² = Σ_{i=1}^{n} (oᵢ − eᵢ)²/eᵢ ∼ χ²_{n−1}

H₀ is rejected if χ² is large, i.e. χ² > χ²_{α, n−1}.
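A minimal Python sketch of the statistic (mine; the die counts below are made-up illustration data, and 11.07 is the tabulated χ²_{0.05, 5} value):

```python
def chi_square_stat(observed, expected):
    """Pearson statistic: sum of (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 120 rolls of a die, expected 20 per face under fairness.
obs = [18, 24, 16, 20, 22, 20]
stat = chi_square_stat(obs, [20.0] * 6)  # = 2.0
# Compare with chi^2_{0.05, 5} ≈ 11.07: stat is small, so H0 (fair die) is not rejected.
```

Here df = 6 − 1 = 5, and the statistic is far below the critical value.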

Stochastic Process
A collection of random variables {Xt : t ∈ T} varying with respect to time is called a stochastic process. Here Xt is a random variable at time t.

Categories of Stochastic processes
• Discrete time - Discrete state
  For example: number of people coming to a doctor every day.

• Continuous time - Discrete state
  For example: number of calls in every hour.

• Discrete time - Continuous state
  For example: total rainfall measured at the end of each day.

• Continuous time - Continuous state

Statistical Properties
Let {Xt : t ∈ T} be a stochastic process.

1. Distribution Function:
First order: F_t(x) = P(Xt ≤ x)
Second order: F_{t₁,t₂}(x, y) = P(X_{t₁} ≤ x, X_{t₂} ≤ y).

2. Expectation:
η(t) = E(Xt) = ∫ x f_t(x) dx.

3. Autocorrelation:
R(t₁, t₂) = E(X_{t₁} · X_{t₂}).

4. Average power:
R(t, t) = E(Xt²)
Note: This is the autocorrelation at the same time.

5. Autocovariance:
C(t₁, t₂) = E(X_{t₁} · X_{t₂}) − E(X_{t₁}) · E(X_{t₂}) = R(t₁, t₂) − η(t₁) · η(t₂).

Increments of a Stochastic process

Let t₁ ≤ t₂ ≤ t₃ ≤ ...; then (X_{t₂} − X_{t₁}), (X_{t₃} − X_{t₂}), ... are known as the increments.

• Independent increments
If (X_{t₂} − X_{t₁}), (X_{t₃} − X_{t₂}), ... are all independent, then we say the stochastic process has independent increments.

• Stationary increments
If the distribution of (X_{t+h} − X_t) does not depend on t but depends only on h, i.e. increments over intervals of the same length follow the same distribution, then we say the stochastic process has stationary increments.
For example: distribution of (X₁₅ − X₁₀) = distribution of (X₈ − X₃).

Counting process
A stochastic process {Nt : t ≥ 0} is said to be a counting process if Nt is the number of events occurring in the time interval (0, t].

Poisson process
A counting process {Nt : t ≥ 0} with N₀ = 0 is said to be a Poisson process if

(a) The increments are independent.

(b) The increments are stationary with N_{t+s} − N_s ∼ P(λt), where λ is the arrival rate:

P(N_{t+s} − N_s = r) = ((λt)^r / r!) e^{−λt}

Interarrival time
The time between two arrivals is said to be the interarrival time.
T₁ denotes the time of occurrence of the first event.
T₂ denotes the time between the first and second events.
Tn denotes the time between the (n − 1)th and nth events.
P(T₁ > t) denotes the probability that the first arrival takes more than time t:
P(T₁ > t) = P(no arrival in (0, t]) = P(N(t) = 0) = e^{−λt}.
The interarrival times follow the exponential distribution with mean 1/λ.
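The link "exponential interarrivals ⇒ Poisson counts" can be simulated. A minimal Python sketch (my own, not from the notes):

```python
import random

def poisson_process_count(lam, t, rng):
    """Number of arrivals in (0, t] when interarrival times are Exp(lam)."""
    count, clock = 0, 0.0
    while True:
        clock += rng.expovariate(lam)  # mean interarrival time 1/lam
        if clock > t:
            return count
        count += 1

rng = random.Random(3)
counts = [poisson_process_count(2.0, 5.0, rng) for _ in range(20_000)]
avg = sum(counts) / len(counts)  # should be near lam * t = 10
```

With λ = 2 and t = 5, the count N_t has mean λt = 10, which the empirical average reproduces.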

Markov process
A stochastic process {Xt : t ≥ 0} is said to be a Markov process if

P(X_{n+1} = a_{n+1} | X₀ = a₀, X₁ = a₁, ..., Xn = an) = P(X_{n+1} = a_{n+1} | Xn = an)

Transition Probability Matrix

P = | p₁₁ p₁₂ p₁₃ |
    | p₂₁ p₂₂ p₂₃ |
    | p₃₁ p₃₂ p₃₃ |

Here p_{ij} = P(X₁ = j | X₀ = i) denotes the probability of moving from state i to state j.
