Notes
• Axiomatic approach
• Classical approach
2 Random Experiment
It is an experiment in which all possible outcomes are known in advance, the outcome of any particular trial cannot be predicted with certainty, and the experiment can be repeated under identical conditions.
3 Sample Space
It is a pair (S, Ω) where
• S is the set of all possible outcomes, and
• Ω is a σ-field of subsets of S.
Example:-
If we toss a coin, then
S = {H, T }
and
Ω = {{H}, {T }, {H, T }, ϕ}
4 Operations on sets
For A, B ⊆ S
Event      Set
A or B     A ∪ B
A and B    A ∩ B
not A      A^C
5 De Morgan's Laws
1. (A ∪ B)^C = A^C ∩ B^C
2. (A ∩ B)^C = A^C ∪ B^C
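The two laws can be checked mechanically with Python's built-in set operations; the universe S and the sets A, B below are arbitrary choices for illustration.

```python
# Verify De Morgan's laws on a small, arbitrary universe using Python sets.
S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {3, 4, 5}

def complement(E):
    """Complement of E relative to the universe S, i.e. E^C."""
    return S - E

law1 = complement(A | B) == complement(A) & complement(B)  # (A ∪ B)^C = A^C ∩ B^C
law2 = complement(A & B) == complement(A) | complement(B)  # (A ∩ B)^C = A^C ∪ B^C
print(law1, law2)  # True True
```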
6 Sigma Field
Let S be a set. A collection Ω of subsets of S is called a σ-field if it satisfies the following conditions:-
1. ϕ ∈ Ω.
2. If A ∈ Ω, then AC ∈ Ω
3. If A, B ∈ Ω, then A ∪ B ∈ Ω
Example:-
S = {1, 2, 3, 4, 5, 6}
If A = {1, 2, 3}
B = {4, 5}
then
Ω = {ϕ, S, A, A^C , B, B^C , A ∪ B, A^C ∩ B^C }
7 Probability
Let S be a sample space & Ω be the sigma field generated by S; then probability is a function
P : Ω → [0, 1] satisfying:
1. 0 ⩽ P (A) ⩽ 1; ∀A ∈ Ω
2. P (S) = 1
3. For pairwise disjoint events A1 , A2 , . . . , An ∈ Ω,
P ( ⋃_{i=1}^{n} Ai ) = Σ_{i=1}^{n} P (Ai )
7.1 Properties
1. P (A^C ) = 1 − P (A)
2. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
3. P (A∪B ∪C) = P (A)+P (B)+P (C)−P (A∩B)−P (B ∩C)−P (C ∩A)+P (A∩B ∩C)
Remark:- The triple (S, Ω, P ) is called probability space.
8 Conditional Probability
Let (S, Ω, P ) be a probability space and let H ∈ Ω with P (H) > 0; then for an arbitrary
A ∈ Ω, the quantity
P (A|H) = P (A ∩ H) / P (H)
is called the conditional probability of A given H.
9 Bayes Theorem
Let S be a sample space and B1 , B2 , . . . , Bn be a partition of S; then for any A ⊆ S with P (A) > 0,
P (Bi |A) = P (Bi ) · P (A|Bi ) / Σ_{j=1}^{n} P (A|Bj ) · P (Bj )
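As a quick numerical sketch of the theorem (the priors and likelihoods below are made-up numbers, not from the notes): two machines B1, B2 produce items, and A is the event "item is defective".

```python
# Hypothetical two-machine example of Bayes' theorem.
priors = [0.6, 0.4]          # P(B1), P(B2): assumed values for illustration
likelihoods = [0.01, 0.03]   # P(A|B1), P(A|B2): assumed defect rates

# Denominator: total probability P(A) = sum_j P(A|Bj) P(Bj)
p_a = sum(p * l for p, l in zip(priors, likelihoods))
# Posteriors P(Bi|A) = P(Bi) P(A|Bi) / P(A)
posteriors = [p * l / p_a for p, l in zip(priors, likelihoods)]
print(posteriors)  # [0.333..., 0.666...]
```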
10 Independent Events
Two events, A and B are said to be independent if
P (A ∩ B) = P (A) · P (B)
Remarks:-
1. If A and B are independent, then
• AC and B are independent
• A and B C are independent
• AC and B C are independent.
2. If A, B, and C are independent, then
• P (A ∩ B ∩ C) = P (A) · P (B) · P (C)
• P (A ∩ B) = P (A) · P (B)
• P (B ∩ C) = P (B) · P (C)
• P (A ∩ C) = P (A) · P (C)
Conditions (2), (3), (4) do NOT imply (1).
11 Random Variable
Definition:- A random variable X is a function that maps elements of the sample space
to real numbers.
Example:- In tossing of coin twice: If X : no. of heads, then X can have values 0, 1, 2.
X Events
0 TT
1 HT, T H
2 HH
Here {0, 1, 2} is called Support of Random variable X.
Let X be a random variable having support {x1 , x2 , . . . , xn } with
P (X = xi ) = pi
then the probability distribution of X is represented as
X x1 x2 ··· xn
p(x) p1 p2 ··· pn
provided
Σ_{i=1}^{n} pi = 1; pi ⩾ 0.
The function p(x) giving these pi 's is called the Probability Mass Function (PMF).
Example:- Getting first head in repeated tossing of a coin.
S = {H, T H, T T H, T T T H, . . .}
X = Number of tosses to get first head.
x      1    2    3    4     . . .
P (x)  1/2  1/4  1/8  1/16  . . .
Σ_{i=1}^{∞} pi = 1/2 + 1/4 + 1/8 + · · · = (1/2)/(1 − 1/2) = 1
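The geometric sum above can be checked exactly with `fractions.Fraction` (a sketch; the cut-off at 50 terms is arbitrary):

```python
from fractions import Fraction

# Partial sums of P(X = k) = (1/2)^k, k = 1, 2, ..., approach 1.
partial = sum(Fraction(1, 2**k) for k in range(1, 51))
print(float(partial))  # just short of 1, by exactly 2^-50
```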
13 Cumulative Distribution Function (CDF)
A CDF FX of random variable X is defined as:-
FX (x) = P (X ⩽ x); ∀x ∈ R
For example if
x      0    1    2
P (x)  1/4  1/2  1/4
then
FX (−3) = P (X ⩽ −3) = 0
FX (0.5) = P (X ⩽ 0.5) = P (X = 0) = 1/4
FX (1) = P (X ⩽ 1) = P (X = 0) + P (X = 1) = 3/4
FX (2) = P (X ⩽ 2) = 1
13.1 Properties
• FX (−∞) = P (X ≤ −∞) = 0
• FX (∞) = P (X ≤ ∞) = 1
• FX is right continuous: FX (a) = FX (a^+ )
• FX is non-decreasing: x1 ⩽ x2 ⇒ FX (x1 ) ⩽ FX (x2 )
• Note that we can write
P (X = a) = FX (a) − FX (a^− )
Example:-
FX (x) = 0,    x < 0
         1/4,  0 ⩽ x ⩽ 1
         3/4,  1 < x < 2
         1,    x ⩾ 2
It is NOT a CDF, as
FX (1) ̸= FX (1^+ )
13.2 Continuous Random Variable
If the CDF FX (x) of a random variable X is continuous, then X is called a continuous
random variable.
• For a continuous random variable, P (X = a) = 0 for every a. For example:-
FX (x) = 0,  x < 0
         x,  0 ⩽ x < 1
         1,  x ⩾ 1
• In such a case
P (a < X < b) = P (a ⩽ X < b) = P (a < X ⩽ b) = P (a ⩽ X ⩽ b)
15 Moments
15.1 Expectation of X (1st Moment, Mean/Average)
E(X) = Σ_{i=1}^{n} xi pi
Ex:- X = outcome of rolling a die
X:  1    2    3    4    5    6
P:  1/6  1/6  1/6  1/6  1/6  1/6
E(X) = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + · · · + 6 · 1/6 = 3.5
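A minimal computation of this expectation, using exact rational arithmetic:

```python
from fractions import Fraction

support = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)                  # fair die: each face has probability 1/6
EX = sum(x * p for x in support)    # E(X) = sum of x_i * p_i
print(EX)  # 7/2
```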
• For a countably infinite support:
E(X) = Σ_{i=1}^{∞} xi pi
• For a function g of X with
X:  x1  x2  · · ·  xn
p:  p1  p2  · · ·  pn
E[g(X)] = Σ_{i=1}^{n} g(xi ) pi .
Ex:
X :  −1   0    1
p :  1/3  1/3  1/3
E(X) = (−1) · 1/3 + 0 · 1/3 + 1 · 1/3 = 0
Now find E(X^2 ).
Let Y = X^2
Y :  0    1
p :  1/3  2/3
E(Y ) = 0 · 1/3 + 1 · 2/3 = 2/3
On the other hand we can find it directly as
E(X^2 ) = Σ_{i=1}^{n} xi^2 pi
More generally, the r-th moment about a point a is
E[(X − a)^r ] = Σ_{i=1}^{n} (xi − a)^r pi
Case 2: Second Moment about mean
E[(X − E(X))^2 ] = Σ_{i=1}^{n} [xi − E(X)]^2 pi
The term E[(X − E(X))^2 ] is called the variance of X (a measure of uncertainty) and denoted
as Var(X). Writing µ = E(X),
Var(X) = E[(X − µ)^2 ] = Σ_{i=1}^{n} (xi − µ)^2 pi
Var(X) = Σ xi^2 pi − 2µ Σ xi pi + µ^2 Σ pi
Var(X) = E(X^2 ) − [E(X)]^2
Var(X) ≥ 0
E(X − µ) = 0
E(|X − µ|) → Mean Deviation
It is better to work with (X − µ)^2 instead of |X − µ| because it has more analytical properties and is easier to handle.
Var(X) = σ^2 = E[(X − µ)^2 ] = E(X^2 ) − (E(X))^2
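Both variance formulas can be checked against each other on the small distribution used earlier (X = −1, 0, 1 with equal probabilities):

```python
from fractions import Fraction

xs = [-1, 0, 1]
ps = [Fraction(1, 3)] * 3
mu = sum(x * p for x, p in zip(xs, ps))                    # E(X) = 0
var_def = sum((x - mu)**2 * p for x, p in zip(xs, ps))     # E[(X - mu)^2]
var_alt = sum(x**2 * p for x, p in zip(xs, ps)) - mu**2    # E(X^2) - [E(X)]^2
print(var_def, var_alt)  # 2/3 2/3
```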
Moment Generating Function (MGF):
MX (t) = E(e^{Xt} ) = Σ_i e^{xi t} · pi
Ex:
X :  −1   0    1    2
P :  1/6  1/3  1/3  1/6
MX (t) = e^{−t} · 1/6 + e^0 · 1/3 + e^t · 1/3 + e^{2t} · 1/6
• E(aX + b) = aE(X) + b
• E(a) = a
• Var(aX + b) = a^2 Var(X)
• MX (t) = E(e^{Xt} )
• MX (t) = E(1 + Xt + (X^2 /2!) t^2 + . . . )
• MX (t) = 1 + tE(X) + (t^2 /2!) E(X^2 ) + . . .
• (d/dt) MX (t) = E(X) + t · E(X^2 ) + . . .
• E(X) = (d/dt) [MX (t)] |_{t=0}
• E(X^2 ) = (d^2 /dt^2 ) [MX (t)] |_{t=0}
• E(X^r ) = (d^r /dt^r ) [MX (t)] |_{t=0}
If X is continuous, then
FX (a) = FX (a^+ ) = FX (a^− )
P (X = a) = 0.
16 Probability Density Function
Let X be a continuous random variable with CDF FX ; then fX is called the pdf of X if
FX (x) = ∫_{−∞}^{x} fX (t)dt
P (X = a) = FX (a) − FX (a^− ) = ∫_{−∞}^{a} fX (x)dx − ∫_{−∞}^{a^−} fX (x)dx = 0
Both integrations are the same, so the point probability is zero.
Characteristics of fX :−
1) fX (a pdf) should be non-negative.
2) ∫_{−∞}^{∞} fX (x)dx = 1, since FX (∞) = 1
Example: For the pdf fX (x) = 3x^2 , 0 < x < 1,
P (1/2 < X < 3/4) = ∫_{1/2}^{3/4} 3x^2 dx = [x^3 ]_{1/2}^{3/4} = 27/64 − 8/64 = 19/64. Ans.
In general,
FX (a) = ∫_{−∞}^{a} fX (x)dx
∫_{−∞}^{a^+} fX (x)dx = ∫_{−∞}^{a} fX (x)dx = ∫_{−∞}^{a^−} fX (x)dx
Example: Let fX (x) = kx for 0 < x < 2 (and 0 otherwise). Then
∫_{−∞}^{∞} kx dx = 1 ⇒ ∫_{0}^{2} kx dx = 1 ⇒ (k/2)[x^2 ]_0^2 = 1 ⇒ k[4 − 0]/2 = 1 ⇒ k = 1/2
So, we have
fX (x) = x/2,  0 < x < 2
         0,    otherwise
(i) P (−1 < X < 1) = ∫_{0}^{1} (x/2) dx = 1/4.
(ii)
E(X) = ∫_{−∞}^{∞} x fX (x)dx = ∫_{0}^{2} x · (x/2) dx = ∫_{0}^{2} (x^2 /2) dx = (1/6)[x^3 ]_0^2
E(X) = 4/3
(iii)
Var(X) = E(X^2 ) − [E(X)]^2 = ∫_{0}^{2} x^2 · (x/2) dx − 16/9 = (1/8)[x^4 ]_0^2 − 16/9 = 2 − 16/9
Var(X) = 2/9
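These integrals can be sanity-checked numerically with a simple midpoint Riemann sum (no external libraries assumed):

```python
# Numeric check of E(X) = 4/3 and Var(X) = 2/9 for f(x) = x/2 on (0, 2).
N = 200_000
dx = 2 / N
mids = [(i + 0.5) * dx for i in range(N)]   # midpoints of the subintervals

f = lambda x: x / 2
EX = sum(x * f(x) * dx for x in mids)
EX2 = sum(x * x * f(x) * dx for x in mids)
var = EX2 - EX**2
print(round(EX, 4), round(var, 4))  # 1.3333 0.2222
```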
Characteristic function:
ϕX (t) = E(e^{iXt} ) = ∫_{−∞}^{∞} e^{ixt} fX (x)dx
|ϕX (t)| ≤ ∫_{−∞}^{∞} |e^{ixt} | |fX (x)| dx = ∫_{−∞}^{∞} fX (x)dx = 1
ϕX (t) = MX (it)
Inverse Fourier Transformation:
fX (x) = (1/2π) ∫_{−∞}^{∞} e^{−ixt} ϕX (t)dt
ϕX (t) ←→ fX (x)
MX (it) = ϕX (t) = E(e^{iXt} ) = 1 + (it)E(X) + ((it)^2 /2!) E(X^2 ) + · · ·
E(X) = (1/i) (d/dt) ϕX (t) |_{t=0}
• MX (t) = E eXt
• MX (0) = 1
17 Functions of Random Variables:
Let X have the probability distribution
X :   −1   0    1    2
pX :  1/3  1/6  1/6  1/3
(A naive substitution into a candidate fY (y) need not be correct: if the integral of fY (y) is not equal to 1, it is not a pdf.) So we go through the CDF:
FX (x) = ∫_{−∞}^{x} fX (t)dt
By the Leibniz rule we have
(d/dx) ∫_{g1 (x)}^{g2 (x)} fX (t)dt = fX (g2 (x)) g2′ (x) − fX (g1 (x)) g1′ (x)
∴ (d/dx)(FX (x)) = fX (x)
Now we will first find FY (y) and from it we will find fY (y).
For Y = g(X) with g monotonically increasing,
FY (y) = P (Y ≤ y) = P (g(X) ≤ y) = P (X ≤ g^{−1} (y)) = FX (g^{−1} (y))
fY (y) = (d/dy) FY (y)
∴ fY (y) = fX (g^{−1} (y)) · (d/dy) g^{−1} (y)
Example: Let X ∼ U (0, 1) and Y = e^X ; find fY (y).
Solution:
FY (y) = P (Y ≤ y)
= P (e^X ≤ y)   (we can invert here because e^x is monotonically increasing)
= P (X ≤ ln y) = FX (ln y)
fY (y) = fX (ln y) (d/dy)(ln y)
fY (y) = 1/y ;  1 < y < e.
By formula:
fY (y) = fX (g^{−1} (y)) (d/dy)(g^{−1} (y)) = fX (ln y) · (1/y) = 1/y ;  1 < y < e,
as fX (ln y) = 1.
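A quick Monte Carlo check of this result (a sketch): if X ∼ U(0, 1) and Y = e^X, then FY(y) = ln y on (1, e), so the empirical CDF at y = 2 should be close to ln 2.

```python
import math
import random

random.seed(0)
n = 200_000
ys = [math.exp(random.random()) for _ in range(n)]  # X ~ U(0,1), Y = e^X

y0 = 2.0
empirical_cdf = sum(y <= y0 for y in ys) / n
print(round(empirical_cdf, 3), round(math.log(y0), 3))  # both ≈ 0.693
```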
Theorem. Let X be a continuous random variable with pdf fX (x) and Y = g(X) be a
monotonic and differentiable function, then the pdf of Y is given by,
fY (y) = fX (g^{−1} (y)) | (d/dy)(g^{−1} (y)) | ,  y ∈ S(Y ).
Exercise. Let X be a random variable having pdf given by
fX (x) = 1,  0 < x < 1
         0,  otherwise
and
gX (x) = 1/2,  −1 < x < 1
         0,    otherwise
Find the density of Y = X^2 in each case.
Bernoulli Distribution: A Bernoulli trial is an experiment with two possible outcomes:
success (1) or failure (0) with probabilities p and 1 − p, respectively. Mathematically, we
denote it as P (X = 1) = p and P (X = 0) = 1 − p with 0 < p < 1. Then X ∼ Bernoulli(p).
Note:
1. i.i.d. → independent and identically distributed.
2. For X ∼ Binomial(n, p): E[X] = Σ_{x=0}^{n} x P (X = x) = Σ_{x=0}^{n} x C(n, x) p^x (1 − p)^{n−x} = np.
Now, as n approaches ∞ (with np = λ held fixed), we arrive at the following expression:
P (X = k) = e^{−λ} λ^k / k! ,  k = 0, 1, 2, ...
Note:
1. E[X^k ] = MX^{(k)} (t)|_{t=0} where MX (t) = E[e^{Xt} ].
2. MX (t) = e^{λ(e^t − 1)} .
3. MX′ (t) = λe^t e^{λ(e^t − 1)} . Thus, MX′ (0) = E[X] = λ.
4. MX′′ (0) = E[X^2 ] = λ^2 + λ.
Theorem. Let X ∼ P oisson(λ) and Y ∼ P oisson(µ) and both are independent, then
X + Y ∼ P oisson(λ + µ).
Geometric Distribution:
Definition 1: Let the random variable X count the number of failures before getting the
first success in a sequence of Bernoulli trials with success probability p. Then X is said to
have a geometric distribution, denoted by X ∼ Geo(p), if its probability mass function is
given by
P (X = k) = (1 − p)^k p,  k = 0, 1, 2, · · · .
Definition 2: Let the random variable X count the number of trials to get the first success
in a sequence of Bernoulli trials with success probability p. Then X ∼ Geo(p) if its
probability mass function is given by
P (X = k) = (1 − p)^{k−1} p,  k = 1, 2, 3, · · · .
Negative Binomial Distribution:
Definition 1: Let the random variable X count the number of failures before getting the
r-th success in a sequence of Bernoulli trials with success probability p. Then X is said to
have a negative binomial distribution, denoted by X ∼ N B(r, p), if its probability mass
function is given by
P (X = k) = C(k + r − 1, r − 1) p^r (1 − p)^k ,  k = 0, 1, 2, · · · .
Definition 2: Let the random variable X count the number of trials to get the r-th success
in a sequence of Bernoulli trials with success probability p. Then X ∼ N B(r, p) if its
probability mass function is given by
P (X = k) = C(k − 1, r − 1) p^r (1 − p)^{k−r} ,  k = r, r + 1, · · · .
Theorem. Let X1 , X2 , · · · , Xr be i.i.d random variables with each Xi ∼ Geo(p). Define
X = X1 + X2 + · · · + Xr ; then X ∼ N B(r, p).
Coupon collector problem: If each box of a brand of cereals contains a coupon and there
are n different types of coupons. Assume that each time you collect a coupon, it is equally
likely to be any of the n types. What is the expected number of coupons needed until you
have a complete set?
Solution: Let N be the total number of coupons that you need to get all n types. Write
N = N1 + N2 + · · · + Nn , where Ni is the number of coupons you need to get the i-th
new type. Also, Ni − 1 ∼ Geo((n − (i − 1))/n), so E[Ni ] = n/(n − (i − 1)).
E[N ] = E[N1 ] + E[N2 ] + · · · + E[Nn ] = 1 + n/(n − 1) + n/(n − 2) + · · · + n/1.
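A direct simulation of the coupon collector problem, compared against E[N] = 1 + n/(n−1) + ... + n/1 (the choice n = 6 is arbitrary):

```python
import random

random.seed(1)

def draws_to_complete(n):
    """Number of coupons drawn until all n types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))   # each type equally likely
        draws += 1
    return draws

n = 6
expected = sum(n / k for k in range(1, n + 1))   # = 1 + n/(n-1) + ... + n/1 = 14.7
reps = 20_000
avg = sum(draws_to_complete(n) for _ in range(reps)) / reps
print(round(expected, 2), round(avg, 2))  # 14.7 vs a nearby simulated value
```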
Uniform Distribution: A random variable X is said to have uniform distribution over an
interval (a, b) if its probability density function is given by,
1
fX (x) = , a < x < b.
b−a
Theorem. Let X be a random variable with cumulative distribution function (CDF) F (x),
then Y = F (X) ∼ U (0, 1).
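The theorem (the probability integral transform) can be illustrated by simulation; the sketch below uses X ∼ Exp(1), whose CDF is F(x) = 1 − e^{−x}:

```python
import math
import random

random.seed(2)
n = 100_000
# Draw X ~ Exp(1) and apply its own CDF: Y = F(X) = 1 - e^{-X}.
ys = [1 - math.exp(-random.expovariate(1.0)) for _ in range(n)]

mean_y = sum(ys) / n
frac_below_half = sum(y <= 0.5 for y in ys) / n
print(round(mean_y, 3), round(frac_below_half, 3))  # both ≈ 0.5, as for U(0, 1)
```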
Gamma Distribution: Let X be a random variable having density
fX (x) = (1/(Γ(α) β^α )) x^{α−1} e^{−x/β} ,  x > 0,
then X is said to follow a Gamma distribution with positive parameters α and β. It is
denoted by X ∼ G(α, β).
Note:
1. Γ(n) = ∫_{0}^{∞} x^{n−1} e^{−x} dx, where n is a positive integer.
2. Γ(1/2) = √π.
3. Γ(n) = (n − 1)Γ(n − 1), n > 1.
4. Γ(n) = (n − 1)! for positive integer n.
5. Γ(n)/a^n = ∫_{0}^{∞} x^{n−1} e^{−ax} dx.
E(X) = ∫_{0}^{∞} x · (1/(Γ(α) β^α )) x^{α−1} e^{−x/β} dx = (1/(Γ(α) β^α )) Γ(α + 1) β^{α+1} = αβ
Var(X) = αβ^2
Exponential Distribution: If we put α = 1 in the Gamma density above, then we get
fX (x) = (1/β) e^{−x/β} ,  x > 0,
which is the exponential distribution.
Proof. (Memorylessness of the exponential distribution)
P (X > m + n|X > m) = P (X > m + n, X > m)/P (X > m)
= P (X > m + n)/P (X > m)
= (1 − FX (m + n))/(1 − FX (m))
= (1 − (1 − e^{−(m+n)/β} ))/(1 − (1 − e^{−m/β} ))
= e^{−n/β}
= 1 − FX (n)
= P (X > n)
Chi-square Distribution (χ²_n): a random variable X with density
fX (x) = (1/(Γ(n/2) 2^{n/2} )) x^{n/2 − 1} e^{−x/2} ,  x > 0
E(X) = n,  Var(X) = 2n
Proof. (MGF of X ∼ N (µ, σ^2 ))
MX (t) = E(e^{xt} ) = (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{xt} e^{−(1/2)((x−µ)/σ)^2 } dx
= (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(1/(2σ^2 ))(x^2 − 2xµ + µ^2 − 2σ^2 xt)} dx
= (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(1/(2σ^2 ))[x^2 − 2x(µ + σ^2 t) + (µ + σ^2 t)^2 + µ^2 − (µ + σ^2 t)^2 ]} dx
= e^{−(1/(2σ^2 ))[µ^2 − (µ + σ^2 t)^2 ]} · (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{−(1/(2σ^2 ))[x − (µ + σ^2 t)]^2 } dx
= e^{µt + (1/2)σ^2 t^2 }
since the remaining integral is that of a normal density and equals 1.
2. If Z = (X − µ)/σ, then Z ∼ N (0, 1).
3. aX + b ∼ N (aµ + b, a^2 σ^2 ).
7. E(Z^4 ) = 3, so the kurtosis is 3.
Proof (of 3). Let Y = aX + b. Then
MY (t) = E(e^{(aX+b)t} ) = e^{bt} E(e^{aXt} ) = e^{bt} MX (at) = e^{bt} e^{µat + (1/2)σ^2 a^2 t^2 } = e^{(aµ+b)t + (1/2)(σ^2 a^2 )t^2 }
so aX + b ∼ N (aµ + b, a^2 σ^2 ).
Example: If X ∼ N (5.3, 1) and Y ∼ N (5.5, 1.5), then
(X − 5.3)/1 ∼ N (0, 1) and (Y − 5.5)/√1.5 ∼ N (0, 1).
Moment Inequalities:
Theorem. Let h be a non-negative function of a random variable X such that E(h(X)) exists; then for any ϵ > 0,
P (h(X) ≥ ϵ) ≤ E(h(X))/ϵ
Proof.
E(h(X)) = ∫_{−∞}^{∞} h(x)f (x)dx, where f (x) is the pdf of X
Restricting the integral to A = {x : h(x) ≥ ϵ},
E(h(X)) ≥ ∫_A h(x)f (x)dx ≥ ϵ ∫_A f (x)dx
∫_A f (x)dx ≤ E(h(X))/ϵ
P (h(X) ≥ ϵ) ≤ E(h(X))/ϵ
Markov's Inequality:
P (|X| ≥ k) ≤ E(|X|^r )/k^r
Proof. Using the previous result, let h(x) = |x|^r and ϵ = k^r ; then
P (|X|^r ≥ k^r ) ≤ E(|X|^r )/k^r
P (|X| ≥ k) ≤ E(|X|^r )/k^r
Example: With r = 2 and E(|X|^2 ) = 2,
P (|X| ≥ 3) ≤ E(|X|^2 )/3^2 = 2/9
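The inequality can be checked by simulation (a sketch; the standard normal here is just a convenient test distribution with E(X²) = 1):

```python
import random

random.seed(3)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]

k, r = 2.0, 2
lhs = sum(abs(x) >= k for x in xs) / n        # empirical P(|X| >= k)
rhs = sum(x**r for x in xs) / n / k**r        # empirical E(|X|^r) / k^r
print(round(lhs, 3), round(rhs, 3))           # lhs is well below rhs ≈ 0.25
```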
Chebyshev's Inequality: Let X be a random variable with E(X) = µ and Var(X) = σ^2 ; then
P (|X − µ| ≥ kσ) ≤ 1/k^2
or, equivalently,
P (|X − µ| ≤ kσ) ≥ 1 − 1/k^2
Joint Probability Distribution: Let X and Y be random variables with supports {x1 , x2 , . . . , xn }
and {y1 , y2 , . . . , ym }; then the joint probability distribution can be written as the table T :
       y1    y2    y3    ...   ym
x1     p11   p12   p13   ...   p1m
x2     p21   p22   p23   ...   p2m
x3     p31   p32   p33   ...   p3m
...    ...   ...   ...   ...   ...
xn     pn1   pn2   pn3   ...   pnm
where pij = P (X = xi , Y = yj ), i ∈ {1, 2, ..., n}, j ∈ {1, 2, ..., m}, and pij ∈ [0, 1].
Σ_i Σ_j pij = 1
p_{i+} = Σ_j pij = marginal PMF of X.
p_{+j} = Σ_i pij = marginal PMF of Y.
Marginal pdf (continuous case):
fX (x) = ∫_Y fXY (x, y)dy
fY (y) = ∫_X fXY (x, y)dx
Conditional pdf:
fX|Y (x|y) = fXY (x, y)/fY (y)
Conditional Expectation:
E(X|Y ) = ∫_X x fX|Y (x|y)dx
E(Y |X) = ∫_Y y fY |X (y|x)dy
Expectation of a function g(X, Y ):
E(g(X, Y )) = ∫∫ g(x, y)fXY (x, y)dx dy
Covariance of X and Y:
Cov(X, Y ) = E[(X − E(X))(Y − E(Y ))] = E[(X − X̄)(Y − Ȳ )] = E(XY ) − E(X)E(Y )
If we replace Y by X then we get
Cov(X, X) = Var(X)
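A small simulation check of Cov(X, Y) = E(XY) − E(X)E(Y); the linear model Y = 2X + noise below is an assumed toy example, so Cov(X, Y) ≈ 2·Var(X) = 2:

```python
import random

random.seed(4)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2 * x + random.gauss(0, 0.5) for x in xs]   # toy model: Y = 2X + noise

mx, my = sum(xs) / n, sum(ys) / n
cov_def = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n   # definition
cov_alt = sum(x * y for x, y in zip(xs, ys)) / n - mx * my       # E(XY) - E(X)E(Y)
print(round(cov_def, 2))  # ≈ 2
```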
Note:
Correlation Coefficient:
ρ_{XY} = Cov(X, Y ) / √(Var(X) Var(Y ))
Question: For
fXY (x, y) = ke^{−(2x+y)} ,  x > 0, y > 0
find
1. P (X < 1, Y < 1)
2. P (X < Y )
4. Cov(X, Y )
5. ρ_{XY}
Predictive Model:
Amount of Rain (X) ⇌ Amount of crops (Y )
Ŷ = H(X)
Input (Given) → System → Output (Predicted)
Exercise: Are random variables X and Y related, where fXY (x, y) = kxy for x > 0, y > 0, x + y < 1?
Solution:
∫_0^1 ∫_0^{1−y} kxy dx dy = 1
(k/2) ∫_0^1 y(1 − y)^2 dy = 1
(k/2) [ y^2 /2 − 2y^3 /3 + y^4 /4 ]_0^1 = 1
k = 24
Also
fX (x) = ∫_0^{1−x} 24xy dy = 12x(1 − x)^2 ,  0 < x < 1
which gives
fY |X (y|x) = 2y/(1 − x)^2 ,  0 < y < 1 − x
Also E(X) = 2/5, E(Y ) = 2/5, E(X^2 ) = 1/5, E(Y^2 ) = 1/5 and E(XY ) = 2/15, which gives
E(Y |X) = ∫ y fY |X (y|x)dy = ∫_0^{1−x} (2y^2 /(1 − x)^2 ) dy = (2/3)(1 − x).
E(Y |X = x) = (2/3)(1 − x)   (Regression)
Y_p = (2/3)(1 − x)   (Regression)
Covariance Matrix (Σ):
Σ = [ Var(X)      Cov(X, Y ) ]
    [ Cov(Y, X)   Var(Y )    ]
  = [ 1/25    −2/75 ]
    [ −2/75   1/25  ]   (positive semidefinite and symmetric)
Independence of X and Y -
• If X and Y are independent, then
P (X = x, Y = y) = P (X = x)P (Y = y) ∀ x, y.
• If dependent, a least-squares line Ŷ = α + βX can be fitted; the normal equations are
∂E/∂α = 0 =⇒ Σ Yi = αn + β Σ Xi
∂E/∂β = 0 =⇒ Σ Xi Yi = α Σ Xi + β Σ Xi^2
Answer: Dependent.
Question: If X and Y are independent then find
1. Cov(X,Y)
2. ρ
3. E(XY )
4. Var(X+Y)
Solution:
1. Cov(X,Y)=0,
2. ρ = 0,
3. E(XY)=E(X)E(Y),
4. Var(X+Y)=Var(X)+Var(Y).
Note: X and Y are independent iff the joint density factorizes:
fXY (x, y) = fX (x)fY (y)
and similarly
fXZ (x, z) = fX (x)fZ (z)
fX1 ,X2 ,...,Xn (x1 , x2 , ..., xn ) = fX1 (x1 )fX2 (x2 )...fXn (xn ) ∀ x1 , x2 , ..., xn .
Solution: Since fXY (x, y) = h(x)g(y), where h(x) = kx and g(y) = e^{−y} , X and Y are independent.
Question: If X and Y are independent and X ∼ N (0, 1) and Y ∼ N (0, 1), then what is the
distribution of X + Y ?
Solution: Given that
MX (t) = e^{t^2 /2} ,  MY (t) = e^{t^2 /2}
Also
MX (t)MY (t) = e^{t^2 } = M_{X+Y} (t)
=⇒ X + Y ∼ N (0, 2).
Question: If X and Y are independent and X ∼ N (1, 4) and Y ∼ N (2, 9), then what is the
distribution of 2X + 3Y − 1?
Solution: Given that
M_{2X+3Y−1} (t) = E(e^{(2X+3Y−1)t} ) = E(e^{2Xt} )E(e^{3Yt} )e^{−t} = MX (2t)MY (3t)e^{−t} .
• In general, if X ∼ N (α, β^2 ) and Y ∼ N (γ, δ^2 ) and X, Y are independent, then
aX + bY + c ∼ N (aα + bγ + c, a^2 β^2 + b^2 δ^2 ).
Question: If X ∼ P (λ1 ) and Y ∼ P (λ2 ) and X and Y are independent then what is the
distribution of X + Y ?
Solution: X + Y ∼ P (λ1 + λ2 ).
Question: If X ∼ U (0, 1) and Y ∼ U (0, 1) and X and Y are independent, then what is
the distribution of X + Y ?
Solution:
MX (t) = (e^t − 1)/t
M_{X+Y} (t) = (e^t − 1)^2 /t^2   (not the MGF of any standard named distribution)
Lemma: Given that X and Y have joint density fXY (x, y) and X and Y are independent,
what is the distribution of X + Y ?
Proof: Let U = X + Y .
FU (u) = P (U ≤ u) = P (X + Y ≤ u)
= ∫∫_{x+y≤u} fXY (x, y)dx dy
= ∫_{−∞}^{∞} ∫_{−∞}^{u−y} fX (x)fY (y)dx dy
= ∫_{−∞}^{∞} ( ∫_{−∞}^{u−y} fX (x)dx ) fY (y)dy
= ∫_{−∞}^{∞} FX (u − y)fY (y)dy
=⇒ fU (u) = ∫_{−∞}^{∞} fX (u − y)fY (y)dy.   (convolution)
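For two independent U(0, 1) variables the convolution gives a triangular density on (0, 2); in particular P(U ≤ 1) = ∫₀¹ u du = 1/2, which a simulation reproduces:

```python
import random

random.seed(5)
n = 200_000
us = [random.random() + random.random() for _ in range(n)]  # U = X + Y, X, Y ~ U(0,1)

emp = sum(u <= 1 for u in us) / n
print(round(emp, 3))  # ≈ 0.5
```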
Question: If X ∼ Exp(λ1 ) and Y ∼ Exp(λ2 ) and X and Y are independent, then what is
the distribution of X + Y ?
Solution: Apply the convolution formula from the lemma above.
Bivariate transformation: Recall the one-variable formula
fY (y) = fX (g^{−1} (y)) (d/dy)(g^{−1} (y)).
The two-variable analogue, for U = g1 (X, Y ), V = g2 (X, Y ) with inverse X = h1 (U, V ), Y = h2 (U, V ), requires in particular:
3. The Jacobian is non-zero:
J = det [ ∂x/∂u  ∂x/∂v ]
        [ ∂y/∂u  ∂y/∂v ]  ̸= 0
Then the joint density is given by
fUV (u, v) = fXY (h1 (u, v), h2 (u, v)) |J| ,  (u, v) ∈ S(U, V )
Solution: For U = X + Y , V = X − Y (with fXY (x, y) = x + y on 0 < x, y < 1):
X = (U + V )/2  &  Y = (U − V )/2
J = det [ 1/2  1/2  ]
        [ 1/2  −1/2 ]  = −1/2
fUV (u, v) = fXY ((u + v)/2, (u − v)/2) |J|
fUV (u, v) = ( (u + v)/2 + (u − v)/2 ) (1/2) = u/2
with 0 ≤ U + V ≤ 2 & 0 ≤ U − V ≤ 2, i.e.
fUV (u, v) = u/2,  0 < u < 1, −u < v < u
             u/2,  1 < u < 2, u − 2 < v < −u + 2
             0,    otherwise
Exercise. Let X ∼ U (0, 1) & Y ∼ U (0, 1) and X, Y be independent random variables. Let U =
X + 2Y and V = 3X − Y . Then find fXY (x, y) and fUV (u, v).
For Y = X1^2 , combining both branches of the inverse,
fY (y) = (1/(2√y)) ( fX1 (√y) + fX1 (−√y) )
With Z ∼ N (0, 1) and Y = Z^2 :
fY (y) = (1/(2√y)) ( (1/√(2π)) e^{−y/2} + (1/√(2π)) e^{−y/2} )
fY (y) = (1/√(2πy)) e^{−y/2}
fY (y) = (1/(Γ(1/2) 2^{1/2} )) y^{1/2 − 1} e^{−y/2} ,  y > 0
Y ∼ χ²_1
Note:
X ∼ N (µ, σ^2 ),  Z = (X − µ)/σ  ⇒  Z ∼ N (0, 1)
Note:
X ∼ P (λ), x = 0, 1, ....  ⇒  2X + 1 ≁ P
A linear function of a Poisson random variable does not necessarily follow a Poisson distribution.
Theorem. Let Y1 , Y2 , ....., Yn ∼ χ²_1 be independent. Then Σ_i Yi ∼ χ²_n .
Proof. Let Y = Σ Yi .
MY (t) = M_{Σ Yi} (t) = E(e^{(Σ Yi )t} ) = Π_{i=1}^{n} E(e^{Yi t} ) = Π_{i=1}^{n} M_{Yi} (t)
M_{Yi} (t) = (1/(Γ(1/2) 2^{1/2} )) ∫_0^∞ y^{1/2 − 1} e^{−y(1/2 − t)} dy
M_{Yi} (t) = (1/(Γ(1/2) 2^{1/2} )) · Γ(1/2)/(1/2 − t)^{1/2}
M_{Yi} (t) = 1/√(1 − 2t)
MY (t) = 1/(1 − 2t)^{n/2}
Y ∼ χ²_n
t-Distribution derivation: Let X ∼ N (0, 1) and Y ∼ χ²_n be independent, with joint density
fXY (x, y) = (1/√(2π)) e^{−x^2 /2} · (1/(Γ(n/2) 2^{n/2} )) y^{n/2 − 1} e^{−y/2} ,  y > 0
Set U = X/√(Y /n) and V = Y . Then x = u√(v/n) and y = v,
J = det [ √(v/n)   u/(2√(vn)) ]
        [ 0        1          ]  = √(v/n)
fUV (u, v) = (1/(√(2π) Γ(n/2) 2^{n/2} √n)) e^{−u^2 v/(2n)} v^{(n+1)/2 − 1} e^{−v/2} ,  u ∈ R & v > 0
fU (u) = (1/(√(2πn) Γ(n/2) 2^{n/2} )) ∫_0^∞ v^{(n/2 + 1/2) − 1} e^{−(v/2)(1 + u^2 /n)} dv
fU (u) = (1/(√(2πn) Γ(n/2) 2^{n/2} )) · Γ(n/2 + 1/2)/( (1/2)(1 + u^2 /n) )^{n/2 + 1/2} ,  u ∈ R
F-Distribution: Let X ∼ χ²_n & Y ∼ χ²_m and X, Y be independent random variables.
Then (X/n)/(Y /m) ∼ F (n, m) (F-distribution).
Define U = (X/n)/(Y /m) and V = Y . Then X = U V (n/m) and Y = V ,
J = det [ V n/m   U n/m ]
        [ 0       1     ]  = V n/m
Bivariate Normal: (X, Y ) ∼ BN (µ1 , µ2 , σ1^2 , σ2^2 , ρ) has density
fXY (x, y) = (1/(2πσ1 σ2 √(1 − ρ^2 ))) exp( (−1/(2(1 − ρ^2 ))) [ ((x − µ1 )/σ1 )^2 − 2ρ((x − µ1 )/σ1 )((y − µ2 )/σ2 ) + ((y − µ2 )/σ2 )^2 ] )
where x & y ∈ R.
We know that, in general, independence ⇒ ρ = 0 but ρ = 0 does not imply independence.
For the bivariate normal, however, we have
Independence ⇔ ρ = 0
In general, ρ = 0 only means there is no linear relationship. If (X, Y ) ∼ BN (µ1 , µ2 , ρ, σ1^2 , σ2^2 ),
then X and Y can only be linearly related, which is why ρ = 0 suffices for independence.
Completing the square in y:
fXY (x, y) = (1/(2πσ1 σ2 √(1 − ρ^2 ))) exp( (−1/(2(1 − ρ^2 ))) [ ( ((y − µ2 )/σ2 ) − ρ((x − µ1 )/σ1 ) )^2 + (1 − ρ^2 )((x − µ1 )/σ1 )^2 ] )
fXY (x, y) = (1/(√(2π)σ1 )) e^{−(1/2)((x−µ1 )/σ1 )^2 } · (1/(√(2π)σ2 √(1 − ρ^2 ))) exp( (−1/(2(1 − ρ^2 )σ2^2 )) [y − µ2 − ρ(σ2 /σ1 )(x − µ1 )]^2 )
Hence
fY |X (y|x) = (1/(√(2π)γ)) e^{−(1/2)((y−δ)/γ)^2 } ,  i.e.  Y |X ∼ N (δ, γ^2 )
with
E(Y |X) = δ = µ2 + ρ(σ2 /σ1 )(x − µ1 ),  γ^2 = σ2^2 (1 − ρ^2 )
E(Y |X) = α + βx
β̂ = ρ σ2 /σ1 = (Cov(X, Y )/(σ1 σ2 )) · (σ2 /σ1 ) = Cov(X, Y )/Var(X)
Convergence in Probability
A sequence of random variables {X1 , X2 , . . . , Xn , . . .} is said to converge in probability to another
random variable X if:
P (|Xn − X| > ϵ) → 0 as n → ∞,  written  Xn →^P X
Example:
Let {Xn } be a sequence of random variables defined by
P (Xn = 1) = 1/n and P (Xn = 0) = 1 − 1/n
Take X = 0 and let ϵ = 1/2. Then
P (|Xn − X| > ϵ) = 1/n  if 0 < ϵ < 1,  and  0  if ϵ ≥ 1.
It follows that P (|Xn | > ϵ) → 0 as n → ∞, and we conclude that Xn →^P 0.
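The example's tail probabilities can be tabulated directly; P(|Xn| > ϵ) = 1/n visibly shrinks to 0:

```python
# For 0 < eps < 1, the tail probability P(|X_n| > eps) equals 1/n.
eps = 0.5
probs = [1 / n for n in (10, 100, 1_000, 10_000)]
print(probs)  # [0.1, 0.01, 0.001, 0.0001]
```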
Convergence in Moments
A sequence {X1 , X2 , . . .} of random variables is said to converge in r-th moment to a
random variable X if:
E(Xn^r ) → E(X^r )
Convergence in Law
Let {Xn } be a sequence of random variables with CDFs {Fn } defined on the probability
space (Ω, F, P ). Further, let X be another random variable with CDF F (·); then {Xn } is
said to be converging in distribution to X if Fn (x) → F (x) at every continuity point x of F .
We denote it by: Xn →^L X.
Weak Law of Large Numbers: Let Sn = Σ_{k=1}^{n} Xk , n = 1, 2, 3, 4, . . .. We say that {Xn } obeys the weak law of large numbers with
respect to the sequence of constants {Bn }, where Bn > 0 and Bn → ∞, if there exists a
sequence {An } such that Bn^{−1} (Sn − An ) →^P 0 as n → ∞. Here, An is called the centering
constant, and Bn is called the normalizing constant.
18.1 Theorem
Let X1 , X2 , . . . be a sequence of independent and identically distributed random variables,
each having mean E[Xi ] = µ. Then, for any ε > 0,
P ( |(X1 + · · · + Xn )/n − µ| > ε ) → 0 as n → ∞
Proof: We shall prove the result only under the additional assumption that the random
variables have a finite variance σ^2 . Now, as
E[(X1 + · · · + Xn )/n] = µ  and  Var[(X1 + · · · + Xn )/n] = σ^2 /n,
Chebyshev's inequality gives
P ( |(X1 + · · · + Xn )/n − µ| > ε ) ≤ σ^2 /(nε^2 )
Example(Poisson Random Variables)
Let {Xn ; n ≥ 1} be a sequence of i.i.d. Poisson random variables with parameter λ. Then
we have
P (X1 = k) = e^{−λ} λ^k /k! ,
for k = 0, 1, 2, . . .. Thus, µ = E(X1 ) = λ and Var(X1 ) = λ. Hence, by the Weak Law of
Large Numbers (WLLN),
P
X̄n −
→ µ.
Important
Let {Xn } be any sequence of random variables. Define Sn = n^{−1} Σ_{k=1}^{n} Xk and
Yn = Sn − E(Sn ). A necessary and sufficient condition for the sequence {Xn } to satisfy the
weak law of large numbers is that E[ Yn^2 /(1 + Yn^2 ) ] tends to 0 as n tends to infinity.
Example: Let X1 , X2 , . . . be iid random variables with E(X1^k ) < ∞ for some positive
integer k. Then,
(1/n) Σ_{j=1}^{n} Xj^k →^P E(X1^k ) as n → ∞.
Example
d
Let X1 , X2 , . . . be iid C(1, 0) random variables. We have seen that n−1 Sn −→ C(1, 0), so that
n−1 Sn does not converge in probability to 0. It follows that the weak law of large numbers
does not hold.
Example
Let X1 , X2 , . . . be iid random variables with a common probability density function (pdf)
given by
f (x) = (1 + ρ)/x^{2+ρ} ,  x ≥ 1
        0,                x < 1
where ρ > 0. Then E(X1 ) = (1 + ρ)/ρ,
and the law of large numbers holds, i.e., n^{−1} Sn tends to (1 + ρ)/ρ as n tends to infinity.
Example 19.1. Let X ∼ P (λ); then the MGF of X is given by M (t) = exp[λ(e^t − 1)] for
all t.
Let Y = (X − λ)/√λ; then the MGF of Y is MY (t) = e^{−t√λ} M (t/√λ). The logarithm of
MY (t) is
log MY (t) = −t√λ + log M (t/√λ) = −t√λ + λ(e^{t/√λ} − 1).
Expanding the exponential term,
log MY (t) = −t√λ + λ( t/√λ + t^2 /(2λ) + O(λ^{−3/2} ) ) → t^2 /2 as λ → ∞,
so that MY (t) → e^{t^2 /2} as λ → ∞, which is the MGF of a N (0, 1) random variable.
20 What is a Random Sample?
A collection of i.i.d. (independent, identically distributed) random variables (X1 , X2 , X3 , . . . , Xn )
is called a "random sample" of size n.
Since each of X1 , X2 , X3 , . . . , Xn has the same distribution and they are independent,
f (x1 , x2 , x3 , . . . , xn ) = Π_{i=1}^{n} f (xi ; θ)
21 What is a Statistic?
Statistic:- a function of the random sample.
The two most important statistics are X̄ and S^2 :
Z = (X̄ − µ)/(σ/√n) → N (0, 1) as n → ∞
E(X̄) = E( (1/n) Σ_{i=1}^{n} Xi ) = µ
Var(X̄) = σ^2 /n
S^2 = (1/(n−1)) Σ (Xi − X̄)^2
S^2 = (1/(n−1)) Σ (Xi − µ + µ − X̄)^2   (2)
S^2 = (1/(n−1)) Σ ( (Xi − µ)^2 + (µ − X̄)^2 + 2(Xi − µ)(µ − X̄) )   (3)
S^2 = (1/(n−1)) ( Σ (Xi − µ)^2 − n(X̄ − µ)^2 )   (4)
=⇒ E(S^2 ) = (1/(n−1)) ( Σ E((Xi − µ)^2 ) − nE((X̄ − µ)^2 ) )   (5)
E(S^2 ) = (1/(n−1)) [ Σ σ^2 − n(σ^2 /n) ] = σ^2   (6)
From equation (4),
(n − 1)S^2 /σ^2 = Σ_{i=1}^{n} ( (Xi − µ)/σ )^2 − ( (X̄ − µ)/(σ/√n) )^2   (7)
(n − 1)S^2 /σ^2 = Σ_{i=1}^{n} Zi^2 − Z^2   (8)
(n − 1)S^2 /σ^2 ∼ χ²_{(n−1)}   (9)
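The unbiasedness E(S²) = σ² in equation (6) can be checked by simulation (the normal population and the values µ = 5, σ = 2, n = 10 are arbitrary choices):

```python
import random

random.seed(6)

def sample_var(xs):
    """S^2 with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

mu, sigma, n, reps = 5.0, 2.0, 10, 20_000
mean_s2 = sum(sample_var([random.gauss(mu, sigma) for _ in range(n)])
              for _ in range(reps)) / reps
print(round(mean_s2, 2))  # ≈ sigma^2 = 4
```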
23 Statistics:
Suppose we are given a data set, with
Assumption 1: the data come from a population with density f (x; θ), where θ is an unknown
population parameter.
An estimator g(X1 , X2 , X3 , . . . , Xn ) is unbiased if
E[g(X1 , X2 , X3 , . . . , Xn )] = θ
E[g(X1 , X2 , X3 , . . . , Xn )] − θ = bias
23.2 Efficient Statistic→
g1 (X1 , . . . , Xn ) is more efficient than g2 (X1 , . . . , Xn ) if Var[g1 (X1 , . . . , Xn )]
≤ Var[g2 (X1 , . . . , Xn )]. Consider
g1 (X1 , . . . , Xn ) = (X1 + X2 − X3 + X4 + X5 )/5
g2 (X1 , . . . , Xn ) = X̄   (11)
g3 (X1 , . . . , Xn ) = (X1 + Xn )/2   (12)
E[g1 ] = (E(X1 ) + E(X2 ) − E(X3 ) + E(X4 ) + E(X5 ))/5 = 3µ/5 → not unbiased   (13)
E[g2 ] = E(X̄) = µ → unbiased   (14)
E[g3 ] = (E(X1 ) + E(Xn ))/2 = µ → unbiased   (15)
Var[g2 ] = Var(X̄) = σ^2 /n   (16)
Var[g3 ] = Var( (X1 + Xn )/2 ) = σ^2 /2   (17)
Var[g2 ] ≤ Var[g3 ]   (18)
X̄ → MVUE (minimum-variance unbiased estimator) for µ, and S^2 for σ^2 .
1) X̄ ∼ N (µ, σ^2 /n) ←→ (X̄ − µ)/(σ/√n) ∼ N (0, 1)
2) S^2 = (1/(n−1)) ( Σ (Xi − µ)^2 − n(X̄ − µ)^2 )
(n − 1)S^2 /σ^2 = Σ_{i=1}^{n} ( (Xi − µ)/σ )^2 − ( (X̄ − µ)/(σ/√n) )^2
t = (x̄ − µ)/(S/√n) ∼ t_{(n−1)}   (20)
24 Objective
To obtain g(X1 , X2 , . . . , Xn ) that estimates θ.
26 Efficiency
g1 is more efficient than g2 if Var(g1 ) ≤ Var(g2 ).
Maximum likelihood: for f (x; θ) = 1/θ (0 < x < θ),
L(θ) = Π_{i=1}^{n} fXi (xi ) = Π_{i=1}^{n} (1/θ)
log(L(θ)) = −n log(θ)
(d/dθ) log(L(θ)) = −n/θ = 0
has no solution, so differentiation does not always work!
Example: X1 , X2 , . . . , Xn , Xi ∼ N (µ, σ^2 )
L(µ, σ^2 ) = Π_{i=1}^{n} fXi (xi ) = Π_{i=1}^{n} (1/(√(2π)σ)) e^{−(1/2)((xi −µ)/σ)^2 }
G(µ, σ^2 ) := log L(µ, σ^2 ) = −(n/2) log(2π) − (n/2) log(σ^2 ) − (1/(2σ^2 )) Σ_{i=1}^{n} (xi − µ)^2
(d/dµ) G = (1/σ^2 ) Σ_{i=1}^{n} (xi − µ) = 0  ⇒  µ̂ = (1/n) Σ_{i=1}^{n} xi
(d/dσ^2 ) G = −n/(2σ^2 ) + (1/(2σ^4 )) Σ_{i=1}^{n} (xi − µ)^2 = 0  ⇒  σ̂^2 = (1/n) Σ_{i=1}^{n} (xi − µ̂)^2
Joint estimators of (µ, σ^2 ):
µ̂ = x̄
σ̂^2 = (1/n) Σ_{i=1}^{n} (xi − x̄)^2
E(σ̂^2 ) ̸= σ^2 (biased): σ̂^2 is not an unbiased estimator of σ^2 .
Unbiased estimator for σ^2 :
g(x1 , x2 , . . . , xn ) = (n/(n − 1)) σ̂^2 = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)^2
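The two MLE formulas can be applied to simulated data (the true values µ = 3, σ = 1.5 are arbitrary); note that σ̂² here uses divisor n, as derived above:

```python
import random

random.seed(7)
data = [random.gauss(3.0, 1.5) for _ in range(50_000)]
n = len(data)

mu_hat = sum(data) / n                                  # MLE of mu: sample mean
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n   # MLE of sigma^2 (divisor n)
print(round(mu_hat, 2), round(sigma2_hat, 2))  # ≈ 3.0, ≈ 2.25
```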
Mean squared error decomposition:
E[(θ̂ − θ)^2 ] = E[ (θ̂ − E(θ̂))^2 + 2(θ̂ − E(θ̂))(E(θ̂) − θ) + (E(θ̂) − θ)^2 ]
where:
• θ̂ is a random variable,
• E(θ̂) is a constant,
• θ is a constant.
Since E[θ̂ − E(θ̂)] = 0, the cross term vanishes and
E[(θ̂ − θ)^2 ] = Var(θ̂) + ( Bias(θ̂) )^2
Interval estimation: we seek statistics g1 , g2 such that
P (g1 ≤ θ ≤ g2 ) ≥ 1 − α
where α is the error probability (e.g. α = 0.05, 0.1, 0.01) and 1 − α represents the confidence.
Case 1
Let X1 , X2 , . . . , Xn be a random sample from N (µ, σ^2 ) with σ^2 known.
Create a confidence interval for µ, using
(X̄ − µ)/(σ/√n) ∼ N (0, 1).
P (a ≤ Z ≤ b) = 1 − α
P (−z_{α/2} ≤ Z ≤ z_{α/2} ) = 1 − α
P ( −z_{α/2} ≤ (X̄ − µ)/(σ/√n) ≤ z_{α/2} ) = 1 − α
P ( X̄ − z_{α/2} σ/√n ≤ µ ≤ X̄ + z_{α/2} σ/√n ) = 1 − α
The (1 − α)100% confidence interval for µ is:
[ X̄ − z_{α/2} σ/√n , X̄ + z_{α/2} σ/√n ]
For example, when α = 0.05, the 95% confidence interval for µ, with z_{0.025} = 1.96, is:
[ X̄ − 1.96 σ/√n , X̄ + 1.96 σ/√n ]
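The meaning of "95% confidence" can be demonstrated by simulation: over many repeated samples, the interval X̄ ± 1.96σ/√n should contain the true µ about 95% of the time (the parameter values below are arbitrary):

```python
import math
import random

random.seed(8)
mu, sigma, n = 10.0, 2.0, 100
z = 1.96                               # z_{0.025}
half = z * sigma / math.sqrt(n)        # half-width of the interval

reps = 5_000
covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if xbar - half <= mu <= xbar + half:
        covered += 1
print(covered / reps)  # ≈ 0.95
```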
Case 2
When σ is unknown:
(X̄ − µ)/(S/√n) ∼ t_{n−1}
P ( −t_{α/2,n−1} ≤ (X̄ − µ)/(S/√n) ≤ t_{α/2,n−1} ) = 1 − α
Confidence interval for σ^2 :
(n − 1)S^2 /χ²_{α/2} ≤ σ^2 ≤ (n − 1)S^2 /χ²_{1−α/2}
where χ²_{α/2} and χ²_{1−α/2} are the critical values from the chi-square distribution with n − 1
degrees of freedom. The probability that σ^2 falls within this interval is 1 − α. In terms of
the chi-square distribution, this is expressed as:
P ( χ²_{1−α/2} ≤ (n − 1)S^2 /σ^2 ≤ χ²_{α/2} ) = 1 − α
Further simplifying, we have:
P ( (n − 1)S^2 /χ²_{α/2} ≤ σ^2 ≤ (n − 1)S^2 /χ²_{1−α/2} ) = 1 − α
Interval estimator
Let X1 , X2 , X3 , ...Xn be a random sample of size n. Then:
Case-1: When n is large, the confidence interval for µ is
[ X̄ − z_{α/2} S/√n , X̄ + z_{α/2} S/√n ]
Case-2: When n is small, the confidence interval for µ is
[ X̄ − t_{α/2,n−1} S/√n , X̄ + t_{α/2,n−1} S/√n ]
Confidence interval for a proportion p: by the CLT,
(X̄ − E(X̄))/√(Var(X̄)) → N (0, 1)
P (−z_{α/2} ≤ Z ≤ z_{α/2} ) = 1 − α
P ( −z_{α/2} ≤ (X̄ − p)/√(p(1 − p)/n) ≤ z_{α/2} ) = 1 − α
P ( X̄ − z_{α/2} √(p(1 − p)/n) ≤ p ≤ X̄ + z_{α/2} √(p(1 − p)/n) ) = 1 − α
Solving the quadratic inequality in p (writing P for the boundary values):
(1 + z²_{α/2} /n) P^2 − (2X̄ + z²_{α/2} /n) P + X̄^2 ≤ 0
P = [ (2X̄ + z²_{α/2} /n) ± √( (2X̄ + z²_{α/2} /n)^2 − 4(1 + z²_{α/2} /n) X̄^2 ) ] / [ 2(1 + z²_{α/2} /n) ]
When n is large, z²_{α/2} /n ≈ 0, and simplifying the square root we get
P = X̄ ± z_{α/2} √(X̄(1 − X̄)) / √n
Let two samples X1 , X2 , ...Xn ∼ N (µ1 , σ1^2 ) and Y1 , Y2 , ...Ym ∼ N (µ2 , σ2^2 ) be given; then we find the
C.I. (confidence interval) for (µ1 − µ2 ). As we know, the point estimator of µ1 − µ2 is X̄ − Ȳ ,
where X̄ ∼ N (µ1 , σ1^2 /n) and Ȳ ∼ N (µ2 , σ2^2 /m), which implies X̄ − Ȳ ∼ N (µ1 − µ2 , σ1^2 /n + σ2^2 /m);
hence the C.I. for µ1 − µ2 is
[ (X̄ − Ȳ ) − z_{α/2} √(S1^2 /n + S2^2 /m) , (X̄ − Ȳ ) + z_{α/2} √(S1^2 /n + S2^2 /m) ]
Testing of Hypothesis:
Decision Making
Ex. Let the claim be that the average height of the population is 5.4 ft, or that P = 40%; we must accept or reject the claim.
Statistical hypothesis:
Null hypothesis H0 : the initial claim to be tested, e.g. µ = 5.4 or p = 0.4
Alternative H1 : H0 is not true, e.g. µ ̸= 5.4 or p < 0.4
1. Over-estimation
2. Under-estimation
P (type I error) = α  ⇒  P (reject H0 | H0 is true) = α
P (type II error) = β  ⇒  P (accept H0 | H0 is not true) = β
Objective: Minimize both α and β.
Simultaneous reduction in α and β is not possible, so we first fix the value of α and then minimize β.
• Rejection region of H0
• Critical region
H0 : p = 0.4, H1 : p ̸= 0.4; collect the data. We have the point estimator of p: X̄.
If X̄ is close to 0.4 then accept H0 ; otherwise reject H0 .
Level of significance
Objective: Minimize both α and β
Testing: H0 : µ = µ0 v/s H1 : µ ̸= µ0
Test critical region/ Rejection region
• Decision with (1 − α)100% level of significance:
• Accept H0 if |X̄ − µ0 | ≤ z_{α/2} S/√n
⇒ µ0 ∈ [ X̄ − z_{α/2} S/√n , X̄ + z_{α/2} S/√n ]
⇒ −z_{α/2} ≤ (X̄ − µ0 )/(S/√n) ≤ z_{α/2}
Method:
Step 1 - Define H0 and H1 .
Step 2 - Fix α.
β = P (type II error)
= P_{H1} (accept H0 )
= P_{H1} ( |X̄ − µ0 | ≤ z_{α/2} S/√n )
= P_{H1} ( −z_{α/2} S/√n ≤ X̄ − µ0 ≤ z_{α/2} S/√n )
= P_{H1} ( −z_{α/2} S/√n + µ0 − µ1 ≤ X̄ − µ1 ≤ z_{α/2} S/√n + µ0 − µ1 )
= P_{H1} ( −z_{α/2} + (µ0 − µ1 )/(S/√n) ≤ Z ≤ z_{α/2} + (µ0 − µ1 )/(S/√n) )
One-sided test H0 : µ ≤ µ0 vs H1 : µ > µ0 . Critical region:
C = {(X1 , X2 , ..., Xn ) : X̄ − µ0 > c} based on α
α = P_{H0} (C) = P_{H0} (X̄ − µ0 > c)
⇒ 1 − α = P_{H0} (X̄ − µ0 < c)
z = (X̄ − µ0 )/(S/√n) ∼ N (0, 1)
P_{H0} ( Z ≤ c/(S/√n) ) = 1 − α ⇒ c/(S/√n) = z_α ⇒ c = z_α S/√n
Reject H0 at the (1 − α)100% level of significance if X̄ − µ0 > z_α S/√n, i.e. X̄ > µ0 + z_α S/√n
(use t_α in place of z_α when n < 30).
H0 : µ ≥ µ0 vs H1 : µ < µ0
Critical region:
C = {(X1 , X2 , ..., Xn ) : X̄ − µ0 < c} based on α
α = P_{H0} (C) = P (X̄ − µ0 < c) = P ( (X̄ − µ0 )/(S/√n) < c/(S/√n) ) = P ( z < c/(S/√n) )
c = (S/√n)(−z_α )
If (X̄ − µ0 )/(S/√n) < −z_α then reject H0 .
Two-sample test H0 : µ1 = µ2 vs H1 : µ1 ̸= µ2
Given X1 , X2 , ...Xn and Y1 , Y2 , ...Ym are two samples, the critical region is
C = {(X1 , ..., Xn , Y1 , ..., Ym ) : |X̄ − Ȳ | > c}
α = P_{H0} (C) = P_{H0} (|X̄ − Ȳ | > c)
⇒ P_{H0} (−c ≤ X̄ − Ȳ ≤ c) = 1 − α
X̄ − Ȳ ∼ N ( µ1 − µ2 , S1^2 /n + S2^2 /m )
⇒ Z = ( (X̄ − Ȳ ) − (µ1 − µ2 ) ) / √(S1^2 /n + S2^2 /m) ∼ N (0, 1), with µ1 − µ2 = 0 under H0
P ( −c/√(S1^2 /n + S2^2 /m) ≤ z ≤ c/√(S1^2 /n + S2^2 /m) ) = 1 − α
c = z_{α/2} √(S1^2 /n + S2^2 /m)
Reject H0 at the (1 − α)100% level of significance if |X̄ − Ȳ | > c. One sided:
H0 : µ1 ≤ µ2 vs H1 : µ1 > µ2
β = P_{H1} (accept H0 )
= P_{H1} (|X̄ − Ȳ | ≤ c)
= P_{H1} (−c ≤ X̄ − Ȳ ≤ c)
= P_{H1} ( (−c − (µ1 − µ2 ))/√(S1^2 /n + S2^2 /m) ≤ z ≤ (c − (µ1 − µ2 ))/√(S1^2 /n + S2^2 /m) )
For fixed α:
1 − α → confidence level
1 − β → power of the test
Hypothesis for σ^2 :
I : H0 : σ1^2 = σ0^2 v/s H1 : σ1^2 ̸= σ0^2
(n − 1)S^2 /σ^2 ∼ χ²_{(n−1)}
Test statistic (under H0 ):
Y_{cal} = (n − 1)S^2 /σ0^2
Compare Y_{cal} with χ²_{α/2,(n−1)} and χ²_{1−α/2,(n−1)} .
Reject H0 if Y_{cal} > χ²_{α/2} or Y_{cal} < χ²_{1−α/2} .
Hypothesis for a proportion: H0 : p = p0 , H1 : p ̸= p0 .
With X̄ the sample proportion, the critical region is
C = {(x1 , x2 , x3 , .....xn ) : |X̄ − p0 | > c}
α = P_{H0} (|X̄ − p0 | > c)
Under H0 , X̄ ∼ N (p0 , p0 (1 − p0 )/n), so
Z = (X̄ − p0 )/√(p0 (1 − p0 )/n) ∼ N (0, 1)
P ( −c/√(p0 (1 − p0 )/n) ≤ (X̄ − p0 )/√(p0 (1 − p0 )/n) ≤ c/√(p0 (1 − p0 )/n) ) = 1 − α
c = z_{α/2} √(p0 (1 − p0 )/n)
β(p1 ) = P_{H1} (acceptance of H0 )
= P_{H1} (|X̄ − p0 | ≤ c)
= P_{H1} (−c ≤ X̄ − p0 ≤ c)
= P ( (−c + p0 − p1 )/√(p0 (1 − p0 )/n) ≤ Z ≤ (c + p0 − p1 )/√(p0 (1 − p0 )/n) )
Z_{cal} = (X̄ − p0 )/√(p0 (1 − p0 )/n)
Stochastic Process
A collection of random variables {Xt : t ∈ T } varying with respect to time is called a
stochastic process.
Here Xt is a random variable at time t.
Categories of Stochastic processes
• Discrete time - Discrete state
For example: the number of people coming to a doctor each day.
Statistical Properties
Let {Xt : t ∈ T } be a stochastic process.
1. Distribution Function:
First order: Ft (x) = P (Xt ≤ x)
Second order: F_{t1 ,t2} (x, y) = P (X_{t1} ≤ x, X_{t2} ≤ y).
2. Expectation:
η(t) = E(Xt ) = ∫ x ft (x)dx.
3. Autocorrelation:
R(t1 , t2 ) = E(X_{t1} · X_{t2} ).
4. Average power:
R(t, t) = E(Xt^2 )
Note: This is the autocorrelation at the same time.
5. Autocovariance:
C(t1 , t2 ) = E(X_{t1} · X_{t2} ) − E(X_{t1} ) · E(X_{t2} ) = R(t1 , t2 ) − η(t1 ) · η(t2 ).
• Independent increments
If (Xt1 − Xt2 ), (Xt2 − Xt3 ), ...... all are independent then we say stochastic process has
independent increments.
• Stationary increments
If the distribution of (Xt+h − Xt ) does not depend on t but depends only on h i.e. the
increments of same length follow same distribution then we say stochastic process has
stationary increments.
For example: Distribution of (Xt15 − Xt10 ) = Distribution of (Xt8 − Xt3 )
Counting process
A stochastic process {Nt : t ≥ 0} is said to be a counting process if Nt is the number of
events occurring in the time interval (0, t].
Poisson process
A counting process {Nt : t ≥ 0}, N0 = 0, is said to be a Poisson process if it has independent,
stationary increments with N_{t+h} − Nt ∼ Poisson(λh).
Interarrival time
The time between two arrivals is called the interarrival time.
T1 denotes the time of occurrence of the first event.
T2 denotes the time between the first and second events.
Tn denotes the time between the (n − 1)-th and n-th events.
P (T1 > t) denotes the probability that the time of first arrival exceeds t:
P (T1 > t) = P (no arrival in (0, t]) = P (N (t) = 0) = e^{−λt} .
The interarrival times follow an exponential distribution with mean 1/λ.
Markov process
A stochastic process {Xt : t > 0} is said to be a Markov process if
P (X_{n+1} = a_{n+1} | X0 = a0 , X1 = a1 , .......Xn = an ) = P (X_{n+1} = a_{n+1} | Xn = an )