Module A
and Systems
Instructor: Prof. Ashish R. Hota
Logistics:
• Class Timing: Monday: 12 noon - 12:55pm; Tuesday: 10am - 11:55am.
• Venue: NC 244
• Instructor Email: [email protected]. Use EE60039 in Subject Line.
• Course Website: http://www.facweb.iitkgp.ac.in/~ahota/prob.html
• MS Team Password: ps6tv5h. All materials will be uploaded on Teams.
Syllabus:
Module A: Introduction to Probability and Random Variables. 4
Weeks. Main Reference: Chapters 1-5 of Wasserman.
1. Probability Space. Independence. Conditional Probability. [Chapter 1 of
Hajek, Chapter 2-5 of Chan, Chapter 1 of Gallager]
2. Random Variables and Vectors. Discrete and Continuous Distributions.
[Chapter 1 of Hajek, Chapter 2-5 of Chan, Chapter 1 of Gallager]
3. Expectation, Moments, Characteristic Functions. [Chapter 1 of Hajek, Chap-
ter 2-5 of Chan, Chapter 1 of Gallager]
4. Inequalities and Bounds. [Chapter 1-2 of Hajek, Chapter 6 of Chan, Chapter
1 of Gallager]
5. Convergence of Random Variables. Law of Large Numbers, Central Limit
Theorem. [Chapter 2 of Hajek, Chapter 6 of Chan, Chapter 1 of Gallager]
Module B: Random Processes. 4 Weeks.
1. Definition, Discrete-time and Continuous-time Random Processes [Chapter
4 of Hajek, Chapter 10 of Chan]
2. Stationarity, Power Spectral Density, Second order Theory [Chapter 4, 8 of
Hajek, Chapter 10 of Chan]
3. Gaussian Process [Chapter 3 of Hajek, Chapter 3 of Gallager]
4. Markov Chain, Classification of States, Limiting Distributions [Chapter 4 of
Gallager]
Module C: Basics of Bayesian Estimation. 4 Weeks.
1. Maximum Likelihood, Maximum Aposteriori, Mean Square and Linear Mean
Square Estimation [Chapter 5 of Hajek, Chapter 8 of Chan]
2. Conditional Expectation and Orthogonality [Chapter 3 of Hajek, Chapter 10
of Gallager]
3. Kalman Filters [Chapter 3 of Hajek]
4. Hidden Markov Models [Chapter 5 of Hajek]
Module D: Information, Entropy, and Divergence. 1 Week.
Reference:
The subject will closely follow the treatment in the following texts.
1. Larry Wasserman, All of Statistics, Springer Texts in Statistics, 2004.
Available at: https://link.springer.com/book/10.1007/978-0-387-21736-9
2. Bruce Hajek, Random Processes for Engineers,
Cambridge University Press, 2015. Available at:
https://hajek.ece.illinois.edu/Papers/randomprocJuly14.pdf
3. Robert G. Gallager, Stochastic Processes: Theory for Applications,
Cambridge University Press, 2013.
4. Stanley H. Chan, Introduction to Probability for
Data Science, Michigan Publishing, 2021. Available at:
https://probability4datascience.com/index.html
5. Jason Speyer and Walter Chung, Stochastic Processes, Estimation
and Control, SIAM, 2008.
Evaluation Plan:
1. Midsem: 30%
2. Endsem: 50%
3. Homework and Class Performance: 20%
Probability Space
Notations:
• N : set of natural numbers
• R : set of real numbers
• Z: set of integers
• Q : set of rational numbers
• R+ = {x ∈ R | x ≥ 0} and Z+ = {a ∈ Z | a ≥ 0}.
• For a set X, we denote the set of all its subsets by P(X)
Note
• Any F that satisfies the properties a, b, and c is called a σ-algebra over Ω,
and (Ω, F) is called a measurable space.
Examples
• Let $A_n = [-1, 1 - \frac{1}{n}]$, $n \in \mathbb{N}$. Specifically, $A_1 = [-1, 0]$, $A_2 = [-1, 0.5]$, $\ldots$, $A_{10} = [-1, 0.9]$.
$\bigcup_{i=1}^{\infty} A_i = \{a \mid a \in A_i \text{ for some finite } i\} = [-1, 1)$.
• Homework: $B_n = [0, 1 - \frac{1}{n})$, $\bigcap_{n=1}^{\infty} B_n = ?$
$C_n = [0, 1 + \frac{1}{n})$, $\bigcap_{n=1}^{\infty} C_n = ?$
Elementary Properties implied by probability axioms
Conditional Probability and Independence
Notes:
• If A, B are independent, P(B) > 0, then
P(A ∩ B) = P(A) · P(B) ⇒ P(A | B) = P(A).
Knowledge that event B has occurred gives no further information about the
occurrence of A.
• Suppose A and B are disjoint. Can they be independent?
No (assuming both have positive probability). Disjointness is the strongest form of
dependence: the occurrence of one event rules out the occurrence of the other.
Bayes’ Law
• Bayes’ Law:
$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{k} P(B \mid A_j)\, P(A_j)}.$$
Problem: Consider a disease that affects one out of every 1000 individuals.
There is a test that detects the disease with 99% accuracy, that is, it classifies
a healthy individual as having the disease with 1% chance, and a sick individual
as healthy with 1% chance. Then,
1. What is the probability that a randomly chosen individual will test positive by the test?
2. Given that a person tests positive, what is the probability that he or she has the disease?
Homework: Repeat the above when detection accuracy is 99.9%, 99.99% and
99.999%.
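Both questions follow from the total probability theorem and Bayes’ law. A minimal sketch of the computation in Python, using the numbers stated above (the same script can be rerun with the homework accuracies):

```python
# Bayes' law for the disease-testing problem: prevalence 1/1000, 1% error rates.
prior = 0.001          # P(disease)
p_pos_given_d = 0.99   # P(test positive | disease)
p_pos_given_h = 0.01   # P(test positive | healthy)

# Total probability: P(positive) = P(+ | D) P(D) + P(+ | H) P(H)
p_pos = p_pos_given_d * prior + p_pos_given_h * (1 - prior)

# Bayes' law: P(disease | positive)
p_d_given_pos = p_pos_given_d * prior / p_pos

print(f"P(positive)           = {p_pos:.5f}")          # about 0.01098
print(f"P(disease | positive) = {p_d_given_pos:.4f}")  # about 0.09
```

Note that even with a 99% accurate test, fewer than 10% of the people who test positive actually have the disease, because healthy individuals vastly outnumber sick ones.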
Random Variable
Given a measurable space (Ω, F), a random variable is a function X : Ω → R such that
$$X^{-1}((-\infty, a]) = \{\omega \in \Omega \mid X(\omega) \le a\} \in F \quad \text{for every } a \in \mathbb{R},$$
where $X^{-1}(A)$ denotes the pre-image of a set $A \subseteq \mathbb{R}$ under X.
Note: Functions that satisfy this property are called measurable functions. Measurability
is a property of the function X and the σ-algebra F.
(Important) Indicator Random Variable
3. If 1 ≤ a, then $\mathbf{1}_E^{-1}((-\infty, a]) = ?$
Probability Distribution of a Random Variable
Example
Let Ω = {1, 2, 3} and X : Ω → R such that X(1) = 0.5, X(2) = 0.7, X(3) =
0.7.
• Find the smallest σ-algebra on Ω such that X is a random variable.
• Let P({1}) = 0.3. Find the distribution FX .
Random Vectors and Random Processes
Discussion on Random Variables
Discrete Random Variable
Continuous Random Variable
Properties of probability density function
For a continuous random variable X, its pdf satisfies the following properties.
1. $f_X(x) \ge 0$, for every $x \in \mathbb{R}$.
2. $\int_{-\infty}^{\infty} f_X(x)\, dx = F_X(\infty) = 1$.
3. $f_X(x)$ is not a probability; it can take values larger than 1 at some points.
4. $F_X(x + \epsilon) - F_X(x) = \int_{-\infty}^{x+\epsilon} f_X(t)\, dt - \int_{-\infty}^{x} f_X(t)\, dt = \int_{x}^{x+\epsilon} f_X(t)\, dt$.
5. $P(a \le X \le b) = F_X(b) - F_X(a) = \int_{a}^{b} f_X(x)\, dx$.
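A small numerical illustration of properties 2, 3, and 5. The Uniform(0, 0.5) density, whose value is 2 on its support, is an arbitrary choice made here only because it exceeds 1:

```python
# Check that a density can exceed 1 pointwise and still integrate to 1.
from scipy.integrate import quad
from scipy.stats import uniform

X = uniform(loc=0.0, scale=0.5)    # density f_X(x) = 2 for x in [0, 0.5], 0 otherwise
print(X.pdf(0.25))                 # 2.0 > 1, yet a perfectly valid density (property 3)

total, _ = quad(X.pdf, -1.0, 2.0)  # the support lies inside [-1, 2]
print(total)                       # 1.0 (property 2)

a, b = 0.1, 0.3
print(X.cdf(b) - X.cdf(a), quad(X.pdf, a, b)[0])  # both 0.4 (property 5)
```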
Expectation of a Random Variable
• Indicator r.v for event A is a simple random variable with E[1A ] = P(A).
Properties of Expectation
• Linearity: For two random variables X and Y, E[X + Y] = E[X] + E[Y] (more generally,
E[aX + bY] = a E[X] + b E[Y] for constants a, b ∈ R).
Function of random variables
Characteristic Function
For a r.v. X with density $f_X$, the characteristic function is $C_X(h) = E[e^{ihX}] = \int_{-\infty}^{\infty} e^{ihx} f_X(x)\, dx$.
• $C_X(0) = E[1] = 1$.
• $\left.\frac{dC_X(h)}{dh}\right|_{h=0} = \int_{-\infty}^{\infty} (ix) f_X(x)\, dx = i\, E[X]$.
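A quick numerical sanity check of the derivative property. The Exponential(1) distribution (so E[X] = 1) is used purely as an illustrative example; the derivative of the empirical characteristic function at h = 0 is estimated by a central finite difference:

```python
# Empirical characteristic function C_X(h) ~ mean(exp(i h X)); its derivative at 0
# should be close to i * E[X].
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)   # E[X] = 1

def emp_cf(h):
    return np.mean(np.exp(1j * h * X))

h = 1e-3
deriv_at_0 = (emp_cf(h) - emp_cf(-h)) / (2 * h)  # central finite difference
print(deriv_at_0)        # approximately 1j
print(1j * X.mean())     # i * (sample mean), for comparison
```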
Random Vector
• A random vector $X = (X_1, X_2, \ldots, X_n)^T$ such that each $X_i$, $1 \le i \le n$, is a r.v.
Computing Marginal Distributions
If the joint distribution/density/mass function is given, we can compute the distribution/density/PMF of each individual constituent random variable.
• Suppose the joint density $f_X(c_1, c_2, \ldots, c_n)$ is given. Find $f_{X_2}(c_2)$. Recall that
$$F_X(c_1, c_2, \ldots, c_n) = \int_{x_1=-\infty}^{c_1} \int_{x_2=-\infty}^{c_2} \cdots \int_{x_n=-\infty}^{c_n} f_X(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \ldots dx_n,$$
$$F_{X_2}(c_2) = \lim_{c_i \to \infty,\ i \neq 2} F_X(c_1, c_2, \ldots, c_n)$$
$$= \int_{x_1=-\infty}^{\infty} \int_{x_2=-\infty}^{c_2} \cdots \int_{x_n=-\infty}^{\infty} f_X(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \ldots dx_n$$
$$= \int_{x_2=-\infty}^{c_2} \left[ \int_{x_1=-\infty}^{\infty} \cdots \int_{x_n=-\infty}^{\infty} f_X(x_1, \ldots, x_n)\, dx_1\, dx_3 \ldots dx_n \right] dx_2.$$
Hence, $f_{X_2}(c_2) = \int_{x_1=-\infty}^{\infty} \cdots \int_{x_n=-\infty}^{\infty} f_X(x_1, c_2, x_3, \ldots, x_n)\, dx_1\, dx_3 \ldots dx_n$.
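The same marginalization can be checked numerically. The sketch below, which is not from the notes, integrates a bivariate Gaussian joint density (correlation 0.5 chosen arbitrarily) over $x_1$ and compares the result with the known standard normal marginal of $X_2$:

```python
# Integrating the joint density over x1 recovers the marginal density of X2.
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

rho = 0.5
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def marginal_pdf_x2(c2):
    # f_{X2}(c2) = integral of f_X(x1, c2) over x1
    val, _ = quad(lambda x1: joint.pdf([x1, c2]), -np.inf, np.inf)
    return val

for c2 in [-1.0, 0.0, 2.0]:
    print(c2, marginal_pdf_x2(c2), norm.pdf(c2))  # the last two columns should agree
```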
Example
Consider a random vector $\begin{bmatrix} X \\ Y \end{bmatrix}$ with joint density
$$f_{XY}(x, y) = \begin{cases} x + c y^2, & x \in [0, 1],\ y \in [0, 1], \\ 0, & \text{otherwise.} \end{cases}$$
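The value of c is not specified above; the sketch below assumes it is the normalizing constant that makes the density integrate to 1 over the unit square, and then computes both marginal densities symbolically:

```python
# Solve for the normalizing constant c and compute the marginals of f_XY(x, y) = x + c*y^2.
import sympy as sp

x, y, c = sp.symbols('x y c', real=True)
f = x + c * y**2

total = sp.integrate(f, (x, 0, 1), (y, 0, 1))   # = 1/2 + c/3
c_val = sp.solve(sp.Eq(total, 1), c)[0]
print("c =", c_val)                              # 3/2

f_norm = f.subs(c, c_val)
f_X = sp.integrate(f_norm, (y, 0, 1))            # marginal of X: x + 1/2
f_Y = sp.integrate(f_norm, (x, 0, 1))            # marginal of Y: 3*y**2/2 + 1/2
print("f_X(x) =", sp.simplify(f_X))
print("f_Y(y) =", sp.simplify(f_Y))
```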
Independence of Random Variables
Practice Problems
Correlation and Covariance
Let X and Y be discrete random variables that take values $X \in \{x_1, x_2, \ldots, x_n\}$
and $Y \in \{y_1, y_2, \ldots, y_m\}$. Let the joint pmf be $p_{ij} = P(X = x_i, Y = y_j)$. Then,
$$E[XY] = \sum_{i=1}^{n} \sum_{j=1}^{m} x_i y_j p_{ij} = x^T P y, \quad \text{where}$$
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}, \quad P = \begin{bmatrix} p_{11} & p_{12} & \ldots & p_{1m} \\ p_{21} & p_{22} & \ldots & p_{2m} \\ \vdots & & & \vdots \\ p_{n1} & p_{n2} & \ldots & p_{nm} \end{bmatrix}.$$
Correlation Coefficient
$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)}\, \sqrt{\mathrm{var}(Y)}}, \qquad -1 \le \rho_{X,Y} \le 1.$$
Inner product interpretation
$$x^T y = \sum_{i=1}^{n} x_i y_i = \|x\|\, \|y\| \cos\theta \implies \cos\theta = \frac{x^T y}{\|x\|\, \|y\|}.$$
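A quick numerical check that the double sum equals the quadratic form $x^T P y$, with an illustrative joint pmf (the supports and probabilities below are made up for the example):

```python
# E[XY] via the double sum vs the quadratic form x^T P y, plus the correlation coefficient.
import numpy as np

x = np.array([0.0, 1.0, 2.0])            # support of X
y = np.array([-1.0, 1.0])                # support of Y
P = np.array([[0.1, 0.2],
              [0.3, 0.1],
              [0.2, 0.1]])               # joint pmf p_ij = P(X = x_i, Y = y_j); entries sum to 1

double_sum = sum(x[i] * y[j] * P[i, j] for i in range(len(x)) for j in range(len(y)))
quad_form = x @ P @ y
print(double_sum, quad_form)             # both equal E[XY]

pX, pY = P.sum(axis=1), P.sum(axis=0)    # marginal pmfs
EX, EY = x @ pX, y @ pY
cov = double_sum - EX * EY
rho = cov / np.sqrt((x**2 @ pX - EX**2) * (y**2 @ pY - EY**2))
print("rho =", rho)                      # lies in [-1, 1]
```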
Properties of Covariance
Covariance Matrix of a Random Vector
For a random vector $X = (X_1, X_2, \ldots, X_n)^T$, the covariance matrix contains the covariance
of each pair of constituent random variables:
$$\mathrm{cov}(X, X) = \mathrm{cov}(X) = \begin{bmatrix} \mathrm{cov}(X_1) & \mathrm{cov}(X_1, X_2) & \ldots & \mathrm{cov}(X_1, X_n) \\ \vdots & & & \vdots \\ \mathrm{cov}(X_n, X_1) & \mathrm{cov}(X_n, X_2) & \ldots & \mathrm{cov}(X_n) \end{bmatrix} \in \mathbb{R}^{n \times n}$$
$$= E\left[ \begin{bmatrix} X_1 - E[X_1] \\ X_2 - E[X_2] \\ \vdots \\ X_n - E[X_n] \end{bmatrix} \begin{bmatrix} (X_1 - E[X_1]) & (X_2 - E[X_2]) & \ldots & (X_n - E[X_n]) \end{bmatrix} \right]$$
$$= E\left[(X - E[X])\, (X - E[X])^\top\right].$$
For two random vectors $X = (X_1, X_2, \ldots, X_n)^T$ and $Y = (Y_1, Y_2, \ldots, Y_m)^T$, the (cross-)covariance matrix is given by
$$\mathrm{cov}(X, Y) = \begin{bmatrix} \mathrm{cov}(X_1, Y_1) & \mathrm{cov}(X_1, Y_2) & \ldots & \mathrm{cov}(X_1, Y_m) \\ \vdots & & & \vdots \\ \mathrm{cov}(X_n, Y_1) & \mathrm{cov}(X_n, Y_2) & \ldots & \mathrm{cov}(X_n, Y_m) \end{bmatrix} \in \mathbb{R}^{n \times m} = E\left[(X - E[X])\, (Y - E[Y])^\top\right].$$
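A simulation sketch (the mean and covariance below are arbitrary illustrative values): the sample version of $E[(X - E[X])(X - E[X])^\top]$ should be close to the true covariance matrix, and should agree with numpy's built-in estimator:

```python
# Empirical covariance matrix via the outer-product formula vs np.cov.
import numpy as np

rng = np.random.default_rng(0)
true_cov = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])
mean = np.array([1.0, -2.0, 0.0])

samples = rng.multivariate_normal(mean, true_cov, size=200_000)  # each row is one draw of X

centered = samples - samples.mean(axis=0)
cov_outer = centered.T @ centered / (len(samples) - 1)  # estimate of E[(X - E[X])(X - E[X])^T]
print(np.round(cov_outer, 2))            # close to true_cov
print(np.round(np.cov(samples.T), 2))    # np.cov agrees with the outer-product formula
```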
Sum of IID Random Variables
• Let $X_i$ be the random variable that represents the outcome of the i-th experiment.
Solution
Let $S = \sum_{i=1}^{n} X_i$ and $\bar{S} := \frac{1}{n} \sum_{i=1}^{n} X_i$.
$$E[S] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = n\mu, \qquad E[\bar{S}] = \mu.$$
Since the $X_i$ are independent, $\mathrm{var}(S) = \sum_{i=1}^{n} \mathrm{var}(X_i) = n\sigma^2$.
Now: $\mathrm{var}(\bar{S}) = \left(\frac{1}{n}\right)^2 \mathrm{var}(S) = \frac{\sigma^2}{n}$.
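A quick simulation of the last identity. The i.i.d. draws are Exponential(1), chosen only as an example so that $\mu = \sigma^2 = 1$; the empirical variance of the sample mean should track $\sigma^2/n$:

```python
# var(S_bar) shrinks like sigma^2 / n.
import numpy as np

rng = np.random.default_rng(1)
for n in [10, 100, 1000]:
    samples = rng.exponential(scale=1.0, size=(10_000, n))  # 10,000 repetitions of n draws
    s_bar = samples.mean(axis=1)
    print(n, s_bar.var(), 1.0 / n)   # empirical var(S_bar) vs sigma^2 / n
```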
Gaussian Random Variable
Consequently,
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx = 1.$$
Most derivations involving Gaussian random variables and vectors leverage the characteristic function.
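A one-line numerical confirmation of the normalization (the values of $\mu$ and $\sigma$ are arbitrary):

```python
# The Gaussian density integrates to 1 for any mu and sigma > 0.
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.5, 2.0
pdf = lambda t: np.exp(-(t - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
val, _ = quad(pdf, -np.inf, np.inf)
print(val)   # approximately 1.0
```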
Jointly Gaussian Random Variables
A Gaussian random vector $X = (X_1, X_2, \ldots, X_n)^T$ is characterized by two quantities:
mean: $\mu_X = E[X] = \left(E[X_1], E[X_2], \ldots, E[X_n]\right)^T \in \mathbb{R}^n$, and
covariance matrix: $C_X \in \mathbb{R}^{n \times n}$ with $(C_X)_{i,j} = \mathrm{cov}(X_i, X_j)$.
Properties of Gaussian Random Vectors
Inequalities and Bounds
Union bound: If $A_1, A_2, \ldots, A_n$ are events, then $P(\cup_{i=1}^{n} A_i) \le \sum_{i=1}^{n} P(A_i)$.
(Equality holds when the $A_i$ are disjoint.)
Markov’s Inequality: Let X be a non-negative r.v. Then, for any $\epsilon > 0$,
$$P(X \ge \epsilon) \le \frac{E[X]}{\epsilon}.$$
Note: This bound is useful for large values of $\epsilon$. In particular, if $\epsilon < E[X]$, then
$\frac{E[X]}{\epsilon} > 1$, which is trivial.
Proof idea: let $Y = \epsilon\, \mathbf{1}_{\{X \ge \epsilon\}}$. Is $Y \le X$? $E[Y] = ?$
Chebyshev’s Inequality: For any random variable X with $E[X] = \mu$, and any $\epsilon > 0$,
$$P(|X - \mu| \ge \epsilon) \le \frac{\mathrm{var}(X)}{\epsilon^2}.$$
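A numerical sanity check of both inequalities; the Exponential(1) distribution (so $E[X] = \mathrm{var}(X) = 1$) is an arbitrary illustrative choice:

```python
# Empirical tail probabilities vs the Markov and Chebyshev upper bounds.
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=1_000_000)
mu, var = X.mean(), X.var()

for eps in [2.0, 3.0, 5.0]:
    print("Markov   :", (X >= eps).mean(), "<=", mu / eps)
    print("Chebyshev:", (np.abs(X - mu) >= eps).mean(), "<=", var / eps**2)
```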
Inequalities and Bounds
Hoeffding’s Inequality: If $X_1, \ldots, X_n$ are independent with $a_i \le X_i \le b_i$ and $S_n = \sum_{i=1}^{n} X_i$, then for any $\epsilon > 0$,
$$P\left(S_n - E[S_n] \le -\epsilon\right) \le e^{-\frac{2\epsilon^2}{\sum_{i=1}^{n} (b_i - a_i)^2}}.$$
Cauchy-Schwarz Inequality: For any $s \in \mathbb{R}$, let $Z = (sX + Y)^2$. Then
$$E[Z] \ge 0 \implies E[s^2 X^2 + 2sXY + Y^2] \ge 0 \implies s^2 E[X^2] + 2s\, E[XY] + E[Y^2] \ge 0.$$
Since this quadratic in s is nonnegative for every s, its discriminant is nonpositive, i.e., $(E[XY])^2 \le E[X^2]\, E[Y^2]$.
Chernoff Bound
Let X be a Binomial(n, p) r.v. with probability mass function $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$,
for $k \in \{0, 1, 2, \ldots, n\}$. Find upper bounds on $P(X \ge q)$ using the Markov, Chebyshev, and Chernoff bounds.
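A worked comparison for one illustrative parameter choice (n = 100, p = 0.2, q = 40, which are not specified in the notes), showing how much tighter the Chernoff bound is:

```python
# Exact binomial tail vs the Markov, Chebyshev, and Chernoff upper bounds.
import numpy as np
from scipy.stats import binom

n, p, q = 100, 0.2, 40
mean, var = n * p, n * p * (1 - p)

exact = binom.sf(q - 1, n, p)                 # P(X >= q)
markov = mean / q                             # P(X >= q) <= E[X] / q
chebyshev = var / (q - mean)**2               # via P(|X - mean| >= q - mean), valid since q > mean

s = np.linspace(1e-4, 5.0, 10_000)            # Chernoff: min over s > 0 of E[e^{sX}] e^{-sq}
mgf = (1 - p + p * np.exp(s))**n              # binomial moment generating function
chernoff = np.min(mgf * np.exp(-s * q))

print(f"exact={exact:.3e}  markov={markov:.3f}  chebyshev={chebyshev:.3f}  chernoff={chernoff:.3e}")
```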
Distribution of sum of two random variables
Convergence of Sequences
Note: The above definition requires us to first conjecture a limit point $x^*$, which
may not always be trivial.
A sequence $(x_n)_{n \in \mathbb{N}}$ is called a Cauchy sequence if
$$\lim_{n,m \to \infty} |x_n - x_m| = 0.$$
Example: Let $x_n = \frac{1}{n}$, i.e., the sequence $(x_n)_{n \in \mathbb{N}} = (1, \frac{1}{2}, \frac{1}{3}, \ldots)$. What is a
possible value of $x^*$? Is this sequence a Cauchy sequence?
Almost Sure Convergence
Note: For a given outcome ω, Xn (ω) is a sequence of real numbers.
Convergence in Probability and in Mean Square Sense
Definition 14. A sequence of random variables $(X_n)_{n \in \mathbb{N}}$, with $E[X_n^2] < \infty$ for all n,
converges to $X^*$ in mean square sense if
$$\lim_{n \to \infty} E\left[(X_n - X^*)^2\right] = 0.$$
Convergence in Distribution
Example
Let X be a r.v. with CDF
$$F_X(\alpha) = \begin{cases} 0, & \alpha \le 0, \\ \frac{\alpha}{\theta}, & \alpha \in [0, \theta], \\ 1, & \alpha \ge \theta, \end{cases}$$
let $X_1, X_2, \ldots$ be i.i.d. with this CDF, and define $Y_k = \max_{i \in \{1, 2, \ldots, k\}} X_i$.
Show that the sequence $(Y_n)_{n \in \mathbb{N}}$ converges in distribution to a random variable $Y^*$ whose distribution is given by
$$F_{Y^*}(\alpha) = \begin{cases} 1, & \alpha \ge \theta, \\ 0, & \text{otherwise.} \end{cases}$$
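A simulation sketch of this convergence, with $\theta = 2$ chosen arbitrarily. Since $F_{Y_k}(\alpha) = (\alpha/\theta)^k$ on $[0, \theta]$, the probability mass of $Y_k$ piles up just below $\theta$ as k grows:

```python
# Empirical check that Y_k = max(X_1, ..., X_k) converges in distribution to theta.
import numpy as np

rng = np.random.default_rng(3)
theta, reps = 2.0, 20_000
for k in [1, 10, 100, 500]:
    Yk = rng.uniform(0.0, theta, size=(reps, k)).max(axis=1)
    # P(Y_k <= 0.95 * theta) = 0.95^k -> 0, while P(Y_k <= theta) stays 1
    print(k, (Yk <= 0.95 * theta).mean(), 0.95**k)
```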
Cauchy Criterion
$$\lim_{n,m \to \infty} E\left[(X_m - X_n)^2\right] = 0.$$
Limit Theorems
Note: What about the random variable $\frac{S_n}{n} - \mu_X$? What are its mean and variance?