ECE226 Probability Class Notes
Probability theory studies random phenomena in a formal mathematical way. It is essential for all
engineering and scientific disciplines dealing with models that depend on chance. Probability plays
a central role in, e.g., telecommunications and finance. Telecommunications systems strive to
provide reliable and secure transmission and storage of information under the uncertainties coming
from various types of random noise and adversarial behavior. Finance systems strive to maximize
profits in spite of the uncertainties coming from natural and man-made events. Students will
learn the fundamentals of probability that are necessary for several ECE courses and related fields.
Class time and place: Mon & Thu, 10:20 AM - 11:40 AM, HLL-114.
Thursday 4:00 – 5:00 pm: Amir Behrouzi Far, [email protected], Sections 2 & 4
Monday 9:00 – 10:00am: Fatemeh Koochaki, [email protected], Sections 1 & 5
Monday 3:30 – 4:30pm: Chrysanthi Koumpouzi, [email protected], Class TA
Friday 4:00 – 5:00 pm: Poojankuma Oza, [email protected], Sections 3 & 6
Please direct your questions about the quizzes and homework to the class TA Chrys Koumpouzi,
and feel free to contact any TA for technical questions about the course material.
Prerequisites: calculus
Grading: quizzes & homework 20%, 2 midterm exams 20% each, final exam 40%.
The midterm exams will be in class on February 15 and March 29, closed books and notes.
Text: Two textbooks available online (click on the book title below):
1. Introduction to Probability by Grinstead and Snell
2. Introduction to Probability, Statistics, and Random Processes by Pishro-Nik
Course notes: given per week in separate documents on the class Sakai page.
BASIC SET THEORY
1/9
(Random) Experiments
2/9
An Experiment & its Set of Outcomes
The set of outcomes for the experiment of tossing this 20-faced coin
is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}.
- A set is a collection of some items (elements).
- Sample space is the set of all possible outcomes of an experiment.
- An empty set is the set with no elements.
3/9
An Experiment & its Set of Outcomes
The set of outcomes for the experiment of tossing this 20-faced coin
is “in the eyes of the beholder” (or a measuring apparatus).
What if I cannot identify the digits, but can tell if there are two or one?
4/9
Membership and Inclusion
Elements belong to sets (or not) and sets contain elements (or not).
Sets are subsets of other sets (or not).
The set A is a subset of B if every element of A is also an element of B.
What is B to A?
Notation:
- We use upper case letters for sets (and lower case for set elements).
- We use Ω for the sample space, and ∅ for the empty set.
- ∈ means "belongs to". How about ∋ and ∉?
  Example: If Ω = {H, T}, then H ∈ Ω and Ω ∋ T and a ∉ Ω.
- ⊂ means "is subset of", as in A ⊂ B. Is B ⊃ A?
What can we say if Ω = {1, 2, 3, 4, 5, 6}, A = {1, 2, 3}, and B = {2, 4, 6}?
5/9
Set Operations
Let A and B be two sets.
A ∪ B = {x | x ∈ A or x ∈ B} .
A ∩ B = {x | x ∈ A and x ∈ B} .
A − B = A \ B = {x | x ∈ A and x ∉ B} .
Ā = Ã = Aᶜ = {x | x ∈ Ω and x ∉ A} .
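These definitions map directly onto Python's built-in set type; a minimal sketch (the names Omega, A, and B echo the earlier example and are otherwise illustrative):

    # Sample space of a die roll and two events
    Omega = {1, 2, 3, 4, 5, 6}
    A = {1, 2, 3}
    B = {2, 4, 6}

    print(A | B)        # union A ∪ B        -> {1, 2, 3, 4, 6}
    print(A & B)        # intersection A ∩ B -> {2}
    print(A - B)        # difference A \ B   -> {1, 3}
    print(Omega - A)    # complement of A    -> {4, 5, 6}
    print(A <= Omega)   # "A is a subset of Omega" -> True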
6/9
Set Operations – Venn Diagrams
[Venn diagrams for A ∩ B, A ∪ B, A \ B, and Aᶜ]
7/9
How do Sets Relate to Each Other?
8/9
Reading Material
9/9
BASIC PROBABILITY
1 / 13
An Experiment, Its Outcomes & Their Probabilities
2 / 13
Exercise
A ∩ (B ∪ C)ᶜ
3 / 13
Exercise continued
The outcome is
1. in at most one of the events A or B:  (A ∩ B)ᶜ
2. in none of the events A, B, or C:  (A ∪ B ∪ C)ᶜ
4 / 13
An Experiment, Its Outcomes & Their Probabilities
Each outcome j is assigned a probability µⱼ, and the probabilities sum to one:
  ∑_{j=1}^{|Ω|} µⱼ = 1 .
Ω = {1, 2, 3, 4, 5, 6}
5 / 13
An Experiment & Its Associated Random Variable(s)
6 / 13
An Experiment & Its Associated Random Variable(s)
7 / 13
Random Variable & its Probability Distribution
For an experiment, we have
X: RV denoting the value of its outcomes
⌦: the sample space (i.e., the set of all possible values of X)
9 / 13
A Claim and a Proof
Claim: If E ⊂ F, then P(E) ≤ P(F).
Proof:
- By the definition of the probability of an event, we have
    P(E) = ∑_{ω∈E} µ(ω)   and   P(F) = ∑_{ω∈F} µ(ω)
- We note that
    P(F) = ∑_{ω∈F} µ(ω) = ∑_{ω∈E} µ(ω) + ∑_{ω∈F\E} µ(ω) ≥ P(E) ,
  since the first sum equals P(E) and the second sum is ≥ 0.
10 / 13
A Claim and a Proof
Claim:
Let A₁, . . . , Aₙ be a partition of Ω, and let E be any event. Then
  P(E) = ∑_{i=1}^{n} P(E ∩ Aᵢ)
Proof reasoning:
- What is a partition? The Aᵢ are pairwise disjoint and A₁ ∪ · · · ∪ Aₙ = Ω.
- What about the sum ∑? Have we seen a sum of event probabilities earlier?
  If A and B are disjoint, then P(A ∪ B) = P(A) + P(B).
- The claim should hold for any n; how about n = 2?
- What is the union of the sets E ∩ Aᵢ? What is their intersection?
11 / 13
What can we say about sets E \ Ai ?
[Diagram: Ω partitioned into A₁, A₂, A₃, with event E intersecting each Aᵢ]
12 / 13
Reading Material
13 / 13
CONDITIONAL PROBABILITY
1 / 18
An Experiment, Its Outcomes & Events
Example:
The sample space for the die-rolling experiment is ⌦ = {1, 2, 3, 4, 5, 6}.
“The number of dots that turned up is divisible by 3” is an event.
2 / 18
Re-Assessing Beliefs
Ω     1     2     3     4     5     6
µ    1/6   1/6   1/6   1/6   1/6   1/6
µ′    0     0    1/2    0     0    1/2
3 / 18
Conditional Probability
4 / 18
Conditional Probability Examples
Ω     1     2     3     4     5     6
µ    1/6   1/6   1/6   1/6   1/6   1/6
µ′    0     0    1/2    0     0    1/2
If we know that E has taken place, what can we say about other events?
5 / 18
Conditional Probability Computing
[Venn diagram: events E and F with intersection E ∩ F]
6 / 18
Conditional Probability Computing
E is what remains of Ω
E ∩ F is what remains of F
7 / 18
An Example – Revising Belief
8 / 18
An Example – Is New Knowledge Always Helpful?
9 / 18
The Bayes Rule
We have seen that P(F|E) = P(E ∩ F) / P(E). What is P(E|F)?

  P(E|F) = P(F ∩ E) / P(F)

  P(F|E) = P(E ∩ F) / P(E) = P(E|F)P(F) / P(E)
10 / 18
A Reminder
[Diagram: Ω partitioned into A₁, A₂, A₃, with event E intersecting each Aᵢ]
11 / 18
Total Probability
2. For any E, F ⊂ Ω,
     P(F|E) = P(E ∩ F) / P(E)
If A₁, . . . , Aₙ is a partition of Ω, then
  P(E) = ∑_{i=1}^{n} P(E | Aᵢ) P(Aᵢ)
12 / 18
An Example
One of the 3 biased dice is picked uniformly at random and rolled.
Each die is equally likely to be picked as any other.
Ω      1     2     3     4     5     6
µ_W   1/6   1/6   1/6   1/6   1/6   1/6
µ_B    0     0    1/2    0     0    1/2
µ_R   1/3    0    1/3    0    1/3    0
14 / 18
Independent Events – An Example
If an even number turned up, what is the probability that the die is red?
If the red die is picked, what is the probability that the number is even?
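A minimal sketch of answering both questions numerically, using the total probability theorem and the Bayes rule with the PMFs µ_W, µ_B, µ_R from the table above (exact arithmetic with fractions):

    from fractions import Fraction as F

    # PMFs of the three dice over the faces 1..6, as in the table
    mu = {
        "white": {k: F(1, 6) for k in range(1, 7)},
        "black": {3: F(1, 2), 6: F(1, 2)},
        "red":   {1: F(1, 3), 3: F(1, 3), 5: F(1, 3)},
    }
    prior = {color: F(1, 3) for color in mu}   # each die equally likely to be picked

    even = {2, 4, 6}
    p_even_given = {c: sum((p for k, p in pmf.items() if k in even), F(0))
                    for c, pmf in mu.items()}

    # Total probability: P(even) = sum over colors of P(even | color) P(color)
    p_even = sum(p_even_given[c] * prior[c] for c in mu)

    # Bayes rule: P(red | even) = P(even | red) P(red) / P(even)
    p_red_given_even = p_even_given["red"] * prior["red"] / p_even
    print(p_even_given["red"], p_red_given_even)   # both 0: the red die has no even faces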
15 / 18
Independent Events
Claim:
Two events E and F are independent if and only if
  P(E ∩ F) = P(E)P(F) .
Proof: If P(E|F) = P(E), then
  P(E ∩ F) = P(E|F)P(F) = P(E)P(F) .
16 / 18
Independence is Tricky
17 / 18
Reading Material
18 / 18
COUNTING METHODS
1 / 14
A Two-Stage Experiment
2 / 14
Sampling With and Without Replacement
Besides, we may (or not) care about the order of the recorded numbers.
3 / 14
Sampling With and Without Replacement
Example:
Pick and record 7 cards from a regular deck of 52 cards w/o replacement.
Q: How many different ordered sequences of cards are possible?
A: 52 · 51 · 50 · 49 · 48 · 47 · 46
4 / 14
k-permutations of n
5 / 14
n-permutations of n
6 / 14
k-combinations of n
7 / 14
Drawing with Replacement
8 / 14
The Birthday Problem
1. Would you bet that two (any two) have the same birthday?
Yes, if the probability that this happens is higher than the probability that it does not.
2. Would you bet that at least one has the same birthday as yours?
Probability that no friend has the same birthday as yours is
  (364/365)^{23} .
Is this greater or smaller than 1/2?
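Both bets can be checked in a few lines; a minimal sketch assuming 23 friends and 365 equally likely birthdays:

    import math

    n = 23

    # 2. P(at least one friend shares *your* birthday) = 1 - (364/365)^n
    print(1 - (364 / 365) ** n)          # about 0.061: a bad bet

    # 1. P(some two of the n people share a birthday)
    p_no_pair = math.prod((365 - i) / 365 for i in range(n))
    print(1 - p_no_pair)                 # about 0.507: a (barely) favorable bet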
9 / 14
Tree Diagrams
Tree diagrams are used to study experiments that take place in stages,
e.g., ordering food in restaurants (appetizer, main dish, dessert):
[Tree: appetizer {soup, juice} → main dish {meat, fish, vegetable} → dessert {ice cream, cake}]
How many possible choices are there for the complete meal?
10 / 14
Tree Diagrams – Total Probability and Bayes Rule
[Tree with branch probabilities for appetizer and main dish; resulting outcome probabilities m(ω₁) = .4, m(ω₂) = .24, m(ω₃) = .16, m(ω₄) = .06, . . . , m(ω₆) = .06]
Observations:
(We denote a head by 0, a tail by 1, and P(0) = p, P(1) = 1 - p.)
1. Sample space consists of all 4-bit binary strings.
(Each bit corresponds to a coin toss sub-experiment.)
2. Event of interest is E = {1100, 1010, 1001, 0101, 0110, 0011}.
   Note that there are C(4, 2) = 6 outcomes in E.
3. The probability of each outcome in E is p²(1 − p)².
   ⇒  P(E) = C(4, 2) · p²(1 − p)²
12 / 14
Tree Diagrams for Coin Tosses
[Binary tree of depth 4: each toss branches to H with probability p and T with probability 1 − p; the 16 leaves are numbered 0–15]
13 / 14
Reading Material
14 / 14
DISCRETE RANDOM VARIABLES
1 / 16
An Experiment & Its Associated Random Variable(s)
2 / 16
An Experiment & Its Associated Random Variable(s)
3 / 16
Discrete Random Variables
4 / 16
Random Variable & its Probability Distribution
For discrete RVs, we also say the probability mass function (PMF).
5 / 16
Bernoulli Random Variable
The indicator RV of an event E (equal to 1 if E occurs and 0 otherwise)
is Bernoulli(P(E)).
6 / 16
k-combinations of n
7 / 16
Binomial Coefficients
8 / 16
Repeated Coin Tosses
Observations:
(We denote a head by 1, a tail by 0, and P(1) = p, P(0) = 1 - p.)
1. Sample space consists of all 4-bit binary strings.
(Each bit corresponds to a coin toss sub-experiment.)
2. Event of interest is E = {1100, 1010, 1001, 0101, 0110, 0011}.
   Note that there are C(4, 2) = 6 outcomes in E.
3. The probability of each outcome in E is p²(1 − p)².
   ⇒  P(E) = C(4, 2) · p²(1 − p)²
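A quick numerical check of P(E) = C(4, 2) p²(1 − p)², both from the formula and by brute force over all 16 outcomes (the value p = 0.3 is only an illustrative choice):

    from math import comb

    def p_two_heads_formula(p):
        return comb(4, 2) * p**2 * (1 - p)**2

    def p_two_heads_brute(p):
        total = 0.0
        for s in range(16):                        # all 4-bit strings
            bits = [(s >> i) & 1 for i in range(4)]
            prob = 1.0
            for b in bits:
                prob *= p if b == 1 else (1 - p)   # bit 1 = head w.p. p
            if sum(bits) == 2:                     # exactly two heads
                total += prob
        return total

    print(p_two_heads_formula(0.3), p_two_heads_brute(0.3))   # both 0.2646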
9 / 16
Tree Diagrams for Coin Tosses
[Binary tree of depth 4: each toss branches to H with probability p and T with probability 1 − p; the 16 leaves are numbered 0–15]
10 / 16
Bernoulli Trials
11 / 16
RVs Associated with Bernoulli Trials
12 / 16
RVs Associated with Bernoulli Trials
P(Y = k) = (1 - p)k-1 · p
13 / 16
RVs Associated with Bernoulli Trials
Z – total number of coin tosses until the ℓ-th head appears
14 / 16
Two More Discrete RVs
  P(U = ω) = 1/n   for each ω ∈ Ω
15 / 16
Reading Material
16 / 16
DISCRETE RANDOM VARIABLES
1 / 18
A Gambling Game
1 · (1/6) − 2 · (1/6) + 3 · (1/6) − 4 · (1/6) + 5 · (1/6) − 6 · (1/6) = −0.5 .
2 / 18
The Expected Value
Example: X ∼ Bernoulli(p) ⇒ E[X] = 0 · (1 − p) + 1 · p = p
3 / 18
Some Properties of the Binomial Coefficients
  C(n, k) = n! / (k! (n − k)!)   for 0 ≤ k ≤ n
4 / 18
The Expectation of the Binomial RV
X ∼ B(n, p) ⇒ P(X = k) = C(n, k) p^k (1 − p)^{n−k} for k = 0, 1, . . . , n ⇒

E[X] = ∑_{k=0}^{n} k · C(n, k) p^k (1 − p)^{n−k}
     = ∑_{k=1}^{n} k · (n/k) C(n−1, k−1) p^k (1 − p)^{n−k}
     = np ∑_{k=1}^{n} C(n−1, k−1) p^{k−1} (1 − p)^{(n−1)−(k−1)}
     = np ∑_{ℓ=0}^{m} C(m, ℓ) p^ℓ (1 − p)^{m−ℓ}       (with m = n − 1, ℓ = k − 1)
     = np
5 / 18
The Sum of Two Random Variables
Claim: Let X and Y be RVs with finite expected values. Then E[X + Y] = E[X] + E[Y].
6 / 18
The Expectation of the Binomial RV
X ∼ B(n, p) ⇒ X = Y₁ + Y₂ + · · · + Yₙ , where each Yᵢ ∼ Bernoulli(p)
⇒
E[X] = E[Y₁ + Y₂ + · · · + Yₙ]
     = E[Y₁] + E[Y₂] + · · · + E[Yₙ]
     = np
7 / 18
The Expectation of the Poisson RV
X ∼ Poisson(λ) ⇒ P(X = k) = λ^k e^{−λ} / k!   for k = 0, 1, 2, . . . ⇒

E[X] = ∑_{k=0}^{∞} k · λ^k e^{−λ} / k!
     = e^{−λ} ∑_{k=1}^{∞} λ^k / (k − 1)!
     = λ e^{−λ} ∑_{k=1}^{∞} λ^{k−1} / (k − 1)!
     = λ e^{−λ} · e^{λ} = λ
8 / 18
The Expectation of the Geometric RV
X ∼ Geometric(p), with q = 1 − p ⇒

E[X] = ∑_{k=1}^{∞} k · q^{k−1} p = p ∑_{k=1}^{∞} k q^{k−1}
     = p ∑_{k=1}^{∞} d/dq (q^k) = p · d/dq [ ∑_{k=0}^{∞} q^k ]
     = p · d/dq [ 1/(1 − q) ] = p · 1/(1 − q)²
     = 1/p
9 / 18
The Expectation of the Negative Binomial (Pascal) RV
X ∼ NB(ℓ, p) ⇒ X = Y₁ + Y₂ + · · · + Y_ℓ ,
where Yᵢ ∼ Geometric(p) for i ∈ {1, 2, . . . , ℓ} ⇒ E[Yᵢ] = 1/p
⇒
E[X] = E[Y₁ + Y₂ + · · · + Y_ℓ]
     = E[Y₁] + E[Y₂] + · · · + E[Y_ℓ]
     = ℓ/p
10 / 18
The Variance
We often use µ_X and σ²_X to denote the mean and variance of RV X.
11 / 18
Example # 1
A fair die is rolled once; let X be the number that turns up. Find V(X).
- To find V(X), we must first find the expected value of X:
    E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 7/2
- To find the variance V(X), we form the RV (X − E(X))² and find its expectation:
    X             1      2      3      4      5      6
    PMF          1/6    1/6    1/6    1/6    1/6    1/6
    (X − E(X))²  25/4   9/4    1/4    1/4    9/4   25/4
  From this table, we find E((X − E(X))²) = 35/12, so V(X) = 35/12.
Example # 2
A biased die is rolled once; let X be the number that turns up.
Find V(X) if P(3) = P(4) = 1/2.
- E(X) = 7/2
- To find the variance V(X), we form the RV (X − E(X))² and find its expectation:
    X             1      2      3      4      5      6
    PMF           0      0     1/2    1/2     0      0
    (X − E(X))²  25/4   9/4    1/4    1/4    9/4   25/4
From this table, we find E((X − E(X))²) = 1/4 and D(X) = 1/2 .
13 / 18
Example # 3
A biased die is rolled once; let X be the number that turns up.
Find V(X) if P(1) = P(6) = 1/2.
- E(X) = 7/2
- To find the variance V(X), we form the RV (X − E(X))² and find its expectation:
    X             1      2      3      4      5      6
    PMF          1/2     0      0      0      0     1/2
    (X − E(X))²  25/4   9/4    1/4    1/4    9/4   25/4
From this table, we find E((X − E(X))²) = 25/4 and D(X) = 5/2 .
14 / 18
Point Processes
15 / 18
Point Processes
[Timeline: arrival points marked on the interval from 0 to 60]
16 / 18
Poisson Arrivals
  → λ^k e^{−λ} / k!   as n → ∞
17 / 18
Reading Material
For Midterm 1:
1, 2, 3 in Introduction to Probability, Statistics, and Random Processes
1.2, 3.1, 3.2, 4.1, 5, 6.1, 6.2 in Introduction to Probability
18 / 18
MULTIPLE DISCRETE RANDOM VARIABLES
1 / 18
The Coupon Collector’s Problem
2 / 18
Time to Collect All Numbers (Coupons)
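A minimal simulation sketch of the coupon-collector question, assuming the coupons are the 6 faces of a fair die (names and trial count are illustrative):

    import random

    def time_to_collect_all(n_coupons=6):
        """Number of draws until every face has appeared at least once."""
        seen, draws = set(), 0
        while len(seen) < n_coupons:
            seen.add(random.randint(1, n_coupons))
            draws += 1
        return draws

    trials = 100_000
    avg = sum(time_to_collect_all() for _ in range(trials)) / trials
    print(avg)   # close to the theoretical mean 6 * (1 + 1/2 + ... + 1/6) = 14.7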
4 / 18
A Gambling Game
5 / 18
Functions of Random Variables
6 / 18
Functions of Random Variables
Example:
In a die rolling experiment, RV X corresponds to the number
on the face that turns up, and RV Y = X² is a function of X.
Compute and compare  1) ∑_{x∈Ω_X} x² P(X = x)   and   2) ∑_{y∈Ω_Y} y P(Y = y).
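A short check, in exact arithmetic, that the two sums agree for the fair die and Y = X²:

    from fractions import Fraction as F

    pmf_X = {x: F(1, 6) for x in range(1, 7)}          # fair die

    # 1) sum over x of x^2 P(X = x)
    e1 = sum(x**2 * p for x, p in pmf_X.items())

    # 2) build the PMF of Y = X^2, then sum over y of y P(Y = y)
    pmf_Y = {}
    for x, p in pmf_X.items():
        pmf_Y[x**2] = pmf_Y.get(x**2, F(0)) + p
    e2 = sum(y * p for y, p in pmf_Y.items())

    print(e1, e2)   # both 91/6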
7 / 18
Two Discrete Random Variables - Example
Ω      1     2     3     4     5     6
µ_B    0     0    1/2    0     0    1/2
µ_R   1/3    0    1/3    0    1/3    0
Let RV X be the color of the picked die and Y the number that turns up.
Find the joint probability of X and Y.
8 / 18
Joint and Marginal Probabilities of Two RVs
We can compute the marginal PMFs of X and Y from the joint PMF
  p_X(x) = ∑_{y∈Ω_Y} p_{X,Y}(x, y)        p_Y(y) = ∑_{x∈Ω_X} p_{X,Y}(x, y)
9 / 18
Conditioning on Random Variables
10 / 18
Independent Random Variables
X Y
11 / 18
The Sum of Two Random Variables
12 / 18
The Product of Two Independent Random Variables
13 / 18
Conditional Expectation
Definition: Let F be an event and X an RV with Ω_X = {x₁, x₂, . . .}.
The conditional expectation of X given F is
  E(X|F) = ∑_j x_j P(X = x_j | F) .

Claim: If F₁, F₂, . . . is a partition of Ω, then E(X) = ∑_k E(X|F_k) P(F_k).

Proof:  ∑_k E(X|F_k) P(F_k) = ∑_k ( ∑_j x_j P(X = x_j | F_k) ) P(F_k)
                            = ∑_k ∑_j x_j P(X = x_j and F_k occurs)
                            = ∑_j x_j ∑_k P(X = x_j and F_k occurs)
                            = ∑_j x_j P(X = x_j) = E(X)
14 / 18
Joint (Vector) Random Variables
  Ω = Ω₁ × Ω₂ × · · · × Ωₙ
15 / 18
(In)dependence of Random Variables X1 , X2 , . . . Xn , . . .
Independent RVs:
  P(Xₙ = xₙ | X₁ = x₁, . . . , Xₙ₋₁ = xₙ₋₁) = P(Xₙ = xₙ)
  ⇒ P(X₁ = x₁, . . . , Xₙ = xₙ) = ∏_{i=1}^{n} P(Xᵢ = xᵢ)
Markov Chain:
  P(Xₙ = xₙ | X₁ = x₁, . . . , Xₙ₋₁ = xₙ₋₁) = P(Xₙ = xₙ | Xₙ₋₁ = xₙ₋₁)
  ⇒ P(X₁ = x₁, . . . , Xₙ = xₙ) = P(X₁ = x₁) ∏_{i=2}^{n} P(Xᵢ = xᵢ | Xᵢ₋₁ = xᵢ₋₁)
Martingale:  E[|Xₙ|] < ∞  and  E[Xₙ | X₁, . . . , Xₙ₋₁] = Xₙ₋₁
16 / 18
Bernoulli Trials and Gambler’s Fortune
Let
- p be the probability that head turns up,
- Xₙ be the RV associated with the gain/loss of the n-th toss,
- Sₙ be the RV associated with the gambler's fortune after n tosses.
How about S1 , S2 , . . . ?
Note that, when p = 1/2, the gambler’s expected fortune after the next trial,
given the history, is equal to his present fortune. Therefore, Sn is a martingale.
17 / 18
Reading Material
18 / 18
(IN)DEPENDENCE OF RVs
1 / 18
Joint (Vector) Random Variables
- When several RVs X₁, X₂, . . . , Xₙ correspond to an experiment,
  we often consider them jointly as a vector RV X̄ = (X₁, X₂, . . . , Xₙ).
- If Xᵢ has range Ωᵢ, then the range Ω of X̄ is the Cartesian product
    Ω = Ω₁ × Ω₂ × · · · × Ωₙ
Independent RVs:
  P(Xₙ = xₙ | X₁ = x₁, . . . , Xₙ₋₁ = xₙ₋₁) = P(Xₙ = xₙ)
  ⇒ P(X₁ = x₁, . . . , Xₙ = xₙ) = ∏_{i=1}^{n} P(Xᵢ = xᵢ)
Markov Chain:
  P(Xₙ = xₙ | X₁ = x₁, . . . , Xₙ₋₁ = xₙ₋₁) = P(Xₙ = xₙ | Xₙ₋₁ = xₙ₋₁)
  ⇒ P(X₁ = x₁, . . . , Xₙ = xₙ) = P(X₁ = x₁) ∏_{i=2}^{n} P(Xᵢ = xᵢ | Xᵢ₋₁ = xᵢ₋₁)
Martingale:  E[|Xₙ|] < ∞  and  E[Xₙ | X₁, . . . , Xₙ₋₁] = Xₙ₋₁
3 / 18
Joint and Marginal PMFs Two RVs
              Ω_Y
               y₁                   y₂                   . . .   y_m
Ω_X   x₁   P(X = x₁, Y = y₁)    P(X = x₁, Y = y₂)     . . .   P(X = x₁, Y = y_m)    P(X = x₁)
      x₂   P(X = x₂, Y = y₁)    P(X = x₂, Y = y₂)     . . .   P(X = x₂, Y = y_m)    P(X = x₂)
      ...        ...                  ...              ...          ...                ...
      x_ℓ  P(X = x_ℓ, Y = y₁)   P(X = x_ℓ, Y = y₂)    . . .   P(X = x_ℓ, Y = y_m)   P(X = x_ℓ)
           P(Y = y₁)            P(Y = y₂)             . . .   P(Y = y_m)
The PMFs of the individual RVs are referred to as the marginal PMFs.
4 / 18
Joint and Marginal PMFs Two RVs
Example #1: A red die and a blue die are rolled, and the joint
probability for each pair of faces is given in the following table:
[Joint probability table over the faces of the red and blue dice]
5 / 18
Joint and Marginal PMFs Two RVs
Example #2: A red die and a blue die are rolled, and the joint
probability for each pair of faces is given in the following table:
[Joint probability table over the faces of the red and blue dice]
6 / 18
The Sum of Two Random Variables
7 / 18
The Product of Two Independent Random Variables
8 / 18
Calculation of the Variance
V(X) = E(X²) − µ² .
Proof: We have
  V(X) = E[(X − µ)²]
       = E(X² − 2µX + µ²)
       = E(X²) − 2µE(X) + µ²
       = E(X²) − µ²
9 / 18
Calculation of the Expectation of a Linear Function of an RV
Proof: We have
  E[aX + b] = ∑_{x∈Ω_X} (ax + b) · p_X(x)
            = a ∑_{x∈Ω_X} x · p_X(x) + b ∑_{x∈Ω_X} p_X(x)
            = aE[X] + b.
10 / 18
Calculation of the Variance of a Linear Function of an RV
Proof: We have
  V[Y] = ∑_{x∈Ω_X} (ax + b − E[aX + b])² · p_X(x)
       = ∑_{x∈Ω_X} (ax + b − aE[X] − b)² · p_X(x)
       = a² ∑_{x∈Ω_X} (x − E[X])² · p_X(x) = a² V[X]
11 / 18
The Variance of the Sum Two Independent RVs
12 / 18
Covariance and Correlation
Definition:
The covariance of RVs X and Y is cov(X, Y) = E[(X − E[X])(Y − E[Y])],
and the correlation coefficient is ρ_{X,Y} = cov(X, Y) / √(V(X)V(Y))
13 / 18
Some Properties of Covariance
How much can one RV tell about the other? Covariance is an indicator.
  cov(X, Y) = E[(X − E[X])(Y − E[Y])]
1. cov(X, X) = V(X)
2. cov(X, Y) = cov(Y, X)
3. −1 ≤ ρ(X, Y) ≤ 1
4. V(X + Y) = V(X) + V(Y) + 2 cov(X, Y)
5. cov(X, Y) = E[X · Y] − E[X] · E[Y]
⇒ If two RVs are independent, they are also uncorrelated.
TRUE OR FALSE?
1. If X and Y are uncorrelated, then V(X + Y) = V(X) + V(Y) T
2. If X and Y are uncorrelated, then they are independent. F
14 / 18
Correlation vs. (In)dependence
Example:
The joint probability of RVs X and Y is given in the following table:
            Ω_X
            −1     0     1
Ω_Y   −1     0    1/4    0
       0    1/4    0    1/4
       1     0    1/4    0
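For this joint PMF one can check numerically that X and Y are uncorrelated yet not independent; a minimal sketch:

    from fractions import Fraction as F

    # joint PMF from the table, keyed by (x, y)
    joint = {(0, -1): F(1, 4), (-1, 0): F(1, 4), (1, 0): F(1, 4), (0, 1): F(1, 4)}

    EX  = sum(x * p for (x, y), p in joint.items())
    EY  = sum(y * p for (x, y), p in joint.items())
    EXY = sum(x * y * p for (x, y), p in joint.items())
    print(EXY - EX * EY)                       # cov(X, Y) = 0: uncorrelated

    # ... but dependent: P(X = 0, Y = 0) = 0 while P(X = 0) P(Y = 0) = 1/4
    pX0 = sum(p for (x, y), p in joint.items() if x == 0)
    pY0 = sum(p for (x, y), p in joint.items() if y == 0)
    print(joint.get((0, 0), F(0)), pX0 * pY0)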
15 / 18
Mutual Information
16 / 18
Solve a problem before you go ...
17 / 18
Reading Material
18 / 18
SUMS OF RVs & LAWS OF LARGE NUMBERS
1 / 17
The Expectation of the Sum of Two Random Variables
2 / 17
The Variance of the Sum of Two Random Variables
Claim: Let X and Y be two RVs with finite expectations. Then
3 / 17
Distribution of the Sum of Two Independent RVs
A die is rolled twice. Let X₁ and X₂ be the outcomes and S₂ = X₁ + X₂.
X₁ & X₂ are iid (independent, identically distributed) with PMF m:
  Ω     1     2     3     4     5     6
  m    1/6   1/6   1/6   1/6   1/6   1/6
4 / 17
The Distribution of the Sum of Two Independent RVs
5 / 17
Distribution of the Sum of Two Independent RVs
The price of a stock on a trading day changes by some random amount X with PMF p_X:
  X   :  −1    0    1    2
  p_X :  1/4  1/2  1/8  1/8
Find the distribution of the change in stock price after two consecutive
and independent trading days.
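The two-day change S₂ = X₁ + X₂ can be found by convolving the PMF with itself; a minimal sketch in exact arithmetic:

    from fractions import Fraction as F
    from collections import defaultdict

    pX = {-1: F(1, 4), 0: F(1, 2), 1: F(1, 8), 2: F(1, 8)}   # one-day change

    pS2 = defaultdict(F)                 # discrete convolution for independent days
    for x1, p1 in pX.items():
        for x2, p2 in pX.items():
            pS2[x1 + x2] += p1 * p2

    for s in sorted(pS2):
        print(s, pS2[s])                 # e.g. P(S2 = -2) = 1/16, P(S2 = 0) = 5/16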
6 / 17
The Weak Law of Large Numbers
Example:
- Consider n rolls of a die, and let Xⱼ be the outcome of the jth roll.
  This is an independent trials process with E(Xⱼ) = 7/2.
- What can we say about Sₙ = X₁ + X₂ + · · · + Xₙ?
The Weak Law of Large Numbers says that, for any ε > 0,
  P( |Sₙ/n − 7/2| > ε ) → 0   as n → ∞.
7 / 17
Distribution of Sn /n for Die Rolling Trials, n=1, n=2, n=3
[Plot: PMF of Sₙ/n for n = 1, 2, 3; x-axis: sample space 1–6, y-axis: probability]
8 / 17
Law of Large Numbers
The Weak Law of Large Numbers says that, for any ε > 0,
  P( |Sₙ/n − µ| > ε ) → 0   as n → ∞
9 / 17
Sample Mean and Variance
10 / 17
Sample Mean
11 / 17
The Sample Mean Estimator
What else is important for an estimator (besides the bias)?
12 / 17
Sample Mean and Variance
13 / 17
Calculation of the Expectation of a Linear Function of an RV
Proof: We have
  E[aX + b] = ∑_{x∈Ω_X} (ax + b) · p_X(x)
            = a ∑_{x∈Ω_X} x · p_X(x) + b ∑_{x∈Ω_X} p_X(x)
            = aE[X] + b.
14 / 17
Calculation of the Variance of a Linear Function of an RV
Proof: We have
  V[Y] = ∑_{x∈Ω_X} (ax + b − E[aX + b])² · p_X(x)
       = ∑_{x∈Ω_X} (ax + b − aE[X] − b)² · p_X(x)
       = a² ∑_{x∈Ω_X} (x − E[X])² · p_X(x) = a² V[X]
15 / 17
Sample Mean and Variance
1. The Weak Law of Large Numbers tells us that, for any ε > 0,
     P( |X̄ₙ − µ| > ε ) → 0   as n → ∞
2. The variance of the sample mean is V(X̄ₙ) = σ²/n.
It is very unlikely that the sample mean gets very far from its mean.
16 / 17
Reading Material
17 / 17
TAILS, LIMITS, AND CONTINUITY
1 / 16
Fraction of Heads in Coin Tossing
Problem: A fair coin is tossed n times (e.g. 50, 100, 200)
1. What is the expected fraction of heads?
2. How likely is it that the fraction of heads deviates
from the expected by more than 0.1?
[Plots: PMF of the fraction of heads for n = 50, 100, and 200; x-axis from 0 to 1, y-axis: probability]
2 / 16
Number of Heads in n = 50 Fair-Coin Tosses
P(fraction of heads deviates from 1/2 by more than 0.1) =
  ∑_{k=0}^{19} C(50, k) (1/2)^k (1/2)^{50−k}  +  ∑_{k=31}^{50} C(50, k) (1/2)^k (1/2)^{50−k}
[Plot: PMF of the number of heads in 50 tosses; x-axis: number of heads (10–40), y-axis: probability]
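The displayed two-sided tail probability can be evaluated directly; a minimal sketch:

    from math import comb

    n = 50
    def pmf(k):                       # C(50, k) (1/2)^k (1/2)^(50-k) = C(50, k) / 2^50
        return comb(n, k) * 0.5**n

    tail = sum(pmf(k) for k in range(0, 20)) + sum(pmf(k) for k in range(31, 51))
    print(tail)                       # about 0.12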
3 / 16
Measuring Deviation from the Mean
Claim: For any RV X and any positive real number ε > 0, we have a
bound on the probability that X differs from E(X) by ε or more:
  P( |X − E(X)| ≥ ε ) ≤ V(X)/ε²        (Chebyshev Inequality)
The Weak Law of Large Numbers says that, for any ε > 0,
  P( |Sₙ/n − µ| > ε ) → 0   as n → ∞
5 / 16
Law of Large Numbers - Proof
6 / 16
Fraction of Heads in Coin Tossing
7 / 16
Bernoulli Trials
8 / 16
Fraction of Heads in Coin Tossing
9 / 16
Fraction of Heads Yn in n Coin Tosses
[Plots: PMF of Yₙ for n = 50, 100, and 200; x-axis from 0 to 1, y-axis: probability]
The possible values 0, 1/n, 2/n, . . . , 1 of Yₙ become closer to each other.
10 / 16
Fraction of Heads Yn in n Coin Tosses
[Plots: PMF of Yₙ for n = 50, 100, and 200; x-axis from 0 to 1, y-axis: probability]
The possible values 0, 1/n, 2/n, . . . , 1 of Yₙ become closer to each other.
11 / 16
Continuous Random Variables
[Real line from −∞ to ∞ with points t, x, and x + δ marked]
12 / 16
Cumulative Distribution Function (CDF)
F_X(x) = P(X ≤ x)
Note that
  P_X((x, x + δ]) = F(x + δ) − F(x)
13 / 16
Probability Density Function (PDF)
The probability density function of a continuous real-valued RV X is
  f(x) = lim_{δ→0} P_X((x, x + δ]) / δ .
Therefore,
  f(x) = lim_{δ→0} (F(x + δ) − F(x)) / δ = d/dx F(x) = F′(x)
14 / 16
Probability Density Function (PDF) - Properties
- F(x) = ∫_{−∞}^{x} f(t) dt
- f_X(x) is not the probability that X takes the value x
- P((a, b]) = ∫_{a}^{b} f(x) dx
- ∫_{−∞}^{∞} f(x) dx = 1
15 / 16
Reading Material
16 / 16
CONTINUOUS RANDOM VARIABLES
1 / 18
Continuous Random Variables
[Real line from −∞ to ∞ with points t, x, and x + δ marked]
2 / 18
Cumulative Distribution Function (CDF)
F_X(x) = P(X ≤ x)
Note that
  P_X((x, x + δ]) = F(x + δ) − F(x)
3 / 18
Probability Density Function (PDF)
The probability density function of a continuous real-valued RV X is
  f(x) = lim_{δ→0} P_X((x, x + δ]) / δ .
Therefore,
  f(x) = lim_{δ→0} (F(x + δ) − F(x)) / δ = d/dx F(x) = F′(x)
4 / 18
Probability Density Function (PDF) - Properties
- F(x) = ∫_{−∞}^{x} f(t) dt
- f_X(x) is not the probability that X takes the value x
- P((a, b]) = ∫_{a}^{b} f(x) dx
- ∫_{−∞}^{∞} f(x) dx = 1
5 / 18
Expected Value and Variance
The variance σ² = V(X) of a real-valued RV X is defined by
  σ² = E[(X − µ)²] = ∫_{−∞}^{+∞} (x − µ)² f(x) dx
     = ∫_{−∞}^{+∞} (x² − 2µx + µ²) f(x) dx
     = ∫_{−∞}^{+∞} x² f(x) dx − 2µ ∫_{−∞}^{+∞} x f(x) dx + µ² ∫_{−∞}^{+∞} f(x) dx
     = E(X²) − 2µ · E(X) + µ² · 1
     = E(X²) − µ²
6 / 18
Uniform Random Variable X ⇠ Uniform(a, b)
The PDF:
  f(x) = 1/(b − a)   for a ≤ x ≤ b,
         0           for x < a or x > b
[Plot: f_X(x) equals 1/(b − a) on [a, b] and 0 elsewhere]
7 / 18
A Spinner - An Example of a Uniform RV
The PDF of X:
  f(x) = 1   for 0 ≤ x ≤ 1,
         0   otherwise
Point x is at distance x from 0.
8 / 18
An Example of a Uniform RV
How long will Michael have to wait for the bus on average?
What is the probability that he waits less than five minutes?
9 / 18
Gaussian (Normal) Random Variable X ∼ N(µ, σ²)
PDF:
  f(x; µ, σ²) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}
E(X) = µ
V(X) = σ²
[Plot: PDFs for µ = 0, σ = 0.4 and for µ = 1, σ = 0.8, shown for x from −2 to 4]
10 / 18
Exponential Random Variable X ∼ Expo(λ)
PDF:
  f(x) = λ e^{−λx}   for x ≥ 0,
         0           for x < 0.
CDF:
  F(x) = 1 − e^{−λx}   for x ≥ 0,
         0             for x < 0.
[Plot: PDFs for λ = 1.5 and λ = 0.75, shown for x from 0 to 5]
Models, e.g., device lifetime, service request inter-arrival time, and service time.
“How long do we have to wait until something happens?”
11 / 18
The Mean and the Variance of X ⇠ Expo( )
E(X) = ∫₀^∞ x f_X(x) dx = ∫₀^∞ x λ e^{−λx} dx
     = [ −x e^{−λx} ]₀^∞ + ∫₀^∞ e^{−λx} dx
     = 0 + [ −e^{−λx}/λ ]₀^∞ = 1/λ

V(X) = E(X²) − E²(X) = ∫₀^∞ x² f_X(x) dx − 1/λ²
     = ∫₀^∞ x² λ e^{−λx} dx − 1/λ²
     = [ −x² e^{−λx} ]₀^∞ + 2 ∫₀^∞ x e^{−λx} dx − 1/λ²
     = 0 + 2 · (1/λ²) − 1/λ² = 1/λ² .
12 / 18
The Memoryless Property of Exponential RVs
Claim: If T ∼ Expo(λ), then P(T > r + s | T > r) = P(T > s) for all r, s ≥ 0.
Proof:
  P(T > r + s | T > r) = P(T > r + s ∩ T > r) / P(T > r)
                       = P(T > r + s) / P(T > r) = (1 − F(r + s)) / (1 − F(r))
                       = e^{−λ(r+s)} / e^{−λr} = e^{−λs}
                       = 1 − F(s) = P(T > s)
13 / 18
What is Half-Life?
14 / 18
Half Life Example
- Half-life is often used to specify exponential decay.
- E.g., “hard drives have a half-life of two years” means
    Pr(T > 2) = 1/2 ,
  where T is the time it takes for a new disk to fail (T is an RV).
What is the probability that a disk needs repair within its first year?
Solution:
- We know that T is an exponential RV but are not given λ.
- We can find λ from the half-life, or
- we can use the memoryless property to solve this problem.
More generally, we can have
- increasing failure rate if P(T > r + s | T > r) < P(T > s)
- decreasing failure rate if P(T > r + s | T > r) > P(T > s)
- constant failure rate if P(T > r + s | T > r) = P(T > s)
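The half-life computation in a few lines, assuming the two-year half-life from the example:

    import math

    # P(T > 2) = exp(-2 * lam) = 1/2  =>  lam = ln(2) / 2
    lam = math.log(2) / 2

    # Probability the disk needs repair within its first year: P(T <= 1) = 1 - exp(-lam)
    print(lam, 1 - math.exp(-lam))    # lam ≈ 0.347, probability ≈ 1 - 2**(-1/2) ≈ 0.293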
16 / 18
Solve a problem before you go ...
17 / 18
Reading Material
18 / 18
MULTIPLE CONTINUOUS RVs
1 / 27
Uniform Random Variable X ⇠ Uniform(a, b)
The PDF:
  f(x) = 1/(b − a)   for a ≤ x ≤ b,
         0           for x < a or x > b
[Plot: f_X(x) equals 1/(b − a) on [a, b] and 0 elsewhere]
2 / 27
A Spinner - An Example of a Uniform RV
The PDF of X:
  f(x) = 1   for 0 ≤ x ≤ 1,
         0   otherwise
Point x is at distance x from 0.
3 / 27
Recall Rolling a Die
Recall:
The range ⌦ is the set of all possible outcomes of an experiment.
Elements of ⌦ are called outcomes, and its subsets are called events.
Example:
The range for the die-rolling experiment is ⌦ = {1, 2, 3, 4, 5, 6}. “The
number of dots that turned up is smaller than 4” is an event.
4 / 27
Re-Assessing Beliefs
Ω     1     2     3     4     5     6
µ    1/6   1/6   1/6   1/6   1/6   1/6
µ′   1/3   1/3   1/3    0     0     0
5 / 27
Continuous Conditional Probability
6 / 27
Continuous Conditional Probability – Example
Suppose we know the pointer is in the upper half of the circle – event E.
What is then the probability of event F that 1/6 ≤ x ≤ 1/3?
E = [0, 1/2], F = [1/6, 1/3], and F ∩ E = F.
Therefore,
  P(F|E) = P(F ∩ E) / P(E) = (1/6) / (1/2) = 1/3
7 / 27
Joint PDFs and CDFs of Multiple Continuous RVs
F(x₁, x₂, . . . , xₙ) = P(X₁ ≤ x₁, X₂ ≤ x₂, . . . , Xₙ ≤ xₙ)
- f(x₁, x₂, . . . , xₙ) = ∂ⁿ F(x₁, x₂, . . . , xₙ) / (∂x₁ ∂x₂ · · · ∂xₙ)
8 / 27
Joint and Marginal PDFs of Two Continuous RVs
9 / 27
Independent Continuous RVs
or equivalently
I Continuous RVs X1 , X2 , . . . , Xn with PDFs f1 (x), f2 (x), . . . , fn (x)
are mutually independent iff
f(x1 , x2 , . . . , xn ) = f1 (x1 )f2 (x2 ) · · · fn (xn )
for any choice of x1 , x2 , . . . , xn .
10 / 27
Expectation of Sums and Products
- If X and Y are real-valued RVs and c is any constant, then
    E(X + Y) = E(X) + E(Y)   and   E(cX) = cE(X) .
- If, in addition, X and Y are independent, then
    E(XY) = E(X)E(Y)
11 / 27
Variance of Sums
V(cX) = c2 V(X) ,
V(X + c) = V(X) .
12 / 27
Covariance and Correlation
Definition:
The covariance of RVs X and Y is cov(X, Y) = E[(X − E[X])(Y − E[Y])],
and the correlation coefficient is ρ_{X,Y} = cov(X, Y) / √(V(X)V(Y))
13 / 27
Some Properties of Covariance
How much can one RV tell about the other? Covariance is an indicator.
  cov(X, Y) = E[(X − E[X])(Y − E[Y])]
1. cov(X, X) = V(X)
2. cov(X, Y) = cov(Y, X)
3. −1 ≤ ρ(X, Y) ≤ 1
4. V(X + Y) = V(X) + V(Y) + 2 cov(X, Y)
5. cov(X, Y) = E[X · Y] − E[X] · E[Y]
⇒ If two RVs are independent, they are also uncorrelated.
TRUE OR FALSE?
1. If X and Y are uncorrelated, then V(X + Y) = V(X) + V(Y) T
2. If X and Y are uncorrelated, then they are independent. F
14 / 27
PMF of the Sum of Two Independent RVs
A die is rolled twice. Let X₁ and X₂ be the outcomes, and S₂ = X₁ + X₂.
Then X₁ and X₂ are iid (independent, identically distributed) with PMF m:
  Ω     1     2     3     4     5     6
  m    1/6   1/6   1/6   1/6   1/6   1/6
15 / 27
PMF of the Sum of Two Independent RVs
16 / 27
PDF of the Sum of Two Independent RVs
17 / 27
Dirac’s Delta Function
18 / 27
Mathematical Convenience and/or Physical Reality?
How precisely can we locate a particle? Nice video if you like physics.
19 / 27
The Step and the Impulse Function
[Plots: the unit step function H(x) and the unit impulse δ(x)]
20 / 27
Continuous Representation of Discrete RVs
Rolling a die with PMF m:
  Ω     1     2     3     4     5     6
  m    1/6   1/6   1/6   1/6   1/6   1/6
  f(x) = ∑_{i=1}^{6} (1/6) · δ(x − i)        F(x) = ∑_{i=1}^{6} (1/6) · H(x − i)
[Plots: f(x) as six impulses of weight 1/6 at x = 1, . . . , 6; F(x) as the corresponding staircase CDF]
21 / 27
Mixed Random Variables
X is a mixed RV iff its PDF has both impulses and nonzero, finite values.
Example:
Observe someone dialing a phone and record the duration of the call.
Your observation tells you the following:
I 1/3 of the calls are not answered (and thus last 0 minutes),
I the duration of answered calls is U(0, 3) in minutes.
Let X denote the call duration. Find the PDF, CDF, and the mean of X.
22 / 27
Random Processes
A random process is a collection of RVs Xt , t 2 T that
I have a common sample space ⌦
I are usually indexed by time t, t 2 T
Examples:
I stock price over some period of time:
- f(x_{t₁}, . . . , x_{tₙ}) = ∂ⁿ F(x_{t₁}, . . . , x_{tₙ}) / (∂x_{t₁} · · · ∂x_{tₙ})
24 / 27
Stationarity
i.e., the joint PDF (and thus CDF) does not change under shifts τ in time.
25 / 27
Autocorrelation and Cross-correlation
Autocorrelation:      R_X(s, t) = E[(X_t − µ_t)(X_s − µ_s)] / (σ_t σ_s)
Cross-correlation:    R_{XY}(s, t) = E[(X_t − µ_t)(Y_s − µ_s)] / (σ_t σ_s)
26 / 27
Reading Material
27 / 27
(IN)DEPENDENCE OF CONTINUOUS RVs
1 / 18
Continuous Random Variables
[Real line from −∞ to ∞ with points t, x, and x + δ marked]
2 / 18
Cumulative Distribution Function (CDF)
F_X(t) = P(X ≤ t)
Note that
  P_X((x, x + δ]) = F(x + δ) − F(x)
3 / 18
Probability Density Function (PDF)
The probability density function of a continuous real-valued RV X is
  f(x) = lim_{δ→0} P_X((x, x + δ]) / δ .
Therefore,
  f(x) = lim_{δ→0} (F(x + δ) − F(x)) / δ = d/dx F(x) = F′(x)
4 / 18
Probability Density Function (PDF) - Properties
- f(x) = d/dx F(x), and is not the probability that X takes the value x
- F(x) = ∫_{−∞}^{x} f(t) dt
- ∫_{−∞}^{∞} f(t) dt = 1
- P((a, b]) = ∫_{a}^{b} f(x) dx
- ∫_{E} f(x) dx = probability of event E
5 / 18
Expected Value and Variance
The variance σ² = V(X) of a real-valued RV X is defined by
  σ² = E[(X − µ)²] = ∫_{−∞}^{+∞} (x − µ)² f(x) dx
     = ∫_{−∞}^{+∞} (x² − 2µx + µ²) f(x) dx
     = ∫_{−∞}^{+∞} x² f(x) dx − 2µ ∫_{−∞}^{+∞} x f(x) dx + µ² ∫_{−∞}^{+∞} f(x) dx
     = E(X²) − 2µ · E(X) + µ² · 1
     = E(X²) − µ²
6 / 18
Uniform Random Variable X ⇠ Uniform(a, b)
The PDF:
  f(x) = 1/(b − a)   for a ≤ x ≤ b,
         0           for x < a or x > b
[Plot: f_X(x) equals 1/(b − a) on [a, b] and 0 elsewhere]
7 / 18
A Spinner - An Example of a Uniform RV
The PDF of X:
  f(x) = 1   for 0 ≤ x ≤ 1,
         0   otherwise
Point x is at distance x from 0.
8 / 18
Recall Rolling a Die
Recall:
The range ⌦ is the set of all possible outcomes of an experiment.
Elements of ⌦ are called outcomes, and its subsets are called events.
Example:
The range for the die-rolling experiment is ⌦ = {1, 2, 3, 4, 5, 6}. “The
number of dots that turned up is smaller than 4” is an event.
9 / 18
Re-Assessing Beliefs
Ω     1     2     3     4     5     6
µ    1/6   1/6   1/6   1/6   1/6   1/6
µ′   1/3   1/3   1/3    0     0     0
10 / 18
Continuous Conditional Probability
11 / 18
Continuous Conditional Probability – Example
Suppose we know the pointer is in the upper half of the circle – event E.
What is then the probability of event F that 1/6 ≤ x ≤ 1/3?
E = [0, 1/2], F = [1/6, 1/3], and F ∩ E = F.
Therefore,
  P(F|E) = P(F ∩ E) / P(E) = (1/6) / (1/2) = 1/3
12 / 18
Joint PDFs and CDFs of Multiple Continuous RVs
F(x₁, x₂, . . . , xₙ) = P(X₁ ≤ x₁, X₂ ≤ x₂, . . . , Xₙ ≤ xₙ)
- f(x₁, x₂, . . . , xₙ) = ∂ⁿ F(x₁, x₂, . . . , xₙ) / (∂x₁ ∂x₂ · · · ∂xₙ)
13 / 18
Joint and Marginal PDFs of Two Continuous RVs
14 / 18
Independent Continuous RVs
or equivalently
I Continuous RVs X1 , X2 , . . . , Xn with PDFs f1 (x), f2 (x), . . . , fn (x)
are mutually independent iff
f(x1 , x2 , . . . , xn ) = f1 (x1 )f2 (x2 ) · · · fn (xn )
for any choice of x1 , x2 , . . . , xn .
15 / 18
Covariance and Correlation
Definition:
The covariance of RVs X and Y is cov(X, Y) = E[(X − E[X])(Y − E[Y])],
and the correlation coefficient is ρ_{X,Y} = cov(X, Y) / √(V(X)V(Y))
16 / 18
Some Properties of Covariance
How much can one RV tell about the other? Covariance is an indicator.
  cov(X, Y) = E[(X − E[X])(Y − E[Y])]
1. cov(X, X) = V(X)
2. cov(X, Y) = cov(Y, X)
3. −1 ≤ ρ(X, Y) ≤ 1
4. V(X + Y) = V(X) + V(Y) + 2 cov(X, Y)
5. cov(X, Y) = E[X · Y] − E[X] · E[Y]
⇒ If two RVs are independent, they are also uncorrelated.
TRUE OR FALSE?
1. If X and Y are uncorrelated, then V(X + Y) = V(X) + V(Y) T
2. If X and Y are uncorrelated, then they are independent. F
17 / 18
Reading Material
18 / 18
SUMS & MIXES OF RVs, PROCESSES
1 / 18
Expectation of Sums and Products
- If X and Y are real-valued RVs and c is any constant, then
    E(X + Y) = E(X) + E(Y)   and   E(cX) = cE(X) .
- If, in addition, X and Y are independent, then
    E(XY) = E(X)E(Y)
2 / 18
Variance of Sums
V(cX) = c2 V(X) ,
V(X + c) = V(X) .
3 / 18
PMF of the Sum of Two Independent RVs
A die is rolled twice. Let X₁ and X₂ be the outcomes, and S₂ = X₁ + X₂.
Then X₁ and X₂ are iid (independent, identically distributed) with PMF m:
  Ω     1     2     3     4     5     6
  m    1/6   1/6   1/6   1/6   1/6   1/6
4 / 18
PMF of the Sum of Two Independent RVs
5 / 18
PDF of the Sum of Two Independent RVs
6 / 18
Sum of Two Independent Normal Random Variables
Let X ∼ (1/√(2π)) e^{−x²/2} and Y ∼ (1/√(2π)) e^{−y²/2} be independent RVs.
What can we say about Z = X + Y?
- Recall that (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} is the PDF
  of a Gaussian RV with mean µ and variance σ².
7 / 18
Sum of Two Independent Normal Random Variables
Let X ∼ (1/√(2π)) e^{−x²/2} and Y ∼ (1/√(2π)) e^{−y²/2} be independent RVs.
Find the PDF of Z = X + Y.
We have
  f_Z(z) = (f_X ∗ f_Y)(z)
         = (1/2π) ∫_{−∞}^{+∞} e^{−(z−y)²/2} e^{−y²/2} dy
         = (1/2π) e^{−z²/4} ∫_{−∞}^{+∞} e^{−(y−z/2)²} dy
         = (1/2π) e^{−z²/4} √π · (1/√π) ∫_{−∞}^{+∞} e^{−(y−z/2)²} dy      [the last integral equals 1]
         = (1/√(4π)) e^{−z²/4} = (1/√(2·2·π)) e^{−z²/(2·2)}
8 / 18
Dirac’s Delta Function
9 / 18
Mathematical Convenience and/or Physical Reality?
How precisely can we locate a particle? Nice video if you like physics.
10 / 18
The Step and the Impulse Function
[Plots: the unit step function H(x) and the unit impulse δ(x)]
11 / 18
Continuous Representation of Discrete RVs
Rolling a die with PMF m:
  Ω     1     2     3     4     5     6
  m    1/6   1/6   1/6   1/6   1/6   1/6
  f(x) = ∑_{i=1}^{6} (1/6) · δ(x − i)        F(x) = ∑_{i=1}^{6} (1/6) · H(x − i)
[Plots: f(x) as six impulses of weight 1/6 at x = 1, . . . , 6; F(x) as the corresponding staircase CDF]
12 / 18
Mixed Random Variables
X is a mixed RV iff its PDF has both impulses and nonzero, finite values.
Example:
Observe someone dialing a phone and record the duration of the call.
Your observation tells you the following:
I 1/3 of the calls are not answered (and thus last 0 minutes),
I the duration of answered calls is U(0, 3) in minutes.
Let X denote the call duration. Find the PDF, CDF, and the mean of X.
13 / 18
Random Processes
A random process is a collection of RVs Xt , t 2 T that
I have a common sample space ⌦
I are usually indexed by time t, t 2 T
Examples:
I stock price over some period of time:
- f(x_{t₁}, . . . , x_{tₙ}) = ∂ⁿ F(x_{t₁}, . . . , x_{tₙ}) / (∂x_{t₁} · · · ∂x_{tₙ})
15 / 18
Stationarity
i.e., joint PDF (and thus CDF) does not change by shifts s in time.
16 / 18
Autocorrelation and Cross-correlation
Autocorrelation:      R_X(s, t) = E[(X_t − µ_t)(X_s − µ_s)] / (σ_t σ_s)
Cross-correlation:    R_{XY}(s, t) = E[(X_t − µ_t)(Y_s − µ_s)] / (σ_t σ_s)
17 / 18
Reading Material
18 / 18
LIMIT DISTRIBUTIONS & TAIL INEQUALITIES
1 / 17
Fraction of Heads in Coin Tossing
Problem: A fair coin is tossed n times (e.g. 50, 100, 200)
1. What is the expected fraction of heads?
2. How likely is it that the fraction of heads deviates
from the expected by 0.1 or more?
[Plots: PMF of the fraction of heads for n = 50, 100, and 200; x-axis from 0 to 1, y-axis: probability]
2 / 17
Number of Heads in n = 50 Fair-Coin Tosses
P(fraction of heads deviates from 1/2 by 0.1 or more) =
  ∑_{k=0}^{20} C(50, k) (1/2)^k (1/2)^{50−k}  +  ∑_{k=30}^{50} C(50, k) (1/2)^k (1/2)^{50−k}
[Plot: PMF of the number of heads in 50 tosses; x-axis: number of heads (10–40), y-axis: probability]
3 / 17
A Tail Inequality
Markov’s inequality:
If X is a nonnegative random variable and a > 0, we have
  P(X ≥ a) ≤ E(X)/a
Example:
- The number of heads in n tosses is Xₙ ∼ B(n, 0.5) ⇒ E(X₅₀) = 25.
- By Markov’s inequality, we have
    P(X₅₀ ≥ 30) ≤ 25/30 = 5/6
Not a very impressive bound!
4 / 17
Measuring Deviation from the Mean
Chebyshev inequality:
For any RV X and any positive real number ε > 0, we have a bound on the
probability that X differs from E(X) by ε or more:
  P( |X − E(X)| ≥ ε ) ≤ V(X)/ε²
Example:
- The number of heads in n tosses is Xₙ ∼ B(n, 0.5) ⇒
    E(X₅₀) = 25 and V(X₅₀) = 12.5
- By the Chebyshev inequality, we have
    P( |X₅₀ − 25| ≥ 5 ) ≤ 12.5/5² = 0.5
Law of Large Numbers - Statement
The Weak Law of Large Numbers says that, for any ε > 0,
  P( |Sₙ/n − µ| > ε ) → 0   as n → ∞
7 / 17
Law of Large Numbers - Proof
8 / 17
The Central Limit Theorem (CLT) – Statement
Consider the RVs (1/√n)(Sₙ − nµ) for n = 1, 2, . . . ,
and note that their mean is 0 and their variance is σ².

CLT: The RVs (1/√n)(Sₙ − nµ) "converge in distribution" to N(0, σ²),
⇔
The RVs (1/√(nσ²))(Sₙ − nµ) "converge in distribution" to the standard normal:
  (Sₙ − nµ)/√(nσ²)  →d  N(0, 1) .
9 / 17
The Standardized Sum for Bernoulli Trials
  S*ₙ = (Sₙ − np)/√(npq) .
  P(S*ₙ ≤ x) → Φ(x)   as n → ∞
10 / 17
Binomial Distribution Approximations
11 / 17
Point Processes
12 / 17
Point Processes
13 / 17
Point Processes
[Timeline: arrival points marked on the interval from 0 to 60]
14 / 17
Poisson Arrivals
  → λ^k e^{−λ} / k!   as n → ∞
15 / 17
Inter-Arrival Times
The probability that the time between two arrivals is k · (1/n) is
  (1 − λ/n)^k · (λ/n)  ≈  λ e^{−λ k Δt} · Δt
where Δt = 1/n.
16 / 17
Reading Material
17 / 17
PROBABILITY IN ECE: TELECOMMUNICATIONS
1 / 13
Telecommunications
[Block diagram: X → transmitter → channel → Y → receiver → X̂]
2 / 13
Communications Channel
[Block diagram: X → transmitter → channel → Y → receiver → X̂]
The relation between the input and the output is described by, e.g.,
I the conditional PDF of the output given the input W(Y | X)
(we call W the transition probability)
I a noise RV Z added to the input s.t. Y = X + Z.
3 / 13
The Binary Symmetric Channel BSC(p)
Binary input and output:  Ω_X = Ω_Y = {0, 1}
  W(0 | 0) = W(1 | 1) = 1 − p
  W(1 | 0) = W(0 | 1) = p
[Diagram: X → Y with crossover probability p and correct-transmission probability 1 − p]
Equivalently, Y = X ⊕ Z (mod-2 addition), where Z ∼ Bernoulli(p)
4 / 13
Binary Erasure Channel BEC(✏)
Binary input and ternary output:  Ω_X = {0, 1}, Ω_Y = {0, 1, −}
  W(1 | 0) = W(0 | 1) = 0
  W(0 | 0) = W(1 | 1) = 1 − ε
  W(− | 0) = W(− | 1) = ε
[Diagram: each input is received correctly with probability 1 − ε and erased (output −) with probability ε]
5 / 13
Binary Input Additive Gaussian Noise Channel
−1 or 1 input and real-valued output:  Ω_X = {−1, 1}, Ω_Y = ℝ
  W(y | −1) = (1/√(2πσ²)) e^{−(y+1)²/(2σ²)}
  W(y | +1) = (1/√(2πσ²)) e^{−(y−1)²/(2σ²)}
[Plot: the two conditional densities W(y | x), centered at x = −1 and x = +1]
We can instead say Y = X + Z where Z ∼ N(0, σ²).
6 / 13
The Optimal Detector and its Error Rate
- We assume equal priors: P(X = −1) = P(X = 1) = 1/2.
- The detector decides X̂ = 1 if y > 0 and X̂ = −1 if y < 0.
[Plot: the two conditional densities W(y | x) and the decision threshold at y = 0]
Probability of error P_e is given by  P_e = (1/2) P(e | 1) + (1/2) P(e | −1).
Because of symmetry, we have P(e | 1) = P(e | −1).
7 / 13
The Optimal Detector and its Error Rate
We have
  P(Y > 0 | −1) = ∫₀^∞ (1/√(2πσ²)) e^{−(x+1)²/(2σ²)} dx        [set y = (x + 1)/σ]
               = ∫_{1/σ}^∞ (1/√(2π)) e^{−y²/2} dy
               = Q(1/σ)
where
  Q(x) = (1/2) erfc(x/√2) = ∫_x^∞ (1/√(2π)) e^{−y²/2} dy
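The resulting error rate P_e = Q(1/σ) is easy to evaluate through the complementary error function; a minimal sketch for a few illustrative noise levels:

    import math

    def Q(x):
        # Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))
        return 0.5 * math.erfc(x / math.sqrt(2))

    for sigma in (0.5, 1.0, 2.0):
        print(sigma, Q(1 / sigma))    # Pe grows as the noise standard deviation grows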
8 / 13
Introducing Redundancy (Dependence)
I repetition coding
message transmit
0 000
1 111
I parity-check coding
message transmit
00 000
01 011
10 101
11 110
9 / 13
Random Variables in a Repetition Code on the BSC
Given the channel output ȳ, we look for the most probable input,
namely, the one that maximizes P(X̄ = x̄ | Ȳ = ȳ):
10 / 13
BSC(p) with a Repetition Code
Transitions from 000 input: Transitions from 111 input:
11 / 13
BSC(p) with a Majority Vote Repetition Code
Transitions from 000 input:
  X̄      Ȳ      P(Ȳ | X̄)
  000    000    (1 − p)³
         100    (1 − p)² p
         010    (1 − p)² p
         001    (1 − p)² p
         110    (1 − p) p²
         011    (1 − p) p²
         101    (1 − p) p²
         111    p³
With majority-vote decoding, an error occurs for Ȳ ∈ {110, 011, 101, 111}, so
  P_e = 3(1 − p)p² + p³ .
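The same error probability, for a general odd repetition length n, is a binomial tail sum; a minimal sketch:

    from math import comb

    def pe_majority(p, n=3):
        # an error occurs when more than half of the n transmitted bits are flipped
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    for p in (0.1, 0.01):
        print(p, pe_majority(p))      # 0.028 and 0.000298: both smaller than the uncoded p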
12 / 13
Parity-Check Coding on BEC(✏)
Px
Parity-check coding:
message transmit
00 000
01 011
10 101
11 110
13 / 13
PROBABILITY IN ECE: DETECTION
1/9
Binary Hypothesis Testing – Example
A test:
I We give the aspirin to n people to take when they have a headache.
I We accept H1 if at least m people are cured.
How should we determine this critical value m? How does n matter?
2/9
Binary Hypothesis Testing – Example
Consider 50 trials with the rate of cure 60% under H0 and 80% under H1 :
What should m be?
[Plot: PMFs of the number of cured people out of 50 under H₀ (60% cure rate) and H₁ (80% cure rate); x-axis: # of cured people (0–50), y-axis: probability]
3/9
Binary Hypothesis Testing – Errors
If H₁ is true (cure rate p₁), we nevertheless decide H₀ (fewer than m people are cured)
w.p.  ∑_{k=0}^{m−1} C(n, k) p₁^k (1 − p₁)^{n−k}
4/9
Binary Hypothesis Testing – Errors
An error occurs if
1. the true hypothesis is H0 and we decide H1 , false alarm
or
2. the true hypothesis is H1 and we decide H0 . missed detection
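A minimal sketch of tabulating both error probabilities as a function of the critical value m for this example (n = 50, cure rates 60% under H₀ and 80% under H₁):

    from math import comb

    n, p0, p1 = 50, 0.6, 0.8

    def binom_pmf(p, k):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    for m in range(30, 41):
        false_alarm = sum(binom_pmf(p0, k) for k in range(m, n + 1))   # decide H1, H0 true
        missed      = sum(binom_pmf(p1, k) for k in range(0, m))       # decide H0, H1 true
        print(m, round(false_alarm, 4), round(missed, 4))

Choosing m trades one error probability against the other; a larger n separates the two PMFs and makes both errors small simultaneously.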
5/9
Denial-of-Service (DoS) Cyber Attack
I Goal: Make a network resource unavailable to its intended users.
I DoS is typically accomplished by overloading the targeted resource
(e.g., cloud computing server) with superfluous requests.
I Possible symptoms:
1) seeming unavailability of a web site
2) extremely slow file download
7/9
Binary Hypothesis Testing – Detection
Therefore, we can find PH1 |X (x) and PH0 |X (x) if we know the priors:
9/9
PROBABILITY IN ECE: MACHINE LEARNING
1 / 13
Bernoulli and Binomial RVs
2 / 13
Markov’s Inequality
  P(X ≥ a) ≤ E(X)/a
Example: Tossing a coin with P(H) = 0.25 n times.
- The number of heads in n tosses is Sₙ ∼ B(n, 0.25)
  ⇒ E(S₄₀₀) = 100.
- For a = 150, by Markov’s inequality, we have
    P(S₄₀₀ ≥ 150) ≤ 100/150 = 2/3
Not an impressive bound! In fact, P(S₄₀₀ ≥ 150) ≈ 2.18 · 10⁻⁸.
3 / 13
Chebyshev Inequality
For an RV X, we can apply Markov’s inequality to the RV Y = (X − E(X))²:
  P(Y ≥ a) ≤ E(Y)/a  ⇔  P((X − E(X))² ≥ a) ≤ V(X)/a
                      ⇔  P(|X − E(X)| ≥ √a) ≤ V(X)/a
By setting ε = √a, we get the Chebyshev Inequality:
For any RV X and any positive real number ε > 0, we have
  P( |X − E(X)| ≥ ε ) ≤ V(X)/ε²
Example: Tossing a coin with P(H) = 0.25 n times.
- The number of heads in n tosses is Sₙ ∼ B(n, 0.25) ⇒
    E(S₄₀₀) = 100 and V(S₄₀₀) = 75
- By the Chebyshev inequality, we have
    P( |S₄₀₀ − 100| ≥ 50 ) ≤ 75/50² = 0.03
4 / 13
The Standardized Sum for Bernoulli Trials
  S*ₙ = (Sₙ − np)/√(npq) .
  P(S*ₙ ≤ x) → Φ(x)   as n → ∞
In our example, we have p = 0.25, and are interested in P(S400 > 150)
5 / 13
Estimating the Sn Tail Probability by CLT
Note that
  S*ₙ = (Sₙ − np)/√(npq)   ⇒   Sₙ ≥ a  ⇔  S*ₙ ≥ (a − np)/√(npq)
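Putting the estimates side by side for S₄₀₀ with p = 0.25 and a = 150; a minimal sketch:

    import math

    n, p, a = 400, 0.25, 150
    mean, var = n * p, n * p * (1 - p)                 # 100 and 75

    markov    = mean / a                               # bound on P(S >= 150)
    chebyshev = var / (a - mean) ** 2                  # bound on P(|S - 100| >= 50)
    clt       = 0.5 * math.erfc(((a - mean) / math.sqrt(var)) / math.sqrt(2))   # CLT estimate of the tail

    print(markov, chebyshev, clt)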
6 / 13
Review Tree Diagrams – Jan. 29 Lecture
Tree diagrams are used to study experiments that take place in stages,
e.g., ordering food in restaurants (appetizer, main dish, dessert):
[Tree: appetizer {soup, juice} → main dish {meat, fish, vegetable} → dessert {ice cream, cake}]
How many possible choices are there for the complete meal?
7 / 13
Tree Diagrams – Total Probability and Bayes Rule
[Tree with branch probabilities for appetizer and main dish; resulting outcome probabilities m(ω₁) = .4, m(ω₂) = .24, m(ω₃) = .16, m(ω₄) = .06, . . . , m(ω₆) = .06]
Suppose you can go to your current favorite restaurant or try a new one.
What would you do each evening for dinner over a month?
9 / 13
An m-Coin Bandit Problem
10 / 13
The m-Armed Bandit Problem
Ingredients:
- A – known set of m possible actions (e.g., select & toss a coin)
- R – known set of possible rewards (e.g., get a dollar or not)
- P[r|a] – unknown PDFs of rewards r ∈ R given an action a ∈ A.
Dynamics:
At each step s
- the agent selects an action a_s ∈ A
- the environment generates a reward r_s ∈ R w.p. P[r_s | a_s]
The goal is to maximize cumulative reward.
11 / 13
A 2-Coin Bandit Problem
12 / 13
A 2-Coin Bandit Problem
No Exploration Algorithm:
Pick a coin at random and toss it T times. The expected reward is
  V₁ = (1/2) T p₁ + (1/2) T p₂ .
The regret is (1/2) T (p₁ − p₂), and is linear in T.
Explore-First Algorithm:
1. Exploration phase: toss each coin n < T times.
2. Select the coin with the highest average reward.
3. Exploitation phase: toss the selected coin in all remaining rounds.
What else?
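A minimal simulation sketch of the Explore-First algorithm on a 2-coin bandit; the head probabilities, horizon T, and exploration length are illustrative assumptions:

    import random

    def explore_first(p, T=10_000, n_explore=100):
        """p = (p1, p2): unknown head probabilities of the two coins."""
        reward, heads = 0, [0, 0]
        for coin in (0, 1):                      # exploration: toss each coin n_explore times
            for _ in range(n_explore):
                r = 1 if random.random() < p[coin] else 0
                heads[coin] += r
                reward += r
        best = 0 if heads[0] >= heads[1] else 1  # pick the coin with the higher average reward
        for _ in range(T - 2 * n_explore):       # exploitation: toss only the selected coin
            reward += 1 if random.random() < p[best] else 0
        return reward

    print(explore_first((0.6, 0.4)), "vs. ideal", int(0.6 * 10_000))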
13 / 13
BAYES RULE & TOTAL PROBABILITY
1/7
Tree Diagrams – Jan. 29 Lecture
Tree diagrams are used to study experiments that take place in stages,
e.g., ordering food in restaurants (appetizer, main dish, dessert):
[Tree: appetizer {soup, juice} → main dish {meat, fish, vegetable} → dessert {ice cream, cake}]
How many possible choices are there for the complete meal?
2/7
Tree Diagrams – Total Probability and Bayes Rule
[Tree with branch probabilities for appetizer and main dish; resulting outcome probabilities m(ω₁) = .4, m(ω₂) = .24, m(ω₃) = .16, m(ω₄) = .06, . . . , m(ω₆) = .06]
4/7
Some Example Questions
5/7
Some Example Questions
1. What is the probability that a user requests an audio file?
We can use the Total Probability Theorem across user types:
7/7