
Lecture Notes 1

Probability and Random Variables

• Probability Spaces

• Conditional Probability and Independence

• Random Variables

• Functions of a Random Variable

• Generation of a Random Variable

• Jointly Distributed Random Variables

• Scalar Detection

EE 278: Probability and Random Variables Page 1 – 1


Probability Theory

• Probability theory provides the mathematical rules for assigning probabilities to


outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage
• Basic elements of probability theory:
◦ Sample space Ω: set of all possible “elementary” or “finest grain” outcomes
of the random experiment
◦ Set of events F : set of (all?) subsets of Ω — an event A ⊂ Ω occurs if the
outcome ω ∈ A
◦ Probability measure P: function over F that assigns probabilities to events
according to the axioms of probability (see below)
• Formally, a probability space is the triple (Ω, F, P)

EE 278: Probability and Random Variables Page 1 – 2


Axioms of Probability

• A probability measure P satisfies the following axioms:


1. P(A) ≥ 0 for every event A in F
2. P(Ω) = 1
3. If A1, A2, . . . are disjoint events, i.e., Ai ∩ Aj = ∅ for all i ≠ j, then
   P(⋃_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} P(Ai)
• Notes:
◦ P is a measure in the same sense as mass, length, area, and volume — all
satisfy axioms 1 and 3
◦ Unlike these other measures, P is bounded by 1 (axiom 2)
◦ This analogy provides some intuition but is not sufficient to fully understand
probability theory — other aspects such as conditioning and independence are
unique to probability

EE 278: Probability and Random Variables Page 1 – 3


Discrete Probability Spaces

• A sample space Ω is said to be discrete if it is countable


• Examples:
◦ Rolling a die: Ω = {1, 2, 3, 4, 5, 6}
◦ Flipping a coin n times: Ω = {H, T}^n , sequences of heads/tails of length n
◦ Flipping a coin until the first heads occurs: Ω = {H, T H, T T H, T T T H, . . .}
• For discrete sample spaces, the set of events F can be taken to be the set of all
subsets of Ω, sometimes called the power set of Ω
• Example: For a single coin flip, Ω = {H, T} and
  F = {∅, {H}, {T}, Ω}

• F does not have to be the entire power set (more on this later)

EE 278: Probability and Random Variables Page 1 – 4


• The probability measure P can be defined by assigning probabilities to individual
outcomes (single outcome events {ω}) so that:
  P({ω}) ≥ 0 for every ω ∈ Ω
  ∑_{ω∈Ω} P({ω}) = 1

• The probability of any other event A is simply

  P(A) = ∑_{ω∈A} P({ω})

• Example: For the die rolling experiment, assign

  P({i}) = 1/6 for i = 1, 2, . . . , 6

  The probability of the event “the outcome is even,” A = {2, 4, 6}, is
  P(A) = P({2}) + P({4}) + P({6}) = 3/6 = 1/2
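  The same computation can be done numerically; here is a minimal Python sketch (not from the notes) that sums a pmf over the outcomes in an event:

  # Minimal sketch: P(A) = sum of P({w}) over outcomes w in A, for a fair die
  pmf = {i: 1/6 for i in range(1, 7)}          # P({i}) = 1/6, i = 1, ..., 6

  assert abs(sum(pmf.values()) - 1.0) < 1e-12  # the pmf sums to 1

  A = {2, 4, 6}                                # event "the outcome is even"
  P_A = sum(pmf[w] for w in A)
  print(P_A)                                   # 0.5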

EE 278: Probability and Random Variables Page 1 – 5


Continuous Probability Spaces

• A continuous sample space Ω has an uncountable number of elements


• Examples:
◦ Random number between 0 and 1: Ω = (0, 1 ]
◦ Point in the unit disk: Ω = {(x, y) : x^2 + y^2 ≤ 1}
◦ Arrival times of n packets: Ω = (0, ∞)^n
• For continuous Ω, we cannot in general define the probability measure P by first
assigning probabilities to outcomes
• To see why, consider assigning a uniform probability measure over (0, 1 ]
◦ In this case the probability of each single outcome event is zero
◦ How do we find the probability of an event such as A = [0.25, 0.75]?

EE 278: Probability and Random Variables Page 1 – 6


• Another difference for continuous Ω: we cannot take the set of events F as the
power set of Ω. (To learn why, you need to study measure theory, which is
beyond the scope of this course)
• The set of events F cannot be an arbitrary collection of subsets of Ω. It must
make sense, e.g., if A is an event, then its complement A^c must also be an
event, the union of two events must be an event, and so on
• Formally, F must be a sigma algebra (σ-algebra, σ-field), which satisfies the
following axioms:
1. ∅ ∈ F
2. If A ∈ F then A^c ∈ F
3. If A1, A2, . . . ∈ F then ⋃_{i=1}^{∞} Ai ∈ F
• Of course, the power set is a sigma algebra. But we can define smaller
σ-algebras. For example, for rolling a die, we could define the set of events as
F = {∅, odd, even, Ω}

EE 278: Probability and Random Variables Page 1 – 7


• For Ω = R = (−∞, ∞) (or (0, ∞), (0, 1), etc.) F is typically defined as the
family of sets obtained by starting from the intervals and taking countable
unions, intersections, and complements
• The resulting F is called the Borel field
• Note: Amazingly, there are subsets of R that cannot be generated in this way!
(Not ones that you are likely to encounter in your life as an engineer or even as
a mathematician)
• To define a probability measure over a Borel field, we first assign probabilities to
the intervals in a consistent way, i.e., in a way that satisfies the axioms of
probability
For example, to define a uniform probability measure over (0, 1), we first assign
P((a, b)) = b − a to all intervals (a, b) ⊆ (0, 1)
• In EE 278 we do not deal with sigma fields or the Borel field beyond (kind of)
knowing what they are

EE 278: Probability and Random Variables Page 1 – 8


Useful Probability Laws

• Union of Events Bound:

  P(⋃_{i=1}^{n} Ai) ≤ ∑_{i=1}^{n} P(Ai)

• Law of Total Probability: Let A1, A2, A3, . . . be events that partition Ω, i.e.,
  disjoint (Ai ∩ Aj = ∅ for i ≠ j) and ⋃_i Ai = Ω. Then for any event B

  P(B) = ∑_i P(Ai ∩ B)
The Law of Total Probability is very useful for finding probabilities of sets

EE 278: Probability and Random Variables Page 1 – 9


Conditional Probability

• Let B be an event such that P(B) ≠ 0. The conditional probability of event A
  given B is defined to be

  P(A | B) = P(A ∩ B)/P(B) = P(A, B)/P(B)

• The function P(· | B) is a probability measure over F , i.e., it satisfies the


axioms of probability
• Chain rule: P(A, B) = P(A)P(B | A) = P(B)P(A | B) (this can be generalized
to n events)
• The probability of event A given B, a nonzero probability event (the
  a posteriori probability of A), is related to the unconditional probability of
  A (the a priori probability) by

  P(A | B) = P(A) P(B | A)/P(B)
This follows directly from the definition of conditional probability

EE 278: Probability and Random Variables Page 1 – 10


Bayes Rule

• Let A1, A2, . . . , An be nonzero probability events that partition Ω, and let B be
a nonzero probability event
• We know P(Ai ) and P(B | Ai), i = 1, 2, . . . , n, and want to find the a posteriori
probabilities P(Aj | B), j = 1, 2, . . . , n
• We know that

  P(Aj | B) = P(Aj) P(B | Aj)/P(B)

• By the law of total probability

  P(B) = ∑_{i=1}^{n} P(Ai, B) = ∑_{i=1}^{n} P(Ai)P(B | Ai)

• Substituting, we obtain Bayes rule

  P(Aj | B) = P(B | Aj) P(Aj) / ∑_{i=1}^{n} P(Ai)P(B | Ai) ,   j = 1, 2, . . . , n

• Bayes rule also applies to a (countably) infinite number of events
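
As an illustration of the rule (not from the notes), here is a small Python sketch with a hypothetical three-event partition; the priors P(Ai) and likelihoods P(B | Ai) below are made-up numbers:

  # Hypothetical priors P(Ai) and likelihoods P(B | Ai) for a partition A1, A2, A3
  prior = [0.5, 0.3, 0.2]          # P(A1), P(A2), P(A3); must sum to 1
  lik   = [0.9, 0.5, 0.1]          # P(B | A1), P(B | A2), P(B | A3)

  # Law of total probability: P(B) = sum_i P(Ai) P(B | Ai)
  p_B = sum(p * l for p, l in zip(prior, lik))

  # Bayes rule: P(Aj | B) = P(B | Aj) P(Aj) / P(B)
  posterior = [l * p / p_B for p, l in zip(prior, lik)]
  print(p_B)          # 0.62
  print(posterior)    # a posteriori probabilities; they sum to 1

The posterior weights shift toward the events whose likelihood P(B | Ai) is large, which is the point of the rule.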

EE 278: Probability and Random Variables Page 1 – 11


Independence

• Two events are said to be statistically independent if


P(A, B) = P(A)P(B)

• When P(B) ≠ 0, this is equivalent to


P(A | B) = P(A)
In other words, knowing whether B occurs does not change the probability of A
• The events A1, A2, . . . , An are said to be independent if for every subset
  Ai1, Ai2, . . . , Aik of the events,

  P(Ai1, Ai2, . . . , Aik) = ∏_{j=1}^{k} P(Aij)

• Note: P(A1, A2, . . . , An) = ∏_{j=1}^{n} P(Aj) alone is not sufficient for independence

EE 278: Probability and Random Variables Page 1 – 12


Random Variables

• A random variable (r.v.) is a real-valued function X(ω) over a sample space Ω,


i.e., X : Ω → R


• Notations:
◦ We use upper case letters for random variables: X, Y, Z, Φ, Θ, . . .
◦ We use lower case letters for values of random variables: X = x means that
random variable X takes on the value x, i.e., X(ω) = x where ω is the
outcome

EE 278: Probability and Random Variables Page 1 – 13


Specifying a Random Variable

• Specifying a random variable means being able to determine the probability that
X ∈ A for any Borel set A ⊂ R, in particular, for any interval (a, b ]
• To do so, consider the inverse image of A under X , i.e., {ω : X(ω) ∈ A}


• Since X ∈ A iff ω ∈ {ω : X(ω) ∈ A},


P({X ∈ A}) = P({ω : X(ω) ∈ A}) = P{ω : X(ω) ∈ A}
Shorthand: P({set description}) = P{set description}

EE 278: Probability and Random Variables Page 1 – 14


Cumulative Distribution Function (CDF)
• We need to be able to determine P{X ∈ A} for any Borel set A ⊂ R, i.e., any
set generated by starting from intervals and taking countable unions,
intersections, and complements
• Hence, it suffices to specify P{X ∈ (a, b ]} for all intervals. The probability of
any other Borel set can be determined by the axioms of probability
• Equivalently, it suffices to specify its cumulative distribution function (cdf):
FX (x) = P{X ≤ x} = P{X ∈ (−∞, x ]} , x∈R
• Properties of cdf:
◦ FX (x) ≥ 0
◦ FX (x) is monotonically nondecreasing, i.e., if a > b then FX (a) ≥ FX (b)

EE 278: Probability and Random Variables Page 1 – 15


◦ Limits: lim_{x→+∞} FX(x) = 1 and lim_{x→−∞} FX(x) = 0

◦ FX(x) is right continuous, i.e., FX(a+) = lim_{x→a+} FX(x) = FX(a)

◦ P{X = a} = FX(a) − FX(a−), where FX(a−) = lim_{x→a−} FX(x)
◦ For any Borel set A, P{X ∈ A} can be determined from FX (x)
• Notation: X ∼ FX (x) means that X has cdf FX (x)

EE 278: Probability and Random Variables Page 1 – 16


Probability Mass Function (PMF)

• A random variable is said to be discrete if FX (x) consists only of steps over a


countable set X

• Hence, a discrete random variable can be completely specified by the probability


mass function (pmf)
pX(x) = P{X = x} for every x ∈ X

Clearly pX(x) ≥ 0 and ∑_{x∈X} pX(x) = 1
• Notation: We use X ∼ pX (x) or simply X ∼ p(x) to mean that the discrete
random variable X has pmf pX (x) or p(x)

EE 278: Probability and Random Variables Page 1 – 17


• Famous discrete random variables:
◦ Bernoulli: X ∼ Bern(p) for 0 ≤ p ≤ 1 has the pmf
pX (1) = p and pX (0) = 1 − p

◦ Geometric: X ∼ Geom(p) for 0 ≤ p ≤ 1 has the pmf


pX(k) = p(1 − p)^{k−1} , k = 1, 2, 3, . . .

◦ Binomial: X ∼ Binom(n, p) for integer n > 0 and 0 ≤ p ≤ 1 has the pmf

  pX(k) = (n choose k) p^k (1 − p)^{n−k} , k = 0, 1, . . . , n
◦ Poisson: X ∼ Poisson(λ) for λ > 0 has the pmf

  pX(k) = (λ^k/k!) e^{−λ} , k = 0, 1, 2, . . .

◦ Remark: Poisson is the limit of Binomial for np = λ as n → ∞, i.e., for every
  k = 0, 1, 2, . . ., the Binom(n, λ/n) pmf

  pX(k) → (λ^k/k!) e^{−λ} as n → ∞
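  This limit is easy to check numerically. A minimal Python sketch (illustration only; the choice λ = 2 is arbitrary) compares the Binom(n, λ/n) pmf with the Poisson(λ) pmf as n grows:

  from math import comb, exp, factorial

  lam = 2.0   # illustrative value of lambda

  def binom_pmf(k, n, p):
      return comb(n, k) * p**k * (1 - p)**(n - k)

  def poisson_pmf(k, lam):
      return lam**k / factorial(k) * exp(-lam)

  for n in (10, 100, 1000):
      p = lam / n
      # compare the two pmfs at k = 0, 1, ..., 5
      print(n, [round(binom_pmf(k, n, p), 4) for k in range(6)])
  print("Poisson", [round(poisson_pmf(k, lam), 4) for k in range(6)])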

EE 278: Probability and Random Variables Page 1 – 18


Probability Density Function (PDF)

• A random variable is said to be continuous if its cdf is a continuous function

• If FX (x) is continuous and differentiable (except possibly over a countable set),


then X can be completely specified by a probability density function (pdf)
fX(x) such that

  FX(x) = ∫_{−∞}^{x} fX(u) du

• If FX(x) is differentiable everywhere, then (by definition of derivative)

  fX(x) = dFX(x)/dx
        = lim_{∆x→0} [FX(x + ∆x) − FX(x)]/∆x = lim_{∆x→0} P{x < X ≤ x + ∆x}/∆x

EE 278: Probability and Random Variables Page 1 – 19


• Properties of pdf:
  ◦ fX(x) ≥ 0

  ◦ ∫_{−∞}^{∞} fX(x) dx = 1

  ◦ For any event (Borel set) A ⊂ R,

    P{X ∈ A} = ∫_{x∈A} fX(x) dx

    In particular, P{x1 < X ≤ x2} = ∫_{x1}^{x2} fX(x) dx

• Important note: fX (x) should not be interpreted as the probability that X = x.


In fact, fX (x) is not a probability measure since it can be > 1
• Notation: X ∼ fX (x) means that X has pdf fX (x)

EE 278: Probability and Random Variables Page 1 – 20


• Famous continuous random variables:
◦ Uniform: X ∼ U[a, b] where a < b has pdf

  fX(x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise

◦ Exponential: X ∼ Exp(λ) where λ > 0 has pdf

  fX(x) = λe^{−λx} if x ≥ 0, and 0 otherwise

◦ Laplace: X ∼ Laplace(λ) where λ > 0 has pdf

  fX(x) = (λ/2) e^{−λ|x|}

◦ Gaussian: X ∼ N(µ, σ^2) with parameters µ (the mean) and σ^2 (the
  variance; σ is the standard deviation) has pdf

  fX(x) = (1/√(2πσ^2)) e^{−(x−µ)^2/(2σ^2)}

EE 278: Probability and Random Variables Page 1 – 21


The cdf of the standard normal random variable N(0, 1) is

  Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−u^2/2} du

Define the function Q(x) = 1 − Φ(x) = P{X > x}, the area under the N(0, 1) pdf
to the right of x

The Q(·) function is used to compute the tail probability P{Y > y} for any Gaussian r.v. Y:
Given Y ∼ N(µ, σ^2), we represent it using the standard X ∼ N(0, 1) as
  Y = σX + µ
Then
  P{Y > y} = P{X > (y − µ)/σ} = Q((y − µ)/σ)

◦ The complementary error function is erfc(x) = 2Q(√2 x)
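
  In code, Q(x) is usually computed from the complementary error function via Q(x) = erfc(x/√2)/2, which is the relation above rearranged. A minimal Python sketch (illustration only):

  from math import erfc, sqrt

  def Q(x):
      # Gaussian tail probability Q(x) = P{X > x} for X ~ N(0, 1)
      return 0.5 * erfc(x / sqrt(2))

  def gaussian_tail(y, mu, sigma):
      # P{Y > y} for Y ~ N(mu, sigma^2), via standardization
      return Q((y - mu) / sigma)

  print(Q(0.0))                        # 0.5
  print(gaussian_tail(3.0, 1.0, 2.0))  # P{Y > 3} for Y ~ N(1, 4), i.e., Q(1)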

EE 278: Probability and Random Variables Page 1 – 22


Functions of a Random Variable

• Suppose we are given a r.v. X with known cdf FX (x) and a function y = g(x).
What is the cdf of the random variable Y = g(X)?
• We use
FY (y) = P{Y ≤ y} = P{x : g(x) ≤ y}


EE 278: Probability and Random Variables Page 1 – 23


• Example: Square law detector. Let X ∼ FX (x) and Y = X 2 . We wish to find
FY (y)

If y < 0, then clearly FY(y) = 0. Consider y ≥ 0:

  FY(y) = P{−√y < X ≤ √y} = FX(√y) − FX(−√y)

If X is continuous with density fX(x), then

  fY(y) = (1/(2√y)) [fX(+√y) + fX(−√y)]

EE 278: Probability and Random Variables Page 1 – 24


• Remark: In general, let X ∼ fX(x) and Y = g(X) with g differentiable. Then

  fY(y) = ∑_{i=1}^{k} fX(xi)/|g′(xi)| ,

  where x1, x2, . . . , xk are the solutions of the equation y = g(x) and g′(xi) is the
  derivative of g evaluated at xi
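
  As a numerical sanity check (not part of the notes), the following Python sketch applies this formula to the square-law example Y = X^2 with X ∼ N(0, 1) and compares it against a Monte Carlo estimate:

  import math, random

  def f_X(x):
      # standard normal pdf
      return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

  def f_Y(y):
      # density of Y = X^2 from the change-of-variables formula (y > 0)
      return (f_X(math.sqrt(y)) + f_X(-math.sqrt(y))) / (2 * math.sqrt(y))

  random.seed(0)
  samples = [random.gauss(0, 1) ** 2 for _ in range(200_000)]

  # compare empirical density over small bins with the formula
  for y in (0.5, 1.0, 2.0):
      dy = 0.05
      emp = sum(y <= s < y + dy for s in samples) / len(samples) / dy
      print(f"y={y}: empirical {emp:.3f}, formula {f_Y(y + dy/2):.3f}")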

EE 278: Probability and Random Variables Page 1 – 25


• Example: Limiter. Let X ∼ Laplace(1), i.e., fX (x) = (1/2)e−|x| , and let Y be
defined by the function of X shown in the figure. Find the cdf of Y
  (Figure: the limiter characteristic y = g(x), with g(x) = −a for x ≤ −1,
  g(x) = ax for −1 < x < 1, and g(x) = +a for x ≥ 1)

To find the cdf of Y , we consider the following cases


◦ y < −a: Here clearly FY(y) = 0
◦ y = −a: Here

  FY(−a) = FX(−1) = ∫_{−∞}^{−1} (1/2) e^{x} dx = (1/2) e^{−1}

EE 278: Probability and Random Variables Page 1 – 26


◦ −a < y < a: Here
  FY(y) = P{Y ≤ y}
        = P{aX ≤ y}
        = P{X ≤ y/a} = FX(y/a)
        = (1/2) e^{−1} + ∫_{−1}^{y/a} (1/2) e^{−|x|} dx

◦ y ≥ a: Here FY(y) = 1
Combining the results gives the complete cdf of Y (a sketch would show jumps at
y = −a and y = +a, with a continuous increase in between)

EE 278: Probability and Random Variables Page 1 – 27


Generation of Random Variables

• Generating a r.v. with a prescribed distribution is often needed for performing


simulations involving random phenomena, e.g., noise or random arrivals
• First let X ∼ F (x) where the cdf F (x) is continuous and strictly increasing.
Define Y = F (X), a real-valued random variable that is a function of X
What is the cdf of Y ?
Clearly, FY (y) = 0 for y < 0, and FY (y) = 1 for y > 1
For 0 ≤ y ≤ 1, note that by assumption F has an inverse F^{−1}, so
FY(y) = P{Y ≤ y} = P{F(X) ≤ y} = P{X ≤ F^{−1}(y)} = F(F^{−1}(y)) = y
Thus Y ∼ U[0, 1], i.e., Y is a uniformly distributed random variable
• Note: F (x) does not need to be invertible. If F (x) = a is constant over some
interval, then the probability that X lies in this interval is zero. Without loss of
generality, we can take F^{−1}(a) to be the leftmost point of the interval
• Conclusion: We can generate a U[ 0, 1 ] r.v. from any continuous r.v.

EE 278: Probability and Random Variables Page 1 – 28


• Now, let’s consider the opposite scenario where we are given X ∼ U[ 0, 1 ] (a
random number generator) and wish to generate a random variable Y with
prescribed cdf F (y), e.g., Gaussian or exponential
  (Figure: the cdf x = F(y) and its inverse y = F^{−1}(x))

• If F is continuous and strictly increasing, set Y = F^{−1}(X). To show Y ∼ F(y):

  FY(y) = P{Y ≤ y}
        = P{F^{−1}(X) ≤ y}
        = P{X ≤ F(y)}
        = F(y) ,

  since X ∼ U[0, 1] and 0 ≤ F(y) ≤ 1

EE 278: Probability and Random Variables Page 1 – 29


• Example: To generate Y ∼ Exp(λ), set

  Y = −(1/λ) ln(1 − X)

• Note: F does not need to be continuous for the above to work. For example, to
  generate Y ∼ Bern(p), we set

  Y = 0 if X ≤ 1 − p, and Y = 1 otherwise


• Conclusion: We can generate a r.v. with any desired distribution from a U[0, 1] r.v.
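
A minimal Python sketch of both recipes (illustration only), using random.random() as the U[0, 1] source; the parameter values λ = 1.5 and p = 0.3 are arbitrary:

  import math, random

  random.seed(1)

  def exp_rv(lam):
      # inverse-transform: Y = -(1/lam) ln(1 - X), X ~ U[0, 1]
      return -math.log(1 - random.random()) / lam

  def bern_rv(p):
      # Y = 0 if X <= 1 - p, else 1
      return 0 if random.random() <= 1 - p else 1

  lam, p = 1.5, 0.3
  exp_samples = [exp_rv(lam) for _ in range(100_000)]
  bern_samples = [bern_rv(p) for _ in range(100_000)]
  print(sum(exp_samples) / len(exp_samples))    # close to 1/lam = 0.667
  print(sum(bern_samples) / len(bern_samples))  # close to p = 0.3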

EE 278: Probability and Random Variables Page 1 – 30


Jointly Distributed Random Variables

• A pair of random variables X and Y defined over the same probability space are
specified by their joint cdf
FX,Y (x, y) = P{X ≤ x, Y ≤ y} , x, y ∈ R
FX,Y(x, y) is the probability of the region {(u, v) ∈ R^2 : u ≤ x, v ≤ y}, i.e., the
quadrant below and to the left of the point (x, y)

EE 278: Probability and Random Variables Page 1 – 31


• Properties of the cdf:
◦ FX,Y (x, y) ≥ 0

◦ If x1 ≤ x2 and y1 ≤ y2 then FX,Y (x1, y1) ≤ FX,Y (x2, y2)

◦ lim_{y→−∞} FX,Y(x, y) = 0 and lim_{x→−∞} FX,Y(x, y) = 0

◦ lim_{y→∞} FX,Y(x, y) = FX(x) and lim_{x→∞} FX,Y(x, y) = FY(y)

  FX(x) and FY(y) are the marginal cdfs of X and Y

◦ lim_{x,y→∞} FX,Y(x, y) = 1

• X and Y are independent if for every x and y


FX,Y (x, y) = FX (x)FY (y)

EE 278: Probability and Random Variables Page 1 – 32


Joint, Marginal, and Conditional PMFs

• Let X and Y be discrete random variables on the same probability space


• They are completely specified by their joint pmf:

  pX,Y(x, y) = P{X = x, Y = y} , x ∈ X , y ∈ Y

  By the axioms of probability, ∑_{x∈X} ∑_{y∈Y} pX,Y(x, y) = 1

• To find pX(x), the marginal pmf of X, we use the law of total probability

  pX(x) = ∑_{y∈Y} pX,Y(x, y) , x ∈ X

• The conditional pmf of X given Y = y is defined as

  pX|Y(x|y) = pX,Y(x, y)/pY(y) , pY(y) ≠ 0, x ∈ X

• Chain rule: pX,Y (x, y) = pX (x)pY |X (y|x) = pY (y)pX|Y (x|y)

EE 278: Probability and Random Variables Page 1 – 33


• Independence: X and Y are said to be independent if for every (x, y) ∈ X × Y,

  pX,Y(x, y) = pX(x)pY(y) ,

  which is equivalent to pX|Y(x|y) = pX(x) for every x ∈ X and y ∈ Y such
  that pY(y) ≠ 0
• Example (Binary Symmetric Channel): The input X ∼ Bern(p) and the noise
  Z ∼ Bern(ε) are independent, and the channel output is Y = X ⊕ Z
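
  As an illustration (not from the notes), the joint and marginal pmfs induced by this channel can be computed directly: since X and Z are independent and Y = X ⊕ Z, the joint pmf factors as pX,Y(x, y) = pX(x) pZ(x ⊕ y). A Python sketch with arbitrary values p = 0.4 and ε = 0.1:

  p, eps = 0.4, 0.1   # illustrative input bias and crossover probability

  p_X = {0: 1 - p, 1: p}
  p_Z = {0: 1 - eps, 1: eps}

  # joint pmf: p_{X,Y}(x, y) = p_X(x) * p_Z(x XOR y), since Y = X xor Z
  p_XY = {(x, y): p_X[x] * p_Z[x ^ y] for x in (0, 1) for y in (0, 1)}

  # marginal pmf of Y by the law of total probability
  p_Y = {y: sum(p_XY[(x, y)] for x in (0, 1)) for y in (0, 1)}

  print(p_XY)
  print(p_Y)   # P{Y = 1} = p(1 - eps) + (1 - p) eps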

EE 278: Probability and Random Variables Page 1 – 34


Joint, Marginal, and Conditional PDF

• X and Y are jointly continuous random variables if their joint cdf is continuous
in both x and y
In this case, we can define their joint pdf, provided that it exists, as the function
fX,Y(x, y) such that

  FX,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(u, v) dv du , x, y ∈ R

• If FX,Y(x, y) is differentiable in x and y, then

  fX,Y(x, y) = ∂^2 FX,Y(x, y)/∂x∂y = lim_{∆x,∆y→0} P{x < X ≤ x + ∆x, y < Y ≤ y + ∆y}/(∆x∆y)
• Properties of fX,Y(x, y):

  ◦ fX,Y(x, y) ≥ 0
  ◦ ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1

EE 278: Probability and Random Variables Page 1 – 35


• The marginal pdf of X can be obtained from the joint pdf via the law of total
  probability:

  fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy

• X and Y are independent iff fX,Y(x, y) = fX(x)fY(y) for every x, y
• Conditional cdf and pdf: Let X and Y be continuous random variables with
  joint pdf fX,Y(x, y). We wish to define FY|X(y | X = x) = P{Y ≤ y | X = x}
  We cannot define this conditional probability as

  P{Y ≤ y, X = x}/P{X = x} ,

  because both numerator and denominator are equal to zero. Instead, we define
  conditional probability for continuous random variables as a limit:

  FY|X(y|x) = lim_{∆x→0} P{Y ≤ y | x < X ≤ x + ∆x}
            = lim_{∆x→0} P{Y ≤ y, x < X ≤ x + ∆x}/P{x < X ≤ x + ∆x}
            = lim_{∆x→0} [∫_{−∞}^{y} fX,Y(x, u) du · ∆x]/[fX(x) ∆x] = ∫_{−∞}^{y} [fX,Y(x, u)/fX(x)] du

EE 278: Probability and Random Variables Page 1 – 36


• We then define the conditional pdf in the usual way as

  fY|X(y|x) = fX,Y(x, y)/fX(x) if fX(x) ≠ 0

• Thus

  FY|X(y|x) = ∫_{−∞}^{y} fY|X(u|x) du ,

  which shows that fY|X(y|x) is a pdf for Y given X = x, i.e.,
  Y | {X = x} ∼ fY|X(y|x)

• Chain rule: fX,Y (x, y) = fX (x)fY |X (y|x)


• Independence: X and Y are independent if fX,Y (x, y) = fX (x)fY (y) for every
(x, y), or equivalently, fY |X (y|x) = fY (y)
• Example (Additive Gaussian channel): The input X ∼ N(0, P) and the noise
  Z ∼ N(0, N) are independent, and the channel output is Y = X + Z

EE 278: Probability and Random Variables Page 1 – 37


One Discrete and One Continuous Random Variables

• Let Θ be a discrete random variable with pmf pΘ(θ)


• For each Θ = θ with pΘ(θ) 6= 0, let Y be a continuous random variable, i.e.,
FY |Θ(y|θ) is continuous for all θ. We define fY |Θ(y|θ) in the usual way
• The conditional pmf of Θ given Y = y can be defined as a limit:

  pΘ|Y(θ|y) = lim_{∆y→0} P{Θ = θ, y < Y ≤ y + ∆y}/P{y < Y ≤ y + ∆y}
            = lim_{∆y→0} [pΘ(θ) fY|Θ(y|θ) ∆y]/[fY(y) ∆y] = fY|Θ(y|θ) pΘ(θ)/fY(y)

  This leads to the Bayes rule:

  pΘ|Y(θ|y) = fY|Θ(y|θ) pΘ(θ) / ∑_{θ′} pΘ(θ′) fY|Θ(y|θ′)

EE 278: Probability and Random Variables Page 1 – 38


• Example: Additive Gaussian Noise Channel
  Consider a noisy channel that adds noise Z ∼ N(0, N) to its input
  The signal transmitted is a binary random variable Θ:

  Θ = +1 with probability p, and Θ = −1 with probability 1 − p

  The received signal, also called the observation, is Y = Θ + Z, where Θ and Z
  are independent
Given Y = y is received (observed), find pΘ|Y (θ|y), the a posteriori pmf of Θ
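
  A numerical sketch of this computation (not the worked solution in the notes): plug the Gaussian likelihood fY|Θ(y|θ), which is the N(θ, N) pdf evaluated at y, into the Bayes rule of the previous slide. The values p = 0.6, N = 0.5, and y = 0.2 are arbitrary:

  import math

  def gauss_pdf(y, mean, var):
      return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

  def posterior_theta(y, p, N):
      # Bayes rule: p_{Theta|Y}(theta|y) = p_Theta(theta) f_{Y|Theta}(y|theta) / f_Y(y)
      prior = {+1: p, -1: 1 - p}
      lik = {theta: gauss_pdf(y, theta, N) for theta in (+1, -1)}
      f_y = sum(prior[t] * lik[t] for t in (+1, -1))
      return {t: prior[t] * lik[t] / f_y for t in (+1, -1)}

  print(posterior_theta(y=0.2, p=0.6, N=0.5))  # a posteriori pmf of Theta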

EE 278: Probability and Random Variables Page 1 – 39


• In some cases we are given fY(y) and pΘ|Y(θ|y) for every y
• We can then find the conditional pdf fY|Θ(y|θ) using the Bayes rule:

  fY|Θ(y|θ) = pΘ|Y(θ|y) fY(y) / ∫ fY(y′) pΘ|Y(θ|y′) dy′

• Example: Coin with random bias


Consider a coin with random bias P ∼ fP (p). Flip the coin and let X = 1 if the
outcome is heads and X = 0 if the outcome is tails
Given that X = 1 (i.e., outcome is heads), find fP |X (p|1), the a posteriori pdf
of P

EE 278: Probability and Random Variables Page 1 – 40


Scalar Detection

• Consider the following signal processing problem:

  Θ ∈ {θ0, θ1} → noisy channel fY|Θ(y|θ) → Y → decoder → Θ̂(Y) ∈ {θ0, θ1}

  where the signal sent is

  Θ = θ0 with probability p, and θ1 with probability 1 − p ,

  and the observation (received signal) is

  Y | {Θ = θ} ∼ fY|Θ(y | θ) , θ ∈ {θ0, θ1}

• We wish to find the estimate Θ̂(Y ) (i.e., design the decoder) that minimizes the
probability of error:

Pe = P{Θ̂ ≠ Θ} = P{Θ = θ0, Θ̂ = θ1} + P{Θ = θ1, Θ̂ = θ0}
= P{Θ = θ0}P{Θ̂ = θ1 | Θ = θ0} + P{Θ = θ1}P{Θ̂ = θ0 | Θ = θ1}

EE 278: Probability and Random Variables Page 1 – 41


• We define the maximum a posteriori probability (MAP) decoder as

  Θ̂(y) = θ0 if pΘ|Y(θ0|y) > pΘ|Y(θ1|y), and θ1 otherwise

• The MAP decoding rule minimizes Pe, since

  min_{Θ̂} Pe = 1 − max_{Θ̂} P{Θ̂(Y) = Θ}
             = 1 − max_{Θ̂} ∫_{−∞}^{∞} fY(y) P{Θ = Θ̂(y) | Y = y} dy
             = 1 − ∫_{−∞}^{∞} fY(y) max_{Θ̂(y)} P{Θ = Θ̂(y) | Y = y} dy ,

  and the probability of error is minimized if we pick the Θ̂(y) with the largest
  pΘ|Y(Θ̂(y)|y) for every y, which is precisely the MAP decoder
• If p = 1/2, i.e., equally likely signals, then using Bayes rule the MAP decoder reduces
  to the maximum likelihood (ML) decoder

  Θ̂(y) = θ0 if fY|Θ(y|θ0) > fY|Θ(y|θ1), and θ1 otherwise

EE 278: Probability and Random Variables Page 1 – 42


Additive Gaussian Noise Channel

• Consider the additive Gaussian noise channel with signal


  Θ = +√P with probability 1/2, and −√P with probability 1/2,

  noise Z ∼ N(0, N) (Θ and Z are independent), and output Y = Θ + Z
• The MAP decoder is

  Θ̂(y) = +√P if P{Θ = +√P | Y = y}/P{Θ = −√P | Y = y} > 1, and −√P otherwise

  Since the two signals are equally likely, the MAP decoding rule reduces to the
  ML decoding rule

  Θ̂(y) = +√P if fY|Θ(y | +√P)/fY|Θ(y | −√P) > 1, and −√P otherwise

EE 278: Probability and Random Variables Page 1 – 43


• Using the Gaussian pdf, the ML decoder reduces to the minimum distance
  decoder

  Θ̂(y) = +√P if (y − √P)^2 < (y − (−√P))^2, and −√P otherwise

  From the figure, this simplifies to

  Θ̂(y) = +√P if y > 0, and −√P if y < 0

  Note: The decision when y = 0 is arbitrary

  (Figure: the two conditional pdfs fY|Θ(y | −√P) and fY|Θ(y | +√P), Gaussians
  centered at −√P and +√P; the ML decision threshold is at y = 0)

EE 278: Probability and Random Variables Page 1 – 44


• Now to find the minimum probability of error, consider

  Pe = P{Θ̂(Y) ≠ Θ}
     = P{Θ = +√P}P{Θ̂(Y) = −√P | Θ = +√P} + P{Θ = −√P}P{Θ̂(Y) = +√P | Θ = −√P}
     = (1/2) P{Y ≤ 0 | Θ = +√P} + (1/2) P{Y > 0 | Θ = −√P}
     = (1/2) P{Z ≤ −√P} + (1/2) P{Z > √P}
     = Q(√(P/N)) = Q(√SNR)

The probability of error is a decreasing function of P/N, the signal-to-noise
ratio (SNR)
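
This closed form is easy to verify by simulation. A minimal Python sketch (illustration only; P = 1 and N = 0.25, i.e., SNR = 4, are arbitrary choices) applies the sign decoder and compares the empirical error rate with Q(√(P/N)):

  import math, random

  def Q(x):
      return 0.5 * math.erfc(x / math.sqrt(2))

  random.seed(2)
  P, N = 1.0, 0.25          # signal power and noise variance (SNR = P/N = 4)
  n_trials = 200_000
  errors = 0
  for _ in range(n_trials):
      theta = math.sqrt(P) if random.random() < 0.5 else -math.sqrt(P)
      y = theta + random.gauss(0, math.sqrt(N))   # Y = Theta + Z, Z ~ N(0, N)
      theta_hat = math.sqrt(P) if y > 0 else -math.sqrt(P)
      errors += (theta_hat != theta)

  print(errors / n_trials)          # empirical Pe
  print(Q(math.sqrt(P / N)))        # Q(sqrt(SNR)), about 0.0228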

EE 278: Probability and Random Variables Page 1 – 45
