Information & Communication


Probability Theory

Universal set, denoted by Ω: contains all objects that could conceivably be of interest in a particular context. Having specified the context in terms of a universal set Ω, we only consider sets S that are subsets of Ω.

Two sets are said to be disjoint if their intersection is empty.

Elements of a Probabilistic Model →

The sample space Ω, which is the set of all possible outcomes of an experiment.

The probability law, which assigns to a set A of possible outcomes (also called an event) a non-negative number P(A), called the probability of A.

Sample Space: Every probabilistic model involves an underlying process, called the experiment, that will produce exactly one out of several possible outcomes. The set of all possible outcomes is called the sample space of the experiment, and is denoted by Ω.


Event: A subset of the sample space, that is, a collection of possible
outcomes, is called an event.

Note: For example, three tosses of a coin constitute a single experiment, rather than three experiments.

Note: The sample space of an experiment may consist of a finite or an infinite number of possible outcomes.

Probability Axioms →

Non-negativity: P(A) ≥ 0, for every event A.

Additivity: If A1, A2, ..., An is a sequence of disjoint events, then the probability of their union satisfies:

P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An)

Normalization: The probability of the entire sample space Ω is equal to 1, that is, P(Ω) = 1.

Example: 1 = P(Ω) = P(Ω ∪ Ø) = P(Ω) + P(Ø) = 1 + P(Ø), which shows that the probability of the empty event is P(Ø) = 0.

Some Properties of Probability Laws →

If A ⊂ B, then P(A) ≤ P(B)

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

P(A ∪ B) ≤ P(A) + P(B) (can be further generalized)

Conditional Probability:

Denoted by P(A|B) and defined as P(A|B) = P(A ∩ B) / P(B)

We assume that P(B) > 0; the conditional probability is undefined if the conditioning event has zero probability.

In words, out of the total probability of the elements of B, P(A|B) is the fraction that is assigned to possible outcomes that also belong to A.

For a fixed event B, it can be verified that the conditional probabilities P(A|B) form a legitimate probability law that satisfies the three axioms.


Multiplication Rule: P(A ∩ B) = P(B) · P(A|B), and more generally, P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) · P(A2|A1) ··· P(An|A1 ∩ ... ∩ An−1), provided the conditioning events have positive probability.

Total Probability Theorem: Let A1, A2, ..., An be disjoint events that form a partition of the sample space (each possible outcome is included in one and only one of the events A1, A2, ..., An) and assume that P(Ai) > 0, for all i = 1, 2, ..., n. Then, for any event B, we have:

P(B) = P(A1 ∩ B) + ··· + P(An ∩ B)
     = P(A1) · P(B|A1) + ··· + P(An) · P(B|An)

Bayes’ Rule: Let A1, A2, ..., An be disjoint events that form a partition of the sample space and assume that P(Ai) > 0, for all i = 1, 2, ..., n. Then, for any event B, we have:

P(Ai|B) = P(Ai) · P(B|Ai) / P(B)

Bayes’ rule is used for inference. There are a number of “causes” that may result in a certain “effect.” We observe the effect, and we wish to infer the cause.
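As a quick illustration, here is a minimal sketch (not from the notes) that applies the Total Probability Theorem and Bayes’ Rule to a hypothetical two-cause example; the prior and likelihood values are made-up numbers chosen only for illustration.

```python
# Minimal sketch (assumed example): Total Probability Theorem and Bayes' Rule
# for a hypothetical partition into two "causes". All numbers are illustrative.

priors = {"cause_1": 0.7, "cause_2": 0.3}          # P(Ai), a partition of the sample space
likelihoods = {"cause_1": 0.1, "cause_2": 0.8}     # P(B | Ai)

# Total Probability Theorem: P(B) = sum_i P(Ai) * P(B | Ai)
p_b = sum(priors[a] * likelihoods[a] for a in priors)

# Bayes' Rule: P(Ai | B) = P(Ai) * P(B | Ai) / P(B)
posteriors = {a: priors[a] * likelihoods[a] / p_b for a in priors}

print(p_b)         # 0.31
print(posteriors)  # cause_1 ≈ 0.226, cause_2 ≈ 0.774: the observed effect favors cause_2
```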

Independent Events:

When the occurrence of B provides no information and does not alter the probability that A has occurred, i.e., P(A|B) = P(A), we say that A is independent of B.

Equivalently, P(A ∩ B) = P(A) · P(B)

The definition of independence can be extended to multiple events (more than two).

Independence is a symmetric property; that is, if A is independent of B, then B is independent of A, and we can unambiguously say that A and B are independent events.

Misconception: A common first thought is that two events are independent if they are disjoint, but in fact the opposite is true: two disjoint events A and B with P(A) > 0 and P(B) > 0 are never independent, since their intersection A ∩ B is empty and has probability 0.

If A and B are independent, so are A and Bᶜ (the complement of B).

Pairwise independence does not imply mutual independence. Conversely, the single condition P(A ∩ B ∩ C) = P(A) · P(B) · P(C) does not by itself imply pairwise independence.
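A minimal sketch (not from the notes) of checking independence by enumeration, using the two-fair-coin-tosses experiment; the events A, B, and D below are assumed for illustration.

```python
# Minimal sketch (assumed example): test independence of events in the
# two-fair-coin-tosses experiment by comparing P(A ∩ B) with P(A) * P(B).
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=2))                  # 4 equally likely outcomes

def P(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == "H"        # first toss is heads
B = lambda w: w[1] == "H"        # second toss is heads
D = lambda w: w[0] == "T"        # first toss is tails (disjoint from A)

print(P(lambda w: A(w) and B(w)) == P(A) * P(B))    # True: A and B are independent
print(P(lambda w: A(w) and D(w)), P(A) * P(D))      # 0 vs 1/4: disjoint but not independent
```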

Discrete Random Variables

Definition: Given an experiment and the corresponding set of possible outcomes (the sample space), a random variable associates a particular number with each outcome. We refer to this number as the numerical value or the experimental value of the random variable. Mathematically, a random variable is a real-valued function of the experimental outcome.

We can associate with each random variable certain “averages” of interest, such as the mean and the variance.

A random variable is called discrete if its range (the set of values that it can take) is finite or at most countably infinite.

A random variable that can take an uncountably infinite number of values is not discrete.

A discrete random variable has an associated probability mass function (PMF), which gives the probability of each numerical value that the random variable can take.

Probability Mass Function → If x is any possible value of X, the probability mass of x, denoted p_X(x), is the probability of the event {X = x} consisting of all outcomes that give rise to a value of X equal to x:

p_X(x) = P({X = x})



X → denotes the random variable; x → denotes a real number, such as a numerical value of the random variable.

Note: Σ_x p_X(x) = 1

Calculation of the PMF of a Random Variable X → For each possible value x of X (a short worked example in code follows these two steps):

1. Collect all the possible outcomes that give rise to the event {X = x}.

2. Add their probabilities to obtain p_X(x).
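A minimal sketch (not from the notes) of the two-step recipe above, for X = number of heads in three fair coin tosses; the experiment and values are assumed for illustration.

```python
# Minimal sketch (assumed example): compute the PMF of X = number of heads in
# three fair coin tosses by enumerating the sample space.
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=3))          # sample space: 8 equally likely outcomes
p_outcome = Fraction(1, len(omega))            # probability of each outcome

pmf = {}
for outcome in omega:
    x = outcome.count("H")                     # value of the random variable X
    pmf[x] = pmf.get(x, 0) + p_outcome         # add probabilities of outcomes with X = x

print(pmf)                 # {3: 1/8, 2: 3/8, 1: 3/8, 0: 1/8}
print(sum(pmf.values()))   # 1, the normalization property
```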


Different DRVs →

The Bernoulli Random Variable:

It is used to model generic probabilistic situations with just two outcomes. The Bernoulli random variable takes the two values 1 and 0, depending on the outcome.

Its PMF is:

p_X(x) = { p,      if x = 1
           1 − p,  if x = 0

PMF of a Bernoulli(p) random variable.

Mean = p; Variance = p(1 − p)


The Binomial Random Variable:

We refer to X as a binomial random variable with parameters n and p. The PMF of X consists of the binomial probabilities:

p_X(k) = P(X = k) = (n choose k) · p^k · (1 − p)^(n−k),  k = 0, 1, ..., n

A Binomial PMF

The normalization property Σ_x p_X(x) = 1, specialized to the binomial random variable, is written as:

Σ_{k=0}^{n} (n choose k) · p^k · (1 − p)^(n−k) = 1

Mean = n · p; Variance = n · p · (1 − p)


The Poisson Random Variable:

A Poisson random variable takes non-negative integer values. Its PMF is given by:

p_X(k) = e^(−λ) · λ^k / k!,  k = 0, 1, 2, ...

Mean = λ; Variance = λ
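A minimal sketch (not from the notes) that evaluates the Binomial and Poisson PMFs above and checks normalization and the stated means numerically; the parameter values n, p, and λ are assumed for illustration.

```python
# Minimal sketch (assumed example): Binomial(n, p) and Poisson(lambda) PMFs,
# with numerical checks of normalization and the stated means.
from math import comb, exp, factorial

n, p, lam = 10, 0.3, 2.0

binom_pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
poisson_pmf = {k: exp(-lam) * lam**k / factorial(k) for k in range(50)}   # truncated support

print(sum(binom_pmf.values()))                      # ~1.0 (normalization)
print(sum(k * q for k, q in binom_pmf.items()))     # ~3.0 = n * p (mean)
print(sum(k * q for k, q in poisson_pmf.items()))   # ~2.0 = lambda (mean)
```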


Expectation/Mean and Variance

Expectation of X, which is a weighted (in proportion to probabilities) average of the possible values of X.

We define the expected value (also called the expectation or the mean) of a random variable X, with PMF p_X(x), by E[X] = Σ_x x · p_X(x)

It is useful to view the mean of X as a “representative” value of X, which lies somewhere in the middle of its range.

Moment: We define the nth moment as E[Xⁿ], the expected value of the random variable Xⁿ. With this terminology, the 1st moment of X is just the mean.

Variance: Denoted by Var(X) and defined as the expected value of the random variable (X − E[X])², i.e., Var(X) = E[(X − E[X])²]

The variance is always non-negative. The variance provides a measure of dispersion of X around its mean. Another measure of dispersion is the standard deviation of X, which is defined as the square root of the variance and is denoted by σ_X: σ_X = √Var(X)

Variance in Terms of Moments Expression: Var(X) = E[X²] − (E[X])²
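A minimal sketch (not from the notes) computing the mean and variance of a discrete random variable from its PMF, and checking that the definition and the moments expression agree; the PMF is the three-coin-toss example assumed earlier.

```python
# Minimal sketch (assumed example): mean and variance from a PMF, comparing
# Var(X) = E[(X - E[X])^2] with Var(X) = E[X^2] - (E[X])^2.
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}   # heads in three fair tosses

mean = sum(x * p for x, p in pmf.items())                       # E[X]
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())      # E[(X - E[X])^2]
second_moment = sum(x**2 * p for x, p in pmf.items())           # E[X^2]
var_moments = second_moment - mean**2                           # E[X^2] - (E[X])^2

print(mean)                   # 1.5
print(var_def, var_moments)   # 0.75 0.75, the two expressions agree
```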

Continuous Random Variable

A random variable X is called continuous if its probability law can be described in terms of a nonnegative function f_X, called the probability density function of X, or PDF for short, which satisfies:

P(X ∈ B) = ∫_B f_X(x) dx   for every subset B of the real line.

In particular, P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx

For any single value a, we have P(X = a) = ∫_a^a f_X(x) dx = 0. For this reason, including or excluding the endpoints of an interval has no effect on its probability:

P(a ≤ X ≤ b) = P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b)

Note that to qualify as a PDF, a function f_X must be non-negative, i.e., f_X(x) ≥ 0 for every x, and must also satisfy the normalization equation: ∫_{−∞}^{∞} f_X(x) dx = P(−∞ < X < ∞) = 1

Graphically, this means that the entire area under the graph of the PDF must be equal to 1.


IMPORTANT: Even though a PDF is used to calculate event probabilities, f_X(x) is not the probability of any particular event. In particular, it is not restricted to be ≤ 1; it only needs to satisfy f_X(x) ≥ 0 for all x.

Uniform or uniformly distributed Random Variable: Its PDF has the form:

f_X(x) = { c,  if a ≤ x ≤ b
           0,  otherwise

where c is a constant. By the normalization property, c = 1/(b − a).

The PDF of a uniform Random Variable.

Expectation/Mean: The expected value or mean of a continuous random variable X is defined by: E[X] = ∫_{−∞}^{∞} x · f_X(x) dx

This is similar to the discrete case except that the PMF is replaced by the PDF, and summation is replaced by integration.
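A minimal sketch (not from the notes) checking the normalization and mean of a Uniform(a, b) PDF by a simple numerical integration; the interval endpoints are assumed for illustration and no external libraries are used.

```python
# Minimal sketch (assumed example): midpoint-rule integration of a Uniform(a, b)
# PDF, checking that the area is ~1 and the mean is ~(a + b) / 2.
a, b = 2.0, 5.0
c = 1.0 / (b - a)                      # constant fixed by the normalization property

def pdf(x):
    return c if a <= x <= b else 0.0

N = 100_000
dx = (b - a) / N
xs = [a + (i + 0.5) * dx for i in range(N)]          # midpoints over [a, b]

area = sum(pdf(x) * dx for x in xs)                  # ~1, the area under the PDF
mean = sum(x * pdf(x) * dx for x in xs)              # ~(a + b) / 2 = 3.5

print(area, mean)
```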

Cumulative Distribution Functions

The CDF of a random variable X is denoted by F_X and provides the probability P(X ≤ x). In particular, for every x we have:

F_X(x) = P(X ≤ x) = { Σ_{k ≤ x} p_X(k),        if X is discrete
                      ∫_{−∞}^{x} f_X(t) dt,    if X is continuous

Loosely speaking, the CDF “accumulates” probability “up to” the value x.
Any random variable associated with a given probability model has a CDF, regardless of whether it is discrete, continuous, or other. This is because {X ≤ x} is always an event and therefore has a well-defined probability.
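A minimal sketch (not from the notes) of a CDF built from a discrete PMF, showing how it accumulates probability and how the PMF is recovered from it (the recovery identity appears after the property list below); the PMF is the assumed three-coin-toss example.

```python
# Minimal sketch (assumed example): CDF of an integer-valued discrete random
# variable, and recovery of the PMF as F_X(k) - F_X(k - 1).
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}       # heads in three fair tosses

def cdf(x):
    """F_X(x): accumulate probability 'up to' the value x."""
    return sum(p for k, p in pmf.items() if k <= x)

print([cdf(x) for x in (-1, 0, 1, 2, 3, 4)])         # [0, 0.125, 0.5, 0.875, 1.0, 1.0]
print({k: cdf(k) - cdf(k - 1) for k in pmf})         # recovers the original PMF
```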

Properties of CDF →

1. F_X is monotonically nondecreasing: if x ≤ y, then F_X(x) ≤ F_X(y).

2. F_X(x) tends to 0 as x → −∞, and to 1 as x → ∞.

3. If X is discrete, then F_X has a piecewise constant and staircase-like form.

4. If X is continuous, then F_X has a continuously varying form.

For an integer-valued discrete X: p_X(k) = P(X ≤ k) − P(X ≤ k − 1) = F_X(k) − F_X(k − 1)

For a continuous X: f_X(x) = dF_X(x)/dx
Entropy

Entropy is a measure of the uncertainty of a random variable.

The entropy H(X) of a discrete random variable X is defined by →
H(X) = − Σ_{x∈X} p(x) · log(p(x)), where the base of the logarithm is 2 (entropy measured in bits).

Adding terms of zero probability does not change the entropy (using the convention 0 · log 0 = 0).

Lemma: H(X) ≥ 0
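A minimal sketch (not from the notes) computing entropy in bits for a few assumed distributions, illustrating the zero-probability convention and the lemma H(X) ≥ 0.

```python
# Minimal sketch (assumed example): entropy in bits of a discrete distribution,
# skipping terms with p(x) = 0 per the convention 0 * log 0 = 0.
from math import log2

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

print(entropy({"H": 0.5, "T": 0.5}))   # 1.0 bit: a fair coin
print(entropy({"H": 0.9, "T": 0.1}))   # ~0.469 bits: less uncertain
print(entropy({"H": 1.0, "T": 0.0}))   # 0.0 bits: no uncertainty, so H(X) >= 0 holds
```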


The definition above is for a single random variable. We will now extend it to a pair of random variables.

The joint entropy: The joint entropy H(X, Y) of a pair of discrete random variables (X, Y) with a joint distribution p(x, y) is defined as:

H(X, Y) = − Σ_{x∈X} Σ_{y∈Y} p(x, y) · log(p(x, y))

Conditional Entropy: The expected value of the entropies of the conditional distributions, averaged over the conditioning random variable:

H(Y|X) = Σ_{x∈X} p(x) · H(Y|X = x)

Intuitively (and it can be proved formally), the entropy of a pair of random variables is the entropy of one plus the conditional entropy of the other.

Chain Rule: H(X, Y) = H(X) + H(Y|X)

Corollary: H(X, Y|Z) = H(X|Z) + H(Y|X, Z)

H(X, Y|Z) refers to the conditional entropy of random variables X and Y given random variable Z.

Note: Generally, H(Y|X) ≠ H(X|Y). However, H(X) − H(X|Y) = H(Y) − H(Y|X).
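A minimal sketch (not from the notes) verifying the chain rule H(X, Y) = H(X) + H(Y|X) on a small, made-up joint PMF; the joint distribution below is assumed for illustration.

```python
# Minimal sketch (assumed example): numerical check of H(X, Y) = H(X) + H(Y|X).
from math import log2

joint = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.4, ("b", 1): 0.1}

def H(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

# Marginal p(x), then H(Y|X) = sum_x p(x) * H(Y | X = x)
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

H_Y_given_X = 0.0
for x, px in p_x.items():
    cond = {y: joint[(x, y)] / px for (x2, y) in joint if x2 == x}   # p(y | X = x)
    H_Y_given_X += px * H(cond)

print(H(joint))               # joint entropy H(X, Y), ~1.861 bits
print(H(p_x) + H_Y_given_X)   # H(X) + H(Y|X), equal to the line above
```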

Relative Entropy: The relative entropy D(p||q) is a measure of the inefficiency of assuming that the distribution is q when the true distribution is p. Relative entropy is always nonnegative and is zero if and only if p = q.

D(p||q) = Σ_{x∈X} p(x) · log(p(x) / q(x))

Mutual Information: Mutual information is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other.

Consider two random variables X and Y with a joint probability mass function p(x, y) and marginal probability mass functions p(x) and p(y). The mutual information I(X; Y) is the relative entropy between the joint distribution and the product distribution p(x) · p(y):

I(X; Y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) · log( p(x, y) / (p(x) · p(y)) )

Note that D(p||q) ≠ D(q||p) in general.
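A minimal sketch (not from the notes) computing D(p||q) in bits for two made-up distributions, illustrating both its nonnegativity and the asymmetry noted above; the distributions p and q are assumed for illustration.

```python
# Minimal sketch (assumed example): relative entropy D(p||q) in bits, showing
# D(p||q) >= 0 and that D(p||q) != D(q||p) in general.
from math import log2

def D(p, q):
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}

print(D(p, p))   # 0.0: zero if and only if the distributions are equal
print(D(p, q))   # ~0.737
print(D(q, p))   # ~0.531, not equal to D(p||q)
```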

Relation between entropy and mutual information →

I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y)

The mutual information I(X; Y) is the reduction in the uncertainty of X due to the knowledge of Y, or the reduction in the uncertainty of Y due to the knowledge of X.

X says as much about Y as Y says about X; thus I(X; Y) = I(Y; X).
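A minimal sketch (not from the notes) computing I(X; Y) on a small, made-up joint PMF in two ways, directly from the definition and via H(X) + H(Y) − H(X, Y), and checking that they agree; the joint distribution is assumed for illustration.

```python
# Minimal sketch (assumed example): I(X; Y) computed directly and via entropies.
from math import log2

joint = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.4, ("b", 1): 0.1}

def H(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

I_direct = sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items() if p > 0)
I_entropy = H(p_x) + H(p_y) - H(joint)

print(I_direct, I_entropy)   # equal (up to floating point), and nonnegative
```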


Relationship between entropy and mutual information.

Chain rule for entropy: Let X1, X2, ..., Xn be drawn according to p(x1, x2, ..., xn). Then:

H(X1, X2, ..., Xn) = Σ_{i=1}^{n} H(Xi | Xi−1, ..., X1)
                   = H(X1) + H(X2|X1) + ··· + H(Xn|Xn−1, ..., X1)

Conditional Mutual Information: It is defined as the reduction in the uncertainty of X due to knowledge of Y when Z is given.

I(X; Y|Z) = H(X|Z) − H(X|Y, Z)

Chain rule for information: I(X1, X2, ..., Xn; Y) = Σ_{i=1}^{n} I(Xi; Y | Xi−1, Xi−2, ..., X1)

The conditional relative entropy:

D(p(y|x) || q(y|x)) = Σ_x p(x) Σ_y p(y|x) · log( p(y|x) / q(y|x) )

Chain rule for relative entropy:

D(p(x, y) || q(x, y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x))


Jensen’s Inequality: If f is a convex function and X is a random variable, then E[f(X)] ≥ f(E[X]).
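A minimal sketch (not from the notes) of Jensen’s inequality for the convex function f(x) = x², on a small assumed discrete distribution.

```python
# Minimal sketch (assumed example): E[f(X)] >= f(E[X]) for the convex f(x) = x^2.
pmf = {-1: 0.2, 0: 0.5, 2: 0.3}

E_X = sum(x * p for x, p in pmf.items())
E_fX = sum(x**2 * p for x, p in pmf.items())

print(E_fX, E_X**2, E_fX >= E_X**2)   # 1.4, 0.16, True
```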


Information inequality: Let p(x), q(x), x ∈ X, be two probability mass functions. Then D(p||q) ≥ 0, with equality if and only if p(x) = q(x) for all x.


Non-negativity of mutual information: For any two random variables X, Y, we have I(X; Y) ≥ 0, with equality if and only if X and Y are independent.
