L1 Prob
Himanshu Yadav
2024-05-17
Contents
1 Set theory
1.1 Binary operations on sets
1.2 The algebra of sets
2 Probability theory
2.1 Foundations
2.2 Probability mass function and probability density function
3 Random variables
3.1 Discrete random variables
3.2 The expected value and variance of a random variable
3.3 Continuous random variables
3.4 Some important probability distributions
4 Conditional probability and Bayes’ theorem
4.1 Conditional probability
4.2 Independent events
4.3 Total probability
4.4 Bayes’ theorem
5 Using Bayes’ theorem for statistical inference
1 Set theory
A set is a collection of objects or elements. Suppose that set A consists of two numbers 0 and 1. We
can denote this set as follows:
A = {0, 1}
From the above, we can also say that 0 is a member of set A:
0∈A
Similarly, 1 ∈ A.
The number 2 is not a member:
2 ∉ A
Let us say S is a set of natural numbers between 2 and 8. We can write:
S = {2, 3, 4, 5, 6, 7, 8}
Also, we can describe the set S as follows. The vertical bar, |, is read “such that.”
S = { x | x ∈ N+ and 2 ≤ x ≤ 8}
If A = {1, 2, 3}, B = {1, 3, 2}, and C = {3, 1, 2, 1}, we can write A = B = C, since the order of elements and repetition of elements do not matter.
5. A set A is a subset of another set B, denoted by A ⊆ B, if all members of A are also in B, i.e., for all a ∈ A, a ∈ B.
A is a proper subset of B, denoted by A ⊂ B, if all members of A are also in B, but A ≠ B.
6. The power set of a set A, denoted by P ( A), is the set of all possible subsets of A.
7. Two sets A and B are called disjoint sets if A and B have no element in common.
A ∩ B = ∅, where ∅ represents the empty set (a set with no elements in it).
The complement of a set A, denoted by Ā, is the set of all elements that are not in A:
Ā = { x | x ∉ A }
2. A ∪ B = B ∪ A
A∩B = B∩A
3. A ∪ ( B ∪ C ) = ( A ∪ B) ∪ C
A ∩ ( B ∩ C ) = ( A ∩ B) ∩ C
4. A ∩ ( B ∪ C ) = ( A ∩ B) ∪ ( A ∩ C )
A ∪ ( B ∩ C ) = ( A ∪ B) ∩ ( A ∪ C )
5. A ∪ ∅ = A
A∩∅ = ∅
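The laws above can be checked mechanically with Python's built-in set type. The following is a minimal sketch; the sets A, B, and C are arbitrary examples chosen only for illustration.

# Arbitrary example sets, chosen only to illustrate the laws above
A = {0, 1, 2}
B = {1, 3}
C = {2, 3, 4}

# Commutative laws
assert A | B == B | A
assert A & B == B & A

# Associative laws
assert A | (B | C) == (A | B) | C
assert A & (B & C) == (A & B) & C

# Distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# Identity laws with the empty set
assert A | set() == A
assert A & set() == set()

print("All set-algebra identities hold for these example sets.")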
2 Probability theory
Suppose you run an experiment where the participants have to decide whether a given sentence is grammatically correct or not, and they are forced to select either yes or no. The recorded responses you have are “yes” and “no.”
What is the set of all possible outcomes from the experiment?
Ω = {“yes”, “no”}
The set of all possible outcomes from the experiment is called the sample space of the experiment.
What is the power set of the sample space Ω?
F = {∅,{“yes”},{“no”},{“yes”,“no”}}
∅ : there is no outcome
{“yes”} : the outcome is “yes”
{“no”} : the outcome is “no”
{“yes”,“no”} : the outcome is either “yes” or “no”
The above are all the different collections of possible results. These collections are called events.
For example, {“no”} is the event that the participant answers “no”; {“yes”,“no”} is the event that the
participant answers either “yes” or “no”. The power set F is called the event space.
Probability is a way of assigning every event a real value between 0 and 1 based on some require-
ments. What are those requirements? How do we assign a probability value to every event?
2.1 Foundations
We can assign a probability value P( E) to an event E based on the following three axioms.
1. First axiom:
The probability of an event E is a non-negative real number
P( E) ∈ R, P( E) ≥ 0 where E ∈ F
It follows that P( E) is always finite.
2. Second axiom:
The probability that at least one of the elementary events in the entire sample space will occur is 1.
P(Ω) = 1
3. Third axiom:
Any countable sequence of disjoint sets (also called mutually exclusive events) E1, E2, E3, . . . satisfies the following:
P(E1 ∪ E2 ∪ E3 ∪ . . .) = P(E1) + P(E2) + P(E3) + . . .
P(∪_{i=1}^{∞} Ei) = ∑_{i=1}^{∞} P(Ei)
Let’s see what we can deduce from the above three axioms about our grammaticality judgment
example.
Suppose that E1 and E2 are two mutually exclusive events in the sample space Ω, and the empty set ∅ is also an event in the same sample space. Since E1 and ∅ are disjoint and E1 ∪ ∅ = E1, according to the third axiom,
P(E1 ∪ ∅) = P(E1) + P(∅) = P(E1) (3)
P(∅) = 0 (4)
Now, the second axiom, together with the third, implies that the probabilities assigned to the elementary events in Ω must sum to 1:
P({“yes”}) + P({“no”}) = P(Ω) = 1
2.2 Probability mass function and probability density function
For a discrete sample space, suppose a function f assigns a probability value f(x) to every outcome x ∈ Ω; f is called the probability mass function (PMF). So,
∑_{x∈Ω} f(x) = 1 (10)
and the probability of an event E is the sum of the probabilities of the outcomes in E:
P(E) = ∑_{x∈E} f(x) (11)
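As a minimal illustration in Python (the value 0.7 for “yes” is an arbitrary assumption, not something given in the text), the sketch below defines a PMF over Ω = {“yes”, “no”}, checks that the probabilities sum to 1, and computes P(E) for every event in the event space F using equation (11).

from itertools import chain, combinations

# Sample space and an assumed PMF (0.7 / 0.3 are arbitrary illustration values)
omega = ["yes", "no"]
f = {"yes": 0.7, "no": 0.3}
assert abs(sum(f.values()) - 1.0) < 1e-12   # probabilities over the sample space sum to 1

def powerset(xs):
    """All subsets of xs, i.e., the event space F."""
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def prob(event):
    """P(E) = sum of f(x) over the outcomes x in E (equation 11)."""
    return sum(f[x] for x in event)

for event in powerset(omega):
    print(set(event) or "∅", prob(event))
# prints: ∅ 0, {'yes'} 0.7, {'no'} 0.3, {'yes', 'no'} 1.0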
3 Random variables
A random variable X is a function that maps the outcomes in a sample space Ω (say {“yes”,“no”}) to another (real-valued) space Ωx (e.g., {0, 1}, where 1 corresponds to “yes” and 0 corresponds to “no”).
We can write a random variable X as
X : Ω → Ωx
such that
X (ω ) ∈ Ω x where ω ∈ Ω
For example, in a single-coin toss experiment, the sample space is Ω = { H, T }. We can define a
random variable X which is a function that counts the number of heads in an outcome ω that belongs
to Ω.
X: No. of heads in ω where ω ∈ Ω
X(ω) = { 1 if ω = H; 0 if ω ≠ H },  where ω ∈ Ω
Similarly, we can also define a random variable Y that counts the number of tails in an outcome ω.
The probabilities are always assigned to the values of the random variable. We will see in the next part why random variables are so useful for assigning probability values.
In the case of a continuous sample space, the measurable space is often the same as the sample space of the experiment. Suppose an experiment (or any generative process) produces outcomes that belong to a sample space Ω = { x | x ∈ R+ and 2 ≤ x ≤ 5}. These outcomes are values that are
coming from what is known as a (continuous) random variable. Suppose that in an experimental
trial the outcome is 2.5; we will write X = 2.5, where X is a random variable associated with the
experiment. A random variable is written with capital letters (X or Y, etc.), and the outcomes are
written in lower case (x or y, etc.). In Bayesian statistics, where parameters are also random variables,
it is common to use Greek letters like α, β, etc., to represent random variables (i.e., the capital letter
convention generally only applies to letters of the English alphabet).
Let us see how it is more convenient to assign probabilities to the values of the random variable
rather than assigning probabilities to the sample space Ω.
Consider an experiment where a coin is tossed three times. What will be the sample space of the
experiment?
There are a total of eight possible outcomes.
Ω = { HHH, HHT, HTH, HTT, THH, THT, TTH, TTT }
It is difficult to directly assign probabilities to this sample space, because we will need a probabil-
ity mass function that has probabilities defined for all 8 outcomes.
Now consider a different idea. What if we ask: how many heads appear (in a trial / experiment)
when a coin is tossed three times? We can define a random variable X.
X: No. of heads in the outcome ω where ω ∈ Ω.
If we represent the outcome of each toss as ωi, we can write ω = (ω1, ω2, ω3) ∈ Ω. The random variable X is given by
X(ω) = ∑_{i=1}^{3} φ(ωi),  where φ(ωi) = { 1 if ωi = H; 0 if ωi ≠ H }
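The mapping from the eight outcomes to the values of X can be made explicit with a short Python sketch; assuming a fair coin (so that each outcome has probability 1/8), it also gives the PMF of X by counting.

from itertools import product
from collections import Counter

# Sample space for three tosses: all 8 triples of H/T
omega = list(product("HT", repeat=3))

def X(outcome):
    """Random variable X: number of heads in the outcome."""
    return sum(1 for toss in outcome if toss == "H")

# Assuming a fair coin, every outcome has probability 1/8
counts = Counter(X(w) for w in omega)
pmf = {k: counts[k] / len(omega) for k in sorted(counts)}
print(pmf)   # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}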
• Random variables can be discrete. For example, in our coin tossing example, the random variable
X takes a countable list of values (i.e., 0, 1, 2 and 3).
• Random variables can be continuous: they can take any numerical value in an interval or collection
of intervals. For example, in an experiment where we record response or reading time, a random
variable X associated with the experiment can take any positive real number value.
• A random variable is associated with a function, called probability mass function (PMF) for dis-
crete random variables, and probability density function (PDF) for continuous random variables.
• The PMF assigns probabilities to the values of a discrete random variable. The PDF assigns prob-
abilities to particular intervals (ranges) of values of a continuous random variable. The PDF does
not assign a probability to a point value, but rather a density.
So, for any experiment or any generative process, you can define a sample space Ω, a random variable X that maps its sample space to another space Ωx, an event space F which is the power set of Ωx, and a function P that maps the event space F to a set of probability values.
The sample space Ω, the event space F, and the function P from the event space F to a set of prob-
abilities together make the formal model of an underlying generative process, denoted by (Ω, F, P).
3.1 Discrete random variables
The PMF of a discrete random variable X assigns a probability f(xi) to each value xi that the variable can take:
P(X = xi) = f(xi)
under the requirement
∑_{i=1}^{n} f(xi) = 1
An important example: The binomial random variable
Suppose that in an experiment the trials are independent, and in each trial one of two possible outcomes can occur, with probabilities p and 1 − p. If p remains constant throughout the experiment, each one of these trials is called a Bernoulli trial. Bernoulli trials can represent generative processes where each outcome is strictly binary, such as heads/tails, on/off, up/down, etc. The pair of outcomes can be coded as 1 (success) and 0 (failure), with the corresponding probabilities as in Table 1.

Table 1: A random variable in which two outcomes are possible: success or failure. The outcome success is assigned the number 1, and failure the number 0, and a probability is assigned to each number.

xi      0       1
f(xi)   1 − p   p

A further distribution can arise from Bernoulli trials. Consider an experiment containing n independent Bernoulli trials. Suppose there were k successes in n trials. We can define a new random variable X which takes the number of successes (out of the total number of trials) as its values.
The probability distribution of the random variable X that represents the number of successes in n Bernoulli trials is given by
P(X = k) = f(k, n, p) = (n! / (k!(n − k)!)) p^k (1 − p)^(n − k) (15)
The expression n! / (k!(n − k)!) is written as C(n, k) (read “n choose k”) in mathematics, leading to the above PMF being commonly written as:
P(X = k) = f(k, n, p) = C(n, k) p^k (1 − p)^(n − k) (16)
The above distribution is called the binomial distribution, and the random variable is called the
binomial random variable.
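A minimal Python sketch of the PMF in equation (16), using math.comb for the binomial coefficient; n = 3 and p = 0.5 are chosen only to match the fair-coin example above.

from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# For three fair-coin tosses this reproduces the PMF obtained by enumeration
print([binomial_pmf(k, n=3, p=0.5) for k in range(4)])   # [0.125, 0.375, 0.375, 0.125]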
3.2 The expected value and variance of a random variable
The expected value of X is the arithmetic mean of a large number of independently drawn values of the variable X. For a discrete random variable X with PMF f, E(X) = ∑_i xi f(xi).
The expected value satisfies the following relationships
1. E(cX ) = cE( X )
2. E( X + Y ) = E( X ) + E(Y )
The variance of a random variable X is defined as Var(X) = E[(X − E(X))²]. Expanding the square,
Var(X) = E[X² + E(X)² − 2X E(X)]
Equivalently:
Var(X) = E(X²) + E(E(X)²) − E(2X E(X))
Var(X) = E(X²) + E(X)² − 2E(X) E(X)
Var(X) = E(X²) − E(X)²
E( X ) is often written as µ.
The standard deviation of a random variable X is given by
σX = √Var(X)
The variance satisfies the following relationships
1. Var(cX) = c² Var(X)
2. Var(X + c) = Var(X)
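These identities can be checked numerically from a PMF. The sketch below reuses the binomial PMF from the previous example (n = 10 and p = 0.3 are assumed illustration values) and computes E(X) = ∑ x f(x) and Var(X) = E(X²) − E(X)², which match the known binomial results np and np(1 − p).

from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3                                  # assumed example values
f = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

mu = sum(k * f[k] for k in f)                   # E(X)   = sum_k k * f(k)
ex2 = sum(k**2 * f[k] for k in f)               # E(X^2) = sum_k k^2 * f(k)
var = ex2 - mu**2                               # Var(X) = E(X^2) - E(X)^2

print(mu, var)   # ≈ 3.0 and ≈ 2.1, i.e., n*p and n*p*(1 - p)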
3.3 Continuous random variables
Suppose the average height of a population is 6 feet, and the number of people with height greater than 6 feet is almost the same as the number with height less than 6 feet.
Consider an experiment where you randomly pick an individual from the population and record
their height. Let us say we define a random variable X that takes the recorded height as its value.
The variable X is a continuous random variable with specific properties. For example, it is sym-
metrically distributed around its mean, i.e., P( X < E( X )) ≈ P( X > E( X )).
The distribution of variable X in this example can be characterized by a normal distribution with
the following probability density function:
f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))
such that:
• ∫_{−∞}^{∞} f(x) dx = 1
• ∫_{−∞}^{∞} x f(x) dx = µ
• ∫_{−∞}^{∞} (x − µ)² f(x) dx = σ²
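These properties can be checked numerically with a simple Riemann sum over a wide grid; the values µ = 6 and σ = 0.5 below are assumed only to echo the height example.

from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """The normal density given above."""
    return 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu) ** 2 / (2 * sigma ** 2))

mu, sigma = 6.0, 0.5                       # assumed example values (height in feet)
dx = 0.001
grid = [i * dx for i in range(12001)]      # x from 0 to 12, wide enough for this mu and sigma

area = sum(normal_pdf(x, mu, sigma) * dx for x in grid)                    # ≈ 1
mean = sum(x * normal_pdf(x, mu, sigma) * dx for x in grid)                # ≈ mu
var  = sum((x - mu) ** 2 * normal_pdf(x, mu, sigma) * dx for x in grid)    # ≈ sigma^2
print(area, mean, var)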
3.4 Some important probability distributions
Some frequently used probability distributions are listed below.
1. Discrete. Binomial PMF: f(k; n, p) = (n! / (k!(n − k)!)) p^k (1 − p)^(n − k)
2. Discrete. Poisson PMF: f(k; λ) = λ^k e^(−λ) / k!, where λ > 0
3. Continuous. Normal PDF: f(x; µ, σ) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))
4. Continuous. Beta PDF: f(x; α, β) = ((α + β − 1)! / ((α − 1)!(β − 1)!)) x^(α−1) (1 − x)^(β−1), where α, β > 0
5. Continuous. Gamma PDF: f(x; α, β) = (β^α / (α − 1)!) x^(α−1) e^(−βx), where α, β > 0
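In practice these PMFs and PDFs are usually evaluated with a library rather than coded by hand. A minimal sketch with scipy.stats (the parameter values are arbitrary; note that scipy parameterizes the gamma distribution with a shape a = α and a scale equal to 1/β):

from scipy import stats

# Arbitrary parameter values, purely for illustration
print(stats.binom.pmf(k=2, n=10, p=0.3))           # binomial PMF
print(stats.poisson.pmf(k=2, mu=1.5))              # Poisson PMF (lambda = 1.5)
print(stats.norm.pdf(x=0.5, loc=0.0, scale=1.0))   # normal PDF (mu = 0, sigma = 1)
print(stats.beta.pdf(x=0.5, a=2.0, b=3.0))         # beta PDF (alpha = 2, beta = 3)
print(stats.gamma.pdf(x=0.5, a=2.0, scale=1/3.0))  # gamma PDF (alpha = 2, rate beta = 3)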
4 Conditional probability and Bayes’ theorem
Let us look at some useful results and properties that emerge from the three axioms of probability.
4.1 Conditional probability
The conditional probability of an event A given that an event B has occurred is defined as
P(A|B) = P(A ∩ B) / P(B),  given that P(B) ≠ 0
Let us verify the above relationship using an example.
Suppose you toss two fair coins simultaneously. The sample space would be Ω = { HH, HT, TH, TT }.
Consider two events A and B.
A : both the coins show heads
B : at least one coin shows heads
What is the probability that A has occurred given that B has occurred?
It will be equal to the probability of A when B is treated as the sample space, with A ⊆ B, where
B = { HH, HT, TH }
A = { HH }
Given that the coins are fair,
P(HH) = P(HT) = P(TH)
Considering B as the sample space, from the second and the third axioms we can deduce that
P(HH) + P(HT) + P(TH) = 1
So,
P(HH) = P(HT) = P(TH) = 1/3
Hence,
P({HH}|B) = 1/3
P(A|B) = 1/3
What is the probability of the event A ∩ B in the sample space Ω?
A ∩ B = { HH }
For the sample space Ω, P(HH) + P(HT) + P(TH) + P(TT) = 1
so, P(HH) = P(HT) = P(TH) = P(TT) = 1/4, which implies that P(A ∩ B) = 1/4
and,
P(B) = P({HH, HT, TH}) = P(HH) + P(HT) + P(TH) = 3/4
Finally,
P(A ∩ B) / P(B) = (1/4) / (3/4) = 1/3 = P(A|B)
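The same computation can be reproduced by counting equally likely outcomes; a minimal Python sketch for the two fair coins:

from itertools import product

# The four equally likely outcomes for two fair coins
omega = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return len(event) / len(omega)

A = [w for w in omega if w == ("H", "H")]       # both coins show heads
B = [w for w in omega if "H" in w]              # at least one coin shows heads
A_and_B = [w for w in A if w in B]

print(prob(A_and_B) / prob(B))                  # P(A|B) = (1/4) / (3/4) = 1/3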
4.2 Independent events
Two events A and B are independent if the occurrence of B does not change the probability of A, i.e.,
P(A|B) = P(A ∩ B) / P(B) = P(A)
which gives
P(A ∩ B) = P(B) P(A)
The above result implies that two events A and B are independent if and only if the probability of the joint occurrence of A and B is equal to the product of their probabilities.
The term P(A ∩ B) gives the probability that both events A and B occur; it is called the joint probability and is also represented by P(A, B).
Generally, n events E1 , E2 , . . . , En are independent if and only if P( E1 , E2 , E3 , . . . En ) = P( E1 ) P( E2 ) P( E3 ) . . . P( En ).
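For example, in the two-coin experiment the events “the first coin shows heads” and “the second coin shows heads” are independent, which the following continuation of the counting sketch confirms:

from itertools import product

omega = list(product("HT", repeat=2))

def prob(event):
    return len(event) / len(omega)

first_heads  = [w for w in omega if w[0] == "H"]
second_heads = [w for w in omega if w[1] == "H"]
both_heads   = [w for w in omega if w[0] == "H" and w[1] == "H"]

# Independence: P(A ∩ B) equals P(A) * P(B)
print(prob(both_heads), prob(first_heads) * prob(second_heads))   # 0.25 0.25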
4.3 Total probability
Suppose A1, A2, . . . , An are mutually exclusive events that together cover the entire sample space, i.e., ∪_{i=1}^{n} Ai = Ω. For any event B, the events B ∩ A1, B ∩ A2, . . . are also mutually exclusive, so by the third axiom,
P((B ∩ A1) ∪ (B ∩ A2) ∪ . . .) = P(B ∩ A1) + P(B ∩ A2) + . . .
From set theory (the distributive law) we know that (B ∩ A1) ∪ (B ∩ A2) ∪ (B ∩ A3) ∪ . . . = B ∩ (A1 ∪ A2 ∪ A3 ∪ . . .).
P(B ∩ (∪_{i=1}^{n} Ai)) = ∑_{i=1}^{n} P(B ∩ Ai)
Since ∪_{i=1}^{n} Ai = Ω and B ∩ Ω = B,
P(B) = ∑_{i=1}^{n} P(B ∩ Ai)
This result is known as the law of total probability.
4.4 Bayes’ theorem
From the definition of conditional probability,
P(B ∩ A1) = P(B|A1) P(A1) = P(A1|B) P(B)
Similarly:
P ( B ∩ A2 ) = P ( B | A2 ) P ( A2 ) = P ( A2 | B ) P ( B )
From the above equations we can derive the following:
P(A1|B) = P(B|A1) P(A1) / P(B)
And, from the law of total probability we know that,
P ( B ) = P ( B | A1 ) P ( A1 ) + P ( B | A2 ) P ( A2 )
Hence,
P(A1|B) = P(B|A1) P(A1) / P(B) = P(B|A1) P(A1) / [P(B|A1) P(A1) + P(B|A2) P(A2)]
The above equation is Bayes’ rule.
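A small numeric sketch of Bayes’ rule for two mutually exclusive events A1 and A2 that cover the sample space; all probability values below are arbitrary illustration values, not taken from the text.

# Prior probabilities of the two mutually exclusive events (arbitrary values)
p_A1, p_A2 = 0.4, 0.6                 # must sum to 1

# Conditional probabilities of B given each event (arbitrary values)
p_B_given_A1 = 0.9
p_B_given_A2 = 0.2

# Denominator from the law of total probability
p_B = p_B_given_A1 * p_A1 + p_B_given_A2 * p_A2

# Bayes' rule
p_A1_given_B = p_B_given_A1 * p_A1 / p_B
print(p_A1_given_B)                   # 0.75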
5 Using Bayes’ theorem for statistical inference
Let us talk about the variables that assign values to the outcomes of an underlying generative process (a random event).
Suppose that an outcome x observed in an experiment is assumed to come from a normal distribu-
tion, such that
f(x; µ, σ²) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))
where f ( x ) is the probability density function; f ( x ) assigns the probability density value to the
outcome x conditional on the parameters mean µ and variance σ2 of the normal distribution. The
probability density of x conditional on µ and σ2 can be written as,
p(x|µ, σ²) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))
The goal of statistical inference is to figure out what value(s) of µ and σ² have generated the observed outcome x.
We know the probability density of obtaining x given µ and σ². Can we calculate the probability density of (a range of) values of µ and σ² conditional on the observed outcome x?
p(µ, σ2 | x ) =?
Using Bayes’ theorem,
p(µ, σ²|x) = p(x|µ, σ²) · p(µ, σ²) / ∫∫ p(x|µ, σ²) · p(µ, σ²) dµ dσ²
More generally, suppose the observed outcome x is assumed to be a value of the random variable X
whose probability density function is f ( x; θ ); f ( x; θ ) assigns a probability density value to x condi-
tional on a parameter θ. The probability density of x given the parameter θ is given by p( x |θ ).
Our goal is to infer what value(s) of the parameter θ has generated the given (observed) datapoint x.
p(θ|x) = p(x|θ) · p(θ) / ∫ p(x|θ) · p(θ) dθ (18)
The term p( x |θ ) is called the likelihood function, p(θ ) is called the prior distribution of θ, and
p(θ | x ) is called the posterior distribution of θ.
Note: When f ( x; θ ) is seen as a function of x, it is called a probability density function; and when
f ( x; θ ) is seen as a function of θ, it is called a likelihood function, also denoted by L(θ | x ).
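To make equation (18) concrete, here is a minimal grid-approximation sketch in Python. All numbers are assumed for illustration: a single observation x is taken to come from a Normal(θ, σ²) likelihood with σ known, the prior p(θ) is a Normal(0, 3²) density, and the integral in the denominator is approximated by a sum over a grid of θ values.

from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu) ** 2 / (2 * sigma ** 2))

x_obs = 2.5        # observed datapoint (assumed)
sigma = 1.0        # known standard deviation of the likelihood (assumed)

# Grid of candidate values for the unknown parameter theta (the mean)
dtheta = 0.01
grid = [-5 + i * dtheta for i in range(1501)]            # theta from -5 to 10

prior = [normal_pdf(t, 0.0, 3.0) for t in grid]          # p(theta): Normal(0, 3^2), assumed
likelihood = [normal_pdf(x_obs, t, sigma) for t in grid] # p(x | theta), read as a function of theta

unnormalized = [l * p for l, p in zip(likelihood, prior)]
evidence = sum(u * dtheta for u in unnormalized)         # approximates the integral in (18)
posterior = [u / evidence for u in unnormalized]         # p(theta | x) evaluated on the grid

# Posterior mean of theta: pulled from the observation 2.5 toward the prior mean 0
print(sum(t * q * dtheta for t, q in zip(grid, posterior)))   # ≈ 2.25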