4 Probability Theory
700 / (700 + 300) = 0.7
• We can also say that if we pick someone at random from this group of men and women, the probability of choosing a man is 70%.
• In this case, when we think of picking someone at random, the proportion becomes a probability. Let’s consider some important definitions.
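As a quick sketch, the proportion-as-probability idea can be written in a few lines of Python, using the hypothetical 700 men and 300 women from the text:

```python
# Hypothetical group from the text: 700 men and 300 women.
men, women = 700, 300

# If one person is picked uniformly at random, the proportion of men
# becomes the probability of choosing a man.
p_man = men / (men + women)
print(p_man)  # 0.7
```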
Random Experiments
• A random experiment is one in which every outcome has the same chance of being chosen. Each outcome of a random experiment is “equally likely” or has the “same chance”. We can also say that the selection process is honest, fair, or unbiased. Tossing a coin is an example of a random experiment.
• An outcome is the result of a single trial of the random experiment. The set of all possible outcomes of a single trial of a random experiment is called the sample space or outcome space. It is denoted by Ω. We can also call it the universe and denote it by 𝑈.
• Something that may or may not happen is known as an event. An event 𝐴 is mathematically represented by a subset of the sample space Ω.
Venn Diagrams
• Probability is defined as a function of events and the events are
represented by sets.
• First of all, we should represent the events as subsets of a sample
space Ω.
• The Venn diagrams can be used to show the relationship between
events and the relationship of events with the sample space.
• The rules of probability help define the relations between events. For example, suppose 𝐶 occurs if either 𝐴 or 𝐵 occurs. Then 𝐶 is the union of 𝐴 and 𝐵, which is represented by 𝐶 = 𝐴 ∪ 𝐵.
Axioms of Probability
Axioms
• If all the outcomes in a sample space Ω are equally likely, the probability of 𝐴 is the ratio of the number of outcomes in 𝐴 to the total number of outcomes
𝑃(𝐴) = #𝐴 / #Ω
• The probability is a real number between 0 and 1, i.e. 0 ≤ 𝑃(𝐴) ≤ 1.
• A probability equal to 1 represents certainty: 𝑃(Ω) = 1.
• A probability of 0 represents impossibility: 𝑃(𝐴) = 0 means that 𝐴 cannot happen at all. In that case, 𝐴 would be represented by the empty set (the set with 0 elements), denoted by ∅. So 𝑃(∅) = 0.
• Intermediate probabilities between 0 and 1 represent varying degrees of certainty.
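The counting definition above can be sketched in Python; the die-roll sample space here is an illustrative assumption, not from the text:

```python
from fractions import Fraction

# Assumed example: one roll of a fair die, all six outcomes equally likely.
omega = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}                # event: the roll is even

# P(A) = #A / #Omega when all outcomes are equally likely
p_A = Fraction(len(A), len(omega))
print(p_A)  # 1/2

assert 0 <= p_A <= 1                          # 0 <= P(A) <= 1
assert Fraction(len(omega), len(omega)) == 1  # P(Omega) = 1
```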
Rule of Areas
• Let’s assume an event 𝐵 has been partitioned into 𝑛 events 𝐵1, ⋯, 𝐵𝑛.
• Then 𝐵 = 𝐵1 ∪ 𝐵2 ∪ ⋯ ∪ 𝐵𝑛, and all the events 𝐵1, ⋯, 𝐵𝑛 are mutually exclusive.
• If we represent this relationship as 𝐵 being subdivided into smaller
areas 𝐵1 , ⋯ , 𝐵𝑛 , the area in 𝐵 is the sum of the areas of smaller areas.
• This gives us the addition rule of areas.
• The addition rule states that if an event can occur in different ways,
the probability of that event is the sum of probabilities of all the ways
the event can occur.
Rules of Probability
• Non-negative: 𝑃(𝐵) ≥ 0
• Addition: If 𝐵1, 𝐵2, ⋯, 𝐵𝑛 form a partition of 𝐵, then
𝑃(𝐵) = 𝑃(𝐵1) + 𝑃(𝐵2) + ⋯ + 𝑃(𝐵𝑛)
• Total One: 𝑃(Ω) = 1
• Inclusion-Exclusion: 𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)
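A minimal check of the inclusion-exclusion rule, again using an assumed die-roll sample space:

```python
from fractions import Fraction

# Assumed example: one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}    # event: even roll
B = {4, 5, 6}    # event: roll greater than 3

def p(event):
    """Probability by counting, for equally likely outcomes."""
    return Fraction(len(event), len(omega))

# Inclusion-Exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
lhs = p(A | B)
rhs = p(A) + p(B) - p(A & B)
print(lhs, rhs)  # 2/3 2/3
```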
Joint Probability and Independent
Events
Joint Probability
• The joint probability of events 𝐴 and 𝐵 is the probability that both events occur simultaneously in a single trial of the random experiment.
• We can denote this by the probability of the set of outcomes that belong to both events 𝐴 and 𝐵, i.e. 𝐴 ∩ 𝐵.
• So the joint probability of 𝐴 and 𝐵 is the probability of their intersection, i.e. 𝑃(𝐴 ∩ 𝐵).
Independent Events
If both the events 𝐴 and 𝐵 are independent, then the joint probability of events 𝐴 and 𝐵 is the product of the individual probabilities of both events.
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
If that is not true, the events are called dependent events.
For example, two random draws from a population are independent if a
replacement is done between the draws.
They are dependent (i.e., not independent) if they are done without
replacement.
Independent events have no influence on the outcome of each other: the occurrence of the first event does not affect the occurrence of the second event.
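The with-replacement versus without-replacement point can be illustrated with an assumed urn of 7 red balls out of 10:

```python
from fractions import Fraction

# Assumed urn: 7 red balls out of 10 in total.
red, total = 7, 10

# With replacement the two draws are independent:
# P(red, red) = P(red) × P(red)
p_with = Fraction(red, total) * Fraction(red, total)

# Without replacement the second draw depends on the first:
# one red ball and one ball in total have been removed.
p_without = Fraction(red, total) * Fraction(red - 1, total - 1)

print(p_with, p_without)  # 49/100 7/15
```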
Marginal Probability
• In the case of a joint probability, the probability of one of the events on its own is called its marginal probability. The marginal probability of an event is the sum of the probabilities of its disjoint parts.
𝐴 = (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐵𝑐) and 𝑃(𝐴) = 𝑃(𝐴 ∩ 𝐵) + 𝑃(𝐴 ∩ 𝐵𝑐)
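A small numeric sketch of the marginal rule; the joint probabilities here are assumed purely for illustration:

```python
# Assumed joint probabilities for illustration.
p_A_and_B = 0.12    # P(A ∩ B)
p_A_and_Bc = 0.18   # P(A ∩ B^c)

# Marginal probability: P(A) = P(A ∩ B) + P(A ∩ B^c)
p_A = p_A_and_B + p_A_and_Bc
print(round(p_A, 2))  # 0.3
```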
Conditional Probability and
Independent Events
Conditional Probability
• On many occasions, we are interested in knowing the probability of
an event given that some other event has occurred.
• We want to know how the occurrence of one event affects the probability that another event occurs.
• Suppose event 𝐴 has occurred.
• Our universe is reduced to 𝐴.
• Everything outside of 𝐴 is ruled out.
• We are now interested only in outcomes inside event 𝐴.
• This leads to a reduced universe 𝑈𝑟 = 𝐴.
Conditional Probability
• Since event 𝐴 has already occurred, the total probability of the reduced universe is 1.
• Therefore, we can also state that the probability of 𝐵 given 𝐴 is the unconditional probability of the part of 𝐵 that remains inside 𝐴, multiplied by 1/𝑃(𝐴).
• So, the conditional probability of event 𝐵 given event 𝐴 is
𝑃(𝐵|𝐴) = 𝑃(𝐴 ∩ 𝐵) / 𝑃(𝐴)
• So, the conditional probability 𝑃(𝐵|𝐴) is proportional to the joint probability 𝑃(𝐴 ∩ 𝐵), rescaled by multiplying with 1/𝑃(𝐴) so that the probability of the reduced universe equals 1.
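The conditional-probability formula can be checked by counting, again with an assumed die-roll example:

```python
from fractions import Fraction

# Assumed example: one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}
A = {4, 5, 6}   # given: the roll is greater than 3 (the reduced universe)
B = {2, 4, 6}   # event of interest: the roll is even

def p(event):
    """Probability by counting, for equally likely outcomes."""
    return Fraction(len(event), len(omega))

# P(B|A) = P(A ∩ B) / P(A)
p_B_given_A = p(A & B) / p(A)
print(p_B_given_A)  # 2/3
```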
Conditional Probability for Independent
Events
• If both the events 𝐴 and 𝐵 are independent events, we have
𝑃(𝐵|𝐴) = 𝑃(𝐵)
𝑃(𝐵 ∩ 𝐴) = 𝑃(𝐵|𝐴) 𝑃(𝐴)
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴|𝐵) 𝑃(𝐵)
Bayes’ Theorem
Bayes’ Theorem
• The definition of conditional probability states
𝑃(𝐵|𝐴) = 𝑃(𝐴 ∩ 𝐵) / 𝑃(𝐴)
• We can find the marginal probability of event 𝐴 by summing the probabilities of its disjoint parts. Since 𝐴 ∩ 𝐵 and 𝐴 ∩ 𝐵𝑐 are disjoint,
𝐴 = (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐵𝑐)
𝑃(𝐴) = 𝑃(𝐴 ∩ 𝐵) + 𝑃(𝐴 ∩ 𝐵𝑐)
𝑃(𝐵|𝐴) = 𝑃(𝐴 ∩ 𝐵) / (𝑃(𝐴 ∩ 𝐵) + 𝑃(𝐴 ∩ 𝐵𝑐))
Bayes’ Theorem
• By using the multiplication rule, we get Bayes’ Theorem for a single event
𝑃(𝐵|𝐴) = 𝑃(𝐴|𝐵) × 𝑃(𝐵) / (𝑃(𝐴|𝐵) × 𝑃(𝐵) + 𝑃(𝐴|𝐵𝑐) × 𝑃(𝐵𝑐))
• Bayes’ theorem is a restatement of conditional probability where
• We find the probability of 𝐴 by adding the probabilities of its disjoint sections, 𝐴 ∩ 𝐵 and 𝐴 ∩ 𝐵𝑐
• We find each joint probability by using the multiplication rule
• Please note that 𝐵 and 𝐵𝑐 are disjoint and their union represents the whole universe 𝑈 (or Ω)
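The single-event form of Bayes’ theorem translates directly into a small function; the numbers in the usage line are hypothetical:

```python
def bayes_single(p_B, p_A_given_B, p_A_given_Bc):
    """Posterior P(B|A) for a single event B:
    P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|B^c)P(B^c))."""
    numerator = p_A_given_B * p_B
    denominator = numerator + p_A_given_Bc * (1 - p_B)
    return numerator / denominator

# Hypothetical numbers: prior 0.5, likelihoods 0.8 and 0.1.
print(round(bayes_single(0.5, 0.8, 0.1), 4))  # 0.8889
```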
Set of events partitioning the universe
• On several occasions, more than two events partition the universe. For example, there are 𝑛 events 𝐵1, ⋯, 𝐵𝑛 that partition the universe such that:
• The union is the universe: 𝐵1 ∪ 𝐵2 ∪ ⋯ ∪ 𝐵𝑛 = 𝑈
• Every distinct pair of events is disjoint: 𝐵𝑖 ∩ 𝐵𝑗 = ∅ for 𝑖 = 1, ⋯, 𝑛, 𝑗 = 1, ⋯, 𝑛 and 𝑖 ≠ 𝑗
• So, we can say 𝐴 = (𝐴 ∩ 𝐵1) ∪ (𝐴 ∩ 𝐵2) ∪ ⋯ ∪ (𝐴 ∩ 𝐵𝑛). Hence
𝑃(𝐴) = 𝑃(𝐴 ∩ 𝐵1) + 𝑃(𝐴 ∩ 𝐵2) + ⋯ + 𝑃(𝐴 ∩ 𝐵𝑛)
Set of events partitioning the universe
• This is called “the law of total probability”: the probability of an event 𝐴 is the sum of the probabilities of its disjoint parts. By using the multiplication rule, we can deduce
𝑃(𝐴) = 𝑃(𝐴|𝐵1) × 𝑃(𝐵1) + ⋯ + 𝑃(𝐴|𝐵𝑛) × 𝑃(𝐵𝑛)
• The conditional probability 𝑃(𝐵𝑖|𝐴) for 𝑖 = 1, ⋯, 𝑛 is
𝑃(𝐵𝑖|𝐴) = 𝑃(𝐴 ∩ 𝐵𝑖) / 𝑃(𝐴)
• Using the multiplication rule in the numerator and the law of total probability in the denominator,
𝑃(𝐵𝑖|𝐴) = 𝑃(𝐴|𝐵𝑖) × 𝑃(𝐵𝑖) / (𝑃(𝐴|𝐵1) × 𝑃(𝐵1) + ⋯ + 𝑃(𝐴|𝐵𝑛) × 𝑃(𝐵𝑛))
• This is known as Bayes’ Theorem.
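The general form can be sketched as a function over a partition; the priors and likelihoods below are assumed for illustration:

```python
def bayes_partition(priors, likelihoods, i):
    """P(B_i|A) for a partition B_1, ..., B_n.
    priors[j] = P(B_j), likelihoods[j] = P(A|B_j)."""
    # Denominator is the law of total probability:
    # P(A) = sum over j of P(A|B_j) P(B_j)
    p_A = sum(p * l for p, l in zip(priors, likelihoods))
    return likelihoods[i] * priors[i] / p_A

# Hypothetical partition of the universe into three events.
priors = [0.2, 0.3, 0.5]
likelihoods = [0.9, 0.5, 0.1]
posteriors = [bayes_partition(priors, likelihoods, i) for i in range(3)]
print([round(p, 3) for p in posteriors])
```

Because the B_j partition the universe, the posteriors always sum to 1.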
Bayesian Statistics
Prior Probabilities
• Using Bayes’ theorem, we can revise our beliefs on the basis of the
evidence.
• We have an unobservable event 𝐵.
• Suppose the unconditional probability 𝑃(𝐵) is known before the experiment is started, based on our prior belief.
• We also call it the prior probability because it is the probability that is available before we observe 𝐴.
• We state our assumption about 𝐵, before observing the data, in the
form of prior probability.
Likelihood Probabilities
The effect of the observed data is the conditional probability, 𝑃 𝐴 𝐵 .
The conditional probability of 𝐴 given 𝐵 is the likelihood of the
unobservable events.
The likelihood function expresses the probability of observed dataset 𝐴
for different values of 𝐵.
Posterior Probabilities
• The posterior probability of event 𝐵 given the occurrence of event 𝐴 is
denoted by 𝑃 𝐵 𝐴 . The posterior probability is obtained after we have
observed 𝐴.
• The Bayes’ theorem converts a prior probability into a posterior probability
• Given these definitions of prior, likelihood, and posterior, we can state the
Bayes’ theorem in words
• If you have to undergo surgery and you want to know the chances of a successful surgery, the notion of undergoing repeated surgeries is completely absurd
• Even though the doctor may have data on the success rate of such surgeries performed in the past, every patient is unique
• If you consult one doctor, he may say, based on your medical and physical health, that there is a 95% chance of successful surgery, whereas another doctor might give a different opinion, say 98%
• You may take several opinions and form your own opinion.
Subjective Interpretation
• A probability statement that involves some kind of intuitive judgment of uncertainty expresses a subjective probability or degree of belief.
• Like the long-run frequency probability models, the subjective probability
models have their limitations.
• Subjective models are imprecise.
• You cannot pool the subjective opinions of several individuals about the same event.
• Despite such difficulties, many people find the idea of subjective opinion
reasonable.
• The subjective interpretation also gives an idea about the notion of
conditional probability and captures the idea of refining your probabilistic
opinion over time as you acquire new information or data.
Example
• Suppose a laboratory test on a blood sample yields two possible results, positive or negative. According to industry reports, blood samples from 95% of the people with a particular disease yield positive results, but 2% of the people without the disease also test positive (false positives). 1% of the total population is infected by the disease. Determine the probability that a person chosen randomly from the population has the disease, given that the person’s blood sample tests positive.
Solution
Observed Event (𝐴): positive blood test result (+)
Unobserved Event (𝐵): person actually has the disease (𝐷)
Bayes’ Formula:
𝑃(𝐷|+) = 𝑃(+|𝐷) 𝑃(𝐷) / (𝑃(+|𝐷) 𝑃(𝐷) + 𝑃(+|𝐷𝑐) 𝑃(𝐷𝑐))
𝑃(𝐷|+) = (0.95)(0.01) / ((0.95)(0.01) + (0.02)(0.99)) = 95/293 ≈ 32%
Thus about 32% of the people who test positive are actually infected by the disease.
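The arithmetic in the solution can be verified directly:

```python
p_D = 0.01             # prior: 1% of the population has the disease
p_pos_given_D = 0.95   # P(+|D): true positive rate
p_pos_given_Dc = 0.02  # P(+|D^c): false positive rate

# Bayes' theorem for a single event
numerator = p_pos_given_D * p_D
denominator = numerator + p_pos_given_Dc * (1 - p_D)
p_D_given_pos = numerator / denominator
print(round(p_D_given_pos, 4))  # 0.3242, i.e. about 32%
```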
Explanation
• Typically, the likelihood 𝑃(𝐴|𝐵) is based on the long-run frequency interpretation
• If the prior probability 𝑃 𝐵 is based on long-run frequency
interpretation, the posterior probability 𝑃(𝐵|𝐴) will also be based on
long-run frequency interpretation
• In the example, prior, posterior, and likelihood probabilities admit
long-run frequency interpretation.
Example
A patient from the population given above walks into a doctor’s clinic with some symptoms of the disease. After examining the patient, and without checking the blood test report, the doctor gives his opinion that there is a 30% chance that the patient is suffering from the disease. How should he revise his opinion after checking the blood report?
Solution
The doctor should use Bayes’ Theorem. In this case, the prior is based on the belief of the doctor, not on the long-run frequency interpretation, but the likelihood is still based on the long-run frequency interpretation.
So, given the positive blood test result, the doctor should revise his opinion and state that the patient has a 95.317% probability of the disease.
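Replacing the 1% population prior with the doctor’s 30% subjective prior, while keeping the same likelihoods as in the blood test example, reproduces the quoted posterior:

```python
p_D = 0.30             # doctor's subjective prior
p_pos_given_D = 0.95   # same likelihoods as in the blood test example
p_pos_given_Dc = 0.02

posterior = (p_pos_given_D * p_D) / (
    p_pos_given_D * p_D + p_pos_given_Dc * (1 - p_D))
print(round(posterior, 5))  # 0.95318
```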
Explanation
• To come up with the prior, the doctor should make a judgment based on
his clinical examination and his personal experience
• It is impossible to create a mathematical model for this process
• The Bayes’ theorem does not help the doctor come up with the prior
opinion
• The theory suggests how the opinion should be revised given a single additional piece of information, such as a blood test report in this case.
• The prior and posterior are relative terms
• The posterior distribution after the 1st experiment will become the prior
distribution before the 2nd experiment
• So, an opinion can be revised after using Bayes’ theorem.
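Under the assumed simplification that repeated test results are independent given the disease status, the posterior-becomes-prior idea can be sketched as repeated applications of the same update:

```python
def update(prior, p_A_given_B, p_A_given_Bc):
    """One Bayesian update: the returned posterior becomes the next prior."""
    return (p_A_given_B * prior) / (
        p_A_given_B * prior + p_A_given_Bc * (1 - prior))

# Hypothetical: two positive test results in a row, starting from the
# doctor's 30% prior and reusing the blood test likelihoods.
belief = 0.30
for _ in range(2):
    belief = update(belief, 0.95, 0.02)
print(round(belief, 4))
```

Each positive result pushes the belief closer to 1, which captures the idea of refining a probabilistic opinion as new data arrives.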
Thanks
Samatrix Consulting Pvt Ltd