4 Probability Theory

This document provides an overview of probability theory. It defines key concepts like random experiments, sample spaces, events, and axioms of probability. It explains how to calculate probabilities using concepts like proportions, Venn diagrams, and the addition rule. Conditional probability and Bayes' theorem are also covered. Independent and dependent events are distinguished. Marginal, joint, and conditional probabilities are defined.

Uploaded by

Vanshika Sharma

Probability Theory

Samatrix Consulting Pvt Ltd


Uncertainty and Probability
• In our everyday life, we deal with many situations that are completely
unpredictable and uncertain.
• For example, consider this evening’s weather.
• We cannot be certain whether or not it will rain this evening.
• We may contact the Meteorological Department to get the latest
weather forecast, which is based on scientific methods.
• Still, the prediction may not be completely accurate.
• Hence, there is always a component of uncertainty in most real
situations.
• So, we need probability and appropriate mathematical models to make
sense of uncertain events.
Probability
• Probability is an extension of proportion, the ratio of a part to a
whole. For example, if a group contains 700 men and 300 women, the
proportion of men is

700 / (700 + 300) = 0.7

• We can also say that if we pick someone at random from this group of men
and women, the probability of choosing a man is 70%.
• When we think of picking someone at random, the proportion
becomes a probability. Let’s consider some important definitions.
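The proportion-to-probability idea can be checked with a short simulation. This is a sketch of ours, reusing the 700-men/300-women group from the text; the variable names are our own.

```python
import random

# Group from the example: 700 men, 300 women
group = ["man"] * 700 + ["woman"] * 300

# Exact proportion of men in the group
proportion_men = group.count("man") / len(group)
print(proportion_men)  # 0.7

# Estimate the probability of drawing a man by sampling at random
random.seed(42)
draws = [random.choice(group) for _ in range(100_000)]
estimate = draws.count("man") / len(draws)
print(round(estimate, 2))
```

With random (uniform) picking, the sampled frequency converges to the proportion, which is exactly the sense in which a proportion becomes a probability.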
Random Experiments
• In a random experiment, every outcome has the same chance of
occurring. Each outcome of a random experiment is “equally
likely” or has the “same chance”. We can also say that the selection
process is honest, fair, or unbiased. Tossing a coin is an example of a
random experiment.
• An outcome is the result of one single trial of the random
experiment. The set of all possible outcomes of a single trial of a
random experiment is the sample space or outcome space. It is denoted
by Ω. We can also call it the universe and denote it by U.
• Something that may or may not happen is known as an event. An
event A is mathematically represented by a subset of the sample space Ω.
Venn Diagrams
• Probability is defined as a function of events and the events are
represented by sets.
• First of all, we should represent the events as subsets of a sample
space Ω.
• The Venn diagrams can be used to show the relationship between
events and the relationship of events with the sample space.
• The rules of probability define the relations between events.
For example, C occurs if either A or B occurs; C is then the union of A
and B, represented by C = A ∪ B.
Axioms of Probability
Axioms
• If all the outcomes in a sample space Ω are equally likely, the probability of
A is the ratio of the number of outcomes in A to the total number of
outcomes:

P(A) = #(A) / #(Ω)

• Probability is a real number between 0 and 1, i.e. 0 ≤ P(A) ≤ 1.
• A probability of 1 represents certainty: P(Ω) = 1.
• A probability of 0 represents impossibility: P(A) = 0 means that A cannot
happen at all. In that case, A is the empty set (the set
with 0 elements), denoted by ∅. So P(∅) = 0.
• Intermediate probabilities between 0 and 1 represent various degrees of
certainty.
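The ratio #(A)/#(Ω) and the axioms can be checked on a small sample space. A minimal sketch; the fair-die event is our own illustration, not from the slides.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}            # sample space of one fair die roll
A = {n for n in omega if n % 2 == 0}  # event: an even number comes up

# P(A) = #(A) / #(Ω) for equally likely outcomes
P_A = Fraction(len(A), len(omega))
print(P_A)  # 1/2

# Axioms: 0 <= P(A) <= 1, P(Ω) = 1, P(∅) = 0
assert 0 <= P_A <= 1
assert Fraction(len(omega), len(omega)) == 1
assert Fraction(len(set()), len(omega)) == 0
```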
Rule of Areas
• Let’s assume an event B that has been partitioned into n mutually
exclusive events B₁, ⋯, Bₙ, so that B = B₁ ∪ B₂ ∪ ⋯ ∪ Bₙ.
• If we represent this relationship as B being subdivided into smaller
areas B₁, ⋯, Bₙ, the area of B is the sum of the areas of the smaller parts.
• This gives us the addition rule.
• The addition rule states that if an event can occur in several mutually
exclusive ways, the probability of that event is the sum of the probabilities
of all the ways the event can occur.
Rules of Probability
• Non-negative: P(B) ≥ 0
• Addition: If B₁, B₂, ⋯, Bₙ form a partition of B, then
P(B) = P(B₁) + P(B₂) + ⋯ + P(Bₙ)
• Total One: P(Ω) = 1
• Some other useful general rules of probability, derived from these basic
rules, are summarized below.
• Complement Rule: P(not A) = P(Aᶜ) = 1 − P(A)
Rules of Probability
• Difference Rule: If the occurrence of A implies the occurrence of B (so that
P(A) ≤ P(B)), then the probability that B occurs and A does not occur is the
difference between the probability of B and the probability of A:

P(B and not A) = P(B ∩ Aᶜ) = P(B) − P(A)

• Inclusion-Exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
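These rules can all be verified on a small equally-likely sample space. A sketch; the specific sets of numbers below are our own choice.

```python
from fractions import Fraction

omega = set(range(1, 11))             # 10 equally likely outcomes
P = lambda E: Fraction(len(E), len(omega))

A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7}

# Complement rule: P(A^c) = 1 - P(A)
assert P(omega - A) == 1 - P(A)

# Difference rule (occurrence of A_sub implies occurrence of B):
A_sub = {3, 4}                        # A_sub is a subset of B
assert P(B - A_sub) == P(B) - P(A_sub)

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)
print("all rules hold")
```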
Joint Probability and Independent Events
Joint Probability
• The joint probability of events A and B is the probability that both events
occur simultaneously when we perform the random experiment.
• It is the probability of the set of outcomes that belong to both
events A and B, i.e. A ∩ B.
• So the joint probability of A and B is the probability of their
intersection, P(A ∩ B).
Independent Events
If events A and B are independent, then the joint probability of
A and B is the product of the individual probabilities of the two events:

P(A ∩ B) = P(A) × P(B)

If that is not true, the events are called dependent events.
For example, two random draws from a population are independent if a
replacement is made between the draws.
They are dependent (i.e., not independent) if the draws are made without
replacement.
Independent events have no influence on each other’s outcomes: for
independent events, the occurrence of the first event does not affect
the occurrence of the second event.
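The with/without-replacement distinction can be made concrete with exact arithmetic. A sketch; the 3-red/2-blue urn and the event "both draws red" are hypothetical numbers of ours.

```python
from fractions import Fraction

red, blue = 3, 2                       # hypothetical urn: 3 red, 2 blue balls
total = red + blue
p_red = Fraction(red, total)           # P(red on a single draw)

# With replacement: the two draws are independent,
# so the joint probability is the product of the marginals.
joint_with = p_red * p_red
print(joint_with)   # 9/25

# Without replacement: the second draw depends on the first.
joint_without = Fraction(red, total) * Fraction(red - 1, total - 1)
print(joint_without)  # 3/10

# The product rule fails, so the draws are dependent.
assert joint_without != p_red * p_red
```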
Marginal Probability
• In the case of a joint probability, the probability of one of the events by
itself is called its marginal probability. The marginal probability of an event
is the sum of the probabilities of its disjoint parts:

A = (A ∩ B) ∪ (A ∩ Bᶜ) and P(A) = P(A ∩ B) + P(A ∩ Bᶜ)
Conditional Probability and Independent Events
Conditional Probability
• On many occasions, we are interested in knowing the probability of
an event given that some other event has occurred.
• We want to know how the occurrence of one event affects the
probability that another event occurs.
• Suppose event 𝐴 has occurred.
• Our universe is reduced to 𝐴.
• Everything outside of 𝐴 is ruled out.
• We are now interested only in outcomes inside event 𝐴.
• This leads to a reduced universe 𝑈𝑟 = 𝐴.
Conditional Probability
• Since event A has already occurred, the total probability of the reduced
universe is 1.
• Therefore, the probability of B given A is the unconditional probability of
the part of B that lies inside A, multiplied by 1/P(A).
• So, the conditional probability of event B given event A is

P(B|A) = P(A ∩ B) / P(A)

• The conditional probability P(B|A) is therefore proportional to the joint
probability P(A ∩ B), rescaled by the factor 1/P(A) so
that the probability of the reduced universe equals 1.
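The definition P(B|A) = P(A ∩ B)/P(A) can be sketched on a concrete sample space. The die events below are our own illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))              # one roll of a fair die
P = lambda E: Fraction(len(E), len(omega))

A = {2, 4, 6}                         # event A: an even number
B = {4, 5, 6}                         # event B: a number greater than 3

# Conditional probability: restrict the universe to A, rescale by 1/P(A)
P_B_given_A = P(A & B) / P(A)
print(P_B_given_A)  # 2/3

# Marginal via disjoint parts: P(B) = P(B ∩ A) + P(B ∩ A^c)
assert P(B) == P(B & A) + P(B & (omega - A))
```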
Conditional Probability for Independent Events
• If events A and B are independent, we have

P(B|A) = P(B)

• For independent events, P(B ∩ A) = P(B) × P(A), so the factor P(A)
cancels out in the ratio P(B ∩ A)/P(A). If events A and B are independent,
knowledge about A does not affect the probability of occurrence of
event B.
Multiplication Rule
• In many applications, the conditional probability P(B|A) and the
probability P(A) are more readily available than the joint
probability P(B ∩ A).
• In such cases, we can calculate P(B ∩ A) by using the following
rearrangement of the general formula for conditional probability:

P(B ∩ A) = P(B|A) P(A)
P(A ∩ B) = P(A|B) P(B)
Bayes’ Theorem
Bayes’ Theorem
• The theorem of conditional probability states

P(B|A) = P(A ∩ B) / P(A)

• We can find the marginal probability of event A by summing the probabilities of its disjoint parts. Since A ∩ B
and A ∩ Bᶜ are disjoint,

A = (A ∩ B) ∪ (A ∩ Bᶜ)
P(A) = P(A ∩ B) + P(A ∩ Bᶜ)

• By substituting this value of P(A) into the definition of conditional probability, we get

P(B|A) = P(A ∩ B) / (P(A ∩ B) + P(A ∩ Bᶜ))
Bayes’ Theorem
• By using the multiplication rule, we get Bayes’ Theorem for a single
event:

P(B|A) = P(A|B) × P(B) / (P(A|B) × P(B) + P(A|Bᶜ) × P(Bᶜ))

• Bayes’ theorem is a restatement of conditional probability in which:
• We find the probability of A by adding the probabilities of its disjoint
parts, A ∩ B and A ∩ Bᶜ
• We find each joint probability by using the multiplication rule
• Note that B and Bᶜ are disjoint and the union of B and Bᶜ is the whole
universe U (or Ω)
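The single-event form of Bayes’ theorem fits in a small helper function. A sketch; the function name and the sanity-check numbers are our own.

```python
def bayes_single(p_B, p_A_given_B, p_A_given_Bc):
    """Return P(B|A) from the prior P(B) and the two likelihoods
    P(A|B) and P(A|B^c), via Bayes' theorem for a single event."""
    p_Bc = 1 - p_B
    numerator = p_A_given_B * p_B
    denominator = p_A_given_B * p_B + p_A_given_Bc * p_Bc
    return numerator / denominator

# Sanity check: if A carries no information about B
# (same likelihood under B and B^c), then P(B|A) = P(B).
print(bayes_single(0.4, 0.5, 0.5))
```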
Set of events partitioning the universe
• On several occasions, more than two events partition the universe.
For example, suppose n events B₁, ⋯, Bₙ partition the universe
such that:
• The union is the universe: B₁ ∪ B₂ ∪ ⋯ ∪ Bₙ = U
• Every distinct pair of events is disjoint: Bᵢ ∩ Bⱼ = ∅ for i =
1, ⋯, n, j = 1, ⋯, n and i ≠ j
• So, we can write A = (A ∩ B₁) ∪ (A ∩ B₂) ∪ ⋯ ∪ (A ∩ Bₙ). Hence

P(A) = Σⱼ P(A ∩ Bⱼ), where the sum runs over j = 1, ⋯, n
Set of events partitioning the universe
• This is called the “law of total probability”: the probability of an event A is the sum of the
probabilities of its disjoint parts. By using the multiplication rule, we can deduce

P(A) = Σⱼ P(A|Bⱼ) × P(Bⱼ)

• The conditional probability P(Bᵢ|A) for i = 1, ⋯, n is

P(Bᵢ|A) = P(A ∩ Bᵢ) / P(A)

• Using the multiplication rule in the numerator and the law of total probability in the
denominator,

P(Bᵢ|A) = P(A|Bᵢ) × P(Bᵢ) / (Σⱼ P(A|Bⱼ) × P(Bⱼ))

• This is known as Bayes’ Theorem.
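The general form over a partition B₁, ⋯, Bₙ can be sketched as follows. The three-machine defect-rate numbers are a hypothetical example of ours.

```python
def bayes_partition(priors, likelihoods):
    """Return P(B_i|A) for every cell of a partition, using the
    law of total probability in the denominator."""
    total = sum(p * l for p, l in zip(priors, likelihoods))  # P(A)
    return [p * l / total for p, l in zip(priors, likelihoods)]

# Hypothetical: three machines make 50%, 30%, 20% of all items,
# with defect rates 1%, 2%, 3%.  Given a defective item (event A),
# which machine (B_1, B_2, B_3) produced it?
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.03]
posterior = bayes_partition(priors, likelihoods)
print([round(p, 4) for p in posterior])

# The posterior probabilities over the partition sum to 1
assert abs(sum(posterior) - 1) < 1e-12
```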
Bayesian Statistics
Prior Probabilities
• Using Bayes’ theorem, we can revise our beliefs on the basis of the
evidence.
• We have an unobservable event 𝐵.
• Suppose the unconditional probability P(B) is known before the
experiment starts, based on our prior belief.
• We call it the prior probability because it is the probability that is
available before we observe A.
• We state our assumption about B, before observing the data, in the
form of a prior probability.
Likelihood Probabilities
The effect of the observed data enters through the conditional probability P(A|B).
The conditional probability of A given B is the likelihood of the
unobservable event.
The likelihood function expresses the probability of the observed dataset A
for different values of B.
Posterior Probabilities
• The posterior probability of event B given the occurrence of event A is
denoted by P(B|A). The posterior probability is obtained after we have
observed A.
• Bayes’ theorem converts a prior probability into a posterior probability.
• Given these definitions of prior, likelihood, and posterior, we can state
Bayes’ theorem in words:

posterior ∝ likelihood × prior

• The denominator in Bayes’ theorem is a normalization constant that
ensures that the posterior probability distribution sums (or integrates) to one and
remains a valid probability distribution.
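The statement "posterior ∝ likelihood × prior" amounts to multiplying elementwise and then renormalizing. A sketch with made-up numbers over two hypotheses.

```python
def posterior_from(prior, likelihood):
    """posterior ∝ likelihood × prior, renormalized to sum to one."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    norm = sum(unnormalized)          # the normalization constant
    return [u / norm for u in unnormalized]

prior = [0.7, 0.3]        # hypothetical prior over two hypotheses
likelihood = [0.2, 0.9]   # hypothetical likelihood of the observed data
post = posterior_from(prior, likelihood)
print([round(p, 3) for p in post])

# The normalization constant makes the posterior a valid distribution
assert abs(sum(post) - 1) < 1e-12
```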
Interpretation of Probability
• There are two significant interpretations of probability.
1. Frequency Interpretation
2. Subjective Interpretation
Frequency Interpretation

• In the frequency interpretation, probability is an approximation to
long-run relative frequencies.
• The relative frequency interpretation measures how often,
or how frequently, an event happens in a sequence of observations.
• For example, imagine tossing a coin or rolling a die again and again,
an infinite number of times.
• This interpretation of probability is used in frequentist statistics.
Subjective Interpretation
• The notion of long-run relative frequencies makes good sense in the context of repeated
trials. But the concept of repeated trials does not always make sense. For example
• The probability of a particular patient recovering after surgery
• The probability of a major earthquake in the country next year
• The probability of a car driver involved in a car accident next year

• If you have to undergo surgery and you want to know the chances of a successful
surgery, the notion of your undergoing repeated surgeries is completely absurd.
• Even though the doctor may have data on the success rate of such surgeries
performed in the past, every patient is unique.
• If you consult one doctor, based on your medical and physical health, the doctor may
quote a 95% chance of successful surgery, whereas another doctor might have a
different opinion, say 98%.
• You may take several opinions and form your own opinion.
Subjective Interpretation
• A probability statement that involves some kind of intuitive judgment of
uncertainty is called a subjective probability, or a degree of belief.
• Like the long-run frequency probability models, the subjective probability
models have their limitations.
• Subjective models are imprecise.
• You cannot pool the subjective opinion of several individuals about the
same event.
• Despite such difficulties, many people find the idea of subjective opinion
reasonable.
• The subjective interpretation also gives an idea about the notion of
conditional probability and captures the idea of refining your probabilistic
opinion over time as you acquire new information or data.
Example
• Suppose a laboratory test on a blood sample yields two possible
results, positive or negative. According to industry reports, the
blood samples of 95% of the people with a particular disease yield
positive results, but 2% of the people without the disease also give
positive results (false positives). 1% of the total population is infected
by the disease. Determine the probability that a person chosen
randomly from the population has the disease, given that the blood
sample of the person tests positive.
Solution
Observed event (A): positive blood test result (+)
Unobserved event (B): the person actually has the disease (D)

Bayes’ formula:

P(D|+) = P(+|D) P(D) / (P(+|D) P(D) + P(+|Dᶜ) P(Dᶜ))

Given: P(+|D) = 0.95, P(+|Dᶜ) = 0.02, P(D) = 0.01, P(Dᶜ) = 0.99

P(D|+) = (0.95)(0.01) / ((0.95)(0.01) + (0.02)(0.99)) = 95/293 ≈ 32%

Thus only about 32% of the people who test positive are actually infected by the disease.
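The arithmetic of this example can be reproduced exactly with rational numbers, using the figures given in the problem:

```python
from fractions import Fraction

p_D = Fraction(1, 100)        # P(D): prevalence of the disease
p_pos_D = Fraction(95, 100)   # P(+|D): true positive rate
p_pos_Dc = Fraction(2, 100)   # P(+|D^c): false positive rate

numerator = p_pos_D * p_D
denominator = p_pos_D * p_D + p_pos_Dc * (1 - p_D)
p_D_pos = numerator / denominator

print(p_D_pos)         # 95/293
print(float(p_D_pos))  # ≈ 0.3242
```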
Explanation
• Typically, the likelihood P(A|B) is based on the long-run frequency
interpretation.
• If the prior probability P(B) is also based on the long-run frequency
interpretation, the posterior probability P(B|A) will admit a
long-run frequency interpretation as well.
• In this example, the prior, posterior, and likelihood probabilities all admit a
long-run frequency interpretation.
Example
A patient from the population described above walks into the clinic of a
doctor with some symptoms of the disease. After examining the
patient, and without checking the blood test report, the doctor gives his
opinion that there is a 30% chance that the patient is suffering from
the disease. How should he revise his opinion after checking the blood
report?
Solution
The doctor should use Bayes’ Theorem. In this case, the prior is based on the
belief of the doctor, not on the long-run frequency interpretation, but the likelihood
still rests on the long-run frequency interpretation.

P(D) = 0.3, P(Dᶜ) = 0.7, P(+|D) = 0.95, P(+|Dᶜ) = 0.02

P(D|+) = (0.3)(0.95) / ((0.3)(0.95) + (0.02)(0.7)) = 285/299 ≈ 0.9532

So, given the positive blood test result, the doctor should revise his opinion and
state that the patient has about a 95.3% probability of having the disease.
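The revised numbers can be checked directly, with the doctor’s 30% prior in place of the population prevalence:

```python
from fractions import Fraction

p_D = Fraction(3, 10)         # doctor's prior after the clinical exam
p_pos_D = Fraction(95, 100)   # likelihoods from the industry reports
p_pos_Dc = Fraction(2, 100)

p_D_pos = (p_pos_D * p_D) / (p_pos_D * p_D + p_pos_Dc * (1 - p_D))

print(p_D_pos)         # 285/299
print(float(p_D_pos))  # ≈ 0.9532
```

Only the prior changed between the two examples; the same likelihoods pushed a 30% prior up to about 95%, which is the sense in which Bayes’ theorem revises an opinion.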
Explanation
• To come up with the prior, the doctor should make a judgment based on
his clinical examination and his personal experience
• It is impossible to create a mathematical model for this process
• The Bayes’ theorem does not help the doctor come up with the prior
opinion
• The theory suggests how the opinion should be revised given a single
additional piece of information, such as the blood test report in this case.
• The prior and posterior are relative terms
• The posterior distribution after the 1st experiment will become the prior
distribution before the 2nd experiment
• So, an opinion can be revised after using Bayes’ theorem.
Thanks
Samatrix Consulting Pvt Ltd
