1 Classical Probability: Indian Institute of Technology Bombay
1 Classical Probability
The word classical in the title is a slight misnomer. The intention here is to differentiate
it from the modern version (or axiomatic version). Classical probability is mostly the one
that you have learned already. It is about counting and computing the frequency of occurrence,
or more generally computing ratios for real-valued variables. Though there are some
reservations about its universal applicability, the classical version is very useful and the time
spent there is of immense value. We can adapt and upgrade almost all that we learn here
to the axiomatic framework of modern probability.
The following notations are reserved for the rest of this course.
Before we start, two words of caution. The purpose of this chapter is not to prove
anything in a rigorous fashion, so we will later redefine many things that we learn now in a
more rigorous way. Several phrases are used in a loose sense, for example ‘associate’
or ‘uniformly’ at random. The latter has a precise meaning, but we will reach there only
after several lectures. For the time being, take such phrases in their literal sense from the context.
For example, ‘uniformly’ at random corresponds to some fair way of choosing among the
outcomes.
2. A die is rolled twice; we wish to know whether the sum of the faces is 7 or 11.
3. A bag has 3 blue balls and 2 red balls; we wish to know whether the ball will be red if one
ball is picked without looking into the bag.
In order to clearly articulate experiments like the ones listed above, we need at least three
entities.
1. Outcomes of the experiment, or observations (to be denoted as Ω).
The first entity is referred to as the Sample Space, which is the set of all outcomes of the
experiment. The elements of the sample space are also known as sample points. Similarly,
the events of interest belong to the so-called Event Space, or the set of all interesting
events. We are all familiar with the third entity, where the symbol P stands for probability
or probability measure.
While the first two quantities are natural and unambiguous, the third one needs careful
consideration.
$$P(A) = \frac{|A|}{|\Omega|}, \tag{2}$$
where ∣A∣ counts the number of outcomes in the set A. Keep in mind, it is the number of
outcomes that we count in assigning the probability to an event A.
While the theory is simple enough to comprehend, this may lead to inconsistencies,
necessitating a more foolproof approach. We will develop that framework later, but let us
first go through the frequency interpretation for some traditional examples.
2. Event space P(Ω): let us take it as the set of all subsets of Ω. An example event of
interest is {1, 6}. This is equivalent to asking: is the outcome 1 or 6?
3. A probability measure
$$P(A) = \frac{|A|}{|\Omega|},$$
where ∣A∣ denotes the cardinality of the set A (to be read as card(A)).
■
Notice that the third entity, i.e. the probability measure, to an extent summarizes
our past or prior knowledge about the experiment. In the actual experiment, there may be
other exogenous factors including
• the surface on which the die fell
But none of those listed is our concern. We simply strip them out of consideration and take
a naive view, which in turn gives us the power to generalize.
1. Ω = {(i, j) ∶ 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}
3. $P(A) = \frac{|A|}{|\Omega|}$.
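The counting rule for the two-roll sample space can be checked by brute-force enumeration. The following is a minimal sketch, assuming all 36 ordered pairs are equally likely; the helper name `classical_probability` and the choice of event (sum of faces is 7 or 11, taken from the list of experiments earlier in these notes) are illustrative choices, not notation from the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for two rolls of a die: all ordered pairs (i, j), 1 <= i, j <= 6.
omega = set(product(range(1, 7), repeat=2))

def classical_probability(event, sample_space):
    """Return |A| / |Omega| as an exact fraction (hypothetical helper;
    assumes every outcome in the sample space is equally likely)."""
    return Fraction(len(event & sample_space), len(sample_space))

# Event of interest: the sum of the two faces is 7 or 11.
sum_is_7_or_11 = {(i, j) for (i, j) in omega if i + j in (7, 11)}

p = classical_probability(sum_is_7_or_11, omega)
print(p)  # 2/9
```

Using `Fraction` keeps the ratio exact, which mirrors the counting argument more closely than a floating-point value would.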
$$P(A) = \frac{|A|}{|\Omega|} = \frac{\binom{n}{k}}{2^n}.$$
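A ratio of the form C(n, k) / 2^n is what counting gives when Ω is the set of all binary sequences of length n and A collects the sequences with exactly k marked positions. The sketch below checks this by enumeration, under the assumption that the experiment is n tosses of a fair coin with A being "exactly k heads"; that reading, the function name, and the values n = 5, k = 2 are all illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_exactly_k_heads(n, k):
    """Count sequences with exactly k heads among all 2**n toss sequences
    (hypothetical helper; assumes all sequences are equally likely)."""
    omega = list(product("HT", repeat=n))
    favourable = [w for w in omega if w.count("H") == k]
    return Fraction(len(favourable), len(omega))

n, k = 5, 2  # illustrative values only
by_enumeration = prob_exactly_k_heads(n, k)
by_formula = Fraction(comb(n, k), 2 ** n)
print(by_enumeration, by_formula)  # 5/16 5/16
```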
To show that classical probability is more than simple counting, let us consider the following
example (from Hajek, Lecture-notes, UIUC, see ee325 website).
Example 4 A traffic light repeats in cycles of length 75s. The respective duration of green,
orange and red are 30s, 5s and 40s. Suppose you do not have a watch, and drive into this
traffic intersection and observe the lights.
2. A = Red
The above example also rests on some not-so-true assumptions, for instance that the traffic
signal is not connected to the arrivals of vehicles and other road conditions.
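For a real-valued quantity such as time, the counting ratio becomes a ratio of durations. The following is a small sketch for the event A = Red in Example 4, assuming the arrival instant is equally likely to fall anywhere within the 75 s cycle; the dictionary and function names are illustrative.

```python
from fractions import Fraction

# Durations in seconds, as stated in Example 4.
durations = {"green": 30, "orange": 5, "red": 40}
cycle = sum(durations.values())  # 75 seconds

def prob_of_colour(colour):
    """Fraction of the cycle occupied by the given colour (hypothetical helper;
    assumes the arrival time is uniform over one cycle)."""
    return Fraction(durations[colour], cycle)

p_red = prob_of_colour("red")
print(p_red)  # 8/15
```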
Exercise 1 Find the probabilities of observing Green, and that of Orange in Example 4.
2.2 Bayes’ Rule
Since probability is concerned with events, one can start thinking of what constitutes
events. Of course, events are those interesting sets about which we like to ask questions. For
example, for an event A = {a, b, c, d}, the question ‘did A occur?’ is like asking if
any one of a, b, c or d occurred, where a, b, c, d are among the possible outcomes. This is
one difference between outcomes and events. Outcomes are mutually exclusive, i.e. exactly one
of the outcomes happens. Many events can happen simultaneously; in particular,
if A1 ⊂ A2, then the occurrence of A1 implies that A2 also occurred (the converse is not true).
For our probability measure to be meaningful, we should be able to handle the set
operations of union (⋃), intersection (⋂), complement (Ac), etc. The frequency interpretation
gives natural answers to such questions when there are only finitely many sets involved in
the operations. Let us first look at the union.
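$$
\begin{align}
P(A \cup B) &= \frac{|A \cup B|}{|\Omega|} \tag{3}\\
&= \frac{|A|}{|\Omega|} + \frac{|B|}{|\Omega|} - \frac{|A \cap B|}{|\Omega|} \tag{4}\\
&= P(A) + P(B) - P(A \cap B). \tag{5}
\end{align}
$$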
In particular, when A and B are disjoint sets (i.e. A ⋂ B = ∅) then
P (A ⋃ B) = P (A) + P (B).
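The union rule can be verified by counting on a small sample space. The sketch below uses two rolls of a die with two illustrative events (sum equal to 7, and first roll equal to 6); these events and the helper `prob` are assumptions made for the example, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a die, all 36 ordered pairs assumed equally likely.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Counting probability |event| / |omega| (hypothetical helper)."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] + w[1] == 7}  # sum of the faces is 7
B = {w for w in omega if w[0] == 6}         # first roll is 6

lhs = prob(A | B)                         # direct count of the union
rhs = prob(A) + prob(B) - prob(A & B)     # right-hand side of the union rule
print(lhs, rhs)  # 11/36 11/36
```

The subtraction removes the single outcome (6, 1) that would otherwise be counted in both A and B.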
Let us now define a frequency interpretation of conditional probability. Let A and B be
events associated with some sample space Ω.
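$$
\begin{align}
P(A \mid B) &\triangleq \frac{\text{No. of outcomes favoring } A \text{ and } B}{\text{No. of outcomes favoring } B} \tag{6}\\
&= \frac{|A \cap B|}{|B|} \tag{7}\\
&= \frac{|A \cap B| / |\Omega|}{|B| / |\Omega|} \tag{8}\\
&= \frac{P(A \cap B)}{P(B)}. \tag{9}
\end{align}
$$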
Note: I accept that there is an imprecision in the above statement: what does ‘events associated
with some sample space Ω’ mean? This will become eminently clear as we go on,
but for the time being take the last equation above as the definition of conditional probability.
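The two forms of the conditional probability, the count ratio and the ratio of probabilities, can be compared numerically. The following is a minimal sketch on the two-roll sample space; the events chosen (sum at least 10, first roll equal to 6) and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a die, all 36 ordered pairs assumed equally likely.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Counting probability |event| / |omega| (hypothetical helper)."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] + w[1] >= 10}  # sum of the faces is at least 10
B = {w for w in omega if w[0] == 6}          # first roll is 6

by_counting = Fraction(len(A & B), len(B))   # |A and B| / |B|
by_ratio = prob(A & B) / prob(B)             # P(A and B) / P(B)
print(prob(A), by_counting, by_ratio)  # 1/6 1/2 1/2
```

In this instance, knowing that B occurred raises the probability of A from 1/6 to 1/2.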
It is time to revisit our open-the-boxes example from the previous notes. It turns out
that we can model the full exercise by the flip of a coin. That is, the experiment of switching
the box and winning is equivalent to tossing a biased coin with P(H) = 2/3.
Example 5 We mentioned that ‘switching’ the box gives a better chance of winning in
the finding-ALICE problem. Let us find this probability, denoted as P(W). Assume that if
ALICE is in Box I, the host randomly opens one of the other boxes to show a leopard.
If a leopard is in Box I, then the host opens the box with the other leopard.
Solution: We will break the event of winning into two parts: winning after door 2 is
opened, and winning otherwise. Notice that our first choice is always Box 1. We will denote by D2O
the opening of door 2. Since D2O and its complement D2O^c are disjoint sets,
$$P(W) = P(W \cap D_{2O}) + P(W \cap D_{2O}^c). \tag{10}$$
What is the probability of W ⋂ D2O? It asks for door 2 being opened and a win
happening together. Consider Ω1 = {LLP, LPL, PLL}; we know that each outcome here
has probability 1/3. Of these, only the outcome LLP has both D2O and W happening. Hence
P(W ⋂ D2O) = 1/3. The same holds when door 3 is opened, and thus
$$P(W) = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}, \tag{11}$$
the answer we got in class.
Note: We used a rule called Bayes Rule II in deriving this result. This rule
will be covered in detail in the later chapters. ■
The 3 Box question can also reveal some intriguing aspects with respect to conditional
probability. This explanation can clear any more doubts on this question.
Solution: P(W∣D2O) compares the fraction of outcomes in which both W and D2O occur
with the fraction of outcomes, whether W or W^c, in which D2O occurs. The latter is nothing but the
probability of D2O itself. It is clear that the fraction of D2O is 1/2. The fraction of both D2O
and W is 1/3 (i.e. LLP happens). Thus
$$P(W \mid D_{2O}) = \frac{1/3}{1/2} = \frac{2}{3}. \tag{12}$$
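$$
\begin{align}
P(W) &= P(W \cap D_{2O}) + P(W \cap D_{2O}^c) \tag{13}\\
&= P(D_{2O})\,P(W \mid D_{2O}) + P(D_{2O}^c)\,P(W \mid D_{2O}^c) \tag{14}\\
&= \frac{1}{2} \times \frac{2}{3} + \frac{1}{2} \times \frac{2}{3} \tag{15}\\
&= \frac{2}{3}. \tag{16}
\end{align}
$$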
■
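The three-box computation can be reproduced by listing every (arrangement, opened box) pair with its probability. The sketch below assumes that the letter P in LLP, LPL, PLL marks the box holding ALICE, that the first choice is always Box 1, and that the host picks between Boxes 2 and 3 with probability 1/2 each when both hold leopards; the variable and function names are illustrative.

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)

# (arrangement, box opened by the host) -> probability.
# Assumption: "P" marks ALICE's box; the player's first choice is Box 1.
outcomes = {
    ("PLL", 2): third * half,  # ALICE in Box 1: host opens Box 2 or 3 at random
    ("PLL", 3): third * half,
    ("LPL", 3): third,         # ALICE in Box 2: host must open Box 3
    ("LLP", 2): third,         # ALICE in Box 3: host must open Box 2
}

def prob(predicate):
    """Total probability of the outcomes satisfying the predicate."""
    return sum(p for (arr, opened), p in outcomes.items() if predicate(arr, opened))

def win(arr, opened):
    # Switching wins exactly when ALICE is not in Box 1.
    return arr[0] != "P"

def d2o(arr, opened):
    # The event D2O: the host opens door 2.
    return opened == 2

p_w = prob(win)
p_d2o = prob(d2o)
p_w_and_d2o = prob(lambda a, o: win(a, o) and d2o(a, o))
p_w_given_d2o = p_w_and_d2o / p_d2o
print(p_w, p_d2o, p_w_and_d2o, p_w_given_d2o)  # 2/3 1/2 1/3 2/3
```

The conditional probability of winning given that door 2 was opened equals the unconditional one, which matches the values used in the derivation.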
Exercise 2 Use the above exercise to argue that in the finding-LEOPARD problem, knowing
which other box has a leopard does not change the probability of a leopard being in the
first box.
3 Bertrand’s Paradox
If we can assign probabilities for each event of interest in the manner shown in the last section,
then there is little need for an alternate approach. However, our probability assignment
strategy may fail, or lead to inconsistencies, as exemplified by the so-called Bertrand’s
Paradox. In this paradox, three seemingly similar experiments end up giving dramatically
different answers, and to make it worse, each one is true. Bertrand’s paradox is about
circles and chords; recall that a line segment which connects any two points of the circle is known
as a chord.
Consider a circle of radius r centered at the origin. Suppose a chord AB
is chosen at random. What is the probability that l(AB) > √3 r, where
l(AB) denotes the length of the chord AB?
Before we start solving, realize that √3 r is nothing but the side of
an equilateral triangle inscribed in the circle, see Figure 1. The sides
are calculated using
$$\cos 30^\circ = \frac{\sqrt{3}}{2} \quad \text{and} \quad \sin 30^\circ = \frac{1}{2}.$$
We will provide three different methods, each one appearing to
solve this problem.

Figure 1: Inscribed equilateral triangle has side √3 r
second point lies anywhere in the segment CD of the circle, the corresponding chord will
lie in the shaded region, which in turn implies that l(AB) ≥ √3 r. However, CD is exactly
1/3 of the circumference. Since B is chosen uniformly on the circumference,
$$P\left(l(AB) > \sqrt{3}\,r\right) = \frac{1}{3},$$
when A is chosen as shown. However, if we change A, the problem stays the same and our
computations are independent of the initial choice of A. Thus 1/3 is the probability that we
look for. ■
that a lot more centers are possibly chosen in the inner circle than in Method II. Method II
will have a uniform distribution of points since we chose it that way. Thus about a quarter
of the points are expected to be inside the inner circle using Method II. On the other hand,
Method III is expected to pack 1/2 of the points inside the inner circle. Conversely, Methods
I and III will possibly choose fewer points in the outer ring, leading to a diluted
density of mid-points there.
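The different midpoint densities can be seen in a quick simulation. A chord is longer than √3 r exactly when its midpoint lies within r/2 of the centre, so it is enough to sample the midpoint distance under each method. The sketch below assumes the standard formulations of the three methods (Method I: both endpoints uniform on the circumference; Method II: midpoint uniform over the disc; Method III: midpoint uniform along a radius); these readings, the function names, the trial count and the seed are assumptions made for illustration.

```python
import math
import random

def midpoint_distance(method, rng, r=1.0):
    """Distance of the chord's midpoint from the centre under each
    sampling method (assumed standard formulations, see lead-in)."""
    if method == 1:
        # Endpoints at uniform angles a and b: midpoint distance is r*|cos((a-b)/2)|.
        a = rng.uniform(0.0, 2.0 * math.pi)
        b = rng.uniform(0.0, 2.0 * math.pi)
        return r * abs(math.cos((a - b) / 2.0))
    if method == 2:
        # Midpoint uniform over the disc: distance is r*sqrt(U), U uniform on [0, 1).
        return r * math.sqrt(rng.random())
    if method == 3:
        # Midpoint uniform along a radius: distance is r*U.
        return r * rng.random()
    raise ValueError("method must be 1, 2 or 3")

def fraction_in_inner_circle(method, trials=200_000, seed=0):
    """Estimate the fraction of midpoints falling inside the circle of radius r/2."""
    rng = random.Random(seed)
    hits = sum(midpoint_distance(method, rng) < 0.5 for _ in range(trials))
    return hits / trials

estimates = {m: fraction_in_inner_circle(m) for m in (1, 2, 3)}
for m, est in estimates.items():
    print(f"Method {m}: estimated fraction inside inner circle = {est:.3f}")
```

Under these assumptions the estimates should settle near 1/3, 1/4 and 1/2 respectively, in line with the three answers discussed above.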
The above paradox is carefully tackled in the axiomatic approach by including the
probability association as a part of the problem definition itself. More details are included
in the coming Section.
Exercise 3 Someone stays behind a curtain, tosses a fair coin three times and then shouts
loudly from behind: “There are at least two HEADs”. What is the probability that all
three results are HEADs?
Exercise 4 Someone stays behind a curtain, tosses a fair coin three times and then shouts
loudly from behind: “The first two results are HEADs”. What is the probability that all
three results are HEADs?
Exercise 5 A bag contains 8 red balls and 6 blue balls. If 5 balls are picked at random,
find the probability that 2#{RED} + #{BLUE} = 8, where the notation #{RED} counts
the number of RED balls picked.