
Indian Institute of Technology Bombay

Department of Electrical Engineering

Handout 4 EE 325 Probability and Random Processes


Lecture Notes 2 July 24, 2014

1 Classical Probability
The word classical in the title is a slight misnomer; the intention here is to differentiate
it from the modern (or axiomatic) version. Classical probability is mostly the one that you
have learned already. It is about counting and computing the frequency of occurrence, or,
more generally, computing ratios for real-valued variables. Though there are some reservations
about its universal applicability, the classical version is very useful and the time
spent on it is of immense value. We can adapt and upgrade almost all that we learn here
to the axiomatic framework of modern probability.
The following notations are reserved for the rest of this course.

N − natural numbers, Z − integers, R − real numbers, C − complex numbers, Q − rational numbers

Before we start, two words of caution. The purpose of this chapter is not to prove
anything in a rigorous fashion, so we will redefine many of the things introduced now in a
more rigorous way later. Several phrases are used in a loose sense, for example ‘associate’
or ‘uniformly at random’. The latter has a precise meaning, but we will reach it only
after several lectures. For the time being, take such phrases in their literal sense from the
context. For example, ‘uniformly at random’ corresponds to some fair way of choosing
among the outcomes.

2 Experiments, Outcomes and Events


Probability theory is concerned with experiments, whether physically conceivable or not.
On a finer scale, we worry about

1. the possible outcomes of an experiment

2. interesting events that we wish to enquire about

3. degrees of assurance on various possibilities

Here are some simple examples.

1. A coin is tossed; we are interested in knowing whether a HEAD occurred.

2. A die is rolled twice; we wish to know whether the sum of the faces is 7 or 11.

3. A bag has 3 blue balls and 2 red balls; we wish to know whether the ball will be red,
if one ball is picked without looking into the bag.

In order to clearly articulate experiments like the ones listed above, we need at least three
entities.
1. Outcomes of the experiment, or observations (to be denoted as Ω).

2. Events of interest (denoted by A or Ai , where i ∈ N).

3. A measure that we can associate to each event (denoted by P (A)).

The first entity is referred to as the Sample Space, which is the set of all outcomes of the
experiment. The elements of the sample space are also known as sample points. Similarly,
the events of interest belong to the so-called Event Space, or the set of all interesting
events. We are all familiar with the third entity, where the symbol P stands for probability
or probability measure.
While the first two quantities are natural and unambiguous, the third one needs careful
consideration.

2.1 Frequency Interpretation


In the classical interpretation, the frequency of occurrence of an event is assigned as the
probability of the said event. In particular, for any event A, we will assign a probability
as
P(A) = (number of outcomes favoring A) / (total number of outcomes).   (1)
Here ’favoring’ is used in the sense, ‘those outcomes which will lead to the said event A’.
We will write this as,

P(A) = ∣A∣ / ∣Ω∣,   (2)

where ∣A∣ counts the number of outcomes in the set A. Keep in mind, it is the number of
outcomes that we count in assigning the probability to an event A.
While the theory is simple enough to comprehend, this may lead to inconsistencies,
necessitating a more foolproof approach. We will develop that framework later, but let us
first go through the frequency interpretation for some traditional examples.

Example 1 Consider throwing a balanced die. We can articulate this as

1. Outcomes Ω = {1, 2, 3, 4, 5, 6}.

2. Event Space P(Ω); let us take it to be the set of all subsets of Ω. An example event of
interest is {1, 6}. This is equivalent to asking: is the outcome 1 or 6?

3. A probability measure P(A) = ∣A∣/∣Ω∣, where ∣A∣ denotes the cardinality of the set A
(to be read as card(A)).

Notice that the third entity, i.e. the probability measure, to an extent summarizes
our past or prior knowledge about the experiment. In the actual experiment, there may be
other exogenous factors, including

• the softness of the hand

• the surface on which the die fell

• you prayed for a six or one!

But none of those listed is our concern; we simply strip them out of consideration and take
a naive view, which in turn gives us the power to generalize.
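
As a quick illustration of (2) (a sketch of mine, not part of the original handout), the counting in Example 1 can be done directly in Python; the event {1, 6} is the one named above.

```python
# Classical probability by counting: P(A) = |A| / |Omega| for a balanced die.
omega = {1, 2, 3, 4, 5, 6}   # sample space
A = {1, 6}                   # event: "the outcome is 1 or 6"

print(len(A) / len(omega))   # 2/6 = 0.333...
```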

Example 2 Two balanced dice are rolled.

1. Ω = {(i, j) ∶ 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}

2. Event Space P(Ω). Example A = {(i, j) ∶ i + j = 10}.

3. P(A) = ∣A∣ / ∣Ω∣.

We can evaluate P(A) for the given A to be

P(A) = 3/36 = 1/12.
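
The count in Example 2 is easy to check by enumeration; a minimal Python sketch (mine, for illustration only):

```python
from itertools import product

# All 36 ordered pairs (i, j) with 1 <= i, j <= 6.
omega = list(product(range(1, 7), repeat=2))
A = [w for w in omega if sum(w) == 10]           # outcomes favoring "sum is 10"

print(len(A), len(omega), len(A) / len(omega))   # 3 36 0.0833... = 1/12
```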
Example 3 A fair coin is tossed n times.

1. Ω = {(ω1, ω2, ⋯, ωn) ∶ ωi ∈ {H, T}}.

2. An example event A = {(ω1, ⋯, ωn) ∶ ∑_{i=1}^{n} 1{ωi = H} = k}.

3. For the event A, we can assign a measure

P(A) = ∣A∣ / ∣Ω∣ = (n choose k) / 2^n.
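
For small n the count ∣A∣ = (n choose k) can be verified by brute force; the sketch below is my own illustration with arbitrarily chosen n = 5 and k = 2 (these values are not from the notes).

```python
from itertools import product
from math import comb

n, k = 5, 2                                       # illustrative values
omega = list(product("HT", repeat=n))             # all 2**n sequences of tosses
A = [w for w in omega if w.count("H") == k]       # sequences with exactly k HEADs

print(len(A) / len(omega), comb(n, k) / 2**n)     # both print 0.3125 = 10/32
```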

To show that classical probability is more than simple counting, let us consider the following
example (from Hajek, Lecture-notes, UIUC, see ee325 website).

Example 4 A traffic light repeats in cycles of length 75 s. The respective durations of green,
orange and red are 30 s, 5 s and 40 s. Suppose you do not have a watch, and you drive up to
this traffic intersection and observe the lights.

1. Ω = {Red, Green, Orange}.

2. A = Red

3. The probability of A can be assigned as

P(Red) = 40/75 = 8/15.

The above example also rests on some not-quite-true assumptions, namely that the traffic
signal is not connected to the arrival of vehicles or to other road conditions.
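
Example 4 assigns probability as a ratio of durations rather than a count of outcomes. A small simulation sketch (my own illustration; it assumes the arrival time is uniform over the 75 s cycle, with green on [0, 30), orange on [30, 35) and red on [35, 75)) gives roughly the same 8/15.

```python
import random

def light_at(t):
    """Colour of the signal t seconds into the 75 s cycle (assumed ordering)."""
    if t < 30:
        return "Green"
    if t < 35:
        return "Orange"
    return "Red"

trials = 100_000
reds = sum(light_at(random.uniform(0, 75)) == "Red" for _ in range(trials))
print(reds / trials)   # close to 40/75 = 8/15, about 0.533
```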

Exercise 1 Find the probabilities of observing Green, and that of Orange in Example 4.

2.2 Bayes’ Rule
Since probability is concerned with events, one can start thinking of what constitutes
events. Of course, events are those interesting sets about which we like to ask questions. For
example, for an event A = {a, b, c, d}, the question ‘did A occur?’ is like asking whether
one of a, b, c or d occurred, where a, b, c, d are among the possible outcomes. This is
one difference between outcomes and events: outcomes are mutually exclusive, i.e. exactly one
outcome occurs, and only one. Many events can happen simultaneously; in particular,
if A1 ⊂ A2, then the occurrence of A1 implies that A2 also occurred (the reverse is not true).
For our probability measure to be meaningful we should be able to tackle the set-
operations of union (⋃), intersection (⋂), complement (A^c), etc. The frequency interpretation
gives natural answers to such questions when there are only finitely many sets involved in
the operations. Let us first look at the union.
P(A ⋃ B) = ∣A ⋃ B∣ / ∣Ω∣   (3)
         = ∣A∣/∣Ω∣ + ∣B∣/∣Ω∣ − ∣A ⋂ B∣/∣Ω∣   (4)
         = P(A) + P(B) − P(A ⋂ B).   (5)
In particular, when A and B are disjoint sets (i.e. A ⋂ B = ∅) then
P (A ⋃ B) = P (A) + P (B).
Let us now define a frequency interpretation of conditional probability. Let A and B be
events associated with some sample space Ω.
P(A∣B) ≜ (No. of outcomes favoring A and B) / (No. of outcomes favoring B)   (6)
       = ∣A ⋂ B∣ / ∣B∣   (7)
       = (∣A ⋂ B∣/∣Ω∣) / (∣B∣/∣Ω∣)   (8)
       = P(A ⋂ B) / P(B).   (9)
Note: I accept that there is an imprecision in the above statement, namely ‘events associated
with some sample space Ω’: what does it mean? This will become eminently clear as we go on,
but for the time being take the last equation above as the definition of conditional probability.
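
For finite sample spaces, identities (3)–(5) and definition (9) can be checked mechanically by counting. The sketch below uses the two-dice sample space with two events of my own choosing (sum equals 10, first die shows an even face); it is an illustration, not part of the notes.

```python
from itertools import product

omega = set(product(range(1, 7), repeat=2))
A = {w for w in omega if sum(w) == 10}        # sum of faces is 10
B = {w for w in omega if w[0] % 2 == 0}       # first die shows an even face

def P(E):
    return len(E) / len(omega)

# Inclusion-exclusion, equations (3)-(5).
print(P(A | B), P(A) + P(B) - P(A & B))       # the two numbers agree

# Conditional probability, equations (6)-(9).
print(len(A & B) / len(B), P(A & B) / P(B))   # both give P(A | B)
```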
It is time to revisit our open-the-boxes example from the previous notes. It turns out
that we can model the full exercise by the flip of a coin. That is, the experiment of switching
the box and winning is equivalent to tossing a biased coin with P(H) = 2/3.
Example 5 We mentioned that ‘switching’ the box gives a better chance of winning in the
find-ALICE problem. Let us find this probability, denoted as P(W). Assume that if ALICE
is in Box 1, the host opens one of the other two boxes uniformly at random to show a leopard;
if a leopard is in Box 1, then the host opens the box with the other leopard.
Solution: We will break the event of winning into two parts: winning after door 2 is opened,
and otherwise. Notice that our first choice is always Box 1. We will denote by D2O
the opening of door 2. Since D2O and D2O^c are disjoint sets,

P(W) = P(W ⋂ D2O) + P(W ⋂ D2O^c)   (10)

What is the probability of W ⋂ D2O? It asks for both door 2 opening and the win
happening together. Consider Ω1 = {LLP, LPL, PLL}; we know that each outcome here
has probability 1/3. Of these, only the outcome LLP has both D2O and W happening. Hence
P(W ⋂ D2O) = 1/3. The same holds for the case where door 3 gets opened, and thus

P(W) = 1/3 + 1/3 = 2/3,   (11)
the answer we got in class.
Note: Please note that we used a rule called Bayes Rule II in deriving this result. This rule
will be covered in detail in later chapters.
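
A Monte Carlo sketch of Example 5 (my own illustration; Box 1 is always the first pick, and the host is assumed to follow the rule stated in the example) lands near 2/3.

```python
import random

def switch_wins():
    """One round: place ALICE (P) and two leopards (L), pick Box 1, then switch."""
    boxes = ["L", "L", "P"]
    random.shuffle(boxes)
    # Host opens a box other than Box 1 (index 0) that holds a leopard.
    opened = random.choice([i for i in (1, 2) if boxes[i] == "L"])
    switched_to = ({1, 2} - {opened}).pop()   # the remaining unopened box
    return boxes[switched_to] == "P"

trials = 100_000
print(sum(switch_wins() for _ in range(trials)) / trials)   # about 2/3
```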
The three-box question can also reveal some intriguing aspects of conditional probability.
The following example should clear any remaining doubts on this question.

Example 6 Compute the probability of winning given that D2O happened.

Solution: P(W ∣ D2O) compares the fraction of outcomes in which both W and D2O occur
with the fraction of outcomes (winning or not) in which D2O occurs. The latter is nothing
but the probability of D2O itself. It is clear that the fraction of D2O is 1/2, while the fraction
of both D2O and W is 1/3 (i.e. LLP happens). Thus

P(W ∣ D2O) = (1/3) / (1/2) = 2/3.   (12)

We can verify the formula for winning using Bayes’ Formula as

P(W) = P(W ⋂ D2O) + P(W ⋂ D2O^c)   (13)
     = P(D2O) P(W ∣ D2O) + P(D2O^c) P(W ∣ D2O^c)   (14)
     = (1/2) × (2/3) + (1/2) × (2/3)   (15)
     = 2/3.   (16)
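
The conditional probability in Example 6 can be sketched by filtering the same kind of simulation on the rounds where door 2 was opened (again my own illustration, under the same assumptions).

```python
import random

def one_round():
    """Return (door 2 opened?, win by switching?) with Box 1 picked first."""
    boxes = ["L", "L", "P"]
    random.shuffle(boxes)
    opened = random.choice([i for i in (1, 2) if boxes[i] == "L"])
    switched_to = ({1, 2} - {opened}).pop()
    return opened == 1, boxes[switched_to] == "P"   # index 1 plays the role of door 2

rounds = [one_round() for _ in range(100_000)]
wins_given_d2o = [win for opened2, win in rounds if opened2]
print(len(wins_given_d2o) / len(rounds))            # P(D2O), about 1/2
print(sum(wins_given_d2o) / len(wins_given_d2o))    # P(W | D2O), about 2/3
```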

Exercise 2 Use the above example to argue that, in the find-the-LEOPARD problem, knowing
which other box has a leopard does not change the probability of a leopard being in the
first box.

3 Bertrand’s Paradox
If we can assign probabilities for each event of interest in the manner shown in the last section,
then there is little need for an alternate approach. However, our probability assignment
strategy may fail, or lead to inconsistencies, as exemplified by the so-called Bertrand's
Paradox. In this paradox, three seemingly similar experiments end up giving dramatically
different answers, and, to make it worse, each one is true. Bertrand's paradox is about
circles and chords; recall that a line segment which connects any two points of a circle is
known as a chord.
Consider a circle of radius r centered at the origin. Suppose a chord AB is chosen at
random. What is the probability that l(AB) > √3 r, where l(AB) denotes the length of the
chord AB?

Before we start solving, realize that √3 r is nothing but the side of an equilateral triangle
inscribed in the circle, see Figure 1. The side length follows from cos 30° = √3/2 and
sin 30° = 1/2.

We will provide three different methods, each one appearing to solve this problem.

Figure 1: Inscribed equilateral triangle has side √3 r

3.1 Method I: Random End-points


Let us first pick a point A on the circle. We will form the chord AB by randomly picking
the second point B on the circle. For example, A is chosen as in Figure 2.

Figure 2: End points at random

Now, if the second point lies anywhere in the arc CD of the circle, the corresponding chord
will lie in the shaded region, which in turn implies that l(AB) ≥ √3 r. However, CD is exactly
1/3 of the circumference. Since B is chosen uniformly on the circumference,

P(l(AB) > √3 r) = 1/3,
when A is chosen as shown. However, if we change A, the problem stays the same and our
computations are independent of the initial choice of A. Thus 1/3 is the probability that we
look for.

3.2 Method II: Random Midpoint


In this method, a point inside the circle is chosen at random. Observe that any point P
other than the origin uniquely corresponds to a chord with P as the mid-point. There are
many possibilities when P is indeed the circle center; however, we are free to draw any chord
as we please once the origin is chosen. We will learn later that the probability of getting the
origin in this experiment is anyway zero. Consider Figure 3, and notice that when the chosen
point falls inside the shaded region (the disc of radius r/2), the corresponding chord will have
length greater than √3 r. This is because the inner circle is the locus of the mid-points of all
chords with length √3 r. Since the mid-points are uniformly chosen, the chance that a point
in the shaded region is chosen is proportional to the area of the shaded region. Thus,

P(l(AB) ≥ √3 r) = π(r/2)² / (πr²) = 1/4.

Figure 3: The midpoints of chords are uniformly chosen

3.3 Method III


Another way of finding a solution to our problem is to consider random radial lines, i.e.
an angle is uniformly chosen in the interval [0, 360°] and a radial line is drawn from the origin
at this angle to the positive horizontal axis. Figure 4 illustrates this, where the radial line is
drawn at an angle of −75°. Once the radial line is chosen, a point is uniformly picked on
this radial line. The picked point then acts as the mid-point of a chord of our original circle
(notice that this chord is normal to the radial line, see Figure 4).

Figure 4: First choosing a radial and then a point on it

Given a radial line, if the picked point is within a distance of r/2 from the center, then it
will fall inside the boundary of the inner circle. As in the last method, a point in the
inner circle will lead to a desired chord. Thus the probability we seek is the probability of
choosing a uniform value in the first half of the radial line:

P(l(AB) ≥ √3 r) = 1/2.
So the three methods gave completely different answers, leaving us with the question,
‘which one shall we believe?’ The key lies in the fact that the three methods are attempting
to find three different quantities, none of which we can unequivocally call the probability
of {l(AB) > √3 r}. In particular, the association with a probability measure is part of the
problem definition itself, and we can define many such measures on a given sample space
with the same events of interest. In order to throw some more light, let us consider the event
Em that the midpoint of a chord is chosen inside the inner circle of radius r/2, and compute
P(Em) for the three methods. Clearly P(Em) = 1/3 for Method-I, implying that a lot more
centers are possibly chosen in the inner circle than in Method II. Method II will have a
uniform distribution of points since we chose it that way; thus about a quarter of the points
are expected to be inside the inner circle using Method-II. On the other hand, Method III is
expected to pack 1/2 of the points inside the inner circle. Conversely, Methods I and III will
possibly choose fewer points in the outer ring, leading to a diluted density of mid-points there.
The above paradox is carefully tackled in the axiomatic approach by including the
probability association as a part of the problem definition itself. More details are included
in the coming Section.
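
As a concrete illustration of how the three chord-selection schemes measure different things, here is a Monte Carlo sketch (mine, with r = 1); the three estimates of P(l(AB) ≥ √3 r) come out near 1/3, 1/4 and 1/2, matching Methods I, II and III.

```python
import math
import random

R = 1.0
THRESHOLD = math.sqrt(3) * R
TRIALS = 100_000

def chord_endpoints():
    # Method I: two independent uniform points on the circumference.
    a, b = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    return 2 * R * abs(math.sin((a - b) / 2))

def chord_midpoint():
    # Method II: a uniform point inside the disc acts as the chord's mid-point.
    while True:
        x, y = random.uniform(-R, R), random.uniform(-R, R)
        d2 = x * x + y * y
        if d2 <= R * R:
            return 2 * math.sqrt(R * R - d2)

def chord_radial():
    # Method III: a uniform distance along a radial line is the chord's mid-point.
    d = random.uniform(0, R)
    return 2 * math.sqrt(R * R - d * d)

for method in (chord_endpoints, chord_midpoint, chord_radial):
    hits = sum(method() >= THRESHOLD for _ in range(TRIALS))
    print(method.__name__, hits / TRIALS)
```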

Exercise 3 Someone stays behind a curtain, tosses a fair coin three times and then shouts
loudly from behind: “There are (at least) two HEADs.” What is the probability that all
results are HEADs?

Exercise 4 Someone stays behind a curtain, tosses a fair coin three times and then shouts
loudly from behind: “The first two results are HEADs.” What is the probability that all
results are HEADs?

Exercise 5 A bag contains 8 red balls and 6 blue balls. If 5 balls are picked at random,
find the probability that 2#{RED} + #{BLUE} = 8, where the notation #{RED} counts
the number of RED balls picked.
