Introduction To Probability
ECO 523
Probability
Some history¹
▶ The subject of statistics and probability seems to be of ancient origin in India and finds mention even in the great Indian epic, the Mahabharata. For example, consider the following quote from Ian Hacking at the History and Philosophy of Science Seminar: “King Bhangasuri wanting to flaunt his skill in numbers of leaves, and the number of fruits, on two great branches of a spreading tree. There are, he avers, 2095 fruits. Nala counts all night and is duly amazed by morning. Bhangasuri says ‘I, of dice possess the science, and in numbers thus am skilled.’”
▶ The treatise Arthasastra, written by Kautilya during the Mauryan period, contains a detailed description of the system of data collection for agricultural, population, and economic censuses in villages and towns.
▶ The tradition of detailed data collection continued under the Mughal emperor Akbar around 1590 A.D. The Ain-i-Akbari, written by Abul Fazal during 1596-1597 A.D., is the best compilation of that period and contains a wealth of information, including details of several government departments, the system of legalized measurements, land classification, and crop yields. Abul Fazal was “regarded as a statistician” (Jarret, 1894).
¹ This discussion of the history of statistics is taken from Ghosh et al. (1999) and Rao (2006).
Some history (contd.)
“It is not the case that at least one of A and B occurs” is the same as saying “A does not occur and B does not occur”, and “it is not the case that both occur” is the same as saying “at least one does not occur”. In symbols, $(A \cup B)^C = A^C \cap B^C$ and $(A \cap B)^C = A^C \cup B^C$ (De Morgan's laws). Analogous results hold for unions and intersections of more than two events.
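De Morgan's laws can be checked directly on a small finite sample space. The sketch below is only an illustration: the die-roll sample space and the events `A` and `B` are hypothetical choices, not taken from the slides, and events are represented as Python sets.

```python
# Check De Morgan's laws on a small, hypothetical sample space of die rolls.
S = set(range(1, 7))  # outcomes of one roll of a six-sided die (assumption)
A = {2, 4, 6}         # hypothetical event: the roll is even
B = {4, 5, 6}         # hypothetical event: the roll is at least 4

def complement(E, S=S):
    """Complement of event E relative to the sample space S."""
    return S - E

# "Not (at least one occurs)" is "A does not occur and B does not occur".
law1 = complement(A | B) == complement(A) & complement(B)
# "Not (both occur)" is "at least one does not occur".
law2 = complement(A & B) == complement(A) | complement(B)
print(law1, law2)  # True True
```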
Sample Space
Example 1.
Coin flips: A coin is flipped 10 times. Writing Heads as H and Tails as T, the sample space is the set of all possible strings of length 10 of H’s and T’s. We can (and will) encode H as 1 and T as 0, so that an outcome is a sequence $(s_1, \dots, s_{10})$ with $s_j \in \{0, 1\}$, and the sample space is the set of all such sequences.
Sample Space (contd.)
1. Let $A_1$ be the event that the first flip is Heads. As a set,
$$A_1 = \{(1, s_2, \dots, s_{10}) : s_j \in \{0, 1\} \text{ for } 2 \le j \le 10\}.$$
This is a subset of the sample space, so it is indeed an event. Similarly, let $A_j$ be the event that the $j$th flip is Heads for $j = 2, 3, \dots, 10$.
2. Let $B$ be the event that at least one flip was Heads.
$$B = \bigcup_{j=1}^{10} A_j$$
4. Let $D$ be the event that there were at least two consecutive Heads.
$$D = \bigcup_{j=1}^{9} (A_j \cap A_{j+1})$$
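Because the sample space has only $2^{10} = 1024$ outcomes, the events above can be built literally as sets of 0/1 tuples, using the union formulas for $B$ and $D$. This is a brute-force sketch for checking intuition; the helper name `A(j)` is a hypothetical stand-in for $A_j$.

```python
from itertools import product

# Sample space: all length-10 sequences of 0s (Tails) and 1s (Heads).
S = list(product((0, 1), repeat=10))

def A(j):
    """Event A_j: the j-th flip (1-indexed) is Heads."""
    return {s for s in S if s[j - 1] == 1}

A1 = A(1)
# B: at least one flip is Heads, the union of A_1, ..., A_10.
B = set().union(*(A(j) for j in range(1, 11)))
# D: at least two consecutive Heads, the union of A_j ∩ A_{j+1} for j = 1, ..., 9.
D = set().union(*(A(j) & A(j + 1) for j in range(1, 10)))

print(len(S), len(A1), len(B), len(D))  # 1024 512 1023 880
```

The count for $B$ is $1024 - 1$ (only the all-Tails sequence is excluded), which is a quick sanity check on the union construction.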
Naive definition of probability
▶ Naive definition of probability: Let $A$ be an event for an experiment with a finite sample space $S$. The naive probability of $A$ is
$$P_{\text{naive}}(A) = \frac{|A|}{|S|} = \frac{\text{number of outcomes favorable to } A}{\text{total number of outcomes in } S}.$$
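▶ For example, for the events $A$ and $B$ in Figure 1,
$$P_{\text{naive}}(A) = \frac{5}{9}, \quad P_{\text{naive}}(B) = \frac{4}{9}, \quad P_{\text{naive}}(A \cup B) = \frac{8}{9}, \quad P_{\text{naive}}(A \cap B) = \frac{1}{9}.$$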
▶ For the complements of the events just considered,
$$P_{\text{naive}}(A^C) = \frac{4}{9}, \quad P_{\text{naive}}(B^C) = \frac{5}{9}, \quad P_{\text{naive}}((A \cup B)^C) = \frac{1}{9}, \quad P_{\text{naive}}((A \cap B)^C) = \frac{8}{9}.$$
▶ The naive definition is very restrictive: it requires $S$ to be finite, with equal mass for each pebble (that is, all outcomes equally likely).
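The naive definition is just a ratio of set sizes, so it can be sketched with exact fractions. Figure 1 itself is not reproduced here; the labeling of the nine pebbles below is a hypothetical arrangement chosen only so that $|A| = 5$, $|B| = 4$ and $|A \cap B| = 1$ match the values quoted above.

```python
from fractions import Fraction

def p_naive(event, sample_space):
    """Naive probability |A| / |S| for a finite sample space of equally likely outcomes."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

# Hypothetical labeling of nine equally likely pebbles (not the actual figure).
S = set(range(1, 10))
A = {1, 2, 3, 4, 5}
B = {5, 6, 7, 8}

for name, event in [("A", A), ("B", B), ("A∪B", A | B), ("A∩B", A & B),
                    ("A^C", S - A), ("B^C", S - B),
                    ("(A∪B)^C", S - (A | B)), ("(A∩B)^C", S - (A & B))]:
    print(name, p_naive(event, S))
```

Using `Fraction` rather than floats keeps results such as 5/9 exact, which makes them directly comparable with the values on the slide.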
Multiplication Rule
Theorem 1.
Multiplication rule: Consider a compound experiment consisting
of two sub-experiments, Experiment A and Experiment B. Suppose
that Experiment A has a possible outcomes, and for each of those
outcomes Experiment B has b possible outcomes. Then the
compound experiment has ab possible outcomes.
▶ Imagine a tree diagram. Let the tree branch $a$ ways according to the possibilities for Experiment A, and for each of those branches create $b$ further branches for Experiment B. The tree then ends in $ab$ branches, one for each outcome of the compound experiment.
▶ It is often easier to think about the experiments as being in
chronological order, but there is no requirement in the
multiplication rule that Experiment A has to be performed
before Experiment B.
Multiplication Rule (contd.)
▶ Suppose you are buying an ice cream cone. You can choose whether to have a cake cone or a waffle cone, and whether to have chocolate, vanilla, or strawberry as your flavor. By the multiplication rule, there are $2 \cdot 3 = 6$ possible choices.
Figure 2
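The ice cream example can be enumerated directly: each (cone, flavor) pair corresponds to one end branch of the tree diagram. This is a minimal sketch; `itertools.product` plays the role of "for each outcome of A, every outcome of B".

```python
from itertools import product

cones = ["cake", "waffle"]                        # Experiment A: a = 2 outcomes
flavors = ["chocolate", "vanilla", "strawberry"]  # Experiment B: b = 3 outcomes

# Each (cone, flavor) pair is one end branch of the tree diagram.
choices = list(product(cones, flavors))
for choice in choices:
    print(choice)
print(len(choices))  # 6 = 2 * 3
```

Swapping the order (`product(flavors, cones)`) lists the same six choices in a different order, which mirrors the remark that the rule does not require Experiment A to come first.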
Example 2.
Subsets: A set with $n$ elements has $2^n$ subsets, including the empty set $\emptyset$ and the set itself. This follows from the multiplication rule since, for each element, we can choose whether to include it or exclude it. For example, the set $\{1, 2, 3\}$ has the 8 subsets $\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}$.
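The include/exclude argument translates into a short construction: process the elements one at a time and let every subset built so far branch into a copy without the new element and a copy with it. The function name `subsets` is a hypothetical helper for this sketch.

```python
def subsets(items):
    """Build all subsets by deciding, element by element, to exclude or include it."""
    result = [[]]
    for x in items:
        # Each existing subset branches in two: without x (kept) and with x (added).
        result = result + [s + [x] for s in result]
    return result

print(subsets([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
```

Each pass through the loop doubles the list, so after $n$ elements there are $2^n$ subsets, exactly as the multiplication rule predicts.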
Multiplication Rule (contd.)
Theorem 2.
Sampling with replacement: Consider $n$ objects and making $k$ choices from them, one at a time with replacement (i.e., choosing a certain object does not preclude it from being chosen again). Then there are $n^k$ possible outcomes.
▶ For example, imagine a jar with $n$ balls, labeled from 1 to $n$. Each sampled ball is a sub-experiment with $n$ possible outcomes, and there are $k$ sub-experiments. Thus there are $n^k$ ways to choose a sample of size $k$.
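The jar example can be checked by brute force for small $n$ and $k$: generate every ordered sequence of $k$ draws from balls $1, \dots, n$ (repeats allowed) and count them. The helper name `count_with_replacement` is hypothetical.

```python
from itertools import product

def count_with_replacement(n, k):
    """Count ordered samples of size k from balls 1..n, drawn with replacement."""
    jar = range(1, n + 1)
    return sum(1 for _ in product(jar, repeat=k))

print(count_with_replacement(4, 3), 4 ** 3)  # 64 64
```

Enumeration is only practical for tiny cases; its purpose here is to confirm the formula $n^k$, which is what one would use in practice.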
Theorem 3.
Sampling without replacement: Consider $n$ objects and making $k$ choices from them, one at a time without replacement (i.e., choosing a certain object precludes it from being chosen again). Then there are $n(n-1)\cdots(n-k+1)$ possible outcomes for $k \le n$ (and 0 possibilities for $k > n$).
▶ This result also follows from the multiplication rule: each sampled ball is again a sub-experiment, and the number of possible outcomes decreases by 1 each time.
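The formula $n(n-1)\cdots(n-k+1)$ can be compared against a brute-force count of ordered draws without replacement. In this sketch, `falling_factorial` and `count_without_replacement` are hypothetical helpers; `math.perm` (Python 3.8+) is the standard-library function for the same quantity and also returns 0 when $k > n$.

```python
import math
from itertools import permutations

def count_without_replacement(n, k):
    """Count ordered samples of size k from n objects, drawn without replacement."""
    return sum(1 for _ in permutations(range(n), k))

def falling_factorial(n, k):
    """n(n-1)...(n-k+1); the product is 0 when k > n because the factor n - n = 0 appears."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

print(count_without_replacement(5, 3), falling_factorial(5, 3), math.perm(5, 3))  # 60 60 60
print(count_without_replacement(3, 5), falling_factorial(3, 5), math.perm(3, 5))  # 0 0 0
```

Setting $k = n$ gives $n!$, the number of permutations of $n$ objects.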
Multiplication Rule (contd.)
Example 3.
Permutations and factorials: A permutation of $1, 2, \dots, n$ is an arrangement of them in some order, e.g., 3, 5, 1, 2, 4 is a permutation of 1, 2, 3, 4, 5. By Theorem 3 with $k = n$, there are $n!$ permutations of $1, 2, \dots, n$. For example, there are $n!$ ways in which $n$ people can line up for ice cream. (Recall that $n! = n(n-1)\cdots 1$ for any positive integer $n$, and $0! = 1$.)
▶ Birthday problem: There are k people in a room. Assume
each person’s birthday is equally likely to be any of the 365
days of the year (we exclude February 29), and that people’s
birthdays are independent (we assume there are no twins in
the room). What is the probability that two or more people in
the group have the same birthday?
▶ Solved in class
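As a supplement to the in-class solution, here is one standard approach, sketched under the stated assumptions (365 equally likely, independent birthdays). Count the complement: the number of ways all $k$ birthdays are distinct is $365 \cdot 364 \cdots (365-k+1)$ by Theorem 3, out of $365^k$ equally likely assignments, so $P(\text{match}) = 1 - \frac{365 \cdot 364 \cdots (365-k+1)}{365^k}$. The function name `p_shared_birthday` is hypothetical.

```python
import math

def p_shared_birthday(k, days=365):
    """P(at least two of k people share a birthday), assuming `days` equally
    likely, independent birthdays (no February 29, no twins)."""
    # Complement: all k birthdays distinct, counted by sampling without replacement.
    # math.perm(days, k) is 0 when k > days, so the probability becomes 1.
    p_all_distinct = math.perm(days, k) / days ** k
    return 1 - p_all_distinct

for k in (10, 22, 23, 50):
    print(k, round(p_shared_birthday(k), 4))
```

With these assumptions, 23 is the smallest group size for which a shared birthday is more likely than not.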
Adjusting for overcounting
In many counting problems, it is not easy to directly count each
possibility once and only once. If, however, we are able to count
each possibility exactly c times for some c, then we can adjust by
dividing by c. For example, if we have exactly double-counted each
possibility, we can divide by 2 to get the correct count. We call
this adjusting for overcounting.
▶ Committees and teams: Consider a group of four people.
1. How many ways are there to choose a two-person committee?
2. How many ways are there to break the people into two teams
of two?
▶ Solved in class
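One way to check answers to the committee and team questions is to enumerate them and watch the overcounting factor appear. In this sketch the four people are given hypothetical labels; committees and teams are stored as `frozenset`s so that order does not matter.

```python
from itertools import permutations

people = ["P1", "P2", "P3", "P4"]  # hypothetical labels for the four people

# Ordered pairs count each two-person committee twice (once per order): 4 * 3 = 12.
ordered_pairs = list(permutations(people, 2))
committees = {frozenset(pair) for pair in ordered_pairs}  # 12 / 2 = 6 committees

# Choosing a committee fixes the other team too, so each split into two teams
# is produced twice (once from each team): 6 / 2 = 3 splits.
everyone = frozenset(people)
splits = {frozenset({c, everyone - c}) for c in committees}

print(len(ordered_pairs), len(committees), len(splits))  # 12 6 3
```

Both adjustments divide by exactly the number of times each possibility was counted, which is the idea of adjusting for overcounting.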
Adjusting for overcounting (contd.)
A binomial coefficient counts the number of subsets of a certain size for
a set, such as the number of ways to choose a committee of size k from
a set of n people. Sets and subsets are by definition unordered, e.g.,
{3, 1, 4} = {4, 1, 3}, so we are counting the number of ways to choose k
objects out of n, without replacement and without distinguishing between
the different orders in which they could be chosen.
▶ Binomial coefficient: For any nonnegative integers $k$ and $n$, the binomial coefficient ${}^{n}C_{k}$, read as “$n$ choose $k$”, is the number of subsets of size $k$ for a set of size $n$.
Theorem 4.
Binomial coefficient formula: For $k \le n$, we have
$${}^{n}C_{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}.$$
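The formula reads as "ordered samples without replacement, divided by the $k!$ orders in which the same subset can be drawn". The sketch below computes it that way and checks it against the factorial form, Python's `math.comb` (Python 3.8+), and a direct count of subsets; `n_choose_k` is a hypothetical helper name.

```python
import math
from itertools import combinations

def n_choose_k(n, k):
    """Binomial coefficient via n(n-1)...(n-k+1) / k!, for 0 <= k <= n."""
    ordered = 1
    for i in range(k):
        ordered *= n - i  # number of ordered samples without replacement
    # Each subset of size k was counted k! times, once per ordering.
    return ordered // math.factorial(k)

print(n_choose_k(4, 2), math.comb(4, 2), len(list(combinations(range(4), 2))))  # 6 6 6
```

Integer division is exact here because $k!$ always divides $n(n-1)\cdots(n-k+1)$.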
(Saying that these events are disjoint means that they are mutually exclusive: $A_i \cap A_j = \emptyset$ for $i \neq j$.)
▶ Any function $P$ (mapping events to numbers in the interval $[0, 1]$) that satisfies the two axioms is considered a valid probability function.
Non-naive definition of probability (contd.)
Theorem 6.
Inclusion-exclusion: For any events $A_1, \dots, A_n$,
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap \cdots \cap A_n).$$
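Inclusion-exclusion can be checked numerically on a finite sample space with equally likely outcomes: intersections of $r$ events enter with sign $(-1)^{r+1}$, and the result should equal the directly computed probability of the union. The sample space $\{1, \dots, 12\}$ and the "divisible by 2, 3, 4" events are hypothetical choices for illustration only.

```python
from fractions import Fraction
from itertools import combinations

def p_naive(event, S):
    """Naive probability |E| / |S| (equally likely outcomes)."""
    return Fraction(len(event), len(S))

def p_union_inclusion_exclusion(events, S):
    """P(A_1 ∪ ... ∪ A_n): intersections of r events get sign (-1)^(r+1)."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = 1 if r % 2 == 1 else -1
        for group in combinations(events, r):
            total += sign * p_naive(set.intersection(*group), S)
    return total

S = set(range(1, 13))  # hypothetical: 12 equally likely outcomes
events = [{x for x in S if x % d == 0} for d in (2, 3, 4)]  # divisible by 2, 3, 4

direct = p_naive(set().union(*events), S)
via_ie = p_union_inclusion_exclusion(events, S)
print(direct, via_ie)  # 2/3 2/3
```

Summing over all $2^n - 1$ nonempty groups is only feasible for small $n$, but it makes each term of the formula explicit.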