Chapter 1 - Probability (With Solutions)

This document provides an introduction to probability and statistical foundations. It outlines key concepts like sample spaces, events, probability measures, and methods for computing probabilities. Sample spaces can be finite or infinite depending on the nature of the experiment. Events are subsets of outcomes in the sample space. Probability is a measure of the likelihood of an event occurring that satisfies certain properties. Computing probabilities involves counting favorable outcomes when the sample space is finite and all outcomes are equally likely. Algebraic operations can combine events and conditional probability considers dependencies between events.

ST5201: Statistical Foundations of Data Science


Chapter 1: Probability

Wei Yang, Wang


[email protected]
Department of Statistics and Data Science
National University of Singapore (NUS)
1
Outline

● Introduction

● Sample spaces

● Probability measures

● Computing probabilities: counting methods

● Conditional probability

● Independence

Textbook reading: Section 1.1 – Section 1.6

2
What is probability?

Probability:

● Is a measure of the likelihood that an event will occur

● The higher the probability, the more certain we are that the event will occur

● Quantification of uncertainty in many fields:

○ Operations research (inventory prediction)

○ Finance (stock volatility)

○ Data science (what about model precision & recall?)

3
Sample space
Definition
An experiment is any action or process whose outcome is subject to uncertainty or randomness.

● An experiment can have multiple possible outcomes.

● Each time an experiment is performed, exactly 1 outcome occurs. It is uncertain which outcome will occur.

● Assume the set of all possible outcomes is known.

Definition
The sample space of an experiment, denoted by Ω, is the set/collection of all possible
outcomes of that experiment. Each element of Ω, denoted by ω, is an outcome.
4
Examples

Experiments with 2 possible outcomes:

e.g. Coin toss, light bulbs, binary class labels

Ω = {H, T}, Ω = {on, off}, Ω = {True, False}, or Ω = {1, 0}

A single die has 6 sides. Hence an experiment that involves rolling a die has sample space:

Ω = {1, 2, 3, 4, 5, 6}

5
Examples

What about 3 consecutive coin tosses?

Ω = {HHH, THH, HTH, HHT, TTH, HTT, THT, TTT}
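The sample space for repeated tosses can be generated mechanically rather than listed by hand. A minimal sketch, assuming each toss is recorded as the letter H or T (the variable names are illustrative):

```python
from itertools import product

# Each toss is H or T; three tosses give the Cartesian product {H, T}^3.
tosses = ["H", "T"]
omega = {"".join(outcome) for outcome in product(tosses, repeat=3)}

print(sorted(omega))
print(len(omega))  # 2 * 2 * 2 = 8 outcomes
```

Changing `repeat` gives the sample space for any number of tosses, although its size doubles with every extra toss.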

Amount of time between successive customer arrivals

Ω = {t | t ≥ 0}

6
Events
Definition
An event A is any collection of outcomes contained in the sample space (i.e., any subset of Ω, written as A ⊂ Ω). An event is simple if it consists of exactly 1 outcome & compound if it consists of > 1 outcome.

Events are concerned with sets of outcomes, A ⊂ Ω. (Not just interested in single outcomes)

When an experiment is performed & an outcome ω ∈ Ω is observed/realized:

● An event A is said to occur if the observed outcome ω is included in A (i.e., ω ∈ A)

● Exactly 1 simple event ({ω}) occurs, but many compound events occur simultaneously

7
Examples

Toss 2 coins: Ω = {HH, HT, TH, TT}

Event A: 2 heads are observed

● A = {HH} is a simple event, as A contains only 1 outcome

Event B: exactly 1 head is observed

● B = {HT, TH} is a compound event

8
Examples

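Car defects: Consider the following events,

A: Only one good car

B: At most one defective car

C: All cars in same condition

Suppose in an experiment, a single outcome (shown in the original figure) is observed. Hence a simple event has occurred, as well as compound events B and C, but not A. (Only an outcome in which every car is good makes both B and C occur.)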

9
Algebra of events

● Carry over set operations from set theory.


● Create new and more complex events.

Definition
Given any events A, B ⊂ Ω,
● The complement of A, denoted by A^c, Ā, or A′, is the set of all outcomes in Ω that are not contained in A
● The union of sets A and B, denoted by A ∪ B (read “A or B”), is the event consisting of all outcomes that are either in A or B or in both events
● The intersection of sets A and B, denoted by A ∩ B or AB (read “A and B”), is the event consisting of all outcomes that are in both A and B

10
Algebra of events

Definition
The null event, denoted by ∅, is the event consisting of no outcomes

Definition
We say A and B are disjoint or mutually exclusive events when A ∩ B = ∅. It follows that A and A^c must be disjoint for any event A ⊂ Ω.

Definition
We say A and B are exhaustive events when A ∪ B = Ω. It follows that A and A^c must be exhaustive for any event A ⊂ Ω.
11
Venn diagrams

[Figure: Venn diagrams of A^c, A ∪ B, and A ∩ B]
12
Venn diagrams

A and B are mutually exclusive or disjoint events. A and A^c are mutually exclusive and exhaustive events

13
Venn diagrams - Quiz

In a 2 coin toss experiment, if Ω = {HH, HT, TH, TT}; A = {HH} is the right circle and B = {HT, TH} is the left
circle, where is C = {TT}?

14
Venn diagrams - Quiz

Assume there are 7 possible outcomes in an experiment s.t. Ω = {0, 1, 2, 3, 4, 5, 6}

Let A = {0, 1, 2, 3, 4}, B = {3, 4, 5, 6}, and C = {1, 2}. Then,

● A^c =
● A ∪ B =
● A ∪ C =
● A ∩ B =
● A ∩ C =
● (A ∩ C)^c =
● B ∩ C =

15
Venn diagrams - Quiz

Assume there are 7 possible outcomes in an experiment s.t. Ω = {0, 1, 2, 3, 4, 5, 6}

Let A = {0, 1, 2, 3, 4}, B = {3, 4, 5, 6}, and C = {1, 2}. Then,

● A^c = {5, 6}
● A ∪ B = {0, 1, 2, 3, 4, 5, 6} = Ω (i.e. A and B are exhaustive)
● A ∪ C = {0, 1, 2, 3, 4} = A (C is a subset of A)
● A ∩ B = {3, 4}
● A ∩ C = {1, 2}
● (A ∩ C)^c = {0, 3, 4, 5, 6}
● B ∩ C = ∅ (i.e. B and C are disjoint)
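The quiz answers can be checked with Python's built-in set type, where `|` is union and `&` is intersection. This is only a sketch; the `complement` helper is an illustrative name, not part of the slides.

```python
omega = set(range(7))          # Ω = {0, 1, ..., 6}
A = {0, 1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {1, 2}

def complement(event, space=omega):
    """Outcomes of the sample space that are not in the event."""
    return space - event

print(complement(A))          # complement of A
print(A | B == omega)         # True means A and B are exhaustive
print(A | C == A)             # True means C is a subset of A
print(A & B)
print(A & C)
print(complement(A & C))
print(B & C == set())         # True means B and C are disjoint
```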

16
Algebra for multiple events

The operations of union & intersection can be extended to ≥ 3 events, and the idea of disjointness &
exhaustiveness can also be generalized

For any 3 events, A, B, C ⊂ Ω

● A, B & C are said to be mutually exclusive or pairwise disjoint if no 2 events have any outcomes in
common
● A, B & C are said to be exhaustive if the event A ∪ B ∪ C consists of all outcomes in Ω

17
Laws for multiple events

● Associative Laws
○ (A ∪ B) ∪ C = A ∪ (B ∪ C)
○ (A ∩ B) ∩ C = A ∩ (B ∩ C)

● Distributive Laws
○ (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
○ (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)

● De Morgan’s Laws
○ (A ∪ B)^c = A^c ∩ B^c
○ (A ∩ B)^c = A^c ∪ B^c
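These laws can be sanity-checked by brute force on a small sample space. The sketch below assumes a hypothetical 3-outcome Ω and tests the associative, distributive and De Morgan identities for every triple of events; it is a numerical check, not a proof.

```python
from itertools import combinations, product

omega = frozenset({1, 2, 3})   # small illustrative sample space

def all_events(space):
    """Every subset of the sample space (its power set)."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def laws_hold(A, B, C, space=omega):
    comp = lambda E: space - E
    return (
        (A | B) | C == A | (B | C)            # associative (union)
        and (A & B) & C == A & (B & C)        # associative (intersection)
        and (A | B) & C == (A & C) | (B & C)  # distributive
        and (A & B) | C == (A | C) & (B | C)  # distributive
        and comp(A | B) == comp(A) & comp(B)  # De Morgan
        and comp(A & B) == comp(A) | comp(B)  # De Morgan
    )

events = all_events(omega)
all_ok = all(laws_hold(A, B, C) for A, B, C in product(events, repeat=3))
print(len(events), all_ok)
```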

18
Probability measure

Definition
A probability measure on Ω is a function P from subsets of Ω to [0, 1] that satisfies the following rules:
● P(Ω) = 1
● If A ⊂ Ω, then P(A) ≥ 0
● If A1, A2, · · · , An, · · · are pairwise disjoint, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$

Note: The above rules do not completely determine an assignment of probabilities to events. They serve only to rule out assignments inconsistent with our intuitive notions of probability.

19
Probability measure

Properties: Given any two events A, B ⊂ Ω

● P(A^c) = 1 − P(A)
● P(∅) = 0 (i.e. probability that there is no outcome is 0)
● P(A) ≤ 1
● If A ⊂ B, then P(A) ≤ P(B)
● Addition Law:
○ P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Note: The addition law serves as the general formula to compute the probability of an event of interest
which is expressible as a union of events via decomposing the event of interest into “smaller” events

20
Examples

In an undergraduate module, 60% of students have statistics background, 80% have calculus background,
and 50% have both. If a student is selected at random, what is the probability that s/he has background in
(1) at least 1 subject and (2) exactly 1 subject?

Solution:

Let A = {selected student has statistics background}, B = {selected student has calculus background},
hence P(A) = 0.6, P(B) = 0.8, P(A ⋂ B) = 0.5.

P(background in at least 1 subject) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.6 + 0.8 − 0.5 = 0.9

P(background in exactly 1 subject) = P(A ∩ B^c) + P(B ∩ A^c) = P(A ∪ B) − P(A ∩ B) = 0.9 − 0.5 = 0.4
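The same arithmetic can be written out in code. This sketch uses exact fractions to avoid floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

p_A = Fraction(6, 10)        # statistics background
p_B = Fraction(8, 10)        # calculus background
p_A_and_B = Fraction(5, 10)  # both

p_at_least_one = p_A + p_B - p_A_and_B       # addition law
p_exactly_one = p_at_least_one - p_A_and_B   # drop the overlap from the union
print(float(p_at_least_one), float(p_exactly_one))
```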

21
Calculating probability

The method depends on the nature of the sample space Ω: finite versus infinite

When the sample space is finite (i.e., Ω = {ω1, ω2, · · · , ωn}):

● Let P({ωi}) = pi, i = 1, 2, · · · , n, where n ≥ 1, called the cardinality of Ω, is a finite positive integer denoting the total # of outcomes in Ω

● Computing P(A) is straightforward due to the third rule of probability (additivity over disjoint events): $P(A) = \sum_{i:\, \omega_i \in A} p_i$

Simply add the probabilities of all the outcomes ωi in the event A!


22
Sample spaces with equally likely outcomes

Many experiments have outcomes equally likely to occur, e.g., coin toss, dice throw, birthday date of a
selected student, ...

Counting method:

For an experiment satisfying

● The sample space is finite
● The n outcomes are equally likely to occur,

the probability of any event A is: $P(A) = \frac{\text{number of outcomes in } A}{n}$

23
Examples

Question:

Coin flip: Ω = {HH, HT, TH, TT}, with cardinality N = 4

Let A denote the event that at least 1 head is observed

A = {HH, HT, TH}

Solution:

A = {HH, HT, TH} = {HH} ∪ {HT} ∪ {TH}

P(A) = P({HH}) + P({HT}) + P({TH})

For a fair coin,

P({HH}) = P({HT}) = P({TH}) = P({TT}) = ¼

Then,

P(A) = ¼ + ¼ + ¼ = ¾ = (number of outcomes in A) / N
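For a finite sample space with equally likely outcomes, the counting method reduces to dividing two set sizes. A minimal sketch (the `prob` helper is an illustrative name):

```python
from fractions import Fraction
from itertools import product

def prob(event, omega):
    """P(A) = |A| / |Ω| for a finite sample space with equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

omega = {"".join(o) for o in product("HT", repeat=2)}
at_least_one_head = {o for o in omega if "H" in o}
print(prob(at_least_one_head, omega))  # 3/4
```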

24
Counting methods

In reality, many experiments may be associated with quite large cardinality n

Counting all the outcomes (i.e., cardinality of Ω) is not an easy task; obtaining the cardinality of event A is
also prohibitive

Example: the sample space for 50 coin tosses.

Several counting methods or systematic ways for enumeration will be introduced

● Especially useful in computing probabilities when the sample space is finite
● Some of the ideas can be borrowed or generalized to experiments with infinite sample spaces

25
The multiplication principle

Decompose an experiment into a sequence of simpler experiments

Multiplication principle & its extension:

● If one experiment has m > 0 outcomes and another experiment has n > 0 outcomes, then there are m × n possible outcomes for the two experiments
● If there are p > 2 experiments, where the first experiment has n1 possible outcomes, the second n2, · · · , the pth np possible outcomes, then there are a total of n1 × n2 × · · · × np possible outcomes for the p experiments

Example: A coin toss has 2 outcomes: {H, T}. 50 coin tosses have 2 × 2 × … × 2 = 2^50 outcomes.
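The multiplication principle is a product of stage sizes, which makes it easy to compute even when the sample space is far too large to list. A sketch, with `count_outcomes` as an illustrative helper name:

```python
from itertools import product
from math import prod

def count_outcomes(sizes):
    """Multiplication principle: n1 * n2 * ... * np."""
    return prod(sizes)

print(count_outcomes([2] * 50))   # number of outcomes for 50 coin tosses

# Brute-force check on a case small enough to enumerate.
small = list(product("HT", repeat=5))
print(len(small) == count_outcomes([2] * 5))
```

Enumerating all outcomes for 50 tosses is impractical, which is why only the count is computed in that case.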

26
Examples

1. Draw a card from a deck:
a. 52 cards/outcomes,
b. or 2 experiments with m = 4 and n = 13 outcomes (4 suits and 13 face values). 4 x 13 = 52.

2. Sample 3 cars for defects:
a. 3 experiments with n1 = n2 = n3 = 2 (normal or defective) outcomes
b. Single experiment of 2 x 2 x 2 = 8 outcomes

3. Singapore Sweep on first Wednesday of every month:
a. 7 experiments with 10 outcomes each
b. 10^7 = 10,000,000 possible outcomes

27
Permutations and combinations

Interested in the number of ways one can select a subset of size r from a group of n distinct/distinguishable objects {c1, …, cn}.

The number of outcomes, i.e. the sample space cardinality, depends on
● Whether we are allowed to get duplicated objects (i.e. sampling with replacement vs without)
● Whether the sequence of the selected r objects matters

28
Permutations
When ordering matters:

Definition
A permutation is an ordered arrangement of objects. Selecting a sample of size r (= 0, 1, …, n) from a set of n objects, there are
● n^r permutations under sampling with replacement
● $\frac{n!}{(n-r)!} = n(n-1)\cdots(n-r+1)$ permutations under sampling without replacement

Recall:

● n! is called n factorial
● When n is a positive integer, n! = n(n − 1)(n − 2)· · · 1. We have the convention 0! = 1.
● The total number of permutations of n distinct objects is n!/(n−n)! = n!
29
Examples

Refer to the picture earlier on M&M’s, how many different ordered arrangements of 4 M&M’s selected
from 6 M&M’s of different colors are there?

Solution: Here, n = 6 distinct M&M’s colors; r = 4 selected M&M’s colors (subset size). Note that the
order of the 4 colors matters and the M&M’s cannot be duplicated (i.e., sampling without replacement).
The number of permutations is

₆P₄ = 6!/(6−4)! = 6!/2! = 6 x 5 x 4 x 3 = 360
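The count of ordered selections can be checked both by formula and by direct enumeration. In this sketch the colour labels are hypothetical placeholders (the slides do not list them); `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations, product
from math import factorial, perm

n, r = 6, 4
# Hypothetical colour labels, used only to have 6 distinct objects.
colors = ["red", "orange", "yellow", "green", "blue", "brown"]

without_replacement = perm(n, r)   # n! / (n - r)!
with_replacement = n ** r          # n^r

# Enumeration agrees with the formulas.
print(without_replacement, len(list(permutations(colors, r))))
print(with_replacement, len(list(product(colors, repeat=r))))
```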

30
Examples

Question:

There are 10 teaching assistants (TAs) in a module grading a test. The test consists of 5 questions. Each TA is assigned to grade at most 1 question. How many ways can the TAs be chosen for grading?

Solution:

Assign 5 of the 10 TAs to the 5 different question slots.

● Order matters as questions are different
● Sampling without replacement as each TA can only mark 1 question at most

10!/(10−5)! = 10!/5! = 10(9)(8)(7)(6) = 30240

31
Quiz

Question:

There are n rewards to be distributed randomly by a teacher to n students so that each student gets 1 reward. Suppose that 1 of the rewards is the top prize. The students will queue up to receive a reward from the teacher. Should a student queue first so as to increase his/her chance of getting the top prize?

32
Quiz

Question:

There are n rewards to be distributed randomly by a teacher to n students so that each student gets 1 reward. Suppose that 1 of the rewards is the top prize. The students will queue up to receive a reward from the teacher. Should a student queue first so as to increase his/her chance of getting the top prize?

Solution:

● There are n! ways to distribute the rewards in their various orders
● Each ordering is equally likely to happen
● Assume a student queues at the i-th position. What is the probability that the i-th position of the reward sequence is also the top prize?

P(top prize at i-th) = # sequences with top prize at i-th / # sequences
= # ways to distribute n−1 rewards into the other n−1 positions / n!
= (n−1)!/n! = 1/n

The probability does not depend on i, so queueing first gives no advantage.
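The 1/n result can be confirmed by enumerating every ordering for a small n. This is a sketch under the slide's assumption that all n! orderings are equally likely; reward 0 stands in for the top prize and the function name is illustrative.

```python
from fractions import Fraction
from itertools import permutations

def top_prize_probabilities(n):
    """P(top prize lands at queue position i) for i = 0..n-1, by enumeration."""
    rewards = range(n)                  # reward 0 plays the role of the top prize
    orderings = list(permutations(rewards))
    return [
        Fraction(sum(1 for order in orderings if order[i] == 0), len(orderings))
        for i in range(n)
    ]

print(top_prize_probabilities(5))   # every position has probability 1/5
```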

33
Combinations

Sometimes, we may be no longer interested in how the objects are arranged, but in the constituents of
the subset. For instance, we do not care about the ordering of M&M’s colors.

When ordering does not matter:

Definition
A combination is an unordered arrangement/collection of objects. For a set of n distinct objects and a subset of size r, there are

$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$

different combinations under sampling without replacement when r = 0, 1, …, n.

34
Examples

Question:

In the same M&M example as earlier, how many different combinations of 4 M&M’s selected from 6 M&M’s of different colors are there?

Solution:

Here, n = 6 distinct M&M’s (group size), r = 4 selected M&M’s (subset size). The order of the 4 colors does not matter and we cannot duplicate the M&M’s (i.e., sampling without replacement). The number of combinations is

$\binom{6}{4} = \frac{6!}{4!\,2!} = 15$

Note: this is much smaller than 360, the number of permutation outcomes
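The number of unordered selections can be computed three equivalent ways: by formula, by enumeration, and by dividing the permutation count by the r! orderings of each subset. A short sketch (`math.comb` and `math.perm` require Python 3.8 or later):

```python
from itertools import combinations
from math import comb, factorial, perm

n, r = 6, 4
n_combinations = comb(n, r)                  # n! / (r! (n - r)!)
print(n_combinations)                        # 15
print(len(list(combinations(range(n), r))))  # same count by enumeration
print(perm(n, r) // factorial(r))            # each subset has r! orderings
```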

35
Binomial coefficients

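Binomial Coefficients & The Binomial Theorem

$\binom{n}{r}$ is also called the binomial coefficient as it occurs in the expansion

$(a+b)^n = \sum_{r=0}^{n} \binom{n}{r} a^r b^{n-r}$

When a = 1, b = 1,

$2^n = \sum_{r=0}^{n} \binom{n}{r}$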

36
Binomial coefficients
(a+b)^n = (a+b)(a+b)...(a+b)

Relate the binomial expansion process to the selection process:

● Each of the n brackets gives either a or b in the expansion ⇔ n objects are split into 2 classes: selected (a), unselected (b)
● a^r b^(n−r) ⇔ exactly r objects are selected
● Number of terms a^r b^(n−r) ⇔ number of ways to have exactly r objects selected ⇔ $\binom{n}{r}$

For example, when n = 6 and r = 4, the coefficient for a^4 b^2 is $\binom{6}{4} = 15$.

One possibility, for instance, is a·a·a·a·b·b, i.e. taking a from the first 4 brackets and b from the last 2.
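The link between expanding brackets and selecting objects can be made concrete by listing every way of picking a or b from each of the n brackets and counting how many picks contain exactly r copies of a. A sketch for n = 6 (variable names are illustrative):

```python
from itertools import product
from math import comb

n = 6
# Expand (a+b)^n by picking either "a" or "b" from each of the n brackets.
terms = list(product("ab", repeat=n))
coefficients = {r: sum(1 for t in terms if t.count("a") == r)
                for r in range(n + 1)}

print(coefficients[4])                                            # 15
print(all(coefficients[r] == comb(n, r) for r in range(n + 1)))   # True
print(sum(coefficients.values()) == 2 ** n)                       # True
```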

37
Multinomial coefficients

What about a higher number of classes k ≥ 3?

Classes: Classes 1, 2, · · · , k with n1, n2, · · · , nk objects

Example: 6 M&M’s, k = 3 groups, n1 = 3, n2 = 2 and n3 = 1:

$\binom{6}{3}\binom{3}{2}\binom{1}{1} = \frac{6!}{3!\,2!\,1!} = 60$

38
Multinomial Coefficients
Multinomial Coefficients & The Multinomial Theorem
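The number of ways to assign n distinct objects into k distinct classes with ni objects in the i-th class, i = 1, 2, · · · , k, where n1 + n2 + · · · + nk = n, is

$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!}$

which is called the multinomial coefficient as it occurs in the expansion

$(x_1 + x_2 + \cdots + x_k)^n = \sum \binom{n}{n_1, n_2, \ldots, n_k} x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}$

where the sum is over all nonnegative integers n1, n2, …, nk such that n1 + n2 + ... + nk = n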

Remark: The assignment/sampling is without replacement as each object is classified to exactly 1 class
39
Examples

Question: How many ways are there to give 2 M&M’s each to 2 kids with 6 M&M colours?

Solution: Bucketing 6 M&M’s into 3 groups of 2 each (kid 1, kid 2, not given). Ordering of colours does not matter. Hence,

$\frac{6!}{2!\,2!\,2!} = 90$

Question: What is the coefficient of x^2 y^2 z^3 in the expansion of (w+x+y+z)^7?

Solution: Multinomial expansion with k = 4 classes and nw = 0, nx = ny = 2 and nz = 3. Hence,

$\frac{7!}{0!\,2!\,2!\,3!} = 210$
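Python's standard library has no multinomial function, so the sketch below defines a small helper (`multinomial` is an illustrative name) directly from the factorial formula and applies it to the class sizes used in these examples.

```python
from math import factorial

def multinomial(*class_sizes):
    """n! / (n1! n2! ... nk!) with n = n1 + ... + nk."""
    result = factorial(sum(class_sizes))
    for size in class_sizes:
        result //= factorial(size)   # stays an exact integer at every step
    return result

print(multinomial(3, 2, 1))      # 6 objects into classes of size 3, 2, 1
print(multinomial(2, 2, 2))      # kid 1, kid 2, not given
print(multinomial(0, 2, 2, 3))   # exponents of w, x, y, z
```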

40
Conditional probability
Definition
Let A & B be two events with P(B) > 0. The conditional probability of A given B is defined to be

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

● “Given B”: an event B has already occurred
● Reduced sample space ⇔ Only outcomes in B, and none in B^c ⊂ Ω, can occur
○ the sample space for this “new” experiment becomes B rather than Ω and has a possibly smaller cardinality
○ A may be easier to understand once B has occurred, as B contains no more outcomes than Ω

41
Conditional probability

● There need not be a causal or temporal relationship between A and B.


○ Example: What is the conditional probability that a selected person has height ≥ 170 cm given that this person weighs ≥ 80 kg? Weight and height are related, but larger weight does not cause larger height.

● P(A|B) may or may not be equal to P(A)


● In general, P(A|B) (the conditional probability of A given B) is not equal to P(B|A).

● Conditional probabilities can be correctly reversed using Bayes’ rule.

42
Examples

Suppose individuals who bought digital camera on Amazon were recommended memory card and spare
batteries during checkout. 60% bought a memory card, 40% bought a spare battery, and 30% bought
both.

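For a random buyer, let A be the event a memory card is purchased and B be the event a spare battery is bought. Then P(A) = 0.6, P(B) = 0.4 and P(A ∩ B) = 0.3, so

P(A|B) = P(A ∩ B)/P(B) = 0.3/0.4 = 0.75

P(A^c|B) = P(A^c ∩ B)/P(B) = (0.4 − 0.3)/0.4 = 0.25

Note: P(A|B) ≠ P(A), and P(A|B) + P(A^c|B) = 1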


43
Quiz

A bin contains 25 light bulbs, of which 5 are in good condition and function at least 30 days. 10 are
partially defective and will fail in the second day of use, while the rest are totally defective and will not
light up at all. Given that a light bulb lights up, what is the probability that it will still continue to light up
after 1 week?

44
Quiz

A bin contains 25 light bulbs, of which 5 are in good condition and function at least 30 days. 10 are
partially defective and will fail in the second day of use, while the rest are totally defective and will not
light up at all. Given that a light bulb lights up, what is the probability that it will still continue to light up
after 1 week?

Let G be the event that the light bulb is in good condition (will work ≥ 30 days), and T be the event that the randomly chosen bulb is totally defective. So T^c is the event that the light bulb is in good condition or partially defective, i.e. that it lights up. We want to find:

P(G|T^c) = P(G ∩ T^c)/P(T^c) = P(G)/P(T^c) = (5/25) / (15/25) = 1/3, since G ⊂ T^c

Note: P(G|T^c) ≠ P(G) = 5/25

45
Quiz

Given that a fair coin is flipped twice, what is the probability of obtaining 2 heads given that the first flip
is a head?

Ω = {HH, HT, TH, TT}

46
Quiz

Given that a fair coin is flipped twice, what is the probability of obtaining 2 heads given that the first flip
is a head?

Ω = {HH, HT, TH, TT}

Let A = {HH}, B = {HH, HT}

P(A|B) = P(A ∩ B)/P(B) = ¼ / ½ = ½

Or

Since B has already occurred, the sample space is just B, and A is one out of 2 equally likely outcomes, hence ½.
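Both routes to the answer (the ratio definition and the reduced sample space) amount to counting outcomes, which the following sketch does for the two-flip experiment. The helper names `prob` and `cond_prob` are illustrative.

```python
from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}   # fair coin flipped twice, equally likely

def prob(event):
    return Fraction(len(event), len(omega))

def cond_prob(A, B):
    """P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if prob(B) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(A & B) / prob(B)

A = {"HH"}                                # two heads
B = {o for o in omega if o[0] == "H"}     # first flip is a head
print(cond_prob(A, B))                    # 1/2
```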

47
Multiplication law
Multiplication law
Let A and B be two events with P(B) > 0. Then,

$P(A \cap B) = P(A|B)\,P(B)$

● When P(B) and P(A|B) are available or can be easily computed, P(A ∩ B) can be obtained as a product
● An alternative formula: When P(A) > 0 and P(B|A) are available, P(A ∩ B) = P(A)P(B|A)
● In practice, for any complex event representable as an intersection of 2 events, its probability can be computed in 2 ways

48
Examples

Four individuals responded to a request to donate blood. Only type O+ is required. Suppose we do not know their blood types, only that exactly one of them has the required type. What is the probability that we have to test the blood type of at least 3 of the individuals before we find O+?

Let, B = {1st type test is not O+}, A = {2nd type test is not O+}.

Then P(B) = ¾ , and P(A|B) = ⅔. Multiplication law yields,

P(At least 3 individuals typed) = P(A ⋂ B) = P(B)P(A|B) = ¾ x ⅔ = ½

49
Law of total probability
Definition
A collection of events B1, B2, · · · , Bn is called a partition of size n if

● B1, B2, · · · , Bn are mutually exclusive & exhaustive events
● The simplest partition of size 2 including any event B is {B, B^c}

Law of total probability

Let B1, B2, · · · , Bn be a partition with P(Bi) > 0 for all i. Then, for any event A

$P(A) = \sum_{i=1}^{n} P(A|B_i)\,P(B_i)$

50
Law of total probability

E.g. partition size n = 4:

Idea:

● Break event A into smaller events, A ∩ B1, ..., A ∩ B4
● Compute probability of each by multiplication law

51
Examples

In a factory, 40% of goods come from line 1 and 60% from line 2. Line 1 has a defect rate of 8% and line 2
10%. If an item from the factory is chosen at random, what is the probability that it will not be defective?

P(not defective) = P(line 1)P(not defective | line 1) + P(line 2)P(not defective | line 2)

= 0.4 x (1 − 0.08) + 0.6 x (1 − 0.1) = 0.368 + 0.54 = 0.908

52
Quiz

A chain video store sells 3 brands of DVD players. Of its sales, 50% are brand 1 and 30% are brand 2. Each
brand has 1 year warranty, and it is known that 25% of brand 1 requires warranty work, while brand 2 and
3 are 20% and 10% respectively. What is the probability that a randomly chosen purchaser will require
repair under warranty?

53
Quiz

A chain video store sells 3 brands of DVD players. Of its sales, 50% are brand 1 and 30% are brand 2. Each
brand has 1 year warranty, and it is known that 25% of brand 1 requires warranty work, while brand 2 and
3 are 20% and 10% respectively. What is the probability that a randomly chosen purchaser will require
repair under warranty?

Let Ai = {brand i is purchased}, for i = 1, 2, 3, and B = {needs repair}

P(B) = ∑iP(Ai)P(B|Ai) = (0.5 x 0.25) + (0.3 x 0.2) + (0.2 x 0.1) = 0.205
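The law of total probability is a weighted sum over the partition, which the sketch below computes with exact fractions. The function and dictionary names are illustrative; brand 3's share of 20% is the remainder after brands 1 and 2.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(B) = sum_i P(A_i) P(B | A_i) over a partition A_1..A_n."""
    if sum(priors.values()) != 1:
        raise ValueError("priors must sum to 1 over the partition")
    return sum(priors[k] * likelihoods[k] for k in priors)

brand_share = {1: Fraction(50, 100), 2: Fraction(30, 100), 3: Fraction(20, 100)}
repair_rate = {1: Fraction(25, 100), 2: Fraction(20, 100), 3: Fraction(10, 100)}

p_repair = total_probability(brand_share, repair_rate)
print(float(p_repair))   # 0.205
```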

54
Examples

● A tree diagram is a handy tool for computing probabilities in experiments comprising several stages / generations
● Components / ingredients in a tree diagram:
○ Nodes and branches: one generation per stage, with one branch per possible outcome at that stage
○ Probabilities attached to each branch

● In the DVD example:
○ First generation corresponds to the DVD brand (3 branches)
○ Second generation corresponds to the possibilities of needing repair (2 branches)
○ Attach probabilities to each branch in first generation
○ Attach conditional probabilities to second generation

55
56
Bayes rule
Bayes rule
Let B1, B2, · · · , Bn be a partition with P(Bi) > 0 for all i. Then, for any event A with P(A) > 0,

$P(B_j|A) = \frac{P(A|B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A|B_i)\,P(B_i)}$

● Numerator: P(A ∩ Bj); Denominator: P(A)
● Reverse chronological order: usually Bj happens before A in time, i.e. given that A has occurred, what is likely to have happened before it?
● As Bj is the event of interest, it provides a hint that one needs to look for a partition containing Bj s.t. the n probabilities on the RHS are available or easily computed

57
Examples

Refer to DVD example again: Suppose a customer returns to store asking for warranty work, what is the
probability that it is brand 1 or 2 or 3 player?

From previous example, we know P(needs warranty work) = P(B) = 0.205

P(A1|B) = P(A1)P(B|A1) / P(B) = 0.5 x 0.25 / 0.205 ≈ 0.61

P(A2|B) = P(A2)P(B|A2) / P(B) = 0.3 x 0.2 / 0.205 ≈ 0.29

P(A3|B) = P(A3)P(B|A3) / P(B) = 0.2 x 0.1 / 0.205 ≈ 0.10

Note: P(A1|B) + P(A2|B) + P(A3|B) = 1
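Bayes' rule normalises the products P(Ai)P(B|Ai) so that they sum to 1. The sketch below (with `posterior` as an illustrative helper name) reproduces the DVD posteriors exactly and then reuses the same function on a second set of inputs: shares 0.6, 0.3, 0.1 with failure rates 0.2, 0.1, 0.3.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' rule: P(A_j | B) = P(A_j) P(B | A_j) / sum_i P(A_i) P(B | A_i)."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())          # law of total probability
    return {k: joint[k] / evidence for k in joint}

brand_share = {1: Fraction(50, 100), 2: Fraction(30, 100), 3: Fraction(20, 100)}
repair_rate = {1: Fraction(25, 100), 2: Fraction(20, 100), 3: Fraction(10, 100)}
post = posterior(brand_share, repair_rate)
for brand, p in post.items():
    print(brand, p, round(float(p), 2))

# Same helper, different inputs (three options with shares 0.6 / 0.3 / 0.1).
app_share = {"A": Fraction(6, 10), "B": Fraction(3, 10), "C": Fraction(1, 10)}
fail_rate = {"A": Fraction(2, 10), "B": Fraction(1, 10), "C": Fraction(3, 10)}
print(posterior(app_share, fail_rate)["A"])   # 2/3
```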

58
Quiz

People are trying to get home at peak hour. As seasoned consumers, everyone opens their preferred ride
hailing app to book a ride. App A is used 60% of the time as it is more popular, being cheaper and
advertises more. App B is used 30% of the time, and C the remaining. Suppose you know that app A will
not find a car/cab 20% of the time, 10% for B and 30% for C. You observe that your friend failed to get a
ride. What is the probability that your friend used app A?

59
Quiz

People are trying to get home at peak hour. As seasoned consumers, everyone opens their preferred ride
hailing app to book a ride. App A is used 60% of the time as it is more popular, being cheaper and
advertises more. App B is used 30% of the time, and C the remaining. Suppose you know that app A will
not find a car/cab 20% of the time, 10% for B and 30% for C. You observe that your friend failed to get a
ride. What is the probability that your friend used app A?

Let F be the event of failing to get a ride. Then,

P(A|F) = P(F ⋂ A) / P(F) = P(F|A)P(A) / [ P(F|A)P(A) + P(F|B)P(B) + P(F|C)P(C) ]

= (0.6 x 0.2) / (0.6 x 0.2 + 0.3 x 0.1 + 0.1 x 0.3) = 0.12 / 0.18 ≈ 0.667

60
Independence

● An important concept in probability & statistics, based on conditional probabilities
● Recall: The definition of conditional probability enables us to revise the probability P(A) originally assigned to A when we are subsequently informed that another event B has occurred; the “new” probability of A is P(A|B)
○ P(A) ≟ P(A|B)

● Intuitively, P(A|B) would be different from P(A) unless knowing B does not tell us anything about A (i.e., “occurrence of B has nothing to do with occurrence of A”).

61
Independence
Definition
Events A and B are said to be independent if
● P(A ∩ B) = P(A)P(B), or equivalently
● P(A) = P(A|B), or equivalently
● P(B) = P(B|A),
Otherwise they are said to be dependent

● In general, multiplication law P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B) is always true for any intersection;
the first formula above, P(A ∩ B) = P(A)P(B), is a special case following from independence of A & B.
● Independence & disjointness are 2 different concepts
○ conclude disjointness from Venn diagram (no probabilities involved)
○ independence is defined in terms of probabilities
○ disjointness means that P(A|B) = 0 ⇒ dependence as long as P(A) ≠ 0 and P(B) ≠ 0.

62
Independence
Independence of 2 events
If A & B are independent, then so are A & B^c, A^c & B, and A^c & B^c.

Definition
Three events A, B, and C are said to be mutually independent if all the 4 conditions hold:
● P(A ∩ B ∩ C) = P(A)P(B)P(C)
● P(A ∩ B) = P(A)P(B)
● P(A ∩ C) = P(A)P(C)
● P(B ∩ C) = P(B)P(C)

● Under mutual independence, A is independent of any event formed by B and C
● Three events are pairwise independent if the last 3 conditions hold
63
Examples

Drawing cards from a deck:

Let A = {is an ace}, D = {is a diamond}. P(A) = 4/52 = 1/13, P(D) = 13/52 = 1/4

P(A ∩ D) = 1/52 = 1/13 x 1/4 = P(A)P(D)

A and D are independent

Toss 2 coins:

Let A = {1st coin H}, B = {2nd coin H}, C = {only 1 H}

P(B|A) = P(B) since first coin toss does not influence second coin toss. I.e. A and B are independent.

Suppose coins are fair (i.e. P(A) = P(B) = 0.5) then

P(C) = P({HT, TH}) = 2/4 = 0.5

P(A|C) = P(A ∩ C)/P(C) = P({HT}) / P({HT, TH}) = 0.25/0.5 = 0.5 = P(A), i.e. A and C are independent.
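The two-coin events also illustrate the gap between pairwise and mutual independence. The sketch below, assuming fair coins and using illustrative helper names, checks the product condition for each pair and then for all three events together.

```python
from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}   # two fair coins, equally likely outcomes

def prob(event):
    return Fraction(len(event), len(omega))

def independent(E, F):
    """Product condition P(E ∩ F) = P(E) P(F)."""
    return prob(E & F) == prob(E) * prob(F)

A = {o for o in omega if o[0] == "H"}         # 1st coin H
B = {o for o in omega if o[1] == "H"}         # 2nd coin H
C = {o for o in omega if o.count("H") == 1}   # only 1 H

pairwise = independent(A, B) and independent(A, C) and independent(B, C)
mutual = pairwise and prob(A & B & C) == prob(A) * prob(B) * prob(C)
print(pairwise, mutual)   # True False
```

Here A ∩ B ∩ C is empty, so its probability is 0 rather than ⅛: the three events are pairwise independent but not mutually independent.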
64
Examples

Toss 2 dice:

Example 1

Let A6 = {sum 2 dice gives 6} and B = {first die gives 4}

A6 = {1:5, 2:4, 3:3, 4:2, 5:1}, and B = {4:1, 4:2, 4:3, 4:4, 4:5, 4:6}, A6 ∩ B = {4:2}

1/36 = P(A6 ∩ B) ≠ P(A6)P(B) = 5/36 x 1/6

⇒ A6 and B are dependent

Example 2

Let A7 denote sum of 2 dice is 7

P(A7) = P({1:6, 2:5, 3:4, 4:3, 5:2, 6:1}) = 6/36 = ⅙, and P(A7 ∩ B) = P({4:3}) = 1/36

1/36 = P(A7 ∩ B) = P(A7)P(B) = ⅙ x ⅙

⇒ A7 and B are independent
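Both dice examples can be verified by enumerating the 36 equally likely outcomes and testing the product condition. A minimal sketch, assuming fair dice (helper names are illustrative):

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # (first die, second die)

def prob(event):
    return Fraction(len(event), len(omega))

def independent(E, F):
    """Product condition P(E ∩ F) = P(E) P(F)."""
    return prob(E & F) == prob(E) * prob(F)

B = {o for o in omega if o[0] == 4}       # first die gives 4
A6 = {o for o in omega if sum(o) == 6}    # sum is 6
A7 = {o for o in omega if sum(o) == 7}    # sum is 7

print(independent(A6, B), independent(A7, B))   # False True
```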

65
