Lecture 9 - Probability COMP7180
Why Probability: An Example
• Suppose you are trying to determine if a patient has inhalational
anthrax. You observe the following symptoms: a cough, a fever, and
difficulty in breathing.
• You would like to determine how likely the patient is infected with
inhalational anthrax given that the patient has a cough, a fever, and
difficulty in breathing;
• Now suppose you order an x-ray and observe that the patient has a
wide mediastinum;
• In the previous slides, what you observed affected your belief that the
patient is infected with anthrax;
What is Probability
• A probability can be regarded as a function that assigns a value to every event.
Example: Toss a coin (1 time). Then, the outcome is H or T, where H is
the head of a coin and T is the tail of a coin.
The domain should satisfy some special properties:
Example: Toss a coin (1 time). Then, the outcome is H or T, where H is
the head of a coin and T is the tail of a coin.
Then S= { H, T}; The domain is { {H,T}, {H}, {T}, ∅}.
{H,T}, {H}, {T}, ∅ are called events.
• S and ∅ must be events;
• S^C = ∅; {H}^C = {T}; {T}^C = {H}; ∅^C = S (the superscript C denotes the complement);
• S ∩ ∅=∅; S∩{H}={H}; S∩{T}={T}; {H}∩{T}=∅;
• {H}∪{T}=S; {H} ∪ ∅={H}; {T} ∪ ∅={T}.
As a function, a probability should have a range. What is the range?
Example. Toss a coin (1 time). There are outcomes: H and T, where H is
the head of a coin and T is the tail of a coin.
Probability is a special function, which should satisfy some properties:
• P(S)=1; P(∅)=0; 0≤P(E) ≤ 1;
• P({H,T})=1; P(∅)=0;
• P({H}∪{T}) = P({H})+P({T})-P(∅).
• How to understand P(E∪F) = P(E)+P(F)-P(E∩F)?
[Figure: Venn diagram of two overlapping events E and F, with the overlap labeled E ∩ F]
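One way to read the formula: adding P(E) and P(F) counts the overlap E ∩ F twice, so it is subtracted once. The following is a minimal sketch that checks this on a small example. The fair six-sided die and the events `E` and `F` are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

# Sample space of one fair die roll (an illustrative assumption).
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(event), len(S))

E = {2, 4, 6}  # hypothetical event: "the roll is even"
F = {4, 5, 6}  # hypothetical event: "the roll is at least 4"

lhs = prob(E | F)                      # P(E ∪ F)
rhs = prob(E) + prob(F) - prob(E & F)  # P(E) + P(F) - P(E ∩ F)
print(lhs, rhs)                        # both sides are 2/3
```

Using `Fraction` keeps the arithmetic exact, so the two sides can be compared with `==` without rounding issues.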
Random Variables
• Generally, it is very complex to represent an event;
• Example. Toss a coin (1 time). In the sample space S={ H, T}, we design a
function X: S→{1,-1} such that X(H)=1 and X(T)=-1. Then X is
a random variable.
Moreover, P(X=1) = P({H}) = 0.5 and P(X=-1) = P({T})=0.5.
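The coin example can be sketched in code by treating the random variable as an ordinary mapping from outcomes to numbers. The dictionary names `P` and `X` and the helper `prob_X_equals` are assumptions made for illustration.

```python
from fractions import Fraction

# Probability of each outcome in S = {H, T} for a fair coin.
P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# The random variable X: S -> {1, -1} from the example.
X = {"H": 1, "T": -1}

def prob_X_equals(value):
    """P(X = value): add up the probabilities of all outcomes that X maps to value."""
    return sum((p for s, p in P.items() if X[s] == value), Fraction(0))

print(prob_X_equals(1), prob_X_equals(-1))  # 1/2 1/2
```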
• For vector-valued variables, we would write the random variable as X and one
of its values as x, both in bold.
Probability Distributions
• A probability distribution is a description of how likely a random
variable or set of random variables is to take on each of its possible
states. The way we describe probability distributions depends on
whether the variables are discrete or continuous.
• 0 ≤ P(X = x) ≤ 1
• $\sum_x P(X = x) = 1$. We refer to this property as being normalized.
• $P(X = x_i) = \frac{1}{n}$; $\sum_{i=1}^{n} P(X = x_i) = \sum_{i=1}^{n} \frac{1}{n} = \frac{n}{n} = 1$
2) $P(a \le X \le b) = \int_a^b p_X(x)\, dx$;
3) $\int_{-\infty}^{+\infty} p_X(x)\, dx = 1$.
• Gaussian (normal) density: $p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)$
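As a rough numerical check of the density properties, the sketch below integrates the Gaussian density with a simple midpoint rule. The helper names `gaussian_pdf` and `integrate`, the integration limits, and the choice of mu = 0 and sigma = 1 are assumptions for illustration.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# The total probability should be close to 1; [-10, 10] stands in for the whole real line.
total = integrate(gaussian_pdf, -10.0, 10.0)

# P(-1 <= X <= 1) is the integral of the density over [-1, 1], roughly 0.68.
within_one_sigma = integrate(gaussian_pdf, -1.0, 1.0)

print(total, within_one_sigma)
```

The density value at a single point is not itself a probability; only integrals of the density over an interval are.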
Continuous Variables and PDF
• Continuous uniform distribution is one of the most important
continuous distributions.
• The probability density function of the continuous uniform distribution
on [a, b] can be written as $p(x; a, b) = \frac{1}{b - a}$ for $a \le x \le b$, and 0 elsewhere.
Exercise 1
Classify each random variable as either discrete or continuous
1. The number of applicants for a job. Answer: Discrete
2. The time between customers entering a checkout lane at a retail
store. Answer: Continuous
3. The temperature of a cup of coffee served at a restaurant. Answer: Continuous
4. The air pressure of a tire on an automobile. Answer: Continuous
5. The number of students who actually register for classes at a
university next semester. Answer: Discrete
Exercise 2
Determine whether or not the table is a valid probability distribution of
a discrete random variable. Explain fully.
• X = -2, 0, 2, 4. P(X=-2)=0.2, P(X=0) = 0.3, P(X=2) = 0.3, P(X=4)=0.2.
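A table is a valid discrete distribution when every probability lies in [0, 1] and the probabilities sum to 1. The sketch below applies these two checks to the table in Exercise 2. The function name `is_valid_pmf` is an assumption for illustration.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Check the two conditions: each probability is in [0, 1], and they sum to 1."""
    in_range = all(0 <= p <= 1 for p in pmf.values())
    normalized = sum(pmf.values()) == 1
    return in_range and normalized

# The table from Exercise 2, stored as exact fractions to avoid rounding error.
table = {-2: Fraction("0.2"), 0: Fraction("0.3"), 2: Fraction("0.3"), 4: Fraction("0.2")}

print(is_valid_pmf(table))  # True: all entries are in [0, 1] and 0.2 + 0.3 + 0.3 + 0.2 = 1
```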
Exercise 3
A discrete random variable X has the following probability distribution:
P(X=1) = 0.1, P(X=3)= 0.2, P(X=4) = 0.1, P(X=70)= 0.3, P(X=80) = 0.2.
What is P(X=90)?
P(X=90) = 1-P(X=1)-P(X=3)-P(X=4)-P(X=70)-P(X=80) = 0.1
What is P(X<70)?
P(X<70) = P(X=1)+P(X=3)+P(X=4) = 0.4
What is P(70≤X<90)?
P(70≤X<90) = P(X=70)+P(X=80) = 0.3+0.2 = 0.5
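The arithmetic in Exercise 3 can be reproduced with a short script. This is only a sketch; the variable names are assumptions, and it assumes 90 is the only value of X not listed in the table.

```python
from fractions import Fraction

# The probabilities given in Exercise 3.
known = {1: Fraction("0.1"), 3: Fraction("0.2"), 4: Fraction("0.1"),
         70: Fraction("0.3"), 80: Fraction("0.2")}

# Normalization fixes the one missing probability.
pmf = dict(known)
pmf[90] = 1 - sum(known.values())

p_less_70 = sum(p for x, p in pmf.items() if x < 70)
p_70_to_90 = sum(p for x, p in pmf.items() if 70 <= x < 90)

print(pmf[90], p_less_70, p_70_to_90)  # 1/10 2/5 1/2
```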
Exercise 4
• A standard die has six sides printed with little dots numbering
1, 2, 3, 4, 5, and 6. Assume that the die is fair, i.e., each of these
outcomes is equally likely. Since there are six possible outcomes, the
probability of obtaining any side of the die is 1/6. Now you throw
the die three times. What is the probability that at least
two throws have the same outcome?
• Solution: We are required to calculate P(A), where A stands for the
event that “at least two throws have the same outcome”. Then the event
~A is “all three throws have different outcomes”, and we have P(~A) =
(6*5*4)/(6*6*6) = 5/9, so P(A) = 1 - P(~A) = 4/9.
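The answer to Exercise 4 can be cross-checked by brute force, since there are only 6*6*6 = 216 equally likely outcomes. This is a sketch for verification, not the solution method used in the slides; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import product

# All 216 equally likely outcomes of three throws.
throws = list(product(range(1, 7), repeat=3))

# Event A: at least two throws show the same face, i.e. fewer than 3 distinct values.
repeated = [t for t in throws if len(set(t)) < 3]

p_A = Fraction(len(repeated), len(throws))
print(p_A)  # 4/9
```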
Exercise 5
• We are in the process of fine-tuning the parameters of a machine learning
algorithm, specifically two parameters: w1 and w2. It is established that
both w1 and w2 follow a uniform distribution within the range [0,1]. In
order for the algorithm to function optimally, it is crucial that the
difference between w1 and w2 does not exceed 1/4 (0.25). Now, the
question arises: What is the probability of the algorithm successfully
meeting this criterion and working effectively?
[Figure: plot in the (w1, w2) plane with 0.25 marked on an axis]
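A possible way to work Exercise 5 is geometric. Assuming w1 and w2 are independent (the exercise does not state this explicitly), the pair (w1, w2) is uniform on the unit square, and the probability is the area of the band |w1 - w2| ≤ 1/4. That band is the square minus two corner triangles with legs of length 3/4. The sketch below computes this area and cross-checks it on a grid; the variable names are assumptions.

```python
from fractions import Fraction

d = Fraction(1, 4)  # largest allowed value of |w1 - w2|

# Exact area: the unit square minus two corner triangles with legs of length 1 - d.
exact = 1 - (1 - d) ** 2

# Cross-check on an n-by-n grid of cell midpoints. The midpoints of cells i and j
# differ by exactly (i - j) / n, so only the index difference matters.
n = 400
hits = sum(1 for i in range(n) for j in range(n) if abs(i - j) / n <= d)
estimate = hits / (n * n)

print(exact, estimate)  # exact is 7/16; the grid estimate is close to 0.4375
```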
Exercise 6
• Given a line segment AD, we select two points, B and C, on AD. Here, B and C are
independently and uniformly distributed along AD. We then break the line segment
at points B and C, resulting in three new line segments. The task is to determine the
probability of these three new line segments being able to form a triangle.
• Solution: Assume that the coordinates of A, B, C, D are 0, x, y, 1, respectively.
Here we have 0 ≤ x, y ≤ 1.
• Let’s first assume x < y. Then the three new segments are AB, BC, and CD. Their
lengths are x, y-x, 1-y. To form a triangle, the following three inequalities should
be satisfied:
• |AB| + |BC| > |CD|, |AB| + |CD| > |BC|, |BC| + |CD| > |AB|,
• x + (y-x) > 1-y, x + (1-y) > y-x, (y-x) + (1-y) > x ➔ y > 1/2, y-x < 1/2, x <
1/2.
• In the unit square of (x, y) values, these inequalities describe a triangle with
vertices (0, 1/2), (1/2, 1/2), (1/2, 1), whose area is 1/8.
• P(x < y and form a triangle) = 1/8.
• Similarly, we have P(x > y and form a triangle) = 1/8.
• So P(new line segments being able to form a triangle) = 1/4.
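The result of Exercise 6 can be checked numerically by sweeping a grid over the unit square of (x, y) values and counting the points whose three segments satisfy the triangle inequalities. This is a sketch; `forms_triangle` and the grid size are assumptions for illustration.

```python
def forms_triangle(x, y):
    """True if breaking [0, 1] at x and y gives three segments that form a triangle."""
    lo, hi = min(x, y), max(x, y)
    a, b, c = lo, hi - lo, 1 - hi  # lengths of the three segments
    return a + b > c and a + c > b and b + c > a

# Evaluate the condition at the midpoints of an n-by-n grid over the unit square.
n = 400
hits = sum(
    forms_triangle((i + 0.5) / n, (j + 0.5) / n)
    for i in range(n)
    for j in range(n)
)
estimate = hits / (n * n)

print(estimate)  # close to 1/4
```

Taking `min` and `max` handles both cases x < y and x > y at once, which mirrors the "similarly" step in the solution.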
Joint Distribution
• In some practical cases, we need to consider multiple random variables.
• If random variables X, Y are continuous random variables, then the joint
distribution of X and Y is
P(a1≤X≤b1, a2≤Y≤b2)
• In fact, when X, Y are continuous random variables, there exists a
probability density function for the joint distribution:
$P(a_1 \le X \le b_1, a_2 \le Y \le b_2) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} p_{XY}(x, y)\, dy\, dx$.
Marginal Distribution
• If random variables X and Y are discrete random variables, then
$P(X = x) = \sum_y P(X = x, Y = y)$
$P(Y = y) = \sum_x P(X = x, Y = y)$
• If random variables X and Y are continuous random variables, then the marginal
distribution with respect to X is
$P(a \le X \le b) = \int_a^b \int_{-\infty}^{+\infty} p_{XY}(x, y)\, dy\, dx$
and the density function of X is
$p_X(x) = \int_{-\infty}^{+\infty} p_{XY}(x, y)\, dy$
Similarly, we can obtain the marginal distribution with respect to Y and the
density function of Y.
Joint Probability and Marginal Probability : An Example
• Joint probabilities can involve any number of variables
• For each combination of variables, we need to say how probable that
combination is
• The probabilities of these combinations need to sum to 1
• Once you have the joint probability distribution, you can calculate any
probability involving X, Y, and Z

X  Y  Z  P(X,Y,Z)
0  0  0  0.1
0  0  1  0.2
0  1  0  0.05
0  1  1  0.05
1  0  0  0.3
1  0  1  0.1
1  1  0  0.05
1  1  1  0.15
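The joint table can be stored as a dictionary, and any marginal probability is then a sum over the rows that match. The sketch below shows one way to do this; the names `NAMES`, `JOINT`, and `prob` are assumptions for illustration.

```python
from fractions import Fraction

NAMES = ("X", "Y", "Z")

# Joint distribution P(X, Y, Z), copied from the table, as exact fractions.
JOINT = {
    (0, 0, 0): Fraction("0.1"),
    (0, 0, 1): Fraction("0.2"),
    (0, 1, 0): Fraction("0.05"),
    (0, 1, 1): Fraction("0.05"),
    (1, 0, 0): Fraction("0.3"),
    (1, 0, 1): Fraction("0.1"),
    (1, 1, 0): Fraction("0.05"),
    (1, 1, 1): Fraction("0.15"),
}

def prob(**fixed):
    """Marginal probability: sum every row that agrees with the fixed values."""
    total = Fraction(0)
    for row, p in JOINT.items():
        assignment = dict(zip(NAMES, row))
        if all(assignment[name] == value for name, value in fixed.items()):
            total += p
    return total

print(prob())          # 1: the whole table sums to 1
print(prob(X=1, Y=1))  # 1/5, i.e. 0.2
print(prob(X=1))       # 3/5, i.e. 0.6
print(prob(Y=1))       # 3/10, i.e. 0.3
```

Variables that are not mentioned in the call are summed out, which is exactly the marginalization rule.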
▫ P(X=1, Y=1)?
▫ = P(X=1, Y=1, Z=1) + P(X=1, Y=1, Z=0) = 0.2
▫ P(X=1, Y=0)?
▫ = P(X=1, Y=0, Z=1) + P(X=1, Y=0, Z=0) = 0.4
▫ P(X=1)?
▫ = P(X=1, Y=1) + P(X=1, Y=0) = 0.6
• Solution:
P(Y=1) = P(X=1,Y=1,Z=1) + P(X=0,Y=1,Z=1) + P(X=1,Y=1,Z=0) + P(X=0,Y=1,Z=0)
= 0.15 + 0.05 + 0.05 + 0.05
= 0.3
Conditional Probability
• Given an event E and an event F with P(F) > 0, the conditional probability of E given F is
$P(E \mid F) = \frac{P(E \cap F)}{P(F)}$
• P(X=1 | Y=1) = P(X=1, Y=1) / P(Y=1) = 0.2/0.3 = 2/3
Conditional Distribution
• Conditional distributions seek to answer the question: what is the
probability distribution over Y when we know that X must take on a
certain value x?
P(X=1 | Y=1) = P(X=1, Y=1) / P(Y=1) = 0.2/0.3 = 2/3
P(Z=0 | X=1) = P(X=1, Z=0) / P(X=1) = 0.35/0.6 = 7/12
Exercise 8: Try to calculate the following by yourself:
P(Z=0 | X=1, Y=1)?
P(Y=0 | X=1, Z=1)?
• Solution
P(X=1,Y=1) = P(X=1,Y=1,Z=1) + P(X=1,Y=1,Z=0)
= 0.15 + 0.05 = 0.2
P(Z=0|X=1,Y=1) = P(X=1,Y=1,Z=0)/P(X=1,Y=1)
= 0.05/0.2 = 1/4
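The conditional probabilities in this part can be computed mechanically from the joint table by dividing two marginals. The following is a sketch under that definition; the names `JOINT`, `prob`, and `cond` are assumptions for illustration.

```python
from fractions import Fraction

NAMES = ("X", "Y", "Z")

# Joint distribution P(X, Y, Z) from the table, as exact fractions.
JOINT = {
    (0, 0, 0): Fraction("0.1"),  (0, 0, 1): Fraction("0.2"),
    (0, 1, 0): Fraction("0.05"), (0, 1, 1): Fraction("0.05"),
    (1, 0, 0): Fraction("0.3"),  (1, 0, 1): Fraction("0.1"),
    (1, 1, 0): Fraction("0.05"), (1, 1, 1): Fraction("0.15"),
}

def prob(fixed):
    """Marginal probability of the assignment in `fixed`, e.g. {"X": 1, "Y": 1}."""
    return sum(
        (p for row, p in JOINT.items()
         if all(row[NAMES.index(name)] == value for name, value in fixed.items())),
        Fraction(0),
    )

def cond(target, given):
    """P(target | given) = P(target and given) / P(given); requires P(given) > 0."""
    return prob({**target, **given}) / prob(given)

print(cond({"X": 1}, {"Y": 1}))          # 2/3
print(cond({"Z": 0}, {"X": 1}))          # 7/12
print(cond({"Z": 0}, {"X": 1, "Y": 1}))  # 1/4
print(cond({"Y": 0}, {"X": 1, "Z": 1}))  # 2/5
```

The last line evaluates the second question of Exercise 8: P(X=1, Y=0, Z=1) / P(X=1, Z=1) = 0.1 / 0.25.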
• If X and Y are continuous random variables, then
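$P(a_2 \le Y \le b_2 \mid a_1 \le X \le b_1) = \frac{\int_{a_1}^{b_1} \int_{a_2}^{b_2} p_{XY}(x, y)\, dy\, dx}{\int_{a_1}^{b_1} p_X(x)\, dx}$
• Chain rule of conditional probabilities:
$P(E_1 \cap E_2 \cap \dots \cap E_n) = P(E_1) \prod_{i=2}^{n} P(E_i \mid E_1 \cap E_2 \cap \dots \cap E_{i-1})$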
P(X=0, Y=1, Z=1) = 0.05;
P(X=0) = 0.4;
P(Y=1|X=0) = 0.1/0.4 = 1/4;
By chain rule, we obtain that
P(Z=1|X=0, Y=1)
= P(X=0, Y=1, Z=1)/(P(Y=1|X=0)P(X=0))
= 0.05/0.1 = 1/2
Chain Rule of Conditional Probabilities
P(Z=1|X=0,Y=0)?
• Solution
P(X=0, Y=0, Z=1) = 0.2;
P(X=0) = 0.4;
P(Y=0|X=0) = (0.1+0.2)/0.4 = 3/4;
By chain rule, we obtain that
P(Z=1|X=0, Y=0)
= P(X=0, Y=0, Z=1)/(P(Y=0|X=0)P(X=0))
= 0.2/0.3 = 2/3
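For three variables the chain rule reads P(X, Y, Z) = P(X) P(Y | X) P(Z | X, Y). The sketch below checks this factorization on every row of the joint table and recomputes the answer above; the names `JOINT`, `prob`, and `cond` are assumptions for illustration.

```python
from fractions import Fraction

NAMES = ("X", "Y", "Z")

# Joint distribution P(X, Y, Z) from the table, as exact fractions.
JOINT = {
    (0, 0, 0): Fraction("0.1"),  (0, 0, 1): Fraction("0.2"),
    (0, 1, 0): Fraction("0.05"), (0, 1, 1): Fraction("0.05"),
    (1, 0, 0): Fraction("0.3"),  (1, 0, 1): Fraction("0.1"),
    (1, 1, 0): Fraction("0.05"), (1, 1, 1): Fraction("0.15"),
}

def prob(fixed):
    """Marginal probability of the assignment in `fixed`."""
    return sum(
        (p for row, p in JOINT.items()
         if all(row[NAMES.index(name)] == value for name, value in fixed.items())),
        Fraction(0),
    )

def cond(target, given):
    """P(target | given); requires P(given) > 0."""
    return prob({**target, **given}) / prob(given)

# Chain rule check: P(x, y, z) = P(x) * P(y | x) * P(z | x, y) for every row.
chain_rule_holds = all(
    prob({"X": x}) * cond({"Y": y}, {"X": x}) * cond({"Z": z}, {"X": x, "Y": y}) == p
    for (x, y, z), p in JOINT.items()
)

answer = cond({"Z": 1}, {"X": 0, "Y": 0})
print(chain_rule_holds, answer)  # True 2/3
```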
Exercise 10
• A standard die has six sides printed with little dots numbering 1, 2, 3, 4, 5, and 6. Assume
that the die is fair, i.e., each of these outcomes is equally likely. Since there are six possible
outcomes, the probability of obtaining any side of the die is 1/6. Now Bob throws the
die three times. Assume that event A is “The outcome of the first throw is 1” and
event B is “The sum of the outcomes of the three throws is no less than 10”. Calculate P(B|A)
and show the detailed calculation procedure.
• Solution: P(B|A) = P(A,B)/P(A).
• The event (A and B) is “The outcome of the first throw is 1 and (at the same time) the
sum of the outcomes of the three throws is no less than 10”, which is equivalent to “The
outcome of the first throw is 1 and the sum of the outcomes of the second and the
third throws is no less than 9”.
• There are 10 situations that satisfy this condition: (outcome of the second throw, outcome
of the third throw) = (3,6), (4,5), (4,6), (5,4), (5,5), (5,6), (6,3), (6,4), (6,5), (6,6).
• There are 6*6 = 36 equally likely situations for the second and the third throws, so
P(A,B) = 10/216 and P(A) = 36/216. Therefore, P(B|A) = P(A,B)/P(A) = 10/36 = 5/18.
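The count of 10 favorable situations can be confirmed by enumerating all 216 outcomes of three throws. This is a brute-force sketch for verification; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import product

# All 216 equally likely outcomes of three throws.
throws = list(product(range(1, 7), repeat=3))

A = [t for t in throws if t[0] == 1]      # first throw is 1
A_and_B = [t for t in A if sum(t) >= 10]  # ...and the total is at least 10

p_A = Fraction(len(A), len(throws))
p_A_and_B = Fraction(len(A_and_B), len(throws))
p_B_given_A = p_A_and_B / p_A

print(len(A_and_B), p_B_given_A)  # 10 5/18
```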
Thank You!