BSTA 2104 Probability and Statistics II Notes Sep Dec 2024
Course Purpose
The course explores probability distributions of functions of random variables,
an essential skill in proving standard statistical results.
Course Content
For this unit, other than the notes provided here, use this website in tandem: https://www.probabilitycourse.com/preface.php
Probability Distributions
Probability distributions describe how the values of a random variable are distributed. The main types are:
1. Discrete Distributions
For example, the binomial distribution describes the number of successes
in a fixed number of independent Bernoulli trials.
2. Continuous Distributions
For example, the normal distribution describes data that clusters around
a mean.
Some Distributions
1. Binomial Distribution
• PMF: $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for $k = 0, 1, \dots, n$.
• Mean: $E[X] = np$
• Variance: $\text{Var}(X) = np(1 - p)$
2. Normal Distribution
• PDF: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
• Mean: $\mu$
Example
For a fair six-sided die, the PMF is $P(X = x) = \frac{1}{6}$ for $x = 1, 2, 3, 4, 5, 6$.
Example
The normal distribution has density
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \tag{3}$$
The standard normal distribution is the special case with $\mu = 0$ and $\sigma^2 = 1$.
3. Cumulative Distribution Function (CDF): the CDF gives the probability that X takes a value less than or equal to x.
$$F_X(x) = P(X \le x) \tag{4}$$
Mathematical Expectation
Mathematical expectation, also known as the expected value (EV) or mean, is
a fundamental concept in probability and statistics that provides a measure of
the central tendency of a random variable.
Notation
For a discrete random variable X, the expected value is denoted as E[X].
For a continuous random variable, it’s often represented as µ or E[X].
Properties of Expectation
1. Linearity of Expectation: If X and Y are random variables, then:
E[aX + bY ] = aE[X] + bE[Y ]
This holds true regardless of whether X and Y are independent.
2. Expectation of Constants: If c is a constant, then:
E[c] = c
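A quick numerical illustration of linearity (a simulation sketch in Python with NumPy; the distributions, coefficients, and sample size are illustrative choices, not from the notes). Note that X and Y are deliberately dependent here, yet the identity still holds:

```python
import numpy as np

rng = np.random.default_rng(0)

# Y is built from X, so X and Y are dependent.
x = rng.exponential(scale=2.0, size=1_000_000)  # E[X] = 2
y = 3 * x + rng.normal(size=x.size)             # E[Y] = 3*E[X] + 0 = 6

a, b = 5, -2
lhs = np.mean(a * x + b * y)           # Monte Carlo estimate of E[aX + bY]
rhs = a * np.mean(x) + b * np.mean(y)  # a E[X] + b E[Y]
print(lhs, rhs)                        # both close to 5*2 - 2*6 = -2
```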
To find the probability that a randomly selected adult has a height between 160 cm and 180 cm, we calculate:
$$P(160 \le X \le 180) = \int_{160}^{180} f(x) \, dx$$
For a fair six-sided die, the PMF is
$$P(X = x) = \frac{1}{6} \quad \text{for } x = 1, 2, 3, 4, 5, 6$$
1. Probability of a Specific Outcome
The probability of rolling a 3:
$$P(X = 3) = \frac{1}{6}$$
The CDF is $F(x) = P(X \le x)$. For independent discrete random variables X and Y,
$$P(X = x, Y = y) = P(X = x) \cdot P(Y = y)$$
• Covariance:
$$\text{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$$
• Correlation Coefficient:
$$\rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y},$$
which standardizes the covariance to a range between −1 and 1, indicating the strength and direction of the linear relationship.
$$E[X] = 10 \cdot \frac{1}{10} + 20 \cdot \frac{1}{5} + 0 \cdot \frac{7}{10} = 1 + 4 = \text{KES } 5.$$
Continuous Case
1. For a uniform distribution on the interval [0, 1]:
$$E[X] = \int_0^1 x \cdot 1 \, dx = \left[\frac{x^2}{2}\right]_0^1 = \frac{1}{2}.$$
2. For a Bernoulli (indicator) random variable X with $E[X] = \mu$: since X takes only the values 0 and 1, $X^2 = X$, so
$$E[X^2] = E[X]$$
Thus:
$$\text{Var}(X) = E[X^2] - (E[X])^2 = E[X](1 - E[X])$$
3. Prove that the expected value of the sum of independent random variables is the sum of their expected values.
Solution
Let $X_1, X_2, \dots, X_n$ be independent random variables:
$$E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]$$
This follows from linearity of expectation, which in fact holds even without independence. What if you have random variables X and Y? See the link https://www.youtube.com/watch?v=7KeV3wLw0_o
Practical Examples
4. A fair six-sided die is rolled. Define a random variable X as the outcome of the roll. What is E[X] and Var(X)?
Solution
$$P(X = x) = \frac{1}{6}, \quad x = 1, 2, 3, 4, 5, 6$$
$$E[X] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$
$$\text{Var}(X) = \frac{1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2}{6} - (3.5)^2 = \frac{91}{6} - 12.25 \approx 2.92$$
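The same computation as a short Python sketch (NumPy assumed):

```python
import numpy as np

faces = np.arange(1, 7)
p = np.full(6, 1 / 6)                 # fair die: P(X = x) = 1/6

mean = np.sum(faces * p)              # E[X] = 3.5
var = np.sum(faces**2 * p) - mean**2  # E[X^2] - (E[X])^2 = 91/6 - 12.25
print(mean, var)                      # 3.5  2.9166...
```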
5. If X is a random variable representing the number of heads in three coin
flips, find E[X] and Var(X).
Solution
X can take values 0, 1, 2, 3 with probabilities $\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}$:
$$P(X = 0) = \frac{1}{8}, \quad P(X = 1) = \frac{3}{8}, \quad P(X = 2) = \frac{3}{8}, \quad P(X = 3) = \frac{1}{8}$$
$$E[X] = 0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8} = \frac{3}{8} + \frac{6}{8} + \frac{3}{8} = \frac{12}{8} = 1.5$$
$$E[X^2] = \frac{0 + 3 + 12 + 9}{8} = \frac{24}{8} = 3, \qquad \text{Var}(X) = E[X^2] - (E[X])^2 = 3 - (1.5)^2 = 0.75$$
For example, if X ∼ Binomial(n = 10, p = 0.8), then $E[X] = np = 10 \times 0.8 = 8$.
Problem breakdown
Prove that for a discrete random variable X with probability mass func-
tion pX (x), the sum of pX (x) over all possible values of x is equal to 1.
Solution
The probability mass function (PMF) pX (x) of a discrete random variable
X is defined as:
pX (x) = P (X = x)
This represents the probability that X takes a specific value x.
Since X can take one of a countable set of values, say x1 , x2 , . . ., the total
probability for all possible values of X must sum to 1:
$$\sum_{x_i} p_X(x_i) = 1$$
For a general discrete random variable X, which can take infinitely many values $x_1, x_2, x_3, \dots$, the sum extends over all these values:
$$\sum_{i=1}^{\infty} p_X(x_i) = 1$$
Problem breakdown
Show that if X is a random variable with a cumulative distribution func-
tion (CDF) FX (x), then limx→−∞ FX (x) = 0 and limx→∞ FX (x) = 1.
Solution
The cumulative distribution function (CDF) FX (x) of a random variable
X is defined as:
FX (x) = P (X ≤ x)
This represents the probability that the random variable X takes a value
less than or equal to x.
To prove the first part, $\lim_{x \to -\infty} F_X(x) = 0$: As x approaches −∞, the probability $P(X \le x)$ becomes smaller and smaller, because X has a lower probability of taking extremely negative values. In the limit, as x approaches −∞, the CDF approaches 0 because:
$$\lim_{x \to -\infty} P(X \le x) = 0$$
For the second part, as x approaches ∞ the event $\{X \le x\}$ expands to include the whole sample space, so $P(X \le x)$ approaches 1. This result holds because the total probability of all possible values of X must sum to 1.
Thus, the limits of the cumulative distribution function are:
$$\lim_{x \to -\infty} F_X(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F_X(x) = 1$$
Problem breakdown
Prove that the expectation of a constant random variable c is c.
Solution
Let X be a constant random variable, meaning X takes the value c with
probability 1:
P (X = c) = 1
$P(X = x) = 0$ for $x \neq c$
E[X] = c · P (X = c) = c · 1 = c
Solution
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } k = 0, 1, 2, \dots$$
5. Proof that the CDF of a normal distribution is given by the error function.
Problem breakdown
Show that if X follows a normal distribution N (µ, σ 2 ), then the CDF of
X is given by:
$$F_X(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$$
where erf is the error function.
Solution
The cumulative distribution function (CDF) FX (x) of a normally dis-
tributed random variable X with mean µ and variance σ 2 is defined as:
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t) \, dt$$
where
$$f_X(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t - \mu)^2}{2\sigma^2}}$$
Solution
Let X be a random variable following a binomial distribution with param-
eters n and p, denoted X ∼ Binomial(n, p). The probability mass function
(PMF) of X is:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$
for $k = 0, 1, 2, \dots, n$.
The mean (expectation) of X is given by:
E[X] = np
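These quantities can be checked numerically; a minimal sketch assuming SciPy is available, with n = 10 and p = 0.8 as in the earlier example:

```python
from scipy.stats import binom

n, p = 10, 0.8
X = binom(n, p)    # frozen Binomial(n, p) distribution

print(X.pmf(8))    # P(X = 8) = C(10,8) * 0.8^8 * 0.2^2 ≈ 0.3020
print(X.mean())    # np = 8.0
print(X.var())     # np(1 - p) = 1.6
```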
Derivation of WLLN
Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with mean µ and variance σ².
Define the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.
Compute the expectation of the sample mean:
$$E[\bar{X}_n] = \mu.$$
Derivation of CLT
Let $X_1, X_2, \dots, X_n$ be independent and identically distributed (i.i.d.) random variables with mean $\mu = E[X_i]$ and variance $\sigma^2 = \text{Var}(X_i)$. Define the normalized sum $Z_n$ as:
$$Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{X_i - \mu}{\sigma}.$$
As $n \to \infty$, the distribution of $Z_n$ converges in distribution to a standard normal distribution:
$$Z_n \xrightarrow{d} N(0, 1).$$
This means that the standardized sum of the random variables approaches a normal distribution with mean 0 and variance 1, regardless of the distribution of $X_i$, provided $X_i$ has finite mean and variance.
To derive the CLT, we use characteristic functions, which are a powerful tool
in probability theory for studying sums of random variables.
Let X1 , X2 , . . . , Xn be i.i.d. random variables with mean µ and variance σ 2 .
The sum of these random variables is:
Sn = X1 + X2 + · · · + Xn .
The sample mean is:
$$\bar{X}_n = \frac{S_n}{n} = \frac{1}{n}\sum_{i=1}^n X_i.$$
We are interested in the behavior of $S_n$ as $n \to \infty$. To standardize $S_n$, we subtract the expected value $n\mu$ and divide by the standard deviation $\sigma\sqrt{n}$, forming the normalized sum
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i, \qquad Y_i = \frac{X_i - \mu}{\sigma}.$$
The characteristic function of a random variable X is defined as:
$$\varphi_X(t) = E\left[e^{itX}\right].$$
Since the $Y_i$'s are independent, the characteristic function of the normalized sum $Z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i$ is:
$$\varphi_{Z_n}(t) = \left[\varphi_{Y_1}\left(\frac{t}{\sqrt{n}}\right)\right]^n.$$
We now expand φYi (t) in a Taylor series around t = 0. Since E[Yi ] = 0 and
Var(Yi ) = 1, we have:
$$\varphi_{Y_i}(t) = 1 - \frac{t^2}{2} + o(t^2).$$
Substituting this approximation into the characteristic function of Zn , we get:
$$\varphi_{Z_n}(t) = \left[1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n.$$
For large n, we use the fact that $\left(1 + \frac{x}{n}\right)^n \to e^x$ as $n \to \infty$. Thus:
$$\varphi_{Z_n}(t) \to \exp\left(-\frac{t^2}{2}\right).$$
This is the characteristic function of the standard normal distribution N (0, 1).
Since the characteristic function of Zn converges to the characteristic function
of N (0, 1), we conclude that Zn converges in distribution to N (0, 1):
$$Z_n \xrightarrow{d} N(0, 1).$$
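A simulation sketch of this convergence (Python/NumPy; the exponential population, n, and replication count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1_000, 50_000

# Xi ~ Exponential(1): mu = 1, sigma = 1, and the distribution is far from normal.
samples = rng.exponential(scale=1.0, size=(reps, n))

# Zn = (Sn - n*mu) / (sigma * sqrt(n))
z = (samples.sum(axis=1) - n * 1.0) / (1.0 * np.sqrt(n))

print(z.mean(), z.std())  # approximately 0 and 1; a histogram of z is bell-shaped
```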
Theoretical Examples
1. Prove the Weak Law of Large Numbers using Chebyshev’s inequality for
i.i.d. random variables X1 , X2 , . . . , Xn with mean µ and variance σ 2 .
Solution
Compute $E[\bar{X}_n] = \mu$ and $\text{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$. Apply Chebyshev's inequality:
$$P\left(\left|\bar{X}_n - \mu\right| \ge \epsilon\right) \le \frac{\sigma^2}{n\epsilon^2}.$$
As $n \to \infty$, the right-hand side tends to 0, so $\bar{X}_n \to \mu$ in probability.
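The inequality can also be watched empirically; a simulation sketch (NumPy assumed; the standard normal population and ε = 0.1 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, eps = 0.0, 1.0, 0.1

for n in (100, 1_000, 10_000):
    xbar = rng.normal(mu, sigma, size=(20_000, n)).mean(axis=1)
    freq = np.mean(np.abs(xbar - mu) >= eps)  # empirical P(|Xbar_n - mu| >= eps)
    bound = sigma**2 / (n * eps**2)           # Chebyshev bound
    print(n, freq, bound)                     # freq stays below bound; both shrink
```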
2. Sketch the proof of the Strong Law of Large Numbers for i.i.d. random
variables X1 , X2 , . . . , Xn with mean µ and variance σ 2 .
Solution
Define $S_n = \sum_{i=1}^n X_i$. Show that $\bar{X}_n = \frac{S_n}{n}$ converges almost surely to µ. (Hint: Use the Borel-Cantelli Lemma.)
Solution
Let $X_i = 1$ if the i-th trial is a success, and 0 otherwise. Then $X_i$ has mean p and variance $p(1 - p)$.
Use the WLLN to show that the sample proportion $\frac{1}{n}\sum_{i=1}^n X_i$ converges to p in probability.
5. Prove the Strong Law of Large Numbers using martingale techniques (advanced).
Solution
(Hint: use the Martingale Convergence Theorem).
Practical Examples
1. You have collected daily temperatures over 365 days. Use the Weak Law
of Large Numbers to estimate the population mean temperature.
Solution
Let $X_i$ be the temperature on day i.
Compute the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.
By the WLLN, $\bar{X}_n$ converges in probability to the true population mean temperature as $n \to \infty$.
Solution
Let $X_i$ be the completion time for the i-th employee. Compute the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.
By the WLLN, the sample mean approximates the population mean com-
pletion time as n grows large.
Solution
Assume the heights are normally distributed with mean µ and variance σ².
Use the CLT to approximate the distribution of the sample mean:
$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
Solution
Let $X_i$ be the number of customer visits on day i. Compute the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.
By the SLLN, $\bar{X}_n$ will almost surely converge to the true mean number of customer visits as $n \to \infty$.
Solution
Let $X_i$ be 1 if the i-th product is defective, and 0 otherwise. The sample proportion is $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.
Apply the CLT to approximate the distribution of $\bar{X}_n$.
Use this distribution to compute the probability that the sample proportion is within 1% of the true defect rate, as sketched below.
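A numeric sketch of that last step (SciPy assumed). The original gives no numbers, so an illustrative true defect rate p = 0.1 and sample size n = 1000 are assumed here:

```python
from math import sqrt
from scipy.stats import norm

p, n = 0.10, 1_000          # assumed defect rate and sample size (illustrative)
se = sqrt(p * (1 - p) / n)  # CLT: Xbar_n is approximately N(p, p(1-p)/n)

# P(|Xbar_n - p| <= 0.01) under the normal approximation
prob = norm.cdf(0.01 / se) - norm.cdf(-0.01 / se)
print(prob)                 # ≈ 0.71
```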
Solution
Identify events: R (Rain), C (Cloudy).
Use the formula: $P(R|C) = \frac{P(R \cap C)}{P(C)}$.
Solution
Identify events: P (Pass), S (Studied).
Use the formula: $P(P|S) = \frac{P(P \cap S)}{P(S)}$.
Substitute values: $P(P|S) = \frac{0.72}{0.8}$.
Calculate: $P(P|S) = 0.9$.
Conclusion: The probability of passing given studying is 0.9.
$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}, \qquad P(Y = y) > 0$$
Solution
Identify events: X (Grade), Y (Hours studied).
Use the formula: $P(X = A \mid Y = 5) = \frac{P(X = A, Y = 5)}{P(Y = 5)}$.
Substitute values: $P(X = A \mid Y = 5) = \frac{0.1}{0.25}$.
Calculate: $P(X = A \mid Y = 5) = 0.4$.
Conclusion: The probability of receiving an ’A’ given 5 hours of study is
0.4.
Solution
Identify events: X (Wait time), Y (Day).
Use the formula: $P(X = \text{Under 10 minutes} \mid Y = \text{Monday}) = \frac{P(X = \text{Under 10 minutes},\, Y = \text{Monday})}{P(Y = \text{Monday})}$.
$$P(A|D) = \frac{P(D|A)P(A)}{P(D)}$$
P (D) = P (D|A)P (A) + P (D|B)P (B) = (0.7 × 0.4) + (0.8 × 0.6) = 0.74
2. A data center has two types of servers. Type A servers fail with a prob-
ability of 0.1 when CPU usage is high, while Type B servers fail with a
probability of 0.2. 30% of the servers are Type A, and 70% are Type B.
If a server fails, what is the probability that it is a Type A server?
Solution
Let A be Type A server, F be a server failure.
We are asked to find P (A|F ).
Using Bayes’ Theorem:
$$P(A|F) = \frac{P(F|A)P(A)}{P(F)}$$
Compute P(F):
$$P(F) = P(F|A)P(A) + P(F|B)P(B) = (0.1 \times 0.3) + (0.2 \times 0.7) = 0.03 + 0.14 = 0.17$$
Then:
$$P(A|F) = \frac{0.03}{0.17} \approx 0.176$$
3. A cloud service provider has two types of servers: high performance (HP)
and low performance (LP). High-performance servers experience overload
with a probability of 0.05, while low-performance servers experience over-
load with a probability of 0.2. 20% of the servers are high-performance.
If an overload occurs, what is the probability it was a high-performance
server?
Solution
Let H be the event of using a high-performance server, and O be an over-
load.
We want to find P (H|O).
Use Bayes’ Theorem:
$$P(H|O) = \frac{P(O|H)P(H)}{P(O)}$$
Calculate P(O):
$$P(O) = P(O|H)P(H) + P(O|L)P(L) = (0.05 \times 0.2) + (0.2 \times 0.8) = 0.17$$
Then:
$$P(H|O) = \frac{0.01}{0.17} \approx 0.059$$
Solution
Define L as not logged in for over a month and C as churn.
We want to calculate P (L|C).
Using Bayes’ Theorem:
$$P(L|C) = \frac{P(C|L)P(L)}{P(C)}$$
$$P(C) = P(C|L)P(L) + P(C|R)P(R) = (0.6 \times 0.25) + (0.2 \times 0.75) = 0.3$$
Then:
$$P(L|C) = \frac{0.15}{0.3} = 0.5$$
Solution
Define the events:
Known values:
P (A ∩ B) = 0.1, P (B) = 0.2
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.1}{0.2} = 0.5$$
Conclusion: The probability that system A fails given that system B fails
is 0.5.
Solution
Define the events:
Known values:
P (X ∩ Y ) = 0.3, P (Y ) = 0.5
$$P(X|Y) = \frac{P(X \cap Y)}{P(Y)} = \frac{0.3}{0.5} = 0.6$$
Solution
Define the events:
Known values:
P (C ∩ V ) = 0.2, P (V ) = 0.4
$$P(C|V) = \frac{P(C \cap V)}{P(V)} = \frac{0.2}{0.4} = 0.5$$
Conclusion: The probability of a click given that the customer visited the
page is 0.5.
Solution
Define the events:
Known values:
P (A ∩ D) = 0.05, P (D) = 0.15
$$P(A|D) = \frac{P(A \cap D)}{P(D)} = \frac{0.05}{0.15} \approx 0.333$$
$$f_{T|L}(t|l) = \frac{f_{T,L}(t, l)}{f_L(l)}$$
$$f_{T|L}(t|l) = \frac{e^{-(t+l)}}{e^{-l}} = e^{-t}$$
Solution
Joint PDF:
fX,Y (x, y) = e−(x+y)
$$f_{X|Y}(x|y) = \frac{e^{-(x+y)}}{e^{-y}} = e^{-x}$$
Solution
Joint PDF:
fX,Y (x, y) = 2e−2(x+y)
$$f_{X|Y}(x|y) = \frac{2e^{-2(x+y)}}{e^{-2y}} = 2e^{-2x}$$
P (A) = P (A ∩ B1 ) + P (A ∩ B2 ) + · · · + P (A ∩ Bn )
P (A ∩ Bi ) = P (A|Bi )P (Bi )
Solution
Identify events: A (Server failure), B1 (Hardware failure), B2 (Software failure).
Apply the Law of Total Probability: $P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2)$.
Calculate:
P (A) = 0.18 + 0.28 = 0.46
Solution
Identify events: A (Success), B1 (Manual testing), B2 (Automated testing).
Apply the Law of Total Probability: $P(A) = P(A|B_1)P(B_1) + P(A|B_2)P(B_2)$.
Calculate:
P (A) = 0.28 + 0.54 = 0.82
Bayes Theorem
Bayes’ Theorem relates the conditional and marginal probabilities of random
events:
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
This follows from the definition of conditional probability,
$$P(A|B) = \frac{P(A \cap B)}{P(B)},$$
together with the multiplication rule $P(A \cap B) = P(B|A)P(A)$, which gives
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
Click this link to see the differences between Bayes' Theorem and Conditional Probability: https://www.cuemath.com/data/bayes-theorem/
Solution
Identify events: F (Faulty), C (Crash).
Apply Bayes’ Theorem:
$$P(F|C) = \frac{P(C|F)P(F)}{P(C)}$$
Substitute values:
$$P(F|C) = \frac{(0.95)(0.02)}{0.1}$$
Calculate:
$$P(F|C) = \frac{0.019}{0.1} = 0.19$$
Conclusion: The probability that the server is faulty given that it crashed
is 0.19.
Solution
Identify events: M (Malicious), F (Flagged).
Apply Bayes’ Theorem:
$$P(M|F) = \frac{P(F|M)P(M)}{P(F)}$$
Substitute values:
$$P(M|F) = \frac{(0.9)(0.01)}{0.05}$$
Calculate:
$$P(M|F) = \frac{0.009}{0.05} = 0.18$$
Conclusion: The probability that the file is malicious given it was flagged
is 0.18.
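Since every one of these examples applies the same formula, the mechanical steps can be wrapped in a small helper (a Python sketch; the function name is ours, not a library API):

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Malicious-file example: P(F|M) = 0.9, P(M) = 0.01, P(F) = 0.05
print(bayes_posterior(0.9, 0.01, 0.05))  # 0.18
# Faulty-server example: P(C|F) = 0.95, P(F) = 0.02, P(C) = 0.1
print(bayes_posterior(0.95, 0.02, 0.1))  # 0.19
```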
Conditional Variance
It is the variance of X given Y = y:
$$\text{Var}(X \mid Y = y) = E[X^2 \mid Y = y] - \left(E[X \mid Y = y]\right)^2$$
Solution
Define possible salaries and probabilities for 5 years of experience.
Calculate the conditional mean:
2. What is the expected bug fix time given that there are 3 developers given
bug fix times: T = {5, 7, 10} hours with probabilities P (T = 5) = 0.2,
P (T = 7) = 0.5, P (T = 10) = 0.3?
Solution
Calculate the expected time:
$$E[T \mid D = 3] = 5(0.2) + 7(0.5) + 10(0.3)$$
Calculate:
$$E[T \mid D = 3] = 1 + 3.5 + 3 = 7.5 \text{ hours}$$
Conclusion: The expected bug fix time is 7.5 hours when 3 developers are
working.
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
where P(B) can be expanded using the Law of Total Probability if necessary:
$$P(B) = \sum_i P(B|A_i)P(A_i)$$
Key Differences
Bayes' Theorem rewrites $P(A|B)$ in terms of the reverse conditional $P(B|A)$, whereas plain conditional probability works directly from the joint probability; see the linked page above.
Joint Distribution of Two Random Variables
A joint PMF must satisfy:
1. $P(X = x, Y = y) \ge 0$
2. $\sum_x \sum_y P(X = x, Y = y) = 1$
1. Consider two random variables X and Y with joint PMF:

X\Y      1       2       3       Total
0        2/27    6/27    0       8/27
1        0       6/27    6/27    12/27
2        0       6/27    0       6/27
3        1/27    0       0       1/27
2. In a computer network, packets are sent to two servers, and each packet
either reaches its destination or is lost. Let X represent the number of
packets reaching Server A and Y represent the number of packets reaching
Server B. If the joint PMF is given by:
X\Y 0 1 2 Total
0 0.1 0.1 0.05 0.25
1 0.1 0.2 0.15 0.45
2 0.05 0.1 0.15 0.30
Solution
We need to verify that the total probability sums to 1. Summing all entries of the table: 0.25 + 0.45 + 0.30 = 1, so this is a valid joint PMF.
3. In an example of customer purchases (X) and returns (Y), the joint PMF is given by:
X\Y 0 1 2 Total
0 0.2 0.05 0.02 0.27
1 0.15 0.2 0.05 0.4
2 0.08 0.15 0.1 0.33
X\Y 0 1 2 Total
0 0.15 0.1 0.05 0.3
1 0.2 0.2 0.1 0.5
2 0.05 0.1 0.05 0.2
Solution
To find P(X = 1), sum the probabilities for X = 1: $P(X = 1) = 0.2 + 0.2 + 0.1 = 0.5$.
5. In a software testing process, two test suites are run on the same code. Let
X represent the number of failed test cases in Suite A, and Y represent
the number of failed test cases in Suite B. The joint distribution is given
as:
X\Y 0 1 2 Total
0 0.3 0.15 0.05 0.5
1 0.1 0.15 0.05 0.3
2 0.05 0.1 0.05 0.2
What is the probability that neither test suite has any failed test cases?
Solution
The probability that neither test suite has any failed test cases is P (X =
0, Y = 0) = 0.3.
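Joint-PMF bookkeeping like this is easy to automate; a sketch (NumPy assumed) using the test-suite table above:

```python
import numpy as np

# Joint PMF: rows are X (failed cases in Suite A), columns are Y (Suite B).
joint = np.array([
    [0.30, 0.15, 0.05],
    [0.10, 0.15, 0.05],
    [0.05, 0.10, 0.05],
])

print(joint.sum())        # 1.0 -> valid joint PMF
print(joint.sum(axis=1))  # marginal P(X): [0.5, 0.3, 0.2]
print(joint.sum(axis=0))  # marginal P(Y): [0.45, 0.40, 0.15]
print(joint[0, 0])        # P(X = 0, Y = 0) = 0.3
```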
Marginal Probability
Marginal Probability from Set Theory
Marginal probability can be understood from set theory by considering it as the
probability of a single event occurring without regard to other related events.
It is derived from joint probabilities by ”summing out” or ”marginalizing” over
the outcomes of the other events.
The joint probability P (A ∩ B) is the probability that it rains and the football
game is played.
The marginal probability P (A) would be the total probability that it rains,
regardless of whether the game happens or not. This would include both:
• The probability that it rains and the game is played.
• The probability that it rains and the game is not played.
In set notation, marginal probability can be seen as focusing on the event A,
whether A ∩ B or A ∩ B c (where B c is the complement of B) occurs.
Solution
$$P(X = 0) = \frac{8}{27}, \quad P(X = 1) = \frac{12}{27}, \quad P(X = 2) = \frac{6}{27}, \quad P(X = 3) = \frac{1}{27}$$
$$P(Y = 1) = \frac{3}{27}, \quad P(Y = 2) = \frac{18}{27}, \quad P(Y = 3) = \frac{6}{27}$$
2. In the example of network packet loss, the second example in the Joint
Distribution of Two Random Variables section, the joint distribution of
packets reaching Servers A and B is:
X\Y 0 1 2 Total
0 0.1 0.1 0.05 0.25
1 0.1 0.2 0.15 0.45
2 0.05 0.1 0.15 0.30
Solution
The marginal distribution of packets reaching Server A, P(X), is given by the row totals: $P(X = 0) = 0.25$, $P(X = 1) = 0.45$, $P(X = 2) = 0.30$.
4. In the earlier example of customer purchases and returns, the joint PMF
is given by:
X\Y 0 1 2 Total
0 0.2 0.05 0.02 0.27
1 0.15 0.2 0.05 0.4
2 0.08 0.15 0.1 0.33
Solution
We sum the joint probabilities over all values of X: $P(Y = 0) = 0.2 + 0.15 + 0.08 = 0.43$, $P(Y = 1) = 0.05 + 0.2 + 0.15 = 0.40$, $P(Y = 2) = 0.02 + 0.05 + 0.1 = 0.17$.
X\Y 0 1 2 Total
0 0.15 0.1 0.05 0.3
1 0.2 0.2 0.1 0.5
2 0.05 0.1 0.05 0.2
Solution
To find the marginal distribution of Y (tasks on Processor B), sum each column: $P(Y = 0) = 0.40$, $P(Y = 1) = 0.40$, $P(Y = 2) = 0.20$.
X\Y 0 1 2 Total
0 0.3 0.15 0.05 0.5
1 0.1 0.15 0.05 0.3
2 0.05 0.1 0.05 0.2
Solution
We sum the joint probabilities for each value of X (the row totals): $P(X = 0) = 0.5$, $P(X = 1) = 0.3$, $P(X = 2) = 0.2$.
PXY (x, y) = P (X = x, Y = y)
This represents the probability that X takes value x and Y takes value y simul-
taneously. The joint PMF contains all the information about the distribution
of X and Y .
Joint Range RXY is the set of all pairs (x, y) where PXY (x, y) > 0:
P (X = x, Y = y) = P ((X = x) ∩ (Y = y))
Lemma 1: The sum of all probabilities in the joint PMF must equal 1:
$$\sum_{x \in R_X} \sum_{y \in R_Y} P_{XY}(x, y) = 1$$
Proof
Let (X, Y ) be a pair of discrete random variables with a joint PMF PXY (x, y)
defined over their respective ranges RX and RY . The joint PMF PXY (x, y)
assigns probabilities to every pair (x, y), ensuring that all possible outcomes are
accounted for. By the definition of a probability measure, the total probability
across the sample space must equal 1:
$$\sum_{x \in R_X} \sum_{y \in R_Y} P_{XY}(x, y) = 1$$
Proof
To derive the marginal PMF $P_X(x)$, we sum the joint PMF $P_{XY}(x, y)$ over all possible values of Y:
$$P_X(x) = \sum_{y \in R_Y} P_{XY}(x, y)$$
The total probability law states that we can calculate the probability of an event
by conditioning on a partition of the sample space.
The marginalization process ensures that $P_X(x)$ accounts for all possible values of Y while isolating the distribution of X.
Thus, we express $P_X(x)$ as:
$$P_X(x) = \sum_{y \in R_Y} P_{XY}(x, y)$$
Proof
This reflects the total probability of observing Y taking the value y, accounting for all possible values of X.
We can express it as:
$$P_Y(y) = \sum_{x \in R_X} P_{XY}(x, y) = P(Y = y)$$
Proof
By applying Lemma 2, we can assert:
$$P_X(x) = \sum_{y \in R_Y} P_{XY}(x, y)$$
This shows how marginal PMFs capture the essential probabilities of one variable, eliminating the influence of the other. The equation confirms that $P_X(x)$ is derived from summing the contributions from Y.
• Marginal PMF of X:
$$P_X(x) = \sum_{y \in R_Y} P_{XY}(x, y)$$
Example
1. Given the joint PMF of X and Y below, find the marginal PMF of X and $P(Y = 1 \mid X = 0)$.

X\Y      0       1       2
0        1/6     1/4     1/8
1        1/8     1/6     1/6

Solution
The marginal PMF of X. For X = 0:
$$P_X(0) = \sum_y P_{XY}(0, y) = \frac{1}{6} + \frac{1}{4} + \frac{1}{8} = \frac{13}{24} \approx 0.542$$
For X = 1:
$$P_X(1) = \sum_y P_{XY}(1, y) = \frac{1}{8} + \frac{1}{6} + \frac{1}{6} = \frac{11}{24} \approx 0.458$$
(As a check, $\frac{13}{24} + \frac{11}{24} = 1$.) Then
$$P(Y = 1 \mid X = 0) = \frac{P(X = 0, Y = 1)}{P_X(0)} = \frac{1/4}{13/24} = \frac{6}{13} \approx 0.46$$
X\Y 10 20 30
2 (Age Group: 20-29) 0.2 0.1 0.05
3 (Age Group: 30-39) 0.1 0.15 0.1
Solution
The marginal PMF of X is found by summing the joint PMF over all possible values of Y for each X: $P_X(2) = 0.2 + 0.1 + 0.05 = 0.35$ and $P_X(3) = 0.1 + 0.15 + 0.1 = 0.35$.
Solution
To find the marginal PMF of Y from the joint PMF, we sum the
joint probabilities over all possible values of X for each value of Y .
This shows the probabilities associated with each value of Y
For Y = 10:
$$P_Y(10) = \sum_{x \in \{2,3\}} P_{XY}(x, 10) = P_{XY}(2, 10) + P_{XY}(3, 10) = 0.2 + 0.1 = 0.3$$
For Y = 20:
$$P_Y(20) = \sum_{x \in \{2,3\}} P_{XY}(x, 20) = P_{XY}(2, 20) + P_{XY}(3, 20) = 0.1 + 0.15 = 0.25$$
For Y = 30:
$$P_Y(30) = \sum_{x \in \{2,3\}} P_{XY}(x, 30) = P_{XY}(2, 30) + P_{XY}(3, 30) = 0.05 + 0.1 = 0.15$$
$$P(Y = 20 \mid X = 3) = \frac{P(X = 3, Y = 20)}{P_X(3)}$$
From the joint PMF table, we know $P(X = 3, Y = 20) = 0.15$, and from the marginal PMF of X, $P_X(3) = 0.35$. Therefore
$$P(Y = 20 \mid X = 3) = \frac{0.15}{0.35} = \frac{3}{7} \approx 0.43$$
X\Y 10 20 30
0 0.1 0.2 0.15
1 0.05 0.25 0.25
Solution
To find the marginal PMF of X, sum across the columns for each value of X:
For X = 0: $P_X(0) = 0.1 + 0.2 + 0.15 = 0.45$
For X = 1: $P_X(1) = 0.05 + 0.25 + 0.25 = 0.55$
Examples
1. Given the joint PDF fX,Y (x, y) = 6xy for 0 < x < 1 and 0 < y < 1, how
do we find the value of the joint distribution function FX,Y (0.5, 0.5)?
Solution
To compute $F_{X,Y}(0.5, 0.5)$, we integrate the joint PDF:
$$F_{X,Y}(0.5, 0.5) = \int_0^{0.5} \int_0^{0.5} 6xy \, dx \, dy$$
First, we integrate with respect to x:
$$\int_0^{0.5} 6xy \, dx = 6y\left[\frac{x^2}{2}\right]_0^{0.5} = 6y \times \frac{0.25}{2} = 0.75y$$
Now, we integrate with respect to y:
$$\int_0^{0.5} 0.75y \, dy = 0.75\left[\frac{y^2}{2}\right]_0^{0.5} = 0.75 \times \frac{0.25}{2} = 0.09375$$
Thus, $F_{X,Y}(0.5, 0.5) = 0.09375$.
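A numeric cross-check of this double integral (a sketch assuming SciPy; `dblquad` takes the integrand as a function of y then x):

```python
from scipy.integrate import dblquad

# F(0.5, 0.5) = integral of 6xy over 0 < x < 0.5, 0 < y < 0.5
val, abserr = dblquad(lambda y, x: 6 * x * y, 0, 0.5, 0, 0.5)
print(val)  # 0.09375
```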
2. Suppose the response times X and Y of two servers are modeled with the
joint PDF fX,Y (x, y) = 8xy for 0 < x < 1 and 0 < y < 1. How do we
calculate the probability that both servers respond within 0.3 seconds?
Solution
We need to find:
$$P(0 < X < 0.3,\ 0 < Y < 0.3) = \int_0^{0.3} \int_0^{0.3} 8xy \, dx \, dy$$
First, we integrate with respect to x:
$$\int_0^{0.3} 8xy \, dx = 8y\left[\frac{x^2}{2}\right]_0^{0.3} = 8y \times \frac{0.09}{2} = 0.36y$$
Next, we integrate with respect to y:
$$\int_0^{0.3} 0.36y \, dy = 0.36\left[\frac{y^2}{2}\right]_0^{0.3} = 0.36 \times \frac{0.09}{2} = 0.0162$$
Thus, the probability that both servers respond within 0.3 seconds is
0.0162.
The marginal PDF of X is obtained by integrating the joint PDF over all values
of Y .
The marginal distribution function of the continuous random variable Y is defined as:
$$F_Y(y) = P(Y \le y) = \int_{-\infty}^{y} \int_{-\infty}^{\infty} f(x, t) \, dx \, dt.$$
The marginal PDF of Y is obtained by integrating the joint PDF over all values
of X.
Examples
1. Given the joint PDF fX,Y (x, y) = 6xy for 0 < x < 1 and 0 < y < 1, how
can we determine the marginal PDF fX (x)?
Solution
First, integrate out y to get the marginal PDF: $f_X(x) = \int_0^1 6xy \, dy = 3x$ for $0 < x < 1$. We can then find, for example, $P(X < 0.4)$ by integrating the marginal PDF:
$$P(X < 0.4) = \int_0^{0.4} 3x \, dx = 3\left[\frac{x^2}{2}\right]_0^{0.4} = 3 \times \frac{0.16}{2} = 0.24$$
$$f_{Y|X}(y|x) = \frac{f(x, y)}{f_X(x)}$$
Solution
To find the marginal PDF of X, integrate the joint PDF over all values of y:
$$f_X(x) = \int_0^1 6xy \, dy = 6x\left[\frac{y^2}{2}\right]_0^1 = 3x, \quad 0 < x < 1$$
2. In a data processing pipeline, the time taken for processing X (in seconds) and the number of errors Y are jointly distributed according to the PDF $f_{X,Y}(x, y) = e^{-(x+y)}$ for $x > 0,\ y > 0$. Are X and Y independent?
Solution
First, find the marginal PDFs:
$$f_X(x) = \int_0^{\infty} e^{-(x+y)} \, dy = e^{-x}$$
$$f_Y(y) = \int_0^{\infty} e^{-(x+y)} \, dx = e^{-y}$$
Since fX,Y (x, y) = fX (x) · fY (y), the variables X and Y are independent.
4. In a software testing scenario, let X represent the number of bugs found
in module A, and Y represent the number of bugs found in module B.
Suppose the joint PDF of X and Y is:
$$f_{X,Y}(x, y) = \begin{cases} 10xy & \text{if } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$
What is the probability that both modules have bug counts less than 0.5?
Solution
We need to compute:
$$P(X < 0.5, Y < 0.5) = \int_0^{0.5} \int_0^{0.5} 10xy \, dx \, dy = 10 \times \frac{0.25}{2} \times \frac{0.25}{2} = 10 \times 0.125 \times 0.125 = 0.15625$$
More Examples
1. Consider two continuous random variables, X and Y , which have a joint
PDF defined as fX,Y (x, y) = 6xy for values 0 < x < 1 and 0 < y < 1.
How can we verify that this joint PDF is valid by ensuring that the total
probability integrates to 1 over the specified range?
Solution
To check whether this is a valid joint PDF, we compute the total probability:
$$\int_0^1 \int_0^1 6xy \, dx \, dy = 6 \int_0^1 \int_0^1 xy \, dx \, dy$$
First, we perform the integration with respect to x:
$$\int_0^1 xy \, dx = y\left[\frac{x^2}{2}\right]_0^1 = \frac{1}{2}y$$
Next, we integrate with respect to y:
$$\int_0^1 \frac{1}{2}y \, dy = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$
Thus, the total integral is:
$$6 \times \frac{1}{4} = \frac{3}{2}$$
Since this is $\frac{3}{2}$ rather than 1, $f_{X,Y}(x, y) = 6xy$ is not, strictly speaking, a valid joint PDF on the unit square; the correctly normalized density of this form is $f_{X,Y}(x, y) = 4xy$, for which the same computation gives $4 \times \frac{1}{4} = 1$.
2. Given the joint PDF fX,Y (x, y) = 6xy for 0 < x < 1 and 0 < y < 1, how
can we calculate the probability that both random variables X and Y fall
within the range of 0.2 to 0.5?
Solution
We need to find:
$$P(0.2 < X < 0.5,\ 0.2 < Y < 0.5) = \int_{0.2}^{0.5} \int_{0.2}^{0.5} 6xy \, dx \, dy = 6\left[\frac{x^2}{2}\right]_{0.2}^{0.5}\left[\frac{y^2}{2}\right]_{0.2}^{0.5} = 6(0.105)(0.105) \approx 0.0662$$
Solution
First, find $f_Y(0.5)$:
$$f_Y(y) = \int_0^1 f_{X,Y}(x, y) \, dx = \int_0^1 6xy \, dx = 3y \quad \Rightarrow \quad f_Y(0.5) = 3 \times 0.5 = 1.5$$
4. If the joint PDF for two software system response times X and Y is given
by fX,Y (x, y) = 8xy for 0 < x < 1 and 0 < y < 1, how can we calculate
the conditional PDF fX|Y (x|0.2)?
Solution
1. Find $f_Y(0.2)$:
$$f_Y(y) = \int_0^1 8xy \, dx = 4y \quad \Rightarrow \quad f_Y(0.2) = 4 \times 0.2 = 0.8$$
2. Then the conditional PDF is $f_{X|Y}(x|0.2) = \frac{f_{X,Y}(x, 0.2)}{f_Y(0.2)} = \frac{8x(0.2)}{0.8} = 2x$ for $0 < x < 1$.
5. For the joint PDF fX,Y (x, y) = 10xy where 0 < x < 1 and 0 < y < 1, how
do we find P (X < 0.4|Y = 0.6)?
Solution
Calculate the conditional PDF $f_{X|Y}(x|0.6)$. First, find $f_Y(0.6)$:
$$f_Y(y) = \int_0^1 10xy \, dx = 5y \quad \Rightarrow \quad f_Y(0.6) = 5 \times 0.6 = 3$$
Then $f_{X|Y}(x|0.6) = \frac{10x(0.6)}{3} = 2x$, so $P(X < 0.4 \mid Y = 0.6) = \int_0^{0.4} 2x \, dx = 0.16$.
6. If the joint PDF fX,Y (x, y) = 4xy for 0 < x < 1 and 0 < y < 1 is given,
how do we find fX|Y (x|0.8)?
Solution
First, find $f_Y(0.8)$:
$$f_Y(y) = \int_0^1 4xy \, dx = 2y \quad \Rightarrow \quad f_Y(0.8) = 2 \times 0.8 = 1.6$$
Then $f_{X|Y}(x|0.8) = \frac{4x(0.8)}{1.6} = 2x$ for $0 < x < 1$.
7. In a data analysis project, the joint PDF of the heights X and weights Y
of individuals is given by fX,Y (x, y) = 12xy for 0 < x < 1 and 0 < y < 1.
How do we find the probability that both height and weight are less than
0.5?
Solution
We need to calculate:
$$P(X < 0.5, Y < 0.5) = \int_0^{0.5} \int_0^{0.5} 12xy \, dx \, dy = 12 \times 0.125 \times 0.125 = 0.1875$$
Solution
We calculate:
$$P(X < 0.2, Y < 0.2) = \int_0^{0.2} \int_0^{0.2} 20xy \, dx \, dy = 20 \times 0.02 \times 0.02 = 0.008$$
Solution
We find:
$$P(X < 0.6, Y < 0.4) = \int_0^{0.4} \int_0^{0.6} 14xy \, dx \, dy = 14 \times 0.18 \times 0.08 \approx 0.2016$$
Solution
We need to compute:
$$P(X < 0.5, Y < 0.7) = \int_0^{0.7} \int_0^{0.5} 18xy \, dx \, dy = 18 \times 0.125 \times 0.245 \approx 0.551$$
Exercise
1. Let X and Y be two random variables. Then for
$$f_{X,Y}(x, y) = \begin{cases} kxy & \text{for } 0 < x < 4 \text{ and } 1 < y < 5 \\ 0 & \text{otherwise} \end{cases}$$
find the constant k that makes this a valid joint density: setting $\int_0^4 \int_1^5 kxy \, dy \, dx = k \cdot 8 \cdot 12 = 1$ gives $k = \frac{1}{96}$.
2. Let X and Y have the joint density $f_{X,Y}(x, y) = 2$ for $0 < y < x < 1$.
Solution
The marginal density function of X is given by
$$f_X(x) = \int_0^x 2 \, dy = 2\,[y]_0^x = 2x \quad \text{for } 0 < x < 1,$$
and the marginal density function of Y is
$$f_Y(y) = \int_y^1 2 \, dx = 2(1 - y) \quad \text{for } 0 < y < 1.$$
Solution
The conditional density function of Y given X is
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{2}{2x} = \frac{1}{x} \quad \text{for } 0 < y < x,$$
and
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{2}{2(1 - y)} = \frac{1}{1 - y} \quad \text{for } y < x < 1.$$
3. Let X and Y have the joint density
$$f_{X,Y}(x, y) = \frac{6 - x - y}{8} \quad \text{for } 0 < x < 2,\ 2 < y < 4.$$
Find
(a) $P(X < 1, Y < 3)$
Solution
$$P(X < 1, Y < 3) = \int_0^1 \int_2^3 \frac{6 - x - y}{8} \, dy \, dx = \int_0^1 \left[\frac{6y - xy - \frac{y^2}{2}}{8}\right]_{y=2}^{3} dx$$
$$= \int_0^1 \frac{\left(18 - 3x - \frac{9}{2}\right) - \left(12 - 2x - 2\right)}{8} \, dx = \int_0^1 \frac{\left(18 - 3x - \frac{9}{2}\right) - \left(10 - 2x\right)}{8} \, dx$$
$$= \int_0^1 \frac{\frac{7}{2} - x}{8} \, dx = \left[\frac{\frac{7x}{2} - \frac{x^2}{2}}{8}\right]_0^1 = \frac{3}{8}$$
(b) $P(X < 1 \mid Y < 3)$
Solution
$$P(X < 1 \mid Y < 3) = \frac{P(X < 1, Y < 3)}{P(Y < 3)}$$
where
$$P(Y < 3) = \int_0^2 \int_2^3 \frac{6 - x - y}{8} \, dy \, dx = \int_0^2 \frac{\frac{7}{2} - x}{8} \, dx = \left[\frac{\frac{7x}{2} - \frac{x^2}{2}}{8}\right]_0^2 = \frac{5}{8}$$
$$\therefore P(X < 1 \mid Y < 3) = \frac{3/8}{5/8} = \frac{3}{5} = 0.6$$
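Both probabilities can be cross-checked symbolically (a sketch assuming SymPy is installed):

```python
import sympy as sp

x, y = sp.symbols("x y")
f = (6 - x - y) / 8  # joint density on 0 < x < 2, 2 < y < 4

num = sp.integrate(f, (y, 2, 3), (x, 0, 1))  # P(X < 1, Y < 3) = 3/8
den = sp.integrate(f, (y, 2, 3), (x, 0, 2))  # P(Y < 3) = 5/8
print(num, den, num / den)                   # 3/8  5/8  3/5
```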
1. A subscription service has 100 customers, and each has a 10% chance of
leaving. What is the probability that exactly 5 will leave in a given month?
Solution
Using the Binomial PMF with n = 100, p = 0.1, and k = 5:
$$P(X = 5) = \binom{100}{5} (0.1)^5 (0.9)^{95} \approx 0.0339$$
Solution
Using the Poisson PMF with λ = 3 and k = 5:
$$P(X = 5) = \frac{3^5 e^{-3}}{5!} \approx 0.1008$$
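Both answers can be checked with SciPy (a sketch; `scipy.stats` assumed):

```python
from scipy.stats import binom, poisson

print(binom.pmf(5, 100, 0.1))  # ≈ 0.0339 (n = 100, p = 0.1, k = 5)
print(poisson.pmf(5, 3))       # ≈ 0.1008 (lambda = 3, k = 5)
```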
[Figure: Binomial PMF of the number of customers leaving, k = 0, ..., 10]
[Figure: Poisson distribution, λ = 3]
Solution
This is a classic example of a Binomial distribution where n = 100, p = 0.1, and we are interested in finding the probability of k = 5 customers leaving. Using the Binomial PMF:
$$P(X = 5) = \binom{100}{5} (0.1)^5 (0.9)^{95} \approx 0.0339$$
Thus, the probability that exactly 5 customers will leave in the given month is approximately 0.0339.
[Figure: Binomial PMF, number of successes k = 0, ..., 10]
Solution
Here, λ = 3 and we are interested in finding P(X = 5). Using the Poisson PMF:
$$P(X = 5) = \frac{3^5 e^{-3}}{5!} = \frac{243 e^{-3}}{120} \approx 0.1008$$
Thus, the probability that the server will receive exactly 5 requests in a second is approximately 0.1008.
[Figure: Poisson PMF, λ = 3]
Solution
We use the Binomial Probability Mass Function (PMF). Here:
$$n = 5, \quad k = 3, \quad p = 0.7$$
Substituting values:
$$P(X = 3) = \binom{5}{3} (0.7)^3 (0.3)^2$$
Calculating:
$$P(X = 3) = \frac{5!}{3!(5-3)!} (0.7)^3 (0.3)^2 = 10 \times 0.343 \times 0.09 \approx 0.3087$$
[Figure: Binomial PMF with n = 5, p = 0.7, number of successes k = 0, ..., 5]
Solution
We use the Poisson PMF with
$$\lambda = 2, \quad k = 5$$
Substituting values:
$$P(X = 5) = \frac{2^5 e^{-2}}{5!} = \frac{32 e^{-2}}{120} \approx 0.0361$$
Thus, the probability of finding exactly 5 defects in an hour is 0.0361.
[Figure: Poisson PMF, number of defects k = 0, ..., 10]
Solution
We use the Hypergeometric PMF:
$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
Here: N = 20, K = 5, n = 4, k = 2.
Substituting values:
$$P(X = 2) = \frac{\binom{5}{2}\binom{15}{2}}{\binom{20}{4}} = \frac{10 \times 105}{4845} \approx 0.2167$$
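A quick check with SciPy's hypergeometric distribution (a sketch; note SciPy's own argument order):

```python
from scipy.stats import hypergeom

# SciPy parameterization: pmf(k, M, n, N) with M = population size (20),
# n = number of defective items in the population (5), N = sample size (4).
print(hypergeom.pmf(2, 20, 5, 4))  # ≈ 0.2167
```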
[Figure: Hypergeometric PMF, number of defective components k = 0, ..., 4]
Solution
We use the Exponential CDF:
$$P(T \le t) = 1 - e^{-\lambda t}$$
Here:
$$\lambda = \frac{1}{10}, \quad t = 5$$
Substituting values:
$$P(T \le 5) = 1 - e^{-\frac{1}{10} \times 5} = 1 - e^{-0.5} \approx 1 - 0.6065 = 0.3935$$
[Figure: Exponential probability density over time (minutes)]
Solution
We use the Normal Distribution CDF:
$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$$
where Z is the standard normal variable. With µ = 100 and σ = 15, the bounds 85 and 115 lie one standard deviation from the mean, so $P(85 < X < 115) = P(-1 < Z < 1) \approx 0.6826$.
Thus, the probability that the score is between 85 and 115 is 0.6826.
[Figure: Normal probability density of scores, 60 to 140]
Solution
The expected value of a Beta distribution is:
$$E(\theta) = \frac{\alpha}{\alpha + \beta}$$
Substituting values:
$$E(\theta) = \frac{2}{2 + 3} = \frac{2}{5} = 0.4$$
Thus, the expected value of θ is 0.4.
11. The time until a server experiences two failures follows a Gamma distribution with shape parameter α = 2 and rate parameter λ = 1/3. What is the probability that the server will experience the second failure within 6 hours?
Solution
We use the Gamma CDF for α = 2:
P (T ≤ t) = 1 − e−λt (1 + λt)
Here:
[Figure: Beta probability density over θ ∈ (0, 1)]
$$\alpha = 2, \quad \lambda = \frac{1}{3}, \quad t = 6$$
Substituting values:
$$P(T \le 6) = 1 - e^{-\frac{1}{3} \times 6}\left(1 + \frac{1}{3} \times 6\right) = 1 - 3e^{-2} \approx 0.594$$
Thus, the probability that the server will experience the second failure within 6 hours is approximately 0.594.
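A check with SciPy (a sketch; SciPy's Gamma distribution uses a scale parameter, the reciprocal of the rate λ):

```python
from math import exp
from scipy.stats import gamma

# Shape alpha = 2, rate lambda = 1/3 -> scale = 1/lambda = 3.
print(gamma.cdf(6, a=2, scale=3))         # ≈ 0.594

# Closed form for integer shape 2: 1 - e^(-lambda*t) * (1 + lambda*t)
lam, t = 1 / 3, 6
print(1 - exp(-lam * t) * (1 + lam * t))  # same value
```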
When we have some data and we want to categorize a random variable, we can use the chart (Figure 61.15: Distributional Choices) in the link as a guide: https://tinyheero.github.io/2016/03/17/prob-distr.html
11 Bivariate Distributions
For this topic, refer to the link. When you see "Solution", click on it to display the workings: https://www.probabilitycourse.com/chapter5/5_3_2_bivariate_normal_dist.php
Figure 12: Gamma distribution with α = 2, λ = 1/3 (probability density against time in hours)
2. For a fair six-sided die, calculate the expected value of the outcome.
3. A data center records 2 failures per hour. What is the probability of
recording exactly 3 failures in an hour? Use the Poisson distribution
formula.