T2 Distributions
Probability Distributions
Wei YOU
Spring, 2025
Random Variable Discrete R.V. Continuous R.V.
Introduction
Last time we saw descriptive statistics.
• Graphical: pie chart, bar chart, stem-and-leaf diagram, box plot, histogram.
• Numerical: sample range, sample mean, sample quartile, IQR, sample standard
deviation, sample correlation.
Descriptive statistics provides an initial look into the dataset you collected.
• Next, we need a more sophisticated analysis of the data.
• Inferential statistics provides mathematical tools to infer the characteristics of, or
make assertions about, the population from the sample.
• In this topic, we will explore how random variables are essential in building the
framework for these inferential methods.
Example: Rolling two dice, where an outcome could be (6, 6) or (3, 1), etc.
Sample Space
The sample space is the set of all possible outcomes of a random experiment.
Example: For two dice, this includes all ordered pairs from (1, 1), (1, 2), . . . , (6, 6).
The outcomes of a random experiment and the corresponding events may be difficult
to handle mathematically:
Example: when rolling two dice, the outcome is an ordered pair of numbers, which
does not lend itself directly to arithmetic operations.
Example: In clinical trials, the outcome might be a group of individuals with various
treatment results, complicating direct analysis.
We have
$$X(\underbrace{(3, 1)}_{\text{an outcome}}) = \underbrace{4}_{\text{the value of the RV}},$$
$$X(\underbrace{(6, 6)}_{\text{an outcome}}) = \underbrace{12}_{\text{the value of the RV}}.$$
Random Variable
Notation convention
• A generic random variable is denoted by an uppercase letter such as $X$, $Y$, $N$.
• After the experiment is conducted, the observed value/realization
(deterministic/known) is denoted by a lowercase letter such as $x_i$, $y_i$, $n_i$.
Example: What is the range of: X = “sum of the numbers shown on the dice.”
Discrete Random Variable
When a random variable is a discrete variable, we call it a discrete random variable.
Equivalently, the range is finite (or countably infinite).
PMF
Such a function p(x) = P(X = x) is called a probability mass function (PMF) of
the random variable X.
# of heads     0      1      2      Total
Probability    0.25   0.5    0.25   1
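A PMF such as the one tabulated above can be obtained by enumerating the sample space. The following is a minimal sketch, assuming all outcomes are equally likely; the helper name `pmf_from_outcomes` is ours and does not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def pmf_from_outcomes(outcomes, rv):
    """PMF of rv(outcome), assuming all outcomes are equally likely."""
    outcomes = list(outcomes)
    counts = Counter(rv(w) for w in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

# Two fair coin tosses; X = number of heads.
coins = product("HT", repeat=2)
heads_pmf = pmf_from_outcomes(coins, lambda w: w.count("H"))

# Two fair dice; X = sum of the numbers shown.
dice = product(range(1, 7), repeat=2)
sum_pmf = pmf_from_outcomes(dice, sum)

print(heads_pmf)        # probabilities for 0, 1, 2 heads
print(sorted(sum_pmf))  # the range of the dice sum
```

Exact fractions are used so that the probabilities can be compared with the table without rounding error.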
Example: Tossing a fair coin (2 possible outcomes); color of the card picked randomly
from a deck (2 possible outcomes).
Bernoulli Distribution
Bernoulli experiment/trial
A Bernoulli experiment has
• One trial that can take two mutually exclusive results: “1” as success and “0” as
failure.
• The probability of success is p.
Bernoulli Distribution
Bernoulli
A random variable $X$ is said to be a Bernoulli random variable if
$$X = \begin{cases} 1 & \text{if success,} \\ 0 & \text{if failure.} \end{cases}$$
Expectation
¹ Abbreviation for “Bernoulli distribution with success probability p”.
Linearity of Expectation
• For any constants a and b,
E[aX + b] = aE[X] + b.
Variance
Variance
The variance of a random variable X, denoted by Var(X), is the expected value of the
squared deviation from the mean of X, that is,
$$\mathrm{Var}(X) = E\big[(X - E[X])^2\big].$$
Property
For any constants a and b,
$$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X).$$
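The two properties E[aX + b] = aE[X] + b and Var(aX + b) = a² Var(X) can be checked numerically on any PMF. The sketch below uses a fair die purely as a test case; the constants `a`, `b` and the helper names are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction

# PMF of a fair die, used only as a test case.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p(x) over the range of X."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf, g=lambda x: x):
    """Var(g(X)) = E[(g(X) - E[g(X)])^2]."""
    mean = expectation(pmf, g)
    return expectation(pmf, lambda x: (g(x) - mean) ** 2)

a, b = 3, -2  # arbitrary constants
lhs_mean = expectation(pmf, lambda x: a * x + b)
lhs_var = variance(pmf, lambda x: a * x + b)

print(lhs_mean == a * expectation(pmf) + b)   # linearity of expectation
print(lhs_var == a**2 * variance(pmf))        # the shift b drops out
```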
Bernoulli Distribution
Binomial Trial
Binomial Trial
A binomial experiment has the following characteristics:
• The experiment consists of a fixed number of observations n.
• Each trial is a Bernoulli trial with success probability p.
• The trials are independent, i.e., the outcome of one trial does not affect the
outcomes of the other trials.
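$X_1, \dots, X_n$ are independent if, for all values $x_1, \dots, x_n$,
$$P(X_1 = x_1, \dots, X_n = x_n) = P(X_1 = x_1) \cdots P(X_n = x_n).$$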
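$$\begin{aligned}
E[XY] &= \sum_i \sum_j x_i y_j \, P(X = x_i, Y = y_j) \\
&= \sum_i \sum_j x_i y_j \, p_X(x_i) \, p_Y(y_j) \\
&= \left( \sum_i x_i \, p_X(x_i) \right) \left( \sum_j y_j \, p_Y(y_j) \right) \\
&= E[X] \, E[Y].
\end{aligned}$$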
Binomial Distribution
Binomial distribution
A Binomial random variable X is the total number of successes in n independent
Bernoulli trials, each with success probability p.
X ∼ Binomial(n, p)
$$\binom{n}{i} = \frac{n!}{i! \times (n - i)!}$$
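Here $I_i$ denotes the Bernoulli indicator of success on trial $i$, so that $X = \sum_{i=1}^{n} I_i$.
• Expectation:
$$E[X] = E\left[ \sum_{i=1}^{n} I_i \right] = \sum_{i=1}^{n} E[I_i] = \sum_{i=1}^{n} p = np.$$
• Variance (the second equality uses the independence of the trials):
$$\mathrm{Var}(X) = \mathrm{Var}\left( \sum_{i=1}^{n} I_i \right) = \sum_{i=1}^{n} \mathrm{Var}(I_i) = np(1 - p).$$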
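As a sanity check on np and np(1 − p), the moments can also be computed directly from the Binomial PMF. This is a minimal sketch; the values of `n` and `p` are arbitrary examples and `binomial_pmf` is a helper name of ours.

```python
from math import comb

def binomial_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

n, p = 10, 0.1  # example values
mean = sum(i * binomial_pmf(i, n, p) for i in range(n + 1))
second_moment = sum(i**2 * binomial_pmf(i, n, p) for i in range(n + 1))
var = second_moment - mean**2

print(mean, n * p)            # both should agree up to rounding
print(var, n * p * (1 - p))   # both should agree up to rounding
```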
Binomial Distribution
Example: Suppose the probability that an item produced by a certain machine will be
defective is 0.1, independent of other items. Find the probability that a sample of 10
items will contain at most one defective item.
Solution:
Let X be the number of defective items in the sample; then X ∼ Binomial(10, 0.1).
So the probability is
$$P\{X = 0\} + P\{X = 1\} = \binom{10}{0} (0.1)^0 (0.9)^{10} + \binom{10}{1} (0.1)^1 (0.9)^9 = 0.7361.$$
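The arithmetic in this example can be reproduced in a few lines. The sketch below only evaluates the two Binomial terms with the standard library.

```python
from math import comb

n, p = 10, 0.1  # sample size and defect probability from the example
prob = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in (0, 1))
print(round(prob, 4))  # 0.7361
```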
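$$X \approx \mathrm{Binomial}(n, p), \qquad E[X] = \lambda = np.$$
Hence $p = \lambda/n$.
$$\begin{aligned}
P(X = i) &\approx \binom{n}{i} p^i (1 - p)^{n-i} = \binom{n}{i} \left( \frac{\lambda}{n} \right)^{i} (1 - \lambda/n)^{n-i} \\
&= \frac{n!}{n^i \, (n - i)!} \, \frac{\lambda^i}{i!} \, (1 - \lambda/n)^{n-i} \\
&\to 1 \times \frac{\lambda^i}{i!} \times e^{-\lambda} \quad \text{as } n \to \infty.
\end{aligned}$$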
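The limit above can be observed numerically: with p = λ/n, the Binomial probability approaches the limiting value as n grows. This is a sketch under illustrative values of `lam` and `i`; the helper names are ours.

```python
from math import comb, exp, factorial

def binomial_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_limit(i, lam):
    """The limiting value lam^i / i! * exp(-lam)."""
    return lam**i / factorial(i) * exp(-lam)

lam, i = 2.0, 3  # illustrative values
errors = []
for n in (10, 100, 1000, 10000):
    err = abs(binomial_pmf(i, n, lam / n) - poisson_limit(i, lam))
    errors.append(err)
    print(n, err)  # the gap shrinks as n increases
```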
Poisson distribution
Poisson distribution
A random variable $X$ is Poisson($\lambda$) with $\lambda > 0$ if the PMF is
$$P\{X = i\} = \frac{\lambda^i}{i!} e^{-\lambda}, \qquad i = 0, 1, 2, \dots$$
Properties of Poisson RV
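• Sums to 1: (Taylor expansion of $e^x$)
$$e^{\lambda} = \sum_{i=0}^{\infty} \frac{\lambda^i}{i!}$$
• Mean:
$$E[X] = \sum_{i=0}^{\infty} i \, e^{-\lambda} \frac{\lambda^i}{i!} = \sum_{i=1}^{\infty} e^{-\lambda} \frac{\lambda^i}{(i-1)!} = \lambda \sum_{i=0}^{\infty} e^{-\lambda} \frac{\lambda^i}{i!} = \lambda.$$
• Variance:
$$E[X^2] = \sum_{i=0}^{\infty} i^2 \, e^{-\lambda} \frac{\lambda^i}{i!} = \sum_{i=1}^{\infty} e^{-\lambda} \frac{\lambda^i}{(i-1)!} (i - 1 + 1) = \lambda + \lambda^2.$$
Hence $\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \lambda$.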
$$\lambda = np = 10 \times 0.1 = 1.$$
$$P\{Y = 0\} + P\{Y = 1\} = e^{-1} \frac{1^0}{0!} + e^{-1} \frac{1^1}{1!} = 2e^{-1} = 0.7358.$$
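To see how close the Poisson value is to the exact Binomial one, both can be evaluated side by side. The sketch reuses the numbers from the defective-item example (n = 10, p = 0.1).

```python
from math import comb, exp, factorial

n, p = 10, 0.1
lam = n * p  # Poisson parameter matched to the Binomial mean

exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in (0, 1))
approx = sum(lam**i / factorial(i) * exp(-lam) for i in (0, 1))
print(round(exact, 4), round(approx, 4))  # 0.7361 0.7358
```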
Intuitively (as we will see later), the sample average and the sample standard deviation are close to the
expectation and the square root of the variance, respectively.
• Because $E[X] = \lambda$, we can use the sample average $\hat{\lambda}_1 = \sum_{i=1}^{n} X_i / n$.
• Because $\mathrm{Var}(X) = \lambda$, we may also use $\hat{\lambda}_2 = S^2$, where $S^2$ is the sample variance.
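The two estimators can be tried on simulated data. The sketch below assumes NumPy is available; `true_lam`, the sample size, and the seed are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily
true_lam = 4.0                        # hypothetical "unknown" parameter
x = rng.poisson(lam=true_lam, size=10_000)

lam_hat_1 = x.mean()        # sample average
lam_hat_2 = x.var(ddof=1)   # sample variance S^2 (divides by n - 1)
print(lam_hat_1, lam_hat_2)  # both should land near true_lam
```

In this setup the two estimates generally differ slightly from each other, since each is computed from a finite sample.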
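[Figure: two plots of a PDF $f(x)$ over $-3 \le x \le 3$; the left panel marks the area $\int_a^b f(s)\,ds$, the right panel marks $F(x) = \int_{-\infty}^{x} f(s)\,ds$.]
$$f(x) = F'(x), \qquad F(x) = \int_{-\infty}^{x} f(s)\,ds.$$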
Variance
The variance is defined as in the discrete case:
$$\mathrm{Var}(X) = E\big[(X - E[X])^2\big].$$
Example: Uniform[a, b]
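Expectation
$$\mu = E[X] = \int_a^b x \cdot \frac{1}{b - a}\,dx = \left. \frac{x^2}{2(b - a)} \right|_a^b = \frac{b^2 - a^2}{2(b - a)} = \frac{a + b}{2}.$$
Second moment
$$E[X^2] = \int_a^b x^2 \cdot \frac{1}{b - a}\,dx = \left. \frac{x^3}{3(b - a)} \right|_a^b = \frac{b^3 - a^3}{3(b - a)} = \frac{b^2 + ab + a^2}{3}.$$
Variance
$$\mathrm{Var}(X) = E[X^2] - \mu^2 = \frac{b^2 + ab + a^2}{3} - \left( \frac{a + b}{2} \right)^2 = \frac{(b - a)^2}{12}.$$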
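The closed forms (a + b)/2 and (b − a)²/12 can be compared against a simulation. This is a rough sketch using only the standard library; the endpoints, seed, and sample size are arbitrary.

```python
import random
from statistics import fmean, pvariance

random.seed(0)          # arbitrary seed for reproducibility
a, b = 2.0, 5.0         # illustrative endpoints
xs = [random.uniform(a, b) for _ in range(100_000)]

print(fmean(xs), (a + b) / 2)            # empirical vs. theoretical mean
print(pvariance(xs), (b - a) ** 2 / 12)  # empirical vs. theoretical variance
```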
• Python:
import numpy as np
np.random.random() # generate one Uniform[0,1) sample
np.random.random((3, 2)) # a 3-by-2 matrix of Uniform[0,1) samples
• R: runif(1)
• Matlab: rand()
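To generate a random variable from any distribution, all you need is the inverse of its CDF and a
Uniform[0, 1] random variable: if U ∼ Uniform[0, 1] and F is a continuous, strictly increasing CDF, then F⁻¹(U) has CDF F.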
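As an illustration of this inverse-CDF idea, the sketch below samples from an exponential distribution, whose CDF F(x) = 1 − exp(−rate · x) has an explicit inverse. The exponential distribution, the function name `sample_exponential`, and the parameter `rate` are our choices for the example, not something the slides specify.

```python
import math
import random

def sample_exponential(rate, u=None):
    """Inverse-transform sample: solve F(x) = u for F(x) = 1 - exp(-rate * x)."""
    if u is None:
        u = random.random()           # U ~ Uniform[0, 1)
    return -math.log(1.0 - u) / rate  # x = F^{-1}(u)

random.seed(1)  # arbitrary seed
rate = 2.0      # illustrative parameter
xs = [sample_exponential(rate) for _ in range(100_000)]
print(sum(xs) / len(xs))  # should be near the exponential mean 1 / rate
```

The optional argument `u` is there only so that the mapping from a given uniform value to a sample can be inspected directly.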
https://www.lexaloffle.com/bbs/?pid=114907
Consider a bead on its way down the Galton board. After each row of pegs, the bead
takes a step:
• It goes left one step, or
• It goes right one step.
Let X be the random variable representing the bead’s move at a given row:
$$X = \begin{cases} -1, & \text{if the bead goes left,} \\ 1, & \text{if the bead goes right.} \end{cases}$$
The Galton board suggests that as the number of rows increases, the distribution of
the bead’s position $Y_n$ tends to form a bell curve due to the Central Limit Theorem.
Variance
$$\mathrm{Var}(Y_n) = \mathrm{Var}\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = n\,\mathrm{Var}(X) = n \times 1 = n.$$
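A small simulation of the Galton board can be used to check that the bead position has mean 0 and variance n. The sketch assumes left and right steps are equally likely (consistent with Var(X) = 1); the number of rows, number of beads, and seed are arbitrary.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # arbitrary seed

def bead_position(n_rows):
    """Final position Y_n: sum of n_rows independent steps of -1 or +1."""
    return sum(random.choice((-1, 1)) for _ in range(n_rows))

n_rows, n_beads = 20, 20_000  # illustrative sizes
positions = [bead_position(n_rows) for _ in range(n_beads)]
print(fmean(positions), pvariance(positions))  # near 0 and near n_rows
```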
Observation: When plotting a histogram (e.g., using Python libraries such as Matplotlib), the plotter
automatically scales the canvas so that the histogram fills most of the available space. But why is this done?
• Maximizing Visual Information: Scaling ensures that the details of the data distribution (such as peaks,
valleys, and spread) are clearly visible.
Question: What is the “natural scale” for the plot as a function of $n$?
$$3\sqrt{10} \approx 10, \qquad 3\sqrt{40} \approx 20, \qquad 3\sqrt{1000} \approx 100.$$
$$Z_n = \frac{Y_n - E[Y_n]}{\sqrt{\mathrm{Var}(Y_n)}} = \frac{Y_n - 0}{\sqrt{n}}.$$
$$Z_n = \frac{Y_n - E[Y_n]}{\sqrt{\mathrm{Var}(Y_n)}} \approx Z, \quad \text{as } n \text{ becomes large.}$$
Here, Z is a continuous random variable following the “bell curve” distribution, called
the standard normal distribution, denoted by N (0, 1).
• The CLT explains why the distribution of bead positions tends to become
bell-shaped as the number of rows increases.
• More generally, the CLT states that sums of many independent and identically
distributed (i.i.d.) random variables with finite variance, when standardized, converge in distribution to N (0, 1).
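The standardization step can be tried on a sum that is not built from ±1 steps. The sketch below uses i.i.d. Uniform[0, 1] summands (our choice for illustration, with mean 1/2 and variance 1/12) and compares an empirical probability for Z_n with the standard normal CDF, written here via the error function.

```python
import math
import random

random.seed(0)  # arbitrary seed

def standardized_sum(n):
    """Z_n for a sum of n i.i.d. Uniform[0, 1] variables."""
    s = sum(random.random() for _ in range(n))
    return (s - n * 0.5) / math.sqrt(n / 12)  # subtract mean, divide by std. dev.

def std_normal_cdf(x):
    """P(Z <= x) for Z ~ N(0, 1), using the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, reps = 30, 20_000  # illustrative sizes
zs = [standardized_sum(n) for _ in range(reps)]
empirical = sum(z <= 1.0 for z in zs) / reps
print(empirical, std_normal_cdf(1.0))  # the two values should be close
```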
The quality of the normal approximation via the Central Limit Theorem (CLT) varies:
• If the underlying distribution is normal, the approximation is exact.
• If the underlying distribution is skewed, the approximation may be poor for small
sample sizes.
• The quality of the approximation improves as the sample size increases.
• As a rule of thumb, if the distribution is not too skewed and the variance is
moderate, a sample size of n ≥ 30 should provide a reasonably accurate
approximation.
Note that
$$E[I_i] = p \quad \text{and} \quad \mathrm{Var}(I_i) = p(1 - p).$$
By the Central Limit Theorem,
$$Y_n \approx n\,E[I_1] + \sqrt{n\,\mathrm{Var}(I_1)}\,Z = np + \sqrt{np(1 - p)}\,Z, \quad \text{where } Z \sim N(0, 1).$$
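To get a feel for the quality of this approximation, an exact Binomial probability can be compared with the corresponding normal one. This is a sketch with illustrative values of `n`, `p`, and `k`; the helper names are ours.

```python
from math import comb, erf, sqrt

def std_normal_cdf(x):
    """P(Z <= x) for Z ~ N(0, 1), using the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def binomial_cdf(k, n, p):
    """Exact P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p, k = 100, 0.5, 55  # illustrative values
mean, sd = n * p, sqrt(n * p * (1 - p))

exact = binomial_cdf(k, n, p)
approx = std_normal_cdf((k - mean) / sd)  # P(np + sd * Z <= k)
print(exact, approx)
```

Replacing `k` with `k + 0.5` in the normal term (a continuity correction, which the slides do not cover) typically narrows the gap for integer-valued sums.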
For a deeper understanding of the Central Limit Theorem and related topics, consider
watching these insightful videos by @3Blue1Brown:
• “But what is the Central Limit Theorem?”
• “A pretty reason why Gaussian + Gaussian = Gaussian”
Normal
Having seen the universality of the standard normal distribution, ensured by the central
limit theorem, we can now focus more on “The Bell Curve” itself.
Normal Distribution
A random variable is said to be normally distributed with parameters $\mu$ and $\sigma^2$, and
we write $X \sim N(\mu, \sigma^2)$, if the PDF is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty.$$
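For $Y = aX + b$ with $X \sim N(\mu, \sigma^2)$ and $a \neq 0$,
$$Y \sim N(a\mu + b, \, a^2 \sigma^2).$$
Now, if we set
$$a = \frac{1}{\sigma} \quad \text{and} \quad b = -\frac{\mu}{\sigma},$$
we obtain
$$Y = \frac{X - \mu}{\sigma} \sim N(0, 1).$$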
This transformation is known as the standardization of a normal distribution.
Note: Although Φ(x) has no closed-form expression, numerical values are widely
available.
Complement Rule
P(Z > x) = 1 − P(Z ≤ x) = 1 − Φ(x).
Symmetry Property
P(Z < −x) = Φ(−x).
Since P(Z > x) = P(Z < −x), it follows that:
Φ(−x) = 1 − Φ(x).
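$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
implies
$$P(X < b) = P\left( Z < \frac{b - \mu}{\sigma} \right) = \Phi\left( \frac{b - \mu}{\sigma} \right)$$
and
$$P(a < X < b) = \Phi\left( \frac{b - \mu}{\sigma} \right) - \Phi\left( \frac{a - \mu}{\sigma} \right).$$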
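These formulas translate directly into code once Φ is available numerically. The sketch below writes Φ with the standard library's error function; the parameters `mu` and `sigma` are hypothetical values used only for the demonstration.

```python
from math import erf, sqrt

def std_normal_cdf(x):
    """Phi(x) = P(Z <= x) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_interval_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), via standardization."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)

mu, sigma = 10.0, 2.0  # hypothetical parameters
print(normal_interval_prob(8.0, 12.0, mu, sigma))       # within one sigma of the mean
print(std_normal_cdf(-1.0), 1.0 - std_normal_cdf(1.0))  # symmetry: Phi(-x) = 1 - Phi(x)
```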