T2 Distributions

The document provides an overview of statistics for engineers, focusing on probability distributions and random variables. It introduces concepts such as random experiments, sample spaces, and the importance of random variables in statistical analysis. Additionally, it covers types of random variables, probability mass functions, and specific distributions like Bernoulli and Binomial distributions.

IEDA 2540 Statistics for Engineers

Probability Distributions

Wei YOU

Spring, 2025

Random Variable Discrete R.V. Continuous R.V.

Introduction
Last time we saw descriptive statistics.
• Graphical: pie chart, bar chart, stem-and-leaf diagram, box plot, histogram.
• Numerical: sample range, sample mean, sample quartile, IQR, sample standard
deviation, sample correlation.

Descriptive statistics provides an initial look into the dataset you collected.
• Next, we need a more sophisticated analysis of the data.
• Inferential statistics provides mathematical tools to infer the characteristics of, or
make assertions about, the population from the sample.
• In this topic, we will explore how random variables are essential in building the
framework for these inferential methods.

Introduction

If the population has no underlying structure, drawing scientifically valid conclusions becomes impossible. To resolve this, we assume that each observation is a random variable.
• Question 1: What is a random variable?
• Question 2: For a given experiment, what is a suitable random variable to model
it? ⇒ In this lecture, we review random variables commonly seen in statistics.
• Question 3: After we pinpoint a random variable, how do we use it to make
statistical claims? ⇒ This will be addressed later in the course.


Introduction to Random Experiments

Statistics regards experiments as random and focuses on their outcomes.


Random Experiment
A random experiment is an experiment in which the outcome is not known until the
experiment is performed.

Example: Rolling two dice, where an outcome could be (6, 6) or (3, 1), etc.

Sample Space
The sample space is the set of all possible outcomes of a random experiment.

Example: For two dice, this includes all ordered pairs from (1, 1), (1, 2), . . . , (6, 6).


Outcomes and Events


Statistics regards experiments as random. Statisticians care about the outcomes that satisfy a certain description.
Event
An event is a specific subset of the sample space that satisfies a given condition. For
instance, you win $1 if the event “the sum of the two dice is 6” occurs, which
corresponds to the outcomes {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}.

The outcomes of a random experiment and the corresponding events may be difficult
to handle mathematically:
Example: when rolling two dice, the outcome is an ordered pair of numbers, which
does not lend itself directly to arithmetic operations.
Example: In clinical trials, the outcome might be a group of individuals with various treatment results, complicating direct analysis.

What is a Random Variable?

To perform statistical analysis, we would like to simplify these outcomes by assigning numerical values to them, so that mathematical calculation is possible. This process leads to the concept of a random variable.
Random variable
A random variable is a function that assigns a real number to each outcome of a
random experiment.

• For random experiments, the outcome is not known in advance.
• Hence, the associated value (the variable!) describing the event of interest is also unknown.


What is a Random Variable?


Example: Roll two dice, the outcome could be (3, 1).
Let
X = “sum of the numbers shown on the dice.”

We have

X((3, 1)) = 4 and X((6, 6)) = 12,

where (3, 1) and (6, 6) are outcomes, and 4 and 12 are the corresponding values of the random variable.

• After the experiment, we observe a deterministic/known number. This is called a realization of the random variable.

X maps an outcome (not an event) to a number.



Example: Rolling Two Dice


Random Variable

Notation convention
• A generic random variable is denoted by an uppercase letter such as X, Y, N .
• After the experiment is conducted, the observed value/realization
(deterministic/known) is denoted by a lowercase letter such as xi , yi , ni .


Types of Random Variables


Range of a random variable
The range of a random variable is all the possible values that it can take.

Example: What is the range of X = “sum of the numbers shown on the dice”? (It is {2, 3, . . . , 12}.)
Discrete Random Variable
When a random variable is a discrete variable, we call it a discrete random variable.
Equivalently, the range is finite (or countably infinite).

Continuous Random Variable


When a random variable is a continuous variable, we call it a continuous random
variable. Equivalently, the range is an interval of real numbers.

We will review some of the commonly used probability distributions.



Probability Distributions: Why should we care?


Many commonly seen experiments can be characterized by simple distributions.
Example: consider the toss of a coin. The coin has two possible outcomes: {H, T },
where H appears with a probability p. The outcome is characterized by a simple
random variable with two possible values.
In understanding random experiments, we need to analyze the behavior of the
randomness of the associated random variables.
• The problem is: we do not know exactly the value of p!
• Statistics can help in estimating this parameter.
• The foundation is that we assume the coin indeed follows such a probability
distribution – the form of the distribution is assumed beforehand, with only the
parameter p being unknown.

Discrete Random Variables and Probability Mass Function (PMF)


In describing the randomness of an event, we introduced probability.
• The chance of an event E happening is denoted P(E).

To describe the randomness of a discrete random variable X, we consider the


probability of X taking certain values.
• Consider a special event
Ex = “X = x”.
• Here, Ex collects all outcomes ω such that X(ω) = x, so Ex is indeed an event.
• With the definition of probability for events, we can say that the probability of X
taking a value of x is
P(X = x) = P(Ex ).

PMF
Such a function p(x) = P(X = x) is called a probability mass function (PMF) of
the random variable X.

Example: Flip two coins

# of heads 0 1 2 Total
Probability 0.25 0.5 0.25 1
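The table above can be reproduced by enumerating the sample space of two fair coin flips; a minimal sketch (variable names are illustrative):

```python
from collections import Counter
from itertools import product

# Sample space of two fair coin flips; each of the 4 outcomes has probability 1/4.
outcomes = list(product("HT", repeat=2))

# The random variable X maps each outcome to its number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# PMF: p(x) = (# outcomes with x heads) / (total # outcomes).
pmf = {x: c / len(outcomes) for x, c in sorted(counts.items())}
print(pmf)  # {0: 0.25, 1: 0.5, 2: 0.25}
```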


Uniform Random Variable

Uniform Random Variable


A random variable with equal probability for all outcomes is called a uniform random variable.

Example: Tossing a fair coin (2 possible outcomes); color of the card picked randomly
from a deck (2 possible outcomes).


Bernoulli Distribution

In statistical studies, we usually have experiments that give binary outcomes.


Example: A product passes/fails quality control.
Example: A coin toss gives head/tail.
These experiments are usually called Bernoulli trials or Bernoulli experiments.

Bernoulli experiment/trial
A Bernoulli experiment has
• One trial that can take two mutually exclusive results: “1” as success and “0” as
failure.
• The probability of success is p.


Bernoulli Distribution

Bernoulli
A random variable X is said to be a Bernoulli random variable if

X = 1 if success, 0 if fail.

A random variable X is said to follow a Bernoulli distribution with success probability p, if

P{X = 1} = p and P{X = 0} = 1 − p.


Expectation

Expectation for discrete random variable


The expectation (expected value, mean) of a random variable X is denoted by E[X]. In the discrete case, the expectation is the average of all possible values, weighted by their probabilities. Mathematically, it is defined as

E[X] = Σ_i x_i P(X = x_i).

Example: let X be a Bernoulli(p)¹ random variable. Then,

E[X] = 1 × P(X = 1) + 0 × P(X = 0) = p.

¹ Abbreviation for “Bernoulli distribution with success probability p”.

Linearity of Expectation
• For any constant a and b,

E[aX + b] = aE[X] + b.

• For any constants a1 , . . . , an and b,

E[a1 X1 + a2 X2 + · · · + an Xn + b] = a1 E[X1 ] + a2 E[X2 ] + · · · + an E[Xn ] + b.

Cf. linearity of the sample mean x̄:

• The sample mean of a x + b is a x̄ + b.
• The sample mean of a_1 x_1 + a_2 x_2 + · · · + a_n x_n + b is a_1 x̄_1 + a_2 x̄_2 + · · · + a_n x̄_n + b.


Variance
Variance
The variance of a random variable X, denoted by Var(X), is the expected value of the
squared deviation from the mean of X, that is,

Var(X) = E[(X − µ)2 ] = E[X 2 ] − µ2 , where µ = E[X].

Property
For any constants a and b,

Var(aX + b) = a2 Var(X).

Cf. property of sample variance:


• If yi = a xi + b, i = 1, 2, . . . , n, then s2y = a2 s2x .

Bernoulli Distribution

Example: Let X be a Bernoulli(p) random variable. Then

Var(X) = E[(X − E[X])2 ] = (1 − p)2 · P(X = 1) + (0 − p)2 · P(X = 0) = p(1 − p).
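The Bernoulli mean and variance can be checked directly from the PMF; a small sketch (p = 0.3 is an illustrative choice):

```python
p = 0.3  # illustrative success probability

# PMF of Bernoulli(p): value -> probability.
pmf = {1: p, 0: 1 - p}

# E[X] = sum of x * P(X = x); Var(X) = E[(X - mean)^2].
mean = sum(x * q for x, q in pmf.items())
var = sum((x - mean) ** 2 * q for x, q in pmf.items())

print(mean, var)  # mean = p; var = p(1 - p) ≈ 0.21
```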


Binomial Trial

Binomial Trial
A binomial experiment has the following characteristics:
• The experiment consists of a fixed number of observations n.
• Each trial is a Bernoulli trial with success probability p.
• The trials are independent, i.e., the outcome of one trial does not impact the outcomes of the other trials.

• Suppose we have a binomial experiment with parameters n and p.


• How many successes are there in total?


Independence of Random Variables


Independence
Two random variables X and Y (not necessarily discrete) are independent if for any
two sets of real numbers A and B

P({X ∈ A} ∩ {Y ∈ B}) = P({X ∈ A}) P({Y ∈ B}).

For discrete random variables, X and Y are independent if

P(X = xi , Y = yj ) = P(X = xi ) P(Y = yj ).

X1 , . . . , Xn are independent if

P(X1 = x1 , . . . , Xn = xn ) = P(X1 = x1 ) × · · · × P(Xn = xn ).



Independence: Expectation and Variance Properties


Product of Independent Random Variables
If X and Y (not necessarily discrete) are independent, then

E[XY ] = E[X] E[Y ].

Variance of a Sum of Independent Random Variables


For independent random variables X1 , X2 , . . . , Xn , the variance of their sum is equal
to the sum of their variances:

Var(X + Y) = Var(X) + Var(Y),

Var(Σ_i X_i) = Σ_i Var(X_i).


Proofs for reference

E[XY] = Σ_i Σ_j x_i y_j P(X = x_i, Y = y_j)
      = Σ_i Σ_j x_i y_j p_X(x_i) p_Y(y_j)
      = (Σ_i x_i p_X(x_i)) (Σ_j y_j p_Y(y_j))
      = E[X] E[Y].


Proofs for reference

Var(X + Y ) = E[(X + Y )2 ] − (E[X + Y ])2


= E[X 2 + 2XY + Y 2 ] − (E[X] + E[Y ])2
 
= E[X 2 + 2XY + Y 2 ] − (E[X])2 + 2E[X]E[Y ] + (E[Y ])2
 
= E[X 2 ] + 2E[XY ] + E[Y 2 ] − (E[X])2 + 2E[X]E[Y ] + (E[Y ])2
 
= E[X 2 ] + 2E[X]E[Y ] + E[Y 2 ] − (E[X])2 + 2E[X]E[Y ] + (E[Y ])2
   
= E[X 2 ] − (E[X])2 + E[Y 2 ] − (E[Y ])2
= Var(X) + Var(Y ).


Binomial Distribution
Binomial distribution
A Binomial random variable X is the total number of success from n independent
Bernoulli trials, each with success probability p.

X ∼ Binomial(n, p)

The probability distribution is given by

P{X = i} = C(n, i) p^i (1 − p)^(n−i), i = 0, 1, 2, . . . , n,

where the binomial coefficient is C(n, i) = n! / (i! (n − i)!).

Binomial Distribution

Named after the binomial expansion

(a + b)^n = Σ_{i=0}^n C(n, i) a^i b^(n−i).

Do the probabilities sum to 1? We have that

Σ_{i=0}^n P{X = i} = Σ_{i=0}^n C(n, i) p^i (1 − p)^(n−i) = [p + (1 − p)]^n = 1.


Connection between Binomial and Bernoulli


Let X ∼ Binomial(n, p) and let I_1, . . . , I_n ∼ Bernoulli(p) be independent Bernoulli random variables with success probability p. Then

X = Σ_{i=1}^n I_i.

• Expectation:

E[X] = E[Σ_{i=1}^n I_i] = Σ_{i=1}^n E[I_i] = Σ_{i=1}^n p = np.

• Variance:

Var(X) = Var(Σ_{i=1}^n I_i) = Σ_{i=1}^n Var(I_i) = np(1 − p).

Binomial Distribution

Example: Suppose the probability that an item produced by a certain machine will be defective is 0.1, independent of other items. Find the probability that a sample of 10 items will contain at most one defective item.
Solution:
Let X be the number of defective items; then X ∼ Binomial(10, 0.1).
So the probability is

P{X = 0} + P{X = 1} = C(10, 0) (0.1)^0 (0.9)^10 + C(10, 1) (0.1)^1 (0.9)^9 = 0.7361
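The 0.7361 figure can be reproduced with the standard library's `math.comb`; a minimal check:

```python
from math import comb

def binom_pmf(i, n, p):
    """P{X = i} for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

# P{X <= 1} for X ~ Binomial(10, 0.1): at most one defective item.
prob = binom_pmf(0, 10, 0.1) + binom_pmf(1, 10, 0.1)
print(round(prob, 4))  # 0.7361
```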


Application of Binomial Trial

When n is large and p is small, a binomial experiment can be used to model


• The number of misprints on a page (or a group of pages) of a book.
• The number of customers entering a post office on a given day.
• The number of patients in an emergency department in a day.
• The number of calls to a call center over a week.

Sounds good! But do we have problems in the calculation?

P{X = 0} + P{X = 1} = C(10, 0) (0.1)^0 (0.9)^10 + C(10, 1) (0.1)^1 (0.9)^9 = 0.7361

For large n, such expressions are hard to calculate and numerically unstable. Approximations?


Example: Wire Flaws

Flaws occur at random along the length of a thin copper wire.


• X = number of flaws in a unit-length wire.
• λ = rate at which flaws occur, i.e., λ flaws per unit length.
• Expectation: E[X] = λ × 1.

How to calculate the distribution of X?


• Partition the wire into n sections, each with length ∆t = 1/n.
• Assume that at most one flaw may occur in each section.
• Assume that the flaws occur at random in each section, with probability p.
• Look familiar?


Example: Wire Flaws


The number of flaws can be approximated by a binomial random variable

X ≈ Binomial(n, p),

where n is the number of sections and p is chosen to match the expectation:

E[X] = λ = np.

Hence p = λ/n.

P(X = i) ≈ C(n, i) p^i (1 − p)^(n−i) = C(n, i) (λ/n)^i (1 − λ/n)^(n−i)
        = [n! / (n^i (n − i)!)] · (λ^i / i!) · (1 − λ/n)^(n−i)
        → 1 × (λ^i / i!) × e^(−λ), as n → ∞.

Poisson distribution
Poisson distribution
A random variable X is Poisson(λ) with λ > 0 if the PMF is

P{X = i} = (λ^i / i!) e^(−λ), i = 0, 1, 2, . . .


Properties of Poisson RV
• Sums to 1: by the Taylor expansion of e^x,

e^λ = Σ_{i=0}^∞ λ^i / i!, so Σ_{i=0}^∞ P{X = i} = e^(−λ) e^λ = 1.

• Mean:

E[X] = Σ_{i=0}^∞ i e^(−λ) λ^i / i! = λ Σ_{i=1}^∞ e^(−λ) λ^(i−1) / (i − 1)! = λ.

• Variance:

E[X²] = Σ_{i=0}^∞ i² e^(−λ) λ^i / i! = λ Σ_{i=1}^∞ (i − 1 + 1) e^(−λ) λ^(i−1) / (i − 1)! = λ² + λ.

Hence Var(X) = E[X²] − (E[X])² = λ.
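These series identities can be sanity-checked by truncating the infinite sums; a sketch with an illustrative λ (the truncation point K is chosen so the tail is negligible):

```python
from math import exp, factorial

lam = 2.0  # illustrative rate
K = 80     # truncation point for the infinite sums

# Truncated Poisson PMF: P{X = i} = e^{-lam} lam^i / i!.
pmf = [exp(-lam) * lam**i / factorial(i) for i in range(K)]

total = sum(pmf)                              # should be ~1
mean = sum(i * q for i, q in enumerate(pmf))  # should be ~lam
second = sum(i * i * q for i, q in enumerate(pmf))
var = second - mean**2                        # should be ~lam
print(total, mean, var)
```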

Poisson approximates Binomial


When n is large and p is small,

Poisson(np) ≈ Binomial(n, p).

Example: Recall the defective machine example.

P{X = 0} + P{X = 1} = 0.7361

If we want to use a Poisson random variable to approximate, we match the expectation:

λ = np = 10 × 0.1 = 1.

Try Y ∼ Poisson(1) to approximate it

P{Y = 0} + P{Y = 1} = e^(−1) · 1^0/0! + e^(−1) · 1^1/1! = 2e^(−1) = 0.7358
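The pair of numbers 0.7361 and 0.7358 can be reproduced side by side; a minimal sketch:

```python
from math import comb, exp

n, p = 10, 0.1
lam = n * p  # match the expectation: lambda = np = 1

# Exact binomial: P{X = 0} + P{X = 1}.
binom = comb(n, 0) * (1 - p) ** n + comb(n, 1) * p * (1 - p) ** (n - 1)

# Poisson(lambda) approximation: P{Y = 0} + P{Y = 1} = e^{-lam}(1 + lam).
poisson = exp(-lam) * (1 + lam)

print(round(binom, 4), round(poisson, 4))  # 0.7361 0.7358
```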

Poisson approximates Binomial

Poisson approximating Binomial


• Decent accuracy.
• Asymptotically the same.
• Much less computation needed.


Poisson: Statistical View


A Poisson random variable can be used to model the number of patients visiting the Emergency Department (ED) every day.
• Knowing λ can help with the staffing decision.
• If we observe X1 , . . . , Xn patients in an ED over a period of n days, how can we
estimate λ?

Intuitively (we will see), the sample average and sample standard deviation are close to the expectation and the square root of the variance.
• Because E[X] = λ, we can use the average λ̂_1 = (1/n) Σ_{i=1}^n X_i.
• Because Var(X) = λ, we may also use λ̂_2 = S², where S² is the sample variance.

Which is better? ⇒ Estimation theory.



Continuous Random Variables


Continuous Random Variable
When a random variable is a continuous variable, we call it a continuous random
variable. Equivalently, the range is an interval of real numbers.

(Figure: a generic PDF f(x), with the shaded area ∫_a^b f(s) ds, and the corresponding CDF F(x) = ∫_{−∞}^x f(s) ds.)

f(x) = F′(x), F(x) = ∫_{−∞}^x f(s) ds.

Uniform Distribution on [0,1]: CDF and PDF


Uniform Distribution on [0, 1]
A random variable X is said to follow a Uniform distribution on the interval [0, 1] if it
has an equal chance to take any value in [0, 1].
The probability density function (PDF):

f(x) = F′(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise.

The cumulative distribution function (CDF):

F(x) = P(X ≤ x) = 0 for x < 0; x for 0 ≤ x ≤ 1; 1 for x > 1.

Uniform Distribution on [a, b]: CDF and PDF


Uniform Distribution on [a, b]
A random variable X is said to follow a Uniform distribution on the interval [a, b] if it
has an equal chance to take any value in [a, b].
The probability density function (PDF):

f(x) = F′(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.

The cumulative distribution function (CDF):

F(x) = P(X ≤ x) = 0 for x < a; (x − a)/(b − a) for a ≤ x ≤ b; 1 for x > b.

Expectation and Variance (Continuous Case)


Expectation
The expectation of a continuous random variable X is given by:

µ = E[X] = ∫_{−∞}^∞ x f(x) dx.

If we know the distribution of X, then the expectation of a function g(X) is

E[g(X)] = ∫_{−∞}^∞ g(x) f(x) dx.

Variance
The variance is defined similarly as in the discrete case:

σ² = Var(X) = E[(X − µ)²] = E[X²] − µ².



Example: Uniform[a, b]
Expectation

µ = E[X] = ∫_a^b x · [1/(b − a)] dx = (b² − a²) / (2(b − a)) = (a + b)/2.

Second moment

E[X²] = ∫_a^b x² · [1/(b − a)] dx = (b³ − a³) / (3(b − a)) = (b² + ab + a²)/3.

Variance

Var(X) = E[X²] − µ² = (b² + ab + a²)/3 − ((a + b)/2)² = (b − a)²/12.

Generating Uniform R.V.

We assume that a Uniform[0, 1] random variable can always be generated.

• Python:

import numpy as np
np.random.rand()      # generate one Uniform[0,1]
np.random.rand(3, 2)  # a 3-by-2 matrix of Uniform[0,1] draws

• R: runif(1)
• Matlab: rand()


Why is Uniform R.V. Important?


The inverse of a distribution function: F_X⁻¹(y) = inf{x : F_X(x) ≥ y}.
Inverse of CDF
• For any random variable X with continuous CDF F_X(·), Y = F_X(X) is Uniform[0, 1].
• If Y is Uniform[0, 1], then for any CDF F_X(·), X = F_X⁻¹(Y) has CDF F_X(·).

Proof: for a continuous CDF,

P(F_X(X) ≤ x) = P(X ≤ F_X⁻¹(x)) = F_X(F_X⁻¹(x)) = x,
P(F_X⁻¹(Y) ≤ x) = P(Y ≤ F_X(x)) = F_X(x).

To generate R.V. from any distribution, all you need is the inverse of the CDF and
Uniform[0, 1].
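As a concrete instance of this recipe, the exponential CDF F(x) = 1 − e^(−λx) inverts in closed form to F⁻¹(y) = −ln(1 − y)/λ, so exponential draws can be generated from Uniform[0, 1] draws; a sketch (the rate λ and the seed are illustrative choices):

```python
import math
import random

random.seed(0)
lam = 2.0  # illustrative rate; the mean of the distribution is 1/lam = 0.5

def exp_inverse_cdf(y):
    """F^{-1}(y) for the exponential CDF F(x) = 1 - exp(-lam * x)."""
    return -math.log(1 - y) / lam

# Inverse transform: push Uniform[0,1] draws through the inverse CDF.
samples = [exp_inverse_cdf(random.random()) for _ in range(100_000)]

mean = sum(samples) / len(samples)
print(mean)  # close to 1/lam = 0.5
```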

The Galton Board


The Galton board is a vertical board with interleaved rows of pegs. Beads are
dropped from the top and, when the board is level, each bead bounces either left or
right as it hits a peg. Eventually, the beads are collected into bins at the bottom, and
the height of the bead columns in the bins represents the frequency of outcomes.

• It serves as a physical demonstration of random processes, where each left/right bounce mimics an independent Bernoulli trial.
• As the number of rows increases, the distribution of beads in the bins tends to
approximate a normal distribution, illustrating the Central Limit Theorem.
• The Galton board visually demonstrates how randomness at the micro-level
can lead to predictable, statistical behavior at the macro-level.

https://www.lexaloffle.com/bbs/?pid=114907

Galton Board: Modeling a Bead’s Path

Consider a bead on its way down the Galton board. After each row of pegs, the bead
takes a step:
• It goes left one step, or
• It goes right one step.

Let X be the random variable representing the bead’s move at a given row:

X = −1 if the bead goes left, +1 if the bead goes right,

with P(X = −1) = 0.5 and P(X = 1) = 0.5.


Galton Board: Modeling a Bead’s Path


Denote by Yi the position of the bead in the i-th row. The initial position is defined as
Y0 = 0. Then the position in the first row is Y1 = Y0 + X1 = X1 , where X1 is an
independent and identically distributed (i.i.d.) copy of X.
As we continue, the position at row k is given by

Y_k = Y_{k−1} + X_k = Σ_{i=1}^k X_i.

By mathematical induction, the position at any row n is:

Y_n = Σ_{i=1}^n X_i.

The Galton board suggests that as the number of rows increases, the distribution of the bead’s position Y_n tends to form a bell curve due to the Central Limit Theorem.

Central Tendency and Dispersion of Yn


Recall that the bead’s position after n rows is given by Y_n = Σ_{i=1}^n X_i.
Expectation

E[Y_n] = E[Σ_{i=1}^n X_i] = Σ_{i=1}^n E[X_i] = n E[X] = n × 0 = 0.

Variance

Var(Y_n) = Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i) = n Var(X) = n × 1 = n.
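These formulas match a direct simulation of the ±1 walk; a sketch (bead count, row count, and seed are illustrative choices):

```python
import random

random.seed(42)
n_rows, n_beads = 25, 20_000

# Each bead's final position: sum of n_rows i.i.d. steps of -1 or +1.
positions = [
    sum(random.choice((-1, 1)) for _ in range(n_rows))
    for _ in range(n_beads)
]

mean = sum(positions) / n_beads
var = sum((y - mean) ** 2 for y in positions) / n_beads
print(mean, var)  # mean close to 0, var close to n_rows = 25
```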


Galton Board (25 Rows): Central Tendency and Dispersion

There are 25 rows in this Galton board.
• µ = E[Y_25] = 0.
• σ = √Var(Y_25) = √25 = 5.
The Galton board suggests that:
• Roughly 68% of the beads end up in the bins in positions µ ± σ.
• Roughly 95% of the beads end up in the bins in positions µ ± 2σ.
• Almost all beads end up in the bins in positions µ ± 3σ.


Increasing the rows in the Galton board

• As the number of rows n increases, the distribution of the beads’ locations spreads wider and wider: σ = √Var(Y_n) = √n.
• The shape of the distribution becomes more and more like a bell curve.

Increasing the rows in the Galton board

Observation: When plotting a histogram (e.g., using Python libraries such as Matplotlib), the plotter
automatically scales the canvas so that the histogram fills most of the available space. But why is this done?
• Maximizing Visual Information: Scaling ensures that the details of the data distribution (such as peaks,
valleys, and spread) are clearly visible.

Question: What is the “natural scale” for the plot as a function of n?
3 × √10 ≈ 10, 3 × √40 ≈ 20, 3 × √1000 ≈ 100.

Standardizing the Beads’ Positions


To implement the scaling logic of the plotter, we can standardize the positions of the beads:

Z_n = (Y_n − E[Y_n]) / √Var(Y_n) = (Y_n − 0) / √n.

The standardized positions Z_n will be centered around 0 and have a “standard” standard deviation of 1.
It turns out that regardless of the number of rows, the standardized positions follow the same bell curve.


Central Limit Theorem and the Galton Board


The math behind the Galton board can be rigorously justified, yielding the famous Central Limit Theorem (CLT) for summation.
Recall that Y_n = Σ_{i=1}^n X_i; we have:

Z_n = (Y_n − E[Y_n]) / √Var(Y_n) ≈ Z, as n becomes large.

Here, Z is a continuous random variable following the “bell curve” distribution, called the standard normal distribution, denoted by N(0, 1).

• The CLT explains why the distribution of bead positions tends to become
bell-shaped as the number of rows increases.
• More generally, the CLT states that sums of many independent and identically distributed (i.i.d.) random variables, when standardized, converge to N(0, 1).

Sample Mean and Standardized Position


Starting with

Z_n = (Y_n − E[Y_n]) / √Var(Y_n), where Y_n = Σ_{i=1}^n X_i,

we note that E[Y_n] = n E[X_1] and Var(Y_n) = n Var(X_1). Hence,

Z_n = (Σ_{i=1}^n X_i − n E[X_1]) / √(n Var(X_1)) = (X̄ − E[X_1]) / √(Var(X_1)/n), where X̄ = (1/n) Σ_{i=1}^n X_i.

CLT for sample mean

X̄ = E[X_1] + √(Var(X_1)/n) · Z_n ≈ E[X_1] + √(Var(X_1)/n) · Z, for large n.

• The sample mean X̄ is centered around the population mean E[X_1].
• The scale of its fluctuations is about √(Var(X_1)/n), which decreases as n increases.

Central Limit Theorem (CLT) Works for Any Distribution


Central Limit Theorem (CLT)
Let X_1, X_2, . . . , X_n be independent and identically distributed (i.i.d.) random variables with finite variance. Then

Z_n = (X̄ − E[X_1]) / √(Var(X_1)/n) ≈ Z ∼ N(0, 1), where X̄ = (1/n) Σ_{i=1}^n X_i.

This result implies that

X̄ ≈ E[X_1] + √(Var(X_1)/n) · Z,

Σ_{i=1}^n X_i ≈ n E[X_1] + √(n Var(X_1)) · Z.


Quality of the Normal Approximation

The quality of the normal approximation via the Central Limit Theorem (CLT) varies:
• If the underlying distribution is normal, the approximation is exact.
• If the underlying distribution is skewed, the approximation may be poor for small
sample sizes.
• The quality of the approximation improves as the sample size increases.
• As a rule of thumb, if the distribution is not too skewed and the variance is
moderate, a sample size of n ≥ 30 should provide a reasonably accurate
approximation.


Normal Approximation for a Binomial Random Variable


Suppose Y_n is a binomial random variable with parameters n and p; then Y_n can be approximated by a normal distribution when n is large (works well when np ≥ 5 and n(1 − p) ≥ 5).
Solution:

Y_n = Σ_{i=1}^n I_i, where each I_i ∼ Bernoulli(p).

Note that

E[I_i] = p and Var(I_i) = p(1 − p).

By the Central Limit Theorem,

Y_n ≈ n E[I_1] + √(n Var(I_1)) · Z = np + √(np(1 − p)) · Z, where Z ∼ N(0, 1).
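The accuracy of this approximation can be inspected by comparing an exact binomial probability with its normal counterpart, using Φ(x) = (1 + erf(x/√2))/2 and a half-unit continuity correction (the parameters n, p, k here are illustrative choices, not from the slides):

```python
from math import comb, erf, sqrt

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p, k = 50, 0.3, 15  # np = 15 >= 5 and n(1 - p) = 35 >= 5

# Exact: P{Y_n <= k} summed from the binomial PMF.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation with continuity correction: Phi((k + 0.5 - np)/sigma).
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = phi((k + 0.5 - mu) / sigma)
print(exact, approx)  # the two values are close
```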


Further Reading: 3Blue1Brown Videos

For a deeper understanding of the Central Limit Theorem and related topics, consider
watching these insightful videos by @3Blue1Brown:
• “But what is the Central Limit Theorem?”
• “A pretty reason why Gaussian + Gaussian = Gaussian”


Normal

Having seen the universality of the standard normal distribution, ensured by the central
limit theorem, we can now focus more on “The Bell Curve” itself.
Normal Distribution
A random variable is said to be normally distributed with parameters µ and σ², and we write X ∼ N(µ, σ²), if the PDF is

f(x) = (1 / (√(2π) σ)) e^(−(x − µ)² / (2σ²)), −∞ < x < ∞.

E[X] = µ, E[X²] = σ² + µ², Var(X) = σ².


Density of Normal Distribution

• The normal distribution is symmetric around its mean µ.
• The density function is unimodal, with the peak at µ.
• The spread of the distribution is controlled by the standard deviation σ; most density (99.7%) lies within µ ± 3σ.


Scalability of the Normal Distribution


Scaling Property
If X ∼ N (µ, σ 2 ) and Y = aX + b, then

Y ∼ N (aµ + b, a2 σ 2 ).

Now, if we set a = 1/σ and b = −µ/σ, we obtain

Y = (X − µ)/σ ∼ N(0, 1).

This transformation is known as the standardization of a normal distribution.

We can always write X = σZ + µ, where Z ∼ N(0, 1).


We call Z the standard normal distribution.

Standard Normal Distribution

The standard normal distribution, denoted by Z ∼ N (0, 1), satisfies

Probability Density Function (PDF) of standard normal

f(x) = (1/√(2π)) e^(−x²/2), x ∈ (−∞, ∞).

Cumulative Distribution Function (CDF) of standard normal

Φ(x) = (1/√(2π)) ∫_{−∞}^x e^(−y²/2) dy, x ∈ (−∞, ∞).

Note: Although Φ(x) has no closed-form expression, numerical values are widely
available.

Properties of the Standard Normal Distribution

Complement Rule
P(Z > x) = 1 − P(Z ≤ x) = 1 − Φ(x).

Symmetry Property
P(Z < −x) = Φ(−x).
Since P(Z > x) = P(Z < −x), it follows that:

Φ(−x) = 1 − Φ(x).


Example: Evaluating Probabilities for a Normal Distribution

For any X ∼ N(µ, σ²), the standardization

Z = (X − µ)/σ ∼ N(0, 1)

implies

P(X < b) = P(Z < (b − µ)/σ) = Φ((b − µ)/σ)

and

P(a < X < b) = Φ((b − µ)/σ) − Φ((a − µ)/σ).


Evaluating Probabilities for a Normal Distribution


Example: Suppose X ∼ N(3, 16).
(a) Find P{X < 11}:

P(X < 11) = Φ((11 − 3)/4) = Φ(2).

Using standard normal tables, Φ(2) ≈ 0.9772.
(b) Find P{X > −1}:

P(X > −1) = 1 − P(X ≤ −1) = 1 − Φ((−1 − 3)/4) = 1 − Φ(−1).

Since Φ(−1) = 1 − Φ(1) and Φ(1) ≈ 0.8413, it follows that

P(X > −1) = Φ(1) ≈ 0.8413.

(c) Show that P{2 < X < 7} ≈ 0.44.
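Parts (a)-(c) can be verified numerically, since Φ(x) = (1 + erf(x/√2))/2 is available through the standard library's `erf`; a minimal check:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

mu, sigma = 3, 4  # X ~ N(3, 16), so sigma = 4

a = phi((11 - mu) / sigma)                         # (a) P(X < 11) = Phi(2)
b = 1 - phi((-1 - mu) / sigma)                     # (b) P(X > -1) = Phi(1)
c = phi((7 - mu) / sigma) - phi((2 - mu) / sigma)  # (c) P(2 < X < 7)
print(a, b, c)  # approximately 0.9772, 0.8413, 0.44
```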



Extended Reading and Exercises

• Sections 3.1-3.5, 3.8, 4.1-4.6, 7.2 of Douglas C. Montgomery and George C. Runger, Applied Statistics and Probability for Engineers, 7th Ed.
• Videos by @3Blue1Brown:
• “But what is the Central Limit Theorem?”
• “A pretty reason why Gaussian + Gaussian = Gaussian”
