MAS 102 - Topic 1
PRINCIPLES OF PROBABILITY
Notation
Events are denoted using capital letters so that p(A) denotes the probability that event A
occurs.
Range of Probability
0 ≤ p(A) ≤ 1
Axioms of Probability
1. p(S) = 1.
2. 0 ≤ p(A) ≤ 1 for any event A.
3. For mutually exclusive events A and B, p(A ∪ B) = p(A) + p(B).

It follows that, for an event A from sample space S, p(A′) = 1 − p(A).
Venn Diagrams
Venn diagrams are used to represent events graphically. We use set notation to identify different
areas on a Venn diagram.
ε or S, the universal set, represents the sample space.
A, B, ...: closed curves represent the events.
The event that A does not occur is denoted A′, and p(A′) = 1 − p(A).
The event that A or B or both occur is denoted A ∪ B.
Mutually exclusive events
If events A and B are mutually exclusive, then p(A ∪ B) = p(A) + p(B). This implies p(A ∩ B) = 0.
Independent events
If events A and B are independent, then p(A ∩ B) = p(A) × p(B). This implies p(B|A) = p(B)
and p(A|B) = p(A).
Total Probability Theorem
Let A1, ..., Ak be mutually exclusive and exhaustive events. Then for any other event B,

    p(B) = p(B|A1)p(A1) + · · · + p(B|Ak)p(Ak) = Σ_{i=1}^{k} p(B|Ai)p(Ai)

Bayes' theorem is used to work out the probability that a given prior event occurred, given that a subsequent event has occurred:

    p(Aj|B) = p(Aj ∩ B)/p(B) = p(B|Aj)p(Aj) / Σ_{i=1}^{k} p(B|Ai)p(Ai)
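As a numerical illustration, here is a minimal Python sketch of the total probability theorem and Bayes' theorem. The events and all figures are hypothetical, chosen only to show the calculation:

    # Hypothetical setup: machines A1, A2, A3 produce 50%, 30% and 20% of
    # total output, and B is the event that an item is defective, with
    # p(B|A1) = 0.01, p(B|A2) = 0.02, p(B|A3) = 0.03.
    p_A = [0.50, 0.30, 0.20]          # p(Ai): mutually exclusive, exhaustive
    p_B_given_A = [0.01, 0.02, 0.03]  # p(B|Ai)

    # Total probability theorem: p(B) = sum over i of p(B|Ai) p(Ai)
    p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))

    # Bayes' theorem: p(A1|B) = p(B|A1) p(A1) / p(B)
    p_A1_given_B = p_B_given_A[0] * p_A[0] / p_B
    print(p_B, p_A1_given_B)  # 0.017 and about 0.294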
A random variable is a way of mapping outcomes of random processes to numbers (quantifying outcomes), e.g. let X be the outcome when you toss a coin,

    X =
        1, if heads
        0, if tails
or let Y be the sum of the uppermost faces when two dice are rolled.
When you quantify outcomes you can do more mathematics on them and express statements about them in mathematical notation, e.g. the probability that the sum of the uppermost faces is less than or equal to 12 is denoted p(Y ≤ 12).
Capital letters, X, are used to denote random variables, while small letters, x, denote a particular value (or realized value) of the random variable.
Illustration
Consider a study where the objective is to estimate the average height of some seedlings. Height is the random variable, while 2.2 cm is a realized value of the random variable.
Random variables may be discrete or continuous.
A probability function of a random variable describes how total probability is distributed over
the various values that the random variable takes.
The probability function of a discrete random variable is termed its probability mass function (pmf), while that of a continuous random variable is termed its probability density function (pdf).
    x                  0     1     2     3
    p(x) = p(X = x)   1/8   3/8   3/8   1/8
OR

    p(x) = p(X = x) =
        1/8,    x = 0, 3
        3/8,    x = 1, 2
        0,      otherwise
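The following Python sketch checks this pmf numerically against the defining conditions of a pmf listed below; reading the table as the number of heads in three tosses of a fair coin is an illustrative assumption:

    from fractions import Fraction

    # The tabulated pmf, e.g. X = number of heads in three fair coin tosses.
    pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

    assert all(p >= 0 for p in pmf.values())  # i)  p(x) >= 0
    assert sum(pmf.values()) == 1             # ii) the probabilities sum to 1

    # Probabilities of events follow by summing the pmf, e.g. p(X <= 1):
    print(sum(p for x, p in pmf.items() if x <= 1))  # 1/2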
A probability mass function must satisfy:
i) p(x) ≥ 0.
ii) Σ_{∀x} p(x) = 1.

A probability density function must satisfy:
i) f(x) ≥ 0.
ii) ∫_{−∞}^{∞} f(x) dx = 1.
iii) p(a ≤ X ≤ b) = p(a < X < b) = ∫_{a}^{b} f(x) dx.
The cumulative distribution function (cdf) of a random variable X, also termed the distribution function (df), is denoted F(x) = p(X ≤ x).

    F(x) =
        Σ_{t ≤ x} p(t),        X discrete
        ∫_{−∞}^{x} f(t) dt,    X continuous
For continuous X,

    f(x) = d/dx F(x) = F′(x)
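A minimal numerical sketch of this pdf-cdf relationship, assuming SciPy is available and using an exponential pdf with rate 2 purely as an example:

    import numpy as np
    from scipy.integrate import quad

    # Example pair: F(x) = 1 - exp(-2x) on x >= 0, with f(x) = F'(x) = 2 exp(-2x).
    f = lambda x: 2 * np.exp(-2 * x)
    F = lambda x: 1 - np.exp(-2 * x)

    # F(x) recovered by integrating f from 0 (the lower end of the support) to x
    x = 1.3
    area, _ = quad(f, 0, x)
    print(np.isclose(area, F(x)))  # True

    # f(x) recovered as the numerical derivative of F
    h = 1e-6
    print(np.isclose((F(x + h) - F(x - h)) / (2 * h), f(x)))  # True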
Mode, median and quartiles of a continuous random variable
1) The median, m, is the value for which F(m) = 0.5.
2) The lower quartile, q1, is the value for which F(q1) = 0.25.
3) The upper quartile, q3, is the value for which F(q3) = 0.75.
4) The mode is the value of the random variable where it is most dense, i.e. where the pdf reaches its highest point (at a maximum turning point):

    df(x)/dx = 0;   d²f(x)/dx² < 0
MOMENTS
Let g(X) denote any function of X; then the expected value of g(X), denoted E[g(X)], is

    E[g(X)] =
        Σ_{∀x} g(x)p(x),           for X discrete
        ∫_{−∞}^{∞} g(x)f(x) dx,    for X continuous
Special Case
If g(X) = X, then E[g(X)] = E[X]. This special type of expectation is called the mean and is denoted by µ = E(X).
Properties of Expectations
1) E[kX] = kE[X], where k is a constant.
If g(X) = (X − µ)², where µ = E(X) is the mean of the random variable X, then the expected value of (X − µ)², denoted E[(X − µ)²], is

    E[(X − µ)²] =
        Σ_{∀x} (x − µ)² p(x),           for X discrete
        ∫_{−∞}^{∞} (x − µ)² f(x) dx,    for X continuous
This special type of expectation is called the variance and is denoted by σ² = Var(X).
Further, Var(X) = E[(X − µ)²] = E[X²] − (E[X])².
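A small sketch verifying the shortcut Var(X) = E[X²] − (E[X])² against the definition, using the 1/8, 3/8, 3/8, 1/8 pmf tabulated earlier:

    from fractions import Fraction

    pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

    mu = sum(x * p for x, p in pmf.items())       # E[X] = 3/2
    ex2 = sum(x**2 * p for x, p in pmf.items())   # E[X^2] = 3
    var_shortcut = ex2 - mu**2                    # 3 - 9/4 = 3/4
    var_definition = sum((x - mu)**2 * p for x, p in pmf.items())
    print(var_shortcut == var_definition)         # True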
Properties of Variances
1) Var[kX] = k²Var[X], where k is a constant.
Consider the random variable X and let g(X) = X^r, r > 0; then the expected value of g(X), denoted E[g(X)] = E[X^r], is

    E[X^r] =
        Σ_{∀x} x^r p(x),           for X discrete
        ∫_{−∞}^{∞} x^r f(x) dx,    for X continuous
E[X^r] is termed the rth moment of the random variable X about the origin.
The mean µ = E[X] is the 1st moment of the random variable X about the origin.
The variance σ² = Var(X) = E[(X − µ)²] = E[X²] − (E[X])² is hence the [2nd moment about the origin] − [1st moment about the origin]².
Consider the random variable X with mean µ = E[X] and let g(X) = (X − µ)^r, r > 0; then the expected value of g(X), denoted E[g(X)] = E[(X − µ)^r], is

    E[(X − µ)^r] =
        Σ_{∀x} (x − µ)^r p(x),           for X discrete
        ∫_{−∞}^{∞} (x − µ)^r f(x) dx,    for X continuous
E[(X − µ)^r] is termed the rth moment of the random variable X about the mean, µ.
When r = 2, E[(X − µ)²] = Var(X). Hence the variance is the 2nd moment about the mean.
When r = 1, E[(X − µ)] = E[X] − µ = µ − µ = 0. This implies the 1st moment of a random variable X about its mean, µ, is 0.
The moment generating function of a random variable X is used to generate its moments about the origin. It is denoted by:

    M_X(t) = E[e^{tX}] =
        Σ_{∀x} e^{tx} p(x),           for X discrete
        ∫_{−∞}^{∞} e^{tx} f(x) dx,    for X continuous

where t is a constant.
Mean and Variance using Moment Generating Functions
E[X] = M′_X(0), i.e. differentiate the moment generating function once with respect to t and set t = 0.
Var(X) = M″_X(0) − [M′_X(0)]².
2) M_{X1+X2+···+Xn}(t) = M_{X1}(t) M_{X2}(t) · · · M_{Xn}(t), where the Xi's are independent random variables.
3) M_U(t) = e^{−at/h} M_X(t/h), where U = (X − a)/h and a and h are constants.
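A sketch of generating moments by differentiating an mgf, assuming the SymPy library; it uses the Poisson mgf e^{λ(e^t − 1)} quoted later in these notes:

    import sympy as sp

    t, lam = sp.symbols('t lambda', positive=True)
    M = sp.exp(lam * (sp.exp(t) - 1))   # Poisson mgf

    M1 = sp.diff(M, t)      # M'_X(t)
    M2 = sp.diff(M, t, 2)   # M''_X(t)

    mean = M1.subs(t, 0)                         # E[X] = lambda
    var = sp.simplify(M2.subs(t, 0) - mean**2)   # Var(X) = lambda
    print(mean, var)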
Uniform Distribution
Models a discrete random variable X whose n possible values are all equally likely, hence

    p(x) = p(X = x) =
        1/n,    for each x
        0,      otherwise

It has properties E[X] = (n + 1)/2 and Var[X] = (n² − 1)/12 (for X taking the values 1, 2, ..., n).
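A quick check of these formulas for a fair die, i.e. taking X to have the values 1, 2, ..., 6:

    from fractions import Fraction

    n = 6
    values = range(1, n + 1)
    mean = sum(Fraction(x, n) for x in values)                 # 7/2
    var = sum(Fraction(x * x, n) for x in values) - mean**2    # 35/12
    print(mean == Fraction(n + 1, 2), var == Fraction(n * n - 1, 12))  # True True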
Bernoulli Distribution
1) A single trial of an experiment is performed.
2) The trial has two possible outcomes, termed a success and a failure.

    X =
        1, if success
        0, if failure

The random variable X denotes the success and its probability mass function is given by

    p(x) = p(X = x) =
        p^x (1 − p)^{1−x},    x = 0, 1
        0,                    otherwise

denoted X ∼ B(p).
It has properties E[X] = p, Var[X] = pq and M_X(t) = (q + pe^t), where q = 1 − p.
Binomial Distribution
1) There is a fixed number, n, of repeated independent trials.
2) Each trial has two possible outcomes, technically termed a 'success' and a 'failure' (Bernoulli trials).

The random variable X which denotes the number of successes in n trials has a binomial distribution. Its probability mass function is given by

    p(x) = p(X = x) =
        C(n, x) p^x (1 − p)^{n−x},    x = 0, 1, 2, ..., n
        0,                            otherwise

where C(n, x) = n!/(x!(n − x)!) is the binomial coefficient; this is denoted X ∼ B(n, p).
It has properties E[X] = np, Var[X] = npq and M_X(t) = (q + pe^t)^n, where q = 1 − p.
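A minimal sketch using SciPy's binomial distribution; n = 10 and p = 0.3 are arbitrary illustrative values:

    from scipy.stats import binom

    n, p = 10, 0.3
    X = binom(n, p)

    print(X.pmf(3))           # p(X = 3) = C(10, 3) 0.3^3 0.7^7
    print(X.cdf(3))           # p(X <= 3)
    print(X.mean(), X.var())  # np = 3.0 and np(1 - p) = 2.1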
Poisson Distribution
The Poisson random variable X represents the number of events that occur in an interval. The interval may be a fixed length in time or space. The events must occur:
i) singly in space or time;
ii) independently of one another;
iii) at a constant rate, in the sense that the mean number of occurrences in an interval is proportional to the length of the interval.

Its probability mass function is given by

    p(x) = p(X = x) =
        e^{−λ} λ^x / x!,    x = 0, 1, 2, ...
        0,                  otherwise

denoted X ∼ Po(λ), where λ is the mean number of occurrences in the interval.
It has properties E[X] = λ, Var[X] = λ and M_X(t) = e^{λ(e^t − 1)}.
The Poisson distribution can be used as a limiting form of the binomial distribution, i.e. when n, the number of trials, is large and p, the probability of success, is small (a rare event). If X ∼ B(n, p) with n large and p small, then we can approximate it by X ∼ Po(λ) where λ = np.
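A sketch of this limiting behaviour, assuming SciPy; n = 1000 and p = 0.002 are arbitrary illustrative values:

    from scipy.stats import binom, poisson

    n, p = 1000, 0.002
    lam = n * p   # lambda = 2

    for x in range(5):
        print(x, binom.pmf(x, n, p), poisson.pmf(x, lam))
    # The two columns agree to about three decimal places.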
Geometric Distribution
The geometric distribution models discrete waiting time. The random variable X denotes the number of trials required before the first success. Its probability function is given by

    p(x) = p(X = x) =
        q^x p,    x = 0, 1, 2, ...
        0,        otherwise

where q = 1 − p. It has properties E[X] = q/p and Var[X] = q/p².
Hypergeometric Distribution
The hypergeometric distribution is a discrete distribution that models the number of events in
a fixed sample size when you know the total number of items in the population that the sample
is from. Each item in the sample has two possible outcomes (either an event or a nonevent).
The hypergeometric distribution is used under the following conditions:
1) The total number of items (the population), M, is fixed, with k items of a certain type.
2) The sample size (number of trials), n, is a portion of the population; the n items are drawn without replacement.
We note that the chosen group contains x successes and (n − x) failures. In how many ways can you pick x successes from the k items of that type? In C(k, x) ways.
The random variable X is said to have a hypergeometric distribution and its probability mass function is given by

    p(x) = p(X = x) =
        C(k, x) C(M − k, n − x) / C(M, n)
        0,                                   otherwise

It has properties E[X] = kn/M and Var[X] = (kn/M) × ((M − k)(M − n))/(M(M − 1)).
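A sketch using SciPy's hypergeometric distribution; note that SciPy's argument order (M, n, N) = (population size, successes in population, sample size) corresponds to this section's (M, k, n). The figures are illustrative:

    from scipy.stats import hypergeom

    M, k, n = 50, 5, 10
    X = hypergeom(M, k, n)

    print(X.pmf(1))             # p(X = 1)
    print(X.mean(), k * n / M)  # both kn/M = 1.0
    print(X.var())              # (kn/M) (M-k)(M-n) / (M(M-1))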
Negative Binomial Distribution
In a sequence of independent Bernoulli(p) trials, let the random variable X denote the trial at which the rth success occurs, where r is a fixed integer. Then

    p(X = x | r, p) =
        C(x − 1, r − 1) p^r (1 − p)^{x−r},    x = r, r + 1, ...
        0,                                    otherwise

It has properties E[Y] = r(1 − p)/p and Var[Y] = r(1 − p)/p², where Y = X − r denotes the number of failures before the rth success.
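A sketch using SciPy's negative binomial, which is parameterized in terms of Y = X − r, the number of failures before the rth success; r = 3 and p = 0.4 are illustrative:

    from scipy.stats import nbinom

    r, p = 3, 0.4
    Y = nbinom(r, p)

    print(Y.mean(), r * (1 - p) / p)     # both r(1-p)/p = 4.5
    print(Y.var(), r * (1 - p) / p**2)   # both r(1-p)/p^2 = 11.25
    print(Y.pmf(0), p**3)                # p(Y = 0) = p(X = r) = p^r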
Rectangular Distribution
Models a random variable where the probability is constant (the same) over a given interval. Its probability density function is given by

    f(x) =
        k,    a ≤ x ≤ b
        0,    elsewhere

Since the total area under f(x) must equal 1, k(b − a) = 1, which implies k = 1/(b − a).

It has properties F(x) = (x − a)/(b − a), E[X] = (a + b)/2 and Var[X] = (b − a)²/12.
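A quick SciPy check of these properties; SciPy parameterizes the continuous uniform as uniform(loc = a, scale = b − a), and the endpoints below are illustrative:

    from scipy.stats import uniform

    a, b = 2.0, 8.0
    X = uniform(loc=a, scale=b - a)

    print(X.cdf(5.0), (5.0 - a) / (b - a))   # both F(5) = 0.5
    print(X.mean(), (a + b) / 2)             # both 5.0
    print(X.var(), (b - a) ** 2 / 12)        # both 3.0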
Exponential Distribution
1. Used to model the behaviour of probabilities that reflect a large number of small values and a small number of large values.
2. It is often concerned with the amount of time until some specific event occurs. It models the length of time between Poisson happenings.

Its probability density function is given by

    f(x) =
        λe^{−λx},    x ≥ 0
        0,           elsewhere

denoted X ∼ Exp(λ). λ is the decay parameter, i.e. it controls the rate of decay or decline, and λ = 1/µ, where µ is the mean.

It has properties F(x) = 1 − e^{−λx}, E[X] = 1/λ, Var[X] = 1/λ² and M_X(t) = λ/(λ − t) (for t < λ).
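A sketch using SciPy's exponential distribution, which is parameterized by scale = 1/λ rather than by the rate λ; λ = 0.5 is illustrative:

    from scipy.stats import expon

    lam = 0.5
    X = expon(scale=1 / lam)

    print(X.cdf(3))             # F(3) = 1 - exp(-1.5), about 0.777
    print(X.mean(), 1 / lam)    # both 1/lambda = 2.0
    print(X.var(), 1 / lam**2)  # both 1/lambda^2 = 4.0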
Normal Distribution
Used to model continuous random variables which have a symmetric distribution. Its probability density function is given by:

    f(x) =
        (1/√(2πσ²)) e^{−½((x−µ)/σ)²},    −∞ < x < ∞, −∞ < µ < ∞, σ² > 0
        0,                               elsewhere
denoted X ∼ N(µ, σ²).
It has properties

    F(x) = p(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) e^{−½((t−µ)/σ)²} dt

    E(X) = µ
    Var(X) = σ²
    M_X(t) = e^{µt + ½σ²t²}
Any normal variable X ∼ N(µ, σ²) can be transformed to a standard normal variable Z ∼ N(0, 1) by the formula

    Z = (X − µ)/σ

We say we are standardizing the normal random variable.
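A minimal sketch of standardizing in practice, assuming SciPy; X ∼ N(100, 25) is an illustrative choice:

    from scipy.stats import norm

    # p(X <= 108) equals Phi((108 - 100)/5), the standard normal cdf at z = 1.6.
    mu, sigma = 100, 5
    z = (108 - mu) / sigma

    print(norm.cdf(108, loc=mu, scale=sigma))  # direct: about 0.9452
    print(norm.cdf(z))                         # via standardizing: same value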
The probability function of the standard normal variable is given by:

    f(z) =
        (1/√(2π)) e^{−z²/2},    −∞ < z < ∞
        0,                      elsewhere

It has properties

    E(Z) = 0
    Var(Z) = 1
    M_Z(t) = e^{t²/2}
The areas under the standard normal curve are given in the standard normal tables.
The normal distribution provides a simple and accurate approximation to the binomial and Poisson distributions. The normal distribution is a continuous distribution, hence p(X = x) = 0, while the binomial and Poisson distributions are discrete distributions. We therefore use a continuity correction, replacing, for example:
1) p(X ≤ n) by p(Y < n + 0.5);
2) p(X ≥ n) by p(Y > n − 0.5);
3) p(X = n) by p(n − 0.5 < Y < n + 0.5);
where Y is the approximating normal variable.
For a binomial distribution, there are two possible approximations, depending upon whether p
lies close to 0.5 (in which case a normal approximation is used) or p is small (in which case a
Poisson distribution is used).
If X ∼ B(n, p) and n is large and p is close to 0.5 then X can be approximated by Y ∼
N(np, np(1 − p)).
If you are approximating a binomial distribution by a normal distribution, you should go directly to the normal distribution, not via a Poisson distribution, as this involves one approximation rather than two and should therefore be more accurate.
If you are in doubt over which approximation is appropriate, a useful rule of thumb is to calculate the mean np: if it is less than or equal to 10, use the Poisson approximation; if it is more than 10, a normal approximation is usually suitable.
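A sketch putting the approximation and the continuity correction together, assuming SciPy; X ∼ B(100, 0.5) is an illustrative choice:

    from scipy.stats import binom, norm

    # X ~ B(100, 0.5) approximated by Y ~ N(np, np(1-p)) = N(50, 25).
    n, p = 100, 0.5
    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

    exact = binom.cdf(55, n, p)                   # p(X <= 55)
    approx = norm.cdf(55.5, loc=mu, scale=sigma)  # p(Y < 55.5), with correction
    print(exact, approx)                          # both approximately 0.864

Here the mean np = 50 is well above 10, so by the rule of thumb above the normal approximation is the appropriate one.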