MAS 102 - Topic 1

Topic 1: Review of MAS 103

PRINCIPLES OF PROBABILITY

Notation

Events are denoted using capital letters so that p(A) denotes the probability that event A
occurs.

Range of Probability

0 ≤ p(A) ≤ 1

Axioms of Probability

The three basic probability axioms can be summarized as follows:

1. p(S) = 1.

It follows that for any event A from sample space S, p(A′) = 1 − p(A).

2. p(A) ≥ 0 for all A ⊂ S.


Axioms 1 and 2 together tell us that probabilities lie between 0 (impossible) and 1
(certain).

3. p(A ∪ B) = p(A) + p(B) if A ∩ B = ∅.


If two events cannot occur simultaneously, i.e. they are mutually exclusive, the probability of
the event defined by their union is equal to the sum of the probabilities of the two events.
This property is known as additivity.

Venn Diagrams

Venn diagrams are used to represent events graphically. We use set notation to identify different
areas on a Venn diagram.
ε or S, the universal set represents the sample space.
A, B, closed curves represent the events.

The event that A does not occur is denoted A′.

p(A′) = 1 − p(A)

The event that A or B (or both) occur is denoted A ∪ B.

p(A ∪ B) = p(A) + p(B) − p(A ∩ B)  (the Addition Rule)

The event that both A and B occur is denoted A ∩ B.

p(A ∩ B) = p(A) × p(B|A)  (the Multiplication Rule)

Conditional probability: the event that B occurs given that A has occurred is denoted B|A.

p(B|A) = p(A ∩ B) / p(A)

Conditional probability: the event that A occurs given that B has occurred is denoted A|B.

p(A|B) = p(A ∩ B) / p(B)

Conditioning shrinks the sample space to the event that is known to have occurred.
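As a quick check on these rules, the following minimal Python sketch enumerates the 36 equally likely outcomes of rolling two fair dice and verifies the Addition and Multiplication Rules; the events A and B below are hypothetical choices for illustration, not from the notes:

```python
from itertools import product

# All 36 equally likely outcomes when two fair dice are rolled.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return len(event) / len(outcomes)

A = {o for o in outcomes if o[0] == 6}      # first die shows 6
B = {o for o in outcomes if sum(o) >= 10}   # sum is at least 10

# Addition Rule: p(A ∪ B) = p(A) + p(B) − p(A ∩ B)
assert abs(prob(A | B) - (prob(A) + prob(B) - prob(A & B))) < 1e-12

# Multiplication Rule: p(A ∩ B) = p(A) × p(B|A), with p(B|A) = p(A ∩ B)/p(A)
p_B_given_A = prob(A & B) / prob(A)
assert abs(prob(A & B) - prob(A) * p_B_given_A) < 1e-12

print(prob(A), prob(B), prob(A & B), p_B_given_A)  # 1/6, 1/6, 1/12, 1/2
```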

Mutually Exclusive events

If events A and B are mutually exclusive, then p(A ∪ B) = p(A) + p(B). This implies
p(A ∩ B) = 0.

Independent events

If events A and B are independent, then p(A ∩ B) = p(A) × p(B). This implies p(B|A) = p(B)
and p(A|B) = p(A).

The Law of Total Probability

Let A1, ..., Ak be mutually exclusive and exhaustive events. Then for any other event B,

p(B) = p(B|A1)p(A1) + · · · + p(B|Ak)p(Ak) = Σ_{i=1}^{k} p(B|Ai)p(Ai)

The Bayes Theorem

The Bayes theorem is used to work out the probability that a given prior event occurred given
that a subsequent event has occurred. For mutually exclusive and exhaustive events A1, ..., Ak
and any event B with p(B) > 0,

p(Aj|B) = p(B|Aj)p(Aj) / p(B) = p(B|Aj)p(Aj) / Σ_{i=1}^{k} p(B|Ai)p(Ai)
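The following sketch applies the law of total probability and Bayes' theorem to a made-up two-machine quality-control example; all numbers are hypothetical and chosen only for illustration:

```python
# Hypothetical factory: machine A1 makes 60% of items, A2 makes 40%.
# Defect rates: p(D|A1) = 0.02, p(D|A2) = 0.05.
p_A = {"A1": 0.6, "A2": 0.4}
p_D_given_A = {"A1": 0.02, "A2": 0.05}

# Law of total probability: p(D) = Σ p(D|Ai) p(Ai)
p_D = sum(p_D_given_A[a] * p_A[a] for a in p_A)

# Bayes: p(A1|D) = p(D|A1) p(A1) / p(D)
p_A1_given_D = p_D_given_A["A1"] * p_A["A1"] / p_D

print(p_D)           # 0.032
print(p_A1_given_D)  # 0.375
```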

THE CONCEPT OF A RANDOM VARIABLE

A random variable is a way of mapping outcomes of random processes to numbers (quantifying
outcomes), e.g. let X be the outcome when you toss a coin,

X = { 1, if heads
      0, if tails

or let Y be the sum of the uppermost faces when two dice are rolled.
Quantifying outcomes lets you do more mathematics on them and express more statements
about them in mathematical notation, e.g. the probability that the sum of the uppermost faces
is less than or equal to 12 is denoted p(Y ≤ 12).
Capital letters, X, are used to denote random variables, while small letters, x, denote particular
values (or realized values) of the random variable.

Illustration

Consider a study whose objective is to estimate the average height of seedlings. Height is a
random variable, while 2.2 cm is a realized value of that random variable.
Random variables may be

i) Discrete – take particular values (values on a discrete scale)

ii) Continuous – take a given range of values (values in a given range)

Probability functions of random variables

A probability function of a random variable describes how total probability is distributed over
the various values that the random variable takes.
The probability function of a discrete random variable is termed its probability mass function
(pmf), while that of a continuous random variable is termed its probability density function (pdf).

The discrete case: probability mass function

For example, let X be the number of heads in three tosses of a fair coin. Its pmf may be given
as a table:

x                  0     1     2     3
p(x) = p(X = x)   1/8   3/8   3/8   1/8

OR as a formula:

p(x) = p(X = x) = { 1/8, x = 0, 3
                    3/8, x = 1, 2
                    0,   otherwise

The pmf satisfies two conditions, namely:

i) p(x) ≥ 0.

ii) Σ_{∀x} p(x) = 1.
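A minimal sketch that rebuilds this pmf by enumerating the eight equally likely outcomes of three fair coin tosses and checks both conditions:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three fair coin tosses;
# X counts the number of heads ('H') in each outcome.
outcomes = list(product("HT", repeat=3))
pmf = {}
for o in outcomes:
    x = o.count("H")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

print(pmf)                                # values 1/8, 3/8, 3/8, 1/8
assert all(p >= 0 for p in pmf.values())  # condition i)
assert sum(pmf.values()) == 1             # condition ii)
```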

The continuous case: probability density function




f(x) = { 3x², 0 ≤ x ≤ 1
         0,   otherwise

The pdf satisfies the following conditions:

i) f(x) ≥ 0.

ii) ∫_{−∞}^{∞} f(x) dx = 1.

iii) p(a ≤ X ≤ b) = p(a < X < b) = ∫_{a}^{b} f(x) dx.

For a continuous random variable, p(X = x) = 0.
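A short sympy sketch checking conditions ii) and iii) for this example density:

```python
import sympy as sp

x = sp.symbols('x')
f = 3 * x**2   # pdf on [0, 1], zero elsewhere

total = sp.integrate(f, (x, 0, 1))                  # condition ii): should be 1
prob = sp.integrate(f, (x, sp.Rational(1, 2), 1))   # p(1/2 <= X <= 1)
print(total)  # 1
print(prob)   # 7/8
```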

Cumulative distribution function

The cumulative distribution function (cdf) of a random variable X, also termed the distribution
function (df), is denoted F(x) = p(X ≤ x).

F(x) = { Σ_{t ≤ x} p(t),        X discrete
         ∫_{−∞}^{x} f(t) dt,    X continuous

Properties of cumulative distribution functions

1) p(a < X ≤ b) = p(X ≤ b) − p(X ≤ a) = F(b) − F(a).

2) p(a ≤ X ≤ b) = p(a < X ≤ b) + p(X = a) = F(b) − F(a) + p(a).

3) p(a < X < b) = p(a < X ≤ b) − p(X = b) = F(b) − F(a) − p(b).

For continuous X,

f(x) = d/dx F(x) = F′(x)
Mode, median and quartiles of a continuous random variable

Given X is a continuous random variable with cdf F(x) then

1) Median, m, of X is given by F(m) = 0.5.

2) Lower quartile, Q1 , is given by F(Q1 ) = 0.25.

3) Upper quartile, Q3 , is given by F(Q3 ) = 0.75.

4) The mode is the value of the random variable where it is most dense, i.e. where the pdf
reaches its highest point (at a maximum turning point):

df(x)/dx = 0 and d²f(x)/dx² < 0.
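For the example pdf f(x) = 3x² on [0, 1], the cdf is F(x) = x³, so the median and quartiles solve x³ = p. A small bisection sketch (the tolerance is an arbitrary choice):

```python
def F(x):
    return x ** 3   # cdf of f(x) = 3x^2 on [0, 1]

def quantile(p, lo=0.0, hi=1.0, tol=1e-12):
    """Solve F(q) = p by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for name, p in [("Q1", 0.25), ("median", 0.5), ("Q3", 0.75)]:
    print(name, round(quantile(p), 6), "exact:", round(p ** (1 / 3), 6))
```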
MOMENTS

Let g(X) denote any function of X; then the expected value of g(X), denoted E[g(X)], is

E[g(X)] = { Σ_{∀x} g(x)p(x),           for X discrete
            ∫_{−∞}^{∞} g(x)f(x) dx,    for X continuous

Special Case

If g(X) = X, then the expected value of X, denoted E[X], is

E[X] = { Σ_{∀x} x p(x),           for X discrete
         ∫_{−∞}^{∞} x f(x) dx,    for X continuous

This special type of expectation is called the mean and is denoted by µ = E(X).

Properties of Expectations

1) E[kX] = kE[X],

2) E[X + k] = E[X] + k where k is a real number.

If g(X) = (X − µ)², where µ = E(X) is the mean of the random variable X, then the expected
value of (X − µ)², denoted E[(X − µ)²], is

E[(X − µ)²] = { Σ_{∀x} (x − µ)² p(x),           for X discrete
                ∫_{−∞}^{∞} (x − µ)² f(x) dx,    for X continuous

This special type of expectation is called the variance and is denoted by σ² = Var(X).
Further, Var(X) = E[(X − µ)²] = E[X²] − (E[X])².
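A sketch computing the mean and variance of the earlier three-coin-toss pmf directly from these definitions, confirming Var(X) = E[X²] − (E[X])²:

```python
from fractions import Fraction as F

pmf = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

mean = sum(x * p for x, p in pmf.items())                  # E[X]
var_def = sum((x - mean)**2 * p for x, p in pmf.items())   # E[(X − µ)²]
ex2 = sum(x**2 * p for x, p in pmf.items())                # E[X²]

print(mean)                     # 3/2
print(var_def, ex2 - mean**2)   # both 3/4
```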

Properties of Variances

1) Var[kX] = k²Var[X],

2) Var[X + k] = Var[X] where k is a real number.

Moments about the origin

Consider the random variable X and let g(X) = X^r, r ≥ 0; then the expected value of g(X),
denoted E[g(X)] = E[X^r], is

E[X^r] = { Σ_{∀x} x^r p(x),           for X discrete
           ∫_{−∞}^{∞} x^r f(x) dx,    for X continuous

E[X^r] is termed the rth moment of the random variable X about the origin.
The mean µ = E[X] is the 1st moment of the random variable X about the origin.
The variance of the random variable X is σ² = Var(X) = E[(X − µ)²] = E[X²] − (E[X])²,
hence the 2nd moment about the origin minus the square of the 1st moment about the origin.

Moments about the mean

Consider the random variable X with mean µ = E[X] and let g(X) = (X − µ)^r, r ≥ 0; then the
expected value of g(X), denoted E[g(X)] = E[(X − µ)^r], is

E[(X − µ)^r] = { Σ_{∀x} (x − µ)^r p(x),           for X discrete
                 ∫_{−∞}^{∞} (x − µ)^r f(x) dx,    for X continuous

E[(X − µ)^r] is termed the rth moment of the random variable X about the mean, µ.
When r = 2, E[(X − µ)²] = Var(X). Hence the variance is the 2nd moment about the mean.
When r = 1, E[(X − µ)] = E[X] − µ = µ − µ = 0. This implies the 1st moment of a random
variable X about its mean µ is 0.

Moment Generating Functions

The moment generating function of a random variable X is used to generate its moments about
the origin. It is denoted by:

M_X(t) = E[e^{tX}] = { Σ_{∀x} e^{tx} p(x),           for X discrete
                       ∫_{−∞}^{∞} e^{tx} f(x) dx,    for X continuous

where t is a constant.

Mean and Variance using Moment Generating Functions
E[X] = M′_X(0), i.e. differentiate the moment generating function once with respect to t and set
t = 0.

Var(X) = M″_X(0) − [M′_X(0)]².
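A sympy sketch applying these results to the Poisson mgf M_X(t) = e^{λ(e^t − 1)} stated later in these notes, recovering E[X] = λ and Var(X) = λ:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))       # mgf of X ~ Po(lambda)

mean = sp.diff(M, t).subs(t, 0)         # E[X] = M'(0)
second = sp.diff(M, t, 2).subs(t, 0)    # E[X^2] = M''(0)
var = sp.simplify(second - mean**2)     # Var(X) = M''(0) - [M'(0)]^2

print(mean, var)                        # lambda lambda
```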

Theorems of Moment Generating Functions

Given a random variable X,

1) M_{cX}(t) = M_X(ct), where c is a constant.

2) M_{X1+X2+···+Xn}(t) = M_{X1}(t)M_{X2}(t)...M_{Xn}(t), where the Xi's are independent
random variables.

3) M_U(t) = e^{−at/h} M_X(t/h), where U = (X − a)/h and a and h are constants.

4) Moment generating functions are unique to given distributions.

STANDARD DISCRETE DISTRIBUTIONS

Uniform Distribution

Conditions for a discrete uniform random variable X are:

i) X is defined over a set of n distinct values.

ii) Each value is equally likely.

Hence

p(x) = p(X = x) = { 1/n, for each of the n values x
                    0,   otherwise

For the values x = 1, 2, ..., n it has properties E[X] = (n + 1)/2 and Var[X] = (n² − 1)/12.

Bernoulli Distribution

Conditions for a Bernoulli distribution include:

1) A single trial, termed a Bernoulli trial.

2) The trial has two possible outcomes, termed a success and a failure:

X = { 1, if success
      0, if failure

3) p = p(success) and q = p(failure), where q = 1 − p.

The random variable X indicates a success, and its probability mass function is given by

p(x) = p(X = x) = { p^x (1 − p)^{1−x}, x = 0, 1
                    0,                 otherwise

denoted X ∼ B(p).
It has properties E[X] = p, Var[X] = pq and M_X(t) = q + pe^t.

Binomial Distribution

Conditions required for a Binomial distribution

1) A fixed number, n, of independent trials.

2) Each trial has two possible outcomes, technically termed a 'success' and a 'failure'
(Bernoulli trials).

3) The probability of success, p, in each trial is constant.

The random variable X, which denotes the number of successes in n trials, has a binomial
distribution. Its probability mass function is given by

p(x) = p(X = x) = { C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, 2, ..., n
                    0,                         otherwise

where C(n, x) = n!/(x!(n − x)!) is the binomial coefficient. This is denoted X ∼ B(n, p).

It has properties E[X] = np, Var[X] = npq and M_X(t) = (q + pe^t)^n, where p = p(success),
q = p(failure) and q = 1 − p.
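A sketch evaluating this pmf for the hypothetical values n = 10, p = 0.3 and confirming E[X] = np and Var[X] = npq:

```python
from math import comb

# Hypothetical parameters for illustration.
n, p = 10, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
var = sum(x * x * px for x, px in enumerate(pmf)) - mean**2

print(round(sum(pmf), 10))               # 1.0 (pmf sums to one)
print(round(mean, 10), n * p)            # both 3.0
print(round(var, 10), n * p * (1 - p))   # both 2.1
```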

Poisson Distribution

The Poisson random variable X represents the number of events that occur in an interval. The
interval may be a fixed length in time or space. The events must occur:

i) singly in space or time;

ii) independently of each other;

iii) at a constant rate in the sense that the mean number of occurrences in an interval is
proportional to the length of the interval.

Such events are said to occur randomly.


The probability mass function of X is given by

p(x) = p(X = x) = { λ^x e^{−λ} / x!, x = 0, 1, 2, ...
                    0,               otherwise

denoted X ∼ Po(λ).
It has properties E[X] = λ, Var[X] = λ and M_X(t) = e^{λ(e^t − 1)}.

A limiting form of the Binomial Distribution

The Poisson distribution can be used as a limiting form of the binomial distribution, i.e. when
n, the number of trials, is large and p, the probability of success, is small (a rare event).
If X ∼ B(n, p) with n large and p small, then we can approximate it by X ∼ Po(λ), where
λ = np.
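A small numerical comparison for the hypothetical rare-event values n = 1000 and p = 0.003, so that λ = np = 3:

```python
from math import comb, exp, factorial

# Hypothetical rare-event setting: n large, p small.
n, p = 1000, 0.003
lam = n * p   # λ = 3.0

for x in range(6):
    binom = comb(n, x) * p**x * (1 - p)**(n - x)    # exact B(n, p) pmf
    poisson = lam**x * exp(-lam) / factorial(x)     # Po(λ) pmf
    print(x, round(binom, 5), round(poisson, 5))    # the columns agree closely
```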

Geometric Distribution

The geometric distribution models discrete waiting time. The random variable X denotes the
number of failures before the first success. Its probability function is given by

p(x) = p(X = x) = { q^x p, x = 0, 1, 2, ...
                    0,     otherwise

where p = p(success) and q = p(failure), and q = 1 − p.

It has properties E[X] = q/p, Var[X] = q/p² and M_X(t) = p/(1 − qe^t).
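A simulation sketch with a hypothetical p = 0.25, checking E[X] = q/p = 3 and Var[X] = q/p² = 12:

```python
import random

random.seed(1)
p = 0.25

def geom_sample():
    """Count failures before the first success in Bernoulli(p) trials."""
    x = 0
    while random.random() >= p:   # this trial is a failure
        x += 1
    return x

samples = [geom_sample() for _ in range(100_000)]
m = sum(samples) / len(samples)
v = sum((s - m)**2 for s in samples) / len(samples)

print(round(m, 2), (1 - p) / p)        # ≈ 3.0
print(round(v, 2), (1 - p) / p**2)     # ≈ 12.0
```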

Hypergeometric Distribution

The hypergeometric distribution is a discrete distribution that models the number of events in
a fixed sample size when you know the total number of items in the population that the sample
is from. Each item in the sample has two possible outcomes (either an event or a nonevent).
The hypergeometric distribution is used under the following conditions:

1) Total number of items (population), M, is fixed; with k of a certain type.

2) Sample size (number of trials), n, is a portion of the population; n items are drawn without
replacement.

3) Probability of success changes after each trial.

4) The random variable X denotes the number of successes.

We note that the chosen group contains x successes and (n − x) failures. In how many ways
can you pick x successes from the k items of that type?
The random variable X is said to have a hypergeometric distribution, and its probability mass
function is given by

p(x) = p(X = x) = { C(k, x) C(M − k, n − x) / C(M, n), max(0, n − (M − k)) ≤ x ≤ min(n, k)
                    0,                                 otherwise

It has properties E[X] = kn/M and Var[X] = (kn/M) · (M − k)(M − n) / (M(M − 1)).
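A sketch for a hypothetical population of M = 50 items containing k = 5 of the special type, with n = 10 drawn:

```python
from math import comb

# Hypothetical population, in the notation of these notes.
M, k, n = 50, 5, 10

def pmf(x):
    return comb(k, x) * comb(M - k, n - x) / comb(M, n)

support = range(max(0, n - (M - k)), min(n, k) + 1)
mean = sum(x * pmf(x) for x in support)

print(round(sum(pmf(x) for x in support), 10))   # 1.0
print(round(mean, 10), k * n / M)                # both 1.0
```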

Negative Binomial Distribution

In a sequence of independent Bernoulli(p) trials, let the random variable X denote the trial at
which the rth success occurs, where r is a fixed integer. Then


p(X = x | r, p) = { C(x − 1, r − 1) p^r (1 − p)^{x−r}, x = r, r + 1, ...
                    0,                                 otherwise

and we say that X has a negative binomial(r, p) distribution.


The negative binomial distribution is sometimes defined in terms of the random variable Y =
number of failures before rth success. This formulation is statistically equivalent to the one
given above in terms of X = trial at which the rth success occurs, since Y = X − r. The
alternative form of the negative binomial distribution is


p(Y = y) = { C(r + y − 1, y) p^r (1 − p)^y, y = 0, 1, ...
             0,                             otherwise

It has properties E[Y] = r(1 − p)/p and Var[Y] = r(1 − p)/p².

STANDARD CONTINUOUS DISTRIBUTIONS

Rectangular Distribution

Models a random variable whose probability density is constant (the same) over a given interval.
Its probability density function is given by

f(x) = { k, a ≤ x ≤ b
         0, elsewhere

This implies k = 1/(b − a).

It has properties F(x) = (x − a)/(b − a) for a ≤ x ≤ b, E[X] = (a + b)/2 and
Var[X] = (b − a)²/12.

Exponential Distribution

1. Used to model quantities for which there are a large number of small values and a small
number of large values.

2. It is often concerned with the amount of time until some specific event occurs. It models
the length of time between events in a Poisson process.

Its probability density function is given by:

f(x) = { λe^{−λx}, x ≥ 0
         0,        elsewhere

denoted X ∼ Exp(λ). λ is the decay parameter, i.e. it controls the rate of decay or decline, and
λ = 1/µ, where µ is the mean.

It has properties F(x) = 1 − e^{−λx}, E[X] = 1/λ, Var[X] = 1/λ² and M_X(t) = λ/(λ − t)
for t < λ.
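A simulation sketch for a hypothetical λ = 2, using the standard inverse-cdf method to sample from Exp(λ) and checking the mean and the cdf at one point:

```python
import math
import random

random.seed(0)
lam = 2.0

# Inverse-cdf sampling: if U ~ Uniform(0, 1), then -ln(U)/λ ~ Exp(λ).
samples = [-math.log(random.random()) / lam for _ in range(100_000)]

m = sum(samples) / len(samples)
print(round(m, 3), 1 / lam)   # ≈ 0.5

below = sum(s <= 1.0 for s in samples) / len(samples)
print(round(below, 3), round(1 - math.exp(-lam * 1.0), 3))   # empirical vs F(1)
```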

Normal Distribution

Used to model continuous random variables which have a symmetric distribution. Its probability
density function is given by:

f(x) = (1/√(2πσ²)) e^{−(1/2)((x − µ)/σ)²},  −∞ < x < ∞, −∞ < µ < ∞, σ² > 0

denoted X ∼ N(µ, σ²).
It has properties

F(x) = p(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) e^{−(1/2)((t − µ)/σ)²} dt

E(X) = µ

Var(X) = σ²

M_X(t) = e^{µt + σ²t²/2}

Any normal variable X ∼ N(µ, σ²) can be transformed to a standard normal variable Z ∼ N(0, 1)
by the formula

Z = (X − µ)/σ

We say we are standardizing the normal random variable.
The probability density function of the standard normal variable is given by:

f(z) = (1/√(2π)) e^{−z²/2},  −∞ < z < ∞

denoted Z ∼ N(0, 1). It has properties

F(z) = Φ(z) = p(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−t²/2} dt

E(Z) = 0

Var(Z) = 1

M_Z(t) = e^{t²/2}

The areas under the standard normal curve are given in the standard normal tables.
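A sketch that standardizes a normal probability and evaluates Φ via the error function; the values µ = 100, σ = 15, x = 120 are hypothetical:

```python
import math

def phi(z):
    """Standard normal cdf: Φ(z) = (1 + erf(z/√2)) / 2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 100, 15
x = 120

z = (x - mu) / sigma           # standardize: Z = (X − µ)/σ
print(round(z, 4))             # ≈ 1.3333
print(round(phi(z), 4))        # p(X ≤ 120) for X ~ N(100, 15²), ≈ 0.9088
```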

Normal Approximation to Binomial and Poisson

The normal distribution provides a simple and accurate approximation to the binomial and
Poisson distributions. The normal distribution is continuous, hence p(X = x) = 0, while the
binomial and Poisson distributions are discrete. We therefore use a continuity correction.

Using a continuity correction

We first write the probability using ≤ or ≥. We approximate

1) p(X ≤ n) by p(Y < n + 0.5).

2) p(X ≥ n) by p(Y > n − 0.5).

Approximating a Binomial distribution

For a binomial distribution, there are two possible approximations, depending upon whether p
lies close to 0.5 (in which case a normal approximation is used) or p is small (in which case a
Poisson distribution is used).
If X ∼ B(n, p), n is large and p is close to 0.5, then X can be approximated by Y ∼
N(np, np(1 − p)).
If you are approximating a binomial distribution by a normal distribution, you should go directly
to the normal distribution, not via a Poisson distribution, as this involves one approximation
rather than two and should therefore be more accurate.
If you are in doubt over which approximation is appropriate, a useful rule of thumb is to
calculate the mean np: if this is less than or equal to 10, use the Poisson approximation; if the
mean is more than 10, a normal approximation is usually suitable.
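A sketch comparing the exact binomial probability p(X ≤ 25) with its continuity-corrected normal approximation for the hypothetical case X ∼ B(40, 0.5):

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal cdf: Φ(z) = (1 + erf(z/√2)) / 2."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical X ~ B(40, 0.5); approximate by Y ~ N(np, np(1−p)).
n, p = 40, 0.5
mu, var = n * p, n * p * (1 - p)

exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(26))
approx = phi((25 + 0.5 - mu) / sqrt(var))   # continuity correction: p(Y < 25.5)

print(round(exact, 4), round(approx, 4))    # both about 0.96
```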

Approximating a Poisson distribution by a normal distribution

If X ∼ Po(λ) and λ is large, then X can be approximated by Y ∼ N(λ, λ).

