SOR1211 - Probability
Elementary Probability:
1. The set of all possible outcomes of a statistical experiment is called the sample space and is
represented by S.
3. Sample spaces with a large or infinite number of sample points are best described by a statement or rule method:
S = { x | statement on x }
4. An event is a subset of a sample space; the subset contains all the elements for which the event occurs. The probability measure, denoted by P, assigns a probability to each such subset.
5. The complement of an event A with respect to S is the subset of all elements of S that are not in A. The complement of A is denoted by A′.
7. The intersection of two events A and B, denoted by A ∩ B, is the event containing all the
elements common to A and B.
8. Two events A and B are mutually exclusive, or disjoint, if A ∩ B = ∅, that is, if A and B have no elements in common.
9. The union of two events A and B, denoted by A ∪ B, is the event containing all the elements that belong to A or B or both. That is, if A = {a, b, c} and B = {b, c, d, e}, then A ∪ B = {a, b, c, d, e}.
10. If an operation can be performed in n1 ways, and if for each of these ways a second operation can be performed in n2 ways, then the two operations can be performed together in n1 n2 ways. The generalized multiplication rule extends this to any k operations, which can be performed in n1 n2 … nk ways, as the sketch below illustrates.
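As a quick check of the multiplication rule, a brute-force enumeration in Python gives the same count. The menu below is a hypothetical example, not from the notes:

```python
from itertools import product

# Hypothetical example: 3 starters, 4 mains and 2 desserts.
# The generalized multiplication rule predicts 3 * 4 * 2 = 24 complete meals.
starters = ["s1", "s2", "s3"]
mains = ["m1", "m2", "m3", "m4"]
desserts = ["d1", "d2"]

meals = list(product(starters, mains, desserts))  # enumerate every combination
print(len(meals))                                 # 24, as the rule predicts
assert len(meals) == 3 * 4 * 2
```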
11. If sampling of r elements from a set of n elements is carried out with replacement, the number of sample points in the sample space S is n^r.
12. If the sampling is carried out without replacement, the number of sample points is n(n−1)(n−2) … (n−r+1).
14. In general, n distinct objects taken r at a time can be arranged in n(n−1)(n−2) … (n−r+1) ways. This number is denoted by
nPr = n!/(n−r)!
16. The number of distinct permutations of n objects of which n1 are of one kind, n2 of a second kind, …, and nk of a k-th kind is
n!/(n1! n2! … nk!)
17. The number of ways of partitioning a set of n objects into r cells with n1 elements in the first cell, n2 elements in the second, and so forth is
(n choose n1, n2, …, nr) = n!/(n1! n2! … nr!)
whereby n1 + n2 + … + nr = n.
18. In particular, the number of combinations of n distinct objects taken r at a time is the number of partitions into two cells, one of size r and one of size n−r:
nCr = (n choose r, n−r) = n!/(r!(n−r)!)
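These counting formulas can be cross-checked against Python's standard library: math.perm and math.comb compute nPr and nCr directly, while the multinomial coefficient is built from factorials (the standard library has no dedicated function for it). The values n = 10, r = 3 and the cell sizes are assumed for illustration:

```python
import math

n, r = 10, 3

# nPr = n! / (n - r)!
assert math.perm(n, r) == math.factorial(n) // math.factorial(n - r)

# nCr = n! / (r! (n - r)!), a partition into two cells of sizes r and n - r
assert math.comb(n, r) == math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Multinomial coefficient for cell sizes n1 = 5, n2 = 3, n3 = 2 (summing to n)
multinomial = math.factorial(n)
for size in (5, 3, 2):
    multinomial //= math.factorial(size)

print(math.perm(n, r), math.comb(n, r), multinomial)  # 720 120 2520
```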
19. The probability of an event A is the sum of the weights of all sample points in A. Furthermore, if A1, A2, A3, … is a sequence of mutually exclusive events, then
P(A1 ∪ A2 ∪ A3 ∪ …) = P(A1) + P(A2) + P(A3) + …
21. If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes correspond to event A, then the probability of event A is
P(A) = n/N
For any two events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
If A and B are mutually exclusive, this reduces to P(A ∪ B) = P(A) + P(B).
For any event A and its complement, P(A) + P(A′) = 1.
26. The probability of an event B occurring when it is known that some event A has occurred is called a conditional probability:
P(B|A) = P(A ∩ B)/P(A), provided P(A) > 0
27. If A and B are independent, then P(B|A) = P(B) and P(A|B) = P(A). In general,
P(A ∩ B) = P(A)P(B|A), provided P(A) > 0
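A minimal sketch of the conditional-probability definition, using two fair dice as an assumed example (exact arithmetic via fractions avoids rounding):

```python
from itertools import product
from fractions import Fraction

# Sample space: two fair dice, 36 equally likely outcomes.
S = set(product(range(1, 7), repeat=2))

A = {s for s in S if s[0] + s[1] >= 10}  # A: the sum is at least 10
B = {s for s in S if s[0] == 6}          # B: the first die shows a 6

def P(E):
    return Fraction(len(E), len(S))

print(P(A & B) / P(A))                       # P(B|A) = (3/36)/(6/36) = 1/2
assert P(A & B) == P(A) * (P(A & B) / P(A))  # multiplication rule
```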
30. A collection of events A = {A1, …, An} is mutually independent if for any subset {Ai1, …, Aik} of A, with k ≤ n, we have
P(Ai1 ∩ … ∩ Aik) = P(Ai1) … P(Aik)
28. If the events B1, B2, …, Bk constitute a partition of the sample space S such that P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A of S:
P(A) = Σ_{i=1}^{k} P(Bi ∩ A) = Σ_{i=1}^{k} P(Bi)P(A|Bi)
29. Bayes' Theorem: If the events B1, B2, …, Bk constitute a partition of the sample space S such that P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S such that P(A) ≠ 0:
P(Br|A) = P(Br ∩ A) / Σ_{i=1}^{k} P(Bi ∩ A) = P(Br)P(A|Br) / Σ_{i=1}^{k} P(Bi)P(A|Bi)
Proof:
By the definition of conditional probability,
P(Br|A) = P(Br ∩ A)/P(A) = P(Br)P(A|Br)/P(A)
A = S ∩ A
A = (B1 ∪ B2 ∪ … ∪ Bk) ∩ A
A = (B1 ∩ A) ∪ (B2 ∩ A) ∪ … ∪ (Bk ∩ A)
P(A) = P(B1 ∩ A) + P(B2 ∩ A) + … + P(Bk ∩ A)
P(A) = Σ_{i=1}^{k} P(Bi ∩ A) = Σ_{i=1}^{k} P(Bi)P(A|Bi)
Hence,
P(Br|A) = P(Br)P(A|Br) / Σ_{i=1}^{k} P(Bi)P(A|Bi)
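A short numerical sketch of the theorem, with a hypothetical partition: three machines B1, B2, B3 produce all items, and A is the event that an item is defective. All the numbers are assumptions for illustration:

```python
from fractions import Fraction

# Hypothetical inputs: P(Bi) and P(A | Bi) for a partition B1, B2, B3.
P_B = [Fraction(30, 100), Fraction(45, 100), Fraction(25, 100)]
P_A_given_B = [Fraction(2, 100), Fraction(3, 100), Fraction(2, 100)]

# Total probability: P(A) = sum over i of P(Bi) P(A | Bi)
P_A = sum(pb * pa for pb, pa in zip(P_B, P_A_given_B))

# Bayes' theorem: P(B3 | A) = P(B3) P(A | B3) / P(A)
P_B3_given_A = P_B[2] * P_A_given_B[2] / P_A

print(P_A)           # 49/2000
print(P_B3_given_A)  # 10/49
```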
If A, B and C are mutually independent events, the product rule extends to their complements, for example:
P(A′ ∩ B ∩ C) = P(B)P(C)[1 − P(A)] = P(A′)P(B)P(C)
P(A ∩ B′ ∩ C′) = P(A)[1 − P(B)][1 − P(C)] = P(A)P(B′)P(C′)
In particular, if A and B are independent, then A and B′ are also independent:
P(A ∩ B′) = P(A) − P(A ∩ B)
= P(A) − P(A)P(B)
= P(A)[1 − P(B)]
= P(A)P(B′)
Similarly, A′ and B′ are independent:
P(A′ ∩ B′) = P(A′) − P(A′ ∩ B) = [1 − P(A)] − P(B)[1 − P(A)]
= [1 − P(A)][1 − P(B)]
= P(A′)P(B′)
Random Variables:
1. If each point in a sample space is assigned a numerical value x, these values are, of course, random quantities determined by the outcome of the experiment. A random variable, denoted by X, is a function that associates a real number with each element in the sample space. A random variable for which 0 and 1 are chosen to describe its two possible values is called a Bernoulli random variable.
3. If a sample space contains an infinite number of possibilities equal to the number of points on
a line segment, it is called a continuous sample space.
4. The function f(x) is a probability distribution (probability mass function) of the discrete random variable X if, for each possible outcome x:
f(x) ≥ 0
P(X = x) = f(x)
Σ_x f(x) = 1
5. The cumulative distribution function F(x) of a discrete random variable X with probability distribution f(x) is
F(x) = P(X ≤ x) = Σ_{t ≤ x} f(t)
6. In dealing with continuous variables, the function f(x) is referred to as the probability density function (pdf). For a continuous random variable X, defined over the set of real numbers:
f(x) ≥ 0 for all x ∈ ℝ
∫_{−∞}^{∞} f(x) dx = 1
P(a < X < b) = ∫_a^b f(x) dx
7. The cumulative distribution function F(x) of a continuous random variable X with density function f(x) is
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
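These pdf properties can be checked numerically. A minimal sketch, assuming the made-up density f(x) = 2x on 0 < x < 1 and a crude midpoint-rule integrator:

```python
# Assumed density: f(x) = 2x on 0 < x < 1, and 0 elsewhere.
def f(x):
    return 2.0 * x if 0.0 < x < 1.0 else 0.0

def integrate(a, b, steps=100_000):
    # Midpoint-rule approximation of the integral of f from a to b.
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

print(round(integrate(0.0, 1.0), 4))  # ~1.0: total area under f
print(round(integrate(0.3, 0.8), 4))  # ~0.55: P(0.3 < X < 0.8) = 0.8^2 - 0.3^2
print(round(integrate(0.0, 0.5), 4))  # ~0.25: F(0.5) = P(X <= 0.5)
```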
2. Let X be a random variable with probability distribution f(x). The mean, or expected value, of a discrete X is
μ = E(X) = Σ_x x f(x)
For a continuous X,
μ = E(X) = ∫_{−∞}^{∞} x f(x) dx
3. The most important measure of variability of a random variable X is obtained by applying the theorem on the expectation of a function of X with g(X) = (X − μ)². The quantity is referred to as the variance of the random variable X, or the variance of the probability distribution of X, and is denoted by Var(X), by the symbol σ²_X, or simply by σ² when it is clear to which random variable we refer.
Let X be a random variable with probability distribution f(x) and mean μ. The variance of a discrete X is
σ² = E[(X − μ)²] = Σ_x (x − μ)² f(x)
For a continuous X,
σ² = E[(X − μ)²] = ∫_{−∞}^{∞} (x − μ)² f(x) dx
The positive square root of the variance, σ, is called the standard deviation of X. A useful computational form is
σ² = E(X²) − μ²
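A minimal sketch checking both variance formulas on a small pmf. The distribution is an assumed example (X = number of heads in two fair coin tosses):

```python
from fractions import Fraction

# Assumed pmf: X = number of heads in two fair tosses.
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in f.items())               # E(X) = sum of x f(x)
var = sum((x - mu) ** 2 * p for x, p in f.items())  # E[(X - mu)^2]
ex2 = sum(x * x * p for x, p in f.items())          # E(X^2)

print(mu, var)             # 1 1/2
assert var == ex2 - mu**2  # the shortcut sigma^2 = E(X^2) - mu^2
```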
E(aX + b) = Σ_x (ax + b)P(X = x)
E(aX + b) = a Σ_x x P(X = x) + b Σ_x P(X = x)
E(aX + b) = aE(X) + b
E(aX² + b) = Σ_x (ax² + b)P(X = x)
E(aX² + b) = a Σ_x x² P(X = x) + b Σ_x P(X = x)
E(aX² + b) = aE(X²) + b
E(b − aX²) = b − aE(X²)
6. The expected value of the sum or difference of two or more functions of a random variable X is the sum or difference of the expected values of the functions. That is,
E[g(X) ± h(X)] = E[g(X)] ± E[h(X)]
Var(aX + b) = E[(aX + b)²] − (E[aX + b])²
Var(aX + b) = E[a²X² + 2abX + b²] − [aE(X) + b]²
Var(aX + b) = a²E[X²] − a²[E(X)]² = a²Var(X)
Var(aX² + b) = E[(aX² + b)²] − (E[aX² + b])²
Var(aX² + b) = Σ_x (a²x⁴ + 2abx² + b²)P(X = x) − [aE(X²) + b]² = a²E(X⁴) − a²[E(X²)]²
Var(b − aX²) = E[(b − aX²)²] − (E[b − aX²])²
Var(b − aX²) = E[b² − 2abX² + a²X⁴] − [b − aE(X²)]²
Var(b − aX²) = b² Σ_x P(X = x) − 2ab Σ_x x² P(X = x) + a² Σ_x x⁴ P(X = x) − b² + 2abE(X²) − a²[E(X²)]² = a²E(X⁴) − a²[E(X²)]²
2. The number X of successes in n Bernoulli trials is called a binomial random variable. The probability distribution of this discrete random variable is called the binomial distribution, denoted by b(x; n, p).
3. The probability distribution of the binomial random variable X, the number of successes in n independent trials, is
b(x; n, p) = (n choose x) p^x q^(n−x)
whereby x = 0, 1, 2, 3, …, n.
4. The mean and variance of the binomial distribution b(x; n, p) are μ = np and σ² = npq respectively.
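A sketch verifying the binomial pmf, mean and variance directly from the formula; n = 12 and p = 0.3 are assumed values:

```python
import math

n, p = 12, 0.3
q = 1 - p

def b(x):
    # b(x; n, p) = nCx p^x q^(n-x)
    return math.comb(n, x) * p**x * q ** (n - x)

total = sum(b(x) for x in range(n + 1))
mean = sum(x * b(x) for x in range(n + 1))
var = sum((x - mean) ** 2 * b(x) for x in range(n + 1))

print(round(total, 10))  # 1.0: the probabilities sum to one
print(round(mean, 10))   # 3.6  = np
print(round(var, 10))    # 2.52 = npq
```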
5. The binomial experiment becomes a multinomial experiment if we let each trial have more
than two possible outcomes.
6. If a given trial can result in the k outcomes E1, E2, …, Ek with probabilities p1, p2, …, pk, then the probability distribution of the random variables X1, X2, …, Xk, representing the number of occurrences of E1, E2, …, Ek in n independent trials, is
f(x1, x2, …, xk; p1, p2, …, pk, n) = [n!/(x1! x2! … xk!)] p1^x1 p2^x2 … pk^xk
whereby
Σ_{i=1}^{k} xi = n and Σ_{i=1}^{k} pi = 1
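A one-off multinomial evaluation under assumed numbers (k = 3 outcomes, n = 8 trials):

```python
import math

n = 8
x = [4, 3, 1]        # occurrences of E1, E2, E3; they sum to n
p = [0.5, 0.3, 0.2]  # probabilities; they sum to 1

# Multinomial coefficient n! / (x1! x2! x3!)
coef = math.factorial(n)
for xi in x:
    coef //= math.factorial(xi)

prob = coef * math.prod(pi**xi for pi, xi in zip(p, x))
print(coef, round(prob, 6))  # 280, 280 * 0.5^4 * 0.3^3 * 0.2 = 0.0945
```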
Expectation Derivation:
E(X) = Σ_{x=0}^{n} x nCx p^x (1 − p)^(n−x) = np Σ_{x=1}^{n} [(n−1)!/((x−1)!(n−x)!)] p^(x−1) (1 − p)^(n−x) = np(p + 1 − p)^(n−1)
E(X) = np
Variance Derivation:
E(X(X−1)) = Σ_{x=0}^{n} x(x−1) nCx p^x (1 − p)^(n−x)
= n(n−1)p² Σ_{x=2}^{n} [(n−2)!/((x−2)!(n−x)!)] p^(x−2) (1 − p)^(n−x)
= n(n−1)p² (p + 1 − p)^(n−2)
= n(n−1)p²
Var(X) = E(X(X−1)) + E(X) − [E(X)]² = n(n−1)p² + np − (np)² = np(1 − p) = npq
Geometric Distribution:
1. If repeated independent trials can result in a success with probability p and a failure with probability q = 1 − p, then the probability distribution of the random variable X, the number of the trial on which the first success occurs, is
g(x; p) = p q^(x−1)
whereby x = 1, 2, 3, … Since the mean is μ = 1/p, this can equivalently be written g(x; p) = (1/μ)(1 − 1/μ)^(x−1).
2. The mean and variance of a random variable following the geometric distribution are
μ = 1/p and σ² = (1 − p)/p²
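A sketch checking μ = 1/p and σ² = (1 − p)/p² by truncating the infinite sums; p = 0.25 is assumed, and the geometric tail beyond x = 400 is negligible:

```python
p = 0.25
q = 1 - p

xs = range(1, 401)
g = {x: p * q ** (x - 1) for x in xs}  # g(x; p) = p q^(x-1)

mean = sum(x * g[x] for x in xs)
var = sum(x * x * g[x] for x in xs) - mean**2

print(round(mean, 6), round(var, 6))  # 4.0 = 1/p and 12.0 = (1-p)/p^2
```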
Expectation Derivation:
E(X) = Σ_{x=1}^{∞} x p(1 − p)^(x−1)
= p Σ_{x=1}^{∞} x(1 − p)^(x−1)
= p [∂/∂y (1 − y)^(−1)]_(y = 1−p)
= p (1 − y)^(−2) |_(y = 1−p) = p/p²
E(X) = 1/p
Variance Derivation:
E(X(X−1)) = Σ_{x=1}^{∞} x(x−1) p(1 − p)^(x−1)
= p(1 − p) Σ_{x=1}^{∞} x(x−1)(1 − p)^(x−2)
= p(1 − p) [∂/∂y (1 − y)^(−2)]_(y = 1−p)
= p(1 − p) · 2(1 − y)^(−3) |_(y = 1−p)
= 2(1 − p)/p²
Var(X) = E(X(X−1)) + E(X) − [E(X)]²
Var(X) = 2(1 − p)/p² + 1/p − 1/p²
Var(X) = (1 − p)/p²
2. The Poisson process is defined by the following properties. The number of outcomes occurring in one time interval or specified region of space is independent of the number that occur in any other disjoint time interval or region. The probability that a single outcome will occur during a very short time interval or in a small region is proportional to the length of the time interval or the size of the region, and does not depend on the number of outcomes occurring outside this time interval or region. The probability that more than one outcome will occur in such a short time interval or fall in such a small region is negligible.
4. The probability distribution of the Poisson random variable X, representing the number of outcomes occurring in a given time interval or specified region denoted by t, is
p(x; λt) = e^(−λt) (λt)^x / x!
whereby x = 0, 1, 2, … The cumulative probabilities are
P(r; λt) = Σ_{x=0}^{r} p(x; λt)
5. Both the mean and the variance of the Poisson distribution p(x ; λt) are λt .
As n → ∞ and p → 0 with np → μ held constant, the binomial distribution tends to the Poisson distribution: b(x; n, p) → p(x; μ).
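A numerical sketch of this limit, holding μ = np = 2 fixed while n grows (the values shown are approximate):

```python
import math

mu = 2.0

def poisson(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

def binom(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# As n grows with np = mu fixed, b(x; n, p) approaches p(x; mu).
for n in (10, 100, 1000):
    print(n, [round(binom(x, n, mu / n), 5) for x in range(5)])
print("poisson", [round(poisson(x, mu), 5) for x in range(5)])
```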
Expectation Derivation:
E(X) = Σ_{x=0}^{∞} x e^(−λ) λ^x / x!
= λ e^(−λ) Σ_{x=1}^{∞} λ^(x−1)/(x−1)!
= λ e^(−λ) e^λ
E(X) = λ
Variance Derivation:
E(X(X−1)) = Σ_{x=0}^{∞} x(x−1) e^(−λ) λ^x / x!
= λ² e^(−λ) Σ_{x=2}^{∞} λ^(x−2)/(x−2)!
= λ² e^(−λ) e^λ = λ²
Var(X) = E(X²) − [E(X)]² = λ² + λ − λ² = λ
Normal Distribution:
1. The normal distribution is often referred to as the Gaussian distribution; its density is a bell-shaped curve. A continuous random variable X having the bell-shaped distribution is called a normal random variable.
2. The mathematical equation for the probability distribution of the normal variable depends on the mean and standard deviation. Hence, the values of the density of X are denoted by n(x; μ, σ).
3. The density of the normal random variable X, with mean μ and variance σ², is
n(x; μ, σ) = (1/(√(2π)σ)) e^(−(x−μ)²/(2σ²))
4. A normal curve has the following properties. The mode, which is the point on the horizontal axis where the curve is a maximum, occurs at x = μ. The curve is symmetric about a vertical axis through the mean μ. The curve has its points of inflection at x = μ ± σ; it is concave downward if μ − σ < x < μ + σ and concave upward otherwise. The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean. The total area under the curve and above the horizontal axis is equal to 1.
5. The mean and variance of n(x; μ, σ) are μ and σ², respectively. Hence, the standard deviation is σ.
6. The distribution of a normal random variable with mean 0 and variance 1 is called a standard normal distribution.
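The standard normal cdf can be evaluated with the error function from Python's standard library; standardising with z = (x − μ)/σ reduces any normal probability to the N(0, 1) case. A minimal sketch:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # P(X <= x) for X ~ N(mu, sigma^2), via Phi(z) = (1 + erf(z/sqrt(2)))/2
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(normal_cdf(0.0), 4))                     # 0.5: symmetry about the mean
print(round(normal_cdf(1.0) - normal_cdf(-1.0), 4))  # ~0.6827: within one sigma
print(round(normal_cdf(2.0) - normal_cdf(-2.0), 4))  # ~0.9545: within two sigmas
```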
Expectation Derivation:
E(X − μ) = (1/(√(2π)σ)) ∫_{−∞}^{∞} (x − μ) e^(−(1/2)((x−μ)/σ)²) dx = 0, since the integrand is an odd function of x − μ. Hence
E(X) = μ
Variance Derivation:
Var(X) = E[(X − μ)²] = (1/(√(2π)σ)) ∫_{−∞}^{∞} (x − μ)² e^(−(1/2)((x−μ)/σ)²) dx
Substituting y = (x − μ)/σ, Var(X) = (σ²/√(2π)) ∫_{−∞}^{∞} y² e^(−y²/2) dy, and integrating by parts gives
Var(X) = σ²
Exponential Distribution:
1. The exponential distribution is a special case of the gamma distribution. The gamma function is defined as
Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx
for α > 0.
2. The continuous random variable X has a gamma distribution, with parameters α and β, if its density function is given by
f(x; α, β) = (1/(β^α Γ(α))) x^(α−1) e^(−x/β) for x > 0, and 0 elsewhere.
3. The exponential distribution is the special case with α = 1, whereby μ = β and σ² = β². In terms of the rate parameter λ = 1/μ, the exponential density is f(x) = λ e^(−λx) for x > 0.
Mean/Expectation Derivation:
E(X) = ∫_0^∞ λx e^(−λx) dx
= [−x e^(−λx)]_0^∞ + ∫_0^∞ e^(−λx) dx (integrating by parts)
E(X) = 1/λ
Variance Derivation:
E(X²) = ∫_0^∞ λx² e^(−λx) dx
= [−x² e^(−λx)]_0^∞ + ∫_0^∞ 2x e^(−λx) dx
= 2/λ²
Var(X) = E(X²) − [E(X)]²
Var(X) = 2/λ² − (1/λ)²
Var(X) = 1/λ²
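A simulation sketch of the exponential mean and variance, using the standard library's exponential sampler; λ = 0.5 is an assumed rate, and with 200 000 draws the estimates come out close to 1/λ = 2 and 1/λ² = 4:

```python
import random

random.seed(0)
lam = 0.5  # assumed rate parameter lambda
n = 200_000

xs = [random.expovariate(lam) for _ in range(n)]

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(round(mean, 3), round(var, 3))  # close to 2.0 and 4.0
```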
For the normal approximation to the binomial distribution, the matching parameters are μ = np and σ = √(np(1 − p)).