
NATIONAL OPEN UNIVERSITY OF NIGERIA

SCHOOL OF SCIENCES

COURSE CODE: STT 311

COURSE TITLE: PROBABILITY DISTRIBUTION 2

STT 311 PROBABILITY DISTRIBUTION 2

Course Developer: Mr. A. Adeleke

Osun State College of Technology, Esa-Oke

(LAGOS CENTRE)

TABLE OF CONTENTS

UNIT 1: PROBABILITY SPACES, MEASURE AND DISTRIBUTION

1.0 Introduction

1.1 Objective

1.2 Probability space

1.3 Sample space and Event

1.4 Probability Measure

1.5 Theorem on probability space

1.6 Probability distribution

1.7 Conclusion

1.8 Summary

1.9 Tutor Marked Assignment

1.10 References

UNIT 2 : DISTRIBUTION OF RANDOM VARIABLE SPACES

2.0 Introduction

2.1 Objective

2.2 Random Variables

2.3 Discrete Probability Distributions

2.4 Distribution Functions For Random Variables

2.5 Distribution Function For Discrete Random Variables.

2.6 Continuous Random Variables

2.7 Graphical Representation of Random Variables.

2.8 Joint Distributions

2.9 Independence of Random Variables

2.10 Conditional Distributions

2.11 Conclusion

2.12 Summary

2.13 Tutor Marked Assignment (TMA)

2.14 Reference / Further Reading / Other References.

UNIT 3 : EXPECTATION OF RANDOM VARIABLES.

3.0 Introduction

3.1 Objectives

3.2 What is Expectation of Random Variables?

3.3 Theorems on Expectation

3.4 The Variance and Standard Deviation.

3.5 Theorems on Variance

3.6 Moments

3.7 Moments Generating Functions

3.8 Theorems on Moment Generating Functions.

3.9 Characteristic Functions

3.10 Conclusion

3.11 Summary

3.12 Tutor Marked Assignment (TMA)

3.13 References / Further Reading / Other Resources

UNIT 4 : LIMIT THEOREM.

4.0 Introduction

4.1 Objectives

4.2 Chebyshev’s Inequality

4.3 Convergence of Random Variables

4.4 De Moivre's Theorem

4.5 Central Limit Theorem

4.6 Khinchine's Theorem

4.7 Conclusion

4.8 Summary

4.9 Tutor Marked Assignment ( TMA)

5.0 References / Further Reading / Other Resources.

UNIT 1: PROBABILITY SPACES, MEASURE AND DISTRIBUTION

1.0 INTRODUCTION

This unit focuses on probability spaces, probability measures and probability distributions for continuous random variables. It gives some basic definitions, and relevant worked examples are provided to make the concepts more meaningful for the learner.

1.1 OBJECTIVES.

At the end of this unit, students should be able to:

• Understand the meaning of probability space and its notation.

• Define Sample space and event, and Event Space.

• Discuss Probability Measure and State its Theorems.

• Discuss Probability Distribution for Continuous Random Variables.

1.2 PROBABILITY SPACE

A probability space is a triple (Ω, A, P[·]), where Ω is the sample space, A is a σ-field of subsets of Ω (the collection of events), and P[·] is a probability function having A as its domain. Each ω ∈ Ω is called a sample point. The single term "probability space" gives us an expedient way to assume the existence of all three components in one notation.
1.3 SAMPLE SPACE AND EVENT

Definition of Sample Space :

The sample space denoted by Ω , is the collection or totality of all

possible outcomes of a conceptual experiment.

In addition to Ω, the symbols S, Z, R, E, µ and A are also used to denote the sample space.

Event: An event is a subset of the sample space.

Event Space: The class of all events associated with a given experiment is defined to be the event space.

1.4 PROBABILITY MEASURE

A probability measure is a normed, non-negative, countably additive set function defined on the field of all events.

Definition: A probability measure P on a σ-field A of subsets of a set Ω is a real-valued function having domain A and satisfying the following properties:

(i) P(Ω) = 1;

(ii) P(A) ≥ 0 for all A ∈ A;

(iii) if An, n = 1, 2, ..., are mutually disjoint sets in A, then P(∪ An) = Σ P(An).

The triple (Ω, A, P) is then called a probability space.

PB(A) is a conditional probability measure on Ω. We say that (Ω, A, PB) is the probability space obtained by conditioning (Ω, A, P) by the event B.

If an event B depends on the occurrence of event A1 or A2, then

P(B) = P(B ∩ A1) + P(B ∩ A2) = P(A1) P(B | A1) + P(A2) P(B | A2).

In general, if an event B depends on the occurrence of events A1, A2, ..., An, then

P(B) = Σi P(B ∩ Ai) = Σi P(Ai) P(B | Ai).
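The law of total probability above lends itself to a quick numerical check. The sketch below is illustrative only (the two-machine figures are invented for this example, not taken from the course text); it assumes only the Python standard library.

```python
# Law of total probability: P(B) = sum_i P(A_i) * P(B | A_i).
# Hypothetical numbers: two machines produce items; B = "item is defective".

p_A = [0.6, 0.4]            # P(A1), P(A2): which machine produced the item
p_B_given_A = [0.02, 0.05]  # P(B | A1), P(B | A2)

p_B = sum(pa * pb for pa, pb in zip(p_A, p_B_given_A))
print(p_B)                  # 0.6*0.02 + 0.4*0.05 = 0.032
```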

1.5 Theorem: Let (Ω, A, P) be a probability space and let B be an event with P(B) > 0. Then (Ω, A, PB) is also a probability space.

Proof:

PB(A) = P(A ∩ B) / P(B) ≥ 0;

PB(Ω) = P(Ω ∩ B) / P(B) = P(B) / P(B) = 1.

Let A1, A2, ... be disjoint events in A. Then

PB(∪ Ai) = P[(∪ Ai) ∩ B] / P(B) = P[∪ (Ai ∩ B)] / P(B).

Since the sets (Ai ∩ B), i = 1, 2, ..., are disjoint events, we have

PB(∪ Ai) = Σ PB(Ai),

and so PB is countably additive. Hence (Ω, A, PB) is a probability space.

Lemma (Borel–Cantelli):

Let {An} be a sequence of independent measurable sets, and let A* = lim sup An = ∩n ∪m≥n Am.

(i) If Σ P(An) < ∞, then P(A*) = 0.

(ii) If Σ P(An) = ∞, then P(A*) = 1.

Proof:

(i) A* = ∩n ∪m≥n Am ⊆ ∪m≥n Am for every n. Since ∪m≥n Am decreases as n increases, we have

P(A*) ≤ P(∪m≥n Am) ≤ Σm≥n P(Am).

Since Σ P(An) < ∞, the tail sum Σm≥n P(Am) → 0 as n → ∞, hence P(A*) = 0.

(ii) By independence,

P((A*)ᶜ) = P(∪n ∩m≥n Amᶜ) ≤ Σn P(∩m≥n Amᶜ) = Σn Πm≥n (1 − P(Am)) = 0,

because Σ P(Am) = ∞ forces each infinite product Πm≥n (1 − P(Am)) to equal 0. Hence P(A*) = 1.

1.6 PROBABILITY DISTRIBUTION

Definition: Let X be a random variable whose image set S is a continuum of numbers, such as an interval. Then the set a ≤ X ≤ b is an event in S, and if the probability P(a < X < b) can be written as

P(a < X < b) = ∫_a^b f(x) dx,

X is called a continuous random variable. The function f is called the continuous probability function or density function of X, and it satisfies the following:

i. f(x) ≥ 0;

ii. ∫_R f(x) dx = 1, where R is the range of X;

iii. the distribution function F(x) = ∫_{−∞}^{x} f(t) dt is a non-decreasing function;

iv. ∫_a^b f(x) dx = F(b) − F(a) = P(a < X < b).

For example, let X be a random variable defined to take any value in the interval (0, 1). If a single point in this interval, say 0.45, is considered, the probability that this exact point is picked is

P(X = 0.45) = 1 / (number of points in the interval) = 0,

since the interval contains uncountably many points. It is, however, easy to find the probability of a sub-interval within the interval; for instance, it is possible to calculate P(X < 0.45):

P(X < 0.45) = P(length 0 to 0.45) = (length of the given sub-interval) / (total length).

Similarly,

P(x1 < X < x2) = (x2 − x1) / (total length).
Example 1.6.1: Given a triangle ABC containing a shaded inner triangle CDE (figure not reproduced here), find the probability of the shaded portion.

Solution:

P(shaded portion) = Area of ∆CDE / Area of ∆ABC.

Example 1.6.2: The length of life, measured in hours, of a certain rare type of insect is a random variable X with probability density function

f(x) = (3/4)(2x − x²), 0 < x < 2;  f(x) = 0 elsewhere.

If the amount of food, measured in milligrams, consumed in a lifetime by such an insect is defined by the function g(x) = x², where x is the length of life measured in hours, find the expected amount of food that will be consumed by an insect of this type.
Solution:

Expected amount of food = E[g(X)] = ∫_0^2 g(x) f(x) dx = ∫_0^2 x² · (3/4)(2x − x²) dx = 1.2 mg.
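A quick numerical check of this expectation can be done with a simple midpoint-rule integration; this is a sketch using only the Python standard library, not part of the original text.

```python
# Example 1.6.2 check: E[g(X)] = integral of x^2 * (3/4)(2x - x^2) over (0, 2) should be 1.2 mg.

def f(x):                       # density of the length of life
    return 0.75 * (2 * x - x**2) if 0 < x < 2 else 0.0

def g(x):                       # food consumed in a lifetime
    return x**2

n = 100_000                     # midpoint rule on (0, 2)
h = 2.0 / n
expected_food = sum(g((i + 0.5) * h) * f((i + 0.5) * h) * h for i in range(n))
print(round(expected_food, 4))  # approximately 1.2
```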

Example 1.6.3: Given a continuous random variable X with probability density function

f(x) = kx², 0 ≤ x ≤ 10;  f(x) = 0 elsewhere,

find k such that f(x) is a pdf.

Solution:

If f(x) is a pdf, then ∫_0^10 f(x) dx = 1:

∫_0^10 k x² dx = k x³/3 |_0^10 = k(10)³/3 − k(0)³/3 = 1000k/3 = 1,

so k = 3/1000.
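As a small sketch (standard library only, not from the text), the normalising constant can be verified with exact rational arithmetic:

```python
# With f(x) = k x^2 on 0 <= x <= 10, the integral is k * 10^3 / 3, so k = 3/1000 gives total probability 1.
from fractions import Fraction

k = Fraction(3, 1000)
total = k * Fraction(10**3, 3)   # integral of x^2 over (0, 10) is 1000/3
print(total)                     # 1
```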

1.7 CONCLUSION

In this unit, you have learnt about the probability space, the notation of its components, and the definitions of sample space, event and event space. You also learned about the probability measure and its main related theorems, and about the probability distribution of a continuous random variable, with relevant worked examples.

1.8 SUMMARY

What you have learned in this unit are the following probability distribution

concepts.

i. The meaning of probability space and its notation

ii. Important definition of sample space, Event and event space .

iii. Probability measure and its properties

iv. The probability distribution of a continuous random variable.
EXERCISE 1.80.1 (SAE)

The surface area, measured in square metres, of a flat metal disk manufactured by a certain process is a random variable X with probability density function

f(x) = 6(x − x²), 0 < x < 1;  f(x) = 0 elsewhere.

Find the expected radius, measured in metres, of a flat metal disk manufactured by this process.

1.9 TUTOR MARKED ASSIGNMENT (TMA)

Exercise 1.8.1: The probability function of a random variable X is given by

f(x) = 2p for x = 1;  p for x = 2;  4p for x = 3;  0 otherwise,

where p is a constant. Find

(a) P(0 ≤ X < 3), (b) P(X > 1).
1.10 REFERENCES / FURTHER READING

Kasumu, R. A. (2003). Probability Theory (1st ed.). Lagos: Fatol Ventures.

Mood, A. M., et al. (1974). Introduction to the Theory of Statistics (3rd ed.). McGraw-Hill.
UNIT 2: DISTRIBUTION OF RANDOM VARIABLES SPACES

2.0 INTRODUCTION

This unit concerns the meaning of random variables and their classification into discrete and continuous random variables. Distribution functions for discrete and continuous random variables, together with related examples, are also given.

The unit further highlights graphical representation, joint distributions for discrete and continuous random variables, and independence and conditional probability of random variables, with a worked example on each.

2.1 OBJECTIVE

At the end of this unit, students should be able to:

- Understand the meaning of random variables.

- Classify random variables into discrete and continuous random variables, with examples.

- Define and state the properties of distribution function.

- State the distribution function for discrete and continuous random

variables and solve example on each.

- Show the graphical representation of random variables.

- State the joint distributions for two random variables which are either

both discrete or both continuous.

- State when random variables are independent and when they are dependent.

- State the conditional probability function for discrete and continuous

random variables.

- Solve related problems on the distribution of random variables spaces.

2.2 RANDOM VARIABLES

A random variable is a function whose domain of definition is the sample space S of a random experiment and whose range is a set of real numbers.

Definition 2

A real-valued measurable function X : Ω → R with respect to (Ω, A, P) is called a random variable.

Note:

Suppose {Xn} is a sequence of random variables. If lim_{n→∞} Xn = X, then X is a random variable.
Example: Suppose that a coin is tossed twice, so that the sample space is S = {HH, HT, TH, TT}. Let X represent the number of heads that can come up. For example, X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0.

Since the domain of X is S and the range consists of real numbers, X is a random variable.

A random variable that takes on a finite or countably infinite number of values is called a discrete random variable, while one which takes a noncountably infinite number of values is called a nondiscrete (continuous) random variable.

2.3 DISCRETE PROBABILITY DISTRIBUTIONS

Let X be a discrete random variable, and suppose that the possible values that it can assume are given by x1, x2, x3, ..., arranged in some order. Suppose also that these values are assumed with probabilities given by

P(X = xk) = f(xk), k = 1, 2, ....   (1)

It is convenient to introduce the probability function, also referred to as the probability distribution, given by

P(X = x) = f(x).   (2)

For x = xk this reduces to equation (1), while for other values of x, f(x) = 0.

In general, f(x) is a probability function if

1. f(x) ≥ 0;

2. Σx f(x) = 1,

where the sum in condition 2 is taken over all possible values of x.

2.4 DISTRIBUTION FUNCTION FOR RANDOM VARIABLES

The cumulative distribution function, or briefly the distribution function, for a random variable X is defined by

F(x) = P(X ≤ x),

where x is any real number, that is, −∞ < x < ∞.

The distribution function F(x) has the following properties:

1. F(x) is non-decreasing [i.e., F(x) ≤ F(y) if x ≤ y];

2. lim_{x→−∞} F(x) = 0; lim_{x→∞} F(x) = 1;

3. F(x) is continuous from the right [i.e., lim_{h→0⁺} F(x + h) = F(x) for all x].
2.5 DISTRIBUTION FUNCTION FOR DISCRETE RANDOM VARIABLES

The distribution function for a discrete random variable X can be obtained from its probability function by noting that, for all x in (−∞, ∞),

F(x) = P(X ≤ x) = Σ_{u ≤ x} f(u),

where the sum is taken over all values u taken on by X for which u ≤ x. If X takes on only a finite number of values x1, x2, ..., xn, then the distribution function is given by

F(x) = 0,                        −∞ < x < x1
     = f(x1),                    x1 ≤ x < x2
     = f(x1) + f(x2),            x2 ≤ x < x3
       ...
     = f(x1) + ... + f(xn),      xn ≤ x < ∞.

Example 2.5.1:- find the probability function corresponding to the

random variable x when a coin is tossed twice; assuming that the coin

is fair.

Solution:

Since the coin is fair, each of the four sample points has probability 1/4:

P(HH) = P(HT) = P(TH) = P(TT) = 1/4.

Then

P(X = 0) = P(TT) = 1/4,
P(X = 1) = P(HT) + P(TH) = 1/4 + 1/4 = 1/2,
P(X = 2) = P(HH) = 1/4.

The probability function is given in the table below.

x      0     1     2
f(x)   1/4   1/2   1/4
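The same probability function can be obtained by enumerating the sample space directly. The following sketch (standard library only, illustrative rather than part of the course text) tabulates f(x) = P(X = x) for the number of heads in two fair tosses.

```python
# Example 2.5.1: enumerate two fair coin tosses and tabulate the pmf of the head count X.
from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=2))            # HH, HT, TH, TT, each with probability 1/4
counts = Counter(outcome.count("H") for outcome in outcomes)
f = {x: counts[x] / len(outcomes) for x in sorted(counts)}
print(f)                                            # {0: 0.25, 1: 0.5, 2: 0.25}
```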

Example 2.5.2: (a) Find the distribution function for the random variable X of Example 2.5.1. (b) Obtain its graph.

Solution:

(a) The distribution function is

F(x) = 0,     −∞ < x < 0
     = 1/4,   0 ≤ x < 1    (since f(x1) = 1/4)
     = 3/4,   1 ≤ x < 2    (since f(x1) + f(x2) = 1/4 + 1/2 = 3/4)
     = 1,     2 ≤ x < ∞    (since 1/4 + 1/2 + 1/4 = 1).

(b) The graph of F(x) is a step function rising from 0 to 1, with jumps of 1/4, 1/2 and 1/4 at x = 0, 1 and 2 respectively (figure not reproduced here).

The following things about the above distribution function should be

noted.

1. The magnitudes of the jumps at 0, 1, 2 are 1/4, 1/2, 1/4, which are precisely the probabilities in the table; this fact enables one to obtain the probability function from the distribution function.

2. Because of the appearance of its graph, the distribution function is often called a staircase function or step function. The value of the function at an integer is obtained from the higher step; thus the value at 1 is 3/4 and not 1/4. This is expressed mathematically by stating that the distribution function is continuous from the right at 0, 1, 2.

3. As we proceed from left to right, the distribution function never decreases; it is a monotonically increasing function.

2.6 CONTINUOUS RANDOM VARIABLES

A non discrete random variable x is said to be absolutely continuous,

or simply continuous, if its distribution function may be represented as

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du,   (−∞ < x < ∞),

where the function f(x) has the following properties:

1. f(x) ≥ 0;

2. ∫_{−∞}^{∞} f(x) dx = 1.

It follows from the above that if X is a continuous random variable, then the probability that X takes on any one particular value is zero, whereas the probability that X lies between two different values, say a and b, is given by

P(a < X < b) = ∫_a^b f(x) dx.

Example 2.6.1: (a) Find the constant c such that the function

f(x) = cx², 0 < x < 3;  f(x) = 0 otherwise

is a density function, and (b) compute P(1 < X < 2).

Solution:

(a) Since f(x) satisfies property (1) whenever c ≥ 0, it must satisfy property (2) in order to be a density function. Now,

∫_{−∞}^{∞} f(x) dx = ∫_0^3 cx² dx = cx³/3 |_0^3 = c(3)³/3 − c(0)³/3 = 27c/3 = 9c,

and since this integral must equal 1, we have 9c = 1, so c = 1/9.

(b) P(1 < X < 2) = ∫_1^2 (1/9)x² dx = x³/27 |_1^2 = 8/27 − 1/27 = 7/27.
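A short check of both results, using exact rational arithmetic from the standard library (an illustrative sketch, not part of the text):

```python
# Example 2.6.1 check: with f(x) = x^2/9 on (0, 3), the density integrates to 1 and P(1 < X < 2) = 7/27.
from fractions import Fraction

c = Fraction(1, 9)
total = c * Fraction(3**3, 3)                      # integral of c x^2 over (0, 3)
prob = c * (Fraction(2**3, 3) - Fraction(1, 3))    # integral of c x^2 over (1, 2)
print(total, prob)                                 # 1 and 7/27
```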

2.7 GRAPHICAL REPRESENTATIONS OF RANDOM VARIABLE

If f(x) is the density function for a random variable X, then we can represent y = f(x) graphically by a curve, as shown in the figure below. Since f(x) ≥ 0, the curve cannot fall below the x-axis, and the entire area bounded by the curve and the x-axis must be 1, because of the second property, ∫ f(x) dx = 1.

Geometrically, the probability that X is between a and b, i.e. P(a < X < b), is represented by the shaded area in the first figure below.

The distribution function F(x) = P(X ≤ x) is a monotonically increasing function which increases from 0 to 1 and is represented by a curve as in the second figure.
2.8 JOINT DISTRIBUTIONS

Joint distributions can easily be generalized to two or more random variables. We shall consider the typical case of two random variables that are either both discrete or both continuous.

Discrete case: - If x and Y are two discrete random variables, we define the

joint probability function of x and y by

P(X = x, Y = y) = f(x, y),

where

(1) f(x, y) ≥ 0;

(2) Σx Σy f(x, y) = 1,

i.e. the sum over all values of x and y is 1.

Suppose that X can assume any one of m values x1, x2, ..., xm and Y can assume any one of n values y1, y2, ..., yn. Then the probability of the event that X = xj and Y = yk is given by

P(X = xj, Y = yk) = f(xj, yk).

A joint probability function for X and Y can be represented by a joint probability table, as shown below.
            y1          y2          ...   yn          Totals
x1          f(x1, y1)   f(x1, y2)   ...   f(x1, yn)   f1(x1)
x2          f(x2, y1)   f(x2, y2)   ...   f(x2, yn)   f1(x2)
...
xm          f(xm, y1)   f(xm, y2)   ...   f(xm, yn)   f1(xm)
Totals      f2(y1)      f2(y2)      ...   f2(yn)      1

The probability that X = xj is obtained by adding all entries in the row corresponding to xj and is given by

P(X = xj) = f1(xj) = Σ_{k=1}^{n} f(xj, yk)

for j = 1, 2, ..., m. These are indicated by the entry totals in the extreme right-hand column (margin) of the table above. Similarly, the probability that Y = yk is obtained by adding all entries in the column corresponding to yk and is given by

P(Y = yk) = f2(yk) = Σ_{j=1}^{m} f(xj, yk)

for k = 1, 2, ..., n. These are indicated by the entry totals in the bottom row (margin) of the probability table. The functions f1(xj) and f2(yk), or simply f1(x) and f2(y), which are obtained from the margins of the table, are referred to as the marginal probability functions of X and Y, respectively. It should be noted that

Σ_{j=1}^{m} f1(xj) = 1,   Σ_{k=1}^{n} f2(yk) = 1,

which can be written as

Σ_{j=1}^{m} Σ_{k=1}^{n} f(xj, yk) = 1.

This is simply the statement that the total probability of all entries is 1. The grand total of 1 is indicated in the lower right-hand corner of the probability table.

The joint distribution function of X and Y is defined by

F(x, y) = P(X ≤ x, Y ≤ y) = Σ_{u ≤ x} Σ_{v ≤ y} f(u, v).

In the probability table, F(x, y) is the sum of all entries for which xj ≤ x and yk ≤ y.
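The marginal bookkeeping described above is easy to mechanise. The sketch below uses a small, invented joint table (the numbers are illustrative, not from the text) and sums rows and columns to obtain f1(x) and f2(y).

```python
# Marginal functions from a joint probability table f(x, y): row totals give f1(x), column totals give f2(y).
joint = {                       # hypothetical f(x, y) for x in {0, 1}, y in {0, 1, 2}
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.15, (1, 1): 0.25, (1, 2): 0.20,
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})
f1 = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # row totals
f2 = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # column totals
print(f1, f2, sum(f1.values()))                        # marginals and grand total 1.0
```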

CONTINUOUS CASE:

The case where both variables are continuous is obtained easily by analogy with the discrete case, on replacing sums by integrals. Thus the joint probability function, or joint density function, of random variables X and Y is defined by a function f(x, y) with

(1) f(x, y) ≥ 0;

(2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

Graphically, z = f(x, y) represents a surface, called the probability surface, as indicated in the figure below. The total volume bounded by this surface and the xy plane is equal to 1, in accordance with property (2) above. The probability that X lies between a and b while Y lies between c and d is given graphically by the shaded volume in the figure below and mathematically by

P(a < X < b, c < Y < d) = ∫_{x=a}^{b} ∫_{y=c}^{d} f(x, y) dy dx.
More generally, if A represents any event, there will be a region RA of the xy plane that corresponds to it. In such a case we can find the probability of A by performing the integration over RA, i.e.

P(A) = ∫∫_{RA} f(x, y) dx dy.

The joint distribution function of X and Y in this case is defined by

F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{u=−∞}^{x} ∫_{v=−∞}^{y} f(u, v) du dv.

It follows, in analogy with the equation dF(x)/dx = f(x), that

∂²F / ∂x∂y = f(x, y).

That is, the density function is obtained by differentiating the distribution function with respect to x and y. From the joint distribution function given above we obtain

P(X ≤ x) = F1(x) = ∫_{u=−∞}^{x} ∫_{v=−∞}^{∞} f(u, v) du dv,

P(Y ≤ y) = F2(y) = ∫_{u=−∞}^{∞} ∫_{v=−∞}^{y} f(u, v) du dv.

The two equations above are called the marginal distribution functions, or simply the distribution functions, of X and Y respectively.
The derivatives of these equations with respect to x and y are then called the marginal density functions, or simply the density functions, of X and Y, and are given by

f1(x) = ∫_{v=−∞}^{∞} f(x, v) dv,   f2(y) = ∫_{u=−∞}^{∞} f(u, y) du.

2.9 INDEPENDENCE OF RANDOM VARIABLES

Suppose that x and y are discrete random variables. If the events X = x and

Y = y are independent events for all x and y, then we say that x and y are

independent random variables. In such case,

P(X = x, Y = y) = P(X = x) P(Y = y),

or equivalently

f(x, y) = f1(x) f2(y).

Conversely, if for all x and y the joint probability function f(x, y) can be expressed as the product of a function of x alone and a function of y alone (which are then the marginal probability functions of X and Y), then X and Y are independent. If, however, f(x, y) cannot be so expressed, then X and Y are dependent.

If X and Y are continuous random variables, we say that they are independent random variables if the events X ≤ x and Y ≤ y are independent events for all x and y. In such a case we can write

P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y), or equivalently

F(x, y) = F1(x) F2(y),

where F1(x) and F2(y) are the (marginal) distribution functions of X and Y, respectively. Conversely, X and Y are independent random variables if, for all x and y, their joint distribution function F(x, y) can be expressed as a product of a function of x alone and a function of y alone (which are the marginal distribution functions of X and Y respectively).

If, however, F(x, y) cannot be so expressed, then X and Y are dependent. For

continuous independent random variables, it is also true that the joint density

function f (x,y) is the product of a function of x alone, f1 (x) and a function

of y alone, f2 (y) , and these are the (marginal) density functions of x and y,

respectively.

2.10 CONDITIONAL DISTRIBUTIONS

We already know that if P(A) > 0,

P(B | A) = P(A ∩ B) / P(A).

If X and Y are discrete random variables and we take the events A: X = x and B: Y = y, the above equation becomes

P(Y = y | X = x) = f(x, y) / f1(x),

where f(x, y) = P(X = x, Y = y) is the joint probability function and f1(x) is the marginal probability function of X. We write

f(y | x) = f(x, y) / f1(x)

and call it the conditional probability function of Y given X. Similarly, the conditional probability function of X given Y is

f(x | y) = f(x, y) / f2(y).

We can also denote f(x | y) and f(y | x) by f1(x | y) and f2(y | x), respectively. These ideas are easily extended to the case where X and Y are continuous random variables. For example, the conditional density function of Y given X is

f(y | x) = f(x, y) / f1(x),

where f(x, y) is the joint density function of X and Y, and f1(x) is the marginal density function of X. Using the equation above, we can find the probability of Y being between c and d given that x < X < x + dx:

P(c < Y < d | x < X < x + dx) = ∫_c^d f(y | x) dy.
Example 2.10.1: A random variable X has the density function

f(x) = c / (x² + 1), where −∞ < x < ∞.

(a) Find the value of the constant c.

(b) Find the probability that X² lies between 1/3 and 1.

Solution:

(a) We must have ∫_{−∞}^{∞} f(x) dx = 1, i.e.

∫_{−∞}^{∞} c/(x² + 1) dx = c tan⁻¹x |_{−∞}^{∞} = c[π/2 − (−π/2)] = cπ = 1,

so c = 1/π.

(b) If 1/3 ≤ x² ≤ 1, then either √3/3 ≤ x ≤ 1 or −1 ≤ x ≤ −√3/3. Thus the required probability is

(1/π) ∫_{−1}^{−√3/3} dx/(x² + 1) + (1/π) ∫_{√3/3}^{1} dx/(x² + 1)
   = (2/π) ∫_{√3/3}^{1} dx/(x² + 1)
   = (2/π) [tan⁻¹(1) − tan⁻¹(√3/3)]
   = (2/π) (π/4 − π/6) = 1/6.

Example 2.10.2: Find the distribution function corresponding to the density function of Example 2.10.1 above.

Solution:

F(x) = ∫_{−∞}^{x} f(u) du = (1/π) ∫_{−∞}^{x} du/(u² + 1)

     = (1/π) [tan⁻¹u]_{−∞}^{x}

     = (1/π) [tan⁻¹x − tan⁻¹(−∞)]

     = (1/π) [tan⁻¹x + π/2] = 1/2 + (1/π) tan⁻¹x.

Example 2.10.3: The distribution function for a random variable X is

F(x) = 1 − e^{−2x},  x ≥ 0;  F(x) = 0,  x < 0.

Find (a) the density function, (b) the probability that X > 2, and (c) the probability that −3 < X ≤ 4.

Solution:

(a) Since F(x) = 1 − e^{−2x} when x ≥ 0,

f(x) = dF(x)/dx = 2e^{−2x},  x ≥ 0;  f(x) = 0,  x < 0.

(b) P(X > 2) = ∫_2^∞ 2e^{−2x} dx = [−e^{−2x}]_2^∞ = −e^{−2(∞)} + e^{−4} = e^{−4}.

(c) P(−3 < X ≤ 4) = ∫_{−3}^{4} f(x) dx = ∫_{−3}^{0} 0 dx + ∫_0^4 2e^{−2x} dx = [−e^{−2x}]_0^4 = 1 − e^{−8}.

Or, P(−3 < X ≤ 4) = P(X ≤ 4) − P(X ≤ −3) = F(4) − F(−3) = (1 − e^{−8}) − 0 = 1 − e^{−8}.

Example 2.10.4: The joint probability function of two discrete random variables X and Y is given by f(x, y) = c(2x + y), where x and y can assume all integers such that 0 ≤ x ≤ 2, 0 ≤ y ≤ 3, and f(x, y) = 0 otherwise.

(a) Find the value of the constant c.

(b) Find P(X = 2, Y = 1).

(c) Find P(X ≥ 1, Y ≤ 2).

Solution:

The sample points (x, y) for which the probabilities are different from zero are indicated in the table below; the probability associated with each point is c(2x + y).

        y:  0    1    2    3    Totals
x = 0       0    c    2c   3c   6c
x = 1       2c   3c   4c   5c   14c
x = 2       4c   5c   6c   7c   22c
Totals      6c   9c   12c  15c  42c

(a) Since the grand total, 42c, must equal 1, we have c = 1/42.

(b) From the table, P(X = 2, Y = 1) = f(2, 1) = 5c = 5/42.

(c) P(X ≥ 1, Y ≤ 2) = Σ_{x ≥ 1} Σ_{y ≤ 2} f(x, y) = (2c + 3c + 4c) + (4c + 5c + 6c) = 24c = 24/42 = 4/7.
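The table arithmetic in this example is easy to verify by summing over the lattice of points directly; the sketch below (standard library only, illustrative) reproduces c, P(X = 2, Y = 1) and P(X ≥ 1, Y ≤ 2).

```python
# Example 2.10.4 check: f(x, y) = c(2x + y) on x in {0,1,2}, y in {0,1,2,3}.
from fractions import Fraction

grand_total = sum(2 * x + y for x in range(3) for y in range(4))    # 42
c = Fraction(1, grand_total)

p_21 = c * (2 * 2 + 1)                                              # P(X = 2, Y = 1) = 5/42
p_region = sum(c * (2 * x + y) for x in range(1, 3) for y in range(3))
print(c, p_21, p_region)                                            # 1/42, 5/42, 4/7
```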

Example 2.10.5: For the distribution of Example 2.10.4, find (a) f(y | 2), (b) P(Y = 1 | X = 2).

Solution:

Using the results from the example above,

f(y | x) = f(x, y) / f1(x) = [(2x + y)/42] / f1(x).

(a) With x = 2, f1(2) = 22/42 = 11/21, so

f(y | 2) = [(4 + y)/42] / (11/21) = (4 + y)/22.

(b) P(Y = 1 | X = 2) = f(1 | 2) = (4 + 1)/22 = 5/22.
2.11 CONCLUSION

In this unit, you studied random variables and their classification, and distribution functions for discrete and continuous random variables. You also learned about graphical representation and joint distributions for discrete and continuous random variables.

Independence and conditional probability of random variables, and related worked examples, were also covered in this unit.

2:12 SUMMARY

In this unit distribution of random variables spaces that you studied included

(1) Meaning of random variables and its classification.

(2) Distribution functions for discrete and continuous random variables.

(3) Graphical representation of random variables

(4) Joint distribution for discrete and continuous random variables.

(5) Independence and conditional probability of random variables

(6) Worked examples on each concept of random variables.

Exercise 2.12.1 (SAE)
Suppose that a pair of fair dice are to be tossed, and let the random

variable x denote the sum of the points. Obtain the probability

distribution for x.

2.13 TUTOR MARKED ASSIGNMENT (TMA)

Exercise 2.13.1: The joint density function of two continuous random variables X and Y is

f(x, y) = cxy for 0 < x < 4, 1 < y < 5;  f(x, y) = 0 otherwise.

(a) Find the value of the constant c.

(b) Find P(1 < X < 2, 2 < Y < 3).

(c) Find P(X ≥ 3, Y ≤ 2).

2.14 REFERENCES/ FURTHER READING / OTHER RESOURCES

Spiegel, M. R., et al. (2009). Probability and Statistics (3rd ed.). McGraw-Hill.
UNIT 3 EXPECTATION OF RANDOM VARIABLES.

3.0 INTRODUCTION.

Expectation is a very important concept in probability and statistics. This unit focuses on the mathematical expectation of random variables. Expected values for discrete and continuous random variables are stated. Variance and standard deviation for discrete and continuous random variables are highlighted, and some important theorems on the expectation of random variables are discussed.

Moments and moment generating functions for random variables are also covered in this unit, as are characteristic functions of random variables, and relevant worked examples on each concept are given to make the unit more meaningful.

3.1 OBJECTIVE

At the end of this unit, students should be able to:

1. Define the expectation of a random variable.

2. Express mathematically the expected value (mean) for discrete and continuous random variables.

3. State and prove theorems on expectation.

4. State the variance and standard deviation for discrete and continuous random variables.

5. Find the moments and moment generating functions for discrete and continuous random variables.

6. Find the characteristic function of a given random variable.

7. Solve related examples on the mathematical expectation of random

variables.

3.2 What Is Expectation of Random Variables?

Let X be a discrete random variable with probability function f(x). Then the expected value of X, E(X), is defined to be

E(X) = Σ_{j} xj f(xj) = Σ x f(x).   (1)

If f(x) is an accurate characterization of the population frequency distribution, then E(X) = µ (the population mean).

For a continuous random variable X having density function f(x), the expectation of X is defined as

E(X) = ∫_{−∞}^{∞} x f(x) dx,   (2)
provided that the integral converges absolutely, where f(x) is the value of the probability density at x.

For example, if X is the number of points rolled with a balanced die, then f(x) = 1/6 for x = 1, 2, 3, 4, 5, 6, and its mathematical expectation is

E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 3.5.

Also, if X has the uniform density function f(x) = 1/2 for 2 < x < 4 and f(x) = 0 elsewhere, then

E(X) = ∫_2^4 x (1/2) dx = x²/4 |_2^4 = 4 − 1 = 3.
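Both of these expectations can be reproduced in a few lines; the sketch below is illustrative (standard library only), using exact fractions for the die and a midpoint rule for the uniform density.

```python
# Mean of a fair die and mean of the uniform density f(x) = 1/2 on (2, 4).
from fractions import Fraction

die_mean = sum(x * Fraction(1, 6) for x in range(1, 7))    # 21/6 = 7/2
print(die_mean)                                            # 7/2

n = 100_000                                                # midpoint rule for integral of x*(1/2) over (2, 4)
h = 2.0 / n
uniform_mean = sum((2 + (i + 0.5) * h) * 0.5 * h for i in range(n))
print(round(uniform_mean, 3))                              # approximately 3.0
```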

In many scientific problems we are interested not only in the expected value of a random variable X but also in the expected value of a random variable related to X. Thus we might be interested in a random variable Y whose values are related to those of X by y = g(x). Then

E[g(X)] = Σ g(x) f(x)   (3)

when X is discrete, and

E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

in the continuous case. Using the examples above, find the expectation of g(x) = x² for the number of points rolled with a balanced die.
Solution:

E[g(X)] = Σ x² f(x), with f(x) = 1/6 for x = 1, 2, 3, 4, 5, 6:

E[g(X)] = 1(1/6) + 4(1/6) + 9(1/6) + 16(1/6) + 25(1/6) + 36(1/6) = 91/6 = 15 1/6.

Similarly, for the random variable with the uniform density function f(x) = 1/2 for 2 < x < 4 and f(x) = 0 elsewhere, we get

E[g(X)] = E(X²) = ∫_2^4 x² f(x) dx = ∫_2^4 x² (1/2) dx = x³/6 |_2^4 = 64/6 − 8/6 = 56/6 = 9 1/3.

3.3 THEOREMS ON EXPECTATION

Theorem 3.3.1: If c is any constant, then E(cX) = cE(X). Also, E[c·g(X)] = c·E[g(X)].

Theorem 3.3.2: If X and Y are any random variables, then

E(X + Y) = E(X) + E(Y).
Theorem 3.3.3: If x and y are independent random variables, then

E (X Y) = E (X) E (Y)

THEOREM 3.3.4:

E[(aX + b)^n] = Σ_{i=0}^{n} C(n, i) a^{n−i} b^{i} E(X^{n−i}),

where C(n, i) is the binomial coefficient. For instance, if n = 1,

E[(aX + b)] = Σ_{i=0}^{1} C(1, i) a^{1−i} b^{i} E(X^{1−i}) = a E(X) + b.

If n = 2,

E[(aX + b)²] = Σ_{i=0}^{2} C(2, i) a^{2−i} b^{i} E(X^{2−i}) = a² E(X²) + 2ab E(X) + b².

Theorem 3.3.4 can easily be proved by mathematical induction.

If Z is a random variable whose values are related to those of the random variables X and Y by means of the equation z = g(x, y), the mathematical expectation is written as

E[g(X, Y)] = Σx Σy g(x, y) f(x, y)   (4)

or

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy.   (5)

In equation (4), f(x, y) is the value of the joint probability function of X and Y at (x, y), while in equation (5), f(x, y) is the value of the joint probability density.

3.4 THE VARIANCE AND STANDARD DEVIATION

We have already noted that the expectation of a random variable X is often called the mean and is denoted by µ. Another quantity of great importance in probability and statistics is the variance, defined by

Var(X) = E[(X − µ)²].   (6)

The variance is a non-negative number. The positive square root of the variance is called the standard deviation and is given by

σX = √Var(X) = √E[(X − µ)²].   (7)

The standard deviation is often denoted by σ instead of σX, and the variance in such a case is σ².

If X is a discrete random variable taking the values x1, x2, ..., xn and having probability function f(x), then the variance is given by

σX² = E[(X − µ)²] = Σ_{j=1}^{n} (xj − µ)² f(xj) = Σ (x − µ)² f(x).   (8)

In the special case of (8) where the probabilities are all equal, we have

σ² = [(x1 − µ)² + (x2 − µ)² + ... + (xn − µ)²] / n,   (9)
which is the variance of a set of n numbers x1, ..., xn.

If X is a continuous random variable with density function f(x), then

σX² = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x) dx,   (10)

provided that the integral converges.

3.5. THEOREMS ON VARIANCE

Theorem 3.5.1: σ² = E[(X − µ)²] = E(X²) − µ² = E(X²) − [E(X)]², where µ = E(X).

Theorem 3.5.2: If c is any constant,

Var(cX) = c² Var(X).   (11)

Theorem 3.5.3: The quantity E[(X − a)²] is a minimum when a = µ = E(X).

Theorem 3.5.4: If X and Y are independent random variables,

Var(X + Y) = Var(X) + Var(Y), or σ²_{X+Y} = σX² + σY²,   (12)

Var(X − Y) = Var(X) + Var(Y), or σ²_{X−Y} = σX² + σY².   (13)
3.6 MOMENTS

The rth moment of a random variable X about the mean µ, also called the rth central moment, is defined as

µr = E[(X − µ)^r],   (14)

where r = 0, 1, 2, .... It follows that µ0 = 1, µ1 = 0 and µ2 = σ², i.e. the second moment about the mean is the variance.

We have, assuming absolute convergence,

µr = Σ (x − µ)^r f(x)   (discrete variable),   (15)

µr = ∫_{−∞}^{∞} (x − µ)^r f(x) dx   (continuous variable).   (16)

The rth moment of X about the origin, also called the rth raw moment, is defined as

µ'r = E(X^r).   (17)

The zeroth and first moments about the mean are respectively 1 and 0, since

µ0 = E[(X − µ)⁰] = E(1) = 1, and µ1 = E[(X − µ)¹] = E(X) − E(µ) = µ − µ = 0.

The second moment about the mean is called the variance and is denoted by σ²:

µ2 = E[(X − µ)²] = σ².

This indicates the degree of dispersion of the distribution.

Generally, moments about the mean describe the shape of the distribution of a random variable.

3.7 MOMENT GENERATING FUNCTIONS

Although the moments of some distributions can be determined directly by evaluating the necessary integral or sum, there exists an alternative technique which often provides considerable simplification. This technique is based on the moment generating function, which is given by

Mx(t) = E(e^{tX}) = Σ e^{tx} f(x)   (discrete variable),   (18)

Mx(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx   (continuous variable).   (19)

But

e^{tx} = 1 + tx + t²x²/2! + t³x³/3! + ... + t^r x^r / r! + ...,

so for the discrete case

Mx(t) = Σ [1 + tx + t²x²/2! + ... + t^r x^r / r! + ...] f(x)
      = Σ f(x) + t Σ x f(x) + (t²/2!) Σ x² f(x) + ... + (t^r/r!) Σ x^r f(x) + ...

⇒ Mx(t) = 1 + µ'1 t + µ'2 t²/2! + ... + µ'r t^r/r! + ....   (20)

Thus if we expand Mx(t) as a power series in t, the coefficient of t^r/r! is µ'r, the rth moment about the origin of the distribution of X.
You will observe that, in the Maclaurin series of a function Mx(t), the coefficient of t^r/r! is the rth derivative of the function with respect to t at t = 0. Hence another way of determining the moments of a distribution is

µ'r = [d^r Mx(t) / dt^r]_{t=0}.

3.8 THEOREMS ON MOMENT GENERATING FUNCTION

Theorem 3.8.1: If Mx(t) is the moment generating function of the random variable X and a and b (≠ 0) are constants, then the moment generating function of (X + a)/b is

M_{(X+a)/b}(t) = e^{at/b} Mx(t/b).

Theorem 3.8.2: If X and Y are independent random variables having moment generating functions Mx(t) and My(t), respectively, then

M_{X+Y}(t) = Mx(t) My(t),   (21)

i.e. the moment generating function of a sum of independent random variables is equal to the product of their moment generating functions.
3.9 CHARACTERISTIC FUNCTIONS

The characteristic function of a random variable X(ω) defined on (Ω, A, P) provides a powerful and widely applicable tool in the theory of probability. The characteristic function has one important advantage over the moment generating function: it can be used to prove both the weak law of large numbers and the central limit theorem, which will be treated in the next unit.

Definition: Let X be a random variable with distribution function F. The characteristic function of X is defined, for real t, by

ψ(t) = ∫ e^{itx} dF(x) = ∫_R e^{itx} P(dx) = E(e^{itX}),

where e^{itx} = cos tx + i sin tx, so that

E(e^{itX}) = E(cos tX) + i E(sin tX).

Properties of characteristic function

(a) (i) ψ(t) is uniformly continuous on the real line.

(ii) ψ(0) = 1.

(iii) |ψ(t)| ≤ 1 for all t, since |e^{itx}| = 1.
Proof:

(i) ψ(t + h) − ψ(t) = ∫_{−∞}^{∞} {e^{i(t+h)x} − e^{itx}} P(dx), so

|ψ(t + h) − ψ(t)| ≤ ∫_{−∞}^{∞} |e^{i(t+h)x} − e^{itx}| P(dx) = ∫_{−∞}^{∞} |e^{itx}| |e^{ihx} − 1| P(dx) ≤ ∫_{−∞}^{∞} |e^{ihx} − 1| P(dx).

By the dominated convergence theorem,

lim_{h→0} ∫_{−∞}^{∞} |e^{ihx} − 1| P(dx) = ∫_{−∞}^{∞} lim_{h→0} |e^{ihx} − 1| P(dx) = 0.

Note: (1) |e^{ihx} − 1| ≤ 2; (2) the limit tends to zero independently of t. Thus ψ(t + h) − ψ(t) → 0 independently of t, and hence ψ(t) is uniformly continuous on the real line.

(ii) ψ(0) = E(e^{i·0·X}) = E(1) = 1.

(iii) |e^{itx}| = |cos tx + i sin tx| = √(cos² tx + sin² tx) = 1, so

|ψ(t)| = |∫ e^{itx} P(dx)| ≤ ∫ |e^{itx}| P(dx) = ∫ P(dx) = 1.

(b) The characteristic function of the sum of independent random variables is the product of their characteristic functions.
Proof:

Let Sn = X1 + X2 + ... + Xn, where X1, X2, ..., Xn are independent random variables. Then

ψ_{Sn}(t) = E[e^{itSn}] = E[e^{it(X1 + ... + Xn)}]
          = E(e^{itX1}) E(e^{itX2}) ... E(e^{itXn})
          = ψ_{X1}(t) ψ_{X2}(t) ... ψ_{Xn}(t) for all real t.

If the Xi are independent and identically distributed, then ψ_{Sn}(t) = [ψ_X(t)]^n.

(c) Unlike moment generating functions, ψ_X(t) is finite for every random variable X and every real number t. The reason is that e^{itx} is bounded, whereas e^{tx} is unbounded for −∞ < t < ∞.

(d) The distribution function of X, and hence the pdf if it exists, can be obtained from the characteristic function using an "inversion formula":

If X is an integer-valued random variable, then

f_X(n) = (1/2π) ∫_{−π}^{π} e^{−itn} ψ_X(t) dt.

If X is a continuous random variable, then

f_X(x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} ψ_X(t) dt,

assuming ∫ |ψ_X(t)| dt < ∞.
(e) The properties of the characteristic function enable us to prove both the weak law of large numbers and the central limit theorem.

Properties (c), (d) and (e) are important advantages of characteristic functions over moment generating functions.

(f) If two random variables have the same characteristic function they

have the same distribution function.

(g) If X has a finite nth moment, then ψ_X^{(n)}(t) exists and is continuous in t:

ψ_X^{(n)}(t) = (d^n/dt^n) E(e^{itX}) = E[(iX)^n e^{itX}].

Thus E(X^n) = ψ_X^{(n)}(0) / i^n.

Example 3.9.1: Let X have an exponential distribution with parameter β. Find the characteristic function of X.

Solution:

ψ_X(t) = E(e^{itX}) = ∫_0^∞ e^{itx} β e^{−βx} dx = β ∫_0^∞ e^{−(β − it)x} dx = β / (β − it).

Example 3.9.2: Let X be uniformly distributed on (−1, 1). Find the characteristic function of X.
Solution:

ψ_X(t) = E(e^{itX}) = ∫_{−1}^{1} e^{itx} (1/2) dx = (1/2) [e^{itx}/(it)]_{−1}^{1},  t ≠ 0

       = (1/2) (e^{it} − e^{−it}) / (it) = sin t / t.

Note: e^{it} = cos t + i sin t and e^{−it} = cos t − i sin t, so e^{it} − e^{−it} = 2i sin t.

Example 3.9.3: Find the characteristic function of the random variable X having density function given by

f(x) = 1/(2a) for |x| ≤ a;  f(x) = 0 otherwise.

Solution:

E(e^{itX}) = ∫_{−∞}^{∞} e^{itx} f(x) dx = (1/2a) ∫_{−a}^{a} e^{itx} dx

           = (1/2a) [e^{itx}/(it)]_{−a}^{a} = (e^{ita} − e^{−ita}) / (2iat)

           = sin at / at = sin θ / θ,

using Euler's formula, with θ = at.

Example 3.9.4: Find the expectation of a discrete random variable X whose probability function is given by

f(x) = (1/2)^x,  x = 1, 2, 3, ....
Solution:

We have

E(X) = Σ_{x=1}^{∞} x (1/2)^x = 1/2 + 2(1/4) + 3(1/8) + 4(1/16) + ....

To find the sum, let

S = 1/2 + 2(1/4) + 3(1/8) + 4(1/16) + ....

Then

(1/2)S = 1/4 + 2(1/8) + 3(1/16) + ....

By subtracting,

(1/2)S = 1/2 + 1/4 + 1/8 + 1/16 + ... = 1.

Therefore S = 2.

Example 3.9.5: A continuous random variable X has probability density given by

f(x) = 2e^{−2x},  x > 0;  f(x) = 0,  x ≤ 0.

Find (a) E(X), (b) E(X²).

Solution:

(a) E(X) = ∫_{−∞}^{∞} x f(x) dx = 2 ∫_0^∞ x e^{−2x} dx

    = 2 [−x e^{−2x}/2 − e^{−2x}/4]_0^∞ = 2(1/4) = 1/2.

(b) E(X²) = ∫_{−∞}^{∞} x² f(x) dx = 2 ∫_0^∞ x² e^{−2x} dx

    = 2 [−x² e^{−2x}/2 − x e^{−2x}/2 − e^{−2x}/4]_0^∞ = 2(1/4) = 1/2.
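Both integrals can be verified symbolically; the following sketch assumes the sympy library is available and is not part of the original text.

```python
# Example 3.9.5 check: for f(x) = 2*exp(-2x) on x > 0, E(X) = 1/2 and E(X^2) = 1/2.
import sympy as sp

x = sp.symbols("x", positive=True)
f = 2 * sp.exp(-2 * x)
EX = sp.integrate(x * f, (x, 0, sp.oo))
EX2 = sp.integrate(x**2 * f, (x, 0, sp.oo))
print(EX, EX2)   # 1/2 1/2
```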
Example 3.9.6 Find (a) the variance, (b) The standard deviation of the

sum obtained in tossing a pair of fair dice.

Solution:

Let X and Y denote the numbers showing on the two dice. Then

E(X) = E(Y) = 1(1/6) + 2(1/6) + ... + 6(1/6) = 7/2,

E(X²) = E(Y²) = 1²(1/6) + 2²(1/6) + ... + 6²(1/6) = 1(1/6) + 4(1/6) + ... + 36(1/6) = 91/6.

Then

Var(X) = Var(Y) = E(X²) − [E(X)]² = 91/6 − 49/4 = 35/12,

and since X and Y are independent,

(a) Var(X + Y) = Var(X) + Var(Y) = 35/12 + 35/12 = 70/12 = 35/6.

(b) Standard deviation: σ_{X+Y} = √Var(X + Y) = √(35/6).
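The same variance can be obtained by brute-force enumeration of the 36 equally likely outcomes; the sketch below (standard library only, illustrative) confirms Var(X + Y) = 35/6.

```python
# Example 3.9.6 check: variance of the sum of two fair dice by direct enumeration.
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p = Fraction(1, 36)
mean = sum((i + j) * p for i, j in outcomes)                  # 7
var = sum((i + j - mean) ** 2 * p for i, j in outcomes)       # 35/6
print(mean, var, float(var) ** 0.5)                           # 7, 35/6, ~2.415
```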

Example 3.9.7: The random variable x can assume the values 1 and -1

with probability 1/2 each. Find (a) the moment generating function, (b) the first four moments about the origin.
Solution:

(a) Mx(t) = E(e^{tX}) = e^{t(1)}(1/2) + e^{t(−1)}(1/2) = (e^{t} + e^{−t}) / 2.   (1)

(b) We have

e^{t} = 1 + t + t²/2! + t³/3! + t⁴/4! + ...,
e^{−t} = 1 − t + t²/2! − t³/3! + t⁴/4! − ...,

so from (1)

Mx(t) = (e^{t} + e^{−t})/2 = 1 + t²/2! + t⁴/4! + ....   (2)

But Mx(t) = 1 + µ'1 t + µ'2 t²/2! + µ'3 t³/3! + µ'4 t⁴/4! + ....

Comparing the two series, we have

µ'1 = 0, µ'2 = 1, µ'3 = 0, µ'4 = 1.

The odd moments are all zero, and the even moments are all one.

Example 3.9.8: A random variable X has density function given by

f(x) = 2e^{−2x},  x ≥ 0;  f(x) = 0,  x < 0.

Find (a) the moment generating function,

(b) the first four moments about the origin

Solution:

(a) Mx(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx = ∫_0^∞ e^{tx}(2e^{−2x}) dx

    = 2 ∫_0^∞ e^{(t−2)x} dx = [2e^{(t−2)x}/(t − 2)]_0^∞ = 2/(2 − t), assuming t < 2.

(b) If t < 2, we have

2/(2 − t) = 1/(1 − t/2) = 1 + t/2 + t²/4 + t³/8 + t⁴/16 + ....

But Mx(t) = 1 + µ'1 t + µ'2 t²/2! + µ'3 t³/3! + µ'4 t⁴/4! + ....

Therefore, on comparing terms,

µ'1 = 1/2, µ'2 = 1/2, µ'3 = 3/4, µ'4 = 3/2.
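The raw moments can also be read off by differentiating the moment generating function at t = 0, as in Section 3.7. The following sketch assumes sympy is available and is offered only as a check, not as part of the course text.

```python
# Example 3.9.8 check: raw moments are the derivatives of M(t) = 2/(2 - t) at t = 0.
import sympy as sp

t = sp.symbols("t")
M = 2 / (2 - t)
moments = [sp.diff(M, t, r).subs(t, 0) for r in range(1, 5)]
print(moments)   # [1/2, 1/2, 3/4, 3/2]
```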

3.10 CONCLUSION

In this unit you have learnt mathematical expectation of random variables

for discrete and continuous random variables. You also learned Variance

and Standard Deviation for discrete and continuous random variables.

Some important theorems on Expectation, Variance and Standard Deviation

are stated.

Moments, moment generating functions and characteristic functions are fully treated, and related worked examples on each concept are shown to make the learning of the unit more meaningful.
3.11 SUMMARY

In this unit, the aspects of expectation of random variables that you have studied include the following:

- Meaning of Expectation for Discrete and Continuous Random Variables

- Mathematical Expectation for Discrete and Continuous Random Variables.

- Expected Value for Variance and Standard Deviation .

- Theorems on the Expectation of Random Variables

- Moment, Moment Generating Function for Discrete and Continuous

Random Variables

- Characteristic Functions of Random Variables

- Working examples on the Mathematical Expectation of Random

Variables

Exercise 3.11.1 (SAE)

The density function of a random variable X is given by

f(x) = x/2 for 0 < x < 2;  f(x) = 0 otherwise.

Find (a) E(X), (b) E(X²).
3.12 TUTOR MARKED ASSIGNMENT (TMA)

Exercise 3.12.1: If X is the random variable of Exercise 3.11.1 above, find E(3X² − 2X).

Exercise 3.12.2: A random variable X has E(X) = 2 and E(X²) = 8. Find

(a) Var(X), (b) σX.

3.13 Reference / Further Reading / other resources

Murray R silage et al (2009) Probability and statistics.

Third addition published by Mc Graw Hill Dr R.A Kasumu (2003)

probability theory first edition published by FATOL VENTURES

LAGOS, DR S.A Okunuga (1998) Probability Distribution 2 lecturer

Materials.

62
UNIT 4: LIMIT THEOREMS

4.0 INTRODUCTION

The purpose of this unit is to acquaint students with the limit theorems: Chebyshev's inequality, convergence of random variables, the weak law of large numbers and the strong law of large numbers. Some of the theorems are proved and related worked examples are shown.

4.1 OBJECTIVES

At the end of this unit, students should be able to:

• State and prove chebyshev’s inequality

• Define Convergence of random variables

• State and prove some theorems on convergence in measure

• State and prove weak law of large numbers

• State the strong law large numbers

4.2 CHEBYSHEV’S INEQUALITY

This is an important theorem in probability and statistics that reveals a general property of discrete or continuous random variables having finite mean and variance; it is known under the name of Chebyshev's inequality.

Theorem 4.2.1: Suppose that X is a random variable (discrete or continuous)
having mean µ and variance σ², which are finite. Then if ε (epsilon) is any positive number,

P(|X − µ| ≥ ε) ≤ σ²/ε²,

or, with ε = kσ,

P(|X − µ| ≥ kσ) ≤ 1/k².

Proof:

We give the proof for continuous random variables; a proof for discrete variables is similar if integrals are replaced by sums.

If f(x) is the density function of X, then

σ² = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x) dx.

Since the integrand is non-negative, the value of the integral can only decrease when the range of integration is diminished. Therefore

σ² ≥ ∫_{|x−µ| ≥ ε} (x − µ)² f(x) dx ≥ ∫_{|x−µ| ≥ ε} ε² f(x) dx = ε² ∫_{|x−µ| ≥ ε} f(x) dx.

But the last integral is equal to P(|X − µ| ≥ ε). Hence

P(|X − µ| ≥ ε) ≤ σ²/ε².
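The inequality can be illustrated by simulation: for any distribution with finite mean and variance, the observed tail frequency P(|X − µ| ≥ kσ) never exceeds 1/k² by more than sampling noise. The sketch below is illustrative (standard library only); the shifted exponential distribution is an arbitrary choice, not from the text.

```python
# Chebyshev's inequality by simulation: frequency of |X - mu| >= k*sigma versus the bound 1/k^2.
import random

random.seed(0)
mu, sigma, k, n = 0.0, 1.0, 2.0, 100_000
samples = [random.expovariate(1.0) - 1.0 for _ in range(n)]   # Exp(1) shifted to mean 0, variance 1
freq = sum(abs(x - mu) >= k * sigma for x in samples) / n
print(freq, "<=", 1 / k**2)   # observed tail frequency (about 0.05) versus the bound 0.25
```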

4.3 CONVERGENCE OF RANDOM VARIABLE

Definition 1: A sequence of random variables Xn is said to converge in distribution, or in law, to X (we write Xn →L X) if the corresponding sequence of distribution functions Fn satisfies Fn → F as n → ∞, where F is the distribution function of X.

Example: Consider the binomial random variables Bi(1, p), Bi(2, p), Bi(3, p), ..., Bi(n, p), .... For large n the (suitably standardized) Bi(n, p) random variable tends in distribution to the normal, and the corresponding distribution functions

Fn(x) = P(Xn ≤ x) = Σ_{k ≤ x} C(n, k) p^k q^{n−k} → F.

If Xn →L X, i.e. Fn(x) → F(x), the distribution function of X, then

(1) Cn(t) → C(t), where Cn(t) stands for the characteristic function of the random variable Xn;

(2) for any bounded continuous function g,

∫ g dFn → ∫ g dF.
Definition 2: The sequence {Xn} of random variables is said to converge in probability to a random variable X if

P{|Xn − X| > ε} → 0 as n → ∞ for every ε > 0. We indicate this by Xn →P X.

Definition 3: The sequence {Xn} of random variables is said to converge in mean square to a random variable X if

E{|Xn − X|²} → 0 as n → ∞. We indicate this by Xn →m.s. X.

Definition 4: The sequence {Xn} of random variables is said to converge with probability one (or almost surely, a.s.) to a constant c if

P{lim_{n→∞} Xn = c} = 1.

We indicate this by Xn → c a.s. or Xn →w.p.1 c; equivalently,

lim_{N→∞} P{sup_{n≥N} |Xn − c| > ε} = 0 for every ε > 0.

Note:

(i) In general, Fn → F does not imply that {Xn} converges in probability to a random variable X with distribution F. For example, suppose X is N(0, 1) and, for all n, let Xn = −X. Then Xn is N(0, 1), that is, Fn(x) = F(x) for all n. But

P{|Xn − X| ≥ ε} = P{|−2X| ≥ ε} = P{|X| ≥ ε/2},

which does not tend to 0 as n → ∞, so Xn does not converge in probability to X.

(ii) If Xn →P X, then Fn → F, i.e. Fn(x) → F(x).
4.4 DE MOIVRE'S THEOREM

Let Xn be Bi(n, p). Then

Yn = (Xn − np) / √(npq) →L N(0, 1) as n → ∞.

Proof: P(Xn = r) = C(n, r) p^r q^{n−r}, and the characteristic function of Xn is

C_{Xn}(t) = E(e^{itXn}) = Σ_{x=0}^{n} e^{itx} P(Xn = x) = Σ_{x=0}^{n} C(n, x) (pe^{it})^x q^{n−x} = (pe^{it} + q)^n.

Therefore the characteristic function of Yn is

C_{Yn}(t) = E(e^{itYn}) = E[e^{it(Xn − np)/√(npq)}] = e^{−itnp/√(npq)} (pe^{it/√(npq)} + q)^n.

Taking logarithms and writing u = 1/√(npq), with q = 1 − p,

log C_{Yn}(t) = −itnpu + n log[1 + p(e^{itu} − 1)].

Using e^{z} = 1 + z + z²/2! + ... and log(1 + z) = z − z²/2 + z³/3 − ...,

p(e^{itu} − 1) = iptu − pt²u²/2 + O(u³),

so

n log[1 + p(e^{itu} − 1)] = n[iptu − pt²u²/2 + p²t²u²/2 + O(u³)]
                          = itnpu − t²/(2q) + pt²/(2q) + O(1/√n)
                          = itnpu − t²/2 + O(1/√n),

since nu² = 1/(pq) and (1 − p)/q = 1. Therefore

log C_{Yn}(t) = −t²/2 + O(1/√n),

and so

C_{Yn}(t) → e^{−t²/2} as n → ∞.

But e^{−t²/2} is the characteristic function of a standard normal distribution N(0, 1). Therefore Yn →L N(0, 1).

Theorem (Poisson limit):

If Xn is binomial Bi(n, p) and n → ∞, p → 0 in such a way that np = λ, then Xn →L Poisson(λ).

Proof:

Recall that C_{Xn}(t) = (pe^{it} + q)^n. With p = λ/n and q = 1 − λ/n,

C_{Xn}(t) = [(λ/n)e^{it} + 1 − λ/n]^n = [1 + (λ/n)(e^{it} − 1)]^n.

Using lim_{n→∞} (1 + z/n)^n = e^{z},

lim_{n→∞, p→0, np=λ} C_{Xn}(t) = e^{λ(e^{it} − 1)} = e^{λe^{it} − λ},

which is the characteristic function of the Poisson(λ) distribution.
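The Poisson limit can be seen numerically by comparing the Bi(n, λ/n) probabilities with the Poisson(λ) probabilities for a large n. The sketch below is illustrative only (standard library; λ = 3 and n = 1000 are arbitrary choices).

```python
# Poisson limit check: for large n with np = lambda fixed, Bi(n, p) probabilities approach Poisson(lambda).
import math

lam, n = 3.0, 1000
p = lam / n
for x in range(6):
    binom = math.comb(n, x) * p**x * (1 - p) ** (n - x)
    poisson = math.exp(-lam) * lam**x / math.factorial(x)
    print(x, round(binom, 5), round(poisson, 5))
```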

4.5 CENTRAL LIMIT THEOREM

Let X1, X2, ..., Xn be independently and identically distributed random variables, each with mean µ and variance σ². Let

Yn = (X1 + X2 + ... + Xn − nµ) / (σ√n) = (X̄ − µ)√n / σ,

where

X̄ = (X1 + X2 + ... + Xn) / n.

Then Yn →L N(0, 1).

Proof:

Let C(t) be the characteristic function of X1, so that C(t) = E(e^{itX1}). Since the random variables are identically distributed,

E(e^{itX1}) = E(e^{itX2}) = ... = E(e^{itXn}) = C(t).

Then

C_{Yn}(t) = E(e^{itYn}) = E[exp(it(X1 + X2 + ... + Xn − nµ)/(σ√n))]

          = e^{−itµ√n/σ} E[e^{itX1/(σ√n)}] E[e^{itX2/(σ√n)}] ... E[e^{itXn/(σ√n)}],

i.e.

C_{Yn}(t) = e^{−itµ√n/σ} [C(t/(σ√n))]^n.

But

C(t) = 1 + α1(it) + α2(it)²/2! + ...,

where α1 = µ'1 = µ and α2 = µ'2 = µ² + σ². Therefore

C_{Yn}(t) = e^{−itµ√n/σ} [1 + iµt/(σ√n) − (µ² + σ²)t²/(2σ²n) + O(n^{−3/2})]^n.

Taking logarithms and using log(1 + z) = z − z²/2 + z³/3 − ...,

log C_{Yn}(t) = −itµ√n/σ + n log[1 + iµt/(σ√n) − (µ² + σ²)t²/(2σ²n) + O(n^{−3/2})]

              = −itµ√n/σ + n[iµt/(σ√n) − (µ² + σ²)t²/(2σ²n) + µ²t²/(2σ²n) + O(n^{−3/2})]

              = −(µ² + σ²)t²/(2σ²) + µ²t²/(2σ²) + O(n^{−1/2})

              = −t²/2 + O(n^{−1/2}).

Hence

C_{Yn}(t) → e^{−t²/2} as n → ∞,

and therefore Yn →L N(0, 1).
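The central limit theorem is easy to see in simulation: standardized sample means of i.i.d. variables behave like N(0, 1) draws. The sketch below is illustrative (standard library only); Uniform(0, 1) summands and the sample sizes are arbitrary choices.

```python
# CLT by simulation: standardized means of Uniform(0,1) samples are approximately N(0, 1).
import random, statistics, math

random.seed(1)
n, reps = 50, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)                 # mean and st. dev. of Uniform(0, 1)
y = [(sum(random.random() for _ in range(n)) / n - mu) * math.sqrt(n) / sigma
     for _ in range(reps)]
print(round(statistics.mean(y), 3), round(statistics.stdev(y), 3))   # ~0 and ~1
print(sum(abs(v) <= 1.96 for v in y) / reps)                         # ~0.95, as for N(0, 1)
```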

Definition 1: A sequence of random variables Xn is said to "converge in probability" (weakly) to the constant c if

lim_{n→∞} P(|Xn − c| > ε) = 0 for every ε > 0. This is written Xn →P c.

Definition 2: A sequence of random variables Xn is said to converge in probability to X if Xn − X →P 0.
4.6 KHINCHINE’S THEOREM

(Weak law of large numbers.) Let X1, X2, ..., Xn be a sequence of independently and identically distributed random variables, each with mean µ. Let

X̄n = (X1 + X2 + X3 + ... + Xn) / n.

Then X̄n →P µ.

Proof:

Let C(t) = E(e^{itX1}) = 1 + itµ + o(t). Then

C_{X̄n}(t) = E(e^{itX̄n}) = E[e^{i(t/n)(X1 + X2 + ... + Xn)}]

           = E(e^{i(t/n)X1}) E(e^{i(t/n)X2}) ... E(e^{i(t/n)Xn})

           = [C(t/n)]^n = [1 + itµ/n + o(1/n)]^n.

Hence

C_{X̄n}(t) → e^{itµ} as n → ∞,

and therefore X̄n →P µ, meaning

lim_{n→∞} P[|X̄n − µ| > ε] = 0,

since e^{itµ} is the characteristic function of a random variable taking the value µ with probability 1.
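The weak law is also visible in simulation: the sample mean of i.i.d. variables settles around µ as n grows. The sketch below is illustrative (standard library only); Exp(1) summands, for which µ = 1, are an arbitrary choice.

```python
# Khinchine's weak law by simulation: the sample mean of i.i.d. Exp(1) variables approaches mu = 1.
import random

random.seed(2)
for n in (10, 100, 1_000, 10_000):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    print(n, round(xbar, 3))        # sample means drift toward 1 as n increases
```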
THEOREM 4 (The strong law of large numbers):

Let {Xi} be a sequence of independent random variables such that

E(Xn) = 0, Var(Xn) = σn², and Σ_{n=1}^{∞} σn²/n² < ∞.

Then the sequence (1/n) Σ_{i=1}^{n} Xi converges to 0 almost surely.

Proof:

Let Yn = Xn/n, n = 1, 2, .... Then

E(Yn) = 0 and Σ Var(Yn) = Σ σn²/n² < ∞,

so the series

Σ_{n=1}^{∞} Yn = Σ_{n=1}^{∞} Xn/n

converges almost surely, and hence

lim_{n→∞} (1/n) Σ_{i=1}^{n} Xi = 0 almost surely.

Corollary 1:

If {Xi} is a sequence of independent, identically distributed random variables such that E(Xi) = 0 and the common variance satisfies σ² < ∞, then

Sn/n = (Σ Xi)/n → 0 almost surely.

Corollary 2:

Let everything be as in the theorem above, except that E(Xn) = µ for all n. Then

Sn/n → µ almost surely.

Example 4.6.1:

Suppose

Xn = 0 with probability 1/n;  Xn = 1 with probability 1 − 1/n,

and X = 1 with probability 1. Show that Xn →P X.

Solution:

Consider |Xn − X|. The only possible values of |Xn − X| are 0 and 1, so

|Xn − X| = 0 with probability 1 − 1/n;  |Xn − X| = 1 with probability 1/n.

Hence

P{|Xn − X| ≤ x} = 0 for x < 0;  1 − 1/n for 0 ≤ x < 1;  1 for x ≥ 1,

so that

lim_{n→∞} P{|Xn − X| ≤ x} = 0 for x < 0;  1 for x ≥ 0,

and therefore Xn →P X.

Also,

Fn(x) = 0 for x < 0;  1/n for 0 ≤ x < 1;  1 for x ≥ 1,   F(x) = 0 for x < 1;  1 for x ≥ 1,

so Fn(x) → F(x) for all x.

Note: Xn →P X implies Xn →d X.

Example 4.6.2:

Let X1, X2, ..., Xn be a sequence of independent, identically distributed Poisson random variables with parameter λ. Then

E{|X̄n − λ|²} = Var(X̄n) = λ/n → 0 as n → ∞.

Therefore X̄n → λ in mean square (quadratic mean).

Example:

Let X1, X2, ..., Xn be a sequence of i.i.d. Poisson random variables. Then, by Chebyshev's inequality,

P{|X̄n − λ| > ε} ≤ Var(X̄n)/ε² = λ/(nε²) → 0 as n → ∞.

Therefore X̄n →P λ.

EXAMPLE 4.6.3:

Convergence in mean square implies convergence in probability. By Chebyshev's inequality,

P{|Xn − X| > ε} ≤ E(|Xn − X|²)/ε² → 0 as n → ∞,

so Xn →P X.

In general, convergence in rth mean implies convergence in probability.

4.7 CONCLUSION

In this unit you have learned Chebyshev's inequality, its proof and its application. You have also learned about convergence of random variables under its different definitions, De Moivre's theorem, and the central limit theorem, using characteristic functions to establish convergence.

Moreover, the weak and strong laws of large numbers were discussed, and related worked examples on the theorems were treated.
4.80 SUMMARY

In this unit the following concepts have been learned:

1. Chebyshev's Inequality

2. Convergence of Random Variables

3. De Moivre's Theorem

4. Central Limit Theorem

5. Weak Law of Large Numbers

6. Strong Law of Large Numbers

7. Relevant examples on the theorems.

EXERCISE 4.80.1 (SAE)

A random variable X has mean 3 and variance 2. Use Chebyshev's inequality to obtain an upper bound for

(a) P(|X − 3| ≥ 2), (b) P(|X − 3| ≥ ).

EXERCISE 4.80.2

Show that the (weak) law of large numbers can be stated as

lim_{n→∞} P(|Sn/n − µ| < ε) = 1.
4.90 TUTOR MARKED ASSIGNMENT (TMA)

(a) State what it means for a sequence Xn of random variables to converge

(i) in mean square to a random variable X;

(ii) with probability one (almost surely) to a constant c.

(b) State and prove the central limit theorem.

5.0 REFERENCES / FURTHER READING

Kasumu, R. A. (2003). Probability Theory (1st ed.). Lagos: Fatol Ventures.

Okunuga, S. A. (1998). Probability Distribution 2 (lecture materials).

Spiegel, M. R., et al. (2009). Schaum's Outlines: Probability and Statistics. McGraw-Hill.
