
UNIT 7 DISCRETE RANDOM VARIABLE AND ITS PROBABILITY DISTRIBUTION
Structure
7.1 Introduction
Objectives
7.2 Random Variable
7.3 Two or More Random Variables
Joint Distribution of Random Variables
Marginal Distributions and Independence
7.4 Mathematical Expectation
7.5 Variance, Covariance and Correlation Coefficient
7.6 Moments and Moment Generating Function
7.7 Distribution of Sum of Two Random Variables
7.8 Summary
7.9 Solutions and Answers

7.1 INTRODUCTION

We have seen a number of examples of sample spaces of random experiments in the previous two units. You must have noticed that in most practical applications a numerical value is associated with each outcome of a random experiment. Mathematically speaking, we have a real-valued function defined on the sample space. Such a function is called a random variable. This unit is devoted to the study of a random variable defined on a discrete sample space. We introduce the concept of such a random variable and its probability distribution in Sec. 7.2. In Sec. 7.3 we describe the joint probability distribution of two or more random variables, which leads to a discussion of marginal distributions and independence of random variables. The mathematical expectation (mean) and variance of a random variable, and the covariance and correlation of two random variables, are discussed in Sections 7.4 and 7.5, respectively. Then we generalise these concepts to introduce moments and moment generating functions. You have already come across these terms in the context of a frequency distribution in Block 1. Here we are going to discuss them in the context of a discrete probability distribution. We conclude this unit with an introduction to the problem of obtaining the distribution of the sum of two random variables.

Objectives
A study of this unit would enable you to :
define a random variable and specify its probability distribution,
specify the joint distribution of two or more random variables,
obtain their marginal distributions and examine them for their independence,
define and calculate the means, variances, covariances and correlation coefficients of
random variables,
define moments and obtain moment generating functions,
obtain the probability distribution of the sum of two random variables.
7.2 RANDOM VARIABLE
In the first two units of this block we have introduced the concepts of a random experiment,
associated sample space and probability of an event. With the help of these we study the
uncertainties associated with such experiments. We usually find that a numerical
measurement or quantity is associated with a random experiment. Consider the following
examples :
1) A person invests Rs. 1 in purchasing a lottery ticket. He either wins the first prize of
Rs. 100 or loses his rupee. His net gain is either -1 or 99. This net gain cannot be
predicted in advance.
2) The authorities of IGNOU cannot predict in advance the number of students who would join and complete this course. This number could be 0, 1, 2, . . . .
3) The number of calls that a telephone exchange would receive in a specified time interval can be 0, 1, 2, . . . .
4) The total number of defects in a motor cycle coming off a production line can be any number like 0, 1, 2, . . . .
5) The maximum temperature of Delhi on June 05 can be anywhere between 40°C and 50°C.
All these examples have one common feature. They describe a numerical characteristic associated with a random experiment. This characteristic depends on the outcome of the experiment and therefore its value cannot be predicted in advance.
The numerical characteristic associated with a random experiment is a variable quantity which behaves randomly and so we may call it a "random variable". This is, of course, not a technical definition of the term "random variable".
In order to make our ideas precise, we consider an example. Suppose we are interested in the number X of heads obtained in three tosses of a coin. (We could also be interested in the number X of girls in families with three children.)
The sample space Ω consists of the eight points
ω1 = HHH, ω2 = HHT, ω3 = HTH, ω4 = THH, ω5 = TTH, ω6 = THT, ω7 = HTT, ω8 = TTT.
Let us denote by X(ωj) the number of heads obtained when the outcome of our experiment is ωj, where j = 1, 2, . . . , 8. You can easily check that
X(ω1) = 3, X(ω2) = X(ω3) = X(ω4) = 2, X(ω5) = X(ω6) = X(ω7) = 1, X(ω8) = 0.
Do you agree that the number X of heads in three tosses of a coin is a function defined on the sample space Ω ? It assumes the values 0, 1, 2 and 3, as you have seen above. Observe now that X = k means that there are k heads in the outcome, so that
X = 0 iff the outcome is ω8,
X = 1 iff the outcome is ω5, ω6 or ω7,
X = 2 iff the outcome is ω2, ω3 or ω4,
X = 3 iff the outcome is ω1.
We can, therefore, make the following identification of events:
[X = 0] = {ω8}, [X = 1] = {ω5, ω6, ω7}, [X = 2] = {ω2, ω3, ω4}, [X = 3] = {ω1}.
Each [X = k], k = 0, 1, 2, 3, is a subset of Ω, and hence is an event.
Suppose now that
P(ω1) = P(ω2) = . . . = P(ω8) = 1/8.
Then, because of the above identification of events [X = j], j = 0, 1, 2, 3, we can write
P[X = 0] = P{ω8} = 1/8, P[X = 1] = P{ω5, ω6, ω7} = P{ω5} + P{ω6} + P{ω7} = 3/8,
P[X = 2] = P{ω2, ω3, ω4} = 3/8, and P[X = 3] = P{ω1} = 1/8,
where we read P[X = j] as "probability that X equals j". Have you noticed that [X = j], j = 0, 1, 2, 3, are mutually disjoint sets, and that
[X = 0] ∪ [X = 1] ∪ [X = 2] ∪ [X = 3] = Ω ?
Also note that
P[X = 0] + P[X = 1] + P[X = 2] + P[X = 3] = 1,
which is as it should be (see the axioms in Unit 6).

Now let us sum up and list the essential properties of the number X of heads obtained in three tosses of a coin.
i) X is a function defined on the sample space Ω.
ii) It assumes a finite number of real values.
iii) We can assign a probability to the event that X assumes a particular value.
iv) The sum of the probabilities that X assumes the different values is one.
In this unit (and in this block) we shall restrict our attention to discrete sample spaces. So, on the basis of the above discussion we give the following definition.
Definition 1 : A random variable is a real-valued function on a discrete sample space Ω.
In what follows we shall denote random variables by capital letters, X, Y, W, U, V, . . . , with or without suffixes. The value of a random variable X at a point ω in the sample space Ω will be denoted by X(ω). We shall also write r.v. for random variable.

Recall that a discrete sample space has either a finite number of points or its points can be arranged in a sequence. Since an r.v. is a function on the sample space, it can take either a finite number of values or its values can be arranged in a sequence. Suppose, therefore, that an r.v. X takes the values x1, x2, . . . . Denote the probability P[X = xj] that X takes the value xj by f(xj), j = 1, 2, . . . . Then we have the following definition.

Definition 2 : The function f(xj) = P[X = xj], j = 1, 2, . . . , defined for the values x1, x2, . . . assumed by X is called the probability mass function of X.

Sometimes it is also called the probability distribution of X.

Do you agree that f(xj) ≥ 0 ? What about the sum f(x1) + f(x2) + . . . ?
Now X is a function from Ω to R. Therefore, the sets [X = xj] = {ω ∈ Ω | X(ω) = xj}, j = 1, 2, . . . , are all mutually disjoint. For, if ω ∈ [X = xj] ∩ [X = xk] for some j ≠ k, then X(ω) = xj and X(ω) = xk, where xj ≠ xk. This is impossible since X is a function.
Further, ∪_j [X = xj] = Ω.
Hence,
Σ_j f(xj) = Σ_j P[X = xj] = P[ ∪_j [X = xj] ] = P(Ω) = 1.
We now give some examples concerning probability mass functions.

Example 1 : We have seen that the probability mass function of the r.v. X denoting the number of heads obtained in three tosses of a coin is
f(0) = 1/8, f(1) = f(2) = 3/8, f(3) = 1/8.
Example 2 : An unbiased die is rolled twice. Let X denote the total score so obtained. The sample space of this experiment is the set Ω = {(x, y) | x, y = 1, . . . , 6} of all ordered pairs (x, y), x being the score obtained on the first throw and y that on the second throw. Each of the 36 points in Ω carries the probability 1/36. Now what values does X take ? X takes the values 2, 3, . . . , 12. In the following table we identify the subsets corresponding to the events [X = j], j = 2, 3, . . . , 12, as well as the corresponding probabilities, f(2), . . . , f(12).
Table 1 : Probability Mass Function of X

j    Event      Subset of Ω                                    f(j)
2    [X = 2]    {(1,1)}                                        1/36
3    [X = 3]    {(1,2), (2,1)}                                 2/36
4    [X = 4]    {(1,3), (2,2), (3,1)}                          3/36
5    [X = 5]    {(1,4), (2,3), (3,2), (4,1)}                   4/36
6    [X = 6]    {(1,5), (2,4), (3,3), (4,2), (5,1)}            5/36
7    [X = 7]    {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}     6/36
8    [X = 8]    {(2,6), (3,5), (4,4), (5,3), (6,2)}            5/36
9    [X = 9]    {(3,6), (4,5), (5,4), (6,3)}                   4/36
10   [X = 10]   {(4,6), (5,5), (6,4)}                          3/36
11   [X = 11]   {(5,6), (6,5)}                                 2/36
12   [X = 12]   {(6,6)}                                        1/36

You can see that f(j) > 0 for all j. You can also check that f(2) + f(3) + . . . + f(12) = 1.
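The counting in Table 1 can be checked with a short computation. The following sketch (plain Python; the function name is mine) enumerates the 36 equally likely outcomes and tallies the p.m.f. of the total score.

```python
from fractions import Fraction
from collections import defaultdict

def dice_sum_pmf():
    """Tally P[X = j] for the total score X of two fair dice."""
    pmf = defaultdict(Fraction)
    for x in range(1, 7):          # score on the first throw
        for y in range(1, 7):      # score on the second throw
            pmf[x + y] += Fraction(1, 36)   # each ordered pair has probability 1/36
    return dict(pmf)

if __name__ == "__main__":
    f = dice_sum_pmf()
    for j in range(2, 13):
        print(j, f[j])             # f(2) = 1/36, f(3) = 2/36, ..., f(7) = 6/36, ..., f(12) = 1/36
    print(sum(f.values()))         # 1, as required for a p.m.f.
```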
Example 3 : An r.v. X takes the values 1, 2, . . . , k, with probabilities
P[X = j] = f(j) = cj, j = 1, . . . , k.
Let us find the constant c such that f(j) is a probability mass function.
If f is a probability mass function, we must have
f(1) + f(2) + . . . + f(k) = 1,
i.e. c(1 + 2 + . . . + k) = 1,
i.e., c k(k + 1)/2 = 1, implying that c = 2/(k(k + 1)). Clearly, c > 0, which implies that f(j) > 0 for every j. Thus, the probability mass function of X is
f(j) = 2j/(k(k + 1)), j = 1, 2, . . . , k.
Now you can extend the arguments used in Example 3 to solve this exercise.

E1) An r.v. X takes the values 0, 1, 2, . . . with probabilities
f(j) = c p^j, j = 0, 1, 2, . . . ,
where 0 < p < 1. Determine c such that f(j) is a probability mass function.

In E1 you must have seen that the f(j)s are terms of a convergent geometric series. Therefore,
we say that the r.v. X with the probability mass function in El has a geometric distribution.
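As a quick numerical sketch of the normalisation in E1 (here c = 1 − p, since Σ_j p^j = 1/(1 − p)), the probabilities can be summed directly; the function name below is mine.

```python
def geometric_pmf(p, j):
    """p.m.f. of E1 with the normalising constant c = 1 - p, since sum_j p**j = 1/(1 - p)."""
    return (1 - p) * p**j

# Numerical check that the probabilities sum to (essentially) one for p = 0.4.
p = 0.4
total = sum(geometric_pmf(p, j) for j in range(200))   # the tail beyond j = 200 is negligible
print(round(total, 12))   # 1.0
```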
Let us return to the discussion of the three tosses of an unbiased coin. The r.v. X, denoting the number of heads so obtained, has the probability mass function
f(0) = 1/8, f(1) = f(2) = 3/8, f(3) = 1/8.
Suppose we want to know the probability P[X ≤ 2]. Since X ≤ 2 iff X = 0 or 1 or 2, and since the events [X = 0], [X = 1] and [X = 2] are disjoint, we can write
P[X ≤ 2] = P[X = 0] + P[X = 1] + P[X = 2] = 1/8 + 3/8 + 3/8 = 7/8.
Similarly, we can obtain the probability that the sum of the scores obtained by rolling a die twice is greater than 8. In fact, for the r.v. X of Example 2,
P[X > 8] = P[X = 9] + P[X = 10] + P[X = 11] + P[X = 12] = (4 + 3 + 2 + 1)/36 = 10/36.
More generally, let H be any subset of the set of possible values of an r.v. X. Then
P[X ∈ H] = Σ f(xj),
using property P7 or P8 of Unit 6. Here the sum is taken over all points xj in the subset H.
Suppose we have a random variable X assuming values x1, x2, . . . with probabilities f(x1), f(x2), . . . , respectively. You may also visualise this as an illustration of a frequency distribution. The values x1, x2, . . . assumed by the random variable correspond to the values of the variable or to mid-values of the class-intervals, and the probabilities f(x1), f(x2), . . . play the role of relative frequencies. We will find this interpretation useful when studying the expectation and variance of a random variable.
In what follows, we shall study the properties of a random variable only in terms of its probability mass function. That is, we may not always refer to the underlying sample space or to the specification of the function on the sample space which yields a random variable with the specified probability mass function. However, we can always visualise a random experiment which leads to a random variable with a specified probability mass function. To see this, imagine a box containing cards bearing the numbers x1, x2, . . . , and let f(xj) be the proportion of cards bearing the number xj, j = 1, 2, . . . . If we choose one of the cards at random from this box, then it will bear the number xj with probability f(xj), j = 1, 2, . . . . Thus, we have a random experiment which yields a random variable with a specified probability distribution. Did you notice that we said that we can visualise a random experiment and not that we can construct an experiment ? This is because we will not be able to construct the box or any other mechanical device if some or all of the probabilities f(x1), f(x2), . . . are irrational numbers or if the discrete random variable takes infinitely many values.
Thus, although for technical reasons it is necessary to consider the sample space on which our r.v. is defined, all its properties can be studied with the help of only the probability mass function. In what follows, we shall use the short form p.m.f. for probability mass function.
But before we go any further, it is time to do some exercises.

E2) Let X1 be the score obtained on the first throw and X2 the score obtained on the second throw of an unbiased die. Define W = X1 − X2. Obtain the p.m.f. of W.
(Hint : Follow the method of Example 2.)
E3) Three cards are drawn without replacement from a deck of 52 playing cards. Find the p.m.f. of the number Y of spades in the three cards.
E4) A person has 4 keys with which to open a lock. He selects one of the keys at random from the 4 keys on the first attempt. Subsequently, he discards the keys already used and selects one key at random from the remaining keys. He may require 1, 2, 3 or 4 attempts to open the lock. Obtain the probability distribution of the number of attempts.
If you have done these exercises, you would have got a fairly good grasp of the p.m.f. Next we study the joint distribution of random variables.

7.3 TWO OR MORE RANDOM VARIABLES

There are many situations where we have to study two or more r.vs. in connection with a random experiment. The following are some examples of such situations. (Recall that you have already come across joint frequency distributions in Unit 4.)
i) A store sells two brands, A and B, of tooth-paste. The sales X and Y of brands A and B, respectively, in one week are of interest. Here X and Y are r.vs., both taking values 0, 1, 2, . . . .
ii) Let X denote the number of boys born in a hospital in one week and Y that of girls born in the same hospital in the same week. Then X and Y are r.vs., both taking the values 0, 1, 2, . . . .
iii) A group of 50 people is vaccinated against a disease and another group of 40 people is not vaccinated. Let X and Y denote the number of people affected by the disease from the two groups. Then X and Y are r.vs. taking values 0, 1, . . . , 50, and 0, 1, . . . , 40, respectively.
iv) Suppose we classify students according to the day of the week they were born. If X1, X2, . . . , X7 denote the number of students with birthdays on Monday, Tuesday, . . . , Sunday from a class of 100 students, then X1, . . . , X7 are r.vs. taking values 0, 1, . . . , 100 subject to the restriction X1 + X2 + . . . + X7 = 100.
We begin this section by describing methods of studying the joint distribution of two or more random variables.

7.3.1 Joint Distribution of Random Variables


Let us consider the following artificial example.
Example 4 : A committee of two persons is formed by selecting them at random and without replacement from a group of 10 persons, of whom 2 are mathematicians, 4 are statisticians and 4 are engineers. Let X and Y denote the number of mathematicians and statisticians, respectively, on the committee. The possible values of X are 0, 1, 2, which are also the possible values of Y. Thus, the ordered pairs (x, y) of the values of X and Y are
(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1) and (2, 2).
The total number of ways of selecting two persons from a group of 10 persons is C(10, 2) = 45. Since the persons are selected at random, each of these 45 ways has the same probability 1/45.
Consider the event [X = 1, Y = 1] that the committee has one mathematician and one statistician. One mathematician can be selected from two in C(2, 1) = 2 ways and one statistician can be selected from 4 statisticians in C(4, 1) = 4 ways. Hence the total number of committees with 1 mathematician and 1 statistician is 2 × 4 = 8. Thus P[X = 1, Y = 1] = 8/45.
To obtain the probability of the event [X = 0, Y = 1], observe that if X = 0, Y = 1, this means that 1 statistician is on the committee and no mathematician is on it. Then the other person on the committee has to be one of the 4 engineers. This engineer can be selected in C(4, 1) = 4 ways. Hence
P[X = 0, Y = 1] = (4 × 4)/45 = 16/45.
Similarly, we can obtain
P[X = 0, Y = 0] = C(4, 2)/45 = 6/45, P[X = 0, Y = 2] = C(4, 2)/45 = 6/45,
P[X = 1, Y = 0] = (2 × 4)/45 = 8/45, P[X = 2, Y = 0] = C(2, 2)/45 = 1/45.
Since the committee has only two members, it is obvious that there are no sample points corresponding to the events [X = 1, Y = 2], [X = 2, Y = 1] and [X = 2, Y = 2]. Hence, their probabilities are all equal to zero.
We now summarise these calculations in the following table.
Table 2 : P[X = x, Y = y] for x, y = 0, 1, 2.

x \ y     0       1       2
0       6/45   16/45    6/45
1       8/45    8/45      0
2       1/45      0       0
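The entries of Table 2 can be reproduced by direct counting. The sketch below uses Python's math.comb for the binomial coefficients; the function name is mine.

```python
from math import comb
from fractions import Fraction

def committee_joint_pmf():
    """P[X = x, Y = y] for x mathematicians (out of 2) and y statisticians (out of 4)
    on a committee of 2 chosen at random from 10 people (the remaining 4 are engineers)."""
    total = comb(10, 2)                       # 45 equally likely committees
    pmf = {}
    for x in range(3):
        for y in range(3):
            e = 2 - x - y                     # number of engineers needed
            ways = comb(2, x) * comb(4, y) * comb(4, e) if e >= 0 else 0
            pmf[(x, y)] = Fraction(ways, total)
    return pmf

print(committee_joint_pmf()[(1, 1)])   # 8/45, as in Table 2
```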

Note that if we denote by f(x, y) the probability P[X = x, Y = y], the function f(x, y) is defined for all pairs (x, y) of values x and y of X and Y, respectively. Moreover,
f(x, y) ≥ 0 and Σ_x Σ_y f(x, y) = 1.
We say that the function f(x, y) is the joint probability mass function of the r.vs. X, Y. More generally, we have the following definition.
Definition 3 : Let X and Y be two r.vs. associated with the same random experiment. Let x1, x2, . . . denote the values of X and y1, y2, . . . denote those of Y. The function f(xj, yk) defined for all ordered pairs (xj, yk), j, k = 1, 2, . . . , by the relation
f(xj, yk) = P[X = xj, Y = yk]
is called the joint probability mass function of X and Y.
Note that, by definition,
f(xj, yk) ≥ 0 and Σ_j Σ_k f(xj, yk) = 1.
Moreover, we should clarify that [X = xj, Y = yk] really stands for the event [X = xj] ∩ [Y = yk], and that [X = xj, Y = yk] is a simplified and accepted way of expressing the intersection of the two events [X = xj] and [Y = yk]. Notice also that in Example 4 we had used x and y as the arguments of the p.m.f. and in the definition given above we are using xj and yk as the arguments. We shall use both notations and trust that it will not cause any confusion.
Now here is an example.
Example 5 : Suppose X and Y are two r.vs. with p.m.f.
f(x, y) = P[X = x, Y = y] = c(x + y), x = 1, 2, 3, 4 and y = 1, 2. What do you think is the value of c ?
The constant c should be such that c(x + y) > 0 and
Σ_{x=1}^{4} Σ_{y=1}^{2} f(x, y) = 1.
The left side of the above equation is
c[(1+1) + (1+2) + (2+1) + (2+2) + (3+1) + (3+2) + (4+1) + (4+2)] = 32c.
Hence, c = 1/32 and the joint p.m.f. of X, Y is
f(x, y) = (x + y)/32, x = 1, 2, 3, 4; y = 1, 2.
Let us also obtain P[X = 2], P[Y = 1] and P[Y = 2].
Since Y takes the two values 1 and 2, we can write
[X = 2] = [X = 2, Y = 1] ∪ [X = 2, Y = 2].
Moreover, [Y = 1] and [Y = 2] are disjoint events and therefore the events [X = 2, Y = 1] and [X = 2, Y = 2] are also disjoint. Hence,
P[X = 2] = f(2, 1) + f(2, 2) = 3/32 + 4/32 = 7/32.
Similarly,
P[Y = 1] = f(1, 1) + f(2, 1) + f(3, 1) + f(4, 1) = (2 + 3 + 4 + 5)/32 = 14/32.
Now, since Y takes only the two values 1 and 2,
P[Y = 2] = 1 − P[Y = 1] = 18/32.
Note that P[Y = 1] = 14/32 and P[Y = 2] = 18/32 specify the p.m.f. of Y when X and Y have the given joint p.m.f. It is called the marginal probability mass function of Y. We will discuss this concept in more detail in the next section.
Example 6 : Let us obtain the conditional probability P[X = 4 | Y = 2], that is, the probability that X = 4 given Y = 2, for Example 5.
By definition of the conditional probability,
P[X = 4 | Y = 2] = P[X = 4, Y = 2] / P[Y = 2] = (6/32) / (18/32) = 1/3.
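A small sketch of the same conditional-probability calculation, assuming the p.m.f. f(x, y) = (x + y)/32 of Example 5 (the helper name is mine):

```python
from fractions import Fraction

def f(x, y):
    """Joint p.m.f. of Example 5: f(x, y) = (x + y)/32, x = 1..4, y = 1, 2."""
    return Fraction(x + y, 32)

p_y2 = sum(f(x, 2) for x in range(1, 5))    # marginal P[Y = 2] = 18/32
p_x4_given_y2 = f(4, 2) / p_y2              # P[X = 4 | Y = 2] = (6/32)/(18/32)
print(p_y2, p_x4_given_y2)                  # 9/16 and 1/3
```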

Examples 5 and 6 illustrate that we can obtain probabilities of events associated with r.vs. X and Y by using the joint p.m.f. Hence, as in the case of a single r.v., the joint p.m.f. of X and Y is said to specify the joint probability distribution of X, Y. It is therefore enough to specify the joint p.m.f. of X and Y to answer any question about them.
The concept of the joint distribution of two r.vs. is easily extended to that of three r.vs. X, Y and Z. We now need to specify the p.m.f.
f(xj, yk, zi) = P[X = xj, Y = yk, Z = zi]
for all ordered triples (xj, yk, zi) of values xj, yk and zi of X, Y and Z.
We can now further extend these concepts to more than three r.vs. But we omit the details since, in this course, we shall be mostly dealing with the joint distribution of a pair of r.vs. See if you can solve these exercises now.

"5) The joint p.m.f. f(x, y) of two r.vs. X and Y is given in the following table.

a) Obtain (i) P[X = 21, (ii) P[Y = 0] (iii) P[X = 1, Y I 2 1 (iv) P[X 5 2, Y = 01
(v) P[X = 2 I Y =O].
b) Are the events [X = 21 and [Y = 0] independent ?
c) Calculate P[X + Y = 41.
E6) The joint p.m.f. of two r.vs., XI and X2 is given by

where xl, x2 = 0, 1, . . . , 10, subject to the restriction that x l + x2 I10.


Find the following probabilities
a) PIXI = 31
b) P[X2 2 41
c) P[X, = 3 1 x 2 2 4 ]

Let's turn our attention to marginal distributions now.

7.3.2 Marginal Distributions and Independence


In Unit 4 we have discussed the notion of marginal frequency distributions, where we fix one of the variables and study the frequency distribution of the other. We now study the p.m.f. of the marginal distribution. Later we shall use this to define independent random variables.
Let X and Y be r.vs. with values x1, x2, . . . and y1, y2, . . . , respectively, and joint p.m.f.
f(xj, yk) = P[X = xj, Y = yk].
We define new functions g and h as follows :
g(xj) = Σ_k f(xj, yk), j = 1, 2, . . .                                . . . (1)
and
h(yk) = Σ_j f(xj, yk), k = 1, 2, . . . .                              . . . (2)
In (1), we keep the value xj of X fixed and sum f(xj, yk) over all values yk of Y. On the other hand, in (2), yk is kept fixed and f(xj, yk) is summed over all values of X. We wish to interpret the function g(xj), defined for all values xj of X, and the function h(yk), defined for all values yk of Y. Notice that both g and h, being sums of non-negative numbers, are themselves non-negative. Further,
Σ_j g(xj) = Σ_j Σ_k f(xj, yk) = 1 and Σ_k h(yk) = Σ_k Σ_j f(xj, yk) = 1.
Thus, g(xj) has all the properties of a p.m.f. Similarly, you can verify that h(yk) also has all the properties of a p.m.f. We call these the p.m.fs. of the marginal distributions of X and Y, as you can see from the following definition.

Definition 4 : The function g(xj) defined for all values xj of the r.v. X by the relation
g(xj) = Σ_k f(xj, yk)
is called the p.m.f. of the marginal distribution of X. Similarly, h(yk), defined for all the values yk of the r.v. Y by the relation
h(yk) = Σ_j f(xj, yk),
is called the p.m.f. of the marginal distribution of Y.
Let's try to understand this concept by taking an example.
Example 7 : Let X, Y be two r.vs. with joint p.m.f. f(x, y) defined by the following table.
Table 3 : Joint p.m.f. f(x, y)

The marginal p.m.f. g(x) of X is obtained by summing all the elements in each of the rows. Similarly, the marginal p.m.f. of Y is obtained by summing all the elements in each of the columns. This procedure is a straightforward consequence of the definition of g(x) and of h(y) when the joint p.m.f. is given in the above tabular form. In fact, we have
g(0) = P[X = 0] = 1/3,
g(1) = P[X = 1] = 5/24,
g(2) = P[X = 2] = 11/24.
Similarly,
h(0) = P[Y = 0] = 6/24,
h(1) = P[Y = 1] = 9/24,
h(2) = P[Y = 2] = 6/24,
h(3) = P[Y = 3] = 3/24.
In this example, we have g(x) = P[X = x] and h(y) = P[Y = y] for all x and y.
Is it a coincidence ? No.
Notice that, in the general situation,
[X = xj] = ∪_k [X = xj, Y = yk],
and recall that the events [X = xj, Y = yk] for fixed xj and different yk values are disjoint. Hence, by properties P7 and P8 of Unit 6,
P[X = xj] = Σ_k P[X = xj, Y = yk] = Σ_k f(xj, yk) = g(xj).
Similarly, P[Y = yk] = Σ_j f(xj, yk) = h(yk).
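The row-sum/column-sum recipe is easy to express in code. The sketch below computes marginals from any joint p.m.f. stored as a dictionary; here it is applied to the joint p.m.f. of Table 2 (Example 4). The names are mine.

```python
from fractions import Fraction
from collections import defaultdict

def marginals(joint):
    """Return the marginal p.m.fs. g(x) = sum_y f(x, y) and h(y) = sum_x f(x, y)."""
    g, h = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        g[x] += p
        h[y] += p
    return dict(g), dict(h)

# Joint p.m.f. of Example 4 (Table 2).
f = {(0, 0): Fraction(6, 45), (0, 1): Fraction(16, 45), (0, 2): Fraction(6, 45),
     (1, 0): Fraction(8, 45), (1, 1): Fraction(8, 45),  (1, 2): Fraction(0),
     (2, 0): Fraction(1, 45), (2, 1): Fraction(0),      (2, 2): Fraction(0)}

g, h = marginals(f)
print(g)   # X-marginal: 28/45, 16/45, 1/45
print(h)   # Y-marginal: 1/3, 8/15, 2/15
```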

Here is another example.


Example 8 : Let the joint p.m.f. of X and Y be given by f(x, y) = (x + y)/30 for x = 0, 1, 2, 3, and y = 0, 1, 2.
Then
g(x) = Σ_{y=0}^{2} (x + y)/30 = (3x + 3)/30 = (x + 1)/10, x = 0, 1, 2, 3.
Similarly,
h(y) = Σ_{x=0}^{3} (x + y)/30 = (6 + 4y)/30 = (3 + 2y)/15, y = 0, 1, 2.
The discussion so far tells us that we can determine the marginal p.m.fs. from a knowledge of the joint p.m.f. of the two r.vs. But is it possible to determine the joint p.m.f. from a knowledge of the marginal p.m.fs. ? To answer this, we consider the following two distinct joint p.m.fs. f1 and f2 and the corresponding marginal p.m.fs. The first p.m.f. is given by

The corresponding marginal p.m.fs. are
g1(0) = 1/2, g1(1) = 1/2,
h1(0) = 3/8, h1(1) = 5/8.
Now we define the second p.m.f. as
f2(0, 0) = 3/16, f2(0, 1) = 5/16,
f2(1, 0) = 3/16, f2(1, 1) = 5/16.
For this joint p.m.f., the marginal p.m.fs. are
g2(0) = 1/2, g2(1) = 1/2,
h2(0) = 3/8, h2(1) = 5/8.
So, what do we find ? Although the joint p.m.fs. f1 and f2 are different, they lead us to the same marginal p.m.fs., g1 = g2, h1 = h2. In other words, the marginal distributions of X and Y do not determine their joint distribution uniquely. However, there is one particular situation where this is possible. We now discuss this situation in detail.
Let X and Y be two r.vs. with joint p.m.f. f(x, y) specified in the following table :
Table 4

Consider f(1, 3) = P[X = 1, Y = 3] = 1/12, the probability of the intersection [X = 1] ∩ [Y = 3] of the events [X = 1] and [Y = 3]. Since g(1) = P[X = 1] = 1/6 and h(3) = P[Y = 3] = 1/2, we have the relation
P[X = 1, Y = 3] = f(1, 3) = g(1) h(3) = P[X = 1] P[Y = 3].
This means that the events [X = 1] and [Y = 3] are independent. In fact, Table 4 is so constructed that
P[X = x, Y = y] = f(x, y) = g(x) h(y) = P[X = x] P[Y = y]
for all x = 0, 1, 2, and y = 0, 1, 2, 3. In other words, for all possible values x of X and y of Y, the events [X = x] and [Y = y] are independent. In such a situation, we are justified in asserting that the r.vs. X and Y are independent r.vs. More formally, we have the following definition for independent r.vs.

Definition 5 : Let X and Y be two r.vs. with joint p.m.f. f(xj, yk) and marginal p.m.fs. g(xj) and h(yk) of X and Y, respectively. If for all pairs (xj, yk),
f(xj, yk) = g(xj) h(yk),                                              . . . (3)
then we say that the r.vs. X and Y are stochastically independent or, simply, independent.
Now we give an equivalent definition in the following remark.
Remark 1 : An equivalent definition of independence of X and Y would be as follows : The r.vs. X and Y are independent if for all pairs (xj, yk), the events [X = xj] and [Y = yk] are independent, i.e., if
P[X = xj, Y = yk] = P[X = xj] P[Y = yk]                               . . . (4)
for all pairs (xj, yk).

Note that we have defined independence of r.vs. in terms of independence of events. Thus, no essentially new concept is involved in the definition of independence of two r.vs. except that the product relation (3), or equivalently the product relation (4), should hold for all pairs (x, y) of values x of X and y of Y.
Note that the r.vs. X and Y of Examples 7 and 8 are not independent. You can check that f(0, 0) ≠ g(0) h(0) in Example 7. Similarly, in Example 8,
f(0, 0) = 0 ≠ g(0) h(0) = (1/10)(3/15).
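An independence check is just a comparison of f(x, y) with g(x) h(y) over all pairs. A sketch, applied here to the joint p.m.f. of Example 8 (assumed to be f(x, y) = (x + y)/30); the function name is mine.

```python
from fractions import Fraction

def is_independent(joint):
    """Check whether f(x, y) = g(x) h(y) for every pair (x, y)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    g = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    h = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(joint[(x, y)] == g[x] * h[y] for x in xs for y in ys)

f8 = {(x, y): Fraction(x + y, 30) for x in range(4) for y in range(3)}
print(is_independent(f8))    # False: e.g. f(0, 0) = 0 while g(0) h(0) > 0
```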

With this background, can you extend the concept of independence of two r.vs. to that of n (> 2) r.vs. ?
Definition 6 : Let X1, . . . , Xn be n r.vs. They are said to be independent if
P[X1 = x1, . . . , Xn = xn] = P[X1 = x1] P[X2 = x2] . . . P[Xn = xn]
for all n-tuples (x1, . . . , xn) of values x1 of X1, x2 of X2, . . . , xn of Xn.

If you have followed the ideas introduced in this section, then you should be able to solve these exercises.

E7) Determine the value of c so that the following functions represent the joint p.m.f. of the r.vs. X and Y.
a) f(x, y) = c, x = 1, 2, 3; y = 1, 2, 3.
b) f(x, y) = c(x² + y²), x = −1, 1; y = −2, 2.
c) f(x, y) = c(x + y + 1), x = 0, 1, 2, 3; y = 0, 1, 2.
E8) Obtain the marginal p.m.fs. of X and Y in each of the cases of E7.
E9) Examine if X and Y are independent in each of the cases of E7.
E10) Suppose the r.vs. X and Y have the joint p.m.f. f(x, y) specified by the following table.

a) Obtain the marginal p.m.fs. of X and Y.
b) Determine if X and Y are independent.

So far, you have seen that the p.m.f. of one or more random variables can be visualised as
their frequency distribution where probabilities correspond to relative frequencies. You also
know that given a frequency distribution, we can find its mean, variance, covariance and
moments. Let us study these concepts for the p.m.f. of a r.v. now.

7.4 MATHEMATICAL EXPECTATION

Suppose that the scores obtained by five students in a class are
40, 50, 55, 60 and 75.
What is the average or arithmetic mean score of these five students ? This average is
(40 + 50 + 55 + 60 + 75)/5 = 56.00.
The problem becomes a little more complicated if we have the following frequency distribution of the scores of 100 students in the class.

Score       40   50   55   60   75
Frequency   10   15   35   25   15

By the usual formula you can compute the average score as
(1/100) (10 × 40 + 15 × 50 + 35 × 55 + 25 × 60 + 15 × 75).
However, let us rewrite this in a slightly different form as follows. The required average is
40 × (10/100) + 50 × (15/100) + 55 × (35/100) + 60 × (25/100) + 75 × (15/100) = 57.00.
Note that the fractions 10/100, 15/100, 35/100, 25/100 and 15/100 are, in fact, the relative frequencies or the proportions of the students who obtain the scores 40, 50, 55, 60 and 75, respectively.
As you know, the arithmetic mean is a measure of central tendency giving a single number
around which the observations are distributed. Now we want to define a similar measure of
central tendency for the probability distributions of a r.v. X, which assumes different values
with their associated probabilities. The only difference is that the role of relative frequencies
is now taken over by the probabilities.
The simplest situation is to consider a r.v. X which takes two values 1 and 2, and suppose that P[X = 1] = 1/3 and P[X = 2] = 2/3. The mean, or the mathematical expectation, of this r.v. X is defined to be
1 × (1/3) + 2 × (2/3) = 5/3.
Suppose now that a r.v. X takes a finite number n of values x1, x2, . . . , xn with probabilities f(x1), f(x2), . . . , f(xn). Then we define the expectation of X as
E(X) = x1 f(x1) + x2 f(x2) + . . . + xn f(xn) = Σ_j xj f(xj).
The symbol E(X) is read as "the expectation of X".
Suppose now that the r.v. X assumes an infinity of values x1, x2, . . . with associated probabilities f(x1), f(x2), . . . . The expectation of X is now defined by the infinite series
E(X) = Σ_{j=1}^{∞} xj f(xj),
provided the infinite series converges absolutely, i.e., provided Σ_{j=1}^{∞} |xj| f(xj) is a convergent series. (A series Σ_j aj is called convergent if Sn = Σ_{i=1}^{n} ai tends to a finite limit as n → ∞. But don't spend much time over this definition; you will be asked to sum only geometric series in this course.)
Notice that if Σ_j |xj| f(xj) is a convergent series, then Σ_j |xj| f(xj) < ∞, i.e., E(X) is a finite number, or we say that E(X) is finite.
Formally, we have the following definition, which is valid both when X assumes a finite number of values and when it assumes a countably infinite number of values.
Definition 7 : The expectation E(X) of the r.v. X assuming values x1, x2, . . . with probabilities f(x1), f(x2), . . . is given by
E(X) = Σ_j xj f(xj),
provided Σ_j |xj| f(xj) is finite.
We shall not discuss the definition of E(X) when the infinite series Σ_j |xj| f(xj) does not converge. The discussion of such cases is beyond the scope of this course and so we shall consider only those r.vs. which have a finite expectation.
The mean of X, expected value of X, mathematical expectation of X, and mean of the distribution of X are some of the synonyms in use for E(X).
We now illustrate the computation of E(X) through some examples.
Example 9 : Let us find the expected score obtained on the roll of an unbiased die.
The score X obtained on the roll of a die is 1, 2, 3, 4, 5 or 6 and each has probability 1/6, i.e. P[X = x] = 1/6 for x = 1, 2, . . . , 6. Hence,
E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5.
Example 10 : A lottery consists of 100 tickets valued at Rs. 2 each. A person buys 1 ticket and would gain a prize of Rs. 100 if his ticket is the winning ticket. Let us find his expected gain.
The probability that the person wins the prize is 1/100 and that he loses is 99/100. His net gain X is Rs. 98 if he wins, and is Rs. (−2) if he loses. Thus, we need to find E(X) when P[X = −2] = 99/100 and P[X = 98] = 1/100. We get
E(X) = (−2)(99/100) + 98(1/100) = (−198 + 98)/100 = −1.
Thus, his net expected gain is Rs. (−1), i.e., his expected loss is Rs. 1.
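Both expectations are one-line computations once the p.m.f. is written down. A sketch in Python (the variable and function names are mine):

```python
from fractions import Fraction

def expectation(pmf):
    """E(X) = sum_j x_j f(x_j) for a p.m.f. given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}               # Example 9
lottery = {-2: Fraction(99, 100), 98: Fraction(1, 100)}      # Example 10
print(expectation(die), expectation(lottery))                # 7/2 and -1
```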
Now we consider two situations where the r.v. takes an infinite number of values.
Example 11 : Suppose we want to find the expected value of a r.v. X which has the p.m.f.
f(x) = P[X = x] = (2/3)(1/3)^x, x = 0, 1, 2, . . . .
By definition,
E(X) = Σ_{x=0}^{∞} x (2/3)(1/3)^x = (2/3) Σ_{x=1}^{∞} x (1/3)^x = 1/2.
(The sum of the series appearing here is evaluated after Example 15 below.)
Many a time we need to calculate not E(X) but the expected value of a function of X, like X², cos X, exp(tX), etc. Of course, all such functions are again r.vs. and we can use the definition to calculate their expectation. However, the following example suggests a simple solution.

Example 12 : Let X be a r.v. with p.m.f. given by the following table.

We want to compute E(X²).
Since X assumes the values −2, −1, 0, 1, 2, the values of X² are 0, 1 and 4. Do you agree that
P[X² = 0] = P[X = 0] = 4/10 ?
Now, since X² = 1 iff X = 1 or X = −1,
P[X² = 1] = P[[X = 1] ∪ [X = −1]] = P[X = 1] + P[X = −1] = 4/10.
Similarly, P[X² = 4] = P[X = 2] + P[X = −2] = 2/10.
In short, the p.m.f. of X² is specified by
P[X² = 0] = 4/10, P[X² = 1] = 4/10, P[X² = 4] = 2/10.
Hence,
E(X²) = 0 × (4/10) + 1 × (4/10) + 4 × (2/10) = 1.2.                  . . . (7)
Here we first obtained the p.m.f. of X² and then used the definition of E(X²). This, in general, could be a cumbersome procedure. So let's try another way.
Let us calculate Σ_{x=−2}^{2} x² f(x). We get
Σ_{x=−2}^{2} x² f(x) = 4 f(−2) + 1 f(−1) + 0 f(0) + 1 f(1) + 4 f(2) = 1.2.      . . . (8)
The equality E(X²) = Σ_{x=−2}^{2} x² f(x), brought out by (7) and (8), is not an accident. It is a consequence of some detailed analysis which leads us to the following theorem.


Theorem 1 : Let X be a r.v. assuming values x1, x2, . . . with probabilities f(x1), f(x2), . . . . Let φ(X) be a r.v. which is a function of X, i.e., when X = xj, φ(X) = φ(xj). Then
E[φ(X)] = Σ_j φ(xj) f(xj),                                           . . . (9)
provided the series on the right hand side of (9) is absolutely convergent.
We shall not prove this theorem. But we would like to bring out some important points concerning it.
Remark 2 :
i) We have the following useful interpretation for (9) :
E[φ(X)] = Σ_j φ(xj) P[X = xj].
ii) The illustration in Example 12 is not a proof of the above theorem. The proof is beyond the scope of this course.
iii) Suppose X and Y are two r.vs. with joint p.m.f. f(xj, yk). Let φ be a real-valued function defined on the product set G × H, where G = {x1, x2, . . .} is the set of values of X and H = {y1, y2, . . .} is the set of values of Y. (We'll be interested in functions of the type φ(xj, yk) = xj + yk, φ(xj, yk) = xj, and φ(xj, yk) = xj yk.) Let us denote by φ(X, Y) the r.v. which assumes the value φ(xj, yk) when X = xj and Y = yk. We define, by analogy with the result of Theorem 1,
E[φ(X, Y)] = Σ_j Σ_k φ(xj, yk) f(xj, yk),
provided, of course, the infinite series on the right is absolutely convergent.


Now consider the random variable φ(X) = aX + b, where a and b are constants. What then will be the expectation of aX + b ? Suppose X assumes the values x1, x2, . . . with probabilities f(x1), f(x2), . . . . We have
E(aX + b) = Σ_j (a xj + b) f(xj) = a Σ_j xj f(xj) + b Σ_j f(xj)
= aE(X) + b, since Σ_j xj f(xj) = E(X) and Σ_j f(xj) = 1.
We can generalise this result and find a simple way of calculating the expectation of the sum of two r.vs. X and Y. This is given in the following result.
Suppose X and Y are two r.vs. with joint p.m.f. f(xj, yk), j, k = 1, 2, . . . . Suppose E(X) and E(Y) are finite. Then E(X + Y) is finite and
E(X + Y) = E(X) + E(Y).
This result is true when X and Y take either finite or countably infinite values. We shall not worry about the proof in the countably infinite case here. The proof in the finite case is very easy and we are sure you can write it yourself.
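As a quick numerical illustration of E(aX + b) = aE(X) + b and E(X + Y) = E(X) + E(Y), the sketch below reuses the two fair dice of Example 2 (independent, so their joint p.m.f. is the product of the marginals); the helper names are mine.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

# E(aX + b) = a E(X) + b with a = 4, b = 5.
lhs = sum((4 * x + 5) * p for x, p in die.items())
print(lhs, 4 * expectation(die) + 5)          # both equal 19

# E(X + Y) = E(X) + E(Y) for two independent dice.
joint = {(x, y): px * py for x, px in die.items() for y, py in die.items()}
print(sum((x + y) * p for (x, y), p in joint.items()))   # 7 = 7/2 + 7/2
```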
E11) If X and Y are two r.vs. with joint p.m.f. f(xj, yk), j = 1, 2, . . . , n, k = 1, 2, . . . , m, and E(X), E(Y) are finite, then prove that
E(X + Y) = E(X) + E(Y).

A simple induction argument leads us to the following result :
If X1, X2, . . . , Xn are r.vs. such that E(Xi) is finite for all i, then X1 + . . . + Xn also has a finite expectation and
E(X1 + X2 + . . . + Xn) = E(X1) + . . . + E(Xn).
We now list a simple but useful property of E(X).
If a ≤ X ≤ b, a, b ∈ R, i.e., if the values x1, x2, . . . of the r.v. X are such that a ≤ xj ≤ b for all j = 1, 2, . . . , then a ≤ E(X) ≤ b.
Proof : Observe that because a ≤ xj ≤ b for all j ≥ 1, we have
a Σ_j f(xj) ≤ Σ_j xj f(xj) ≤ b Σ_j f(xj).
Equivalently, since Σ_j f(xj) = 1, a ≤ E(X) ≤ b.

See if you can solve these exercises by using the results of this section.
See if you can solve these exercises by using the results of this section.

E12) Prove :
a) If X ≥ 0 and E(X) is finite, then E(X) ≥ 0.
b) Let X ≥ Y, i.e., the r.v. X − Y assumes only non-negative values. Then E(X) ≥ E(Y).
E13) Let the p.m.f. of a r.v. X be
f(x) = (3 − x)/10, x = −1, 0, 1, 2.
a) Calculate E(X).
b) Calculate E(X²) by using (9) and also by determining the p.m.f. of X², and verify that both give the same result.
c) Use the results of (a) and (b) to calculate E[(4X + 5)²].
E14) Calculate E[exp(tX)] for the distribution discussed in Example 11. Here t is a fixed number.
E15) An unbiased die is rolled n times. We say that a success occurs if the score obtained is 1 or 2. Any other score (i.e. a score of 3, 4, 5 or 6) is called a failure. Let Xk = 0 or 1 according as the k-th trial results in a failure or a success. Notice that X1 + . . . + Xn is the number of successes obtained in n rolls of the die. Obtain E(Xk) and hence the expected number of successes in n rolls of the die.

So far we have discussed some of the properties of the expectation of a r.v. X. You have seen that the expectation is regarded as a measure of central tendency of the probability distribution of X, with the probabilities f(xj) = P[X = xj] playing the role of relative frequencies. In the next section we will extend these concepts to obtain measures of dispersion of X around its mean value.

7.5 VARIANCE, COVARIANCE AND CORRELATION COEFFICIENT

The concepts introduced in this section are analogous to the measures of dispersion and correlation you studied in Block 1. In what follows, we assume that all the relevant expectations are defined.
Definition 8 : Let X be a r.v. assuming values x1, x2, . . . with probabilities f(x1), f(x2), . . . . Let μ denote E(X). The variance of X, denoted by Var(X), is
Var(X) = E[(X − μ)²] = Σ_j (xj − μ)² f(xj).                          . . . (15)
Note that, as we have seen in the case of the expectation, Var(X) has a close similarity with the variance (or the second moment about the mean) of a frequency distribution discussed in Block 1.
The expression (15) for Var(X) is not suitable for purposes of computation. The following lemma provides a simplification.

Lemma 1 : Var(X) = E(X²) − μ².                                       . . . (16)

Recall that we have proved a similar result in Sec. 2.4.3 of Block 1. The proof of this lemma follows on exactly similar lines.

It is also convenient to write (16) as
Var(X) = E[(X − μ)²] = E(X²) − [E(X)]².                              . . . (17)
The positive square root of Var(X) is called the standard deviation of X. We denote it by σ(X).
The variance of X, being the expectation of the non-negative r.v. (X − μ)², is always non-negative, i.e. Var(X) ≥ 0 (see E12 a). Also, Var(X) is finite whenever E(X²) is finite.
For, suppose E(X²) is finite. Then since |X| ≤ X² + 1, E[|X|] ≤ E(X²) + 1, and hence E[|X|] is finite. So, whenever E(X²) < ∞, E[|X|] is finite and so, by definition, E(X) is finite. Then (17) implies that Var(X) is finite.
Note further that if X is a r.v. such that P[X = a] = 1, then E(X) = a. It also follows that P[X − a = 0] = 1, implying E[(X − a)²] = 0. Hence, if the r.v. X assumes only one value, its variance is zero. Conversely, if Var(X) = 0,
Σ_j (xj − μ)² f(xj) = 0.
This implies that (xj − μ)² = 0 for all j such that f(xj) ≠ 0. This means that X takes only one value μ, or that P[X = μ] = 1. In short, Var(X) is zero iff the r.v. X assumes only one value, i.e. is a constant. Such a r.v. is said to have a degenerate distribution, or is said to be a degenerate r.v.
Now look at some examples where we have calculated the variances of some r.vs. which you have already met.
Example 13 : Here we calculate the variance of the score obtained on the throw of an unbiased die.
Let X denote the score obtained on the throw of the unbiased die. In Example 9 we have seen that
E(X) = 7/2.
Further, E(X²) = (1/6)(1² + 2² + 3² + 4² + 5² + 6²) = 91/6.
Hence, Var(X) = E(X²) − [E(X)]² = 91/6 − 49/4 = 35/12.
Example 14 : Let us calculate the variance of the gain of the person of Example 10.
Recall that P[X = −2] = 99/100 and P[X = 98] = 1/100, and that E(X) = −1. Hence,
Var(X) = (−2)² (99/100) + (98)² (1/100) − (−1)²
= (396 + 9604)/100 − 1 = 100 − 1 = 99.
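A variance helper in the same style as the expectation sketch above (the names are mine) reproduces both results:

```python
from fractions import Fraction

def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2, as in Lemma 1."""
    mu = expectation(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu * mu

die = {x: Fraction(1, 6) for x in range(1, 7)}
lottery = {-2: Fraction(99, 100), 98: Fraction(1, 100)}
print(variance(die), variance(lottery))   # 35/12 and 99
```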
Example 15 : Suppose we want to obtain the variance of the r.v. X of Example 11.
Since P[X = x] = (2/3)(1/3)^x, x = 0, 1, 2, . . . , we have
E(X²) = (2/3) Σ_{x=0}^{∞} x² (1/3)^x.
In Example 11, as well as in the above example, we are required to calculate the sums of some infinite series. Here is how we find the sums of series of the type
S1 = Σ_{x=1}^{∞} x p^x and S2 = Σ_{x=1}^{∞} x² p^x, where 0 < p < 1.
Using the formula for the sum of a geometric series, we get
Σ_{x=0}^{∞} p^x = 1/(1 − p).
To compute S1, note that
(1 − p) S1 = Σ_{x=1}^{∞} x p^x − Σ_{x=1}^{∞} x p^{x+1} = Σ_{x=1}^{∞} p^x = p/(1 − p).
Therefore, S1 = p/(1 − p)².
Similarly,
S2 = p(1 + p)/(1 − p)³.
This gives us
E(X) = (1 − p) S1 = p/(1 − p), E(X²) = (1 − p) S2 = p(1 + p)/(1 − p)², and Var(X) = E(X²) − [E(X)]² = p/(1 − p)².
The calculations in Example 15 are for p = 1/3, so that E(X) = 1/2, E(X²) = 1 and Var(X) = 3/4.
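The closed forms for S1 and S2 are easy to sanity-check numerically; the sketch below truncates the series at a large index (the function names are mine).

```python
def s1(p, terms=2000):
    """Partial sum of sum_{x>=1} x p^x; compare with p/(1-p)**2."""
    return sum(x * p**x for x in range(1, terms))

def s2(p, terms=2000):
    """Partial sum of sum_{x>=1} x^2 p^x; compare with p*(1+p)/(1-p)**3."""
    return sum(x * x * p**x for x in range(1, terms))

p = 1 / 3
print(round(s1(p), 10), round(p / (1 - p) ** 2, 10))            # both 0.75
print(round(s2(p), 10), round(p * (1 + p) / (1 - p) ** 3, 10))  # both 1.5
```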

Now here is an exercise for you.

E16) Prove that Var(aX + b) = a² Var(X).

We now give some important observations concerning the result in E16 in the following remark.
Remark 3 :
i) If we treat Y = aX + b as a r.v. obtained from X by a change of origin and scale, then E16 implies that the variance is unaffected by a change of origin.
ii) The standard deviation of Y = aX + b is |a| times the standard deviation of X.
iii) Suppose E(X) = μ and Var(X) = σ², where σ is the standard deviation of X. Then the mean and variance of Y = (X − μ)/σ are zero and one, respectively. The r.v. Y = (X − μ)/σ is called the standardized or normalized version of X.
Our next aim is to obtain Var(X + Y). For this purpose we need to introduce the concept of the covariance of two r.vs. X and Y.
Let X and Y be two r.vs. with joint p.m.f. f(xj, yk), j, k = 1, 2, . . . . Then
E(XY) = Σ_j Σ_k xj yk f(xj, yk),                                     . . . (19)
where the sum of the series on the right is assumed to be finite (see Remark 2 (iii)). (Note that (xj ± yk)² ≥ 0 implies xj² + yk² ≥ 2 |xj yk|.) Let μx and μy denote the means of X and Y, respectively. Now we are in a position to define the covariance.

Definition 9 : The covariance between X and Y is defined to be
Cov(X, Y) = E[(X − μx)(Y − μy)] = Σ_j Σ_k (xj − μx)(yk − μy) f(xj, yk).
We can simplify this as follows :
Cov(X, Y) = E[XY − μx Y − μy X + μx μy]
= E(XY) − μx E(Y) − μy E(X) + μx μy.
We can also write
Cov(X, Y) = E(XY) − E(X) E(Y).

The elementary inequality |xj yk| ≤ (xj² + yk²)/2 implies that
Σ_j Σ_k |xj yk| f(xj, yk) ≤ (1/2) [E(X²) + E(Y²)].
Hence, we conclude that E(XY) is finite whenever E(X²) and E(Y²) are finite. It follows that if Var(X) and Var(Y) are finite, then Cov(X, Y) is finite.
We illustrate the procedure for the computation of Cov(X, Y) by means of an example now.
Example 16 : Suppose the joint p.m.f. of X, Y is given by the following table :
Table 5

x \ y      0       1       2      g(x)
0        3/28    9/28    3/28    15/28
1        3/14    3/14      0      3/7
2        1/28      0       0     1/28
h(y)     5/14   15/28    3/28      1

Let's compute the covariance Cov(X, Y).
We have
E(X) = 0 × (15/28) + 1 × (3/7) + 2 × (1/28) = 12/28 + 2/28 = 1/2 = μx.
Similarly,
E(Y) = 0 × (5/14) + 1 × (15/28) + 2 × (3/28) = 21/28 = 3/4 = μy,
and
E(XY) = 1 × 1 × (3/14) = 3/14,
since xy f(x, y) = 0 for all other pairs (x, y).
Hence, Cov(X, Y) = E(XY) − μx μy = 3/14 − (1/2)(3/4) = 12/56 − 21/56 = −9/56.
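The same numbers fall out of a direct computation over Table 5 (the helper name is mine):

```python
from fractions import Fraction

f = {(0, 0): Fraction(3, 28), (0, 1): Fraction(9, 28), (0, 2): Fraction(3, 28),
     (1, 0): Fraction(3, 14), (1, 1): Fraction(3, 14), (1, 2): Fraction(0),
     (2, 0): Fraction(1, 28), (2, 1): Fraction(0),     (2, 2): Fraction(0)}

def covariance(joint):
    """Cov(X, Y) = E(XY) - E(X)E(Y) from a joint p.m.f. {(x, y): probability}."""
    ex  = sum(x * p for (x, _), p in joint.items())
    ey  = sum(y * p for (_, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey

print(covariance(f))    # -9/56
```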

You must have noticed that the troublesome step in this calculation is the computation of E(XY). But for some r.vs. this is simplified. We establish this in the following theorem.
Theorem 2 : If X and Y are independent r.vs. and have finite expectations, then
E(XY) = E(X) E(Y).
Proof : Since X and Y are independent r.vs., their joint p.m.f. is
f(xj, yk) = g(xj) h(yk),
where g(xj) and h(yk) are the marginal p.m.fs. of X and Y, respectively (see Definition 5).
We, therefore, have
E(XY) = Σ_j Σ_k xj yk f(xj, yk) = Σ_j Σ_k xj yk g(xj) h(yk)
= [ Σ_j xj g(xj) ] [ Σ_k yk h(yk) ]
= E(X) E(Y).
We generalise this result for n independent r.vs. in the following corollary.
Corollary : If X1, X2, . . . , Xn are n independent r.vs. with finite expectations, then
E(X1 X2 . . . Xn) = E(X1) E(X2) . . . E(Xn).
We are not going to prove this corollary here.
Here is another useful result which follows from Theorem 2 :
Corollary : If X and Y are independent r.vs. with finite variances, then Cov(X, Y) = 0.
Caution : If Cov(X, Y) = 0, it does not follow that X and Y are independent. For example, consider the r.vs. X and Y with joint p.m.f. as in Table 6.

Observe that, for this joint p.m.f.,
E(XY) = E(X) E(Y).
Thus, Cov(X, Y) = 0.
However, f(1, 1) = 0 ≠ g(1) h(1). This shows that X and Y are not independent.
Thus, X and Y independent ⇒ X and Y have zero covariance. But the converse is not true.
We are now in a position to obtain Var(X + Y).
If X and Y are random variables with finite variances, then
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
Let's prove this.
Proof : Let X and Y be r.vs. with joint p.m.f. f(xj, yk). Then E(X + Y) = μx + μy and
Var(X + Y) = E[(X + Y − μx − μy)²]
= E[(X − μx)² + (Y − μy)² + 2(X − μx)(Y − μy)]
= E[(X − μx)²] + E[(Y − μy)²] + 2 E[(X − μx)(Y − μy)]
= Var(X) + Var(Y) + 2 Cov(X, Y),
as required.
Corollary : If X and Y are r.vs. with Cov(X, Y) = 0, then
Var(X + Y) = Var(X) + Var(Y).                                        . . . (23)
Note that if X and Y are independent r.vs., then (23) automatically holds.
We now give a result about the variance of the sum of n r.vs. :
If X1, X2, . . . , Xn are r.vs. with finite variances, then
Var(X1 + . . . + Xn) = Σ_i Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj).
We omit the proof of this result. The result about n independent r.vs. now follows :
Corollary : If X1, . . . , Xn are independent r.vs. with finite variances, then
Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn).
In fact, it is enough to assume that the r.vs. X1, . . . , Xn have pairwise zero covariances to claim this result.
Try to do this exercise now. It concerns the definitions and results which we have just discussed.

E17) Let the joint distribution of X and Y be as specified in Example 16. Obtain Var(X + Y).

The following theorem expresses the covariance between aX + b and cY + d, where a, b, c, d are constants, in terms of Cov(X, Y).
Theorem 4 : Cov(aX + b, cY + d) = ac Cov(X, Y).
Proof : Let Z1 = aX + b and Z2 = cY + d. Then E(Z1) = aμx + b and E(Z2) = cμy + d. Now
Cov(aX + b, cY + d) = E(Z1 Z2) − E(Z1) E(Z2)
= E[(aX + b)(cY + d)] − (aμx + b)(cμy + d)
= E[acXY + adX + bcY + bd] − acμxμy − adμx − bcμy − bd
= ac E(XY) + adμx + bcμy + bd − acμxμy − adμx − bcμy − bd
= ac [E(XY) − μxμy]
= ac Cov(X, Y), as required.
We can use this theorem to arrive at the following result.
Corollary : If X and Y are r.vs. with Cov(X, Y) = 0, then
Var(X − Y) = Var(X) + Var(Y).
Proof : Applying Theorem 4, we get
Cov(X, −Y) = −Cov(X, Y).
Also, by using the result in E16, we can write Var(−Y) = Var(Y).
Hence, in general,
Var(X − Y) = Var(X) + Var(−Y) + 2 Cov(X, −Y)
= Var(X) + Var(Y) − 2 Cov(X, Y).
When Cov(X, Y) = 0, this gives Var(X − Y) = Var(X) + Var(Y).
We conclude this section with a discussion of the correlation coefficient between X and Y. The definition is very similar to that of the correlation coefficient you encountered in Unit 4 in connection with bivariate data.
Definition 10 : The correlation coefficient between X and Y is defined to be
ρ(X, Y) = Cov(X, Y) / [Var(X) Var(Y)]^(1/2).
In this definition we assume that Var(X) and Var(Y) are both finite and positive and that the square root in the denominator is the positive square root. We give below some simple properties of the correlation coefficient.
1) Let Z1 = aX + b and Z2 = cY + d. Then
ρ(Z1, Z2) = ρ(X, Y) if ac > 0 and ρ(Z1, Z2) = −ρ(X, Y) if ac < 0.
Proof : Recall that Cov(Z1, Z2) = ac Cov(X, Y) and that Var(Z1) = a² Var(X) and Var(Z2) = c² Var(Y). Hence
ρ(Z1, Z2) = ac Cov(X, Y) / [a² c² Var(X) Var(Y)]^(1/2) = (ac/|ac|) ρ(X, Y),
from which the required result follows.
In particular, if X* = (X − μx)/σx and Y* = (Y − μy)/σy are the standardised versions of X and Y, respectively, then ρ(X*, Y*) = ρ(X, Y), since σx and σy are both positive.
2) −1 ≤ ρ(X, Y) ≤ +1.
Proof : You have already seen this result in Unit 4. Here is an alternative proof. Let X* and Y* be the standardised r.vs. Since Var(X*) = Var(Y*) = 1, we find that Cov(X*, Y*) = ρ(X*, Y*) = ρ(X, Y). Moreover,
Var(X* + Y*) = Var(X*) + Var(Y*) + 2 Cov(X*, Y*) = 2[1 + ρ(X, Y)].
Since Var(X* + Y*) ≥ 0, we have
2[1 + ρ(X, Y)] ≥ 0, or ρ(X, Y) ≥ −1.
Similarly,
0 ≤ Var(X* − Y*) = 2[1 − ρ(X, Y)]
implies that ρ(X, Y) ≤ 1. Hence the result.
3) The correlation coefficient ρ(X, Y) = ±1 if and only if there exist constants a and b such that Y = aX + b.
Proof : Let Y = aX + b. Then
Var(Y) = a² Var(X) and
Cov(X, Y) = Cov(X, aX + b) = a Cov(X, X) = a Var(X).
Hence ρ(X, Y) = a Var(X) / [|a| Var(X)] = ±1, according as a > 0 or a < 0.
Conversely, suppose ρ(X, Y) = 1. Then, from the proof of the second property above, we have
Var(X* − Y*) = 0.
This implies that X* − Y* is a degenerate r.v., or that
X* − Y* = c, a constant.
Since E(X*) = E(Y*) = 0, c = 0. Equivalently,
(X − μx)/σx = (Y − μy)/σy,
i.e., Y = aX + b, where a = σy/σx and b = μy − (σy/σx) μx.
The proof for the case when ρ(X, Y) = −1 is similar. In that case we use the result Var(X* + Y*) = 0.
We have given a number of examples in this section to show how to obtain the mean, variance, covariance, etc. for random variables. Now would you like to try your hand at these exercises ?

E18) Compute the means, the variances, the covariances and the correlation coefficients for the joint distributions of E7 and E10.
E19) Obtain the variance of the total number of successes in E15 under the assumption that X1, X2, . . . , Xn are independent r.vs.

E20) Obtain Var(aX + bY).

So far we have discussed many concepts for a random variable with a given p.m.f. We had
talked about the same concepts in relation to a frequency distribution in Block 1. In the next
section we will take up the study of yet another concept.

7.6 MOMENTS AND MOMENT GENERATING FUNCTION

We have studied the properties of E(X) and Var(X) in the previous two sections. There are expectations of some functions of r.vs. associated with a probability distribution which play an important role in statistical theory. We plan to study the properties of some of these in this section.
Let r be a positive integer. The r-th moment of a r.v. X, or of its probability distribution, is
μ'r = E(X^r) = Σ_j xj^r f(xj),
provided, of course, the series on the right is absolutely convergent. Sometimes we need to use
μr(a) = E[(X − a)^r],
which is called the r-th moment of X about a. In this sense μ'r is the r-th moment about the origin (a = 0).
Of course, when r = 0, X^0 = 1 and therefore μ'0 = 1. The first moment μ'1 is the, by now familiar, expected value or mean of X. The variance, Var(X), is the second moment of X about its mean, μ2(μ'1).
Hence we can assert that, whatever be the real number u, |u|^(r−1) ≤ |u|^r + 1 (if |u| ≤ 1, then |u|^(r−1) ≤ 1, while if |u| > 1, then |u|^(r−1) ≤ |u|^r). A consequence of this inequality is the following :
Σ_j |xj|^(r−1) f(xj) ≤ 1 + Σ_j |xj|^r f(xj).
Thus, whenever the r-th moment of X is finite, so is the (r − 1)-th moment. In particular, all the moments μ's, s ≤ r, would be finite.
We do not enter into any detailed study of the properties of moments of a r.v. except to introduce the so-called moment generating function, which will be useful to us in Units 8 and 9.
Let t be a real variable and suppose that
Mx(t) = E[exp(tX)] = Σ_j exp(t xj) f(xj)
is finite for all values of t in a neighbourhood of the origin t = 0. Then the function Mx(t) of t is called the moment generating function of X. We abbreviate it as m.g.f.
You may be wondering why we call Mx(t) the moment generating function. Recall that Maclaurin's expansion of exp(tx) is
exp(tx) = 1 + tx + t²x²/2! + t³x³/3! + . . . (Calculus, Unit 6).
It, therefore, follows that
Mx(t) = 1 + t μ'1 + (t²/2!) μ'2 + (t³/3!) μ'3 + . . . .
In other words, μ'r is the coefficient of t^r/r! in Maclaurin's expansion of the m.g.f. In fact, we can write
μ'r = [the r-th derivative of Mx(t) with respect to t, evaluated at t = 0].
In this sense, the m.g.f. generates moments.

In the following example we find the m.g.f. of a random variable.
Example 17 : Let X be a r.v. with
P[X = 0] = 2/3 and P[X = 1] = 1/3.
Its m.g.f. is
Mx(t) = e^(t·0) · (2/3) + e^(t·1) · (1/3) = 2/3 + (1/3) e^t.
Since here
e^t = 1 + t + t²/2! + . . . ,
we find that
μ'0 = 2/3 + 1/3 = 1 and μ'r = 1/3, r = 1, 2, . . . .
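A numerical sketch of the "m.g.f. generates moments" idea for Example 17: differentiating M(t) = 2/3 + (1/3)e^t at t = 0 (here done symbolically with sympy, a choice of mine) reproduces the raw moments computed directly from the p.m.f.

```python
import sympy as sp

t = sp.symbols('t')
M = sp.Rational(2, 3) + sp.Rational(1, 3) * sp.exp(t)   # m.g.f. of Example 17

# mu'_r = r-th derivative of M(t) at t = 0.
moments = [sp.diff(M, t, r).subs(t, 0) for r in range(5)]
print(moments)    # [1, 1/3, 1/3, 1/3, 1/3]

# Direct check: E(X**r) = 0**r * 2/3 + 1**r * 1/3.
pmf = {0: sp.Rational(2, 3), 1: sp.Rational(1, 3)}
print([sum(x**r * p for x, p in pmf.items()) for r in range(5)])   # same list
```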
We now note a useful property of the m.g.f. If Y = aX + b, where a and b are constants, then
My(t) = e^(bt) Mx(at).
Proof : By definition,
My(t) = E[exp(tY)] = E[exp(taX + tb)] = e^(bt) E[exp(atX)] = e^(bt) Mx(at).
In particular, if X* = (X − μx)/σx is the standardised version of X, then
Mx*(t) = e^(−μx t/σx) Mx(t/σx).
We shall use this result in some later units of this course.


The importance of the m.g.f. does not lie only in its ability to generate the moments of the r.v. X. Under certain conditions, the m.g.f. can uniquely identify the probability mass function of X and hence its probability distribution. But we'll not go into the details here.
Now we prove another result which is useful in the study of the distribution of the sum of two or more independent r.vs.
Let X and Y be independent r.vs. with m.g.fs. Mx(t) and My(t). Then the m.g.f. of X + Y is
Mx+y(t) = Mx(t) My(t).
Proof : Since X and Y are independent r.vs., their joint p.m.f. is f(xj, yk) = g(xj) h(yk), where g and h are the p.m.fs. of X and Y, respectively. Hence,
Mx+y(t) = E[exp(t(X + Y))] = Σ_j Σ_k exp(t xj) exp(t yk) g(xj) h(yk)
= [ Σ_j exp(t xj) g(xj) ] [ Σ_k exp(t yk) h(yk) ]
= Mx(t) My(t),
which is the required result.
We shall talk more about the probability distribution of the sum of two r.vs. in the next section. But before that we are giving you a simple exercise to do.

E21) Obtain the m.g.f. and the moments of the score X obtained on the roll of an unbiased die.

7.7 DISTRIBUTION OF SUM OF TWO RANDOM VARIABLES

In Example 2 we have discussed the probability distribution of the sum of scores obtained on two rolls of an unbiased die. In this section we are interested in methods of obtaining the distribution of the sum of two r.vs. We begin with the following simple example.
Example 18 : The joint p.m.f. of (X, Y) is as specified in Table 7.
Table 7

We want to obtain the p.m.f. of X + Y.
Observe, first of all, that since X takes the values 0, 1, 2, and Y takes the values 0, 1, 2, 3, the r.v. X + Y can assume the values 0, 1, 2, 3, 4, 5. Now we list the different possibilities :
[X + Y = 0] = [X = 0, Y = 0],
[X + Y = 1] = [X = 0, Y = 1] ∪ [X = 1, Y = 0], and so on, up to
[X + Y = 5] = [X = 2, Y = 3].
It immediately follows that the probability of each of these events is the sum of the corresponding entries of Table 7. Thus, the p.m.f. of X + Y is
f(0) = 1/27, f(1) = 6/27, f(2) = 13/27, f(3) = 7/27.
This example illustrates the general method of obtaining the p.m.f. of X + Y from the joint p.m.f. of X and Y. The basic steps are :
i) Identify the possible distinct values of X + Y.
ii) If u1, u2, . . . denote these distinct values of X + Y, identify, for each value u, all the sets [X = xj, Y = yk] for which xj + yk = u.
iii) Then
P[X + Y = u] = Σ f(xj, yk),
where the sum extends over all those (xj, yk) which add up to u.
This general procedure, though valid in principle for all discrete r.vs., is cumbersome except in very simple situations. We, therefore, investigate a special case in which simplification is possible.

Suppose X and Y are independent r.vs. which assume the non-negative integral values 0, 1, 2, . . . . Let P[X = x] = f(x) and P[Y = y] = g(y), x, y = 0, 1, 2, . . . . Because of the independence of X and Y,
P[X = x, Y = y] = f(x) g(y)
for all x and y. In order to obtain the p.m.f. of X + Y, observe that X + Y assumes the values 0, 1, 2, . . . . Moreover, the event [X + Y = r] is the union of the disjoint events
[X = 0, Y = r], [X = 1, Y = r − 1], . . . , [X = r, Y = 0],
where r is a non-negative integer. It follows that
P[X + Y = r] = Σ_{j=0}^{r} f(j) g(r − j), r = 0, 1, 2, . . . .
This procedure is illustrated in the following example.

Example 19 : Let X and Y be independent r.vs. with
P[X = x] = P[Y = x] = (1 − p)p^x, x = 0, 1, 2, . . . , where 0 < p < 1,
i.e., X and Y are independent r.vs. with the same (geometric) p.m.f. The p.m.f. of X + Y is given by
P[X + Y = r] = Σ_{j=0}^{r} P[X = j] P[Y = r − j] = Σ_{j=0}^{r} (1 − p)p^j (1 − p)p^{r−j}
= (r + 1)(1 − p)² p^r, r = 0, 1, 2, . . . .
When you study geometric distribution in Unit 9, you will come across a more general result
of which this example is a particular case.
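The convolution formula is mechanical to apply in code. The sketch below convolves two p.m.fs. given as dictionaries and, as a check, recovers the distribution of the total score of two dice from Example 2 (the function name is mine).

```python
from fractions import Fraction
from collections import defaultdict

def convolve(f, g):
    """p.m.f. of X + Y for independent X ~ f and Y ~ g: P[X+Y=r] = sum_j f(j) g(r-j)."""
    h = defaultdict(Fraction)
    for x, px in f.items():
        for y, py in g.items():
            h[x + y] += px * py
    return dict(h)

die = {x: Fraction(1, 6) for x in range(1, 7)}
print(convolve(die, die)[7])    # 6/36 = 1/6, in agreement with Table 1
```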
Here are some exercises for you.

E22) Obtain the distribution of X+Y when the joint p.m.f. of X and Y is as specified in
Examples 7 and 8.

This brings us to the end of this unit. In it we have discussed the probability distribution of a
random variable at length. Let's now briefly recall the various concepts which we have
covered here.

7.8 SUMMARY

In this unit we have covered the following points.

1) A random variable is a function defined on a sample space. Its probability distribution is specified by its p.m.f. f(xj) = P[X = xj], j = 1, 2, . . . . We can study two (or more) r.vs. X and Y in terms of their joint p.m.f. f(xj, yk) = P[X = xj, Y = yk], j, k = 1, 2, . . . . The marginal p.m.fs. g(xj) = P[X = xj] of X and h(yk) = P[Y = yk] of Y can be calculated from f(xj, yk), but the converse is not true. The r.vs. X and Y are said to be independent if
f(xj, yk) = g(xj) h(yk) for all pairs (xj, yk).
2) The expectation E(X) = Σ_j xj f(xj) of a r.v. X with p.m.f. f(x), its variance Var(X) = E(X²) − [E(X)]², and the covariance Cov(X, Y) = E(XY) − E(X)E(Y) are important characteristics of the r.vs. They have some simple properties like
E(X + Y) = E(X) + E(Y), E(aX) = aE(X),
Var(aX + b) = a² Var(X),
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), etc.
3) If X and Y are independent r.vs., they have zero covariance. But r.vs. with zero covariance are not necessarily independent. The correlation coefficient
ρ(X, Y) = Cov(X, Y) / [Var(X) Var(Y)]^(1/2)
is a measure of the correlation between X and Y. It is such that −1 ≤ ρ(X, Y) ≤ 1, the extreme values ρ(X, Y) = ±1 being attained iff Y is a linear function of X.
4) The m.g.f. of a r.v. X is Mx(t) = E[exp(tX)], provided it is finite for all values of t in a neighbourhood of zero. It has the important property that if X and Y are independent, the m.g.f. Mx+y(t) of X + Y is the product Mx(t) My(t) of their m.g.fs.
5) It is possible to obtain the p.m.f. of X + Y from the joint p.m.f. of X and Y. Some simplification is possible when X and Y are non-negative integer-valued r.vs.

7.9 SOLUTIONS AND ANSWERS
E2) The possible values of W are −5, −4, . . . , 0, 1, . . . , 5 and its p.m.f. is given by
P[W = w] = (6 − |w|)/36, w = −5, −4, . . . , 5.
E3) The possible values of Y are 0, 1, 2 and 3. To obtain P[Y = 2], for example, observe that there are C(13, 2) ways of selecting 2 spades out of 13 spades, and the third card can be chosen in C(39, 1) ways out of the remaining 39 cards. Hence
P[Y = 2] = C(13, 2) C(39, 1) / C(52, 3).
Similarly,
P[Y = 0] = C(39, 3)/C(52, 3), P[Y = 1] = C(13, 1) C(39, 2)/C(52, 3),
and
P[Y = 3] = C(13, 3)/C(52, 3).
E4) If X = number of attempts, the possible values of X are 1, 2, 3, 4. It is easy to check that P[X = 1] = P[X = 2] = P[X = 3] = P[X = 4] = 1/4, which gives us the p.m.f. of X. To obtain its probability distribution, we need to specify P[X ∈ H], where H is a subset of S = {1, 2, 3, 4}. There are 16 subsets of S. In fact, we have
P[X ∈ H] = (number of elements of H)/4.
E5) a) (ii) 8/17, (iii) 4/9, (iv) 7/27, (v) 3/8.

E6) b) P[X2 ≥ 4] = Σ_{j=4}^{10} C(10, j) (1/8)^j (7/8)^(10−j).
E7) a) 1/9
b) 1/20
c) 1/42
E8) a) g(x) = 1/3, x = 1, 2, 3; h(y) = 1/3, y = 1, 2, 3.
b) g(x) = (x² + 4)/10, x = −1, 1; h(y) = (y² + 1)/10, y = −2, 2.
c) g(x) = (x + 2)/14, x = 0, 1, 2, 3; h(y) = (2y + 5)/21, y = 0, 1, 2.
E9) X and Y are independent in cases a) and b).

E10) a) g(1) = 0.35, g(3) = 0.50, g(5) = 0.15,
h(0) = 0.45, h(1) = 0.55.
b) They are not independent.

E11) E(X + Y) = Σ_j Σ_k (xj + yk) f(xj, yk) = Σ_j Σ_k xj f(xj, yk) + Σ_j Σ_k yk f(xj, yk)
= Σ_j xj g(xj) + Σ_k yk h(yk).
Since E(X) and E(Y) are finite, the series above are absolutely convergent. Therefore E(X + Y) is defined and
E(X + Y) = E(X) + E(Y).
E15) P[Xk = 1] = 2/6 = 1/3 and P[Xk = 0] = 2/3.
Hence E(Xk) = 1/3 and E(X1 + . . . + Xn) = n/3.

E16) Let Y = aX + b. Then E(Y) = aE(X) + b, and
Var(aX + b) = E[{Y − E(Y)}²] = E[{aX − aE(X)}²] = E[a²{X − E(X)}²] = a² Var(X).
E17) From Example 16 we get
Cov(X, Y) = −9/56, Var(X) = 9/28 and Var(Y) = 45/112.
Hence Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) = 45/112.

E19) Var(Xk) = (1/3) − (1/3)² = 2/9 and hence
Var(X1 + . . . + Xn) = 2n/9.
E20) Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).
E21) M.g.f. = E[exp(tX)]. Now X takes the values 1, 2, . . . , 6, each with probability 1/6. Hence
Mx(t) = Σ_j e^(t xj) f(xj) = (1/6)(e^t + e^(2t) + . . . + e^(6t)),
and μ'r = E(X^r) = (1/6)(1^r + 2^r + . . . + 6^r), r = 1, 2, . . . .
E22) In Example 7, the possible values of X + Y are 0, 1, 2, 3, 4, 5 and its p.m.f. is
P[X + Y = 0] = 0, P[X + Y = 1] = 5/24, P[X + Y = 2] = 1/3,
P[X + Y = 3] = 3/8, P[X + Y = 4] = 1/24, and P[X + Y = 5] = 1/24.
In Example 8, the possible values of X + Y are 0, 1, 2, 3, 4, 5 and its p.m.f. is
P[X + Y = 1] = 1/15, P[X + Y = 2] = 1/5,
P[X + Y = 3] = 3/10, P[X + Y = 4] = 4/15,
P[X + Y = 5] = 5/30.
