UNIT 7 DISCRETE RANDOM VARIABLE AND ITS PROBABILITY DISTRIBUTION
7.1 INTRODUCTION
Objectives
A study of this unit would enable you to :
define a random variable and specify its probability distribution,
specify the joint distribution of two or more random variables,
obtain their marginal distributions and examine them for their independence,
define and calculate the means, variances, covariances and correlation coefficients of
random variables,
define moments and obtain moment generating functions,
obtain the probability distribution of the sum of two random variables.
7.2 RANDOM VARIABLE
In the first two units of this block we have introduced the concepts of a random experiment,
associated sample space and probability of an event. With the help of these we study the
uncertainties associated with such experiments. We usually find that a numerical
measurement or quantity is associated with a random experiment. Consider the following
examples :
1) A person invests Rs. 1 in purchasing a lottery ticket. He either wins the first prize of
Rs. 100 or loses his rupee. His net gain is either -1 or 99. This net gain cannot be
predicted in advance.
2) The authorities of IGNOU cannot predict in advance the number of students who
would join and complete this course. This number could be 0, 1, 2, . . . .
3) The number of calls that a telephone exchange would receive in a specified time
interval can be 0, 1, 2, . . . .
4) The total number of defects in a motor cycle coming off a production line can be any
number like 0, 1, 2, . . . .
5) The maximum temperature of Delhi on June 05 can be anywhere between 40°C and
50°C.
All these examples have one common feature. They describe a numerical characteristic
associated with a random experiment. This characteristic depends on the outcome of the
experiment and therefore its value cannot be predicted in advance.
The numerical characteristic associated with a random experiment is a variable quantity
which behaves randomly and so we may call it a "random variable". This is of course, not a
technical definition of the term "random variable".
In order to make our ideas precise, we consider an example. Suppose we are interested in the
number X of heads obtained in three tosses of a coin.
The sample space Ω consists of the eight points
ω1 = HHH, ω2 = HHT, ω3 = HTH, ω4 = THH, ω5 = TTH, ω6 = THT, ω7 = HTT, ω8 = TTT.
(We could also be interested in, say, the number X of girls in families with three children.)
Let us denote by X(ωj) the number of heads obtained when the outcome of our experiment is
ωj, where j = 1, 2, . . . , 8. You can easily check that
X(ω1) = 3, X(ω2) = X(ω3) = X(ω4) = 2, X(ω5) = X(ω6) = X(ω7) = 1, X(ω8) = 0.
Do you agree that the number X of heads in three tosses of a coin is a function defined on
the sample space Ω? It assumes the values 0, 1, 2 and 3, as you have seen above. Observe,
now, that
P[X = 0] = P[{ω8}] = 1/8, P[X = 1] = P[{ω5, ω6, ω7}] = 3/8,
P[X = 2] = P[{ω2, ω3, ω4}] = 3/8, P[X = 3] = P[{ω1}] = 1/8,
where we read P[X = j] as "probability that X equals j." Have you noticed that [X = j],
j = 0, 1, 2, 3 are mutually disjoint events, and that
P[X = 0] + P[X = 1] + P[X = 2] + P[X = 3] = 1?
Now let us sum up and list the essential properties of the number X of heads obtained in
three tosses of a coin.
i) X is a function defined on the sample space Ω.
ii) It assumes a finite number of real values.
iii) We can assign a probability to the event that X assumes a particular value.
iv) The sum of the probabilities that X assumes the different values is one.
In this unit (and in this block) we shall restrict our attention to discrete sample spaces. So, on
the basis of the above discussion we give the following definition.
Definition 1: A random variable is a real-valued function on a discrete sample space Ω.
In what follows we shall denote random variables by capital letters, X, Y, W, U, V, ..., with
or without suffixes. The value of a random variable X at a point ω in the sample space Ω
will be denoted by X(ω). We shall also write r.v. for random variable.
Recall that a discrete sample space has either a finite number of points or its points can be
arranged in a sequence. Since an r.v. is a function on the sample space, it can take either a
finite number of values or its values can be arranged in a sequence. Suppose, therefore, that
an r.v. X takes the values x1, x2, . . . . Denote the probability P[X = xj] that X takes the
value xj by f(xj), j = 1, 2, . . . . Then we have the following definition.
Definition 2: The function f defined by f(xj) = P[X = xj], j = 1, 2, . . . , is called the
probability mass function of the r.v. X. Note that f(xj) ≥ 0 for every j.
Hence,
Σj f(xj) = Σj P[X = xj] = P[ ∪j [X = xj] ] = P(Ω) = 1.
We now give some examples concerning probability mass functions.
Example 1: We have seen that the probability mass function of the r.v. X denoting the
number of heads obtained in three tosses of a coin is
f(0) = 1/8, f(1) = f(2) = 3/8, f(3) = 1/8.
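If you like, you can verify this p.m.f. with a short Python sketch such as the one below, which simply enumerates the eight equally likely outcomes (the variable names are, of course, arbitrary).

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# The eight equally likely outcomes of three tosses of a coin.
outcomes = list(product("HT", repeat=3))

# X(w) = number of heads in the outcome w.
counts = Counter(sum(1 for c in w if c == "H") for w in outcomes)

# Each outcome has probability 1/8, so f(j) = (number of outcomes with j heads)/8.
pmf = {j: Fraction(n, len(outcomes)) for j, n in sorted(counts.items())}
print(pmf)   # f(0) = 1/8, f(1) = f(2) = 3/8, f(3) = 1/8
```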
Example 2: An unbiased die is rolled twice. Let X denote the total score so obtained. The
sample space of this experiment is the set Ω = {(x, y) | x, y = 1, . . . , 6} of all ordered pairs
(x, y), x being the score obtained on the first throw and y that on the second throw. Each of
the 36 points in Ω carries the probability 1/36. Now what values does X take? X takes the
values 2, 3, . . . , 12. In the following table we identify the subsets corresponding to the
events [X = j], j = 2, 3, . . . , 12, as well as the corresponding probabilities, f(2), . . . , f(12).
Table 1: Probability Mass Function of X

j    Event [X = j]   Subset of Ω                                          f(j)
2    [X = 2]         {(1,1)}                                              1/36
3    [X = 3]         {(1,2), (2,1)}                                       2/36
4    [X = 4]         {(1,3), (2,2), (3,1)}                                3/36
5    [X = 5]         {(1,4), (2,3), (3,2), (4,1)}                         4/36
6    [X = 6]         {(1,5), (2,4), (3,3), (4,2), (5,1)}                  5/36
7    [X = 7]         {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}           6/36
8    [X = 8]         {(2,6), (3,5), (4,4), (5,3), (6,2)}                  5/36
9    [X = 9]         {(3,6), (4,5), (5,4), (6,3)}                         4/36
10   [X = 10]        {(4,6), (5,5), (6,4)}                                3/36
11   [X = 11]        {(5,6), (6,5)}                                       2/36
12   [X = 12]        {(6,6)}                                              1/36
which implies that f(j) > 0. Thus, the probability mass function of X is
Now you can extend the arguments used in Example 3 to solve this exercise.
In E1 you must have seen that the f(j)s are terms of a convergent geometric series. Therefore,
we say that the r.v. X with the probability mass function in E1 has a geometric distribution.
Let us return to the discussion of the three tosses of an unbiased coin. The r.v. X, denoting
the number of heads so obtained, has the probability mass function
f(0) = 1/8, f(1) = f(2) = 3/8, f(3) = 1/8.
Suppose we want to know the probability P[X ≤ 2]. Since X ≤ 2 iff X = 0 or 1 or 2, and
since the events [X = 0], [X = 1] and [X = 2] are disjoint, we can write
P[X ≤ 2] = P[X = 0] + P[X = 1] + P[X = 2] = 1/8 + 3/8 + 3/8 = 7/8.
Similarly, we can obtain the probability that the sum of the scores obtained by rolling a die
twice is greater than 8. In fact, for the r.v. X of Example 2,
P[X > 8] = P[X = 9] + P[X = 10] + P[X = 11] + P[X = 12] = 4/36 + 3/36 + 2/36 + 1/36 = 10/36.
More generally, let H be any subset of the set of possible values of an r.v. X. Then
P[X ∈ H] = Σ f(xj),
using the property P7 or P8 of Unit 6. Here the sum is taken over all points xj in the
subset H.
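As a quick check of this formula, the following Python sketch builds the p.m.f. of Example 2 and evaluates P[X > 8]; it is only an illustration, and the names used are arbitrary.

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# The 36 equally likely ordered pairs of scores on two rolls of a die.
rolls = list(product(range(1, 7), repeat=2))

# p.m.f. of the total score X.
counts = Counter(a + b for a, b in rolls)
f = {j: Fraction(n, 36) for j, n in counts.items()}

# P[X in H] is the sum of f(xj) over the points xj in H; here H = {9, 10, 11, 12}.
H = {9, 10, 11, 12}
print(sum(f[j] for j in H))   # 5/18, i.e. 10/36
```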
Suppose we have a random variable X assuming values x1, x2, . . . with probabilities f(x1),
f(x2), . . . , respectively. You may also visualise this as an illustration of a frequency
distribution. The values x1, x2, . . . assumed by the random variable correspond to the values
of the variable or to mid-values of the class-intervals, and the probabilities f(x1), f(x2), . . .
play the role of relative frequencies. We will find this interpretation useful when studying
expectation and variance of a random variable.
In what follows, we shall study the properties of a random variable only in terms of its
probability mass function. That is, we may not always refer to the underlying sample space
or to the specification of the function on the sample space which yields a random variable
with the specified probability mass function. However, we can always visualise a random
experiment which leads to a random variable with a specified probability mass function. To
see this, imagine a box containing cards bearing the numbers x1, x2, . . . , and let f(xj) be the
proportion of cards bearing the number xj, j = 1, 2, . . . . If we choose one of the cards at
random from this box, then it will bear the number xj with probability f(xj), j = 1, 2, . . . .
Thus, we have a random experiment which yields a random variable with a specified
probability distribution. Did you notice that we said that we can visualise a random
experiment and not that we can construct an experiment? This is because we will not be
able to construct the box or any other mechanical device if some or all of the probabilities
f(x1), f(x2), . . . are irrational numbers or if the discrete random variable takes infinitely many
values.
Thus, although for technical reasons it is necessary to consider the sample space on which
our r.v. is defined, all its properties can be studied with the help of only the probability mass
function. In what follows, we shall use the short form p.m.f. for probability mass
function.
But before we go any further, it is time to do some exercises.
E2) Let X1 be the score obtained on the first throw and X2 be the score obtained on the
second throw of an unbiased die. Define W = X1 - X2. Obtain the p.m.f. of W.
(Hint : Follow the method of Example 2.)
E3) Three cards are drawn without replacement from a deck of 52 playing cards. Find the
p.m.f. of the number Y of spades in the three cards.
E4) A person has 4 keys with which to open a lock. He selects one of the keys at random
from the 4 keys on the first attempt. Subsequently, he discards the keys already used
and selects one key at random from the remaining keys. He may require 1, 2, 3 or 4
attempts to open the lock. Obtain the probability distribution of the number of
attempts.
If you have done these exercises, you would have got a fairly good grasp of p.m.f. Next we
study the joint distribution of random variables.
There are many situations where we have to study two or more r.vs. in connection with a
random experiment. The following are some examples of such situations.
i) A store sells two brands, A and B, of tooth-paste. The sales X and Y of brands A and
B, respectively, in one week are of interest. Here X and Y are r.vs., both taking the
values 0, 1, 2, . . . .
(Recall that you have already come across joint frequency distributions in Unit 4.)
ii) Let X denote the number of boys born in a hospital in one week and Y that of girls
born in the same hospital in the same week. Then X and Y are r.vs., both taking the
values 0, 1, 2, . . . .
iii) A group of 50 people is vaccinated against a disease and another group of 40 people is
not vaccinated. Let X and Y denote the number of people affected by the disease from
the two groups. Then X and Y are r.vs. taking the values 0, 1, . . . , 50, and 0, 1, . . . , 40,
respectively.
iv) Suppose we classify the students of a class of 100 according to the day of the week
they were born. If X1, X2, . . . , X7 denote the number of students with birthdays on
Monday, Tuesday, . . . , Sunday, then X1, . . . , X7 are r.vs. taking
values 0, 1, . . . , 100, subject to the restriction X1 + X2 + . . . + X7 = 100.
We begin this section by describing methods of studying the joint distribution of two or
more random variables.
Example 4: A committee of two persons is selected at random from a group of 10 persons
consisting of 2 mathematicians, 4 statisticians and 4 engineers. Let X denote the number of
mathematicians and Y the number of statisticians on the committee.
The total number of ways of selecting two persons from a group of 10 persons is
(10 choose 2) = 45.
Since the persons are selected at random, each of these 45 ways has the same
probability, 1/45. Consider the event [X = 1, Y = 1] that the committee has one mathematician and one
statistician. One mathematician can be selected from two in 2 ways, and one statistician
from four in 4 ways, so that P[X = 1, Y = 1] = (2 × 4)/45 = 8/45. For the event [X = 1, Y = 0],
on the other hand, the second person on the committee has to be one of the 4 engineers.
This engineer can be selected in 4 ways. Hence
P[X = 1, Y = 0] = (2 × 4)/45 = 8/45.
Since the committee has only two members, it is obvious that there are no sample points
corresponding to the events [X = 1, Y = 2], [X = 2, Y = 1] and [X = 2, Y = 2]. Hence, their
probabilities are all equal to zero.
We now summarise these calculations in the following table.
Table 2: P[X = x, Y = y] for x, y = 0, 1, 2.
Note that if we denote by f(x, y) the probability P[X = x, Y = y], the function f(x, y) is
defined for all pairs (x, y) of values x and y of X and Y, respectively. Moreover,
f(x, y) ≥ 0
and
Σx Σy f(x, y) = 1.
We say that the function f(x, y) is the joint probability mass function of the r.vs. X, Y. More
generally, we have the following definition.
Definition 3: Let X and Y be two r.vs. associated with the same random experiment. Let
x1, x2, . . . denote the values of X and y1, y2, . . . denote those of Y. The function f(xj, yk)
defined for all ordered pairs (xj, yk), j, k = 1, 2, . . . , by the relation
f(xj, yk) = P[X = xj, Y = yk]
is called the joint probability mass function of X and Y. It satisfies
f(xj, yk) ≥ 0
and
Σj Σk f(xj, yk) = 1.
Moreover, we should clarify that [X = xj, Y = yk] really stands for the event [X = xj]
∩ [Y = yk], and that [X = xj, Y = yk] is a simplified and accepted way of expressing the
intersection of the two events [X = xj] and [Y = yk]. Notice also that in Example 4, we had
used x and y as the arguments of the p.m.f. and in the definition given above we are using
xj and yk as the arguments. We shall use both notations and trust that it will not cause any
confusion.
Now here is an example.
Example 5: Suppose X and Y are two r.vs. with p.m.f.
f(x, y) = c(x + y), x = 1, 2, 3, 4 and y = 1, 2. What do you think is the value of c?
c should be such that f(x, y) ≥ 0 and
Σ(x=1 to 4) Σ(y=1 to 2) f(x, y) = 1.
The left side of the above equation is
c[(1+1) + (2+1) + (3+1) + (4+1) + (1+2) + (2+2) + (3+2) + (4+2)] = 32c,
so that c = 1/32.
Suppose now we want P[X = 2]. The events [Y = 1] and [Y = 2] are disjoint and therefore the events [X = 2, Y = 1]
and [X = 2, Y = 2] are also disjoint. Hence,
P[X = 2] = f(2, 1) + f(2, 2) = 3/32 + 4/32 = 7/32.
Similarly,
P[Y = 1] = f(1, 1) + f(2, 1) + f(3, 1) + f(4, 1) = 14/32 and
P[Y = 2] = f(1, 2) + f(2, 2) + f(3, 2) + f(4, 2) = 18/32.
Note that P[Y = 1] = 14/32 and P[Y = 2] = 18/32 specify the p.m.f. of Y when X and Y have the
given joint p.m.f. It is called the marginal probability mass function of Y. We will discuss
this concept in more detail in the next section.
Example 6: Let us obtain the conditional probability P[X = 4 | Y = 2], that is, the
probability that X = 4 given Y = 2, for Example 5.
By definition of the conditional probability,
P[X = 4 | Y = 2] = P[X = 4, Y = 2]/P[Y = 2] = (6/32)/(18/32) = 1/3.
Examples 5 and 6 illustrate that we can obtain probabilities of events associated with r.vs. X
and Y by using the joint p.m.f. Hence, as in the case of a single r.v., the joint p.m.f. of X
and Y is said to specify the joint probability distribution of X, Y. It is therefore enough to
specify the joint p.m.f. of X and Y to answer any question about them.
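For instance, the calculations of Examples 5 and 6 can be checked with the following Python sketch, which assumes the joint p.m.f. f(x, y) = c(x + y) given above.

```python
from fractions import Fraction

# Joint p.m.f. f(x, y) = c(x + y), x = 1, ..., 4, y = 1, 2 (Example 5).
xs, ys = range(1, 5), range(1, 3)
c = Fraction(1, sum(x + y for x in xs for y in ys))        # c = 1/32
f = {(x, y): c * (x + y) for x in xs for y in ys}

# Marginal p.m.f. of Y, and the conditional probability of Example 6.
h = {y: sum(f[(x, y)] for x in xs) for y in ys}            # h(1) = 14/32 = 7/16, h(2) = 18/32 = 9/16
print(c, h[1], h[2])
print(f[(4, 2)] / h[2])                                    # P[X = 4 | Y = 2] = 1/3
```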
The concept of joint distribution of two r.vs. is easily extended to that of three r.vs. X, Y and
Z. We now need to specify the p.m.f.
f(xj, yk, zi) = P[X = xj, Y = yk, Z = zi]
for all ordered triples (xj, yk, zi) of values xj, yk and zi of X, Y and Z.
We can now further extend these concepts to more than three r.vs. But we omit the details
since, in this course, we shall be mostly dealing with the joint distribution of a pair of r.vs. See if
you can solve these exercises now.
E5) The joint p.m.f. f(x, y) of two r.vs. X and Y is given in the following table.
a) Obtain (i) P[X = 2], (ii) P[Y = 0], (iii) P[X = 1, Y ≤ 2], (iv) P[X ≤ 2, Y = 0],
(v) P[X = 2 | Y = 0].
b) Are the events [X = 2] and [Y = 0] independent?
c) Calculate P[X + Y = 4].
E6) The joint p.m.f. of two r.vs. X1 and X2 is given by
Let X and Y be r.vs. with values x1, x2, . . . and y1, y2, . . . , respectively, and joint p.m.f.
f(xj, yk) = P[X = xj, Y = yk].
Define
g(xj) = Σk f(xj, yk), j = 1, 2, . . . .
Then g(xj) ≥ 0 and Σj g(xj) = Σj Σk f(xj, yk) = 1.
Thus, g(xj) has all the properties of a p.m.f. Similarly, you can verify that h(yk) also has all
the properties of a p.m.f. We call these the p.m.fs. of the marginal distributions of X and Y,
as you can see from the following definition.
Definition 4: The function g(xj) defined for all values xj of the r.v. X by the relation
g(xj) = Σk f(xj, yk)
is called the p.m.f. of the marginal distribution of X. Similarly, h(yk) defined for all the
values yk of the r.v. Y by the relation
h(yk) = Σj f(xj, yk)
is called the p.m.f. of the marginal distribution of Y.
Let's try to understand this concept by taking an example.
Example 7: Let X, Y be two r.vs. with joint p.m.f. f(x, y) defined by the following table.
Table 3: Joint p.m.f. f(x, y)
The marginal p.m.f. g(x) of X is obtained by summing all the elements in each of the rows.
Similarly, the marginal p.m.f. of Y is obtained by summing all the elements in each of the
columns. This procedure is a straightforward consequence of the definition of g(x) and of
h(y) when the joint p.m.f. is defined by the above tabular form. In fact, we have
g(0) = P[X = 0] = 1/3
g(1) = P[X = 1] = 5/24
g(2) = P[X = 2] = 11/24
Similarly,
h(0) = P[Y = 0] = 6/24
h(1) = P[Y = 1] = 9/24
h(2) = P[Y = 2] = 6/24
h(3) = P[Y = 3] = 3/24
In this example, we have g(x) = P[X = x] and h(y) = P[Y = y] for all x and y.
Is it a coincidence ? No.
Notice that in the general situation,
g(xj) = Σk P[X = xj, Y = yk],
and recall that the events [X = xj, Y = yk] for fixed xj and different yk values are disjoint.
Hence, by properties P7 and P8 of Unit 6,
g(xj) = P[X = xj].
Similarly,
h(yk) = P[Y = yk].
The discussion, so far, tells us that we can determine the marginal p.m.fs. from a knowledge
of the joint p.m.f. of the two r.vs. But is it possible to determine the joint p.m.f. from a
knowledge of the marginal p.m.fs. ? To answer this, we consider the following two distinct
joint p.m.fs. fl and f2 and the corresponding marginal p.m.fs. The first p.m.f. is given by
So, what do we find ? Although the joint p.m.fs. fl and f2 are different, they lead us to the
same marginal p.m.fs., gl = g2, hl = h2. In other words, the marginal distributions of X and
Y do not determine their joint distribution uniquely. However, there is one particular
situation where this is possible. We now discuss this situation in detail.
Let X and Y be two r.vs. with joint p.m.f. f(x, y) specified in the following table:
Table 4
You can verify from the table that
P[X = x, Y = y] = f(x, y) = g(x) h(y) = P[X = x] P[Y = y]
for all x = 0, 1, 2, and y = 0, 1, 2, 3. In other words, for all possible values x of X and y of Y,
the events [X = x] and [Y =y] are independent. In such a situation, we are justified in
asserting that the r.vs. X and Y are independent r.vs. More formally, we have the following
definition for independent r.vs.
Definition 5: Let X and Y be two r.vs. with joint p.m.f. f(xj, yk) and marginal p.m.fs. g(xj)
and h(yk) of X and Y, respectively. If for all pairs (xj, yk),
f(xj, yk) = g(xj) h(yk),
then we say that the r.vs. X and Y are stochastically independent or, simply, independent.
Note that we have defined independence of r.vs. in terms of independence of events. Thus,
no essentially new concept is involved in the definition of independence of two r.vs., except
that the product relation (3), or equivalently the product relation (4), should hold for all pairs
(x, y) of values x of X and y of Y.
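In practice the product relation can be checked mechanically. The Python sketch below does this for a made-up joint table which is constructed as a product of two marginals, so the check succeeds; for a table like that of Example 7 it would fail.

```python
from fractions import Fraction as F

# A made-up pair of marginals; the joint table is built as their product.
g = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}          # marginal p.m.f. of X (assumed)
h = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}          # marginal p.m.f. of Y (assumed)
f = {(x, y): g[x] * h[y] for x in g for y in h}   # joint p.m.f.

# X and Y are independent iff f(x, y) = g(x) h(y) for every pair (x, y).
independent = all(f[(x, y)] == g[x] * h[y] for x in g for y in h)
print(independent)   # True
```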
Note that the r.vs. X and Y of Examples 7 and 8 are not independent. You can check that
f(0, 0) ≠ g(0) h(0) in Example 7.
Similarly, in Example 8,
With this background, can you extend the concept of independence of two r.vs. to that of
n (> 2) r.vs.?
Definition 6: Let X1, . . . , Xn be n r.vs. They are said to be independent if
P[X1 = x1, . . . , Xn = xn] = P[X1 = x1] P[X2 = x2] . . . P[Xn = xn]
for all n-tuples (x1, . . . , xn) of values x1 of X1, x2 of X2, . . . , xn of Xn.
If you have followed the ideas introduced in this section, then you should be able to solve
these exercises.
E7) Determine the value of c so that the following functions represent the joint p.m.f. of
the r.vs. X and Y.
a) f(x, y) = c, x = 1, 2, 3, y = 1, 2, 3.
b) f(x, y) = c(x² + y²), x = -1, 1, y = -2, 2.
c) f(x, y) = c(x + y + 1), x = 0, 1, 2, 3, y = 0, 1, 2.
E8) Obtain the marginal p.m.fs. of X and Y in each of the cases of E7.
E9) Examine if X and Y are independent in each of the cases of E7.
E10) Suppose the r.vs. X and Y have the joint p.m.f. f(x, y) specified by the following table.
So far, you have seen that the p.m.f. of one or more random variables can be visualised as
their frequency distribution where probabilities correspond to relative frequencies. You also
know that given a frequency distribution, we can find its mean, variance, covariance and
moments. Let us study these concepts for the p.m.f. of a r.v. now.
The problem becomes a little more complicated if we have the following frequency
distribution of the scores of 100 students in the class.
Score       40   50   55   60   75
Frequency   10   15   35   25   15
However, let us rewrite this in a slightly different form as follows. The required average is
40(10/100) + 50(15/100) + 55(35/100) + 60(25/100) + 75(15/100) = 57.
Note that the fractions 10/100, 15/100, 35/100, 25/100 and 15/100 are, in fact, the relative
frequencies or the proportions of the students who obtain the scores 40, 50, 55, 60 and 75,
respectively.
As you know, the arithmetic mean is a measure of central tendency giving a single number
around which the observations are distributed. Now we want to define a similar measure of
central tendency for the probability distributions of a r.v. X, which assumes different values
with their associated probabilities. The only difference is that the role of relative frequencies
is now taken over by the probabilities.
The simplest situation is to consider a r.v. X which takes two values 1 and 2, and suppose
that P[X = 1] = 1/3 and P[X = 2] = 2/3. The mean, or the mathematical expectation, of this
r.v. X is defined to be
E(X) = 1(1/3) + 2(2/3) = 5/3.
Suppose now that a r.v. X takes a finite number n of values x1, x2, . . . , xn with probabilities
f(x1), f(x2), . . . , f(xn). The expectation of X is then defined to be
E(X) = x1 f(x1) + x2 f(x2) + . . . + xn f(xn) = Σj xj f(xj).
Suppose now that the r.v. X assumes an infinity of values x1, x2, . . . with associated
probabilities f(x1), f(x2), . . . . The expectation of X is now defined by the infinite series
E(X) = Σj xj f(xj).
We shall not discuss the definition of E(X) when the infinite series Σj |xj| f(xj) does not
converge. The discussion of such cases is beyond the scope of this course and so, we shall
consider only those r.vs. which have a finite expectation.
The mean of X, expected value of X, mathematical expectation of X, mean of the
distribution of X are some of the synonyms in use for E(X).
We now illustrate the computation of E(X) through some examples.
Example 9 :Let us find the expected score obtained on the roll of an unbiased die.
The score X obtained on the roll of a die is 1, 2, 3, 4, 5 or 6 and each has probability 1/6, i.e.,
P[X = x] = 1/6 for x = 1, 2, . . . , 6. Hence,
E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5.
Example 10: A lottery consists of 100 tickets valued at Rs. 2 each. A person buys 1
ticket and would gain a prize of Rs. 100 if his ticket is the winning ticket. Let us find his
expected gain.
The probability that the person wins the prize is 1/100 and that he loses is 99/100. His net
gain X is Rs. 98 if he wins, and is Rs. (-2) if he loses. Thus, we need to find E(X) when
P[X = -2] = 99/100 and P[X = 98] = 1/100. We get
E(X) = (-2)(99/100) + 98(1/100) = (-198 + 98)/100 = -1.
Thus, his net expected gain is Rs. (-I), i.e., his expected loss is Rs. 1.
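You may like to check Examples 9 and 10 with a small Python sketch such as the following.

```python
from fractions import Fraction as F

def expectation(pmf):
    # E(X) = sum of x f(x) over the values x of X.
    return sum(x * p for x, p in pmf.items())

die = {x: F(1, 6) for x in range(1, 7)}        # Example 9: score on one roll of a die
lottery = {-2: F(99, 100), 98: F(1, 100)}      # Example 10: net gain in rupees

print(expectation(die))      # 7/2, i.e. 3.5
print(expectation(lottery))  # -1, i.e. an expected loss of Re. 1
```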
Now we consider two situations, where the r.v. takes an infinite number of values.
Example 11 : Suppose we want to find the expected value of a r.v. X which has the p.m.f.
By definition
Many a time, we need to calculate not E(X) but the expected value of a function of X, like
X², cos X, exp(tX), etc. Of course, all such functions are again r.vs. and we can use the
definition to calculate their expectation. However, the following example suggests a simple
solution.
Example 12: Let X be a r.v. taking the values -2, -1, 0, 1, 2 with p.m.f.
f(-2) = f(2) = 1/10, f(-1) = f(1) = 2/10, f(0) = 4/10.
Let us calculate E(X²).
Since X assumes the values -2, -1, 0, 1, 2, the values of X² are 0, 1 and 4. Do you agree that
P[X² = 0] = P[X = 0] = 4/10?
Now, since X² = 1 iff X = 1 or X = -1,
P[X² = 1] = P[[X = 1] ∪ [X = -1]]
= P[X = 1] + P[X = -1] = 4/10.
Similarly, P[X² = 4] = P[X = -2] + P[X = 2] = 2/10.
Hence,
E(X²) = 0(4/10) + 1(4/10) + 4(2/10) = 1.2.                                        . . . (7)
Here we first obtained the p.m.f. of X² and then used the definition of E(X²). This, in
general, could be a cumbersome procedure. So let's try another way.
Let us calculate Σ(x=-2 to 2) x² f(x). We get
Σ(x=-2 to 2) x² f(x) = 4(1/10) + 1(2/10) + 0(4/10) + 1(2/10) + 4(1/10) = 1.2.      . . . (8)
The equality E(X²) = Σ(x=-2 to 2) x² f(x), brought out by (7) and (8), is not an accident. It is a
particular case of the following general result.
Theorem 1: If φ is a real-valued function defined on the set of values of the r.v. X, then
E[φ(X)] = Σj φ(xj) f(xj),                                                          . . . (9)
provided the series on the right hand side of (9) is absolutely convergent.
We shall not prove this theorem. But we would like to bring out some important points
concerning it.
Remark 2:
i) We have the following useful interpretation for E[φ(X)]: with the probabilities f(xj)
playing the role of relative frequencies, E[φ(X)] is simply the weighted average of the
values φ(xj).
ii) The illustration in Example 12 is not a proof of the above theorem. The proof is
beyond the scope of this course.
iii) Suppose X and Y are two r.vs. with joint p.m.f. f(xj, yk). Let φ be a real-valued
function defined on the product set G × H, where G = {x1, x2, . . .} is the set of values
of X and H = {y1, y2, . . .} is the set of values of Y.
(We shall be interested in functions of the type φ(xj, yk) = xj + yk, φ(xj, yk) = xj and
φ(xj, yk) = xj yk.)
Let us denote by φ(X, Y) the r.v. which assumes the value φ(xj, yk) when X = xj and
Y = yk. We define, by analogy with the result of Theorem 1,
E[φ(X, Y)] = Σj Σk φ(xj, yk) f(xj, yk).
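The two routes of Example 12 can also be compared numerically. The Python sketch below uses the p.m.f. assumed above for Example 12 and computes E(X²) both ways.

```python
from fractions import Fraction as F
from collections import defaultdict

# p.m.f. assumed above for Example 12.
f = {-2: F(1, 10), -1: F(2, 10), 0: F(4, 10), 1: F(2, 10), 2: F(1, 10)}

# Route 1: first obtain the p.m.f. of X squared, then apply the definition of expectation.
pmf_x2 = defaultdict(F)
for x, p in f.items():
    pmf_x2[x * x] += p
e1 = sum(v * p for v, p in pmf_x2.items())

# Route 2: use (9) directly, E[phi(X)] = sum of phi(xj) f(xj).
e2 = sum((x * x) * p for x, p in f.items())

print(e1, e2)   # both equal 6/5, i.e. 1.2
```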
We can generalise this result and find a simple way of calculating the expectation of the sum
of two r.vs. X and Y. This is given in the following result.
Suppose X and Y are two r.vs. with joint p.m.f. f(xj, yk), j, k = 1, 2, . . . . Suppose E(X) and
E(Y) are finite. Then E(X + Y) is finite and
E (X + Y) = E(X) + E(Y).
This result is true when X and Y take either finite or countably infinite values. We shall not
worry about the proof in the countably infinite case here. The proof in the finite case is very
easy and we are sure you can write it yourself.
E11) If X and Y are two r.vs. with joint p.m.f. f(xj, yk), j = 1, 2, . . . , n, k = 1, 2, . . . , m, and
E(X), E(Y) are finite, then prove that
E(X + Y) = E(X) + E(Y).
If X1, X2, . . . , Xn are r.vs. such that E(Xi) is finite for all i, then X1 + . . . + Xn also has a
finite expectation and
E(X1 + X2 + . . . + Xn) = E(X1) + . . . + E(Xn).
We now list a simple but useful property of E(X).
If a ≤ X ≤ b, a, b ∈ R, i.e., if the values x1, x2, . . . of the r.v. X are such that a ≤ xj ≤ b for all
j = 1, 2, . . . , then a ≤ E(X) ≤ b.
Proof: Observe that because a ≤ xj ≤ b for all j ≥ 1, we have
a = a Σj f(xj) ≤ Σj xj f(xj) ≤ b Σj f(xj) = b.
E12) Prove:
a) If X ≥ 0 and E(X) is finite, then E(X) ≥ 0.
b) If X ≥ Y, i.e., the r.v. X - Y assumes only non-negative values, then
E(X) ≥ E(Y).
E13) Let the p.m.f. of a r.v. X be
f(x) = (3 - x)/10, x = -1, 0, 1, 2.
a) Calculate E(X).
b) Calculate E(X²) by using (9) and also by determining the p.m.f. of X², and verify
that both give the same result.
c) Use the results of (a) and (b) to calculate
E[(4X + 5)²].
E14) Calculate E[exp(tX)] for the distribution discussed in Example 11. Here t is a fixed
number.
E15) An unbiased die is rolled. We say that a success occurs if the score obtained is 1 or 2.
Any other score (i.e. a score of 3, 4, 5 or 6) is called a failure. Let Xk = 0 or 1
according as the k-th trial results in a failure or a success. Notice that X1 + . . . + Xn is
the number of successes obtained in n rolls of the die. Obtain E(Xk) and hence the
expected number of successes in n rolls of the die.
So far we have discussed some of the properties of the expectation of a r.v. X. You have
seen that expectation is regarded as a measure of central tendency of the probability
distribution of X, with the probabilities f(xj) = P[X = xj] playing the role of relative
frequencies. In the next section we will extend these concepts to obtain measures of
dispersion of X around its mean value.
This implies that (xj - μ)² = 0 for all j such that f(xj) ≠ 0. This means that X takes only one
value μ, or that P[X = μ] = 1. In short, Var(X) is zero iff the r.v. X assumes only one value,
i.e., is a constant. Such a r.v. is said to have a degenerate distribution or is said to be a
degenerate r.v.
Now look at some examples, where we have calculated the variance of some r.vs. which
you have already met.
Example 13: Here we calculate the variance of the score obtained on the throw of an
unbiased die.
Let X denote the score obtained on the throw of the unbiased die. Then
E(X) = 7/2 (see Example 9) and E(X²) = (1² + 2² + . . . + 6²)/6 = 91/6, so that
Var(X) = E(X²) - {E(X)}² = 91/6 - 49/4 = 35/12.
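A short Python check of this calculation:

```python
from fractions import Fraction as F

die = {x: F(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in die.items())                    # E(X) = 7/2
var = sum(x * x * p for x, p in die.items()) - mean ** 2     # Var(X) = E(X^2) - {E(X)}^2
print(mean, var)   # 7/2 and 35/12
```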
In Example 11, as well as in the above example, we were required to calculate the sums of
some infinite series. Here is how we find the sums of series of the type
S = p + 2p² + 3p³ + . . . = Σ(j≥1) j p^j, 0 < p < 1.
Since pS = p² + 2p³ + . . . , subtraction gives (1 - p)S = p + p² + p³ + . . . = p/(1 - p).
Therefore, S = p/(1 - p)².
Similarly,
Σ(j≥1) j² p^j = p(1 + p)/(1 - p)³.
This gives us
We now give some important observations concerning the result in E16 in the following
remark.
Remark 3:
i) If we treat Y as a r.v. obtained from X by a change of origin and scale, then E16
implies that the variance is unaffected by a change of origin.
ii) The standard deviation of Y = aX + b is |a| times the standard deviation of X.
iii) Suppose E(X) = μ and Var(X) = σ², where σ is the standard deviation of X.
Then the mean and variance of Y = (X - μ)/σ are zero and one, respectively. The r.v.
X* = (X - μ)/σ
is called the standardised or normalised version of X.
Our next aim is to obtain Var(X + Y). For this purpose we need to introduce the concept of
covariance of the two r.vs. X and Y.
Let X and Y be two r.vs. with joint p.m.f. f(xj, yk), j, k = 1, 2, . . . . Then
E(XY) = Σj Σk xj yk f(xj, yk),                                                     . . . (19)
where the sum of the series on the right is assumed to be finite (see Remark 2(iii)). (Notice
that (|xj| - |yk|)² ≥ 0 implies xj² + yk² ≥ 2|xj yk|.) Let μx and μy denote the means of X and
Y, respectively. Now we are in a position to define the covariance of X and Y:
Cov(X, Y) = E[(X - μx)(Y - μy)] = E(XY) - μx μy.
It follows that if Var(X) and Var(Y) are finite, then Cov(X, Y) is finite.
We illustrate the procedure for the computation of Cov(X, Y) by means of an example now.
Example 16: Suppose the joint p.m.f. of X, Y is given by the following table:
Table 5

 x \ y      0        1        2      g(x)
   0       3/28     9/28     3/28    15/28
   1       3/14     3/14      0       3/7
   2       1/28      0        0      1/28
  h(y)     5/14    15/28     3/28      1
We have E(X) = 0(15/28) + 1(3/7) + 2(1/28) = 1/2. Similarly,
E(Y) = 0(5/14) + 1(15/28) + 2(3/28) = 3/4 and E(XY) = (1)(1)(3/14) = 3/14. Hence
Cov(X, Y) = E(XY) - E(X)E(Y) = 3/14 - 3/8 = -9/56.
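These numbers can be reproduced directly from Table 5 with a short Python sketch such as the following.

```python
from fractions import Fraction as F

# Joint p.m.f. of Example 16 (Table 5); keys are (x, y), missing pairs have probability 0.
f = {(0, 0): F(3, 28), (0, 1): F(9, 28), (0, 2): F(3, 28),
     (1, 0): F(3, 14), (1, 1): F(3, 14),
     (2, 0): F(1, 28)}

ex  = sum(x * p for (x, y), p in f.items())        # E(X)  = 1/2
ey  = sum(y * p for (x, y), p in f.items())        # E(Y)  = 3/4
exy = sum(x * y * p for (x, y), p in f.items())    # E(XY) = 3/14
print(exy - ex * ey)                               # Cov(X, Y) = -9/56
```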
You must have noticed that the troublesome step in this calculation is the computation of
E(XY). But for some r.vs. this is simplified. We establish this in the following theorem.
Theorem 2: If X and Y are independent r.vs. and have finite expectations, then
E(XY) = E(X)E(Y).
Proof: Since X and Y are independent r.vs., their joint p.m.f. is
f(xj, yk) = g(xj) h(yk),
where g(xj) and h(yk) are the marginal p.m.fs. of X and Y, respectively (see Definition 5).
Hence
E(XY) = Σj Σk xj yk g(xj) h(yk) = [Σj xj g(xj)] [Σk yk h(yk)]
= E(X) E(Y).
We generalise this result for n independent r.vs. in the following corollary.
Corollary: If X1, X2, . . . , Xn are n independent r.vs. with finite expectations, then
E(X1 X2 . . . Xn) = E(X1) E(X2) . . . E(Xn).
Theorem 3: If X and Y are r.vs. with finite variances, then
Var (X + Y) = Var (X) + Var (Y) + 2 Cov (X, Y)
and, when Cov (X, Y) = 0,
Var (X + Y) = Var (X) + Var (Y).
We omit the proof of this result. The result about n independent r.vs. now follows.
Corollary: If X1, . . . , Xn are independent r.vs. with finite variances, then
Var (X1 + . . . + Xn) = Var (X1) + . . . + Var (Xn).
In fact, it is enough to assume that the r.vs. X1, . . . , Xn have pairwise zero covariances to
claim this result. Try to do this exercise now. It concerns the definitions and results which
we have just discussed.
E17) Let the joint distribution of X and Y be as specified in Example 16. Obtain
Var (X + Y).
Theorem 4: Let Z1 = aX + b and Z2 = cY + d, where a, b, c, d are constants. Then
Cov (Z1, Z2) = ac Cov (X, Y).
Proof: We have E(Z1 Z2) = ac E(XY) + ad μx + bc μy + bd and
E(Z1) E(Z2) = ac μx μy + ad μx + bc μy + bd. Hence
Cov (Z1, Z2) = E(Z1 Z2) - E(Z1) E(Z2)
= ac E(XY) + ad μx + bc μy + bd - ac μx μy - ad μx - bc μy - bd
= ac [E(XY) - μx μy]
= ac Cov (X, Y), as required.
We can use this theorem to arrive at the following result.
Corollary: If X and Y are r.vs. with Cov(X, Y) = 0, then
Var (X - Y) = Var (X) + Var (Y).
Proof: Applying Theorem 4, we get
Cov (X, -Y) = - Cov (X, Y).
Also, by using the result in E16, we can write Var (-Y) = Var (Y).
Hence, in general,
Var (X - Y) = Var (X) + Var (-Y) + 2 Cov (X, -Y)
= Var (X) + Var (Y) - 2 Cov (X, Y).
When Cov (X, Y) = 0, this reduces to
Var (X - Y) = Var (X) + Var (Y).
We conclude this section with the discussion of the correlation coefficient between X and Y.
The definition of the correlation coefficient is very similar to that of the correlation
coefficient you encountered in Unit 4 in connection with bivariate data.
Definition 10: The correlation coefficient between X and Y is defined to be
ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
In this definition we assume that Var(X) and Var(Y) are both finite and positive and that
the square root in the denominator is the positive square root. We give below some simple
properties of the correlation coefficient.
1) Let Z1 = aX + b and Z2 = cY + d, where a ≠ 0 and c ≠ 0. Then
ρ(Z1, Z2) = ρ(X, Y) if ac > 0, and ρ(Z1, Z2) = -ρ(X, Y) if ac < 0.
Proof: Recall that Cov (Z1, Z2) = ac Cov (X, Y) and that Var (Z1) = a² Var (X) and
Var (Z2) = c² Var (Y). Hence
ρ(Z1, Z2) = ac Cov(X, Y) / (|a| |c| √(Var(X) Var(Y))) = (ac/|ac|) ρ(X, Y).
Conversely, suppose ρ(X, Y) = 1. Then from the proof of the second property above, we
have
Var (X* - Y*) = 0.
This implies that X* - Y* is a degenerate r.v., or that
X* - Y* = c, a constant.
Since E(X*) = E(Y*) = 0, c = 0. Equivalently,
Y = aX + b,
where a = σy/σx and b = μy - (σy/σx) μx.
The proof for the case when ρ(X, Y) = -1 is similar. In that case we use the result
Var (X* + Y*) = 0.
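As an illustration of the definition, the correlation coefficient for the joint p.m.f. of Example 16 can be computed as follows.

```python
from fractions import Fraction as F
from math import sqrt

# Joint p.m.f. of Example 16 again (missing pairs have probability 0).
f = {(0, 0): F(3, 28), (0, 1): F(9, 28), (0, 2): F(3, 28),
     (1, 0): F(3, 14), (1, 1): F(3, 14),
     (2, 0): F(1, 28)}

ex = sum(x * p for (x, y), p in f.items())
ey = sum(y * p for (x, y), p in f.items())
varx = sum(x * x * p for (x, y), p in f.items()) - ex ** 2    # 9/28
vary = sum(y * y * p for (x, y), p in f.items()) - ey ** 2    # 45/112
cov = sum(x * y * p for (x, y), p in f.items()) - ex * ey     # -9/56
print(float(cov) / sqrt(float(varx * vary)))                  # about -0.447
```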
We have given a number of examples in this section to show how to obtain the mean,
variance, covariance, etc. for random variables. Now would you like to try your hand at
these exercises?
E18) Compute the means, the variances, the covariances and the correlation coefficients for
the joint distribution of E7 and E10.
E19) Obtain the variance of the total number of successes in E15 under the assumption that
X1, X2, . . . , Xn are independent r.vs.
So far we have discussed many concepts for a random variable with a given p.m.f. We had
talked about the same concepts in relation to a frequency distribution in Block 1. In the next
section we will take up the study of yet another concept.
We have studied the properties of E(X) and Var(X) in the previous two sections. There are
expectations of some functions of r.vs associated with a probability distribution, which play
an important role in statistical theory. We plan to study properties of some of these in this
section.
Let r be a positive integer. The r-th moment of a r.v. X or of its probability distribution is
μ'r = E(X^r) = Σj xj^r f(xj),
provided, of course, the series on the right is absolutely convergent. Sometimes we need to
use
μr(a) = E[(X - a)^r],
which is called the r-th moment of X about a. In this sense μ'r is the r-th moment about the
origin (a = 0).
Of course, when r = 0, X^0 = 1 and therefore, μ'0 = 1. The first moment μ'1 is the, by now
familiar, expected value or mean of X. The variance, Var(X), is the second moment of X
about its mean, μ2(μ'1).
(Note that |u|^(r-1) ≤ 1 if |u| ≤ 1, and |u|^(r-1) ≤ |u|^r if |u| ≥ 1.) Hence we can assert that,
whatever be the real number u, |u|^(r-1) ≤ |u|^r + 1. A consequence of this inequality is the
following:
Σj |xj|^(r-1) f(xj) ≤ Σj (|xj|^r + 1) f(xj) = 1 + Σj |xj|^r f(xj).
Thus, whenever the r-th moment of X is finite, so is the (r - 1)-th moment. In particular, all
the moments μ's, s ≤ r, would be finite.
We do not enter into any detailed study of the properties of moments of an r.v. except to
introduce the so-called moment generating function which will be useful to us in Units 8
and 9.
Let t be a real variable and suppose that
Mx(t) = E[exp(tX)] = Σj exp(t xj) f(xj)
is finite. Mx(t) is called the moment generating function (m.g.f.) of X. Expanding exp(tX)
and taking expectations term by term, we get
Mx(t) = 1 + μ'1 t + μ'2 t²/2! + . . . + μ'r t^r/r! + . . . .
In other words, μ'r is the coefficient of t^r/r! in Maclaurin's expansion of the m.g.f. In fact, we
can write, for a r.v. X which takes the value 1 with probability 1/3 and the value 0 with
probability 2/3 (as in E15),
Mx(t) = (2/3) + (1/3) exp(t) = 1 + (1/3)[t + t²/2! + . . .].
Since here the coefficient of t^r/r! is 1/3 for every r ≥ 1,
we find that
μ'0 = 2/3 + 1/3 = 1 and μ'r = 1/3, r = 1, 2, . . . .
More generally, if Y = aX + b, then My(t) = E[exp(t(aX + b))] = exp(bt) Mx(at).
In particular, if X* = (X - μx)/σx is the standardised version of X, then the m.g.f. of X* is
exp(-t μx/σx) Mx(t/σx).
Next, suppose X and Y are independent r.vs. Then, since exp(tX) and exp(tY) are also
independent, Theorem 2 gives
M(X+Y)(t) = E[exp(t(X + Y))] = E[exp(tX) exp(tY)] = E[exp(tX)] E[exp(tY)]
= Mx(t) My(t),
which is the required result.
We shall talk more about the probability distribution of the sum of two r.vs. in the next
section. But before that we are giving you a simple exercise to do.
E21) Obtain the m.g.f. and the moments of the r.v. in Example 9.
In Example 2 we have discussed the probability distribution of the sum of scores obtained
on two rolls of an unbiased die. In this section we are interested in the methods of obtaining
the distribution of the sum of two r.vs. We begin with the following simple
example.
Example 18: The joint p.m.f. of (X, Y) is as specified in Table 7.
Table 7
[X + Y = 5] = [X = 2, Y = 3].
It immediately follows that
P[X + Y = v] = Σ f(xj, yk),
where the sum extends over all those (xj, yk) which add up to v.
This general procedure, though valid in principle for all discrete r.vs., is cumbersome,
except in very simple situations. We, therefore, investigate a special case in which
simplification is possible.
Suppose X and Y are independent r.vs. which assume non-negative integral values
0, 1, 2, . . . . Let P[X = x] = f(x) and P[Y = y] = g(y), x, y = 0, 1, 2, . . . . Because of the
independence of X and Y,
P[X = x, Y = y] = f(x) g(y)
for all x and y. In order to obtain the p.m.f. of X + Y, observe that X + Y assumes the values
0, 1, 2, . . . . Moreover, the event [X + Y = r] is the union of the disjoint events
[X = 0, Y = r], [X = 1, Y = r - 1], . . . , [X = r, Y = 0].
Suppose, for instance, that P[X = x] = P[Y = x] = (1/2)^(x+1), x = 0, 1, 2, . . . ,
i.e., X and Y are independent r.vs. with the same p.m.f. The p.m.f. of X + Y is given by
P[X + Y = r] = Σ(j=0 to r) P[X = j] P[Y = r - j]
= Σ(j=0 to r) (1/2)^(j+1) (1/2)^(r-j+1)
= (r + 1)(1/2)^(r+2), r = 0, 1, 2, . . . .
When you study geometric distribution in Unit 9, you will come across a more general result
of which this example is a particular case.
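The convolution can also be checked numerically. The Python sketch below uses the p.m.f. f(x) = (1/2)^(x+1) assumed above, truncating the infinite range of values for the computation.

```python
from fractions import Fraction as F

# Assumed common p.m.f. of X and Y: f(x) = (1/2)**(x + 1), x = 0, 1, 2, ..., truncated at N terms.
N = 20
f = [F(1, 2) ** (x + 1) for x in range(N)]

def pmf_of_sum(r):
    # P[X + Y = r] = sum over j of P[X = j] P[Y = r - j].
    return sum(f[j] * f[r - j] for j in range(r + 1))

for r in range(4):
    print(r, pmf_of_sum(r), (r + 1) * F(1, 2) ** (r + 2))   # the last two columns agree
```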
Here are some exercises for you.
E22) Obtain the distribution of X+Y when the joint p.m.f. of X and Y is as specified in
Examples 7 and 8.
This brings us to the end of this unit. In it we have discussed the probability distribution of a
random variable at length. Let's now briefly recall the various concepts which we have
covered here.
7.8 SUMMARY
2) The expectation E(X) = Σj xj f(xj) of a r.v. X with p.m.f. f(x), its variance
Var (X) = E(X²) - {E(X)}² and the covariance Cov(X, Y) = E(XY) - E(X)E(Y) are
important characteristics of the r.vs. They have some simple properties like
E(X + Y) = E(X) + E(Y), E(aX) = aE(X),
Var (aX + b) = a² Var (X),
Var (X + Y) = Var (X) + Var (Y) + 2 Cov (X, Y), etc.
3) If X and Y are independent r.vs., they have zero covariance. But r.vs. with zero
covariance are not necessarily independent. The correlation coefficient
ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)) always lies between -1 and +1.
5) It is possible to obtain the p.m.f. of X + Y from the joint p.m.f. of X and Y . Some
simplification is possible when X and Y are non-negative integer-valued r.vs.
E2) The possible values of W are -5, -4, . . . , 0, 1, . . . , 5 and its p.m.f. is given by
E3) The possible values of Y are 0, 1, 2 and 3. To obtain P[Y = 2], for example, observe
that there are (13 choose 2) ways of selecting 2 spades out of 13 spades, and the third card
Similarly,
and
E4) If X = number of attempts, the possible values of X are 1, 2, 3, 4. It is easy to check
that P[X = 1] = P[X = 2] = P[X = 3] = P[X = 4] = 1/4, which gives us the p.m.f. of X.
To obtain its probability distribution, we need to specify P[X ∈ H], where H is a subset
of S = {1, 2, 3, 4}. There are 16 subsets of S. In fact, we have
(ii) 817
(iii) 419
(iv) 7/27
(v) 318.
ii)
Σ(j=4 to 10) (10 choose j) (1/8)^j (7/8)^(10-j).
iii)
E7) a) 1/9
b) 1/20
c) 1/42
E8) a) g(x) = 1/3, x = 1, 2, 3; h(y) = 1/3, y = 1, 2, 3.
b) g(x) = (x² + 4)/10, x = -1, 1;
h(y) = (y² + 1)/10, y = -2, 2.
c) g(x) = (x + 2)/14, x = 0, 1, 2, 3;
h(y) = (2y + 5)/21, y = 0, 1, 2.
E9) X and Y are independent in cases a) and b).
E11) Since E(X) and E(Y) are finite, the series above are absolutely convergent. Hence
Σj Σk |xj + yk| f(xj, yk) < ∞.
Therefore E(X + Y) is defined and
E(X + Y) = E(X) + E(Y).
E16) Var (aX + b) = E[{aX + b - E(aX + b)}²]
= E[a²{X - E(X)}²]
= a² Var (X).
E17) From Example 16 we get
Cov (X, Y) = -9/56,
Var (X) = 9/28 and Var (Y) = 45/112. Hence
Var (X + Y) = Var (X) + Var (Y) + 2 Cov (X, Y) = 45/112.
E20) Var (aX + bY) = a² Var (X) + b² Var (Y) + 2ab Cov (X, Y).
E21) M.g.f. = E[exp(tX)].
Now X takes the values 1, 2, . . . , 6, each with probability 1/6. Hence
Mx(t) = Σj exp(tj) f(j) = (1/6)[exp(t) + exp(2t) + . . . + exp(6t)].