Lec1 6
Hello and welcome to this lecture. In the previous lecture we looked at multiple random variables: how to describe the joint distribution of multiple discrete random variables. In particular, I gave some simple examples of joint PMFs; we looked at the scenarios and computed what the joint PMF would be.
Now, we are going to start looking at the marginal PMF. I ended the previous lecture by saying that when you have a lot of random variables, the joint PMF can become a very complicated object which you cannot easily describe. So, you look for simpler alternatives, and marginalization and conditional distributions are very nice pathways to get to the joint distribution.
So, let us start by looking at the marginal PMF. In particular, the individual marginal PMF is what I will start with, and then we will slowly generalize this. So, we are going to look at multiple discrete random variables; we already saw the marginal PMF in the context of two random variables. Now, we are going to increase the two to multiple; that is nothing more. So, let us say you have n random variables X1, X2, ..., Xn, and these are distributed with some joint PMF f_{X1 X2 ... Xn}. The PMFs of the individual random variables X1, X2, ..., Xn are the individual marginal PMFs that we are interested in.
And we can quite easily see that the PMF of X1 itself, evaluated at some t, is going to be basically P(X1 = t). And it turns out you can find these marginal PMFs by summing the joint PMF over suitable ranges. What do you sum over? You basically sum over all that you do not want to keep. So, if you are looking at f_{X1}, you have to sum over everything other than X1. The joint PMF of course will depend on X1, X2, all the way to Xn.
So, you sum over the joint PMF; you can see here in the first formula:

f_{X1}(t) = P(X1 = t) = \sum_{t2 \in T_{X2}} \sum_{t3 \in T_{X3}} \cdots \sum_{tn \in T_{Xn}} f_{X1 X2 ... Xn}(t, t2, ..., tn).

I am summing over t2, which is in the range of X2; t3, which is in the range of X3; all the way to tn, which is in the range of Xn. What am I summing? I am summing the joint PMF evaluated at t, which is the value of X1 which I want to keep. So, I will not be summing over t, I want to keep that t; and then everything else runs over all possible values. The proof is very simple; it is very similar to what we did before.
Basically, I am saying I want the probability that X1 equals t; I write the event X1 = t, X2 = t2, X3 = t3 and so on, and simply add up over all possible values of t2, ..., tn; that is exactly what it is. It is a very simple formula, except that when you execute it, sometimes it can become a bit confusing; but keep this rule in mind. When you marginalize, you sum over everything that you do not want and keep only what you want; you get the marginal PMFs.
So, when you go to X2, what will you do? You will keep X2 alone, put variables for all the other random variables, and sum over all of them; you get the marginal PMF of X2, and so on. So, that is marginalization for multiple discrete random variables.
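If you like to think of this computationally, here is a minimal Python sketch; this is my own illustration, not from the lecture, and it assumes the joint PMF is stored as a dictionary from outcome tuples (t1, ..., tn) to probabilities.

```python
def marginal_of_first(joint_pmf):
    """Marginal PMF of X1: keep the first coordinate, and sum the joint
    PMF over all the other coordinates."""
    marginal = {}
    for outcome, prob in joint_pmf.items():
        t1 = outcome[0]                            # the value we keep
        marginal[t1] = marginal.get(t1, 0) + prob  # sum over everything else
    return marginal
```

The loop runs over the support of the joint PMF; every coordinate other than the first gets summed out automatically, because all outcomes sharing the same t1 accumulate into one entry.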
(Refer Slide Time: 03:28)
So, let us see a few examples; we have always been seeing examples, so let us look at one. I am going to toss a fair coin thrice; it is a fair coin. X1 is the indicator for the first toss being heads; as usual, X1, X2, X3 are 1 or 0 depending on whether the first toss, second toss, third toss is heads, respectively. We have seen the joint PMF before; every outcome has probability 1/8. If you want to look at the marginal, say f_{X1}(0), notice what I am doing here. I am keeping the 0 fixed; so, let us maybe use some colour here, blue. I am keeping the 0 as the value of X1 throughout, and the other coordinates are varied across; you marginalize them out. You sum over all these things: you keep the 0 fixed for X1, vary everything else, and simply add them up; you get half:

f_{X1}(0) = f(0, 0, 0) + f(0, 0, 1) + f(0, 1, 0) + f(0, 1, 1) = 1/8 + 1/8 + 1/8 + 1/8 = 1/2.
So, this is a very simple problem, and you see how the marginals work out. If you do X2 you will get the same answer, and X3 will also give the same answer. Of course, we knew the distribution of X1, X2, X3 before; it is not very difficult to do this.
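To connect this with the computational sketch above, here is how one might check this example in code; again, my own illustration, not part of the lecture.

```python
from itertools import product

# Joint PMF of (X1, X2, X3) for three fair coin tosses:
# all eight outcomes are equally likely, probability 1/8 each.
joint = {outcome: 1 / 8 for outcome in product([0, 1], repeat=3)}

f_X1 = {}
for (t1, t2, t3), prob in joint.items():
    f_X1[t1] = f_X1.get(t1, 0) + prob   # keep t1, sum over t2 and t3

print(f_X1)   # {0: 0.5, 1: 0.5}, matching the 1/2 computed above
```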
(Refer Slide Time: 05:04)
Let us go to our slightly more interesting example of numbers from 000 to 999. X is the first digit from the left, Y is the number modulo 2, and Z is the first digit from the right. In this case you might say, I will have to sum over the joint PMF; but really, you know the marginals directly and easily. You can write them down; there is no need to sum over the joint PMF or anything. If you look at the first digit from the left, what is the probability that the first digit is going to be equal to 0? What is the probability that it is going to be equal to 1? You will see all of them are 1/10. There are 100 out of the 1000 cases which are favourable for every particular first digit from the left. If you want the first digit to be 0, you have 000 to 099; that is a hundred favourable out of the overall 1000, so you get 1/10. So, it is easy to quickly see that

f_X(t) = 1/10 for t = 0, 1, ..., 9.

So, you see, quite often you can find the marginal PMF directly from the experiment. You may not have to do the summation and the marginalization very painfully; you may be able to deal with the marginal directly. That is something important to remember as well; do not go finding the joint PMF all the time.
You may be able to quickly find the marginal directly. Likewise, Y is the number modulo 2; so Y is going to be 0 if the number is even, and 1 if the number is odd. Exactly 500 out of these 1000 numbers are even, and the other 500 are odd. So, if you only care about even or odd, it is going to be uniform on {0, 1}:

f_Y(0) = f_Y(1) = 500/1000 = 1/2.

Same thing with Z, the last digit:

f_Z(t) = 1/10 for t = 0, 1, ..., 9.

The units place digit is also uniform from 0 to 9. So, quite often the marginal may actually be quite easy to find; you may not have to worry too much about adding up over the joint PMF and all that.
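If you do want to verify this by brute force, here is a small sketch; this is my own check, not from the lecture.

```python
from fractions import Fraction

# Enumerate all 1000 equally likely numbers 000 to 999 and read off
# the marginals of X, Y, Z directly.
p = Fraction(1, 1000)
f_X, f_Y, f_Z = {}, {}, {}
for n in range(1000):
    x = n // 100        # X: first digit from the left
    y = n % 2           # Y: the number modulo 2
    z = n % 10          # Z: first digit from the right
    f_X[x] = f_X.get(x, 0) + p
    f_Y[y] = f_Y.get(y, 0) + p
    f_Z[z] = f_Z.get(z, 0) + p

print(f_X[3], f_Y[0], f_Z[7])   # 1/10, 1/2, 1/10
```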
So, let us come to this very interesting case. We have been talking a lot about data and practical examples, more complicated examples where the sample space is difficult to specify and all that. This IPL powerplay over 1, the first over of the powerplay, is maybe a good example of such a situation. We will assume that this over has 6 deliveries; there are cases where it has 7, so let us just forget about the seventh delivery and consider the over to be the first 6 deliveries that are bowled. There will always be 6 deliveries in the over, so we do that. And Xi is the number of runs scored in the i-th delivery; that is the variable we have been looking at.
Now, what do I do for the marginal distribution of Xi? If you remember, I gave you this example and said the joint PMF of X1 to X6 looks formidable to calculate. It is formidable not just to calculate but even to specify, write down, or think about. It seems like it is all over the place; you may not be able to really do something very clean. But, you will see, interestingly, you can do something very reasonable for getting the marginal PMFs of X1 to X6. It is not too bad; like I mentioned, there are about 1500 IPL matches that have already happened, 1598 to be precise. And if you tabulate the data from there, it is not too unreasonable to think of the marginal distribution of X1.
So, let me show you how this works. We have seen before that you can take the range of X1 to be {0, 1, 2, ..., 8}; it is very unlikely that more than 8 is going to happen. The trick is how to assign probabilities to these values. How do we do this? We have never done something like this before, have we? Previously, it has always been some toy experiment; we have always been able to sort of figure out what the probability should be. So how do you assign probabilities here? Maybe you can guess from your experience of IPL matches. What is going to be the most probable run scored?
Let us say in the first ball, mostly the batsman is going to defend or something. So, for the first ball X1, it is very likely that 0 is the dominant number. In fact, you may even guess that 0 would be the most probable run that somebody scores off a delivery. Most cases may be 0; a boundary can happen. But, how do we assign probabilities? What is a meaningful way to assign the probability? Traditionally, what people do is this; again, this is probably not the best method, but it is a reasonable method, and it is not too bad. You can go out and collect data on past occurrences.
So, there have been, like I said, 1598 matches where the first over has been bowled so far. And you go and see what happened in ball 1; as in, how many times 0 runs were scored. It turns out that in 957 matches 0 runs were scored; look at the large fraction. And 1 run was scored in 429 matches; together that covers a significant number. Then you have 2 runs in 57 matches, 3 runs in 5 matches; and 4, the boundary, seems to be more popular: 138 matches. Then 5 runs in 8 matches and 6 runs in 4 matches; that is it. So far it looks like nobody has bowled a no ball and been hit for a six off the first ball, so you do not have a 7 or 8.
So, one of the ways is to assign probabilities in the same proportion as the data; maybe you want to say P(X1 = 0) = 957/1598. I am not claiming, once again, that this is the best way or anything like that; there is nothing like the best in these kinds of things. It seems reasonable, and there are good reasons why this might be okay; we will see later on why this might be a good way to assign these kinds of probabilities. But at least intuitively, most of you would say yes, that seems reasonable; so you can do that. Now, you can repeat this for ball 2.
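As a quick sketch of this proportion rule: the counts below are the ball-1 counts quoted above; the code itself is my own illustration.

```python
# Ball-1 outcomes across the 1598 first overs, as quoted in the lecture.
counts_ball1 = {0: 957, 1: 429, 2: 57, 3: 5, 4: 138, 5: 8, 6: 4}
total = sum(counts_ball1.values())                 # 1598

# Assign probabilities in the same proportion as the data.
f_X1 = {runs: count / total for runs, count in counts_ball1.items()}
print(round(f_X1[0], 4), round(f_X1[1], 4))        # 0.5989 0.2685
```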
So, notice how these numbers are large enough; I mean, I would have liked to have 15000 matches, then maybe these numbers would all be much better. And, you know, somebody might say there have been only 4 matches where a six was hit. But, there have been about 1600 matches and I am looking at only 6 or so possibilities; so it is okay, it seems like I have enough data. It feels like I have enough data to make this statement; I am making a lot of loose statements here. But, these are important things to just think about. I have seen enough matches to be able to comment about what happens in the first ball of an over; it is not very unreasonable.
So, I have done this now; I have taken all the balls and tabulated in how many matches 0 runs were scored, in how many matches 1 run was scored, in the first ball, second ball, third ball et cetera; and I put down the distribution in this little table here. So, you can see one is able to think of the marginals in a reasonable way; maybe I want to assign a probability of 0.5989 to X1 being 0, and a probability of 0.2685 to X1 being 1. This is again in proportion to how many times this has happened in the past. So, maybe going forward this is a reasonable way to assign the probabilities.
And you can notice there are some subtle differences. For instance, a six is much more likely, some 6 to 7 times more likely, to be hit off the third or fourth ball. The fourth ball seems to be very interesting for hitting sixes, and you can also look at boundaries; again, the fourth ball is much more likely than anything else. In general it seems very interesting, and it looks like 0 is least likely in the last ball. So, the proportion of times 0 is scored is lowest in the last ball; but still more than 50 percent of the balls are dot balls. That is an interesting observation as well: even on the last ball, only about half the time across the matches do people score any runs.
So, you notice, I will keep coming back to this example over and over again this week and in the next week also. This kind of thing is very important for data science: seeing the connection between where the probability distribution is coming from, where statistics will eventually come from, and where data is entering the picture in terms of coming up with something. And I also want you to think about how, if you want to look at the joint PMF of all the 6 balls, 1598 matches is not good enough. There are just too many cases, and many of them will not appear even once; you cannot really put anything down there meaningfully.
I will come back and comment on this for you later on as well. But at least this much we are able to do: the marginals, with this kind of data, we are able to easily write down. So, maybe the marginal is a good idea; let us see what else we can do with this data going forward. So, what is the moral of the story? In large problems, when there are a lot of random variables, marginals are the only way in which you are going to make progress. You cannot deal with the joint PMF; you have to look for smaller distributions and try to stitch them together, and how we do that we will see in this lecture.
So, let me now generalize a little bit; this is also important. Let us say we have 3 random variables now, and they have a joint distribution f_{X1 X2 X3}; I am taking a very simple case. Just now we saw n random variables and all that; let us start with 3; after 2 we should go to 3, so we are going to 3. We have discussed individual marginal PMFs, so you have f_{X1}, f_{X2}, f_{X3}. Now, what about f_{X1 X2}? This is the joint PMF of X1 and X2; it is a very reasonable object. Just now I mentioned how the joint PMF of everything might be difficult. But, what about the joint PMF of X1 and X2 alone? It does not seem that bad; it is a very valid object, and we can say something about it, and likewise f_{X1 X3} and f_{X2 X3}. So, there is some meaning in this; it seems reasonable to think of these pair-wise distributions, for instance, when you have 3 random variables, and not just the entire joint PMF.
So, it turns out this is very much possible, and you can do it exactly like before, the principle being: when you want to marginalize, keep only what you want, and sum over everything you do not want. That is the principle of marginalization. So, if I want X1, X2, I want to find the joint PMF of X1 and X2; I am given the joint PMF of the whole thing, let us say f_{X1 X2 X3}. How do I marginalize and find it? You sum over all possibilities for X3:

f_{X1 X2}(t1, t2) = P(X1 = t1, X2 = t2) = \sum_{t3 \in T_{X3}} f_{X1 X2 X3}(t1, t2, t3).

They can even be other pairs of random variables, X1, X3 for instance; what do you do for X1, X3?

f_{X1 X3}(t1, t3) = P(X1 = t1, X3 = t3) = \sum_{t2 \in T_{X2}} f_{X1 X2 X3}(t1, t2, t3).

Same thing with X2, X3; you sum over all possibilities for X1. It is a very simple extension. So, we started with the entire joint PMF, and then we said maybe the marginals are interesting; we looked at the individual marginals.
Now, we have pair-wise marginals, and you can extend to other situations; this principle is important to remember. Whenever you want to marginalize, you keep what you want, sum over everything you do not want, and you get the marginal.
(Refer Slide Time: 17:13)
So, here is an example; I mean, this is a simple enough example that I can do this for you. If you want f_{X1 X2}, let us first marginalize over t3. I am going to have t1 and t2; t1 takes values 0, 1 and t2 takes values 0, 1. So, if you want to do f_{X1 X2}(0, 0), what will this be? Maybe I should use different colours; so use blue. 0 0 is blue: you have 0 0 here, 0 0 here and 0 0 here, and I have to sum over all possibilities for t3; so if I do that, I have to add

f_{X1 X2}(0, 0) = 1/9 + 1/9 + 1/9 = 1/3.

Let us change colour to look at 0 1. If I look at 0 1, that would be X1 = 0 and X2 = 1; you fix t1 to 0, t2 to 1 and then sum over all possible values of t3. There are only two such entries, so

f_{X1 X2}(0, 1) = 1/9 + 1/9 = 2/9.

Let us change to some other colour, an orangish colour: for 1 0, f_{X1 X2}(1, 0) = 2/9. And then one can change to, I do not know, magenta or something like that, for f_{X1 X2}(1, 1) = 2/9. So, this is my joint PMF of X1 and X2. One can also do other joint PMFs; let us say we do f_{X1 X3}, just for variety. We have t1 here and t3 here. Remember now t3 can take 3 values; t1 is 0, 1 and t3 is 0, 1, 2, so you have 6 possibilities here. I am not going to bother with the colours; bear with me for that. So, I want X1 = 0 and X3 = 0; so 0 0, maybe the first entry alone; let us look at this in black. That is it, that is the only possibility; so

f_{X1 X3}(0, 0) = 1/9.

What about 0 2? That has not one but two possibilities, so f_{X1 X3}(0, 2) = 2/9; you get a 2/9 there. And then you have 1 1 being one possibility, f_{X1 X3}(1, 1) = 1/9; and so on for the remaining entries. Notice how the marginalization is working and how we are able to do it very easily given the table; it is a simple enough calculation to do. Of course, when the numbers become bigger, the problem becomes more complex, and these things are difficult to do by hand. But at least for simple toy problems in basic probability, one can do this very clearly. Alright, I thought I would put one sheet here for working, but I really do not need to; it is clear enough what I have to do.
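Since the slide with the table is not reproduced in the transcript, here is one joint PMF that is consistent with all the values computed above: nine equally likely triples with probability 1/9 each. The exact support is my reconstruction, and the code is only an illustration.

```python
from fractions import Fraction

# A 3-variable joint PMF consistent with the lecture's numbers:
# nine equally likely triples (t1, t2, t3), each with probability 1/9.
support = [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2),
           (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 2)]
joint = {triple: Fraction(1, 9) for triple in support}

def pairwise_marginal(joint_pmf, keep):
    """Keep the coordinates listed in `keep`, sum over the rest."""
    marginal = {}
    for outcome, prob in joint_pmf.items():
        key = tuple(outcome[i] for i in keep)
        marginal[key] = marginal.get(key, 0) + prob
    return marginal

print(pairwise_marginal(joint, (0, 1)))  # f_{X1 X2}: (0,0) -> 1/3, rest 2/9
print(pairwise_marginal(joint, (0, 2)))  # f_{X1 X3}: (0,0) -> 1/9, (0,2) -> 2/9, ...
```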
So, let us go slightly further; we did 2 random variables, 3 random variables, and the next logical thing is 4 random variables. I am trying to show you that you can have all sorts of variety here; again, the same principle: you sum over everything you do not want. If you want the marginal of X1 from the joint, you sum over t2, t3, t4:

f_{X1}(t1) = \sum_{t2 \in T_{X2}} \sum_{t3 \in T_{X3}} \sum_{t4 \in T_{X4}} f_{X1 X2 X3 X4}(t1, t2, t3, t4).

You want X1, X2, you sum over t3, t4:

f_{X1 X2}(t1, t2) = \sum_{t3 \in T_{X3}} \sum_{t4 \in T_{X4}} f_{X1 X2 X3 X4}(t1, t2, t3, t4).

You want X2, X4, you sum over all possibilities of X1 and all possibilities of X3:

f_{X2 X4}(t2, t4) = \sum_{t1 \in T_{X1}} \sum_{t3 \in T_{X3}} f_{X1 X2 X3 X4}(t1, t2, t3, t4).

You want the joint PMF of X1, X3, X4, three of them; then you sum over t2, over the range of X2, and that is it, it is as simple as that:

f_{X1 X3 X4}(t1, t3, t4) = \sum_{t2 \in T_{X2}} f_{X1 X2 X3 X4}(t1, t2, t3, t4).
So, going from the joint PMF to the marginals is a very simple one-way process; there is no ambiguity there, it is very easy. But, we already know that from the marginals you cannot uniquely go back to the joint PMF; and therein lies the variety in data science. So, I think that is it; I do not think I have more worked-out examples for you, and that is good enough, I think.
So, in general, here is the general formula; I do not want to beat around this formula, you can read it and understand it. If you have n different random variables and you want to take any subset of them, X_{i1}, X_{i2}, ..., X_{ik}, and do marginalization, you simply sum over everything except t_{i1} to t_{ik}; you can start from t1, go till t_{i1 - 1}, then jump to t_{i1 + 1}, and go on like that. Skip and keep only t_{i1} to t_{ik}, sum over everything else, and you are done:

f_{X_{i1} X_{i2} ... X_{ik}}(t_{i1}, t_{i2}, ..., t_{ik}) = \sum_{t_j \in T_{X_j},\; j \notin \{i1, ..., ik\}} f_{X1 X2 ... Xn}(t1, t2, ..., tn).
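In code, this general case is the same one short function; a minimal sketch, assuming the joint PMF is a dictionary over outcome tuples and using 0-based indices, which is a coding convention rather than the lecture's notation.

```python
from itertools import product

def marginalize(joint_pmf, keep_indices):
    """Marginal PMF of the variables at `keep_indices`: keep those
    coordinates of each outcome, sum the joint over everything else."""
    marginal = {}
    for outcome, prob in joint_pmf.items():
        key = tuple(outcome[i] for i in keep_indices)   # keep what you want
        marginal[key] = marginal.get(key, 0) + prob     # sum over the rest
    return marginal

# Quick check on a uniform joint PMF of 4 binary random variables:
joint4 = {t: 1 / 16 for t in product([0, 1], repeat=4)}
print(marginalize(joint4, (1, 3)))   # f_{X2 X4}: each pair gets 1/4
```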
So, that is the way to go from the joint PMF to a marginal PMF in the general case. I think that concludes marginal PMFs; the next big topic is conditioning with multiple random variables, and that we will do in the next lecture.