Topic 2: Random Variables and Probability Distributions: Rohini Somanathan Course 003, 2014-2015
Random variables
Definition: Let (S, 𝒮, P) be a probability space. If X : S → ℝ is a real-valued function having as its domain the elements of S, then X is called a random variable.
A random variable is therefore a real-valued function defined on the space S. Typically x is
used to denote this image value, i.e. x = X(s).
If the outcomes of an experiment are inherently real numbers, they are directly
interpretable as values of a random variable, and we can think of X as the identity function,
so X(s) = s.
We choose random variables based on what we are interested in getting out of the
experiment. For example, we may be interested in the number of students passing an exam,
and not the identities of those who pass. A random variable would assign each element in
the sample space a number corresponding to the number of passes associated with that
outcome.
We therefore begin with a probability space (S, 𝒮, P) and arrive at an induced probability space (R(X), B, P_X).
How exactly do we arrive at the function P_X(·)? As long as every set A ⊆ R(X) is associated with an event in our original sample space S, P_X(A) is just the probability assigned to that event by P.
Random variables: examples
1. Tossing a coin ten times.
The sample space consists of the 2^10 possible sequences of heads and tails.
There are many different random variables that could be associated with this
experiment: X1 could be the number of heads, X2 the longest run of heads divided by
the longest run of tails, X3 the number of times we get two heads immediately before a
tail, etc...
For s = HT T HHHHT T H, what are the values of these random variables?
2. Choosing a point in a rectangle within a plane
An experiment involves choosing a point s = (x, y) at random from the rectangle
S = {(x, y) : 0 ≤ x ≤ 2, 0 ≤ y ≤ 1/2}
The random variable X could be the x-coordinate of the point, and an event is X taking values in [1, 2].
Another random variable Z could be the distance of the point from the origin,
Z(s) = √(x² + y²).
3. Heights, weights, distances, temperature, scores, incomes... In these cases, we can have
X(s) = s since these are already expressed as real numbers.
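As a check on the coin-tossing exercise in example 1, here is a short Python sketch (the helper names are ours, not from the slides) that evaluates the three random variables at the outcome s = HTTHHHHTTH:

```python
def longest_run(s, ch):
    """Length of the longest run of the character ch in s."""
    best = cur = 0
    for c in s:
        cur = cur + 1 if c == ch else 0
        best = max(best, cur)
    return best

def X1(s):
    """Number of heads."""
    return s.count("H")

def X2(s):
    """Longest run of heads divided by longest run of tails."""
    return longest_run(s, "H") / longest_run(s, "T")

def X3(s):
    """Number of times two heads occur immediately before a tail (pattern HHT)."""
    return sum(1 for i in range(len(s) - 2) if s[i:i + 3] == "HHT")

s = "HTTHHHHTTH"
print(X1(s), X2(s), X3(s))   # 6 2.0 1
```

So for this outcome X1(s) = 6, X2(s) = 4/2 = 2, and X3(s) = 1.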
In this case, the distribution function will be a step function, jumping at all points x in
R(X) which are assigned positive probability.
Consider the experiment of tossing two fair coins. Describe the probability space induced
by the random variable X, the number of heads, and derive the distribution function of X.
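The two-coin exercise can be worked through in a few lines of Python (a sketch, not from the slides): enumerate the four equally likely outcomes, build the probability function of X, and sum it to get the step distribution function.

```python
from itertools import product
from fractions import Fraction

# The induced probability space for X = number of heads in two fair tosses.
outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT, equally likely
pf = {}                                    # probability function of X
for s in outcomes:
    x = s.count("H")
    pf[x] = pf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

def F(x):
    """Distribution function F(x) = P(X <= x): a step function that jumps
    at each value of X carrying positive probability."""
    return sum(p for k, p in pf.items() if k <= x)

print(F(-1), F(0), F(1.5), F(2))   # 0 1/4 3/4 1
```

The probability function is f(0) = 1/4, f(1) = 1/2, f(2) = 1/4, and F jumps by those amounts at 0, 1 and 2.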
Discrete distributions
Definition: A random variable X has a discrete distribution if X can take only a finite number k of different values x1, x2, . . . , xk or a countably infinite sequence of different values x1, x2, . . . .
The function f(x) = P(X = x) is the probability function of X. We define it for all real x: it is P(X = x) for values x in R(X) and zero elsewhere.
If X has a discrete distribution, the probability of any subset A of the real line is given by
P(X ∈ A) = Σ_{xᵢ ∈ A} f(xᵢ)
Examples:
1. The discrete uniform distribution: picking one of the first k positive integers at random
f(x) = 1/k for x = 1, 2, . . . , k
f(x) = 0 otherwise
2. The binomial distribution: the probability of x successes in n trials
f(x) = (n choose x) p^x q^(n−x) for x = 0, 1, 2, . . . , n
f(x) = 0 otherwise
Derive the distribution functions for each of these.
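A minimal sketch of both probability functions and the binomial distribution function (our function names; q = 1 − p as in the formula above):

```python
from math import comb

def f_uniform(x, k):
    """Discrete uniform probability function; x ranges over the integers 1..k."""
    return 1.0 / k if 1 <= x <= k else 0.0

def f_binomial(x, n, p):
    """Binomial probability function: (n choose x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x) if 0 <= x <= n else 0.0

def F_binomial(x, n, p):
    """Binomial distribution function: sum the pf over all values <= x."""
    return sum(f_binomial(k, n, p) for k in range(min(int(x), n) + 1)) if x >= 0 else 0.0

print(sum(f_binomial(k, 10, 0.5) for k in range(11)))   # ~ 1.0
```

Summing either probability function over its support returns one, as a valid probability function must.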
Continuous distributions
The sample space associated with our random variable often has an infinite number of points.
Example: A point is randomly selected inside a circle of unit radius centred at the origin (0, 0), where the probability assigned to being in a set A ⊆ S is P(A) = (area of A)/(area of S), and X is the distance of the selected point from the origin. In this case F(x) = Pr(X ≤ x) = (area of circle with radius x)/(area of S) = x², so the distribution function of X is given by
F(x) = 0 for x < 0
F(x) = x² for 0 ≤ x < 1
F(x) = 1 for 1 ≤ x
The function f is called the probability density function or p.d.f. of X and must satisfy the conditions below:
1. f(x) ≥ 0
2. ∫_{−∞}^{∞} f(x) dx = 1
What is f(x) for the above example? How can you use this to compute P(1/4 < X ≤ 1/2)? How would you use F(x) instead?
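Both routes can be checked numerically (a sketch, not part of the slides): differentiating F(x) = x² gives the density f(x) = 2x on [0, 1], and P(1/4 < X ≤ 1/2) comes out the same way from either F or f.

```python
def F(x):
    """Distribution function of the distance X in the circle example."""
    return 0.0 if x < 0 else (x * x if x < 1 else 1.0)

def f(x):
    """Density f(x) = F'(x) = 2x on [0, 1]."""
    return 2 * x if 0 <= x <= 1 else 0.0

# Via the distribution function:
p_cdf = F(0.5) - F(0.25)

# Via the density: midpoint Riemann sum of f over (1/4, 1/2]
n = 100_000
h = (0.5 - 0.25) / n
p_pdf = sum(f(0.25 + (i + 0.5) * h) for i in range(n)) * h

print(p_cdf, p_pdf)   # both ~ 3/16 = 0.1875
```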
Continuous distributions: examples
1. The uniform distribution on an interval: Suppose a and b are two real numbers with a < b. A point x is selected from the interval S = {x : a ≤ x ≤ b} and the probability that it belongs to any subinterval of S is proportional to the length of that subinterval. It follows that the p.d.f. must be constant on S and zero outside it:
f(x) = 1/(b − a) for a ≤ x ≤ b
f(x) = 0 otherwise
Notice that the value of the p.d.f. is the reciprocal of the length of the interval, so these values can be greater than one, and the assignment of probabilities does not depend on whether the distribution is defined over the closed interval [a, b] or the open interval (a, b).
2. Unbounded random variables: It is sometimes convenient to define a p.d.f. over unbounded sets, because such functions may be easier to work with and may approximate the actual distribution of a random variable quite well. An example is:
f(x) = 0 for x ≤ 0
f(x) = 1/(1 + x)² for x > 0
3. Unbounded densities: The following function is unbounded around zero but still represents a valid density:
f(x) = (2/3) x^(−1/3) for 0 < x < 1
f(x) = 0 otherwise
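A quick numerical sanity check, not part of the slides: a midpoint-rule integral of each density above should come out close to one. The uniform case uses a = 1, b = 4 purely for illustration; the unbounded support in example 2 is truncated at B = 10⁴, which leaves out exactly 1/(1 + B) of the mass.

```python
def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f_unif = lambda x: 1 / (4 - 1)              # uniform on [1, 4]
f_tail = lambda x: 1 / (1 + x) ** 2         # example 2, on x > 0
f_sing = lambda x: (2 / 3) * x ** (-1 / 3)  # example 3, on 0 < x < 1

print(midpoint(f_unif, 1, 4))     # ~ 1
print(midpoint(f_tail, 0, 1e4))   # ~ 1 - 1/(1 + 1e4)
print(midpoint(f_sing, 0, 1))     # ~ 1
```

Note that the midpoint rule never evaluates the integrand at 0, so the singularity in example 3 causes no trouble.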
Mixed distributions
Often the process of collecting or recording data leads to censoring, and instead of obtaining
a sample from a continuous distribution, we obtain one from a mixed distribution.
Examples:
The weight of an object is a continuous random variable, but our weighing scale only
records weights up to a certain level.
Households with very high incomes often underreport their income, and for incomes above a certain level (say $250,000), surveys often club all households together: this variable is therefore top-censored.
In each of these examples, we can derive the probability distribution for the new random variable, given the distribution for the continuous variable. In the example we've just considered:
f(x) = 0 for x ≤ 0
f(x) = 1/(1 + x)² for x > 0
suppose we record X = 3 for all values of X ≥ 3. The p.f. for our new random variable Y is given by the same density for values less than 3 and by P(X ≥ 3) = 1/4 at Y = 3.
Some variables, such as the number of hours worked per week, have a mixed distribution in the population, with mass points at 0 and 40.
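A sketch of the censoring example above (the function names are ours): for f(x) = 1/(1 + x)² on x > 0, the distribution function is F(x) = 1 − 1/(1 + x), so the mass recorded at the censoring point is P(X ≥ 3) = 1/(1 + 3) = 1/4.

```python
def F(x):
    """CDF of the uncensored X, obtained by integrating 1/(1+t)^2 from 0 to x."""
    return 0.0 if x <= 0 else 1 - 1 / (1 + x)

def G(y):
    """CDF of the censored variable Y = min(X, 3): it agrees with F below 3,
    then a jump of size P(X >= 3) takes it to one at y = 3."""
    return F(y) if y < 3 else 1.0

atom = G(3) - F(3)   # size of the jump at the censoring point
print(atom)          # 0.25
```

The jump of 1/4 in G at y = 3 is exactly what makes Y a mixed, rather than continuous, random variable.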
F(x) is right-continuous, i.e. F(x) = F(x⁺) at every point x, where F(x⁺) is the right-hand limit of F(x).
(For discrete random variables, there will be a jump at values that are taken with positive probability.)
F′(x) = f(x)
For discrete and mixed discrete-continuous random variables, F(x) will exhibit a countable number of discontinuities at jump points, reflecting the assignment of positive probabilities to a countable number of events.
[Figure: a step distribution function F(x), constant at heights z0 < z1 < z2 < z3 with jumps at the points x1 < x2 < x3 < x4.]
The fact that Pr(X ≤ x) approaches 1 as x → ∞ follows from Exercise 12 in Sec. 1.10.
For the uniform distribution on [a, b], F(x) = (x − a)/(b − a) for a ≤ x ≤ b, with F(x) = 0 for x < a and 1 for x > b. Given a value p, we simply solve for the pth quantile: x = pb + (1 − p)a. Compute this for p = .5, .25, .9, . . .
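The quantile formula is one line of code (a sketch; a = 0, b = 10 below is an arbitrary illustration):

```python
def uniform_quantile(p, a, b):
    """pth quantile of the uniform distribution on [a, b]: inverting
    F(x) = (x - a)/(b - a) gives x = p*b + (1 - p)*a."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    return p * b + (1 - p) * a

for p in (0.5, 0.25, 0.9):
    print(p, uniform_quantile(p, 0, 10))
```

For a = 0, b = 10 this gives the median 5, the lower quartile 2.5, and the 90th percentile 9.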
'
1x
f(x) = 8
0
for 0 x 4
otherwise
1
4
1
2
cx2
f(x) =
0
for 1 x 2
otherwise
Bivariate distributions
Social scientists are typically interested in the manner in which multiple attributes of people and the societies they live in are related. The object of interest is then a multivariate probability distribution (examples: education and earnings, days ill per month and age, sex ratios and areas under rice cultivation).
This involves dealing with the joint distribution of two or more random variables. Bivariate
distributions attach probabilities to events that are defined by values taken by two random
variables (say X and Y).
Values taken by these random variables are now ordered pairs (xᵢ, yᵢ), and an event A is a set of such values.
If both X and Y are discrete random variables, the probability function is
f(x, y) = P(X = x and Y = y) and P((X, Y) ∈ A) = Σ_{(xᵢ,yᵢ) ∈ A} f(xᵢ, yᵢ)
The joint distribution of gender and education:

education          male    female
none               .05     .20
primary            .25     .10
middle             .15     .04
high               .10     .03
senior secondary   .03     .02
graduate           .02     .01
What are some features of a table like this one? In particular, how would we obtain
probabilities associated with the following events:
receiving no education
becoming a female graduate
completing primary school
What else do you learn from the table about the population of interest?
f is now called the joint probability density function and must satisfy
1. f(x, y) ≥ 0 for −∞ < x < ∞ and −∞ < y < ∞
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
Example 1: Given the following joint density function on X and Y, we'll calculate P(X ≥ Y):
f(x, y) = cx²y for x² ≤ y ≤ 1
f(x, y) = 0 otherwise
First find c to make this a valid joint density (notice the limits of integration here): it will turn out to be 21/4. Then integrate the density over y ∈ (x², x) and x ∈ (0, 1). Using this density, P(X ≥ Y) = 3/20.
Example 2: A point (X, Y) is selected at random from inside the circle x² + y² ≤ 9. Determine the joint density function f(x, y).
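Example 1 can be verified with exact arithmetic (a sketch carrying out the iterated integrals step by step): the inner integral of y over (x², 1) is (1 − x⁴)/2, and over (x², x) it is (x² − x⁴)/2, leaving only polynomial integrals in x.

```python
from fractions import Fraction as Fr

c = Fr(21, 4)

# Normalization: integral over -1 <= x <= 1 of c x^2 (1 - x^4)/2 dx
#   = (c/2) * (2/3 - 2/7), which should equal 1.
norm = (c / 2) * (Fr(2, 3) - Fr(2, 7))

# P(X >= Y): integral over 0 <= x <= 1 of c x^2 (x^2 - x^4)/2 dx
#   = (c/2) * (1/5 - 1/7).
p = (c / 2) * (Fr(1, 5) - Fr(1, 7))

print(norm, p)   # 1 3/20
```

Both the normalizing constant c = 21/4 and the answer 3/20 check out exactly.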
If F(x, y) is continuously differentiable in both its arguments, the joint density is derived as:
f(x, y) = ∂²F(x, y)/∂x∂y
and given the density, we can integrate w.r.t. x and y over the appropriate limits to get the distribution function.
Example: take a distribution function F(x, y) for X and Y and derive their joint density. Notice the (x, y) range over which F(x, y) is strictly increasing.
Marginal distributions
A distribution of X derived from the joint distribution of X and Y is known as the marginal
distribution of X. For a discrete random variable:
f1(x) = P(X = x) = Σ_y P(X = x and Y = y) = Σ_y f(x, y)
and analogously
f2(y) = P(Y = y) = Σ_x P(X = x and Y = y) = Σ_x f(x, y)
For a continuous joint density f(x, y), the marginal densities for X and Y are given by:
f1(x) = ∫_{−∞}^{∞} f(x, y) dy and f2(y) = ∫_{−∞}^{∞} f(x, y) dx
Go back to our table representing the joint distribution of gender and education and find
the marginal distribution of education.
Can one construct the joint distribution from one of the marginal distributions?
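Summing the joint table over gender gives the marginal distribution of education. A Python sketch (the dictionary encodes the table as reconstructed above, including the "graduate" row):

```python
# Joint distribution of (education, gender) from the table above.
joint = {
    ("none", "male"): .05,             ("none", "female"): .20,
    ("primary", "male"): .25,          ("primary", "female"): .10,
    ("middle", "male"): .15,           ("middle", "female"): .04,
    ("high", "male"): .10,             ("high", "female"): .03,
    ("senior secondary", "male"): .03, ("senior secondary", "female"): .02,
    ("graduate", "male"): .02,         ("graduate", "female"): .01,
}

# Marginal of education: sum the joint probabilities over gender.
edu_marginal = {}
for (edu, sex), p in joint.items():
    edu_marginal[edu] = edu_marginal.get(edu, 0.0) + p

print({k: round(v, 2) for k, v in edu_marginal.items()})
# {'none': 0.25, 'primary': 0.35, 'middle': 0.19, 'high': 0.13,
#  'senior secondary': 0.05, 'graduate': 0.03}
```

The marginal probabilities sum to one, but note they do not by themselves determine the joint table: many joint distributions share the same marginals.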
Suppose X and Y are independent random variables, each with density
g(x) = 2x for 0 ≤ x ≤ 1
g(x) = 0 otherwise
Find the probability that X + Y ≤ 1.
The joint density 4xy is got by multiplying the marginal densities because these variables are independent. The required probability of 1/6 is then obtained by integrating over y ∈ (0, 1 − x) and x ∈ (0, 1).
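To check the 1/6 (a sketch, not from the slides): the inner integral ∫₀^{1−x} 4xy dy equals 2x(1 − x)², so a one-dimensional midpoint sum of 2x(1 − x)² over (0, 1) should approximate the answer.

```python
# P(X + Y <= 1) = integral over 0 < x < 1 of 2x (1 - x)^2 dx, after
# integrating the joint density 4xy over y in (0, 1 - x).
n = 100_000
h = 1 / n
p = sum(2 * x * (1 - x) ** 2 for x in ((i + 0.5) * h for i in range(n))) * h
print(p)   # ~ 0.16666...
```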
How might we use a table of probabilities to determine whether two random variables are
independent?
Given the following density, can we tell whether the variables X and Y are independent?
f(x, y) = k e^(−(x+2y)) for x ≥ 0 and y ≥ 0
f(x, y) = 0 otherwise
Notice that we can factorize the joint density as the product of k1·e^(−x) and k2·e^(−2y), where k1·k2 = k. To obtain the marginal densities of X and Y, we multiply these functions by appropriate constants which make them integrate to unity. This gives us
f1(x) = e^(−x) for x ≥ 0 and f2(y) = 2e^(−2y) for y ≥ 0
Now consider the density
f(x, y) = x + y for 0 < x < 1 and 0 < y < 1
f(x, y) = 0 otherwise
Notice that we cannot factorize the joint density as the product of a non-negative function of x and another non-negative function of y. Computing the marginals gives us
f1(x) = x + 1/2 for 0 < x < 1 and f2(y) = y + 1/2 for 0 < y < 1
Next consider
f(x, y) = kx²y² for x² + y² ≤ 1
f(x, y) = 0 otherwise
In this case the possible values X can take depend on Y, and therefore, even though the joint density can be factorized, the same factorization cannot work for all values of (x, y).
More generally, whenever the space of positive probability density of X and Y is bounded by a curve, rather than a rectangle, the two random variables are dependent.
Conversely, if f(x, y) factorizes as g(x)h(y) on a rectangle a ≤ x ≤ b, c ≤ y ≤ d, then X and Y are independent: the marginals are f1(x) = g(x) ∫_c^d h(y) dy and f2(y) = h(y) ∫_a^b g(x) dx, and since ∫_a^b g(x) dx · ∫_c^d h(y) dy = 1 (f integrates to one), we get f1(x)f2(y) = g(x)h(y) = f(x, y). Each factor, suitably rescaled, is a marginal density.
Now to show that if the support is not a rectangle, the variables are dependent: Start with a point (x, y) outside the domain where f(x, y) > 0. If X and Y are independent, we have f(x, y) = f1(x)f2(y), so one of these must be zero. Now as we move due south and enter the set where f(x, y) > 0, our value of x has not changed, so it could not be that f1(x) was zero at the original point. Similarly, if we move west, y is unchanged, so it could not be that f2(y) was zero at the original point. So we have a contradiction.
Conditional distributions
Definition: Consider two discrete random variables X and Y with a joint probability function f(x, y) and marginal probability functions f1(x) and f2(y). After the value Y = y has been observed, we can write the probability that X = x using our definition of conditional probability:
P(X = x|Y = y) = P(X = x and Y = y)/Pr(Y = y) = f(x, y)/f2(y)
This is the conditional probability function of X given Y = y:
g1(x|y) = f(x, y)/f2(y)
1. for each fixed value of y, g1(x|y) is a probability function over all possible values of X because it is non-negative and
Σ_x g1(x|y) = (1/f2(y)) Σ_x f(x, y) = f2(y)/f2(y) = 1
2. conditional probabilities are proportional to joint probabilities because they just divide
these by a constant.
We cannot use the definition of conditional probability to derive the conditional density for continuous random variables, because the probability that Y takes any particular value y is zero. We simply define the conditional probability density function of X given Y = y as
g1(x|y) = f(x, y)/f2(y) for −∞ < x < ∞ and −∞ < y < ∞
The numerator in g1(x|y) = f(x, y)/f2(y) is a section of the surface representing the joint density, and the denominator is the constant by which we need to divide the numerator to get a valid density (which integrates to unity).
The joint distribution of gender and education, with the conditional distribution of education among males:

education          male    female   f(education|gender=male)
none               .05     .20      .08
primary            .25     .10      .42
middle             .15     .04      .25
high               .10     .03      .17
senior secondary   .03     .02      .05
graduate           .02     .01      .03

f(gender|graduate): male .67, female .33
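The conditional columns can be reproduced by dividing joint probabilities by the relevant marginal, exactly as in g(x|y) = f(x, y)/f2(y). A sketch (the table is re-entered so the block stands alone):

```python
joint = {
    ("none", "male"): .05,             ("none", "female"): .20,
    ("primary", "male"): .25,          ("primary", "female"): .10,
    ("middle", "male"): .15,           ("middle", "female"): .04,
    ("high", "male"): .10,             ("high", "female"): .03,
    ("senior secondary", "male"): .03, ("senior secondary", "female"): .02,
    ("graduate", "male"): .02,         ("graduate", "female"): .01,
}

# f(education | gender = male): divide each male cell by P(male).
p_male = sum(p for (edu, sex), p in joint.items() if sex == "male")
f_edu_given_male = {edu: p / p_male
                    for (edu, sex), p in joint.items() if sex == "male"}

# f(gender | graduate): divide each graduate cell by P(graduate).
p_grad = joint[("graduate", "male")] + joint[("graduate", "female")]
f_sex_given_grad = {sex: joint[("graduate", sex)] / p_grad
                    for sex in ("male", "female")}

print({k: round(v, 2) for k, v in f_edu_given_male.items()})
print({k: round(v, 2) for k, v in f_sex_given_grad.items()})
```

The printed values match the table: for instance .25/.60 ≈ .42 for primary among males, and .02/.03 ≈ .67 for males among graduates.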
Returning to the joint density
f(x, y) = cx²y for x² ≤ y ≤ 1
f(x, y) = 0 otherwise
with c = 21/4, the marginal distribution of X is given by
f1(x) = ∫_{x²}^{1} (21/4) x²y dy = (21/8) x²(1 − x⁴) for −1 ≤ x ≤ 1
Dividing f(x, y) by f1(x), the conditional density of Y given X = x is
g2(y|x) = 2y/(1 − x⁴) for x² ≤ y ≤ 1
g2(y|x) = 0 otherwise
For example, g2(y|1/2) = 32y/15 for 1/4 ≤ y ≤ 1, so
P(Y ≥ 3/4 | X = 1/2) = ∫_{3/4}^{1} (32y/15) dy = 7/15
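The 7/15 can be confirmed with exact arithmetic (a sketch): since ∫ 2y/(1 − x⁴) dy = y²/(1 − x⁴), the conditional probability is just a difference of squares divided by 1 − x⁴.

```python
from fractions import Fraction as Fr

# P(Y >= 3/4 | X = 1/2) = (1^2 - (3/4)^2) / (1 - (1/2)^4)
x = Fr(1, 2)
denom = 1 - x ** 4                        # 15/16
p = (Fr(1) ** 2 - Fr(3, 4) ** 2) / denom
print(p)   # 7/15
```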
Notice that the conditional distribution is not defined for a value y0 at which f2(y0) = 0, but this is irrelevant because at any such value f(x, y0) = 0.
Example: X is first chosen from a uniform distribution on (0, 1), and then Y is chosen from a uniform distribution on (x, 1). The marginal distribution of X is straightforward:
f1(x) = 1 for 0 < x < 1, and 0 otherwise
The conditional density of Y given X = x is
g2(y|x) = 1/(1 − x) for x < y < 1, and 0 otherwise
so the joint density is
f(x, y) = f1(x) g2(y|x) = 1/(1 − x) for 0 < x < y < 1, and 0 otherwise
and the marginal density of Y is
f2(y) = ∫_{0}^{y} f(x, y) dx = ∫_{0}^{y} 1/(1 − x) dx = −log(1 − y) for 0 < y < 1
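A sketch (not on the slide) checking that f2(y) = −log(1 − y) is a valid density on (0, 1): a midpoint sum should come out close to one despite the singularity at y = 1, which midpoints never evaluate.

```python
from math import log

# Midpoint sum of -log(1 - y) over (0, 1); the exact integral is 1.
n = 200_000
h = 1 / n
total = sum(-log(1 - (i + 0.5) * h) for i in range(n)) * h
print(total)   # ~ 1.0
```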
Multivariate distributions
Our definitions of joint, conditional and marginal distributions can be easily extended to an arbitrary finite number of random variables. Such a distribution is called a multivariate distribution.
The joint distribution function is defined as the function F whose value at any point (x1, x2, . . . , xn) ∈ ℝⁿ is given by:
F(x1, . . . , xn) = P(X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn)
For a discrete joint distribution, the probability function at any point (x1, x2, . . . , xn) ∈ ℝⁿ is given by:
f(x1, . . . , xn) = P(X1 = x1, X2 = x2, . . . , Xn = xn)
and the random variables X1, . . . , Xn have a continuous joint distribution if there is a nonnegative function f defined on ℝⁿ such that for any subset A ⊆ ℝⁿ,
P[(X1, . . . , Xn) ∈ A] = ∫· · ·∫_A f(x1, . . . , xn) dx1 . . . dxn
The marginal distribution of any single random variable Xi can now be derived by integrating over the other variables:
f1(x1) = ∫· · ·∫ f(x1, . . . , xn) dx2 . . . dxn
and the conditional probability density function of X1 given values of the other variables is:
g1(x1|x2, . . . , xn) = f(x1, . . . , xn)/f0(x2, . . . , xn)
where f0 is the joint density of X2, . . . , Xn.
Multivariate distributions: example
Suppose we start with the following density function for a variable X1 :
f1(x) = e^(−x) for x > 0
f1(x) = 0 otherwise
and are told that for any given value of X1 = x1 , two other random variables X2 and X3 are
independently and identically distributed with the following conditional p.d.f.: