
Stat1301 Probability & Statistics I, Spring 2008-2009

Chapter III Jointly Distributed Random Variables


§ 3.1 Joint and Marginal Distributions

When an experiment or survey is conducted, two or more random variables are often observed simultaneously, not only to study their individual probabilistic behaviours but also to determine the degree of relationship among the variables, since in most cases the variables are related. The probabilistic behaviour of the random variables is described by their joint distribution.

In the simplest case, suppose there are only two discrete random variables $(X, Y)$ which take distinct values:

Values of X : $X(\Omega) = \{x_1, x_2, \ldots, x_r\}$

Values of Y : $Y(\Omega) = \{y_1, y_2, \ldots, y_c\}$

Definition

The joint probability mass function (joint pmf) of the discrete random variables X and Y is defined by

$$p(x, y) = P(X = x, Y = y), \qquad x \in X(\Omega),\ y \in Y(\Omega).$$

Sometimes the joint pmf can be conveniently presented in the form of a two-way
table as

Values of Y
Values of X y1 y2 … yc
x1 p( x1 , y1 ) p( x1 , y 2 ) … p( x1 , yc )
x2 p( x2 , y1 ) p( x2 , y 2 ) … p( x2 , yc )
… … … … …
xr p( xr , y1 ) p( xr , y 2 ) … p( xr , y c )


Example 3.1

Suppose that 3 balls are randomly selected from an urn containing 3 red, 4 white, and 5 blue balls. If we let X and Y denote, respectively, the number of red and white balls in the sample, then both X and Y take values 0, 1, 2, 3 only. The joint pmf of (X, Y) can be calculated as

$$p(0,0) = P(X = 0, Y = 0) = P(\text{3 blue balls}) = \frac{\binom{5}{3}}{\binom{12}{3}} = \frac{10}{220}$$

$$p(0,1) = P(X = 0, Y = 1) = P(\text{1 white, 2 blue}) = \frac{\binom{4}{1}\binom{5}{2}}{\binom{12}{3}} = \frac{40}{220}$$

$$p(2,1) = P(\text{2 red, 1 white}) = \frac{\binom{3}{2}\binom{4}{1}}{\binom{12}{3}} = \frac{12}{220}$$

$$p(2,2) = P(\text{2 red, 2 white}) = 0$$

Based on similar calculations, we have

Values of Y
Values of X 0 1 2 3 Total
0 0.0454 0.1818 0.1364 0.0182 0.3818
1 0.1364 0.2727 0.0818 0 0.4909
2 0.0682 0.0545 0 0 0.1227
3 0.0045 0 0 0 0.0045
Total 0.2545 0.5091 0.2182 0.0182 1.0000

The above probabilities can also be represented by the following expression:

$$p(x, y) = \frac{\binom{3}{x}\binom{4}{y}\binom{5}{3 - x - y}}{\binom{12}{3}}, \qquad x = 0, 1, 2, 3,\ \ y = 0, 1, 2, 3,\ \ x + y \le 3.$$

It is called the bivariate hypergeometric distribution.
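As a quick numerical check, the two-way table above can be reproduced directly from this formula. The following Python sketch is our own illustration (the function name joint_pmf is not part of the notes); it tabulates p(x, y) with math.comb and agrees with the table up to rounding.

```python
from math import comb

def joint_pmf(x, y, n_red=3, n_white=4, n_blue=5, n_draw=3):
    """Bivariate hypergeometric pmf for the (red, white) counts in the sample."""
    if x < 0 or y < 0 or x + y > n_draw:
        return 0.0
    total = n_red + n_white + n_blue
    return comb(n_red, x) * comb(n_white, y) * comb(n_blue, n_draw - x - y) / comb(total, n_draw)

# Reproduce the two-way table of Example 3.1 (rows: x, columns: y)
for x in range(4):
    print([round(joint_pmf(x, y), 4) for y in range(4)])
# First row: [0.0455, 0.1818, 0.1364, 0.0182]
```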


Conditions for a joint pmf

1. $0 \le p(x, y) \le 1$ for all $x \in X(\Omega)$, $y \in Y(\Omega)$.

2. $\sum_{x \in X(\Omega)} \sum_{y \in Y(\Omega)} p(x, y) = 1$.

3. $P((X, Y) \in A) = \sum_{(x, y) \in A} p(x, y)$ where $A \subset X(\Omega) \times Y(\Omega)$.

Example 3.2

For the joint pmf in the above example, obviously p satisfies properties 1 and 2.

For the probability that there are the same number of red and white balls,

$$P(X = Y) = P((X, Y) \in \{(0,0), (1,1), (2,2), (3,3)\}) = p(0,0) + p(1,1) = 0.0454 + 0.2727 = 0.3181.$$

For the probability that there are fewer red balls than white balls,

$$P(X < Y) = P((X, Y) \in \{(0,1), (0,2), (0,3), (1,2), (1,3), (2,3)\})$$
$$= p(0,1) + p(0,2) + p(0,3) + p(1,2) = 0.1818 + 0.1364 + 0.0182 + 0.0818 = 0.4182$$

We may also compute probabilities concerning X only. For example,

$$P(X = 0) = p(0,0) + p(0,1) + p(0,2) + p(0,3) = 0.0454 + 0.1818 + 0.1364 + 0.0182 = 0.3818$$

$$P(X = 1) = p(1,0) + p(1,1) + p(1,2) + p(1,3) = 0.1364 + 0.2727 + 0.0818 + 0 = 0.4909$$

As can be seen, the probabilistic behaviour of X (or Y) alone can be obtained


directly from the joint distribution of X and Y. The probability of each value is the
corresponding row (column) sum. Since the probabilities obtained are the marginal
totals from the two-way table, the distribution of X (Y) alone is called the marginal
distribution of X (Y). Thus we have the following definition.


Definition

Let X and Y be discrete random variables with joint pmf $p(x, y)$. The marginal pmfs of X and Y are respectively defined as

$$p_X(x) = P(X = x) = \sum_{y \in Y(\Omega)} p(x, y)$$
and
$$p_Y(y) = P(Y = y) = \sum_{x \in X(\Omega)} p(x, y).$$

Example 3.3

For the (X, Y) in the previous examples, the marginal pmf of X is given by

$$p_X(x) = P(X = x) = \begin{cases} 0.3818 & x = 0 \\ 0.4909 & x = 1 \\ 0.1227 & x = 2 \\ 0.0045 & x = 3 \end{cases}$$

The marginal pmf of Y is

$$p_Y(y) = P(Y = y) = \begin{cases} 0.2545 & y = 0 \\ 0.5091 & y = 1 \\ 0.2182 & y = 2 \\ 0.0182 & y = 3 \end{cases}$$
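In practice the marginal pmfs are just the row and column sums of the two-way table. A minimal Python sketch (our own, assuming the rounded table of Example 3.1 stored as a NumPy array):

```python
import numpy as np

# Joint pmf of (X, Y) from Example 3.1; rows indexed by x, columns by y
P = np.array([
    [0.0454, 0.1818, 0.1364, 0.0182],
    [0.1364, 0.2727, 0.0818, 0.0000],
    [0.0682, 0.0545, 0.0000, 0.0000],
    [0.0045, 0.0000, 0.0000, 0.0000],
])

p_X = P.sum(axis=1)   # marginal pmf of X: row sums
p_Y = P.sum(axis=0)   # marginal pmf of Y: column sums
print(p_X)            # approx [0.3818, 0.4909, 0.1227, 0.0045]
print(p_Y)            # approx [0.2545, 0.5090, 0.2182, 0.0182]
```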

Remark
The joint pmf uniquely determines the marginal pmfs, but the converse is not true.

Example 3.4

The following table shows a joint pmf different from that of the previous example which yields the same marginal pmfs.

Values of Y
Values of X 0 1 2 3 Total
0 0.0972 0.1944 0.0833 0.0069 0.3818
1 0.1249 0.2499 0.1071 0.0089 0.4909
2 0.0312 0.0625 0.0268 0.0022 0.1227
3 0.0011 0.0023 0.0010 0.0001 0.0045
Total 0.2545 0.5091 0.2182 0.0182 1.0000
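This particular table is just the outer product of the two marginal pmfs, which is one easy way to construct a second joint pmf with the same marginals. A small Python sketch (our own illustration) reproduces it up to rounding:

```python
import numpy as np

p_X = np.array([0.3818, 0.4909, 0.1227, 0.0045])
p_Y = np.array([0.2545, 0.5091, 0.2182, 0.0182])

# The product p_X(x) * p_Y(y) is a valid joint pmf with these marginals,
# yet it differs from the joint pmf of Example 3.1.
Q = np.outer(p_X, p_Y)
print(Q.round(4))   # matches the table of Example 3.4 up to rounding
print(Q.sum())      # approx 1
```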

Definition

We say that X and Y are jointly continuous if there exists a function f ( x, y )


defined for all real x and y , having the property that for every (measurable) set
C in the two-dimensional plane,

$$P((X, Y) \in C) = \iint_{(x, y) \in C} f(x, y)\, dx\, dy.$$

This function, if it exists, is called the joint probability density function (joint pdf).

Properties of joint pdf

1. $f(x, y) \ge 0$ for all $-\infty < x < \infty$, $-\infty < y < \infty$.

2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$.

3. $P((X, Y) \in A) = \iint_A f(x, y)\, dx\, dy$.
   In particular, $P(a \le X \le b,\ c \le Y \le d) = \int_c^d \int_a^b f(x, y)\, dx\, dy$.

4. Joint distribution function:

   $$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(s, t)\, ds\, dt$$

5. Marginal pdfs:

   $$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy, \qquad -\infty < x < \infty$$
   $$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx, \qquad -\infty < y < \infty$$

Example 3.5

f ( x, y ) = 4 x(1 − y ) , 0 ≤ x ≤ 1, 0 ≤ y ≤ 1

It is easy to verify that f is a joint pdf as f ( x, y ) ≥ 0 for all 0 ≤ x ≤ 1 , 0 ≤ y ≤ 1 and

$$\int_0^1 \int_0^1 4x(1 - y)\, dy\, dx = \int_0^1 4x \left[ y - \frac{y^2}{2} \right]_0^1 dx = \int_0^1 2x\, dx = \left[ x^2 \right]_0^1 = 1$$


Similarly,

$$P\left(0 \le X \le \tfrac{1}{2},\ \tfrac{1}{2} \le Y \le 1\right) = \int_0^{1/2} \int_{1/2}^{1} 4x(1 - y)\, dy\, dx = \frac{1}{16}$$

Suppose we want to determine P ( X < 3Y ) . The following graph shows the region
corresponding to the event X < 3Y .

[Figure: the unit square $0 \le x, y \le 1$ with the line $x = 3y$ drawn from the origin to the point (1, 1/3); the event $X < 3Y$ corresponds to the region above this line.]

The points in the region can be specified by $y > x/3$, $0 < x < 1$; therefore we have

$$P(X < 3Y) = \int_0^1 \int_{x/3}^{1} 4x(1 - y)\, dy\, dx = \int_0^1 4x \left[ y - \frac{y^2}{2} \right]_{x/3}^{1} dx$$
$$= \int_0^1 \left( 2x - \frac{4x^2}{3} + \frac{2x^3}{9} \right) dx = \left[ x^2 - \frac{4x^3}{9} + \frac{x^4}{18} \right]_0^1 = \frac{11}{18}$$

Joint distribution function:

For $0 \le x \le 1$, $0 \le y \le 1$,

$$F(x, y) = \int_0^x \int_0^y 4s(1 - t)\, dt\, ds = x^2\left(2y - y^2\right)$$

For $0 \le x \le 1$, $y > 1$: $F(x, y) = F(x, 1) = x^2$
For $x > 1$, $0 \le y \le 1$: $F(x, y) = F(1, y) = 2y - y^2$

Therefore

$$F(x, y) = \begin{cases} 0 & x < 0 \text{ or } y < 0 \\ x^2\left(2y - y^2\right) & 0 \le x \le 1,\ 0 \le y \le 1 \\ x^2 & 0 \le x \le 1,\ y > 1 \\ 2y - y^2 & x > 1,\ 0 \le y \le 1 \\ 1 & x > 1,\ y > 1 \end{cases}$$


Marginal pdfs:

$$f_X(x) = \int_0^1 4x(1 - y)\, dy = 4x\left[ y - \frac{y^2}{2} \right]_0^1 = 2x, \qquad 0 \le x \le 1$$

$$f_Y(y) = \int_0^1 4x(1 - y)\, dx = (1 - y)\left[ 2x^2 \right]_0^1 = 2(1 - y), \qquad 0 \le y \le 1$$
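The results of Example 3.5 are easy to confirm numerically. A small check of our own, using scipy.integrate.dblquad (whose integrand takes its arguments in the order (y, x)):

```python
from scipy.integrate import dblquad

# Joint pdf of Example 3.5
f = lambda y, x: 4 * x * (1 - y)

total, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1)      # integrates to 1
prob, _ = dblquad(f, 0, 1, lambda x: x / 3, lambda x: 1)   # region y > x/3

print(total)           # approx 1.0
print(prob, 11 / 18)   # both approx 0.6111
```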

§ 3.2 Independence of Random Variables

Definition

Two random variables X and Y are said to be independent if and only if their joint
pmf (pdf) is equal to the product of their marginal pmfs (pdfs), i.e.

p ( x, y ) = p X ( x ) pY ( y ) for all x, y , if X, Y are discrete;

or f ( x, y ) = f X ( x ) fY ( y ) for all x, y , if X, Y are continuous.

Thus X and Y are dependent if there exist x and y such that

$$p(x, y) \ne p_X(x)\, p_Y(y) \qquad (\text{or } f(x, y) \ne f_X(x)\, f_Y(y)).$$

Example 3.6

In example 3.1, X is the number of red balls, Y is the number of white balls in a
sample of 3 randomly drawn from an urn containing 3 red balls, 4 white balls, and
5 blue balls. The following table shows the joint and marginal pmfs.

Values of Y
Values of X 0 1 2 3 Total
0 0.0454 0.1818 0.1364 0.0182 0.3818
1 0.1364 0.2727 0.0818 0 0.4909
2 0.0682 0.0545 0 0 0.1227
3 0.0045 0 0 0 0.0045
Total 0.2545 0.5091 0.2182 0.0182 1.0000

p X (0) pY (0) = 0.3818 × 0.2545 = 0.0972 ≠ p (0,0)

Therefore X and Y are dependent, i.e. knowing the value of X will affect our
uncertainty about Y, and vice versa.


Example 3.7

In example 3.5, we have f ( x, y ) = 4 x(1 − y ) , 0 ≤ x ≤ 1, 0 ≤ y ≤ 1;

f X (x ) = 2 x , 0 ≤ x ≤ 1 , fY ( y ) = 2(1 − y ) , 0 ≤ y ≤ 1 .

Since f X ( x ) f Y ( y ) = 4 x(1 − y ) = f ( x, y ) for all 0 ≤ x ≤ 1 , 0 ≤ y ≤ 1 , X and Y are


independent.

Example 3.8

Consider the following joint pmf.

                          Y
 X            10            20            40            80       p_X(x)
 20          0.04          0.06          0.06          0.04        0.2
         =(0.2)(0.2)   =(0.2)(0.3)   =(0.2)(0.3)   =(0.2)(0.2)
 40          0.10          0.15          0.15          0.10        0.5
         =(0.5)(0.2)   =(0.5)(0.3)   =(0.5)(0.3)   =(0.5)(0.2)
 60          0.06          0.09          0.09          0.06        0.3
         =(0.3)(0.2)   =(0.3)(0.3)   =(0.3)(0.3)   =(0.3)(0.2)
p_Y(y)       0.2           0.3           0.3           0.2         1.00

Since the products $p_X(x)\, p_Y(y)$ agree everywhere with $p(x, y)$, the random variables X and Y are independent.

On the other hand, consider the following pmf.

Y
X 10 20 40 80 p X (x )
20 0.04 0.06 0.06 0.04 0.2
40 0.10 0.15 0.15 0.05 0.45
60 0.06 0.09 0.09 0.11 0.35
pY ( y ) 0.2 0.3 0.3 0.2 1.00

Although we have p (20,10) = p X (20) pY (10) , p (20,20) = p X (20) pY (20) , …, the


random variables X and Y are not independent as p (40,80) ≠ p X (40) pY (80) .
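The whole-table check can be automated: X and Y are independent exactly when the joint pmf equals the outer product of its own marginals. A short Python sketch of this comparison (our own illustration) for the two tables above:

```python
import numpy as np

# First table of Example 3.8: exactly the outer product of its marginals
P1 = np.outer([0.2, 0.5, 0.3], [0.2, 0.3, 0.3, 0.2])

# Second table: same marginal behaviour row/column-wise, but not a product
P2 = np.array([
    [0.04, 0.06, 0.06, 0.04],
    [0.10, 0.15, 0.15, 0.05],
    [0.06, 0.09, 0.09, 0.11],
])

def is_independent(P):
    return np.allclose(P, np.outer(P.sum(axis=1), P.sum(axis=0)))

print(is_independent(P1))   # True
print(is_independent(P2))   # False, e.g. p(40,80)=0.05 but 0.45*0.2=0.09
```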


Proposition

Let X and Y be random variables with joint pdf (or pmf) f ( x, y ) . Then X and Y are
independent if and only if

(i) the supports of X and Y do not depend on each other (i.e. the region of
possible values is a rectangle); and

(ii) f ( x, y ) can be factorized as g ( x )h ( y ) .

The proposition also applies to discrete random variables.

Example 3.9

$$f(x, y) = \begin{cases} \frac{1}{4}(x + 1)(y + 1)e^{-x-y} & x, y > 0 \\ 0 & \text{otherwise} \end{cases}$$

X and Y are independent since the supports do not depend on each other and

$$f(x, y) = \left[ \frac{1}{4}(x + 1)e^{-x} \right] \left[ (y + 1)e^{-y} \right].$$

Example 3.10

$$f(x, y) = \begin{cases} \frac{1}{2}(x + y)e^{-x-y} & x, y > 0 \\ 0 & \text{otherwise} \end{cases}$$

X and Y are NOT independent as f ( x, y ) cannot be factorized as g ( x )h ( y ) .


Example 3.11

Suppose we randomly choose a point uniformly within the unit circle $x^2 + y^2 = 1$. Then the joint pdf of X and Y is given by

$$f(x, y) = \begin{cases} \frac{1}{\pi} & x^2 + y^2 \le 1 \\ 0 & x^2 + y^2 > 1 \end{cases}$$

Although $f(x, y)$ is a constant, X and Y are NOT independent because the support of X runs from $-\sqrt{1 - y^2}$ to $\sqrt{1 - y^2}$, which depends on the value of y.

Remarks

1. The definitions of joint pdf (pmf) and marginal pdf (pmf) can be generalized to the multivariate case directly.

• Joint pmf of $X_1, X_2, \ldots, X_n$: $p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$

• Joint pdf of $X_1, X_2, \ldots, X_n$: $f(x_1, x_2, \ldots, x_n)$

• Marginal pmf/pdf of $X_1$:

Discrete: $p_{X_1}(x_1) = \sum_{x_2} \sum_{x_3} \cdots \sum_{x_n} p(x_1, x_2, \ldots, x_n)$, for $x_1 \in X_1(\Omega)$

Continuous: $f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\, dx_2\, dx_3 \cdots dx_n$, for $-\infty < x_1 < \infty$

• $X_1, X_2, \ldots, X_n$ are said to be (mutually) independent if and only if

$$f(x_1, x_2, \ldots, x_n) = f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n) \qquad \text{for all } x_1, x_2, \ldots, x_n.$$

2. If $X_1, X_2, \ldots, X_n$ are independent, then for all subsets $A_1, A_2, \ldots, A_n \subset (-\infty, \infty)$,

$$P(X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n) = P(X_1 \in A_1) P(X_2 \in A_2) \cdots P(X_n \in A_n).$$

In particular, $F(x_1, x_2, \ldots, x_n) = F_{X_1}(x_1) F_{X_2}(x_2) \cdots F_{X_n}(x_n)$ for all $x_1, x_2, \ldots, x_n$. The converse is also true.


§ 3.3 Expectation of Function of Random Variables

Definition

For random variables X 1 , X 2 ,..., X n (not necessarily independent) with joint pmf
p( x1 , x2 ,..., xn ) or joint pdf f ( x1 , x2 ,..., xn ) ; if u ( X 1 , X 2 ,..., X n ) is a function of
these random variables, then the expectation of u ( X 1 , X 2 ,..., X n ) is defined as

Discrete: $E(u(X_1, X_2, \ldots, X_n)) = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} u(x_1, x_2, \ldots, x_n)\, p(x_1, x_2, \ldots, x_n)$

Continuous: $E(u(X_1, X_2, \ldots, X_n)) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_1, x_2, \ldots, x_n)\, f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$

Example 3.12

Pairs of resistors are to be connected in parallel and a difference in electrical potential applied across the resistor assembly. Ohm's law predicts that in such a situation the combined resistance would be

$$R = \left( \frac{1}{R_1} + \frac{1}{R_2} \right)^{-1}$$

where $R_1$ and $R_2$ are the two resistances. Suppose that experience tells us that the two resistances have joint pdf

$$f(x, y) = \begin{cases} \frac{xy^2}{18} & 0 < x < 2,\ 0 < y < 3 \\ 0 & \text{otherwise} \end{cases}$$

Since $\frac{1}{R} = \frac{1}{R_1} + \frac{1}{R_2}$, the expected value of the reciprocal of the combined resistance is given by

$$E\left( \frac{1}{R} \right) = \int_0^3 \int_0^2 \left( \frac{1}{x} + \frac{1}{y} \right) f(x, y)\, dx\, dy = \frac{1}{18} \int_0^3 \int_0^2 \left( y^2 + xy \right) dx\, dy$$
$$= \frac{1}{18} \int_0^3 \left[ y^2 x + \frac{x^2 y}{2} \right]_0^2 dy = \frac{1}{9} \int_0^3 \left( y^2 + y \right) dy = \frac{1}{9} \left[ \frac{y^3}{3} + \frac{y^2}{2} \right]_0^3 = 1.5\ \Omega^{-1}$$
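A quick numerical confirmation of this double integral (our own check, using scipy.integrate.dblquad; the 1.5 here is in reciprocal ohms):

```python
from scipy.integrate import dblquad

# Joint pdf of (R1, R2) in Example 3.12; dblquad integrands take arguments as (y, x)
f = lambda y, x: x * y**2 / 18

# E(1/R) = E(1/R1 + 1/R2), with x over (0, 2) and y over (0, 3)
value, _ = dblquad(lambda y, x: (1 / x + 1 / y) * f(y, x), 0, 2,
                   lambda x: 0, lambda x: 3)
print(value)   # approx 1.5
```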


If the two resistors are to be connected in series, then the combined resistance
would be
R = R1 + R2 .

Can we compute E (R ) by E (R1 ) + E (R2 ) ?

Properties

1. E ( X + Y ) = E ( X ) + E (Y ) ( X and Y need not be independent)

In general, E (u1 ( X ) + u 2 (Y )) = E (u1 ( X )) + E (u 2 (Y )) .

2. If X and Y are independent, then E ( XY ) = E ( X )E (Y ) .

In general, E (u1 ( X )u 2 (Y )) = E (u1 ( X ))E (u 2 (Y )) . The converse is NOT


necessarily true.

3. If X and Y are independent, then the moment generating function of X + Y is


equal to the product of the moment generating functions of X and Y , i.e.

M X +Y (t ) = M X (t )M Y (t ) .

4. If u ( X 1 , X 2 ,..., X n ) = g ( X 1 ) (i.e. u is a function of X 1 only), then the expectation


of g can be obtained by the marginal pmf/pdf of X 1 . That is

$$E(g(X_1)) = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_n} g(x_1)\, p(x_1, x_2, \ldots, x_n) = \sum_{x_1} g(x_1) \sum_{x_2} \sum_{x_3} \cdots \sum_{x_n} p(x_1, x_2, \ldots, x_n) = \sum_{x_1} g(x_1)\, p_{X_1}(x_1)$$


Example 3.13

For the previous example, the marginal pdfs of $R_1$ and $R_2$ are

$$f_{R_1}(x) = \begin{cases} \frac{x}{2} & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}; \qquad f_{R_2}(y) = \begin{cases} \frac{y^2}{9} & 0 < y < 3 \\ 0 & \text{otherwise} \end{cases}$$

Hence

$$E(R_1) = \int_0^2 \frac{x^2}{2}\, dx = \frac{4}{3}, \qquad E(R_2) = \int_0^3 \frac{y^3}{9}\, dy = \frac{9}{4}.$$

$$E(R) = E(R_1) + E(R_2) = \frac{4}{3} + \frac{9}{4} = \frac{43}{12}$$

Since $f_{R_1}(x)\, f_{R_2}(y) = \frac{xy^2}{18} = f(x, y)$ for all $0 < x < 2$ and $0 < y < 3$, $R_1$ and $R_2$ are independent. Thus

$$E(R_1 R_2) = E(R_1) E(R_2) = 3.$$

Example 3.14

Suppose $X \sim \chi^2_{r_1}$ and $Y \sim \chi^2_{r_2}$ are two independent random variables. Then the moment generating function of $X + Y$ is given by

$$M_{X+Y}(t) = M_X(t) M_Y(t) = \frac{1}{(1 - 2t)^{r_1/2}} \cdot \frac{1}{(1 - 2t)^{r_2/2}} = \frac{1}{(1 - 2t)^{(r_1 + r_2)/2}}, \qquad t < \frac{1}{2},$$

which is the moment generating function of $\chi^2_{r_1 + r_2}$. Since the moment generating function uniquely determines the distribution, we have

$$X + Y \sim \chi^2_{r_1 + r_2}.$$


§ 3.4 Population Covariance and Correlation

Definition

Let X and Y be random variables with means μ x , μ y respectively. The


population covariance between X and Y is defined as

σ xy = Cov( X , Y )
= E [( X − μ x )(Y − μ y )]
= E ( XY ) − μ x μ y

Example 3.15

In example 3.1, X is the number of red balls, Y is the number of white balls in a
sample of 3 randomly drawn from an urn containing 3 red balls, 4 white balls, and
5 blue balls. The following table shows the joint and marginal pmfs.

Values of Y
Values of X 0 1 2 3 Total
0 0.0454 0.1818 0.1364 0.0182 0.3818
1 0.1364 0.2727 0.0818 0 0.4909
2 0.0682 0.0545 0 0 0.1227
3 0.0045 0 0 0 0.0045
Total 0.2545 0.5091 0.2182 0.0182 1.0000

$$E(XY) = \sum_{x=0}^{3} \sum_{y=0}^{3} x y\, p(x, y) = 1 \times 1 \times 0.2727 + 1 \times 2 \times 0.0818 + 2 \times 1 \times 0.0545 = 0.5455$$

$$E(X) = 1 \times 0.4909 + 2 \times 0.1227 + 3 \times 0.0045 = 0.75$$

$$E(Y) = 1 \times 0.5091 + 2 \times 0.2182 + 3 \times 0.0182 = 1$$

$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0.5455 - 0.75 \times 1 = -0.2045$$
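The same covariance can be obtained mechanically from the joint pmf. A small Python sketch (our own, using the exact bivariate hypergeometric probabilities rather than the rounded table):

```python
import numpy as np
from math import comb

# Exact joint pmf of (X, Y) from Example 3.1
P = np.array([[comb(3, x) * comb(4, y) * comb(5, 3 - x - y) / comb(12, 3)
               if x + y <= 3 else 0.0
               for y in range(4)] for x in range(4)])

x = np.arange(4)
y = np.arange(4)
EX = (x * P.sum(axis=1)).sum()        # E(X) = 0.75
EY = (y * P.sum(axis=0)).sum()        # E(Y) = 1.0
EXY = (np.outer(x, y) * P).sum()      # E(XY) approx 0.5455
print(EXY - EX * EY)                  # Cov(X, Y) approx -0.2045
```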


Properties

1. The sign and the magnitude of σ xy reveal the direction and the strength of the
linear relationship between X and Y .
( X ↑, Y ↑, σ xy > 0; X ↑, Y ↓, σ xy < 0 )

2. The magnitude of σ xy depends on the scales of X and Y.

Let X ' = aX + b , Y ' = cY + d ; then

Cov( X ' , Y ') = Cov(aX + b, cY + d )


= E [(aX − aμ X )(cY − cμ Y )]
= acE[( X − μ X )(Y − μ Y )] = acCov( X , Y )

3. Cov( X , X ) = Var ( X )

4. $\mathrm{Cov}\left( \sum_{i=1}^{m} a_i X_i,\ \sum_{j=1}^{n} b_j Y_j \right) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j\, \mathrm{Cov}(X_i, Y_j)$

5. Var ( X + Y ) = Cov ( X + Y , X + Y )
= Cov ( X , X ) + Cov ( X , Y ) + Cov (Y , X ) + Cov (Y , Y )
= Var ( X ) + Var (Y ) + 2Cov ( X , Y )

In general, $\mathrm{Var}\left( \sum_{i=1}^{n} X_i \right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2 \sum_{i < j} \mathrm{Cov}(X_i, X_j)$.

Example 3.16
(Sampling without replacement from a finite population)

Suppose we randomly draw n balls from an urn with m red balls and N – m white
balls. Let X be the number of red balls in our sample. Then X has a hypergeometric
distribution with pmf

$$p(x) = P(X = x) = \frac{\binom{m}{x}\binom{N - m}{n - x}}{\binom{N}{n}}, \qquad \max(0,\ n - (N - m)) \le x \le \min(n, m).$$

Direct derivation of E ( X ) and Var ( X ) may be difficult.


Let
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th ball drawn is red} \\ 0 & \text{otherwise} \end{cases}$$
then $X = \sum_{i=1}^{n} Y_i$. Consider

$$E(Y_i) = E\left(Y_i^2\right) = P(Y_i = 1) = \frac{m}{N}, \qquad \mathrm{Var}(Y_i) = \frac{m}{N} - \frac{m^2}{N^2} = \frac{m(N - m)}{N^2}$$

$$E(Y_i Y_j) = P(Y_i = 1, Y_j = 1) = \frac{m}{N} \cdot \frac{m - 1}{N - 1}$$

$$\mathrm{Cov}(Y_i, Y_j) = \frac{m(m - 1)}{N(N - 1)} - \frac{m^2}{N^2} = -\frac{m(N - m)}{N^2(N - 1)} \qquad \text{for } i \ne j.$$

Hence

$$E(X) = E\left( \sum_{i=1}^{n} Y_i \right) = \sum_{i=1}^{n} E(Y_i) = \sum_{i=1}^{n} \frac{m}{N} = \frac{nm}{N}$$

$$\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(Y_i) + 2 \sum_{i < j} \mathrm{Cov}(Y_i, Y_j) = \sum_{i=1}^{n} \frac{m(N - m)}{N^2} + 2 \sum_{i < j} \left( -\frac{m(N - m)}{N^2(N - 1)} \right)$$
$$= \frac{nm(N - m)}{N^2} - \frac{n(n - 1)\, m(N - m)}{N^2(N - 1)} = \left( \frac{N - n}{N - 1} \right) n\, \frac{m}{N} \left( 1 - \frac{m}{N} \right)$$
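These formulas are easy to sanity-check by simulation. A short Monte Carlo sketch (our own, for the urn of Example 3.1 with N = 12, m = 3, n = 3):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, n = 12, 3, 3                        # population size, red balls, draws
urn = np.array([1] * m + [0] * (N - m))   # 1 = red, 0 = otherwise

draws = np.array([rng.choice(urn, size=n, replace=False).sum()
                  for _ in range(100_000)])

print(draws.mean())                                   # approx n*m/N = 0.75
print(draws.var())                                    # approx 0.46
print((N - n) / (N - 1) * n * (m / N) * (1 - m / N))  # formula above: 0.4602...
```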

Coefficient of Correlation

Since σ xy depends on the scales of X and Y, it is difficult to determine the strength


of the linear relationship between X and Y. Thus we need a standardized measure
which is invariant under linear transformation of X and Y.


Definition
Let X and Y be random variables with covariance $\sigma_{xy}$ and standard deviations $\sigma_x = \sqrt{\mathrm{Var}(X)}$, $\sigma_y = \sqrt{\mathrm{Var}(Y)}$. The population correlation coefficient between X and Y is defined as

$$\rho = \mathrm{Corr}(X, Y) = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}.$$

Example 3.17

For Example 3.15, $\sigma_{xy} = -0.2045$, $\sigma_x = 0.6784$, $\sigma_y = 0.7385$.

Hence $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{-0.2045}{0.6784 \times 0.7385} = -0.4082$. The number of red balls and the number of white balls in the sample are moderately negatively correlated.

Cauchy-Schwarz Inequality

Let X and Y be random variables with finite second moments. Then

$$\left( E(XY) \right)^2 \le E\left(X^2\right) E\left(Y^2\right).$$
The equality holds if and only if either P (Y = 0) = 1 or P ( X = aY ) = 1 for some
constant a, i.e. X and Y have a perfect linear relationship.

Properties of the correlation coefficient

1. − 1 ≤ ρ ≤ 1

By the Cauchy-Schwarz inequality,

$$\sigma_{xy}^2 = \left( E\left[ (X - \mu_X)(Y - \mu_Y) \right] \right)^2 \le E\left( (X - \mu_X)^2 \right) E\left( (Y - \mu_Y)^2 \right) = \sigma_x^2 \sigma_y^2,$$


i.e. $\rho^2 = \frac{\sigma_{XY}^2}{\sigma_X^2 \sigma_Y^2} \le 1$.

The equality holds ($\rho = \pm 1$) when X and Y are perfectly linearly related, i.e. when

$$P\left( X - \mu_X = a(Y - \mu_Y) \right) = 1.$$

2. ρ is invariant under linear transformation of X and Y .

Let X ' = aX + b , Y ' = cY + d . Then

$$\mathrm{Corr}(X', Y') = \frac{\mathrm{Cov}(aX + b, cY + d)}{\sqrt{\mathrm{Var}(aX + b)\mathrm{Var}(cY + d)}} = \frac{ac\,\mathrm{Cov}(X, Y)}{\sqrt{a^2 \mathrm{Var}(X)}\sqrt{c^2 \mathrm{Var}(Y)}} = \frac{ac\,\mathrm{Cov}(X, Y)}{|ac| \sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}} = \mathrm{sign}(ac)\, \mathrm{Corr}(X, Y)$$

3. If X and Y are independent, then

Cov( X , Y ) = E ( XY ) − E ( X )E (Y ) = E ( X )E (Y ) − E ( X )E (Y ) = 0 .

and hence ρ = 0 .

4. If X and Y are independent, then

Var ( X + Y ) = Var ( X − Y ) = Var ( X ) + Var (Y ) .

Remarks

1. The converse of property 3 need not be true. That is, ρ = 0 does not imply X
and Y are independent.

2. Correlation coefficient measures the strength of the linear relationship only. It


may be possible that X and Y are strongly related but that the relation is
curvilinear, and ρ would be nearly zero.

3. An observed correlation may be due to a third unknown causal variable.


Example 3.18

θ ~ U (0,2π ) , X = sin θ , Y = cosθ

Obviously X and Y are not independent because $P\left( X^2 + Y^2 = 1 \right) = 1$.

However,

$$E(X) = \int_0^{2\pi} \frac{\sin\theta}{2\pi}\, d\theta = \left[ -\frac{\cos\theta}{2\pi} \right]_0^{2\pi} = 0, \qquad E(Y) = \int_0^{2\pi} \frac{\cos\theta}{2\pi}\, d\theta = \left[ \frac{\sin\theta}{2\pi} \right]_0^{2\pi} = 0$$

$$E(XY) = \int_0^{2\pi} \frac{\sin\theta \cos\theta}{2\pi}\, d\theta = \int_0^{2\pi} \frac{\sin 2\theta}{4\pi}\, d\theta = \left[ -\frac{\cos 2\theta}{8\pi} \right]_0^{2\pi} = 0$$

Hence Cov( X , Y ) = 0 , i.e. X and Y are uncorrelated.
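This classic example (uncorrelated but completely dependent) is easy to see numerically. A short simulation sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, size=1_000_000)
X, Y = np.sin(theta), np.cos(theta)

print(np.cov(X, Y)[0, 1])              # approx 0: X and Y are uncorrelated
print(np.allclose(X**2 + Y**2, 1.0))   # True: yet X^2 + Y^2 = 1 always, so they are dependent
```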

§ 3.5 Linear Combinations of Random Variables

Sometimes we may be interested in the linear combinations (or weighted sums) of


random variables as can be seen from the following example.

Example 3.19

Suppose that a couple is drawn at random from a large population of working


couples. In thousand dollar units, let

X = man’s income, Y = woman’s income

Then the couple’s total income is the sum S = X +Y .

Suppose their pension contribution is 10% of the man’s income, and 20% of the
woman’s income. Then the couple’s total pension contribution is a weighted sum:

W = 0.1X + 0.2Y .

Suppose we know that the average man’s income is E ( X ) = 20 and the average
woman’s is E (Y ) = 16 , then the average total income is

E (S ) = E ( X + Y ) = E ( X ) + E (Y ) = 20 + 16 = 36


and the average total pension contribution is

E (W ) = E (0.1X + 0.2Y ) = 0.1E ( X ) + 0.2 E (Y ) = 2 + 3.2 = 5.2 .

Furthermore, if we were given the joint distribution of X and Y from which we calculated $\sigma_x^2 = 60$, $\sigma_y^2 = 70$, $\sigma_{xy} = 49$, then

$$\mathrm{Var}(S) = \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y) = 60 + 70 + 2(49) = 228$$

$$\sigma_S = \sqrt{228} = 15.1$$

and

$$\mathrm{Var}(W) = \mathrm{Var}(0.1X + 0.2Y) = (0.1)^2 \mathrm{Var}(X) + (0.2)^2 \mathrm{Var}(Y) + 2(0.1)(0.2)\mathrm{Cov}(X, Y)$$
$$= (0.01)(60) + (0.04)(70) + 2(0.02)(49) = 5.36$$

$$\sigma_W = \sqrt{5.36} = 2.32$$

§ 3.6 Sum of Independent Random Variables

Let $X_1, X_2, \ldots, X_n$ be independent random variables (not necessarily identically distributed) with means and variances

$$E(X_i) = \mu_i, \qquad \mathrm{Var}(X_i) = \sigma_i^2.$$

Let $Y = \sum_{i=1}^{n} a_i X_i$ where the $a_i$'s are constants. Then

$$E(Y) = \sum_{i=1}^{n} a_i E(X_i) = \sum_{i=1}^{n} a_i \mu_i$$

$$\mathrm{Var}(Y) = \sum_{i=1}^{n} \mathrm{Var}(a_i X_i) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i) = \sum_{i=1}^{n} a_i^2 \sigma_i^2$$


If, in addition, $X_1, X_2, \ldots, X_n$ have the same marginal distribution with pdf $f(x)$, then $X_1, X_2, \ldots, X_n$ are said to be independent and identically distributed (iid) with pdf $f(x)$. We call this set of random variables $\{X_1, X_2, \ldots, X_n\}$ a random sample from $f(x)$. For a random sample, all the random variables have common mean and variance

$$E(X_i) = \mu, \qquad \mathrm{Var}(X_i) = \sigma^2.$$

Based on a random sample, we usually compute some summary statistics such as the sample mean and sample variance. The probabilistic behaviours of these summary statistics are called their sampling distributions.

Sample Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

$$E(\bar{X}) = \frac{1}{n} \sum_{i=1}^{n} \mu = \frac{1}{n}\, n\mu = \mu$$

$$\mathrm{Var}(\bar{X}) = \sum_{i=1}^{n} \frac{1}{n^2} \sigma^2 = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$

Sample Variance: $S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2$

$$E\left\{ \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2 \right\} = \sum_{i=1}^{n} E\left( \left( X_i - \bar{X} \right)^2 \right) = \sum_{i=1}^{n} \mathrm{Var}\left( X_i - \bar{X} \right) \qquad \left( \text{since } E\left( X_i - \bar{X} \right) = \mu - \mu = 0 \right)$$

$$= \sum_{i=1}^{n} \left\{ \mathrm{Var}(X_i) + \mathrm{Var}(\bar{X}) - 2\mathrm{Cov}\left( X_i, \bar{X} \right) \right\} = \sum_{i=1}^{n} \left\{ \sigma^2 + \frac{\sigma^2}{n} - 2\frac{\sigma^2}{n} \right\} = (n - 1)\sigma^2$$

Hence $E\left( S^2 \right) = \sigma^2$.
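The unbiasedness of S² is easy to illustrate by simulation; the sketch below (our own, with an arbitrary choice of n and σ²) averages the sample variance over many samples:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 5, 4.0

# Many samples of size n from a distribution with variance sigma2
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(200_000, n))
S2 = samples.var(axis=1, ddof=1)   # sample variance with the 1/(n-1) divisor

print(S2.mean())                   # approx sigma2 = 4.0, illustrating E(S^2) = sigma^2
```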


Moment generating function

Let $X_1, X_2, \ldots, X_n$ be independent (not necessarily identically distributed) random variables with moment generating functions $M_{X_i}(t)$. Let $Y = \sum_{i=1}^{n} a_i X_i$; then the moment generating function of Y is given by

$$M_Y(t) = E\left( e^{tY} \right) = E\left( e^{t(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n)} \right) = E\left( e^{t a_1 X_1} e^{t a_2 X_2} \cdots e^{t a_n X_n} \right)$$
$$= E\left( e^{t a_1 X_1} \right) E\left( e^{t a_2 X_2} \right) \cdots E\left( e^{t a_n X_n} \right) \qquad \text{(independence)}$$
$$= M_{X_1}(a_1 t) M_{X_2}(a_2 t) \cdots M_{X_n}(a_n t) = \prod_{i=1}^{n} M_{X_i}(a_i t).$$

Example 3.20

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed with moment generating function $M_X(t)$. Then the moment generating function of the sample mean can be evaluated by

$$M_{\bar{X}}(t) = \prod_{i=1}^{n} M_X\left( \frac{t}{n} \right) = \left\{ M_X\left( \frac{t}{n} \right) \right\}^n.$$

Example 3.21

Suppose $X_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$. The moment generating function of each individual random variable is

$$M_X(t) = \exp\left\{ \mu t + \frac{\sigma^2 t^2}{2} \right\}.$$

Therefore the moment generating function of the sample mean is given by

$$M_{\bar{X}}(t) = \left\{ \exp\left[ \mu \frac{t}{n} + \frac{\sigma^2 (t/n)^2}{2} \right] \right\}^n = \exp\left\{ \mu t + \frac{(\sigma^2/n)\, t^2}{2} \right\}$$

Since the moment generating function uniquely determines the distribution, we have

$$\bar{X} \sim N\left( \mu, \frac{\sigma^2}{n} \right).$$
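A short simulation check of this result (our own, with arbitrary μ, σ and n):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 5.0, 2.0, 10

# Sample means of 100,000 samples of size n from N(mu, sigma^2)
xbar = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

print(xbar.mean())   # approx mu = 5.0
print(xbar.std())    # approx sigma / sqrt(n) = 0.632
```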


Example 3.22

Let X be a particular measurement on an experimental object. Since there would be measurement error, the value of X can be viewed as random. Assume that $X \sim N(\mu, 0.01)$ where $\mu$ is the true measurement value for that object. Then the probability that a measurement deviates from the true value by more than 0.05 is given by

$$P\left( |X - \mu| > 0.05 \right) = P\left( \frac{|X - \mu|}{0.1} > 0.5 \right) = 2\left( 1 - \Phi(0.5) \right) = 2(0.309) = 0.618$$

If we take 10 measurements independently and use the sample mean of these 10 measurements as our final measure, then this sample mean will be distributed as

$$\bar{X} \sim N\left( \mu, \frac{0.01}{10} \right), \qquad \text{i.e.} \quad \frac{\bar{X} - \mu}{0.1/\sqrt{10}} \sim N(0, 1).$$

Hence the probability that this final measure deviates from the true value by more than 0.05 is

$$P\left( |\bar{X} - \mu| > 0.05 \right) = P\left( \frac{|\bar{X} - \mu|}{0.1/\sqrt{10}} > \frac{0.05}{0.1/\sqrt{10}} \right) = 2\left( 1 - \Phi(1.58) \right) = 2(0.057) = 0.114$$
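The two normal tail probabilities can be reproduced directly (our own check, using scipy.stats.norm in place of the Φ table):

```python
from scipy.stats import norm

sigma, n = 0.1, 10

# Single measurement: P(|X - mu| > 0.05) with sd = 0.1
print(2 * (1 - norm.cdf(0.05 / sigma)))              # approx 0.617

# Mean of 10 measurements: sd = 0.1 / sqrt(10)
print(2 * (1 - norm.cdf(0.05 / (sigma / n**0.5))))   # approx 0.114
```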

Example 3.23

Suppose $X_i \overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda)$. The moment generating function of each X is

$$M_X(t) = \frac{\lambda}{\lambda - t}, \qquad t < \lambda.$$

The moment generating function of the sample mean is hence given by

$$M_{\bar{X}}(t) = \left( \frac{\lambda}{\lambda - t/n} \right)^n = \left( \frac{n\lambda}{n\lambda - t} \right)^n, \qquad t < n\lambda.$$

Comparing with the moment generating function of the gamma distribution, we have

$$\bar{X} \sim \Gamma(n, n\lambda).$$


Example 3.24

Let $X_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$. How is $S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2$ distributed?

Consider

$$\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} \left( X_i - \bar{X} + \bar{X} - \mu \right)^2 = \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2 + \sum_{i=1}^{n} \left( \bar{X} - \mu \right)^2 + 2\left( \bar{X} - \mu \right) \sum_{i=1}^{n} \left( X_i - \bar{X} \right)$$
$$= \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2 + n\left( \bar{X} - \mu \right)^2$$

Hence

$$\sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 = \frac{(n - 1)S^2}{\sigma^2} + \left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right)^2$$
$$W = Y + Z \quad \text{(say)}$$

$$X_i \sim N(\mu, \sigma^2) \ \Rightarrow\ \frac{X_i - \mu}{\sigma} \sim N(0, 1) \ \Rightarrow\ \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_1$$

Since the $X_i$'s are independent, we have $W \sim \chi^2_n$.

On the other hand,

$$\bar{X} \sim N\left( \mu, \frac{\sigma^2}{n} \right) \ \Rightarrow\ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \ \Rightarrow\ \left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right)^2 \sim \chi^2_1, \quad \text{i.e. } Z \sim \chi^2_1.$$

It can be shown (in a later section) that $\bar{X}$ and $S^2$ are independent. Therefore Y and Z are also independent. Now consider the moment generating functions:

$$M_W(t) = M_Y(t) M_Z(t)$$
$$\frac{1}{(1 - 2t)^{n/2}} = M_Y(t) \cdot \frac{1}{(1 - 2t)^{1/2}}$$
$$M_Y(t) = \frac{1}{(1 - 2t)^{(n-1)/2}}$$


Therefore for a normal random sample,

$$Y = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}, \qquad \text{or equivalently,} \qquad S^2 \sim \Gamma\left( \frac{n - 1}{2},\ \frac{n - 1}{2\sigma^2} \right).$$

$$E\left( S^2 \right) = \frac{(n - 1)/2}{(n - 1)/(2\sigma^2)} = \sigma^2, \qquad \mathrm{Var}\left( S^2 \right) = \frac{(n - 1)/2}{\left( (n - 1)/(2\sigma^2) \right)^2} = \frac{2\sigma^4}{n - 1}$$
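A simulation sketch (our own, with arbitrary n and σ) comparing the simulated distribution of (n−1)S²/σ² with the χ² distribution on n−1 degrees of freedom, and checking Var(S²):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
n, sigma = 8, 1.5

S2 = rng.normal(0, sigma, size=(200_000, n)).var(axis=1, ddof=1)
Y = (n - 1) * S2 / sigma**2

# Simulated quantiles of (n-1)S^2/sigma^2 vs chi-square(n-1) quantiles
print(np.quantile(Y, [0.25, 0.5, 0.75]))
print(chi2.ppf([0.25, 0.5, 0.75], df=n - 1))

print(S2.var(), 2 * sigma**4 / (n - 1))   # both approx 1.45
```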

§ 3.7 Sampling without replacement

When we write down the expression $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} f(x)$, we implicitly assume that there is an underlying population with a histogram resembling the shape of $f(x)$. Independence among the random variables can be justified if either

(i) the X's are drawn with replacement; or

(ii) there are infinitely many objects in the population.

However, in some real life applications neither of these two criteria is satisfied. If the population size N is very large, such that the ratio $n/N$ is negligible, then $X_1, X_2, \ldots, X_n$ can still be assumed to be approximately mutually independent. On the other hand, if $n/N$ is not so small, then $X_1, X_2, \ldots, X_n$ is no longer a random sample. In this case it is called a simple random sample (SRS), and the inference about the sample mean should be adjusted as

$$E(\bar{X}) = \mu, \qquad \mathrm{Var}(\bar{X}) = \left( \frac{N - n}{N - 1} \right) \frac{\sigma^2}{n}.$$

The factor $\left( \frac{N - n}{N - 1} \right)$ is called the finite population correction factor.
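A small simulation sketch (our own, with an arbitrary finite population) illustrating the finite population correction for the variance of the sample mean under sampling without replacement:

```python
import numpy as np

rng = np.random.default_rng(5)
N, n = 50, 10
population = rng.normal(0, 1, size=N)
sigma2 = population.var()                 # population variance (divisor N)

xbars = np.array([rng.choice(population, size=n, replace=False).mean()
                  for _ in range(100_000)])

print(xbars.var())                        # simulated Var(X-bar)
print((N - n) / (N - 1) * sigma2 / n)     # (N-n)/(N-1) * sigma^2 / n
```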
