MATH 156 Chapter 4 - Probability and Calculus


Chapter 4

Probability and Calculus

Goal: We often make decisions where the outcome is uncertain. In this chapter, we discuss how such situations can be modelled with probability, and how this relates to the calculus topics you have learnt so far.

4.1 Probability distributions

→ Section 10.1 in the textbook

• Definitions:
  • An experiment is an activity with observable results.
  • Outcomes are the observable results of an experiment.
  • The sample space is the collection of all outcomes of an experiment.
  • An event is a collection of one or more outcomes of an experiment.
• Examples:

  Experiment               toss a coin     roll a die            Apple's profit in Q1 2020
  Example for an outcome   head            2                     US$ 21 bn
  Sample space             {head, tail}    {1, 2, 3, 4, 5, 6}    (−∞, ∞)
  Example for an event     {head}          {1, 3, 5}             [21 bn, 22 bn]*

  *This is the event that Apple's profit for the first quarter of 2020 will be between US$ 21 and 22 bn.

• Definition: The probability of an event is a number between 0 and 1, giving the proportion of times that the event will occur if the experiment associated with the event is repeated independently many times.

• Examples: 1) P({head}) = 1/2 for tossing a coin,
  2) P({1, 3, 5}) = 1/2 for rolling a fair die,
  3) to find P(Apple's profit is in [21 bn, 22 bn]), we need to model the distribution of Apple's profit.

• Definitions: Often the outcome of an experiment is a number, or we can assign a number to it. We denote it by X, which we call a random variable. Two important groups of random variables:
  A) A random variable is called finite discrete if it takes only a finite number of different values.
  B) A random variable is called continuous if it can take any value in an interval.

• Examples: 1) X = 1 if a coin toss results in head, and X = 0 if the coin toss results in tail; this random variable is finite discrete.

  2) X = outcome of rolling a die, hence X takes a value in {1, 2, 3, 4, 5, 6}; this random variable is finite discrete.

  3) X = Apple's profit in the first quarter of 2020; this random variable is continuous.

• Definition: A probability density function for a continuous random variable X is a function f(x) such that
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
for all a ≤ b.

[Figure: graph of a density f(x); the shaded area between x = a and x = b is P(a ≤ X ≤ b).]

• Properties: A probability density function f(x) satisfies:
  1) f(x) ≥ 0 for all x,
  2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
  Conversely, any function f with these two properties is the probability density function for some continuous random variable X.

• Examples: 1) For a < b, we set
$$f(x) = \begin{cases} c & \text{if } a \le x \le b, \\ 0 & \text{otherwise}, \end{cases}$$
for a constant c > 0. Find c such that f is a probability density function. Remark: This function f is called the uniform density function on [a, b].

[Figure: uniform density f(x) = c on [a, b]; the rectangle under the graph has area = 1.]

We first note that f(x) ≥ 0 for all x. To find c, we compute
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_a^b c\,dx = cx\Big|_a^b = c(b-a)$$
and set it equal to 1. We obtain c(b − a) = 1, hence c = 1/(b − a).
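The normalization c = 1/(b − a) can be checked numerically. Below is a minimal sketch in Python; the helper names `uniform_pdf` and `riemann` are my own, not from the notes:

```python
def uniform_pdf(x, a, b):
    """Uniform density on [a, b]: height c = 1/(b - a) inside, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def riemann(f, lo, hi, n=100_000):
    """Simple midpoint Riemann sum, enough to check that the total area is 1."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

# For a = 1, b = 3 the height is c = 1/2 and the total area should be 1.
area = riemann(lambda x: uniform_pdf(x, 1.0, 3.0), 0.0, 4.0)
```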

2) Consider
$$f(x) = \begin{cases} c & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases}$$
for a constant c > 0. Find c such that f is a probability density function.

There is no constant c such that f is a probability density function. Indeed, we compute
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} c\,dx = \lim_{b\to\infty} \int_0^b c\,dx = \lim_{b\to\infty} cx\Big|_0^b = \lim_{b\to\infty} bc = \infty,$$
which can never be equal to 1 for any choice of c.

3) Consider
$$f(x) = \begin{cases} kx^3 & \text{if } 0 \le x \le 2, \\ 0 & \text{otherwise}, \end{cases}$$
for a constant k > 0. (a) Find k such that f is a probability density function.
(b) Let X be a random variable with this probability density function f. Find the probabilities
(i) P(1 ≤ X ≤ 2), (ii) P(−1 ≤ X ≤ 1), (iii) P(X = 1).

[Figure: graph of the density f(x) = kx³ on [0, 2]; the area under the curve is 1.]

(a) We first note that f(x) ≥ 0 for all x. To find k, we compute
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^2 kx^3\,dx = \frac{k}{4}x^4\Big|_0^2 = \frac{2^4 k}{4} = 4k$$
and set it equal to 1. We obtain 4k = 1, so that k = 1/4.

(b) Using $P(a \le X \le b) = \int_a^b f(x)\,dx$, we compute the probabilities
$$\text{(i) } P(1 \le X \le 2) = \int_1^2 f(x)\,dx = \int_1^2 \frac{1}{4}x^3\,dx = \frac{1}{16}x^4\Big|_1^2 = \frac{1}{16}(16 - 1) = \frac{15}{16},$$
$$\text{(ii) } P(-1 \le X \le 1) = \int_{-1}^1 f(x)\,dx = \int_0^1 \frac{1}{4}x^3\,dx = \frac{1}{16}x^4\Big|_0^1 = \frac{1}{16}(1 - 0) = \frac{1}{16},$$
$$\text{(iii) } P(X = 1) = \int_1^1 f(x)\,dx = \int_1^1 \frac{1}{4}x^3\,dx = \frac{1}{16}x^4\Big|_1^1 = \frac{1}{16}(1 - 1) = 0,$$
where we used in (ii) that f(x) = 0 for all x < 0.
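These three probabilities can also be checked with a short computation. The sketch below uses the antiderivative F(x) = x⁴/16 (the cumulative distribution for k = 1/4); the function name `cdf` is my own:

```python
def cdf(x):
    """CDF of the density f(x) = x**3 / 4 on [0, 2]: F(x) = x**4 / 16."""
    if x < 0:
        return 0.0
    if x > 2:
        return 1.0
    return x**4 / 16

p1 = cdf(2) - cdf(1)    # (i)   P(1 <= X <= 2) = 15/16
p2 = cdf(1) - cdf(-1)   # (ii)  P(-1 <= X <= 1) = 1/16
p3 = cdf(1) - cdf(1)    # (iii) P(X = 1) = 0
```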

Generalizing (iii), we have
$$P(X = a) = \int_a^a f(x)\,dx = 0$$
for any continuous random variable X and number a.

4) Consider
$$f(x) = \begin{cases} k e^{-kx} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases}$$
for a constant k > 0. (a) Check that f is a probability density function. Remark: This function f is called the exponential density function.
(b) Let X be a random variable with this probability density function f for k = 1/10. Find the probability P(X > 20).
[Figure: exponential density f(x) = (1/10)e^{−x/10}; the area under the curve is 1.]

(a) We have f(x) ≥ 0 for all x and
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{b\to\infty} \int_0^b k e^{-kx}\,dx = \lim_{b\to\infty} \left( -\frac{k}{k} e^{-kx} \Big|_0^b \right) = \lim_{b\to\infty} \left( -e^{-kb} + 1 \right) = 1,$$
so that f is indeed a probability density function.

(b) We compute
$$P(X > 20) = \int_{20}^{\infty} f(x)\,dx = \lim_{b\to\infty} \int_{20}^{b} \frac{1}{10} e^{-x/10}\,dx = \lim_{b\to\infty} \left( -\frac{10}{10} e^{-x/10} \Big|_{20}^{b} \right) = \lim_{b\to\infty} \left( -e^{-b/10} + e^{-20/10} \right) = e^{-2} \approx 0.1353.$$
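Since the exponential tail probability has the closed form P(X > t) = e^{−kt}, the computation above is quick to check in code (the helper name `exp_tail` is my own):

```python
import math

def exp_tail(t, k):
    """P(X > t) for the exponential density k*e**(-k*x) on x >= 0: equals e**(-k*t)."""
    return math.exp(-k * t)

# k = 1/10, t = 20 gives e**(-2) ~ 0.1353, matching the worked example.
p = exp_tail(20, 1 / 10)
```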

• Comments: Sometimes the outcomes of an experiment are associated with more than one random variable. For example, we may be interested in the relationship between the profits of Apple and Microsoft. To study such problems, we extend the concept of a probability density function of a random variable to functions of more than one variable.

• Definition: A joint probability density function for two continuous random variables X and Y is a function f(x, y) such that
$$P(a \le X \le b,\ c \le Y \le d) = \int_c^d \int_a^b f(x, y)\,dx\,dy$$
for all a < b and c < d.

• Properties: A joint probability density function f(x, y) satisfies:
  1) f(x, y) ≥ 0 for all x and y,
  2) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
  Conversely, any function f(x, y) with these two properties is the joint probability density function for some continuous random variables X and Y.

• Examples: 1) Consider
$$f(x, y) = \begin{cases} k e^{-x-2y} & \text{if } x \ge 0 \text{ and } y \ge 0, \\ 0 & \text{otherwise}, \end{cases}$$
for a constant k > 0. (a) Find k such that f is a joint probability density function.
Let X and Y be random variables with this joint probability density function f(x, y).
(b) Find the probability P(X > 1, 0 ≤ Y ≤ 2).
(c) Find the probability P(−1 ≤ Y ≤ 1).

(a) We first note that f(x, y) ≥ 0 for all x and y. To find k, we compute
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_0^{\infty} \int_0^{\infty} k e^{-x-2y}\,dx\,dy$$
by first finding
$$\int_0^{\infty} k e^{-x-2y}\,dx = \lim_{b\to\infty} \int_0^b k e^{-x-2y}\,dx = \lim_{b\to\infty} \left( -k e^{-x-2y} \Big|_{x=0}^{x=b} \right) = \lim_{b\to\infty} \left( -k e^{-b-2y} + k e^{0-2y} \right) = k e^{-2y},$$
so that
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_0^{\infty} k e^{-2y}\,dy = \lim_{b\to\infty} \int_0^b k e^{-2y}\,dy = \lim_{b\to\infty} \left( -\frac{k}{2} e^{-2y} \Big|_0^b \right) = \lim_{b\to\infty} \left( -\frac{k}{2} e^{-2b} + \frac{k}{2} \right) = \frac{k}{2}.$$
We set this equal to 1, resulting in k = 2.

(b) To compute the probability
$$P(X > 1,\ 0 \le Y \le 2) = \int_0^2 \int_1^{\infty} 2 e^{-x-2y}\,dx\,dy,$$
we first find
$$\int_1^{\infty} 2 e^{-x-2y}\,dx = \lim_{b\to\infty} \int_1^b 2 e^{-x-2y}\,dx = \lim_{b\to\infty} \left( -2 e^{-x-2y} \Big|_{x=1}^{x=b} \right) = \lim_{b\to\infty} \left( -2 e^{-b-2y} + 2 e^{-1-2y} \right) = 2 e^{-1-2y},$$
so that
$$P(X > 1,\ 0 \le Y \le 2) = \int_0^2 2 e^{-1-2y}\,dy = -\frac{2}{2} e^{-1-2y} \Big|_0^2 = -e^{-5} + e^{-1} \approx 0.3611.$$

(c) The probability P(−1 ≤ Y ≤ 1) can be written as P(X ≥ 0, −1 ≤ Y ≤ 1) because there is no restriction on X, and X takes only nonnegative values. We can write
$$P(-1 \le Y \le 1) = P(X \ge 0,\ -1 \le Y \le 1) = \int_0^1 \int_0^{\infty} 2 e^{-x-2y}\,dx\,dy,$$
where we started the y-integral at 0 because Y does not take any negative values. We compute
$$\int_0^{\infty} 2 e^{-x-2y}\,dx = 2 e^{-2y}$$
similarly to part (b). We then obtain
$$P(-1 \le Y \le 1) = \int_0^1 2 e^{-2y}\,dy = -\frac{2}{2} e^{-2y} \Big|_0^1 = -e^{-2} + 1 \approx 0.8647.$$
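For this particular joint density, integrating the exponentials in closed form gives P(X > a, c ≤ Y ≤ d) = e^{−a}(e^{−2c} − e^{−2d}), which makes parts (b) and (c) easy to check; a small Python sketch (the helper name `prob` is mine):

```python
import math

def prob(x_lo, y_lo, y_hi):
    """P(X > x_lo, y_lo <= Y <= y_hi) for f(x, y) = 2*e**(-x - 2*y) on x, y >= 0."""
    return math.exp(-x_lo) * (math.exp(-2 * y_lo) - math.exp(-2 * y_hi))

p_b = prob(1, 0, 2)   # part (b): -e**-5 + e**-1 ~ 0.3611
p_c = prob(0, 0, 1)   # part (c):  1 - e**-2   ~ 0.8647
```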

2) Consider
$$f(x, y) = \begin{cases} k(x + y^2) & \text{if } -1 \le x \le 1 \text{ and } -1 \le y \le 1, \\ 0 & \text{otherwise}, \end{cases}$$
for a constant k > 0. Find k such that f is a joint probability density function.

For any k > 0, the function f(x, y) takes negative values when x < 0 and y² < −x. Therefore, there is no k such that f is a joint probability density function.

4.2 Expected value and standard deviation

→ Section 10.2 in the textbook

• Motivation: The expected value is a key aspect of a random variable. It is defined as the probability-weighted average of the random variable. There are different definitions for finite discrete and continuous random variables.

• Definition: For a finite discrete random variable X that takes the values x₁, x₂, ..., xₙ with probabilities p₁, p₂, ..., pₙ, respectively, the expected value is given by
$$E(X) = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n.$$
For a continuous random variable X with probability density function f, the expected value is given by
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$

• Interpretation: The expected value E(X) is the centre of the probability distribution of X. Indeed, if we think of the distribution as a mass distribution (with total mass 1), then the expected value is the centre of mass or balance point. The two pictures on the next page illustrate this for finite discrete and continuous random variables.

• Examples: 1) What is the expected value of the outcome of rolling a fair 6-sided die?

Each possible outcome x₁ = 1, x₂ = 2, ..., x₆ = 6 is taken with equal probability pᵢ = 1/6, so that the expected value equals
$$x_1 p_1 + x_2 p_2 + \cdots + x_6 p_6 = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + \cdots + 6 \cdot \frac{1}{6} = \frac{1}{6}(1 + 2 + \cdots + 6) = \frac{21}{6} = 3.5.$$
2) Assume that the life span of a certain electronic component is modelled by a random variable X that has exponential density
$$f(x) = \begin{cases} k e^{-kx} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases}$$
for a constant k > 0. What is the average life span of the component?

The average life span of the component is given by the expected value
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^{\infty} k x e^{-kx}\,dx = \lim_{b\to\infty} \int_0^b k x e^{-kx}\,dx.$$

We compute this integral using integration by parts (with u = kx and dv = e^{−kx} dx, so v = −(1/k)e^{−kx}):
$$\int_0^b kx\, e^{-kx}\,dx = kx \left( -\frac{1}{k} e^{-kx} \right) \Big|_0^b - \int_0^b k \left( -\frac{1}{k} e^{-kx} \right) dx = -b e^{-kb} - \frac{1}{k} e^{-kx} \Big|_0^b = -b e^{-kb} - \frac{1}{k} e^{-kb} + \frac{1}{k}.$$
By L'Hôpital's rule, we have
$$\lim_{b\to\infty} \left( -b e^{-kb} \right) = -\lim_{b\to\infty} \frac{b}{e^{kb}} = -\lim_{b\to\infty} \frac{1}{k e^{kb}} = 0,$$
so that we conclude
$$E(X) = \lim_{b\to\infty} \int_0^b kx e^{-kx}\,dx = \lim_{b\to\infty} \left( -b e^{-kb} - \frac{1}{k} e^{-kb} + \frac{1}{k} \right) = \frac{1}{k}.$$

• Notation: The so-called sigma notation allows us to write a sum with many terms in compact form as
$$\sum_{j=1}^n a_j = a_1 + a_2 + \cdots + a_n,$$
where $\sum$ stands for sum.

In particular, we can write the expected value of a finite discrete random variable as
$$E(X) = \sum_{j=1}^n x_j p_j.$$

• Properties of sums:
1. Constant coefficients:
$$\sum_{j=1}^n a = a + a + \cdots + a = na.$$
2. Linearity:
$$\sum_{j=1}^n (c a_j + k b_j) = c \sum_{j=1}^n a_j + k \sum_{j=1}^n b_j,$$
because we can write
$$\sum_{j=1}^n (c a_j + k b_j) = (c a_1 + k b_1) + (c a_2 + k b_2) + \cdots + (c a_n + k b_n) = c a_1 + c a_2 + \cdots + c a_n + k b_1 + k b_2 + \cdots + k b_n = c \sum_{j=1}^n a_j + k \sum_{j=1}^n b_j.$$
• Properties of the expected value:
1. Constant random variable: if X = c with probability 1, the expected value is E(X) = c.
2. Linearity:
$$E(cX + kY) = c E(X) + k E(Y)$$
for random variables X, Y and constants c, k.

To see this for finite discrete random variables, we use the sigma notation:
$$E(cX + kY) = \sum_{j=1}^n (c x_j + k y_j) p_j = c \sum_{j=1}^n x_j p_j + k \sum_{j=1}^n y_j p_j = c E(X) + k E(Y).$$

Linearity of the expected value is also valid for continuous random variables, but we do not show this here.

• Motivation: In addition to the expected value, we are often interested in measuring how spread out the possible outcomes are.

• Definitions: The variance of a random variable X is given by
$$\mathrm{Var}(X) = E\left[ \left( X - E(X) \right)^2 \right].$$
The standard deviation of X is given by
$$\sigma(X) = \sqrt{\mathrm{Var}(X)}.$$

• Computation: If X is a finite discrete random variable that takes the values x₁, x₂, ..., xₙ with probabilities p₁, p₂, ..., pₙ, respectively, the variance is computed as
$$\mathrm{Var}(X) = \sum_{j=1}^n (x_j - \mu)^2 p_j, \quad \text{where } \mu = \sum_{j=1}^n x_j p_j.$$
If X is a continuous random variable with probability density function f, the variance is computed as
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx, \quad \text{where } \mu = \int_{-\infty}^{\infty} x f(x)\,dx.$$

• Comment: σ describes the variability in the distribution of X. If σ is small, the values of X tend to be close to the expected value (the centre). If σ is large, the values of X are more spread out.

• Properties: For a constant c and a random variable X:

1. Var(cX) = c²Var(X) and σ(cX) = |c|σ(X):
$$\mathrm{Var}(cX) = E\left[ (cX - E(cX))^2 \right] = E\left[ (cX - cE(X))^2 \right] = c^2 E\left[ (X - E(X))^2 \right] = c^2\,\mathrm{Var}(X)$$
and
$$\sigma(cX) = \sqrt{\mathrm{Var}(cX)} = \sqrt{c^2\,\mathrm{Var}(X)} = |c|\,\sigma(X).$$

2. Var(c) = 0 and σ(c) = 0:
$$\mathrm{Var}(c) = E\big[ (\underbrace{c - E(c)}_{=0})^2 \big] = 0.$$

3. Var(X + c) = Var(X) and σ(X + c) = σ(X):
$$\mathrm{Var}(X + c) = E\left[ (X + c - E(X + c))^2 \right] = E\left[ (X - E(X))^2 \right] = \mathrm{Var}(X).$$

4. Var(X) = E(X²) − (E(X))²:
$$\mathrm{Var}(X) = E\left[ (X - E(X))^2 \right] = E\left[ X^2 - 2XE(X) + (E(X))^2 \right] = E(X^2) - 2E(X)E(X) + (E(X))^2 = E(X^2) - (E(X))^2.$$

Property 4 is useful for computing the variance, as it is often faster to compute E(X²) than to use the definition of variance directly.

• Example: Find the standard deviation of a random variable X associated with the probability density function
$$f(x) = \begin{cases} \dfrac{4}{3x^2} & \text{if } 1 \le x \le 4, \\ 0 & \text{otherwise}. \end{cases}$$

We solve this question in four steps.
1. Compute E(X):
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_1^4 \frac{4}{3x}\,dx = \frac{4}{3}\ln(x)\Big|_1^4 = \frac{4}{3}\ln(4) - \frac{4}{3}\ln(1) = \frac{4}{3}\ln(4).$$
2. Compute E(X²):
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_1^4 \frac{4}{3}\,dx = \frac{4}{3}x\Big|_1^4 = \frac{16}{3} - \frac{4}{3} = 4.$$
3. Compute Var(X):
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = 4 - \left( \frac{4}{3}\ln(4) \right)^2.$$
4. Compute σ(X):
$$\sigma(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{4 - \frac{16}{9}(\ln(4))^2} \approx 0.7638.$$
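The four steps translate directly into a short computation (the variable names are my own):

```python
import math

# f(x) = 4/(3*x**2) on [1, 4]; closed forms from the worked example.
mean = (4 / 3) * math.log(4)        # step 1: E(X) = (4/3) ln 4
second_moment = 4.0                 # step 2: E(X**2) = 4
variance = second_moment - mean**2  # step 3: Var(X) = E(X**2) - E(X)**2
sigma = math.sqrt(variance)         # step 4: standard deviation
```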

4.3 Covariance and correlation

• Motivation: In this section, we discuss how to measure the dependence between two random variables; for example, how the returns of two stocks depend on each other.

• Definitions: Let X and Y be two random variables. The covariance of X and Y is denoted and given by
$$\mathrm{Cov}(X, Y) = E\left[ \left( X - E(X) \right)\left( Y - E(Y) \right) \right].$$
The correlation is given by
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

• Comment: The correlation is a measure of the degree to which large values of X tend to be associated with large values of Y. It tells us whether there is a relation between two random variables and in which direction that relationship goes.

• Properties:
1. Symmetry:
$$\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X), \qquad \rho(X, Y) = \rho(Y, X).$$
2. Linearity of covariance:
$$\mathrm{Cov}(aX_1 + bX_2 + c,\ Y) = a\,\mathrm{Cov}(X_1, Y) + b\,\mathrm{Cov}(X_2, Y).$$
3. Relation to variance:
$$\mathrm{Var}(X) = \mathrm{Cov}(X, X).$$
4. Variance of a sum:
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y).$$
Note that this formula is similar to (a + b)² = a² + b² + 2ab.

Reason:
$$\mathrm{Var}(X + Y) = \mathrm{Cov}(X + Y, X + Y) = \mathrm{Cov}(X, X + Y) + \mathrm{Cov}(Y, X + Y) = \mathrm{Cov}(X, X) + \mathrm{Cov}(X, Y) + \mathrm{Cov}(Y, X) + \mathrm{Cov}(Y, Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y).$$

• Examples:
1. Let X be a random variable with a variance of 2, and let Y be a random variable with a variance of 4. Let the covariance of X and Y be equal to 1. Find the variance of 3X + Y.
We compute the variance of 3X + Y as
$$\mathrm{Var}(3X + Y) = \mathrm{Var}(3X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(3X, Y) = 9\,\mathrm{Var}(X) + \mathrm{Var}(Y) + 6\,\mathrm{Cov}(X, Y) = 9 \cdot 2 + 4 + 6 \cdot 1 = 28.$$
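In code, the same calculation via Var(aX + Y) = a²Var(X) + Var(Y) + 2a Cov(X, Y):

```python
# Given quantities from the example.
var_x, var_y, cov_xy = 2.0, 4.0, 1.0
a = 3

# Variance of a*X + Y from the scaling and variance-of-a-sum rules.
var_sum = a**2 * var_x + var_y + 2 * a * cov_xy
```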

2. For a random variable X, set Y = aX + b for some a, b ∈ ℝ, where Var(X) ≠ 0. What is ρ(X, Y)?
We start by calculating the covariance
$$\mathrm{Cov}(X, Y) = \mathrm{Cov}(X, aX + b) = a\,\mathrm{Cov}(X, X) = a\,\mathrm{Var}(X).$$
Additionally, using Var(Y) = a²Var(X), we find
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{a\,\mathrm{Var}(X)}{\sqrt{\mathrm{Var}(X)\,a^2\,\mathrm{Var}(X)}} = \frac{a}{\sqrt{a^2}} = \begin{cases} 1 & \text{if } a > 0, \\ \text{undefined} & \text{if } a = 0, \\ -1 & \text{if } a < 0. \end{cases}$$

• Application: In investment management, the risk of a stock is often measured as the standard deviation of its return. Let us consider investments in Apple and Microsoft. The variance of the return of Apple's stock is 0.3, the variance of the return of Microsoft's stock is 0.2, and their covariance is 0.1. If you invest a fraction z of a certain amount in Apple's stock and 1 − z in Microsoft's stock, at what value of z is the risk of your investment minimized?

Let X be the return of Apple's stock and Y be the return of Microsoft's stock. From the question, we know
$$\mathrm{Var}(X) = 0.3, \quad \mathrm{Var}(Y) = 0.2, \quad \mathrm{Cov}(X, Y) = 0.1.$$
If we invest z in Apple's stock and 1 − z in Microsoft's stock, the risk is
$$\sqrt{\mathrm{Var}(zX + (1 - z)Y)},$$
which we want to minimize over z. Rather than minimizing the standard deviation, it is easier to minimize the variance, which gives the same minimizer. We compute
$$\mathrm{Var}(zX + (1 - z)Y) = \mathrm{Var}(zX) + \mathrm{Var}((1 - z)Y) + 2\,\mathrm{Cov}(zX, (1 - z)Y) = z^2\,\mathrm{Var}(X) + (1 - z)^2\,\mathrm{Var}(Y) + 2z(1 - z)\,\mathrm{Cov}(X, Y) = 0.3z^2 + 0.2(1 - z)^2 + 0.2z(1 - z).$$
This is a quadratic function in z, whose minimizer we can find by setting its derivative equal to zero, so that
$$0.6z - 0.4(1 - z) + 0.2(1 - 2z) = 0,$$
which is equivalent to
$$0.6z - 0.2 = 0,$$
hence
$$z = \frac{0.2}{0.6} = \frac{1}{3},$$
which means it is optimal to invest 1/3 in Apple and 2/3 in Microsoft.
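As a check on the calculus, a brute-force grid search over z in [0, 1] should land near z = 1/3; a sketch (the function and variable names are my own):

```python
def portfolio_variance(z, var_x=0.3, var_y=0.2, cov=0.1):
    """Variance of the portfolio return z*X + (1 - z)*Y."""
    return z**2 * var_x + (1 - z)**2 * var_y + 2 * z * (1 - z) * cov

# Evaluate the variance on a fine grid and take the minimizer.
zs = [i / 10_000 for i in range(10_001)]
best_z = min(zs, key=portfolio_variance)
```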
