
Random Variables and Probability Distributions

Applied Statistics

Erkin Diyarbakirlioglu

IAE Gustave Eiffel

February 6, 2022

Table of contents

1 Random variables and probability distributions

1.1 Random variables

1.2 Probability distributions

2 Expectation and variance

2.1 Expected value

2.2 Variance

1 Random variables and probability distributions

1.1 Random variables

1. A random variable 𝑋 is a real-valued function that assigns one and only one real number to
each element of the sample space Ω. Mathematically, 𝑋 is a mapping from Ω into a subset of ℝ as,

𝑋: Ω → ℝ (1)

Informally, a random variable is a numerical representation of the outcomes of a random
experiment: it assigns to each possible result of the experiment the numerical value that 𝑋 takes.
In general, capital letters like 𝑋, 𝑌 or 𝑍 represent a random variable, and lowercase letters like
𝑥, 𝑦, or 𝑧 represent a set of realizations (outcomes) of the variable.

Example 1. Let 𝑋 be the result of rolling a die once. Then, we can state that Ω𝑋 = {1, 2, … , 6} and
denote a specific realization of 𝑋 as 𝑥𝑖.

2. 𝑋 is said to be a discrete random variable if the set of outcomes is countable. 𝑋 is said to be a


continuous random variable if the set of outcomes is not countable. For example, if 𝑋 is defined as
the result of a die rolled once, then the outcomes 𝑥1 , 𝑥2 , … , 𝑥6 can be associated to the results 𝑥1 =
1, 𝑥2 = 2, … , 𝑥6 = 6. The outcomes are countable, so we say that 𝑋 is a discrete random variable.
Let 𝑌 be another random variable defined as the time a driver takes to finish a race. Theoretically, 𝑌
can take any real value within a reasonable interval and, consequently, it is not possible to count
how many outcomes the sample space of 𝑌 contains. Therefore, 𝑌 is a continuous random variable.1

1.2 Probability distributions

3. For any 𝑋, discrete or continuous, the cumulative distribution function (CDF) is defined as,

𝐹𝑋 (𝑥) = 𝑃(𝑋 ≤ 𝑥) (2)

The CDF is useful to calculate the probabilities of random events. Given the definition of the CDF,
the following properties should be noted,

a. 𝐹(𝑥) is monotonically non-decreasing; if 𝑥1 ≤ 𝑥2 then 𝐹(𝑥1 ) ≤ 𝐹(𝑥2 )

1 Many textbooks use the term "finite" as a substitute for "countable". Thus, one states that 𝑋 is a discrete
random variable if its sample space has a finite number of elements. Strictly speaking, a discrete random
variable may also have a countably infinite number of outcomes, as in Exercise 4 below.

b. lim_(𝑥→−∞) 𝐹(𝑥) = 0 and lim_(𝑥→∞) 𝐹(𝑥) = 1

c. 0 ≤ 𝐹(𝑥) ≤ 1
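These three properties can be checked empirically. The sketch below, which assumes NumPy is available, builds the empirical CDF of simulated fair-die rolls and verifies each property on it:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=10_000)  # 10,000 rolls of a fair die

def ecdf(x):
    """Empirical CDF: share of simulated rolls less than or equal to x."""
    return float(np.mean(rolls <= x))

F = [ecdf(x) for x in range(0, 8)]  # evaluate just outside and across the support

assert all(F[i] <= F[i + 1] for i in range(len(F) - 1))  # a. non-decreasing
assert F[0] == 0.0 and F[-1] == 1.0                      # b. 0 on the left, 1 on the right
assert all(0.0 <= f <= 1.0 for f in F)                   # c. bounded in [0, 1]
```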

4. If 𝐹𝑋 is continuous and strictly increasing, then its inverse function 𝐹 −1 (𝑞) is called the quantile
function of 𝑋, with 0 ≤ 𝑞 ≤ 1. Mathematically, the CDF and quantile functions are related as,

𝐹(𝑥) = 𝑞 ↔ 𝐹 −1 (𝑞) = 𝑥 (3)

Example 2. 𝑋 is a continuous random variable and 𝐹(𝑥 = 1.96) = 0.95. This implies that the
quantile function of 𝑋 evaluated at 0.95 is 𝐹 −1 (𝑞 = 0.95) = 1.96.

5. Some quantiles have specific names. For example, the 0.5 quantile is the median of 𝑋, and the 0.25
and 0.75 quantiles are the first and third quartiles, respectively. Quantiles are also expressed as
percentiles. For example, the 0.5 quantile of a random variable 𝑋 is also its 50th percentile.
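As a concrete illustration of eq. (3) and of named quantiles, the following sketch (assuming SciPy is installed) uses the standard normal distribution, whose CDF and quantile function are `norm.cdf` and `norm.ppf`:

```python
from scipy.stats import norm

# Round trip of eq. (3): F(x) = q  <=>  F^{-1}(q) = x
q = norm.cdf(1.2816)
assert abs(norm.ppf(q) - 1.2816) < 1e-9

# Named quantiles: the 0.5 quantile is the median (50th percentile) ...
assert abs(norm.ppf(0.5)) < 1e-12
# ... and the first and third quartiles are symmetric around it here
assert abs(norm.ppf(0.25) + norm.ppf(0.75)) < 1e-12
```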

6. The probability distribution function (PDF) of a random variable 𝑋 is a listing of all disjoint
outcomes of 𝑋 and their respective probabilities. Mathematically,

𝑓𝑋(𝑥) = { 𝑃(𝑋 = 𝑥) if 𝑋 discrete ; 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) if 𝑋 continuous }    (4)

where 𝑎 and 𝑏 are any two values in the domain of 𝑋.2 For a function 𝑓(𝑥) to be a valid PDF, it must
satisfy the following conditions:

a. 𝑓(𝑥) ≥ 0 for all 𝑥 ∈ ℝ



b. ∑𝑥 𝑓𝑋(𝑥) = 1 if 𝑋 is discrete and ∫_(−∞)^∞ 𝑓𝑋(𝑥)𝑑𝑥 = 1 if 𝑋 is continuous

Some sources use the term probability mass function to describe 𝑓(𝑥) when 𝑋 is a discrete random
variable and reserve the term probability density function for 𝑓(𝑥) when 𝑋 is continuous.

Example 3. Let 𝑋 be the result of rolling a die. Then, the PDF and CDF of 𝑋 can be tabulated as,

2 In fact, for a continuous random variable 𝑋, the probability assigned to any single value is 0 because
𝑃(𝑋 = 𝑥) = lim_(𝜖→0) 𝑃(𝑋 ∈ [𝑥, 𝑥 + 𝜖]) = lim_(𝜖→0) ∫_𝑥^(𝑥+𝜖) 𝑓𝑋(𝑢)𝑑𝑢 = 0
Therefore, we can only assign probabilities to intervals in the range of 𝑋, whence the definition 𝑓(𝑥) = 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏).

outcome, 𝑥𝑖 1 2 3 4 5 6
PDF: 𝑓(𝑥) = 𝑃(𝑋 = 𝑥𝑖 ) 1/6 1/6 1/6 1/6 1/6 1/6
CDF: 𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥𝑖 ) 1/6 2/6 3/6 4/6 5/6 6/6

Graphically, the two probability functions of 𝑋 can be visualized as follows,

[Figure: side-by-side bar charts over the outcomes 1-6. The PDF is flat at 1/6; the CDF steps up from 1/6 to 1 in increments of 1/6.]
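The table above can also be reproduced exactly with rational arithmetic; the short sketch below uses only the Python standard library:

```python
from fractions import Fraction
from itertools import accumulate

outcomes = [1, 2, 3, 4, 5, 6]
pdf = {x: Fraction(1, 6) for x in outcomes}          # f(x) = P(X = x)
cdf = dict(zip(outcomes, accumulate(pdf.values())))  # F(x) = P(X <= x)

assert sum(pdf.values()) == 1        # valid PDF: probabilities sum to 1
assert cdf[3] == Fraction(3, 6)      # P(X <= 3) = 3/6
assert cdf[6] == 1                   # F reaches 1 at the largest outcome
```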

Example 4. Let 𝑍 be a continuous random variable with the CDF and PDF shown below,

𝑍 can take any value from −∞ to ∞. Because the PDF of 𝑍 evaluated at any specific real value is
zero, we work with ranges of values when making probability calculations. For example, the result
of an operation such as 𝑃(𝑍 ≤ 1) can be illustrated using the CDF and PDF as follows,

On the CDF, the result 𝐹(𝑧 = 1) is shown by the vertical dashed line connecting the 𝑥-axis to the
function 𝐹(𝑧), while on the PDF it corresponds to the shaded area to the left of 𝑧 = 1.

7. The CDF and PDF of a random variable are two related functions. Specifically, if 𝑋 is continuous,

𝐹(𝑥) = ∫_(−∞)^𝑥 𝑓𝑋(𝑢)𝑑𝑢 ↔ 𝑓(𝑥) = 𝑑𝐹(𝑥)/𝑑𝑥    (5)

and if 𝑋 is discrete,

𝐹(𝑥) = ∑_(𝑥𝑖≤𝑥) 𝑓(𝑥𝑖) ↔ 𝑓(𝑥𝑖) = 𝐹(𝑥𝑖) − 𝐹(𝑥𝑖−1)    (6)

Example 5. Suppose a train arrives at a station every 20 minutes.3 The waiting time of a
randomly selected person is then a random variable 𝑋 that can be described as,

𝑓(𝑥) = { 𝑘 if 0 ≤ 𝑥 ≤ 20 ; 0 otherwise }

For 𝑓(𝑥) to be a valid PDF, it must satisfy ∫_0^20 𝑓(𝑥)𝑑𝑥 = 1. Thus, [𝑘𝑥]_0^20 = 20𝑘 = 1 and 𝑘 = 1/20,

𝑓(𝑥) = { 1/20 if 0 ≤ 𝑥 ≤ 20 ; 0 otherwise }

We can derive the CDF as 𝐹(𝑥) = ∫_0^𝑥 𝑓(𝑢)𝑑𝑢 = ∫_0^𝑥 (1/20)𝑑𝑢 = (1/20)[𝑢]_0^𝑥 = 𝑥/20.
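As a numerical cross-check of Example 5 (a sketch assuming SciPy is available): the density with 𝑘 = 1/20 integrates to 1, and the derived CDF returns 𝑥/20:

```python
from scipy.integrate import quad

k = 1 / 20
f = lambda x: k if 0 <= x <= 20 else 0.0   # the waiting-time density

total, _ = quad(f, 0, 20)
assert abs(total - 1.0) < 1e-9             # k = 1/20 makes f a valid PDF

F = lambda x: quad(f, 0, x)[0]             # F(x) = integral of f from 0 to x
assert abs(F(10) - 10 / 20) < 1e-9         # F(x) = x / 20
assert abs(F(15) - 15 / 20) < 1e-9
```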

2 Expectation and variance

2.1 Expected value

8. The mathematical expectation, or the expected value of a random variable 𝑋 is defined as,

𝐸(𝑋) = { ∑_𝑥 𝑥𝑓(𝑥) if 𝑋 discrete ; ∫_𝑥 𝑥𝑓(𝑥)𝑑𝑥 if 𝑋 continuous }    (7)

Thus, the expected value of 𝑋 is basically a probability-weighted average of all the possible values
𝑋 can take. The expected value of 𝑋 is also called the mean of 𝑋, and typically denoted as 𝜇𝑋 .

3 Source: Heumann et al. (2016, 130)

Example 6. Let 𝑋 be the result of a die rolled one time. So, 𝑋 can take integers from 1 to 6 each with
probability 𝑓(𝑥𝑖 ) = 𝑃(𝑋 = 𝑥𝑖 ) = 1⁄6. Then 𝜇𝑋 can be calculated as,

𝜇𝑋 = ∑_(𝑖=1)^6 𝑥𝑖𝑓(𝑥𝑖) = (1 × 1/6) + (2 × 1/6) + ⋯ + (6 × 1/6) = 3.5

Example 7. Consider again the "waiting time for a train" example. We established that,

𝑓(𝑥) = { 1/20 if 0 ≤ 𝑥 ≤ 20 ; 0 otherwise }

We can calculate 𝐸(𝑋) as follows,

𝐸(𝑋) = ∫_(−∞)^∞ 𝑥𝑓(𝑥)𝑑𝑥 = ∫_(−∞)^0 𝑥𝑓(𝑥)𝑑𝑥 + ∫_0^20 𝑥𝑓(𝑥)𝑑𝑥 + ∫_20^∞ 𝑥𝑓(𝑥)𝑑𝑥

𝐸(𝑋) = 0 + ∫_0^20 𝑥(1/20)𝑑𝑥 + 0 = [𝑥²/40]_0^20 = 400/40 − 0 = 10

The "average" waiting time for the train is 10 minutes.
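This integral can be verified numerically; the following sketch assumes SciPy is available:

```python
from scipy.integrate import quad

f = lambda x: 1 / 20 if 0 <= x <= 20 else 0.0   # waiting-time density
mean, _ = quad(lambda x: x * f(x), 0, 20)       # E(X) = integral of x f(x) dx
assert abs(mean - 10.0) < 1e-9                  # the "average" wait is 10 minutes
```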

2.2 Variance

9. The variance of 𝑋 measures the extent to which the different values that 𝑋 may take fall apart
from the mean. In other words, whereas the expected value, or mean, is a measure of the
central tendency of 𝑋, the variance is a measure of its dispersion.4 Mathematically,

𝑉𝑎𝑟(𝑋) = 𝐸[(𝑋 − 𝐸(𝑋))²] = { ∑_𝑖 (𝑥𝑖 − 𝜇)²𝑓(𝑥𝑖) if 𝑋 discrete ; ∫_(−∞)^∞ (𝑥 − 𝜇)²𝑓(𝑥)𝑑𝑥 if 𝑋 continuous }    (8)

So, the variance measures the average square of the difference between 𝑋 and its mean 𝜇. In
general, we denote the variance of a random variable as 𝑉𝑎𝑟(𝑋) = 𝜎𝑋2 .

4 By central tendency, it is meant a specific value around which the realizations of 𝑋 tend to cluster.

Example 8. Consider the discrete random variable 𝑋 that shows the result of a six-sided die. We
have already calculated 𝐸(𝑋) = 𝜇 = 3.5. Then, the variance of 𝑋 is,

𝑉𝑎𝑟(𝑋) = ∑_(𝑖=1)^6 (𝑥𝑖 − 𝜇)²𝑓(𝑥𝑖) = (1 − 3.5)² × 1/6 + ⋯ + (6 − 3.5)² × 1/6 = 2.91

Example 9. Consider the continuous random variable 𝑋 that shows the waiting time for a train.
Given 𝐸(𝑋) = 10, we can calculate the variance of 𝑋 as follows,


𝑉𝑎𝑟(𝑋) = ∫_(−∞)^∞ (𝑥 − 𝜇)²𝑓(𝑥)𝑑𝑥

𝑉𝑎𝑟(𝑋) = ∫_(−∞)^0 (𝑥 − 10)²𝑓(𝑥)𝑑𝑥 + ∫_0^20 (𝑥 − 10)²𝑓(𝑥)𝑑𝑥 + ∫_20^∞ (𝑥 − 10)²𝑓(𝑥)𝑑𝑥

𝑉𝑎𝑟(𝑋) = 0 + ∫_0^20 (𝑥 − 10)²(1/20)𝑑𝑥 + 0 = ∫_0^20 (1/20)(𝑥² − 20𝑥 + 100)𝑑𝑥

𝑉𝑎𝑟(𝑋) = [(1/20)(𝑥³/3 − 10𝑥² + 100𝑥)]_0^20 = 33.3333 …

10. The positive square root of the variance is called the standard deviation, 𝜎𝑋 = √𝑉𝑎𝑟(𝑋). In
most applications, the standard deviation rather than the variance is used to quantify the
dispersion of a random variable.

11. 𝑉𝑎𝑟(𝑋) can also be expressed as 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋²) − 𝐸(𝑋)². In fact,

𝑉𝑎𝑟(𝑋) = 𝐸[(𝑋 − 𝐸(𝑋))²] = 𝐸[𝑋² − 2𝑋𝐸(𝑋) + 𝐸(𝑋)²]

𝑉𝑎𝑟(𝑋) = 𝐸(𝑋²) − 2𝐸(𝑋𝐸(𝑋)) + 𝐸(𝑋)² = 𝐸(𝑋²) − 2𝐸(𝑋)𝐸(𝑋) + 𝐸(𝑋)²

𝑉𝑎𝑟(𝑋) = 𝐸(𝑋²) − 𝐸(𝑋)²    (9)
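The identity of eq. (9) can be confirmed exactly for the fair die using rational arithmetic (standard library only):

```python
from fractions import Fraction

outcomes = range(1, 7)
p = Fraction(1, 6)                                    # fair die

EX = sum(x * p for x in outcomes)                     # E(X)
EX2 = sum(x * x * p for x in outcomes)                # E(X^2)
var_definition = sum((x - EX) ** 2 * p for x in outcomes)
var_shortcut = EX2 - EX ** 2                          # eq. (9)

assert EX == Fraction(7, 2)                           # mean is 3.5
assert var_definition == var_shortcut == Fraction(35, 12)   # about 2.9167
```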

12. Let 𝑋1 , … , 𝑋𝑛 be a sequence of 𝑛 random variables and 𝑎1 , … 𝑎𝑛 constants. Define,

𝑌 = 𝑎1 𝑋1 + ⋯ + 𝑎𝑛 𝑋𝑛 (10)

i.e. a linear combination of 𝑋1 , … , 𝑋𝑛 with 𝑎𝑖 as the weight associated to each 𝑋𝑖 . Then, the following
useful rules for calculating 𝐸(𝑌) and 𝑉𝑎𝑟(𝑌) apply to such linear combinations:

a. 𝐸(𝑌) = 𝑎1 𝐸(𝑋1 ) + ⋯ + 𝑎𝑛 𝐸(𝑋𝑛 )
b. 𝑉𝑎𝑟(𝑌) = 𝑎1²𝑉𝑎𝑟(𝑋1) + ⋯ + 𝑎𝑛²𝑉𝑎𝑟(𝑋𝑛) if the 𝑋𝑖 are pairwise independent, i.e.,
𝐶𝑜𝑣(𝑋𝑖, 𝑋𝑗) = 0 for 𝑖 ≠ 𝑗.

Example 10. It has been previously established that if 𝑋 represents the result of a fair six-sided die,
then 𝐸(𝑋) = 3.5 and 𝑉𝑎𝑟(𝑋) = 2.91. Suppose you roll the same die 100 times and calculate the sum
of the results, say 𝑊. 𝑊 can be modeled as a linear combination 𝑊 = 𝑋1 + ⋯ + 𝑋100, where 𝑋𝑖
shows the 𝑖th roll. Because 𝐸(𝑋𝑖) = 3.5 and 𝑉𝑎𝑟(𝑋𝑖) = 2.91 for each 𝑖 = 1, … , 100, it follows that

𝐸(𝑊) = 𝐸(𝑋1) + ⋯ + 𝐸(𝑋100) = 100 × 3.5 = 350

𝑉𝑎𝑟(𝑊) = 𝑉𝑎𝑟(𝑋1) + ⋯ + 𝑉𝑎𝑟(𝑋100) = 100 × 2.91 = 291
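A Monte Carlo sketch of Example 10 (assuming NumPy is available): simulating the 100-roll sum many times should reproduce 𝐸(𝑊) = 100 × 3.5 and 𝑉𝑎𝑟(𝑊) = 100 × 2.91:

```python
import numpy as np

rng = np.random.default_rng(42)
# 50,000 simulated games, each summing 100 fair-die rolls
sums = rng.integers(1, 7, size=(50_000, 100)).sum(axis=1)

assert abs(sums.mean() - 350) < 1             # E(W) = 100 x 3.5 = 350
assert abs(sums.var() - 100 * 35 / 12) < 10   # Var(W) = 100 x 2.9167, about 291.7
```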

References

Textbooks and articles:

Heumann, Christian, Michael Schomaker, and H.C. Shalabh. 2016. Introduction to Statistics and
Data Analysis. Springer.
Ross, Sheldon M. 1999. An Introduction to Mathematical Finance. Cambridge University Press.

End-of-chapter exercises: Random Variables and Probability Distributions

Exercise 1. Random variables

𝑋 is a random variable that shows the result of a fair coin flipped twice. (1) Let Ω be the sample
space of 𝑋. Write down the elements of Ω. (2) Tabulate the probability distribution function of 𝑋.

Exercise 2. Random variables

𝑌 is defined as the result of rolling two six-sided dice. Write down the PDF and CDF of 𝑌.

Exercise 3. Random variables

The game of roulette involves spinning a wheel with 38 slots: 18 red, 18 black and 2 green. A ball
is spun onto the wheel and will eventually land in a slot, where each slot has an equal chance of
capturing the ball. Write down the PDF of this game.

Exercise 4. Random variables

A fair coin is tossed until "heads" is obtained. Let 𝑋 be the random variable that shows the number
of tails before the first head shows up, and 𝑝 the probability of getting heads in any given trial.
Write down a generic formula for the probability of getting the first heads at the 𝑥th trial.

Exercise 5. Probability distributions

Let 𝑋 be a random variable that shows the result of one six-sided die rolled one time. Evaluate the
following functions: (1) 𝑓(𝑥 = 2); (2) 𝑓(3 ≤ 𝑥 < 5); (3) 𝐹(𝑥 = 3); (4) 𝐹(𝑥 ≤ 6).

Exercise 6. Probability distributions

Let 𝑋 be a discrete random variable. Show that 𝑓(𝑥) = (1 − 𝑝)^𝑥 𝑝, with 0 < 𝑝 ≤ 1, is a valid
probability distribution function.

Exercise 7. Probability distributions

Show that the function 𝑓(𝑥) = (𝑥 + 2)⁄25 is a valid discrete probability mass function over the
domain of 𝑋 ∈ {1, 2, 3, 4, 5}.

Exercise 8. Probability distributions

Let 𝑓(𝑥) = 𝑐𝑥 2 for 𝑥 = 1, 2 and 3. Find 𝑐 such that 𝑓(𝑥) is a valid probability mass function.

Exercise 9. Probability distributions

Let 𝑓(𝑥) = 𝑐(1⁄4)𝑥 for 𝑥 = 1, 2, … Find 𝑐 such that 𝑓(𝑥) is a valid probability mass function.

Exercise 10. Probability distributions

𝑋 is a continuous random variable with the following CDF,

𝐹(𝑥) = { 0 if 𝑥 < 2 ; −(1/4)𝑥² + 2𝑥 − 3 if 2 ≤ 𝑥 ≤ 4 ; 1 if 𝑥 > 4 }

(1) What is the PDF of 𝑋? (2) Calculate 𝑃(𝑋 < 3).

Exercise 11. Probability distributions

A winemaker experiments with new grapes and adds a new wine to his stock. The percentage sold
by the end of the season depends on the weather and various other factors. It can be modelled
using the random variable with the following CDF,

𝐹(𝑥) = { 0 if 𝑥 < 0 ; 3𝑥² − 2𝑥³ if 0 ≤ 𝑥 ≤ 1 ; 1 if 𝑥 > 1 }

(1) Determine the PDF of 𝑋. (2) What is the probability that he sells at least one-third but no
more than two-thirds of his wine?

Exercise 12. Probability distributions


Suppose that 𝑋 shows the waiting time for a train at a station. The CDF of 𝑋 is 𝐹(𝑥) = 𝑥/20. What
is the probability of waiting between 15 and 20 minutes for the next train?

Exercise 13. Probability distributions and expectation

Let 𝑋 be defined as a random variable that shows the outcome of one six-sided die rolled once.
The sample space of 𝑋 and their respective probabilities are shown below.

𝑥𝑖 1 2 3 4 5 6
𝑃(𝑋 = 𝑥𝑖 ) 1/6 1/6 1/6 1/6 1/6 1/6
Calculate 𝑃(𝑋 ≤ 3). Calculate the mean of 𝑋.

Exercise 14. Probability distributions and expectation

A quality index summarizes different features of a product. Experts may assign different quality
scores depending on their experience with the product. Let 𝑋 be the quality index for a graphic
tablet. The PDF of 𝑋 is given below,

𝑓(𝑥) = { 𝑐𝑥(2 − 𝑥) if 0 ≤ 𝑥 ≤ 2 ; 0 elsewhere }

(1) Determine 𝑐 such that 𝑓(𝑥) is a valid PDF. (2) Determine the CDF of 𝑋. (3) Calculate 𝐸(𝑋) and
𝑉𝑎𝑟(𝑋).

Exercise 15. Probability distributions and expectation

The bar graph below shows the PDF of a discrete random variable 𝑋.

[Figure: bar graph of the PDF, with 𝑓(−2) = 0.3, 𝑓(0) = 0.4, 𝑓(1) = 0.2 and 𝑓(2) = 0.1.]

(1) Using the graph, show that 𝑓(𝑥) is a probability distribution function. (2) Evaluate the following
functions: 𝑓(𝑥 = 0), 𝑓(𝑥 ≤ 1), 𝐹(𝑥 = 2). Note that 𝐹(⋅) refers to the cumulative distribution
function of 𝑋. (3) What is the mean of 𝑋?

Exercise 16. Expectation

Two books are assigned for a statistics class: a textbook and its corresponding study guide. The
university bookstore determined 20% of enrolled students do not buy either book, 55% buy the
textbook only, and 25% buy both books. These percentages are relatively constant from one year
to another. If there are 1,000 students enrolled, how many books should the bookstore expect to
sell to this class? The textbook costs $137 and the study guide $33. How much revenue should the
bookstore expect from this class of 1,000 students?

Exercise 17. Expectation and variance

An editor states that 70% of all the books he has edited so far contain no typos, while 20% of
them contain 1 typo and 10% contain 2 typos. He assumes that the share of books containing
more than 2 typos is negligible. What is the mean number of typos in books edited by this
editor? What is the variance of the number of typos?

Exercise 18. Expectation

Four buses carrying 152 students from the same school arrive at a stadium. The buses carry,
respectively, 39, 33, 46 and 34 students. One of the 152 students is randomly chosen. Let 𝑋 denote
the number of students who were on the bus of the selected student. One of the four bus drivers
is also randomly chosen. Let 𝑌 be the number of students who were on that driver's bus. Calculate
𝐸(𝑋) and 𝐸(𝑌). Why is 𝐸(𝑋) larger than 𝐸(𝑌)?

Exercise 19. Expectation

A gambling book recommends the following strategy for the game of roulette: bet $1 on red. If red
appears, which occurs with probability 18/38, take the $1 profit and quit. If the bet is lost, make a
second bet of twice the initial size ($2) on red and then quit regardless of the outcome. Let 𝑋
denote the gambler's winnings. What is 𝐸(𝑋)?

Exercise 20. Expectation and variance

The game of roulette involves spinning a wheel with 38 slots: 18 red, 18 black and 2 green. A ball
is spun onto the wheel and will eventually land in a slot, where each slot has an equal chance of
capturing the ball. Gamblers can place bets on black or red slots. If the ball lands on their color,
the payoff is double the money they bet. If the ball lands on another color, then they lose their
money. Suppose you bet $1 on red. What is the expected value and standard deviation of your
winnings?

Exercise 21. Expectation

Let 𝑋1 , … , 𝑋𝑛 be a sequence of independent random variables, all having the same expected value
𝜇 and variance 𝜎 2 . Define the random variable 𝑋̅ as the arithmetic average of these variables,
called the sample mean, given by,
𝑛
1
𝑋̅ = ∑ 𝑋𝑖
𝑛
𝑖=1

Show that 𝐸(𝑋̅) = 𝜇 and 𝑉𝑎𝑟(𝑋̅) = 𝜎 2 ⁄𝑛.

Exercise 22. Expected value and variance

A chance game involving rolling a six-sided die twice has the following outcomes:

outcome            1st die = 2nd die    1st + 2nd die even    1st + 2nd die odd
for each $1 bet    win $1 × 4           nothing               lose $1 × 2

For example, if a player bets $10 and the results of the two rolls are the same, the player wins
four times his initial bet. So, his profit will be (4 × $10) − $10 = $30, that is, the payoff minus the
initial bet, which cannot be recovered. Calculate the expected value of the profit or loss (P&L) for a
player who bets $20 in this game. What is the standard deviation of the P&L?

Exercise 23. Expected value and variance

A lawyer must decide whether to charge a fixed fee of $5,000 or take a contingency fee of $25,000
if she wins the case (and $0 if she loses). She thinks that her probability of winning the case is
30%. Determine the mean and standard deviation of her fee if (1) she takes the fixed fee; (2) she
takes the contingency fee.

Exercise 24. Combinations of random variables

Mickey has invested 60% of his portfolio in Facebook stocks and 40% in Amazon stocks. Let 𝑅𝐹
be the percentage change in Facebook stock price next month and 𝑅𝐴 the percentage change in
Amazon stock price next month. Mickey estimated the mean percentage price change and the
standard deviation for both stocks as follows:

            mean      st. deviation
Facebook    0.0045    0.0846
Amazon      0.0035    0.0519
Calculate the expected change and standard deviation in Mickey's portfolio for next month if the
price changes of Facebook and Amazon stocks are independent random variables.

Exercise 25. Combinations of random variables

An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose
54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12%
have two pieces. We suppose a negligible portion of people checks more than two bags. (1) Write
a probability model for the revenues per passenger. What is the expected revenue per passenger?
Standard deviation? (2) About how much revenue should the airline expect for a flight of 120
passengers? With what standard deviation? What assumption do you make in your calculations?

Exercise 26. Combinations of random variables

Jimmy travels to work four days a week, except on Wednesdays. We will use 𝑋1 to represent his
travel time on Monday, 𝑋2 to represent his travel time on Tuesday, and so on. (1) Write an
equation that represents his travel time for the week. (2) It takes Jimmy an average of 45 minutes

on Mondays and Fridays, and 40 minutes on Tuesdays and Thursdays. What is his average
commute time for the week? (3) Suppose Jimmy’s daily commute has a standard deviation of 5
minutes for each day of the week. What is the standard deviation of his total weekly travel time to
work?

Solutions to end-of-chapter exercises: Random Variables and Probability Distributions

Exercise 1. Random variables

Ω𝑋 = {(𝐻, 𝐻), (𝐻, 𝑇), (𝑇, 𝐻), (𝑇, 𝑇)}. The PMF of 𝑋 can be tabulated as,

𝑥𝑖 𝐻, 𝐻 𝐻, 𝑇 𝑇, 𝐻 𝑇, 𝑇
𝑓(𝑥𝑖 ) 0.25 0.25 0.25 0.25

Exercise 2. Random variables

Exercise 3. Random variables

The PDF of the game can be written as follows,

𝑥𝑖 red black green


𝑓(𝑥𝑖 ) 18⁄38 18⁄38 2⁄38

Exercise 4. Random variables

If the first head appears at the 𝑥th trial, the preceding 𝑥 − 1 trials all turned up tails. Since the
trials are independent, the probability of getting the first heads at the 𝑥th trial is (1 − 𝑝)^(𝑥−1) 𝑝.
Equivalently, in terms of 𝑋, the number of tails before the first head, 𝑃(𝑋 = 𝑥) = (1 − 𝑝)^𝑥 𝑝.

Exercise 5. Probability distributions

(1) …

Exercise 6. Probability distributions

If 𝑓(𝑥) is a valid probability function, then ∑𝑥 𝑓(𝑥) = 1. Summing the first 𝑛 + 1 terms gives
∑_(𝑥=0)^𝑛 𝑓(𝑥) = (1 − 𝑝)^0 𝑝 + (1 − 𝑝)^1 𝑝 + ⋯ + (1 − 𝑝)^𝑛 𝑝 = 𝑝(1 + (1 − 𝑝) + (1 − 𝑝)² + ⋯ + (1 − 𝑝)^𝑛).
Recall the generic geometric series with 𝑛 + 1 terms, 𝑠𝑛 = 1 + 𝑢 + 𝑢² + ⋯ + 𝑢^𝑛. Multiplying 𝑠𝑛 by 𝑢
and subtracting the result from 𝑠𝑛, we obtain 𝑠𝑛 = (1 − 𝑢^(𝑛+1))/(1 − 𝑢). Substituting 𝑢 = 1 − 𝑝 and
noting that 0 ≤ 1 − 𝑝 < 1, so that (1 − 𝑝)^(𝑛+1) → 0 as 𝑛 → ∞, we obtain
∑_(𝑥=0)^∞ 𝑓(𝑥) = 𝑝 × lim_(𝑛→∞) (1 − (1 − 𝑝)^(𝑛+1))/𝑝 = 𝑝 × (1/𝑝) = 1. Since 𝑓(𝑥) ≥ 0 for all 𝑥 as well,
𝑓 is a valid probability distribution function.
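A quick numerical sanity check of this result: partial sums of 𝑓(𝑥) = (1 − 𝑝)^𝑥 𝑝 approach 1 for any 0 < 𝑝 ≤ 1 (plain Python, no external libraries):

```python
# Sum the first 1000 terms of the geometric PMF for several values of p;
# the remainder (1 - p)^1000 is negligible for each.
for p in (0.1, 0.5, 0.9):
    partial = sum((1 - p) ** x * p for x in range(1000))
    assert abs(partial - 1.0) < 1e-12
```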

Exercise 7. Probability distributions

We must check that 𝑓(𝑥𝑖) ≥ 0 and ∑𝑓(𝑥𝑖) = 1. The first condition holds because (𝑥 + 2)/25 > 0
for every 𝑥 ∈ {1, 2, 3, 4, 5}. For the second, 𝑓(1) + ⋯ + 𝑓(5) = (3 + 4 + 5 + 6 + 7)/25 = 25/25 = 1.

Exercise 8. Probability distributions

𝑓(𝑥) is a valid PMF if ∑𝑥 𝑓(𝑥𝑖) = 1 and 𝑓(𝑥𝑖) ≥ 0. Here, 𝑐 ⋅ 1² + 𝑐 ⋅ 2² + 𝑐 ⋅ 3² = 𝑐(1 + 4 + 9) = 1 →
𝑐 = 1/14. In addition, 𝑓(𝑥𝑖) ≥ 0 for all 𝑥𝑖.

Exercise 9. Probability distributions

We must solve for 𝑐 in 𝑐(1/4 + 1/4² + 1/4³ + ⋯) = 1. The series in parentheses is a geometric
series with ratio 1/4 and converges to (1/4)/(1 − 1/4) = 1/3. Therefore, 𝑐 = 3.

Exercise 10. Probability distributions

Source: Heumann et al. (2016, 150). 𝑓(𝑥) = 𝑑𝐹(𝑥)⁄𝑑𝑥 → 𝑓(𝑥) = 0 if 𝑥 < 2, 𝑓(𝑥) = −0.5𝑥 + 2 if
2 ≤ 𝑥 ≤ 4 and 𝑓(𝑥) = 0 if 𝑥 > 4. 𝑃(𝑋 < 3) = 𝑃(𝑋 ≤ 3) − 𝑃(𝑋 = 3). 𝑋 is continuous, so
𝑃(𝑋 = 3) = 0. Using the CDF, 𝐹(3) = 𝑃(𝑋 ≤ 3) = 3⁄4.

Exercise 11. Probability distributions

Source: Heumann et al. (2016, 150). (1) 𝑓(𝑥) = 𝑑𝐹(𝑥)/𝑑𝑥 = 6(𝑥 − 𝑥²) if 0 ≤ 𝑥 ≤ 1 and 0
elsewhere. (2) 𝑃(1/3 ≤ 𝑋 ≤ 2/3) = 𝐹(2/3) − 𝐹(1/3) = 20/27 − 7/27 = 13/27 ≈ 0.4815.

Exercise 12. Probability distributions

𝐹(20) − 𝐹(15) = (1⁄20) × 20 − (1⁄20) × 15 = 0.25.

Exercise 13. Probability distributions and expectation

𝑃(𝑋 ≤ 3) = 3⁄6. 𝐸(𝑋) = 3.5.

Exercise 14. Probability distributions and expectation

Source: Heumann et al. (2016, 150). 𝑓(𝑥) is a valid PDF if

1 = ∫_(−∞)^∞ 𝑓(𝑥)𝑑𝑥 = ∫_(−∞)^0 𝑓(𝑥)𝑑𝑥 + ∫_0^2 𝑓(𝑥)𝑑𝑥 + ∫_2^∞ 𝑓(𝑥)𝑑𝑥

1 = 0 + ∫_0^2 𝑐(2𝑥 − 𝑥²)𝑑𝑥 + 0 = 𝑐[𝑥² − 𝑥³/3]_0^2 = 𝑐(2² − 2³/3)

Solving for 𝑐 we get 𝑐 = 3/4. In addition, 𝑓(𝑥) = (3/4)(2𝑥 − 𝑥²) ≥ 0 for all 𝑥 ∈ [0, 2]. The CDF is

𝐹(𝑥) = ∫_0^𝑥 𝑓(𝑢)𝑑𝑢 = (3/4)∫_0^𝑥 (2𝑢 − 𝑢²)𝑑𝑢 = (3/4)[𝑢² − 𝑢³/3]_0^𝑥 = (3/4)(𝑥² − 𝑥³/3)

The expected value of 𝑋 is

𝐸(𝑋) = ∫_(−∞)^∞ 𝑥𝑓(𝑥)𝑑𝑥 = ∫_(−∞)^0 𝑥𝑓(𝑥)𝑑𝑥 + ∫_0^2 𝑥𝑓(𝑥)𝑑𝑥 + ∫_2^∞ 𝑥𝑓(𝑥)𝑑𝑥

𝐸(𝑋) = 0 + ∫_0^2 𝑥 ⋅ (3/4)(2𝑥 − 𝑥²)𝑑𝑥 + 0 = (3/4)∫_0^2 (2𝑥² − 𝑥³)𝑑𝑥

𝐸(𝑋) = (3/4)[2𝑥³/3 − 𝑥⁴/4]_0^2 = (3/4)(16/3 − 16/4) = 1

To calculate the variance, we will make use of the formula 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋 2 ) − 𝐸(𝑋)2 . So, we first
take the integral,

𝐸(𝑋²) = ∫_0^2 𝑥²𝑓(𝑥)𝑑𝑥 = (3/4)∫_0^2 (2𝑥³ − 𝑥⁴)𝑑𝑥 = (3/4)[𝑥⁴/2 − 𝑥⁵/5]_0^2 = (3/4)(2⁴/2 − 2⁵/5) = 6/5

Then, 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋²) − 𝐸(𝑋)² = 6/5 − 1² = 1/5.
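These three results (𝑐 = 3/4, 𝐸(𝑋) = 1, 𝑉𝑎𝑟(𝑋) = 1/5) can be cross-checked numerically; the sketch below assumes SciPy is available:

```python
from scipy.integrate import quad

c = 3 / 4
f = lambda x: c * x * (2 - x) if 0 <= x <= 2 else 0.0   # quality-index density

total, _ = quad(f, 0, 2)
mean, _ = quad(lambda x: x * f(x), 0, 2)
ex2, _ = quad(lambda x: x * x * f(x), 0, 2)

assert abs(total - 1.0) < 1e-9              # c = 3/4 gives a valid PDF
assert abs(mean - 1.0) < 1e-9               # E(X) = 1
assert abs(ex2 - mean ** 2 - 0.2) < 1e-9    # Var(X) = E(X^2) - E(X)^2 = 1/5
```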

Exercise 15. Probability distributions and expectation

𝑓(𝑥) is a valid PDF because 𝑓(𝑥𝑖) ≥ 0 for all 𝑥𝑖 and ∑𝑥 𝑓(𝑥𝑖) = 1. 𝑓(0) = 0.4. 𝑓(𝑥 ≤ 1) = 𝑓(−2) +
𝑓(0) + 𝑓(1) = 0.3 + 0.4 + 0.2 = 0.9. 𝐹(2) = 𝑃(𝑋 ≤ 2) = 1. 𝐸(𝑋) = −2 × 0.3 + 0 × 0.4 + 1 ×
0.2 + 2 × 0.1 = −0.2.

Exercise 16. Expectation

𝐸(𝑋) = 20% × 0 + 55% × 1 + 25% × 2 = 1.05 book per student. Multiplying by the number of
students, we can expect the bookstore to sell 1000 × 1.05 = 1,050 books for this class. The
expected revenue is 𝐸(𝑋) = 55% × $137 + 25% × ($137 + $33) = $117.85 per student.
Aggregating, we get 1000 × $117.85 = $117,850.

Exercise 17. Expectation and variance

𝐸(𝑋) = 0.7 × 0 + 0.2 × 1 + 0.1 × 2 = 0.4 error. 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋 2 ) − 𝐸(𝑋)2 = 0.6 − 0.16 = 0.44.

Exercise 18. Expectation

Source: Ross (1999, 17). 𝑋 can take the values Ω𝑋 = {39, 33, 46, 34} with probabilities
𝑃(𝑋 = 39) = 39⁄152, 𝑃(𝑋 = 33) = 33⁄152, 𝑃(𝑋 = 46) = 46⁄152 and 𝑃(𝑋 = 34) = 34⁄152. We
calculate 𝐸(𝑋) = 38.69. 𝑌 can also take the same values as 𝑋, however each bus is equally likely
to be chosen, so 𝑃(𝑌 = 39) = ⋯ = 𝑃(𝑌 = 34) = 0.25. We calculate 𝐸(𝑌) = 38. 𝐸(𝑋) is larger than
𝐸(𝑌) because, when calculating the mean of 𝑋, we assign greater weight to the fuller buses: a
randomly chosen student is more likely to have been on the bus carrying 46 students than a
randomly chosen driver is to be driving it.

Exercise 19. Expectation

Source: Ross (1999, 17). Let's tabulate the outcomes of the strategy sequentially, together with
the probability, total stake, payoff, and profit of each one,

outcomes       red       not red, red          not red, not red
probability    18/38     (20/38) × (18/38)     (20/38) × (20/38)
total stake    $1        $3                    $3
payoff         $2        $4                    $0
profit         $1        $1                    −$3

If the first bet wins, the profit is $1. If it loses and the $2 second bet wins, the gambler collects
$4 against the $3 staked, again a profit of $1. If both bets lose, he loses $3. We calculate
𝐸(𝑋) = $1 × (18/38) + $1 × (20/38)(18/38) − $3 × (20/38)² ≈ −$0.108: despite winning $1 most
of the time, the strategy loses money on average.

Exercise 20. Expectation and variance

Suppose that a player places a bet on the red slot. The probability model for the game can be
described as follows,

outcome        red       black     green
probability    18/38     18/38     2/38
payoff         $2        $0        $0
profit         $1        −$1       −$1

𝐸(𝑝𝑟𝑜𝑓𝑖𝑡) = (18/38) × $1 + ((18 + 2)/38) × (−$1) = −$0.0526. 𝑉𝑎𝑟(𝑝𝑟𝑜𝑓𝑖𝑡) = 𝐸(𝑝𝑟𝑜𝑓𝑖𝑡²) −
𝐸(𝑝𝑟𝑜𝑓𝑖𝑡)² = 1 − (−0.0526)² = 0.9972. The standard deviation is $0.9986.
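The expectation and standard deviation can also be computed exactly from the probability table with rational arithmetic (standard library only):

```python
from fractions import Fraction

# (probability, profit) pairs for a $1 bet on red
table = [(Fraction(18, 38), 1), (Fraction(20, 38), -1)]

E = sum(p * w for p, w in table)                  # expected profit
E2 = sum(p * w * w for p, w in table)             # E(profit^2)
var = E2 - E ** 2                                 # Var = E(X^2) - E(X)^2

assert abs(float(E) - (-0.0526)) < 1e-3           # expected loss of about 5.3 cents
assert abs(float(var) ** 0.5 - 0.9986) < 1e-3     # standard deviation about $0.9986
```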

Exercise 21. Expectation

𝐸(𝑋̅) = 𝐸((1/𝑛)(𝑋1 + ⋯ + 𝑋𝑛)) = (1/𝑛)(𝐸(𝑋1) + ⋯ + 𝐸(𝑋𝑛)) = (1/𝑛)(𝜇 + ⋯ + 𝜇) = (1/𝑛)𝑛𝜇 = 𝜇.
The variance can be rewritten as 𝑉𝑎𝑟(𝑋̅) = 𝑉𝑎𝑟((1/𝑛)(𝑋1 + ⋯ + 𝑋𝑛)) = (1/𝑛)²(𝑉𝑎𝑟(𝑋1) + ⋯ +
𝑉𝑎𝑟(𝑋𝑛)), using the independence of the 𝑋𝑖. Because 𝑉𝑎𝑟(𝑋𝑖) = 𝜎², we get 𝑉𝑎𝑟(𝑋̅) = (1/𝑛²)𝑛𝜎² = 𝜎²/𝑛.
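A simulation sketch of this result (assuming NumPy is available), with illustrative values 𝜇 = 5, 𝜎 = 2 and 𝑛 = 25 chosen purely for the example:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n = 5.0, 2.0, 25          # illustrative values, not from the exercise
# 100,000 sample means, each averaging n i.i.d. normal draws
xbars = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

assert abs(xbars.mean() - mu) < 0.01                 # E(X-bar) = mu
assert abs(xbars.var() - sigma ** 2 / n) < 0.01      # Var(X-bar) = sigma^2 / n
```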

Exercise 22. Expected value and variance

For a $20 bet, the payoff profile will be (1) (4 × $20) − $20 = $60 with probability 6⁄36; (2) −$20
with probability 12⁄36, and (3) (−2 × $20) − $20 = −$60 with probability 18⁄36. So, 𝐸(𝑋) =
−$26.67, 𝜎𝑋 = $42.69.

Exercise 23. Expected value and variance

If she takes the fixed fee, the mean is $5,000 and the standard deviation is 0 because the fee is
received with certainty. If she takes the contingency fee with a 30% probability of winning the case,
then 𝐸(𝑋) = 30% × $25,000 + 70% × $0 = $7,500, with a standard deviation of $11,456.

Exercise 24. Combinations of random variables

The expected change of the portfolio is 𝐸(𝑅𝑃 ) = 60% × 0.0045 + 40% × 0.0035 = 0.0041. The
variance of the change in portfolio value is 𝑉𝑎𝑟(𝑅𝑃 ) = 0.62 × 0.08462 + 0.42 × 0.05192 = 0.003.
The standard deviation is then 𝜎𝑅 = 0.0548.

Exercise 25. Combinations of random variables

Let's tabulate the data,

nb of checked luggage 0 1 2
proportion of passengers 54% 34% 12%
fee $0 $25 $25+$35
The expected fee per passenger is 𝐸(𝑋) = 54% × $0 + 34% × $25 + 12% × ($25 + $35) = $15.70.
The variance is 398.01 and 𝜎 = $19.95. On a flight with 120 passengers, the airline can expect
total baggage revenue of 120 × 𝐸(𝑋) = $1,884 with variance 120 × 𝑉𝑎𝑟(𝑋) = 47,761.2, assuming
that passengers check bags independently of one another. The standard deviation is then $218.54.

Exercise 26. Combinations of random variables

Appendix. Useful notation

This appendix gives some commonly used notation and symbols in probability and statistics.

▪ Greek alphabet:

Letter Name Letter Name

Α 𝛼 Alpha Ν 𝜈 Nu
Β 𝛽 Beta Ξ 𝜉 Xi
Γ 𝛾 Gamma O o Omicron
Δ 𝛿 Delta Π 𝜋 Pi
Ε 𝜖 Epsilon Ρ 𝜌 Rho
Ζ 𝜁 Zeta Σ 𝜎 Sigma
Η 𝜂 Eta Τ 𝜏 Tau
Θ 𝜃 Theta Υ 𝜐 Upsilon
Ι 𝜄 Iota Φ 𝜙 Phi
Κ 𝜅 Kappa Χ 𝜒 Chi
Λ 𝜆 Lambda Ψ 𝜓 Psi
Μ 𝜇 Mu Ω 𝜔 Omega

▪ Ω stands for the sample space of an experiment, and 𝑥𝑖 an element of Ω.


▪ ∅ is the empty set.
▪ Capital letters like 𝐴, 𝐵 etc. denote events, i.e. any subset in the sample space, 𝐴 ⊆ Ω.
▪ The complement of an event is denoted as 𝐴̅ or 𝐴𝑐 . Union and intersection between two events
are 𝐴 ∪ 𝐵 and 𝐴 ∩ 𝐵, respectively.
▪ 𝑃(𝐴) denotes the probability of the event A. The conditional probability of an event A given
another event B is denoted as 𝑃(𝐴|𝐵).
▪ Lowercase letters 𝑎, 𝑏, etc. are used for scalars.
▪ Lowercase boldface letters like 𝒂, 𝒃, etc. are used for vectors.
▪ Uppercase letters like 𝑋, 𝑌, etc. typically represent a random variable. Lowercase letters like 𝑥,
𝑦, etc. correspond to a realization of the random variable. If the set of outcomes is countable,
then it is possible to add an index like 𝑥𝑖 with 𝑖 = 1,2, … , 𝑛.
▪ The expected value and variance of a random variable are denoted as 𝐸(𝑋) and 𝑉𝑎𝑟(𝑋). The
covariance between two random variables like 𝑋 and 𝑌 is 𝐶𝑜𝑣(𝑋, 𝑌).
▪ A "hat" over a parameter or a parameter vector, e.g. 𝜃̂ or 𝜽̂, denotes an estimator of the
corresponding parameter or parameter vector.

▪ In practice, there are some specific symbols reserved to some parameters and/or statistics. For
example, the population mean is typically denoted by 𝜇, while a sample average is denoted
by 𝑥̅ . While the population variance is 𝜎 2 , the sample variance is either written as 𝜎̂ 2 or 𝑠 2 .
▪ The probability distribution function (PDF) of a random variable 𝑋 is denoted as 𝑓𝑋 (𝑥; 𝜽)
where 𝜽 is the vector of the parameters of the distribution. For example, if 𝑋 follows a normal
distribution, then the vector 𝜽 contains the mean and the standard deviation of 𝑋. In most
applications when the parameters are known and there is no other random variable, we use a
simplified notation like 𝑓(𝑥).
▪ The cumulative distribution function (CDF) of a random variable is 𝐹𝑋 (𝑥).
▪ Φ(𝑧) and 𝜙(𝑧) represent the standard normal CDF and PDF, respectively. A random
variable that follows a standard normal distribution is denoted as 𝑍 ∼ 𝑁(0, 1).

Appendix. Solutions to selected exercises

(advanced, requires calculus) Consider the following pdf:

𝑓𝑋(𝑥) = { 0 if 𝑥 ≤ 0 ; 𝑐/√𝑥 if 0 < 𝑥 < 1 ; 0 if 𝑥 ≥ 1 }

Calculate 𝑃(0.2 ≤ 𝑋 ≤ 0.8).

Solution: Since 𝑓(𝑥) is a density function, it must satisfy


∫_0^1 𝑐⁄√𝑡 𝑑𝑡 = 1  →  𝑐 ∫_0^1 𝑡^(−1⁄2) 𝑑𝑡 = 1

Evaluating the integral, 2𝑐𝑡^(1⁄2) |_0^1 = 2𝑐 = 1, so 𝑐 = 1⁄2. Then 𝑃(0.2 ≤ 𝑋 ≤ 0.8) = 𝐹(0.8) − 𝐹(0.2), where for 0 < 𝑥 < 1,

𝐹(𝑥) = ∫_0^𝑥 1⁄(2√𝑡) 𝑑𝑡 = √𝑡 |_0^𝑥 = √𝑥

using the power rule ∫ 𝑡^𝑛 𝑑𝑡 = 𝑡^(𝑛+1)⁄(𝑛 + 1) + 𝐶. Therefore, 𝑃(0.2 ≤ 𝑋 ≤ 0.8) = √0.8 − √0.2 ≈ 0.4472.
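The closed-form answer can be cross-checked numerically. The sketch below is an illustration, not part of the exercise: it uses a plain midpoint Riemann sum (helper names `f` and `integrate` are assumptions) to verify that 𝑐 = 1⁄2 normalizes the density and reproduces 𝑃(0.2 ≤ 𝑋 ≤ 0.8) ≈ 0.4472.

```python
import math

def f(x, c=0.5):
    """Density f(x) = c / sqrt(x) on (0, 1), zero elsewhere."""
    return c / math.sqrt(x) if 0 < x < 1 else 0.0

def integrate(g, a, b, n=200_000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, 0.0, 1.0)           # should be close to 1
prob = integrate(f, 0.2, 0.8)            # P(0.2 <= X <= 0.8)
exact = math.sqrt(0.8) - math.sqrt(0.2)  # closed form from the CDF
print(total, prob, exact)
```

The midpoint rule is used deliberately: it never evaluates the integrand at 𝑥 = 0, where the density is singular, so the normalization check converges without special handling.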

Extra material (not used in class)

Joint and marginal distributions

Basic definitions

13. The joint density function of two random variables 𝑋 and 𝑌, denoted 𝑓𝑋,𝑌 (𝑥, 𝑦), is specified as,

𝑃(𝑎 ≤ 𝑋 ≤ 𝑏, 𝑐 ≤ 𝑌 ≤ 𝑑) = ∑_{𝑎≤𝑥≤𝑏} ∑_{𝑐≤𝑦≤𝑑} 𝑓𝑋,𝑌 (𝑥, 𝑦)  or  ∫_𝑎^𝑏 ∫_𝑐^𝑑 𝑓𝑋,𝑌 (𝑥, 𝑦) 𝑑𝑦 𝑑𝑥 (11)

respectively for discrete and continuous random variables. If 𝑓𝑋,𝑌 (𝑥, 𝑦) is a joint density function,
then it must be nonnegative everywhere, and its total sum (discrete case) or integral (continuous
case) must equal 1.

14. The joint cumulative distribution function of two random variables, denoted 𝐹𝑋,𝑌 (𝑥, 𝑦), gives
the probability of a joint event. It is defined as,

𝐹𝑋,𝑌 (𝑥, 𝑦) = 𝑃(𝑋 ≤ 𝑥, 𝑌 ≤ 𝑦) = ∑_{𝑋≤𝑥} ∑_{𝑌≤𝑦} 𝑓𝑋,𝑌 (𝑥, 𝑦)  or  ∫_{−∞}^𝑥 ∫_{−∞}^𝑦 𝑓𝑋,𝑌 (𝑡, 𝑠) 𝑑𝑠 𝑑𝑡 (12)

for discrete and continuous random variables, respectively.

15. The marginal density function is defined with respect to a single random variable. To obtain
the marginal density from the joint density, it is necessary to sum or integrate out the other
variable:

𝑓𝑋 (𝑥) = ∑_{𝑐≤𝑦≤𝑑} 𝑓𝑋,𝑌 (𝑥, 𝑦)  or  𝑓𝑋 (𝑥) = ∫_𝑦 𝑓𝑋,𝑌 (𝑥, 𝑠) 𝑑𝑠 (13)

16. Using the marginal and joint density functions, it can be shown that two random variables are
statistically independent if, and only if, their joint density is the product of their marginal densities.

𝑓(𝑥, 𝑦) = 𝑓𝑋 (𝑥) × 𝑓𝑌 (𝑦) ⇔ 𝑋 and 𝑌 are independent (14)

The same rule also applies to the cdf, i.e. 𝐹(𝑥, 𝑦) = 𝐹𝑋 (𝑥) × 𝐹𝑌 (𝑦).
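To make the independence criterion in (14) concrete, here is a minimal Python sketch on a hypothetical 2×2 joint pmf (the four probabilities are assumptions, chosen so that the product rule holds exactly). It recovers the marginals via (13) and then checks 𝑓(𝑥, 𝑦) = 𝑓𝑋(𝑥)𝑓𝑌(𝑦) cell by cell.

```python
# Hypothetical joint pmf of two binary random variables (values assumed
# for illustration; they are constructed to factor into the marginals).
joint = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}

# Marginals: sum the joint pmf over the other variable (equation (13)).
fX = {x: sum(p for (xi, y), p in joint.items() if xi == x) for x in (0, 1)}
fY = {y: sum(p for (x, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Independence (equation (14)): joint equals product of marginals everywhere.
independent = all(abs(joint[(x, y)] - fX[x] * fY[y]) < 1e-12 for (x, y) in joint)
print(fX, fY, independent)
```

Changing any single cell (while renormalizing) breaks the factorization, and the same check then reports dependence.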

Moments in joint distributions

17. The means, variances and other higher moments of the variables in a joint distribution are
defined with respect to their marginal distributions. If 𝑋 is a discrete random variable, then

𝐸(𝑋) = ∑_𝑥 𝑥 𝑓𝑋 (𝑥) = ∑_𝑥 𝑥 [∑_𝑦 𝑓(𝑥, 𝑦)] = ∑_𝑥 ∑_𝑦 𝑥 𝑓(𝑥, 𝑦) (15)

Variances are computed in the same way:

𝑉𝑎𝑟(𝑋) = ∑_𝑥 (𝑥 − 𝐸(𝑋))² 𝑓𝑋 (𝑥) = ∑_𝑥 ∑_𝑦 (𝑥 − 𝐸(𝑋))² 𝑓(𝑥, 𝑦) (16)
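A small numerical illustration may help here. On a hypothetical discrete joint pmf (the values are assumptions for illustration), 𝐸(𝑋) computed from the marginal and from the double sum over the joint pmf coincide, as equation (15) asserts:

```python
# Hypothetical discrete joint pmf over x, y in {0, 1} (values assumed).
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.5}

# E(X) via the marginal: sum_x x * fX(x), with fX(x) = sum_y f(x, y).
fX = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
mean_marginal = sum(x * p for x, p in fX.items())

# E(X) via the double sum over the joint pmf (equation (15)).
mean_joint = sum(x * p for (x, _), p in joint.items())

# Var(X) via the double sum (equation (16)).
var_x = sum((x - mean_joint) ** 2 * p for (x, _), p in joint.items())
print(mean_marginal, mean_joint, var_x)
```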

18. The covariance between the 𝑋 and 𝑌 is a special case where,

𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸[(𝑋 − 𝜇𝑋 )(𝑌 − 𝜇𝑌 )] = 𝐸(𝑋𝑌) − 𝜇𝑋 𝜇𝑌 = 𝜎𝑋𝑌 (17)

If the random variables are independent, then the covariance will be zero. Since independence
implies that 𝑓(𝑥, 𝑦) = 𝑓𝑋 (𝑥)𝑓𝑌 (𝑦), then,

𝜎𝑋𝑌 = ∑_𝑥 ∑_𝑦 𝑓𝑋 (𝑥) 𝑓𝑌 (𝑦) (𝑥 − 𝜇𝑋 )(𝑦 − 𝜇𝑌 )

𝜎𝑋𝑌 = (∑_𝑥 (𝑥 − 𝜇𝑋 ) 𝑓𝑋 (𝑥)) (∑_𝑦 (𝑦 − 𝜇𝑌 ) 𝑓𝑌 (𝑦)) (18)

𝜎𝑋𝑌 = 𝐸(𝑋 − 𝜇𝑋 ) 𝐸(𝑌 − 𝜇𝑌 )

𝜎𝑋𝑌 = 0

where the last step follows because each mean deviation has expectation zero, 𝐸(𝑋 − 𝜇𝑋 ) = 𝐸(𝑌 − 𝜇𝑌 ) = 0.

19. Although the sign of the covariance shows the direction of the covariation between 𝑋 and 𝑌,
its magnitude depends on the scales of measurement. A preferable measure to overcome this
problem is to use the correlation coefficient:

𝜌𝑋𝑌 = 𝐶𝑜𝑣(𝑋, 𝑌)⁄(𝜎𝑋 𝜎𝑌 ) (19)

It can be shown that the correlation is always bounded within the interval [−1, 1]. Because the
covariance is divided by the product of the two standard deviations, the correlation is also scale-
independent.
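The covariance and correlation formulas can be illustrated on a small toy table (the pmf values below are assumptions, and 𝑋 and 𝑌 are deliberately dependent, so the covariance is nonzero):

```python
import math

# Hypothetical joint pmf (values assumed); X and Y are dependent here.
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.5}

mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())

# Cov(X, Y) = E(XY) - mu_x * mu_y (equation (17)).
cov = sum(x * y * p for (x, y), p in joint.items()) - mu_x * mu_y

# Correlation (equation (19)): scale-free and bounded in [-1, 1].
var_x = sum((x - mu_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in joint.items())
rho = cov / (math.sqrt(var_x) * math.sqrt(var_y))
print(cov, rho)
```

Rescaling 𝑋 or 𝑌 by any positive constant changes the covariance but leaves 𝜌 unchanged, which is exactly the point of equation (19).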

20. The following results can be useful to assess the moments and co-moments in a joint
distribution. With 𝑎, 𝑏, 𝑐 and 𝑑 constants, it can be shown that

a. 𝐸(𝑎𝑋 + 𝑏𝑌 + 𝑐) = 𝑎𝐸(𝑋) + 𝑏𝐸(𝑌) + 𝑐
b. 𝑉𝑎𝑟(𝑎𝑋 + 𝑏𝑌 + 𝑐) = 𝑎²𝑉𝑎𝑟(𝑋) + 𝑏²𝑉𝑎𝑟(𝑌) + 2𝑎𝑏𝐶𝑜𝑣(𝑋, 𝑌)
c. 𝐶𝑜𝑣(𝑎𝑋 + 𝑏𝑌, 𝑐𝑋 + 𝑑𝑌) = 𝑎𝑐𝑉𝑎𝑟(𝑋) + 𝑏𝑑𝑉𝑎𝑟(𝑌) + (𝑎𝑑 + 𝑏𝑐)𝐶𝑜𝑣(𝑋, 𝑌)
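Rules (a) and (b) can be verified exactly on an assumed discrete joint pmf by computing both sides of each identity (rule (c) checks the same way). The pmf values and the constants 𝑎, 𝑏, 𝑐 below are arbitrary choices for illustration:

```python
# Hypothetical joint pmf and constants (all values assumed).
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.5}
a, b, c = 2.0, -3.0, 5.0

def mean(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

E_x, E_y = mean(lambda x, y: x), mean(lambda x, y: y)
var_x = mean(lambda x, y: (x - E_x) ** 2)
var_y = mean(lambda x, y: (y - E_y) ** 2)
cov = mean(lambda x, y: (x - E_x) * (y - E_y))

# Rule (a): E(aX + bY + c), both sides.
lhs_mean = mean(lambda x, y: a * x + b * y + c)
rhs_mean = a * E_x + b * E_y + c

# Rule (b): Var(aX + bY + c), both sides.
lhs_var = mean(lambda x, y: (a * x + b * y + c - lhs_mean) ** 2)
rhs_var = a**2 * var_x + b**2 * var_y + 2 * a * b * cov
print(lhs_mean, rhs_mean, lhs_var, rhs_var)
```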

Conditional distributions

21. Conditioning and the use of conditional distributions play a significant role in statistical
modeling. In a bivariate probability distribution, we can define a conditional distribution over 𝑌
for each value of 𝑋. The conditional distributions are then defined as follows:

𝑓(𝑦|𝑥) = 𝑓(𝑥, 𝑦)⁄𝑓(𝑥)  and  𝑓(𝑥|𝑦) = 𝑓(𝑥, 𝑦)⁄𝑓(𝑦) (20)

22. Using this definition, we can rewrite the proposition for independent random variables: 𝑋 and
𝑌 are independent if, and only if, 𝑓(𝑥, 𝑦) = 𝑓(𝑥)𝑓(𝑦). It follows that, if 𝑋 and 𝑌 are independent
random variables, then

𝑓(𝑦|𝑥) = 𝑓(𝑦) and 𝑓(𝑥|𝑦) = 𝑓(𝑥)

In plain English, if the random variables are independent, the probabilities of events relating to
one variable are unrelated to the other. Note also that the definition of conditional distributions
implies the following result,

𝑓(𝑥, 𝑦) = 𝑓(𝑦|𝑥)𝑓(𝑥) = 𝑓(𝑥|𝑦)𝑓(𝑦) (21)

23. The conditional mean of a random variable is the mean of its conditional distribution,

𝐸(𝑌|𝑋) = ∫_𝑦 𝑦 𝑓(𝑦|𝑥) 𝑑𝑦   if 𝑌 continuous
𝐸(𝑌|𝑋) = ∑_𝑦 𝑦 𝑓(𝑦|𝑥)       if 𝑌 discrete      (22)

The conditional mean function 𝐸(𝑌|𝑋) is called the regression of 𝑌 on 𝑋.
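A sketch of the regression function on a discrete example (the joint pmf values are assumed for illustration): the conditional pmf is built from equation (20), then the conditional mean from the discrete case of equation (22).

```python
# Hypothetical discrete joint pmf (values assumed).
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.5}
fX = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}

def cond_pmf(x):
    """f(y|x) = f(x, y) / fX(x) for the fixed value x (equation (20))."""
    return {y: joint[(x, y)] / fX[x] for y in (0, 1)}

def cond_mean(x):
    """E(Y|X=x) = sum_y y * f(y|x) (equation (22), discrete case)."""
    return sum(y * p for y, p in cond_pmf(x).items())

print(cond_mean(0), cond_mean(1))  # the regression of Y on X
```

Since the conditional mean changes with 𝑥, the regression of 𝑌 on 𝑋 is not flat here: knowing 𝑋 shifts the expected value of 𝑌.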

24. Notice that a random variable can be described as,

𝑌 = 𝐸(𝑌|𝑋) + (𝑌 − 𝐸(𝑌|𝑋)) = 𝐸(𝑌|𝑋) + 𝜀 (23)

The conditional variance of a random variable is the variance of the conditional distribution, i.e.

𝑉𝑎𝑟(𝑌|𝑋) = 𝐸[(𝑌 − 𝐸(𝑌|𝑋))² | 𝑋] (24)

which is given by,

𝑉𝑎𝑟(𝑌|𝑋) = ∫_𝑦 (𝑦 − 𝐸(𝑌|𝑋))² 𝑓(𝑦|𝑥) 𝑑𝑦   if 𝑌 continuous
𝑉𝑎𝑟(𝑌|𝑋) = ∑_𝑦 (𝑦 − 𝐸(𝑌|𝑋))² 𝑓(𝑦|𝑥)       if 𝑌 discrete      (25)

The conditional variance is also called the scedastic function. Unlike the conditional mean function,
however, it is common for the conditional variance not to vary with 𝑋. The case where the
conditional variance is unrelated to 𝑋 is called homoscedasticity.
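The scedastic function can be sketched on an assumed discrete pmf. In this toy table Var(𝑌|𝑋 = 𝑥) differs across 𝑥, so the example is heteroscedastic; a homoscedastic table would return the same value for every 𝑥.

```python
# Hypothetical discrete joint pmf (values assumed).
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.5}
fX = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}

def cond_var(x):
    """Var(Y|X=x) from equation (25), discrete case."""
    f_cond = {y: joint[(x, y)] / fX[x] for y in (0, 1)}      # f(y|x)
    m = sum(y * p for y, p in f_cond.items())                # E(Y|X=x)
    return sum((y - m) ** 2 * p for y, p in f_cond.items())  # scedastic function

print(cond_var(0), cond_var(1))
```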
