Notes ch1 Random Variables and Probability Distributions
Applied Statistics
Erkin Diyarbakirlioglu
February 6, 2022
1 Random variables and probability distributions
1. A random variable 𝑋 is a real-valued set function that assigns one and only one real number to
each element in the sample space Ω. Mathematically, 𝑋 is a mapping from Ω into a subset of ℝ as,
𝑋: Ω → ℝ (1)
Example 1. Let 𝑋 be the outcome of rolling a die. Then the sample space is Ω𝑋 = {1, 2, … , 6}, and we
denote a specific realization of 𝑋 as 𝑥𝑖.
3. For any 𝑋, discrete or continuous, the cumulative distribution function (CDF) is defined as,
$$F_X(x) = P(X \le x)$$
The CDF is useful to calculate the probabilities of random events. Given the definition of the CDF,
the following properties should be noted,
1 In many textbooks, the term “finite” is used in place of “countable”. Thus, 𝑋 is said to be a discrete
random variable if its sample space has a finite number of elements.
b. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
c. 0 ≤ 𝐹(𝑥) ≤ 1
4. If 𝐹𝑋 is continuous and strictly increasing, then its inverse function $F^{-1}(q)$, with 0 ≤ 𝑞 ≤ 1, is
called the quantile function of 𝑋. Mathematically, the CDF and the quantile function are related as,
$$F(x) = q \iff F^{-1}(q) = x$$
Example 2. 𝑋 is a continuous random variable and 𝐹(𝑥 = 1.96) = 0.95. This implies that the
quantile function of 𝑋 evaluated at 0.95 is $F^{-1}(q = 0.95) = 1.96$.
5. Some quantiles have specific names. For example, the 0.5 quantile is the median of 𝑋. The 0.25
and 0.75 quantiles are respectively the first and third quartiles. Quantiles are also expressed as
percentiles. For example, the 0.5 quantile of a random variable 𝑋 is also its 50th percentile.
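As a rough numerical illustration of how the CDF, the quantile function and the named quantiles fit together, the sketch below uses `statistics.NormalDist` from the Python standard library. The choice of a standard normal variable is an assumption made here for illustration only; the notes do not name a distribution at this point.

```python
from statistics import NormalDist

# Assumption: Z is standard normal (chosen only for illustration).
Z = NormalDist(mu=0.0, sigma=1.0)

x = 1.0
q = Z.cdf(x)             # q = F(x)
x_back = Z.inv_cdf(q)    # F^{-1}(q) recovers x when F is strictly increasing

median = Z.inv_cdf(0.50)  # 0.5 quantile = 50th percentile
q1 = Z.inv_cdf(0.25)      # first quartile
q3 = Z.inv_cdf(0.75)      # third quartile

print(q, x_back)
print(q1, median, q3)
```

Because the standard normal density is symmetric around zero, the two quartiles printed here are mirror images of each other and the median is zero.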
6. The probability distribution function (PDF) of a random variable 𝑋 is a listing of all disjoint
outcomes of 𝑋 and their respective probabilities. Mathematically,
$$f_X(x) = \begin{cases} P(X = x) & \text{if } X \text{ is discrete} \\ P(a \le X \le b) & \text{if } X \text{ is continuous} \end{cases} \qquad (4)$$
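where 𝑎 and 𝑏 are any two values in the domain of 𝑋.2 For a function 𝑓(𝑥) to be a valid PDF, it must
satisfy the following conditions: 𝑓(𝑥) ≥ 0 for all 𝑥, and the probabilities must add up to 1, that is,
∑𝑥 𝑓(𝑥) = 1 if 𝑋 is discrete and ∫ 𝑓(𝑥)𝑑𝑥 = 1 if 𝑋 is continuous.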
Some sources use the term probability mass function to describe 𝑓(𝑥) when 𝑋 is a discrete random
variable and reserve the term probability density function for 𝑓(𝑥) when 𝑋 is continuous.
Example 3. 𝑋 shows the rolling of a die. Then, the PDF and CDF of 𝑋 can be tabulated as,
2 In fact, for a continuous random variable 𝑋, the value of the probability distribution function associated with a
particular value is 0 because,
$$P(X = x) = \lim_{\epsilon \to 0} P(X \in [x, x + \epsilon]) = \lim_{\epsilon \to 0} \int_{x}^{x+\epsilon} f_X(u)\,du = 0$$
Therefore, we can only assign probabilities to intervals in the range of 𝑥, whence the definition 𝑓(𝑥) = 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏).
outcome, 𝑥𝑖                   1     2     3     4     5     6
PDF: 𝑓(𝑥𝑖) = 𝑃(𝑋 = 𝑥𝑖)       1/6   1/6   1/6   1/6   1/6   1/6
CDF: 𝐹(𝑥𝑖) = 𝑃(𝑋 ≤ 𝑥𝑖)       1/6   2/6   3/6   4/6   5/6   6/6
[Figure: bar charts of the PDF (left panel) and the CDF (right panel) of 𝑋 over the outcomes 1 to 6.]
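The table in Example 3 can be reproduced with a few lines of Python. This is only a sketch using exact fractions; the variable names are chosen here and are not part of the notes.

```python
from fractions import Fraction
from itertools import accumulate

outcomes = [1, 2, 3, 4, 5, 6]

# PDF (PMF) of a fair die: f(x) = P(X = x) = 1/6 for every face.
pmf = {x: Fraction(1, 6) for x in outcomes}

# CDF: F(x) = P(X <= x) is the running sum of the PMF.
cdf = dict(zip(outcomes, accumulate(pmf[x] for x in outcomes)))

for x in outcomes:
    print(x, pmf[x], cdf[x])
```

Note that `Fraction` reduces 3/6 to 1/2 and 6/6 to 1, so the printed CDF values are the reduced forms of the entries in the table.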
Example 4. Let 𝑍 be a continuous random variable with the CDF and PDF shown below,
𝑍 can take any value from −∞ to ∞. Because the probability that 𝑍 equals any specific real value
is zero, we work with ranges of values when making probability calculations. For example, the
probability 𝑃(𝑍 ≤ 1) can be illustrated using the CDF and PDF as follows,
On the CDF, the result 𝐹(𝑧 = 1) is shown by the vertical dashed line connecting the 𝑥-axis to the
function 𝐹(𝑧), while on the PDF it corresponds to the shaded area to the left of 𝑧 = 1.
7. The CDF and PDF of a random variable are two related functions. Specifically, if 𝑋 is continuous,
$$F(x) = \int_{-\infty}^{x} f_X(u)\,du \quad \Leftrightarrow \quad f(x) = \frac{dF(x)}{dx} \qquad (5)$$
and if 𝑋 is discrete,
Example 5. Suppose a train arrives at a station every 20 minutes.3 Therefore, the waiting time of a
randomly selected person is a random variable 𝑋 that can be described as,
$$f(x) = \begin{cases} k & \text{for } 0 \le x \le 20 \\ 0 & \text{otherwise} \end{cases}$$
For 𝑓(𝑥) to be a valid PDF, it must satisfy $\int_0^{20} f(x)\,dx = 1$. Thus, $[kx]_0^{20} = 1$ and 𝑘 = 1⁄20,
$$f(x) = \begin{cases} 1/20 & \text{for } 0 \le x \le 20 \\ 0 & \text{otherwise} \end{cases}$$
We can derive the CDF, for 0 ≤ 𝑥 ≤ 20, as
$$F(x) = \int_0^x f(u)\,du = \int_0^x \frac{1}{20}\,du = \frac{1}{20}[u]_0^x = \frac{1}{20}x$$
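The waiting-time example can be checked numerically. The sketch below approximates the integrals with a simple midpoint rule; the helper `integrate` and its step count are choices made here for illustration, not part of the notes.

```python
def f(x):
    """PDF of the waiting time: 1/20 on [0, 20], 0 otherwise."""
    return 1 / 20 if 0 <= x <= 20 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def F(x):
    """CDF: integral of the PDF from 0 up to x, with x clamped to [0, 20]."""
    x = min(max(x, 0.0), 20.0)
    return integrate(f, 0.0, x)

total = integrate(f, 0.0, 20.0)  # total probability, should be close to 1
print(total, F(5), F(20))
```

The values of `F` agree with the closed form 𝑥⁄20 up to floating-point error, for example 𝐹(5) is approximately 0.25.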
8. The mathematical expectation, or the expected value of a random variable 𝑋 is defined as,
$$E(X) = \begin{cases} \sum_{x} x f(x) & \text{if } X \text{ is discrete} \\ \int_{x} x f(x)\,dx & \text{if } X \text{ is continuous} \end{cases} \qquad (7)$$
Thus, the expected value of 𝑋 is basically a probability-weighted average of all the possible values
𝑋 can take. The expected value of 𝑋 is also called the mean of 𝑋, and typically denoted as 𝜇𝑋 .
Example 6. Let 𝑋 be the result of a die rolled one time. So, 𝑋 can take integers from 1 to 6 each with
probability 𝑓(𝑥𝑖 ) = 𝑃(𝑋 = 𝑥𝑖 ) = 1⁄6. Then 𝜇𝑋 can be calculated as,
$$\mu_X = \sum_{i=1}^{6} x_i f(x_i) = \left(1 \times \frac{1}{6}\right) + \left(2 \times \frac{1}{6}\right) + \cdots + \left(6 \times \frac{1}{6}\right) = 3.5$$
Example 7. Consider again the "waiting time for a train" example. We established that,
1⁄20 for 0 ≤ 𝑥 ≤ 20
𝑓(𝑥) = {
0 otherwise
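$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-\infty}^{0} x f(x)\,dx + \int_{0}^{20} x f(x)\,dx + \int_{20}^{\infty} x f(x)\,dx$$
$$E(X) = 0 + \int_{0}^{20} x \frac{1}{20}\,dx + 0 = \left[\frac{1}{40}x^2\right]_0^{20} = \frac{400}{40} - 0 = 10$$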
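Examples 6 and 7 can be verified with a short script. This is a sketch: the discrete mean is computed exactly with fractions, and the continuous mean is approximated with a midpoint rule (a hypothetical helper introduced here, not taken from the notes).

```python
from fractions import Fraction

# Discrete case (Example 6): E(X) = sum of x * f(x) for a fair die.
die_mean = sum(x * Fraction(1, 6) for x in range(1, 7))

def midpoint(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Continuous case (Example 7): E(X) = integral of x * (1/20) over [0, 20].
wait_mean = midpoint(lambda x: x / 20, 0.0, 20.0)

print(die_mean, wait_mean)
```

In both cases the result is the probability-weighted average of the possible values: 7⁄2 for the die and approximately 10 minutes for the waiting time.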
2.2 Variance
9. The variance of 𝑋 measures the extent to which the values that 𝑋 may take deviate from the
mean. In other words, whereas the expected value, or the mean, is a measure of the central
tendency of 𝑋, the variance is a measure of dispersion.4 Mathematically,
$$Var(X) = E\left[(X - \mu)^2\right]$$
So, the variance measures the average squared difference between 𝑋 and its mean 𝜇. In
general, we denote the variance of a random variable as $Var(X) = \sigma_X^2$.
4 By central tendency, it is meant a specific value around which the realizations of 𝑋 tend to cluster.
Example 8. Consider the discrete random variable 𝑋 that shows the result of a six-sided die. We
have already calculated 𝐸(𝑋) = 𝜇 = 3.5. Then, the variance of 𝑋 is,
$$Var(X) = \sum_{i=1}^{6} (x_i - \mu)^2 f(x_i) = (1 - 3.5)^2 \times \frac{1}{6} + \cdots + (6 - 3.5)^2 \times \frac{1}{6} = \frac{35}{12} \approx 2.92$$
Example 9. Consider the continuous random variable 𝑋 that shows the waiting time for a train.
Given 𝐸(𝑋) = 10, we can calculate the variance of 𝑋 as follows,
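$$Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
$$Var(X) = \int_{-\infty}^{0} (x - 10)^2 f(x)\,dx + \int_{0}^{20} (x - 10)^2 f(x)\,dx + \int_{20}^{\infty} (x - 10)^2 f(x)\,dx$$
$$Var(X) = 0 + \int_{0}^{20} (x - 10)^2 \frac{1}{20}\,dx + 0 = \int_{0}^{20} \frac{1}{20}\left(x^2 - 20x + 100\right)dx$$
$$Var(X) = \left[\frac{1}{20}\left(\frac{1}{3}x^3 - 10x^2 + 100x\right)\right]_0^{20} = 33.3333\ldots$$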
10. The positive square root of the variance is called the standard deviation, 𝜎𝑋 = |√𝑉𝑎𝑟(𝑋)|. In
most applications, the standard deviation is used to quantify the dispersion of the random variable
instead of the variance.
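The variance and standard deviation calculations of Examples 8 and 9 can be reproduced as follows. This is a sketch only; the midpoint-rule helper is an illustrative choice, so the continuous result is an approximation of 100⁄3.

```python
from fractions import Fraction
from math import sqrt

# Discrete case (Example 8): fair six-sided die.
faces = range(1, 7)
p = Fraction(1, 6)
mu = sum(x * p for x in faces)
die_var = sum((x - mu) ** 2 * p for x in faces)   # exact value 35/12
die_sd = sqrt(die_var)                            # positive square root

def midpoint(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Continuous case (Example 9): waiting time with f(x) = 1/20 on [0, 20], mean 10.
wait_var = midpoint(lambda x: (x - 10) ** 2 / 20, 0.0, 20.0)
wait_sd = sqrt(wait_var)

print(die_var, die_sd)
print(wait_var, wait_sd)
```

The standard deviations are expressed in the same units as 𝑋 (faces of the die, minutes of waiting), which is one reason they are often easier to interpret than the variances.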
$$Var(X) = E\left[(X - E(X))^2\right] = E\left[X^2 - 2XE(X) + E(X)^2\right] = E(X^2) - 2E(X)^2 + E(X)^2 = E(X^2) - E(X)^2$$
𝑌 = 𝑎1 𝑋1 + ⋯ + 𝑎𝑛 𝑋𝑛 (10)
i.e. a linear combination of 𝑋1 , … , 𝑋𝑛 with 𝑎𝑖 as the weight associated to each 𝑋𝑖 . Then, the following
useful rules for calculating 𝐸(𝑌) and 𝑉𝑎𝑟(𝑌) apply to such linear combinations:
a. 𝐸(𝑌) = 𝑎1 𝐸(𝑋1 ) + ⋯ + 𝑎𝑛 𝐸(𝑋𝑛 )
b. $Var(Y) = a_1^2 Var(X_1) + \cdots + a_n^2 Var(X_n)$ if 𝑋𝑖 and 𝑋𝑗 are pairwise independent, so that
𝐶𝑜𝑣(𝑋𝑖, 𝑋𝑗) = 0 for 𝑖 ≠ 𝑗.
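Rules (a) and (b) can be checked by brute force on a small case. The sketch below takes two independent fair dice and the hypothetical weights 𝑎1 = 2 and 𝑎2 = −3 (chosen here only for illustration), computes 𝐸(𝑌) and 𝑉𝑎𝑟(𝑌) directly from the 36 equally likely pairs, and compares them with the two rules.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)      # probability of each (x1, x2) pair under independence
a1, a2 = 2, -3           # hypothetical weights

mean_x = Fraction(7, 2)  # E(X) for one fair die
var_x = Fraction(35, 12) # Var(X) for one fair die

# Direct computation from the joint distribution of (X1, X2).
e_y = sum((a1 * x1 + a2 * x2) * p for x1, x2 in product(faces, faces))
var_y = sum((a1 * x1 + a2 * x2 - e_y) ** 2 * p for x1, x2 in product(faces, faces))

# Rule (a) and rule (b).
e_rule = a1 * mean_x + a2 * mean_x
var_rule = a1 ** 2 * var_x + a2 ** 2 * var_x

print(e_y, e_rule)
print(var_y, var_rule)
```

Note that the negative weight lowers the mean but still adds to the variance, because the weights enter rule (b) squared.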
Example 10. It has been previously established that if 𝑋 represents the result of a fair six-sided die,
then 𝐸(𝑋) = 3.5 and 𝑉𝑎𝑟(𝑋) = 35⁄12 ≈ 2.92. Suppose you roll a die 100 times and calculate the sum of
the results, say 𝑊. Then 𝑊 can be modeled as a linear combination 𝑊 = 𝑋1 + ⋯ + 𝑋100, where 𝑋𝑖
shows the 𝑖th roll of the same die and each weight equals 1. Because the rolls are independent, with
𝐸(𝑋𝑖) = 3.5 and 𝑉𝑎𝑟(𝑋𝑖) = 35⁄12 for each 𝑖 = 1, … , 100, it follows that 𝐸(𝑊) = 100 × 3.5 = 350 and
𝑉𝑎𝑟(𝑊) = 100 × 35⁄12 ≈ 291.67.
References
Heumann, Christian, Michael Schomaker, and H.C. Shalabh. 2016. Introduction to Statistics and
Data Analysis. Springer.
Ross, Sheldon M. 1999. An Introduction to Mathematical Finance. Cambridge University Press.
End-of-chapter exercises: Random Variables and Probability Distributions
𝑋 is a random variable that shows the result of a fair coin flipped twice. (1) Let Ω be the sample
space of 𝑋. Write down the elements of Ω. (2) Tabulate the probability distribution function of 𝑋.
𝑌 is defined as the result of rolling two six-sided dice. Write down the PDF and CDF of 𝑌.
The game of roulette involves spinning a wheel with 38 slots: 18 red, 18 black and 2 green. A ball
is spun onto the wheel and will eventually land in a slot, where each slot has an equal chance of
capturing the ball. Write down the PDF of this game.
A fair coin is tossed until "heads" is obtained. Let 𝑋 be the random variable that shows the number
of tails before the first head shows up, and 𝑝 the probability of getting heads in any given trial.
Write down a generic formula that shows the probability of getting the first heads at the 𝑥th trial.
Let 𝑋 be a random variable that shows the result of one six-sided die rolled one time. Evaluate the
following functions: (1) 𝑓(𝑥 = 2); (2) 𝑓(3 ≤ 𝑥 < 5); (3) 𝐹(𝑥 = 3); (4) 𝐹(𝑥 ≤ 6).
Let 𝑋 be a discrete random variable. Show that $f(x) = (1 - p)^x p$ for 𝑥 = 0, 1, 2, …, with 0 < 𝑝 ≤ 1,
is a valid probability distribution function.
Show that the function 𝑓(𝑥) = (𝑥 + 2)⁄25 is a valid discrete probability mass function over the
domain of 𝑋 ∈ {1, 2, 3, 4, 5}.
Let $f(x) = cx^2$ for 𝑥 = 1, 2 and 3. Find 𝑐 such that 𝑓(𝑥) is a valid probability mass function.
Let $f(x) = c(1/4)^x$ for 𝑥 = 1, 2, … Find 𝑐 such that 𝑓(𝑥) is a valid probability mass function.
$$F(x) = \begin{cases} 0 & \text{if } x < 2 \\ -\frac{1}{4}x^2 + 2x - 3 & \text{if } 2 \le x \le 4 \\ 1 & \text{if } x > 4 \end{cases}$$
A winemaker experiments with new grapes and adds a new wine to his stock. The percentage sold
by the end of the season depends on the weather and various other factors. It can be modelled
using the random variable with the following CDF,
$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 3x^2 - 2x^3 & \text{if } 0 \le x \le 1 \\ 1 & \text{if } x > 1 \end{cases}$$
(1) Determine the PDF of 𝑋. (2) What is the probability that he sells at least one third of his wine,
but no more than two thirds?
Let 𝑋 be defined as a random variable that shows the outcome of one six-sided die rolled once.
The sample space of 𝑋 and their respective probabilities are shown below.
𝑥𝑖 1 2 3 4 5 6
𝑃(𝑋 = 𝑥𝑖 ) 1/6 1/6 1/6 1/6 1/6 1/6
Calculate 𝑃(𝑋 ≤ 3). Calculate the mean of 𝑋.
Exercise 14. Probability distributions and expectation
A quality index summarizes different features of a product. Experts may assign different quality
scores depending on their experience with the product. Let 𝑋 be the quality index for a graphic
tablet. The PDF of 𝑋 is given below,
𝑐𝑥(2 − 𝑥) 𝑖𝑓 0 ≤ 𝑥 ≤ 2
𝑓(𝑥) = {
0 𝑒𝑙𝑠𝑒𝑤ℎ𝑒𝑟𝑒
(1) Determine 𝑐 such that 𝑓(𝑥) is a valid PDF. (2) Determine the CDF of 𝑋. (3) Calculate 𝐸(𝑋) and
𝑉𝑎𝑟(𝑋).
Two books are assigned for a statistics class: a textbook and its corresponding study guide. The
university bookstore determined 20% of enrolled students do not buy either book, 55% buy the
textbook only, and 25% buy both books. These percentages are relatively constant from one year
to another. If there are 1000 students enrolled, how many books should the bookstore expect to
sell to this class? The textbook costs $137 and the study guide $33. How much revenue should the
bookstore expect from this class of 1000 students?
An editor states that 70% of all the books he has edited so far contain no typos, while 20% of
them contain 1 typo and 10% contain 2 typos. He assumes that the share of books that contain
more than 2 typos is negligible. What is the mean number of typos in books edited by this
editor? What is the variance of the number of typos?
Exercise 18. Expectation
Four buses carrying 152 students from the same school arrive at a stadium. The buses carry,
respectively, 39, 33, 46 and 34 students. One of the 152 students is randomly chosen. Let 𝑋 denote
the number of students who were on the bus of the selected student. One of the four bus drivers
is also randomly chosen. Let 𝑌 be the number of students who were on that driver's bus. Calculate
𝐸(𝑋) and 𝐸(𝑌). Why is 𝐸(𝑋) larger than 𝐸(𝑌)?
A gambling book recommends the following strategy for the game of roulette. It recommends the
gambler bet $1 on red. If red appears, which occurs with probability 18/38, then the gambler
should take his profit of $1 and quit. If the gambler loses this bet, he should then make a second
bet of size twice his initial bet and then quit. Let 𝑋 denote the gambler's winnings. What is 𝐸(𝑋)?
The game of roulette involves spinning a wheel with 38 slots: 18 red, 18 black and 2 green. A ball
is spun onto the wheel and will eventually land in a slot, where each slot has an equal chance of
capturing the ball. Gamblers can place bets on black or red slots. If the ball lands on their color,
the payoff is double the money they bet. If the ball lands on another color, then they lose their
money. Suppose you bet $1 on red. What are the expected value and standard deviation of your
winnings?
Let 𝑋1 , … , 𝑋𝑛 be a sequence of independent random variables, all having the same expected value
𝜇 and variance 𝜎 2 . Define the random variable 𝑋̅ as the arithmetic average of these variables,
called the sample mean, given by,
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
A chance game involving rolling a six-sided die twice has the following outcomes:
outcome             1st die = 2nd die     1st die + 2nd die = even     1st die + 2nd die = odd
for each $1 bet     win $1 × 4            nothing                      lose $1 × 2
For example, if a player places $10 and the results of the two rolls are the same, the player wins
four times his initial bet. So, his profit will be (4 × $10) − $10 = $30, that is the payoff minus the
initial bet that cannot be recovered. Calculate the expected value of the profit or loss (P&L) for a
player who bets $20 on this game. What is the standard deviation of the P&L?
A lawyer must decide whether to charge a fixed fee of $5,000 or take a contingency fee of $25,000
if she wins the case (and $0 if she loses). She thinks that her probability of winning the case is
30%. Determine the mean and standard deviation of her fee if (1) she takes the fixed fee; (2) she
takes the contingency fee.
Mickey has invested 60% of his portfolio in Facebook stocks and 40% in Amazon stocks. Let 𝑅𝐹
be the percentage change in Facebook stock price next month and 𝑅𝐴 the percentage change in
Amazon stock price next month. Mickey estimated the mean percentage price change and the
standard deviation for both stocks as follows:
An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose
54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12%
have two pieces. We suppose a negligible portion of people checks more than two bags. (1) Write
a probability model for the revenues per passenger. What is the expected revenue per passenger?
Standard deviation? (2) About how much revenue should the airline expect for a flight of 120
passengers? With what standard deviation? Which assumption do you make in your calculations?
Jimmy travels to work four days a week, except on Wednesdays. We will use 𝑋1 to represent his
travel time on Monday, 𝑋2 to represent his travel time on Tuesday, and so on. (1) Write an
equation that represents his travel time for the week. (2) It takes Jimmy an average of 45 minutes
on Mondays and Fridays, and 40 minutes on Tuesdays and Thursdays. What is his average
commute time for the week? (3) Suppose Jimmy’s daily commute has a standard deviation of 5
minutes for each day of the week. What is the standard deviation of his total weekly travel time to
work?
Solutions to end-of-chapter exercises: Random Variables and Probability Distributions
Ω𝑋 = {(𝐻, 𝐻), (𝐻, 𝑇), (𝑇, 𝐻), (𝑇, 𝑇)}. The PMF of 𝑋 can be tabulated as,
𝑥𝑖          (𝐻, 𝐻)    (𝐻, 𝑇)    (𝑇, 𝐻)    (𝑇, 𝑇)
𝑓(𝑥𝑖)       0.25      0.25      0.25      0.25
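Let 𝑝 be the probability of getting "Heads" in any given trial. If heads is first observed at the
second trial, the first trial must have turned up "Tails", which happens with probability (1 − 𝑝)𝑝.
More generally, the probability of getting the first "Heads" at the 𝑥th trial, i.e. after 𝑥 − 1 "Tails",
is $(1 - p)^{x-1} p$. Equivalently, if 𝑋 counts the number of tails before the first head, then
$P(X = x) = (1 - p)^x p$.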
(1) …
If 𝑓(𝑥) is a valid probability function, then 𝑓(𝑥) ≥ 0 and ∑𝑥 𝑓(𝑥) = 1. Non-negativity is immediate
because 0 ≤ 1 − 𝑝 ≤ 1. For the sum, consider the first 𝑛 + 1 terms,
$$\sum_{i=0}^{n} f(i) = (1 - p)^0 p + (1 - p)^1 p + \cdots + (1 - p)^n p = p\left(1 + (1 - p) + (1 - p)^2 + \cdots + (1 - p)^n\right)$$
Recall the generic geometric series with 𝑛 + 1 terms, $s_n = 1 + u + u^2 + \cdots + u^n$. Multiplying $s_n$ by 𝑢
and subtracting the result from $s_n$, we obtain $s_n = \frac{1 - u^{n+1}}{1 - u}$. Substituting 𝑢 with 1 − 𝑝,
$$\sum_{i=0}^{n} f(i) = p \times \frac{1 - (1 - p)^{n+1}}{p}$$
For 𝑝 > 0 we have 0 ≤ 1 − 𝑝 < 1, so $(1 - p)^{n+1} \to 0$ as 𝑛 → ∞. Therefore,
$$\sum_{i=0}^{\infty} f(i) = p \times \lim_{n \to \infty} \frac{1 - (1 - p)^{n+1}}{p} = p \times \frac{1}{p} = 1$$
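The convergence of the partial sums can also be observed numerically. The sketch below uses the hypothetical value 𝑝 = 0.3 (any 0 < 𝑝 ≤ 1 would do) and compares the partial sums of the PMF with the closed form derived from the geometric series.

```python
def geometric_pmf(x, p):
    """f(x) = (1 - p)^x * p, the probability of x tails before the first head."""
    return (1 - p) ** x * p

p = 0.3                      # hypothetical probability of heads
cutoffs = (5, 20, 100)       # number of terms minus one in each partial sum

partial_sums = [sum(geometric_pmf(i, p) for i in range(n + 1)) for n in cutoffs]
closed_form = [1 - (1 - p) ** (n + 1) for n in cutoffs]

for n, s, c in zip(cutoffs, partial_sums, closed_form):
    print(n, s, c)
```

The partial sums approach 1 from below as more terms are included, which is the behavior the limit argument predicts.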
We must check that 𝑓(𝑥𝑖) ≥ 0 and ∑𝑓(𝑥𝑖) = 1. The first condition holds because (𝑥 + 2)⁄25 > 0
for each value 𝑋 can take. For the second condition, 𝑓(1) + ⋯ + 𝑓(5) = (3 + 4 + 5 + 6 + 7)⁄25 =
25⁄25 = 1.
𝑓(𝑥) is a valid PMF if ∑𝑥 𝑓(𝑥𝑖) = 1 and 𝑓(𝑥𝑖) ≥ 0. Here $c \cdot 1^2 + c \cdot 2^2 + c \cdot 3^2 = c(1 + 4 + 9) = 1$, so
𝑐 = 1⁄14. In addition, 𝑓(𝑥𝑖) ≥ 0 for all 𝑥𝑖.
We must solve for 𝑐 in $c\left(\frac{1}{4} + \frac{1}{4^2} + \frac{1}{4^3} + \cdots\right) = 1$. Note that the series in parentheses is a
geometric series with common ratio 1⁄4 and converges to (1⁄4)⁄(1 − (1⁄4)) = 1⁄3. Hence 𝑐 = 3.
Source: Heumann et al. (2016, 150). 𝑓(𝑥) = 𝑑𝐹(𝑥)⁄𝑑𝑥 → 𝑓(𝑥) = 0 if 𝑥 < 2, 𝑓(𝑥) = −0.5𝑥 + 2 if
2 ≤ 𝑥 ≤ 4 and 𝑓(𝑥) = 0 if 𝑥 > 4. 𝑃(𝑋 < 3) = 𝑃(𝑋 ≤ 3) − 𝑃(𝑋 = 3). 𝑋 is continuous, so
𝑃(𝑋 = 3) = 0. Using the CDF, 𝐹(3) = 𝑃(𝑋 ≤ 3) = 3⁄4.
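Source: Heumann et al. (2016, 150). (1) $f(x) = dF(x)/dx = 6(x - x^2)$ if 0 ≤ 𝑥 ≤ 1 and 0
elsewhere. (2) $P\left(\frac{1}{3} \le X \le \frac{2}{3}\right) = F\left(\frac{2}{3}\right) - F\left(\frac{1}{3}\right) = \frac{20}{27} - \frac{7}{27} = \frac{13}{27} \approx 0.4815$.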
Exercise 12. Probability distributions
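$$1 = \int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{2} f(x)\,dx + \int_{2}^{\infty} f(x)\,dx$$
$$1 = 0 + \int_{0}^{2} c(2x - x^2)\,dx + 0 = c\left[x^2 - \frac{1}{3}x^3\right]_0^2 = c\left(2^2 - \frac{1}{3}2^3\right)$$
Solving for 𝑐 we get 𝑐 = 3⁄4. In addition, $f(x) = \frac{3}{4}(2x - x^2) \ge 0$ for all 𝑥 ∈ [0, 2]. The CDF, for 0 ≤ 𝑥 ≤ 2, is
$$F(x) = \int_{0}^{x} f(u)\,du = \int_{0}^{x} \frac{3}{4}(2u - u^2)\,du = \frac{3}{4}\left[u^2 - \frac{1}{3}u^3\right]_0^x = \frac{3}{4}\left(x^2 - \frac{1}{3}x^3\right)$$
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-\infty}^{0} x f(x)\,dx + \int_{0}^{2} x f(x)\,dx + \int_{2}^{\infty} x f(x)\,dx$$
$$E(X) = 0 + \int_{0}^{2} x\left(\frac{3}{4}(2x - x^2)\right)dx + 0 = \int_{0}^{2} \frac{3}{4}(2x^2 - x^3)\,dx$$
$$E(X) = \frac{3}{4}\left[\frac{2}{3}x^3 - \frac{1}{4}x^4\right]_0^2 = \frac{3}{4}\left(\frac{16}{3} - \frac{16}{4}\right) = 1$$
To calculate the variance, we will make use of the formula $Var(X) = E(X^2) - E(X)^2$. So, we first
take the integral,
$$E(X^2) = \int_{0}^{2} x^2 f(x)\,dx = \int_{0}^{2} \frac{3}{4}(2x^3 - x^4)\,dx = \frac{3}{4}\left[\frac{1}{2}x^4 - \frac{1}{5}x^5\right]_0^2 = \frac{3}{4}\left(\frac{1}{2}2^4 - \frac{1}{5}2^5\right) = \frac{6}{5}$$
Therefore, $Var(X) = \frac{6}{5} - 1^2 = \frac{1}{5}$.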
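The constants obtained in this solution can be cross-checked numerically. The sketch below is an approximation based on a midpoint rule (a helper introduced here for illustration), so the results match 𝑐 = 3⁄4, 𝐸(𝑋) = 1 and 𝐸(𝑋²) = 6⁄5 only up to a small numerical error.

```python
def midpoint(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Normalizing constant: c must make the integral of c*x*(2 - x) over [0, 2] equal 1.
c = 1 / midpoint(lambda x: x * (2 - x), 0.0, 2.0)

def f(x):
    """PDF of the quality index on [0, 2]."""
    return c * x * (2 - x)

mean = midpoint(lambda x: x * f(x), 0.0, 2.0)
second_moment = midpoint(lambda x: x ** 2 * f(x), 0.0, 2.0)
variance = second_moment - mean ** 2   # Var(X) = E(X^2) - E(X)^2

print(c, mean, second_moment, variance)
```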
Exercise 15. Probability distributions and expectation
𝑓(𝑥) is a valid PDF because 𝑓(𝑥𝑖) ≥ 0 for all 𝑥𝑖 and ∑𝑥 𝑓(𝑥𝑖) = 1. 𝑓(0) = 0.4. 𝑓(𝑥 ≤ 1) = 𝑓(−2) +
𝑓(0) + 𝑓(1) = 0.3 + 0.4 + 0.2 = 0.9. 𝐹(2) = 𝑃(𝑋 ≤ 2) = 1. 𝐸(𝑋) = −2 × 0.3 + 0 × 0.4 + 1 ×
0.2 + 2 × 0.1 = −0.2.
𝐸(𝑋) = 20% × 0 + 55% × 1 + 25% × 2 = 1.05 books per student. Multiplying by the number of
students, we can expect the bookstore to sell 1000 × 1.05 = 1,050 books for this class. The
expected revenue per student is 55% × $137 + 25% × ($137 + $33) = $117.85.
Aggregating, we get 1000 × $117.85 = $117,850.
𝐸(𝑋) = 0.7 × 0 + 0.2 × 1 + 0.1 × 2 = 0.4 errors. With 𝐸(𝑋²) = 0.2 × 1² + 0.1 × 2² = 0.6, we get 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋²) − 𝐸(𝑋)² = 0.6 − 0.16 = 0.44.
Source: Ross (1999, 17). 𝑋 can take the values Ω𝑋 = {39, 33, 46, 34} with probabilities
𝑃(𝑋 = 39) = 39⁄152, 𝑃(𝑋 = 33) = 33⁄152, 𝑃(𝑋 = 46) = 46⁄152 and 𝑃(𝑋 = 34) = 34⁄152. We
calculate 𝐸(𝑋) = 5882⁄152 ≈ 38.70. 𝑌 can take the same values as 𝑋; however, each bus is equally
likely to be chosen, so 𝑃(𝑌 = 39) = ⋯ = 𝑃(𝑌 = 34) = 0.25. We calculate 𝐸(𝑌) = 38. 𝐸(𝑋) is larger than
𝐸(𝑌) because a randomly chosen student is more likely to come from a bus carrying many students,
so the mean of 𝑋 gives greater weight to the larger buses, in particular to the outcome 𝑋 = 46.
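The two expectations in the bus exercise differ only in the weights given to each bus, which a short computation makes explicit. This is a sketch with exact fractions; the variable names are chosen here.

```python
from fractions import Fraction

buses = [39, 33, 46, 34]   # students per bus
total = sum(buses)         # 152 students in all

# Student chosen at random: a bus with n students is picked with probability n / 152.
e_x = sum(Fraction(n, total) * n for n in buses)

# Driver chosen at random: each bus is picked with probability 1 / 4.
e_y = sum(Fraction(1, len(buses)) * n for n in buses)

print(float(e_x), float(e_y))
```

Sampling a student over-represents the fuller buses, which is why 𝐸(𝑋) exceeds 𝐸(𝑌) here.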
Source: Ross (1999, 17). Let's tabulate the outcomes of the strategy sequentially and the
probabilities associated with each one,
Exercise 20. Expectation and variance
Suppose that a player places a bet on the red slot. The probability model for the game can be
described as follows,
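$$E(\bar{X}) = E\left(\frac{1}{n}(X_1 + \cdots + X_n)\right) = \frac{1}{n}\left(E(X_1) + \cdots + E(X_n)\right) = \frac{1}{n}(\mu + \cdots + \mu) = \frac{1}{n} n\mu = \mu$$
The variance can be rewritten as
$$Var(\bar{X}) = Var\left(\frac{1}{n}(X_1 + \cdots + X_n)\right) = \left(\frac{1}{n}\right)^2 \left(Var(X_1) + \cdots + Var(X_n)\right)$$
Because $Var(X_i) = \sigma^2$, we get $Var(\bar{X}) = \frac{1}{n^2} n\sigma^2 = \frac{\sigma^2}{n}$.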
For a $20 bet, the payoff profile will be (1) (4 × $20) − $20 = $60 with probability 6⁄36; (2) −$20
with probability 12⁄36, and (3) (−2 × $20) − $20 = −$60 with probability 18⁄36. So, 𝐸(𝑋) =
−$26.67, 𝜎𝑋 = $42.69.
If she takes the fixed fee, the mean is $5000 and the standard deviation 0 because the fee is to be
received with certainty. If she takes the contingency fee with 30% probability of winning the case,
then 𝐸(𝑋) = 30% × $25,000 + 70% × $0 = $7,500 with a standard deviation of $11,456.
The expected change of the portfolio is 𝐸(𝑅𝑃) = 60% × 0.0045 + 40% × 0.0035 = 0.0041. The
variance of the change in portfolio value is $Var(R_P) = 0.6^2 \times 0.0846^2 + 0.4^2 \times 0.0519^2 = 0.003$.
The standard deviation is then 𝜎𝑅 = 0.0548.
Exercise 25. Combinations of random variables
nb of checked luggage 0 1 2
proportion of passengers 54% 34% 12%
fee $0 $25 $25+$35
The expected fee per passenger is 𝐸(𝑋) = 54% × $0 + 34% × $25 + 12% × ($25 + $35) = $15.7.
The variance is 398.01 (in squared dollars) and 𝜎 = $19.95. On a flight with 120 passengers, the
airline can expect to charge a total fee of 120 × 𝐸(𝑋) = $1884 with variance 120 × 𝑉𝑎𝑟(𝑋) = 47761.2,
assuming that the fees paid by different passengers are independent. The
standard deviation is then $218.54.
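The baggage-fee figures can be reproduced with the script below. It is a sketch that relies on the same independence assumption across passengers as the solution; the dictionary layout is a choice made here.

```python
from math import sqrt

# Fee per passenger (in dollars) and the share of passengers paying it.
fees = {0.0: 0.54, 25.0: 0.34, 60.0: 0.12}

mean_fee = sum(fee * p for fee, p in fees.items())
var_fee = sum((fee - mean_fee) ** 2 * p for fee, p in fees.items())
sd_fee = sqrt(var_fee)

# Flight of 120 passengers, assuming independent passengers:
# means add up, and so do variances.
n = 120
total_mean = n * mean_fee
total_var = n * var_fee
total_sd = sqrt(total_var)

print(mean_fee, var_fee, sd_fee)
print(total_mean, total_var, total_sd)
```

The standard deviation of the total grows with the square root of the number of passengers, not with the number itself, because it is the variances that add up.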
Appendix. Useful notation
This appendix gives some commonly used notation and symbols in probability and statistics.
▪ Greek alphabet:
Α 𝛼 Alpha Ν 𝜈 Nu
Β 𝛽 Beta Ξ 𝜉 Xi
Γ 𝛾 Gamma O o Omicron
Δ 𝛿 Delta Π 𝜋 Pi
Ε 𝜖 Epsilon Ρ 𝜌 Rho
Ζ 𝜁 Zeta Σ 𝜎 Sigma
Η 𝜂 Eta Τ 𝜏 Tau
Θ 𝜃 Theta Υ 𝜐 Upsilon
Ι 𝜄 Iota Φ 𝜙 Phi
Κ 𝜅 Kappa Χ 𝜒 Chi
Λ 𝜆 Lambda Ψ 𝜓 Psi
Μ 𝜇 Mu Ω 𝜔 Omega
▪ In practice, there are some specific symbols reserved to some parameters and/or statistics. For
example, the population mean is typically denoted by 𝜇, while a sample average is denoted
by 𝑥̅. While the population variance is $\sigma^2$, the sample variance is written as either $\hat{\sigma}^2$ or $s^2$.
▪ The probability distribution function (PDF) of a random variable 𝑋 is denoted as 𝑓𝑋 (𝑥; 𝜽)
where 𝜽 is the vector of the parameters of the distribution. For example, if 𝑋 follows a normal
distribution, then the vector 𝜽 contains the mean and the standard deviation of 𝑋. In most
applications when the parameters are known and there is no other random variable, we use a
simplified notation like 𝑓(𝑥).
▪ The cumulative distribution function (CDF) of a random variable is 𝐹𝑋 (𝑥).
▪ Φ(𝑍 ≤ 𝑧) and 𝜙(𝑧) represent the standard normal CDF and PDF, respectively. A random
variable that follows a standard normal distribution is denoted as 𝑍 ∼ 𝑁(0,1).
Appendix. Solutions to selected exercises
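$$f_X(x) = \begin{cases} 0 & \text{if } x \le 0 \\ \frac{c}{\sqrt{x}} & \text{if } 0 < x < 1 \\ 0 & \text{if } x \ge 1 \end{cases}$$
So, $\left. 2c\,t^{1/2} \right|_0^1 = 1$, which gives 𝑐 = 1⁄2. Then, 𝑃(0.2 ≤ 𝑋 ≤ 0.8) = 𝐹(𝑥 = 0.8) − 𝐹(𝑥 = 0.2). So,
$$F(x = 0.8) = \int_{0}^{0.8} \frac{1}{2\sqrt{x}}\,dx = \frac{1}{2}\int_{0}^{0.8} x^{-1/2}\,dx = \left[\sqrt{x}\right]_0^{0.8} = \sqrt{0.8}$$
where we apply the power rule $\int x^n\,dx = x^{n+1}/(n + 1) + C$. Similarly, 𝐹(𝑥 = 0.2) = √0.2. Therefore,
𝑃(0.2 ≤ 𝑋 ≤ 0.8) = √0.8 − √0.2 = 0.4472.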
Extra material (not used in class)
Joint and marginal distributions
Basic definitions
13. The joint density function of two random variables 𝑋 and 𝑌, denoted 𝑓𝑋,𝑌 (𝑥, 𝑦), is specified as,
$$P(a \le X \le b,\; c \le Y \le d) = \begin{cases} \sum_{a \le x \le b} \sum_{c \le y \le d} f_{X,Y}(x, y) \\ \int_{a}^{b} \int_{c}^{d} f_{X,Y}(x, y)\,dy\,dx \end{cases} \qquad (11)$$
respectively for discrete and continuous random variables. If 𝑓𝑋,𝑌(𝑥, 𝑦) is a joint density function,
then it must be non-negative everywhere, and its sum (or integral) over all values of 𝑥 and 𝑦 must equal 1.
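14. The joint cumulative distribution function of two random variables, denoted 𝐹𝑋,𝑌(𝑥, 𝑦), gives the
probability of the joint event {𝑋 ≤ 𝑥, 𝑌 ≤ 𝑦}. It is defined as,
$$F_{X,Y}(x, y) = P(X \le x,\; Y \le y) = \begin{cases} \sum_{t \le x} \sum_{s \le y} f_{X,Y}(t, s) \\ \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(t, s)\,ds\,dt \end{cases} \qquad (12)$$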
15. The marginal density function is defined with respect to a single random variable. To obtain
the marginal density from the joint density, it is necessary to sum or integrate out the other
variable:
$$f_X(x) = \begin{cases} \sum_{y} f_{X,Y}(x, y) \\ \int_{y} f_{X,Y}(x, s)\,ds \end{cases} \qquad (13)$$
16. Using the marginal and joint density functions, it can be shown that two random variables are
statistically independent if, and only if, their joint density is the product of their marginal densities.
The same rule also applies to the cdf, i.e. 𝐹(𝑥, 𝑦) = 𝐹𝑋 (𝑥) × 𝐹𝑌 (𝑦).
17. The means, variances and other higher moments of the variables in a joint distribution are
defined with respect to their marginal distributions. If 𝑋 is a discrete random variable, then
$$Var(X) = \sum_{x} (x - E(X))^2 f_X(x) = \sum_{x} \sum_{y} (x - E(X))^2 f(x, y) \qquad (16)$$
If the random variables are independent, then the covariance will be zero. Since independence
implies that 𝑓(𝑥, 𝑦) = 𝑓𝑋 (𝑥)𝑓𝑌 (𝑦), then,
19. Although the sign of the covariance shows the direction of the covariation between 𝑋 and 𝑌,
its magnitude depends on the scales of measurement. A preferable measure to overcome this
problem is to use the correlation coefficient:
$$\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} \qquad (19)$$
It can be shown that the correlation is bounded within the interval [−1, 1]. It is thus scale-
independent.
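To see the scale dependence of the covariance and the scale independence of the correlation, consider the sketch below. The joint PMF is hypothetical (the numbers are made up for illustration), and the covariance is computed with the standard definition 𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸[(𝑋 − 𝐸(𝑋))(𝑌 − 𝐸(𝑌))], which is assumed here.

```python
from math import sqrt

# Hypothetical joint PMF of (X, Y); the probabilities are illustrative only.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g, pmf):
    """E[g(X, Y)] computed from a joint PMF given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def cov_and_corr(pmf):
    mean_x = expect(lambda x, y: x, pmf)
    mean_y = expect(lambda x, y: y, pmf)
    var_x = expect(lambda x, y: (x - mean_x) ** 2, pmf)
    var_y = expect(lambda x, y: (y - mean_y) ** 2, pmf)
    cov = expect(lambda x, y: (x - mean_x) * (y - mean_y), pmf)
    return cov, cov / (sqrt(var_x) * sqrt(var_y))

cov_xy, rho_xy = cov_and_corr(joint)

# Same distribution with X measured on a scale 100 times larger.
rescaled = {(100 * x, y): p for (x, y), p in joint.items()}
cov_scaled, rho_scaled = cov_and_corr(rescaled)

print(cov_xy, rho_xy)
print(cov_scaled, rho_scaled)
```

Rescaling 𝑋 multiplies the covariance by 100 but leaves the correlation unchanged.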
20. The following results can be useful to assess the moments and co-moments in a joint
distribution. With 𝑎, 𝑏, 𝑐 and 𝑑 constants, it can be shown that
Conditional distributions
21. Conditioning and the use of conditional distributions play a significant role in statistical
modeling. In a bivariate probability distribution, we can define a conditional distribution over 𝑌
for each value of 𝑋. The conditional distributions are then defined as follows:
$$f(y|x) = \frac{f(x, y)}{f(x)} \quad \text{and} \quad f(x|y) = \frac{f(x, y)}{f(y)} \qquad (20)$$
22. Using this definition, we can rewrite the proposition for independent random variables: 𝑋 and
𝑌 are independent if, and only if, 𝑓(𝑥, 𝑦) = 𝑓(𝑥)𝑓(𝑦). It follows that, if 𝑋 and 𝑌 are independent
random variables, then
In plain English, if the random variables are independent, the probabilities of events relating to
one variable are unrelated to the other. Note also that the definition of conditional distributions
implies the following result,
23. The conditional mean of a random variable is the mean of its conditional distribution,
$$E(Y|X) = \begin{cases} \int_{y} y f(y|x)\,dy & \text{if } Y \text{ is continuous} \\ \sum_{y} y f(y|x) & \text{if } Y \text{ is discrete} \end{cases} \qquad (22)$$
The conditional variance of a random variable is the variance of the conditional distribution, i.e.
$$Var(Y|X) = E\left[(Y - E(Y|X))^2 \,|\, X\right] \qquad (24)$$
$$Var(Y|X) = \begin{cases} \int_{y} (y - E(Y|X))^2 f(y|x)\,dy & \text{if } Y \text{ is continuous} \\ \sum_{y} (y - E(Y|X))^2 f(y|x) & \text{if } Y \text{ is discrete} \end{cases} \qquad (25)$$
The conditional variance is also called the scedastic function. Unlike the conditional mean function,
however, it is common for the conditional variance not to vary with 𝑋. The case where the
conditional variance is unrelated to 𝑋 is called homoscedasticity.
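A small discrete example may help fix the ideas of conditional distribution, conditional mean and conditional variance. The joint PMF below is hypothetical (made-up numbers), and the function names are chosen here for illustration.

```python
# Hypothetical joint PMF of (X, Y); the probabilities are illustrative only.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def conditional_pmf(x):
    """f(y | x) = f(x, y) / f_X(x), where f_X is the marginal of X."""
    f_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / f_x for (xi, y), p in joint.items() if xi == x}

def conditional_mean(x):
    """E(Y | X = x): mean of the conditional distribution."""
    return sum(y * p for y, p in conditional_pmf(x).items())

def conditional_var(x):
    """Var(Y | X = x): variance of the conditional distribution."""
    m = conditional_mean(x)
    return sum((y - m) ** 2 * p for y, p in conditional_pmf(x).items())

for x in (0, 1):
    print(x, conditional_pmf(x), conditional_mean(x), conditional_var(x))
```

In this particular table the conditional mean of 𝑌 changes with 𝑥 while the conditional variance does not, so it happens to be an instance of homoscedasticity.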