EE311 Lecture Chapter #04 Random Variables and Expectation

This lecture discusses random variables, their definitions, and properties, including discrete and continuous random variables. It explains the concepts of probability mass functions, cumulative distribution functions, and joint distributions with examples. The lecture emphasizes the importance of random variables in experiments and their role in calculating probabilities.

Lecture 4: RANDOM VARIABLES AND EXPECTATION

CCNY EE311
Instructor: Andrii Golovin, Ph.D. in Physics and Mathematics
[email protected]

RANDOM VARIABLES
In tossing two dice we are often interested in the sum of the two dice, while the value of each individual die is not important. For example, someone may be interested in knowing that the sum is 7 without being concerned with which outcome produced that sum: it can be (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), or (6, 1). Thus, for a random process the important parameter is the overall result, which is determined by the random quantities within that random process. Consequently, the overall result is itself a random quantity of interest. Such quantities of interest that are determined by the result of the experiment are known as random variables. Since the value of a random variable is determined by the outcome of the experiment, we may assign probabilities to its possible values.
EXAMPLE 4.1a. Letting X denote the random variable defined as the sum of two fair dice, the probabilities of the random variable X are
P{X = i} = (6 − |i − 7|)/36, where X = 2, 3, 4, …, 12
(for instance, P{X = 2} = 1/36, P{X = 7} = 6/36, and P{X = 12} = 1/36).
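As a quick numerical check (a minimal Python sketch added for illustration, not part of the original slides), the probabilities above can be obtained by enumerating all 36 equally likely outcomes:

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two fair dice
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

# P{X = i} = (number of outcomes with sum i) / 36
pmf = {i: Fraction(counts[i], 36) for i in range(2, 13)}
print(pmf[7])             # 1/6, i.e. 6/36
print(sum(pmf.values()))  # 1, the probabilities sum to one
```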

RANDOM VARIABLES
Since the events {X = i} are mutually exclusive, the general formula for the random variable defined as the sum of the two dice is

1 = P(S) = P(⋃_{i=2}^{12} {X = i}) = ∑_{i=2}^{12} P{X = i}.

• The random variables above take on a finite number of possible values. Random variables whose set of possible values can be written either as a finite sequence x₁, . . . , xₙ, or as an infinite sequence x₁, . . . are said to be discrete.

• Definition: A discrete random variable is a random variable whose set of possible values is a sequence.

• There are also random variables that take on a continuum of possible values. These are known as continuous random variables. One example is a continuous random variable denoting the lifetime of a car.

• Definition: The cumulative distribution function, or more simply the distribution function, F(x) = P{X ≤ x}, is the probability that the random variable X takes on a value that is less than or equal to the constant x.
Let’s find the probability P{a < X ≤ b}.
Solution: {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b},
where the intersection {X ≤ a} ∩ {a < X ≤ b} = ∅ (the two events are mutually exclusive),
then P{X ≤ b} = P{X ≤ a} + P{a < X ≤ b}
P{a < X ≤ b} = P{X ≤ b} − P{X ≤ a}
P{a < X ≤ b} = F(b) − F(a)

[Figure: the real line split at a and b into the events {X ≤ a} and {a < X ≤ b}, which together form {X ≤ b}.]
EXAMPLE 4.1c. Suppose the random variable X has distribution function
F(x) = 0 for x ≤ 0, and F(x) = 1 − e^(−x²) for x > 0.
What is the probability that X exceeds 1?
Solution:
By using Proposition 3.4.1, P(E) = 1 − P(Eᶜ), and F(x) = P{X ≤ x}, one can get
P{X > 1} = 1 − P{X ≤ 1} = 1 − F(1) = 1 − (1 − e^(−1)) = e^(−1) ≈ 0.368.
• Definition: Probability mass function 𝒑(𝒂) of a discrete random variable 𝑿 is
defined by the equation:

𝒑(𝒂) = 𝑷{𝑿 = 𝒂}.

Properties of the probability mass function, for the possible values x₁, x₂, . . . :

p(xᵢ) > 0 for i = 1, 2, 3, . . . , and ∑_{i=1}^{∞} p(xᵢ) = P(⋃_{i=1}^{∞} {X = xᵢ}) = P(S) = 1.

For all other values of x (outside the set of possible values) we set p(x) = 0.

The cumulative distribution function F can be expressed in terms of p(x):

F(a) = P{X ≤ a} = ∑_{all x ≤ a} p(x)
EXAMPLE 4.2a. Consider a random variable X that is equal to x₁, x₂, or x₃, where
p(x₁) = 1/2, p(x₂) = 1/3.
a) Find p(x₃); build plots: b) p(x) and c) F(x).
Solution: a) P(S) = ∑_{i=1}^{3} p(xᵢ) = 1 ⟹ p(x₃) = 1 − p(x₁) − p(x₂) = 1 − 1/2 − 1/3 = 1/6;

(b), (c) [Plots: the probability mass function with heights 1/2, 1/3, 1/6 at x₁, x₂, x₃, and the corresponding staircase cumulative distribution function.]

Cumulative distribution function (taking x₁ = 1, x₂ = 2, x₃ = 3):
F(i) = ∑_{all x ≤ i} p(x) =
  0,                 i < 1
  1/2,               1 ≤ i < 2
  1/2 + 1/3 = 5/6,   2 ≤ i < 3
  1,                 3 ≤ i
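A short Python sketch (an illustration, not from the slides) that builds the cumulative distribution function of Example 4.2a from its probability mass function:

```python
from fractions import Fraction

# PMF of Example 4.2a at the points x1 = 1, x2 = 2, x3 = 3
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

def cdf(a):
    """F(a) = P{X <= a} = sum of p(x) over all x <= a."""
    return sum(p for x, p in pmf.items() if x <= a)

print(cdf(0.5), cdf(1.5), cdf(2.5), cdf(3))  # 0, 1/2, 5/6, 1
```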
• Definition: The random variable X is a continuous random variable if it has a continuous interval of possible values. The probability for a continuous random variable is defined as
P{X ∈ B} = ∫_B f(x) dx,
where
• B is an interval of real numbers;
• the function f(x) is a non-negative function defined for all real numbers x; it is called the probability density function of the random variable X;
• the function f(x) must satisfy the equation
1 = P(S) = ∫_{−∞}^{+∞} f(x) dx.
• For x values belonging to the interval B = [a, b]:
P{X ∈ B} = ∫_B f(x) dx ⟹ P{a ≤ X ≤ b} = ∫_a^b f(x) dx.

• If a = b, then ∫_a^a f(x) dx = P{X = a} = 0 ⟹ the probability that a continuous random variable takes any particular value is zero.
• The cumulative distribution function F(a) and the probability density f(x) are related by the formula
F(a) = P{X ≤ a} = ∫_{−∞}^{a} f(x) dx,
or
dF/da = f(a).
EXAMPLE 4.2b. Suppose that X is a continuous random variable whose probability density function is given by
f(x) = C(4x − 2x²) for 0 < x < 2, and f(x) = 0 otherwise.
(a) What is the value of C?
(b) Find P{X > 1}.

Solution: a) The integral of the probability density function is 1 by the definition and Axiom II:
P(S) = ∫_{−∞}^{∞} f(x) dx = 1 ⟹
⟹ P(S) = C ∫_0^2 (4x − 2x²) dx = 1 ⟹
P(S) = C [2x² − (2/3)x³]₀² = C (8 − 16/3) − 0 = C·(8/3)
⟹ (8/3)C = 1 ⟹ C = 3/8.
EXAMPLE 4.2b (continued). With the probability density function
f(x) = (3/8)(4x − 2x²) for 0 < x < 2, and f(x) = 0 otherwise:
(b) Find P{X > 1}.

Solution: a) C = 3/8.

b) P{X > 1} = 1 − P{X ≤ 1} = 1 − ∫_{−∞}^{1} f(x) dx = ∫_{−∞}^{+∞} f(x) dx − ∫_{−∞}^{1} f(x) dx = ∫_1^{∞} f(x) dx =
= (3/8) ∫_1^2 (4x − 2x²) dx = (3/8) [2x² − (2/3)x³]₁² = (3/8) [(8 − 16/3) − (2 − 2/3)] = (3/8)·(4/3) = 1/2.
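A brief numerical cross-check of Example 4.2b (a sketch added for illustration; it simply integrates the density with a midpoint Riemann sum):

```python
def integrate(f, a, b, n=200_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

C = 3.0 / 8.0
f = lambda x: C * (4.0 * x - 2.0 * x ** 2)   # density on (0, 2), zero elsewhere

print(round(integrate(f, 0.0, 2.0), 4))   # ~1.0, the normalization check
print(round(integrate(f, 1.0, 2.0), 4))   # ~0.5, i.e. P{X > 1}
```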
Jointly Distributed Random Variables
We are often interested in the relationships between two or more random variables.
Such relation can be described by the joint cumulative probability distribution
function of 𝑿 and 𝒀:
𝑭 𝒙, 𝒚 = 𝑷 𝑿 ≤ 𝒙, 𝒀 ≤ 𝒚
For example, in the statistics for the possible causes of cancer, we might be
interested in the relationship between the average number 𝒙 of cigarettes smoked
daily (a) and the age 𝒚 at which an individual contracts cancer (b):
a) P{X ≤ x} = P{X ≤ x, Y < ∞} = F(x, ∞)
b) P{Y ≤ y} = P{X < ∞, Y ≤ y} = F(∞, y)
The joint probability mass function of X and Y, p(xᵢ, yⱼ), is defined as
p(xᵢ, yⱼ) = P{X = xᵢ, Y = yⱼ}.
The individual probability mass functions of X and Y are easily obtained from the joint probability mass function:
{X = xᵢ} = ⋃ⱼ {X = xᵢ, Y = yⱼ} ⟹ P{X = xᵢ} = P(⋃ⱼ {X = xᵢ, Y = yⱼ}) =
= ∑ⱼ P{X = xᵢ, Y = yⱼ} = ∑ⱼ p(xᵢ, yⱼ), and P{Y = yⱼ} = ⋯ = ∑ᵢ p(xᵢ, yⱼ).
Jointly Distributed Random Variables
P{X = xᵢ} = ∑ⱼ P{X = xᵢ, Y = yⱼ} = ∑ⱼ p(xᵢ, yⱼ),
and P{Y = yⱼ} = ⋯ = ∑ᵢ p(xᵢ, yⱼ).
So specifying the joint probability mass function always determines the individual (marginal) mass functions. However, the reverse does not hold: P{X = xᵢ} and P{Y = yⱼ} do not determine the value of P{X = xᵢ, Y = yⱼ}.

EXAMPLE 4.3a. Suppose that 3 batteries are randomly chosen from a box containing 3 new, 4 used but still working, and 5 defective batteries. Let X be the number of new and Y the number of used-but-still-working batteries chosen.
Find the joint probability mass function p(i, j) = P{X = i, Y = j}.
Solution:
The total number of batteries is 3 + 4 + 5 = 12, and i = 0, 1, 2, 3 and j = 0, 1, 2, 3 (at most 3 batteries are drawn, so p(i, j) = 0 whenever i + j > 3).
Since each selection of 3 of the 12 batteries is equally likely,
p(i, j) = C(3, i) C(4, j) C(5, 3 − i − j) / C(12, 3), with C(12, 3) = 220.
Thus, the joint probability mass functions are
p(0, 0) = C(5, 3)/220 = 10/220 = 1/22 ≈ 0.046;
p(0, 1) = C(4, 1) C(5, 2)/220 = 40/220 = 2/11 ≈ 0.18;
EXAMPLE 4.3a (continued).
p(0, 2) = C(4, 2) C(5, 1)/220 = 30/220 = 3/22 ≈ 0.14;
p(0, 3) = C(4, 3)/220 = 4/220 = 1/55 ≈ 0.018;
p(1, 0) = C(3, 1) C(5, 2)/220 = 30/220 = 3/22 ≈ 0.14;
p(2, 0) = C(3, 2) C(5, 1)/220 = 15/220 = 3/44 ≈ 0.068;
p(3, 0) = C(3, 3)/220 = 1/220 ≈ 0.0046;
EXAMPLE 4.3a (continued).
p(1, 1) = C(3, 1) C(4, 1) C(5, 1)/220 = 60/220 = 3/11 ≈ 0.27;
p(1, 2) = C(3, 1) C(4, 2)/220 = 18/220 = 9/110 ≈ 0.082;
p(2, 1) = C(3, 2) C(4, 1)/220 = 12/220 = 3/55 ≈ 0.055;
all remaining combinations with i + j > 3 have probability 0.
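A compact Python sketch (added here for illustration) that reproduces the joint probability mass function of Example 4.3a with binomial coefficients and checks that it sums to 1:

```python
from math import comb
from fractions import Fraction

total = comb(12, 3)  # 220 equally likely ways to choose 3 of the 12 batteries

def p(i, j):
    """P{X = i, Y = j}: i new, j used, 3 - i - j defective batteries chosen."""
    k = 3 - i - j
    if k < 0:
        return Fraction(0)
    return Fraction(comb(3, i) * comb(4, j) * comb(5, k), total)

print(p(0, 0), p(1, 1))                                   # 1/22, 3/11
print(sum(p(i, j) for i in range(4) for j in range(4)))   # 1
```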
EXAMPLE 4.3a (continued).
The individual probability mass function of X is P{X = i} = ∑ⱼ p(i, j):

i = 0 ⟹ P{X = 0} = p(0, 0) + p(0, 1) + p(0, 2) + p(0, 3) = 1/22 + 2/11 + 3/22 + 1/55 = 21/55 = 84/220;
i = 1 ⟹ P{X = 1} = p(1, 0) + p(1, 1) + p(1, 2) + p(1, 3) = 3/22 + 3/11 + 9/110 + 0 = 27/55 = 108/220;
i = 2 ⟹ P{X = 2} = p(2, 0) + p(2, 1) + p(2, 2) + p(2, 3) = 3/44 + 3/55 + 0 + 0 = 27/220;
i = 3 ⟹ P{X = 3} = p(3, 0) + p(3, 1) + p(3, 2) + p(3, 3) = 1/220 + 0 + 0 + 0 = 1/220.
EXAMPLE 4.3a (continued).
The individual probability mass function of Y is P{Y = j} = ∑ᵢ p(i, j):

j = 0 ⟹ P{Y = 0} = p(0, 0) + p(1, 0) + p(2, 0) + p(3, 0) = 1/22 + 3/22 + 3/44 + 1/220 = 14/55 = 56/220;
j = 1 ⟹ P{Y = 1} = p(0, 1) + p(1, 1) + p(2, 1) + p(3, 1) = 2/11 + 3/11 + 3/55 + 0 = 28/55 = 112/220;
j = 2 ⟹ P{Y = 2} = p(0, 2) + p(1, 2) + p(2, 2) + p(3, 2) = 3/22 + 9/110 + 0 + 0 = 12/55 = 48/220;
j = 3 ⟹ P{Y = 3} = p(0, 3) + p(1, 3) + p(2, 3) + p(3, 3) = 1/55 + 0 + 0 + 0 = 1/55 = 4/220.
EXAMPLE 4.3a (continued).

[Table: the joint probability mass function p(i, j), with the marginal values P{X = i} (e.g., P{X = 0} = 21/55, P{X = 1} = 27/55) in the right margin and the marginal values P{Y = j} in the bottom margin.]

P{X = xᵢ} = ∑ⱼ P{X = xᵢ, Y = yⱼ} = ∑ⱼ p(xᵢ, yⱼ)   Eq. (4.3.1)
and P{Y = yⱼ} = ⋯ = ∑ᵢ p(xᵢ, yⱼ)   Eq. (4.3.2)
The probability mass function of X is obtained by computing the row sums, in accordance with Equation 4.3.1.
The probability mass function of Y is obtained by computing the column sums, in accordance with Equation 4.3.2.
EXAMPLE 4.3a. Check-up:

[Table of the joint probability mass function p(i, j), with row sums P{X = i} and column sums P{Y = j} in the margins.]

The individual probability mass functions appear in the margins of the table, which is why they are often referred to as the marginal probability mass functions. Both must sum to one:
Marginal of X: ∑ᵢ P{X = i} = (84 + 108 + 27 + 1)/220 = 1.
Marginal of Y: ∑ⱼ P{Y = j} = 14/55 + 28/55 + 12/55 + 1/55 = (14 + 28 + 12 + 1)/55 = 1.
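The row-sum / column-sum bookkeeping can be checked with a small sketch (an illustration using the joint pmf p(i, j) defined above):

```python
import numpy as np
from math import comb

# Joint pmf of Example 4.3a as a 4x4 array: rows = i (new), columns = j (used)
P = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        if i + j <= 3:
            P[i, j] = comb(3, i) * comb(4, j) * comb(5, 3 - i - j) / comb(12, 3)

print(P.sum(axis=1) * 220)  # row sums:    [84. 108. 27. 1.] -> marginal of X
print(P.sum(axis=0) * 55)   # column sums: [14.  28. 12. 1.] -> marginal of Y
```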
Continuous Random Variable X vs. Jointly Continuous Random Variables X and Y.

For a single continuous random variable: P{X ∈ B} = ∫_B f(x) dx, where x ∈ (−∞, ∞), B = [a, b] is a set of real numbers, and f(x) is the probability density function; F(x) = P{X ≤ x} is the cumulative distribution function.

For jointly continuous random variables: P{X ∈ A, Y ∈ B} = ∫_A ∫_B f(x, y) dy dx, where A and B are sets of real numbers, the set C = {(x, y) : x ∈ A, y ∈ B}, and f(x, y) is the joint probability density function.
Independent Random Variables
• Definition: The random variables X and Y are said to be independent when, for any two sets of real numbers A and B,
P{X ∈ A, Y ∈ B} = P{X ∈ A} · P{Y ∈ B}.

In terms of the joint distribution function F:
F(a, b) = F_X(a) · F_Y(b) for all a and b.

In terms of the probability mass function p:
p(x, y) = p_X(x) · p_Y(y) for all x and y.

In terms of the probability density function f:
f(x, y) = f_X(x) · f_Y(y) for all x and y.
Independent Random Variables
P{X ∈ A, Y ∈ B} = P{X ∈ A} P{Y ∈ B}        p(x, y) = p_X(x) p_Y(y)
F(a, b) = F_X(a) F_Y(b)                    f(x, y) = f_X(x) f_Y(y)

EXAMPLE 4.3e. The successive daily changes of the price of a given item are assumed to be independent and identically distributed random variables with probability mass function given by p(a) = P{X = a}. Find the probability that the item's price will change successively by 1, 2, and 0 points in the next three days.

Daily Change of the Price [Points]   p(a)
-3                                   0.05
-2                                   0.1
-1                                   0.2
 0                                   0.3
 1                                   0.2
 2                                   0.1
 3                                   0.05

By independence, p(1, 2, 0) = P{X₁ = 1} P{X₂ = 2} P{X₃ = 0} = 0.2 · 0.1 · 0.3 = 0.006.
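A tiny sketch (for illustration) of the independence calculation in Example 4.3e, multiplying the marginal probabilities of the three daily changes:

```python
# PMF of a single day's price change (points -> probability)
p = {-3: 0.05, -2: 0.1, -1: 0.2, 0: 0.3, 1: 0.2, 2: 0.1, 3: 0.05}

# Independent days: the joint probability is the product of the marginals
prob = p[1] * p[2] * p[0]
print(prob)  # 0.006
```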
Conditional Distributions
Definition: If X and Y are discrete random variables, a conditional probability mass function p_{X|Y}(x|y) can be defined similarly to the conditional probability P(E|F) = P(EF)/P(F) of events E and F:
p_{X|Y}(x|y) = P{X = x | Y = y} = P{X = x, Y = y} / P{Y = y} = p(x, y) / p_Y(y), defined for all y with p_Y(y) > 0.

EXAMPLE 4.3g. The joint probability mass function p(x, y) of X and Y is
p(0, 0) = 0.4, p(0, 1) = 0.2, p(1, 0) = 0.1, p(1, 1) = 0.3.
Calculate the conditional probability mass function p_{X|Y}(x|y) when Y = 1.

Solution: P{Y = 1} = p_Y(1) = ∑ₓ p(x, 1) = p(0, 1) + p(1, 1) = 0.2 + 0.3 = 0.5.
p_{X|Y}(0|1) = P{X = 0 | Y = 1} = p(0, 1) / p_Y(1) = 0.2 / 0.5 = 0.4;
p_{X|Y}(1|1) = P{X = 1 | Y = 1} = p(1, 1) / p_Y(1) = 0.3 / 0.5 = 0.6.
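A minimal sketch of the same conditioning step in Python (illustrative only):

```python
# Joint pmf of Example 4.3g: keys are (x, y)
joint = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}

# Marginal p_Y(1), then the conditional pmf p_{X|Y}(x | 1)
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
cond = {x: p / p_y1 for (x, y), p in joint.items() if y == 1}
print(p_y1, cond)  # 0.5 {0: 0.4, 1: 0.6}
```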
Expectation
• Definition: The expectation of a discrete random variable X is a weighted average of the possible values xᵢ that X can take on, with each value weighted by the probability P{X = xᵢ}:
E[X] = ∑ᵢ xᵢ P{X = xᵢ} = ∑ᵢ xᵢ p(xᵢ).

EXAMPLE 4.4a. Find E[X], where X is the outcome when we roll one fair die.
Solution: p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6.
E[X] = ∑ᵢ xᵢ p(xᵢ) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) =
= (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 21/6 = 7/2 = 3.5.
Conclusions:
• The expectation E[X] is a statistical parameter of the distribution of X; it is NOT the value that we expect to observe for any single Xᵢ.
• The expectation E[X] = μ is the true mean value of all random variables belonging to the population of X. Thus, it is the generalized weighted average of X.
• The sample mean X̄ is not necessarily equal to the true mean μ.
• E[X] has the same units of measurement as X.
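To connect E[X] with the sample mean X̄ mentioned above, here is a small illustrative Python sketch (not from the slides): the exact weighted average is 3.5, while a simulated sample mean only approximates it.

```python
import random

values = range(1, 7)
exact = sum(values) / 6                     # E[X] = 21/6 = 3.5

random.seed(0)
n = 10_000
sample_mean = sum(random.choice(values) for _ in range(n)) / n
print(exact, round(sample_mean, 3))         # 3.5 and a value near 3.5
```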
Suppose someone wanted to compute the expectation of g(X), where g(X) is a function of the random variable X that takes on the value g(x) when X = x. The stretched or shrunken values g(x) are still weighted by the original probability mass function p(xᵢ) (or probability density function f(x)).

PROPOSITION 4.5.1. Expectation of a Function of a Random Variable.

a) If X is a discrete random variable with probability mass function p(x), then for any real-valued function g(x):
E[g(X)] = ∑ᵢ g(xᵢ) p(xᵢ);
b) If X is a continuous random variable with probability density function f(x), then for any real-valued function g(x):
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.
Corollary 4.5.2
If a and b are constants, then E[aX + b] = aE[X] + b.

Proof:
a) In the discrete case,
E[aX + b] = ∑ᵢ (a xᵢ + b) p(xᵢ) = a ∑ᵢ xᵢ p(xᵢ) + b ∑ᵢ p(xᵢ) = aE[X] + b;

b) In the continuous case,
E[aX + b] = ∫_{−∞}^{∞} (ax + b) f(x) dx = a ∫_{−∞}^{∞} x f(x) dx + b ∫_{−∞}^{∞} f(x) dx = aE[X] + b;

c) If a = 0, then E[b] = b;

d) If b = 0, then E[aX] = aE[X].
EXAMPLE 4.5a. X has the following probability mass function:
p(0) = 0.2, p(1) = 0.5, p(2) = 0.3.
Calculate E[X²].
Solution:
x₁ = 0, x₂ = 1, x₃ = 2, and Y = X², so y₁ = 0, y₂ = 1, y₃ = 4.
"Since Y = f(X) is itself a random variable, it must have a probability distribution, which should be computable from a knowledge of the distribution of X. Once we have obtained the distribution of Y, we can then compute E[f(X)] by the definition of the expectation."
E[Y] = ∑ᵢ yᵢ p(yᵢ) = (since p(yᵢ) = p(xᵢ)) = ∑ᵢ yᵢ p(xᵢ) = 0·0.2 + 1·0.5 + 4·0.3 = 1.7.

Conclusion: if someone needs to calculate E[Y], where yᵢ = f(xᵢ), it must be calculated by using the original probability mass function p(xᵢ) together with the values yᵢ:

E[X^m] = ∑ᵢ xᵢ^m p(xᵢ), if X is discrete;
E[X^m] = ∫_{−∞}^{∞} x^m f(x) dx, if X is continuous.
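As a check of E[X²] in Example 4.5a (a small illustrative sketch):

```python
# PMF of Example 4.5a
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# E[g(X)] = sum of g(x) * p(x) with the ORIGINAL pmf, here g(x) = x**2
e_x2 = sum((x ** 2) * p for x, p in pmf.items())
e_x = sum(x * p for x, p in pmf.items())
print(round(e_x2, 2), round(e_x, 2))  # 1.7 and 1.1
```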
EXAMPLE 4.5f. A secretary has typed n letters along with their respective envelopes. The envelopes get mixed up. If the letters are placed in the mixed-up envelopes in a completely random manner, what is the expected number of letters that are placed in the correct envelopes?

Solution: X = x₁ + x₂ + · · · + xₙ is the number of letters placed in correct envelopes, where
xᵢ = 1 if letter i is placed in its proper envelope, and xᵢ = 0 otherwise.
Thus, E[X] is the expected number of letters placed in correct envelopes:
E[X] = E[x₁ + x₂ + · · · + xₙ] = E[x₁] + E[x₂] + ⋯ + E[xₙ] = ∑_{i=1}^{n} E[xᵢ] = n · E[xᵢ] = (∗)
p(1) = 1/n, because for one letter each of the n envelopes is equally likely. Thus p(0) = 1 − 1/n.
The expectation for letter xᵢ is E[xᵢ] = 1·p(1) + 0·p(0) = 1·(1/n) + 0·(1 − 1/n) = 1/n.
(∗) = ∑_{i=1}^{n} E[xᵢ] = n·(1/n) = 1
Answer: No matter how many letters there are, on average exactly one of the letters will be in its own envelope.
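A quick simulation sketch of Example 4.5f (illustrative; not part of the slides) showing that the average number of correct matches stays near 1 for any n:

```python
import random

def matches(n):
    """Number of letters that land in their own envelope for one random shuffle."""
    envelopes = list(range(n))
    random.shuffle(envelopes)
    return sum(1 for i, e in enumerate(envelopes) if i == e)

random.seed(0)
trials = 20_000
for n in (3, 10, 100):
    avg = sum(matches(n) for _ in range(trials)) / trials
    print(n, round(avg, 3))   # each average is close to 1
```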
• Definition: The expectation of a discrete random variable X is a weighted average of the possible values xᵢ that X can take in a large number of repetitions of the experiment, where each value xᵢ is weighted by the probability P{X = xᵢ}:

E[X] = ∑ᵢ xᵢ P{X = xᵢ} = ∑ᵢ xᵢ p(xᵢ),   where p(xᵢ) plays the role of the weight.

• Definition: The expected value of a random variable X, E[X], is also referred to as the "true mean μ" or "first moment" of X.

• Definition: The quantity E[Xⁿ], n ≥ 1, is called the nth moment of X.
Variance
• Definition: The variance Var(X) of a random variable X with mean μ is
Var(X) = E[(X − μ)²]

Var(X) = E[(X − μ)²] = E[X² − 2μX + μ²] = E[X²] − E[2μX] + E[μ²] =
= E[X²] − 2μE[X] + μ² = (since E[X] = μ = const) = E[X²] − 2μ·μ + μ² =
= E[X²] − 2μ² + μ² = E[X²] − μ² = (since μ² = (E[X])²) = E[X²] − (E[X])².

Conclusion: an alternative formula for the variance of X is
Var(X) = E[X²] − (E[X])²

• Definition: The quantity √Var(X) is called the standard deviation of X; it has the same units as X.
Example 4.6a. Compute Var(X), where X represents the outcome when one rolls a fair die.

Solution:
The alternative formula for the variance of X is
Var(X) = E[X²] − (E[X])²
Since p(xᵢ) = 1/6, where xᵢ = 1, 2, 3, 4, 5, 6, one can obtain
E[X²] = ∑_{i=1}^{6} xᵢ² p(xᵢ) = 1²·(1/6) + 2²·(1/6) + 3²·(1/6) + 4²·(1/6) + 5²·(1/6) + 6²·(1/6) = 91/6.

Since it was shown in Example 4.4a that E[X] = 7/2 = 3.5, one can demonstrate that
Var(X) = E[X²] − (E[X])² = 91/6 − (7/2)² = 35/12 ≈ 2.92 = σ²

So the corresponding standard deviation is σ = √(35/12) ≈ 1.71.
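A short sketch verifying the die variance numerically (added for illustration):

```python
from fractions import Fraction

values = range(1, 7)
p = Fraction(1, 6)

e_x = sum(x * p for x in values)        # 7/2
e_x2 = sum(x * x * p for x in values)   # 91/6
var = e_x2 - e_x ** 2
print(e_x, e_x2, var)                   # 7/2, 91/6, 35/12
```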
A useful identity concerning variances is
Var(aX + b) = a² Var(X), where a and b are constants.
Proof:
By definition E[X] = μ, and by Corollary 4.5.2, E[aX + b] = aE[X] + b = aμ + b. So for the new variable X_new = aX + b the new mean is μ_new = aμ + b.

By the definition Var(X) = E[(X − μ)²]:
Var(aX + b) = E[(aX + b − μ_new)²] = E[(aX + b − (aμ + b))²] = E[(aX − aμ)²] = E[a²(X − μ)²] = a² E[(X − μ)²] =
= a² Var(X).

• If a = 0, then Var(b) = 0.
• If a = 1, then Var(X + b) = Var(X).
• If b = 0, then Var(aX) = a² Var(X).
• Var(X + X) = Var(2X) = 4Var(X) ≠ Var(X) + Var(X).
Covariance
• Definition: The covariance Cov(X, Y) of two random variables X and Y is defined by
Cov(X, Y) = E[(X − μₓ)·(Y − μ_y)],
where μₓ and μ_y are the means of X and Y, respectively.
The covariance is a measure of the joint variability of two random variables X and Y. The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.
Useful identities:
Cov(X, Y) = E[X·Y] − E[X]·E[Y]   Eq. 4.7.1
Cov(X, Y) = Cov(Y, X);
Cov(X, X) = Var(X);
Cov(aX, Y) = a Cov(X, Y). ⟸ Problem #47 of Homework
Proof of Eq. 4.7.1: Cov(X, Y) = E[(X − μₓ)·(Y − μ_y)] =
= E[X·Y − μₓY − μ_yX + μₓμ_y] = E[X·Y] − μₓE[Y] − μ_yE[X] + μₓμ_y =
= (using μₓ = E[X], μ_y = E[Y]) = E[X·Y] − μₓμ_y − μ_yμₓ + μₓμ_y = E[X·Y] − E[X]·E[Y].
Lemma 4.7.1: Additive Property of the Covariance
Cov(X + Z, Y) = Cov(X, Y) + Cov(Z, Y)

Proof:
By Eq. 4.7.1, Cov(X + Z, Y) = E[(X + Z)Y] − E[X + Z]E[Y] = E[XY + ZY] − E[X + Z]E[Y] =
= (using Eq. 4.5.1, E[X + Z] = E[X] + E[Z]) = E[XY] + E[ZY] − E[X]E[Y] − E[Z]E[Y] =
= (E[XY] − E[X]E[Y]) + (E[ZY] − E[Z]E[Y]) = (by Eq. 4.7.1) =
= Cov(X, Y) + Cov(Z, Y).
Correlation
• Definition: The strength of the relationship between X and Y is indicated by the correlation between X and Y, a dimensionless quantity obtained by dividing the covariance by the product of the standard deviations of X and Y:
Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)),
where −1 ≤ Corr(X, Y) ≤ 1,
Cov(X, Y) = E[(X − μₓ)·(Y − μ_y)] and Var(X) = E[(X − μ)²].

One may see some similarities with the definition given in Chapter #2:
• Definition: The sample correlation coefficient r of the data pairs (xᵢ, yᵢ), i = 1, . . . , n, is defined by
r = ∑_{i=1}^{n} (xᵢ − x̄)(yᵢ − ȳ) / [(n − 1) sₓ s_y] = ∑_{i=1}^{n} (xᵢ − x̄)(yᵢ − ȳ) / √(∑_{i=1}^{n} (xᵢ − x̄)² ∑_{i=1}^{n} (yᵢ − ȳ)²).
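A brief sketch (illustrative) computing the covariance and correlation for the joint pmf of Example 4.3g via Cov(X, Y) = E[XY] − E[X]E[Y]:

```python
from math import sqrt

joint = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}  # Example 4.3g

e_x  = sum(x * p for (x, y), p in joint.items())
e_y  = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
var_x = sum(x * x * p for (x, y), p in joint.items()) - e_x ** 2
var_y = sum(y * y * p for (x, y), p in joint.items()) - e_y ** 2

cov = e_xy - e_x * e_y
corr = cov / sqrt(var_x * var_y)
print(round(cov, 3), round(corr, 3))   # 0.1 and about 0.408
```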
Markov’s inequality (PROPOSITION 4.9.1)
If X is a random variable that takes only nonnegative values (X ≥ 0), then for any value a > 0:
P{X ≥ a} ≤ E[X]/a.
Proof: For a continuous random variable X with probability density function f(x), since x ≥ 0,
E[X] = ∫_{−∞}^{∞} x f(x) dx = ∫_0^{∞} x f(x) dx = ∫_0^{a} x f(x) dx + ∫_a^{∞} x f(x) dx ⟹
E[X] ≥ ∫_a^{∞} x f(x) dx.
Because in this last integral x ≥ a, we can bound x from below by a:
E[X] ≥ ∫_a^{∞} a f(x) dx = a ∫_a^{∞} f(x) dx ⟹ E[X] ≥ a P{X ≥ a} ⟹ P{X ≥ a} ≤ E[X]/a.
Chebyshev’s Inequality (PROPOSITION 4.9.2)
If X is a random variable with mean μ and variance σ², then for any value k > 0:
P{|X − μ| ≥ k} ≤ σ²/k²
Proof:
σ² = Var(X) = E[(X − μ)²] ⟺ E[(X − μ)²]/k² = σ²/k².
From Markov's inequality, with the change of variables a → k² and X → (X − μ)²:
P{X ≥ a} ≤ E[X]/a ⟹ P{(X − μ)² ≥ k²} ≤ E[(X − μ)²]/k² = σ²/k² ⟹ (∗)
Since (X − μ)² ≥ k² ⟺ |X − μ| ≥ k, we have P{(X − μ)² ≥ k²} = P{|X − μ| ≥ k}, so
(∗) ⟹ P{|X − μ| ≥ k} ≤ E[(X − μ)²]/k² = σ²/k².
Markov’s inequality (PROPOSITION 4.9.1)
If X is a random variable that takes only nonnegative values (X ≥ 0), then for any value a > 0:
P{X ≥ a} ≤ E[X]/a.

Chebyshev’s Inequality (PROPOSITION 4.9.2)
If X is a random variable with mean μ and variance σ², then for any value k > 0:
P{|X − μ| ≥ k} ≤ σ²/k².
• The importance of Markov's and Chebyshev's inequalities is that they enable us to derive bounds on probabilities when only the mean μ, or both the mean μ and the variance σ², of the probability distribution are known.
• Chebyshev's inequality (with σ ≠ 0) will also be applied below to the sample mean of a growing number n ⟶ ∞ of random variables, since the probability is always bounded: 0 ≤ P{|X − μ| ≥ k} ≤ σ²/k².
EXAMPLE 4.9a. Suppose that it is known that the number of items produced in a factory during a week is a random variable with mean 50.
a) What can be said about the probability that this week's production will exceed 75?
b) If the variance of a week's production is known to equal 25, then what can be said about the probability that this week's production will be between 40 and 60?

Solution: Let X be the number of items that will be produced in a week.

a) By using Markov's inequality, with a = 75 and E[X] = μ = 50, one can get
P{X ≥ 75} ≤ E[X]/a = 50/75 = 2/3 ≈ 0.67.

b) By using Chebyshev's inequality, with σ² = 25, E[X] = μ = 50, and k = 10, one can get
P{|X − 50| ≥ 10} ≤ σ²/k² = 25/10² = 1/4.
Hence
P{|X − 50| < 10} = 1 − P{|X − 50| ≥ 10} ≥ 1 − 1/4 = 3/4 = 0.75.
Answer: The probability that this week's production will be between 40 and 60 is at least 0.75.
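A tiny illustrative sketch of the two bounds in Example 4.9a (just the arithmetic; nothing is assumed about the distribution beyond the given mean and variance):

```python
mu, var = 50.0, 25.0

markov_bound = mu / 75.0              # P{X >= 75} <= 2/3
chebyshev_bound = var / 10.0 ** 2     # P{|X - 50| >= 10} <= 1/4
lower_bound_40_60 = 1.0 - chebyshev_bound

print(round(markov_bound, 3), chebyshev_bound, lower_bound_40_60)  # 0.667 0.25 0.75
```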
The Weak Law of Large Numbers (Theorem 4.9.3)
{X₁, X₂, . . .} is a sequence of independent and identically distributed random variables, each having mean E[X] = μ and Var(X) = σ². Then,
P{|X̄ − μ| > ε} ⟶ 0, or equivalently P{|X̄ − μ| < ε} ⟶ 1, as n ⟶ ∞, for any ε > 0.

X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n is the sample mean (a value calculated from experimental data).
E[X] = μ is the finite true mean (the real value of the mean expected for the sequence X).

Proof:
To prove Theorem 4.9.3 one can use Chebyshev's Inequality, replacing X with X̄:
P{|X̄ − μ| ≥ k} ≤ Var(X̄)/k².

Within this approach one should develop an identity for Var(X̄):
Var(X̄) = Var((X₁ + X₂ + ⋯ + Xₙ)/n) = (1/n²) Var(X₁ + X₂ + ⋯ + Xₙ) = ???
We have already proved Lemma 4.7.1:
Cov(X + Z, Y) = Cov(X, Y) + Cov(Z, Y).
This lemma can be generalized:
Cov(∑_{i=1}^{n} Xᵢ, Y) = ∑_{i=1}^{n} Cov(Xᵢ, Y).   Eq. (4.7.5)
Proposition 4.7.2
Cov(∑_{i=1}^{n} Xᵢ, ∑_{j=1}^{m} Yⱼ) = ∑_{i=1}^{n} ∑_{j=1}^{m} Cov(Xᵢ, Yⱼ)
Proof:
Let Y = ∑_{j=1}^{m} Yⱼ; then from Eq. (4.7.5) one can get
Cov(∑_{i=1}^{n} Xᵢ, ∑_{j=1}^{m} Yⱼ) = Cov(∑_{i=1}^{n} Xᵢ, Y) = ∑_{i=1}^{n} Cov(Xᵢ, Y) = ∑_{i=1}^{n} Cov(Xᵢ, ∑_{j=1}^{m} Yⱼ) =
= (using Eq. 4.7.2: Cov(A, B) = Cov(B, A)) = ∑_{i=1}^{n} Cov(∑_{j=1}^{m} Yⱼ, Xᵢ) = (using Eq. (4.7.5)) =
= ∑_{i=1}^{n} ∑_{j=1}^{m} Cov(Yⱼ, Xᵢ) = (using Eq. 4.7.2 again) = ∑_{i=1}^{n} ∑_{j=1}^{m} Cov(Xᵢ, Yⱼ). ∎
From the definition of covariance:
Cov(X, Y) = E[(X − μₓ)(Y − μ_y)] ⟹
⟹ Cov(X, X) = E[(X − μₓ)²] = Var(X) ⟹ Cov(X, X) = Var(X)   Eq. 4.7.3
One can use Equation 4.7.3 to check Corollary 4.7.3, which is
Var(∑_{i=1}^{n} Xᵢ) = ∑_{i=1}^{n} Var(Xᵢ) + ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} Cov(Xᵢ, Xⱼ)
Check-up: By assuming n = 2, with i = 1, 2, X₁ = X and X₂ = Y in Corollary 4.7.3, one can get
Var(X + Y) = Var(X) + Var(Y) + Cov(X, Y) + Cov(Y, X) = (using Eq. 4.7.2: Cov(A, B) = Cov(B, A)) =
= Var(X) + Var(Y) + 2Cov(X, Y) = (∗)
In the even more particular case of Y = X, one can get
(∗) = Var(X) + Var(X) + 2Cov(X, X) = 2Var(X) + 2Cov(X, X).
Also, we previously demonstrated that
Var(X + X) = Var(2X) = Var(aX) with a = 2, i.e. a² Var(X) = 4Var(X).
In this case, Corollary 4.7.3 reduces to Eq. (4.7.3) as follows:
4Var(X) = 2Var(X) + 2Cov(X, X) ⟹ 4Var(X) − 2Var(X) = 2Cov(X, X) ⟹ Cov(X, X) = Var(X). ∎
Theorem 4.7.4
If X and Y are independent random variables, then
Cov(X, Y) = 0.   (Part I)
Moreover, for a sequence of independent random variables X₁, X₂, . . . , Xₙ:
Var(∑_{i=1}^{n} Xᵢ) = ∑_{i=1}^{n} Var(Xᵢ).   (Part II)

Proof (Part I):
Let us compute E[X·Y] appearing in Eq. 4.7.1, which is Cov(X, Y) = E[X·Y] − E[X]·E[Y]:
E[X·Y] = ∑ⱼ ∑ᵢ xᵢ yⱼ P{X = xᵢ, Y = yⱼ} = (by independence) = ∑ⱼ ∑ᵢ xᵢ yⱼ P{X = xᵢ} P{Y = yⱼ} =
= ∑ⱼ yⱼ P{Y = yⱼ} ∑ᵢ xᵢ P{X = xᵢ} = E[Y]·E[X].
Thus, from Eq. (4.7.1):
Cov(X, Y) = E[X·Y] − E[X]·E[Y] = E[X]·E[Y] − E[X]·E[Y] = 0.
Theorem 4.7.4
If X and Y are independent random variables, then
Cov(X, Y) = 0.   (Part I)
Moreover, for a sequence of independent random variables X₁, X₂, . . . , Xₙ:
Var(∑_{i=1}^{n} Xᵢ) = ∑_{i=1}^{n} Var(Xᵢ).   (Part II)

Proof (Part II): Using Corollary 4.7.3:
Var(∑_{i=1}^{n} Xᵢ) = ∑_{i=1}^{n} Var(Xᵢ) + ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} Cov(Xᵢ, Xⱼ).
For independent variables ∑_{i=1}^{n} ∑_{j=1, j≠i}^{n} Cov(Xᵢ, Xⱼ) = 0, hence
Var(∑_{i=1}^{n} Xᵢ) = ∑_{i=1}^{n} Var(Xᵢ). ∎
The Weak Law of Large Numbers (Theorem 4.9.3)
{X₁, X₂, . . .} is a sequence of independent and identically distributed random variables, each having mean E[X] = μ and Var(X) = σ². Then,
P{|X̄ − μ| > ε} ⟶ 0 as n ⟶ ∞, for any ε > 0.
Proof:
X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n is the sample mean (a value calculated from experimental data).
E[X] = μ is the finite true mean (the real value of the mean expected for the sequence X).

Var(X̄) = Var((X₁ + X₂ + ⋯ + Xₙ)/n) = (1/n²) Var(X₁ + X₂ + ⋯ + Xₙ) = (∗)
By using Theorem 4.7.4, which states that
Var(∑_{i=1}^{n} Xᵢ) = ∑_{i=1}^{n} Var(Xᵢ),
one can continue
(∗) = (1/n²)·Var(∑_{i=1}^{n} Xᵢ) = (1/n²)·∑_{i=1}^{n} Var(Xᵢ) = (1/n²)·n·Var(Xᵢ) = Var(Xᵢ)/n = σ²/n.
The Weak Law of Large Numbers (Theorem 4.9.3)
{X₁, X₂, . . .} is a sequence of independent and identically distributed random variables, each having mean E[X] = μ and Var(X) = σ². Then,
P{|X̄ − μ| > ε} ⟶ 0 as n ⟶ ∞, for any ε > 0.
Proof:
We have demonstrated that Var(X̄) = σ²/n.

Now one can rewrite Chebyshev's Inequality for the independent and identically distributed random variables by replacing X with X̄ = (X₁ + X₂ + ⋯ + Xₙ)/n and k with ε:
P{|X − μ| ≥ k} ≤ Var(X)/k² ⟹ (with Var(Xᵢ) replaced by Var(X̄) = σ²/n) ⟹
P{|X̄ − μ| ≥ ε} ≤ (σ²/n)/ε² = σ²/(ε²n) ⟶ 0 as n → ∞. ∎
The Weak Law of Large Numbers (Theorem 4.9.3)
{X₁, X₂, . . .} is a sequence of independent and identically distributed random variables, each having mean E[Xᵢ] = μ and Var(Xᵢ) = σ². Then,
P{|X̄ − μ| > ε} ⟶ 0 as n ⟶ ∞, for any ε > 0 ⟹
⟹ P{|X̄ − μ| < ε} = 1 − P{|X̄ − μ| > ε} ≥ 1 − σ²/(ε²n) ⟶ 1 as n → ∞.

Conclusion: The constant ε is a fixed positive value characterizing the allowed difference between the sample average X̄ (the experimental mean) and the true mean value μ (the real value of the mean, which might be predicted theoretically). The probability that this difference exceeds ε converges to zero as the experiment continues to a large number n. Thus, the Weak Law of Large Numbers states that for large n there is only a small probability that the sample average X̄ will be far off from the true mean value μ of the sequence X₁, X₂, . . . , Xₙ.

• The weak law of large numbers was originally proven by James Bernoulli for the special case where the random variable is Xᵢ ∈ {0, 1}; it was published in 1713 (eight years after his death).
• The general form of the weak law of large numbers was proved by Alexander Khintchine (1894–1959).
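A hedged simulation sketch of the Weak Law (illustrative; it assumes fair-die rolls with μ = 3.5, purely as an example): the fraction of sample means that land farther than ε from μ shrinks as n grows.

```python
import random

random.seed(1)
mu, eps, trials = 3.5, 0.25, 2000

def far_fraction(n):
    """Fraction of trials where |sample mean of n die rolls - mu| > eps."""
    count = 0
    for _ in range(trials):
        mean = sum(random.randint(1, 6) for _ in range(n)) / n
        if abs(mean - mu) > eps:
            count += 1
    return count / trials

for n in (10, 100, 1000):
    print(n, far_fraction(n))   # the printed fraction decreases toward 0
```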
Two Consequences of Theorem 4.7.4
If X and Y are independent random variables, then
Cov(X, Y) = 0.
Moreover, for a sequence of independent random variables X₁, X₂, . . . , Xₙ:
Var(∑_{i=1}^{n} Xᵢ) = ∑_{i=1}^{n} Var(Xᵢ).

There are two consequences of Theorem 4.7.4 related to the correlation and expectation of independent random variables X and Y:

1) Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)) = 0 / √(Var(X) Var(Y)) = 0, since Cov(X, Y) = 0.

2) Since Cov(X, Y) = 0 and Cov(X, Y) = E[X·Y] − E[X]E[Y], it follows that E[X·Y] = E[X]E[Y].
The Strong Law of Large Numbers
The strong law of large numbers states that the sample average X̄ converges almost surely to the true mean μ, over all random numbers from the space, when n is large:
X̄ ⟶ μ as n ⟶ ∞,
or
P{limₙ→∞ X̄ = μ} = 1.

• The Strong Law of Large Numbers considers all random numbers from the space, while the Weak Law of Large Numbers deals just with a sample space of random numbers.

• If the Strong Law of Large Numbers holds for all random numbers of the space for a particular value μ, then this μ can be used in the description of the Weak Law of Large Numbers applied to any sample mean X̄ from that sample space.
• The reverse is not necessarily true.
