
3. Random Variables and Probability Distributions

1. Random variables

A random variable is a variable that may take various values with uncertainty, where the
uncertainty is measured by probability.

Examples.

Tossing a coin:
X(Heads) = 0, X(Tails) = 1

Tossing a die:
X(1) = 1, X(2) = 2, ..., X(6) = 6

Temperature:
X(2025-01-28 10:30 am) = −10.2 °C

2. Distribution functions
If X is a random variable and x is a number, then the probability of the event [X ≤ x]
can be written as
Pr[X ≤ x]
which is a function of x, so we can write:
F(x) = Pr[X ≤ x],   −∞ < x < ∞

3. Properties of general probability distribution functions
(a) 0 ≤ F(x) ≤ 1, −∞ < x < ∞
(b) if x₁ ≤ x₂, then F(x₁) ≤ F(x₂)
(c) lim_{x→+∞} F(x) = F(∞) = 1,   lim_{x→−∞} F(x) = F(−∞) = 0
(d) lim_{x→x₀⁺} F(x) = F(x₀), i.e., F is right-continuous (as shown in Figure 3.7)

Examples.
The negative exponential distribution is typically used to calculate the probability that an
event will happen within a time period of length x:

F(x) = { 0,              if x < 0
       { 1 − e^(−λx),    if x ≥ 0
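As a numeric sketch, this distribution function can be evaluated directly; the rate λ = 2.0 below is an arbitrary choice for illustration:

```python
import math

def exp_cdf(x, lam):
    """Negative exponential distribution function: F(x) = 1 - e^(-lam*x) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-lam * x)

lam = 2.0  # arbitrary rate, for illustration only
print(exp_cdf(-1.0, lam))  # 0.0: F(x) = 0 for x < 0
print(exp_cdf(1.0, lam))   # probability the event happens within 1 time unit
```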

The binomial distribution is typically used to calculate the probability that at most x out of n
independent trials are successes, where each trial succeeds with probability p (sampling from a
very large population):

F(x) = Σ_{j≤x} C(n, j) p^j q^(n−j) = Σ_{j≤x} [n! / (j!(n−j)!)] p^j q^(n−j)

For the coin-tossing example, p = q = 0.50. For a quality-control process, p is usually much
smaller, such as 1% or less.
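The binomial distribution function can be checked with a short sketch; n = 10 tosses of a fair coin is an illustrative choice:

```python
from math import comb

def binom_cdf(x, n, p):
    """F(x) = sum over j <= x of C(n, j) * p^j * q^(n-j)."""
    q = 1.0 - p
    return sum(comb(n, j) * p**j * q**(n - j) for j in range(int(x) + 1))

# Fair coin, 10 tosses: probability of at most 5 heads
print(binom_cdf(5, 10, 0.5))   # 638/1024 = 0.623046875
# Summing over all j <= n gives 1, as a distribution function must
print(binom_cdf(10, 10, 0.5))  # 1.0
```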

4. Discrete random variables


The random variable can only take discrete values x₀, x₁, x₂, …
The probability function is of the form
p(x) = Pr[X = x]
Properties:
(a) p(x) = 0, unless x is one of x₀, x₁, x₂, …
(b) 0 ≤ p(xᵢ) ≤ 1 for all xᵢ
(c) Σᵢ p(xᵢ) = Σᵢ Pr[X = xᵢ] = 1
The functions F(x) and p(x) are related by:

F(x) = Pr[X ≤ x] = Σ_{xᵢ≤x} Pr[X = xᵢ] = Σ_{xᵢ≤x} p(xᵢ)

Examples are Bernoulli trials, geometric distributions, binomial distributions, Poisson
distributions, etc.
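For the die-tossing example, the relation between p(x) and F(x) can be sketched as follows:

```python
from fractions import Fraction

# pmf of a fair die: p(x_i) = 1/6 for x_i = 1, ..., 6
p = {i: Fraction(1, 6) for i in range(1, 7)}

def F(x):
    """F(x) = sum of p(x_i) over all x_i <= x."""
    return sum(prob for xi, prob in p.items() if xi <= x)

print(sum(p.values()))  # 1, property (c)
print(F(0))             # 0: below the smallest value
print(F(3.5))           # 1/2: the values 1, 2, 3 are <= 3.5
print(F(6))             # 1
```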

5. Continuous random variables
A random variable that may take any value in (−∞, ∞) and whose distribution function F(x)
is: 1) continuous; 2) has a derivative f(x) = dF(x)/dx for all values of x (with possibly some
exceptions); and 3) the derivative is piecewise continuous. F(x) and f(x) are related by:
(a) f(x) = dF(x)/dx
(b) F(x) = F(x) − F(−∞) = ∫_{−∞}^{x} (dF(t)/dt) dt = ∫_{−∞}^{x} f(t) dt
(c) Pr[a < X ≤ b] = F(b) − F(a) = ∫_{a}^{b} f(t) dt

The density function has the following properties:
(i) f(x) = 0 if x is not in the range of X
(ii) f(x) ≥ 0 for all x
(iii) ∫_{−∞}^{∞} f(x) dx = 1
(iv) f(x) is continuous or piecewise continuous

Examples are negative exponential, normal, uniform, Beta and other distributions.
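Relation (c) can be verified numerically. The sketch below uses the negative exponential density with an arbitrary rate λ = 1.5 and a simple midpoint rule for the integral:

```python
import math

lam = 1.5  # arbitrary rate, for illustration

def f(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def F(x):
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

a, b = 0.5, 2.0
via_cdf = F(b) - F(a)  # Pr[a < X <= b] from the distribution function

# The same probability from the integral of f over (a, b], midpoint rule
n = 100_000
h = (b - a) / n
via_integral = sum(f(a + (k + 0.5) * h) for k in range(n)) * h

print(via_cdf)       # e^(-0.75) - e^(-3)
print(via_integral)  # agrees with via_cdf to many decimal places
```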

6. Expectation: The “center” of the distribution


For discrete random variables:

μ = E[X] = Σᵢ xᵢ p(xᵢ)

For continuous random variables:

μ = E[X] = ∫_{−∞}^{∞} x f(x) dx

Properties of E[X]
For constants a and b,
(i) E[aX + b] = aE[X] + b
(ii) E[b] = b

In general:
E[X₁ + X₂ + ⋯ + Xₙ] = E[X₁] + E[X₂] + ⋯ + E[Xₙ]
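These properties can be checked exactly for the die-tossing example; a = 2 and b = 10 below are arbitrary constants:

```python
from fractions import Fraction

# Fair die pmf
p = {i: Fraction(1, 6) for i in range(1, 7)}

E_X = sum(x * prob for x, prob in p.items())
print(E_X)  # 7/2

a, b = 2, 10  # arbitrary constants
E_aXb = sum((a * x + b) * prob for x, prob in p.items())
print(E_aXb)                 # 17
print(E_aXb == a * E_X + b)  # True: E[aX + b] = aE[X] + b
```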

Example 6.1. Bernoulli distribution. Probability of success in 1 experiment while the
population success rate is p:

p(x) = { p,            if x = 1
       { 1 − p = q,    if x = 0

Then:

E[X] = μ = Σᵢ i pᵢ = 0·p₀ + 1·p₁ = 0·q + 1·p = p

Example 6.2. Negative exponential distribution:

F(x) = { 0,              if x < 0
       { 1 − e^(−λx),    if x ≥ 0

f(x) = { 0,              if x < 0
       { λe^(−λx),       if x ≥ 0

E[X] = μ = ∫_{−∞}^{∞} x f(x) dx = ∫_{0}^{∞} x (λe^(−λx)) dx = (1/λ) ∫_{0}^{∞} u e^(−u) du = 1/λ
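The same result can be confirmed numerically; λ = 2.0 is an arbitrary rate, and the improper integral is truncated at a large upper limit:

```python
import math

lam = 2.0                 # arbitrary rate
n, upper = 200_000, 20.0  # truncation point for the improper integral
h = upper / n

# E[X] = integral of x * lam * e^(-lam*x) dx over [0, inf), midpoint rule
mean = 0.0
for k in range(n):
    x = (k + 0.5) * h
    mean += x * lam * math.exp(-lam * x) * h

print(mean)  # approximately 1/lam = 0.5
```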

7. Variances: Measure for variation or dispersion

σ² = Var[X] = E[(X − μ)²] = { ∫_{−∞}^{∞} (x − μ)² f(x) dx,   continuous case
                            { Σᵢ (xᵢ − μ)² p(xᵢ),            discrete case

Properties:
• Var[X] = E[(X − μ)²] = E[X²] − μ², or
• Var[X] = E[(X − μ)²] = E[X²] − (E[X])²
• Var[b] = 0
• Var[X + b] = Var[X]
• Var[aX] = a² Var[X]
• Var[aX + b] = a² Var[X]
For S = X + Y, if X and Y are independent, then we have:
Var[S] = Var[X] + Var[Y]
Standard deviation: σ = √(σ²)
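A quick exact check of these properties, again using the fair-die pmf (a = 3 and b = 7 are arbitrary constants):

```python
from fractions import Fraction

p = {i: Fraction(1, 6) for i in range(1, 7)}  # fair die

def E(g):
    """Expectation of g(X) under the pmf p."""
    return sum(g(x) * prob for x, prob in p.items())

mu = E(lambda x: x)
var = E(lambda x: (x - mu) ** 2)
print(var)                                   # 35/12
print(E(lambda x: x ** 2) - mu ** 2 == var)  # True: Var[X] = E[X^2] - mu^2

a, b = 3, 7  # arbitrary constants
mu2 = E(lambda x: a * x + b)
var2 = E(lambda x: (a * x + b - mu2) ** 2)
print(var2 == a ** 2 * var)                  # True: Var[aX + b] = a^2 Var[X]
```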

Examples.
For the Bernoulli distribution:

p(x) = { p,            if x = 1
       { 1 − p = q,    if x = 0

we have:
E[X] = μ = p,   Var[X] = σ² = p(1 − p)

For the negative exponential distribution:

f(x) = { 0,              if x < 0
       { λe^(−λx),       if x ≥ 0

we have:
E[X] = μ = 1/λ,   Var[X] = σ² = 1/λ²
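The Bernoulli results can be verified directly from the definition; p = 0.3 below is an arbitrary success rate:

```python
p = 0.3  # arbitrary success rate
q = 1.0 - p

mu = 0 * q + 1 * p                            # E[X] = p
var = (0 - mu) ** 2 * q + (1 - mu) ** 2 * p   # Var[X] = E[(X - mu)^2]

print(mu)   # equals p
print(var)  # equals p(1 - p) = 0.21, up to rounding
```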

4. Sets of Random Variables

1. Pairs of Random Variables

Two random variables associated with the same random experiment.

Examples
• The height and weight of a randomly selected individual
• Time interval between the first and the second arrivals and time interval between the
second and the third arrivals
• Speed and acceleration of a vehicle randomly encountered
• Others

2. Distribution functions
The probability distribution functions for the events A₁ = {s : X(s) ≤ x} and
A₂ = {s : Y(s) ≤ y} can be written as
F_X(x) = Pr[X ≤ x] = Pr[A₁]
F_Y(y) = Pr[Y ≤ y] = Pr[A₂]
and
F(x, y) = Pr[X ≤ x and Y ≤ y] = Pr[A₁ ∩ A₂]
which is a function of x and y.

Example 4.1.
X is the time interval between the first and the second arrivals, Y is the time interval
between the second and the third arrivals, with the joint distribution function:

F(x, y) = { 1 − e^(−ax) − e^(−ay) + e^(−a(x+y)),   0 < x < ∞, 0 < y < ∞
          { 0,                                      otherwise

3. Independence
If A₁ and A₂ are independent for every x and y, then X and Y are independent random
variables. We then have:

Pr[A₁ ∩ A₂] = Pr[A₁] Pr[A₂]

or
Pr[X ≤ x and Y ≤ y] = Pr[X ≤ x] Pr[Y ≤ y]
or
F(x, y) = F_X(x) F_Y(y)

If a joint distribution can be factored in this form, then the two random variables are
independent.

Example 4.1 (continued).

F(x, y) = { 1 − e^(−ax) − e^(−ay) + e^(−a(x+y)),   0 < x < ∞, 0 < y < ∞
          { 0,                                      otherwise

F(x, y) = { (1 − e^(−ax))(1 − e^(−ay)),   0 < x < ∞, 0 < y < ∞
          { 0,                             otherwise

and we can define

u(x) = { 1 − e^(−ax),   0 < x < ∞        v(y) = { 1 − e^(−ay),   0 < y < ∞
       { 0,             otherwise               { 0,             otherwise

We conclude that X and Y are independent, with u(x) = F_X(x) and v(y) = F_Y(y).
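The factorization can be checked numerically at a few points; a = 1.0 is an arbitrary parameter:

```python
import math

a = 1.0  # arbitrary parameter

def F_joint(x, y):
    """Joint CDF from Example 4.1."""
    if x <= 0 or y <= 0:
        return 0.0
    return 1 - math.exp(-a * x) - math.exp(-a * y) + math.exp(-a * (x + y))

def u(x):  # candidate marginal F_X
    return 1 - math.exp(-a * x) if x > 0 else 0.0

def v(y):  # candidate marginal F_Y
    return 1 - math.exp(-a * y) if y > 0 else 0.0

for x, y in [(0.5, 1.0), (2.0, 3.0), (0.1, 4.0)]:
    print(abs(F_joint(x, y) - u(x) * v(y)) < 1e-12)  # True at every point
```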

4. Discrete case
Assume that random variables X and Y can only take integer values
1, 2, 3, ..., i − 1, i, i + 1, …; we can define:
p(i, j) = Pr[X = i and Y = j]
Properties:
(a) 0 ≤ p(i, j) ≤ 1, for each i, j
(b) Σ_{i,j} p(i, j) = Σᵢ Σⱼ p(i, j) = 1
(c) F(x, y) = Σ_{i≤x} Σ_{j≤y} p(i, j)
If pᵢ = Pr[X = i] and qⱼ = Pr[Y = j], then
pᵢ = Σⱼ Pr[X = i and Y = j] = Σⱼ p(i, j), and
qⱼ = Σᵢ Pr[X = i and Y = j] = Σᵢ p(i, j)
If X and Y are independent, then
p(i, j) = Pr[X = i and Y = j] = Pr[X = i] Pr[Y = j] = pᵢ qⱼ
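A small sketch with a hypothetical joint pmf (the numbers below are made up so that the factorization holds):

```python
from fractions import Fraction

# Hypothetical joint pmf p(i, j) on i, j in {1, 2}
p_joint = {(1, 1): Fraction(1, 8), (1, 2): Fraction(3, 8),
           (2, 1): Fraction(1, 8), (2, 2): Fraction(3, 8)}

print(sum(p_joint.values()))  # 1, property (b)

# Marginals: p_i = sum over j of p(i, j), q_j = sum over i of p(i, j)
p_i = {i: sum(v for (a, b), v in p_joint.items() if a == i) for i in (1, 2)}
q_j = {j: sum(v for (a, b), v in p_joint.items() if b == j) for j in (1, 2)}
print(p_i[1], p_i[2])  # 1/2 1/2
print(q_j[1], q_j[2])  # 1/4 3/4

# Independence: p(i, j) = p_i * q_j for every cell
print(all(p_joint[i, j] == p_i[i] * q_j[j] for i in (1, 2) for j in (1, 2)))  # True
```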

5. Testing for independence of discrete random variables
For n pieces of data from 2 discrete random variables X and Y, there are r possible
values for X and c possible values for Y, arranged in the following table:

                 Y:   y₁     y₂     y₃     …     y_c
    X
    x₁               n₁₁    n₁₂    n₁₃    …     n₁c
    x₂               n₂₁    n₂₂    n₂₃    …     n₂c
    x₃               n₃₁    n₃₂    n₃₃    …     n₃c
    …                …      …      …      …     …
    x_r              n_r1   n_r2   n_r3   …     n_rc

We can calculate the relative frequency as the estimate p̂₁ of the probability that X takes
the value x₁:

p̂₁ = (n₁₁ + n₁₂ + n₁₃ + ⋯ + n₁c)/n = Σ_{j=1..c} n₁ⱼ/n        (*)

Similarly, we can calculate the estimate for each xᵢ:

p̂ᵢ = (nᵢ₁ + nᵢ₂ + nᵢ₃ + ⋯ + nᵢc)/n = Σ_{j=1..c} nᵢⱼ/n,   i = 1, 2, ..., r

We can also calculate the probability estimate q̂ⱼ for each yⱼ:

q̂ⱼ = (n₁ⱼ + n₂ⱼ + n₃ⱼ + ⋯ + n_rⱼ)/n = Σ_{i=1..r} nᵢⱼ/n,   j = 1, 2, ..., c        (**)

The ratio nᵢⱼ/n is an estimate of

p(i, j) = Pr[X = i and Y = j]

If X and Y are independent, we should have p(i, j) = pᵢ qⱼ. So we want to compare p̂ᵢ q̂ⱼ
and nᵢⱼ/n to see if they are close: p̂ᵢ q̂ⱼ ≈ nᵢⱼ/n, or nᵢⱼ ≈ n p̂ᵢ q̂ⱼ. If p̂ᵢ q̂ⱼ and nᵢⱼ/n are
not close, we cannot assume that X and Y are independent. To measure the closeness
of these 2 values, we can calculate the statistic:

Rᵢⱼ = (nᵢⱼ − n p̂ᵢ q̂ⱼ)² / (n p̂ᵢ q̂ⱼ)

and use the χ² test by letting

χ² = Σ_{i=1..r} Σ_{j=1..c} Rᵢⱼ

If the value of χ² is less than a given threshold value, we may assume that the two
variables are independent at the corresponding confidence level.
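The test statistic can be sketched as a small function over an r × c table of counts:

```python
def chi_square_stat(table):
    """chi^2 = sum over i, j of (n_ij - n p_i q_j)^2 / (n p_i q_j)
    for an r x c table of counts, with p_i and q_j estimated from the margins."""
    n = sum(sum(row) for row in table)
    r, c = len(table), len(table[0])
    p_hat = [sum(table[i]) / n for i in range(r)]                       # row estimates
    q_hat = [sum(table[i][j] for i in range(r)) / n for j in range(c)]  # column estimates
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = n * p_hat[i] * q_hat[j]
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# A perfectly proportional table gives chi^2 = 0 (illustrative counts)
print(chi_square_stat([[10, 30], [20, 60]]))   # ~0, up to floating-point error
# A strongly dependent table gives a large chi^2
print(chi_square_stat([[50, 0], [0, 50]]))     # 100.0
```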

Example 4.16.
Data on 5140 phone calls to a police dispatcher were collected; 415 of the 5140 calls
were for ambulance services. Table 4.6 shows the types of the phone calls and of the
calls immediately following them. We are interested in whether the 2 random variables,
the type of a call at any time and the type of the immediately following call, are
independent.

As shown in Table 4.6, there are 42 cases in which an ambulance call is followed by
another ambulance call, and 373 cases in which an ambulance call is followed by a non-
ambulance call. Following equations (*) and (**), we can calculate p̂₁, p̂₂, q̂₁ and q̂₂:

                          Type of call n+1
    Type of call n     0 (j = 1)    1 (j = 2)    Total
    0 (i = 1)              42          373         415    p̂₁ = 0.08074
    1 (i = 2)             373         4352        4725    p̂₂ = 0.91926
    Total                 415         4725        5140
                     q̂₁ = 0.08074   q̂₂ = 0.91926

The estimated counts n p̂ᵢ q̂ⱼ assuming independence are:

                          Type of call n+1
    Type of call n     0 (j = 1)               1 (j = 2)
    0 (i = 1)          n p̂₁ q̂₁ = 33.506       n p̂₁ q̂₂ = 381.493      p̂₁ = 0.08074
    1 (i = 2)          n p̂₂ q̂₁ = 381.493      n p̂₂ q̂₂ = 4343.507     p̂₂ = 0.91926
                       q̂₁ = 0.08074            q̂₂ = 0.91926

We then calculate Rᵢⱼ = (nᵢⱼ − n p̂ᵢ q̂ⱼ)² / (n p̂ᵢ q̂ⱼ):

    Rᵢⱼ            0 (j = 1)    1 (j = 2)
    0 (i = 1)        2.1528       0.1891
    1 (i = 2)        0.1891       0.0166

and χ² = Σ_{i=1..r} Σ_{j=1..c} Rᵢⱼ = 2.5476.

From Table B.3, for 1 degree of freedom, the critical values at the 0.05 and 0.01
significance levels are χ²₁,₀.₀₅ = 3.84 and χ²₁,₀.₀₁ = 6.64. Since 2.5476 < 3.84, we may
assume these 2 variables are independent.
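The whole calculation in this example can be reproduced from the counts in Table 4.6:

```python
# Counts from Table 4.6: rows = type of call n, columns = type of call n+1
# (0 = ambulance, 1 = non-ambulance)
table = [[42, 373], [373, 4352]]
n = sum(sum(row) for row in table)             # 5140
p_hat = [sum(row) / n for row in table]        # 0.08074, 0.91926
q_hat = [(table[0][j] + table[1][j]) / n for j in range(2)]

chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = n * p_hat[i] * q_hat[j]     # 33.506, 381.493, 381.493, 4343.507
        chi2 += (table[i][j] - expected) ** 2 / expected

print(round(chi2, 4))  # 2.5476, matching the text
print(chi2 < 3.84)     # True: we may assume independence at the 0.05 level
```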

Example 4.17
Gold particles were counted in a small area of liquid in an experiment on Brownian
motion. The particles in the area of observation were counted twice, with the data
collected as shown in Table 4.7.

From this table, we can calculate Rᵢⱼ = (nᵢⱼ − n p̂ᵢ q̂ⱼ)² / (n p̂ᵢ q̂ⱼ) and χ² = Σ_{i=1..r} Σ_{j=1..c} Rᵢⱼ
to determine whether the results from the first and second counts are independent. Since

R₀₀ = (n₀₀ − n p̂₀ q̂₀)² / (n p̂₀ q̂₀) = (210 − 92.8)² / 92.8 = 148.0

and, from Table B.3, χ²₍₆₋₁₎ₓ₍₅₋₁₎ = χ²₂₀ = 37.57 at the 0.01 significance level, this single
term already exceeds the critical value, so we can conclude that they are not independent.

