
c2 RVs Distribution

This document discusses random variables and their distributions, defining key concepts such as random variables, distribution functions, and their properties. It includes examples of discrete and continuous random variables, the law of averages versus the law of large numbers, and the implications of gambler's fallacy. Additionally, it introduces indicator functions and the probability mass and density functions associated with random variables.

RANDOM VARIABLES AND THEIR DISTRIBUTIONS
CHAPTER-2
CS6015-LINEAR ALGEBRA AND RANDOM PROCESSES
• Random variable definition : A random variable is a function
$X : \Omega \to \mathbb{R}$ with the property that
$\{w \in \Omega : X(w) \le x\} \in \mathcal{F}$ for each
$x \in \mathbb{R}$. Such a function is said to be $\mathcal{F}$-measurable.
• We shall always use upper-case letters, such as 𝑋, 𝑌, and 𝑍, to
represent generic random variables, whilst lowercase letters, such
as 𝑥, 𝑦, and 𝑧, will be used to represent possible numerical values
of these variables.
• Every random variable has a distribution function.
• Distribution function definition : The distribution function of a
random variable $X$ is the function $F : \mathbb{R} \to [0, 1]$ given by
$F(x) = P(X \le x)$, the probability that $X(w) \le x$.
• Events written as $\{w \in \Omega : X(w) \le x\}$ are commonly abbreviated to
$\{w : X(w) \le x\}$ or $\{X \le x\}$.
• Writing $A(x) = \{X \le x\}$ for this event, we have $F(x) = P(A(x))$.
Example
• A fair coin is tossed twice: $\Omega = \{HH, HT, TH, TT\}$. For $w \in \Omega$,
let $X(w)$ be the number of heads, so that
$X(HH) = 2$, $X(HT) = X(TH) = 1$, $X(TT) = 0$.

• Now suppose that a gambler wagers his fortune of £1 on the
result of this experiment. He gambles cumulatively so that his
fortune is doubled each time a head appears, and is annihilated
on the appearance of a tail. His subsequent fortune $W$ is a
random variable given by
$W(HH) = 4$, $W(HT) = W(TH) = W(TT) = 0$.

• The distribution function $F_X$ of $X$ is given by
$$F_X(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1/4 & \text{if } 0 \le x < 1, \\ 3/4 & \text{if } 1 \le x < 2, \\ 1 & \text{if } x \ge 2. \end{cases}$$
The distribution function of a random variable X tells us about the
values taken by X and their relative likelihoods,
rather than about the sample space and the collection of events.

• The distribution function $F_W$ of $W$ is given by
$$F_W(x) = \begin{cases} 0 & \text{if } x < 0, \\ 3/4 & \text{if } 0 \le x < 4, \\ 1 & \text{if } x \ge 4. \end{cases}$$
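The two distribution functions above can be checked by summing probabilities over the sample space. The following is a minimal sketch, not part of the original slides; the helper names `cdf`, `X` and `W` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin; each outcome has probability 1/4.
omega = ["".join(t) for t in product("HT", repeat=2)]
prob = {w: Fraction(1, 4) for w in omega}

def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")

def W(w):
    """Gambler's fortune: starts at 1, doubles on a head, drops to 0 on a tail."""
    fortune = 1
    for toss in w:
        fortune = fortune * 2 if toss == "H" else 0
    return fortune

def cdf(rv, x):
    """F(x) = P(rv <= x), summed over all outcomes w with rv(w) <= x."""
    return sum(p for w, p in prob.items() if rv(w) <= x)

for x in (-1, 0, 1, 2, 4):
    print(x, cdf(X, x), cdf(W, x))
```

Using exact fractions rather than floats keeps the values directly comparable with the piecewise formulas.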
Lemma :
1. A distribution function $F$ has the following properties:
$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1.$$

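Proof : Part 1 : Let $B_n = \{w \in \Omega : X(w) \le -n\} = \{X \le -n\}$.
The sequence $B_1, B_2, \ldots$ is decreasing with the empty set as limit,
i.e., $B_1 \supseteq B_2 \supseteq B_3 \supseteq \cdots$ and
$$B = \bigcap_{i=1}^{\infty} B_i = \lim_{i \to \infty} B_i = \emptyset.$$
(From chapter 1 we know that if $B_1, B_2, \ldots$ is a decreasing
sequence of events, so that $B_1 \supseteq B_2 \supseteq \cdots$, and $B$ is written
for their limit $\bigcap_{i=1}^{\infty} B_i$, then $P(B) = \lim_{i \to \infty} P(B_i)$.)
Now $P(B_n) = F(-n)$, so $\lim_{n \to \infty} F(-n) = P(B) = P(\emptyset) = 0$.
Hence $\lim_{x \to -\infty} F(x) = 0$.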
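• Part 2 : Let $A_n = \{X \le n\}$.
The sequence $A_1, A_2, \ldots$ is increasing,
i.e., $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$, with limit
$$A = \bigcup_{i=1}^{\infty} A_i = \Omega.$$
So $P(A) = \lim_{n \to \infty} P(A_n) = P(\Omega) = 1$.
But $P(A_n) = F(n)$, so $\lim_{n \to \infty} F(n) = 1$.
Hence $\lim_{x \to \infty} F(x) = 1$.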
Lemma :
2. If $x \le y$, then $F(x) \le F(y)$.

Proof :
Let $A(x) = \{X \le x\}$ and $A(x, y) = \{x < X \le y\}$.
Then $A(y) = A(x) \cup A(x, y)$ is a disjoint union.
So $P(A(y)) = P(A(x)) + P(A(x, y))$,
giving $F(y) = F(x) + P(x < X \le y) \ge F(x)$.

2.1) $F$ is right-continuous, that is, $F(x + h) \to F(x)$ as $h \downarrow 0$.


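Before going to the next lemma, recall that for events $A_1, A_2, \ldots$
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \lim_{n \to \infty} P\Big(\bigcup_{i=1}^{n} A_i\Big).$$

Proof :
Let $B_1 = A_1$, $B_2 = A_2 \setminus A_1$, $B_3 = A_3 \setminus (A_2 \cup A_1)$, ...
Then $B_i \cap B_j = \emptyset$ for $i \ne j$, and
$$\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i \text{ for each } n, \qquad \bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i.$$
Since the $B_i$ are disjoint,
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = P\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{\infty} P(B_i) = \lim_{n \to \infty} \sum_{i=1}^{n} P(B_i) = \lim_{n \to \infty} P\Big(\bigcup_{i=1}^{n} B_i\Big) = \lim_{n \to \infty} P\Big(\bigcup_{i=1}^{n} A_i\Big).$$
Thus, $P\big(\bigcup_{i=1}^{\infty} A_i\big) = \lim_{n \to \infty} P\big(\bigcup_{i=1}^{n} A_i\big)$.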
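• Constant R.V : The simplest random variable takes a constant value on
the whole domain $\Omega$. Let $c \in \mathbb{R}$ and define
$X : \Omega \to \mathbb{R}$ by $X(w) = c$ for all $w \in \Omega$.
Its distribution function is the step function
$$F(x) = \begin{cases} 0 & \text{if } x < c, \\ 1 & \text{if } x \ge c. \end{cases}$$
More generally, we call $X$ constant (almost surely) if
there exists $c \in \mathbb{R}$ such that $P(X = c) = 1$.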

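• Bernoulli R.V : Let $\Omega = \{H, T\}$ and let $X : \Omega \to \mathbb{R}$ be given by
$X(H) = 1$, $X(T) = 0$, with $P(X = 1) = p$. Then
$X$ is the simplest non-trivial random variable, having two possible
values, 0 and 1. It has the Bernoulli distribution, Bern($p$), and its
distribution function $F(x) = P(X \le x)$ is:
$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 - p & \text{if } 0 \le x < 1, \\ 1 & \text{if } x \ge 1. \end{cases}$$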
Indicator functions
• Let $A$ be an event and let $I_A : \Omega \to \mathbb{R}$ be the
indicator function of $A$; that is,
$$I_A(w) = \begin{cases} 1 & \text{if } w \in A, \\ 0 & \text{if } w \in A^c. \end{cases}$$
• Then $I_A$ is a Bernoulli random variable taking the
values 1 and 0 with probabilities $P(A)$ and
$P(A^c)$ respectively.
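As a small illustration of the indicator function, the sketch below reuses the two-toss sample space from the earlier example. The event A ("at least one head") is a hypothetical choice made here and is not taken from the slides.

```python
from fractions import Fraction

# Two tosses of a fair coin: four equally likely outcomes.
omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}

# Hypothetical event for illustration: at least one head.
A = {"HH", "HT", "TH"}

def indicator(event):
    """Return the indicator function I_event: 1 on the event, 0 off it."""
    return lambda w: 1 if w in event else 0

I_A = indicator(A)

# I_A takes only the values 1 and 0, with probabilities P(A) and P(A^c).
p_one = sum(p for w, p in prob.items() if I_A(w) == 1)
p_zero = sum(p for w, p in prob.items() if I_A(w) == 0)
print(p_one, p_zero)
```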
Properties of Distribution function
Lemma :
Let $F$ be the distribution function of $X$. Then,
• $P(X > x) = 1 - F(x)$
• $P(x < X \le y) = F(y) - F(x)$
• $P(X = x) = F(x) - \lim_{y \uparrow x} F(y)$
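The three identities can be checked on the number of heads X in two fair tosses from the earlier example. This is a sketch only; for a step function the left limit of F at x equals P(X < x), which the helper `F_left` (a name introduced here) computes directly.

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum(p for v, p in pmf.items() if v <= x)

def F_left(x):
    """Left limit of F at x, which equals P(X < x) for a discrete X."""
    return sum(p for v, p in pmf.items() if v < x)

p_greater_1 = 1 - F(1)        # P(X > 1)
p_interval = F(2) - F(0)      # P(0 < X <= 2)
p_equal_1 = F(1) - F_left(1)  # P(X = 1)
print(p_greater_1, p_interval, p_equal_1)
```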
The law of averages
• The law of averages is the popular belief that a particular outcome or
event is inevitable or certain simply because it is statistically
possible. This notion can lead to the gambler’s fallacy, in which
one becomes convinced that a particular outcome must
come soon simply because it has not occurred recently.

• In the gambler’s fallacy, the gambler believes that a particular
outcome is more likely because it has not happened
recently, or (conversely) that because a particular outcome
has recently occurred, it will be less likely in the immediate
future.
Example
• A common example of how the law of averages can mislead
involves the tossing of a fair coin (a coin equally likely to
come up heads or tails on any given toss).
• If someone tosses a fair coin and gets several heads in a row,
that person might think that the next toss is more likely to
come up tails than heads in order to "even things out."
• But the true probabilities of the two outcomes are still equal
for the next coin toss and any coin toss that might follow.
• Past results have no effect whatsoever: Each toss is an
independent event.
• The law of large numbers is often confused with the law of
averages, and many texts use the two terms interchangeably.
However, the law of averages, strictly defined, is not a law at
all, but a logic error that is sometimes referred to as the
gambler’s fallacy.
• The law of averages is not a mathematical principle, whereas
the law of large numbers is.
• In probability theory, the law of large numbers is a theorem
that describes the result of performing the same experiment a
large number of times.
• According to the law, the average of the results obtained from
a large number of trials should be close to the expected value,
and will tend to become closer as more trials are performed.
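The contrast between the two "laws" can be seen in a short simulation. The sketch below is illustrative only: the seed, the number of tosses and the streak length of three are arbitrary choices made here. It tracks the running proportion of heads (law of large numbers) and the proportion of heads immediately after three heads in a row (which the gambler's fallacy would expect to be lower).

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 100_000
tosses = [random.randint(0, 1) for _ in range(n)]  # 1 = head, 0 = tail

# Running proportion of heads at a few checkpoints.
heads = 0
running = {}
for i, t in enumerate(tosses, start=1):
    heads += t
    if i in (10, 100, 1_000, 10_000, 100_000):
        running[i] = heads / i

# Proportion of heads on the toss that follows three heads in a row.
after_streak = [tosses[i] for i in range(3, n) if tosses[i - 3:i] == [1, 1, 1]]
p_after_streak = sum(after_streak) / len(after_streak)

print(running)
print(len(after_streak), p_after_streak)
```

Both the long-run proportion and the post-streak proportion should come out close to 1/2, since each toss is independent of the past.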
Discrete and Continuous R.V.s
(just the definitions)
• The random variable $X$ is called discrete if it takes values only in
some countable subset $\{x_1, x_2, \ldots\}$ of $\mathbb{R}$. The discrete
random variable $X$ has (probability) mass function (PMF)
$f : \mathbb{R} \to [0, 1]$ given by
$f(x) = P(X = x)$.

• The random variable $X$ is called continuous if its distribution
function (CDF) can be expressed as
$$F(x) = \int_{-\infty}^{x} f(u)\,du, \qquad x \in \mathbb{R},$$
for some integrable function $f : \mathbb{R} \to [0, \infty)$ called the
(probability) density function (PDF) of $X$, with $f = dF/dx$
wherever the derivative exists.
Example: if the sample space is the set of possible numbers rolled on
two dice, and the random variable of interest is the sum S of the
numbers on the two dice, then S is a discrete random variable whose
distribution is described by its probability mass function (PMF).
[Figure: PMF of S, plotted as the heights of columns. Source: Wikipedia]
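The PMF of the sum S can be computed by enumerating the 36 equally likely rolls. The following sketch (assuming two fair six-sided dice) prints a rough text version of the column plot.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# PMF of S: f(s) = P(S = s).
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    # One '#' per favourable roll, so column length is proportional to f(s).
    print(f"{s:2d} {str(p):>5} " + "#" * counts[s])
```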

PDF
• Distribution function definition : The distribution function (CDF) of
a random variable $X$ is the function $F : \mathbb{R} \to [0, 1]$ given by
$F(x) = P(X \le x)$, the probability that $X(w) \le x$.
• The (probability) mass function (PMF) $f : \mathbb{R} \to [0, 1]$ of a discrete $X$ is given by
$f(x) = P(X = x)$.
• A continuous $X$ has
$$F(x) = \int_{-\infty}^{x} f(u)\,du, \quad x \in \mathbb{R}, \qquad f = \frac{dF}{dx},$$
for some integrable function $f : \mathbb{R} \to [0, \infty)$ called the
(probability) density function (PDF) of continuous $X$.
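The relation between a density and its distribution function can be checked numerically. The sketch below assumes, purely for illustration, the density f(u) = e^{-u} for u ≥ 0 (and 0 otherwise), which is not a distribution discussed in these slides; its distribution function is F(x) = 1 - e^{-x} for x ≥ 0.

```python
import math

def f(u):
    """Illustrative density (an assumption): e^{-u} for u >= 0, else 0."""
    return math.exp(-u) if u >= 0 else 0.0

def cdf_exact(x):
    """Closed form F(x) = 1 - e^{-x} for x >= 0, else 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def cdf_numeric(x, steps=10_000):
    """Approximate the integral of f up to x with the midpoint rule.
    f vanishes below 0, so the integration starts at 0."""
    if x <= 0:
        return 0.0
    h = x / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) * h

def density_from_cdf(x, h=1e-3):
    """Central-difference estimate of dF/dx, to compare with f(x)."""
    return (cdf_exact(x + h) - cdf_exact(x - h)) / (2 * h)

for x in (0.5, 1.0, 5.0):
    print(x, cdf_numeric(x), cdf_exact(x), density_from_cdf(x), f(x))
```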
Random Vectors
• Suppose that $X$ and $Y$ are random variables on the
probability space $(\Omega, \mathcal{F}, P)$. Their distribution functions,
$F_X$ and $F_Y$, contain information about their associated
probabilities.
• But how may we encapsulate information about their
properties relative to each other?
• The key is to think of $X$ and $Y$ as being the components of
a 'random vector' $(X, Y)$ taking values in $\mathbb{R}^2$, rather than
being unrelated random variables each taking values in $\mathbb{R}$.
Example: Coin Tossing
• Suppose that we toss a coin $n$ times, and set
$X_i$ equal to 0 or 1 depending on whether the $i$th
toss results in a tail or a head.

• We think of the vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ as
describing the result of this composite experiment.
The total number of heads is the sum of the entries
in $\mathbf{X}$.
Joint Distribution Function
• An individual random variable $X$ has a distribution
function $F_X$ defined by $F_X(x) = P(X \le x)$ for $x \in \mathbb{R}$.
• The corresponding 'joint' distribution function of a
random vector $(X_1, X_2, \ldots, X_n)$ is the quantity
$P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$, a function of $n$ real
variables $x_1, x_2, \ldots, x_n$.
• In order to aid the notation, we introduce an ordering
of vectors of real numbers: for vectors
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ we write
$\mathbf{x} \le \mathbf{y}$ if $x_i \le y_i$ for each $i = 1, 2, \ldots, n$.
Definition and Properties of Joint
Distribution Function
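• The joint distribution function of a random vector
$\mathbf{X} = (X_1, X_2, \ldots, X_n)$ on the probability space $(\Omega, \mathcal{F}, P)$ is the
function $F_{\mathbf{X}} : \mathbb{R}^n \to [0, 1]$ given by
$F_{\mathbf{X}}(\mathbf{x}) = P(\mathbf{X} \le \mathbf{x})$ for $\mathbf{x} \in \mathbb{R}^n$.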
Lemma :
• The joint distribution function $F_{X,Y}$ of the random vector $(X, Y)$ has
properties similar to those of ordinary distribution functions:
1. $\lim_{x, y \to -\infty} F_{X,Y}(x, y) = 0$ and $\lim_{x, y \to \infty} F_{X,Y}(x, y) = 1$.

2. If $(x_1, y_1) \le (x_2, y_2)$ then $F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2)$.

3. $F_{X,Y}$ is continuous from above, in that
$F_{X,Y}(x + u, y + v) \to F_{X,Y}(x, y)$ as $u, v \downarrow 0$.
• Note: The individual distribution functions of $X$ and $Y$
can be recaptured from a knowledge of their joint
distribution function: $F_X(x) = \lim_{y \to \infty} F_{X,Y}(x, y)$ and
$F_Y(y) = \lim_{x \to \infty} F_{X,Y}(x, y)$.
• The converse is false: it is not generally possible to
calculate $F_{X,Y}$ from a knowledge of $F_X$ and $F_Y$ alone.
• The functions $F_X$ and $F_Y$ are called the 'marginal'
distribution functions of $F_{X,Y}$.
Example
• A schoolteacher asks each member of his or her class to flip a
fair coin twice and to record the outcomes.
• The diligent pupil D does this and records a pair $(X_D, Y_D)$ of
outcomes. The lazy pupil L flips the coin only once and writes
down the result twice, thus recording a pair $(X_L, Y_L)$ where
$X_L = Y_L$.
• Clearly $X_D, Y_D, X_L, Y_L$ are random variables with the same
distribution functions. However, the pairs $(X_D, Y_D)$ and
$(X_L, Y_L)$ have different joint distribution functions.
• In particular, $P(X_D = Y_D = \text{heads}) = \frac{1}{4}$, since
only one of the four possible pairs of outcomes contains heads only,
whereas $P(X_L = Y_L = \text{heads}) = \frac{1}{2}$.
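The two pupils' records can be written out as joint mass functions to confirm that equal marginals do not force equal joint distributions. This is a minimal sketch; the names `joint_D`, `joint_L` and the two marginal helpers are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
pairs = list(product("HT", repeat=2))

# Diligent pupil: two independent fair tosses, each pair has probability 1/4.
joint_D = {(x, y): half * half for x, y in pairs}

# Lazy pupil: one toss written down twice, so only (H, H) and (T, T) can occur.
joint_L = {(x, y): (half if x == y else Fraction(0)) for x, y in pairs}

def marginal_X(joint):
    """Sum the joint mass function over y to get the mass function of X."""
    out = {}
    for (x, _), p in joint.items():
        out[x] = out.get(x, 0) + p
    return out

def marginal_Y(joint):
    """Sum the joint mass function over x to get the mass function of Y."""
    out = {}
    for (_, y), p in joint.items():
        out[y] = out.get(y, 0) + p
    return out

print(marginal_X(joint_D), marginal_X(joint_L))
print(joint_D[("H", "H")], joint_L[("H", "H")])
```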
• The random variables $X$ and $Y$ on the probability space $(\Omega, \mathcal{F}, P)$
are called (jointly) discrete if the vector $(X, Y)$ takes values in
some countable subset of $\mathbb{R}^2$ only. The jointly discrete random
variables $X, Y$ have joint (probability) mass function
$f : \mathbb{R}^2 \to [0, 1]$ given by $f(x, y) = P(X = x, Y = y)$.

• The random variables $X$ and $Y$ on the probability space $(\Omega, \mathcal{F}, P)$
are called (jointly) continuous if their joint distribution function
can be expressed as
$$F_{X,Y}(x, y) = \int_{u=-\infty}^{x} \int_{v=-\infty}^{y} f(u, v)\,du\,dv, \qquad x, y \in \mathbb{R},$$
for some integrable function $f : \mathbb{R}^2 \to [0, \infty)$ called the joint
(probability) density function of the pair $(X, Y)$.
Monte Carlo Simulation (MCS)
• 'Monte Carlo simulation' is used to describe a method for
propagating uncertainties in model inputs into uncertainties
in model outputs (results).
• Hence, it is a type of simulation that explicitly and
quantitatively represents uncertainties.
• Monte Carlo simulation relies on the process of explicitly
representing uncertainties by specifying inputs as probability
distributions. If the inputs describing a system are uncertain,
the prediction of future performance is necessarily
uncertain.
• That is, the result of any analysis based on inputs
represented by probability distributions is itself a probability
distribution.
• Compared to deterministic analysis, the Monte Carlo method
provides a superior simulation of risk. It gives an idea of not
only what outcome to expect but also the probability of
occurrence of that outcome.

• An alternative explanation: when you develop a forecasting model
(any model that plans ahead for the future), you make certain
assumptions.
• Because these are projections into the future, the best you can
do is estimate the expected value. Based on historical data,
expertise in the field, or past experience, you can draw up an
estimate. While this estimate is useful for developing a model, it
contains some inherent uncertainty and risk, because it is an
estimate of an unknown value.
In telecommunications, when planning a wireless network,
design must be proved to work for a wide variety of scenarios that
depend mainly on the number of users, their locations and the
services they want to use. Monte Carlo methods are typically used to
generate these users and their states. The network performance is
then evaluated and, if results are not satisfactory, the network
design goes through an optimization process.
In autonomous robotics, Monte Carlo localization can
determine the position of a robot. It is often applied to stochastic
filters, such as the Kalman filter or particle filter, that form the heart
of the SLAM (simultaneous localization and mapping) algorithm.
Path tracing, occasionally referred to as Monte Carlo ray
tracing, renders a 3D scene by randomly tracing samples of possible
light paths. Repeated sampling of any given pixel will eventually
cause the average of the samples to converge on the correct solution
of the rendering equation, making it one of the most physically
accurate 3D graphics rendering methods.
Monte Carlo methods have been developed into a technique
called Monte-Carlo tree search that is useful for searching for the
best move in a game. Possible moves are organized in a search tree
and a large number of random simulations are used to estimate the
long-term potential of each move. A black box simulator represents
the opponent's moves.
• In some cases, it's possible to estimate a range of values. In a
construction project, you might estimate the time it will take to
complete a particular job; based on some expert knowledge, you
can also estimate the absolute maximum time it might take, in
the worst possible case, and the absolute minimum time, in the
best possible case.
• The key feature of a Monte Carlo simulation is that it can tell you
– based on how you create the ranges of estimates – how likely
the resulting outcomes are.
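The idea of turning range estimates into likelihoods can be sketched as follows. Everything numeric here is hypothetical: the three jobs, their (minimum, most likely, maximum) durations, the 15-day deadline, and the choice of a triangular distribution for each job are assumptions made for illustration, not values from the slides.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical jobs: (minimum, most likely, maximum) duration in days.
jobs = [(2, 4, 8), (5, 6, 10), (1, 2, 5)]

def simulate_total():
    """Draw one duration per job and add them up.
    Note the argument order of random.triangular: (low, high, mode)."""
    return sum(random.triangular(lo, hi, mode) for lo, mode, hi in jobs)

trials = 20_000
totals = [simulate_total() for _ in range(trials)]

deadline = 15  # hypothetical target, in days
p_on_time = sum(t <= deadline for t in totals) / trials

print("mean total duration:", sum(totals) / trials)
print("estimated P(total <= deadline):", p_on_time)
```

The output is not a single forecast but an estimated probability for each outcome of interest, which is the point made in the bullet above.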
• Example: A dam. It is proposed to build a dam in order to
regulate the water supply, and in particular to prevent seasonal
flooding downstream. How high should the dam be?
• Dams are expensive to construct, and some compromise
between cost and risk is necessary.
• It is decided to build a dam which is just high enough to ensure
that the chance of a flood of some given extent within ten years is
less than $10^{-2}$, say.
• No one knows exactly how high such a dam need be, and a young
probabilist proposes the following scheme.
• Through examination of existing records of rainfall and water
demand we may arrive at an acceptable model for the pattern of
supply and demand.
• This model includes, for example, estimates for the distributions
of rainfall on successive days over long periods.
• With the aid of a computer, the 'real world' situation is simulated
many times in order to study the likely consequences of building
dams of various heights.
• In this way we may arrive at an accurate estimate of the height
required.
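The probabilist's scheme can be sketched in a few lines. The model below is entirely hypothetical: it assumes the yearly peak water level is normally distributed with mean 10 m and standard deviation 2 m, independent from year to year, and that a flood occurs when the peak exceeds the dam height. A real study would replace `peak_level` with a model fitted to rainfall and demand records.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

YEARS = 10
TRIALS = 20_000

def peak_level():
    """Hypothetical model: yearly peak water level in metres, Normal(10, 2)."""
    return random.gauss(10.0, 2.0)

# Simulate the worst level over ten years once per trial, and reuse the
# same simulated histories for every candidate height.
worst = [max(peak_level() for _ in range(YEARS)) for _ in range(TRIALS)]

def flood_probability(height):
    """Estimated chance that the ten-year peak exceeds the given height."""
    return sum(w > height for w in worst) / TRIALS

heights = list(range(10, 26))  # candidate dam heights in metres
estimates = {h: flood_probability(h) for h in heights}

# Lowest candidate height whose estimated flood probability is below 10^-2.
chosen = next(h for h in heights if estimates[h] < 1e-2)
print(chosen, estimates[chosen])
```

Reusing the same simulated histories for all heights makes the estimated probability non-increasing in the height, so the comparison between candidate heights is not blurred by sampling noise.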
Example
• A dentist schedules all his/her patients for 30-minute
appointments.
• Some of the patients take more or less than
30 minutes depending on the type of dental
work to be done.
• The following summary shows the categories
of work, the time actually needed to complete
each, and the number of patients (out of 100) in
each category, from which the probabilities follow:
Category     Time required   No. of patients
Filling      45 min          40
Crown        60 min          15
Cleaning     15 min          15
Extracting   45 min          10
Checkup      15 min          20
• Simulate the dentist’s clinic for 4 hours and find the average
waiting time for the patients as well as the idle time of the
dentist. Assume that all the patients show up at the clinic at
exactly their scheduled arrival time, starting at 8:00 a.m.

• Use the following random numbers for the above problem:
40, 82, 11, 34, 25, 66, 17, 79
Steps:
• Find the probability distribution
• Cumulative distribution
• Setting random number intervals
• Generating random numbers
• Find the solution based on the above details

• Repeat the above several times, with fresh random numbers, to
see how the solution varies from run to run.
Category     Time required   No. of patients   Probability   Cumulative probability   Random no. interval
Filling      45 min          40                0.40          0.40                     0-39
Crown        60 min          15                0.15          0.55                     40-54
Cleaning     15 min          15                0.15          0.70                     55-69
Extracting   45 min          10                0.10          0.80                     70-79
Checkup      15 min          20                0.20          1.00                     80-99
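The first steps (probabilities, cumulative counts, random number intervals, and the assignment of a category to each random number) can be sketched as below. The function name `category_for` is introduced here; the interval rule relies on the patient counts summing to 100, so that each cumulative count is also the upper end of a two-digit random number interval.

```python
from itertools import accumulate

# Category -> (service time in minutes, number of patients), from the table.
categories = {
    "Filling": (45, 40),
    "Crown": (60, 15),
    "Cleaning": (15, 15),
    "Extracting": (45, 10),
    "Checkup": (15, 20),
}

total = sum(n for _, n in categories.values())  # 100 patients in all
probabilities = {name: n / total for name, (_, n) in categories.items()}

# Cumulative patient counts: 40, 55, 70, 80, 100.
upper = list(accumulate(n for _, n in categories.values()))

def category_for(r):
    """Map a random number r in 0..99 to the category whose interval holds it."""
    for name, bound in zip(categories, upper):
        if r < bound:
            return name
    raise ValueError("random number must lie in 0..99")

draws = [40, 82, 11, 34, 25, 66, 17, 79]
assigned = [category_for(r) for r in draws]
for r, name in zip(draws, assigned):
    print(r, name, categories[name][0], "min")
```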
Patient   Scheduled arrival   Random number   Category     Service time needed
1         8:00                40              Crown        60 min
2         8:30                82              Checkup      15 min
3         9:00                11              Filling      45 min
4         9:30                34              Filling      45 min
5         10:00               25              Filling      45 min
6         10:30               66              Cleaning     15 min
7         11:00               17              Filling      45 min
8         11:30               79              Extracting   45 min
Patient   Scheduled arrival   Service start   Service duration (min)   Service end   Waiting (min)   Idle time (min)
1         8:00                8:00            60                       9:00          0               0
2         8:30                9:00            15                       9:15          30              0
3         9:00                9:15            45                       10:00         15              0
4         9:30                10:00           45                       10:45         30              0
5         10:00               10:45           45                       11:30         45              0
6         10:30               11:30           15                       11:45         60              0
7         11:00               11:45           45                       12:30         45              0
8         11:30               12:30           45                       1:15          60              0
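The schedule in the table can be reproduced, and the two quantities asked for computed, with a short single-server queue simulation. This is a sketch under the stated assumptions (patients arrive exactly every 30 minutes from 8:00 and are served in order of arrival); times are measured in minutes after 8:00.

```python
# Service times in minutes for the eight simulated patients, from the table.
service = [60, 15, 45, 45, 45, 15, 45, 45]

# Scheduled arrivals every 30 minutes, in minutes after 8:00.
arrivals = [30 * i for i in range(len(service))]

free_at = 0  # time at which the dentist next becomes free
waits, idles = [], []
for arrive, duration in zip(arrivals, service):
    start = max(arrive, free_at)            # wait if the dentist is still busy
    waits.append(start - arrive)            # patient's waiting time
    idles.append(max(0, arrive - free_at))  # dentist's idle time before this patient
    free_at = start + duration

average_wait = sum(waits) / len(waits)
total_idle = sum(idles)

print("waiting times (min):", waits)
print("average waiting time (min):", average_wait)
print("total idle time of the dentist (min):", total_idle)
```

For this one set of random numbers the total waiting time is 285 minutes over 8 patients, an average of 35.625 minutes, and the dentist is never idle; other random numbers would give other values.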
