Chapter 2 - Random Variables (With Solutions)
What is a random variable?
● Example: Examine 2 light bulbs. We are interested in the number of defective bulbs (∈ {0, 1, 2}),
but not whether a specific bulb is defective or not. Such a rule is illustrated below:
Usually, people are interested in a special measurable characteristic of the outcomes in different
experiments, though the sample space Ω may consist of qualitative or quantitative outcomes
● Manufacturers: proportion of defective light bulbs from a lot
● Market researchers: preference in a scale of 1-10 of consumers about a proposed product
● Research physicians: changes in certain reading from patients who are prescribed a new drug
● Manufacturers: the number of defective light bulbs from a lot of fixed size N is any integer in [0, N]
● Market researchers: preference score of consumers about the proposed product, integer in [1, 10]
● Research physicians: change in certain reading from a patient who is prescribed a new drug is any real
number from −∞ to ∞
● Data scientists: Number of correctly classified images from N images is an integer in [0, N]
Random variables
Definition
For a given sample space Ω of an experiment, a random variable (r.v.) is a function X whose domain is Ω
and whose range is a set of real numbers
Notations:
Discrete random variables
Definition
A discrete r.v. is a r.v. that can take on only a finite or at most a countably infinite number of values
Range is {x1, x2, · · · }
To understand the behaviour of a r.v. X, one wants to specify the probability attached to each value in its range,
$P(X = x_i) = P(\{\omega \in \Omega : X(\omega) = x_i\})$, $i = 1, 2, \ldots$,
where P is the probability measure defined in Lecture 1. It can be found from the probabilities of the outcomes.
Probability mass function
● Let A = {ω ∈ Ω | X(ω) = x} be the set/event containing all the outcomes ω ∈ Ω which are mapped to x by X.
● Following from the rules of probability, for any x, P(X = x) = P(A): simply add the probabilities of all outcomes in A
Definition
The probability mass function (pmf), or frequency function, of a discrete r.v. X with range {x1, x2, · · · } is a
function p s.t.
$p(x_i) = P(X = x_i)$ for $i = 1, 2, \ldots$, and $\sum_i p(x_i) = 1$
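As a minimal sketch of how a pmf is obtained by adding up the probabilities of outcomes, the two-bulb example can be worked through in Python. The defect probability of 0.1 and the independence of the two bulbs are assumptions made only for illustration; neither is given in the example.

```python
from itertools import product

P_DEFECT = 0.1  # assumed defect probability (illustrative, not from the slides)

# Sample space: each of the 2 bulbs is defective ("D") or not ("N").
omega = list(product("DN", repeat=2))


def prob(outcome):
    """Probability of one outcome, assuming the bulbs are independent."""
    result = 1.0
    for bulb in outcome:
        result *= P_DEFECT if bulb == "D" else 1 - P_DEFECT
    return result


def X(outcome):
    """The random variable: number of defective bulbs in the outcome."""
    return outcome.count("D")


# p(x) = P(X = x): add the probabilities of all outcomes mapped to x by X.
pmf = {}
for w in omega:
    pmf[X(w)] = pmf.get(X(w), 0.0) + prob(w)

for x in sorted(pmf):
    print(x, round(pmf[x], 4))
```

The dictionary `pmf` has the three keys 0, 1 and 2, and its values add up to 1, as a pmf must.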
Cumulative distribution function
Definition
The cumulative distribution function (cdf) of any r.v. X is defined by
$F(x) = P(X \le x)$, $-\infty < x < \infty$
Remark: For a discrete r.v., either the pmf or the cdf is enough to characterize its distribution uniquely; each can be recovered from the other.
Examples
The cdf characterizes how probability is distributed over the possible values (or subsets of values) of a r.v.; so does the pmf (for a
discrete r.v.)
● We can characterize a r.v. by its distribution, and say “(the r.v.) X follows/has a distribution with pmf p(x)
(or cdf F(x))”.
Bernoulli trials and Bernoulli r.v.
● Bernoulli trial: an experiment whose outcomes can be classified as, generically “success (S)” or
“failure (F)”
● A function X which maps S to 1 and F to 0 defines the simplest discrete r.v. which takes on only 2
possible values
Definition
A Bernoulli r.v. with parameter or probability of success 0 < p < 1 takes on only 2 values, 0 & 1, with p(1) =
p, p(0) = 1 − p, p(x) = 0 if x ≠ 0 and x ≠ 1.
● Write X ~ Ber(p)
● pmf of X, where q = 1 − p: $p(x) = p^x q^{1-x}$ for x = 0, 1
We will omit the “p(x) = 0 otherwise” clause in the following definitions for simplicity, but keep it in mind in your work!
Examples
● Many examples of Bernoulli trials: Toss a coin; Examine a light bulb; Whether it rains in NUS tomorrow; ...
● Being the simplest experiment resulting in only 2 possible outcomes, different Bernoulli trials appear very
often in the real world
● In the real world, there are many more random phenomena corresponding to experiments defined
systematically by or related to ≥ 2 Bernoulli trials
● R.v. constructed from these experiments, for example,
○ binomial r.v.
○ geometric r.v.
○ negative binomial r.v.
○ hypergeometric r.v.
○ Poisson r.v.
The Binomial distribution
Definition
A r.v. X is said to have a binomial distribution with parameters n and p, write X ∼ Bin(n, p), if its pmf is
defined by
$p(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, · · · , n
● The pmf p(x) = P(X = x) for x = 0, 1, 2, · · · , n can be found by counting methods with combinations:
○ Any outcome of the experiment is a sequence of n Successes (1) and Failures (0)
○ A particular sequence of x 1s and n − x 0s occurs with probability $p^x (1-p)^{n-x}$ (following from the
multiplication law and independence between trial results)
○ There are $\binom{n}{x}$ such sequences; adding up $p^x (1-p)^{n-x}$ that many times gives p(x) (following from the addition law and disjointness of all these
sequences)
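The counting argument above can be checked by brute force. The sketch below enumerates every 0/1 sequence of length n, adds up the probabilities of those with exactly x ones, and compares the total with the closed-form binomial pmf. The values n = 5 and p = 0.3 are arbitrary illustrative choices.

```python
from itertools import product
from math import comb, isclose


def binom_pmf(x, n, p):
    """Closed form: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_pmf_by_enumeration(x, n, p):
    """Add the probabilities of all 0/1 sequences containing exactly x ones."""
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if sum(seq) == x:
            # Each such sequence has probability p^x (1 - p)^(n - x).
            total += p**x * (1 - p) ** (n - x)
    return total


n, p = 5, 0.3  # illustrative values
for x in range(n + 1):
    assert isclose(binom_pmf(x, n, p), binom_pmf_by_enumeration(x, n, p))
print("closed form matches enumeration for n =", n)
```

Enumeration costs 2^n steps, so it is only practical as a check for small n; the closed form is what one would use in practice.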
The shape of pmf: symmetric versus skewed as shown by pmf’s with n = 10 & p = 0.5 (above) or p = 0.1
(below)
Example
Problem:
Suppose that it is known that a manufacturer produces defective fuses subject to a probability of 0.05. In a lot of 100 produced fuses, what are the probabilities that
Solution:
Let X be the number of defective fuses in a lot of size 100. Then, X is a binomial r.v. with parameters n = 100 and p = 0.05.
The Geometric distribution
A Bernoulli trial associated with probability of success p is repeatedly and independently performed until the 1st success is observed
Let X be the number of trials performed. Then X is a geometric r.v. with probability of success p, where p is the probability of observing a success in every Bernoulli trial
Definition
A r.v. X is said to have a geometric distribution with parameter p, write X ∼ Geo(p), if its pmf is defined by
$p(x) = (1-p)^{x-1} p$, x = 1, 2, · · ·
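The Negative Binomial distribution
More generally, let X be the number of independent Bernoulli trials performed until the r-th success is observed. Then X is a negative binomial r.v. with parameters r and p, where p is the probability of observing a success in every
Bernoulli trial
Definition
A r.v. X is said to have a negative binomial distribution with parameters r and p, write X ∼ NegBin(r, p), if
its pmf is defined by
$p(x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}$, x = r, r + 1, · · ·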
Quiz
Question:
Suppose that it is known that a manufacturer produces defective fuses subject to a probability of 0.05. By examining the produced fuses in series one-by-one, what are the probabilities that
Solution:
Let X be the number of fuses to be examined. Then, X ∼ Geo(0.05) ≡ NegBin(1, 0.05) for Question 1, X ∼ NegBin(5, 0.05) for Question 2.
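The identity Geo(p) ≡ NegBin(1, p) used in the solution can be verified numerically. The sketch below assumes the "number of trials" convention, so the geometric pmf is (1 − p)^(x−1) p on x = 1, 2, ... and the negative binomial pmf is C(x−1, r−1) p^r (1 − p)^(x−r) on x = r, r+1, .... The specific events evaluated at the end are hypothetical illustrations, since the quiz sub-questions are not reproduced here.

```python
from math import comb, isclose


def geo_pmf(x, p):
    """P(first success occurs on trial x), x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p


def negbin_pmf(x, r, p):
    """P(r-th success occurs on trial x), x = r, r + 1, ..."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


p = 0.05  # probability that a fuse is defective

# Geo(p) and NegBin(1, p) have the same pmf.
assert all(isclose(geo_pmf(x, p), negbin_pmf(x, 1, p)) for x in range(1, 200))

# Hypothetical illustrative events (not the original quiz questions):
first_defective_within_10 = sum(geo_pmf(x, p) for x in range(1, 11))
fifth_defective_at_100th = negbin_pmf(100, 5, p)
print(first_defective_within_10, fifth_defective_at_100th)
```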
The Hypergeometric distribution
From a population of 2 kinds of objects (Success & Failure) with r Successes and N − r Failures, a total of n
< N objects are drawn without replacement
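Let X be the number of Successes among the n objects drawn. Then X is a hypergeometric r.v. with parameters r, N, and n, where r is the number of successes in the population, N is the population size,
and n is the sample size
Definition
A r.v. X is said to have a hypergeometric distribution with parameters r, N and n, write X ∼ Hyper(r, N, n),
if its pmf is defined by
$p(x) = \binom{r}{x} \binom{N-r}{n-x} \Big/ \binom{N}{n}$, max(0, n − (N − r)) ≤ x ≤ min(n, r)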
p(x) is derived easily due to the fact that the order of the n selected objects does not matter
A Bin(n, p) r.v. can be alternatively defined by the same mechanism as above with the sample size set as the
number of trials n and p = r/N, except that the objects are drawn with replacement
Example
Problem:
It is common that manufacturers perform quality control of their products by sampling a few products from a lot. When the number of defective items exceeds a threshold value, the lot will not be shipped. Suppose that in a lot of size 20, 4 of the products are defective. When there are > 2 defective items among 10 inspected, the lot will be rejected. What is the prob that this lot will be rejected?
Solution:
Let X be the number of defective items observed in a sample of size 10. Then, X is a hypergeometric r.v. with parameters r = 4, N = 20 and n = 10, which takes on values 0, 1, 2, 3, 4. The lot is rejected when X > 2, so the required prob is
$P(X > 2) = p(3) + p(4) = \left[\binom{4}{3}\binom{16}{7} + \binom{4}{4}\binom{16}{6}\right] \Big/ \binom{20}{10} = 94/323 \approx 0.291$
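The rejection probability in this example is P(X > 2) = p(3) + p(4) for X ∼ Hyper(r = 4, N = 20, n = 10). A short sketch, using exact fractions so that no rounding enters the calculation; the function name `hyper_pmf` is just an illustrative choice.

```python
from fractions import Fraction
from math import comb


def hyper_pmf(x, r, N, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N objects containing r successes."""
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))


r, N, n = 4, 20, 10  # defectives in lot, lot size, sample size

# The lot is rejected when more than 2 defectives are found, i.e. X = 3 or 4.
p_reject = sum(hyper_pmf(x, r, N, n) for x in (3, 4))
print(p_reject, round(float(p_reject), 3))  # 94/323 0.291
```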
The Poisson distribution
Definition
A r.v. X is said to have a Poisson distribution with parameter λ > 0, write X ∼ Poi(λ), if its pmf is defined by
$p(x) = e^{-\lambda} \lambda^x / x!$, x = 0, 1, 2, · · ·
The Poisson distribution is a good model for the number of occurrences of a rare event in a fixed period of
time, in a given region of space, etc.
[Figure: the Poisson pmf]
Example
Problem:
Flaws (bad records) on a used video tape occur at an average rate of 1 flaw per 1200 feet, and the number of flaws follows a Poisson distribution. What are the probabilities that
Solution:
Let X be the number of flaws on a tape of 4800 feet. Then X ∼ Poi(λ) with λ = 4800/1200 × 1 = 4.
Example - Poisson vs Binomial
Problem:
In a huge community, it is known that 0.7% of the population is colour-blind. What is the probability that at most 10 in a group of 1000 people from the community are colour-blind?
Solution:
Here we assume that the number of colour-blind people in a group of 1000 people from this community is Y ∼ Bin(1000, 0.007). The required probability is
$P(Y \le 10) = \sum_{y=0}^{10} \binom{1000}{y} (0.007)^y (0.993)^{1000-y}$
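The required probability P(Y ≤ 10) for Y ∼ Bin(1000, 0.007) can be evaluated directly and compared with a Poisson approximation. The sketch below assumes the usual matching λ = np = 7 for the approximating Poisson distribution; the helper names are illustrative.

```python
from math import comb, exp, factorial


def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Bin(n, p), by summing the pmf."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(k + 1))


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poi(lam), by summing the pmf."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))


n, p = 1000, 0.007
exact = binom_cdf(10, n, p)
approx = poisson_cdf(10, n * p)  # Poisson with lambda = np = 7 (assumed matching)
print(f"binomial: {exact:.4f}   Poisson approximation: {approx:.4f}")
```

With n large and p small the two values agree closely, which is why the Poisson form is often used in place of the longer binomial sum.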
The Poisson process
● Alternatively the Poisson distribution can be derived from a mathematical model, called a Poisson process
having rate λ > 0, for describing a random phenomenon regarding occurrence of a certain incidence or
“event” in time.
● λ is the rate per unit time at which events occur
○ E.g. λ = 2 may stand for 2 events per minute/hour/week
2. Probability of exactly 1 occurrence of the event in a sufficiently small interval of length h is roughly λh
3. Probability of ≥2 occurrences of the event in a sufficiently small interval of length h is essentially zero
Continuous random variables
● Among real world “experiments”, many “natural” variables/quantities of interest have sample
spaces with uncountably many possible outcomes, e.g.,
○ temperature range on any day
● Define a one-to-one function X which maps each of these uncountably many outcomes to itself
(the identity function)
● Viewing X as a r.v., the range of X is an interval (bounded or unbounded) or a union of intervals, and
r.v.’s of this kind are called continuous random variables.
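If we follow what we did for discrete r.v.’s, we need P(X = x) for every possible value x in the range
of the cont. r.v. X.
● However, for any cont. r.v., we must have, for any possible value x in the range of X,
P(X = x) = 0,
so probabilities of single values cannot describe the distribution; probabilities are attached to intervals instead.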
Probability density function
Definition
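The probability density function (pdf) of a cont. r.v. X is an integrable function f satisfying
● $f(x) \ge 0$ for all x
● f is piecewise continuous
● $\int_{-\infty}^{\infty} f(x)\,dx = 1$
● The prob that X takes on a value in the interval (a, b) equals the area under the curve of f
between a & b: $P(a < X < b) = \int_a^b f(x)\,dx$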
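The "area under the curve" reading of a pdf can be illustrated numerically. The sketch below uses a hypothetical pdf f(x) = 2x on [0, 1] (chosen only for illustration, not taken from the slides) and a midpoint-rule integrator to check that the total area is 1 and to compute an interval probability.

```python
def f(x):
    """Illustrative pdf (an assumption): f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


total_area = integrate(f, 0, 1)    # a valid pdf integrates to 1
prob = integrate(f, 0.2, 0.5)      # P(0.2 < X < 0.5) = 0.5**2 - 0.2**2 = 0.21
print(round(total_area, 6), round(prob, 6))
```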
PMF vs PDF
Properties of continuous r.v.
Property
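1. $P(X = c) = \int_c^c f(x)\,dx = 0$ for any c
2. $P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$
3. For small ε > 0, if f is continuous at c, $P(c - \varepsilon/2 \le X \le c + \varepsilon/2) = \int_{c-\varepsilon/2}^{c+\varepsilon/2} f(x)\,dx \approx \varepsilon f(c)$
● The value of f(c) is not the probability P(X = c). However, from property 3, the probability that X is in a small interval
around c is proportional to f(c)
● (Integral) property 3 in differential notation: $P(x \le X \le x + dx) = f(x)\,dx$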
CDF of continuous r.v.
Definition
The cdf of a cont. r.v. X with pdf f(x) is defined by
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$
● With the fundamental theorem of calculus, the pdf is the first derivative of the cdf: $f(x) = F'(x)$ wherever f is continuous
Uniform r.v.
Definition
A r.v. X is called a uniform r.v. with parameters a and b, write X ∼ Unif(a, b), if, for b > a, its pdf is given by
$f(x) = 1/(b-a)$ for a ≤ x ≤ b, and f(x) = 0 otherwise
The probability that X is in any interval of length h in (a, b) equals h/(b − a). For the standard uniform, Unif(0, 1), it is h.
[Figures: pdf and cdf of a uniform r.v.]
Example
The direction of imperfection on a tyre is subject to uncertainty. Let X be the angle, measured clockwise, between a
vertical reference line and the line connecting the centre
of the tyre to the imperfection. A possible pdf for X is:
What is the probability that the angle is between 90° and 180°?
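The pdf for this example is not reproduced in the text. As a hedged sketch, suppose the angle is uniform over a full turn, X ∼ Unif(0, 360); this choice is an assumption made for illustration, suggested only by the example's placement in the uniform section. The probability of an interval is then its length divided by 360.

```python
def unif_cdf(x, a, b):
    """cdf of Unif(a, b): 0 below a, (x - a)/(b - a) on [a, b], 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 0.0, 360.0  # assumed support of the angle, in degrees

# P(90 <= X <= 180) = F(180) - F(90) = (180 - 90) / 360
prob = unif_cdf(180, a, b) - unif_cdf(90, a, b)
print(prob)  # 0.25
```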
Exponential r.v.
Definition
A r.v. X is called an exponential r.v. with parameter λ, write X ∼ Exp(λ), if, for λ > 0, its pdf is given by
$f(x) = \lambda e^{-\lambda x}$ for x ≥ 0, and f(x) = 0 for x < 0
[Figures: pdf and cdf of an exponential r.v.]
Memoryless property
For X ∼ Exp(λ), and s, t > 0,
$P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda (s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t)$
is independent of s.
Let X be the lifetime of some product: if the product is still good at time s, the distribution of its remaining
lifetime is the same as the original lifetime distribution when it was new
The exponential is the only cont. distribution possessing this property. In fact, the memoryless property characterizes the
exponential distribution.
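The memoryless property can be checked numerically from the survival function P(X > x) = e^{−λx}. In the sketch below the values of λ, t and the grid of s values are arbitrary illustrative choices.

```python
from math import exp, isclose


def survival(x, lam):
    """P(X > x) for X ~ Exp(lam), x >= 0."""
    return exp(-lam * x)


lam, t = 0.5, 2.0  # illustrative values
for s in (0.1, 1.0, 5.0, 20.0):
    # P(X > s + t | X > s) = P(X > s + t) / P(X > s)
    conditional = survival(s + t, lam) / survival(s, lam)
    # It equals P(X > t) whatever the value of s.
    assert isclose(conditional, survival(t, lam))
print("P(X > s + t | X > s) = P(X > t) =", survival(t, lam))
```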
Problem:
Suppose that the length of wait for a taxi at a taxi stand is an exponential r.v. with parameter λ = 0.1. Someone is in front of you in the queue for a taxi. Find the probabilities that you will have to wait
Solution:
Let X be the duration of wait. Then X ∼ Exp(0.1).
1. $P(X > 10) = 1 - F(10) = e^{-0.1 \times 10} = e^{-1} \approx 0.368$
2. $P(10 < X < 20) = F(20) - F(10) = e^{-1} - e^{-2} \approx 0.233$
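The two probabilities in this solution follow from the exponential cdf F(x) = 1 − e^{−λx}. A short sketch of the arithmetic, reading the second event as 10 < X < 20 (the only ordering for which F(20) − F(10) is the right expression):

```python
from math import exp


def exp_cdf(x, lam):
    """cdf of Exp(lam): F(x) = 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


lam = 0.1
p_more_than_10 = 1 - exp_cdf(10, lam)                 # e^{-1}
p_between_10_20 = exp_cdf(20, lam) - exp_cdf(10, lam)  # e^{-1} - e^{-2}
print(round(p_more_than_10, 3), round(p_between_10_20, 3))  # 0.368 0.233
```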
Gamma r.v.
Definition
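A r.v. X is called a gamma r.v. with shape parameter α and rate parameter λ, write X ∼ G(α, λ), if, for α, λ >
0, its pdf is given by
$f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x}$ for x ≥ 0, where $\Gamma(\alpha) = \int_0^{\infty} u^{\alpha-1} e^{-u}\,du$ is the gamma function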
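A minimal sketch of the gamma pdf in the shape/rate parametrization, f(x) = λ^α x^(α−1) e^{−λx} / Γ(α) for x > 0, using `math.gamma` for Γ(α). It also checks the standard special case that G(1, λ) coincides with Exp(λ). The numerical values are illustrative.

```python
from math import exp, gamma, isclose


def gamma_pdf(x, alpha, lam):
    """pdf of G(alpha, lam) with shape alpha and rate lam.
    Returns 0 for x <= 0 (the boundary point x = 0 is skipped to avoid
    dividing by zero when alpha < 1)."""
    if x <= 0:
        return 0.0
    return lam**alpha / gamma(alpha) * x ** (alpha - 1) * exp(-lam * x)


lam = 0.1
for x in (0.5, 1.0, 10.0, 25.0):
    # With alpha = 1 the gamma pdf reduces to the exponential pdf.
    assert isclose(gamma_pdf(x, 1, lam), lam * exp(-lam * x))
print(gamma_pdf(10.0, 2, lam))
```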
Beta r.v.
Definition
A r.v. X is called a beta r.v. with parameters a and b, write X ∼ B(a, b), if, for a, b > 0, its pdf is given by
$f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} x^{a-1} (1-x)^{b-1}$ for 0 ≤ x ≤ 1
Family of beta densities for different values of a & b: a fairly flexible class for modeling r.v.’s that are
restricted on [0, 1]
[Figure: beta densities for (a, b) = (2, 2), (6, 6), (6, 2) and (0.5, 4)]
Normal distribution
Definition
A r.v. X has a normal/Gaussian distribution with parameters μ and σ, write X ∼ N(μ, σ²), if, for −∞ < μ < ∞,
σ > 0, its pdf is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$, −∞ < x < ∞
One of the most widely used models for diverse phenomena such as measurement errors in scientific
experiments, reaction times in psychological experiments, etc.
Many r.v.’s, such as height and time, have distributions that are well approximated by a normal
distribution
The Central Limit Theorem (CLT), to be introduced later, says that the sum of a large number of independent r.v.’s is approximately normal; this
justifies the use of the normal distribution in many applications
An important special case: the normal distribution with mean μ = 0 and sd σ = 1, called the standard
normal distribution and denoted by Z ∼ N(0, 1)
[Figures: the normal density / pdf]
Linear transformation of a Normal r.v.
Definition
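Suppose that X ∼ N(μ, σ²). Then, Y = a + bX, for fixed constants a and b ≠ 0, is a normal r.v. with mean a + bμ
and variance b²σ².
Proof (for b > 0; the case b < 0 is similar):
1. From CDF: $F_Y(y) = P(a + bX \le y) = P\left(X \le \frac{y-a}{b}\right) = F_X\left(\frac{y-a}{b}\right)$
2. Thus PDF: $f_Y(y) = \frac{d}{dy} F_X\left(\frac{y-a}{b}\right) = \frac{1}{b} f_X\left(\frac{y-a}{b}\right)$
So, $f_Y(y) = \frac{1}{b\sigma\sqrt{2\pi}} \exp\left(-\frac{(y - a - b\mu)^2}{2 b^2 \sigma^2}\right)$, which is the pdf of N(a + bμ, b²σ²).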
Standard Normal r.v.
Definition
Suppose that X ∼ N(μ, σ²). Then, Z = (X − μ)/σ is a standard normal r.v. with mean 0 and variance 1.
Notice that Z is a linear transformation of X, and apply the result with a = −μ/σ and b = 1/σ
The above linear transformation of the r.v. X, defined by subtracting the mean of X and then dividing
the result by the sd of X, is called standardization of X
Probability computation for Normal r.v.
Definition
The cdf of a N(0, 1) r.v., Z, is defined by, for −∞ < x < ∞,
$\Phi(x) = P(Z \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\,du$
[Table: Z-table of standard normal cdf values Φ(x)]
Quiz
Let X be the gestational length in weeks. We know from prior research X ∼ N(39.18, 2²). What is the
probability of gestations being less than 40 weeks?
Solution: $P(X < 40) = P\left(Z < \frac{40 - 39.18}{2}\right) = \Phi(0.41) = 0.6591$
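The table lookup in this quiz can be reproduced with `statistics.NormalDist` from the Python standard library, reading N(39.18, 2²) as mean 39.18 and sd 2. The sketch standardizes first and then evaluates Φ, mirroring the Z-table route.

```python
from statistics import NormalDist

mu, sigma = 39.18, 2.0
z = (40 - mu) / sigma            # standardize: z = (x - mu) / sigma
p = NormalDist().cdf(z)          # Phi(z) for the standard normal
print(round(z, 2), round(p, 4))  # 0.41 0.6591

# The same probability without standardizing by hand:
p_direct = NormalDist(mu, sigma).cdf(40)
```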
The “inverse” problem
Definition
For a r.v. X ∼ N(μ, σ²) and probability 0 < p < 1, we are interested in d s.t. P(X ≤ d) = p. Here, d is called
the quantile of X at p, and this problem is called the “inverse” problem of computing probs of a normal r.v.
When a normally distributed r.v. X is of interest,
● what is the value of d in the following claim, for any given 0 < p < 1:
● the frequency with which a realization of X is ≤ d is p
Locate p in the body of the Z-table, read off the associated value z with Φ(z) = p, and then compute d = μ + σz
Example
Let X be the gestational length in weeks. We know from prior research X ∼ N(39, 2²). What is the
gestational length l such that 40% of all gestational lengths are shorter?
Find l, where P(X ≤ l) = 0.40
Find z such that Φ(z) = 0.40; then l = 39 + 2z
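The inverse problem can be sketched with `NormalDist.inv_cdf` from the Python standard library, which plays the role of reading the Z-table backwards. The code takes N(39, 2²) to mean mean 39 and sd 2 and looks for the length l with P(X ≤ l) = 0.40.

```python
from statistics import NormalDist

mu, sigma, p = 39.0, 2.0, 0.40

z = NormalDist().inv_cdf(p)  # z with Phi(z) = 0.40 (negative, since p < 0.5)
l = mu + sigma * z           # undo the standardization
print(round(z, 4), round(l, 2))

# Same quantile computed directly on the N(39, 2^2) scale:
l_direct = NormalDist(mu, sigma).inv_cdf(p)
```

Because p < 0.5, the quantile lies below the mean, at roughly 38.5 weeks.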
Functions of a r.v.
Given a r.v. X with a known density function, we are often interested in another r.v. Y = g(X), which is
defined through a known function g (either one-to-one or many-to-one) of X
e.g., we may be interested in the revenue of a shop (Y), which depends on the sales (X) (assuming that the r.v., sales, is
fully understood)
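Proof: Assume g(x) is an increasing, differentiable function. Suppose y = g(x) for some x, then with Y = g(X),
$F_Y(y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y))$, and differentiating w.r.t. y gives $f_Y(y) = f_X(g^{-1}(y)) \frac{d}{dy} g^{-1}(y)$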
Example
Note: This non-negative r.v. is called a lognormal r.v. as a logarithmic transformation of Y gives a normal r.v.
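As a hedged sketch of the change-of-variable technique for this case, assume the example is Y = e^X with X ∼ N(μ, σ²) (the example itself is not reproduced in the text; this reading is inferred from the note that log Y is normal). With g(x) = e^x increasing, g⁻¹(y) = ln y, so f_Y(y) = f_X(ln y) · (1/y) for y > 0. The code compares this formula with a numerical derivative of the cdf F_Y(y) = F_X(ln y); the parameter values are illustrative.

```python
from math import isclose, log
from statistics import NormalDist

mu, sigma = 0.0, 0.5  # illustrative parameters of X
X = NormalDist(mu, sigma)


def lognormal_cdf(y):
    """F_Y(y) = P(e^X <= y) = F_X(ln y) for y > 0."""
    return X.cdf(log(y)) if y > 0 else 0.0


def lognormal_pdf(y):
    """Change of variable: f_Y(y) = f_X(ln y) * d/dy (ln y) = f_X(ln y) / y."""
    return X.pdf(log(y)) / y if y > 0 else 0.0


h = 1e-5
for y in (0.5, 1.0, 2.0):
    # Central-difference approximation of the derivative of the cdf.
    numeric = (lognormal_cdf(y + h) - lognormal_cdf(y - h)) / (2 * h)
    assert isclose(lognormal_pdf(y), numeric, rel_tol=1e-6)
print(lognormal_pdf(1.0))
```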
Quiz
Note: square function is neither increasing nor decreasing. So change of variable technique is not
applicable for any