$$\begin{aligned}
F_Y(y) &= \mathrm{Prob}(\exp(X) \le y) \\
       &= \mathrm{Prob}(X \le \log y) \\
       &= F_X(\log y) = \frac{1}{2} + \frac{1}{2}\log y, \quad \text{for } y \in [1/e,\, e].
\end{aligned}$$
Be careful about the bounds of the support!
$$\begin{aligned}
f_Y(y) &= \frac{\partial}{\partial y} F_Y(y) \\
       &= f_X(\log y)\,\frac{1}{y} = \frac{1}{2y}, \quad \text{for } y \in [1/e,\, e].
\end{aligned}$$
2. $X \sim U[-1, 1]$ and $Y = X^2$
$$\begin{aligned}
F_Y(y) &= \mathrm{Prob}(X^2 \le y) \\
       &= \mathrm{Prob}(-\sqrt{y} \le X \le \sqrt{y}) \\
       &= F_X(\sqrt{y}) - F_X(-\sqrt{y}) \\
       &= 2 F_X(\sqrt{y}) - 1, \quad \text{by symmetry: } F_X(-\sqrt{y}) = 1 - F_X(\sqrt{y}).
\end{aligned}$$
$$\begin{aligned}
f_Y(y) &= \frac{\partial}{\partial y} F_Y(y) \\
       &= 2 f_X(\sqrt{y}) \cdot \frac{1}{2\sqrt{y}} = \frac{1}{2\sqrt{y}}, \quad \text{for } y \in [0, 1].
\end{aligned}$$
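As a quick sanity check of both derived densities, here is a short Monte Carlo sketch (using numpy; the grid points and sample size are illustrative choices, not part of the notes) that compares the empirical CDF of $Y$ in each example with the CDFs derived above:

```python
# Monte Carlo check: draw X ~ U[-1, 1], transform, compare empirical CDF of Y
# with the closed-form CDFs derived above.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1_000_000)

# Example 1: Y = exp(X), F_Y(y) = 1/2 + (1/2) log y on [1/e, e]
y = np.exp(x)
grid = np.linspace(1 / np.e + 1e-6, np.e - 1e-6, 9)
emp = np.array([(y <= g).mean() for g in grid])
print(np.max(np.abs(emp - (0.5 + 0.5 * np.log(grid)))))   # ~1e-3, simulation noise only

# Example 2: Y = X^2, F_Y(y) = 2 F_X(sqrt(y)) - 1 = sqrt(y) on [0, 1]
y = x ** 2
grid = np.linspace(0.0, 1.0, 11)
emp = np.array([(y <= g).mean() for g in grid])
print(np.max(np.abs(emp - np.sqrt(grid))))                 # ~1e-3
```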
As the first example above showed, it’s easy to derive the CDF and PDF of Y when g(·) is
a strictly monotonic function:
Theorems 2.1.3, 2.1.5: When g(·) is a strictly increasing function, then
$$F_Y(y) = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X(g^{-1}(y)),$$
$$f_Y(y) = f_X(g^{-1}(y))\,\frac{\partial}{\partial y} g^{-1}(y), \quad \text{using the chain rule}.$$
These are the change of variables formulas for transformations of univariate random variables.
Thm 2.1.8 generalizes this to piecewise monotonic transformations.
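The change of variables formula can also be applied symbolically. A minimal sketch (assuming sympy is available; not part of the notes) recovers the density from Example 1 above, $X \sim U[-1,1]$ and $g(x) = \exp(x)$:

```python
# Change of variables: f_Y(y) = f_X(g^{-1}(y)) * d/dy g^{-1}(y)
import sympy as sp

y = sp.symbols('y', positive=True)
g_inv = sp.log(y)                    # g^{-1}(y) for g(x) = exp(x)
f_X = sp.Rational(1, 2)              # U[-1, 1] density (constant on its support)
f_Y = f_X * sp.diff(g_inv, y)        # apply the formula
print(sp.simplify(f_Y))              # 1/(2*y), valid for y in [1/e, e]
```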
Here is a special case of a transformation:
Thm 2.1.10: Let X have a continuous CDF FX (·) and define the random variable Y =
FX (X). Then Y ∼ U [0, 1], i.e., FY (y) = y, for y ∈ [0, 1].
Note: all that is required is that the CDF FX is continuous, not that it must be strictly
increasing. The result also goes through when FX is continuous but has flat parts (cf.
discussion in CB, pg. 34).
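A simulation sketch of Thm 2.1.10 (assuming numpy and scipy; the exponential example is an illustrative choice): for $X \sim$ Exponential(1), $F_X(x) = 1 - e^{-x}$, so $F_X(X)$ should be $U[0,1]$.

```python
# Probability integral transform: F_X(X) ~ U[0, 1]
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)
u = 1.0 - np.exp(-x)                      # Y = F_X(X)
print(stats.kstest(u, 'uniform'))         # large p-value: consistent with U[0, 1]
```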
Expected value (Definition 2.2.1): The expected value, or mean, of a random variable
g(X) is
$$E g(X) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} g(x) f_X(x)\,dx & \text{if } X \text{ continuous} \\[6pt] \displaystyle\sum_{x \in \mathcal{X}} g(x) P(X = x) & \text{if } X \text{ discrete}, \end{cases}$$
provided that the integral or the sum exists.
The expectation is a linear operator (just like integration), so that
$$E\left[\sum_{i=1}^{n} \alpha\, g_i(X) + b\right] = \alpha \sum_{i=1}^{n} E g_i(X) + b.$$
Note: Expectation is a population average, i.e., you average values of the random variable
g(X) weighting by the population density fX (x).
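A small numerical illustration of linearity (numpy; the choices $g_1(x) = x$, $g_2(x) = x^2$, $\alpha = 3$, $b = 2$, $X \sim U[0,1]$ are illustrative):

```python
# Linearity of expectation, checked on a large sample
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)

lhs = np.mean(3 * (x + x ** 2) + 2)              # E[alpha*(g1 + g2) + b]
rhs = 3 * (np.mean(x) + np.mean(x ** 2)) + 2     # alpha*(Eg1 + Eg2) + b
print(lhs, rhs)   # both close to 3*(1/2 + 1/3) + 2 = 4.5
```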
Other measures:
1. Median: med$(X) = m$ such that $F_X(m) = 0.5$. Robust to outliers, and it has a nice invariance property: for $Y = g(X)$ with $g(\cdot)$ monotonic increasing, med$(Y) = g(\text{med}(X))$.
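A quick numerical check of the invariance property (numpy; the choices $g(x) = \exp(x)$ and $X \sim N(0,1)$ are illustrative):

```python
# med(g(X)) = g(med(X)) for monotone increasing g
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
print(np.median(np.exp(x)), np.exp(np.median(x)))   # both close to exp(0) = 1
```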
Moments: important class of expectations
For each integer $n$, the $n$-th (uncentred) moment of $X \sim F_X(\cdot)$ is $\mu_n' \equiv EX^n$.
The $n$-th centred moment is $\mu_n \equiv E(X - \mu)^n = E(X - EX)^n$. (It is centred around the mean $EX$.)
For $n = 2$: $\mu_2 = E(X - EX)^2$ is the Variance of $X$; $\sqrt{\mu_2}$ is the standard deviation.
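The sample analogues of these definitions are straightforward to compute; a short sketch (numpy; the choice $X \sim N(2, 3^2)$ is illustrative, so $EX = 2$, $\mu_2 = 9$):

```python
# Sample versions of the uncentred first moment, the variance, and the std. dev.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)

mu1_uncentred = np.mean(x)                      # EX
mu2_centred = np.mean((x - np.mean(x)) ** 2)    # E(X - EX)^2 = Var(X)
print(mu1_uncentred, mu2_centred, np.sqrt(mu2_centred))   # ~2, ~9, ~3
```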
Important formulas:
The moments of a random variable are summarized in the moment generating function.
Definition: the moment-generating function of X is MX (t) ≡ E exp(tX), provided that the
expectation exists in some neighborhood t ∈ [−h, h] of zero.
This is also called the “Laplace transform”.
Specifically:
$$M_X(t) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx & \text{for } X \text{ continuous} \\[6pt] \displaystyle\sum_{x \in \mathcal{X}} e^{tx} P(X = x) & \text{for } X \text{ discrete}. \end{cases}$$
$$EX^n = M_X^{(n)}(0) \equiv \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0},$$
which is the n-th derivative of the MGF, evaluated at t = 0.
Example: standard normal distribution:
$$\begin{aligned}
M_X(t) &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( tx - \frac{x^2}{2} \right) dx \\
       &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2}\left( (x - t)^2 - t^2 \right) \right) dx \\
       &= \exp\left( \tfrac{1}{2} t^2 \right) \cdot \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} (x - t)^2 \right) dx \\
       &= \exp\left( \tfrac{1}{2} t^2 \right) \cdot 1,
\end{aligned}$$
where the last term on the RHS is the integral of the density function of $N(t, 1)$, which integrates to one.
First moment: $EX = M_X'(0) = \left. t \exp(\tfrac{1}{2} t^2) \right|_{t=0} = 0$.
Second moment: $EX^2 = M_X''(0) = \left. \exp(\tfrac{1}{2} t^2) + t^2 \exp(\tfrac{1}{2} t^2) \right|_{t=0} = 1$.
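These derivatives can be checked symbolically; a minimal sketch (assuming sympy is available):

```python
# Derivatives of M_X(t) = exp(t^2/2) at t = 0 reproduce the standard normal moments
import sympy as sp

t = sp.symbols('t')
M = sp.exp(t ** 2 / 2)
print([sp.diff(M, t, n).subs(t, 0) for n in range(1, 5)])   # [0, 1, 0, 3]
```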
In many cases, the MGF can characterize a distribution. The problem, however, is that it may not exist (e.g., for the Cauchy distribution).
For a RV X, is its distribution uniquely determined by its moment generating function?
Thm 2.3.11: For X ∼ FX and Y ∼ FY , if MX and MY exist, and MX (t) = MY (t) for all t
in some neighborhood of zero, then FX (u) = FY (u) for all u.
Note that if the MGF exists, then the random variable has moments of all orders (because the MGF is infinitely differentiable). The converse is not necessarily true: a random variable can have all of its moments and still have no MGF (e.g., the log-normal random variable: $X \sim N(0, 1)$, $Y = \exp(X)$).
Characteristic function:
The characteristic function of a random variable $g(X)$ is defined as
$$\phi_{g(X)}(t) = E_X \exp(it\,g(X)) = \int_{-\infty}^{+\infty} \exp(it\,g(x)) f(x)\,dx.$$
• The CF always exists. This follows from the equality $e^{itx} = \cos(tx) + i\sin(tx)$: both the real and imaginary parts of the integrand are bounded functions.
• Consider a symmetric density function, with $f(-x) = f(x)$ (symmetric around zero). Then the resulting $\phi(t)$ is real-valued and symmetric around zero.
• The CF completely determines the distribution of $X$ (every CDF has a unique characteristic function).
• If $X$ and $Y$ are independent, with characteristic functions $\phi_X(t)$ and $\phi_Y(t)$, then $\phi_{X+Y}(t) = \phi_X(t)\phi_Y(t)$.
• φ(0) = 1.
• For a given characteristic function $\phi_X(t)$ such that $\int_{-\infty}^{+\infty} |\phi_X(t)|\,dt < \infty$, the corresponding density $f_X(x)$ is given by the inverse Fourier transform:
$$f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \phi_X(t) \exp(-itx)\,dt.$$
• The characteristic function also summarizes the moments of a random variable. Specifically, the $h$-th derivative of $\phi(t)$ is
$$\phi^{(h)}(t) = i^h \int_{-\infty}^{+\infty} g(x)^h \exp(it\,g(x)) f(x)\,dx. \qquad (2)$$
Hence, assuming the $h$-th moment, denoted $\mu^h_{g(X)} \equiv E[g(X)]^h$, exists, it is equal to $\mu^h_{g(X)} = \phi^{(h)}(0)/i^h$.
Assuming that the required moments exist, we can therefore use Taylor's theorem to expand the characteristic function around $t = 0$ to get:
$$\phi(t) = 1 + \frac{it}{1!}\,\mu^1_{g(X)} + \frac{(it)^2}{2!}\,\mu^2_{g(X)} + \cdots + \frac{(it)^k}{k!}\,\mu^k_{g(X)} + o(t^k).$$
• Cauchy distribution, cont'd: The characteristic function of the Cauchy distribution is
$$\phi(t) = \exp(-|t|).$$
This is not differentiable at $t = 0$, which by Eq. (2) reflects the fact that its mean does not exist. Hence, the expansion of the characteristic function above is invalid in this case.
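A simulation sketch of this point (numpy; sample sizes are illustrative): the running sample mean of Cauchy draws never settles down as the sample size grows, consistent with the mean not existing.

```python
# Running sample mean of standard Cauchy draws keeps jumping around
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(1_000_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
print(running_mean[[10**3 - 1, 10**4 - 1, 10**5 - 1, 10**6 - 1]])
```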