Distributions and Normal Random Variables
1 Random variables
If the cdf F of X is strictly increasing and continuous, then the quantile function is defined by q(x) = F−1(x) for all x ∈ (0, 1). Here q(x) is called the x-quantile of X: it is the number such that the random variable X takes a value smaller than or equal to it with probability x. If F is not strictly increasing or continuous, then we define q(x) as a generalized inverse of F, i.e. q(x) = inf{t ∈ R : F(t) ≥ x} for all x ∈ (0, 1). In other words, q(x) is the number such that F(q(x) + ε) ≥ x and F(q(x) − ε) < x for any ε > 0. As an exercise, check that P{X ≤ q(x)} ≥ x.
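As an illustration, here is a minimal Python sketch of the generalized inverse for a discrete distribution, together with a numerical check of the exercise above; the support points and probabilities are illustrative choices, not from the text.

```python
import numpy as np

# Generalized inverse q(x) = inf{t : F(t) >= x} for a discrete
# distribution; support and probabilities are illustrative choices.
support = np.array([0.0, 1.0, 2.0])
probs = np.array([0.3, 0.5, 0.2])
cdf = np.cumsum(probs)  # F evaluated at each support point

def q(x):
    # smallest support point t with F(t) >= x
    return support[np.searchsorted(cdf, x)]

# Check the exercise: P{X <= q(x)} >= x on a grid of x in (0, 1).
for x in np.linspace(0.05, 0.95, 19):
    prob_le = probs[support <= q(x)].sum()  # P{X <= q(x)}
    assert prob_le >= x

print(q(0.3), q(0.31), q(0.8), q(0.81))  # 0.0 1.0 1.0 2.0
```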
1.2 Functions of Random Variables
Suppose we have a random variable X and a function g : R → R. Then we can define another random variable Y = g(X). The cdf of Y can be calculated as follows:

$$F_Y(t) = P\{Y \le t\} = P\{g(X) \le t\} = P\{X \in g^{-1}((-\infty, t])\},$$

where g−1 may be the set-valued inverse of g. The set g−1((−∞, t]) consists of all s ∈ R such that g(s) ∈ (−∞, t], i.e. g(s) ≤ t. If g is strictly increasing and continuously differentiable, then it has a strictly increasing and continuously differentiable inverse g−1 defined on the set g(R). In this case P{X ∈ g−1((−∞, t])} = P{X ≤ g−1(t)} = FX(g−1(t)) for all t ∈ g(R). If, in addition, X is a continuous random variable, then
$$f_Y(t) = \frac{dF_Y(t)}{dt} = \frac{dF_X(g^{-1}(t))}{dt} = \left.\frac{dF_X(s)}{ds}\right|_{s=g^{-1}(t)} \left.\left(\frac{dg(s)}{ds}\right)^{-1}\right|_{s=g^{-1}(t)} = f_X(g^{-1}(t)) \left.\left(\frac{dg(s)}{ds}\right)^{-1}\right|_{s=g^{-1}(t)}.$$
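As a sanity check of this formula, the following sketch takes the illustrative case g(s) = exp(s) with X ∼ N(0, 1), so that Y = g(X) is standard lognormal, and compares the change-of-variables density against scipy's lognormal pdf.

```python
import numpy as np
from scipy import stats

# Change of variables for Y = g(X), g strictly increasing;
# g(s) = exp(s) and X ~ N(0, 1) are illustrative choices.
t = np.linspace(0.1, 5.0, 50)
g_inv = np.log(t)                     # g^{-1}(t)
dg_ds = np.exp(g_inv)                 # dg/ds evaluated at s = g^{-1}(t)
f_Y = stats.norm.pdf(g_inv) / dg_ds   # f_X(g^{-1}(t)) * (dg/ds)^{-1}

# scipy's lognorm with shape s=1 is the distribution of exp(N(0, 1))
assert np.allclose(f_Y, stats.lognorm.pdf(t, s=1))
```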
For example, if Y = X − a for some constant a, then

FY(t) = P{Y ≤ t} = P{X − a ≤ t} = P{X ≤ t + a} = FX(t + a).
In particular, if X is continuous, then Y is also continuous with fY(t) = fX(t + a). If Y = bX with b > 0, then

FY(t) = P{bX ≤ t} = P{X ≤ t/b} = FX(t/b).

In particular, if X is continuous, then Y is also continuous with fY(t) = fX(t/b)/b.
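Both rules can be verified against a known family; the sketch below uses X ∼ N(0, 1) and illustrative constants a and b, since shifting and scaling a normal stays within the normal family.

```python
import numpy as np
from scipy import stats

# For X ~ N(0, 1) (illustrative), X - a ~ N(-a, 1) and bX ~ N(0, b^2),
# so their densities must equal f_X(t + a) and f_X(t/b)/b.
a, b = 1.0, 2.0
t = np.linspace(-4, 4, 41)
assert np.allclose(stats.norm.pdf(t, loc=-a), stats.norm.pdf(t + a))
assert np.allclose(stats.norm.pdf(t, scale=b), stats.norm.pdf(t / b) / b)
```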
• k -th moment: g(x) = xk , E[X k ]
• Bernoulli (p): random variable X has a Bernoulli(p) distribution if it takes values from X = {0, 1}, P{X = 0} = 1 − p and P{X = 1} = p. Its expectation E[X] = 1 · p + 0 · (1 − p) = p. Its second moment E[X²] = 1² · p + 0² · (1 − p) = p. Thus, its variance V(X) = E[X²] − (E[X])² = p − p² = p(1 − p). Notation: X ∼ Bernoulli(p).
• Poisson (λ): random variable X has a Poisson(λ) distribution if it takes values from X = {0, 1, 2, ...} and P{X = j} = e−λ λj/j!. As an exercise, check that E[X] = λ and V(X) = λ (see the sketch after this list). Notation: X ∼ Poisson(λ).
• Uniform (a, b): random variable X has a Uniform(a, b) distribution if its density fX (x) = 1/(b − a) for
x ∈ (a, b) and fX (x) = 0 otherwise. Notation: X ∼ U (a, b).
• Normal (µ, σ²): random variable X has a Normal(µ, σ²) distribution if its density fX(x) = exp(−(x − µ)²/(2σ²))/(√(2π)σ) for all x ∈ R. Its expectation E[X] = µ and its variance V(X) = σ². Notation: X ∼ N(µ, σ²). As an exercise, check that if X ∼ N(µ, σ²), then Y = (X − µ)/σ ∼ N(0, 1); Y is said to have a standard normal distribution. It is known that the cdf of N(µ, σ²) has no closed form, i.e. it cannot be written as a composition of elementary functions. However, there exist tables that give its approximate values. The cdf of a standard normal distribution is commonly denoted by Φ, i.e. if Y ∼ N(0, 1), then FY(t) = P{Y ≤ t} = Φ(t).
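The moment calculations and exercises above are easy to check numerically; the sketch below uses scipy.stats with illustrative parameter values.

```python
from scipy import stats

# Moments of Bernoulli(p) and Poisson(lambda), and the standard
# normal cdf Phi; p and lam are illustrative parameter values.
p, lam = 0.3, 2.5

assert abs(stats.bernoulli.mean(p) - p) < 1e-12
assert abs(stats.bernoulli.var(p) - p * (1 - p)) < 1e-12
assert abs(stats.poisson.mean(lam) - lam) < 1e-12  # E[X] = lambda
assert abs(stats.poisson.var(lam) - lam) < 1e-12   # V(X) = lambda

# Phi(t) = P{Y <= t} for Y ~ N(0, 1); Phi(0) = 0.5 by symmetry
print(stats.norm.cdf(0.0))  # 0.5
```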
Suppose now that X and Y are two random variables with joint cdf FX,Y(x, y) = P{X ≤ x, Y ≤ y}. If FX,Y is sufficiently smooth, then X and Y have a joint pdf

$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x \partial y}.$$
From the joint pdf fX,Y one can calculate the pdf of, say, X . Indeed,
$$F_X(x) = P\{X \le x\} = \int_{-\infty}^{x} \int_{-\infty}^{+\infty} f(s, t)\, dt\, ds.$$
Therefore fX(s) = ∫_{−∞}^{+∞} f(s, t) dt. The pdf of X is called marginal to emphasize that it comes from a joint pdf of X and Y.
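The marginalization step can be reproduced numerically; the sketch below integrates out y from an illustrative joint density f(x, y) = x + y on the unit square, whose marginal is fX(x) = x + 1/2.

```python
from scipy import integrate

# Recover the marginal pdf of X by integrating out y; the joint
# density f(x, y) = x + y on [0, 1]^2 is an illustrative choice.
def f_joint(x, y):
    return x + y if (0 <= x <= 1 and 0 <= y <= 1) else 0.0

for x in [0.2, 0.5, 0.9]:
    f_X, _ = integrate.quad(lambda y: f_joint(x, y), 0, 1)
    assert abs(f_X - (x + 0.5)) < 1e-8  # closed form: x + 1/2
```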
If X and Y have a joint pdf, then we can define a conditional pdf of Y given X = x (for x such that fX(x) > 0): fY|X(y|x) = fX,Y(x, y)/fX(x). The conditional pdf is a full characterization of how Y is distributed for any given X = x. The probability that Y ∈ A for some set A given that X = x can be calculated as P{Y ∈ A|X = x} = ∫_A fY|X(y|x) dy. In a similar manner we can calculate the conditional expectation of Y given X = x: E[Y|X = x] = ∫_{−∞}^{+∞} y fY|X(y|x) dy. As an exercise, think about how we can define the conditional distribution of Y given X = x if X and Y are discrete random variables.
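Continuing with the same illustrative joint density f(x, y) = x + y on the unit square, the sketch below computes the conditional pdf and the conditional expectation E[Y|X = x] by numerical integration and compares the latter with its closed form.

```python
from scipy import integrate

# Conditional pdf and conditional expectation for the illustrative
# joint density f(x, y) = x + y on the unit square.
def f_joint(x, y):
    return x + y

x = 0.5
f_X = integrate.quad(lambda y: f_joint(x, y), 0, 1)[0]          # marginal at x
f_cond = lambda y: f_joint(x, y) / f_X                          # f_{Y|X}(y|x)
E_Y_given_x = integrate.quad(lambda y: y * f_cond(y), 0, 1)[0]  # E[Y|X=x]

# closed form: E[Y|X=x] = (x/2 + 1/3) / (x + 1/2)
assert abs(E_Y_given_x - (x / 2 + 1 / 3) / (x + 0.5)) < 1e-8
```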
Two extremely useful properties of a conditional expectation are: for any random variables X and Y, E[E[Y|X]] = E[Y] (the law of iterated expectations), and E[g(X)Y|X = x] = g(x)E[Y|X = x] for any function g : R → R (anything known given X = x can be treated as a constant).
2.2 Independence
Random variables X and Y are said to be independent if fY|X(y|x) = fY(y) for all x ∈ R, i.e. if the marginal pdf of Y equals the conditional pdf of Y given X = x for all x ∈ R. Note that fY|X(y|x) = fY(y) if and only if fX,Y(x, y) = fX(x)fY(y). If X and Y are independent, then g(X) and f(Y) are also independent for any functions g : R → R and f : R → R. In addition, if X and Y are independent, then E[XY] = E[X]E[Y].
Indeed,
$$\begin{aligned}
E[XY] &= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} xy\, f_{X,Y}(x, y)\, dx\, dy \\
&= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} xy\, f_X(x) f_Y(y)\, dx\, dy \\
&= \int_{-\infty}^{+\infty} x f_X(x)\, dx \int_{-\infty}^{+\infty} y f_Y(y)\, dy \\
&= E[X]E[Y].
\end{aligned}$$
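A quick Monte Carlo check of this identity, with illustrative independent draws X ∼ U(0, 1) and Y ∼ N(2, 1):

```python
import numpy as np

# E[XY] = E[X]E[Y] for independent X and Y; the two
# distributions below are illustrative choices.
rng = np.random.default_rng(0)
n = 1_000_000
X = rng.uniform(0, 1, n)
Y = rng.normal(2, 1, n)

print(np.mean(X * Y), np.mean(X) * np.mean(Y))  # both close to 0.5 * 2 = 1
```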
2.3 Covariance
For any two random variables X and Y we can define covariance as cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]. Covariance has several useful properties, including:
2. cov(aX, bY) = ab cov(X, Y) for any random variables X and Y and any constants a and b
3. cov(X + a, Y ) = cov(X, Y ) for any random variables X and Y and any constant a
To prove property 5, i.e. the Cauchy–Schwarz inequality (cov(X, Y))² ≤ V(X)V(Y), consider the random variable X − aY with a = cov(X, Y)/V(Y). On the one hand, its variance V(X − aY) ≥ 0. On the other hand,

$$V(X - aY) = V(X) - 2a\,\mathrm{cov}(X, Y) + a^2 V(Y) = V(X) - 2(\mathrm{cov}(X, Y))^2/V(Y) + (\mathrm{cov}(X, Y))^2/V(Y) = V(X) - (\mathrm{cov}(X, Y))^2/V(Y).$$

Thus, the last expression is nonnegative as well. Multiplying it by V(Y) and rearranging yields the result.
The correlation of two random variables X and Y is defined by corr(X, Y) = cov(X, Y)/√(V(X)V(Y)).
By property 5, |corr(X, Y)| ≤ 1. If |corr(X, Y)| = 1, then X and Y are linearly dependent, i.e. there exist constants a and b such that X = a + bY.
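The bound |corr(X, Y)| ≤ 1 and the equality case under exact linear dependence can be seen in simulation; the constants a = 1 and b = −2 below are illustrative.

```python
import numpy as np

# |corr(X, Y)| <= 1, with equality under exact linear dependence.
rng = np.random.default_rng(0)
Y = rng.normal(size=100_000)
X_noisy = 1 - 2 * Y + rng.normal(size=Y.size)  # not exactly linear in Y
X_linear = 1 - 2 * Y                           # X = a + bY exactly

print(np.corrcoef(X_noisy, Y)[0, 1])   # strictly inside (-1, 1)
print(np.corrcoef(X_linear, Y)[0, 1])  # -1.0 up to rounding
```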
3 Normal Random Variables
Let us begin with the definition of a multivariate normal distribution. Let Σ be a positive definite n × n matrix. Recall that the n × n matrix Σ is positive definite if a^T Σa > 0 for any non-zero n × 1 vector a. Here the superscript T denotes transposition. Let µ be an n × 1 vector. Then X ∼ N(µ, Σ) if X is continuous and
its pdf is given by
$$f_X(x) = \frac{\exp(-(x - \mu)^T \Sigma^{-1} (x - \mu)/2)}{(2\pi)^{n/2} \sqrt{\det(\Sigma)}}$$
for any n × 1 vector x.
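The formula can be checked against scipy.stats.multivariate_normal for an illustrative choice of µ and positive definite Σ:

```python
import numpy as np
from scipy import stats

# Evaluate the multivariate normal pdf formula directly and compare
# with scipy; mu, Sigma, and x are illustrative choices.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([0.5, 0.0])

n = len(mu)
d = x - mu
pdf_formula = np.exp(-d @ np.linalg.inv(Sigma) @ d / 2) / (
    (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
)
assert np.isclose(pdf_formula, stats.multivariate_normal.pdf(x, mu, Sigma))
```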
A normal distribution has several useful properties:
1. if X ∼ N (µ, Σ), then Σij = cov(Xi , Xj ) for any i, j = 1, ..., n where X = (X1 , ..., Xn )T
3. if X ∼ N (µ, Σ), then any subset of components of X is normal as well. In particular, Xi ∼ N (µi , Σii )
4. if X and Y are jointly normal and uncorrelated, then X and Y are independent. As an exercise, check this statement
5. if X ∼ N(µX, σX²), Y ∼ N(µY, σY²), and X and Y are independent, then X + Y ∼ N(µX + µY, σX² + σY²)
6. Any linear combination of normals is normal. That is, if X ∼ N(µ, Σ) is an n × 1 dimensional normal vector, and A is a fixed k × n full-rank matrix with k ≤ n, then Y = AX is a normal k × 1 vector: Y ∼ N(Aµ, AΣA^T) (see the sketch below).
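As a sketch of property 6, the simulation below draws from an illustrative bivariate normal and checks that Y = AX has mean Aµ and variance AΣA^T for a 1 × 2 matrix A.

```python
import numpy as np

# Monte Carlo check of property 6: Y = AX ~ N(A mu, A Sigma A^T);
# mu, Sigma, and A are illustrative choices.
rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 1.0]])  # k = 1: Y = X1 + X2

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Y = X @ A.T

print(Y.mean(), (A @ mu)[0])             # both close to 0
print(Y.var(), (A @ Sigma @ A.T)[0, 0])  # both close to 2 + 1 + 2*0.5 = 4
```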
Let

$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}$$

be the covariance matrix of a 2 × 1 normal random vector X = (X1, X2)^T with mean µ = (µ1, µ2)^T. Note that Σ12 = Σ21 = σ12 since cov(X1, X2) = cov(X2, X1). From linear algebra, we know that det(Σ) = σ11σ22 − σ12² and

$$\Sigma^{-1} = \frac{1}{\det(\Sigma)} \begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}.$$
Thus the pdf of X is

$$f_X(x_1, x_2) = \frac{\exp\{-[(x_1 - \mu_1)^2 \sigma_{22} + (x_2 - \mu_2)^2 \sigma_{11} - 2(x_1 - \mu_1)(x_2 - \mu_2)\sigma_{12}]/(2\det(\Sigma))\}}{2\pi\sqrt{\det(\Sigma)}}.$$

Since X2 ∼ N(µ2, σ22) by property 3, the conditional pdf of X1 given X2 = x2 is

$$\begin{aligned}
f_{X_1|X_2}(x_1|x_2) &= \frac{f_X(x_1, x_2)}{f_{X_2}(x_2)} \\
&= \frac{\exp\{-[(x_1 - \mu_1)^2 \sigma_{22} + (x_2 - \mu_2)^2 \sigma_{12}^2/\sigma_{22} - 2(x_1 - \mu_1)(x_2 - \mu_2)\sigma_{12}]/(2\det(\Sigma))\}}{\sqrt{2\pi}\sqrt{\det(\Sigma)/\sigma_{22}}} \\
&= \frac{\exp\{-[(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2 \sigma_{12}^2/\sigma_{22}^2 - 2(x_1 - \mu_1)(x_2 - \mu_2)\sigma_{12}/\sigma_{22}]/(2\det(\Sigma)/\sigma_{22})\}}{\sqrt{2\pi}\sqrt{\det(\Sigma)/\sigma_{22}}} \\
&= \frac{\exp\{-[x_1 - \mu_1 - (x_2 - \mu_2)\sigma_{12}/\sigma_{22}]^2/(2\det(\Sigma)/\sigma_{22})\}}{\sqrt{2\pi}\sqrt{\det(\Sigma)/\sigma_{22}}} \\
&= \frac{\exp\{-(x_1 - \tilde\mu)^2/(2\tilde\sigma)\}}{\sqrt{2\pi}\sqrt{\tilde\sigma}},
\end{aligned}$$

where µ̃ = µ1 + (x2 − µ2)σ12/σ22 and σ̃ = det(Σ)/σ22. Note that the last expression equals the pdf of a normal random variable with mean µ̃ and variance σ̃. Therefore, conditional on X2 = x2, we have X1 ∼ N(µ̃, σ̃).
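The bivariate result above can also be checked by simulation: conditioning on X2 falling in a small window around x2, the sample mean and variance of X1 should approach µ̃ and σ̃. This is a minimal sketch with illustrative parameter values.

```python
import numpy as np

# Conditional distribution of X1 given X2 ~= x2 for a bivariate
# normal; mu1, mu2, and the entries of Sigma are illustrative.
rng = np.random.default_rng(0)
mu1, mu2 = 1.0, -1.0
s11, s22, s12 = 2.0, 1.0, 0.5
Sigma = np.array([[s11, s12], [s12, s22]])

X = rng.multivariate_normal([mu1, mu2], Sigma, size=2_000_000)
x2 = 0.0
sel = np.abs(X[:, 1] - x2) < 0.01  # condition on X2 close to x2
x1 = X[sel, 0]

mu_tilde = mu1 + (x2 - mu2) * s12 / s22
sigma_tilde = (s11 * s22 - s12**2) / s22  # det(Sigma)/sigma_22
print(x1.mean(), mu_tilde)    # both close to 1.5
print(x1.var(), sigma_tilde)  # both close to 1.75
```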
MIT OpenCourseWare
https://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms