Math556 05 Inequalities
SOME INEQUALITIES
Expectation Inequalities
JENSEN’S INEQUALITY
Jensen’s Inequality gives a lower bound on expectations of convex functions. Recall that a function
g(x) is convex if, for 0 < λ < 1,

g(λx + (1 − λ)y) ≤ λ g(x) + (1 − λ) g(y)

for all x and y. Alternatively, if the derivatives are well defined, the function g(x) is convex if
d^2/dt^2 {g(t)}|_{t=x} = g^{(2)}(x) ≥ 0.
Correspondingly, g(x) is concave if −g(x) is convex.
Theorem (JENSEN'S INEQUALITY) If g(x) is convex, then

E_X[g(X)] ≥ g(E_X[X]),

with equality if and only if, for every line l(x) = a + bx that is tangent to g at x = E_X[X],

P_X[g(X) = a + bX] = 1.

Proof Write µ = E_X[X] and let l(x) = a + bx be a tangent line to g at x = µ, so that g(x) ≥ l(x) for all x and l(µ) = g(µ) (see the figure below). Then E_X[g(X)] ≥ E_X[l(X)] = a + b E_X[X] = g(µ), which is the inequality. Now suppose that E_X[g(X)] = g(µ) but that P_X[g(X) = a + bX] < 1 for this tangent line. Then g(X) > l(X) with positive probability, and hence

E_X[g(X)] > E_X[l(X)].

But l(x) is linear, so E_X[l(X)] = a + b E_X[X] = g(µ), yielding the contradiction E_X[g(X)] > g(µ) = E_X[g(X)].
[Figure: a convex function g(x) and the tangent line l(x) = a + bx at µ = 2, with g(x) ≥ l(x) for all x.]
More generally, for weights λ_1, ..., λ_n ≥ 0 with λ_1 + ⋯ + λ_n = 1,

g( ∑_{j=1}^{n} λ_j x_j ) ≤ ∑_{j=1}^{n} λ_j g(x_j)

for all vectors (x_1, ..., x_n); this follows by induction using the original definition. We may restate this as

g(E_n[X]) ≤ E_n[g(X)]    (1)

where

E_n[X] = ∫ x dF_n(x),    E_n[g(X)] = ∫ g(x) dF_n(x)
where Fn is the cdf of the discrete distribution on {x1 , . . . , xn } with associated probability masses
{λ1 , . . . , λn }, that is,
F_n(x) = ∑_{j=1}^{n} λ_j I_{[x_j, ∞)}(x).
Now, for any F_X, we can find infinite sequences {(x_j, λ_j), j = 1, 2, ...} such that, for all x,

F_n(x) → F_X(x) as n → ∞

– this is stated pointwise here, but the convergence also holds functionwise. Also, as g is convex, it is continuous. Therefore we may pass limits through the integrals in (1) and conclude that

g(E_X[X]) ≤ E_X[g(X)]

whenever the expectations exist.
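As a quick numerical aside (not part of the notes), the following Python sketch checks g(E_X[X]) ≤ E_X[g(X)] by Monte Carlo; the convex function g(x) = x^2 and the Exponential(1) sampling distribution are arbitrary choices for illustration, and numpy is assumed to be available.

```python
import numpy as np

# Monte Carlo check of Jensen's inequality: for convex g, g(E[X]) <= E[g(X)].
# g(x) = x^2 and X ~ Exponential(1) are arbitrary choices; here E[X] = 1
# and E[X^2] = 2, so the gap should be close to 1.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

lhs = x.mean() ** 2       # estimate of g(E[X])
rhs = (x ** 2).mean()     # estimate of E[g(X)]

print(f"g(E[X]) ~ {lhs:.3f} <= E[g(X)] ~ {rhs:.3f}")
assert lhs <= rhs
```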
CAUCHY-SCHWARZ INEQUALITY
Theorem
For random variable X and functions g_1(·) and g_2(·), we have that

{E_X[g_1(X)g_2(X)]}^2 ≤ E_X[{g_1(X)}^2] E_X[{g_2(X)}^2]    (2)

with equality if and only if either E_X[{g_j(X)}^2] = 0 for j = 1 or 2, or g_1(X) = c g_2(X) with probability one for some c ≠ 0.
Proof Let X_1 = g_1(X) and X_2 = g_2(X), and let a and b be real constants. Then

0 ≤ E_X[(aX_1 ± bX_2)^2] = a^2 E_X[X_1^2] ± 2ab E_X[X_1X_2] + b^2 E_X[X_2^2].
Set a^2 = E_X[X_2^2] and b^2 = E_X[X_1^2]. If either a or b is zero, the inequality clearly holds. We may thus consider E_X[X_1^2], E_X[X_2^2] > 0: we have
2 E_X[X_1^2] E_X[X_2^2] + 2 {E_X[X_1^2] E_X[X_2^2]}^{1/2} E_X[X_1X_2] ≥ 0

2 E_X[X_1^2] E_X[X_2^2] − 2 {E_X[X_1^2] E_X[X_2^2]}^{1/2} E_X[X_1X_2] ≥ 0

and dividing through by 2 {E_X[X_1^2] E_X[X_2^2]}^{1/2} > 0 gives

−{E_X[X_1^2] E_X[X_2^2]}^{1/2} ≤ E_X[X_1X_2] ≤ {E_X[X_1^2] E_X[X_2^2]}^{1/2},

that is, {E_X[X_1X_2]}^2 ≤ E_X[X_1^2] E_X[X_2^2] or, in the original form,

{E_X[g_1(X)g_2(X)]}^2 ≤ E_X[{g_1(X)}^2] E_X[{g_2(X)}^2].
We examine the case of equality:
{E_X[g_1(X)g_2(X)]}^2 = E_X[{g_1(X)}^2] E_X[{g_2(X)}^2]    (3)
If E_X[{g_j(X)}^2] = 0 for j = 1 or 2, then g_j(X) is zero with probability one, that is, P_X[g_j(X) = 0] = 1.
Clearly the left-hand side of (2) is non-negative, so we must have equality as the right-hand side is
zero. So suppose E_X[{g_j(X)}^2] > 0 for j = 1, 2, but g_1(X) = c g_2(X) with probability one for some c ≠ 0. In this case we replace g_1(X) by c g_2(X) in the left- and right-hand sides of (2) to conclude that

{E_X[c{g_2(X)}^2]}^2 = E_X[{c g_2(X)}^2] E_X[{g_2(X)}^2] = c^2 {E_X[{g_2(X)}^2]}^2

so that equality again holds.
Conversely, suppose that (3) holds with E_X[{g_j(X)}^2] > 0 for j = 1, 2, and let c be a constant, to be chosen below. Let Z = g_1(X) − c g_2(X). For a contradiction, assume that Z is not zero with probability 1: we have

E_X[Z^2] = E_X[{g_1(X)}^2] − 2c E_X[g_1(X)g_2(X)] + c^2 E_X[{g_2(X)}^2]

which is strictly positive. However the right-hand side can be written
E_X[{g_1(X)}^2] − ( E_X[g_1(X)g_2(X)] / {E_X[{g_2(X)}^2]}^{1/2} )^2 + ( c {E_X[{g_2(X)}^2]}^{1/2} − E_X[g_1(X)g_2(X)] / {E_X[{g_2(X)}^2]}^{1/2} )^2.
Now if we set

c = E_X[g_1(X)g_2(X)] / E_X[{g_2(X)}^2]

the second term is zero, so we must then have

E_X[{g_1(X)}^2] − {E_X[g_1(X)g_2(X)]}^2 / E_X[{g_2(X)}^2] > 0,    that is,    {E_X[g_1(X)g_2(X)]}^2 < E_X[{g_1(X)}^2] E_X[{g_2(X)}^2],

but this contradicts assumption (3). Hence Z must be zero with probability 1, that is,

g_1(X) = c g_2(X)

with probability 1.
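As an illustrative aside (not part of the proof), the following Python sketch verifies (2) numerically for one choice of g_1 and g_2, and then the equality case g_1(X) = c g_2(X); the functions, the constant c = 3, and the N(0, 1) sampling distribution are arbitrary choices, and numpy is assumed available.

```python
import numpy as np

# Monte Carlo sketch of the Cauchy-Schwarz inequality (2).
# g1(x) = x, g2(x) = sin(x) and X ~ N(0,1) are arbitrary choices.
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)
g1, g2 = x, np.sin(x)

lhs = np.mean(g1 * g2) ** 2
rhs = np.mean(g1 ** 2) * np.mean(g2 ** 2)
print(f"{lhs:.4f} <= {rhs:.4f}")   # strict inequality for this choice

# Equality case (3): g1(X) = c * g2(X) with probability one, here c = 3.
g1 = 3 * g2
print(np.isclose(np.mean(g1 * g2) ** 2, np.mean(g1 ** 2) * np.mean(g2 ** 2)))
```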
HÖLDER’S INEQUALITY
Lemma Let a, b > 0 and p, q > 1 satisfy
p^{-1} + q^{-1} = 1.    (4)

Then

p^{-1} a^p + q^{-1} b^q ≥ ab

with equality if and only if a^p = b^q.

Proof Fix b > 0. Let

g(a; b) = p^{-1} a^p + q^{-1} b^q − ab.
We require that g(a; b) ≥ 0 for all a. Differentiating wrt a for fixed b yields g^{(1)}(a; b) = a^{p−1} − b, so that g(a; b) is minimized (the second derivative is strictly positive at all a) when a^{p−1} = b, and at this value of a, the function takes the value

p^{-1} b^{p/(p−1)} + q^{-1} b^q − b^{1/(p−1)} b = p^{-1} b^q + q^{-1} b^q − b^q = 0

as, by equation (4), 1/p + 1/q = 1 ⟹ (p − 1)q = p, so that b^{p/(p−1)} = b^q. As the second derivative is strictly positive at all a, the minimum is attained at the unique value of a where a^{p−1} = b, and raising both sides of this identity to the power q yields a^p = b^q, the condition for equality.
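A brief numerical check of the Lemma (an aside, not from the notes): for fixed p and b, g(a; b) = p^{-1}a^p + q^{-1}b^q − ab should be non-negative, with its minimum, zero, attained where a^{p−1} = b. The values p = 3 and b = 2 are arbitrary choices; numpy is assumed available.

```python
import numpy as np

# Check p^{-1} a^p + q^{-1} b^q >= a*b on a grid, with minimum at a^(p-1) = b.
p = 3.0
q = p / (p - 1.0)                       # so that 1/p + 1/q = 1
b = 2.0

a = np.linspace(0.01, 3.0, 1000)
gap = a ** p / p + b ** q / q - a * b   # g(a; b) from the proof
assert np.all(gap >= 0.0)               # non-negative on the grid

a_star = a[np.argmin(gap)]              # grid minimiser, approx b^(1/(p-1))
print(a_star, b ** (1.0 / (p - 1.0)), gap.min())
```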
Theorem For random variables X and Y, and p, q > 1 satisfying (4),

|E_{X,Y}[XY]| ≤ E_{X,Y}[|XY|] ≤ {E_X[|X|^p]}^{1/p} {E_Y[|Y|^q]}^{1/q}.
Proof (Absolutely continuous case: discrete case similar) For the first inequality,
E_{X,Y}[|XY|] = ∫∫ |xy| f_{X,Y}(x, y) dx dy ≥ ∫∫ xy f_{X,Y}(x, y) dx dy = E_{X,Y}[XY]

and

E_{X,Y}[XY] = ∫∫ xy f_{X,Y}(x, y) dx dy ≥ ∫∫ −|xy| f_{X,Y}(x, y) dx dy = −E_{X,Y}[|XY|]

so

−E_{X,Y}[|XY|] ≤ E_{X,Y}[XY] ≤ E_{X,Y}[|XY|]    ∴    |E_{X,Y}[XY]| ≤ E_{X,Y}[|XY|].
For the second inequality, set
a = |X| / {E_X[|X|^p]}^{1/p},    b = |Y| / {E_Y[|Y|^q]}^{1/q}.
Then, by the Lemma,

p^{-1} |X|^p / E_X[|X|^p] + q^{-1} |Y|^q / E_Y[|Y|^q] ≥ |XY| / ( {E_X[|X|^p]}^{1/p} {E_Y[|Y|^q]}^{1/q} ).

Taking expectations of both sides, the left-hand side becomes

p^{-1} E_X[|X|^p] / E_X[|X|^p] + q^{-1} E_Y[|Y|^q] / E_Y[|Y|^q] = p^{-1} + q^{-1} = 1

and hence E_{X,Y}[|XY|] ≤ {E_X[|X|^p]}^{1/p} {E_Y[|Y|^q]}^{1/q}, as required.
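The following Python sketch (an aside, not from the notes) checks Hölder's Inequality by Monte Carlo for one arbitrary choice: p = 3, q = 3/2, and (X, Y) a correlated pair of standard normals; numpy is assumed available.

```python
import numpy as np

# Monte Carlo sketch of Hoelder's inequality with p = 3, q = 3/2.
rng = np.random.default_rng(2)
p, q = 3.0, 1.5
z = rng.standard_normal((2, 1_000_000))
x, y = z[0], 0.6 * z[0] + 0.8 * z[1]     # correlated N(0,1) variables

lhs = np.mean(np.abs(x * y))
rhs = np.mean(np.abs(x) ** p) ** (1 / p) * np.mean(np.abs(y) ** q) ** (1 / q)
print(f"E[|XY|] ~ {lhs:.4f} <= {rhs:.4f}")
assert lhs <= rhs
```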
Theorem (CAUCHY-SCHWARZ INEQUALITY REVISITED)
Suppose that X and Y are two random variables. Then

|E_{X,Y}[XY]| ≤ E_{X,Y}[|XY|] ≤ {E_X[|X|^2]}^{1/2} {E_Y[|Y|^2]}^{1/2}.

This follows from Hölder's Inequality on taking p = q = 2.
Lyapunov's Inequality: Define Y = 1 with probability one. Then, for 1 < p < ∞,
E_X[|X|] ≤ {E_X[|X|^p]}^{1/p}.

Let 1 < r < p. Then, applying this result to |X|^r,

E_X[|X|^r] ≤ {E_X[|X|^{pr}]}^{1/p}

and letting s = pr > r yields

E_X[|X|^r] ≤ {E_X[|X|^s]}^{r/s}

so that

{E_X[|X|^r]}^{1/r} ≤ {E_X[|X|^s]}^{1/s}
for 1 < r < s < ∞.
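Numerically (an aside, not from the notes), Lyapunov's Inequality says that the map r ↦ {E_X[|X|^r]}^{1/r} is non-decreasing; the sketch below checks this on a grid of r values for an arbitrary choice of distribution (Exponential(1)), assuming numpy is available.

```python
import numpy as np

# Check that {E[|X|^r]}^(1/r) is non-decreasing in r (Lyapunov's inequality).
rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)

rs = [1.0, 1.5, 2.0, 3.0, 4.0]
norms = [np.mean(np.abs(x) ** r) ** (1.0 / r) for r in rs]
print(np.round(norms, 3))                       # increasing sequence
assert all(a <= b for a, b in zip(norms, norms[1:]))
```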
Concentration and Tail Probability Inequalities
Lemma (CHEBYCHEV’S LEMMA) If X is a random variable, then for non-negative function h, and
c > 0,
P_X[h(X) ≥ c] ≤ E_X[h(X)] / c.
Proof (continuous case) : Suppose that X has density function fX which is positive for x ∈ X. Let
A = {x ∈ X : h(x) ≥ c} ⊆ X. Then, as h(x) ≥ c on A,
E_X[h(X)] = ∫ h(x) f_X(x) dx = ∫_A h(x) f_X(x) dx + ∫_{A'} h(x) f_X(x) dx

≥ ∫_A h(x) f_X(x) dx

≥ ∫_A c f_X(x) dx = c P_X[X ∈ A] = c P_X[h(X) ≥ c]
Alternatively stated (by Casella and Berger) as follows: if P[Y ≥ 0] = 1 and P[Y = 0] < 1, then for any r > 0,

P_Y[Y ≥ r] ≤ E_Y[Y] / r
with equality if and only if
PY [Y = r] = p = 1 − PY [Y = 0]
for some 0 < p ≤ 1.
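As a quick check (an aside, not from the notes), the sketch below verifies the Lemma numerically with h(x) = x^2, c = 4 and X ~ N(0, 1), all arbitrary choices; numpy is assumed available.

```python
import numpy as np

# Monte Carlo check of Chebychev's Lemma: P[h(X) >= c] <= E[h(X)] / c.
rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)
h = x ** 2
c = 4.0

prob = np.mean(h >= c)     # empirical P[h(X) >= c]
bound = h.mean() / c       # E[h(X)] / c
print(f"P[h(X) >= {c}] ~ {prob:.4f} <= {bound:.4f}")
assert prob <= bound
```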
Chebyshev's Inequality follows on taking h(x) = (x − µ)^2 and c = k^2σ^2 in the Lemma, where µ = E_X[X], σ^2 = Var_X[X] > 0 and k > 0:

P_X[(X − µ)^2 ≥ k^2σ^2] ≤ E_X[(X − µ)^2] / (k^2σ^2) = 1/k^2,

or equivalently

P_X[|X − µ| ≥ kσ] ≤ 1/k^2.
Setting ϵ = kσ gives

P_X[|X − µ| ≥ ϵ] ≤ σ^2/ϵ^2

or equivalently

P_X[|X − µ| < ϵ] ≥ 1 − σ^2/ϵ^2.
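The same kind of check applies to Chebyshev's Inequality itself (again an aside; the Gamma(2, 1) distribution and k = 2 are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of P[|X - mu| >= k*sigma] <= 1/k^2 for X ~ Gamma(2, 1).
rng = np.random.default_rng(5)
x = rng.gamma(shape=2.0, scale=1.0, size=1_000_000)
mu, sigma = 2.0, np.sqrt(2.0)   # exact mean and standard deviation of Gamma(2, 1)
k = 2.0

prob = np.mean(np.abs(x - mu) >= k * sigma)
print(f"P[|X - mu| >= {k} sigma] ~ {prob:.4f} <= {1 / k**2:.4f}")
assert prob <= 1 / k ** 2
```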
Theorem (TAIL BOUNDS FOR THE NORMAL DENSITY)
If Z ∼ N (0, 1), then for t > 0
√(2/π) · t/(1 + t^2) · e^{−t^2/2} ≤ P_Z[|Z| ≥ t] ≤ √(2/π) · (1/t) · e^{−t^2/2}.
Proof By symmetry, P_Z[|Z| ≥ t] = 2 P_Z[Z ≥ t], so

P_Z[|Z| ≥ t] = √(2/π) ∫_t^∞ e^{−x^2/2} dx ≤ √(2/π) (1/t) ∫_t^∞ x e^{−x^2/2} dx = √(2/π) (1/t) e^{−t^2/2},

which is the upper bound. For the lower bound,

∫_t^∞ e^{−x^2/2} dx = ∫_t^∞ (1/x) x e^{−x^2/2} dx = (1/t) e^{−t^2/2} − ∫_t^∞ (1/x^2) e^{−x^2/2} dx ≥ (1/t) e^{−t^2/2} − (1/t^2) ∫_t^∞ e^{−x^2/2} dx

after writing 1 = x/x, then integrating by parts, and then noting that, on (t, ∞), x > t ⟺ 1/x^2 < 1/t^2, and that the integrand is non-negative. Therefore, combining terms,

(1 + 1/t^2) ∫_t^∞ e^{−x^2/2} dx ≥ (1/t) e^{−t^2/2}

so that

P_Z[|Z| ≥ t] = √(2/π) ∫_t^∞ e^{−x^2/2} dx ≥ √(2/π) · t/(1 + t^2) · e^{−t^2/2}.
To see the quality of the approximation, the table below shows the values of the lower and upper bounds for t ranging from 1 to 5. Clearly the bounds improve as t gets larger.
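A short computational sketch (not part of the notes) that produces such a table, assuming numpy and scipy are available:

```python
import numpy as np
from scipy.stats import norm

# Lower bound, exact tail probability P[|Z| >= t], and upper bound for t = 1,...,5.
for t in range(1, 6):
    lower = np.sqrt(2 / np.pi) * t / (1 + t ** 2) * np.exp(-t ** 2 / 2)
    exact = 2 * norm.sf(t)                   # exact P[|Z| >= t]
    upper = np.sqrt(2 / np.pi) / t * np.exp(-t ** 2 / 2)
    print(f"t = {t}: {lower:.3e} <= {exact:.3e} <= {upper:.3e}")
```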