
556: MATHEMATICAL STATISTICS I

SOME INEQUALITIES

Expectation Inequalities
JENSEN’S INEQUALITY
Jensen’s Inequality gives a lower bound on expectations of convex functions. Recall that a function
g(x) is convex if, for 0 < λ < 1,

g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y)

for all x and y. Alternatively, if the second derivative is well defined, the function g(x) is convex if

g^{(2)}(x) = d^2/dx^2 {g(x)} ≥ 0

for all x. Correspondingly, g(x) is concave if −g(x) is convex.

Theorem (JENSEN’S INEQUALITY)


Suppose that X is a random variable with expectation µ, and function g is convex and finite. Then

EX [g(X)] ≥ g(EX [X])


with equality if and only if, for every line a + bx that is tangent to g at µ,

PX[g(X) = a + bX] = 1,

that is, g coincides with a linear function on the support of X.


Proof Let l(x) = a + bx be the equation of the tangent at x = µ. Then, for each x, g(x) ≥ a + bx as in
the figure. Thus

EX [g(X)] ≥ EX [a + bX] = a + bEX [X] = l(µ) = g(µ) = g(EX [X])


as required. Also, if g(x) is linear, then equality follows by properties of expectations. Conversely, suppose that

EX[g(X)] = g(EX[X]) = g(µ)

but that g(x) is convex and not linear. Let l(x) = a + bx be the tangent to g at µ. Then by convexity, g(x) − l(x) ≥ 0 for all x, with strict inequality on a set of positive FX-probability, so

∫ (g(x) − l(x)) dFX(x) = ∫ g(x) dFX(x) − ∫ l(x) dFX(x) > 0

and hence

EX[g(X)] > EX[l(X)].

But l(x) is linear, so EX[l(X)] = a + bEX[X] = g(µ), yielding the contradiction

EX[g(X)] > g(EX[X]),

and the result follows.

Figure 1: The function g(x) and its tangent l(x) = a + bx at x = µ = 2.

• If g(x) is concave, then

EX[g(X)] ≤ g(EX[X]).

• g(x) = x^2 is convex, thus

EX[X^2] ≥ {EX[X]}^2.

• g(x) = log x is concave, thus

EX[log X] ≤ log{EX[X]}.
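A quick Monte Carlo check of the last two examples (a sketch only; the Exponential(1) choice for X and the sample size are assumptions made purely for illustration):

```python
import numpy as np

# Monte Carlo illustration of Jensen's inequality for X ~ Exponential(1):
# check E[X^2] >= (E[X])^2 (convex g) and E[log X] <= log E[X] (concave g).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

print(np.mean(x**2), np.mean(x)**2)            # approx. 2.0 >= 1.0
print(np.mean(np.log(x)), np.log(np.mean(x)))  # approx. -0.577 <= 0.0
```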

Alternative approach to Jensen’s Inequality:


We may use the general definition of convexity to prove the result by using the fact that the distribution
FX can be viewed as a limiting function derived from a sequence of discrete cdfs. We have that g(x) is
convex if, for n ≥ 2 and constants λj, j = 1, . . . , n, with 0 < λj < 1 and λ1 + · · · + λn = 1,

g( ∑_{j=1}^{n} λj xj ) ≤ ∑_{j=1}^{n} λj g(xj)

for all vectors (x1 , . . . , xn ); this follows by induction using the original definition. We may regard this
statement as stating
g(En[X]) ≤ En[g(X)]     (1)

where

En[X] = ∫ x dFn(x),     En[g(X)] = ∫ g(x) dFn(x),

and Fn is the cdf of the discrete distribution on {x1, . . . , xn} with associated probability masses
{λ1, . . . , λn}, that is,

Fn(x) = ∑_{j=1}^{n} λj I_{[xj,∞)}(x).

Now, for any FX , we can find infinite sequences {(xj , λj ), j = 1, 2, . . .} such that for all x

lim_{n→∞} Fn(x) = FX(x)

– this is stated pointwise here, but convergence functionwise also holds. Also, as g is convex, it is also
continuous. Therefore we may pass limits through the integrals and note that

lim_{n→∞} En[X] = EX[X],     lim_{n→∞} En[g(X)] = EX[g(X)]

which yields Jensen’s inequality by substitution into (1).
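As a concrete sketch of this limiting argument (the quantile-based discretization and the Exponential(1) example are assumptions made for illustration): place mass 1/n at n quantiles of FX and check that En[g(X)] approaches EX[g(X)] for the convex function g(x) = x^2, while (1) holds at every n.

```python
import numpy as np

def g(x):
    return x**2   # a convex function; E[g(X)] = 2 for X ~ Exponential(1)

for n in (10, 100, 1000, 10000):
    # discrete approximation: mass 1/n at the quantiles of order (j - 0.5)/n
    u = (np.arange(1, n + 1) - 0.5) / n
    x = -np.log(1.0 - u)             # Exponential(1) quantile function
    lam = np.full(n, 1.0 / n)
    En_X = np.sum(lam * x)
    En_gX = np.sum(lam * g(x))
    print(n, En_gX, g(En_X) <= En_gX)   # En[g(X)] -> 2, and (1) holds for each n
```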

CAUCHY-SCHWARZ INEQUALITY

Theorem
For random variable X and functions g1 () and g2 (), we have that

{EX[g1(X)g2(X)]}^2 ≤ EX[{g1(X)}^2] EX[{g2(X)}^2]     (2)

with equality if and only if either EX[{g1(X)}^2] = 0 or EX[{g2(X)}^2] = 0, or

PX[g1(X) = cg2(X)] = 1

for some c ≠ 0.
Proof Let X1 = g1 (X) and X2 = g2 (X), and let

Y1 = aX1 + bX2 Y2 = aX1 − bX2

and as EY1[Y1^2], EY2[Y2^2] ≥ 0, we have that

a^2 EX[X1^2] + b^2 EX[X2^2] + 2ab EX[X1 X2] ≥ 0

a^2 EX[X1^2] + b^2 EX[X2^2] − 2ab EX[X1 X2] ≥ 0

Set a^2 = EX[X2^2] and b^2 = EX[X1^2]. If either a or b is zero, the inequality clearly holds. We may thus
consider EX[X1^2], EX[X2^2] > 0: we have

2 EX[X1^2] EX[X2^2] + 2{EX[X1^2] EX[X2^2]}^{1/2} EX[X1 X2] ≥ 0

2 EX[X1^2] EX[X2^2] − 2{EX[X1^2] EX[X2^2]}^{1/2} EX[X1 X2] ≥ 0

Rearranging, we obtain that

−{EX[X1^2] EX[X2^2]}^{1/2} ≤ EX[X1 X2] ≤ {EX[X1^2] EX[X2^2]}^{1/2}

that is, {EX[X1 X2]}^2 ≤ EX[X1^2] EX[X2^2], or, in the original form,

{EX[g1(X)g2(X)]}^2 ≤ EX[{g1(X)}^2] EX[{g2(X)}^2].

We examine the case of equality:

{EX[g1(X)g2(X)]}^2 = EX[{g1(X)}^2] EX[{g2(X)}^2]     (3)

If EX[{gj(X)}^2] = 0 for j = 1 or 2, then gj(X) = 0 with probability one, so the left-hand side of (2) is
also zero and equality holds. So suppose EX[{gj(X)}^2] > 0 for j = 1, 2, but g1(X) = cg2(X) with
probability one for some c ≠ 0. In this case we substitute cg2(X) for g1(X) on the left- and right-hand
sides of (2) to conclude that

{EX[c{g2(X)}^2]}^2 = c^2 {EX[{g2(X)}^2]}^2 = EX[{cg2(X)}^2] EX[{g2(X)}^2]

and equality follows.


For the converse, assume that (3) holds. If both sides equate to zero, then we must have at least one
term on the right-hand side equal to zero, so EX [{gj (X)}2 ] = 0 for j = 1 or 2. If both sides equate to a
positive constant then both EX [{gj (X)}2 ] > 0. By assumption, we may write

EX[{g1(X)}^2] = {EX[g1(X)g2(X)]}^2 / EX[{g2(X)}^2],

say. Let Z = g1 (X) − cg2 (X). For a contradiction, assume that Z is not zero with probability 1: we have

E[Z^2] = E[{g1(X)}^2] + c^2 E[{g2(X)}^2] − 2c E[g1(X)g2(X)]

which is strictly positive. However, the right-hand side can be written

E[{g1(X)}^2] + ( c{E[{g2(X)}^2]}^{1/2} − E[g1(X)g2(X)] / {E[{g2(X)}^2]}^{1/2} )^2 − ( E[g1(X)g2(X)] / {E[{g2(X)}^2]}^{1/2} )^2.

Now if we set

c = E[g1(X)g2(X)] / E[{g2(X)}^2]

the second term is zero, so we must then have

E[{g1(X)}^2] − {E[g1(X)g2(X)]}^2 / E[{g2(X)}^2] > 0

but this contradicts assumption (3). Hence Z must be zero with probability 1, that is

g1 (X) = cg2 (X)

with probability 1.
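A numerical sanity check of inequality (2) (a sketch; the functions g1(x) = x, g2(x) = exp(−x) and X ~ Uniform(0, 1) are assumed purely for illustration):

```python
import numpy as np

# Monte Carlo check of the Cauchy-Schwarz inequality (2).
rng = np.random.default_rng(1)
x = rng.uniform(size=1_000_000)
g1, g2 = x, np.exp(-x)

lhs = np.mean(g1 * g2)**2
rhs = np.mean(g1**2) * np.mean(g2**2)
print(lhs, rhs, lhs <= rhs)   # strict inequality here: g1 is not a multiple of g2
```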

HÖLDER’S INEQUALITY
Lemma Let a, b > 0 and p, q > 1 satisfy

p^{-1} + q^{-1} = 1.     (4)

Then

p^{-1} a^p + q^{-1} b^q ≥ ab

with equality if and only if a^p = b^q.
Proof Fix b > 0. Let

g(a; b) = p^{-1} a^p + q^{-1} b^q − ab.

We require that g(a; b) ≥ 0 for all a. Differentiating wrt a for fixed b yields g^{(1)}(a; b) = a^{p−1} − b, so that
g(a; b) is minimized (the second derivative is strictly positive at all a) when a^{p−1} = b, and at this value
of a, the function takes the value

p^{-1} a^p + q^{-1} (a^{p−1})^q − a(a^{p−1}) = p^{-1} a^p + q^{-1} a^p − a^p = 0

as, by equation (4), 1/p + 1/q = 1 =⇒ (p − 1)q = p. As the second derivative is strictly positive at all a,
the minimum is attained at the unique value of a where a^{p−1} = b; raising both sides to the power q
yields a^p = b^q.
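The lemma (Young's inequality) is easy to probe numerically; the exponent p = 3 and the random draws below are assumptions made for illustration only:

```python
import numpy as np

# Check p^{-1} a^p + q^{-1} b^q >= ab for conjugate exponents, with equality
# when a^p = b^q.
rng = np.random.default_rng(2)
p = 3.0
q = p / (p - 1.0)                                     # so that 1/p + 1/q = 1
a, b = rng.uniform(0.1, 5.0, size=(2, 10))

print(np.all(a * b <= a**p / p + b**q / q))           # inequality holds
b_eq = a**(p - 1.0)                                   # forces a^p = b_eq^q
print(np.allclose(a * b_eq, a**p / p + b_eq**q / q))  # equality case
```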

Theorem (HÖLDER’S INEQUALITY)


Suppose that X and Y are two random variables, and p, q > 1 satisfy (4). Then

|EX,Y[XY]| ≤ EX,Y[|XY|] ≤ {EX[|X|^p]}^{1/p} {EY[|Y|^q]}^{1/q}

Proof (Absolutely continuous case: discrete case similar) For the first inequality,
EX,Y[|XY|] = ∫∫ |xy| fX,Y(x, y) dx dy ≥ ∫∫ xy fX,Y(x, y) dx dy = EX,Y[XY]

and

EX,Y[XY] = ∫∫ xy fX,Y(x, y) dx dy ≥ ∫∫ −|xy| fX,Y(x, y) dx dy = −EX,Y[|XY|]

so
−EX,Y [|XY |] ≤ EX,Y [XY ] ≤ EX,Y [|XY |] ∴ |EX,Y [XY ]| ≤ EX,Y [|XY |].
For the second inequality, set

a = |X| / {EX[|X|^p]}^{1/p},     b = |Y| / {EY[|Y|^q]}^{1/q}.

Then from the previous lemma

p^{-1} |X|^p / EX[|X|^p] + q^{-1} |Y|^q / EY[|Y|^q] ≥ |XY| / ( {EX[|X|^p]}^{1/p} {EY[|Y|^q]}^{1/q} )

and taking expectations yields, on the left-hand side,

p^{-1} EX[|X|^p] / EX[|X|^p] + q^{-1} EY[|Y|^q] / EY[|Y|^q] = p^{-1} + q^{-1} = 1

and on the right-hand side

EX,Y[|XY|] / ( {EX[|X|^p]}^{1/p} {EY[|Y|^q]}^{1/q} )

and the result follows.
Note: here we have equality if and only if

PX,Y[|X|^p = c|Y|^q] = 1

for some nonzero constant c.
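A Monte Carlo illustration of Hölder's Inequality (a sketch; the exponents p = 3, q = 3/2 and the particular dependent pair (X, Y) below are assumed for illustration):

```python
import numpy as np

# Check |E[XY]| <= E[|XY|] <= E[|X|^p]^{1/p} E[|Y|^q]^{1/q} with 1/p + 1/q = 1.
rng = np.random.default_rng(3)
p, q = 3.0, 1.5
u = rng.uniform(size=1_000_000)
x = u + rng.normal(size=u.size)
y = u**2

lhs1 = abs(np.mean(x * y))
lhs2 = np.mean(np.abs(x * y))
rhs = np.mean(np.abs(x)**p)**(1/p) * np.mean(np.abs(y)**q)**(1/q)
print(lhs1, lhs2, rhs)   # expect lhs1 <= lhs2 <= rhs
```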

Theorem (CAUCHY-SCHWARZ INEQUALITY REVISITED)
Suppose that X and Y are two random variables.
|EX,Y[XY]| ≤ EX,Y[|XY|] ≤ {EX[|X|^2]}^{1/2} {EY[|Y|^2]}^{1/2}

Proof Set p = q = 2 in the Hölder Inequality.


Corollaries:
(a) Let µX and µY denote the expectations of X and Y respectively. Then, by the Cauchy-Schwarz
inequality,

|EX,Y[(X − µX)(Y − µY)]| ≤ {EX[(X − µX)^2]}^{1/2} {EY[(Y − µY)^2]}^{1/2}

so that

{EX,Y[(X − µX)(Y − µY)]}^2 ≤ EX[(X − µX)^2] EY[(Y − µY)^2]

and hence

{CovX,Y[X, Y]}^2 ≤ VarX[X] VarY[Y].

(b) Lyapunov's Inequality: Define Y = 1 with probability one. Then, for 1 < p < ∞,

EX[|X|] ≤ {EX[|X|^p]}^{1/p}.

Let 1 < r < p. Then, applying this bound with |X|^r in place of |X|,

EX[|X|^r] ≤ {EX[|X|^{pr}]}^{1/p}

and letting s = pr > r yields

EX[|X|^r] ≤ {EX[|X|^s]}^{r/s}

so that

{EX[|X|^r]}^{1/r} ≤ {EX[|X|^s]}^{1/s}

for 1 < r < s < ∞.
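Corollary (b) says that s ↦ {EX[|X|^s]}^{1/s} is non-decreasing in s; a small numerical sketch (the standard normal example is an assumption made for illustration):

```python
import numpy as np

# Lyapunov's inequality: E[|X|^s]^(1/s) is non-decreasing in s.
rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)

norms = [np.mean(np.abs(x)**s)**(1.0 / s) for s in (1.5, 2.0, 3.0, 4.0)]
print(norms, np.all(np.diff(norms) >= 0))   # an increasing sequence -> True
```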

Theorem (MINKOWSKI’S INEQUALITY)


Suppose that X and Y are two random variables, and 1 ≤ p < ∞. Then

{EX,Y[|X + Y|^p]}^{1/p} ≤ {EX[|X|^p]}^{1/p} + {EY[|Y|^p]}^{1/p}.

Proof Write

EX,Y[|X + Y|^p] = EX,Y[|X + Y| |X + Y|^{p−1}]

               ≤ EX,Y[|X| |X + Y|^{p−1}] + EX,Y[|Y| |X + Y|^{p−1}]

by the triangle inequality |x + y| ≤ |x| + |y|. Using Hölder's Inequality on each term on the right-hand
side, with q selected to satisfy 1/p + 1/q = 1,

EX,Y[|X + Y|^p] ≤ {EX[|X|^p]}^{1/p} {EX,Y[|X + Y|^{q(p−1)}]}^{1/q} + {EY[|Y|^p]}^{1/p} {EX,Y[|X + Y|^{q(p−1)}]}^{1/q}

and dividing through by {EX,Y[|X + Y|^{q(p−1)}]}^{1/q} yields

EX,Y[|X + Y|^p] / {EX,Y[|X + Y|^{q(p−1)}]}^{1/q} ≤ {EX[|X|^p]}^{1/p} + {EY[|Y|^p]}^{1/p}

and the result follows as q(p − 1) = p and 1 − 1/q = 1/p.
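Minkowski's inequality can likewise be checked by simulation (a sketch; p = 3 and the dependent pair below are assumptions made for illustration):

```python
import numpy as np

# Check {E[|X+Y|^p]}^{1/p} <= {E[|X|^p]}^{1/p} + {E[|Y|^p]}^{1/p}.
rng = np.random.default_rng(5)
p = 3.0
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.exponential(size=x.size)

lhs = np.mean(np.abs(x + y)**p)**(1/p)
rhs = np.mean(np.abs(x)**p)**(1/p) + np.mean(np.abs(y)**p)**(1/p)
print(lhs, rhs, lhs <= rhs)
```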

Concentration and Tail Probability Inequalities

Lemma (CHEBYCHEV’S LEMMA) If X is a random variable, then for non-negative function h, and
c > 0,
PX[h(X) ≥ c] ≤ EX[h(X)] / c.
Proof (continuous case) : Suppose that X has density function fX which is positive for x ∈ X. Let
A = {x ∈ X : h(x) ≥ c} ⊆ X. Then, as h(x) ≥ c on A,
EX[h(X)] = ∫ h(x)fX(x) dx = ∫_A h(x)fX(x) dx + ∫_{A′} h(x)fX(x) dx

          ≥ ∫_A h(x)fX(x) dx

          ≥ ∫_A c fX(x) dx = c PX[X ∈ A] = c PX[h(X) ≥ c]

and the result follows.

• SPECIAL CASE I - THE MARKOV INEQUALITY


If h(x) = |x|^r for r > 0, then

PX[|X|^r ≥ c] ≤ EX[|X|^r] / c.

Alternatively stated (by Casella and Berger) as follows: if P[Y ≥ 0] = 1 and P[Y = 0] < 1, then for
any r > 0,

PY[Y ≥ r] ≤ EY[Y] / r

with equality if and only if

PY[Y = r] = p = 1 − PY[Y = 0]

for some 0 < p ≤ 1.
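A quick empirical check of the Markov form (a sketch; the Gamma(2, 1) example, for which EY[Y] = 2, is an assumption made for illustration):

```python
import numpy as np

# Markov inequality: P[Y >= r] <= E[Y]/r for non-negative Y.
rng = np.random.default_rng(6)
y = rng.gamma(shape=2.0, scale=1.0, size=1_000_000)

for r in (1.0, 2.0, 5.0):
    print(r, np.mean(y >= r), np.mean(y) / r)   # empirical tail vs. bound
```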

• SPECIAL CASE II - THE CHEBYCHEV INEQUALITY


Suppose that X is a random variable with expectation µ and variance σ^2. Then, taking h(x) = (x − µ)^2
and c = k^2σ^2 for k > 0,

PX[(X − µ)^2 ≥ k^2σ^2] ≤ 1/k^2

or equivalently

PX[|X − µ| ≥ kσ] ≤ 1/k^2.

Setting ϵ = kσ gives

PX[|X − µ| ≥ ϵ] ≤ σ^2/ϵ^2

or equivalently

PX[|X − µ| < ϵ] ≥ 1 − σ^2/ϵ^2.
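The Chebychev bound is typically conservative, as a simulation shows (a sketch; X ~ Exponential(1), so µ = σ = 1, is an assumption made for illustration):

```python
import numpy as np

# Chebychev inequality: P[|X - mu| >= k*sigma] <= 1/k^2.
rng = np.random.default_rng(7)
x = rng.exponential(size=1_000_000)
mu, sigma = x.mean(), x.std()

for k in (1.5, 2.0, 3.0):
    print(k, np.mean(np.abs(x - mu) >= k * sigma), 1.0 / k**2)
```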

Theorem (TAIL BOUNDS FOR THE NORMAL DENSITY)
If Z ∼ N (0, 1), then for t > 0
√(2/π) · t/(1 + t^2) · e^{−t^2/2} ≤ PZ[|Z| ≥ t] ≤ √(2/π) · (1/t) · e^{−t^2/2}
Proof By symmetry, PZ [|Z| ≥ t] = 2 PZ [Z ≥ t], so

PZ[Z ≥ t] = (1/(2π))^{1/2} ∫_t^∞ e^{−x^2/2} dx ≤ (1/(2π))^{1/2} ∫_t^∞ (x/t) e^{−x^2/2} dx = (1/(2π))^{1/2} (1/t) e^{−t^2/2}.

Similarly, for t > 0,


∫_t^∞ e^{−x^2/2} dx ≡ ∫_t^∞ (x/x) e^{−x^2/2} dx = [−(1/x) e^{−x^2/2}]_t^∞ − ∫_t^∞ (1/x^2) e^{−x^2/2} dx ≥ (1/t) e^{−t^2/2} − (1/t^2) ∫_t^∞ e^{−x^2/2} dx

after writing 1 = x/x, then integrating by parts, and then noting that, on (t, ∞), x > t ⇐⇒ 1/x^2 < 1/t^2,
and that the integrand is non-negative. Therefore, combining terms,

(1 + 1/t^2) ∫_t^∞ e^{−x^2/2} dx ≥ (1/t) e^{−t^2/2}

and cross-multiplying by the positive term t^2/(1 + t^2) yields

∫_t^∞ e^{−x^2/2} dx ≥ (t/(1 + t^2)) e^{−t^2/2}     ∴     PZ[|Z| > t] ≥ √(2/π) · (t/(1 + t^2)) · e^{−t^2/2}.

To see the quality of the bounds, the table below shows the lower bound, the true tail probability, and the upper bound for t
ranging from 1 to 5. Clearly the bounds improve as t gets larger.

t      1.0        1.5        2.0        2.5        3.0        3.5        4.0        4.5        5.0
Lower  2.420e-01  1.196e-01  4.319e-02  1.209e-02  2.659e-03  4.610e-04  6.298e-05  6.770e-06  5.718e-07
True   3.173e-01  1.336e-01  4.550e-02  1.242e-02  2.700e-03  4.653e-04  6.334e-05  6.795e-06  5.733e-07
Upper  4.839e-01  1.727e-01  5.399e-02  1.402e-02  2.955e-03  4.987e-04  6.692e-05  7.104e-06  5.947e-07
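The table entries can be reproduced directly from the bounds (a sketch relying on scipy.stats.norm for the exact tail probability):

```python
import numpy as np
from scipy.stats import norm

# Lower/true/upper values of P[|Z| >= t] for t = 1.0, 1.5, ..., 5.0:
#   sqrt(2/pi) * t/(1+t^2) * exp(-t^2/2) <= 2*P[Z >= t] <= sqrt(2/pi) * exp(-t^2/2)/t
t = np.arange(1.0, 5.5, 0.5)
lower = np.sqrt(2 / np.pi) * t / (1 + t**2) * np.exp(-t**2 / 2)
true = 2 * norm.sf(t)
upper = np.sqrt(2 / np.pi) * np.exp(-t**2 / 2) / t

for row in zip(t, lower, true, upper):
    print("{:.1f}  {:.3e}  {:.3e}  {:.3e}".format(*row))
```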
