
Notes on Jensen’s inequality

There are various concrete representations of Jensen’s inequality.

Jensen’s inequality in Andrew Ng’s Lecture Notes

Let f be a convex function, and let X be a random variable. Then:

E[f (X)] ≥ f (EX)

Moreover, if f is strictly convex, then E[f (X)] = f (EX) holds true if and only if
X = E[X] with probability 1 (i.e., if X is a constant).

Jensen’s inequality also holds for concave functions f, but with the direction of all the
inequalities reversed (E[f (X)] ≤ f (EX), etc.).

For an interpretation of the theorem, consider the figure below.

[Figure: a convex function f (solid line), with a and b marked on the x-axis and f(a), f(b), f(E[X]), E[f(X)] marked on the y-axis.]

Here, f is a convex function shown by the solid line. Also, X is a random variable that
has a 0.5 chance of taking the value a, and a 0.5 chance of taking the value b (indicated
on the x-axis). Thus, the expected value of X is given by the midpoint between a and b.
We also see the values f (a), f (b) and f (E[X]) indicated on the y-axis. Moreover, the
value E[f (X)] is the midpoint on the y-axis between f (a) and f (b). From our
example, we see that because f is convex, it must be the case that E[f (X)] ≥ f (EX).
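
As a quick numerical check of this picture (not part of the original notes), the sketch below takes the convex function f(x) = x² with a = 1 and b = 3, both illustrative choices, and verifies E[f(X)] ≥ f(E[X]) for the two-point distribution described above.

```python
# Two-point illustration of Jensen's inequality: X equals a or b, each with probability 0.5.
# f(x) = x**2 is the convex function chosen for this example.

def f(x):
    return x ** 2

a, b = 1.0, 3.0
dist = {a: 0.5, b: 0.5}                            # distribution of X

E_X = sum(x * px for x, px in dist.items())        # E[X] = midpoint of a and b
E_fX = sum(f(x) * px for x, px in dist.items())    # E[f(X)] = midpoint of f(a) and f(b)

print(E_fX, f(E_X))       # 5.0 and 4.0
assert E_fX >= f(E_X)     # Jensen: E[f(X)] >= f(E[X])
```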

Jensen’s inequality in David McAllester’s Lecture Notes

Consider a probability distribution P on a set M and a function X assigning real values
X(m) for m ∈ M. If f is convex, then for any distribution P on M we have the following:

E_{m∼P}[f(X(m))] ≥ f(E_{m∼P}[X(m)])
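
A minimal sketch of this formulation, assuming a small made-up set M, a distribution P on it, real values X(m), and f = exp as the convex function (all choices are illustrative):

```python
import math

# Hypothetical finite set M, distribution P on M, and real values X(m).
M = ["m1", "m2", "m3"]
P = {"m1": 0.2, "m2": 0.5, "m3": 0.3}
X = {"m1": -1.0, "m2": 0.0, "m3": 2.0}

f = math.exp  # a convex function

lhs = sum(P[m] * f(X[m]) for m in M)    # E_{m~P}[f(X(m))]
rhs = f(sum(P[m] * X[m] for m in M))    # f(E_{m~P}[X(m)])
assert lhs >= rhs
print(lhs, rhs)
```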

Jensen’s inequality in Richard Yida Xu’s Lecture Notes

If Φ is a convex function and 0 < t < 1, then

Φ((1 − t) x1 + t x2) ≤ (1 − t) Φ(x1) + t Φ(x2)


With ∑_{i=1}^{n} p_i = 1, we can generalize the above inequality:

Φ(p_1 x_1 + p_2 x_2 + ... + p_n x_n) ≤ p_1 Φ(x_1) + p_2 Φ(x_2) + ... + p_n Φ(x_n)

that is,

Φ(∑_{i=1}^{n} p_i x_i) ≤ ∑_{i=1}^{n} p_i Φ(x_i)

If both x_i and f(x_i) are in the domain of Φ, we can replace x_i with f(x_i) and still get

Φ(∑_{i=1}^{n} p_i f(x_i)) ≤ ∑_{i=1}^{n} p_i Φ(f(x_i))
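
A small check of these weighted forms (a sketch: Φ(x) = |x| is used as the convex function, and the weights p_i are random numbers normalized to sum to 1):

```python
import random

def phi(x):
    return abs(x)    # a convex function

random.seed(0)
n = 5
raw = [random.random() for _ in range(n)]
p = [r / sum(raw) for r in raw]                   # weights p_i >= 0 summing to 1
x = [random.uniform(-3.0, 3.0) for _ in range(n)]

lhs = phi(sum(pi * xi for pi, xi in zip(p, x)))   # Phi(sum_i p_i x_i)
rhs = sum(pi * phi(xi) for pi, xi in zip(p, x))   # sum_i p_i Phi(x_i)
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```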

For the continuous case with ∫_{x∈S} p(x) dx = 1, if both x and f(x) are in the domain of Φ, we get

Φ(∫_{x∈S} f(x) p(x) dx) ≤ ∫_{x∈S} Φ(f(x)) p(x) dx

In other words, the above inequality is

Φ(E[f(x)]) ≤ E[Φ(f(x))]
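
A numerical sketch of this continuous form, approximating the integrals with a Riemann sum over S = [0, 1] with the uniform density p(x) = 1, f(x) = x, and Φ(y) = y²; all of these choices are illustrative, not from the notes:

```python
# Riemann-sum check of Phi(integral f(x) p(x) dx) <= integral Phi(f(x)) p(x) dx on S = [0, 1].

N = 100_000
dx = 1.0 / N
xs = [(i + 0.5) * dx for i in range(N)]   # midpoints of the grid cells

p = lambda x: 1.0          # uniform density on [0, 1]
f = lambda x: x
phi = lambda y: y ** 2     # convex

lhs = phi(sum(f(x) * p(x) * dx for x in xs))   # Phi(E[f(x)]) ~ 0.25
rhs = sum(phi(f(x)) * p(x) * dx for x in xs)   # E[Phi(f(x))] ~ 1/3
assert lhs <= rhs
print(lhs, rhs)
```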

Jensen’s inequality from Wikipedia

Form involving a probability density function

Suppose Ω is a measurable subset of the real line and f(x) is a non-negative function
such that

∫_{−∞}^{∞} f(x) dx = 1

In probabilistic language, f is a probability density function.

Then Jensen’s inequality becomes the following statement about convex integrals:

If g is any real-valued measurable function and φ is convex over the range of g, then

φ(∫_{−∞}^{∞} g(x) f(x) dx) ≤ ∫_{−∞}^{∞} φ(g(x)) f(x) dx.

If g(x) = x, then this form of the inequality reduces to a commonly used special case:

φ(∫_{−∞}^{∞} x f(x) dx) ≤ ∫_{−∞}^{∞} φ(x) f(x) dx.
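
One familiar reading of this special case: taking φ(x) = x² gives (E[X])² ≤ E[X²], i.e. the variance of X is non-negative. Below is a Monte Carlo sketch of that instance; the exponential density and the sample size are arbitrary choices, not from the article:

```python
import random

random.seed(0)
# X ~ Exp(1), i.e. density f(x) = exp(-x) for x >= 0; E[X] = 1 and E[X^2] = 2.
samples = [random.expovariate(1.0) for _ in range(100_000)]

phi = lambda x: x ** 2   # convex

mean = sum(samples) / len(samples)                       # estimate of E[X]
mean_phi = sum(phi(x) for x in samples) / len(samples)   # estimate of E[X^2]

assert phi(mean) <= mean_phi   # phi(E[X]) <= E[phi(X)], i.e. Var(X) >= 0
print(phi(mean), mean_phi)
```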

Alternative finite form

Let Ω = {x_1, ..., x_n}, and take µ to be the counting measure on Ω; then the general
form reduces to a statement about sums:

φ(∑_{i=1}^{n} g(x_i) f(x_i)) ≤ ∑_{i=1}^{n} φ(g(x_i)) f(x_i)

provided that f(x_i) = λ_i ≥ 0 and

λ_1 + ⋯ + λ_n = 1

Gibbs’ inequality

If p(x) is the true probability distribution for x, and q(x) is another distribution, then
applying Jensen’s inequality to the random variable Y(x) = q(x)/p(x) and the convex
function φ(y) = −log(y) (with all expectations taken with respect to p) gives

E[φ(Y)] ≥ φ(E[Y])

Therefore:

KL(p(x) ∥ q(x)) = ∫ p(x) log(p(x)/q(x)) dx
                = −∫ p(x) log(q(x)/p(x)) dx
                ≥ −log(∫ p(x) (q(x)/p(x)) dx)
                = −log(∫ q(x) dx)
                = 0,

a result called Gibbs’ inequality.

It shows that the average message length is minimized when codes are assigned on the
basis of the true probabilities p rather than any other distribution q. The quantity that is
non-negative is called the Kullback–Leibler divergence of q from p.

Since −log(x) is a strictly convex function for x > 0, equality holds if and only if
p(x) equals q(x) almost everywhere.
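
A small numerical sketch (not from the notes): computing KL(p∥q) for two made-up discrete distributions, confirming it is positive when q differs from p and zero when q equals p:

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) for discrete distributions given as lists.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # the "true" distribution
q = [0.4, 0.4, 0.2]   # another distribution on the same support

print(kl(p, q))       # > 0, since q differs from p
print(kl(p, p))       # == 0, the equality case q = p
assert kl(p, q) >= 0 and abs(kl(p, p)) < 1e-12
```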

Notes:
Compared with the other notes, the versions from Richard Yida Xu and Wikipedia
better match the derivation of EM and the KL divergence.

Reference

David McAllester, Jensen’s Inequality (http://ttic.uchicago.edu/~dmcallester/ttic101-07/lectures/jensen/jensen.pdf)

Richard YiDa Xu, Expectation-Maximization (http://www-staff.it.uts.edu.au/~ydxu/ml_course/em.pdf)

Jensen’s inequality, Wikipedia (https://en.wikipedia.org/wiki/Jensen’s_inequality)
