Discussion Notes 2-6

Review of entropy and Han's inequality, and their application to the Efron-Stein inequality; applications of these concepts in turn, including VC dimension, conjugate dual functions, and Rademacher complexity.

STAT210B Notes

Matt Olfat
February 13, 2017

Efron-Stein Inequality
Let $X_1, \dots, X_n$ be independent random variables, and let $Z = f(X_1, \dots, X_n)$. Then, we have that:
$\mathrm{Var}(Z) \le E\Big[\sum_{i=1}^n \big(E_i[Z^2] - E_i[Z]^2\big)\Big]$,
where $E_i[\cdot]$ denotes expectation over $X_i$ with the other coordinates held fixed.
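As a quick numerical illustration (not part of the original notes), the sketch below checks the bound by Monte Carlo for the assumed example $f = \max$ of $n$ i.i.d. Uniform(0,1) variables. It uses the resampled-copy form $\frac{1}{2}\sum_i E[(Z - Z_i')^2]$ with $Z_i'$ obtained by redrawing the $i$th coordinate, which coincides with the right-hand side above since $E[(Z - Z_i')^2 \mid X^{(i)}] = 2\big(E_i[Z^2] - E_i[Z]^2\big)$.

```python
# Monte Carlo sanity check of the Efron-Stein bound (sketch; f = max is an assumed example).
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 200_000

X = rng.uniform(size=(trials, n))
Z = X.max(axis=1)

rhs = 0.0
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = rng.uniform(size=trials)   # resample only the i-th coordinate
    Zi = Xi.max(axis=1)
    rhs += 0.5 * np.mean((Z - Zi) ** 2)

print(f"Var(Z)            ~ {Z.var():.5f}")
print(f"Efron-Stein bound ~ {rhs:.5f}")   # should be an upper bound on Var(Z)
```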

We will show that, writing $\mathrm{Ent}_\phi(Z) = E[\phi(Z)] - \phi(E[Z])$:

$\phi(x) = x^2$: $\mathrm{Ent}_\phi(Z) = E[Z^2] - (E[Z])^2 = \mathrm{Var}(Z)$;

$\phi(x) = x\log x$: $\mathrm{Ent}_\phi(Z) \le E\big[\sum_{i=1}^n \big(E_i[\phi(Z)] - \phi(E_i[Z])\big)\big]$.
Let $X$ be a random variable taking values in a countable set $\mathcal{X}$.

Definition 1 The Shannon entropy of $X$ is $H(X) = E[-\log p(X)] = -\sum_{x\in\mathcal{X}} p(x)\log p(x)$.

Definition 2 If $P, Q$ are probability distributions on $\mathcal{X}$, then $D(P\|Q) = \sum_{x\in\mathcal{X}} p(x)\log\frac{p(x)}{q(x)} \ge 0$.
 
This can be shown using $\log t \le t - 1$ and the convention $0\log 0 = 0$: consider $D(P\|Q) \ge \sum_{x\in\mathcal{X}:\,p(x)>0} p(x)\Big(1 - \frac{q(x)}{p(x)}\Big) \ge 0$.
Entropy is maximized by the uniform distribution: let $|\mathcal{X}| < \infty$ and $q(x) = \frac{1}{|\mathcal{X}|}$. Then $D(P\|Q) = \log|\mathcal{X}| - H(P) \ge 0$, so $H(P) \le \log|\mathcal{X}|$.
Also, $D(P\|Q) = 0$ iff $P = Q$.
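A minimal numerical sketch of these definitions (not from the notes; the distribution $p$ is an arbitrary example). It computes $H(P)$ and $D(P\|Q)$ and checks the identity $H(P) = \log|\mathcal{X}| - D(P\|\mathrm{Unif})$, so in particular $H(P) \le \log|\mathcal{X}|$.

```python
# Sketch: Shannon entropy, KL divergence, and the uniform-maximizes-entropy identity.
import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 log 0 = 0
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.25, 0.125, 0.125])   # example distribution on |X| = 4 points
u = np.full(4, 0.25)                      # uniform distribution

print(shannon_entropy(p))                 # H(P) in nats
print(np.log(4) - kl_divergence(p, u))    # equals H(P)
print(shannon_entropy(p) <= np.log(4))    # True: H(P) <= log|X|
```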

Entropy on Product Spaces


If $X, Y$ are random variables taking values in $\mathcal{X}, \mathcal{Y}$, respectively, they have some joint probability $P(x,y)$ on $\mathcal{X}\times\mathcal{Y}$: $H(X,Y) = -\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x,y)\log p(x,y)$.

Definition 3 The mutual information between $X$ and $Y$ is $I(X,Y) = H(X) + H(Y) - H(X,Y)$.
This measures how dependent the variables are.

Now, the marginal is $P_X(x) = \sum_{y\in\mathcal{Y}} P(X=x, Y=y)$. Then,
$I(X,Y) = \sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} p(x,y)\log\frac{p(x,y)}{p_X(x)\,p_Y(y)} = D(P\,\|\,P_X\otimes P_Y)$,
where $\otimes$ denotes the product measure, $(P_X\otimes P_Y)(X=x, Y=y) = P_X(X=x)P_Y(Y=y)$. This is zero when $X, Y$ are independent.

Definition 4 The conditional entropy of $X$ given $Y$ is $H(X|Y) = H(X,Y) - H(Y) = E_Y[H(P(X \mid Y))]$.

This leads to $D(P_{XY}\,\|\,P_X\otimes P_Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) \ge 0$, i.e. $H(X) \ge H(X|Y)$ (conditioning reduces entropy). All of this together gives us the chain rule $H(X_1,\dots,X_m) = H(X_1) + H(X_2|X_1) + H(X_3|X_1,X_2) + \cdots$.
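The following sketch (the $2\times 2$ joint table is an assumed example, not from the notes) computes the mutual information both from the entropy identity $I(X,Y) = H(X) + H(Y) - H(X,Y)$ and as $D(P_{XY}\,\|\,P_X\otimes P_Y)$, confirming the two expressions agree.

```python
# Sketch: mutual information from a small joint distribution, computed two ways.
import numpy as np

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

P = np.array([[0.3, 0.1],
              [0.1, 0.5]])              # joint distribution of (X, Y)
Px, Py = P.sum(axis=1), P.sum(axis=0)   # marginals

I_def = H(Px) + H(Py) - H(P.ravel())
I_kl  = np.sum(P * np.log(P / np.outer(Px, Py)))

print(I_def, I_kl)                      # identical up to floating-point rounding
```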

Theorem 5 (Han's Inequality) Let $X_1,\dots,X_n$ be discrete random variables. Then,
$H(X_1,\dots,X_n) \le \frac{1}{n-1}\sum_{i=1}^n H(X_1,\dots,X_{i-1},X_{i+1},\dots,X_n)$.

Proof. $H(X_1,\dots,X_n) = H(X_1,\dots,X_{i-1},X_{i+1},\dots,X_n) + H(X_i \mid X_1,\dots,X_{i-1},X_{i+1},\dots,X_n)$.

Sum this over all $i$ to get:
$n H(X_1,\dots,X_n) = \sum_{i=1}^n H(X_1,\dots,X_{i-1},X_{i+1},\dots,X_n) + \sum_{i=1}^n H(X_i \mid X_1,\dots,X_{i-1},X_{i+1},\dots,X_n)$
$\le \sum_{i=1}^n H(X_1,\dots,X_{i-1},X_{i+1},\dots,X_n) + \sum_{i=1}^n H(X_i \mid X_1,\dots,X_{i-1})$
$= \sum_{i=1}^n H(X_1,\dots,X_{i-1},X_{i+1},\dots,X_n) + H(X_1,\dots,X_n)$,
where the last equality is the chain rule. Subtracting $H(X_1,\dots,X_n)$ from both sides and dividing by $n-1$ gives the claim.

This is tight when the variables are independent.
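A small numerical check of Theorem 5 (the random joint distribution is an assumption for illustration): for a joint distribution on $\{0,1\}^3$ it compares $H(X_1,\dots,X_n)$ with $\frac{1}{n-1}\sum_i H(X^{(i)})$.

```python
# Sketch: Han's inequality on a random joint distribution over {0,1}^3.
import numpy as np

rng = np.random.default_rng(1)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

n, k = 3, 2
P = rng.random((k,) * n)
P /= P.sum()                                       # random joint distribution

lhs = H(P.ravel())                                 # H(X_1, ..., X_n)
rhs = sum(H(P.sum(axis=i).ravel()) for i in range(n)) / (n - 1)
print(lhs, "<=", rhs)                              # Han's inequality
```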


Consider the binary hypercube $\{-1,1\}^n$. The Hamming distance measures the number of differing entries in two vectors. Also consider a graph $G = (V, E)$, where each corner of the cube is a node and adjacent corners (those at Hamming distance 1) have edges between them. Then $|V| = 2^n$, $|E| = n\,2^{n-1}$. Also, $\frac{|E|}{|V|} = n/2 = \frac{\log_2|V|}{2}$.
We want: for every $A \subseteq V = \{-1,1\}^n$, $|E(A)| \le \frac{|A|}{2}\log_2|A|$, where $E(A)$ denotes the edges with both endpoints in $A$.
Proof. Let $X$ have the uniform distribution over $A$. By Han's inequality, $\sum_{i=1}^n \big(H(X) - H(X^{(i)})\big) \le H(X)$, and $H(X) - H(X^{(i)}) = H(X_i \mid X^{(i)})$ is the entropy of the $i$th coordinate given everything else. This is also equal to $-\sum_{x\in A} p(x)\log p(x_i \mid x^{(i)})$. However, $p(x_i \mid x^{(i)}) = 1/2$ if $\bar{x}^{(i)} := (x_1,\dots,x_{i-1},-x_i,x_{i+1},\dots,x_n) \in A$, and is $1$ otherwise. We have:
$\sum_{i=1}^n \big(H(X) - H(X^{(i)})\big) = \frac{\log 2}{|A|}\sum_{x\in A}\sum_{i=1}^n \mathbf{1}\{\bar{x}^{(i)} \in A\} = \frac{\log 2}{|A|}\,2|E(A)| \le H(X) = \log|A|$,
which rearranges to $|E(A)| \le \frac{|A|}{2}\log_2|A|$.
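A brute-force sketch (illustrative only, not from the notes) of the edge-counting bound just proved: for random subsets $A$ of $\{-1,1\}^4$ it counts $|E(A)|$ and compares with $\frac{|A|}{2}\log_2|A|$.

```python
# Sketch: check |E(A)| <= (|A|/2) * log2|A| for random subsets A of the hypercube.
import itertools, math, random

n = 4
cube = list(itertools.product([-1, 1], repeat=n))

def edges_within(A):
    # count hypercube edges with both endpoints in A
    A = set(A)
    count = 0
    for x in A:
        for i in range(n):
            y = x[:i] + (-x[i],) + x[i + 1:]   # neighbor: flip coordinate i
            if y in A:
                count += 1
    return count // 2                          # each edge was counted twice

random.seed(0)
for _ in range(5):
    A = random.sample(cube, k=random.randint(1, 2 ** n))
    lhs = edges_within(A)
    rhs = (len(A) / 2) * math.log2(len(A))
    print(lhs, "<=", round(rhs, 3))
```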
Let $\mathcal{X}$ be a countable set and let $P, Q$ be probability distributions on $\mathcal{X}^n$. Assume $P = P_1\otimes P_2\otimes\cdots\otimes P_n$ is a product measure, and denote by $q^{(i)}(x^{(i)})$ the marginal of $Q$ on $(x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$, and $p^{(i)}(x^{(i)}) = p_1(x_1)\cdots p_{i-1}(x_{i-1})\,p_{i+1}(x_{i+1})\cdots p_n(x_n)$.
Theorem 6 (Han's Inequality) $D(Q\|P) \ge \frac{1}{n-1}\sum_{i=1}^n D(Q^{(i)}\|P^{(i)})$,
or equivalently $D(Q\|P) \le \sum_{i=1}^n \big(D(Q\|P) - D(Q^{(i)}\|P^{(i)})\big)$.
Proof. The original theorem gave us $H(Q) \le \frac{1}{n-1}\sum_{i=1}^n H(Q^{(i)})$. Now, we have
$\sum_{x\in\mathcal{X}^n} q(x)\log q(x) \ge \frac{1}{n-1}\sum_{i=1}^n \sum_{x^{(i)}\in\mathcal{X}^{n-1}} q^{(i)}(x^{(i)})\log q^{(i)}(x^{(i)})$.
We want to show that
$\sum_{x\in\mathcal{X}^n} q(x)\log p(x) = \frac{1}{n-1}\sum_{i=1}^n \sum_{x^{(i)}\in\mathcal{X}^{n-1}} q^{(i)}(x^{(i)})\log p^{(i)}(x^{(i)})$.
We may use the fact that $p$ is a product measure to get
$\sum_{x\in\mathcal{X}^n} q(x)\log p(x) = \frac{1}{n}\sum_{i=1}^n \sum_{x\in\mathcal{X}^n} q(x)\log\big(p^{(i)}(x^{(i)})\,p_i(x_i)\big) = \frac{1}{n}\sum_{i=1}^n \sum_{x\in\mathcal{X}^n} q(x)\log p_i(x_i) + \frac{1}{n}\sum_{i=1}^n \sum_{x\in\mathcal{X}^n} q(x)\log p^{(i)}(x^{(i)})$.
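A numerical sketch of Theorem 6 (the random $Q$ and random product measure $P$ are assumptions for the example): it compares $D(Q\|P)$ with $\frac{1}{n-1}\sum_i D(Q^{(i)}\|P^{(i)})$ for distributions on $\{0,1\}^3$.

```python
# Sketch: Han's inequality for relative entropies with a product reference measure P.
import numpy as np

rng = np.random.default_rng(4)
n, k = 3, 2

def kl(q, p):
    mask = q > 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

Q = rng.random((k,) * n); Q /= Q.sum()             # arbitrary joint Q on X^n
marg = [rng.random(k) for _ in range(n)]
marg = [m / m.sum() for m in marg]
P = np.einsum('i,j,k->ijk', *marg)                 # product measure P1 x P2 x P3

lhs = kl(Q.ravel(), P.ravel())
rhs = sum(kl(Q.sum(axis=i).ravel(), P.sum(axis=i).ravel()) for i in range(n)) / (n - 1)
print(lhs, ">=", rhs)
```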

February 13th

Let $Z_i' = f(X_1,\dots,X_{i-1},X_i',X_{i+1},\dots,X_n)$, where $X_i'$ is an independent copy of $X_i$. Then, by symmetry,
$E[(Z - Z_i')^2] = E[(Z - Z_i')^2\mathbf{1}\{Z \ge Z_i'\}] + E[(Z - Z_i')^2\mathbf{1}\{Z < Z_i'\}] = 2E[(Z - Z_i')_+^2] = 2E[(Z - Z_i')_-^2]. \quad (1)$
This gives us various versions of Efron-Stein.

Definition 7 Let $f : \mathcal{X}^n \to [0,\infty)$. $f$ is self-bounded if for all $i$ there exist $f_i : \mathcal{X}^{n-1} \to [0,\infty)$ such that $0 \le f(x) - f_i(x^{(i)}) \le 1$ and $\sum_{i=1}^n \big(f(x) - f_i(x^{(i)})\big) \le f(x)$.

Corollary 8 Let $Z = f(X_1,\dots,X_n)$ with $X_1,\dots,X_n$ independent, $f$ self-bounded, and $Z \in L^2$. Then,
$\mathrm{Var}(Z) \le E[Z]$.

Proof. The $f_i$ are given. By Efron-Stein, $\mathrm{Var}[Z] \le \sum_{i=1}^n E[(f(X) - f_i(X^{(i)}))^2] \le \sum_{i=1}^n E[f(X) - f_i(X^{(i)})] \le E[f(X)] = E[Z]$.
So self-bounded functions have variances smaller than their expected values. One application
of this is in relative stability:
Definition 9 We say that nonnegative random variables $Z_n$ are relatively stable if $\frac{Z_n}{E[Z_n]} \to 1$ in probability as $n$ increases.

Thus, the expectation is all we need to know about the magnitude of $Z_n$. If we assume that $\mathrm{Var}[Z_n] \le E[Z_n]$, then
$P\Big(\Big|\frac{Z_n}{E[Z_n]} - 1\Big| \ge \epsilon\Big) \le \frac{\mathrm{Var}[Z_n]}{\epsilon^2 E[Z_n]^2} \le \frac{1}{\epsilon^2 E[Z_n]}$,
so $Z_n$ is relatively stable whenever $E[Z_n] \to \infty$.
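A concrete sketch of relative stability (the Binomial choice is our own example, not from the notes): $Z_n \sim \mathrm{Binomial}(n,p)$ has $\mathrm{Var}[Z_n] = np(1-p) \le np = E[Z_n]$, and the simulation shows $P(|Z_n/E[Z_n] - 1| > 0.1)$ shrinking as $n$ grows.

```python
# Sketch: a variable with Var[Z_n] <= E[Z_n]; the ratio Z_n / E[Z_n] concentrates near 1.
import numpy as np

rng = np.random.default_rng(5)
p, trials = 0.3, 20_000

for n in [10, 100, 1000, 10000]:
    Z = rng.binomial(n, p, size=trials)          # Var = np(1-p) <= np = E[Z]
    ratio = Z / (n * p)
    print(n, np.mean(np.abs(ratio - 1) > 0.1))   # P(|Z/E[Z] - 1| > 0.1) shrinks with n
```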
Now, we move on to configuration functions. First, we say that a property $\Pi$ is defined over a union of finite products of a set $\mathcal{X}$: let $\Pi_1 \subseteq \mathcal{X}_1$, $\Pi_2 \subseteq \mathcal{X}_1\times\mathcal{X}_2$, and so on. We say that $(x_1,\dots,x_n) \in \mathcal{X}^n$ satisfies the property if $(x_1,\dots,x_n) \in \Pi_n$. $\Pi$ is hereditary (monotone) if the following holds: if $(x_1,\dots,x_n) \in \Pi_n$, then for any $i_1 < \dots < i_k$, $(x_{i_1},\dots,x_{i_k}) \in \Pi_k$. Let $f : \bigcup_i \mathcal{X}^i \to \mathbb{N}$ map any string $x = (x_1,\dots,x_n)$ to the size of the maximal sub-string of $x$ that satisfies $\Pi$. Then $f$ is called a configuration function.

Corollary 10 If $f$ is a configuration function, then $f$ is self-bounded.

Proof. Let $f_i$ be $f$ applied to the remaining $n-1$ coordinates, and $x = (x_1,\dots,x_n) \in \mathcal{X}^n$. Let $i_1,\dots,i_k$ be such that $(x_{i_1},\dots,x_{i_k})$ is a maximal sub-string satisfying $\Pi$, so $f_i(x^{(i)}) = f(x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$. Then, clearly, $0 \le f - f_i \le 1$, and $f$ and $f_i$ can only differ for $i \in \{i_1,\dots,i_k\}$, so $\sum_{i=1}^n \big(f(x) - f_i(x^{(i)})\big) \le k = f(x)$. Then, by Corollary 8, $\mathrm{Var}(f(X)) \le E[f(X)]$.
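A small sketch (the function is our own example, not from the notes) checking the self-bounding conditions for a simple configuration function: $f(x)$ = number of distinct values among $x_1,\dots,x_n$, whose hereditary property is "all selected entries are distinct".

```python
# Sketch: verify the self-bounding conditions for f(x) = number of distinct values.
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    return len(set(x))

x = list(rng.integers(0, 5, size=10))
drops = [f(x) - f(x[:i] + x[i + 1:]) for i in range(len(x))]

print(all(0 <= d <= 1 for d in drops))   # 0 <= f(x) - f_i(x^{(i)}) <= 1
print(sum(drops) <= f(x))                # sum_i (f(x) - f_i(x^{(i)})) <= f(x)
```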
VC dimensions are an example of this. Let $\mathcal{A}$ be a collection of subsets of $\mathcal{X}$ and let $X = (x_1,\dots,x_n) \in \mathcal{X}^n$. The trace is $\mathrm{Tr}(X) = \{A \cap \{x_1,\dots,x_n\} : A \in \mathcal{A}\}$. We can see that, depending on $\mathcal{A}$, the trace of $X$ may not capture the richness of the entire power set of $\{x_1,\dots,x_n\}$ (consider $X$ a collection of points and $\mathcal{A}$ the set of half-spaces facing to the right). The shatter coefficient (the growth coefficient) of $X$ is $|\mathrm{Tr}(X)|$. A subset $\{x_{i_1},\dots,x_{i_k}\} \subseteq \{x_1,\dots,x_n\}$ is shattered by $\mathcal{A}$ if its trace is equal to its power set. The VC dimension of $\mathcal{A}$ with respect to that particular $X$, denoted $D(X)$, is the size of the maximal subset shattered by $\mathcal{A}$. Clearly, the property of being shattered is hereditary, so the VC dimension is a configuration function and we have $\mathrm{Var}[D(X)] \le E[D(X)]$. This is an example of an empirical process that concentrates.
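A toy sketch (the class of left half-lines on the real line is an assumed example) computing the trace and shatter coefficient of a point set, and exhibiting VC dimension 1 for this class.

```python
# Sketch: trace, shatter coefficient, and VC dimension for the class {(-inf, t]}.
def trace(points, thresholds):
    # all subsets of `points` picked out by half-lines (-inf, t], t in `thresholds`
    return {frozenset(p for p in points if p <= t) for t in thresholds}

def shatters(points, thresholds):
    # shattered means the trace equals the full power set of `points`
    return len(trace(points, thresholds)) == 2 ** len(points)

X = [0.3, 1.7, 2.5, 4.0]
T = [-1.0, 0.5, 2.0, 3.0, 5.0]

print(len(trace(X, T)))                             # 5 = |X| + 1 traced subsets
print(shatters([1.0], T), shatters([1.0, 2.0], T))  # True, False -> VC dimension is 1
```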
Another key example of an empirical process that concentrates is the Rademacher complexity. Let $x_1,\dots,x_n$ be independent uniform $[0,1]^d$ random variables and $\epsilon_1,\dots,\epsilon_n$ i.i.d. Rademacher$(\frac{1}{2})$. Let $Z = E\big[\max_{k\in[d]} \sum_{j=1}^n \epsilon_j (x_j^\top e_k) \mid x_1,\dots,x_n\big]$. $Z$ has the self-bounding property, as removing one element from the summation inside the maximization can only decrease the total value by less than one, i.e. $0 \le Z - Z_i \le 1$. Furthermore, $\sum_i (Z - Z_i) = \sum_i \big(E[\max_{k\in[d]} \sum_j \epsilon_j (x_j^\top e_k)] - E[\max_{k\in[d]} \sum_{j\ne i} \epsilon_j (x_j^\top e_k)]\big) \le n(E[\max$
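A Monte Carlo sketch (the dimensions, sample sizes, and data law are assumptions for illustration) estimating $Z$ for one fixed draw of $x_1,\dots,x_n$, i.e. the conditional expectation over the Rademacher signs of the maximum coordinate sum.

```python
# Sketch: estimate Z = E[ max_k sum_j eps_j * x_j[k] | x_1,...,x_n ] by resampling signs.
import numpy as np

rng = np.random.default_rng(3)
n, d, mc = 50, 5, 10_000

x = rng.uniform(size=(n, d))              # fixed data x_1,...,x_n in [0,1]^d
eps = rng.choice([-1, 1], size=(mc, n))   # mc independent draws of the Rademacher signs

# for each sign draw: max over coordinates k of sum_j eps_j * x_j[k]; then average
Z_hat = np.mean((eps @ x).max(axis=1))
print(f"estimated Z ~ {Z_hat:.3f}")
```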
