
CHAPTER 6

Shannon entropy

This chapter is a digression into information theory. This is a fascinating subject, which arose once the notion of information was made precise and quantifiable. At first sight, information theory has nothing to do with physics. However, the concept of Shannon entropy shares some intuition with Boltzmann's, and some of the mathematics developed in information theory turns out to be relevant in statistical mechanics. This chapter contains very basic material on information theory.

1. Quantifying information
Computers have popularized the notion of the bit, a unit of information that takes two values, 0 or 1. We introduce the information size H_0(A) of a set A as the number of bits that are necessary to encode each element of A separately, i.e.

H_0(A) = \log_2 |A|.    (6.1)

This quantity has a unit, the bit. If we have two sets A and B, then

H_0(A × B) = H_0(A) + H_0(B).    (6.2)

This justifies the logarithm. The information size of a set is not necessarily an integer. If we need to encode the elements of A, the number of necessary bits is ⌈H_0(A)⌉ rather than H_0(A); but this is irrelevant for the theory.
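As a quick illustration, here is a minimal Python sketch (the sets and function name are ours, chosen only for the example) that computes information sizes and checks the additivity property (6.2):

import math

def information_size(A):
    """Information size H_0(A) = log2 |A|, in bits."""
    return math.log2(len(A))

letters = set("abcdefgh")      # |A| = 8, so H_0(A) = 3 bits
digits  = set("0123456789")    # |B| = 10, so H_0(B) = log2(10) ≈ 3.32 bits
pairs = {(a, b) for a in letters for b in digits}   # the product set A × B

print(information_size(letters))             # 3.0
print(information_size(digits))              # 3.3219...
print(information_size(pairs))               # equals the sum of the two, as in (6.2)
print(math.ceil(information_size(digits)))   # 4 bits are needed in practice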
The ideas above are natural and anybody might have invented the concept of
information size. The next notion, the information gain, is intuitive but needed a
genius to define and quantify it.
Suppose you need to uncover a certain English word of five letters. You manage
to obtain one letter, namely an e. This is useful information, but the letter e is
common in English, so this provides little information. If, on the other hand, the
letter that you discover is j (the least common in English), the search has been narrowed much further and you have obtained more information. The information gain quantifies this heuristic.
We need to introduce the relevant formalism. Let A = (A, p) be a discrete probability space. That is, A = {a_1, …, a_n} is a finite set, and each element a_i has probability p_i. (The σ-algebra is the set of all subsets of A.) The information gain G(B|A) measures the gain obtained by the knowledge that the outcome belongs to the set B ⊂ A. We denote p(B) = \sum_{i∈B} p_i.

Definition 6.1. The information gain is

G(B|A) = \log_2 \frac{1}{p(B)} = −\log_2 p(B).


The information gain is positive, and it satisfies the following additivity property. Let B ⊂ C ⊂ A. The gain for knowing that the outcome is in C is G(C|A) = −\log_2 p(C). The gain for knowing that it is in B, after knowing that it is in C, is

G(B|C) = −\log_2 p(B|C) = −\log_2 \frac{p(B)}{p(C)}.    (6.3)

It follows that G(B|A) = G(C|A) + G(B|C), as it should be.
The unit for the information gain is the bit. We gain 1 bit if p(B) = 1/2.
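As a small illustration (a sketch with probabilities made up for the example, not taken from the text), the following Python lines compute information gains and check the additivity G(B|A) = G(C|A) + G(B|C):

import math

# Probabilities of the outcomes a1, ..., a4 (chosen arbitrarily)
p = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}

def prob(event):
    """p(B) = sum of the probabilities of the outcomes in B."""
    return sum(p[a] for a in event)

def gain(event):
    """Information gain G(B|A) = -log2 p(B), in bits."""
    return -math.log2(prob(event))

B = {"a1"}             # p(B) = 1/2, so the gain is exactly 1 bit
C = {"a1", "a2"}       # B ⊂ C ⊂ A

print(gain(B))                                      # 1.0
print(gain(C))                                      # -log2(0.75) ≈ 0.415
print(gain(C) - math.log2(prob(B) / prob(C)))       # equals gain(B): additivity (6.3)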

2. Shannon entropy
Shannon entropy is named after Shannon [1], although its origin goes back to Pauli and von Neumann.

Definition 6.2. The Shannon entropy of A is

H(A) = −\sum_{i=1}^{n} p_i \log_2 p_i.

The extension to continuous probability spaces is not straightforward and we do not discuss it here.
Proposition 6.1. H(A) ≤ \log_2 n, with equality iff p(a_i) = 1/n for all i.

Proof. Since −\log_2 is convex, Jensen's inequality gives

−\log_2 \Bigl( \sum_{i=1}^{n} p_i \frac{1}{p_i} \Bigr) ≤ \sum_{i=1}^{n} p_i \Bigl( −\log_2 \frac{1}{p_i} \Bigr) = −H(A).    (6.4)

The argument of the logarithm on the left side is equal to n, so that H(A) ≤ \log_2 n. Since −\log_2 is strictly convex, the inequality above is strict unless p_1 = ⋯ = p_n. □
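A minimal Python sketch (the helper is ours, not from the text) that computes the Shannon entropy of a finite probability vector and illustrates the bound of Proposition 6.1:

import math

def shannon_entropy(p):
    """H(A) = -sum_i p_i log2 p_i, in bits; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

uniform = [1/8] * 8
biased  = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125, 0.0, 0.0]

print(shannon_entropy(uniform))   # 3.0 bits = log2(8): the maximum
print(shannon_entropy(biased))    # ≈ 1.94 bits, strictly less than log2(8)
print(math.log2(len(uniform)))    # the bound of Proposition 6.1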
Proposition 6.2. H is strictly concave with respect to p. That is, writing H(p) instead of H(A = (A, p)), we have, for 0 < α < 1,

H(α p^{(1)} + (1 − α) p^{(2)}) ≥ α H(p^{(1)}) + (1 − α) H(p^{(2)}),

with strict inequality if p^{(1)} ≠ p^{(2)}.
Proof. The space of probabilities on A is the convex set

P = { (p_1, …, p_n) : 0 ≤ p_i ≤ 1, \sum_i p_i = 1 }.    (6.5)

(It is actually a simplex.) Given p in the interior of P, let q = (q_1, …, q_n) be such that \sum_i q_i = 0, and such that p + λq ∈ P provided the number λ is small enough. Then

H(p + λq) = −\sum_{i=1}^{n} (p_i + λq_i) \log_2 (p_i + λq_i).    (6.6)

The derivatives with respect to λ are (the first one uses \sum_i q_i = 0)

\frac{dH}{dλ} = −\sum_{i=1}^{n} q_i \log_2 (p_i + λq_i),   \frac{d^2 H}{dλ^2} = −\frac{1}{\log 2} \sum_{i=1}^{n} \frac{q_i^2}{p_i + λq_i}.    (6.7)

The latter is strictly negative. □
[1] The American Claude Shannon (1916–2001) wrote “A mathematical theory of communication” in 1948, the article that created information theory.

Definition 6.3. The relative entropy of the probability p with respect to the probability q is

H(p|q) = \sum_{i=1}^{n} p_i \log_2 \frac{p_i}{q_i}.

The definition is not symmetric under exchange of p and q: in general, H(p|q) ≠ H(q|p).
Proposition 6.3. H(p|q) ≥ 0, with equality iff p = q.

Proof. By Jensen,

H(p|q) = −\sum_i p_i \log_2 \frac{q_i}{p_i} ≥ −\log_2 \Bigl( \sum_i p_i \frac{q_i}{p_i} \Bigr) = 0.    (6.8)

□
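The following Python sketch (helper name and probability vectors are ours) computes the relative entropy and illustrates Proposition 6.3 as well as the lack of symmetry:

import math

def relative_entropy(p, q):
    """H(p|q) = sum_i p_i log2(p_i / q_i), in bits (assumes q_i > 0 wherever p_i > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

print(relative_entropy(p, q))   # ≈ 0.31 > 0, since p differs from q
print(relative_entropy(q, p))   # ≈ 0.35: positive as well, but a different value
print(relative_entropy(p, p))   # 0.0, the equality case of Proposition 6.3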

3. Relation with Boltzmann entropy


Statistical mechanics provides three probability measures on the phase space: the microcanonical, canonical, and grand-canonical measures. We now study the Shannon entropy of these measures; as it turns out, it coincides with the thermodynamic entropy. For simplicity we consider the Ising model in the lattice-gas interpretation, but the present discussion is clearly more general. Recall that the domain D ⊂ Z^d is discrete, and that the state space is Ω = {0, 1}^D. The Hamiltonian H : Ω → R involves a sum over nearest neighbours.

Microcanonical ensemble. The probability is uniform over all states with energy U and number of particles N:

p_micro(ω) = 1/X(U,D,N) if H(ω) = U and N(ω) = N, and p_micro(ω) = 0 otherwise.

Then H(p_micro) = \log_2 X(U,D,N) = \frac{1}{k_B \log 2} S(U,D,N).

Canonical ensemble. We now have the Gibbs factor:

p_can(ω) = e^{−βH(ω)} / Y(β,D,N) if N(ω) = N, and p_can(ω) = 0 otherwise.

One easily obtains

H(p_can) = \frac{1}{\log 2} \bigl( β⟨H(ω)⟩ + \log Y(β,D,N) \bigr).

The average of the Hamiltonian is equal to the thermodynamic energy U(β,D,N). The logarithm of Y is equal to −βF(β,D,N). Recall that the free energy is related to the energy and the entropy by F = U − TS. With β = 1/k_B T, we see that

H(p_can) = \frac{1}{k_B \log 2} S(β,D,N).
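The identity H(p_can) = (β⟨H⟩ + log Y)/log 2 is easy to check numerically. Here is a Python sketch for a toy system with a handful of states (the energies and the value of β below are made up for the illustration; this is not the lattice-gas model itself):

import math

beta = 1.3                              # inverse temperature, arbitrary for the example
energies = [0.0, 0.5, 0.5, 1.2, 2.0]    # energies of five toy microstates

# Canonical weights and partition function Y
weights = [math.exp(-beta * e) for e in energies]
Y = sum(weights)
p_can = [w / Y for w in weights]

# Shannon entropy computed directly from the definition
H_direct = -sum(p * math.log2(p) for p in p_can)

# The same quantity via the formula of the text
mean_energy = sum(p * e for p, e in zip(p_can, energies))
H_formula = (beta * mean_energy + math.log(Y)) / math.log(2)

print(H_direct, H_formula)              # the two numbers agree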
54 6. SHANNON ENTROPY

Grand-canonical ensemble. The probability p_gd-can now involves the chemical potential. As is checked in the exercises, we have H(p_gd-can) = \frac{1}{k_B \log 2} S(β,D,µ).

A thorough rigorous discussion of the relations between Shannon and thermodynamic entropies, and of the theory of large deviations, can be found in Pfister (2002) [2].

4. Shannon theorem
A basic problem in information theory deals with encoding large quantities of information. We start with a finite set A, which can denote the 26 letters of the Latin alphabet, or the 128 ASCII symbols, or a larger set of words. We consider a file that contains N symbols, with N large. How many bits are required so that the file can be encoded without loss of information? The answer is given by the information size, H_0(A^N) = N H_0(A).
The question becomes more interesting, and the answer more surprising, if we allow an error δ. We now seek to encode only files that fall in a set B ⊂ A, such that p(B) ≥ 1 − δ. If a file turns out to be in A \ B, then we lose the information. The information size is given by H_δ(A), where

H_δ(A) = \inf_{B ⊂ A, \, p(B) ≥ 1−δ} \log_2 |B|.    (6.9)

Notice that \lim_{δ→0} H_δ(A) = H_0(A).


The occurrence of probabilities may be confusing, so a discussion is needed. In the real world, information is selected for a purpose and its content is well chosen. But we model the problem of information transmission using probabilities; the process can be described as follows. A string of N characters is selected at random according to some probability. One wants to encode this string, to send it (or to store it), and to decode it. We assume that no error is committed during these operations, except that the string may lie outside the set of codable strings. This modelling is behind all compression algorithms. Strange as it may seem, an MP3 file of Manon Lescaut has been compressed as if the music were the result of random composing by Puccini, random performing by the orchestra, and random singing by the soloists!
So we want to say something about H_δ(A^N) (the probability of (a_1, …, a_N) ∈ A^N is \prod_{i=1}^{N} p(a_i), i.e. we assume independence). Notice that H_δ(A) ≤ H_0(A), and also that H(A) ≤ H_0(A), with equality iff the elements of A come with equal probability. We have H_0(A^N) = N H_0(A), but H_δ(A^N) is smaller than N H_δ(A) in general.

Theorem I (Shannon source coding theorem). For any δ > 0,

\lim_{N→∞} \frac{1}{N} H_δ(A^N) = H(A).

The theorem says that if we allow for a tiny error, and if our message is large (depending on the error), the number of required bits is roughly N H(A). Notice that the limit in the theorem is a true limit, not a lim inf or a lim sup. Thus Shannon entropy gives the optimal compression rate, which can be approached but not improved upon.
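The mechanism behind the theorem, namely the concentration of −(1/N) \log_2 p(a_1, …, a_N) around H(A), is easy to observe numerically. Here is a Python sketch (the distribution and the sample size are chosen arbitrarily for the illustration) that samples an i.i.d. string and compares this quantity with H(A):

import math
import random

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # an arbitrary alphabet
H = -sum(p * math.log2(p) for p in probs.values())       # H(A) = 1.75 bits

letters = list(probs)
weights = list(probs.values())
N = 10_000

random.seed(0)
string = random.choices(letters, weights=weights, k=N)

# -(1/N) log2 p(a_1, ..., a_N) for the sampled string (independence assumed)
empirical = -sum(math.log2(probs[a]) for a in string) / N

print(H)            # 1.75
print(empirical)    # close to 1.75 for large N, by the law of large numbers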
[2] Ch.-É. Pfister, Thermodynamical aspects of classical lattice systems, in: In and out of equilibrium: Physics with a probability flavor, Progr. Probab. 51, Birkhäuser (2002).

Proof. It is based on the (weak) law of large numbers. Consider the random variable −\log_2 p(a). The law of large numbers states that, for any ε > 0,

\lim_{N→∞} Prob\Bigl\{ (a_1, …, a_N) : \Bigl| \frac{1}{N} \sum_{i=1}^{N} \bigl( −\log_2 p(a_i) \bigr) − E\bigl( −\log_2 p(a) \bigr) \Bigr| > ε \Bigr\} = 0.    (6.10)

Here the empirical average is equal to −\frac{1}{N} \log_2 p(a_1, …, a_N), and the expectation is equal to H(A). There exists therefore a set A_{N,ε} ⊂ A^N such that \lim_N p(A_{N,ε}) = 1, and such that any (a_1, …, a_N) ∈ A_{N,ε} satisfies

2^{−N(H(A)+ε)} ≤ p(a_1, …, a_N) ≤ 2^{−N(H(A)−ε)}.    (6.11)

The number of elements of A_{N,ε} is easily estimated:

1 ≥ p(A_{N,ε}) ≥ |A_{N,ε}| \, 2^{−N(H(A)+ε)},    (6.12)

so that |A_{N,ε}| ≤ 2^{N(H(A)+ε)}. For any δ > 0, we can choose N large enough so that p(A_{N,ε}) > 1 − δ. Then

H_δ(A^N) ≤ \log_2 |A_{N,ε}| ≤ N (H(A) + ε).    (6.13)

Since ε is arbitrary, it follows that

\limsup_{N→∞} \frac{1}{N} H_δ(A^N) ≤ H(A).    (6.14)

For the lower bound, let B_{N,δ} be the minimizer for H_δ; that is, p(B_{N,δ}) ≥ 1 − δ, and

H_δ(A^N) = \log_2 |B_{N,δ}| ≥ \log_2 |B_{N,δ} ∩ A_{N,ε}|.    (6.15)

We need a lower bound for the latter term. We have

1 − δ ≤ p(B_{N,δ} ∩ A_{N,ε}) + p(B_{N,δ} ∩ A_{N,ε}^c),    (6.16)

where the first term is at most |B_{N,δ} ∩ A_{N,ε}| \, 2^{−N(H(A)−ε)} by (6.11), and the second term is at most δ if N is large. Then

|B_{N,δ} ∩ A_{N,ε}| ≥ (1 − 2δ) \, 2^{N(H(A)−ε)}.    (6.17)

We obtain

\frac{1}{N} H_δ(A^N) ≥ \frac{1}{N} \log_2 (1 − 2δ) + H(A) − ε.    (6.18)

This gives the desired bound for the lim inf, and Shannon's theorem follows. □

It is instructive to quantify the ε's and δ's of the proof. Invoking the central limit theorem instead of the law of large numbers, we get

1 − p(A_{N,ε}) ≈ 2 \int_{\sqrt{N} ε/σ}^{∞} \frac{1}{\sqrt{2π}} e^{−t^2/2} \, dt ≈ e^{−Nε^2/2σ^2} ≈ δ

(σ^2 is the variance of the random variable −\log_2 p(a)). This shows that ε ≈ N^{−1/2} and δ ≈ e^{−N}. It is surprising that δ can be so tiny, and yet makes such a difference!

5. Codes and compression algorithms


A code is a mapping from a “string” (a finite sequence of letters) to a finite sequence of binary numbers. The coding of a string is sequential, i.e. letters are coded one by one, independently of one another. We need some notation. A binary sequence, or sequence of bits, is written b = b_1 b_2 … b_m, where each b_i = 0, 1. The concatenation of two binary sequences b and b′, of respective lengths m and n, is the sequence

bb′ = b_1 … b_m b′_1 … b′_n.

Of course, the length of bb′ is m + n. Let B be the set of all finite binary sequences, of arbitrary length. We want to encode a string (a_1, …, a_n) of elements of an “alphabet” A.

Definition 6.4. A code is a map c from ∪_{N≥1} A^N to B which satisfies the following sequential property:

c(a_1, …, a_n) = c(a_1) … c(a_n).

A code is uniquely decodable if the mapping is injective (one-to-one).
Some codes map every letter a ∈ A to a binary sequence of fixed length; for instance, ASCII characters use 7 bits for each letter. But compression algorithms use variable-length codes. A special class of variable-length codes are the prefix codes. These are uniquely decodable codes in which no code word is the prefix of another, so that a sequence of binary numbers can be decoded sequentially: one reads the binary numbers from the left, one by one, until one recognizes the code of a letter; one extracts the letter, and reads the next binary numbers until the next letter is identified. These notions are best understood with the help of examples.
Let A = {a1, a2, a3, a4}. The code c(1) below is undecodable; c(2) is uniquely decodable but is not a prefix code; and c(3) and c(4) are prefix codes.

c(1): a1 → 0      c(2): a1 → 0        c(3): a1 → 00     c(4): a1 → 0
      a2 → 1            a2 → 01             a2 → 01            a2 → 10
      a3 → 00           a3 → 011            a3 → 10            a3 → 110
      a4 → 11           a4 → 0111           a4 → 11            a4 → 111

Any fixed-length, uniquely decodable code is also a prefix code. Prefix codes can be represented by trees, see Fig. 6.1.

[Figure 6.1. Tree representations for the prefix codes c(3) and c(4): each code word corresponds to a leaf of a binary tree, the two branches at every node being labelled 0 and 1.]
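To make the sequential decoding concrete, here is a small Python sketch of a prefix decoder, applied to the code c(4) of the text (the helper functions are ours):

# The prefix code c(4) from the text
c4 = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
decode_table = {bits: letter for letter, bits in c4.items()}

def encode(letters, code):
    """Sequential coding: concatenate the code words of the letters."""
    return "".join(code[a] for a in letters)

def decode_prefix(bits, table):
    """Read bits from the left until a code word is recognized, then start over."""
    letters, current = [], ""
    for b in bits:
        current += b
        if current in table:      # works because no code word is a prefix of another
            letters.append(table[current])
            current = ""
    return letters

message = ["a1", "a3", "a2", "a4", "a1"]
bits = encode(message, c4)                  # "0110101110"
print(bits)
print(decode_prefix(bits, decode_table))    # recovers the original message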

The goal of compression algorithms [3] is to encode strings with the smallest possible sequence of binary numbers. For a ∈ A, let ℓ(a) be the length of the sequence c(a). If each letter a occurs with (independent) probability p(a), the expected length per letter is

L(A, c) = \sum_{a∈A} p(a) ℓ(a).    (6.19)

Thus the goal is to find the code that minimizes L(A, c). It is instructive to consider the following two situations with A = {a1, a2, a3, a4} and the two prefix codes above.
(1) If p1 = p2 = p3 = p4 = 1/4, the expected length for c(3) is 2 bits, and for c(4) it is 2.25 bits. The first code is better. Notice that H(A) = 2 bits.
(2) If p1 = 1/2, p2 = 1/4, p3 = p4 = 1/8, the expected length for c(3) is still 2 bits, but it is now 1.75 bits for c(4). The second code is better. Notice that H(A) = 1.75 bits.
As we will see, the codes are optimal for these two situations.
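These two situations are easy to reproduce in a few lines of Python (a sketch using the codes and probabilities above):

import math

c3 = {"a1": "00", "a2": "01", "a3": "10", "a4": "11"}
c4 = {"a1": "0",  "a2": "10", "a3": "110", "a4": "111"}

def expected_length(p, code):
    """L(A, c) = sum_a p(a) * length of c(a), in bits."""
    return sum(p[a] * len(bits) for a, bits in code.items())

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p.values())

uniform = {"a1": 0.25, "a2": 0.25, "a3": 0.25, "a4": 0.25}
biased  = {"a1": 0.5,  "a2": 0.25, "a3": 0.125, "a4": 0.125}

for p in (uniform, biased):
    print(expected_length(p, c3), expected_length(p, c4), entropy(p))
# uniform: 2.0, 2.25, 2.0   -> c(3) is better and matches H(A)
# biased : 2.0, 1.75, 1.75  -> c(4) is better and matches H(A)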
The idea is that frequent letters should be coded with smaller length. This clearly comes at the expense of other letters, which will need longer strings in order for the code to be decodable. There is a bound on the minimally achievable lengths that does not involve probabilities. It is known as the Kraft inequality.
Proposition 6.4 (Kraft inequality).
• Any one-to-one code on A satisfies

\sum_{a∈A} 2^{−ℓ(a)} ≤ 1.

• Given lengths {ℓ(a) : a ∈ A} satisfying the inequality above, there corresponds a prefix code.
Proof. We need to prove the first statement for any uniquely decodable code, not necessarily a prefix code. We start with

\Bigl( \sum_{a∈A} 2^{−ℓ(a)} \Bigr)^N = \sum_{a_1,…,a_N} 2^{−ℓ(a_1)−…−ℓ(a_N)} = \sum_{L=Nℓ_min}^{Nℓ_max} 2^{−L} \, #\{ (a_1, …, a_N) : ℓ(c(a_1, …, a_N)) = L \}.    (6.20)

We set ℓ_min = \min_{a∈A} ℓ(a), and similarly for ℓ_max. Since the code is one-to-one, the number #{·} above is no more than 2^L. Then

\Bigl( \sum_{a∈A} 2^{−ℓ(a)} \Bigr)^N ≤ \sum_{L=Nℓ_min}^{Nℓ_max} 2^{−L} 2^L = N(ℓ_max − ℓ_min) + 1.    (6.21)

Since the right side grows linearly with N, the left side cannot grow exponentially with N; it must be less than or equal to 1.
The second claim can be proved e.g. by giving an explicit construction of the prefix code, given lengths that satisfy the Kraft inequality. It is left as an exercise. □
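A quick numerical check of the first statement on the four example codes (a sketch; c(1) is included to show what happens for a code that is not uniquely decodable):

codes = {
    "c1": {"a1": "0", "a2": "1",  "a3": "00",  "a4": "11"},    # not uniquely decodable
    "c2": {"a1": "0", "a2": "01", "a3": "011", "a4": "0111"},  # uniquely decodable, not prefix
    "c3": {"a1": "00", "a2": "01", "a3": "10", "a4": "11"},    # prefix
    "c4": {"a1": "0", "a2": "10", "a3": "110", "a4": "111"},   # prefix
}

for name, code in codes.items():
    kraft_sum = sum(2 ** -len(bits) for bits in code.values())
    print(name, kraft_sum)
# c1: 1.5  (> 1, consistent with c(1) not being uniquely decodable)
# c2: 0.9375, c3: 1.0, c4: 1.0  (all <= 1, as the Kraft inequality requires)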
[3] The word “algorithm” derives from Abu Ja'far Muhammad ibn Musa Al-Khwarizmi, who was born in Baghdad around 780 and died around 850.

Shannon entropy again appears as a limit for data compression.

Theorem II (Limit to compression). For any alphabet A, and any probability p on A, the optimal prefix code c satisfies

H(A) ≤ L(A, c) ≤ H(A) + 1.

Proof. For the lower bound, consider a code c with lengths ℓ_i = ℓ(c(a_i)). Define q_i = 2^{−ℓ_i}/z, with z = \sum_j 2^{−ℓ_j}. We have

L(A, c) = \sum_{i=1}^{n} p_i ℓ_i = −\sum_{i=1}^{n} p_i \log_2 q_i − \log_2 z ≥ −\sum_{i=1}^{n} p_i \log_2 p_i = H(A).    (6.22)

The inequality holds because of the positivity of the relative entropy, Proposition 6.3, and because z ≤ 1 (Kraft inequality).
For the upper bound, define ℓ_i = ⌈−\log_2 p_i⌉ (the smallest integer greater than or equal to −\log_2 p_i). Then

\sum_{i=1}^{n} 2^{−ℓ_i} ≤ \sum_{i=1}^{n} p_i = 1.    (6.23)

This shows that the Kraft inequality is verified, so there exists a prefix code c with these lengths. The expected length is easily estimated:

L(A, c) = \sum_{i=1}^{n} p_i ⌈−\log_2 p_i⌉ ≤ \sum_{i=1}^{n} p_i (−\log_2 p_i + 1) = H(A) + 1.    (6.24)  □
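The construction in the proof of the upper bound is easy to reproduce numerically. The Python sketch below (with an arbitrary probability vector of our choosing) computes the lengths ℓ_i = ⌈−log_2 p_i⌉, checks the Kraft inequality, and verifies H(A) ≤ \sum_i p_i ℓ_i ≤ H(A) + 1; building the actual prefix code from these lengths is Exercise 6.4:

import math

p = [0.4, 0.3, 0.15, 0.1, 0.05]                     # an arbitrary probability vector

lengths = [math.ceil(-math.log2(pi)) for pi in p]   # l_i = ceil(-log2 p_i)
kraft_sum = sum(2 ** -l for l in lengths)           # <= 1, so a prefix code exists
expected_length = sum(pi * l for pi, l in zip(p, lengths))
H = -sum(pi * math.log2(pi) for pi in p)

print(lengths)                # [2, 2, 3, 4, 5]
print(kraft_sum)              # 0.71875 <= 1
print(H, expected_length)     # H(A) <= expected length <= H(A) + 1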

Exercise 6.1. State and prove Jensen’s inequality.

Exercise 6.2. Check that the Shannon entropy of the grand-canonical probability is equal to the corresponding thermodynamic entropy. Consider a discrete statistical mechanics model such as the lattice gas with nearest-neighbour interactions (Ising model).

Exercise 6.3. Recall the definitions of the codes c(1), c(2), c(3), and c(4). Explain why c(1) is undecodable; c(2) is uniquely decodable but not a prefix code; and c(3) and c(4) are prefix codes.

Exercise 6.4. Given lengths {ℓ(a)} satisfying the Kraft inequality, show the existence of a corresponding prefix code.
