Shannon Source Coding Theorem

Kim Boström∗
Institut für Physik, Universität Potsdam, 14469 Potsdam, Germany
∗ Electronic address: [email protected]
The idea of Shannon's famous source coding theorem [1] is to encode only typical messages. Since the typical messages form a tiny subset of all possible messages, we need fewer resources to encode them. We will show that the probability for the occurrence of non-typical strings tends to zero in the limit of large message lengths. Thus we have the paradoxical situation that although we "forget" to encode most messages, we lose no information in the limit of very long strings. In fact, we make use of redundancy, i.e. we do not encode "unnecessary" information represented by strings which almost never occur.

Recall that a random message of length $N$ is a string $x = x_1 \cdots x_N$ of letters drawn from an alphabet $A = \{a_1, \ldots, a_K\}$, where letter $a_k$ occurs with probability $p_k$. There are $K^N$ possible messages, and the probability $p(x)$ of a message is the product of the probabilities of its letters, which expresses the fact that the letters are statistically independent of each other.

Now consider a very long message $x$. Typically, the letter $a_k$ will appear with the frequency $N_k \approx N p_k$. Hence, the probability of such a typical message is roughly
\[
  p(x) \approx p_{\mathrm{typ}} \equiv p_1^{N_1} \cdots p_K^{N_K} = \prod_{k=1}^{K} p_k^{N p_k}. \tag{4}
\]
We see that typical messages are uniformly distributed with probability $p_{\mathrm{typ}}$. This indicates that the set $T$ of typical messages has the size
\[
  |T| \approx \frac{1}{p_{\mathrm{typ}}}. \tag{5}
\]
If we encode each member of $T$ by a binary string, we need
\[
  I_N = \log |T| = -N \sum_{k=1}^{K} p_k \log p_k \equiv N H(X) \tag{6}
\]
bits.
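As a minimal numerical sketch, assume a hypothetical three-letter ensemble with $p = (0.5, 0.3, 0.2)$, a message length $N = 1000$, and base-2 logarithms (all illustrative choices, not taken from the text); then eqs. (4)-(6) give $-\log_2 p_{\mathrm{typ}} = N H(X)$, which the following Python fragment checks:

\begin{verbatim}
import math

# Illustrative letter probabilities p_k and message length N (arbitrary choices).
p = [0.5, 0.3, 0.2]
N = 1000

# Shannon entropy H(X) = -sum_k p_k log2 p_k, cf. eq. (6).
H = -sum(pk * math.log2(pk) for pk in p)

# Probability of a typical message, eq. (4), handled in the log domain
# to avoid numerical underflow: log2 p_typ = sum_k N p_k log2 p_k.
log2_p_typ = sum(N * pk * math.log2(pk) for pk in p)

print("N H(X)          =", N * H)
print("-log2 p_typ     =", -log2_p_typ)      # equal to N H(X), cf. eqs. (5)-(6)
print("bits per letter =", -log2_p_typ / N)  # = H(X)
\end{verbatim}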
Given any function $f$ on the letters, consider the arithmetic mean $A := \frac{1}{N}\sum_{n=1}^{N} f(X_n)$ over the letters $X_1, \ldots, X_N$ of the message, which is also a random variable. Since the $X_n$ are identical copies of the letter ensemble $X$, the average of $A$ is equal to the average of $f(X)$,
\[
  \langle A \rangle = \frac{1}{N}\sum_{n=1}^{N} \langle f(X_n) \rangle = \langle f(X) \rangle. \tag{11}
\]
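Relation (11) can be checked by brute force for a small example: assuming a hypothetical three-letter ensemble, an arbitrary function $f$ on the letters, and a message length small enough to enumerate all $K^N$ messages (all illustrative choices), the following Python fragment computes $\langle A \rangle$ exactly and compares it with $\langle f(X) \rangle$:

\begin{verbatim}
import itertools
import math

# Illustrative choices: letter probabilities, an arbitrary function f(a_k),
# and a short message length.
p = [0.5, 0.3, 0.2]
f = [1.0, 2.0, 5.0]
N = 4

# <f(X)>: average of f over a single letter.
mean_f = sum(pk * fk for pk, fk in zip(p, f))

# <A>: average of the arithmetic mean A = (1/N) sum_n f(x_n), computed exactly
# by weighting every message x with its probability p(x) = prod_n p(x_n).
mean_A = 0.0
for msg in itertools.product(range(len(p)), repeat=N):
    prob = math.prod(p[k] for k in msg)
    mean_A += prob * sum(f[k] for k in msg) / N

print(mean_f, mean_A)   # both are (up to rounding) 2.1, confirming eq. (11)
\end{verbatim}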
Call a message $x = x_1 \cdots x_N$ typical if the arithmetic mean of $f$ over its letters deviates from the expectation value by no more than some small $\delta > 0$,
\[
  \langle f(X) \rangle - \delta \;\le\; \frac{1}{N}\sum_{n=1}^{N} f(x_n) \;\le\; \langle f(X) \rangle + \delta. \tag{17}
\]
The law of large numbers implies that for every $\epsilon, \delta > 0$ there is a natural number $N_0$, such that for all $N > N_0$ the total probability of all typical sequences fulfills
\[
  P_T \equiv \sum_{x \in T} p(x) \;\ge\; 1 - \epsilon. \tag{18}
\]
The total probability $P_T$ represents the probability for a randomly chosen sequence $x$ to lie in the typical set $T$.
Now consider the special random variable
\[
  f(X) := -\log p(X). \tag{19}
\]
The average of $f(X)$ equals the Shannon entropy of the ensemble $X$,
\[
  \langle f(X) \rangle = -\sum_{x \in A} p(x) \log p(x) = H(X). \tag{20}
\]
The typical set now contains all messages $x$ whose probability fulfills
\[
  H - \delta \;\le\; -\frac{1}{N}\sum_{n=1}^{N} \log p(x_n) \;\le\; H + \delta. \tag{21}
\]
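The statement (18) together with the criterion (21) lends itself to a simple Monte Carlo check. In the following Python sketch (the three-letter distribution, $\delta = 0.05$, the message lengths, and the number of trials are all illustrative assumptions), the estimated fraction of $\delta$-typical messages approaches 1 as $N$ grows:

\begin{verbatim}
import math
import random

# Illustrative parameters (arbitrary choices).
p = [0.5, 0.3, 0.2]
H = -sum(pk * math.log2(pk) for pk in p)   # Shannon entropy, eq. (20)
delta = 0.05
trials = 500

random.seed(0)
for N in (50, 500, 5000):
    typical = 0
    for _ in range(trials):
        msg = random.choices(range(len(p)), weights=p, k=N)
        # Empirical value of -(1/N) sum_n log2 p(x_n), cf. eq. (21).
        mean_log = -sum(math.log2(p[k]) for k in msg) / N
        if H - delta <= mean_log <= H + delta:
            typical += 1
    print(N, typical / trials)   # estimate of P_T in eq. (18), tending to 1
\end{verbatim}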
Conversely, since by the right-hand side of (22) every typical message has probability at most $2^{-N(H-\delta)}$, the bound (18) yields
\[
  |T|\, 2^{-N(H-\delta)} \;\ge\; 1 - \epsilon \tag{29}
\]
\[
  \Leftrightarrow\quad |T| \;\ge\; (1-\epsilon)\, 2^{N(H-\delta)}. \tag{30}
\]
Relations (28) and (30) can be combined into the crucial relation
\[
  (1-\epsilon)\, 2^{N(H-\delta)} \;\le\; |T| \;\le\; 2^{N(H+\delta)}. \tag{31}
\]
For $N \to \infty$ we can let $\epsilon, \delta \to 0$ and obtain the desired expression
\[
  |T| \to 2^{N H(X)}, \tag{32}
\]
thus we need $I_N \to N H(X)$ bits to encode the message. Equivalently, the information content per letter reads $I = H(X)$ bits.
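Relation (31) can also be verified exactly for a small case. Assuming a hypothetical binary source with letter probabilities $(1-q, q)$, the typical set can be counted directly, since all messages with the same number of 1s are equiprobable; the values $q = 0.3$, $N = 200$, $\delta = 0.05$ below are illustrative choices:

\begin{verbatim}
import math

# Illustrative binary source: P(1) = q, P(0) = 1 - q (arbitrary choices).
q, N, delta = 0.3, 200, 0.05
H = -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

T_size = 0    # |T|, the number of delta-typical messages, cf. eq. (21)
P_T = 0.0     # total probability of the typical set, cf. eq. (18)
for k in range(N + 1):   # k = number of 1s in the message
    per_letter = -(k * math.log2(q) + (N - k) * math.log2(1 - q)) / N
    if H - delta <= per_letter <= H + delta:
        count = math.comb(N, k)          # all messages with k ones are equiprobable
        T_size += count
        P_T += count * q**k * (1 - q)**(N - k)

eps = 1 - P_T
lower = math.log2(1 - eps) + N * (H - delta)   # log2 of the lower bound in (31)
upper = N * (H + delta)                        # log2 of the upper bound in (31)
print(lower, "<=", math.log2(T_size), "<=", upper)
\end{verbatim}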
Finally, let us investigate whether we can further improve the compression. Relation (30) gives a lower bound for the size of the typical set. Let us compress below $H$ bits per letter by fixing some $\epsilon' > 0$ and encoding only sequences that lie in a "subtypical set" $T' \subset T$ whose size reads
\[
  |T'| \;\le\; (1-\epsilon)\, 2^{N(H-\delta-\epsilon')} \;<\; 2^{N(H-\delta-\epsilon')}. \tag{33}
\]
The right-hand side of (22) states that the probability of a typical sequence is bounded from above by
\[
  p(x) \;\le\; p_{\max} \;\equiv\; 2^{-N(H-\delta)}. \tag{34}
\]
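Combining (33) and (34) indicates how the argument concludes (a sketch in the same notation, not a verbatim continuation of the text): the total probability captured by the subtypical set obeys
\[
  P_{T'} = \sum_{x \in T'} p(x) \;\le\; |T'|\, p_{\max} \;<\; 2^{N(H-\delta-\epsilon')}\, 2^{-N(H-\delta)} = 2^{-N\epsilon'} \;\longrightarrow\; 0 \quad (N \to \infty),
\]
so encoding only $T'$ loses almost every message in the limit of long messages, and the rate of $H(X)$ bits per letter cannot be improved upon.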