Chapter 6
Motivations
• Lossy data compression = compressing a source to a rate less than the source
entropy.
[Block diagram: a source with rate H > C is fed into a channel of capacity C, producing an output with unmanageable error.]
• The distortion measure ρ(z, ẑ) can be viewed as the cost of representing the
source symbol z ∈ Z by a reproduction symbol ẑ ∈ Ẑ.
E.g. Lossy data compression is similar to “grouping”: source symbols {1, 2, 3, 4} are partitioned into two groups, {1, 2} and {3, 4}, and each group is represented by a single representative symbol. The corresponding distortion matrix is

[ρ(i, j)] :=
  0 1 2 2
  1 0 2 2
  2 2 0 1
  2 2 1 0
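As a concrete illustration (not part of the slides), the following Python sketch evaluates the average distortion of this grouping under an assumed uniform source, taking symbols 1 and 3 as the (assumed) representatives of the two groups; it also shows the rate saving from 2 bits to 1 bit per symbol.

```python
import numpy as np

# Distortion matrix rho(i, j) from the grouping example; symbols 1..4 are indexed 0..3.
rho = np.array([[0, 1, 2, 2],
                [1, 0, 2, 2],
                [2, 2, 0, 1],
                [2, 2, 1, 0]])

# Grouping quantizer: {1, 2} -> representative 1, {3, 4} -> representative 3 (assumed choice).
quantizer = {0: 0, 1: 0, 2: 2, 3: 2}

p_z = np.full(4, 0.25)  # assumed uniform source distribution

avg_distortion = sum(p_z[z] * rho[z, quantizer[z]] for z in range(4))
print(avg_distortion)             # 0.5: each symbol is kept (cost 0) or mapped within its group (cost 1)
print(np.log2(4), np.log2(2))     # 2 bits/symbol without grouping vs 1 bit/symbol with grouping
```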
6.1.2 Distortion measures I: 6-3
Example 6.5 (Hamming distortion measure) Let the source alphabet and reproduction alphabet be the same, i.e., Z = Ẑ. Then the Hamming distortion measure is given by

ρ(z, ẑ) :=
  0, if z = ẑ;
  1, if z ≠ ẑ.

This is also named the probability-of-error distortion measure because

E[ρ(Z, Ẑ)] = Pr(Z ≠ Ẑ).
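A quick numerical check of this identity (the joint pmf below is an illustrative assumption):

```python
import numpy as np

p_joint = np.array([[0.4, 0.1],    # assumed P_{Z,Zhat}(z, zhat) for z, zhat in {0, 1}
                    [0.2, 0.3]])
rho = 1.0 - np.eye(2)              # Hamming distortion: 0 on the diagonal, 1 elsewhere

expected_distortion = (p_joint * rho).sum()
prob_error = p_joint.sum() - np.trace(p_joint)   # Pr(Z != Zhat)
print(expected_distortion, prob_error)           # both equal 0.3
```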
• The squared error distortion measure has the advantages of simplicity and
having a closed-form solution for most cases of interest, such as when using
least squares prediction.
• Yet, this measure is not ideal for practical situations involving data intended for human observers (such as image and speech data), as it is inadequate for measuring perceptual quality.
• For example, two speech waveforms in which one is a marginally time-shifted version of the other may have a large squared error distortion; however, they sound quite similar to the human ear.
Distortion measure for sequences I: 6-7
Problem: A problem with taking k = n is that the distortion measure for sequences can no longer be defined based on per-letter distortions, and hence a per-letter formula for the best lossy data compression rate cannot be obtained.
Solution: View lossy data compression as a two-step procedure.
Step 1: Find a data compression code

h : Z^n → Ẑ^n

for which the pre-specified distortion constraint and rate constraint are both satisfied.
Step 2: Derive an (asymptotically) lossless data compression block code for the source h(Z^n). The existence of such a code with block length

k > H(h(Z^n)) bits

is guaranteed by Shannon's lossless source coding theorem.
In this way, the overall two-step code

Z^n → Ẑ^n → {0, 1}^k

is established.
Distortion measure for sequences I: 6-9
• Since the second step is already covered by lossless data compression, the lossy data compression theorem is essentially a theorem about the first step.
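A minimal numerical sketch of this two-step view (the source pmf is an assumption; the quantizer reuses the grouping example above): Step 1 maps each source symbol to a group representative, and Step 2 then needs only about H(h(Z)) bits per symbol to describe the representatives losslessly.

```python
import numpy as np

p_z = np.array([0.1, 0.4, 0.3, 0.2])   # assumed pmf over source symbols {1, 2, 3, 4}
quantizer = np.array([0, 0, 2, 2])      # Step 1: {1, 2} -> representative 1, {3, 4} -> 3

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Distribution of the quantizer output h(Z).
p_rep = np.array([p_z[quantizer == r].sum() for r in (0, 2)])

print(entropy(p_z))    # H(Z) ~ 1.85 bits/symbol (lossless rate needed for the raw source)
print(entropy(p_rep))  # H(h(Z)) = 1 bit/symbol (lossless rate needed after Step 1)
```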
6.2 Fixed-length lossy data compression I: 6-10
Proof:
• time-sharing argument:
– If we can use an (n, M1, D1) code ∼C1 to achieve (R1, D1) and an (n, M2, D2)
code ∼C2 to achieve (R2, D2), then for any rational number 0 < λ < 1, we
can use ∼C1 for a fraction λ of the time and use ∼C2 for a fraction 1 − λ
of the time to achieve (Rλ, Dλ), where Rλ = λR1 + (1 − λ)R2 and Dλ =
λD1 + (1 − λ)D2;
– hence the result holds for any real number 0 < λ < 1 by the density of the
rational numbers in R and the continuity of Rλ and Dλ in λ.
• Let r and s be positive integers and let λ = r/(r + s); then 0 < λ < 1.
Achievable Rate-Distortion Pair I: 6-13
• Assume that the pairs (R1, D1) and (R2, D2) are achievable. Then there exist
a sequence of (n, M1, D1) codes ∼C1 and a sequence of (n, M2, D2) codes ∼C2
such that for n sufficiently large,
(1/n) log₂ M₁ ≤ R₁  and  (1/n) log₂ M₂ ≤ R₂.
• Construct a sequence of new codes ∼C of blocklength n_λ = (r + s)n, codebook size M = M₁^r × M₂^s, and compression function h : Z^{(r+s)n} → Ẑ^{(r+s)n} such that

h(z^{(r+s)n}) = (h₁(z₁^n), . . . , h₁(z_r^n), h₂(z_{r+1}^n), . . . , h₂(z_{r+s}^n)),

where

z^{(r+s)n} = (z₁^n, . . . , z_r^n, z_{r+1}^n, . . . , z_{r+s}^n)

and h₁ and h₂ are the compression functions of ∼C₁ and ∼C₂, respectively.
Achievable Rate-Distortion Pair I: 6-14
• The average (or expected) distortion under the additive distortion measure ρ_n and the rate of code ∼C are given by

E[ρ_{(r+s)n}(Z^{(r+s)n}, h(Z^{(r+s)n}))] / ((r+s)n)
  = (1/(r+s)) { E[ρ_n(Z₁^n, h₁(Z₁^n))]/n + · · · + E[ρ_n(Z_r^n, h₁(Z_r^n))]/n
                + E[ρ_n(Z_{r+1}^n, h₂(Z_{r+1}^n))]/n + · · · + E[ρ_n(Z_{r+s}^n, h₂(Z_{r+s}^n))]/n }
  ≤ (1/(r+s)) (r D₁ + s D₂)
  = λ D₁ + (1 − λ) D₂ = D_λ

and

(1/((r+s)n)) log₂ M = (1/((r+s)n)) log₂(M₁^r × M₂^s)
  = (r/(r+s)) (1/n) log₂ M₁ + (s/(r+s)) (1/n) log₂ M₂
  ≤ λ R₁ + (1 − λ) R₂ = R_λ,

respectively, for n sufficiently large. Thus, (R_λ, D_λ) is achievable by ∼C.
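A tiny sketch of the time-sharing computation above (the two achievable pairs are hypothetical values):

```python
def time_share(R1, D1, R2, D2, lam):
    """Rate-distortion pair achieved by using code 1 a fraction lam of the time and code 2 otherwise."""
    return lam * R1 + (1 - lam) * R2, lam * D1 + (1 - lam) * D2

# lam = r/(r+s) with r = 2, s = 3, combining hypothetical pairs (1 bit, 0.0) and (0 bits, 0.5).
print(time_share(1.0, 0.0, 0.0, 0.5, lam=2 / 5))   # -> (0.4, 0.3)
```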
Achievable Rate-Distortion Pair I: 6-15
Definition 6.16 (Distortion typical set) The distortion δ-typical set with respect to the memoryless (product) distribution P_{Z,Ẑ} on Z^n × Ẑ^n and a bounded additive distortion measure ρ_n(·, ·) is defined by

D_n(δ) := { (z^n, ẑ^n) ∈ Z^n × Ẑ^n :
  | −(1/n) log₂ P_{Z^n}(z^n) − H(Z) | < δ,
  | −(1/n) log₂ P_{Ẑ^n}(ẑ^n) − H(Ẑ) | < δ,
  | −(1/n) log₂ P_{Z^n,Ẑ^n}(z^n, ẑ^n) − H(Z, Ẑ) | < δ,
  and | (1/n) ρ_n(z^n, ẑ^n) − E[ρ(Z, Ẑ)] | < δ }.
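To make Definition 6.16 concrete, here is a Python sketch (not part of the slides) that tests membership of a sequence pair in D_n(δ); the joint pmf (assumed to have full support), the Hamming distortion, and δ are illustrative choices.

```python
import numpy as np

def in_distortion_typical_set(z, zhat, p_joint, rho, delta):
    """Check whether (z^n, zhat^n) lies in the distortion delta-typical set D_n(delta)."""
    z, zhat = np.asarray(z), np.asarray(zhat)
    n = len(z)
    p_z, p_zhat = p_joint.sum(axis=1), p_joint.sum(axis=0)

    # Normalized log-likelihoods and distortion of the given pair of sequences.
    logp_z = -np.log2(p_z[z]).sum() / n
    logp_zhat = -np.log2(p_zhat[zhat]).sum() / n
    logp_joint = -np.log2(p_joint[z, zhat]).sum() / n
    dist = rho[z, zhat].sum() / n

    # Entropies and expected distortion under the generic joint distribution.
    H_Z = -(p_z * np.log2(p_z)).sum()
    H_Zhat = -(p_zhat * np.log2(p_zhat)).sum()
    H_ZZhat = -(p_joint * np.log2(p_joint)).sum()
    E_rho = (p_joint * rho).sum()

    return (abs(logp_z - H_Z) < delta and abs(logp_zhat - H_Zhat) < delta
            and abs(logp_joint - H_ZZhat) < delta and abs(dist - E_rho) < delta)

# Illustrative usage: draw an i.i.d. pair sequence from an assumed joint pmf.
p_joint = np.array([[0.4, 0.1], [0.2, 0.3]])
rho = np.array([[0.0, 1.0], [1.0, 0.0]])                 # Hamming distortion
rng = np.random.default_rng(0)
idx = rng.choice(4, size=2000, p=p_joint.ravel())
z, zhat = idx // 2, idx % 2
print(in_distortion_typical_set(z, zhat, p_joint, rho, delta=0.05))  # typically True for large n
```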
AEP for distortion typical set I: 6-17
Theorem 6.17 If (Z₁, Ẑ₁), (Z₂, Ẑ₂), . . ., (Z_n, Ẑ_n), . . . are i.i.d., and ρ_n is a bounded additive distortion measure, then

−(1/n) log₂ P_{Z^n}(Z₁, Z₂, . . . , Z_n) → H(Z) in probability;
−(1/n) log₂ P_{Ẑ^n}(Ẑ₁, Ẑ₂, . . . , Ẑ_n) → H(Ẑ) in probability;
−(1/n) log₂ P_{Z^n,Ẑ^n}((Z₁, Ẑ₁), . . . , (Z_n, Ẑ_n)) → H(Z, Ẑ) in probability;

and

(1/n) ρ_n(Z^n, Ẑ^n) → E[ρ(Z, Ẑ)] in probability.

Proof: Functions of independent random variables are also independent random variables. Thus, by the weak law of large numbers, we have the desired result. □
• It should be pointed out that without the boundedness assumption, the normalized sum of an i.i.d. sequence does not necessarily converge in probability to a finite mean; hence the need for requiring that ρ be bounded.
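As a quick illustration of Theorem 6.17 (not in the slides), the following sketch draws i.i.d. pairs from an assumed joint pmf and shows the normalized additive Hamming distortion concentrating around E[ρ(Z, Ẑ)] as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
p_joint = np.array([[0.4, 0.1],     # assumed joint pmf P_{Z,Zhat}
                    [0.2, 0.3]])
rho = np.array([[0.0, 1.0],
                [1.0, 0.0]])        # Hamming distortion

E_rho = (p_joint * rho).sum()       # E[rho(Z, Zhat)] = 0.3

for n in (10, 100, 10_000):
    idx = rng.choice(4, size=n, p=p_joint.ravel())   # i.i.d. pairs (Z_i, Zhat_i)
    z, zhat = idx // 2, idx % 2
    print(n, rho[z, zhat].mean(), "vs", E_rho)       # empirical average -> 0.3 as n grows
```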
AEP for distortion typical set I: 6-18
Theorem 6.18 (AEP for distortion measure) Given a DMS {(Z_n, Ẑ_n)} with generic joint distribution P_{Z,Ẑ} and any δ > 0, the distortion δ-typical set satisfies
1. P_{Z^n,Ẑ^n}(D_n^c(δ)) < δ for n sufficiently large.
2. For all (z^n, ẑ^n) in D_n(δ),

P_{Ẑ^n}(ẑ^n) ≥ P_{Ẑ^n|Z^n}(ẑ^n|z^n) 2^{−n[I(Z;Ẑ)+3δ]}.

Proof: The first result follows directly from Theorem 6.17 and the definition of the distortion typical set D_n(δ). The second result can be proved as follows:

P_{Ẑ^n|Z^n}(ẑ^n|z^n) = P_{Z^n,Ẑ^n}(z^n, ẑ^n) / P_{Z^n}(z^n)
  = P_{Ẑ^n}(ẑ^n) · P_{Z^n,Ẑ^n}(z^n, ẑ^n) / [ P_{Z^n}(z^n) P_{Ẑ^n}(ẑ^n) ]
  ≤ P_{Ẑ^n}(ẑ^n) · 2^{−n[H(Z,Ẑ)−δ]} / [ 2^{−n[H(Z)+δ]} 2^{−n[H(Ẑ)+δ]} ]
  = P_{Ẑ^n}(ẑ^n) 2^{n[I(Z;Ẑ)+3δ]},

where the inequality follows from the definition of D_n(δ).
AEP for distortion typical set I: 6-19
where ρ(·, ·) is a given single-letter distortion measure. Then the source's rate-distortion function is given by

R(D) = min_{P_{Ẑ|Z}: E[ρ(Z,Ẑ)] ≤ D} I(Z; Ẑ).
Proof: Define

R^(I)(D) := min_{P_{Ẑ|Z}: E[ρ(Z,Ẑ)] ≤ D} I(Z; Ẑ).   (6.3.3)
1. Achievability Part (i.e., R(D + ε) ≤ R^(I)(D) + 4ε for arbitrarily small ε > 0):
We need to show that for any ε > 0, there exist 0 < γ < 4ε and a sequence of lossy data compression codes {(n, M_n, D + ε)}_{n=1}^∞ with

lim sup_{n→∞} (1/n) log₂ M_n ≤ R^(I)(D) + γ < R^(I)(D) + 4ε.
Step 1: Optimizing the conditional distribution. Let P_{Z̃|Z} be the conditional distribution that achieves R^(I)(D), i.e.,

R^(I)(D) = min_{P_{Ẑ|Z}: E[ρ(Z,Ẑ)] ≤ D} I(Z; Ẑ) = I(Z; Z̃).

Then
E[ρ(Z, Z̃)] ≤ D.

Choose M_n to satisfy

R^(I)(D) + γ/2 ≤ (1/n) log₂ M_n ≤ R^(I)(D) + γ

for some γ in (0, 4ε); such a choice exists for all sufficiently large n > N₀ for some N₀. Define

δ := min{ γ/8, ε/(1 + 2ρ_max) },

where the first term is required in Step 4 and the second term is required in Step 5.
Shannon’s lossy source coding theorem I: 6-22
For convenience, we let K(z^n, z̃^n) denote the indicator function of D_n(δ), i.e.,

K(z^n, z̃^n) =
  1, if (z^n, z̃^n) ∈ D_n(δ);
  0, otherwise.

Then

P_{Z̃^n}( { ∼C_n : z^n ∈ J(∼C_n) } ) = ( 1 − Σ_{z̃^n ∈ Ẑ^n} P_{Z̃^n}(z̃^n) K(z^n, z̃^n) )^{M_n}.
Shannon’s lossy source coding theorem I: 6-26
2. Converse Part (i.e., R(D + ε) ≥ R^(I)(D) for arbitrarily small ε > 0 and any D ∈ {D ≥ 0 : R^(I)(D) > 0}): We need to show that for any sequence of {(n, M_n, D_n)}_{n=1}^∞ codes with

lim sup_{n→∞} (1/n) log₂ M_n < R^(I)(D),

there exists ε > 0 such that

D_n = (1/n) E[ρ_n(Z^n, h_n(Z^n))] > D + ε

for n sufficiently large. The proof is as follows.
Step 1: Convexity of mutual information. By the convexity of the mutual information I(Z; Ẑ) with respect to P_{Ẑ|Z} for a fixed P_Z, we have
Finally,

lim sup_{n→∞} (1/n) log₂ M_n < R^(I)(D)

implies the existence of N and γ > 0 such that

(1/n) log₂ M_n < R^(I)(D) − γ

for all n > N. Therefore, for n > N,

R^(I)( (1/n) E[ρ_n(Z^n, h_n(Z^n))] ) ≤ (1/n) log₂ M_n < R^(I)(D) − γ,

which, together with the fact that R^(I)(D) is strictly decreasing, implies that

(1/n) E[ρ_n(Z^n, h_n(Z^n))] > D + ε

for some ε = ε(γ) > 0 and for all n > N.
Hence, (R, D + ε) is not achievable for any R < R^(I)(D), and the operational R(D) satisfies R(D + ε) ≥ R^(I)(D) for arbitrarily small ε > 0.
Shannon’s lossy source coding theorem I: 6-33
3. Summary:
• For D ∈ {D ≥ 0 : R^(I)(D) > 0}, the achievability and converse parts jointly imply that

R^(I)(D) + 4ε ≥ R(D + ε) ≥ R^(I)(D)

for arbitrarily small ε > 0.
• These inequalities, together with the continuity of R^(I)(D), yield

R(D) = R^(I)(D)

for D ∈ {D ≥ 0 : R^(I)(D) > 0}.
• For D ∈ {D ≥ 0 : R^(I)(D) = 0}, the achievability part gives

R^(I)(D) + 4ε = 4ε ≥ R(D + ε) ≥ 0

for arbitrarily small ε > 0. This immediately implies that

R(D) = 0 (= R^(I)(D)). □
Notes I: 6-34
• After introducing
– Shannon’s source coding theorem for block codes
– Shannon’s channel coding theorem for block codes
– Rate-distortion theorem
in the memoryless (and stationary ergodic) system setting, we briefly elucidate
the “key concepts or techniques” behind these lengthy proofs, in particular:
– The notion of a typical set
∗ The typical set construct – specifically,
· δ-typical set for source coding
· joint δ-typical set for channel coding
· distortion typical set for rate-distortion
uses a law of large numbers or AEP argument to claim the existence
of a set with very high probability; hence, the respective information
manipulation can just focus on the set with negligible performance loss.
Notes I: 6-36
where ρ(·, ·) is a given single-letter distortion measure. Then the source's rate-distortion function is given by

R(D) = R̄^(I)(D),

where

R̄^(I)(D) := lim_{n→∞} R_n^(I)(D)   (6.3.5)

is called the asymptotic information rate-distortion function, and

R_n^(I)(D) := min_{P_{Ẑ^n|Z^n}: (1/n) E[ρ_n(Z^n,Ẑ^n)] ≤ D} (1/n) I(Z^n; Ẑ^n).   (6.3.6)
• Question: Can we extend the theorems to cases where the two arguments fail?
• It is obvious that only when new methods (other than the above two) are
developed can the question be answered in the affirmative.
6.4 Calculation of the rate-distortion function I: 6-39
Theorem 6.23 Fix a binary DMS {Z_n}_{n=1}^∞ with marginal distribution P_Z(0) = 1 − P_Z(1) = p, where 0 < p < 1. Then the source's rate-distortion function under the Hamming additive distortion measure is given by

R(D) =
  h_b(p) − h_b(D), if 0 ≤ D < min{p, 1 − p};
  0,               if D ≥ min{p, 1 − p},

where h_b(p) := −p · log(p) − (1 − p) · log(1 − p) is the binary entropy function.

For any P_{Ẑ|Z} satisfying the distortion constraint E[ρ(Z, Ẑ)] = Pr{Z ≠ Ẑ} ≤ D with 0 ≤ D < min{p, 1 − p}, we have

I(Z; Ẑ) = H(Z) − H(Z|Ẑ)
  = h_b(p) − H(Z ⊕ Ẑ | Ẑ)
  ≥ h_b(p) − H(Z ⊕ Ẑ)   (conditioning never increases entropy)
  ≥ h_b(p) − h_b(D),

where the last inequality follows since h_b(x) is increasing for x ≤ 1/2 and Pr{Z ⊕ Ẑ = 1} ≤ D.
• Since the above derivation holds for any such P_{Ẑ|Z}, we have

R(D) ≥ h_b(p) − h_b(D).
6.4 Calculation of the rate-distortion function I: 6-41
• It remains to show that the lower bound is achievable by some P_{Ẑ|Z}, or equivalently, that H(Z|Ẑ) = h_b(D) for some P_{Ẑ|Z} meeting the distortion constraint.
• In the case p ≤ D < 1 − p, we can let P_{Ẑ|Z}(1|0) = P_{Ẑ|Z}(1|1) = 1 to obtain I(Z; Ẑ) = 0 and

E[ρ(Z, Ẑ)] = Σ_{z=0}^{1} Σ_{ẑ=0}^{1} P_Z(z) P_{Ẑ|Z}(ẑ|z) ρ(z, ẑ) = p ≤ D.

• Here “⊕” denotes modulo-two addition; under the additive Hamming distortion measure, ρ_n(z^n, ẑ^n) = Σ_{i=1}^{n} (z_i ⊕ ẑ_i) is exactly the number of bit changes or bit errors after compression.
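The slides give the closed form only; as a numerical cross-check (not part of the slides), the information rate-distortion function can be traced with the Blahut–Arimoto algorithm. The sketch below uses assumed parameters (p = 0.4, Lagrange parameter s = −4) and verifies that the computed rate agrees with h_b(p) − h_b(D) at the resulting distortion.

```python
import numpy as np

def blahut_arimoto_rd(p_z, rho, s, n_iter=500):
    """Compute one (D, R) point of the rate-distortion curve by Blahut-Arimoto.

    p_z: source pmf; rho: distortion matrix rho[z, zhat]; s < 0: Lagrange parameter
    (more negative s gives smaller distortion). Rate R is returned in bits.
    """
    q = np.full(rho.shape[1], 1.0 / rho.shape[1])       # output marginal, start uniform
    for _ in range(n_iter):
        W = q * np.exp(s * rho)                         # unnormalized test channel P_{Zhat|Z}
        W /= W.sum(axis=1, keepdims=True)
        q = p_z @ W                                     # induced output marginal
    D = float((p_z[:, None] * W * rho).sum())
    R = float((p_z[:, None] * W * np.log2(W / q)).sum())
    return D, R

def h_b(x):
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

p = 0.4
p_z = np.array([p, 1 - p])
rho = np.array([[0.0, 1.0], [1.0, 0.0]])                # Hamming distortion
D, R = blahut_arimoto_rd(p_z, rho, s=-4.0)
print(D, R, h_b(p) - h_b(D))                            # R closely matches h_b(p) - h_b(D)
```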
6.4.2 Rate distortion func / the squared error dist I: 6-43
the rate-distortion function for any continuous memoryless source {Z_i} with a pdf of support R, zero mean, variance σ², and finite differential entropy satisfies

R(D) ≤
  (1/2) log₂(σ²/D), for 0 < D ≤ σ²;
  0,                for D > σ²,

with equality holding when the source is Gaussian.
For 0 < D ≤ σ²:
• Choose a dummy Gaussian random variable W with zero mean and variance aD, where a = 1 − D/σ², independent of Z. Let Ẑ = aZ + W. Then

E[(Z − Ẑ)²] = E[(1 − a)²Z²] + E[W²] = (1 − a)²σ² + aD = D,

which satisfies the distortion constraint.
• Note that the variance of Ẑ is equal to E[a²Z²] + E[W²] = a²σ² + aD = σ² − D.
• Consequently,

R(D) ≤ I(Z; Ẑ)
  = h(Ẑ) − h(Ẑ|Z)
  = h(Ẑ) − h(W + aZ|Z)
  = h(Ẑ) − h(W|Z)
  = h(Ẑ) − h(W)   (by the independence of W and Z)
  = h(Ẑ) − (1/2) log₂(2πe(aD))
  ≤ (1/2) log₂(2πe(σ² − D)) − (1/2) log₂(2πe(aD)) = (1/2) log₂(σ²/D).
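A quick Monte Carlo check (not in the slides; σ² = 4 and D = 1 are illustrative values) of the forward test channel Ẑ = aZ + W used above:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, D, N = 4.0, 1.0, 1_000_000
a = 1 - D / sigma2

Z = rng.normal(0.0, np.sqrt(sigma2), N)         # Gaussian source
W = rng.normal(0.0, np.sqrt(a * D), N)          # dummy noise, independent of Z
Zhat = a * Z + W

print(np.mean((Z - Zhat) ** 2))                 # ~ D = 1.0 (distortion constraint met)
print(np.var(Zhat))                             # ~ sigma^2 - D = 3.0
print(0.5 * np.log2(sigma2 / D))                # R(D) for the Gaussian source: 1 bit
```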
6.4.2 Rate distortion func / the squared error dist I: 6-45
For D > σ²:
• Let Ẑ satisfy Pr{Ẑ = 0} = 1 (and be independent of Z).
• Then E[(Z − Ẑ)²] = E[Z²] + E[Ẑ²] − 2E[Z]E[Ẑ] = σ² < D, and I(Z; Ẑ) = 0.
Hence, R(D) = 0 for D > σ².
The achievability of this upper bound by a Gaussian source (with zero mean and variance σ²) can be proved by showing that, under the Gaussian source,

(1/2) log₂(σ²/D)

is a lower bound to R(D) for 0 < D ≤ σ².
6.4.2 Rate distortion func / the squared error dist I: 6-46
Indeed, when the source Z is Gaussian and for any f_{Ẑ|Z} such that E[(Z − Ẑ)²] ≤ D, we have

I(Z; Ẑ) = h(Z) − h(Z|Ẑ)
  = (1/2) log₂(2πeσ²) − h(Z − Ẑ|Ẑ)
  ≥ (1/2) log₂(2πeσ²) − h(Z − Ẑ)
  ≥ (1/2) log₂(2πeσ²) − (1/2) log₂(2πe · Var[Z − Ẑ])
  ≥ (1/2) log₂(2πeσ²) − (1/2) log₂(2πe · E[(Z − Ẑ)²])
  ≥ (1/2) log₂(2πeσ²) − (1/2) log₂(2πeD)
  = (1/2) log₂(σ²/D).
6.4.2 Rate distortion func / the squared error dist I: 6-47
• Similarly, for a continuous memoryless source {Z_i} with a pdf of support R and finite differential entropy, its rate-distortion function under the additive squared error distortion measure satisfies

R_G(D) − D(Z‖Z_G) ≤ R(D) ≤ R_G(D),

where R_G(D) = (1/2) log₂(σ²/D) is the rate-distortion function of a Gaussian source with the same variance, and the left-hand side is the Shannon lower bound on the rate-distortion function.
Section 6.4.3 is based on a similar idea but targets the absolute error distortion; hence, we omit it in our lecture. Notably, a correction has been provided for Theorem 6.29 (see the errata for the textbook).
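As an illustration of this sandwich bound (not in the slides), note that R_G(D) − D(Z‖Z_G) = h(Z) − (1/2) log₂(2πeD). The sketch below evaluates both bounds for a zero-mean uniform source, for which h(Z) and σ² are known in closed form; the uniform choice is an assumption for illustration.

```python
import numpy as np

a = 1.0                      # Z ~ Uniform(-a, a)
sigma2 = a**2 / 3            # variance
h_Z = np.log2(2 * a)         # differential entropy in bits

for D in (0.01, 0.1, sigma2):
    upper = 0.5 * np.log2(sigma2 / D)                      # R_G(D)
    lower = h_Z - 0.5 * np.log2(2 * np.pi * np.e * D)      # Shannon lower bound
    print(D, round(lower, 3), round(upper, 3))             # lower <= R(D) <= upper
```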
6.5 Lossy joint source-channel coding theorem I: 6-49
[Block diagram: source Z^m ∈ Z^m → Encoder f^(sc) → X^n → Channel → Y^n → Decoder g^(sc) → Ẑ^m ∈ Ẑ^m.]

Given an additive distortion measure ρ_m = Σ_{i=1}^{m} ρ(z_i, ẑ_i), where ρ is a distortion function on Z × Ẑ, we say that the m-to-n lossy source-channel block code (f^(sc), g^(sc)) satisfies the average distortion fidelity criterion D, where D ≥ 0, if

(1/m) E[ρ_m(Z^m, Ẑ^m)] ≤ D.
6.5 Lossy joint source-channel coding theorem I: 6-50
• Converse part: On the other hand, for any sequence of m-to-n_m lossy source-channel codes (f^(sc), g^(sc)) satisfying the average distortion fidelity criterion D, we have

(m/n_m) · R(D) ≤ C.
6.5 Lossy joint source-channel coding theorem I: 6-51
[Figure: the rate-distortion curve R(D) versus D; the Shannon limit D_SL is the distortion at which R(D) meets the level (1/R_sc) · C.]
6.6 Shannon limit of communication systems I: 6-54
• Thus, the resulting channel crossover probability is

Q(√(2 R_sc γ_b)).   (6.6.6)
• The minimal γ_b (in dB) for a given P_b = D < 1/2 and a source-channel code rate R_sc < 1:

γ_{b,SL} = (1/(2 R_sc)) [ Q^{−1}(SL) ]².

• For R_sc = 1,

SL := h_b^{−1}( 1 − R_sc (1 − h_b(D)) ) = D = P_b

and

γ_{b,SL} = (1/2) [ Q^{−1}(P_b) ]².
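A small Python sketch (not in the slides; scipy is assumed available) evaluating γ_{b,SL} from the formulas above. These values correspond to the Q-based (hard-decision) model of this slide and therefore differ from the binary-input AWGN limits quoted with the figure later in this section.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def h_b(x):
    """Binary entropy function (bits)."""
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def h_b_inv(y):
    """Inverse of h_b restricted to [0, 1/2]."""
    return brentq(lambda x: h_b(x) - y, 1e-12, 0.5)

def shannon_limit_db(Pb, Rsc):
    """gamma_{b,SL} in dB: SL = h_b^{-1}(1 - Rsc(1 - h_b(Pb))), gamma = [Q^{-1}(SL)]^2 / (2 Rsc)."""
    SL = h_b_inv(1 - Rsc * (1 - h_b(Pb)))
    return 10 * np.log10(norm.isf(SL) ** 2 / (2 * Rsc))    # norm.isf is Q^{-1}

print(shannon_limit_db(Pb=1e-5, Rsc=1.0))   # ~9.6 dB: the uncoded BPSK benchmark at Pb = 1e-5
print(shannon_limit_db(Pb=1e-5, Rsc=0.5))   # limit under this hard-decision model
```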
6.6 Shannon limit of communication systems I: 6-57
• The Shannon limit D_SL for this system with rate R_sc is obtained via

D_SL := min{ D : R(D) ≤ (1/R_sc) C(P) }
  = min{ D : (1/2) log₂(σ²/D) ≤ (1/(2 R_sc)) log₂(1 + P/σ_N²) }
  = σ² / (1 + P/σ_N²)^{1/R_sc}.   (6.6.10)
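For instance, (6.6.10) can be evaluated directly (the numbers below are illustrative assumptions):

```python
def D_SL(sigma2, snr, Rsc):
    """Distortion Shannon limit (6.6.10) for a zero-mean Gaussian source over an AWGN channel."""
    return sigma2 / (1.0 + snr) ** (1.0 / Rsc)

print(D_SL(sigma2=1.0, snr=1.0, Rsc=0.5))   # 1 / (1 + 1)^2 = 0.25
```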
[Figure: P_b versus γ_b (dB) for R_sc = 1/2 and R_sc = 1/3. The Shannon limits for the (2, 1) and (3, 1) codes under the binary-input AWGN channel are approximately 0.19 dB and −0.495 dB, respectively.]
• The Shannon limits calculated above are practically relevant due to the invention of near-capacity-achieving channel codes, such as Turbo and LDPC codes.
• For example, the rate-1/2 Turbo coding system proposed in 1993 can approach a bit error rate of 10^{−5} at γ_b = 0.9 dB, which is only 0.714 dB away from the Shannon limit of 0.186 dB.
6.6 Shannon limit of communication systems I: 6-62
• Why lossy data compression (e.g., to transmit a source with entropy larger
than capacity)
• Distortion measure
• Lossy data compression codes
• Rate-distortion function
• Distortion typical set
• AEP for distortion measure
• Rate distortion theorem
Key Notes I: 6-65
Terminology
• Shannon’s source coding theorem → Shannon’s first coding theorem;
• Shannon’s channel coding theorem → Shannon’s second coding theorem;
• Rate distortion theorem → Shannon’s third coding theorem.
• Information transmission theorem → Joint source-channel coding theorem
– Shannon limit (BER versus SNR_b)