
Chapter 6

Lossy Data Compression and Transmission

Po-Ning Chen, Professor

Institute of Communications Engineering

National Chiao Tung University

Hsin Chu, Taiwan 30010, R.O.C.


6.1.1 Motivation I: 6-1

Motivations
• Lossy data compression = to compress a source to a rate less than the source
entropy.
[Diagram: a source with entropy rate rH > C fed directly into a channel of
capacity C produces output with unmanageable error; inserting a lossy data
compressor (which introduces error E) lowers the rate to rH' < C, so the
channel then delivers output with manageable error E.]

Example for the application of lossy data compression

6.1.2 Distortion measures I: 6-2

Definition 6.4 (Distortion measure) A distortion measure is a mapping


      ρ : Z × Ẑ → ℝ⁺,

where Z is the source alphabet, Ẑ is the reproduction alphabet for the compressed
code, and ℝ⁺ is the set of non-negative real numbers.

• The distortion measure ρ(z, ẑ) can be viewed as the cost of representing the
source symbol z ∈ Z by a reproduction symbol ẑ ∈ Ẑ.
E.g. A lossy data compression is similar to “grouping.”

[Diagram: the four source symbols {1, 2, 3, 4} are split into two groups,
{1, 2} with representative 1 and {3, 4} with representative 3.]

The distortion measure is given by

                      [ 0 1 2 2 ]
      [ρ(i, j)] :=    [ 1 0 2 2 ] .
                      [ 2 2 0 1 ]
                      [ 2 2 1 0 ]
6.1.2 Distortion measures I: 6-3

– Average distortion under uniform source distribution:

      (1/4)ρ(1, 1) + (1/4)ρ(2, 1) + (1/4)ρ(3, 3) + (1/4)ρ(4, 3) = 1/2.

– Resultant entropy:

      H(Z) = log₂(4) = 2 bits  ⇒  H(Ẑ) = log₂(2) = 1 bit.
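A quick numerical check of this example can be made in a few lines; the sketch below
(Python; the dictionary layout and variable names are ours) recomputes the average
distortion of the grouping and the two entropies.

```python
import math

# Distortion matrix rho(i, j), i, j in {1, 2, 3, 4}, as given on this slide.
rho = {(i, j): v
       for i, row in enumerate([[0, 1, 2, 2],
                                [1, 0, 2, 2],
                                [2, 2, 0, 1],
                                [2, 2, 1, 0]], start=1)
       for j, v in enumerate(row, start=1)}

# Grouping used above: symbols {1, 2} -> representative 1, symbols {3, 4} -> representative 3.
representative = {1: 1, 2: 1, 3: 3, 4: 3}

# Average distortion under a uniform source distribution.
avg = sum(0.25 * rho[(z, representative[z])] for z in (1, 2, 3, 4))
print(avg)            # 0.5
print(math.log2(4))   # H(Z) = 2 bits
print(math.log2(2))   # H(Z_hat) = 1 bit (two equally likely representatives)
```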
6.1.2 Distortion measures I: 6-4

• The above example presumes |Ẑ| = |Z|.


• Sometimes, it is convenient to have |Ẑ| = |Z| + 1.
E.g. |Z = {1, 2, 3}| = 3 and |Ẑ = {1, 2, 3, E}| = 4
and the distortion measure is defined by

                      [ 0 2 2 0.5 ]
      [ρ(i, j)] :=    [ 2 0 2 0.5 ] .
                      [ 2 2 0 0.5 ]
– Suppose only two outcomes are allowed under uniform Z.
Then
(1) → 1 and (2, 3) → E
is an optimal choice (that minimizes the average distortion measure for a
given compression rate).
– Average distortion:

      (1/3)ρ(1, 1) + (1/3)ρ(2, E) + (1/3)ρ(3, E) = 1/3.

– Resultant entropy:

      H(Z) = log₂(3) bits  ⇒  H(Ẑ) = [log₂(3) − 2/3] bits.
6.1.3 Frequently used distortion measures I: 6-5

Example 6.5 (Hamming distortion measure) Let the source alphabet and re-
production alphabet be the same, i.e., Z = Ẑ. Then the Hamming distortion
measure is given by

      ρ(z, ẑ) := { 0, if z = ẑ;
                   1, if z ≠ ẑ.

This is also named the probability-of-error distortion measure because
E[ρ(Z, Ẑ)] = Pr(Z ≠ Ẑ).

Example 6.6 (Absolute error distortion measure) Assuming that Z =


Ẑ = R, the absolute error distortion measure is given by
ρ(z, ẑ) := |z − ẑ|.

Example 6.7 (Squared error distortion) Assuming that Z = Ẑ = R, the


squared error distortion measure is given by
ρ(z, ẑ) := (z − ẑ)2.
The squared error distortion measure is perhaps the most popular distortion mea-
sure used for continuous alphabets.
Comments on squared error distortion I: 6-6

• The squared error distortion measure has the advantages of simplicity and
having a closed-form solution for most cases of interest, such as when using
least squares prediction.
• Yet, this measure is not ideal for practical situations involving data operated
by human observers (such as image and speech data) as it is inadequate in
measuring perceptual quality.
• For example, two speech waveforms in which one is a marginally time-shifted
version of the other may have large square error distortion; however, they sound
quite similar to the human ear.
Distortion measure for sequences I: 6-7

Definition 6.8 (Additive distortion measure between vectors) The ad-
ditive distortion measure ρn between vectors z^n and ẑ^n of size n (or n-sequences or
n-tuples) is defined by

      ρn(z^n, ẑ^n) = Σ_{i=1}^{n} ρ(zi, ẑi).

Definition 6.9 (Maximum distortion measure)

      ρn(z^n, ẑ^n) = max_{1≤i≤n} ρ(zi, ẑi).
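Both sequence-level measures are easy to code directly from the definitions; the
sketch below (Python; the Hamming single-letter distortion is chosen only for
illustration) implements Definitions 6.8 and 6.9.

```python
def hamming(z, z_hat):
    """Single-letter Hamming distortion (Example 6.5)."""
    return 0 if z == z_hat else 1

def additive_distortion(zn, zn_hat, rho=hamming):
    """Definition 6.8: rho_n(z^n, zhat^n) = sum_i rho(z_i, zhat_i)."""
    return sum(rho(z, zh) for z, zh in zip(zn, zn_hat))

def maximum_distortion(zn, zn_hat, rho=hamming):
    """Definition 6.9: rho_n(z^n, zhat^n) = max_i rho(z_i, zhat_i)."""
    return max(rho(z, zh) for z, zh in zip(zn, zn_hat))

zn, zn_hat = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
print(additive_distortion(zn, zn_hat))  # 1 (one position differs)
print(maximum_distortion(zn, zn_hat))   # 1
```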

Question raised due to distortion measures for sequences


• Must a source sequence z^n be reproduced by a sequence ẑ^n of the same length?
• In other words, can we use z̃^k to represent z^n for k ≠ n?
Answer: The answer is certainly yes, provided a distortion measure between z^n and z̃^k is
defined.
Distortion measure for sequences I: 6-8

Problem: A drawback of taking k ≠ n is that the distortion measure for sequences
can no longer be defined based on per-letter distortions, and hence a per-letter
formula for the best lossy data compression rate cannot be derived.
Solution: To view the lossy data compression in two steps.
Step 1 : Find the data compression code
h : Z n → Ẑ n
for which the pre-specified distortion constraint and rate constraint are
both satisfied.
Step 2 : Derive the (asymptotically) lossless data compression block code for
source h(Z n). The existence of such code with block length
k > H(h(Z n )) bits
is guaranteed by Shannon’s lossless source coding theorem.

• Therefore, a lossy data compression code from

Zn → Ẑ n → {0, 1}k

is established.
Distortion measure for sequences I: 6-9

• Since the second step is already discussed in lossless data compression, we can
say that the theorem regarding the lossy data compression is basically a theorem
on the first step.
6.2 Fixed-length lossy data compression I: 6-10

Definition 6.10 (Fixed-length lossy data compression code subject


to average distortion constraint) An (n, M, D) fixed-length lossy data com-
pression code for source alphabet Z n and reproduction alphabet Ẑ n consists of a
compression function
h : Z n → Ẑ n
with the size of the codebook (i.e., the image h(Z n)) being |h(Z n)| = M , and the
average distortion satisfying
      (1/n) E[ρn(Z^n, h(Z^n))] ≤ D.

• Code rate for lossy data compression:

      (1/n) log₂ M bits/sourceword

• Asymptotic code rate for lossy data compression:

      limsup_{n→∞} (1/n) log₂ M bits/sourceword
Achievable Rate-Distortion Pair I: 6-11

Definition 6.11 (Achievable rate-distortion pair) For a given sequence of


distortion measures {ρn }n≥1, a rate distortion pair (R, D) is achievable if there
exists a sequence of fixed-length lossy data compression codes (n, Mn, D) with
ultimate code rate
      limsup_{n→∞} (1/n) log₂ Mn ≤ R.

Definition 6.12 (Rate-distortion region) The rate-distortion region R of a


source {Zn} is the closure of the set of all achievable rate-distortion pairs (R, D).
Achievable Rate-Distortion Pair I: 6-12

Lemma 6.13 (Time-sharing principle) Under an additive distortion mea-


sure ρn , the rate-distortion region R is a convex set; i.e., if (R1, D1) ∈ R and
(R2, D2) ∈ R, then (λR1 + (1 − λ)R2, λD1 + (1 − λ)D2) ∈ R for all 0 ≤ λ ≤ 1.

Proof:
• time-sharing argument:
– If we can use an (n, M1, D1) code ∼C1 to achieve (R1, D1) and an (n, M2, D2)
code ∼C2 to achieve (R2, D2), then for any rational number 0 < λ < 1, we
can use ∼C1 for a fraction λ of the time and use ∼C2 for a fraction 1 − λ
of the time to achieve (Rλ, Dλ), where Rλ = λR1 + (1 − λ)R2 and Dλ =
λD1 + (1 − λ)D2;
– hence the result holds for any real number 0 < λ < 1 by the density of the
rational numbers in R and the continuity of Rλ and Dλ in λ.
• Let r and s be positive integers and let λ = r/(r + s); then 0 < λ < 1.
Achievable Rate-Distortion Pair I: 6-13

• Assume that the pairs (R1, D1) and (R2, D2) are achievable. Then there exist
a sequence of (n, M1, D1) codes ∼C1 and a sequence of (n, M2, D2) codes ∼C2
such that for n sufficiently large,
      (1/n) log₂ M1 ≤ R1    and    (1/n) log₂ M2 ≤ R2.
• Construct a sequence of new codes ∼C of blocklength nλ = (r + s)n, codebook
size M = M1^r × M2^s and compression function h : Z^{(r+s)n} → Ẑ^{(r+s)n} such that

      h(z^{(r+s)n}) = (h1(z1^n), . . . , h1(zr^n), h2(z_{r+1}^n), . . . , h2(z_{r+s}^n)),

where

      z^{(r+s)n} = (z1^n, . . . , zr^n, z_{r+1}^n, . . . , z_{r+s}^n)

and h1 and h2 are the compression functions of ∼C1 and ∼C2, respectively.
Achievable Rate-Distortion Pair I: 6-14

• The average (or expected) distortion under the additive distortion measure ρn
and the rate of code ∼C are given by

      E[ρ_{(r+s)n}(Z^{(r+s)n}, h(Z^{(r+s)n}))] / ((r + s)n)
          = (1/(r + s)) { E[ρn(Z1^n, h1(Z1^n))]/n + · · · + E[ρn(Zr^n, h1(Zr^n))]/n
                          + E[ρn(Z_{r+1}^n, h2(Z_{r+1}^n))]/n + · · · + E[ρn(Z_{r+s}^n, h2(Z_{r+s}^n))]/n }
          ≤ (1/(r + s)) (r D1 + s D2)
          = λD1 + (1 − λ)D2 = Dλ

and

      (1/((r + s)n)) log₂ M = (1/((r + s)n)) log₂(M1^r × M2^s)
          = (r/(r + s)) (1/n) log₂ M1 + (s/(r + s)) (1/n) log₂ M2
          ≤ λR1 + (1 − λ)R2 = Rλ,

respectively, for n sufficiently large. Thus, (Rλ, Dλ) is achievable by ∼C.
Achievable Rate-Distortion Pair I: 6-15

Definition 6.14 (Rate-distortion function) The rate-distortion function, de-


noted by R(D), of source {Zn} is the smallest R̂ for a given distortion threshold D
such that (R̂, D) is an achievable rate-distortion pair; i.e.,
R(D):= inf{R̂ ≥ 0 : (R̂, D) ∈ R}.

Observation 6.15 (Monotonicity and convexity of R(D)) Note that,


under an additive distortion measure ρn , the rate-distortion function R(D) is non-
increasing and convex in D (the proof is left as an exercise).
6.3 Rate-distortion theorem I: 6-16

Definition 6.16 (Distortion typical set) The distortion δ-typical set with
respect to the memoryless (product) distribution PZ,Ẑ on Z n × Ẑ n and a bounded
additive distortion measure ρn (·, ·) is defined by

      Dn(δ) := { (z^n, ẑ^n) ∈ Z^n × Ẑ^n :
                 |−(1/n) log₂ PZ^n(z^n) − H(Z)| < δ,
                 |−(1/n) log₂ PẐ^n(ẑ^n) − H(Ẑ)| < δ,
                 |−(1/n) log₂ PZ^n,Ẑ^n(z^n, ẑ^n) − H(Z, Ẑ)| < δ,
          and    |(1/n) ρn(z^n, ẑ^n) − E[ρ(Z, Ẑ)]| < δ }.
AEP for distortion typical set I: 6-17

Theorem 6.17 If (Z1, Ẑ1), (Z2, Ẑ2), . . ., (Zn, Ẑn), . . . are i.i.d., and ρn is a
bounded additive distortion measure, then

      −(1/n) log₂ PZ^n(Z1, Z2, . . . , Zn) → H(Z) in probability;

      −(1/n) log₂ PẐ^n(Ẑ1, Ẑ2, . . . , Ẑn) → H(Ẑ) in probability;

      −(1/n) log₂ PZ^n,Ẑ^n((Z1, Ẑ1), . . . , (Zn, Ẑn)) → H(Z, Ẑ) in probability;

and

      (1/n) ρn(Z^n, Ẑ^n) → E[ρ(Z, Ẑ)] in probability.
Proof: Functions of independent random variables are also independent random
variables. Thus by the weak law of large numbers, we have the desired result. 2

• It needs to be pointed out that without the bounded property assumption, the
normalized sum of an i.i.d. sequence does not necessarily converge in probability
to a finite mean, hence the need for requiring that ρ be bounded.
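The last convergence statement is just the weak law of large numbers applied to the
i.i.d. per-letter distortions, and it can be visualized with a short Monte Carlo run;
the sketch below (Python; the joint distribution and the Hamming distortion are
illustrative choices, not from the slides) shows (1/n)ρn(Z^n, Ẑ^n) settling near
E[ρ(Z, Ẑ)] = 0.3 as n grows.

```python
import random

# Illustrative joint distribution P_{Z, Zhat} on {0,1} x {0,1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
pairs, probs = zip(*joint.items())
rho = lambda z, zh: 0 if z == zh else 1                          # Hamming distortion
expected = sum(p * rho(z, zh) for (z, zh), p in joint.items())   # E[rho(Z, Zhat)] = 0.3

random.seed(0)
for n in (10, 100, 1000, 10000):
    sample = random.choices(pairs, weights=probs, k=n)   # i.i.d. draws of (Z_i, Zhat_i)
    avg = sum(rho(z, zh) for z, zh in sample) / n        # (1/n) rho_n(Z^n, Zhat^n)
    print(n, round(avg, 4), "vs", expected)
```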
AEP for distortion typical set I: 6-18

Theorem 6.18 (AEP for distortion measure) Given a DMS {(Zn, Ẑn)}
with generic joint distribution PZ,Ẑ and any δ > 0, the distortion δ-typical set
satisfies
1. PZ^n,Ẑ^n(Dn^c(δ)) < δ for n sufficiently large.
2. For all (z^n, ẑ^n) in Dn(δ),

      PẐ^n(ẑ^n) ≥ PẐ^n|Z^n(ẑ^n|z^n) 2^{−n[I(Z;Ẑ)+3δ]}.                  (6.3.1)

Proof: The first result follows directly from Theorem 6.17 and the definition of
the distortion typical set Dn(δ). The second result can be proved as follows:
      PẐ^n|Z^n(ẑ^n|z^n) = PZ^n,Ẑ^n(z^n, ẑ^n) / PZ^n(z^n)

                        = PẐ^n(ẑ^n) · PZ^n,Ẑ^n(z^n, ẑ^n) / [PZ^n(z^n) PẐ^n(ẑ^n)]

                        ≤ PẐ^n(ẑ^n) · 2^{−n[H(Z,Ẑ)−δ]} / (2^{−n[H(Z)+δ]} 2^{−n[H(Ẑ)+δ]})

                        = PẐ^n(ẑ^n) 2^{n[I(Z;Ẑ)+3δ]},

where the inequality follows from the definition of Dn(δ).
AEP for distortion typical set I: 6-19

• Alternative form of (6.3.1):

      PZ^n,Ẑ^n(z^n, ẑ^n) / [PZ^n(z^n) PẐ^n(ẑ^n)] ≤ 2^{n[I(Z;Ẑ)+3δ]}   for all (z^n, ẑ^n) ∈ Dn(δ).

Lemma 6.19 For 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and n > 0,

      (1 − xy)^n ≤ 1 − x + e^{−yn},                                    (6.3.2)

with equality if, and only if, (x, y) = (1, 0).
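Lemma 6.19 is used in Step 4 of the proof below; a brute-force numerical check over a
grid (a sanity check, not a proof; the grid resolution is our choice) can be written as
follows.

```python
import math

def gap(x, y, n):
    """Right side minus left side of (6.3.2); it should never be negative."""
    return (1 - x + math.exp(-y * n)) - (1 - x * y) ** n

worst = min(gap(i / 50, j / 50, n)
            for i in range(51) for j in range(51) for n in (1, 2, 5, 10, 50))
print(worst >= 0)        # True on this grid
print(gap(1.0, 0.0, 7))  # 0.0: the equality case (x, y) = (1, 0)
```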
Shannon’s lossy source coding theorem I: 6-20

Theorem 6.20 (Shannon’s rate-distortion theorem for memoryless
sources) Consider a DMS {Zn}_{n=1}^∞ with alphabet Z, reproduction alphabet Ẑ
and a bounded additive distortion measure ρn(·, ·); i.e.,

      ρn(z^n, ẑ^n) = Σ_{i=1}^{n} ρ(zi, ẑi)   and   ρmax := max_{(z,ẑ)∈Z×Ẑ} ρ(z, ẑ) < ∞,

where ρ(·, ·) is a given single-letter distortion measure. Then the source’s rate-
distortion function satisfies the following expression:

      R(D) = min_{PẐ|Z: E[ρ(Z,Ẑ)]≤D} I(Z; Ẑ).

Proof: Define

      R(I)(D) := min_{PẐ|Z: E[ρ(Z,Ẑ)]≤D} I(Z; Ẑ);                      (6.3.3)

this quantity is typically called Shannon’s information rate-distortion function.


We will then show that the (operational) rate-distortion function R(D) given in
Definition 6.14 equals R(I)(D).
Shannon’s lossy source coding theorem I: 6-21

1. Achievability Part (i.e., R(D + ε) ≤ R(I)(D) + 4ε for arbitrarily small ε > 0):
We need to show that for any ε > 0, there exist 0 < γ < 4ε and a sequence of
lossy data compression codes {(n, Mn, D + ε)}_{n=1}^∞ with

      limsup_{n→∞} (1/n) log₂ Mn ≤ R(I)(D) + γ < R(I)(D) + 4ε.

Step 1: Optimizing conditional distribution. Let PZ̃|Z be the conditional
distribution that achieves R(I)(D), i.e.,

      R(I)(D) = min_{PẐ|Z: E[ρ(Z,Ẑ)]≤D} I(Z; Ẑ) = I(Z; Z̃).

Then

      E[ρ(Z, Z̃)] ≤ D.

Choose Mn to satisfy

      R(I)(D) + γ/2 ≤ (1/n) log₂ Mn ≤ R(I)(D) + γ

for some γ in (0, 4ε); such a choice exists for all sufficiently large n > N0
for some N0. Define

      δ := min{ γ/8, ε/(1 + 2ρmax) },

where the first term is required in Step 4 and the second in Step 5.
Shannon’s lossy source coding theorem I: 6-22

Step 2: Random coding. Independently select Mn words from Ẑ^n according
to

      PZ̃^n(z̃^n) = Π_{i=1}^{n} PZ̃(z̃i),

and denote this random codebook by ∼Cn, where

      PZ̃(z̃) = Σ_{z∈Z} PZ(z) PZ̃|Z(z̃|z).
Shannon’s lossy source coding theorem I: 6-23

Step 3: Encoding rule. Define a subset of Z^n as

      J(∼Cn) := { z^n ∈ Z^n : ∃ z̃^n ∈ ∼Cn such that (z^n, z̃^n) ∈ Dn(δ) },

where Dn(δ) is defined under PZ̃|Z. Based on the codebook

      ∼Cn = {c1, c2, . . . , cMn},

define the encoding rule as:

      hn(z^n) = { cm,               if (z^n, cm) ∈ Dn(δ)
                                    (when more than one cm satisfies the requirement, just pick any);
                  any word in ∼Cn,  otherwise.

Note that when z^n ∈ J(∼Cn), we have (z^n, hn(z^n)) ∈ Dn(δ) and

      (1/n) ρn(z^n, hn(z^n)) ≤ E[ρ(Z, Z̃)] + δ ≤ D + δ.
Shannon’s lossy source coding theorem I: 6-24

Step 4: Calculation of the probability of the complement of J(∼Cn).
Let N1 be chosen such that for n > N1,

      PZ^n,Z̃^n(Dn^c(δ)) < δ.

Let

      Ω := PZ^n(J^c(∼Cn)).

Then the expected probability of source n-tuples not belonging to J(∼Cn),
averaged over all randomly generated codebooks, is given by

      E[Ω] = Σ_{∼Cn} PZ̃^n(∼Cn) Σ_{z^n ∉ J(∼Cn)} PZ^n(z^n)

           = Σ_{z^n ∈ Z^n} PZ^n(z^n) Σ_{∼Cn: z^n ∉ J(∼Cn)} PZ̃^n(∼Cn).
Shannon’s lossy source coding theorem I: 6-25

For any given z^n, selecting a codebook ∼Cn satisfying z^n ∉ J(∼Cn) is equivalent
to independently drawing Mn n-tuples from Ẑ^n, none of which is jointly distortion
typical with z^n. Hence,

      Σ_{∼Cn: z^n ∉ J(∼Cn)} PZ̃^n(∼Cn) = [ Pr{(z^n, Z̃^n) ∉ Dn(δ)} ]^{Mn}.

For convenience, we let K(z^n, z̃^n) denote the indicator function of Dn(δ), i.e.,

      K(z^n, z̃^n) = { 1, if (z^n, z̃^n) ∈ Dn(δ);
                      0, otherwise.

Then

      Σ_{∼Cn: z^n ∉ J(∼Cn)} PZ̃^n(∼Cn) = [ 1 − Σ_{z̃^n ∈ Ẑ^n} PZ̃^n(z̃^n) K(z^n, z̃^n) ]^{Mn}.
Shannon’s lossy source coding theorem I: 6-26

Continuing the computation of E[Ω], we get

      E[Ω] = Σ_{z^n∈Z^n} PZ^n(z^n) [ 1 − Σ_{z̃^n∈Ẑ^n} PZ̃^n(z̃^n) K(z^n, z̃^n) ]^{Mn}

           ≤ Σ_{z^n∈Z^n} PZ^n(z^n) [ 1 − Σ_{z̃^n∈Ẑ^n} PZ̃^n|Z^n(z̃^n|z^n) 2^{−n(I(Z;Z̃)+3δ)} K(z^n, z̃^n) ]^{Mn}      (by (6.3.1))

           = Σ_{z^n∈Z^n} PZ^n(z^n) [ 1 − 2^{−n(I(Z;Z̃)+3δ)} Σ_{z̃^n∈Ẑ^n} PZ̃^n|Z^n(z̃^n|z^n) K(z^n, z̃^n) ]^{Mn}

           ≤ Σ_{z^n∈Z^n} PZ^n(z^n) [ 1 − Σ_{z̃^n∈Ẑ^n} PZ̃^n|Z^n(z̃^n|z^n) K(z^n, z̃^n) + exp{−Mn · 2^{−n(I(Z;Z̃)+3δ)}} ]      (from (6.3.2))

           ≤ Σ_{z^n∈Z^n} PZ^n(z^n) [ 1 − Σ_{z̃^n∈Ẑ^n} PZ̃^n|Z^n(z̃^n|z^n) K(z^n, z̃^n) ]
               + exp{−2^{n(R(I)(D)+γ/2)} · 2^{−n(I(Z;Z̃)+3δ)}}      (for R(I)(D) + γ/2 ≤ (1/n) log₂ Mn)

           ≤ [ 1 − PZ^n,Z̃^n(Dn(δ)) ] + exp{−2^{nδ}}      (for R(I)(D) = I(Z; Z̃) and δ ≤ γ/8)

           = PZ^n,Z̃^n(Dn^c(δ)) + exp{−2^{nδ}}

           ≤ δ + δ = 2δ

for all n > N := max{ N0, N1, (1/δ) log₂ ln(1/min{δ, 1}) }.

Since E[Ω] = E[PZ^n(J^c(∼Cn))] ≤ 2δ, there must exist a codebook ∼C∗n such
that PZ^n(J^c(∼C∗n)) is no greater than 2δ for n sufficiently large.
Shannon’s lossy source coding theorem I: 6-27

Step 5: Calculation of distortion. The distortion of the optimal codebook
∼C∗n (from the previous step) satisfies, for n > N:

      (1/n) E[ρn(Z^n, hn(Z^n))]
          = Σ_{z^n ∈ J(∼C∗n)} PZ^n(z^n) (1/n) ρn(z^n, hn(z^n))
              + Σ_{z^n ∉ J(∼C∗n)} PZ^n(z^n) (1/n) ρn(z^n, hn(z^n))
          ≤ Σ_{z^n ∈ J(∼C∗n)} PZ^n(z^n) (D + δ) + Σ_{z^n ∉ J(∼C∗n)} PZ^n(z^n) ρmax
          ≤ (D + δ) + 2δ · ρmax
          ≤ D + δ(1 + 2ρmax)
          ≤ D + ε.

This concludes the proof of the achievability part.


Shannon’s lossy source coding theorem I: 6-28

2. Converse Part (i.e., R(D + ε) ≥ R(I)(D) for arbitrarily small ε > 0 and
any D ∈ {D ≥ 0 : R(I)(D) > 0}): We need to show that for any sequence of
{(n, Mn, Dn)}_{n=1}^∞ codes with

      limsup_{n→∞} (1/n) log₂ Mn < R(I)(D),

there exists ε > 0 such that

      Dn = (1/n) E[ρn(Z^n, hn(Z^n))] > D + ε

for n sufficiently large. The proof is as follows.
Step 1: Convexity of mutual information. By the convexity of mutual in-
formation I(Z; Ẑ) with respect to PẐ|Z for a fixed PZ , we have

I(Z; Ẑλ) ≤ λ · I(Z; Ẑ1) + (1 − λ) · I(Z; Ẑ2 ),


where λ ∈ [0, 1], and
PẐλ|Z (ẑ|z) := λPẐ1|Z (ẑ|z) + (1 − λ)PẐ2|Z (ẑ|z).
Shannon’s lossy source coding theorem I: 6-29

Step 2: Convexity of R(I)(D). Let PẐ1|Z and PẐ2|Z be two distributions


achieving R(I)(D1) and R(I)(D2), respectively. Since

      E[ρ(Z, Ẑλ)] = Σ_{z∈Z} PZ(z) Σ_{ẑ∈Ẑ} PẐλ|Z(ẑ|z) ρ(z, ẑ)
                   = Σ_{z∈Z, ẑ∈Ẑ} PZ(z) [ λ PẐ1|Z(ẑ|z) + (1 − λ) PẐ2|Z(ẑ|z) ] ρ(z, ẑ)
                   = λD1 + (1 − λ)D2,

we have

      R(I)(λD1 + (1 − λ)D2) ≤ I(Z; Ẑλ)
                             ≤ λ I(Z; Ẑ1) + (1 − λ) I(Z; Ẑ2)
                             = λ R(I)(D1) + (1 − λ) R(I)(D2).

Therefore, R(I)(D) is a convex function.
Shannon’s lossy source coding theorem I: 6-30

Step 3: Strictly decreasing and continuity properties of R(I)(D).


By definition, R(I)(D) is non-increasing in D. Also,

      R(I)(D) = 0  iff  D ≥ Dmax := min_{PẐ} Σ_{z∈Z} Σ_{ẑ∈Ẑ} PZ(z) PẐ(ẑ) ρ(z, ẑ)

                                  = min_{PẐ} Σ_{ẑ∈Ẑ} PẐ(ẑ) Σ_{z∈Z} PZ(z) ρ(z, ẑ)

                                  = min_{ẑ∈Ẑ} Σ_{z∈Z} PZ(z) ρ(z, ẑ),                    (6.3.4)

which is finite by the boundedness of the distortion measure. Thus, since
R(I)(D) is non-increasing and convex, it directly follows that it is strictly de-
creasing and continuous over {D ≥ 0 : R(I)(D) > 0}.
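Equation (6.3.4) says that Dmax is attained by always outputting the single best
reproduction symbol; a minimal sketch (Python; the binary-Hamming example is our
choice and anticipates Theorem 6.23, where Dmax = min{p, 1 − p}) is:

```python
def d_max(p_z, rho, z_alphabet, zhat_alphabet):
    """D_max = min over zhat of sum_z P_Z(z) * rho(z, zhat)  -- Eq. (6.3.4)."""
    return min(sum(p_z[z] * rho[(z, zh)] for z in z_alphabet)
               for zh in zhat_alphabet)

# Binary source with Hamming distortion: D_max = min{p, 1 - p}.
p = 0.3
p_z = {0: 1 - p, 1: p}
rho = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(d_max(p_z, rho, [0, 1], [0, 1]))   # 0.3
```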
Shannon’s lossy source coding theorem I: 6-31

Step 4: Main proof.

      log₂ Mn ≥ H(hn(Z^n))
              = H(hn(Z^n)) − H(hn(Z^n)|Z^n),   since H(hn(Z^n)|Z^n) = 0;
              = I(Z^n; hn(Z^n))
              = H(Z^n) − H(Z^n|hn(Z^n))
              = Σ_{i=1}^{n} H(Zi) − Σ_{i=1}^{n} H(Zi|hn(Z^n), Z1, . . . , Z_{i−1}),
                    by the independence of Z^n, and the chain rule for conditional entropy;
              ≥ Σ_{i=1}^{n} H(Zi) − Σ_{i=1}^{n} H(Zi|Ẑi),   where Ẑi is the ith component of hn(Z^n);
              = Σ_{i=1}^{n} I(Zi; Ẑi) ≥ Σ_{i=1}^{n} R(I)(Di),   where Di := E[ρ(Zi, Ẑi)];
              = n · (1/n) Σ_{i=1}^{n} R(I)(Di) ≥ n R(I)( (1/n) Σ_{i=1}^{n} Di ),   by convexity of R(I)(D);
              = n R(I)( (1/n) E[ρn(Z^n, hn(Z^n))] ),

where the last step follows since the distortion measure is additive.
Shannon’s lossy source coding theorem I: 6-32

Finally,

      limsup_{n→∞} (1/n) log₂ Mn < R(I)(D)

implies the existence of N and γ > 0 such that

      (1/n) log₂ Mn < R(I)(D) − γ

for all n > N. Therefore, for n > N,

      R(I)( (1/n) E[ρn(Z^n, hn(Z^n))] ) ≤ (1/n) log₂ Mn < R(I)(D) − γ,

which, together with the fact that R(I)(D) is strictly decreasing, implies that

      (1/n) E[ρn(Z^n, hn(Z^n))] > D + ε

for some ε = ε(γ) > 0 and for all n > N.
Hence, (R(I)(D), D + ε) is not achievable and the operational R(D) satisfies
R(D + ε) > R(I)(D) for arbitrarily small ε > 0.
Shannon’s lossy source coding theorem I: 6-33

3. Summary:
• For D ∈ {D ≥ 0 : R(I)(D) > 0}, the achievability and converse parts jointly
imply that
R(I)(D) + 4ε ≥ R(D + ε) ≥ R(I)(D)
for arbitrarily small ε > 0.
• These inequalities together with the continuity of R(I)(D) yield that
R(D) = R(I)(D)
for D ∈ {D ≥ 0 : R(I)(D) > 0}.
• For D ∈ {D ≥ 0 : R(I)(D) = 0}, the achievability part gives us
R(I)(D) + 4ε = 4ε ≥ R(D + ε) ≥ 0
for arbitrarily small ε > 0. This immediately implies that
R(D) = 0 (= R(I)(D)).
2
Notes I: 6-34

• The formula of the rate-distortion function obtained in the previous theorems


is also valid for the squared error distortion over the real numbers, even if it is
unbounded.

– For example, the boundedness assumption in the theorems can be replaced


with assuming that there exists a reproduction symbol ẑ0 ∈ Ẑ such that
E[ρ(Z, ẑ0 )] < ∞.
– This assumption can accommodate the squared error distortion measure
and a source with finite second moment (including continuous-alphabet
sources such as Gaussian sources).

• Here, we put the boundedness assumption just to facilitate the exposition of


the current proof.
Notes I: 6-35

• After introducing
– Shannon’s source coding theorem for block codes
– Shannon’s channel coding theorem for block codes
– Rate-distortion theorem
in the memoryless (and stationary ergodic) system setting, we briefly elucidate
the “key concepts or techniques” behind these lengthy proofs, in particular:
– The notion of a typical set
∗ The typical set construct – specifically,
· δ-typical set for source coding
· joint δ-typical set for channel coding
· distortion typical set for rate-distortion
uses a law of large numbers or AEP argument to claim the existence
of a set with very high probability; hence, the respective information
manipulation can just focus on the set with negligible performance loss.
Notes I: 6-36

– The notion of random coding


∗ The random coding technique shows that the expectation of the desired
performance over all possible information manipulation schemes (ran-
domly drawn according to some properly chosen statistics) is already
acceptably good, and hence the existence of at least one good scheme
that fulfills the desired performance index is validated.
• As a result, in situations where the above two techniques apply, a similar
theorem can often be established.
Notes I: 6-37

Theorem 6.21 (Shannon’s rate-distortion theorem for stationary er-
godic sources) Consider a stationary ergodic source {Zn}_{n=1}^∞ with alphabet Z,
reproduction alphabet Ẑ and a bounded additive distortion measure ρn(·, ·); i.e.,

      ρn(z^n, ẑ^n) = Σ_{i=1}^{n} ρ(zi, ẑi)   and   ρmax := max_{(z,ẑ)∈Z×Ẑ} ρ(z, ẑ) < ∞,

where ρ(·, ·) is a given single-letter distortion measure. Then the source’s rate-
distortion function is given by

      R(D) = R̄(I)(D),

where

      R̄(I)(D) := lim_{n→∞} Rn(I)(D)                                        (6.3.5)

is called the asymptotic information rate-distortion function, and

      Rn(I)(D) := (1/n) min_{PẐ^n|Z^n: (1/n)E[ρn(Z^n,Ẑ^n)]≤D} I(Z^n; Ẑ^n)         (6.3.6)

is the n-th order information rate-distortion function.


Notes I: 6-38

• Question: Can we extend the theorems to cases where the two arguments
fail?
• It is obvious that only when new methods (other than the above two) are
developed can the question be answered in the affirmative.
6.4 Calculation of the rate-distortion function I: 6-39

Theorem 6.23 Fix a binary DMS {Zn}_{n=1}^∞ with marginal distribution PZ(0) =
1 − PZ(1) = p, where 0 < p < 1. Then the source’s rate-distortion function under
the Hamming additive distortion measure is given by:

      R(D) = { hb(p) − hb(D), if 0 ≤ D < min{p, 1 − p};
               0,             if D ≥ min{p, 1 − p},

where hb(p) := −p · log(p) − (1 − p) · log(1 − p) is the binary entropy function.

Proof: Assume without loss of generality that p ≤ 1/2.


• We first prove the theorem under 0 ≤ D < min{p, 1 − p} = p. Observe that
for any binary random variable Ẑ,
H(Z|Ẑ) = H(Z ⊕ Ẑ|Ẑ).
Also observe that
E[ρ(Z, Ẑ)] ≤ D implies Pr{Z ⊕ Ẑ = 1} ≤ D.
6.4 Calculation of the rate-distortion function I: 6-40

Then

      I(Z; Ẑ) = H(Z) − H(Z|Ẑ)
              = hb(p) − H(Z ⊕ Ẑ|Ẑ)
              ≥ hb(p) − H(Z ⊕ Ẑ)        (conditioning never increases entropy)
              ≥ hb(p) − hb(D),

where the last inequality follows since hb(x) is increasing for x ≤ 1/2, and
Pr{Z ⊕ Ẑ = 1} ≤ D.
• Since the above derivation is true for any PẐ|Z , we have
R(D) ≥ hb(p) − hb (D).
6.4 Calculation of the rate-distortion function I: 6-41

• It remains to show that the lower bound is achievable by some PẐ|Z, or equiv-
alently, that H(Z|Ẑ) = hb(D) for some PẐ|Z.

By defining PZ|Ẑ(0|0) = PZ|Ẑ(1|1) = 1 − D, we immediately obtain H(Z|Ẑ) =
hb(D). The desired PẐ|Z can be obtained by solving

      1 = PẐ(0) + PẐ(1)
        = [PZ(0)/PZ|Ẑ(0|0)] PẐ|Z(0|0) + [PZ(0)/PZ|Ẑ(0|1)] PẐ|Z(1|0)
        = [p/(1 − D)] PẐ|Z(0|0) + [p/D] (1 − PẐ|Z(0|0))

and

      1 = PẐ(0) + PẐ(1)
        = [PZ(1)/PZ|Ẑ(1|0)] PẐ|Z(0|1) + [PZ(1)/PZ|Ẑ(1|1)] PẐ|Z(1|1)
        = [(1 − p)/D] (1 − PẐ|Z(1|1)) + [(1 − p)/(1 − D)] PẐ|Z(1|1),

which yield

      PẐ|Z(0|0) = [(1 − D)/(1 − 2D)] (1 − D/p)   and   PẐ|Z(1|1) = [(1 − D)/(1 − 2D)] (1 − D/(1 − p)).
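These closed-form expressions are easy to verify numerically; the sketch below
(Python, for the illustrative values p = 0.3 and D = 0.1, which are ours) checks that
the resulting forward channel PẐ|Z induces PZ|Ẑ(0|0) = PZ|Ẑ(1|1) = 1 − D, gives
E[ρ(Z, Ẑ)] = D, and attains I(Z; Ẑ) = hb(p) − hb(D).

```python
import math

hb = lambda x: 0.0 if x in (0.0, 1.0) else -x*math.log2(x) - (1-x)*math.log2(1-x)

p, D = 0.3, 0.1                                   # illustrative values, D < min{p, 1-p}
a = (1 - D) / (1 - 2*D) * (1 - D/p)               # P_{Zhat|Z}(0|0)
b = (1 - D) / (1 - 2*D) * (1 - D/(1 - p))         # P_{Zhat|Z}(1|1)

P_Z = {0: p, 1: 1 - p}
P_cond = {(0, 0): a, (1, 0): 1 - a, (1, 1): b, (0, 1): 1 - b}        # keyed by (zhat, z)
P_joint = {(z, zh): P_Z[z] * P_cond[(zh, z)] for z in (0, 1) for zh in (0, 1)}
P_Zhat = {zh: P_joint[(0, zh)] + P_joint[(1, zh)] for zh in (0, 1)}

# Backward channel, expected Hamming distortion, and mutual information.
back = {zh: P_joint[(zh, zh)] / P_Zhat[zh] for zh in (0, 1)}         # P_{Z|Zhat}(zh|zh)
distortion = sum(P_joint[(z, zh)] for z in (0, 1) for zh in (0, 1) if z != zh)
I = sum(P_joint[(z, zh)] * math.log2(P_joint[(z, zh)] / (P_Z[z] * P_Zhat[zh]))
        for z in (0, 1) for zh in (0, 1) if P_joint[(z, zh)] > 0)

print(back)                                    # both entries equal 1 - D = 0.9
print(round(distortion, 6))                    # 0.1 (= D)
print(round(I, 6), round(hb(p) - hb(D), 6))    # both ~0.4123
```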
6.4 Calculation of the rate-distortion function I: 6-42

• Now in the case of p ≤ D < 1 − p, we can let PẐ|Z(1|0) = PẐ|Z(1|1) = 1 to
obtain I(Z; Ẑ) = 0 and

      E[ρ(Z, Ẑ)] = Σ_{z=0}^{1} Σ_{ẑ=0}^{1} PZ(z) PẐ|Z(ẑ|z) ρ(z, ẑ) = p ≤ D.

Similarly, in the case of D ≥ 1 − p, we let PẐ|Z(0|0) = PẐ|Z(0|1) = 1 to obtain
I(Z; Ẑ) = 0 and

      E[ρ(Z, Ẑ)] = Σ_{z=0}^{1} Σ_{ẑ=0}^{1} PZ(z) PẐ|Z(ẑ|z) ρ(z, ẑ) = 1 − p ≤ D.
2

• Remark: The Hamming additive distortion measure is defined as:

      ρn(z^n, ẑ^n) = Σ_{i=1}^{n} zi ⊕ ẑi,

where “⊕” denotes modulo-two addition. In this case, ρn(z^n, ẑ^n) is exactly the
number of bit changes or bit errors after compression.
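When no closed form is available, the information rate-distortion function R(I)(D) can
be computed numerically with the Blahut–Arimoto algorithm (a standard technique, not
covered in these slides). The sketch below is a minimal, unoptimized implementation
(Python/NumPy; the function name and the choice of slope parameters are ours); each
value of the slope s ≤ 0 yields one (D, R) point, and for the binary source with
Hamming distortion the output agrees with hb(p) − hb(D) to a few decimal places.

```python
import numpy as np

def blahut_arimoto(p_z, dist, s, iters=2000):
    """Compute one (D, R) point on the rate-distortion curve.

    p_z  : source distribution, shape (|Z|,)
    dist : distortion matrix, shape (|Z|, |Zhat|)
    s    : slope parameter, s <= 0 (more negative s gives smaller D)
    """
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])      # output marginal P_{Zhat}
    for _ in range(iters):
        A = q * np.exp(s * dist)                         # unnormalized P_{Zhat|Z}
        cond = A / A.sum(axis=1, keepdims=True)          # P_{Zhat|Z}
        q = p_z @ cond                                   # updated output marginal
    D = np.sum(p_z[:, None] * cond * dist)               # E[rho(Z, Zhat)]
    R = np.sum(p_z[:, None] * cond * np.log2(cond / q))  # I(Z; Zhat) in bits
    return D, R

hb = lambda x: -x*np.log2(x) - (1-x)*np.log2(1-x)

p = 0.3                                                  # P_Z(0) = p, Hamming distortion
p_z = np.array([p, 1 - p])
dist = np.array([[0.0, 1.0], [1.0, 0.0]])
for s in (-1.0, -2.0, -4.0):
    D, R = blahut_arimoto(p_z, dist, s)
    print(round(D, 4), round(R, 4), round(hb(p) - hb(D), 4))   # last two columns match
```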
6.4.2 Rate distortion func / the squared error dist I: 6-43

Theorem 6.26 (Gaussian sources maximize the rate-distortion func-
tion) Under the additive squared error distortion measure, namely

      ρn(z^n, ẑ^n) = Σ_{i=1}^{n} (zi − ẑi)²,

the rate-distortion function for any continuous memoryless source {Zi} with a pdf
of support R, zero mean, variance σ² and finite differential entropy satisfies

      R(D) ≤ { (1/2) log₂(σ²/D), for 0 < D ≤ σ²;
               0,                for D > σ²,

with equality holding when the source is Gaussian.

Proof: By Theorem 6.20 (extended to the “unbounded” squared error distortion
measure),

      R(D) = R(I)(D) = min_{fẐ|Z: E[(Z−Ẑ)²]≤D} I(Z; Ẑ).

So for any fẐ|Z satisfying the distortion constraint,

      R(D) ≤ I(fZ, fẐ|Z).
6.4.2 Rate distortion func / the squared error dist I: 6-44

For 0 < D ≤ σ²:

• Choose a dummy Gaussian random variable W with zero mean and variance
aD, where a = 1 − D/σ², and is independent of Z. Let Ẑ = aZ + W. Then

      E[(Z − Ẑ)²] = E[(1 − a)²Z²] + E[W²] = (1 − a)²σ² + aD = D,

which satisfies the distortion constraint.

• Note that the variance of Ẑ is equal to E[a²Z²] + E[W²] = σ² − D.

• Consequently,

      R(D) ≤ I(Z; Ẑ)
           = h(Ẑ) − h(Ẑ|Z)
           = h(Ẑ) − h(W + aZ|Z)
           = h(Ẑ) − h(W|Z)
           = h(Ẑ) − h(W)                              (by the independence of W and Z)
           = h(Ẑ) − (1/2) log₂(2πe(aD))
           ≤ (1/2) log₂(2πe(σ² − D)) − (1/2) log₂(2πe(aD)) = (1/2) log₂(σ²/D).
6.4.2 Rate distortion func / the squared error dist I: 6-45

For D > σ²:
• Let Ẑ satisfy Pr{Ẑ = 0} = 1 (and be independent of Z).
• Then E[(Z − Ẑ)²] = E[Z²] + E[Ẑ²] − 2E[Z]E[Ẑ] = σ² < D, and I(Z; Ẑ) = 0.
Hence, R(D) = 0 for D > σ².

The achievability of this upper bound by a Gaussian source (with zero mean and
variance σ²) can be proved by showing that under the Gaussian source,
(1/2) log₂(σ²/D) is a lower bound to R(D) for 0 < D ≤ σ².
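The “test channel” Ẑ = aZ + W used above is easy to check by simulation; the sketch
below (Python, with illustrative values σ² = 4 and D = 1 chosen by us) estimates
E[(Z − Ẑ)²] ≈ D and Var(Ẑ) ≈ σ² − D.

```python
import random

random.seed(1)
sigma2, D, n = 4.0, 1.0, 200_000                 # illustrative values, 0 < D <= sigma^2
a = 1 - D / sigma2                               # forward gain of the test channel
z = [random.gauss(0, sigma2 ** 0.5) for _ in range(n)]
w = [random.gauss(0, (a * D) ** 0.5) for _ in range(n)]   # W ~ N(0, aD), independent of Z
zhat = [a * zi + wi for zi, wi in zip(z, w)]

mse = sum((zi - zh) ** 2 for zi, zh in zip(z, zhat)) / n
var_zhat = sum(zh ** 2 for zh in zhat) / n
print(round(mse, 3))        # ~ 1.0 (= D)
print(round(var_zhat, 3))   # ~ 3.0 (= sigma^2 - D)
```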
6.4.2 Rate distortion func / the squared error dist I: 6-46

Indeed, when the source Z is Gaussian and for any fẐ|Z such that E[(Z − Ẑ)²] ≤ D,
we have

      I(Z; Ẑ) = h(Z) − h(Z|Ẑ)
              = (1/2) log₂(2πeσ²) − h(Z − Ẑ|Ẑ)
              ≥ (1/2) log₂(2πeσ²) − h(Z − Ẑ)
              ≥ (1/2) log₂(2πeσ²) − (1/2) log₂(2πe Var[Z − Ẑ])
              ≥ (1/2) log₂(2πeσ²) − (1/2) log₂(2πe E[(Z − Ẑ)²])
              ≥ (1/2) log₂(2πeσ²) − (1/2) log₂(2πeD)
              = (1/2) log₂(σ²/D).
6.4.2 Rate distortion func / the squared error dist I: 6-47

Theorem 6.27 (Shannon lower bound on the rate-distortion func-


tion: squared error distortion) Consider a continuous memoryless source
{Zi } with a pdf of support R and finite differential entropy under the additive
squared error distortion measure. Then its rate-distortion function satisfies
      R(D) ≥ h(Z) − (1/2) log₂(2πeD).
Proof: The proof, which follows similar steps as in the achievability of the upper
bound in the proof of the previous theorem, is left as an exercise.
6.4.2 Rate distortion func / the squared error dist I: 6-48

• In Lemma 5.42, we show that for a discrete-time continuous-alphabet memo-
ryless additive-noise channel with input power constraint P and noise variance
σ², its capacity satisfies

      CG(P) + D(Z‖ZG) ≥ C(P) ≥ CG(P),

where D(Z‖ZG) = h(ZG) − h(Z) denotes the non-Gaussianness and
CG(P) = (1/2) log₂(1 + P/σ²).

• Similarly, for a continuous memoryless source {Zi} with a pdf of support R and
finite differential entropy, under the additive squared error distortion measure,
its rate-distortion function satisfies

      RG(D) − D(Z‖ZG) ≤ R(D) ≤ RG(D),

where RG(D) = (1/2) log₂(σ²/D) and RG(D) − D(Z‖ZG) is the Shannon lower
bound on the rate-distortion function.

Section 6.4.3 is based on a similar idea but targets the absolute error distor-
tion; hence, we omit it in our lecture. Notably, a correction has been provided
for Theorem 6.29 (see the errata for the textbook).
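As a concrete illustration of this sandwich (the example is ours, not from the
slides), take a zero-mean uniform source with variance σ²; its differential entropy,
and hence its non-Gaussianness, are available in closed form, so both bounds on R(D)
can be evaluated directly.

```python
import math

sigma2, D = 1.0, 0.25                       # illustrative variance and distortion level
R_G = 0.5 * math.log2(sigma2 / D)           # Gaussian rate-distortion function R_G(D)

# A zero-mean uniform source on [-sqrt(3*sigma2), +sqrt(3*sigma2)] has variance sigma2.
h_Z  = math.log2(2 * math.sqrt(3 * sigma2))            # differential entropy (bits)
h_ZG = 0.5 * math.log2(2 * math.pi * math.e * sigma2)  # Gaussian differential entropy
non_gaussianness = h_ZG - h_Z                          # D(Z || Z_G) ~ 0.255 bits

print(round(R_G - non_gaussianness, 3), "<= R(D) <=", round(R_G, 3))
# 0.746 <= R(D) <= 1.0 for sigma2 = 1, D = 0.25
```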
6.5 Lossy joint source-channel coding theorem I: 6-49

• This is also named lossy information-transmission theorem.

Definition 6.32 (Lossy source-channel block code) Given a discrete-time
source {Zi}_{i=1}^∞ with alphabet Z and reproduction alphabet Ẑ and a discrete-time
channel with input and output alphabets X and Y, respectively, an m-to-n lossy
source-channel block code with rate m/n source symbol/channel symbol is a pair of
mappings (f(sc), g(sc)), where

      f(sc) : Z^m → X^n   and   g(sc) : Y^n → Ẑ^m.

[Diagram: Z^m ∈ Z^m → Encoder f(sc) → X^n → Channel → Y^n → Decoder g(sc) → Ẑ^m ∈ Ẑ^m]

Given an additive distortion measure ρm(z^m, ẑ^m) = Σ_{i=1}^{m} ρ(zi, ẑi), where ρ is a distor-
tion function on Z × Ẑ, we say that the m-to-n lossy source-channel block code
(f(sc), g(sc)) satisfies the average distortion fidelity criterion D, where D ≥ 0, if

      (1/m) E[ρm(Z^m, Ẑ^m)] ≤ D.
6.5 Lossy joint source-channel coding theorem I: 6-50

Theorem 6.33 (Lossy joint source-channel coding theorem) Consider


a discrete-time stationary ergodic source {Zi}∞i=1 with finite alphabet Z, finite
reproduction alphabet Ẑ, bounded additive distortion measure ρm (·, ·) and rate-
distortion function R(D), and consider a discrete-time memoryless channel with
input alphabet X , output alphabet Y and capacity C. Assuming that both R(D)
and C are measured in the same units, the following hold:
• Forward part (achievability): For any D > 0, there exists a sequence of m-
to-nm lossy source-channel codes (f(sc), g(sc)) satisfying the average distortion
fidelity criterion D for sufficiently large m if

      limsup_{m→∞} (m/nm) · R(D) < C.

• Converse part: On the other hand, for any sequence of m-to-nm lossy source-
channel codes (f(sc), g(sc)) satisfying the average distortion fidelity criterion D,
we have

      (m/nm) · R(D) ≤ C.
6.5 Lossy joint source-channel coding theorem I: 6-51

Observation 6.34 (Lossy joint source-channel coding theorem with


signaling rates)
• The above theorem admits another form when the source and channel are
described in terms of “signaling rates”.
• Let Ts and Tc represent the durations (in seconds) per source letter and per
channel input symbol, respectively.
• In this case, Tc/Ts represents the source-channel transmission rate measured in
source symbols per channel use (or input symbol).
– Forward part: The source can be reproduced at the output of the channel
with distortion less than D (i.e., there exist lossy source-channel codes
asymptotically satisfying the average distortion fidelity criterion D) if

      (Tc/Ts) · R(D) < C.

– Converse part: For any lossy source-channel codes satisfying the average
distortion fidelity criterion D, we have

      (Tc/Ts) · R(D) ≤ C.
6.6 Shannon limit of communication systems I: 6-52

• A bound on the end-to-end distortion of a communication system:

– If a source with rate-distortion function R(D) can be transmitted over a


channel with capacity C via a source-channel block code of rate Rsc > 0
(in source symbols/channel use) and reproduced at the destination with an
average distortion no larger than D, then we must have that
      R(D) ≤ (1/Rsc) C.                                             (6.6.1)

– Shannon limit:

      DSL := min{ D : R(D) ≤ (1/Rsc) C }.
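Because R(D) is non-increasing in D, DSL can be found by bisection once R(·), C and
Rsc are known; a generic sketch follows (Python; the function name shannon_limit and
the Gaussian example are ours).

```python
import math

def shannon_limit(R, C, R_sc, d_lo, d_hi, tol=1e-9):
    """Smallest D with R(D) <= C / R_sc, assuming R is non-increasing on [d_lo, d_hi]
    and the constraint holds at d_hi."""
    target = C / R_sc
    while d_hi - d_lo > tol:
        mid = 0.5 * (d_lo + d_hi)
        if R(mid) <= target:
            d_hi = mid          # constraint met: D_SL is at or below mid
        else:
            d_lo = mid
    return d_hi

# Example: memoryless Gaussian source (variance 1) over a channel of capacity
# C = 0.5 bit/channel use with R_sc = 1 source symbol/channel use; closed form: D_SL = 0.5.
R_gauss = lambda D: 0.5 * math.log2(1.0 / D)
print(round(shannon_limit(R_gauss, C=0.5, R_sc=1.0, d_lo=1e-6, d_hi=1.0), 6))  # 0.5
```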
6.6 Shannon limit of communication systems I: 6-53

[Figure: the rate-distortion curve R(D) versus D; DSL is the distortion at which
R(D) meets the level (1/Rsc)C.]
6.6 Shannon limit of communication systems I: 6-54

Example 6.35 (Shannon limit for a binary uniform DMS over a


BSC)
• Let Z = Ẑ = {0, 1} and consider a binary uniformly distributed DMS {Zi}
(i.e., a Bernoulli(1/2) source) using the additive Hamming distortion measure.
• Note that in this case, E[ρ(Z, Ẑ)] = Pr(Z ≠ Ẑ) := Pb.
• We desire to transmit the source over a BSC with crossover probability ε < 1/2.
• We then have, for 0 ≤ D ≤ 1/2,

      R(D) = 1 − hb(D),   and   C = 1 − hb(ε).

• Hence, for a given ε,

      DSL := min{ D : 1 − hb(D) ≤ (1/Rsc)(1 − hb(ε)) } = hb^{−1}( 1 − (1 − hb(ε))/Rsc ).

• Alternatively, for a given D,

      εSL := max{ ε : 1 − hb(D) ≤ (1/Rsc)(1 − hb(ε)) } = hb^{−1}( 1 − Rsc (1 − hb(D)) ).
6.6 Shannon limit of communication systems I: 6-55

• It is well-known that a BSC with crossover probability ε represents a binary-
input AWGN channel used with antipodal (BPSK) signaling and hard-decision
coherent demodulation.
• With average energy per signal P, noise power N0/2 and signal-to-noise ratio
(SNR) γ = P/N0, we have

      ε = Q(√(2γ)),                                                  (6.6.5)

where

      Q(x) = (1/√(2π)) ∫_x^∞ e^{−t²/2} dt

is the Gaussian Q-function.
• If the channel is used with a source-channel code of rate Rsc source (or infor-
mation) bits/channel use, then ε can be expressed in terms of a so-called SNR
per source (or information) bit

      γb := Eb/N0 = (1/Rsc)(P/N0) = (1/Rsc) γ,

where Eb is the average energy per source bit.
6.6 Shannon limit of communication systems I: 6-56

• Thus,

      ε = Q(√(2 Rsc γb)).                                            (6.6.6)

• The minimal γb (in dB) for a given Pb = D < 1/2 and a source-channel code
rate Rsc < 1:

      γb,SL = (1/(2Rsc)) [Q^{−1}(εSL)]².

      Values of γb,SL (in dB):

      Rate Rsc   Pb = 0    Pb = 10^−5   Pb = 10^−4   Pb = 10^−3   Pb = 10^−2

        1/3      1.212       1.210        1.202        1.150        0.077
        1/2      1.775       1.772        1.763        1.703        1.258
        2/3      2.516       2.513        2.503        2.423        1.882
        4/5      3.369       3.367        3.354        3.250        2.547

• For Rsc = 1,

      εSL := hb^{−1}( 1 − Rsc (1 − hb(D)) ) = D = Pb

and

      γb,SL = (1/2) [Q^{−1}(Pb)]².
6.6 Shannon limit of communication systems I: 6-57

Example 6.37 (Shannon limit for a memoryless Gaussian source over


an AWGN channel)
• Let Z = Ẑ = R and consider a memoryless Gaussian source {Zi} of mean
zero and variance σ² and the squared error distortion function.
• The objective is to transmit the source over an AWGN channel with input
power constraint P and noise variance σN² = N0/2 and recover it with distortion
fidelity no larger than D, for a given threshold D > 0.
• The source’s rate-distortion function is given by

      R(D) = (1/2) log₂(σ²/D)   for 0 < D < σ².

Furthermore, the capacity (or capacity-cost function) of the AWGN channel is
given as

      C(P) = (1/2) log₂(1 + P/σN²).
6.6 Shannon limit of communication systems I: 6-58

• The Shannon limit DSL for this system with rate Rsc is obtained via

      DSL := min{ D : R(D) ≤ (1/Rsc) C(P) }
           = min{ D : (1/2) log₂(σ²/D) ≤ (1/(2Rsc)) log₂(1 + P/σN²) }
           = σ² / (1 + P/σN²)^{1/Rsc}                                (6.6.10)

for 0 < DSL < σ².


6.6 Shannon limit of communication systems I: 6-59

Example 6.39 (Shannon limit for a binary uniform DMS over a


binary-input AWGN channel)
• Let Z = Ẑ = {0, 1} and consider a binary uniformly distributed DMS {Zi}
(i.e., a Bernoulli(1/2) source) using the additive Hamming distortion measure.
• The binary uniform source is sent via a source-channel code over a binary-input
AWGN channel used with antipodal (BPSK) signaling of power P and noise
variance σN² = N0/2.
• We then have, for 0 ≤ D ≤ 1/2,

      R(D) = 1 − hb(D).
6.6 Shannon limit of communication systems I: 6-60

• However, the channel capacity C(P) of the AWGN channel whose input takes on
two possible values +√P or −√P, whose output is real-valued and whose
noise variance is σN² = N0/2, is given by evaluating the mutual information be-
tween the channel input and output under the input distribution PX(+√P) =
PX(−√P) = 1/2:

      C(P) = (P/σN²) log₂(e) − (1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} log₂[ cosh( P/σN² + y √(P/σN²) ) ] dy

           = (RscEb/(N0/2)) log₂(e) − (1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} log₂[ cosh( RscEb/(N0/2) + y √(RscEb/(N0/2)) ) ] dy

           = 2Rscγb log₂(e) − (1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} log₂[ cosh( 2Rscγb + y √(2Rscγb) ) ] dy,

where P = RscEb is the channel signal power, Eb is the average energy per
source bit, and γb = Eb/N0 is the SNR per source bit.

• Then, it requires

      1 − hb(Pb) ≤ (1/Rsc) { 2Rscγb log₂(e) − (1/√(2π)) ∫_{−∞}^{∞} e^{−y²/2} log₂[ cosh( 2Rscγb + y √(2Rscγb) ) ] dy }.
6.6 Shannon limit of communication systems I: 6-61

[Figure: Shannon limit curves of Pb (log scale, from 1 down to 10^−6) versus γb (dB)
for Rsc = 1/2 and Rsc = 1/3 over the binary-input AWGN channel; the curves approach
Pb = 0 at about 0.19 dB and −0.495 dB, respectively.]

The Shannon limits for (2, 1) and (3, 1) codes under binary-input AWGN channel.

• The Shannon limits calculated above are pertinent due to the invention of
near-capacity achieving channel codes, such as Turbo or LDPC codes.
• For example, the rate-1/2 Turbo coding system proposed in 1993 can approach
a bit error rate of 10−5 at γb = 0.9 dB, which is only 0.714 dB away from the
Shannon limit of 0.186 dB.
6.6 Shannon limit of communication systems I: 6-62

      Shannon limits γb,SL (in dB) for a binary uniform DMS over the binary-input AWGN channel:

      Rate Rsc   Pb = 0    Pb = 10^−5   Pb = 10^−4   Pb = 10^−3   Pb = 10^−2

        1/3      −0.496      −0.496       −0.504       −0.559       −0.960
        1/2       0.186       0.186        0.177        0.111       −0.357
        2/3       1.060       1.057        1.047        0.963        0.382
        4/5       2.040       2.038        2.023        1.909        1.152
6.6 Shannon limit of communication systems I: 6-63

Example 6.40 (Shannon limit for a binary uniform DMS over a


binary-input Rayleigh fading channel)
• Consider a BPSK-modulated Rayleigh fading channel.
• Its input power is P = RscEb, its noise variance is σN² = N0/2, and the fading
distribution is Rayleigh:

      fA(a) = 2a e^{−a²},   a > 0.

• Then,

      CDSI(γb) = 1 − √(Rscγb/π) ∫_0^{+∞} fA(a) ∫_{−∞}^{+∞} e^{−Rscγb (y+a)²} log₂( 1 + e^{4Rscγb ya} ) dy da.

• We then generate the table below (γb,SL in dB) according to:

      1 − hb(Pb) ≤ (1/Rsc) CDSI(γb).

      Rate Rsc   Pb = 0    Pb = 10^−5   Pb = 10^−4   Pb = 10^−3   Pb = 10^−2

        1/3      0.489       0.487        0.479        0.412       −0.066
        1/2      1.830       1.829        1.817        1.729        1.107
        2/3      3.667       3.664        3.647        3.516        2.627
        4/5      5.936       5.932        5.904        5.690        4.331
Key Notes I: 6-64

• Why lossy data compression (e.g., to transmit a source with entropy larger
than capacity)
• Distortion measure
• Lossy data compression codes
• Rate-distortion function
• Distortion typical set
• AEP for distortion measure
• Rate distortion theorem
Key Notes I: 6-65

Terminology
• Shannon’s source coding theorem → Shannon’s first coding theorem;
• Shannon’s channel coding theorem → Shannon’s second coding theorem;
• Rate distortion theorem → Shannon’s third coding theorem.
• Information transmission Theorem → Joint source-channel coding theorem
– Shannon limit (BER versus SNRb )
