Coding 515
Information Theory
1
Information Theory Problems
• How can information be transmitted or stored as
efficiently as possible?
• What is the maximum amount of information
that can be transmitted or stored reliably?
• How can information be kept secure?
2
Digital Communications System
[Block diagram: a data source produces the message W, which is encoded as the channel input X (distribution P_X); the channel P_{Y|X} produces the output Y (distribution P_Y), from which the data sink forms Ŵ, an estimate of W.]
3
Communication Channel
• Cover and Thomas Chapter 7
4
Discrete Memoryless Channel
[Diagram: a discrete memoryless channel with input alphabet \{x_1, x_2, \ldots, x_K\} and output alphabet \{y_1, y_2, \ldots, y_J\}, with J \ge K \ge 2.]
5
Discrete Memoryless Channel
6
Binary Symmetric Channel
[Diagram: inputs x_1, x_2 and outputs y_1, y_2; each input is received correctly with probability 1 - p and flipped with probability p.]

channel matrix
P_{Y|X} = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}
7
Binary Errors with Erasure Channel
[Diagram: input x_1 = 0 goes to output y_1 = 0 with probability 1-p-q, to the erasure symbol y_2 = e with probability q, and to output y_3 = 1 with probability p; input x_2 = 1 behaves symmetrically.]
8
BPSK Modulation K=2
x_1(t) = +\sqrt{2E_b/T_b}\,\cos(2\pi f_c t) \quad (0 bit)
x_2(t) = -\sqrt{2E_b/T_b}\,\cos(2\pi f_c t) \quad (1 bit)

E_b is the energy per bit, T_b is the bit duration, and f_c is the carrier frequency.
9
BPSK Demodulation in AWGN
[Figure: the conditional densities p(y|x_1) and p(y|x_2), Gaussians centred at +\sqrt{E_b} and -\sqrt{E_b}.]

p_{Y|X}(y \mid x = x_k) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y - x_k)^2}{2\sigma^2}\right)
10
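A minimal sketch of maximum-likelihood demodulation based on this Gaussian density; the values of E_b, N_0, the seed and the block size are illustrative assumptions (with \sigma^2 = N_0/2 for AWGN), not parameters taken from the slides.

import numpy as np

Eb, N0 = 1.0, 0.5
sigma = np.sqrt(N0 / 2.0)
symbols = np.array([+np.sqrt(Eb), -np.sqrt(Eb)])   # x1 (bit 0), x2 (bit 1)

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=10)                 # 0 -> x1, 1 -> x2
y = symbols[bits] + sigma * rng.standard_normal(bits.size)

# evaluate p(y | x_k) for both candidate symbols and pick the larger one
lik = np.exp(-(y[:, None] - symbols[None, :])**2 / (2 * sigma**2)) \
      / np.sqrt(2 * np.pi * sigma**2)
decisions = lik.argmax(axis=1)                     # equivalent to a sign test on y
print(np.mean(decisions != bits))                  # empirical bit error rate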
BPSK Demodulation K=2
[Figure: quantization of the demodulator output axis, from -\sqrt{E_b} to +\sqrt{E_b}, into J decision regions y_1, \ldots, y_J, shown for J = 3, J = 4 and J = 8.]
11
Mutual Information for a BSC
crossover probability p, \bar{p} = 1 - p

channel matrix
P_{Y|X} = \begin{pmatrix} \bar{p} & p \\ p & \bar{p} \end{pmatrix}

[Diagram: X → BSC → Y. Input 0 has probability p(x=0) = w and input 1 has probability p(x=1) = 1 - w = \bar{w}; each input is received correctly with probability \bar{p} and flipped with probability p.]
12
13
Convex Functions
[Figure: examples of a concave function and a convex function.]
14
Concave Function
15
Convex Function
16
Mutual Information
17
18
I(X;Y) is a concave function of the input probabilities p(x)
19
20
21
I(X;Y) is a convex function of the channel transition probabilities p(y|x)
22
BSC I(X;Y)
[Plot: I(X;Y) for the BSC as a function of the input probability w and the crossover probability p.]
23
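A small sketch of how the surface shown here can be computed, using I(X;Y) = H(Y) - H(Y|X) for the BSC; the sample values of w and p are illustrative.

import numpy as np

def h(x):                                   # binary entropy in bits
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x*np.log2(x) - (1 - x)*np.log2(1 - x)

def bsc_mi(w, p):
    # I(X;Y) = H(Y) - H(Y|X) with p(y = 0) = w(1 - p) + (1 - w)p
    return h(w*(1 - p) + (1 - w)*p) - h(p)

print(bsc_mi(0.5, 0.1))   # 1 - h(0.1), about 0.531: the maximum over w
print(bsc_mi(0.2, 0.1))   # smaller for a non-uniform input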
Properties of the Channel Capacity
• C ≥ 0 since I(X;Y) ≥ 0
• C ≤ log|X| = log(K) since
C = max I(X;Y) ≤ max H(X) = log(K)
• C ≤ log|Y| = log(J) for the same reason
• I(X;Y) is a concave function of p(X), so a local
maximum is a global maximum
24
Channel Capacity
[Diagram: X → channel → Y.]

C = \max_{p(x)} I(X;Y)
25
Binary Symmetric Channel
[Diagram: the binary symmetric channel with inputs x_1, x_2, outputs y_1, y_2 and crossover probability p.]

C = 1 - h(p), achieved for w = 1/2
26
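As a numerical check (the arithmetic is mine, but the result matches the value quoted on slide 62):

p = 0.01: \quad h(0.01) = -0.01\log_2 0.01 - 0.99\log_2 0.99 \approx 0.0808,
\qquad C = 1 - h(0.01) \approx 0.919 \text{ bits per channel use.}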
Symmetric Channels
A discrete memoryless channel is said to be
symmetric if the set of output symbols
{yj}, j = 1, 2, ..., J,
can be partitioned into subsets such that, in the
submatrix of transition probabilities corresponding
to each subset,
• each column is a permutation of the other
columns
• each row is a permutation of the other rows.
27
Binary Channels
Symmetric channel matrix:
P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}

Non-symmetric channel matrix:
P = \begin{pmatrix} 1-p_1 & p_2 \\ p_1 & 1-p_2 \end{pmatrix}, \quad p_1 \neq p_2
28
Binary Errors with Erasure Channel
[Diagram: input x_1 = 0 goes to output y_1 = 0 with probability 1-p-q, to the erasure symbol y_2 = e with probability q, and to output y_3 = 1 with probability p; input x_2 = 1 behaves symmetrically.]
29
Binary Errors with Erasure Channel
P = \begin{pmatrix} 1-p-q & p \\ q & q \\ p & 1-p-q \end{pmatrix}

P_1 = \begin{pmatrix} 1-p-q & p \\ p & 1-p-q \end{pmatrix}

P_2 = \begin{pmatrix} q & q \end{pmatrix}
30
Symmetric Channels
• No partition required → strongly symmetric
• Partition required → weakly symmetric
31
Capacity of a Strongly Symmetric Channel
Theorem
I(X;Y) = \sum_{k=1}^{K} p(x_k) \sum_{j=1}^{J} p(y_j|x_k) \log p(y_j|x_k) + H(Y)
       = \sum_{j=1}^{J} p(y_j|x_k) \log p(y_j|x_k) + H(Y)

(the second line holds for any k, since the rows of the transition matrix are permutations of one another, so the inner sum does not depend on k)
32
Example J = K = 3
[Diagram: a channel with inputs x_1, x_2, x_3 and outputs y_1, y_2, y_3; each input goes to the corresponding output with probability 0.7 and to the other two outputs with probabilities 0.2 and 0.1.]
33
Example
P_{Y|X} = \begin{pmatrix} 0.7 & 0.2 & 0.1 \\ 0.1 & 0.7 & 0.2 \\ 0.2 & 0.1 & 0.7 \end{pmatrix}

\sum_{k=1}^{K} p(x_k) \sum_{j=1}^{J} p(y_j|x_k) \log p(y_j|x_k)
34
Example
H(Y) = -\sum_{j=1}^{J} p(y_j) \log p(y_j)

p(y_1) = \sum_{k=1}^{K} p(y_1|x_k)\,p(x_k)
p(y_2) = \sum_{k=1}^{K} p(y_2|x_k)\,p(x_k)
\vdots
p(y_J) = \sum_{k=1}^{K} p(y_J|x_k)\,p(x_k)
36
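Putting the two pieces together for this example (a worked evaluation; the arithmetic is mine): with the capacity-achieving uniform input p(x_k) = 1/3, every p(y_j) = 1/3, so H(Y) = \log_2 3 and

C = \log_2 3 + \left(0.7\log_2 0.7 + 0.2\log_2 0.2 + 0.1\log_2 0.1\right)
  \approx 1.585 - 1.157 \approx 0.428 \text{ bits per channel use.}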
r-ary Symmetric Channel
C = (1-p)\log(1-p) + (r-1)\,\frac{p}{r-1}\log\frac{p}{r-1} + \log r
  = \log r + (1-p)\log(1-p) + p\log\frac{p}{r-1}
  = \log r + (1-p)\log(1-p) + p\log p - p\log(r-1)
  = \log r - h(p) - p\log(r-1)

• r = 2: C = 1 - h(p)
• r = 3: C = \log_2 3 - h(p) - p
• r = 4: C = 2 - h(p) - p\log_2 3
37
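For instance, with r = 4 and p = 0.1 (numbers chosen for illustration, not from the slides):

C = 2 - h(0.1) - 0.1\log_2 3 \approx 2 - 0.469 - 0.158 \approx 1.373 \text{ bits per channel use.}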
Binary Errors with Erasure Channel
[Diagram: input 0 goes to output 0 with probability 1-p-q, to the erasure symbol e with probability q, and to output 1 with probability p; input 1 behaves symmetrically.]
38
Binary Errors with Erasure Channel
P_{Y|X} = \begin{pmatrix} 0.8 & 0.05 \\ 0.15 & 0.15 \\ 0.05 & 0.8 \end{pmatrix}

P_X = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}, \qquad P_Y = P_{Y|X} P_X = \begin{pmatrix} 0.425 \\ 0.15 \\ 0.425 \end{pmatrix}
39
Capacity of a Weakly Symmetric Channel
C = \sum_{i=1}^{L} q_i C_i
• qi – probability of channel i
• Ci – capacity of channel i
40
Binary Errors with Erasure Channel
[Diagram: the channel decomposed into two sub-channels. With probability 1-q the output is not an erasure and the channel is a BSC with transition probabilities (1-p-q)/(1-q) and p/(1-q); with probability q the output is the erasure e regardless of the input.]

P_1 = \begin{pmatrix} 0.9412 & 0.0588 \\ 0.0588 & 0.9412 \end{pmatrix}, \qquad P_2 = \begin{pmatrix} 1.0 & 1.0 \end{pmatrix}
41
Binary Errors with Erasure Channel
C = \sum_{i=1}^{L} q_i C_i
42
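For the numerical example of slides 39 and 41 (p = 0.05, q = 0.15), the decomposition gives a BSC used with probability 1 - q = 0.85 and crossover probability p/(1-q) \approx 0.0588, plus an erasure-only sub-channel of capacity zero. The resulting value is my own evaluation:

C = (1-q)\bigl[1 - h\!\left(\tfrac{p}{1-q}\right)\bigr] + q \cdot 0
  = 0.85\,[1 - h(0.0588)] \approx 0.85 \times 0.677 \approx 0.576 \text{ bits per channel use.}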
Binary Erasure Channel
[Diagram: the binary erasure channel; each input is received correctly with probability 1-p and erased (output e) with probability p.]
C=1-p
43
Z Channel (Optical)
[Diagram: the Z channel. Input 0 (light on) is always received as 0; input 1 (light off) is received as 1 with probability 1-p and as 0 with probability p.]

p(x = 0) = w, \qquad p(x = 1) = 1 - w = \bar{w}
44
Z Channel (Optical)
I(X;Y) = \sum_{k=1}^{2} p(x_k) \sum_{j=1}^{2} p(y_j|x_k) \log \frac{p(y_j|x_k)}{p(y_j)} = \sum_{k} p(x_k)\, I(x_k;Y)

I(x_1;Y) = \log \frac{1}{w + p\bar{w}}

I(x_2;Y) = p \log \frac{p}{w + p\bar{w}} + (1-p) \log \frac{1}{\bar{w}}
45
Mutual Information for the Z Channel
• p = 0.15
[Plot: I(X;Y) versus the input probability w.]
46
Z Channel (Optical)
w^* = 1 - \frac{1}{(1-p)\left(1 + 2^{h(p)/(1-p)}\right)}

p = 0.15: \quad w^* = 0.555, \quad C = 0.685
47
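A short numerical check of these values (my own sketch): for the Z channel, H(Y|X) = (1-w)h(p) and P(Y = 1) = (1-w)(1-p), so I(X;Y) = h((1-w)(1-p)) - (1-w)h(p), which can be maximized over w on a grid.

import numpy as np

def h(x):                               # binary entropy in bits
    return -x*np.log2(x) - (1 - x)*np.log2(1 - x)

p = 0.15
w = np.linspace(0.001, 0.999, 9999)     # w = p(x = 0)
I = h((1 - w)*(1 - p)) - (1 - w)*h(p)   # I(X;Y) = H(Y) - H(Y|X)
print(w[I.argmax()], I.max())           # about 0.555 and 0.685

w_star = 1 - 1/((1 - p)*(1 + 2**(h(p)/(1 - p))))
print(w_star)                           # matches the closed form above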
Channel Capacity for the Z, BSC and BEC
[Plot: channel capacity versus p for the Z channel, the BSC and the BEC.]
48
Blahut-Arimoto Algorithm
I(X;Y) = \sum_{k=1}^{K} p(x_k) \sum_{j=1}^{J} p(y_j|x_k) \log \frac{p(y_j|x_k)}{\sum_{l=1}^{K} p(x_l)\, p(y_j|x_l)}
50
Blahut-Arimoto Algorithm
• Update the probabilities
51
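The sketch below is a standard implementation of the Blahut-Arimoto iteration, not a transcription of the update equations on slide 51; the channel matrix used as a test case is the binary errors with erasure channel of slide 39 (p = 0.05, q = 0.15), written here with rows as inputs (the transpose of the 3x2 matrix on that slide).

import numpy as np

def blahut_arimoto(P, max_iter=2000, tol=1e-12):
    # P[k, j] = p(y_j | x_k); each row of P must sum to 1.
    K, J = P.shape
    r = np.full(K, 1.0 / K)                       # start from a uniform input
    for _ in range(max_iter):
        py = r @ P                                # output distribution p(y_j)
        q = (r[:, None] * P) / py[None, :]        # posterior q(x_k | y_j)
        with np.errstate(divide='ignore', invalid='ignore'):
            log_q = np.log(q)
        # r_k proportional to exp( sum_j p(y_j|x_k) log q(x_k|y_j) )
        log_r = np.where(P > 0, P * log_q, 0.0).sum(axis=1)
        r_new = np.exp(log_r - log_r.max())
        r_new /= r_new.sum()
        if np.max(np.abs(r_new - r)) < tol:
            r = r_new
            break
        r = r_new
    py = r @ P
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.where(P > 0, P * np.log2(P / py[None, :]), 0.0)
    return float(np.sum(r[:, None] * terms)), r

P = np.array([[0.80, 0.15, 0.05],
              [0.05, 0.15, 0.80]])
C, r = blahut_arimoto(P)
print(C, r)      # about 0.576 bits, achieved by the uniform input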
52
Symmetric Channel Example
53
54
Non-Symmetric Channel Example
55
56
57
58
59
Communication over Noisy Channels
60
Binary Symmetric Channel
[Diagram: the binary symmetric channel with inputs x_1, x_2 and outputs y_1, y_2.]

channel matrix
P_{Y|X} = \begin{pmatrix} \bar{p} & p \\ p & \bar{p} \end{pmatrix}, \qquad \bar{p} = 1 - p
61
Binary Symmetric Channel
• Consider a block of N = 1000 bits
– if p = 0, all 1000 bits are received correctly
– if p = 0.01, on average 990 bits are received correctly
– if p = 0.5, on average 500 bits are received correctly
• When p > 0, we do not know which bits are in
error
– if p = 0.01, C = .919 bit
– if p = 0.5, C = 0 bit
62
Triple Repetition Code
• N=3
message w codeword c
0 000
1 111
63
Binary Symmetric Channel Errors
• If N bits are transmitted, the probability of a
particular pattern of m bit errors is
p^m (1-p)^{N-m}
• The probability of exactly m errors is
\binom{N}{m} p^m (1-p)^{N-m}
64
Triple Repetition Code
• N=3
• The probability of 0 errors is (1-p)^3
• The probability of 1 error is 3p(1-p)^2
• The probability of 2 errors is 3p^2(1-p)
• The probability of 3 errors is p^3
65
Triple Repetition Code
• For p = 0.01
– The probability of 0 errors is .970
– The probability of 1 error is 2.94×10^{-2}
– The probability of 2 errors is 2.97×10^{-4}
– The probability of 3 errors is 10^{-6}
• If p < 1/2, an error pattern with fewer errors is
more likely than one with more errors:
p(0 errors) > p(1 error) > p(2 errors) > p(3 errors)
66
Triple Repetition Code – Decoding
Received Word   Codeword   Error Pattern
000             000        000
001             000        001
010             000        010
100             000        100
111             111        000
110             111        001
101             111        010
011             111        100
67
Triple Repetition Code
• Majority vote or nearest neighbor decoding
will correct all single errors
000, 001, 010, 100 → 000
111, 110, 101, 011 → 111
• The probability of a decoding error is then
Pe = 3p^2(1-p) + p^3 = 3p^2 - 2p^3 < p
• If p = 0.01, then Pe = 0.000298 and only one
word in 3356 will be in error after decoding.
• A reduction by a factor of 33.
68
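A quick check of these numbers (a sketch; majority-vote decoding of the repetition code fails exactly when more than half of the N bits are flipped):

from math import comb

def repetition_pe(N, p):
    # Decoding error when more than half of the N bits are in error.
    return sum(comb(N, m) * p**m * (1 - p)**(N - m)
               for m in range((N + 1) // 2, N + 1))

p = 0.01
Pe = repetition_pe(3, p)
print(Pe)        # 2.98e-4, i.e. 3p^2 - 2p^3
print(p / Pe)    # reduction factor of about 33.6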
Code Rate
• After compression, the data is (almost)
memoryless and uniformly distributed
(equiprobable)
• Thus the entropy of the messages
(codewords) is
H(W) = \log_2 M
• The blocklength of a codeword is N
69
Code Rate
• The code rate is given by
R = \frac{\log_2 M}{N} \text{ bits per channel use}
• M is the number of codewords
• N is the block length
• For the triple repetition code
R = \frac{\log_2 2}{3} = \frac{1}{3}
70
Repetition
[Plot: performance of repetition codes versus code rate R.]
71
[Diagram: the channel used N times, with input block X^N and output block Y^N.]
72
Shannon’s Noisy Coding Theorem
For any ε > 0 and for any rate R less than the
channel capacity C, there is an encoding and
decoding scheme that can be used to ensure
that the probability of decoding error Pe is less
than ε for a sufficiently large block length N.
73
[Plot: error probability versus code rate R, illustrating the noisy coding theorem.]
74
Error Correction Coding N = 3
• R = 1/3, M = 2
0 → 000
1 → 111
• R = 1, M = 8
000 → 000 001 → 001 010 → 010 011 → 011
111 → 111 110 → 110 101 → 101 100 → 100
• Another choice: R = 2/3, M = 4
00 → 000 01 → 011
10 → 101 11 → 110
75
Error Correction Coding N = 3
• BSC p = 0.01
• M is the number of codewords
76
Codes for N=3
[Figure: the codewords of the N = 3 codes shown as vertices of the cube {0,1}^3.]
77
Error Correction Coding N = 5
• BSC p = 0.01
78
Error Correction Coding N = 7
• BSC p = 0.01 N = 7
Code Rate R    Pe            M = 2^{NR}
1              0.0679        128
6/7            0.0585        64
5/7            0.0490        32
4/7            2.03×10^{-3}  16
3/7            1.46×10^{-3}  8
2/7            9.80×10^{-4}  4
1/7            3.40×10^{-7}  2
81
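The bottom rows of this table can be reproduced under the assumption that each code corrects all patterns of t or fewer errors and none of heavier weight; for example, the R = 4/7 row is consistent with a single-error-correcting code such as the (7,4) Hamming code, and the R = 1/7 row with the length-7 repetition code (t = 3). A sketch of that check:

from math import comb

def bounded_distance_pe(N, t, p):
    # Word error probability for a length-N code assumed to correct
    # every pattern of t or fewer errors and no others.
    return 1 - sum(comb(N, m) * p**m * (1 - p)**(N - m) for m in range(t + 1))

p = 0.01
print(bounded_distance_pe(7, 1, p))   # about 2.03e-3  (R = 4/7 row)
print(bounded_distance_pe(7, 3, p))   # about 3.4e-7   (R = 1/7 row)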
Binary Codes
• For given values of M and N, there are
2^{MN}
possible binary codes.
• Of these, some will be bad, some will be best
(optimal), and some will be good, in terms of
Pe
• An average code will be good.
82
83
Channel Capacity
• To prove that information can be transmitted
reliably over a noisy channel at rates up to the
capacity, Shannon used a number of new
concepts
– Allowing an arbitrarily small but nonzero
probability of error
– Using long codewords
– Calculating the average probability of error over a
random choice of codes to show that at least one
good code exists
84
Channel Coding Theorem
• Random coding used in the proof
• Joint typicality used as the decoding rule
• Shows that good codes exist which provide an
arbitrarily small probability of error
• Does not provide an explicit way of
constructing good codes
• If a long code (large N) is generated randomly,
the code is likely to be good but is difficult to
decode
85
86
Channel Capacity: Weak Converse
[Plot: lower bound on Pe versus code rate R, equal to zero for R ≤ C and positive for R > C.]
87
Channel Capacity: Weak Converse
• C = 0.3
[Plot: the asymptotic lower bound 1 - C/R on Pe versus R for C = 0.3.]
88
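The bound plotted here follows from Fano's inequality; a sketch of the standard argument (my reconstruction, not the slide's own derivation): with M = 2^{NR} equiprobable messages,

NR = H(W) = I(W;\hat{W}) + H(W \mid \hat{W}) \le NC + 1 + P_e \log_2 M = NC + 1 + P_e NR,

so

P_e \ge 1 - \frac{C}{R} - \frac{1}{NR} \longrightarrow 1 - \frac{C}{R} \quad \text{as } N \to \infty.

For C = 0.3 and R = 0.6, for example, the bound approaches 0.5.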
Channel Capacity: Strong Converse
• For rates above capacity (R > C), the probability of error cannot be made arbitrarily small; in fact Pe → 1 as the block length N → ∞
89
Arimoto’s Error Exponent EA(R)
90
EA(R) for a BSC with p=0.1
91
• The capacity is a very clear dividing point
• At rates below capacity, Pe → 0 exponentially as N → ∞
• At rates above capacity, Pe → 1 exponentially as N → ∞
[Plot: Pe versus R, showing the sharp transition at R = C.]
92