
INGI 2348: Information Theory and Coding

Part 2: Channel Coding


Outline
1. Channel capacity for discrete time memoryless channels
   1.1 Basic Notions
   1.2 Capacity computation
   1.3 Coding Theorem: negative result
2. Coding Theorem for noisy channels
   2.1 Introduction: Block Coding
   2.2 Block Decoding
   2.3 Error probability
   2.4 Coding Theorem
3. Binary Hamming Space
   3.1 Minimum distance
   3.2 (Finding bounds on code size)
4. Linear Codes
   4.1 Linear encoding
   4.2 Linear decoding
   4.3 Particular Codes
   4.4 (Generalized linear codes)
5. Cyclic codes
   5.1 Introduction
   5.2 General Theory
6. Reed-Solomon Codes
   6.1 Maximum distance separable codes (MDS)
   6.2 Polynomial description
   6.3 Applications
   6.4 Decoding of Reed-Solomon codes
7. Convolutional codes
   7.1 Definition and algebraic properties
   7.2 State Trellis and Viterbi decoding
1. Channel Capacity for discrete time memoryless channels
1.1 Basic Notions
   1.1.1 Channel definition
   1.1.2 Mutual information I(X; Y)
   1.1.3 Capacity
1.2 Capacity computations
   1.2.1 Convexity properties
   1.2.2 Characterization of capacity
   1.2.3 Symmetric channels
1.3 Coding Theorem: Negative result
   1.3.1 Introduction
   1.3.2 Preliminary results
   1.3.3 Main result
Introduction
General communication scheme: transmission of information over a noisy channel.
Question/Objective
For a given channel (see the definition below): what amount of information can be
transmitted correctly (with virtually no errors)?
This is realized with coding (channel coding):
- adapting the probabilities of the input sequence X
- error correction by adding redundancy
1.1.1 Channel Definition
Input alphabet A = {a_1, ..., a_K}
Output alphabet B = {b_1, ..., b_J}
Transition probabilities: P(b_j | a_k), 1 ≤ j ≤ J, 1 ≤ k ≤ K
Memoryless Channel
Input sequence x = (x_1, ..., x_N), with x_i ∈ A
Output sequence y = (y_1, ..., y_N), with y_i ∈ B
Probability of receiving a given output sequence y if x was sent:
P(y | x) = Π_{i=1}^N P(y_i | x_i)
Example: Binary Symmetric Channel (BSC)
A = B = {0, 1}
P(0|0) = P(1|1) = 0.9
P(0|1) = P(1|0) = 0.1
P(00|00) = (0.9)(0.9) = 0.81, ...
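To make the definition concrete, here is a minimal Python sketch (the helper name `sequence_probability` is mine, not from the course) that stores a discrete memoryless channel as its transition matrix and evaluates P(y|x) as the product of per-letter transition probabilities:

```python
import numpy as np

# Transition matrix P[k][j] = P(b_j | a_k) for the binary symmetric channel
# with crossover probability 0.1 (alphabets A = B = {0, 1}).
P_BSC = np.array([[0.9, 0.1],
                  [0.1, 0.9]])

def sequence_probability(P, x, y):
    """P(y | x) for a memoryless channel: product of the P(y_i | x_i)."""
    return float(np.prod([P[xi, yi] for xi, yi in zip(x, y)]))

# Example from the slide: P(00 | 00) = 0.9 * 0.9 = 0.81
print(sequence_probability(P_BSC, [0, 0], [0, 0]))  # 0.81
```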
1.1.2 Mutual Information
Probabilities of the input X: Q(a_k), 1 ≤ k ≤ K
Joint probabilities on X, Y: P(a_k, b_j) = Q(a_k) P(b_j | a_k)
Marginal probabilities of the output Y: P(b_j) = Σ_{k=1}^K Q(a_k) P(b_j | a_k)
Mutual Information
I(X; Y) = Σ_{k=1}^K Σ_{j=1}^J P(a_k, b_j) log [ P(b_j | a_k) / P(b_j) ]
        = E_{X,Y} [ log ( P(b_j | a_k) / P(b_j) ) ]
        = Σ_{k=1}^K Σ_{j=1}^J P(a_k, b_j) log [ P(a_k, b_j) / ( P(a_k) P(b_j) ) ]
H(X | Y) = H(X) − I(X; Y)
Information obtained on average about X by observing Y:
the uncertainty about X is decreased on average by I(X; Y) when observing Y.
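The mutual information can be evaluated directly from Q and the transition matrix; the following is a small sketch in nats (function name mine, transition-matrix convention P[k, j] = P(b_j | a_k)):

```python
import numpy as np

def mutual_information(Q, P):
    """I(X; Y) in nats, with Q[k] = Q(a_k) and P[k, j] = P(b_j | a_k)."""
    Q = np.asarray(Q, float)
    P = np.asarray(P, float)
    joint = Q[:, None] * P          # P(a_k, b_j) = Q(a_k) P(b_j | a_k)
    p_out = joint.sum(axis=0)       # P(b_j) = sum_k Q(a_k) P(b_j | a_k)
    mask = joint > 0                # skip zero-probability pairs (0 log 0 = 0)
    ratio = P[mask] / np.broadcast_to(p_out, P.shape)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

# Binary symmetric channel with p = 0.1 and uniform inputs:
P_BSC = np.array([[0.9, 0.1], [0.1, 0.9]])
print(mutual_information([0.5, 0.5], P_BSC))   # log 2 - H(0.1) ≈ 0.368 nats
```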
Capacity
For a given channel characterized by its transition probabilities P(b_j | a_k):
Definition: Capacity
C = max_{Q(a_k)} I(X; Y)
Maximum of the mutual information, optimized over the input distribution.
If I(X; Y) = C, the distribution Q is said to realize the capacity.
Interpretation:
For the transmission of a sequence x = (x_1, ..., x_N) → y = (y_1, ..., y_N),
the coding theorem will show that the capacity is the maximum amount of information
that can be transmitted per letter.
Transmission of sequences
X^N, Y^N: random sequences of N inputs and N outputs
X_i, Y_i are the random variables for the i-th letter of the sequence
In general, one should consider the mutual information of the whole sequence, I(X^N; Y^N)
Theorem 1.1: Transmission on a memoryless channel
If the channel is memoryless:
(1) I(X^N; Y^N) ≤ Σ_{i=1}^N I(X_i; Y_i)
(2) If the inputs are independent: I(X^N; Y^N) = Σ_{i=1}^N I(X_i; Y_i)
(3) If the inputs are independent and their distribution realizes the capacity:
    I(X^N; Y^N) = N C
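A quick numeric sanity check of part (2), assuming a BSC with crossover 0.1, uniform independent inputs and N = 2 (a sketch; all names and values are illustrative, not from the course):

```python
import itertools
import math
import numpy as np

# Binary symmetric channel, crossover 0.1; independent (uniform) inputs, N = 2.
W = np.array([[0.9, 0.1], [0.1, 0.9]])   # W[x, y] = P(y | x)
Q = np.array([0.5, 0.5])                  # per-letter input distribution

def mi(p_xy):
    """Mutual information (nats) of a joint distribution given as a dict {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in p_xy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y])) for (x, y), p in p_xy.items() if p > 0)

# Joint distribution of the pair (X^2, Y^2) for the memoryless channel.
joint_seq = {}
for x1, x2, y1, y2 in itertools.product(range(2), repeat=4):
    joint_seq[((x1, x2), (y1, y2))] = Q[x1] * Q[x2] * W[x1, y1] * W[x2, y2]

per_letter = mi({(x, y): Q[x] * W[x, y] for x in range(2) for y in range(2)})
print(mi(joint_seq), 2 * per_letter)   # both ≈ 0.736 nats = 2 (log 2 - H(0.1))
```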
Proof of Theorem 1.1
I(X^N; Y^N) = H(Y^N) − H(Y^N | X^N)
For a memoryless channel:
H(Y^N | X^N) = − Σ_{x,y} Q_N(x) P_N(y | x) Σ_{i=1}^N log P(y_i | x_i) = Σ_{i=1}^N H(Y_i | X_i)
Joint entropy: H(U, V) ≤ H(U) + H(V), with equality iff U and V are independent, hence
H(Y^N) ≤ Σ_{i=1}^N H(Y_i)
Therefore
I(X^N; Y^N) ≤ Σ_{i=1}^N ( H(Y_i) − H(Y_i | X_i) ) = Σ_{i=1}^N I(X_i; Y_i)
Proof of Theorem 1.1 (continued)
If the inputs are independent: Q_N(x) = Π_{i=1}^N Q_i(x_i)
Then the outputs are independent (memoryless channel) and
H(Y^N) = Σ_{i=1}^N H(Y_i)
so that
I(X^N; Y^N) = Σ_{i=1}^N I(X_i; Y_i)
If in addition the distributions Q_i realize the capacity,
I(X_i; Y_i) = C for all i, hence I(X^N; Y^N) = N C
1.2.1 Convexity of I(X; Y)
Definition: A function f : R^K → R is ∩-convex (concave) if, for all θ, φ ∈ R^K and 0 ≤ λ ≤ 1,
λ f(θ) + (1 − λ) f(φ) ≤ f( λθ + (1 − λ)φ )
The function lies above the straight line (the chord).
A function is ∪-convex if the inequality holds in the other direction.
Convexity of Mutual Information
Theorem 1.2: Convexity
The mutual information I(X; Y) is ∩-convex (concave) in the input distribution Q(a_k).
Proof:
Let Q_0(x) and Q_1(x) be two probability distributions on X. For 0 ≤ λ ≤ 1, define
Q(x) = λ Q_0(x) + (1 − λ) Q_1(x)
Let I_0, I_1 and I be the corresponding mutual informations. We want to prove that
λ I_0(X; Y) + (1 − λ) I_1(X; Y) ≤ I(X; Y)
Define an auxiliary random variable Z with values z ∈ {0, 1} and probabilities
P_Z(0) = λ,  P_Z(1) = 1 − λ
X is assumed to be generated with Q_0 if z = 0 and with Q_1 if z = 1:
P_{X|Z}(x | 0) = Q_0(x),  P_{X|Z}(x | 1) = Q_1(x)
The probability distribution of X is then
P_X(x) = Σ_{z=0}^{1} P_Z(z) P_{X|Z}(x | z) = Q(x)
Convexity of Mutual Information (Proof)
Considering the cascade of the two channels Z → X and X → Y:
I(X; Y | Z) ≤ I(X; Y)
Developing:
I(X; Y | Z) = Σ_{x,y,z} P_Z(z) P(x, y | z) log [ P(y | x, z) / P(y | z) ]
I_z(X; Y) = Σ_{x,y} Q_z(x) P(y | x) log [ P(y | x) / Σ_{x′} Q_z(x′) P(y | x′) ]
with
P(x, y | z) = P(x | z) P(y | x, z) = Q_z(x) P(y | x, z)
P(y | x, z) = P(y | x)   [cascade]
P(y | z) = Σ_x Q_z(x) P(y | x, z)
Hence
I(X; Y | Z) = λ I_0(X; Y) + (1 − λ) I_1(X; Y)
which, combined with I(X; Y | Z) ≤ I(X; Y), proves the theorem.
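Theorem 1.2 can also be checked numerically; the sketch below assumes a randomly generated 3-input, 4-output channel and two arbitrary input distributions (helper name mine):

```python
import numpy as np

def mutual_information(Q, P):
    """I(X; Y) in nats; Q[k] = Q(a_k), P[k, j] = P(b_j | a_k)."""
    Q, P = np.asarray(Q, float), np.asarray(P, float)
    joint = Q[:, None] * P
    p_out = joint.sum(axis=0)
    mask = joint > 0
    ratio = P[mask] / np.broadcast_to(p_out, P.shape)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

rng = np.random.default_rng(0)
P = rng.random((3, 4)); P /= P.sum(axis=1, keepdims=True)   # random 3-input, 4-output channel
Q0 = np.array([0.7, 0.2, 0.1]); Q1 = np.array([0.1, 0.3, 0.6])

for lam in (0.25, 0.5, 0.75):
    mix = mutual_information(lam * Q0 + (1 - lam) * Q1, P)
    avg = lam * mutual_information(Q0, P) + (1 - lam) * mutual_information(Q1, P)
    assert mix >= avg - 1e-12        # Theorem 1.2: I is concave in Q
    print(lam, mix, avg)
```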
Characterization of the capacity
Maximization of I(X; Y) over the parameters θ_k = Q(a_k)
Convex set of admissible probability vectors:
S = { θ ∈ R^K : θ_k ≥ 0, θ_1 + ... + θ_K = 1 }
Maximize a ∩-convex (concave) function over a convex set:
- efficient algorithms exist (see the sketch below)
- what about an analytical expression?
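One such efficient algorithm is the Blahut-Arimoto iteration. The sketch below is an illustration, not the course's reference implementation; it assumes the transition-matrix convention P[k, j] = P(b_j | a_k) and works in nats:

```python
import numpy as np

def blahut_arimoto(P, tol=1e-9, max_iter=10_000):
    """Capacity (nats) and an optimal input distribution for transition
    matrix P[k, j] = P(b_j | a_k), via the Blahut-Arimoto iteration."""
    P = np.asarray(P, float)
    K = P.shape[0]
    Q = np.full(K, 1.0 / K)                      # start from the uniform input
    for _ in range(max_iter):
        p_out = Q @ P                            # P(b_j) for the current Q
        # g_k = sum_j P(b_j|a_k) log( P(b_j|a_k) / P(b_j) )
        with np.errstate(divide="ignore", invalid="ignore"):
            logs = np.where(P > 0, np.log(P / p_out), 0.0)
        g = np.sum(P * logs, axis=1)
        c = np.exp(g)
        lower, upper = np.log(Q @ c), np.log(c.max())   # bounds on C
        if upper - lower < tol:
            return lower, Q
        Q = Q * c / (Q @ c)                      # multiplicative update
    return lower, Q

# Binary symmetric channel with p = 0.1: C = log 2 - H(0.1) ≈ 0.368 nats
C, Q = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(C, Q)   # ≈ 0.368, Q ≈ [0.5, 0.5]
```

The vector `g` computed in the loop is exactly the g_k(θ) of Theorem 1.4 below; the stopping criterion uses the classical Blahut bounds log Σ_k θ_k e^{g_k(θ)} ≤ C ≤ max_k g_k(θ).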
Convex optimization
Based on Lagrange multipliers for the present problem.
Theorem 1.3: Convex optimization
For a ∩-convex (concave) function f : S → R on a convex set S, assumed continuously
differentiable on S, the vector θ maximizes f on S iff there exists a real number λ such that
∂f(θ)/∂θ_k = λ  for all k such that θ_k > 0
∂f(θ)/∂θ_k ≤ λ  for all k such that θ_k = 0
Characterization of capacity
Sometimes an analytical solution can be derived.
Theorem 1.4
Let the functions g_k : S → R be defined as
g_k(θ) = Σ_{j=1}^J P(b_j | a_k) log [ P(b_j | a_k) / Σ_{t=1}^K θ_t P(b_j | a_t) ]
where θ_t = Q(a_t).
The distribution Q realizes the capacity iff there exists a real number C such that
g_k(θ) = C  for all k such that Q(a_k) > 0
g_k(θ) ≤ C  for all k such that Q(a_k) = 0
The number C is unique and is the capacity of the channel.
Property: I(X; Y) = Σ_{k=1}^K θ_k g_k(θ)
Proven by applying Theorem 1.3 with f(θ) = I(X; Y).
1.2.3 Symmetric Channels
A channel is characterized by its transition probability matrix P:
P = [ P(b_1|a_1)  P(b_2|a_1)  ...  P(b_J|a_1)
      P(b_1|a_2)  P(b_2|a_2)  ...  P(b_J|a_2)
      ...
      P(b_1|a_K)  P(b_2|a_K)  ...  P(b_J|a_K) ]
The channel is symmetric if there exists a partition
P = [P_1, P_2, ..., P_m]
of the columns of P such that, inside each block P_r, the rows are equal up to a
permutation, and the columns are equal up to a permutation.
Example 1: Binary Symmetric Channel
P = [ q  p
      p  q ],    p + q = 1
Simple partition: one single block.
Examples of symmetric channels
Example 2: Binary symmetric channel with erasure
Output alphabet {0, 1, Δ}, where the symbol Δ corresponds to an erasure of the input symbol:
P = [ q  p  r
      p  q  r ],    p + q + r = 1
(rows: inputs 0, 1; columns: outputs 0, 1, Δ)
Example 3: 3-state symmetric channel with zero diagonal
Alphabets A = B = {0, 1, 2}:
P = [ 0  p  q
      q  0  p
      p  q  0 ],    p + q = 1
Capacity of symmetric channels
Theorem 1.5: Optimal distribution for symmetric channels
For a symmetric channel, the uniform distribution
Q(a_k) = 1/K for k = 1, ..., K
realizes the capacity.
Proof: Use Theorem 1.4 with θ = (1/K, ..., 1/K): show that g_k(θ) is constant in k.
Consider the partition T_1 ∪ ... ∪ T_m = {1, ..., J} of the output indices corresponding
to the symmetric partition P = [P_1, P_2, ..., P_m]. Then
g_k(θ) = Σ_{r=1}^m g_{k,r}(θ)
with
g_{k,r}(θ) = Σ_{j ∈ T_r} P(b_j | a_k) log [ P(b_j | a_k) / P(b_j) ]
It is sufficient to show that, for each r, g_{k,r}(θ) is independent of k.
Capacity of symmetric channels (continued)
For j ∈ T_r (with fixed r), P(b_j) does not depend on j:
P(b_j) = Σ_l Q(a_l) P(b_j | a_l) = (1/K) Σ_l P(b_j | a_l) = c_r
because the columns of P_r are equal up to a permutation.
The sum over j ∈ T_r of the terms P(b_j | a_k) log P(b_j | a_k) does not depend on k,
because all the rows of P_r are equal up to a permutation. The same holds for the sum of
the terms P(b_j | a_k) log P(b_j), because P(b_j) = c_r on T_r.
In conclusion: g_{k,r}(θ) does not depend on k.
The capacity is then given by (using any value of k)
C = Σ_{j=1}^J P(b_j | a_k) log [ K P(b_j | a_k) / Σ_{l=1}^K P(b_j | a_l) ]
Example 0: perfect (error-free) channel. For an input alphabet of K letters (P(b_j | a_k) = δ_{j,k}):
C = Σ_{j=1}^K δ_{j,k} log( K δ_{j,k} ) = log K
Obvious, since here I(X; Y) = H(X).
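The formula above translates directly into code; a minimal sketch returning the capacity in nats (function name mine, convention P[k, j] = P(b_j | a_k)):

```python
import numpy as np

def symmetric_channel_capacity(P, k=0):
    """C = sum_j P(b_j|a_k) log( K P(b_j|a_k) / sum_l P(b_j|a_l) ), in nats.
    Valid for symmetric channels (any row index k gives the same value)."""
    P = np.asarray(P, float)
    K = P.shape[0]
    row = P[k]
    col_sums = P.sum(axis=0)
    mask = row > 0                         # 0 log 0 terms contribute nothing
    return float(np.sum(row[mask] * np.log(K * row[mask] / col_sums[mask])))

# Example 0: perfect 4-ary channel -> log 4 ≈ 1.386 nats
print(symmetric_channel_capacity(np.eye(4)))
```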
Capacity of symmetric channels: examples
Example 1: Binary Symmetric Channel
C = q log 2q + p log 2p
  = log 2 + q log q + p log p
  = log 2 − H(p)
where H is the binary entropy function
H(p) = −p log p − (1 − p) log(1 − p)
Example 2: Binary symmetric channel with erasure
C = q log [2q / (p + q)] + p log [2p / (p + q)] + r log (2r / 2r)
  = (p + q) [ log 2 + q/(p + q) log( q/(p + q) ) + p/(p + q) log( p/(p + q) ) ]
  = (p + q) [ log 2 − H( p/(p + q) ) ]
Limit cases:
(1) r = 0: binary symmetric channel
(2) p = 0: C = (1 − r) log 2
Example 3: 3-state symmetric channel with zero diagonal
C = q log 3q + p log 3p
  = log 3 + q log q + p log p
  = log 3 − H(p)
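As a numeric cross-check of Examples 1-3, with arbitrarily chosen parameter values, the closed forms can be compared against the general symmetric-channel formula (helper names mine):

```python
import numpy as np

def cap(P, k=0):
    """Symmetric-channel capacity (nats): sum_j P[k,j] log(K P[k,j] / sum_l P[l,j])."""
    P = np.asarray(P, float); K = P.shape[0]
    m = P[k] > 0
    return float(np.sum(P[k, m] * np.log(K * P[k, m] / P.sum(axis=0)[m])))

def H(p):                                   # binary entropy in nats
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

p, r = 0.1, 0.2
q = 1 - p

# Example 1: BSC                 -> log 2 - H(p)
print(cap([[q, p], [p, q]]), np.log(2) - H(p))

# Example 2: BSC with erasure    -> (p + q2)(log 2 - H(p / (p + q2))), q2 = 1 - p - r
q2 = 1 - p - r
print(cap([[q2, p, r], [p, q2, r]]), (p + q2) * (np.log(2) - H(p / (p + q2))))

# Example 3: zero-diagonal ternary channel -> log 3 - H(p)
print(cap([[0, p, q], [q, 0, p], [p, q, 0]]), np.log(3) - H(p))
```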
Examples of NON-symmetric channels
Example 4: Z-Channel
P = [ 1  0
      p  q ]    (rows: inputs 1, 0; columns: outputs 1, 0)
Input distribution: Q(1) = 1 − R, Q(0) = R
I(X; Y) = R g_0(R, 1 − R) + (1 − R) g_1(R, 1 − R)
with
g_0 = p log [ p / (1 − Rq) ] + q log [ q / (Rq) ]
g_1 = log [ 1 / (1 − Rq) ]
The distribution (R, 1 − R) that realizes the capacity must satisfy g_0 = g_1 (= C):
R = 1 / ( q (1 + ρ) )   with   ρ = e^{ H(q)/q }
C = g_1 = log( 1 + 1/ρ )
For small values of p:
C = log 2 − (1/2) H(p) + O( (p log p)^2 )
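The closed form can be verified numerically, together with the characterization of Theorem 1.4 (g_0 = g_1 = C at the optimum); the parameter value p = 0.1 below is arbitrary:

```python
import numpy as np

p = 0.1
q = 1 - p
H = lambda t: -t * np.log(t) - (1 - t) * np.log(1 - t)   # binary entropy (nats)

# Closed form from the slide
rho = np.exp(H(q) / q)
R = 1 / (q * (1 + rho))                    # optimal Q(0)
C = np.log(1 + 1 / rho)

# Check the characterization of Theorem 1.4: g_0 = g_1 = C at the optimum
g0 = p * np.log(p / (1 - R * q)) + q * np.log(q / (R * q))
g1 = np.log(1 / (1 - R * q))
print(C, g0, g1)                           # all three ≈ 0.529 nats for p = 0.1

# Brute-force check: maximize I(X; Y) = R g_0 + (1 - R) g_1 over a grid of R
Rs = np.linspace(1e-6, 1 - 1e-6, 100_001)
I = (Rs * (p * np.log(p / (1 - Rs * q)) + q * np.log(q / (Rs * q)))
     + (1 - Rs) * np.log(1 / (1 - Rs * q)))
print(I.max(), Rs[I.argmax()], R)          # max ≈ C, argmax ≈ R ≈ 0.456
```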
Example 5: W-Channel
Inputs {+1, 0, −1}, outputs {+1, −1}:
P = [ 1  0
      p  q
      0  1 ]    (rows: inputs +1, 0, −1; columns: outputs +1, −1)
I(X; Y) = H(Y) − H(Y|X)
H(Y) ≤ log 2
The distribution Q(+1) = Q(−1) = 1/2, Q(0) = 0 gives I(X; Y) = log 2
(the channel degenerates into a binary symmetric channel without errors), hence
C = log 2
independently of p and q.
In general: no analytical expression for non-symmetric channels.
1.3.1 Coding Theorem: Introduction
One objective of transmission: guarantee a low error probability
(= probability that the letters sent by the source are incorrectly decoded at the receiver).
Coding theorem: if the entropy of the source is lower than the capacity, there exists
a coding/decoding scheme that provides an error probability as low as desired.
Negative result: if the entropy of the source is larger than the capacity, the error
probability cannot be made arbitrarily small.
Model
Objective: the decoded sequence v reproduces the source sequence u as correctly as possible.
Error probability on letter l:
ε_l = Σ_{u, v : u_l ≠ v_l} P(u, v)
Average error probability:
ε̄ = (1/L) Σ_{l=1}^L ε_l
Coding Theorem: Introduction (continued)
The source is assumed memoryless: H(U^L) = L H(U)
τ_s is defined as the time between two source letters
τ_c is the time between two channel letters
Negative result: if H(U)/τ_s > C/τ_c, then the average error probability ε̄ remains
bounded below by a positive constant independent of L.
Here H(U)/τ_s and C/τ_c represent the source entropy and the channel capacity in bits/sec.
1.3.2 Preliminary results
(1) If some uncertainty remains, H(U^L | V^L) > 0, the average error probability cannot be zero:
    for one single letter, L = 1 (Lemma 1.7);
    for the whole sequence, L > 1 (Lemma 1.8).
(2) Relation between H(U^L | V^L), the source entropy H(U) and the capacity C (Lemma 1.9).
Lemma 1.7: Error probability as a function of the conditional entropy
Let U and V be two random variables on an alphabet D of size S. The error probability
defined as ε = Σ_{u ≠ v} P(u, v) satisfies
ε log(S − 1) + H(ε) ≥ H(U|V) = H(U) − I(U; V)
This is a lower bound on the error probability as a function of the conditional entropy
(= the remaining uncertainty on U after observing V).
Interpretation: the average uncertainty H(U|V) is at most the uncertainty on the letter u
when there is an error, ε log(S − 1), plus the uncertainty on the occurrence of an error, H(ε).
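A quick numerical check of Lemma 1.7 on an arbitrary joint distribution P(u, v), randomly generated here (a sketch, names mine):

```python
import itertools
import math
import numpy as np

S = 3
rng = np.random.default_rng(1)
P = rng.random((S, S)); P /= P.sum()        # arbitrary joint distribution P(u, v)

eps = sum(P[u, v] for u, v in itertools.product(range(S), repeat=2) if u != v)
Pv = P.sum(axis=0)                           # P(v)
H_U_given_V = -sum(P[u, v] * math.log(P[u, v] / Pv[v])
                   for u, v in itertools.product(range(S), repeat=2) if P[u, v] > 0)
H_eps = -eps * math.log(eps) - (1 - eps) * math.log(1 - eps)

print(eps * math.log(S - 1) + H_eps, ">=", H_U_given_V)   # Lemma 1.7 holds
```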
Proof of Lemma 1.7
Compare P(u|v) with the symmetric distribution of parameter ε:
P_ε(u|v) = ε / (S − 1)   if u ≠ v   (error)
P_ε(u|v) = 1 − ε          if u = v   (no error)
Using ε = Σ_{u ≠ v} P(u, v), compute
H(U|V) − ε log(S − 1) − H(ε) = Σ_{u,v} P(u, v) log [ P_ε(u|v) / P(u|v) ]
Use log z ≤ z − 1 (natural logarithm):
Σ_{u,v} P(u, v) log [ P_ε(u|v) / P(u|v) ]
  ≤ Σ_{u ≠ v} P(u, v) [ ε / ((S − 1) P(u|v)) − 1 ] + Σ_{u = v} P(u, v) [ (1 − ε)/P(u|v) − 1 ]
  = [ε / (S − 1)] Σ_{u ≠ v} P(v) − Σ_{u ≠ v} P(u, v) + (1 − ε) Σ_{u = v} P(v) − Σ_{u = v} P(u, v)
  = ε − ε + (1 − ε) − (1 − ε) = 0
Error probability for sequences
A similar result holds for sequences of L letters.
Lemma 1.8
For sequences of L letters, the average error probability ε̄ satisfies
ε̄ log(S − 1) + H(ε̄) ≥ (1/L) H(U^L | V^L)
Proof:
Inequality on the joint (conditional) entropy:
H(U^L | V^L) ≤ Σ_{l=1}^L H(U_l | V_l)
Then apply the previous lemma to each pair (U_l, V_l) and use the ∩-convexity of H.
Processing chain and information
Consider the chain U^L → X^N → Y^N → V^L
Each element of the chain depends only on the previous one (no side information):
when x is known, y is (conditionally) independent of u: P(y | x, u) = P(y | x)
similarly P(v | y, x, u) = P(v | y)
Lemma 1.9: Processing chain theorem
If the source sequence of length L is transmitted to the receiver through N uses of the channel:
I(U^L; V^L) ≤ I(X^N; Y^N)
Proof: I(U^L; Y^N | X^N) = 0: u brings no information on y if x is known. It follows that
I(U^L; Y^N) ≤ I(X^N; Y^N)
Similarly,
I(U^L; V^L) ≤ I(U^L; Y^N)
1.3.3 Main result
We now relate L and N by introducing the time intervals τ_s and τ_c.
Number of channel uses: N = ⌊L τ_s / τ_c⌋
Theorem 1.10: Negative result for coding
Memoryless discrete source U on an alphabet of S letters, producing one letter every τ_s seconds.
Memoryless channel of capacity C accepting one letter every τ_c seconds.
A source sequence of length L is sent through the channel with N = ⌊L τ_s / τ_c⌋ uses of
the channel.
For all L, the average error probability (per letter) satisfies
ε̄ log(S − 1) + H(ε̄) ≥ H(U) − (τ_s / τ_c) C
Proof:
ε̄ log(S − 1) + H(ε̄) ≥ (1/L) H(U^L | V^L)
                      = (1/L) H(U^L) − (1/L) I(U^L; V^L)
                      = H(U) − (1/L) I(U^L; V^L)
                      ≥ H(U) − (1/L) I(X^N; Y^N)
                      ≥ H(U) − (N/L) C
                      ≥ H(U) − (τ_s / τ_c) C
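To see what the bound gives numerically, here is a sketch with illustrative numbers (a fair binary source at one letter per channel use over a BSC(0.1); all values are assumptions, not from the course). The smallest ε̄ compatible with the bound is found by bisection:

```python
import math

# Hypothetical numbers: binary source with H(U) = log 2 nats every tau_s = 1 ms,
# BSC(0.1) channel used every tau_c = 1 ms.
S = 2
H_U = math.log(2)                                   # source entropy, nats/letter
tau_s, tau_c = 1e-3, 1e-3
C = math.log(2) - (-0.1 * math.log(0.1) - 0.9 * math.log(0.9))   # BSC(0.1), nats

rhs = H_U - (tau_s / tau_c) * C                      # right-hand side of Theorem 1.10
H = lambda e: -e * math.log(e) - (1 - e) * math.log(1 - e) if 0 < e < 1 else 0.0

# Smallest eps_bar in [0, 1/2] satisfying eps_bar*log(S-1) + H(eps_bar) >= rhs,
# by bisection (the left-hand side is increasing on [0, 1/2]).
lo, hi = 0.0, 0.5
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if mid * math.log(S - 1) + H(mid) >= rhs else (mid, hi)
print(rhs, hi)   # rhs ≈ 0.325 nats; no scheme can achieve average error below ≈ 0.1
```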
Comments on the result
Lower bound on the average error probability, valid for any combination of source and
channel coding.
If the source entropy (in bits/sec) is higher than the capacity, some uncertainty must
remain and the error probability cannot approach zero.
The bound only applies to the average probability ε̄, not to the individual probabilities ε_l.
It remains to prove the positive part (the coding theorem itself).