Channel Coding: Reliable Communication Through Noisy Channels
In real life, channels are always noisy.
Removing redundancy makes the data more
susceptible to corruption by noise.
So we introduce redundancy to detect and
correct errors.
Then why did we remove redundancy in the
first place?
We introduce redundancy in a controlled
manner, so that we know exactly how much
redundancy is needed for a channel with a
particular S/N ratio.
Simplest model of a channel:
Binary Symmetric Channel [BSC]:
[Channel diagram: 0 -> 0 and 1 -> 1 with probability 1-p;
0 -> 1 and 1 -> 0 with crossover probability p.]
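As an illustration, here is a minimal sketch of a BSC in Python (an assumption; the slides give no code): each bit is flipped independently with crossover probability p.

```python
import random

def bsc(bits, p):
    """Pass bits through a binary symmetric channel: each bit is
    flipped with probability p and delivered intact with probability 1 - p."""
    return [b ^ 1 if random.random() < p else b for b in bits]

# Example: a short message through a fairly noisy channel (p = 0.1)
sent = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
received = bsc(sent, p=0.1)
print(sent)
print(received)
```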
Repetition code: A simple way to increase
reliability
We repeat each bit n times.
Consider n = 3.
Probability of error (majority decoding fails when two
or three of the three bits are flipped):
P_err = p^3 + 3p^2q, where q = 1 - p
If p = 0.001, then P_err = 0.001^3 + 3 x 0.001^2 x 0.999
= 2.998 x 10^-6
But if we require P_err to be less than 10^-6, then what
do we do?
Have more repetitions per bit!
For the repetition code K5,
P_err = p^5 + 5p^4q + 10p^3q^2
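As a worked check of these formulas, the sketch below (a Python illustration, not from the slides) evaluates the majority-decoding error probability of an n-fold repetition code directly from the binomial distribution; p = 0.001 follows the example above.

```python
from math import comb

def repetition_error_prob(n, p):
    """P_err for an n-fold repetition code (n odd) on a BSC with
    crossover probability p: majority decoding fails when more than
    half of the n copies are flipped."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range((n + 1) // 2, n + 1))

p = 0.001
print(repetition_error_prob(3, p))  # ~2.998e-06, i.e. p^3 + 3 p^2 q
print(repetition_error_prob(5, p))  # ~1.0e-08, i.e. p^5 + 5 p^4 q + 10 p^3 q^2
```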
However, having more repetitions reduces the
information rate of the code.
Information rate R(K) = k / n
= (number of information bits) / (total number of bits in the code word)
Information rate and protection from noise are
usually conflicting requirements.
If the information rate is high,
P_err is also high.
Consider the even-parity code: n-1 information
bits and 1 check bit.
R(K) = (n-1)/n = 1 - 1/n ≈ 1 (if n is large).
It detects single errors, but its error-detection
capability reduces with increasing n.
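A small sketch of the even-parity code (Python; the helper names are illustrative):

```python
def add_even_parity(bits):
    """Append one check bit so that the codeword has an even number of 1s."""
    return bits + [sum(bits) % 2]

def parity_ok(codeword):
    """A received word with an odd number of 1s cannot be a codeword,
    so any single error is detected (but not located or corrected)."""
    return sum(codeword) % 2 == 0

word = add_even_parity([1, 0, 1, 1, 0, 0, 1])  # n-1 = 7 information bits
corrupted = word.copy()
corrupted[2] ^= 1                              # one bit flipped by the channel
print(parity_ok(word), parity_ok(corrupted))   # True False
```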
Consider the following code K4*:
Information bits Code Word
00 0000
01 0111
10 1000
11 1111
What’s the rule here?
The first bit is retained and the second bit is
repeated thrice.
Decoding is correct if the first bit is not corrupted
and at most one of the remaining three bits is
corrupted.
Therefore: P_err = 1 - (q^4 + 3pq^3) ≈ 0.001
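A quick numerical check of this value (a Python snippet, with the same p = 0.001 as before):

```python
p = 0.001
q = 1 - p
# K4* decoding succeeds iff the first bit is intact and at most one
# of the three repeated bits is flipped.
p_err = 1 - (q**4 + 3 * p * q**3)
print(p_err)  # ~0.001
```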
Can we improve on this, without
compromising on the information rate (0.5)?
Yes. Consider K6*
Information bits Code Word
000 000000
100 100011
010 010101
001 001110
011 011011
101 101101
110 110110
111 111000
What is P_err?
The code corrects all single errors, since any two
distinct code words differ in at least 3 bits.
P_err = 1 - (q^6 + 6pq^5) ≈ 1.5 x 10^-5
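The sketch below (Python, illustrative) lists the K6* codewords, confirms that every pair differs in at least 3 positions, and re-evaluates P_err for p = 0.001:

```python
from itertools import combinations

K6 = ["000000", "100011", "010101", "001110",
      "011011", "101101", "110110", "111000"]

def hamming(a, b):
    """Number of positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))

d_min = min(hamming(a, b) for a, b in combinations(K6, 2))
print(d_min)                        # 3, so every single error is correctable

p, q = 0.001, 0.999
print(1 - (q**6 + 6 * p * q**5))    # ~1.5e-05
```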
How was the improvement achieved?
By increasing the “distance” between the
code words!
Can we design better and better codes?
Can we simultaneously make R(K) large
and P_err vanishingly small?
Hamming Distance:
Given two words a = a_1 a_2 ... a_n and b = b_1 b_2 ... b_n,
the Hamming distance is defined as the number of
positions in which the two words differ.
Denote it by d(a, b).
The Hamming distance satisfies the triangle inequality:
d(a, b) + d(b, c) >= d(a, c)
Importance of Hamming distance:
On receiving a word, we decode it to the code word
which has the smallest Hamming distance from the
received word (maximum-likelihood decoding).
A code detects t errors iff its minimum
distance is greater than t.
A code corrects t errors iff its minimum
distance is greater than 2t.
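A minimal minimum-distance decoder, sketched in Python with the K6* code from above as the codebook (the function names are illustrative):

```python
def hamming(a, b):
    """Number of positions in which two words differ."""
    return sum(x != y for x, y in zip(a, b))

def ml_decode(received, codebook):
    """Minimum-distance decoding: pick the codeword closest to the
    received word (maximum likelihood on a BSC with p < 1/2)."""
    return min(codebook, key=lambda c: hamming(received, c))

K6 = ["000000", "100011", "010101", "001110",
      "011011", "101101", "110110", "111000"]

# A single error in "010101" (third bit flipped) is corrected,
# since the minimum distance 3 is greater than 2*1.
print(ml_decode("011101", K6))  # -> "010101"
```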
What limits the information rate of a channel?
(while keeping P_err arbitrarily low)
The channel capacity.
Conditional probability is important for
understanding the concept of channel
capacity.
Entropy (uncertainty) of the input alphabet
under the condition of receiving y_i:
H(X | y_i) = - Σ_{j=1}^{n} P(x_j | y_i) log2 P(x_j | y_i)
Taking the mean value over all output
symbols y_i, we obtain the uncertainty about
the input after receiving the output of the
channel:
H(X | Y) = Σ_{i=1}^{n} H(X | y_i) P(y_i)
H(X | Y) = - Σ_{i=1}^{n} Σ_{j=1}^{n} P(y_i) P(x_j | y_i) log2 P(x_j | y_i)
H(X | Y) = - Σ_{i=1}^{n} Σ_{j=1}^{n} P(x_j, y_i) log2 P(x_j | y_i)
H(X|Y) is called the conditional entropy. This
is the uncertainty about the input after
receiving the output, so it is a measure of the
information loss due to noise.
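As a concrete illustration (a Python sketch; the equiprobable input distribution and the crossover probability p = 0.1 are assumptions, not from the slides), the conditional entropy of a BSC can be computed directly from the joint probabilities P(x, y):

```python
from math import log2

def conditional_entropy(joint):
    """H(X|Y) = - sum_{i,j} P(x_j, y_i) log2 P(x_j | y_i),
    where joint[x][y] = P(x, y)."""
    ys = {y for row in joint.values() for y in row}
    p_y = {y: sum(joint[x][y] for x in joint) for y in ys}   # marginal P(y)
    return -sum(joint[x][y] * log2(joint[x][y] / p_y[y])
                for x in joint for y in ys if joint[x][y] > 0)

# BSC with equiprobable inputs (p0 = p1 = 0.5) and crossover probability p = 0.1
p, p0, p1 = 0.1, 0.5, 0.5
q = 1 - p
joint = {0: {0: p0 * q, 1: p0 * p},
         1: {0: p1 * p, 1: p1 * q}}
print(conditional_entropy(joint))  # ~0.469 bits of information lost to noise
```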
I(X; Y) = H(X) - H(X | Y)
I(X; Y) = - Σ_{i=1}^{n} Σ_{j=1}^{n} P(x_j, y_i) log2 P(x_j) + Σ_{i=1}^{n} Σ_{j=1}^{n} P(x_j, y_i) log2 P(x_j | y_i)
I(X; Y) = Σ_{i=1}^{n} Σ_{j=1}^{n} P(x_j, y_i) log2 [ P(x_j, y_i) / ( P(y_i) P(x_j) ) ]
Joint probabilities P(x, y_i) for the BSC, with input
probabilities p0 = P(X = 0) and p1 = P(X = 1):

            y_i = 0    y_i = 1
P(0, y_i)   p0 q       p0 p
P(1, y_i)   p1 p       p1 q
For a binary alphabet the maximum value of the
self-information is 1 bit, and the mutual information
I(X; Y) = Σ_i Σ_j P(x_j, y_i) log2 [ P(x_j, y_i) / ( P(y_i) P(x_j) ) ]
can be written out term by term for the BSC as:
I(X; Y) = P(0,0) log2 [ P(0|0) / P(0) ] + P(0,1) log2 [ P(0|1) / P(0) ]
        + P(1,0) log2 [ P(1|0) / P(1) ] + P(1,1) log2 [ P(1|1) / P(1) ]
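Putting the pieces together, the sketch below (Python, illustrative) evaluates I(X;Y) for the BSC from the joint probabilities in the table above and sweeps the input probability p0; the maximum, reached at p0 = 0.5, is the channel capacity 1 - H(p):

```python
from math import log2

def mutual_information(p0, p):
    """I(X;Y) for a BSC with P(X=0) = p0 and crossover probability p,
    using I = sum P(x,y) log2[ P(x,y) / (P(x) P(y)) ]."""
    p1, q = 1 - p0, 1 - p
    joint = {(0, 0): p0 * q, (0, 1): p0 * p,
             (1, 0): p1 * p, (1, 1): p1 * q}
    px = {0: p0, 1: p1}
    py = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    return sum(pxy * log2(pxy / (px[x] * py[y]))
               for (x, y), pxy in joint.items() if pxy > 0)

p = 0.1
best_I, best_p0 = max((mutual_information(p0 / 100, p), p0 / 100)
                      for p0 in range(1, 100))
print(best_I, best_p0)  # ~0.531 bits/use at p0 = 0.5, i.e. capacity 1 - H(0.1)
```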