
CSC 310: Information Theory

University of Toronto, Fall 2011


Instructor: Radford M. Neal

Week 8
Information Channels
Suppose that data must be sent through a channel before it can be used.
This channel may be unreliable, but we can use a code designed to
counteract this.
Some questions we aim to answer:

• Can we quantify how much information a channel can transmit?

• If we have low tolerance for errors, will we be able to make full use of
a channel, or must some of the channel’s capacity be lost to ensure a
low error probability?

• How can we correct (or at least detect) errors in practice?

• Can we do as well in practice as the theory says is possible?


Error Correction for Memory Blocks
The “channel” may transmit information through time rather than space
— ie, it is a memory device.
Many memory devices store data in blocks — eg, 64 bits for RAM, 512
bytes for disk.
Can we correct some errors by adding a few more bits? For instance,
could we correct any single error if we use 71 bits to encode a 64 bit block
of data stored in RAM?
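The notes leave this as a question; as a quick plausibility check (not part of
the original slides), the standard sphere-packing argument says that n − k
check bits can distinguish "no error" from an error in any one of n positions
only if 2^(n−k) ≥ n + 1. A short Python sketch makes the 64-versus-71
arithmetic concrete:

```python
# Sphere-packing (Hamming) bound check for single-error correction:
# with n total bits carrying k data bits, the n - k check bits must be
# able to identify "no error" or an error in any one of the n positions,
# so we need 2**(n - k) >= n + 1.

def can_correct_single_error(k, n):
    """True if n-bit codewords carrying k data bits can, in principle,
    correct any single bit error (necessary sphere-packing condition)."""
    return 2 ** (n - k) >= n + 1

print(can_correct_single_error(64, 71))   # True:  2**7 = 128 >= 72
print(can_correct_single_error(64, 70))   # False: 2**6 =  64 <  71
```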
Error Correction in a Communications System
In other applications, data arrives in a continuous stream. An overall
system might look like this:

Data In -> Data Compression Program -> Encoder for Error-Correcting Code
        -> Channel (Noise added) -> Decoder for Error-Correcting Code
        -> Data Decompression Program -> Data Out

(Data is often sent in blocks through the channel.)
Error Detection
We might also be interested in detecting errors, even if we can’t correct
them:

• For RAM or disk memory, error detection tells us that we need to call
the repair person.

• For some communication applications, we have the option of asking
the sender to re-transmit.

• If we know that a bit is in error, we can try to minimize the damage
— eg, if the bit is part of a pixel in an image, we can replace the pixel
with the average of nearby pixels.
Formal Definition of a Channel
A channel is defined by

• An input alphabet, AX, with symbols a1, . . . , ar. We will usually
assume that the input alphabet is binary, with AX = {0, 1}.

• An output alphabet, AY, with symbols called b1, . . . , bs. This is also
often binary, with AY = {0, 1}, but it can be different from AX.

• A description of how the output depends on the input.
Channel Transition Probabilities
We will assume that the correspondence of input symbols with output
symbols is always known — there are no “insertions” or “deletions” of
symbols.
We will also assume that the channel is memoryless — each output
symbol is influenced only by the corresponding input symbol, not by
earlier input or output symbols.
The behaviour of such a channel is defined by its transition probabilities:

Qj|i = P (Y = bj | X = ai )

These transition probabilities are fixed by the nature of the channel. They
can be changed only by redesigning it.
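As a concrete illustration (not from the notes), a memoryless channel can be
simulated directly from its transition probabilities. The function name and
matrix layout below, with Q[j][i] = P (Y = bj | X = ai ), are just
illustrative choices:

```python
import random

# A memoryless channel is fully described by Q[j][i] = P(Y = b_j | X = a_i).
# Each output symbol is drawn independently, given only the corresponding
# input symbol -- which is exactly what "memoryless" means here.

def simulate_channel(inputs, Q, in_alphabet, out_alphabet):
    outputs = []
    for x in inputs:
        i = in_alphabet.index(x)
        weights = [Q[j][i] for j in range(len(out_alphabet))]
        outputs.append(random.choices(out_alphabet, weights=weights)[0])
    return outputs
```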
The Binary Symmetric Channel (BSC)
For the BSC, the input and output alphabets are both {0, 1}.
With probability f , the symbol received is different from the symbol
transmitted. With probability 1 − f , the symbol is received correctly.
We can view the input and output alphabets as Z2 , the field of integers
modulo 2. The channel can then be seen as adding “noise” to the input:
b = a+n

where n is 0 with probability 1 − f and 1 with probability f .


Addition modulo 2 is the same as exclusive-or, and works as follows:

0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0
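Since the noise is added modulo 2, one use of the BSC is just an exclusive-or
with a random noise bit. A minimal sketch (assuming bits are represented as
0/1 integers in Python):

```python
import random

# BSC: flip each bit independently with probability f, i.e. add noise mod 2.
def bsc(bits, f):
    return [b ^ (1 if random.random() < f else 0) for b in bits]

sent = [0, 1, 1, 0, 1]
received = bsc(sent, f=0.1)   # each bit flipped with probability 0.1
```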
The Binary Erasure Channel (BEC)
For the BEC, the input alphabet is {0, 1}, but the output alphabet is
{0, ?, 1}. The “?” output represents an “erasure”, in which the
transmitted symbol is lost.
An erasure happens with probability f ; otherwise, the symbol is received
correctly.
The transition probabilities for the BEC can be arranged in a matrix as
follows (rows indexed by the outputs 0, ?, 1; columns by the inputs 0, 1):

               [ 1−f    0  ]
Q = (Qj|i) =   [  f     f  ]
               [  0    1−f ]
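For example, the BEC drops straight into the simulate_channel sketch given
earlier, with the rows of Q indexed by the outputs 0, ?, 1 (the value f = 0.2
is just for illustration):

```python
f = 0.2                        # illustrative erasure probability
Q_bec = [[1 - f, 0],           # P(Y = 0 | X = 0), P(Y = 0 | X = 1)
         [f,     f],           # P(Y = ? | X = 0), P(Y = ? | X = 1)
         [0,     1 - f]]       # P(Y = 1 | X = 0), P(Y = 1 | X = 1)

received = simulate_channel([0, 1, 1, 0], Q_bec, [0, 1], [0, '?', 1])
```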
The Z Channel
The Z channel has input alphabet {0, 1} and output alphabet {0, 1},
like the BSC.
However, the Z channel is asymmetrical. The 0 symbol is always
transmitted correctly, but the 1 symbol is received incorrectly (as 0) with
probability f .
The matrix of transition probabilities for the Z channel is as follows
(again with rows indexed by the outputs and columns by the inputs):

               [ 1    f  ]
Q = (Qj|i) =   [ 0   1−f ]
Channel Input Distribution
We can choose what input symbols we feed into the channel. We might
send symbols from some source, the output of a data compression program
applied to that source, or an error-correcting code for either of these.
For the moment, we’ll assume that the symbols we put in are independent
of each other, with some specified distribution:

pi = P (X = ai )

We might aim to choose these input probabilities so that we make efficient
use of the channel.
Deriving Some More Probabilities
The input and the transition probabilities together define the joint
probability for any combination of channel input and output:

Rij = P (X = ai , Y = bj ) = P (X = ai ) P (Y = bj | X = ai ) = pi Qj|i

We can now find the output probabilities:


qj = P (Y = bj ) = Σi Rij = Σi pi Qj|i       (sum over i = 1, . . . , r)

Finally, we get the backward probabilities:

P (X = ai | Y = bj ) = P (X = ai , Y = bj )/P (Y = bj ) = Rij /qj = Si|j

The backward probabilities give the situation from the receiver’s point of
view — given that I’ve received symbol bj , how likely is it that the symbol
sent was ai ?
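These derived probabilities can be computed mechanically from the input
distribution and the transition matrix. A sketch (the variable names R, q
and S mirror the notation above; the BSC example at the end is illustrative):

```python
# Joint, output, and backward probabilities from the input distribution p
# and the transition matrix Q[j][i] = P(Y = b_j | X = a_i).

def derived_probabilities(p, Q):
    r, s = len(p), len(Q)
    # Joint:    R[i][j] = p_i * Q_{j|i}
    R = [[p[i] * Q[j][i] for j in range(s)] for i in range(r)]
    # Output:   q_j = sum_i R[i][j]
    q = [sum(R[i][j] for i in range(r)) for j in range(s)]
    # Backward: S[i][j] = P(X = a_i | Y = b_j) = R[i][j] / q_j
    S = [[R[i][j] / q[j] if q[j] > 0 else 0.0 for j in range(s)]
         for i in range(r)]
    return R, q, S

# Example: BSC with f = 0.1 and equally likely inputs.
f = 0.1
R, q, S = derived_probabilities([0.5, 0.5], [[1 - f, f], [f, 1 - f]])
```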
Input, Output, and Joint Entropies
The amount of information being sent is measured by the input entropy:
H(X) = Σi pi log(1/pi )       (sum over i = 1, . . . , r)
where pi = P (X = ai ).
Similarly, the amount of “information” received (some of which may
actually be noise) is measured by the output entropy:
H(Y ) = Σj qj log(1/qj )       (sum over j = 1, . . . , s)
where qj = P (Y = bj ).
We also have the joint entropy:
H(X, Y ) = Σi Σj Rij log(1/Rij )       (sums over i = 1, . . . , r and j = 1, . . . , s)

where Rij = P (X = ai , Y = bj ). This is the information (including noise)
obtained by an outside observer who sees both the input and the output.
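A short sketch of these entropy calculations, reusing the q and R computed
in the earlier sketch (logs are base 2, so the answers are in bits; terms with
zero probability are skipped, following the convention that 0 log(1/0) = 0):

```python
from math import log2

# Entropy of a probability distribution given as a list of probabilities.
def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Joint entropy from the matrix of joint probabilities R[i][j].
def joint_entropy(R):
    return entropy([Rij for row in R for Rij in row])
```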
Mutual Information
We can now define the mutual information between the input and the
output:
I(X; Y ) = H(X) + H(Y ) − H(X, Y )

The mutual information is meant to represent the amount of information
that is being communicated from the sender to the receiver.
This makes intuitive sense: The difference of H(X) + H(Y ) and H(X, Y )
is the “overlap” in the knowledge of the sender and receiver — due to
information having been transmitted.
But the real test of this definition is whether it leads to useful theorems
and insights.
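With the sketches above, the definition translates directly into code. The
BSC numbers below are illustrative: for f = 0.1 and a uniform input
distribution, the result is 1 − H2(0.1), roughly 0.531 bits per channel use.

```python
# Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), using the earlier sketches.
def mutual_information(p, Q):
    R, q, _ = derived_probabilities(p, Q)
    return entropy(p) + entropy(q) - joint_entropy(R)

print(mutual_information([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]))  # ~0.531
```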
Channel Capacity
I(X; Y ) measures how much information the channel transmits, which
depends on two things:

1) The transition probabilities for the channel.

2) The input distribution.

We assume that we can’t change (1), but that we can change (2).
The capacity of a channel is the maximum value of I(X; Y ) that can be
obtained with any choice of input distribution.
We will eventually see that the capacity is the rate at which data can be
sent through the channel with vanishingly small probability of error.
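As a rough illustration of the definition (not an efficient algorithm), the
capacity of a binary-input channel can be approximated by a grid search over
input distributions, using the mutual information sketch above. For the BSC
with f = 0.1 the maximum occurs at the uniform input distribution and equals
1 − H2(0.1) ≈ 0.531 bits per use, which the search reproduces numerically.

```python
# Capacity = max over input distributions of I(X;Y).  For a binary input
# alphabet a grid search over p1 = P(X = 1) is enough for a sketch.
def capacity_binary_input(Q, steps=1000):
    return max(mutual_information([1 - k / steps, k / steps], Q)
               for k in range(1, steps))

print(capacity_binary_input([[0.9, 0.1], [0.1, 0.9]]))   # ~0.531 for the BSC
```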
