
An Introduction to Information Theory

Adrish Banerjee
Department of Electrical Engineering
Indian Institute of Technology Kanpur
Kanpur, Uttar Pradesh
India

Lecture #1A: Introduction


In this lecture, we give a brief introduction to the topics that we will cover
in this course. We first describe what information is. Then, using a simple
example, we illustrate how information about the source distribution can be
used to design a source compression algorithm. We also describe a model of a
digital communication system.
Before all of that, let us first talk about some of the textbooks and references
that we will follow here.

Textbooks:
• James L. Massey, Lecture notes on “Applied Digital Information Theory I”. (http://www.isiweb.ee.ethz.ch/archive/massey_scr/)
• Thomas M. Cover, Joy A. Thomas, “Elements of Information Theory”, 2nd Edition, John Wiley & Sons, 2006.
For the first textbook, the lecture notes are available at the given link; they are a very nice set of lecture notes on information theory.

References:
• Robert G. Gallager, “Information Theory and Reliable Communication”, John Wiley & Sons, 1968.
• Raymond W. Yeung, “Information Theory and Network Coding”, Springer, 2008.
• David J. C. MacKay, “Information Theory, Inference, and Learning Algorithms”, Cambridge University Press, 2003.
• Robert Ash, “Information Theory”, Dover Publications, 1965.
• Imre Csiszár and János Körner, “Information Theory”, 2nd Edition, Cambridge University Press, 2011.
• N. J. A. Sloane and Aaron D. Wyner, “Claude Elwood Shannon: Collected Papers”, IEEE Press, 1993.
• Abbas El Gamal and Young-Han Kim, “Network Information Theory”, 1st Edition, Cambridge University Press, 2011.
• Emmanuel Desurvire, “Classical and Quantum Information Theory”, 1st Edition, Cambridge University Press, 2009.

So let us now talk about the course content for this particular course. We will
first start off with quantifying information: what is information and how do we
quantify it. We will talk about the Hartley measure of information and the
Shannon measure of information, and then about entropy, relative entropy,
mutual information, and their properties. After that we will talk about some
information inequalities, Jensen's inequality, the log sum inequality, and some
other results such as Fano's lemma, which we will use later on to prove other
results.
Then we will move to the problem of source compression, or source coding.
Source coding can be block to variable length coding, variable to block length
coding, block to block length coding, or variable to variable length coding.
The classification depends on whether the input to the encoder is of fixed
length, which we call a block, or of variable length, and likewise whether the
output is of fixed or variable length. We can get compression if we take blocks
of data and represent the blocks that occur more frequently with fewer bits and
the blocks that occur less frequently with more bits. In that way we can reduce
the number of bits required to represent the source.

We will talk about optimal block to variable length coding, which is Huffman
coding; we will prove the conditions for optimality and the minimum number of
bits required to represent the code. We will also talk about variable to block
length coding, and for a specific instance in which the message can be parsed
in a particular fashion, we will describe an optimal variable to block length
code for that instance, which is known as Tunstall coding. We will also talk
about variable to variable length coding, which is used prominently in a lot of
source compression algorithms. In particular, we will talk about arithmetic
coding and the Lempel-Ziv algorithm, which falls under the class of universal
source compression algorithms: you do not require any prior knowledge of the
distribution of the source that you are trying to compress. We will also talk
about block to block length coding; if we are doing block to block length
coding to get compression, we are obviously looking at lossy compression.
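
As a small preview of the block to variable length coding part, here is a minimal sketch of Huffman code construction in Python; the function name huffman_code and the source distribution are illustrative choices, not part of the course material.

import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    tiebreak = count()  # avoids comparing symbol lists when probabilities tie
    heap = [(p, next(tiebreak), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    code = {sym: "" for sym in probs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)  # two least probable groups
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1:
            code[s] = "0" + code[s]         # prepend a bit as we merge upward
        for s in syms2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return code

# Hypothetical source distribution, used only for illustration.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)      # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(avg_len)   # 1.75 bits/symbol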
Now we will show that, given a source distribution, there are some sequences
that are more likely to be produced by that source and some sequences that are
less likely. So when we do block to block length coding, we will assign
individual codewords to all those sequences that are more likely to occur,
which we call typical sequences, and for all the other non-typical sequences we
will just assign one particular codeword. When we transmit a typical sequence
the decoder will be able to decode it, whereas for a non-typical sequence we
will not be able to distinguish which sequence was transmitted; hence this is
an example of lossy source compression. We will also talk about what is
collectively known as the Asymptotic Equipartition Property; this is the
information-theoretic analog of the law of large numbers. We will talk about
what these properties are and what their consequences are.
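
The flavor of the Asymptotic Equipartition Property can be seen numerically: for a long i.i.d. sequence, the normalized log-probability -(1/n) log2 p(X^n) concentrates around the entropy H(X). A rough sketch, assuming a Bernoulli source with a made-up parameter p = 0.3:

import math
import random

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, n = 0.3, 10_000
random.seed(0)
x = [1 if random.random() < p else 0 for _ in range(n)]  # i.i.d. Bernoulli(p) bits

# Normalized log-probability of the observed sequence.
logp = sum(math.log2(p) if b else math.log2(1 - p) for b in x)
print(-logp / n)          # close to the entropy for large n
print(binary_entropy(p))  # H(0.3) is about 0.881 bits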
Then we will move to source compression for sources that have memory. We will
exploit the temporal correlation between the bits coming out of the source to
design our source encoder. This will be our coding for sources with memory, and
we will take a very simple example to illustrate it. After these few lectures
on source compression we will move to channel capacity computation. The channel
is the medium over which we are communicating, and how many bits we can
transmit over the communication link is basically the channel capacity. We will
talk about some very simple channel models and then about how to compute the
channel capacity. Then we will move from discrete random variables to
continuous random variables and define entropy for a continuous random
variable, which is known as differential entropy. We will consider a very
commonly used channel, the additive white Gaussian noise channel, and compute
its capacity.
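
For reference, the capacity of the additive white Gaussian noise channel that we will compute later is C = (1/2) log2(1 + P/N) bits per channel use, or B log2(1 + SNR) bits per second for a band-limited channel of bandwidth B. A small sketch with illustrative numbers:

import math

def awgn_capacity_per_use(snr):
    """Capacity of the AWGN channel in bits per channel use, snr = P/N."""
    return 0.5 * math.log2(1 + snr)

def awgn_capacity_bps(bandwidth_hz, snr):
    """Shannon capacity in bits per second for a band-limited AWGN channel."""
    return bandwidth_hz * math.log2(1 + snr)

print(awgn_capacity_per_use(15))     # 2.0 bits per channel use
print(awgn_capacity_bps(1e6, 1000))  # about 9.97e6 bits/s over 1 MHz at 30 dB SNR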
The next topic deals with what is known as rate distortion theory. If you have
a real number and you try to represent it exactly, you require an infinite
number of bits; if you represent it with a finite number of bits, then you are
essentially introducing some sort of distortion. So how do you find out whether
your representation using a fixed number of bits is a good representation of
the real number? You need to define a measure of goodness between the original
real number and its representation, which is known as a distortion measure.
Rate distortion theory then asks: given a source and a distortion measure, what
is the minimum average distortion that can be achieved for a given rate?
Alternatively, given a maximum allowed average distortion, we want to find the
minimum rate that can achieve it. And finally, if time permits, we are going to
talk about network information theory. Earlier, when I talked about channel
capacity, I was referring to a point-to-point channel: there is one transmitter
and one receiver, and we characterize the capacity of such channels. Now think
of scenarios where you have multiple senders and multiple receivers; how do you
find the capacity of such channels? Or say you have multiple senders and
multiple receivers and you want to do distributed source compression. All these
problems come under network information theory. This is roughly the syllabus
that we plan to cover in these 20 hours of lectures, spanning 8 weeks.
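
As a pointer to where the rate distortion discussion leads, the best-known closed-form rate distortion function is that of a Gaussian source of variance sigma^2 under squared-error distortion, R(D) = (1/2) log2(sigma^2/D) for 0 < D <= sigma^2 and R(D) = 0 otherwise. A small sketch of that formula; the example numbers are arbitrary.

import math

def gaussian_rate_distortion(variance, distortion):
    """R(D) in bits/sample for a Gaussian source under squared-error distortion."""
    if distortion >= variance:
        return 0.0  # reproducing the mean already meets the distortion target
    return 0.5 * math.log2(variance / distortion)

print(gaussian_rate_distortion(1.0, 0.25))  # 1.0 bit per sample
print(gaussian_rate_distortion(1.0, 1.5))   # 0.0 -- no bits needed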

Topics to be covered: Information Theory


• Introduction: Entropy, Relative Entropy, Mutual Information
• Information Inequalities
• Block to variable length coding: Huffman coding
• Variable to block length coding: Tunstall coding
• Variable to variable length coding: Arithmetic codes, Lempel-Ziv codes
• Block to block length coding: Typical sequences
• Asymptotic Equipartition Property
• Coding for sources with memory
• Channel capacity
• Differential Entropy
• Gaussian Channel
• Rate Distortion Theory
• Network Information Theory

What to expect from the course:


We will try to answer a few questions: What are the fundamental limits of
communication? At what rate can we reliably communicate over a channel? What
are the fundamental limits of data compression, for example the minimum number
of bits required to represent a source under lossy compression? We will also
take a look at practical source compression algorithms and some of the
mathematical techniques behind them.
Information Theory answers two fundamental questions in communications
i) What is the ultimate data compression? Given a source, what is the
minimum number of bits required to represent a source.
ii) What is the ultimate transmission rate? The maximum rate at which
we can transmit over a communication channel and still be able to
reliably decode the message.

So, let us first intuitively try to understand: what is information?


For example, a flip of a coin with two heads does not convey any information,
because it is a biased coin and will always come up heads.
A source producing successive digits of π: 3, 1, 4, 1, 5, 9, 2, 6 also does not
convey any information, as there is no uncertainty in the source.
Shannon's information theory regards only those symbols as information
that are not predictable.

We know that all communication systems include these basic steps.

i) Encoding a message at its source.

ii) Transmitting that message through a communication medium.

iii) Decoding the message at its destination.

So, information theory answers two fundamental questions in communication,


number one: what is the ultimate data compression that you can achieve? Given a
source, what is the minimum number of bits required to represent it? The other
question it answers is: what is the maximum rate at which we can transmit over
a communication channel while still being able to reliably decode the message
at the receiver?
Before we go into the block diagram of a digital communication system, let us
first intuitively try to understand what information is. I will give you an
example: let us say you have a biased coin, one that has heads on both sides. I
flip this coin and ask you to guess the outcome. Does this flip convey any
information to you? The answer is no. Why? Because it is a biased coin and you
know both sides are heads, so no matter how I flip it, I will always get heads.
My flipping of the coin does not convey any information because you know
beforehand that you will get heads. If this had been an unbiased coin, with
heads on one side and tails on the other, then flipping the coin would have
conveyed information, because you do not know whether heads or tails is going
to come up.
Now let us look at a source that produces the digits 3, 1, 4, 1, 5, 9, 2, and
6. Does this convey information? Again, these are basically the digits of the
number π. So if I give you these numbers, you know I am simply giving you a
representation of π, and this again does not convey any information to you. Why
does it not convey any information? Because there is no uncertainty in the
source; the outcome of this experiment is known. So Shannon's information
theory regards only those symbols as containing information that are not
predictable. As in the earlier example, if you have an unbiased coin, or say an
unbiased die, then rolling the die and asking you to guess the number conveys
information, because you do not know whether the roll of an unbiased die will
give 1, 2, 3, 4, 5, or 6.
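
This intuition is exactly what entropy quantifies: the two-headed coin carries zero uncertainty, a fair coin carries one bit per flip, and a fair die carries log2 6, about 2.58 bits per roll. A minimal sketch:

import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0]))        # 0.0    -- two-headed coin: no information
print(entropy([0.5, 0.5]))   # 1.0    -- fair coin: one bit per flip
print(entropy([1/6] * 6))    # ~2.585 -- fair die: log2(6) bits per roll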

As the first step of a digital communication system, we will take an example to
illustrate how we encode information. A bag contains 50% black balls, 25% red
balls, 12.5% blue balls, and 12.5% green balls. You randomly pick a ball from
the bag and want to convey information about its color.
Simple encoding (dumb way!): black = 00, red = 01, blue = 10, green = 11. An
average of 2.0 bits/color.
Smart way? We use the statistical structure of the source to represent its
output efficiently: black = 0, red = 10, blue = 110, green = 111. An average of
1.75 bits/color. If we are interested in source compression, we try to
represent outcomes that occur more frequently with fewer bits and outcomes that
occur less frequently with more bits.
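
A quick numerical check of the two codes above (a sketch using the ball probabilities given; the dictionary names are arbitrary):

import math

probs = {"black": 0.5, "red": 0.25, "blue": 0.125, "green": 0.125}
dumb  = {"black": "00", "red": "01", "blue": "10",  "green": "11"}
smart = {"black": "0",  "red": "10", "blue": "110", "green": "111"}

def avg_length(code):
    """Expected codeword length in bits per ball color."""
    return sum(probs[c] * len(code[c]) for c in probs)

entropy = -sum(p * math.log2(p) for p in probs.values())
print(avg_length(dumb))   # 2.0 bits/color
print(avg_length(smart))  # 1.75 bits/color
print(entropy)            # 1.75 bits -- the smart code meets the entropy limit here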

Can you figure out the colors of the balls from the sequence 0110100111?
Black, blue, red, black, green. As it is a prefix-free code, we are able to
make out when a particular codeword ends.
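
Because the code is prefix-free, a decoder can work greedily, emitting a color as soon as the accumulated bits match a codeword. A minimal sketch, assuming the smart code above:

code = {"black": "0", "red": "10", "blue": "110", "green": "111"}
inverse = {bits: color for color, bits in code.items()}

def decode(bitstring):
    """Greedy prefix-free decoding: emit a color as soon as a codeword matches."""
    out, current = [], ""
    for bit in bitstring:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

print(decode("0110100111"))  # ['black', 'blue', 'red', 'black', 'green']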
Main principle of data compression: “Only infrmatn esentil to understnd mst
b tranmitd.”

Next we will explain what a channel is and what channel capacity is.

The transmission medium in a communication system is known as the channel. In
his landmark 1948 paper, “A Mathematical Theory of Communication”, published in
the Bell System Technical Journal, Shannon introduced the concept of channel
capacity. The channel capacity is a measure of the amount of information that
can be conveyed between the input X and the output Y of a channel. In his
celebrated noisy channel coding theorem, Shannon proved the existence of
channel coding schemes that can achieve an arbitrarily low error probability as
long as the information is transmitted across the channel at a rate less than
the channel capacity, C. Example: if the channel capacity of a particular
communication link is, say, 2 Gbps, we can communicate over this channel at any
desired rate less than 2 Gbps and achieve arbitrarily low error rates.
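
One of the simplest channel models we will meet is the binary symmetric channel with crossover probability p, whose capacity is C = 1 - H(p) bits per channel use. A small sketch; the example values of p are arbitrary.

import math

def binary_entropy(p):
    """H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

print(bsc_capacity(0.0))   # 1.0   -- noiseless binary channel
print(bsc_capacity(0.11))  # ~0.5  -- half a bit per channel use
print(bsc_capacity(0.5))   # 0.0   -- output independent of input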

Let us look at the block diagram of a communication system.

[Block diagram: on the transmitter side, the Information Source feeds the
Source Encoder, Encryption, and the Channel Encoder, which maps the information
sequence u to the codeword v; Digital Modulation then sends the waveform over
the Channel. On the receiver side, the received sequence r passes through
Digital Demodulation and the Channel Decoder, which produces the estimate û;
Decryption and the Source Decoder then deliver the reconstruction to the Sink.]

In the above diagram, the source encoder tries to represent the information
source in a compact fashion, then we encrypt the bits, and the channel encoder
adds controlled redundancy that enables us to detect and correct errors. We
then modulate the information and transmit symbols through the channel, and we
reverse all of these steps at the receiver.
So let us quickly run through the various blocks of a digital communication
system.
• Source Coding: To minimize the number of bits per unit time required to
represent the source output. This process is known as source coding or data
compression. Examples: Huffman coding, Lempel-Ziv algorithm. The output of the
source encoder is referred to as the information sequence.
• Encryption: To make the transmission of the source bits secure. The process
of converting the source bits (message text) into a stream that looks like
meaningless random bits of data (cipher text) is known as encryption.
Examples: Data Encryption Standard (DES), RSA system.
• Channel Coding: To correct transmission errors introduced by the channel. The
process of introducing some redundant bits into a sequence of information bits
in a controlled manner to correct transmission errors is known as channel
coding or error control coding. Examples: repetition codes, Reed-Solomon codes,
CRC codes. The encoded sequence at the output of the channel encoder is
referred to as a codeword.
• Modulation: To map the codewords into waveforms which are then
transmitted over the physical medium known as the channel. Examples:
Phase shift keying (PSK), quadrature amplitude modulation (QAM).

• Channel: The physical transmission medium; it can be wireless or
wireline. It corrupts transmitted waveforms due to various effects such
as noise, interference, fading, and multipath transmission. Examples:
Binary erasure channel (BEC), Additive white Gaussian noise (AWGN)
channel.

• Demodulation: To convert the received noisy waveform into a sequence of bits,
which is an estimate of the transmitted data bits. This is known as hard
demodulation. If the demodulator outputs are unquantized (or have more than two
quantization levels), this is known as soft demodulation. Soft demodulation
offers a significant improvement over hard demodulation.

• Channel Decoding: To estimate the information bits û and correct the
transmission errors. If û ≠ u, decoding errors have occurred. The performance
of the channel decoder is usually measured by the bit error rate (BER) or the
frame error rate (FER) of the decoded information sequence. The BER is defined
as the expected number of information bit decoding errors per decoded
information bit. The coded sequences can be broken up into blocks of data
frames; a frame error occurs if any information bit in that data frame is in
error. The decoded FER is the percentage of frames in error (a small numerical
sketch of these two quantities follows this list).

• Decryption: To recover the plain text from the cipher text with the help of a
key. It is in the key that the security of a modern cipher lies, not in the
details of the cipher.

• Source Decoding: To reconstruct the original source bits from the decoded
information sequence. Due to channel errors, the final reconstructed signal may
be distorted.
So, to conclude: in this course we are going to concentrate on the source
encoder and decoder parts of the digital communication system, and on the
transmission medium, to characterize what sort of data rates we can achieve
over a communication link.
