Information Theory & Coding: "Science is organized knowledge. Wisdom is organized life." - Immanuel Kant
Course Outlines
Elements of set theory and probability theory
Basic concepts of information theory
Measure of information and uncertainty
Entropy
Basic concepts of communication channel organization
Basic principles of encoding
Error-detecting and error-correcting codes
Reference Books
Elements of Information Theory (1st or 2nd edition), Thomas Cover & Joy A. Thomas, John Wiley & Sons
Error Control Coding (First, 1983, or Second, 2004, edition), Shu Lin & Daniel J. Costello Jr., Prentice Hall
Error Correcting Codes, W. Wesley Peterson & E. J. Weldon, Jr., The MIT Press
Fundamentals of Error-Correcting Codes, W. Cary Huffman & Vera Pless, Cambridge University Press
Digital Communications, Bernard Sklar, Prentice Hall
Evaluation
Exams
Midterms
Quizzes, Homework
Final Exam
Grading Criteria
As per the policy of the university
Main Objective of the Course
Some practical contributions of Information Theory
Outline of the way of thinking for this course
The questions answered by Claude Shannon in 1948
Where did these answers lead in the subsequent 63 years?
Course Organization
What we will do
What you should do
Main Objective
Quantify the notion of information in a mathematically and intuitively sound way
Explain how this quantitative measure of information may be used to build efficient solutions to a multitude of engineering problems
Show its far-reaching interest in other fields
Information 1
What does the word information mean?
There is no exact definition; however:
Information carries specific knowledge, which is definitely new to its recipient;
Information is always carried by some specific carrier in different forms (letters, digits, specific symbols, etc.);
Information is meaningful only if the recipient is able to interpret it.
Information 2
Information, when materialized, is a message.
Information is always about something (the size of a parameter, the occurrence of an event, etc.).
Information may be a truth or a lie.
Even disruptive noise used to inhibit the flow of communication is a form of information.
However, generally speaking, the more information the received message carries, the more accurate it is.
IT Contribution 1
Other IT Contributions
The original motivation behind information theory
Claude Shannon (and others), in the first half of the last century, was struggling to formalize the problem of transmitting information over noisy channels.
To understand the problem, he developed a mathematical theory of information to define well-posed problems in this context. Later, this theory turned out to be complete and became known simply as information theory.
The seminal work of Claude Shannon (and others) has irrevocably influenced our way of thinking in the fields of communications, statistics, reasoning under uncertainty, cybernetics, physics, etc.
Claude Shannon
From Wikipedia, the free encyclopedia
Claude Elwood Shannon (April 30, 1916 -
February 24, 2001), an American electronic
engineer and mathematician, is known as the
father of information theory.
Shannon is famous for having founded information theory with one landmark paper published in 1948. But he is also credited with founding both digital computer and digital circuit design theory in 1937, when, as a 21-year-old master's student at MIT, he wrote a thesis demonstrating that electrical applications of Boolean algebra could construct and resolve any logical, numerical relationship.
It has been claimed that this was the most important master's thesis of all time.
Claude Shannon's seminal work in IT
The two (three) main questions posed by Shannon
Basic Probability Theory 1
Measure of uncertainty
Frequency of occurrence
Definition 1
An experiment is an occurrence whose result, or outcome, is uncertain.
Example: Cast a die and observe the number facing up.
Definition 2
The set of all possible outcomes is called the sample space for the experiment.
Example: Outcomes: 1, 2, 3, 4, 5, 6
Sample space: {1, 2, 3, 4, 5, 6}
Basic Probability Theory 2
Definition 3
An event E is any collection of possible outcomes of an experiment, that is, any subset of S (including S itself).
It is not necessary for every set of outcomes to be an event; the events are the sets of outcomes to which one assigns a probability (their elements are the favorable outcomes).
E occurs in a particular experiment if the outcome of that experiment is one of the elements of E.
Example: E: the outcome is even; E = {2, 4, 6}
Basic Probability Theory 3
Estimated Probability
If an experiment is performed N times, and the event E occurs fr(E) times, then the ratio
P(E) = fr(E)/N
is called the relative frequency or estimated probability of E.
The number fr(E) is called the frequency of E.
N is the number of trials or the sample size (see the simulation sketch below).
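A minimal Python sketch of this idea (the function name, seed, and trial count are illustrative choices, not from the slides): repeat the die-casting experiment N times and compute the relative frequency of an event.

import random

def estimated_probability(num_trials=10000, seed=0):
    """Estimate P(E) = fr(E)/N for the event E = 'the die shows an even number'."""
    rng = random.Random(seed)
    freq = 0                          # fr(E): number of times E occurred
    for _ in range(num_trials):       # N repetitions of the experiment
        outcome = rng.randint(1, 6)   # cast a fair die
        if outcome % 2 == 0:
            freq += 1
    return freq / num_trials          # relative frequency = estimated probability

print(estimated_probability())        # close to the theoretical value 0.5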
Basic Probability Theory 4
A probability space is a triplet consisting of Omega (a nonempty set, the sample space), a Borel field/sigma-field B of subsets of Omega, and a probability measure, a countably additive function P from B into [0,1] such that P(Omega) = 1.
A collection B of subsets is called a sigma-field if it satisfies three properties:
The empty set is an element of B
B is closed under complementation
B is closed under countable unions
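For a finite sample space the three properties can be checked mechanically. A small Python sketch (function names are mine; for a finite collection, closure under countable unions reduces to closure under pairwise unions):

from itertools import chain, combinations

def power_set(omega):
    """All subsets of omega, as frozensets."""
    items = list(omega)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

def is_sigma_field(B, omega):
    """Check the three sigma-field properties for a finite collection B of subsets of omega."""
    omega = frozenset(omega)
    B = {frozenset(s) for s in B}
    has_empty = frozenset() in B
    closed_complement = all(omega - s in B for s in B)
    closed_union = all(a | b in B for a in B for b in B)   # pairwise unions suffice here
    return has_empty and closed_complement and closed_union

omega = {1, 2, 3, 4, 5, 6}
print(is_sigma_field(power_set(omega), omega))               # True: the power set is a sigma-field
print(is_sigma_field({frozenset(), frozenset(omega)}, omega))  # True: the trivial sigma-field
print(is_sigma_field({frozenset(), frozenset({1})}, omega))    # False: not closed under complement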
Basic Probability Theory 5
Probability of equally likely outcomes
Discrete uniform probability law
If the sample space S consists of n possible outcomes which are equally likely, then the probability of any event E is
P(E) = (number of favorable outcomes)/(total number of outcomes) = n(E)/n(S)
(see the worked example below)
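A small worked example in Python, assuming the fair-die sample space from the earlier slides (the event chosen here is illustrative):

from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}               # sample space of a single fair die
E = {s for s in S if s >= 5}         # event: the outcome is 5 or 6
print(Fraction(len(E), len(S)))      # P(E) = n(E)/n(S) = 2/6 = 1/3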
Basic Probability Theory 6
Probability distribution (PD)
Everything we say about PDs applies equally to both estimated and theoretical probabilities.
A PD is an assignment of a number P(s_i) to each outcome s_i in a sample space S = {s_1, s_2, ..., s_n} such that the probabilities of all the outcomes sum to 1.
Given the PD, one can obtain the probability of an event E by adding up the probabilities of the outcomes in E.
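A minimal Python sketch of a PD and of computing an event probability by summation (the outcomes and probabilities are purely illustrative):

from fractions import Fraction

pd = {'a': Fraction(1, 2), 'b': Fraction(3, 10), 'c': Fraction(1, 5)}   # an illustrative PD
assert sum(pd.values()) == 1          # the probabilities of the outcomes must sum to 1

def prob(event, pd):
    """P(E): add up the probabilities of the outcomes in E."""
    return sum(pd[s] for s in event)

print(prob({'a', 'c'}, pd))           # 7/10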
Basic Probability Theory 7
If P(E) = 0, the event E is called an impossible event.
Addition principles (see the formulas below):
Mutually exclusive events
General principle
Complement property
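The formulas behind these three bullet points are not spelled out on the slide; the standard identities are:

Mutually exclusive events: P(A ∪ B) = P(A) + P(B) when A ∩ B = ∅
General principle: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Complement property: P(E') = 1 - P(E)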
Basic Probability Theory 8
Conditional Probability
The conditional probability of X given Y, P(X|Y), is the probability that X takes on a particular value given that Y has a particular value.
Independent Events
If X and Y are independent, P(X, Y) = P(X)P(Y)
Note: if X and Y are independent, P(X|Y) = P(X), provided P(Y) is not equal to zero.
Bayes' Theorem (see below)
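Bayes' theorem states that P(X|Y) = P(Y|X) P(X) / P(Y) whenever P(Y) > 0. A minimal Python check on a made-up joint distribution of two binary variables (all numbers are illustrative):

from fractions import Fraction

F = Fraction
p_joint = {('x0', 'y0'): F(3, 10), ('x0', 'y1'): F(2, 10),
           ('x1', 'y0'): F(1, 10), ('x1', 'y1'): F(4, 10)}   # joint PD of X and Y

p_x1 = sum(p for (x, y), p in p_joint.items() if x == 'x1')  # P(X = x1) = 1/2
p_y1 = sum(p for (x, y), p in p_joint.items() if y == 'y1')  # P(Y = y1) = 3/5
p_y1_given_x1 = p_joint[('x1', 'y1')] / p_x1                 # P(Y = y1 | X = x1) = 4/5

# Bayes' theorem: P(X = x1 | Y = y1) = P(Y = y1 | X = x1) P(X = x1) / P(Y = y1)
print(p_y1_given_x1 * p_x1 / p_y1)      # 2/3
print(p_joint[('x1', 'y1')] / p_y1)     # 2/3, computed directly from the joint PD, as a check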
Random Variables
A variable that represents the outcome of an experiment
X represents the value from a roll of a fair die; the probability of rolling n: P(X = n) = 1/6
If the die is loaded so that 2 appears twice as often as the other numbers, P(X = 2) = 2/7 and, for n ≠ 2, P(X = n) = 1/7
Note: P(X) means the specific value of X doesn't matter
Example: all values of X are equiprobable
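A minimal Python sketch of the loaded-die random variable described above, checking that the stated probabilities form a valid distribution:

from fractions import Fraction

# Distribution of X for the loaded die: 2 is twice as likely as each of the other faces.
p_x = {n: (Fraction(2, 7) if n == 2 else Fraction(1, 7)) for n in range(1, 7)}

print(sum(p_x.values()))   # 1, as required of a probability distribution
print(p_x[2], p_x[5])      # 2/7 1/7, matching P(X = 2) and P(X = n) for n ≠ 2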
The measure of uncertainty proposed by Claude Shannon
Entropy of a random variable
(Context: a random experiment seen from the point of view of a particular observer)
(Maths: exploits probability calculus)
NB: other contributors: Kolmogorov, Chaitin, Picard, von Neumann, etc.
The measure of uncertainty proposed by Claude Shannon
Suppose you are only interested in the outcome of a certain binary variable: e.g., will you succeed in this course or not?
You are not sure about the outcome because of a lack of information and because of some sources of (unavoidable) randomness.
But let us consider an extreme case: suppose that you (we) know in advance that everyone passes (a pleasant situation for most people): it means no uncertainty.
Let us consider another (hypothetical) extreme case (less pleasant for most people in the room): assume that we know that everyone will fail: no uncertainty either.
For C. Shannon, and the theory that we are trying to learn in this course, these two situations are perfectly identical, because he (his theory) doesn't take pleasure into account (see the binary entropy sketch below).
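A minimal Python sketch of the binary entropy function that formalizes these extremes (the function name is mine): a certain outcome carries no uncertainty, while a fair 50/50 outcome carries 1 shannon (1 bit).

from math import log2

def binary_entropy(p):
    """Uncertainty, in shannons (bits), of a binary outcome that occurs with probability p."""
    if p in (0.0, 1.0):               # a certain outcome carries no uncertainty
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(1.0))   # 0.0 -> everyone passes: no uncertainty
print(binary_entropy(0.0))   # 0.0 -> everyone fails: no uncertainty
print(binary_entropy(0.5))   # 1.0 -> maximal uncertainty: 1 shannon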
Formalization of the measure of uncertainty
Shannon proposed that the uncertainties of two such unrelated questions should be combined by summing them together.
Thus, for the simultaneous case of two unrelated questions: uncertainty = 2 shannons.
Shannon also asked that the uncertainty should change in a continuous way with the probabilities (and should depend only on these probabilities, because he didn't want to bother about pleasure...).
Consequently, we have (and this is a theorem):
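The formula the theorem refers to is missing from the extracted slide; under the additivity and continuity requirements above, the (essentially unique, up to the base of the logarithm) measure of uncertainty is Shannon's entropy:

H(p_1, ..., p_n) = - sum_i p_i log2 p_i   (measured in shannons, i.e. bits)

and, for two unrelated (independent) questions X and Y: H(X, Y) = H(X) + H(Y).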
Information = the variation of the measure of uncertainty proposed by Claude Shannon
Now consider that you are posing the same two questions, but in a different context, where you already know which students in the class like to eat fish.
What is your uncertainty now?
1 shannon! (only the uncertainty about fail or pass).
Therefore, Shannon proposed that the quantity of information be measured as the change in uncertainty between two contexts: here, the change of context is about whether you know about fish-eating preferences.
We will make this a bit more precise, but for the time being let us say that we have a measure of information, which in the current case is 1 shannon (because, even if we know about the fish preferences, we still have no idea about pass or fail).
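A toy Python illustration of "information = change in uncertainty between two contexts", assuming (as the 2-shannon figure above implies) that both binary questions are fair and unrelated:

from math import log2

def entropy(probs):
    """Uncertainty, in shannons, of a probability distribution over outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Context 1: two unrelated fair binary questions (fish? pass?) -> 4 equally likely joint outcomes.
h_before = entropy([0.25, 0.25, 0.25, 0.25])    # 2 shannons

# Context 2: the fish question is already answered; only pass/fail remains uncertain.
h_after = entropy([0.5, 0.5])                   # 1 shannon

print(h_before - h_after)                       # information brought by the new context: 1.0 shannon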
Where did these answers lead in the subsequent 63 years?
Question one: when we talk about students in context 2, we don't need to say whether they like to eat fish: data compression (remove redundancies).
Question two: when we transmit information over a noisy channel, we are limited in the number of information bits that we can transmit, because the channel "lies" in a random and non-observable fashion (a more subtle notion to be understood later), but we can cope with that by repeating transmissions.
When reasoning about uncertain outcomes (e.g., medical diagnostics, forensics, economics, complex systems), information theory provides a principled way to address a very broad set of questions.