COSC1003/1903 Information Theory: Joseph Lizier
Information Theory
Joseph Lizier
Guest lecturer
Reference texts
Outline
1 Introduction
2 Entropy
3 Other measures
4 Sample applications
5 Summary
What is information?
How do natural systems process information? (Image from Cover and Thomas (1991))
X is a random variable
A variable whose value is subject to chance.
i.e. an answer/signal/measurement
e.g. result of a coin flip, whether it rains today, etc.
x is a sample or outcome or measurement of X
drawn from some discrete alphabet $\mathcal{A}_X = \{x_1, x_2, \ldots\}$
For binary $X$, $\mathcal{A}_X = \{0, 1\}$
For a coin toss, $\mathcal{A}_X = \{\text{heads}, \text{tails}\}$
For hair colour in Guess who?: $\mathcal{A}_X = \{?\}$
We have the PDF defined: $p(x) = \Pr\{X = x\}$, $x \in \mathcal{A}_X$
$0 \le p(x) \le 1, \; \forall x \in \mathcal{A}_X$
$\sum_{x \in \mathcal{A}_X} p(x) = 1$
We'll show later how this is a unique form ...
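As a minimal illustration (my own code, not from the slides; the toy alphabet and probabilities are made up), a discrete distribution can be stored as a dictionary and the two conditions above checked directly:

```python
# Toy PMF over a small discrete alphabet, stored as a dict: x -> p(x).
p = {"heads": 0.5, "tails": 0.5}

# Check the two conditions above: 0 <= p(x) <= 1 for all x, and the probabilities sum to 1.
assert all(0.0 <= px <= 1.0 for px in p.values())
assert abs(sum(p.values()) - 1.0) < 1e-12
```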
Shannon information content:
$h(x) = \log_2 \frac{1}{p(x)} = -\log_2(p(x))$
[Figure: plot of $h(x) = -\log_2 p(x)$ against $p(x)$ over $0 \le p(x) \le 1$]
Examples:
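For instance, a minimal sketch (my own illustrative code, not from the slides; the helper name h is an arbitrary choice) evaluating the formula for a couple of outcome probabilities:

```python
from math import log2

def h(p_x):
    """Shannon information content, in bits, of an outcome with probability p_x."""
    return -log2(p_x)

print(h(0.5))    # 1.0 bit  -- a fair coin flip
print(h(1 / 8))  # 3.0 bits -- a less probable outcome is more surprising
```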
(Shannon) entropy
$H(X) = \langle h(x) \rangle = -\sum_{x \in \mathcal{A}_X} p(x) \log_2 p(x)$
Expectation value of the Shannon information content
$p \log p = 0$ in the limit as $p \to 0$
Examples:
If $\exists x, p(x) = 1 \rightarrow H(X) = 0$.
For binary $X$, $p(0) = p(1) = 0.5 \rightarrow H(X) = 1$ bit.
$p(x) = 1/|\mathcal{A}_X|, \forall x \rightarrow H(X) = \log_2(|\mathcal{A}_X|)$ bits.
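As a hedged sketch (my own code, not from the slides; the helper name entropy is an arbitrary choice), the three examples above can be checked numerically, skipping $p(x) = 0$ terms per the limit convention stated above:

```python
from math import log2

def entropy(probs):
    """Shannon entropy H(X) in bits for a list of probabilities p(x).

    Terms with p(x) = 0 are skipped, using the convention p log p -> 0 as p -> 0.
    """
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))   # 0.0 : a certain outcome carries no information
print(entropy([0.5, 0.5]))   # 1.0 : fair binary variable, 1 bit
print(entropy([0.25] * 4))   # 2.0 : uniform over |A_X| = 4 gives log2(4) bits
```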
What has information theory ever done for me? zip files, mp3s, encoding for mobile telecoms / ADSL, etc.
How to determine the coding to use is a discussion for another time ...
Joint entropy
$H(X,Y) = -\sum_{x \in \mathcal{A}_X} \sum_{y \in \mathcal{A}_Y} p(x,y) \log_2 p(x,y)$
The uncertainty in the pair $(X,Y)$ considered together.
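A minimal sketch (my own illustrative code, not from the slides; joint_entropy is an arbitrary name) computing H(X,Y) from a joint PMF stored as a dict keyed by (x, y) pairs:

```python
from math import log2

def joint_entropy(p_xy):
    """Joint entropy H(X,Y) in bits from a dict mapping (x, y) -> p(x, y)."""
    return -sum(p * log2(p) for p in p_xy.values() if p > 0)

# Two independent fair coins: H(X,Y) = H(X) + H(Y) = 2 bits.
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(joint_entropy(p_xy))  # 2.0
```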
Conditional entropy
$H(X \mid Y) = H(X,Y) - H(Y)$
The average uncertainty that remains in $X$ once $Y$ is known.
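A self-contained sketch (again my own illustration) using the identity above, H(X|Y) = H(X,Y) − H(Y):

```python
from collections import defaultdict
from math import log2

def conditional_entropy(p_xy):
    """H(X|Y) = H(X,Y) - H(Y), in bits, from a dict mapping (x, y) -> p(x, y)."""
    h_xy = -sum(p * log2(p) for p in p_xy.values() if p > 0)
    # Marginal p(y), obtained by summing the joint distribution over x.
    p_y = defaultdict(float)
    for (x, y), p in p_xy.items():
        p_y[y] += p
    h_y = -sum(p * log2(p) for p in p_y.values() if p > 0)
    return h_xy - h_y

# X copies Y exactly: once Y is known there is no uncertainty left in X.
print(conditional_entropy({(0, 0): 0.5, (1, 1): 0.5}))  # 0.0
```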
Example 1:
[Venn-style diagram: H(X,Y) split into H(X|Y) and H(Y|X), with overlapping circles H(X) and H(Y)]
Coding characters in English text – what variable Y would drop H(X), and therefore the code length, for a conditional encoding of incoming character X?
Context of the previous character(s) Y changes the probability of the next character X – Markov chains.
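To make this concrete, a rough sketch (my own illustration, with a tiny made-up text sample) estimating H(X) for single characters and H(X|Y) with Y the previous character; on a long enough English sample the conditional estimate comes out lower, which is exactly what a conditional (Markov) encoding exploits:

```python
from collections import Counter
from math import log2

def entropy_from_counts(counts):
    """Entropy in bits of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

text = "the theory of information measures the surprise in the text"  # toy sample

# H(X): entropy of single characters.
h_x = entropy_from_counts(Counter(text))

# H(X,Y): entropy of (previous, next) character pairs, so H(X|Y) = H(X,Y) - H(Y).
h_xy = entropy_from_counts(Counter(zip(text[:-1], text[1:])))
h_y = entropy_from_counts(Counter(text[:-1]))

print(f"H(X) = {h_x:.2f} bits, H(X|previous) = {h_xy - h_y:.2f} bits")
```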
Mutual information
$I(X;Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X,Y)$
The reduction in uncertainty about $X$ from learning $Y$ (symmetric in $X$ and $Y$).
[Venn-style diagram: I(X;Y) is the overlap of the circles H(X) and H(Y)]
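A self-contained sketch (my own illustrative code, not from the slides) computing I(X;Y) from a joint PMF via the identity above:

```python
from collections import defaultdict
from math import log2

def _entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(p_xy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, from a dict (x, y) -> p(x, y)."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        p_x[x] += p
        p_y[y] += p
    return _entropy(p_x.values()) + _entropy(p_y.values()) - _entropy(p_xy.values())

# Y is a perfect copy of X: the variables share one full bit.
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))  # 1.0
# X and Y are independent fair coins: they share no information.
print(mutual_information({(x, y): 0.25 for x in (0, 1) for y in (0, 1)}))  # 0.0
```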
Sample applications
[Figure: application plots; recoverable labels include α, γ+, γ−, γ0 and regions lM1, lSMA, lPMD, rPMD, lSPL, rSPL, V1, rBG, lSC, rSC, rCer]
[Figure: application plot; recoverable labels include Tm → m(G,C) and δ A_m(G,C)]
References
T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, 1991.