
COSC1003/1903

Information Theory

Joseph Lizier

Lecture 15 | Monday, 14 September 2015



Guest lecturer

Dr. Joseph Lizier


Senior Lecturer
Complex Systems Research Group
School of Civil Engineering
Rm 338A Civil Eng Building
[email protected]


Reference texts

• T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, New York, 1991.
  • see ch. 2
• D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, 2003.
  • see ch. 2 and 8
  • download at http://www.inference.phy.cam.ac.uk/itprnn/book.html


Outline

1 Introduction to information theory

2 Entropy: fundamental quantity of information theory

3 Other measures

4 Sample applications

5 Summary


What is information?

• You tell me ...


A game about information: Guess who?

1. Let’s talk about the rules


A game about information: Guess who?

2. Who wants to play?


A game about information: Guess who?

3. What did we learn from the game:


1 What are the best/worst questions to ask or strategies?

2 What types of information did we encounter?


What is Information Theory?

• An approach to quantitatively capture the notion of information.
• Traditionally, information theory provides answers to two fundamental questions (Cover and Thomas, 1991):
  1 What is the ultimate data compression?
    • How small can I zip up a file?
  2 What is the ultimate transmission rate of communication?
    • What is my max download speed at home?


What is Information Theory?

It’s also about far more than these traditional areas:

How do natural systems process information? → [Image from Cover and Thomas (1991)]


Defining information: first pass

JL: “Information is all about questions and answers”

Information is the amount by which
• one variable (an answer/signal/measurement)
• reduces our uncertainty or surprises us
• about another variable.

This was quantified by Claude Shannon (1948)


Quantifying information: preliminaries

• X is a random variable
  • A variable whose value is subject to chance.
  • i.e. an answer/signal/measurement
  • e.g. result of a coin flip, whether it rains today, etc.
• x is a sample or outcome or measurement of X
  • drawn from some discrete alphabet A_X = {x1, x2, ...}
  • For binary X, A_X = {0, 1}
  • For a coin toss, A_X = {heads, tails}
  • For hair colour in Guess who?: A_X = {?}
• We have PDF defined: p(x) = Pr{X = x}, x ∈ A_X
  • 0 ≤ p(x) ≤ 1, ∀ x ∈ A_X
  • Σ_{x∈A_X} p(x) = 1


Shannon information content

• The fundamental quantity of information theory
• Shannon information content¹ of a sample or outcome x:

  h(x) = log₂(1/p(x))

• Units are bits for log in base 2.
• Best thought of as a measure of surprise at the value of this sample or outcome x given p(x):
  • No surprise if there is only ever one outcome p(x) = 1;
  • There is always some level of surprise if there exists more than one outcome with p(x) > 0;
  • Our surprise increases as x becomes less likely.

¹ We’ll show later how this is a unique form ...


Shannon information content

• Shannon information content of a sample or outcome x:

  h(x) = log₂(1/p(x)) = − log₂(p(x))

  [Plot: h(x) versus p(x), for p(x) from 0 to 1]

• Examples (checked in the sketch below):
  • h(heads) for a fair coin? 1 bit
  • h(1) for a 6-sided die (p(1) = 1/6)? 2.58 bits
  • h(not 1) for a 6-sided die (p(not 1) = 5/6)? 0.26 bits
  • h(1) for a 20-sided die (p(1) = 1/20)? 4.32 bits
  • h(not 1) for a 20-sided die (p(not 1) = 19/20)? 0.07 bits
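A minimal Python sketch reproducing these example values from p(x) (standard library only; function and label names are illustrative):

  from math import log2

  def information_content(p):
      # Shannon information content h(x) = -log2(p(x)), in bits, of an outcome with probability p
      if not 0 < p <= 1:
          raise ValueError("p must lie in (0, 1]")
      return -log2(p)

  # The coin and dice examples above:
  for label, p in [("heads, fair coin", 1/2), ("1, 6-sided die", 1/6), ("not 1, 6-sided die", 5/6),
                   ("1, 20-sided die", 1/20), ("not 1, 20-sided die", 19/20)]:
      print(f"h({label}) = {information_content(p):.2f} bits")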


Shannon information content

• Shannon information content of a sample or outcome x:

  h(x) = log₂(1/p(x)) = − log₂(p(x))

• Examples – Guess Who?:
  • h(alex)? log₂(1/(1/24)) = 4.585 bits
  • h(female)? log₂(1/(5/24)) = 2.263 bits
  • h(male)? log₂(1/(19/24)) = 0.337 bits
• Is “female?” a good question to ask first?
• Is “alex?” a good question to ask first?


(Shannon) entropy

• Shannon entropy of a random variable X:

  H(X) = Σ_{x∈A_X} p(x) log₂(1/p(x))
       = − Σ_{x∈A_X} p(x) log₂(p(x))
       = ⟨h(x)⟩

• Expectation value of Shannon information content
• p log p = 0 in the limit as p → 0
• Examples:
  • If ∃x, p(x) = 1 → H(X) = 0.
  • For binary X, p(0) = p(1) = 0.5 → H(X) = 1 bit.
  • p(x) = 1/|A_X|, ∀x → H(X) = log₂(|A_X|) bits.


(Shannon) entropy

• Shannon entropy of a random variable X:

  H(X) = Σ_{x∈A_X} p(x) log₂(1/p(x))
       = − Σ_{x∈A_X} p(x) log₂(p(x))
       = ⟨h(x)⟩

• Examples – Guess Who?:
  • H(who)? 4.585 bits
  • H(sex)? (5/24) × 2.263 + (19/24) × 0.337 = 0.738 bits (checked in the sketch below)
  • Is “female?” a good question to ask first?
  • Is “alex?” a good question to ask first?
  • What is the best question to ask first, and why?
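These two values can be checked directly from the probabilities; a quick Python sketch (assuming the 24-character board with 5 females, as above):

  from math import log2

  # H(who): 24 equally likely characters; H(sex): 5 female vs 19 male
  H_who = -sum(1/24 * log2(1/24) for _ in range(24))
  H_sex = -(5/24) * log2(5/24) - (19/24) * log2(19/24)
  print(round(H_who, 3), round(H_sex, 3))   # 4.585 0.738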


(Shannon) entropy – coding it

• Shannon entropy of a random variable X:

  H(X) = − Σ_{x∈A_X} p(x) log₂(p(x))

• Let’s code it (a sketch follows below):
  • For a binary X with p1 = p(X = 1):
    • H(X) = −p1 log₂(p1) − (1 − p1) log₂(1 − p1)
    • What does H(X) look like as a function of p(X = 1)?
  • For a general discrete A_X:
    1 Input? Take p(x) as a vector
    2 How to sum over x?
    3 What are some possible error conditions here?
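One possible answer to these questions, as a minimal Python/NumPy sketch (the function name and checks are illustrative, not prescribed):

  import numpy as np

  def entropy(p):
      # Shannon entropy, in bits, of a discrete distribution given as a vector of probabilities
      p = np.asarray(p, dtype=float)
      # Possible error conditions: negative entries, or probabilities that do not sum to 1
      if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
          raise ValueError("p must be non-negative and sum to 1")
      nz = p[p > 0]                      # convention: p*log2(p) -> 0 as p -> 0
      return float(-np.sum(nz * np.log2(nz)))

  # Binary entropy as a function of p1 = p(X = 1): peaks at 1 bit when p1 = 0.5
  for p1 in (0.1, 0.3, 0.5, 0.9):
      print(p1, entropy([1 - p1, p1]))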


What does it mean, and traditional usage

Using an optimal compression or encoding scheme given p(x):


• h(x) is the number of bits for a symbol to communicate x
• H(X) is the number of bits to communicate the x on average.

Or (in bits): how few yes/no questions could I need to ask (on average) to determine the value of x?

Think about Guess who? as a decoding task


What does it mean, and traditional usage

Using an optimal compression or encoding scheme given p(x):


• h(x) is the number of bits for a symbol to communicate x
• H(X) is the number of bits to communicate the x on average.

Or (in bits): how few yes/no questions could I need to ask (on average) to determine the value of x?

[Image from Shannon (1948)]

What has information theory ever done for me? zip files, mp3s, encoding mobile telecoms / ADSL etc.

What does it mean, and traditional usage

Using an optimal compression or encoding scheme given p(x):


• h(x) is the number of bits for a symbol to communicate x
• H(X) is the number of bits to communicate the x on average.

Example: say we want to communicate the result of a horse race with four horses {a, b, c, d}:
• How many bits to encode each outcome?
  • Assume p(x) = 0.25, ∀x, to give 2 bits – the max. entropy assumption
• If p(a) = 0.5, p(b) = 0.25, p(c) = p(d) = 0.125?
  • h(x) tells us to use 1 bit for a (say “0”), 2 bits for b (say “10”) and 3 bits for c and d (say “110” and “111”); H(X) = 1.75 bits (checked in the sketch below).
• Using the actual p(x) leads to more efficient coding
• Information is not about meaning
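A quick Python check of the 1.75-bit figure against the average length of the code given above (standard library only):

  from math import log2

  p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
  code = {"a": "0", "b": "10", "c": "110", "d": "111"}

  H = -sum(px * log2(px) for px in p.values())
  avg_len = sum(p[x] * len(code[x]) for x in p)
  print(H, avg_len)   # both are 1.75 bits: this code meets the entropy limit exactly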

Entropy of text and compression

• Think about coding letters in English language text
  • Can we get any insights into how many bits to use for each letter?²
  • Look at entropy of alphabet in MacKay (2003)
• Meaning of a non-integer number of bits:
  • Encoding one sample at a time can only be done with an integer number of bits
  • To reach the lower limits suggested by information theory, we would need to use block coding (i.e. encoding multiple samples together)

² How to determine the coding to use is a discussion for another time ...

Joint entropy

• We can consider joint entropy of a multivariate, e.g. {X, Y}:

  H(X, Y) = Σ_{x∈A_X} Σ_{y∈A_Y} p(x, y) log₂(1/p(x, y))

• Is H(X, Y) = H(X) + H(Y)?
  • Only for independent variables, where p(x, y) = p(x)p(y)!
• Can you think about how to code H(X, Y)? (a sketch follows below)
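One way to code it, sketched in Python/NumPy: take p(x, y) as a 2-D array and sum over both dimensions (the example arrays are illustrative):

  import numpy as np

  def joint_entropy(p_xy):
      # Joint Shannon entropy H(X, Y) in bits, from a 2-D array of joint probabilities p(x, y)
      p = np.asarray(p_xy, dtype=float)
      nz = p[p > 0]
      return float(-np.sum(nz * np.log2(nz)))

  # Independent case: p(x, y) = p(x)p(y), so H(X, Y) = H(X) + H(Y) = 2 bits
  print(joint_entropy([[0.25, 0.25], [0.25, 0.25]]))
  # Dependent case: H(X, Y) < H(X) + H(Y)
  print(joint_entropy([[0.4, 0.1], [0.1, 0.4]]))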


Aside: Shannon entropy – derivation

• Shannon entropy of a random variable X:

  H(X) = − Σ_{x∈A_X} p(x) log₂(p(x))

• Is a unique form that satisfies three axioms (Ash, 1965; Shannon, 1948):
  • Continuity w.r.t. p(x)
  • Monotony – H(X) ↑ as |A_X| ↑, for p(x) = 1/|A_X|
  • Grouping – For independent variables X and Y, H(X, Y) = H(X) + H(Y)


Conditional entropy

• What if we already know something about X – how does that change the surprise?
• Conditional entropy: (average) surprise remaining about X if we already know the value of Y:

  H(X | Y) = H(X, Y) − H(Y)
           = Σ_{x∈A_X} Σ_{y∈A_Y} p(x, y) log₂(1/p(x | y))

  [Venn diagram: circles H(X) and H(Y) within H(X, Y), showing the parts H(X|Y) and H(Y|X)]

• 0 ≤ H(X | Y) ≤ H(X)
• H(X | Y) = H(X) iff X and Y are independent
• H(X | Y) = 0 means there is no surprise left about X once we know Y (see the sketch below)
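A minimal NumPy sketch of H(X | Y) = H(X, Y) − H(Y), computed from a joint probability matrix (the numbers are illustrative):

  import numpy as np

  def entropy_bits(p):
      p = np.asarray(p, dtype=float)
      nz = p[p > 0]
      return float(-np.sum(nz * np.log2(nz)))

  def conditional_entropy(p_xy):
      # H(X | Y) = H(X, Y) - H(Y); rows of p_xy indexed by x, columns by y
      p_xy = np.asarray(p_xy, dtype=float)
      p_y = p_xy.sum(axis=0)             # marginalising over x gives p(y)
      return entropy_bits(p_xy) - entropy_bits(p_y)

  p_xy = np.array([[0.4, 0.1],
                   [0.1, 0.4]])
  print(conditional_entropy(p_xy))       # below H(X) = 1 bit, since Y carries information about X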

Conditional entropy

• Conditional entropy: (average) surprise remaining about X if we already know the value of Y:

  H(X | Y) = H(X, Y) − H(Y)

Example 1:
• Coding characters in English text – what variable Y would drop H(X) and therefore the code length for a conditional encoding of incoming character X?
• Context of previous character(s) Y changes the probability of the next character X – Markov chains (see the sketch below)
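A rough Python sketch of this idea: estimate the per-character entropy of a text sample, and the conditional entropy given the previous character, from empirical counts (the short sample string is just a stand-in for a real corpus):

  from collections import Counter
  from math import log2

  def entropy_of_counts(counts):
      n = sum(counts.values())
      return -sum(c / n * log2(c / n) for c in counts.values())

  text = "the cat sat on the mat and the dog sat on the log"     # stand-in corpus
  H_x = entropy_of_counts(Counter(text[1:]))                     # H(X): next character alone
  H_xy = entropy_of_counts(Counter(zip(text, text[1:])))         # H(Y, X): (previous, next) pairs
  H_y = entropy_of_counts(Counter(text[:-1]))                    # H(Y): previous character
  print(H_x, H_xy - H_y)   # H(X | Y) = H(X, Y) - H(Y) is lower: context reduces the surprise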


Conditional entropy

• Conditional entropy: (average) surprise remaining about X if we already know the value of Y:

  H(X | Y) = H(X, Y) − H(Y)

Example 2:
• Guess who? – how much surprise remains (on average) about X = who given Y = sex?
  1 First H(who, sex) = H(who) = 4.585 bits, because the character contains all information about the sex
  2 Next, H(sex) = 0.738 bits (computed earlier)
  3 So H(who | sex) = 3.847 bits.


Mutual information

• Mutual Information is the reduction in uncertainty or surprise about one variable that we obtain from another:

  I(X; Y) = H(X) + H(Y) − H(X, Y)
          = H(X) − H(X | Y)
          = Σ_{x∈A_X} Σ_{y∈A_Y} p(x, y) log₂( p(x, y) / (p(x)p(y)) )
          = Σ_{x∈A_X} Σ_{y∈A_Y} p(x, y) log₂( p(x | y) / p(x) )

Can anyone smell Bayes rule?


Mutual information

• Mutual Information is the reduction in uncertainty or surprise about one variable that we obtain from another:

  I(X; Y) = H(X) + H(Y) − H(X, Y)
          = H(X) − H(X | Y)
          = Σ_{x∈A_X} Σ_{y∈A_Y} p(x, y) log₂( p(x | y) / p(x) )

  [Venn diagram: I(X;Y) is the overlap of circles H(X) and H(Y), with H(X|Y) and H(Y|X) outside the overlap, all within H(X, Y)]

• 0 ≤ I(X; Y) ≤ min(H(X), H(Y))
• I(X; Y) = 0 iff X and Y are independent
• I(X; Y) = H(X) means there is no surprise left about X once we know Y – i.e. Y tells us all the information about X.


Mutual information

• Mutual Information is the reduction in uncertainty or surprise about one variable that we obtain from another:

  I(X; Y) = H(X) + H(Y) − H(X, Y)
          = H(X) − H(X | Y)
          = Σ_{x∈A_X} Σ_{y∈A_Y} p(x, y) log₂( p(x | y) / p(x) )

• This reflects our earlier definition of information.
• I(X; X) = H(X) is the self-information.
• Entropy and information are complementary quantities.
• MI is a non-linear correlation (a sketch for computing it follows below).
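A minimal NumPy sketch computing I(X; Y) = H(X) + H(Y) − H(X, Y) from a joint probability matrix (the example values are illustrative, not the Guess Who? board):

  import numpy as np

  def entropy_bits(p):
      p = np.asarray(p, dtype=float)
      nz = p[p > 0]
      return float(-np.sum(nz * np.log2(nz)))

  def mutual_information(p_xy):
      # I(X; Y) = H(X) + H(Y) - H(X, Y), from a 2-D array of joint probabilities p(x, y)
      p_xy = np.asarray(p_xy, dtype=float)
      return (entropy_bits(p_xy.sum(axis=1))     # H(X) from the marginal over y
              + entropy_bits(p_xy.sum(axis=0))   # H(Y) from the marginal over x
              - entropy_bits(p_xy))              # H(X, Y)

  print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))   # 0 bits: independent
  print(mutual_information([[0.4, 0.1], [0.1, 0.4]]))       # positive: the variables share information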


Mutual information

I(X; Y) = H(X) + H(Y) − H(X, Y)

[Venn diagram as above]

Example: Guess who?
• I(who; sex) = H(sex) trivially
• I(earrings; sex)?
  1 Construct p(earrings, sex)
  2 Plug in (result is 0.212 bits)
  3 Why is there MI here?


Mutual information

MI is a great model-free tool to:


• detect relationships between variables
• reveal patterns
• show how such relationships and patterns fluctuate in time.


Information theory – sample applications

Feature selection for machine learning


e.g. in Disease diagnosis from breath/urine analysis
• ∼10 000 features available
• use of MI to select which could be used in classification (a sketch follows below)

[Image: eNose from Sensigent (image used under CC-BY-SA license)]
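A toy Python sketch of MI-based feature scoring on discrete data (the data are made up for illustration; real pipelines would estimate MI more carefully, e.g. with bias correction):

  from collections import Counter
  from math import log2

  def entropy_of_counts(counts):
      n = sum(counts.values())
      return -sum(c / n * log2(c / n) for c in counts.values())

  def mutual_information(xs, ys):
      # Plug-in estimate of I(X; Y) in bits from paired discrete samples
      return (entropy_of_counts(Counter(xs)) + entropy_of_counts(Counter(ys))
              - entropy_of_counts(Counter(zip(xs, ys))))

  labels   = [0, 0, 1, 1, 0, 1, 0, 1]
  feature1 = [0, 0, 1, 1, 0, 1, 0, 1]    # tracks the label exactly
  feature2 = [0, 1, 0, 1, 0, 1, 1, 0]    # carries no information about the label
  scores = {name: mutual_information(f, labels)
            for name, f in [("feature1", feature1), ("feature2", feature2)]}
  print(sorted(scores.items(), key=lambda kv: -kv[1]))   # rank candidate features by MI with the label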


Information theory – sample applications

Space-time characterisation of information processing in distributed systems
• Highlight information processing hot-spots;
• Use information processing to explain dynamics.

[Figure: e.g. cellular automata (Lizier, 2014)]


Information theory – sample applications

Analysing information processing in the brain


• Localise response to a given stimulus;
• Revealing directed brain network structures;
• Inferring differences in information processing under cognitive task or condition.

[Figure: directed brain network structure; Lizier et al. (2011), Gómez et al. (2014)]


Information theory – sample applications

Analysing implicit communications


e.g. in robotic soccer matches (Robocup)

[Figure: Relative mid. AIS vs. Relative Transfer from mid. to mid.; Cliff et al. (2014, 2015)]


What you need to know

• Meaning of information, uncertainty and surprise and their relationship
• Meaning of entropy as −Σ p(x) log₂(p(x))
• That entropy tells us about minimal lengths to encode outcomes of a random variable
• That mutual information tells us information (reduction in entropy) conveyed by one variable about another
• How to calculate entropy
• There is lots more to information theory that we didn’t cover (e.g. conditional MI, measures of information processing, continuous variables, etc.)


References I

R. B. Ash. Information Theory. Dover Publications Inc., New York, 1965.


O. M. Cliff, J. T. Lizier, X. R. Wang, P. Wang, O. Obst, and M. Prokopenko.
Towards quantifying interaction networks in a football match. In S. Behnke,
M. Veloso, A. Visser, and R. Xiong, editors, RoboCup 2013: Robot World Cup
XVII, volume 8371 of Lecture Notes in Computer Science, pages 1–12. Springer,
Berlin/Heidelberg, 2014.
O. M. Cliff, J. T. Lizier, X. R. Wang, P. Wang, O. Obst, and M. Prokopenko.
Quantifying Long-Range Interactions and Coherent Structure in Multi-Agent
Dynamics. 2015. under submission.
T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience,
New York, 1991.
C. Gómez, J. T. Lizier, M. Schaum, P. Wollstadt, C. Grützner, P. Uhlhaas, C. M.
Freitag, S. Schlitt, S. Bölte, R. Hornero, and M. Wibral. Reduced predictable
information in brain signals in autism spectrum disorder. Frontiers in
Neuroinformatics, 8:9+, 2014.
J. T. Lizier. Measuring the dynamics of information processing on a local scale in time
and space. In M. Wibral, R. Vicente, and J. T. Lizier, editors, Directed Information
Measures in Neuroscience, Understanding Complex Systems, pages 161–193.
Springer, Berlin/Heidelberg, 2014.


References II

J. T. Lizier, J. Heinzle, A. Horstmann, J.-D. Haynes, and M. Prokopenko. Multivariate


information-theoretic measures reveal directed information structure and task
relevant changes in fMRI connectivity. Journal of Computational Neuroscience, 30
(1):85–107, 2011.
D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge
University Press, Cambridge, 2003.
C. E. Shannon. A mathematical theory of communication. Bell System Technical
Journal, 27(3–4):379–423, 623–656, 1948.
