
COMP101: Foundations of Information Systems

Grant Dick
Department of Information Science

Lecture 02: Information Theory
Before we start …

• Labs start this week:
  • OBS 1.18 (Otago Business School building), or
  • Arts CAL (Arts Building) (Thursdays)
  • See your timetable on eVision

• First lab assessment this week:
  • Each lab has a small assessment activity or sign-off
  • Best 10 of 11 contribute to internal assessment (20%)
  • Minimum of 6 assessments completed for hurdle

2
Class Reps

• For a class of ~140 students, it would be great to have one rep from each lab stream (i.e., 3-4 reps)

• If interested, please let me know (we can do the registration for you!)

5
Goals for today

• Information theory – what is “information”, and how do we measure information quantity?

• Examples of Information Theory:
  • Sizing data requirements
  • Estimating required effort
  • Estimating compression efficiency
  • Creating decision rules from data

6
Recap from last lecture:

• I’m booking a table at a restaurant, and the waiter has asked “How many guests?”
  (note: it’s a loud restaurant, so I answer with a gesture instead of just talking)

• My response is:
  a) “Table for 7, please?”
  b) “Table for 25, please?”
  c) “Table for 17, please?”
  d) “Table for 127, please?”

That gesture could be anything!
7
Data is just symbols

These symbols can be used to represent things, but context is needed!
(Image source: Wikimedia Commons)
So, what is Information?

(Pictured: Claude Shannon)
Information

• Very hard to define without using tautology!

From OED:

Information

• “Knowledge communicated concerning some particular fact, subject, or event; that of which one is apprised or told; intelligence, news.” (OED)

• Keywords: knowledge, communicate, insight, fact, input.

• Information forms the input into decision making
11
Information as “States”

• A source of information takes on different values/concepts.

• Each value is a unique state for that information source

• Note: a “value” is independent of its representation (e.g., 100₁₀, C, 1100100₂, and 百 all represent the same information state: one hundred)
12
Information as “Surprise”

• If information provides insight into decision making/understanding, then we can measure the value/importance/size of information in terms of its “Surprise”

• A rare state is usually surprising, and usually also informative

• Therefore, the Surprise of an information source is measured in terms of the (inverse) frequency of occurrence of its states
13
Information as Surprise

• A common state (high $p(x)$) is not informative, so Surprise is low

• A deterministic state ($p(x) = 1$) is completely uninformative, so Surprise is zero

• The Surprise of an impossible state ($p(x) = 0$) is undefined
14
The bit – smallest unit of information

• Information must have variation:
  • i.e., more than one state

• The smallest number of states is therefore 2:
  • Presence or absence
  • On or Off
  • Yes or No
  • 1 or 0

• Shannon gave this unit a special name: the bit (binary digit)
15
Representing Multiple States in
Bits
• A bit can represent two states (0, 1)

• To represent multiple states, we simply chain together more bits:
  • e.g., to represent four states: (00, 01, 10, 11); six states: (000, 001, 010, 011, 100, 101), …
  • We’ll elaborate on this in lectures 17-19 (representations)

• We therefore use the bit as the basic unit of information (i.e., we “size” a source of information as the number of bits required to represent all of its states)
16
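As a quick illustration of the chaining idea above, here is a minimal Python sketch (my own, not from the slides) that enumerates the patterns k bits can take; k bits always give 2^k distinct states.

```python
# Minimal sketch: chaining k bits yields 2**k distinct patterns (states).
from itertools import product

for k in (1, 2, 3):
    patterns = ["".join(bits) for bits in product("01", repeat=k)]
    print(k, "bits ->", len(patterns), "states:", patterns)
# 1 bits -> 2 states: ['0', '1']
# 2 bits -> 4 states: ['00', '01', '10', '11']
# 3 bits -> 8 states: ['000', '001', '010', '011', '100', '101', '110', '111']
```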
“Sizing” information
Just how many bits are required?

Examples:
Days of the week
Degrees offered by the university
Age of an individual

17
Size as “Expected Surprise”

• For a source of information with $n$ states, we can measure the surprise of each state $i$ (occurring with probability $p_i$) as:

  $s_i = \log_2(1/p_i) = -\log_2 p_i$

• The expected (average) surprise that we get from this source is then:

  $H = \sum_{i=1}^{n} p_i \log_2(1/p_i) = -\sum_{i=1}^{n} p_i \log_2 p_i$

• We call this value the Shannon Entropy, measured in bits

18
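A small Python sketch of the entropy formula above; the function name shannon_entropy is my own choice, not something defined in the lecture.

```python
# Shannon entropy: H = -sum(p * log2(p)) over all states, in bits.
import math

def shannon_entropy(probs):
    """Expected surprise of a source whose state probabilities are given in probs."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # a fair coin: 1.0 bit
print(shannon_entropy([1.0]))       # a deterministic source: 0.0 bits
```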
Example:

• Imagine that we encounter weather states with the following probabilities:

  State     i    p_i
  Sunny     1    0.20
  Raining   2    0.20
  Foggy     3    0.10
  Cloudy    4    0.45
  Frosty    5    0.05

• The entropy of this system is therefore:

  $H = -(0.20\log_2 0.20 + 0.20\log_2 0.20 + 0.10\log_2 0.10 + 0.45\log_2 0.45 + 0.05\log_2 0.05) \approx 1.995$ bits
19
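A quick check of the weather example, using the probabilities from the table above (a small sketch; rounding to three decimal places is my choice).

```python
# Entropy of the weather source from the table above.
import math

weather = {"Sunny": 0.20, "Raining": 0.20, "Foggy": 0.10, "Cloudy": 0.45, "Frosty": 0.05}
H = -sum(p * math.log2(p) for p in weather.values())
print(round(H, 3))  # 1.995 bits
```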
Equal Probability States

• Shannon Entropy is maximised when states are equally-likely (when $p_i = 1/n$ for all $i$); in these cases, we typically simplify to:

  $H = \log_2 n$
21
Example: Password Characters

• Assume (unrealistically!) that each character in a password can be A-Z, a-z, 0-9, _, or a space with equal probability (i.e., random password).

• Each symbol equates to one of $26 + 26 + 10 + 1 + 1 = 64$ states

• Therefore, each character in a password contributes $\log_2 64 = 6$ bits of entropy.

22
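The same counting in Python, as a small sketch (variable names are mine).

```python
# Password alphabet from the slide: A-Z, a-z, 0-9, underscore, space.
import math

n_states = 26 + 26 + 10 + 1 + 1
bits_per_char = math.log2(n_states)
print(n_states, bits_per_char)  # 64 6.0
```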
Uses of Entropy

• Storage Requirements

• Estimating Computational Effort

• Decision Rules in Machine Learning

• As a basis for compression

23
Storage: How many bits are needed?

• Often we need to work out how many bits will be required to store a value in memory, on disk, etc.

• Entropy defines the lower bound on the number of required bits

• But we can’t store fractions of a bit, so we round up to the nearest whole number:

  $\text{bits required} = \lceil H \rceil$  (for $n$ equally-likely states, $\lceil \log_2 n \rceil$)

24
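A minimal sketch of the rounding rule above. The day-of-week count comes from the earlier “sizing” examples; the count of degrees (150) is a made-up placeholder, not a real figure.

```python
# Round log2(n) up to the nearest whole bit.
import math

def bits_needed(n_states):
    """Smallest whole number of bits that can distinguish n_states values."""
    return math.ceil(math.log2(n_states))

print(bits_needed(7))    # days of the week -> 3 bits
print(bits_needed(150))  # hypothetical count of degrees offered -> 8 bits
```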
Computational Effort

• Entropy defines an exponential scale of work:
  • Every additional bit of entropy doubles the number of states in the system.

• We can therefore use entropy to measure the difficulty of, or required work for, a problem

• e.g., passwords: every extra bit of password entropy (for a given length, e.g., by allowing more characters) doubles the effort required to brute-force the password.
25
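A small sketch of the doubling argument, reusing the 6-bits-per-character figure from the password slide; the 8-character length is an assumption for illustration.

```python
# Brute-force search space grows as 2**(total entropy); one extra bit doubles it.
length = 8             # assumed password length
bits_per_char = 6      # from the 64-symbol alphabet above
total_bits = length * bits_per_char
print(2 ** total_bits)        # 281474976710656 candidate passwords (2**48)
print(2 ** (total_bits + 1))  # one more bit of entropy -> twice as many
```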
Decision Rules

• Decisions are often made by “divide and conquer”:
  • Partition the problem into smaller (simpler) sets
  • Use the average outcome of the smaller sets to inform the decision (majority rule)

• How do we systematically identify the best “splitting rule”?

• Entropy! (more accurately, entropy loss)

26
Entropy Loss (Information Gain!)

• Recall that a “pure” (no diversity) system has zero entropy, while a perfectly mixed system has maximum entropy

• To test a splitting rule:
  • Measure the entropy of the whole set
  • “Split” the whole set into two (or more) subsets
  • Compute the entropy of each subset and combine these (weighted by subset size)
  • Measure the difference in entropy before and after the split
27
If we have a lot of candidate splitting rules, pick the one that produces the largest difference!

This forms the basis of a method called decision tree learning! (A small code sketch of the gain calculation follows below.)


28
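A sketch of the procedure above in Python; the helper names entropy and information_gain are my own, and this is not the exact code used in the course.

```python
# Entropy before a split minus the size-weighted entropy after it ("information gain").
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

def information_gain(labels, subsets):
    """How much entropy a candidate split removes; bigger is better."""
    n = len(labels)
    after = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(labels) - after

# Toy usage with made-up labels: a perfect split removes the full 1 bit.
whole = ["Yes", "Yes", "No", "No"]
print(information_gain(whole, [["Yes", "Yes"], ["No", "No"]]))  # 1.0
```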
Example (n.b., not assessed!)

• Predicting whether a person will default on a loan based on the following data:
default student balance income
No No 266 60183
Yes Yes 1551 19028
Yes No 1666 25055
No Yes 435 14813
Yes No 1320 36549
No No 1639 30625
Yes Yes 2039 12182
No No 0 62129
No No 566 39109
Yes No 1586 52275

• Entropy of “default” is: $-(0.5\log_2 0.5 + 0.5\log_2 0.5) = 1$ bit (5 “Yes”, 5 “No”)

• Let’s try splitting on “student = No” …
29
Example

• Predicting whether a person will default on a loan based on the following data:
Subset “student = No” (7 records):
default student
No   No
Yes  No
Yes  No
No   No
No   No
No   No
Yes  No

Subset “student = Yes” (3 records):
default student
Yes  Yes
No   Yes
Yes  Yes

• Entropy before: 1 bit

• Entropy after: 0.7 × 0.985 + 0.3 × 0.918 = 0.965 bits (weights are the subset proportions, 7/10 and 3/10)

• Difference: 0.035 bits (not much!)
30
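Re-doing the split arithmetic above as a quick sketch (H here is a binary-entropy helper of my own; 3/7 and 1/3 are the “Yes” proportions in the two subsets).

```python
# Weighted entropy after the "student = No" split, and the resulting gain.
import math

def H(p):
    """Binary entropy (bits) of a Yes/No mix with Yes-proportion p."""
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

after = (7 / 10) * H(3 / 7) + (3 / 10) * H(1 / 3)
print(round(after, 3), round(1 - after, 3))  # 0.965 0.035
```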
Example

• Predicting whether a person will default on a loan based on the following data:
default student balance income
No No 266 60183
Yes Yes 1551 19028
Yes No 1666 25055
No Yes 435 14813
Yes No 1320 36549
No No 1639 30625
Yes Yes 2039 12182
No No 0 62129
No No 566 39109
Yes No 1586 52275

• Let’s try splitting on “balance ≤ 1000” …


31
Example

• Predicting whether a person will default on a loan based on the following data:
Subset “balance ≤ 1000” (4 records):
default balance
No   266
No   435
No   0
No   566

Subset “balance > 1000” (6 records):
default balance
Yes  1551
Yes  1666
Yes  1320
No   1639
Yes  2039
Yes  1586

• Entropy before: 1 bit

• Entropy after: 0.4 × 0.0 + 0.6 × 0.650 = 0.390 bits (weights are the subset proportions, 4/10 and 6/10)

• Difference: 0.610 bits (much better!)
32
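Comparing the two candidate rules from the slides above: pick whichever removes the most entropy. The dict labels are mine; the numbers come from the two worked examples.

```python
# Gain = entropy before (1 bit) minus weighted entropy after each split.
gains = {"student = No": 1 - 0.965, "balance <= 1000": 1 - 0.390}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))  # balance <= 1000 0.61
```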
Further splitting makes for potentially even better predictions!

Splitting rule:
If balance < 943 or balance between 1612.5 and 1652.5, then default is “No”, otherwise default is “Yes”

33
Compression

• Entropy defines the number of bits required to perfectly represent (encode) a system:
  • part of Shannon's source coding theorem
  • Naïve encodings rarely achieve this efficiency

• Efficiency of an encoding can be measured as:

  $\text{efficiency} = \dfrac{H}{\text{average code length (bits per symbol)}}$

• Example …

34
Efficiency of naïve encoding of states

• Consider our earlier example (states of weather):

  State     i    p_i     Naïve Code
  Sunny     1    0.20    000
  Raining   2    0.20    001
  Foggy     3    0.10    010
  Cloudy    4    0.45    011
  Frosty    5    0.05    100

• Efficiency of this 3-bit encoding is: $1.995 / 3 \approx 66.5\%$
35
Huffman Coding

• Entropy and probability are closely related

• Theory: give each state a potentially different length (number of bits) to encode

• More frequent states get shorter codes

• Encoding generated by building a Huffman tree.
36
Building a Huffman tree

Can use the following algorithm to build a tree:
1. Start with an empty set T
2. Add all the symbols to set T
3. While there are multiple "trees" in the set:
a. Remove the lowest probability tree from set T (break ties in
terms of smaller tree size), and call this tree A
b. Remove the lowest probability tree from set T (break ties in
terms of smaller tree size), and call this tree B
c. Make a new tree C by joining A and B, and set the
probability of C to p(A) + p(B)
d. Add the new tree C to set T
4. Return the only tree in set T as the Huffman tree
37
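A compact Python sketch of the algorithm above, using a heap as the set T. Note that ties here are broken by insertion order rather than by the “smaller tree” rule on the slide, so individual code lengths can differ from slide 39, although the average code length for the weather example still comes out at 2.05 bits.

```python
# Build Huffman codes by repeatedly joining the two lowest-probability trees.
import heapq

def huffman_codes(probs):
    """probs: dict of symbol -> probability. Returns dict of symbol -> bit-string code."""
    # Each heap entry is (probability, tie-breaker, {symbol: partial code}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p_a, _, a = heapq.heappop(heap)   # tree A: lowest probability
        p_b, _, b = heapq.heappop(heap)   # tree B: next lowest
        # Join A and B into C: A's codes gain a leading "0", B's a leading "1".
        c = {sym: "0" + code for sym, code in a.items()}
        c.update({sym: "1" + code for sym, code in b.items()})
        counter += 1
        heapq.heappush(heap, (p_a + p_b, counter, c))
    return heap[0][2]

codes = huffman_codes({"Sunny": 0.20, "Raining": 0.20, "Foggy": 0.10,
                       "Cloudy": 0.45, "Frosty": 0.05})
print(codes)  # Cloudy gets the shortest code; Foggy and Frosty the longest
```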
Example: building a Huffman
tree

Using: p(Sunny)=0.2, p(Raining)=0.2, p(Foggy)=0.1, p(Cloudy)=0.45, p(Frosty)=0.05
38
From Tree to Code

State     p_i     Huffman Code
Sunny     0.20    01
Raining   0.20    000
Foggy     0.10    0010
Cloudy    0.45    1
Frosty    0.05    0011

39
Efficiency of Huffman Code

• Efficiency is computed as before, using the average code length:
  • Average = 0.2×2 + 0.2×3 + 0.1×4 + 0.45×1 + 0.05×4 = 2.05 bits

  State     i    p_i     Naïve Code    Huffman Code
  Sunny     1    0.20    000           01
  Raining   2    0.20    001           000
  Foggy     3    0.10    010           0010
  Cloudy    4    0.45    011           1
  Frosty    5    0.05    100           0011

• Efficiency is therefore $1.995 / 2.05 \approx 97.3\%$ (compared with 66.5% for the naïve 3-bit code)
40
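Checking both efficiencies from the table above as a small sketch (entropy ≈ 1.995 bits; the Huffman code lengths are taken from the table on this slide).

```python
# Efficiency = entropy / average code length, for the naive (3-bit) and Huffman codes.
import math

p = {"Sunny": 0.20, "Raining": 0.20, "Foggy": 0.10, "Cloudy": 0.45, "Frosty": 0.05}
huff_len = {"Sunny": 2, "Raining": 3, "Foggy": 4, "Cloudy": 1, "Frosty": 4}

H = -sum(q * math.log2(q) for q in p.values())   # ~1.995 bits
avg_huff = sum(p[s] * huff_len[s] for s in p)    # 2.05 bits
print(round(H / 3, 3), round(H / avg_huff, 3))   # 0.665 (naive) vs 0.973 (Huffman)
```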
A Huffman tree for Wikipedia!

Generated by scraping text from 10000 Wikipedia pages and computing frequencies

• The code from this tree has an efficiency of 99.4% (a naïve 6-bit encoding is 73.8%)

41
Remark

• Huffman codes are rarely used in practice, but variants of the approach are used in many places, e.g.:
  • ZIP compression, text compression, JPEG

42
In this week’s lab

• Compute the entropy of a system

• Build a Huffman tree and measure the efficiency of its corresponding coding

• Use a Huffman code to encode and decode signals

• Compare Huffman codes to Scrabble scores and Morse code (you should see something interesting here!)
43
Thanks!
Questions?
