
Foundations of Information Processing

Information and data compression

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 1


Considerations about information

Consideration 1:
How can information be defined?

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 2


Consideration 2:

Does information have value?

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 3


Information is processed data,
which usually forms
the foundation of knowledge.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 4


Information: from analog to discrete signals
Brookshear, J.G. Computer Science - An overview, 7th ed. Addison Wesley, 2003

• Data vs. information vs. knowledge.


• Real-world signals: data content/information and preserving it.
• Digital (time-dependent) signals and analog-to-digital converters.
• Conversion from a continuous signal to a discrete one:
– The Swedish-born American electronics engineer Harry Nyquist (Yale U,
Bell Labs) proposed the sampling condition Fs > 2 ⋅ Fmax,
where Fs/2 is called the Nyquist frequency.
• The sampling frequency must be more than
twice the maximum frequency of the signal.
• Aliasing: if sampling is not frequent enough,
frequencies higher than the Nyquist frequency Fs/2
appear as lower frequencies,
i.e., as false aliases (see the figure).
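A minimal sketch of aliasing (an addition to the slide; it assumes NumPy, which is not part of the course material): a 9 Hz cosine sampled at only 12 Hz violates Fs > 2 ⋅ Fmax, and its samples coincide with those of a 3 Hz cosine.

```python
# Minimal aliasing sketch (assumes NumPy): a 9 Hz cosine sampled at fs = 12 Hz
# violates fs > 2*Fmax and folds to |9 - 12| = 3 Hz.
import numpy as np

fs = 12.0                               # sampling frequency (Hz), too low for a 9 Hz signal
t = np.arange(12) / fs                  # one second of sample instants
x_true = np.cos(2 * np.pi * 9.0 * t)    # samples of the real 9 Hz signal
x_alias = np.cos(2 * np.pi * 3.0 * t)   # samples of a 3 Hz signal at the same instants
print(np.allclose(x_true, x_alias))     # True: the two are indistinguishable after sampling
```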

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 5


Information: quantization
Brookshear, J.G. Computer Science - An overview, 7th ed. Addison Wesley, 2003

• Quantization: from continuous signal values to discrete ones.

• Quantization noise: related to the sampling frequency and the resolution of the signal values.
• The difference between an input value and its quantized value (such
as a rounding error) is referred to as quantization error.
• This rounding from a continuous signal to a discrete signal is done with a
selected accuracy (resolution).
• See the figure:
• How frequently to sample?
• How accurately to define a discrete value?
=> quantization noise
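As an illustration (not from the slides; it assumes NumPy and a uniform 3-bit quantizer over the range [-1, 1]), the quantization error of each sample stays within half a quantization step:

```python
# Quantization-error sketch: round each sample to the nearest of 2**3 = 8 levels.
import numpy as np

fs = 100.0                              # sampling frequency (Hz)
t = np.arange(0.0, 1.0, 1.0 / fs)       # sample instants
x = np.sin(2 * np.pi * t)               # continuous-amplitude samples in [-1, 1]

bits = 3                                # resolution of the quantizer
step = 2.0 / 2 ** bits                  # quantization step over the range [-1, 1]
x_q = np.round(x / step) * step         # quantized (rounded) values

error = x - x_q                         # quantization error = input minus quantized value
print(np.max(np.abs(error)) <= step / 2)   # True: the error is bounded by half a step
```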

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 6


Shannon’s theory of communication
• Information:
– Processed data: information content.
– Probability of the result: probability of the event.
• Communication between two parties was modeled by the American
mathematician and engineer Claude Shannon (MIT, Bell Labs), the father of
information theory, in A Mathematical Theory of Communication, 1948
(later republished as The Mathematical Theory of Communication).
• The signal is sent through a channel.

The paper introduced
a new unit of information,
the bit (= binary digit).
Analog signal
=> Digital signal

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 7


Information content
Information content in the event A:

$$ i(A) = \log_b \frac{1}{P(A)} = -\log_b P(A) $$

where

• P(A) = the probability of the event A.

• b = the base of the logarithmic function.


• The base b corresponds to the character set of the information unit
(b = 2 gives bits), i.e., to how many characters/events are available.
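A small sketch of the formula (standard Python only; the function name is ours, not from the slides):

```python
# Information content i(A) = log2(1 / P(A)) in bits (base b = 2).
import math

def information_content_bits(p):
    """Information content, in bits, of an event with probability p."""
    return math.log2(1.0 / p)

print(information_content_bits(1.0))    # 0.0 bits: a certain event carries no information
print(information_content_bits(0.5))    # 1.0 bit
print(information_content_bits(0.125))  # 3.0 bits: rarer events carry more information
```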

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 8


Weather forecast: information content
The Atacama Desert is an extremely dry region in Chile and Peru:
• A local record: no rain in 400 years. The whole desert: 40 years.
• The probability that tomorrow will be dry is 100 % → P(dry) = 1.
• The probability of rain is then P(rain) = 0.
• Two possible events, one of which occurs → base b = 2.
• Information content that it is dry:

$$ i(\text{dry}) = \log_2 \frac{1}{P(\text{dry})} = \log_2 \frac{1}{1} = \log_2 1 = \log_2 2^0 = 0 $$

Lappeenranta:
• The probability that tomorrow will be dry is 50 % → P(dry) = 0.5.
• Two possible events, one of which occurs → base b = 2.
• Information content that it is dry:

$$ i(\text{dry}) = \log_2 \frac{1}{P(\text{dry})} = \log_2 \frac{1}{0.5} = \log_2 2 = \log_2 2^1 = 1 $$

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 9


Entropy
• The average level of information over the outcomes of a random variable X.
• The shortest average message length, in bits, needed to transfer the
information.
• Shannon entropy:
– Fair coin flipping (heads or tails): 1 bit/toss
– Always a head (or always a tail): 0 bits/toss
• The entropy is the average number of bits
needed to encode one symbol.
• This defines the minimum channel capacity required for reliable
(lossless) binary data transfer.
• For more detailed information, see, for example:
https://www.youtube.com/watch?v=YtebGVx-Fxw

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 10


Entropy
• The weighted average of information of symbols:

$$ H(X) = \sum_{i=1}^{n} p(x_i) \log_b \frac{1}{p(x_i)} = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i) $$
• Example: coin flipping
$$ p(x_1) = p(x_2) = \frac{1}{2} $$

$$ H(X) = -\left( \tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2} \right) = -\left( -\tfrac{1}{2} - \tfrac{1}{2} \right) = 1.0 $$

→ One bit is enough to encode: for example, heads (bit 1) and tails (bit 0).
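A minimal sketch of the entropy formula (standard Python only; not from the slides), checked with the coin-flipping cases above:

```python
# Shannon entropy H(X) = -sum_i p(x_i) * log2 p(x_i), in bits/symbol.
import math

def entropy(probabilities):
    """Average information in bits/symbol; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit/toss for a fair coin
print(entropy([0.9, 0.1]))   # ~0.47 bits/toss for a heavily biased coin
```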

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 11


What, on average, is the minimum number
of bits required to represent one character?
• A language contains four characters: #,%,@,?
• The probabilities to appear in the language:
P(#)=1/2, P(%)=1/4, P(@)=1/8, P(?)=1/8
• At least how many bits are needed, on average, to represent one character?

$$ H(X) = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i) $$

$$ = -\left( \tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8} \right) $$

$$ = -\left( \tfrac{1}{2}\cdot(-1) + \tfrac{1}{4}\cdot(-2) + \tfrac{1}{8}\cdot(-3) + \tfrac{1}{8}\cdot(-3) \right) = 1.75 \text{ bits} $$

• How should the characters be encoded?


→ See the next slides about data compression.
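A quick check of the 1.75 bits/character result above (standard Python only):

```python
# Entropy of the four-character language #, %, @, ?.
import math

probs = {"#": 1 / 2, "%": 1 / 4, "@": 1 / 8, "?": 1 / 8}
print(-sum(p * math.log2(p) for p in probs.values()))   # 1.75
```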

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 12


Considerations about data compression

Consideration 1:
Does information contain redundancy?

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 13


Consideration 2:

Could information be stored


in a more compressed way
by encoding it again?

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 14


Consideration 3:

What is
the most compressed representation
for
the best encoding?

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 15


The amount of digital information is increasing
very rapidly due to
digitalization and
image, video, and audio applications.

The representation of data can be changed
by encoding it again.

By encoding information efficiently,
the need for memory can be reduced.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 16


Data compression
Brookshear, J.G. Computer Science - An overview, 7th ed. Addison Wesley, 2003

• Motivation for data compression.


• Lossless compression.
• Lossy compression.
• Fixed-length code.
• Variable-length code.
• Data compression methods.
• Examples, especially Huffman coding.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 17


Motivation: why to save bits?
• Compressed data needs less space in transfer and storage
⇒ increased speed and
less memory capacity needed
⇒ more robust performance in general.
• In most cases data can be compressed without affecting
the information contained in the data:
• the information is not lost at all (lossless),
• or not “too much” of it is lost (near-lossless).
• Entropy defines the limit on
how many bits/symbol, on average,
are needed at minimum to encode the information.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 18


Encoding strings into bits
• For each symbol in the sequence (for example, a character in a string
or a letter in text), a code word of its own is selected, bounded
by the following equation:

$$ \frac{N}{L} \geq \log_b K $$

N is the length of the code word
L is the length of the string to be encoded
b is the size of the code alphabet
K is the size of the alphabet to be encoded
• Example:
When the alphabet contains K = 8 characters which are encoded into
a string of length L = 4, and the code alphabet contains 0 and 1
so that its size is b = 2, then

$$ N \geq L \log_b K = 4 \log_2 8 = 4 \log_2 2^3 = 4 \cdot 3 = 12 \text{ bits} $$

are needed for encoding.
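A small check of the bound (standard Python only; the helper name is ours, not from the slides):

```python
# Minimum number of bits N for a string of L symbols from a K-symbol alphabet (b = 2).
import math

def min_code_bits(L, K):
    return math.ceil(L * math.log2(K))

print(min_code_bits(L=4, K=8))    # 12 bits, as in the example above
print(min_code_bits(L=5, K=26))   # 24 bits, used for Wenglish on a later slide
```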

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 19


Approaches for compression
• Compression ratio: how much encoding compresses.
• Fixed-length/variable-length code.
• Lossless compression:
– The original source information is not lost or changed so
it can be restored to its original form.
– The compression ratio is limited.
– What defines this limit?

• Lossy compression:
– The original source information is partly lost so it cannot
be restored to its original form.
– The compression ratio is better.
– Is too much information lost?

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 20


Gray code
• Frank Gray, a physicist (Purdue University,
Bell Labs), invented the code in 1947;
it was named the Gray code in 1954.
• A binary representation in which the ordering of
the binary numeral system is such that two
successive values differ in only one bit.
• Originally developed for electromechanical
switches.
• Mainly related to error correction in digital
communications where the parity is checked:
– the number of 1s in the string is odd => parity 1,
– the number of 1s is even => parity 0.
• In the Gray code, neighboring codes always
have different parity.

The 3-bit Gray code:

7   1 0 0
6   1 0 1
5   1 1 1
4   1 1 0
3   0 1 0
2   0 1 1
1   0 0 1
0   0 0 0
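A minimal sketch for generating the table (standard Python only; it uses the common binary-to-Gray mapping g = i XOR (i >> 1), which is not spelled out on the slide):

```python
# n-bit reflected Gray code: successive codes differ in exactly one bit.
def gray_code(n_bits):
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]

codes = gray_code(3)
for i, g in enumerate(codes):
    print(i, format(g, "03b"))       # reproduces the 3-bit table above (in ascending order)

# Check the defining property: neighboring codes differ in exactly one bit position.
print(all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:])))   # True
```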

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 21


Run-length encoding (RLE)
• The binary representation of a string consists of runs of ones (1)
and zeros (0).
• Thus, let us store the runs of data as counts of the same value in a
row.
• In practice, in the binary numeral system, the number of consecutive
bits of the same value (1 or 0) is stored in the encoding.
• The first number in the encoding is the count of 1s.
• If the first bit in the string is 0, the first number is set to 0,
and the next number is then the count of 0s.
• Example (the runs of 1s and 0s and their counts are shown in colors on
the original slide):

00011010100001111100010111001111 →

032111145311324
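A minimal sketch of this encoding (standard Python only), reproducing the example:

```python
# Run-length encoding: counts of consecutive equal bits, starting with the count of 1s
# (a leading 0 is emitted if the string starts with 0, as described above).
from itertools import groupby

def rle_encode(bits):
    counts = [len(list(run)) for _, run in groupby(bits)]
    if bits.startswith("0"):
        counts.insert(0, 0)          # first number = count of 1s, which is zero here
    return counts

print(rle_encode("00011010100001111100010111001111"))
# [0, 3, 2, 1, 1, 1, 1, 4, 5, 3, 1, 1, 3, 2, 4]  ->  032111145311324
```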

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 22


Fixed-length encoding
• Lossless compression.
• Each word (or character) to be encoded has a bit pattern of
its own.
• The number of bits in the pattern is $N \geq L \log_b K$.
• Wenglish consists of 5-character words with the alphabet a-z (in
total 26 different characters).
• See http://www.inference.org.uk/mackay/itprnn/ps/65.86.pdf
where the cited figure 2.1 can be found at
http://www.inference.org.uk/mackay/itprnn/ps/22.40.pdf
• Wenglish is a kind of English, but it contains
only words of 5 characters.
• How many bits per word are needed to encode this language
so that every word has its own bit pattern?

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 23


How many bits/word?
• The alphabet contains 26 characters (from a to z)
=> the size of the alphabet K = 26.
• The characters are coded as words of 5 characters
=> the length of the string to be encoded L = 5.
• The code alphabet contains the bits 1 and 0
=> the size of the code alphabet b = 2.

$$ N \geq L \log_b K \;\Rightarrow\; N \geq 5 \cdot \log_2 26 \;\Rightarrow\; N \geq 23.5 \text{ bits} $$

Thus, Wenglish would require 24 bits/word.


• In a computer this would require 4 bytes (a long integer).
• One byte is 8 bits.
• There is no standard 3-byte integer representation in a computer.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 24


Fixed-length encoding: lossy compression
• Lossy compression:
– Bit patterns of their own only for the most common words.

– Rare words do not get bit patterns of their own.

– A calculated risk of losing information in order to increase data
compression.
• There are 32000 typically used words in Wenglish, so let these
words have individual bit patterns of their own, and represent the rest
of the words with 8000 shared patterns.
– How many bits/word with 32000 + 8000 = 40000 patterns?

$$ N \geq \log_2 40000 \;\Rightarrow\; N \geq 15.29 $$

• 16 bits/word => 2 bytes in a computer.

• Thus, 2 bytes instead of 4 bytes as in the previous example.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 25


Variable-length codes: lossless compression
• Huffman code:
- David Huffman, American computer scientist, MIT, 1952, UC Santa Cruz.
- The most frequent characters are encoded with the shortest bit patterns.
• LZW code:
- Lempel, Ziv, Welch, 1978/1984, Technion, Israel & MIT.
- The most frequent strings are saved in a table and given codes that are
as short as possible.
• Q code:
- Whole sentences are encoded with fixed-length code words.
- QRA = What ship or coast station is that?
• Arithmetic coding:
• Rather than separating the input into component symbols and
replacing each with a code, arithmetic coding encodes the entire
message into a single number, an arbitrary-precision fraction q with
0.0 ≤ q < 1.0.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 26


Huffman coding
The estimated probability or frequency of occurrence of the symbols
must be known in order to use this compression algorithm:
1. Add the probabilities of the two least probable symbols and
place the sum in the list instead of them.
2. Repeat until only two probabilities are left:
a) 0 is assigned as a code word to one of them,
b) 1 is assigned as a code word to the other one.
3. Working back through the merges, append 0 and 1 to the right of the
code words of the united probabilities.

Next, let us see examples.
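Before the worked examples, a minimal sketch of the algorithm (standard Python with heapq rather than the tabular procedure used on the slides; ties may be broken differently, so the 0/1 labels can differ from the slides while the code lengths stay optimal):

```python
# Huffman coding: repeatedly merge the two least probable entries and
# prepend a 0/1 bit to the codes of the symbols under each merged entry.
import heapq
from itertools import count

def huffman_code(probabilities):
    """probabilities: dict symbol -> probability. Returns dict symbol -> bit string."""
    if len(probabilities) == 1:
        return {s: "0" for s in probabilities}
    order = count()                                  # tie-breaker for equal probabilities
    codes = {symbol: "" for symbol in probabilities}
    heap = [(p, next(order), (s,)) for s, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, syms0 = heapq.heappop(heap)           # least probable entry -> bit 0
        p1, _, syms1 = heapq.heappop(heap)           # second least probable -> bit 1
        for s in syms0:
            codes[s] = "0" + codes[s]
        for s in syms1:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (p0 + p1, next(order), syms0 + syms1))
    return codes

# The distribution of the first example on the next slide:
print(huffman_code({"s1": 0.40, "s2": 0.20, "s3": 0.15, "s4": 0.15, "s5": 0.10}))
```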

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 27


Example: Huffman code
P(si) = the estimated probability of occurrence of the symbol si

Source reduction (each column lists the sorted probabilities after merging
the two smallest ones of the previous column):

si    P(si)   step 1   step 2   step 3
s1    0.40    0.40     0.40     0.60
s2    0.20    0.25     0.35     0.40
s3    0.15    0.20     0.25
s4    0.15    0.15
s5    0.10

Code assignment (working backwards through the reductions):

si    code    step 1   step 2   step 3
s1    1       1        1        0
s2    000     01       00       1
s3    001     000      01
s4    010     001
s5    011

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 28


Example: what is theoretically
the best possible compression?
• Entropy sets the limit for optimal encoding (bits/symbol on average):

$$ H(X) = \sum_{i=1}^{n} p(x_i) \log_b \frac{1}{p(x_i)} $$

$$ = 0.4 \log_2 \tfrac{1}{0.4} + 0.2 \log_2 \tfrac{1}{0.2} + 0.15 \log_2 \tfrac{1}{0.15} + 0.15 \log_2 \tfrac{1}{0.15} + 0.10 \log_2 \tfrac{1}{0.10} $$

$$ \approx 2.1464 \text{ bits/symbol} $$
• This is the best possible compression theoretically.
• Note that, without compression, directly encoding 5 distinct symbols requires 3
bits/symbol (you cannot distinguish 5 symbols with only 2 bits):
s1 000
s2 001
s3 010
s4 011
s5 100

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 29


Example: how efficient is our encoding?
• What is the length in bits of a message of 100 Huffman-coded
symbols in our example?

P(si)                       Decoding example:
0.40  s1  1                 Encoded:  10011
0.20  s2  000               Decoding: 1 | 001 | 1
0.15  s3  001               Decoded:  s1 s3 s1
0.15  s4  010
0.10  s5  011

• 40 times the symbol s1 (code 1), 20 times s2 (code 000), etc.
• The total length is 40 ⋅ 1 + (20 + 15 + 15 + 10) ⋅ 3 = 220 bits,
i.e., 2.2 bits/symbol on average.
• So only a little more than the best possible according to the entropy:
100 ⋅ 2.1464 ≈ 215 bits (rounded upwards).
• Without compression: 3 ∗ 100 = 300 bits.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 30


Huffman code: another example

Source reduction:

si    P(si)   step 1   step 2   step 3
s1    0.55    0.55     0.55     0.55
s2    0.25    0.25     0.25     0.45
s3    0.15    0.15     0.20
s4    0.03    0.05
s5    0.02

Code assignment:

si    code    step 1   step 2   step 3
s1    0       0        0        0
s2    10      10       10       1
s3    110     110      11
s4    1110    111
s5    1111

• The probabilities affect the representation of each symbol in bits,
i.e., the bit patterns change according to the probabilities.
• Is the average length longer than in the previous example (compute)?
• What happens if all symbols are equally probable?
=> All code words become (nearly) equally long and Huffman coding gives
little or no benefit, since it is based on differences in probabilities.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 31


Differential pulse-code modulation (DPCM)

Only the difference to the neighboring value is stored
(187 and its neighbors are highlighted as an example on the original slide).

Original values:

187 187 176 199 176 187 187
187 176 199 172 199 176 187
176 199 172 172 172 199 176
199 172 172 178 172 172 199

Stored differences (row-major scan; the first value is stored as such,
after that: previous value minus current value):

187   0  11 -23  23 -11   0
  0  11 -23  27 -27  23 -11
 11 -23  27   0   0 -27  23
-23  27   0  -6   6   0 -27
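A minimal sketch of the idea (standard Python only), using the same sign convention as the table above, i.e., previous value minus current value:

```python
# DPCM: store the first value as such, then only differences to the previous value.
def dpcm_encode(values):
    return [values[0]] + [values[i - 1] - values[i] for i in range(1, len(values))]

def dpcm_decode(diffs):
    values = [diffs[0]]
    for d in diffs[1:]:
        values.append(values[-1] - d)        # invert: current = previous - difference
    return values

row = [187, 187, 176, 199, 176, 187, 187]    # first row of the example block
encoded = dpcm_encode(row)
print(encoded)                               # [187, 0, 11, -23, 23, -11, 0]
print(dpcm_decode(encoded) == row)           # True: the difference step itself is lossless
```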

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 32


Lossy compression
• Motivation:
• A better compression ratio is achieved by allowing a loss of information.
• For example, transform coding.
• Example: Discrete Fourier Transform (DFT):
• An image transformed from the spatial domain to the frequency
domain describes the image frequencies (details in the image).
• The original image can be fully restored from the transformed
image based on the frequency information.
• Trick: the less frequency information is stored, the better the
compression ratio (and the more of the original information is lost).
• In practice, the Discrete Cosine Transform (DCT) is applied since
it is efficient to implement.
• Next, an example shows how the data can be compressed.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 33


Example: Fourier Transform for compression
• An image (on the left) transformed to the frequency domain (on the right).
• The DC component is in the middle (the origin).
• The compression ratio and the loss of information depend on how many
frequency values are stored:
• The larger the circle around the origin, the less loss of information.
• The smaller the circle, the more compression.
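A minimal sketch of the idea (it assumes NumPy; random data stands in for the image, and the circular mask radius is an arbitrary choice):

```python
# Keep only the frequency coefficients inside a circle around the DC component,
# then reconstruct an approximation with the inverse transform.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))                 # stand-in for a grayscale image

F = np.fft.fftshift(np.fft.fft2(image))      # frequency domain, DC component in the centre
rows, cols = image.shape
y, x = np.ogrid[:rows, :cols]
radius = 16                                  # larger radius -> less loss, less compression
mask = (y - rows // 2) ** 2 + (x - cols // 2) ** 2 <= radius ** 2

approx = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))   # lossy reconstruction

print("coefficients kept:", int(mask.sum()), "of", rows * cols)
print("mean absolute error:", float(np.mean(np.abs(image - approx))))
```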

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 34


Summary
• Information is processed data, which usually forms the foundation of
knowledge.

• The entropy in information theory is the average number of
bits needed to encode one symbol.

• Encoded information can be compressed (by encoding it again)
when the data elements contain redundant information.

• Compression can be lossless (no original information is lost) or
lossy.

• The entropy defines the lower bound on the number of bits
per data element for optimal encoding.

©LUT BM40A0102 Prof. Heikki Kälviäinen & Prof. Lasse Lensu 35
