Basics of Compression
Outline
Need for compression and classification of compression algorithms
Basic Coding Concepts
Fixed-length coding and variable-length coding
Compression Ratio
Entropy
1/22/2024
Broad Classification
Entropy Coding (statistical)
lossless; independent of data characteristics
e.g. RLE, Huffman, LZW, Arithmetic coding
Source Coding
lossy; may consider semantics of the data
depends on characteristics of the data
e.g. DCT, DPCM, ADPCM, color model transform
Hybrid Coding (used by most multimedia systems)
combine entropy with source encoding
e.g., JPEG-2000, H.264, MPEG-2, MPEG-4, MPEG-7
Data Compression
Branch of information theory
minimize the amount of information to be transmitted
Transform a sequence of characters into a new string of bits
same information content
length as short as possible
Concepts
Coding (the code) maps source messages from an alphabet (A) into code words (B)
Taxonomy of Codes
Block-block
source messages and code words of fixed length; e.g., ASCII
Block-variable
source messages fixed, code words variable; e.g., Huffman coding
Variable-block
source variable, code word fixed; e.g., RLE
Variable-variable
source variable, code words variable; e.g., Arithmetic
Example of Block-Block
Coding “aa bbb cccc ddddd eeeeee fffffffgggggggg”

  Symbol   Code word
  a        000
  b        001
  c        010
  d        011
  e        100
  f        101
  g        110
  space    111

Requires 120 bits
Example of Variable-Variable
Coding “aa bbb cccc ddddd eeeeee fffffffgggggggg”

  Symbol     Code word
  aa         0
  bbb        1
  cccc       10
  ddddd      11
  eeeeee     100
  fffffff    101
  gggggggg   110
  space      111

Requires 30 bits (don’t forget the spaces)
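The two bit counts above can be verified directly. A minimal sketch, assuming the message and code tables from the slides (the variable names are mine):

```python
from itertools import groupby

# Message and code tables from the two example slides.
message = "aa bbb cccc ddddd eeeeee fffffffgggggggg"

# Block-block: every symbol, spaces included, costs 3 bits.
fixed_bits = 3 * len(message)

# Variable-variable: each run of identical characters is one source message.
var_code = {"aa": "0", "bbb": "1", "cccc": "10", "ddddd": "11",
            "eeeeee": "100", "fffffff": "101", "gggggggg": "110",
            " ": "111"}
runs = ["".join(g) for _, g in groupby(message)]
var_bits = sum(len(var_code[r]) for r in runs)

print(fixed_bits, var_bits)  # 120 30
```

Note that `groupby` splits “fffffffgggggggg” into two runs even though no space separates them, which is why the slide’s total is 30 bits rather than 33.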
Concepts (cont.)
A code is
distinct if each code word can be distinguished from every other (the mapping is one-to-one)
uniquely decodable if every code word is identifiable when immersed in a sequence of code words
e.g., the previous variable-variable code is not uniquely decodable: the message 11 could be decoded as either ddddd or bbbbbb
Static Codes
Mapping is fixed before transmission
each message is represented by the same code word every time it appears in the ensemble
Huffman coding is an example
Dynamic Codes
Mapping changes over time
also referred to as adaptive coding
Attempts to exploit locality of reference
periodic, frequent occurrences of messages
dynamic Huffman is an example
Hybrids?
build set of codes, select based on input
Amount of Compression
How to measure it?
redundancy
compression ratio
Measure of Information
Consider symbols si, each with probability of occurrence p(si)
For fixed-length coding over N symbols, the smallest
number of bits per symbol needed is
L ≥ log2(N) bits per symbol
Example: a message with 5 symbols needs
3 bits per symbol (L ≥ log2 5 ≈ 2.32)
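The bound above rounds up to the next whole bit. A quick sketch (the function name is my own):

```python
import math

# Smallest fixed-length code for an alphabet of N symbols:
# every symbol needs ceil(log2 N) bits.
def fixed_length_bits(n):
    return math.ceil(math.log2(n))

print(fixed_length_bits(5))    # 3, since log2(5) ≈ 2.32
print(fixed_length_bits(256))  # 8
```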
Variable-Length Coding – Entropy
What is the minimum number of bits per symbol?
Answer: Shannon’s result – the theoretical minimum average number of bits per code word is known as the entropy (H):

H = − Σ (i = 1 to n) p(si) log2 p(si)
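The formula translates directly into code. A minimal sketch (the function name is my own):

```python
import math

def entropy(probs):
    """Shannon entropy: H = -sum_i p(s_i) * log2 p(s_i), in bits per symbol.

    Zero-probability symbols contribute nothing (p * log2 p -> 0 as p -> 0).
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))       # 1.0
print(entropy([0.25] * 4))       # 2.0
```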
Entropy Example
Alphabet = {A, B}
p(A) = 0.4; p(B) = 0.6
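A quick worked check for this alphabet (not part of the original slide):

```python
import math

p = {"A": 0.4, "B": 0.6}
H = -sum(v * math.log2(v) for v in p.values())
print(round(H, 3))  # 0.971
```

So the theoretical minimum is about 0.971 bits per symbol, slightly less than the 1 bit per symbol a fixed-length code would need.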
Entropy Example
Calculate the entropy of an image with only two levels, 0 and 255, where P(0) = 0.5 and P(255) = 0.5
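A worked answer (not part of the original slide), plugging the two probabilities into the entropy formula:

H = −(0.5 log2 0.5 + 0.5 log2 0.5) = −(0.5·(−1) + 0.5·(−1)) = 1 bit per pixel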
Entropy Example
A grayscale image has 256 levels, A = {0, 1, 2, …, 255}, all with equal probabilities. Calculate the entropy.
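A quick worked check (not part of the original slide). For a uniform distribution over N symbols, the entropy collapses to log2(N):

```python
import math

levels = 256
# Each level has probability 1/256; summing p * log2(p) over all 256 levels.
H = -sum((1 / levels) * math.log2(1 / levels) for _ in range(levels))
print(H)  # 8.0, i.e. H = log2(256) bits per pixel
```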
Entropy Example
Calculate the entropy of “aaabbbbccccdd”
P(a) ≈ 0.23; P(b) ≈ 0.31; P(c) ≈ 0.31; P(d) ≈ 0.15
(counts 3, 4, 4, 2 out of 13 symbols)
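A worked check (not part of the original slide), computing the probabilities exactly from the symbol counts rather than the rounded values:

```python
import math
from collections import Counter

# Exact probabilities from the message itself: 3/13, 4/13, 4/13, 2/13.
msg = "aaabbbbccccdd"
counts = Counter(msg)
H = -sum((c / len(msg)) * math.log2(c / len(msg)) for c in counts.values())
print(round(H, 2))  # 1.95 bits per symbol
```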