
Data Compression

Lecture 1
Basic Data Compression Concepts

[Diagram: original x → Encoder → compressed y → Decoder → decompressed x̂]

• Lossless compression
– x̂ = x
– Also called entropy coding, reversible coding.
• Lossy compression
– x̂ ≠ x
– Also called irreversible coding.
• Compression ratio = |x| / |y|
– |x| is the number of bits in x (a small round-trip sketch follows).
Why Compress
• Conserve storage space
• Reduce time for transmission
– Faster to encode, send, then decode than to send the
original
• Progressive transmission
– Some compression techniques allow us to send the
most important bits first so we can get a low
resolution version of some data before getting the
high fidelity version
• Reduce computation
– Use less data to achieve an approximate answer

Measures of performance
• Compression measures
– Compression ratio = (bits in original image) / (bits in compressed image)
– Bits per symbol
• Fidelity measures (a numerical sketch of these follows)
– Mean square error (MSE) = Avg((original - reconstructed)^2)
– SNR - signal-to-noise ratio = 10 log10(signal power / noise power)
– PSNR - peak signal-to-noise ratio = 10 log10(peak value^2 / MSE)
– HVS (human visual system) based measures
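
A minimal numerical sketch of the fidelity measures above, assuming 8-bit samples (peak value 255); the original and reconstructed values are made up for illustration.

import math

original      = [52, 55, 61, 66, 70, 61, 64, 73]   # made-up 8-bit samples
reconstructed = [54, 55, 60, 66, 69, 62, 64, 72]   # made-up lossy reconstruction

n = len(original)
mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / n
signal_power = sum(o ** 2 for o in original) / n
snr = 10 * math.log10(signal_power / mse)     # SNR in dB (noise power = MSE)
psnr = 10 * math.log10(255 ** 2 / mse)        # PSNR in dB, peak value 255

print(f"MSE = {mse:.2f}, SNR = {snr:.1f} dB, PSNR = {psnr:.1f} dB")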
Other issues
• Encoder and decoder computation complexity
• Memory requirements
• Fixed rate or variable rate
• Error resilience
• Symmetric or asymmetric
• Decompress at multiple resolutions
• Decompress at various bit rates
• Standard or proprietary
What is information?
• Semantic interpretation is subjective
• Statistical interpretation - Shannon 1948
– Self-information i(A) associated with event A is i(A) = log2(1/P(A)) (a small sketch follows this list).
– More probable events have less information and
less probable events have more information.
– If A and B are two independent events then self
information i(AB) = i(A) + i(B)
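
A small sketch of these two properties with made-up probabilities: a rarer event carries more self-information, and for independent events the self-information adds.

import math

def self_info(p):
    # i(A) = log2(1 / P(A)), in bits
    return math.log2(1 / p)

p_a, p_b = 0.5, 0.125                  # made-up probabilities of events A and B
print(self_info(p_a))                  # 1.0 bit  (more probable, less information)
print(self_info(p_b))                  # 3.0 bits (less probable, more information)
# independent events: i(AB) = i(A) + i(B)
print(math.isclose(self_info(p_a * p_b), self_info(p_a) + self_info(p_b)))   # True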
Braille
• System for reading text by feeling raised dots on paper (or on electronic displays). Invented in the 1820s by Louis Braille, a blind Frenchman.

[Braille cell images: the letters a, b, c, ..., z; contractions for common words such as "and", "the", "with", "mother"; and letter-group contractions such as "th", "ch", "gh".]

Braille Example
Clear text:
Call me Ishmael. Some years ago -- never mind how long precisely -- having
little or no money in my purse, and nothing particular to interest me on shore,
I thought I would sail about a little and see the watery part of the
world. (238 characters)
Grade 2 Braille in ASCII:
,call me ,i%mael4 ,``s ye>s ago -- n``e m9d h[ l;g
precisely -- hav+ ll or no m``oy 9 my purse1 & no?+
``picul> 6 9t]e/ me on %ore1 ,i ?``| ,i wd sail ab
a ll & see ! wat]y ``p ( ! _w4 (203 characters)

Compression ratio = 238/203 = 1.17


Lossless Compression
• Data is not lost - the original is really needed.
– text compression
– compression of computer binary files
• Compression ratio typically no better than 4:1 for
lossless compression on many kinds of files.
• Statistical Techniques
– Huffman coding
– Arithmetic coding
– Golomb coding
• Dictionary techniques
– LZW, LZ77
– Sequitur
– Burrows-Wheeler Method
• Standards - Morse code, Braille, Unix compress, gzip, zip,
bzip, GIF, JBIG, Lossless JPEG
Lossy Compression
• Data is lost, but not too much.
– audio
– video
– still images, medical images, photographs
• Compression ratios of 10:1 often yield quite high
fidelity results.
• Major techniques include
– Vector Quantization
– Wavelets
– Block transforms
– Standards - JPEG, JPEG2000, MPEG 2, H.264

Why is Data Compression Possible
• Most data from nature has redundancy
– There is more data than the actual information
contained in the data.
– Squeezing out the excess data amounts to
compression.
– However, unsqueezing is necessary to be able to
figure out what the data means.
• Information theory is needed to understand
the limits of compression and give clues on
how to compress well.
What is Information
• Analog data
– Also called continuous data
– Represented by real numbers (or complex
numbers)
• Digital data
– Finite set of symbols {a1, a2, ... , am}
– All data represented as sequences (strings) in the
symbol set.
– Example: abracadabra, a string over the symbol set {a, b, c, d, r}
– Digital data can be an approximation to analog
data
Symbols
• Roman alphabet plus punctuation
• ASCII - 256 symbols (8-bit extended ASCII; standard 7-bit ASCII defines 128)
• Binary - {0,1}
– 0 and 1 are called bits
– All digital information can be represented
efficiently in binary
– {a, b, c, d} fixed-length representation:

  symbol   a    b    c    d
  binary   00   01   10   11

– 2 bits per symbol


Exercise - How Many Bits Per Symbol?
• Suppose we have n symbols. How many bits (as a function of n) are needed to represent a symbol in binary?
– First try n a power of 2.
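
One way to experiment numerically (a small sketch; it simply reflects the standard ⌈log2 n⌉ answer for a fixed-length binary code):

import math

def bits_per_symbol(n):
    # fewest bits b with 2**b >= n
    return math.ceil(math.log2(n))

for n in (2, 4, 5, 8, 26, 256):
    print(n, bits_per_symbol(n))   # 2->1, 4->2, 5->3, 8->3, 26->5, 256->8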

Discussion: Non-Powers of Two
• Can we do better than a fixed length
representation for non-powers of two?

Information Theory
• Developed by Shannon in the 1940s and 50s
• Attempts to explain the limits of communication
using probability theory.
• Example: Suppose English text is being sent
– It is much more likely to receive an “e” than a “z”.
– In some sense “z” has more information than “e”.

First-order Information
• Suppose we are given symbols {a1, a2, ... , am}.
• P(ai) = probability of symbol ai occurring in the absence of any other information.
  P(a1) + P(a2) + ... + P(am) = 1
• inf(ai) = log2(1/P(ai)) is the information of ai in bits.

[Plot: y = -log(x) for x between 0 and 1; the information is large for small probabilities and approaches 0 as the probability approaches 1.]
Example
• {a, b, c} with P(a) = 1/8, P(b) = 1/4, P(c) = 5/8
– inf(a) = log2(8) = 3
– inf(b) = log2(4) = 2
– inf(c) = log2(8/5) = .678
• Receiving an “a” has more information than
receiving a “b” or “c”.

First Order Entropy
The first order entropy is defined for a probability distribution over symbols {a1, a2, ... , am}:

  H = ∑_{i=1}^{m} P(ai) · log2(1/P(ai))
• H is the average number of bits required to code up a
symbol, given all we know is the probability distribution of
the symbols.
• H is the Shannon lower bound on the average number of bits
to code a symbol in this “source model”.
• Stronger models of entropy include context.

Entropy Examples
• {a, b, c} with a 1/8, b 1/4, c 5/8.
– H = 1/8 · 3 + 1/4 · 2 + 5/8 · 0.678 = 1.3 bits/symbol

• {a, b, c} with a 1/3, b 1/3, c 1/3. (worst case)
– H = 3 · (1/3) · log2(3) = 1.6 bits/symbol

• Note that a standard code takes 2 bits per symbol:

  symbol        a    b    c
  binary code   00   01   10
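
A short sketch that reproduces both entropy examples (the two distributions above):

import math

def entropy(probs):
    # first order entropy H = sum of P(ai) * log2(1 / P(ai)), in bits/symbol
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(round(entropy([1/8, 1/4, 5/8]), 2))   # 1.3  bits/symbol
print(round(entropy([1/3, 1/3, 1/3]), 2))   # 1.58 bits/symbol (= log2 3)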

Entropy Curve
Suppose we have two symbols with probabilities x and 1-x, respectively.

[Plot: entropy -(x log x + (1-x) log(1-x)) versus probability x from 0 to 1; maximum entropy of 1 bit at x = 0.5.]
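
A quick numeric sweep of the same formula (a minimal sketch) confirms the maximum of 1 bit at x = 0.5:

import math

def binary_entropy(x):
    # -(x*log2(x) + (1-x)*log2(1-x)) for 0 < x < 1
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))

for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(x, round(binary_entropy(x), 3))   # peaks at 1.0 when x = 0.5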
A Simple Prefix Code
• {a, b, c} with a 1/8, b 1/4, c 5/8.
• A prefix code is defined by a binary tree.
• Prefix code property
– no output is a prefix of another

Binary tree: the root's 1-branch is the leaf c; its 0-branch leads to an internal node whose 0-child is the leaf a and whose 1-child is the leaf b.

Code: a → 00, b → 01, c → 1

Example: ccabccbccc encodes as 1 1 00 01 1 1 01 1 1 1
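
A minimal encoder sketch for this particular code (the code table is the one read off the tree above):

code = {"a": "00", "b": "01", "c": "1"}   # prefix code from the tree above

def encode(text):
    # concatenate the codeword of each symbol
    return "".join(code[symbol] for symbol in text)

print(encode("ccabccbccc"))   # 1100011101111, i.e. 1 1 00 01 1 1 01 1 1 1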
Binary Tree Terminology
[Diagram: a binary tree with the root, an internal node, and a leaf labeled.]

1. Each node, except the root, has a unique parent.

2. Each internal node has exactly two children.

Decoding a Prefix Code

Tree (as before): the root's 1-branch is the leaf c; its 0-branch leads to an internal node whose 0-child is a and whose 1-child is b.

repeat
    start at root of tree
    repeat
        if read bit = 1 then go right else go left
    until node is a leaf
    report leaf
until end of the code

Input: 11000111100

Stepping through the input with this tree, the decoder restarts at the root each time it reports a leaf:

bits:    1 1 00 01 1 1 1 00
symbols: c c a  b  c c c a

Decoded output: ccabccca
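
The same procedure written as a short decoder sketch (a nested tuple stands in for the binary tree; this is just one possible representation of it):

# tree from the slides: index 0 = left (0-branch), index 1 = right (1-branch)
tree = (("a", "b"), "c")   # 0 -> internal node with leaves a, b; 1 -> leaf c

def decode(bits):
    out, node = [], tree
    for bit in bits:
        node = node[int(bit)]        # go right on 1, go left on 0
        if isinstance(node, str):    # reached a leaf: report it and
            out.append(node)         # restart at the root
            node = tree
    return "".join(out)

print(decode("11000111100"))   # ccabccca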
