Data Compression Techniques

This document discusses various data compression techniques including Huffman coding, arithmetic coding, LZ coding, and their applications. Huffman coding assigns variable-length binary codes to symbols but its complexity grows exponentially with larger block sizes. Arithmetic coding splits the interval [0,1] based on symbol probabilities and codes the final interval. LZ coding uses adaptive dictionaries to encode repeated patterns in the data. Common applications of these techniques include JPEG, PNG, GIF, ZIP, and others.


TSBK01 Image Coding and Data Compression

Lecture 4: Data Compression Techniques

Jörgen Ahlberg, Div. of Sensor Technology, Swedish Defence Research Agency (FOI)

Outline

Huffman coding
Arithmetic coding
  Application: JBIG
Universal coding
  LZ coding: LZ77, LZ78, LZW
  Applications: GIF and PNG

Repetition
Coding: assigning binary codewords to (blocks of) source symbols.
Variable-length codes (VLC) and fixed-length codes.
Code classes, from most to least restrictive: instantaneous codes, uniquely decodable codes, non-singular codes, all codes.
All tree codes are instantaneous. A tree code exists ⇔ Kraft's inequality holds.

Creating a Code: The Data Compression Problem

Assume a source with an alphabet A and known symbol probabilities {pi}.
Goal: choose the codeword lengths li so as to minimize the bitrate, i.e., the average number of bits per symbol, ∑ li pi.
Trivial solution: li = 0 ∀ i.
Restriction: we want an instantaneous code, so Kraft's inequality ∑ 2^(-li) ≤ 1 must hold.
Solution (at least in theory): li = -log2 pi.
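As a quick check of the formulas above, a small Python sketch computes the ideal codeword lengths li = -log2 pi and the resulting average rate (the probability values are an assumed example):

```python
import math

def ideal_lengths(probs):
    """Ideal codeword lengths l_i = -log2(p_i) for known probabilities."""
    return [-math.log2(p) for p in probs]

def average_rate(probs, lengths):
    """Average number of bits per symbol, sum of l_i * p_i."""
    return sum(l * p for l, p in zip(lengths, probs))

probs = [0.5, 0.25, 0.125, 0.125]    # assumed example distribution
lengths = ideal_lengths(probs)       # dyadic probabilities give integer lengths
print(lengths)
print(average_rate(probs, lengths))  # equals the source entropy here
```

With dyadic probabilities the ideal lengths are integers and the average rate equals the entropy, 1.75 bits/symbol.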

In practice
Use an algorithm to find the code tree:
Huffman coding
Tunstall coding

Huffman Coding
Two-step algorithm:
1. Iterate: merge the two least probable symbols, then sort.
2. Assign bits at each merge, then read off the codewords.

Example with P = {0.5, 0.25, 0.125, 0.125}:

Symbol  Probability  Code
a       0.5          0
b       0.25         10
c       0.125        110
d       0.125        111

Merge 0.125 + 0.125 = 0.25, sort; merge 0.25 + 0.25 = 0.5, sort; merge 0.5 + 0.5 = 1. Assign 0/1 at each merge and read off the code.
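The merge-and-sort iteration can be sketched with a priority queue (a minimal implementation; tie-breaking may differ from the example above, but the code lengths come out the same):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code by repeatedly merging the two least
    probable nodes.  Minimal sketch: no canonical ordering."""
    tiebreak = count()  # unique tag so equal probabilities never compare dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable subtree
        p1, _, c1 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(code)
```

The exact codewords depend on tie-breaking, but the lengths (1, 2, 3, 3) and the fact that Kraft's inequality is met with equality do not.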

Coding of the BMS

Trick: code blocks of symbols (extended source).
Example: p1 = 3/4, p2 = 1/4. Applying the Huffman algorithm directly gives 1 bit/symbol. Coding blocks of two symbols:

Block  P(block)  Code
00     9/16      0
01     3/16      10
10     3/16      110
11     1/16      111

Result: 27/32 ≈ 0.84 bits/symbol.
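The bits-per-symbol figure for the block code can be verified directly:

```python
# Blocks 00, 01, 10, 11 with P = 9/16, 3/16, 3/16, 1/16
# and Huffman code lengths 1, 2, 3, 3.
block_probs  = [9/16, 3/16, 3/16, 1/16]
code_lengths = [1, 2, 3, 3]

bits_per_block  = sum(p * l for p, l in zip(block_probs, code_lengths))
bits_per_symbol = bits_per_block / 2   # each block covers two source symbols
print(bits_per_symbol)  # 27/32 = 0.84375
```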

Huffman Coding: Pros and Cons

+ Fast implementations.
+ Error resilient: resynchronizes in about l^2 steps after a bit error.
- The code tree grows exponentially when the source is extended.
- The symbol probabilities are built into the code.

Hence it is hard to use Huffman coding for extended sources / large alphabets, or when the symbol probabilities vary over time.

Arithmetic Coding
Shannon-Fano-Elias coding. Basic idea: split the interval [0,1] according to the symbol probabilities.
Example: A = {a, b, c, d}, P = {1/2, 1/4, 1/8, 1/8}.

Example: start in b and code the sequence c c a, i.e. code the interval [0.9, 0.96]. [Figure: the interval [0,1] successively subdivided according to the symbol probabilities.]

The decoder narrows [0,1] bit by bit:

Bit  Interval
1    0.5 - 1
1    0.75 - 1
1    0.875 - 1
0    0.875 - 0.9375
1    0.90625 - 0.9375

After the fifth bit the interval lies inside [0.9, 0.96], and the decoder can reproduce the coded symbols.
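The interval-splitting idea can be sketched as follows. This is a toy, unbounded-precision version using floats; real arithmetic coders emit bits incrementally and rescale, and the distribution here is an assumed example, not necessarily the one in the figure:

```python
def encode_interval(sequence, probs):
    """Narrow down a subinterval of [0,1) for a symbol sequence.
    Symbols partition [0,1) in the insertion order of `probs`."""
    low, width = 0.0, 1.0
    for sym in sequence:
        cum = 0.0
        for s, p in probs.items():
            if s == sym:
                # zoom into this symbol's share of the current interval
                low, width = low + cum * width, p * width
                break
            cum += p
    return low, low + width

# assumed example distribution
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(encode_interval("bcc", probs))
```

The width of the final interval is the product of the coded symbols' probabilities, so its negative log2 is the (ideal) number of bits needed.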

An Image Coding Application

Consider the image content in a local environment of a pixel as a state in a Markov model. Example (binary image), where X is the pixel to be coded:

0 0 1
1 0 X

Such an environment is called a context. A probability distribution for X can be estimated for each state. Then arithmetic coding is used. This is the basic idea behind the JBIG algorithm for binary images and data.
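A toy sketch of the per-context probability estimation (the counting scheme here is an assumption for illustration; JBIG's actual adaptive estimator is more sophisticated):

```python
from collections import defaultdict

class ContextModel:
    """Adaptive per-context probability estimates for a binary image.
    The context/state is the tuple of already-coded neighbour pixels."""
    def __init__(self):
        # start every context with one pseudo-count of each value
        self.counts = defaultdict(lambda: [1, 1])

    def p_one(self, context):
        """Estimated probability that the current pixel is 1."""
        zeros, ones = self.counts[context]
        return ones / (zeros + ones)

    def update(self, context, pixel):
        """Learn from the pixel after it has been coded."""
        self.counts[context][pixel] += 1

model = ContextModel()
ctx = (0, 0, 1, 1, 0)     # five neighbour pixels, as in the example above
model.update(ctx, 1)
print(model.p_one(ctx))   # this estimate would drive the arithmetic coder
```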

Flushing the Coder

The coding process is ended (restarted) and the coder flushed:
after a given number of symbols (FIVO: fixed input, variable output), or
when the interval is too small for a fixed number of output bits (VIFO: variable input, fixed output).

Universal Coding
A universal coder doesn't need to know the statistics in advance. Instead, it estimates them from the data.
Forward estimation: estimate the statistics in a first pass and transmit them to the decoder.
Backward estimation: estimate from already transmitted (received) symbols.

Universal Coding: Examples

1. An adaptive arithmetic coder: a statistics estimator feeding an arithmetic coder.
2. An adaptive dictionary technique: the LZ coders [Sayood 5].
3. An adaptive Huffman coder [Sayood 3.4].

Ziv-Lempel Coding (ZL or LZ)

Named after J. Ziv and A. Lempel (1977). An adaptive dictionary technique:
Store previously coded symbols in a buffer.
Search the buffer for the current sequence of symbols to code.
If found, transmit the buffer offset and the match length.

LZ77

The encoder keeps a search buffer of recently coded symbols and a look-ahead buffer of symbols still to be coded. [Figure: search buffer, positions 8...1, next to the look-ahead buffer.]

Output triplet <offset, length, next>. Transmitted to the decoder in the example: <8, 3, d>, <0, 0, e>, <1, 2, f>.

If the size of the search buffer is N and the size of the alphabet is M, we need about ⌈log2 N⌉ bits for the offset, ⌈log2 N⌉ bits for the length, and ⌈log2 M⌉ bits for the next symbol to code a triplet. Variation: use a VLC to code the triplets! Used in PKZip, Zip, Lharc, PNG, gzip, and ARJ.
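A minimal LZ77 encoder sketch (the buffer sizes are assumed parameters; real implementations use hashing rather than a linear search, and allow matches to overlap into the look-ahead region):

```python
def lz77_encode(data, search_size=8, lookahead_size=8):
    """Toy LZ77 encoder emitting <offset, length, next> triplets."""
    i, triplets = 0, []
    while i < len(data):
        start = max(0, i - search_size)
        best_off, best_len = 0, 0
        for off in range(1, i - start + 1):
            length = 0
            # extend the match, always keeping one symbol for 'next'
            while (length < lookahead_size - 1
                   and i + length < len(data) - 1
                   and data[i + length - off] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        triplets.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triplets

print(lz77_encode("abcabdabc"))
```

A decoder simply copies `length` symbols from `offset` positions back and appends `next`, reproducing the input exactly.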

Drawback with LZ77

Repetitive patterns with a period longer than the search buffer size are not found. If the search buffer size is 4, the sequence abcdeabcdeabcdeabcde will be expanded, not compressed.

LZ78
Store patterns in a dictionary. Transmit a tuple <dictionary index, next>.

LZ78

Input sequence: a b c a b a b c

Output tuple <dictionary index, next> (index 0 means no prefix).

Transmitted:  <0,a>  <0,b>  <0,c>  <1,b>  <4,c>
Decoded:      a      b      c      ab     abc

Dictionary:
1  a
2  b
3  c
4  ab
5  abc

A strategy is needed for limiting the dictionary size!
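The parsing above can be sketched in a few lines (a toy version with an unlimited dictionary, which is exactly the problem the slide points out):

```python
def lz78_encode(data):
    """LZ78: grow a dictionary of phrases, emit <index, next> tuples.
    Index 0 means the phrase has no previously seen prefix."""
    dictionary, tuples = {}, []
    w = ""
    for ch in data:
        if w + ch in dictionary:
            w += ch                       # keep extending the phrase
        else:
            tuples.append((dictionary.get(w, 0), ch))
            dictionary[w + ch] = len(dictionary) + 1
            w = ""
    if w:  # trailing phrase that is already in the dictionary
        tuples.append((dictionary[w], ""))
    return tuples

print(lz78_encode("abcababc"))
```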

LZW
Modification of LZ78 by Terry Welch, 1984. Applications: GIF, V.42bis. Patented by Unisys Corp. Only the dictionary index is transmitted; the alphabet is stored in the dictionary in advance.

LZW

Input sequence: a b c a b a b c

The alphabet {a, b, c, d} is preloaded as dictionary entries 1-4.

Transmitted: 1  2  3  5  5  3
Decoded:     a  b  c  ab ab c

Encoder dictionary:
1  a
2  b
3  c
4  d
5  ab
6  bc
7  ca
8  aba
9  abc

The decoder builds the same dictionary, but always runs one entry behind the encoder.
And now for some applications: GIF & PNG

GIF
CompuServe Graphics Interchange Format (1987, 1989). Features:
Designed for up/downloading images to/from BBSes via the PSTN.
1-, 4-, or 8-bit colour palettes.
Interlacing for progressive decoding (four passes, starting with every 8th row).
A transparent colour for non-rectangular images.
Support for multiple images in one file (animated GIFs).

GIF: Method
Compression by LZW. The dictionary starts with 2^(b+1) entries, where b is the number of bits in the palette, and is doubled each time it fills up (to a maximum of 4096 entries). Works well on computer-generated images.

GIF: Problems
Unsuitable for natural images (photos):
Maximum 256 colours ⇒ bad quality.
Repetitive patterns are uncommon ⇒ bad compression.

LZW is patented by Unisys Corp. Alternative: PNG.

PNG: Portable Network Graphics


Designed to replace GIF. Some features:
Indexed or true-colour images (up to 16 bits per plane).
Alpha channel.
Gamma information.
Error detection.

No support for multiple images in one file; use MNG for that.

Method: compression by LZ77 using a 32 KB search buffer; the LZ77 triplets are Huffman coded.
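This LZ77-plus-Huffman combination is the DEFLATE method, which Python exposes through the standard zlib module; a quick demonstration (the sample data is arbitrary):

```python
import zlib

# PNG compresses with DEFLATE (LZ77 with a 32 KB window plus Huffman
# coding of the triplets); zlib implements the same method.
data = b"abcdeabcde" * 50       # repetitive input compresses well
packed = zlib.compress(data, level=9)
print(len(data), "->", len(packed), "bytes")
print(zlib.decompress(packed) == data)
```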

More information: www.w3.org/TR/REC-png.html

Summary
Huffman coding:
Simple, easy, fast.
Complexity grows exponentially with the block length.
Statistics are built into the code.

Arithmetic coding:
Complexity grows linearly with the block size.
Easily adapted to varying statistics ⇒ used for coding of Markov sources.

Universal coding:
Adaptive Huffman or arithmetic coder.
LZ77: buffer of previously sent sequences; <offset, length, next>.
LZ78: dictionary instead of buffer; <index, next>.
LZW: modification of LZ78; <index>.

Summary, cont.
Where are the algorithms used?
Huffman coding: JPEG, MPEG, PNG, ...
Arithmetic coding: JPEG, JBIG, MPEG-4, ...
LZ77: PNG, PKZip, Zip, gzip, ...
LZW: compress, GIF, V.42bis, ...

Finally

These methods work best if the source alphabet is small and the distribution is skewed, e.g. for text and graphics.

Analog sources (images, sound) require other methods, due to their complex dependencies and the distortion we are prepared to accept.
