Binary Image Compression: The Art of Modeling Image Source

The document discusses lossless compression techniques for binary images, specifically run length coding which models runs of consecutive identical pixels, and its application in formats like TIFF; it also discusses Lempel-Ziv coding and how building a dictionary of patterns can compress data by replacing repeated patterns with shorter codes.

Uploaded by

Lal Chand

Binary Image Compression

• The art of modeling image source


– Image pixels are NOT independent events
• Run length coding of binary and graphic
images
– Applications: BMP, TIF/TIFF
• Lempel-Ziv coding*
– How can the idea of building a dictionary
achieve data compression?

EE465: Introduction to Digital Image Processing
From Theory to Practice
• So far, we have discussed the problem of
designing variable-length codes (VLC) under the assumption that the
source distribution is known (i.e., the
probabilities of all possible events are given).
• In practice, the source distribution (probability
model) is unknown and cannot be obtained by
simple relative frequency counting because of the
inherent dependency among source data.

An Image Example

Binary image of size 100×100 (approximately 1000 dark pixels)

A Bad Model
• Each pixel in the image is observed as an
independent event
• All pixels can be characterized by a single
discrete random variable – binary Bernoulli
source
• We can estimate the probabilities by
relative frequency counting
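
As a minimal sketch of what this model implies, the snippet below estimates P(black) by relative frequency from the counts quoted in the image example (100×100 pixels, about 1000 dark) and evaluates the entropy of the resulting Bernoulli source. The function name `bernoulli_entropy` is an illustrative choice, not something from the slides.

```python
import math

def bernoulli_entropy(p):
    """Entropy in bits of a binary memoryless source with P(black) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Counts taken from the image example: 100x100 pixels, about 1000 dark ones.
n_pixels = 100 * 100
n_black = 1000

p_black = n_black / n_pixels                  # relative-frequency estimate
bits_per_pixel = bernoulli_entropy(p_black)   # best rate this model allows

print(p_black, round(bits_per_pixel, 3))
```

Under the independence assumption, no entropy coder can do better than roughly 0.47 bits per pixel on this image; a model that captures the dependency among pixels is not bound by that figure.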

Synthesized Image by the Bad Model

Binary Bernoulli distribution (P(X=black)=0.1)

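
A synthesis experiment of this kind can be sketched as follows, assuming each pixel is drawn independently with P(black)=0.1. The function name, the fixed seed, and the text rendering are illustrative choices only.

```python
import random

def synthesize_bernoulli_image(height, width, p_black, seed=0):
    """Draw every pixel independently: 1 (black) with probability p_black."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [[1 if rng.random() < p_black else 0 for _ in range(width)]
            for _ in range(height)]

image = synthesize_bernoulli_image(100, 100, 0.1)
n_black = sum(sum(row) for row in image)
print("black pixels:", n_black)

# Show the top-left corner as text: '#' is black, '.' is white.
for row in image[:5]:
    print("".join("#" if px else "." for px in row[:60]))
```

The black-pixel count comes out close to 1000, as in the original image, but the black pixels are scattered like noise instead of forming connected shapes.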
Why Does It Fail?
• Roughly speaking
– Pixels in an image are not independent events (the source is not memoryless but spatially correlated)
– Pixels in an image do not follow the same probability model (the source is not stationary but spatially varying)
• Fundamentally speaking
– Pixels are the projection of meaningful objects in the real world (e.g., characters, lady, flowers, cameraman, etc.)
– Mother Nature is not random (except at the quantum level)

Similar Examples in 1D
• Scenario I: think of a paragraph of English
text. The letters are NOT independent because of
the structure of the language. For example, the
probability of seeing a “u” is typically small;
however, if a “q” appears, the probability of the
next letter being “u” is large.
• Scenario II: think of a piece of human speech. It
consists of silent and voiced segments. A
silent segment can be modeled by white
Gaussian noise, whereas a voiced segment
cannot (e.g., because of pitch).

Data Compression in Practice

discrete source X → source modeling → Y → entropy coding → binary bit stream
probability estimation → P(Y) → entropy coding

Probabilities can be estimated by counting relative frequencies, either online or offline.

The art of data compression is the art of source modeling

Source Modeling Techniques
• Transformation
– transform the source into an equivalent yet
more convenient representation
• Prediction
– Predict the future based on the causal past
• Pattern matching
– Identify and represent repeated patterns

Non-image Examples
• Transformation in audio coding (MP3)
– Audio samples are transformed into frequency
domain
• Prediction in speech coding (CELP)
– Human vocal tract can be approximated by an
auto-regressive model
• Pattern matching in text compression
– Search for repeated patterns in text

How to Build a Good Model
• Study the source characteristics
– The origin of data: computer-generated,
recorded, scanned …
– It is about linguistics, physics, material
science and so on …
• Choose or invent the appropriate tool
(modeling technique)
Q: Why is pattern matching suitable for text
but not for speech or audio?

Image Modeling
• Binary images
– Scanned documents (e.g., FAX) or computer-
generated (e.g., line drawing in WORD)
• Graphic images
– Windows icons, web banners, cartoons
• Photographic images
– Acquired by digital cameras
• Others: fingerprints, CT images,
astronomical images …

Lossless Image Compression
• No information loss – i.e., the decoded
image is mathematically identical to the
original image
– For some sensitive data such as documents or
medical images, information loss is simply
unacceptable
– For others such as photographic images, we
only care about the subjective quality of
decoded images (not the fidelity to the
original)

Binary Image Compression
• The art of modeling image source
– Image pixels are NOT independent events
• Run length coding of binary and graphic
images
– Applications: BMP, TIF/TIFF
• Lempel-Ziv coding*
– How can the idea of building a dictionary
achieve data compression?

Run Length Coding (RLC)

What is run length?

Run length is defined as the number of consecutive identical symbols.

Examples

Coin flip:    HHHHH  T  HHHHHHH
run-lengths:    5    1     7

Random walk:  SSSS  EEEE  NNNN  WWWW
run-lengths:   4     4     4     4
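
The two examples above can be reproduced with a few lines of code. This is a minimal sketch; the helper name `run_lengths` is an illustrative choice.

```python
from itertools import groupby

def run_lengths(symbols):
    """Return (symbol, run length) pairs for a sequence of symbols."""
    return [(sym, len(list(group))) for sym, group in groupby(symbols)]

coin_flip = run_lengths("HHHHHTHHHHHHH")
random_walk = run_lengths("SSSSEEEENNNNWWWW")

print(coin_flip)    # [('H', 5), ('T', 1), ('H', 7)]
print(random_walk)  # [('S', 4), ('E', 4), ('N', 4), ('W', 4)]
```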

Run Length Coding (Cont’d)

discrete source X → transformation by run-length counting → Y → entropy coding → binary bit stream
probability estimation → P(Y) → entropy coding

Y is the sequence of run-lengths, from which X can be recovered losslessly.

RLC of 1D Binary Source

X:  0000  111  000000  11111  00

Y:   4     3     6       5     2

Y → Huffman coding → compressed data
(1 extra bit is needed to denote what the starting symbol is)

Properties
- “0” run-lengths (red) and “1” run-lengths (green) alternate
- run-lengths are positive integers
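
The round trip from X to Y and back can be sketched as below, assuming the input is a non-empty string of "0" and "1" characters. The function names are illustrative, and the Huffman coding stage applied to Y is left out.

```python
from itertools import groupby

def rlc_encode(bits):
    """Map a non-empty 0/1 string to (starting symbol, list of run-lengths)."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    return bits[0], runs

def rlc_decode(start, runs):
    """Rebuild the 0/1 string; runs alternate, beginning with `start`."""
    pieces, symbol = [], start
    for run in runs:
        pieces.append(symbol * run)
        symbol = "1" if symbol == "0" else "0"   # alternate 0-runs and 1-runs
    return "".join(pieces)

x = "0000" + "111" + "000000" + "11111" + "00"
start, y = rlc_encode(x)
print(start, y)                      # 0 [4, 3, 6, 5, 2]
print(rlc_decode(start, y) == x)     # True
```

Because the runs alternate, the starting symbol is the only side information the decoder needs, which is the extra bit mentioned above.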

Variation of 1D Binary RLC

When P(x=0) is close to 1, we can record the run-lengths of the dominant symbol (“0”) only.

Example
00000100000001000011000000001…
run-lengths: 5 7 4 0 8

Properties
- all coded symbols are “0” run-lengths
- run-lengths are nonnegative integers

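
A minimal sketch of this variation follows: each "1" terminates the current "0" run, so two adjacent "1"s yield a run-length of 0. Returning the trailing zero count separately is a choice made for this sketch; the slides do not say how an unterminated final run is handled.

```python
def zero_run_lengths(bits):
    """Length of the '0' run before each '1', plus leftover trailing zeros."""
    runs, count = [], 0
    for b in bits:
        if b == "0":
            count += 1
        else:                 # a '1' closes the current run of zeros
            runs.append(count)
            count = 0
    return runs, count

def decode_zero_runs(runs, trailing=0):
    """Inverse mapping: each run becomes that many '0's followed by one '1'."""
    return "".join("0" * r + "1" for r in runs) + "0" * trailing

x = "00000100000001000011000000001"
runs, trailing = zero_run_lengths(x)
print(runs, trailing)                        # [5, 7, 4, 0, 8] 0
print(decode_zero_runs(runs, trailing) == x) # True
```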
Modeling Run-length

Geometric source: P(X=k) = (1/2)^k, k = 1, 2, …

Run-length k    Probability
1               1/2
2               1/4
3               1/8
4               1/16
5               1/32
…               …

Golomb Codes
Optimal VLC for geometric source: P(X=k) = (1/2)^k, k = 1, 2, …

k    codeword
1    0
2    10
3    110
4    1110
5    11110
6    111110
7    1111110
8    11111110
…    …

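
The codewords in the table follow a simple pattern: k-1 ones followed by a single zero. The sketch below generates them, decodes a concatenated stream, and checks numerically that the average codeword length matches the entropy of the source P(X=k) = (1/2)^k. The function names and the truncation of the infinite sums at K = 60 terms are choices made for this sketch.

```python
import math

def unary_codeword(k):
    """Codeword for run-length k >= 1: (k - 1) ones, then a terminating zero."""
    return "1" * (k - 1) + "0"

def decode_unary(bitstream):
    """Split a concatenation of codewords back into run-lengths."""
    runs, count = [], 0
    for b in bitstream:
        count += 1
        if b == "0":          # a zero ends the current codeword
            runs.append(count)
            count = 0
    return runs

stream = "".join(unary_codeword(k) for k in [1, 3, 2])
print(stream, decode_unary(stream))   # 011010 [1, 3, 2]

# Average codeword length vs. source entropy, sums truncated at K terms.
K = 60
probs = [0.5 ** k for k in range(1, K + 1)]
avg_len = sum(p * len(unary_codeword(k)) for k, p in enumerate(probs, start=1))
entropy = -sum(p * math.log2(p) for p in probs)
print(avg_len, entropy)
```

Both quantities come out at 2 bits per run-length (up to the truncation), so for this particular source the code wastes nothing.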
From 1D to 2D

[Figure labels: white run-length, black run-length]

Question: What distinguishes 2D from 1D coding?
Answer: inter-line dependency of run-lengths

Relative Address Coding (RAC)*

00000111111100000000000   previous line
7
00000011111111000001100   current line

d1=1   d2=-2   NS, run=2

NS – New Start

A variation of it was adopted by CCITT for fax transmission.

Image Example
CCITT test image No. 1
Size: 1728×2376
Raw data (1 bps)

File size of ccitt1.pbm: 513216 bytes
File size of ccitt1.tif: 37588 bytes

Compression ratio = 13.65
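
The figures above can be checked with a few lines of arithmetic, assuming 1 bit per pixel packed 8 pixels to a byte. The variable names are illustrative.

```python
width, height = 1728, 2376       # image size quoted above
bits_per_pixel = 1               # binary image

raw_bytes = width * height * bits_per_pixel // 8
tif_bytes = 37588                # compressed file size quoted above

ratio = raw_bytes / tif_bytes
print(raw_bytes, round(ratio, 2))    # 513216 13.65
```

The raw size matches the quoted size of ccitt1.pbm, and the ratio of the two file sizes gives the stated compression ratio.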

Graphic Images (Cartoons)

Total of 12 different colors (black, white, red, green, blue, yellow, …)

Observations:
- dominant background color (e.g., white)
- objects contain only a few other colors

Palette-based Image Representation

index   color
0       white
1       black
2       red
3       green
4       blue
5       yellow
…       …

Any (R,G,B) 24-bit color can be represented by its index in the palette.
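
A palette lookup of this kind can be sketched as below. The (R,G,B) triples are the usual 8-bit values for the named colors and are an assumption here, as are the function names; only the first six palette entries listed in the table are included.

```python
import math

# Assumed 8-bit (R, G, B) values for the first six palette entries.
palette = [
    (255, 255, 255),  # 0 white
    (0, 0, 0),        # 1 black
    (255, 0, 0),      # 2 red
    (0, 255, 0),      # 3 green
    (0, 0, 255),      # 4 blue
    (255, 255, 0),    # 5 yellow
]
index_of = {color: i for i, color in enumerate(palette)}

def to_indices(pixels):
    """Replace each 24-bit color by its palette index."""
    return [index_of[p] for p in pixels]

def from_indices(indices):
    """Recover the original colors from palette indices."""
    return [palette[i] for i in indices]

def bits_per_index(n_colors):
    """Fixed-length bits needed to address a palette of n_colors entries."""
    return max(1, math.ceil(math.log2(n_colors)))

pixels = [(255, 255, 255), (255, 255, 255), (255, 0, 0), (0, 0, 0)]
indices = to_indices(pixels)
print(indices)                 # [0, 0, 2, 1]
print(bits_per_index(12))      # 4
```

With the 12 colors of the cartoon example, a fixed-length index takes 4 bits per pixel instead of 24, before any run-length or entropy coding is applied to the index sequence.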
