Digital Image Processing
Prepared by
K. Indragandhi, AP (Sr.Gr.)/ECE
Module-IV
IMAGE COMPRESSION
• Definition: Compression means storing data in a
format that requires less space than usual.
• Data compression is particularly useful in
communications because it enables devices to
transmit the same amount of data in fewer bits.
• The bandwidth of a digital communication link
can be effectively increased by compressing data
at the sending end and decompressing data at the
receiving end.
• There are a variety of data compression
techniques, but only a few have been
standardized.
Types of Data Compression
• There are two main types of data compression: Lossy and Lossless.
• In Lossy data compression, the message can never be recovered exactly as it was before it was compressed.
• With Lossless data compression, the original message can be decoded exactly.
• Lossless compression is ideal for text.
• Huffman coding is a type of lossless data compression.
Compression Algorithms
• Huffman Coding
• Run Length Encoding
• Shift Codes
• Arithmetic Codes
• Block Truncation Codes
• Transform codes
• Vector Quantization
Huffman Coding
• Huffman coding is a popular compression technique that
assigns variable length codes (VLC) to symbols, so that
the most frequently occurring symbols have the shortest
codes.
• On decompression, the variable-length codes are mapped back to the symbols' original fixed-length codes.
• The idea is to use short bit strings to represent the most frequently used characters, and longer bit strings to represent less frequently used characters.
• That is, the most common characters, usually space, e,
and t are assigned the shortest codes.
• In this way the total number of bits required to transmit
the data can be considerably less than the number
required if the fixed length ASCII representation is used.
• A Huffman code can be represented as a binary tree whose branches are assigned the values 0 and 1.
Huffman Algorithm
• To each character, associate a binary tree consisting of
just one node.
• To each tree, assign the character’s frequency, which
is called the tree’s weight.
• Look for the two lightest-weight trees. If there is a tie, choose among them arbitrarily.
• Merge the two into a single tree with a new root
node whose left and right sub trees are the two
we chose.
• Assign the sum of weights of the merged trees as
the weight of the new tree.
• Repeat the previous step until just one tree is left.
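As a concrete illustration of these steps, here is a minimal Python sketch (the function and variable names are ours, not part of any standard API) that keeps the forest in a binary heap, repeatedly merges the two lightest trees, and then reads the code words off the final tree:

```python
import heapq

def huffman_codes(weights):
    """Build a Huffman code from a {symbol: weight} mapping.

    Each heap entry is (weight, tiebreak, tree); a tree is either a
    symbol or a (left, right) pair, mirroring the merge step above.
    """
    heap = [(w, i, sym) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                      # merge the two lightest trees
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):           # internal node: 0 left, 1 right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                 # leaf: record the code word
            codes[tree] = prefix or "0"
    walk(heap[0][2])
    return codes

# Character frequencies from the example on the next slide
print(huffman_codes({"A": .20, "B": .09, "C": .15, "D": .11, "E": .40, "F": .05}))
```

Run on those frequencies, it yields code lengths of 1 bit for E, 3 bits for A, C, and D, and 4 bits for B and F; the exact bit patterns may differ from the slides depending on tie-breaking and on which branch is labelled 0.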
Huffman Coding Example
• Character frequencies
– A: 20% (.20)
– B: 9% (.09)
– C: 15%
– D: 11%
– E: 40%
– F: 5%
• No other characters in the document
Huffman Code
First merge: the two lightest trees, B (.09) and F (.05), are combined into a tree BF of weight .14, leaving the forest E (.4), BF (.14), D (.11), A (.20), C (.15).
Huffman Code ABCDEF
• Completed tree (left branch = 0, right branch = 1):
  root (1.0) → ABCDF (.6) and E (.4)
  ABCDF (.6) → BFD (.25) and AC (.35)
  BFD (.25) → BF (.14) and D (.11); AC (.35) → A (.20) and C (.15)
  BF (.14) → B (.09) and F (.05)
• Codes
  – A: 010
  – B: 0000
  – C: 011
  – D: 001
  – E: 1
  – F: 0001
• Note
  – No code is a prefix of another
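With these code lengths, the average is 0.40(1) + 0.20(3) + 0.15(3) + 0.11(3) + 0.09(4) + 0.05(4) = 2.34 bits per symbol, versus the 3 bits per symbol a fixed-length code needs for six symbols.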
Huffman Coding
• Message: TENNESSEE (9 characters: e ×4, n ×2, s ×2, t ×1)
• Tree (left branch = 0, right branch = 1):
  root (9) → node (5) and e (4)
  node (5) → s (2) and node (3)
  node (3) → t (1) and n (2)
• ENCODING: E: 1, S: 00, T: 010, N: 011
• Average code length = (1×4 + 2×2 + 3×2 + 3×1) / 9 = 1.89
Average Code Length
Average code length = Σᵢ (lengthᵢ × frequencyᵢ) / Σᵢ frequencyᵢ
= { 1(4) + 2(2) + 3(2) + 3(1) } / (4 + 2 + 2 + 1)
= 17 / 9 = 1.89
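The same calculation in Python, using the code table from the TENNESSEE slide above:

```python
codes = {"E": "1", "S": "00", "T": "010", "N": "011"}
message = "TENNESSEE"

# Total encoded bits divided by the number of characters
avg = sum(len(codes[ch]) for ch in message) / len(message)
print(round(avg, 2))   # 1.89
```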
ENTROPY
Entropy is a measure of information content: the more probable the message, the lower its information content and hence the lower its entropy.
Entropy = −Σᵢ pᵢ log₂ pᵢ   (pᵢ: probability of symbol i)
For TENNESSEE, with p(E) = 4/9 ≈ 0.44, p(S) = p(N) = 2/9 ≈ 0.22, p(T) = 1/9 ≈ 0.11:
= −(0.44 log₂ 0.44 + 0.22 log₂ 0.22 + 0.22 log₂ 0.22 + 0.11 log₂ 0.11)
≈ 1.8366 bits per symbol
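A short Python check of this figure (the function name is ours):

```python
from collections import Counter
from math import log2

def entropy(message):
    """Shannon entropy in bits per symbol: -sum(p * log2(p))."""
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in Counter(message).values())

print(round(entropy("TENNESSEE"), 4))   # 1.8366
```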
Advantages & Disadvantages
• The problem with Huffman coding is that it uses
an integral number of bits in each code.
• If the entropy of a given character is 2.5 bits, the Huffman code for that character must be either 2 or 3 bits, not 2.5.
• Though Huffman coding is inefficient due to using an integral number of bits per code, it is relatively easy to implement and very efficient for coding and decoding.
• Among codes that assign each symbol its own whole-bit code word, it is the best approximation to the entropy limit.
Run-length encoding
• Run-length encoding (RLE) is a very simple form of data compression.
• RLE is a lossless type of compression.
• It is based on a simple principle: every run (a sequence of repeated, identical data values) is replaced by a count and a single copy of the value.
• This intuitive principle works best on data types in which long sequences of repeated values occur.
• RLE is usually applied to files that contain a large number of consecutive occurrences of the same byte pattern.
• RLE may be used on any kind of data regardless of its content, but the data being compressed determines how good a compression ratio will be achieved.
• RLE works well on text files that contain long runs of spaces for indenting and formatting paragraphs, tables, and charts.
• Digitized signals that contain long unchanged stretches can also be compressed by RLE.
• A good example of such a signal is a monochrome image; questionable compression would probably be achieved if RLE were used on continuous-tone (photographic) images.
• A fair compression ratio may be achieved if RLE is applied to computer-generated color images.
• RLE is a lossless type of compression and cannot achieve great compression ratios,
• but its strong point is that it can be easily implemented and quickly executed.
Example 1
• Consider the hypothetical scan line:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
• If we apply a simple run-length code to this scan line, we get:
• 12WB12W3B24WB14W
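A minimal Python sketch of this encoding (the function name is ours; single values are written without an explicit count, matching the convention above):

```python
def rle_encode(line):
    """Run-length encode a string: each run becomes count + value."""
    out = []
    i = 0
    while i < len(line):
        j = i
        while j < len(line) and line[j] == line[i]:   # scan to end of run
            j += 1
        run = j - i
        out.append((str(run) if run > 1 else "") + line[i])
        i = j
    return "".join(out)

scan_line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
print(rle_encode(scan_line))   # 12WB12W3B24WB14W
```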
Shift code:
A shift code is generated by
• Arranging the source symbols so that their probabilities are monotonically decreasing,
• Dividing the total number of symbols into symbol blocks of equal size,
• Coding the individual elements within all blocks identically, and
• Adding special shift-up or shift-down symbols to identify each block. Each time a shift-up or shift-down symbol is recognized at the decoder, it moves one block up or down with respect to a pre-defined reference block. (A small illustrative sketch follows this list.)
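The following Python sketch illustrates one simple shift-up-only variant of this idea; the function, the symbol names, and the block size are illustrative assumptions, not a construction taken from the slides:

```python
import math

def shift_code(symbols, block_size):
    """Illustrative shift-up code for symbols sorted by decreasing probability.

    Within a block, each symbol gets the same fixed-length binary code
    word; one extra code word (all ones) is reserved as the shift-up
    symbol that moves the decoder to the next block.
    """
    width = math.ceil(math.log2(block_size + 1))  # room for block + shift word
    shift_up = "1" * width                        # reserved shift symbol
    codes = {}
    for i, sym in enumerate(symbols):
        block, pos = divmod(i, block_size)        # which block, position in it
        codes[sym] = shift_up * block + format(pos, f"0{width}b")
    return codes

# Seven hypothetical symbols, blocks of three: s3..s6 need shift prefixes
print(shift_code(["s0", "s1", "s2", "s3", "s4", "s5", "s6"], 3))
```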
Arithmetic coding
• Unlike the variable-length codes described previously, arithmetic coding generates non-block codes.
• In arithmetic coding, a one-to-one correspondence between source symbols and code words does not exist.
• Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic code word.
• Arithmetic coding is a widely used entropy coder; its only problem is its speed, but its compression tends to be better than Huffman coding can achieve.
• The code word itself defines an interval of real numbers between
0 and 1
• As the number of symbols in the message increases, the interval
used to represent it becomes smaller and the number of
information units (say, bits) required to represent the interval
becomes larger
• Each symbol of the message reduces the size of the interval in
accordance with the probability of occurrence.
• It approaches the limit set by the entropy.
• The idea behind arithmetic coding is to have a probability line from 0 to 1,
• and to assign to every symbol a range on this line based on its probability:
• the higher the probability, the larger the range assigned to the symbol.
• Once the ranges and the probability line are defined, we start to encode symbols:
• each symbol narrows the range in which the output floating-point number must land.
Example
Symbol   Probability   Range
a        0.5           [0.0, 0.5)
b        0.25          [0.5, 0.75)
c        0.25          [0.75, 1.0)
Algorithm to compute the output number
• low = 0
• high = 1
• Loop, for all the symbols:
  range = high − low
  high = low + range × high_range of the symbol being coded
  low = low + range × low_range of the symbol being coded
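Here is a direct Python transcription of this loop, using the ranges from the example table (the function name is ours; plain floats stand in for the arbitrary-precision arithmetic a production coder would need):

```python
# Ranges from the example table above
ranges = {"a": (0.0, 0.5), "b": (0.5, 0.75), "c": (0.75, 1.0)}

def arithmetic_encode(message):
    """Return the final [low, high) interval; any number inside it
    (conventionally `low`) identifies the message."""
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low                     # current interval width
        low, high = low + span * ranges[sym][0], low + span * ranges[sym][1]
    return low, high

low, high = arithmetic_encode("baca")
print(low, high)   # 0.59375 0.609375 -> output number 0.59375
```

Running it on the message b a c a reproduces the trace in the table that follows.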
Symbol    Range     Low value   High value
(start)   —         0           1
b         1         0.5         0.75
a         0.25      0.5         0.625
c         0.125     0.59375     0.625
a         0.03125   0.59375     0.609375
The output number will be 0.59375.
Arithmetic coding
Let the message to be encoded be a1a2a3a3a4, with the source model
a1: 0.2, range [0.0, 0.2);  a2: 0.2, range [0.2, 0.4);  a3: 0.4, range [0.4, 0.8);  a4: 0.2, range [0.8, 1.0)
The interval narrows with each symbol:
after a1: [0, 0.2); after a2: [0.04, 0.08); after a3: [0.056, 0.072); after a3: [0.0624, 0.0688); after a4: [0.06752, 0.0688)
So, any number in the interval [0.06752, 0.0688), for example 0.068, can be used to represent the message.
Decode 0.39.
Since 0.4 > 0.39 ≥ 0.2, the first symbol is a2. Decoding then rescales the code word into that range and repeats: (0.39 − 0.2)/0.2 = 0.95 lies in [0.8, 1.0), so the second symbol is a4; (0.95 − 0.8)/0.2 = 0.75 lies in [0.4, 0.8), so the third symbol is a3; and so on, until a known message length or an end-of-message symbol stops the process.
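Decoding can be sketched the same way in Python: find the range containing the code word, emit that symbol, rescale, and repeat. The sketch below (names are ours; a real coder needs arbitrary precision and a proper stopping rule) recovers a1a2a3a3a4 from the value 0.068 used in the encoding example:

```python
# Source model from the a1..a4 example
model = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def arithmetic_decode(value, n_symbols):
    """Decode `n_symbols` symbols from a code word in [0, 1)."""
    out = []
    for _ in range(n_symbols):
        for sym, (lo, hi) in model.items():
            if lo <= value < hi:                   # range containing the value
                out.append(sym)
                value = (value - lo) / (hi - lo)   # rescale into [0, 1)
                break
    return out

print(arithmetic_decode(0.068, 5))   # ['a1', 'a2', 'a3', 'a3', 'a4']
```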