Chapter 8 – Image Compression

Instructor: Dr J. Shanbehzadeh
[email protected]

R. C. Gonzalez, and R. E. Woods, Digital Image Processing, New Jersey: Prentice Hall, 3rd edition, 2008.
Table of Contents

8.1 Fundamentals
  8.1.1 Coding Redundancy
  8.1.2 Spatial and Temporal Redundancy
  8.1.3 Irrelevant Information
  8.1.4 Measuring Image Information
  8.1.5 Fidelity Criteria
  8.1.6 Image Compression Models
  8.1.7 Image Formats, Containers, and Compression Standards
Image Compression

Preview

Image compression, the art and science of reducing the amount of data required to represent an image, is one of the most useful and commercially successful technologies in the field of digital image processing. The number of images that are compressed and decompressed daily is staggering, and the compressions and decompressions themselves are virtually invisible to the user. Anyone who owns a digital camera, surfs the web, or watches the latest Hollywood movies on digital video disks (DVDs) benefits from the algorithms and standards discussed in this chapter.

To better understand the need for compact image representations, consider the amount of data required to represent a two-hour standard definition (SD) television movie using 720×480×24-bit pixel arrays. A digital movie (or video) is a sequence of video frames in which each frame is a full-color still image. Because video players must display the frames sequentially at rates near 30 fps (frames per second), SD digital video data must be accessed at
Image Compression

720 × 480 pixels/frame × 3 bytes/pixel × 30 frames/s = 31,104,000 bytes/s

Over two hours, this amounts to roughly 2.24 × 10^11 bytes, or 224 GB (gigabytes) of data. Twenty-seven 8.5 GB dual-layer DVDs (assuming conventional 12 cm disks) are needed to store it. To put a two-hour movie on a single DVD, each frame must be compressed, on average, by a factor of 26.3. The compression must be even higher for high definition (HD) television, where image resolutions reach 1920×1080×24 bits/image.
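The SD-movie arithmetic above can be checked directly. This is a small sketch of that calculation: the data rate, the two-hour total, and the average per-frame compression factor needed to fit one 8.5 GB DVD.

```python
import math

# Sketch of the SD-movie arithmetic: 720x480 pixels, 3 bytes (24 bits)
# per pixel, 30 frames per second, for a two-hour movie.
bytes_per_sec = 720 * 480 * 3 * 30        # SD access rate in bytes/s
total = bytes_per_sec * 60 * 60 * 2       # two hours, in bytes
print(bytes_per_sec)                      # 31104000
print(total)                              # 223948800000, about 224 GB
print(math.ceil(total / 8.5e9))           # 27 dual-layer DVDs needed
print(round(total / 8.5e9, 1))            # 26.3 -> required compression factor
```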
8.1 Fundamentals
The term data compression refers to the process of reducing the amount of data required to represent a given quantity of information. In this definition, data and information are not the same thing; data are the means by which information is conveyed. Because various amounts of data can be used to represent the same amount of information, representations that contain irrelevant or repeated information are said to contain redundant data. If we let b and b′ denote the number of bits (or information-carrying units) in two representations of the same information, the relative data redundancy R of the representation with b bits is

R = 1 − 1/C

where C, commonly called the compression ratio, is defined as C = b/b′.
8.1 Fundamentals

If C = 10 (sometimes written 10:1), for instance, the larger representation has 10 bits of data for every 1 bit of data in the smaller representation. The corresponding relative data redundancy of the larger representation is R = 1 − 1/10 = 0.9, indicating that 90% of its data is redundant.
8.1.1 Coding Redundancy

In Chapter 3, we developed techniques for image enhancement by histogram processing, assuming that the intensity values of an image are random quantities. In this section, we use a similar formulation to introduce optimal information coding. Assume that a discrete random variable r_k in the interval [0, L−1] is used to represent the intensities of an M×N image and that each r_k occurs with probability p_r(r_k). As in Section 3.3,

p_r(r_k) = n_k / MN,   k = 0, 1, 2, …, L−1

where L is the number of intensity values, and n_k is the number of times that the kth intensity appears in the image. If the number of bits used to represent each value of r_k is l(r_k), then the average number of bits required to represent each pixel is

L_avg = Σ_{k=0}^{L−1} l(r_k) p_r(r_k)
8.1.1 Coding Redundancy -2
That is ,the average length of the code words assigned to the various intensity
values is found by summing the products of the number of bits used to
represent each intensity and the probability that the intensity occurs.the total
R. C. Gonzalez, and R. E. Woods, Digital Image Processing, New Jersey: Prentice Hall, 3rd edition, 2008.
10
On the other hand, if the scheme designated as code 2 in Table 8.1 is used, the average length of the encoded pixels is, in accordance with Eq. (8.1-4), L_avg = 1.81 bits/pixel. The total number of bits needed to represent the entire image is MN·L_avg = 256×256×1.81, or 118,621. From Eqs. (8.1-2) and (8.1-1), the resulting compression and corresponding relative redundancy are

C = (256 × 256 × 8) / 118,621 ≈ 4.42

and

R = 1 − 1/4.42 = 0.774
8.1.1 Coding Redundancy - 4

respectively. Thus, 77.4% of the data in the original 8-bit 2-D intensity array is redundant. The compression achieved by code 2 results from assigning fewer bits to the more probable intensity values than to the less probable ones. In the resulting variable-length code, r_128 (the image's most probable intensity) is assigned the 1-bit code word 1 [of length l_2(r_128) = 1], while r_255 (its least probably occurring intensity) is assigned the 3-bit code word 001 [of length l_2(r_255) = 3]. Note that the best fixed-length code that can be assigned to the intensities of the image in Fig. 8.1(a) is the natural 2-bit counting sequence {00, 01, 10, 11}, but the resulting compression is only 8/2 or 4:1, about 10% less than the 4.42:1 compression of the variable-length code.

As the preceding example shows, coding redundancy is present when the codes assigned to a set of events (such as intensity values) do not take full advantage of the probabilities of the events. Coding redundancy is almost always present when the intensities of an image are represented using a natural binary code. The reason is that most images are composed of objects that have a regular and somewhat predictable morphology (shape) and reflectance, and are sampled so that the objects being depicted are much larger than the picture elements. The natural consequence is that, for most images, certain intensities are more probable than others (that is, the histograms of most images are not uniform). A natural binary encoding assigns the same number of bits to both the most and least probable values, failing to minimize Eq. (8.1-4) and resulting in coding redundancy.
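The average-length and redundancy calculations of this example can be sketched in a few lines. The probabilities and code-2 word lengths below are illustrative values consistent with the 1.81 bits/pixel result quoted above; the full Table 8.1 is not reproduced in these notes.

```python
# Sketch: average code length, compression ratio, and relative redundancy
# (Eqs. 8.1-4, 8.1-2, and 8.1-1) for a four-intensity image. The numbers
# are illustrative values consistent with the example's 1.81 bits/pixel.
probs   = [0.25, 0.47, 0.25, 0.03]   # p_r(r_k) for the four intensities
lengths = [2, 1, 3, 3]               # l(r_k): code-2 word lengths in bits

L_avg = sum(p * l for p, l in zip(probs, lengths))  # Eq. (8.1-4)
C = 8 / L_avg                                       # vs. a fixed 8-bit code
R = 1 - 1 / C                                       # Eq. (8.1-1)
print(round(L_avg, 2), round(C, 2), round(R, 3))    # 1.81 4.42 0.774
```

Note how the single most probable intensity (probability 0.47, 1 bit) drives most of the savings.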
8.1.2 Spatial and Temporal Redundancy

Consider the computer-generated collection of constant intensity lines in Fig. 8.1(b). In the corresponding 2-D intensity array:

1. All 256 intensities are equally probable. As Fig. 8.2 shows, the histogram of the image is uniform.
2. Because the intensity of each line was selected randomly, its pixels are independent of one another in the vertical direction.
3. Because the pixels along each line are identical, they are maximally correlated (completely dependent on one another) in the horizontal direction.

The first observation tells us that the image in Fig. 8.1(b), when represented as a conventional 8-bit intensity array, cannot be compressed by variable-length coding alone. Unlike the image in Fig. 8.1(a) (and Example 8.1), whose histogram was not uniform, a fixed-length 8-bit code in this case minimizes Eq. (8.1-4). Observations 2 and 3 reveal a significant spatial redundancy that can be eliminated, for instance, by representing the image in Fig. 8.1(b) as a sequence of run-length pairs, where each run-length pair specifies the start of a new intensity and the number of consecutive pixels that have that intensity. A run-length based representation compresses the original 2-D, 8-bit intensity array by (256×256×8)/[(256+256)×8] or 128:1. Each 256-pixel line of the original representation is replaced by a single 8-bit intensity value and length pair.
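The run-length idea can be sketched directly. This is a minimal encoder for one row, plus the 128:1 ratio computed as above; the helper name is ours, not from the text.

```python
# Sketch: run-length encoding of a constant-intensity row, as in the
# Fig. 8.1(b) discussion. Each run is stored as an (intensity, length) pair.
def run_length_encode(row):
    """Return a list of (value, run_length) pairs for a 1-D sequence."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((v, 1))               # start a new run
    return runs

row = [173] * 256                  # one constant 256-pixel line
print(run_length_encode(row))      # [(173, 256)]

# 256 rows, each reduced to one 8-bit (value, length) pair -> 128:1
C = (256 * 256 * 8) / ((256 + 256) * 8)
print(C)                           # 128.0
```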
8.1.2 Spatial and Temporal Redundancy - 2

In most images, pixels are correlated spatially (in both x and y) and in time (when the image is part of a video sequence). Because most pixel intensities can be reasonably predicted from the values of neighboring pixels, the information carried by any individual pixel is relatively small, and much of its visual contribution is redundant.
8.1.3 Irrelevant Information - 2

We conclude the section by noting that the redundancy examined here is fundamentally different from the redundancies discussed in Sections 8.1.1 and 8.1.2. Its elimination is possible because the information itself is not essential for normal visual processing and/or the intended use of the image. Because its omission results in a loss of quantitative information, its removal is commonly referred to as quantization. This terminology is consistent with normal use of the word, which generally means the mapping of a broad range of input values to a limited number of output values (see Section 2.4). Because information is lost, quantization is an irreversible operation.
8.1.4 Measuring Image Information

In the previous sections, we introduced several ways to reduce the amount of data used to represent an image. The question that naturally arises is this: how few bits are actually needed to represent the information in an image? That is, is there a minimum amount of data that is sufficient to describe an image without losing information? Information theory provides the mathematical framework to answer this and related questions. Its fundamental premise is that the generation of information can be modeled as a probabilistic process that can be measured in a manner that agrees with intuition. In accordance with this supposition, a random event E with probability P(E) is said to contain

I(E) = log(1/P(E)) = −log P(E)
8.1.4 Measuring Image Information - 2

units of information. If P(E) = 1 (that is, the event always occurs), I(E) = 0 and no information is attributed to it. Because no uncertainty is associated with the event, no information would be transferred by communicating that the event has occurred. Given a source of statistically independent random events from a discrete set of possible events {a_1, a_2, …, a_J} with associated probabilities {P(a_1), P(a_2), …, P(a_J)}, the average information per source output, called the entropy of the source, is

H = −Σ_{j=1}^{J} P(a_j) log P(a_j)

The a_j in this equation are called source symbols. Because they are statistically independent, the source itself is called a zero-memory source. If an image is considered to be the output of an imaginary zero-memory "intensity source," we can use the histogram of the observed image to estimate the symbol probabilities of the source. Then the intensity source's entropy becomes

H̃ = −Σ_{k=0}^{L−1} p_r(r_k) log_2 p_r(r_k)
8.1.4 Measuring Image Information - 3

where variables L, r_k, and p_r(r_k) are as defined in Sections 8.1.1 and 3.3. Because the base-2 logarithm is used, Eq. (8.1-7) is the average information per intensity output of the imaginary intensity source in bits. It is not possible to code the intensity values of the imaginary source (and thus the sample images) with fewer than H̃ bits/pixel.

□ The entropy of the image in Fig. 8.1(a) can be estimated by substituting the intensity probabilities from Table 8.1 into Eq. (8.1-7), yielding 1.6614 bits/pixel. In a similar manner, the entropies of the images in Figs. 8.1(b) and (c) can be shown to be 8 bits/pixel and 1.566 bits/pixel, respectively. Note that the image in Fig. 8.1(a) appears to have the most visual information, but has almost the lowest computed entropy, 1.6614 bits/pixel. The image in Fig. 8.1(b) has almost five times the entropy of the image in (a), but appears to have about the same (or less) visual information; and the image in Fig. 8.1(c), which seems to have little or no information, has almost the same entropy as the image in (a). The obvious conclusion is that the amount of entropy, and thus information, in an image is far from intuitive.
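Entropy estimation from a normalized histogram, Eq. (8.1-7), is a one-liner. This sketch uses four illustrative probabilities consistent with the Table 8.1 example; the uniform case reproduces the 8 bits/pixel quoted for the image in Fig. 8.1(b).

```python
import math

# Sketch: estimating entropy (Eq. 8.1-7) from normalized histogram
# probabilities p_r(r_k). The four-symbol probabilities are illustrative
# values consistent with the Table 8.1 example.
def entropy(probs):
    """H = -sum p*log2(p), the average information in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(round(entropy([0.25, 0.47, 0.25, 0.03]), 2))   # about 1.66
print(entropy([1 / 256] * 256))                      # 8.0 (uniform histogram)
```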
Shannon’s first theorem
Recall that the variable-length code in Example 8.1 was able represent the intensities of
the image in fig.8.1(a)using only 1.81bits/pixel .Although this is higher than 1.6614
R. C. Gonzalez, and R. E. Woods, Digital Image Processing, New Jersey: Prentice Hall, 3rd edition, 2008.
bits/pixel entropy estimate from example 8.2 shannons first theorem –also called the
noiseless coding theorem (shannon[1984])-assures us that the image in fig.8.1(a)can
be represented with as few as 1.6614 bits/pixel. To prove it in a general
way ,shannon looked at representing groups of n consecutive source symbols with a
single code word (rather than one code word per source symbol)and showed that.
20
Shannon's First Theorem - 2

where L_avg,n is the average number of code symbols required to represent all n-symbol groups. In the proof, he defined the nth extension of a zero-memory source to be the hypothetical source that produces n-symbol blocks using the symbols of the original source, and computed L_avg,n by applying Eq. (8.1-4) to the code words used to represent the n-symbol blocks. Equation (8.1-8) tells us that L_avg,n/n can be made arbitrarily close to H by encoding infinitely long extensions of the single-symbol source. That is, it is possible to represent the output of a zero-memory source with an average of H information units per source symbol.

If we now return to the idea that an image is a "sample" of the intensity source that produced it, a block of n source symbols corresponds to a group of n adjacent pixels. To construct a variable-length code for n-pixel blocks, the relative frequencies of the blocks must be computed. But the nth extension of a hypothetical intensity source with 256 intensity values has 256^n possible n-pixel blocks. Even in the simple case of n = 2, a 65,536-element histogram and up to 65,536 variable-length code words must be generated; for n = 3, as many as 16,777,216 code words are needed. So even for small values of n, computational complexity limits the usefulness of the extension coding approach in practice.

Finally, we note that although Eq. (8.1-7) provides a lower bound on the compression that can be achieved when coding statistically independent pixels directly, it breaks down when the pixels of an image are correlated. Blocks of correlated pixels can be coded with fewer average bits per pixel than the equation predicts. Rather than using source extensions, less correlated descriptors (like intensity run-lengths) are normally selected and coded without extension. This was the approach used to compress Fig. 8.1(b) in Section 8.1.2. When the output of a source of information depends on a finite number of preceding outputs, the source is called a Markov or finite-memory source.
8.1.5 Fidelity
In section 8.1.3.it was noted that the removal of irrelevant visual”information
involves a loss of real or quantitative image information.because information is lost ,a
means of quantifying the nature of the loss is needed .two types of criteria can be
R. C. Gonzalez, and R. E. Woods, Digital Image Processing, New Jersey: Prentice Hall, 3rd edition, 2008.
used for such an assessment 1) objective fidelity criterian and (2)subjective fidelity
.criteria
When information loss can be expressed as a mathematical function of the input and
output of a compression process,it is said to be based on an objective fidelity
criterion.An example is the root –mean –square(rms)error between two images.let f
(x,y) be an input image and f (x,y) be an approximation of f (x,y) that results from
compressing and subsequenty decompressing the input.for any value of x and y ,the
error e (x,y) between f (x,y) and f (x,y ) is
Where the images are of size M×N. the root –mean-square error ,e rms,between f (x,y)
and f (x,y) is then the square root of the squared error averaged over the M×N
array ,or
22
8.1.5 Fidelity – 2
If f (x,y)is considered [by a simple rearrangement of the terms in Eq.(8.1-9)]to be the
sum of the original image f (x,y) and an error or “noise”signal e (x,y),the mean-
square signal – to noise ratio of the output image ,denoted SNRms ,can be defined as
in section 5.8:
The rms value of the signal – to – noise ratio,denoted SNRrms ,is obtained by taking
the square root of Eq.(8.1-11)
While objective fidelity criteria offer a simple and convenient way to evaluate
information loss ,decompressed images are ultimately viewed by
humans,someasuring image quality by the subjective evaluations of people is often
more appropriate.this can be done by presenting a decompressed image to a cross
section of viewers and averaging their evaluations .the evaluations may be made
using an absolute rating scale or by means of side-by-side comparisons of f (x,y) and f
(x,y).table 8.2 shows one possible absolute rating scale .sid-by –side comparisons can
be done with a scale such as {-3,-2,-1,0,1,2,3,} to represent the subjective
evaluations {much worse,worse,slightly worse,the same,slightly better ,better,much
better},respectively.in either case ,the evaluations are based on subjective fidelity
criteria.
23
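The objective criteria above reduce to a few lines of code. This sketch computes e_rms (Eq. 8.1-10) and SNR_ms (Eq. 8.1-11) for two tiny hypothetical 2×2 "images"; the data values are made up for illustration.

```python
import math

# Sketch: objective fidelity criteria for an original image f and a
# decompressed approximation fh, both given as 2-D lists of intensities.
def rms_error(f, fh):
    """Root-mean-square error, Eq. (8.1-10)."""
    M, N = len(f), len(f[0])
    sq = sum((fh[x][y] - f[x][y]) ** 2 for x in range(M) for y in range(N))
    return math.sqrt(sq / (M * N))

def snr_ms(f, fh):
    """Mean-square signal-to-noise ratio, Eq. (8.1-11)."""
    num = sum(v ** 2 for row in fh for v in row)
    den = sum((fh[x][y] - f[x][y]) ** 2
              for x in range(len(f)) for y in range(len(f[0])))
    return num / den

f  = [[100, 102], [98, 101]]      # hypothetical original
fh = [[101, 100], [98, 103]]      # hypothetical decompressed output
print(rms_error(f, fh))           # errors 1, -2, 0, 2 -> sqrt(9/4) = 1.5
```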
8.1.5 Fidelity Criteria - 3

□ Figure 8.4 shows three different approximations of the image in Fig. 8.1(a). Using Eq. (8.1-10) with Fig. 8.1(a) for f(x, y) and the images in Figs. 8.4(a) through (c) as f̂(x, y), the computed rms errors are 5.17, 15.67, and 14.17 intensity levels, respectively. In terms of rms error, an objective fidelity criterion, the three images in Fig. 8.4 are ranked in order of decreasing quality as {(a), (c), (b)}.
8.1.5 Fidelity - 4
R. C. Gonzalez, and R. E. Woods, Digital Image Processing, New Jersey: Prentice Hall, 3rd edition, 2008.
Figures 8.4(a) and(b) are typical of images that have been compressed and
subsequently reconstructed.both retain the essential information of the original
image-like the spetial and intensity characteristics of its objects.
And their rms errors correspond roughly to perceived quality.figure 8.4(a),which is
practically as good as the original image,has the lowest rms error,while fig.8.4(b) has
more error but noticeable degradation at the bound aries between objects.this is
exactly as one would expect.
Figure 8.4 (c) is an artificially generated image that demonstrates the limitation of
objective fidelity criteria.note that the image is missing large sections of several
important lines(i.e,visual information) and has small dark squares (i.e.,artifacts) in
the upper right quadrant.the visual content of the image is misleading and certainly
not as accurate as the image in (b) ,but it has less rms error-14.17 versus 15.76
intensity values.A subjective evaluation of the three images using table 8.2 might
yield an excellent rating for(a) a passable or marginal rating for (b) and an inferior of
unusable rating for (c) the rms error measure ,on the other hand,ranks (c) ahead 25
8.1.6 Image Compression Models

As Fig. 8.5 shows, an image compression system is composed of two distinct functional components: an encoder and a decoder. The encoder performs compression, and the decoder performs the complementary operation of decompression. Both operations can be performed in software, as is the case in web browsers and many commercial image editing programs, or in a combination of hardware and firmware, as in commercial DVD players. A codec is a device or program that is capable of both encoding and decoding. Input image f(x, …) is fed into the encoder, which creates a compressed representation of the input. This representation is stored for later use, or transmitted for storage and use at a remote location. When the compressed representation is presented to its complementary decoder, a reconstructed output image f̂(x, …) is generated. In still-image applications, the encoded input and decoder output are f(x, y) and f̂(x, y), respectively; in video applications, they are f(x, y, t) and f̂(x, y, t), where discrete parameter t specifies time. In general, f̂(x, …) may or may not be an exact replica of f(x, …). If it is, the compression system is called error free, lossless, or information preserving. If not, the reconstructed output image is distorted and the compression system is referred to as lossy.
The Encoding or Compression Process

The encoder of Fig. 8.5 is designed to remove the redundancies described in Sections 8.1.1 through 8.1.3 via a series of three independent operations. In the first stage of the encoding process, a mapper transforms f(x, …) into a (usually nonvisual) format designed to reduce spatial and temporal redundancy.
8.1.7 Image Formats , Containers , and Compression
Standard
In the context of digital imagine,an image file format is a stsndard way to organize
and store image data.it defines how the data is arranged and the type of
compression- if any –that is used .An image container is similar to a file format but
R. C. Gonzalez, and R. E. Woods, Digital Image Processing, New Jersey: Prentice Hall, 3rd edition, 2008.
29
Also included are entries that are not sanctioned by an international standards organization; they are shown in gray in Fig. 8.6 to denote this. Tables 8.3 and 8.4 summarize the standards, identifying their applications and key compression methods. The compression methods themselves are the subject of the next section. In both tables, forward references to the relevant subsections of Section 8.2 are enclosed in square brackets.
Problems

Problem 8.1
Problem 8.2
Problem 8.3
Problem 8.4
Problem 8.5
8.2 Some Basic Compression Methods

8.2.1 Huffman Coding
8.2.2 Golomb Coding
8.2.3 Arithmetic Coding
8.2.4 LZW Coding
8.2.5 Run-Length Coding
8.2.6 Symbol-Based Coding
8.2.7 Bit-Plane Coding
8.2.8 Block Transform Coding
8.2.9 Predictive Coding
8.2.10 Wavelet Coding
8.2.1 Huffman Coding

One of the most popular techniques for removing coding redundancy is due to Huffman (Huffman [1952]). When coding the symbols of an information source individually, Huffman coding yields the smallest possible number of code symbols per source symbol. In terms of Shannon's first theorem (see Section 8.1.4), the resulting code is optimal for a fixed value of n, subject to the constraint that the source symbols be coded one at a time. In practice, the source symbols may be either the intensities of an image or the output of an intensity mapping operation (pixel differences, run lengths, and so on).

The first step in Huffman's approach is to create a series of source reductions by ordering the probabilities of the symbols under consideration and combining the lowest probability symbols into a single symbol that replaces them in the next source reduction. Figure 8.7 illustrates this process for binary coding (K-ary Huffman codes can also be constructed). At the far left, a hypothetical set of source symbols and their probabilities are ordered from top to bottom in terms of decreasing probability values. To form the first source reduction, the bottom two probabilities, 0.06 and 0.04, are combined to form a "compound symbol" with probability 0.1. This compound symbol and its associated probability are placed in the first source reduction column so that the probabilities of the reduced source also are ordered from the most to the least probable. This process is then repeated until a reduced source with two symbols (at the far right) is reached.
8.2.1 Huffman Coding - 2

The second step in Huffman's procedure is to code each reduced source, starting with the smallest source and working back to the original source. The minimal length binary code for a two-symbol source, of course, consists of the symbols 0 and 1. As Fig. 8.8 shows, these symbols are assigned to the two symbols on the right (the assignment is arbitrary; reversing the order of the 0 and 1 would work just as well). As the reduced source symbol with probability 0.6 was generated by combining two symbols in the reduced source to its left, the 0 used to code it is now assigned to both of these symbols, and a 0 and 1 are arbitrarily appended to each to distinguish them from each other. This operation is then repeated for
each reduced source until the original source is reached. The final code appears at the far left in Fig. 8.8. The average length of this code is

L_avg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits/symbol

and the entropy of the source is 2.14 bits/symbol.

Huffman's procedure creates the optimal code for a set of symbols and probabilities subject to the constraint that the symbols be coded one at a time. After the code has been created, coding and/or error-free decoding is accomplished in a simple lookup table manner. The code itself is an instantaneous uniquely decodable block code. It is called a block code because each source symbol is mapped into a fixed sequence of code symbols. It is instantaneous because each code word in a string of code symbols can be decoded without referencing succeeding symbols. It is uniquely decodable because any string of code symbols can be decoded in only one way. Thus, any string of Huffman encoded symbols can be decoded by examining the individual symbols of the string in a left-to-right manner. For the binary code of Fig. 8.8, a left-to-right scan of the encoded string 010100111100 reveals that the first valid code word is 01010, which is the code for symbol a_3. The next valid code is 011, which corresponds to symbol a_1. Continuing in this manner reveals the completely decoded message to be a_3 a_1 a_2 a_2 a_6.
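The two-step reduction procedure can be sketched with a priority queue. Only the six probabilities come from the example; the symbol names and tie-breaking order are ours. Because ties can be broken differently, the resulting code lengths may differ from those in Fig. 8.8, but any such code is equally optimal and L_avg is 2.2 bits/symbol either way.

```python
import heapq
import itertools

# Sketch: binary Huffman code construction via repeated source reduction.
# Probabilities follow the Fig. 8.7/8.8 example; symbol names are ours.
def huffman_code(probs):
    """Map each symbol to a binary code word."""
    counter = itertools.count()               # tie-breaker for equal probs
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)       # two least probable entries
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}   # prepend branch bits
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {"a2": 0.4, "a6": 0.3, "a1": 0.1, "a4": 0.1, "a3": 0.06, "a5": 0.04}
code = huffman_code(probs)
L_avg = sum(probs[s] * len(w) for s, w in code.items())
print(sorted(len(w) for w in code.values()))  # lengths depend on tie-breaking
print(round(L_avg, 2))                        # 2.2
```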
□ The 512×512×8-bit monochrome image in Fig. 8.9(a) has the intensity histogram shown in Fig. 8.9(b). Because the intensities are not equally probable, a variable-length Huffman code can eliminate some of its coding redundancy.
8.2.2 Golomb Coding
In this section we consider the coding of nonnegative integer inputs with exponentially decaying probability
distributions.inputs of this type can be optimally encoded (in the sense of shannons first theorem)using a
family of codes that are computationally simpler than huffman codes.the codes them selves were first
R. C. Gonzalez, and R. E. Woods, Digital Image Processing, New Jersey: Prentice Hall, 3rd edition, 2008.
proposed for the representation of nonnegative run lengths (Golomb [1966]).in the disscussion that
follows,the notation [x] denotes the largest integer less than or equal to x,[x] means the smallest intenger
grater than or equal to x and x mod y is the remainder of x divided by y.
Given a nonnegative integer n and a positive intenger divisor m >0,the golomb code of quotient[n/m] and
the binary representation of remainder n mod m.G m(n) is constructed as follows:
Step 1. Form the unary code of quotient [n/m] .(the unary code of an integer q is defined as q 1s
followed by a 0 .)
Step 2 . let k = [log m] ,c =2k –m , r =n mod m , and compute truncated remainder r
2
such that
table 8.5 list the G1,G2,and G4 codes of the first ten nonnegative integers.Because m is a power of 2 in each
case (i.e.,1=20,2=21 , and 4=22), they are the first three Golomb –Rice codes as well .Moreover ,G 1 is the
unary code of the nonnegative integers because [n/1] =n and n mod 1 =0 for all n.
Keeping in mind that Golomb codes can only be used to represent nonnegative integers and that there are
many Golomb codes to choose from ,a key step in their effective application is the selection of divisor
m.when the integers to be represented are geometrically distributed with probability mass function (PMF)
FOR SOME 0<1 , Golomb codes can be shown to be optimal – in the sense that G m (n) provides the shortest
averge code length of all uniqualy decipher able codes –when (Gallager and voorhis [1975])
41
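Steps 1 through 3 above translate directly into code. This sketch follows that construction; the spot checks use the standard Golomb-Rice cases, where G_1 reduces to the unary code and a power-of-two m yields a plain k-bit remainder.

```python
import math

# Sketch of the G_m(n) construction (steps 1-3 above): unary quotient,
# truncated binary remainder, concatenation.
def golomb(n, m):
    """Return the Golomb code G_m(n) as a bit string."""
    q, r = divmod(n, m)
    unary = "1" * q + "0"              # step 1: unary code of the quotient
    k = math.ceil(math.log2(m))
    c = 2 ** k - m
    if r < c:                          # step 2: truncated remainder r'
        bits = k - 1
    else:
        r += c
        bits = k
    rem = format(r, "b").zfill(bits)[-bits:] if bits > 0 else ""
    return unary + rem                 # step 3: concatenation

print(golomb(0, 1), golomb(3, 1))      # 0 1110 (G_1 is the unary code)
print(golomb(5, 4))                    # 1001 (quotient 1, remainder 01)
```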
Figure 8.10(a) plots Eq. (8.2-2) for three values of ρ and illustrates graphically the symbol probabilities that Golomb codes handle well (that is, code efficiently). As is shown in the figure, small integers are much more probable than large ones.

Because the probabilities of the intensities in an image [see, for example, the histogram of Fig. 8.9(b)] are unlikely to match the probabilities specified in Eq. (8.2-2) and shown in Fig. 8.10(a), Golomb codes are seldom used for the coding of intensities. When intensity differences are to be coded, however, the
probabilities of the resulting "difference values" (see Section 8.2.9), with the notable exception of the negative differences, often resemble those of Eq. (8.2-2) and Fig. 8.10(a). To handle negative differences in Golomb coding, which can only represent nonnegative integers, a mapping like

M(n) = 2n,          n ≥ 0
M(n) = 2|n| − 1,    n < 0
typically is used. Using this mapping, for example, the two-sided PMF shown in Fig. 8.10(b) can be transformed into the one-sided PMF in Fig. 8.10(c). Its integers are reordered, alternating the negative and positive integers so that the negative integers are mapped into the odd positive integer positions. If P(n) is two-sided and centered at zero, P(M(n)) will be one-sided. The mapped integers, M(n), can then be efficiently encoded using an appropriate Golomb-Rice code (Weinberger et al. [1996]).

□ Consider again the image from Fig. 8.1(c) and note that its histogram, see Fig. 8.3(a), is similar to the two-sided distribution in Fig. 8.10(b) above. If we let n be some nonnegative integer intensity in the image, where 0 ≤ n ≤ 255, and μ be the mean intensity, P(n − μ) is the two-sided distribution shown in Fig. 8.11(a). This plot was generated by normalizing the histogram in Fig. 8.3(a) by the total number of pixels in the image and shifting the normalized values to the left by 128 (which in effect subtracts the mean intensity from the image). In accordance with Eq. (8.2-4), P(M(n − μ)) is then the one-sided distribution shown in Fig. 8.11(b). If the mapped intensity values are Golomb coded using a MATLAB implementation of code G_1 in column 2 of Table 8.5, the encoded representation is 4.5 times smaller than the original image (i.e., C = 4.5). The G_1 code realizes 4.5/5.1 or 88% of the theoretical compression possible with variable-length coding. (Based on the entropy calculated in Example 8.2, the maximum possible compression ratio through variable-length coding is C = 8/1.566 ≈ 5.1.) Moreover, Golomb coding achieves 96% of the compression provided by a MATLAB implementation of Huffman's approach, and doesn't require the computation of a custom Huffman coding table.

Now consider the image in Fig. 8.9(a). If its intensities are Golomb coded using the same G_1 code as above, C = 0.0922; that is, there is data expansion.
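The alternating mapping of Eq. (8.2-4) is easy to sketch. The function name below is ours; the behavior is exactly the reordering described above, folding 0, −1, 1, −2, 2, … onto 0, 1, 2, 3, 4, …

```python
# Sketch of the mapping of Eq. (8.2-4): negative differences land on the
# odd positive integers, so Golomb codes (nonnegative inputs only) apply.
def zigzag(n):
    """Map an integer difference n to the nonnegative integer M(n)."""
    return 2 * n if n >= 0 else 2 * abs(n) - 1

diffs = [0, -1, 1, -2, 2, -3, 3]
print([zigzag(n) for n in diffs])   # [0, 1, 2, 3, 4, 5, 6]
```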
R. C. Gonzalez, and R. E. Woods, Digital Image Processing, New Jersey: Prentice Hall, 3rd edition, 2008.
This is because the intensity probabilities of the image in Fig. 8.9(a) are much different from
the probabilities defined in Eq. (8.2-2). In a similar manner, Huffman codes can produce data expansion
when used to encode symbols whose probabilities are different from those for which the code was
computed. In practice, the further you depart from the input probability assumptions for which a code is
designed, the greater the risk of poor compression performance and data expansion.
To conclude our coverage of Golomb codes, we note that column 5 of Table 8.5 contains the first 10 codes of
the zeroth-order exponential Golomb code, denoted Gexp(n). Exponential-Golomb codes are useful for the
encoding of run lengths, because both short and long runs are encoded efficiently. An order-k exponential-
Golomb code Gexp(n) is computed as follows:
Step 1. Find an integer i = ⌊log2(n/2^k + 1)⌋ (Eq. 8.2-5) and form the unary code of i. If k = 0,
i = ⌊log2(n + 1)⌋ and the code is also known as the Elias gamma code.
Step 2. Truncate the binary representation of n − 2^k(2^i − 1) (Eq. 8.2-6) to its k + i least significant bits
and append the result to the unary code of step 1.
For n = 8 and k = 0, for example, i = ⌊log2(8 + 1)⌋ = 3, so the unary code of 3 is 1110, and Eq. (8.2-6) of
step 2 yields 8 − (2^3 − 1) = 1, which when truncated to its 3 + 0 least significant bits becomes 001. The
concatenation of the results from steps 1 and 2 then yields 1110001. Note that this is the exponential-Golomb
entry in Table 8.5 for n = 8. Finally, we note that, like the Huffman codes of the last section, the Golomb
codes of Table 8.5 are variable-length, instantaneous, uniquely decodable block codes.
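The two steps above translate directly into code. The sketch below is a hedged illustration that follows the book's unary convention (i ones followed by a zero); the function name is my own.

```python
def exp_golomb(n, k=0):
    """Order-k exponential-Golomb code of a nonnegative integer n."""
    # Step 1: find i with 2^k (2^i - 1) <= n < 2^k (2^(i+1) - 1),
    # i.e. i = floor(log2(n / 2^k + 1)), and form the unary code of i.
    i = 0
    while (1 << k) * ((1 << (i + 1)) - 1) <= n:
        i += 1
    code = "1" * i + "0"                        # unary code of i
    # Step 2: binary form of n - 2^k (2^i - 1) in its k+i least significant bits.
    offset = n - (1 << k) * ((1 << i) - 1)
    if k + i:
        code += format(offset, "b").zfill(k + i)
    return code
```

Calling exp_golomb(8) reproduces the worked example: 1110001.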
8.2.3 Arithmetic Coding
Unlike the variable-length codes of the previous two sections, arithmetic coding generates nonblock
codes. In arithmetic coding, which can be traced to the work of Elias (see Abramson [1963]), a one-to-one
correspondence between source symbols and code words does not exist. Instead, an entire sequence of source
symbols (or message) is assigned a single arithmetic code word. The code word itself defines an interval of
real numbers between 0 and 1. As the number of symbols in the message increases, the interval used to
represent it becomes smaller and the number of information units (say, bits) required to represent the interval
becomes larger. Each symbol of the message reduces the size of the interval in accordance with its
probability of occurrence. Because the technique does not require, as does Huffman's approach, that each
source symbol translate into an integral number of code symbols (that is, that the symbols be coded one at a
time), it achieves (but only in theory) the bound established by Shannon's first theorem of Section 8.1.4.
Figure 8.12 illustrates the basic arithmetic coding process. Here, a five-symbol sequence or message,
a1 a2 a3 a3 a4, from a four-symbol source is coded. At the start of the coding process, the message is assumed
to occupy the entire half-open interval [0, 1). As Table 8.6 shows, this interval is subdivided initially into four
regions based on the probabilities of each source symbol. Symbol a1, for example, is associated with
subinterval [0, 0.2). Because it is the first symbol of the message being coded, the message interval is initially
narrowed to [0, 0.2). Thus in Fig. 8.12,
[0, 0.2) is expanded to the full height of the figure and its end points labeled by the values of the narrowed
range. The narrowed range is then subdivided in accordance with the original source symbol probabilities and
the process continues with the next message symbol. In this manner, symbol a2 narrows the subinterval to
[0.04, 0.08), a3 further narrows it to [0.056, 0.072), and so on. The final message symbol, which must be
reserved as a special end-of-message indicator, narrows the range to [0.06752, 0.0688). Of course, any
number within this subinterval, for example 0.068, can be used to represent the message.
In the arithmetically coded message of Fig. 8.12, three decimal digits are used to represent the five-symbol
message. This translates into 0.6 decimal digits per source symbol and compares favorably with the entropy
of the source, which, from Eq. (8.1-6), is 0.58 decimal digits per source symbol. As the length of the
sequence being coded increases, the resulting arithmetic code approaches the bound established by Shannon's
first theorem. In practice, two factors cause coding performance to fall short of the bound: (1) the addition
of the end-of-message indicator that is needed to separate one message from another, and (2) the use of finite
precision arithmetic. Practical implementations of arithmetic coding address the latter problem by introducing
a scaling strategy and a rounding strategy (Langdon and Rissanen [1981]). The scaling strategy renormalizes
each subinterval to the [0, 1) range before subdividing it in accordance with the symbol probabilities. The
rounding strategy guarantees that the truncations associated with finite precision arithmetic do not prevent
the coding subintervals from being represented accurately.
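The interval narrowing of Fig. 8.12 can be reproduced numerically. The probabilities below (a1 = a2 = a4 = 0.2, a3 = 0.4) are an assumption consistent with the subintervals quoted in the text; Table 8.6 itself is not reproduced in this excerpt.

```python
# Assumed source model consistent with the quoted subintervals.
intervals = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def arith_interval(message):
    """Narrow [0, 1) symbol by symbol; any number inside the final
    interval (e.g. 0.068) represents the whole message."""
    low, high = 0.0, 1.0
    for symbol in message:
        lo, hi = intervals[symbol]
        width = high - low
        low, high = low + width * lo, low + width * hi
    return low, high

low, high = arith_interval(["a1", "a2", "a3", "a3", "a4"])  # a4 = end of message
# final interval is [0.06752, 0.0688), matching the text
```

Note how each symbol shrinks the interval by exactly its probability, so the interval width after coding equals the product of the symbol probabilities.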
Adaptive, Context-Dependent Probability Estimates
With accurate input symbol probability models, that is, models that provide the true probabilities of the
symbols being coded, arithmetic coders are near optimal in the sense of minimizing the average number of
code symbols required to represent the symbols being coded. As in both Huffman and Golomb coding,
however, inaccurate probability models can lead to non-optimal results. A simple way to improve the
accuracy of the probabilities employed is to use an adaptive, context-dependent probability model. Adaptive
probability models update symbol probabilities as symbols are coded or become known. Thus, the
probabilities adapt to the local statistics of the symbols being coded. Context-dependent models provide
probabilities that are based on a predefined neighborhood of pixels, called the context, around the symbols
being coded. Normally, a causal context, one limited to symbols that have already been coded, is used. Both
the Q-coder (Pennebaker et al. [1988]) and MQ-coder (ISO/IEC [2000]), two well-known arithmetic coding
techniques that have been incorporated into the JBIG, JPEG-2000, and other important image compression
standards, use probability models that are adaptive and context dependent. The Q-coder dynamically
updates symbol probabilities during the interval renormalizations that are part of the arithmetic coding
process. Adaptive, context-dependent models also have been used in Golomb coding, for example in the
JPEG-LS compression standard.
Figure 8.13(a) diagrams the steps involved in adaptive, context-dependent arithmetic coding of binary source
symbols. Arithmetic coding often is used when binary symbols are to be coded. As each symbol (or bit)
begins the coding process, its context is formed in the context determination block of Fig. 8.13(a). Figures
8.13(b) through (d) show three possible contexts that can be used: (1) the immediately preceding symbol, (2) a
group of preceding symbols, and (3) some number of preceding symbols plus symbols on the previous scan
line. For the three cases shown, the probability estimation block must manage 2^1 (or 2), 2^8 (or 256), and
2^5 (or 32) contexts and their associated probabilities. For instance, if the context in Fig. 8.13(b) is
used, conditional probabilities
P(0|a = 0) (the probability that the symbol being coded is a 0, given that the preceding symbol
is a 0), P(1|a = 0), P(0|a = 1), and P(1|a = 1) must be tracked. The appropriate probabilities are then passed to
the arithmetic coding block as a function of the current context and drive the generation of the arithmetically
coded output sequence in accordance with the process illustrated in Fig. 8.12. The probabilities associated
with the context involved in the current coding step are then updated to reflect the fact that another symbol
within that context has been processed.
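A toy version of the probability estimation block for the single-symbol context of Fig. 8.13(b) might look as follows. The count-based update rule is a common illustrative choice, not the renormalization-driven estimator of the Q- or MQ-coder, and the class name is my own.

```python
from collections import defaultdict

class AdaptiveContextModel:
    """Track P(bit | previous bit), the single-symbol context of Fig. 8.13(b).

    Counts start at 1 so no probability is ever zero (a common smoothing
    choice, assumed here for illustration)."""
    def __init__(self):
        self.counts = defaultdict(lambda: [1, 1])  # context -> [zeros seen, ones seen]

    def probability(self, context, bit):
        counts = self.counts[context]
        return counts[bit] / sum(counts)

    def update(self, context, bit):
        self.counts[context][bit] += 1

model = AdaptiveContextModel()
prev = 0
for bit in [0, 0, 0, 1, 0, 0]:
    p = model.probability(prev, bit)   # would be passed to the arithmetic coder
    model.update(prev, bit)            # adapt to the local statistics
    prev = bit                         # causal context: the symbol just coded
```

After the loop, P(0 | previous bit = 0) has risen well above one half, reflecting the run of zeros in the input.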
Finally, we note that a variety of arithmetic coding techniques are protected by United States patents (and
may in addition be protected in other jurisdictions). Because of these patents and the possibility of
unfavorable monetary judgments for their infringement, most implementations of the JPEG compression
standard, which provides for both Huffman and arithmetic coding, typically support Huffman coding alone.
8.2.4 LZW Coding
The techniques covered in the previous sections are focused on the removal of coding redundancy. In this
section, we consider an error-free compression approach that also addresses spatial redundancies in an
image. The technique, called Lempel-Ziv-Welch (LZW) coding, assigns fixed-length code words to variable-
length sequences of source symbols. Recall from Section 8.1.4 that Shannon used the idea of coding
sequences of source symbols, rather than individual source symbols, in the proof of his first theorem. A key
feature of LZW coding is that it requires no a priori knowledge of the probability of occurrence of the
symbols to be encoded. Despite the fact that until recently it was protected under a United States patent,
LZW compression has been integrated into a variety of mainstream imaging file formats, including GIF,
TIFF, and PDF. The PNG format was created to get around LZW licensing requirements.
Consider again the 512×512, 8-bit image from Fig. 8.9(a). Using Adobe Photoshop, an uncompressed TIFF
version of this image requires 286,740 bytes of disk space: 262,144 bytes for the 512×512 8-bit pixels plus
24,596 bytes of overhead. Using TIFF's LZW compression option, however, the resulting file is 224,420
bytes. The compression ratio is C = 1.28. Recall that for the Huffman encoded representation of Fig. 8.9(a) in
Example 8.4, C = 1.077.
The additional compression realized by the LZW approach is due to the removal of some of the image's
spatial redundancy.
LZW coding is conceptually very simple (Welch [1984]). At the onset of the coding process, a codebook or
dictionary containing the source symbols to be coded is constructed. For 8-bit monochrome images, the first
256 words of the dictionary are assigned to intensities 0, 1, 2, …, 255. As the encoder sequentially examines
image pixels, intensity sequences that are not in the dictionary are placed in algorithmically determined
(e.g., the next unused) locations. If the first two pixels of the image are white, for instance, sequence "255-
255" might be assigned to location 256, the address following the locations reserved for intensity levels 0
through 255. The next time that two consecutive white pixels are encountered, code word 256, the address of
the location containing sequence 255-255, is used to represent them. If a 9-bit, 512-word dictionary is
employed in the coding process, the original (8+8) bits that were used to represent the two pixels are
replaced by a single 9-bit code word. Clearly, the size of the dictionary is an important system parameter. If it
is too small, the detection of matching intensity-level sequences will be less likely; if it is too large, the size
of the code words will adversely affect compression performance.
Consider the following 4×4, 8-bit image of a vertical edge:
Table 8.7 details the steps involved in coding its 16 pixels. A 512-word dictionary with the following
starting content is assumed:
The image is encoded by processing its pixels in a left-to-right, top-to-bottom manner. Each successive
intensity value is concatenated with a variable, column 1 of Table 8.7, called the "currently recognized
sequence." As can be seen, this variable is initially null or empty. The dictionary is searched for each
concatenated sequence and, if found (as was the case in the first row of the table), the currently recognized
sequence is replaced by the newly concatenated and recognized (i.e., located in the dictionary) sequence.
This was done in column 1 of row 2. No output codes are generated, nor is the dictionary altered. If the
concatenated sequence is not found, however, the address of the currently recognized sequence is output as
the next encoded value, the concatenated but unrecognized sequence is added to the dictionary, and the
currently recognized sequence is initialized to the current pixel value. This occurred in row 2 of the table.
The last two columns detail the intensity sequences that are added to the dictionary when scanning the entire
4×4 image. Nine additional code words are defined. At the conclusion of coding, the dictionary contains 265
code words and the LZW algorithm has successfully identified several repeating intensity sequences,
leveraging them to reduce the original 128-bit image to 90 bits (i.e., ten 9-bit codes). The encoded output is
obtained by reading the third column from top to bottom. The resulting compression ratio is 1.42:1.
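The encoder just described can be sketched compactly. The 4×4 vertical-edge pixel values (rows of 39 39 126 126) are assumed from the book's example, since the image itself is not reproduced in this excerpt.

```python
def lzw_encode(pixels):
    """LZW-encode a sequence of 8-bit intensities with a 512-word dictionary."""
    dictionary = {(i,): i for i in range(256)}    # words 0..255 = the intensities
    next_code = 256
    out, s = [], ()                               # s = currently recognized sequence
    for p in pixels:
        candidate = s + (p,)
        if candidate in dictionary:
            s = candidate                         # keep growing the recognized sequence
        else:
            out.append(dictionary[s])             # emit address of recognized sequence
            if next_code < 512:                   # 9-bit, 512-word dictionary limit
                dictionary[candidate] = next_code
                next_code += 1
            s = (p,)                              # restart from the current pixel
    if s:
        out.append(dictionary[s])
    return out, next_code

# Assumed 4x4 vertical-edge image: each row is 39 39 126 126.
pixels = [39, 39, 126, 126] * 4
codes, dict_size = lzw_encode(pixels)
# ten 9-bit codes (90 bits vs. the original 128) and a 265-word dictionary
```

Running it reproduces the bookkeeping in the text: ten output codes and nine dictionary additions (words 256 through 264).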
A unique feature of the LZW coding just demonstrated is that the coding dictionary or code book is created
while the data are being encoded. Remarkably, an LZW decoder builds an identical decompression dictionary
as it simultaneously decodes the encoded data stream. It is left as an exercise to the reader (see Problem
8.20) to decode the output of the preceding example and reconstruct the code book. Although not needed in
this example, most practical applications require a strategy for handling dictionary overflow. A simple
solution is to flush or reinitialize the dictionary when it becomes full and continue coding with a new
initialized dictionary. A more complex option is to monitor compression performance and flush the
dictionary when it becomes poor or unacceptable. Alternatively, the least used dictionary entries can be
tracked and replaced when necessary.
8.2.5 Run-Length Coding
As was noted in Section 8.1.2, images with repeating intensities along their rows (or columns) can often be
compressed by representing runs of identical intensities as run-length pairs, where each run-length pair
specifies the start of a new intensity and the number of consecutive pixels that have that intensity. The
technique, referred to as run-length encoding (RLE), was developed in the 1950s and became, along with its
2-D extensions, the standard compression approach in facsimile (FAX) coding. Compression is achieved by
eliminating a simple form of spatial redundancy: groups of identical intensities. When there are few (or
no) runs of identical pixels, run-length encoding results in data expansion.
The BMP file format uses a run-length encoding in which image data are represented in two possible
modes, encoded and absolute, and either mode can occur anywhere in the image. In encoded mode, a two-byte
RLE representation is used. The first byte specifies the number of consecutive pixels that have the color
index contained in the second byte. The 8-bit color index selects the run's intensity (color or gray value)
from a table of 256 possible intensities.
In absolute mode, the first byte is 0 and the second byte signals one of four possible conditions, as shown in
Table 8.8. When the second byte is 0 or 1, the end of a line or the end of the image has been reached. If it is
2, the next two bytes contain unsigned horizontal and vertical offsets to a new spatial position (and pixel) in
the image. If the second byte is between 3 and 255, it specifies the number of uncompressed pixels that
follow, with each subsequent byte containing the color index of one pixel. The total number of bytes must
be aligned on a 16-bit word boundary.
An uncompressed BMP file (saved using Photoshop) of the 512×512×8-bit image shown in Fig. 8.9(a)
requires 263,244 bytes of memory. Compressed using BMP's RLE option, the file expands to 267,706 bytes,
and the compression ratio is C = 0.98. There are not enough equal-intensity runs to make run-length
compression effective; a small amount of expansion occurs. For the image in Fig. 8.1(c), however, the BMP
RLE option results in a compression ratio C = 1.35.
Run-length encoding is particularly effective when compressing binary images. Because there are only two
possible intensities (black and white), adjacent pixels are more likely to be identical. In addition, each
image row can be represented by a sequence of lengths only, rather than the length-intensity pairs used
in Example 8.8. The basic idea is to code each contiguous group (i.e., run) of 0s or 1s encountered in a left-to-
right scan of a row by its length and to establish a convention for determining the value of the run. The
most common conventions are (1) to specify the value of the first run of each row, or (2) to assume that each
row begins with a white run, whose run length may in fact be zero.
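Convention (2) can be sketched as follows. Whether white is stored as 0 or 1 depends on the file format; here I assume white = 1 for illustration, and the function name is my own.

```python
from itertools import groupby

def run_lengths(row, first_value=1):
    """Encode a binary row as run lengths only, assuming each row starts with
    a white run (here white = 1) whose length may be zero (convention 2)."""
    runs = [(value, sum(1 for _ in group)) for value, group in groupby(row)]
    lengths = [n for _, n in runs]
    if runs and runs[0][0] != first_value:
        lengths.insert(0, 0)        # leading white run of length zero
    return lengths

row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
# white-first convention forces a leading zero-length white run:
# run_lengths(row) -> [0, 3, 2, 4, 1]
```

Because the run values strictly alternate, the decoder can recover the row from the lengths alone.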
Although run-length encoding is in itself an effective method of compressing binary images, additional
compression can be achieved by variable-length coding the run lengths themselves. The black and white run
lengths can be coded separately using variable-length codes that are specifically tailored to their own
statistics. For example, letting symbol aj represent a black run of length j, we can estimate the probability
that symbol aj was emitted by an imaginary black run-length source by dividing the number of black run
lengths of length j in the entire image by the total number of black runs. An estimate of the entropy of this
black run-length source, denoted H0, follows by substituting these probabilities into Eq. (8.1-6). A similar
argument holds for the entropy of the white runs, denoted H1. The approximate run-length entropy of the
image is then

H_RL = (H0 + H1) / (L0 + L1)    (8.2-7)

where the variables L0 and L1 denote the average values of black and white run
lengths, respectively. Equation (8.2-7) provides an estimate of the average number of
bits per pixel required to code the run lengths in a binary image using a variable-
length code.
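The estimate of Eq. (8.2-7) is easy to compute from the two lists of run lengths. This is a minimal sketch; the function names and the sample runs are my own.

```python
from collections import Counter
from math import log2

def run_length_entropy(runs):
    """Entropy in bits per run, per Eq. (8.1-6), from empirical run-length frequencies."""
    counts = Counter(runs)
    total = len(runs)
    return -sum(c / total * log2(c / total) for c in counts.values())

def approx_rl_entropy(black_runs, white_runs):
    """Eq. (8.2-7): H_RL = (H0 + H1) / (L0 + L1), in bits per pixel."""
    H0 = run_length_entropy(black_runs)          # black run-length entropy
    H1 = run_length_entropy(white_runs)          # white run-length entropy
    L0 = sum(black_runs) / len(black_runs)       # average black run length
    L1 = sum(white_runs) / len(white_runs)       # average white run length
    return (H0 + H1) / (L0 + L1)
```

Dividing by the average total run length converts bits per run into bits per pixel, which is why long average runs drive the estimate down.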
Two of the oldest and most widely used image compression standards are the CCITT
Group 3 and 4 standards for binary image compression. Although they have been
used in a variety of computer applications, they were originally designed as facsimile
(FAX) coding methods for transmitting documents over telephone networks. The
Group 3 standard uses a 1-D run-length coding technique in which the last K − 1 lines of
each group of K lines (for K = 2 or 4) can optionally be coded in a 2-D manner. The Group
4 standard is a simplified or streamlined version of the Group 3 standard in which
only 2-D coding is allowed. Both standards use the same 2-D coding approach, which is
two-dimensional in the sense that information from the previous line is used to
encode the current line. Both 1-D and 2-D coding are discussed next.
One-dimensional CCITT compression
In the 1-D CCITT Group 3 compression standard, each line of an image is encoded as a series of variable-
length Huffman code words that represent the run lengths of alternating white and black runs in a left-to-
right scan of the line. The compression method employed is commonly referred to as Modified Huffman
(MH) coding. The code words themselves are of two types, which the standard refers to as terminating codes
and makeup codes.
If run length r is less than 63, a terminating code is used to represent it; the standard specifies different
terminating codes for black and white runs. If r > 63, two codes are used: a makeup code for the quotient
⌊r/64⌋ (i.e., for a run of length 64⌊r/64⌋) and a terminating code for the remainder r mod 64. Makeup
codes are listed in Table A.2 and may or may not depend on the intensity (black or white) of the run being
coded: if 64⌊r/64⌋ < 1792, separate black and white run makeup codes are
specified; otherwise, makeup codes are independent of run intensity. The standard requires that each line
begin with a white run-length code word, which may in fact be 00110101, the code for a white run of
length zero. Finally, a unique end-of-line (EOL) code word, 000000000001, is used to terminate each line, as
well as to signal the first line of each new image. The end of a sequence of images is indicated by six
consecutive EOLs.
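The makeup/terminating split can be sketched as a small helper. The actual bit patterns come from Tables A.1 and A.2, which are not reproduced here, so the sketch returns symbolic components only; the function name is my own.

```python
def mh_split(r):
    """Split a run length r into Modified Huffman makeup + terminating components.
    The corresponding code words are looked up in Tables A.1/A.2 (not shown)."""
    if r <= 63:
        return [("terminating", r)]             # a single terminating code suffices
    return [("makeup", 64 * (r // 64)),         # makeup code for 64 * floor(r/64)
            ("terminating", r % 64)]            # terminating code for r mod 64

# e.g. a run of 130 pixels -> makeup code for 128 plus terminating code for 2
```

Note that a run that is an exact multiple of 64 still carries a terminating code for a remainder of zero, which keeps the decoder's two-code pattern uniform.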
Two-dimensional CCITT compression
The 2-D compression approach adopted for both the CCITT Group 3 and 4 standards is a line-by-line
method in which the position of each black-to-white or white-to-black run transition is coded with respect
to the position of a reference element a0 that is situated on the current coding line. The previously coded line
is called the reference line; the reference line for the first line of each new image is an imaginary white
line. The 2-D coding technique that is used is called Relative Element Address Designate (READ) coding. In
the Group 3 standard, one or three READ coded lines are allowed between successive MH coded lines and
the technique is called Modified READ (MR) coding. In the Group 4 standard, a greater number of READ
coded lines are allowed and the method is called Modified Modified READ (MMR) coding. As was
previously noted, the coding is two-dimensional in the sense that information from the previous line is used
to encode the current line. Two-dimensional transforms are not involved.
Figure 8.14 shows the basic 2-D coding process for a single scan line. Note that the initial steps of the
procedure are directed at locating several key changing elements: a0, a1, a2, b1, and b2. A changing element is
defined by the standard as a pixel whose value is different from that of the previous pixel on the same
line. The most important changing element is a0 (the reference element), which is either set to the location of
an imaginary white changing element to the left of the first pixel of each new coding line or determined
from the previous coding mode. Coding modes are discussed in the following paragraph. After a0 is
located, a1 is identified as the location of the next changing element to the right of a0 on the coding line, b1 as
the changing element of the opposite value of a0 and to the right of a0 on the reference (or previous)
line, and b2 as the next changing element to the right of b1 on the reference line. If any of these changing
elements are not detected, they are set to the location of an imaginary pixel to the right of the last pixel on
the appropriate line. Figure 8.15 provides two illustrations of the general relationships between the various
changing elements.
Two-dimensional CCITT compression - 2
After identification of the current reference element and associated changing elements, two simple tests
are performed to select one of three possible coding modes: pass mode, vertical mode, or horizontal
mode. The initial test, which corresponds to the first branch point in the flowchart in Fig. 8.14, compares the
location of b2 to that of a1. The second test, which corresponds to the second branch point in
Fig. 8.14, computes the distance (in pixels) between the locations of a1 and b1 and compares it against
3. Depending on the outcome of these tests, one of the three outlined coding blocks of Fig. 8.14 is entered
and the appropriate coding procedure is executed. A new reference element is then established, as per the
flowchart, in preparation for the next coding iteration.
Table 8.9 defines the specific codes utilized for each of the three possible coding modes. In pass
mode, which specifically excludes the case in which b2 is directly above a1, only the pass mode code word
0001 is needed. As Fig. 8.15(a) shows, this mode identifies white or black reference line runs that do not
overlap
the current white or black coding line runs. In horizontal coding mode, the distances from a0 to a1 and from
a1 to a2 must be coded in accordance with the termination and makeup codes of Tables A.1 and A.2 of
Appendix A and then appended to the horizontal mode code word 001. This is indicated in Table 8.9 by the
notation 001 + M(a0a1) + M(a1a2), where a0a1 and a1a2 denote the distances from a0 to
a1 and from a1 to a2, respectively. Finally, in
vertical coding mode, one of six special variable-length codes is assigned to the distance between a1 and
b1. Figure 8.15(b) illustrates the parameters involved in both horizontal and vertical mode coding. The
extension mode code word at the bottom of Table 8.9 is used to enter an optional facsimile coding mode. For
example, the 0000001111 code is used to initiate an uncompressed mode of transmission.
Although Fig. 8.15(b) is annotated with the parameters for both horizontal and vertical mode coding (to
facilitate the discussion above), the depicted pattern of black and white pixels is a case for vertical mode
coding. That is, because b2 is to the right of a1, the first (or pass mode) test in Fig. 8.14 fails. The second
test, which determines whether the vertical or horizontal coding mode
is entered, indicates that vertical mode coding should be used, because the distance from a1 to b1 is less than
3. In accordance with Table 8.9, the appropriate code word is 000010, implying that a1 is two pixels to the
left of b1. In preparation for the next coding iteration, a0 is moved to the location of a1.
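The two tests that drive the flowchart of Fig. 8.14 condense to a few lines. This sketch treats element positions as column indices and assumes, as the text describes, that undetected elements have already been set to an imaginary position past the last pixel; the function name is my own.

```python
def select_mode(a1, b1, b2):
    """Mode-selection tests of Fig. 8.14, given column positions of the
    changing elements on the coding and reference lines."""
    if b2 < a1:
        return "pass"            # reference run ends to the left of a1
    if abs(a1 - b1) <= 3:
        return "vertical"        # a1 is within three pixels of b1
    return "horizontal"
```

For the configuration of Fig. 8.15(b), b2 lies to the right of a1 and a1 is within three pixels of b1, so the function returns vertical mode, matching the walkthrough above.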
Figure 8.16(a) is a 300 dpi scan of a 7 × 9.25 inch book page displayed at about 1/3 scale. Note that about
half of the page contains text, around 9% is occupied by a halftone image, and the rest is white space. A
section of the page is enlarged in Fig. 8.16(b). Keep in mind that we are dealing with a binary image; the
illusion of gray tones is created, as was described in Section 4.5.4, by the halftoning process used in
printing. If the binary pixels of the image in Fig. 8.16(a) are stored in groups of 8 pixels per byte, the
1952×2697 bit scanned image, commonly called a document, requires 658,068 bytes. An uncompressed PDF
file of the document (created in Photoshop) requires 663,445 bytes. CCITT Group 3 compression reduces the
file to 123,497 bytes, resulting in a compression ratio C = 5.37; CCITT Group 4 compression reduces the file
to 110,456 bytes, increasing the compression ratio to about 6.
8.2.6 Symbol-Based Coding
In symbol- or token-based coding, an image is represented as a collection of frequently occurring sub-
images, called symbols. Each such symbol is stored in a symbol dictionary and the image is coded as a set of
triplets {(x1, y1, t1), (x2, y2, t2), …}, where each (xi, yi) pair specifies the location of a symbol in the image
and token ti is the address of the symbol or sub-image in the dictionary. That is, each triplet represents an
instance of a dictionary symbol in the image. Storing repeated symbols only once can compress images
significantly, particularly in document storage and retrieval applications, where the symbols are often
character bitmaps that are repeated many times.
8.2.6 Symbol-Based Coding -2
Consider the simple bilevel image in Fig. 8.17(a). It contains the single word, banana, which is composed of
three unique symbols: a b, three a's, and two n's. Assuming that the b is the first symbol identified in the
coding process, its 9×7 bitmap is stored in location 0 of the symbol dictionary. As Fig. 8.17(b) shows, the
token identifying the b bitmap is 0. Thus, the first triplet in the encoded image's representation [see Fig.
8.17(c)] is (0, 2, 0), indicating that the upper-left corner (an arbitrary convention) of the rectangular bitmap
representing the b symbol is to be placed at location (0, 2) in the decoded image. After the bitmaps for the a
and n symbols have been identified and added to the dictionary, the remainder of the image can be encoded
with five additional triplets. As long as the six triplets required to locate the symbols in the image, together
with the three bitmaps required to define them, are smaller than the original image, compression occurs. In
this case, the starting image has 9×51×1 or 459 bits and, assuming that each triplet is composed of 3 bytes,
the compressed representation has (6×3×8) + [(9×7) + (6×7) + (6×6)] or 285 bits; the resulting compression
ratio C = 1.61. To decode the symbol-based representation in Fig. 8.17(c), you simply read the bitmaps of the
symbols specified in the triplets from the symbol dictionary and place them at the spatial coordinates
specified in each triplet.
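The bit accounting above is worth checking explicitly; the bitmap dimensions for a (6×7) and n (6×6) are taken from the arithmetic in the text.

```python
# Bit accounting for the "banana" example of Fig. 8.17.
original_bits = 9 * 51 * 1                    # 9x51 bilevel image
triplet_bits = 6 * 3 * 8                      # six (x, y, token) triplets, 3 bytes each
bitmap_bits = 9 * 7 + 6 * 7 + 6 * 6           # b, a, and n dictionary bitmaps
compressed_bits = triplet_bits + bitmap_bits  # 144 + 141 = 285 bits
C = original_bits / compressed_bits           # 459 / 285, about 1.61
```

The dictionary bitmaps are a fixed cost, so the ratio improves as symbols repeat more often: each extra instance adds only one 24-bit triplet instead of a full bitmap.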
8.2.6 Symbol-Based Coding -3
Symbol-based compression was proposed in the early 1970s (Ascher and Nagy [1974]), but has become
practical only recently. Advances in symbol matching algorithms (see Chapter 12) and increased CPU
processing speeds have made it possible both to select dictionary symbols and to find where they
occur in an image in a timely manner. And like many other compression methods, symbol-based decoding
is significantly faster than encoding. Finally, we note that both the symbol bitmaps that are stored in the
dictionary and the triplets used to reference them can themselves be encoded to further improve
compression performance. If, as in Fig. 8.17, only exact symbol matches are allowed, the resulting
compression is lossless; if small differences are permitted, some level of reconstruction error will be present.
JBIG2 compression
JBIG2 is an international standard for bilevel image compression. By segmenting an image into overlapping
and/or non-overlapping regions of text, halftone, and generic content, compression techniques that are
tailored to each type of content can be employed. In text regions, which are well suited to symbol-based
coding, each symbol will correspond to a character bitmap, a subimage representing a character of text. There
is normally only one character bitmap (or subimage) in the symbol dictionary for each upper- and lowercase
character of the font being used. For example, there would be one "a" bitmap in the dictionary, one "A"
bitmap, one "b" bitmap, and so on.
In lossy JBIG2 compression, often called perceptually lossless or visually lossless, we neglect differences
between dictionary bitmaps (i.e., the reference character bitmaps or character templates) and specific
instances of the corresponding characters in the image. In lossless compression, the differences are stored and
used in conjunction with the triplets encoding each character (by the decoder) to produce the actual image
bitmaps. All bitmaps are encoded either arithmetically or using MMR (see Section 8.2.5); the triplets used to
access dictionary entries are either arithmetically or Huffman encoded.
Halftone regions are similar to text regions in that they are composed of patterns arranged in a regular
grid. The symbols that are stored in the dictionary, however, are not character bitmaps but periodic patterns
that represent intensities (e.g., of a photograph) that have been dithered to produce bilevel images for printing.
Generic regions contain non-text, non-halftone information, like line art and noise. It is largely the encoder
that determines how much compression is achieved. After all, the encoder must segment the image into
regions, choose the text and halftone symbols that are stored in the dictionaries, and decide when those
symbols are essentially the same as, or different from, potential instances of the symbols in the image. The
decoder simply uses that information to recreate the original image.
Consider again the bilevel image in Fig. 8.16(a). Figure 8.18(a) shows a reconstructed section of the image
after lossless JBIG2 encoding (by a commercially available document compression application). It is an
exact replica of the original image. Note that the ds in the reconstructed text vary slightly, despite the fact
that they were generated from the same d entry in the dictionary. The differences between that d and the ds
in the image were used to refine the output of the dictionary. The standard defines an algorithm for
accomplishing this during the decoding of the encoded dictionary bitmaps. For the purposes of our discussion, you can
think of it as adding the difference between a dictionary bitmap and a specific instance of the
corresponding character in the image to the bitmap read from the dictionary.
JBIG2 compression - 3
Figure 8.18(b) is another reconstruction of the area in (a) after perceptually lossless JBIG2
compression. Note that the ds in this figure are identical. They have been copied directly from the symbol
dictionary. The reconstruction is called perceptually lossless because the text is readable and the font is
even the same. The small differences, shown in Fig. 8.18(c), between the ds in the original image and the d in
the dictionary are considered unimportant because they do not affect readability. Remember that we are
dealing with bilevel images, so there are only three intensities in Fig. 8.18(c): intensity 128 indicates areas
where there is no difference between the corresponding pixels of the images in Figs. 8.18(a) and (b);
intensities 0 (black) and 255 (white) indicate pixels of opposite intensities in the two images, for
example, a black pixel in one image that is white in the other, and vice versa.
The lossless JBIG2 compression that was used to generate Fig. 8.18(a) reduces the original 663,445 byte
uncompressed PDF image to 32,705 bytes; the compression ratio is C = 20.3. Perceptually lossless JBIG2
compression reduces the image to 23,913 bytes, increasing the compression ratio to about 27.7. These
compressions are 4 to 5 times greater than the CCITT Group 3 and 4 results from Example 8.10.
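The three-intensity comparison of Fig. 8.18(c) is easy to reproduce. The sketch below is illustrative only; the function name and the convention that a disagreeing pixel keeps its value from the first image are our assumptions, not part of the JBIG2 standard:

```python
import numpy as np

def bilevel_difference(a, b):
    """Compare two bilevel (0/255) images: output 128 where the pixels
    agree, and the first image's value (0 or 255) where they disagree."""
    a = np.asarray(a, dtype=np.uint8)
    b = np.asarray(b, dtype=np.uint8)
    diff = np.full(a.shape, 128, dtype=np.uint8)   # 128 = "no difference"
    disagree = a != b
    diff[disagree] = a[disagree]                   # 0 or 255 = opposite pixels
    return diff
```

Applied to the reconstructions of Figs. 8.18(a) and (b), such a function would produce exactly the kind of three-intensity image shown in Fig. 8.18(c).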
8.2.7 Bit-Plane Coding
The run-length and symbol-based techniques of the previous sections can be applied to images with more
than two intensities by processing their bit planes individually. The technique, called bit-plane coding, is
based on the concept of decomposing a multilevel (monochrome or color) image into a series of binary
images (see Section 3.2.4) and compressing each binary image via one of several well-known binary
compression methods. In this section, we describe the two most popular decomposition approaches.
The intensities of an m-bit monochrome image can be represented in the form of the base-2 polynomial

    a_{m-1} 2^{m-1} + a_{m-2} 2^{m-2} + ... + a_1 2^1 + a_0 2^0        (8.2-8)

Based on this property, a simple method of decomposing the image into a collection
of binary images is to separate the m coefficients of the polynomial into m 1-bit bit
planes. As noted in Section 3.2.4, the lowest order bit plane (the plane corresponding
to the least significant bit) is generated by collecting the a_0 bits of each pixel, while the
highest order bit plane contains the a_{m-1} bits or coefficients. In general, each bit
plane is constructed by setting its pixels equal to the values of the appropriate bits
or polynomial coefficients from each pixel in the original image. The inherent
disadvantage of this decomposition approach is that small changes in intensity can
have a significant impact on the complexity of the bit planes. If a pixel of intensity
127 (01111111) is adjacent to a pixel of intensity 128 (10000000), for instance, every bit
plane will contain a corresponding 0 to 1 (or 1 to 0) transition. For example, because
the most significant bits of the binary codes for 127 and 128 are different, the
highest bit plane will contain a zero-valued pixel next to a pixel of value 1, creating a
0 to 1 (or 1 to 0) transition at that point.
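The decomposition just described can be sketched in a few lines of Python (NumPy assumed); the 127/128 example below shows the transition appearing in every plane:

```python
import numpy as np

def bit_planes(image, m=8):
    """Split an m-bit image into m binary planes; planes[k] holds the
    a_k bits of Eq. (8.2-8), so planes[0] is the least significant plane."""
    image = np.asarray(image, dtype=np.uint32)
    return [((image >> k) & 1).astype(np.uint8) for k in range(m)]

# Adjacent intensities 127 and 128 flip every one of the 8 bit planes.
pair = np.array([127, 128])
flips = sum(int(p[0] != p[1]) for p in bit_planes(pair))
```

Here `flips` is 8, confirming that a one-intensity step can disturb all eight planes.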
8.2.7 Bit-Plane Coding - 2
An alternative decomposition approach (which reduces the effect of small intensity variations) is to first
represent the image by an m-bit Gray code. The m-bit Gray code g_{m-1} ... g_2 g_1 g_0 that corresponds to the
polynomial in Eq. (8.2-8) can be computed from

    g_i = a_i XOR a_{i+1}    for 0 <= i <= m-2
    g_{m-1} = a_{m-1}        (8.2-9)

Here, XOR denotes the exclusive OR operation. This code has the unique property that
successive code words differ in only one bit position. Thus, small changes in intensity
are less likely to affect all m bit planes. For instance, when intensity levels 127 and 128
are adjacent, only the highest order bit plane will contain a 0 to 1 transition, because
the Gray codes that correspond to 127 and 128 are 01000000 and 11000000, respectively.
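Equation (8.2-9) is equivalent to the familiar bit trick g = a XOR (a >> 1). A sketch (the function names are ours):

```python
import numpy as np

def to_gray(a):
    """Gray-code each intensity: g_i = a_i XOR a_{i+1}, g_{m-1} = a_{m-1},
    which collapses to a XOR (a >> 1)."""
    a = np.asarray(a, dtype=np.uint32)
    return a ^ (a >> 1)

def from_gray(g):
    """Invert the Gray code by XOR-ing in progressively shifted copies."""
    b = np.asarray(g, dtype=np.uint32).copy()
    shift = 1
    while shift < 32:
        b ^= b >> shift
        shift *= 2
    return b
```

For 8-bit data, to_gray(127) gives 01000000 and to_gray(128) gives 11000000: the two codes differ only in the highest bit plane, as claimed above.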
Figures 8.19 and 8.20 show the eight binary and Gray-coded bit planes of the 8-bit
monochrome image of the child in Fig. 8.19(a). Note that the high-order bit planes are
far less complex than their low-order counterparts; that is, they contain large uniform
areas of significantly less detail, busyness, or randomness. In addition, the Gray-coded
bit planes are less complex than the corresponding binary bit planes. Both
observations are reflected in the JBIG2 coding results of Table 8.10. Note, for
instance, that the a5 and g5 results are
8.2.7 Bit-Plane Coding - 3
significantly larger than the a6 and g6 compressions, and that both g5 and g6 are smaller than their a5 and a6
counterparts. This trend continues throughout the table, with a single exception. Gray coding
provides a compression advantage of about 1.06:1 on average. Combined together, the Gray-coded files
compress the original monochrome image by 678,676/475,964 or 1.43:1; the non-Gray-coded files
compress the image by 678,676/503,916 or 1.35:1.
Finally, we note that the two least significant bits in Fig. 8.20 have little apparent structure. Because this is
typical of most 8-bit monochrome images, bit-plane coding is usually restricted to images of 6 bits/pixel or
less. JBIG1, the predecessor to JBIG2, imposes such a limit.
8.2.8 Block Transform Coding
In this section, we consider a compression technique that divides an image into small non-overlapping
blocks of equal size (e.g., 8×8) and processes the blocks independently using a 2-D transform. In block
transform coding, a reversible, linear transform (such as the Fourier transform) is used to map each block or
subimage into a set of transform coefficients, which are then quantized and coded. For most images, a
significant number of the coefficients have small magnitudes and can be coarsely quantized (or discarded
entirely) with little image distortion. A variety of transformations, including the discrete Fourier
transform (DFT) of Chapter 4, can be used to transform the image data.
Figure 8.21 shows a typical block transform coding system. The decoder implements the inverse sequence of
steps (with the exception of the quantization function) of the encoder, which performs four relatively
straightforward operations: subimage decomposition, transformation, quantization, and coding. An M×N
input image is subdivided first into subimages of size n×n, which are then transformed to generate MN/n²
subimage transform arrays, each of size n×n. The goal of the transformation process is to decorrelate the
pixels of each subimage, or to pack as much information as possible into the smallest number of transform
coefficients. The quantization stage then selectively eliminates or more coarsely quantizes the coefficients
that carry the least amount of information in a predefined sense (several methods are discussed later in the
section). These coefficients have the smallest impact on reconstructed subimage quality. The encoding process
terminates by coding (normally using a variable-length code) the quantized coefficients. Any or all of the
transform encoding steps can be adapted to local image content, called adaptive transform coding, or fixed
for all subimages, called nonadaptive transform coding.
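The encoder steps above can be sketched end to end. The fragment below uses an orthonormal DCT built by hand (so nothing beyond NumPy is needed) and a simple largest-magnitude selection rule; the block size, the number of kept coefficients, and the function names are all illustrative choices:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that T = C @ g @ C.T."""
    k = np.arange(n)
    C = np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)   # alpha(0) differs from alpha(u>0)
    return C

def block_transform_code(image, n=8, keep=10):
    """Toy encoder/decoder: split into n x n blocks, transform each block,
    keep only the `keep` largest-magnitude coefficients, inverse transform."""
    C = dct_matrix(n)
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for r in range(0, h, n):
        for c in range(0, w, n):
            T = C @ image[r:r+n, c:c+n] @ C.T       # forward transform
            thresh = np.sort(np.abs(T).ravel())[-keep]
            T[np.abs(T) < thresh] = 0.0             # discard small coefficients
            out[r:r+n, c:c+n] = C.T @ T @ C         # inverse transform
    return out
```

Quantization and variable-length coding of the retained coefficients, the remaining encoder steps, are discussed later in the section.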
Transform selection
Block transform coding systems based on a variety of discrete 2-D transforms have been constructed and/or
studied extensively. The choice of a particular transform in a given application depends on the amount of
reconstruction error that can be tolerated and the computational resources available. Compression is
achieved during the quantization of the transformed coefficients (not during the transformation step).
With reference to the discussion in Section 2.6.7, consider a subimage g(x,y) of size n×n whose
forward, discrete transform, T(u,v), can be expressed in terms of the general relation

    T(u,v) = sum over x, y of g(x,y) r(x,y,u,v)        (8.2-10)

for u, v = 0, 1, 2, ..., n-1. Given T(u,v), g(x,y) similarly can be obtained using the generalized inverse discrete transform

    g(x,y) = sum over u, v of T(u,v) s(x,y,u,v)        (8.2-11)

for x, y = 0, 1, 2, ..., n-1. In these equations, r(x,y,u,v) and s(x,y,u,v) are called the
forward and inverse transformation kernels, respectively. For reasons that will become
clear later in the section, they also are referred to as basis functions or basis
images. The T(u,v) for u, v = 0, 1, 2, ..., n-1 in Eq. (8.2-10) are called transform
coefficients; they can be viewed as the expansion coefficients (see Section 7.2.1) of a
series expansion of g(x,y) with respect to basis functions s(x,y,u,v).
As explained in Section 2.6.7, the kernel in Eq. (8.2-10) is separable if

    r(x,y,u,v) = r1(x,u) r2(y,v)        (8.2-12)
Transform selection - 2
In addition, the kernel is symmetric if r1 is functionally equal to r2. In this case, Eq. (8.2-12) can be expressed in
the form

    r(x,y,u,v) = r1(x,u) r1(y,v)        (8.2-13)

Identical comments apply to the inverse kernel if r(x,y,u,v) is replaced by s(x,y,u,v) in Eqs. (8.2-12) and (8.2-13).
It is not difficult to show that a 2-D transform with a separable kernel can be computed using row-column
or column-row passes of the corresponding 1-D transform, in the manner explained in Section 4.11.1.
The forward and inverse transformation kernels in Eqs. (8.2-10) and (8.2-11) determine the type of transform
that is computed and the overall computational complexity and reconstruction error of the block transform
coding system in which they are employed. The best known transformation kernel pair is

    r(x,y,u,v) = e^{-j2*pi*(ux+vy)/n}        (8.2-14)

and

    s(x,y,u,v) = (1/n²) e^{j2*pi*(ux+vy)/n}        (8.2-15)

where j = sqrt(-1). These are the transformation kernels defined in Eqs. (2.6-34) and (2.6-35)
of Chapter 2 with M = N = n. Substituting these kernels into Eqs. (8.2-10) and (8.2-11)
yields a simplified version of the discrete Fourier transform pair introduced in
Section 4.5.5.
A computationally simpler transformation that is also useful in transform
coding, called the Walsh-Hadamard transform (WHT), is derived from the functionally
identical kernels

    r(x,y,u,v) = s(x,y,u,v) = (1/n) (-1)^{ sum for i = 0 to m-1 of [ b_i(x) p_i(u) + b_i(y) p_i(v) ] }        (8.2-16)
Transform selection - 3
where n = 2^m. The summation in the exponent of this expression is performed in modulo 2 arithmetic and b_k(z)
is the kth bit (from right to left) in the binary representation of z. If m = 3 and z = 6 (110 in binary), for
example, b0(z) = 0, b1(z) = 1, and b2(z) = 1. The p_i(u) in Eq. (8.2-16) are computed using

    p_0(u) = b_{m-1}(u)
    p_1(u) = b_{m-1}(u) + b_{m-2}(u)
    p_2(u) = b_{m-2}(u) + b_{m-3}(u)
    ...
    p_{m-1}(u) = b_1(u) + b_0(u)

where the sums are performed in modulo 2 arithmetic.
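The kernel of Eq. (8.2-16) can be generated directly from these bit operations. The sketch below builds the 1-D n×n Walsh-Hadamard matrix; the 1/sqrt(n) scaling (so that the matrix is orthonormal) is an implementation convenience of ours rather than part of the equation:

```python
import numpy as np

def bit(z, k):
    """k-th bit (from the right) of integer z."""
    return (z >> k) & 1

def p_coeffs(u, m):
    """The p_i(u) of Eq. (8.2-16): p_0 = b_{m-1}, p_i = b_{m-i} XOR b_{m-i-1}."""
    p = [bit(u, m - 1)]
    for i in range(1, m):
        p.append(bit(u, m - i) ^ bit(u, m - i - 1))
    return p

def wht_kernel(n):
    """n x n Walsh-Hadamard 1-D kernel matrix W[u, x], orthonormal scaling."""
    m = n.bit_length() - 1          # n = 2^m
    W = np.zeros((n, n))
    for u in range(n):
        p = p_coeffs(u, m)
        for x in range(n):
            s = sum(bit(x, i) * p[i] for i in range(m)) % 2   # modulo-2 sum
            W[u, x] = (-1) ** s
    return W / np.sqrt(n)
```

With this ordering, row u of the matrix has exactly u sign changes (its "sequency"), which is what produces the basis-image arrangement of Fig. 8.22.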
Transform selection - 4
White denotes +1 and black denotes -1. To obtain the top left block, we let u = v = 0 and plot values of
r(x,y,0,0) for x, y = 0, 1, 2, 3. All values in this case are +1. The second block on the top row is a plot of values of
r(x,y,0,1) for x, y = 0, 1, 2, 3, and so on. As already noted, the importance of the Walsh-Hadamard transform is its
simplicity of implementation: all kernel values are +1 or -1.
One of the most frequently used transformations for image compression is the discrete cosine transform (DCT),
obtained from

    r(x,y,u,v) = s(x,y,u,v) = alpha(u) alpha(v) cos[(2x+1)u*pi / 2n] cos[(2y+1)v*pi / 2n]        (8.2-17)

where

    alpha(u) = sqrt(1/n) for u = 0, and sqrt(2/n) for u = 1, 2, ..., n-1        (8.2-18)

and similarly for alpha(v). Figure 8.23 shows r(x,y,u,v) for the case n = 4. The computation follows the same
format as explained for Fig. 8.22, with the difference that the values of r are not integers. In Fig. 8.23, the
lighter intensity values correspond to larger values of r.
Transform selection - 5
Figures 8.24(a) through (c) show three approximations of the 512×512 monochrome image in Fig. 8.9(a). These
pictures were obtained by dividing the original image into subimages of size 8×8, representing each
subimage using one of the transforms just described (i.e., the DFT, WHT, or DCT), truncating 50% of the
resulting coefficients, and taking the inverse transform of the truncated coefficient arrays.
Transform selection - 6
In each case, the 32 retained coefficients were selected on the basis of maximum magnitude. Note that in all
cases, the 32 discarded coefficients had little visual impact on the quality of the reconstructed image. Their
elimination, however, was accompanied by some mean-square error, which can be seen in the scaled error
images of Figs. 8.24(d) through (f). The actual rms errors were 2.32, 1.78, and 1.13 intensities, respectively.
Transform selection - 7
The small differences in mean-square reconstruction error noted in the preceding example are related directly
to the energy or information packing properties of the transforms employed. In accordance with Eq. (8.2-11),
an n×n subimage g(x,y) can be expressed as a function of its 2-D transform T(u,v):

    g(x,y) = sum for u, v = 0 to n-1 of T(u,v) s(x,y,u,v)        (8.2-20)

for x, y = 0, 1, 2, ..., n-1. Because the inverse kernel s(x,y,u,v) in Eq. (8.2-20) depends only on the indices
x, y, u, v, and not on the values of g(x,y) or T(u,v), it can be viewed as defining a set of basis functions or
basis images for the series defined by Eq. (8.2-20). This interpretation becomes clearer if the notation used in
Eq. (8.2-20) is modified to obtain

    G = sum for u, v = 0 to n-1 of T(u,v) S_uv        (8.2-21)

where G is the n×n matrix containing the pixels of g(x,y) and S_uv is the n×n matrix whose (x,y) element is
s(x,y,u,v).        (8.2-22)
Transform selection - 8
Then G, the matrix containing the pixels of the input subimage, is explicitly defined as a linear combination
of n² matrices of size n×n; that is, the S_uv for u, v = 0, 1, 2, ..., n-1 in Eq. (8.2-22). These matrices in fact are the
basis images (or functions) of the series expansion in Eq. (8.2-21); the associated T(u,v) are the coefficients of the
expansion. Figures 8.22 and 8.23 illustrate graphically the WHT and DCT basis images for the case of n = 4.
If we now define a transform coefficient masking function

    chi(u,v) = 0 if T(u,v) satisfies a specified truncation criterion, and 1 otherwise        (8.2-23)

for u, v = 0, 1, 2, ..., n-1, an approximation of G can be obtained from the truncated expansion

    Ĝ = sum for u, v = 0 to n-1 of chi(u,v) T(u,v) S_uv        (8.2-24)

where chi(u,v) is constructed to eliminate the basis images that make the smallest contribution to the total sum in
Eq. (8.2-21). The mean-square error between subimage G and approximation Ĝ then is

    e_ms = E{ ||G - Ĝ||² } = sum over u, v of sigma²_{T(u,v)} [1 - chi(u,v)]        (8.2-25)
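The key consequence of Eq. (8.2-25), that for orthonormal basis images the squared reconstruction error of a subimage equals the energy of its discarded coefficients, can be checked numerically. The DCT is used here only as a convenient orthonormal transform; the helper names are our own:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are the 1-D basis functions)."""
    k = np.arange(n)
    C = np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0, :] = np.sqrt(1.0 / n)
    return C

def truncation_error(G, chi):
    """Return ||G - G_hat||^2 for the truncated expansion of Eq. (8.2-24),
    together with the energy of the discarded coefficients; for an
    orthonormal transform the two quantities coincide."""
    n = G.shape[0]
    C = dct_matrix(n)
    T = C @ G @ C.T                 # forward transform, Eq. (8.2-10)
    G_hat = C.T @ (chi * T) @ C     # truncated inverse, Eq. (8.2-24)
    err = np.sum((G - G_hat) ** 2)
    discarded = np.sum(((1 - chi) * T) ** 2)
    return err, discarded
```

Retaining every coefficient (chi of all ones) drives the error to zero; any truncation contributes exactly the energy of what was dropped.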
Transform selection - 9
where ||G - Ĝ|| is the norm of the matrix (G - Ĝ) and sigma²_{T(u,v)} is the variance of the coefficient at transform
location (u,v). The final simplification is based on the orthonormal nature of the basis images and the assumption that
the pixels of G are generated by a random process with zero mean and known covariance. The total mean-square
approximation error thus is the sum of the variances of the discarded transform coefficients; that
is, the coefficients for which chi(u,v) = 0, so that [1 - chi(u,v)] in Eq. (8.2-25) is 1. Transformations that redistribute
or pack the most information into the fewest coefficients provide the best subimage approximations and, consequently, the
smallest reconstruction errors. Finally, under the assumptions that led to Eq. (8.2-25), the mean-square errors
of the MN/n² subimages of an M×N image are identical. Thus the mean-square error (being a measure of
average error) of the M×N image equals the mean-square error of a single subimage.
The earlier example showed that the information packing ability of the DCT is superior to that of the DFT
and WHT. Although this condition usually holds for most images, the Karhunen-Loève transform (see
Chapter 11), not the DCT, is the optimal transform in an information packing sense. This is due to the fact
that the KLT minimizes the mean-square error in Eq. (8.2-25) for any input image and any number of
retained coefficients (Kramer and Mathews [1956]). However, because the KLT is data dependent, obtaining
the KLT basis images for each subimage, in general, is a nontrivial computational task.
For this reason, the KLT is used infrequently in practice for image compression.
Instead, a transform, such as the DFT, WHT, or DCT, whose basis images are fixed (input
independent), normally is used. Of the possible input independent transforms, the nonsinusoidal
transforms (such as the WHT) are the simplest to implement. The sinusoidal transforms (such as the
DFT or DCT) more closely approximate the information packing ability of the optimal KLT.
Transform selection - 10
Hence, most transform coding systems are based on the DCT, which provides a good compromise between
information packing ability and computational complexity. In fact, the properties of the DCT have proved to
be of such practical value that the DCT has become an international standard for transform coding
systems. Compared to the other input independent transforms, it has the advantages of having been
implemented in a single integrated circuit, packing the most information into the fewest coefficients (for
most images), and minimizing the block-like appearance, called blocking artifact, that results when the
boundaries between subimages become visible. This last property is particularly important in comparisons
with the other sinusoidal transforms. As Fig. 8.25(a) shows, the implicit n-point periodicity (see Section
4.6.3) of the DFT gives rise to boundary discontinuities that result in substantial high-frequency transform
content. When the DFT transform coefficients are truncated or quantized, the Gibbs
phenomenon causes the boundary points to take on erroneous values, which appear
in an image as blocking artifact; that is, the boundaries between adjacent subimages
become visible because the boundary pixels of the subimages assume the mean
values of discontinuities formed at the boundary points [see Fig. 8.25(a)]. The DCT of
Fig. 8.25(b) reduces this effect, because its implicit 2n-point periodicity does not
inherently produce boundary discontinuities.
Subimage size Selection
Another significant factor affecting transform coding error and computational complexity is subimage
size. In most applications, images are subdivided so that the correlation (redundancy) between adjacent
subimages is reduced to some acceptable level and so that n is an integer power of 2, where, as before, n is the
subimage dimension. The latter condition simplifies the computation of the subimage transforms (see the
base-2 successive doubling method discussed in Section 4.11.3). In general, both the level of compression
and computational complexity increase as the subimage size increases. The most popular subimage sizes are
8×8 and 16×16.
Figure 8.26 illustrates graphically the impact of subimage size on transform coding reconstruction error. The
data plotted were obtained by dividing the monochrome image of Fig. 8.9(a) into subimages of size n×n, for
n = 2, 4, 8, 16, ..., 256, 512, computing the transform of each subimage, truncating 75% of the resulting
coefficients, and taking the inverse transform of the truncated arrays. Note that the Hadamard and cosine
curves flatten as the size of the subimage becomes greater than 8×8, whereas the Fourier reconstruction
Subimage size Selection - 2
error continues to decrease in this region. As n further increases, the Fourier reconstruction error crosses
the Walsh-Hadamard curve and approaches the cosine result. This result is consistent with the theoretical
and experimental findings reported by Netravali and Limb [1980] and by Pratt [1991] for a 2-D Markov
image source.
All three curves intersect when 2×2 subimages are used. In this case, only one of the four coefficients (25%)
of each transformed array was retained. The coefficient in all cases was the dc component, so the inverse
transform simply replaced the four subimage pixels by their average value [see Eq. (4.6-21)]. This condition
is evident in Fig. 8.27(b), which shows a zoomed portion of the 2×2 DCT result. Note that the blocking artifact
that is prevalent in this result decreases as the subimage size increases to 4×4 and 8×8 in Figs. 8.27(c) and
(d). Figure 8.27(a) shows a zoomed portion of the original image for reference.
Bit Allocation
The reconstruction error associated with the truncated series expansion of Eq. (8.2-24) is a
function of the number and relative importance of the transform coefficients that are discarded,
as well as the precision that is used to represent the retained coefficients. In most transform coding
systems, the retained coefficients are selected (that is, the masking function of Eq. (8.2-23) is
constructed) on the basis of maximum variance, called zonal coding, or on the basis of maximum
magnitude, called threshold coding. The overall process of truncating, quantizing, and coding the
coefficients of a transformed subimage is commonly called bit allocation.
Bit Allocation - 2
Figures 8.28(a) and (c) show two approximations of Fig. 8.9(a) in which 87.5% of the DCT coefficients of each
8×8 subimage were discarded. The first result was obtained via threshold coding by keeping the eight largest
transform coefficients, and the second image was generated by using a zonal coding approach. In the latter
case, each DCT coefficient was considered a random variable whose distribution could be computed over
the ensemble of all transformed subimages. The 8 distributions of largest variance (12.5% of the 64
coefficients in the transformed 8×8 subimage) were located and used to determine the coordinates, u and v, of
the coefficients, T(u,v), that were retained for all subimages. Note that the threshold coding difference image
of Fig. 8.28(b) contains less error than the zonal coding result in Fig. 8.28(d). Both images have been
scaled to make the errors more visible. The corresponding rms errors are 4.5 and 6.5 intensities,
respectively.
Bit Allocation - 3
Zonal coding implementation. Zonal coding is based on the information theory concept of viewing
information as uncertainty. Therefore the transform coefficients of maximum variance carry the most image
information and should be retained in the coding process. The variances themselves can be calculated
directly from the ensemble of MN/n² transformed subimage arrays, as in the preceding example, or based on
an assumed image model (say, a Markov autocorrelation function). In either case, the zonal sampling
process can be viewed, in accordance with Eq. (8.2-24), as multiplying each T(u,v) by the corresponding
element in a zonal mask, which is constructed by placing a 1 in the locations of maximum variance and a 0
in all other locations. Coefficients of maximum variance usually are located around the origin of an image
transform, resulting in the typical zonal mask shown in Fig. 8.29(a).
The coefficients retained during the zonal sampling process must be quantized and coded, so zonal masks
are sometimes depicted showing the number of bits used to code each coefficient [Fig. 8.29(b)]. In most
cases, the coefficients are allocated the same number of bits, or some fixed number of bits is distributed
among them unequally. In the first case, the coefficients generally are normalized by their standard
deviations and uniformly quantized. In the second case, a quantizer, such as an optimal Lloyd-Max
quantizer (see optimal quantizers in Section 8.2.9), is designed for each coefficient. To construct the required
quantizers, the zeroth or dc coefficient normally is modeled by a Rayleigh density function, whereas the
remaining coefficients are modeled by a Laplacian or Gaussian density. The number of
quantization levels (and thus the number of bits) allotted to each quantizer is made
proportional to log2 sigma²_{T(u,v)}. Thus the retained coefficients in Eq. (8.2-24), which (in the
context of the current discussion) are selected on the basis of maximum variance,
are assigned bits in proportion to the coefficient variances.
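Constructing a zonal mask from an ensemble of transformed subimages amounts to ranking the per-position variances. A minimal sketch (the function name and the tie-breaking behavior inherited from argsort are implementation details, not part of the text):

```python
import numpy as np

def zonal_mask(coeff_blocks, keep):
    """Build a zonal mask from an ensemble of transformed subimages:
    estimate the variance of each coefficient position across the ensemble
    and place a 1 at the `keep` positions of largest variance."""
    stack = np.asarray(coeff_blocks, dtype=float)   # shape (num_blocks, n, n)
    variances = stack.var(axis=0)
    n = variances.shape[0]
    mask = np.zeros((n, n), dtype=np.uint8)
    idx = np.argsort(variances.ravel())[-keep:]     # largest-variance positions
    mask.ravel()[idx] = 1
    return mask
```

For typical images, the retained positions cluster around the transform origin, reproducing the shape of the mask in Fig. 8.29(a).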
Bit Allocation - 4
Threshold Coding implementation
Threshold coding implementation. Zonal coding usually is implemented by using a single fixed mask for all
subimages. Threshold coding, however, is inherently adaptive in the sense that the location of the transform
coefficients retained for each subimage varies from one subimage to another. In fact, threshold coding is the
adaptive transform coding approach most often used in practice because of its computational simplicity. The
underlying concept is that, for any subimage, the transform coefficients of largest magnitude make the most
significant contribution to reconstructed subimage quality, as demonstrated in the last example. Because the
locations of the maximum coefficients vary from one subimage to another, the elements of chi(u,v)T(u,v)
normally are reordered (in a predefined manner) to form a 1-D, run-length coded sequence. Figure
8.29(c) shows a typical threshold mask for one subimage of a hypothetical image. This mask provides a
convenient way to visualize the threshold coding process for the corresponding subimage, as well as to
mathematically describe the process using Eq. (8.2-24). When the mask is applied [via Eq. (8.2-24)] to the
subimage for which it was derived, and the resulting n×n array is reordered to form an n²-element coefficient
sequence in accordance with the zigzag ordering pattern of Fig. 8.29(d), the reordered 1-D sequence contains
several long runs of 0s [the zigzag pattern becomes evident by starting at 0 in Fig. 8.29(d) and
following the numbers in sequence]. These runs normally are run-length coded. The nonzero or retained
coefficients, corresponding to the mask locations that contain a 1, are represented using a variable-length
code.
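The zigzag reordering and run-length step can be sketched as follows. The "EOB" marker and the (run, value) pair format are illustrative simplifications of the codes actually used:

```python
def zigzag_indices(n):
    """(row, col) pairs of an n x n block in zigzag order: diagonals of
    constant r + c, traversed in alternating directions as in Fig. 8.29(d)."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def zigzag_rle(block):
    """Zigzag-scan a quantized block, then emit (zero_run, value) pairs for
    the nonzero entries and a final end-of-block marker."""
    seq = [block[r][c] for r, c in zigzag_indices(len(block))]
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")       # everything after the last nonzero entry is zero
    return pairs
```

Because quantization drives most high-frequency coefficients to zero, the tail of the zigzag sequence collapses into a single end-of-block symbol.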
Threshold Coding implementation - 2
There are three basic ways to threshold a transformed subimage or, stated differently, to create a subimage
threshold masking function of the form given in Eq. (8.2-23): (1) a single global threshold can be applied to
all subimages; (2) a different threshold can be used for each subimage; or (3) the threshold can be varied as a
function of the location of each coefficient within the subimage. In the first approach, the level of
compression differs from image to image, depending on the number of coefficients that exceed the global
threshold. In the second, called N-largest coding, the same number of coefficients is discarded for each
subimage. As a result, the code rate is constant and known in advance. The third technique, like the first, results
in a variable code rate, but offers the advantage that thresholding and quantization can be combined by
replacing chi(u,v)T(u,v) in Eq. (8.2-24) with

    T̂(u,v) = round[ T(u,v) / Z(u,v) ]        (8.2-26)
Threshold Coding implementation - 3
where T̂(u,v) is a thresholded and quantized approximation of T(u,v), and Z(u,v) is an element of the transform
normalization array

    Z = [ Z(u,v) ]  for u, v = 0, 1, 2, ..., n-1        (8.2-27)

Before a normalized (thresholded and quantized) subimage transform, T̂(u,v), can be inverse transformed to
obtain an approximation of subimage g(x,y), it must be multiplied by Z(u,v). The resulting denormalized
array, denoted Ṫ(u,v), is an approximation of T(u,v):

    Ṫ(u,v) = T̂(u,v) Z(u,v)        (8.2-28)

The inverse transform of Ṫ(u,v) yields the decompressed subimage approximation.
Figure 8.30(a) depicts Eq. (8.2-26) graphically for the case in which Z(u,v) is assigned a particular value c. Note
that T̂(u,v) assumes integer value k if and only if

    kc - c/2 <= T(u,v) < kc + c/2        (8.2-29)
Threshold Coding implementation - 4
If Z(u,v) > 2T(u,v), then T̂(u,v) = 0 and the transform coefficient is completely truncated or discarded. When
T̂(u,v) is represented with a variable-length code that increases in length as the magnitude of k increases, the
number of bits used to represent T(u,v) is controlled by the value of c. Thus the elements of Z can be scaled to
achieve a variety of compression levels.
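Equations (8.2-26) and (8.2-28) are one line each. In the sketch below, rounding is implemented as floor(x + 0.5) so that T̂(u,v) = k exactly on the half-open interval kc - c/2 <= T(u,v) < kc + c/2 of Fig. 8.30(a); the function names are ours:

```python
import numpy as np

def quantize(T, Z):
    """Eq. (8.2-26): T_hat = round(T / Z), with the tie at kc - c/2
    rounding up to k, matching the decision levels of Fig. 8.30(a)."""
    return np.floor(np.asarray(T, dtype=float) / Z + 0.5)

def denormalize(T_hat, Z):
    """Eq. (8.2-28): T_dot = T_hat * Z, performed by the decoder before
    the inverse transform."""
    return np.asarray(T_hat, dtype=float) * Z
```

With Z(u,v) = c = 16, for example, any coefficient of magnitude below c/2 = 8 quantizes to zero and is discarded, exactly the truncation condition stated above.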
JPEG - 2
greater compression, higher precision, or progressive reconstruction applications; and (3) a lossless
independent coding system for reversible compression. To be JPEG compatible, a product or system must
include support for the baseline system. No particular file format, spatial resolution, or color space model is
specified.
In the baseline system, often called the sequential baseline system, the input and output data precision is
limited to 8 bits, whereas the quantized DCT values are restricted to 11 bits. The compression itself is
performed in three sequential steps: DCT computation, quantization, and variable-length code
assignment. The image is first subdivided into pixel blocks of size 8×8,
which are processed left to right, top to bottom. As each 8×8 block or subimage is encountered, its 64 pixels
are level-shifted by subtracting the quantity 2^{k-1}, where 2^k is the maximum number of intensity levels.
The 2-D discrete cosine transform of the block is then
computed, quantized in accordance with Eq. (8.2-26), and reordered, using the zigzag pattern of Fig. 8.29(d), to
form a 1-D sequence of quantized coefficients.
Because the one-dimensionally reordered array generated under the zigzag pattern of Fig. 8.29(d) is arranged
qualitatively according to increasing spatial frequency, the JPEG coding procedure is designed to take
advantage of the long runs of zeros that normally result from the reordering. In particular, the nonzero AC