Unit 4 Updated DIP

The document discusses redundancy in data storage, particularly in image compression, highlighting types such as coding redundancy and interpixel redundancy. It explains various encoding techniques, including fixed-length and variable-length encoding, and introduces the Discrete Cosine Transform (DCT) as a key method for compressing images by transforming spatial data into frequency data. Additionally, it outlines different image file formats and compression methods, comparing lossless and lossy compression techniques.

Redundancy

• Redundancy refers to “storing extra information to represent a quantity of information”.

Information vs Data

DATA = INFORMATION + REDUNDANT DATA


Types of Redundancy
Coding Redundancy
• Code redundancy relates to how information is expressed through codes
representing data, such as the gray levels within an image.
• When these codes use excessive symbols to represent each gray level,
more than what’s required, the resultant image is described as having
coding redundancy.
• To better understand this, let’s look at two types of encoding: fixed-
length and variable-length encoding.
Fixed-length encoding
• We assign the same number of bits to represent each shade in fixed-length
encoding. For instance, let’s use 3 bits for simplicity. If we have four
shades of gray, the codes might look like this:
• Shade 1: 000
• Shade 2: 001
• Shade 3: 010
• Shade 4: 011
• Even though only 2 bits are needed to represent these four shades, using 3 bits for every shade introduces coding redundancy.
Variable-length encoding

• In variable-length encoding, such as Huffman coding, we assign shorter codes to the more frequent shades (see the sketch after this list). For example:
• Shade 1: 0
• Shade 2: 10
• Shade 3: 110
• Shade 4: 111
• Here, the more common shades get shorter codes, reducing the
number of bits used and minimizing coding redundancy.
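To make the saving concrete, here is a minimal Python sketch (not from the slides) that compares the average code length of the two schemes above. The shade probabilities are assumed purely for illustration.

```python
# Minimal sketch comparing fixed-length and variable-length coding cost
# for the four shades above. Shade probabilities are assumed for illustration.
fixed_codes    = {1: "000", 2: "001", 3: "010", 4: "011"}   # 3 bits per shade
variable_codes = {1: "0",   2: "10",  3: "110", 4: "111"}   # Huffman-style codes

prob = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}                # assumed frequencies

avg_fixed    = sum(prob[s] * len(fixed_codes[s])    for s in prob)   # 3.00 bits/pixel
avg_variable = sum(prob[s] * len(variable_codes[s]) for s in prob)   # 1.75 bits/pixel

print(f"fixed-length:    {avg_fixed:.2f} bits/pixel")
print(f"variable-length: {avg_variable:.2f} bits/pixel")
```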
Interpixel Redundancy
• Interpixel redundancy refers to the correlation between neighboring pixels in an
image. In most natural images, adjacent pixels are similar in value (like in a
smooth sky or a person’s skin), so storing each pixel value independently causes
redundant data.
• Consider a grayscale image row:
[100, 101, 102, 101, 100]
Each pixel is close in value to its neighbor — not much new information is added.
So instead of storing all values, we could store:
• The first pixel (e.g., 100), and
• The difference from the previous pixel: [+1, +1, -1, -1].
• This is called differential encoding or predictive coding and helps reduce
redundancy.
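A minimal Python sketch of this idea, using the example row above (first-order differential encoding; purely illustrative):

```python
import numpy as np

row = np.array([100, 101, 102, 101, 100])

# Encode: keep the first pixel, then store only pixel-to-pixel differences.
first, diffs = row[0], np.diff(row)                 # diffs = [1, 1, -1, -1]

# Decode: cumulative sum of the differences restores the row exactly.
restored = np.concatenate(([first], first + np.cumsum(diffs)))
assert np.array_equal(restored, row)
print(first, diffs.tolist())                        # 100 [1, 1, -1, -1]
```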
DCT (Discrete Cosine Transform)
• The Discrete Cosine Transform (DCT) is a mathematical tool used in image and
video compression, like JPEG, MPEG, and HEVC. It transforms spatial-domain
data (like pixel intensities) into frequency-domain data.
• DCT converts an image (or part of it) from pixels into frequencies.
• Natural images have a lot of low-frequency content (slow changes in brightness)
and very little high-frequency content (sharp edges, noise).
• DCT helps by:
• Concentrating most of the image’s information into fewer coefficients (usually
in the top-left corner).
• Allowing us to discard small/high-frequency coefficients that the human eye
doesn’t notice → this reduces file size.
Numerical of 1-D DCT
• Let the input signal be:
x=[52,55,61,66]
• We’ll compute the 1D DCT (Type-II) of this signal.
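The worked values from the slide are not reproduced here, but the sketch below computes the orthonormal 1-D DCT-II directly from its definition (one common normalisation; other texts scale the coefficients differently). For this signal it gives approximately [117.0, −10.77, 1.0, 0.13], with most of the energy concentrated in the first (DC) coefficient.

```python
import numpy as np

def dct_type_ii(x):
    """Orthonormal 1-D DCT (Type-II), computed directly from the definition."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    X = np.empty(N)
    for k in range(N):
        alpha = np.sqrt(1.0 / N) if k == 0 else np.sqrt(2.0 / N)
        X[k] = alpha * np.sum(x * np.cos(np.pi * (2 * n + 1) * k / (2 * N)))
    return X

print(dct_type_ii([52, 55, 61, 66]))   # approx [117.0, -10.77, 1.0, 0.13]
```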
Numerical of 2-D DCT
What Does Type-II Mean in DCT?
• The term "Type-II DCT" refers to one specific variant of the Discrete Cosine
Transform (DCT).
• There are actually 8 standard types of DCTs, labeled Type-I to Type-VIII, but
only a few are commonly used.
• DCT Type-II is the most commonly used version, especially in image and video compression standards such as JPEG (images), MPEG (video), and H.264/HEVC.
• When people say "DCT", they usually mean Type-II by default.
Comparison: Lossless vs Lossy Compression
• Lossless: no data is lost. Lossy: data is lost; less important information in the source media is removed.
• Lossless: an exact replica of the source is obtained. Lossy: an exact replica is not obtained.
• Lossless: used for text compression. Lossy: used for image, audio and video.
• Lossless: the process is reversible. Lossy: the process is irreversible.

JPEG – best for photographs or designs with people, places or things in them
PNG – best for images with transparent backgrounds
GIF – best for animated GIFs, otherwise, use the JPG format
Vector vs. Raster
• Raster Image Files
• Raster images are constructed by a series of pixels, or individual blocks, to form
an image.
• JPEG, GIF, and PNG are all raster image extensions.
• Every photo we find online or in print is a raster image.
• Pixels have a defined proportion based on their resolution (high or low), and
when the pixels are stretched to fill space they were not originally intended to fit,
they become distorted, resulting in blurry or unclear images.
• In order to retain pixel quality, we cannot resize raster images without
compromising their resolution.
• As a result, it is important to remember to save raster files at the exact
dimensions needed for the application.
• Vector Image Files
• Vector images are far more flexible.
• They are constructed using proportional formulas rather than pixels.
• EPS, AI and PDF are perfect for creating graphics that require frequent resizing.
• Your logo and brand graphics should have been created as a vector, and you should
always have a master file on hand.
• The real beauty of vectors lies in their ability to be sized as small as a postage
stamp, or large enough to fit on an 18-wheeler.
Types of Digital image files
• The main formats include:
1. TIFF (also known as TIF), file types ending in .tif
• TIFF stands for Tagged Image File Format. TIFF images create very large file
sizes. TIFF images are uncompressed and thus contain a lot of detailed image
data (which is why the files are so big).
• TIFFs are also extremely flexible in terms of color (they can be grayscale, or
CMYK for print, or RGB for web) and content (layers, image tags).
• TIFF is the most common file type used in photo software (such as Photoshop),
as well as page layout software (such as Quark and InDesign), again because a
TIFF contains a lot of image data.
2. JPEG (also known as JPG), file types ending in .jpg
• JPEG stands for Joint Photographic Experts Group, which created this standard
for this type of image formatting.
• JPEG files are images that have been compressed to store a lot of information in
a small-size file.
• Most digital cameras store photos in JPEG format, because then you can take
more photos on one camera card than you can with other formats.
• A JPEG is compressed in a way that loses some of the image detail during the
compression in order to make the file small (and thus called “lossy”
compression).
• JPEG files are usually used for photographs on the web, because they create a
small file that is easily loaded on a web page and also looks good.
• JPEG files are bad for line drawings or logos or graphics, as the compression
makes them look “bitmappy” (jagged lines instead of straight ones).
3. GIF, file types ending in .gif
• GIF stands for Graphic Interchange Format.
• This format compresses images but, as different from JPEG, the compression
is lossless (no detail is lost in the compression, but the file can’t be made as
small as a JPEG).
• GIFs also have an extremely limited color range suitable for the web but not
for printing.
• This format is never used for photography, because of the limited number of
colors. GIFs can also be used for animations.

4. PNG, file types ending in .png


• PNG stands for Portable Network Graphics.
• It was created as an open format to replace GIF, because the patent for GIF
was owned by one company and nobody else wanted to pay licensing fees. It
also allows for a full range of color and better compression.
• It’s used almost exclusively for web images, never for print images.
• For photographs, PNG is not as good as JPEG, because it creates a larger file.
But for images with some text, or line art, it’s better, because the images look
less “bitmappy.”
5. RAW - Raw Image Formats
• A RAW image is the least-processed image type on this list -- it's often the first
format a picture inherits when it's created.
• When we snap a photo with a camera, it is saved immediately in a raw file format.
• Only when we upload the media to a new device and edit it using image software is it saved using one of the image extensions explained above.

6. PSD - Photoshop Document


• PSDs are files that are created and saved in Adobe Photoshop, the most popular
graphics editing software ever.
• This type of file contains "layers" that make modifying the image much easier to
handle.
• This is also the program that generates the raster file types mentioned above.
7. PDF - Portable Document Format
• PDFs were invented by Adobe with the goal of capturing and reviewing rich
information from any application, on any computer, with anyone, anywhere.
• If a designer saves your vector logo in PDF format, you can view it without any
design editing software (as long as you have downloaded the free Acrobat Reader
software), and they have the ability to use this file to make further manipulations.
• This is by far the best universal tool for sharing graphics.

8. EPS - Encapsulated Postscript


• EPS is a file in vector format that has been designed to produce high-resolution
graphics for print.
• Almost any kind of design software can create an EPS.
• The EPS extension is more of a universal file type (much like the PDF) that can be
used to open vector-based artwork in any design editor, not just the more common
Adobe products.
9. AI - Adobe Illustrator Document
• AI is, by far, the image format most preferred by designers and the most reliable type
of file format for using images in all types of projects from web to print, etc.
• Adobe Illustrator is the industry standard for creating artwork from scratch and
therefore more than likely the program in which your logo was originally rendered.
• Illustrator produces vector artwork, the easiest type of file to manipulate.

10. INDD - Adobe Indesign Document


• INDDs (Indesign Document) are files that are created and saved in Adobe Indesign.
• Indesign is commonly used to create larger publications, such as newspapers,
magazines and eBooks.
Some Basic Compression Methods
Lossless predictive coding:
• Based on eliminating the interpixel redundancies of closely spaced pixels by extracting & coding only the new information in each pixel.
• New information: the difference between the actual & predicted value of that pixel.
• The figure shows the basic components of a lossless predictive coding system.
• It consists of an encoder & a decoder, each containing an identical predictor.
• As each successive pixel f(n) of the input image is introduced to the encoder, the predictor generates its anticipated value.
• The output of the predictor is then rounded to the nearest integer, denoted f̄(n), & used to form the difference or prediction error:

e(n) = f(n) − f̄(n)

• The prediction error is coded using a variable-length code to generate the next element of the compressed data stream.
• The decoder reconstructs f(n) from the received variable-length code words & performs the inverse operation:

f(n) = e(n) + f̄(n)

• f̄(n) is generated by a prediction formed as a linear combination of the m previous pixels:

f̄(n) = round[ Σ_{i=1}^{m} α_i f(n−i) ]

where m is the order of the linear predictor, round denotes the rounding function, and α_i (for i = 1, 2, …, m) are the prediction coefficients.
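A minimal Python sketch of this scheme, here with a first-order predictor (m = 1, α₁ = 1, i.e. “predict the previous pixel”); the sample row and the convention of sending the first m samples unchanged are assumptions for illustration:

```python
import numpy as np

def predictive_encode(f, alpha):
    """Lossless predictive coding: e(n) = f(n) - round(sum_i alpha_i * f(n-i)).
    The first m samples are transmitted unchanged (an assumed convention)."""
    m, f = len(alpha), np.asarray(f, dtype=int)
    e = f.copy()
    for n in range(m, len(f)):
        f_bar = int(round(sum(alpha[i] * f[n - 1 - i] for i in range(m))))
        e[n] = f[n] - f_bar
    return e

def predictive_decode(e, alpha):
    """Inverse operation f(n) = e(n) + f_bar(n), using already-decoded pixels."""
    m, f = len(alpha), np.asarray(e, dtype=int).copy()
    for n in range(m, len(f)):
        f_bar = int(round(sum(alpha[i] * f[n - 1 - i] for i in range(m))))
        f[n] = e[n] + f_bar
    return f

row, alpha = [100, 101, 103, 104, 104, 103], [1.0]
e = predictive_encode(row, alpha)                    # e = [100, 1, 2, 1, 0, -1]
assert predictive_decode(e, alpha).tolist() == row   # exact reconstruction
```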
Lossy Predictive Coding:
• In this method we add a quantizer to the lossless predictive coding model.
• It replaces the nearest-integer function & is placed between the symbol encoder & the point where the prediction error forms.
• It maps the prediction error into a limited range of outputs, denoted ė(n), which establishes the amount of compression & distortion.
• In order to accommodate the insertion of the quantization step, the error-free encoder must be altered so that the predictions made by the encoder & decoder are equivalent.
• This is accomplished by placing the predictor within a feedback loop, where its input ḟ(n) is generated as a function of past predictions & quantized errors:

ḟ(n) = ė(n) + f̄(n)

• This closed-loop configuration prevents error buildup at the decoder’s output.
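A minimal Python sketch of the closed-loop arrangement; the uniform quantizer step size and the input row are assumptions for illustration. Because the predictor is fed the reconstructed values ḟ(n), encoder and decoder stay in step and the reconstruction error never exceeds half the quantizer step.

```python
import numpy as np

def lossy_predictive_code(f, step=4):
    """Lossy predictive coding with a uniform quantizer inside the feedback
    loop, using a first-order predictor f_bar(n) = f_dot(n - 1)."""
    f = np.asarray(f, dtype=float)
    f_dot = np.empty_like(f)        # reconstructed samples (encoder and decoder agree)
    e_dot = np.empty_like(f)        # quantized prediction errors (what gets coded)
    f_dot[0] = e_dot[0] = f[0]      # assume the first sample is sent as-is
    for n in range(1, len(f)):
        f_bar    = f_dot[n - 1]                  # prediction from the past *reconstructed* value
        e        = f[n] - f_bar                  # prediction error
        e_dot[n] = step * np.round(e / step)     # uniform quantization (the lossy step)
        f_dot[n] = e_dot[n] + f_bar              # closed loop: f_dot(n) = e_dot(n) + f_bar(n)
    return e_dot, f_dot

f = [100, 103, 109, 112, 111, 108]               # assumed input row
e_dot, f_rec = lossy_predictive_code(f)
print(f_rec)                                     # within step/2 of f at every sample
```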
Non-Adaptive v.s. Adaptive
Predictive Coding
• In non-adaptive coders, a fixed set of prediction coefficients is used across the entire image.
• In adaptive coders, the correlation coefficients R(k, l), and hence the prediction coefficients a_k, are updated based on local samples.
– In forward adaptive predictive coder, for each block of pixels,
the correlation coefficients are calculated for this block and the
optimal coefficients are used.
– In backward adaptive predictive coder, the correlation
coefficients and consequently the prediction coefficients are
updated based on the past reconstructed samples, in both the
encoder and decoder.
Transform Coding
• Predictive coding is a spatial-domain technique, since it operates on the pixel values directly.
• Transform coding techniques operate on the coefficients of a reversible linear transform of the image (e.g., DCT, DFT, Walsh, etc.).
Steps
• Input N × N image is subdivided into subimages of size n × n .
• n × n subimages are converted into transform arrays. This tends to
decorrelate pixel values and pack as much information as possible
in the smallest number of coefficients.
• Quantizer selectively eliminates or coarsely quantizes the
coefficients with least information.
• Symbol encoder uses a variable-length code to encode the
quantized coefficients.
• Any of the above steps can be adapted to each subimage (adaptive
transform coding), based on local image information, or fixed for
all subimages.
Lossy Compression
Transform Coding
T(u,v) = Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} f(x,y) g(x,y,u,v),   u, v = 0, 1, …, N−1

f(x,y) = Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} T(u,v) h(x,y,u,v),   x, y = 0, 1, …, N−1

The forward kernel is separable if:
g(x,y,u,v) = g_1(x,u) · g_2(y,v)

The forward kernel is symmetric if:
g_1 = g_2, so that g(x,y,u,v) = g_1(x,u) · g_1(y,v)


Lossy Compression
Transform Coding
Discrete Fourier Transform (DFT):
g(x,y,u,v) = (1/N) e^{−j2π(ux+vy)/N}
h(x,y,u,v) = e^{j2π(ux+vy)/N}

Walsh–Hadamard Transform (WHT):

g(x,y,u,v) = h(x,y,u,v) = (1/N) (−1)^{Σ_{i=0}^{m−1} [b_i(x) p_i(u) + b_i(y) p_i(v)]},   where N = 2^m

b_k(z) is the k-th bit (from right to left) in the binary representation of z.
Discrete Cosine Transform (DCT)
• Given a two-dimensional N × N image f(m, n), its discrete cosine transform (DCT) C(u, v) is defined (here in the standard orthonormal form) as:

C(u,v) = α(u) α(v) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} f(m,n) cos[ (2m+1)uπ / 2N ] cos[ (2n+1)vπ / 2N ]

where α(0) = √(1/N) and α(k) = √(2/N) for k ≠ 0.

• Similarly, the inverse discrete cosine transform (IDCT) is given by:

f(m,n) = Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} α(u) α(v) C(u,v) cos[ (2m+1)uπ / 2N ] cos[ (2n+1)vπ / 2N ]

• The DCT is:
• Separable (the 2-D transform can be performed as 1-D transforms along rows and then columns).
• Symmetric (the operations on the variables m and n are identical).
• The forward and inverse kernels are identical.
• The DCT is the most popular transform for image compression
algorithms like JPEG (still images), MPEG (motion pictures).
• The more recent JPEG2000 standard uses wavelet transforms instead of
DCT.
Subimage Size Selection
• Images are subdivided into subimages of size n × n to reduce the
correlation (redundancy) between adjacent subimages.
• Usually n = 2^k for some integer k. This simplifies the computation of the transforms (e.g., via the FFT algorithm).
• Typical block sizes used in practice are 8×8 and 16×16.
Bit Allocation
• After transforming each subimage, only a fraction of the coefficients are
retained. This can be done in two ways:

• Zonal coding: Transform coefficients with large variance are retained. Same set
of coefficients retained in all subimages.

• Threshold coding: Transform coefficients with large magnitude in each


subimage are retained. Different set of coefficients retained in different
subimages.
• The retained coefficients are quantized and then encoded.
• The overall process of truncating, quantizing, and coding the transformed
coefficients of the subimage is called bit-allocation
Transform Coding- Bit allocation
• Zonal coding
1. Fixed number of bits / coefficient
- Coefficients are normalized by their standard
deviations and uniformly quantized
2. Fixed number of bits is distributed among the coefficients unequally.
Transform Coding- Bit allocation
• Threshold coding
1. Single global threshold
2. Different threshold for each subimage ( N- Largest coding)
3. Threshold can be varied as a function of the location of each coeff.
• In each subimage, the transform coefficients of largest magnitude contribute most
significantly and are therefore retained.
• A different set of coefficients is retained in each subimage, so this is an adaptive transform coding technique.
• The thresholding can be represented as T(u,v)m(u,v), where m(u,v) is a masking function.
• The elements of T(u,v)m(u,v) are reordered in a predefined manner to form a 1-D sequence of length n². This sequence has several long runs of zeros, which are run-length encoded.
The thresholding itself can be done in three different ways, depending on the “truncation criterion”:
• A single global threshold is applied to all subimages. The level of
compression differs from image to image depending on the number of
coefficients that exceed the threshold.
• N-largest coding: The largest N coefficients are retained in each
subimage. Therefore, a different threshold is used for each subimage. The
resulting code rate (total # of bits required) is fixed and known in advance.
• The threshold is varied as a function of the location of each coefficient in the subimage. This results in a variable code rate (compression ratio). Here, the thresholding and quantization steps can be represented together.
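A minimal Python sketch of N-largest (threshold) coding on a single toy transform block; the block values and N are assumptions for illustration. The mask m(u,v) marks the retained coefficients and is different for each subimage.

```python
import numpy as np

def n_largest_mask(T, N=4):
    """Keep only the N transform coefficients of largest magnitude; zero the rest."""
    threshold = np.sort(np.abs(T).ravel())[-N]       # per-subimage threshold
    m = (np.abs(T) >= threshold).astype(int)         # masking function m(u, v)
    return T * m, m

T = np.array([[ 90., -40.,  6., 2.],                 # toy 4x4 transform block
              [-35.,  12., -3., 1.],
              [  5.,  -2.,  1., 0.],
              [  2.,   1.,  0., 0.]])
T_kept, mask = n_largest_mask(T, N=4)
print(mask)   # 1s mark the four retained (largest-magnitude) coefficients
```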
Wavelet Transforms
• Represent an image as a sum of wavelet functions (wavelets) with different
locations and scales.
• Any decomposition of an image into wavelets involves a pair of
waveforms: one to represent the high frequencies corresponding to the
detailed parts of an image and one for the low frequencies or smooth parts
of an image.
 Discrete Wavelet Transform
• The Discrete Wavelet Transform (DWT) of image signals produces a
nonredundant image representation, which provides better spatial and spectral
localization of image formation, compared with other multi scale representations
such as Gaussian and Laplacian pyramid.
• Recently, the Discrete Wavelet Transform has attracted more and more interest in image fusion. An image can be decomposed into a sequence of images of different spatial resolution using the DWT. In the case of a 2-D image, an N-level decomposition can be performed, resulting in 3N + 1 different frequency bands, as shown in the figure.
JPEG Compression

• JPEG stands for Joint Photographic Experts Group

• Used on 24-bit color files.

• Works well on photographic images.

• Although it is a lossy compression technique, it yields an excellent quality image with high compression rates.

JPEG Schematic
[Block diagram]
Compression: colour transform → down-sampling → forward DCT → quantization → encoding → compressed image data.
Decompression: decoding → de-quantization → inverse DCT → up-sampling → colour transform.
Fact about JPEG Compression

• It defines three different coding systems:
1. A lossy baseline coding system, adequate for most compression applications.
2. An extended coding system for greater compression, higher precision, or progressive reconstruction applications.
3. A lossless independent coding system for reversible compression.

Algorithm
• Splitting: Split the image into 8 x 8 non-overlapping pixel blocks. If the
image cannot be divided into 8-by-8 blocks, then you can add in empty
pixels around the edges, essentially zero-padding the image.
• Colour Space Transform from [R,G,B] to [Y,Cb,Cr] & Subsampling.
• DCT: Take the Discrete Cosine Transform (DCT) of each 8-by-8 block.
• Quantization: quantize the DCT coefficients according to psycho-visually
tuned quantization tables.
• Serialization: by zigzag scanning pattern to exploit redundancy.
• Vectoring: Differential Pulse Code Modulation (DPCM) on DC
components
• Run Length Encoding (RLE) on AC components
• Entropy Coding:
– Run Length Coding
– Huffman Coding or Arithmetic Coding
Step I - Splitting
The input image is divided into smaller blocks having 8 x 8 dimensions,
summing up to 64 units in total. Each of these units is called a pixel, which is the
smallest unit of any image.

[Figure: original image split into multiple 8x8 pixel blocks]
Step II - RGB to YCbCr conversion
• JPEG makes use of [Y,Cb,Cr] model instead of [R,G,B] model.
• The precision of colors suffer less (for a human eye) than the precision of
contours (based on luminance)

Simple color space model: [R,G,B] per pixel


JPEG uses [Y, Cb, Cr] Model
Y = Brightness
Cb= Color blueness
Cr = Color redness
[Figure: colour conversion with sampling factor (1, 1, 1); in an 8x8 pixel block, 1 pixel = 3 components]

Y = 0.299R + 0.587G + 0.114B


Cb = -0.1687R – 0.3313G + 0.5B + 128
Cr = 0.5R – 0.4187G – 0.0813B + 128
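A minimal Python sketch applying the three formulas above to an image array (the pure-red test pixel is an assumed example):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an RGB image (H x W x 3, values 0-255) to Y, Cb, Cr planes
    using the formulas above."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    y  =  0.299  * r + 0.587  * g + 0.114  * b
    cb = -0.1687 * r - 0.3313 * g + 0.5    * b + 128
    cr =  0.5    * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

# Example on a single pure-red pixel (255, 0, 0):
y, cb, cr = rgb_to_ycbcr(np.array([[[255, 0, 0]]], dtype=np.uint8))
print(y.item(), cb.item(), cr.item())   # approx 76.2, 85.0, 255.5
```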
Down-sampling
Y is taken for every pixel, and Cb, Cr are taken for a block of 2x2 pixels.

[Figure: four 8x8 blocks forming a 16x16-pixel MCU with sampling factor (2, 1, 1)]
MCU = Minimum Coded Unit (the smallest unit that can be coded).
Data size reduces to half.
Step III - Forward DCT

• The DCT uses the cosine function, therefore not interacting with complex
numbers at all.
• DCT converts the information contained in a block(8x8) of pixels from
spatial domain to the frequency domain.

Why DCT?
• Human vision is insensitive to high frequency components, due to which it
is possible to treat the data corresponding to high frequencies as redundant.
To segregate the raw image data on the basis of frequency, it needs to be
converted into frequency domain, which is the primary function of DCT.
DCT
Formula
• 1-D DCT

• But the image matrix is a 2-D matrix, so we can either apply the 1-D DCT to the image matrix twice (once row-wise, then column-wise) to get the DCT coefficients, or apply the standard 2-D DCT formula used for JPEG compression. If the input matrix is P(x,y) and the transformed matrix is F(u,v) (or G(u,v)), then the DCT for the 8 x 8 block is computed using the expression:

• 2-D DCT (for an 8 x 8 block):

F(u,v) = (1/4) C(u) C(v) Σ_{x=0}^{7} Σ_{y=0}^{7} P(x,y) cos[ (2x+1)uπ / 16 ] cos[ (2y+1)vπ / 16 ],

where C(0) = 1/√2 and C(k) = 1 for k > 0.
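A minimal Python sketch of the separable computation described above: build an orthonormal 1-D DCT basis matrix D and apply it along rows and then columns (F = D P Dᵀ). The flat test block is an assumed example; note that the orthonormal normalisation used here differs by a constant factor from the 1/4·C(u)C(v) form.

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal 1-D DCT-II basis matrix D, so that F = D @ P @ D.T is the 2-D DCT."""
    n = np.arange(N)
    k = np.arange(N).reshape(-1, 1)
    D = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    D[0, :] = np.sqrt(1.0 / N)
    return D

def dct_2d(P):
    """2-D DCT of a block via separability: 1-D DCT on rows, then on columns."""
    D = dct_matrix(P.shape[0])
    return D @ P @ D.T

P = np.full((8, 8), 128.0)        # assumed flat 8x8 block
F = dct_2d(P)
print(round(float(F[0, 0])))      # 1024: all energy in the DC coefficient
print(np.abs(F[1:, 1:]).max())    # ~0: no AC energy for a flat block
```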
DC and AC components

P(x,y) (8x8) → 2-D DCT → F(u,v) (8x8)

F(0,0) is called the DC component and the rest of F(u,v) are called the AC components.

• For u = v = 0 the two cosine terms both equal 1, and hence the value in location F[0,0] of the transformed matrix is simply a function of the summation of all the values in the input matrix.
• It is proportional to the mean of all 64 values in the matrix and is known as the DC coefficient.
• Since the values in all the other locations of the transformed matrix have a
frequency coefficient associated with them they are known as AC coefficients.
Step IV - Quantization
• Humans are unable to see aspects of an image that are at really high
frequencies. Since taking the DCT allows us to isolate where these high
frequencies are, we can take advantage of this in choosing which values to
preserve. By multiplying the DCT matrix by some mask, we can zero out
elements of the matrix, thereby freeing the memory that had been representing
those values.
• The resultant quantize matrix will only preserve values at the lowest
frequencies up to a certain point.
• Why Quantization?
– To reduce the number of bits per sample.
• Two types:
– Uniform quantization
• q(u,v) is a constant
– Non-uniform quantization
• Custom quantization tables can be put in image/scan header.
• JPEG Standard defines two default quantization tables, one each for
luminance and chrominance.
Quantization

Standard formula:

F̂(u,v) = round( F(u,v) / Q(u,v) )

• F(u,v) represents a DCT coefficient, Q(u,v) is the quantization matrix, and F̂(u,v) represents the quantized DCT coefficient to be passed on for subsequent entropy coding.
• The quantization step is the major information losing step in JPEG
compression.
• For non-uniform quantization, there are 2 psycho-visually tuned
quantization tables each for luminance (Y) and chrominance (Cb,Cr)
components defined by jpeg standards.
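A minimal Python sketch of the formula above, together with the inverse step performed by the decoder (quantization is where the information is actually discarded):

```python
import numpy as np

def quantize(F, Q):
    """F_hat(u,v) = round(F(u,v) / Q(u,v))."""
    return np.round(F / Q).astype(int)

def dequantize(F_hat, Q):
    """Decoder-side approximation F(u,v) ≈ F_hat(u,v) * Q(u,v); the rounding
    error lost in quantize() cannot be recovered."""
    return F_hat * Q
```

Larger Q(u,v) entries (towards the lower right, i.e. higher spatial frequencies) force more coefficients to zero, which is what the default tables that follow are tuned to do.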
Quantization
The Luminance Quantization Table:
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

The Chrominance Quantization Table:
17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99

• The entries of Q(u,v) tend to have larger values towards the lower right corner. This
aims to introduce more loss at the higher spatial frequencies.
• The tables above show the default Q(u,v) values obtained from psychophysical studies
with the goal of maximizing the compression ratio while minimizing perceptual losses
in JPEG images.
Quantization - Example
DCT coefficients F(u,v):
160  80  47  39   5   3   0   0
 65  53  48   8   5   2   0   0
 58  34   6   4   2   0   0   0
 40   7   6   1   1   0   0   0
  8   4   0   0   0   0   0   0
  5   2   0   0   0   0   0   0
  3   1   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

Quantization Table Q(u,v) (luminance):
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

Quantized coefficients F̂(u,v):
16   7   4   2   0   0   0   0
 5   4   3   0   0   0   0   0
 4   2   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0

Each element of F(u,v) is divided by the corresponding element of Q(u,v) and then rounded off to the nearest integer to get the F̂(u,v) matrix.


Step V – Zig-Zag Scan
• Maps 8 x 8 matrix to a 1 x 64 vector.
• Why zigzag scanning?
– To group low frequency coefficients at the top of the vector and high frequency
coefficients at the bottom.
• In order to exploit the presence of the large number of zeros in the
quantized matrix, a zigzag of the matrix is used.

(8x8 matrix) → zig-zag scan → end product: (1x64 vector)
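A minimal Python sketch that generates the zig-zag visiting order and applies it to an 8x8 block (the parity rule below reproduces the standard JPEG pattern):

```python
import numpy as np

def zigzag_order(n=8):
    """(row, col) visiting order of an n x n block: anti-diagonals are walked
    in alternating directions, which is the standard JPEG zig-zag pattern."""
    coords = [(i, j) for i in range(n) for j in range(n)]
    return sorted(coords, key=lambda p: (p[0] + p[1],
                                         p[0] if (p[0] + p[1]) % 2 else p[1]))

def zigzag_scan(block):
    """Map an 8x8 matrix to a 1x64 vector using the zig-zag order."""
    return np.array([block[i, j] for i, j in zigzag_order(block.shape[0])])

print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```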
Step VI – Vectoring: DPCM on DC
component
• Differential Pulse Code Modulation (DPCM) is applied to the DC component.
• DC component is large and varies, but often close to previous value.
• DPCM encodes the difference between the current and previous 8x8 block.
• The entropy coding operates on a 1-dimensional string of values (vector). However the
output of the quantization matrix is a 2-D matrix and hence this has to be represented in 1- D
form. This is known as Vectoring.
• Example: if the DC coefficients of successive (1x64) blocks are 38, 45, 34, …, 29, the DPCM-coded DC values are 38, 7, −11, …, −5 (each DC value minus that of the previous block; the first value is sent as-is).
Step VII - RLE on AC
components
• Run Length Encoding (RLE) is applied to the AC components.
• 1x63 vector (AC) has lots of zeros in it.
• Encoded as (skip, value) pairs, where skip is the number of zeros preceding
a non-zero value in the quantized matrix, and value is the actual coded
value of the non-zero component.
• (0,0) is sent as end-of-block (EOB) sentinel value and the zeros after that
are discarded. Only the number of zeros is taken note of.

Example: the run … 0 0 0 6 0 0 0 0 0 1 … in the (1x64) vector is encoded as (3,6) (5,1).
RLE Examples
Example 1-

Example 2-

The 1x64 vector is reduced in terms of bits by grouping its elements into (repeat count, pixel value) pairs.
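A minimal Python sketch of the (skip, value) encoding of the AC coefficients; real JPEG additionally caps the run length at 15 (emitting special (15, 0) symbols), which this simplified sketch omits. The input fragment is the one from the slide.

```python
def rle_ac(ac):
    """(skip, value) run-length coding of AC coefficients:
    skip = number of zeros before each non-zero value; (0, 0) marks end of block."""
    pairs, skip = [], 0
    for v in ac:
        if v == 0:
            skip += 1
        else:
            pairs.append((skip, v))
            skip = 0
    pairs.append((0, 0))              # EOB: trailing zeros are simply discarded
    return pairs

print(rle_ac([0, 0, 0, 6, 0, 0, 0, 0, 0, 1, 0, 0]))   # [(3, 6), (5, 1), (0, 0)]
```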
Step VIII – Huffman Coding
• DC and AC components finally need to be represented by a smaller
number of bits
• Why Huffman Coding?
– Significant levels of compression can be obtained by replacing long strings of binary digits
by a string of much shorter code words.
• How?
– The length of each code word is a function of its relative frequency of occurrence.
• Normally a table of code words is used with the set of code words pre-
computed using the Huffman Coding Algorithm.
• In Huffman Coding, each DPCM-coded DC coefficient is represented by a
pair of symbols : (Size, Amplitude)
• where Size indicates number of bits needed to represent the coefficient and
Amplitude contains actual bits.
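The slide's look-up tables are not reproduced here, but the sketch below shows how the (Size, Amplitude) pair can be derived for a DPCM-coded DC difference under the usual JPEG convention (Size is then Huffman-coded, and the Amplitude bits are appended as-is); the example values are assumed:

```python
def dc_size_amplitude(diff):
    """(Size, Amplitude) for a DPCM-coded DC difference.
    Size = number of bits of |diff|; negative amplitudes are stored in
    one's-complement form (usual JPEG convention)."""
    size = abs(diff).bit_length()                       # 0 when diff == 0
    if size == 0:
        return 0, ""
    amp = diff if diff > 0 else diff + (1 << size) - 1  # one's complement for negatives
    return size, format(amp, f"0{size}b")

print(dc_size_amplitude(7))     # (3, '111')
print(dc_size_amplitude(-11))   # (4, '0100')
```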
Huffman Coding: DC
Components
• DC Components are coded as (Size,Value). The look-up table for generating
code words for Value is as given below:-

Table 1: Huffman Table for DC components Value field


Huffman Coding – DC
Components
• The look-up table for generating code words for Size is as given below:

Table 2: Huffman Table for DC components Size field
The JPEG File Structure

Short name   Bytes              Size            Name
SOI          0xFFD8             none            Start Of Image
SOF0         0xFFC0             variable size   Start Of Frame (Baseline DCT)
SOF2         0xFFC2             variable size   Start Of Frame (Progressive DCT)
DHT          0xFFC4             variable size   Define Huffman Table(s)
DQT          0xFFDB             variable size   Define Quantization Table(s)
DRI          0xFFDD             2 bytes         Define Restart Interval
SOS          0xFFDA             variable size   Start Of Scan
RSTn         0xFFD0…0xFFD7      none            Restart
APPn         0xFFEn             variable size   Application-specific
COM          0xFFFE             variable size   Comment
EOI          0xFFD9             none            End Of Image
Merits of JPEG Compression

• Works with colour and grayscale images, but not with binary images.
• Supports up to 24-bit colour images (unlike GIF).
• Targets photographic-quality images (unlike GIF).
• Suitable for many applications, e.g., satellite imagery, medical imaging, general photography, etc.
