Wk7 JPEG Withlinks

The document discusses the processing, memory, and communication aspects of computer vision systems, focusing on video data representation and compression techniques. It highlights the importance of reducing image data volume through methods like JPEG compression, which involves color space conversion, discrete cosine transform, quantization, and entropy coding. The document also contrasts analogue and digital video transmission, emphasizing the trade-offs between compression efficiency and image quality.


CE6023

Processing, Memory and Communications


Instructor: Patrick Denny
Processing, Memory and Communication
• Computer Vision Systems need to process a representation of the environment, so we need to consider how video data is actually represented as it passes through a computer vision system
• To do so, we will examine representations of video data using some of the main standards and examine the trade-offs between compression and video quality.
• We will also need to consider the implications for the overall video architectures
• You will develop an understanding of the considerations that are made when moving images around

(c) Patrick Denny 2024 2


Compression
• Each pixel from an image sensor provides information about a scene, but this is generally considerably more than is needed to convey the important visual information about objects around a computer vision system.
• Note that such a system may have to send images over a communications link
• If they are too big, then
• images get moved around too slowly (low bandwidth)
• images get delivered in full too slowly (high latency)
• images incur excessive storage (high storage volume)
• These flaws have significant downstream effects
• Realtime applications become infeasible as information moves around too slowly and/or arrives too late
• Expensive, difficult, power-hungry electronics are needed to increase bandwidth and reduce latency
• Expensive, power-hungry, space-wasting storage is needed
• If we have too much data representing an image then our systems can grind to a halt.
• We need consistent methods to reduce the volume of an image’s data without significantly reducing an image’s
information
• This we call compression



Compression – Analogue
Video Transmission

• The original forms of image compression used in analogue television were implicit, in the sense that images of scenes were made by scanning horizontal lines across a scene and representing them as analogue voltages.
• These approaches are nowadays largely historical as
the world has gone digital.
• However, there are still simple analogue camera and
transmission systems in existence based on a rich
history of solid engineering.
• Nowadays, we live in the digital age, so we will consider compression from the perspective of moving around
• Images (using JPEG to illustrate the concepts)
• Video (using MPEG-2 to illustrate the concepts)

[Figure: An analogue NTSC signal of a scanline across a Yellow-Cyan-Green-Magenta-Red-Blue colour bar]



JPEG Image Compression



How JPEG works
• The JPEG file format, introduced in 1992, was one of the most technologically impressive advances in image compression of its time
• Since then, it has been a dominant format for the representation of photo-quality images and has enabled computer vision systems by giving them the facility to move image data efficiently.
• We can break down the basic idea of JPEG compression into the following steps.



Colourspace Conversion
• One of the key principles of lossy data compression is that the human visual system uses far less information to represent a scene than silicon image sensors capture.
• In deciding on how best to represent image information, the Joint Photographic Experts Group (JPEG) selected the YCbCr colour space
• YCbCr is an approximately perceptually uniform colour space.
• A colour space is perceptually uniform if a change
of length in any direction of the colour space is
perceived by a human as the same change.
• A non-uniform perceptual colourmap can have
stark contrasts when transitioning from one hue
to another hue.
• In data visualization, these contrasts can be
mistaken as changes in the data rather than as
transitions in the color palette.
• A perceptually uniform colour space is a good
choice relating to what a human perceives
• The first step is a conversion of an image’s data from
RGB to YCbCr
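The conversion step can be sketched in Python. The coefficients below are the full-range BT.601 values commonly used with JPEG/JFIF; this is an illustrative sketch, not the standard's normative text:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr as commonly used with JPEG/JFIF.

    Y carries the luma; Cb and Cr carry chroma, centred on 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

For example, pure white (255, 255, 255) maps to Y = 255 with both chroma channels at their neutral value of 128.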
Colourspace Conversion – downsampling

[Figure: the Y, Cb and Cr channels of an image]
• In a conversion from RGB to YCbCr, the resulting Cb and Cr channels, which hold the chroma information, carry less perceptually important information than the Y channel does.
• As a result, the JPEG algorithm resizes the Cb and Cr channels to about ¼ of their original size
• This step is downsampling
• Downsampling is lossy (i.e., you won’t be able to recover the exact source colours), but the overall impact on the human visual system is minimal.
• Luma (Y) is where most of the perceptual
information is for a human so the impact of the
downsampling on the visual system is low.
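The downsampling step can be sketched as follows, assuming even channel dimensions and a simple 2x2 averaging filter (real encoders may use other filters):

```python
def downsample_420(channel):
    """Quarter a chroma channel by averaging each 2x2 block of samples.

    `channel` is a list of rows; height and width are assumed even.
    """
    h, w = len(channel), len(channel[0])
    return [[(channel[y][x] + channel[y][x + 1]
              + channel[y + 1][x] + channel[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

Each 2x2 neighbourhood of chroma samples collapses to one value, so the channel holds ¼ of the original data.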



Image divided into 8x8 blocks of pixels
• From this point on, JPEG does all operations on 8x8 blocks of pixels
• This is done because we generally expect that there is not a lot of variation over the 8x8 blocks, even in very
complex photos – there tends to be some self-similarity in local areas
• This similarity is made use of during compression later.
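The blocking step can be sketched in plain Python, assuming the image dimensions are multiples of 8 (real encoders pad edge blocks):

```python
def split_into_blocks(img, n=8):
    """Tile a 2-D image (a list of rows) into a grid of n x n blocks."""
    h, w = len(img), len(img[0])
    return [[[row[x:x + n] for row in img[y:y + n]]
             for x in range(0, w, n)]
            for y in range(0, h, n)]
```

A 16x16 image, for instance, yields a 2x2 grid of 8x8 blocks.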



Discrete Cosine Transform (DCT)
• We now use a trick where we represent the 8x8 blocks in a different way that makes use of the similarity
• The discrete cosine transform allows us to represent the data in the 8x8 blocks losslessly but facilitates
compression.
• The key assumption of the DCT is that the 8x8 blocks can be represented as a combination of cosine functions.
• Let’s look at some simple examples of what we mean



Discrete Cosine Transform (DCT)
• For example, consider the following graph

[Figure: graph of a waveform]

• You can see that it is actually a sum of cos(x) + cos(2x) + cos(4x)



Discrete Cosine Transform (DCT)
• In the previous example, we had a 1-D fit using cosine functions, and we can do the same in 2-D using a set of basis functions
• The basis functions can be used to
build up the image in a unique
way that captures all the values of
the 8x8 block
• Any macroblock is a sum of
multiples of these basis functions



Example using DCT
• The discrete cosine transform converts each 8x8 block of each (Y, Cb, Cr) into a frequency domain representation
• Let’s take an example, where we look at the Y component of a block

• For an 8-bit image, each entry in the original block falls into the range [0,255]; this is shifted so that the midpoint of the range, 128, maps to zero, giving the new range [-128,127]
• This results in a new matrix g
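The level shift is a one-liner; a minimal sketch:

```python
def level_shift(block):
    """Shift 8-bit samples from [0, 255] to [-128, 127] before the DCT."""
    return [[p - 128 for p in row] for row in block]
```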



Example using DCT
• The next step is to take the two-dimensional DCT of g
• The DCT transforms the 8x8 block g to a linear combination of these 64 patterns.
• The patterns are referred to as the two-dimensional DCT basis functions and the output values
are referred to as transform coefficients.
• The horizontal index is u and the vertical index is v
• The transform is given by (refer to the Wikipedia page on JPEG)

$$G_{u,v} = \frac{1}{4}\,\alpha(u)\,\alpha(v)\sum_{x=0}^{7}\sum_{y=0}^{7} g_{x,y}\,\cos\!\left[\frac{(2x+1)u\pi}{16}\right]\cos\!\left[\frac{(2y+1)v\pi}{16}\right]$$

• where
• u is the horizontal spatial frequency, for the integers 0 ≤ u < 8
• v is the vertical spatial frequency, for the integers 0 ≤ v < 8
• $\alpha(u) = \begin{cases}\frac{1}{\sqrt{2}}, & \text{if } u = 0\\ 1, & \text{otherwise}\end{cases}$ is a normalizing scale factor to make the transformation orthogonal
• gx,y is the pixel value at co-ordinates (x,y)
• Gu,v is the DCT coefficient at co-ordinates (u,v)
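The formula translates directly into code. This is a deliberately naive sketch for clarity, not the fast factorised DCT that real encoders use:

```python
import math


def dct2(g):
    """2-D DCT-II of an 8x8 block, term-for-term as in the formula above.

    `g` is a list of rows, so g[y][x] is the (level-shifted) pixel at (x, y).
    """
    def alpha(u):
        # Normalizing scale factor that makes the transform orthogonal.
        return 1.0 / math.sqrt(2.0) if u == 0 else 1.0

    G = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(g[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for x in range(8) for y in range(8))
            G[u][v] = 0.25 * alpha(u) * alpha(v) * s
    return G
```

As a sanity check, a constant block of value 1 produces only a DC coefficient (G[0][0] = 8); every AC coefficient is zero, since a constant block has no spatial variation.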



Example using DCT
• If we perform this transformation on our matrix, we get the following (rounded to two decimal places)
• Note the top-left corner entry with the rather large magnitude. This is the DC coefficient (also called the constant component), which defines the basic hue for the entire block.
• The remaining 63 coefficients are the AC coefficients
• An advantage of the DCT is it tends to aggregate most of
the signal in one corner of the result.
• The quantization step to follow accentuates this effect while
simultaneously reducing the overall size of the DCT
coefficients, resulting in a signal that is easy to compress
efficiently in the entropy stage.



Quantization
• The human eye is good at seeing small differences in brightness over a relatively large area, but not so good at distinguishing the exact strength of a high spatial frequency brightness variation
• This allows one to greatly reduce the amount of information in the high-frequency
components.
• This is done by simply dividing each component in the frequency domain by a constant
for that component and then rounding to the nearest integer.
• This rounding operation is the only lossy operation in the whole process (other than in
chroma subsampling) if the DCT computation is performed with sufficiently high
precision.
• As a result, many of the higher-frequency components are typically rounded to zero, and many of the rest become small positive or negative numbers, which take many fewer bits to represent.



Quantization
• The elements in the quantization matrix control the
compression ratio, with larger values producing greater
compression
• A typical quantization matrix Q (for a quality of 50% as
specified in the original JPEG standard) is shown
• The quantized DCT coefficients are computed with

$$B_{j,k} = \operatorname{round}\!\left(\frac{G_{j,k}}{Q_{j,k}}\right)\quad\text{for } j = 0,1,2,\dots,7 \text{ and } k = 0,1,2,\dots,7$$

• where
• G is the unquantized DCT coefficients
• Q is the quantization matrix shown and
• B is the quantized matrix of coefficients
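The divide-and-round step in code, as a sketch (note that Python's built-in round uses banker's rounding at exact .5 ties, whereas codecs define their own tie-breaking rules):

```python
def quantize(G, Q):
    """Quantize DCT coefficients elementwise: B[j][k] = round(G[j][k] / Q[j][k])."""
    return [[int(round(g / q)) for g, q in zip(g_row, q_row)]
            for g_row, q_row in zip(G, Q)]
```

With the quality-50 matrix's top-left entries (16, 11, 10), the DC coefficient -415.38 from the widely cited worked example on the JPEG Wikipedia page quantizes to -26, the DC value that appears again in the entropy-coding example later.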



Quantization
• Our example gives us a matrix B of quantized
DCT coefficients
• Notice that most of the higher-frequency
elements of the sub-block (i.e., those with an
x or y spatial frequency greater than 4) are
quantized into zero values.
• Reminder on “higher spatial frequency
elements”
• The more you move down or to the right,
the higher the spatial frequency
associated with the coefficient.
• What does that mean? Well, look at the basis functions
• The further right or down you go in the matrix, the more oscillations the corresponding basis function packs into the 8x8 space, so the higher its spatial frequency



Simple compression examples
• A simple interactive JPEG compression can be seen
at this link
• JPEG Viewer (omarshehata.github.io)
• We can see
• Test letters for compression
• Corresponding matrix B
• Basis functions used at current compression level
• Decompressed image
• You will see that
• stronger compression ->
• fewer non-zero entries in the matrix->
• smaller set of data to be conveyed.
• Play with it!



Entropy coding
• Entropy coding is a special form of lossless data compression
• It involves arranging the image components in a “zigzag” order, using run-length encoding (RLE) to group similar frequencies together and insert length-coded runs of zeros, and then applying Huffman coding to what is left.
• JPEG also allows, but doesn’t require, the superior arithmetic coding to be used instead of Huffman coding.
• However, it is slower and more computationally expensive to encode and decode, and it was covered by patents.
• Arithmetic coding reputedly makes files about 5-7% smaller than Huffman coding
• This is an example of where one might make a tradeoff of greater
processing overhead for smaller bandwidth.
• Let’s look at entropy coding with an example of the previously quantized
matrix in the next slide.



Entropy Coding
• The quantized DC coefficient of the previous block is used to predict the quantized DC coefficient of the current block.
• Only the difference between the two is encoded, rather than the actual value
• The encoding of the 63 quantized AC coefficients does not use such prediction differencing
• Let’s look at the previous matrix expressed using a zigzag representation on the next slide (for illustrative purposes)
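The DC prediction can be sketched as follows (per the JPEG convention, the predictor for the very first block is taken to be 0):

```python
def dc_differences(dc_values):
    """Differentially encode DC coefficients: each block stores its DC
    value minus the previous block's DC value (predictor starts at 0)."""
    diffs, prev = [], 0
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs
```

Because neighbouring blocks tend to have similar average brightness, the differences are usually small and cheap to encode; summing the differences back up recovers the original DC values exactly.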



Entropy coding
• If the i-th block is represented by Bi and positions within each block are represented by (p,q), where p = 0, 1,…, 7 and q = 0, 1,…, 7, then any coefficient in the DCT image can be represented as Bi(p,q).
• So, in that scheme, the order of encoding coefficients (for the i-th block) is
• Bi(0,0), Bi(0,1), Bi(1,0), Bi(2,0), Bi(1,1), Bi(0,2), Bi(0,3), Bi(1,2) and so on

• The long strings of zeroes produced by this ordering are easy to compress
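The zigzag order can be generated rather than hard-coded; a sketch, with positions given as (row, column) pairs:

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order.

    Cells on the same anti-diagonal share row + col; alternate diagonals
    are walked in opposite directions.
    """
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
```

Its first entries reproduce the order listed above: (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), (1,2), …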



Entropy Coding – Sequential vs Progressive
• The encoding mode shown so far is called baseline sequential encoding
• JPEG also supports progressive encoding
• Sequential encoding encodes a single block in a zigzag manner as shown
• Progressive encoding takes similarly-positioned batches of coefficients of all blocks in one go (called a scan)
followed by the next batch of coefficients of all blocks.
• It has been found that progressive JPEG encoding usually gives slightly better compression than baseline sequential JPEG, because each “scan” (or “pass”) contains similarly positioned coefficients and can therefore use a Huffman table tailored to those frequencies, though the difference is not too large.
• An intuition might be that different blocks might have similar “textures” which means similar expressions of spatial
frequencies – this means fewer possibilities for a specific spatial frequency, which means it is more compressible.



Entropy encoding – Huffman encoding
• In order to encode the above-generated coefficient pattern, JPEG uses Huffman encoding
• The JPEG standard provides general-purpose Huffman tables
• However, encoders may also choose to generate Huffman tables optimized for the actual frequency distributions in
images being encoded
• This can be useful in applications where there is a very limited anticipated variety of images with a known
spatial frequency distribution.
• (Note from Patrick - I once crashed a development camera system using a chart of high frequency elements)



Entropy encoding - RLE
• The process of encoding the zig-zag quantized data begins with a run-length encoding explained below, where:
• x is the non-zero, quantized AC coefficient
• RUNLENGTH is the number of zeros that came before this non-zero AC coefficient
• SIZE is the number of bits required to represent x
• AMPLITUDE is the bit-representation of x
• RLE works by examining each non-zero AC coefficient x and determining how many zeroes came since the previous non-zero coefficient.
• With this information, two symbols are created: Symbol 1 = (RUNLENGTH, SIZE) and Symbol 2 = (AMPLITUDE)
• Both RUNLENGTH and SIZE are held in the same byte, meaning that each only contains four bits of information. The
higher bits deal with the number of zeros, while the lower bits denote the number of bits necessary to encode the value of x
• This implies that Symbol 1 can only record up to 15 zeroes preceding a non-zero AC coefficient. However, JPEG defines two special Huffman code words
• One is for ending the sequence prematurely when the remaining coefficients are zero (called “End-of-Block” or “EOB”)
• The other is for when the run of zeroes goes beyond 15 before reaching a non-zero AC coefficient. In such a case, where 16 zeros are encountered before a given non-zero AC coefficient, Symbol 1 is encoded “specially” as (15,0)(0)
• The overall process continues until “EOB” – denoted by (0,0) is reached
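The symbol generation described above can be sketched as follows. This is a simplified illustration: AMPLITUDE is shown as the signed integer itself rather than JPEG's variable-length bit pattern, and EOB amplitude is shown as None:

```python
def rle_symbols(ac_coefficients):
    """Convert zigzag-ordered AC coefficients into ((RUNLENGTH, SIZE),
    AMPLITUDE) symbols, with ZRL = (15, 0) for runs of 16 zeros and
    EOB = (0, 0) when the block ends in a run of zeros."""
    symbols, run = [], 0
    for x in ac_coefficients:
        if x == 0:
            run += 1
            continue
        while run > 15:                 # 16 zeros -> special ZRL symbol
            symbols.append(((15, 0), 0))
            run -= 16
        size = abs(x).bit_length()      # bits needed to represent |x|
        symbols.append(((run, size), x))
        run = 0
    if run:                             # trailing zeros -> End-of-Block
        symbols.append(((0, 0), None))
    return symbols
```

For instance, the sequence 5, 0, 0, -1, 0, 0, 0 yields (0,3)(5); (2,1)(-1); (0,0) — the -1 is preceded by a run of two zeros and needs one bit, and the trailing zeros collapse into EOB.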
• Let’s look at the example we had and step through the encoding



Entropy encoding of our example
• Before we look at the example, a reminder that the first value in the matrix, -26, the DC coefficient, is not encoded in the same way.
• The overall process continues until EOB – denoted by (0,0) – is reached
• (0, 2)(-3); (1, 2)(-3); (0, 2)(-2); (0, 3)(-6); (0, 2)(2); (0, 3)(-4); (0, 1)(1); (0, 2)(-3); (0, 1)(1); (0, 1)(1); (0, 3)(5); (0, 1)(1); (0, 2)(2); (0, 1)(-1); (0, 1)(1); (0, 1)(-1); (0, 2)(2); (5, 1)(-1); (0, 1)(-1); (0, 0)
• From here, frequency calculations are made based on occurrences of the
coefficients
• In our example block, most of the quantized coefficients are small numbers that are not preceded immediately by a zero coefficient, and these more frequent cases will be represented by shorter code words.



Decoding
• Decoding essentially runs these processes in reverse; because quantization and chroma downsampling discard information, the reconstruction is lossy.

