Lecture 3
1 Overview
This handout covers the basics of image and video compression as follows:
2. The Need for Compression
3. Towards Compression
4. Run Length Encoding
5. Signal Transforms
6. Video Compression
2 The Need for Compression
[Figure: colour subsampling patterns 4:2:2, 4:1:1 and 4:2:0.]
In one frame there are 720 × 576 luminance pixels plus the colour samples, roughly 830,000 samples in all. As each sample is represented by one byte, that is roughly 830,000 bytes per frame. At 25 frames/sec this means a bandwidth of about 20 MB/sec.
The available bandwidth for a single Digital television channel is at best 6 Mbits/sec. This is about 30 times smaller than the 20 MB/sec needed. A DVD can store at most 4 GB, so how does one fit 2 hours of movie on a DVD?
Your digital mobile phone can handle maybe 1 Mbit/sec absolute tops. That is about 180 times smaller than required for video.
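As a quick sanity check on these numbers, here is the back-of-the-envelope arithmetic in Python (a sketch: the 720 × 576 frame size and 2 bytes per pixel for 4:2:2 colour are my assumptions, chosen to match the 20 MB/sec figure above).

```python
# Rough arithmetic behind the numbers above (assumed: 720x576 SD frames,
# 4:2:2 colour sampling => ~2 bytes per pixel on average, 25 frames/sec).
width, height = 720, 576
bytes_per_pixel = 2                 # 1 byte luma + 1 byte (shared) chroma
fps = 25

bytes_per_frame = width * height * bytes_per_pixel   # ~830,000 bytes
raw_rate_MBps = bytes_per_frame * fps / 1e6           # ~20.7 MB/sec
raw_rate_Mbps = raw_rate_MBps * 8                     # ~166 Mbit/sec

print(f"raw rate: {raw_rate_MBps:.1f} MB/s = {raw_rate_Mbps:.0f} Mbit/s")
print(f"compression needed for a 6 Mbit/s TV channel: {raw_rate_Mbps / 6:.0f}x")
print(f"compression needed for a 1 Mbit/s mobile link: {raw_rate_Mbps / 1:.0f}x")
```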
Imagine you are a film and TV archive (like www.ina.fr or the BBC or rte). You need to keep
a record of 24 hours of programming on 100s of channels daily for up to 50 years (in the case
of the BBC). Hmm... there is not enough space in a town to stack up the CDs needed to store
that!
So a mechanism is needed to represent images with fewer bytes than the raw data.
3 Towards Compression
I don't really need 720 × 576 pixels for my 1 inch mobile screen do I? So I can keep only every 4th pixel and every 4th line (subsampling) for instance, and yield a much smaller picture instead. So now I can show the same picture for 1/16 the storage. Not good enough. Besides, such small pictures look really crap on a TV set.
[Figure: the same frame at full resolution and at the subsampled (CIF) resolution.]
What if I start to think about mathematical models for pictures . . . ? Then I can send/store the parameters of my model instead of the actual pictures, and if my model is simple, I can store fewer parameters than pixels and get some compression. Hmmm. But pictures look pretty complicated. In fact most interesting pictures tend to be different from other pictures. Otherwise why look?
It turns out that you can make some generic statements about images and image sequences.
1. In small, local regions, pixel intensity and colour tend to be the same or at least slowly varying.
2. Consecutive frames in an image sequence tend to look very much like the frames nearby.
4 Run Length Encoding
Consider that you want to transmit a fax as an image. There are just 2 colours: 0 = black and 1 = white. Let's say your image is as below (the letter H in a binary image).
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 1 1 0 1 1 0 0
0 1 1 0 1 1 0 0
0 1 1 1 1 1 0 0
0 1 1 1 1 1 0 0
0 1 1 0 1 1 0 0
0 0 0 0 0 0 0 0
Instead of sending every single pixel, since there tend to be long lengths of consecutive repeated pixels (i.e. long runs) we could send a pixel value (a 0 for instance) followed by the number of times it is repeated.
So instead of sending or storing 0 0 0 0 0 0 0 0 for instance, you would store [0 8], the first number being the colour, and the second being the number of times that colour occurred consecutively. Instead of storing 8 bytes, we have stored just 2. We have encoded some raw data of 8 zeros as just 2 bytes. We have achieved a compression factor of 4!
In typical RLE schemes, you do not account for all possible runs. Instead you only allow for
runs of length say 0 to 32 for instance. Then a run of length 64 would need to be encoded as
2 runs of length 32.
Let's say for our RLE scheme we allow a maximum run length of 8, and the data is either 0 or 1. The image example then can be represented by . . .
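To make the scheme concrete, here is a minimal run-length encoder and decoder in Python (an illustrative sketch, not the format of any real fax or image standard), applied to rows of the binary H image above.

```python
def rle_encode(row, max_run=8):
    """Encode a sequence of symbols as (symbol, run_length) pairs.
    Runs longer than max_run are split into several runs."""
    runs = []
    i = 0
    while i < len(row):
        j = i
        while j < len(row) and row[j] == row[i] and (j - i) < max_run:
            j += 1
        runs.append((row[i], j - i))
        i = j
    return runs

def rle_decode(runs):
    out = []
    for symbol, length in runs:
        out.extend([symbol] * length)
    return out

print(rle_encode([0] * 8))                  # [(0, 8)]: 8 pixels stored as one pair
row = [0, 1, 1, 0, 1, 1, 0, 0]              # third row of the H image
print(rle_encode(row))                      # [(0, 1), (1, 2), (0, 1), (1, 2), (0, 2)]
assert rle_decode(rle_encode(row)) == row   # the process is exactly reversible
```

Note how the all-zero rows compress to a single pair while the rows containing the H need five pairs each; RLE only wins when the data really does contain long runs.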
But what about a real/grayscale image? Hmm. RLE might get inefficient if the data is not mostly flat! Consider this small block of grayscale values, for instance:
10 32 22 12
10 20 20 10
10 30 20 10
8 31 20 15
5 Signal Transforms
What if it were possible to change the image in some reversible process, so that we created a result
that was easier to compress? In other words we take our data and transform it in some clever way
to make RLE work better.
This is related to another idea.
Suppose I had a photoalbum/dictionary of all the possible images in the world ever made in the
past and ever will be possible in the future. And suppose I gave you a copy of this dictionary in
which each image was assigned a number.
Then instead of having to send you the raw data, I would just send you the number of the image
in the dictionary, and you could look it up and you’d have the picture!
This dictionary would be very large since pictures come in many flavours. To make a smaller
dictionary, you can instead choose images which when added together make up the picture you
want to send or store.
So now to send a picture, the transmitting end has to work out which set of images could be
added together to give the picture. Then the transmitter sends the indexes of those images to the
receiver. The receiver then looks up the pictures and then adds them together to give the received
image.
About 200 years ago[1], a guy called Fourier spotted that you could actually do this with any signal. He was working on 1D signals but the same applies to 2D ones.
[1] No electricity, no computers, no cinema, no television, no hot baths, no baths, no showers. Lice in your hair all the time, no soap, no nylon, no jeans, no flushing toilets, no sewage system ...
The brilliant discovery of Fourier, was that any 1D signal can be represented by a weighted sum of
sines and cosines. So to make a triangle wave for instance, all you need to do is to add a bunch of
sines and cosines together of different frequencies and different amplitudes.
[Figure: sine waves of different frequencies and amplitudes plotted against time (seconds), together with the signal obtained by summing them; the resulting amplitude-versus-frequency summary has components of size 2/π, −1/π, . . . at frequencies 1, 2, . . .]
And he came up with a mathematical formula that says which frequencies and which amplitudes
were needed to synthesise a particular signal.
Since we all know what sines and cosines look like, we can summarise this signal decomposition with a graph of Amplitude versus Frequency. That graph will tell us how much of each frequency is present in the signal.
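To see the recipe in action, here is a small numpy sketch that sums sine waves with amplitudes 2/π, −1/π, 2/3π, . . . at frequencies 1, 2, 3, . . . (the values that appear in the figure above); the number of terms and the time grid are my choices.

```python
import numpy as np

t = np.linspace(0, 5, 1000)       # 5 "seconds" of signal

terms = 8
signal = np.zeros_like(t)
spectrum = []                     # the Amplitude-versus-Frequency summary
for k in range(1, terms + 1):
    amplitude = (2 / (np.pi * k)) * (-1) ** (k + 1)   # 2/pi, -1/pi, 2/(3 pi), ...
    signal += amplitude * np.sin(2 * np.pi * k * t)   # add one sine wave
    spectrum.append((k, amplitude))

# The first two entries are the 2/pi and -1/pi seen in the figure.
print(spectrum[:2])               # [(1, 0.6366...), (2, -0.3183...)]
```

Adding more terms makes the synthesised waveform sharper; the list of (frequency, amplitude) pairs is the whole description of the signal.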
With 2D signals things are a bit trickier. 2D sines and cosines look a bit like a wave in a wave tank,
or a wave in your bath, or a wave in the sea. Except the wave is a wave in intensity or brightness.
The equation for working out how much of each wave you need to make a picture is also a bit tricky.
Furthermore, each wave is represented by a complex number. Urgh?
[Figure: a 2D cosine wave in image intensity, shown as an image and as a surface. The wave is directed at an angle off the horizontal, with a certain frequency in cycles per pel in that direction and a certain phase lag.]
Instead electrical/signal processing engineers have come up with a simpler[4] Transform that uses only Cosine waves. This transform, known as the Discrete Cosine Transform, results in only real numbers. It is the basis of JPEG.
[4] Not really.
JPEG is based on Transforming 8 × 8 blocks of pixels using the 2D DCT. For a signal of 8 samples, the 8 possible DCT basis functions (the dictionary) are as below.
[Figure: the eight 8-point DCT basis functions, plotted as rows 1 to 4 (left panel) and rows 5 to 8 (right panel), each offset vertically for clarity.]
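The 8-point DCT "dictionary" can be written down directly. Here is a sketch that builds the orthonormal DCT-II basis with numpy; its eight rows are the waveforms plotted above, and because the basis is orthonormal the transform is exactly reversible.

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis function."""
    C = np.zeros((N, N))
    for k in range(N):
        for n in range(N):
            C[k, n] = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
        C[k, :] *= np.sqrt(1.0 / N) if k == 0 else np.sqrt(2.0 / N)
    return C

C = dct_matrix(8)
x = np.arange(8, dtype=float)     # any 8-sample signal will do
X = C @ x                         # forward DCT: weight of each basis function
x_back = C.T @ X                  # inverse DCT (C is orthonormal, so C^-1 = C^T)
assert np.allclose(x, x_back)
```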
The 64 2D DCT basis functions and the 2D DCT of an 8 × 8 block in Lenna are shown below.
[Figure: the 64 2D DCT basis functions (left) and the 2D DCT of an 8 × 8 block from Lenna (right).]
Now we can see that the effect of Transforming a block of pixels is to concentrate its energy into just a few coefficients: the block is much flatter in the DCT space, with most coefficients close to zero. This means that we have less information to transmit. Here is what happens if we take every 8 × 8 block in Lenna and transform it with the 2D DCT.
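Here is a sketch of what "transform every 8 × 8 block" means in practice (a smooth synthetic image stands in for Lenna; the point is only the mechanics and the energy compaction count).

```python
import numpy as np
from scipy.fft import dctn        # n-dimensional DCT-II

# A smooth synthetic test image; real images are locally smooth in the same way.
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
image = 128 + 50 * np.sin(xx / 20.0) + 30 * np.cos(yy / 15.0)

coeffs = np.zeros_like(image)
for r in range(0, h, 8):
    for c in range(0, w, 8):
        block = image[r:r+8, c:c+8]
        coeffs[r:r+8, c:c+8] = dctn(block, norm='ortho')   # 2D DCT of one block

# Energy compaction: only a handful of coefficients per block are significant,
# so most of the transformed image could be coded with very few bits.
significant = np.mean(np.abs(coeffs) > 1.0)
print(f"coefficients with magnitude > 1: {significant:.1%}")
```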
6 Video Compression
All the best codecs for media are based on transforming the data in some way. JPEG2000 is
based on a new kind of transform, the Wavelet Transform, discovered only in the late 1980s.
Compression of audio .mp3 is based on 1D DCT. MPEG (Motion Picture Experts Group) is
used for compression of video for DVD or DTV [MPEG1,2,4]. Ireland was a major player in
establishing the MPEG 4 standard.
Intel Indeo, Apple Quicktime, Divx are all based on MPEGGy ideas.
MPEG is based again on the 8 point DCT just like JPEG except....
In video most consecutive pictures look the same. So if I knew what one picture looked
like, then in theory I could build all the others by slightly adjusting that one. This is called
prediction.
But things move around in video, so we have to estimate that motion to work out how to shift
the pixels around in order to create the next image.
To understand how prediction can help with video compression, consider the top row of figure 2, which shows a sequence of frames from the Suzie sequence. It is QCIF (176 × 144) resolution at a frame rate of 30 frames/sec.
We have already seen that Transform coding of images yields significant levels of compression, e.g. JPEG. Therefore a first step at compressing a sequence of data is to consider each picture separately, using the 2D DCT of 8 × 8 blocks. The DCT coefficients for each frame of Suzie are shown in the second row of figure 2. The use of the DCT on the raw image data yields a compression of the original 8 bits/pel data to about 0.8 bits/pel on each frame. Note that, for demonstration purposes, the DCT coefficients have NOT been quantised using the standard JPEG Quantisation matrix.
We know that most images in a sequence are mostly the same as the frames nearby except with different
object locations. Thus we can propose that the image sequence obeys a simple predictive model (discussed
in previous lectures) as follows:
I_n(x) = I_{n-1}(x + d_{n,n-1}(x)) + e(x)     (1)

where e(x) is some small prediction error that is due to a combination of noise and "model mismatch". Thus we can measure the prediction error at each pixel x in a frame as

DFD(x) = I_n(x) - I_{n-1}(x + d_{n,n-1}(x))     (2)

This is the motion compensated prediction error, sometimes referred to as the Displaced Frame Difference (DFD). The only model parameter required to be estimated is the motion vector d_{n,n-1}(x). Assume for the moment that we use some process to estimate these vectors. We will look at that later.
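To make equations (1) and (2) concrete, here is a tiny sketch that computes the DFD of one block for a given integer motion vector (the frame data and the vector values are made up for the example).

```python
import numpy as np

def dfd_block(frame_n, frame_prev, top, left, d, N=8):
    """DFD (equation 2) for the N x N block at (top, left) in frame_n,
    predicted from frame_prev displaced by the integer vector d = (dy, dx)."""
    dy, dx = d
    target     = frame_n[top:top+N, left:left+N]
    prediction = frame_prev[top+dy:top+dy+N, left+dx:left+dx+N]
    return target - prediction

rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(64, 64)).astype(float)
curr = np.roll(prev, shift=(-2, -3), axis=(0, 1))   # each pixel's match in prev lies 2 down, 3 right

err = dfd_block(curr, prev, top=16, left=16, d=(2, 3))
print(np.abs(err).mean())   # 0.0: the vector matches the true motion exactly
```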
Figure 1 illustrates how motion compensation can be applied to predict any frame from any previous
frame using motion estimation. The figure shows block based motion vectors being used to match every block in frame n with the block that is most similar in frame n−1. The difference between the corresponding blocks is then the DFD that has to be coded.

[Figure 1: Block based motion compensation. A motion vector maps a block in frame n onto the best matching block in frame n−1, following the movement of the object between the two frames.]
In MPEG, the situation shown in figure 1 (where frame n is predicted by a motion compensated version of frame n−1) is called Forward Prediction. The block that is to be constructed (i.e. the block in frame n) is called the Target Block. The frame that is supplying the prediction is called the Reference Picture, and the resulting data used for the motion compensation (i.e. the displaced block in frame n−1) is the Prediction Block.
The fourth row of Figure 2 shows the prediction error of each frame of the Suzie sequence, starting from the first frame as a reference. A three level Block Matcher was used, with a threshold for motion detection applied at the highest resolution level. Each DFD frame is the difference between frame n and a motion compensated prediction built from the original previous frame, n−1.
Again, we can compress this sequence of 'transformed' images (including the first I frame) using the DCT of 8 × 8 blocks. Now the amount of data needed is about 0.4 bits/pel. Substantial compression has been achieved over attempting to compress each image separately. Of course, you will have deduced that this was going to be the case because there is much less information content in the DFD frames than in the original picture data.
Figure 2: Frames 50-53 of the Suzie sequence processed by various means. From Top to Bottom row: Original Frames; DCT of Top Row; Non-motion compensated DFD; Motion Compensated DFD with backward prediction; DCT of previous row.

To confirm that it is indeed motion compensated prediction that is contributing most of the benefit, the 3rd row of figure 2 shows the non-motion compensated frame difference (FD), I_n(x) − I_{n−1}(x), between the frames of Suzie. There is substantially more energy in these FD frames than in the DFD frames, hence the higher bit rate.
A closer look at the DFD frame sequence in row 4 of Figure 2 shows that in frames 52 and 53 (in particular) there are some areas that show very high DFD. This is explained by observing the behaviour of Suzie in the top row. In those frames her head moves such that she uncovers or occludes some area of the background. The phone handset also uncovers a portion of her swinging hair. In the situation of uncovering, the data in some parts of frame n simply does not exist in frame n−1. Thus the DFD must be high. However, the data that is uncovered in frame n typically is also exposed in frame n+1. Therefore, if we could look into the next frame as well as the previous frame we probably would be able to find a good match for any block, whether it is occluded or uncovered.
Using such Bi-directional prediction gives much better image fidelity. This idea is used in MPEG-2. It
uses both backward prediction for some frames (P frames) and bidirectional prediction for others (B frames).
The sequencing is shown below. Typically MPEG2 encodes images in the following order IBBPBBPBBPBBPI. . . .
I-frames (Intra-coded frames) are encoded just like JPEG i.e. without any motion compensation. This allows
the codec to cope with varying image content...think what would happen if you tried to predict every image
in a movie from the first frame. It's not going to work, is it? So I-frames are slipped in every 12 frames or so
to give a new reference frame for prediction of the next 12 frames.
6.1 Block Matching
The most popular and to some extent the most robust technique to date for motion estimation is Block Matching (BM). It makes the following assumptions.
1. Constant translational motion over small blocks (say 16 × 16 or 8 × 8) in the image. This is the same as saying that there is a minimum object size that is larger than the chosen block size.
2. There is a maximum (pre-determined) range for the horizontal and vertical components of the motion
vector at each pixel site. This is the same as assuming a maximum velocity for the objects in the
sequence. This restricts the range of vectors to be considered and thus reduces the cost of the algorithm.
The image in frame n is divided into blocks, usually of the same size, N × N. Each block is considered in turn and a motion vector is assigned to each. The motion vector is chosen by matching the block in frame n with a set of blocks of the same size at locations defined by some search pattern in the previous frame, n−1.
Define the Mean Absolute Error of the DFD between the block in the current frame and that in the previous frame as

MAE(d) = (1/N²) Σ_{x ∈ Block} | I_n(x) − I_{n−1}(x + d) |     (4)
We can use Mean Squared Error (MSE) as well, but MAE is more robust to noise.
The block matching algorithm then proceeds as follows at each image block.
1. Pre-determine a set of candidate vectors to be tested as the motion vector for the current block.
2. For each candidate vector, calculate the MAE.
3. Choose the motion vector for the block as the candidate which yields the minimum MAE.
The set of candidate vectors in effect yields a set of candidate motion compensated blocks in the previous frame
for evaluation. The separation of the candidate blocks in the search space determines the smallest vector
that can be estimated. For integer accurate motion estimation the position of each block coincides with the
image grid. For fractional accuracy, blocks need to be extracted between locations on the image grid. This
requires some interpolation. In most cases Bilinear interpolation is sufficient.
Figure 4 shows the search space used in a full motion search technique. The current block is compared
to every block of the same size in an area of size (N + 2w) × (N + 2w). The search[5] space is chosen by deciding on the maximum displacement allowed: in Figure 4 the maximum displacement estimated is ±w pels for both horizontal and vertical components.
The technique arises from a direct solution of equation 1. The BM solution can be seen to minimize
the Mean Absolute DFD (or Mean Square DFD) with respect to the displacement d, over the block. The chosen displacement therefore satisfies the model equation 1 in some ‘average’ sense.
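Here is a minimal full-search block matcher following equation 4: every integer displacement within ±w is tried and the candidate with the smallest MAE wins. This is an illustrative sketch; the function and variable names and the test data are my own.

```python
import numpy as np

def full_search(curr, prev, top, left, N=8, w=4):
    """Return the integer motion vector (dy, dx) within +/-w that minimises
    the Mean Absolute Error (equation 4) for the N x N block at (top, left)."""
    target = curr[top:top+N, left:left+N]
    best_vec, best_mae = (0, 0), np.inf
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + N > prev.shape[0] or c + N > prev.shape[1]:
                continue                        # candidate falls off the frame
            mae = np.abs(target - prev[r:r+N, c:c+N]).mean()
            if mae < best_mae:
                best_mae, best_vec = mae, (dy, dx)
    return best_vec, best_mae

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, size=(64, 64)).astype(float)
curr = np.roll(prev, shift=(-2, -3), axis=(0, 1))   # true motion is (2, 3)

print(full_search(curr, prev, top=24, left=24))     # expect ((2, 3), 0.0)
```

With w = 4 this evaluates all (2w + 1)² = 81 candidate blocks, which is exactly why the reduced searches described below were invented.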
[5] There are (2w + 1)² searched locations.
Figure 4: Motion estimation via Block Matching. The candidate positions within a search area of size (N + 2w) × (N + 2w) in frame n−1 are searched for a match with the N × N block in frame n. One block to be examined, at a particular candidate displacement, is shaded.
6.1.1 Computation
The Full Motion Search is computationally demanding. Given a maximum expected displacement of ±w pels, there are (2w + 1)² searched blocks (assuming integer displacements only). Each one requires a comparison over all N × N pixels of the block.
6.1.2 Three-step search
The simplest mechanism for reducing the computational burden of Full Search BM is to reduce the number of motion vectors that are evaluated. The Three-step search is a hierarchical search strategy that evaluates first 9 then 8 and finally again 8 motion vectors to refine the motion estimate in three successive steps. At each step the distance between the evaluated blocks is reduced. The next search is centred on the position of the best matching block in the previous search. It can be generalised to more steps to refine the motion estimate further. Figure 5 shows the searched blocks in frame n−1 for this process.
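Here is a sketch of the three-step idea: evaluate a 3 × 3 pattern of candidates around the current best match, then halve the spacing and re-centre. The names are mine, boundary checks are omitted for brevity, and a real implementation would not re-evaluate the centre candidate at each step (giving the 9 + 8 + 8 count described above).

```python
import numpy as np

def mae(curr, prev, top, left, dy, dx, N=8):
    """Mean Absolute Error between the block at (top, left) in curr and the
    block displaced by (dy, dx) in prev."""
    target = curr[top:top+N, left:left+N]
    candidate = prev[top+dy:top+dy+N, left+dx:left+dx+N]
    return np.abs(target - candidate).mean()

def three_step_search(curr, prev, top, left, N=8, step=4):
    """Three-step block matching: test a 3 x 3 grid of displacements around the
    best match so far, halving the grid spacing after each step."""
    best = (0, 0)
    while step >= 1:
        cy, cx = best
        candidates = [(cy + sy * step, cx + sx * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        best = min(candidates, key=lambda d: mae(curr, prev, top, left, *d))
        step //= 2
    return best
```

With an initial step of 4 this reaches displacements up to ±7 using roughly 25 MAE evaluations, instead of the 225 a full search over the same range would need.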
6.1.3 Cross search
The cross search is another variant on the subsampled motion vector visiting strategy. It changes the geometry of the search pattern to a '+' or '×' pattern. Figure 5 shows the searched blocks in frame n−1 for this process. If the best match is found at the centre of the search pattern or at the boundary of the search window, then the search step is reduced.
Figure 5: Illustration of searched locations (central pixel of the searched block is shown) in Three-step BM
(left) and Cross-search BM (right). The search window extent is shown in red for Cross-search. The best
matches at each search level are circled in blue.
6.1.4 Problems
The BM algorithm is noted for being a robust estimator of motion since noise effects tend to be averaged out
over the block operations. However, if there is no textural information in the two blocks compared, then
noise dominates the matching process and causes spurious motion estimates.
This problem can be isolated by comparing the best match found (MAE_min) to the 'no motion' match (MAE_0, the MAE at zero displacement). If these matches are sufficiently different then the motion estimate is accepted, otherwise no motion is assumed. A threshold acts on the ratio of the two; the error measure used is the MAE. If MAE_0 / MAE_min < t, where t is some threshold chosen according to the noise level suspected, then no motion is assumed. This algorithm verifies the validity of the motion estimate once motion is detected.
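A sketch of that validation test (the names and the threshold value are made up; the convention follows the text: if the best match is not sufficiently better than the 'no motion' match, the zero vector is used instead).

```python
def motion_is_valid(mae_no_motion, mae_best, threshold=1.2):
    """Accept the estimated vector only if its match is clearly better than
    simply assuming no motion; otherwise fall back to the zero vector."""
    if mae_best == 0:
        return True                     # perfect match, accept the vector
    return (mae_no_motion / mae_best) >= threshold

print(motion_is_valid(2.1, 2.0))        # False: probably just noise, assume no motion
print(motion_is_valid(10.0, 2.0))       # True: real motion detected
```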
The main disadvantages of Block Matching are the heavy computation involved (although these are byte
wise manipulations) and the motion averaging effect of the blocks. If the blocks chosen are too large then
many differently moving objects may be enclosed by one block and the chosen motion vector is unlikely to
match the motion of any of the objects. The advantages are that it is very simple to implement[6] and it is
robust to noise due to the averaging over the blocks.
There are many more useful motion estimators than this. These others do give you motion better matched
to what is actually going on in the scene. But we will not look at these here.
DVD and DTV both use MPEG-2, and the core is exactly as described here. MPEG-2 became a standard
around 1992, and just 4 years later Digital Television was a reality. This is quite amazing considering that the
advances in research in video compression that made this possible were only really about 15 years old at the
time. Compare that to the 200 years it took Fourier to be really appreciated!
[6] It has been implemented in silicon for video coding applications.
Mobile phone video communications will use MPEG-4 (established around 1998). Unfortunately that is
going through some teething trouble at the moment.
Sadly, the creation of MPEG standards is not as simple as motion estimation, DFD, DCT, quantisation
and transmission. When you actually start to think about putting together codecs the following issues arise.
Compression There are at least three fundamentally different types of multimedia data sources: pictures, audio and
text. Different compression techniques are needed for each data type. Each piece of data has to be
identified with unique codewords for transmission.
Sequencing The compressed data from each source is scanned into a sequence of bits. This sequence is then
packetised for transport. The problem here is to identify each different part of the bitstream uniquely
to the decoder, e.g. header information, DCT coefficient information.
Multiplexing The audio and video data (for instance) has to be decoded at the same time (or approximately the
same time) to create a coherent signal at the receiver. This implies that the transmitted elementary
data streams should be somehow combined so that they arrive at the correct time at the decoder. The
challenge is therefore to allow for identifying the different parts of the multiplexed stream and to insert
information about the timing of each elementary data stream.
Media The compressed and multiplexed data has to be stored on some DSM (Digital Storage Media) and then later (or live) broadcast
to receivers across air or other links. Access to different Media channels (including DSM) is governed
by different constraints and this must somehow be allowed for in the standards description.
Errors Errors in the received bitstream invariably occur. The receiver must cope with errors such that the
system performance is robust to errors or it degrades in some graceful way.
Bandwidth The bandwidth available for the multimedia transmission is limited. The transmission system must
ensure that the bandwidth of the bitstream does not exceed these limits. This problem is called Rate
Control and applies both to the control of the bitrate of the elementary data streams and the multiplexed
stream.
Multiplatform The coded bitstream may need to be decoded on many different types of device with varying proces-
sor speeds and storage resources. It would be interesting if the transmission system could provide a
bitstream which could be decoded to varying extents by different devices. Thus a low capacity device
could receive a lower quality picture than a high capacity device that would receive further features and
higher picture quality. This concept applied to the construction of a suitable bitstream format is called
Scalability.
What we have covered here is the core of the standard used for image and video compression. This just
says how the data itself is compressed. If you open up an .avi or .mpg file, you will not see this data in that
same form. It has to be encoded into symbols, and timing and copyright information embedded at the very
least. This makes the design of codecs a tricky business. But it is certainly true that without standards, there
would be no business in video communications.
Finally, note that none of the compression standards actually describe how you do the things you have
to do. They just describe how to represent bits and package them. So you can use cleverer DCTs or cleverer
motion estimators to get better speed and performance. That is why one manufacturer’s codec could be better
than another’s even though they both create compressed video according to the same standard.