
Unit 6: Multimedia Data Compression:

Lossless Compression Algorithms: Introduction, Run-Length Coding, Variable-Length Coding (VLC), Arithmetic Coding, Lossless Image Compression; Lossy Compression Algorithms: Introduction; Basic Video Compression Techniques: H.261, H.263.
6.1 Lossless Compression Algorithms: Introduction
Introduction
The emergence of multimedia technologies has made digital libraries a reality. Nowadays,
libraries, museums, film studios, and governments are converting more and more data and
archives into digital form. Some of the data (e.g., precious books and paintings) indeed need to
be stored without any loss.

As a start, suppose we want to encode the call numbers of the 120 million or so items in the
Library of Congress (a mere 20 million, if we consider just books). Why don’t we just transmit
each item as a 27-bit number, giving each item a unique binary code (since 2^27 > 120,000,000)? The main problem is that this “great idea” requires too many bits. And in fact there exist
many coding techniques that will effectively reduce the total number of bits needed to represent
the above information. The process involved is generally referred to as compression.

We had a beginning look at compression schemes aimed at audio. There, we had to first consider
the complexity of transforming analog signals to digital ones, whereas here, we shall consider
that we at least start with digital signals. For example, even though we know an image is captured
using analog signals, the file produced by a digital camera is indeed digital. The more general
problem of coding (compressing) a set of any symbols, not just byte values, say, has been studied
for a long time.

Getting back to our Library of Congress problem, it is well known that certain parts of call
numbers appear more frequently than others, so it would be more economic to assign fewer bits
as their codes. This is known as variable-length coding (VLC)—the more frequently appearing
symbols are coded with fewer bits per symbol, and vice versa. As a result, fewer bits are usually
needed to represent the whole collection.

In this chapter we study the basics of information theory and several popular lossless
compression techniques. Figure 7.1 depicts a general data compression scheme, in which
compression is performed by an encoder and decompression is performed by a decoder.

We call the output of the encoder codes or codewords. The intermediate medium could either be
data storage or a communication/computer network. If the compression and decompression
processes induce no information loss, the compression scheme is lossless; otherwise, it is lossy.
The next several chapters deal with lossy compression algorithms as they are commonly used for
image, video, and audio compression. Here, we concentrate on lossless compression.

If the total number of bits required to represent the data before compression is B0 and the total
number of bits required to represent the data after compression is B1, then we define the
compression ratio as
compression ratio = B0 / B1. (7.1)
In general, we would desire any codec (encoder/decoder scheme) to have a compression ratio
much larger than 1.0. The higher the compression ratio, the better the lossless compression
scheme, as long as it is computationally feasible.

6.2 Run-Length Coding


Instead of assuming a memoryless source, run-length coding (RLC) exploits memory present in
the information source.

It is one of the simplest forms of data compression. The basic idea is that if the information
source we wish to compress has the property that symbols tend to form continuous groups,
instead of coding each symbol in the group individually, we can code one such symbol and the
length of the group. As an example, consider a bilevel image (one with only 1-bit black and
white pixels) with monotone regions, like a fax image.
This information source can be efficiently coded using run-length coding. In fact, since there are
only two symbols, we do not even need to code any symbol at the start of each run. Instead, we
can assume that the starting run is always of a particular color (either black or white) and simply
code the length of each run.
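As a quick illustration, here is a minimal Python sketch of 1D run-length coding for a bilevel scan line, assuming the convention above that the first run is always white (0); the function and variable names are illustrative only.

def rle_encode(row):
    """Encode a list of 0/1 pixels as run lengths, assuming the row starts white (0)."""
    runs = []
    current, length = 0, 0          # assumed starting color is 0 (white)
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)     # close the current run (may be 0 if the row starts black)
            current, length = pixel, 1
    runs.append(length)
    return runs

def rle_decode(runs):
    """Rebuild the pixel row from run lengths, alternating colors starting with 0."""
    row, color = [], 0
    for length in runs:
        row.extend([color] * length)
        color = 1 - color
    return row

row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
assert rle_decode(rle_encode(row)) == row   # runs: [3, 2, 4, 1]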

The above description is the one-dimensional run-length coding algorithm. A two-dimensional
variant of it is generally used to code bilevel images. This algorithm uses the coded run
information in the previous row of the image to code the runs in the current row. A full
description of this algorithm can be found in the standard references.

6.3 Variable-Length Coding (VLC).


Since the entropy indicates the information content in an information source S, it leads to a
family of coding methods commonly known as entropy coding methods. As described earlier,
variable-length coding (VLC) is one of the best-known such methods. Here, we will study the
Shannon–Fano algorithm, Huffman coding, and adaptive Huffman coding.

6.3.1 Shannon–Fano Algorithm


The Shannon–Fano algorithm was independently developed by Shannon at Bell Labs and Robert
Fano at MIT [6]. To illustrate the algorithm, let us suppose the symbols to
be coded are the characters in the word HELLO. The frequency counts of the symbols are:

Symbol: H  E  L  O
Count:  1  1  2  1
The encoding steps of the Shannon–Fano algorithm can be presented in the following top-down
manner:
1. Sort the symbols according to the frequency count of their occurrences.
2. Recursively divide the symbols into two parts, each with approximately the same number of
counts, until all parts contain only one symbol.
A natural way of implementing the above procedure is to build a binary tree. As a convention, let
us assign bit 0 to its left branches and 1 to the right branches. Initially, the symbols are sorted as
L, H, E, O. As the figure shows, the first division yields two parts: L with a count of 2, denoted as L:(2);
and H, E and O with a total count of 3, denoted as H, E, O:(3). The second division yields H:(1)
and E, O:(2).
The last division is E:(1) and O:(1).
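The top-down procedure can be sketched in a few lines of Python. The split rule below (choose the cut that makes the two halves' counts as equal as possible) is one reasonable interpretation of step 2 above; ties can be broken differently, which is why the outcome is not unique.

def shannon_fano(freqs):
    symbols = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(c for _, c in group)
        running, best_i, best_diff = 0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)     # how unbalanced this split is
            if diff < best_diff:
                best_diff, best_i = diff, i
        left, right = group[:best_i], group[best_i:]
        for s, _ in left:
            codes[s] += "0"
        for s, _ in right:
            codes[s] += "1"
        split(left)
        split(right)

    split(symbols)
    return codes

print(shannon_fano({"L": 2, "H": 1, "E": 1, "O": 1}))
# One valid outcome: {'L': '0', 'H': '10', 'E': '110', 'O': '111'} -> 10 bits for HELLO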

The table summarizes the result, showing each symbol, its frequency count, information content,
resulting codeword, and the number of bits needed to encode each symbol in the
word HELLO. The total number of bits used is shown at the bottom.

To revisit the previous discussion on entropy, in this case

η = Σi pi log2(1/pi) = 0.4 × log2(1/0.4) + 3 × 0.2 × log2(1/0.2) ≈ 1.92

This suggests that the minimum average number of bits to code each character in the word
HELLO would be at least 1.92. In this example, the Shannon–Fano algorithm uses an average of
10/5 = 2 bits to code each symbol, which is fairly close to the lower bound of 1.92. Apparently,
the result is satisfactory.
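The bound can be checked directly; a tiny Python sketch, assuming the HELLO probabilities p(L) = 0.4 and p(H) = p(E) = p(O) = 0.2:

from math import log2

probs = [0.4, 0.2, 0.2, 0.2]
entropy = sum(p * log2(1 / p) for p in probs)
print(round(entropy, 2))   # 1.92 bits per symbol, the lower bound mentioned above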

It should be pointed out that the outcome of the Shannon–Fano algorithm is not necessarily
unique. For instance, at the first division in the above example, it would be equally valid to
divide into the two parts L, H:(3) and E, O:(2). This would result in the coding in Fig.

The table shows the codewords are different now. Also, these two sets of codewords may behave
differently when errors are present. Coincidentally, the total number of bits required to encode
the word HELLO remains at 10. The Shannon–Fano algorithm delivers satisfactory coding
results for data compression, but it was soon outperformed and overtaken by the Huffman coding
method.
6.3.2 Huffman Coding
First presented by Huffman in a 1952 paper, this method attracted an overwhelming amount of
research and has been adopted in many important and/or commercial applications, such as fax
machines, JPEG, and MPEG.
In contradistinction to Shannon–Fano, which is top-down, the encoding steps of the Huffman
algorithm are described in the following bottom-up manner. Let us use the same example word,
HELLO. A similar binary coding tree will be used as above, in which the left branches are coded
0 and right branches 1. A simple list data structure is also used.

Algorithm (Huffman Coding).


1. Initialization: put all symbols on the list sorted according to their frequency counts.

2. Repeat until the list has only one symbol left.


(a) From the list, pick the two symbols with the lowest frequency counts. Form a Huffman subtree
that has these two symbols as child nodes and create a parent node for them.
(b) Assign the sum of the children’s frequency counts to the parent and insert it into the list,
such that the order is maintained.
(c) Delete the children from the list.

3. Assign a codeword for each leaf based on the path from the root.

In Fig., new symbols P1, P2, P3 are created to refer to the parent nodes in the Huffman coding
tree. The contents in the list are illustrated below:
After initialization: L H E O
After iteration (a): L P1 H
After iteration (b): L P2
After iteration (c): P3
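A compact Python sketch of the bottom-up procedure is given below; a heap stands in for the sorted list in the algorithm above, and a small prefix-code decoder is included to preview the decoding step discussed later. Tie-breaking differs between implementations, so the exact codewords may differ from the figure while the total bit count stays the same.

import heapq
from collections import Counter

def huffman_codes(freqs):
    # Each heap entry: (count, tie_breaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(c, i, s) for i, (s, c) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)     # two lowest-count subtrees
        c2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, tie, (t1, t2)))
        tie += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"     # degenerate single-symbol alphabet
    walk(heap[0][2])
    return codes

def decode(bits, codes):
    inverse, out, buf = {v: k for k, v in codes.items()}, [], ""
    for b in bits:
        buf += b
        if buf in inverse:                  # prefix property: no ambiguity while reading bits
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

codes = huffman_codes(Counter("HELLO"))
encoded = "".join(codes[ch] for ch in "HELLO")
print(codes, len(encoded))                  # 10 bits in total, 2 bits/symbol on average
assert decode(encoded, codes) == "HELLO"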

For this simple example, the Huffman algorithm apparently generated the same coding result as
one of the Shannon–Fano results shown in Fig., although the results are usually better. The
average number of bits used to code each character is also 2, (i.e., (1+1+2+3+3)/5 = 2).

As another simple example, consider a text string containing a set of characters and their
frequency counts as follows: A:(15), B:(7), C:(6), D:(6) and E:(5). It is easy to show that the
Shannon–Fano algorithm needs a total of 89 bits to encode this string, whereas the Huffman
algorithm needs only 87.

As shown above, if correct probabilities (“prior statistics”) are available and accurate, the
Huffman coding method produces good compression results. Decoding for the Huffman coding
is trivial as long as the statistics and/or coding tree are sent before the data to be compressed (in
the file header, say).

This overhead becomes negligible if the data file is sufficiently large.

The following are important properties of Huffman coding:


• Unique prefix property. No Huffman code is a prefix of any other Huffman code.
For instance, the code 0 assigned to L in Fig. 7.5c is not a prefix of the code 10

for H or 110 for E or 111 for O; nor is the code 10 for H a prefix of the code 110 for E or 111 for
O. It turns out that the unique prefix property is guaranteed by the above Huffman algorithm,
since it always places all input symbols at the leaf nodes of the Huffman tree.
The Huffman code is one of the prefix codes for which the unique prefix property holds. The
code generated by the Shannon–Fano algorithm is another such example. This property is
essential and also makes for an efficient decoder, since it precludes any ambiguity in decoding.

In the above example, if a bit 0 is received, the decoder can immediately produce a symbol L
without waiting for any more bits to be transmitted.

• Optimality. The Huffman code is a minimum-redundancy code, as shown in Huffman’s 1952


paper . It has been proven that the Huffman code is optimal for a given data model (i.e., a given,
accurate, probability distribution):
– The two least frequent symbols will have the same length for their Huffman codes, differing
only at the last bit. This should be obvious from the above algorithm.
– Symbols that occur more frequently will have shorter Huffman codes than symbols that occur
less frequently. Namely, for symbols si and s j , if pi ≥ p j then li ≤ l j , where li is the number of
bits in the codeword for si .
– It has been shown that the average code length l for an information source S is strictly less than
η + 1. Combined with the entropy lower bound l ≥ η, we have η ≤ l < η + 1.
6.4 Arithmetic Coding
Arithmetic coding is a more modern coding method that usually outperforms Huffman coding in
practice. It was initially developed in the late 1970s and 1980s. The initial idea of arithmetic
coding was introduced in Shannon’s 1948 work. Peter Elias developed its first recursive
implementation.

Various modern versions of arithmetic coding have been developed for newer multimedia
standards: for example, Fast Binary Arithmetic Coding in JBIG, JBIG2 and JPEG-2000, and
Context-Adaptive Binary Arithmetic Coding (CABAC) in H.264 and H.265.

Normally (in its non-extended mode), Huffman coding assigns each symbol a codeword that has
an integral bit length. As stated earlier, log2(1/pi) indicates the amount of information contained
in the information source si , which corresponds to the number of bits needed to represent it. For
example, when a particular symbol si has a large probability (close to 1.0), log2 (1/pi ) will be
close to 0, and even assigning only one bit to represent that symbol will be very costly if we have
to transmit that one bit many times.

Although it is possible to group symbols into metasymbols for codeword assignment (as in
extended Huffman coding) to overcome the limitation of integral number of bits per symbol, the
increase in the resultant symbol table required by the Huffman encoder and decoder would be
formidable.
Arithmetic coding can treat the whole message as one unit and achieve a fractional number of bits
for each input symbol. In practice, the input data is usually broken up into chunks to avoid error
propagation. In our presentation below, we will start with a simplistic approach that includes a
terminator symbol. Then we will introduce some improved methods for practical
implementations.

6.4.1 Basic Arithmetic Coding Algorithm


A message is represented by a half-open interval [a, b) where a and b are real numbers between
0 and 1. Initially, the interval is [0, 1). When the message becomes longer, the length of the
interval shortens, and the number of bits needed to represent the interval increases. Suppose the
alphabet is [A, B, C, D, E, F, $], in which $ is a special symbol used to terminate the message,
and the known probability distribution is as shown in Fig. a.
The encoding process is illustrated in Fig. b and c, in which a string of symbols CAEE$ is
encoded. Initially, low = 0, high = 1.0, and range = 1.0. The first symbol is C, Range_low(C) =
0.3, Range_high(C) = 0.5, so after the symbol C, low = 0 + 1.0 × 0.3 = 0.3, high = 0 + 1.0 × 0.5
= 0.5. The new range is now reduced to 0.2.
For clarity of illustration, the ever-shrinking ranges are enlarged in each step (indicated by
dashed lines) in Fig. b. After the second symbol A, low, high, and range are 0.30, 0.34, and 0.04.
The process repeats itself until after the terminating symbol $ is received. By then, low and high
are 0.33184 and 0.33220, respectively.

It is apparent that finally we have range = PC × PA × PE × PE × P$ = 0.2 × 0.2 × 0.3 × 0.3 × 0.1
= 0.00036
The final step in encoding calls for generation of a number that falls within the range [low, high).
This number is referred to as a tag, i.e., a unique identifier for the interval that represents the
sequence of symbols. Although it is trivial to pick such a number in decimal, such as 0.33184,
0.33185, or 0.332 in the above example, it is less obvious how to do it with a binary fractional
number. The following algorithm will ensure that the shortest binary codeword is found if low
and high are the two ends of the range and low < high.

For the above example, low = 0.33184, high = 0.3322. If we assign 1 to the first binary fraction
bit, it would be 0.1 in binary, and its decimal value(code) = value(0.1) = 0.5 > high. Hence, we
assign 0 to the first bit. Since value(0.0) = 0 < low, the while loop continues. Assigning 1 to the
second bit makes a binary code 0.01 and value(0.01)=0.25, which is less than high, so it is
accepted. Since it is still true that value(0.01) < low, the iteration continues. Eventually, the
binary codeword generated is 0.01010101, which is 2^(−2) + 2^(−4) + 2^(−6) + 2^(−8) = 0.33203125. It can
be proven [2] that k = ⌈log2(1/∏i Pi)⌉ is the upper bound on the codeword length. Namely, in the worst case, the shortest
codeword in arithmetic coding will require k bits to encode a sequence of symbols.
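The interval narrowing and the binary tag generation for CAEE$ can be sketched in Python as follows; the probability table is the one assumed in the worked example above, and the greedy bit-selection loop mirrors the procedure just described.

PROBS = {"A": 0.2, "B": 0.1, "C": 0.2, "D": 0.05, "E": 0.3, "F": 0.05, "$": 0.1}

def cumulative(probs):
    """Return {symbol: (range_low, range_high)} on [0, 1)."""
    ranges, acc = {}, 0.0
    for s, p in probs.items():
        ranges[s] = (acc, acc + p)
        acc += p
    return ranges

def encode_interval(message, probs):
    low, high = 0.0, 1.0
    ranges = cumulative(probs)
    for sym in message:
        rng = high - low
        r_low, r_high = ranges[sym]
        low, high = low + rng * r_low, low + rng * r_high
    return low, high

def binary_tag(low, high):
    """Greedily build the shortest binary fraction inside [low, high)."""
    bits, value, weight = [], 0.0, 0.5
    while value < low:
        if value + weight < high:       # keep this bit as 1 only if we stay below high
            bits.append("1")
            value += weight
        else:
            bits.append("0")
        weight /= 2
    return "0." + "".join(bits), value

low, high = encode_interval("CAEE$", PROBS)
print(low, high)                        # about 0.33184 and 0.33220
print(binary_tag(low, high))            # ('0.01010101', 0.33203125)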

6.4.2 Binary Arithmetic Coding


As described, the implementation of arithmetic coding involves continuous generation
(calculation) of new intervals, and checking against delimiters of the intervals. When the number
of symbols is large, this involves many calculations (integer or floating-point), so it can be
slow. As the name suggests, Binary Arithmetic Coding deals with two symbols only, i.e., 0 and
1. Figure illustrates a simple scenario. It is obvious that only one new value inside the tag
interval is generated at each step, i.e., 0.7, 0.49, 0.637, and 0.5929. The decision of which
interval to take (first or second) is also simpler.
The encoder and decoder including the scaling and possible integer implementation work the
same way as for non-binary symbols. Non-binary symbols can be converted to binary for Binary
Arithmetic Coding through binarization.

Many coding schemes can be used for the binarization; we will introduce one of them, the Exp-Golomb code.
Fast Binary Arithmetic Coding (Q-coder, MQ-coder) was developed for
multimedia standards such as JBIG, JBIG2, and JPEG-2000. The more advanced version,
Context-Adaptive Binary Arithmetic Coding (CABAC), is used in H.264 (M-coder) and H.265.

6.4.3 Adaptive Arithmetic Coding


We now know that arithmetic coding can be performed incrementally. Hence, there is no need to
know the probability distribution of all symbols in advance. This makes the codec process
especially adaptive—we can record the current counts of the symbols received so far, and update
the probability distribution after each symbol. The updated probability distribution will be used
for dividing up the interval in the next step.

As in Adaptive Huffman Coding, as long as the encoder and decoder are synchronized (i.e.,
using the same update rules), the adaptive process will work flawlessly. Nevertheless, Adaptive
Arithmetic Coding has a major advantage over Adaptive Huffman Coding: there is now no need
to keep a (potentially) large and dynamic symbol table and constantly update the Adaptive
Huffman tree. Below we outline the procedures for Adaptive Arithmetic Coding, and give an
example to illustrate how it also works for Binary Arithmetic Coding.
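A minimal Python sketch of the adaptive model update is shown below: both sides start from the same counts (an assumed all-ones prior here) and apply the same update rule after each symbol, so their interval subdivisions stay synchronized. Only the model update and interval narrowing are illustrated; tag generation is as before.

def adaptive_encode(message, alphabet):
    counts = {s: 1 for s in alphabet}       # uniform prior; updated as symbols arrive
    low, high = 0.0, 1.0
    for sym in message:
        total = sum(counts.values())
        acc, ranges = 0.0, {}
        for s in alphabet:                  # cumulative ranges from the current counts
            p = counts[s] / total
            ranges[s] = (acc, acc + p)
            acc += p
        rng = high - low
        r_low, r_high = ranges[sym]
        low, high = low + rng * r_low, low + rng * r_high
        counts[sym] += 1                    # the update rule shared with the decoder
    return low, high

print(adaptive_encode("01111", "01"))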
6.5 Lossless Image Compression
One of the most commonly used compression techniques in multimedia data compression is
differential coding. The basis of data reduction in differential coding is the redundancy in
consecutive symbols in a data stream. Recall that we considered lossless differential coding earlier,
when we examined how audio can be dealt with via subtraction from predicted values. Audio is
a signal indexed by one-dimensional time. Here we consider how to apply the lessons learned
from audio to the context of digital image signals that are indexed by two spatial dimensions (x, y).

6.5.1 Differential Coding of Images


Let us consider differential coding in the context of digital images. In a sense, we move from
signals with domain in one dimension to signals indexed by numbers in two dimensions (x,
y)—the rows and columns of an image. Later, we will look at video signals. These are even more
complex, in that they are indexed by space and time (x, y, t).

Because of the continuity of the physical world, the gray-level intensities (or color) of
background and foreground objects in images tend to change relatively slowly across the image
frame. Since we were dealing with signals in the time domain for audio, practitioners generally
refer to images as signals in the spatial domain.

The generally slowly changing nature of imagery spatially produces a high likelihood that
neighboring pixels will have similar intensity values. Given an original image
I(x, y), using a simple difference operator we can define a difference image d(x, y)
as follows:

d(x, y) = I(x, y) − I(x − 1, y)

This is a simple approximation of a partial differential operator ∂/∂x applied to an image defined
in terms of integer values of x and y. Another approach is to use the discrete version of the 2D
Laplacian operator to define a difference image d(x, y) as

d(x, y) = 4I(x, y) − I(x, y − 1) − I(x, y + 1) − I(x + 1, y) − I(x − 1, y)
In both cases, the difference image will have a histogram as in Fig. d, derived from the d(x, y)
partial derivative image in Fig. b for the original image I in Fig. a. Notice that the histogram for
the unsubtracted I itself is much broader, as in Fig. c. It can be shown that image I has larger
entropy than image d, since it has a more even distribution in its intensity values. Consequently,
Huffman coding or some other variable-length coding scheme will produce shorter bit-length
codewords for the difference image. Compression will work better on a difference image.
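The following sketch, assuming NumPy is available, builds a toy slowly varying image, computes the horizontal difference image d(x, y) = I(x, y) − I(x − 1, y), and compares the two entropies; the image is synthetic and purely for illustration.

import numpy as np

def entropy(values):
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A toy "slowly varying" image: a smooth ramp plus a little noise.
rng = np.random.default_rng(0)
I = (np.tile(np.arange(64), (64, 1)) + rng.integers(0, 3, (64, 64))).astype(np.int16)

d = I.copy()
d[:, 1:] = I[:, 1:] - I[:, :-1]             # first column kept as-is (no left neighbor)

print(entropy(I), entropy(d))               # d has a much narrower histogram, hence lower entropy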

6.5.2 Lossless JPEG


Lossless JPEG is a special case of JPEG image compression. It differs drastically from other
JPEG modes in that the algorithm has no lossy steps. Thus we treat it here, ahead of the more
commonly used lossy JPEG methods.

Lossless JPEG is invoked when the user selects a 100% quality factor in an image tool.
Essentially, lossless JPEG is included in the JPEG compression standard simply for
completeness.

The following predictive method is applied on the unprocessed original image (or each color
band of the original color image). It essentially involves two steps: forming a differential
prediction and encoding.

1. A predictor combines the values of up to three neighboring pixels as the predicted value for
the current pixel, indicated by X in Fig. The predictor can use any one of the seven schemes
listed in Table 7.6. If predictor P1 is used, the neighboring intensity value A will be adopted as
the predicted intensity of the current pixel; if predictor P4 is used, the current pixel value is
derived from the three neighboring pixels as A + B − C; and so on.

2. The encoder compares the prediction with the actual pixel value at position X and encodes the
difference using one of the lossless compression techniques we have discussed, such as the
Huffman coding scheme. Since prediction must be based on previously encoded neighbors, the
very first pixel in the image I (0, 0) will have to simply use its own value. The pixels in the first
row always use predictor P1, and those in the first column always use P2. Lossless JPEG usually
yields a relatively low compression ratio, which renders it impractical for most multimedia
applications. An empirical comparison using some 20 images indicates that the compression
ratio for lossless JPEG with any one of the seven predictors ranges from 1.0 to 3.0, with an
average of around 2.0. Predictors 4–7 that consider neighboring nodes in both horizontal and
vertical dimensions offer slightly better compression (approximately 0.2–0.5 higher) than
predictors 1–3.
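A Python sketch of the prediction step is shown below; the seven predictor formulas follow the standard definitions (P1 = A, P2 = B, P3 = C, P4 = A + B − C, P5 = A + (B − C)/2, P6 = B + (A − C)/2, P7 = (A + B)/2), integer division is used as a simplification, and the border handling follows the rules just described. The sample image is made up for illustration.

import numpy as np

PREDICTORS = {
    1: lambda A, B, C: A,
    2: lambda A, B, C: B,
    3: lambda A, B, C: C,
    4: lambda A, B, C: A + B - C,
    5: lambda A, B, C: A + (B - C) // 2,
    6: lambda A, B, C: B + (A - C) // 2,
    7: lambda A, B, C: (A + B) // 2,
}

def lossless_jpeg_residuals(img, predictor=4):
    h, w = img.shape
    res = np.zeros_like(img, dtype=np.int32)
    for y in range(h):
        for x in range(w):
            if x == 0 and y == 0:
                pred = 0                          # the very first pixel is coded by its own value
            elif y == 0:
                pred = img[y, x - 1]              # first row: P1 (left neighbor A)
            elif x == 0:
                pred = img[y - 1, x]              # first column: P2 (above neighbor B)
            else:
                A, B, C = img[y, x - 1], img[y - 1, x], img[y - 1, x - 1]
                pred = PREDICTORS[predictor](int(A), int(B), int(C))
            res[y, x] = int(img[y, x]) - pred     # residual passed on to the entropy coder
    return res

img = np.array([[10, 11, 12], [11, 12, 14], [12, 13, 15]], dtype=np.uint8)
print(lossless_jpeg_residuals(img, predictor=4))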

Table shows a comparison of the compression ratio for several lossless compression techniques
using test images Lena, football, F-18, and flowers. These standard images used for many
purposes in imaging work are shown on the textbook website for this chapter.

This chapter has been devoted to the discussion of lossless compression algorithms. It should be
apparent that their compression ratio is generally limited (with a maximum at about 2–3).
However, many of the multimedia applications we will address in the next several chapters
require a much higher compression ratio. This is accomplished by lossy compression schemes.

6.6 Lossy Compression Algorithms: Introduction

In this chapter, we consider lossy compression methods. Since information loss implies some
tradeoff between error and bitrate, we first consider measures of distortion—e.g., squared error.
Different quantizers are introduced, each of which has a different distortion behavior.

A discussion of transform coding leads into an introduction to the Discrete Cosine Transform
used in JPEG compression and the Karhunen–Loève transform. Another transform scheme,
wavelet-based coding, is then set out. The subject of lossy data compression is dealt with
extensively in the literature.

The mathematical foundation for the development of many lossy data compression algorithms is
the study of stochastic processes. The compression ratio for image data using lossless
compression techniques (e.g., Huffman Coding, Arithmetic Coding, LZW) is low when the
image histogram is relatively flat.

For image compression in multimedia applications, where a higher compression ratio is required,
lossy methods are usually adopted. In lossy compression, the compressed image is usually not
the same as the original image but is meant to form a close approximation to the original image
perceptually. To quantitatively describe how close the approximation is to the original data,
some form of distortion measure is required.

6.6.1 Distortion Measures
A distortion measure is a mathematical quantity that specifies how close an approximation is to
its original, using some distortion criteria. When looking at compressed data, it is natural to think
of the distortion in terms of the numerical difference between the original data and the
reconstructed data.

However, when the data to be compressed is an image, such a measure may not yield the
intended result. For example, if the reconstructed image is the same as the original image except that
it is shifted to the right by one vertical scan line, an average human observer would have a hard
time distinguishing it from the original and would therefore conclude that the distortion is small.

However, when the calculation is carried out numerically, we find a large distortion, because of
the large changes in individual pixels of the reconstructed image. The problem is that we need
a measure of perceptual distortion, not a more naive numerical approach. However, the study of
perceptual distortions is beyond the scope of this book. Of the many numerical distortion
measures that have been defined, we present the three most commonly used in image
compression.
If we are interested in the average pixel difference, the mean square error (MSE) σ² is often
used. It is defined as

σ² = (1/N) Σn (xn − yn)²

where xn, yn, and N are the input data sequence, reconstructed data sequence, and length of the
data sequence, respectively. If we are interested in the size of the error relative to the signal, we
can measure the signal-to-noise ratio (SNR) by taking the ratio of the average square of the
original data sequence to the mean square error (MSE). In decibel units (dB), it is defined as

SNR = 10 log10 (σx² / σd²)

where σx² is the average square value of the original data sequence and σd² is the MSE.
Another commonly used measure for distortion is the peak signal-to-noise ratio (PSNR), which
measures the size of the error relative to the peak value of the signal xpeak. It is given by

PSNR = 10 log10 (xpeak² / σd²)
6.6.2 The Rate-Distortion Theory


Lossy compression always involves a tradeoff between rate and distortion. Rate is the average
number of bits required to represent each source symbol. Within this framework, the tradeoff
between rate and distortion is represented in the form of a rate-distortion function R(D).

Intuitively, for a given source and a given distortion measure, if D is a tolerable amount of
distortion, R(D) specifies the lowest rate at which the source data can be encoded while keeping
the distortion bounded above by D. It is easy to see that when D = 0, we have a lossless
compression of the source. The rate-distortion function is meant to describe a fundamental limit
for the performance of a coding algorithm and so can be
used to evaluate the performance of different algorithms. Figure shows a typical rate-distortion
function. Notice that the minimum possible rate at D = 0, no loss, is the entropy of the source
data. The distortion corresponding to a rate R(D) ≡ 0 is the maximum amount of distortion
incurred when “nothing” is coded.

Finding a closed-form analytic description of the rate-distortion function for a given source is
difficult, if not impossible.

6.7 Basic Video Compression Techniques: H.261, H.263.


Earlier, we studied three major multimedia standards, namely MPEG-1, MPEG-2, and the more recent and
advanced MPEG-4. Apart from MPEG, the International Telecommunication Union –
Telecommunication Standardization Sector (ITU-T) has also evolved standards for multimedia
communications at restricted bit-rates over wireline and wireless channels.
The ITU-T standardization on multimedia first started with H.261, which was developed for
ISDN video conferencing. The next standard H.263 supported Plain Old Telephone Systems
(POTS) conferencing at very low bit-rates (64 Kbits/sec and lower). The most recent and
advanced standard H.264 offers significant coding improvement over its predecessors and
supports mobile video applications.

In this lesson, we are going to study the first two of these standards, i.e., H.261 and H.263. These
standards mostly use the same concepts as those followed in MPEG, so instead of repeating that
information, only the novelties and special features of these standards will be covered.

6.7.1.1 Basic objectives of H.261 standard


The H.261 standard, developed in 1988-90, was a forerunner to MPEG-1 and was designed
for video conferencing applications over ISDN telephone lines.
The baseline ISDN has a bit-rate of 64 Kbits/sec, and at the higher end, ISDN supports bit-rates
that are integral multiples (p) of 64 Kbits/sec.

For this reason, the standard is also referred to as the p x 64 Kbits/sec standard.

In addition to forming a basis for the MPEG-1 and MPEG-2 standards, the H.261 standard
offers two important features:
a) A maximum coding delay of 150 msec. It has been observed that delays exceeding 150 msec do
not give the impression of direct visual feedback in bi-directional video conferencing.

b) Amenability to VLSI implementation, which is important for widespread commercialization

of videophone and teleconferencing equipment.

6.7.1.2 Picture formats and frame-types in H.261


The H.261 standard supports two picture formats:

i) Common Intermediate Format (CIF), having 352 x 288 pixels for the luminance channel (Y)
and 176 x 144 pixels for each of the two chrominance channels U and V. Four temporal rates,
viz., 30, 15, 10, or 7.5 frames/sec, are supported. CIF images are used when p ≥ 6, that is, for video
conferencing applications.

ii) Quarter Common Intermediate Format (QCIF), having 176 x 144 pixels for Y and 88 x
72 pixels each for U and V. QCIF images are normally used for low bit-rate applications like
videophones (typically p = 1). The same four temporal rates are supported by QCIF images as well.

H.261 frames are of two types


• I-frames: These are coded without any reference to previously coded frames.

• P-frames: These are coded using a previous frame as a reference for prediction.

6.7.1.3 H.261 Bit-stream structure


The H.261 bit-stream follows a hierarchical structure having the following layers:
• Picture-layer, that includes start of picture code (PSC), time stamp reference (TR), frame-type
(I or P), followed by Group of Blocks (GOB) data.

• GOB layer that includes a GOB start code, the group number, a group quantization value,
followed by macroblocks (MB) data.

• MB layer, that includes the macroblock address (MBA), macroblock type (MTYPE: intra/inter),
quantizer (MQUANT), motion vector data (MVD), and the coded block pattern (CBP), followed by
the encoded block data.

• Block layer, that includes the zig-zag scanned (run, level) pairs of coefficients, terminated by an
end of block (EOB) marker.

The H.261 bit-stream structure is illustrated in the figure.

It is possible that encoding of some GOBs may have to be skipped and the GOBs considered for
encoding must therefore have a group number, as indicated. A common quantization value may
be used for the entire GOB by specifying the group quantizer value. However, specifying the
MQUANT in the macroblock overrides the group quantization value.
We now explain the major elements of the hierarchical data structure.

Each picture in H.261 is divided into GOBs, as illustrated in the figure (arrangement of GOBs in
(a) a CIF picture and (b) a QCIF picture). Each GOB thus relates to 176 pixels by 48 lines of Y
and 88 pixels by 24 lines each of U and V. Each GOB must therefore comprise 33 macroblocks:
11 horizontally and 3 vertically, as shown in the figure below (composition of a GOB, where each
box corresponds to a macroblock and the number corresponds to the macroblock number).

Data for each GOB consists of a GOB header followed by data for the macroblocks. Each GOB
header is transmitted once between picture start codes in the CIF or QCIF sequence.
6.7.1.4 Macroblock layer:
As already shown in the figure, each GOB consists of 33 macroblocks. Each macroblock relates to 16 x
16 pixels of Y and the corresponding 8 x 8 pixels of each of U and V.

Since a block is defined as a spatial array of 8 x 8 pixels, each macroblock therefore consists of six
blocks: four from Y and one each from U and V. Each macroblock has a header that includes
the following information:

• Macroblock address (MBA)- It is a variable length codeword indicating the position of a


macroblock within a group of blocks. For the first transmitted macroblock in a GOB, MBA is the
absolute address. For subsequent macroblocks, MBA is the difference between the absolute
address of the macroblock and that of the last transmitted macroblock. MBA is always included in
transmitted macroblocks. Macroblocks are not transmitted when they contain no information for
the part of the picture they represent.

• Macroblock type (MTYPE) – It is also a variable-length codeword that indicates the prediction
mode employed and which data elements are present. The H.261 standard supports the following
prediction modes:
o intra mode, adopted for those macroblocks whose content changes significantly between two successive frames.
o inter mode, which employs the DCT of the inter-frame prediction error.
o inter + MC mode, which employs the DCT of the motion-compensated prediction error.
o inter + MC + fil mode, which additionally employs filtering of the predicted macroblock.

• Quantizer (MQUANT) – It is present only if so indicated by MTYPE. MQUANT overrides

the quantizer specified in the GOB header, and until any subsequent MQUANT is specified, this
quantizer is used for all subsequent macroblocks.

• Motion vector data (MVD) – It is also a variable length codeword (VLC) for the horizontal
component of the motion vector, followed by a variable length codeword for the vertical
component. MVD is obtained from the macroblock vector by subtracting the vector of the
preceding macroblock.
• Coded block pattern (CBP) – CBP gives a pattern number that signifies which of the blocks
within the macroblock have at least one significant transform coefficient. The pattern number
is given by

CBP = 32×P1 + 16×P2 + 8×P3 + 4×P4 + 2×P5 + P6

where Pn = 1 if any coefficient is present for block n, else 0. The block numbering is as per the figure.
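As a small worked example of this pattern number in Python, assuming the usual block order (four Y blocks, then U, then V):

def coded_block_pattern(block_has_coeff):
    """block_has_coeff: sequence of six booleans P1..P6."""
    return sum(int(p) << (6 - n) for n, p in enumerate(block_has_coeff, start=1))

# Example: only the first luminance block and the U block carry coefficients.
print(coded_block_pattern([True, False, False, False, True, False]))   # 32 + 2 = 34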

6.7.1.5 Block Layer:


The block layer does not have any separate header, since the macroblock is the basic coding entity.
Data for a block consists of codewords for the transform coefficients (TCOEFF), followed by an end
of block (EOB) marker.

Transform coefficients are always present for intra macroblocks. For inter-coded macroblocks,
transform coefficients may or may not be present within a block, and their status is given by the
CBP field in the macroblock layer. TCOEFF encodes the (RUN, LEVEL) combinations using
variable-length codes, where RUN indicates a run of zero coefficients in the zig-zag scanned block
DCT array.

6.7.2.1 Basic objectives of H.263 standard


Before we talk about the next video coding standard adopted by ITU-T, H.263, we
would like to explain why we do not discuss a standard called H.262, which should logically
have come between H.261 and H.263. The H.262 project was indeed defined to
address ATM/broadband video conferencing applications, but since its scope was absorbed by the
MPEG-2 standard (H.262 is technically identical to MPEG-2 Video), it is not treated separately here.

The H.261 standard’s coding algorithm achieved several major performance breakthroughs, but
at the lower extreme of its bit-rate, i.e., at 64 Kbit/sec, serious blocking artifacts produced
annoying effects.

This was tackled by reducing the frame rate, for example from the usual rate of 30 frames/sec
down to 10 frames/sec, by considering only one out of every three frames and dropping the
remaining two. However, a reduced frame rate lowers temporal resolution, which is not very
desirable for rapidly changing scenes. A reduced frame rate also causes high end-to-end delays,
which is also not very desirable.

Hence, there was a need to design a coding standard that would provide better performance than
the H.261 standard at lower bit-rate. With this requirement evolved the H.263 standard, whose
targeted application was POTS video conferencing.

During the development of H.263, the target bit-rate was determined by the maximum bitrate
achievable at the general switched telephone network (GSTN), which was 28.8 Kbits/sec at that
time. At these bit-rates, it was necessary to keep the overhead information at a minimum. The
other requirements of H.263 standardization were:

• Use of available technology


• Interoperability with other standards, like H.261
• Flexibility for future extensions
• Quality of service parameters, such as resolution, delay, frame-rate etc.
• Subjective quality measurements.

Based on all these requirements an efficient coding scheme was designed. Although it was
optimized for 28.8 Kbits/sec, even at higher bit rates up to 600 Kbits/sec, H.263 outperformed
the H.261 standard.

6.7.2.2 Picture formats of H.263

The H.263 standard supports five picture formats: sub-QCIF, QCIF, CIF, 4CIF, and 16CIF.

The CIF, 4CIF and 16CIF picture formats are optional for encoders as well as decoders. It is
mandatory for decoders to support both the sub-QCIF and QCIF picture formats. However, for
encoders, only one of these two formats (sub-QCIF or QCIF) is mandatory. In all these formats,
Y, U and V are sampled in 4:2:0 format.

Improved features of H.263 over H.261

H.263 offered several major improvements over its predecessor H.261. Some of these are:
• Half-pixel motion estimation: In H.261, motion vectors were expressed in integer pixel
units. This often poses a limitation in motion compensation, since one-pixel resolution is often
too crude to represent real-world motion. Half-pixel motion estimation is explained in the next
sub-section.
• Unrestricted motion vector mode: In the default prediction mode of H.263, all motion vectors
are restricted so that the pixels referenced by them lie within the coded picture area. In the
unrestricted motion vector mode, this restriction is removed and the motion vectors are allowed
to point outside the picture.
• Advanced prediction mode – This is an optional mode that supports four motion vectors per
macroblock and overlapped block motion compensation (OBMC). This is explained in detail
later.
• PB-frames mode – This increases the frame rate without significantly increasing the bit-rate.
The concept is explained in a later sub-section.
• Syntax-based arithmetic coding (SAC) mode – This mode achieves better compression
compared to Huffman-coded VLC tables.
6.7.2.3 Half-pixel motion estimation

Motion estimation with half-pixel accuracy requires pixel interpolation, as illustrated in the figure.

In the figure, the integer pixel positions are indicated by the “+” symbol. A half-pixel-wide grid
is formed in which one out of four pixels in the grid coincides with an integer grid position. The
remaining three out of four pixels are generated through interpolation. The interpolated pixels,
indicated by the “O” symbol, lie at the integer as well as the half-pixel positions. Together, the
integer and the half-pixel positions create a picture that has twice the spatial resolution of the
original picture in both the horizontal and the vertical directions. In the figure, the
interpolated pixels marked ‘a’, ‘b’, ‘c’ and ‘d’ are given by

a = A, b = (A + B)/2, c = (A + C)/2, d = (A + B + C + D)/4

where A, B, C, and D are the pixel intensities at the integer pixel positions.

When motion estimation is carried out on this higher-resolution interpolated image, a
motion vector of 1 unit in this resolution corresponds to 0.5 unit with respect to the original
resolution. This is the basic principle of half-pixel motion estimation.
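A tiny Python sketch of this bilinear interpolation (plain rounding is used here; the exact rounding rules vary between standards):

def half_pixel_samples(A, B, C, D):
    a = A                               # coincides with the integer position
    b = round((A + B) / 2)              # halfway between A and its right neighbor B
    c = round((A + C) / 2)              # halfway between A and the pixel below, C
    d = round((A + B + C + D) / 4)      # center of the 2x2 neighborhood
    return a, b, c, d

print(half_pixel_samples(100, 104, 108, 112))   # (100, 102, 104, 106)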

6.7.2.4 Advanced prediction mode in H.263


The advanced prediction mode in H.263 has two major aspects –
(a) It uses four motion vectors per macroblock. Each 8 x 8 block is associated with its own motion
vector instead of one motion vector for the entire macroblock. This results in a better motion
representation, but increases the number of bits needed to encode the motion vectors.

(b) It uses overlapped block motion compensation (OBMC), which results in an overall smoothing
of the image and removal of blocking artifacts. OBMC involves using the motion vectors of
neighboring blocks to reconstruct a block. Each pixel P(i,j) of an 8 x 8 luminance block is a weighted
sum of three prediction values:

P(i,j) = Σk Hk(i,j) × p(i + uk, j + vk),  k = 0, 1, 2

where (uk,vk) is the motion vector of the current block (k = 0), the block either above or below (k
= 1), or the block either to the left or right of the current block (k = 2). Here p(i,j) is the reference
(previous) frame and {Hk(i,j): k = 0, 1, 2} are the weighting matrices defined in the standard.

In the advanced prediction mode, motion vectors are allowed to cross the picture boundaries, just
like the unrestricted motion vector mode.

6.7.2.5 PB-frame mode :


The H.263 standard introduced a new concept, called PB-frames, in which a P-frame
and a B-frame are coded together as one unit. The prediction mechanism employed in a PB-frame is
illustrated in the figure.

The P-picture within the PB-frame is predicted from the previously decoded P picture and the B-
picture is bi-directionally predicted from the previous P-picture, as well as the P-picture currently
being decoded. Information from the P-picture and the B-picture within the PB-frame is
interleaved at the macroblock level.

The P-macroblock information is directly followed by the B-macroblock information. The
transmission bit overhead is much higher if the P- and B-pictures are encoded separately. For a
given P-picture rate, the use of PB-frames causes no extra delay.

6.7.2.6 H.263+ Extensions

Subsequent to the H.263 recommendations, some extension features were added; these are
referred to as the H.263+ features. They include new picture types, such as scalability pictures,
improved PB-frames, and custom source formats, as well as new coding methods, such as
advanced intra coding, spatial filtering, and deblocking filters.
