Lecture4 Data Compression
Lecture4 Data Compression
Lecture4 Data Compression
DATA COMPRESSION
JPEG and MPEG Standards
1
Motivation for Standards
Why Standards?
{ Compatibility
{ Production cost reduction
{ Triggering growth and product development
Do JPEG/MPEG standards kill research?
{ Not necessarily
{ Those standards are very
exible regarding the encoder
design, leaving much room for improvement
{ The trends are toward even greater
exibility in future
generations of the standards
{ Another growth area is the development of systems which
integrate components that use the standards
2
Image/Video Compression Standards
(Outline)
JPEG
{ Baseline JPEG
{ Extended JPEG
{ Lossless JPEG (DPCM + Human/Arithmetic)
MPEG
{ History (H.261 and H.263)
{ MPEG1
{ MPEG2
{ Why not MPEG3
{ Future: MPEG4 (slated for 1998)
3
The JPEG \Toolkit"
JPEG provides a \toolkit" of techniques for compressing
continuous-tone, still, color and monochrome images
Baseline JPEG provides a DCT-based algorithm, and uses
run-length encoding and Human coding
Baseline JPEG operates only in sequential mode, and is re-
stricted to 8 bits/pixel input images
Extended JPEG oers several optional enhancements:
{ 12-bit/pixel input
{ Progressive transmission
{ Choice between Arithmetic and Human coding
{ Adaptive quantization
{ Tiling
{ Still picture interchange le format (SPIFF)
{ Selective renement
Applications of JPEG
{ Desktop publishing, color fax, photojournalism,
picture archiving and communications systems for med-
ical images, general image archiving systems, consumer
imaging, graphic arts, and others
4
The Baseline JPEG Algorithm
1. It operates on 8 8 blocks of the input image
2. Mean-normalization (subtract 128 from each pixel)
3. Transform: DCT-transform each block
4. Quantization
An 8 8 quantization matrix Q is user-provided
Each block is divided by Q (point by point)
The terms are then rounded to their nearest integers
Remark: Up to 4 quantization matrices per image are
allowed (for example, one for luminance, and for each of
the three color components)
5. Entropy-coding of the DC coecients (the top left coecient
of each quantized block) using DPCM+Human
Human-encode the DC residuals derived from the dif-
ference between each DC and the DC of the preceding
block
6. Entropy-coding of the AC (i.e., non-DC) coecients
Zigzag-order the quantized coecients of each block
Record for each nonzero coecient both its distance (called
run) to the preceding nonzero coecient in the zigzag se-
quence, and its value (called level)
Human code the [run,level] terms using one single Hu-
man table for all the AC's of the image
5
A Schematic Diagram of the JPEG Encoder
6
The Quantization Matrices
They are user-provided
They can be computed using the contrast sensitivity function
of the HVS
Their values are 8-bit integers
They provide control over the bitrate by scaling them by a
constant factor
Example
0 1
11
B
B 16 10 16 24 40 51 61 CC
B
12
B
B
B
B
12 14 19 26 58 60 55 CCCCC
13
B
B
B
B
14 16 24 40 57 69 56 CCCC
Q= 17
B
B
B
B
14 22 29 51 87 80 62 CCCC
:
22
B
B
B
B
18 37 56 68 109 103 77 CCCC
B
35
B
B
B 24 55 64 81 104 113 92 CCCC
B
64
B
B
B 49 78 87 103 121 120101 CCCC
@ A
72 92 95 98 112 100 103 99
7
The DC Human Table
The DC residuals are in the range [-2047,2047]
Divide this range into 12 categories corresponding to \one-
dimensional ripples"
8
Coding of the AC Terms
(The AC Human Table)
Divide the range of the AC's into 10 categories
Category k, 1 k 10, consists of 2k integers ranging from
,2k + 1 to ,2k, , and from 2k, to 2k , 1
1 1
9
Examples
Take 0an 8 8 block of Lena 1
B
B 143
147 149 152 156 147 146 149 CC
B
B
B
B
B
151
146 143 154 148 144 153 132 CCCCC
B
B
B
B
147
143 145 149 144 145 128 133 CCCC
B=
B
B
B
B
152
145 145 144 146 134 130 137 CCCC
B
B
B
B
146
143 142 147 124 127 139 138 CCCC
B
B
B
B 139
145 139 127 126 135 139 141 CCCC
B
B
B
B 145
137 124 130 138 136 140 144 CCCC
@ A
144 124 136 134 137 139 142 145
Normalize
0 B 1
BB 15
19 21 24 28 19 18 21 CC
BB
18BB 23
BB
15 26 20 16 25 4 CCCCC
15 BB 19 17 21 16 17 0 5 CCCC
BB
17 BB 24 17 16 18 6 2 9 CCCC
NB = 15
BB
BB 18 14 19 ,4 ,1 11 10 CCCC
BB
17 BB 11
BB 11 ,1 ,2 7 11 13 CCCC
9 BB 17
B@ ,4 2 10 8 12 16 CCCC
16 ,4 8 6 9 11 14 17 A
10
Perform
0 DCT 1
:4
B
B 103 12:4 6:0 2:1 8:1 5:7 ,0:7 ,0:4 CC
B
:7
B
B
B 31 11:6 ,22:8 ,0:7 ,0:1 ,0:5 ,4:7 ,1:8 CCCCC
B
:0
B
B
B 11 ,21:2 ,3:7 8:8 2:3 0:1 ,0:2 5:6 CCCC
D=
B
:2
B
B
B
B
0 ,4:9 10:3 ,8:6 ,8:9 ,2:5 ,7:6 ,1:4 CCCC
:9
B
B
B
B
4 ,0:4 ,1:9 ,8:9 6:6 2:6 3:6 3:6 CCCC
:7
B
B
B
B 0 ,0:9 ,0:9 4:1 7:2 ,15:5 2:8 1:8 CCCC
, :1
B
B
B
B 3 ,1:0 ,2:5 ,5:5 ,5:2 ,6:7 10:7 ,1:1 CCCC
,2:9
@
,0:2 ,1:2 ,2:7 3:6 2:6 0:4 ,6:4 A
Quantize
0
(round(D:=Q)) 1
BB 6
1 0 0 0 0 00 CC
BB
1BB 4 ,2 0 0 0 00 CCCCC
,2 BB
BB 1
BB
0 0 0 0 00 CCCC
0 BB 0 0 0 0 0 00 CCCC
Dq = 0
BB
BB 0 0 0 0 0 00 CCCC
BB
0 BB 0
BB 0 0 0 0 00 CCCC
0 BB 0
B@ 0 0 0 0 00 CCCC
A
0 0 0 0 0 0 00
zigzag ordering: 6 1 4 1 1 0 -2 -2 allzeros
11
Example Human Table for Lena
12
Coding the Example Block
zigzag ordering: 6 1 4 1 1 0 0 -2 -2 allzeros
Human symbols: [run, category]
DC (0,1) (0,3) (0,1) (0,1) (2,2) (0,2) EOB
Code:
(DC di. code)/001/100100/001/001/111110010/0100/1010
Bitrate (assuming 8 bits dor the DC residual symbol):
40
64
= 0:625 bits/pixel
13
Decoding
1. Entropy-decode the bitstream back to the quantized blocks
2. Dequantize: multiply each block coecient by the corre-
sponding coecient of the quantization matrix
3. Apply the inverse DCT transform on each block
4. Denormalize: add 128 to each coecient
14
Error
0 block 1
B
B 0 0 ,4 ,5 ,1 ,7 ,3 4 CC
B
B
B
B 6 , 1 , 8 0 ,5 ,5 9 ,9 CCCCC
B
B
B
B 1 ,4 ,4 0 ,2 3 ,9 ,1 CCCC
6 ,1 0 2 7 ,1 ,2
B
B
B
B
B
7 CCCC
B
B
B
B
1 0 2 11 ,9 ,4 9 8 CCCC
B
B
B
B ,2 7 5 ,4 ,4 4 5 6 CCCC
B
B
B
B 9 3 ,6 2 9 3 1 2 CCCC
@
11 ,7 9 8 8 4 0 ,2 A
MSE=5.2
15
Extended JPEG
Extended JPEG allows for several optional enhancements:
{ 12-bit/pixel input
{ Arithmetic coding is allowed as an alternative to Human
coding
{ Adaptive quantization: Allows 5-bit scale change to the
quantization matrix from one block to another
{ SPIFF: A le format that provides for the interchange
of compressed image les between dierent application
environments
{ Progressive transmission (PT)
Sequential PT
Spectral selection
Successive approximations
Hierarchical PT (pyramid encoding)
{ Tiling
{ Selective renement
16
Sequential Progressive Transmission
17
Hierarchical Progressive Transmission
18
MPEG
(History)
Standard Description
Activity
Internation Telecommunication Union (ITU);
H.261 designed for ISDN applications at p 64 kbits/sec,
Dec. 1990 (p = 1; 2; :::; 30)
ITU Recommendation;
H.263 video coding for low bitrate communication;
Feb. 1995 improved visual telephony applications
ISO Moving Picture Expert Group;
MPEG1: storage/retrieval of video+audio at about
1.5 Mbits/s
MPEG MPEG2: for generic applications at higher birates
Nov. 1994 (initially 10Mbits/s);
MPEG3: initially intended for 50 Mbits/s bitrate,
but MPEG2 was found to achieve the MPEG3 speed
goals, leading to the abandonment of MPEG3
19
Basic Concepts
Video a sequence of images called frames
Color
{ Three components: red (R), green (G), and blue (B)
{ For compatibility with non-colored media, the RGB model
was converted to an equivalent model | Y CbCr
Y is the luminance component, which was experimen-
tally determined to be
Y = 0:299R + 0:587G + 0:114B
Cb = B , Y
Cr = R , Y
{ Y is referred to as luma, and Cb & Cr as chroma
Every frame is really 3 images: one Y , one Cb and one Cr
8 bits/pixel for each of the three color components
Because human vision is less sensitive to color, the Cb and
Cr images are downsampled by 2 in each dimension (they
are quarter the size of Y images)
Much of MPEG processing is on the basis of a macroblock:
A 16 16 luminance block with the two 8 8 associated
chroma blocks
20
Modes of MPEG Compression
Intraframe compression
{ Exploits spatial redundancy only
{ Operates on single frames independetly of other frames
Interframe compression
{ Exploits both spatial and temporal redunadancies
{ Employs motion estimation (MS) without standardizing
any MS algorithm
{ Derives motion-compensated predictions of frames
{ Finally, it performs JPEG-like compression on the resid-
ual frames
21
Types of Frames in MPEG
22
Interframe Compression of P and B Blocks
A motion-compensated prediction (i.e., approximation) f of
a P/B frame f is made
The residual f , f is then compressed in a JPEG-like style
Any macroblock of the original frame f may be strictly intra
compressed if its prediction is deemed to be poor
Issues to be addressed
{ Method of strict intra compression
{ Method of compressing residual macroblocks
{ Motion-compensated prediction
23
Flowchart of MPEG Compression
24
Motion-Estimation and Prediction
Motion estimation is performed on the basis of macroblocks,
using the 16 16 luminance blocks only
Motion is assumed to be uniform across all the pixels of a
macroblock
Remark: There is a tradeo in deciding the size of the basic
block for motion estimation
{ The block has to be suciently large to avoid \false hits"
{ The block has to be suciently small to avoid diverse
motions within one single block
{ The MPEG block size, 16 16, is a good compromise
25
Motion-Estimation and Prediction for P Frames
Consider a P-frame P
P will be predicted (i.e., approximated) from one single ref-
erence frame R
R is the most recent (decoded) I or P frame
For each macroblock MB of P , nd the closest matching
macroblock MB' in the reference frame R
If the MB-to-MB' match is satisfactory, then
{ treat MB' as the prediction (i.e., approximation) of MB
{ record the motion (i.e., displacement) vector between the
two macroblocks (allowing half-pixel accuracy)
{ Compute and compress the macroblock residual MB,MB'
(luma and chroma)
26
If the MB-to-MB' match is found to be unsatisfactory, then
the macroblock MB is strictly intra compressed as is done in
I frames
The motion vectors of all the macroblocks of P exhibit re-
dundancy due to similar (or sometimes identical) motion ex-
perienced by many neighboring macroblocks
This redundancy is exploited by coding the consecutive dif-
ferential values of motion vectors (i.e., DPCM)
Remark 1: MPEG does not standardize the decision mech-
anism for judging whether or or not a match between two
macroblocks is satisfactory
Remark 2: A typical decision mechanism involves computing
an error measure between the luminance of the two macro
blocks. The match is treated as satisfory if and only if the
error is below a certain threshold. Possible error measures
include mean-square error (MSE), mean absolute-dierence
(MB ,MB )
error (MAD), and variance
variance(MB ) .
0
27
Motion-Estimation and Prediction for B Frames
Consider a B-frame B
B will be predicted (i.e., approximated) from TWO reference
frames R1 and R2
R is the most recent (decoded) past I/P frame, and R is
1 2
the nearest (decoded) future I/P frame
For each macroblock MB of B , nd the closest matching
macroblock MB1 in the reference frame R1, and the closest
matching macroblock MB2 in R2
The predicted macroblock is PM = NINT ( MB + MB )
1 1 2 2
28
Compute and compress the macroblock residual MB , PM
(luma and chroma)
If neither match is found to be satisfactory, then the mac-
roblock MB is strictly intra compressed as is done in I frames
Record the motion vector(s) between the MB and the other
one or two macroblocks (allowing half-pixel accuracy)
Again, the motion-vector redundancy is exploited by cod-
ing the consecutive dierential values of motion vectors (i.e.,
DPCM)
Remark: The prediction mode chosen is 2-bit coded and
passed on along with the macroblock header information
29
MPEG1 Constrained Parameters
30
MPEG2
Higher data rates than MPEG1
MPEG2 allows for higher quality source images
{ 4:2:0 (chroma subsamples only horizontally)
{ 4:4:4 (no subsampling of chroma)
{ Note: 4:2:2 is what's supported by MPEG1
MPEG2 allows for ner quantization and for specifying separate
quantization table for luma and chroma
MPEG2 allows for ner adjustment of quantization scale fac-
tor, used in intra compression
MPEG2 allows for interlaced video
MPEG2 supports error concealment (of lost macroblocks)
MPEG2 supports scalable compression
{ SNR-scalability: by sending bands of DCT coecients
{ spatial-scalability: pixel resolution by down-/up-sampling
{ temporal-scalability: dierent frame rates by skipping frames
31
MPEG2 (Cont.)
MPEG2 has a Prole and Level structure
{ Proles are algorithmic elements included in MPEG2
{ Levels are upper bounds on parameter values
Proles (proles are backward-compatible)
1. Simple: No use of B frames
2. Main: what was described earlier, but no scalability
3. SNR-scalable
4. Spatially scalable
5. High: temporally scalable, higher-quality source data (4:2:0,
4:2:2 and possibly 4:4:4 in the future)
Levels
1. Low: 352 240 frame size
2. Main: 720 480 frame size
3. High 1440: 1440 1152 frame size
4. High: 1920 1080 frame size
32
Allowed Level-Prole Combinations
33
References
1. B. pennebaker and J. L. Mitchell, JPEG Still Image Data
Compression Standard, Van Nostrand reinhold, New York
1993.
2. ISO-11172-2: Generic Coding of moving pictures and associ-
ated audio (MPEG-1)
3. ISO-13818-2: Generic Coding of moving pictures and associ-
ated audio (MPEG-2)
34