Video Compression Basics - MPEG-2
BY VIJAY
07/10/2009
AGENDA
OVERVIEW
VIDEO SCHEME
VIDEO COMPRESSION
MPEG
MPEG-2 VIDEO COMPRESSION
MPEG-2 FRAMES
INTRA FRAME
NON-INTRA FRAME
1. OVERVIEW
o A video comprises a sequence of frames.
Frame 1 Frame 2 Frame 3 Frame 4
o One second of video generated by a TV camera usually contains 24 or 30 frames.
o Each pixel in a frame is represented by three attributes (each 8 bits long): one luminance attribute and two chrominance attributes, i.e. {Y, Cb, Cr}.
Luminance (Y): describes the brightness of the pixel.
Chrominance (Cb, Cr): describes the color of the pixel.
o Uncompressed video data is big in size. For example, a single frame with a resolution of 720 x 480 (720 pixels in each horizontal line and 480 horizontal lines per frame) is described by one {Y, Cb, Cr} triplet per pixel:
{Y, Cb, Cr}, {Y, Cb, Cr}, {Y, Cb, Cr}, ... , {Y, Cb, Cr}
...........................................................
{Y, Cb, Cr}, {Y, Cb, Cr}, {Y, Cb, Cr}, ... , {Y, Cb, Cr}
Frame (720 x 480) = 720 x 480 x 8 + 720 x 480 x 8 + 720 x 480 x 8 bits = 8294400 bits ~ 8.29 Mb.
A complete video of 1 second is described by (720 x 480 x 8 + 720 x 480 x 8 + 720 x 480 x 8) x 24 bits = 199065600 bits ~ 199 Mb.
Thus, for an entire movie, the data would be too big to fit on DVDs or to transmit within the bandwidth of available TV channels.
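These figures can be checked with a few lines of arithmetic (a minimal sketch using the example's 720 x 480 resolution, 8 bits per sample and 24 frames per second):

```python
# Rough size of one uncompressed 4:4:4 frame and one second of video,
# using the example figures above.
WIDTH, HEIGHT = 720, 480      # pixels per line, lines per frame
BITS_PER_SAMPLE = 8           # Y, Cb and Cr each use 8 bits
FRAME_RATE = 24               # frames per second

bits_per_frame = WIDTH * HEIGHT * BITS_PER_SAMPLE * 3   # Y + Cb + Cr
bits_per_second = bits_per_frame * FRAME_RATE

print(bits_per_frame)    # 8294400   (~8.29 Mb per frame)
print(bits_per_second)   # 199065600 (~199 Mb per second)
```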
2. VIDEO SCHEME
Types of video schemes used for transmission -
Field 1
Field 2
o The two successive fields (field 1 & field 2) together are called a frame.
o Both fields are sent one after another and the display puts them back together before showing the full frame.
o Quality is degraded when the fields come out of sync.
o It conserves bandwidth.
o The maximum frame rate is 60 frames/second.
3. VIDEO COMPRESSION
The concept of video compression rests on two main factors -
o The data in frames is often redundant in space and time. For example -
Spatial redundancy: within a frame, adjacent pixels are usually correlated, e.g. the grass is green across the background of a frame.
Time based redundancy: within a video, adjacent frames are usually correlated, e.g. the green background persists frame after frame.
o The human eye resolves brightness detail better than color detail. Because of the way the human eye works, it is also possible to delete some data from the frame with almost no noticeable degradation in image quality.
4. MPEG
MPEG stands for Moving Picture Experts Group, established in 1988 as a working group within ISO/IEC that has defined standards for digital compression of audio & video signals. Such as -
o MPEG-1: the very first project of this group, published in 1993 as the ISO/IEC 11172 standard. MPEG-1 defines coding methods to compress progressively scanned video. Commonly used in CD-i and Video CD systems. It supports a coding bit rate of 1.5 Mbit/s.
o MPEG-2: an extension of MPEG-1, published in 1995 as the ISO/IEC 13818 standard. MPEG-2 defines coding methods to compress progressively scanned as well as interlaced scanned video. Commonly used in broadcast formats, such as Standard Definition TV (SDTV) and High Definition TV (HDTV). It supports coding bit rates of 3 - 15 Mbit/s for SDTV and 15 - 20 Mbit/s for HDTV.
o MPEG-4: introduced in 1998 and still in development as the ISO/IEC 14496 standard. MPEG-4 defines object based coding methods for mixed media data and provides new features, such as 3D rendering, animation graphics, DRM, various types of interactivity, etc. Commonly used in web based streaming media, CD, videophone, DVB, etc. It supports coding bit rates from a few kbit/s to tens of Mbit/s.
5. MPEG-2 VIDEO COMPRESSION
MPEG-2 compresses video into three types of frames: Intra coded frames (I-frames), Predictive coded frames (P-frames), and Bi-directionally predictive coded frames (B-frames).
o Compression is based on -
Spatial redundancy
Time based redundancy
o Compressed frames (I, P & B frames) are organized in a sequence to form a Group of Pictures (GOP).
6. MPEG-2 FRAMES
I - Frame
o Compressed directly from a raw (uncompressed) frame.
o Compression is based on spatial redundancy within the current raw frame only and on the inability of the human eye to detect certain changes in the image.
Raw Frame → (compression) → I-frame
o I-frame is a reference frame and can be used to predict the P-frame immediately following it.
P - Frame
o Compression is based on spatial redundancy as well as on time based redundancy.
o P-frame can be predicted by referring to the I-frame or P-frame immediately preceding it. (P-frame is also a reference frame.)
Previous I or P frame (Reference) → Next P-frame
o P-frame provides better compression than I-frame as it uses the data of the previous I-frame or P-frame.
B - Frame
Previous reference (I or P) ← B → Future reference (I or P)
o Compression is similar to P-frame except that B-frame compression is done by referring to the previous as well as the following I-frame and/or P-frame.
o B-frames require that the frame sequence be transmitted or stored out of order, so that the future reference frame (I or P) is available for reference. This causes some delay throughout the system.
Transmission / Encoding Order: 100, 102, 99, 104, 101, 106, 103, 107, 105, 108
Display Order: 99, 100, 101, 102, 103, 104, 105, 106, 107, 108
o B-frame provides better compression than P-frame & I-frame, as it uses the data of the previous as well as the succeeding I-frame and/or P-frame. It requires a memory buffer of double size to store the data of the two reference/anchor frames.
o B-frame is not a reference frame.
o There is no defined limit to the number of consecutive B-frames within a group of pictures. Most applications use two consecutive B-frames as the ideal trade-off between compression efficiency and video quality.
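The reordering between display order and transmission/encoding order can be sketched as follows (an illustrative helper, not part of any MPEG-2 implementation; it assumes a GOP given as a list of 'I', 'P', 'B' frame types in display order):

```python
# Reorder frames from display order to encoding/transmission order.
# B-frames need their future reference (I or P) to be sent first, so each
# reference frame is moved ahead of the B-frames that precede it in display order.
def display_to_encoding_order(frame_types):
    """frame_types: list of 'I', 'P', 'B' in display order.
    Returns (display_index, type) pairs in encoding/transmission order."""
    encoding, pending_b = [], []
    for index, ftype in enumerate(frame_types):
        if ftype == 'B':
            pending_b.append((index, ftype))     # hold B-frames back
        else:
            encoding.append((index, ftype))      # send the reference first
            encoding.extend(pending_b)           # then the held-back B-frames
            pending_b = []
    encoding.extend(pending_b)
    return encoding

print(display_to_encoding_order(['I', 'B', 'B', 'P', 'B', 'B', 'P']))
# [(0, 'I'), (3, 'P'), (1, 'B'), (2, 'B'), (6, 'P'), (4, 'B'), (5, 'B')]
```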
7. INTRA FRAME
The main techniques used in MPEG-2 frame compression are -
o Discrete cosine transform (DCT): converts spatial variation into frequency variation.
o DCT coefficient quantization: reduces the higher frequency DCT coefficients to zero.
o Run-length amplitude / variable length encoding: compression using entropy encoding, run-length encoding & Huffman encoding.
o Bit rate control: prevents under/over flow of the data buffer.
o Residual error frame & its coding: the residual error frame is generated by subtracting the predicted frame from its reference frame, and is further spatially coded & transmitted.
Intra frame encoder pipeline: Video Filtering → DCT → Quantization → Run-Length VLC
Video Filtering
Video filtering is a lossy compression technique used to compress spatial redundancies on a macro-block basis within the current frame. It operates in the color space (i.e. YCbCr encoding & CbCr sub-sampling), as the Human Visual System is less sensitive to variations in color than to variations in brightness. Video filtering includes -
o Macro-block: the raw frame is divided into macro-blocks (16 x 16 pixels), each of which is further divided into 8 x 8 pixel blocks. For example -
Raw Frame → Block_1, Block_2, ..., Block_n
o YCbCr Encoding: converts the blocks' RGB data into the YCbCr color space. For example, a raw frame contains an image in RGB color space.
RGB color space contains mutual redundancies, so it requires large space for storage and high bandwidth for transmission. Encoding RGB into the YCbCr color space reduces these mutual redundancies.
Raw Frame → Y, Cb and Cr planes
Conversion:
Y = + 0.299 * R + 0.587 * G + 0.114 * B
Cb = 128 - 0.168736 * R - 0.331264 * G + 0.5 * B
Cr = 128 + 0.5 * R - 0.418688 * G - 0.081312 * B
Where -
R, G & B values are 8 bits long and lie in the {0, 1, 2, ..., 255} range.
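A minimal per-pixel sketch of the conversion above (the function name rgb_to_ycbcr is just for illustration):

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit R, G, B values (0-255) to Y, Cb, Cr using the
    coefficients given above."""
    y  =         0.299    * r + 0.587    * g + 0.114    * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128.0 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return round(y), round(cb), round(cr)

print(rgb_to_ycbcr(0, 255, 0))   # pure green -> (150, 44, 21)
```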
o Chrominance (CbCr) Sub-Sampling: provides further compression in the chrominance plane by reducing the number of bits needed to represent an image. For example, a YCbCr encoded frame can be represented in the 4:4:4 sampling format, in which every pixel position carries a Y, a Cb and a Cr sample.
The 4:4:4 sampling format states that for every four Y samples there are four Cb & four Cr samples. If the image resolution is 640 x 480 pixels then the number of Y samples = 640 x 480, Cb samples = 640 x 480 & Cr samples = 640 x 480.
Number of bits required = 640 x 480 x 8 + 640 x 480 x 8 + 640 x 480 x 8 = 7372800 bits ~ 7.3728 Mb
The 4:4:4 sampling format can be subsampled to the 4:2:2 format, where Cb & Cr are sub-sampled to half the horizontal resolution of Y.
4:4:4 → 4:2:2
That is, in the 4:2:2 sampling format, for every four Y samples in the horizontal direction there are 2 Cb & 2 Cr samples.
If the image resolution is 640 x 480 pixels then the number of Y samples = 640 x 480, Cb samples = 320 x 480 & Cr samples = 320 x 480.
Number of bits required = 640 x 480 x 8 + 320 x 480 x 8 + 320 x 480 x 8 = 4915200 bits ~ 4.9152 Mb
It can be further subsampled to 4:2:0 format, where Cb & Cr are sub-sampled to half the horizontal and vertical resolution of Y.
4:2:2 → 4:2:0
If the image resolution is 640 x 480 pixels then the number of Y samples = 640 x 480, Cb samples = 320 x 240 & Cr samples = 320 x 240.
Number of bits required = 640 x 480 x 8 + 320 x 240 x 8 + 320 x 240 x 8 = 3686400 bits ~ 3.6864 Mb (which is far less than the 4:4:4 and 4:2:2 formats)
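The three sample counts above can be reproduced with a small helper (a sketch; the chroma divisors 1/1, 2/1 and 2/2 correspond to the 4:4:4, 4:2:2 and 4:2:0 formats described above):

```python
# Bits needed for one 640 x 480 frame at 8 bits per sample under different
# chroma sub-sampling formats (horizontal and vertical divisors for Cb/Cr).
def frame_bits(width, height, h_div, v_div, bits=8):
    y_samples = width * height
    c_samples = (width // h_div) * (height // v_div)   # per chroma plane
    return (y_samples + 2 * c_samples) * bits

print(frame_bits(640, 480, 1, 1))   # 4:4:4 -> 7372800 bits
print(frame_bits(640, 480, 2, 1))   # 4:2:2 -> 4915200 bits
print(frame_bits(640, 480, 2, 2))   # 4:2:0 -> 3686400 bits
```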
DCT
The two-dimensional DCT of an 8 x 8 block can be represented as:
F(u,v) = (1/4) * C(u) * C(v) * sum(x=0..7) sum(y=0..7) f(x,y) * cos[(2x+1)*u*pi/16] * cos[(2y+1)*v*pi/16]
Where -
C(u), C(v) = 1/sqrt(2) for u, v = 0 and 1 otherwise; f(x,y) is the pixel value at position (x,y) and F(u,v) is the DCT coefficient at frequency (u,v).
The output of a DCT function is a DCT coefficient matrix containing the data in the frequency domain. Data in the frequency domain can be efficiently processed and compressed.
In DCT, each 8 x 8 data block of the Y, Cb and Cr components is converted into the frequency domain. For example, take an 8 x 8 pixel block of the Y component:

8 x 8 block of Y samples:
52  55  61  66  70  61  64  73
63  59  55  90 109  85  69  72
62  59  68 113 144 104  66  73
63  58  71 122 154 106  70  69
67  61  68 104 126  88  68  70
79  65  60  70  77  68  58  75
85  71  64  59  55  61  65  83
87  79  69  68  65  76  78  94

8 x 8 data block (each sample level-shifted by -128):
-76 -73 -67 -62 -58 -67 -64 -55
-65 -69 -73 -38 -19 -43 -59 -56
-66 -69 -60 -15  16 -24 -62 -55
-65 -70 -57  -6  26 -22 -58 -59
-61 -67 -60 -24  -2 -40 -60 -58
-49 -63 -68 -58 -51 -60 -70 -53
-43 -57 -64 -69 -73 -67 -63 -45
-41 -49 -59 -60 -63 -52 -50 -34

DCT → 8 x 8 DCT coefficient matrix (rounded):
-415  -30  -61   27   56  -20   -2    0
   4  -22  -61   10   13   -7   -9    5
 -47    7   77  -25  -29   10    5   -6
 -49   12   34  -15  -10    6    2    2
  12   -7  -13   -4   -2    2   -3    3
  -8    3    2   -6   -2    1    4    2
  -1    0    0   -2   -1   -3    4   -1
   0    0   -1   -4   -1    0    1    2
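The coefficient matrix above can be reproduced numerically (a sketch using scipy's orthonormal 2-D DCT, which matches the formula given earlier; the level shift by -128 is applied before the transform):

```python
import numpy as np
from scipy.fft import dctn

# 8x8 block of Y samples from the example above.
block = np.array([
    [52, 55, 61, 66, 70, 61, 64, 73],
    [63, 59, 55, 90, 109, 85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126, 88, 68, 70],
    [79, 65, 60, 70, 77, 68, 58, 75],
    [85, 71, 64, 59, 55, 61, 65, 83],
    [87, 79, 69, 68, 65, 76, 78, 94],
], dtype=float)

shifted = block - 128                   # level shift to centre values around 0
coeffs = dctn(shifted, norm='ortho')    # 2-D DCT-II
print(np.round(coeffs).astype(int))     # top-left (DC) coefficient is -415
```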
Quantization
Quantization reduces the amount of information in the higher frequency DCT coefficient components using a default quantization matrix defined by the MPEG-2 standard.
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99
The default quantization matrix contains constant values. Quantization is a lossy operation that causes minor degradation in image quality due to some subtle loss in brightness and color.
Each component in the DCT coefficient matrix is divided by its corresponding constant value in the default quantization matrix, and a quantized DCT coefficient matrix is computed. The quantization function can be represented as:
QF(u,v) = round( F(u,v) / Q(u,v) )
Where -
F(u,v) is the DCT coefficient, Q(u,v) is the corresponding entry of the default quantization matrix and QF(u,v) is the quantized DCT coefficient.
For example, dividing the DCT coefficient matrix above by the default quantization matrix (element by element, then rounding) gives the quantized DCT coefficient matrix:
-26  -3  -6   2   2  -1   0   0
  0  -2  -4   1   1   0   0   0
 -3   1   5  -1  -1   0   0   0
 -4   1   2  -1   0   0   0   0
  1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
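The quantized matrix is obtained element by element (a minimal sketch of the division-and-round step described above):

```python
import numpy as np

# Element-wise quantization of an 8x8 DCT coefficient matrix:
# each coefficient is divided by the matching entry of the quantization
# matrix and rounded to the nearest integer.
def quantize(dct_coeffs, quant_matrix):
    return np.rint(np.asarray(dct_coeffs) / np.asarray(quant_matrix)).astype(int)

# e.g. the DC coefficient of the example above: round(-415 / 16) = -26
print(quantize([[-415.0]], [[16]]))   # [[-26]]
```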
Entropy Encoding
o Zig-zag scan: the quantized DCT coefficient matrix is read out along a zig-zag path from position 0 (the DC coefficient at the top-left) through positions 15 and 48 down to position 63 (bottom-right), so that the low frequency coefficients come first and the trailing high frequency zeroes form long runs.
o Run Length Encoding: a lossless data compression technique for sequences in which the same data value occurs in many consecutive data elements.
WWWWWWWWWWWWBWWWWWWWWWWWWBBB (Sequence)
Here, the runs of data are stored as a single data value and a count. That is, the above sequence can be represented as 12W1B12W3B.
For example, applying run-length encoding to the zig-zag scanned quantized DCT coefficients gives -
-26, -3, 0, -3, -2, -6, 2, -4, 1, -4, (two 1s), 5, 1, 2, -1, 1, -1, 2, (5 0s), (two -1s), (thirty-eight 0s)
Run Length Encoded DCT coefficients
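A minimal run-length encoder along these lines (a sketch; it emits (count, value) pairs rather than the compact 12W1B12W3B notation used above):

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (count, value) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, v])       # start a new run
    return [(count, value) for count, value in runs]

print(run_length_encode("WWWWWWWWWWWWBWWWWWWWWWWWWBBB"))
# [(12, 'W'), (1, 'B'), (12, 'W'), (3, 'B')]  i.e. 12W1B12W3B
```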
o Huffman Encoding: a lossless data compression technique that uses a variable length code table for encoding a source symbol.
Variable length code table
The variable length code table is derived from the estimated probability of occurrence of each possible value of the source symbol. For example -
char : space  a  e  f  h
Freq :     7  4  4  3  2
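A variable length code table like this can be built with the classic Huffman algorithm (a sketch using Python's heapq and the example frequencies above; the exact code words depend on how ties are broken, and this is not the actual MPEG-2 code table):

```python
import heapq

def huffman_codes(freqs):
    """Build a prefix code: frequent symbols get shorter code words."""
    heap = [[f, i, {sym: ''}] for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)      # two least frequent subtrees
        f2, _, codes2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in codes1.items()}
        merged.update({s: '1' + c for s, c in codes2.items()})
        heapq.heappush(heap, [f1 + f2, counter, merged])
        counter += 1
    return heap[0][2]

print(huffman_codes({' ': 7, 'a': 4, 'e': 4, 'f': 3, 'h': 2}))
# e.g. {'a': '00', 'e': '01', 'h': '100', 'f': '101', ' ': '11'}
```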
For example, MPEG-2 has a special Huffman code word (EOB) to end the sequence prematurely when all the remaining coefficients are zero, and then performs variable length encoding for further compression.
-26, -3, 0, -3, -2, -6, 2, -4, 1, -4, (two 1s), 5, 1, 2, -1, 1, -1, 2, (5 0s), (two -1s), (thirty-eight 0s)
Run Length Encoded DCT coefficients
-26, -3, 0, -3, -2, -6, 2, -4, 1, -4, (two 1s), 5, 1, 2, -1, 1, -1, 2, (5 0s), (two -1s), EOB
DCT coefficients sequence with EOB
0011101010011011010110001010
Compressed Bit Stream Sequence
Bit Rate Control
Quantization → Run-Length VLC → data buffer
o The quantization process affects relative buffer fullness, which in turn affects the output bit rate, as quantization depends on the default quantization matrix (on a picture basis) and the quantization scale (on a macro-block basis).
o The encoder has to pass these two parameters to the bit rate control mechanism in order to control relative buffer fullness and maintain a constant bit rate.
o Buffer under flow/over flow can also be prevented by repeating or dropping entire video frames.
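A highly simplified illustration of this feedback idea (a hypothetical sketch, not the MPEG-2 rate control algorithm itself: the macro-block quantization scale is nudged up or down according to buffer fullness):

```python
def adjust_quant_scale(quant_scale, buffer_fullness, target=0.5, step=1,
                       min_scale=1, max_scale=31):
    """Nudge the macro-block quantization scale from buffer fullness (0..1).
    A fuller buffer -> coarser quantization -> fewer bits, and vice versa."""
    if buffer_fullness > target:
        quant_scale = min(max_scale, quant_scale + step)
    elif buffer_fullness < target:
        quant_scale = max(min_scale, quant_scale - step)
    return quant_scale

print(adjust_quant_scale(8, buffer_fullness=0.8))   # 9 (buffer filling up)
print(adjust_quant_scale(8, buffer_fullness=0.2))   # 7 (buffer draining)
```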
8. NON-INTRA FRAME
Non-intra (P & B) frame encoding additionally uses Inverse DCT, Inverse Quantization, Motion Estimation and Motion Compensation.
Other P-frames in a group of pictures can be predicted from the I-frame or P-frame immediately preceding them.
Temporal Prediction
Mostly, consecutive video frames are similar except for differences induced by objects moving within the frames. For example, between Frame 1 (Current) and Frame 2 (Next), the tree has moved down and to the right.
Temporal prediction uses motion estimation & motion vector techniques to predict these changes in future frames. Motion estimation is applied to the luminance plane only. (It is not applied to the chrominance plane, as it is assumed that the color motion can be adequately represented with the same motion information as the luminance.)
o Motion Estimation: performs a 2-dimensional spatial search for each luminance macro-block within the frame to get the best match. For example -
A macro-block from Frame 2 (current) is searched for within a search area in Frame 1 (reference) to find the best match.
If there is no acceptable match, then the encoder codes that particular macro-block as an intra macro-block, even though it may be in a P or B frame.
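A minimal full-search block-matching sketch (illustrative only; the SAD criterion, 16 x 16 macro-block size and +/-8 pixel search range are assumptions made for the example):

```python
import numpy as np

def motion_estimate(ref, cur, top, left, block=16, search=8):
    """Full-search block matching: find the (dy, dx) offset inside the
    reference frame that minimises the SAD for one macro-block."""
    target = cur[top:top + block, left:left + block].astype(int)
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue                      # candidate falls outside the frame
            candidate = ref[y:y + block, x:x + block].astype(int)
            sad = np.abs(target - candidate).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad                     # motion vector and its SAD
```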
o Motion Vector: motion vectors are assigned to the resultant macro-blocks to indicate how far horizontally and vertically each macro-block must be moved so that a predicted frame can be generated. For example, applying the motion vectors to Frame 1 produces the Predicted Frame.
Since each forward and backward predicted macro-block contains 2 motion vectors, a bi-directionally predicted macro-block will contain 4 motion vectors.
Subtracting the Predicted Frame from Frame 2 (current) gives the residual error frame.
o The more accurately the motion is estimated & matched, the more likely the residual error is to approach zero and the higher the coding efficiency.
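Once the motion vectors are known, the predicted frame and the residual error frame follow directly (a sketch under the same assumptions as above; motion_vectors is a hypothetical mapping from macro-block position to its (dy, dx) vector):

```python
import numpy as np

def motion_compensate(ref, motion_vectors, block=16):
    """Build a predicted frame by copying displaced macro-blocks from `ref`.
    motion_vectors[(top, left)] = (dy, dx) for each macro-block position."""
    predicted = np.zeros_like(ref)
    for (top, left), (dy, dx) in motion_vectors.items():
        predicted[top:top + block, left:left + block] = \
            ref[top + dy:top + dy + block, left + dx:left + dx + block]
    return predicted

def residual_error(current, predicted):
    """Residual error frame: current frame minus its prediction."""
    return current.astype(int) - predicted.astype(int)
```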
o Since motion vectors tend to be highly correlated between macro-blocks, the horizontal & vertical components are compared to the previously valid horizontal & vertical motion vectors respectively and the differences are calculated.
o These differences, along with the residual error frame, are then coded and a variable length code is applied for maximum compression efficiency.
o Residual Error Frame Coding: coding of the residual error frame is similar to I-frame coding, with some differences. Such as -
The default quantization matrix for non-intra frames is a flat matrix with a constant value of 16 for each of the 64 locations.
Non-intra frame quantization contains a dead-zone around zero, which helps eliminate any lone DCT coefficient quantization values that might reduce the run-length amplitude efficiency.
Motion vectors for the residual block information are calculated as differential values and coded with a variable length code according to their statistical likelihood of occurrence.
MPEG-2 decoder pipeline: Buffer → Run-Length VLD → Inverse Quantization → Inverse DCT
o Buffer: contains the input bit-stream. For fixed rate applications, a constant bit-stream is buffered in memory and read out at a variable rate based on the coding efficiency of the macro-blocks and frames to be decoded.
o VLD: reverses the run-length amplitude/variable length encoding done in the encoding process and recovers the quantized DCT coefficient matrix. It is the most complex and computationally expensive portion of decoding. It performs bitwise decoding of the input bit-stream using table look-ups to generate the quantized DCT coefficient matrix.
o Inverse Quantization: reverses the quantization done in the encoding process and recovers the DCT coefficient matrix. Each component of the decoded quantized DCT coefficient matrix is multiplied by the corresponding value of the default quantization matrix and the quantization scale factor. The resulting coefficient is clipped to the range -2048 to +2047. IDCT mismatch control is performed to prevent long term error propagation within the sequence.
o Inverse DCT: reverses the DCT done in the encoding process and recovers the original frame. The two-dimensional inverse DCT can be represented as:
f(x,y) = (1/4) * sum(u=0..7) sum(v=0..7) C(u) * C(v) * F(u,v) * cos[(2x+1)*u*pi/16] * cos[(2y+1)*v*pi/16]
Where -
C(u), C(v) = 1/sqrt(2) for u, v = 0 and 1 otherwise; F(u,v) is the DCT coefficient and f(x,y) is the recovered pixel value at position (x,y).
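A sketch of these two decoding steps for a single 8 x 8 intra block (using scipy's inverse DCT; the clipping range and the +128 level shift follow the description above, and the quantization scale factor is omitted for simplicity):

```python
import numpy as np
from scipy.fft import idctn

def decode_intra_block(quantized, quant_matrix):
    """Inverse-quantize an 8x8 block, clip the coefficients and apply the
    inverse 2-D DCT to recover pixel values."""
    coeffs = np.asarray(quantized) * np.asarray(quant_matrix)   # inverse quantization
    coeffs = np.clip(coeffs, -2048, 2047)                       # coefficient clipping
    pixels = idctn(coeffs.astype(float), norm='ortho')          # inverse 2-D DCT
    return np.clip(np.round(pixels) + 128, 0, 255).astype(np.uint8)
```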
Non-intra frame decoder pipeline: Run-Length VLD → Inverse Quantization → Inverse DCT → Motion Compensation
9. REFERENCES:
1. http://en.wikipedia.org/wiki/MPEG-2
2. http://www.john-wiseman.com/technical/MPEG_tutorial.htm
3. http://www.bretl.com/mpeghtml/MPEGindex.htm