3.1 Sampling
If sampling is performed in the time domain, the sampling rate is measured in Hz (cycles/sec). In the case of image processing, we can regard an image as a two-dimensional light-intensity function f(x, y) of spatial coordinates (x, y). Since light is a form of energy, f(x, y) must be nonnegative. To store an image in a computer, which processes data in discrete form, the image function f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is referred to as image sampling or spatial sampling, and digitization of the amplitude f is referred to as quantization. Moreover, for moving video images, we also have to digitize the time component; this is called temporal sampling. Digital video is thus a representation of a real-world scene, sampled spatially and temporally, with the light intensity quantized at each spatial point. A scene is sampled at an instance of time to produce a frame, which consists of the complete visual scene at that instance, or a field, which consists of the odd- or even-numbered lines of spatial samples. Figure 3-1 shows the concept of spatial and temporal sampling of videos.
Figure 3-1 Spatial and temporal sampling of video
images, each colour component is filtered and projected onto an independent 2D CCD array. The CCD array outputs analogue signals representing the intensity levels of the colour component. Sampling the signal at an instance in time produces a sampled image or frame that has specified values at a set of spatial sampling points in the form of an N × M array, as shown in the following equation:

$$f(x, y) = \begin{bmatrix} f(0, 0) & f(0, 1) & \cdots & f(0, M-1) \\ f(1, 0) & f(1, 1) & \cdots & f(1, M-1) \\ \vdots & \vdots & & \vdots \\ f(N-1, 0) & f(N-1, 1) & \cdots & f(N-1, M-1) \end{bmatrix} \tag{3.1}$$
The right image of Figure 3-2 below shows a rectangular grid overlaid on a 2D image to obtain sampled values f(x, y) at the intersection points of the grid. We may approximately reconstruct the sampled image by representing each sample as a square picture element (pixel), as shown on the left image of Figure 3-2. The visual quality of the reconstructed image is affected by the choice of the sampling points: the more sampling points we choose, the higher the resolution of the resulting sampled image. Of course, choosing more sampling points requires more computing power and storage to process and save the larger number of samples.
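As a concrete illustration, the following minimal Java sketch samples a synthetic continuous intensity function on an N × M grid. The function f and the grid spacing here are illustrative assumptions, standing in for the analogue scene and the physical spacing of the sensor.

class SpatialSampling {
    //hypothetical continuous image: a smooth intensity pattern in [0, 1]
    static double f(double x, double y) {
        return 0.5 + 0.5 * Math.sin(0.1 * x) * Math.cos(0.1 * y);
    }
    public static void main(String[] args) {
        int N = 8, M = 8;            //number of sampling points per dimension
        double dx = 2.0, dy = 2.0;   //spacing of the sampling grid
        double[][] samples = new double[N][M];
        for (int x = 0; x < N; x++)
            for (int y = 0; y < M; y++)
                samples[x][y] = f(x * dx, y * dy); //value at a grid intersection
        System.out.printf("sample at (3, 4) = %.3f%n", samples[3][4]);
    }
}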
Early silent films used anything between 16 and 24 frames per second (fps). Current television standards use a sampling rate of 25 or 30 frames per second. There are two commonly used temporal sampling techniques, progressive sampling and interlaced sampling. Progressive sampling is a frame-based sampling technique where a video signal is sampled as a series of complete frames. Film is a progressive sampling source for video. Interlaced sampling is a field-based sampling technique where the video is sampled periodically as two sample fields; half of the data in a frame (one field) is scanned at a time. To reconstruct the frame, a pair of sample fields are superimposed on each other (interlaced). In general, a field consists of either the odd-numbered or even-numbered scan lines within a frame, as shown in Figure 3-3.

Figure 3-3 Interlaced scanning: start of odd field and start of even field

An interlaced video sequence contains a sequence of fields, each of which consists of half the data of a complete frame. The interlaced sampling technique can give the appearance of smoother motion compared to the progressive sampling method when the data are sampled at the same rate. This is due to the motion blur effect of human eyes; the persistence of vision can cause images shown rapidly in sequence to appear as one. When we rapidly switch between two low-quality fields, they appear like a single high-quality image. Because of this advantage, most current video image formats, including several high-definition video standards, use interlaced techniques rather than progressive methods. A sketch of extracting the two fields of a frame is shown below.
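The following minimal Java sketch splits a frame, stored as a 2D array of samples, into its even and odd fields. The method name extractField and the array representation are our own illustrative assumptions, not from the text.

class Interlace {
    //return the even field (lines 0, 2, 4, ...) or the odd field (lines 1, 3, 5, ...)
    static int[][] extractField(int[][] frame, boolean even) {
        int h = frame.length, w = frame[0].length;
        int[][] field = new int[(h + (even ? 1 : 0)) / 2][w];
        for (int row = even ? 0 : 1, k = 0; row < h; row += 2, k++)
            field[k] = frame[row].clone(); //copy every other scan line
        return field;
    }
}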
3.1.3 Quantization
Quantization is the procedure of constraining the value of a function at a sampling point to a predetermined finite set of discrete values. Note that the original function can be either continuous or discrete. For example, if we want to specify the temperature of Los Angeles, ranging from 0°C to 50°C, to a precision of 0.1°C, we must be able to represent 501 possible values, which requires 9 bits per sample. On the other hand, if we only need a precision of 1°C, we have only 51 possible values, requiring 6 bits for the representation. For image processing, higher precision gives higher image quality but requires more bits in the representation of the samples. We will come back to this topic and discuss how to use quantization to achieve lossy image compression. The sketch below makes the idea concrete in code.
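Here is a minimal Java sketch of uniform quantization using the temperature example; the helper name quantize and its signature are illustrative assumptions.

class Quantize {
    //map a continuous value in [min, max] to one of 'levels' discrete codes
    static int quantize(double v, double min, double max, int levels) {
        double step = (max - min) / (levels - 1);
        return (int) Math.round((v - min) / step);
    }
    public static void main(String[] args) {
        double t = 23.47; //a sample temperature reading
        //0.1 degree precision: 501 levels, 9 bits (2^9 = 512 >= 501)
        System.out.println(quantize(t, 0.0, 50.0, 501)); //prints 235, i.e. 23.5
        //1 degree precision: 51 levels, 6 bits (2^6 = 64 >= 51)
        System.out.println(quantize(t, 0.0, 50.0, 51));  //prints 23
    }
}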
Color Spaces
definable colors lie in a unit cube as shown in Figure 3-4. This color space is most natural for representing computer images, in which a color specification such as (0.1, 0.8, 0.23) can be directly translated into three positive integer values, each of which is represented by one byte.
Figure 3-4 The RGB color cube (vertices include Blue [0, 0, 1], Cyan, Magenta, White, Red, and Yellow)

In this notation, a color C is written as a column vector of its three component intensities:

$$C = \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{3.2}$$
In some other notations, authors like to consider R, G, and B as three unit vectors like the three spatial unit vectors i, j, and k. Just as a spatial vector v can be expressed as v = xi + yj + zk, any color is expressed as C = rR + gG + bB, and the red, green, and blue intensities are specified by the values of r, g, and b respectively. In our notation here, R, G, and B represent the intensity values of the color components. Suppose we have two colors C1 and C2 given by

$$C_1 = \begin{bmatrix} R_1 \\ G_1 \\ B_1 \end{bmatrix}, \qquad C_2 = \begin{bmatrix} R_2 \\ G_2 \\ B_2 \end{bmatrix}$$

Does it make sense to add these two colors to produce a new color C? For instance, consider
$$C = C_1 + C_2 = \begin{bmatrix} R_1 + R_2 \\ G_1 + G_2 \\ B_1 + B_2 \end{bmatrix}$$

You may immediately notice that the sum of two components may give a value larger than 1, which lies outside the color cube and thus does not represent any color. Just as adding two points in space is illegitimate, we cannot arbitrarily combine two colors. A linear combination of colors makes sense only if the sum of the coefficients is equal to 1. Therefore, we can have

$$C = \alpha_1 C_1 + \alpha_2 C_2 \quad \text{where} \quad 0 \le \alpha_1, \alpha_2 \quad \text{and} \quad \alpha_1 + \alpha_2 = 1 \tag{3.3}$$
In this way, we can guarantee that the resulting components will always lie within the color cube, as each value will never exceed one. For example,

$$R = \alpha_1 R_1 + \alpha_2 R_2 \le \alpha_1 \cdot 1 + \alpha_2 \cdot 1 = 1$$

which implies R ≤ 1. The linear combination of colors described by Equation (3.3) is called color blending. A small code sketch of this operation follows.
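Here is a minimal Java sketch of the blending of Equation (3.3) for two RGB colors; the method name blend and the array representation of a color are illustrative assumptions.

class Blend {
    //blend two RGB colors with coefficients a1 and a2 = 1 - a1 (Equation 3.3)
    static double[] blend(double[] c1, double[] c2, double a1) {
        double a2 = 1.0 - a1; //the coefficients must sum to 1
        return new double[] {
            a1 * c1[0] + a2 * c2[0],  //R
            a1 * c1[1] + a2 * c2[1],  //G
            a1 * c1[2] + a2 * c2[2]   //B
        };
    }
    public static void main(String[] args) {
        double[] red  = { 1.0, 0.0, 0.0 };
        double[] blue = { 0.0, 0.0, 1.0 };
        double[] c = blend(red, blue, 0.3); //30% red, 70% blue
        System.out.printf("(%.2f, %.2f, %.2f)%n", c[0], c[1], c[2]);
    }
}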
The YUV color space describes a color by a luminance component Y and two color differences U and V:

$$Y = k_r R + k_g G + k_b B, \qquad U = B - Y, \qquad V = R - Y \tag{3.4}$$

with

$$0 \le k_r, k_g, k_b \quad \text{and} \quad k_r + k_g + k_b = 1 \tag{3.5}$$
Note that equations (3.4) and (3.5) imply that 0 ≤ Y ≤ 1 if the R, G, B components lie within the unit color cube. However, U and V can be negative. Typically,

kr = 0.299, kg = 0.587, kb = 0.114        (3.6)
which are the values used in some TV standards. For convenience, in the forthcoming discussions we always assume that 0 ≤ R, G, B ≤ 1 unless otherwise stated. The complete description of an image is specified by Y (the luminance component) and the two color differences (chrominance) U and V. If the image is black-and-white, U = V = 0. Note that we do not need another difference (G − Y) for the green component, because that would be redundant: we can consider (3.4) as three equations in the three unknowns R, G, B, so we can always solve for the three unknowns and recover R, G, B; a fourth equation is not necessary. It seems that there is no advantage of using YUV over RGB to represent an image, as both systems require three components to specify an image sample. However, as we mentioned earlier, human eyes are less sensitive to color than to luminance. Therefore, we can represent the U and V components with a lower resolution than Y, and the reduction in the amount of data representing the chrominance components will not have an obvious effect on visual quality. Representing chroma with fewer bits than luma is a simple but effective way of compressing an image.
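To make the recovery step explicit, solving (3.4) for the three unknowns gives the following inversion, a short derivation from the definitions above (not stated in this form in the text):

$$B = Y + U, \qquad R = Y + V, \qquad G = \frac{Y - k_r R - k_b B}{k_g} = Y - \frac{k_r}{k_g} V - \frac{k_b}{k_g} U$$

With the typical values of (3.6), the ratios are kr/kg ≈ 0.509 and kb/kg ≈ 0.194.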
If we use the ITU standard values kb = 0.114, kr = 0.299, kg = 1 − kb − kr = 0.587 in (3.7) and (3.8), we obtain the following commonly used conversion equations:

Y = 0.299R + 0.587G + 0.114B
Cb = 0.564(B − Y) + 0.5
Cr = 0.713(R − Y) + 0.5        (3.9)

R = Y + 1.402Cr − 0.701
G = Y − 0.714Cr − 0.344Cb + 0.529
B = Y + 1.772Cb − 0.886

From (3.7), it is obvious that 0 ≤ Y ≤ 1. It turns out that the chrominance components Cb and Cr defined in (3.7) also always lie within the range [0, 1]. We prove this for the case of Cb. From (3.7), we have
$$\begin{aligned} C_b &= 0.564(B - Y) + 0.5 \\ &= 0.564(B - 0.299R - 0.587G - 0.114B) + 0.5 \\ &= 0.564(0.886B - 0.299R - 0.587G) + 0.5 \\ &\ge 0.564(0 - 0.299 - 0.587) + 0.5 \\ &\approx -0.5 + 0.5 = 0 \end{aligned}$$

Thus

Cb ≥ 0        (3.10)
Also,

$$\begin{aligned} C_b &= 0.564(0.886B - 0.299R - 0.587G) + 0.5 \\ &\le 0.564(0.886 \times 1 - 0 - 0) + 0.5 \\ &\approx 0.5 + 0.5 = 1 \end{aligned}$$

Thus

Cb ≤ 1        (3.11)

Combining (3.10) and (3.11), we have

0 ≤ Cb ≤ 1        (3.12)

Similarly,

0 ≤ Cr ≤ 1        (3.13)

In summary, we have the following situation: if

0 ≤ R, G, B ≤ 1

then

0 ≤ Y, Cb, Cr ≤ 1        (3.14)

Note that the converse is not true. That is, if 0 ≤ Y, Cb, Cr ≤ 1, it does not imply 0 ≤ R, G, B ≤ 1. Knowing this helps us in the implementation of the conversion from RGB to YCbCr and vice versa. We mentioned earlier that the eye can only resolve about 200 different intensity levels of each of the RGB components. Therefore, we can quantize each of the RGB components in the interval [0, 1] to 256 values, from 0 to 255, which can be represented by one byte of storage without any loss of visual quality. In other words, one byte (an 8-bit unsigned integer) is enough to represent all the values of each RGB component. When we convert from RGB to YCbCr, it likewise requires only one 8-bit unsigned integer to represent each YCbCr component. This implies that all conversions can be done efficiently in integer arithmetic, which we shall discuss below.
floating-point implementation. The Java programs presented in this book are mainly for illustration of concepts. In most cases, error checking is omitted and some variable values are hard-coded.
import java.io.*;

class Rgbyccf {
    public static void main(String[] args) {
        //0 <= R, G, B <= 1, sample values
        double R = 0.3, G = 0.7, B = 0.2, Y, Cb, Cr;
        System.out.printf("\nOriginal R, G, B:\t%f, %f, %f", R, G, B);

        Y = 0.299 * R + 0.587 * G + 0.114 * B;
        Cb = 0.564 * (B - Y) + 0.5;
        Cr = 0.713 * (R - Y) + 0.5;
        System.out.printf("\nConverted Y, Cb, Cr:\t%f, %f, %f", Y, Cb, Cr);

        //recovering R, G, B
        R = Y + 1.402 * Cr - 0.701;
        G = Y - 0.714 * Cr - 0.344 * Cb + 0.529;
        B = Y + 1.772 * Cb - 0.886;
        System.out.printf("\nRecovered R, G, B:\t%f, %f, %f\n\n", R, G, B);
    }
}
The recovered R, G, and B values differ slightly from the original ones due to rounding errors in the computation and in the binary representation of the numbers.
To carry out the conversions in integer arithmetic, we first scale the coefficients of (3.9) by 2^16 = 65536 and round them to integers:

0.299 × 2^16 ≈ 19595, 0.587 × 2^16 ≈ 38470, 0.114 × 2^16 ≈ 7471        (3.15)

so that the luminance equation becomes

2^16 Y = 19595R + 38470G + 7471B        (3.16)

At the same time, we quantize the R, G, and B values from [0, 1] to 0, 1, ..., 255, which can be done by multiplying the floating-point values by 255. We also need to quantize the shifting constants 0.5, 0.701, 0.529, and 0.886 of (3.9) using the same rule, multiplying them by 255:

0.5 × 255 ≈ 128
0.701 × 255 ≈ 179
0.529 × 255 ≈ 135
0.886 × 255 ≈ 226        (3.17)
Actually, representing each RGB component with integer values 0 to 255 is the natural way for a modern computer to handle color data: each pixel has three components (R, G, and B), and each component value is saved as an 8-bit unsigned number. As shown in (3.9), in floating-point representation the Cb component is given by

Cb = 0.564(B − Y) + 0.5

After quantization, it becomes

Cb = 0.564(B − Y) + 128        (3.18)

Multiplying (3.18) by 2^16, we obtain

2^16 Cb = 36962(B − Y) + 128 × 2^16        (3.19)

The corresponding equation for Cr is

2^16 Cr = 46727(R − Y) + 128 × 2^16        (3.20)
As R, G, and B have become integers, we can carry out the calculations using integer multiplications and then divide the result by 2^16. In binary arithmetic, dividing a value by 2^16 is the same as shifting the value right by 16 bits. Therefore, from (3.16), (3.19) and (3.20), the calculations of Y, Cb, and Cr using integer arithmetic can be carried out with the following piece of Java code:

Y = (19595 * R + 38470 * G + 7471 * B) >> 16;
Cb = (36962 * (B - Y) >> 16) + 128;        (3.21)
Cr = (46727 * (R - Y) >> 16) + 128;
One should note that the sum of the coefficients in calculating Y is 2^16 (i.e. 19595 + 38470 + 7471 = 65536 = 2^16), corresponding to the requirement kr + kg + kb = 1 in the floating-point representation. The constraints of (3.14) and the requirement 0 ≤ R, G, B ≤ 255 imply that in our integer representation,

0 ≤ Y ≤ 255
0 ≤ Cb ≤ 255        (3.22)
0 ≤ Cr ≤ 255

In (3.9) the R component is obtained from Y and Cr:

R = Y + 1.402Cr − 0.701

In integer arithmetic, this becomes

2^16 R = 2^16 Y + 91881Cr − 2^16 × 179        (3.23)

The value of R is obtained by dividing (3.23) by 2^16, as shown below in Java code:

R = Y + (91881 * Cr >> 16) - 179;        (3.24)
We can obtain similar equations for G and B. Combining all these, the equations of (3.9), expressed in integer arithmetic and in Java code, take the following form:

Y = (19595 * R + 38470 * G + 7471 * B) >> 16;
Cb = (36962 * (B - Y) >> 16) + 128;
Cr = (46727 * (R - Y) >> 16) + 128;        (3.25)
R = Y + (91881 * Cr >> 16) - 179;
G = Y - ((46793 * Cr + 22544 * Cb) >> 16) + 135;
B = Y + (116129 * Cb >> 16) - 226;

In (3.25), it is obvious that a 32-bit integer is large enough to hold any intermediate calculation. Program Listing 3-2 below shows its implementation. The program generates the outputs shown below.
/* Rgbycci.java
 * Simple program to demonstrate conversion from RGB to YCbCr and vice
 * versa using ITU-R recommendation BT.601, and integer arithmetic.
 * Since Java does not have data type "unsigned char", we use "int".
 * Compile: $javac Rgbycci.java
 * Execute: $java Rgbycci
 */
import java.io.*;

/* Note:
 * 2^16 = 65536
 * kr = 0.299 = 19595 / 2^16
 * kg = 0.587 = 38470 / 2^16
 * kb = 0.114 = 7471 / 2^16
 * 0.5 = 128 / 255
 * 0.564 = 36962 / 2^16
 * 0.713 = 46727 / 2^16
 * 1.402 = 91881 / 2^16
 * 0.701 = 179 / 255
 * 0.714 = 46793 / 2^16
 * 0.344 = 22544 / 2^16
 * 0.529 = 135 / 255
 * 1.772 = 116129 / 2^16
 * 0.886 = 226 / 255
 */
class Rgbycci {
    public static void main(String[] args) {
        int R, G, B;    //RGB components
        int Y, Cb, Cr;  //YCbCr components

        //some sample values for demo
        R = 252; G = 120; B = 3;

        //convert from RGB to YCbCr
        Y = (19595 * R + 38470 * G + 7471 * B) >> 16;
        Cb = (36962 * (B - Y) >> 16) + 128;
        Cr = (46727 * (R - Y) >> 16) + 128;
        System.out.printf("\nOriginal RGB & corresponding YCbCr values:");
        System.out.printf("\n\tR = %6d, G  = %6d, B  = %6d", R, G, B);
        System.out.printf("\n\tY = %6d, Cb = %6d, Cr = %6d", Y, Cb, Cr);

        //convert from YCbCr to RGB
        R = Y + (91881 * Cr >> 16) - 179;
        G = Y - ((22544 * Cb + 46793 * Cr) >> 16) + 135;
        B = Y + (116129 * Cb >> 16) - 226;
        System.out.printf("\n\nRecovered RGB values:");
        System.out.printf("\n\tR = %6d, G  = %6d, B  = %6d\n\n", R, G, B);
    }
}
Original RGB & corresponding YCbCr values:
	R =    252, G  =    120, B  =      3
	Y =    146, Cb =     47, Cr =    203

Recovered RGB values:
	R =    251, G  =    120, B  =      3
Again, some precision has been lost when we recover R, G, and B from the converted Y, Cb, and Cr values. This is due to the right shifts in the calculations, which are essentially truncation operations. Because of rounding or truncation errors, the recovered R, G, and B values may not lie within the range [0, 255]. To remedy this, we can have a function that checks each recovered value: if the value is smaller than 0, we set it to 0, and if it is larger than 255, we set it to 255. For example,

if ( R < 0 ) R = 0;
else if ( R > 255 ) R = 255;

However, this check is not necessary when we convert from RGB to YCbCr. This is because from (3.14) we know that we always have 0 ≤ Y, Cb, Cr ≤ 1. For any real number a with 0 ≤ a ≤ 1 and any positive integer I,

0 ≤ Round(aI) ≤ Round(I) = I

and similarly

0 ≤ Truncate(aI) ≤ I

This implies that after quantization and rounding, we always have 0 ≤ Y, Cb, Cr ≤ 255.
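Wrapped as a reusable method, the check might look like the following minimal Java sketch; the method name clamp is our own choice, not from the text.

class ClampUtil {
    //constrain a recovered component to the valid 8-bit range [0, 255]
    static int clamp(int v) {
        if (v < 0) return 0;
        if (v > 255) return 255;
        return v;
    }
    //usage example: R = clamp(Y + (91881 * Cr >> 16) - 179);
}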
that people use to measure video quality: subjective tests, where human subjects are asked to assess or rank the images, and objective tests, which compute the distortions between the original and processed video sequences.
Subjective quality measurement asks human subjects to rank the quality of a video based on their own perception and understanding of quality. For example, a viewer can be asked to rate the quality on a 5-point scale, with ratings ranging from bad to excellent, as shown in Figure 3-6.

100  Excellent: Imperceptible
 80  Good: Perceptible
 60  Fair: Slightly Annoying
 40  Poor: Annoying
 20  Bad: Very Annoying
  0

Figure 3-6 Example of video quality assessment scale used in subjective tests

Very often, a viewer's perception of a video is affected by many factors, like the viewing environment, the lighting conditions, display size and resolution, the viewing distance, the state of mind of the viewer, whether the material is interesting to the viewer, and how the viewer interacts with the visual scene. It is not uncommon that the same viewer observing the same video at different times under different environments may give significantly different evaluations of its quality. For example, it has been shown that subjective quality ratings of the same video sequence are usually higher when it is accompanied by good quality sound, which may lower the evaluator's ability to detect impairments. Also, viewers tend to give higher ratings to images with higher contrast or more colorful scenes, even though objective testing shows that they have larger distortions in comparison to the originals. Nevertheless, subjective quality assessment still remains the most reliable method of measuring video quality. It is also the most efficient method to test the performance of components like video codecs, human vision models, and objective quality assessment metrics.
quality by continuously adjusting a side slider on the DSCQS scale, ranging from bad to excellent. The slider value is sampled periodically, every 1 to 2 seconds. Using this method, differences between alternative transmission configurations can be analyzed in a more informative manner. However, as the assessor has to adjust the slider from time to time, she may be distracted and thus the rating may be compromised. Also, because of the recency or memory effect, it is quite difficult for the assessor to consistently detect momentary changes in quality, leading to stability and reliability problems in the results.
$$MSE = \frac{1}{NM} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} (X_{ij} - Y_{ij})^2 \tag{3.27}$$
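As a concrete illustration, the following minimal Java sketch computes PSNR from the MSE of (3.27) for two 8-bit greyscale frames, assuming the standard definition PSNR = 10 log10(255² / MSE); the class and method names are illustrative.

class Psnr {
    //PSNR in dB between an original frame x and a processed frame y,
    //both stored as N x M arrays of 8-bit sample values
    static double psnr(int[][] x, int[][] y) {
        int n = x.length, m = x[0].length;
        double mse = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++) {
                double d = x[i][j] - y[i][j];
                mse += d * d;
            }
        mse /= (double) (n * m);               //Equation (3.27)
        if (mse == 0.0) return Double.POSITIVE_INFINITY; //identical frames
        return 10.0 * Math.log10(255.0 * 255.0 / mse);
    }
}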
Though PSNR is a straightforward metric to calculate, it cannot describe distortions as perceived by a complex and multi-dimensional system like the human visual system (HVS), and thus fails to give good evaluations in many cases. For example, a viewer may be interested in an object in an image but not in its background. If the background is heavily distorted, the viewer may still rate the image as being of high quality, while the PSNR measure would indicate that the image is of poor quality. The limitations of this metric have led recent research in image processing to focus on developing metrics that resemble the response of real human viewers. Many approaches have been proposed, but none of them has been accepted as a standard alternative to subjective evaluation. The search for a good, widely accepted objective test for images will remain a research topic for some time.