Advanced Image Processing with DirectX® 9 Pixel Shaders
Introduction
With the introduction of the ps_2_0 pixel shader model in DirectX 9.0, we are
able to significantly expand our ability to use consumer graphics hardware to perform
image processing operations. This is due to the longer program length, the ability to
sample more times from the input image(s) and the addition of floating point internal data
representation. In the first ShaderX book, we used the ps_1_4 pixel shader model in
DirectX 8.1 to perform basic image processing techniques such as simple blurs, edge
detection, transfer functions and morphological operators [Mitchell02]. In this chapter,
we will extend our image processing toolbox to include color space conversion, a better
edge detection filter called the Canny filter, separable Gaussian and median filters, and a
real-time implementation of the Fast Fourier Transform.
Review
As shown in our original image processing chapter in the first ShaderX book,
post-processing of 3D frames is fundamental to producing a variety of interesting effects
in game scenes. Image processing is performed on a GPU by using the source image as a
texture and drawing a screen-aligned quadrilateral into the back buffer or another texture.
A pixel shader is used to process the input image to produce the desired result in the
render target.
Figure 1 - Using a pixel shader for image processing by rendering from one image to another
Image processing is especially powerful when the color of the destination pixel is
the result of computations done on multiple pixels from the source image. In this case,
we sample the source image multiple times and use the pixel shader to combine the data
from the multiple samples (or taps) to produce a single output.
Color Space Conversion
For those who may not be familiar with HSV space, it is a color space which is
designed to be intuitive to artists who think of a color’s tint, shade and tone [Smith78].
Interpolation in this color space can be more aesthetically pleasing than interpolation in
RGB space. Additionally, when comparing colors, it may be desirable to do so in HSV
space. For example, in RGB space, the color {100, 0, 0} is very different from the color
{0, 0, 100}. However, their V components in HSV space are equal. Colors, represented
by {hue, saturation, value} triples, are defined to lie within a hexagonal pyramid as
shown in Figure 2 below.
From ShaderX2 – Shader Programming Tips and Tricks with DirectX 9
The hue of a color is represented by an angle between 0° and 360° around the
central axis of the hexagonal cone. A color’s saturation is the distance from the central
(achromatic) axis and its value is the distance along the axis. Both saturation and value
are defined to be between 0 and 1.
// Excerpt from the RGB-to-HSV conversion shader
r = color[0];
g = color[1];
b = color[2];
colorMax = max(r, max(g, b));
colorMin = min(r, min(g, b));
v = colorMax;                 // value is the largest component
if (colorMax != 0)
{
   s = (colorMax - colorMin) / colorMax;
}
if (s != 0) // if not achromatic
{
   delta = colorMax - colorMin;
   if (r == colorMax)
   {
      h = (g - b) / delta;
   }
   else if (g == colorMax)
   {
      h = 2.0 + (b - r) / delta;
   }
   else // b is max
   {
      h = 4.0 + (r - g) / delta;
   }
   h *= 60;
   if (h < 0)
   {
      h += 360;
   }
}
hsv = float4(h / 360.0, s, v, 1.0); // pack hue into the 0..1 range
return hsv;
}
if (hsv[1] == 0)      // achromatic case: gray
{
   color.rgb = hsv[2];
}
else
{
h = hsv.x * 360.0;
s = hsv.y;
v = hsv.z;
if (h == 360.0)
{
   h = 0;
}
h /= 60;
i = floor(h);
f = h - i;
p = v * (1.0 - s);
q = v * (1.0 - (s * f));
t = v * (1.0 - (s * (1.0 - f)));
if (i == 0)
{
r = v;
g = t;
b = p;
}
else if (i == 1)
{
r = q;
g = v;
b = p;
}
else if (i == 2)
{
r = p;
g = v;
b = t;
}
else if (i == 3)
{
r = p;
g = q;
b = v;
}
else if (i == 4)
{
r = t;
g = p;
b = v;
}
else if (i == 5)
{
r = v;
g = p;
b = q;
}
color.r = r;
color.g = g;
color.b = b;
}
return color;
}
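As a sanity check for the two shader excerpts above, the same hexcone math can be run on the CPU. The following Python sketch uses our own helper names (not from the shaders) and keeps hue as an angle in degrees rather than packed into the 0 to 1 range; it should round-trip any RGB color:

```python
def rgb_to_hsv(r, g, b):
    """Convert r, g, b in [0, 1] to (h, s, v), with h in degrees [0, 360)."""
    color_max = max(r, g, b)
    color_min = min(r, g, b)
    v = color_max                      # value is the largest component
    s = 0.0 if color_max == 0 else (color_max - color_min) / color_max
    h = 0.0
    if s != 0:                         # not achromatic
        delta = color_max - color_min
        if r == color_max:
            h = (g - b) / delta        # between yellow and magenta
        elif g == color_max:
            h = 2.0 + (b - r) / delta  # between cyan and yellow
        else:                          # b is max: between magenta and cyan
            h = 4.0 + (r - g) / delta
        h *= 60
        if h < 0:
            h += 360
    return h, s, v

def hsv_to_rgb(h, s, v):
    """Inverse hexcone conversion; h in degrees [0, 360]."""
    if s == 0:                         # achromatic: gray
        return v, v, v
    if h == 360.0:
        h = 0
    h /= 60
    i = int(h)                         # sector 0..5 of the hexcone
    f = h - i                          # fractional position within the sector
    p = v * (1.0 - s)
    q = v * (1.0 - s * f)
    t = v * (1.0 - s * (1.0 - f))
    return [(v, t, p), (q, v, p), (p, v, t),
            (p, q, v), (t, p, v), (v, p, q)][i]
```

Converting a color to HSV and back should reproduce it to within floating-point precision, which is a quick way to validate a shader port of this logic.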
It is worth noting that RGB and HSV are not the only color spaces of interest in
computer graphics. For example, the original paper [Smith78] which introduced HSV
also introduced a color space called HSL (for hue, saturation and lightness) where L is
often the same as the Luminance (Y) channel used in the YIQ color space. If you are
interested in learning more about color spaces, [Smith78] and [Foley90] both provide
excellent discussions.
Step-by-Step Approach
As outlined in [Jain95], the Canny edge detection filter can be implemented by
performing the following operations:
1. Apply a Gaussian blur to the input image to reduce noise.
2. Compute the partial derivatives of the smoothed image.
3. Compute the magnitude and direction of the gradient at each pixel.
4. Suppress pixels which are not local maxima along the gradient direction
(non-maximum suppression) and apply a threshold.
The next step in the process is computation of the partial derivatives (P and Q) in
the u and v directions respectively:
Then the magnitude of the derivative is computed using the standard formula:
Magnitude = √(P² + Q²)
Finally, the P and Q values are used to determine the direction of the edge at that
texel using the standard equation:
θ = atan2(Q, P)
Magnitude and θ are written out to an image so that the next shader can use them
to complete the Canny filter operation. The edge direction, θ, is a signed quantity in the
range of -π to π and must be packed into the 0 to 1 range in order to prevent loss of data
between rendering passes. In order to do this, we will pack it by computing:
A = abs(θ) / π
You’ve probably noticed that, due to the absolute value, this function is not
invertible, hence data is effectively lost. This does not present a problem for this
particular application due to symmetries in the following step.
The final pass involves sampling the image to get the Magnitude and the edge
direction, θ, at the current location. The edge direction, θ, must now be unpacked into its
proper range. Figure 3 below shows a partitioning of all values of θ (in degrees) into four
sectors.
The sectors are symmetric and map to the possible ways a line can pass through a
3×3 set of pixels. In the previous step, we took the absolute value of θ and divided it by
π to put it in the 0 to 1 range, so A is already nearly in the form we need. The symmetry
of the partitioning reduces the number of comparisons needed to find the correct
neighbors to sample. Normally, to complete the mapping we would simply multiply A by 4.
However, if you look closely at Figure 3 you will see that the sectors are centered
around 0° and 180°. In order to compensate for this, the proper equation is:

Sector = floor(4A + ½) mod 4

As you can see in Figure 4, the Canny filter produces one-pixel-wide edges, unlike
more basic filters such as a Sobel edge filter.
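The packing and sector classification described above can be prototyped on the CPU. The sketch below is an illustration of the mapping, not a transcription of the shader arithmetic: it packs θ with A = |θ|/π and then selects one of the four sectors of Figure 3, with sector 0 centered on 0° and, by wrap-around, on 180°:

```python
import math

def pack_theta(theta):
    """Pack theta in [-pi, pi] into [0, 1]; the sign is discarded, which
    is harmless here because the sectors are symmetric (see text)."""
    return abs(theta) / math.pi

def sector(a):
    """Map a packed direction A in [0, 1] to a sector index 0..3.
    Adding 0.5 before flooring centers sector 0 on A = 0, and the mod 4
    wraps A near 1 (i.e. 180 degrees) back into sector 0."""
    return int(math.floor(4 * a + 0.5)) % 4

# Directions of 0 and 180 degrees land in sector 0; 45, 90 and 135
# degrees select sectors 1, 2 and 3 respectively.
assert sector(pack_theta(0.0)) == 0
assert sector(pack_theta(math.pi)) == 0
assert sector(pack_theta(math.pi / 4)) == 1
```

Each sector then determines which pair of precomputed neighbor offsets the second shader pass samples along.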
Implementation Details
The four steps of the Canny edge detection filter described above have been
collapsed into two rendering passes, requiring the two shaders shown below. The first
shader computes the gradients P and Q followed by the Magnitude and direction (θ).
After packing θ into the 0 to 1 range, Magnitude and θ are written out to a temporary
surface.
sampler InputImage;
float2 sampleOffsets[8] : register (c10);
struct PS_INPUT
{
float2 texCoord:TEXCOORD0;
};
   // (sampling and kernel setup elided; pKernel parallels qKernel)
   for (i = 0; i < 8; i++)
   {
      p += texSamples[i] * pKernel[i];
      q += texSamples[i] * qKernel[i];
   }
   p /= 2.0;
   q /= 2.0;
In the second pass of the Canny edge detector, Magnitude and θ are read back
from the temporary surface. The edge direction, θ, is classified into one of four sectors
and the neighbors along the proper direction are sampled using dependent reads. The
Magnitudes of these neighbor samples along with a user-defined threshold are then used
to determine whether this pixel is a local maximum or not, resulting in either 0 or 1 being
output as the final result.
sampler InputImage;
struct PS_INPUT
{
float2 texCoord:TEXCOORD0;
};
Magnitude = texSamples[0].r;
// Must unpack theta. The prior pass packed theta into the 0 to 1 range,
// but we really want the sector index 0, 1, 2 or 3. See [Jain95]
// for more details.
Theta = (Theta - PI/16) * 4; // Now theta is between 0 and 4
Theta = floor(Theta);        // Now theta is an integer sector index
if( Theta == 0)
{
texCoords[1] = In.texCoord + sampleOffsets[4];
texCoords[2] = In.texCoord + sampleOffsets[3];
}
else if(Theta == 1)
{
texCoords[1] = In.texCoord + sampleOffsets[2];
texCoords[2] = In.texCoord + sampleOffsets[5];
}
else if(Theta == 2)
{
texCoords[1] = In.texCoord + sampleOffsets[1];
texCoords[2] = In.texCoord + sampleOffsets[6];
}
else //if(Theta == 3)
{
texCoords[1] = In.texCoord + sampleOffsets[0];
texCoords[2] = In.texCoord + sampleOffsets[7];
}
   // Sampling at texCoords[1] and texCoords[2] elided; result defaults to 0.
   if ((Magnitude > texSamples[1].r) &&
       (Magnitude > texSamples[2].r) &&
       (Magnitude > Threshold))
   {
      result = 1;
   }
return result;
}
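Stripped of the texture fetches, the decision made by this second pass is a local-maximum test against the two neighbors along the gradient direction, plus a threshold. A minimal CPU-side sketch of that test (function and parameter names are hypothetical):

```python
def is_edge(magnitude, neighbor_a, neighbor_b, threshold):
    """Non-maximum suppression for one pixel: the pixel survives only if
    its gradient magnitude exceeds the user-defined threshold and both
    neighbor magnitudes sampled along the gradient direction."""
    if magnitude < threshold:
        return 0
    if magnitude < neighbor_a or magnitude < neighbor_b:
        return 0
    return 1
```

Because only the local maximum along the gradient survives, ridges of high gradient magnitude collapse to the one-pixel-wide edges shown in Figure 4.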
You can see in Figure 4 that this produces one-pixel-wide edges, which may be
more desirable for some applications. You may see some gaps in the detected edges and,
in some cases, it may be useful to apply a dilation operation to fill in these gaps
[Mitchell02].
Separable Techniques
Certain filtering operations have inherent symmetry which allows us to implement
them more efficiently in a separable manner. That is, we can perform these 2D image
processing operations with a sequence of 1D operations and obtain equivalent results
with less computation. Conversely, we can implement a large separable filter kernel with
the same amount of computation as a small non-separable filter. This is particularly
important when attempting to apply “blooms” to final frames in high dynamic range
space to simulate light scattering. In this final section of the chapter, we will discuss
three separable filtering operations: the Gaussian blur, a median filter approximation and
the Fast Fourier Transform.
Separable Gaussian
A very commonly-used separable filter is the Gaussian filter, which can be used
to perform blurring of 2D images. The 2D isotropic (i.e. circularly symmetric) Gaussian
filter, g2D(x, y), samples a circular neighborhood of pixels from the input image and
computes their weighted average, according to the following equation:
g₂D(x, y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))
where σ is the standard deviation of the Gaussian and x and y are the coordinates
of image samples relative to the center of the filter. The standard deviation, σ,
determines the size of the filter.
What this means is that we will sample a local area of texels from the input image
and weight them according to the above equation. For example, for a Gaussian with σ =
1, we compute the following filter kernel (after normalization).
In theory, the Gaussian has infinite extent, but the contribution to the final result
is insignificant for input texels outside of a 5×5 region for this σ.
Because the exponential factors, the 2D Gaussian can be rewritten as a product of
two 1D Gaussians:

g₂D(x, y) = (1 / (√(2π)σ)) e^(−x² / (2σ²)) · (1 / (√(2π)σ)) e^(−y² / (2σ²))
          = g₁D(x) · g₁D(y)
This means that we can implement a given Gaussian with a series of 1D filtering
operations: one horizontal (g1D(x)) and one vertical (g1D(y)). This allows us to implement
Gaussians with much larger kernels (larger σ) while performing the same amount of
calculations that would be required to implement a smaller non-separable filter kernel.
This technique was used in our real-time implementation of Paul Debevec’s Rendering
with Natural Light animation as seen in Figure 7.
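Separability is easy to verify numerically: the outer product of a normalized 1D Gaussian kernel with itself reproduces the normalized 2D kernel. A small Python sketch for the σ = 1 case:

```python
import math

def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian filter kernel with 2 * radius + 1 taps."""
    taps = [math.exp(-(x * x) / (2.0 * sigma * sigma))
            for x in range(-radius, radius + 1)]
    total = sum(taps)                  # normalize so the taps sum to one
    return [t / total for t in taps]

# 5-tap kernel for sigma = 1; its outer product with itself is the
# separable 5x5 kernel, so two 1D passes (10 taps per pixel) replace
# a single 2D pass of 25 taps.
g = gaussian_kernel_1d(1.0, 2)
g2d = [[gx * gy for gx in g] for gy in g]
```

For the 25×25 kernel used below, the savings grow to 50 taps per pixel instead of 625.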
After rendering the scene in high dynamic range space, Debevec performed a
number of large Gaussian blurs on his 2D rendered scene to obtain blooms on bright
areas of the scene. In order to do this in real-time, we exploited the Gaussian’s
separability to perform the operation efficiently. In our case, we used σ = 7, which
resulted in a 25×25 Gaussian.
Due to the fact that we have only eight texture coordinate interpolators in the
ps_2_0 pixel shader programming model, we must derive some of our texture coordinates
in the pixel shader as deltas from the center tap location. To make the most efficient use
of the hardware, we will perform as many reads from the input image as possible using
non-dependent texture reads.
In our implementation, we divided our samples into three types: inner taps, outer
taps and the center tap. The center tap (c) and inner taps (x) shown in Figure 8 below are
performed using interpolated texture coordinates (and hence non-dependent texture
reads).
The outer taps (o) shown in Figure 8 are sampled using texture coordinates
computed in the pixel shader. That is, they are done with dependent reads. Note that the
center tap (c) uses pick-nearest filtering and is aligned with the center of a specific texel
in the input image. The other 12 taps all use bilinear filtering and are aligned so that they
sample from two different texels in the input image. This Gaussian filter is implemented
in HLSL in the following shader:
float2 outerTaps[NUM_OUTER_TAPS];
outerTaps[0] = tapZero + gTexelOffset[0]; // coord for taps 7, 8
outerTaps[1] = tapZero - gTexelOffset[0]; // coord for taps -7, -8
outerTaps[2] = tapZero + gTexelOffset[1]; // coord for taps 9, 10
outerTaps[3] = tapZero - gTexelOffset[1]; // coord for taps -9, -10
outerTaps[4] = tapZero + gTexelOffset[2]; // coord for taps 11, 12
outerTaps[5] = tapZero - gTexelOffset[2]; // coord for taps -11, -12
return accum;
}
Applying this shader twice in succession (once with horizontal and once with vertical
offsets in the gTexelOffset[] table), we compute a 25×25 Gaussian blur and achieve the
bloom effect we are looking for.
Separable Median Filter Approximation
Another important filter in image processing is the median filter, the output of
which is the median of the set of input data sampled by the filter kernel. For those who
may not recall, the median of a set of values is the middle value after sorting or ranking
the data. For example, if you have the following set of numbers {9, 3, 6, 1, 2, 2, 8}, you
can sort them to get {1, 2, 2, 3, 6, 8, 9} and select the middle value 3. Hence, the median
of these values is 3. In image processing, a median filter is commonly used to remove
“salt and pepper noise” from images prior to performing other image processing
operations. It is good for this kind of operation because it is not unduly influenced by
outliers in the input data (i.e. the noise) the way that a mean would be. Additionally, the
output of a median filter is guaranteed to be a value which actually appears in the input
image data; a mean does not have this property.
We can first take the median of the rows of the ranked data:
We can then take the median of these medians to get an approximation to the
median of the whole 3×3 region:
From this, we obtain the data in the 5th ranked image sample, which is the correct
value. We say that this method is only an approximation to a true median filter because
the true median will not be found if the ranked data is not so evenly distributed within the
filter kernel. For example, if we have the following ranked data, we can get an incorrect
median:
For a 3×3 filter kernel, however, the worst case that this separable median filter
implementation will give you is the 4th or the 6th rank instead of the 5th, which may be
adequate for many applications.
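This worst-case-rank claim can be checked on the CPU. The sketch below implements the separable approximation as the median of the three row medians and verifies that, over random distinct data, the result always falls between the 4th and 6th ranks and always appears in the input:

```python
import random

def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]

def separable_median3x3(rows):
    """Approximate median of a 3x3 neighborhood: take the median of each
    row, then the median of those three row medians."""
    return median3(*(median3(*row) for row in rows))

random.seed(1)
for _ in range(1000):
    values = random.sample(range(100), 9)   # nine distinct samples
    rows = [values[0:3], values[3:6], values[6:9]]
    approx = separable_median3x3(rows)
    rank = sorted(values).index(approx)     # 0-based rank of the result
    assert 3 <= rank <= 5                   # 4th..6th in 1-based terms
    assert approx in values                 # output appears in the input
```

The randomized check mirrors the argument in the text: the median of row medians is always greater than at least three of the nine values and less than at least three, so its rank is bounded by the 4th and 6th.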
float FindMedian(float a, float b, float c)
{
   float median;
   if (a < b)
   {
      if (b < c)
      {
         median = b;
      }
      else
      {
         median = max(a, c);
      }
   }
   else
   {
      if (a < c)
      {
         median = a;
      }
      else
      {
         median = max(b, c);
      }
   }
   return median;
}
The first pass of the 3×3 median filter, shown below, takes three samples from
the input image: the texel at the current location and the left and right neighbors. The
median red, green and blue values are found independently and the result is written out to
a temporary surface.
sampler InputImage;
float2 sampleOffsets[8];
struct PS_INPUT
{
float2 texCoord:TEXCOORD0;
};
result.r = FindMedian(texSamples[0].r,texSamples[1].r,
texSamples[2].r);
result.g = FindMedian(texSamples[0].g,texSamples[1].g,
texSamples[2].g);
result.b = FindMedian(texSamples[0].b,texSamples[1].b,
texSamples[2].b);
result.a = 0;
return result;
}
In the second pass of the 3×3 median filter, the texel at the current location and
the top and bottom neighbors are sampled. The median red, green and blue values are
found independently and the final result of the shader is computed.
sampler InputImage;
float2 sampleOffsets[8];
struct PS_INPUT
{
float2 texCoord:TEXCOORD0;
};
result.r = FindMedian(texSamples[0].r,texSamples[1].r,
texSamples[2].r);
result.g = FindMedian(texSamples[0].g,texSamples[1].g,
texSamples[2].g);
result.b = FindMedian(texSamples[0].b,texSamples[1].b,
texSamples[2].b);
result.a = 0;
return result;
}
(a) Original (b) One Median Pass (c) Two Median Passes
Figure 9 – Median Filter Results
The original image (9a) has had some noise added to it. With only one pass of the
median filter, much of the noise is removed (9b). Applying the median filter a second
time eliminates the noise almost completely (9c). Median-filtering the red, green and
blue channels of the image independently is a somewhat arbitrary decision that seems to
work well for our data. You may find that another approach, such as converting to
luminance and then determining the median luminance, works better for your data.
Fourier Transform
The Fourier transform converts an image from the spatial domain into the frequency
domain, and the Fast Fourier Transform (FFT) [Cooley65] is the standard algorithm for
computing it efficiently. The equation for performing a DFT on a two-dimensional image
is shown below (the inverse transform differs only in the sign of the exponent and the
placement of the normalization factor):
Fourier Transform

H(u, v) = (1 / MN) · Σ(x=0 to M−1) Σ(y=0 to N−1) h(x, y) · e^(−i2π(ux/M + vy/N))
where h(x, y) is the value of the pixel located at location (x, y), H(u, v) is the
value of the image in frequency space at location (u, v), M is the width of the
image in pixels, and N is the height of the image in pixels.
For these equations, it is important to remember that these are complex numbers (i
is the square root of negative one). Additionally, from complex math:

e^(iθ) = cos(θ) + i·sin(θ)
GPU Implementation
The first thing to note when using a GPU to implement an FFT based on the
decimation in time algorithm is that, to maintain most of its efficiency improvements, the
algorithm must be implemented in multiple passes by rendering to floating point
temporary buffers. If the spatial domain image is color (i.e. has multiple channels) these
temporary buffers will need to be set up as multiple render targets since the frequency
domain representation of the image uses complex numbers, thus doubling the number of
channels on the output.
For a width × height image, the “Decimation in Time” FFT algorithm takes
log2(width) + log2(height) + 2 rendering passes to complete. For example, a 512×512
image takes 20 rendering passes, which renders at approximately 30 frames per second on
today’s fastest graphics processors. Because each step of the computation is based solely
on the previous step, we are able to conserve memory and ping-pong between two
floating-point renderable textures to implement the following steps of the
decimation-in-time algorithm:
Scramble
In this step, the data member at location i is swapped with the data member at
the location given by the bit-reversed address of i. The bit reverse of a given value is its
mirror image written out in binary. For example, the bit reverse of 0111 is 1110. Figure 10
shows an example of a scramble of a 16-element image.
Figure 10 – Scramble of a 16-element image (addresses 0000 through 1111; arrows
connect the values that are swapped)
Values connected by arrows in Figure 10 are swapped during the scramble step.
Obviously, symmetric values such as 0000, 0110, 1001 and 1111 are left in place. Since
pixel shaders can’t easily do such bit-twiddling of pixel addresses, the most effective way
to perform the scramble step is via a dependent read from the input image, using a
specially-authored scramble map stored in another texture to provide the bit-twiddled
address from which to do the dependent read. The shader to perform such a dependent
read for the horizontal scramble is shown below:
struct PS_INPUT
{
float1 scrambleLoc:TEXCOORD0;
float2 imagePos:TEXCOORD1;
};
fromPos = In.imagePos;
It is important to remember that the scramble map must contain enough bits to
uniquely address each texel in the source image. Typically, this means the texture should
be a sixteen-bit single channel texture, preferably an integer format such as D3DFMT_L16.
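Since the scramble map simply stores precomputed bit-reversed addresses, generating its contents on the CPU is straightforward. A Python sketch of the table an application might upload into that texture:

```python
def bit_reverse(index, bits):
    """Mirror the low 'bits' bits of index, e.g. 0111 -> 1110 for 4 bits."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the low bit
        index >>= 1
    return result

def scramble_map(width):
    """Bit-reversed address for each texel in a row of 'width' texels
    (width must be a power of two); this is the data the scramble map
    texture would store."""
    bits = width.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(width)]
```

Note that the map is a permutation, and palindromic addresses such as 0110 and 1001 map to themselves, matching the values left in place in Figure 10.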
Butterflies
Once the image has been scrambled, a series of butterfly operations are applied to
the image. In each butterfly pass, a pair of pixels is combined via a complex multiply
and add. Due to the inability of graphics processors to write to random locations in
memory, this operation must be done redundantly on both of the pixels in the pair,
which sacrifices some of the ideal FFT efficiency gains. The locations of the paired pixels
are encoded in a butterfly map. The butterfly map is as wide as the source image and has
one row for each butterfly step. The code for applying horizontal butterflies is shown
below.
struct PS_INPUT
{
float2 srcLocation:TEXCOORD0;
};
sampleCoord.x = srcLocation.x;
sampleCoord.y = pass;
// (fetch of butterflyVal from the butterfly map at sampleCoord elided)
//sample location A
sampleCoord.x = butterflyVal.y;
sampleCoord.y = srcLocation.y;
a = tex2D( sourceImage, sampleCoord).ra;
//sample location B
sampleCoord.x = abs(butterflyVal.x);
sampleCoord.y = srcLocation.y;
b = tex2D( sourceImage, sampleCoord).ra;
The shader performs an extremely simple operation to accomplish its goal. First,
it fetches a texture to determine where on this line of the image to get two parameters a
and b. This same texel contains a factor w that is combined with a and b to produce the
final result. From these parameters, the algorithm can actually produce two of the results
needed for the next pass (a’ and b’), but since GPU’s do not perform random writes to
memory, the texture also includes a flag for which value to leave at this location. The
following equation shows the math used to convert a and b to a’ and b’.
The shader only concerns itself with a single channel image and expects that the
real component is fetched into the first component and the imaginary component is
fetched into the fourth component. To handle more components, the shader does not need
to change significantly, but it will need to use separate textures and multiple render
targets to handle more than two channels simultaneously. The largest amount of magic is
in the special butterfly texture. This texture contains the offsets of the a and b parameters
to the function in its first two components and the real and imaginary parts of the w
parameter in its last two components. Additionally, the second texture coordinate is given
a sign to encode whether this execution of the shader should produce a’ or b’. To ensure
an accurate representation of all this with the ability to address a large texture, a
thirty-two-bit per component floating point texture is the safest choice.
After the scramble and butterfly passes are applied in the horizontal direction, the
same operations are applied to the columns of the image to get the vertical FFT. The
overall algorithm looks something like the following pseudo code:
// Horizontal scramble
LoadShader( HorizontalScramble);
SetTexture( ScrambleTexture[log2(width)]);
DrawQuad();
// Horizontal butterflies
LoadShader( HorizontalButterfly);
SetTexture( ButterflyTexture[log2(width)]);
for ( i = 0; i < log2( width); i++)
{
   SwapSurfacesAandB();
   SetShaderConstant( "pass", i/log2(width));
   DrawQuad();
}
// Vertical scramble
SwapSurfacesAandB();
LoadShader( VerticalScramble);
SetTexture( ScrambleTexture[log2(height)]);
DrawQuad();
// Vertical butterflies
LoadShader( VerticalButterfly);
SetTexture( ButterflyTexture[log2(height)]);
for ( i = 0; i < log2( height); i++)
{
   SwapSurfacesAandB();
   SetShaderConstant( "pass", i/log2(height));
   DrawQuad();
}
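The scramble and butterfly passes can be validated against a direct evaluation of the DFT using a CPU reference. The sketch below mirrors the structure above for a single row: a bit-reversed scramble followed by log2(n) butterfly passes applying a’ = a + w·b and b’ = a − w·b. The 1/MN normalization is omitted, since it can be folded into the final scaling pass of the inverse transform:

```python
import cmath

def fft_dit(samples):
    """Radix-2 decimation-in-time FFT of one row (length a power of two),
    mirroring the GPU's scramble pass plus log2(n) butterfly passes."""
    n = len(samples)
    bits = n.bit_length() - 1
    # Scramble: move each value to its bit-reversed address.
    data = [0j] * n
    for i, value in enumerate(samples):
        rev = int(format(i, f'0{bits}b')[::-1], 2)
        data[rev] = complex(value)
    # Butterfly passes: in each pass, paired values sit 'span' apart.
    span = 1
    while span < n:
        step = cmath.exp(-1j * cmath.pi / span)  # twiddle increment
        for start in range(0, n, 2 * span):
            w = 1 + 0j
            for k in range(start, start + span):
                a, b = data[k], data[k + span] * w
                data[k], data[k + span] = a + b, a - b  # a', b'
                w *= step
        span *= 2
    return data
```

Comparing the result against the DFT summation confirms that the scramble plus butterfly formulation computes the same transform in O(n log n) operations.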
To transform back to the spatial domain, the exact same operations are performed
on the data, except that, as one final step, the data has a scaling factor applied to bring it
into the correct range.
Results
So, now that we know how to apply an FFT to an image using the graphics
processor, what have we computed? What does this frequency domain representation
look like and what does it mean?
The output of the Fourier transform consists not only of complex numbers, but
also typically spans a dynamic range far greater than that which can be displayed directly
in print or on a monitor. As a result, the log of the magnitude of the frequency is
typically used when displaying the Fourier domain. The function used to visualize the
Fourier domain in this chapter is given below:

D(u, v) = k · log(1 + |H(u, v)|)

where k is a scaling constant chosen to fill the displayable range.
Finally, the image is also shifted into what is referred to as normal form. This is
done primarily as a way to simplify the interpretation of the data. The shift can be done
on graphics hardware by setting the texture wrap mode to repeat and biasing the texture
coordinates by (-0.5, -0.5). In this format, the lowest frequencies are all concentrated in
the center of the frequency-domain image and the frequencies are progressively higher
closer to the edges of the image.
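Both the log-magnitude display function and the shift to normal form are simple enough to prototype on the CPU. In this sketch the helper names and the scale constant are ours:

```python
import math

def log_magnitude(re, im, scale=1.0):
    """Compress the FFT output's huge dynamic range for display:
    scale * log(1 + |H|)."""
    return scale * math.log(1.0 + math.sqrt(re * re + im * im))

def shift_to_normal_form(image):
    """Move DC (the lowest frequency) to the center by swapping the
    quadrants of the frequency-domain image; on the GPU this is the
    wrap-mode texture fetch biased by (-0.5, -0.5)."""
    h, w = len(image), len(image[0])
    return [[image[(y + h // 2) % h][(x + w // 2) % w]
             for x in range(w)] for y in range(h)]
```

After the shift, the value that was at texel (0, 0) appears at the center of the image, with frequency increasing toward the edges.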
Besides just providing an interesting way to look at and analyze images, the
frequency space representation allows certain operations to be performed more efficiently
than they could be in the spatial domain.
First, removing high frequencies that contribute to aliasing can be most easily
performed in frequency space. The simplest implementation crops the image in frequency
space to remove the higher frequencies. This is the application of what is called the
ideal filter, but its results tend to be anything but ideal on an image of finite size. The
ideal filter has infinite extent in the spatial domain, so when the cropped image is
transformed back to the spatial domain, sharp edges will ring, with ghosts propagating in
the image. Other filters have been designed to work around such issues. One well known
filter for this purpose is the Butterworth filter.
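For illustration, the Butterworth low-pass response can be sketched as a frequency-space attenuation mask; each frequency-domain texel would be multiplied by this factor before transforming back. The cutoff and order parameters below are free choices, not values from the chapter:

```python
def butterworth_lowpass(dist, cutoff, order=2):
    """Attenuation for a texel 'dist' away from DC in a normal-form
    frequency-domain image: 1 near the center, rolling off smoothly
    toward 0 at high frequencies (no hard edge, hence no ringing)."""
    return 1.0 / (1.0 + (dist / cutoff) ** (2 * order))
```

Unlike the ideal filter's hard crop, the response is exactly 0.5 at the cutoff distance and decays gradually, which is why it avoids the ghosting described above.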
Conclusion
In this chapter, we’ve added some sophisticated tools to our image processing
toolbox, including HSV↔RGB color space conversion, the Canny edge detection filter
and separable implementations of a Gaussian blur, a median filter, and the Decimation in
Time formulation of the Fast Fourier Transform. We hope that these implementations,
presented here in the industry standard DirectX 9 High Level Shading Language, are easy
for you to drop into your own image processing applications. We also hope that they
inspire you to create even more powerful image processing operations specific to your
needs.
Sample Application
The image processing techniques presented in this chapter were developed using
live and recorded video fed to Direct3D via the Microsoft Video Mixing Renderer
(VMR). The sample app, VideoShader, demonstrates the use of Direct3D and the VMR,
with the above filters and several others implemented using HLSL. Source for the
sample application and all of the shaders is available on the ATI Developer Relations
website.
Acknowledgements
Thanks to John Isidoro of Boston University and ATI Research for the separable
Gaussian filter implementation. Thanks to Michael Gennert of Worcester Polytechnic
Institute and David Gosselin of ATI Research for discussions which resulted in the
implementation of the separable median filter approximation.
References
[Canny86] Canny, John, “A Computational Approach to Edge Detection,” IEEE PAMI
8(6) 679-698, November, 1986.
[Cooley65] Cooley, J. W. and Tukey, J. W., “An Algorithm for the Machine Calculation
of Complex Fourier Series,” Math. Comput. 19, 297-301, 1965.
[Foley90] James Foley, Andries van Dam, Steven K. Feiner and John F. Hughes,
Computer Graphics: Principles and Practice, 2nd Ed. Addison-Wesley, 1990.
[Jain95] Jain, Ramesh, Kasturi, Rangachar, et al., Machine Vision, McGraw Hill,
1995.
[Mitchell02] Mitchell, Jason L., “Image Processing with 1.4 Pixel Shaders in Direct3D”
in ShaderX: Vertex and Pixel Shader Tips and Tricks, edited by Wolfgang Engel, 2002.
[Smith78] Smith, Alvy Ray, “Color Gamut Transform Pairs,” SIGGRAPH 78, pp. 12-19.