Three-Dimensional Computer Graphics Architecture: Tulika Mitra and Tzi-Cker Chiueh
SURVEYS
coordinates, which are quadruples of the form {x, y, z, w}, where in most cases w is 1. (The tuple {x/w, y/w, z/w} is the Cartesian coordinate of the homogeneous point.) The geometric transformation stage applies a sequence of operations on the vertices of the triangle. Figure 1 shows the geometric transformation part of a typical 3D graphics pipeline, which consists of the following stages:

2.1.1 Model and viewing transformation: The modelling transformation positions primitives with respect to each other, and the viewing transformation orients the resulting set of primitives to the user viewpoint. These two transformations can be combined into a single multiplication of the homogeneous vertex coordinate by a 4 × 4 matrix, which is implemented as 16 floating point multiplications and 12 floating point additions. Lighting calculation, in addition, requires the transformation of the vertex normal by a 3 × 3 inverse transformation matrix, which costs 9 floating point multiplications and 6 floating point additions.

2.1.2 Lighting: This stage evaluates the colour of the vertices given the direction of light, the vertex position, the surface-normal vector and the material characteristics of an object’s surface. We consider here only the most popular shading model, called Gouraud shading, which interpolates the colour of the three vertices across the surface. Evaluating the colour of a vertex requires a variable amount of computation depending on the number of light sources and the material properties. We assume the simplest case of a single light at infinite distance, and a material with only ambient and diffuse coefficients. This lighting model evaluates the following expression for each R, G, B component:

Cdiffuse × Clight × (N ⋅ L) + Areflection × Alight,

where Clight and Cdiffuse are the light source intensity and diffuse reflection coefficient; Alight and Areflection are the ambient light intensity and ambient light coefficient; and (N ⋅ L) is the dot product of the surface-normal vector and the light direction vector. (N ⋅ L) is calculated only once, but the rest of the expression must be evaluated independently for the R, G and B components of each vertex. This requires a total of (3 + 3 × 3 = 12) multiplications and (2 + 3 × 1 = 5) additions per vertex.

2.1.3 Projection transformation: This transformation projects objects onto the screen. There are two types of projection: (1) orthographic projection, which keeps the original size of 3D objects and hence is useful for architectural and computer-aided design; and (2) perspective projection, which produces more realistic images by making distant objects appear smaller. Each of these transformations again involves a 4 × 4 matrix multiplication. However, as most entries in these matrices are zero, a careful implementation requires only 6 multiplications and 3 additions.

2.1.4 Clipping: The application programmer defines a 3D viewing frustum such that only the primitives within the frustum are projected onto the screen. This step removes the objects that are outside the viewable area. The algorithm requires one floating point comparison per view-boundary plane, and thus 6 comparisons per vertex. If a triangle is partially clipped, the algorithm must calculate the positions of the new vertices at the intersections of the triangle edges and the view-boundary plane. The number of such operations depends on the actual number of triangles that cross the view-boundary planes, which varies from one viewpoint to another. Hence, we do not take this cost into account in our computation requirement calculation.

2.1.5 Perspective division: If a perspective transformation is applied to a homogeneous vertex, the w value no longer remains equal to 1. This stage divides x, y and z by w to convert the vertex to Cartesian coordinates.

2.1.6 Viewport mapping: This step performs the final scaling and translation to map the vertices from the projected coordinate system to the actual viewport on the computer screen. Each vertex component is scaled by an independent scale factor and offset by an independent offset, i.e. 3 floating point multiplications and 3 floating point additions.

The total computation requirement to perform geometry transformation per vertex is then 46 multiplications, 29 additions, 3 divisions, and 6 comparisons. Modern processors can execute floating point addition, subtraction, comparison and multiplication quite fast using pipelined execution units. Floating point division, however, is usually not pipelined, and can take as much time as 50 floating point additions. The total floating point operation requirement for a single vertex transformation is then around 130.
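The per-vertex stages described above can be summarized in a minimal sketch. This is an illustration only, not the paper's implementation; the function names are ours, and any matrix, light or material values supplied to it are hypothetical placeholders:

```python
# Minimal sketch of the per-vertex geometry stages described above:
# model/view transform, Gouraud lighting (single directional light,
# ambient + diffuse only), perspective division and viewport mapping.

def transform(m, v):
    # 4x4 matrix times homogeneous vertex: 16 mults, 12 adds.
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def light(n, l, c_diff, c_light, a_refl, a_light):
    # (N . L) is computed once; the rest of the expression is
    # evaluated independently for each of the R, G, B components.
    ndotl = max(0.0, sum(ni * li for ni, li in zip(n, l)))
    return [c_diff[i] * c_light[i] * ndotl + a_refl[i] * a_light[i]
            for i in range(3)]

def perspective_divide(v):
    # Divide x, y, z by w to return to Cartesian coordinates.
    x, y, z, w = v
    return [x / w, y / w, z / w]

def viewport_map(p, scale, offset):
    # 3 mults and 3 adds: independent scale and offset per component.
    return [p[i] * scale[i] + offset[i] for i in range(3)]
```

Counting the arithmetic inside these four functions (plus the 3 × 3 normal transform and the 6 clipping comparisons) reproduces the per-vertex totals given in the text.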
Figure 1. Geometry transformation stage of a 3D graphics pipeline.
CURRENT SCIENCE, VOL. 78, NO. 7, 10 APRIL 2000 839
SPECIAL SECTION: COMPUTATIONAL SCIENCE
Today, even a modest scene requires around 1 million vertex transformations per second to achieve a rate of 30 frames per second. This translates to 130 MFlops (million floating point operations per second). Today’s PCs have sufficient floating point computation power and therefore typically perform the geometric transformation stage on the main CPU.

2.2 Rasterization

The rasterization stage comprises two steps. The scan conversion step decomposes a triangle into a set of pixels and calculates the attributes of each pixel, such as colour, depth, alpha, and texture coordinates. The pixel processing step performs texture mapping, depth test and alpha blending for individual pixels. Figure 2 shows the rasterization stage of the graphics pipeline.

Two distinct mechanisms are popular for the scan conversion step: the linear interpolation algorithm and the linear edge function algorithm. In linear interpolation-based algorithms4, the triangle set-up step first computes the slopes, with respect to the X-axis, of all the attributes along each edge of the triangle. Next, the edge processing step iterates along the edges and computes the two end points of each horizontal pixel segment, called a span. Finally, the span processing step iterates along each span and computes the attributes of each pixel on the span through linear interpolation (Figure 3).

In linear edge function-based algorithms5, each edge of the triangle is defined by a linear edge function. The triangle is scan converted by evaluating, at each pixel’s centre, the functions for all edges, and processing only those pixels that are inside all the edges. The attributes are also computed from linear functions. Typically, the traversal of a triangle proceeds down from a starting point and moves outward from the centre line6. The centre line shifts to the left or right when it steps outside of the triangle (Figure 4 a). To achieve parallelism, the triangle may be traversed one pixelstamp at a time, rather than pixel by pixel6. A pixelstamp is an array of pixels of dimension X × Y. Evaluation of the edge functions for all the pixels within a pixelstamp can start in parallel, and only qualified pixels are sent to the pixel processing stage. Triangle traversal visits all pixelstamps that are completely or partially inside the triangle (Figure 4 b).

The rasterization stage also includes texture mapping, a crucial and widely used technique that wraps a 2D texture image onto the surface of a 3D object to emulate the visual effects of complex 3D geometric details, such as a wooden surface or a tiled wall. Each vertex of a texture-mapped triangle comes with a texture coordinate that defines the part of the texture map to be applied (refer to Figure 5). These texture coordinates are interpolated across the triangle surface via scan conversion. The most popular texture mapping implementation is based on mip-mapping7 (Figure 6), which pre-calculates multiple reduced-resolution versions of a texture image. Each resolution level corresponds to a particular depth. Coarser (finer) resolution levels are used for farther (closer) objects. For a 3D object at a given depth, the mip-mapping algorithm chooses a pair of adjacent resolution levels of the texture image and performs weighted filtering of 8 texels (texture pixels) from these two levels. This tri-linear filtering eliminates visual discontinuities when different mip-map levels are applied to the same object.

Before a pixel is written to the frame buffer, the rendering engine needs to check whether that triangle is actually visible at that pixel, i.e. that no other triangle overlaps that pixel making it invisible. This is known as hidden surface removal for opaque objects. The number of overlapping triangles for a pixel is called the depth complexity of the pixel. The majority of graphics accelerators achieve hidden surface removal using a depth/Z buffer, which is an array with the same dimensions as the frame buffer. After a triangle is scan-converted into a set of

Figure 2. Rasterization stage of a 3D graphics pipeline.
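The inside test at the heart of linear edge function scan conversion can be sketched as follows. This is a simplified illustration that walks the triangle’s bounding box rather than using the centre-line or pixelstamp traversal of ref. 6, and it assumes counter-clockwise vertex order:

```python
# Sketch of linear edge function scan conversion: a pixel centre is
# inside the triangle if all three edge functions agree in sign.
# Vertices are assumed to be given in counter-clockwise order.

def edge(a, b, p):
    # Signed area of (a, b, p); positive if p lies to the left of a->b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def scan_convert(v0, v1, v2):
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    pixels = []
    for y in range(int(min(ys)), int(max(ys)) + 1):
        for x in range(int(min(xs)), int(max(xs)) + 1):
            p = (x + 0.5, y + 0.5)  # evaluate at the pixel centre
            if (edge(v0, v1, p) >= 0 and edge(v1, v2, p) >= 0
                    and edge(v2, v0, p) >= 0):
                pixels.append((x, y))
    return pixels
```

Because each edge function is linear, its value at a neighbouring pixel differs by a constant increment, which is what makes the parallel pixelstamp evaluation described above cheap in hardware.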
It is imperative that a separate hardware accelerator be dedicated to rasterization. Hence, two distinct classes of graphics architectures have been implemented: (1) combined geometry processor and rasterizer, the prime examples being RealityEngine12 and InfiniteReality13 from SGI; and (2) host CPU-based geometry processing with a dedicated hardware accelerator for rasterization. Almost all of today’s low-end 3D graphics accelerators belong to the second class. In this case, the transformed geometry (vertex position, colour, and texture coordinates), as well as the texture images, is transferred over a high-speed system bus such as PCI (Peripheral Component Interconnect) to the rasterization hardware accelerator. A major design issue for rasterization-only graphics accelerators is how to use the system bus bandwidth efficiently.

3. Architectural innovations
To scale up the performance of the generic 3D graphics
architecture described in the previous section, the following
architectural issues need to be resolved:
• Although in theory state-of-the-art processors seem to
have sufficient raw floating-point computation power to
support geometric transformation at interactive frame
rates, in practice the CPUs are lagging behind the rasteri-
zation performance of the 3D graphics cards. Therefore
higher floating-point performance is essential to achieve
faster frame rates with better rendering quality.
• The data transfer bandwidth between the CPU, which
performs geometric transformation, and the 3D graphics
card, which performs rasterization, plays a crucial role in
the extent to which the entire 3D graphics pipeline can be
sped up. The heavy use of texture maps in modern 3D applications further exacerbates the bandwidth problem.
• The memory access performance in the scan conversion
process has a dominating impact on the overall rasteriza-
tion performance. Improving the rasterization algorithm’s
data access locality is pivotal to the graphics card’s per-
formance.
The following subsections describe architectural techniques that have been proposed and implemented to address these issues.

Figure 7. Total number of triangles processed by the rasterization engine at different frames or view angles for various 3D applications.

Figure 8. Total number of pixels processed by the rasterization engine at different frames or view angles for various 3D applications.

Figure 9. Total texture memory bandwidth in MBytes for different frames.
3.1 Streaming SIMD extensions to instruction set

Many current microprocessors have added Single Instruction Multiple Data (SIMD) type instructions to accelerate integer processing for media applications such as audio, video and image processing. These include Intel’s Multimedia Extensions (MMX), HP’s Multimedia Architectural Extensions (MAX-2), Sun Microsystems’ Visual Instruction Set (VIS), etc. However, the geometry processing stage of the 3D graphics pipeline is based on floating point data types. To exploit the parallelism in the geometry processing stage, Intel, AMD and others have recently added floating point SIMD instructions14,15 to the instruction set. The main idea behind these extensions is that geometry processing requires 32-bit floating point data types, whereas the floating point paths (registers and ALUs) are 64 bits wide in most modern processors. Because vertex processing is inherently parallelizable, SIMD instructions allow two vertex-processing operations to be performed simultaneously by a single floating-point instruction, with each vertex using half of the 64-bit data path. Yang, Sano and Lebeck16 showed that SIMD instructions can improve geometry transformation performance by 20 to 30%.

3.2 Accelerated graphics port

Figure 10 shows a high-level view of the components of a PC desktop system. It consists of the processor, main memory, the north bridge, PCI-based devices and various interconnecting buses. The north bridge contains the memory controller and provides connections among the different system components. The main processor fetches the 3D model from main memory, performs geometry transformation, and writes the result back to main memory. The graphics accelerator sitting on the PCI bus uses DMA (Direct Memory Access) to retrieve that data from the main memory and then performs rasterization. One major bottleneck of PC-based systems is the transfer bandwidth over the PCI bus, which connects the system memory to the local memory of the graphics accelerator1. The CPU needs to transfer geometry data, graphics commands as well as texture data to the graphics accelerator. Typically, the geometry information associated with a vertex is about 32 bytes1, including the vertex coordinates, colour, and texture coordinates, i.e. 32 MB/sec for 1 million vertices. This information crosses the processor bus twice (once for reading and once for writing in the geometry transformation stage), the PCI bus once (transferring data to the graphics card), and the memory bus three times (in all the above cases). In addition, a large amount of texture data needs to be transferred over the PCI bus as well. The peak bandwidth of a 32-bit, 33-MHz PCI bus is 132 MB/sec, which is not quite sufficient. To solve this problem, Intel introduced a new bus specification called Accelerated Graphics Port (AGP)17. AGP connects the graphics accelerator exclusively to the main memory subsystem (refer to Figure 11). AGP has four main advantages over PCI:

1. Reduction of load on PCI: The primary advantage of AGP is that it eliminates the graphics-related bandwidth requirement from the PCI bus by transferring data from the main memory to the graphics card over a dedicated bus.

2. Higher peak bandwidth: AGP 2X (32-bit data path at 66 MHz) transfers data on both edges of the clock, thereby achieving a peak bandwidth of 528 MB/sec. AGP 4X has a peak bandwidth of 1 GB/sec.

3. Higher sustainable bandwidth: AGP supports pipelining of requests, i.e. overlapping the access time of request n with the issue of requests n + 1, n + 2 and so on. It also supports sidebanding, which provides extra address lines to issue new requests while the main data/address lines are transferring the data for previous requests. These two features make it more likely for AGP to achieve a sustained bandwidth close to its peak bandwidth.

4. Direct memory execute: The amount of local memory in the graphics accelerator is limited. However, to obtain more realistic images, applications use more and more high-resolution textures, all of which cannot fit into the local memory. Hence, the graphics driver needs to perform texture memory management, which keeps track of the textures present in the local memory and downloads the required textures before they are used. This can introduce significant latency as the rendering engine waits for the complete mip-map of the texture image to be downloaded over the PCI/AGP bus.

Figure 10. High-level view of the components of a PCI-based graphics subsystem.
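The bandwidth figures quoted above can be checked with a few lines of arithmetic (all the constants are the ones given in the text):

```python
# Back-of-the-envelope check of the quoted bus traffic figures:
# ~32 bytes of geometry per vertex at 1 million vertices per second.
VERTEX_BYTES = 32
VERTICES_PER_SEC = 1_000_000

geometry_mb_per_sec = VERTEX_BYTES * VERTICES_PER_SEC / 1e6  # 32 MB/sec

# The data crosses the processor bus twice (read + write), the PCI
# bus once, and the memory bus three times.
processor_bus = 2 * geometry_mb_per_sec  # 64 MB/sec
pci_bus = 1 * geometry_mb_per_sec        # 32 MB/sec
memory_bus = 3 * geometry_mb_per_sec     # 96 MB/sec

# Peak bus bandwidths: bytes per transfer x clock rate (x 2 for the
# double-pumped AGP 2X transfer on both clock edges).
PCI_PEAK = 32 / 8 * 33        # 32-bit @ 33 MHz = 132 MB/sec
AGP_2X_PEAK = 32 / 8 * 66 * 2 # 32-bit @ 66 MHz, both edges = 528 MB/sec
```

Geometry alone thus consumes roughly a quarter of the PCI peak before any texture traffic, which is why a dedicated graphics bus pays off.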
AGP provides a new feature called direct memory execute (DIME) that allows the graphics accelerator to directly address main system memory over the AGP bus. A translation table in the AGP controller, similar to the virtual-to-physical address translation table in the CPU, allows non-consecutive memory pages to appear as a single contiguous address space to the accelerator. This way the graphics accelerator can cache the heavily used textures in the local memory, and access the comparatively little-used ones directly from the system memory.

3.3 Bucket rendering

Traditional rendering requires random access to the entire frame buffer, and it is not very cost-effective to provide a large high-bandwidth frame buffer. An interesting architectural idea that addresses this problem is bucket rendering. Bucket rendering is a technique where the screen space is partitioned into tiles (also called chunks), and all the primitives of the scene are sorted into buckets, where each bucket contains the primitives that intersect the corresponding tile. This architecture renders the scene one tile/bucket at a time, thereby reusing the Z-buffer, alpha-buffer and the other buffers needed to store intermediate rendering results. At the end, all the tiles are collected together to form the final image. Bucket rendering has been implemented in Pixel-Planes 5 (ref. 18), PixelFlow19, Talisman20 and the commercially available PowerVR from NEC/VideoLogic. The main advantages of bucket rendering are the following:

• Since only one tile’s worth of rasterization buffer is required, as opposed to a full-screen buffer, it is possible to use more bits per buffer entry to support more advanced rendering techniques such as oversampling or anti-aliasing, which rasterizes each pixel at a higher resolution and then down-samples the result to the required resolution.
• A tiled architecture matches very well with the emerging embedded DRAM process, which can provide small on-chip memory and high memory access bandwidth.

The main disadvantages of this architecture are:

1. It requires an additional pipeline stage to sort triangles into buckets, thus increasing the total rendering latency.
2. Redundant work is performed because large primitives may overlap multiple tiles.

3.4 Composited image layers: Talisman

Microsoft introduced the Talisman architecture in 1996; it comprised several independent ideas. The key distinguishing feature of Talisman, however, is composited image layer rendering20, which exploited frame-to-frame coherence for the first time. In traditional architectures, all the primitives are rendered in each frame even though there is a great deal of coherence between consecutive frames. Instead, Talisman renders each primitive on a separate image surface. All the image surfaces are then composited together to form the final image. In the next frame, the image for a primitive is transformed in screen space, given the transformation matrices in object space. If the error introduced by the image-space transformation is below a threshold, the transformed image can be used as the final rendering result. This architecture relies on the fact that image-space transformation is much less expensive than object-space transformation, and that image layer composition can be performed efficiently. The main disadvantages of this architecture are its complexity and gate count, and its incompatibility with traditional APIs like OpenGL. As a result, no commercial attempt has been made so far to implement the Talisman architecture.
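The bucket-sorting stage of Section 3.3 can be sketched as follows. This is an illustrative sketch, not the PowerVR or Talisman implementation; the tile size and the triangle representation are arbitrary choices:

```python
# Sketch of the bucket-sorting stage of bucket rendering: each
# triangle's screen-space bounding box determines which tiles
# (buckets) it is sorted into before per-tile rasterization.
from collections import defaultdict

TILE = 32  # tile width/height in pixels (an illustrative chunk size)

def sort_into_buckets(triangles):
    buckets = defaultdict(list)
    for tri in triangles:  # tri: three (x, y) screen-space vertices
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        for ty in range(int(min(ys)) // TILE, int(max(ys)) // TILE + 1):
            for tx in range(int(min(xs)) // TILE, int(max(xs)) // TILE + 1):
                # A large primitive lands in several buckets, which is
                # exactly the redundant work noted as a disadvantage.
                buckets[(tx, ty)].append(tri)
    return buckets
```

Rendering then loops over the buckets, reusing one tile-sized Z-buffer and colour buffer for every tile.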
4. Parallel architecture
of parallelization techniques. Because the fundamental issue in 3D rendering is sorting the geometric primitives with respect to a given viewpoint, the parallelization strategies for polygon rendering can be classified as sort-first, sort-middle and sort-last, depending on where the sorting operation is performed19; the three are illustrated in Figure 12.

In the sort-first strategy21, the image space is partitioned into regions and each processor is responsible for all the rendering calculations (both geometry and rasterization) in the region to which it is assigned. The screen-space bounding box of each 3D primitive is calculated by performing just enough transformations. Every 3D primitive is then distributed to the processors responsible for the image regions with which its bounding box overlaps. One primitive can be sent to multiple processors. From this point on, the set of 3D primitives in each processor goes through geometric transformation and rasterization completely independently of the primitives in other processors. Finally, the image regions from the processors are simply combined to form the final rendered image. The sort-first architecture has received the least attention so far because of the load imbalance problem in the transformation and rasterization stages. However, as Mueller21 pointed out, the sort-first architecture can easily exploit frame-to-frame coherence, and he proposed a new adaptive algorithm to achieve better load balancing.

In the sort-middle strategy22, the image space is again partitioned and each processor is responsible for one image region. 3D primitives are first transformed and then distributed to different processors based on the transformed X and Y coordinates of the primitives. Again, a primitive is sent to multiple processors if it crosses image region boundaries. After distribution, each processor performs rasterization on the transformed primitives independently of the others to produce a sub-image for its associated image region. The sub-images are then combined to form the final projection image. Sort-middle seems to be the most natural architecture and has been implemented both in hardware and software. Both InfiniteReality13 and RealityEngine Graphics12 use the sort-middle strategy. The main disadvantages of the sort-middle architecture are the load imbalance in the rasterization stage and the communication cost due to redistribution of primitives after transformation. Crockett and Orloff22 proposed a static scan-line-based scheme for image space partitioning. Whitman23 suggested adaptive load-balancing schemes for the sort-middle architecture, while Ellsworth24 took advantage of frame-to-frame coherence to achieve better load balancing.

The sort-last strategy partitions the 3D input model at the beginning of the rendering pipeline without taking into account the viewpoint or object coordinates, performs geometric transformation and rasterization on each partition independently to produce a partial image, and finally composites the partial images according to the depth value of each image pixel. Because of its simplicity, the sort-last approach has been implemented in several systems, including PixelFlow19 from the University of North Carolina, which uses a high-speed combining network to composite sub-images. The performance of the sort-last strategy depends critically on the composition stage. Various methods have been proposed to perform the composition. The simplest method is to send the sub-images to a single compositing processor19. Other proposed schemes are binary tree composition25, binary-swap composition26,27 and parallel pipeline composition28. Mitra and Chiueh29 showed that all previously proposed sub-image compositing methods can be unified in a single framework.

In general, in sort-last, a processor sends all the pixels of the relevant image space to another processor. This is known as the sort-last-full technique30. Cox and Hanrahan31 pointed out that it is sufficient to send only the ‘active’ pixels of the image space, which is termed sort-last-sparse. The trade-off between the two methods is the communication overhead versus the extra processing required to encode the ‘active’ pixels.

Until recently, all parallel rendering engines were implemented either as dedicated ASICs, such as RealityEngine and InfiniteReality, or on massively parallel message passing or distributed shared memory machines such as the Intel Paragon. Currently, how-
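The sort-last-full composition step, and the sort-last-sparse encoding of ‘active’ pixels, can be sketched as follows (the image representation is an illustrative choice, not any system’s actual format):

```python
# Sketch of sort-last depth compositing. Each partial image is a flat
# list of (colour, depth) pairs, one entry per pixel.

def composite(img_a, img_b):
    # Sort-last-full: merge two partial images pixel by pixel,
    # keeping the sample with the smaller (nearer) depth.
    return [a if a[1] <= b[1] else b for a, b in zip(img_a, img_b)]

def to_sparse(img, far=float("inf")):
    # Sort-last-sparse: transmit only the 'active' pixels, encoded as
    # (pixel index, colour, depth) triples; background pixels (at the
    # far depth) are dropped at the cost of per-pixel encoding work.
    return [(i, c, z) for i, (c, z) in enumerate(img) if z < far]
```

Binary-swap and parallel-pipeline schemes apply the same per-pixel merge, but split the image among processors so that composition work is balanced rather than funnelled through one node.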
Figure 12. Sort-first, sort-middle, and sort-last parallel rendering architectures. The main difference between the architectures is where the distribution/sorting of primitives takes place. G represents the geometric transformation engine and R represents the rasterization engine.
ever, advances in processor and graphics accelerator technology, as well as the emergence of gigabit local network technology such as Myrinet, have made it possible to implement high-performance 3D graphics engines on a cluster of workstations, each equipped with a low-cost 3D graphics card32,33. The basic parallelization strategies remain the same for these architectures. However, the loosely coupled network topology may require different kinds of load balancing and composition algorithms.

5. Conclusion

A unique characteristic of 3D graphics applications is that there is no end to the addition of new features to the standard graphics pipeline. Unlike microprocessors, 3D graphics requires both advances in performance, i.e. more triangles and more pixels per second, and new and improved techniques that deliver more realistic images and cinematic effects. Engineering and scientific 3D applications such as Computer Aided Design (CAD) and Computational Fluid Dynamics (CFD), as well as entertainment applications such as computer games and animated movies, all require higher-quality rendered images at a faster rate, placing an increasing demand on the triangle and pixel rates. Therefore, we expect that 3D graphics architecture will remain a challenging field in the foreseeable future, with abundant room for further algorithmic and architectural innovation.

1. Kirk, D., in Proc. of the 13th ACM SIGGRAPH/Eurographics Workshop on Graphics Hardware, keynote address, 1998, pp. 11–13; http://www.merl.com/hwws98/presentations/kirk/index.html.
2. Neider, J., Davis, T. and Woo, M., OpenGL Programming Guide, Addison-Wesley, 1993.
3. Microsoft Corporation, http://www.microsoft.com/directx/overview/d3d/default.asp, 1996.
4. Foley, J. D., van Dam, A., Feiner, S. K., Hughes, J. F. and Phillips, R. L., Computer Graphics: Principles and Practice, Addison-Wesley, 2nd edn, 1990.
5. Fuchs, H., et al., in Proc. of the 12th Annual ACM Conference on Computer Graphics (SIGGRAPH), 1985.
6. Pineda, J., in Proc. of the 15th Annual ACM Conference on Computer Graphics (SIGGRAPH), 1988, pp. 17–20.
7. Williams, L., in Proc. of the 10th Annual ACM Conference on Computer Graphics (SIGGRAPH), 1983.
8. OpenGL Performance Characterization Group, http://www.spec.org/gpc/opc.static/opcview.htm.
9. Dunwoody, J. C. and Linton, M. A., in Proc. of the ACM Symposium on Interactive 3D Graphics, 1990, pp. 155–163.
10. Chiueh, T. and Lin, W., in Proc. of the 12th ACM SIGGRAPH/Eurographics Workshop on Graphics Hardware, 1997, pp. 17–24.
11. Mitra, T. and Chiueh, T., in Proc. of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture (MICRO), 1999, pp. 62–71.
12. Akeley, K., in Proc. of the 20th Annual ACM Conference on Computer Graphics (SIGGRAPH), 1993, pp. 109–116.
13. Montrym, J. S., Baum, D. R., Dignam, D. L. and Migdal, C. J., in Proc. of the 24th Annual ACM Conference on Computer Graphics (SIGGRAPH), 1997, pp. 293–302.
14. Intel Corporation, http://developer.intel.com/design/PentiumIII/manuals/, 1999.
15. Advanced Micro Devices, Inc., http://www.amd.com/products/cpg/3dnow/inside.html.
16. Yang, C., Sano, B. and Lebeck, A. R., in Proc. of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, 1998, pp. 14–24.
17. Intel Corporation, http://www.intel.com/technology/agp/agp_index.htm, 1998.
18. Fuchs, H., et al., in Proc. of the 16th Annual ACM Conference on Computer Graphics (SIGGRAPH), 1989, pp. 79–88.
19. Molnar, S., Eyles, J. and Poulton, J., in Proc. of the 19th Annual ACM Conference on Computer Graphics (SIGGRAPH), 1992, pp. 231–240.
20. Torborg, J. and Kajiya, J. T., in Proc. of the 23rd Annual ACM Conference on Computer Graphics (SIGGRAPH), 1996, pp. 353–363.
21. Mueller, C., in Proc. of the ACM Symposium on Interactive 3D Graphics, 1995, pp. 75–84.
22. Crockett, T. W. and Orloff, T., IEEE Parallel Distributed Technol.: Syst. Appl., 1994, 2, 17–28.
23. Whitman, S., IEEE Comput. Graphics Appl., 1994, 14, 41–48.
24. Ellsworth, D., IEEE Comput. Graphics Appl., 1994, 14, 33–40.
25. Shaw, C., Green, M. and Schaeffer, J., Advances in Computer Graphics Hardware, III, 1991.
26. Ma, K., Painter, J. S., Hansen, C. D. and Krogh, M. F., IEEE Comput. Graphics Appl., 1994, 14, 59–68.
27. Karia, R. J., in Proc. of the IEEE Scalable High Performance Computing Conference, 1994, pp. 252–258.
28. Lee, T., Raghavendra, C. S. and Nicholas, J. B., IEEE Trans. Vis. Comput. Graphics, 1996, 2, 202–217.
29. Mitra, T. and Chiueh, T., in Proc. of the 6th IEEE International Conference on Parallel and Distributed Systems, 1998.
30. Molnar, S., Cox, M., Ellsworth, D. and Fuchs, H., IEEE Comput. Graphics Appl., 1994, 14, 23–32.
31. Cox, M. and Hanrahan, P., IEEE Parallel Distributed Technol.: Syst. Appl., 1994, 2.
32. Samanta, R., et al., in Proc. of the 14th ACM SIGGRAPH/Eurographics Workshop on Graphics Hardware, 1999, pp. 107–116.
33. Experimental Computer Systems Lab, Department of Computer Science, SUNY at Stony Brook, http://www.ecsl.cs.sunysb.edu/sunder.html.