Cell Projection of Convex Polyhedra
Abstract
Finite element methods commonly use unstructured grids as the computational domain. The volume visualization of
these unstructured grids is a time-consuming task. The fastest known object-order algorithm for this purpose is the
projected tetrahedra algorithm of Shirley and Tuchman. Even with the advent of programmable graphics hardware,
rendering performance has not kept up with the growing complexity of the simulation data. In this paper we strive
to improve the performance of the cell projection technique by imposing several restrictions on the optical model.
This allows us to devise a simple but fast hardware-accelerated algorithm that is able to project arbitrary
polyhedral cells, that is, tetrahedra, prisms, hexahedra, etc. For this reason, our algorithm is well suited for the
display of unstructured FEM meshes with mixed cell types, but it is also applicable to the real-time display of
gaseous phenomena, such as fire and ground fog.
CR Categories: I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling, I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism.

Keywords: Direct volume rendering, unstructured grids, cell projection.

1. Introduction

In the area of volume visualization the availability of programmable graphics hardware has led to both improved performance and rendering quality. In the case of regular data the pre-integration technique [3] is the predominant recent improvement. While the pre-integration technique had been applied to unstructured tetrahedral grids even earlier [12], the performance of unstructured volume rendering methods is still poor compared to the regular case. In this paper we try to narrow the performance gap by imposing several restrictions on the optical model. This allows us to devise an algorithm which efficiently utilizes the graphics hardware to speed up unstructured volume rendering.

Typically, unstructured grids are generated by finite element methods. In order to visualize the data, all cells first have to be sorted in a back-to-front fashion [18, 1]. After that, each cell is decomposed into tetrahedra, which can be displayed efficiently using the well-known Projected Tetrahedra (PT) algorithm of Shirley and Tuchman [14, 15]. Current implementations of this algorithm achieve a peak performance of 250,000 [19] to 600,000 [4] tetrahedra per second, including the time for sorting. Due to the growing complexity of the simulation data, frame rates of less than one frame per second are still quite common for typical unstructured data sets.

Recently, hardware-accelerated methods have been proposed to speed up the PT algorithm, but with current graphics hardware still no more than approximately 480,000 [16] to 490,000 [20] tetrahedra per second are possible (timings do not include sorting). There also exist hardware concepts to overcome the speed limitations, but it is uncertain when these concepts will find their way into graphics accelerators [6]. Since recent efforts to significantly speed up the PT algorithm have not led to satisfactory results, we pursue a different strategy in this paper: First we evaluate the theoretical limit on the number of polyhedra that can be rendered on current graphics hardware. Based on these results we propose a reasonable modification of the optical model to approach the theoretical limit.
case of hexahedral cells, this results in 6 faces with 4 vertices each. Assuming that the volumetric grid can be rendered with triangle stripping, 8 vertices have to be passed down the graphics pipeline per hexahedron. Current graphics accelerators like the NVIDIA GeForce3 reach a peak performance of about 12 million vertices per second using triangle strips (in practical experience). Thus, the maximum theoretical performance of the NVIDIA GeForce3 is 12 million / 8 = 1.5 million hexahedra per second.

In order to verify the theoretical result, we first applied maximum intensity projection (MIP) [5]. The advantage of MIP is that a volumetric grid can be visualized just by rendering all the faces of the cells in an unsorted order. Without great loss of accuracy the scalar values can be assumed to vary linearly inside each hexahedron. Then the maximum projected scalar value of each ray segment is either the value on the front or on the back face. Using this approach we achieved a performance of 643,000 hexahedra or 5.1 million triangles per second. Assuming that a hexahedron needs to be decomposed into at least 5 tetrahedra to be rendered with the PT algorithm, the experimental result of 643,000 hexahedra per second corresponds to 3.2 million tetrahedra per second. This is still far away from the theoretical maximum, but it is almost an order of magnitude faster than the best known PT implementation.

The performance for such a simple optical model as MIP is already considerably lower than the theoretical limit. This is mainly due to the large rasterization overhead. Hence, it is no surprise that the performance is even worse in the case of the standard volume density optical model [17]. This is due to the requirement of visibility sorting. Conceptually, the tetrahedra must be read, written, and read back from main memory for sorting (compare Wittenbrink et al. [19]). With increasing rendering speed of the graphics accelerator, the memory bandwidth consumed by visibility sorting becomes the limiting factor. This behaviour starts at approximately 1.5 million tetrahedra per second on current PC hardware. Since the total performance is currently only around 600,000 tetrahedra per second, the main limiting factor is still the graphics accelerator (and the CPU). We suspect that a significant performance increase beyond the mentioned limit of 1.5 million tetrahedra per second is possible only with a structural paradigm shift of graphics accelerators or with special-purpose hardware.

fewer visual cues, but as we will see, the implementation is extremely simple, so that it can serve as a fast preview and prototyping option.

Recently, Mech [9] proposed a method to render bounded layered fog using an emissive optical model. The bounded fog is defined within a triangular surface mesh, which allows for easy hardware-accelerated computation of the ray integral. The length of each ray segment is calculated in the frame buffer by coding the distance from the near clipping plane into the vertex color. Then the length of the ray segments can be computed by rendering the back faces of the fog boundary and by subtracting the front faces. While this approach is simple and very fast, it assumes a constant fog density and requires a 12-bit visual to eliminate Mach bands. In the following we extend this algorithm to project arbitrary cell types, such as tetrahedra, hexahedra, or prisms, without the restriction to a 12-bit frame buffer and with linearly interpolated densities within each cell.

Our so-called Projected Convex Polyhedra (PCP) algorithm requires three passes per cell. In the first two passes the normalized length of the ray segments is calculated in the alpha channel of the frame buffer. For this purpose, the distance d to the near plane is computed for each vertex of the cell. Let d_max denote the maximum distance per cell, let d_min denote the minimum distance, and let Δd = d_max − d_min be the difference of both (see also Figure 1). Then the back faces of a cell are rendered into the alpha channel of the frame buffer with the alpha component of each vertex set to α = (d − d_min)/Δd. In the same fashion, the front faces of the cell are rendered into the alpha channel with subtractive blending enabled. As a result, the normalized ray segment lengths are now available in the alpha channel of the frame buffer.
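The first two PCP passes can be emulated on the CPU for a single pixel. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the near-plane distances at which the ray hits the front and back faces are passed in directly instead of being produced by rasterization, and the alpha channel is modeled as a single float clamped to [0, 1]. It relies on α being an affine function of d, so interpolating α across a face gives the same value as applying the formula to the interpolated distance.

```python
def cell_depth_range(vertex_distances):
    """Return (d_min, delta_d) for the near-plane distances of a cell's vertices."""
    d_min = min(vertex_distances)
    d_max = max(vertex_distances)
    return d_min, d_max - d_min


def alpha_from_distance(d, d_min, delta_d):
    """Alpha assigned to a point at distance d: alpha = (d - d_min) / delta_d."""
    if delta_d == 0.0:
        # Assumption: a cell with zero depth extent contributes no segment length.
        return 0.0
    return (d - d_min) / delta_d


def normalized_segment_length(d_front, d_back, d_min, delta_d):
    """Emulate the two alpha passes for one pixel covered by the cell."""
    framebuffer_alpha = 0.0
    # Pass 1: render the back faces into the alpha channel.
    framebuffer_alpha += alpha_from_distance(d_back, d_min, delta_d)
    # Pass 2: render the front faces with subtractive blending.
    framebuffer_alpha -= alpha_from_distance(d_front, d_min, delta_d)
    # Assumption: the frame buffer clamps alpha to [0, 1].
    return min(1.0, max(0.0, framebuffer_alpha))


# Hypothetical tetrahedron whose four vertices lie at these near-plane distances.
d_min, delta_d = cell_depth_range([2.0, 3.0, 4.0, 6.0])

# Hypothetical pixel: the ray enters the cell at distance 3 and leaves at distance 5.
normalized = normalized_segment_length(3.0, 5.0, d_min, delta_d)
segment_depth = normalized * delta_d
```

In this example the alpha channel ends up holding 0.75 − 0.25 = 0.5. Multiplying by Δd = 4 recovers the difference of the near-plane distances at the two hit points, 5 − 3 = 2.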