Reducing State Changes with a Pipeline Buffer∗

Jens Krokowski†  Harald Räcke‡  Christian Sohler†  Matthias Westermann§

∗ This work is partially supported by the DFG grants ME 872/8-2 and WE 2842/1-1.
† Heinz Nixdorf Institute and Institute of Computer Science, University of Paderborn, Germany, Email: {kroko, csohler}@upb.de
‡ Computer Science Department, Carnegie Mellon University, USA, Email: [email protected]
§ Computer Science Department, Dortmund University, Germany, Email: [email protected]

VMV 2004, Stanford, USA, November 16–18, 2004

Abstract

A limiting factor in the performance of a rendering system is the number of state changes, i.e., changes of the attributes material, texture, shader program, etc., in the stream of rendered primitives. We propose to include a small buffer between application and graphics hardware in the rendering system. This pipeline buffer is used to rearrange the incoming sequence of primitives on-line and locally in such a way that the number of state changes is minimized. This method is generic; it can be easily integrated into existing rendering systems.

In our experiments a pipeline buffer reduces the number of state changes by an order of magnitude and achieves almost the same rendering time as an optimal, i.e., presorted, sequence without pipeline buffer. Due to its simple structure and its low memory requirements this method can easily be implemented in software or even hardware.

1 Introduction

In current rendering systems a number of different techniques are integrated to reduce the time to render a 3D scene. View frustum culling, approximation, and occlusion culling are used to reduce the number of primitives that have to be rendered. Another significant factor for the rendering time is the number of state changes performed by the graphics hardware during the rendering process. Such a state change occurs when two subsequently rendered primitives differ in their attribute values, i.e., in their material, texture, or shader program. These state changes slow down the rendering system.

Therefore, in some applications, e.g., in computer games, the primitives are organized in blocks of equal material, texture, and shader program in order to minimize the number of state changes in the graphics hardware. However, frustum and occlusion culling usually require a spatial sorting of the primitives. This stands in partial conflict with the requirement to sort the primitives by attribute value, which means that, in general, this approach leads to suboptimal visibility culling. Further problems occur in interactive applications, where preprocessing is not possible.

1.1 Contributions

To overcome these problems, we propose a different technique to reduce the number of state changes: A pipeline buffer is included between application and graphics hardware and reduces state changes on-line instead of in a preprocessing step. Therefore, the application can benefit from any (especially spatial) sorting without increasing the running time due to too many state changes.

Our approach is based on a small buffer (storing fewer than a hundred references) and a well-chosen selection strategy to rearrange the incoming sequence of primitives on-line in such a way that the number of state changes is minimized.

This method is generic, i.e., it can be used for different types of state changes, and it can be integrated into existing rendering systems. Due to its simple structure and its low memory requirements this method can easily be implemented in software or even hardware.

In our experiments each scene is stored in an octree and we consider the sequence of primitives that results from an in-order traversal of that tree. For such a sequence we typically find that our method
• reduces the number of state changes by an order of magnitude,
• reduces the rendering time by roughly 30%,


• achieves almost the same rendering time as an optimal, i.e., presorted, sequence without pipeline buffer.
Note that these results heavily depend on the type of state changes, i.e., on the cost associated with the state changes.

2 Related Work

Conventional approaches to reducing state changes usually use some kind of preprocessing in order to sort the sequence of primitives with respect to their attribute values. However, these methods cannot be easily combined with culling techniques used in modern rendering systems since these techniques usually require a spatial sorting of the primitives as opposed to a sorting by attribute value. Although this trade-off has been known for a long time, very little work has been done on addressing both factors simultaneously.

The IRIS Performer toolkit [11] re-sorts the visible geometry by modes at the end of the CULL traversal for each frame and sends geometry and graphics commands to the graphics subsystem in the subsequent DRAW traversal. However, the computation of such a rearrangement is so expensive that it only amortizes if the result is used for several frames.

The system of Lalonde and Schenk [8] optimizes the geometric data of video games for the targeted hardware (e.g., PS2, XBox and GameCube) in a preprocessing step that makes intensive use of shader programs. In addition, the data is partitioned into smaller packets to fulfill hardware restrictions. These packets are sorted to minimize expensive state changes. This method is applicable to video games with a fixed topology of the graphical elements but cannot be used for modeling applications where the geometry is changed dynamically.

Buck et al. [3] provide a state tracking system for the remote rendering system WireGL. It performs lazy state updates to transmit less state data over the network but, in contrast to our approach, it does not alter the sequence of the rendered geometry.

Aila et al. also note the trade-off between state and spatial sorting. For state sorted applications they propose to add a delay stream in the graphics hardware between the vertex and pixel processing unit [1]. The primitives in the delay stream are used to generate culling information. This way primitives residing in the delay stream may be culled by primitives that were submitted afterwards. The delay stream stores 50K-150K triangles and therefore is several orders of magnitude larger than the pipeline buffer.

Spatial sorting applications. Occlusion culling methods compute the visible primitives either during preprocessing and store them in state sorted Potentially Visible Sets (PVS) or determine them on-line. If the scene contains mobile primitives (especially mobile occluders), PVS methods normally fail and, in addition, the memory requirements of PVS methods can turn out to be a problem [5].

On-line methods usually store the scene in a hierarchical data structure, e.g., octree [7], kd-tree [6], or bounding box hierarchy [14], to compute the point-based visibility. During a traversal of such a hierarchical data structure, the visible primitives are classified and only these primitives are sent to the graphics hardware. These approaches may generate a sequence of primitives that contains many state changes. We will show how to improve the rendering time for these scenarios by reducing the number of state changes with a pipeline buffer.

Theoretical analysis. The pipeline buffer scenario is an application of the on-line scheduling problem for sorting buffers [10]. The following is a conversion of this theoretical model to the pipeline buffer application: An input sequence of items which are only characterized by a specific attribute has to be rearranged by a sorting buffer which is a random access buffer with storage capacity for k items. The goal is to minimize the number of state changes, i.e., the number of maximal contiguous subsequences of items containing only items with the same attribute value.

In this model on-line strategies are evaluated theoretically in a competitive analysis. In such a worst-case analysis the cost of an on-line strategy, which has no knowledge about the future input sequence, is compared with the cost of an optimal off-line strategy, which knows the whole input sequence in advance. Note that an optimal off-line strategy also has to use the sorting buffer to rearrange the input sequence, i.e., in general an optimal off-line strategy cannot just sort the whole input sequence.

It is proven that several standard strategies are unsuitable for this theoretical problem [10], i.e., the competitive ratio of the First-In-First-Out and Least-Recently-Used strategies is Ω(√k) and the
competitive ratio of the Most-Common-First strategy is Ω(k). Further, the Bounded-Waste strategy is presented and it is proven that this strategy achieves a competitive ratio of O(log² k) [10]. Note that the competitive ratio of O(log² k) does not give good performance guarantees for small buffer size k (this is the interesting case in our pipeline buffer scenario) because in this case log² k is quite large in comparison to k and the constant factors in the O-notation have a dominating impact.

3 The Pipeline Buffer

In the following, we call the states that occur during rendering the attributes of the geometry, e.g., material, texture, vertex and fragment programs. A primitive has the same attribute values all over the described geometry. Hence, primitives can be, e.g., points, triangles, quads or indices to vertex arrays as long as the attribute value managed by our pipeline buffer does not change during the rendering of that primitive.

When a sequence of primitives is rendered, the graphics hardware has to perform certain actions each time the attribute values of two subsequent primitives differ. For example, when two subsequent primitives have different textures or vertex programs the new texture or vertex program has to be loaded before the primitive can be rendered. This process is called a state change of the graphics hardware. Moreover, OpenGL performs a validation step to update the internal caches [12].

The rendering time of a sequence of primitives depends significantly on the number and the kind of state changes in this sequence. Changing the color or the blend mode is usually cheap. However, certain state changes, e.g., textures, cause page faults in the internal caches and therefore lower the performance of current graphics hardware. We concentrate on expensive states that are widely used in practice and often changed during the rendering of one frame, although the pipeline buffer method neither depends on nor is limited to these states.

The pipeline buffer is easy to integrate into an existing high-level rendering system at the application level. The only requirement is that each object has a reference to its attribute values. The more general solution we propose is to implement the buffer inside a device driver.

[Figure 1 (diagram): the Pipeline Buffer, consisting of State Tracking, OD-Buffer, and Selection Strategy, sits between Application and Graphics Hardware.]

Figure 1: The pipeline buffer as application-independent add-on to the graphics API.

Method. For simplicity, we assume that there is only one attribute shared by all primitives. In Section 3.2 we discuss how to deal with more than one attribute.

The diagram of Figure 1 shows our pipeline buffer which consists of a state tracking system, a selection strategy and a buffer with random access that stores up to k primitives whose attribute value differs from the one currently processed by the graphics hardware.

The system catches the state setting commands from the application and records them as the state of the input stream (SIS). If the current state of the graphics hardware (SGH) is the same as the SIS we pass all subsequent geometry immediately to the graphics hardware. Otherwise, the geometry is appended at the end of the buffer together with the SIS information. In the case of textures or vertex/fragment programs this is easily possible because a change involves sending a corresponding ID. Material information is stored once in the buffer with a generated ID and all primitives sharing this attribute value refer to this ID.

In case of a buffer overflow we must evict primitives from the buffer. (A buffer overflow means that k primitives are stored in the buffer.) This forces a state change in the graphics hardware and we have to select the next state. This is done by the selection strategy. Several strategies are presented and discussed in Section 4. Finally, all primitives in the buffer that provide the selected state are sent to the graphics hardware.
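
The pass-through, buffering, and eviction steps of the method can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names PipelineBuffer, submit, flush, reorder, and countBlocks are hypothetical, each primitive is reduced to a one-character attribute value, sending a primitive to the graphics hardware is modeled by appending to an output string, and the selection strategy is a placeholder that picks the attribute value of the oldest buffered primitive. The flush shown here simply repeats the selection until the buffer is empty.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of a pipeline buffer over one attribute (one char per primitive).
// All names are hypothetical; "rendering" is modeled by appending to out_.
class PipelineBuffer {
public:
    explicit PipelineBuffer(std::size_t k) : capacity_(k) {}

    // Handle one incoming primitive whose attribute value is `sis`
    // (the state of the input stream).
    void submit(char sis) {
        if (!hasState_ || sis == sgh_) {    // SIS == SGH: pass straight through
            emit(sis);
            return;
        }
        buffer_.push_back(sis);             // otherwise append to the buffer
        if (buffer_.size() >= capacity_) {  // overflow: k primitives are stored
            evictSelectedState();
        }
    }

    // Drain the buffer by repeated selection until it is empty.
    void flush() {
        while (!buffer_.empty()) evictSelectedState();
    }

    const std::string& output() const { return out_; }

private:
    void emit(char attribute) {
        sgh_ = attribute;                   // state of the graphics hardware
        hasState_ = true;
        out_.push_back(attribute);
    }

    void evictSelectedState() {
        // Placeholder selection strategy (assumption): oldest buffered primitive.
        const char selected = buffer_.front();
        std::vector<char> kept;
        for (char attribute : buffer_) {
            if (attribute == selected) emit(attribute);  // send to "hardware"
            else kept.push_back(attribute);              // stays buffered
        }
        buffer_.swap(kept);
    }

    std::size_t capacity_;
    std::vector<char> buffer_;
    std::string out_;
    char sgh_ = 0;
    bool hasState_ = false;
};

// Number of blocks of equal attribute value, i.e., how often the state is set.
int countBlocks(const std::string& sequence) {
    int blocks = 0;
    for (std::size_t i = 0; i < sequence.size(); ++i) {
        if (i == 0 || sequence[i] != sequence[i - 1]) ++blocks;
    }
    return blocks;
}

// Run a whole sequence through a buffer of size k and return the new order.
std::string reorder(const std::string& input, std::size_t k) {
    PipelineBuffer buffer(k);
    for (char attribute : input) buffer.submit(attribute);
    buffer.flush();
    return buffer.output();
}
```

For instance, reorder("AABAACAAB", 3) yields "AAAAAABBC", which consists of 3 blocks instead of the 6 blocks of the input. With k = 1 every buffered primitive is evicted immediately, so the sequence passes through unchanged.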

At the end of the input stream or if the flush command is executed the selection strategy determines the next state and all primitives providing it are rendered. This is repeated until the buffer is empty.

3.1 Order-Dependence Buffer

Most graphics APIs require that transparent objects be blended in back-to-front order. For scenes with transparent objects we propose an additional fixed size FIFO buffer called order-dependence buffer (OD-buffer) that stores all transparent primitives to avoid reordering.

If the OD-buffer is full, first the pipeline buffer is flushed and then the OD-buffer is emptied. Finally, when all opaque primitives have been rendered the remaining transparent primitives of the OD-buffer are rendered. This way the relative order of transparent objects is not altered and the opaque objects are not delayed relative to the transparent ones. Order-independent transparency methods like the A-buffer algorithm [4] or the R-buffer [13] present another possibility to handle this problem.

In-Order Execution. Several graphics APIs, e.g., OpenGL [2] and DirectX [9], specify that primitives mapping to the same pixel must be rendered in the order they were submitted. Reordering the primitives violates this rendering semantic and could cause temporal flickering problems in some special cases like edge highlighting, stencil operations or if identical geometries with different textures are rendered.

Our experience is that for most scenes this affects no or only a small subset of primitives. Additionally, the order dependency can mostly be reduced to a small number of blocks of primitives. Only the order of these blocks is important; reordering the primitives within one block does not affect the final image. For example, for edge highlighting the primitives can be divided into two blocks: triangles and lines. Therefore, the pipeline buffer can be used to reduce the state changes within one block but must be flushed between the blocks.

In our opinion the in-order dependency cannot be identified fully automatically in all cases. Thus, the pipeline buffer can be manually flushed or disabled completely by the programmer at the expense of losing the benefits from the buffer.

3.2 Multiple Attributes

In many applications the primitives have multiple attributes. In this case we can either consider only the most expensive attribute or we can use a chain of pipeline buffers, one for each attribute. In the latter case we propose to order the attributes increasingly according to their influence on the performance of the rendering system.

For example, let us consider the attributes texture (expensive) and color (less expensive). In this case we first process all primitives in a color buffer and finally in a texture buffer. The reason for this ordering is that a block of primitives created by the first pipeline buffer may be destroyed by a later one.

4 Selection Strategies

In this section, we present the different selection strategies that are implemented and evaluated in our pipeline buffer scenario. In case of a buffer overflow, the selection strategy selects the next attribute value processed by the graphics hardware from the primitives currently stored in the pipeline buffer.

Because our pipeline buffer scenario is related to caching, we consider the standard caching strategies First-In-First-Out (FIFO) and Least-Recently-Used (LRU) in Sections 4.1 and 4.2, respectively. The Most-Common-First strategy, presented in Section 4.3, is a fairly natural strategy in the pipeline buffer scenario. Finally, we introduce the Round-Robin (RR) strategy in Section 4.4.

4.1 First-In-First-Out (FIFO)

The FIFO strategy assigns time stamps to each attribute value stored in the pipeline buffer. Initially, the time stamps of all attribute values are undefined. When a primitive is stored in the pipeline buffer the FIFO strategy checks whether the attribute value of the primitive has an undefined time stamp. In this case, the time stamp of this attribute value is set to the current time. Otherwise, it remains unchanged. At each buffer overflow the FIFO strategy selects the attribute value with the oldest time stamp and resets its time stamp to undefined.

The FIFO strategy is a very simple selection strategy that does not analyze the stream of primitives. The pipeline buffer acts like a sliding window over the stream of primitives in which primitives with the same attribute value are combined.
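
The FIFO time-stamp bookkeeping can be illustrated with the following C++ sketch. It rests on assumptions that are not taken from the paper: the class name FifoSelector and its methods onStore and select are hypothetical, attribute values are integer IDs, an undefined time stamp is represented by the absence of a map entry, and the current time is a simple counter of store operations.

```cpp
#include <cstdint>
#include <unordered_map>

// Sketch of the FIFO time-stamp bookkeeping (hypothetical names).
// An attribute value with no map entry has an "undefined" time stamp.
class FifoSelector {
public:
    // Call when a primitive with this attribute value is stored in the buffer.
    void onStore(int attribute) {
        ++clock_;
        // emplace() inserts only if the attribute has no time stamp yet,
        // so an already defined stamp remains unchanged.
        stamps_.emplace(attribute, clock_);
    }

    // Call on buffer overflow. Writes the attribute value with the oldest
    // time stamp to `selected` and resets that stamp to undefined.
    // Returns false if no attribute value currently has a time stamp.
    bool select(int& selected) {
        if (stamps_.empty()) return false;
        auto oldest = stamps_.begin();
        for (auto it = stamps_.begin(); it != stamps_.end(); ++it) {
            if (it->second < oldest->second) oldest = it;
        }
        selected = oldest->first;
        stamps_.erase(oldest);  // reset the time stamp to undefined
        return true;
    }

private:
    std::uint64_t clock_ = 0;  // "current time": number of stores so far
    std::unordered_map<int, std::uint64_t> stamps_;
};
```

The linear scan in select is a deliberate simplification: with a buffer of fewer than a hundred primitives there are at most that many distinct attribute values, so a more elaborate priority structure is unlikely to pay off.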

4.2 Least-Recently-Used (LRU)

Similar to FIFO, the LRU strategy assigns time stamps to each attribute value stored in the pipeline buffer. Initially, the time stamps of all attribute values are undefined. When a primitive is stored in the pipeline buffer the time stamp of its attribute value is set to the current time. At each buffer overflow the LRU strategy selects the attribute value with the oldest time stamp and resets its time stamp to undefined.

The LRU strategy tries to benefit from the past. However, the LRU strategy and also the FIFO strategy tend to remove primitives too early from the pipeline buffer. Hence, both strategies cannot build large blocks of primitives with the same attribute value if additional primitives with the same attribute value arrive later in the stream.

4.3 Most-Common-First (MCF)

The MCF strategy is a fairly natural strategy in the pipeline buffer scenario. In case of a buffer overflow, the MCF strategy clears as many locations as possible in the pipeline buffer, i.e., it selects an attribute value that is most common among the primitives currently stored in the pipeline buffer.

The MCF strategy also fails to achieve good performance guarantees since it keeps primitives with a rare attribute value in the pipeline buffer for too long a period of time. This behavior wastes valuable storage capacity that could otherwise be used for efficient buffering.

4.4 Round-Robin (RR)

The RR strategy is a very practical variant of the Bounded-Waste (BW) strategy which is introduced and analyzed in a theoretical model [10] (for details see Section 2). Thus, we first present the BW strategy. In the previous sections we saw that a good selection strategy should have the following two properties:
• On the one hand, no primitive should be kept in the pipeline buffer for too long a, possibly infinite, period of time.
• On the other hand, there is a benefit from keeping a primitive in the pipeline buffer if additional primitives with the same attribute value arrive in the near future.

The BW strategy provides a trade-off between the space wasted by primitives with the same attribute value and the chance to benefit from future primitives with the same attribute value. A waste counter is assigned to each attribute value stored in the pipeline buffer. Informally, the waste counter for an attribute value v is a measure for the space that has been wasted by all primitives with attribute value v currently stored in the pipeline buffer. Initially, the waste counters of all attribute values are set to zero. In case of a buffer overflow, the BW strategy increases the waste counter of each attribute value v by the number of primitives with attribute value v currently stored in the pipeline buffer. Then the attribute value with maximal waste counter is selected and this waste counter is reset to zero.

Although the BW strategy achieves the minimal number of attribute changes in our experimental evaluation, the computational overhead of the BW strategy is relatively large. Hence, we designed more efficient versions of the BW strategy.

A randomized version of the BW strategy is the Random-Choice (RC) strategy. The RC strategy chooses a primitive uniformly at random from all primitives currently stored in the pipeline buffer and selects the attribute value of this primitive. Note that the RC strategy can also be seen as a randomized version of the MCF strategy, i.e., the randomized versions of the BW and MCF strategies are identical. However, in the deterministic case, the BW strategy clearly outperforms the MCF strategy. Although the RC strategy is much simpler than the BW strategy, the generation of random numbers requires a lot of runtime.

Finally, we choose the Round-Robin (RR) strategy which is a very efficient variant of the RC strategy. The RR strategy uses a selection pointer to select an attribute value. Initially, the selection pointer points to the first slot of the pipeline buffer. In case of a buffer overflow, the RR strategy selects the attribute value of the primitive the selection pointer points to. Then the selection pointer is shifted in round robin fashion to the next slot in the pipeline buffer. We believe that the RR strategy still has the same properties on typical input sequences as the BW strategy. In our experimental evaluation, the RR strategy achieves almost the same number of attribute changes as the BW strategy.
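
The waste-counter rule of BW and the selection pointer of RR can be contrasted in a short C++ sketch. The class names BoundedWasteSelector and RoundRobinSelector and the select signature are hypothetical, and the buffer is modeled as a vector of integer attribute IDs, one per stored primitive. Two details are assumptions made only for this sketch: ties between equal waste counters are broken in favor of the smallest attribute ID, and the RR pointer wraps around the number of currently occupied slots.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Bounded-Waste sketch: one waste counter per attribute value.
class BoundedWasteSelector {
public:
    // `buffer` holds the attribute ID of every primitive currently stored.
    // Returns false for an empty buffer; otherwise writes the selected ID.
    bool select(const std::vector<int>& buffer, int& selected) {
        if (buffer.empty()) return false;
        // Increase the counter of each value v by the number of buffered
        // primitives that have value v (one increment per primitive).
        for (int v : buffer) ++waste_[v];
        auto best = waste_.begin();
        for (auto it = waste_.begin(); it != waste_.end(); ++it) {
            if (it->second > best->second) best = it;  // ties: smallest ID wins
        }
        selected = best->first;
        best->second = 0;  // reset the waste counter of the selected value
        return true;
    }

private:
    std::map<int, long> waste_;  // missing entry == counter of zero
};

// Round-Robin sketch: a selection pointer that walks over the buffer slots.
class RoundRobinSelector {
public:
    bool select(const std::vector<int>& buffer, int& selected) {
        if (buffer.empty()) return false;
        if (pointer_ >= buffer.size()) pointer_ = 0;  // stay on occupied slots
        selected = buffer[pointer_];
        pointer_ = (pointer_ + 1) % buffer.size();    // shift to the next slot
        return true;
    }

private:
    std::size_t pointer_ = 0;  // initially the first slot
};
```

As a usage example, three successive BW overflows with buffer contents {1,1,1,2}, {2,3,3,3}, and {2,4,4,5} select the values 1, 3, and 2. In the third overflow the rare value 2 has accumulated the largest waste counter and is selected, whereas a most-common rule would pick value 4.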

Scene                               Town         Aphrodite    Bear         Elephant     PowerPlant
# Triangles in view frustum         211589       178068       71384        65296        333585
# Textures (mean size)              430 (128²)*  8 (512²)     14 (1024²)   11 (512²)    17 (mat.)**
# State changes (octree generated)  40341        27395        9215         11171        7529

Remaining state changes
Scene        Town            Aphrodite       Bear            Elephant        PowerPlant
Buffer size  10      30      10      30      10      30      10      30      10      30
RR           11744   7473    5753    2908    2025    1138    2593    1475    3331    2047
MCF          17009   12883   6536    3317    2371    1492    2997    1654    3521    2125
LRU          35862   29465   10539   7748    5613    2904    6103    3206    4358    3286
FIFO         37092   34048   12086   10244   4283    4779    6075    4728    4543    3332

Remaining state changes in per cent
Scene        Town            Aphrodite       Bear            Elephant        PowerPlant
Buffer size  10      30      10      30      10      30      10      30      10      30
RR           29.1%   18.5%   21.0%   10.6%   22.0%   12.3%   23.2%   13.2%   44.2%   27.2%
MCF          42.2%   31.9%   23.9%   12.1%   25.7%   16.2%   26.8%   14.8%   46.8%   28.2%
LRU          88.9%   73.0%   38.5%   28.3%   60.9%   31.5%   54.6%   28.7%   57.9%   43.6%
FIFO         91.9%   84.4%   44.1%   37.4%   46.5%   51.9%   54.4%   42.3%   60.3%   44.3%

Table 1: Statistical overview for different test scenes. State change reductions for buffer size k = 10, 30. *Material attributes are replaced by monochromatic textures. **PowerPlant uses only material attributes. Aphrodite, Bear, Elephant courtesy of Polygon Technology, Darmstadt (www.polygon-technology.de). PowerPlant courtesy of the Walkthrough Group at the University of North Carolina at Chapel Hill (www.cs.unc.edu/~walk).
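
The percentage rows of Table 1 follow from the absolute counts by dividing by the number of state changes of the octree generated sequence. As a worked check for RR on the Town scene (the symbols c_strategy and c_octree are introduced here only for illustration):

```latex
\[
  \text{remaining}\,(\%) = 100 \cdot \frac{c_{\text{strategy}}}{c_{\text{octree}}},
  \qquad
  100 \cdot \frac{11744}{40341} \approx 29.1\% \quad (k = 10),
  \qquad
  100 \cdot \frac{7473}{40341} \approx 18.5\% \quad (k = 30).
\]
```

Read the other way around, RR with k = 30 reduces the state changes of the Town scene by a factor of about 40341/7473 ≈ 5.4 and those of the Aphrodite scene by a factor of about 27395/2908 ≈ 9.4.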

5 Implementation and Results

For our experiments we have implemented a prototype rendering system in C++ using OpenGL routines for the rendering. The pipeline buffer is implemented as a linear array and contains references to primitives, e.g., GL_POINTS, GL_TRIANGLES or indices to vertex arrays. All measurements are done using compiled vertex arrays. Primitives are inserted at the smallest free index. Before a primitive is deleted we swap it with the primitive with the largest index. Since our buffer is rather small no other data structures are needed. When a buffer overflow occurs we first find one primitive that is evicted according to our selection strategy. Then we check for every primitive stored in the buffer whether it has the same attribute value as the evicted one. If it has the same attribute value, we also evict it. Otherwise, it stays in the buffer.

To speed up the comparison of attribute values we assign to each attribute value a unique integer ID. In the case of textures we use the TexID from OpenGL. This way, we can compare two attribute values by a simple integer comparison.

Experimental Setting. Our experiments are made on a Linux-based system with a 2 GHz Pentium IV processor and an NVIDIA GeForce4 Ti graphics card. Our test scenes and their characteristics are summarized in Table 1. In the Town scene we replaced material attributes by monochromatic textures to generate a test scene with a few hundred different textures. In all our scenes the primitives are given in world coordinates and stored in an octree. To generate the input sequence we traversed this octree by an in-order traversal.

We implemented the four strategies discussed in Section 4 and also the BW and RC strategies that are discussed in Section 4.4. Our experiments confirmed that BW, RC, and RR are essentially equivalent in their performance with RR having an edge on the rendering time because of its simplicity. Therefore, in our presentation we selected RR as the representative strategy. We remark that BW achieves up to 3% higher reduction of state changes.

We made experiments for each combination of strategy, scene, and buffer size. Since there is some redundancy in the results we do not give figures for all possible combinations. Instead we try to focus on selected examples and explain the behavior on them.

In Section 5.1 we compare our selection strategies w.r.t. their reduction of state changes. Then we give more detailed results for RR. Finally, in Section 5.2 we discuss the rendering performance of our strategies, again giving a more detailed discussion for RR.

[Figure 2 (plots): (a) state changes vs. buffer size for FIFO, LRU, MCF, and RR; (b) state changes vs. buffer size for Town, Aphrodite, PowerPlant, Elephant, and Bear; (c) rendering time [msec] vs. buffer size for FIFO, LRU, MCF, and RR, together with the levels without pipeline buffer and for a presorted sequence.]

Figure 2: Reduction of state changes: (a) Comparison of the selection strategies (test scene: Town). (b) RR strategy for our test scenes. (c) Rendering time for the Town scene with varying buffer size. The dotted and dashed lines represent the rendering time with no and with optimal state change reduction, respectively.

5.1 State Change Reduction

First we compare our selection strategies w.r.t. their reduction of state changes. We start with the Town scene. In Figure 2(a) we see the number of state changes achieved by the different strategies and for different buffer sizes. The sequence of primitives generated by the in-order octree traversal has 40341 state changes. The highest reduction is achieved by the RR strategy which reduces the number of state changes to 7473 with buffer size 30. The second best strategy is MCF with 12883 changes with buffer size 30 followed by LRU and FIFO. Additionally, the performance of LRU and FIFO strongly depends on the buffer size: A slight variation in the buffer size may increase or decrease the number of state changes significantly. The other strategies are more robust concerning this phenomenon.

For the other scenes the characteristics of the curves are similar to Figure 2(a). However, the relative performance of the strategies depends on the scenes. Therefore, we present the full set of curves only for RR (see Figure 2(b)). For all other strategies we just give results for buffer sizes 10 and 30 for the other test scenes in Table 1.

It turns out that for some scenes the performance of MCF is closer to RR than for the Town scene. This can be explained by the fact that the Town scene contains some textures that rarely occur in the scene. Hence, MCF does not evict primitives with these rarely occurring textures from the buffer for a long time. These primitives block valuable buffer space and this has a negative effect on the performance of MCF on the Town scene.

Even for small buffer sizes a significant reduction is achieved. Our investigation reveals that most spatial sorting creates long sequences of one attribute value interrupted by single different values, like
AAA B AAAAAAAAAA C AAAAAAAAAAAA B AAAAAAAAA BB AAAAAAAAAA.
Even with a buffer size of k = 3 the resulting sequence, e.g.,
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA BBBB C AAAAAAAAAA,
contains fewer than half as many state changes (4 blocks of equal attribute value instead of 9).

Summarizing, we can see that RR clearly outperforms all other strategies w.r.t. the reduction of state changes. Even though MCF achieves comparable results for some scenes it can be worse by more than a factor of 1.5 as the Town scene shows.

5.2 Performance

Since all strategies are fairly simple, it is no surprise that the reduction of state changes results in faster running times. As an example, Figure 2(c) illustrates the rendering time for the four selection strategies on the Town scene. We also give the rendering time without pipeline buffer on the input sequence (generated by octree traversal) and on a presorted sequence, which is the optimum performance we can achieve. It can be seen that RR achieves the best performance among our strategies, improving between 28.8% and 35% on the rendering time without pipeline buffer for textured scenes (see Table 2). We also observe that the performance of RR

is very close to the performance of a presorted sequence without pipeline buffer. Note that RR needs 3-6 ms (for buffer size 2) to 3.5-7 ms (for buffer size 200) for our test scenes if rendering is turned off. Hence, the CPU time of RR is less than 10% of the total running time.

Scene               Town     Aphrodite  Bear     Elephant  PowerPlant
without buffering   145 ms   148 ms     59 ms    60 ms     115 ms
RR (k = 30)         101 ms   101 ms     42 ms    39 ms     107 ms
presorted           93 ms    96 ms      39 ms    36 ms     101 ms
speed-up RR         30.3%    31.8%      28.8%    35.0%     7.0%
speed-up presorted  35.9%    35.1%      33.9%    40.0%     12.2%

Table 2: Rendering time of the RR strategy with buffer size 30.

We also observe that below a certain threshold the number of state changes seems to have no more influence on the running time. We believe that from this point the performance is limited by a different bottleneck. Therefore, according to our experiments a buffer size of 30 is sufficient to achieve almost optimal performance.

Finally, we observe that the speed-up for the PowerPlant scene is smaller than the speed-up for other scenes. The reason is that the PowerPlant scene has no textures. In this scene a state change corresponds to a change of material. Such a change is less expensive than a change of texture. We also remark that in general a change of the shader program is more expensive than the change of a texture. Thus we can expect a better gain of performance in scenes with shader programs.

We conclude that using a pipeline buffer of size 30 with the RR strategy a spatially sorted sequence of primitives can be rendered in almost the same time as if it was sorted by attribute value.

6 Conclusion

We show that our pipeline buffer with the Round-Robin strategy can be used to reduce the number of state changes in the rendering process by an order of magnitude. In comparison to an unsorted sequence our approach reduces the rendering time by up to 35%. Our approach almost achieves the optimal performance of a presorted sequence. It is easy to implement, fast and generic, i.e., it can be used to reduce any type of state change and it can be easily integrated into existing rendering systems. It would be interesting to evaluate the performance of our strategy if it is realized in hardware.

References

[1] T. Aila, V. Miettinen, and P. Nordlund. Delay streams for graphics hardware. ACM Transactions on Graphics, 22(3), pages 792–800, 2003.
[2] OpenGL Architecture Review Board. OpenGL Reference Manual: The Official Reference Document to OpenGL, Version 1.2. Addison-Wesley, 1999.
[3] I. Buck, G. Humphreys, and P. Hanrahan. Tracking graphics state for networked rendering. In Proc. of ACM SIGGRAPH / EUROGRAPHICS Workshop on Graphics Hardware, pages 87–95. ACM Press, 2000.
[4] L. Carpenter. The A-buffer, an antialiased hidden surface method. In Proc. of ACM SIGGRAPH 84, pages 103–108. ACM Press, 1984.
[5] D. Cohen-Or, Y. Chrysanthou, C. Silva, and F. Durand. A survey of visibility for walkthrough applications. IEEE Transactions on Visualization and Computer Graphics, 9(3):412–431, 2003.
[6] S. R. Coorg and S. J. Teller. Real-time occlusion culling for models with large occluders. In Proc. of ACM Symposium on Interactive 3D Graphics, pages 83–90, 189. ACM Press, 1997.
[7] N. Greene, M. Kass, and G. Miller. Hierarchical Z-buffer visibility. In Proc. of ACM SIGGRAPH 93, pages 231–238, 1993.
[8] P. Lalonde and E. Schenk. Shader-driven compilation of rendering assets. In Proc. of ACM SIGGRAPH 2002, pages 713–720. ACM Press, 2002.
[9] Microsoft. The Microsoft DirectX 9 Programmable Graphics Pipeline. Microsoft Press, 2003.
[10] H. Räcke, C. Sohler, and M. Westermann. Online scheduling for sorting buffers. In Proc. of the 10th European Symposium on Algorithms (ESA), pages 820–832. Springer Verlag, Berlin, 2002.
[11] J. Rohlf and J. Helman. IRIS Performer: A high performance multiprocessing toolkit for real-time 3D graphics. In Proc. of ACM SIGGRAPH 94, pages 381–394. ACM Press, 1994.
[12] D. Shreiner. Performance OpenGL: Platform independent techniques. In course notes of the ACM SIGGRAPH 2001, 2001.
[13] C. M. Wittenbrink. R-buffer: A pointerless A-buffer hardware architecture. In Proc. of ACM SIGGRAPH / EUROGRAPHICS Workshop on Graphics Hardware, pages 73–80. ACM Press, 2001.
[14] H. Zhang, D. Manocha, T. Hudson, and K. Hoff. Visibility culling using hierarchical occlusion maps. In Proc. of ACM SIGGRAPH 97, pages 77–88. ACM Press, 1997.
