An Algorithm for Computing 2-Dimensional Voronoi Diagrams with the OpenGL Graphics Pipeline
Sabrina Templeton
A Thesis
Presented to the Faculty of the Computer Science Department
of Middlebury College
May 2022
ABSTRACT
The ability to create fast and physically-accurate fluid simulations is highly sought-after
in computer graphics. One method for creating effective fluid simulations relies on
Voronoi diagrams. Voronoi diagrams result from the division of space into cells, or re-
gions, where each cell is generated from a point, known as a site. When used in fluid
simulations, each cell gives way to a particle for the simulation. We propose a novel
way to efficiently compute Voronoi diagrams on the GPU by using a geometry shader
within the OpenGL pipeline to compute each Voronoi cell. Other methods for com-
putation of Voronoi diagrams on the GPU exist which use technologies such as CUDA
and OpenCL, but our method differs from these methods in that we implement it within
the OpenGL pipeline. This difference allows for the ability to directly visualize the re-
sultant Voronoi diagram without reliance on additional tools. The ability to compute the
diagrams in parallel on the GPU resulted in a speed-up of two orders of magnitude over
a serial CPU implementation for Voronoi diagram generation, and one order of magni-
tude over the same implementation when run in parallel on the CPU. This speed-up is
especially significant to the application of power diagrams in fluid simulations, as those
diagrams must be recomputed multiple times for every frame of a simulation.
ACKNOWLEDGMENTS
I would like to thank my advisor Philip Caplan, without whom this project would not be
possible, for all his extremely helpful guidance and advice. I would also like to thank
Professor Andrews and Professor Dickerson for providing invaluable feedback on this
thesis. Additionally, I would like to thank my fellow CSCI 0702 students, who fostered
a wonderful sense of community within our class. Finally, I would like to thank my
friends and family for their continuing support throughout this process.
TABLE OF CONTENTS
1 Introduction
  1.1 Motivation and Background
  1.2 Contributions
2 Methods
  2.1 Preliminaries
    2.1.1 Voronoi Diagrams
    2.1.2 OpenGL Pipeline
    2.1.3 Ability for Parallel Computation
  2.2 Clipping Algorithm
    2.2.1 Calculation of Nearest Neighbors
    2.2.2 Algorithm Description
    2.2.3 Triangulation of the Result
  2.3 Vertex and Fragment Shaders
3 Results
  3.1 Hardware
  3.2 Baseline Algorithm
  3.3 Timing
  3.4 Comparison
  3.5 Results from NVIDIA Quadro P4000
  3.6 Image Quality
  3.7 Voronoi Diagrams
    3.7.1 White Noise Diagrams
    3.7.2 Uniformly Distributed Diagrams
4 Conclusion
  4.1 Future Work
Bibliography
LIST OF TABLES
3.1 Comparison to CPU implementation.
3.2 Breakdown of timing in our application.
3.3 Results on NVIDIA Quadro P4000 GPU.
3.4 Image Quality Results.
3.5 Timing comparison of white noise versus regular distributions.
LIST OF FIGURES
3.1 A graph showing the runtime of our algorithm on the NVIDIA Quadro P4000.
3.2 Voronoi diagrams for white noise distributions for 100 and 1000 sites.
3.3 Voronoi diagrams for white noise distributions for 10,000 and 100,000 sites.
3.4 A Voronoi diagram with 1 million points.
3.5 Voronoi diagrams with a regular hexagonal pattern.
3.6 Voronoi diagrams with a regular grid pattern.
4.1 Buffer ping-ponging can be used to adapt our pipeline to fluid simulation applications.
CHAPTER 1
INTRODUCTION
Building realistic and fast fluid simulations is an important problem in computer graph-
ics today. New methods in fluid simulation have emerged which build on Voronoi di-
agrams, and, more recently, power diagrams, which are a generalization of Voronoi
diagrams in which each point is equipped with a weight [1]. Fluid simulations are ani-
mations which attempt to realistically portray real-world fluids, such as water or smoke,
in motion. Such animations are commonly required in the animation industry today,
as well as being fundamental to ocean simulations, which can additionally be used for
environmental modeling [9]. Each cell in a Voronoi diagram is generated from a point,
known as a site, and corresponds to the region of space which is closer to that site point
than to any other site point in the diagram. This relation is shown in Figure 1.1, where
an arbitrary point from the cell in orange is considered and shown to be closest to the
site of the cell of which it is a part.
Fluid simulations rely on the use of many particles, often on the order of hundreds
of thousands or more. These particles need to follow various physical prop-
erties, such as not overlapping [7], and, in the case of incompressible fluid simulations
used to model liquids, maintaining volume [13]. When particles are derived from power
diagrams, those particles follow these properties and can maintain incompressibility [7].
An important task, then, is being able to compute power diagrams with many cells which
could then give way to these particles, and to be able to do so rapidly. In animations and
simulations, speed is especially important. Our project aims to take a step closer to this
goal by creating a way to quickly compute a Voronoi diagram for up to several million
site points.
Figure 1.1: This figure shows a set of five site points, shown in red, and their associated
Voronoi cells. The point in black is part of the cell for site z2, because it is closer to z2
than it is to any of the other sites. This property holds for every point of a Voronoi cell.
Figure made with the aid of desmos.com.
1.1 Motivation and Background
Voronoi-based approaches to fluid simulation were introduced in Whitehurst 1995 [22], which
uses Voronoi cells as part of a technique for compressible fluid simulation. A new tech-
nique for incompressible fluid simulation was proposed in de Goes et al., 2015 [7]. This
technique for fluid simulation relies primarily on a set of 3-dimensional power diagrams
which are calculated at every frame of the animation. These power diagrams then give
way to particles which drive the simulation. Since the calculation of power diagrams
has to be done multiple times for every frame, the speed of each computation is crucial.
The performance of de Goes’ technique is highly constrained by the time it takes for
Figure 1.2: Example of a fluid simulation computed from power diagram cells. (a) Initial state of a fluid simulation generated using the avro codebase. (b) The heavier red liquid falls through the lighter blue liquid. (c) This simulation includes 10K particles and took a few hours to generate.
the power diagram generation step, which they found accounts for 54% of the total time
needed to generate a fluid simulation of 1 million particles [7]. Currently, this technique
relies on the VORO++ library [19] for the calculation of power diagrams, but they call
for a way to calculate power diagrams on the GPU, which our project hopes to provide
a foundation for.
Since the power diagram computation step is the primary contributing factor of the
time it takes to generate a fluid simulation like the one in Figure 1.2, we investigate
a method which computes Voronoi diagrams on the GPU. This method differs from
previous GPU-based techniques [3, 17] in that it explores the possibility of computing
the Voronoi diagrams within the OpenGL pipeline. The OpenGL pipeline is a common
tool used for rendering images or animation in computer graphics. Thus, embedding the
Voronoi diagram calculation within the graphics pipeline would provide a method for
fast computation and visualization of a fluid animation without the need for additional
tools such as CUDA or OpenCL. Due to the speed-up in computation of the diagram
that results from the GPU processing power and the consolidation of the rendering stage
with this computation, exploring this possibility could potentially lead to the ability to
create such fluid simulations in real-time.
Other applications within the field of computer graphics that rely on Voronoi dia-
grams include fracture simulations [20], surface remeshing [10, 14], mosaicing [11], and
many more. Voronoi diagrams have applications beyond the field of computer graphics
as well, such as in aviation for emergency flight planning [8], in geology for the study
of crystal patterns [21], and biology for the study of protein structures [16]. Recently,
Voronoi diagrams have been used in epidemiology in COVID-19 studies to help identify
potential hot spots [4]. Within other areas of computer science, Voronoi diagrams
have had applications in collision detection, cluster analysis, and solving problems such
as the post-office problem [2]. A potential extension of our project could allow for
interactive input and output of coordinates in addition to the visual diagram which
is output by our current program, which then might provide benefits to researchers in
these broader fields as well. However, the primary contribution of our project is in the
speed-up of the computation of the diagram, and is therefore most relevant to those
applications where speed is an important factor, such as with fluid simulations.
The method we chose to explore to increase the speed of the computation is to perform
the computation on the GPU. To best make use of the parallel processing power of
the GPU, we need an algorithm well-suited for computing the cells in parallel. This
rules out the use of common algorithms such as calculating the Voronoi diagram by first
calculating the Delaunay triangulation and then taking the dual [21]. Fragment-based, or
rasterized, methods for computing Voronoi diagrams work by dividing the space into a
grid of pixels, and then iterating over all of the pixels to determine which pixel belongs to
which Voronoi cell, and these methods can be well-suited to fast parallel computation on
the GPU [18]. However, these rasterized cells, even when calculated with an arbitrarily-
high number of fragments, will never result in the precise geometry of a final cell. Our
intent is to build a tool that could be applied to fluid simulation methods, so getting
the precise geometry is necessary, since particles of the simulation are derived directly
from the area or volume of the cell. Instead of either of these methods, we will use
an algorithm which is both well-suited to parallel computation and which results in the
precise geometry of the cell.
One example of a successful GPU-based application is given by Ray and Lévy [17],
who describe a method for computing meshless Voronoi diagrams on the GPU using CUDA.
Their method is applicable to the three-dimensional case: the mesh data is not stored,
but the tetrahedra and cell geometry, which are particularly useful in fluid simulation,
are computed on the fly. The study finds a speed-up of one order of magnitude
as compared to the state-of-the-art CPU implementations. They were able to perform
clipping in around 0.3 seconds for 1 million points, and 3.3 seconds for 10 million
points, in the case where sites are generated in a completely random, or white noise,
distribution [17]. This study demonstrates the potential gain in time that comes from
moving the computation to the GPU. Building on this work, Basselin et al. [3] report a
speed-up over CPU-based methods when calculating traditional Voronoi diagrams and
a 50 times speed-up over CPU-based methods when calculating restricted power dia-
grams. Their algorithm runs in 6.7 seconds for power diagrams computed from white
noise distributions in the 10 million point case [3]. They rely on OpenCL to run their
code on the GPU. This study demonstrates the potential of faster power diagram calcu-
lation for fluid simulations.
As mentioned, we will use the OpenGL pipeline rather than tools such as CUDA or
OpenCL to perform the computation of the Voronoi cells in parallel on the GPU. The
programmable pipeline, initially introduced in OpenGL 2.0, allows fully customizable
shader programs to be written in GLSL, the OpenGL Shading Language [23]. These
shader programs allow for extensive customization of pipeline
functionality. The OpenGL 4.0 pipeline, the most recent major version at the time of
writing, can include shader programs for geometry and tessellation stages in addition to
the vertex and fragment shaders. We will describe this pipeline in more detail in Section
2.
1.2 Contributions
Our project explores the possibility of performing fast, GPU computation within the
OpenGL pipeline. This method is motivated by the demand for faster computation of
realistic fluid simulations, and explores whether a potential extension of this OpenGL-
based method could lead to interactive or real-time fluid simulations. If fluid simulations
were to be created using this method, they would not rely on additional tools such as
OpenCL or CUDA for fast, GPU computation, but rather would utilize the flexibility of
the OpenGL pipeline to perform the power diagram calculation and then visualize the
result within the same pipeline. We compare the run time of our implementation against
a CPU-based approach, demonstrating the significant speed-up that comes with this
GPU-based approach. Our implementation also has the advantage of the ability for the
resulting Voronoi diagram to be directly rendered and visualized after its calculation,
without additional tools.
CHAPTER 2
METHODS
In this chapter, we will cover our methods for our implementation of Voronoi diagram
generation in the OpenGL pipeline. We will first review the background necessary for
understanding our methods, such as some further introduction to Voronoi diagrams, a
description of the OpenGL pipeline, and the properties and theorems that make it possible
to perform these computations in parallel. Then we will describe the primary clipping al-
gorithm, which is performed for each cell. We will also touch on necessary components
of this algorithm such as a triangulation step and the prerequisite calculation of nearest
neighbors. In addition to describing the algorithm, we will also provide the geometry
shader code in Listing 2.1, which is the stage within the OpenGL pipeline where this
algorithm takes place. Finally, we will describe the vertex and fragment shaders of the
OpenGL pipeline.
2.1 Preliminaries
Before diving into our main algorithm and specifics of our implementation, we will
cover some of the prerequisite structures that our methods rely on.
2.1.1 Voronoi Diagrams
A Voronoi cell is defined as the set of all points $\vec{p}$ such that the distance between
$\vec{p}$ and the site $\vec{z}_i$ is no greater than the distance between $\vec{p}$ and
$\vec{z}_j$ for all $\vec{z}_j$ in the set of sites. This definition is denoted mathematically
for the two-dimensional case by $\{\vec{p} \in \mathbb{R}^2 : \|\vec{z}_i - \vec{p}\| \le \|\vec{z}_j - \vec{p}\| \;\forall \vec{z}_j \in S\}$,
where $S$ is the set of all sites. Note that the scope of this study is limited to the
two-dimensional case.
(a) Voronoi diagram from randomly generated sites. (b) Voronoi diagram from a structured grid of sites. (c) Voronoi diagram from a uniform distribution of sites where offsets are added to every other row, forming hexagonal cells.
Figure 2.1: Examples of Voronoi diagrams generated from random and non-random
distributions.
The process of generating a Voronoi diagram starts with the generation of sites.
The Voronoi cells are then calculated based on these sites, with one Voronoi cell being
generated for each site. The cells of a Voronoi diagram are always convex polygons.
If the sites are placed regularly, the diagram can consist of cells which are square or
hexagonal. Figure 2.1 shows the difference between a Voronoi diagram which was
generated from 10 randomly placed sites (Figure 2.1a) and those which are generated
from sites arranged in a structured way (Figure 2.1b, Figure 2.1c).
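For concreteness, the white noise case is straightforward to reproduce; the following is a minimal C++ sketch (illustrative only, not taken verbatim from our codebase) that fills the unit square with random sites, stored as interleaved (x, y) pairs.

#include <random>
#include <vector>

// Generate n white noise sites in the unit square [0,1]^2, stored as
// interleaved (x, y) coordinate pairs.
std::vector<double> random_sites(int n, unsigned seed = 0) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> coord(0.0, 1.0);
    std::vector<double> xy(2 * static_cast<size_t>(n));
    for (double& c : xy) c = coord(rng);
    return xy;
}

Structured distributions are generated analogously by placing sites on a grid, with the hexagonal case adding a half-cell offset to every other row.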
2.1.2 OpenGL Pipeline
Our project utilizes the OpenGL pipeline to perform the algorithm for computation of
Voronoi cells on the GPU. A common use case of the OpenGL pipeline is exemplified
in Figure 2.2a, which shows a pipeline with only a vertex and fragment shader. The
vertex shader here deals with transforming the object, here a cube, being rendered into
the screen space. Then it is rasterized, or broken into fragments, which are then passed
to the fragment shader which assigns color to each fragment.
(a) An example of an OpenGL graphics pipeline which contains a vertex shader and a fragment
shader. The vertex shader processes the vertex data and transforms it. Then it undergoes a
rasterization stage, where primitives are broken down into fragments which are then handled by
the fragment shader, which assigns color to each fragment.
(b) The OpenGL graphics pipeline we use. The vertex shader is present but performs no com-
putation. The sites are passed into the geometry shader, the algorithm is run, and then primitives
are output for each cell. Then it passes through the rest of the pipeline in a typical manner.
Figure 2.2: The top panel shows a commonly used graphics pipeline which includes
only a vertex shader and a fragment shader. The bottom panel shows the pipeline as we
use it in this project, with the geometry shader playing the primary part.
The full OpenGL 4.0 pipeline consists of five programmable shaders. When all five
shaders are present, the vertex shader is first in the pipeline, followed by the two stages
relating to the tessellation shader, control and evaluation, respectively. Following that is
the geometry shader stage and the fragment shader stage. Not all stages of this pipeline
are required for every project, but at the very least, a vertex and fragment shader are
required. No matter which subset of these stages are present, the order of the pipeline is
maintained. For instance, the vertex shader always precedes the fragment shader.
The vertex shader executes once for each vertex. Since each vertex is processed
completely independently of the other vertices, they can be processed in parallel if there
is access to more than one thread [23]. Vertex shaders are always necessary as part of the
OpenGL pipeline. However, oftentimes in applications which use a tessellation shader
or geometry shader to create or edit the mesh, the vertex shader can be simplified and
considered ‘pass-through,’ since it remains a part of the pipeline but does not do any real
work. The vertex shader that we use is an example of a pass-through shader.
The tessellation shader has many uses such as subdividing a mesh [23]. Subdivi-
sion is the process of taking a mesh and breaking it into smaller triangles, with the end
product typically being a smoother mesh. Another example of a useful application of
tessellation shaders is a study which looked at improving real-time rendering of models
of the Earth [15]. This study uses the tessellation shader to tessellate the terrain more
quickly on a GPU. However, for the purpose of our project, all we need is to generate the
geometry of the Voronoi diagram, and this can be done in the geometry shader. There-
fore, we do not need to include the tessellation shader in our pipeline. This then reduces
our pipeline to the one that was originally introduced with OpenGL 3.2: a
vertex shader, followed by a geometry shader, followed by a fragment shader.
Typically, between the vertex and geometry shader stages, the vertices are assembled
into primitives. In our pipeline, the primitives are emitted as the output of the geometry
shader, as shown in Figure 2.2b. The fragment shader is then run independently for
each fragment produced by the rasterization of these primitives. The cell data is never
stored anywhere other than locally in the geometry shader, but instead is directly visualized.
As mentioned earlier, the majority of our project is written inside the geometry
shader. The geometry shader is able to add, delete, or modify primitives. In our ap-
plication, we use it to add primitives, allowing us to calculate the Voronoi cells within
the geometry shader and then send them through the rest of the pipeline to be rendered.
Since Voronoi cells follow the property of always being convex polygons, we can easily
and reliably break them into triangles so that they can be emitted from the geometry
Figure 2.3: The radius of security here is shown as the outer black circle. It is calculated
by doubling the radius, r, of the inner bounding circle, represented by the dashed circle.
Conceptually, this theorem makes sense, as any site points which lie outside the radius
will lead to perpendicular bisectors which will never cross the inner circle. This figure
was created with the aid of desmos.com.
shader.
2.1.3 Ability for Parallel Computation
The calculation of Voronoi cells is not dependent on any of the geometric information
from the neighboring Voronoi cells. Instead, it is reliant on the nearest neighbor infor-
mation, or the closest site points to the current site. That is to say, to calculate each cell,
we must know the location of the current site along with the locations of the nearest
sites, which need to be sorted in order of increasing distance. We can calculate this in-
formation for every site on the CPU side and then pass it in as a texture to the geometry
shader so that each site can access this nearest neighbor information. We can then clip
our Voronoi cell against these neighbors until the final geometry is reached.
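To make the phrase "pass it in as a texture" concrete, the following C++ sketch (our own illustration, assuming an OpenGL 4.1 context with loaded function pointers, and not necessarily identical to our actual setup code) uploads the flat neighbor index array as a buffer texture that the geometry shader reads through the usamplerBuffer nn of Listing 2.1.

#include <cstdint>
#include <vector>
// Assumes GL headers/function pointers are available (e.g. via GLEW or glad).

// Upload the flat neighbor index array (nb_sites * nb_neighbors entries, each
// site's neighbors sorted by increasing distance) as a buffer texture. The
// geometry shader reads entry (site_idx * u_nb_neighbors + i) of this texture
// with texelFetch.
GLuint upload_neighbor_indices(const std::vector<uint32_t>& nn_indices) {
    GLuint buf = 0, tex = 0;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_TEXTURE_BUFFER, buf);
    glBufferData(GL_TEXTURE_BUFFER, nn_indices.size() * sizeof(uint32_t),
                 nn_indices.data(), GL_STATIC_DRAW);
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_BUFFER, tex);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_R32UI, buf); // one uint index per texel
    return tex; // bind to a texture unit and point the 'nn' sampler uniform at it
}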
Clipping against every nearest neighbor would be incredibly inefficient as our algorithm
scales to diagrams of over a million points. Therefore, to avoid looping through an unnec-
essarily large number of nearest neighbors, we will use the radius of security theorem
proposed by Lévy and Bonneel in their 2012 paper [14]. This theorem provides a simple
cutoff point for stopping the clipping process. The theorem demonstrates that there is a
radius—the radius of security—which is equal to two times the radius of the bounding
circle of the current polygon such that clipping against any neighbors outside of this
radius will not affect the current cell. Given the radius of the current bounding circle of
the polygon as $R_k$, this theorem states that for all nearest neighbors $\vec{z}_j$ such that
$\|\vec{z}_j - \vec{z}_i\| > 2R_k$, the bisector of $(\vec{z}_i, \vec{z}_j)$ is non-contributing [14]. $R_k$ is equal to the distance
between the current site and the furthest vertex of the polygon, and is shown as r in
Figure 2.3. Once this radius has been reached, we can stop visiting nearest neighbors
which fall outside this radius.
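Stated compactly, if $P_k$ is the current polygon and $R_k = \max_{\vec{v} \in P_k} \|\vec{v} - \vec{z}_i\|$, then
$$\|\vec{z}_j - \vec{z}_i\| > 2R_k \implies \text{clipping against } \vec{z}_j \text{ leaves } P_k \text{ unchanged}.$$
Note that in Listing 2.1 the factor of 2 appears as 2.1, presumably a small safety margin against floating point error.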
2.2 Clipping Algorithm
For our project, we needed an algorithm which can run in parallel for every cell in order
to fully benefit from the increased parallel computation power of the GPU. We can rely
on the fact that we are calculating a Voronoi diagram restricted to a square. Each cell
is then initialized to the square and progressively clipped based
on the locations of the nearest neighbors until the final geometry of the cell is computed.
This algorithm is often known as half-space clipping, but we refer to it in this paper as
the clipping algorithm. Determining when the cell has reached the final stage is made
simple with the radius of security theorem, described above. If each nearest neighbor is
visited in order of ascending distance, as soon as a nearest neighbor is visited that falls
outside of the radius of security and that no longer affects the geometry of the cell, it
must be true that none of the remaining nearest neighbors will have any effect on the cell.
2.2.1 Calculation of Nearest Neighbors
The nearest neighbors are calculated on the CPU side for all sites at once, and then
the nearest neighbor information for the entire diagram is passed into the geometry
shader. We use a function from Geogram [12] to perform the calculation of the nearest
neighbors, which helps speed this step up. The calculation of nearest neighbors is also
highly parallel and has the potential for
computation on the GPU. However, this goes beyond the scope of our project, which
instead uses a parallel CPU computation for the nearest neighbors.
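For illustration, a brute-force stand-in for this step is sketched below (our own sketch; Geogram uses an accelerated search structure rather than this O(n² log n) loop). It produces the flat, distance-sorted index array described above, with each site appearing as its own first neighbor.

#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// For each of the n sites (interleaved x,y coordinates), return the indices of
// its k nearest sites, sorted by increasing distance. Entry i*k + 0 is site i
// itself, matching the layout the geometry shader expects. Assumes k <= n.
std::vector<uint32_t> knn_indices(const std::vector<double>& xy, int k) {
    const int n = static_cast<int>(xy.size() / 2);
    std::vector<uint32_t> result(static_cast<size_t>(n) * k);
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) {
        std::iota(order.begin(), order.end(), 0);
        auto d2 = [&](int a) { // squared distance from site a to site i
            const double dx = xy[2 * a] - xy[2 * i];
            const double dy = xy[2 * a + 1] - xy[2 * i + 1];
            return dx * dx + dy * dy;
        };
        std::partial_sort(order.begin(), order.begin() + k, order.end(),
                          [&](int a, int b) { return d2(a) < d2(b); });
        for (int j = 0; j < k; ++j)
            result[static_cast<size_t>(i) * k + j] = static_cast<uint32_t>(order[j]);
    }
    return result;
}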
2.2.2 Algorithm Description
The clipping algorithm is implemented inside the geometry shader, and is therefore
written in GLSL, which is the OpenGL shading language. The algorithm proceeds as
follows, and is also shown in the geometry shader code included below in Listing 2.1.
For each site, the algorithm starts by initializing a polygon to the bounding box of
the domain, as it does in lines 83-87 of Listing 2.1. This initialization is also shown
in Figure 2.4a. Then, it loops through all the nearest neighbors. The nearest neighbors
are passed in from the CPU through a texture which is defined in line 13. This texture
contains the nearest neighbor information for every site: for each site, indices for the
corresponding nearest neighbors are listed in ascending order of distance. In order to
find the correct nearest neighbor information for the current site, we multiply the site in-
dex by the number of nearest neighbors to access the correct portion of the information,
which is done in line 90, and then use this index to get the site in line 92. We can then
loop through the nearest neighbors of the current site, which we do in line 89, skipping over the
first nearest neighbor because the first nearest neighbor is always the current site itself.
The site index that we use both to look up the nearest neighbor information and
to set the current seed is passed to the geometry shader from the vertex shader. It is
received in line 9 of the geometry shader as a flat integer array called instance_ID.
The first and only value of this array is the site index, which is conceptually equivalent to
gl_PrimitiveIDIn, a built-in variable in OpenGL which represents the index of the
current primitive for which the geometry shader is executing. However, although using
gl_PrimitiveIDIn is conceptually equivalent to using the site index in the manner
that we use it, the former caused issues in our implementation.
Once we have identified the current site, zi , and the current nearest neighbor, zj ,
we can proceed to clip the polygon. In our implementation, this is written inside a
calc_poly function which is defined on line 31 and called on line 93. Each time this
function is called, we perform a clip on the polygon. As shown in Figure 2.4b, the first
stage of clipping cuts the polygon along the perpendicular bisector of the line between
zi and zj1. To implement this, we want to update the polygon: remove any vertices,
such as $\vec{p}_3$ in Figure 2.4b, which will no longer be part of the polygon after the
clip, and add in the two vertices, $\vec{q}_0$ and $\vec{q}_1$ in the figure, which are the
intersection points of the clipping plane and the edges of the polygon. When an edge
crosses the clipping plane, one of its two endpoints will remain inside the
polygon as a vertex after the clipping and the other one will be outside and subsequently
discarded. Thus, we have to determine whether v0 or v1 is ‘inside’ or ‘outside’. We do
(a) Initial polygon is equal to the bounding domain. (b) The first stage of clipping cuts between zi and zj1. (c) The second stage of clipping cuts between zi and zj2. (d) The third stage of clipping cuts between zi and zj3. (e) The fourth and final stage of clipping cuts between zi and zj4. (f) The resultant diagram after clipping has completed for all site points.
Figure 2.4: The clipping algorithm is shown here on an example diagram. The first five
panels show the process to calculate the current cell by clipping against each nearest
neighbor, and the sixth cell shows the resultant diagram after clipping is completed for
all cells. Figures taken from lecture slides [5].
so on lines 50 and 53 of the geometry shader code shown in Listing 2.1. To determine
this, we can borrow from our definition of a Voronoi diagram, remembering that all
points in the cell are closer to the site of that cell than they are to any of the nearest
neighbors. It follows that in order for a vertex to be inside at any given clipping stage,
the vertex must be closer to the current site than it is to the nearest neighbor that we are
currently clipping against.
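In symbols, the test performed at lines 50 and 53 of Listing 2.1 for a vertex $\vec{v}$ is simply
$$\vec{v} \text{ is inside} \iff \|\vec{v} - \vec{z}_i\| < \|\vec{v} - \vec{z}_j\|.$$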
We now have four cases to consider: either both vertices are outside, the first vertex
is inside and the second vertex is outside, the first vertex is outside and the second vertex
is inside, or both vertices are inside. The cases where the two vertices are either both in-
side or both outside are the easiest to consider as we do not calculate the intersection in
this case. At each run of the loop over the vertices, we only want to determine whether
or not to discard the first vertex at which we are looking, v0, as we will then deal with
the second vertex, v1 in the next iteration of the loop, when that vertex becomes v0.
As such, in the case where both vertices are outside, we discard v0—strictly speaking,
we refrain from adding it to the new polygon which we are maintaining, thereby doing
nothing at this step, which is demonstrated by an unwritten else case at line 73. A visual
example of this case is in the second stage of clipping in Figure 2.4c, where $\vec{q}_1$ and
$\vec{q}_2$ are both discarded. In the case where both vertices are inside, we add v0 to the new
copy of the polygon, since we will add v1 at the next iteration of the loop—this case is
shown in line 68. This case is also exemplified in Figure 2.4c, for $\vec{p}_0$ and $\vec{p}_1$. Then, in
the other two cases, we know there is an intersection between the two current vertices,
a case which is shown in Figure 2.4c for $\vec{p}_1$ and $\vec{p}_2$. We calculate this intersection by
calling the calc_intersect function, which calculates the intersection between a
line segment and the perpendicular bisector between two Voronoi sites using the imple-
mentation found in Geogram [12]. In either case we add the intersection point that is
returned by this function to the new polygon. If the first vertex is inside, we simply add
v0 before adding intersect_pt, and if not, we add only the intersect_pt at this
stage.
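Listing 2.1 omits the body of calc_intersect, but the underlying geometry is simple to state; the following is our reconstruction of the idea, and Geogram's actual implementation may differ in details such as robustness handling. With midpoint $\vec{m} = (\vec{z}_i + \vec{z}_j)/2$ and direction $\vec{d} = \vec{z}_j - \vec{z}_i$, the bisector is the set of points $\vec{x}$ satisfying $(\vec{x} - \vec{m}) \cdot \vec{d} = 0$. Parameterizing the edge as $\vec{x}(t) = \vec{v}_0 + t(\vec{v}_1 - \vec{v}_0)$ and solving for $t$ gives
$$t = \frac{(\vec{m} - \vec{v}_0) \cdot \vec{d}}{(\vec{v}_1 - \vec{v}_0) \cdot \vec{d}}, \qquad \texttt{intersect\_pt} = \vec{v}_0 + t(\vec{v}_1 - \vec{v}_0).$$
The denominator is nonzero whenever $\vec{v}_0$ and $\vec{v}_1$ lie on opposite sides of the bisector, which is the only case in which the function is called.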
Once this loop has run for every vertex we copy the contents of the new polygon into
the current polygon. Since the number of vertices in a cell is not constant, we initialize
arrays with a size of MAX_VERTS, and then keep track of the number of vertices in each
array at any given time. When we copy the contents of the new polygon into the current
polygon, we also update this count to match (lines 75 and 76 of Listing 2.1).
1
2 #version 410
3 #extension GL_ARB_gpu_shader_fp64 : enable
4 layout (points) in;
5 precision highp float;
6 uniform int u_nb_neighbors;
7 uniform mat4 u_ModelViewProjectionMatrix;
8 const int MAX_VERTS = 30;
9 flat in int[] instance_ID;
10 out float v_Index;
11 layout (triangle_strip, max_vertices = 40) out;
12 uniform samplerBuffer seeds; // the buffer which contains the sites
13 uniform usamplerBuffer nn;
14 void render_poly(vec3[MAX_VERTS] poly, int space){
15 // This function triangulates the polygon and emits the triangles
16 vec3 p0 = poly[0];
17 for (int i = 0; i < space -2; i++){
18 gl_Position = u_ModelViewProjectionMatrix*vec4(p0, 1.0);
19 v_Index = float(instance_ID[0]); // set v_Index for use in the fragment shader
20 EmitVertex();
21 gl_Position = u_ModelViewProjectionMatrix*vec4(poly[i+1], 1.0);
22 v_Index = float(instance_ID[0]);
23 EmitVertex();
24 gl_Position = u_ModelViewProjectionMatrix*vec4(poly[i+2], 1.0);
25 v_Index = float(instance_ID[0]);
26 EmitVertex();
27 EndPrimitive();
28 }
29 }
30 void
31 calc_poly(vec3 zi, vec3 zj, inout vec3[MAX_VERTS] curr_poly, inout int cspace){
32 // where zi is the current site and zj is the current nearest neighbor
33 // cspace represents the actual length of the items in curr_poly
34 int nspace = 0; // nspace keeps track of the next available space in new_poly
35 vec3 new_poly[MAX_VERTS];
36 for (int i = 0; i < cspace; i++) {
37 vec3 v0;
38 vec3 v1;
39 if (i == cspace-1){ // correctly assign the two points for the last edge
40 v0 = curr_poly[cspace -1];
41 v1 = curr_poly[0];
42 }
43 else{
44 v0 = curr_poly[i];
45 v1 = curr_poly[i+1];
46 }
47 int side1 = 0;
48 int side2 = 0;
49 //calculate side 1: determine if the first border point is inside or outside
50 if (distance(v0, zi) < distance(v0, zj)) { side1 = 1; }
51 else { side1 = -1; }
52 //calculate side 2: determine if the second border point is inside or outside
53 if (distance(v1, zi) < distance(v1, zj)) { side2 = 1; }
54 else { side2 = -1; }
55 vec3 intersect;
56 if (side1 != side2){
57 intersect = calc_intersect(zi, zj, v0, v1);
58 if (side1 == 1){
59 new_poly[nspace] = v0;
60 new_poly[nspace+1] = intersect;
61 nspace += 2;
62 }
63 else{
64 new_poly[nspace] = intersect;
65 nspace += 1;
66 }
67 }
68 else if (side1 == 1){ // meaning both sides are inside
69 new_poly[nspace] = v0;
70 // we add just v0 since we will add in the v1 at the next step of the loop
71 nspace += 1;
72 }
73 // otherwise both sides are outside and we take neither
74 }
75 curr_poly = new_poly;
76 cspace = nspace; // update cspace to match the new polygon
77 }
78 void main() {
79 // fetch the current site, taking in the ID which we output from the vertex shader
80 int site_idx = instance_ID[0];
81 vec3 zi = vec3(texelFetch( seeds , site_idx ).xy, 0);
82 // Initialize the polygon to the entire area of the diagram
83 vec3[MAX_VERTS] curr_poly;
84 curr_poly[0] = vec3(0, 0, 0);
85 curr_poly[1] = vec3(1, 0, 0);
86 curr_poly[2] = vec3(1, 1, 0);
87 curr_poly[3] = vec3(0, 1, 0);
88 int space = 4;
89 for (int i = 1; i < u_nb_neighbors; i ++){
90 int nn0 = int(texelFetch( nn, i + (site_idx * u_nb_neighbors)).x);
91 // nn0 is just the index so we need to look up the corresponding point in seeds
92 vec3 zj= vec3(texelFetch( seeds, nn0).xy, 0);
93 calc_poly(zi, zj, curr_poly, space);
94 // Below is the radius of security theorem
95 float radius = 0;
96 // loop over the vertices of the polygon to find the furthest one,
97 // the distance from zi to that point becomes the radius.
98 for (int j = 0; j < space; j++ ) {
99 vec3 curr_pt = curr_poly[j];
100 if(distance(curr_pt, zi) > radius) {
101 radius = distance(curr_pt, zi);
102 }
103 }
104 // If the current nearest neighbor is outside radius of security, stop clipping
105 if (distance(zj, zi) > (2.1 * radius)){ break; }
106 }
107 render_poly(curr_poly, space);
108 }
Listing 2.1: The code for the geometry shader, which performs the primary algorithm.
2.2.3 Triangulation of the Result
Once we have the geometry of the cell stored as a series of points which define the
corners of the polygon, we must create triangles, or primitives, so that the OpenGL
pipeline can successfully handle the output from the geometry shader. Since the Voronoi
cell is known to always be convex, we are able to easily divide the cell into triangles as
shown in Figure 2.5. We can triangulate the polygon to any one of the vertices of the
polygon, so we arbitrarily choose the first point as the one to triangulate towards. These
triangles are then emitted from the geometry shader and passed through the rest of the
pipeline to be rendered.
Figure 2.5: The triangulation here is shown in white as it would divide up the Voronoi
cell. The first point of the polygon can be used as the point to which we triangulate,
given that Voronoi cells are guaranteed to be convex.
2.3 Vertex and Fragment Shaders
As previously mentioned, the vertex shader of our implementation does almost nothing,
since the diagram is not generated until the geometry shader stage. Thus, our vertex
shader is ‘pass-through,’ and the only notable tasks it handles are setting gl_Position
for the points, and saving the ID of the vertex in instance_ID, passing it to the next
stage—the geometry shader.
The fragment shader manages the assignment of color to the polygons after their
creation. To do so, it looks up the index of the polygon that each fragment belongs to in
the colormap to determine the color, which is passed in from the CPU as a texture in line
11 of Listing A.2. This index is taken in v_Index, which is output from the geometry
shader. Again, this is conceptually the same as what the built-in gl_PrimitiveID provides.
The code for the vertex and fragment shaders is included in full in Appendix A.
CHAPTER 3
RESULTS
3.1 Hardware
Unless otherwise specified, these results were run on a MacBook Pro with an Intel Iris
Plus Graphics 1536 MB graphics chip, which has access to 384 shader cores. This is
directly related to the number of cells we are able to compute in parallel. Note that our
computer is able to run only 8 threads in parallel, which is relevant to the baseline CPU
implementation as well as the computation of the nearest neighbors.
3.2 Baseline Algorithm
As a baseline for our comparison, we ran a Voronoi diagram generation implementation
from the avro library [6], which we ran in parallel and in serial. This
algorithm is primarily intended for calculation of power diagrams, but can be run to
generate Voronoi diagrams, since power diagrams are equivalent to Voronoi diagrams
in the case where all weights are equal. We expected both the parallel and the serial
versions to be slower than our algorithm, and we expected to see a larger difference in
time between the two algorithms as the number of sites for which we were generating
the diagram increased. This difference points to the value and resultant speed-up of be-
ing able to run the calculation in parallel, and with the increased parallel computation
power from the GPU, we expect to see a further speed-up of our algorithm as compared
to both CPU versions.
3.3 Timing
All of the times are the result of averaging the results of ten separate runs of the code in order
to correct for potential differences in individual run times. The timing method which we
use to time avro includes all the calculations necessary to generate the diagram, such
as the calculation of the nearest neighbors. For the timing of our own implementation,
we were able to separately time the CPU-side nearest neighbors implementation and
the GPU-side intersection calculation. This allows us to see the breakdown between the
nearest neighbors calculation and the GPU run time. For a million points, the time it takes
for the nearest neighbor computation alone is under a second. Note however that because
the time it takes for the GPU to run is just under 6 seconds, the nearest neighbor step
accounts for almost 11 percent of the total run time in the one million point case
(0.719 / (5.91 + 0.719) ≈ 0.11; see Table 3.2).
In order to fully compare against CPU-based methods such as avro, we
forced the program to run in serial on one thread to show the time differential between
the serial and parallel approaches on the CPU. As expected, the algorithm is significantly
slower when run in serial. The results of running the algorithm in serial and in parallel
can be compared in Table 3.1. As expected, the speed difference between the parallel
and serial implementations of this is large, especially on diagrams with higher numbers
of sites. For one million sites, the parallel implementation is almost 8 times faster.
3.4 Comparison
Before comparing the two algorithms directly, we want to acknowledge that there are a
few key differences between the ways avro and our project are implemented that could
conceivably slow down the avro implementation as opposed to ours. avro is designed
to be dimension-independent, which may account for some of the difference that we see
in run time. In addition, avro divides the plane into two separate triangles and calcu-
lates the Voronoi diagram separately for each one. This means that the cells which fall on
the diagonal edge between the two triangles have to be calculated twice. Furthermore,
avro uses exact geometric predicates, which are necessary for being able to calculate the
dual Delaunay Triangulation. However, this extra step in calculation could slow down
the overall run time of the avro implementation. While all of these are potential fac-
tors in the overall run time of avro, we expect that these are minor contributing factors
to the difference that we see in run times between the two implementations. The main
factor for this discrepancy is likely the increased parallel computation power that comes
from running our implementation on the GPU, but future work would have to be done to
compare our algorithm to a more exact counterpart on the CPU in order to truly confirm
this.
Table 3.1: Comparison to CPU implementation: This table compares avro’s CPU-
based algorithm with our GPU implementation. All times are given in seconds. n rep-
resents the number of sites, avro-1 refers to the algorithm when forced to run in serial
on one thread, and avro-8 shows results from the algorithm when run in parallel on 8
threads.
Overall, our GPU implementation is consistently faster than both the parallel and
the serial run times of the avro implementation on the CPU, as shown in Table 3.1.
This difference becomes especially apparent as the number of points increases. For
the one million point Voronoi diagram, the serial implementation on the CPU takes
683 seconds, or around 11 minutes. avro is faster when run in parallel and takes only
around 90 seconds to generate the same diagram. Our implementation takes only around
6.6 seconds to run a diagram for one million points. This is over a 13 times speed-up
over the parallel CPU implementation from avro, and over a 100 times speed-up over
the serial implementation.
Table 3.2: Breakdown of timing in our application: All results are given in seconds.
n       GPU (s)    nn calc (s)
100     0.00175    8.14E-05
1K      0.00911    3.53E-04
10K     0.0846     0.00339
100K    0.636      0.0430
1M      5.91       0.719
3.5 Results from NVIDIA Quadro P4000
We also ran our algorithm on a more powerful GPU to test if the speed of the computa-
tion would improve if not limited by our MacBook’s hardware. The GPU we used for
this set of results is an NVIDIA Quadro P4000 with access to 1792 CUDA cores, which
is significantly more than the 384 shader cores that we had access to with the
MacBook. This computer also had access to 16 threads on the CPU side which results in
a speed-up of the nearest neighbor computation. The results from running our algorithm
are shown in Table 3.3 and visualized in Figure 3.1. We ran white noise, or completely
random, distributions of sites, as well as hexagonal and square distributions, all of up
to four million points. Studying both random and regular distributions allows us to see
differences between bad and good cases. In the regular cases, there are fewer nearest
neighbors to clip against, which allows us to see the timing difference that results from
clipping against up to 30 nearest neighbors in the random case versus fewer than ten in the
regular case. With the increased computation power of this hardware, the nearest
neighbor calculation step becomes a limiting factor in the overall run time.
Table 3.3: Results on the NVIDIA Quadro P4000 GPU with access to 1792 CUDA cores,
for (a) random, (b) square, and (c) hexagonal distributions. All times are shown in
seconds. The nearest neighbor times here are again shown separately from the GPU
portion of our algorithm, which allows us to see that they become a limiting factor in
terms of computation time.
Figure 3.1: Results of our algorithm on the NVIDIA Quadro P4000 in graph form.
The timings for the primary algorithm are on the left and the time for nearest neighbor
calculation on the CPU side is shown on the right.
3.6 Image Quality
The diagrams which we ran in the results above were all 500px by 500px. Final images
included as figures in this paper were rendered at 2000px by 2000px to show the dia-
grams in higher resolution. Experimentation revealed that there does not seem to be a cor-
relation between the resolution and the speed: that is to say, rendering at higher resolu-
tions does not seem to result in a slower calculation time. This was reinforced by averaging
over multiple testing runs to eliminate any discrepancies between runs. The results of this
experiment show that the primary driver of the run time is the geometry shader, which is
to be expected as this is where the majority of the computation is done.
Table 3.4: Image Quality Results: This table compares run times for a white noise
distribution with 1 million sites across different resolutions, and the final column shows
the results with no rendering stage. All times are given in seconds.
3.7 Voronoi Diagrams
As previously mentioned, one advantage of working within the OpenGL graphics pipeline
is that the Voronoi diagram can be directly rendered as part of the final stages of the
pipeline. This leads to the other primary result of this project: the Voronoi diagrams that
we are able to generate and visualize. We study both the white noise case, which is an
irregular and therefore worse case, and more regular distributions such as the hexagonal
and square pattern distributions.
(a) Voronoi diagram with 100 points. (b) Voronoi diagram with 1000 points.
Figure 3.2: Voronoi diagrams for white noise distributions for 100 and 1000 sites.
3.7.1 White Noise Diagrams
Figures 3.2, 3.3, and 3.4 show examples of diagrams output by our algorithm with 100, 1000,
10,000, 100,000, and 1,000,000 sites generated from white noise, or completely random,
distributions.
3.7.2 Uniformly Distributed Diagrams
In addition to being able to generate diagrams from white noise point distributions, we
study two more regular patterns: a hexagonal pattern and a square grid pattern. Images
generated from these distributions are shown in Figure 3.5 and Figure 3.6, respectively.
We also ran results to compare the timings of these distributions to the timing of
the white noise distributions. We hypothesized that the regular diagrams would run in
less time than the random counterparts, since every site is certain to have fewer nearest
neighbors to loop through. Our results for the comparison of the hexagonal to the ran-
dom diagram and the results of the comparison to the square grid are shown in Table
3.5. These results reinforce our hypothesis, as the hexagonal and square diagrams run
(a) Voronoi diagram with 10K points. (b) Voronoi diagram with 100K points.
Figure 3.3: Voronoi diagrams for white noise distributions for 10,000 and 100,000 sites.
in less than two seconds for diagrams of around one million sites. Compared to the
random diagram which takes around 6 seconds for this many points, this is certainly a
significant speed-up.
Note that since the square and hexagonal diagrams need to be run on specific num-
bers of sites in order to generate in a regular way, the square and hexagonal diagrams
cannot be compared directly. For each comparison, the random diagrams were run
on the same number of points. An error consistently occurred when trying to run the
random distribution for the specific number of points that the hexagonal diagram needed
in the 1 million diagram case. However, we can consider in this case that a diagram with
three million points is around 3 times slower to generate for randomly generated sites.
Figure 3.4: A Voronoi diagram with 1 million points.
(a) Hexagonal diagram with 136 points. (b) Hexagonal diagram with 1235 points. (c) Hexagonal diagram with 11,758 points.
Figure 3.5: Voronoi diagrams with a regular hexagonal pattern.
(a) Square diagram with 100 points. (b) Square diagram with 1024 points. (c) Square diagram with 10,000 points.
Figure 3.6: Voronoi diagrams with a regular grid pattern.
Table 3.5: The left table compares the time to run results of a white noise distribution
to the time it takes to run a regular hexagonal distribution, and the right compares the
random and square cases. All times are given in seconds and do not include the time for
calculation of nearest neighbors.
(a) Hexagonal versus random (b) Square versus random
CHAPTER 4
CONCLUSION
Our GPU-based method for computing Voronoi diagrams within the OpenGL pipeline achieves a
significant speed-up over the implementation from the avro code base that we used as a
comparison point. This speed-up could have broad implications for applications in fluid
simulations which build off of power diagrams. These applications could make use of an
extension of this method to compute those diagrams without having to rely on additional
tools since the calculation and rendering stages can both be done within the OpenGL
environment, and this approach may have the potential for real-time fluid simulations.
Beyond fluid simulation, our method could be useful in any
application or instance where Voronoi diagrams need to be computed quickly for a large
number of sites. Additionally, since our implementation is able to render the Voronoi
diagrams directly without extra steps, any case where visualizing the resultant diagram
is desired could benefit as well.
4.1 Future Work
One important area for future work is incorporation into a full fluid simulation, which
would allow for more research to be done in order to determine the specific degree
to which implementation using the OpenGL pipeline in this way might be advantageous
over other methods. In order for this project to be extended to perform a full fluid
simulation, many modifications would need to be made, starting with those extended
capabilities described below.
The first modification would be implementing a dimension-independent algorithm
that would allow for similar computation of three-dimensional and higher dimensional
diagrams, which would be important to support three-dimensional fluid simulations.
A second modification relevant to fluid simulation applications would be to add capabilities for calcu-
lating power diagrams in addition to Voronoi diagrams.
Another modification which would be important to fluid simulations would be to
perform the computation of the nearest neighbors directly on the GPU for each site.
This would speed up this portion of the algorithm, which is currently a bottleneck in
terms of computation time. It would also mean we could more easily test and correct our
calculation if more than thirty nearest neighbors are required to reach the final geometry
of the cell. This would resolve a current limitation of our algorithm where it is possible
for the geometry generated to be incorrect if a site has more than 30 nearest neighbors.
In the case of using transform feedback for fluid simulations, we would use a process known
as buffer ping-ponging. This process is shown in Figure 4.1. Instead of proceeding
through the rasterization step, the output of the vertex shader or geometry shader is sent
back to the CPU side. We would use this process to continually solve for the weights
of the power diagram using an optimal transport problem, which is how conservation of
mass is satisfied in the fluid simulation. Every time the weights are updated, we would
need to replace the sites and recalculate the nearest neighbors. Buffer ping-ponging
allows us to perform these necessary updates, overwrite the information in the buffers,
and then repeat the process as needed until the weights are optimal, at which point we
are ready to render the particles using the pipeline. For an example of how buffer ping-
ponging can be used for fluid simulations, see Chapter 9 of the OpenGL 4.0 Shading
Language Cookbook [23] on animations and particles.
Figure 4.1: This figure shows a typical graphics pipeline with buffer ping-ponging. Our
pipeline would use a similar structure, but with the output coming from the geometry
shader instead of the vertex shader. Figure taken from the OpenGL 4.0 Shading Lan-
guage Cookbook [23].
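As a minimal C++ sketch of what the transform feedback half of this loop could look like: the names program, buf, and the captured varying v_site are hypothetical placeholders, the optimal transport solve is elided, and an OpenGL 4.1 context with loaded function pointers is assumed.

#include <utility>
// Assumes GL headers/function pointers are available (e.g. via GLEW or glad).

void run_ping_pong(GLuint program, GLuint buf[2], GLsizei nb_sites, int nb_iterations) {
    const char* varyings[] = { "v_site" }; // hypothetical captured output variable
    glTransformFeedbackVaryings(program, 1, varyings, GL_INTERLEAVED_ATTRIBS);
    glLinkProgram(program); // captured varyings take effect only after re-linking
    int src = 0, dst = 1;
    for (int it = 0; it < nb_iterations; ++it) {
        // ... bind buf[src] as the vertex attribute source for the sites ...
        glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, buf[dst]);
        glEnable(GL_RASTERIZER_DISCARD);     // skip rasterization while iterating
        glBeginTransformFeedback(GL_POINTS); // must match the captured primitive type
        glDrawArrays(GL_POINTS, 0, nb_sites);
        glEndTransformFeedback();
        glDisable(GL_RASTERIZER_DISCARD);
        // ... read back buf[dst], update weights, replace sites, redo neighbors ...
        std::swap(src, dst); // ping-pong: this pass's output feeds the next pass
    }
}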
Future research should also be done to make a comparison between our algorithm and an
approach which uses a tool such as CUDA or OpenCL for
the computation and then visualizes the diagram using OpenGL.
direct comparison to measure if the use of OpenGL for the entire program has any speed
advantage over alternative GPU approaches for generating Voronoi diagrams.
APPENDIX A
VERTEX AND FRAGMENT SHADER CODE
1 #version 410
2 flat out int instance_ID;
3 layout (location = 0 ) in vec3 a_Position;
4 // This is a pass-through shader, but we still need to set gl_Position
5 void main() {
6 gl_Position = vec4(0.0,0.0,0.0,0.0);
7 // we also save the instance_ID so we can pass it into the geometry shader
8 instance_ID = gl_VertexID;
9 }
Listing A.1: The vertex shader, a pass-through shader which forwards the vertex ID.
1 #version 410
2 layout( location = 0 ) out vec4 fragColor;
3 in vec3 v_Position;
4 in vec3 v_Normal;
5 in vec3 v_Parameter;
6 in float v_Index;
7 const int ncolor = 256;
8 uniform float u_umin;
9 uniform float u_umax;
10 uniform int u_nb_points;
11 uniform samplerBuffer colormap;
12 void get_color( float u , out vec3 color ) {
13 float umin = 0.;
14 float umax = u_nb_points +1;
15 int indx = int(ncolor*(u - umin)/(umax - umin));
16 if (indx < 0) indx = 0;
17 if (indx > 255) indx = 255;
18 float r0 = texelFetch( colormap , 3*(indx) + 0 ).x;
19 float g0 = texelFetch( colormap , 3*(indx) + 1 ).x;
20 float b0 = texelFetch( colormap , 3*(indx) + 2 ).x;
21 color = vec3(r0,g0,b0);
22 }
23 void main() {
24 vec3 color = vec3(.2, .2, .2);
25 float idx = v_Index; // conceptually the same as gl_PrimitiveID;
26 get_color(idx, color);
27 fragColor = vec4(color, 1);
28 }
Listing A.2: The fragment shader, which uses v_Index to look up each cell’s color in the colormap.
BIBLIOGRAPHY
[3] Justine Basselin, Laurent Alonso, Nicolas Ray, Dmitry Sokolov, Sylvain Lefebvre,
and Bruno Lévy. Restricted Power Diagrams on the GPU. Computer Graphics
Forum, 2021. doi:10.1111/cgf.142610.
[4] Hemant Bherwani, Saima Anjum, Suman Kumar, Sneha Gautam, Ankit Gupta,
Himanshu Kumbhare, Avneesh Anshul, and Rakesh Kumar. Understanding
COVID-19 Transmission through Bayesian Probabilistic Modeling and gis-based
Voronoi Approach: a Policy Perspective. Environment, development and
sustainability, pages 1–19, July 2020. 32837277[pmid]. doi:10.1007/s10668-020-00849-0.
[5] Philip Caplan. Lecture Notes on ’More Voronoi Diagrams’ for CSCI0422, Febru-
ary 2022.
[7] Fernando de Goes, Corentin Wallez, Jin Huang, Dmitry Pavlov, and Mathieu Des-
brun. Power Particles: An Incompressible Fluid Solver Based on Power Diagrams.
ACM Trans. Graph., 34(4), July 2015. doi:10.1145/2766901.
[8] Benedikt Grüter, David Seiferth, Matthias Bittner, and Florian Holzapfel. Emer-
gency Flight Planning using Voronoi Diagrams. January 2019. doi:10.2514/6.2019-1056.
[9] Patrick Haertel. A Lagrangian Ocean Model for Climate Studies. Climate, 7(3),
2019. URL: https://www.mdpi.com/2225-1154/7/3/41, doi:10.3390/cli7030041.
[10] Jiawei Han, Dongming Yan, Lili Wang, and Qinping Zhao. Computing Restricted
Voronoi Diagram on Graphics Hardware. In Jernej Barbic, Wen-Chieh Lin, and
Olga Sorkine-Hornung, editors, Pacific Graphics Short Papers. The Eurographics
Association, 2017. doi:10.2312/pg.20171320.
[11] A. Laraqui, A. Baataoui, A. Saaidi, A. Jarrar, Med Masrar, and K. Satori.
Image mosaicing using voronoi diagram. Multimedia Tools and Applications,
76(6):8803–8829, Mar 2017. doi:10.1007/s11042-016-3478-z.
[13] Bruno Levy. Simulating Fluids with a Computer: Introduction and Recent Ad-
vances, 2018. URL: https://arxiv.org/abs/1811.05636, doi:10.48550/ARXIV.1811.05636.
[14] Bruno Lévy and Nicolas Bonneel. Variational Anisotropic Surface Meshing with
Voronoi Parallel Linear Enumeration. In Xiangmin Jiao and Jean-Christophe
Weill, editors, Proceedings of the 21st International Meshing Roundtable, pages
349–366, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[17] Nicolas Ray, Dmitry Sokolov, Sylvain Lefebvre, and Bruno Lévy. Meshless
Voronoi on the GPU. ACM Trans. Graph., 37(6), December 2018. doi:10.1145/3272127.3275092.
[18] Guodong Rong and Tiow-Seng Tan. Jump Flooding in GPU with Applications to
Voronoi Diagram and Distance Transform. In Proceedings of the 2006 Symposium
on Interactive 3D Graphics and Games, I3D ’06, page 109–116, New York, NY,
USA, 2006. Association for Computing Machinery. doi:10.1145/1111411.1111431.
[20] Sara C. Schvartzman and Miguel A. Otaduy. Fracture Animation Based on High-
Dimensional Voronoi Diagrams. In Proceedings of the 18th Meeting of the ACM
SIGGRAPH Symposium on Interactive 3D Graphics and Games, I3D ’14, page
15–22, New York, NY, USA, 2014. Association for Computing Machinery. doi:10.1145/2556700.2556713.
[22] Robert Whitehurst. A Free Lagrange Method for Gas Dynamics. Monthly
Notices of the Royal Astronomical Society, 277(2):655–680, 11 1995. URL: https://academic.oup.com/mnras/article-pdf/277/2/655/18200221/mnras277-0655.pdf, doi:10.1093/mnras/277.2.655.
[23] David Wolff. OpenGL 4.0 Shading Language Cookbook. Packt Publishing, 2011.