An Algorithm for Computing 2-Dimensional Voronoi Diagrams with the OpenGL Graphics Pipeline
Sabrina Templeton
A Thesis
Presented to the Faculty of the Computer Science Department
of Middlebury College
May 2022
ABSTRACT
The ability to create fast and physically-accurate fluid simulations is highly sought-after
in computer graphics. One method for creating effective fluid simulations relies on
Voronoi diagrams. Voronoi diagrams result from the division of space into cells, or re-
gions, where each cell is generated from a point, known as a site. When used in fluid
simulations, each cell gives way to a particle for the simulation. We propose a novel
way to efficiently compute Voronoi diagrams on the GPU by using a geometry shader
within the OpenGL pipeline to compute each Voronoi cell. Other methods for com-
putation of Voronoi diagrams on the GPU exist which use technologies such as CUDA
and OpenCL, but our method differs from these methods in that we implement it within
the OpenGL pipeline. This difference allows for the ability to directly visualize the re-
sultant Voronoi diagram without reliance on additional tools. The ability to compute the
diagrams in parallel on the GPU resulted in a speed-up of two orders of magnitude over
a serial CPU implementation for Voronoi diagram generation, and one order of magni-
tude over the same implementation when run in parallel on the CPU. This speed-up is
especially significant to the application of power diagrams in fluid simulations, as those
diagrams must be recomputed multiple times for every frame of a simulation.
ACKNOWLEDGMENTS
I would like to thank my advisor Philip Caplan, without whom this project would not be
possible, for all his extremely helpful guidance and advice. I would also like to thank
Professor Andrews and Professor Dickerson for providing invaluable feedback on this
thesis. Additionally, I would like to thank my fellow CSCI 0702 students, who fostered
a wonderful sense of community within our class. Finally, I would like to thank my
friends and family for their continuing support throughout this process.
TABLE OF CONTENTS
1 Introduction
  1.1 Motivation and Background
  1.2 Contributions
2 Methods
  2.1 Preliminaries
    2.1.1 Voronoi Diagrams
    2.1.2 OpenGL Pipeline
    2.1.3 Ability for Parallel Computation
  2.2 Clipping Algorithm
    2.2.1 Calculation of Nearest Neighbors
    2.2.2 Algorithm Description
    2.2.3 Triangulation of the Result
  2.3 Vertex and Fragment Shaders
3 Results
  3.1 Hardware
  3.2 Baseline Algorithm
  3.3 Timing
  3.4 Comparison
  3.5 Results from NVIDIA Quadro P4000
  3.6 Image Quality
  3.7 Voronoi Diagrams
    3.7.1 White Noise Diagrams
    3.7.2 Uniformly Distributed Diagrams
4 Conclusion
  4.1 Future Work
Bibliography
LIST OF TABLES
3.1 Comparison to CPU implementation.
3.2 Breakdown of timing in our application.
3.3 Results on NVIDIA Quadro P4000 GPU.
3.4 Image Quality Results.
3.5 Timing comparison of white noise versus regular distributions.
LIST OF FIGURES
3.1 A graph showing the runtime of our algorithm on the NVIDIA Quadro P4000.
3.2 Voronoi diagrams for white noise distributions for 100 and 1000 sites.
3.3 Voronoi diagrams for white noise distributions for 10,000 and 100,000 sites.
3.4 A Voronoi diagram with 1 million points.
3.5 Voronoi diagrams with a regular hexagonal pattern.
3.6 Voronoi diagrams with a regular grid pattern.
4.1 Buffer ping-ponging can be used to adapt our pipeline to fluid simulation applications.
CHAPTER 1
INTRODUCTION
Building realistic and fast fluid simulations is an important problem in computer graph-
ics today. New methods in fluid simulation have emerged which build on Voronoi di-
agrams, and, more recently, power diagrams, which are a generalization of Voronoi
diagrams in which each point is equipped with a weight [1]. Fluid simulations are ani-
mations which attempt to realistically portray real-world fluids, such as water or smoke,
in motion. Such animations are commonly required in the animation industry today,
as well as being fundamental to ocean simulations, which can additionally be used for
environmental modeling [9]. Each cell in a Voronoi diagram is generated from a point,
known as a site, and corresponds to the region of space which is closer to that site point
than to any other site point in the diagram. This relation is shown in Figure 1.1, where
an arbitrary point from the cell in orange is considered and shown to be closest to the
site of the cell of which it is a part.
Fluid simulations rely on the use of many particles, often on the order of hundreds
of thousands or more. These particles need to follow various physical prop-
erties, such as not overlapping [7], and, in the case of incompressible fluid simulations
used to model liquids, maintaining volume [13]. When particles are derived from power
diagrams, those particles follow these properties and can maintain incompressibility [7].
An important task, then, is being able to compute power diagrams with many cells which
could then give way to these particles, and to be able to do so rapidly. In animations and
simulations, speed is especially important. Our project aims to take a step closer to this
goal by creating a way to quickly compute a Voronoi diagram for up to several million
site points.
Figure 1.1: This figure shows a set of five site points, shown in red, and their associated
Voronoi cells. The point in black is part of the cell for site z2, because it is closer to z2
than it is to any of the other sites. This property holds for every point of a Voronoi cell.
Figure made with the aid of desmos.com.
1.1 Motivation and Background
Voronoi-based approaches to fluid simulation were introduced in Whitehurst 1995 [22], which
uses Voronoi cells as part of a technique for compressible fluid simulation. A new tech-
nique for incompressible fluid simulation was proposed in de Goes et al., 2015 [7]. This
technique for fluid simulation relies primarily on a set of 3-dimensional power diagrams
which are calculated at every frame of the animation. These power diagrams then give
way to particles which drive the simulation. Since the calculation of power diagrams
has to be done multiple times for every frame, the speed of each computation is crucial.
The performance of de Goes’ technique is highly constrained by the time it takes for
Figure 1.2: Example of a fluid simulation computed from power diagram cells. (a) Initial state of a fluid simulation generated using the avro codebase. (b) The heavier red liquid falls through the lighter blue liquid. (c) This simulation includes 10K particles and took a few hours to generate.
the power diagram generation step, which they found accounts for 54% of the total time
needed to generate a fluid simulation of 1 million particles [7]. Currently, this technique
relies on the VORO++ library [19] for the calculation of power diagrams, but they call
for a way to calculate power diagrams on the GPU, which our project hopes to provide
a foundation for.
Since the power diagram computation step is the primary contributing factor of the
time it takes to generate a fluid simulation like the one in Figure 1.2, we investigate
a method which computes Voronoi diagrams on the GPU. This method differs from
previous GPU-based techniques [3, 17] in that it explores the possibility of computing
the Voronoi diagrams within the OpenGL pipeline. The OpenGL pipeline is a common
tool used for rendering images or animation in computer graphics. Thus, embedding the
Voronoi diagram calculation within the graphics pipeline would provide a method for
fast computation and visualization of a fluid animation without the need for additional
tools such as CUDA or OpenCL. Due to the speed-up in computation of the diagram
that results from the GPU processing power and the consolidation of the rendering stage
with this computation, exploring this possibility could potentially lead to the ability to
create such fluid simulations in real-time.
Other applications within the field of computer graphics that rely on Voronoi dia-
grams include fracture simulations [20], surface remeshing [10, 14], mosaicing [11], and
many more. Voronoi diagrams have applications beyond the field of computer graphics
as well, such as in aviation for emergency flight planning [8], in geology for the study
of crystal patterns [21], and biology for the study of protein structures [16]. Recently,
Voronoi diagrams have been used in epidemiology in COVID-19 studies to help identify
potential hot spots [4]. Within other areas of computer science, Voronoi diagrams
have had applications in collision detection, cluster analysis, and solving problems such
as the post-office problem [2]. A potential extension of our project could allow for
interactive input and output of coordinates in addition to the visual diagram which
is output by our current program, which then might provide benefits to researchers in
these broader fields as well. However, the primary contribution of our project is in the
speed-up of the computation of the diagram, and is therefore most relevant to those
applications where speed is an important factor, such as with fluid simulations.
The method we chose to explore to increase the speed of the computation is to perform
the computation on the GPU. To best make use of the parallel processing power of
the GPU, we need an algorithm well-suited for computing the cells in parallel. This
rules out the use of common algorithms such as calculating the Voronoi diagram by first
calculating the Delaunay triangulation and then taking the dual [21]. Fragment-based, or
rasterized, methods for computing Voronoi diagrams work by dividing the space into a
grid of pixels, and then iterating over all of the pixels to determine which pixel belongs to
which Voronoi cell, and these methods can be well-suited to fast parallel computation on
the GPU [18]. However, these rasterized cells, even when calculated with an arbitrarily-
high number of fragments, will never result in the precise geometry of a final cell. Our
intent is to build a tool that could be applied to fluid simulation methods, so getting
the precise geometry is necessary, since particles of the simulation are derived directly
from the area or volume of the cell. Instead of either of these methods, we will use
an algorithm which is both well-suited to parallel computation and which results in the
precise geometry of the cell.
One example of a successful GPU-based application is given by Ray and Lévy [17],
who describe a method for computing meshless Voronoi diagrams on the GPU using CUDA.
Their method is applicable to the three-dimensional case: the mesh data is not stored,
but the tetrahedra and cell geometry, which are particularly useful in fluid simulation,
are computed on the fly. The study finds a speed-up of one order of magnitude
as compared to the state-of-the-art CPU implementations. They were able to perform
clipping in around 0.3 seconds for 1 million points, and 3.3 seconds for 10 million
points, in the case where sites are generated in a completely random, or white noise,
distribution [17]. This study demonstrates the potential gain in time that comes from
moving the computation to the GPU. Building on this work, Basselin et al. [3] report a
speed-up over CPU-based methods when calculating traditional Voronoi diagrams and
a 50 times speed-up over CPU-based methods when calculating restricted power dia-
grams. Their algorithm runs in 6.7 seconds for power diagrams computed from white
noise distributions in the 10 million point case [3]. They rely on OpenCL to run their
code on the GPU. This study demonstrates the potential of faster power diagram calcu-
lation for fluid simulations.
As mentioned, we will use the OpenGL pipeline rather than tools such as CUDA or
OpenCL to perform the computation of the Voronoi cells in parallel on the GPU. The
programmable pipeline, initially introduced in OpenGL 2.0, allows fully customizable
shader programs to be written in GLSL, the OpenGL Shading Language [23]. These
shader programs allow for extensive customization of pipeline
functionality. The OpenGL 4.0 pipeline, the most recent major version at the time of
writing, can include shader programs for geometry and tessellation stages in addition to
the vertex and fragment shaders. We will describe this pipeline in more detail in Section
2.
1.2 Contributions
Our project explores the possibility of performing fast, GPU computation within the
OpenGL pipeline. This method is motivated by the demand for faster computation of
realistic fluid simulations, and explores whether a potential extension of this OpenGL-
based method could lead to interactive or real-time fluid simulations. If fluid simulations
were to be created using this method, they would not rely on additional tools such as
OpenCL or CUDA for fast, GPU computation, but rather would utilize the flexibility of
the OpenGL pipeline to perform the power diagram calculation and then visualize the
result within the same pipeline. We compare the run time of our implementation against
a CPU-based approach, demonstrating the significant speed-up that comes with this
GPU-based approach. Our implementation also has the advantage of the ability for the
resulting Voronoi diagram to be directly rendered and visualized after its calculation,
without additional tools.
CHAPTER 2
METHODS
In this chapter, we will cover our methods for our implementation of Voronoi diagram
generation in the OpenGL pipeline. We will first review the background necessary for
understanding our methods, such as some further introduction to Voronoi diagrams, a
description of the OpenGL pipeline, and the properties and theorems that make it possible
to perform these computations in parallel. Then we will describe the primary clipping al-
gorithm, which is performed for each cell. We will also touch on necessary components
of this algorithm such as a triangulation step and the prerequisite calculation of nearest
neighbors. In addition to describing the algorithm, we will also provide the geometry
shader code in Listing 2.1, which is the stage within the OpenGL pipeline where this
algorithm takes place. Finally, we will describe the vertex and fragment shaders of the
OpenGL pipeline.
2.1 Preliminaries
Before diving into our main algorithm and specifics of our implementation, we will
cover some of the prerequisite structures that our methods rely on.
2.1.1 Voronoi Diagrams
A Voronoi cell is defined as the set of all points $\vec{p}$ such that the distance between
$\vec{p}$ and the site $\vec{z}_i$ is no greater than the distance between $\vec{p}$ and
$\vec{z}_j$ for all $\vec{z}_j$ in the set of sites. This definition is denoted mathematically
for the two-dimensional case by $\{\vec{p} \in \mathbb{R}^2 : \|\vec{z}_i - \vec{p}\| \le \|\vec{z}_j - \vec{p}\| \;\forall \vec{z}_j \in S\}$,
where $S$ is the set of all sites. Note that the scope of this study is limited to the
two-dimensional case.
(a) Voronoi diagram from randomly generated sites. (b) Voronoi diagram from a structured grid of sites. (c) Voronoi diagram from a uniform distribution of sites where offsets are added to every other row, forming hexagonal cells.
Figure 2.1: Examples of Voronoi diagrams generated from random and non-random
distributions.
The process of generating a Voronoi diagram starts with the generation of sites.
The Voronoi cells are then calculated based on these sites, with one Voronoi cell being
generated for each site. The cells of a Voronoi diagram are always convex polygons.
If the sites are placed regularly, the diagram can consist of cells which are square or
hexagonal. Figure 2.1 shows the difference between a Voronoi diagram which was
generated from 10 randomly placed sites (Figure 2.1a) and those which are generated
from sites arranged in a structured way (Figure 2.1b, Figure 2.1c).
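For concreteness, the white noise case is straightforward to reproduce; the following is a minimal C++ sketch (illustrative only, not taken verbatim from our codebase) that fills the unit square with random sites, stored as interleaved (x, y) pairs.

#include <random>
#include <vector>

// Generate n white noise sites in the unit square [0,1]^2, stored as
// interleaved (x, y) coordinate pairs.
std::vector<double> random_sites(int n, unsigned seed = 0) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> coord(0.0, 1.0);
    std::vector<double> xy(2 * static_cast<size_t>(n));
    for (double& c : xy) c = coord(rng);
    return xy;
}

Structured distributions are generated analogously by placing sites on a grid, with the hexagonal case adding a half-cell offset to every other row.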
2.1.2 OpenGL Pipeline
Our project utilizes the OpenGL pipeline to perform the algorithm for computation of
Voronoi cells on the GPU. A common use case of the OpenGL pipeline is exemplified
in Figure 2.2a, which shows a pipeline with only a vertex and fragment shader. The
vertex shader here deals with transforming the object, here a cube, being rendered into
the screen space. Then it is rasterized, or broken into fragments, which are then passed
to the fragment shader which assigns color to each fragment.
(a) An example of an OpenGL graphics pipeline which contains a vertex shader and a fragment
shader. The vertex shader processes the vertex data and transforms it. Then it undergoes a
rasterization stage, where primitives are broken down into fragments which are then handled by
the fragment shader, which assigns color to each fragment.
(b) The OpenGL graphics pipeline we use. The vertex shader is present but performs no com-
putation. The sites are passed into the geometry shader, the algorithm is run, and then primitives
are output for each cell. Then it passes through the rest of the pipeline in a typical manner.
Figure 2.2: The top panel shows a commonly used graphics pipeline which includes
only a vertex shader and a fragment shader. The bottom panel shows the pipeline as we
use it in this project, with the geometry shader playing the primary part.
The full OpenGL 4.0 pipeline consists of five programmable shaders. When all five
shaders are present, the vertex shader is first in the pipeline, followed by the two stages
relating to the tessellation shader, control and evaluation, respectively. Following that is
the geometry shader stage and the fragment shader stage. Not all stages of this pipeline
are required for every project, but at the very least, a vertex and fragment shader are
required. No matter which subset of these stages are present, the order of the pipeline is
maintained. For instance, the vertex shader always precedes the fragment shader.
The vertex shader executes once for each vertex. Since each vertex is processed
completely independently of the other vertices, they can be processed in parallel if there
is access to more than one thread [23]. Vertex shaders are always necessary as part of the
OpenGL pipeline. However, oftentimes in applications which use a tessellation shader
or geometry shader to create or edit the mesh, the vertex shader can be simplified and
considered ‘pass-through,’ since it remains a part of the pipeline but does not do any real
work. The vertex shader that we use is an example of a pass-through shader.
The tessellation shader has many uses such as subdividing a mesh [23]. Subdivi-
sion is the process of taking a mesh and breaking it into smaller triangles, with the end
product typically being a smoother mesh. Another example of a useful application of
tessellation shaders is a study which looked at improving real-time rendering of models
of the Earth [15]. This study uses the tessellation shader to tessellate the terrain more
quickly on a GPU. However, for the purpose of our project, all we need is to generate the
geometry of the Voronoi diagram, and this can be done in the geometry shader. There-
fore, we do not need to include the tessellation shader in our pipeline. This then reduces
our pipeline to the one that was originally introduced with OpenGL 3.2: a
vertex shader, followed by a geometry shader, followed by a fragment shader.
Typically, between the vertex and geometry shader stages, the vertices are assembled
into primitives. In our pipeline, the primitives are emitted as the output of the geometry
shader, as shown in Figure 2.2b. The fragment shader is then run independently for
each fragment produced by the rasterization of these primitives. The cell data is never
stored anywhere other than locally in the geometry shader, but instead is directly visualized.
As mentioned earlier, the majority of our project is written inside the geometry
shader. The geometry shader is able to add, delete, or modify primitives. In our ap-
plication, we use it to add primitives, allowing us to calculate the Voronoi cells within
the geometry shader and then send them through the rest of the pipeline to be rendered.
Since Voronoi cells follow the property of always being convex polygons, we can easily
and reliably break them into triangles so that they can be emitted from the geometry
Figure 2.3: The radius of security here is shown as the outer black circle. It is calculated
by doubling the radius, r, of the inner bounding circle, represented by the dashed circle.
Conceptually, this theorem makes sense, as any site points which lie outside the radius
will lead to perpendicular bisectors which will never cross the inner circle. This figure
was created with the aid of desmos.com.
shader.
2.1.3 Ability for Parallel Computation
The calculation of Voronoi cells is not dependent on any of the geometric information
from the neighboring Voronoi cells. Instead, it is reliant on the nearest neighbor infor-
mation, or the closest site points to the current site. That is to say, to calculate each cell,
we must know the location of the current site along with the locations of the nearest
sites, which need to be sorted in order of increasing distance. We can calculate this in-
formation for every site on the CPU side and then pass it in as a texture to the geometry
shader so that each site can access this nearest neighbor information. We can then clip
our Voronoi cell against these neighbors until the final geometry is reached.
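To make the phrase "pass it in as a texture" concrete, the following C++ sketch (our own illustration, assuming an OpenGL 4.1 context with loaded function pointers, and not necessarily identical to our actual setup code) uploads the flat neighbor index array as a buffer texture that the geometry shader reads through the usamplerBuffer nn of Listing 2.1.

#include <cstdint>
#include <vector>
// Assumes GL headers/function pointers are available (e.g. via GLEW or glad).

// Upload the flat neighbor index array (nb_sites * nb_neighbors entries, each
// site's neighbors sorted by increasing distance) as a buffer texture. The
// geometry shader reads entry (site_idx * u_nb_neighbors + i) of this texture
// with texelFetch.
GLuint upload_neighbor_indices(const std::vector<uint32_t>& nn_indices) {
    GLuint buf = 0, tex = 0;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_TEXTURE_BUFFER, buf);
    glBufferData(GL_TEXTURE_BUFFER, nn_indices.size() * sizeof(uint32_t),
                 nn_indices.data(), GL_STATIC_DRAW);
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_BUFFER, tex);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_R32UI, buf); // one uint index per texel
    return tex; // bind to a texture unit and point the 'nn' sampler uniform at it
}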
Clipping against every nearest neighbor would be incredibly inefficient as our algorithm
scales to diagrams of over a million points. Therefore, to avoid looping through an unnec-
essarily large number of nearest neighbors, we will use the radius of security theorem
proposed by Lévy and Bonneel in their 2012 paper [14]. This theorem provides a simple
cutoff point for stopping the clipping process. The theorem demonstrates that there is a
radius—the radius of security—which is equal to two times the radius of the bounding
circle of the current polygon such that clipping against any neighbors outside of this
radius will not affect the current cell. Given the radius of the current bounding circle of
the polygon as $R_k$, this theorem states that for all nearest neighbors $\vec{z}_j$ such that
$\|\vec{z}_j - \vec{z}_i\| > 2R_k$, the bisector of $(\vec{z}_i, \vec{z}_j)$ is non-contributing [14]. $R_k$ is equal to the distance
between the current site and the furthest vertex of the polygon, and is shown as r in
Figure 2.3. Once this radius has been reached, we can stop visiting nearest neighbors
which fall outside this radius.
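Stated compactly, if $P_k$ is the current polygon and $R_k = \max_{\vec{v} \in P_k} \|\vec{v} - \vec{z}_i\|$, then
$$\|\vec{z}_j - \vec{z}_i\| > 2R_k \implies \text{clipping against } \vec{z}_j \text{ leaves } P_k \text{ unchanged}.$$
Note that in Listing 2.1 the factor of 2 appears as 2.1, presumably a small safety margin against floating point error.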
2.2 Clipping Algorithm
For our project, we needed an algorithm which can run in parallel for every cell in order
to fully benefit from the increased parallel computation power of the GPU. We can rely
on the fact that we are calculating a Voronoi diagram restricted to a square. Each cell
is then initialized to the square and progressively clipped based
on the locations of the nearest neighbors until the final geometry of the cell is computed.
This algorithm is often known as half-space clipping, but we refer to it in this paper as
the clipping algorithm. Determining when the cell has reached the final stage is made
simple with the radius of security theorem, described above. If each nearest neighbor is
visited in order of ascending distance, as soon as a nearest neighbor is visited that falls
outside of the radius of security and that no longer affects the geometry of the cell, it
must be true that none of the remaining nearest neighbors will have any effect on the cell.
2.2.1 Calculation of Nearest Neighbors
The nearest neighbors are calculated on the CPU side for all sites at once, and then
the nearest neighbor information for the entire diagram is passed into the geometry
shader. We use a function from Geogram [12] to perform the calculation of the nearest
neighbors, which helps speed this step up. The calculation of nearest neighbors is also
highly parallel and has the potential for
computation on the GPU. However, this goes beyond the scope of our project, which
instead uses a parallel CPU computation for the nearest neighbors.
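For illustration, a brute-force stand-in for this step is sketched below (our own sketch; Geogram uses an accelerated search structure rather than this O(n² log n) loop). It produces the flat, distance-sorted index array described above, with each site appearing as its own first neighbor.

#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// For each of the n sites (interleaved x,y coordinates), return the indices of
// its k nearest sites, sorted by increasing distance. Entry i*k + 0 is site i
// itself, matching the layout the geometry shader expects. Assumes k <= n.
std::vector<uint32_t> knn_indices(const std::vector<double>& xy, int k) {
    const int n = static_cast<int>(xy.size() / 2);
    std::vector<uint32_t> result(static_cast<size_t>(n) * k);
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) {
        std::iota(order.begin(), order.end(), 0);
        auto d2 = [&](int a) { // squared distance from site a to site i
            const double dx = xy[2 * a] - xy[2 * i];
            const double dy = xy[2 * a + 1] - xy[2 * i + 1];
            return dx * dx + dy * dy;
        };
        std::partial_sort(order.begin(), order.begin() + k, order.end(),
                          [&](int a, int b) { return d2(a) < d2(b); });
        for (int j = 0; j < k; ++j)
            result[static_cast<size_t>(i) * k + j] = static_cast<uint32_t>(order[j]);
    }
    return result;
}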
2.2.2 Algorithm Description
The clipping algorithm is implemented inside the geometry shader, and is therefore
written in GLSL, which is the OpenGL shading language. The algorithm proceeds as
follows, and is also shown in the geometry shader code included below in Listing 2.1.
For each site, the algorithm starts by initializing a polygon to the bounding box of
the domain, as it does in lines 83-87 of Listing 2.1. This initialization is also shown
in Figure 2.4a. Then, it loops through all the nearest neighbors. The nearest neighbors
are passed in from the CPU through a texture which is defined in line 13. This texture
contains the nearest neighbor information for every site: for each site, indices for the
corresponding nearest neighbors are listed in ascending order of distance. In order to
find the correct nearest neighbor information for the current site, we multiply the site in-
dex by the number of nearest neighbors to access the correct portion of the information,
which is done in line 90, and then use this index to get the site in line 92. We can then
loop through the nearest neighbors of the current site, which we do in line 89, skipping over the
first nearest neighbor because the first nearest neighbor is always the current site itself.
The site index that we use both to look up the nearest neighbor information and
to set the current seed is passed to the geometry shader from the vertex shader. It is
received in line 9 of the geometry shader as a flat integer array called instance_ID.
The first and only value of this array is the site index, which is conceptually equivalent to
gl_PrimitiveIDIn, a built-in variable in OpenGL which represents the index of the
current primitive for which the geometry shader is executing. However, although using
gl_PrimitiveIDIn is conceptually equivalent to using the site index in the manner
that we use it, the former caused issues in our implementation.
Once we have identified the current site, zi , and the current nearest neighbor, zj ,
we can proceed to clip the polygon. In our implementation, this is written inside a
calc_poly function which is defined on line 31 and called on line 93. Each time this
function is called, we perform a clip on the polygon. As shown in Figure 2.4b, the first
stage of clipping cuts the polygon along the perpendicular bisector of the line between
zi and zj1. To implement this, we want to update the polygon: remove any vertices,
such as $\vec{p}_3$ in Figure 2.4b, which will no longer be part of the polygon after the
clip, and add in the two vertices, $\vec{q}_0$ and $\vec{q}_1$ in the figure, which are the
intersection points of the clipping plane and the edges of the polygon. When an edge
crosses the clipping plane, one of its two endpoints will remain inside the
polygon as a vertex after the clipping and the other one will be outside and subsequently
discarded. Thus, we have to determine whether v0 or v1 is ‘inside’ or ‘outside’. We do
(a) Initial polygon is equal to the bounding domain. (b) The first stage of clipping cuts between zi and zj1. (c) The second stage of clipping cuts between zi and zj2. (d) The third stage of clipping cuts between zi and zj3. (e) The fourth and final stage of clipping cuts between zi and zj4. (f) The resultant diagram after clipping has completed for all site points.
Figure 2.4: The clipping algorithm is shown here on an example diagram. The first five
panels show the process to calculate the current cell by clipping against each nearest
neighbor, and the sixth cell shows the resultant diagram after clipping is completed for
all cells. Figures taken from lecture slides [5].
so on lines 50 and 53 of the geometry shader code shown in Listing 2.1. To determine
this, we can borrow from our definition of a Voronoi diagram, remembering that all
points in the cell are closer to the site of that cell than they are to any of the nearest
neighbors. It follows that in order for a vertex to be inside at any given clipping stage,
the vertex must be closer to the current site than it is to the nearest neighbor that we are
currently clipping against.
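In symbols, the test performed at lines 50 and 53 of Listing 2.1 for a vertex $\vec{v}$ is simply
$$\vec{v} \text{ is inside} \iff \|\vec{v} - \vec{z}_i\| < \|\vec{v} - \vec{z}_j\|.$$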
We now have four cases to consider: either both vertices are outside, the first vertex
is inside and the second vertex is outside, the first vertex is outside and the second vertex
is inside, or both vertices are inside. The cases where the two vertices are either both in-
side or both outside are the easiest to consider as we do not calculate the intersection in
this case. At each run of the loop over the vertices, we only want to determine whether
or not to discard the first vertex at which we are looking, v0, as we will then deal with
the second vertex, v1 in the next iteration of the loop, when that vertex becomes v0.
As such, in the case where both vertices are outside, we discard v0—strictly speaking,
we refrain from adding it to the new polygon which we are maintaining, thereby doing
nothing at this step, which is demonstrated by an unwritten else case at line 73. A visual
example of this case is in the second stage of clipping in Figure 2.4c, where $\vec{q}_1$ and
$\vec{q}_2$ are both discarded. In the case where both vertices are inside, we add v0 to the new
copy of the polygon, since we will add v1 at the next iteration of the loop—this case is
shown in line 68. This case is also exemplified in Figure 2.4c, for $\vec{p}_0$ and $\vec{p}_1$. Then, in
the other two cases, we know there is an intersection between the two current vertices,
a case which is shown in Figure 2.4c for $\vec{p}_1$ and $\vec{p}_2$. We calculate this intersection by
calling the calc_intersect function, which calculates the intersection between a
line segment and the perpendicular bisector between two Voronoi sites using the imple-
mentation found in Geogram [12]. In either case we add the intersection point that is
returned by this function to the new polygon. If the first vertex is inside, we simply add
v0 before adding intersect_pt, and if not, we add only the intersect_pt at this
stage.
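Listing 2.1 omits the body of calc_intersect, but the underlying geometry is simple to state; the following is our reconstruction of the idea, and Geogram's actual implementation may differ in details such as robustness handling. With midpoint $\vec{m} = (\vec{z}_i + \vec{z}_j)/2$ and direction $\vec{d} = \vec{z}_j - \vec{z}_i$, the bisector is the set of points $\vec{x}$ satisfying $(\vec{x} - \vec{m}) \cdot \vec{d} = 0$. Parameterizing the edge as $\vec{x}(t) = \vec{v}_0 + t(\vec{v}_1 - \vec{v}_0)$ and solving for $t$ gives
$$t = \frac{(\vec{m} - \vec{v}_0) \cdot \vec{d}}{(\vec{v}_1 - \vec{v}_0) \cdot \vec{d}}, \qquad \texttt{intersect\_pt} = \vec{v}_0 + t(\vec{v}_1 - \vec{v}_0).$$
The denominator is nonzero whenever $\vec{v}_0$ and $\vec{v}_1$ lie on opposite sides of the bisector, which is the only case in which the function is called.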
Once this loop has run for every vertex we copy the contents of the new polygon into
the current polygon. Since the number of vertices in a cell is not constant, we initialize
arrays with a size of MAX_VERTS, and then keep track of the number of vertices in each
array at any given time. When we copy the contents of the new polygon into the current
polygon, we also update this count to match (lines 75 and 76 of Listing 2.1).
1
2 #version 410
3 #extension GL_ARB_gpu_shader_fp64 : enable
4 layout (points) in;
5 precision highp float;
6 uniform int u_nb_neighbors;
7 uniform mat4 u_ModelViewProjectionMatrix;
8 const int MAX_VERTS = 30;
9 flat in int[] instance_ID;
10 out float v_Index;
11 layout (triangle_strip, max_vertices = 40) out;
12 uniform samplerBuffer seeds; // the buffer which contains the sites
13 uniform usamplerBuffer nn;
14 void render_poly(vec3[MAX_VERTS] poly, int space){
15 // This function triangulates the polygon and emits the triangles
16 vec3 p0 = poly[0];
17 for (int i = 0; i < space -2; i++){
18 gl_Position = u_ModelViewProjectionMatrix*vec4(p0, 1.0);
19 v_Index = float(instance_ID[0]); // set v_Index for use in the fragment shader
20 EmitVertex();
21 gl_Position = u_ModelViewProjectionMatrix*vec4(poly[i+1], 1.0);
22 v_Index = float(instance_ID[0]);
23 EmitVertex();
24 gl_Position = u_ModelViewProjectionMatrix*vec4(poly[i+2], 1.0);
25 v_Index = float(instance_ID[0]);
26 EmitVertex();
27 EndPrimitive();
28 }
29 }
30 void
31 calc_poly(vec3 zi, vec3 zj, inout vec3[MAX_VERTS] curr_poly, inout int cspace){
32 // where zi is the current site and zj is the current nearest neighbor
33 // cspace represents the actual length of the items in curr_poly
34 int nspace = 0; // nspace keeps track of the next available space in new_poly
35 vec3 new_poly[MAX_VERTS];
36 for (int i = 0; i < cspace; i++) {
37 vec3 v0;
38 vec3 v1;
39 if (i == cspace-1){ // correctly assign the two points for the last edge
40 v0 = curr_poly[cspace -1];
41 v1 = curr_poly[0];
42 }
43 else{
44 v0 = curr_poly[i];
45 v1 = curr_poly[i+1];
46 }
47 int side1 = 0;
48 int side2 = 0;
49 //calculate side 1: determine if the first border point is inside or outside
50 if (distance(v0, zi) < distance(v0, zj)) { side1 = 1; }
51 else { side1 = -1; }
52 //calculate side 2: determine if the second border point is inside or outside
53 if (distance(v1, zi) < distance(v1, zj)) { side2 = 1; }
54 else { side2 = -1; }
55 vec3 intersect;
56 if (side1 != side2){
57 intersect = calc_intersect(zi, zj, v0, v1);
58 if (side1 == 1){
59 new_poly[nspace] = v0;
60 new_poly[nspace+1] = intersect;
61 nspace += 2;
62 }
63 else{
64 new_poly[nspace] = intersect;
65 nspace += 1;
66 }
67 }
68 else if (side1 == 1){ // meaning both sides are inside
69 new_poly[nspace] = v0;
70 // we add just v0 since we will add in the v1 at the next step of the loop
71 nspace += 1;
72 }
73 // otherwise both sides are outside and we take neither
74 }
75 curr_poly = new_poly;
76 cspace = nspace; // update cspace to match the new polygon
77 }
78 void main() {
79 // fetch the current site, taking in the ID which we output from the vertex shader
80 int site_idx = instance_ID[0];
81 vec3 zi = vec3(texelFetch( seeds , site_idx ).xy, 0);
82 // Initialize the polygon to the entire area of the diagram
83 vec3[MAX_VERTS] curr_poly;
84 curr_poly[0] = vec3(0, 0, 0);
85 curr_poly[1] = vec3(1, 0, 0);
86 curr_poly[2] = vec3(1, 1, 0);
87 curr_poly[3] = vec3(0, 1, 0);
88 int space = 4;
89 for (int i = 1; i < u_nb_neighbors; i ++){
90 int nn0 = int(texelFetch( nn, i + (site_idx * u_nb_neighbors)).x);
91 // nn0 is just the index so we need to look up the corresponding point in seeds
92 vec3 zj= vec3(texelFetch( seeds, nn0).xy, 0);
93 calc_poly(zi, zj, curr_poly, space);
94 // Below is the radius of security theorem
95 float radius = 0;
96 // loop over the vertices of the polygon to find the furthest one,
97 // the distance from zi to that point becomes the radius.
98 for (int j = 0; j < space; j++ ) {
99 vec3 curr_pt = curr_poly[j];
100 if(distance(curr_pt, zi) > radius) {
101 radius = distance(curr_pt, zi);
102 }
103 }
104 // If the current nearest neighbor is outside radius of security, stop clipping
105 if (distance(zj, zi) > (2.1 * radius)){ break; }
106 }
107 render_poly(curr_poly, space);
108 }
Listing 2.1: The code for the geometry shader, which performs the primary algorithm.
2.2.3 Triangulation of the Result
Once we have the geometry of the cell stored as a series of points which define the
corners of the polygon, we must create triangles, or primitives, so that the OpenGL
pipeline can successfully handle the output from the geometry shader. Since the Voronoi
cell is known to always be convex, we are able to easily divide the cell into triangles as
shown in Figure 2.5. We can triangulate the polygon to any one of the vertices of the
polygon, so we arbitrarily choose the first point as the one to triangulate towards. These
triangles are then emitted from the geometry shader and passed through the rest of the
pipeline to be rendered.
Figure 2.5: The triangulation here is shown in white as it would divide up the Voronoi
cell. The first point of the polygon can be used as the point to which we triangulate,
given that Voronoi cells are guaranteed to be convex.
2.3 Vertex and Fragment Shaders
As previously mentioned, the vertex shader of our implementation does almost nothing,
since the diagram is not generated until the geometry shader stage. Thus, our vertex
shader is ‘pass-through,’ and the only notable tasks it handles are setting gl_Position
for the points, and saving the ID of the vertex in instance_ID, passing it to the next
stage—the geometry shader.
The fragment shader manages the assignment of color to the polygons after their
creation. To do so, it looks up the index of the polygon that each fragment belongs to in
the colormap to determine the color, which is passed in from the CPU as a texture in line
11 of Listing A.2. This index is taken in v_Index, which is output from the geometry
shader. Again, this is conceptually the same as what the built-in gl_PrimitiveID provides.
The code for the vertex and fragment shaders is included in full in Appendix A.
CHAPTER 3
RESULTS
3.1 Hardware
Unless otherwise specified, these results were run on a MacBook Pro with an Intel Iris
Plus Graphics 1536 MB graphics chip, which has access to 384 shader cores. This is
directly related to the number of cells we are able to compute in parallel. Note that our
computer is able to run only 8 threads in parallel, which is relevant to the baseline CPU
implementation as well as the computation of the nearest neighbors.
3.2 Baseline Algorithm
As a baseline for our comparison, we ran a Voronoi diagram generation implementation
from the avro library [6], which we ran in parallel and in serial. This
algorithm is primarily intended for calculation of power diagrams, but can be run to
generate Voronoi diagrams, since power diagrams are equivalent to Voronoi diagrams
in the case where all weights are equal. We expected both the parallel and the serial
versions to be slower than our algorithm, and we expected to see a larger difference in
time between the two algorithms as the number of sites for which we were generating
the diagram increased. This difference points to the value and resultant speed-up of be-
ing able to run the calculation in parallel, and with the increased parallel computation
power from the GPU, we expect to see a further speed-up of our algorithm as compared
to both CPU versions.
3.3 Timing
All of the times are the result of averaging the results of ten separate runs of the code in order
to correct for potential differences in individual run times. The timing method which we
use to time avro includes all the calculations necessary to generate the diagram, such
as the calculation of the nearest neighbors. For the timing of our own implementation,
we were able to separately time the CPU-side nearest neighbors implementation and
the GPU-side intersection calculation. This allows us to see the breakdown between the
nearest neighbors calculation and the GPU run time. For a million points, the time it takes
for the nearest neighbor computation alone is under a second. Note however that because
the time it takes for the GPU to run is just under 6 seconds, the nearest neighbor step
accounts for almost 11 percent of the total run time in the one million point case
(0.719 / (5.91 + 0.719) ≈ 0.11; see Table 3.2).
In order to fully compare against CPU-based methods such as avro, we
forced the program to run in serial on one thread to show the time differential between
the serial and parallel approaches on the CPU. As expected, the algorithm is significantly
slower when run in serial. The results of running the algorithm in serial and in parallel
can be compared in Table 3.1. As expected, the speed difference between the parallel
and serial implementations of this is large, especially on diagrams with higher numbers
of sites. For one million sites, the parallel implementation is almost 8 times faster.
3.4 Comparison
Before comparing the two algorithms directly, we want to acknowledge that there are a
few key differences between the ways avro and our project are implemented that could
conceivably slow down the avro implementation as opposed to ours. avro is designed
to be dimension-independent, which may account for some of the difference that we see
in run time. In addition, avro divides the plane into two separate triangles and calcu-
lates the Voronoi diagram separately for each one. This means that the cells which fall on
the diagonal edge between the two triangles have to be calculated twice. Furthermore,
avro uses exact geometric predicates, which are necessary for being able to calculate the
dual Delaunay Triangulation. However, this extra step in calculation could slow down
the overall run time of the avro implementation. While all of these are potential fac-
tors in the overall run time of avro, we expect that these are minor contributing factors
to the difference that we see in run times between the two implementations. The main
factor for this discrepancy is likely the increased parallel computation power that comes
from running our implementation on the GPU, but future work would have to be done to
compare our algorithm to a more exact counterpart on the CPU in order to truly confirm
this.
Table 3.1: Comparison to CPU implementation: This table compares avro’s CPU-
based algorithm with our GPU implementation. All times are given in seconds. n rep-
resents the number of sites, avro-1 refers to the algorithm when forced to run in serial
on one thread, and avro-8 shows results from the algorithm when run in parallel on 8
threads.
Overall, our GPU implementation is consistently faster than both the parallel and
the serial run times of the avro implementation on the CPU, as shown in Table 3.1.
This difference becomes especially apparent as the number of points increases. For
the one million point Voronoi diagram, the serial implementation on the CPU takes
683 seconds, or around 11 minutes. avro is faster when run in parallel and takes only
around 90 seconds to generate the same diagram. Our implementation takes only around
6.6 seconds to run a diagram for one million points. This is over a 13 times speed-up
over the parallel CPU implementation from avro, and over a 100 times speed-up over
the serial implementation.
Table 3.2: Breakdown of timing in our application: All results are given in seconds.
n       GPU (s)    nn calc (s)
100     0.00175    8.14E-05
1K      0.00911    3.53E-04
10K     0.0846     0.00339
100K    0.636      0.0430
1M      5.91       0.719
3.5 Results from NVIDIA Quadro P4000
We also ran our algorithm on a more powerful GPU to test if the speed of the computa-
tion would improve if not limited by our MacBook’s hardware. The GPU we used for
this set of results is an NVIDIA Quadro P4000 with access to 1792 CUDA cores, which
is significantly more than the 384 shader cores that we had access to with the
MacBook. This computer also had access to 16 threads on the CPU side which results in
a speed-up of the nearest neighbor computation. The results from running our algorithm
are shown in Table 3.3 and visualized in Figure 3.1. We ran white noise, or completely
random, distributions of sites, as well as hexagonal and square distributions, all of up
to four million points. Studying both random and regular distributions allows us to see
differences between bad and good cases. In the regular cases, there are fewer nearest
neighbors to clip against, which allows us to see the timing difference that results from
clipping against up to 30 nearest neighbors in the random case versus fewer than ten in the
regular case. With the increased computation power of this hardware, the nearest
neighbor calculation step becomes a limiting factor in the overall run time.
Table 3.3: Results on the NVIDIA Quadro P4000 GPU with access to 1792 CUDA cores,
for (a) random, (b) square, and (c) hexagonal distributions. All times are shown in
seconds. The nearest neighbor times here are again shown separately from the GPU
portion of our algorithm, which allows us to see that they become a limiting factor in
terms of computation time.
Figure 3.1: Results of our algorithm on the NVIDIA Quadro P4000 in graph form.
The timings for the primary algorithm are on the left and the time for nearest neighbor
calculation on the CPU side is shown on the right.
3.6 Image Quality
The diagrams which we ran in the results above were all 500px by 500px. Final images
included as figures in this paper were rendered at 2000px by 2000px to show the dia-
grams in higher resolution. Experimentation revealed that there does not seem to be a cor-
relation between the resolution and the speed: that is to say, rendering at higher resolu-
tions does not seem to result in a slower calculation time. This was reinforced by averaging
over multiple testing runs to eliminate any discrepancies between runs. The results of this
experiment show that the primary driver of the run time is the geometry shader, which is
to be expected as this is where the majority of the computation is done.
Table 3.4: Image Quality Results: This table compares run times for a white noise
distribution with 1 million sites across different resolutions, and the final column shows
the results with no rendering stage. All times are given in seconds.
3.7 Voronoi Diagrams
As previously mentioned, one advantage of working within the OpenGL graphics pipeline
is that the Voronoi diagram can be directly rendered as part of the final stages of the
pipeline. This leads to the other primary result of this project: the Voronoi diagrams that
we are able to generate and visualize. We study both the white noise case, which is an
irregular and therefore worse case, and more regular distributions such as the hexagonal
and square pattern distributions.
(a) Voronoi diagram with 100 points. (b) Voronoi diagram with 1000 points.
Figure 3.2: Voronoi diagrams for white noise distributions for 100 and 1000 sites.
3.7.1 White Noise Diagrams
Figures 3.2, 3.3, and 3.4 show examples of diagrams output by our algorithm with 100, 1000,
10,000, 100,000, and 1,000,000 sites generated from white noise, or completely random,
distributions.
3.7.2 Uniformly Distributed Diagrams
In addition to being able to generate diagrams from white noise point distributions, we
study two more regular patterns: a hexagonal pattern and a square grid pattern. Images
generated from these distributions are shown in Figure 3.5 and Figure 3.6, respectively.
We also ran results to compare the timings of these distributions to the timing of
the white noise distributions. We hypothesized that the regular diagrams would run in
less time than the random counterparts, since every site is certain to have fewer nearest
neighbors to loop through. Our results for the comparison of the hexagonal to the ran-
dom diagram and the results of the comparison to the square grid are shown in Table
3.5. These results reinforce our hypothesis, as the hexagonal and square diagrams run
(a) Voronoi diagram with 10K points. (b) Voronoi diagram with 100K points.
Figure 3.3: Voronoi diagrams for white noise distributions for 10,000 and 100,000 sites.
in less than two seconds for diagrams of around one million sites. Compared to the
random diagram which takes around 6 seconds for this many points, this is certainly a
significant speed-up.
Note that since the square and hexagonal diagrams need to be run on specific num-
bers of sites in order to generate in a regular way, the square and hexagonal diagrams
cannot be compared directly. For each comparison, the random diagrams were run
on the same number of points. An error consistently occurred when trying to run the
random distribution for the specific number of points that the hexagonal diagram needed
in the 1 million diagram case. However, we can consider in this case that a diagram with
three million points is around 3 times slower to generate for randomly generated sites.
Figure 3.4: A Voronoi diagram with 1 million points.
(a) Hexagonal diagram with 136 points. (b) Hexagonal diagram with 1235 points. (c) Hexagonal diagram with 11,758 points.
Figure 3.5: Voronoi diagrams with a regular hexagonal pattern.
(a) Square diagram with 100 points. (b) Square diagram with 1024 points. (c) Square diagram with 10,000 points.
Figure 3.6: Voronoi diagrams with a regular grid pattern.
Table 3.5: The left table compares the time to run results of a white noise distribution
to the time it takes to run a regular hexagonal distribution, and the right compares the
random and square cases. All times are given in seconds and do not include the time for
calculation of nearest neighbors.
(a) Hexagonal versus random (b) Square versus random
CHAPTER 4
CONCLUSION
Our GPU-based method for computing Voronoi diagrams within the OpenGL pipeline achieves a
significant speed-up over the implementation from the avro code base that we used as a
comparison point. This speed-up could have broad implications for applications in fluid
simulations which build off of power diagrams. These applications could make use of an
extension of this method to compute those diagrams without having to rely on additional
tools since the calculation and rendering stages can both be done within the OpenGL
environment, and this approach may have the potential for real-time fluid simulations.
Beyond fluid simulation, our method could be useful in any
application or instance where Voronoi diagrams need to be computed quickly for a large
number of sites. Additionally, since our implementation is able to render the Voronoi
diagrams directly without extra steps, any case where visualizing the resultant diagram
is desired could benefit as well.
4.1 Future Work
One important area for future work is incorporation into a full fluid simulation, which
would allow for more research to be done in order to determine the specific degree
to which implementation using the OpenGL pipeline in this way might be advantageous
over other methods. In order for this project to be extended to perform a full fluid
simulation, many modifications would need to be made, starting with those extended
capabilities described below.
The first modification would be implementing a dimension-independent algorithm
that would allow for similar computation of three-dimensional and higher dimensional
diagrams, which would be important to support three-dimensional fluid simulations.
A second modification relevant to fluid simulation applications would be to add capabilities for calcu-
lating power diagrams in addition to Voronoi diagrams.
Another modification which would be important to fluid simulations would be to
perform the computation of the nearest neighbors directly on the GPU for each site.
This would speed up this portion of the algorithm, which is currently a bottleneck in
terms of computation time. It would also mean we could more easily test and correct our
calculation if more than thirty nearest neighbors are required to reach the final geometry
of the cell. This would resolve a current limitation of our algorithm where it is possible
for the geometry generated to be incorrect if a site has more than 30 nearest neighbors.
In the case of using transform feedback for fluid simulations, we would use a process known
as buffer ping-ponging. This process is shown in Figure 4.1. Instead of proceeding
through the rasterization step, the output of the vertex shader or geometry shader is sent
back to the CPU side. We would use this process to continually solve for the weights
of the power diagram using an optimal transport problem, which is how conservation of
mass is satisfied in the fluid simulation. Every time the weights are updated, we would
need to replace the sites and recalculate the nearest neighbors. Buffer ping-ponging
allows us to perform these necessary updates, overwrite the information in the buffers,
and then repeat the process as needed until the weights are optimal, at which point we
are ready to render the particles using the pipeline. For an example of how buffer ping-
ponging can be used for fluid simulations, see Chapter 9 of the OpenGL 4.0 Shading
Language Cookbook [23] on animations and particles.
Figure 4.1: This figure shows a typical graphics pipeline with buffer ping-ponging. Our
pipeline would use a similar structure, but with the output coming from the geometry
shader instead of the vertex shader. Figure taken from the OpenGL 4.0 Shading Lan-
guage Cookbook [23].
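As a minimal C++ sketch of what the transform feedback half of this loop could look like: the names program, buf, and the captured varying v_site are hypothetical placeholders, the optimal transport solve is elided, and an OpenGL 4.1 context with loaded function pointers is assumed.

#include <utility>
// Assumes GL headers/function pointers are available (e.g. via GLEW or glad).

void run_ping_pong(GLuint program, GLuint buf[2], GLsizei nb_sites, int nb_iterations) {
    const char* varyings[] = { "v_site" }; // hypothetical captured output variable
    glTransformFeedbackVaryings(program, 1, varyings, GL_INTERLEAVED_ATTRIBS);
    glLinkProgram(program); // captured varyings take effect only after re-linking
    int src = 0, dst = 1;
    for (int it = 0; it < nb_iterations; ++it) {
        // ... bind buf[src] as the vertex attribute source for the sites ...
        glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, buf[dst]);
        glEnable(GL_RASTERIZER_DISCARD);     // skip rasterization while iterating
        glBeginTransformFeedback(GL_POINTS); // must match the captured primitive type
        glDrawArrays(GL_POINTS, 0, nb_sites);
        glEndTransformFeedback();
        glDisable(GL_RASTERIZER_DISCARD);
        // ... read back buf[dst], update weights, replace sites, redo neighbors ...
        std::swap(src, dst); // ping-pong: this pass's output feeds the next pass
    }
}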
Future research should also be done to make a comparison between our algorithm and an
approach which uses a tool such as CUDA or OpenCL for
the computation and then visualizes the diagram using OpenGL.
direct comparison to measure if the use of OpenGL for the entire program has any speed
advantage over alternative GPU approaches for generating Voronoi diagrams.
APPENDIX A
VERTEX AND FRAGMENT SHADER CODE
1 #version 410
2 flat out int instance_ID;
3 layout (location = 0 ) in vec3 a_Position;
4 // This is a pass-through shader, but we still need to set gl_Position
5 void main() {
6 gl_Position = vec4(0.0,0.0,0.0,0.0);
7 // we also save the instance_ID so we can pass it into the geometry shader
8 instance_ID = gl_VertexID;
9 }
Listing A.1: The vertex shader, a pass-through shader which forwards the vertex ID.
1 #version 410
2 layout( location = 0 ) out vec4 fragColor;
3 in vec3 v_Position;
4 in vec3 v_Normal;
5 in vec3 v_Parameter;
6 in float v_Index;
7 const int ncolor = 256;
8 uniform float u_umin;
9 uniform float u_umax;
10 uniform int u_nb_points;
11 uniform samplerBuffer colormap;
12 void get_color( float u , out vec3 color ) {
13 float umin = 0.;
14 float umax = u_nb_points +1;
15 int indx = int(ncolor*(u - umin)/(umax - umin));
16 if (indx < 0) indx = 0;
17 if (indx > 255) indx = 255;
18 float r0 = texelFetch( colormap , 3*(indx) + 0 ).x;
19 float g0 = texelFetch( colormap , 3*(indx) + 1 ).x;
20 float b0 = texelFetch( colormap , 3*(indx) + 2 ).x;
21 color = vec3(r0,g0,b0);
22 }
23 void main() {
24 vec3 color = vec3(.2, .2, .2);
25 float idx = v_Index; // conceptually the same as gl_PrimitiveID;
26 get_color(idx, color);
27 fragColor = vec4(color, 1);
28 }
Listing A.2: The fragment shader, which uses v_Index to look up each cell’s color in the colormap.
BIBLIOGRAPHY
[3] Justine Basselin, Laurent Alonso, Nicolas Ray, Dmitry Sokolov, Sylvain Lefebvre,
and Bruno Lévy. Restricted Power Diagrams on the GPU. Computer Graphics
Forum, 2021. doi:10.1111/cgf.142610.
[4] Hemant Bherwani, Saima Anjum, Suman Kumar, Sneha Gautam, Ankit Gupta,
Himanshu Kumbhare, Avneesh Anshul, and Rakesh Kumar. Understanding
COVID-19 Transmission through Bayesian Probabilistic Modeling and gis-based
Voronoi Approach: a Policy Perspective. Environment, development and
sustainability, pages 1–19, July 2020. 32837277[pmid]. doi:10.1007/s10668-020-00849-0.
[5] Philip Caplan. Lecture Notes on ’More Voronoi Diagrams’ for CSCI0422, Febru-
ary 2022.
[7] Fernando de Goes, Corentin Wallez, Jin Huang, Dmitry Pavlov, and Mathieu Des-
brun. Power Particles: An Incompressible Fluid Solver Based on Power Diagrams.
ACM Trans. Graph., 34(4), July 2015. doi:10.1145/2766901.
[8] Benedikt Grüter, David Seiferth, Matthias Bittner, and Florian Holzapfel. Emer-
gency Flight Planning using Voronoi Diagrams. January 2019. doi:10.2514/6.2019-1056.
[9] Patrick Haertel. A Lagrangian Ocean Model for Climate Studies. Climate, 7(3),
2019. URL: https://www.mdpi.com/2225-1154/7/3/41, doi:10.3390/cli7030041.
[10] Jiawei Han, Dongming Yan, Lili Wang, and Qinping Zhao. Computing Restricted
Voronoi Diagram on Graphics Hardware. In Jernej Barbic, Wen-Chieh Lin, and
Olga Sorkine-Hornung, editors, Pacific Graphics Short Papers. The Eurographics
Association, 2017. doi:10.2312/pg.20171320.
[11] A. Laraqui, A. Baataoui, A. Saaidi, A. Jarrar, Med Masrar, and K. Satori.
Image mosaicing using voronoi diagram. Multimedia Tools and Applications,
76(6):8803–8829, Mar 2017. doi:10.1007/s11042-016-3478-z.
[13] Bruno Levy. Simulating Fluids with a Computer: Introduction and Recent Ad-
vances, 2018. URL: https://arxiv.org/abs/1811.05636, doi:10.48550/ARXIV.1811.05636.
[14] Bruno Lévy and Nicolas Bonneel. Variational Anisotropic Surface Meshing with
Voronoi Parallel Linear Enumeration. In Xiangmin Jiao and Jean-Christophe
Weill, editors, Proceedings of the 21st International Meshing Roundtable, pages
349–366, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
[17] Nicolas Ray, Dmitry Sokolov, Sylvain Lefebvre, and Bruno Lévy. Meshless
Voronoi on the GPU. ACM Trans. Graph., 37(6), December 2018. doi:10.1145/3272127.3275092.
[18] Guodong Rong and Tiow-Seng Tan. Jump Flooding in GPU with Applications to
Voronoi Diagram and Distance Transform. In Proceedings of the 2006 Symposium
on Interactive 3D Graphics and Games, I3D ’06, page 109–116, New York, NY,
USA, 2006. Association for Computing Machinery. doi:10.1145/1111411.1111431.
[20] Sara C. Schvartzman and Miguel A. Otaduy. Fracture Animation Based on High-
Dimensional Voronoi Diagrams. In Proceedings of the 18th Meeting of the ACM
SIGGRAPH Symposium on Interactive 3D Graphics and Games, I3D ’14, page
15–22, New York, NY, USA, 2014. Association for Computing Machinery. doi:10.1145/2556700.2556713.
[22] Robert Whitehurst. A Free Lagrange Method for Gas Dynamics. Monthly
Notices of the Royal Astronomical Society, 277(2):655–680, 11 1995. URL: https://academic.oup.com/mnras/article-pdf/277/2/655/18200221/mnras277-0655.pdf, doi:10.1093/mnras/277.2.655.
[23] David Wolff. OpenGL 4.0 Shading Language Cookbook. Packt Publishing, 2011.