Efficient Homology Computations On Multicore and Manycore Systems
N. Anurag Murty¹, Vijay Natarajan¹,², Sathish Vadhiyar¹
¹ Supercomputer Education and Research Centre
² Department of Computer Science and Automation
Indian Institute of Science, Bangalore, India
Abstract: Homology computations form an important step in topological data analysis that helps to identify connected components, holes, and voids in multi-dimensional data. Our work focuses on algorithms for homology computations of large simplicial complexes on multicore machines and on GPUs. This paper presents two parallel algorithms to compute homology. A core component of both algorithms is the algebraic reduction of a cell with respect to one of its faces while preserving the homology of the original simplicial complex. The first algorithm is a parallel version of an existing sequential implementation using OpenMP. The algorithm processes and reduces cells within each partition of the complex in parallel while minimizing sequential reductions on the partition boundaries. Cache misses are reduced by ensuring data locality for data in the same partition. We observe a linear speedup on algebraic reductions and an overall speedup of up to 4.9× with 16 cores over sequential reductions. The second algorithm is based on a novel approach for homology computations on manycore/GPU architectures. This GPU algorithm is memory efficient and capable of extremely fast computation of homology for simplicial complexes with millions of simplices. We observe up to 40× speedup in runtime over sequential reductions and up to 4.5× speedup over the REDHOM library, which includes the sequential algebraic reductions together with other advanced homology engines supported in the software.

I. INTRODUCTION

Topology is the study of the connectivity of space and provides useful tools for analyzing datasets by enabling the abstract representation of features in the data. Topological data analysis finds numerous applications in neuroscience, astrophysics, image analysis, and nonlinear dynamics [1–6]. All of these applications are characterised by very large data sizes from which topological data analysis reveals underlying patterns and structure. This structure is extracted in the form of connected components, holes, and voids of higher dimensions along which the data aligns itself in space. The characterization of these connected components, holes, voids, and their higher dimensional equivalents is more formally described by the notion of homology. Computing homology requires the construction of a combinatorial representation of the space such as a simplicial complex.

An interesting application of homology computations is the detection of holes in the coverage of a sensor network[7]. Hole detection is useful in cell-phone communications, beacon navigation and some problems in security and defense. These types of applications require real-time computation of homology. The requirement for real-time computations and increasingly large datasets highlight the need for fast and memory-efficient algorithms for homology computations. This serves as our primary motivation for developing parallel algorithms for homology computations.

We present parallelization strategies for fast computation of homology on multicore and manycore GPU systems. The algorithm we consider for parallelization uses the method of algebraic reductions to reduce the size of the input space while maintaining its homology[8]. For implementation on multicore architectures, the algebraic reduction step in REDHOM is parallelized using OpenMP[9]. We decompose the complex and perform parallel reductions on the different partitions while keeping the boundaries between partitions intact. The next step involves algebraic reduction of the unreduced boundary cells sequentially to compute homology. We obtain up to 4.9× improvements in performance over sequential algebraic reductions.

The above idea does not scale well for higher degrees of parallelism as in the case of GPU architectures. So we describe a different algorithm amenable to massively parallel architectures. Each GPU thread attempts to perform an algebraic reduction but this is possible only when certain conditions involving its neighbours are met. Moreover, it is observed that construction of the entire simplicial complex is not necessary for performing algebraic reductions. This observation speeds up homology computations and leads to a memory-efficient algorithm. Finally, we define a cost function that enables us to perform reductions in a load-balanced manner. Implementation of this algorithm gives up to 40× speedup over sequential algebraic reductions. We also obtain up to 4.5× speedup over REDHOM, which implements some of the fastest sequential algorithms for homology computations.

Primary contributions of this work are:
1) A multicore algorithm for fast homology computations.
2) Modifications to sequential algebraic reductions using OpenMP that improve the performance by up to 4.9×.
3) A memory-efficient GPU algorithm based on algebraic reductions that gives up to 40× speedup over the sequential algorithm and up to 4.5× speedup over homology computations in the REDHOM library.
4) A novel cost assignment scheme to ensure load-balanced execution and to ensure that only low-cost reductions are performed in a given iteration.
Fig. 1. (a) A valid simplicial complex of dimension 2. (b) An invalid simplicial complex since A and B do not intersect on an edge or a vertex.
Fig. 2. The torus has one connected component, two tunnels and one void.
Although we focus on computing the rank of homology groups in our description, the algorithms presented could optionally be used to output an incidence matrix of the reduced complex. The Smith normal form algorithm can be applied to compute torsion coefficients also. This extension is relevant only in complexes of dimensions higher than three.

Section II provides the required background and definitions, especially focusing on algebraic reductions. Section III is a literature survey of prior research in this area. Sections IV and V provide detailed descriptions of the proposed algorithms for homology computations on multicore systems and GPUs respectively. Experimental results are presented in Section VI and Section VII presents possible directions for future research.

II. BACKGROUND

In topology, we study the properties of spaces that are invariant under continuous deformations or, more formally, homeomorphisms. A finite representation of topological spaces is required to compute these topological invariants. An example of such a finite representation is a simplicial complex. We present below a few definitions that are required to describe our methodology. For a more mathematical treatment, we refer the reader to the texts by Zomorodian[10] and Munkres[11].

A. Simplicial complexes and simplicial homology

A k-simplex σ is the convex hull of a set A of k + 1 affinely independent points in ℝ^d, for 0 ≤ k ≤ d. We use the terms vertex for 0-simplex, edge for 1-simplex, triangle for 2-simplex and tetrahedron for 3-simplex. A simplex σ′ is a face of a simplex σ if σ′ is contained in σ. A simplicial complex, K, is a finite set of simplices satisfying two properties: (i) if σ ∈ K and τ is a face of σ then τ ∈ K, and (ii) if σ ∈ K and σ′ ∈ K, then σ ∩ σ′ is either ∅ or a face of both σ and σ′. The dimension of K, d(K), is defined as the maximum dimension of a simplex in K. Fig. 1(a) shows a valid simplicial complex whereas the collection of simplices in Fig. 1(b) does not satisfy property (ii) and is thus not a simplicial complex.

A k-simplex σ can be represented as the set of its vertices [v_0, v_1, ..., v_k]. For instance, a triangle with vertices A, B, and C can be represented as [A, B, C]. The boundary of a k-simplex is formed by the (k − 1)-simplices bounding it. In our example, the boundary of [A, B, C] consists of the three edges [A, B], [B, C] and [C, A]. The boundary of a k-simplex σ = [v_0, v_1, ..., v_k] is defined as the formal sum

$$\partial\sigma = \sum_{i=0}^{k} (-1)^i\, [v_0, v_1, \ldots, \hat{v}_i, \ldots, v_k],$$

where $\hat{v}_i$ indicates that vertex $v_i$ is omitted. A minus sign in this sum means including the same simplex but with the opposite orientation, i.e., with any two of the vertices interchanged. Simplices with opposite orientations cancel each other out. The coboundary of a k-simplex σ′ is the set of all (k + 1)-simplices that have σ′ as a face. If a simplex σ′ lies in the boundary of σ, then σ lies in the coboundary of σ′. For instance, in Figure 1(a), ∂A = 3 + 4 + 5 and ∂B = 5 + 6 + 7. The coboundary of edges 1 and 2 is empty (∅). The coboundary of edges 3 and 4 is {A}, and that of edges 6 and 7 is {B}. Edge 5 has {A, B} as its coboundary.

A fundamental property of boundaries is that the boundary function applied twice is zero. In the above example, ∂∂[A, B, C] = ∂([B, C] + [C, A] + [A, B]) = ([C] − [B]) + ([A] − [C]) + ([B] − [A]) = 0. We define a k-cycle as any formal sum of k-simplices whose boundary is zero. Because of this property, all boundaries are cycles. However, not all cycles bound a higher dimensional simplex. For instance, if our original simplicial complex had the edges [B, C], [C, D] and [D, B] but not the triangle [B, C, D], the boundary of [B, C] + [C, D] + [D, B] is ∂([B, C] + [C, D] + [D, B]) = ([C] − [B]) + ([D] − [C]) + ([B] − [D]) = 0. The edges [B, C], [C, D] and [D, B] form a cycle that is not a boundary of any triangle.

The homology of a simplicial complex deals with counting the number of independent cycles that do not bound any set of simplices in a higher dimension. The homology groups in orders 0, 1 and 2 capture the connected components, tunnels, and voids respectively, and are represented as algebraic groups. In this paper, we are interested in computing the rank of these groups and we refer to these computations as homology computations. For example, homology computations identify one connected component, two independent tunnels and one void in the simplicial complex that represents a torus in Figure 2. For ease of description, computations are performed modulo 2, which gives us the Z2 homology[10].

B. Algebraic reduction

Consider the simplicial complex in Figure 1(a). It consists of one connected component and contains one tunnel. Clearly, we can construct a smaller complex that also has one component and one tunnel. Reduction algorithms reduce the size of a simplicial complex in such a way that the homology remains unchanged.
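The claim that a smaller complex can carry the same homology can be checked on toy inputs. The following is a minimal sketch, not the paper's implementation: it computes Z2 Betti numbers directly from the ranks of the boundary matrices, and the function names (`closure`, `rank_mod2`, `betti_numbers`) and the example complexes are illustrative assumptions.

```python
from itertools import combinations

def closure(maximal_simplices):
    """Return all faces of the given maximal simplices, grouped by dimension."""
    by_dim = {}
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            for face in combinations(s, k):
                by_dim.setdefault(k - 1, set()).add(face)
    return {d: sorted(f) for d, f in by_dim.items()}

def rank_mod2(rows):
    """Rank over Z2 of a matrix whose rows are Python ints used as bit vectors."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # addition modulo 2
    return len(pivots)

def betti_numbers(maximal_simplices):
    """Betti numbers over Z2: dim C_k - rank d_k - rank d_{k+1}."""
    cells = closure(maximal_simplices)
    top_dim = max(cells)
    ranks = {0: 0, top_dim + 1: 0}  # rank of d_k : C_k -> C_{k-1}
    for k in range(1, top_dim + 1):
        index = {f: i for i, f in enumerate(cells[k - 1])}
        rows = []
        for s in cells[k]:
            bits = 0
            for face in combinations(s, k):  # the k+1 faces of dimension k-1
                bits |= 1 << index[face]
            rows.append(bits)
        ranks[k] = rank_mod2(rows)
    return [len(cells[k]) - ranks[k] - ranks[k + 1] for k in range(top_dim + 1)]

# A hollow square and a hollow triangle: different sizes, same homology.
print(betti_numbers([(0, 1), (1, 2), (2, 3), (0, 3)]))  # [1, 1]
print(betti_numbers([(0, 1), (1, 2), (0, 2)]))          # [1, 1]
```

Both complexes have one connected component and one tunnel, so the three-edge cycle is a valid smaller stand-in for the four-edge one. Filling in the triangle, `betti_numbers([(0, 1, 2)])`, gives `[1, 0, 0]` because the cycle now bounds a 2-simplex.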
We focus on algebraic reductions to reduce the size of
the complex[8]. Initially, each dimension d of the simplicial
complex consists of the set of all the d-simplices. During the
reduction procedure, d-simplices can merge to form d-cells,
which can be thought of as more general versions of simplices.
For example, vertices are 0-cells, edges are 1-cells, polygons
are 2-cells and 3-D polytopes are 3-cells. In any intermediate
step of the procedure, dimension d consists of the set of all
the d-cells.
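For two cells u, v of the same dimension, we define ⟨u, v⟩ to be 1 when u = v and 0 otherwise. Extended linearly to chains, ⟨∂v, a⟩ is 1 exactly when a appears in ∂v. After the algebraic reduction of cell b of dimension m with respect to its face a in dimension m − 1, the new boundary maps $\bar{\partial}$ are given by Equation 1, where addition is performed modulo 2.

$$\bar{\partial} v = \begin{cases} \partial v, & \text{if } d(v) \notin \{m, m+1\}, \\ \partial v + \langle \partial v, a \rangle\, \partial b, & \text{if } d(v) = m, \\ \partial v + \langle \partial v, b \rangle\, b, & \text{if } d(v) = m + 1. \end{cases} \tag{1}$$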
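To illustrate how Equation 1 acts on a concrete complex, here is a small sequential sketch over Z2. It is an assumption-laden toy and not the REDHOM data structure: chains are stored as Python sets of vertex tuples, and the names `build_complex`, `reduce_pair` and `betti_by_reduction` are hypothetical.

```python
from itertools import combinations

def build_complex(maximal_simplices):
    """Boundary and coboundary maps (sets, i.e. Z2 chains) for all faces."""
    bdy, cbdy = {}, {}
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            for cell in combinations(s, k):
                if cell not in bdy:
                    bdy[cell] = set(combinations(cell, k - 1)) if k > 1 else set()
                    cbdy.setdefault(cell, set())
    for cell, faces in bdy.items():
        for f in faces:
            cbdy[f].add(cell)
    return bdy, cbdy

def reduce_pair(bdy, cbdy, a, b):
    """Reduce cell b with respect to its face a (Equation 1 over Z2)."""
    assert a in bdy[b]
    # Case d(v) = m + 1: drop b from the boundary of each of its cofaces.
    for w in cbdy[b]:
        bdy[w].discard(b)
    # Case d(v) = m: every other cell v containing a gets bdy(v) + bdy(b).
    for v in list(cbdy[a] - {b}):
        for f in bdy[b]:
            # Symmetric difference, keeping the coboundary map in sync.
            if f in bdy[v]:
                bdy[v].remove(f)
                cbdy[f].discard(v)
            else:
                bdy[v].add(f)
                cbdy[f].add(v)
    # Remove a and b from the complex.
    for f in bdy[b]:
        cbdy[f].discard(b)
    for g in bdy[a]:
        cbdy[g].discard(a)
    for cell in (a, b):
        del bdy[cell], cbdy[cell]

def betti_by_reduction(maximal_simplices):
    """Reduce from the top dimension down; surviving cells count the ranks."""
    bdy, cbdy = build_complex(maximal_simplices)
    top = max(len(c) for c in bdy) - 1
    for m in range(top, 0, -1):
        for b in [c for c in bdy if len(c) - 1 == m]:
            if b in bdy and bdy[b]:
                a = min(bdy[b])  # any face works over Z2; min keeps runs repeatable
                reduce_pair(bdy, cbdy, a, b)
    betti = [0] * (top + 1)
    for c in bdy:
        betti[len(c) - 1] += 1
    return betti

print(betti_by_reduction([(0, 1), (1, 2), (0, 2)]))  # [1, 1]
```

Once every remaining cell has an empty boundary, no further reduction is possible and the number of cells left in each dimension is the rank of the homology in that dimension. The hollow tetrahedron `[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]` reduces to `[1, 0, 1]`, one component and one void.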
an input complex into local pieces and performing parallel computations on these. After the parallel computations, the pieces are merged and homology is calculated again to give the final result. The method relies on the property of the initial division that the homology is equal to the sum of homology of the individual pieces. However, this method does not scale to the level of parallelism offered by manycore GPU platforms,

IV. HOMOLOGY COMPUTATIONS ON MULTICORE SYSTEMS

We now propose two approaches to parallelizing the homology computation algorithm. The first approach is suitable for multicore computations and is based on the sequential algorithm implemented in the REDHOM library[15]. The library has efficient implementations of algorithms based on reduction methods such as acyclic subspace construction, elementary reductions, and discrete Morse theory. Each of these techniques is applied in sequence on the input complex to reduce its size and hence compute the homology efficiently. In this work, our focus is only on parallelizing algebraic reductions. We disable all the other steps of REDHOM and restrict our attention to algebraic reductions.

First, we discuss the steps of the sequential algebraic reductions, which are profiled for various datasets in Figure 4. We discuss the datasets mentioned in the figure in more detail in Section VI.

Fig. 4. Timings for various functions in homology computations using sequential algebraic reductions. [Stacked bar chart: time in seconds per dataset; components: read and construct simplicial complex, codes + reducible complex, algebraic reductions.]

A. Sequential algorithm for algebraic reductions

Read and construct simplicial complex. In this step, the maximal simplices of a simplicial complex are taken as input and the simplices of all dimensions are generated. This step forms the pre-processing step for all the algorithms implemented in REDHOM, including algebraic reductions. As seen in Figure 4, this step has a very low contribution to the execution times of sequential algebraic reductions.

Codes assignment and construction of reducible complex. The simplicial complex constructed in the previous step has to be algebraically reduced for homology computations using Equation 1. Since each step of a reduction modifies the boundaries and coboundaries of simplices, we need a data structure that provides fast access to boundary and coboundary data. For the purpose of creating a map from simplices to their boundaries and coboundaries, integer codes are assigned to all the simplices. Then boundary and coboundary maps, which assign a chain to each code, are constructed. This set of maps constitutes a reducible complex on which algebraic reductions are performed. This step takes up the highest percentage of the total execution time.

Algebraic reductions. This step performs the actual reductions on the reducible complex that represents the input simplicial complex. Starting from the highest dimension, the cells are reduced with respect to their faces and their boundary maps are modified. For each dimension, the number of remaining irreducible cells is the rank of the homology in that dimension. The modification of these boundaries and coboundaries to compute homology is also a time-consuming step in sequential reductions.

Algorithm 1 Algorithm For Multicore Homology
Input: Maximal simplices of simplicial complex K
Output: Homology: β[0], β[1], ..., β[d(K)]
1: Partition the simplicial complex K (P_0, P_1, ..., P_{k−1})
2: Mark boundary vertices
3: Spawn k threads and assign thread t to P_t
4: (In Parallel) Threads construct reducible complex for their partition
5: (In Parallel) Threads reduce non-boundary cells in their partition
6: Merge unreduced partitions to get a single reducible chain complex K′
7: Perform algebraic reductions on all reducible cells of K′
8: β[d] is cardinality of irreducible cells in dimension d
return β

B. Multicore algorithm for algebraic reductions

We parallelize the construction of the reducible complex and the algebraic reductions, as these two steps are the major contributors to the execution time of homology computations using algebraic reductions. Algorithm 1 explains the steps for computation of homology on multicore machines. The steps are: decomposition of the complex into partitions, parallel reductions of these partitions, merging of the reduced partitions and sequential reduction of the merged complex.

Decomposition into partitions. Parallelization depends on an initial partition of the input mesh into near-equal sized meshes whose boundaries are as small as possible. These partitions are generated using METIS, a graph partitioning software[20]. Minimizing the number of boundary cells helps in reducing the time spent in the sequential part of our algorithm while similar sized partitions help in maintaining load balance during the parallel phase. Simplices with all their vertices occurring in two or more partitions are marked as boundary simplices. During the serial read phase, we preallocate contiguous memory for non-boundary simplices from each partition. This ensures spatial locality of simplices from the same partition.
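The boundary-marking rule stated above can be sketched as follows. This is an illustration under assumptions, not the paper's code: the partition assignment `part_of` is supplied by hand (the paper obtains it from METIS), a vertex is taken to "occur in" a partition when some maximal simplex assigned to that partition contains it, and the function name `mark_boundary` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def mark_boundary(maximal_simplices, part_of):
    """Split faces into per-partition interior sets and a shared boundary set.

    part_of[i] is the partition id of maximal simplex i (assumed input;
    in the paper this assignment comes from a graph partitioner).
    """
    parts_of_vertex = defaultdict(set)
    for simplex, p in zip(maximal_simplices, part_of):
        for v in simplex:
            parts_of_vertex[v].add(p)
    # A vertex seen by two or more partitions is a boundary vertex.
    boundary_vertices = {v for v, ps in parts_of_vertex.items() if len(ps) > 1}

    interior = defaultdict(set)
    boundary = set()
    for simplex, p in zip(maximal_simplices, part_of):
        s = tuple(sorted(simplex))
        for k in range(1, len(s) + 1):
            for face in combinations(s, k):
                # Boundary simplex: all of its vertices are boundary vertices.
                if all(v in boundary_vertices for v in face):
                    boundary.add(face)
                else:
                    interior[p].add(face)
    return boundary_vertices, dict(interior), boundary

# Two triangles sharing the edge (1, 2), one triangle per partition.
bv, interior, boundary = mark_boundary([(0, 1, 2), (1, 2, 3)], [0, 1])
print(sorted(bv))        # [1, 2]
print(sorted(boundary))  # [(1,), (1, 2), (2,)]
```

Under this rule the interior sets of different partitions are disjoint, since any face with a non-boundary vertex can only belong to maximal simplices of a single partition. That is what allows each thread to reduce its interior cells without synchronizing with the others.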
Also, a separate memory space is allocated beforehand for the boundary simplices.

Fig. 5. Intermediate steps in reductions of the partitions by different threads. Different colours represent the partitions reduced by the threads. The boundary elements are shown in red and are not reduced in the parallel phase.
Fig. 6. Intermediate steps of the sequential reduction phase. The partitions are merged in this case and sequential reductions are performed subsequently.

Parallel construction of the reducible complex. After boundary vertices are marked, we spawn one thread per partition. First, the threads split the simplicial complex obtained from the read operation to construct one reducible complex per partition. We allocate memory pools for each partition and ensure that simplices which belong to the same partition are assigned to the same memory pool. As data locality is maintained for the non-boundary simplices of a partition, cache misses incurred in the iterations over these simplices to construct the reducible complex in parallel are greatly reduced. This step is crucial in improving the performance of the multicore algorithm.

For the set of boundary simplices, all the threads have to iterate over the same memory space, but this does not greatly affect performance due to the small size of the boundary compared to the size of the partitions.

The reducible complex represents the boundary and coboundary information of the partition assigned to the thread. Codes are assigned to cells while ensuring that boundary cells, i.e., cells with all their vertices on the boundary, are assigned the same code over all partitions. Using METIS for graph partitioning ensures that we obtain balanced partitions of the mesh, thus keeping these parallel constructions load-balanced.

Parallel algebraic reductions. Algebraic reductions are then performed in parallel on each of these reducible complexes. The reductions are done on all cells with the exception of boundary cells. Boundary cells are shared by two or more partitions and are not reduced in the parallel phase. Our partitions should thus have very small boundary sizes to maximize the parallel reductions. Figure 5 shows some of the intermediate steps in the parallel reduction phase. The threads reduce the different coloured partitions in parallel and leave the boundaries unreduced.

Merge and sequential reductions. Now each thread has a partially reduced chain complex consisting primarily of unreduced boundary cells. All of these complexes are then merged together to form a new reducible complex. As reduction operations preserve homology, the homology of this new complex is the same as the homology of the input mesh. Algebraic reductions are applied to all reducible cells of the merged chain complex. After this step is completed, the homology in each dimension is given by the number of irreducible cells of that dimension. Figure 6 shows the different partitions being merged, after which the complex is reduced sequentially.

V. HOMOLOGY COMPUTATIONS ON MANYCORE/GPU SYSTEMS

As mentioned, the methodology used for multicore homology cannot be directly extended to GPU architectures. We propose an algorithm to compute homology on GPUs. The discussion considers Z2 homology, i.e., addition modulo 2, but can be easily extended to arbitrary fields, albeit with increased space requirements. We assume that the input is in the form of maximal simplices of a simplicial complex and is stored as a set of vertex arrays in the GPU global memory. Algorithm 2 is the homology computation algorithm for GPUs. The important steps in the algorithm are explained below.

Reducing memory requirements for GPU algorithm. In Equation 1, we observe that reducing a cell in dimension m can only modify the boundaries in dimensions m and m + 1. So, if we start from the highest dimension and work our way downwards, the boundaries in only the highest dimension are modified[8]. An algebraic reduction of a cell in dimension m is performed with respect to one of its faces in dimension m − 1. Thus, given the list of all cells we can generate all the faces with which the cells can be paired and reduced. This implies that we just need to transfer the list of simplices of the highest unreduced dimension m to perform algebraic reductions. This crucial observation helps in improving performance of the algorithm as all the intermediate data structures are generated on the GPU so that data transfers between the host and device are minimized. Intermediate data structures include the boundary data and the coboundary data of the cells and faces respectively. For lower dimensions, we only need to carry forward the unreduced faces from this dimension. In comparison to the space required for storing the entire simplicial complex, GPU memory requirements are very low when we adopt this approach of constructing the complex per dimension starting from the highest dimension. In contrast, the entire simplicial complex with simplices in all dimensions is constructed in REDHOM as a pre-processing step.

Algorithm 2 Algorithm For GPU Homology
Input: Maximal simplices of simplicial complex K
Output: Homology: β[0], β[1], ..., β[d(K)]
1: for dim = d(K) downto 1 do
2: Transfer cells of dimension dim to GPU (struct-cells)
3: Allocate space for faces on GPU (struct-faces)
4: β[dim] = Reduce-dimension(struct-cells, struct-faces)
5: Merge unreduced faces in struct-faces with cells of dimension dim − 1
6: end for
7: return β

the symmetric difference operation carried out for algebraic reductions can be executed in time linear in the size of the boundary/coboundary.

Algorithm 3 Procedure Reduce-dimension
1: Reduce-dimension(struct-cells, struct-faces){
2: (GPU) Generate faces from cells
3: (GPU) Assign values to boundary, boundary-size vectors
4: (GPU) Sort faces in lexicographic order and mark repeated faces
5: (GPU) Assign values to coboundary, coboundary-size vectors
6: (GPU) Remap to get newIDs in boundary vectors
7: (GPU) Remove repeats from face vectors
8: Initialize variables irreducible, reduced to 0
9: while (irreducible + reduced < number of cells) do
10: (GPU) Each cell finds face with min. cost of reduction
11: (GPU) Cells with min. costs within fixed margin do a race-prioritycheck-check to lock required boundaries and coboundaries
12: (GPU) Invoke Kernel Reduce-pair
13: (GPU) Update values of reduced and irreducible
14: end while
15: return irreducible
16: }
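The set-up phase of Reduce-dimension (steps 2 to 7 of Algorithm 3) can be mimicked on the CPU to show what the per-dimension data structures look like. The sketch below is a sequential stand-in written under assumptions: it is not GPU code, the function name `generate_faces` and the list-of-lists layout are illustrative, and on the GPU each step would be a data-parallel kernel or sort rather than a Python loop.

```python
def generate_faces(cells):
    """Build faces, boundary and coboundary lists from cells of one dimension.

    cells: list of sorted vertex tuples, all of the same dimension m >= 1.
    Returns (faces, boundary, coboundary) where boundary[c] lists face IDs
    of cell c and coboundary[f] lists cell IDs of face f, both sorted.
    """
    # Step 2: each cell emits its m + 1 faces by dropping one vertex.
    emitted = [(cell[:i] + cell[i + 1:], cid)
               for cid, cell in enumerate(cells)
               for i in range(len(cell))]
    # Step 4: a lexicographic sort brings repeated faces next to each other.
    emitted.sort()
    faces, coboundary = [], []
    boundary = [[] for _ in cells]
    for face, cid in emitted:
        # Step 7: keep only the first occurrence of a repeated face.
        if not faces or faces[-1] != face:
            faces.append(face)
            coboundary.append([])
        fid = len(faces) - 1
        boundary[cid].append(fid)    # Steps 3 and 6: boundary stores new IDs
        coboundary[fid].append(cid)  # Step 5
    return faces, boundary, coboundary

# Two triangles sharing the edge (1, 2).
faces, boundary, coboundary = generate_faces([(0, 1, 2), (1, 2, 3)])
print(faces)       # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(boundary)    # [[0, 1, 2], [2, 3, 4]]
print(coboundary)  # [[0], [0], [0, 1], [1], [1]]
```

Only the cells of the current dimension are needed as input; the faces, their IDs and both incidence lists are derived from them, which is the memory saving described for the GPU algorithm. Because the lists come out sorted, a later Z2 addition of two boundaries can be done by a single linear merge.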
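set difference of two sorted arrays is linear in the sum of the number of elements in these arrays, the cost of reducing cell b with face a is:

$$\begin{aligned} \text{reduction\_cost}(a, b) ={} & (\#\mathrm{Bdy}(b) - 1) \times \#\mathrm{Cbdy}(a) + \sum_{g \in \mathrm{Bdy}(b) \setminus \{a\}} \#\mathrm{Cbdy}(g) \\ & + (\#\mathrm{Cbdy}(a) - 1) \times \#\mathrm{Bdy}(b) + \sum_{u \in \mathrm{Cbdy}(a) \setminus \{b\}} \#\mathrm{Bdy}(u) \end{aligned} \tag{2}$$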
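Equation 2 can be evaluated directly from the sizes of the boundary and coboundary lists. The following sketch assumes boundaries and coboundaries are stored as sorted lists of integer IDs; the function names `symmetric_difference_sorted` and `reduction_cost` and the example data are hypothetical and only illustrate the arithmetic.

```python
def symmetric_difference_sorted(x, y):
    """Merge two sorted ID lists, dropping IDs present in both (Z2 addition).

    Runs in time linear in len(x) + len(y), which is the per-update cost
    that Equation 2 adds up.
    """
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            i += 1
            j += 1
        elif x[i] < y[j]:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    out.extend(x[i:])
    out.extend(y[j:])
    return out

def reduction_cost(a, b, boundary, coboundary):
    """Equation 2: work needed to reduce cell b with respect to its face a."""
    bdy_b, cbdy_a = boundary[b], coboundary[a]
    # Coboundary updates for the other faces g of b.
    cost = (len(bdy_b) - 1) * len(cbdy_a)
    cost += sum(len(coboundary[g]) for g in bdy_b if g != a)
    # Boundary updates for the other cofaces u of a.
    cost += (len(cbdy_a) - 1) * len(bdy_b)
    cost += sum(len(boundary[u]) for u in cbdy_a if u != b)
    return cost

# Two triangles (cells 0 and 1) sharing face 2; faces 0, 1, 3, 4 are free.
boundary = [[0, 1, 2], [2, 3, 4]]
coboundary = [[0], [0], [0, 1], [1], [1]]
print(reduction_cost(0, 0, boundary, coboundary))  # 5
print(reduction_cost(2, 0, boundary, coboundary))  # 12
print(symmetric_difference_sorted([0, 1, 2], [2, 3, 4]))  # [0, 1, 3, 4]
```

In this example, pairing cell 0 with a free face (cost 5) is cheaper than pairing it with the shared face (cost 12), because the shared face forces an update of the neighbouring cell's boundary. Preferring low-cost pairs in each iteration is what keeps the work per thread comparable.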
The time taken by the various functions during sequential algebraic reductions in REDHOM is shown in Figure 4. In all cases, construction of the reducible complex is the most time-consuming operation, followed by algebraic reductions. The read and construction of the simplicial complex is done sequentially. Codes are assigned to the simplices and reducible complexes are constructed on the different partitions in parallel. Algebraic reductions are performed on all non-boundary simplices in parallel, following which the unreduced simplices are merged.

The results of this parallelization for different numbers of threads on the SYNTH dataset are presented in Figure 8.

Fig. 8. Parallelization results for dataset SYNTH using multicore reductions. [Stacked bar chart: time in seconds against number of threads (1, 2, 4, 8, 16); components: read and construct simplicial complex, split, codes + reducible complex, algebraic reductions, merge.]

For all datasets, it is observed that the algebraic reductions step of the sequential algorithm scales linearly with increasing number of cores. We obtain up to 10.7× speedup for this step with 16 cores. We notice that the execution
Fig. 10. Speedup of average GPU timings with respect to sequential algebraic reductions. [Bar chart: speedup per dataset (BLUNT, POST, BUCKY, SYNTH).]

Fig. 11. Comparison of GPU and multicore timings. [Bar chart: time in seconds per dataset (BLUNT, POST, BUCKY, SYNTH); series: multicore (16 cores) algebraic reductions, GPU (average times).]

algorithms for efficient homology computation in addition to the algebraic reductions. Even when all homology computation engines of REDHOM are switched on, speedups of up to 4.5× are observed with the GPU algorithm, as seen in Figure 12.

We also observe better speedups for the POST and SYNTH datasets compared to the other datasets for both the algorithms. In Figure 4, we notice that for both these datasets the time spent in algebraic reductions forms a high percentage of the total execution time. This relationship between the contribution of the algebraic reduction step to the total execution time and the speedups obtained for the dataset was observed in general for all the datasets on which we tested our algorithm.

Fig. 12. Comparison of average GPU timings with optimized REDHOM, which includes the sequential algebraic reductions together with other advanced homology engines supported in the software. [Bar chart: time in seconds per dataset (BLUNT, POST, BUCKY, SYNTH); series: REDHOM (serial, optimized), GPU (average times).]

VII. CONCLUSIONS AND FUTURE WORK

In this work, we have developed algorithms for homology computations on multicore and manycore GPU systems. We observe up to 4.9× speedup with 16 cores over sequential algebraic reductions on multicore systems. A speedup of up to 40× over the sequential algebraic reductions is observed using our GPU algorithm. The GPU algorithm compares favourably with the REDHOM library, which has a series of algorithms for homology computations, giving up to 4.5× performance gains.

We have explored the possibility of parallelization exclusively based on algebraic reductions. There are many other types of reduction algorithms implemented in REDHOM. We plan to extend our work further by identifying algorithms that work at a local level to reduce the size of the simplicial complex and then using a similar approach to parallelize them. Another possible extension could be parallel algorithms for homology computations in a distributed memory environment.

ACKNOWLEDGEMENTS

This work was partially supported by the Department of Science and Technology, India, under Grant SR/S3/EECE/0086/2012.

REFERENCES

[1] T. Kaczynski, K. Mischaikow, and M. Mrozek, Computational Homology. New York: Springer, 2004, vol. 157.
[2] G. Singh, F. Memoli, T. Ishkhanov, G. Sapiro, G. Carlsson, and D. L. Ringach, “Topological analysis of population activity in visual cortex,” Journal of Vision, vol. 8, no. 8, 2008.
[3] S. Maadasamy, H. Doraiswamy, and V. Natarajan, “A hybrid parallel algorithm for computing and tracking level set topology,” HiPC, 2012.
[4] A. Gyulassy, V. Natarajan, V. Pascucci, P.-T. Bremer, and B. Hamann, “A topological approach to simplification of three-dimensional scalar functions,” IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 4, pp. 474–484, 2006.
[5] H. Edelsbrunner and J. L. Harer, Computational Topology: An Introduction. American Mathematical Soc., 2010.
[6] R. van de Weygaert, G. Vegter, H. Edelsbrunner, B. J. Jones, P. Pranav, C. Park, W. A. Hellwing, B. Eldering, N. Kruithof, E. P. Bos et al., “Alpha, betti and the megaparsec universe: on the topology of the cosmic web,” in Transactions on Computational Science XIV. Springer, 2011, pp. 60–101.
[7] R. Ghrist and A. Muhammad, “Coverage and hole-detection in sensor networks via homology,” in Proc. Intl. symp. Information processing in sensor networks. IEEE Press, 2005, p. 34.
[8] T. Kaczyński, M. Mrozek, and M. Ślusarek, “Homology computation by reduction of chain complexes,” Computers & Mathematics with Applications, vol. 35, no. 4, pp. 59–70, 1998.
[9] L. Dagum and R. Menon, “OpenMP: an industry standard API for shared-memory programming,” Computational Science & Engineering, IEEE,
[10] A. J. Zomorodian, Topology for Computing. Cambridge University Press, 2005.
[11] J. R. Munkres, Elements of Algebraic Topology. Addison-Wesley Reading, 1984.
[12] R. Kannan and A. Bachem, “Polynomial algorithms for computing the Smith and Hermite normal forms of an integer matrix,” SIAM Journal on Computing, vol. 8, no. 4, pp. 499–507, 1979.
[13] B. R. Donald and D. R. Chang, “On the complexity of computing the homology type of a triangulation,” in Proc. Annual symp. Foundations of Computer Science, 1991, pp. 650–661.
[14] C. J. A. Delfinado and H. Edelsbrunner, “An incremental algorithm for betti numbers of simplicial complexes on the 3-sphere,” Computer Aided Geometric Design, vol. 12, no. 7, pp. 771–784, 1995.
[15] “REDHOM,” http://redhom.ii.uj.edu.pl/.
[16] M. Mrozek and B. Batko, “Coreduction homology algorithm,” Discrete & Computational Geometry, vol. 41, no. 1, pp. 96–118, 2009.
[17] M. Mrozek, P. Pilarczyk, and N. Żelazna, “Homology algorithm based on acyclic subspace,” Computers & Mathematics with Applications, vol. 55, no. 11, pp. 2395–2412, 2008.
[18] S. Harker, K. Mischaikow, M. Mrozek, V. Nanda, H. Wagner, M. Juda, and P. Dłotko, “The efficiency of a homology algorithm based on discrete morse theory and coreductions,” in Proc. Intl. Workshop Computational Topology in Image Context (CTIC 2010). Image A, vol. 1, 2010, pp. 41–47.
[19] R. H. Lewis and A. Zomorodian, “Multicore homology,” http://comptop.stanford.edu/preprints/, 2012.
[20] “METIS,” http://glaros.dtc.umn.edu/gkhome/views/metis.
[21] R. Nasre, M. Burtscher, and K. Pingali, “Morph algorithms on GPUs,” in Proc. ACM SIGPLAN symp. Principles and practice of parallel programming, 2013, pp. 147–156.
[22] “Aim@Shape,” http://www.aimatshape.net/.