Data Structures and Algorithms for the Optimization of Hierarchical Hybrid Multigrid Methods

Gradl, Tobias

Files

6836_Gradl_Dissertation.pdf (976.8 KB)

Language

en

Document Type

Doctoral Thesis

Issue Date

2016-01-21

Issue Year

2015

Authors

Gradl, Tobias

Abstract

Multigrid methods are among the theoretically most efficient algorithms in numerical simulation. They solve certain classes of equations, e.g., those arising from finite element (FE) discretizations, with optimal complexity. For large-scale simulations, however, only algorithms that exploit the massive parallelism characterizing today’s high-performance computing landscape are practically relevant. Implementing multigrid methods efficiently on massively parallel computers is challenging because, for some of the core algorithms, distributing the numerical operations across many processors is not straightforward.

Bergen et al. demonstrated with the Hierarchical Hybrid Grids (HHG) software framework that FE simulations can be solved efficiently with multigrid methods on supercomputers. The central concept of HHG is to discretize the simulated domain into patch-wise structured meshes. This structure facilitates distributing the computational work across many processors, but it also restricts the types of numerical problems to which HHG can be applied.

This thesis presents performance studies for FE simulations with up to 3 × 10¹¹ degrees of freedom that demonstrate short time to solution and good scalability of HHG on up to 16384 processor cores. We describe the modifications to the initial version of HHG, e.g., in the build system and the performance measurement methods, that were necessary to execute and study HHG on systems of this size.

The central chapter of the thesis is dedicated to adaptive mesh refinement (AMR). The technique makes HHG applicable to a new class of problems characterized by a strong variance in the required mesh resolution across the domain, e.g., the simulation of room acoustics or turbulent flows. AMR allows the FE mesh to be tailored flexibly to the simulation’s characteristics. The multigrid solver can thus spend the computer’s resources (memory and processor cycles) in the areas where the simulation requires a high resolution. As a consequence, the time to solution decreases and the problem size that can be handled increases. When implementing AMR for HHG, it was important to maintain the numerical and software engineering concepts that are crucial for HHG’s performance and scalability. We describe how the algorithms and data structures were extended to achieve this goal.

There are many other techniques for optimizing the distribution of computational resources in multigrid algorithms. In contrast to AMR, we present a technique that was developed in joint work with Thekale et al. A branch-and-bound search is used to find the optimal number of V-cycles on each level of a full multigrid algorithm. By performing V-cycles on the levels where they yield the best ratio between error reduction and cost, the time to solution of a full multigrid run in a realistic scenario was reduced by 35%.
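To illustrate the idea of a branch-and-bound search over V-cycle counts, the following is a minimal sketch under stated assumptions, not the method from the thesis or from Thekale et al. The error and cost model is hypothetical: each V-cycle on a level multiplies the current error by a fixed factor `rho`, moving to a level adds a fixed error contribution `added_error[level]`, and one V-cycle on a level costs `cycle_cost[level]`. All names and numbers are invented for illustration.

```python
def fmg_error_and_cost(cycles, rho, added_error, cycle_cost):
    """Evaluate the toy model for a given tuple of V-cycle counts per level."""
    err, cost = 0.0, 0
    for k, add, c in zip(cycles, added_error, cycle_cost):
        err = (err + add) * rho ** k   # k V-cycles reduce the error by rho**k
        cost += k * c                  # each V-cycle on this level costs c
    return err, cost


def branch_and_bound(rho, added_error, cycle_cost, tol, k_max=4):
    """Find the cheapest V-cycle counts whose final error is at most tol.

    Returns (cycles, cost); cycles is None if no choice within k_max works.
    """
    n_levels = len(cycle_cost)
    best = {"cost": float("inf"), "cycles": None}

    def search(level, err, cost, chosen):
        # Bound: cost never decreases, so this branch cannot beat the best.
        if cost >= best["cost"]:
            return
        if level == n_levels:
            if err <= tol:
                best["cost"], best["cycles"] = cost, tuple(chosen)
            return
        # Branch: try every admissible number of V-cycles on this level.
        for k in range(k_max + 1):
            new_err = (err + added_error[level]) * rho ** k
            search(level + 1, new_err, cost + k * cycle_cost[level], chosen + [k])

    search(0, 0.0, 0, [])
    return best["cycles"], best["cost"]


# Hypothetical three-level hierarchy; cost grows by a factor of 8 per level,
# as it would for uniform refinement in 3D.
rho = 0.1
added_error = [1.0, 0.25, 0.0625]
cycle_cost = [1, 8, 64]
tol = 0.01
cycles, cost = branch_and_bound(rho, added_error, cycle_cost, tol)
print(cycles, cost)
```

The pruning step relies only on the fact that the accumulated cost cannot decrease along a branch. A realistic search would use measured convergence rates and costs per level instead of the fixed values assumed here.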

Abstract

Multigrid methods are, at least in theory, among the most efficient algorithms in numerical simulation. Under certain conditions, they solve some classes of equations, e.g., the systems of equations arising from finite element (FE) discretizations, with optimal complexity. For high-performance computing, however, only methods that can exploit the extreme parallelism of current supercomputers are relevant. Implementing multigrid methods efficiently for parallel computers is a challenge because, for some of the central algorithms, distributing the numerical operations across many processors is not trivial.

With the Hierarchical Hybrid Grids (HHG) software framework, Bergen et al. showed that efficient FE simulations with multigrid methods are possible on supercomputers. The central concept of HHG is the discretization of the simulated domain into piecewise structured meshes. This facilitates distributing the computational operations across many processors, but it also restricts the applicability of HHG to certain numerical problems.

This thesis demonstrates the high efficiency and scalability of HHG with performance studies on up to 16384 processor cores and FE simulations with up to 3 × 10¹¹ degrees of freedom. Running and analyzing HHG on systems of this size required changes to the original HHG version, e.g., to the build system and to the performance measurement methods.

The central chapter of the thesis is devoted to adaptive mesh refinement (AMR). This technique extends the range of application of HHG to problems with strong spatial variance in the required mesh width, e.g., the simulation of room acoustics or of turbulent flows. AMR enables the FE mesh to be adapted flexibly to the characteristics of the simulation. The computer’s resources, memory and processor cycles, can thus be spent specifically where a high mesh resolution is needed, which reduces the computing time and increases the solvable problem size. When implementing AMR in HHG, it was important to preserve the numerical and software engineering concepts that are decisive for the performance and scalability of HHG. The thesis describes how the algorithms and data structures were extended to achieve this goal.

A further method for optimizing multigrid methods was developed in collaboration with Thekale et al. A branch-and-bound search optimizes the distribution of V-cycles in the full multigrid algorithm. V-cycles are performed specifically on the levels where they achieve the best ratio of error reduction to cost. In a realistic scenario, this method reduced the run time of full multigrid by 35%.