HPC Storage Service Autotuning Using Variational-Autoencoder-Guided Asynchronous Bayesian Optimization

Dorier, Matthieu; Egele, Romain; Balaprakash, Prasanna; Koo, Jaehoon; Madireddy, Sandeep; Ramesh, Srinivasan; Malony, Allen D.; Ross, Rob

doi:10.1109/CLUSTER51413.2022.00049

arXiv:2210.00798 (cs)

[Submitted on 3 Oct 2022]

Abstract: Distributed data storage services tailored to specific applications have grown popular in the high-performance computing (HPC) community as a way to address I/O and storage challenges. These services offer a variety of specific interfaces, semantics, and data representations. They also expose many tuning parameters, making it difficult for their users to find the best configuration for a given workload and platform.
To address this issue, we develop a novel variational-autoencoder-guided asynchronous Bayesian optimization method to tune HPC storage service parameters. Our approach uses transfer learning to leverage prior tuning results and uses a dynamically updated surrogate model to explore the large parameter search space in a systematic way.
We implement our approach within the DeepHyper open-source framework, and apply it to the autotuning of a high-energy physics workflow on Argonne's Theta supercomputer. We show that our transfer-learning approach enables a more than $40\times$ search speedup over random search, compared with a $2.5\times$ to $10\times$ speedup when not using transfer learning. Additionally, we show that our approach is on par with state-of-the-art autotuning frameworks in speed and outperforms them in resource utilization and parallelization capabilities.
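As a rough illustration of the idea of surrogate-guided search warm-started from prior tuning results, the following minimal sketch may help. It is not the paper's method and does not use the DeepHyper API: the objective, the k-nearest-neighbour surrogate, and all function names are hypothetical stand-ins, the search is greedy (no uncertainty-based acquisition function), and the variational autoencoder and asynchronous evaluation are omitted.

```python
import random


def objective(x):
    # Hypothetical stand-in for a measured workflow run time (lower is better).
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


def surrogate_predict(history, x, k=3):
    # Crude surrogate (assumption): mean objective of the k nearest evaluated points.
    nearest = sorted(
        history,
        key=lambda h: sum((a - b) ** 2 for a, b in zip(h[0], x)),
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def guided_search(n_iters, prior=(), n_candidates=50, seed=0):
    rng = random.Random(seed)
    history = list(prior)  # transfer: seed the surrogate with earlier results
    evaluated = []
    for _ in range(n_iters):
        # Sample random candidate configurations in the unit square.
        candidates = [(rng.random(), rng.random()) for _ in range(n_candidates)]
        if history:
            # Greedy choice: the candidate the surrogate predicts to be best.
            x = min(candidates, key=lambda c: surrogate_predict(history, c))
        else:
            # No data yet, so fall back to a random candidate.
            x = candidates[0]
        y = objective(x)
        history.append((x, y))  # the surrogate's data grows after every run
        evaluated.append((x, y))
    # Best configuration actually evaluated in this search.
    return min(evaluated, key=lambda r: r[1])


if __name__ == "__main__":
    prior_results = [((0.2, 0.8), objective((0.2, 0.8)))]
    print(guided_search(20, prior=prior_results, seed=1))
```

Passing `prior` plays the role of transfer learning here: the surrogate starts with observations from an earlier search instead of starting empty.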

Comments:	Accepted at IEEE Cluster 2022
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2210.00798 [cs.DC]
	(or arXiv:2210.00798v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2210.00798
Related DOI:	https://doi.org/10.1109/CLUSTER51413.2022.00049

Submission history

From: Romain Egele
[v1] Mon, 3 Oct 2022 10:12:57 UTC (817 KB)
