
Article
Report number arXiv:2203.01112
Title Hyperparameter optimization of data-driven AI models on HPC systems
Related title Hyperparameter optimization of data-driven AI models on HPC systems
Author(s) Wulff, Eric (CERN) ; Girone, Maria (CERN) ; Pata, Joosep (NICPB, Tallinn)
Publication 2023
Imprint 2022-03-02
Number of pages 6
Note Submitted to the proceedings of the ACAT 2021 conference; to be published in the Journal of Physics: Conference Series
In: J. Phys.: Conf. Ser. 2438, 1 (2023) pp.012092
In: 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2021), Daejeon, Korea, 29 Nov - 3 Dec 2021, pp.012092
DOI 10.1088/1742-6596/2438/1/012092
Subject category physics.data-an ; cs.LG ; Data Analysis and Statistics ; Computing and Computers
Abstract In the European Center of Excellence in Exascale computing "Research on AI- and Simulation-Based Engineering at Exascale" (CoE RAISE), researchers develop novel, scalable AI technologies towards Exascale. This work exercises High Performance Computing resources to perform large-scale hyperparameter optimization using distributed training on multiple compute nodes. This is part of RAISE's work on data-driven use cases which leverages AI- and HPC cross-methods developed within the project. In response to the demand for parallelizable and resource efficient hyperparameter optimization methods, advanced hyperparameter search algorithms are benchmarked and compared. The evaluated algorithms, including Random Search, Hyperband and ASHA, are tested and compared in terms of both accuracy and accuracy per compute resources spent. As an example use case, a graph neural network model known as MLPF, developed for the task of Machine-Learned Particle-Flow reconstruction in High Energy Physics, acts as the base model for optimization. Results show that hyperparameter optimization significantly increased the performance of MLPF and that this would not have been possible without access to large-scale High Performance Computing resources. It is also shown that, in the case of MLPF, the ASHA algorithm in combination with Bayesian optimization gives the largest performance increase per compute resources spent out of the investigated algorithms.
Copyright/License © 2023-2024 The author(s) (License: CC-BY-3.0)
preprint: (License: CC BY 4.0)



Corresponding record in: Inspire


Record created 2023-09-20, last modified 2023-09-20


Full text:
document - Download full text (PDF)
2203.01112 - Download full text (PDF)