arXiv: Hyperparameter Optimisation in Deep Learning from Ensemble Methods: Applications to Proton Structure

Cruz-Martinez, Juan; Rocha, Carlos M.R.; Rabemananjara, Tanjona R.; Rojo, Juan; van Oord, Gijs; Stegeman, Roy; Jansen, Aaron


Preprint
Report number	CERN-TH-2024-168 ; arXiv:2410.16248
Title	Hyperparameter Optimisation in Deep Learning from Ensemble Methods: Applications to Proton Structure
Author(s)	Cruz-Martinez, Juan (CERN) ; Jansen, Aaron (Netherlands eScience Center) ; van Oord, Gijs (Netherlands eScience Center) ; Rabemananjara, Tanjona R. (Vrije U., Amsterdam ; NIKHEF, Amsterdam) ; Rocha, Carlos M.R. (Netherlands eScience Center) ; Rojo, Juan (CERN ; Vrije U., Amsterdam ; NIKHEF, Amsterdam) ; Stegeman, Roy (U. Edinburgh, Higgs Ctr. Theor. Phys.)
Imprint	2024-10-21
Number of pages	27
Note	27 pages, 7 figures
Subject category	physics.comp-ph ; Other Fields of Physics ; hep-ex ; Particle Physics - Experiment ; hep-ph ; Particle Physics - Phenomenology
Abstract	Deep learning models are defined in terms of a large number of hyperparameters, such as network architectures and optimiser settings. These hyperparameters must be determined separately from the model parameters, such as network weights, and are often fixed by ad-hoc methods or by manual inspection of the results. An algorithmic, objective determination of hyperparameters demands the introduction of dedicated target metrics, different from those adopted for the model training. Here we present a new approach to the automated determination of hyperparameters in deep learning models based on statistical estimators constructed from an ensemble of models sampling the underlying probability distribution in model space. This strategy requires the simultaneous parallel training of up to several hundred models and can be effectively implemented by deploying hardware accelerators such as GPUs. As a proof-of-concept, we apply this method to the determination of the partonic substructure of the proton within the NNPDF framework and demonstrate the robustness of the resultant model uncertainty estimates. The new GPU-optimised NNPDF code results in a speed-up of up to two orders of magnitude, a stabilisation of the memory requirements, and a reduction in energy consumption of up to 90% compared to sequential CPU-based model training. While focusing on proton structure, our method is fully general and is applicable to any deep learning problem relying on hyperparameter optimisation for an ensemble of models.
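The abstract describes selecting hyperparameters with statistical estimators computed over a whole ensemble of trained models, rather than from a single model's training loss. The sketch below illustrates that idea under simplifying assumptions that are ours, not the paper's: each hyperparameter configuration has already produced an ensemble of replica predictions on held-out data, uncertainties are uncorrelated, and the figure of merit is the χ² of the ensemble-mean prediction, with the ensemble spread reported as a second diagnostic. All function names (`ensemble_mean`, `hyperopt_loss`, `ensemble_spread`, `select_best`) are hypothetical and do not come from the NNPDF code; the estimators actually used in the paper may differ.

```python
from statistics import fmean, pstdev


def ensemble_mean(predictions):
    """Point-by-point average over replicas.

    predictions: list of replicas, each a list of floats (one per data point).
    """
    return [fmean(column) for column in zip(*predictions)]


def hyperopt_loss(predictions, data, sigma):
    """Hypothetical ensemble-level loss: chi^2 per point of the ensemble mean.

    Assumes uncorrelated (diagonal) uncertainties sigma.
    """
    mean = ensemble_mean(predictions)
    return fmean(((p - d) / s) ** 2 for p, d, s in zip(mean, data, sigma))


def ensemble_spread(predictions, sigma):
    """Hypothetical diagnostic: replica standard deviation in units of sigma, averaged over points."""
    return fmean(pstdev(column) / s for column, s in zip(zip(*predictions), sigma))


def select_best(trials, data, sigma):
    """trials maps a hyperparameter label to that configuration's replica predictions."""
    return min(trials, key=lambda name: hyperopt_loss(trials[name], data, sigma))


# Toy usage: two configurations, each with a two-replica ensemble.
data = [1.0, 2.0, 3.0]
sigma = [0.1, 0.1, 0.1]
trials = {
    "narrow": [[1.0, 2.0, 3.0], [1.2, 2.2, 3.2]],  # ensemble mean misses data by 1 sigma
    "wide": [[0.5, 1.5, 2.5], [0.7, 1.7, 2.7]],    # ensemble mean misses data by 4 sigma
}
best = select_best(trials, data, sigma)  # "narrow"
```

A realistic version would use the full experimental covariance matrix rather than diagonal uncertainties; the point of the sketch is only that the loss is a property of the ensemble, not of any single replica.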
Other source	Inspire
Copyright/License	preprint: (License: arXiv nonexclusive-distrib 1.0)

Schematic representation of the model architecture developed here, accommodating multiple stacked replicas. The computationally expensive convolution with the FK tables is now shared among all replicas, with the per-replica separation between training and validation data applied immediately afterwards through the mask $M_{i}^{(k)}$.

Left: The scaling of the overall training speed in the multi-replica fits developed for this work, measured in number of replica models trained per hour. The blue points denote the performance of multi-replica fits on the GPU without optimisations; the orange points are the result of various optimisations described in an appendix of the paper. Right: The peak memory usage (in GB) associated with the fits displayed in the left panel.
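The architecture caption describes computing the expensive FK-table convolution once for all stacked replicas and only then applying a per-replica mask $M_{i}^{(k)}$ to split training from validation points. Below is a minimal pure-Python sketch of that data flow, assuming a DIS-like observable that is linear in the PDF (hadronic observables would need a double convolution), diagonal uncertainties, per-replica pseudodata, and a reading of $M_{i}^{(k)}$ with i as the data-point index and k as the replica index. The names `fk_convolution`, `masked_chi2` and `train_and_validation_losses` are hypothetical; a real implementation would express the convolution as one batched matrix product on a GPU rather than nested loops.

```python
def fk_convolution(fk, pdf_stack):
    """Shared convolution for all replicas at once.

    fk:        n_data x n_grid table (assumed linear, DIS-like observable)
    pdf_stack: n_grid x n_replicas, one column per stacked replica
    returns:   n_data x n_replicas predictions
    """
    n_rep = len(pdf_stack[0])
    return [
        [sum(fk_row[j] * pdf_stack[j][k] for j in range(len(fk_row))) for k in range(n_rep)]
        for fk_row in fk
    ]


def masked_chi2(preds, pseudodata, sigma, mask):
    """Per-replica chi^2 over the points where mask[i][k] == 1.

    mask[i][k] stands in for M_i^(k): 1 if data point i is in replica k's
    training set, 0 otherwise (hypothetical reading of the notation).
    """
    n_data, n_rep = len(preds), len(preds[0])
    return [
        sum(mask[i][k] * ((preds[i][k] - pseudodata[i][k]) / sigma[i]) ** 2 for i in range(n_data))
        for k in range(n_rep)
    ]


def train_and_validation_losses(fk, pdf_stack, pseudodata, sigma, mask):
    preds = fk_convolution(fk, pdf_stack)  # computed once, shared by every replica
    complement = [[1 - m for m in row] for row in mask]  # validation = points not used for training
    return (
        masked_chi2(preds, pseudodata, sigma, mask),
        masked_chi2(preds, pseudodata, sigma, complement),
    )


# Toy usage: 3 data points, 2 grid points, 2 stacked replicas.
fk = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
pdf_stack = [[1.0, 2.0], [3.0, 4.0]]
pseudodata = [[0.0, 2.0], [2.0, 1.0], [5.0, 8.0]]
sigma = [1.0, 1.0, 1.0]
mask = [[1, 0], [1, 1], [0, 1]]
train, valid = train_and_validation_losses(fk, pdf_stack, pseudodata, sigma, mask)
# train == [1.0, 4.0], valid == [1.0, 0.0]
```

Because the convolution output already carries a replica axis, adding replicas widens a single matrix product instead of repeating the whole forward pass; the masks are cheap elementwise operations applied afterwards.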


Record created 2024-10-24, last modified 2024-11-11


CERN Document Server
