EDP Sciences : Training and Serving ML workloads with Kubeflow at CERN

Article
Title	Training and Serving ML workloads with Kubeflow at CERN
Author(s)	Golubovic, Dejan (CERN) ; Rocha, Ricardo (CERN)
Publication	2021
Number of pages	10
In:	EPJ Web Conf. 251 (2021) 02067
In:	25th International Conference on Computing in High-Energy and Nuclear Physics (CHEP), Online, 17 - 21 May 2021, pp.02067
DOI	10.1051/epjconf/202125102067
Subject category	Computing and Computers
Abstract	Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, with applications including fast simulation, tracking, and anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow and managing the full ML lifecycle: data preparation and interactive analysis, large-scale distributed model training, and model serving. We cover specific features available for hyper-parameter tuning and model metadata management, as well as infrastructure details to integrate accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case using public cloud resources, achieving close to linear scaling when using a large number of GPUs.
Copyright/License	publication: © The Authors, published by EDP Sciences (License: CC-BY-4.0)



Record created 2021-09-07, last modified 2021-09-07

CERN Document Server