
Talk
Title Training and Serving ML workloads with Kubeflow at CERN
Author(s) Golubovic, Dejan (speaker) (CERN)
Corporate author(s) CERN. Geneva
Imprint 2021-05-20. - 789.
Series (Conferences)
(25th International Conference on Computing in High Energy & Nuclear Physics)
Lecture note on 2021-05-20T11:16:00
Subject category Conferences
Abstract Machine Learning (ML) has been growing in popularity in multiple areas and groups at CERN, covering fast simulation, tracking, and anomaly detection, among many others. We describe a new service available at CERN, based on Kubeflow, that manages the full ML lifecycle: data preparation and interactive analysis, large-scale distributed model training, and model serving. We cover specific features available for hyper-parameter tuning and model metadata management, as well as infrastructure details for integrating accelerators and external resources. We also present results and a cost evaluation from scaling out a popular ML use case using public cloud resources, achieving close to linear scaling when using a large number of GPUs.
Copyright/License © 2021-2024 CERN
Submitted by [email protected]

 


Record created on 2021-05-21, last modified on 2024-06-26


External links:
Download full text: Talk details
Download full text: Event details