
ATLAS Note
Report number ATL-SOFT-PROC-2021-016
Title Exploring the self-service model to visualize the results of the ATLAS Machine Learning analysis jobs in BigPanDA with Openshift OKD3
Author(s) Stan, Ioan-Mihail (University Politehnica Bucharest) ; Padolski, Siarhei (Brookhaven National Laboratory (BNL)) ; Lee, Christopher Jon (Stony Brook University)
Corporate Author(s) The ATLAS collaboration
Collaboration ATLAS Collaboration
Publication 2021
Imprint 19 Jun 2021
Number of pages 12
Subject category Particle Physics - Experiment
Accelerator/Facility, Experiment CERN LHC ; ATLAS
Free keywords BigPanda ; Openshift ; OKD ; MLFlow ; Pipelines
Abstract A large scientific computing infrastructure must be versatile enough to host any kind of experiment that can lead to innovative ideas and great discoveries. The ATLAS experiment offers broad access for executing intelligent and complex algorithms and for analyzing and interpreting the massive amount of data produced at the Large Hadron Collider at CERN. The PanDA (Production ANd Distributed Analysis) system is a workload management system that acts as an interface between the ATLAS Distributed Computing infrastructure and its tenants (e.g., scientific groups and physicists). The BigPanDA monitoring system is a sub-component of PanDA whose main role is to monitor the entire life cycle of a job or task running in the ATLAS Distributed Computing infrastructure. Because many scientific experiments are now conducted with Machine Learning algorithms, the BigPanDA community wants to expand the platform's capabilities and close the gap between Machine Learning data processing and data visualization. To this end, BigPanDA adopts the cloud-native paradigm and delegates the data presentation component to MLFlow instances deployed on Openshift OKD. BigPanDA will interact with the native Openshift OKD API and instruct the orchestrator on how to locate and display the results of the Machine Learning analysis, using MLFlow microservices and Kubernetes/Openshift objects. In addition, the proposed solution architecture introduces several DevOps-specific patterns, including continuous integration for the MLFlow middleware container images and continuous deployment with rolling upgrades for the running instances. Machine Learning data visualization services will be started on demand and remain available for a limited time, thus reducing overall resource consumption.
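The on-demand, time-limited MLFlow instances described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the annotation key `example.org/expires-at`, the image name, the two-hour lifetime, and the choice of a plain Kubernetes `apps/v1` Deployment (rather than an Openshift-specific object) are all hypothetical. The sketch builds the manifest that a client might submit to the orchestrator API and checks whether an instance has outlived its lifetime; it does not contact a cluster.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical annotation key used to record when an instance should be removed.
EXPIRY_ANNOTATION = "example.org/expires-at"


def mlflow_deployment(task_id: str, results_path: str, now: datetime,
                      ttl: timedelta = timedelta(hours=2)) -> dict:
    """Build an apps/v1 Deployment manifest for one short-lived MLFlow instance."""
    labels = {"app": "mlflow", "task": task_id}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": f"mlflow-{task_id}",
            "labels": labels,
            # Expiry time stored as an ISO 8601 string (assumed convention).
            "annotations": {EXPIRY_ANNOTATION: (now + ttl).isoformat()},
        },
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            # Rolling updates replace pods gradually when the image changes.
            "strategy": {"type": "RollingUpdate"},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "mlflow",
                    # Placeholder image; a real setup would use its own registry.
                    "image": "registry.example.org/mlflow:latest",
                    "command": ["mlflow", "ui", "--host", "0.0.0.0",
                                "--port", "5000",
                                "--backend-store-uri", results_path],
                    "ports": [{"containerPort": 5000}],
                }]},
            },
        },
    }


def is_expired(manifest: dict, now: datetime) -> bool:
    """Return True once the instance has reached its recorded expiry time."""
    raw = manifest["metadata"]["annotations"][EXPIRY_ANNOTATION]
    return now >= datetime.fromisoformat(raw)


if __name__ == "__main__":
    created = datetime(2021, 6, 19, 12, 0, tzinfo=timezone.utc)
    manifest = mlflow_deployment("12345", "/data/mlruns", created)
    print(manifest["metadata"]["name"])
    print(is_expired(manifest, created + timedelta(hours=3)))
```

A periodic cleanup job could call `is_expired` on each running instance and delete those past their lifetime, which is one possible way to obtain the limited availability window the abstract mentions.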



Record created 2021-06-19, last modified 2021-06-26

