Exploring the self-service model to visualize the results of the ATLAS Machine Learning analysis jobs in BigPanDA with Openshift OKD3

Stan, Ioan-Mihail; Lee, Christopher Jon; Padolski, Siarhei

ATLAS Note
Report number	ATL-SOFT-PROC-2021-016
Title	Exploring the self-service model to visualize the results of the ATLAS Machine Learning analysis jobs in BigPanDA with Openshift OKD3
Author(s)	Stan, Ioan-Mihail (University Politehnica Bucharest) ; Padolski, Siarhei (Brookhaven National Laboratory (BNL)) ; Lee, Christopher Jon (Stony Brook University)
Corporate Author(s)	The ATLAS collaboration
Collaboration	ATLAS Collaboration
Publication	2021
Imprint	19 Jun 2021
Number of pages	12
Subject category	Particle Physics - Experiment
Accelerator/Facility, Experiment	CERN LHC ; ATLAS
Free keywords	BigPanDA ; Openshift ; OKD ; MLFlow ; Pipelines
Abstract	A large scientific computing infrastructure must be versatile enough to host any kind of experiment that can lead to innovative ideas and great discoveries. The ATLAS experiment provides wide access for executing intelligent and complex algorithms that analyze and interpret the massive amount of data produced at the Large Hadron Collider (LHC) at CERN. The PanDA (Production ANd Distributed Analysis) system is a workload management system that acts as an interface between the ATLAS Distributed Computing infrastructure and its tenants (e.g., scientific groups and physicists). The BigPanDA monitoring system is a sub-component of PanDA whose main role is to monitor the entire life cycle of a job or task running in the ATLAS Distributed Computing infrastructure. Because many scientific analyses are now carried out with Machine Learning algorithms, the BigPanDA community wants to expand the platform's capabilities and close the gap between Machine Learning data processing and data visualization. To this end, BigPanDA takes on the challenge of adopting the cloud-native paradigm and delegates the data presentation component to MLFlow instances deployed on Openshift OKD. BigPanDA will interact with the native Openshift OKD API and instruct the orchestrator how to locate and display the results of the Machine Learning analysis by using MLFlow microservices and Kubernetes/Openshift objects. In addition, the proposed solution architecture introduces several DevOps patterns, including continuous integration for the MLFlow middleware container images and continuous deployment with rolling upgrades for already running instances. The Machine Learning data visualization services will operate on demand and remain available for a limited time, which optimizes overall resource consumption.
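The abstract describes BigPanDA instructing the Openshift OKD API to start short-lived MLFlow instances that are later upgraded with rolling updates. The following is a minimal sketch, not the authors' implementation, of the objects such a request might contain. It assumes one Deployment, Service and Route per task, and a hypothetical `bigpanda.example/expires-at` annotation that encodes the limited lifetime. The function names `build_mlflow_objects` and `is_expired`, the image tag, and the storage path are all illustrative assumptions.

```python
"""Hypothetical sketch: Kubernetes/Openshift objects for an on-demand MLFlow
instance attached to one BigPanDA task, plus an expiry check."""
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical annotation key (an assumption, not from the paper).
EXPIRY_ANNOTATION = "bigpanda.example/expires-at"


def build_mlflow_objects(task_id: int, store_uri: str, image: str,
                         lifetime: timedelta, now: datetime) -> list[dict]:
    """Return Deployment, Service and Route manifests as plain dicts."""
    name = f"mlflow-task-{task_id}"
    labels = {"app": name, "bigpanda-task": str(task_id)}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels,
                     "annotations": {EXPIRY_ANNOTATION: (now + lifetime).isoformat()}},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            # RollingUpdate replaces the pod gradually when a new image is pushed.
            "strategy": {"type": "RollingUpdate",
                         "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "mlflow",
                    "image": image,
                    # `mlflow ui` serves the tracking UI over the stored runs.
                    "command": ["mlflow", "ui", "--host", "0.0.0.0",
                                "--port", "5000", "--backend-store-uri", store_uri],
                    "ports": [{"containerPort": 5000}],
                }]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {"selector": labels, "ports": [{"port": 80, "targetPort": 5000}]},
    }
    # Openshift Route exposing the Service outside the cluster.
    route = {
        "apiVersion": "route.openshift.io/v1",
        "kind": "Route",
        "metadata": {"name": name, "labels": labels},
        "spec": {"to": {"kind": "Service", "name": name},
                 "port": {"targetPort": 5000}},
    }
    return [deployment, service, route]


def is_expired(obj: dict, now: datetime) -> bool:
    """True if the object carries an expiry annotation that is not in the future."""
    expires = obj["metadata"].get("annotations", {}).get(EXPIRY_ANNOTATION)
    return expires is not None and datetime.fromisoformat(expires) <= now


if __name__ == "__main__":
    start = datetime(2021, 6, 19, 12, 0, tzinfo=timezone.utc)
    objs = build_mlflow_objects(42, "/data/task-42/mlruns", "mlflow-middleware:latest",
                                timedelta(hours=24), start)
    print([o["kind"] for o in objs])  # ['Deployment', 'Service', 'Route']
    print(is_expired(objs[0], start + timedelta(hours=25)))  # True
```

The sketch only builds manifests. Submitting them to the cluster (for example with the Kubernetes Python client) and running a periodic cleanup job that deletes objects for which `is_expired` returns true are left out. Keeping the expiry as an annotation on the object itself is one possible design choice; it lets a cleanup job decide what to remove without a separate database.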

Record created 2021-06-19, last modified 2021-06-26