
ATLAS Note
Report number ATL-SOFT-PROC-2023-020
Title Accelerating science: the usage of commercial clouds in ATLAS Distributed Computing
Author(s)

Barreiro Megino, Fernando Harald (University of Texas at Arlington (US)) ; Borodin, Misha (University of Iowa (US)) ; De, Kaushik (University of Texas at Arlington (US)) ; Elmsheuser, Johannes (Brookhaven National Laboratory (US)) ; Di Girolamo, Alessandro (CERN) ; Hartmann, Nikolai (Ludwig Maximilians Universitat (DE)) ; Heinrich, Lukas Alexander (Technische Universitat Munchen (DE)) ; Klimentov, Alexei (Brookhaven National Laboratory (US)) ; Lassnig, Mario (CERN) ; Lin, Fa-Hui (University of Texas at Arlington (US)) ; Maeno, Tadashi (Brookhaven National Laboratory (US)) ; Marshall, Zach (Lawrence Berkeley National Lab. (US)) ; Merino Arevalo, Gonzalo (The Barcelona Institute of Science and Technology (BIST) (ES)) ; Nilsson, Paul (Brookhaven National Laboratory (US)) ; Sandesara, Jay Ajitbhai (Amherst College (US)) ; Serfon, Cedric (Brookhaven National Laboratory (US)) ; South, David (Deutsches Elektronen-Synchrotron (DE)) ; Bawa, Harinder Singh (California State University (US))

Corporate Author(s) The ATLAS collaboration
Publication 2024
Imprint 04 Sep 2023
Number of pages 11
In: EPJ Web Conf. 295 (2024) 07002
In: 26th International Conference on Computing in High Energy & Nuclear Physics, Norfolk, Virginia, US, 8 - 12 May 2023, pp.07002
DOI 10.1051/epjconf/202429507002
Subject category Particle Physics - Experiment
Accelerator/Facility, Experiment CERN LHC ; ATLAS
Free keywords Cloud Computing ; Heterogeneous Computing ; Distributed Computing ; ARM ; GPU ; Dask ; Architectures ; Analysis ; Google ; Amazon
Abstract The ATLAS experiment at CERN is one of the largest scientific machines built to date and will have ever growing computing needs as the Large Hadron Collider collects an increasingly larger volume of data over the next 20 years. ATLAS is conducting R&D projects on Amazon and Google clouds as complementary resources for distributed computing, focusing on some of the key features of commercial clouds: lightweight operation, elasticity and availability of multiple chip architectures. The proof of concept phases have concluded with the cloud-native, vendor-agnostic integration with the experiment’s data and workload management frameworks. Google has been used to evaluate elastic batch computing, ramping up ephemeral clusters of up to O(100k) cores to process tasks requiring quick turnaround. Amazon cloud has been exploited for the successful physics validation of the Athena simulation software on ARM processors. We have also set up an interactive facility for physics analysis allowing end users to spin up private, on-demand clusters for parallel computing with up to 4000 cores, or run GPU-enabled notebooks and jobs for machine learning applications. The success of the proof of concept phases has led to the extension of the Google Cloud project, where ATLAS will study the total cost of ownership of a production cloud site during 15 months with 10k cores on average, fully integrated with distributed grid computing resources and continue the R&D projects.

Corresponding record in: Inspire


 Record created 2023-09-04, last modified 2024-10-11