ATLAS Slides
Report number ATL-DAQ-SLIDE-2024-619
Title Evolution of the ATLAS TDAQ online software framework towards Phase-II upgrade: use of Kubernetes as an orchestrator of the ATLAS Event Filter computing farm
Author(s) Corso Radu, Alina (University of California Irvine (US))
Corporate author(s) The ATLAS collaboration
Collaboration ATLAS Collaboration
Submitted to 27th International Conference on Computing in High Energy & Nuclear Physics, Kraków, Poland, 19 - 25 Oct 2024
Submitted by [email protected] on 06 Dec 2024
Subject category Particle Physics - Experiment
Accelerator/Facility, Experiment CERN LHC ; ATLAS
Abstract The ATLAS experiment at the Large Hadron Collider (LHC) at CERN continuously evolves its Trigger and Data Acquisition (TDAQ) system to meet the challenges of new physics goals and technological advancements. As ATLAS prepares for LHC Run 4 and the Phase-II upgrade, significant enhancements to the TDAQ Controls and Configuration tools have been designed to ensure efficient data collection, processing, and management. This contribution presents the evolution of the ATLAS TDAQ Controls and Configuration system leading up to Run 4. As part of the evolution towards Phase-II, Kubernetes has been chosen to orchestrate the Event Filter farm. By leveraging Kubernetes, ATLAS can dynamically allocate computing resources, scale processing capacity in response to changing data-taking conditions, and ensure high availability of data processing services. The integration of Kubernetes with the TDAQ Run Control framework enables close synchronisation between the experiment's data acquisition components and the computing infrastructure. We discuss the architectural considerations and implementation challenges involved in integrating Kubernetes with the ATLAS TDAQ Controls and Configuration system. We highlight the benefits of using Kubernetes as the Event Filter farm orchestrator, including improved resource utilisation, enhanced fault tolerance, and simplified deployment and management of data processing workflows. In addition, we report on the extensive testing of Kubernetes conducted on a farm of 2500 servers within the experiment's data-taking environment, demonstrating its scalability and robustness in handling the demands of the ATLAS TDAQ system for Phase-II. The adoption of Kubernetes represents a significant step forward in the evolution of the ATLAS TDAQ Controls and Configuration system, aligning it with industry best practices in container orchestration and cloud-native computing.

Record created 2024-12-06, last modified 2024-12-06