Abstract
| The ATLAS experiment at the Large Hadron Collider (LHC) at CERN continuously evolves its Trigger and Data Acquisition (TDAQ) system to meet the challenges of new physics goals and technological advancements. As ATLAS prepares for Phase-II Run 4 of the LHC, significant enhancements to the TDAQ Controls and Configuration tools have been designed to ensure efficient data collection, processing, and management. This contribution presents the evolution of the ATLAS TDAQ Controls and Configuration system leading up to Phase-II Run 4. As part of this evolution, Kubernetes has been chosen to orchestrate the Event Filter farm. By leveraging Kubernetes, ATLAS can dynamically allocate computing resources, scale processing capacity in response to changing data-taking conditions, and ensure high availability of data processing services. The integration of Kubernetes with the TDAQ Run Control framework enables tight synchronisation between the experiment's data acquisition components and the computing infrastructure. We will discuss the architectural considerations and implementation challenges involved in integrating Kubernetes with the ATLAS TDAQ Controls and Configuration system. We will highlight the benefits of using Kubernetes as the Event Filter farm orchestrator, including improved resource utilization, enhanced fault tolerance, and simplified deployment and management of data processing workflows. In addition, we will report on the extensive testing of Kubernetes conducted on a farm of 2500 servers within the experiment's data-taking environment, demonstrating its scalability and robustness in handling the demands of the ATLAS TDAQ system for Phase-II. The adoption of Kubernetes represents a significant step forward in the evolution of the ATLAS TDAQ Controls and Configuration system, aligning it with industry best practices in container orchestration and cloud-native computing. |