
Article
Title Lifecycle Management, Business Continuity and Disaster Recovery Planning for the LHCb Experiment Control System Infrastructure
Author(s) Cifra, Pierfrancesco (CERN ; Nikhef, Amsterdam) ; Sborzacchi, Francesco (CERN) ; Neufeld, Niko (CERN) ; Granado Cardoso, Luis (CERN)
Publication 2024
Number of pages 6
In: EPJ Web Conf. 295 (2024) 07028
In: 26th International Conference on Computing in High Energy & Nuclear Physics, Norfolk, Virginia, US, 8 - 12 May 2023, pp.07028
DOI 10.1051/epjconf/202429507028
Subject category Particle Physics - Experiment ; Computing and Computers
Accelerator/Facility, Experiment CERN LHC ; LHCb
Abstract LHCb (Large Hadron Collider beauty) is one of the four large particle physics experiments at the LHC. It studies differences between particles and anti-particles and very rare decays in the charm and beauty sector of the Standard Model. The Experiment Control System (ECS) is in charge of the configuration, control, and monitoring of the various subdetectors as well as all areas of the online system, and it is built on top of hundreds of Linux virtual machines (VMs) running on a Red Hat Enterprise Virtualisation cluster. For such a mission-critical project, it is essential to keep the system operational: LHCb’s data acquisition cannot run without the ECS, and a failure would likely mean the loss of valuable data. In the event of a disruptive fault, it is important to recover as quickly as possible in order to restore normal operations. In addition, VM lifecycle management is a complex task that needs to be simplified, automated, and validated in all of its aspects, with a particular focus on deployment, provisioning, and monitoring. The paper describes LHCb’s approach to this challenge, including the methods, solutions, technology, and architecture adopted. We also show the limitations and problems encountered and present the results of the tests performed.
Copyright/License publication: © 2024-2025 The authors
CC-BY-4.0

Corresponding record in: Inspire


 Record created 2024-12-10, last modified 2024-12-10

