ATLAS Slides
Report number ATL-DAQ-SLIDE-2019-088
Title The Central Hint and Information Processor system for automation, error detection and recovery in the ATLAS TDAQ Controls framework
Author(s) Avolio, Giuseppe (European Laboratory for Particle Physics, CERN)
Corporate author(s) The ATLAS collaboration
Collaboration ATLAS Collaboration
Submitted by [email protected] on 18 Mar 2019
Subject category Particle Physics - Experiment
Accelerator/Facility, Experiment CERN LHC ; ATLAS
Free keywords ATLAS ; DAQ ; Complex event processing ; CEP ; Automation ; Recovery ; Error management ; Anomaly detection ; CHIP
Abstract The ATLAS experiment at the Large Hadron Collider at CERN relies on a complex and highly distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data obtained at unprecedented energies and rates. The TDAQ Controls system is the component that guarantees the smooth and synchronous operation of all the TDAQ components and provides the means to minimize the downtime of the system caused by run-time failures. Given the scale and complexity of the TDAQ system and the rates of data to be analyzed, automating error detection and recovery is a strong requirement. For this reason, the Central Hint and Information Processor (CHIP) service was introduced in Run 2; it can truly be considered the "brain" of the TDAQ Controls system. CHIP is an intelligent system able to supervise the ATLAS data taking, take operational decisions and handle abnormal conditions. It is based on an open-source Complex Event Processing (CEP) engine, ESPER. Currently, CHIP's knowledge base is made up of more than 300 rules organized in about 30 different contexts. This paper will focus on the experience gained with CHIP during the whole LHC Run 2 period. Particular attention will be paid to demonstrating how the use of CHIP for automation and error recovery proved to be a valuable asset in optimizing the data taking efficiency, reducing operational mistakes, efficiently handling complex scenarios and reducing the latency in reacting to abnormal situations. Additionally, the substantial benefits brought by the CEP engine in terms of both flexibility and simplification of the knowledge base will be reported.



 Record created 2019-03-18, last modified 2019-03-18