Abstract
ATLAS is one of the two general-purpose experiments at the Large Hadron Collider (LHC) and is designed to detect a wide variety of physics processes. Its trigger system plays a key role in selecting which events are recorded, reducing the 40 MHz bunch crossing rate to the 1 kHz rate at which events are committed to storage. The ATLAS trigger operates in two stages. Level-1 performs coarse, hardware-based filtering using custom electronics and FPGAs. The High-Level Trigger (HLT) runs offline-like algorithms, implemented entirely in software, on a farm of commodity CPUs. The LHC will soon undergo the High-Luminosity upgrade (scheduled for completion by 2029), which poses an additional challenge for the ATLAS trigger. The higher pile-up produces events that are typically more complex and therefore more computationally demanding to reconstruct. In addition, the accompanying upgrades to the ATLAS detector include a tenfold increase in both the input and output rates of the HLT. As a result, both the processing power needed per event and the total number of events to be processed will grow, placing greater pressure on the trigger farm. One cost- and energy-effective way to meet these demands is to use hardware accelerators, in particular GPUs, whose massive parallelism and general computational capabilities suit certain classes of problems. Among the algorithms being assessed for GPU acceleration, Topological Clustering, the main and most computationally demanding stage of calorimeter reconstruction, has reached full (100%) agreement with the CPU implementation and peak speed-ups exceeding a factor of 10. This is achieved through a more GPU-friendly variant of the algorithm, dubbed Topo-Automaton Clustering.
A significant bottleneck remains: converting between the data representation used on the GPU and the equivalent CPU data structures can account for up to two thirds of the algorithm's total execution time. This contribution describes the development, optimization and integration of Topo-Automaton Clustering into the ATLAS trigger, including the latest benchmarks and ongoing work on an Event Data Model (EDM) framework that could provide a general description of GPU-friendly data structures and thereby alleviate this bottleneck.