Article
Title First implementation and results of the Analysis Grand Challenge with a fully Pythonic RDataFrame
Author(s) Padulano, Vincenzo Eduardo (CERN) ; Guiraud, Enrico (CERN ; Princeton U.) ; Falko, Andrii (Taras Shevchenko U.) ; Gazzarrini, Elena (CERN) ; Garcia Garcia, Enrique (CERN) ; Gosein, Domenic (CERN ; Mannheim U.)
Publication 2024
Number of pages 8
In: EPJ Web Conf. 295 (2024) 06011
In: 26th International Conference on Computing in High Energy & Nuclear Physics, Norfolk, Virginia, US, 8-12 May 2023, pp. 06011
DOI 10.1051/epjconf/202429506011
Subject category Computing and Computers
Abstract The growing amount of data generated by the LHC requires a shift in how HEP analysis tasks are approached. Efforts to address this computational challenge have led to the rise of a middle-man software layer, a mixture of simple, effective APIs and fast execution engines underneath. Having common, open and reproducible analysis benchmarks proves beneficial in the development of these modern tools. One such benchmark is provided by the Analysis Grand Challenge (AGC), which represents a specification for realistic analysis pipelines. This contribution presents the first AGC implementation that leverages ROOT RDataFrame, a powerful, modern and scalable execution engine for the HENP use cases. The different steps of the benchmarks are written with a composable, flexible and fully Pythonic API. RDataFrame can then transparently run the computations on all the cores of a machine or on multiple nodes thanks to automatic dataset splitting and transparent workload distribution. The portability of this implementation is shown by running on various resources, from managed facilities to open cloud platforms for research, showing usage of interactive and distributed environments.
Copyright/License CC-BY-4.0

 Record created 2024-12-12, last modified 2024-12-12

