
Published Articles
Title Big Data Tools and Cloud Services for High Energy Physics Analysis in TOTEM Experiment
Author(s) Avati, Valentina (AGH-UST, Cracow) ; Blaszkiewicz, Milosz (AGH-UST, Cracow) ; Bocchi, Enrico (CERN) ; Canali, Luca (CERN) ; Castro, Diogo (CERN) ; Cervantes, Javier (CERN) ; Grzanka, Leszek (AGH-UST, Cracow) ; Guiraud, Enrico (CERN) ; Kaspar, Jan (CERN) ; Kothuri, Prasanth (CERN) ; Lamanna, Massimo (CERN) ; Malawski, Maciej (AGH-UST, Cracow) ; Mnich, Aleksandra (AGH-UST, Cracow) ; Moscicki, Jakub (CERN) ; Murali, Shravan (CERN) ; Piparo, Danilo (CERN) ; Tejedor, Enric (CERN)
Publication 2019
Number of pages 2
In: 11th IEEE/ACM International Conference on Utility and Cloud Computing Companion, Zurich, Switzerland, 17 - 20 Dec 2018, pp.5-6
DOI 10.1109/UCC-Companion.2018.00018
Subject category Computing and Computers
Accelerator/Facility, Experiment CERN LHC ; TOTEM
Abstract The High Energy Physics community has been developing dedicated solutions for processing experiment data for decades. However, recent advances in Big Data and Cloud Services raise the question of whether such technologies can be applied to physics data analysis. In this paper, we present our initial experience with a system that combines public cloud infrastructure (Helix Nebula Science Cloud), storage and processing services developed by CERN, and off-the-shelf Big Data frameworks. The system is completely decoupled from CERN's main computing facilities and provides an interactive web interface based on Jupyter Notebooks as the main entry point for users. We run a sample analysis on 4.7 TB of data from the TOTEM experiment, rewriting the analysis code to use PyROOT and the RDataFrame model and to take full advantage of the parallel processing capabilities of Apache Spark. We report on the experience gained with this new analysis model: preliminary scalability results show that the processing time for our dataset can be reduced from 13 hours on a single core to 7 minutes on 248 cores.
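
To put the scalability figures quoted in the abstract in perspective, the short sketch below derives the implied speedup and parallel efficiency. It assumes the standard definitions (speedup = serial time / parallel time, efficiency = speedup / number of cores), which are not necessarily the metrics the authors report, and it treats the rounded values "13 hours" and "7 minutes" as exact, so the results are indicative only. The function and variable names are illustrative.

```python
def speedup(t_serial_min: float, t_parallel_min: float) -> float:
    """Ratio of single-core wall time to parallel wall time."""
    return t_serial_min / t_parallel_min


def parallel_efficiency(s: float, n_cores: int) -> float:
    """Fraction of ideal linear scaling achieved on n_cores."""
    return s / n_cores


# Figures taken from the abstract (rounded, preliminary results).
T_SERIAL_MIN = 13 * 60   # 13 hours on a single core, in minutes
T_PARALLEL_MIN = 7       # 7 minutes on the cluster
N_CORES = 248

s = speedup(T_SERIAL_MIN, T_PARALLEL_MIN)
e = parallel_efficiency(s, N_CORES)

print(f"speedup: {s:.1f}x, efficiency: {e:.0%}")
# prints "speedup: 111.4x, efficiency: 45%"
```

Under these assumptions the run is roughly 111 times faster than the single-core baseline, which is a little under half of the ideal 248-fold speedup.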



Record created 2020-01-09, last modified 2020-01-21