ATLAS Note
Report number ATL-DAQ-PROC-2015-026
Title Modeling a Large Data Acquisition Network in a Simulation Framework
Author(s) Colombo, Tommaso (Ruprecht-Karls-University Heidelberg) ; Froening, Holger (CERN) ; Garcia, Pedro Javier ; Vandelli, Wainer (CERN)
Corporate Author(s) The ATLAS collaboration
Publication 2015
Imprint 30 Jul 2015
Number of pages 8
In: 1st IEEE International Workshop on High-Performance Interconnection Networks Towards the Exascale and Big-Data Era, Chicago, IL, United States of America, 8 - 11 Sep 2015, pp.15570932
DOI 10.1109/CLUSTER.2015.137
Subject category Particle Physics - Experiment
Accelerator/Facility, Experiment CERN LHC ; ATLAS
Abstract The ATLAS detector at CERN records particle collision “events” delivered by the Large Hadron Collider. Its data-acquisition system identifies, selects, and stores interesting events in near real-time, with an aggregate throughput of several tens of GB/s. It is a distributed software system executed on a farm of roughly 2000 commodity worker nodes communicating via TCP/IP on an Ethernet network. Event data fragments are received from the many detector readout channels and are buffered, collected together, analyzed, and either stored permanently or discarded. This system, and data-acquisition systems in general, are sensitive to the latency of the data transfer from the readout buffers to the worker nodes. Challenges affecting this transfer include the many-to-one communication pattern and the inherently bursty nature of the traffic. In this paper we introduce the main performance issues brought about by this workload, focusing in particular on the so-called TCP incast pathology. Since systematic studies of these issues are often impeded by operational constraints related to the mission-critical nature of these systems, we focus instead on the development of a simulation model of the ATLAS data-acquisition system, used as a case study. The simulation is based on the well-established OMNeT++ framework. Its results are compared with existing measurements of the system's behavior. The successful reproduction of the measurements by the simulations validates the modeling approach. We share some of the preliminary findings obtained from the simulation, as an example of the additional possibilities it enables, and outline the planned future investigations.
Copyright/License Preprint: (License: CC-BY-4.0)

Corresponding record in: Inspire


Record created 2015-07-30, last modified 2018-05-29