Title Streamlining CASTOR to manage the LHC data torrent
Author(s) Lo Presti, Giuseppe (CERN) ; Espinal Curull, Xavier (CERN) ; Cano, E (CERN) ; Fiorini, B (CERN) ; Ieri, A (CERN) ; Murray, S (CERN) ; Ponce, S (CERN) ; Sindrilaru, E (CERN)
Publication 2014
Number of pages 6
In: J. Phys.: Conf. Ser. 513 (2014) 042031
In: 20th International Conference on Computing in High Energy and Nuclear Physics 2013, Amsterdam, Netherlands, 14 - 18 Oct 2013, pp.042031
DOI 10.1088/1742-6596/513/4/042031
Subject category Computing and Computers
Abstract This contribution describes the evolution of the main CERN storage system, CASTOR, as it manages the bulk data stream of the LHC and other CERN experiments, achieving over 90 PB of stored data by the end of LHC Run 1. This evolution was marked by the introduction of policies to optimize the tape sub-system throughput, going towards a cold storage system where data placement is managed by the experiments' production managers. More efficient tape migrations and recalls have been implemented and deployed where bulk meta-data operations greatly reduce the overhead due to small files. A repack facility is now integrated in the system and it has been enhanced in order to automate the repacking of several tens of petabytes, required in 2014 in order to prepare for the next LHC run. Finally the scheduling system has been evolved to integrate the internal monitoring. To efficiently manage the service a solid monitoring infrastructure is required, able to analyze the logs produced by the different components (about 1 kHz of log messages). A new system has been developed and deployed, which uses a transport messaging layer provided by the CERN-IT Agile Infrastructure and exploits technologies including Hadoop and HBase. This enables efficient data mining by making use of MapReduce techniques, and real-time data aggregation and visualization. The outlook for the future is also presented. Directions and possible evolution will be discussed in view of the restart of data taking activities.
Copyright/License Publication: CC-BY

Record created: 2015-06-19, last modified: 2022-08-17