Streamlining CASTOR to manage the LHC data torrent

Lo Presti, Giuseppe; Cano, E; Sindrilaru, E; Ponce, S; Fiorini, B; Ieri, A; Murray, S; Espinal Curull, Xavier

doi:10.1088/1742-6596/513/4/042031

Published Articles
Title	Streamlining CASTOR to manage the LHC data torrent
Author(s)	Lo Presti, Giuseppe (CERN) ; Espinal Curull, Xavier (CERN) ; Cano, E (CERN) ; Fiorini, B (CERN) ; Ieri, A (CERN) ; Murray, S (CERN) ; Ponce, S (CERN) ; Sindrilaru, E (CERN)
Publication	2014
Number of pages	6
In:	J. Phys.: Conf. Ser. 513 (2014) 042031
In:	20th International Conference on Computing in High Energy and Nuclear Physics 2013, Amsterdam, Netherlands, 14 - 18 Oct 2013, pp.042031
DOI	10.1088/1742-6596/513/4/042031
Subject category	Computing and Computers
Abstract	This contribution describes the evolution of the main CERN storage system, CASTOR, as it manages the bulk data stream of the LHC and other CERN experiments, reaching over 90 PB of stored data by the end of LHC Run 1. This evolution was marked by the introduction of policies to optimize the throughput of the tape sub-system, moving towards a cold storage system where data placement is managed by the experiments' production managers. More efficient tape migrations and recalls have been implemented and deployed, in which bulk meta-data operations greatly reduce the overhead due to small files. A repack facility is now integrated in the system and has been enhanced to automate the repacking of several tens of petabytes, required in 2014 to prepare for the next LHC run. Finally, the scheduling system has evolved to integrate the internal monitoring. Managing the service efficiently requires a solid monitoring infrastructure able to analyze the logs produced by the different components (about 1 kHz of log messages). A new system has been developed and deployed, which uses a transport messaging layer provided by the CERN-IT Agile Infrastructure and exploits technologies including Hadoop and HBase. This enables efficient data mining using MapReduce techniques, as well as real-time data aggregation and visualization. The outlook for the future is also presented, with directions and possible evolution discussed in view of the restart of data-taking activities.
Copyright/License	publication: (License: CC-BY)

Corresponding record in: Inspire


Record created 2015-06-19, last modified 2022-08-17
