
CMS Note
Report number CMS-CR-2012-041
Title The CMS Online Cluster: Setup, Operation and Maintenance of an Evolving Cluster
Author(s) Coarasa, Jose Antonio (CERN) ; Bauer, Gerry (MIT) ; Behrens, Ulf (DESY) ; Bouffet, Olivier (CERN) ; Branson, James G (UC, San Diego) ; Bukowiec, Sebastian (CERN) ; Chaze, Olivier (CERN) ; Ciganek, Marek (CERN) ; Cittolin, Sergio (UC, San Diego) ; Deldicque, Christian (CERN) ; Dobson, Marc (CERN) ; Dupon, Aymeric (CERN) ; Erhan, Samim (UCLA) ; Gigi, Dominique (CERN) ; Glege, Frank (CERN) ; Gomez-Reino, Robert (CERN) ; Hartl, Christian (CERN) ; Holzner, André (UC, San Diego) ; Masetti, Lorenzo (CERN) ; Meijers, Frans (CERN) ; Meschi, Emilio (CERN) ; Mommsen, Remigius K (Fermilab) ; Nunez-Barranco-Fernandez, Carlos (CERN) ; O'Dell, Vivian (Fermilab) ; Orsini, Luciano (CERN) ; Paus, Christoph (MIT) ; Petrucci, Andrea (CERN) ; Pieri, Marco (UC, San Diego) ; Polese, Giovanni (CERN) ; Racz, Attila (CERN) ; Raginel, Olivier (MIT) ; Sakulin, Hannes (CERN) ; Sani, Matteo (UC, San Diego) ; Schwick, Christoph (CERN) ; Simon, Michal (CERN) ; Spataru, Andrei Cristian (CERN) ; Stoeckli, Fabian (MIT) ; Sumorok, Konstanty (MIT)
Publication 2012
Imprint 07 Mar 2012
Number of pages 13
In: PoS ISGC 2012 (2012) pp.023
In: International Symposium on Grids and Clouds, Taipei, Taiwan, 26 Feb - 2 Mar 2012, pp.023
Subject category Detectors and Experimental Techniques
Accelerator/Facility, Experiment CERN LHC ; CMS
Abstract The CMS online cluster consists of more than 2700 computers running about 15000 application instances. These applications implement the services needed to run the data acquisition of the CMS experiment. This paper reviews the IT solutions employed on the cluster. Details are given on the adopted solutions, which cover the following topics: implementation of a redundant and load-balanced network and core IT services; the deployment and configuration management infrastructure and its customization; and a new monitoring infrastructure. Special emphasis is put on the scalable approach, which allows the size of the cluster to be increased with no administration overhead. Finally, the lessons learnt from two years of running are presented.
Copyright/License Publication: (License: CC-BY-NC-SA-3.0)



Record created 2012-04-13, last modified 2017-11-03

