CMS Note | |
Report number | CMS-CR-2012-041 |
Title | The CMS Online Cluster: Setup, Operation and Maintenance of an Evolving Cluster |
Author(s) | Coarasa, Jose Antonio (CERN) ; Bauer, Gerry (MIT) ; Behrens, Ulf (DESY) ; Bouffet, Olivier (CERN) ; Branson, James G (UC, San Diego) ; Bukowiec, Sebastian (CERN) ; Chaze, Olivier (CERN) ; Ciganek, Marek (CERN) ; Cittolin, Sergio (UC, San Diego) ; Deldicque, Christian (CERN) ; Dobson, Marc (CERN) ; Dupon, Aymeric (CERN) ; Erhan, Samim (UCLA) ; Gigi, Dominique (CERN) ; Glege, Frank (CERN) ; Gomez-Reino, Robert (CERN) ; Hartl, Christian (CERN) ; Holzner, André (UC, San Diego) ; Masetti, Lorenzo (CERN) ; Meijers, Frans (CERN) ; Meschi, Emilio (CERN) ; Mommsen, Remigius K (Fermilab) ; Nunez-Barranco-Fernandez, Carlos (CERN) ; O'Dell, Vivian (Fermilab) ; Orsini, Luciano (CERN) ; Paus, Christoph (MIT) ; Petrucci, Andrea (CERN) ; Pieri, Marco (UC, San Diego) ; Polese, Giovanni (CERN) ; Racz, Attila (CERN) ; Raginel, Olivier (MIT) ; Sakulin, Hannes (CERN) ; Sani, Matteo (UC, San Diego) ; Schwick, Christoph (CERN) ; Simon, Michal (CERN) ; Spataru, Andrei Cristian (CERN) ; Stoeckli, Fabian (MIT) ; Sumorok, Konstanty (MIT) |
Publication | 2012 |
Imprint | 07 Mar 2012 |
Number of pages | 13 |
In: | PoS ISGC 2012 (2012) pp.023 |
In: | International Symposium on Grids and Clouds, Taipei, Taiwan, 26 Feb - 2 Mar 2012, pp.023 |
Subject category | Detectors and Experimental Techniques |
Accelerator/Facility, Experiment | CERN LHC ; CMS |
Abstract | The CMS online cluster consists of more than 2700 computers running about 15000 application instances. These applications implement the necessary services to run the data acquisition of the CMS experiment. In this paper the IT solutions employed on the cluster are reviewed. Details are given on the adopted solutions, which include the following topics: implementation of a redundant and load-balanced network and core IT services; deployment and configuration management infrastructure and its customization; a new monitoring infrastructure. Special emphasis will be put on the scalable approach that allows the cluster to grow with no administration overhead. Finally, the lessons learnt from the two years of running will be presented. |
Copyright/License | Publication: (License: CC-BY-NC-SA-3.0) |