Author(s)
|
Schovancova, J (Academy of Sciences of the Czech Republic) ; Barreiro Megino, F H (CERN) ; Borrego, C (Physics Department, Universidad Autonoma de Madrid) ; Campana, S (CERN) ; Di Girolamo, A (CERN) ; Elmsheuser, J (Fakultaet fuer Physik, Ludwig-Maximilians-Universitaet Muenchen) ; Hejbal, J (Academy of Sciences of the Czech Republic) ; Kouba, T (Academy of Sciences of the Czech Republic) ; Legger, F (Fakultaet fuer Physik, Ludwig-Maximilians-Universitaet Muenchen) ; Magradze, E (Georg-August-Universitat Goettingen, II. Physikalisches Institut) ; Medrano Llamas, R (CERN) ; Negri, G (CERN) ; Rinaldi, L (INFN Bologna and Universita' di Bologna, Dipartimento di Fisica) ; Sciacca, G (University of Bern, Albert Einstein Center for Fundamental Physics, Laboratory for High Energy Physics) ; Serfon, C (Fakultaet fuer Physik, Ludwig-Maximilians-Universitaet Muenchen) ; Van Der Ster, D C (CERN) |
Abstract
| The ATLAS Experiment benefits from computing resources distributed worldwide at more than 100 WLCG sites. The ATLAS Grid sites provide over 100k CPU job slots and over 100 PB of storage space on disk or tape. Monitoring the status of such a complex infrastructure is essential. The ATLAS Grid infrastructure is monitored 24/7 by two teams of shifters distributed worldwide, by the ATLAS Distributed Computing experts, and by site administrators. In this paper we summarize the automation efforts undertaken within the ATLAS Distributed Computing team to reduce manpower costs and improve the reliability of the system. Different aspects of the automation process are described: from the ATLAS Grid site topology provided by the ATLAS Grid Information System, via automatic site testing by HammerCloud, to automatic exclusion of sites from production or analysis activities. |