Author(s)
Taylor, Ryan P. (University of Victoria) ; Cordeiro, Cristovao (CERN) ; Di Girolamo, Alessandro (CERN) ; Hover, John (Brookhaven National Laboratory (BNL)) ; Kouba, Tomas (Academy of Sciences of the Czech Republic, Institute of Physics and Institute for Computer Science) ; Love, Peter (Lancaster University, Department of Physics) ; Mcnab, Andrew (School of Physics and Astronomy, University of Manchester) ; Schovancova, Jaroslava (The University of Texas at Arlington) ; Sobie, Randall (University of Victoria)
Abstract
Throughout the first year of LHC Run 2, ATLAS Cloud Computing has undergone a period of consolidation, building on previously established systems with the aim of reducing operational effort, improving robustness, and reaching larger scale. This paper describes the current state of ATLAS Cloud Computing. Cloud activities are converging on a common contextualization approach for virtual machines, and cloud resources share common monitoring and service discovery components. We describe the integration of Vac resources, streamlined usage of the High Level Trigger cloud for simulation and reconstruction, extreme scaling on Amazon EC2, and procurement of commercial cloud capacity in Europe. Building on the previously established monitoring infrastructure, we have deployed a real-time monitoring and alerting platform that coalesces data from multiple sources, provides flexible visualization through customizable dashboards, issues alerts, and carries out corrective actions in response to problems. Finally, a versatile analytics platform for data mining of log files is being used to analyze benchmark data, diagnose job errors, and gain insight into their causes.