Author(s)
|
Maeno, Tadashi (Brookhaven National Laboratory (US)) ; Alekseev, Aleksandr (Universidad Andres Bello (CL)) ; Barreiro Megino, Fernando Harald (University of Texas at Arlington (US)) ; De, Kaushik (University of Texas at Arlington (US)) ; Guan, Wen (Brookhaven National Laboratory (US)) ; Karavakis, Edward (Brookhaven National Laboratory (US)) ; Klimentov, Alexei (Brookhaven National Laboratory (US)) ; Korchuganova, Tatiana (University of Pittsburgh (US)) ; Lin, Fa-Hui (University of Texas at Arlington (US)) ; Nilsson, Paul (Brookhaven National Laboratory (US)) ; Wenaus, Torre (Brookhaven National Laboratory (US)) ; Yang, Zhaoyu (Brookhaven National Laboratory (US)) ; Zhao, Xin (Brookhaven National Laboratory (US)) |
Abstract
| In recent years, advanced and complex analysis workflows have gained increasing importance in the ATLAS experiment at CERN, one of the large scientific experiments at the Large Hadron Collider (LHC). Support for such workflows has allowed users to exploit remote computing resources and service providers distributed worldwide, overcoming limitations on local resources and services. The spectrum of computing options keeps growing across WLCG resources, volunteer computing, high-performance and leadership computing facilities, commercial clouds, and emerging service levels like Platform-as-a-Service (PaaS), Container-as-a-Service (CaaS), and Function-as-a-Service (FaaS), each one bringing new advantages and constraints. Users can benefit significantly from these providers, but dealing with multiple providers is cumbersome, even within a single analysis workflow, given the fine-grained requirements that stem from the nature and characteristics of their applications. In this presentation, we will first highlight issues in distributed heterogeneous computing, such as the insulation of users from the complexities of distributed heterogeneous providers, complex resource provisioning for CPU and GPU hybrid applications, integration of PaaS, CaaS, and FaaS providers, smart workload routing, automatic data placement, seamless execution of complex workflows, interoperability between pledged and user resources, and on-demand data production. We will then present solutions developed in ATLAS with the Production and Distributed Analysis system (PanDA system) and future challenges for LHC Run 4. |