Author(s)
| Calafiura, Paolo (Lawrence Berkeley National Laboratory and University of California, Berkeley) ; De, Kaushik (The University of Texas at Arlington) ; Guan, Wen (Department of Physics, University of Wisconsin) ; Maeno, Tadashi (Brookhaven National Laboratory (BNL)) ; Nilsson, Paul (Brookhaven National Laboratory (BNL)) ; Oleynik, Danila (Joint Institute for Nuclear Research) ; Panitkin, Sergey (Brookhaven National Laboratory (BNL)) ; Tsulaia, Vakhtang (Lawrence Berkeley National Laboratory and University of California, Berkeley) ; van Gemmeren, Peter (Argonne National Laboratory) ; Wenaus, Torre (Brookhaven National Laboratory (BNL)) |
Abstract
| High-performance computing (HPC) facilities present unique challenges and opportunities for high energy and nuclear physics (HENP) event processing. The massive scale of many HPC systems means that using even a small fraction of their capacity can yield large returns in processing throughput. Parallel applications that can dynamically and efficiently fill any scheduling opportunities the resource presents benefit both the facility (maximal utilization) and the (compute-limited) science. The ATLAS Yoda system provides this capability to HENP-like event processing applications by implementing event-level processing in an MPI-based master-client model that integrates seamlessly with the more broadly scoped ATLAS Event Service. Fine-grained, event-level work assignments are intelligently dispatched to parallel workers to sustain full utilization on all cores, and outputs are streamed to destination object stores in near real time with similarly fine granularity, so that processing can proceed with full utilization until termination. The system offers the efficiency and scheduling flexibility of preemption without requiring that the application actually support or employ checkpointing. We will present the new Yoda system, its motivations, architecture, implementation, and applications in ATLAS data processing at several US HPC centers. |
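
The event-level dispatch pattern described in the abstract can be illustrated with a minimal sketch. This is not Yoda's implementation: Python threads and a shared queue stand in for MPI ranks and master-client messages, a dictionary stands in for the destination object store, and the names `run_event_dispatch` and `stop_after` are hypothetical. The sketch only shows why storing each event's output as soon as it is produced lets a job be stopped at any time without checkpointing: everything already stored is kept, and only the undispatched events need to be rescheduled.

```python
import queue
import threading


def run_event_dispatch(num_events, num_workers, stop_after=None):
    """Hand out events one at a time and store each output immediately.

    Returns (stored, remaining): the outputs written so far, keyed by
    event id, and the ids of events that were never dispatched.
    `stop_after` is a hypothetical knob that simulates the scheduling
    slot ending once that many outputs have been stored.
    """
    # The "master" side: a list of event-level work assignments.
    work = queue.Queue()
    for event_id in range(num_events):
        work.put(event_id)

    stored = {}                     # stand-in for an object store
    store_lock = threading.Lock()
    stop = threading.Event()        # stand-in for job termination

    def worker():
        while not stop.is_set():
            try:
                event_id = work.get_nowait()   # request the next event
            except queue.Empty:
                return                          # no work left
            result = event_id * event_id        # placeholder for processing
            with store_lock:
                stored[event_id] = result       # stream the output out at once
                if stop_after is not None and len(stored) >= stop_after:
                    stop.set()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Whatever is still queued was never started and can be rescheduled.
    remaining = []
    while True:
        try:
            remaining.append(work.get_nowait())
        except queue.Empty:
            break
    return stored, remaining


if __name__ == "__main__":
    done, todo = run_event_dispatch(num_events=100, num_workers=4, stop_after=10)
    print(len(done), "events stored;", len(todo), "left to reschedule")
```

In this sketch a worker that has already taken an event when the stop signal arrives still finishes and stores it, so the number of stored events can exceed `stop_after` by at most one per additional worker. A real MPI deployment would replace the shared queue with explicit request and reply messages between worker ranks and the master rank.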