Large scale fine grain simulation workflows ("Jumbo Jobs") on HPC's

Benjamin, Douglas; Tsulaia, Vakhtang; Oleynik, Danila; Javurkova, Martina; Guan, Wen; Childers, John Taylor; Magini, Nicolo; Maeno, Tadashi; Nilsson, Paul

ATLAS Slides
Report number	ATL-SOFT-SLIDE-2019-807
Title	Large scale fine grain simulation workflows ("Jumbo Jobs") on HPC's
Author(s)	Benjamin, Douglas (Argonne National Laboratory) ; Maeno, Tadashi (Brookhaven National Laboratory (BNL)) ; Nilsson, Paul (Brookhaven National Laboratory (BNL)) ; Tsulaia, Vakhtang (Lawrence Berkeley National Laboratory and University of California, Berkeley) ; Guan, Wen (Department of Physics, University of Wisconsin) ; Oleynik, Danila (Joint Institute for Nuclear Research) ; Javurkova, Martina (University of Massachusetts, Amherst) ; Magini, Nicolo (Iowa State University) ; Childers, John Taylor (Argonne National Laboratory)
Corporate author(s)	The ATLAS collaboration
Collaboration	ATLAS Collaboration
Submitted to	24th International Conference on Computing in High Energy and Nuclear Physics, Adelaide, Australia, 4 - 8 Nov 2019
Submitted by	[email protected] on 25 Oct 2019
Subject category	Particle Physics - Experiment
Accelerator/Facility, Experiment	CERN LHC ; ATLAS
Abstract	The ATLAS experiment is using large High Performance Computers (HPCs) and fine-grained simulation workflows (Event Service) to produce fully simulated events efficiently. ATLAS has developed a new software component (Harvester) which provides resource provisioning and workload shaping. In order to run effectively on the largest HPC machines, ATLAS developed the Yoda-Droid software to orchestrate the MPI communication between Harvester and the simulation payload running on over 1000 nodes simultaneously. In this way over 130,000 cores can simultaneously produce simulated Monte Carlo events for ATLAS. The PanDA system also had to be changed to produce "jumbo jobs" capable of simulating over 1 million events per submission to the local HPC scheduling systems. This presentation will describe in detail the changes to PanDA to enable jumbo jobs and the Yoda-Droid software. Scaling and efficiency measurements will be presented. Results from deployment, integration and operation of the new software at the Titan, Cori and Theta HPC machines will be shown.

Record created 2019-10-25, last modified 2019-10-25

Full text: PPTX

External link: Original Communication (restricted to ATLAS)