Abstract
Exchanging data in noncontiguous user buffers is a dominant communication pattern in many scientific applications. The OpenSHMEM specification introduces a new set of communication routines to support strided data communication. Most high-performance implementations of the OpenSHMEM specification support strided data communication with either a packing/unpacking scheme or a multiple reads/writes scheme, both of which incur significant performance overhead during communication. This overhead could deter application developers from using the OpenSHMEM strided data communication routines. Recently, Mellanox introduced a novel feature, called User-mode Memory Registration (UMR), for noncontiguous data transfer. UMR has the potential to support efficient OpenSHMEM strided data communication. In this paper, we propose UMR-based schemes to support one-sided, zero-copy strided data communication for OpenSHMEM. To the best of our knowledge, this is the first paper to design OpenSHMEM strided data communication using the UMR feature. We implement the proposed UMR-based designs on top of MVAPICH2-X. Experimental results with the shmem_iget operation show a 3X performance improvement over the multiple reads scheme in default MVAPICH2-X, and a 20X performance improvement over the OpenSHMEM reference implementation configured with GASNet. At the application level, for a 3D stencil communication kernel with OpenSHMEM iget routines on 512 processes, the proposed UMR-based design outperforms the multiple reads scheme in default MVAPICH2-X by 20% in total execution time.
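
To make the access pattern concrete, the following is a minimal local sketch of the two baseline schemes named above. It is an illustration only, not the MVAPICH2-X implementation and not the proposed UMR design: the functions emulated_int_iget and emulated_int_iget_packed and the staging buffer are hypothetical names, and plain memory copies stand in for network reads. The argument order (destination, source, destination stride, source stride, element count) is assumed to mirror the OpenSHMEM strided get routines, with the target PE omitted because everything here is local.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical local stand-in for the "multiple reads" scheme:
 * one small copy per element. In a real implementation each
 * iteration would correspond to a separate network read. */
static void emulated_int_iget(int *dest, const int *source,
                              ptrdiff_t dst, ptrdiff_t sst, size_t nelems)
{
    for (size_t i = 0; i < nelems; i++) {
        /* Read every sst-th source element, write every dst-th dest element. */
        memcpy(&dest[(ptrdiff_t)i * dst],
               &source[(ptrdiff_t)i * sst],
               sizeof(int));
    }
}

/* Hypothetical local stand-in for the "packing/unpacking" scheme:
 * gather into a contiguous staging buffer, move it in one piece,
 * then scatter. The two extra copies are the overhead that a
 * zero-copy design aims to avoid. staging must hold nelems ints. */
static void emulated_int_iget_packed(int *dest, const int *source,
                                     ptrdiff_t dst, ptrdiff_t sst,
                                     size_t nelems, int *staging)
{
    /* Pack: gather the strided source elements. */
    for (size_t i = 0; i < nelems; i++) {
        staging[i] = source[(ptrdiff_t)i * sst];
    }

    /* A single contiguous transfer of nelems ints would happen here. */

    /* Unpack: scatter into the strided destination. */
    for (size_t i = 0; i < nelems; i++) {
        dest[(ptrdiff_t)i * dst] = staging[i];
    }
}
```

For example, with a source array holding 0 through 7, sst = 2, dst = 1, and nelems = 4, both functions leave the first four destination elements holding 0, 2, 4, 6. The first function issues four element-sized copies, while the second performs two extra passes over the data around a single bulk move.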