Contemporary High Performance Computing
From Petascale toward Exascale
Volume 3
Chapman & Hall/CRC
Computational Science Series
Series Editor: Sartaj Sahni
Data-Intensive Science
Terence Critchlow, Kerstin Kleese van Dam
Grid Computing
Techniques and Applications
Barry Wilkinson
Scientific Computing with Multicore and Accelerators
Jakub Kurzak, David A. Bader, Jack Dongarra
Introduction to Scheduling
Yves Robert, Frederic Vivien
Edited by
Jeffrey S. Vetter
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity
of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of
users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Dedication
Preface xix
Editor xxiii
1.7.3 Optimisations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.8 Archiving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8.1 Oracle Hierarchical Storage Manager (SAM-QFS) . . . . . . . . . . 23
1.8.2 MARS/TSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.9 Data Center/Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.10 System Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.10.1 Systems Usage Patterns . . . . . . . . . . . . . . . . . . . . . . . . 25
1.11 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.11.1 Failover Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.11.2 Compute Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.11.3 Data Mover Failover . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.11.4 Storage Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.11.4.1 Normal Mode . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.11.4.2 Failover Mode . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.11.4.3 Recovery Mode . . . . . . . . . . . . . . . . . . . . . . . . 27
1.11.4.4 Isolated Mode . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.11.5 SSH File Transfer Failover . . . . . . . . . . . . . . . . . . . . . . . 27
1.12 Implementing a Product Generation Platform . . . . . . . . . . . . . . . . 28
2.3.1.2 Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.1.3 Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.1.4 Storage System . . . . . . . . . . . . . . . . . . . . . . . . 40
2.3.2 System Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3.2.1 Systems Administration of the Cray Linux Environment . 42
2.3.2.2 Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.3.3 Programming System . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.3.3.1 Programming Models . . . . . . . . . . . . . . . . . . . . 42
2.3.3.2 Languages and Compilers . . . . . . . . . . . . . . . . . . 43
2.3.4 Deployment and Acceptance . . . . . . . . . . . . . . . . . . . . . . 44
2.3.4.1 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.3.4.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.3.5 Early Science and Transition to Operations . . . . . . . . . . . . . 46
2.4 Mira . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.4.1 Architecture and Software Summary . . . . . . . . . . . . . . . . . 49
2.4.2 Evolution of Ecosystem . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.3 Notable Science Accomplishments . . . . . . . . . . . . . . . . . . . 52
2.4.4 System Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.5 Cobalt Job Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.6 Job Failure Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.7 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2.2 Blade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.2.3 The Overall System . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2.4 Performance Summary . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.3 System Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.3.1 Development Tools Ecosystem . . . . . . . . . . . . . . . . . . . . . 101
4.3.2 OpenStack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.4 Applications and Workloads . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.4.1 Core Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.4.2 Node Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.4.3 System Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.4.4 Node Power Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.5 Deployment and Operational Information . . . . . . . . . . . . . . . . . . 108
4.5.1 Thermal Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.6 Highlights of Mont-Blanc . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.6.1 Reliability Study of an Unprotected RAM System . . . . . . . . . . 111
4.6.2 Network Retransmission and OS Noise Study . . . . . . . . . . . . 114
4.6.3 The Power Monitoring Tool of the Mont-Blanc System . . . . . . . 117
4.7 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5 Chameleon 123
Kate Keahey, Pierre Riteau, Dan Stanzione, Tim Cockerill, Joe
Mambretti, Paul Rad, and Paul Ruth
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.1.1 A Case for a Production Testbed . . . . . . . . . . . . . . . . . . . 124
5.1.2 Program Background . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.1.3 Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.2 Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.2.1 Projected Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.2.2 Phase 1 Chameleon Deployment . . . . . . . . . . . . . . . . . . . . 127
5.2.3 Experience with Phase 1 Hardware and Future Plans . . . . . . . . 129
5.3 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.4 System Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.4.1 Core Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.5 Appliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5.5.1 System Appliances . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.5.2 Complex Appliances . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.6 Data Center/Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.6.1 University of Chicago Facility . . . . . . . . . . . . . . . . . . . . . 137
5.6.2 TACC Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.6.3 Wide-Area Connectivity . . . . . . . . . . . . . . . . . . . . . . . . 137
5.7 System Management and Policies . . . . . . . . . . . . . . . . . . . . . . . 138
5.8 Statistics and Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . 138
5.9 Research Projects Highlights . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.9.1 Chameleon Slices for Wide-Area Networking Research . . . . . . . 141
5.9.2 Machine Learning Experiments on Chameleon . . . . . . . . . . . . 142
8 Jetstream 189
Craig A. Stewart, David Y. Hancock, Therese Miller, Jeremy Fischer,
R. Lee Liming, George Turner, John Michael Lowe, Steven Gregory,
Edwin Skidmore, Matthew Vaughn, Dan Stanzione, Nirav Merchant,
Ian Foster, James Taylor, Paul Rad, Volker Brendel, Enis Afgan,
Michael Packard, Therese Miller, and Winona Snapp-Childs
8.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.1.1 Jetstream Motivation and Sponsor Background . . . . . . . . . . . 192
8.1.2 Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.1.3 Hardware Acceptance . . . . . . . . . . . . . . . . . . . . . . . . . . 196
8.1.4 Benchmark Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.1.5 Cloud Functionality Tests . . . . . . . . . . . . . . . . . . . . . . . 198
8.1.6 Gateway Functionality Tests . . . . . . . . . . . . . . . . . . . . . . 199
8.1.7 Data Movement, Storage, and Dissemination . . . . . . . . . . . . . 199
8.1.8 Acceptance by NSF . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.2 Applications and Workloads . . . . . . . . . . . . . . . . . . . . . . . . . 200
8.2.1 Highlights of Main Applications . . . . . . . . . . . . . . . . . . . 201
8.3 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.4 Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.4.1 Node Design and Processor Elements . . . . . . . . . . . . . . . . . 203
8.4.2 Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.4.3 Storage Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.5 System Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.5.1 Operating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
8.5.2 System Administration . . . . . . . . . . . . . . . . . . . . . . . . . 206
8.5.3 Schedulers and Virtualization . . . . . . . . . . . . . . . . . . . . . 206
8.5.4 Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.5.5 Storage Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
8.5.6 User Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . 208
8.5.7 Allocation Software and Processes . . . . . . . . . . . . . . . . . . . 209
8.6 Programming System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
8.6.1 Atmosphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
8.6.2 Jetstream Plugins for the Atmosphere Platform . . . . . . . . . . . 211
8.6.2.1 Authorization . . . . . . . . . . . . . . . . . . . . . . . . . 211
8.6.2.2 Allocation Sources and Special Allocations . . . . . . . . . 211
8.6.3 Globus Authentication and Data Access . . . . . . . . . . . . . . . 212
8.6.4 The Jetstream OpenStack API . . . . . . . . . . . . . . . . . . . . 212
8.6.5 VM libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
8.7 Data Center Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
8.8 System Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
8.9 Interesting Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
8.9.1 Jupyter and Kubernetes . . . . . . . . . . . . . . . . . . . . . . . . 216
8.10 Artificial Intelligence Technology Education . . . . . . . . . . . . . . . . . 217
8.11 Jetstream VM Image Use for Scientific Reproducibility - Bioinformatics as
an Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
8.12 Running a Virtual Cluster on Jetstream . . . . . . . . . . . . . . . . . . . 218
12 Lomonosov-2 305
Vladimir Voevodin, Alexander Antonov, Dmitry Nikitenko, Pavel Shvets, Sergey Sobolev,
Konstantin Stefanov, Vadim Voevodin, Sergey Zhumatiy, Andrey Brechalov, and Alexander
Naumov
12.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
12.1.1 HPC History of MSU . . . . . . . . . . . . . . . . . . . . . . . . . . 305
12.1.2 Lomonosov-2 Supercomputer: Timeline . . . . . . . . . . . . . . . . 308
13 Electra 331
Rupak Biswas, Jeff Becker, Davin Chan, David Ellsworth, Robert Hood, Piyush Mehrotra,
Michelle Moyer, Chris Tanner, and William Thigpen
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
13.2 NASA Requirements for Supercomputing . . . . . . . . . . . . . . . . . . 333
13.3 Supercomputing Capabilities: Conventional Facilities . . . . . . . . . . . . 333
13.3.1 Computer Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
13.3.2 Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
13.3.3 Network Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . 334
13.3.4 Storage Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
13.3.5 Visualization and Hyperwall . . . . . . . . . . . . . . . . . . . . . . 336
13.3.6 Primary NAS Facility . . . . . . . . . . . . . . . . . . . . . . . . . . 336
13.4 Modular Supercomputing Facility . . . . . . . . . . . . . . . . . . . . . . . 337
13.4.1 Limitations of the Primary NAS Facility . . . . . . . . . . . . . . . 337
13.4.2 Expansion and Integration Strategy . . . . . . . . . . . . . . . . . 337
13.4.3 Site Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
13.4.4 Module Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
13.4.5 Power, Cooling, Network . . . . . . . . . . . . . . . . . . . . . . . . 339
13.4.6 Facility Operations and Maintenance . . . . . . . . . . . . . . . . . 340
13.4.7 Environmental Impact . . . . . . . . . . . . . . . . . . . . . . . . . 341
13.5 Electra Supercomputer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
13.5.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
13.5.2 I/O Subsystem Architecture . . . . . . . . . . . . . . . . . . . . . . 343
13.6 User Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
13.6.1 System Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
13.6.2 Resource Allocation and Scheduling . . . . . . . . . . . . . . . . . . 344
13.6.3 User Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
13.7 Application Benchmarking and Performance . . . . . . . . . . . . . . . . 345
13.8 Utilization Statistics of HECC Resources . . . . . . . . . . . . . . . . . . 347
14 Bridges: Converging HPC, AI, and Big Data for Enabling Discovery 355
Nicholas A. Nystrom, Paola A. Buitrago, and Philip D. Blood
14.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
14.1.1 Sponsor/Program Background . . . . . . . . . . . . . . . . . . . . . 357
14.1.2 Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
14.2 Applications and Workloads . . . . . . . . . . . . . . . . . . . . . . . . . . 359
14.2.1 Highlights of Main Applications and Data . . . . . . . . . . . . . . 360
14.2.2 Artificial Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . 361
14.2.3 Genomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
14.2.4 Gateways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
14.2.5 Allocations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
14.3 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
14.4 Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
14.4.1 Processors and Accelerators . . . . . . . . . . . . . . . . . . . . . . 366
14.4.2 Node Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
14.4.3 Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
14.4.4 Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
14.4.5 Storage System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
14.5 System Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
14.5.1 Operating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
14.5.2 File Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
14.5.3 System Administration . . . . . . . . . . . . . . . . . . . . . . . . . 371
14.5.4 Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
14.6 Interactivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
14.6.1 Virtualization and Containers . . . . . . . . . . . . . . . . . . . . . 372
14.7 User Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
14.7.1 User Environment Customization . . . . . . . . . . . . . . . . . . . 373
14.7.2 Programming Models . . . . . . . . . . . . . . . . . . . . . . . . . . 374
14.7.3 Languages and Compilers . . . . . . . . . . . . . . . . . . . . . . . 374
14.7.4 Programming Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
14.7.5 Spark and Hadoop . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
14.7.6 Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
14.7.7 Domain-Specific Frameworks and Libraries . . . . . . . . . . . . . . 375
14.7.8 Gateways, Workflows, and Distributed Applications . . . . . . . . . 375
14.8 Storage, Visualization, and Analytics . . . . . . . . . . . . . . . . . . . . . 376
14.8.1 Community Datasets and Big Data as a Service . . . . . . . . . . . 376
14.9 Datacenter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
14.10 System Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
14.10.1 Reliability and Uptime . . . . . . . . . . . . . . . . . . . . . . . . . 377
14.11 Science Highlights: Bridges-Enabled Breakthroughs . . . . . . . . . . . . . 377
14.11.1 Artificial Intelligence and Big Data . . . . . . . . . . . . . . . . . . 377
14.11.2 Genomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
14.12 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
16 Oakforest-PACS 401
Taisuke Boku, Osamu Tatebe, Daisuke Takahashi, Kazuhiro Yabana, Yuta Hirokawa,
Masayuki Umemura, Toshihiro Hanawa, Kengo Nakajima, Hiroshi Nakamura, Tsuyoshi
Ichimura, Kohei Fujita, Yutaka Ishikawa, Mitsuhisa Sato, Balazs Gerofi, and
Masamichi Takagi
16.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
16.2 Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
16.3 Applications and Workloads . . . . . . . . . . . . . . . . . . . . . . . . . 403
16.3.1 GAMERA/GHYDRA . . . . . . . . . . . . . . . . . . . . . . . . . 403
16.3.2 ARTED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
16.3.3 Benchmark Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
16.3.3.1 HPL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
16.3.3.2 HPCG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
16.4 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
16.5 Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
16.6 System Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
16.6.1 Basic System Software . . . . . . . . . . . . . . . . . . . . . . . . . 409
16.6.2 IHK/McKernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
16.7 Programming System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
16.7.1 Basic Programming Environment . . . . . . . . . . . . . . . . . . . 412
16.7.2 XcalableMP: A PGAS Parallel Programming Language for Parallel
Many-core Processor System . . . . . . . . . . . . . . . . . . . . . . 413
16.7.2.1 Overview of XcalableMP . . . . . . . . . . . . . . . . . . . 413
16.7.2.2 OpenMP and XMP Tasklet Directive . . . . . . . . . . . . 414
16.7.2.3 Multi-tasking Execution Model in XcalableMP between
Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
Index 451
Preface
We are pleased to present you with this third volume of material that captures a snapshot
of the rich history of practice in Contemporary High Performance Computing. As evidenced
in the chapters of this book, High Performance Computing continues to flourish, both in
industry and research, both domestically and internationally. While much of the focus of
HPC is on the hardware architectures, a significant ecosystem is responsible for this success.
This book helps capture this broad ecosystem.
High Performance Computing (HPC) is used to solve a number of complex questions
in computational and data-intensive sciences. These questions include the simulation and
modeling of physical phenomena, such as climate change, energy production, drug design,
global security, and materials design; the analysis of large data sets, such as those in genome
sequencing, astronomical observation, and cybersecurity; and, the intricate design of engi-
neered products, such as airplanes and automobiles.
It is clear and well-documented that HPC is used to generate insight that would not oth-
erwise be possible. Simulations can augment or replace expensive, hazardous, or impossible
experiments. Furthermore, in the realm of simulation, HPC has the potential to suggest
new experiments that escape the parameters of observability.
Although much of the excitement about HPC focuses on the largest architectures and
on specific benchmarks, such as TOP500, there is a much deeper and broader commitment
from the international scientific and engineering community than is first apparent. In fact,
it is easy to lose track of history in terms of the broad uses of HPC and the communities
that design, deploy, and operate HPC systems and facilities. Many of these sponsors and
organizations have spent decades developing scientific simulation methods and software,
which serves as the foundation of HPC today. This community has worked closely with
countless vendors to foster the sustained development and deployment of HPC systems
internationally.
In this third volume of Contemporary High Performance Computing [1, 2], we continue
to document international HPC ecosystems, which includes the sponsors and sites that
host them. We have selected contributions from international HPC sites, which represent
a combination of sites, systems, vendors, applications, and sponsors. Rather than focus
on simply the architectures or applications, we focus on HPC ecosystems that have made
this dramatic progress possible. Though the very word ecosystem can be a broad, all-
encompassing term, it aptly describes high performance computing. That is, HPC is far more
than one sponsor, one site, one application, one software system, or one architecture. Indeed,
it is a community of interacting entities in this environment that sustains the community
over time. In this regard, we asked contributors to include the following topics in their
chapters:
5. System software
6. Programming systems
7. Storage, visualization, and analytics
8. Data center/facility
Some of the authors followed this outline precisely while others found creative ways
to include this content in a different structure. Once you read the book, I think that you
will agree with me that most of the chapters have exceeded these expectations and have
provided a detailed snapshot of their HPC ecosystem, science, and organization.
Bibliography
[1] J. S. Vetter. Contemporary high performance computing: an introduction. In Jeffrey S.
Vetter, editor, Contemporary High Performance Computing: From Petascale Toward
Exascale, volume 1 of CRC Computational Science Series, page 730. Taylor and Francis,
Boca Raton, 1 edition, 2013.
[2] J. S. Vetter, editor. Contemporary High Performance Computing: From Petascale To-
ward Exascale, volume 2 of CRC Computational Science Series. Taylor and Francis,
Boca Raton, 1 edition, 2015.
Editor
Jeffrey S. Vetter, Ph.D., is a Distinguished R&D Staff Member, and the founding group
leader of the Future Technologies Group in the Computer Science and Mathematics Division
of Oak Ridge National Laboratory. Vetter also holds a joint appointment at the Electrical
Engineering and Computer Science Department of the University of Tennessee-Knoxville.
From 2005 through 2015, Vetter held a joint position at Georgia Institute of Technology,
where, from 2009 to 2015, he was the Principal Investigator of the NSF Track 2D Ex-
perimental Computing XSEDE Facility, named Keeneland, for large scale heterogeneous
computing using graphics processors, and the Director of the NVIDIA CUDA Center of
Excellence.
Vetter earned his Ph.D. in Computer Science from the Georgia Institute of Technology.
He joined ORNL in 2003, after stints as a computer scientist and project leader at Lawrence
Livermore National Laboratory, and postdoctoral researcher at the University of Illinois at
Urbana-Champaign. The coherent thread through his research is developing rich architec-
tures and software systems that solve important, real-world high performance computing
problems. He has been investigating the effectiveness of next-generation architectures, such
as non-volatile memory systems, massively multithreaded processors, and heterogeneous
processors such as graphics processors and field-programmable gate arrays (FPGAs), for
key applications. His recent books, entitled "Contemporary High Performance Computing:
From Petascale toward Exascale (Vols. 1 and 2)," survey the international landscape of
HPC.
Vetter is a Fellow of the IEEE, and a Distinguished Scientist Member of the ACM.
Vetter, as part of an interdisciplinary team from Georgia Tech, NYU, and ORNL, was
awarded the Gordon Bell Prize in 2010. Also, his work has won awards at major venues:
Best Paper Awards at the International Parallel and Distributed Processing Symposium
(IPDPS), EuroPar and the 2018 AsHES Workshop, Best Student Paper Finalist at SC14,
Best Presentation at EASC 2015, and Best Paper Finalist at the IEEE HPEC Conference.
In 2015, Vetter served as the Technical Program Chair of SC15 (SC15 Breaks Exhibits and
Attendance Records While in Austin). You can see more at https://ft.ornl.gov/~vetter.
Chapter 1
Resilient HPC for 24x7x365 Weather
Forecast Operations at the Australian
Government Bureau of Meteorology
Dr Lesley Seebeck
Former Group Executive of Data & Digital, CITO, Australian Bureau of Meteorology
Tim F Pugh
Director, Supercomputer Programme, Australian Bureau of Meteorology
Damian Aigus
Support Services, Data & Digital, Australian Bureau of Meteorology
Dr Joerg Henrichs
Computational Science Manager, Data & Digital, Australian Bureau of Meteorology
Andrew Khaw
Scientific Computing Service Manager, Data & Digital, Australian Bureau of Meteorology
Tennessee Leeuwenburg
Model Build Team Manager, Data & Digital, Australian Bureau of Meteorology
James Mandilas
Operations and Change Manager, Data & Digital, Australian Bureau of Meteorology
Richard Oxbrow
HPD Systems Manager, Data & Digital, Australian Bureau of Meteorology
Naren Rajasingam
HPD Analyst, Data & Digital, Australian Bureau of Meteorology
Wojtek Uliasz
Enterprise Architect, Data & Digital, Australian Bureau of Meteorology
John Vincent
Delivery Manager, Data & Digital, Australian Bureau of Meteorology
Craig West
HPC Systems Manager, Data & Digital, Australian Bureau of Meteorology
Dr Rob Bell
IMT Scientific Computing Services, National Partnerships, CSIRO
1.1 Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Program Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Sponsor Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.3 Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Applications and Workloads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Highlights of Main Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.2 2017 Case Study: From Nodes to News, TC Debbie . . . . . . . . . . . . . . . . . . 10
1.3.3 Benchmark Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.4 SSP - Monitoring System Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.1 System Design Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5.1 Australis Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5.2 Australis Node Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.5.2.1 Australis Service Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.5.2.2 Australis Compute Node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.3 External Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.4 Australis Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.5 Australis Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.6 Australis Storage and Filesystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 System Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.6.1 Operating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.6.2 Operating System Upgrade Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.6.3 Schedulers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6.3.1 SMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.6.3.2 Cylc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.6.3.3 PBS Professional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7 Programming System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.7.1 Programming Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.7.2 Compiler Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.7.3 Optimisations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.8 Archiving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8.1 Oracle Hierarchical Storage Manager (SAM-QFS) . . . . . . . . . . . . . . . . . . . . 23
1.8.2 MARS/TSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.9 Data Center/Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.10 System Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.10.1 Systems Usage Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.11 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.11.1 Failover Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.11.2 Compute Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.11.3 Data Mover Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.11.4 Storage Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.11.4.1 Normal Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.11.4.2 Failover Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.11.4.3 Recovery Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.11.4.4 Isolated Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.11.5 SSH File Transfer Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.12 Implementing a Product Generation Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.1 Foreword
Supercomputing lies at the heart of modern weather forecasting. It coevolves with the
science, technology, means of the collection of observations, the needs of meteorologists,
and the expectations of the users of our forecasts and warnings. It nestles in a web of
other platforms and networks, applications and capabilities. It is driven by, consumes, and
generates vast and increasing amounts of data. And it is part of the global effort by the
world’s meteorological agencies to collect data, understand the weather, and look ahead to
generate forecasts and warnings on which human activity is based. Given the complexity
of the overall task and the web of supporting capability, to talk about the supercomputing
component alone seems reductionist. And yet it is a feat of human engineering and effort
that we do well to recognise. These are capabilities that drive the data and information
business that is the Bureau – the growing benefits available through more data, increasing
granularity and frequency of forecasts, and better information to the Bureau’s customers –
no more and no less than to the scientists or the meteorologists.
The Bureau’s current supercomputer, Australis, was delivered on time and within bud-
get, with the supercomputer itself, a Cray XC40, bought at a capital cost of $A80 million[8].
The programme extends from 2014-15 through 2020-21. Within that period, the Bureau
continues to keep pace with the relentless demands of the data, the models, and user needs,
and to explore new, improved ways to extract value from both data and capability. It also
has to contend with an increasingly challenging operating environment, with the effective use
of these systems placing growing demands on organisations in terms of skills, operating costs,
and security.
On a personal note, arriving at the start of the programme to replace the existing
supercomputer, I was fortunate to have a highly capable team led by Tim Pugh. To continue
to be an effective contributor to the field, both the Bureau – and Australia – need to nurture
and grow the technical skills, deep computational understanding, and insights that build and
shape the field of high performance computing, and to exploit that capability. This chapter
sets out the Australian Bureau of Meteorology’s supercomputing capability, and in doing
so helps contribute to that effort.
Dr Lesley Seebeck
Former Group Executive of Data & Digital, CITO,
Australian Bureau of Meteorology
1.2 Overview
The Australian Government’s Bureau of Meteorology has had the responsibility of pro-
viding trusted, reliable, and responsive meteorological services for Australia - all day, every
day – since 1908. Bringing together the ever-expanding world-wide observation networks,
and improving computational analysis and numerical modelling to deliver the Bureau’s ex-
ceptional predictive and analytical capability, we are able to undertake the grand challenge
of weather and climate prediction.
Australia is a country with a landmass marginally less than the continental United
States, but with a population 13 times smaller. Australia is not only vast, it is also harsh.
With just 9% of the landmass suitable for farming, and the main population living along
the cooler coastal regions, the climate of the continent plays a significant role in defining
the life of the country.
Around the country there are climate pockets similar to those found on every other
continent; Sydney shares a climate similar to South Africa, Canberra is most like Rome,
Melbourne like the San Francisco Bay area, Perth like Los Angeles, Darwin like Mumbai, Ho-
bart like Southern Chile and the UK. Across the centre are deserts, which, though sparsely
populated, still contain major population centres like Alice Springs and the mining town of
Kalgoorlie.
Against this backdrop the Bureau and its forecasting team strive to provide timely
weather products to cover the entire continent and its climate variations, as well as managing
its weather responsibilities for Australia’s Antarctic Territory (a 5.9 million square kilometre
area, 42% of the Antarctic continent), on a 24x7x365 basis. As if this wasn’t a significant
enough daily endeavour, the Bureau also manages a suite of on-demand emergency forecasts
to cover the extreme weather events of the region: tsunami, cyclone, and bushfire (wildfire).
These regularly run in the extreme weather season (December - April) and are also ready
to go as and when they are required. As Australia is an island continent, the Bureau also
provides a full suite of oceanographic forecasting.
The Bureau of Meteorology has the unique numerical prediction capabilities required
to routinely forecast the weather and climate conditions across the Australian continent,
its territories, and the surrounding marine environment. When this capability is utilised
with modern data and digital information services, we are able to issue timely forecasts
and warnings to the Australian public, media, industries, and Government services well in
advance of an event. These services are essential to ensure the nation is prepared to act when
faced with an event, and to mitigate the loss of property and lives. As a prepared nation,
we have been very successful in reducing the loss of lives and improving the warnings over
the years, and will continue to improve with each advance in key areas of science, numerical
prediction, observing networks, and computational systems.
Modern weather and climate prediction requires a significant science investment to
achieve modelling advances that lead to enhanced forecast services. The science invest-
ment comes from the Bureau and its local partners in the Commonwealth Scientific and
Industrial Research Organisation (CSIRO) and Australian universities and international
partners. The Bureau is a member of the Unified Model (UM) partnership [9], which is led
by the UK Met Office.
These partnerships bring together the required breadth of science, observations, and
modelling expertise to develop the global data assimilation and forecast models through to
high-resolution, convection-permitting models, and the forthcoming multi-week and seasonal
coupled climate models. Today the Bureau assimilates data from more than 30 different
satellite data streams, surface observations, and aircraft and balloon observations, with radar
observations coming next. All these capabilities speak to the sophistication of numerical
prediction modelling and why the Bureau needs such scientific partnerships, observation
networks, and computing capability to continuously deliver better products.
The Bureau strategy is to focus on customer needs to deliver more accurate and trusted
forecasts through its High Performance Computing (HPC) and numerical prediction capa-
bility. Australian businesses, agriculture, mining, aviation, shipping, defence, government
agencies, and citizens are all beneficiaries of more timely and accurate weather forecasts,
the multi-week climate outlooks, seasonal climate and water forecasting, and climate change
projections. The Bureau’s customers have an interest in decision-making across many time-
scales, and value an ability to change decisions (and derive value) well beyond the one-week
lead time typically associated with weather forecasts.[12]
The size of the HPC system is dependent on these factors: the number of modelling suites;
the cost and complexity of the numerical weather prediction models, arising from finer grid
resolutions; the need to consider a range of probable future atmospheric states (ensemble
modelling); and the need to couple physical modelling domains (i.e. atmosphere, ocean, sea
ice, land surfaces) to better capture physical interactions, leading to improved simulations
and forecast skill. Typically, the numerical prediction models are sized to the available
computing capacity, thus constraining the modelling grid resolution.
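To make that constraint concrete, the short sketch below gives a rough, generic estimate of how the cost of a forecast grows as the horizontal grid spacing is refined; it assumes cost scales with the number of grid columns times the number of timesteps (roughly the cube of the refinement factor) and does not reflect the Bureau's actual model costings.

    # Rough, illustrative scaling of NWP cost with horizontal grid spacing.
    # Assumes cost ~ (1/dx)^2 for the number of grid columns, times (1/dx)
    # for the shorter timestep required for numerical stability; real model
    # costings (vertical levels, physics, I/O) differ in detail.

    def relative_cost(dx_km, reference_dx_km=25.0):
        """Cost of a forecast at dx_km relative to a 25 km reference run."""
        return (reference_dx_km / dx_km) ** 3

    for dx_km in (25.0, 12.0, 4.0, 1.5):
        print(f"{dx_km:5.1f} km grid: ~{relative_cost(dx_km):7.0f}x a 25 km run")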
working to advance weather and climate research and development within the region. It has
facilitated the adoption of a software life cycle process for numerical prediction products:
the Australian Community Climate Earth System Simulator (ACCESS) [4] and the Unified
Model/Variational Assimilation (UM/VAR) weather modelling suites as well as ocean and
marine modelling suites.
This continuing development of the Bureau’s numerical modelling and prediction prod-
ucts has delivered an operational service for the routine and real-time demands of 24x7x365
weather, climate, ocean, and hydrological forecast services.
In July 2015, to meet the growing demand, the Bureau entered into a contract with Cray
to acquire a Cray XC40 system called “Australis” to support its operational HPC require-
ments for improved numerical prediction and forecast services. The Cray computational
systems married Intel processors, Lustre and Network File System (NFS) filesystems, and
the PBS Professional job scheduler to provide the backbone of the Bureau's HPC capability.
The computational power of Australis facilitates improvements of the forecast/assimilation
models to deliver a seasonal climate model at 60km resolution, global model at 12km, re-
gional model at 4km, and city models at 1.5km. Australis also provides ensemble modelling
capability to enable probabilistic forecasts to improve decision support systems.
The Bureau's current HPC platforms consist of several systems: an Exemplar used by
system administrators to test system upgrades and patches; a small Development system
for scientists and software developers called “Terra”; and the Australis operational sys-
tem, a mission critical system for severe weather forecasting and environmental emergency
response.
for research and development projects that delay improvements until the next investment
cycle.
In response to this, the Bureau changed its strategy to separate research and operational
computing investments. Research computing moved to a collaborative national peak facility
at NCI in 2013. New Government funding was obtained in 2014 for the replacement of
the Bureau’s existing Oracle HPC system with one delivering the computing capability
to improve its numerical weather prediction applications and forecast services for severe
weather events through improved accuracy, more up-to-date forecasts, an increased ability
to quantify the probability of forecast outcomes, and on-demand response to extreme
weather and hazard events as they develop.
Within the Bureau, the HPC platform sits in the Data & Digital Group of the organ-
isation. The weather products are developed by the Research and Development branch in
a collaborative relationship with the HPC technical team and National Forecast Services.
This relationship of a scientific need meeting a technical service has been the internal driver
for the system’s upgrades.
1.2.3 Timeline
The timeline for the latest system design, development, procurement, installation, and
use is shown in Table 1.1 below.
both a very high level of reliability of forecast generation and its timeliness of delivery -
HPC attributes that distinguish it from workloads in other fields, such as research.
Improvements in the forecast quality of NWP over the decades have been driven by
three key factors:
1. Improved understanding of atmospheric physics, and how that understanding can be
encapsulated in a numerical model;
2. Use of more observation data and observation types, together with increasingly sophisticated
mathematical methods in the "Data Assimilation" process that generates the initial
atmospheric state from which a forecast simulation is produced (a standard formulation
is sketched after this list);
3. Increasing HPC capacity, which has enabled models to run at higher-resolution to
better resolve physical features and processes within a given production time-window.
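For readers unfamiliar with the term, variational data assimilation of the kind referred to in point 2 typically produces the initial state by minimising a cost function of the standard form below; this is a generic textbook (3D-Var) formulation rather than a description of the Bureau's specific UM/VAR configuration.

    J(\mathbf{x}) = \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
                  + \tfrac{1}{2}\,\bigl(\mathbf{y}-H(\mathbf{x})\bigr)^{\mathrm{T}}\mathbf{R}^{-1}\bigl(\mathbf{y}-H(\mathbf{x})\bigr)

Here $\mathbf{x}_b$ is the background (prior forecast) state, $\mathbf{y}$ the vector of observations, $H$ the observation operator, and $\mathbf{B}$ and $\mathbf{R}$ the background- and observation-error covariance matrices; the state that minimises $J$ becomes the starting point of the forecast simulation.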
Our daily runs include: Global NWP, Global Wave, Global Ocean, Australian Regional
NWP, Regional Wave, and six regions of high-resolution NWP models for Victoria/Tasmania,
New South Wales, Queensland, South Australia, Western Australia, and Northern Territory.
A single high-resolution, convection-resolving NWP model of the Australian continent
is desirable but computationally unattainable due to resource costs. Antarctic forecasting
currently uses the Global NWP model for guidance. Our severe weather modelling consists
of tropical cyclone prediction, fire weather prediction, flood forecasting, and environmen-
tal emergency response to chemical, volcanic, nuclear, and bio-hazard events. Additional
modelling runs for global climate forecasting use the Predictive Ocean Atmosphere Model
for Australia (POAMA) ensemble with a 250km grid resolution, and a new ACCESS cou-
pled climate model with a 60km grid is being readied for multi-week and seasonal climate
forecast services. Further predictive modelling includes ocean tides, storm surge, tsunami,
coastal ocean circulation, space weather, hydrology, and stream flow.
[Figure: the cascading suite of ACCESS NWP models – a 25km Global Model (2x daily, 10-day forecast), a 12km Regional Model (4x daily, 3-day forecast), and 1.5km grid City/State Models, e.g. Sydney, NSW (1.5km topography), run 4x daily for a 36-hour forecast.]
Cascading or “coupling” of individual NWP models, and now ensemble modelling, has
placed additional stress on HPC capacity in terms of forecast timeliness and peak demand
for the compute and data storage resources. This characteristic typically sets the compute
resource capacity limits or size of the system.
[Figure: Bureau of Meteorology track map of severe tropical cyclone Debbie along the Queensland coast, late March 2017. © Commonwealth of Australia 2017, Bureau of Meteorology.]
Coral Sea the system had developed into a Category 2 cyclone by 26th March. On 27th
March Debbie strengthened quickly from a Category 2 to a Category 4 severe tropical
cyclone as it continued heading toward the Queensland coastline.
The storm then continued developing until it crossed the coastline at Airlie Beach at
midday on 28th March with sustained winds of 195 km/h. Bureau observing equipment at
Hamilton Island Airport was damaged by the storm at around 11am. Prior to this, a peak
wind gust of 263 km/h was recorded; this being the highest ever wind gust recorded in
Queensland.
All the while Bureau staff were using the predictions from the ACCESS-G, ACCESS-R,
ACCESS-C, and ACCESS-TC systems to update forecasts and warnings for communities
in the expected path of the storm, and for those likely to experience damage to property
and danger to life. Using the new Australis system, forecast models were produced every
6 hours, using guidance from the newest, higher-resolution ACCESS-C2 model covering
the highly populated area of southeast Queensland. The model output provided
guidance on potential rainfall totals across southeast Queensland; this gave important input
into decisions surrounding Severe Weather Warnings and Flood Warnings.
Ex-tropical cyclone Debbie tracked south then southeast over the Sunshine Coast and
Brisbane during the afternoon and evening of Thursday, 30th March. The storm continued
to move south across Queensland and into New South Wales; the forecasting responsibilities
moved to the regional office in Sydney. Debbie finally left the Australian mainland on 31st
March. As a severe weather system it continued across the Tasman Sea, where it caused
further significant flooding in the Bay of Plenty region of New Zealand on 6th April[6].
and IO bandwidth. However, none of these benchmarks gives a holistic view of the overall
HPC system; each one captures only certain aspects of the overall system.
The Bureau uses a different approach for monitoring the performance of the system by
defining a set of five typical applications run as a standard set of benchmarks. This set
represents the mix of applications running on the system routinely, but uses a fixed set of
input data. The benchmarks in this set are three different UM (Unified Model) simulations
at different resolutions (from a low-resolution climate model to a global model), a data
assimilation, and an ocean simulation, as illustrated in Table 1.2.
The runtime of each of those five benchmarks is used to compute a performance per
core value (column 4). These five performance values are then averaged using a geometric
mean and multiplied by the number of cores in the system, resulting in one overall SSP
(Sustained System Performance) number.
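As an illustration of that calculation, the sketch below derives an SSP figure from per-benchmark runtimes; the benchmark names, reference work figures, core counts, and system size are invented for the example and are not the Bureau's actual values.

    from math import prod

    # Illustrative SSP calculation: a per-core performance value for each
    # benchmark is combined with a geometric mean and scaled by the number
    # of cores in the system. All figures below are hypothetical.
    benchmarks = {
        # name: (work units completed, runtime in seconds, cores used)
        "um_climate":  (1.0e6, 1800.0, 1152),
        "um_regional": (2.5e6, 2400.0, 4608),
        "um_global":   (4.0e6, 3600.0, 9216),
        "data_assim":  (1.5e6, 1500.0, 2304),
        "ocean":       (2.0e6, 2700.0, 3456),
    }

    def ssp(benchmarks, system_cores):
        per_core = [work / (runtime * cores)
                    for work, runtime, cores in benchmarks.values()]
        geo_mean = prod(per_core) ** (1.0 / len(per_core))
        return geo_mean * system_cores

    print(f"Overall SSP figure: {ssp(benchmarks, system_cores=51840):.2f}")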
Because each simulation uses the same input data, the runtimes of each benchmark
should show very little variation. Consequently, the overall SSP figure should stay within
normal run-to-run variance. Any significant change in the runtime behaviour, in any of those applications,
would result in a significant change of the reported SSP figure, indicating that the system
has an issue that would cause runtime degradation.
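In practice that check amounts to comparing each reported SSP figure against an established baseline, as in the minimal sketch below; the 5% tolerance and the sample values are illustrative only, not the Bureau's actual criteria.

    # Flag a possible system problem when a reported SSP figure falls
    # noticeably below its baseline. Threshold and values are illustrative.

    def ssp_degraded(current, baseline, tolerance=0.05):
        return current < baseline * (1.0 - tolerance)

    baseline_ssp = 148.0
    weekly_ssp = [148.2, 147.9, 148.4, 140.1]   # hypothetical weekly reports
    for week, value in enumerate(weekly_ssp, start=1):
        if ssp_degraded(value, baseline_ssp):
            print(f"Week {week}: SSP {value} below baseline {baseline_ssp} - investigate")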
The full SSP suite is run once a week at a quiet time on the system. The individual
runtimes, as well as the overall SSP figure, are reported monthly by Cray; these values are
monitored by the Bureau to look for any degradation of the overall system performance.
A similar SSP setup was used on the Bureau’s previous supercomputers. Some interesting
issues discovered included:
1. A 7% performance loss was detected over a six-month period. While an OS update
would have likely solved this issue, the associated risk of an OS update (loss of official
support since newer kernels might not yet be certified to work with other components
of the system, and the risk of introducing new problems) prevented an OS update
from happening. Instead it was decided to reboot all nodes regularly, which solved the
performance slowdown observed by the SSP.
2. A BIOS update contained an incorrect setting (hardware prefetch was disabled). The
BIOS update was rolled out as nodes were rebooted. The SSP value early on indicated
a system issue. Correlating the used nodes with lower SSP results soon indicated that
recently rebooted nodes showed the slowdown, and closer analysis resulted in detecting
the changed BIOS setting.
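The node correlation used in the second case can be sketched as below: per-node benchmark times are split by whether the node was rebooted (and so picked up the new BIOS) after the rollout date, and the group means are compared. The data layout, node names, and figures are invented for illustration.

    from statistics import mean

    # Hypothetical per-node benchmark runtimes (seconds) and last reboot dates.
    runtimes = {"nid0001": 1802.0, "nid0002": 1795.0,
                "nid0003": 1921.0, "nid0004": 1914.0}
    last_reboot = {"nid0001": "2017-02-01", "nid0002": "2017-02-03",
                   "nid0003": "2017-03-10", "nid0004": "2017-03-12"}
    BIOS_ROLLOUT = "2017-03-01"   # date the suspect BIOS update began rolling out

    before = [t for node, t in runtimes.items() if last_reboot[node] < BIOS_ROLLOUT]
    after = [t for node, t in runtimes.items() if last_reboot[node] >= BIOS_ROLLOUT]

    slowdown = mean(after) / mean(before) - 1.0
    print(f"Nodes rebooted since the rollout are {slowdown:.1%} slower")
    if slowdown > 0.03:
        print("Check settings applied at reboot (e.g. BIOS, kernel parameters).")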
SSP tests are also used to evaluate new system software. The approach is somewhat
different from the weekly SSP system tests; the tests will only be run on demand, i.e. when
a new version of software is installed and needs to be evaluated (typically before it is made
available on the system). In this scenario it is rare for the same binary to be used more
than once (except to make sure we are getting statistically significant results). In contrast,
the system-testing SSP suite will keep the same binaries for repeated cycles.
Due to the difficulties involved in verifying an application, running suites tend not to
update to a newer compiler or system library until absolutely required. The SSP suite
mirrors this and keeps on running with the previously compiled binary, in order to accurately
reflect the mix running on the system. Once enough newly compiled software is running on
the system, the system SSP suite is recompiled, and a new baseline is established.
[Figure: operational workflow overview – observation and model data feed the workflow schedulers, which drive the weather and ocean/climate models and deliver downstream products to users: the Bureau, the public, and industry.]
schedulers are able to survive single points of failure independently of Australis and can
maintain operational workflow into the HPC systems. This aspect of the configuration is
seen as a key element in the HPC system meeting its uptime objective.
The operational benefit of this arrangement is that if an unplanned fault occurs on the
part running the Bureau’s operational services, the system administrators are able to move
operations to the other part and restart the last computational jobs and thus minimise the
effect of unplanned outages on the Bureau’s operations. This design allows the Bureau’s
numerical forecast services to achieve a 99.86% uptime service level, a figure that equates
to less than 1 hour of downtime per month.
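As a quick sanity check on that figure, the few lines below convert a 99.86% availability service level into a monthly downtime allowance, using an average month of roughly 730 hours.

    # Convert an availability service level into a monthly downtime budget.
    availability = 0.9986
    hours_per_month = 365.25 * 24 / 12          # about 730.5 hours on average

    downtime_hours = (1.0 - availability) * hours_per_month
    print(f"Allowed downtime: {downtime_hours:.2f} hours per month")   # ~1.02 hours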
[Figure: Australis system architecture – twin Cray XC40 halves (West and East), each with boot, SDB, gateway, and LNET service nodes and its own SMW, connected to the BoM network over 10/40GbE and to Sonexion Lustre storage (3 SSU and 6 SSU units), data movers, and login nodes over an FDR InfiniBand fabric.]
This processor was selected as it gave the best processor performance when running the SSP
benchmark. The use of hyperthreading is enabled on a per-job basis; thus an application can utilise
either 24 physical cores or 48 virtual cores per node.
The service blades for the XC40 system use Intel Xeon Sandy Bridge processors; therefore
these are not used to run any NWP jobs.
1. Boot and SDB nodes - these nodes provide a boot function and a System Database
2. Network Router nodes (Net) - these nodes provide the capability to route packets to
and from a Cray Aries network to the corporate network.
3. DVS – Data Virtualisation Service nodes provide a method to mount external NFS
file systems (our Netapp FAS) into the Aries network.
4. RSIP – Realm Specific IP-Address nodes provide a service similar to Network Address
Translation. They allow nodes on the Aries network to send packets to and get return
packets from our HPC services. RSIP is typically used for lower level communications
like DNS, LDAP, software license management, workflow status, etc.
5. LNET – these nodes route Lustre files system traffic from the Aries network through
TM
to the Sonexion R Lustre R appliances via InfiniBand .
4. Suite Scheduler nodes – SMS and Cylc services provide routine operations workflow
management, event triggers, and scheduling for Australis.
5. All the nodes located externally from the XC40 cabinets are managed by Bright
Cluster Manager.
Two NFS servers support the Cray XC40s by providing a small amount of persistent storage for critical software and data. The home directories on the XC40 computers are located on the NFS file systems; data protection is implemented using file-based snapshots, file replication, and traditional backups. The home directories' file systems hold the persistent data required by the supercomputer to run the Production workload.
[Figure: Rolling operating system upgrade across the West and East systems – the staging OS is suspended, upgraded, and verified for stability before being promoted to production, with a fail-back path to the original OS.]
1.6.3 Schedulers
A key tool in the ongoing delivery of the operational HPC systems is our use of schedulers. Managing the complex daily schedule requires control of both the jobs and the system resources on the HPC platform.
In operation, the job scheduler has three goals: first the priority scheduling goal, then
the backfill scheduling goal, and finally the job pre-emption goal.
1. The priority scheduling goal is to run the most important time-critical jobs first; the
environmental emergency response, the on-demand severe weather prediction, and
finally the routine weather prediction jobs within the daily production time-window.
2. The backfill scheduling goal is to run the greatest aggregate sum of non-time critical
jobs when computing resources are available, such as climate, ocean prediction, and
reanalysis jobs. This results in the highest utilisation of the system.
3. The job pre-emption goal is to stop the minimum set of jobs required to allow priority
jobs to run immediately. The suspend-resume pre-emption scheduling is a key feature
of our system, which is used to effectively achieve both priority scheduling and backfill
scheduling goals.
The pre-emption scheduling will target backfill jobs that can be suspended in memory when
time-critical jobs are ready to run. When the priority job has completed, the backfill job is
resumed. This means that the elapsed time of a backfill job does not need to fit within an
available time slot in the operational schedule. The resource requirements for the backfill
job do need to be met. The large memory compute nodes make the pre-emption scheduling
achievable. Overall this allows us to achieve the highest utilisation of the system while meeting the production schedule of our business.
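Choosing the "minimum set of jobs" to suspend can be viewed as a small packing problem. The Python sketch below is only a conceptual illustration of that idea and is not PBS Pro's actual algorithm; the job names and node counts are invented:

    def jobs_to_suspend(backfill_jobs, nodes_needed):
        """Pick a small set of backfill jobs to suspend.

        backfill_jobs: list of (job_id, nodes_held) tuples for suspendable jobs.
        nodes_needed:  nodes required by the arriving time-critical job.

        Greedy heuristic: suspend the largest jobs first so that as few
        jobs as possible are disturbed. Returns the chosen job ids, or
        None if suspending everything would still not free enough nodes.
        """
        chosen, freed = [], 0
        for job_id, nodes_held in sorted(backfill_jobs, key=lambda j: -j[1]):
            if freed >= nodes_needed:
                break
            chosen.append(job_id)
            freed += nodes_held
        return chosen if freed >= nodes_needed else None

    # Example: a severe-weather job arrives needing 96 nodes while three
    # climate/ocean backfill jobs are running (all figures illustrative).
    running = [("ocean.42", 64), ("climate.7", 48), ("reanalysis.3", 16)]
    print(jobs_to_suspend(running, 96))   # -> ['ocean.42', 'climate.7']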
The HPC platform currently uses two schedulers (SMS and Cylc) and a workload manager (PBS Professional) to manage the daily workflow. The schedulers are used to feed the workload manager. Of the 20-plus weather modelling suites the Bureau runs regularly, the ACCESS suites are the most resource-hungry. Running up to 8 times in any 24-hour period, they need to be managed alongside the Seasonal, Wave, Ocean, and Ocean forecast models, as well as a fleet of smaller NWP suites. Currently, the Bureau's PBS scheduler runs up to 60,000 jobs per day across Australis production and staging.
1.6.3.1 SMS
Developed by the European Centre for Medium-Range Weather Forecasts (ECMWF), the Supervisor Monitor Scheduler (SMS) has been the backbone of the HPC delivery platforms for two decades. Written in C, it allows extensive customisation of task environments. SMS allows submission to multiple execution hosts using one or more batch schedulers, with suites scheduled according to time, cycle, suite, task family, and individual task triggering.
The key to the longevity of SMS is that it was always a product developed specifically for cycling numerical prediction workflows. Twenty years of continual development have contributed to the HPC platform's high rates of uptime and timeliness of delivery, and such a long evolution and refinement has made it a very stable and reliable product. SMS, however, is now a decade past its intended operational lifetime and support from ECMWF has ended; responsibility for support and development now falls in-house at the Bureau. The limitations of its interface and of its alerting and monitoring connectivity eventually drove a decision to seek a new workflow scheduler.
1.6.3.2 Cylc
The ACCESS suite of weather prediction jobs that delivers the Bureau's main products is based on the Unified Model (UM) modelling software from the UK Met Office. The Cylc workflow scheduler [10] is integrated with these models, along with the suite configuration package Rose; both products are written in Python. Adopting Cylc as the new workflow scheduler will deliver significant time savings through a simplified localisation process for every ACCESS model update. The Cylc service, like SMS, provides our IT Operators with status and alerts relating to running modelling suites and product generation. The deployment of Cylc continues and is expected to run into 2018, with retirement of SMS expected towards 2020.
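At its core, a cycling suite of this kind is a small dependency graph repeated for every forecast cycle. The toy Python sketch below illustrates the concept only; it uses none of Cylc's real formats or APIs, and the task names and cycle points are invented:

    # Toy cycling-workflow resolver; task names and dependencies are invented.
    GRAPH = {
        "get_obs":  [],                 # no upstream dependencies
        "forecast": ["get_obs"],        # needs the cycle's observations
        "products": ["forecast"],       # post-processing of the forecast
    }

    def ready_tasks(cycle, completed):
        """Tasks whose dependencies for this cycle point are all complete.

        completed is a set of (cycle, task) pairs that have finished.
        """
        return [task for task, deps in GRAPH.items()
                if (cycle, task) not in completed
                and all((cycle, dep) in completed for dep in deps)]

    done = {("2018-01-01T00", "get_obs")}
    print(ready_tasks("2018-01-01T00", done))   # -> ['forecast']
    print(ready_tasks("2018-01-01T06", done))   # -> ['get_obs'] (new cycle)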
1.7.3 Optimisations
Since all the codes had previously been optimised with the Intel compilers, only a little effort was required to tune the existing applications. To suit the Cray hardware platform (processors, operating system, network, and storage), the runtime environment of those jobs had to be adjusted. The Model Build team frequently used Cray's grid reorder tool to change the 'processes to nodes' mapping and so minimise internode communication. These changes delivered up to a 5% performance increase.
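The idea behind the rank reordering is to keep MPI ranks that exchange halo data on the same node wherever possible. The Python sketch below is a simplified illustration of that idea, assuming a row-major 2D process grid and 24 ranks per node; on the real system Cray's grid reorder tool generates the mapping consumed by the MPI library's rank-reorder mechanism:

    def block_rank_order(grid_x, grid_y, block_x, block_y):
        """Return an MPI rank ordering that groups small 2D blocks of a
        grid_x x grid_y process grid together, so each block (ideally one
        node's worth of ranks) shares a node and halo exchange stays local.

        Purely illustrative; Cray's grid reorder tool does this in practice.
        """
        order = []
        for by in range(0, grid_y, block_y):
            for bx in range(0, grid_x, block_x):
                for y in range(by, min(by + block_y, grid_y)):
                    for x in range(bx, min(bx + block_x, grid_x)):
                        order.append(y * grid_x + x)   # row-major rank id
        return order

    # Example: a 48 x 24 process grid grouped into 6 x 4 blocks
    # (24 ranks per block, matching 24 cores per node).
    order = block_rank_order(48, 24, 6, 4)
    # The resulting permutation would then be handed to the MPI library's
    # rank-reorder mechanism; here we just show the first node's ranks.
    print(order[:24])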
Further testing investigated the number of nodes required to allow each job to finish within the required time. In each case various domain decompositions were tested to find the one that gave the best performance. For the NWP Unified Model, used for the global, regional, and city forecasts, an additional IO server needed to be configured. The IO servers have their own topology and configuration, and free up capacity on the computation nodes. In some cases the Bureau utilised a simple in-house "experiment manager" to search the multi-dimensional space for the best combination of parameters.
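Such an experiment manager can be very small. The Python sketch below is hypothetical (the Bureau's in-house tool is not described here): it sweeps a grid of candidate decompositions and node counts through a user-supplied run_job function that returns the measured elapsed time, and reports the fastest combination:

    import itertools

    def sweep(run_job, nx_options, ny_options, node_options):
        """Run every (nx, ny, nodes) combination and return the fastest.

        run_job(nx, ny, nodes) is expected to submit that configuration and
        return its elapsed time in seconds, or None if it failed or missed
        the deadline. This is an illustrative sketch, not the Bureau's tool.
        """
        results = {}
        for nx, ny, nodes in itertools.product(nx_options, ny_options, node_options):
            elapsed = run_job(nx, ny, nodes)
            if elapsed is not None:
                results[(nx, ny, nodes)] = elapsed
        best = min(results, key=results.get)
        return best, results[best]

    # Example with a stand-in timing model instead of real job submissions.
    def fake_run_job(nx, ny, nodes):
        imbalance = abs(nx - ny) * 2.0          # penalise skewed decompositions
        return 3600.0 / nodes + imbalance       # crude elapsed-time stand-in

    best, t = sweep(fake_run_job, [8, 12, 16], [8, 12, 16], [64, 96, 128])
    print(f"best decomposition {best} -> {t:.1f} s")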
Running applications on partial nodes was also tested. On the previous HPC platform, it was found that some applications executed more quickly when using only part of the cores on a node – in some cases, on a 16-core node only 12 cores would be used. Partial node usage increases the memory bandwidth per process and can give more third-level cache per process, resulting in a shorter execution time [1]. Our measurements so far have not indicated any need to do this on the Cray XC40 platform.
Additional work was necessary for hybrid codes that use MPI and OpenMP for parallel processing. The Intel compiler's own thread-binding mechanism and environment variables were found in some cases to interfere with the thread binding set up by the Cray application scheduler.
One unexpected problem encountered was a huge runtime variation in jobs that make heavy use of scripts (jobs starting more than 50,000 processes). The problem was traced to caching issues with the NFS file system. This drove the decision to move those centrally provided scripts from NFS onto Lustre, which resulted in much improved application performance.
1.8 Archiving
Two archival systems are in place to handle the input and output data from Australis: Oracle's Hierarchical Storage Manager (OHSM, formerly known as SAM-QFS) and the Meteorological Archival and Retrieval System (MARS) from the European Centre for Medium-Range Weather Forecasts (ECMWF).
Data is moved onto and off SAM-QFS using network copy programs (such as scp) from Data Mover nodes on the edge of the Crays.
1.8.2 MARS/TSM
MARS archives and retrieves structured data to and from tape, and is optimised for the expected retrieval patterns. MARS is a bespoke archive solution designed and developed by the ECMWF [7]. Its main characteristics are:
1. Numerical Prediction (NP) specific archival system for gridded and observation
datasets developed and maintained by the ECMWF
2. Metadata database and disk front-end for tape storage
3. 2.5PB of operational archive, growing 1+TB per day
4. Up to 60,000+ transactions per day
5. File formats: GRIB1, GRIB2, & BUFR [11].
MARS provides the layer that archives and moves the data to tape; in our case IBM's Spectrum Protect (previously Tivoli Storage Manager, TSM) is used to write the data to tape. The system supporting the MARS service and TSM is described in Table 1.7.
A sample of system usage patterns on the Production platform can be seen in Figure
1.8. The Time-critical Processing Load graph shows the regularity of the load created by
the 6-hourly Production jobs on the HPC system.
Note: this load graph omits the many non-time-critical jobs that run on Australis. These jobs include ocean and climate models, and they tend to use much of the HPC system capacity that might otherwise seem to be unused. In addition, noticeably more HPC system capacity will be consumed when the higher-resolution models currently being planned are moved to production.
Figure 1.9 shows the rise in the number of production jobs since the Australis HPC system's inception. Staging jobs are not shown.
1.11 Reliability
In striving to achieve a robust 24x7x365 platform, we identified four key areas in which to optimise the workflow and balance the competing loads:
1. Compute capacity to run jobs from PBS Pro
2. Data mover capacity to stage files
3. Storage disks for short-term files on Lustre
4. SSH access to pull files
Identifying and focusing on these areas allowed us to establish processes to monitor and support each of them. In normal operation each node is allocated directly to particular PBS Pro queues; a process then allows specific groups of nodes to be allocated to either the Production or the Staging queues.
relatively easily. Before doing so, this data may need to be rolled back to the last known
good state.
The available modes of operation of the storage systems are shown in Figure 1.10. These range from Normal mode through Failover, Recovery, and Isolated modes, and are discussed further below.
and to determine which PBS Pro queues they are allocated to. From this, a dynamic DNS
entry is generated for those clients that need to run SCP (secure copy). If a target node
becomes unresponsive, it will be removed from the list maintained by the F5, and when a
queue is switched, the list updates within a few minutes.
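Conceptually, the SCP target list is simply the set of nodes that are both allocated to the desired queue and currently passing health checks. The Python sketch below is purely illustrative (on the real system the F5 and dynamic DNS do this work); the node names and helper functions are invented:

    def scp_targets(nodes, wanted_queue, is_responsive):
        """Return the hostnames that should receive SCP traffic.

        nodes:         iterable of (hostname, queue) allocations.
        wanted_queue:  'production' or 'staging'.
        is_responsive: callable returning True if the host answers a
                       health check (a stand-in for the F5 monitor).
        """
        return [host for host, queue in nodes
                if queue == wanted_queue and is_responsive(host)]

    # Illustrative data: data-mover nodes and their current queue allocation.
    allocations = [
        ("dm01", "production"),
        ("dm02", "production"),
        ("dm03", "staging"),
    ]

    # Pretend dm02 has stopped responding to health checks.
    alive = {"dm01", "dm03"}
    targets = scp_targets(allocations, "production", lambda h: h in alive)
    print(targets)   # -> ['dm01']  (dm02 dropped, dm03 is in the other queue)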
[Figure: Production and post-production data flow – observation and model data and the software/application stacks feed the workflow schedulers; the ocean/climate and weather models run on the production systems and their products are generated on the post-production systems for downstream users (the Bureau, the public, and industry), with the MARS and SAM-QFS archive systems behind both.]
used successfully at other national meteorology centres; however, our choice of hardware solution is different and worth discussing.
The post-processing platform is being built around two Cray CS400 systems. Each scalable CS400 platform comprises 16 compute nodes and 4 GPU nodes, each with 1.6TB of NVMe flash storage, 3 service nodes, and 2 management nodes running Bright Cluster Manager. The platform will also mount the Cray XC40 Lustre filesystem as well as its own GPFS filesystem, which is better equipped to manage the intensive I/O workloads. The configuration is based on a simplified design philosophy in which each CS400 cluster is interconnected over InfiniBand and has its own dedicated storage based on DDN GRIDScaler GS14KX hardware.
Currently the CS400 clusters together provide 40 nodes, 1440 Intel Broadwell cores, 8 NVIDIA Tesla K80 GPUs, 10.24TB of RAM, and 4PB of GPFS data storage with 300TB of SSD flash storage.
The post-processing platform will use IBM's Spectrum Scale (GPFS) as its parallel file system. All CS400 clients run the GPFS client software, while the DDN storage runs embedded GPFS Network Shared Disk (NSD) servers. In addition, each CS400 cluster has a dedicated LNET node that provides access to the Australis global file systems. Each compute, GPU, and service node also runs the Lustre client so that it can access the Australis global file systems.
We believe we are one of only a few HPC organisations utilising both Lustre and GPFS. Our early testing using IOR (an I/O performance benchmark tool) measured 35GB/s across 16 nodes. Our calculations show that there is performance growth headroom within the DDN storage stack; however, the more limiting bottleneck is the choice of FDR in our InfiniBand network (this is consistent with Lustre networking today). To improve storage performance, the Bureau will be moving to InfiniBand EDR in the mid-life upgrade.
Once the Aurora platform is in place, the Bureau will continue to work towards estab-
lishing a content delivery network with data supplied by the post processing platform.
Bibliography
J. S. Vetter. Contemporary high performance computing: an introduction. In Jeffrey S. Vetter, editor,
Contemporary High Performance Computing: From Petascale Toward Exascale, volume 1 of CRC
Computational Science Series, page 730. Taylor and Francis, Boca Raton, 1 edition, 2013.
J. S. Vetter, editor. Contemporary High Performance Computing: From Petascale Toward Exascale,
volume 2 of CRC Computational Science Series. Taylor and Francis, Boca Raton, 1 edition, 2015.
Ilia Bermous, Joerg Henrichs, and Michael Naughton. Application performance improvement by use
of partial nodes to reduce memory contention. CAWCR Research Letters, pages 19–22, 2013.
http://www.cawcr.gov.au/researchletters/CAWCR_Research_Letters_9.pdf#page=19, [accessed
31-August 2017].
Bureau of Meteorology, Queensland Regional Office. Severe Tropical Cyclone Debbie. Press Release,
29 March 2017. http://www.bom.gov.au/announcements/sevwx/qld/qldtc20170325.shtml,
[accessed 31-August-2017].
TJ Dell. A white paper on the benefits of chipkill-correct ECC for PC server main memory.
Technical report, IBM Microelectronics Division, November 1997.
http://www.ece.umd.edu/courses/enee759h.S2003/references/ibm_chipkill.pdf, [accessed 31-
August-2017].
Tom Keenan, Kamal Puri, Tony Hirst, Tim Pugh, Ben Evans, Martin Dix, Andy Pitman, Peter Craig,
Rachel Law, Oscar Alves, Gary Dietachmayer, Peter Steinle, and Helen Cleugh. Next Generation
Australian Community Climate and Earth-System Simulator (NG-ACCESS) - A Roadmap 2014-
2019. The Centre for Australian Weather and Climate Research, June 2014.
http://www.cawcr.gov.au/technical-reports/CTR_075.pdf, [accessed 31-August-2017].
J Kim, WJ Dally, and D Abts. Technology-Driven, Highly-Scalable Dragonfly Topology. In ACM
SIGARCH Computer Architecture News, volume 36, pages 77–88. IEEE Computer Society, 2008.
Anna Leask, Kurt Bayer, and Lynley Bilby. Tropical storm Debbie – a day of destruction, despair and
drama. New Zealand Herald, 7 April 2017.
http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=11833401, [accessed 10-
October-2017].
Carsten Maass. MARS User Documentation. October 2017.
https://software.ecmwf.int/wiki/display/UDOC/MARS+user+documentation, [accessed 10-
October-2017].
Bureau of Meteorology. New supercomputer to supercharge weather warnings and forecasts. Press
Release, July 2015. http://media.bom.gov.au/releases/188/new-supercomputer-to-supercharge-
weather-warnings-and-forecasts/, [accessed 31-August-2017].
UK Met Office. Unified Model Partnership, October 2016.
https://www.metoffice.gov.uk/research/collaboration/um-partnership, [accessed 31-August-2017].
Hilary J Oliver. Cylc (The Cylc Suite Engine), Version 7.5.0. Technical report, NIWA, 2016.
http://cylc.github.io/cylc/html/single/cug-html.html, [accessed 31-August-2017].
World Meteorological Organization. WMO International Codes, December 2012.
http://www.wmo.int/pages/prog/www/WMOCodes.html, [accessed 31-August-2017].
QJ Wang. Seasonal Water Forecasting and Prediction. Technical report, CSIRO, 2013.
http://www.bom.gov.au/water/about/waterResearch/document/wirada/wirada-long-term-
factsheet.pdf, [accessed 10-October-2017].
COBALT: Component-based lightweight toolkit. http://trac.mcs.anl.gov/projects/cobalt.
Parallel filesystem I/O benchmark. https://github.com/LLNL/ior.
Sandia MPI Micro-Benchmark Suite (SMB). http://www.cs.sandia.gov/smb/.
Sustained System Performance (SSP). http://www.nersc.gov/users/computational-systems/cori/nersc-
8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/ssp/.
The Graph 500 – June 2011. http://www.graph500.org.
The Green 500 – June 2010. http://www.top500.org/green500.
The Top 500 – June 2008. http://www.top500.org.
mdtest, 2017. https://github.com/MDTEST-LANL/mdtest.
Bob Alverson, Edwin Froese, Larry Kaplan, and Duncan Roweth. Cray XC Series Network.
http://www.cray.com/sites/default/files/resources/CrayXCNetwork.pdf.
Anna Maria Bailey, Adam Bertsch, Barna Bihari, Brian Carnes, Kimberly Cupps, Erik W. Draeger,
Larry Fried, Mark Gary, James N. Glosli, John C. Gyllenhaal, Steven Langer, Rose McCallen,
Arthur A. Mirin, Fady Najjar, Albert Nichols, Terri Quinn, David Richards, Tome Spelce, Becky
Springmeyer, Fred Streitz, Bronis de Supinski, Pavlos Vranas, Dong Chen, George L.T. Chiu, Paul
W. Coteus, Thomas W. Fox, Thomas Gooding, John A. Gunnels, Ruud A. Haring, Philip
Heidelberger, Todd Inglett, Kyu Hyoun Kim, Amith R. Mamidala, Sam Miller, Mike Nelson,
Martin Ohmacht, Fabrizio Petrini, Kyung Dong Ryu, Andrew A. Schram, Robert Shearer, Robert
E. Walkup, Amy Wang, Robert W. Wisniewski, William E. Allcock, Charles Bacon, Raymond
Bair, Ramesh Balakrishnan, Richard Coffey, Susan Coghlan, Jeff Hammond, Mark Hereld, Kalyan
Kumaran, Paul Messina, Vitali Morozov, Michael E. Papka, Katherine M. Riley, Nichols A.
Romero, and Timothy J. Williams. Blue Gene/Q: Sequoia and Mira. In Jeffrey S. Vetter, editor,
Contemporary High Performance Computing: From Petascale toward Exascale, chapter 10, pages
225–281. Chapman & Hall/CRC, 2013.
Cray. CLE User Application Placement Guide, S-2496-5204 edition.
Cray. Cray C and C++ Reference Manual, (8.5) S-2179 edition.
Cray. Cray Fortran Reference Manual, (8.5) S-3901 edition.
Cray. XC Series GNI and DMAPP API User Guide, (CLE6.0.UP03) S-2446 edition.
Cray. XC Series Programming Environment User Guide, (17.05) S-2529 edition.
Argonne Leadership Computing Facility. Early science program, 2010. http://esp.alcf.anl.gov.
Sunny Gogar. Intel Xeon Phi x200 processor - memory modes and cluster modes: Configuration and
use cases. Intel Software Developer Zone, 2015. http://software.intel.com/en-us/articles/intel-xeon-
phi-x200-processor-memory-modes-and-cluster-modes-configuration-and-use-cases.
Kevin Harms, Ti Leggett, Ben Allen, Susan Coghlan, Mark Fahey, Ed Holohan, Gordon McPheeters,
and Paul Rich. Theta: Rapid installation and acceptance of an XC40 KNL system. In Proceedings
of the 2017 Cray User Group, Redmond, WA, May 2017.
Mark Holland and Garth A. Gibson. Parity declustering for continuous operation in redundant disk
arrays. In Richard L. Wexelblat, editor, Proceedings of the 5th International Conference on
Architectural Support for Programming Languages and Operating Systems, volume 27. ACM, New
York, NY, 1992.
Paul Peltz Jr., Adam DeConinck, and Daryl Grunau. How to automate and not manage under
Rhine/Redwood. In Proceedings of the 2016 Cray User Group, London, UK, May 2016.
Steven Martin, David Rush, and Matthew Kappel. Cray advanced platform monitoring and control
(CAPMC). In Proceedings of the 2015 Cray User Group, Chicago, IL, April 2015.
John D. McCalpin. Memory bandwidth and machine balance in current high performance computers.
IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pages
19–25, December 1995. http://tab.computer.org/tcca/NEWS/DEC95/dec95_mccalpin.ps.
James Milano and Pamela Lembke. IBM System Blue Gene Solution: Blue Gene/Q Hardware
Overview and Installation Planning. Number SG24-7872-01 in An IBM Red-books publication.
May 2013. ibm.com/redbooks.
Department of Energy (DOE) Office of Science. Facilities for the Future of Science: A Twenty-Year
Outlook, 2003. https://science.energy.gov/~/media/bes/pdf/archives/plans/ffs_10nov03.pdf.
Scott Parker, Vitali Morozov, Sudheer Chunduri, Kevin Harms, Chris Knight, and Kalyan Kumaran.
Early evaluation of the Cray XC40 Xeon Phi System Theta at argonne. In Proceedings of the 2017
Cray User Group, Redmond, WA, May 2017.
Avinash Sodani. Knights Landing (KNL): 2nd Generation Intel Xeon Phi. In Hot Chips 27
Symposium (HCS), 2015 IEEE, Cupertino, CA, August 2015.
http://ieeexplore.ieee.org/document/7477467/.
Wolfgang Baumann, Guido Laubender, Matthias Läuter, Alexander Reinefeld, Christian Schimmel,
Thomas Steinke, Christian Tuma, and Stefan Wollny. Contemporary High Performance
Computing: From Petascale toward Exascale, volume 2, chapter HLRN-III at Zuse Institute
Berlin, pages 81–114. Chapman & Hall/CRC, 2015.
Heinecke, A. and Klemm, M. and Bungartz, H. J. From GPGPU to Many-Core: Nvidia Fermi and
Intel Many Integrated Core Architecture. Computing in Science and Engineering, 14:78–83, 2012.
Tom Henderson, John Michalakes, Indraneil Gokhale, and Ashish Jha. Chapter 2 - Numerical Weather
Prediction Optimization. In James Reinders and Jim Jeffers, editors, High Performance Parallelism
Pearls, pages 7 – 23. Morgan Kaufmann, Boston, 2015.
Intel Corporation. Itanium ABI, v1.86.
Jeffers, James and Reinders, James. Intel Xeon Phi Coprocessor High Performance Programming.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1st edition, 2013.
Khronos OpenCL Working Group. The OpenCL Specification, Version 2.2, March 2016.
https://www.khronos.org/registry/cl/specs/opencl-2.2.pdf.
Michael Klemm, Alejandro Duran, Xinmin Tian, Hideki Saito, Diego Caballero, and Xavier Martorell.
Extending OpenMP* with Vector Constructs for Modern Multicore SIMD Architectures. In
Proceedings of the 8th International Conference on OpenMP in a Heterogeneous World,
IWOMP’12, pages 59–72, Berlin, Heidelberg, 2012. Springer-Verlag.
G. Kresse and J. Furthmüller. Efficiency of ab-initio total energy calculations for metals and
semiconductors using a plane-wave basis set. Comput. Mater. Sci., 6(1):15 – 50, 1996.
G. Kresse and J. Hafner. Phys. Rev. B, 47:558, 1993.
G. Kresse and D. Joubert. Phys. Rev., 59:1758, 1999.
G. Kresse, M. Marsman, and J. Furthmüller. VASP the Guide.
http://cms.mpi.univie.ac.at/vasp/vasp/vasp.html, April 2016.
B. Maronga, M. Gryschka, R. Heinze, F. Hoffmann, F. Kanani-Sühring, M. Keck, K. Ketelsen, M. O.
Letzel, M. Sühring, and S. Raasch. The Parallelized Large-Eddy Simulation Model (PALM) version
4.0 for atmospheric and oceanic flows: model formulation, recent developments, and future
perspectives. Geosci. Model Dev., 8:1539–1637, 2015.
Y. Nakamura and H. Stüben. BQCD - Berlin quantum chromodynamics program. In PoS (Lattice
2010), page 40, 2010.
Chris J. Newburn, Rajiv Deodhar, Serguei Dmitriev, Ravi Murty, Ravi Narayanaswamy, John
Wiegert, Francisco Chinchilla, and Russell McGuire. Offload Compiler Runtime for the Intel Xeon
Phi Coprocessor. In Supercomputing, pages 239–254. Springer Berlin Heidelberg, 2013.
John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable Parallel Programming with
CUDA. Queue, 6(2):40–53, March 2008.
Matthias Noack. HAM - Heterogenous Active Messages for Efficient Offloading on the Intel Xeon Phi.
Technical report, ZIB, Takustr.7, 14195 Berlin, 2014.
Matthias Noack, Florian Wende, and Klaus-Dieter Oertel. Chapter 19 - OpenCL: There and Back
Again. In James Reinders and Jim Jeffers, editors, High Performance Parallelism Pearls, pages
355 – 378. Morgan Kaufmann, Boston, 2015.
Matthias Noack, Florian Wende, Thomas Steinke, and Frank Cordes. A Unified Programming Model
for Intra- and Inter-Node Offloading on Xeon Phi Clusters. In International Conference for High
Performance Computing, Networking, Storage and Analysis, SC 2014, New Orleans, LA, USA,
November 16-21, 2014, pages 203–214, 2014.
Matthias Noack, Florian Wende, Georg Zitzlsberger, Michael Klemm, and Thomas Steinke. KART –
A Runtime Compilation Library for Improving HPC Application Performance. In IXPUG
Workshop "Experiences on Intel Knights Landing at the One Year Mark" at ISC High Performance
2017, Frankfurt, Germany, June 2017.
OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 4.0, 2013.
http://www.openmp.org.
OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 4.5, 2015.
http://www.openmp.org.
Scott Pakin, M. Lang, and D.K. Kerbyson. The Reverse-Acceleration Model for Programming
Petascale Hybrid Systems. IBM Journal of Research and Development, 53(5), 2009.
Boris Schling. The Boost C++ Libraries. XML Press, 2011.
Sergi Siso. DL_MESO Code Modernization. Intel Xeon Phi Users Group (IXPUG), March 2016.
IXPUG Workshop, Ostrava.
Avinash Sodani, Roger Gramunt, Jesús Corbal, Ho-Seop Kim, Krishna Vinod, Sundaram
Chinthamani, Steven Hutsell, Rajat Agarwal, and Yen-Chen Liu. Knights Landing: Second-
Generation Intel Xeon Phi Product. IEEE Micro, 36(2):34–46, 2016.
Florian Wende, Martijn Marsman, Zhengji Zhao, and Jeongnim Kim. Porting VASP from MPI to MPI
+ OpenMP [SIMD]. In Proceedings of the 13th International Workshop on OpenMP, Scaling
OpenMP for Exascale Performance and Portability, IWOMP’17, 2017. Accepted for publication.
Florian Wende, Matthias Noack, Thomas Steinke, Michael Klemm, Chris J. Newburn, and Georg
Zitzlsberger. Portable SIMD Performance with OpenMP* 4.X Compiler Directives. In Proceedings
of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833, pages
264–277, New York, NY, USA, 2016. Springer-Verlag New York, Inc.
Rosa M. Badia, Jesus Labarta, Judit Gimenez, and Francesc Escale. DIMEMAS: Predicting MPI
applications behavior in Grid environments. In Workshop on Grid Applications and Programming
Tools (GGF8), volume 86, pages 52–62, 2003.
Barcelona Supercomputing Center. BSC Performance Analysis Tools. https://tools.bsc.es/.
Barcelona Supercomputing Center. MareNostrum III (2013) System Architecture.
https://www.bsc.es/marenostrum/marenostrum/mn3.
Leonardo Bautista-Gomez, Ferad Zyulkyarov, Osman Unsal, and Simon McIntosh-Smith. Unprotected
Computing: A Large-scale Study of DRAM Raw Error Rate on a Supercomputer. In Proceedings of
the International Conference for High Performance Computing, Networking, Storage and Analysis,
SC ’16, pages 55:1–55:11, Piscataway, NJ, USA, 2016. IEEE Press.
Kallia Chronaki, Alejandro Rico, Rosa M Badia, Eduard Ayguadé, Jesús Labarta, and Mateo Valero.
Criticality-aware dynamic task scheduling for heterogeneous architectures. In Proceedings of the
29th ACM on International Conference on Supercomputing, pages 329–338. ACM, 2015.
Lamia Djoudi, Denis Barthou, Patrick Carribault, Christophe Lemuet, Jean-Thomas Acquaviva,
William Jalby, et al. Maqao: Modular assembler quality analyzer and optimizer for Itanium 2. In
The 4th Workshop on EPIC architectures and compiler technology, San Jose, volume 200, 2005.
Alejandro Duran, Eduard Ayguadé, Rosa M Badia, Jesús Labarta, Luis Martinell, Xavier Martorell,
and Judit Planas. OmpSs: a proposal for programming heterogeneous multi-core architectures.
Parallel Processing Letters, 21(02):173–193, 2011.
Markus Geimer, Felix Wolf, Brian JN Wylie, Erika Ábrahám, Daniel Becker, and Bernd Mohr. The
Scalasca performance toolset architecture. Concurrency and Computation: Practice and
Experience, 22(6):702–719, 2010.
Jülich Supercomputing Centre. Jülich Supercomputing Centre – HPC technology. http://www.fz-
juelich.de/ias/jsc/EN/Research/HPCTechnology/PerformanceAnalyse/performanceAnalysis_node.h
tml.
Andreas Knüpfer, Christian Rössel, Dieter an Mey, Scott Biersdorff, Kai Diethelm, Dominic
Eschweiler, Markus Geimer, Michael Gerndt, Daniel Lorenz, Allen Malony, et al. Score-P: A joint
performance measurement run-time infrastructure for Periscope, Scalasca, TAU, and Vampir. In
Tools for High Performance Computing 2011, pages 79–91. Springer, 2012.
Krishna T. Malladi, Benjamin C. Lee, Frank A. Nothaft, Christos Kozyrakis, Karthika Periyathambi,
and Mark Horowitz. Towards Energy-proportional Datacenter Memory with Mobile DRAM. In
Proceedings of the 39th Annual International Symposium on Computer Architecture, ISCA ’12,
pages 37–48, 2012.
Mathias Nachtmann and José Gracia. Enabling model-centric debugging for task-based programming
models–a tasking control interface. In Tools for High Performance Computing 2015, pages 147–
160. Springer, 2016.
Vincent Pillet, Jesús Labarta, Toni Cortes, and Sergi Girona. Paraver: A tool to visualize and analyze
parallel code. In Proceedings of WoTUG-18: transputer and occam developments, volume 44,
pages 17–31. IOS Press, 1995.
Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, and Mateo Valero.
Supercomputing with commodity CPUs: Are mobile SoCs ready for HPC? In Proceedings of the
International Conference on High Performance Computing, Networking, Storage and Analysis, SC
’13, pages 40:1–40:12, New York, NY, USA, 2013. ACM.
Nikola Rajovic, Alejandro Rico, Filippo Mantovani, Daniel Ruiz, Josep Oriol Vilarrubi, Constantino
Gomez, Luna Backes, Diego Nieto, Harald Servat, Xavier Martorell, Jesus Labarta, Eduard
Ayguade, Chris Adeniyi-Jones, Said Derradji, Hervé Gloaguen, Piero Lanucara, Nico Sanna, Jean-
Francois Mehaut, Kevin Pouget, Brice Videau, Eric Boyer, Momme Allalen, Axel Auweter, David
Brayford, Daniele Tafani, Volker Weinberg, Dirk Brömmel, René Halver, Jan H. Meinke, Ramon
Beivide, Mariano Benito, Enrique Vallejo, Mateo Valero, and Alex Ramirez. The Mont-blanc
Prototype: An Alternative Approach for HPC Systems. In Proceedings of the International
Conference for High Performance Computing, Networking, Storage and Analysis, SC ’16, pages
38:1–38:12, Piscataway, NJ, USA, 2016. IEEE Press.
Nikola Rajovic, Alejandro Rico, Nikola Puzovic, Chris Adeniyi-Jones, and Alex Ramirez. Tibidabo1:
Making the case for an ARM-based HPC system. Future Generation Computer Systems, 36:322–
334, July 2014.
Nikola Rajovic, Alejandro Rico, James Vipond, Isaac Gelado, Nikola Puzovic, and Alex Ramirez.
Experiences with Mobile Processors for Energy Efficient HPC. In Proceedings of the Conference
on Design, Automation and Test in Europe, DATE ’13, pages 464–468, San Jose, CA, USA, 2013.
EDA Consortium.
Nikola Rajovic, Lluis Vilanova, Carlos Villavieja, Nikola Puzovic, and Alex Ramirez. The low power
architecture approach towards exascale computing. Journal of Computational Science, 4(6):439–
443, 2013.
Pavel Saviankou, Michael Knobloch, Anke Visser, and Bernd Mohr. Cube v4: From performance
report explorer to performance analysis tool. Procedia Computer Science, 51:1343–1352, 2015.
Brice Videau, Kevin Pouget, Luigi Genovese, Thierry Deutsch, Dimitri Komatitsch, Frédéric Desprez,
and Jean-François Méhaut. BOAST: A metaprogramming framework to produce portable and
efficient computing kernels for hpc applications. The International Journal of High Performance
Computing Applications, page 1094342017718068, 2017.
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S.
Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. Tensorflow: Large-scale machine
learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
Amazon Web Services. Elastic Compute Cloud (EC2). https://aws.amazon.com/ec2/, 2017. [Online;
accessed 28-July-2017].
Ansible HQ. Ansible. https://www.ansible.com, 2017. [Online; accessed 28-July-2017].
Arutyun I. Avetisyan, Roy Campbell, Indranil Gupta, Michael T. Heath, Steven Y. Ko, Gregory R.
Ganger, Michael A. Kozuch, David O’Hallaron, Marcel Kunze, Thomas T. Kwan, et al. Open
Cirrus: A Global Cloud Computing Testbed. Computer, 43(4):35–43, 2010.
Ilia Baldine, Yufeng Xin, Anirban Mandal, Paul Ruth, Chris Heerman, and Jeff Chase. ExoGENI: A
Multi-Domain Infrastructure-as-a-Service Testbed. Testbeds and Research Infrastructure.
Development of Networks and Communities, pages 97–113, 2012.
Daniel Balouek, Alexandra Carpen-Amarie, Ghislain Charrier, Frédéric Desprez, Emmanuel Jeannot,
Emmanuel Jeanvoine, Adrien Lébre, David Margery, Nicolas Niclausse, Lucas Nussbaum, Olivier
Richard, Christian Pérez, Flavien Quesnel, Cyril Rohr, and Luc Sarzyniec. Adding Virtualization
Capabilities to the Grid’5000 Testbed. In Ivan I. Ivanov, Marten van Sinderen, Frank Leymann,
and Tony Shan, editors, Cloud Computing and Services Science, volume 367 of Communications in
Computer and Information Science, pages 3–20. Springer International Publishing, 2013.
Mark Berman, Jeffrey S. Chase, Lawrence Landweber, Akihiro Nakao, Max Ott, Dipankar
Raychaudhuri, Robert Ricci, and Ivan Seskar. GENI: A federated testbed for innovative network
experiments. Computer Networks, 61:5–23, 2014. Special issue on Future Internet Testbeds – Part
I.
Blazar contributors. Welcome to Blazar! — Blazar. http://blazar.readthedocs.io/en/latest/, 2017.
[Online; accessed 28-July-2017].
A. Boles and P. Rad. Voice biometrics: Deep learning-based voiceprint authentication system. In 2017
12th System of Systems Engineering Conference (SoSE), pages 1–6, June 2017.
Bolze, Raphaël and Cappello, Franck and Caron, Eddy and Dayde, Michel and Desprez, Frédéric and
Jeannot, Emmanuel and Jégou, Yvon and Lanteri, Stephane and Leduc, Julien and Melab,
Nouredine and Mornet, Guillaume and Namyst, Raymond and Primet, Pascale and Quétier,
Benjamin and Richard, Olivier and El-Ghazali, Talbi and Touche, Iréa. Grid’5000: A Large Scale
And Highly Reconfigurable Experimental Grid Testbed. International Journal of High
Performance Computing Applications, 20(4):481–494, 2006.
T. Bray. The JavaScript Object Notation (JSON) Data Interchange Format. RFC 7159, RFC Editor,
March 2014.
Tomasz Buchert, Cristian Ruiz, Lucas Nussbaum, and Olivier Richard. A survey of general-purpose
experiment management tools for distributed systems. Future Generation Computer Systems, 45:1–
12, 2015.
Ceilometer contributors. Welcome to the Ceilometer developer documentation! — ceilometer
documentation. https://docs.openstack.org/developer/ceilometer/, 2017. [Online; accessed 28-July-
2017].
Chef contributors. About Ohai — Chef Docs. https://docs.chef.io/ohai.html, 2017. [Online; accessed
28-July-2017].
Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and
Mic Bowman. PlanetLab: An Overlay Testbed for Broad-Coverage Services. ACM SIGCOMM
Computer Communication Review, 33(3):3–12, 2003.
Cinder contributors. Attach a Single Volume to Multiple Hosts — Cinder Specs.
https://specs.openstack.org/openstack/cinder-specs/specs/kilo/multi-attach-volume.html, 2015.
[Online; accessed 28-July-2017].
Cinder contributors. Add Volume Connection Information for Ironic Nodes — Ironic Specs.
https://specs.openstack.org/openstack/ironic-specs/specs/approved/volume-connection-
information.html, 2016. [Online; accessed 28-July-2017].
Marcos Dias de Assuncão, Laurent Lefèvre, and Francois Rossigneux. On the impact of advance
reservations for energy-aware provisioning of bare-metal cloud resources. In 2016 12th
International Conference on Network and Service Management (CNSM), pages 238–242, Oct
2016.
Ewa Deelman, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Sonal Patil, Mei-Hui Su,
Karan Vahi, and Miron Livny. Pegasus: Mapping Scientific Workflows onto the Grid. In Grid
Computing, pages 131–140. Springer, 2004.
Diskimage-builder contributors. Diskimage-builder Documentation.
https://docs.openstack.org/developer/diskimage-builder/, 2017. [Online; accessed 28-July-2017].
D. Duplyakin and R. Ricci. Introducing configuration management capabilities into CloudLab
experiments. In 2016 IEEE Conference on Computer Communications Workshops (INFOCOM
WKSHPS), pages 39–44, April 2016.
EM Fajardo, JM Dost, B Holzman, T Tannenbaum, J Letts, A Tiradani, B Bockelman, J Frey, and D
Mason. How much higher can HTCondor fly? In J. Phys. Conf. Ser., volume 664. Fermi National
Accelerator Laboratory (FNAL), Batavia, IL (United States), 2015.
Geoffrey C. Fox, Gregor von Laszewski, Javier Diaz, Kate Keahey, José Fortes, Renato Figueiredo,
Shava Smallen, Warren Smith, and Andrew Grimshaw. FutureGrid: A Reconfigurable Testbed for
Cloud, HPC, and Grid Computing. In Contemporary High Performance Computing: From
Petascale toward Exascale, Chapman & Hall/CRC Computational Science, pages 603–636.
Chapman & Hall/CRC, April 2013.
Garth Gibson, Gary Grider, Andree Jacobson, and Wyatt Lloyd. PRObE: A Thousand-Node
Experimental Cluster for Computer Systems Research. USENIX; login, 38(3), 2013.
Glance contributors. Welcome to Glance’s documentation! — glance documentation.
https://docs.openstack.org/developer/glance/, 2017. [Online; accessed 28-July-2017].
Gnocchi contributors. Gnocchi – Metric as a Service. http://gnocchi.xyz, 2017. [Online; accessed 28-
July-2017].
Brice Goglin. Exposing the Locality of Heterogeneous Memory Architectures to HPC Applications. In
Proceedings of the Second International Symposium on Memory Systems, MEMSYS ’16, pages
30–39. ACM, 2016.
Google Compute Platform. Compute Engine - IaaS. https://cloud.google.com/compute/, 2017. [Online;
accessed 28-July-2017].
Heat contributors. Welcome to the Heat documentation! — heat documentation.
https://docs.openstack.org/developer/heat/, 2017. [Online; accessed 28-July-2017].
Internet2. Advanced Layer 2 Service. https://www.internet2.edu/products-services/advanced-
networking/layer-2-services/, 2017. [Online; accessed 28-July-2017].
Ironic contributors. Ironic Release Notes: Newton Series (6.0.0 - 6.2.x).
https://docs.openstack.org/releasenotes/ironic/newton.html, 2016. [Online; accessed 28-July-2017].
Ironic contributors. Configuring Web or Serial Console — ironic documentation.
https://docs.openstack.org/developer/ironic/deploy/console.html, 2017. [Online; accessed 28-July-
2017].
Ironic contributors. OpenStack Docs: Multi-tenancy in the Bare Metal service.
https://docs.openstack.org/ironic/latest/admin/multitenancy.html, 2017. [Online; accessed 28-July-
2017].
Ironic contributors. Physical Network Awareness — Ironic Specs.
https://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/physical-network-
awareness.html, 2017. [Online; accessed 28-July-2017].
Ironic contributors. Welcome to Ironic’s developer documentation! — ironic documentation.
https://docs.openstack.org/developer/ironic/, 2017. [Online; accessed 28-July-2017].
K. Keahey and T. Freeman. Contextualization: Providing One-Click Virtual Clusters. In 2008 IEEE
Fourth International Conference on eScience, pages 301–308, Dec 2008.
Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The
handbook of brain theory and neural networks, 3361(10):1995, 1995.
Xiaoyi Lu, Md. Wasi-ur Rahman, Nusrat Islam, Dipti Shankar, and Dhabaleswar K. (DK) Panda.
Accelerating Big Data Processing on Modern HPC Clusters, pages 81–107. Springer International
Publishing, 2016.
Xiaoyi Lu, Jie Zhang, and Dhabaleswar K. (DK) Panda. Building Efficient HPC Cloud with SR-IOV
Enabled InfiniBand: The MVAPICH2 Approach. Springer International Publishing, 2017.
J. Lwowski, P. Kolar, P. Benavidez, P. Rad, J. J. Prevost, and M. Jamshidi. Pedestrian detection
system for smart communities using deep convolutional neural networks. In 2017 12th System of
Systems Engineering Conference (SoSE), pages 1–6, June 2017.
Lyonel Vincent. Hardware Lister (lshw). http://www.ezix.org/project/wiki/HardwareLiSter, 2017.
[Online; accessed 28-July-2017].
David Margery, Emile Morel, Lucas Nussbaum, Olivier Richard, and Cyril Rohr. Resources
Description, Selection, Reservation and Verification on a Large-Scale Testbed. In Victor C.M.
Leung, Min Chen, Jiafu Wan, and Yin Zhang, editors, Testbeds and Research Infrastructure:
Development of Networks and Communities: 9th International ICST Conference, TridentCom
2014, Guangzhou, China, May 5-7, 2014, Revised Selected Papers, pages 239–247. Springer
International Publishing, 2014. DOI: 10.1007/978-3-319-13326-3_23.
Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford,
Scott Shenker, and Jonathan Turner. OpenFlow: Enabling Innovation in Campus Networks. ACM
SIGCOMM Computer Communication Review, 38(2):69–74, 2008.
Dirk Merkel. Docker: Lightweight Linux Containers for Consistent Development and Deployment.
Linux Journal, 2014(239):2, 2014.
Microsoft Azure. Virtual machines – Linux and Azure virtual machines.
https://azure.microsoft.com/services/virtual-machines/, 2017. [Online; accessed 28-July-2017].
National Science Foundation. CISE Research Infrastructure: Mid-Scale Infrastructure - NSFCloud
(CRI: NSFCloud). https://www.nsf.gov/pubs/2013/nsf13602/nsf13602.htm, 2013. [Online;
accessed 28-July-2017].
Neutron contributors. Welcome to Neutron’s developer documentation! — neutron documentation.
https://docs.openstack.org/developer/neutron/, 2017. [Online; accessed 28-July-2017].
Nova contributors. Vendordata — nova documentation.
https://docs.openstack.org/developer/nova/vendordata.html, 2017. [Online; accessed 28-July-2017].
Nova contributors. Welcome to Nova’s developer documentation! — nova documentation.
https://docs.openstack.org/developer/nova/, 2017. [Online; accessed 28-July-2017].
OpenStack contributors. Openstack Juno — OpenStack Open Source Cloud Computing Software.
https://www.openstack.org/software/juno/, 2014. [Online; accessed 28-July-2017].
OpenStack contributors. OpenStack Open Source Cloud Computing Software.
https://www.openstack.org, 2017. [Online; accessed 28-July-2017].
S. Panwar, A. Das, M. Roopaei, and P. Rad. A deep learning approach for mapping music genres. In
2017 12th System of Systems Engineering Conference (SoSE), pages 1–5, June 2017.
Pittsburgh Supercomputing Center. Bridges. https://www.psc.edu/bridges, 2017. [Online; accessed 28-
July-2017].
Ruth Pordes, Don Petravick, Bill Kramer, Doug Olson, Miron Livny, Alain Roy, Paul Avery, Kent
Blackburn, Torre Wenaus, Frank Würthwein, et al. The Open Science Grid. In Journal of Physics:
Conference Series, volume 78. IOP Publishing, 2007.
Puppet. Puppet. https://puppet.com, 2017. [Online; accessed 28-July-2017].
QEMU contributors. QCOW2. http://bit.ly/qcow2, 2017. [Online; accessed 28-July-2017].
Renaissance Computing Institute. Scientific Data Analysis at Scale (SciDAS).
http://renci.org/research/scientific-data-analysis-at-scale-scidas/, 2017. [Online; accessed 28-July-
2017].
Robert Ricci, Eric Eide, and The CloudLab Team. Introducing CloudLab: Scientific Infrastructure for
Advancing Cloud Architectures and Applications. USENIX ;login:, 39(6), December 2014.
Constantine Sapuntzakis, David Brumley, Ramesh Chandra, Nickolai Zeldovich, Jim Chow, Monica
S. Lam, and Mendel Rosenblum. Virtual Appliances for Deploying and Maintaining Software. In
Proceedings of the 17th USENIX Conference on System Administration, LISA ’03, pages 181–
194, 2003.
Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural networks, 61:85–117,
2015.
Craig A. Stewart, Timothy M. Cockerill, Ian Foster, David Hancock, Nirav Merchant, Edwin
Skidmore, Daniel Stanzione, James Taylor, Steven Tuecke, George Turner, et al. Jetstream: A self-
provisioned, scalable science and engineering cloud environment. In Proceedings of the 2015
XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, page 29.
ACM, 2015.
Shawn M. Strande, Haisong Cai, Trevor Cooper, Karen Flammer, Christopher Irving, Gregor von
Laszewski, Amit Majumdar, Dmistry Mishin, Philip Papadopoulos, Wayne Pfeiffer, Robert S.
Sinkovits, Mahidhar Tatineni, Rick Wagner, Fugang Wang, Nancy Wilkins-Diehr, Nicole Wolter,
and Michael L. Norman. Comet: Tales from the long tail: Two years in and 10,000 users later. In
Proceedings of the Practice and Experience in Advanced Research Computing 2017 on
Sustainability, Success and Impact, PEARC17, pages 38:1–38:7. ACM, 2017.
The Chameleon project. Chameleon Cloud Homepage. https://www.chameleoncloud.org, 2017.
[Online; accessed 28-July-2017].
The FutureGrid project. FutureGrid. http://www.futuregrid.org, 2015. [Online; accessed 28-July-
2017].
Leendert van Doorn. Hardware Virtualization Trends. In Proceedings of the 2nd International
Conference on Virtual Execution Environments, pages 45–45, 2006.
Brian White, Jay Lepreau, Leigh Stoller, Robert Ricci, Shashi Guruprasad, Mac Newbold, Mike
Hibler, Chad Barb, and Abhijeet Joglekar. An integrated experimental environment for distributed
systems and networks. In Proceedings of the Fifth Symposium on Operating Systems Design and
Implementation, pages 255–270. USENIX Association, December 2002.
S. Alam, J. Poznanovic, U. Varetto, N Bianchi, A. Pena, and N. Suvanphim. Early experiences with
the Cray XK6 hybrid CPU and GPU MPP platform. In Proceedings of the Cray User Group
Conference, 2012.
N. Chaimov, A. Malony, C. Iancu, and K. Ibrahim. Scaling Spark on Lustre. ISC High Performance
2016. Lecture Notes in Computer Science, 9945, 2016.
G. Chrysos and S. P. Engineer. Intel xeon phi coprocessor (codename knights corner). In Proceedings
of the 24th Hot Chips Symposium, 2012.
G. Doms and U Schattler. The non-hydrostatic limited-area model LM (lokalmodell) of DWD Part I:
scientific documentation. German Weather Service, Offenbach/M., 1999.
G. Faanes, A. Bataineh, D. Roweth, T. Court, E. Froese, B. Alverson, T. Johnson, J. Kopnick, M.
Higgins, and J. Reinhard. Cray cascade: a scalable hpc system based on a dragonfly network.
Proceedings of the International Conference on High Performance Computing, Networking,
Storage and Analysis (SC 12), 2012.
O. Fuhrer, T. Chadha, T. Hoefler, G. Kwasniewski, X. Lapillonne, D. Leutwyler, D. Lüthi, C. Osuna,
C. Schär, T. C. Schulthess, and H. Vogt. Near-global climate simulation at 1 km resolution:
establishing a performance baseline on 4888 gpus with cosmo 5.0. Geoscientific Model
Development Discussions (under review), 2017.
O. Fuhrer, C. Osuna, X. Lapillonne, T. Gysi, B. Cumming, M. Biaco, A. Arteaga, and T. C.
Schulthess. Towards a performance portable, architecture agnostic implementation strategy for
weather and climate models. Supercomputing frontiers and innovations, 2014.
G. Johansen. Configuring and customizing the cray programming environment on cle 6.0 systems. In
Proceedings of the Cray User Group Conference, 2016.
D. J. Kerbyson, K. J. Barker, A. Vishnu, and A. Hoisie. A performance comparison of current HPC
systems: Blue Gene/Q, Cray XE6 and InfiniBand systems. Future Gener. Comput. Syst., 30:291–
304, January 2014.
S. Matsuoka. Power and energy aware computing with tsubame 2.0 and beyond. In Proceedings of the
2011 Workshop on Energy Efficiency: HPC System and Datacenters, EE-HPC-WG ’11, pages 1–
76, New York, NY, USA, 2011. ACM.
Y. Oyanagi. Lessons learned from the K computer project - from the K computer to Exascale. Journal
of Physics: Conference Series, 523:012001, 06 2014.
S. Ramos and T. Hoefler. Modeling communication in cache-coherent SMP systems: A case-study
with Xeon Phi. In Proceedings of the 22nd International Symposium on High-performance Parallel
and Distributed Computing, HPDC ’13, pages 97–108, New York, NY, USA, 2013. ACM.
O. Schütt, P. Messmer, J. Hutter, and J. VandeVondele. Gpu-accelerated sparse matrix-matrix
multiplication for linear scaling density functional theory. Electronic Structure Calculations on
Graphics Processing Units: From Quantum Chemistry to Condensed Matter Physics, 2016.
G. Sciacca. ATLAS and LHC computing on Cray. 22nd International Conference on Computing in
High Energy and Nuclear Physics, 2016.
A. Sodani, R. Gramunt, J. Corbal, H. Kim, K. Vinod, S. Chinthamani, S. Hutsell, R. Agarwal, and Y
Liu. Knights landing: Second-generation intel xeon phi product. IEEE Micro, 36(2):34–46, 2016.
M. Staveley. Adapting Microsoft’s CNTK and ResNet-18 to Enable Strong-Scaling on Cray Systems.
In Neural Information Processing Systems (NIPS), 2016.
Chao Yang, Wei Xue, Haohuan Fu, Hongtao You, Xinliang Wang, Yulong Ao, Fangfang Liu, Lin
Gan, Ping Xu, Lanning Wang, Guangwen Yang, and Weimin Zheng. 10m-core scalable fully-
implicit solver for nonhydrostatic atmospheric dynamics. In Proceedings of the International
Conference for High Performance Computing, Networking, Storage and Analysis, SC ’16, pages
6:1–6:12, Piscataway, NJ, USA, 2016. IEEE Press.
British Standards Institution (2012). EN 50600-1 Information Technology - Data centre facilities and
infrastructures - Part 1: General concepts. London, 2012.
Telecommunications Infrastructure Association (2014). Telecommunications Infrastructure Standard
for Data Centers. 2014. Available from: http://www.tia.org.
British Standards Institution (2014a). Information Technology. Data centre facilities and
infrastructures. Building construction. London, 2014.
British Standards Institution (2014b). Information Technology. Data centre facilities and
infrastructures. Power distribution. London, 2014.
British Standards Institution (2014c). Information Technology. Data centre facilities and
infrastructures. Environmental control. London, 2014.
British Standards Institution (2014d). Information Technology. Data centre facilities and
infrastructures. Management and operational information. London, 2014.
British Standards Institution (2015). Information Technology. Data centre facilities and
infrastructures. Telecommunications cabling infrastructure. London, 2015.
British Standards Institution (2016). Information Technology. Data centre facilities and
infrastructures. Security Systems. London, 2016.
BICSI. ANSI/BICSI 002-2014, Data Center Design and Implementation Best Practices. Tampa, FL, 3rd
edition, 2014.
Ladina Gilly. Data centre design standards and best practices for public research high performance
computing centres. Master’s thesis, CUREM - Center for Urban & Real Estate Management,
University of Zurich, CH - 8002 Zurich, 8 2016. Available at:
http://www.cscs.ch/fileadmin/publications/Tech_Reports/Data_centre_design_Thesis_e.pdf.
The Uptime Institute. Data Centre Site Infrastructure Tier Standard: Topology. 2012. Available from:
http://www.uptimeinstitute.com.
The Uptime Institute. Data Center Site Infrastructure Tier Standard: Operational Sustainability. 2013.
Available from: http://www.uptimeinstitute.com.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2007). Structural and
vibration guidelines for datacom equipment centers. Atlanta, GA, 2007.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2008a). Best practices
for datacom facility energy efficiency. Atlanta, GA, 2008.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2008b). TC 9.9 Mission
Critical Facilities Technology Spaces and Electronic Equipment: High Density Data Centers.
Atlanta, GA, 2008.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2009a). Design
considerations for datacom equipment centers. Atlanta, GA, 2009.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2009b). Real-time energy
consumption measurements in data centers. Atlanta, GA, 2009.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2011). Green tips for
data centers. Atlanta, GA, 2011.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2012). Datacom
equipment power trends and cooling applications. Atlanta, GA, 2012.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2013). PUE: a
comprehensive examination of the metric. Atlanta, GA, 2013.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2014a). Liquid cooling
guidelines for datacom equipment centers. Atlanta, GA, 2014.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2014b). Particulate and
gaseous contamination in datacom environments. Atlanta, GA, 2014.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2015a). Thermal
guidelines for data processing environments. Atlanta, GA, 2015.
American Society of Heating Refrigerating and Air-Conditioning Engineers (2015b). Server Efficiency
- Metrics for Computer Servers and Storage. Atlanta, GA, 2015.
W. P. Turner, J. H. Seader, V. Renaud, and K. G. Brill. Tier Classifications Define Site Infrastructure
Performance. 2008. White-paper available from: http://www.uptimeinstitute.org.
D. Hardt, editor. The OAuth 2.0 Authorization Framework. RFC 6749, RFC Editor, October 2012.
JupyterHub, 2017.
Enis Afgan, Dannon Baker, Marius van den Beek, Daniel Blankenberg, Dave Bouvier, Martin Cech,
John Chilton, Dave Clements, Nate Coraor, Carl Eberhard, Björn Grüning, Aysam Guerler, Jennifer
Hillman-Jackson, Greg VonKuster, Eric Rasche, Nicola Soranzo, Nitesh Turaga, James Taylor,
Anton Nekrutenko, and Jeremy Goecks. The Galaxy platform for accessible, reproducible and
collaborative biomedical analyses: 2016 update. Nucleic Acids Research, 44(W1):W3–W10, 7
2016.
Jim Basney, Terry Fleury, and Jeff Gaynor. CILogon: A federated X.509 certification authority for
cyberinfrastructure logon. Concurrency and Computation: Practice and Experience, 26(13):2225–
2239, 9 2014.
Volker Brendel. Brendel Group Handbook, 2015.
Volker P Brendel. BWASP, 2017.
C. Titus Brown. Next-Gen Sequence Analysis Workshop (2017) angus 6.0 documentation, 2017.
M. S. Campbell, M. Law, C. Holt, J. C. Stein, G. D. Moghe, D. E. Hufnagel, J. Lei, R.
Achawanantakun, D. Jiao, C. J. Lawrence, D. Ware, S.-H. Shiu, K. L. Childs, Y. Sun, N. Jiang, and
M. Yandell. MAKER-P: A Tool Kit for the Rapid Creation, Management, and Quality Control of
Plant Genome Annotations. PLANT PHYSIOLOGY, 164(2):513–524, 2 2014.
Kyle Chard, Ian Foster, and Steven Tuecke. Globus: Research Data Management as Service and
Platform, 2017.
CyVerse. Django.
CyVerse. Atmosphere, 2017.
CyVerse. Atmosphere-Ansible, 2017.
CyVerse. Troposphere, 2017.
Jack Dongarra and Piotr Luszczek. HPC Challenge: Design, History, and Implementation Highlights.
In Jeffrey Vetter, editor, Contemporary High Performance Computing: From Petascale toward
Exascale, chapter 2, pages 13–30. Taylor and Francis, CRC Computational Science Series, Boca
Raton, FL, 2013.
Jeremy Fischer, Enis Afgan, Thomas Doak, Carrie Ganote, David Y. Hancock, and Matthew Vaughn.
Using Galaxy with Jetstream. In Galaxy Community Conference, Bloomington, IN, 2016.
Jeremy Fischer, David Y Hancock, John Michael Lowe, George Turner, Winona Snapp-Childs, and
Craig A Stewart. Jetstream: A Cloud System Enabling Learning in Higher Education Communities.
In Proceedings of the 2017 ACM Annual Conference on SIGUCCS, SIGUCCS ’17, pages 67–72,
New York, NY, USA, 2017. ACM.
Ian Foster and Dennis B. Gannon. Cloud computing for science and engineering. Massachusetts
Institute of Technology Press, 2017.
National Science Foundation. High Performance Computing System Acquisition: Continuing the
Building of a More Inclusive Computing Environment for Science and Engineering, 2014.
Genomics and Bioinformatics Service at Texas A&M. PoreCamp USA.
Globus. Globus.
Stephen A. Goff, Matthew Vaughn, Sheldon McKay, Eric Lyons, Ann E. Stapleton, Damian Gessler,
Naim Matasci, Liya Wang, Matthew Hanlon, Andrew Lenards, Andy Muir, Nirav Merchant, Sonya
Lowry, Stephen Mock, Matthew Helmke, Adam Kubach, Martha Narro, Nicole Hopkins, David
Micklos, Uwe Hilgert, Michael Gonzales, Chris Jordan, Edwin Skidmore, Rion Dooley, John
Cazes, Robert McLay, Zhenyuan Lu, Shiran Pasternak, Lars Koesterke, William H. Piel, Ruth
Grene, Christos Noutsos, Karla Gendler, Xin Feng, Chunlao Tang, Monica Lent, Seung-Jin Kim,
Kristian Kvilekval, B. S. Manjunath, Val Tannen, Alexandros Stamatakis, Michael Sanderson,
Stephen M. Welch, Karen A. Cranston, Pamela Soltis, Doug Soltis, Brian O’Meara, Cecile Ane,
Tom Brutnell, Daniel J. Kleibenstein, Jeffery W. White, James Leebens-Mack, Michael J.
Donoghue, Edgar P. Spalding, Todd J. Vision, Christopher R. Myers, David Lowenthal, Brian J.
Enquist, Brad Boyle, Ali Akoglu, Greg Andrews, Sudha Ram, Doreen Ware, Lincoln Stein, and
Dan Stanzione. The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Frontiers in Plant
Science, 2:34, 7 2011.
Chris Holdgraf, Aaron Culich, Ariel Rokem, Fatma Deniz, Maryana Alegro, and Dani Ushizima.
Portable Learning Environments for Hands-On Computational Instruction. Proceedings of the
Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and
Impact – PEARC17, pages 1–9, 2017.
Jetstream. Trial Access Allocation, 2017.
John Michael Lowe, Michael Packard, and C. Bret Hammond. Jetstream Salt States.
Ruth Malan and Dana Bredemeyer. Functional Requirements and Use Cases. Technical report, 2001.
Joe Mambretti, Jim Chen, and Fei Yeh. Next Generation Clouds, the Chameleon Cloud Testbed, and
Software Defined Networking (SDN). In Proceedings of the 2015 International Conference on
Cloud Computing Research and Innovation (ICCCRI), ICCCRI ’15, pages 73–79, Washington,
DC, USA, 2015. IEEE Computer Society.
Nirav Merchant, Eric Lyons, Stephen Goff, Matthew Vaughn, Doreen Ware, David Micklos, and
Parker Antin. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the
Life Sciences. PLOS Biology, 14(1):e1002342, 1 2016.
National Science Foundation. Cyberinfrastructure: From Supercomputing to the TeraGrid, 2006.
National Science Foundation. CISE Research Infrastructure: Mid-Scale Infrastructure – NSFCloud
(CRI: NSFCloud), 2013.
National Science Foundation. TeraGrid Phase III: eXtreme Digital Resources for Science and
Engineering (XD), 2008.
Jp Navarro, Craig A Stewart, Richard Knepper, Lee Liming, David Lifka, and Maytal Dahan. The
Community Software Repository from XSEDE: A Resource for the National Research Community.
OpenStack Foundation. Getting started with the OpenStack SDK, 2017.
OpenStack Foundation. Heat, 2017.
OpenStack Foundation. Horizon Dashboard, 2017.
OpenStack Foundation. OpenStack Clients, 2017.
OpenStack Foundation. OpenStack Roadmap, 2017.
ORCID Inc. ORCID — Connecting Research and Researchers.
Yuvi Panda and Andrea Zonca. kubeadm-bootstrap, 2017.
Project Jupyter team. Zero to JupyterHub with Kubernetes, 2017.
Paul Rad, Mehdi Roopaei, Nicole Beebe, Mehdi Shadaram, and Yoris A. Au. AI Thinking for Cloud
Education Platform with Personalized Learning. In 51st Hawaii International Conference on
System Sciences, Waikoloa Village, HI, 2018.
Red Hat, Inc. Ceph Homepage – Ceph, 2017.
N Sakimura, J Bradley, M Jones, B de Medeiros, and C Mortimore. OpenID Connect Core 1.0
incorporating errata set 1, 2014.
C.A. Stewart. Preserving Scientific Software … in a Usable Form? EDUCAUSE Review, 2016.
C.A. Stewart, V. Welch, B. Plale, G. Fox, M. Pierce, and T. Sterling. Indiana University Pervasive
Technology Institute, 2017.
Craig A Stewart, David Y. Hancock, Matthew Vaughn, Jeremy Fischer, Lee Liming, Nirav Merchant,
Therese Miller, John Michael Lowe, Daniel Stanzione, Jaymes Taylor, and Edwin Skidmore.
Jetstream – Performance, Early Experiences, and Early Results. In Proceedings of the XSEDE16
Conference, St. Louis, MO, 2016.
Craig A. Stewart, David Y. Hancock, Matthew Vaughn, Nirav C. Merchant, John Michael Lowe,
Jeremy Fischer, Lee Liming, James Taylor, Enis Afgan, George Turner, C. Bret Hammond, Edwin
Skidmore, Michael Packard, and Ian Foster. System Acceptance Report for NSF award 1445604
High Performance Computing System Acquisition: Jetstream – A Self-Provisioned, Scalable
Science and Engineering Cloud Environment. Technical report, Indiana University, Bloomington,
IN, 2016.
Craig A Stewart, R Knepper, Andrew Grimshaw, Ian Foster, Felix Bachmann, D Lifka, Morris Riedel,
and Steven Tuecke. Campus Bridging Use Case Quality Attribute Scenarios. Technical report,
2012.
Craig A. Stewart, Richard Knepper, Andrew Grimshaw, Ian Foster, Felix Bachmann, David Lifka,
Morris Riedel, and Steven Tuecke. XSEDE Campus Bridging Use Cases. Technical report, 2012.
Craig A. Stewart, Richard Knepper, Matthew R Link, Marlon Pierce, Eric Wernert, and Nancy
Wilkins-Diehr. Cyberinfrastructure, Cloud Computing, Science Gateways, Visualization, and
Cyberinfrastructure Ease of Use. In Mehdi Khosrow-Pour, editor, Encyclopedia of Information
Science and Technology. IGI Global, Hershey, PA, fourth edition, 2018.
University of Texas at Austin, Texas Advanced Computing Center, 2017.
The OpenStack Foundation. OpenStack, 2017.
John Towns, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew Grimshaw, Victor
Hazlewood, Scott Lathrop, Dave Lifka, Gregory D. Peterson, Ralph Roskies, J. Ray Scott, and
Nancy Wilkins-Diehr. XSEDE: Accelerating Scientific Discovery. Computing in Science &
Engineering, 16(5):62–74, 9 2014.
Steven Tuecke, Rachana Ananthakrishnan, Kyle Chard, Mattias Lidman, Brendan McCollam, Stephen
Rosen, and Ian Foster. Globus Auth: A Research Identity and Access Management Platform. In
IEEE 12th International Conference on eScience, Baltimore, Maryland, 2016.
Ubuntu. Cloud-Init, 2017.
Viswanath Venkatesh, Michael G Morris, Gordon B Davis, and Fred D Davis. User Acceptance of
Information Technology: Toward a Unified View. MIS Quarterly, 27(3):425–478, 2003.
Gregor von Laszewski, Geoffrey C. Fox, Fugang Wang, Andrew J. Younge, Archit Kulshrestha,
Gregory G. Pike, Warren Smith, Jens Vöckler, Renato J. Figueiredo, Jose Fortes, and Kate
Keahey. Design of the FutureGrid experiment management framework. In 2010 Gateway
Computing Environments Workshop, GCE 2010, 2010.
XSEDE. XSEDE Education Allocations, 2017.
XSEDE. XSEDE Research Allocations, 2017.
XSEDE. XSEDE Startup Allocations, 2017.
AVBP website at CERFACS. http://www.cerfacs.fr/avbp7x/. Accessed: 2017-07-28.
Chroma github repository. https://jeffersonlab.github.io/chroma/. Accessed: 2017-07-28.
CMS software github repository. https://github.com/cms-sw/cmssw. Accessed: 2017-07-28.
DEEP-ER project website. http://www.deep-er.eu. Accessed: 2017-07-16.
DEEP project website. http://www.deep-project.eu. Accessed: 2017-07-16.
DEEP prototype website. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/DEEP/DEEP_node.html. Accessed: 2017-07-16.
EXTOLL GmbH website. http://www.extoll.de. Accessed: 2017-07-23.
EXTOLL Tourmalet. http://www.extoll.de/products/tourmalet. Accessed: 2017-07-23.
Forschungszentrum Jülich website. http://www.fz-juelich.de/en. Accessed: 2017-07-09.
Full Wave Inversion (FWI) code in DEEP-ER. http://www.deep-projects.eu/applications/project-applications/enhancing-oil-exploration.html. Accessed: 2017-07-28.
GROMACS application website. http://www.gromacs.org/. Accessed: 2017-07-28.
Helmholtz Association website. https://www.helmholtz.de/en/. Accessed: 2017-07-09.
High-Q club website. http://www.fz-juelich.de/ias/jsc/EN/Expertise/High-Q-Club/_node.html. Accessed: 2017-08-10.
Human Brain Project pilot systems website. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/HBPPilots/_node.html. Accessed: 2017-07-16.
JSC Simlabs website. http://www.fz-juelich.de/ias/jsc/EN/Expertise/SimLab/simlab_node.html. Accessed: 2017-08-07.
Jülich Supercomputing Centre website. http://www.fz-juelich.de/ias/jsc/EN. Accessed: 2017-07-09.
JUST: Jülich Storage Cluster website. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Datamanagement/OnlineStorage/JUST/JUST_node.html. Accessed: 2017-07-16.
NEST code website. www.nest-simulator.org. Accessed: 2017-07-28.
OpenHMC. http://www.uni-heidelberg.de/openhmc. Accessed: 2017-07-26.
ParaStation V5 website. http://www.par-tec.com/products/parastationv5.html. Accessed: 2017-07-26.
PRACE website. http://www.prace-ri.eu. Accessed: 2017-08-07.
QPACE3 website. http://www.fz-juelich.de/ias/jsc/EN/Expertise/Supercomputers/QPACE3/_node.html. Accessed: 2017-07-16.
Reverse time migration (RTM) code website. http://www.cgg.com/en/What-We-Do/Subsurface-Imaging/Migration/Reverse-Time-Migration. Accessed: 2017-07-28.
SeisSol application website. http://www.seissol.org/. Accessed: 2017-07-28.
SKA data analysis pipeline in DEEP-ER. http://www.deep-projects.eu/applications/project-applications/radio-astronomy.html. Accessed: 2017-07-28.
Top 500 list. https://www.top500.org/lists/. Accessed: 2017-06-26.
3M. Novec. http://multimedia.3m.com/mws/media/569865O/3mtm-novectm-649-engineered-fluid.pdf?&fn=Novec649_6003926.pdf.
Adaptive Computing. Maui. http://www.adaptivecomputing.com/products/open-source/maui/.
Adaptive Computing. TORQUE Resource Manager. http://www.adaptivecomputing.com/products/open-source/torque/.
Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt
Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and
Katherine A. Yelick. The landscape of parallel computing research: A view from Berkeley.
Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley,
Dec 2006.
Norbert Attig, Florian Berberich, Ulrich Detert, Norbert Eicker, Thomas Eickermann, Paul Gibbon,
Wolfgang Gürich, Wilhelm Homberg, Antonia Illich, Sebastian Rinke, Michael Stephan, Klaus
Wolkersdorfer, and Thomas Lippert. Entering the Petaflop-Era – New Developments in
Supercomputing. In NIC Symposium 2010, edited by G. Münster, D. Wolf, and M. Kremer, IAS
Series Vol. 3, pages 1–12. Forschungszentrum Jülich, Jülich, 2010. ISBN 978-3-89336-606-4.
Dirk Brömmel, Ulrich Detert, Stephan Graf, Thomas Lippert, Boris Orth, Dirk Pleiter, Michael
Stephan, and Estela Suarez. Paving the Road towards Pre-Exascale Supercomputing. In NIC
Symposium 2014 – Proceedings, volume 47 of NIC Series, pages 1–14, Jülich, Feb 2014. NIC
Symposium 2014, Jülich (Germany), 12 Feb 2014 – 13 Feb 2014, John von Neumann Institute for
Computing.
Ulrich Brüning, Mondrian Nüssle, Dirk Frey, and Hans-Christian Hoppe. An Immersive Cooled
Implementation of a DEEP Booster, pages 30–36. Intel Corporation, Munich, 2015.
Michalis Christou, Theodoros Christoudias, Julian Morillo, Damian Alvarez, and Hendrik Merx. Earth
system modelling on system-level heterogeneous architectures: EMAC (version 2.42) on the
Dynamical Exascale Entry Platform (DEEP). Geoscientific model development, 9(9):3483 – 3491,
2016.
Alejandro Duran, Eduard Ayguadé, Rosa M. Badia, Jesús Labarta, Luis Martinell, Xavier Martorell,
and Judit Planas. OmpSs: A proposal for programming heterogeneous multi-core architectures.
Parallel Processing Letters, 21(02):173–193, 2011.
Norbert Eicker, Andreas Galonska, Jens Hauke, and Mondrian Nüssle. Bridging the DEEP Gap –
Implementation of an Efficient Forwarding Protocol, pages 34–41. Intel Corporation, Munich,
2014.
Norbert Eicker and Thomas Lippert. An accelerated Cluster-Architecture for the Exascale. In PARS
'11, PARS-Mitteilungen (Mitteilungen – Gesellschaft für Informatik e.V., Parallel-Algorithmen
und Rechnerstrukturen), ISSN 0177-0454, No. 28 (Workshop 2011), pages 110–119, October 2011.
Norbert Eicker, Thomas Lippert, Thomas Moschny, and Estela Suarez. The DEEP Project – An
alternative approach to heterogeneous cluster-computing in the manycore era. Concurrency and
Computation: Practice and Experience, 28(8):2394–2411, 2016.
Andrew Emerson and Fabio Affinito. Enabling a Quantum Monte Carlo application for the DEEP
architecture. In 2015 International Conference on High Performance Computing & Simulation
(HPCS), pages 453–457, July 2015.
Fraunhofer Gesellschaft. BeeGFS website.
Fraunhofer Gesellschaft. BeeOND: BeeGFS On Demand website.
Jens Freche, Wolfgang Frings, and Godehard Sutmann. High Throughput Parallel-I/O using SIONlib
for Mesoscopic Particle Dynamics Simulations on Massively Parallel Computers. In Parallel
Computing: From Multicores and GPU’s to Petascale, / ed.: B. Chapman, F. Desprez, G.R.
Joubert, A. Lichnewsky, F. Peters and T. Priol, Amsterdam, IOS Press, 2010. Advances in Parallel
Computing Volume 19. – 978-1-60750-529-7. – S. 371 – 378, 2010. Record converted from VDB:
12.11.2012.
Wolfgang Frings, Felix Wolf, and Ventsislav Petkov. Scalable Massively Parallel I/O to Task-Local
Files. In Proceedings of the Conference on High Performance Computing Networking, Storage and
Analysis (SC'09), Portland, Oregon, November 14–20, 2009, Article No. 17, pages 1–11. ACM,
New York, 2009. ISBN 978-1-60558-744-8.
Markus Götz, Christian Bodenstein, and Morris Riedel. HPDBSCAN – Highly parallel DBSCAN. In
Proceedings of the Workshop on Machine Learning in High-Performance Computing
Environments – MLHPC '15 (held in conjunction with Supercomputing 2015), page 2, Austin,
Texas, 15 Nov 2015. ACM Press, New York, NY, USA, Nov 2015.
Dorian Krause and Philipp Thörnig. JURECA: General-purpose supercomputer at Jülich
Supercomputing Centre. Journal of large-scale research facilities, 2:A62, 2016.
Anke Kreuzer, Jorge Amaya, Norbert Eicker, Raphaël Léger, and Estela Suarez. The DEEP-ER
project: I/O and resiliency extensions for the Cluster-Booster architecture. In Proceedings of the
20th International Conference on High Performance Computing and Communications (HPCC),
Exeter, UK, 2018. IEEE Computer Society Press. (accepted for publication).
Anke Kreuzer, Jorge Amaya, Norbert Eicker, and Estela Suarez. Application performance on a
Cluster-Booster system. In Proceedings of the 2018 IEEE International Parallel and Distributed
Processing Symposium (IPDPS) Workshops Proceedings (HCW), IPDPS Conference, Vancouver,
Canada, 2018. (accepted for publication).
Pramod Kumbhar, Michael Hines, Aleksandr Ovcharenko, Damian Alvarez, James King, Florentino
Sainz, Felix Schürmann, and Fabien Delalondre. Leveraging a Cluster-Booster Architecture for
Brain-Scale Simulations. In Proceedings of the 31st International Conference on High
Performance Computing, Frankfurt, Germany, 19–23 Jun 2016, volume 9697 of Lecture Notes in
Computer Science, pages 363–380. Springer International Publishing, Cham, Jun 2016.
Raphaël Léger, Damian Alvarez Mallon, Alejandro Duran, and Stephane Lanteri. Adapting a Finite-
Element Type Solver for Bioelectromagnetics to the DEEP-ER Platform. In Parallel Computing:
On the Road to Exascale, volume 27 of Advances in Parallel Computing, pages 349 – 359.
International Conference on Parallel Computing 2015, Edinburgh (UK), 1 Sep 2015 – 4 Sep 2015,
IOS Press Ebooks, Sep 2016.
Thomas Lippert. Recent Developments in Supercomputing, volume 39 of NIC series, pages 1–8. John
von Neumann Institute for Computing, Jülich, 2008.
Programming Models @ BSC. The OmpSs Programming Model, 2013.
SchedMD. SLURM website. https://slurm.schedmd.com/.
Michael Stephan and Jutta Docter. JUQUEEN: IBM Blue Gene/Q Supercomputer System at the Jülich
Supercomputing Centre. Journal of large-scale research facilities, 1:A1, 2015.
Estela Suarez, Norbert Eicker, and Thomas Lippert. Supercomputing Evolution at JSC. In NIC
Symposium 2018, volume 49 of NIC Series (Publication Series of the John von Neumann Institute
for Computing), pages 1–12, Jülich, Feb 2018. NIC Symposium 2018, Jülich (Germany), 22 Feb
2018 – 23 Feb 2018, John von Neumann Institute for Computing.
Anna Wolf, Anke Zitz, Norbert Eicker, and Giovanni Lapenta. Particle-in-Cell algorithms on
DEEP: The iPiC3D case study. volume 32, pages 38–48, Erlangen, May 2015. PARS '15, Potsdam
(Germany), 7 May 2015 – 8 May 2015, PARS.
IBM: Tivoli workload scheduler LoadLeveler, 2015.
American Society of Heating, Refrigerating and Air-Conditioning Engineers, 2016.
Applications Software at LRZ, 2017.
Axel Auweter, Arndt Bode, Matthias Brehm, Luigi Brochard, Nicolay Hammer, Herbert Huber, Raj
Panda, Francois Thomas, and Torsten Wilde. A Case Study of Energy Aware Scheduling on
SuperMUC. In Proceedings of the 29th International Conference on Supercomputing – Volume
8488, ISC 2014, pages 394–409, New York, NY, USA, 2014. Springer-Verlag New York, Inc.
Natalie Bates, Girish Ghatikar, Ghaleb Abdulla, Gregory A Koenig, Sridutt Bhalachandra, Mehdi
Sheikhalishahi, Tapasya Patki, Barry Rountree, and Stephen Poole. Electrical grid and
supercomputing centers: An investigative analysis of emerging opportunities and challenges.
Informatik-Spektrum, 38(2):111–127, 2015.
Bavarian Academy of Sciences and Humanities, 2017.
Alexander Breuer, Alexander Heinecke, Sebastian Rettenberger, Michael Bader, Alice-Agnes Gabriel,
and Christian Pelties. Sustained petascale performance of seismic simulations with SeisSol on
SuperMUC. In International Supercomputing Conference, pages 1–18. Springer, 2014.
GCS: Delivering 10 Years of Integrated HPC Excellence for Germany, Spring 2017.
Gauss Centre for Supercomputing, 2016.
Green500, 2016.
Carla Guillen, Wolfram Hesse, and Matthias Brehm. The PerSyst monitoring tool. In European
Conference on Parallel Processing, pages 363–374. Springer, 2014.
Carla Guillen, Carmen Navarrete, David Brayford, Wolfram Hesse, and Matthias Brehm. DVFS
automatic tuning plugin for energy related tuning objectives. In Green High Performance
Computing (ICGHPC), 2016 2nd International Conference, pages 1–8. IEEE, 2016.
Nicolay Hammer, Ferdinand Jamitzky, Helmut Satzger, Momme Allalen, Alexander Block, Anupam
Karmakar, Matthias Brehm, Reinhold Bader, Luigi Iapichino, Antonio Ragagnin, et al. Extreme
scale-out SuperMUC Phase 2 – lessons learned. arXiv preprint arXiv:1609.01507, 2016.
John L Hennessy and David A Patterson. Computer architecture: a quantitative approach. Elsevier,
2012.
HLRS High Performance Computing Center Stuttgart, 2017.
IBM, 2016.
Intel. Intel Xeon Processor E5 v3 Product Family. Processor Specification Update, August 2015.
Intel, 2016.
Jülich Supercomputing Centre (JSC), 2017.
Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, 2017.
Lenovo, 2016.
Magneticum: Simulating Large Scale Structure Formation in the Universe, 2014.
S. Wagner, A. Bode, H. Brüchle, and M. Brehm. Extreme Scale-out on SuperMUC Phase 2. 2016.
ISBN: 978-3-9816675-1-6.
Magnus Schwörer, Konstantin Lorenzen, Gerald Mathias, and Paul Tavan. Utilizing fast multipole
expansions for efficient and accurate quantum-classical molecular dynamics simulations. The
Journal of chemical physics, 142(10):03B608_1, 2015.
Hayk Shoukourian. Adviser for Energy Consumption Management: Green Energy Conservation. PhD
thesis, München, Technische Universität München (TUM), 2015.
Hayk Shoukourian, Torsten Wilde, Axel Auweter, and Arndt Bode. Monitoring Power Data: A first
step towards a unified energy efficiency evaluation toolset for HPC data centers. Elsevier, 2013.
Hayk Shoukourian, Torsten Wilde, Axel Auweter, and Arndt Bode. Predicting the Energy and Power
Consumption of Strong and Weak Scaling HPC Applications. Supercomputing Frontiers and
Innovations, 1(2):20–41, 2014.
Hayk Shoukourian, Torsten Wilde, Axel Auweter, Arndt Bode, and Daniele Tafani. Predicting Energy
Consumption Relevant Indicators of Strong Scaling HPC Applications for Different Compute
Resource Configurations. To appear in the proceedings of the 23rd High Performance Computing
Symposium, Society for Modeling and Simulation International (SCS), 2015.
Hayk Shoukourian, Torsten Wilde, Herbert Huber, and Arndt Bode. Analysis of the efficiency
characteristics of the first high-temperature direct liquid cooled petascale supercomputer and its
cooling infrastructure. Journal of Parallel and Distributed Computing, 107:87 – 100, 2017.
Hayk Shoukourian, Torsten Wilde, Detlef Labrenz, and Arndt Bode. Using machine learning for data
center cooling infrastructure efficiency prediction. In Parallel and Distributed Processing
Symposium Workshops (IPDPSW), 2017 IEEE International, pages 954–963. IEEE, 2017.
The SIMOPEK Project. http://simopek.de/, 2016.
Top500, 2017.
T. Wilde, M. Ott, A. Auweter, I. Meijer, P. Ruch, M. Hilger, S. Kühnert, and H. Huber. CoolMUC-2: A
supercomputing cluster with heat recovery for adsorption cooling. In 2017 33rd Thermal
Measurement, Modeling & Management Symposium (SEMI-THERM), pages 115–121, March 2017.
Torsten Wilde, Axel Auweter, and Hayk Shoukourian. The 4 Pillar Framework for energy efficient HPC
data centers. Computer Science – Research and Development, pages 1–11, 2013.
Compressible Turbulence World’s Largest Simulation of Supersonic, 2013.
APEX Benchmarks. https://www.nersc.gov/research-and-development/apex/apex-benchmarks/.
Intel VTune Amplifier. https://software.intel.com/en-us/intel-vtune-amplifier-xe.
Intel® Advisor. https://software.intel.com/en-us/intel-advisor-xe.
Libfabric OpenFabrics. https://ofiwg.github.io/libfabric.
MPICH. http://www.mpich.org.
NERSC-8 Benchmarks. https://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/.
NERSC Cori System. https://www.nersc.gov/users/computational-systems/cori.
NERSC Edison System. https://www.nersc.gov/users/computational-systems/edison.
NESAP. http://www.nersc.gov/users/computational-systems/cori/nesap/nesap-projects.
NESAP Application Case Studies. http://www.nersc.gov/users/computational-systems/cori/application-porting-and-performance/application-case-studies/.
NESAP Projects. http://www.nersc.gov/users/computational-systems/cori/nesap.
NESAP Xeon Phi Application Performance. http://www.nersc.gov/users/application-performance/preparing-for-cori/.
Quantum ESPRESSO Case Study. http://www.nersc.gov/users/computational-systems/cori/application-porting-and-performance/application-case-studies/quantum-espresso-exact-exchange-case-study/.
Roofline Performance Model. http://crd.lbl.gov/departments/computerscience/PAR/research/roofline.
SDE: Intel Software Development Emulator. https://software.intel.com/en-us/articles/intel-software-development-emulator.
Tips for Using CMake and GNU Autotools on Cray Heterogeneous Systems. http://docs.cray.com/books/S-2801-1608//S-2801-1608.pdf.
Taylor Barnes, Brandon Cook, Douglas Doerfler, Brian Friesen, Yun He, Thorsten Kurth, Tuomas
Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, and et al. Evaluating and Optimizing the
NERSC Workload on Knights Landing. Jan 2016.
Taylor A. Barnes, Thorsten Kurth, Pierre Carrier, Nathan Wichmann, David Prendergast, Paul R.C.
Kent, and Jack Deslippe. Improved treatment of exact exchange in quantum ESPRESSO.
Computer Physics Communications, 214:52 – 58, 2017.
W. Bhimji, D. Bard, K. Burleigh, C. Daley, S. Farrell, M. Fasel, B. Friesen, L. Gerhardt, J. Liu, P.
Nugent, D. Paul, J. Porter, and V. Tsulaia. Extreme I/O on HPC for HEP using the Burst Buffer at
NERSC. Computing in High-Energy Physics, 2016.
W. Bhimji, D. Bard, M. Romanus, D. Paul, A. Ovsyannikov, B. Friesen, M. Bryson, J. Correa, G.K.
Lockwood, V. Tsulaia, S. Byna, S. Farrell, D. Gursoy, C. Daley, V. Beckner, B. Van Straalen, D.
Trebotich, C. Tull, G.H. Weber, N.J. Wright, K. Antypas, and Prabhat. Accelerating science with
the NERSC Burst Buffer. Cray User Group, 2016.
S. Binder, A. Calci, E. Epelbaum, R. J. Furnstahl, J. Golak, K. Hebeler, H. Kamada, H. Krebs, J.
Langhammer, S. Liebig, P. Maris, U.-G. Meißner, D. Minossi, A. Nogga, H. Potter, R. Roth, R.
Skibiński, K. Topolnicki, J. P. Vary, and H. Witała. Few-nucleon systems with state-of-the-art chiral
nucleon-nucleon forces. Phys. Rev. C, 93(4):044002, 2016.
R.S. Canon, T. Declerck, B. Draney, J. Lee, D. Paul, and D. Skinner. Enabling a superfacility with
software defined networking. Cray User Group, 2017.
Brandon Cook, Pieter Maris, Meiyue Shao, Nathan Wichmann, Marcus Wagner, John O'Neill, Thanh
Phung, and Gaurav Bansal. High performance optimizations for nuclear physics code MFDn on KNL.
In International Conference on High Performance Computing, pages 366–377. Springer, 2016.
Jack Deslippe, Georgy Samsonidze, David A. Strubbe, Manish Jain, Marvin L. Cohen, and Steven G.
Louie. BerkeleyGW: A massively parallel computer package for the calculation of the quasiparticle
and optical properties of materials and nanostructures. Computer Physics Communications,
183(6):1269 – 1289, 2012.
Douglas Doerfler, Jack Deslippe, Samuel Williams, Leonid Oliker, Brandon Cook, Thorsten Kurth,
Mathieu Lobet, Tareq Malas, Jean-Luc Vay, and Henri Vincenti. Applying the Roofline
Performance Model to the Intel Xeon Phi Knights Landing Processor, pages 339–353. Springer
International Publishing, Cham, 2016.
P. Hill, C. Snyder, and J. Sygulla. KNL system software. Cray User Group, 2017.
D.M. Jacobsen. Extending CLE6 to a multicomputer OS. Cray User Group, 2017.
M. Jette, D.M. Jacobsen, and D. Paul. Scheduler optimization for current generation Cray systems.
Cray User Group, 2017.
William TC Kramer, John M Shalf, and Erich Strohmaier. The Sustained System Performance (SSP)
benchmark.
Thorsten Kurth, William Arndt, Taylor Barnes, Brandon Cook, Jack Deslippe, Doug Doerfler, Brian
Friesen, Yun He, Tuomas Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, Andrey
Ovsyannikov, Samuel Williams, Woo-Sun Yang, and Zhengji Zhao. Analyzing Performance of
Selected Applications on the Cori HPC System. Jun 2017. Accepted for IXPUG Workshop
Experiences on Intel Knights Landing at the One Year Mark, ISC 2017, Frankfurt, Germany.
M. Melara, T. Gamblin, G. Becker, R. French, M. Belhorn, K. Thompson, P. Scheibel, and
R. Hartman-Baker. Using Spack to manage software on Cray supercomputers. In Proceedings of Cray User Group,
2017.
P. Maris, M. A. Caprio, and J. P. Vary. Emergence of rotational bands in ab initio no-core
configuration interaction calculations of the Be isotopes. Phys. Rev. C, 91(1):014310, 2015.
P. Maris, J. P. Vary, P. Navratil, W. E. Ormand, H. Nam, and D. J. Dean. Origin of the anomalous
long lifetime of 14C. Phys. Rev. Lett., 106(20):202502, 2011.
Pieter Maris, James P. Vary, S. Gandolfi, J. Carlson, and Steven C. Pieper. Properties of trapped
neutrons interacting with realistic nuclear Hamiltonians. Phys. Rev. C, 87(5):054318, 2013.
A. Ovsyannikov, M. Romanus, B. Van Straalen, G. Weber, and D. Trebotich. Scientific workflows at
DataWarp speed: Accelerated data-intensive science using NERSC's Burst Buffer. IEEE, 2016.
Meiyue Shao, Hasan Metin Aktulga, Chao Yang, Esmond G Ng, Pieter Maris, and James P Vary.
Accelerating nuclear configuration interaction calculations through a preconditioned block iterative
eigensolver. arXiv preprint arXiv:1609.01689, 2016.
Samuel Williams, Andrew Waterman, and David Patterson. Roofline: An insightful visual
performance model for multicore architectures. Commun. ACM, 52(4):65–76, April 2009.
Samuel Webb Williams. Auto-tuning Performance on Multicore Computers. PhD thesis, Berkeley,
CA, USA, 2008. AAI3353349.
A list of Top50 most powerful supercomputers in Russia and CIS. http://top50.supercomputers.ru.
Moscow University Supercomputing Center. http://hpc.msu.ru.
Octoshell source code. https://github.com/octoshell/octoshell-v2.
Octotron framework source code. https://github.com/srcc-msu/octotron.
Open Encyclopedia of Parallel Algorithmic Features. http://algowiki-project.org.
Slurm — cluster management and job scheduling system. https://slurm.schedmd.com.
xCAT. http://xcat.org/.
A. Antonov, D. Nikitenko, P. Shvets, S. Sobolev, K. Stefanov, Vad. Voevodin, Vl. Voevodin, and S.
Zhumatiy. An approach for ensuring reliable functioning of a supercomputer based on a formal
model. In Parallel Processing and Applied Mathematics. 11th International Conference, PPAM
2015, Krakow, Poland, September 6–9, 2015. Revised Selected Papers, Part I, volume 9573 of
Lecture Notes in Computer Science, pages 12–22. Springer International Publishing, 2016.
A. Antonov, V. Voevodin, and J. Dongarra. Algowiki: an Open encyclopedia of parallel algorithmic
features. Supercomputing Frontiers and Innovations, 2(1):4–18, 2015.
A. Brechalov. Moscow State University Meets Provides a Facility That Meets HPC Demands. Uptime
Institute Journal, 6:50, 2016.
B. Mohr, E. Hagersten, J. Gimenez, A. Knupfer, D. Nikitenko, M. Nilsson, H. Servat, A. Shah, Vl.
Voevodin, F. Winkler, F. Wolf, and I. Zhukov. The HOPSA Workflow and Tools. In Proceedings
of the 6th International Parallel Tools Workshop, Stuttgart, 2012, volume 11, pages 127–146.
Springer, 2012.
D.A. Nikitenko, Vad.V. Voevodin, and S.A. Zhumatiy. Octoshell: Large supercomputer complex
administration system. In Proceedings of the 1st Russian Conference on Supercomputing —
Supercomputing Days 2015, volume 1482 of CEUR Workshop Proceedings, pages 69–83, 2015.
D.A. Nikitenko, S.A. Zhumatiy, and P.A. Shvets. Making Large-Scale Systems Observable —
Another Inescapable Step Towards Exascale. Supercomputing Frontiers and Innovations, 3(2):72–
79, 2016.
V. Sadovnichy, A. Tikhonravov, Vl Voevodin, and V. Opanasenko. Lomonosov: Supercomputing at
Moscow State University. In Contemporary High Performance Computing: From Petascale
toward Exascale, Chapman & Hall/CRC Computational Science, pages 283–307, Boca Raton,
United States, 2013.
K.S. Stefanov, Vl.V. Voevodin, S.A. Zhumatiy, and Vad.V. Voevodin. Dynamically Re-configurable
Distributed Modular Monitoring System for Supercomputers (DiMMon). volume 66 of Procedia
Computer Science, pages 625–634. Elsevier B.V., 2015.
Vl.V. Voevodin, Vad.V. Voevodin, D.I. Shaikhislamov, and D.A. Nikitenko. Data mining method for
anomaly detection in the supercomputer task flow. In Numerical Computations: Theory and
Algorithms, The 2nd International Conference and Summer School, Pizzo calabro, Italy, June 20–
24, 2016, volume 1776 of AIP Conference Proceedings, 2016.
Estimating the Circulation and Climate of the Ocean Consortium, Phase II (ECCO2). Website. http://ecco.jpl.nasa.gov/.
D. Ellsworth, C. Henze, and B. Nelson. Interactive Visualization of High-Dimensional Petascale
Ocean Data. 2017 IEEE 7th Symposium on Large Data Analysis and Visualization (LDAV).
Phoenix, AZ, 2017.
The Enzo Project. Website. http://enzo-project.org/.
FUN3D: Fully Unstructured Navier-Stokes. Website. http://fun3d.larc.nasa.gov/.
The GEOS-5 System. Website. http://gmao.gsfc.nasa.gov/systems/geos5/.
HECC Storage Resources. Website. https://www.nas.nasa.gov/hecc/resources/storage_systems.html.
High Performance Conjugate Gradients, November 2016. Website. http://www.hpcg-benchmark.org/custom/index.html?lid=155&slid=289.
hyperwall Visualization System. Website. https://www.nas.nasa.gov/hecc/resources/viz_systems.html.
Massachusetts Institute of Technology General Circulation Model (mitgcm). Website. http://mitgcm.org/.
Merope Supercomputer. Website. https://www.nas.nasa.gov/hecc/resources/merope.html.
nu-WRF: NASA-Unified Weather Research and Forecasting (nu-WRF). Website. https://modelingguru.nasa.gov/community/atmospheric/nuwrf.
OVERFLOW Computational Fluid Dynamics (CFD) flow solver. Website. https://overflow.larc.nasa.gov/.
Pleiades Supercomputer. Website. https://www.nas.nasa.gov/hecc/resources/pleiades.html.
TOP500 – November 2016. Website. https://www.top500.org/lists/2016/11/.
USM3D NASA Common Research Model (USM3D). Website. https://commonresearchmodel.larc.nasa.gov/computational-approach/flow-solvers-used/usm3d/.
Causal web. https://ccd2.vm.bridges.psc.edu/ccd/login.
Causal web application quick start and user guide. http://www.ccd.pitt.edu/wiki/index.php?title=Causal_Web_Application_Quick_Start_and_User_Guide.
Frederick Jelinek Memorial Summer Workshop. https://www.lti.cs.cmu.edu/frederick-jelinek-memorial-summer-workshop-closing-day-schedule.
Galaxy Main. https://usegalaxy.org.
Galaxy Project Stats. https://galaxyproject.org/galaxy-project/statistics/#usegalaxyorg-usage.
MIDAS MISSION Public Health Hackathon – Visualizing the future of public health. https://midas-publichealth-hack-3336.devpost.com.
OpenStack bare metal provisioning program. https://wiki.openstack.org/wiki/Ironic.
Science gateways listing. https://www.xsede.org/gateways-listing.
The GDELT Project. https://www.gdeltproject.org.
Serafim Batzoglou. Algorithmic challenges in mammalian whole-genome assembly. In Encyclopedia
of Genetics, Genomics, Proteomics and Bioinformatics. American Cancer Society, 2005.
Noam Brown and Tuomas Sandholm. Safe and Nested Subgame Solving for Imperfect-Information
Games. In I Guyon, U V Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan, and R
Garnett, editors, Advances in Neural Information Processing Systems 30, pages 689–699, Long
Beach, California, 2017. Curran Associates, Inc.
Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top
professionals. Science, 2017.
Gregory A. Cary, R. Andrew Cameron, and Veronica F. Hinman. EchinoBase: Tools for Echinoderm
Genome Analyses. In Eukaryotic Genomic Databases, Methods in Molecular Biology, pages 349–
369. Humana Press, New York, NY, 2018.
Uma R. Chandran, Olga P. Medvedeva, M. Michael Barmada, Philip D. Blood, Anish Chakka,
Soumya Luthra, Antonio Ferreira, Kim F. Wong, Adrian V. Lee, Zhihui Zhang, Robert Budden, J.
Ray Scott, Annerose Berndt, Jeremy M. Berg, and Rebecca S. Jacobson. TCGA Expedition: A Data
Acquisition and Management System for TCGA Data. PLOS ONE, 11(10):e0165395, October
2016.
Chris Dyer and Phil Blunsom. On the State of the Art of Evaluation in Neural Language Models.
pages 1–10, 2018.
Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and
Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks.
Nature Publishing Group, 2017.
OpenStack Foundation. The Crossroads of Cloud and HPC: OpenStack for Scientific Research: Exploring
OpenStack Cloud Computing for Scientific Workloads. CreateSpace Independent Publishing
Platform, 2016.
Timothy Gushanas. NASA Twins Study Investigators to Release Integrated Paper in 2018. 2018.
Jo Handelsman. Metagenomics: Application of Genomics to Uncultured Microorganisms.
Microbiology and Molecular Biology Reviews, 68(4):669–685, December 2004.
David E Hudak, Douglas Johnson, Jeremy Nicklas, Eric Franz, Brian McMichael, and Basil Gohar.
Open OnDemand: Transforming Computational Science Through Omni-disciplinary Software
Cyberinfrastructure. In Proceedings of the XSEDE16 Conference on Diversity, Big Data, and
Science at Scale, pages 1–7, Miami, USA, 2016. ACM.
Morris A. Jette, Andy B. Yoo, and Mark Grondona. Slurm: Simple Linux utility for resource
management. In Lecture Notes in Computer Science: Proceedings of Job Scheduling Strategies
for Parallel Processing (JSSPP) 2003, pages 44–60. Springer-Verlag, 2002.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep
Convolutional Neural Networks, pages 1097–1105. Curran Associates, Inc., 2012.
Gregory M. Kurtzer, Vanessa Sochat, and Michael W. Bauer. Singularity: Scientific containers for
mobility of compute. PLOS ONE, 12(5):1–20, 05 2017.
Katherine A Lawrence, Michael Zentner, Nancy Wilkins-Diehr, Julie A Wernert, Marlon Pierce,
Suresh Marru, and Scott Michael. Science gateways today and tomorrow: positive perspectives of
nearly 5000 members of the research community. Concurrency and Computation: Practice and
Experience, 27(16):4252–4268, 2015.
Charng-Da Lu, James Browne, Robert L. DeLeon, John Hammond, William Barth, Thomas R.
Furlani, Steven M. Gallo, Matthew D. Jones, and Abani K. Patra. Comprehensive job level
resource usage measurement and analysis for XSEDE HPC systems. Proceedings of the
Conference on Extreme Science and Engineering Discovery Environment Gateway to Discovery –
XSEDE ’13, page 1, 2013.
Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet. Are GANs
Created Equal? A Large-Scale Study. arXiv:1711.10337 [cs, stat], November 2017. arXiv:
1711.10337.
Herman L. Mays, Chih-Ming Hung, Pei-Jen Shaner, James Denvir, Megan Justice, Shang-Fang Yang,
Terri L. Roth, David A. Oehler, Jun Fan, Swanthana Rekulapally, and Donald A. Primerano.
Genomic Analysis of Demographic History and Ecological Niche Modeling in the Endangered
Sumatran Rhinoceros Dicerorhinus sumatrensis. Current Biology, 28(1):70–76.e4, January 2018.
Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux
Journal, 2014(239), 2014.
Paul Nowoczynski, Jason Sommerfield, Jared Yanovich, J. Ray Scott, Zhihui Zhang, and Michael
Levine. The data supercell. In Proceedings of the 1st Conference of the Extreme Science and
Engineering Discovery Environment: Bridging from the eXtreme to the Campus and Beyond,
XSEDE ’12, pages 13:1–13:11, New York, NY, USA, 2012. ACM.
Nicholas A. Nystrom. Bridges virtual tour. https://psc.edu/bvt.
Nicholas A. Nystrom, Michael J. Levine, Ralph Z. Roskies, and J. Ray Scott. Bridges: A uniquely
flexible HPC resource for new communities and data analytics. In Proceedings of the 2015 XSEDE
Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, XSEDE ’15,
pages 30:1–30:8, New York, NY, USA, 2015. ACM.
Nick Nystrom, Joel Welling, Phil Blood, and Eng Lim Goh. Blacklight: Coherent Shared Memory for
Enabling Science. In Contemporary High Performance Computing, Chapman & Hall/CRC
Computational Science, pages 421–440. Chapman and Hall/CRC, July 2013.
David Palesch, Steven E. Bosinger, Gregory K. Tharp, Thomas H. Vanderford, Mirko Paiardini, Ann
Chahroudi, Zachary P. Johnson, Frank Kirchhoff, Beatrice H. Hahn, Robert B. Norgren, Nirav B.
Patel, Donald L. Sodora, Reem A. Dawoud, Caro-Beth Stewart, Sara M. Seepo, R. Alan Harris,
Yue Liu, Muthuswamy Raveendran, Yi Han, Adam English, Gregg W. C. Thomas, Matthew W.
Hahn, Lenore Pipes, Christopher E. Mason, Donna M. Muzny, Richard A. Gibbs, Daniel Sauter,
Kim Worley, Jeffrey Rogers, and Guido Silvestri. Sooty mangabey genome sequence provides
insight into AIDS resistance in a natural SIV host. Nature, 553(7686):77–81, January 2018.
Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman. An Eulerian path approach to DNA
fragment assembly. Proceedings of the National Academy of Sciences, 98(17):9748–9753, August
2001.
Lenore Pipes, Sheng Li, Marjan Bozinoski, Robert Palermo, Xinxia Peng, Phillip Blood, Sara Kelly,
Jeffrey M. Weiss, Jean Thierry-Mieg, Danielle Thierry-Mieg, Paul Zumbo, Ronghua Chen, Gary P.
Schroth, Christopher E. Mason, and Michael G. Katze. The non-human primate reference
transcriptome resource (NHPRTR) for comparative functional genomics. Nucleic Acids Research,
41(D1):D906–D914, January 2013.
Pranav Rajpurkar, Awni Y. Hannun, Masoumeh Haghpanahi, Codie Bourn, and Andrew Y. Ng.
Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks. arXiv:1707.01836
[cs], July 2017. arXiv: 1707.01836.
Jason A. Reuter, Damek V. Spacek, and Michael P. Snyder. High-throughput sequencing technologies.
Molecular Cell, 58(4):586–597, May 2015.
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang,
Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C Berg, and Li Fei-Fei. ImageNet
Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3):211–
252, 2015.
Alexander Sczyrba, Peter Hofmann, Peter Belmann, David Koslicki, Stefan Janssen, Johannes Dröge,
Ivan Gregor, Stephan Majda, Jessika Fiedler, Eik Dahms, Andreas Bremges, Adrian Fritz, Ruben
Garrido-Oter, Tue Sparholt Jørgensen, Nicole Shapiro, Philip D Blood, Alexey Gurevich, Yang
Bai, Dmitrij Turaev, Matthew Z DeMaere, Rayan Chikhi, Niranjan Nagarajan, Christopher Quince,
Fernando Meyer, Monika Balvočiūtė, Lars Hestbjerg Hansen, Søren J Sørensen, Burton K H Chia,
Bertrand Denis, Jeff L Froula, Zhong Wang, Robert Egan, Dongwan Don Kang, Jeffrey J Cook,
Charles Deltel, Michael Beckstette, Claire Lemaitre, Pierre Peterlongo, Guillaume Rizk,
Dominique Lavenier, Yu-Wei Wu, Steven W Singer, Chirag Jain, Marc Strous, Heiner
Klingenberg, Peter Meinicke, Michael D Barton, Thomas Lingner, Hsin-Hung Lin, Yu-Chieh Liao,
Genivaldo Gueiros Z Silva, Daniel A Cuevas, Robert A Edwards, Surya Saha, Vitor C Piro,
Bernhard Y Renard, Mihai Pop, Hans-Peter Klenk, Markus Göker, Nikos C Kyrpides, Tanja
Woyke, Julia A Vorholt, Paul Schulze-Lefert, Edward M Rubin, Aaron E Darling, Thomas Rattei,
and Alice C McHardy. Critical Assessment of Metagenome Interpretation—a benchmark of
metagenomics software. Nature Methods, 14:1063, Oct 2017.
David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George van den Driessche,
Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman,
Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine
Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with
deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
Nikolay A. Simakov, Joseph P. White, Robert L. DeLeon, Steven M. Gallo, Matthew D. Jones, Jeffrey
T. Palmer, Benjamin Plessinger, and Thomas R. Furlani. A Workload Analysis of NSF’s
Innovative HPC Resources Using XDMoD. arXiv:1801.04306 [cs], January 2018. arXiv:
1801.04306.
John Towns, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew Grimshaw, Victor
Hazlewood, Scott Lathrop, Dave Lifka, Gregory D Peterson, Ralph Roskies, J Ray Scott, and
Nancy Wilkins-Diehr. XSEDE: Accelerating Scientific Discovery. Computing in Science &
Engineering, 16(5):62–74, 9 2014.
B Yang, L Ying, and J Tang. Artificial Neural Network Enhanced Bayesian PET Image
Reconstruction. IEEE Transactions on Medical Imaging, PP(99):1, 2018.
Jared Yanovich. Slash2 file system. https://github.com/pscedu/slash2.
J Ye, P Wu, J Z Wang, and J Li. Fast Discrete Distribution Clustering Using Wasserstein Barycenter
With Sparse Support. IEEE Transactions on Signal Processing, 65(9):2317–2332, 2017.
Jonathan D. Young, Chunhui Cai, and Xinghua Lu. Unsupervised deep learning reveals prognostically
relevant subtypes of glioblastoma. BMC Bioinformatics, 18(Suppl 11):381, October 2017.
Daniel R. Zerbino and Ewan Birney. Velvet: Algorithms for de novo short read assembly using de
Bruijn graphs. Genome Research, 18(5):821–829, May 2008.
Xinye Zheng, Jianbo Ye, Yukun Chen, Stephen Wistar, Jia Li, Jose A. Piedra-Fernández, Michael A.
Steinberg, and James Z. Wang. Detecting Comma-shaped Clouds for Severe Weather Forecasting
using Shape and Motion. arXiv:1802.08937 [cs], February 2018. arXiv: 1802.08937.
The gravIT github repository.
OpenSWR.
Presidential Executive Order No. 13702. 2015.
B. P. Abbott et al. Observation of gravitational waves from a binary black hole merger. Phys. Rev.
Lett., 116:061102, February 2016.
Kapil Agrawal, Mark R. Fahey, Robert McLay, and Doug James. User environment tracking and
problem detection with XALT. In Proceedings of the First International Workshop on HPC User
Support Tools, HUST ’14, pages 32–40, Piscataway, NJ, USA, 2014. IEEE Press.
Ronald Babich, Michael A. Clark, and Bálint Joó. Parallelizing the QUDA library for multi-GPU
calculations in lattice quantum chromodynamics. In Proceedings of the 2010 ACM/IEEE
International Conference for High Performance Computing, Networking, Storage and Analysis, SC
’10, pages 1–11, Washington, DC, USA, 2010. IEEE Computer Society.
Carson Brownlee, Thiago Ize, and Charles D. Hansen. Image-parallel ray tracing using OpenGL
interception. In Proceedings of the 13th Eurographics Symposium on Parallel Graphics and
Visualization, EGPGV ’13, pages 65–72, Aire-la-Ville, Switzerland, 2013. Eurographics
Association.
Martin Burtscher, Byoung-Do Kim, Jeff Diamond, John McCalpin, Lars Koesterke, and James
Browne. PerfExpert: An easy-to-use performance diagnosis tool for HPC applications. In
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing,
Networking, Storage and Analysis, SC ’10, pages 1–11, Washington, DC, USA, 2010. IEEE
Computer Society.
Todd Evans, William L. Barth, James C. Browne, Robert L. DeLeon, Thomas R. Furlani, Steven M.
Gallo, Matthew D. Jones, and Abani K. Patra. Comprehensive resource use monitoring for HPC
systems with TACC stats. In Proceedings of the First International Workshop on HPC User
Support Tools, HUST ’14, pages 13–21, Piscataway, NJ, USA, 2014. IEEE Press.
National Science Foundation. Advanced computing infrastructure strategic plan. Technical Report
NSF-12-051, 2012.
Niall Gaffney, Christopher Jordan, Tommy Minyard, and Dan Stanzione. Building Wrangler: A
transformational data intensive resource for the open science community. 2014 IEEE International
Conference on Big Data (Big Data), pages 20–22, 2014.
J. Hammond. The lltop github repository.
J. Hammond. The xltop github repository.
Alexander Heinecke, Alexander Breuer, Sebastian Rettenberger, Michael Bader, Alice-Agnes Gabriel,
Christian Pelties, Arndt Bode, William Barth, Xiang-Ke Liao, Karthikeyan Vaidyanathan, Mikhail
Smelyanskiy, and Pradeep Dubey. Petascale high order dynamic rupture earthquake simulations on
heterogeneous supercomputers. In Proceedings of the International Conference for High
Performance Computing, Networking, Storage and Analysis, SC ’14, pages 3–14, Piscataway, NJ,
USA, 2014. IEEE Press.
Jacob A. Hummel, Athena Stacy, and Volker Bromm. The First Stars: formation under cosmic ray
feedback. Mon. Not. Roy. Astron. Soc., 460(3):2432–2444, 2016.
Morris A. Jette, Andy B. Yoo, and Mark Grondona. SLURM: Simple Linux utility for resource
management. In Lecture Notes in Computer Science: Proceedings of Job Scheduling Strategies
for Parallel Processing (JSSPP) 2003, pages 44–60. Springer-Verlag, 2002.
Jiuxing Liu, Jiesheng Wu, and Dhabaleswar K. Panda. High performance RDMA-based MPI
implementation over InfiniBand. Int. J. Parallel Program., 32(3):167–198, June 2004.
Christopher Maffeo, Binquan Luan, and Aleksei Aksimentiev. End-to-end attraction of duplex DNA.
Nucleic Acids Research, 40(9):3812–3821, 2012.
Robert McLay, Karl W. Schulz, William L. Barth, and Tommy Minyard. Best practices for the
deployment and management of production HPC clusters. In State of the Practice Reports, SC ’11,
pages 1–11, New York, NY, USA, 2011. ACM.
Nirav Merchant, Eric Lyons, Stephen Goff, Matthew Vaughn, Doreen Ware, David Micklos, and
Parker Antin. The iPlant Collaborative: Cyberinfrastructure for enabling data to discovery for the
life sciences. PLOS Biology, 14(1):1–9, January 2016.
James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa,
Christophe Chipot, Robert D. Skeel, Laxmikant Kalé, and Klaus Schulten. Scalable molecular
dynamics with NAMD. Journal of Computational Chemistry, 26(16):1781–1802, 2005.
Abtin Rahimian, Ilya Lashuk, Shravan Veerapaneni, Aparna Chandramowlishwaran, Dhairya
Malhotra, Logan Moon, Rahul Sampath, Aashay Shringarpure, Jeffrey Vetter, Richard Vuduc,
Denis Zorin, and George Biros. Petascale direct numerical simulation of blood flow on 200K cores
and heterogeneous architectures. In Proceedings of the 2010 ACM/IEEE International Conference
for High Performance Computing, Networking, Storage and Analysis, SC ’10, pages 1–11,
Washington, DC, USA, 2010. IEEE Computer Society.
Ellen M. Rathje, Clint Dawson, Jamie E. Padgett, Jean-Paul Pinelli, Dan Stanzione, Ashley Adair,
Pedro Arduino, Scott J. Brandenberg, Tim Cockerill, Charlie Dey, Maria Esteva, Fred L. Haan,
Matthew Hanlon, Ahsan Kareem, Laura Lowes, Stephen Mock, and Gilberto Mosqueda.
DesignSafe: New cyberinfrastructure for natural hazards engineering. Natural Hazards Review,
18(3):06017001, 2017.
Johann Rudi, A. Cristiano I. Malossi, Tobin Isaac, Georg Stadler, Michael Gurnis, Peter W. J. Staar,
Yves Ineichen, Costas Bekas, Alessandro Curioni, and Omar Ghattas. An extreme-scale implicit
solver for complex PDEs: Highly heterogeneous flow in earth’s mantle. In Proceedings of the
International Conference for High Performance Computing, Networking, Storage and Analysis, SC
’15, pages 1–12, New York, NY, USA, 2015. ACM.
Dan Stanzione, Bill Barth, Niall Gaffney, Kelly Gaither, Chris Hempel, Tommy Minyard, S.
Mehringer, Eric Wernert, H. Tufo, D. Panda, and P. Teller. Stampede 2: The evolution of an
XSEDE supercomputer. In Proceedings of the Practice and Experience in Advanced Research
Computing 2017 on Sustainability, Success and Impact, PEARC17, pages 1–8, New York, NY,
USA, 2017. ACM.
Craig A. Stewart, Timothy M. Cockerill, Ian Foster, David Hancock, Nirav Merchant, Edwin
Skidmore, Daniel Stanzione, James Taylor, Steven Tuecke, George Turner, Matthew Vaughn, and
Niall I. Gaffney. Jetstream: A self-provisioned, scalable science and engineering cloud
environment. In Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by
Enhanced Cyberinfrastructure, XSEDE ’15, pages 1–8, New York, NY, USA, 2015. ACM.
John Towns, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew Grimshaw, Victor
Hazlewood, Scott Lathrop, Dave Lifka, Gregory D. Peterson, Ralph Roskies, J. Ray Scott, and
Nancy Wilkins-Diehr. XSEDE: Accelerating scientific discovery. Computing in Science &
Engineering, 16(5):62–74, 2014.
I. Wald, G. P. Johnson, J. Amstutz, et al. OSPRay – A CPU ray tracing framework for scientific
visualization. IEEE Transactions on Visualization & Computer Graphics, 23(1):931–940, 2017.
Fuqing Zhang and Yonghui Weng. Predicting hurricane intensity and associated hazards: A five-year
real-time forecast experiment with assimilation of airborne Doppler radar observations. Bulletin of
the American Meteorological Society, 96(1):25–33, 2015.
Github: ARTED. https://github.com/ARTED/ARTED.
Green500 | TOP500 Supercomputer Sites.
HPCG.
KNC cluster COMA. https://www.ccs.tsukuba.ac.jp/eng/supercomputers/.
TOP500 Supercomputer Sites.
Jack Dongarra, Michael A. Heroux, and Piotr Luszczek. High-performance conjugate-gradient
benchmark: A new metric for ranking high-performance computing systems. The International
Journal of High Performance Computing Applications, 30(1):3–10, 2016.
Jack J. Dongarra, Piotr Luszczek, and Antoine Petitet. The LINPACK benchmark: past, present and
future. Concurrency and Computation: Practice and Experience, 15(9):803–820, 2003.
K. Fujita, T. Ichimura, K. Koyama, M. Horikoshi, H. Inoue, L. Meadows, S. Tanaka, M. Hori, M.
Lalith, and T. Hori. A fast implicit solver with low memory footprint and high scalability for
comprehensive earthquake simulation system. In Research Poster for SC16, International
Conference for High Performance Computing, Networking, Storage and Analysis, November 2016.
Balazs Gerofi, Akio Shimada, Atsushi Hori, and Yutaka Ishikawa. Partially Separated Page Tables for
Efficient Operating System Assisted Hierarchical Memory Management on Heterogeneous
Architectures. In 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing (CCGrid), May 2013.
Balazs Gerofi, Akio Shimada, Atsushi Hori, Takagi Masamichi, and Yutaka Ishikawa. CMCP: A
Novel Page Replacement Policy for System Level Hierarchical Memory Management on Many-
cores. In Proceedings of the 23rd International Symposium on High-performance Parallel and
Distributed Computing, HPDC ’14, pages 73–84, New York, NY, USA, 2014. ACM.
Balazs Gerofi, Masamichi Takagi, Yutaka Ishikawa, Rolf Riesen, Evan Powers, and Robert W.
Wisniewski. Exploring the Design Space of Combining Linux with Lightweight Kernels for
Extreme Scale Computing. In Proceedings of ROSS’15, pages 1–8. ACM, 2015.
Y. Hirokawa, T. Boku, S. A. Sato, and K. Yabana. Performance evaluation of large scale electron
dynamics simulation under many-core cluster based on Knights Landing. In HPC Asia 2018,
January 2018.
T. Ichimura, K. Fujita, P. E. B. Quinay, L. Maddegedara, M. Hori, S. Tanaka, Y. Shizawa, H.
Kobayashi, and K. Minami. Implicit nonlinear wave simulation with 1.08T DOF and 0.270T
unstructured finite elements to enhance comprehensive earthquake simulation. In ACM
Proceedings of the International Conference on High Performance Computing, Networking,
Storage and Analysis (SC’15), November 2015.
T. Ichimura, K. Fujita, S. Tanaka, M. Hori, M. Lalith, Y. Shizawa, and H. Kobayashi. Physics-based
urban earthquake simulation enhanced by 10.7 BlnDOF × 30K time-step unstructured FE non-
linear seismic wave simulation. In IEEE Proceedings of the International Conference on High
Performance Computing, Networking, Storage and Analysis (SC’14), November 2014.
K. Nakajima, M. Satoh, T. Furumura, H. Okuda, T. Iwashita, H. Sakaguchi, T. Katagiri, M.
Matsumoto, S. Ohshima, H. Jitsumoto, T. Arakawa, F. Mori, T. Kitayama, A. Ida, and M. Y.
Matsuo. ppOpen-HPC: Open source infrastructure for development and execution of large-scale
scientific applications on post-peta-scale supercomputers with automatic tuning (AT). In
Optimization in the Real World — Towards Solving Real-World Optimization Problems, volume
13 of Mathematics for Industry, pages 15–35, 2015.
A. Petitet, R. C. Whaley, J. Dongarra, and A. Cleary. HPL – A Portable Implementation of the High-
Performance Linpack Benchmark for Distributed-Memory Computers.
S. A. Sato and K. Yabana. Maxwell + TDDFT multi-scale simulation for laser-matter interactions. J.
Adv. Simulat. Sci. Eng., 1(1), 2014.
Taku Shimosawa. Operating System Organization for Manycore Systems. PhD dissertation, The University
of Tokyo, 2012.
Taku Shimosawa, Balazs Gerofi, Masamichi Takagi, Gou Nakamura, Tomoki Shirasawa, Yuji Saeki,
Masaaki Shimizu, Atsushi Hori, and Yutaka Ishikawa. Interface for Heterogeneous Kernels: A
Framework to Enable Hybrid OS Designs targeting High Performance Computing on Manycore
Architectures. In 2014 21st International Conference on High Performance Computing (HiPC),
December 2014.
Github: ARTED. https://github.com/ARTED/ARTED.
Green500 | TOP500 Supercomputer Sites.
HPCG.
KNC cluster COMA. https://www.ccs.tsukuba.ac.jp/eng/supercomputers/.
TOP500 Supercomputer Sites.
Jack Dongarra, Michael A. Heroux, and Piotr Luszczek. High-performance conjugate-gradient
benchmark: A new metric for ranking high-performance computing systems. The International
Journal of High Performance Computing Applications, 30(1):3–10, 2016.
Jack J. Dongarra, Piotr Luszczek, and Antoine Petitet. The LINPACK benchmark: past, present and
future. Concurrency and Computation: Practice and Experience, 15(9):803–820, 2003.
K. Fujita, T. Ichimura, K. Koyama, M. Horikoshi, H. Inoue, L. Meadows, S. Tanaka, M. Hori, M.
Lalith, and T. Hori. A fast implicit solver with low memory footprint and high scalability for
comprehensive earthquake simulation system. In Research Poster for SC16, International
Conference for High Performance Computing, Networking, Storage and Analysis, November 2016.
Balazs Gerofi, Akio Shimada, Atsushi Hori, and Yutaka Ishikawa. Partially Separated Page Tables for
Efficient Operating System Assisted Hierarchical Memory Management on Heterogeneous
Architectures. In 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing (CCGrid), May 2013.
Balazs Gerofi, Akio Shimada, Atsushi Hori, Masamichi Takagi, and Yutaka Ishikawa. CMCP: A
Novel Page Replacement Policy for System Level Hierarchical Memory Management on Many-
cores. In Proceedings of the 23rd International Symposium on High-performance Parallel and
Distributed Computing, HPDC ’14, pages 73–84, New York, NY, USA, 2014. ACM.
Balazs Gerofi, Masamichi Takagi, Yutaka Ishikawa, Rolf Riesen, Evan Powers, and Robert W.
Wisniewski. Exploring the Design Space of Combining Linux with Lightweight Kernels for
Extreme Scale Computing. In Proceedings of ROSS’15, pages 1–8. ACM, 2015.
Y. Hirokawa, T. Boku, S. A. Sato, and K. Yabana. Performance evaluation of large scale electron
dynamics simulation under many-core cluster based on Knights Landing. In HPC Asia 2018,
January 2018.
T. Ichimura, K. Fujita, P. E. B. Quinay, L. Maddegedara, M. Hori, S. Tanaka, Y. Shizawa, H.
Kobayashi, and K. Minami. Implicit nonlinear wave simulation with 1.08T DOF and 0.270T
unstructured finite elements to enhance comprehensive earthquake simulation. In Proceedings of
the International Conference for High Performance Computing, Networking, Storage and Analysis
(SC’15), ACM, November 2015.
T. Ichimura, K. Fujita, S. Tanaka, M. Hori, M. Lalith, Y. Shizawa, and H. Kobayashi. Physics-based
urban earthquake simulation enhanced by 10.7 BlnDOF × 30K time-step unstructured FE non-
linear seismic wave simulation. In Proceedings of the International Conference for High
Performance Computing, Networking, Storage and Analysis (SC’14), IEEE, November 2014.
K. Nakajima, M. Satoh, T. Furumura, H. Okuda, T. Iwashita, H. Sakaguchi, T. Katagiri, M.
Matsumoto, S. Ohshima, H. Jitsumoto, T. Arakawa, F. Mori, T. Kitayama, A. Ida, and M. Y.
Matsuo. ppOpen-HPC: Open source infrastructure for development and execution of large-scale
scientific applications on post-peta-scale supercomputers with automatic tuning (AT). In
Optimization in the Real World — Towards Solving Real-World Optimization Problems, volume
13 of Mathematics for Industry, pages 15–35, 2015.
A. Petitet, R. C. Whaley, J. Dongarra, and A. Cleary. HPL – A Portable Implementation of the High-
Performance Linpack Benchmark for Distributed-Memory Computers.
S. A. Sato and K. Yabana. Maxwell + TDDFT multi-scale simulation for laser-matter interactions. J.
Adv. Simulat. Sci. Eng., 1(1), 2014.
Taku Shimosawa. Operating System Organization for Manycore Systems. PhD dissertation, The University
of Tokyo, 2012.
Taku Shimosawa, Balazs Gerofi, Masamichi Takagi, Gou Nakamura, Tomoki Shirasawa, Yuji Saeki,
Masaaki Shimizu, Atsushi Hori, and Yutaka Ishikawa. Interface for Heterogeneous Kernels: A
Framework to Enable Hybrid OS Designs Targeting High Performance Computing on Manycore
Architectures. In 2014 21st International Conference on High Performance Computing (HiPC),
December 2014.