LAMMPS on GPUs

A Tutorial
W. Michael Brown, Peng Wang, Paul S. Crozier, Steve Plimpton

Wednesday, February 24, 2010

Why run on GPUs?

Technology paid for by gamers, but impact on scientific computing is now well recognized
Cheap, low-power (electrical) solution for data parallelism
240+ cores on a GPU
High memory bandwidth

Porting LAMMPS to GPUs

Still largely a research effort
Marc Adams (Nvidia)
Pratul Agarwal (ORNL)
Sarah Anderson (Cray)
Mike Brown (Sandia)
Paul Crozier (Sandia)
Massimiliano Fatica (Nvidia)
Scott Hampton (ORNL)
Ricky Kendall (ORNL)
Hyesoon Kim (Ga Tech)
Axel Kohlmeyer (Temple)
Doug Kothe (ORNL)
Scott LeGrand (Nvidia)
Ben Levine (Temple)
Christian Mueller (UTI Germany)
Steve Plimpton (Sandia)
Duncan Poole (Nvidia)
Steve Poole (ORNL)
Jason Sanchez (RPI)
Arnold Tharrington (ORNL)
John Turner (ORNL)
Peng Wang (Nvidia)
Lars Winterfeld (UTI Germany)
Andrew Zonenberg (RPI)

Currently Available in Main LAMMPS

Lennard-Jones
    Force/Neighbor

Gay-Berne Potential
    Force

More capabilities soon

How to Run LAMMPS on Your GPU

1. Do you have a GPU?

For single precision
Currently need a CUDA-enabled GPU with compute capability >= 1.1

For double precision
Currently need a CUDA-enabled GPU with compute capability >= 1.3

To identify your GPU:
Windows: Device Manager
Apple: Apple Menu -> About this Mac -> More Info -> Graphics/Displays
Linux: nvidia-settings or /sbin/lspci | grep -i nvidia

List of CUDA-enabled GPUs: http://www.nvidia.com/object/cuda_gpus.html
Can use a device query to get the compute capability (see step 4)

2. Do you have CUDA?

http://developer.nvidia.com/object/cuda_2_3_downloads.html
Need driver and toolkit only
Need to have the nvcc compiler in your path
Pay attention to 32- or 64-bit
    No 64-bit on Apple!

32-bit:
    set path = ( $path /usr/local/cuda/bin )
    setenv LD_LIBRARY_PATH /usr/local/cuda/lib/
or, 64-bit:
    set path = ( $path /usr/local/cuda/bin )
    setenv LD_LIBRARY_PATH /usr/local/cuda/lib64/
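
Before moving on, it can help to confirm that the shell settings took effect. The following is a minimal sketch, assuming the default install root /usr/local/cuda used on the slide; the helper names (`cuda_lib_dir`, `find_nvcc`, `check_environment`) are hypothetical and exist only for this example.

```python
import os
import shutil
from typing import List, Optional


def cuda_lib_dir(is_64bit: bool, cuda_root: str = "/usr/local/cuda") -> str:
    """Pick lib64 for a 64-bit toolkit and lib for a 32-bit one."""
    return cuda_root + ("/lib64" if is_64bit else "/lib")


def find_nvcc() -> Optional[str]:
    """Return the full path of nvcc if it is on PATH, otherwise None."""
    return shutil.which("nvcc")


def check_environment(is_64bit: bool) -> List[str]:
    """Collect readable problems rather than stopping at the first one."""
    problems = []
    if find_nvcc() is None:
        problems.append("nvcc not found on PATH")
    wanted = cuda_lib_dir(is_64bit)
    entries = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if wanted not in [entry.rstrip("/") for entry in entries]:
        problems.append("LD_LIBRARY_PATH does not include " + wanted)
    return problems


if __name__ == "__main__":
    for problem in check_environment(is_64bit=True):
        print("warning:", problem)
```

An empty list from `check_environment` only means the two settings from this step are present; it says nothing about whether the driver itself is installed.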

3. Edit LAMMPS GPU Makefile

set LROOT = /home/wmbrown/lammps-20Feb10
cd $LROOT/lib/gpu
emacs Makefile.nvidia

3. Edit LAMMPS GPU Makefile (2)

BIN_DIR = .
OBJ_DIR = .
AR = ar
CUDA_CPP = nvcc -I/usr/local/cuda/include -DUNIX -O3 -Xptxas -v -use_fast_math
CUDA_ARCH = -arch=sm_13
CUDA_PREC = -D_SINGLE_SINGLE
CUDA_LINK = -L/usr/local/cuda/lib64 -lcudart $(CUDA_LIB)

For compute capability >= 1.3 can also use:

CUDA_PREC = -D_SINGLE_DOUBLE # Double precision accumulation
or
CUDA_PREC = -D_DOUBLE_DOUBLE # Double precision everything

For Apple, must compile 32-bit

CUDA_ARCH = -arch=sm_13 -m32
CUDA_LINK = -L/usr/local/cuda/lib -lcudart $(CUDA_LIB)

For compiler >= g++ 4.4 on Linux

CUDA_ARCH = -arch=sm_13 --compiler-bindir=/usr/bin/gcc-4.3
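
The Makefile variants above differ in only three variables. As a rough sketch, the choices can be collected in one place; `makefile_settings` and its parameter names are invented for this example, and the flag strings are copied from the slides (with `-m32` written with its leading dash). The gcc workaround is applied only on Linux, since that is the only case the slides describe.

```python
from typing import List

# Precision flags exactly as they appear in Makefile.nvidia on the slides.
PRECISION_FLAGS = {
    "single_single": "-D_SINGLE_SINGLE",   # single precision everywhere
    "single_double": "-D_SINGLE_DOUBLE",   # double precision accumulation
    "double_double": "-D_DOUBLE_DOUBLE",   # double precision everything
}


def makefile_settings(precision: str = "single_single",
                      apple: bool = False,
                      gcc_44_or_newer: bool = False) -> List[str]:
    """Return the three Makefile.nvidia lines that change between setups."""
    if precision not in PRECISION_FLAGS:
        raise ValueError("unknown precision: " + precision)
    arch = "-arch=sm_13"
    if apple:
        arch += " -m32"                       # Apple must compile 32-bit
        lib_dir = "/usr/local/cuda/lib"
    else:
        lib_dir = "/usr/local/cuda/lib64"
        if gcc_44_or_newer:
            # Point nvcc at an older host compiler, as on the slide.
            arch += " --compiler-bindir=/usr/bin/gcc-4.3"
    return [
        "CUDA_ARCH = " + arch,
        "CUDA_PREC = " + PRECISION_FLAGS[precision],
        "CUDA_LINK = -L" + lib_dir + " -lcudart $(CUDA_LIB)",
    ]


if __name__ == "__main__":
    print("\n".join(makefile_settings("single_double", gcc_44_or_newer=True)))
```

The two double-precision variants still require a GPU with compute capability >= 1.3; the sketch does not check that.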

4. Make LAMMPS GPU lib

make -f Makefile.nvidia
./nvc_get_devices

Device 0: "GeForce GTX 295"
  Revision number:                               1.3
  Total amount of global memory:                 0.87 GB
  Number of multiprocessors:                     30
  Number of cores:                               240
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.24 GHz
  Concurrent copy and execution:                 Yes
Device 1: "Tesla C1060"
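
The "Revision number" entry in this listing is the compute capability asked about in step 1. If the listing needs to be read by a script, something like the following sketch might do. It assumes that each property is printed as `label: value` on its own line under a `Device N: "name"` header, which is a reading of the slide rather than a documented format; `parse_devices` is a hypothetical helper and the sample text is abbreviated.

```python
from typing import Dict, List


def parse_devices(text: str) -> List[Dict]:
    """Group 'label: value' lines under the preceding 'Device N:' header."""
    devices = []
    for raw in text.splitlines():
        label, sep, value = raw.strip().partition(":")
        if not sep:
            continue                      # not a 'label: value' line
        label, value = label.strip(), value.strip()
        if label.startswith("Device "):
            devices.append({"name": value.strip('"'), "properties": {}})
        elif devices:
            devices[-1]["properties"][label] = value
    return devices


# Abbreviated sample in the assumed layout (values taken from the slide).
SAMPLE = '''Device 0: "GeForce GTX 295"
  Revision number: 1.3
  Total amount of global memory: 0.87 GB
  Number of multiprocessors: 30
  Number of cores: 240
Device 1: "Tesla C1060"
'''

if __name__ == "__main__":
    for index, device in enumerate(parse_devices(SAMPLE)):
        print(index, device["name"], device["properties"].get("Revision number"))
```
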

5. Edit LAMMPS Makefile as Necessary

cd $LROOT/src
emacs ./MAKE/Makefile.linux

If you are not 64-bit (or are on Apple)
gpu_SYSPATH = -L/usr/local/cuda/lib

If you are using Apple, compile LAMMPS 32-bit to link with GPU library
CC = g++ -m32
LINK = g++ -m32

make clean

6. Add GPU Package to LAMMPS

cd $LROOT/src
make yes-asphere
make yes-gpu
make linux

7. Modify your input script

cd $LROOT/bench
emacs in.lj

Must add "newton off" to the beginning of the script and "/gpu" to a supported pair_style:

newton off
...
pair_style lj/cut/gpu one/node 0 2.5

Here one/node is the GPU selection keyword and 0 is the GPU ID.
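
The same edit can be applied mechanically. The sketch below is illustrative only: `add_gpu_support` is a hypothetical helper, the list of supported styles contains just `lj/cut` because that is the style shown on the slide, and the two-line test script is made up rather than copied from in.lj. Any existing `newton` command is dropped so that it cannot override the `newton off` placed at the top, which is a design choice of this example.

```python
SUPPORTED_STYLES = ("lj/cut",)   # assumption: extend as more /gpu styles appear


def add_gpu_support(script: str, keyword: str = "one/node", gpu_id: int = 0) -> str:
    """Prepend 'newton off' and switch supported pair styles to their /gpu form."""
    out = []
    for line in script.splitlines():
        words = line.split()
        if words and words[0] == "newton":
            continue                      # replaced by 'newton off' at the top
        if len(words) >= 2 and words[0] == "pair_style" and words[1] in SUPPORTED_STYLES:
            # Layout from the slide: style/gpu, keyword, GPU ID, original arguments.
            parts = ["pair_style", words[1] + "/gpu", keyword, str(gpu_id)] + words[2:]
            line = " ".join(parts)
        out.append(line)
    return "\n".join(["newton off"] + out) + "\n"


if __name__ == "__main__":
    print(add_gpu_support("units lj\npair_style lj/cut 2.5\n"), end="")
```
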

7. Modify your input script (2)

GPU Selection Keyword
one/node - single compute "node", which may have multiple cores and/or GPUs. GpuID should be set to the ID of the (first) GPU you wish to use with LAMMPS.
one/gpu - multiple compute "nodes" with one GPU per node. GpuID should be set to the ID of the GPU.
multi/gpu - multiple compute "nodes" on your system with multiple GPUs. GpuID should be set to the number of GPUs per node.
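
Note that the number after the keyword means different things: a GPU ID for one/node and one/gpu, but a GPU count for multi/gpu. A short sketch of that rule follows; the function names and parameters (`gpu_id_argument`, `pair_style_line`, `first_gpu_id`, `gpus_per_node`) are hypothetical and only encode the three descriptions above.

```python
def gpu_id_argument(keyword: str, first_gpu_id: int = 0, gpus_per_node: int = 1) -> int:
    """Return the value to put in the GpuID slot for a given selection keyword."""
    if keyword in ("one/node", "one/gpu"):
        return first_gpu_id       # an ID: which GPU to use (the first one for one/node)
    if keyword == "multi/gpu":
        return gpus_per_node      # a count: how many GPUs each node has
    raise ValueError("unknown GPU selection keyword: " + keyword)


def pair_style_line(keyword: str, cutoff: float = 2.5, **gpu_args: int) -> str:
    """Build an lj/cut/gpu pair_style line in the layout shown in step 7."""
    return "pair_style lj/cut/gpu {} {} {}".format(
        keyword, gpu_id_argument(keyword, **gpu_args), cutoff)


if __name__ == "__main__":
    print(pair_style_line("one/node"))
    print(pair_style_line("multi/gpu", gpus_per_node=2))
```
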

8. Run your input script

Number of procs = number of GPUs you want

mpirun -np 3 lmp_linux < in.lj

--------------------------------------------------------------------------
Using GPGPU acceleration for LJ-Cut:
--------------------------------------------------------------------------
GPU 1: Tesla C1060, 240 cores, 4 GB, 1.3 GHZ
GPU 2: Tesla C1060, 240 cores, 4 GB, 1.3 GHZ
GPU 3: GeForce GTX 295, 240 cores, 0.87 GB, 1.2 GHZ
--------------------------------------------------------------------------

--------------------------------------------------------------------------
GPU Time Stamps:
--------------------------------------------------------------------------
Atom copy: 0.07111 s.
Neighbor copy: 0.0004615 s.
LJ calc: 0.1702 s.
Answer copy: 0 s.
--------------------------------------------------------------------------
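
The GPU time stamps are easier to compare as fractions of the total. The sketch below assumes each timing is printed as `label: seconds s.` on its own line; the regular expression and the helper names (`parse_time_stamps`, `fractions`) are written for this example only. With the numbers from the slide, the LJ calculation accounts for roughly 70% of the reported GPU time and the atom copy for most of the rest.

```python
import re
from typing import Dict

# Matches lines such as "Atom copy: 0.07111 s." and ignores everything else.
TIMING = re.compile(
    r"^\s*(?P<label>[^:]+):\s*(?P<seconds>[0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)\s*s\.?\s*$"
)


def parse_time_stamps(text: str) -> Dict[str, float]:
    """Collect the per-phase GPU timings from LAMMPS screen output."""
    times = {}
    for line in text.splitlines():
        match = TIMING.match(line)
        if match:
            times[match.group("label").strip()] = float(match.group("seconds"))
    return times


def fractions(times: Dict[str, float]) -> Dict[str, float]:
    """Express each phase as a share of the summed GPU time."""
    total = sum(times.values())
    return {label: (value / total if total else 0.0) for label, value in times.items()}


SAMPLE = """GPU Time Stamps:
Atom copy: 0.07111 s.
Neighbor copy: 0.0004615 s.
LJ calc: 0.1702 s.
Answer copy: 0 s.
"""

if __name__ == "__main__":
    for label, share in fractions(parse_time_stamps(SAMPLE)).items():
        print("{:<14} {:6.1%}".format(label, share))
```
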

9. Speed-ups
Depends on
    Your CPU
    Your GPU
    Number of particles
    Cutoff
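
One way to see why particle count and cutoff matter: for a roughly uniform system, the number of neighbors inside the cutoff grows with the cube of the cutoff, so the pair-force work offloaded to the GPU grows with it. The sketch below is a back-of-the-envelope estimate of that work, not a speed-up prediction. The uniform-density assumption, the function names, and the example density of 0.8442 (reduced LJ units) are all choices made for this illustration; only the cutoff of 2.5 comes from the pair_style line in step 7.

```python
import math


def neighbors_within_cutoff(density: float, cutoff: float) -> float:
    """Average number of particles inside a sphere of radius `cutoff`."""
    return density * (4.0 / 3.0) * math.pi * cutoff ** 3


def pair_work_estimate(n_particles: int, density: float, cutoff: float) -> float:
    """Rough count of pair interactions evaluated per step (newton off, so no halving)."""
    return n_particles * neighbors_within_cutoff(density, cutoff)


if __name__ == "__main__":
    density = 0.8442   # assumed example value in reduced units
    for cutoff in (2.5, 5.0):
        print(cutoff, round(neighbors_within_cutoff(density, cutoff), 1))
```

Doubling the cutoff multiplies the estimate by eight, while doubling the number of particles only doubles it.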

More talks showing the GPU acceleration in LAMMPS to come

Questions
