
Introduction to parallel computing in R

Clint Leach
April 10, 2014

1 Motivation
When working with R, you will often encounter situations in which you need to repeat a
computation, or a series of computations, many times. This can be accomplished through
the use of a for loop. However, if there are a large number of computations that need to be
carried out (e.g. many thousands), or if those individual computations are time-consuming
(e.g. many minutes), a for loop can be very slow. Fortunately, almost all computers now have
multicore processors, and as long as these computations do not need to communicate (i.e.
they are "embarrassingly parallel"), they can be spread across multiple cores and executed in
parallel, reducing computation time. Examples of these types of problems include:

• Running a simulation model under multiple different parameter sets,

• Running multiple MCMC chains simultaneously,

• Bootstrapping, cross-validation, etc.

2 Parallel backends
By default, R will not take advantage of all the cores available on a computer. In order to
execute code in parallel, you first have to make the desired number of cores available to R by
registering a 'parallel backend', which effectively creates a cluster to which computations can
be sent. Fortunately, there are a number of packages that will handle the nitty-gritty details
of this process for you:

• doMC (built on multicore, works for unix-alikes)

• doSNOW (built on snow, works for Windows)

• doParallel (built on parallel, works for both)

The parallel package is essentially a merger of multicore and snow, and automatically uses
the appropriate tool for your system, so I would recommend sticking with that.

Creating a parallel backend (i.e. cluster) is accomplished through just a few lines of code:

library(doParallel)

# Find out how many cores are available (if you don't already know)
detectCores()

## [1] 4

# Create cluster with desired number of cores
cl <- makeCluster(3)

# Register cluster
registerDoParallel(cl)

# Find out how many cores are being used
getDoParWorkers()

## [1] 3

The syntax for the other packages is essentially the same, just with register<Package>(cl).
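For example, a minimal sketch with doSNOW (assuming that package is installed) looks nearly identical:

library(doSNOW)

# Create and register a cluster, exactly as above
cl <- makeCluster(3)
registerDoSNOW(cl)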

3 Executing computations in parallel


Regardless of the application, parallel computing boils down to three basic steps: split the
problem into pieces, execute in parallel, and collect the results.

3.1 Using foreach


These steps can all be handled through the foreach package, which provides a
parallel analogue to the standard for loop.

library(foreach)

x <- foreach(i = 1:3) %dopar% sqrt(i)

## [[1]]
## [1] 1
##
## [[2]]
## [1] 1.414
##
## [[3]]
## [1] 1.732

As in a for loop, i defines an iterator (though note the use of = instead of in); iterating over
its values splits the problem into pieces (step 1). For each value of the iterator, the %dopar%
operator then passes the defined computation, here sqrt(i), to the available cores (step 2).
Also note that, in contrast to a for loop, foreach collects the results (step 3) and returns an
object, a list by default. This can be changed through the .combine option:

# Use the concatenate function to combine results
x <- foreach(i = 1:3, .combine = c) %dopar% sqrt(i)

# Now x is a vector
x

## [1] 1.000 1.414 1.732

# Can also use + or * to combine results
x <- foreach(i = 1:3, .combine = "+") %dopar% sqrt(i)

# Now x is a scalar, the sum of all the results
x

## [1] 4.146

Other options include rbind and cbind.
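For instance, a quick sketch that binds each iteration's result into a matrix row:

# rbind stacks each iteration's length-2 result as a row of a 3 x 2 matrix
m <- foreach(i = 1:3, .combine = rbind) %dopar% c(i, sqrt(i))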

It is also important to note that foreach will automatically export any necessary variables
(i.e. all variables defined in your current environment will be available to the cores), but any
packages needed by the computations must be passed using the .packages option.
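As a minimal sketch of the .packages option (MASS is chosen here purely for illustration):

# Each worker loads MASS before evaluating the body, so rlm() is found
boot.fits <- foreach(i = 1:3, .packages = "MASS") %dopar% {
    rlm(Volume ~ Girth, data = trees[sample(nrow(trees), replace = TRUE), ])
}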

3.2 Parallel apply functions


The parallel package also provides parallel analogues for the apply family of functions.

parLapply(cl, list(1, 2, 3), sqrt)

## [[1]]
## [1] 1
##
## [[2]]
## [1] 1.414
##
## [[3]]
## [1] 1.732
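There are also variants that simplify the result; for instance, parSapply returns a vector
where possible:

parSapply(cl, 1:3, sqrt)

## [1] 1.000 1.414 1.732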

3.3 Random number generation

If the calculations that you are parallelizing involve random number generation (as they often
will), you will need to explicitly set up your random number generators so that the random
numbers used by the different cores are independent. This can be handled fairly easily through
the doRNG package, which generates independent, reproducible random number streams for each
core:

library(doRNG)

# Set the random number seed manually before calling foreach
set.seed(123)

# Replace %dopar% with %dorng%
rand1 <- foreach(i = 1:5) %dorng% runif(3)

# Or set seed using .options.RNG option in foreach
rand2 <- foreach(i = 1:5, .options.RNG = 123) %dorng% runif(3)

# The two sets of random numbers are identical (i.e. reproducible)
identical(rand1, rand2)

## [1] TRUE

3.4 Task-specific packages


These packages can take advantage of a registered parallel backend without needing foreach:

• caret – classification and regression training; cross-validation, etc.; will automatically
parallelize if a backend is registered.

• bugsparallel – provides tools for running parallel MCMC chains in WinBUGS using
R2WinBUGS (still active?).

• dclone – MCMC methods for maximum likelihood estimation, running BUGS chains in
parallel.

• pls – partial least squares and principal component regression – built-in cross-validation
tools can take advantage of multiple cores by setting options.

• plyr – data manipulation and apply-like functions; can set options to run in parallel
(see the sketch after this list).
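For example, a minimal sketch with plyr (assuming a backend has been registered as above):

library(plyr)

# With .parallel = TRUE, llply sends its iterations to the registered backend
res <- llply(1:100, sqrt, .parallel = TRUE)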

3.5 Caveats and Warnings


• There is communication overhead to setting up a cluster – it is not worth it for simple
problems.

• Error handling – the default is to stop if one of the tasks produces an error, but you then
lose the output from any tasks that completed successfully; use the .errorhandling option
in foreach to control how errors should be treated (see the sketch after this list).

• You can use the 'Performance' tab of the Windows Task Manager to double check that things
are working correctly (you should see CPU usage on the desired number of cores).

• Shutting down the cluster – when you're done, be sure to close the parallel backend using
stopCluster(cl); otherwise you can run into problems later.
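As a sketch of the .errorhandling option, "remove" keeps going and simply drops the failed
tasks:

# With .errorhandling = "remove", errored tasks are dropped from the result
x <- foreach(i = 1:5, .errorhandling = "remove") %dopar% {
    if (i == 3) stop("task 3 failed") else sqrt(i)
}

# Only the four successful results remain
length(x)

## [1] 4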

3.6 A (slightly) more substantial example

Returning to the tree data you've been working with, say we have girth and volume measurements
for 100 species of trees, instead of just three, and we want to fit a linear regression for
each species. While this can still be done fairly quickly using a for loop on a single core,
we can save a few seconds by running it in parallel.

# Generate fake tree data set with 100 observations for 100 species
tree.df <- data.frame(species = rep(c(1:100), each = 100),
                      girth = runif(10000, 7, 40))
tree.df$volume <- tree.df$species/10 + 5 * tree.df$girth + rnorm(10000, 0, 3)

# Extract species IDs to iterate over
species <- unique(tree.df$species)

# Run foreach loop and store results in fits object
fits <- foreach(i = species, .combine = rbind) %dopar% {
    sp <- subset(tree.df, subset = species == i)
    fit <- lm(volume ~ girth, data = sp)
    return(c(i, fit$coefficients))
}

head(fits)

## (Intercept) girth
## result.1 1 0.09479 4.998
## result.2 2 0.79043 4.980
## result.3 3 0.06980 5.009
## result.4 4 0.25487 4.996
## result.5 5 -0.59531 5.055
## result.6 6 1.00723 4.984

# What if we want all of the info from the lm object? Change .combine
fullfits <- foreach(i = species) %dopar% {
    sp <- subset(tree.df, subset = species == i)
    fit <- lm(volume ~ girth, data = sp)
    return(fit)
}

attributes(fullfits[[1]])

## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
##
## $class
## [1] "lm"
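Since we are now done with the cluster created at the start, this is also where we would shut
down the backend, as noted in the caveats above:

# Close the parallel backend now that we're done
stopCluster(cl)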

4 Less embarrassing parallel problems

Multicore computing is also useful for carrying out single, large computations (e.g. inverting
very large matrices, fitting linear models to 'Big Data'). In these cases, the cores are
working together to carry out a single computation and thus need to communicate (i.e. the
problem is not 'embarrassingly parallel' anymore). This type of parallel computation is
considerably more difficult, but there are some packages that do most of the heavy lifting
for you:

• HiPLAR – High Performance Linear Algebra in R – automatically replaces default matrix
commands with multicore computations; Linux only, installation not necessarily
straightforward.

• pbdR – programming with big data in R – multicore matrix algebra and statistics;
available for all operating systems, with a potentially tractable install. Also has an
extensive introductory manual, "Speaking R with a parallel accent."

5 Additional Resources
This overview has covered a very thin slice of the tools available, both within the above
packages and in R more broadly. The help pages and vignettes for the above packages are
very useful and provide additional details and examples. The CRAN task view on parallel
computing is also a good resource for digging into the breadth of tools available:
http://cran.r-project.org/web/views/HighPerformanceComputing.html
