Writing Efficient R Code

The document walks through techniques for benchmarking and optimizing R code, including: 1) measuring the read times of CSV and RDS files with system.time() and comparing them with microbenchmark(); 2) using benchmarkme to check the machine's RAM and CPU specs; 3) timing a growing vector against a pre-allocated one with system.time(); 4) comparing vectorized operations (multiplication and log-sums) with for loops; 5) timing column and row selection from a data frame versus a matrix with microbenchmark(). Later sections cover profiling with profvis and parallelizing code with the parallel package.

Uploaded by Octavio Flores

The Art of Benchmarking

R version
# Print the R version details using version
version

# Assign the variable major to the major component
major <- version$major

# Assign the variable minor to the minor component
minor <- version$minor
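The major and minor components are character strings, so comparing them directly is fragile. A small base-R sketch of turning them into a full version string and checking a minimum version with getRversion(); the 3.5.0 threshold is an arbitrary example, not a course requirement.

```r
# version$major and version$minor are character strings, e.g. "4" and "3.1"
major <- version$major
minor <- version$minor

# Paste them together to get the full version string, e.g. "4.3.1"
full_version <- paste(major, minor, sep = ".")
print(full_version)

# getRversion() returns a package_version object, which compares
# component by component rather than alphabetically
is_recent <- getRversion() >= "3.5.0"
print(is_recent)
```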

Comparing read times of CSV and RDS files

# How long does it take to read movies from CSV?
system.time(read.csv("movies.csv"))

# How long does it take to read movies from RDS?
system.time(readRDS("movies.rds"))
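movies.csv and movies.rds come from the course environment. The sketch below is self-contained: it writes a hypothetical simulated data frame (movies_like) to temporary CSV and RDS files and times reading each one back. system.time() reports user, system and elapsed (wall-clock) seconds; elapsed is usually the figure to compare.

```r
# Simulated stand-in for the movies data (hypothetical columns)
set.seed(42)
movies_like <- data.frame(
  year   = sample(1900:2005, 50000, replace = TRUE),
  rating = round(runif(50000, 1, 10), 1),
  Comedy = sample(0:1, 50000, replace = TRUE)
)

# Write the same data in both formats to temporary files
csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
write.csv(movies_like, csv_file, row.names = FALSE)
saveRDS(movies_like, rds_file)

# Time each read and keep the data that was read
csv_time <- system.time(from_csv <- read.csv(csv_file))
rds_time <- system.time(from_rds <- readRDS(rds_file))

# Compare wall-clock (elapsed) seconds
csv_time[["elapsed"]]
rds_time[["elapsed"]]

# RDS stores the R object as-is, so the round trip reproduces it exactly
identical(from_rds, movies_like)

# Clean up the temporary files
unlink(c(csv_file, rds_file))
```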

Elapsed time
# Load the package
library(microbenchmark)

# Compare the two functions, running each expression 10 times
compare <- microbenchmark(read.csv("movies.csv"),
                          readRDS("movies.rds"),
                          times = 10)

# Print compare
compare
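microbenchmark() runs each expression many times and reports the distribution of timings, because a single system.time() call is noisy. As a rough base-R illustration of that idea (not how microbenchmark is implemented; it uses a higher-resolution timer), the hypothetical helper time_repeated() below evaluates an expression `times` times and summarizes the elapsed seconds. system.time() may report 0 for very fast expressions.

```r
# Hypothetical helper: evaluate an expression `times` times and
# return the min, median and max elapsed seconds
time_repeated <- function(expr, times = 10) {
  expr <- substitute(expr)   # capture the expression unevaluated
  env <- parent.frame()      # evaluate it where the caller wrote it
  elapsed <- vapply(seq_len(times), function(i) {
    system.time(eval(expr, env))[["elapsed"]]
  }, numeric(1))
  c(min = min(elapsed), median = median(elapsed), max = max(elapsed))
}

# Self-contained test data written to a temporary RDS file
rds_file <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = runif(1e4), y = runif(1e4)), rds_file)

rds_stats <- time_repeated(readRDS(rds_file), times = 10)
print(rds_stats)
unlink(rds_file)
```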

DataCamp hardware
# Load the package
library(benchmarkme)

# Assign the variable ram to the amount of RAM on this machine
ram <- get_ram()

# Assign the variable cpu to the cpu specs
cpu <- get_cpu()

Benchmark DataCamp's machine
# Load the package
library("benchmarkme")

# Run the benchmark: a single run of reading and writing a 5 MB file
res <- benchmark_io(runs = 1, size = 5)

# Plot the results
plot(res)

Timings - growing a vector
# growing() and pre_allocate() are assumed to be defined already
n <- 30000
# Use <- with system.time() to store the result as res_grow
system.time(res_grow <- growing(n))

Timings - pre-allocation
# Use <- with system.time() to store the result as res_allocate
n <- 30000
system.time(res_allocate <- pre_allocate(n))
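The exercise assumes growing() and pre_allocate() already exist. A plausible sketch of what they might look like (hypothetical definitions that fill the vector with rnorm() draws): growing() extends the vector with c() on every iteration, so R allocates a new vector and copies the old one each time, while pre_allocate() creates a vector of the final length once and then fills it. A smaller, arbitrary n keeps the run short.

```r
# Hypothetical definition: grow the vector one element at a time.
# c(x, ...) allocates a new vector and copies x on every iteration.
growing <- function(n) {
  x <- NULL
  for (i in seq_len(n)) x <- c(x, rnorm(1))
  x
}

# Hypothetical definition: allocate the full vector once, then fill it
pre_allocate <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- rnorm(1)
  x
}

n <- 10000
set.seed(1)
grow_time <- system.time(res_grow <- growing(n))
set.seed(1)
alloc_time <- system.time(res_allocate <- pre_allocate(n))

# Same random draws, so both approaches return identical vectors
identical(res_grow, res_allocate)
c(growing = grow_time[["elapsed"]], pre_allocate = alloc_time[["elapsed"]])
```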

Vectorized code: multiplication
# x is a numeric vector defined earlier; square each element without a loop
# Store your answer as x2_imp
x2_imp <- x * x
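For comparison, the loop that x * x replaces might look like this; the vector x here is a hypothetical example, since the exercise's own x is not shown. Both versions give the same values, but the vectorized form does all the arithmetic in a single call to compiled code.

```r
set.seed(123)
x <- rnorm(10)

# Loop version: pre-allocate, then square element by element
x2 <- numeric(length(x))
for (i in seq_along(x)) {
  x2[i] <- x[i] * x[i]
}

# Vectorized version: one call, no explicit loop
x2_imp <- x * x

identical(x2, x2_imp)
```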

Vectorized code: calculating a log-sum

# Initial code
n <- 100
total <- 0
x <- runif(n)
for (i in 1:n) {
  total <- total + log(x[i])
}

# Rewrite in a single line. Store the result in log_sum
log_sum <- sum(log(x))
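The loop and the one-liner compute the same quantity, but sum() may accumulate in extended precision, so the two results can differ in the last few bits; all.equal() is the safer comparison. This sketch also times both approaches on a larger, arbitrarily chosen n.

```r
set.seed(10)
n <- 1e6
x <- runif(n)

# Loop version
loop_time <- system.time({
  total <- 0
  for (i in seq_len(n)) total <- total + log(x[i])
})

# Vectorized version: log() over the whole vector, then one sum()
vec_time <- system.time(log_sum <- sum(log(x)))

# Compare with a tolerance rather than exact equality
isTRUE(all.equal(total, log_sum))
c(loop = loop_time[["elapsed"]], vectorized = vec_time[["elapsed"]])
```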

Data frames and matrices - column selection
# mat and df are assumed to hold the same data as a matrix and a data frame
# Which is faster, mat[, 1] or df[, 1]?
microbenchmark(mat[, 1], df[, 1])

Row timings
# Which is faster, mat[1, ] or df[1, ]?
microbenchmark(mat[1, ], df[1, ])
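mat and df are not shown in the exercise. The sketch builds a hypothetical matrix and a data frame holding the same numbers to show why the two selections behave differently. A data frame is a list of columns, so df[, 1] and mat[, 1] both return a plain numeric vector, whereas df[1, ] has to build a new one-row data frame while mat[1, ] just returns a numeric vector. The timing loops use base R only.

```r
set.seed(7)
mat <- matrix(runif(3000), ncol = 3)
df <- as.data.frame(mat)

# Column selection: both return a plain numeric vector with the same values
col_mat <- mat[, 1]
col_df <- df[, 1]
identical(col_mat, col_df)

# Row selection: the matrix gives a numeric vector,
# the data frame gives a new one-row data frame
row_mat <- mat[1, ]
row_df <- df[1, ]
class(row_mat)
class(row_df)

# Rough timing of 10,000 row selections
system.time(for (i in 1:10000) mat[1, ])
system.time(for (i in 1:10000) df[1, ])
```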

Profvis in action
# Load the profvis package
library(profvis)

# Profile the following code (assumes a movies data frame with
# Comedy, year and rating columns is already loaded)
profvis({
  # Load and select data
  comedies <- movies[movies$Comedy == 1, ]

  # Plot data of interest
  plot(comedies$year, comedies$rating)

  # Loess regression line
  model <- loess(rating ~ year, data = comedies)
  j <- order(comedies$year)

  # Add fitted line to the plot
  lines(comedies$year[j], model$fitted[j], col = "red")
})
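profvis records where time is spent, line by line. Without the package, a much coarser version of the same question can be answered by timing each step separately with system.time(). The sketch uses a hypothetical simulated data frame (movies_sim) in place of movies and leaves out the two plotting steps so it runs without a graphics device.

```r
# Hypothetical simulated data standing in for movies
set.seed(99)
movies_sim <- data.frame(
  year   = sample(1900:2005, 20000, replace = TRUE),
  rating = round(runif(20000, 1, 10), 1),
  Comedy = sample(0:1, 20000, replace = TRUE)
)

# Time each non-plotting step separately
t_subset <- system.time(comedies <- movies_sim[movies_sim$Comedy == 1, ])
t_loess  <- system.time(model <- loess(rating ~ year, data = comedies))
t_order  <- system.time(j <- order(comedies$year))

# Elapsed seconds per step
c(subset = t_subset[["elapsed"]],
  loess  = t_loess[["elapsed"]],
  order  = t_order[["elapsed"]])
```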

Change the data frame to a matrix

# Load the microbenchmark package
library(microbenchmark)

# The previous data frame solution is defined
# d() simulates six dice rolls, stored as a 3 x 2 data frame
d <- function() {
  data.frame(
    d1 = sample(1:6, 3, replace = TRUE),
    d2 = sample(1:6, 3, replace = TRUE)
  )
}

# Complete the matrix solution: the same six rolls as a 3 x 2 matrix
m <- function() {
  matrix(sample(1:6, 6, replace = TRUE), ncol = 2)
}

# Use microbenchmark to time m() and d()
microbenchmark(
  data.frame_solution = d(),
  matrix_solution = m()
)
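Without microbenchmark installed, the same comparison can be roughed out in base R by calling each function many times inside system.time(). The sketch redefines d() and m() so it runs on its own and checks that both return six rolls in a 3 x 2 layout; the repetition count of 5,000 is arbitrary.

```r
d <- function() {
  data.frame(
    d1 = sample(1:6, 3, replace = TRUE),
    d2 = sample(1:6, 3, replace = TRUE)
  )
}
m <- function() {
  matrix(sample(1:6, 6, replace = TRUE), ncol = 2)
}

# Both represent six rolls laid out as 3 rows x 2 columns
dim(d())
dim(m())

# Call each many times; data.frame() does extra checking on construction
system.time(for (i in 1:5000) d())
system.time(for (i in 1:5000) m())
```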

Calculating row sums

# Example data: rolls is assumed to be a numeric matrix of dice rolls
rolls

# Define the previous solution
app <- function(x) {
  apply(x, 1, sum)
}

# Define the new solution
r_sum <- function(x) {
  rowSums(x)
}

# Compare the methods
microbenchmark(
  app_sol = app(rolls),
  r_sum_sol = r_sum(rolls)
)
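rolls is not shown, so the sketch uses a hypothetical integer matrix of dice rolls. One detail worth knowing: apply(x, 1, sum) on an integer matrix returns integers, while rowSums() returns doubles, so the values agree but identical() on the raw results would be FALSE.

```r
set.seed(2024)
rolls <- matrix(sample(1:6, 2 * 1e5, replace = TRUE), ncol = 2)

app_res <- apply(rolls, 1, sum)
rs_res <- rowSums(rolls)

typeof(app_res)  # "integer"
typeof(rs_res)   # "double"
identical(as.numeric(app_res), rs_res)

# apply() calls sum() once per row from R; rowSums() does the loop in C
system.time(apply(rolls, 1, sum))
system.time(rowSums(rolls))
```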

Use && instead of &

# Example data: is_double is assumed to be a logical vector of length 3
is_double

# Define the previous solution
move <- function(is_double) {
  if (is_double[1] & is_double[2] & is_double[3]) {
    current <- 11 # Go To Jail
  }
}

# Define the improved solution: && stops at the first FALSE
improved_move <- function(is_double) {
  if (is_double[1] && is_double[2] && is_double[3]) {
    current <- 11 # Go To Jail
  }
}

# microbenchmark both solutions
microbenchmark(move(is_double), improved_move(is_double), times = 1e5)
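The speed-up comes from short-circuiting: && evaluates its right-hand side only when the left-hand side is TRUE, while & always evaluates both sides. The base-R sketch below makes this visible with a hypothetical counter function, check(), that records how often it is called.

```r
calls <- 0
# Hypothetical helper: returns its argument and counts each call
check <- function(value) {
  calls <<- calls + 1
  value
}

calls <- 0
res_and <- check(FALSE) & check(TRUE) & check(TRUE)
calls_and <- calls      # all three operands were evaluated: 3

calls <- 0
res_andand <- check(FALSE) && check(TRUE) && check(TRUE)
calls_andand <- calls   # evaluation stopped after the first FALSE: 1

c(and = calls_and, andand = calls_andand)
```

Note that && expects single TRUE/FALSE values; since R 4.3.0, passing a longer vector to && is an error, which fits the exercise indexing is_double[1], is_double[2] and is_double[3] one at a time.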

How many cores does this machine have?

# Load the parallel package
library(parallel)

# Store the number of cores in the object no_of_cores
no_of_cores <- detectCores()

# Print no_of_cores
no_of_cores
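detectCores() can return NA when the count cannot be determined, and by default it counts logical cores (including hyperthreads). A common defensive pattern, sketched here rather than prescribed by the course, is to leave one core free for the rest of the system and never drop below a single worker.

```r
library(parallel)

# Logical cores (may include hyperthreads); NA if undetectable
logical_cores <- detectCores()
# Physical cores; the logical argument is not honoured on every platform
physical_cores <- detectCores(logical = FALSE)
c(logical = logical_cores, physical = physical_cores)

# Leave one core free, but always use at least one worker
n_workers <- max(1L, logical_cores - 1L, na.rm = TRUE)
n_workers
```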

Moving to parApply
# Determine the number of available cores
detectCores()

# Create a cluster via makeCluster
cl <- makeCluster(2)

# Parallelize this code: the median of each column (MARGIN = 2) of dd,
# a numeric matrix assumed to be defined already
parApply(cl, dd, 2, median)

# Stop the cluster
stopCluster(cl)
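dd is not shown in the exercise. This self-contained sketch creates a hypothetical numeric matrix, computes its column medians in parallel with parApply() on a two-worker cluster, and checks the answer against serial apply().

```r
library(parallel)

set.seed(5)
dd <- matrix(rnorm(1e4 * 10), ncol = 10)

cl <- makeCluster(2)
# MARGIN = 2: apply median() to each column, with columns split across workers
par_medians <- parApply(cl, dd, 2, median)
stopCluster(cl)

# Serial reference result
ser_medians <- apply(dd, 2, median)
all.equal(par_medians, ser_medians)
```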

Using parSapply()
library("parallel")
# Create a cluster via makeCluster (2 cores)
cl <- makeCluster(2)

# Export the play() function to the cluster; each worker is a separate R session
clusterExport(cl, "play")

# Re-write sapply as parSapply
res <- parSapply(cl, 1:100, function(i) play())

# Stop the cluster
stopCluster(cl)
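play() is the course's game-simulation function and is not shown. The sketch substitutes a hypothetical play() that rolls two dice and returns their total, so the export step can be seen working. Workers are fresh R sessions that know nothing about objects in the main session, which is why clusterExport() is needed; clusterSetRNGStream() gives the workers independent, reproducible random-number streams.

```r
library(parallel)

# Hypothetical stand-in for the course's play(): the total of two dice
play <- function() sum(sample(1:6, 2, replace = TRUE))

cl <- makeCluster(2)
# Copy play() from this environment to every worker
clusterExport(cl, "play", envir = environment())
# Reproducible, independent random streams on the workers
clusterSetRNGStream(cl, iseed = 123)

res <- parSapply(cl, 1:100, function(i) play())
stopCluster(cl)

table(res)
```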

Timings parSapply()
# Set the number of games to play
no_of_games <- 1e5

## Time serial version
system.time(serial <- sapply(1:no_of_games, function(i) play()))

## Set up cluster
cl <- makeCluster(4)
clusterExport(cl, "play")

## Time parallel version
system.time(par <- parSapply(cl, 1:no_of_games, function(i) play()))

## Stop cluster
stopCluster(cl)
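The two system.time() results can be combined into a speed-up ratio. Whether the parallel version wins depends on how long each task runs relative to the cost of shipping work to the workers; for very cheap tasks the overhead can dominate. A sketch with the same kind of hypothetical dice play(), two workers and a smaller, arbitrary number of games:

```r
library(parallel)

# Hypothetical stand-in for the course's play()
play <- function() sum(sample(1:6, 2, replace = TRUE))
no_of_games <- 1e4

serial_time <- system.time(serial <- sapply(1:no_of_games, function(i) play()))

cl <- makeCluster(2)
clusterExport(cl, "play", envir = environment())
par_time <- system.time(par_res <- parSapply(cl, 1:no_of_games, function(i) play()))
stopCluster(cl)

# A speed-up above 1 means the parallel run was faster in wall-clock time
speedup <- serial_time[["elapsed"]] / par_time[["elapsed"]]
speedup
```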