A Programmer’s Guide to R

Serafin Schoch
June 17, 2024

Introduction

The programming language R can be both surprisingly convenient and irritating. This guide will highlight some of R's unconventional behavior and explain it. Additionally, I'll cover basic concepts from other programming languages and either show how to do them in Base R or reference a package that covers them.

R is a rather old language, with its first stable version (1.0.0) released in 2000, and it is built upon the even older language S; both are primarily used in academic contexts (see History of R). This means Base R is old, and many beloved concepts don't exist in plain R or behave in unexpected ways. But don't worry, you don't need to relearn everything or miss out on these more modern functionalities. R comes with an ecosystem of well-crafted packages. In particular, the "tidyverse" package, a collection of several smaller packages, fills the gaps between Base R and modern programming languages. These packages consolidate recent developments in programming language design into neat wrappers and functions. Seriously, if you don't use R with the appropriate packages, you're just using legacy code (see Best Practices for R).

This guide won't cover the R basics. I'll assume you know how to install and use packages, or at least that you know how to figure it out yourself. I'll try to build a solid fundamental understanding of R that one would not easily find by googling or using other search engines. Certainly, you can get an even better understanding by reading the documentation. In contrast to the documentation, I'll provide you with an opinionated and concise selection of fundamentals.

Outline

I. Vectors and Piping: In this section, we explore why R, a functional programming language, lacks a map() function similar to Rust, Haskell, or Java, and why the length of "Hello World" is 1. We then demonstrate how the pipe operator in R can simplify function application and enhance code readability.

II. Lists and Dictionaries: Here we explore why indexing a vector and indexing a list follow the same rules, although the results look different on the surface. Along the way, we look at accessing values by name and at how to use "dictionaries" in R.

III. Dataframes: We'll explore the use of dataframes in R and the advantages of using the dplyr package for data manipulation. By comparing base R and dplyr syntax, we demonstrate how dplyr simplifies and enhances data operations.

IV. Functions and Methods: In this section, we explore the versatility of functions as objects, the utility of assertions for error handling within functions, and the implementation of class-specific methods using R's object-oriented features.

V. R Package and Rust: We'll have a look at creating R packages with a Rust backend. Rust integration is streamlined with rextendr. We'll use devtools to create and publish a package.

Some resources: the CheatSheet collection (the same is available on GitHub) and the R Language Definition.
A Programmer’s Guide to R • Serafin Schoch • June 17, 2024

I. Vectors and Piping

First, we address two questions: Why doesn't a functional programming language like R have a `map()` function like Rust, Haskell, or even Java? And why is the length of "Hello World" 1?

length("Hello World")
## [1] 1

Vectors

To answer the second question, we must understand that in R, (nearly) everything is a vector. Basic types like strings, numerics, and booleans (logicals) are vectors. There is no single string or boolean, only vectors of length 1 containing a string or boolean (see R Language Definition). Hence, the length of "Hello World" is 1, because it is a vector of length 1 containing the string "Hello World". We can get the number of characters in the string using the `nchar()` function.

nchar("Hello World")
## [1] 11

To create a vector of several strings, we combine them with `c()`. Knowing that the "Hello World" string is a vector and that we can apply `nchar()` to it, it should be possible to apply `nchar()` to this longer vector as well.

length(c("The","words","are"))
## [1] 3
nchar(c("The","words","are"))
## [1] 3 5 3

As expected, the length of the vector is 3, and the result of `nchar()` is a numeric vector containing the number of characters of each string.

Piping

Applying several functions in sequence can be cumbersome with base R syntax.

abs(nchar(c("The","words","are")) - 10)
## [1] 7 5 7

We can use the pipe operator `%>%` from the `magrittr` package, which the `dplyr` package re-exports, to make this more readable:

library(dplyr) # provides %>% (re-exported from magrittr)
c("The","words","are") %>%
  nchar(.) %>%
  {. - 10} %>%
  abs(.)
## [1] 7 5 7

Even though the code is longer, the order in which the functions and operations are applied is easier to understand. Notice how the syntax looks similar to `c(...).map(nchar).map(...)`. This is because `nchar`, `-`, and `abs` are vectorized. Note that you can also slice within a pipe, e.g. `%>% .[2]` for vectors or `%>% slice()` for dataframes.

Conclusion

Since (nearly) everything in R is a vector, most basic functions are vectorized and work on whole vectors, making a `map()` function unnecessary in most cases. Additionally, the pipe operator makes the code resemble the syntax of functional programming languages that use `map()` and makes the order of applied functions easier to follow. If you feel that the base R options don't resemble the iterator options of typical functional programming languages enough, have a look at the `purrr` package.
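Vectorization covers most cases, but not every function is vectorized: a function containing an `if` on a single value, for instance, cannot take a whole vector. For those cases, base R already ships map-like helpers. The sketch below uses the base R functions `vapply()`, `lapply()`, and `Map()`; the helper `describe_word()` is hypothetical and only serves as an example of a non-vectorized function.

```r
# Hypothetical non-vectorized helper: `if` needs a single TRUE/FALSE value
describe_word <- function(word) {
  if (nchar(word) > 3) "long" else "short"
}

words <- c("The", "words", "are")

# describe_word(words) fails (R >= 4.2.0 errors on a condition of length > 1),
# so apply it element by element instead.

# vapply(): map with a declared result type (one string per element)
labels <- vapply(words, describe_word, character(1), USE.NAMES = FALSE)
labels  # returns c("short", "long", "short")

# lapply(): the same mapping, but always returning a list
label_list <- lapply(words, describe_word)

# Map(): maps over several vectors in parallel and returns a list
pairs <- Map(function(w, n) paste(w, n), words, 1:3)
unlist(pairs)  # values "The 1", "words 2", "are 3" (named by the words)
```

`vapply()` is usually preferable to `sapply()` in code that others rely on, because the declared result type turns unexpected output into an error instead of a silently different shape.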

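Since R 4.1.0, base R also has a native pipe, `|>`, together with the shorthand `\(x)` for anonymous functions, so the piping example can be written without `magrittr`. A minimal sketch, assuming R >= 4.1.0; because `|>` has no `.` placeholder, the subtraction step is wrapped in an anonymous function:

```r
# Native pipe (R >= 4.1.0): the left-hand side becomes the first argument
result <- c("The", "words", "are") |>
  nchar() |>
  (\(x) x - 10)() |>   # anonymous function replaces magrittr's {. - 10}
  abs()

result
# [1] 7 5 7
```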

II. Lists and Dictionaries

Some of the most commonly used data structures are dictionaries and lists, but R doesn't have a dictionary type. Furthermore, lists in R behave peculiarly: Why does calling `some_list[1]` return the first element wrapped in a list, while `some_vec[1]` returns just the first element?

some_vec <- c("A","B","C")
identical("A", some_vec[1])
## [1] TRUE

some_list <- list("A","B","C")
identical("A", some_list[1])
## [1] FALSE

Slicing

To jump straight to the answer, the `[]` brackets are used for slicing. The question might be misleading, since slicing a vector also returns a slice and not an element. However, all basic types in R are vectors. In that sense, the sliced vector containing only one string is as close as we get to a single string, meaning that "A" and `some_vec[1]` are essentially the same. This does not hold for lists: a slice of a list is again a list.
We can access the elements of a list with the `[[]]` brackets. We can also use them on vectors, but there we receive the same result as by slicing.

some_list <- list("A","B","C")
identical("A", some_list[[1]])
## [1] TRUE

Slicing is quite versatile in R. Chapter 2.7 of the R-intro gives a good overview of all slicing possibilities. We can use it to retrieve a selected part of a vector/list or to update the selected values of a vector/list. Here are some examples in code:

vec <- c("A","B","C")
vec[c(TRUE, FALSE, TRUE)] # "A", "C"
vec["B" == vec]           # returns "B"
vec[-2]                   # returns "A", "C"
vec[2:3] <- "X"           # vec is now "A", "X", "X"

Dictionaries

There is no dictionary type, since every vector or list can have named entries. These names can be given on creation or assigned later via `names()`. Names can be used to slice and access elements of vectors/lists.

vec <- 1:3
names(vec) <- c("A","B","C") # assign later
c(A=1, B=2, C=3)             # assign on creation
## A B C
## 1 2 3
vec["B"]
## B
## 2

Lists additionally provide the `$name` dollar syntax, which does the same as `[["name"]]`, except that `$` also accepts partially matching names.

lst <- list(A=1, B=2, C=3)
identical(lst[["A"]], lst$A)
## [1] TRUE

Conclusion

There is no separate dictionary type, since vectors and lists can be turned into dictionaries by naming their entries. The `$` dollar syntax is handy for accessing values of named lists. Furthermore, the `[]` brackets are used to slice, not to retrieve an element. Confusion can occur on vectors since, in that case, a slice of length 1 really is the same as the element itself.
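To round off the dictionary idea, here is a minimal sketch of the usual dictionary operations (insert, update, membership test, delete, iterate) on a named list, using only base R. The variable `ages` and its values are made up for illustration.

```r
# A named list used as a dictionary (made-up example data)
ages <- list(Momo = 12, Beppo = 52)

ages[["Gigi"]] <- 27          # insert a new key
ages$Momo <- 13               # update an existing key
"Beppo" %in% names(ages)      # membership test: TRUE
ages[["Beppo"]] <- NULL       # assigning NULL with [[ ]] removes the entry

names(ages)                   # the keys: "Momo" "Gigi"
unlist(ages)                  # the values as a named numeric vector

# Iterate over key-value pairs
for (key in names(ages)) {
  cat(key, "is", ages[[key]], "\n")
}
```

Unlike a real dictionary, R does not enforce unique names; with duplicated names, `[[` simply returns the first match.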

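The difference between `[]` and `[[]]` can also be checked directly. In this short base R sketch, the objects mirror `some_vec` and `some_list` from the section, with names added for illustration to show that `[[]]` also drops the name while `[]` keeps it:

```r
# Named versions of some_vec and some_list (names added for illustration)
some_vec  <- c(a = "A", b = "B", c = "C")
some_list <- list(a = "A", b = "B", c = "C")

# [ ] returns a slice of the same type as the container
class(some_vec[1])    # "character": a length-1 vector, name kept
class(some_list[1])   # "list": a length-1 list

# [[ ]] returns the element itself and drops the name
some_vec[[1]]         # "A"
some_list[[1]]        # "A"

identical(some_vec[1], c(a = "A"))   # TRUE: the slice keeps its name
identical(some_vec[[1]], "A")        # TRUE
identical(some_list[[1]], "A")       # TRUE
```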

III. Dataframes

Dataframes are probably the most used data type in R, and this is where the `dplyr` package becomes invaluable. Under the hood, dataframes are lists of columns (usually vectors, sometimes lists) with some additional constraints (e.g., all columns must have the same length). A tibble is a dataframe with some extra checks and fewer automatic transformations, reducing unexpected mistakes. Additionally, it is the standard dataframe used throughout the tidyverse.
In the following, we will use made-up data about some characters from Momo to demonstrate filtering, selecting, grouping, and summarizing dataframes. First, I'll show how to do it in base R; then, instead of explaining, I'll just provide the equivalent syntax using the `dplyr` package.

library(dplyr) # provides tibble(), %>%, filter(), select(), ...
characters <- tibble(
  Name = c("Momo", "Beppo", "Gigi"),
  Age = c(12, 52, 27),
  Skill = c("Listening", "Steady Pace", "Storytelling"),
  Height = c(120, 180, 150)
)
characters
## # A tibble: 3 x 4
##   Name    Age Skill        Height
##   <chr> <dbl> <chr>         <dbl>
## 1 Momo     12 Listening       120
## 2 Beppo    52 Steady Pace     180
## 3 Gigi     27 Storytelling    150

# filter and select -------------------
tmp <- characters[            # base R
  characters$Age < 50,
  c("Name", "Skill")]
tmp[order(tmp$Name), ]
## # A tibble: 2 x 2
##   Name  Skill
##   <chr> <chr>
## 1 Gigi  Storytelling
## 2 Momo  Listening

characters %>%                # dplyr
  filter(Age < 50) %>%
  select(Name, Skill) %>%
  arrange(Name)
## # A tibble: 2 x 2
##   Name  Skill
##   <chr> <chr>
## 1 Gigi  Storytelling
## 2 Momo  Listening

# grouping and summarizing ------------
tapply(                       # base R
  characters$Height,
  cut(characters$Age, c(0, 50, 100)),
  mean
)
##   (0,50] (50,100]
##      135      180

characters %>%                # dplyr
  group_by(age_cat = cut(Age, c(0, 50, 100))) %>%
  summarise(avg_height = mean(Height))
## # A tibble: 2 x 2
##   age_cat  avg_height
##   <fct>         <dbl>
## 1 (0,50]          135
## 2 (50,100]        180

Conclusion

Base R can do most of the things `dplyr` can, but `dplyr` syntax is largely self-explanatory. Moreover, the syntax makes it easier to think through more complex transformations. Imagine having data from different weather stations, and you want the newest measurement of each station. How would you do it in base R? With `dplyr`, you group by station, arrange by date, and slice to yield the first row of each group. (See the dplyr cheat sheet, and check tidyr for pivot_longer() and pivot_wider().)
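The weather-station task from the conclusion can be sketched as follows. The `measurements` tibble, its columns (`station`, `date`, `temp`), and its values are made up for illustration. The dplyr pipeline follows the steps described above (group, arrange, slice), and a base R version is included for comparison.

```r
library(dplyr)

# Made-up measurements from three hypothetical weather stations
measurements <- tibble(
  station = c("A", "B", "A", "C", "B", "C"),
  date    = as.Date(c("2024-06-01", "2024-06-01", "2024-06-03",
                      "2024-06-02", "2024-06-04", "2024-06-05")),
  temp    = c(18.2, 21.0, 19.5, 16.8, 22.3, 17.1)
)

# dplyr: group by station, sort newest first, keep the first row per group
latest <- measurements %>%
  group_by(station) %>%
  arrange(desc(date)) %>%
  slice(1) %>%
  ungroup()

# base R: sort by station and descending date, then drop repeated stations
sorted <- measurements[order(measurements$station,
                             -as.numeric(measurements$date)), ]
latest_base <- sorted[!duplicated(sorted$station), ]

latest
latest_base
```

`arrange()` sorts the whole table by default (it ignores the grouping), but since `slice(1)` then picks the first row within each group, the result is the newest row per station either way. Current dplyr versions also offer `slice_max(date, n = 1)` as a one-step alternative to the arrange-and-slice pair.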

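The tidyr functions mentioned in the conclusion reshape data between "wide" and "long" form. Here is a minimal sketch on the characters data (the `Skill` column is left out so that the two pivoted columns share one type); the column names `measure` and `value` are chosen for illustration:

```r
library(dplyr)
library(tidyr)

# The characters data from above, without the Skill column
characters <- tibble(
  Name   = c("Momo", "Beppo", "Gigi"),
  Age    = c(12, 52, 27),
  Height = c(120, 180, 150)
)

# Wide -> long: one row per (Name, measure) pair, six rows in total
long <- characters %>%
  pivot_longer(c(Age, Height), names_to = "measure", values_to = "value")
long

# Long -> wide: back to one column per measure
wide <- long %>%
  pivot_wider(names_from = measure, values_from = value)
wide
```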

IV. Functions and Methods

R, being a functional programming language, handles functions as objects and can even convert them into lists and vice versa. While using functions as objects is very useful, the conversion to lists is a legacy feature that can be happily ignored. Functions are one of the main ways to build and maintain larger coding projects. Usually, the most common functions already exist in Base R, and almost everything one would want is available in a package, reducing the need to write your own functions. Reusing functions can reduce bugs simply because the code has already been tested by potentially hundreds of other people. Even if you build your own functions, you test them yourself, and after testing, you know they work. But imagine rewriting the code over and over again: the potential for a typo, a misremembered function name, or a wrongly assigned variable is always there. Furthermore, well-designed functions make code readable, so that it no longer takes a lot of brain power to understand what is happening, even if the underlying procedure is very complex. Enough rambling about the benefits of clean code.

Functions

Good error messages are invaluable; they can save hours of debugging. I like to use assertions at the beginning of functions to verify whether the input is acceptable. This helps to catch errors as early as possible, and additionally, you can add messages that explain what should be different. For the sake of this section, we will write our own assertion function (similar to `stopifnot()`).

assert <- function(condition, message) {
  if (exists("assert_off") && assert_off)
    return("assertions off")
  if (!condition) {
    stop(message, call. = FALSE)
  }
}

assert(FALSE, "something failed")
## Error: something failed

assert_off <- TRUE
assert(FALSE, "something failed")
## [1] "assertions off"

rm(assert_off)

Here a lot is happening. First, we define the function `assert`, taking a condition and a message. The condition has to evaluate to TRUE in order to pass the assertion; if not, `stop()` throws an error with the message (`call. = FALSE` leaves the function call out of the error output). Additionally, we have a flag `assert_off`, which can be set to TRUE to disable all assertions. This works because R first looks for the variable `assert_off` in the local environment of the function. Since it can't find it there, R moves on to the enclosing environment (here the global environment, where `assert` was defined) and continues up the chain of environments until it finds the variable. If the variable exists nowhere, R throws an error. Since we don't want that to happen, we first check with `exists()`, which searches the same chain, whether `assert_off` exists. If it doesn't, we assume assertions should be turned on. In practice, I would remove the flag, but it's a nice way to discuss environments. Here is how we can apply the `assert` function.

library(jsonlite) # assumption: fromJSON() is taken from the jsonlite package
# fromJSON has a nasty error message for missing files
readJSON <- function(path) {
  assert(
    file.exists(path),
    paste0("can't find \"", path, "\""))
  fromJSON(path)
}

readJSON("some_file.json")
## Error: can't find "some_file.json"
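Base R's own `stopifnot()` can produce similarly readable messages: since R 4.0.0, the names of its arguments are used as error messages. A minimal sketch; `check_height()` is a hypothetical function for illustration, and `tryCatch()` is used only to capture the message instead of stopping.

```r
# Hypothetical input check using base R's stopifnot() (R >= 4.0.0)
check_height <- function(height) {
  stopifnot(
    "height must be numeric"  = is.numeric(height),
    "height must be positive" = all(height > 0)
  )
  height
}

check_height(150)   # passes and returns 150
# check_height("tall") stops with the message "height must be numeric"

# Capture the error message instead of stopping
msg <- tryCatch(check_height(-1), error = conditionMessage)
msg
# [1] "height must be positive"
```

The conditions are checked in order, so the type check guards the numeric comparison that follows it.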

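The variable lookup described above can be observed directly. In this sketch (all names are made up), `inner_fn()` does not define `flag` itself, so R looks it up in the environment where `inner_fn()` was defined; `exists()` with `inherits = FALSE` restricts the search to a single environment. The second part shows that the lookup follows where a function was defined, not where it is called (lexical scoping).

```r
# All names here (flag, outer_fn, inner_fn, global_lookup, caller) are made up
flag <- "global"

outer_fn <- function() {
  flag <- "outer"
  inner_fn <- function() {
    # flag is not defined in inner_fn's own environment ...
    local_only <- exists("flag", envir = environment(), inherits = FALSE)
    # ... so R finds it in the enclosing environment (outer_fn's)
    list(found_locally = local_only, value = flag)
  }
  inner_fn()
}

res <- outer_fn()   # found_locally = FALSE, value = "outer"

# Lookup follows where a function was defined, not where it is called
global_lookup <- function() flag
caller <- function() {
  flag <- "caller"
  global_lookup()
}
caller()            # "global", not "caller"
```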

Methods

In addition to functions, R has generic functions, which dispatch to different implementations (methods) depending on the class of the first argument provided. Every R object has a class, a vector of strings returned by `class()`; for basic types it is implicit, but it can be set explicitly through the class attribute. You can manually modify the class attribute, potentially breaking the behavior of some functions. Classes provide the backbone for various operations, such as correctly handling addition for both real and complex numbers, and even adding visual elements when building a ggplot. Let's build our own class and generics.

worker <- 4
class(worker)
## [1] "numeric"

class(worker) <- "proletariat"
class(worker)
## [1] "proletariat"

mean(worker)
## [1] 4

mean.proletariat <- function(x)
  "doesn't work"
mean(worker)
## [1] "doesn't work"

"+.proletariat" <- function(a, b) {
  if (sum(a, b) >= 10) "protest" else
    sum(a, b) * 0.75
}
worker + worker
## [1] 6
worker + worker + worker
## [1] "protest"

future <- function(x)
  UseMethod("future")
future.proletariat <- function(x)
  "irrelevant"
future(worker)
## [1] "irrelevant"

future(4)
## Error in UseMethod("future") :
##   no applicable method for 'future' applied to an object of class "c('double', 'numeric')"

Here, we first create a variable `worker`, which is a numeric vector (4). Then we assign the class `proletariat` to the worker. We see `worker` is now aware of its class. However, since we did not implement any methods for this class, it still behaves like a number. Next, we show how a worker should behave when we calculate the mean. We do this by adding the method `mean.proletariat`. Now, whenever `mean` is called on an object with the class `proletariat`, the string "doesn't work" is returned. Yay, the worker has learned: `mean(worker)` doesn't work. We can even change the behavior of the `+` function. Notice that the method name has to be quoted (`""`) since `+` is a special symbol. This way, we can teach the proletariat what to do when several of them meet up: protest. For two workers, the sum is 8, which is below 10, so the result is 8 * 0.75 = 6. For three workers, that 6 is added to a third worker; operators like `+` dispatch on either argument, so our method is used again, the sum reaches 10, and the result is "protest". So far, we have added methods for already existing functions. If we want to create a new generic function, we call `UseMethod` inside it. This way, we create the generic `future` and define its behavior for the proletariat. Now we can also check what the future of a worker is: irrelevance. Since no default method is implemented for `future` (`future.default <- ...`), the call to `future(4)` throws an error.

Conclusions

We have learned how to throw errors and how environments resolve variables. Additionally, we explored class-specific functionality and adapted basic functions like addition. These are valuable tools for creating more complex code. For further reading, refer to the R Language Definition.

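Building on the `proletariat` example, a common pattern is to wrap the class assignment in a small constructor and to provide a `default` method so that the generic no longer errors on other classes. A sketch under these assumptions: `new_worker()` is a hypothetical helper, and a `future.default()` returning `"unknown"` is made up for illustration.

```r
future <- function(x) UseMethod("future")
future.proletariat <- function(x) "irrelevant"

# Illustrative fallback, used for any class without its own method
future.default <- function(x) "unknown"

# Hypothetical constructor: structure() sets the class attribute in one step
new_worker <- function(x) {
  stopifnot(is.numeric(x))
  structure(x, class = "proletariat")
}

# print() is a generic too, so the class can control how it is displayed
print.proletariat <- function(x, ...) {
  cat("<proletariat>", unclass(x), "\n")
  invisible(x)
}

worker <- new_worker(4)
future(worker)                   # "irrelevant"
future(4)                        # "unknown" instead of an error
inherits(worker, "proletariat")  # TRUE
worker                           # printed via print.proletariat()
```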

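Because the class attribute is a vector of strings, an object can belong to several classes: dispatch tries them in order, and `NextMethod()` passes the call on to the next class in the vector. A minimal sketch with a hypothetical `describe()` generic and a made-up `union_member` subclass:

```r
# Hypothetical generic and subclass, for illustration only
describe <- function(x) UseMethod("describe")

describe.proletariat <- function(x) "a worker"

describe.union_member <- function(x) {
  # Extend, rather than replace, the method of the next class in line
  paste(NextMethod(), "in a union")
}

member <- structure(4, class = c("union_member", "proletariat"))

class(member)                    # "union_member" "proletariat"
describe(member)                 # "a worker in a union"
inherits(member, "proletariat")  # TRUE
```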
V. R Package and Rust

A rule of thumb in programming is that if you want to do something, someone else has probably already done it better and documented it well. R does not natively support Rust, but it is fairly easy to build and run R packages with Rust code, or even to define Rust functions from within an R session and run them there.

Rust from Within

The `extendr` crate for Rust and the `rextendr` package make it easy to run Rust code from within R. Check the rextendr documentation for installation and further examples. After installing all dependencies, mainly Rust, R, and the `rextendr` package, you can run the following lines:

suppressMessages(
  rextendr::rust_function("
    fn add_from_rust(a: f64, b: f64) -> f64 {
      a + b
    }"))

add_from_rust(1.2, 1.7)
## [1] 2.9

This is neat but not ideal, since every call to `rust_function()` compiles the provided Rust string. This takes a lot of time, and debugging Rust code this way is awkward. By building our own package, we can avoid this: compiling and debugging then take place in a proper development environment.

Packages

Luckily, creating your own R package is supported by an amazing toolset; no wonder so many well-crafted packages exist. Check out the extendr and devtools documentation for detailed information.
After installing `devtools` and `rextendr`, these commands will initialize an R package using Rust:

library(devtools)         # also attaches usethis (create_package(), use_readme_rmd())
create_package("path")
# Change directory to the package
use_readme_rmd()          # Creating Rmd
rextendr::use_extendr()   # Init
rextendr::document()      # Update

The devtools cheat sheet is an excellent summary of how to create a package and publish it to GitHub. From here on, everything should be smooth sailing.
I used the commands above to initialize my own package. It did not work immediately, but after running many commands from the devtools cheat sheet and restarting my R session, the following commands worked:

load_all()
check()
rextendr::document() # Update
# Push to GitHub

I have published my package on GitHub. Take a look at it here. You can install the package directly from GitHub with:

# devtools::install_github(
#   "S3r4f1n/rpackageUsingRust")
suppressMessages(library(
  rpackageUsingRust))
greeting_n_times(2)
## [1] "Hello world!\nHello world!\n"

Conclusion

Building an R package is not trivial, but the tooling (devtools) is quite easy to use and well documented. With the `extendr` crate and the `rextendr` package, interoperability between Rust and R is well supported, enabling the use of Rust as a backend for R packages. If you want to create your own R package using Rust, check the dedicated articles on extendr and devtools.
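When a package moves work into Rust, a small pure-R reference implementation can be handy for testing the Rust-backed function against. The sketch below assumes such an approach. `greeting_n_times_r()` is a hypothetical reimplementation inferred only from the single output shown above (`greeting_n_times(2)`), so the real Rust function may behave differently for other inputs.

```r
# Hypothetical pure-R reference, inferred from greeting_n_times(2) above
greeting_n_times_r <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 0)
  paste(rep("Hello world!\n", n), collapse = "")
}

greeting_n_times_r(2)
# [1] "Hello world!\nHello world!\n"

# Inside the package's tests (e.g. with testthat), the Rust-backed version
# could then be compared against the reference:
# testthat::expect_equal(greeting_n_times(2), greeting_n_times_r(2))
```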
