Advanced R: Chapter 3.3: Functional Programming
Recapitulation
Declarative programming
Declarative programs merely describe the problem, i.e. they state what
needs to be done. It is left to the underlying programming language (not
the programmer!) to decide how to actually solve it.
Based on the definition, the programming language must now decide on its
own how to calculate the function. All it can fall back on are arithmetic
and logical operators, conditionals and further (recursive) function calls.
Daniel Horn & Sheila Görz Advanced R Summer Semester 2022 4 / 64
What is functional programming?
2. No imperative programming
Inside the function, no imperative programming is allowed except for the if
conditional. Only further (nested) function calls as well as arithmetic and
logical expressions are permitted.
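As a minimal sketch of this restriction, the factorial can be written once imperatively and once functionally. The definitions of facLoop() and facRec() are assumptions made for illustration; the functional variant uses nothing but an if conditional, arithmetic and a recursive call.

```r
# Imperative variant: a loop and repeated assignment to a state variable
facLoop = function(n) {
  res = 1
  for (i in seq_len(n))
    res = res * i
  res
}

# Functional variant: only a conditional, arithmetic and a recursive call
facRec = function(n) {
  if (n <= 1) 1 else n * facRec(n - 1)
}

facLoop(5)
facRec(5)
```

Both calls compute 5! = 120; only the second complies with the rule above.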
Often, recurring tasks have to be performed inside our programs with
only minor changes to the parameters. A common approach to this is copy
and paste. Convenient as it may be, it is also very error-prone. Instead,
it is advisable to follow the DRY principle (Don't Repeat Yourself).
Recursion
Definition: Recursion
Recursion: Examples
We’ve already seen one example: calculating factorials. Two other common
examples of recursion are:
If you’ve already used recursion in R, you may have come across the
following problem:
recursion = function(x) {
  if (x == 1)
    return(x)
  return(Recall(x - 1))
}
recursion(10000)
Conclusion: There must be an upper limit on how often a function can
call itself, and an error must be raised once this limit is reached.
Recursion
In particular, the computer must keep track of all open function calls. Upon
completing the execution of a single call, it must then continue with the
preceding call.
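The limit on open calls can be made visible with a small experiment. This is a sketch under assumptions: countDown() is a hypothetical rewrite of the recursion above with a direct self-call, and the nesting limit is lowered temporarily via options(expressions = ) so that it is reached reliably.

```r
# Same recursion as in the example, written with a direct self-call
countDown = function(x) {
  if (x == 1)
    return(x)
  countDown(x - 1)
}

countDown(10)   # shallow recursion works

# Lower the nesting limit temporarily so that it is hit reliably
old = options(expressions = 500)
res = tryCatch(countDown(10000), error = function(e) conditionMessage(e))
options(old)

is.character(res)   # the deep call failed; res holds the error message
```

Every open call has to be remembered until it returns, so a recursion that is too deep ends in an error instead of a result.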
Recursion in R I
Recursion in R II
## [1] 0.640933
## [1] 0.6598144
## [1] 0.07202202
## [1] 0.2683692
We’re not quite fond of the output and other aspects of this solution yet,
but we’ll refine this example over the course of this chapter.
Roadmap
Functionals
Functionals
A function that takes a function as its input and returns a vector is called
a functional.
All of you have probably already used functionals from all three classes.
The first functional that (hopefully) all of you have already used is:
lapply
lapply() takes a vector X and a function FUN as its input. It applies FUN
to every element of X and returns the results as a list.
lapply: Example
The most common use case of lapply() is as a replacement of a for loop:
ns = c(20, 40, 60, 80, 100)
# Variant with lapply()
res = lapply(ns, function(n)
  mean(microbenchmark(facRec(n))$time)
)
# Variant with a for loop
res = vector(mode = "list", length = 5L)
for (i in seq_along(ns)) {
  res[[i]] = mean(
    microbenchmark(facRec(ns[i]))$time
  )
}
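The equivalence of the two variants can be checked without the benchmarking package. A minimal sketch, assuming a simple recursive definition of facRec() (the definition and the shortened input vector are assumptions made for this example):

```r
# Hypothetical stand-in for the facRec() used above
facRec = function(n) if (n <= 1) 1 else n * facRec(n - 1)

ns = c(2, 4, 6)

# Loop: preallocate, then fill element by element
resLoop = vector(mode = "list", length = length(ns))
for (i in seq_along(ns))
  resLoop[[i]] = facRec(ns[i])

# lapply(): allocation and iteration are handled by the functional
resApply = lapply(ns, facRec)

identical(resLoop, resApply)   # TRUE
```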
Aside from lapply(), there are also other apply functions that all replace
certain looping patterns:
Function    Pattern
lapply()    Input: Vector, Output: List
sapply()    Input: Vector, Output: Simplified as much as possible
vapply()    Input: Vector, Output: Pre-defined type
apply()     Input: Matrix, iterating over rows or columns, Output: Vector
tapply()    Input: Vector and factor, iterating over factor levels, Output: Simplified list with one element per factor level
by()        Input: Data frame, otherwise like tapply()
mapply()    Input: Multiple vectors, Output: Simplified list
eapply()    Input: Environment, Output: List
The apply family belongs to the first kind of functionals that aim to
simply apply the given function to a provided data structure and
create a proper output.
Each apply function has its own quirks, e.g. an argument might be
called simplify in one function and SIMPLIFY in another.
The apply family is the key to mastering the functional aspects of R.
The R package plyr offers unified variants of apply.
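The differing output types and the simplify/SIMPLIFY quirk can be seen in a short base-R sketch. The helper square() and the inputs are chosen only for illustration.

```r
x = 1:3
square = function(v) v^2

lapply(x, square)                     # always a list
sapply(x, square)                     # simplified to a numeric vector
sapply(x, square, simplify = FALSE)   # lower case: keeps the list
vapply(x, square, numeric(1))         # type and length of each result are checked

# mapply() spells the same switch in upper case
mapply(function(a, b) a + b, 1:3, 4:6)                    # 5 7 9
mapply(function(a, b) a + b, 1:3, 4:6, SIMPLIFY = FALSE)  # list of length 3
```

vapply() is usually the safer choice inside functions, because a result of unexpected type or length raises an error instead of silently changing the output format.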
Filter()
Filter() takes a vector and a function with a logical return value as input.
It returns a vector with the elements for which the function returns TRUE.
Filter(function(x) x > 3, 1:10)
## [1] 4 5 6 7 8 9 10
Map()
Map() takes a function and multiple vectors with arguments. The function
is then applied to every i-th element of the vectors. If necessary, vectors are
recycled.
Map(`+`, 1:2, 11:12)
## [[1]]
## [1] 12
##
## [[2]]
## [1] 14
Reduce()
Reduce() reduces a vector to a single value. First, the provided function is
applied to a starting value and the first element of the vector. After that,
the resulting value and the next element of the vector are used and so forth
until all elements of the vector have been processed.
Reduce(`+`, 1:10, init = 0)
## [1] 55
The reduction can be performed starting from the left or from the right.
It’s also possible to return the accumulated results as well.
Reduce(`+`, 1:10, init = 0, right = TRUE, accumulate = TRUE)
## [1] 55 54 52 49 45 40 34 27 19 10 0
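For addition the direction does not change the final result. A non-commutative function makes the order of evaluation visible; the helper wrap() below is a hypothetical function written only for this sketch.

```r
# Wrap every combination step in parentheses to make the order visible
wrap = function(a, b) paste0("(", a, b, ")")

Reduce(wrap, c("a", "b", "c", "d"))                # "(((ab)c)d)"
Reduce(wrap, c("a", "b", "c", "d"), right = TRUE)  # "(a(b(cd)))"

# accumulate = TRUE keeps every intermediate result
Reduce(`+`, 1:4, accumulate = TRUE)                # 1 3 6 10
```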
Anonymous functions
We’ve already made use of these multiple times on the last couple of slides:
sapply(1:10, function(x) x^2)
## [1] 1 4 9 16 25 36 49 64 81 100
The aim of some functionals is not to actually apply the provided function,
but to analyze it. We already know some examples for this:
All these functions have one thing in common: during their execution, the
function fun is evaluated multiple times to learn something about fun.
Usually, fun is interpreted as a purely mathematical function.
[Figure: time series plot; x-axis: Time]
Note: The input parameters as well as the output type of avgFun() are
fixed by the function’s definition. Thus, every supplied function must
comply with them.
identityAvgFun = function(time.series, i)
  time.series[i]
[Figure: time series plot; x-axis: Time]
meanAvgFun = function(time.series, i) {
  inds = i + -5:5
  inds = inds[inds >= 1 & inds <= length(time.series)]
  mean(time.series[inds])
}
[Figure: time series plot; x-axis: Time, y-axis: Euro]
weightMeanAvgFun = function(time.series, i) {
  inds = i + -2:2
  allowed.inds = inds >= 1 & inds <= length(time.series)
  inds = inds[allowed.inds]
  weights = 2 * (1:5)[allowed.inds]
  weighted.mean(time.series[inds], w = weights)
}
[Figure: time series plot; x-axis: Time, y-axis: Euro]
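The functional that consumes avgFun() is not shown above. A minimal sketch, assuming it simply evaluates avgFun(time.series, i) at every index and collects the results in a numeric vector; the name smoothSeries() and the toy data are hypothetical.

```r
# Hypothetical functional: applies avgFun at every index of the series
smoothSeries = function(time.series, avgFun) {
  vapply(seq_along(time.series),
         function(i) avgFun(time.series, i),
         numeric(1))
}

identityAvgFun = function(time.series, i)
  time.series[i]

meanAvgFun = function(time.series, i) {
  inds = i + -5:5
  inds = inds[inds >= 1 & inds <= length(time.series)]
  mean(time.series[inds])
}

ts.toy = c(10, 20, 30, 40, 50)
smoothSeries(ts.toy, identityAvgFun)   # returns the series unchanged
smoothSeries(ts.toy, meanAvgFun)       # window covers all five points: 30 everywhere
```

Because every avgFun shares the signature (time.series, i) and returns a single number, the functional can swap smoothing strategies without any other change.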
Function factories
Thus, our newly created function can also access all variables and
parameters of its factory. Therefore, if we often need similar functions that
only differ in a few parameters, we can create them using a function factory.
## [1] 4.13099
## [1] 12.91354
## $fun
## function()
## runif(n, lower, upper)
## <environment: 0x000000001102aef8>
##
## $n
## [1] 1
##
## $lower
## [1] 0
##
## $upper
## [1] 10

## $fun
## function()
## runif(n, lower, upper)
## <bytecode: 0x0000000010a253b0>
## <environment: 0x0000000009340458>
##
## $n
## [1] 1
##
## $lower
## [1] 10
##
## $upper
## [1] 20
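Output of this shape is consistent with a factory along the following lines. This is only a sketch: the names mkRunif, runif0to10 and runif10to20 are hypothetical, and the code that produced the listings above is not shown in this section.

```r
# Hypothetical factory: the returned function remembers n, lower and upper
mkRunif = function(n, lower, upper) {
  fun = function()
    runif(n, lower, upper)
  fun
}

runif0to10 = mkRunif(1, 0, 10)
runif10to20 = mkRunif(1, 10, 20)

runif0to10()    # a random number between 0 and 10
runif10to20()   # a random number between 10 and 20

# The parameters live in the enclosing environment of each created function
as.list(environment(runif10to20))
```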
The weight functions in our examples were very similar. This doesn’t really
comply with the DRY principle. Instead, we can write a function factory for
weight functions:
mkLogLik = function(X, y) {
  if (nrow(X) != length(y))
    stop("Number of rows of X and length of y must be equal.")
  if (is.data.frame(X))
    X = as.matrix(X)
  # Recode a two-level factor as 0/1
  if (is.factor(y)) {
    y = droplevels(y)
    y = c(0, 1)[y]
  }
  # Negative log-likelihood of the logistic model;
  # X and y are taken from the enclosing environment
  function(theta) {
    cp <- X %*% theta
    -sum(y * cp - log(1 + exp(cp)))
  }
}
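A usage sketch for this factory. The data are an assumption: a two-class subset of the built-in iris data set is used here, since the construction of irisLogLik is not shown in this section. The factory is repeated so that the example runs on its own.

```r
mkLogLik = function(X, y) {
  if (nrow(X) != length(y))
    stop("Number of rows of X and length of y must be equal.")
  if (is.data.frame(X))
    X = as.matrix(X)
  if (is.factor(y)) {
    y = droplevels(y)
    y = c(0, 1)[y]
  }
  function(theta) {
    cp <- X %*% theta
    -sum(y * cp - log(1 + exp(cp)))
  }
}

# Assumption: a binary problem built from two of the three iris species
iris2 = iris[iris$Species != "setosa", ]
irisLogLik = mkLogLik(iris2[, 1:4], iris2$Species)

# At theta = 0 every observation has probability 0.5,
# so the value equals n * log(2) with n = 100
irisLogLik(rep(0, 4))
100 * log(2)
```

The created function depends on theta alone, which is the form that general-purpose optimizers such as optim() expect.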
irisLogLik(1:4)
## [1] 1
counter()
## [1] 2
x = 10
f1 = capture(x)
f2 = capture(x)
f1()
## [1] 10
x = 20
f2()
## [1] 20
Due to lazy evaluation, x is only evaluated when each function is called
for the first time. At that point, the promise object looks up the current
value of x and stores it in the respective enclosing environment.
Functions as output of other functions
After the two functions have been called for the first time, the values of x
are now stored and can be obtained.
x = 30
f1()
## [1] 10
x = 40
f2()
## [1] 20
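This behaviour can be reproduced with a minimal sketch. The definition of capture() is an assumption, since it is not shown in this section; captureNow() is a hypothetical variant that uses force() to evaluate the argument when the closure is created.

```r
# Assumed definition: the argument x stays an unevaluated promise
capture = function(x) function() x

x = 10
f = capture(x)
x = 20
f()        # 20: the promise is evaluated only now
x = 30
f()        # still 20: the value has been stored after the first call

# force() evaluates the promise as soon as the closure is created
captureNow = function(x) {
  force(x)
  function() x
}

x = 10
g = captureNow(x)
x = 20
g()        # 10
```

Calling force() inside a function factory is the usual way to avoid surprises of this kind.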
## [1] 0.841471
sinCnt(1)
## [1] 0.841471
environment(sinCnt)$counter
## [1] 2
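A function such as sinCnt could be created by a functional operator that counts calls. The following is a sketch under assumptions: countCalls() is a hypothetical name, and only the variable counter in the enclosing environment is taken from the output above.

```r
# Hypothetical functional operator: wraps f and counts how often it is called
countCalls = function(f) {
  counter = 0
  function(...) {
    counter <<- counter + 1
    f(...)
  }
}

sinCnt = countCalls(sin)
sinCnt(1)
sinCnt(1)
environment(sinCnt)$counter   # 2
```

The super-assignment operator <<- modifies counter in the enclosing environment, so the count persists between calls.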
Memoization I
Store results of all previous calls (Example for simple numeric functions):
memoise = function(f) {
  cache = data.frame(x = NULL, val = NULL)
  function(x) {
    # Has x been evaluated before?
    inds = cache$x == x
    if (any(inds)) {
      # Look up the value for x in the cache
      return(cache$val[inds])
    } else {
      # Evaluate f(x) and store the result in the cache
      val = f(x)
      cache <<- rbind(cache, data.frame(x = x, val = val))
      return(val)
    }
  }
}
Memoization II
Caution: Recall() doesn’t work here, since it would always call the
non-memoizing function.
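A minimal sketch of a memoised recursive function, assuming the memoise() operator from the previous slide (repeated in condensed form) and the Fibonacci numbers as a hypothetical example. The recursive calls use the name fib, which is bound to the memoising wrapper; with Recall() they would bypass the cache.

```r
memoise = function(f) {
  cache = data.frame(x = NULL, val = NULL)
  function(x) {
    inds = cache$x == x
    if (any(inds))
      return(cache$val[inds])
    val = f(x)
    cache <<- rbind(cache, data.frame(x = x, val = val))
    val
  }
}

# The inner function calls fib(), i.e. the memoising wrapper, not itself
fib = memoise(function(n) {
  if (n <= 2)
    return(1)
  fib(n - 1) + fib(n - 2)
})

fib(30)                        # 832040
nrow(environment(fib)$cache)   # 30: every argument was evaluated only once
```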
## [1] 16
## [1] 4
General pattern
Most functional operators follow the same pattern:
funOp <- function(f, otherargs) {
  # Maybe initialize some objects
  function(...) {
    # Maybe do something
    res <- f(...)
    # Maybe do something else
    res
  }
}
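One possible instance of this pattern is an operator that replaces errors by a default value. The names withDefault and safeLog are hypothetical and serve only as an illustration.

```r
# Hypothetical functional operator following the pattern above
withDefault = function(f, default) {
  function(...) {
    # Call f; if it fails, return the default instead of stopping
    tryCatch(f(...), error = function(e) default)
  }
}

safeLog = withDefault(log, NA_real_)
safeLog(1)      # 0
safeLog("a")    # NA instead of an error
```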
Summary
1. Functions are not static. They are ordinary data objects that can be
manipulated, stored in lists or used as input of other functions.
2. Functions are closures and can remember state. We can manipulate
functions by modifying their enclosing environment. The concept of
closures resembles object-oriented programming:
Objects are data with methods. Closures are functions with data.