K24 Functions
What is a function?
Example
square = function(x) {
# Simply return x^2
return(x^2)
}
square(5)
## [1] 25
A function in R:
is (unsurprisingly) defined by the key word function.
packages instructions and executes them when called.
is called using round brackets.
can depend on parameters.
has been used by all of us 10 000 times already.
Defining a function
A function is defined via the key word function followed by round
brackets and an R expression. The brackets may be empty.
function(arg1 = default1, arg2 = default2, ...)
expression
Over the course of this chapter, we’ll frequently encounter the term
expression. Expressions are one of the 24 data types in R and describe R
code which is to be executed.
## List of 2
## $ :function (..., na.rm = FALSE)
## $ :function (x, ...)
## $srcref
## function(x) {
## # Simply return x^2
## return(x^2)
## }
##
## $name
## [1] "Squarefunction"
## [1] "closure"
typeof(round)
## [1] "special"
typeof(sum)
## [1] "builtin"
Closures
Closure
A closure in R consists of three to four components:
1 Body: Contains the R code from the definition of the function.
2 Formals: A list of the function’s arguments and default values.
3 Environment: The function’s enclosing environment.
4 Optional: Bytecode: Compiled code.
Builtin and special functions don’t contain R code, but rather directly
call the underlying C code. Thus, they can be more efficient.
Their formals, body and environment are NULL.
Both types can behave differently from closures – in every imaginable
way. We’ll take a look at some examples.
They exist only in the base package – that’s why only R developers
are able to define new ones.
Builtin functions evaluate their arguments before passing them to the
internal (C) function whereas special functions directly pass
unevaluated arguments.
From here on, we’ll be dealing almost exclusively with closures - though at
some points we’ll have to take a look at the behavior of builtins and
specials.
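The properties listed above can be checked directly in the console. The following is a minimal sketch that uses sum as a stand-in for a builtin and quote as a stand-in for a special function; square is redefined here only to keep the example self-contained.

```r
# sum is a builtin, quote is a special (both live in the base package)
typeof(sum)
typeof(quote)

# Neither has R-level components: all three accessors return NULL
prim.parts = list(formals(sum), body(sum), environment(sum))
all(sapply(prim.parts, is.null))

# A self-written function is always a closure
square = function(x) x^2
typeof(square)
```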
formals(square)
## $x
typeof(formals(square))
## [1] "pairlist"
str(formals(square))
Pairlists
Pairlists are deprecated for normal use! Thus, we’ll never actually use
them, but internally, they are used quite frequently.
Differences: List vs. pairlist
Empty pairlists are NULL as opposed to a list of length 0.
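Pairlists are singly linked objects. Each element only knows its
successor, so the total length of a pairlist is not stored and has to be
determined by walking through it.
Lists are generic vectors: all elements are stored in one table, so the
length is known and any element can be accessed directly.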
Every pairlist can be treated just like a normal list. However, normal lists
are usually more efficient. This is why pairlists are often automatically
converted to lists.
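The differences between lists and pairlists can be tried out with a small example. This is only a sketch; the function f is a hypothetical placeholder that is not part of the slides.

```r
f = function(x, y = 2) x + y

# formals() returns a pairlist ...
fm = formals(f)
typeof(fm)

# ... which can be indexed like a normal list
fm$y
names(fm)

# Explicit conversion yields a regular list
typeof(as.list(fm))

# An empty pairlist is NULL, an empty list is not
is.null(pairlist())
is.null(list())
```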
str(formals(square)[1])
## List of 1
## $ x: symbol
Content of formals
So, formals are merely a named list of arguments. The names are
mandatory: each formal argument has a name.
groceries = function(Milk = 1, Bread, Butter = "Landliebe"){
list(Milk = Milk, Bread = Bread, Butter = Butter)
}
str(formals(groceries))
## [1] 1
f("a")
## [1] "a"
Argument matching I
## List of 3
## $ Milk : num 5
## $ Bread : num 4
## $ Butter: chr "Rama"
Argument matching II
The following steps are checked in exactly this order until every calling
parameter is matched to a formal parameter.
1 Is there a formal parameter with an identical name? If so, the
respective calling parameter is matched to it.
2 If not: Is there a formal parameter whose beginning matches the name
of the calling parameter? If there is exactly one fitting parameter, then
they are matched. If there are multiple fitting parameters, an error
occurs.
3 If there are named calling parameters that can’t be matched, an error
occurs (exception: the ’...’ argument).
4 All unnamed calling parameters are matched according to the order of
yet unmatched formal arguments. If there remain more calling
parameters than formal arguments, an error occurs.
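The matching steps above can be reproduced with a small function. This is a sketch under assumptions: shop and its parameter names are hypothetical, and the error cases are wrapped in tryCatch() so that the script keeps running.

```r
# Hypothetical function with three formal parameters
shop = function(bread, milk = 1, butter = "none") {
  list(bread = bread, milk = milk, butter = butter)
}

# Step 1 and step 4: butter is matched by its exact name,
# the unnamed 2 goes to the first unmatched formal (bread)
exact = shop(butter = "salted", 2)

# Step 2: unique partial matches ("bu" -> butter, "br" -> bread)
partial = shop(bu = "salted", br = 2)

# Step 2, ambiguous: "b" fits both bread and butter -> error
ambiguous = tryCatch(shop(b = 2), error = function(e) "ambiguous")

# Step 3: a named calling parameter that matches nothing -> error
unused = tryCatch(shop(bread = 2, cheese = 1), error = function(e) "unused")

# Step 4: more unnamed calling parameters than formals -> error
surplus = tryCatch(shop(1, 2, "x", 4), error = function(e) "too many")
```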
## [1] 5.5
mean(x = 1:10)
## [1] 5.5
Good practices
1 Set the first or the first two arguments via positional matching.
2 Match all other arguments via complete matching.
## [1] 5.5
## List of 3
## $ Milk : num 1
## $ Bread : num 2
## $ Butter: chr "Landliebe"
Default values
Lazy evaluation
Arguments in R are not evaluated until they are actually needed. Until then,
they are so-called promise objects.
f = function(x) {
return(10)
}
f(stop("This is an error!"))
## [1] 10
## [1] 20
Default values can even depend on values that are defined in the function:
f = function(x = 2 * internal.param) {
internal.param = 10
return(x)
}
f()
## [1] 20
By the way, the && and || operators are another example of this: their
right-hand side is only evaluated if the left-hand side does not already
determine the result.
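The lazy behavior of && and || can be observed directly. A minimal sketch: the stop() calls would abort the script if they were ever evaluated, and the variable x is a hypothetical input.

```r
# The right-hand side is only evaluated if it is needed
left.false = FALSE && stop("never evaluated")
left.true = TRUE || stop("never evaluated")

# Typical use: guard a check that would fail on its own
x = NULL
ok = !is.null(x) && x > 0   # x > 0 is skipped because x is NULL
```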
Daniel Horn & Sheila Görz Advanced R Summer Semester 2022 27 / 92
First component: Formals
## function (x)
## x
## <bytecode: 0x0000000006de70f8>
## <environment: namespace:base>
f(x = ls())
## List of 3
## $ x: num 4
## $ y: num 2
## $ z: num 3
Thus, no errors occur anymore and the collected calling parameters can be
processed inside of the function.
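One common way to process the collected calling parameters is to turn them into a list. The helper below is a hypothetical sketch, not the function from the slides; it only shows that list(...) makes the collected arguments accessible by name.

```r
# Hypothetical helper: collects everything passed via ... into a list
collectArgs = function(...) {
  args = list(...)
  str(args)         # show what was collected
  invisible(args)   # return the list without printing it again
}

res = collectArgs(x = 4, y = 2, z = 3)
names(res)
```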
A function with the ... argument can pass the arguments collected by it
to another function with the ... argument:
absoluteMean = function(x, ...) {
abs.vals = abs(x)
abs.mean = mean(abs.vals, ...)
return(abs.mean)
}
absoluteMean(c(1, -1, 2, NA), na.rm = TRUE)
## [1] 1.333333
An important example for the use of ... is the plot function. plot() itself
only possesses a few parameters like x, y, xlim etc. However, a variety of
additional graphical parameters (e.g. col) can be passed to plot which
will be passed to more basic functions like par() via ...
## [1] 18
## [1] 18
Caution!
The ... argument is a powerful tool – but it also circumvents some safety
mechanisms. For example, typos are simply collected by the ... argument
instead of resulting in an error:
sum(1, 2, NA, na.mr = TRUE)
## [1] NA
Therefore: Use with care. Sometimes it’s safer to use a list with additional
arguments and do.call():
absoluteMean2 = function(x, additional.args = list()) {
args = list(x = abs(x))
abs.mean = do.call("mean", c(args, additional.args))
return(abs.mean)
}
absoluteMean2(c(1, -1, 2, NA), additional.args = list(na.rm = TRUE))
## [1] 1.333333
body(square)
## {
## return(x^2)
## }
typeof(body(square))
## [1] "language"
environment(square)
## <environment: R_GlobalEnv>
typeof(environment(square))
## [1] "environment"
f = function() {
  x = 5
  print(environment())
  print(parent.env(environment()))
  return(x)
}
f()
## <environment: 0x0000000007996900>
## <environment: R_GlobalEnv>
## [1] 5
environment(f)
f()
A new beginning
Every time a function is called anew, it starts from its beginning. This
might seem obvious to us, but it doesn’t always comply with the expected
behavior:
Thus, a function can’t remember anything (at least not in this way). It’s
memoryless so to speak.
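A small sketch illustrates this memoryless behavior. The function counter is a hypothetical example: it tries to count its own calls with a local variable, which cannot work because every call gets a fresh execution environment.

```r
# Hypothetical counter that tries to remember how often it was called
counter = function() {
  # n never exists at this point: each call starts in a new environment
  if (!exists("n", inherits = FALSE)) {
    n = 0
  }
  n = n + 1
  return(n)
}

first = counter()
second = counter()   # same result as the first call
```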
Third component: Environment
The main purpose of the enclosing environment is to integrate the function
into the existing search tree, i.e. to allow for scoping.
x = 2
f()
## [1] 2
x = x + 1
f()
## [1] 3
And that’s really all there is to say about this – though we’ll see in the
exercises that reality might prove more complicated than that
at times.
g = function() {
  encl.env = parent.env(environment())
  new = !exists("count", encl.env)
  if (new) {
    message("Defining count")
    assign("count", 1, envir = encl.env)
  } else {
    count <<- count + 1
  }
  count
}
environment(g) = new.env()
g()
## Defining count
## [1] 1
g()
## [1] 2
count
## Error in eval(expr, envir, enclos): object 'count' not found
Aside from the regular assignment via <- and =, there also is a deep
assignment in R. It’s defined in the following way:
Definition: x <<- val
1 Does the current environment have a parent environment?
Binding environments
f = function()
  return(x)
f2 = f
environment(f2) = new.env()
environment(f2)$x = 2
environment(f2)
## <environment: 0x0000000006bedd18>
x = 10
f()
## [1] 10
f2()
## [1] 2
f = function() {
  cal.env = parent.frame()
  print(cal.env)
  x = get("x", envir = cal.env)
  return(x)
}
x = 10
environment(f) = new.env()
environment(f)$x = 2
f()
## <environment: R_GlobalEnv>
## [1] 10
environment(f)
## <environment: 0x00000000072612d0>
Scoping that searches for variables in the calling environment is also called
dynamic scoping. In R, it can only be performed by explicitly using
parent.frame().
f(x = ls())
We’re now able to explain it: Default values are evaluated in the execution
environment whereas calling parameters are evaluated in the calling
environment. ls() then returns differing contents.
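The difference can be made visible with a short sketch. The definition of f below is an assumption made for illustration (the original definition is not shown in this section), and caller is a hypothetical wrapper that makes the calling environment predictable.

```r
# Assumed definition: the default value of x is a call to ls()
f = function(x = ls()) {
  a = 1
  b = 2
  x
}

# Default value: ls() runs in the execution environment of f
# and therefore sees the local variables of f
inside = f()

# Calling parameter: ls() runs where f was called, here inside caller()
caller = function() {
  outer.var = 0
  f(x = ls())
}
outside = caller()
```

Here inside contains the names a, b and x, whereas outside only contains outer.var.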
Component 4: Byte code
Compiled R code
Compiling an R function
Most closures from the base package are already compiled and as
such contain byte code:
mean
## function (x, ...)
## UseMethod("mean")
## <bytecode: 0x0000000008898858>
## <environment: namespace:base>
## Unit: microseconds
## expr min lq mean median uq max neval
## f(1000) 30.0 31.0 57.0 32.0 34 2400 100
## fCMP(1000) 30.0 31.0 33.0 32.0 34 60 100
## sum((1:1000)^2) 3.2 3.9 5.2 4.5 6 18 100
Wait a minute! The compiled version isn’t that much faster than the
original version. So, compiling it doesn’t help at all?
Changelog R 3.4.0
The JIT (‘Just In Time’) byte-code compiler is now enabled by default at
its level 3.
library(compiler)
library(microbenchmark)
enableJIT(level = 0)
## [1] 3
f = function(x) {
  res = 0
  for (i in 1:x) {
    res = res + i^2
  }
  res
}
fCMP = cmpfun(f)
print(microbenchmark(f(1000), fCMP(1000), sum((1:1000)^2)), signif = 2)
## Unit: microseconds
## expr min lq mean median uq max neval
## f(1000) 410.0 430.0 510.0 450.0 570.0 1600 100
## fCMP(1000) 29.0 30.0 33.0 31.0 32.0 130 100
## sum((1:1000)^2) 3.2 4.3 6.9 6.1 7.6 29 100
## List of 2
## $ x: num 1
## $ y: num 2
## [1] 1
Invisible output
## [1] 1
When executing functions, things can go wrong, e.g. when the input
parameter has the wrong type. In these situations, the user should be
notified and in extreme cases the execution of the function should be
aborted with an error. R offers three types of notifications:
1 message(): In case that nothing severe occurs, but the user still
should be notified.
2 warning(): A potentially harmful situation where the execution is still
possible but probably not in the way intended by the user.
3 stop(): Everything is lost. The function’s execution is stopped and
the user is presented with an error message (in red!)
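The three notification types can be combined in one function. The following is a sketch with a hypothetical function safeLog; the messages and warnings are suppressed in the calls below only to keep the output short.

```r
# Hypothetical function using all three kinds of notification
safeLog = function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric")                  # abort with an error
  }
  if (any(x <= 0)) {
    warning("non-positive values set to NA")   # continue, but warn
    x[x <= 0] = NA
  }
  message("Computing log of ", length(x), " value(s)")  # just inform
  log(x)
}

res = suppressMessages(suppressWarnings(safeLog(c(1, -1))))
err = tryCatch(safeLog("a"), error = function(e) conditionMessage(e))
```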
Value of options(warn)   Behavior
negative                 Warnings are ignored
0                        Warnings are collected, printing via warnings()
1                        Warnings are printed directly to the console
>1                       Warnings are treated like errors
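The effect of the warn option can be tried out as follows. This is a sketch with hypothetical helper functions; the old option value is restored via on.exit() so that the session settings are not changed permanently.

```r
# Hypothetical helper that raises a warning
risky = function() {
  warning("something looks odd")
  "finished"
}

strict = function() {
  old = options(warn = 2)   # treat warnings like errors ...
  on.exit(options(old))     # ... and restore the old setting afterwards
  tryCatch(risky(), error = function(e) "stopped")
}

before = getOption("warn")
res = strict()
after = getOption("warn")
```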
This way, the return type depends on the input without the user having any
influence over it. If the output is supposed to be processed further, new
errors will occur.
The popular practice of using specific numbers to code errors (e.g: return
value 99 = error) should also be avoided in R.
The correct way is using stop() to stop the execution with an error.
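The difference between both approaches can be sketched with two hypothetical functions. With the error code, a faulty input goes unnoticed and is processed further; with stop(), the caller has to deal with the problem explicitly.

```r
# Discouraged: 99 codes an error, but it is also a perfectly valid mean
meanOrCode = function(x) {
  if (!is.numeric(x)) return(99)
  mean(x)
}

# Preferred: abort with stop(), so every returned value is a real mean
meanOrStop = function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  mean(x)
}

wrong = meanOrCode("a") + 1   # silently continues with a wrong value
res = tryCatch(meanOrStop("a"), error = function(e) NA_real_)
```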
The return value
Side effects
Most functions in R map a set of inputs onto an output. They are also
called pure functions. Contrary to those, there are also functions that have
other side effects aside from returning a value.
Definition: Side effects
Pure functions only compute a return value. Other impacts of a function,
e.g. on the R session status or on the file system, are called side effects.
Examples of a side effect are the generation of a variable in the global
environment or the generation of a graphic or a file.
A simple exemplary function with a side effect can be obtained by using <<-:
fun = function(x, value) x <<- value
All of these functions have one thing in common: They are called explicitly
for their side effect. Their actual return value is usually disregarded. Yet,
they always have a return value.
x = library(gtools)
x
f = function(y) {
  x <<- y + 1
  return(x)
}
x = 1
f(x)
## [1] 2
x
## [1] 2
Such side effects can lead to errors that are hard to find.
Rule
A function that is not explicitly called for its side effect should not have
a side effect.
nastyFunction = function() {
all.var.names = ls(pos = 1)
for (name in all.var.names) {
value = get(name, pos = 1)
if (is.numeric(value)) {
assign(name, pos = 1, value = value + 1)
}
}
return("gnahahahaha")
}
nastyFunction()
## [1] "gnahahahaha"
## [1] 3
Cleaning up I
In some cases, it’s not possible to prevent side effects. For example, a
function might require the change of global graphic parameters:
par()$mfrow
## [1] 1 1
myPlot = function(){
par(mfrow = c(2, 1))
# Do some nice plotting
return(invisible(NULL))
}
myPlot()
par()$mfrow
## [1] 2 1
From the outside, the global options appear to have been magically
changed - probably even without us intending to do so.
Cleaning up II
on.exit()
Calling on.exit() inside a function defines program code that is executed
upon exiting the function. It doesn’t matter if the function terminates
successfully (e.g. via return()) or with an error. Code defined through
on.exit() is always executed.
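That the code registered via on.exit() runs in both cases can be checked with a small sketch. The function worker and the log vector log.lines are hypothetical; the log is filled via <<- on purpose, so that the cleanup remains visible after the function has ended.

```r
log.lines = character(0)   # hypothetical log, filled as a side effect

worker = function(fail) {
  # Registered now, executed when the function exits
  on.exit(log.lines <<- c(log.lines, "cleanup"))
  if (fail) stop("something went wrong")
  return("done")
}

ok = worker(fail = FALSE)                                          # regular exit
bad = tryCatch(worker(fail = TRUE), error = function(e) "error")   # exit with an error
log.lines
```

After both calls, log.lines contains two entries: the cleanup code was executed once per call.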
Cleaning up III
## [1] 1 1
myPlot = function(){
old.pars = par(no.readonly = TRUE)
on.exit(do.call(par, old.pars))
par(mfrow = c(2, 1))
# Do some nice plotting
}
myPlot()
par()$mfrow
## [1] 1 1
Good practices
The return value of a function should always have the same type and
structure.
The return value of a function is usually specified by return(). This
should be used in particular to directly terminate the function in
simple situations.
You’re welcome to replace return(value) at the end of a function by
a simple value.
Side effects should be avoided. Examples of acceptable side effects:
Loading packages, objects, ...
Beware: Existing variables can be overwritten by this.
Storing objects in data files.
Printing output to the console (don’t overdo it!).
Generating or displaying a graphic.
Functions ’without output’ should return invisible(NULL).
Function calls
Prefix functions I
The only thing that remains is: How do you call a function? Usually, this is
done using round brackets:
Definition: Prefix function
A function that is called according to the pattern:
'functionname'(name1 = arg1, name2 = arg2, ...)
is called a prefix function.
Prefix functions II
Infix function I
Infix function II
1 + 2
## [1] 3
`+`(1, 2)
## [1] 3
## [1] 1 ## [1] 1
Unary operators
In addition to these binary infix operators, there also are a handful of unary
operators in R:
x = -5
+x
## [1] -5
-x
## [1] 5
x = TRUE
!x
## [1] FALSE
(2 + 2)
## [1] 4
10 %m% 5 %m% 3
## [1] 2
## [1] 2
10 %m% (5 %m% 3)
## [1] 8
This also concerns self-created infix functions! The order of evaluation for
operators defined in base R is more complicated. Here, R also has to
consider the order of arithmetic operations for example.
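The left-to-right evaluation of self-created infix functions can be reproduced with a short sketch. The definition of %m% is not shown in this section; the code below assumes that it is plain subtraction, which is consistent with the printed results 2 and 8.

```r
# Assumption: %m% is plain subtraction (the original definition is not shown)
`%m%` = function(a, b) a - b

chained = 10 %m% 5 %m% 3      # evaluated left to right: (10 - 5) - 3
left = (10 %m% 5) %m% 3       # same order, written explicitly
right = 10 %m% (5 %m% 3)      # brackets change the order: 10 - (5 - 3)
```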
Operators from a higher class are always evaluated first. Within a class,
evaluation usually occurs from left to right.
Brackets
The four bracket types `(` `[` `[[` `{` are also just functions in R.
However, their behavior is different from every other function: They
require a further character, a closing bracket.
Every pair of brackets encloses (`{` even multiple) R statements. It’s
evaluated before the brackets themselves.
`[` and `[[` can have more than two arguments (matrix subsetting).
This atypical behavior is possible because they aren’t closures:
typeof(`(`)
## [1] "builtin"
typeof(`[`)
## [1] "special"
typeof(`[[`)
## [1] "special"
typeof(`{`)
## [1] "special"
Replacement functions I
In some assignments, there is a function call on the left side – e.g. when
setting the names attribute of an object.
y = x = 1:2
names(x) = c("a", "b")
x
## a b
## 1 2
We’ve all seen this kind of assignment before and have come to accept it.
But at this point, we want to ask ourselves: How is it possible to assign a
value to the result of a function call? This appears to contradict everything
we have learned so far in this chapter.
Replacement functions II
Just like infix functions, these so-called replacement functions are just
another notation for a special kind of prefix functions. Alternatively, we can
also use the prefix call:
y = `names<-`(y, c("a", "b"))
y
## a b
## 1 2
x = 1
names(x) = "a"
x
## a
## 1
x[2] = 2
x
## a
## 1 2
is equivalent to:
y = 1
y = `names<-`(y, "a")
y
## a
## 1
y = `[<-`(y, 2, 2)
y
## a
## 1 2
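Replacement functions can also be defined by the user. The following sketch uses a hypothetical function `second<-`; the name has to end in <- and the last argument has to be called value, so that R can translate the replacement notation into the prefix call.

```r
# Hypothetical replacement function: replaces the second element of x
`second<-` = function(x, value) {
  x[2] = value
  x
}

x = 1:5
second(x) = 10L             # replacement notation

y = `second<-`(1:5, 10L)    # equivalent prefix call
```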
Key words like for, if, while and even function are just functions
themselves.
typeof(`for`)
## [1] "special"
typeof(`function`)
## [1] "special"
Their call behavior differs wildly from other functions – they are neither
prefix nor infix nor replacement. This is possible because their type is
special or builtin and thus, they don’t have to adhere to the behavior
of closures. Fortunately, we can’t define functions with such a call
behavior ourselves.
## [1] 1 ## [1] 1
## [1] 2 ## [1] 2
val val
## NULL ## NULL
## [1] 2 ## [1] 2
val val
## [1] 2 ## [1] 2
Summary
Functions behave just like any other object and can be treated as such.
Aside from the internal types builtin and special, we usually
encounter closures. They consist of a list of arguments, a body and
an environment.
Argument matching matches calling parameters to the list of formal
parameters.
The enclosing environment integrates the function into the lexical
scoping used by R.
Functions can be compiled for the purpose of efficiency.
A function in R always has exactly one return value.
Functions are not just an object in R; they play a key role as
everything in R results from function calls.