K21 Data types
Overview
There are 24 different data types in R. Some are well known (e.g. logical or double), while others appear (almost) exclusively in R's internals and are thus less familiar (e.g. promise and symbol).
A complete list can be found on the help page for typeof().
Only six of these 24 data types hold actual data and are intended for direct use; we use four of them constantly. This chapter is mostly about these six data types.
An overview of the remaining types will be provided at the end of the
chapter.
Over the course of the semester, we will deal with (almost) all of the 24 data types to a greater or lesser extent.
The six basic data types are double, integer, character, logical, complex and raw.
We will now take a closer look at each of these data types. In particular, we
are interested in answering the following questions:
What is the purpose of the respective data type?
How much memory do single or multiple components of a specific
data type require?
How do we generate elements of a specific data type?
Paradigm 2
The six basic data types are always vectors. Along with lists (see Chapter
2.2) and expressions, they constitute the vector data types.
Consequently, ...
... functions and commands for vectors (e.g. length()) can be applied to all vector objects.
... there are no scalars in R. A single number or a single character string is always a vector of length 1.
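A quick check in base R (a sketch; the variable name x is arbitrary) that a single value behaves like any other vector:

```r
x <- 5
length(x)        # 1
is.vector(x)     # TRUE
is.vector("a")   # TRUE: a single string is a character vector of length 1
x[1]             # 5: indexing works as for longer vectors
x[2]             # NA: index beyond the end of the vector
```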
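Each type has a constructor function that creates a vector of the requested length, filled with the type's default value:
double(1)
## [1] 0
logical(1)
## [1] FALSE
integer(1)
## [1] 0
complex(1)
## [1] 0+0i
character(1)
## [1] ""
raw(1)
## [1] 00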
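In floating-point notation, a number is represented by a pair (e, f) of an exponent e and a fraction f, with base b and exponent bias q:
$$(e, f) \mapsto f \times b^{e-q}, \qquad |f| < 1.$$
Example in base b = 2:
$$x = 0.100101_2 \cdot 10_2^{11_2}$$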
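Assuming the example above is written in base b = 2 (so that 10_2 = 2 and the exponent 11_2 = 3 stands for e − q), its value can be checked in R by summing the weighted fraction bits. This only illustrates the notation; R's doubles use the IEEE 754 format, whose base and significand length `.Machine` reports:

```r
# Fraction 0.100101_2, first bit after the binary point first
frac_bits <- c(1, 0, 0, 1, 0, 1)
f <- sum(frac_bits * 2^-(1:6))   # 0.578125, so |f| < 1
e_minus_q <- 3                   # exponent 11_2
f * 2^e_minus_q                  # 4.625
.Machine$double.base             # 2
.Machine$double.digits           # 53 bits in the significand
```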
str(1)
## num 1
str(1:2)
## int [1:2] 1 2
str(1L)
## int 1
str(numeric(1))
## num 0
Most of the time, integers behave like doubles; both also count as numeric (is.numeric() returns TRUE for either).
So what is the point of integers? They offer advantages with respect to speed and memory requirements.
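A rough way to see the storage advantage, using utils::object.size() (exact byte counts may differ slightly across platforms):

```r
object.size(integer(1e6))   # about 4 MB: 4 bytes per element
object.size(double(1e6))    # about 8 MB: 8 bytes per element
```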
Daniel Horn & Sheila Görz Advanced R Summer Semester 2022 17 / 62
Characteristics of the six basic data types integer
intToBits(41)
## [1] 01 00 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00
rev(intToBits(2^31-1))
## [1] 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
## [26] 01 01 01 01 01 01 01
rev(intToBits(0))
## [1] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00
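intToBits() lists the bits starting with the least significant one, which is why rev() is used above. The last bit (the first one in reversed order) specifies the sign. Apparently, '00' denotes a positive sign.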
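To see that intToBits() indeed starts with the least significant bit, the bits can be weighted with powers of two and summed. A small sketch for the positive number 41 (bit 32, the sign bit, is left out of the sum):

```r
bits <- as.integer(intToBits(41L))  # 32 zeros and ones, least significant bit first
bits[1:6]                           # 1 0 0 1 0 1
sum(bits[1:31] * 2^(0:30))          # 41 = 2^0 + 2^3 + 2^5
bits[32]                            # 0: the sign bit of a positive number
```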
rev(intToBits(-41))
## [1] 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
## [26] 01 00 01 00 01 01 01
rev(intToBits(-(2^31-1)))
## [1] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 01
What happened here? Why aren't negative numbers represented just like positive numbers, only with '01' as their last (first) bit?
R uses the so-called ’two’s complement’ to express negative numbers.
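The two's complement of a number is formed by inverting all of its bits and adding 1. A sketch of this recipe for 41, using base R's packBits() to turn the inverted bit vector (least significant bit first) back into an integer:

```r
bits41 <- as.logical(intToBits(41L))              # TRUE/FALSE, least significant bit first
inverted <- packBits(!bits41, type = "integer")   # all 32 bits flipped: -42
neg41 <- inverted + 1L                            # adding 1 gives -41
identical(intToBits(neg41), intToBits(-41L))      # TRUE: same bit pattern as -41
```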
So now we have 2^31 − 1 positive and 2^31 − 1 negative numbers plus a single 0. But one bit combination is still unassigned: the 'negative zero', i.e. '01' for the sign bit (negative) and '00' for all remaining bits.
R uses exactly this bit combination for NA_integer_, the integer NA:
rev(intToBits(NA_integer_))
## [1] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00
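Because this remaining bit pattern is used for NA, the range of integers is symmetric. A short check in base R; the overflow warning is suppressed here only to keep the output short:

```r
.Machine$integer.max                                  # 2147483647, i.e. 2^31 - 1
-.Machine$integer.max                                 # -2147483647, the smallest integer
big <- suppressWarnings(.Machine$integer.max + 1L)
big                                                   # NA: the result does not fit into 32 bits
```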
Task: Calculate 7 − 6.
First, note that 7 − 6 = 7 + (−6).
The binary representation of 7 is '0111', while that of 6 is '0110'.
Now, determine the two's complement of 6:
Negating the bits: '1001',
Adding 1: '1001' + '0001' = '1010'.
Then, calculate:
    0 1 1 1
+   1 0 1 0
  1 1 1        (carries)
  ---------
  1 0 0 0 1
With 4 bits, the leading fifth bit does not fit and is discarded, leaving '0001' = 1.
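The 4-bit calculation can be replayed in R with bit masks. This is a sketch under the assumption of a hypothetical 4-bit word size; the helper twos_complement_4() is made up for illustration, while bitwNot() and bitwAnd() are base R:

```r
mask <- 15L   # binary 1111: keep only the lowest 4 bits
# Hypothetical helper: 4-bit two's complement (invert, keep 4 bits, add 1, keep 4 bits)
twos_complement_4 <- function(x) bitwAnd(bitwAnd(bitwNot(x), mask) + 1L, mask)
neg6 <- twos_complement_4(6L)   # 10 = binary 1010
total <- 7L + neg6              # 17 = binary 10001
bitwAnd(total, mask)            # 1: the fifth bit is dropped
```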
Character strings can be delimited by double or single quotes; R always prints them with double quotes:
"a"
## [1] "a"
'a'
## [1] "a"
To map bit strings to characters, encoding tables are used. There are many different standard encodings; a common one is UTF-8.
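A small illustration of UTF-8 in base R; the letter "ä" is written as the Unicode escape "\u00e4" so that the example does not depend on the encoding of the script file:

```r
utf8ToInt("a")              # 97, the code point of "a"
x <- "\u00e4"               # "ä"
charToRaw(x)                # c3 a4: UTF-8 needs two bytes for this letter
nchar(x)                    # 1 character
nchar(x, type = "bytes")    # 2 bytes
```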
3i
## [1] 0+3i
7 + 2i
## [1] 7+2i
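Complex numbers can be taken apart with base R's Re(), Im(), Mod() and Conj(); a brief example:

```r
z <- 3 + 4i
Re(z)       # 3
Im(z)       # 4
Mod(z)      # 5, i.e. sqrt(3^2 + 4^2)
Conj(z)     # 3-4i
typeof(z)   # "complex"
```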
as.raw(0)
## [1] 00
as.raw(255)
## [1] ff
as.raw(40)
## [1] 28
as.raw(256)
## Warning: out-of-range values treated as 0 in coercion to raw
## [1] 00
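Raw values are bytes, and R prints them in hexadecimal. Converting back with as.integer() shows the decimal values:

```r
r <- as.raw(c(0, 40, 255))
r                             # 00 28 ff
as.integer(r)                 # 0 40 255
as.raw(0x28) == as.raw(40)    # TRUE: 0x28 is hexadecimal for 40
```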
Up until now, we have not really talked about the memory requirements of the individual data types. All we know so far is that a double ideally requires 64 bits = 8 bytes and an integer 32 bits = 4 bytes.
object.size(raw(1))           ## 56 bytes
object.size(integer(1))       ## 56 bytes
object.size(logical(1))       ## 56 bytes
object.size(double(1))        ## 56 bytes
object.size(character(1))     ## 112 bytes
object.size(complex(1))       ## 64 bytes
object.size(raw(0))           ## 48 bytes
object.size(integer(0))       ## 48 bytes
object.size(logical(0))       ## 48 bytes
object.size(double(0))        ## 48 bytes
object.size(character(0))     ## 48 bytes
object.size(complex(0))       ## 48 bytes
library(xtable)
Examples:
A character vector containing a single (empty) string:
object.size("")
## 112 bytes
# 48 bytes (vector) + 56 bytes (the string "" itself) + 8 bytes (pointer to it)
The linear model above (and the following figure) shows a slope of only 8 bytes per element for characters. This is because character(x) returns a vector containing the same (empty) string x times. R stores identical strings only once, so each further element only adds an 8-byte pointer.
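The effect of shared strings can be made visible by comparing 100 copies of one string with 100 different strings (a sketch; the strings themselves are arbitrary):

```r
same     <- rep("s1", 100)        # one string, repeated
distinct <- paste0("s", 1:100)    # 100 different strings
object.size(same)                 # the string is counted once, plus one pointer per element
object.size(distinct)             # every distinct string adds its own storage
```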
suppressWarnings(library(ggplot2))
[Figure: memory requirement in bytes (y-axis, 0 to about 600) against vector length (x-axis, 0 to 40), one line per type: character, complex, double, integer, logical, raw]
Findings:
The memory requirement often does not grow perfectly linearly. It sometimes grows in steps or even stays constant.
Memory is always allocated in chunks of at least 8 bytes. Thus, the 1 byte per element for a raw is really just an average value.
The line for integers appears to be missing, but it is completely covered by the line for logicals. Both have exactly the same memory requirements:
all(bytes["integer", ] == bytes["logical", ])
## [1] TRUE
Integers and logicals, too, need only 4 bytes per element on average, but at least 8 bytes are allocated at once. This explains the slightly jagged graph.
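The allocation in chunks can be checked directly: on a typical 64-bit R build, vectors whose data fit into the same 8-byte chunk report the same size:

```r
object.size(raw(1)) == object.size(raw(8))          # TRUE: 1 and 8 bytes of data
object.size(integer(1)) == object.size(integer(2))  # TRUE: 4 and 8 bytes of data
```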
Paradigm 3
A vector can only ever contain elements of a single data type.
So the question becomes: when is one data type coerced into another?
Caution: R performs coercion silently! This can be dangerous!
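Solution: When types are mixed, all elements are coerced to the most flexible type involved, following the order raw < logical < integer < double < complex < character.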
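The following checks with typeof() show which type results when values of different types are combined with c():

```r
typeof(c(TRUE, 1L))   # "integer"
typeof(c(1L, 2.5))    # "double"
typeof(c(2.5, 1i))    # "complex"
typeof(c(1i, "a"))    # "character"
c(TRUE, "a")          # "TRUE" "a": the logical value became a string
```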
Beware! When converting back and forth, the result is not always the same
as the original input. Example:
as.double(as.logical(5))
## [1] 1
Operator   Meaning
+x         Positive value of x
-x         Change sign of x
x + y      Addition
x - y      Subtraction
x * y      Multiplication
x / y      Division
x^y        Exponentiation
x %% y     Modulo division
x %/% y    Integer division
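Integer division and modulo division fit together so that (x %/% y) * y + x %% y equals x. For negative numbers, R rounds the quotient down, so the remainder takes the sign of the divisor:

```r
7 %/% 3      # 2
7 %% 3       # 1
-7 %/% 3     # -3, not -2
-7 %% 3      # 2, not -1
x <- -7; y <- 3
(x %/% y) * y + x %% y == x   # TRUE
```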
This probably comes as no surprise, so let us take a look at some peculiarities of these operators:
Operators for basic data types numerics
When round() rounds to whole numbers, values ending in exactly .5 are rounded to the nearest even number:
round(0.5)
## [1] 0
round(1.5)
## [1] 2
round(2.5)
## [1] 2
round(3.5)
## [1] 4
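If rounding half up is needed instead, one common workaround for non-negative numbers (an assumption here, not a dedicated base R function) is floor(x + 0.5):

```r
x <- c(0.5, 1.5, 2.5, 3.5)
round(x)         # 0 2 2 4: half to even
floor(x + 0.5)   # 1 2 3 4: half up (for non-negative x)
```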
The following operators for logical values (and values that can be
interpreted as such) are available:
Operator    Meaning
!x          Vectorized negation
x & y       Vectorized AND
x && y      Scalar AND
x | y       Vectorized OR
x || y      Scalar OR
xor(x, y)   Vectorized XOR
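A short example of the difference between the vectorized and the scalar operators; && and || only evaluate their right-hand side when it is needed to determine the result:

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
x & y        # TRUE FALSE FALSE
x | y        # TRUE TRUE TRUE
xor(x, y)    # FALSE TRUE TRUE
FALSE && stop("never evaluated")   # FALSE
TRUE || stop("never evaluated")    # TRUE
```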
The operators !, &, | and xor() work bitwise on raws. This can lead to the following results:
as.numeric(!as.raw(40))
## [1] 215
as.numeric(!as.raw(128))
## [1] 127
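For two raw values, &, | and xor() combine the bits position by position; in the comments, the bytes are also written in binary:

```r
a <- as.raw(12)   # 00001100
b <- as.raw(10)   # 00001010
a & b             # 08 = 00001000
a | b             # 0e = 00001110
xor(a, b)         # 06 = 00000110
```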
NA & TRUE
## [1] NA
NA & FALSE
## [1] FALSE
In the first case, it is not clear whether the expression is TRUE or FALSE, hence NA. In the second case, an AND with FALSE always yields FALSE (more on this later).
As mentioned above, numeric values can be interpreted as logical values as well. In this case, only 0 is interpreted as FALSE and every other number as TRUE:
!5
## [1] FALSE
1 | 2
## [1] TRUE
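as.logical() makes this conversion explicit. Conversely, TRUE counts as 1 and FALSE as 0 in arithmetic, which is handy for counting:

```r
as.logical(c(0, 1, -2.5))    # FALSE TRUE TRUE
sum(c(TRUE, FALSE, TRUE))    # 2
```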
Operators - characters I
Characters have an order as well, which can be queried using <, > and ==:
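A few comparisons; note that strings are compared character by character, not by their numeric value:

```r
"a" < "b"        # TRUE
"10" < "2"       # TRUE: "1" comes before "2"
10 < 2           # FALSE
"abc" == "abc"   # TRUE
```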
Solution:
sort(c("a", "10", "1", "A", "2", "aA", "b", "ä", "ae"))
## [1] "1" "10" "2" "a" "A" "ä" "aA" "ae" "b"
Operators - characters II
¹ https://en.wikipedia.org/wiki/Unicode_collation_algorithm
NAs, NULL, Inf, etc.
The NAs
The NAs? But there's just one NA, isn't there? Nope! There are five NAs in total, one for each basic data type except raw. The plain NA is the logical NA:
typeof(NA)
## [1] "logical"
typeof(NA_character_)
## [1] "character"
typeof(NA_integer_)
## [1] "integer"
typeof(NA_complex_)
## [1] "complex"
typeof(NA_real_)
## [1] "double"
The five NAs all behave the same way. They are just assigned to different
data types.
Operators on NAs I
Comparing a value with NA using == always yields NA, whatever the value:
x <- NA
x == NA
## [1] NA
x <- 5
x == NA
## [1] NA
Instead, checks for NA have to be performed with the is.na() function:
x <- NA
is.na(x)
## [1] TRUE
x <- 5
is.na(x)
## [1] FALSE
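is.na() is vectorized, and anyNA() tells whether a vector contains any NA at all:

```r
x <- c(1, NA, 3)
x == NA      # NA NA NA
is.na(x)     # FALSE TRUE FALSE
anyNA(x)     # TRUE
```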
Operators on NAs II
In general:
’Calculations’ with NAs always lead to NAs.
Sole exception:
If the result is the same for any value the NA could take, then this result is
returned.
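Some examples of the rule and its exceptions:

```r
NA + 1       # NA
NA^0         # 1: every number to the power of 0 is 1
NA || TRUE   # TRUE, whatever NA stands for
NA & FALSE   # FALSE, whatever NA stands for
NA * 0       # NA: presumably because NA could stand for Inf, and Inf * 0 is NaN
```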
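typeof(NULL)
## [1] "NULL"
Comparisons with NULL return an empty logical vector:
x <- NULL
x == NULL
## logical(0)
x <- 5
x == NULL
## logical(0)
Instead, checks for NULL must be performed using the function is.null():
is.null(NULL)
## [1] TRUE
is.null(5)
## [1] FALSE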
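NULL behaves like an empty object: it has length 0, vanishes when combined with c(), and assigning it to a list element removes that element (the list x is just an example):

```r
length(NULL)            # 0
c(1, NULL, 2)           # 1 2
x <- list(a = 1, b = 2)
x$b <- NULL             # removes element b
names(x)                # "a"
```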
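typeof(Inf)
## [1] "double"
typeof(NaN)
## [1] "double"
Aside from Inf (+∞), there is also -Inf (−∞). NaN ("not a number") is, for example, the result of 0 / 0 or ∞ − ∞:
0 / 0
## [1] NaN
Inf - Inf
## [1] NaN
is.nan(NaN)
## [1] TRUE
is.na(NaN)
## [1] TRUE
is.finite(NaN)
## [1] FALSE
is.finite(Inf)
## [1] FALSE
is.finite(NA)
## [1] FALSE
is.finite(5)
## [1] TRUE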
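Inf also arises from division by zero, and is.infinite() complements is.finite(). Note that NA is not NaN, even though is.na(NaN) is TRUE:

```r
1 / 0                              # Inf
-1 / 0                             # -Inf
is.infinite(c(-Inf, Inf, NaN, 1))  # TRUE TRUE FALSE FALSE
is.nan(NA)                         # FALSE
```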
Overview
The six data types described here (seven, when counting NULL) form the foundation of R. There are, however, further data types in R which have not been discussed here. For completeness, we list them briefly.
Data type     Description                   Comment
symbol        Variable name                 E.g. input for functions
pairlist      Paired list                   Mostly for internal use
closure       Function object
environment   Environment                   See Chapter 2.3
promise       Object for 'lazy evaluation'
language      'language' object             E.g. formula
special       Internal function             Does not evaluate its arguments
builtin       Internal function             Does evaluate its arguments
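Most of these types can be produced from ordinary R code and revealed with typeof(). The promise is left out, since it is hard to inspect a promise without forcing its evaluation:

```r
typeof(quote(x))                      # "symbol"
typeof(formals(function(a, b) NULL))  # "pairlist"
typeof(mean)                          # "closure"
typeof(globalenv())                   # "environment"
typeof(y ~ x)                         # "language"
typeof(`if`)                          # "special"
typeof(sum)                           # "builtin"
```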