
Advanced R

Chapter 2.1: Data types

Daniel Horn & Sheila Görz

Summer Semester 2022

Daniel Horn & Sheila Görz Advanced R Summer Semester 2022 1 / 62


Our plan for today

1 The six basic data types

2 Characteristics of the six basic data types

3 Data type memory requirements

4 Hierarchy of data types

5 Operators for basic data types

6 NAs, NULL, Inf, etc.

7 Overview



Data type – what’s that?

What we have learnt in Chapter 1.1: A computer’s memory consists of binary digits (bits, 0 or 1). They are grouped into words of 8 bits (one byte).
The memory contains programs as well as data. It’s up to the given application to interpret these bit strings correctly.
However, users generally don’t want to work with bit strings. Instead, they prefer more comprehensible representations such as (natural) numbers in base 10.
Data type
A data type states how the contents of the binary memory are to be interpreted. In particular, it also specifies how (arithmetic) operations are defined.



Data types in R

There are 24 different data types in R. While some are better known (e.g. logical or double), others appear (almost) exclusively in R’s internals and are thus less well known (e.g. promise and symbol).
A complete list can be found on the help page for typeof().
Only six of these 24 data types contain actual data and are intended for us to use. We use four of them constantly. This chapter is mostly about these six data types.
An overview of the remaining types is provided at the end of the chapter.
Over the course of the semester we will deal, to a greater or lesser extent, with (almost) all of the 24 data types.
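As a quick illustration, typeof() reports the internal type of any object. The sketch below uses only base R; the objects inspected are arbitrary examples chosen here, not taken from the slides.

```r
typeof(1)          # "double"
typeof(1L)         # "integer"
typeof("a")        # "character"
typeof(TRUE)       # "logical"
typeof(sum)        # "builtin"
typeof(mean)       # "closure"
typeof(quote(x))   # "symbol"
```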



Data types and objects

The terms data type and object are closely related:


Data types describe how the content of a single memory cell is to be
interpreted.
An object consists of multiple memory cells (each of which has a data
type). Objects are sometimes called complex or composite data types.
From now on: if we are talking about objects, we are thinking of a set of
memory cells with a data type which can be assigned to a variable in R.
Regarding this, the following holds in R:
Paradigm 1
Everything in R is an object.



The six basic data types


What are the six basic data types in R?

The six basic data types in R are:

double
integer
character
logical
complex
raw

We will now take a closer look at each of these data types. In particular, we
are interested in answering the following questions:
What is the purpose of the respective data type?
How much memory do single or multiple components of a specific
data type require?
How do we generate elements of a specific data type?


General remarks - All of them are vectors

Paradigm 2
The six basic data types are always vectors. Along with lists (see Chapter
2.2) and expressions, they constitute the vector data types.

Consequently, ...
... functions and commands for vectors (e.g. length()) can be
applied to all vector objects.
... there are no scalars in R. A single digit or a single letter is always a
vector of length 1.
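A minimal sketch of what Paradigm 2 means in practice; the variable name x is just an illustration:

```r
x <- 5
length(x)       # 1: a "single number" is a vector of length 1
is.vector(x)    # TRUE
x[1]            # 5: indexing works as for any other vector
length("abc")   # 1: one string, not three characters
```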


General remarks - Generation

Every data type can be generated in R by calling the function of the same name.
Enter the name followed by the desired number of elements in parentheses.
This returns a vector of the requested data type and length, filled with the ’null element’ of the respective data type.

double(1)
## [1] 0

integer(1)
## [1] 0

character(1)
## [1] ""

logical(1)
## [1] FALSE

complex(1)
## [1] 0+0i

raw(1)
## [1] 00



Characteristics of the six basic data types


Data types for numbers - double

double denotes a ’double-precision floating-point number’ and is used to represent real numbers.
Problem: Real numbers can have infinitely many decimal places - yet a computer can obviously only store a finite number of digits. How is this possible?
We now take a short excursion into floating-point arithmetic. For a more detailed discussion of this topic, refer to the bachelor course ’Computergestützte Statistik’.


Definition: Floating-point numbers

In general, we are looking at numbers with a base $b$, an excess (exponent bias) $q$ and a mantissa length (number of significant digits) $p$.
With these, numbers can be expressed through the value pair $(e, f)$ so that:

$(e, f) \mapsto f \times b^{e-q}$,

where $e$ is a natural number from a specified range of values and $f$ is a number between $-1$ and $+1$, i.e.

$|f| < 1$.


Single-precision floating-point format


In this simplified model, single-precision floating-point numbers take the following form:
± eeeeeee ffffffff ffffffff ffffffff ,
i.e. 1 bit for the sign, 7 for the exponent and 24 for the mantissa, which totals 4 bytes. (The IEEE 754 single-precision format used in practice splits the 32 bits differently: 8 exponent bits and 23 stored mantissa bits.)
The computer uses the base $b = 2$.
The largest binary number representable with seven bits is $2^7 - 1 = 127$, hence $0 \le e < e_{sup} = 128$.
A common choice for the excess is $q = \lfloor (e_{sup} - 1)/2 \rfloor = 63$.
⇒ The absolute value of the smallest/largest representable number is smaller than $1 \cdot 2^{127-63} = 2^{64} \approx 1.8 \cdot 10^{19}$.
⇒ The precision (decimal places) of these numbers is $\log_{10}(2^{24}) \approx 7$.

Double-precision floating-point format


For today’s applications, single precision often does not suffice with respect to both range and precision. Therefore, double-precision floating-point numbers are usually used. R only supports the latter.
There are 11 bits available for the exponent, 52 bits for the mantissa and 1 bit for the sign.
The base is $b = 2$.
Thus, the exponent satisfies $0 \le e < e_{sup} = 2^{11} = 2048$.
A possible choice for the excess is $q = \lfloor (e_{sup} - 1)/2 \rfloor = 1023$.
⇒ The absolute value of the smallest/largest representable number is smaller than

$1 \cdot 2^{2047-1023} = 2^{1024} \approx 1.8 \cdot 10^{308}$.

⇒ The precision (decimal places) of these numbers is

$\log_{10}(2^{52}) \approx 16$.
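These limits can be inspected from within R. The following sketch assumes a platform with standard IEEE 754 doubles and only uses the built-in list .Machine:

```r
.Machine$double.xmax     # largest finite double, roughly 1.8e308
.Machine$double.digits   # 53: mantissa bits including the implicit leading bit
.Machine$double.eps      # smallest x with 1 + x != 1, about 2.2e-16
2^1023                   # still finite
2^1024                   # Inf: beyond the representable range
```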


Example of a double-precision floating-point number


In general, we have: $b = 2 = 10_2$, $q = 1023 = 1111111111_2$
⇒ $f \times 10_2^{\,e - 1111111111_2}$

4.625 expressed as a double-precision floating-point number:

$4.625 = 100.101_2$
Thus:
$4.625 = 100.101_2 \cdot 10_2^{0_2}$
$= 10.0101_2 \cdot 10_2^{1_2}$
$= 1.00101_2 \cdot 10_2^{10_2}$
$= 0.100101_2 \cdot 10_2^{11_2}$
$= 0.100101_2 \cdot 10_2^{10000000010_2 - 1111111111_2}$

Consequently:
e = 10000000010
and
f = 1001010000000000000000000000000000000000000000000000

So, now we know everything about real numbers, right?

0.1 + 0.2 == 0.3
## [1] FALSE

0.1 + 0.2 - 0.3
## [1] 5.551115e-17

Not every real number can be represented in R, because only a finite number of bits is available for the mantissa.
That’s why almost all arithmetic operations with floating-point numbers involve roundoff errors.
Beware: For many numbers with a finite decimal expansion the respective binary expansion does not terminate, e.g. 0.1. Thus, even the process of storing such numbers already introduces errors.
R (almost) never calculates exactly, but usually almost exactly.
Still, roundoff errors can accumulate over the course of lengthy calculations and lead to questionable results.
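In practice, doubles are therefore usually compared with a tolerance rather than with ==. A minimal sketch using base R; the manual tolerance 1e-9 is an arbitrary choice made for this example:

```r
x <- 0.1 + 0.2
x == 0.3                     # FALSE: exact comparison fails
isTRUE(all.equal(x, 0.3))    # TRUE: comparison with a tolerance
abs(x - 0.3) < 1e-9          # TRUE: manual tolerance check
sprintf("%.20f", 0.1)        # prints more digits of the stored value
```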

Data types for numbers - integer


The data type integer is (surprisingly) used to store integers.
Barring a few exceptions, integers in R have to be designated explicitly because R would use doubles otherwise:

str(1)
## num 1

str(1L)
## int 1

str(1:2)
## int [1:2] 1 2

str(numeric(1))
## num 0

Integers behave like doubles (most of the time) - both also belong to the numeric type.
So, what’s the use of integers then? They provide advantages w.r.t. speed and storage requirements.

integer - Internal representation

Representation as a bit vector of length 32, i.e. 4 bytes.
The bits are interpreted as a whole number - but wait, what about negative numbers?
Solution: Use 31 bits for the number and 1 bit for its sign.
Note that the largest possible number is $2^{31} - 1$ = 2 147 483 647.
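The upper limit is available in R as .Machine$integer.max. The following sketch shows what happens at the boundary; suppressWarnings() is used here only to keep the output short:

```r
.Machine$integer.max                          # 2147483647
typeof(.Machine$integer.max)                  # "integer"
# Exceeding the range yields NA (with a warning) instead of wrapping around
suppressWarnings(.Machine$integer.max + 1L)   # NA
# Mixing with a double switches to double arithmetic, so nothing overflows
.Machine$integer.max + 1                      # 2147483648
```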


integer - Expressing negative numbers I


intToBits() returns the binary representation of an integer in reversed
order. R denotes a 0 as ’00’ and a 1 as ’01’:
intToBits(41)

## [1] 01 00 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00

rev(intToBits(2^31-1))

## [1] 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
## [26] 01 01 01 01 01 01 01

rev(intToBits(0))

## [1] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00

The last (first in reversed order) bit specifies the sign. Apparently, ’00’
denotes a positive sign.

integer - Expressing negative numbers II

rev(intToBits(-41))

## [1] 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
## [26] 01 00 01 00 01 01 01

rev(intToBits(-(2^31-1)))

## [1] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 01

What happened here? Why aren’t negative numbers expressed just like
positive numbers only with a ’01’ as their last (first) bit?
R uses the so-called ’two’s complement’ to express negative numbers.


integer - Expressing negative numbers III

How do we get the two’s complement of a number?


Negate all bits (including the sign bit),
Then add 1 to the result.
Why even bother with this?
This simplifies arithmetic operations. For example, the subtraction of
two numbers can be regarded as an addition with a negative number
and as such, we can use our ’algorithm’ for addition by hand.
Zero does not have two representations anymore. There is no ’+0’ and
’-0’.
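The two steps can be checked directly on 32-bit integers with the base R function bitwNot(), which negates all bits. This is only a small sanity check; the value 41 is taken from the earlier example:

```r
x <- 41L
bitwNot(x)         # -42: all 32 bits negated
bitwNot(x) + 1L    # -41: negating all bits and adding 1 gives -x
```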


integer - Expressing negative numbers IV

So, now we have $2^{31} - 1$ positive numbers, $2^{31} - 1$ negative numbers and a single 0. But there is still one bit combination left unassigned, the ’negative zero’, i.e. the combination of a ’01’ for the sign bit (negative) and ’00’ for all remaining bits.

What value could this bit combination represent?

It’s the NA - the missing value:


rev(intToBits(NA))

## [1] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00


integer - A small example I

For visualization purposes, we look at 4 bits instead of 32 bits in this example - it works the same way and leads to the same result.

Task: Calculate 7 − 6:
At first, note that 7 − 6 = 7 + (−6).
The binary representation of 7 is ’0111’, while it is ’0110’ for 6.
Now, determine the two’s complement of 6:
Negating the bits: ’1001’,
Adding 1: ’1001’ + ’0001’ = ’1010’.
Then, calculate:
    0 1 1 1
+   1 0 1 0
  1 1 1       (carries)
-----------
  1 0 0 0 1

integer - A small example II

Now, we have 5 bits instead of the permitted 4 bits. Therefore, the first bit gets deleted (actually, it is not even calculated by the computer).
Consequently, the result of the addition is ’0001’.
Thus, our result has a positive sign and the binary representation ’001’, which is a 1 in decimal representation.
As such, the result of 7 − 6 is +1.
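The 4-bit example can be replayed in R by simulating 4-bit words with integers between 0 and 15. This is only a sketch; the helper names twos_complement4 and add4 are hypothetical and not part of R:

```r
# Two's complement in 4 bits: flip all four bits (XOR with 1111 = 15), then add 1
twos_complement4 <- function(x) (bitwXor(x, 15L) + 1L) %% 16L
# 4-bit addition: the fifth (carry) bit is dropped via modulo 16
add4 <- function(a, b) (a + b) %% 16L

neg6 <- twos_complement4(6L)   # 10, i.e. '1010'
add4(7L, neg6)                 # 1,  i.e. '0001'
```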


Data types for letters - character

For expressing (character) strings, R offers the character data type.
Unlike in some other programming languages, a single character element can contain multiple characters.
Character strings can be generated with either single (') or double (") quotation marks in R:

"a"
## [1] "a"

'a'
## [1] "a"

To map bit strings to characters, encoding tables are used. There are many different standard tables - a common standard is UTF-8.
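A short sketch of the difference between the number of elements and the number of characters, and of the mapping between characters and code points. The example strings are arbitrary:

```r
x <- "data"
length(x)               # 1: one element
nchar(x)                # 4: four characters in that element
utf8ToInt("a")          # 97: Unicode code point of 'a'
intToUtf8(c(72, 105))   # "Hi": code points back to a string
```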


Data types for logical values - logical


The data type logical can take three values: TRUE, FALSE and NA.
While TRUE and FALSE are rather self-explanatory, NA denotes a missing value.
TRUE, FALSE and NA are reserved words in R, i.e. their values cannot be overwritten.
This is, however, not the case for the potentially familiar variables T and F: These do not inherently denote TRUE and FALSE. Instead, they are only variables initialized with these values and as such, T and F can be overwritten:

TRUE <- 5
## Error in TRUE <- 5: invalid (do_set) left-hand side to assignment

T <- 5

That’s why TRUE and FALSE should be used instead of T and F!
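To illustrate the risk, the following sketch overwrites T inside local(), so the change does not leak into the rest of the session:

```r
res <- local({
  T <- 0                    # perfectly legal: T is just a variable
  if (T) "yes" else "no"    # the condition is now false
})
res                         # "no"
isTRUE(TRUE)                # TRUE: the reserved word is unaffected
```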



Data type for complex numbers - complex

The data type complex is used to express numbers in the complex plane.
Internally, this data type consists of a pair of doubles, one for the real and one for the imaginary part.
As such, floating-point arithmetic applies to complex numbers in R as well.
Complex numbers can be generated with an i:

3i
## [1] 0+3i

7 + 2i
## [1] 7+2i


Data type for bytes - raw

The data type raw is intended to contain ’raw’ bytes.
Each element of a raw vector is one byte, displayed as two hexadecimal digits. As such, numbers between 0 and 255 can be expressed as a raw.

as.raw(0)
## [1] 00

as.raw(40)
## [1] 28

as.raw(255)
## [1] ff

as.raw(256)
## Warning: out-of-range values treated as 0 in coercion to raw
## [1] 00
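A typical use of raw vectors is looking at the bytes behind a string. A minimal sketch with base R functions; the string "R!" is an arbitrary ASCII example:

```r
r <- charToRaw("R!")   # the bytes of the string
r                      # 52 21 (hexadecimal)
as.integer(r)          # 82 33 (decimal)
rawToChar(r)           # "R!": bytes back to a string
```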



Data type memory requirements


Memory requirement of a single element

Up until now, we have not really talked about the memory requirements of
the individual data types. The only thing we know so far is that a double
ideally only requires 64 bits = 8 bytes and an integer 32 bits = 4 bytes.

object.size(raw(1))
## 56 bytes

object.size(logical(1))
## 56 bytes

object.size(character(1))
## 112 bytes

object.size(integer(1))
## 56 bytes

object.size(double(1))
## 56 bytes

object.size(complex(1))
## 64 bytes

Then why does the reality in R differ from our expectations?


Memory requirement of an empty vector

Recalling Paradigm 2: These data types are vectors in R. Generating a vector causes overhead, and even vectors of length 0 already require memory.

object.size(raw(0))
## 48 bytes

object.size(logical(0))
## 48 bytes

object.size(character(0))
## 48 bytes

object.size(integer(0))
## 48 bytes

object.size(double(0))
## 48 bytes

object.size(complex(0))
## 48 bytes


Memory requirement of a single element - Findings

The difference between an empty double vector and a double vector with a single element is exactly the expected 8 bytes of memory space.
The difference between an empty complex vector and a complex vector with a single element is 16 bytes. This corresponds to two doubles.
It’s remarkable that raws and logicals require 8 bytes as well. From a logical point of view, these data types should require less memory.
Even generating one integer requires 8 bytes when it should only require 4 bytes.
The most memory space is needed for generating a character element - 64 bytes.
Do these findings hold when generating larger vectors as well?


Memory requirements of multiple elements I

library(xtable)

# Choose all vector sizes from 0 to 1000
size <- 0:1000

# Determine the respective memory requirement of the six data types
bytes <- sapply(size, function(x)
  c(double = object.size(double(x)), integer = object.size(integer(x)),
    character = object.size(character(x)), logical = object.size(logical(x)),
    complex = object.size(complex(x)), raw = object.size(raw(x))))

# Fit the linear model memory = intercept + beta * size
tab <- apply(bytes, 1, function(x) lm(x ~ size)$coefficients)

            double  integer  character  logical  complex    raw
Intercept    48.97    51.92     104.75    51.92    48.38  58.49
beta          8.00     4.00       8.00     4.00    16.00   0.99


Memory requirements of multiple elements II


Findings:
The slope parameters reveal the expected result: a double requires 8 bytes, a complex 16 bytes and an integer 4 bytes.
A raw vector apparently does require only one byte per element. But as already seen, the first raw element needs 8 bytes. How are these findings compatible?
The intercept of the character is well beyond the 48 bytes required for an empty vector. Here’s why:
An empty vector needs (just like for the other data types) 48 bytes.
Every distinct string produces another 56 bytes of overhead (one-time storage of the string); strings with more than 7 characters need more.
Each element requires an additional 8 bytes (for a pointer referencing the string in memory).
This explains the 112 bytes for a single element: 48 bytes + 56 bytes + 8 bytes.

Memory requirements of multiple elements III

Examples:
An empty string (only the terminating character):
object.size("")
## 112 bytes
# 48 bytes + 56 bytes + 8 bytes

A string with 8 characters (7 letters and the terminating character):
object.size("abcdefg")
## 112 bytes
# 48 bytes + 56 bytes + 8 bytes

A string with 9 characters (8 letters and the terminating character):
object.size("abcdefgh")
## 120 bytes
# 48 bytes + 56 bytes + 2 * 8 bytes


Memory requirements of multiple elements IV

Two strings with the same content:
object.size(c("a", "a"))
## 120 bytes
# 48 bytes + 56 bytes + 2 * 8 bytes

Two strings with different content:
object.size(c("a", "b"))
## 176 bytes
# 48 bytes + 2 * 56 bytes + 2 * 8 bytes

The linear model above (and the following figure) only shows a slope of 8 bytes per element. This is because the command character(x) returns a vector containing the same (empty) string x times.
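The effect of repeated versus distinct strings can be seen on longer vectors too. The sketch below deliberately avoids exact byte counts, since these depend on the platform; the vector names are arbitrary:

```r
same     <- rep("a", 100)          # 100 elements, one distinct string
distinct <- as.character(1:100)    # 100 elements, 100 distinct strings
object.size(same)
object.size(distinct)
object.size(same) < object.size(distinct)   # TRUE: each distinct string is stored separately
```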


Memory requirements of multiple elements V

Graphical illustration of the memory requirements for small and mid-sized vectors:

suppressWarnings(library(ggplot2))

# create a data frame to be used by ggplot
data_bytes <- data.frame(size = rep(0:45, each = 6),
  type = rep(rownames(bytes), 46), bytes = as.vector(bytes[, 1:46]))

# create ggplot figure
ggplot(data_bytes, aes(x = size, y = bytes, color = type)) +
  geom_line(size = 2) + xlab("vector length") +
  theme(axis.text = element_text(size = rel(2)),
        axis.title = element_text(size = rel(2)),
        legend.text = element_text(size = rel(2)),
        legend.title = element_text(size = rel(2)))


Memory requirements of multiple elements VI


[Figure: line plot of the memory requirement in bytes (y-axis) against the vector length (x-axis, 0 to 45), with one colored line per type: character, complex, double, integer, logical, raw.]


Memory requirements of multiple elements VII

Findings:
The memory requirement often does not grow perfectly linearly. It sometimes grows in jumps or stays constant over several lengths.
At least 8 bytes are always allocated at once. Thus, the one byte per raw element is really just an average value.
The memory requirement of the integer appears to be missing, but its line is completely covered by the line for logicals. They both have exactly the same memory requirements:
all(bytes["integer", ] == bytes["logical", ])
## [1] TRUE

For integers and logicals, only 4 bytes are needed per element on average as well, but at least 8 bytes are allocated at a time. This explains the slightly wiggly graph.



Hierarchy of data types


Hierarchy - what’s that?

Paradigm 3
A vector can only ever contain elements of a single data type.

It would seem reasonable for R to throw an error when trying to combine different data types into one vector.
However:
c(raw(1), FALSE, 2L, 3, 7+3i, "5")
## [1] "00" "FALSE" "2" "3" "7+3i" "5"

So the question becomes: When does a specific data type get coerced into another?
Caution: Coercion in R happens silently! This is potentially dangerous!


A small quiz for the meantime


Which data type is returned by the following inputs?

c(3, 4L)
c("abc", 3+6i)
c("efg", raw(1))
c(FALSE, 3L)
c(raw(1), TRUE)
c(3, 3+1i)

Solution:

typeof(c(3, 4L))
## [1] "double"

typeof(c("abc", 3+6i))
## [1] "character"

typeof(c("efg", raw(1)))
## [1] "character"

typeof(c(FALSE, 3L))
## [1] "integer"

typeof(c(raw(1), TRUE))
## [1] "logical"

typeof(c(3, 3+1i))
## [1] "complex"


Hierarchy of data types

We have determined the following ’hierarchy’ of data types:

raw → logical → integer → double → complex → character

Beware! When converting back and forth, the result is not always the same
as the original input. Example:
as.double(as.logical(5))

## [1] 1

The function as.logical() is particularly dangerous! Every number different from 0 is interpreted as TRUE; however, when converting TRUE to a double or an integer, it is always interpreted as 1!
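A small sketch of such a lossy round trip, with a second example of information loss when coercing downwards in the hierarchy. The input values are arbitrary:

```r
x <- c(0, 1, 5, -2.5)
back <- as.double(as.logical(x))
back                  # 0 1 1 1
identical(x, back)    # FALSE: information was lost on the way
as.integer(3.9)       # 3: the fractional part is cut off, not rounded
```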



Operators for basic data types


Operators - integer, double, complex I


The following unary and binary operators for numerical values are implemented in R:

Operator   Meaning
+x         Positive value of x
-x         Change sign of x
x + y      Addition
x - y      Subtraction
x * y      Multiplication
x / y      Division
x^y        Exponentiation
x %% y     Modulo division
x %/% y    Integer division

This probably comes as no surprise. That’s why we now take a look at the characteristics of these operators:

Operators - integer, double, complex II

For doubles, +0 and -0 exist internally.
The operator ** can also be used for exponentiation.
Exponentiation with a negative base and a non-integer exponent is not possible for real numbers in R:
(-8)^(1/3)
## [1] NaN

When using the function round(), values ending in exactly ’.5’ are rounded to the nearest even number:

round(0.5)
## [1] 0

round(1.5)
## [1] 2

round(2.5)
## [1] 2

round(3.5)
## [1] 4
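Two of these points can be made visible with a few lines. The signed zeros only show up when dividing by them, and a real cube root of a negative number has to be computed by hand. The helper cube_root below is a hypothetical name introduced for this sketch, not a base R function:

```r
1 / 0      # Inf
1 / -0     # -Inf: the sign of zero matters here
0 == -0    # TRUE: yet both zeros compare as equal

# Real cube root that also works for negative inputs
cube_root <- function(x) sign(x) * abs(x)^(1/3)
cube_root(-8)   # approximately -2
```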


Operators - raw, logical, (numeric) I

The following operators for logical values (and values that can be interpreted as such) are available:

Operator    Meaning
!x          Vectorized negation
x & y       Vectorized AND
x && y      Scalar AND
x | y       Vectorized OR
x || y      Scalar OR
xor(x, y)   Vectorized XOR

Again, this should come as no surprise. Let’s continue with their characteristics:


Operators - raw, logical, (numeric) II

For raw elements, the negation corresponds to the ones’ complement (negation of all bits). In this case: !x = 255 - x

as.numeric(!as.raw(40))
## [1] 215

as.numeric(!as.raw(128))
## [1] 127

The operators &, | and xor() operate bitwise on raws. This can lead to the following results:

as.numeric(as.raw(40) & as.raw(38))
## [1] 32

as.numeric(as.raw(40) & as.raw(41))
## [1] 40
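Writing out the bit patterns makes the bitwise AND easier to follow. The helper bits below is a hypothetical convenience function for this sketch; it relies on the base R function rawToBits(), which returns the bits with the least significant bit first:

```r
# Bit pattern of a number between 0 and 255, most significant bit first
bits <- function(x) paste(rev(as.integer(rawToBits(as.raw(x)))), collapse = "")

bits(40)                                      # "00101000"
bits(38)                                      # "00100110"
bits(as.integer(as.raw(40) & as.raw(38)))     # "00100000", i.e. 32
```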


Operators - raw, logical, (numeric) III


When using NAs, interesting results can occur:

TRUE & NA
## [1] NA

FALSE & NA
## [1] FALSE

In the first case, it is not clear whether the expression is TRUE or FALSE, hence NA. In the second case, an AND operation with FALSE always yields FALSE (more on this later).
As mentioned above, numeric values can be interpreted as logical values as well. In these cases, only 0 is interpreted as FALSE and everything else as TRUE:

!5
## [1] FALSE

1 | 2
## [1] TRUE
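The difference between the vectorized and the scalar operators can be sketched as follows. The scalar forms evaluate their right-hand side only if it is still needed, which is why the calls to stop() below are never reached:

```r
c(TRUE, FALSE) & c(TRUE, TRUE)   # TRUE FALSE: elementwise
TRUE && FALSE                    # FALSE: single values only
FALSE && stop("not reached")     # FALSE: right side is not evaluated
TRUE || stop("not reached")      # TRUE: right side is not evaluated
```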


Operators - characters I
There is an order on characters as well, which can be queried using <, > and ==:

"1" < "2"
## [1] TRUE

"b" < "a"
## [1] FALSE

Another small quiz:
The command sort() sorts a vector (character vectors as well) in ascending order. What is the result of this call?
sort(c("a", "10", "1", "A", "2", "aA", "b", "ä", "ae"))

Solution:
sort(c("a", "10", "1", "A", "2", "aA", "b", "ä", "ae"))
## [1] "1" "10" "2" "a" "A" "ä" "aA" "ae" "b"


Operators - characters II

This result might be somewhat surprising. Findings and questions:
’Numbers’ come before letters.
’a’ comes before ’b’.
’1’ < ’2’.
’10’ is ’smaller’ than ’2’.
⇒ Alphabetical sorting where numbers come before letters and comparisons occur character by character. This is why ’10’ is smaller than ’2’: ’1’ < ’2’. However:
How does the order ’a’, ’A’, ’ä’, ’aA’, ’ae’ come about?
How does capitalization affect the sorting order?
What about accents and other special characters?


Operators - characters III

The answer is: R sorts according to the collation rules of the current locale. In typical UTF-8 locales these follow the Unicode Collation Algorithm [1], so the result can differ between systems.
At first, the input is sorted alphabetically without consideration of capitalization, accents, etc. It holds that:
Numbers come before letters,
When beginning with the same letter combination, short words come before long words.
After that, accents and other special characters are sorted: ’a’ comes before ’ä’, ’o’ before ’ö’ and ’u’ before ’ü’. Letters without accents come before letters with accents.
Next, the use of lower and upper case is considered. Lower case letters come before upper case letters.
There are further rules for special characters etc. which you can find at the link below.

[1] https://en.wikipedia.org/wiki/Unicode_collation_algorithm
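Because the collation depends on the locale, the same call can give different orders on different machines. As a sketch of a locale-independent alternative, sort() with method = "radix" orders character vectors as in the C locale, i.e. by byte value, so upper case letters come before lower case ones. The input vector is an arbitrary example:

```r
x <- c("b", "A", "a", "10", "2")
sort(x)                       # order depends on the current locale
sort(x, method = "radix")     # "10" "2" "A" "a" "b": byte order
Sys.getlocale("LC_COLLATE")   # the locale used for collation
```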
NAs, NULL, Inf, etc.


The NAs
The NAs? But there’s just one NA, isn’t there? Nope! There are five NAs in total, one for each data type (except for raw). Plain NA is the logical NA:
typeof(NA)
## [1] "logical"

Additionally, there are:

typeof(NA_character_)
## [1] "character"

typeof(NA_complex_)
## [1] "complex"

typeof(NA_integer_)
## [1] "integer"

typeof(NA_real_)
## [1] "double"

The five NAs all behave the same way. They just belong to different data types.

Operators on NAs I

Checking for NAs is not possible with the ’==’ operator:

x <- NA
x == NA
## [1] NA

x <- 5
x == NA
## [1] NA

Instead, checks for NAs have to be performed with the is.na() function:

x <- NA
is.na(x)
## [1] TRUE

x <- 5
is.na(x)
## [1] FALSE


Operators on NAs II

In general:
’Calculations’ with NAs always lead to NAs.

Sole exception:
If the result is the same for any value the NA could take, then this result is
returned.

That’s why FALSE & NA also leads to FALSE.
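A few further cases of the rule and its exception, as a short sketch:

```r
NA + 1       # NA: the result depends on the missing value
NA & TRUE    # NA: could be TRUE or FALSE
NA & FALSE   # FALSE: FALSE for either possible value of NA
NA | TRUE    # TRUE: TRUE for either possible value of NA
NA^0         # 1: any number to the power 0 is 1
```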


The empty set - NULL


The value NULL represents the empty set. It is a data type of its own:
typeof(NULL)
## [1] "NULL"

The ’==’ operator does not work for NULL either:

NULL == NULL
## logical(0)

5 == NULL
## logical(0)

Instead, checks for NULL must be performed using the function is.null():

is.null(NULL)
## [1] TRUE

is.null(5)
## [1] FALSE
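The result logical(0) follows from NULL having length 0: a comparison over zero elements yields zero results. A minimal sketch, with x as an arbitrary variable name:

```r
x <- NULL
length(x)        # 0
length(x == 5)   # 0: comparing zero elements gives zero results
res <- if (is.null(x)) "nothing there" else "something"
res              # "nothing there"
```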


Numeric constants - NaN and Inf I

Inf (Infinity) and NaN (Not a Number) are numeric constants:

typeof(Inf)
## [1] "double"

typeof(NaN)
## [1] "double"

Aside from Inf (+∞), there also is -Inf (−∞). NaN is, for example, the result of 0 / 0 or ∞ − ∞:

0 / 0
## [1] NaN

Inf - Inf
## [1] NaN


Numeric constants - NaN and Inf II


It is possible to check for NaNs using the function is.nan(). However, is.na() also returns TRUE for a NaN:

is.nan(NaN)
## [1] TRUE

is.na(NaN)
## [1] TRUE

To check more generally that a value is neither NA, NaN nor ±Inf, the function is.finite() is available:

is.finite(NaN)
## [1] FALSE

is.finite(NA)
## [1] FALSE

is.finite(Inf)
## [1] FALSE

is.finite(5)
## [1] TRUE
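Since these functions are vectorized, they can be combined to tell the special values apart. A small sketch on an arbitrary example vector:

```r
x <- c(1, NA, NaN, Inf, -Inf)
is.na(x)                # FALSE  TRUE  TRUE FALSE FALSE
is.nan(x)               # FALSE FALSE  TRUE FALSE FALSE
is.na(x) & !is.nan(x)   # FALSE  TRUE FALSE FALSE FALSE: 'real' NAs only
is.finite(x)            #  TRUE FALSE FALSE FALSE FALSE
```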



Overview


Overview of all data types I

The six (or seven when counting NULL) data types described form the basis of R. There are, however, further data types in R which have not been discussed here. For completeness’ sake, we’ll show them now in brief.

Data type     Description                    Comment
symbol        Variable name                  E.g. input for functions
pairlist      Paired list                    Mostly for internal use
closure       Function object
environment   Environment                    See Chapter 2.3
promise       Object for ’lazy evaluation’
language      ’language’ object              E.g. formula
special       Internal function              Does not evaluate its arguments
builtin       Internal function              Does evaluate its arguments


Overview of all data types II

Data type     Description                      Comment
char          ’scalar’ character object        For internal use only
...           Special variable arguments
any           Data type that fits every type   No objects of this type exist
expression    ’expression’ object
list          List                             See Chapter 2.2
bytecode      Byte code                        For internal use only
externalptr   External pointer
weakref       ’weak’ reference object
S4            S4 object
