Advanced R

Chapter 2.1: Data types

Daniel Horn & Sheila Görz

Summer Semester 2022

Daniel Horn & Sheila Görz Advanced R Summer Semester 2022 1 / 62

Our plan for today

1 The six basic data types

2 Characteristics of the six basic data types

3 Data type memory requirements

4 Hierarchy of data types

5 Operators for basic data types

6 NAs, NULL, Inf, etc.

7 Overview


Data type – what’s that?

What we have learnt in Chapter 1.1:
A computer’s memory consists of binary digits (bits, 0 or 1). Eight bits are grouped into one byte.
The memory contains programs as well as data. It is up to the given application to interpret these bit strings correctly.
However, users generally don’t want to work with bit strings. Instead, they prefer more comprehensible representations such as (natural) numbers in base 10.

Data type
A data type states how the contents of the binary memory are to be interpreted. In particular, it also specifies how (arithmetic) operations are defined.


Data types in R

There are 24 different data types in R. While some are well known (e.g. logical or double), others appear (almost) exclusively in R’s internals and are thus less familiar (e.g. promise and symbol).
A complete list can be found on the help page for typeof().
Only six of these 24 data types contain actual data and are intended for us to use. We use four of them constantly. This chapter is mostly about these six data types.
An overview of the remaining types is provided at the end of the chapter.
Over the course of the semester we will encounter (almost) all of the 24 data types in more or less detail.


Data types and objects

The terms data type and object are closely related:

Data types describe how the content of a single memory cell is to be
interpreted.
An object consists of multiple memory cells (each of which has a data
type). Objects are sometimes called complex or composite data types.
From now on: if we are talking about objects, we are thinking of a set of
memory cells with a data type which can be assigned to a variable in R.
Regarding this, the following holds in R:
Paradigm 1
Everything in R is an object.


The six basic data types


What are the six basic data types in R?

The six basic data types in R are:

double logical
integer complex
character raw

We will now take a closer look at each of these data types. In particular, we
are interested in answering the following questions:
What is the purpose of the respective data type?
How much memory do single or multiple components of a specific
data type require?
How do we generate elements of a specific data type?


General remarks - All of them are vectors

Paradigm 2
The six basic data types are always vectors. Along with lists (see Chapter
2.2) and expressions, they constitute the vector data types.

Consequently, ...
... functions and commands for vectors (e.g. length()) can be
applied to all vector objects.
... there are no scalars in R. A single digit or a single letter is always a
vector of length 1.


General remarks - Generation

Every basic data type can be generated in R by calling the function of the same name.

To do so, enter the name of the type followed by the desired number of elements in parentheses.
This returns a vector of the requested data type and length, filled with the ’null element’ of the respective data type.

double(1)
## [1] 0

integer(1)
## [1] 0

character(1)
## [1] ""

logical(1)
## [1] FALSE

complex(1)
## [1] 0+0i

raw(1)
## [1] 00


Characteristics of the six basic data types


Data types for numbers - double

double denotes a ’double-precision floating-point number’ and is used to represent real numbers.
Problem: Real numbers can have infinitely many decimal places, yet a computer can only store a finite number of digits. How is this possible?
We now take a short excursion into floating-point arithmetic. For a more detailed discussion of this topic, refer to the bachelor course ’Computergestützte Statistik’.


Definition: Floating-point numbers

In general, we consider numbers with a base $b$, an excess (exponent bias) $q$ and a mantissa length (number of significant digits) $p$.
With these, a number is expressed through the value pair $(e, f)$ such that

$$(e, f) \mapsto f \times b^{e-q},$$

where $e$ is a natural number from a specified range of values and $f$ is a number strictly between $-1$ and $+1$, i.e. $|f| < 1$.


Single-precision floating-point format

Single-precision floating-point numbers typically take the following form:

± eeeeeee ffffffff ffffffff ffffffff,

i.e. 1 bit for the sign, 7 for the exponent and 24 for the mantissa, which totals 4 bytes.
The computer uses the base $b = 2$.
The largest binary number representable with seven bits is $2^7 - 1 = 127$, so $0 \le e < e_{\sup} = 128$.
A common choice for the excess is $q = \lfloor (e_{\sup} - 1)/2 \rfloor = 63$.
⇒ The absolute value of the smallest and largest representable number is smaller than

$$1 \cdot 2^{127-63} = 2^{64} \approx 1.8 \cdot 10^{19}.$$

⇒ The precision (in decimal digits) of these numbers is

$$\log_{10}(2^{24}) \approx 7.$$

Double-precision floating-point format

For today’s applications, single precision often does not suffice with respect to both range and precision. Therefore, double-precision floating-point numbers are usually used. R only supports the latter.
There are 11 bits available for the exponent, 52 bits for the mantissa and 1 bit for the sign.
The base is $b = 2$.
Thus, it holds for the exponent that $0 \le e < e_{\sup} = 2^{11} = 2048$.
A possible choice for the excess is $q = \lfloor (e_{\sup} - 1)/2 \rfloor = 1023$.
⇒ The absolute value of the smallest and largest representable number is smaller than

$$1 \cdot 2^{2047-1023} = 2^{1024} \approx 1.8 \cdot 10^{308}.$$

⇒ The precision (in decimal digits) of these numbers is

$$\log_{10}(2^{52}) \approx 16.$$
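The limits derived for double precision can be compared with the constants that R reports in `.Machine`. This is only a minimal sketch: R’s doubles follow the IEEE 754 standard, whose conventions differ slightly from the simplified $(e, f)$ model used here (for instance, one leading mantissa bit is implicit), so the reported numbers match the derivation in order of magnitude rather than bit for bit.

```r
# Largest finite double; it lies just below 2^1024
.Machine$double.xmax

# 2^1024 itself is no longer representable and overflows to Inf
2^1023 * 2

# Number of significand bits (52 stored bits plus one implicit bit)
.Machine$double.digits

# Smallest eps with 1 + eps != 1, and the decimal precision it implies
.Machine$double.eps
-log10(.Machine$double.eps)
```

The last value is roughly 16, in line with the estimate $\log_{10}(2^{52})$.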


Example of a double-precision floating-point number

In general, we have $b = 2 = 10_2$ and $q = 1023 = 1111111111_2$, so that

$$(e, f) \mapsto f \times 10_2^{\,e - 1111111111_2}.$$

4.625 expressed as a double-precision floating-point number:

$$4.625 = 100.101_2$$

Thus:

$$\begin{aligned}
4.625 &= 100.101_2 \cdot 10_2^{0_2} \\
      &= 10.0101_2 \cdot 10_2^{1_2} \\
      &= 1.00101_2 \cdot 10_2^{10_2} \\
      &= 0.100101_2 \cdot 10_2^{11_2} \\
      &= 0.100101_2 \cdot 10_2^{10000000010_2 - 1111111111_2}
\end{aligned}$$

Consequently:
e = 10000000010
and
f = 1001010000000000000000000000000000000000000000000000
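The pair $(e, f)$ above follows the simplified model of this chapter, with $|f| < 1$. The bits that R actually stores differ slightly, because IEEE 754 normalizes the mantissa to the form $1.f$ and does not store the leading 1: $4.625 = 1.00101_2 \cdot 2^2$, so the stored exponent is $2 + 1023 = 1025$. The following sketch extracts the stored bits; the variable names are chosen for illustration only.

```r
# Bytes of the double in big-endian order (most significant byte first)
bytes <- writeBin(4.625, raw(), endian = "big")

# rawToBits() lists the bits of each byte least significant first.
# One matrix column per byte; reversing the rows puts the most
# significant bit of each byte first.
bits <- as.vector(matrix(as.integer(rawToBits(bytes)), nrow = 8)[8:1, ])

sign_bit <- bits[1]
exponent <- bits[2:12]
fraction <- bits[13:64]

paste(exponent, collapse = "")        # stored (biased) exponent
sum(exponent * 2^(10:0)) - 1023       # unbiased exponent
paste(fraction[1:8], collapse = "")   # leading fraction bits
```

The stored exponent is 10000000001 (one less than in the model above) and the fraction starts with 00101, i.e. the digits after the implicit leading 1.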

So, now we know everything about real numbers, right?

0.1 + 0.2 == 0.3
## [1] FALSE

0.1 + 0.2 - 0.3
## [1] 5.551115e-17

Not every real number can be represented in R, because only a finite number of bits is available for the mantissa.
That’s why almost all arithmetic operations with floating-point numbers involve roundoff errors.
Beware: For many numbers with a finite decimal expansion, the binary expansion does not terminate, e.g. 0.1. Thus, even storing such a number already introduces an error.
R (almost) never calculates exactly, but usually almost exactly.
Still, roundoff errors can accumulate over the course of lengthy calculations and lead to questionable results.
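Because of these roundoff errors, doubles are usually compared with a tolerance rather than with `==`. A minimal sketch using base R (the explicit tolerance `1e-9` is an arbitrary choice for illustration):

```r
x <- 0.1 + 0.2

x == 0.3                    # FALSE: the exact comparison fails
isTRUE(all.equal(x, 0.3))   # TRUE: equal up to a small relative tolerance
abs(x - 0.3) < 1e-9         # TRUE: comparison with an explicit tolerance
```

`all.equal()` returns a descriptive character string instead of FALSE when the values differ, which is why it is wrapped in `isTRUE()`.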

Data types for numbers - integer

The data type integer is (unsurprisingly) used to store integers.
Barring a few exceptions, integers in R have to be designated explicitly (e.g. with the suffix L), because R would use doubles otherwise:

str(1)
## num 1

str(1L)
## int 1

str(1:2)
## int [1:2] 1 2

str(numeric(1))
## num 0

Integers behave like doubles (most of the time). Both belong to the numeric type.
So, what’s the use of integers then? They provide advantages w.r.t. speed and storage requirements.

integer - Internal representation

Representation as a bit vector of length 32 = 4 bytes.

Interpretation as a binary number. But wait, what about negative numbers?
Solution: Use 31 bits for the number and 1 bit for its sign.
Note that the largest possible number is $2^{31} - 1 = 2\,147\,483\,647$.
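The upper limit can be checked directly in R. The sketch below also shows what happens when the limit is exceeded; `too_big` is just an illustrative name.

```r
.Machine$integer.max               # 2147483647
.Machine$integer.max == 2^31 - 1   # TRUE

# Integer overflow yields NA (together with a warning)
too_big <- suppressWarnings(.Machine$integer.max + 1L)
is.na(too_big)                     # TRUE

# Adding a double instead switches to double arithmetic, which does not overflow here
typeof(.Machine$integer.max + 1)   # "double"
```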


integer - Expressing negative numbers I

intToBits() returns the binary representation of an integer in reversed order (least significant bit first). R prints a 0 bit as ’00’ and a 1 bit as ’01’:
intToBits(41)

## [1] 01 00 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00

rev(intToBits(2^31-1))

## [1] 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
## [26] 01 01 01 01 01 01 01

rev(intToBits(0))

## [1] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00

The last (first in reversed order) bit specifies the sign. Apparently, ’00’
denotes a positive sign.

integer - Expressing negative numbers II

rev(intToBits(-41))

## [1] 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
## [26] 01 00 01 00 01 01 01

rev(intToBits(-(2^31-1)))

## [1] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 01

What happened here? Why aren’t negative numbers expressed just like
positive numbers only with a ’01’ as their last (first) bit?
R uses the so-called ’two’s complement’ to express negative numbers.


integer - Expressing negative numbers III

How do we get the two’s complement of a number?

Negate all bits (including the sign bit),
Then, add a 1 onto the negated bits.
Why even bother with this?
This simplifies arithmetic operations. For example, the subtraction of
two numbers can be regarded as an addition with a negative number
and as such, we can use our ’algorithm’ for addition by hand.
Zero does not have two representations anymore. There is no ’+0’ and
’-0’.
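The two steps (negate all bits, then add 1) can be reproduced with the bitwise functions of base R. A minimal sketch on R’s own 32-bit integers:

```r
x <- 41L

negated <- bitwNot(x)         # step 1: flip all 32 bits
negated + 1L                  # step 2: add one, which gives -41
identical(negated + 1L, -x)   # TRUE

# Subtraction expressed as an addition with the two's complement
7L + (bitwNot(6L) + 1L)       # 1
```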


integer - Expressing negative numbers IV

So, now we have $2^{31} - 1$ positive numbers, $2^{31} - 1$ negative numbers and a single 0. But there is still one bit combination left unassigned, the ’negative zero’, i.e. the combination of ’01’ for the sign bit (negative) and ’00’ for all remaining bits.

What value could this bit combination represent?


It’s the NA - the missing value:

rev(intToBits(NA))

## [1] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [26] 00 00 00 00 00 00 00


integer - A small example I

For visualization purposes we’re looking at 4 bits instead of 32 bits in this example. It works the same way and leads to the same result.

Task: Calculate 7 − 6:
At first, note that 7 − 6 = 7 + (−6).
The binary representation of 7 is ’0111’ while it’s ’0110’ for 6.
Now, determine the two’s complement of 6:
Negating the bits: ’1001’,
Adding 1: ’1001’ + ’0001’ = ’1010’.
Then, calculate:

      0 1 1 1
  +   1 0 1 0
    1 1 1        (carries)
    ---------
    1 0 0 0 1

integer - A small example II

Now, we have 5 bits instead of the permitted 4 bits. Therefore, the

first bit gets deleted (actually, it is not even calculated by the
computer).
Consequently, the result of the addition is ’0001’.
Thus, our result has a positive sign and the binary representation ’001’
which is a 1 in decimal representation.
As such, the result of 7 − 6 is +1.
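The 4-bit calculation can be simulated in R. The helpers `twos_complement()` and `add_wrapped()` are hypothetical names introduced only for this sketch; they mimic a 4-bit word on top of R’s ordinary integers.

```r
# Hypothetical helper: two's complement of x in a word of `bits` bits
twos_complement <- function(x, bits = 4L) {
  mask <- as.integer(2^bits - 1)         # 15 for 4 bits, i.e. '1111'
  bitwAnd(bitwXor(x, mask) + 1L, mask)   # negate all bits, add 1, keep `bits` bits
}

# Hypothetical helper: addition that discards the carry beyond `bits` bits
add_wrapped <- function(a, b, bits = 4L) {
  (a + b) %% as.integer(2^bits)
}

neg_six <- twos_complement(6L)   # 10, i.e. '1010'
add_wrapped(7L, neg_six)         # 1, i.e. '0001'
```

Discarding the fifth bit corresponds to taking the sum modulo 16, which is what `add_wrapped()` does.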


Data types for letters - character

For expressing (character) strings, R offers the character data type.

Unlike in some other programming languages, a single character element can contain multiple characters.
Character strings can be generated with either single (') or double (") quotation marks in R:

"a"
## [1] "a"

'a'
## [1] "a"

For translating bit strings into characters, encoding tables are used. There are many different standard tables. A common standard is UTF-8.
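The mapping between characters, code points and bytes can be inspected with base R functions. A short sketch; the escape `"\u00e4"` denotes the letter ä and is used here to keep the example independent of the encoding of the script file.

```r
utf8ToInt("a")                    # code point 97
intToUtf8(c(82L, 33L))            # "R!"
charToRaw("a")                    # the single byte 61 (hexadecimal)

# A non-ASCII character occupies more than one byte in UTF-8
a_umlaut <- enc2utf8("\u00e4")
nchar(a_umlaut, type = "chars")   # 1
nchar(a_umlaut, type = "bytes")   # 2
```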


Data types for logical values - logical

The data type logical can take three values: TRUE, FALSE and NA.
While TRUE and FALSE are rather self-explanatory, NA denotes a missing value.
TRUE, FALSE and NA are reserved words in R, i.e. their values cannot be overwritten.
This is, however, not the case for the potentially familiar variables T and F: These do not inherently denote TRUE and FALSE. Instead, they are only variables initialized with these values and as such, T and F can be overwritten:

TRUE <- 5
## Error in TRUE <- 5: invalid (do_set) left-hand side to assignment

T <- 5

That’s why TRUE and FALSE should be used instead of T and F!
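A short sketch of how an overwritten T can silently change results. The variable names `res_bad` and `res_good` are illustrative only.

```r
T <- 0                                         # legal: T is an ordinary variable
isTRUE(T)                                      # FALSE: T no longer means TRUE

res_bad  <- mean(c(1, 2, NA), na.rm = T)       # NA: na.rm is now effectively FALSE
res_good <- mean(c(1, 2, NA), na.rm = TRUE)    # 1.5

rm(T)                                          # remove our variable; the original T is visible again
isTRUE(T)                                      # TRUE
```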


Data type for complex numbers - complex

The data type complex is used to express numbers in the complex plane.
Internally, this data type consists of a pair of doubles, one for the real and one for the imaginary part.
As such, floating-point arithmetic applies to complex numbers in R as well.
Complex numbers can be generated with an i:

3i
## [1] 0+3i

7 + 2i
## [1] 7+2i


Data type for bytes - raw

The data type raw is intended to contain ’raw’ bytes.

A raw element holds one byte, which R prints as two hexadecimal digits. As such, numbers between 0 and 255 can be expressed as a raw.

as.raw(0)
## [1] 00

as.raw(40)
## [1] 28

as.raw(255)
## [1] ff

as.raw(256)
## Warning: out-of-range values treated as 0 in coercion to raw
## [1] 00


Data type memory requirements


Memory requirement of a single element

Up until now, we have not really talked about the memory requirements of
the individual data types. The only thing we know so far is that a double
ideally only requires 64 bits = 8 bytes and an integer 32 bits = 4 bytes.


object.size(raw(1))
## 56 bytes

object.size(logical(1))
## 56 bytes

object.size(integer(1))
## 56 bytes

object.size(double(1))
## 56 bytes

object.size(complex(1))
## 64 bytes

object.size(character(1))
## 112 bytes

Then why does the reality in R differ from our expectations?


Memory requirement of an empty vector

Recalling Paradigm 2: These data types are vectors in R. Generating a vector causes overhead and even vectors of length 0 already require memory.

object.size(raw(0))
## 48 bytes

object.size(logical(0))
## 48 bytes

object.size(integer(0))
## 48 bytes

object.size(double(0))
## 48 bytes

object.size(complex(0))
## 48 bytes

object.size(character(0))
## 48 bytes


Memory requirement of a single element - Findings

The difference between an empty double vector and a double vector

with a single element is exactly the expected 8 bytes of memory space.
The difference between an empty complex vector and a complex
vector with a single element is 16 bytes. This corresponds to two
doubles.
It’s remarkable that raws and logicals require 8 bytes as well. From
a logical point of view, these data types should require less memory.
Even generating one integer requires 8 bytes when it should only
require 4 bytes.
The most memory space is needed for generating a character
element - 64 bytes.
Do these findings hold when generating larger vectors as well?


Memory requirements of multiple elements I

library(xtable)

# Choose all vector sizes from 0 to 1000
size <- 0:1000

# Determine the respective memory requirement of the six data types
bytes <- sapply(size, function(x)
  c(double = object.size(double(x)), integer = object.size(integer(x)),
    character = object.size(character(x)), logical = object.size(logical(x)),
    complex = object.size(complex(x)), raw = object.size(raw(x))))

# Fit the linear model memory = intercept + beta * size for each data type
tab <- apply(bytes, 1, function(x) lm(x ~ size)$coefficients)

           double  integer  character  logical  complex    raw
Intercept   48.97    51.92     104.75    51.92    48.38  58.49
beta         8.00     4.00       8.00     4.00    16.00   0.99


Memory requirements of multiple elements II

Findings:
The slope parameters reveal the expected result: a double requires 8 bytes, a complex 16 bytes and an integer 4 bytes.
A raw vector apparently does require only one byte per element. But as already seen, the first raw element needs 8 bytes. How are these findings compatible?
The intercept for character is well above the 48 bytes required for an empty vector. Here’s why:
An empty vector needs 48 bytes (just like for the other data types).
Every distinct string produces another 56 bytes of overhead (each distinct string is stored only once).
Each element requires an additional 8 bytes for a pointer referencing its string in memory, and the storage of a string grows by 8 bytes for every further 8 characters.
This explains the 112 bytes for a single element: 48 bytes + 56 bytes + 8 bytes.

Memory requirements of multiple elements III

Examples:
An empty string (only the terminating null character):

object.size("")
## 112 bytes
# 48 bytes + 56 bytes + 8 bytes

A string with 8 characters (7 letters and the terminating null character):

object.size("abcdefg")
## 112 bytes
# 48 bytes + 56 bytes + 8 bytes

A string with 9 characters (8 letters and the terminating null character):

object.size("abcdefgh")
## 120 bytes
# 48 bytes + 56 bytes + 2 * 8 bytes


Memory requirements of multiple elements IV

Two strings with the same two characters:

object.size(c("a", "a"))
## 120 bytes
# 48 bytes + 56 bytes + 2 * 8 bytes

Two strings with two different characters:

object.size(c("a", "b"))
## 176 bytes
# 48 bytes + 2 * 56 bytes + 2 * 8 bytes

The linear model above (and the following figure) only shows a slope of 8 bytes per element. This is because the command character(x) returns a vector containing the same (empty) string x times.
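The effect of repeated versus distinct strings can be checked with a small experiment. This is only a sketch; the exact byte counts depend on the platform and R build, so only the comparison between the two vectors matters here.

```r
same     <- rep("a", 100)          # 100 elements, one distinct string
distinct <- as.character(1:100)    # 100 elements, 100 distinct strings

object.size(same)
object.size(distinct)

# Additional cost per element when all strings are identical
# (only the pointers; 8 bytes on a typical 64-bit build)
per_element <- (as.numeric(object.size(rep("a", 200))) -
                as.numeric(object.size(same))) / 100
per_element
```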


Memory requirements of multiple elements V

Graphical illustration of the memory requirements for small and mid-sized vectors:

suppressWarnings(library(ggplot2))

# Create a data frame to be used by ggplot
data_bytes <- data.frame(size = rep(0:45, each = 6),
                         type = rep(rownames(bytes), 46),
                         bytes = as.vector(bytes[, 1:46]))

# Create ggplot figure
ggplot(data_bytes, aes(x = size, y = bytes, color = type)) +
  geom_line(size = 2) + xlab("vector length") +
  theme(axis.text = element_text(size = rel(2)),
        axis.title = element_text(size = rel(2)),
        legend.text = element_text(size = rel(2)),
        legend.title = element_text(size = rel(2)))


Memory requirements of multiple elements VI

[Figure: memory requirement in bytes (y-axis) against vector length (x-axis, 0 to 45), with one line per data type: character, complex, double, integer, logical, raw.]


Memory requirements of multiple elements VII

Findings:
The memory requirement often does not grow perfectly linearly. It sometimes increases in steps or even stays constant.
At least 8 bytes are always allocated at once. Thus, the one byte per raw element is really just an average value.
The line for integer appears to be missing, but it is actually covered completely by the line for logical. Both have exactly the same memory requirements:
all(bytes["integer", ] == bytes["logical", ])
## [1] TRUE

For integers and logicals, too, only 4 bytes are needed on average for a single element, but at least 8 bytes are allocated at a time. This explains the slightly jagged graph.


Hierarchy of data types


Hierarchy - what’s that?

Paradigm 3
A vector can always only contain elements of a single data type.

It would seem reasonable that R throws an error when trying to

combine different data types into one vector.
However:
c(raw(1), FALSE, 2L, 3, 7+3i, "5")
## [1] "00" "FALSE" "2" "3" "7+3i" "5"

So the question becomes: When does a specific data type get coerced
into another?
Caution: The coercion in R occurs tacitly! This can be
potentially dangerous!


A small quiz for the meantime

Which data type is returned by the following inputs?

c(3, 4L) c(FALSE, 3L)

c("abc", 3+6i) c(raw(1), TRUE)
c("efg", raw(1)) c(3, 3+1i)


Solution:

typeof(c(3, 4L))
## [1] "double"

typeof(c("abc", 3+6i))
## [1] "character"

typeof(c("efg", raw(1)))
## [1] "character"

typeof(c(FALSE, 3L))
## [1] "integer"

typeof(c(raw(1), TRUE))
## [1] "logical"

typeof(c(3, 3+1i))
## [1] "complex"


We have determined the following ’hierarchy’ of data types:

raw → logical → integer → double → complex → character

Beware! When converting back and forth, the result is not always the same
as the original input. Example:
as.double(as.logical(5))

## [1] 1

The as.logical() is particularly dangerous! Each number different from

0 is interpreted as a TRUE, however, when converting a TRUE to a double
or an integer it is always interpreted as a 1!
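The hierarchy can be verified by combining each type with its successor, and the loss of information is visible when coercing in the opposite direction. A minimal sketch; `values` is an illustrative name.

```r
values <- list(raw(1), TRUE, 2L, 3, 7+3i, "5")

# Combine each element with its successor and record the resulting type
sapply(seq_len(length(values) - 1), function(i) {
  typeof(c(values[[i]], values[[i + 1]]))
})
# "logical" "integer" "double" "complex" "character"

# Explicit coercion down the hierarchy can lose information
as.integer(3.9)                       # 3: truncated, not rounded
suppressWarnings(as.integer("abc"))   # NA (normally accompanied by a warning)
as.double(as.logical(5))              # 1
```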


Operators for basic data types


Operators - integer, double, complex I

The following unary and binary operators for numerical values are implemented in R:

Operator   Meaning
+x         Positive value of x
-x         Change sign of x
x + y      Addition
x - y      Subtraction
x * y      Multiplication
x / y      Division
x^y        Exponentiation
x %% y     Modulo (remainder of the division)
x %/% y    Integer division

This probably comes as no surprise. That’s why we are now taking a look at the characteristics of these operators:

Operators - integer, double, complex II

For doubles, +0 and -0 exist internally.

The operator ** can be used as an alternative to ^ for exponentiation.
Exponentiation with a negative base and a non-integer exponent is not possible for real numbers in R:
(-8)^(1/3)
## [1] NaN

When using the function round(), values ending in ’.5’ are rounded to the closest even number:

round(0.5)
## [1] 0

round(1.5)
## [1] 2

round(2.5)
## [1] 2

round(3.5)
## [1] 4
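If a root of a negative number is needed, two common workarounds exist. The helper `cube_root()` is a hypothetical function written for this sketch, not part of base R.

```r
# Hypothetical helper: real cube root that also accepts negative inputs
cube_root <- function(x) sign(x) * abs(x)^(1/3)

cube_root(-8)          # -2 (up to rounding error)

# Alternatively, switch to complex arithmetic: this returns the
# principal complex root, approximately 1+1.732051i, not the real root -2
as.complex(-8)^(1/3)
```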


Operators - raw, logical, (numeric) I

The following operators for logical values (and values that can be interpreted as such) are available:

Operator    Meaning
!x          Vectorized negation
x & y       Vectorized AND
x && y      Scalar AND
x | y       Vectorized OR
x || y      Scalar OR
xor(x, y)   Vectorized XOR

Again, this should come as no surprise. Let’s continue with their characteristics:


Operators - raw, logical, (numeric) II

For raw elements, the negation corresponds to the ones’ complement (negation of all bits). In this case: !x = 255 - x

as.numeric(!as.raw(40))
## [1] 215

as.numeric(!as.raw(128))
## [1] 127

The operators &, | and xor() operate bitwise on raws. This can lead to the following results:

as.numeric(as.raw(40) & as.raw(38))
## [1] 32

as.numeric(as.raw(40) & as.raw(41))
## [1] 40
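Looking at the individual bits makes these results easier to follow. The helper `show_bits()` is a hypothetical function defined only for this sketch.

```r
# Hypothetical helper: print a byte with its most significant bit first
show_bits <- function(x) {
  paste(rev(as.integer(rawToBits(as.raw(x)))), collapse = "")
}

show_bits(40)                        # "00101000"
show_bits(38)                        # "00100110"
show_bits(as.raw(40) & as.raw(38))   # "00100000", i.e. 32

# The same operation on integers
bitwAnd(40L, 38L)                    # 32
```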


Operators - raw, logical, (numeric) III

When using NAs, interesting results can occur:

TRUE & NA
## [1] NA

FALSE & NA
## [1] FALSE

In the first case, it is not clear whether the expression is TRUE or FALSE, hence NA. In the second case, a conjunction (AND) with FALSE always yields FALSE (more on this later).
As mentioned above, numeric values can be interpreted as logical values as well. In these cases, only 0 is interpreted as FALSE and everything else as TRUE:

!5
## [1] FALSE

1 | 2
## [1] TRUE


Operators - characters I
There is an order on characters as well, which can be queried using <, > and ==:

"1" < "2"
## [1] TRUE

"b" < "a"
## [1] FALSE

Another small quiz:

The command sort() sorts a vector (character vectors as well) in ascending order. What is the result of this call?
sort(c("a", "10", "1", "A", "2", "aA", "b", "ä", "ae"))


Solution:
sort(c("a", "10", "1", "A", "2", "aA", "b", "ä", "ae"))

## [1] "1" "10" "2" "a" "A" "ä" "aA" "ae" "b"


Operators - characters II

This result might surprise to some degree. Findings and questions:

’numbers’ come before letters.
’a’ comes before ’b’.
’1’ < ’2’.
’10’ is ’smaller’ than ’2’.
⇒ Alphabetical sorting where numbers come before letters and
comparisons occur per character. This is why ’10’ is smaller than ’2’,
because ’1’ < ’2’. However:
How does the order ’a’, ’A’, ’ä’, ’aA’, ’ae’ come about?
How does capitalization affect the sorting order?
What about accents and other special characters?


Operators - characters III

The answer is: R sorts character strings according to the collation rules of the current locale, which typically follow the Unicode Collation Algorithm [1].

At first, the input is sorted alphabetically without consideration of capitalization, accents, etc. It holds that:
Numbers come before letters,
When beginning with the same letter combination, short words come before long words.
After that, accents and other special characters are considered: ’a’ comes before ’ä’, ’o’ before ’ö’ and ’u’ before ’ü’. Letters without accents come before letters with accents.
Next, the use of lower and upper case is considered. Lower case letters come before upper case letters.
There are further rules for special characters etc., which you can find at the link below.

[1] https://en.wikipedia.org/wiki/Unicode_collation_algorithm
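Since the default order depends on the locale, the same call can give different results on different machines. If a reproducible order is needed, one option is `method = "radix"`, which according to the documentation of sort() compares character strings in the C locale, i.e. by their byte values. A minimal sketch (the escape `"\u00e4"` stands for ä):

```r
x <- c("a", "10", "1", "A", "2", "aA", "b", "\u00e4", "ae")

sort(x)                     # order depends on the collation rules of the current locale
sort(x, method = "radix")   # locale-independent order by byte values
# "1" "10" "2" "A" "a" "aA" "ae" "b" "ä"
```

In the byte-wise order, all upper case letters come before all lower case letters, and the non-ASCII ’ä’ comes last.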

NAs, NULL, Inf, etc.


The NAs
The NAs? But there’s just one NA, isn’t there? Nope! There are five NAs in total, one for each basic data type (except for raw). The plain NA is the logical NA:

typeof(NA)
## [1] "logical"

Additionally, there are:

typeof(NA_character_)
## [1] "character"

typeof(NA_integer_)
## [1] "integer"

typeof(NA_complex_)
## [1] "complex"

typeof(NA_real_)
## [1] "double"

The five NAs all behave the same way. They merely belong to different data types.

Operators on NAs I

Checking for NAs is not possible with the ’==’ operator:

x <- NA
x == NA
## [1] NA

x <- 5
x == NA
## [1] NA

Instead, checks for NAs have to be performed with the is.na() function:

x <- NA
is.na(x)
## [1] TRUE

x <- 5
is.na(x)
## [1] FALSE
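Since is.na() is vectorized, it combines naturally with other base R functions for locating, counting and skipping missing values. A short sketch:

```r
x <- c(3, NA, 7, NA)

is.na(x)                # FALSE TRUE FALSE TRUE
anyNA(x)                # TRUE: at least one missing value
which(is.na(x))         # positions 2 and 4
sum(is.na(x))           # 2 missing values
mean(x, na.rm = TRUE)   # 5: missing values are dropped before averaging
```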


Operators on NAs II

In general:
’Calculations’ with NAs always lead to NAs.

Sole exception:
If the result is the same for any value the NA could take, then this result is
returned.

That’s why FALSE & NA also leads to FALSE.
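A few cases that illustrate this rule, as a minimal sketch:

```r
FALSE & NA   # FALSE: the result is FALSE whatever the NA stands for
TRUE | NA    # TRUE: the result is TRUE whatever the NA stands for
TRUE & NA    # NA: the result depends on the missing value
NA^0         # 1: any number to the power 0 is 1
NA * 0       # NA: the missing value could be Inf, and Inf * 0 is NaN
```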


The empty set - NULL

The value NULL represents the empty set. It has a data type of its own:

typeof(NULL)
## [1] "NULL"

The ’==’ operator does not work for NULL either. Any comparison with NULL
returns an empty logical vector:

NULL == NULL
## logical(0)

5 == NULL
## logical(0)

Instead, checks for NULL must be performed using the function is.null():

is.null(NULL)
## [1] TRUE

is.null(5)
## [1] FALSE
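The empty result of a comparison with NULL becomes a problem inside if(), which needs a condition of length one. The sketch below illustrates this; `describe_value()` is a hypothetical helper written for this example and not part of R.

```r
length(NULL)          # NULL has length 0
## [1] 0
c(1, NULL, 2)         # NULL vanishes when combined with other values
## [1] 1 2

x <- NULL
# if() signals an error because the condition x == 5 has length zero
msg <- tryCatch(
  if (x == 5) "five" else "other",
  error = function(e) conditionMessage(e)
)

# Hypothetical helper: check for NULL first, then use the value
describe_value <- function(x) {
  if (is.null(x)) "nothing" else paste("value:", x)
}
describe_value(NULL)
## [1] "nothing"
describe_value(5)
## [1] "value: 5"
```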


Numeric constants - NaN and Inf I

Inf (Infinity) and NaN (Not a Number) are numeric constants:

typeof(Inf)
## [1] "double"

typeof(NaN)
## [1] "double"

Aside from Inf (+∞), there is also -Inf (−∞). NaN is, for example, the
result of 0 / 0 or ∞ − ∞:

0 / 0
## [1] NaN

Inf - Inf
## [1] NaN
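Some further ways in which Inf and NaN arise, as an illustrative sketch using only base R:

```r
1 / 0                       # division by zero gives Inf
## [1] Inf
-1 / 0
## [1] -Inf
.Machine$double.xmax * 2    # overflow of the largest double
## [1] Inf
Inf + 1 == Inf              # Inf absorbs finite summands
## [1] TRUE
Inf * 0                     # undefined, hence NaN
## [1] NaN
Inf / Inf
## [1] NaN

inf_flags <- is.infinite(c(1, Inf, -Inf))
inf_flags
## [1] FALSE  TRUE  TRUE
```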


Numeric constants - NaN and Inf II

It is possible to check for NaNs using the function is.nan(). However,
is.na() also returns TRUE for a NaN:

is.nan(NaN)
## [1] TRUE

is.na(NaN)
## [1] TRUE

To check more generally that a value is none of NA, NaN or ±Inf, the
function is.finite() is available:

is.finite(NaN)
## [1] FALSE

is.finite(Inf)
## [1] FALSE

is.finite(NA)
## [1] FALSE

is.finite(5)
## [1] TRUE
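The four check functions can be compared side by side on one vector. This is a small sketch with a made-up vector `x`; note in particular that is.nan() returns FALSE for NA, although is.na() returns TRUE for NaN.

```r
x <- c(5, NA, NaN, Inf, -Inf)

is.na(x)
## [1] FALSE  TRUE  TRUE FALSE FALSE
is.nan(x)
## [1] FALSE FALSE  TRUE FALSE FALSE
is.infinite(x)
## [1] FALSE FALSE FALSE  TRUE  TRUE
is.finite(x)
## [1]  TRUE FALSE FALSE FALSE FALSE

# Keep only the ordinary numbers
x_finite <- x[is.finite(x)]
x_finite
## [1] 5
```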

Overview

Overview of all data types I

The six (or seven when counting NULL) data types described so far form the
basis of R. There are, however, further data types in R which have not been
discussed here. For completeness’ sake, we’ll show them briefly now.

Data type    Description                   Comment
symbol       Variable name                 E.g. input for functions
pairlist     Paired list                   Mostly for internal use
closure      Function object
environment  Environment                   See Chapter 2.3
promise      Object for ’lazy evaluation’
language     ’language’ object             E.g. formula
special      Internal function             Does not evaluate its arguments
builtin      Internal function             Evaluates its arguments
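Most of these types can be observed directly with typeof(). The following sketch uses well-known base R objects as examples; promises are left out because they cannot be inspected this way from ordinary R code.

```r
typeof(quote(x))                       # a variable name
## [1] "symbol"
typeof(formals(function(a, b) NULL))   # argument list of a function
## [1] "pairlist"
typeof(mean)                           # function written in R
## [1] "closure"
typeof(globalenv())
## [1] "environment"
typeof(quote(x + 1))                   # unevaluated call
## [1] "language"
typeof(y ~ x)                          # formula
## [1] "language"
typeof(`if`)                           # does not evaluate its arguments
## [1] "special"
typeof(sum)                            # evaluates its arguments
## [1] "builtin"
```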

Overview of all data types II

Data type    Description                    Comment
char         ’scalar’ character object      For internal use only
...          Special variable arguments
any          Data type that fits every type No objects of this type exist
expression   ’expression’ object
list         List                           See Chapter 2.2
bytecode     Byte code                      For internal use only
externalptr  External pointer
weakref      ’weak’ reference object
S4           S4 object
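Three of these types are easy to produce in ordinary R code, as the following sketch shows. The S4 class "DemoPerson" and its slot are hypothetical and only defined for this illustration.

```r
library(methods)   # provides setClass() and new()

typeof(list(1, "a"))
## [1] "list"
typeof(expression(x + 1))
## [1] "expression"

# Hypothetical S4 class with a single slot, for illustration only
setClass("DemoPerson", representation(name = "character"))
p <- new("DemoPerson", name = "Ada")
typeof(p)
## [1] "S4"
isS4(p)
## [1] TRUE
```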
