0% found this document useful (0 votes)
98 views9 pages

R Nuts and Bolts

The document provides an overview of basic R data types including vectors, matrices, lists, factors, and data frames. It discusses creating and manipulating objects of each type, setting attributes like names, dimensions, and levels. Important functions covered include c(), vector(), matrix(), list(), factor(), and data.frame(). The document also reviews missing values, coercion between types, and printing/subsetting vector outputs.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
98 views9 pages

R Nuts and Bolts

The document provides an overview of basic R data types including vectors, matrices, lists, factors, and data frames. It discusses creating and manipulating objects of each type, setting attributes like names, dimensions, and levels. Important functions covered include c(), vector(), matrix(), list(), factor(), and data.frame(). The document also reviews missing values, coercion between types, and printing/subsetting vector outputs.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

R

Nuts and Bolts


Entering Input
At the R prompt we type expressions. The <- symbol is the assignment operator.

> x <- 1
> print(x)
[1] 1
>x
[1] 1
> msg <- "hello"

The grammar of the language determines whether an expression is complete or not.

x <- ## Incomplete expression

The # character indicates a comment. Anything to the right of the # (including the #
itself) is ignored. This is the only comment character in R. Unlike some other languages,
R does not support multi-line comments or comment blocks.

Evaluation
When a complete expression is entered at the prompt, it is evaluated and the result of
the evaluated expression is returned. The result may be auto-printed.

> x <- 5 ## nothing printed


>x ## auto-printing occurs [1] 5
> print(x) ## explicit printing [1] 5

The [1] shown in the output indicates that x is a vector and 5 is its first element.

Typically with interactive work, we do not explicitly print objects with the print function;
it is much easier to just auto-print them by typing the name of the object and hitting
return/enter. However, when writing scripts, functions, or longer programs, there is
sometimes a need to explicitly print objects because auto-printing does not work in
those settings.

When an R vector is printed you will notice that an index for the vector is printed in
square brackets [] on the side. For example, see this integer sequence of length 20.

> x <- 10:30 >x

[1] 10 11 12 13 14 15 16 17 18 19 20 21
[13] 22 23 24 25 26 27 28 29 30

The numbers in the square brackets are not part of the vector itself, they are merely
part of the printed output.
R Objects
R has five basic or “atomic” classes of objects:

• character
• numeric (real numbers) • integer
• complex
• logical (True/False)

The most basic type of R object is a vector. Empty vectors can be created with the
vector() function. There is really only one rule about vectors in R, which is that A vector
can only contain objects of the same class.

Creating Vectors
The c() function can be used to create vectors of objects by concatenating things
together.

> x <- c(0.5, 0.6) ## numeric


> x <- c(TRUE, FALSE) ## logical
> x <- c(T, F) ## logical
> x <- c("a", "b", "c") ## character > x <- 9:29 ## integer > x <- c(1+0i, 2+4i) ## complex

You can also use the vector() function to initialize vectors.

> x <- vector("numeric", length = 10)

>x

[1] 0 0 0 0 0 0 0 0 0 0

Mixing Objects
There are occasions when different classes of R objects get mixed together. Sometimes
this happens by accident but it can also happen on purpose. So what happens with the
following code?

> y <- c(1.7, "a") ## character

> y <- c(TRUE, 2) ## numeric

> y <- c("a", TRUE) ## character


Explicit Coercion
Objects can be explicitly coerced from one class to another using the as.* functions, if
available.

> x <- 0:6


> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE

> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"

Matrices
Matrices are vectors with a dimension attribute. The dimension attribute is itself an
integer vector of length 2 (number of rows, number of columns)

> m <- matrix(nrow = 2, ncol = 3) >m

[,1] [,2] [,3]

[1,] NA NA NA

[2,] NA NA NA

> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3

Matrices are constructed column-wise, so entries can be thought of starting in the


“upper left” corner and running down the columns.

> m <- matrix(1:6, nrow = 2, ncol = 3)

>m

[,1] [,2] [,3]


[1,] 1 3 5
[2,] 2 4 6

Matrices can also be created directly from vectors by adding a dimension attribute.

> m <- 1:10


>m

[1] 1 2 3 4 5 6 7 8 9 10

> dim(m) <- c(2, 5)


>m

[,1] [,2] [,3] [,4] [,5]


[1,] 1 3 5 7 9
[2,] 2 4 6 8 10

Matrices can be created by column-binding or row-binding with the cbind() and rbind()
functions.

> x <- 1:3


> y <- 10:12
> cbind(x, y)

X y

[1,] 1 10

[2,] 2 11

[3,] 3 12

> rbind(x, y)

[,1] [,2] [,3]

X 1 2 3

y 10 11 12

Lists
Lists are a special type of vector that can contain elements of different classes. Lists are
a very important data type in R and you should get to know them well. Lists, in
combination with the various “apply” functions discussed later, make for a powerful
combination.

Lists can be explicitly created using the list() function, which takes an arbitrary number
of arguments.

> x <- list(1, "a", TRUE, 1 + 4i)

>x
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]

[1] TRUE

[[4]]
[1] 1+4i

We can also create an empty list of a prespecified length with the vector() function

> x <- vector("list", length = 5)

>x
[[1]]
NULL

[[2]]

NULL

[[3]]

NULL

[[4]]

NULL

[[5]]

NULL

Factors
Factors are used to represent categorical data and can be unordered or ordered. One
can think of a factor as an integer vector where each integer has a label. Factors are
important in statistical modeling and are treated specially by modelling functions like
lm() and glm().

Factor objects can be created with the factor() function.

> x <- factor(c("yes", "yes", "no", "yes", "no"))

>x
[1] yes yes no yes no
Levels: no yes

> table(x)

x
no yes

2. 3
> ## See the underlying representation of factor

> unclass(x)
[1] 2 2 1 2 1
attr(,"levels")
[1] "no" "yes"

Often factors will be automatically created for you when you read a dataset in using a
function like read.table(). Those functions often default to creating factors when they
encounter data that look like characters or strings.

The order of the levels of a factor can be set using the levels argument to factor(). This can
be important in linear modelling because the first level is used as the baseline level.

> x <- factor(c("yes", "yes", "no", "yes", "no"))

> x ## Levels are put in alphabetical order


[1] yes yes no yes no
Levels: no yes

> x <- factor(c("yes", "yes", "no", "yes", "no"), + levels = c("yes", "no"))
>x
[1] yes yes no yes no

Levels: yes no

Missing Values
Missing values are denoted by NA or NaN for q undefined mathematical operations.

• is.na() is used to test objects if they are NA


• is.nan() is used to test for NaN
• NA values have a class also, so there are integer NA, character NA, etc.

• A NaN value is also NA but the converse is not true

> ## Create a vector with NAs in it


> x <- c(1, 2, NA, 10, 3)


> ## Return a logical vector indicating which elements are NA

> is.na(x)
[1] FALSE FALSE TRUE FALSE FALSE


> ## Return a logical vector indicating which elements are NaN
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE

> ## Now create a vector with both NA and NaN values

> x <- c(1, 2, NaN, NA, 4)


> is.na(x)
[1] FALSE FALSE TRUE TRUE FALSE

> is.nan(x)
[1] FALSE FALSE TRUE FALSE FALSE

Data Frames
Data frames are used to store tabular data in R. They are an important type of object in
R and are used in a variety of statistical modeling applications.

Data frames are usually created by reading in a dataset using the read.table() or read.csv().
However, data frames can also be created explicitly with the data.frame() function or they
can be coerced from other types of objects like lists.

> x <- data.frame(foo = 1:4, bar = c(T, T, F, F))

>x

foo bar

1 1 TRUE

2 2 TRUE

3 3 FALSE

4 4 FALSE

> nrow(x)
[1] 4
> ncol(x)
[1] 2

Names
R objects can have names, which is very useful for writing readable code and self-
describing objects. Here is an example of assigning names to an integer vector.
> x <- 1:3
> names(x)
NULL
> names(x) <- c("New York", "Seattle", "Los Angeles") >x


New York Seattle Los Angeles

1 2 3

> names(x)

[1] "New York" "Seattle" "Los Angeles"

Lists can also have names, which is often very useful.

> x <- list("Los Angeles" = 1, Boston = 2, London = 3)

>x
$`Los Angeles`
[1] 1

$Boston [1] 2

$London
[1] 3
> names(x)
[1] "Los Angeles" "Boston" "London"

Matrices can have both column and row names.

> m <- matrix(1:4, nrow = 2, ncol = 2)


> dimnames(m) <- list(c("a", "b"), c("c", "d")) >m

c d

a 1 3

b 2 4

Column names and row names can be set separately using the colnames() and rownames()
functions.

> colnames(m) <- c("h", "f")

> rownames(m) <- c("x", "z")

>m

h f

x 1 3
z 2 4

Summary
Set column names

names()
colnames()

Set row names

row.names()
rownames()

Summary

There are a variety of different builtin-data types in R. In this chapter we have reviewed
the following

• atomic classes: numeric, logical, character, integer, complex • vectors, lists


• factors
• missing values

• data frames and matrices

All R objects can have attributes that help to describe what is in the object. Perhaps the
most useful attribute is names, such as column and row names in a data frame, or
simply names in a vector or list. Attributes like dimensions are also important as they
can modify the behavior of objects, like turning a vector into a matrix.

You might also like