R - Tutorial: Matrices Are Vectors

This document provides an overview of functions and operations in R for creating and manipulating vectors, matrices, data frames, and factors. Some key points covered include: 1) Functions for creating vectors from sequences or by repeating values, as well as extracting elements and subsetting vectors. Matrix creation and subsetting is also demonstrated. 2) Reading in data from files like CSV and TXT, attaching and detaching data frames, and merging or binding data together. 3) Working with factors to represent categorical data and cut to bin continuous variables. Applying functions across data using apply family functions like lapply and tapply. 4) Additional vector/matrix operations like recycling values to match lengths, and

R – Tutorial

Creating a vector:
z <- c(1,3,4) or z <- c("f","k","l") – do not mix numbers with strings: R coerces all elements to character!

Creating a sequence:
z <- 2:7

z <- seq(from=1,to=13,by=4)

Repeating a number/string several times:


z <- rep("r", times=6)

Repeating a vector:
z <- rep(c(1,3,4),times=3)

Repeating a sequence:
z <- rep(seq(from=1,to=11,by=2),times=3)
z <- rep(2:4, times=5)
z <- rep(c(5,12,13),each=2)

Operations with sequences and vectors:


> z+10
[1] 11 15 19 23

- if 2 vectors are of the same length, we can add, subtract, multiply, etc., them element-wise
> x+y #x <- c(1,2,3,4) y <- c(10,13,16,19)
[1] 11 15 19 23
- : has higher precedence than * and +
> z <- 1:5*2 # returns 2,4..10

> table(c(1,1,2,2)) -> shows the number of occurrences of each element


Extracting elements from vectors:
> x[3] – extract the third(!) element
> x[-3] – extract all elements except the third
> x[1:3] – extract elements 1,2,3
> x[-2:-4] – extract all elements except 2 to 4 (same as x[-(2:4)])
> x[c(1,5)] – extract el. 1 and 5 # duplicates are allowed
> x[x<4] – all el. less than 4
> x[-length(x)] # all elements w/o the last one
> identical(x,y) -> tests two vectors for equality
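
The indexing rules above can be tried on a small vector. This is a minimal sketch; the values in x are made up for illustration.

```r
x <- c(10, 20, 30, 40, 50)     # illustrative values, not from the notes

third      <- x[3]             # third element (R indexes from 1)
no_third   <- x[-3]            # everything except the third element
outer_only <- x[-(2:4)]        # drops elements 2 to 4, same as x[-2:-4]
repeated   <- x[c(1, 5, 5)]    # positions may repeat
small      <- x[x < 35]        # logical filter
no_last    <- x[-length(x)]    # everything except the last element
```

Negative indices always exclude, so x[-2:-4] leaves only the first and last element here.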

Matrices are vectors


Creating a matrix:
> matrix(c(1,2,3,4,5,6),nrow=3,byrow=TRUE)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
nrow – number of rows
byrow – byrow=TRUE fills the matrix row by row (the default fills column by column)

Extracting elements from a matrix:


> mat[row, column]
> mat[c(1,3),2] – extract the elements in rows 1 and 3 of column 2
> mat[3,] – extract all elements in row 3
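
Because a matrix is a vector with dimensions, both matrix-style and vector-style indexing work. A short sketch, using the 3 x 2 matrix from the creation example:

```r
mat <- matrix(1:6, nrow = 3, byrow = TRUE)   # rows: (1,2), (3,4), (5,6)

col2_rows13 <- mat[c(1, 3), 2]          # rows 1 and 3 of column 2
row3_vec    <- mat[3, ]                 # row 3, returned as a plain vector
row3_mat    <- mat[3, , drop = FALSE]   # row 3, kept as a 1 x 2 matrix
fifth       <- mat[5]                   # vector-style index; storage is column by column
```

The last line returns 4, because internally the elements are stored as 1, 3, 5, 2, 4, 6.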

Arrays – multidimensional matrices


tests <- array(data=c(firsttest,secondtest),dim=c(3,2,2))
R displays the data layer by layer

Data frame <-> matrix is like list <-> vector: the columns may have different types


> complete.cases(d4) -> rows, which do not have NA values
[1] TRUE FALSE TRUE FALSE
> d5 <- d4[complete.cases(d4),] -> filter incomplete cases(rows) out
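
A small sketch of the filtering step. The data frame d4 below is hypothetical and only chosen so that rows 2 and 4 contain an NA.

```r
# Hypothetical data: rows 2 and 4 each have one missing value
d4 <- data.frame(name  = c("Ann", "Bob", "Cy", "Di"),
                 age   = c(21, NA, 30, 25),
                 score = c(1.3, 2.0, 2.7, NA))

keep <- complete.cases(d4)   # TRUE for rows without any NA
d5   <- d4[keep, ]           # only the complete rows remain
```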

rbind(d,list("Laura",19)) -> adds a new row to a data frame; cbind() adds new columns


> examsquiz$ExamDiff <- examsquiz$Exam.2 - examsquiz$Exam.1 # a new column can also be added this way
d <- merge(d1,d2) # joins the rows of d1 and d2 on the columns they have in common

Importing data:
csv - comma separated, txt – tab delimited file
data <- read.table(file.choose(), header = T,sep = ",") - for csv
data4 <- read.table(file.choose(),header = T,sep="\t") – for txt
data1 <- read.csv(file.choose(), header=T)
data2 <- read.delim(file.choose(),header=T) - for txt
View(data) – opens a new tab showing the data

data = read.csv("klausur.csv", header = TRUE,stringsAsFactors = F)


- stringsAsFactors = F reads strings as plain characters; convert columns to factors later if needed
Remove objects:
> rm(x)

Extracting elements from data:


> mat[-(3:4),] – all elements without the 3rd and 4th row
> colnames(mat), rownames
> as.matrix(vector)
> r <- z[2,, drop=FALSE] -> extracts row 2 of a matrix and keeps it as a (1-row) matrix; the default drop=TRUE returns a plain vector
> ij <- which(d == smallest,arr.ind=TRUE) -> arr.ind gives the matrix index (e.g. 2,4) and not the vector index (e.g. 8)
finding the mean of a column:
> mean(filename$col)

Attaching/Detaching a file to R:
> attach(file) #now we don’t need the $ sign to find a certain column/the mean of a column
> detach(file)

Factors:
Factors are used to represent categorical data. They are stored as integers
- smart trick to relabel data: col <- as.factor(col), then levels(col) <- c(…) relabels every value of the factor at once (give the new labels in the order of levels(col))

sex <- factor(c("male", "female", "female", "male"))


levels(sex)
[1] "female" "male" #alphabetical order!
nlevels(sex) # number of levels
[1] 2
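
The relabelling trick can be sketched on the sex factor from above. The new labels "F" and "M" are chosen here only for illustration.

```r
sex <- factor(c("male", "female", "female", "male"))

codes <- as.integer(sex)       # internal integer codes: 2 1 1 2
levels(sex) <- c("F", "M")     # same order as the old levels: female, male
labels_after <- as.character(sex)
```

Only the two level labels are replaced, yet all four values change, because the values are stored as integer codes that point into levels(sex).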

- tapply() applies a function per group, split() splits data into groups; by() works pretty much like tapply, but on data frames:


by(aba,aba$Gender,function(m) lm(m[,2]~m[,3]))
aggregate(aba[,-1],list(aba$Gender),median) -> calls tapply for each column, using the groups formed by the second argument and applying the 3rd argument; the result is a data frame with nrow equal to the # of groups;
cut(vector,bins) -> creates a factor that tells into which bin each number of the vector falls
- very useful for creating histograms
cut(1:10, 10) – 10 bins: (0.991,1.9] (1.9,2.8] (2.8,3.7] (3.7,4.6] (4.6,5.5] (5.5,6.4] (6.4,7.3] (7.3,8.2] (8.2,9.1] (9.1,10]
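
Besides a number of bins, cut() also accepts explicit break points and labels. A minimal sketch; the ages and group names are invented for illustration.

```r
ages <- c(3, 17, 18, 42, 65, 80)          # illustrative data

# Intervals are (0,17], (17,64], (64,Inf]: open on the left, closed on the right
groups <- cut(ages, breaks = c(0, 17, 64, Inf),
              labels = c("child", "adult", "senior"))

counts <- table(groups)                   # how many values fall into each bin
```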

Summary and dim commands:


> summary(res)
     Datum           Serie            Punkte
 Min.   :17.28   Min.   : 8.202   Min.   : 52
 1st Qu.:19.82   1st Qu.:12.452   1st Qu.: 69
 Median :22.36   Median :16.702   Median : 86
 Mean   :22.36   Mean   :16.702   Mean   : 86
 3rd Qu.:24.31   3rd Qu.:20.952   3rd Qu.:103
 Max.   :30.05   Max.   :25.202   Max.   :120

> dim(res)
[1] #ofrows, #ofcolumns

Recoding a variable as a factor:


x <- c(0,1,1,1,0,0,1,0)
y <- as.factor(x)
summary(y) #for a factor, prints the frequencies, not the median and mean

Subsetting data:
mean(col1[col2=="female"]) – prints the mean of col1 over the rows where col2 is "female"
x <- filename[col == "male", ] – creates a new table with only the "male" rows
y <- filename[col1 == "male" & col2 > 20, ] – all rows that meet both conditions (all columns are kept)

x <- cbind(x1 = 3, x2 = c(4:1, 2:5))


x <- rbind(x1 = 3, x2 = c(4:1, 2:5))

cbind makes a matrix with first column 3…3 (recycled) and second column x2; rbind puts the same values into two rows


The apply family of functions
sapply(x,function(x) x+1) -> applies a function to a vector/list and returns a vector;
vapply – like sapply, but you must specify the output type, e.g. vapply(x,f,numeric(1))
lapply(x,function(x) x+1) -> same but returns a list
can be used on vectors, too, which are coerced to lists;
apply(x,2,sum,fargs) -> applies a function to a matrix, 2 for cols, 1 for rows; fargs are optional arguments for the function
tapply(data$income,list(data$over25,data$gender),mean) -> makes a table (matrix), where the rows are whether the person is over 25 and the columns are the gender; the entries are the mean income of these groups
mapply(FUN,...) -> applies a multivariate (or unary) function elementwise to the dots, e.g.
mapply(function(x,y) x+y,c(1,10),c(2,3)) -> 3,13
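
The members of the apply family differ mainly in what they iterate over and what they return. A compact sketch with made-up inputs (x, m, income and gender are illustrative names):

```r
x <- c(a = 1, b = 2, c = 3)
s <- sapply(x, function(v) v + 1)               # named numeric vector
l <- lapply(x, function(v) v + 1)               # list
v <- vapply(x, function(v) v + 1, numeric(1))   # like sapply, but the type is checked

m <- matrix(1:6, nrow = 2)                      # columns: (1,2), (3,4), (5,6)
col_sums <- apply(m, 2, sum)                    # 2 = over columns
row_sums <- apply(m, 1, sum)                    # 1 = over rows

income <- c(10, 20, 30, 40)
gender <- c("f", "m", "f", "m")
by_gender <- tapply(income, gender, mean)       # group means

pairwise <- mapply(function(a, b) a + b, c(1, 10), c(2, 3))   # 1+2 and 10+3
```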

Vectors
- immutable
- x <- c(x[1:3],168,x[4]) -> this creates a new vector

for loop: for (i in 1:length(x))

- if x is NULL (length 0), the sequence will be 1 0 -> bad; seq_along(x) is empty in that case
for(i in seq_along(y)) {print(y[i])}
for(i in y) {print(i)}
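
The pitfall with empty input can be checked directly. A minimal sketch; n_iter is a helper counter introduced only for this illustration.

```r
x <- NULL

bad  <- 1:length(x)    # 1 0 -> a loop over this would run twice
good <- seq_along(x)   # integer(0) -> a loop over this is skipped

n_iter <- 0
for (i in seq_along(x)) n_iter <- n_iter + 1   # body never runs
```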

Recycling
applying functions to vectors of different lengths causes R to repeat the smaller vector until it has the same length as the longer one
- same with matrices, which are just long vectors with dimensions
Vector allocation is time consuming. It's better to allocate a vector of the final length once and then assign values to it than to grow it with x <- c(x,number)
runs <- vector(length=n)
runs[i] <- 3
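
Both points, recycling and allocating once, fit in a few lines. This is a sketch with arbitrary numbers; numeric(n) is used here as one common way to preallocate.

```r
recycled <- c(1, 2, 3, 4) + c(10, 20)   # the shorter vector is reused: 10, 20, 10, 20

n <- 5
runs <- numeric(n)                      # allocate the full vector once
for (i in seq_len(n)) runs[i] <- i^2    # fill in place instead of growing with c()
```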

Lists are vectors:


j <- list(name="Joe", salary=55000, union=T) # name=value gives each component a name
> z <- vector(mode="list")
> z[["abc"]] <- 3
- indexing
> j$salary [1] 55000 or j[["salary"]] [1] 55000 or j[[2]] [1] 55000
- using single brackets makes a new list, even if we index only 1 argument
z$c <- 245 # add a c component; z[5:7] <- c(FALSE,TRUE,TRUE)
NA and NULL
NA – missing or unknown values -> for statistics
NA always takes the mode of the other entries in the vector
- mean(x,na.rm=TRUE)
NULL – non existent values
- NULL has no mode, length 0
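
The difference shows up as soon as the values are put into a vector. A minimal sketch with illustrative numbers:

```r
x <- c(88, NA, 12)
m_na  <- mean(x)                 # NA: the missing value propagates
m_rm  <- mean(x, na.rm = TRUE)   # 50: NA is skipped

y <- c(88, NULL, 12)             # NULL simply disappears
len_y <- length(y)               # 2
m_y   <- mean(y)                 # 50
```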

Filtering
> z <- c(5,2,-3,8)
> y <- c(1,2,30,5)
> y[z*z > 8] [1] 1 30 5

We are using the indices of z which fulfill the condition to pick the corresponding elements of y
- works with assignment too!
- NA values are not filtered out (they show up as NA in the result)
=> use subset(vector,condition), which drops them
which(x>3) -> gives the indices where the condition is TRUE
first1a <- function(x) return(which(cond)[1]) # returns the index of the first element of a vector which fulfills a certain condition (cond stands for a condition on x)
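
How bracket filtering, subset() and which() treat a missing value can be compared side by side. A short sketch with an illustrative vector:

```r
x <- c(6, 1:3, NA, 12)

kept_na <- x[x > 5]           # the NA comparison yields NA, which stays in the result
dropped <- subset(x, x > 5)   # subset() removes it
idx     <- which(x > 5)       # positions where the condition is TRUE
```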

ifelse
ifelse(b,u,v) -> b is condition on the vector, u[i] is the result if b[i] is true
x <- 1:10
y <- ifelse(x %% 2 == 0,5,12) # %% is the mod operator
> ifelse(x[,2]>3,c(1:8),c(19:(19+8)))
[1] 1 20 21 22 23 24 7 8

> x <- c(5,2,9,12)


> ifelse(x > 6,2*x,3*x)
[1] 15 6 18 24
-- make a list that maps each factor level to the indices at which it occurs in a vector
> grps <- list()
> for (gen in c("M","F","I")) grps[[gen]] <- which(g==gen)
- "Vectorised switch case" (case_when from the dplyr package)
> x <- 1:50
> case_when(
+ x %% 35 == 0 ~ "pdf",
+ x %% 5 == 0 ~ "png",
+ x %% 7 == 0 ~ "jpg",
+ TRUE ~ as.character(x)
+ )

Maths
cumsum, cumprod; pmin, pmax – element-wise min/max
nlm(),optim() – finds a minimum;
D(expression

Linear algebra:
…, solve(a,b) -> solves Ax=b; qr(), chol(), t(); diag(3) -> 3x3 identity matrix; diag(m) -> takes the diagonal of a matrix

Plotting
- aba is a dataset with cols gender, length, diameter;
pchvec <- ifelse(aba$Gender == "M","o","x")
- pchvec sets the plotting symbols; if gender is M, plot "o", else plot an "x"
plot(aba$Length,aba$Diameter,pch=pchvec)
- plot length vs. diameter and use the symbols from pchvec
locator() -> click on some point on the graph, and it shows you its coordinates

Functions:
default argument: b = 2 (spaces around = are a readability convention, not required)
- R searches first for local variables in the function body, then one level higher (global vars or, if the function is nested, in the enclosing function)
Infix functions:
- always begin and end with %, e.g. %+%, %*%; you can make your own
Replacement functions:
`modfy<-` <- function(x,elem,value) {
x[elem] <- value
x
}
- we call them like so:
modfy(vec,3) <- 10

- internally, R modifies it like so:


vec <- `modfy<-`(vec,3,value=10)
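
Both call forms can be run to confirm that they give the same result. A minimal sketch; vec and vec2 are illustrative vectors.

```r
`modfy<-` <- function(x, elem, value) {
  x[elem] <- value   # change one element of the local copy
  x                  # return the modified copy
}

vec <- c(1, 2, 3, 4)
modfy(vec, 3) <- 10                       # replacement-call syntax

vec2 <- c(1, 2, 3, 4)
vec2 <- `modfy<-`(vec2, 3, value = 10)    # the equivalent explicit call
```

The last argument of a replacement function has to be named value, since that is the name R uses when it rewrites the call.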

- traceback() -> if an error occurs, see which calls led to it

raise errors with stop("Text")
- magrittr::`%>%` - pipe; x %>% f(y) is equivalent to f(x,y) and x %>% f(y,.) is f(y,x)
<<- assigns to a variable in an enclosing environment from inside a function (creating a global variable if none exists); alternatively:
assign("var",x*2,pos=.GlobalEnv)
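
A common use of <<- is a function that keeps state between calls. The sketch below is one such pattern; make_counter and count are hypothetical names, not from the notes.

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates count in make_counter's environment, not a local copy
    count
  }
}

counter <- make_counter()
first  <- counter()
second <- counter()
```

Here <<- finds count one level up, in the enclosing function, so no global variable is created.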

S3 Classes
x <- structure(1, class = "foo")

or
class(x) <- "foo" # if x is an existing object

unclass() -> get the “content” of an object

- S3 generics: UseMethod("fun",arg) -> method dispatch decides which method to use for a generic; in S3, method dispatch works only on the first argument!
isS3stdGeneric(fun) -> tests whether fun is a standard S3 generic
mean <- function (x, ...) {
UseMethod("mean", x)
}
mean.numeric <- function(x, ...) sum(x) / length(x)
mean.data.frame <- function(x, ...) sapply(x, mean, ...)
mean.matrix <- function(x, ...) apply(x, 2, mean)

-- NextMethod()
I create an object with two class attributes (inheritance), 'first' and 'second'.

x <- 1
attr(x,'class') <- c('first','second')
Then I create a generic function Cate to print my object
Cate <- function(x,...)UseMethod('Cate')
I define a Cate method for each class.
Cate.first <- function(x,...){
print(match.call())
print(paste('first:',x))
print('---------------------')
NextMethod() ## This will call Cate.second
}

Cate.second <- function(x,y){ # y is a second argument of the original call, e.g. Cate(x,1:3)
print(match.call())
print(paste('second:',x,y))
}
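
A stripped-down variant of the same idea, returning values instead of printing them so the dispatch order is easy to see. The generic describe and its methods are hypothetical names used only in this sketch.

```r
describe <- function(x, ...) UseMethod("describe")   # hypothetical generic

describe.first <- function(x, ...) {
  c("first", NextMethod())    # run this method, then continue with the next class
}

describe.second <- function(x, ...) {
  "second"
}

obj <- structure(1, class = c("first", "second"))
res <- describe(obj)          # dispatches on 'first', then NextMethod() reaches 'second'
```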

S4 Classes
- more formal and safe than S3
- access with obj@elem or slot(obj,elem)
- @ and $ here can’t make new elements of an object
-Method dispatch can be based on multiple arguments to a generic function, not just one.

setClass("yyy", representation(v="numeric"))

student <- setClass("student", slots=list(name="character", age="numeric", GPA="numeric"))

- prototype is an optional argument which provides default values for the slots:

prototype = list(
name = NA_character_,
age = NA_real_
)

create new objects with new()

s <- new("student",name="John", age=21, GPA=3.5)


slot(s,"name") /is same as/ s@name

showMethods(show) -> list all methods of a generic function

Inheritance with the contains argument:

setClass("Employee",
contains = "Person",
slots = c(
boss = "Person"
),
prototype = list(
boss = new("Person")
)
)
Canonical example
Class definition
> setClass("Greeting",
+ representation =
+ representation(phrase = "character"),
+ prototype = prototype(phrase = "Hello world!"))
Default instantiation
> new("Greeting")
An object of class "Greeting"
Slot "phrase":
[1] "Hello world!"
Customized instantiation
> new("Greeting", phrase = "ciao")
An object of class "Greeting"
Slot "phrase":
[1] "ciao"

Validity method
> setValidity("Greeting",
+ function(object) {
+ if (length(object@phrase) != 1)
+ "'phrase' not a single string"
+ else if (!nzchar(object@phrase))
+ "'phrase' is empty"
+ else
+ TRUE
+ })
Invalid object instantiation
> new("Greeting", phrase = "")
Error in validObject(.Object) :
invalid class "Greeting" object: 'phrase' is empty

is() shows you the classes from which an object inherits


# create or override methods:

setMethod("show","student",function(object) {

cat(object@name, "\n")

cat(object@age, "years old\n")

cat("GPA:", object@GPA, "\n")})

Inheritance:

setClass("C", contains = "character") # C inherits from character vector

setClass("B", contains = "C") # B inherits from C

setClass("A", contains = "B") # A inherits from B
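
The point that S4 can dispatch on more than one argument may be easier to see in a small sketch. The classes Dog and Cat and the generic meet are invented for this illustration.

```r
library(methods)

setClass("Dog", representation(name = "character"))   # hypothetical classes
setClass("Cat", representation(name = "character"))

# The generic dispatches on the classes of both a and b
setGeneric("meet", function(a, b) standardGeneric("meet"))

setMethod("meet", signature("Dog", "Cat"),
          function(a, b) paste(a@name, "chases", b@name))
setMethod("meet", signature("Cat", "Dog"),
          function(a, b) paste(a@name, "hisses at", b@name))

rex <- new("Dog", name = "Rex")
tom <- new("Cat", name = "Tom")

dog_first <- meet(rex, tom)
cat_first <- meet(tom, rex)
```

Swapping the arguments selects a different method, which S3 dispatch on the first argument alone could not express directly.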

R6
- It uses the encapsulated OOP paradigm, which means that methods belong to objects, not generics, and you call them like object$method().
- R6 objects are mutable, which means that they are modified in place, and hence have reference semantics. Objects are not copied when modified; if y1 is an R6 object, setting y2 <- y1 just makes another reference to the object, and modifying y1 also modifies y2; use y2 <- y1$clone() instead
R6::R6Class() is the only function from the package that you'll ever use!
R6Class(classname, public = list()) – public is a list of all functions and fields that the class needs
Accumulator <- R6Class("Accumulator", list(
sum = 0,
add = function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})
)
You construct a new object from the class by calling the new() method. In R6,
methods belong to objects, so you use $ to access new():
x <- Accumulator$new()
x$add(4)
x$sum
#> [1] 4
In this class, the fields and methods are public, which means that you can get or set the value of any field.
x$add(10)$add(10)$sum – possible due to the invisible() return

There are two important methods that should be defined for most
classes: $initialize() and $print(). -> initialize overrides the $new function

-----------------------
Add new elements to an existing class with $set(), supplying the visibility (e.g. "public"), the name, and the component.
Accumulator <- R6Class("Accumulator")
Accumulator$set("public", "sum", 0)
Accumulator$set("public", "add", function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})

Inheritance:
AccumulatorChatty <- R6Class("AccumulatorChatty",
inherit = Accumulator,
public = list(
add = function(x = 1) {
cat("Adding ", x, "\n", sep = "")
super$add(x = x)
}
)
)
# super accesses the parent class; add overrides the parent's add and uses it in its own implementation

private = list()
- private elements can only be accessed from within the class with private$ instead of self$
- active bindings -> advanced!

Functional programming
purrr package
purrr::map(c(1,2),function(x) x+1) -> makes a list
purrr::map_chr(chem$MATRIKEL,typeof) -> makes a character vector
map_lgl -> returns a logical vector
map_int returns an integer vector, map_dbl a double vector
instead of function(x) write ~ and refer to the argument as .x:
purrr::map_dbl(c(1,2,3),~ .x+1)
- can be used for extracting elements from (nested) lists, too:

x <- list(
list(-1, x = 1, y = c(2), z = "a"),
list(-2, x = 4, y = c(5, 6), z = "b"),
list(-3, x = 8, y = c(9, 10, 11))
)

# Select by name
map_dbl(x, "x")
#> [1] 1 4 8

# Or by position
map_dbl(x, 1)
#> [1] -1 -2 -3

generating random data:


x <- map(1:3, ~ runif(2))

plus <- function(x, y) x + y

x <- c(0, 0, 0, 0)
map_dbl(x, plus, runif(1)) # runif(1) is passed as an extra argument of map, so it is evaluated only once and the same value is added to all elements of the vector
#> [1] 0.0625 0.0625 0.0625 0.0625
map_dbl(x, ~ plus(.x, runif(1))) # here runif(1) sits inside the function, so it is evaluated again for every element (4 times)
#> [1] 0.903 0.132 0.629 0.945

the map family -> map, map2, walk, walk2, modify, pmap…
reduce takes a vector or list and a two-argument function and combines the elements step by step into a single value
accumulate gives all intermediate results of reduce
While map works for a function that only prints (here: welcome messages), it also returns list(NULL, NULL). If you want to call a function only for its side effects, use walk
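
The reduce/accumulate idea can be tried without installing purrr. The sketch below swaps in base R's Reduce(), which is a stand-in for purrr::reduce and purrr::accumulate, not the same functions.

```r
# Base R stand-in for purrr::reduce: ((1 + 2) + 3) + 4
total <- Reduce(`+`, 1:4)

# Base R stand-in for purrr::accumulate: keeps every intermediate sum
running <- Reduce(`+`, 1:4, accumulate = TRUE)
```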

delayedAssign("v",sample(1e8)) for lazy evaluation

### Promises in an environment [for advanced users]: ---------------------

e <- (function(x, y = 1, z) environment())(cos, "y", {cat(" HO!\n"); pi+2})


## How can we look at all promises in an env (w/o forcing them)?
gete <- function(e_)
lapply(lapply(ls(e_), as.name),
function(n) eval(substitute(substitute(X, e_), list(X=n))))

(exps <- gete(e))


sapply(exps, typeof)
(le <- as.list(e)) # evaluates ("force"s) the promises
stopifnot(identical(unname(le), lapply(exps, eval))) # and another "Ho!"

Regression models:
lm(response ~ predictor)
glm – e.g. for logistic regression
- Weight ~ Age + Height
- Weight ~ . -> everything
- Weight ~ .-GPA -> exclude one variable
model <- glm(formula = BP ~ ., data = gender[1:16,], family = binomial) # family = binomial is needed for logistic regression
=> predict(model,gender[17:20,], type="response") for 0<output<1
Making a confusion matrix(false positives etc.)
table(Actual_Value=test$type,Predicted_Value=res>0.5)
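
The whole workflow (fit, predict, confusion matrix) can be sketched on the built-in mtcars data. The data set, the split into odd and even rows, and the formula am ~ wt are assumptions made for this illustration; they are not the data used in the notes.

```r
# Assumed stand-in data: predict transmission type (am, 0/1) from weight (wt)
train <- mtcars[seq(1, nrow(mtcars), by = 2), ]
test  <- mtcars[seq(2, nrow(mtcars), by = 2), ]

model <- glm(am ~ wt, data = train, family = binomial)   # logistic regression

# type = "response" returns probabilities instead of log-odds
res <- predict(model, test, type = "response")

# Rows: actual class; columns: whether the predicted probability exceeds 0.5
conf <- table(Actual_Value = test$am, Predicted_Value = res > 0.5)
```

The 0.5 cut-off is a conventional default; a different threshold trades false positives against false negatives.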

table(vector) -> use e.g. on Boolean vector to show how many T and F

PCA:
- pca <- prcomp(data)
-summary(pca) -> see most important principal components
- plot(pca, type = "l") -> plot them
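
A runnable sketch of these steps on the built-in USArrests data. The data set and the scale. = TRUE setting are choices made for this illustration.

```r
# Assumed example data: 4 numeric columns, scaled so no variable dominates
pca <- prcomp(USArrests, scale. = TRUE)

smry <- summary(pca)                                # importance of each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)       # share of variance per component

# plot(pca, type = "l")                             # scree plot, as in the notes
```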
