R - Tutorial: Matrices Are Vectors
Creating a vector:
z <- c(1,3,4), z <- c("f","k","l") - do not mix numbers with strings: c(1,"k") silently converts everything to character!
Creating a sequence:
z <- 2:7
z <- seq(from=1,to=13,by=4)
Repeating a vector:
z <- rep(c(1,3,4),times=3)
Repeating a sequence:
z <- rep(seq(from=1,to=11,by=2),times=3)
z <- rep(2:4, times=5)
z <- rep(c(5,12,13),each=2)
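A small runnable recap of the constructors above. The comments show the values R returns, including the silent coercion that happens when numbers and strings are mixed.

```r
# Vector and sequence constructors from the notes, with the values R returns
z1 <- c(1, 3, 4)
mixed <- c(1, "k")                      # numbers are coerced: "1" "k"
s1 <- 2:7                               # 2 3 4 5 6 7
s2 <- seq(from = 1, to = 13, by = 4)    # 1 5 9 13
r1 <- rep(c(1, 3, 4), times = 3)        # 1 3 4 1 3 4 1 3 4
r2 <- rep(2:4, times = 5)               # 2 3 4 repeated 5 times
r3 <- rep(c(5, 12, 13), each = 2)       # 5 5 12 12 13 13
print(mixed)
print(r3)
```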
- if 2 vectors are of the same length, we can add, subtract, multiply, etc. them element by element
> x+y # with x <- c(1,2,3,4) and y <- c(10,13,16,19)
[1] 11 15 19 23
- : has higher precedence than * and +
> z <- 1:5*2 # returns 2,4..10
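A sketch of element-wise arithmetic and of the precedence rule for ":"; the 1:n - 1 line shows the classic off-by-one trap that follows from it.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 13, 16, 19)
x + y            # 11 15 19 23
x * y            # 10 26 48 76
a <- 1:5 * 2     # (1:5) * 2 -> 2 4 6 8 10
b <- 1:(5 * 2)   # brackets force the product first -> 1 2 ... 10
n <- 5
off <- 1:n - 1   # (1:n) - 1 -> 0 1 2 3 4, a common off-by-one trap
ok  <- 1:(n - 1) # 1 2 3 4
```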
Importing data:
csv - comma separated, txt – tab delimited file
data <- read.table(file.choose(), header = T,sep = ",") - for csv
data4 <- read.table(file.choose(),header = T,sep="\t") – for txt
data1 <- read.csv(file.choose(), header=T)
data2 <- read.delim(file.choose(),header=T) - for txt
View(data) – opens a new tab with the file
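To try the import functions without picking a file, read.csv(), read.delim() and read.table() also accept the data as a string through their text argument. The small table below is made up for illustration; with real data you would pass file.choose() or a path instead.

```r
# made-up data: the same table once comma separated, once tab delimited
csv_text <- "name,age\nAnna,23\nBen,31"
tsv_text <- "name\tage\nAnna\t23\nBen\t31"
data1 <- read.csv(text = csv_text, header = TRUE)
data2 <- read.delim(text = tsv_text, header = TRUE)
data3 <- read.table(text = csv_text, header = TRUE, sep = ",")
str(data1)
dim(data1)   # 2 2
```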
Attaching/Detaching a data frame:
> attach(data) # now we don't need the $ sign to refer to a column, e.g. mean(col) instead of mean(data$col)
> detach(data)
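A minimal sketch of attach/detach, using a small made-up data frame called students in place of real data:

```r
# made-up data frame standing in for imported data
students <- data.frame(height = c(160, 175, 182), weight = c(55, 70, 80))
m1 <- mean(students$height)   # without attach: data$col
attach(students)              # the columns are now on the search path
m2 <- mean(height)            # same result, no $ needed
detach(students)              # remove the columns from the search path again
```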
Factors:
Factors are used to represent categorical data. They are stored as integer codes, with the labels kept in the levels
- smart trick to relabel data: col <- as.factor(col), then levels(col) <- c(…)
relabels every element of the factor at once!
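The relabeling trick in action, with a made-up vector of codes. Note that levels() are sorted alphabetically, so the new labels must be given in that order:

```r
col <- c("m", "f", "m", "m")          # made-up example codes
col <- as.factor(col)
levels(col)                           # "f" "m" (sorted alphabetically)
as.integer(col)                       # 2 1 2 2: the stored integer codes
levels(col) <- c("female", "male")    # relabel by position in levels()
col                                   # male female male male
```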
> dim(res)
[1] #ofrows #ofcolumns
Subsetting data:
mean(col1[col2=="female"]) - the mean of col1 over the rows where col2 is "female"
x <- data[col == "male", ] - creates a new data frame with only the rows where col is "male"
y <- data[col1 == "male" & col2 > 20, ] - only the rows that meet both conditions (all columns are kept)
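The same subsetting patterns on a small made-up data frame (the column names gender, age and income are assumptions for this example), using data$col instead of an attached data frame:

```r
# made-up data frame for illustration
data <- data.frame(
  gender = c("male", "female", "male", "female"),
  age    = c(25, 31, 19, 42),
  income = c(2100, 3400, 900, 4100)
)
mean(data$income[data$gender == "female"])          # 3750
males     <- data[data$gender == "male", ]          # male rows, all columns
older_men <- data[data$gender == "male" & data$age > 20, ]
older_men                                           # only the 25-year-old
```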
Vectors
- immutable
- x <- c(x[1:3],168,x[4]) -> inserts 168 before the 4th element; R builds a new vector and rebinds x to it
Recycling
applying an operation to vectors of different lengths causes R to repeat (recycle) the shorter
vector until it has the same length as the longer one
- the same happens with matrices, which are just long vectors with a dim attribute
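Recycling on a vector and on a matrix. Since matrices are filled column by column, the shorter vector is recycled down the columns; R warns when the longer length is not a multiple of the shorter one.

```r
v <- c(1, 2, 3, 4) + c(10, 20)   # c(10, 20) is recycled -> 11 22 13 24
m <- matrix(1:6, nrow = 3)       # filled column by column: 1 2 3 | 4 5 6
mv <- m + c(10, 20)              # recycled down the columns
mv                               # columns: 11 22 13 | 24 15 26
```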
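Vector allocation is time consuming. It's better to create a vector of the final length once and then assign
values to it, instead of growing it with x <- c(x,number), which builds a new vector on every call
runs <- vector(length=n) # a logical vector of n FALSEs; numeric(n) gives n zeros instead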
runs[i] <- 3
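A sketch comparing preallocation with growing a vector. Both give the same result, but the second version allocates a new vector on every iteration:

```r
n <- 1000
runs <- numeric(n)               # preallocate once
for (i in seq_len(n)) {
  runs[i] <- i^2                 # fill in place
}
grow <- c()                      # the slow pattern: start empty ...
for (i in seq_len(n)) {
  grow <- c(grow, i^2)           # ... and build a longer copy each time
}
```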
Filtering
> z <- c(5,2,-3,8)
> y <- c(1,2,30,5)
> y[z*z > 8]
[1] 1 30 5
We use the indices of z that fulfil the condition to select the corresponding
elements of y
- works with assignment too!
- NA values are not filtered out: an NA in the condition gives an NA in the result
=> use subset(vector,condition), which drops them
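Filtering, assignment through a filter, and the NA behaviour described above:

```r
z <- c(5, 2, -3, 8)
y <- c(1, 2, 30, 5)
sel <- y[z * z > 8]     # 1 30 5
y[z * z > 8] <- 0       # filtering on the left-hand side of <-
y                       # 0 2 0 0
x <- c(6, 1:3, NA, 12)
x[x > 5]                # 6 NA 12: the NA is kept
subset(x, x > 5)        # 6 12: subset() drops it
```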
which(x>3) -> returns the indices (not the values) of the elements for which the condition is TRUE
first1a <- function(x) return(which(x>3)[1]) # returns the index of the first element of x that fulfils the condition
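which() returns positions, which is what makes a first1a() helper work. When no element qualifies, indexing the empty result with [1] yields NA:

```r
x <- c(1, 5, 2, 7, 4)
which(x > 3)        # 2 4 5: positions, not values
x[which(x > 3)]     # 5 7 4
first1a <- function(x) return(which(x > 3)[1])  # position of the first element > 3
first1a(x)          # 2
first1a(c(1, 2))    # NA: no element qualifies
```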
ifelse
ifelse(b,u,v) -> b is a condition on the vector; the result is u[i] where b[i] is TRUE and v[i] where it is FALSE
x <- 1:10
y <- ifelse(x %% 2 == 0,5,12) # %% is the mod operator
> ifelse(x[,2]>3,c(1:8),c(19:(19+8))) # here x is a matrix with 8 rows
[1] 1 20 21 22 23 24 7 8
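ifelse() on the x <- 1:10 example, plus a case where u and v are whole vectors, so element i is picked from u[i] or v[i]:

```r
x <- 1:10
y <- ifelse(x %% 2 == 0, 5, 12)   # 5 for even x, 12 for odd x
y                                 # 12 5 12 5 12 5 12 5 12 5
z <- ifelse(x > 5, x * 10, -x)    # x[i] * 10 if x[i] > 5, otherwise -x[i]
z                                 # -1 -2 -3 -4 -5 60 70 80 90 100
```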
Maths
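cumsum, cumprod (cumulative sum/product); pmin, pmax (element-wise minimum/maximum)
nlm(), optim() - find a minimum of a function;
D(expression(...), "x") - symbolic derivative with respect to x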
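A sketch of these functions on small inputs. The function f being minimised is an arbitrary example with its minimum at x = 3, and method = "BFGS" is chosen because optim()'s default Nelder-Mead method warns on one-dimensional problems.

```r
cumsum(1:4)                        # 1 3 6 10
cumprod(1:4)                       # 1 2 6 24
pmin(c(1, 5, 3), c(4, 2, 6))       # 1 2 3 (element-wise minimum)
pmax(c(1, 5, 3), c(4, 2, 6))       # 4 5 6 (element-wise maximum)
f <- function(x) (x - 3)^2 + 1     # example function, minimum at x = 3
est_nlm <- nlm(f, p = 0)$estimate                         # approximately 3
est_optim <- optim(par = 0, fn = f, method = "BFGS")$par  # approximately 3
D(expression(x^2 + 3 * x), "x")    # 2 * x + 3
```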
Linear algebra:
…, solve(A,b) -> solves Ax=b (solve(A) alone returns the inverse); qr(), chol(), t(); diag(3) -> 3x3 identity matrix; diag(m) -> extracts the diagonal of
matrix m
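The linear-algebra functions applied to a small symmetric positive-definite matrix A (chosen so that chol() also works):

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # [2 1; 1 3], filled column by column
b <- c(1, 2)
x <- solve(A, b)      # solves A %*% x = b -> 0.2 0.6
A %*% x               # reproduces b
solve(A)              # inverse of A
t(A)                  # transpose (equal to A here, since A is symmetric)
diag(3)               # 3x3 identity matrix
diag(A)               # diagonal of A: 2 3
R <- chol(A)          # upper triangular R with t(R) %*% R equal to A
qr(A)$rank            # 2
```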
Plotting
- aba is a dataset with cols Gender, Length, Diameter;
pchvec <- ifelse(aba$Gender == "M","o","x")
- pchvec sets the plotting symbols: if Gender is "M", plot an "o", else plot an "x"
plot(aba$Length,aba$Diameter,pch=pchvec)
- plot length vs. diameter and use the figures from pchvec
locator() -> click on points of the open graph and it returns their x/y coordinates
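A runnable version of the plotting example. The aba data frame below is a made-up stand-in with the same column names, not the real data set:

```r
# made-up stand-in for the aba data set (values invented for illustration)
aba <- data.frame(
  Gender   = c("M", "F", "M", "F", "F"),
  Length   = c(0.45, 0.53, 0.33, 0.44, 0.55),
  Diameter = c(0.36, 0.42, 0.26, 0.36, 0.44)
)
pchvec <- ifelse(aba$Gender == "M", "o", "x")   # one plotting symbol per row
plot(aba$Length, aba$Diameter, pch = pchvec,
     xlab = "Length", ylab = "Diameter")
```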
Functions:
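default argument: f <- function(a, b = 2) -> b is 2 unless the caller passes another value; by convention always written with spaces around =!
- R searches first for local variables in the function body, then one level higher (in the environment where the function was defined: the global environment or, if
the function is nested, the enclosing function)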
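A sketch of default arguments and of the lookup order described above; f, g and outer_fn are illustrative names:

```r
b <- 100                        # a global variable
f <- function(a, b = 2) {       # b has a default value
  a * b                         # the local b (the argument) shadows the global b
}
f(3)          # 6
f(3, b = 5)   # 15
g <- function(a) {
  a * b       # no local b: R looks one level up and finds the global b
}
g(3)          # 300
outer_fn <- function() {
  k <- 10
  inner_fn <- function(a) a + k # k is found in the enclosing function
  inner_fn(1)
}
outer_fn()    # 11
```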
Infix functions:
- names always begin and end with %, e.g. %+%, %*%; you can make your own
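Defining a custom infix operator. The %+% below is a hypothetical string-concatenation operator, while %*% and %% are built in:

```r
`%+%` <- function(a, b) paste0(a, b)   # hypothetical operator: glue two strings
s <- "R " %+% "rocks"                  # "R rocks"
m <- matrix(1:4, nrow = 2)
p <- m %*% diag(2)                     # built-in infix matrix product
r <- 5 %% 3                            # built-in modulo operator -> 2
```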
Replacement functions:
`modfy<-` <- function(x,elem,value) {
x[elem] <- value
x
}
- we call them like so:
modfy(vec,3) <- 10
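A complete run of the replacement function. R translates modfy(vec,3) <- 10 into vec <- `modfy<-`(vec, 3, value = 10), which is why the last argument must be called value:

```r
`modfy<-` <- function(x, elem, value) {
  x[elem] <- value
  x
}
vec <- c(1, 2, 3, 4, 5)
modfy(vec, 3) <- 10                     # rewritten as vec <- `modfy<-`(vec, 3, value = 10)
vec                                     # 1 2 10 4 5
vec2 <- `modfy<-`(vec, 1, value = 0)    # the explicit call returns a modified copy
```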
S3 Classes
x <- structure(1, class = "foo")
or
class(x) <- "foo" # if x is an existing object
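- S3 generics: UseMethod("fun",arg) -> method dispatch decides which method to use for a
generic; in S3, method dispatch works only on the first argument!
isS3stdGeneric(f) (utils package) -> checks whether f is a standard S3 generic, i.e. a function that just calls UseMethod()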
mean <- function (x, ...) {
UseMethod("mean", x)
}
mean.numeric <- function(x, ...) sum(x) / length(x)
mean.data.frame <- function(x, ...) sapply(x, mean, ...)
mean.matrix <- function(x, ...) apply(x, 2, mean)
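Rather than redefining mean(), a safer way to experiment with dispatch is a new generic. describe() and its methods below are hypothetical, written only to show how UseMethod() picks describe.foo() for a "foo" object and falls back to describe.default() otherwise:

```r
# describe() is a hypothetical generic, used only to illustrate S3 dispatch
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something"
describe.foo <- function(x, ...) paste("a foo with value", unclass(x))
x <- structure(1, class = "foo")
describe(x)          # "a foo with value 1"
describe(42)         # "something" (no method for numbers, so the default is used)
y <- 2
class(y) <- "foo"    # same effect as structure()
describe(y)          # "a foo with value 2"
```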
-- NextMethod()
I create an object with 2 classes in its class attribute (inheritance): 'first' and 'second'.
x <- 1
attr(x,'class') <- c('first','second')
Then I create a generic method Cate to print my object
Cate <- function(x,...)UseMethod('Cate')
I define a Cate method for each class.
Cate.first <- function(x,...){
print(match.call())
print(paste('first:',x))
print('---------------------')
NextMethod() ## This will call Cate.second
}
Cate.second <- function(x,...){
print(paste('second:',x))
}
Cate(x) # runs Cate.first, which hands over to Cate.second via NextMethod()
S4 Classes
- more formal and safer than S3
- access slots with obj@elem or slot(obj,elem)
- @ can't create new elements of an object: only the slots defined in the class exist
- method dispatch can be based on multiple arguments to a generic function, not just one.
setClass("yyy", representation(v="numeric"))
setClass("Person",
slots = c(
name = "character",
age = "numeric"
),
prototype = list(
name = NA_character_,
age = NA_real_
)
)
setClass("Employee",
contains = "Person",
slots = c(
boss = "Person"
),
prototype = list(
boss = new("Person")
)
)
Canonical example
Class definition
> setClass("Greeting",
+ representation =
+ representation(phrase = "character"),
+ prototype = prototype(phrase = "Hello world!"))
Default instantiation
> new("Greeting")
An object of class "Greeting"
Slot "phrase":
[1] "Hello world!"
Customized instantiation
> new("Greeting", phrase = "ciao")
An object of class "Greeting"
Slot "phrase":
[1] "ciao"
Validity method
> setValidity("Greeting",
+ function(object) {
+ if (length(object@phrase) != 1)
+ "'phrase' not a single string"
+ else if (!nzchar(object@phrase))
+ "'phrase' is empty"
+ else
+ TRUE
+ })
Invalid object instantiation
> new("Greeting", phrase = "")
Error in validObject(.Object) :
invalid class "Greeting" object: 'phrase' is empty
setMethod("show","student",function(object) {
cat(object@name, "\n")
})
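The show() snippet above assumes a class "student" that is not defined in these notes. As a sketch, here is the same pattern applied to the Greeting class from the canonical example, together with slot access via @ and slot():

```r
library(methods)
setClass("Greeting", representation(phrase = "character"),
         prototype = prototype(phrase = "Hello world!"))
setMethod("show", "Greeting", function(object) {
  cat(object@phrase, "\n", sep = "")   # replaces the default "An object of class ..." output
})
g <- new("Greeting")
g                         # Hello world!
g@phrase <- "ciao"        # @<- changes an existing slot
slot(g, "phrase")         # "ciao"
```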
Inheritance:
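A sketch of S4 inheritance built on the Person/Employee classes: an Employee is also a Person, and callNextMethod() runs the parent's method. The generic introduce() is hypothetical, defined only for this example.

```r
library(methods)
setClass("Person", slots = c(name = "character", age = "numeric"),
         prototype = list(name = NA_character_, age = NA_real_))
setClass("Employee", contains = "Person", slots = c(boss = "Person"),
         prototype = list(boss = new("Person")))
# introduce() is a hypothetical generic for illustration
setGeneric("introduce", function(x) standardGeneric("introduce"))
setMethod("introduce", "Person", function(x) paste("Person:", x@name))
setMethod("introduce", "Employee", function(x) {
  paste(callNextMethod(), "(employee)")   # runs the Person method first
})
e <- new("Employee", name = "Ann", age = 40)
is(e, "Person")   # TRUE
introduce(e)      # "Person: Ann (employee)"
```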
R6
It uses the encapsulated OOP paradigm, which means that methods belong to objects, not generics, and you call them like object$method().
R6 objects are mutable, which means that they are modified in place, and hence have reference semantics: objects are not copied when modified. If y1 is an
R6 object, setting y2 <- y1 just makes another reference to the same object, so modifying y1 also modifies y2; use y2 <- y1$clone() to get an independent copy
R6::R6Class() is the only function from the package that you'll ever use!
Arguments: classname, and public = list(...) - a list of all the methods and fields the class needs
library(R6)
Accumulator <- R6Class("Accumulator", list(
sum = 0,
add = function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})
)
You construct a new object from the class by calling the new() method. In R6,
methods belong to objects, so you use $ to access new():
x <- Accumulator$new()
x$add(4)
x$sum
#> [1] 4
In this class, the fields and methods are public, which means that you can get or set the value of any field.
x$add(10)$add(10)$sum - method chaining works because add() returns invisible(self)
There are two important methods that should be defined for most
classes: $initialize() and $print(). -> $initialize() is called by $new(), so it overrides the default behaviour of $new(); $print() overrides the default printing
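A sketch of $initialize(), $print() and the reference semantics described earlier. The Student class is hypothetical, and the R6 package must be installed:

```r
library(R6)
# Student is a hypothetical class for illustration
Student <- R6Class("Student", list(
  name = NULL,
  initialize = function(name) {
    self$name <- name            # runs when Student$new() is called
  },
  print = function(...) {
    cat("Student: ", self$name, "\n", sep = "")
    invisible(self)              # print methods return self invisibly
  }
))
p1 <- Student$new("Ann")
p1                  # Student: Ann
p2 <- p1            # a second reference to the same object, not a copy
p2$name <- "Bea"
p1$name             # "Bea": p1 changed too
p3 <- p1$clone()    # an independent copy
p3$name <- "Cid"
p1$name             # still "Bea"
```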
-----------------------
Add new elements to an existing class with $set(), supplying the visibility (e.g. "public" or "private"), the name, and the component.
Accumulator <- R6Class("Accumulator")
Accumulator$set("public", "sum", 0)
Accumulator$set("public", "add", function(x = 1) {
self$sum <- self$sum + x
invisible(self)
})
Inheritance:
AccumulatorChatty <- R6Class("AccumulatorChatty",
inherit = Accumulator,
public = list(
add = function(x = 1) {
cat("Adding ", x, "\n", sep = "")
super$add(x = x)
}
)
)
# super accesses the parent class; add overrides the add of the
# parent class and uses it in its own implementation via super$add()
private = list()
- private elements can only be accessed from within the class, with private$
instead of self$
- active bindings -> advanced!
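A sketch of private fields. The BankAccount class is hypothetical: the balance can only be changed through the public methods, and reading acc$balance from outside returns NULL.

```r
library(R6)
# BankAccount is a hypothetical class for illustration
BankAccount <- R6Class("BankAccount",
  public = list(
    deposit = function(x) {
      private$balance <- private$balance + x   # private fields via private$
      invisible(self)
    },
    get_balance = function() private$balance
  ),
  private = list(
    balance = 0
  )
)
acc <- BankAccount$new()
acc$deposit(50)$deposit(25)
acc$get_balance()   # 75
acc$balance         # NULL: private fields are not visible from outside
```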
Functional programming
purrr package
purrr::map(c(1,2),function(x) x+1) -> makes a list
purrr::map_chr(chem$MATRIKEL,typeof) -> makes a vector
map_lgl -> returns a logical vector
map_int returns an integer vector, map_dbl a double vector
instead of function(x), write a formula with ~ and refer to the argument as .x:
purrr::map_dbl(c(1,2,3),~ .x+1)
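The map variants side by side (requires purrr). Each typed variant returns an atomic vector of the named type:

```r
library(purrr)
l1 <- map(c(1, 2), function(x) x + 1)        # list(2, 3)
d  <- map_dbl(c(1, 2, 3), ~ .x + 1)          # 2 3 4
i  <- map_int(c("a", "bb", "ccc"), nchar)    # 1 2 3
b  <- map_lgl(c(1, NA, 3), is.na)            # FALSE TRUE FALSE
ch <- map_chr(list(1L, "a", TRUE), typeof)   # "integer" "character" "logical"
```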
- can be used for extracting elements from (nested) lists, too:
x <- list(
list(-1, x = 1, y = c(2), z = "a"),
list(-2, x = 4, y = c(5, 6), z = "b"),
list(-3, x = 8, y = c(9, 10, 11))
)
# Select by name
map_dbl(x, "x")
#> [1] 1 4 8
# Or by position
map_dbl(x, 1)
#> [1] -1 -2 -3
plus <- function(x, y) x + y
x <- c(0, 0, 0, 0)
map_dbl(x, plus, runif(1)) # passed as an argument of map, runif(1) is evaluated
# only once and the same value is added to all elements of the vector
#> [1] 0.0625 0.0625 0.0625 0.0625
map_dbl(x, ~ plus(.x, runif(1))) # here plus is evaluated for every element, so runif is
# evaluated 4 times, too
#> [1] 0.903 0.132 0.629 0.945
the map family -> map, map2, walk, walk2, modify, pmap…
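reduce takes a list of vectors and a two-argument function, and combines the elements pairwise until a single result is left
accumulate gives a list of all intermediate results of reduce
when a function is called only for its side effects (e.g. printing welcome messages with cat()), map works in the sense that the messages are printed, but it also returns an unneeded list(NULL, NULL). If you want to call a function only for its side effects, use walk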
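reduce(), accumulate() and walk() on small inputs (requires purrr); welcome() and the people vector are illustrative:

```r
library(purrr)
total <- reduce(1:4, `+`)              # ((1 + 2) + 3) + 4 = 10
steps <- accumulate(1:4, `+`)          # 1 3 6 10
l <- list(c(1, 2, 3), c(2, 3, 4), c(3, 5))
common <- reduce(l, intersect)         # 3, the only value present in every vector
welcome <- function(x) cat("Welcome ", x, "!\n", sep = "")
people <- c("Hadley", "Jenny")
res_map  <- map(people, welcome)       # prints, but also returns list(NULL, NULL)
res_walk <- walk(people, welcome)      # prints, and returns people invisibly
```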
Regression models:
lm(response ~ predictor) - linear regression
glm - generalized linear models, e.g. logistic regression with family = binomial
- Weight ~ Age + Height
- Weight ~ . -> everything
- Weight ~ .-GPA -> exclude one variable
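The formula shorthands tried on R's built-in mtcars data, with mpg as the response in place of Weight. Counting the coefficients shows which predictors each formula includes:

```r
fit1 <- lm(mpg ~ wt + hp, data = mtcars)    # two predictors
fit2 <- lm(mpg ~ ., data = mtcars)          # all other columns (10 predictors)
fit3 <- lm(mpg ~ . - disp, data = mtcars)   # all except disp
length(coef(fit1))   # 3 (intercept + 2 slopes)
length(coef(fit2))   # 11
length(coef(fit3))   # 10
```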
model <- glm(formula = BP ~ ., data = gender[1:16,], family = binomial)
=> predict(model,gender[17:20,], type="response") -> predicted probabilities, 0<output<1
Making a confusion matrix (false positives etc.):
table(Actual_Value=test$type,Predicted_Value=res>0.5)
table(vector) -> use e.g. on Boolean vector to show how many T and F
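An end-to-end sketch of the logistic-regression workflow on mtcars, assuming its 0/1 column am (automatic/manual) as the outcome and an arbitrary row split into training and test sets in place of the gender data:

```r
train <- mtcars[1:24, ]     # mtcars ships with R; arbitrary split
test  <- mtcars[25:32, ]
model <- glm(am ~ wt, data = train, family = binomial)   # logistic regression
res <- predict(model, test, type = "response")           # probabilities in (0, 1)
cm <- table(Actual_Value = test$am, Predicted_Value = res > 0.5)
cm                    # confusion matrix
table(res > 0.5)      # how many predictions are TRUE / FALSE
```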
PCA:
- pca <- prcomp(data)
- summary(pca) -> proportion of variance explained by each principal component, to see the most important ones
- plot(pca, type = "l") -> scree plot of the component variances
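PCA on the built-in USArrests data. scale. = TRUE standardises the variables first, which matters here because they are measured in different units:

```r
pca <- prcomp(USArrests, scale. = TRUE)   # USArrests ships with R
summary(pca)            # standard deviation and proportion of variance per component
plot(pca, type = "l")   # scree plot: variance of each component
```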