Intr2R Week2 2020
Vectors in R
Vectors
https://youtu.be/pWNNWw-G51I
Duration: 6m58s
So far we have only looked at scalar variables, i.e. variables containing only one value. In most programming
languages such scalar variables are the basic data types. R, however, has no scalar variables: all variables are
vectors. A variable storing a number, for example, is just a numeric vector of length 1.
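A minimal sketch to see this for yourself: a variable assigned a single number behaves like any other vector. It has length 1 and can be indexed with square brackets.

```r
x <- 5            # looks like a "scalar" assignment
length(x)         # 1: x is a numeric vector of length 1
is.vector(x)      # TRUE: x is an ordinary vector
x[1]              # indexing works just as for longer vectors
```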
Defining vectors
The simplest way to create vectors in R is to use the function c:
a <- c(1, 4, 2)
a
## [1] 1 4 2
We can use the function c as well to concatenate (“stick together”) two vectors:
a <- c(1, 4, 2)
b <- c(5, 9, 13)
c <- c(a, b)
c
## [1] 1 4 2 5 9 13
Task 1 Create a vector x that contains the values x = (1, 3, 2, 5) and a vector y which contains the values
y = (1, 0). Finally, concatenate x and y and store the result as z. Print z.
Naming entries
If the entries of a vector correspond to quantities with natural names, it is a good idea to associate names
with the entries of the vector. This can be done using
names(a) <- c("first", "second", "third")
a
## first second third
## 1 4 2
Alternatively, the names can be given when the vector is created:
a <- c(first=1, second=4, third=2)
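Once names are attached, names() returns them, and a name can be used inside square brackets in place of a position. A short sketch (unname is a base R function that strips the names again):

```r
a <- c(first = 1, second = 4, third = 2)
names(a)          # "first" "second" "third"
a["second"]       # select an entry by its name rather than its position
unname(a)         # the plain numeric vector 1 4 2, without names
```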
Accessing elements
You can use square brackets to access a single element of a vector. x[i] returns the i-th element of the
vector x. Using the vector a from above,
a[3]
## third
## 2
You can use the same notation to change elements of a vector:
a[3] <- 10
a
## first second third
## 1 4 10
Subsetting vectors
You can use square brackets not only to access a single element of a vector, but also to subset a vector. There
are three ways of specifying subsets of vectors:
You can use a vector specifying the indices to be returned:
a <- c(1, 4, 9, 16)
a[c(1,2,3)]
## [1] 1 4 9
You can use a vector specifying the indices to be removed (as negative numbers):
a[-4]
## [1] 1 4 9
You can use a logical vector specifying the elements to be returned:
a[c(TRUE, TRUE, TRUE, FALSE)]
## [1] 1 4 9
We can exploit the latter when we want to subset a vector based on its values. Suppose you want to keep all
elements in a that are divisible by 2:
a[a%%2==0]
## [1] 4 16
Why does this work? a%%2==0 returns a logical vector of length 4, indicating which elements of a are
divisible by 2. These elements are then selected from a.
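It can help to look at the intermediate logical vector on its own. A minimal sketch, storing it in a helper variable keep (a name chosen just for illustration):

```r
a <- c(1, 4, 9, 16)
keep <- a %% 2 == 0   # element-wise: is the remainder after division by 2 zero?
keep                  # FALSE TRUE FALSE TRUE
a[keep]               # same as a[a %% 2 == 0]: 4 16
```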
Task 2 Consider the vector x defined as
x <- c(1, 5, 9, 3, 8)
Use all three of the above methods to extract the first, third and fifth entries.
Vectorised calculations
Vectorised calculations
https://youtu.be/dILiux93ueA
Duration: 4m50s
Vectors can be used in arithmetic expressions using the arithmetic operators and the mathematical and
statistical functions we have seen when we used R as a calculator. In this case the computations are carried
out element-wise.
For example,
a <- c(1, 2, 3, 4)
b <- c(2, 0, 1, 3)
c <- 2 * a + b
c
## [1] 4 4 7 11
The third entry of the result, 7, is obtained by taking twice the third entry of the vector a and adding the
third entry of the vector b, i.e. $c_3 = 2 \times a_3 + b_3 = 2 \times 3 + 1 = 7$.
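To make "element-wise" concrete, here is a sketch that computes the same result with an explicit loop over the entries and checks that both agree. The names c_vec and c_loop are only for illustration; in practice the vectorised form is shorter and faster.

```r
a <- c(1, 2, 3, 4)
b <- c(2, 0, 1, 3)
c_vec <- 2 * a + b                 # vectorised computation
c_loop <- numeric(length(a))       # empty result vector of the same length
for (i in seq_along(a)) {
  c_loop[i] <- 2 * a[i] + b[i]     # one entry at a time
}
c_loop                             # 4 4 7 11
identical(c_vec, c_loop)           # TRUE
```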
Recycling rules
If vectors of different length are used in an arithmetic expression, the shorter vector(s) are repeated (“recycled”)
until they match the length of the longest vector.
a <- c(1, 2, 3, 4)
b <- c(2, 0)
a * b
## [1] 2 0 6 0
R has thus “recycled” the vector b once.
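The recycling can be reproduced by hand: repeating b until it has the same length as a and then multiplying gives the same result. A minimal sketch (b_long is a hypothetical helper name):

```r
a <- c(1, 2, 3, 4)
b <- c(2, 0)
b_long <- c(b, b)             # b "recycled" by hand: 2 0 2 0
a * b_long                    # 2 0 6 0
identical(a * b, a * b_long)  # TRUE
```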
If the length of the longest vector is not a multiple of the length of the shorter vector(s), R will produce a
warning. For example,
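a <- c(1, 2, 3, 4)
b <- c(2, 0, 1)
a * b
## Warning in a * b: longer object length is not a multiple of shorter object length
## [1] 2 0 3 8
Task 3 Consider the following R code:
x <- c(1, 2, 9, 4, 5, 6)
y <- c(0, 2)
x + y
## [1] 1 4 9 6 5 8
Explain how R has calculated the result.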
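Sequences
from:to creates a sequence of numbers starting at from and going to to in steps of 1, for example
2:5
## [1] 2 3 4 5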
If the first argument is larger than the second argument, the sequence will be decreasing.
1:0
## [1] 1 0
seq(from, to, by) creates a sequence from from to to using by as increment.
seq(1, 2, by=0.2)
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
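Instead of the increment, seq can also be given the desired number of points via its length.out argument; for a decreasing sequence with by, the increment has to be negative. A short sketch:

```r
seq(1, 2, by = 0.25)          # 1.00 1.25 1.50 1.75 2.00
seq(1, 2, length.out = 5)     # the same 5 equally spaced values
seq(10, 2, by = -2)           # decreasing: 10 8 6 4 2 (by must be negative)
```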
Repeats
rep(x, times=n) repeats the vector x n times.
rep(1:3, times=3)
## [1] 1 2 3 1 2 3 1 2 3
rep(x, each=n) repeats each element of the vector x n times.
rep(1:3, each=3)
## [1] 1 1 1 2 2 2 3 3 3
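The arguments times and each can also be combined, in which case each is applied first; rep additionally accepts length.out to recycle a vector up to a given length. A sketch:

```r
rep(1:2, times = 2, each = 2)   # each element twice, then the result twice: 1 1 2 2 1 1 2 2
rep(1:3, length.out = 7)        # recycled until 7 entries: 1 2 3 1 2 3 1
```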
Task 4 Create each of the following vectors using :, seq and rep.
2 3 4 5 6
2 4 6
1.00 1.25 1.50 1.75 2.00
3 3 4 4 5 5
2 3 4 2 3 4
Matrices
Matrices
https://youtu.be/slxsFdCNfkk
Duration: 6m45s
Matrices in R
Matrices are the two-dimensional generalisation of vectors. The main difference between a vector and a
matrix is that a vector has a single index, whereas a matrix has two indices: row and column.
Internally, R stores matrices in column-major mode, i.e. the matrix
$$A = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}$$
is stored as
1 2 3 4 5 6 7 8 9
i.e. R internally stacks the columns on top of each other, which is known as “column-major mode”. If you had
stored the matrix A in a two-dimensional array in C or Java, it would be stored in what is called “row-major
mode”, i.e. the rows would be stacked on top of each other.
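The storage order can be inspected with as.vector, which drops the dimensions and returns the internal vector. A sketch using the matrix A from above; the byrow = TRUE argument of matrix fills a matrix row by row instead.

```r
A <- matrix(1:9, nrow = 3)            # filled column by column
A
as.vector(A)                          # 1 2 3 4 5 6 7 8 9: internal storage order
matrix(1:9, nrow = 3, byrow = TRUE)   # filled row by row instead
```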
Creating matrices
There are essentially three ways of creating a matrix in R.
Using the internal representation The first one consists of using the internal representation of matrices
as vectors. If we want to create the matrix
$$B = \begin{pmatrix} 0 & 2 & 9 \\ 7 & 4 & 6 \end{pmatrix}$$
we can use the function matrix, supplying the entries column by column:
B <- matrix(c(0, 7, 2, 4, 9, 6), nrow=2)
Row-wise build-up Another option is to build up the matrix row-wise using the function rbind.
B <- rbind(c(0, 2, 9),
c(7, 4, 6))
Column-wise build-up The third option consists of using rbind’s sibling cbind. cbind adds a column
to a matrix and can be used to build up a matrix column-wise. Thus we can create the matrix B using
B <- cbind(c(0, 7), c(2, 4), c(9, 6))
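A quick sanity check that the three approaches really produce the same matrix; the names B1, B2 and B3 are only for this comparison:

```r
B1 <- matrix(c(0, 7, 2, 4, 9, 6), nrow = 2)   # internal (column-major) representation
B2 <- rbind(c(0, 2, 9), c(7, 4, 6))           # row-wise build-up
B3 <- cbind(c(0, 7), c(2, 4), c(9, 6))        # column-wise build-up
identical(B1, B2)                             # TRUE
identical(B1, B3)                             # TRUE
```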
Task 5 Use all three methods from above to create the matrix
$$M = \begin{pmatrix} 9 & 2 & 4 \\ 3 & -2 & 7 \\ 4 & 8 & -1 \end{pmatrix}$$
Dimensions of a matrix
To find out the dimensions of a matrix you can use the three functions nrow, ncol and dim.
nrow(B)
## [1] 2
ncol(B)
## [1] 3
dim(B)
## [1] 2 3
The function length returns the number of entries of a matrix (2 × 3 = 6 in our case):
length(B)
## [1] 6
Diagonal matrices
Diagonal matrices have a special role in Linear Algebra and thus in Statistics. For this reason R has a
function dedicated to diagonal matrices: diag.
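E <- diag(c(1, 4, 2))
E
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 4 0
## [3,] 0 0 2
Accessing elements
The entry in the i-th row and j-th column of a matrix B can be accessed using B[i,j]. For example, we can access $B_{23}$ using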
B[2,3]
## [1] 6
Similarly, we can set $B_{23}$ to -1 using
B[2,3] <- -1
Though this is not recommended, we could also have used the internal vector representation and extracted its
sixth element:
B[6]
## [1] -1
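The position 6 follows from the column-major layout: entry $B_{ij}$ of a matrix with $n$ rows sits at position $(j-1)n + i$ of the internal vector. A small sketch checking this for $B_{23}$ (i, j and k are illustrative helper variables):

```r
B <- rbind(c(0, 2, 9),
           c(7, 4, -1))
i <- 2; j <- 3
k <- (j - 1) * nrow(B) + i   # position of B[i, j] in the internal vector
k                            # 6
B[k] == B[i, j]              # TRUE
```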
You can access arbitrary submatrices by specifying the rows and columns you wish to access. You can do so
by using any combination of the three methods used for vectors. To extract the first row and first and second
column of B you can use any of the following lines (. . . and there are many other ways of doing so):
B[1, 1:2]
## [1] 0 2
B[1, -3]
## [1] 0 2
B[c(TRUE, FALSE), c(TRUE, TRUE, FALSE)]
## [1] 0 2
Furthermore, logical expressions can be used to subset matrices in the same way as they are used to subset
vectors. Suppose you want to set all entries larger than 5 to 6.
B[B > 5] <- 6
B
## [,1] [,2] [,3]
## [1,] 0 2 6
## [2,] 6 4 -1
C%*%C.inv
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
The function solve can be used not only for inverting a matrix, but also for solving a (non-degenerate)
system of linear equations: solve(A, b) solves the system of equations $Az = b$ for $z$.
To solve the system of equations
$$\begin{aligned} 5z_2 + z_3 &= 7 \\ 7z_2 - z_3 &= 5 \\ 11z_1 + z_2 + z_3 &= 14 \end{aligned}$$
you can use
A <- rbind(c( 0, 5, 1),
c( 0, 7, -1),
c(11, 1, 1))
b <- c(7, 5, 14)
z <- solve(A, b)
z
## [1] 1 1 2
We can check the answer by computing Az, which should then be b.
A%*%z
## [,1]
## [1,] 7
## [2,] 5
## [3,] 14
You could also first compute the inverse of A and then compute $A^{-1}b$, i.e. use
z <- solve(A)%*%b
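Both routes give the same solution up to floating-point rounding. A sketch comparing them for the system above (z1 and z2 are illustrative names; as.vector turns the 3 × 1 matrix returned by %*% into a plain vector):

```r
A <- rbind(c( 0, 5,  1),
           c( 0, 7, -1),
           c(11, 1,  1))
b <- c(7, 5, 14)
z1 <- solve(A, b)              # solve Az = b directly
z2 <- solve(A) %*% b           # invert A first, then multiply
all.equal(z1, as.vector(z2))   # TRUE (equal up to numerical tolerance)
```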
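Computing the inverse explicitly is, however, usually slower and numerically less accurate than calling solve(A, b) directly. Systems of this form arise, for example, in linear regression, where the least-squares estimate $\beta$ solves the normal equations
$$\underbrace{(X^\top X)}_{=A}\,\underbrace{\beta}_{=z} = \underbrace{X^\top y}_{=b}$$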
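As a sketch, the normal equations can be solved with solve like any other linear system. The data X and y below are made up for illustration, and the result is cross-checked against lm, R's standard least-squares fitting function (which internally uses a QR decomposition rather than the normal equations).

```r
# Hypothetical example data: an intercept column and one covariate
X <- cbind(1, c(1, 2, 3, 4, 5))
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
A <- t(X) %*% X        # A = X^T X
b <- t(X) %*% y        # b = X^T y
beta <- solve(A, b)    # solves (X^T X) beta = X^T y
beta
# Cross-check against the built-in least-squares fit
coef(lm(y ~ X[, 2]))
```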
Matrix decomposition
In certain special cases the linear system of equations Az = b can be solved more easily by decomposing the
matrix A. The key idea is to rewrite the matrix A as a product of two matrices,
A = BC.
We can then solve Az = b by solving BCz = b. We can do this by first solving Bv = b for v and then
solving Cz = v for z. Then Az = BCz = Bv = b, i.e. z is indeed a solution to the linear system of equations.
The trick here is that the matrices B and C are of a form for which solving the associated system of equations
is much simpler than the original one.
Choleski decomposition Many types of decomposition are commonly used in linear algebra; here we look at
one specific type, namely the Choleski decomposition. If the matrix A is symmetric and positive-definite, then
its Choleski decomposition is
$$A = LL^\top,$$
where L is lower-triangular. The Choleski decomposition can be computed using the function chol in R. Consider
the matrix A defined by
A <- matrix(c(1,3,3,13),nrow=2)
A
## [,1] [,2]
## [1,] 1 3
## [2,] 3 13
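and its Choleski decomposition can be obtained as follows (chol returns the upper-triangular factor $L^\top$, so we transpose its result):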
L <- t(chol(A))
L
## [,1] [,2]
## [1,] 1 0
## [2,] 3 2
We can now verify that indeed $A = LL^\top$:
L%*%t(L)
## [,1] [,2]
## [1,] 1 3
## [2,] 3 13
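The two-step approach described under Matrix decomposition can be sketched with this factorisation, taking B = L and C = Lᵀ. The sketch uses the base R functions forwardsolve and backsolve for the two triangular solves; the right-hand side b is a hypothetical example chosen so that the solution is z = (1, 1).

```r
A <- matrix(c(1, 3, 3, 13), nrow = 2)
b <- c(4, 16)                 # hypothetical right-hand side
L <- t(chol(A))               # lower-triangular factor, A = L L^T
v <- forwardsolve(L, b)       # step 1: solve L v = b by forward substitution
z <- backsolve(t(L), v)       # step 2: solve L^T z = v by back substitution
z                             # 1 1
all.equal(z, solve(A, b))     # TRUE: same as solving Az = b directly
```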
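Spectral decomposition The function eigen computes the eigenvalues and eigenvectors of a matrix. For the
symmetric matrix A from above:
A.eig <- eigen(A)
A.eig
## eigen() decomposition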
## $values
## [1] 13.7082039 0.2917961
##
## $vectors
## [,1] [,2]
## [1,] 0.2297529 -0.9732490
## [2,] 0.9732490 0.2297529
We can now numerically verify the spectral decomposition $A = \Gamma\Lambda\Gamma^\top$, where $\Gamma$ is the matrix of eigenvectors
and $\Lambda$ is the diagonal matrix containing the eigenvalues.
A.eig$vectors %*% diag(A.eig$values) %*% t(A.eig$vectors)
## [,1] [,2]
## [1,] 1 3
## [2,] 3 13
which is identical to A.
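The decomposition relies on $\Gamma$ being orthogonal, i.e. $\Gamma^\top\Gamma = I$, so that $\Gamma^\top$ plays the role of $\Gamma^{-1}$. A short sketch checking this numerically (G is just a shorthand for the eigenvector matrix):

```r
A <- matrix(c(1, 3, 3, 13), nrow = 2)
A.eig <- eigen(A)
G <- A.eig$vectors
round(t(G) %*% G, 10)   # the 2 x 2 identity matrix: eigenvectors are orthonormal
```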
Solutions to the tasks
Task 1
x <- c(1,3,2,5)
y <- c(1,0)
z <- c(x,y)
z
## [1] 1 3 2 5 1 0
Task 2 We can use the following R code, but there are many other equally good answers:
x <- c(1,5,9,3,8)
x[c(1,3,5)]
## [1] 1 9 8
x[-c(2,4)]
## [1] 1 9 8
x[c(TRUE,FALSE,TRUE,FALSE,TRUE)]
## [1] 1 9 8
Task 3 The recycling rule means that R treats x+y as x+yyy where
yyy <- c(0,2,0,2,0,2)
Indeed
x <- c(1,2,9,4,5,6)
x+yyy
## [1] 1 4 9 6 5 8
Task 4
2:6
## [1] 2 3 4 5 6
seq(2,6,by=2)
## [1] 2 4 6
seq(1,2,length.out=5)
## [1] 1.00 1.25 1.50 1.75 2.00
rep(3:5,each=2)
## [1] 3 3 4 4 5 5
rep(2:4,2)
## [1] 2 3 4 2 3 4
Task 5 Using the internal representation
M <- matrix(c(9,3,4,2,-2,8,4,7,-1),ncol=3)
Using rbind
M <- rbind(c(9,2,4),c(3,-2,7),c(4,8,-1))
Using cbind
M <- cbind(c(9,3,4),c(2,-2,8),c(4,7,-1))
## [1] 9 2 4
M[1,3] <- 0
M[,3] <- M[,3] + 1