Intr2R Week2 2020


Module 2 - Structures and Operations

Vectors in R

Vectors

https://youtu.be/pWNNWw-G51I

Duration: 6m58s

So far we have only looked at scalar variables, i.e. variables containing only one value. In most programming
languages such scalar variables are the basic data types. R, however, has no scalar variables: all variables are
vectors. A variable storing a number, for example, is just a numeric vector of length 1.
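You can see this by asking R for the length of a value that looks like a scalar. This is a minimal sketch; the variable name x is only for illustration:

```r
# A single number is stored as a numeric vector of length 1
x <- 5
length(x)      # 1
is.vector(x)   # TRUE
x[1]           # indexing works as for any other vector: 5
```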

Defining vectors
The simplest way to create vectors in R is to use the function c:
a <- c(1, 4, 2)
a

## [1] 1 4 2
We can use the function c as well to concatenate (“stick together”) two vectors:
a <- c(1, 4, 2)
b <- c(5, 9, 13)
c <- c(a, b)
c

## [1] 1 4 2 5 9 13
Task 1 Create a vector x that contains the values x = (1, 3, 2, 5) and a vector y which contains the values
y = (1, 0). Finally, concatenate x and y and store the result as z. Print z.

Naming entries
If the entries of a vector correspond to quantities with natural names, it is a good idea to give names to
the entries of the vector. This can be done using names:
names(a) <- c("first", "second", "third")
a

## first second third
## 1 4 2
Alternatively, we can set the names when creating the vector using c:
a <- c(first=1, second=4, third=2)

Accessing elements
You can use square brackets to access a single element of a vector. x[i] returns the i-th element of the
vector x. Using the vector a from above,
a[3]

## third
## 2
You can use the same notation to change elements of a vector:
a[3] <- 10
a

## first second third
## 1 4 10
If you assign a value to a position beyond the last element, the vector is extended so that this position
becomes its last element. Any positions in between are filled with NA (missing values).
a[7] <- 99
a

## first second third
## 1 4 10 NA NA NA 99
If the entries of the vector are named, you can also use the names for accessing elements.
a["third"]

## third
## 10

Subsetting vectors
You can use square brackets not only to access a single element of a vector, but also to subset a vector. There
are three ways of specifying subsets of vectors:
You can use a vector specifying the indices to be returned:
a <- c(1, 4, 9, 16)
a[c(1,2,3)]

## [1] 1 4 9
You can use a vector specifying the indices to be removed (as negative numbers):
a[-4]

## [1] 1 4 9
You can use a logical vector specifying the elements to be returned:
a[c(TRUE, TRUE, TRUE, FALSE)]

## [1] 1 4 9
We can exploit the latter when we want to subset a vector based on its values. Suppose you want to keep all
elements in a that are divisible by 2:

a[a%%2==0]

## [1] 4 16
Why does this work? a%%2==0 returns a logical vector of length 4 indicating which elements of a are
divisible by 2 (%% is the modulo operator, which returns the remainder after division). These elements are then selected from a.
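You can check this by looking at the intermediate results step by step. This is a small sketch; the name keep is only for illustration:

```r
a <- c(1, 4, 9, 16)
a %% 2               # remainders after division by 2: 1 0 1 0
keep <- a %% 2 == 0  # logical vector: FALSE TRUE FALSE TRUE
a[keep]              # selects the entries where keep is TRUE: 4 16
```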
Task 2 Consider the vector x defined as
x <- c(1, 5, 9, 3, 8)

Use all three of the above methods to extract the first, third and fifth entry.

Vectorised calculations

Vectorised calculations

https://youtu.be/dILiux93ueA

Duration: 4m50s

Vectors can be used in arithmetic expressions using the arithmetic operators and the mathematical and
statistical functions we have seen when we used R as a calculator. In this case the computations are carried
out element-wise.
For example,
a <- c(1, 2, 3, 4)
b <- c(2, 0, 1, 3)
c <- 2 * a + b
c

## [1] 4 4 7 11
The third entry of the result, 7, is obtained by taking twice the third entry of the vector a and adding the
third entry of the vector b, i.e. $c_3 = 2 \times a_3 + b_3 = 2 \times 3 + 1 = 7$.
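Mathematical functions work element-wise in the same way. Here is a short sketch using sqrt and arithmetic on an illustrative vector:

```r
a <- c(1, 4, 9, 16)
sqrt(a)    # square root of each entry: 1 2 3 4
a^2 - 1    # each entry squared, then 1 subtracted: 0 15 80 255
```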

Recycling rules
If vectors of different length are used in an arithmetic expression, the shorter vector(s) are repeated (“recycled”)
until they match the length of the longest vector.
a <- c(1, 2, 3, 4)
b <- c(2, 0)
a * b

## [1] 2 0 6 0
R has thus “recycled” the vector b once.
If the length of the longest vector is not a multiple of the length of the shorter vector(s), R will produce a
warning. For example,

a <- c(1, 2, 3, 4)
b <- c(2, 0, 1)
a * b

## Warning in a * b: longer object length is not a multiple of shorter object
## length
## [1] 2 0 3 8
The vector b (3 elements) is shorter than the vector a (4 elements). Recycling b once would give 6 elements,
more than the 4 that are needed, so R uses only the first element of the second copy of b and produces a warning.
Such a warning is almost always a sign that you have made a mistake.
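To make the recycling explicit, you can build the recycled vector yourself. This sketch uses base R's rep_len (not mentioned above; the name b.recycled is illustrative) to stretch b to the length of a, which reproduces the result of a * b without the warning:

```r
a <- c(1, 2, 3, 4)
b <- c(2, 0, 1)
# rep_len() repeats b until it has exactly length(a) elements
b.recycled <- rep_len(b, length(a))
b.recycled        # 2 0 1 2
a * b.recycled    # 2 0 3 8: same values as a * b, but no warning
```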
Task 3 Consider the following two vectors x and y and their sum
x <- c(1, 2, 9, 4, 5, 6)
y <- c(0, 2)
x+y

## [1] 1 4 9 6 5 8
Explain how R has calculated the result.

Useful functions for vectors


numeric(n) creates a numeric vector of length n (containing 0’s).
length(x) returns the length (number of elements) of the vector x.
unique(x) returns the unique elements of x.
rev(x) reverses the vector x, i.e. returns $(x_n, \ldots, x_1)$.
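A short sketch applying these four functions to an illustrative vector:

```r
x <- c(3, 1, 3, 2)
numeric(3)   # 0 0 0
length(x)    # 4
unique(x)    # 3 1 2 (first occurrences, in their original order)
rev(x)       # 2 3 1 3
```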

Sequences and patterned vectors


Sequences
R has built-in functions to create simple sequences and patterned vectors:
The operator : can be used for creating basic sequences.
2:5

## [1] 2 3 4 5
If the first argument is larger than the second argument, the sequence will be decreasing.
1:0

## [1] 1 0
seq(from, to, by) creates a sequence from from to to using by as increment.
seq(1, 2, by=0.2)

## [1] 1.0 1.2 1.4 1.6 1.8 2.0


seq(from, to, length.out=n) creates a sequence of length n from from to to.
seq(3, 5, length.out=5)

## [1] 3.0 3.5 4.0 4.5 5.0
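The two forms of seq are related. When the step evenly divides the range, a sequence from `from` to `to` in steps of `by` has (to − from)/by + 1 elements. A small check (the names s1 and s2 are illustrative):

```r
s1 <- seq(1, 2, by = 0.25)        # step size given
s2 <- seq(1, 2, length.out = 5)   # number of elements given: (2 - 1)/0.25 + 1 = 5
s1                                # 1.00 1.25 1.50 1.75 2.00
all.equal(s1, s2)                 # TRUE
```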

Repeats
rep(x, times=n) repeats the vector x n times.
rep(1:3, times=3)

## [1] 1 2 3 1 2 3 1 2 3
rep(x, each=n) repeats each element of the vector x n times.
rep(1:3, each=3)

## [1] 1 1 1 2 2 2 3 3 3
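The arguments times and each can also be combined in one call. As far as we know, each is applied first and the result is then repeated times times. A small sketch:

```r
# each = 2 gives 1 1 2 2; times = 2 then repeats that whole block twice
v <- rep(1:2, each = 2, times = 2)
v   # 1 1 2 2 1 1 2 2
```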
Task 4 Create each of the following vectors using :, seq and rep.
2 3 4 5 6
2 4 6
1.00 1.25 1.50 1.75 2.00
3 3 4 4 5 5
2 3 4 2 3 4

Matrices

Matrices

https://youtu.be/slxsFdCNfkk

Duration: 6m45s

Matrices in R
Matrices are the two-dimensional generalisation of vectors. The main difference between a vector and a
matrix is that a vector has a single index, whereas a matrix has two indices: row and column.
Internally, R stores matrices in column-major mode, i.e. the matrix

$$A = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}$$

is stored as
1 2 3 4 5 6 7 8 9
i.e. R internally stacks the columns on top of each other, which is known as “column-major mode”. If you had
stored the matrix A in a two-dimensional array in C or Java, it would be stored in what is called “row-major
mode”, i.e. the rows would be stacked on top of each other.
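You can inspect the column-major layout directly. This sketch fills the matrix A from above with matrix, which fills column by column, and then looks at the underlying vector with as.vector:

```r
A <- matrix(1:9, nrow = 3)   # fills column by column
A
as.vector(A)                 # the internal representation: 1 2 3 4 5 6 7 8 9
A[2, 3]                      # row 2, column 3: 8
A[8]                         # 8th element of the internal vector, the same entry: 8
```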

Creating matrices
There are essentially three ways of creating a matrix in R.

Using the internal representation The first one consists of using the internal representation of matrices
as vectors. If we want to create a matrix
$$B = \begin{pmatrix} 0 & 2 & 9 \\ 7 & 4 & 6 \end{pmatrix}$$
we can use the command matrix.
B <- matrix(c(0, 7, 2, 4, 9, 6), nrow=2)

Alternatively, you could specify the number of columns using ncol=3.


The function matrix can also be used to create “empty” matrices. matrix(a, nrow, ncol) creates a
nrow×ncol matrix in which every entry is set to a.
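A brief sketch of both points: a matrix filled with a constant, and the fact that giving nrow=2 or ncol=3 produces the same matrix B. The names Z, B1 and B2 are illustrative:

```r
Z <- matrix(0, nrow = 2, ncol = 3)   # a 2 x 3 matrix of zeros
dim(Z)                               # 2 3
# Specifying ncol = 3 instead of nrow = 2 gives the same matrix B
B1 <- matrix(c(0, 7, 2, 4, 9, 6), nrow = 2)
B2 <- matrix(c(0, 7, 2, 4, 9, 6), ncol = 3)
identical(B1, B2)                    # TRUE
```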

Row-wise build-up Another option is to build up the matrix row-wise using the function rbind.
B <- rbind(c(0, 2, 9),
c(7, 4, 6))

We can also use rbind to add a row to an existing matrix.


rbind(B, c(1, 2, 9))

## [,1] [,2] [,3]
## [1,] 0 2 9
## [2,] 7 4 6
## [3,] 1 2 9
If a vector given to rbind is shorter than the other rows, it is “recycled” using the same rules as used for
vector arithmetic. For example, to add a row of 0’s to the matrix B we can use
rbind(B, 0)

## [,1] [,2] [,3]
## [1,] 0 2 9
## [2,] 7 4 6
## [3,] 0 0 0

Column-wise build-up The third option consists of using rbind’s sibling cbind. cbind adds a column
to a matrix and can be used to build up a matrix column-wise. Thus we can create the matrix B using
B <- cbind(c(0, 7), c(2, 4), c(9, 6))
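As a sanity check, the three construction methods can be compared with identical. This is a sketch with illustrative variable names. Because the arguments to rbind and cbind are unnamed expressions, none of the results should carry row or column names:

```r
B.matrix <- matrix(c(0, 7, 2, 4, 9, 6), nrow = 2)
B.rbind  <- rbind(c(0, 2, 9), c(7, 4, 6))
B.cbind  <- cbind(c(0, 7), c(2, 4), c(9, 6))
identical(B.matrix, B.rbind)   # TRUE
identical(B.rbind, B.cbind)    # TRUE
```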

Task 5 Use all three methods from above to create the matrix

$$M = \begin{pmatrix} 9 & 2 & 4 \\ 3 & -2 & 7 \\ 4 & 8 & -1 \end{pmatrix}$$

Dimensions of a matrix
To find out the dimensions of a matrix you can use the three functions nrow, ncol and dim.
nrow(B)

## [1] 2
ncol(B)

## [1] 3

dim(B)

## [1] 2 3
The function length returns the number of entries of a matrix (2 × 3 = 6 in our case)
length(B)

## [1] 6

Diagonal matrices
Diagonal matrices have a special role in Linear Algebra and thus in Statistics. For this reason R has a
function dedicated to diagonal matrices: diag.
E <- diag(c(1, 4, 2))
E

## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 4 0
## [3,] 0 0 2
diag can also be used to access the diagonal of an existing matrix. The matrix does not even need to be
diagonal for this. You can change the second element of the diagonal of the matrix E to 5 using
diag(E)[2] <- 5
E

## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 5 0
## [3,] 0 0 2
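Two further uses of diag, sketched here: extracting the diagonal of a matrix that is not diagonal, and passing a single number n, which returns the n × n identity matrix. The name M9 is illustrative:

```r
M9 <- matrix(1:9, nrow = 3)
diag(M9)    # diagonal of a non-diagonal matrix: 1 5 9
diag(3)     # a single number n gives the n x n identity matrix
```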
Task 6 Create in R the diagonal matrix

$$\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 7 & 0 & 0 \\ 0 & 0 & -9 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}$$

Naming rows and columns


When working with data matrices it is a good idea to name at least the columns of a matrix. It is much
better to refer to variables in a matrix by their name rather than the column index in which they are stored.
Rows and columns can be named using the functions rownames and colnames.
colnames(B) <- c("First column", "Second column", "Third column")
rownames(B) <- c("First row", "Second row")
B

## First column Second column Third column
## First row 0 2 9
## Second row 7 4 6
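Once rows and columns are named, the names can be used inside square brackets in place of indices. This sketch rebuilds the named matrix B from above so that it runs on its own:

```r
B <- rbind(c(0, 2, 9), c(7, 4, 6))
colnames(B) <- c("First column", "Second column", "Third column")
rownames(B) <- c("First row", "Second row")
B["Second row", "Third column"]   # 6, the same entry as B[2, 3]
B[, "First column"]               # the whole first column, labelled by row name
```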

Accessing elements and submatrices


Single entries of matrices can be accessed using square brackets. To extract $B_{23}$, i.e. the value in the second
row and third column of B, we can use

B[2,3]

## [1] 6
Similarly, we can set $B_{23}$ to -1 using
B[2,3] <- -1

Though this is not recommended, we could also have used the internal vector representation and extracted the
sixth element of the internal representation:
B[6]

## [1] -1
You can access arbitrary submatrices by specifying the rows and columns you wish to access. You can do so
by using any combination of the three methods used for vectors. To extract the first row and first and second
column of B you can use any of the following lines (. . . and there are many other ways of doing so):
B[1, 1:2]

## First column Second column
## 0 2
B[-2, 1:2]

## First column Second column
## 0 2
B[c(TRUE, FALSE), -3]

## First column Second column
## 0 2
Each line returns the vector (0, 2).
Because the output object has only one row, it is returned as a vector (instead of a matrix). In larger
programs that expect the result to be a matrix, this can cause problems. In this
case you can use the additional argument drop=FALSE:
B[1, 1:2, drop=FALSE]

## First column Second column
## First row 0 2
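The difference can be checked with is.matrix and dim. This is a sketch on an unnamed copy of B; the names row.vec and row.mat are illustrative:

```r
B <- rbind(c(0, 2, 9), c(7, 4, 6))
row.vec <- B[1, 1:2]                # dimension dropped: a plain vector
row.mat <- B[1, 1:2, drop = FALSE]  # dimension kept: a 1 x 2 matrix
is.matrix(row.vec)   # FALSE
is.matrix(row.mat)   # TRUE
dim(row.mat)         # 1 2
```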
If we only want to subset rows (or only columns), the selector for the other dimension can be left empty. To access
the first row of the matrix B use
B[1,]

## First column Second column Third column
## 0 2 9
To access the third column of B use
B[, 3]

## First row Second row
## 9 -1
To replace the third column of B with the numbers (1, 8) you can use
B[, 3] <- c(1, 8)

Furthermore, logical expressions can be used to subset matrices in the same way as they are used to subset
vectors. Suppose you want to set all entries larger than 5 to 6.
B[B > 5] <- 6
B

## First column Second column Third column
## First row 0 2 1
## Second row 6 4 6
Task 7 Using the matrix M from Task 5:
• extract the first row,
• set the top-right entry to 0,
• add 1 to the last column.

Matrix multiplication and linear algebra


Matrix multiplication
The basic arithmetic operators can be applied to matrices in the same way as they can be applied to vectors.
They are interpreted element-wise. Most importantly, * performs element-wise multiplication. In order to
perform matrix multiplication you need to use the operator %*%.
To compute the matrix product

$$\begin{pmatrix} 1 & -1 \\ 0 & 1 \\ -1 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 & 2 & 9 \\ 7 & 4 & -1 \end{pmatrix}$$

in R, you can use:
A <- matrix(c(1, 0, -1, -1, 1, 0), ncol=2)
B <- matrix(c(0, 7, 2, 4, 9, -1), ncol=3)
A %*% B

## [,1] [,2] [,3]
## [1,] -7 -2 10
## [2,] 7 4 -1
## [3,] 0 -2 -9
Note that matrix multiplication is not commutative in general, so A %*% B is usually not the same as B %*% A.
In this example the two products do not even have the same dimensions: A %*% B is 3 × 3, whereas B %*% A is 2 × 2.
The transpose $A^\top$ of a matrix A can be computed using the function t(A). The cross product $A^\top A$ can be
computed using the function crossprod(A).
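A short sketch using the A and B from above. It contrasts the dimensions of the two products, checks crossprod against t(A) %*% A, and shows that * stays element-wise:

```r
A <- matrix(c(1, 0, -1, -1, 1, 0), ncol = 2)   # 3 x 2
B <- matrix(c(0, 7, 2, 4, 9, -1), ncol = 3)    # 2 x 3
dim(A %*% B)                          # 3 3
dim(B %*% A)                          # 2 2, so the two products cannot be equal
all.equal(crossprod(A), t(A) %*% A)   # TRUE
A * A                                 # element-wise: squares each entry, still 3 x 2
```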

Matrix inverse and linear systems of equations


The function solve computes the matrix inverse. For example,
C <- B%*%t(B) # Create a square matrix C=BB'
C.inv <- solve(C) # Compute the inverse of C
C.inv%*%C # Check whether C.inv is indeed the inverse

## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
C%*%C.inv

## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
The function solve can be used not only for inverting a matrix, but also for solving a non-degenerate
system of linear equations (i.e. one whose coefficient matrix is square and invertible). solve(A, b) solves the system of equations Az = b for z.
To solve the system of equations

$$\begin{aligned} 5z_2 + z_3 &= 7 \\ 7z_2 - z_3 &= 5 \\ 11z_1 + z_2 + z_3 &= 14 \end{aligned}$$

you can use
A <- rbind(c( 0, 5, 1),
c( 0, 7, -1),
c(11, 1, 1))
b <- c(7, 5, 14)
z <- solve(A, b)
z

## [1] 1 1 2
We can check the answer by computing Az, which should then be b.
A%*%z

## [,1]
## [1,] 7
## [2,] 5
## [3,] 14
You could as well first compute the inverse of A and then compute $A^{-1}b$, i.e. use
z <- solve(A)%*%b

but this is slightly less efficient than the above code.


Linear systems of equations play a key role in Data Analytics and Statistics. For example, you will learn that
the least-squares estimate of the coefficients in a linear regression model can be found by

$$\hat{\beta} = (X^\top X)^{-1} X^\top y,$$

which we can rewrite as solving the system of linear equations

$$\underbrace{(X^\top X)}_{=A}\,\underbrace{\beta}_{=z} = \underbrace{X^\top y}_{=b}.$$
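A minimal sketch of this idea, using made-up toy data that lie exactly on the line y = 1 + 2x, so the expected coefficients are known. The design matrix X has a column of ones for the intercept. Instead of forming the inverse, we solve the system with crossprod and solve:

```r
# Toy data (made up for illustration): y lies exactly on y = 1 + 2x
x <- c(1, 2, 3, 4)
y <- 1 + 2 * x
X <- cbind(1, x)             # design matrix: intercept column and x
A <- crossprod(X)            # X^T X
b <- crossprod(X, y)         # X^T y
beta.hat <- solve(A, b)      # solves (X^T X) beta = X^T y
beta.hat                     # approximately 1 (intercept) and 2 (slope)
```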

Matrix decomposition
In certain special cases the linear system of equations Az = b can be solved more easily by decomposing the
matrix A. The key idea is to rewrite the matrix A as a product of two matrices,

A = BC.

We can then solve Az = b by solving BCz = b. We can do this by first solving Bv = b for v and then
solving Cz = v for z. Then Az = BCz = Bv = b, i.e. z is indeed a solution to the linear system of equations.
The trick here is that the matrices B and C are of a form for which solving the associated system of equations
is much simpler than the original one.

Choleski decomposition We look here at one specific type of decomposition, the Choleski decomposition,
although many others are commonly used in linear algebra. If the matrix A is symmetric and
positive-definite, then its Choleski decomposition can be computed as

$$A = LL^\top,$$

where L is lower-triangular. The Choleski decomposition can be computed using the function chol in R. Note that
chol returns the upper-triangular factor $L^\top$, which is why we transpose its result below. Let’s
consider the matrix A defined as
A <- matrix(c(1,3,3,13),nrow=2)
A

## [,1] [,2]
## [1,] 1 3
## [2,] 3 13
and its Choleski decomposition is
L <- t(chol(A))
L

## [,1] [,2]
## [1,] 1 0
## [2,] 3 2
We can now verify that indeed $A = LL^\top$:
L%*%t(L)

## [,1] [,2]
## [1,] 1 3
## [2,] 3 13
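The two-step idea from the matrix decomposition paragraph can be sketched with this factorisation. Our choice here is base R's forwardsolve and backsolve, which solve lower- and upper-triangular systems respectively. The right-hand side b is an arbitrary illustrative vector:

```r
A <- matrix(c(1, 3, 3, 13), nrow = 2)
b <- c(1, 2)                   # arbitrary right-hand side, for illustration
R <- chol(A)                   # upper-triangular factor with A = t(R) %*% R, so L = t(R)
v <- forwardsolve(t(R), b)     # step 1: solve L v = b (L lower-triangular)
z <- backsolve(R, v)           # step 2: solve t(L) z = v (t(L) upper-triangular)
z                              # 1.75 -0.25
all.equal(z, solve(A, b))      # TRUE
```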

Eigenvectors and eigenvalues


The function eigen can be used to compute the eigenvectors and eigenvalues of a square matrix. For example,
A.eig <- eigen(A)
A.eig

## eigen() decomposition
## $values
## [1] 13.7082039 0.2917961
##
## $vectors
## [,1] [,2]
## [1,] 0.2297529 -0.9732490
## [2,] 0.9732490 0.2297529
Because A is symmetric, we can now numerically verify the spectral decomposition $A = \Gamma \Lambda \Gamma^\top$, where Γ is the matrix of eigenvectors
and Λ is the diagonal matrix containing the eigenvalues.
A.eig$vectors %*% diag(A.eig$values) %*% t(A.eig$vectors)

## [,1] [,2]
## [1,] 1 3
## [2,] 3 13
which agrees with A (up to floating-point rounding).
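Another numerical check, sketched below, uses the defining property $Av = \lambda v$ for the first eigenpair. For this 2 × 2 matrix the eigenvalues solve $\lambda^2 - 14\lambda + 4 = 0$ (trace 14, determinant 4), i.e. $\lambda = 7 \pm \sqrt{45}$. The names v1 and lambda1 are illustrative:

```r
A <- matrix(c(1, 3, 3, 13), nrow = 2)
A.eig <- eigen(A)
v1 <- A.eig$vectors[, 1]      # first eigenvector
lambda1 <- A.eig$values[1]    # corresponding eigenvalue
all.equal(as.vector(A %*% v1), lambda1 * v1)              # TRUE: A v = lambda v
all.equal(A.eig$values, c(7 + sqrt(45), 7 - sqrt(45)))    # TRUE
```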

Solutions to the tasks
Task 1
x <- c(1,3,2,5)
y <- c(1,0)
z <- c(x,y)
z

## [1] 1 3 2 5 1 0
Task 2 We can use the following R code, but there are many other equally good answers
x <- c(1,5,9,3,8)
x[c(1,3,5)]

## [1] 1 9 8
x[-c(2,4)]

## [1] 1 9 8
x[c(TRUE,FALSE,TRUE,FALSE,TRUE)]

## [1] 1 9 8
Task 3 The recycling rule means that R treats x+y as x+yyy where
yyy <- c(0,2,0,2,0,2)

Indeed
x <- c(1,2,9,4,5,6)
x+yyy

## [1] 1 4 9 6 5 8
Task 4
2:6

## [1] 2 3 4 5 6
seq(2,6,by=2)

## [1] 2 4 6
seq(1,2,length.out=5)

## [1] 1.00 1.25 1.50 1.75 2.00


rep(3:5,each=2)

## [1] 3 3 4 4 5 5
rep(2:4,2)

## [1] 2 3 4 2 3 4
Task 5 Using the internal representation
M <- matrix(c(9,3,4,2,-2,8,4,7,-1),ncol=3)

Using rbind
M <- rbind(c(9,2,4),c(3,-2,7),c(4,8,-1))

Using cbind
M <- cbind(c(9,3,4),c(2,-2,8),c(4,7,-1))

Task 6 You can use the following R code


diag(c(4,7,-9,4))

## [,1] [,2] [,3] [,4]
## [1,] 4 0 0 0
## [2,] 0 7 0 0
## [3,] 0 0 -9 0
## [4,] 0 0 0 4
Task 7 You can use the following R code
M[1,]

## [1] 9 2 4
M[1,3] <- 0
M[,3] <- M[,3] + 1
