Week 02
Vectors and Data Frames
Recap
Last week, we covered:
R Notebook (AKA R Markdown)
HTML knitting
Learning Outcomes for this week
You will learn
More R basics
Vectors
Data frames
External Packages
Data loading
Column(s) access
Data types conversion
NOTE: There are plenty of explanations in the Appendix for your reference. Download
handout02.Rmd now for your hands-on practice.
Data
Assignment One
What's involved?
Feeling Survey
A quick survey about how you are feeling about your learning, at menti.com with code: 3967 9303
Appendix
Basic Variable Types – Special Types
NA: Not Available
for missing values (sometimes used interchangeably with NaN)
NaN: Not a Number
for when numerical calculations generate a numerical error or when a value cannot be
calculated
NULL: denotes an undefined value (i.e. a vacuum)
Inf: for infinite values
+Inf is the same as Inf and denotes +∞
-Inf denotes −∞
Basic Variable Types – Special Types
Operations:
Assign each of (3, 0, -5, Inf, NULL, NA, NaN, 0.0001, 90.6, -Inf) to x in turn, and run is.infinite,
!is.infinite, is.finite, is.numeric, is.na and is.nan on it.
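A minimal sketch of this exercise, assuming the values are collected into one numeric vector rather than assigned one at a time. NULL is left out of the vector because c() drops it, so it is checked separately.

```r
# Special values collected into one numeric vector.
# NULL is omitted: c() drops NULL, so it cannot be stored as an element.
x <- c(3, 0, -5, Inf, NA, NaN, 0.0001, 90.6, -Inf)

inf_flags    <- is.infinite(x)  # TRUE only for Inf and -Inf
finite_flags <- is.finite(x)    # FALSE for Inf, -Inf, NA and NaN
na_flags     <- is.na(x)        # TRUE for both NA and NaN
nan_flags    <- is.nan(x)       # TRUE only for NaN

# Side-by-side view of the element-wise checks.
print(data.frame(x, inf_flags, finite_flags, na_flags, nan_flags))

is.numeric(x)   # one answer for the whole vector, not one per element
is.null(NULL)   # NULL is tested with is.null()
length(NULL)    # NULL has length 0
```

Note that is.na is TRUE for NaN as well as NA, while is.nan is TRUE only for NaN.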
Basic Variable Types – Vectors
Notice the funny “[1]” that accompanies each returned value. In R, any number that you enter
is interpreted as a vector.
Vectors:
Ordered collection of values of the same type. The “[1]” means that the index of the first
item displayed in the row is 1.
Vectors are the fundamental data structure in R. A vector is an integer-indexed one-
dimensional array.
Basic Variable Types – Vectors
Most operations applied on single values can be applied to a vector (and later matrix and data
frame).
R returns another vector with the result of applying the operation on each value of the
vector.
However, operations on vectors depend on what information is stored at each position.
Arithmetic operations:
values * 2
sqrt(values)
Round operations:
Min/Max value:
min(values)
max(values)
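The element-wise operations above could look like the following sketch. The contents of values are a hypothetical example, and round and floor are shown as two common rounding functions.

```r
# Hypothetical example vector; any numeric vector works.
values <- c(1.234, 2.5, 9.871, 16)

doubled <- values * 2         # each element multiplied by 2
roots   <- sqrt(values)       # square root of each element
rounded <- round(values, 1)   # each element rounded to one decimal place
floored <- floor(values)      # each element rounded down to a whole number

smallest <- min(values)       # a single value: the smallest element
largest  <- max(values)       # a single value: the largest element
```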
Basic Variable Types – Vectors
Comparison operations on each element – values >= 0
is.na(values)
length(values)
append(x, values)
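A short sketch of these four functions on hypothetical vectors x and values. It includes an NA to show how missing values behave in a comparison.

```r
# Hypothetical example vectors.
values <- c(4, -1, NA, 7)
x <- c(10, 20)

non_negative <- values >= 0         # TRUE FALSE NA TRUE: comparing NA gives NA
missing      <- is.na(values)       # FALSE FALSE TRUE FALSE
n            <- length(values)      # number of elements: 4
combined     <- append(x, values)   # elements of x followed by those of values
```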
Basic Variable Types – Vectors
An individual value of a vector is accessed by specifying its position in the vector
values[position]
the position of the first element of a vector in R is 1: values[1]
in other languages like Python and Java, the position of the first element is 0
Basic Variable Types – Vectors
The operations in the following 2 slides use the vector from values = seq(from = 0, to = 20,
by = 5)
We can also filter the values of a vector that match some condition, i.e. a logical expression
## [1] 10 20
## [1] 3 5
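The filtering expression itself is not shown on the slide. As a sketch, the condition below (positive multiples of 10) is an assumption that happens to reproduce the two outputs above; the original condition may have been different.

```r
values <- seq(from = 0, to = 20, by = 5)   # 0 5 10 15 20

# Assumed condition: positive multiples of 10.
keep <- values > 0 & values %% 10 == 0

selected  <- values[keep]   # the elements where the condition is TRUE: 10 20
positions <- which(keep)    # the positions of those elements: 3 5
```

Indexing with a logical vector keeps the elements where it is TRUE, while which returns their positions.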
Basic Variable Types – Vectors
We can select the first/last elements of a vector using the following commands head/tail,
respectively:
head(values, n = 3)
Selects first three elements of vector values
tail(values, n = 2)
Selects the last two elements of vector values
We can modify each element of a vector by specifying its position and using the
assignment operators = or <-
assigning -1 to position 2 : values[2] = -1
setting positions 1, 2 and 3 to -1 : values[1:3] = -1
We can also modify the elements that satisfy a condition
values[values %% 2 == 0] = values[values %% 2 == 0] * 2
tail(values, n=3)
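A sketch of these selection and modification steps, run in order on the vector from seq(from = 0, to = 20, by = 5). The intermediate name is_even is introduced here for readability only.

```r
values <- seq(from = 0, to = 20, by = 5)   # 0 5 10 15 20

first_three <- head(values, n = 3)   # 0 5 10
last_two    <- tail(values, n = 2)   # 15 20

values[2] <- -1                      # values is now 0 -1 10 15 20

# Double every even element; the same logical index is used on both sides.
is_even <- values %% 2 == 0
values[is_even] <- values[is_even] * 2   # values is now 0 -1 20 15 40

tail(values, n = 3)                  # 20 15 40
```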
Basic Variable Types – Data Frame
Supported operations (this will be similar for the matrix type):
When columns have names, they can be accessed by their name regardless of the column
position
df[, "Date"]
df[, "Price"]
Basic Variable Types – Data Frame
We can add a new column (e.g. Indicator) by specifying its name and its values
To change the column names, we need to assign a new array of column names
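A minimal sketch of adding and renaming columns. The data values and the new column names are hypothetical; only Date, Price and Indicator come from the slides.

```r
# Hypothetical data; the column names Date and Price follow the slides.
df <- data.frame(
  Date  = as.Date(c("2020-03-02", "2020-03-03", "2020-03-04")),
  Price = c(101.5, 99.8, 102.3)
)

# Add a new column by naming it and giving one value per row.
df$Indicator <- df$Price > 100

# Rename all columns by assigning a character vector of the same length.
colnames(df) <- c("TradeDate", "ClosePrice", "AboveHundred")

print(df)
```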
Basic Variable Types - Data Frame
Summary: Common tips to access a data frame
df[,1]
df[,"iid"]
df$ID
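The three access styles can be compared on a small hypothetical data frame. Here the first column is assumed to be named ID, so all three expressions return the same vector.

```r
# Hypothetical data frame whose first column is named ID.
df <- data.frame(ID = c(11, 12, 13), Score = c(7.5, 8.0, 6.5))

by_position <- df[, 1]      # first column, selected by position
by_name     <- df[, "ID"]   # same column, selected by name
by_dollar   <- df$ID        # same column, using the $ shortcut

# Selecting several columns returns a data frame rather than a vector.
two_columns <- df[, c("ID", "Score")]
```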
Extended Variable Types – xts
Base R has only basic time series support (the ts class, for regularly spaced data)
Once the xts package is loaded, we can use the xts command (from the xts library) to create
time series objects in R
a timeseries object is represented by data columns (usually numeric) and a time index
(i.e. of type Date or POSIXct)
Both the data columns and the time index need to have the same length
So it is similar to a special kind of data.frame (same length, different classes)
Reading and Plotting Time Series Data
Use the trading sample data
Operations on Time Series Data
Use square bracket filtering [ ]
Selecting a particular time period (e.g. with start and end dates)
ts["2020-03-01::2020-05-01"]
Selecting a time period from the first date to a specific date
ts["::2020-05-01"]
Selecting a time period from specific date to the last date
ts["2020-03-01::"]
Selecting a particular month
ts["2020-03"]
Equivalent to ts["2020-03-01::2020-03-31"]
Selecting a particular year
ts["2020"]
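A sketch of the filters above on a made-up daily series, assuming the xts package is installed. The placeholder values are hypothetical and do not come from the trading sample data.

```r
# Runs only if the xts package is available.
if (requireNamespace("xts", quietly = TRUE)) {
  # One placeholder value per day of 2020 (hypothetical data).
  dates  <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
  prices <- seq_along(dates)
  ts <- xts::xts(prices, order.by = dates)   # data plus a Date time index

  spring <- ts["2020-03-01::2020-05-01"]   # between two dates, inclusive
  march  <- ts["2020-03"]                  # one whole month
  early  <- ts["::2020-05-01"]             # from the first date onwards

  print(nrow(spring))   # 62 days: 31 in March, 30 in April, plus 1 May
  print(nrow(march))    # 31 days
}
```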
Creating arrays of random values
R has several tools for statistical analysis
Many others
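The slide does not list the functions it had in mind, so the following is only a sketch using three common base R generators. The seed and all parameters are arbitrary choices.

```r
set.seed(42)  # arbitrary seed so that the results are reproducible

u <- runif(5, min = 1, max = 10)             # 5 uniform values between 1 and 10
z <- rnorm(5, mean = 0, sd = 1)              # 5 values from a standard normal
s <- sample(1:6, size = 5, replace = TRUE)   # 5 simulated dice rolls

summary(u)   # quick statistical summary of the uniform sample
```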
Basic Variable Types – Factor
For categorical values (e.g. answers of a multiple choice question)
Operations:
practice
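As a starting point for the practice, here is a minimal sketch with hypothetical multiple-choice answers. The level "D" is included to show that a factor can hold categories nobody chose.

```r
# Hypothetical answers to a multiple-choice question with options A to D.
answers <- c("B", "A", "C", "B", "B")
f <- factor(answers, levels = c("A", "B", "C", "D"))

levels(f)       # all possible categories, including the unused "D"
table(f)        # number of answers per category
as.integer(f)   # internal integer codes: 2 1 3 2 2
```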
Basic Variable Types - Matrix
For storing multiple values of the same type in an N × M matrix (row × col).
can be seen as an array with an extra dimension (with rows and columns).
0 15 6 21 30
5 20 11 10 40
10 1 17 20 50
all values need to be of the same type (e.g. numeric, character).
Matrix
For storing multiple values of the same type in an N × M matrix.
For example, create a matrix where all elements have the same value:
matrix(data = 1, nrow = 2, ncol = 3)
matrix(data = 0, nrow = 4, ncol = 4)
##      [,1]
## [1,]    1
## [2,]    2
## [3,]    3
## [4,]    4
## [5,]    5
## [6,]    6
matrix(data = c(1,2,3,4,5,6), ncol = 6) creates a matrix with 6 columns and 1 row.
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    2    3    4    5    6
Matrix
We can specify all the values in a matrix.
Try:
Matrix
We can control how R uses the array data to fill the values in a matrix.
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
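The calls that produced these two outputs are not shown. A sketch that gives the same two layouts, assuming the data is 1:6 and the byrow argument of matrix controls the fill order:

```r
# Default: values fill the matrix column by column.
by_col <- matrix(data = 1:6, nrow = 3, ncol = 2)

# byrow = TRUE: values fill the matrix row by row.
by_row <- matrix(data = 1:6, nrow = 3, ncol = 2, byrow = TRUE)

print(by_col)   # first row is 1 4
print(by_row)   # first row is 1 2
```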
Matrix
To avoid unexpected results, you should always have N × M values in the data
## Warning in matrix(data = 1:19, nrow = 4, ncol = 5): data length [19] is not a
## sub-multiple or multiple of the number of rows [4]
matrix(c(1,2,3), 6, 4)
matrix(, 6, 4)
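A sketch of what happens when the data does not contain exactly N × M values. The tryCatch wrapper is added here only to record whether a warning was raised.

```r
# 19 values cannot fill a 4 x 5 matrix evenly, so R raises a warning.
warned <- tryCatch(
  {
    matrix(data = 1:19, nrow = 4, ncol = 5)
    FALSE
  },
  warning = function(w) TRUE
)

# 3 values fit evenly into 6 x 4 = 24 cells, so they are recycled silently.
recycled <- matrix(c(1, 2, 3), 6, 4)

# With no data at all, every cell is NA.
empty <- matrix(, 6, 4)
```

Silent recycling is the unexpected result to watch for: no warning appears, yet the matrix holds repeated values.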
Accessing values in a Matrix
To access the value of an entry in a matrix m, we need to specify both its row and its column.
index starts from 1.
Second column of first row: m[1, 2]
First column of second row: m[2, 1]
We can also access multiple values.
All columns of first row: m[1, ]
All rows of column 1 and 3: m[, c(1, 3) ]
First and second rows of column 2 and 3: m[c(1, 2), c(2, 3) ]
Other operations:
Number of elements: length(m)
Number of rows: nrow(m)
Number of columns: ncol(m)
In-class group discussion:
Create a matrix: m<-matrix(runif(24, 1, 10), 6, 4)
Use a loop to access the diagonal (top left corner to bottom right corner) elements in m.
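One possible approach to the discussion exercise, as a sketch rather than the only answer. Because m is not square, the loop stops at the smaller of the two dimensions; the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed so that m is reproducible
m <- matrix(runif(24, 1, 10), 6, 4)

# A 6 x 4 matrix has only 4 diagonal entries: m[1,1], m[2,2], m[3,3], m[4,4].
n_diag <- min(nrow(m), ncol(m))
diag_values <- numeric(n_diag)

for (i in seq_len(n_diag)) {
  diag_values[i] <- m[i, i]   # same row and column index
}

print(diag_values)
```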
Operations on matrices
Most operations applied on single values can be applied to a matrix.
R returns another matrix with the result of applying the operation on each value of the
matrix.
Matrix of numeric values.
arithmetic operations: m * 2 ; m + 2
rounding operations: round(m, 3); floor(m)
comparison operations: m > 0
Matrix of logical values
And: m & TRUE
Negation: !m
Exclusive Or: xor(m, FALSE)
m/2
m > 7
!(m >7)
Operations on matrices
We can also apply some of these operations on multiple matrices, provided they have the
exact same number of rows and columns.
R applies the operation element by element, i.e. to elements of the matrices on the same
position, e.g., m1[1,1]+m2[1,1]
NOTE: Such operations require the two matrices have the same dimension, that is, same
number of rows and same number of columns.
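A small sketch of element-by-element operations on two hypothetical matrices of the same dimension.

```r
# Two hypothetical 2 x 3 matrices.
m1 <- matrix(1:6, nrow = 2, ncol = 3)
m2 <- matrix(10, nrow = 2, ncol = 3)

sum_m  <- m1 + m2   # each entry is m1[i, j] + m2[i, j]
prod_m <- m1 * m2   # element-wise product, not matrix multiplication

sum_m[1, 1] == m1[1, 1] + m2[1, 1]   # TRUE
```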
Operations on matrices
Matrix multiplication ("linear algebra")
m1 %*% m2
involves the multiplication of each row of matrix m1 by each column of matrix m2.
only possible if number of columns of m1 is the same as the number of rows of m2
if m1 is an n1 × m matrix, and m2 is an m × n2 matrix, the product of m1 by m2 is an
n1 × n2 matrix.
Example: multiplication of a 2 × 2 matrix by a 2 × 1 matrix, results in a 2 × 1
matrix.
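$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \times \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a \times x + b \times y \\ c \times x + d \times y \end{bmatrix}$$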
m3=matrix(1,2,1); m1 %*% m3
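A worked instance of the 2 × 2 by 2 × 1 example. The contents of m1 are not given on the slide, so the values a = 1, b = 2, c = 3, d = 4 are assumptions.

```r
# Assumed values for m1: a = 1, b = 2, c = 3, d = 4.
m1 <- matrix(c(1, 2,
               3, 4), nrow = 2, byrow = TRUE)
m3 <- matrix(1, 2, 1)    # 2 x 1 matrix with x = 1 and y = 1

product <- m1 %*% m3     # 2 x 1 result
print(product)           # rows: 1*1 + 2*1 = 3 and 3*1 + 4*1 = 7
```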
Operations on matrices
Obtain an array with the diagonal of a matrix m.
diag(m)
Create a 3 × 3 identity matrix.
diag(3)
Transpose a matrix m.
t(m)
Add a new column to a matrix m.
cbind(m, columnValues)
Requires that columnValues is either:
a single value of the same type as the existing values in matrix m.
for example: cbind(m1, 7)
an array with a value for each row of the new column.
for example: cbind(m1, c(7, 8, 9))
Operations on matrices
Adds a new row
rbind(m, rowValues)
Requires that rowValues is either:
a single value of the same type as the existing values in matrix m.
for example: rbind(m1, 7)
an array with a value for each column of the new row.
for example: rbind(m1, c(7, 8))
Functions cbind and rbind can also be used to combine two matrices, i.e. columnValues and
rowValues can also be a matrix.
When using cbind, the matrices need to have the same number of rows.
When using rbind, the matrices need to have the same number of columns.
for example: cbind(m1, m2), cbind(m1,m4)
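A sketch of cbind and rbind, assuming m1 is a 3 × 2 matrix so that the slide's examples c(7, 8, 9) and c(7, 8) have the right lengths. The matrix m2 is hypothetical.

```r
m1 <- matrix(1:6, nrow = 3, ncol = 2)      # assumed 3 x 2 matrix

with_constant <- cbind(m1, 7)              # new column filled with 7: 3 x 3
with_column   <- cbind(m1, c(7, 8, 9))     # one value per row: 3 x 3
with_row      <- rbind(m1, c(7, 8))        # one value per column: 4 x 2

m2 <- matrix(0, nrow = 3, ncol = 4)        # hypothetical matrix with 3 rows
combined <- cbind(m1, m2)                  # same number of rows: 3 x 6
```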
Operations on matrices
We can calculate operations along rows or columns.
Given a matrix m1, we can calculate the sum of values in each:
row: rowSums(m1)
column: colSums(m1)
and the mean of values in each row or column: rowMeans(m1), colMeans(m1)
For any function that operates on arrays, we can use the function apply
apply(m1, 1, FUN = sum) is equivalent to the function rowSums (1 refers to the first
dimension of a matrix: rows).
apply(m1, 2, FUN = mean) is equivalent to the function colMeans (2 refers to the second
dimension of a matrix: columns).
other examples: apply(m1, 1, FUN = min); apply(m1, 2, FUN = max)
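A sketch comparing apply with the dedicated row and column functions on a hypothetical 2 × 3 matrix.

```r
m1 <- matrix(1:6, nrow = 2, ncol = 3)   # hypothetical matrix

row_totals <- apply(m1, 1, FUN = sum)    # one sum per row: 9 12
col_means  <- apply(m1, 2, FUN = mean)   # one mean per column: 1.5 3.5 5.5

all(row_totals == rowSums(m1))   # TRUE
all(col_means == colMeans(m1))   # TRUE
```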
Operations on matrices – Matrix inverse
The inverse of a matrix is sometimes also referred to as a reciprocal matrix.
The reciprocal of a number x is a number y such that x × y = 1
Similarly, the inverse of a matrix M is a matrix M⁻¹ such that M × M⁻¹ = I
I denotes the identity matrix (diagonal entries are 1, everything else is 0). For example:
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1
The inverse is only defined for a square matrix, i.e. a matrix with the same number of rows and
columns.
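The slides do not name the function used to invert a matrix. As a sketch, base R's solve returns the inverse when called with a single square, invertible matrix; the matrix m below is a hypothetical example.

```r
# Hypothetical invertible 2 x 2 matrix.
m <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

m_inv <- solve(m)          # inverse of m
check <- m %*% m_inv       # should equal the 2 x 2 identity matrix

# all.equal allows for tiny floating-point differences.
isTRUE(all.equal(check, diag(2)))   # TRUE
```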
Basic Variable Types – List
For when we need to save values of different types, or different columns that do not have the
same length (unlike a matrix or data frame, where all columns must have the same length).
Each entry in the list can be accessed using its position, but using double square brackets:
[[ ]].
myList[[1]]; myList[[2]]
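A minimal sketch with a hypothetical myList whose entries differ in both type and length.

```r
# Hypothetical list: entries of different types and lengths.
myList <- list(1:3, c("a", "b"), TRUE)

first  <- myList[[1]]   # the integer vector 1 2 3
second <- myList[[2]]   # the character vector "a" "b"

# Single brackets return a shorter list, not the entry itself.
still_a_list <- myList[1]
```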
Basic Variable Classes – checking & conversion
To check the class of a variable x we can use:
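The code for this slide is missing, so the following is a sketch using base R's class function and the as.* conversion functions, on a hypothetical value x.

```r
x <- "3.14"                     # hypothetical value stored as text
class(x)                        # "character"

x_num <- as.numeric(x)          # convert text to a number
class(x_num)                    # "numeric"

x_int  <- as.integer(x_num)     # drops the fractional part: 3
x_text <- as.character(x_int)   # back to text: "3"

# Text that is not a number converts to NA (with a warning, suppressed here).
bad <- suppressWarnings(as.numeric("abc"))
```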
Thanks!
Slides created via the R package xaringan.