Week 02

This document provides an overview of vectors, data frames, and time series data in R for finance applications. It discusses learning outcomes which include manipulating and plotting data frames. The appendix provides examples and explanations of basic R data types like vectors and data frames, as well as special data types like NA, NaN, and Inf. It also covers accessing and subsetting data frames, and creating time series objects using the xts package to represent financial time series data.

Applications of R for Finance

Imperial College Business School

Week 2
Vectors and Data Frames

Liam (Jianliang) Gao & Jay (James) DesLauriers


j.gao and j.deslauriers AT imperial.ac.uk
2023/09 (updated: 2023-09-07)

Recap
Last week, we covered:
R Notebook (AKA R Markdown)

HTML knitting

Basic programming and text editing in RStudio

Basic data types (numeric, character and Date)

Learning Outcomes for this week
You will learn

More R basics

Vectors
Data frames
External Packages

Manipulation of data frames.

Data loading
Column(s) access
Data types conversion

NOTE: There are plenty of explanations in the Appendix for your reference. Download
handout02.Rmd now for your hands-on practice.

Data

Assignment One
What's involved?

Loading data from a .csv file into a data frame


Converting the type of whole columns of data
Plotting data
Creating new columns in data frames
Merging two data frames*
Performing multiple linear regression*
Formatting text in Markdown
Rendering the Notebook to PDF

* You will learn these in week 3

Feeling Survey
A quick survey about your learning and how you are feeling, at menti.com with code: 3967 9303

Appendix

Basic Variable Types – Special Types
NA: Not Available
for missing values (sometimes loosely used interchangeably with NaN, although R treats the two differently)
NaN: Not a Number
for when a numerical calculation generates a numerical error or when a value cannot be
calculated (e.g. 0/0)
NULL: denotes an undefined value (i.e. a vacuum: no object at all)
Inf: for infinite values
+Inf is the same as Inf and denotes +∞
-Inf denotes −∞

Basic Variable Types – Special Types
Operations:

is.nan(x): checks whether x is NaN

is.na(x): checks whether x is NA

if x is NaN, is.na(x) also returns TRUE (the reverse does not hold: is.nan(NA) returns FALSE)

is.null(x): checks whether x is NULL

is.infinite(x): checks whether x is Inf or -Inf

In-class practice example:

assign each of (3, 0, -5, Inf, NULL, NA, NaN, 0.0001, 90.6, -Inf) to x, and run is.infinite,
!is.infinite, is.finite, is.numeric, is.na, is.nan respectively.
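
A minimal sketch of the practice above, run on a whole vector at once rather than one value at a time. The vector x is an illustrative choice; NULL is left out of it because c() silently drops NULL elements.

```r
# Illustrative vector mixing ordinary numbers with the special values
x <- c(3, 0, -5, Inf, NA, NaN, 0.0001, 90.6, -Inf)

is.infinite(x)   # TRUE only for Inf and -Inf
is.finite(x)     # FALSE for Inf, -Inf, NA and NaN
is.na(x)         # TRUE for both NA and NaN
is.nan(x)        # TRUE only for NaN
is.numeric(x)    # TRUE: the vector as a whole is numeric

# NULL is the absence of a value, so it does not occupy a slot in a vector
is.null(NULL)            # TRUE
length(c(1, NULL, 2))    # 2
```

Note that is.finite is not simply the negation of is.infinite: NA and NaN are neither finite nor infinite.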

Basic Variable Types – Vectors
Notice the funny “[1]” that accompanies each returned value. In R, any number that you enter
is interpreted as a vector.

Vectors:

Ordered collection of values of the same type. The “[1]” means that the index of the first
item displayed in the row is 1.
Vectors are the fundamental data structure in R. A vector is an integer-indexed one-dimensional array.

Vector of numeric values (N x 1 vector)

Natural numbers from 1 to 20: values = 1:20


Numbers from 0 to 20, in steps of 5: values = seq(from = 0, to = 20, by = 5)
0 5 10 15 20
Vector with 10 zeros: values = rep(0, times = 10)

Vector with character values: values = c("Imperial", "College", "London")

Imperial College London

Basic Variable Types – Vectors
Most operations applied on single values can be applied to a vector (and later matrix and data
frame).

R returns another vector with the result of applying the operation on each value of the
vector.
However, operations on vectors depend on what information is stored in each position.

Arithmetic operations:

values * 2
sqrt(values)

Round operations:

round(values, 2); ceiling(values); floor(values)

Min/Max value:

min(values)
max(values)

Basic Variable Types – Vectors
Comparison operations on each element – values >= 0

is.na(values)

Comparison operations on all elements

all values need to satisfy condition: all(values >= 0)


at least one value needs to satisfy condition: any(values >= 0)

Check how many elements a vector has:

length(values)

Append a value x to an existing vector (append returns a new vector, so assign the result back)

values = append(values, x)
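
A short sketch of the checks above, assuming the vector values = seq(from = 0, to = 20, by = 5) used elsewhere in these slides. The appended value 25 is an arbitrary illustration.

```r
values <- seq(from = 0, to = 20, by = 5)   # 0 5 10 15 20

values >= 0          # element-wise comparison: one TRUE/FALSE per element
all(values >= 0)     # TRUE: every element satisfies the condition
any(values > 15)     # TRUE: at least one element does
length(values)       # 5

# append() does not modify its argument; keep the result to see the change
values <- append(values, 25)
length(values)       # 6
```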

Basic Variable Types – Vectors
An individual value of a vector is accessed by specifying its position in the vector
values[position]
the position of the first element of a vector in R is 1: values[1]

in other languages like Python and Java, the position of the first element is 0

We can also access multiple values of the same vector


values[1:3]; values[c(1, 2, 3)]

Basic Variable Types – Vectors
The operations in the following 2 slides use the vector from values = seq(from = 0, to = 20,
by = 5)

We can also filter the values of a vector that match some condition, i.e. a logical expression

values[values > 0 & values %% 2 == 0]


Selects all values in the vector that are positive and multiples of 2

values <- seq(from = 0, to = 20, by = 5)


values[values > 0 & values %% 2 == 0]

## [1] 10 20

Obtain positions of elements that match a logical expression

which(values > 0 & values %% 2 == 0)

## [1] 3 5

Basic Variable Types – Vectors
We can select the first/last elements of a vector using the following commands head/tail,
respectively:

head(values, n = 3)
Selects first three elements of vector values
tail(values, n = 2)
Selects the last two elements of vector values
We can modify each element of a vector by specifying its position and using the
assignment operators = or <-
assigning -1 to position 2 : values[2] = -1
setting positions 1, 2 and 3 to -1 : values[1:3] = -1
We can also modify the elements that satisfy a condition

Multiply all even numbers by 2

index = which(values %% 2 == 0); values[index] = values[index] * 2

or

values[values %% 2 == 0] = values[values %% 2 == 0] * 2

tail(values, n=3)
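
The modification steps above can be traced on the same starting vector. This is a sketch only; the intermediate values in the comments follow from values = seq(from = 0, to = 20, by = 5).

```r
values <- seq(from = 0, to = 20, by = 5)   # 0 5 10 15 20

values[2] <- -1                            # 0 -1 10 15 20

# Double every even element using a logical index
is_even <- values %% 2 == 0
values[is_even] <- values[is_even] * 2     # 0 -1 20 15 40

head(values, n = 3)                        # 0 -1 20
tail(values, n = 3)                        # 20 15 40
```

The logical index is computed once and reused on both sides of the assignment, which avoids repeating the condition.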

Basic Variable Types – Data Frame
Supported operations (this will be similar for the matrix type):

All columns of first row: df[1, ]


All rows of column 1 and 3: df[ , c(1, 3) ]
Number of rows: nrow(df)
Number of columns: ncol(df)
Dimension (rows & cols): dim(df)

When columns have names, they can be accessed by their name regardless of the column
position

df[, "Date"]
df[, "Price"]

The $ sign can also be used to access a particular column

for example: df$Date or df$Price

Basic Variable Types – Data Frame
We can add a new column (e.g. Indicator) by specifying its name and its values

df[, "Indicator"] = df[, "Price"] * 1.1 OR df$Indicator = df$Price * 1.1


New column needs to have the same number of values as existing ones
Special case: assign single value to all rows of new column
df[, "Indicator"] = 1

Column names are accessed using the command names

names(df) #returns c("Date", "Price", "Flag")

To change the column names, we need to assign a new vector of column names

names(df) = c("Date", "Price", "RiskFlag") or colnames(df) <- c("Date", "Price", "RiskFlag")
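
A small worked sketch of the data frame operations on these two slides. The data frame below is made up for illustration (three rows with the Date, Price and Flag columns named above); it is not the course dataset.

```r
# Hypothetical three-row data frame, for illustration only
df <- data.frame(
  Date  = as.Date(c("2023-09-01", "2023-09-04", "2023-09-05")),
  Price = c(100, 102.5, 101),
  Flag  = c(FALSE, TRUE, FALSE)
)

df[1, ]            # all columns of the first row
df[, c(1, 3)]      # all rows of columns 1 and 3
dim(df)            # 3 rows, 3 columns
df$Price           # one column, accessed by name

# Add a column derived from an existing one
df$Indicator <- df$Price * 1.1

# Rename the third column only, leaving the others untouched
names(df)[3] <- "RiskFlag"
names(df)
```

Renaming by position, names(df)[3], avoids retyping every column name when only one changes.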

Basic Variable Types - Data Frame
Summary: Common tips to access a data frame

df = trading_data[1:8, ] # access to the rows 1 to 8


df
# single column access of data frame
df[1]
df["iid"]

df[,1]
df[,"iid"]
df$iid

# multiple columns access


# use column index or column name
df[1:3]
df[,1:3]
df[c(1,2,3)] # or df[ , c(1,2,3)]
df[c("iid", "conm","datadate")] # or df[ , c("iid", "conm","datadate")]

Extended Variable Types – xts
Base R has only limited timeseries support (its built-in ts class assumes regularly spaced observations)

We need to install and load the xts package to work with date-indexed timeseries

install.packages("xts")
library(xts)

Once the xts package is loaded, we can use the xts command (from the xts library) to create
timeseries objects in R

a timeseries object is represented by data columns (usually numeric) and a time index
(i.e. of type Date or POSIXct)
Both the data columns and the time index need to have the same length
So it is similar to a special kind of data.frame (same length, diff classes)

Reading and Plotting Time Series Data
Use the trading sample data

Think: Is it a time series dataset?

Now create the timeseries representation

ts = xts(trading_data[, 6], order.by = trading_data[, 3])


first argument specifies data columns (NOTE: can be more than one)
order.by argument specifies time values

ts = xts(trading_data[, 6], order.by = trading_data[, 3])

Running this as-is raises an error because the dates are not yet a time-based object


We need to convert dates (in text) to a time-based object

trading_data$datadate = as.Date(as.character(trading_data$datadate), format = "%Y%m%d")


ts = xts(trading_data[, 6], order.by = trading_data[, 3])
plot(ts)

Operations on Time Series Data
Use square bracket filtering [ ]
Selecting a particular time period (e.g. with start and end dates)
ts["2020-03-01::2020-05-01"]
Selecting a time period from the first date to a specific date
ts["::2020-05-01"]
Selecting a time period from specific date to the last date
ts["2020-03-01::"]
Selecting a particular month
ts["2020-03"]
Equivalent to ts["2020-03-01::2020-03-31"]
Selecting a particular year
ts["2020"]
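
The filters above can be tried without the course data. The sketch below assumes the xts package is installed and uses a made-up daily series; the names dates, prices and price_ts are illustrative, and price_ts is used instead of ts to avoid masking base R's ts function.

```r
library(xts)  # assumes the xts package is already installed

# Hypothetical daily prices: 150 consecutive days starting 2020-01-01
dates    <- seq(from = as.Date("2020-01-01"), by = "day", length.out = 150)
prices   <- 100 + cumsum(rep(0.5, 150))
price_ts <- xts(prices, order.by = dates)

march      <- price_ts["2020-03"]                  # one whole month
mar_to_may <- price_ts["2020-03-01::2020-05-01"]   # start and end dates
up_to_feb  <- price_ts["::2020-02-29"]             # from the first date onwards

nrow(march)        # 31 daily observations in March
nrow(mar_to_may)   # 62: March (31) + April (30) + 1 May
nrow(up_to_feb)    # 60: January (31) + February 2020 (29)
```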

Creating arrays of random values
R has several tools for statistical analysis

A very useful one is generating random numbers from particular distributions

Can be used in Monte-Carlo simulations


runif(n = 10, min = 0, max = 1)
Generates 10 numbers between 0 and 1 using a uniform distribution
rnorm(n=100,mean=0,sd=1)
Generates 100 numbers using a normal distribution with zero mean and unit
standard deviation

Many others

rbinom: binomial distribution


rexp: exponential distribution
rt: Student's t-distribution
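
A brief sketch of drawing random numbers reproducibly. The seed value 42 is arbitrary; set.seed is base R and simply makes repeated runs return the same draws, which helps when checking a simulation.

```r
set.seed(42)   # arbitrary seed, fixed so the draws can be reproduced

u <- runif(n = 10, min = 0, max = 1)     # 10 uniform draws on [0, 1]
z <- rnorm(n = 100, mean = 0, sd = 1)    # 100 standard normal draws

range(u)   # both ends lie between 0 and 1
mean(z)    # close to 0, but not exactly 0 in a finite sample
sd(z)      # close to 1

# Resetting the seed reproduces exactly the same uniform draws
set.seed(42)
u_again <- runif(n = 10, min = 0, max = 1)
identical(u, u_again)   # TRUE
```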

Basic Variable Types – Factor
For categorical values (e.g. answers of a multiple choice question)

# unordered factor
recommendation = factor(c("buy", "buy", "sell", "nothing", "nothing"), levels = c("buy", "sell", "nothing"))
# ordered factor: levels are ranked in the order given (buy < sell < nothing)
recommendation = factor(c("buy", "buy", "sell", "nothing", "nothing"), levels = c("buy", "sell", "nothing"), ordered = TRUE)

Operations:

List of categories: levels(recommendation)


Number of categories: nlevels(recommendation)
Number of instances per category: table(recommendation)
Min/Max category (only defined for ordered factors): min(recommendation) ; max(recommendation)

practice

create your own factor


find and plot frequency (table, plot)
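
One possible way through the practice, reusing the ordered recommendation factor from this slide. barplot is used here as one plotting option among several; plot(recommendation) would also draw the counts.

```r
recommendation <- factor(
  c("buy", "buy", "sell", "nothing", "nothing"),
  levels  = c("buy", "sell", "nothing"),
  ordered = TRUE
)

levels(recommendation)    # "buy" "sell" "nothing"
nlevels(recommendation)   # 3

freq <- table(recommendation)   # counts per category: buy 2, sell 1, nothing 2
freq

# min/max follow the level order, not the alphabet
min(recommendation)   # buy
max(recommendation)   # nothing

barplot(freq)         # bar chart of the frequencies
```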

Basic Variable Types - Matrix
For storing multiple values of the same type in an N x M matrix (row x col).
can be seen as an array with an extra dimension (with rows and columns).
0 15 6 21 30
5 20 11 10 40
10 1 17 20 50
all values need to be of the same type (e.g. numeric, character).

Matrix
For storing multiple values of the same type in an N × M matrix.
For example, create a matrix where all elements have the same value:
matrix(data = 1, nrow = 2, ncol = 3)
matrix(data = 0, nrow = 4, ncol = 4)

Alternatively we can specify all the values in a matrix.

matrix( data = c(1,2,3,4,5,6) ) - creates a matrix with 6 rows and 1 column.

## [,1]
## [1,] 1
## [2,] 2
## [3,] 3
## [4,] 4
## [5,] 5
## [6,] 6

matrix( data = c(1,2,3,4,5,6), ncol = 6) - creates a matrix with 6 columns and 1 row.
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 2 3 4 5 6

Matrix
We can specify all the values in a matrix.

matrix(data = 1:20, nrow = 4, ncol = 5)

Try:

matrix(data = c(1,2,3,4,5,6), nrow = 2, ncol = 3)

matrix(data = runif(20), nrow = 4, ncol = 5)

Matrix
We can control how R uses the array data to fill the values in a matrix.

fills values by column (default).

matrix( data = c(1,2,3,4,5,6), ncol = 2, byrow = FALSE)

## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6

fills values by row.

matrix( data = c(1,2,3,4,5,6), ncol = 2, byrow = TRUE)

## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6

Matrix
To avoid unexpected results, you should always have N × M values in the data

matrix(data = 1:19, nrow = 4, ncol = 5)

## Warning in matrix(data = 1:19, nrow = 4, ncol = 5): data length [19] is not a
## sub-multiple or multiple of the number of rows [4]

## [,1] [,2] [,3] [,4] [,5]


## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 1

matrix(c(1,2,3), 6, 4)

## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 2 2 2 2
## [3,] 3 3 3 3
## [4,] 1 1 1 1
## [5,] 2 2 2 2
## [6,] 3 3 3 3

matrix(, 6, 4)

## [,1] [,2] [,3] [,4]
## [1,] NA NA NA NA
## [2,] NA NA NA NA
## [3,] NA NA NA NA
## [4,] NA NA NA NA
## [5,] NA NA NA NA
## [6,] NA NA NA NA

Accessing values in a Matrix
To access the value of an entry in a matrix m, we need to specify both its row and its column.
index starts from 1.
Second column of first row: m[1, 2]
First column of second row: m[2, 1]
We can also access multiple values.
All columns of first row: m[1, ]
All rows of column 1 and 3: m[, c(1, 3) ]
First and second rows of column 2 and 3: m[c(1, 2), c(2, 3) ]
Other operations:
Number of elements: length(m)
Number of rows: nrow(m)
Number of columns: ncol(m)
In-class group discussion:
Create a matrix: m<-matrix(runif(24, 1, 10), 6, 4)
Use loop to access diagonal (top left corner to bottom right corner) elements in m.
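
One possible approach to the discussion exercise, offered as a sketch rather than the only answer. Because m is 6 × 4 and not square, the diagonal has min(6, 4) = 4 elements; the helper names n_diag and diag_values are illustrative.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
m <- matrix(runif(24, 1, 10), 6, 4)

# The diagonal runs until the shorter dimension is exhausted
n_diag <- min(nrow(m), ncol(m))
diag_values <- numeric(n_diag)

for (i in seq_len(n_diag)) {
  diag_values[i] <- m[i, i]          # row i, column i
}

diag_values
all.equal(diag_values, diag(m))      # TRUE: matches the built-in diag()
```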

Operations on matrices
Most operations applied on single values can be applied to a matrix.
R returns another matrix with the result of applying the operation on each value of the
matrix.
Matrix of numeric values.
arithmetic operations: m * 2 ; m + 2
rounding operations: round(m, 3); floor(m)
comparison operations: m > 0
Matrix of logical values
And: m & TRUE
Negation: !m
Exclusive Or: xor(m, FALSE)

m/2
m > 7
!(m >7)

Operations on matrices
We can also apply some of these operations on multiple matrices, provided they have the
exact same number of rows and columns.

create two matrices: m1 = matrix(1:6, ncol = 2); m2 = matrix(c(2,2,2,3,3,3), ncol=2)

try: m1 + m2; m1*m2; m1>m2

R applies the operation element by element, i.e. to elements of the matrices on the same
position, e.g., m1[1,1]+m2[1,1]

NOTE: Such operations require the two matrices have the same dimension, that is, same
number of rows and same number of columns.
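
The element-by-element behaviour can be checked directly with the two matrices defined on this slide; the values in the comments list each result column by column.

```r
m1 <- matrix(1:6, ncol = 2)               # columns: (1, 2, 3) and (4, 5, 6)
m2 <- matrix(c(2, 2, 2, 3, 3, 3), ncol = 2)

m1 + m2    # columns: (3, 4, 5) and (7, 8, 9)
m1 * m2    # columns: (2, 4, 6) and (12, 15, 18); element-wise, not a matrix product
m1 > m2    # columns: (FALSE, FALSE, TRUE) and (TRUE, TRUE, TRUE)

# Each entry is computed from the entries in the same position
(m1 + m2)[1, 1] == m1[1, 1] + m2[1, 1]    # TRUE
```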

Operations on matrices
Matrix multiplication ("linear algebra")
m1 %*% m2
involves the multiplication of each row of matrix m1 by each column of matrix m2.
only possible if number of columns of m1 is the same as the number of rows of m2
if m1 is an n1 × m matrix, and m2 is an m × n2 matrix, the product of m1 by m2 is an
n1 × n2 matrix.
Example: multiplication of a 2 × 2 matrix by a 2 × 1 matrix, results in a 2 × 1
matrix.
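$$
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\times
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} a \times x + b \times y \\ c \times x + d \times y \end{bmatrix}
$$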
m3=matrix(1,2,1); m1 %*% m3
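
A sketch of the dimension rule, assuming m1 = matrix(1:6, ncol = 2) from the earlier slide. Here m1 is 3 × 2 and m3 is 2 × 1, so the product is 3 × 1; since m3 is a column of ones, each entry of the product is the sum of one row of m1.

```r
m1 <- matrix(1:6, ncol = 2)   # 3 x 2
m3 <- matrix(1, 2, 1)         # 2 x 1, filled with ones

product <- m1 %*% m3
dim(product)                  # 3 1
product                       # 5, 7, 9: the row sums of m1

# Conformability: columns of the left matrix must equal rows of the right
ncol(m1) == nrow(m3)          # TRUE
```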

Operations on matrices
Obtain an array with the diagonal of a matrix m.

diag(m)

Create an n × n identity matrix (e.g., n = 3).

diag(3)

Transpose of a matrix (i.e. flipping rows to columns).

t(m)

Adds a new column

cbind(m, columnValues)
Requires that columnValues is either:
a single value of the same type as the existing values in matrix m.
for example: cbind(m1, 7)
an array with a value for each row of the new column.
for example: cbind(m1, c(7, 8, 9))

Operations on matrices
Adds a new row

rbind(m, rowValues)
Requires that rowValues is either:
a single value of the same type as the existing values in matrix m.
for example: rbind(m1, 7)
an array with a value for each column of the new row.
for example: rbind(m1, c(7, 8))

Functions cbind and rbind can also be used to combine two matrices, i.e. columnValues and
rowValues can also be a matrix.

When using cbind, the matrices need to have the same number of rows.
When using rbind, the matrices need to have the same number of columns.
for example: cbind(m1, m2), cbind(m1,m4)
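
A quick check of how cbind and rbind change a matrix's dimensions, assuming the 3 × 2 matrices m1 and m2 defined on the earlier slide.

```r
m1 <- matrix(1:6, ncol = 2)                    # 3 x 2
m2 <- matrix(c(2, 2, 2, 3, 3, 3), ncol = 2)    # 3 x 2

dim(cbind(m1, 7))             # 3 3: the single value is repeated down the new column
dim(cbind(m1, c(7, 8, 9)))    # 3 3: one value per row
dim(rbind(m1, c(7, 8)))       # 4 2: one value per column
dim(cbind(m1, m2))            # 3 4: same number of rows, columns placed side by side
dim(rbind(m1, m2))            # 6 2: same number of columns, rows stacked
```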

Operations on matrices
We can calculate operations along rows or columns.
Given a matrix m, we can calculate the sum of values in each:
row: rowSums(m1)
column: colSums(m1)

We can also calculate the average value in each row/column.

rowMeans(m1) , colMeans(m1)

For any function that operates on arrays, we can use the function apply

apply(m1, 1, FUN = sum) is equivalent to the function rowSums (1 refers to the first
dimension of a matrix: rows).
apply(m1, 2, FUN = mean) is equivalent to the function colMeans (2 refers to the second
dimension of a matrix: columns).
other examples: apply(m1, 1, FUN = min); apply(m1, 2, FUN = max)
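
The equivalences stated above can be verified on m1 = matrix(1:6, ncol = 2), the matrix from the earlier slides.

```r
m1 <- matrix(1:6, ncol = 2)     # rows: (1, 4), (2, 5), (3, 6)

rowSums(m1)                     # 5 7 9
colMeans(m1)                    # 2 5

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column
row_sums_apply  <- apply(m1, 1, FUN = sum)
col_means_apply <- apply(m1, 2, FUN = mean)

all(row_sums_apply == rowSums(m1))      # TRUE
all(col_means_apply == colMeans(m1))    # TRUE

apply(m1, 1, FUN = min)         # 1 2 3: smallest value in each row
apply(m1, 2, FUN = max)         # 3 6: largest value in each column
```

apply works with any function that reduces a vector, which is why it covers cases that have no dedicated rowXxx or colXxx helper.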

Operations on matrices-Matrix inverse
An inverse of matrix is sometimes also referred to as a reciprocal matrix.
The reciprocal of a number x is a number y such that x × y = 1

The inverse of a matrix $M$ is a matrix $M^{-1}$ such that $M \times M^{-1} = M^{-1} \times M = I$

I denotes the identity matrix (diagonals are 1, everything else is 0). For example:
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1

The inverse of a matrix m can be calculated in R using the function solve.

It is only defined for a square matrix, i.e. a matrix with the same number of rows and
columns, and only when that matrix is non-singular (otherwise solve reports an error).

# 4x4 matrix with normally distributed random numbers.


mx = matrix(rnorm(16), ncol=4)
solve(mx)
mx %*% solve(mx)
round(mx %*% solve(mx))

Basic Variable Types – List
For when we need to save values of different types, or different columns that do not have the
same length (unlike a matrix or data frame, where all columns have the same length).

myList = list(1:20, "Imperial College", 3.141593)

Each entry in the list can be accessed using its position, but using double square brackets: [[ ]].

myList[[1]]; myList[[2]]

Each entry can also be associated with a name.

myCID = list(university = "Imperial College", studentID = c(12301, 32102, 12301, 40213) )
myCID[["university"]]; myCID[["studentID"]]
myCID$university; myCID$studentID
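
A short sketch contrasting single and double square brackets on the lists defined on this slide. Single brackets keep the list wrapper; double brackets (or $) extract the entry itself.

```r
myList <- list(1:20, "Imperial College", 3.141593)

class(myList[1])      # "list": a one-element list
class(myList[[1]])    # "integer": the stored vector itself
length(myList[[1]])   # 20

myCID <- list(university = "Imperial College",
              studentID  = c(12301, 32102, 12301, 40213))

myCID[["university"]]          # "Imperial College"
myCID$studentID[2]             # 32102: index into the extracted vector
names(myCID)                   # "university" "studentID"
```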

Basic Variable Classes – checking & conversion
To check the class of a variable x we can use:

returns the name of the variable type: class(x)


checks if variable is of the specified class:
is.matrix(x)
is.factor(x)
is.data.frame(x)
is.list(x)

Conversion between classes with as. functions

as.Date("2018-03-01", format = "%Y-%m-%d") converts date in text to a Date


as.numeric("12.3") converts text to number 12.3
as.character(12.3) converts number to text "12.3"
as.logical(1) converts to TRUE
as.logical("false") converts to FALSE
as.data.frame(matrix) or data.frame(matrix)
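
The conversions listed above can be checked with class(). This is a sketch; the failing conversion at the end is an added illustration of what happens when text cannot be parsed as a number.

```r
d <- as.Date("2018-03-01", format = "%Y-%m-%d")
class(d)                     # "Date"

as.numeric("12.3")           # 12.3
as.character(12.3)           # "12.3"
as.logical(1)                # TRUE
as.logical("false")          # FALSE

# A matrix converts to a data frame with default column names V1, V2, ...
m  <- matrix(1:6, ncol = 2)
df <- as.data.frame(m)
is.data.frame(df)            # TRUE
names(df)                    # "V1" "V2"

# Text that is not a number converts to NA (with a warning, suppressed here)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)                   # TRUE
```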

Thanks!
Slides created via the R package xaringan.

The chakra comes from remark.js, knitr, and R Markdown.

