Problem Set 1 Solution Numerical Methods

1. The document provides instructions and examples for a problem set on introductory R topics including setting up R and RStudio, performing ordinary least squares regression, calculating matrix inverses and eigenvalues, writing functions, simulating AR(1) time series processes, and assessing properties of statistical estimators via simulation. 2. Examples are provided to generate data and estimate a linear regression model, calculate the inverse and eigenvalues of a matrix, write a function to evaluate polynomials, simulate AR(1) time series, and use simulation to show the sample mean is an unbiased estimator and confidence intervals achieve the nominal coverage probability. 3. Readers are instructed to complete readings in preparation for working with stock market data in the next

Uploaded by

Ariyan Jahanyar

problem set 1

7 feb 2023

1 Setting up and intro problems

1.1 Setup
Install R and RStudio on your laptop. Make sure you install the most recent versions. You do not have to do
this if you are working on a university computer.
Many problems can be solved by removing any existing installation of R and RStudio. Sometimes rebooting
a laptop solves problems as well. Make sure there is enough space available on the hard disk. You can find the
version of R and some other information as follows:
sessionInfo()

## R version 4.1.2 (2021-11-01)

## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_4.1.2 magrittr_2.0.1 fastmap_1.1.0 cli_3.1.0
## [5] tools_4.1.2 htmltools_0.5.2 rstudioapi_0.13 yaml_2.2.1
## [9] stringi_1.7.6 rmarkdown_2.11 knitr_1.37 stringr_1.4.0
## [13] xfun_0.29 digest_0.6.29 rlang_1.0.2 evaluate_0.14

1.2 Ordinary Least Squares

Generate 200 observations from a $N(0, 2)$ distribution and put these observations in a vector x. Use this
vector to generate observations $y_i$ according to

$$y_i = 0.3 + 0.2 x_i + \varepsilon_i, \quad i = 1, \ldots, 200,$$

with $\varepsilon_i \sim N(0, 0.25)$. Construct an X matrix with a column of ones and the vector x, and calculate the OLS
estimator $(X'X)^{-1}X'y$.
The purpose is to do some simple linear algebra and regression, so that you are familiar with basic functions.
Note that one of the arguments of rnorm is sd and not the variance.
set.seed(3572)
x <- rnorm(200,mean=0,sd=sqrt(2))

y <- 0.3+0.2*x + rnorm(200,sd=sqrt(0.25))
X <- cbind(rep(1,200),x)
b.ols <- solve(t(X) %*% X, t(X) %*% y)
b.ols

## [,1]
## 0.2890479
## x 0.2115689
lm(y~x)

##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## 0.2890 0.2116
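As a cross-check on the formula above, the following sketch computes the same estimator with `crossprod` and compares it with the coefficients from `lm`. The data-generating step repeats the one above; the names `b.cross` and `b.lm` are ours and only serve the illustration.

```r
# Regenerate the data as above (same seed, same model)
set.seed(3572)
x <- rnorm(200, mean = 0, sd = sqrt(2))
y <- 0.3 + 0.2 * x + rnorm(200, sd = sqrt(0.25))
X <- cbind(1, x)

# crossprod(X) equals t(X) %*% X, and crossprod(X, y) equals t(X) %*% y
b.cross <- solve(crossprod(X), crossprod(X, y))

# lm() solves the least squares problem with a QR decomposition,
# so the two answers should agree up to rounding error
b.lm <- coef(lm(y ~ x))
all.equal(as.numeric(b.cross), as.numeric(b.lm))
```

Passing both the matrix and the right-hand side to `solve` avoids forming the inverse explicitly, which is usually the more stable choice.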

1.3 Problem 4
Consider the matrix
$$A = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$$
Calculate the inverse of A, and its eigenvalues and eigenvectors.
A <- diag(1:4)
A[1,4] <- 1
A

## [,1] [,2] [,3] [,4]

## [1,] 1 0 0 1
## [2,] 0 2 0 0
## [3,] 0 0 3 0
## [4,] 0 0 0 4
A.inv <- solve(A)
A.inv %*% A

## [,1] [,2] [,3] [,4]

## [1,] 1 0 0 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1
A %*% A.inv

## [,1] [,2] [,3] [,4]

## [1,] 1 0 0 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1
eigen(A)

## eigen() decomposition

## $values
## [1] 4 3 2 1
##
## $vectors
## [,1] [,2] [,3] [,4]
## [1,] 0.3162278 0 0 1
## [2,] 0.0000000 0 1 0
## [3,] 0.0000000 1 0 0
## [4,] 0.9486833 0 0 0
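A quick way to check the output of `eigen` is to verify the defining relation $A v_j = \lambda_j v_j$ for every column of `$vectors`. The sketch below does this for the matrix above; the names `e`, `lhs` and `rhs` are ours.

```r
A <- diag(1:4)
A[1, 4] <- 1
e <- eigen(A)

# Column j of e$vectors should satisfy A v_j = lambda_j v_j.
# Multiplying by diag(e$values) on the right scales column j by lambda_j.
lhs <- A %*% e$vectors
rhs <- e$vectors %*% diag(e$values)
all.equal(lhs, rhs)

# A is upper triangular, so its eigenvalues are its diagonal entries
all.equal(sort(e$values), sort(diag(A)))
```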

1.4 Write a function

Write a function that calculates $f(x) = a + bx + cx^2$, such that the function takes two arguments: a vector x
where the function needs to be evaluated (elementwise), and a vector of parameters $(a, b, c)'$.
It is important that all arguments and parameters are communicated to the function, and not taken from the
workspace when the function is evaluated.
f.correct <- function(x,p){
p[1] + p[2]*x + p[3]*x^2
}

p <- c(0,1,1)
x <- c(1,2,3,4)
f.correct(x,p)

## [1] 2 6 12 20
R would also evaluate the function if its parameters were taken from the workspace, but this may
lead to unintended outcomes.
f.wrong <- function(x){
p[1] + p[2]*x + p[3]*x^2
}

f.wrong(x)

## [1] 2 6 12 20
p <- p/2
f.wrong(x)

## [1] 1 3 6 10
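One way to find out whether a function silently depends on the workspace is to change the environment in which its free variables are looked up. The sketch below is our own illustration, not part of the exercise: it points the function at the base environment, where `p` does not exist, so the hidden dependency shows up as an error.

```r
f.wrong <- function(x) {
  p[1] + p[2] * x + p[3] * x^2   # p is a free variable, not an argument
}

p <- c(0, 1, 1)
f.wrong(1:4)   # works only because p happens to exist where f.wrong was defined

# Look up free variables in the base environment instead.
# Operators such as [, + and ^ live there, but p does not.
environment(f.wrong) <- baseenv()
res <- tryCatch(f.wrong(1:4), error = function(e) conditionMessage(e))
res            # an error message saying that p cannot be found
```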

1.5 AR-model
Write a loop that generates data according to an AR(1) process: $y_t = \alpha y_{t-1} + \varepsilon_t$, for $t = 1, \ldots, 2000$. Make a
graph of your time series $y_t$ for three different values of $\alpha$. Take $\mathrm{var}(\varepsilon_t) = 0.25$.
We could use a package and a function to do this, but the point here is to write a loop.
ar.one <- function(T,alpha,sigma){
y0 <- rnorm(1,sd=sigma)
y <- rep(NA,T)
y[1] <- alpha * y0 + rnorm(1,sd=sigma)
for (t in 2:T){
y[t] <- alpha*y[t-1] + rnorm(1,sd=sigma)
}

y
}
y1 <- ar.one(2000,0.3,0.5)
y2 <- ar.one(2000,-0.1,0.5)
y3 <- ar.one(2000,0.95,0.5)

y.df <- data.frame(y1,y2,y3,i=1:2000)

plot(1:2000,y1)
[Figure: scatter plot of y1 against 1:2000]
plot(1:2000,y2)
[Figure: scatter plot of y2 against 1:2000]

plot(1:2000,y3)

[Figure: scatter plot of y3 against 1:2000]
These plots are not very helpful. Below is a slightly more elaborate attempt.
library(ggplot2)
y.df <- data.frame(y=c(y1,y2,y3),alpha=as.factor(c(rep(0.3,2000),rep(-0.1,2000),rep(0.95,2000))),
i=rep(1:2000,3))

ggplot(y.df) + geom_point(aes(x=i,y=y,color=alpha),size=0.2)

[Figure: y against i for the three series, colored by alpha (-0.1, 0.3, 0.95)]
ggplot(y.df) + geom_point(aes(x=i,y=y),size=0.2) + facet_wrap(~alpha)

[Figure: y against i in three panels, one per value of alpha (-0.1, 0.3, 0.95)]
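The exercise asks for a loop, but it is useful to check a loop against a built-in function. The sketch below compares a loop with `stats::filter` using `method = "recursive"`. It assumes a starting value of zero, unlike `ar.one` above, which draws the starting value at random; the names `y.loop` and `y.filter` are ours.

```r
set.seed(1)
n <- 2000
alpha <- 0.95
sigma <- 0.5
eps <- rnorm(n, sd = sigma)

# Loop version with y0 = 0 (an assumption made for this comparison)
y.loop <- numeric(n)
y.loop[1] <- eps[1]
for (t in 2:n) y.loop[t] <- alpha * y.loop[t - 1] + eps[t]

# Recursive filter: y[t] = eps[t] + alpha * y[t-1], initial value 0 by default
y.filter <- as.numeric(stats::filter(eps, filter = alpha, method = "recursive"))

all.equal(y.loop, y.filter)

# For |alpha| < 1 the stationary variance is sigma^2 / (1 - alpha^2)
c(sample = var(y.filter), theory = sigma^2 / (1 - alpha^2))
```

The variance formula explains why the series with alpha = 0.95 has a much wider range in the plots than the other two.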

1.6 Computers are imprecise

Computers are not precise in a mathematical sense. Can you find a number $\epsilon > 0$ such that R thinks that $1 + \epsilon = 1$?
eps <- 1
while ((1+eps)!=1) eps <- eps/2
eps

## [1] 1.110223e-16
1+eps

## [1] 1
(1+eps)==1

## [1] TRUE
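R reports the precision of its floating point numbers in `.Machine`. The sketch below relates the value found by the loop to `.Machine$double.eps`, which is the smallest positive number x for which `1 + x != 1`. On a standard 64-bit platform the loop stops one halving later, at half that value.

```r
eps <- 1
while ((1 + eps) != 1) eps <- eps / 2

# The loop ends at the first power of two that is absorbed by 1
eps == .Machine$double.eps / 2

# .Machine$double.eps itself is still distinguishable from zero when added to 1
(1 + .Machine$double.eps) != 1
```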

1.7 Unbiasedness and coverage

Suppose you have a sample of independent observations $X_1, \ldots, X_n$, distributed according to a $N(\mu, \sigma^2)$
distribution. Then you know that
• $\bar{X} = \frac{1}{n}\sum_i X_i$ is an unbiased estimator for $EX = \mu$ and,
• $\bar{X} \pm \frac{t^{-1}_{n-1}(0.975)\, S}{\sqrt{n}}$ is a 95% confidence interval for $\mu$, with $t^{-1}_{n-1}(0.975)$ the 97.5th percentile of a t
distribution with $n - 1$ degrees of freedom, and $S$ the square root of the unbiased estimator for the
variance of $X$, $\sigma^2$.
Take $n = 25$, $\mu = 0$ and $\sigma^2 = 2$, and show the validity of these two claims in a simple simulation experiment.

These statements concern finite sample properties, so we should assess them by repeatedly sampling from the
(true) distribution, and check whether the claims hold.
mu <- 0
sigma <- sqrt(2)
n <- 25

B <- 1000000 # number of replications

m.sampled <- rep(NA,B)
ci.sampled <- matrix(NA,ncol=2,nrow=B)

set.seed(63729)
for (b in 1:B){
# generate a sample
sample.b <- rnorm(n,mean=mu,sd=sigma)
m.sampled[b] <- mean(sample.b)
}

t.percentile <- qt(0.975,df=n-1)

for (b in 1:B){
# generate a sample
sample.b <- rnorm(n,mean=mu,sd=sigma)
sd.b <- sd(sample.b)
ci.sampled[b,1] <- mean(sample.b) - t.percentile*sd.b/sqrt(n)
ci.sampled[b,2] <- mean(sample.b) + t.percentile*sd.b/sqrt(n)
}

mean(m.sampled)

## [1] -0.000373545
mean(ci.sampled[,1]<mu & ci.sampled[,2]>mu)

## [1] 0.949782
The average of the sampled means is very close to the true value (µ = 0), and the coverage of the confidence
interval is very close to 0.95. Whether the choice of a million replications makes sense depends on the
specifications of your laptop or computer. First test your code with, say, B = 10. Later in the course, we will
see a technique that can be used to speed up this type of simulation study.
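For a quick test run, the same experiment can be written without an explicit loop by drawing all samples at once into a matrix with one row per replication. This is only a sketch with a smaller B, and it is not necessarily the technique referred to above; the names `draws`, `half` and `covered` are ours.

```r
set.seed(1)
mu <- 0
sigma <- sqrt(2)
n <- 25
B <- 10000   # far fewer replications than above, enough for a first check

# One row per replication, one column per observation
draws <- matrix(rnorm(n * B, mean = mu, sd = sigma), nrow = B, ncol = n)

m <- rowMeans(draws)                       # sample mean of each replication
s <- apply(draws, 1, sd)                   # sample standard deviation of each replication
half <- qt(0.975, df = n - 1) * s / sqrt(n)

covered <- (m - half < mu) & (m + half > mu)

mean(m)         # should be close to mu
mean(covered)   # should be close to 0.95
```

Here the mean and the interval are computed from the same sample in each replication, whereas the code above uses separate draws for the two checks. Both are valid ways to assess the two claims.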

2 Reading and processing data: Stock market data

This week we look at data processing, and we will construct a dataset that will be used the coming weeks.
For that reason, it is important that your dataset is of high quality!

2.1 Reading material

Please scroll through Wickham and Grolemund, chapters 1-16. Brush up your R skills by reading chapters
1-8 of Jones et al., should that be necessary.

2.2 Read market data

On nestor (course documents, week 1) you find two files with stock market data, one with the Dutch index
AEX and one with the German index DAX. Read both files into R. (Hint: use either read.csv or
readxl::read_excel.)

First, we set the working directory to the one having the data, we load the appropriate library, and read the
data. When reading a csv file, make sure that the decimal point is the decimal point, and not a decimal
comma. This varies sometimes by language of the operating system. Nothing beats looking at the first few
lines of data in a simple text editor.
library(readxl)
suppressMessages(library(tidyverse))
aex <- read_excel("aex.xlsx")
glimpse(aex)

## Rows: 7,831
## Columns: 7
## $ Date <chr> "1992-10-12", "1992-10-13", "1992-10-14", "1992-10-15", "1~
## $ Open <chr> "126.382904", "126.773155", "126.056175", "126.097015", "1~
## $ High <chr> "127.181557", "127.004585", "126.555336", "126.568954", "1~
## $ Low <chr> "126.056175", "126.491806", "126.001724", "125.688614", "1~
## $ Close <chr> "126.945595", "126.850296", "126.487274", "125.824753", "1~
## $ `Adj Close` <chr> "126.945595", "126.850296", "126.487274", "125.824753", "1~
## $ Volume <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"~
dax <- read.csv("dax.csv")
glimpse(dax)

## Rows: 9,041
## Columns: 7
## $ Date <chr> "1987-12-30", "1987-12-31", "1988-01-01", "1988-01-04", "198~
## $ Open <chr> "1005.190002", "null", "null", "956.489990", "996.099976", "~
## $ High <chr> "1005.190002", "null", "null", "956.489990", "996.099976", "~
## $ Low <chr> "1005.190002", "null", "null", "956.489990", "996.099976", "~
## $ Close <chr> "1005.190002", "null", "null", "956.489990", "996.099976", "~
## $ Adj.Close <chr> "1005.190002", "null", "null", "956.489990", "996.099976", "~
## $ Volume <chr> "0", "null", "null", "0", "0", "0", "0", "0", "0", "0", "0",~
save(aex,dax,file="stock data.Rda")

Later, we will clean these data and do some transformations.
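Two of the issues above can already be handled while reading: a decimal comma, and the text "null" that marks missing values in the DAX file. The sketch below uses small made-up strings in place of the real files, so the values and the names `txt.comma`, `txt.null`, `d1` and `d2` are purely illustrative.

```r
# A file written with decimal commas and semicolons as separators
txt.comma <- "Date;Close\n1988-01-04;956,49\n1988-01-05;996,10"
d1 <- read.csv2(text = txt.comma)     # read.csv2 uses sep = ";" and dec = ","
str(d1)

# A file where missing values are written as the text "null"
txt.null <- "Date,Close\n1987-12-30,1005.19\n1987-12-31,null"
d2 <- read.csv(text = txt.null, na.strings = "null")
str(d2)                               # Close is numeric, with NA for "null"
```

Declaring the missing-value code with `na.strings` means the price columns are read as numbers right away, so no `as.numeric` conversion (and no coercion warning) is needed afterwards.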

2.3 Index returns

Calculate the daily log-returns in your AEX and DAX data sets. Merge the datasets horizontally by day. It is
difficult to graph the densities of both markets, since they are in two different columns. Now, create a dataset
with the following variables: date, market, daily log-return. Use the command pivot_longer to do so, and
make a graph of the density of the log-returns in a faceted graph (one cell for the AEX, one for the DAX).
The inverse operation of pivot_longer is pivot_wider.
library(readxl)
suppressMessages(library(tidyverse))
library(lubridate)

##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
aex <- read_excel("aex.xlsx")
glimpse(aex)

## Rows: 7,831
## Columns: 7
## $ Date <chr> "1992-10-12", "1992-10-13", "1992-10-14", "1992-10-15", "1~
## $ Open <chr> "126.382904", "126.773155", "126.056175", "126.097015", "1~
## $ High <chr> "127.181557", "127.004585", "126.555336", "126.568954", "1~
## $ Low <chr> "126.056175", "126.491806", "126.001724", "125.688614", "1~
## $ Close <chr> "126.945595", "126.850296", "126.487274", "125.824753", "1~
## $ `Adj Close` <chr> "126.945595", "126.850296", "126.487274", "125.824753", "1~
## $ Volume <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"~
dax <- read.csv("dax.csv",as.is=TRUE)
glimpse(dax)
names(aex) <- tolower(names(aex))
head(aex)

## # A tibble: 6 x 7
## date open high low close `adj close` volume
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1992-10-12 126.382904 127.181557 126.056175 126.945595 126.945595 0
## 2 1992-10-13 126.773155 127.004585 126.491806 126.850296 126.850296 0
## 3 1992-10-14 126.056175 126.555336 126.001724 126.487274 126.487274 0
## 4 1992-10-15 126.097015 126.568954 125.688614 125.824753 125.824753 0
## 5 1992-10-16 126.264915 126.478195 124.858192 125.230293 125.230293 0
## 6 1992-10-19 124.967102 125.184914 124.141220 124.286430 124.286430 0
aex <- mutate(aex,date=ymd(date),open=as.numeric(open),high=as.numeric(high),
low=as.numeric(low),close=as.numeric(close),adj.close=as.numeric(`adj close`),
volume=as.numeric(volume)) %>% select(-`adj close`) %>%
mutate(d.return=(log(adj.close)-log(lag(adj.close))))

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

names(dax) <- tolower(names(dax))
head(dax)

## date open high low close adj.close volume

## 1 1987-12-30 1005.190002 1005.190002 1005.190002 1005.190002 1005.190002 0
## 2 1987-12-31 null null null null null null
## 3 1988-01-01 null null null null null null
## 4 1988-01-04 956.489990 956.489990 956.489990 956.489990 956.489990 0
## 5 1988-01-05 996.099976 996.099976 996.099976 996.099976 996.099976 0
## 6 1988-01-06 1006.010010 1006.010010 1006.010010 1006.010010 1006.010010 0
dax <- mutate(dax,date=ymd(date),open=as.numeric(open),high=as.numeric(high),
low=as.numeric(low),close=as.numeric(close),adj.close=as.numeric(adj.close),
volume=as.numeric(volume)) %>%
mutate(d.return=(log(adj.close)-log(lag(adj.close))))

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

names(aex)[2:8] <- paste(names(aex)[2:8],"aex",sep=".")
names(dax)[2:8] <- paste(names(dax)[2:8],"dax",sep=".")
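To see what the log-return calculation does around missing prices, the sketch below applies the same formula in base R to the first five adjusted closes of the DAX shown above, with the "null" entries entered as NA. The names `adj.close` and `r` are ours.

```r
# First five adjusted closes of the DAX; "null" entries entered as NA
adj.close <- c(1005.190002, NA, NA, 956.489990, 996.099976)

# diff(log(p)) gives log(p_t) - log(p_{t-1}) for t = 2, ..., n;
# the leading NA keeps the result aligned with the prices
r <- c(NA, diff(log(adj.close)))
r
```

A return is only defined when both the current and the previous price are available, so every missing price also removes the return of the following day. This is one reason why the summaries report more missing returns than there are missing prices.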

First, we look at the summaries, and remove apparent errors.

summary(aex$d.return.aex)

## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

## -0.11376 -0.00563 0.00069 0.00021 0.00664 0.10028 162
summary(dax$d.return.dax)

## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

## -0.14091 -0.00638 0.00076 0.00027 0.00741 0.10797 289
It is not possible for the AEX to have dropped 78% in one day, so we remove that observation.
stock.returns1 <- full_join(aex,dax)

## Joining, by = "date"
stock.returns2 <- left_join(aex,dax)

## Joining, by = "date"
stock.returns3 <- right_join(aex,dax)

## Joining, by = "date"
glimpse(filter(stock.returns1,date=="1988-01-25"))

## Rows: 1
## Columns: 15
## $ date <date> 1988-01-25
## $ open.aex <dbl> NA
## $ high.aex <dbl> NA

## $ low.aex <dbl> NA
## $ close.aex <dbl> NA
## $ volume.aex <dbl> NA
## $ adj.close.aex <dbl> NA
## $ d.return.aex <dbl> NA
## $ open.dax <dbl> 962.63
## $ high.dax <dbl> 962.63
## $ low.dax <dbl> 962.63
## $ close.dax <dbl> 962.63
## $ adj.close.dax <dbl> 962.63
## $ volume.dax <dbl> 0
## $ d.return.dax <dbl> -0.003991457
glimpse(filter(stock.returns2,date=="1988-01-25"))

## Rows: 0
## Columns: 15
## $ date <date>
## $ open.aex <dbl>
## $ high.aex <dbl>
## $ low.aex <dbl>
## $ close.aex <dbl>
## $ volume.aex <dbl>
## $ adj.close.aex <dbl>
## $ d.return.aex <dbl>
## $ open.dax <dbl>
## $ high.dax <dbl>
## $ low.dax <dbl>
## $ close.dax <dbl>
## $ adj.close.dax <dbl>
## $ volume.dax <dbl>
## $ d.return.dax <dbl>
glimpse(filter(stock.returns3,date=="1988-01-25"))

## Rows: 1
## Columns: 15
## $ date <date> 1988-01-25
## $ open.aex <dbl> NA
## $ high.aex <dbl> NA
## $ low.aex <dbl> NA
## $ close.aex <dbl> NA
## $ volume.aex <dbl> NA
## $ adj.close.aex <dbl> NA
## $ d.return.aex <dbl> NA
## $ open.dax <dbl> 962.63
## $ high.dax <dbl> 962.63
## $ low.dax <dbl> 962.63
## $ close.dax <dbl> 962.63
## $ adj.close.dax <dbl> 962.63
## $ volume.dax <dbl> 0
## $ d.return.dax <dbl> -0.003991457
dim(merge(aex,dax))

## [1] 7792 15

dim(merge(aex,dax,all.x=TRUE))

## [1] 7831 15
dim(merge(aex,dax,all.y=TRUE))

## [1] 9041 15
dim(merge(aex,dax,all=TRUE))

## [1] 9080 15
stock.returns <- stock.returns1

save(aex,dax,stock.returns,file="stock data.Rda")

mean(is.element(aex$date,stock.returns$date))

## [1] 1
mean(is.element(dax$date,stock.returns$date))

## [1] 1
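The row counts reported by `merge` above follow a simple rule: the full join has as many rows as the two one-sided joins together minus the inner join (7831 + 9041 - 7792 = 9080). The sketch below reproduces this on two tiny made-up data frames with unique dates; all names and values are illustrative.

```r
a <- data.frame(date = as.Date("2020-01-01") + 0:3, aex = 1:4)   # 4 days
d <- data.frame(date = as.Date("2020-01-01") + 2:6, dax = 1:5)   # 5 days, 2 of them shared

n.inner <- nrow(merge(a, d))                 # dates present in both
n.left  <- nrow(merge(a, d, all.x = TRUE))   # all dates of a
n.right <- nrow(merge(a, d, all.y = TRUE))   # all dates of d
n.full  <- nrow(merge(a, d, all = TRUE))     # dates present in either

c(inner = n.inner, left = n.left, right = n.right, full = n.full)
n.full == n.left + n.right - n.inner
```

The rule holds as long as each date occurs at most once in each data set. Duplicate dates would multiply rows in the join, so a violation of the rule is a useful warning sign.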
stocks.long <- select(stock.returns,date,aex=d.return.aex,dax=d.return.dax) %>%
pivot_longer(!date,names_to="market",values_to="return")
head(stocks.long)

## # A tibble: 6 x 3
## date market return
## <date> <chr> <dbl>
## 1 1992-10-12 aex NA
## 2 1992-10-12 dax 0.00272
## 3 1992-10-13 aex -0.000751
## 4 1992-10-13 dax 0.0206
## 5 1992-10-14 aex -0.00287
## 6 1992-10-14 dax -0.0121
ggplot(stocks.long) + geom_density(aes(x=return,color=market))

## Warning: Removed 1739 rows containing non-finite values (stat_density).

[Figure: estimated density of the daily log-returns (x-axis: return, y-axis: density), one colored curve per market (aex, dax)]
ggplot(stocks.long) + geom_density(aes(x=return)) +
facet_wrap(~market,nrow=1)

## Warning: Removed 1739 rows containing non-finite values (stat_density).

[Figure: estimated density of the daily log-returns (x-axis: return, y-axis: density) in two panels, aex and dax]
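The reshaping step is easier to follow on a very small example. The sketch below, on made-up returns, turns a wide table with one column per market into a long table with one row per date and market, and then back again. It assumes the tidyr package (part of the tidyverse) is installed; the names `wide`, `long` and `back` are ours.

```r
library(tidyr)

# Wide format: one row per date, one column per market (made-up values)
wide <- data.frame(date = as.Date("2020-01-01") + 0:1,
                   aex = c(0.01, -0.02),
                   dax = c(0.03, NA))

# Long format: one row per date and market
long <- pivot_longer(wide, !date, names_to = "market", values_to = "return")
long

# pivot_wider undoes the operation
back <- pivot_wider(long, names_from = "market", values_from = "return")
back
```

The long format is what ggplot needs for faceting or coloring by market, because the market is then an ordinary variable that can be mapped to an aesthetic.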

2.4 Monthly correlations

For each month in your returns dataset, calculate the correlation between the daily return on the AEX and
the DAX. Make a graph that shows this monthly correlation over time.
monthly.correlations <- mutate(stock.returns,month=format(date,format="%Y-%m")) %>%
group_by(month) %>%
summarise(correlation=cor(d.return.aex,d.return.dax,use="pairwise.complete.obs"))
ggplot(monthly.correlations,aes(x=month,y=correlation))+geom_point()+ylim(0,1)

## Warning: Removed 58 rows containing missing values (geom_point).

[Figure: monthly correlation (y-axis, 0 to 1) against month (x-axis); with month stored as text, the axis carries one label per month from 1987-12 to 2023-02]
monthly.correlations <- mutate(monthly.correlations,month=ymd(month,truncated=2))
ggplot(monthly.correlations,aes(x=month,y=correlation))+geom_point()+ylim(0,1)

## Warning: Removed 58 rows containing missing values (geom_point).

[Figure: monthly correlation (y-axis, 0 to 1) against month as a date (x-axis, labeled 1990 to 2020)]
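The same grouped calculation can be written in base R, which makes the two steps explicit: build a month label, then compute one correlation per label. The sketch below uses simulated returns in place of the stock data, so the series `a` and `b` and all other names are illustrative only.

```r
set.seed(42)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-03-31"), by = "day")
a <- rnorm(length(dates))
b <- 0.5 * a + rnorm(length(dates))      # correlated with a by construction
d <- data.frame(date = dates, a = a, b = b)

# Step 1: a text label per month, as in format(date, format = "%Y-%m") above
d$month <- format(d$date, "%Y-%m")

# Step 2: one correlation per month
cors <- sapply(split(d, d$month),
               function(g) cor(g$a, g$b, use = "pairwise.complete.obs"))

# Turn the label back into a date (first day of the month) for plotting
monthly <- data.frame(month = as.Date(paste0(names(cors), "-01")),
                      correlation = unname(cors))
monthly
```

Converting the label to a date is what lets the plotting function choose a sensible number of axis labels instead of printing one label per month.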

3 Reading and processing data: Tennis data

On http://www.tennis-data.co.uk/alldata.php you find compressed files with tennis results. There is also
a file Notes.txt that has a short explanation of the data that are provided. Download all results for males
(2001-2017) and females (2007-2017). Read the xls files into R. Make sure that the variables in your datasets
with sex-year results are of the appropriate type. You will have to create a new variable, sex that takes values
male or female. Now merge all 20 + 14 = 34 datasets into one dataset. You can save this dataset using the
save-command.
suppressMessages(library(tidyverse))
#library(tidyverse)
library(lubridate)
library(readxl)
m.files <- list.files("raw data/m",full.names=TRUE)
f1 <- read_excel(m.files[1])
class(f1)

## [1] "tbl_df" "tbl" "data.frame"

## # A tibble: 2,963 x 34
## ATP Location Tournament Date Series Court Surface Round
## <dbl> <chr> <chr> <dttm> <chr> <chr> <chr> <chr>
## 1 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~
## 2 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~
## 3 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~
## 4 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~

## 5 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~
## 6 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~
## 7 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~
## 8 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~
## 9 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~
## 10 1 Adelaide AAPT Champi~ 2001-01-01 00:00:00 Interna~ Outdo~ Hard 1st ~
## # ... with 2,953 more rows, and 26 more variables: Best of <dbl>, Winner <chr>,
## # Loser <chr>, WRank <dbl>, LRank <chr>, W1 <dbl>, L1 <dbl>, W2 <dbl>,
## # L2 <dbl>, W3 <dbl>, L3 <dbl>, W4 <dbl>, L4 <dbl>, W5 <dbl>, L5 <dbl>,
## # Wsets <dbl>, Lsets <dbl>, Comment <chr>, CBW <dbl>, CBL <dbl>, GBW <dbl>,
## # GBL <dbl>, IWW <dbl>, IWL <dbl>, SBW <dbl>, SBL <dbl>
The variable LRank should be <dbl>, so we write a function that forces WRank and LRank to be numeric. The
other variables that are forced to be numeric come from other Excel files in which they have incorrect types.
force.type <- function(d){
nm <- names(d)
if (is.element("WRank",nm)) d$WRank <- as.numeric(d$WRank)
if (is.element("LRank",nm)) d$LRank <- as.numeric(d$LRank)
if (is.element("B365W",nm)) d$B365W <- as.numeric(d$B365W)
if (is.element("B365L",nm)) d$B365L <- as.numeric(d$B365L)
if (is.element("WPts",nm)) d$WPts <- as.numeric(d$WPts)
if (is.element("LPts",nm)) d$LPts <- as.numeric(d$LPts)
if (is.element("LBW",nm)) d$LBW <- as.numeric(d$LBW)
if (is.element("LBL",nm)) d$LBL <- as.numeric(d$LBL)
d
}
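The repeated `if` statements can be condensed by looping over the variables that are actually present. The sketch below is an alternative formulation under the same list of variable names, checked on a small made-up data frame; `force.type2` and its `vars` argument are our own names.

```r
force.type2 <- function(d, vars = c("WRank", "LRank", "B365W", "B365L",
                                    "WPts", "LPts", "LBW", "LBL")) {
  # intersect() keeps only those variables that exist in this data set
  for (v in intersect(vars, names(d))) {
    d[[v]] <- as.numeric(d[[v]])   # entries such as "N/A" become NA, with a warning
  }
  d
}

# Made-up example: LRank was read as text because of an "N/A" entry
toy <- data.frame(Winner = c("A", "B"), WRank = c(3, 10), LRank = c("12", "N/A"))
toy2 <- force.type2(toy)
str(toy2)
```

Keeping the variable names in an argument makes it easy to extend the list when a later file turns out to have another column with the wrong type.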

First we read all the files into tibbles and store those in a list. Then we force the types if necessary (and if
the variable is present).
m.tennis <- force.type(f1)

## Warning in force.type(f1): NAs introduced by coercion

for (i in 2:length(m.files)){
f <- read_excel(m.files[i])
select(f,Date,B365W,B365L) %>% glimpse()
f <- force.type(f)
m.tennis <- bind_rows(m.tennis,f)
}

## Rows: 2,854
## Columns: 3
## $ Date <dttm> 2001-12-31, 2001-12-31, 2001-12-31, 2001-12-31, 2001-12-31, 200~
## $ B365W <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ B365L <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## Warning in force.type(f): NAs introduced by coercion
## Rows: 2,861
## Columns: 3
## $ Date <dttm> 2002-12-30, 2002-12-30, 2002-12-30, 2002-12-30, 2002-12-30, 200~
## $ B365W <dbl> NA, NA, 1.364, NA, NA, NA, 1.667, 1.400, 1.667, 1.286, 1.800, NA~
## $ B365L <dbl> NA, NA, 2.875, NA, NA, NA, 2.100, 2.750, 2.100, 3.250, 1.909, NA~
## Warning in force.type(f): NAs introduced by coercion
## Rows: 2,877

## Columns: 3
## $ Date <dttm> 2004-01-05, 2004-01-05, 2004-01-05, 2004-01-05, 2004-01-05, 200~
## $ B365W <dbl> NA, 1.160, 2.000, 1.830, 1.400, NA, 1.800, 1.800, NA, 1.533, 1.4~
## $ B365L <dbl> NA, 4.500, 1.720, 1.830, 2.750, NA, 1.909, 1.900, NA, 2.375, 2.6~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting logical in O1724 / R1724C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting logical in O1906 / R1906C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting logical in O2015 / R2015C15: got 'N/A'
## Rows: 2,909
## Columns: 3
## $ Date <dttm> 2005-01-03, 2005-01-03, 2005-01-03, 2005-01-03, 2005-01-03, 200~
## $ B365W <dbl> 1.286, 1.833, 1.800, 1.667, 1.615, 1.333, 1.500, 1.533, 1.400, 1~
## $ B365L <dbl> 3.250, 1.833, 1.909, 2.100, 2.200, 3.000, 2.500, 2.375, 2.750, 2~
## Warning in force.type(f): NAs introduced by coercion
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in AA1402 / R1402C27: got '`1'
## Rows: 2,909
## Columns: 3
## $ Date <dttm> 2006-01-02, 2006-01-02, 2006-01-02, 2006-01-02, 2006-01-02, 200~
## $ B365W <dbl> 1.39, 1.53, 1.28, 1.53, 1.44, 1.50, 1.40, 1.83, 1.90, 1.12, 1.66~
## $ B365L <dbl> 2.75, 2.37, 3.25, 2.37, 2.62, 2.50, 2.75, 1.83, 1.80, 5.50, 2.10~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1384 / R1384C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1384 / R1384C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1475 / R1475C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1475 / R1475C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1485 / R1485C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1485 / R1485C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M2384 / R2384C13: got 'N/A'

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O2384 / R2384C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M2386 / R2386C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O2386 / R2386C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M2698 / R2698C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O2698 / R2698C15: got 'N/A'
## Rows: 2,806
## Columns: 3
## $ Date <dttm> 2007-01-01, 2007-01-01, 2007-01-01, 2007-01-01, 2007-01-02, 200~
## $ B365W <dbl> 2.87, 3.00, 2.00, 2.37, 1.50, 1.08, 2.87, 1.19, 2.62, 1.14, 1.83~
## $ B365L <dbl> 1.36, 1.33, 1.72, 1.53, 2.50, 6.50, 1.36, 4.00, 1.44, 5.00, 1.83~
## Rows: 2,707
## Columns: 3
## $ Date <dttm> 2007-12-31, 2007-12-31, 2007-12-31, 2007-12-31, 2007-12-31, 200~
## $ B365W <dbl> 1.53, 6.50, 3.75, 1.40, 1.72, 2.50, 1.44, NA, 1.57, 1.16, 1.28, ~
## $ B365L <dbl> 2.37, 1.10, 1.25, 2.75, 2.00, 1.50, 2.62, NA, 2.25, 4.50, 3.50, ~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 2,731
## Columns: 3
## $ Date <dttm> 2009-01-04, 2009-01-04, 2009-01-04, 2009-01-04, 2009-01-05, 200~
## $ B365W <dbl> 1.25, 2.75, 2.10, 1.44, 2.00, 1.36, 2.20, 1.06, 1.03, 2.75, 1.28~
## $ B365L <dbl> 3.75, 1.40, 1.66, 2.62, 1.72, 3.00, 1.61, 8.00, 11.00, 1.40, 3.5~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 2,679
## Columns: 3
## $ Date <dttm> 2010-01-04, 2010-01-04, 2010-01-04, 2010-01-04, 2010-01-04, 201~
## $ B365W <dbl> 1.44, 2.25, 1.61, 2.62, 3.00, 1.61, 1.04, 1.08, 1.72, 1.12, 1.36~
## $ B365L <dbl> 2.62, 1.57, 2.20, 1.44, 1.36, 2.20, 10.00, 7.00, 2.00, 5.50, 3.0~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1232 / R1232C13: got 'N/A'

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1232 / R1232C15: got 'N/A'
## Rows: 2,675
## Columns: 3
## $ Date <dttm> 2011-01-02, 2011-01-02, 2011-01-03, 2011-01-03, 2011-01-03, 201~
## $ B365W <dbl> 1.53, 1.90, 1.83, 1.72, 1.36, 3.50, 1.53, 1.06, 1.36, 3.00, NA, ~
## $ B365L <dbl> 2.37, 1.80, 1.83, 2.00, 3.00, 1.28, 2.37, 8.00, 3.00, 1.36, NA, ~
## Rows: 2,607
## Columns: 3
## $ Date <dttm> 2012-01-01, 2012-01-01, 2012-01-02, 2012-01-02, 2012-01-02, 201~
## $ B365W <dbl> 4.33, 1.25, 3.25, 1.66, 1.40, 1.20, 1.36, 1.22, 1.36, 1.44, 1.50~
## $ B365L <dbl> 1.20, 3.75, 1.33, 2.10, 2.75, 4.33, 3.00, 4.00, 3.00, 2.62, 2.50~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in AE1755 / R1755C31: got '2.,3'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L2226 / R2226C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N2226 / R2226C14: got 'N/A'
## Rows: 2,631
## Columns: 3
## $ Date <dttm> 2012-12-31, 2012-12-31, 2012-12-31, 2012-12-31, 2013-01-01, 201~
## $ B365W <dbl> 1.36, 1.61, 1.25, 1.07, 1.90, 1.61, 2.20, 1.44, 3.00, 1.36, 1.57~
## $ B365L <dbl> 3.00, 2.20, 3.75, 9.00, 1.80, 2.20, 1.61, 2.62, 1.36, 3.00, 2.25~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion
## Rows: 2,600
## Columns: 3
## $ Date <dttm> 2013-12-30, 2013-12-30, 2013-12-30, 2013-12-30, 2013-12-30, 201~
## $ B365W <dbl> 1.72, 1.28, 1.36, 1.90, 1.25, 2.25, 1.25, 1.66, 1.44, 1.57, 1.28~
## $ B365L <dbl> 2.00, 3.50, 3.00, 1.80, 3.75, 1.57, 3.75, 2.10, 2.62, 2.25, 3.50~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 2,630
## Columns: 3
## $ Date <dttm> 2015-01-05, 2015-01-05, 2015-01-05, 2015-01-05, 2015-01-06, 201~
## $ B365W <dbl> 4.50, 2.62, 1.28, 1.57, 1.53, 3.50, 2.00, 1.66, 1.66, 1.40, 1.25~
## $ B365L <dbl> 1.18, 1.44, 3.50, 2.25, 2.37, 1.28, 1.72, 2.10, 2.10, 2.75, 3.75~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1687 / R1687C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1687 / R1687C14: got 'N/A'
## Rows: 2,626
## Columns: 3
## $ Date <dttm> 2016-01-04, 2016-01-04, 2016-01-04, 2016-01-04, 2016-01-05, 201~
## $ B365W <dbl> 1.66, 1.53, 1.72, 1.83, 1.28, 2.00, 1.33, 2.00, 1.06, 2.25, 2.00~
## $ B365L <dbl> 2.10, 2.37, 2.00, 1.83, 3.50, 1.72, 3.25, 1.72, 10.00, 1.57, 1.7~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion
## Rows: 2,633
## Columns: 3
## $ Date <dttm> 2017-01-01, 2017-01-02, 2017-01-02, 2017-01-02, 2017-01-02, 201~
## $ B365W <dbl> 1.28, 1.50, 1.90, 1.36, 1.40, 2.62, 2.25, 1.80, 1.50, 2.20, 1.44~
## $ B365L <dbl> 3.50, 2.50, 1.80, 3.00, 2.75, 1.44, 1.57, 1.90, 2.50, 1.61, 2.62~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 2,637
## Columns: 3
## $ Date <dttm> 2017-12-31, 2017-12-31, 2018-01-01, 2018-01-01, 2018-01-01, 201~
## $ B365W <dbl> 2.20, 2.75, 1.61, 2.50, 1.40, 2.20, 1.83, 1.66, 1.53, 1.72, 2.62~
## $ B365L <dbl> 1.61, 1.40, 2.20, 1.50, 2.75, 1.61, 1.83, 2.10, 2.37, 2.00, 1.44~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 2,610
## Columns: 3
## $ Date <dttm> 2018-12-31, 2018-12-31, 2018-12-31, 2018-12-31, 2018-12-31, 201~
## $ B365W <dbl> 1.36, 1.18, 1.57, 1.40, 2.62, 2.62, 2.10, 1.28, 1.40, 2.25, 1.20~
## $ B365L <dbl> 3.00, 4.50, 2.25, 2.75, 1.44, 1.44, 1.66, 3.50, 2.75, 1.57, 4.33~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 1,267
## Columns: 3
## $ Date <dttm> 2020-01-06, 2020-01-06, 2020-01-06, 2020-01-06, 2020-01-06, 202~
## $ B365W <dbl> 2.00, 1.57, 1.25, 1.83, 1.50, 1.66, 1.83, 3.00, 2.00, 3.50, 1.28~
## $ B365L <dbl> 1.72, 2.25, 3.75, 1.83, 2.50, 2.10, 1.83, 1.36, 1.72, 1.28, 3.50~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 58
## Columns: 3
## $ Date <dttm> 2021-01-07, 2021-01-07, 2021-01-07, 2021-01-07, 2021-01-07, 202~
## $ B365W <dbl> 1.50, 2.50, 1.50, 1.61, 1.40, 2.62, 1.22, 5.00, 1.20, 1.80, 1.03~
## $ B365L <dbl> 2.50, 1.50, 2.50, 2.20, 2.75, 1.44, 4.00, 1.16, 4.33, 1.90, 15.0~
m.tennis$Sex <- "male"

We do the same for the files with female tennis results, and combine the male and female results into one dataset.
f.files <- list.files("raw data/f",full.names=TRUE)
f1 <- read_excel(f.files[1])

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in Z2159 / R2159C26: got '5..5'
select(f1,Date,B365W,B365L) %>% glimpse()

## Rows: 2,491
## Columns: 3
## $ Date <dttm> 2007-01-01, 2007-01-01, 2007-01-01, 2007-01-01, 2007-01-01, 200~
## $ B365W <dbl> 1.33, 3.75, 1.72, 1.83, 1.16, 2.50, 1.36, 2.50, 1.83, 1.53, 1.72~
## $ B365L <dbl> 3.00, 1.22, 2.00, 1.83, 4.50, 1.50, 2.87, 1.50, 1.83, 2.37, 2.00~
class(f1)

## [1] "tbl_df" "tbl" "data.frame"

## # A tibble: 2,491 x 34
## WTA Location Tournament Date Tier Court Surface Round
## <dbl> <chr> <chr> <dttm> <chr> <chr> <chr> <chr>
## 1 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## 2 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## 3 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## 4 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## 5 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## 6 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## 7 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## 8 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## 9 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## 10 1 Auckland ASB Classic 2007-01-01 00:00:00 Tier 4 Outdoor Hard 1st Ro~
## # ... with 2,481 more rows, and 26 more variables: Best of <dbl>, Winner <chr>,
## # Loser <chr>, WRank <chr>, LRank <chr>, WPts <chr>, LPts <chr>, W1 <dbl>,
## # L1 <dbl>, W2 <dbl>, L2 <dbl>, W3 <dbl>, L3 <dbl>, Wsets <dbl>, Lsets <dbl>,
## # Comment <chr>, B365W <dbl>, B365L <dbl>, CBW <dbl>, CBL <dbl>, EXW <dbl>,
## # EXL <dbl>, PSW <dbl>, PSL <dbl>, UBW <dbl>, UBL <dbl>
f.tennis <- force.type(f1)

## Warning in force.type(f1): NAs introduced by coercion

## Warning in force.type(f1): NAs introduced by coercion

## Warning in force.type(f1): NAs introduced by coercion

for (i in 2:length(f.files)){
f <- read_excel(f.files[i])
select(f,Date,B365W,B365L) %>% glimpse()
f <- force.type(f)
f.tennis <- bind_rows(f.tennis,f)
}

## Rows: 2,404
## Columns: 3
## $ Date <dttm> 2007-12-30, 2007-12-31, 2007-12-31, 2007-12-31, 2007-12-31, 200~
## $ B365W <dbl> 1.25, 1.07, 1.66, 1.16, 1.83, 1.33, 2.62, 1.20, 1.36, 1.50, 1.04~
## $ B365L <dbl> 3.75, 7.50, 2.10, 4.50, 1.83, 3.25, 1.44, 4.33, 3.00, 2.50, 9.00~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1051 / R1051C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1051 / R1051C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1482 / R1482C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1482 / R1482C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1493 / R1493C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1493 / R1493C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1689 / R1689C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1689 / R1689C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1704 / R1704C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1704 / R1704C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1731 / R1731C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1731 / R1731C15: got 'N/A'

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1785 / R1785C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1785 / R1785C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1808 / R1808C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1808 / R1808C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1817 / R1817C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1817 / R1817C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1824 / R1824C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1824 / R1824C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1853 / R1853C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1853 / R1853C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1867 / R1867C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1867 / R1867C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1877 / R1877C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1877 / R1877C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1923 / R1923C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1923 / R1923C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1989 / R1989C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1989 / R1989C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L2018 / R2018C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N2018 / R2018C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L2030 / R2030C12: got 'N/A'

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N2030 / R2030C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L2036 / R2036C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N2036 / R2036C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L2040 / R2040C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N2040 / R2040C14: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L2042 / R2042C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N2042 / R2042C14: got 'N/A'
## Rows: 2,433
## Columns: 3
## $ Date <dttm> 2009-01-04, 2009-01-05, 2009-01-05, 2009-01-05, 2009-01-05, 200~
## $ B365W <dbl> 1.400, 1.400, 3.000, 1.380, 1.660, 1.330, 1.660, 1.280, 1.500, 2~
## $ B365L <dbl> 2.75, 2.75, 1.36, 2.87, 2.10, 3.25, 2.10, 3.50, 2.50, 1.50, 6.50~
## Warning in force.type(f): NAs introduced by coercion
## Rows: 2,448
## Columns: 3
## $ Date <dttm> 2010-01-03, 2010-01-04, 2010-01-04, 2010-01-04, 2010-01-04, 201~
## $ B365W <dbl> 1.22, 1.10, 1.28, 2.10, 1.83, 1.25, 1.16, 1.16, 1.50, 1.36, 1.36~
## $ B365L <dbl> 4.00, 6.50, 3.50, 1.66, 1.83, 3.75, 4.50, 4.50, 2.50, 3.00, 3.00~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1651 / R1651C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1651 / R1651C15: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in M1654 / R1654C13: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in O1654 / R1654C15: got 'N/A'
## Rows: 2,468
## Columns: 3
## $ Date <dttm> 2011-01-02, 2011-01-03, 2011-01-03, 2011-01-03, 2011-01-03, 201~
## $ B365W <dbl> 1.50, 1.05, 1.57, 1.04, 1.66, 2.20, 1.44, 1.44, 1.72, 1.22, 1.16~
## $ B365L <dbl> 2.50, 9.00, 2.25, 10.00, 2.10, 1.61, 2.62, 2.62, 2.00, 4.00, 4.5~
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :

## Expecting numeric in L1596 / R1596C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1596 / R1596C14: got 'N/A'
## Rows: 2,407
## Columns: 3
## $ Date <dttm> 2012-01-02, 2012-01-02, 2012-01-02, 2012-01-02, 2012-01-02, 201~
## $ B365W <dbl> 1.36, 1.33, 1.44, 1.36, 1.18, 1.14, 1.66, 1.16, 1.36, 1.36, 1.30~
## $ B365L <dbl> 3.00, 3.25, 2.62, 3.00, 4.50, 5.50, 2.10, 5.00, 3.00, 3.00, 3.40~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1166 / R1166C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1166 / R1166C14: got 'N/A'
## Rows: 2,442
## Columns: 3
## $ Date <dttm> 2012-12-30, 2012-12-31, 2012-12-31, 2012-12-31, 2012-12-31, 201~
## $ B365W <dbl> 2.50, 1.53, 1.83, 2.50, 1.40, 1.57, 1.36, 1.53, 1.83, 1.61, 1.44~
## $ B365L <dbl> 1.50, 2.37, 1.83, 1.50, 2.75, 2.25, 3.00, 2.37, 1.83, 2.20, 2.62~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion
## Rows: 2,476
## Columns: 3
## $ Date <dttm> 2013-12-30, 2013-12-30, 2013-12-30, 2013-12-30, 2013-12-30, 201~
## $ B365W <dbl> 2.20, 2.50, 1.57, 1.20, 1.61, 1.40, 1.16, 2.62, 1.53, 3.50, 3.25~
## $ B365L <dbl> 1.61, 1.50, 2.25, 4.33, 2.20, 2.75, 4.50, 1.44, 2.37, 1.28, 1.33~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 2,521
## Columns: 3
## $ Date <dttm> 2015-01-04, 2015-01-04, 2015-01-05, 2015-01-05, 2015-01-05, 201~
## $ B365W <dbl> 1.66, 1.72, 1.61, 1.36, 2.75, 1.44, 1.57, 1.72, 1.83, 1.66, 4.00~
## $ B365L <dbl> 2.10, 2.00, 2.20, 3.00, 1.40, 2.62, 2.25, 2.00, 1.83, 2.10, 1.22~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 2,522
## Columns: 3
## $ Date <dttm> 2016-01-03, 2016-01-03, 2016-01-04, 2016-01-04, 2016-01-04, 201~
## $ B365W <dbl> 2.10, 1.61, 1.40, 1.72, 2.50, 1.30, 2.75, 2.25, 1.50, 1.72, 1.50~
## $ B365L <dbl> 1.66, 2.20, 2.75, 2.00, 1.50, 3.40, 1.40, 1.57, 2.50, 2.00, 2.50~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 2,500
## Columns: 3
## $ Date <dttm> 2017-01-01, 2017-01-01, 2017-01-02, 2017-01-02, 2017-01-03, 201~
## $ B365W <dbl> 1.40, 1.30, 1.20, 1.10, 2.37, 1.40, 2.50, 1.66, 1.66, 1.72, 1.16~
## $ B365L <dbl> 2.75, 3.40, 4.33, 7.00, 1.53, 2.75, 1.50, 2.10, 2.10, 2.00, 5.00~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in L1646 / R1646C12: got 'N/A'
## Warning in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet, :
## Expecting numeric in N1646 / R1646C14: got 'N/A'
## Rows: 2,469
## Columns: 3
## $ Date <dttm> 2017-12-31, 2017-12-31, 2018-01-01, 2018-01-01, 2018-01-01, 201~
## $ B365W <dbl> 1.90, 2.62, 1.66, 1.44, 2.75, 1.50, 2.10, 2.10, 1.90, 1.30, 1.72~
## $ B365L <dbl> 1.80, 1.44, 2.10, 2.62, 1.40, 2.50, 1.66, 1.66, 1.80, 3.40, 2.00~
## Warning in force.type(f): NAs introduced by coercion
## Warning in force.type(f): NAs introduced by coercion
## Rows: 2,472
## Columns: 3
## $ Date <dttm> 2018-12-31, 2018-12-31, 2018-12-31, 2018-12-31, 2018-12-31, 201~
## $ B365W <dbl> 1.36, 1.44, 1.61, 1.50, 2.25, 2.00, NA, 1.30, 2.62, 1.50, 1.44, ~
## $ B365L <dbl> 3.00, 2.62, 2.20, 2.50, 1.57, 1.72, NA, 3.40, 1.44, 2.50, 2.62, ~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 1,055
## Columns: 3
## $ Date <dttm> 2020-01-06, 2020-01-06, 2020-01-06, 2020-01-06, 2020-01-06, 202~
## $ B365W <dbl> 1.61, 2.37, 1.90, 3.00, 1.04, 1.72, 2.50, 1.18, 1.50, 1.25, 1.22~
## $ B365L <dbl> 2.20, 1.53, 1.80, 1.36, 13.00, 2.00, 1.50, 4.50, 2.50, 3.75, 4.0~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

## Rows: 63

## Columns: 3
## $ Date <dttm> 2021-01-06, 2021-01-06, 2021-01-06, 2021-01-06, 2021-01-06, 202~
## $ B365W <dbl> 1.66, 2.25, 1.22, 1.30, 2.50, 1.57, 1.57, 2.50, 2.10, 1.25, 1.10~
## $ B365L <dbl> 2.10, 1.57, 4.00, 3.40, 1.50, 2.25, 2.25, 1.50, 1.66, 3.75, 7.00~
## Warning in force.type(f): NAs introduced by coercion

## Warning in force.type(f): NAs introduced by coercion

f.tennis$Sex <- "female"
tennis.data <- bind_rows(m.tennis,f.tennis)
# Move column 55 (Sex) to the front; this relies on the column order created above.
tennis.data <- tennis.data[,c(55,1:54,56,57)]
save(tennis.data,file="tennis.Rda")
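
The many "NAs introduced by coercion" warnings above arise when text entries are forced into numeric columns. The following is a minimal sketch in base R, assuming (as the warning message suggests) that force.type() relies on as.numeric() for this step. The vector raw_quotes is made up for illustration; it reuses the two kinds of text entries reported in the warnings.

```r
# Hypothetical raw column as it might arrive from Excel: numbers stored as text,
# plus the two kinds of problematic entries reported in the warnings above.
raw_quotes <- c("1.53", "N/A", "2.,3", "3.00")

# as.numeric() returns NA for every entry it cannot parse and normally raises
# the warning "NAs introduced by coercion"; we suppress it here on purpose.
quotes <- suppressWarnings(as.numeric(raw_quotes))
print(quotes)

# Count the entries that were lost, so the damage is visible rather than silent.
n_lost <- sum(is.na(quotes) & !is.na(raw_quotes))
print(n_lost)
```

The function read_excel() also accepts an na argument, so read_excel(path, na = "N/A") would likely treat those cells as missing at import. Entries such as '2.,3' would still need manual attention.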

3.1 Tidy data

Is the dataset you have constructed a tidy dataset? For later convenience, we are going to reorder the data.
Set the seed of the random number generator as follows: set.seed(662288). Then select half of your dataset
by randomly sampling rows. Now you have two subsets of equal size (give or take one row). In the first
subset, label the winner as player A and the loser as player B, and rename all other variables according to
this labeling: variable Winner becomes PlayerA, variable Loser becomes PlayerB, WRank becomes RankA, etc.
Then add a variable Winner to the first subset, taking value A only. The difficult work relates to the
second subset. In that subset, the winner should be labelled as PlayerB and the loser as PlayerA. This
labeling applies to all other variables as well. The variable Winner should be added to the second subset,
taking value B only. Finally, merge both subsets into one big dataset.
If you have done it correctly, mean(dataset$Winner=="A") should be approximately 0.50. Retain only the
betting quotes from B365.
The reason for relabeling is the following. Before any match, you know the names of the two players, and
we call them PlayerA and PlayerB. Before the match you don't know who wins! That outcome will
be stored in a new column with the name Winner, which takes two values, A and B. This way you can talk
sensibly about the question 'how does the probability that player A wins depend on the difference in ranking
between players A and B?'.
Note that the data are not organized in a tidy way: the outcome of the match is encoded in the column names
Winner and Loser rather than stored as a variable. A match is described by three variables: the first player
(PlayerA), the second player (PlayerB) and the outcome (either A or B wins). We need to reorganize the data
into this A/B labeling. In the next few weeks, we will model the outcome of the match as depending on
covariates. The current labeling does not allow any modeling, as there is no variation in the dependent
variable (Winner always wins). As we need to retain information from B365 only, we remove all other columns
with betting quotes.
#tennis.data <- select(tennis.data,Sex:Comment,B365W,B365L,WTA,Tier)
tennis.data <- select(tennis.data,Sex,ATP,Location,Tournament,Date,Series,
Court,Surface,Round,`Best of`,Winner,Loser,WRank,LRank,W1,L1,
W2,L2,W3,L3,W4,L4,W5,L5,Wsets,Lsets,Comment,B365W,B365L,WTA,Tier)
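set.seed(662288)
# Draw half of the row indices at random (rounded down if the row count is odd).
e <- sample(1:nrow(tennis.data),floor(nrow(tennis.data)/2))
part1 <- tennis.data[e,]
part2 <- tennis.data[-e,]
# Columns 10 to 29 run from `Best of` to B365L. In part1 the winner becomes player A.
names(part1)[10:29] <- c("Best.of","PlayerA","PlayerB",
"RankA","RankB","A1","B1","A2","B2","A3","B3","A4","B4","A5","B5",
"SetsA","SetsB","Comment","B365A","B365B")
part1$Winner <- "A"
# In part2 the winner becomes player B, so every A/B pair of names is swapped.
names(part2)[10:29] <- c("Best.of","PlayerB","PlayerA",
"RankB","RankA","B1","A1","B2","A2","B3","A3","B4","A4","B5","A5",
"SetsB","SetsA","Comment","B365B","B365A")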

part2$Winner <- "B"
tennis.data.relabeled <- bind_rows(part1,part2)
mean(tennis.data.relabeled$Winner=="A")

## [1] 0.4999942
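This seems to have done the trick. The share falls just short of 0.5 because the number of rows is odd and
floor() puts the smaller half in the first subset. Compare both datasets using View(tennis.data) and
View(tennis.data.relabeled).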
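
The relabeling can be checked by hand on a small example. The following is a minimal sketch in base R on made-up data: the data frame matches, the player names P1 to P8, the ranks and the seed are all hypothetical, and rbind() stands in for bind_rows(). Both functions match data-frame columns by name, which is what makes the swapped column order in the second subset harmless.

```r
# Toy data in winner/loser form; player names and ranks are made up.
matches <- data.frame(
  Winner = c("P1", "P2", "P3", "P4"),
  Loser  = c("P5", "P6", "P7", "P8"),
  WRank  = c(3, 40, 12, 7),
  LRank  = c(10, 25, 30, 1),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, used for this illustration only
idx <- sample(seq_len(nrow(matches)), floor(nrow(matches) / 2))

# First subset: the winner becomes player A.
part1 <- matches[idx, ]
names(part1) <- c("PlayerA", "PlayerB", "RankA", "RankB")
part1$Winner <- "A"

# Second subset: the winner becomes player B, so each pair of names is swapped.
part2 <- matches[-idx, ]
names(part2) <- c("PlayerB", "PlayerA", "RankB", "RankA")
part2$Winner <- "B"

# rbind() on data frames matches columns by name, not by position.
relabeled <- rbind(part1, part2)
print(relabeled)

# Consistency check: the winner's rank must be recoverable from the new labels.
winner_rank <- ifelse(relabeled$Winner == "A", relabeled$RankA, relabeled$RankB)
print(winner_rank)
```

If the renaming were wrong for one of the subsets, winner_rank would no longer reproduce the original WRank values for the corresponding rows.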

3.2 Check your data

Check your dataset. Are there any weird values or problems? Note that the betting quotes should be at least
1.0. Save your dataset.
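
One way to look for such problems is to flag the rows that violate the lower bound on the quotes. The following is a minimal sketch on made-up data: the data frame odds and its values are hypothetical, and only the column names B365A and B365B follow the relabeled dataset.

```r
# Made-up quotes: row 3 has a quote below 1.0 and row 4 has a missing quote.
odds <- data.frame(
  B365A = c(1.53, 2.10, 0.97, 4.50),
  B365B = c(2.37, 1.66, 12.00, NA)
)

# Flag rows where either quote is missing or below the lower bound of 1.0.
bad <- is.na(odds$B365A) | is.na(odds$B365B) |
  odds$B365A < 1 | odds$B365B < 1

# Inspect the flagged rows before deciding whether to drop or correct them.
print(odds[bad, ])
```

Testing for missing values explicitly keeps the flag itself free of NA, so it can be used directly for subsetting.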
We select completed matches only. Moreover, for later use we need data with 'reasonable' values for betting
quotes, so the sum of their inverses (known as the overround) should not exceed 1.15.
table(tennis.data.relabeled$Comment)

##
## Awarded Completed Disqualified Retired Sched Walkoer
## 1 83274 2 2700 2 1
## Walkover
## 461
tennis.data <- filter(tennis.data.relabeled,Comment=="Completed")
tennis.data <- filter(tennis.data,1/B365A + 1/B365B <=1.15)
lapply(tennis.data,summary)

## $Sex
## Length Class Mode
## 76363 character character
##
## $ATP
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 19.00 32.00 32.56 48.00 67.00 31718
##
## $Location
## Length Class Mode
## 76363 character character
##
## $Tournament
## Length Class Mode
## 76363 character character
##
## $Date
## Min. 1st Qu. Median
## "2002-12-30 00:00:00" "2008-08-22 00:00:00" "2012-07-21 00:00:00"
## Mean 3rd Qu. Max.
## "2012-06-27 11:51:29" "2016-06-28 00:00:00" "2021-01-13 00:00:00"
## NA's
## "2"
##
## $Series
## Length Class Mode
## 76363 character character
##

## $Court
## Length Class Mode
## 76363 character character
##
## $Surface
## Length Class Mode
## 76363 character character
##
## $Round
## Length Class Mode
## 76363 character character
##
## $Best.of
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 3.000 3.000 3.225 3.000 5.000
##
## $PlayerA
## Length Class Mode
## 76363 character character
##
## $PlayerB
## Length Class Mode
## 76363 character character
##
## $RankA
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 24.00 52.00 72.87 91.00 2159.00 112
##
## $RankB
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 24.00 53.00 73.67 92.00 1890.00 122
##
## $A1
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 3.000 6.000 4.834 6.000 7.000 2
##
## $B1
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 3.000 6.000 4.833 6.000 7.000 2
##
## $A2
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 3.000 6.000 4.759 6.000 7.000 5
##
## $B2
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 3.00 6.00 4.77 6.00 7.00 4
##
## $A3
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 3.00 6.00 4.81 6.00 16.00 44708
##
## $B3
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

## 0.0 3.0 6.0 4.8 6.0 15.0 44708
##
## $A4
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 3.00 6.00 4.86 6.00 7.00 72110
##
## $B4
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 3.00 6.00 4.86 6.00 7.00 72110
##
## $A5
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 3.0 6.0 5.2 6.0 70.0 74739
##
## $B5
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 4.0 6.0 5.3 6.0 68.0 74739
##
## $SetsA
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.000 2.000 1.246 2.000 3.000 2
##
## $SetsB
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.000 2.000 1.245 2.000 3.000 3
##
## $Comment
## Length Class Mode
## 76363 character character
##
## $B365A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.971 1.390 1.830 2.624 2.750 67.000
##
## $B365B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.967 1.400 1.830 2.639 2.870 101.000
##
## $WTA
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 15.00 27.00 27.72 42.00 61.00 44645
##
## $Tier
## Length Class Mode
## 76363 character character
##
## $Winner
## Length Class Mode
## 76363 character character
tennis.data$Date <- as.Date(tennis.data$Date)
save(tennis.data,file="tennis.Rda")

The maximum values for the betting quotes are quite high, and the minimum values (0.971 and 0.967) lie below
the lower bound of 1.0 mentioned above. We should check this later when we transform the quotes into implied
winning probabilities during one of the coming weeks.
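
To make the link between quotes, the sum of inverse quotes and implied probabilities concrete, here is a minimal sketch using one pair of quotes from the output above (1.53 and 2.37). Rescaling the inverse quotes so that they sum to one is an assumption made for this illustration; the transformation used in the coming weeks may differ. The names quote_a, quote_b, book, prob_a and prob_b are hypothetical.

```r
# One pair of quotes taken from the glimpse() output above.
quote_a <- 1.53
quote_b <- 2.37

# Sum of the inverse quotes: the quantity compared with the threshold 1.15.
book <- 1 / quote_a + 1 / quote_b

# Assumption: rescale the inverse quotes so that the two probabilities sum to one.
prob_a <- (1 / quote_a) / book
prob_b <- (1 / quote_b) / book

print(round(c(book = book, prob_a = prob_a, prob_b = prob_b), 3))
```

For this pair the sum of inverse quotes is about 1.076, well below 1.15, and the rescaled probabilities are about 0.608 and 0.392. By the same logic, a quote of 101 corresponds to an inverse of roughly 0.0099, which is why the very high maximum quotes deserve a second look.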

Pewin 32 Pro 2 Software
No ratings yet
Pewin 32 Pro 2 Software
95 pages
Registration and Login Form
100% (1)
Registration and Login Form
33 pages
A Short List of The Most Useful R Commands
No ratings yet
A Short List of The Most Useful R Commands
11 pages
A Short List of The Most Useful R Commands
No ratings yet
A Short List of The Most Useful R Commands
8 pages
Session Set Working Directory Choose Directlry
No ratings yet
Session Set Working Directory Choose Directlry
17 pages
An R Tutorial Starting Out
No ratings yet
An R Tutorial Starting Out
9 pages
R Examples
No ratings yet
R Examples
56 pages
18 3 24 Upto Week 6 A B Latest 1
No ratings yet
18 3 24 Upto Week 6 A B Latest 1
25 pages
Data - Analysis - With - R - 24
No ratings yet
Data - Analysis - With - R - 24
47 pages
Lecture 10 R
No ratings yet
Lecture 10 R
117 pages
Da Lab File 2
No ratings yet
Da Lab File 2
13 pages
First Course On R
No ratings yet
First Course On R
26 pages
A Short List of Some Useful R Commands: Input and Display
No ratings yet
A Short List of Some Useful R Commands: Input and Display
2 pages
R Tutorial
No ratings yet
R Tutorial
32 pages
R Intro 2011
No ratings yet
R Intro 2011
115 pages
R
No ratings yet
R
38 pages
Practical 1 - Data Frame Manipulation - 072502
No ratings yet
Practical 1 - Data Frame Manipulation - 072502
16 pages
This Is The Course Script
No ratings yet
This Is The Course Script
9 pages
STTN 225 R Summary
No ratings yet
STTN 225 R Summary
18 pages
R Programing
No ratings yet
R Programing
32 pages
Sta238 Wks - Week1+2
No ratings yet
Sta238 Wks - Week1+2
35 pages
R Commands: Appendix B
No ratings yet
R Commands: Appendix B
5 pages
Rintro
No ratings yet
Rintro
14 pages
Introduction To R: 1 Getting Started
No ratings yet
Introduction To R: 1 Getting Started
14 pages
RStudio
No ratings yet
RStudio
31 pages
R Short Tutorial
No ratings yet
R Short Tutorial
5 pages
R File Code
No ratings yet
R File Code
16 pages
Exercises For R
No ratings yet
Exercises For R
40 pages
Machine Learning-Intro
No ratings yet
Machine Learning-Intro
7 pages
Model 1
No ratings yet
Model 1
14 pages
R Programming Materials
No ratings yet
R Programming Materials
51 pages
Da Lab It
No ratings yet
Da Lab It
20 pages
Introduction To R PDF
No ratings yet
Introduction To R PDF
56 pages
R Codes
No ratings yet
R Codes
5 pages
R-Unit 2
No ratings yet
R-Unit 2
81 pages
Lab Manual Record: St. Josephs PG College
No ratings yet
Lab Manual Record: St. Josephs PG College
14 pages
Assignment 1
No ratings yet
Assignment 1
8 pages
R Console
No ratings yet
R Console
6 pages
CRM Cheat Sheet
No ratings yet
CRM Cheat Sheet
7 pages
R Assignment
No ratings yet
R Assignment
9 pages
Fall 2005 Statistics 579 R Tutorial: Vectors, Matrices, and Arrays
No ratings yet
Fall 2005 Statistics 579 R Tutorial: Vectors, Matrices, and Arrays
8 pages
Final Cost Practical
No ratings yet
Final Cost Practical
29 pages
Analysis Using Statistical: Introduction & Data Exploration
No ratings yet
Analysis Using Statistical: Introduction & Data Exploration
23 pages
An Introduction To Matlab For Econometrics
No ratings yet
An Introduction To Matlab For Econometrics
106 pages
Matlab
100% (1)
Matlab
83 pages
COST - JournalPracticals (1-7)
No ratings yet
COST - JournalPracticals (1-7)
22 pages
R Programming
No ratings yet
R Programming
50 pages
R Programs
No ratings yet
R Programs
12 pages
MultivariateRGGobi PDF
No ratings yet
MultivariateRGGobi PDF
60 pages
Broomspatial
No ratings yet
Broomspatial
31 pages
R Practicals
No ratings yet
R Practicals
32 pages
R Functions
No ratings yet
R Functions
8 pages
Student Solutions Manual to Accompany Economic Dynamics in Discrete Time, secondedition
From Everand
Student Solutions Manual to Accompany Economic Dynamics in Discrete Time, secondedition
Yue Jiang
4.5/5 (2)
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet
Analytic Geometry: Graphic Solutions Using Matlab Language
From Everand
Analytic Geometry: Graphic Solutions Using Matlab Language
Ing. Mario Castillo
No ratings yet
Advanced C Concepts and Programming: First Edition
From Everand
Advanced C Concepts and Programming: First Edition
Gayatri
3/5 (1)
Calculus I Essentials
From Everand
Calculus I Essentials
Editors of REA
1/5 (1)
Core Concepts in Real Analysis
From Everand
Core Concepts in Real Analysis
Roshan Trivedi
No ratings yet
Inverse Trigonometric Functions (Trigonometry) Mathematics Question Bank
From Everand
Inverse Trigonometric Functions (Trigonometry) Mathematics Question Bank
Mohmmad Khaja Shareef
No ratings yet
Solving Math Problems
From Everand
Solving Math Problems
George N. Frempong
No ratings yet
Pre-Calculus Essentials
From Everand
Pre-Calculus Essentials
Ernest Woodward
No ratings yet
Factoring and Algebra - A Selection of Classic Mathematical Articles Containing Examples and Exercises on the Subject of Algebra (Mathematics Series)
From Everand
Factoring and Algebra - A Selection of Classic Mathematical Articles Containing Examples and Exercises on the Subject of Algebra (Mathematics Series)
CSPacademic
No ratings yet
All Tutorials Dynamic Econometrics
No ratings yet
All Tutorials Dynamic Econometrics
58 pages
Problem Set 6 Solution Numerical Methods
No ratings yet
Problem Set 6 Solution Numerical Methods
11 pages
Problem Set 4 Solution Numerical Methods
