Lesson 6: Time Series 1: Library
## [,1]
## 2018-01-01 0.01874617
## 2018-02-01 -0.18425254
## 2018-03-01 -1.37133055
## 2018-04-01 -0.59916772
## 2018-05-01 0.29454513
## 2018-06-01 0.38979430
## 2018-07-01 -1.20807618
## 2018-08-01 -0.36367602
## 2018-09-01 -1.62667268
## 2018-10-01 -0.25647839
## 2018-11-01 1.10177950
## 2018-12-01 0.75578151
## 2019-01-01 -0.23823356
## 2019-02-01 0.98744470
## 2019-03-01 0.74139013
## 2019-04-01 0.08934727
## 2019-05-01 -0.95494386
## 2019-06-01 -0.19515038
## 2019-07-01 0.92552126
## 2019-08-01 0.48297852
## 2019-09-01 -0.59631064
## 2019-10-01 -2.18528684
## 2019-11-01 -0.67486594
## 2019-12-01 -2.11906119
## 2020-01-01 -1.26519802
## 2020-02-01 -0.37366156
## 2020-03-01 -0.68755543
## 2020-04-01 -0.87215883
## 2020-05-01 -0.10176101
plot(white_noise)
[Figure: plot of white_noise, monthly values from Jan 2018 to Apr 2020, ranging roughly from -2.0 to 1.0]
Let us now import some financial data with quantmod. We can use Google data from the last three years. When you import from quantmod, the object you create is already an xts time series.
library(quantmod)
getSymbols("GOOGL",from="2017-05-01",to="2020-05-11",src="yahoo")
## [1] "GOOGL"
stock=Ad(`GOOGL`) # adjusted close prices
plot(stock)
[Figure: plot of stock, 2017-05-01 / 2020-05-08, adjusted close between roughly 1000 and 1500]
acf(white_noise)
[Figure: ACF of white_noise, lags 0 to 14]
As expected, the ACF finds no significant autocorrelation in a white noise. Apart from the first bar, which we can ignore, all the others are within the confidence lines (the blue dashed ones). Always be careful with the first bar in the ACF: it is always 1, being the autocorrelation of x_t with x_t, i.e. ρ(0).
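For reference, a series like the white_noise above can be generated as follows (a minimal sketch: the seed, and hence the exact values printed earlier, are assumptions):
library(xts)
set.seed(123) # hypothetical seed; the original draw is not reproducible from these notes
dates=seq(as.Date("2018-01-01"),by="month",length.out=29)
white_noise=xts(rnorm(29),order.by=dates) # 29 i.i.d. N(0,1) monthly values
acf(white_noise) # only the lag-0 bar, rho(0)=1, should stand out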
What about Google?
acf(stock,100)
[Figure: ACF of stock, lags 0 to 100, with a long, slow decay from 1]
An ACF like the one above, with its long and strong decay, signals a yet-to-be-cleaned time series: there is a strong trend in the data. We need to remove it before proceeding to analyze what type of time series we are dealing with.
We can use two methods: regression and differencing.
# Regression
t=1:length(stock)
fit=lm(stock~t)
summary(fit)
##
## Call:
## lm(formula = stock ~ t)
##
## Residuals:
## Min 1Q Median 3Q Max
## -245.225 -52.886 -6.225 47.112 236.196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 960.82285 5.73806 167.45 <2e-16 ***
## t 0.46438 0.01303 35.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 79.12 on 760 degrees of freedom
## Multiple R-squared: 0.6256, Adjusted R-squared: 0.6252
## F-statistic: 1270 on 1 and 760 DF, p-value: < 2.2e-16
clean_stock=as.xts(fit$residuals)
plot(clean_stock)
[Figure: plot of clean_stock (the regression residuals), values between roughly -200 and 200]
[Figure: plot of stock, 2017-05-01 / 2020-05-08]
diff_stock=diff.xts(stock) # first differences; note the leading NA
plot(diff_stock)
[Figure: plot of diff_stock, values between roughly -100 and 100]
acf(clean_stock)
[Figure: ACF of clean_stock, lags 0 to 25]
acf(diff_stock,na.action = na.pass)
[Figure: ACF of diff_stock, lags 0 to 25]
Regression only removes the linear (deterministic) trend, while differencing also accounts for variable trends and (partially) for seasonal effects.
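To see the difference, consider a pure random walk (a hypothetical illustration, not part of the lesson's data): a regression on time leaves strongly autocorrelated residuals, while differencing recovers the i.i.d. increments.
set.seed(1) # hypothetical stochastic-trend example
rw=cumsum(rnorm(500)) # random walk: a variable (stochastic) trend
tt=1:length(rw)
acf(residuals(lm(rw~tt))) # long decay: the regression residuals still trend
acf(diff(rw)) # flat ACF: differencing removes the stochastic trend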
However, both methods are unable to deal with volatility clustering. And the apparently nice plot we get via differencing can be misleading... Why?
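A quick check for this (not part of the original code) is the ACF of the squared differences: significant bars at many lags point to volatility clustering.
acf(diff_stock^2,na.action=na.pass) # large moves cluster in time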
In any case, both new time series are just the starting point of the analysis. The best is yet to come!