A Cheat Sheet of Time Series Analysis With R
Introduction
This is a cheat sheet for Time Series Analysis using the R language. In this paper, we briefly introduce the theory of time series and provide the corresponding R code.
The cheat sheet is based on An Introduction to Analysis of Financial Data with R (Ruey S. Tsay, 2013). We took the course Applied Time Series Analysis at Wuhan University, taught by Professor O Chia Chuang, who chose this book as our textbook. Much of the code and many of the scripts come from the book.
Tianqi Li, Muyang Ren
May 21, 2018 in IAS.
Contents
1 Rudimental Coding
  1.1 Basic Commands
  1.2 Get financial data from the Internet
  1.3 Statistical Commands
  1.4 Simple Return and Log Return
2 ARMA
  2.1 Stationary Test
    2.1.1 ACF and PACF Plot
    2.1.2 Ljung-Box Test
  2.2 General ARMA(p,q) model
  2.3 Unit-root
  2.4 Seasonal
  2.5 Long-memory time series
  2.6 Model testing
3 ARCH
4 VaR and ES
* email: [email protected]
† email: @outlook.com
1 Rudimental Coding
1.1 Basic Commands
Here we list some basic commands for using R.
setwd('C:/TimeSeriesAnalysis/')             # Set working directory
install.packages('quantmod')                # Install a package
library(quantmod)                           # Load a package
mydata = read.table("AAPL.txt", header=T)   # Load data from a txt file
head(mydata)                                # Show the first rows of the data
dim(mydata)                                 # Show the dimension of the data
# Convert a numerical date such as 20180521 into the Date form 2018-05-21;
# this is important in time series analysis
mydata$date = as.Date(as.character(mydata$date), "%Y%m%d")
source("MyScript.R")                        # Load your R script
All packages needed in this cheat sheet can be installed at once with the following command:
install.packages(c("quantmod","fBasics","quantreg","ggplot2","fGarch","fUnitRoots","fracdiff"))
1.4 Simple Return and Log Return
Let P_t denote the price of an asset at time t. The one-period simple return is
\[ R_t = \frac{P_t - P_{t-1}}{P_{t-1}} . \]
The simple return is the one commonly used in daily life. However, in time series analysis we mainly use the log return r_t = log(R_t + 1) instead, for the following reasons:
• The continuously compounded multiperiod return is simply the sum of the continuously compounded one-period returns involved:
\[ r_t[k] = r_t + r_{t-1} + \cdots + r_{t-k+1} \]
• Some time series, for example GDP, exhibit exponential growth, and taking logs transforms that growth into a linear trend.
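As a quick illustration, both kinds of return can be computed directly from a price series. This is a minimal sketch, assuming the prices sit in a numeric vector built from a column such as mydata$close (the column name is an assumption, not part of the original text):
price = mydata$close                         # assumed price column; adjust to your data
simple_rtn = diff(price) / head(price, -1)   # R_t = (P_t - P_{t-1}) / P_{t-1}
log_rtn = diff(log(price))                   # r_t = log(R_t + 1) = log(P_t / P_{t-1})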
2 ARMA
2.1 Stationary Test
A necessary condition for using an ARMA model to capture the log return {r_t} is that {r_t} is weakly stationary. In detail, the mean of {r_t} is time-invariant and the autocovariance γ_k := Cov(r_t, r_{t−k}) between r_t and r_{t−k} depends only on the lag k. We can examine γ_k (or the corresponding autocorrelations) to check whether a time series {r_t} is weakly stationary or not.
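A quick way to inspect the autocorrelations in R is a minimal sketch with the standard stats functions acf() and pacf() (lrtn is the log-return series used throughout; the lag choice is an assumption):
acf(lrtn, lag.max = 20)    # sample autocorrelation function
pacf(lrtn, lag.max = 20)   # sample partial autocorrelation function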
The Ljung-Box statistic Q(m) tests the null hypothesis H0 that the first m autocorrelations of the series are all zero. We reject H0 when Q(m) > χ²_α, where χ²_α is the (1 − α)th quantile of the χ² distribution with m degrees of freedom. The code is
# Box.test(x, lag = 1, type = c("Box-Pierce", "Ljung-Box"), fitdf = 0)
Box.test(lrtn, lag = 1, type = "Ljung-Box")
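As a hedged extension of the call above, one often tests several lags jointly and, when testing the residuals of a fitted ARMA(p,q) model, adjusts the degrees of freedom with fitdf = p + q. The lag choice and the object model_arma (the ARMA(3,2) fit shown in the next subsection) are assumptions:
Box.test(lrtn, lag = 12, type = "Ljung-Box")   # jointly test the first 12 autocorrelations
# For residuals of a fitted ARMA(3,2) model, subtract the number of estimated
# ARMA coefficients from the degrees of freedom: fitdf = p + q = 5
Box.test(residuals(model_arma), lag = 12, type = "Ljung-Box", fitdf = 5)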
Information Criterion Now we introduce two information criteria, AIC and BIC, which are used to determine the order (p, q). Let T denote the sample size; "likelihood function" below means the maximized likelihood function.
• AIC = \frac{-2}{T}\ln(\text{likelihood function}) + \frac{2}{T} \times \text{number of parameters} (general form)
• AIC(l) = \ln(\tilde{\sigma}_l^2) + \frac{2l}{T} (AR(l); \tilde{\sigma}_l^2 is the MLE of \sigma_a^2)
• BIC(l) = \ln(\tilde{\sigma}_l^2) + \frac{l\,\ln(T)}{T} (AR(l))
Smaller AIC or BIC values are better.
We have several commands to estimate an ARMA(p,q) model.
# AR model
model_ar = ar(lrtn, method="mle")   # ar() selects the order by AIC (reported relative to the minimum, which is set to 0)
model_ar$order                      # Print the identified order

# ARMA model
# arima(x, order = c(0L, 0L, 0L),
#       seasonal = list(order = c(0L, 0L, 0L), period = NA),
#       xreg = NULL, include.mean = TRUE, transform.pars = TRUE, fixed = NULL,
#       init = NULL, method = c("CSS-ML", "ML", "CSS"), n.cond,
#       SSinit = c("Gardner1980", "Rossignol2011"),
#       optim.method = "BFGS", optim.control = list(), kappa = 1e6)
model_arma = arima(lrtn, order=c(3,0,2))   # An ARMA(3,2) model
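To choose between candidate orders by the criteria above, the fits can be compared directly. A small sketch (the candidate orders and object names are assumptions) using the standard AIC() and BIC() generics, which work on arima fits:
model_a = arima(lrtn, order = c(3,0,2))
model_b = arima(lrtn, order = c(1,0,1))
AIC(model_a); AIC(model_b)   # smaller is better
BIC(model_a); BIC(model_b)   # BIC penalizes additional parameters more heavily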
If we want to fix the coefficients of some terms, we use the option fixed. For instance, suppose we need a restricted MA(3) model in which the second MA coefficient is set to zero:
\[ x_t = c_0 + a_t - \theta_1 a_{t-1} - 0 \cdot a_{t-2} - \theta_3 a_{t-3} \]
model_ma3 = arima(lrtn, order=c(0,0,3), fixed=c(NA, 0, NA, NA))   # MA(3) with theta_2 fixed at 0; the last NA is the mean term c0
2.3 Unit-root
A unit root makes a time series nonstationary. There are three typical examples of unit-root series.
Random walk
\[ p_t = p_{t-1} + a_t \]
If we extend the ARMA model so that it has a unit characteristic root, the resulting model is called an Autoregressive Integrated Moving Average (ARIMA) model; the unit root makes shocks permanent and the sample ACF decay very slowly. An efficient way to deal with an ARIMA series is differencing. Here we give a formal definition of ARIMA(p,1,q). Given a time series {y_t}, if c_t = y_t − y_{t−1} = (1 − B)y_t follows a stationary ARMA(p,q) model, then {y_t} is called an ARIMA(p,1,q) process, and differencing makes {y_t} stationary. Not surprisingly, ARIMA(p,n,q) models with n ≥ 2 also exist.
ct = diff(yt, 1)   # first-order differencing
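Rather than differencing by hand, the integrated model can also be fitted directly; a minimal sketch (the order is an assumption), where the middle entry of order is the differencing order d:
model_arima = arima(yt, order = c(1,1,1))   # ARIMA(1,1,1): arima() differences the series internally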
To test for a unit root, the Augmented Dickey-Fuller (ADF) test is based on the regression
\[ x_t = c_t + \beta x_{t-1} + \sum_{i=1}^{p-1} \phi_i \Delta x_{t-i} + e_t , \]
where the null hypothesis is H0: β = 1 and the alternative hypothesis is H1: β < 1. Here c_t is a deterministic function of the time index t, and ∆x_j = x_j − x_{j−1} is the differenced series. In practice, c_t usually takes one of three forms:
\[ c_t = \begin{cases} 0 & \text{random walk} \\ c_0 & \text{random walk with drift} \\ \omega_0 + \omega_1 t & \text{linear time trend (exponential growth in the level of a log series)} \end{cases} \]
The test statistic is
\[ \text{ADF-statistic} = \frac{\hat{\beta} - 1}{\mathrm{s.e.}(\hat{\beta})} . \]
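The test can be run with adfTest() from the fUnitRoots package installed earlier. A small sketch (the series name and lag order are assumptions), where the type argument selects among the three forms of c_t:
library(fUnitRoots)
# type = "nc": no constant, "c": constant (drift), "ct": constant and time trend
adfTest(lrtn, lags = 2, type = "c")   # ADF test with 2 lagged differences and a drift term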
2.4 Seasonal
Some time series exhibit periodic or cyclical behavior; such a series is called a seasonal time series. We usually eliminate the seasonal behavior by differencing before analyzing the series, which is called seasonal adjustment. For example, if the data show strong evidence of autocorrelation at lag 4, we should consider the seasonal differencing operator
\[ \Delta_4 = 1 - B^4 . \]
From the ACF plot, we can observe whether there are significant non-zero values at the seasonal lags. If so, we apply the seasonal differencing to make the time series stationary.
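A minimal sketch of seasonal differencing for quarterly data (the series name x is an assumption):
sx = diff(x, lag = 4)   # seasonal differencing (1 - B^4) x_t
acf(sx)                 # check whether the seasonal autocorrelation has been removed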
Furthermore, multiplicative seasonal models should be taken into consideration.
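For instance, a common multiplicative seasonal specification is the airline-type model; a hedged sketch for quarterly data (the orders and object name are assumptions):
model_seasonal = arima(x, order = c(0,1,1), seasonal = list(order = c(0,1,1), period = 4))   # ARIMA(0,1,1)x(0,1,1)_4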
In addition, we can use dummy variables to capture deterministic seasonality (for monthly returns, the ACF is significantly non-zero at lags 12, 24, 36, and so on). We can include the dummies in an OLS regression to check it.
Jan = rep(c(1, rep(0, 11)), n)   # January dummy: one 1 followed by eleven 0s, repeated n times (n = number of years)
model_test = lm(lrtn ~ Jan)
summary(model_test)
2.5 Long-memory time series
If the fractionally differenced series (1 − B)^d x_t follows an ARMA(p,q) model, we call {x_t} an ARFIMA(p,d,q) process. Here d is not necessarily an integer; for a stationary long-memory series, 0 < d < 0.5.
We use the package fracdiff to estimate an ARFIMA(p,d,q) model.
library(fracdiff)
model = fracdiff(lrtn, nar=1, nma=1)   # ARFIMA(1,d,1); d is estimated from the data
2. Use {x_t | t = 1, ..., h} to train the model, compute the one-step-ahead forecast x̂_h(1), and calculate the forecast error e_h(1) = x_{h+1} − x̂_h(1);
3. Expand the training set to {x_t | t = 1, ..., h+1} and, as in step 2, compute x̂_{h+1}(1) and calculate e_{h+1}(1) = x_{h+2} − x̂_{h+1}(1);
We can also use the mean absolute forecast error (MAFE) and the bias of the forecasts, defined as
\[ \text{MAFE}(m) = \frac{\sum_{j=h}^{T-1} |e_j(1)|}{T-h} , \qquad \text{Bias}(m) = \frac{\sum_{j=h}^{T-1} e_j(1)}{T-h} . \]
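Given the matrix of forecast errors produced by the backtest() function in the listing below, these quantities reduce to simple means; a small sketch, assuming the backtest_model object created there:
e = backtest_model$error[, 1]   # one-step-ahead forecast errors e_j(1)
mafe = mean(abs(e))             # MAFE
bias = mean(e)                  # Bias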
Notice: here we use an R script to carry out the backtest. backtest.R can be downloaded from Tsay's homepage (http://faculty.chicagobooth.edu/ruey.tsay/teaching/introTS/backtest.R).
source("backtest.R")
backtest_model = backtest(model, lrtn, 215, 1)   # Forecast origin 215, horizon 1; reports RMSE and MAFE


# Raw code of backtest.R
"backtest" <- function(m1,rt,orig,h,xre=NULL,fixed=NULL,inc.mean=TRUE){
  # m1: a fitted time-series model object
  # orig: the starting forecast origin
  # rt: the time series
  # xre: the independent variables
  # h: forecast horizon
  # fixed: parameter constraint
  # inc.mean: flag for the constant term of the model
  #
  regor=c(m1$arma[1],m1$arma[6],m1$arma[2])
  seaor=list(order=c(m1$arma[3],m1$arma[7],m1$arma[4]),period=m1$arma[5])
  T=length(rt)
  if(!is.null(xre) && !is.matrix(xre)) xre=as.matrix(xre)
  ncx=ncol(xre)
  if(orig > T) orig=T
  if(h < 1) h=1
  rmse=rep(0,h)
  mabso=rep(0,h)
  nori=T-orig
  err=matrix(0,nori,h)
  jlast=T-1
  for (n in orig:jlast){
    jcnt=n-orig+1
    x=rt[1:n]
    if (!is.null(xre)){
      pretor=xre[1:n,]
      mm=arima(x,order=regor,seasonal=seaor,xreg=pretor,fixed=fixed,include.mean=inc.mean)
      nx=xre[(n+1):(n+h),]
      if(h==1) nx=matrix(nx,1,ncx)
      fore=predict(mm,h,newxreg=nx)
    }
    else {
      mm=arima(x,order=regor,seasonal=seaor,xreg=NULL,fixed=fixed,include.mean=inc.mean)
      fore=predict(mm,h,newxreg=NULL)
    }
    kk=min(T,(n+h))
    # nof is the effective number of forecasts at the forecast origin n
    nof=kk-n
    pred=fore$pred[1:nof]
    obsd=rt[(n+1):kk]
    err[jcnt,1:nof]=obsd-pred
  }
  #
  for (i in 1:h){
    iend=nori-i+1
    tmp=err[1:iend,i]
    mabso[i]=sum(abs(tmp))/iend
    rmse[i]=sqrt(sum(tmp^2)/iend)
  }
  print("RMSE of out-of-sample forecasts")
  print(rmse)
  print("Mean absolute error of out-of-sample forecasts")
  print(mabso)
  backtest <- list(origin=orig,error=err,rmse=rmse,mabso=mabso)
}
3 ARCH
4 VaR and ES