0% found this document useful (0 votes)
17 views8 pages

cheatsheet的副本

This document serves as a cheat sheet for Time Series Analysis using R, summarizing essential theories and coding practices. It covers rudimentary coding commands, ARMA modeling, and statistical tests necessary for analyzing financial data. The content is based on a course taught at Wuhan University and references the textbook 'An Introduction to Analysis of Financial Data with R' by Ruey S. Tsay.

Uploaded by

bassirou ndao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views8 pages

cheatsheet的副本

This document serves as a cheat sheet for Time Series Analysis using R, summarizing essential theories and coding practices. It covers rudimentary coding commands, ARMA modeling, and statistical tests necessary for analyzing financial data. The content is based on a course taught at Wuhan University and references the textbook 'An Introduction to Analysis of Financial Data with R' by Ruey S. Tsay.

Uploaded by

bassirou ndao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

A Cheat Sheet of Time Series Analysis With R

Tianqi Li*, Muyang Ren†

IAS, Wuhan University


Last Updated: June 4, 2018

Introduction
This is a cheat sheet of Time Series Analysis which uses language R for programming. In this paper, we
briefly introduce theory of time series and provide respective R coding.
The cheat sheet is based on An Introduction to Analysis of Financial Data with R(Ruey S. Tsay, 2013). We
learned the course Applied Time Series Analysis in Wuhan University teaching by professor O Chia Chuang
and he chose this book as our textbook. Many coding and scripts come from the book.
Tianqi Li, Muyang Ren
May 21, 2018 in IAS.

Contents
1 Rudimental Coding 2
1.1 Basic Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Get financial data from the Internet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Statistical Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Simple Return and Log Return . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 ARMA 3
2.1 Stationary Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 ACF and PACF Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.2 Ljung-Box Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 General ARMA(p,q) model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Unit-root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Seasonal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.5 Long-memory time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.6 Model testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 ARCH 8

4 VaR and ES 8
*
email: [email protected]

email: @outlook.com

1
A Cheat Sheet of Time Series Analysis With R - 2/8 -

1 Rudimental Coding
1.1 Basic Commands
Here we list some basic commands for using R.
1 setwd('C:/TimeSeriesAnalysis/') # Set working directory
2 install.packages('quantmod') # Install library
3 library(quantmod) # Load Package
4 mydata = read.table("AAPL.txt", header=T) # Load txt file data
5 head(mydata) # Show head of data
6 dim(mydata) # Describe dimension of data
7 data$date = as.Date(as.character(data$date),"%Y%m%d") #Change numerical date 20180521 into Date form
2018-05-21, this is important in timeseries analysis
8 source("MyScript.R") # Load your R Script

To complete our works, all packages we need can be installed at once by the following coding:
1 install.packages(c("quantmod","fBasics","quantreg","ggplot2","fGarch","fUnitRoots","fracdiff"))

1.2 Get financial data from the Internet


We usually use getSymbols in quantmod to get data such as stocks from the Internet.
1 library(quantmod)
2 getSymbols("AAPL", from="2007-01-03", to="2011-12-02") # Get Data of Apple Stock
3 getSymbols("DEXUSEU", src="FRED") # Exchange rates from FRED

1.3 Statistical Commands


Before we start our analysis in time series data, we should learn some basic statistical commands.
1 library(fBasics) # Load package
2 basicStats(vector) # Show summary statistics of vector such as log(return)
3 mean(vector) # Sample mean
4 var(vector) # Sample variance
5 stdev(vector) # Standard deviation
6 t.test(vector) # Testing mean equals to zero
7 skewness(vector) # Skewness
8 kurtosis(vector) # Kurtosis
9 normalTest(vector, method="jb") # JB-test

1.4 Simple Return and Log Return


Simple return Rt has an intuitional understanding.

Pt − Pt−1
Rt =
Pt−1

It is usually used in our daily life. However, in time series analysis, we mainly use log return rt = log (Rt + 1)
instead for the following reason:

• The continuously compounded multiperiod return is simply the sum of continuously compounded one-
period returns involved.
rt [k] = rt−1 + rt−2 + · · · + rt−k+1

• Statistical properties of log returns are more tractable because of summation.


A Cheat Sheet of Time Series Analysis With R - 3/8 -

• Some time series data, for example, GDP, have an exponential growth, and take log form can transform
it into linear form.

• Log return transformation usually makes time series stationary.


In R, we use the following code to do log return transform.
1 lrtn = log(rtn+1) # rtn is the simple return

2 ARMA
2.1 Stationary Test
A necessary condition for using ARMA model to capture log return {rt } is that {rt } is weakly stationary. In
detail, the mean and variance of {rt } are supposed to be time-invariant. We can check the covariance γk :=
Cov(rt , rt−k ) between rt and rt−k to see whether a time series {rt } is weakly stationary or not.

2.1.1 ACF and PACF Plot


The definition of Autocorrelation Function (ACF) is
Cov(xt , xt−k ) γk
ρk = =
V ar(xt ) γ0
Partial AutoCorrelation Function (PACF) : PACF(i) = ϕi,i is obtained by multi-regression:

xt = ϕ0,1 + ϕ1,1 xt−1 + e1t


xt = ϕ0,2 + ϕ1,2 xt−1 + ϕ2,2 xt−2 + e2t
xt = ϕ0,3 + ϕ1,3 xt−1 + ϕ2,3 xt−2 + ϕ3,3 xt−3 + e3t
..
.

In R, we use acf and pacf to realize the plot.


1 # acf(x, lag.max = NULL, type = c("correlation", "covariance", "partial"), plot = TRUE, na.action =
na.fail, demean = TRUE)
2 # pacf(x, lag.max, plot, na.action)
3 acf(lrtn, lag=10)
4 pacf(lrtn, lag=10)

2.1.2 Ljung-Box Test


The null hypothesis is
H0 : ρ1 = · · · = ρm = 0
and the corresponding statistics Q∗ (m) is

m
ρ̂2l
Q∗ (m) = T (T + 2)
T −l
l=1

When Q(m) > χ2α we reject H0 and χ2α is the (1 − α)th quantile of χ2 distribution with freedom m. The
coding is
1 # Box.test(x, lag = 1, type = c("Box-Pierce", "Ljung-Box"), fitdf = 0)
2 Box.test(lrtn, lag = 1, type = "Ljung-Box")
A Cheat Sheet of Time Series Analysis With R - 4/8 -

2.2 General ARMA(p,q) model


Here we directly consider the ARMA(p,q) model instead of separately discuss AR(p) and MA(q) model. Gen-
erally, an ARMA(p,q) model is

p ∑
q
xt = ϕ0 + ϕi xt−i + at − θj at−j
i=1 j=1

where {at } is a white noise sequence, p and q are non-negative integer.


The order of (p,q) should be discussed at first. The following two methods can help us decide our model.
However, these statistics can be calculated only after data is fitted by corresponding model.

ACF and PACF For a stationary AR(p) model, there is a proposition:


{
not zero if j ≤ p
PACF(j) = ϕj,j =
0 if j > p
For an MA(q) model (MA model is definitely stationary), we have:
{
not zero if j ≤ p
ACF(j) =
0 if j > p
Notice that ARMA model doesn t have property above.

Information Criterion Now we introduce two information criterion, AIC and BIC, in order to determine the
order (p,q). Denote that T is the number of samples. likelihood function means maximal likelihood function.
−2 2
• AIC = ln(likelihood f unction) + × Number of parameter (Generally)
T T
2l
• AIC(l) = ln(σ̃l2 ) + (AR(l), σ̃l2 is the MLE estimator of σa2 .)
T
l × ln(T )
• BIC(l) = ln(σ̃l2 ) + (AR(l))
T
Smaller AIC or BIC are better.
We have several commands to estimate an ARMA(p,q) model.
1 # AR Model
2 model_ar = ar(lrtn, method="mle") # The command ar normalizes minimum AIC to 0
3 model_ar$order # Print the identified order
4
5 # ARMA Model
6 # arima(x, order = c(0L, 0L, 0L), seasonal = list(order = c(0L, 0L, 0L), period = NA), xreg = NULL,
include.mean = TRUE, transform.pars = TRUE, fixed = NULL, init = NULL, method = c("CSS-ML", "ML
", "CSS"), n.cond, SSinit = c("Gardner1980", "Rossignol2011"), optim.method = "BFGS", optim.
control = list(), kappa = 1e6)
7 model_arma = arima(lrtn, order=c(3,0,2)) # An ARMA(3,2) model

If we want to fix the coefficient of some variable, we need the option fixed. For instance, we need a modified
MA(3) model:
xt = c0 + at − θ1 at−1 − 0 ∗ at−2 − θ3 at−3
1 model_ma3 = arima(lrtn, order=c(0,0,3), fixed=c(NA,0,NA)) # An MA(3) model with specific coefficients

Prediction is also a piece of cake by R.


1 predict(model_ma3,10) # Using your model_ma3 to forecast the next 10 periods
A Cheat Sheet of Time Series Analysis With R - 5/8 -

2.3 Unit-root
Unit-root makes time series nonstationary. There are three typical examples of unit-root.

Random walk
pt = pt−1 + at

Random walk with drift


pt = µ + pt−1 + at

Trend stationary time series


pt = β0 + β1 t + xt
where {at } is a white noise series with zero mean.

We extend ARMA model such that it has a unit characteristic root, then it is called Autoregressive Integrated
Moving Average (ARIMA) which has unit-root causing long-memory. An efficient way to treat with ARIMA is
differencing. Here we give a formal definition of ARIMA(p,1,q). Assume a time series {yt }. If ct = yt −yt−1 =
(1 − B)yt satisfying ARMA(p,q), then {yt } is called ARIMA(p,1,q). Differencing makes {yt } stationary. Not
surprisingly, ARIMA(p,n,q) (n ≥ 2) exists.
1 ct = diff(yt,1) # order 1 differencing

A reliable way to test unit-root is Augmented Dickey-Fuller Test (ADF-Test):


p−1
xt = ct + βxt−1 + ϕi ∆xt−i + et
i=1

and the null hypothesis is H0 : β = 1. The alternative hypothesis is H1 : β < 1. ct is an explicit function
depending on time. ∆xj = xj − xj−1 is a differencing sequence. Actually, ct usually has three forms:

 0 , random walk
ct = c , random walk with drift

w0 + w1 t , exponential growth

The choice of ct is based on your data.

β̂ − 1
ADF-statistics =
Std. dev of β̂

where β̂ is the OLS estimator of β.


1 library(fUnitRoots) # Loading package for unit-root
2 adfTest(lrtn, lags=10, type=c("c")) # Here type can be 0, c and ct

2.4 Seasonal
Actually, there are some time series which have periodic or cyclical behavior. This type of series is called a
seasonal time series. We usually eliminate the seasonal behavior by differencing preceding analysis of time
series, which is called seasonal adjustment. For example, if our data has a strong evidence of autocorrelation at
lags of 4, we need to consider a seasonal differencing

∆4 = 1 − B 4
A Cheat Sheet of Time Series Analysis With R - 6/8 -

From the ACF plot, we can observe whether there are significant and non-zero values at each lag. If so, we may
make difference to make the time series stationary.
Furthermore, multiplicative seasonal models should be taken into consideration.

(1 − B)(1 − B s )xt = (1 − θB)(1 − ΘB s )at

To do with seasonal series, we can use arima with option seasonal


1 #(1-B)(1-B^4)x_t=(1+ma1*B)(1+sma1*B^4)a_t
2 model = arima(lrtn, order=c(0,1,1), seasonal=list(order=c(0,1,1), period=4)

In addition, we use dummy variable to capture exact seasonality (For monthly return, ACF is significantly
non-zero at lags of 12, 24 and 36, etc.). We can use dummies in OLS to check it.
1 Jan = rep(c(1, rep(0,11), n) # generate a sequence of one 1 and eleven 0 for n times
2 model_test = lm(lrtn~Jan)
3 summary(model_test)

2.5 Long-memory time series


For a unit-root nonstationary time series, it can be shown that the sample ACF converges to 1 for all fixed lags
as the sample size increases. However, there are other time series whose ACF decreases at a polynomial rate
while lags increasing. These time series are called long-memory time series. To fit these time series, we define
ARFIMA(p,d,q) model.
As is known,
∑∞
d(d − 1) . . . (d − k + 1) k
(1 − B)d = (−1)k B
k!
k=0

If a differencing sequence (1 − B)d xt follows an ARMA(p,q) model, we called it ARFIMA(p,d,q) process. Here
d is not necessarily integer.
We use package fracdiff to estimate AFRIMA(p,d,q).
1 model = fracdiff(lrtn, nar=1, nma=1) #AFRIMA(1,d,1), d is estimated.

2.6 Model testing


If we focus on the dynamic structure of a time series, we can use AIC, BIC and variance of residual to examine
our model.
If our aim is to forecast, a good way is to use Mean Square of Forecast Error (MSFE) to measure its ability
of prediction. Here we briefly introduce the procedure of backtesting.

1. Split our dataset for training {xt |t = 1, . . . , h} and testing{xt |t = h + 1, . . . , T };

2. Use {xt |t = 1, . . . , h} to train model in order to estimate x̂h (1) and calculate eh (1) = xh+1 − x̂h (1);

3. Expand training set to {xt |t = 1, . . . , h + 1} and, similar to step 2, estimate x̂h+1 (1) and calculate
eh+1 (1) = xh+2 − x̂h+2 (1);

4. Iterate step 3 until the last sample.

And smaller MSFE is better. ∑T −1


j=k [ej (1)]2
MSFE(m) =
T −h
A Cheat Sheet of Time Series Analysis With R - 7/8 -

Also we can use Mean absolute Forecast Error (MAFE) and Bias followed.
∑T −1 ∑T −1
j=k |ej (1)| j=h ej (1)
MAFE(m) = , Bias(m) =
T −h T −h

Notice: Here we use a R Script to realize backtest. backtest.R can be downloaded on Tsay’s homepage.1
1 source("backtest.R")
2 backtest_model = backtest(model, lrtn, 215, 1) # Return MSFE and MAFE
3
4
5 # Raw code of backtest.R
6 "backtest" <- function(m1,rt,orig,h,xre=NULL,fixed=NULL,inc.mean=TRUE){
7 # m1: is a time-series model object
8 # orig: is the starting forecast origin
9 # rt: the time series
10 # xre: the independent variables
11 # h: forecast horizon
12 # fixed: parameter constriant
13 # inc.mean: flag for constant term of the model.
14 #
15 regor=c(m1$arma[1],m1$arma[6],m1$arma[2])
16 seaor=list(order=c(m1$arma[3],m1$arma[7],m1$arma[4]),period=m1$arma[5])
17 T=length(rt)
18 if(!is.null(xre) && !is.matrix(xre))xre=as.matrix(xre)
19 ncx=ncol(xre)
20 if(orig > T)orig=T
21 if(h < 1) h=1
22 rmse=rep(0,h)
23 mabso=rep(0,h)
24 nori=T-orig
25 err=matrix(0,nori,h)
26 jlast=T-1
27 for (n in orig:jlast){
28 jcnt=n-orig+1
29 x=rt[1:n]
30 if (!is.null(xre)){
31 pretor=xre[1:n,]
32 mm=arima(x,order=regor,seasonal=seaor,xreg=pretor,fixed=fixed,include.mean=inc.mean)
33 nx=xre[(n+1):(n+h),]
34 if(h==1)nx=matrix(nx,1,ncx)
35 fore=predict(mm,h,newxreg=nx)
36 }
37 else {
38 mm=arima(x,order=regor,seasonal=seaor,xreg=NULL,fixed=fixed,include.mean=inc.mean)
39 fore=predict(mm,h,newxreg=NULL)
40 }
41 kk=min(T,(n+h))
42 # nof is the effective number of forecats at the forecast origin n.
43 nof=kk-n
44 pred=fore$pred[1:nof]
45 obsd=rt[(n+1):kk]
46 err[jcnt,1:nof]=obsd-pred
47 }
48 #
49 for (i in 1:h){
50 iend=nori-i+1
51 tmp=err[1:iend,i]
52 mabso[i]=sum(abs(tmp))/iend
1
http://faculty.chicagobooth.edu/ruey.tsay/teaching/introTS/backtest.R
A Cheat Sheet of Time Series Analysis With R - 8/8 -

53 rmse[i]=sqrt(sum(tmp^2)/iend)
54 }
55 print("RMSE␣of␣out-of-sample␣forecasts")
56 print(rmse)
57 print("Mean␣absolute␣error␣of␣out-of-sample␣forecasts")
58 print(mabso)
59 backtest <- list(origin=orig,error=err,rmse=rmse,mabso=mabso)
60 }

3 ARCH

4 VaR and ES

You might also like