Lin Regr and Arima

This document compares an ARIMA time series model to a linear regression model fitted through machine learning on foreign exchange return data. It fits an ARIMA(0,1,1) model to EUR/USD exchange rate data and shows the results. It then builds a Bayesian linear regression model on the same data set and performs variational inference to estimate the posterior distribution over the model parameters. The goal is to compare the ARIMA approach commonly used in finance to a supervised learning regression model.


Lin_Regr_Finance-inverseX&Y-text 6/14/17, 5:51 PM

ARIMA vs. Linear Model Fitted through Machine Learning

Linear time series analysis is among the most common techniques for analyzing data in
finance, and in other industries, wherever a variable at time t is assumed to depend linearly
on its values at previous times t-1, t-2, and so on. Treating an asset return as a collection
of random variables over time, and capturing the linear relationship between the asset return
and the information available prior to time t, provides a natural framework for studying the
dynamic structure of a time series.

Since we are doing the analysis in Python, we need to import a few modules:

In [4]: import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal
%matplotlib inline
import pandas as pd
from pandas import DataFrame, Series
import statsmodels
import statsmodels.api as sm
import matplotlib.pyplot as plt

Correlations between the variable of interest and its past values differ by type of variable,
whether those are monthly stock returns, value-weighted index returns or foreign exchange
returns. This determines the type of model that is likely to fit best. In the finance literature,
one version of the Capital Asset Pricing Model (CAPM) theory holds that the return of an asset
is not predictable and should have no autocorrelations. For demonstration purposes in this
example, let's look at a series of foreign exchange data for the EUR/USD pair over a short
period in 2012:

Import FOREX data file

In [3]: forex = pd.read_csv('/Users/DrC-GStefanita/Desktop/FOREX.csv',index_col=[


parse_dates=['date'])

http://localhost:8888/notebooks/Lin_Regr_Finance-inverseX%26Y-text.ipynb# Page 1 of 6

In [3]: y = pd.DataFrame(forex.price, index=forex.index)
print(y.head())

price
date
2012-09-30 23:12:00 1.281598
2012-09-30 23:13:00 1.281041
2012-09-30 23:14:00 1.281705
2012-09-30 23:16:00 1.280685
2012-10-01 00:00:00 1.280717
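The statement above that unpredictable returns should show no autocorrelation can be checked numerically. The following is a minimal sketch, not part of the original notebook: the helper `sample_autocorr` is a hypothetical name, and a seeded random walk stands in for the EUR/USD prices, since the CSV file itself is not available here.

```python
import numpy as np

def sample_autocorr(series, max_lag=5):
    """Sample autocorrelation of a 1-D series at lags 1..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

# Hypothetical stand-in for the price column: a seeded random walk near 1.28.
rng = np.random.default_rng(0)
prices = 1.28 + np.cumsum(rng.normal(scale=3e-4, size=500))

# Log returns are the first differences of the log prices.
log_returns = np.diff(np.log(prices))

acf = sample_autocorr(log_returns)
# Approximate 95% band for the autocorrelations of white noise.
band = 1.96 / np.sqrt(len(log_returns))
print("lag 1..5 autocorrelations:", acf)
print("approximate 95% band: +/-", band)
```

Autocorrelations that stay inside the band are consistent with the no-predictability view; values well outside it would suggest that a model with AR or MA terms has something to capture.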

It is common to fit an Autoregressive Integrated Moving Average (ARIMA) model of the simplest
form, ARIMA(0,1,1), which is a basic exponential smoothing model, either to better understand
the data or to predict future points in the series (forecasting). ARIMA models are preferred
for foreign exchange data when the series is assumed to be non-stationary, because the
integrated part of the model differences the series to remove that non-stationarity. We fit
such a model, as shown below:

In [4]: # We fit an ARIMA(0,1,1) model via maximum likelihood.
mod_arima = statsmodels.tsa.api.ARIMA(y, order=(0,1,1))
res_arima = mod_arima.fit()

And print the results:


In [5]: # Show the summary of results
print(res_arima.summary())

                             ARIMA Model Results
==============================================================================
Dep. Variable:                D.price   No. Observations:                  202
Model:                 ARIMA(0, 1, 1)   Log Likelihood                1266.011
Method:                       css-mle   S.D. of innovations              0.000
Date:                Wed, 14 Jun 2017   AIC                          -2526.022
Time:                        07:54:03   BIC                          -2516.097
Sample:                    09-30-2012   HQIC                         -2522.006
                         - 10-01-2012
=================================================================================
                    coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------
const          3.635e-05   3.43e-05      1.061      0.290   -3.08e-05       0.000
ma.L1.D.price     0.0607      0.070      0.874      0.383      -0.075       0.197
                                    Roots
=============================================================================
                 Real           Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
MA.1          -16.4632           +0.0000j           16.4632            0.5000
-----------------------------------------------------------------------------
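The description of ARIMA(0,1,1) as exponential smoothing can be checked directly. The sketch below is illustrative only. It assumes the sign convention in which the model is written y[t] = y[t-1] + e[t] + theta * e[t-1], ignores the small `const` (drift) term, and uses a hypothetical seeded price path rather than the FOREX file. Under those assumptions the one-step ARIMA forecast coincides with exponential smoothing with weight alpha = 1 + theta.

```python
import numpy as np

theta = 0.0607        # ma.L1.D.price estimate from the summary table
alpha = 1.0 + theta   # implied smoothing weight (assumes zero drift)

# Hypothetical price path standing in for the EUR/USD series.
rng = np.random.default_rng(1)
y = 1.28 + np.cumsum(rng.normal(scale=3e-4, size=200))

# ARIMA(0,1,1) one-step forecasts: f[t+1] = y[t] + theta * (y[t] - f[t])
f_arima = np.empty_like(y)
f_arima[0] = y[0]
for t in range(len(y) - 1):
    f_arima[t + 1] = y[t] + theta * (y[t] - f_arima[t])

# Exponential smoothing: f[t+1] = alpha * y[t] + (1 - alpha) * f[t]
f_ses = np.empty_like(y)
f_ses[0] = y[0]
for t in range(len(y) - 1):
    f_ses[t + 1] = alpha * y[t] + (1.0 - alpha) * f_ses[t]

print("max difference between the two recursions:",
      np.max(np.abs(f_arima - f_ses)))

# The MA root solves 1 + theta * z = 0; compare with the Roots table.
ma_root = -1.0 / theta
print("MA root from the rounded coefficient:", ma_root)
```

With a positive theta the implied alpha is slightly above 1, outside the usual 0 to 1 smoothing range, which fits the table's message that the MA term is small and not statistically significant (P>|z| = 0.383). The root computed from the rounded coefficient differs slightly from the tabulated -16.4632 because the table prints theta to only four decimals.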

Assuming we accept this model, we now want to see how a Supervised Learning Regression
Model in Python would compare to this particular choice. In conventional estimation, as used
for the ARIMA fit, we choose the parameter values that maximize the likelihood of the data, so
the parameters are treated as fixed constants. The Bayesian approach to regression used in
Machine Learning turns these concepts upside down: shouldn't we instead maximize the
probability of the parameters given the data set? If so, the data is considered fixed, a
constant, and the parameters are now random variables that have a probability distribution
function (pdf). We want to maximize the probability of a random variable given the data we
just observed. You would then update the probability of the parameters given the trends you
observe.
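The contrast between the two viewpoints can be made concrete in a case where the posterior has a closed form. The sketch below is an illustration under stated assumptions, not the notebook's method: a single slope b with no intercept, a Normal(0, 1) prior on b, a known noise variance of 1, and hypothetical synthetic data.

```python
import numpy as np

# Hypothetical data: y = 0.7 * x + unit-variance Gaussian noise.
rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(size=n)

prior_mean, prior_var = 0.0, 1.0   # prior b ~ Normal(0, 1)
noise_var = 1.0                    # noise variance, assumed known

# Maximum likelihood: one fixed number for the slope.
b_mle = np.dot(x, y) / np.dot(x, x)

# Bayesian update: prior precision plus data precision gives posterior precision.
post_prec = 1.0 / prior_var + np.dot(x, x) / noise_var
post_var = 1.0 / post_prec
post_mean = post_var * (prior_mean / prior_var + np.dot(x, y) / noise_var)

print("maximum likelihood slope:", b_mle)
print("posterior mean and variance of the slope:", post_mean, post_var)
```

The likelihood approach returns a point estimate, while the Bayesian approach returns a whole distribution whose variance shrinks as more data is observed. When no closed form exists, the posterior has to be approximated instead.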

For comparison to the ARIMA model, we build a Supervised Learning Regression Model.

Our Data Set:

In [6]: n = len(y)

In [7]: y = np.array(y)

In [8]: b_true = np.random.randn(1)

In [9]: x = np.random.randn(n)
x = x.reshape(n,1)

In [36]: y_train = y /b_true + np.random.randn(n,1)

In [37]: y_train = y_train.reshape(n,1)

In [38]: # The Bayesian Linear Regression Model
X = tf.placeholder(tf.float32, [n, 1])
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
alpha = Normal(loc=tf.zeros(1), scale=tf.ones(1))
yt = Normal(loc=((X - alpha) / b), scale=tf.ones(1))

The Bayesian Linear Regression Model assumes a linear relationship between inputs x and
outputs y with normally distributed noise. Our task is to infer hidden structure from labeled
data comprised of training examples. The latent variables are the linear model's weight b and
the intercept alpha, also known as the bias. We define a placeholder X, and during inference
we pass in the value for this placeholder according to the data.

The next step is to infer the posterior using variational inference:

In [39]: # Prepare Inference
qb = Normal(loc=tf.Variable(tf.random_normal([1])),
            scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))
qalpha = Normal(loc=tf.Variable(tf.random_normal([1])),
                scale=tf.nn.softplus(tf.Variable(tf.random_normal([1]))))


In [40]: sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

Then run variational inference with the Kullback-Leibler divergence, using 250 iterations and
5 latent variable samples in the algorithm:

In [41]: inference = ed.KLqp({b: qb, alpha: qalpha}, data={X: x, yt: y_train})
inference.run(n_samples=5, n_iter=250)

250/250 [100%] Elapsed: 5s | Loss: 755.810
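To give a feel for the quantities involved in this step, the sketch below computes the Kullback-Leibler divergence between two univariate Normal distributions in closed form and compares it with Monte Carlo estimates based on few and many samples. It is a simplified illustration, not a reimplementation of `ed.KLqp`: the helper names `softplus`, `normal_logpdf`, `kl_normal` and `mc_kl` and the parameter values are hypothetical, and the target is a fixed Normal(0, 1) rather than the true posterior.

```python
import numpy as np

def softplus(v):
    """log(1 + exp(v)): maps any real number to a positive scale."""
    return np.logaddexp(0.0, v)

def normal_logpdf(z, loc, scale):
    """Log density of Normal(loc, scale) evaluated at z."""
    return (-0.5 * np.log(2.0 * np.pi) - np.log(scale)
            - 0.5 * ((z - loc) / scale) ** 2)

def kl_normal(q_loc, q_scale, p_loc, p_scale):
    """Closed-form KL(q || p) for two univariate Normal distributions."""
    return (np.log(p_scale / q_scale)
            + (q_scale ** 2 + (q_loc - p_loc) ** 2) / (2.0 * p_scale ** 2)
            - 0.5)

# Hypothetical variational parameters; the scale is kept positive via softplus.
q_loc, q_scale = 0.5, softplus(0.3)
p_loc, p_scale = 0.0, 1.0

exact = kl_normal(q_loc, q_scale, p_loc, p_scale)

rng = np.random.default_rng(3)

def mc_kl(n_samples):
    """Monte Carlo estimate of KL(q || p) using samples drawn from q."""
    z = rng.normal(q_loc, q_scale, size=n_samples)
    return np.mean(normal_logpdf(z, q_loc, q_scale)
                   - normal_logpdf(z, p_loc, p_scale))

print("closed form:", exact)
print("5 samples:", mc_kl(5))
print("200000 samples:", mc_kl(200_000))
```

An estimate built from only 5 samples is noisy, which is one reason the reported loss can fluctuate from one iteration to the next. The softplus transform plays the same role as in the `qb` and `qalpha` definitions: it lets an unconstrained variable parameterize a strictly positive scale.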

Criticism

We then evaluate the regression by comparing prediction accuracy on test data. In our case
the test targets are constructed from the fitted ARIMA model parameters (the MA coefficient
0.0607 and the constant 3.635e-05):

In [42]: yt_post = ed.copy(yt, {b: qb, alpha: qalpha})

In [43]: yt_test = (y - 0.0607) / 3.635e-05

In [44]: yt_test = yt_test.reshape(n,1)
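Prediction accuracy on test data is usually summarized with an error metric. The sketch below shows one way to compute the mean squared error and mean absolute error of a posterior predictive mean using NumPy only. Everything in it is a stand-in: the test data are synthetic, the arrays `b_samples` and `alpha_samples` are hypothetical draws playing the role of samples from `qb` and `qalpha`, and the prediction uses the slope-intercept form b * x + alpha that also appears in the plotting function.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100

# Hypothetical held-out data from a line with slope 2.0 and intercept 0.5.
x_test = rng.normal(size=(n, 1))
y_test = 2.0 * x_test + 0.5 + rng.normal(scale=0.1, size=(n, 1))

# Hypothetical posterior draws for the slope and the intercept.
b_samples = rng.normal(loc=2.0, scale=0.05, size=200)
alpha_samples = rng.normal(loc=0.5, scale=0.05, size=200)

# One prediction per posterior draw and test point: shape (200, n).
preds = b_samples[:, None] * x_test[:, 0][None, :] + alpha_samples[:, None]

# Average over the draws to get the posterior predictive mean.
pred_mean = preds.mean(axis=0)

mse = np.mean((pred_mean - y_test[:, 0]) ** 2)
mae = np.mean(np.abs(pred_mean - y_test[:, 0]))
print("mean squared error:", mse)
print("mean absolute error:", mae)
```

Comparing such metrics on data that was not used for fitting gives a more direct basis for judging the regression against the ARIMA model than a visual check alone.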

We can now visualize the fit by comparing data generated with the prior to data generated with
the posterior:

In [45]: def visualise(X_data, Y_data, b, alpha, n_samples=10):
    b_samples = b.sample(n_samples)[:, 0].eval()
    alpha_samples = alpha.sample(n_samples).eval()
    plt.scatter(X_data[:, 0], Y_data)
    inputs = np.linspace(-8, 8, num=1000)
    for ns in range(n_samples):
        output = inputs * b_samples[ns] + alpha_samples[ns]
        plt.plot(inputs, output)


In [46]: # Visualize samples from the prior.
visualise(x, y_train, b, alpha)

In [47]: # Visualize samples from the posterior.
visualise(x, y_train, qb, qalpha)

The Bayesian Linear Regression Model has learned a linear relationship between the exchange
rate and the return outputs.
