Forecasting Daily and Monthly Exchange Rates With Machine Learning Techniques

Vasilios Plakandaras [1], Theophilos Papadimitriou [*], Periklis Gogas [+]

Department of Economics
Democritus University of Thrace.
Abstract
We combine signal processing with machine learning methodologies by introducing a
hybrid Ensemble Empirical Mode Decomposition (EEMD), Multivariate Adaptive
Regression Splines (MARS) and Support Vector Regression (SVR) model in order to
forecast the monthly and daily Euro (EUR)/United States Dollar (USD), USD/Japanese
Yen (JPY), Australian Dollar (AUD)/Norwegian Krone (NOK), New Zealand Dollar
(NZD)/Brazilian Real (BRL) and South African Rand (ZAR)/Philippine Peso (PHP)
exchange rates. After the decomposition with EEMD of the original exchange rate series
into a smoothed and a fluctuation component, MARS selects the most informative input
datasets from the plethora of variables included in our initial data set. The selected
variables are fed into two distinct SVR models for forecasting each component
separately one period ahead for daily and monthly data. The summation of the two
forecasted components provides exchange rate forecasts. The above implementation
exhibits superior forecasting ability in exchange rate forecasting compared to various
models. Overall the proposed model a) is a combination of empirically proven effective
techniques in forecasting time series, b) is data driven, c) relies on minimum initial
assumptions and d) provides a structural aspect of the forecasting problem.
JEL Code: G15
Key words: Exchange rate forecasting, Support Vector Regression, Multivariate
Adaptive Regression Splines, variable selection, Ensemble Empirical Mode
Decomposition, time series forecasting.

[1] [email protected]
[*] [email protected], corresponding author
[+] [email protected]

1. Introduction
Following the breakdown of the Bretton Woods fixed exchange rate system, a large
number of models were proposed in order to outperform the random walk model in
exchange rate forecasting. Forecasting the evolution of exchange rates, apart from
influencing the way an investor chooses to allocate wealth into different portfolios, is
also a key determinant in macroeconomic variable forecasting, since future
macroeconomic information is projected on exchange rate values and vice versa (Rime,
Sarno and Sojli, 2010).
There is a vast literature regarding exchange rate forecasting. Overall, we can separate
existing research initiatives into two general categories: models examining the influence
of macroeconomic variables on the exchange rate markets and those that build on a
micro-structural market approach. The first category includes the so-called monetary
exchange rate models which attempt to establish a link between the evolution of
exchange rates and the variability of fundamentals. To mention just a few, one of the
first attempts to examine exchange rate evolution is the classic model of Mundell (1968)
and Fleming (1962) and its extension, i.e. the sticky price monetary model proposed by
Dornbusch (1976). From a similar perspective, the flexible price monetary model was for
many years the workhorse of exchange rate economics (Bilson, 1978; Stockman, 1980;
Lucas, 1982), extended by Frankel (1979) to its real interest rate differential variant [1].

A large part of the critique against the ability of fundamentals to forecast exchange rates
stems from the work of Meese and Rogoff (1983). In their seminal paper they test
various monetary exchange rate models against a random walk (RW) with drift
in out-of-sample forecasting. They find that, regardless of the version of the monetary
model used, none outperforms the RW in terms of Mean Square Error. This finding
has for many years been the standard benchmark for exchange rate forecasting models.
Mark and Sul (2001), using panel regression models, reject the univariate framework of
Meese and Rogoff (1983), concluding that fundamentals possess significant forecasting
ability when examined on cross sectional data. On a broader perspective, Cheung et al
(2005) argue against the ability of the widely used Mean Square Error metric to
evaluate the forecasting power of a structural model. Studying a wide variety of
monetary exchange rate models, they find that the forecasting ability of each model
depends on the time period considered for evaluation. On a more promising path,
Molodtsova and Papell (2009) report superior out-of-sample predictability on short
forecasting horizons for monetary structural models compared to a RW for a sample of
eleven currencies. Overall, the examination of the existing literature leads to the
conclusion that fundamentals have not yet established their value in exchange rate
forecasting since their main drawbacks have not been convincingly overturned.

[1] A complete review of the monetary exchange rate literature is beyond the scope of this
paper. The interested reader is referred to Ronald Macdonald (2010).
Under a different perspective, the microstructure proposition of the exchange rate
market focuses on extremely short term forecasting (from daily to tick-by-tick data) and
incorporates a broad category of input variables. Evans and Lyons (2002, 2005) find that
institutional aspects of the foreign exchange market such as order flow volume (net
value of long and short positions) can be of significant importance. As Lyons (2001)
puts it, unlike macroeconomic expectations that are based on surveys, order flow is a
more realistic measurement of market sentiment since it represents one's direct
willingness to back one's beliefs up with money. Nevertheless, Sager and Taylor (2008)
examine the forecasting ability of various currencies and forecasting horizons and reject
the superiority of order flow models against the RW. Killeen et al (2006) argue that the
predictive information content in order flow models decays rapidly over time and thus
their forecasting ability is time limited since exchange rates revert back to a RW model,
implying that the exchange rate forecasting puzzle is far from being considered fully
tackled.
In a similar vein, Karemera and Kim (2006) develop Auto Regressive Integrated
Moving Average (ARIMA) models that outperform random walk models for a number
of currencies on a monthly forecasting horizon but the forecasting ability is strongly tied
to the time period under evaluation. Cheung (1993) detects long-run memory in
exchange rates and proposes the use of Auto Regressive Fractionally Integrated Moving
Average, but with limited success. On the other hand, Generalized Auto Regressive
Conditional Heteroskedasticity (GARCH) models are a popular selection among
practitioners for volatility modeling, although they lack clear-cut validation of their
forecasting ability. Bollerslev and Wright (2001) compare GARCH and Exponential
GARCH models to autoregressive models on high frequency data, concluding that
GARCH models possess forecasting power but tend to forecast worse than AR models
on high frequency series such as daily or intraday data. Later, Galbraith and Kisinbay
(2005) reach similar findings for the Deutsche Mark/USD and Japanese
Yen (JPY)/USD exchange rates. Projections based on past realized variance with
autoregressive models provide better one step ahead forecasts than GARCH and
Fractionally Integrated GARCH models for a period of up to 30 trading days.
Apart from econometricians, the intense economic interest for exchange rate forecasting
has attracted researchers from a wide variety of scientific areas. Machine Learning
methodologies and more specifically Neural Networks (NN) (Semprinis et al, 2013) and
Support Vector Machines (SVM) gained significant merit based on their ability to model
non-linear systems with minimum initial assumptions and high forecasting accuracy.
Dunis and Williams (2002) compare alternative models: NN, Random Walk, Auto
Regressive Conditional Heteroskedasticity, Moving Average Convergence/Divergence,
Auto Regressive Moving Average and a Logit model on the EUR/USD exchange rate.
The sample spans May 2000 to July 2001 and they use a daily forecasting horizon. They
verify the statistical and economic superiority of the NN based scheme over the
alternative models. Brandl et al (2009) use Genetic Algorithms for feature
selection and build their model using the SVR methodology. The initial dataset includes
variables suggested by the Purchasing Power Parity, the Covered Interest Rate Parity,
and the Uncovered Interest Rate Parity. The proposed model outperforms an NN, an
OLS regression and an ARIMA model on monthly out-of-sample forecasting for
EUR/USD, USD/JPY and USD/GBP rates. On a similar research framework, Ince and
Trafalis (2006) couple ARIMA and SVM for improving directional forecasting of the
EUR/USD exchange rate and outperforming Logit/Probit models.
The use of the RW model as a forecasting benchmark is also a concurrent test
of the Efficient Market Hypothesis (EMH). Proposed by Eugene Fama (1965), the EMH states
that the determination of prices in an efficient market follows a random walk and thus it
is impossible to create a forecasting model that achieves sustainable positive returns in
the long run. The EMH is usually presented in three forms: the weak, the semi-strong
and the strong form of efficiency. We have a weak-form efficient market when historic
prices of the variable in question cannot forecast the future ones, as the generating
mechanism of prices follows a random walk. Thus, autoregressive models have no
forecasting power and the best forecast of the next period's price is today's price.
Semi-strong efficiency imposes stricter assumptions in that all historic prices and all
publicly available information are already reflected in current asset prices and thus
cannot be used successfully in forecasting. Finally, the strong form of the EMH builds on
the semi-strong case, adding all private information and thus making it impossible to
forecast successfully the future evolution of an asset's price. Overall, outperforming the RW
model is an indication of potential economic gains for a trader that follows an alternative
trading strategy.
In this paper we propose a hybrid model that combines signal processing with Machine
Learning methodologies. Following the existing literature we compile an initial dataset
containing a plethora of variables. Then we suppress noise by decomposing all series
with EEMD into a smoothed and a fluctuation component. MARS selects the most
informative input dataset for each component of the forecasted exchange rate. In the final
stage, two SVR models are trained for forecasting each component of the exchange rate
separately. The sum of the two forecasted component series is evaluated for
out-of-sample forecasting.
We choose one day ahead forecasting as suggested by Rime, Sarno and Sojli (2010)
because: (i) one-day ahead forecasts are implementable, (ii) it is a relevant horizon for
practitioners (e.g. most currency hedge funds), (iii) unlike intraday forecasts it involves
interest rate considerations, and (iv) it is unlikely that gradual learning based on this data
will allow forecasting at much longer horizons. Moreover, one month ahead forecasting
is evaluated for comparison to a longer horizon. We consider our method for the Euro
(EUR)/United States Dollar (USD), USD/Japanese Yen (JPY), Australian Dollar
(AUD)/Norwegian Krone (NOK), New Zealand Dollar (NZD)/Brazilian Real (BRL) and
South African Rand (ZAR)/Philippine Peso (PHP) exchange rates. According to the
triennial survey of BIS (2010), USD, EUR, JPY, AUD and NOK belong to the ten most
traded currencies in the foreign exchange rate market [2], while the last three (BRL, PHP
and ZAR) exhibit very small trading volumes. Since the trading volume of a currency reflects
the economic interest of traders, we select USD, EUR and JPY as currencies with high, NOK,
AUD and NZD as currencies with medium and BRL, PHP and ZAR as currencies with small
trading volumes. In this way we evaluate the forecasting power of our proposed model by
examining exchange rates of different volumes, since we expect that different trading interest
leads to different versions of market efficiency.
According to the empirical findings of West et al (1993), Cheung et al (2005) and Rime et
al (2010), statistical evaluation alone cannot guarantee tangible economic gains for a
trader. Therefore we extend our research framework beyond statistical metrics such as
MSE and also conduct a mean-variance analysis of returns for a dynamic trading strategy
based on the forecasts provided by our model. We calculate the Sharpe ratio (SR) of a
trading strategy following forecasts of the proposed model, since SR is a common
performance measure among practitioners.
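To make the economic evaluation concrete, the following minimal sketch computes the Sharpe ratio of a forecast-driven strategy. The trading rule (long when the forecast exceeds the current rate, short when it is below), the zero risk-free rate, the annualization over 252 trading days and the toy numbers are all illustrative assumptions; the paper's actual strategy is not specified at this point.

```python
import math
from statistics import mean, stdev


def strategy_returns(prices, forecasts):
    """Per-period returns of a simple long/short rule (an assumed rule, for illustration).

    forecasts[t] is the one-period-ahead forecast of prices[t + 1] made at time t.
    The position is +1 (long) if the forecast exceeds the current price,
    -1 (short) if it is below, and 0 (flat) if they are equal.
    """
    returns = []
    for t in range(len(prices) - 1):
        diff = forecasts[t] - prices[t]
        position = (diff > 0) - (diff < 0)  # sign of the expected move
        realized = (prices[t + 1] - prices[t]) / prices[t]
        returns.append(position * realized)
    return returns


def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free_per_period for r in returns]
    sd = stdev(excess)
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return mean(excess) / sd * math.sqrt(periods_per_year)


# Toy data (hypothetical): observed rates and forecasts of the next rate.
prices = [1.30, 1.31, 1.29, 1.32, 1.33]
forecasts = [1.32, 1.30, 1.31, 1.34]
rets = strategy_returns(prices, forecasts)
sr = sharpe_ratio(rets)
```

For monthly data one would pass `periods_per_year=12`; transaction costs are ignored in this sketch.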

[2] USD: 42.45%, EUR: 19.05%, JPY: 9.5%, NZD: 0.9%, NOK: 0.65%, BRL: 0.35%, PHP: 0.1% and ZAR: 0.35% of
the total trading volume.

2. Methodology and Dataset
2.1 Methodology overview
As reported by Ronald Macdonald (2010), the volatility of exchange rate time series is
significantly higher than the volatility of other macroeconomic variables, with the
exception of interest rate differentials. This phenomenon can be attributed to
transaction costs (which shift the observed price of the series away from its original value)
and to the difficulty of observing the exact price of the exchange rate, since the daily
exchange rate is the approximation of a large number of tick-by-tick data. Smoothing
techniques can provide an alternative representation of the exchange rate series closer to
the original underlying phenomenon, reducing noise in the data set.
In the voluminous literature regarding time series smoothing, several methodologies are
proposed [3]. A key issue in all smoothing methodologies is the definition of the optimal
point, beyond which smoothing results in distortion rather than noise reduction.
Since the time series of the actual phenomenon is always unknown and we
can only observe its noise additive variant, we can only approximate its actual representation
through the use of a smoothing function. Unfortunately, many smoothing
implementations select the smoothness parameters of the model ad hoc, lacking
empirical evidence. To overcome this drawback, our implementation exploits the
unique abilities of a relatively novel signal decomposition method called Ensemble
Empirical Mode Decomposition (EEMD) coupled with the criteria proposed by Guo and
Tse (2013) and the trend extraction method of Moghtaderi et al (2013). The main
advantage of this smoothing framework is the absence of initial assumptions about the data
and the tuning of all model parameters based solely on data characteristics.
Another difficulty in training a forecasting model is the proper selection of input
variables. A large part of the existing literature selects the input dataset of the
forecasting model arbitrarily, based on the propositions of a theoretical framework such as the
aforementioned sticky or flexible price models. Instead of hand-picking variables and then
searching for the empirical confirmation of our decision, we implement Multivariate
Adaptive Regression Splines (MARS) for automatic variable selection based on
statistical loss metrics. After decomposing all initial time series into a smoothed and a
residual fluctuating component (from each series we get its two components), we train
two MARS models for selecting the input variables of every exchange rate: one MARS
model for the smoothed and one for the fluctuating component. The selected variables
are then fed into two Support Vector Regression (SVR) models for forecasting each
component separately. Finally, the summation of the two forecasted component series is
evaluated for out-of-sample forecasting. An overview of the proposed model is depicted
in Figure 1.

[3] For a detailed survey on the field of trend extraction and smoothing on time series the interested
reader is referred to Alexandrov et al (2009).

Figure 1: Overview of the proposed EEMD-MARS-SVR model.
An innovation of this paper in comparison to other noise reduction methodologies
(Premanode and Toumazou, 2013) is that we model the fluctuating part of the initial
time series separately from its smoothed variant, while taking into account forecasts of
both parts in order to avoid information loss. In other words, since the observed value of
the exchange rate in the exchange rate market is always a noise additive version of the
underlying phenomenon, we split the observed value into a smoothed part
and a noise component (fluctuating part). Thus we can produce models that are better
fitted to the characteristics of each series. Then, we reconstruct a forecasted proxy of the
actual observed value from both component forecasts so as to approximate the actual
observed exchange rate value, which always includes noise.
2.2 Ensemble Empirical Mode Decomposition and time series smoothing
The EEMD is a data driven algorithm that decomposes a time series into finite additive
oscillatory components called Intrinsic Mode Functions (IMFs). Proposed by Wu and
Huang (2009), the main advantage of EEMD is the lack of initial assumptions about data,
such as the need for linearity or stationarity. The decomposition into IMFs is achieved
through an iterative scheme, until stopping criteria are met and the residual has no local
maxima. The generic EEMD procedure is described in the Appendix and an example of
a decomposition applied to the daily EUR/USD exchange rate is depicted in Figure 2.

Figure 2: Decomposition of the original (1st row) daily EUR/USD exchange rate into 10
series. The last row (straight line) is the residue of the EEMD process.
The main difference of EEMD in comparison to simple Empirical Mode Decomposition
(EMD) is the addition of white Gaussian noise to the original series before
decomposition. The decomposition is repeated with different white noise realizations and an
ensemble mean of all IMFs is taken as the final result (i.e. the mean of the 1st IMF of all
decompositions is the final 1st IMF and so on). The ensemble of the decomposed IMFs is
relieved from the initially added noise [4].
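The amplitude of the added Gaussian noise is chosen according to the
maximized relative RMSE criterion proposed by Guo and Tse (2013):
$$\text{Relative RMSE} = \frac{\sqrt{\sum_{k=1}^{n}\left(x_o(k) - c_{\max}(k)\right)^2}}{\sqrt{\sum_{k=1}^{n}\left(x_o(k) - \bar{x}_o\right)^2}} \qquad (1)$$
where n is the number of observations, $x_o$ the original series, $\bar{x}_o$ its mean and
$c_{\max}(k)$ the IMF with the maximum correlation with the initial exchange rate series.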
The proposed Relative RMSE is the ratio between the RMSE of the original series and a
selected IMF. The IMF with the highest correlation is expected to preserve the main
elements of the exchange rate under decomposition, relieved from the added white noise.
So, this specific IMF is selected for evaluating the decomposition performance for
different Gaussian noise amplitudes.
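As a rough illustration of the criterion above, the sketch below picks the IMF with the highest Pearson correlation with the original series and computes the relative RMSE against it. It assumes the Guo and Tse (2013) definition (root of the summed squared deviations of the original series from the selected IMF, divided by the root of the summed squared deviations of the original series from its own mean). The function names and the toy series are hypothetical, and the EEMD step that would produce the IMFs is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences (both must be non-constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def relative_rmse(original, imfs):
    """Relative RMSE between the original series and its most correlated IMF."""
    c_max = max(imfs, key=lambda imf: pearson(original, imf))
    mean_o = sum(original) / len(original)
    numerator = math.sqrt(sum((x - c) ** 2 for x, c in zip(original, c_max)))
    denominator = math.sqrt(sum((x - mean_o) ** 2 for x in original))
    return numerator / denominator


# Toy stand-ins for EEMD output (hypothetical): a noisy component and a trend-like one.
original = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_imfs = [
    [0.1, -0.1, 0.1, -0.1, 0.1],
    [0.9, 2.1, 2.9, 4.1, 4.9],
]
value = relative_rmse(original, toy_imfs)
```

In practice one would repeat this computation for each candidate noise amplitude and keep the amplitude with the largest relative RMSE.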
Following Moghtaderi et al (2013) we compute a smoothed version of the initial time
series as the sum of the last few IMFs and the residual of the EEMD decomposition.
In mathematical notation, the above smoothing function is expressed as:
$$\tilde{x}_{i^*}(t) = \sum_{i=i^*}^{L} c_i(t) + r(t) \qquad (2)$$
where $\tilde{x}_{i^*}$ is the smoothed variant of the initial series for index $i^*$, L the total number of
IMFs, $c_i$ the i-th IMF, r the final EEMD residual and $i^*$ the optimum index of the IMFs to be summed.
So, smoothing depends on selecting the optimum index $i^*$ for constructing the
smoothed series.
[4] For more details the interested reader is referred to Wu et al (2009).

From the decomposition of the original exchange rate series in Figure 2, we notice that
the frequency of every IMF drops as we move from the first to the last extracted IMF, until no
further decomposition is possible. Rilling et al (2005) examine fractional Gaussian noise
functions and find that, apart from frequency, the averaged energy of every IMF
decreases as the index order increases. In general, the mathematical notation of a
signal's energy is given by:
$$E_i = \sum_{t=1}^{n} \left|c_i(t)\right|^2, \quad i = 1, \dots, L \qquad (3)$$
where $c_i(t)$ denotes time observations of the i-th IMF, $E_i$ is the energy of the i-th IMF,
n is the number of observations and L is the total number of IMFs. Then, the averaged
energy of an IMF would be:
$$\bar{E}_i = \frac{E_i}{n} \qquad (4)$$
Moghtaderi et al (2013) extend the energy drop assumption to broadband functions,
showing that $\bar{E}_i$ is a decreasing series of i. Nevertheless, when we depart from fractional
Gaussian noise to general form functions, the energy of certain IMFs increases instead of
decreasing. Exploiting this empirical observation, the authors argue that the optimum
index $i^*$ of the smoothing function is the smallest index at which the averaged
energy rises for the first time.
For instance, in Figure 2, after the decomposition of the daily time series into 10 IMFs
and the residual, energy rises on the 4th and 6th IMF. So by summing up the 4th to 10th IMFs
plus the residual, we obtain the smoothed series represented by the red curve in
Figure 3, while Figure 4 depicts the implementation on monthly data.
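The selection rule described above can be sketched in a few lines. The code assumes the IMFs and the residual are already available as equally long lists (the EEMD step itself is not shown), takes the averaged energy as the mean of squared values, and uses hypothetical function names and toy numbers.

```python
def averaged_energy(imf):
    """Mean of squared observations of one IMF."""
    return sum(v * v for v in imf) / len(imf)


def first_energy_rise(imfs):
    """1-based index of the first IMF whose averaged energy exceeds its predecessor's.

    Returns None if the averaged energy never rises.
    """
    energies = [averaged_energy(c) for c in imfs]
    for i in range(1, len(energies)):
        if energies[i] > energies[i - 1]:
            return i + 1
    return None


def smooth(imfs, residual, start_index):
    """Point-by-point sum of IMFs start_index..L (1-based) plus the residual."""
    selected = imfs[start_index - 1:]
    return [sum(c[t] for c in selected) + residual[t] for t in range(len(residual))]


# Toy components (hypothetical); their averaged energies are 4, 1, 2.25 and 0.25.
toy_imfs = [
    [2.0, -2.0, 2.0, -2.0],
    [1.0, -1.0, -1.0, 1.0],
    [1.5, 1.5, -1.5, -1.5],
    [0.5, 0.5, 0.5, 0.5],
]
toy_residual = [10.0, 10.0, 10.0, 10.0]

start = first_energy_rise(toy_imfs)            # energy first rises at the 3rd IMF
smoothed = smooth(toy_imfs, toy_residual, start)
original = [sum(c[t] for c in toy_imfs) + toy_residual[t] for t in range(4)]
fluctuating = [o - s for o, s in zip(original, smoothed)]
```

The fluctuating component is simply the original series minus the smoothed one, so the two components add back to the observed series.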
Figure 3: Daily EUR/USD exchange rate and smoothed series.

Figure 4: Monthly EUR/USD exchange rate and smoothed series.

2.3 Support Vector Regression (SVR)
The Support Vector Regression is a direct extension of the classic Support Vector
Machine model. The algorithm, proposed by Vladimir Vapnik et al (1992), originates
from the field of statistical learning. When it comes to regression, the basic idea is to
find a linear function that has at most a predetermined deviation from the actual values
of the initial series. In other words, we do not care about the error of each forecast as
long as it does not violate a threshold, but we will not tolerate a higher deviation. The
Support Vector (SV) set which bounds this error-tolerance band is located in the
dataset through a minimization procedure.
One of the main advantages of SVR in comparison to other machine learning techniques
is the ability to identify global minima avoiding local ones, thus reaching an optimal
solution. This aspect is crucial to the generalization ability of any model in producing
accurate and reliable forecasts. The model is built in two steps: the training step and the
testing step. In the training step, the largest part of the dataset is used for the estimation
of the function (i.e. the detection of the Support Vectors that define the band); in the
testing step, the generalization ability of the model is evaluated by checking the model's
performance on the small subset that was left aside in the first step, performing
out-of-sample forecasting.
Using mathematical notation we start from a training data set
$D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $x_i$ are
observation samples and $y_i$ is the dependent variable (the target of the regression system
that we need to approximate). The method's scope is to minimize the loss function
$\frac{1}{2}\|w\|^2$ subject to $|y_i - (w^T x_i + b)| \le \varepsilon$, thus we enforce an upper
deviation threshold $\varepsilon$, creating an error-tolerance band.
Expanding this initial framework, Cortes and Vapnik (1995) propose a soft margin
model, i.e. they allow the existence of data points outside the error tolerance zone,
though they penalize them proportionally to their distance from the edge of the zone. To do so,
they introduce slack variables $\xi_i$ and $\xi_i^*$ to the loss function, controlled through a cost
parameter C, resulting in the loss function $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*)$.
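In this way the primal problem that we wish to solve is:
$$L_P = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) - \sum_{i=1}^{n}(\eta_i\xi_i + \eta_i^*\xi_i^*) - \sum_{i=1}^{n}\alpha_i(\varepsilon + \xi_i - y_i + w^T x_i + b) - \sum_{i=1}^{n}\alpha_i^*(\varepsilon + \xi_i^* + y_i - w^T x_i - b) \qquad (5)$$
where $\alpha_i, \alpha_i^*, \eta_i, \eta_i^*$ are the Lagrange multipliers of the Lagrangian primal function (5).
Instead of solving (5) we attack the dual form of the problem, which takes the form:
$$\max_{\alpha, \alpha^*} \; -\frac{1}{2}\sum_{i,j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, x_i^T x_j - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^*) \qquad (6)$$
subject to $\sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in [0, C]$.
The solution of the primal problem (5) is
$$w = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\, x_i \qquad (7)$$
and
$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)\, x_i^T x + b \qquad (8)$$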
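The error-tolerance band and the soft margin penalty described above can be illustrated with a small sketch of the epsilon-insensitive loss. It relies on the standard fact that, at the optimum, the sum of the two slack variables of an observation equals its epsilon-insensitive loss. The function names and the toy numbers are hypothetical, and the code only evaluates the objective for given weights; it does not solve the optimization problem.

```python
def epsilon_insensitive_losses(y_true, y_pred, epsilon):
    """Per-observation loss: zero inside the tolerance band, linear outside it."""
    return [max(0.0, abs(y - f) - epsilon) for y, f in zip(y_true, y_pred)]


def soft_margin_objective(w, losses, cost):
    """0.5 * ||w||^2 + C * (sum of slack penalties) for a given weight vector."""
    return 0.5 * sum(wj * wj for wj in w) + cost * sum(losses)


# Toy values (hypothetical): actual targets, model outputs and a weight vector.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 2.0]
losses = epsilon_insensitive_losses(y_true, y_pred, epsilon=0.1)
objective = soft_margin_objective([0.5, -0.5], losses, cost=10.0)
```

The first forecast falls inside the band and costs nothing, while the other two are penalized in proportion to their distance from the edge of the band; a larger C makes such violations more expensive relative to the flatness term.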
Naturally, not all phenomena can be described by strictly linear functions. To overcome
such a drawback, we map the initial data into a higher dimensional space where such a linear
function exists (Figure 5). This mapping is achieved through the use of a kernel function,
also known as the kernel trick, since the explicit computation of such a projection for a
multivariate problem would be impossible. Then the linear solution obtained in the
higher dimensional space is re-projected back to the original space. In this way,
with the selection of either linear or non-linear kernels, SVR can also approximate
non-linear phenomena.

Figure 5: Upper and lower threshold on error tolerance indicated with letter $\varepsilon$. The
boundaries of the error tolerance band are defined by Support Vectors (SVs). On the
right we see the projection from 2 to 3 dimensional space and the projected error
tolerance band. Forecasted values that deviate by more than $\varepsilon$ get a penalty according to their
distance from the tolerance band (source: Schölkopf and Smola, 1998).
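The described mapping is performed with four kernels: the linear, the radial basis function
(RBF), the sigmoid and the polynomial. The mathematical representation of each kernel
is:
Linear: $K(x_1, x_2) = x_1^T x_2$ (9)
RBF: $K(x_1, x_2) = e^{-\gamma\|x_1 - x_2\|^2}$ (10)
Polynomial: $K(x_1, x_2) = (\gamma x_1^T x_2 + r)^d$ (11)
Sigmoid (MLP): $K(x_1, x_2) = \tanh(\gamma x_1^T x_2 + r)$ (12)
with factors d, r, $\gamma$ representing kernel parameters.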
For the construction of the SVR model we used LIBSVM, an SVM model computation
software package developed by Chang and Lin (2011).
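For readers who want to see the four kernels side by side, here is a minimal pure-Python sketch. It assumes the parameterization commonly used by LIBSVM (gamma, r and d); the function names and the toy vectors are hypothetical, and this is not the code used in the paper.

```python
import math


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x1, x2):
    return dot(x1, x2)


def rbf_kernel(x1, x2, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x1, x2, gamma, r, d):
    return (gamma * dot(x1, x2) + r) ** d


def sigmoid_kernel(x1, x2, gamma, r):
    return math.tanh(gamma * dot(x1, x2) + r)


# Toy input vectors (hypothetical).
a = [1.0, 2.0]
b = [3.0, 4.0]
k_lin = linear_kernel(a, b)
k_rbf = rbf_kernel(a, b, gamma=0.5)
k_poly = polynomial_kernel(a, b, gamma=1.0, r=1.0, d=2)
k_sig = sigmoid_kernel(a, b, gamma=0.1, r=0.0)
```

Each function returns the inner product of the two inputs in the implied feature space, so the projection itself never has to be computed explicitly.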
2.4 Multivariate Adaptive Regression Splines
Multivariate Adaptive Regression Splines (MARS), proposed by Friedman (1991), is a
non-parametric form of piece-wise regression. In global parametric methods such as
linear regression, the relationship between a dependent variable and a set of explanatory
variables is described using a global parametric function fitted to the entire data set.
In a non-parametric approach the available data are separated into sub-regions and a
model is locally fitted to each sub-region. The points separating the regions are called
knots. The position and number of the knots are determined iteratively based on a
lack-of-fit criterion.
Starting from a training dataset $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where
$x_i$ are input variables and $y_i$ is the dependent variable vector (the
target of the regression system that we need to approximate), MARS builds a model of
the form
$$\hat{y} = c_0 + \sum_{i=1}^{m} c_i B_i(x) \qquad (13)$$
where $c_0$ is a constant, $c_i$ are local regression model coefficients of each sub-region, m
the number of the sub-regions of the data and $B_i(x)$ are spline basis functions of the
form:
$$B_i(x) = \max(0,\, x - t_i) \qquad (14)$$
where $t_i$ is the corresponding knot of the sub-region i (a constant number) and x the
data sample of the sub-region i.
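The model form above can be sketched as follows. The code only evaluates an already fitted MARS-style model built from hinge basis functions; the coefficients, knots and variable indices are hypothetical, and the forward/backward fitting procedure is not shown.

```python
def hinge(x, knot):
    """Spline basis function max(0, x - knot)."""
    return max(0.0, x - knot)


def mars_predict(sample, intercept, terms):
    """Evaluate intercept + sum of coefficient * hinge(sample[variable], knot).

    terms is a list of (coefficient, variable_index, knot) triples.
    """
    return intercept + sum(c * hinge(sample[j], t) for c, j, t in terms)


def selected_variables(terms):
    """Indices of input variables that appear in at least one surviving term."""
    return sorted({j for _, j, _ in terms})


# Hypothetical fitted model with two terms over three candidate input variables.
toy_terms = [(2.0, 0, 0.5), (-1.0, 1, 3.0)]
pred_a = mars_predict([1.5, 2.0, 9.9], intercept=1.0, terms=toy_terms)
pred_b = mars_predict([0.0, 5.0, 9.9], intercept=1.0, terms=toy_terms)
kept = selected_variables(toy_terms)
```

Variables that appear in no surviving term (index 2 in this toy model) drop out, which is how a fitted MARS model can double as a variable selection step.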
MARS models are developed through a two stage forward/backward stepwise regression
procedure. In the forward stage, the entire D is split arbitrarily into overlapping
sub-regions and model parameters are selected by minimizing a lack-of-fit criterion. In the
backward stage, basis functions (variables) and knots that no longer contribute to the
accuracy of the fit are removed. The lack-of-fit criterion is a modified version of the
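generalized cross validation criterion (MGCV) expressed as:
$$MGCV = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\left(1 - \frac{C(M) + dM}{n}\right)^2} \qquad (15)$$
where C(M) is the number of parameters being fit, M is the number of basis functions, n is the
total number of observations and d is a penalty factor. A representation of a MARS model
compared to a parametric linear regression model is depicted in Figure 6.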

Figure 6: A MARS and a parametric linear regression model representation.
The basic advantages of MARS are its data driven regression procedure, the lack of
initial assumptions about data and the detection of an optimum input variable set
through a lack-of-fit criterion. For the development of MARS models we used the
software package ARESLab developed by Jekabsons (2011).
2.5 Dataset
As already stated, the scope of this paper is to create models that produce one period
ahead forecasts for 5 nominal exchange rates, using monthly and daily data. The data
span the period from 1/1/1999 to 30/10/2011, not including weekends and holidays. We
use as explanatory variables data of 7 exchange rates, closing prices of major stock
indices in the U.S. and the Euro zone, spot prices of 10 precious and non-precious
metals, 18 commodities including crude oil and moving averages of 3, 5, 10 and 30 days
of the EUR/USD exchange rate (since it is the exchange rate with the highest volume on
the exchange rate market). We also include trade weighted USD indices compiled by the
Federal Reserve Bank of Saint Louis (FRED), EURIBOR rates of various maturities and
interest rate spreads between commercial paper, the federal funds and the effective
federal funds rate. Finally, we include basic macroeconomic variables from the US, EU,
Japan, UK, Australia, Norway, South Africa, Brazil and New Zealand.
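As an illustration of how the moving average inputs could be built, the sketch below computes trailing simple moving averages for the four window lengths mentioned above. Whether the paper uses simple or another type of moving average is not stated, so the trailing simple average, the function name and the toy series are assumptions.

```python
def moving_average(series, window):
    """Trailing simple moving average; None until a full window is available."""
    averages = []
    running_sum = 0.0
    for i, value in enumerate(series):
        running_sum += value
        if i >= window:
            running_sum -= series[i - window]  # drop the value leaving the window
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


# Toy EUR/USD-like series (hypothetical values).
rates = [1.30 + 0.001 * i for i in range(40)]
features = {f"MA{w}": moving_average(rates, w) for w in (3, 5, 10, 30)}
```

Using only past and current observations in each window keeps the feature free of look-ahead when it is later used for one-period-ahead forecasting.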
Table 1: Input Variables
Commodities: Crude Oil, Cotton, Lumber, Cocoa, Coffee, Orange Juice, Sugar, Corn, Wheat, Oats, Rough Rice, Soybean Meal, Soybean Oil, Soybeans, Feeder Cattle, Lean Hogs, Live Cattle, Pork Bellies, Iron Ore
Metals: Gold, Copper, Palladium, Platinum, Silver, Aluminium, Zinc, Nickel, Lead, Tin
Stock Indices: Dow Jones, Nasdaq 100, S&P 500, DAX, CAC 40, FTSE 100
USD Trade Weighted Indices: Major Partners, Broad Index, Other Partners
Interest Rates: T-bill 6 months, T-bill 10 years, Spread MLP-EURIBOR 3M, Spread MLP-EONIA, Spread FF-CP, Spread FF-EFF, EONIA, EURIBOR 1 Week, ECB Interest Rate, EURIBOR 1 Month, FED Rate
Macroeconomic Variables (Japan/New Zealand/South Africa/Brazil/Australia/UK): Consumer Price Index, Productivity Index, Gross Domestic Product, Trade Balance, Unemployment Rate
Exchange Rates: JPY/USD, USD/GBP, BRL/NZD, NOK/AUD, PHP/ZAR, EUR/GBP, EUR/USD
Technical Analysis Variables: Relative Strength Index, MA3 EUR/USD, MA5 EUR/USD, MA10 EUR/USD, MA30 EUR/USD
Macroeconomic EU Variables: Unemployment Rate, Productivity, Consumer Price Index, Current Accounts, Debt, Deficit, M3, GDP
Macroeconomic US Variables: Consumer Price Index, Debt, M3, Trade Balance, US Deficit or Surplus, GDP, Productivity, Unemployment Rate

3. Empirical Results
3.1.1 Statistical Evaluation
A crucial step in measuring the generalization ability of a model is its out-of-sample
forecasting performance. Thus, in order to test the generalization ability of the selected
model, the dataset is split into two parts: the training subset and the test subset. The train/test
set ratio chosen both for monthly and daily data is 80/20. Specifically, we leave aside 30
monthly observations (6/2009-10/2011) and 648 daily observations (2/4/2009-
30/10/2011) for out-of-sample forecasting. As forecast evaluation metrics we use the
Root Mean Square Error (RMSE), the Root Mean Square Percentage Error (RMSPE)
and the Mean Absolute Percentage Error (MAPE).

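$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2} \qquad (16)$$

$$RMSPE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{Y}_i - Y_i}{Y_i}\right)^2} \qquad (17)$$

$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{Y}_i - Y_i}{Y_i}\right| \qquad (18)$$
where $\hat{Y}_i$ is the forecasted i-th value of the actual rate $Y_i$ and n is the total number of
observations.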
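The three metrics can be computed with a few lines of code. The sketch below assumes the standard definitions expressed as fractions rather than percentages, which matches the scale of the values reported in the tables; the function names and the toy data are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root Mean Square Error."""
    n = len(actual)
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n)


def rmspe(actual, forecast):
    """Root Mean Square Percentage Error, expressed as a fraction."""
    n = len(actual)
    return math.sqrt(sum(((f - a) / a) ** 2 for a, f in zip(actual, forecast)) / n)


def mape(actual, forecast):
    """Mean Absolute Percentage Error, expressed as a fraction."""
    n = len(actual)
    return sum(abs((f - a) / a) for a, f in zip(actual, forecast)) / n


# Toy out-of-sample values (hypothetical).
actual = [1.0, 2.0, 4.0]
forecast = [1.1, 1.8, 4.0]
scores = (rmse(actual, forecast), rmspe(actual, forecast), mape(actual, forecast))
```

RMSE depends on the level of the exchange rate, which is why USD/JPY shows much larger RMSE values than EUR/USD, whereas RMSPE and MAPE are scale free and comparable across rates.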
After implementing EEMD to decompose each time series into a smoothed and the
remaining fluctuating component, MARS selects the most informative input dataset for
each component of the forecasted exchange rate.
Table 2: Optimum input variable number selected by MARS
                 Monthly data                 Daily data
Exchange rate    Fluctuating    Smoothed      Fluctuating    Smoothed
                 component      series        component      series
EUR/USD          16             1             22             1
USD/JPY          14             4             20             1
AUD/NOK          11             7             19             1
NZD/BRL          5              1             17             1
ZAR/PHP          1              1             22             1
As we observe from Table 2, MARS selects only the smoothed component of the current
exchange rate as input variable for daily forecasting of the smoothed component, while
for monthly forecasting of USD/JPY and AUD/NOK it constructs a structural model with
more input variables. For the fluctuating component, the input dataset consists of
more variables for the daily forecasting horizon and fewer for monthly data.
In order to test the forecasting ability of the EEMD-MARS-SVR model against
alternatives, we compare it with a Random Walk (RW) with drift, an autoregressive
AR-SVR model and an EEMD-AR-SVR model (an autoregressive model using as input
variables only the two decomposition components of each exchange rate). The results from
the evaluation of all models are depicted in Tables 3 and 4.
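For reference, a random walk with drift benchmark can be sketched as below. The drift is estimated here as the mean of all first differences observed up to the forecast origin and is re-estimated at every step; this expanding-window choice, the function name and the toy data are assumptions, since the paper does not spell out how the drift is estimated.

```python
def random_walk_with_drift_forecasts(series, n_test):
    """One-step-ahead forecasts for the last n_test observations.

    Each forecast equals the last observed value plus the average first
    difference of the history available at that point (needs >= 2 past values).
    """
    forecasts = []
    start = len(series) - n_test
    for t in range(start, len(series)):
        history = series[:t]
        # Mean of first differences telescopes to (last - first) / (count - 1).
        drift = (history[-1] - history[0]) / (len(history) - 1)
        forecasts.append(history[-1] + drift)
    return forecasts


# Toy series (hypothetical); the last two points form the test set.
toy_rates = [1.0, 1.2, 1.1, 1.4, 1.3]
rw_forecasts = random_walk_with_drift_forecasts(toy_rates, n_test=2)
```

Setting the drift to zero recovers the plain random walk, whose forecast for the next period is simply today's value.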
Table 3: Out-of-sample forecasting results on monthly exchange rates
Model RMSE RMSPE MAPE
EUR/USD
RW 0.0512 0.0379 0.0295
AR-SVR(linear) 0.2260 0.0379 0.0294
EEMD-AR-SVR(linear) 0.0426 0.0314 0.0239
EEMD-MARS-SVR(RBF) 0.0407 0.0300 0.0238
USD/JPY
RW 2.4739 0.0281 0.0221
AR-SVR(linear) 3.3696 0.0395 0.0354
EEMD-MARS-SVR(linear) 1.9996 0.0226 0.0177
AUD/NOK
RW 0.1054 0.0189 0.0151
AR-SVR(linear) 0.1693 0.0301 0.0261
EEMD-AR-SVR(poly) 0.1017 0.0187 0.0144
EEMD-MARS-SVR(poly) 0.1287 0.0233 0.0184
NZD/BRL
RW 0.0359 0.0275 0.0220
AR-SVR(RBF) 0.0342 0.0263 0.0209
EEMD-AR-SVR(Sigmoid) 0.0253 0.0192 0.0158
EEMD-MARS-SVR(Sigmoid) 0.0256 0.0194 0.0149
ZAR/PHP
RW 0.2075 0.0348 0.0270
AR-SVR(linear) 0.2054 0.0344 0.0267
EEMD-AR-SVR(poly) 0.1862 0.0317 0.0213
EEMD-MARS-SVR(Sigmoid) 0.2017 0.0337 0.0243
Note: Best forecasted values are noted in bold. Best kernel is noted in parenthesis.

Table 4: Out-of-sample forecasting results on daily exchange rates
Model RMSE RMSPE MAPE
EUR/USD
RW 0.0096 0.0070 0.0055
AR-SVR(linear) 0.0980 0.0070 0.0055
USD/JPY
RW 0.5950 0.0068 0.0050
AR-SVR(linear) 0.6088 0.0069 0.0053
AUD/NOK
RW 0.0340 0.0063 0.0049
AR-SVR(linear) 0.0343 0.0063 0.0050
EEMD-MARS-SVR(poly) 0.0583 0.0102 0.0078
NZD/BRL
RW 0.0106 0.0082 0.0064
AR-SVR(linear) 0.0108 0.0082 0.0064
EEMD-AR-SVR(RBF) 0.0093 0.0072 0.0057
ZAR/PHP
RW 0.0661 0.0111 0.0085
AR-SVR(linear) 0.0661 0.0111 0.0085
Note: Best forecasted values are noted in bold. Best kernel is noted in parenthesis.
The models incorporating EEMD decomposition exhibit superior forecasting ability
on both daily and monthly data. For all exchange rates except EUR/USD, the
autoregressive version of the EEMD-SVR model has the smallest RMSE, while for the
EUR/USD series the structural version of the model, with the variables selected by
MARS for the fluctuating component, has the lower RMSE. With the exception of the
AUD/NOK series, both the EEMD-AR-SVR and the EEMD-MARS-SVR models outperform
the RW model, while for the AUD/NOK rate only the autoregressive EEMD-AR-SVR
model is more accurate than the RW.
The empirical findings are in line with our expectations about trading volumes and the
EMH. The EUR/USD pair accounts for 28% of the total daily trading volume in the
exchange rate market⁵. A successful forecast of the future price of an exchange rate with
such a trading volume could yield high trading profits. We therefore expect the intense
economic interest in the EUR/USD rate to erode profit margins rapidly, by moving the
exchange rate to its buy/sell equilibrium, and ultimately to lead the market to efficiency.
Indeed, our empirical findings favor a weak form of market efficiency in the
EUR/USD exchange rate market. With both daily and monthly data, the structural
variant of our proposed model provides the better forecasts, while at the daily
forecasting horizon the autoregressive model, in contrast to the structural one, is only
slightly better than the RW. So, we cannot reject weak-form efficiency for the
EUR/USD rate. On the contrary, market efficiency is rejected for all other exchange
rates, since the autoregressive model provides significantly better forecasts than both the
RW and the structural variant; this is in line with our expectations, given their lower daily
trading volumes in comparison to EUR/USD.
Overall, SVR models coupled with EEMD decomposition as a preprocessing step
produce superior out-of-sample forecasts compared to the RW model, indicating the
possibility of potential profits for a trading strategy based on the EEMD-SVR forecasts.
3.2 Trading performance
In the previous section we measured the forecasting ability of the EEMD-SVR model
with statistical performance metrics. When it comes to trading, however, a market
participant evaluates a model in terms of overall volatility and return, since statistical
performance is not always synonymous with profit (as pointed out by Cheung et al.
(2005)). Specifically, we examine whether an investment strategy based on the forecasts
of the proposed model yields additional economic gains in comparison to trading based
on the RW model, i.e. on the assumption that the current price of the exchange rate is
the best approximation of its future price.

⁵ According to the triennial report of BIS (2010), EUR/USD accounts for 28% of the total daily trading
volume, USD/JPY for 14% and all the other exchange rates are well below 1%.

The Sharpe ratio, or return-to-variability ratio, measures the risk-adjusted return of a
portfolio or investment strategy and is widely used by investment banks and asset
management companies to evaluate investment performance. Its mathematical
expression is:

$$ SR = \frac{R^A}{\sigma^A} \qquad (19) $$

where $R^A$ is the annualized return of the exchange rate and $\sigma^A$ the annualized standard
deviation of the trading strategy.
In order to measure the profitability of the model, we transform our forecasts into
returns with the formula:

$$ R_{t+1} = \ln\left(\frac{P_{t+1}}{P_t}\right) \qquad (20) $$

where $R_{t+1}$ is the return at time t+1 and $P_t$ is the exchange rate at time t. The trading
strategy is to stay long on the numerator currency of each exchange rate when the
forecasted return is positive and to go short (sell) that currency when it is negative.
When the forecasted return is zero, we are indifferent and keep our previous position. In
other words, the problem reduces to a binary one with two states (up/down, buy/sell),
plus the special case of staying neutral. For simplicity we assume that the investor begins
with an initial amount of all five numerator currencies, invests entirely in one of the two
currencies of each exchange rate, and takes his long/short decision at time t based on the
forecasted value of the exchange rate, a position that he closes at the end of the next
trading period (t+1). At the end of the trading horizon, he measures the annualized
return and standard deviation, given by:
$$ R^A = N \cdot \frac{1}{n}\sum_{t=1}^{n} R_t \qquad (21) $$

$$ \sigma^A = \sqrt{N}\,\sigma \qquad (22) $$

where $R^A$ is the annualized return, $\sigma^A$ the annualized standard deviation, $\sigma$ the standard
deviation computed at the end of the forecasting horizon for all n out-of-sample forecasted
returns and N the maximum number of annual transactions possible, here 252 for daily and
12 for monthly data. Since the vast majority of transactions are made through electronic
trading platforms such as EBS and Reuters Dealing 3000, which minimizes trading costs, we
assume the lowest possible transaction cost for all exchange rates, namely 1 pip per trade
(a 0.0001 spread between the buy and sell price of the exchange rate).
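The trading rule and the annualization step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: returns are taken as log returns, the annualized return is N times the mean per-period return, the annualized volatility is the square root of N times the sample standard deviation, transaction costs are ignored, and all function names and sample prices are hypothetical.

```python
import math
import statistics

def positions_from_forecasts(prices, forecasts):
    """Long (+1) if the forecasted return is positive, short (-1) if negative,
    previous position if it is exactly zero. forecasts[t] is the forecast of
    the rate at t+1 made at time t."""
    positions, prev = [], 0
    for p, f in zip(prices, forecasts):
        forecasted_return = math.log(f / p)
        if forecasted_return > 0:
            prev = 1
        elif forecasted_return < 0:
            prev = -1
        positions.append(prev)
    return positions

def strategy_returns(prices, positions):
    """Realized log return of each position, closed one period later."""
    return [pos * math.log(prices[t + 1] / prices[t])
            for t, pos in enumerate(positions)]

def annualized_sharpe(returns, periods_per_year):
    """Annualized return, annualized volatility and their ratio
    (periods_per_year is N: 252 for daily data, 12 for monthly data)."""
    ann_return = periods_per_year * sum(returns) / len(returns)
    ann_vol = math.sqrt(periods_per_year) * statistics.stdev(returns)
    return ann_return, ann_vol, ann_return / ann_vol

# Hypothetical monthly prices and one-step-ahead forecasts.
prices = [1.00, 1.02, 1.01, 1.03]
forecasts = [1.01, 1.00, 1.01]
pos = positions_from_forecasts(prices, forecasts)
rets = strategy_returns(prices, pos)
print(pos, annualized_sharpe(rets, 12))
```

In this sketch the third forecast equals the current price, so the forecasted return is zero and the previous (short) position is kept, mirroring the indifference rule described above.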

Table 6: Trading performance on monthly data
                                             EUR/USD             USD/JPY             AUD/NOK             NZD/BRL             ZAR/PHP
                                             EEMD-SVR  RW        EEMD-SVR  RW        EEMD-SVR  RW        EEMD-SVR  RW        EEMD-SVR  RW
Annualized return (excluding costs) (%)      1.24      -0.44     8.12      0.66      -1.08     -2.61     1.12      -8.15     3.91      0.46
Annualized volatility (excluding costs) (%)  12.18     8         8.80      8.17      1.76      2.85      4.02      5.81      9.02      8.83
Sharpe ratio (excluding costs)               0.10      -0.06     0.92      0.08      -0.61     -0.92     0.28      -1.40     0.43      0.05
Positions taken (annualized)                 10        5         10        8         2         4         2         5         6         6
Transaction costs (annualized) (%)           0.003     0.002     0.0005    0.0004    0.0022    0.0036    0.0078    0.0019    0.0049    0.0049
Annualized return (including costs) (%)      1.21      -0.46     8.12      0.66      -1.08     -2.61     1.11      -8.17     3.90      0.46
Sharpe ratio (including costs)               0.09      -0.06     0.92      0.08      -0.62     -0.92     0.28      -1.40     0.43      0.05

Table 7: Trading performance on daily data
                                             EUR/USD             USD/JPY             AUD/NOK             NZD/BRL             ZAR/PHP
                                             EEMD-SVR  RW        EEMD-SVR  RW        EEMD-SVR  RW        EEMD-SVR  RW        EEMD-SVR  RW
Annualized return (excluding costs) (%)      31.02     2.25      24.23     0.67      6.44      -3.05     9.59      -3.24     45.37     0.15
Annualized volatility (excluding costs) (%)  7.74      8.15      7.75      7.90      6.83      6.87      8.77      8.74      12.43     13.53
Sharpe ratio (excluding costs)               4         0.28      3.13      0.08      0.94      -0.44     1.09      -0.37     3.65      0.01
Positions taken (annualized)                 121       125       142       134       116       117       124       123       128       120
Transaction costs (annualized) (%)           0.44      0.45      0.09      0.007     0.11      0.11      0.48      0.48      0.10      0.01
Annualized return (including costs) (%)      30.59     1.80      24.23     0.66      6.34      -3.15     9.10      -3.72     45.26     0.04
Sharpe ratio (including costs)               3.95      0.22      3.12      0.08      0.93      -0.46     1.04      -0.43     3.64      0.004

The relatively high Sharpe ratios of the proposed trading strategies are generally in line
with the empirical findings of previous studies on portfolios that include exchange rates
(Rime et al., 2010; Sermpinis et al., 2013). Nevertheless, the ratios computed with our
approach are higher than those of previous studies at both the daily and the monthly
forecasting horizon⁶.
Summarizing the results of Tables 6 and 7, a strategy based on the forecasts of the
proposed EEMD-SVR methodology exhibits positive annualized returns (after trading
costs) for both daily and monthly data on all exchange rates, with the exception of the
monthly AUD/NOK exchange rate. Overall, the EEMD-SVR based trading strategy
outperforms the RW model, producing economic gains.

⁶ Sarno, Valente, and Leon (2006) calculate Sharpe ratios of forward bias trading strategies that range
between 0.16 and 0.88, while Lyons (2001) reports a Sharpe ratio of 0.48 for an equally weighted
investment in six currencies.

4. Conclusion
The proposed hybrid EEMD-MARS-SVR model is a novel forecasting approach based
on data driven methods. By decomposing each time series into a smoothed and a
fluctuating component and forecasting each one separately, our model outperforms the
RW in out-of-sample forecasting on the Euro (EUR)/United States Dollar (USD),
USD/Japanese Yen (JPY), Australian Dollar (AUD)/Norwegian Krone (NOK), New
Zealand Dollar (NZD)/Brazilian Real (BRL) and South African Rand (ZAR)/Philippine
Peso (PHP) exchange rates. Moreover, the adoption of an autoregressive
expression for all series rejects the EMH for all exchange rate markets except the
EUR/USD rate, where we cannot reject the weak form of market efficiency. Overall,
machine learning approximations coupled with signal processing methodologies for time
series smoothing can be exploited to build trading strategies that yield economic
profits.
Acknowledgments
This research has been co-financed by the European Union (European Social Fund
ESF) and Greek national funds through the Operational Program "Education and
Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research
Funding Program: THALES. Investing in knowledge society through the European
Social Fund.
We would also like to thank the participants of the 75th International Atlantic Economic
Conference held in Vienna, Austria in April 2013, for their helpful comments.
References

Alexandrov, T., Bianconcini, S., Bee Dagum, E., Maass, P. and McElroy, T. (2009) A
review of some modern approaches to the problem of trend extraction, Research
Report Series, Statistics vol. 3, U.S. Census Bureau, Washington.
Berger, D.W., Chaboud, A.P., Chernenko, S.V., Howorka, E. and Wright, J.H. (2008)
Order flow and exchange rate dynamics in electronic brokerage system data,
Journal of International Economics, vol. 75, pp. 93-109.
Bilson, J. (1978) The monetary approach to the exchange rate: some empirical evidence,
IMF Staff Papers, vol. 25(1), pp. 48-75.
Bollerslev, T. and Wright, J. H. (2001) High-frequency data, frequency domain
inference, and volatility forecasting, Review of Economics and Statistics, vol. 83,
pp. 596-602.
Brandl B., Wildburger U. and Pickl S. (2009) Increasing of the fitness of fundamental
exchange rate forecast models. International Journal of Contemporary
Mathematical Sciences, vol. 4 (16), pp. 779-798.
Chang C.-C. and Lin C.-J. (2011) LIBSVM: a library for support vector machines. ACM
Transactions on Intelligent Systems and Technology, vol. 2 (27), pp. 1-27.
Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Cheung Y-W. (1993) Long memory in foreign-exchange rates, Journal of Business and
Economic Statistics, vol. 11, pp. 93-101.
Cheung, Y.-W., Chinn M. and Pascual A.G. (2005) Empirical Exchange Rate Models of
the Nineties: Are any Fit to Survive?, Journal of International Money and Finance,
vol. 24(7), pp. 1150-75.
Cortes C. and Vapnik V. (1995) Support-Vector Networks, Machine Learning, vol. 20,
pp. 273-297.
Dornbusch, R. (1976) Expectations and Exchange Rate Dynamics, Journal of Political
Economy, vol. 86, pp. 1161-1176.
Dunis, C. and Williams, M. (2002) Modeling and Trading the EUR/USD Exchange Rate:
Do Neural Network Models Perform Better?, Derivatives Use, Trading and
Regulation, vol. 8 (3), pp. 211-239.
Evans, M.D.D. and Lyons, R.K. (2008) How is macro news transmitted to exchange
rates?, Journal of Financial Economics, vol. 88, pp. 26-50.
Evans, M.D.D. and Lyons, R.K. (2002) Order flow and exchange rate dynamics, Journal
of Political Economy, vol. 110, pp. 170-180.
Evans, M.D.D. and Lyons, R.K. (2005) Do currency markets absorb news quickly?,
Journal of International Money and Finance, vol. 24, pp. 197-217.
Fama, E. (1965) The Behavior of Stock Market Prices, Journal of Business, vol. 38, pp.
34-105.
Fleming, J., Kirby, C. and Ostdiek, B. (2001) The economic value of volatility timing.
Journal of Finance, vol. 56, pp. 329-352.
Fleming, J.M. (1962) Domestic Financial Policies under Fixed and Floating Exchange
Rates, IMF Staff Papers, pp. 369-79.
Frankel, J.A. (1979) On the Mark: A Theory of Floating Exchange Rates Based on
Real Interest Differentials, American Economic Review, vol. 69, pp. 610-622.
Friedman J. (1991) Multivariate adaptive regression splines, Annals of Statistics, vol. 19,
pp. 1-141.
Galbraith, J.W. and Kisinbay T. (2005) Content horizons for conditional variance
forecasts, International Journal of Forecasting, vol. 21, pp. 249-260.
Guo W. and Tse P. (2013) A novel signal compression method based on optimal
ensemble empirical mode decomposition for bearing vibration signals, Journal of
Sound and Vibration, vol. 332, pp. 423-441.
Hannan, E. J., and B. G. Quinn (1979) The Determination of the Order of an
Autoregression, Journal of the Royal Statistical Society, B, vol. 41, pp. 190-195.
Hodrick, R.J. and Prescott, E.C. (1997) Postwar US business cycles: an empirical
investigation, Journal of Money, Credit, and Banking, vol. 29, pp. 1-16.
Huang, M. L. Wu, S. R. Long, S. S. Shen, W. D. Qu, P. Gloersen, and K. L. Fan (1998)
The empirical mode decomposition and the Hilbert spectrum for nonlinear and
non-stationary time series analysis, Proceedings of the Royal Society of London,
vol. 454A, pp. 903-993.
Ince H. and Trafalis T. (2006) A hybrid model for exchange rate prediction. Decision
Support Systems, vol. 42(2), pp. 1054-1062.
Jekabsons G. (2011) ARESLab: Adaptive Regression Splines toolbox for
Matlab/Octave, available at http://www.cs.rtu.lv/jekabsons.
Karemera D. and Kim B. (2006) Assessing the forecasting accuracy of alternative
nominal exchange rate models: the case of long memory, Journal of Forecasting,
vol. 25(5), pp. 369-380.
Killeen, W.P., Lyons, R.K. and Moore, M.J. (2006) Fixed versus flexible: lessons from
EMS order flow, Journal of International Money and Finance, vol. 25, pp. 551-579.
Lucas, R.E. (1982) Interest Rates and Currency Prices in a Two-Country World, Journal
of Monetary Economics, vol. 10, pp. 335-60.
Lyons, R.K. (2001) The Microstructure Approach to Exchange Rates, Cambridge, MA:
MIT Press.
Macdonald, Ronald (2010) Exchange Rate Economics: Theory and Evidence. 1st
Edition. New York: Routledge.
Mark, N.C. and D. Sul (2001) Nominal Exchange Rates and Monetary Fundamentals:
Evidence from a Small Post-Bretton Woods Panel, Journal of International
Economics, vol. 53(1), pp. 29-52.
Meese, R. and Rogoff K. (1983) Empirical Exchange Rate Models of the Seventies: Do
they Fit out of Sample?, Journal of International Economics, vol. 14, pp. 3-24.
Moghtaderi A., Flandrin P. and Borgnat P. (2013) Trend filtering via Empirical Mode
Decomposition, Computational Statistics and Data Analysis, vol. 58, pp. 114-126.
Molodtsova, T. and Papell, D.H. (2009) Out-of-sample exchange rate predictability
with Taylor rule fundamentals. Journal of International Economics, vol. 77,
pp. 167-180.
Mundell, R.A. (1968) International Economics (New York: Macmillan).
Premanode B. and Toumazou C. (2013) Improving prediction of exchange rates using
Differential EMD, Expert Systems with Applications, vol. 40, pp. 377-384.
Rilling G., Flandrin, P. and Goncalves, P. (2005) Empirical mode decomposition,
fractional Gaussian noise and Hurst exponent estimation. IEEE International
Conference on Acoustics, Speech, and Signal Processing, pp. 489-492.
Rime D., Sarno L. and Sojli E. (2010) Exchange rate forecasting, order flow and
macroeconomic information, Journal of International Economics, vol. 80, pp. 72-88.
Sager M.J. and Taylor M.P. (2008) Commercially available order flow data and
exchange rate movements: caveat emptor. Journal of Money, Credit and Banking,
vol. 40, pp. 583-625.
Sarno L., Valente G. and Leon H. (2006) Nonlinearity in Deviations from Uncovered
Interest Parity: An Explanation of the Forward Bias Puzzle, Review of Finance,
European Finance Association, vol. 10(3), pp. 443-482.
Sermpinis G., Theofilatos K., Karathanasopoulos A., Georgopoulos F. E., Dunis C.
(2013) Forecasting foreign exchange rates with adaptive neural networks using
radial-basis functions and Particle Swarm Optimization, European Journal of
Operational Research, vol. 225, pp. 528-540.
Stockman, A. (1980) A Theory of Exchange Rate Determination, Journal of Political
Economy, vol. 88, pp. 673-98.
Vapnik, V., Boser, B. and Guyon, I. (1992) A training algorithm for optimal margin
classifiers, Fifth Annual Workshop on Computational Learning Theory, Pittsburgh,
ACM, pp. 144-152.
West, K.D., Edison, H.J. and Cho, D. (1993) A utility based comparison of some models
for exchange rate volatility. Journal of International Economics, vol. 35, pp. 23-45.
Wu, Z., and N. E. Huang (2009) Ensemble Empirical Mode Decomposition: a noise-
assisted data analysis method, Advances in Adaptive Data Analysis, vol. 1, No. 1,
pp. 1-41.

APPENDIX

The generic EEMD method can be described as follows:
1) Add white noise with a predefined variation to the time series under
consideration.
2) Detect local minima and maxima.
3) With cubic interpolation compute the upper and lower envelopes from the local
maxima and minima, respectively.
4) Compute the mean value of the lower and the upper envelope. If a) the number
of local minima, local maxima and zero crossing points vary at most by one and
b) the mean is approximately zero, then we subtract the mean from the initial
time series. The residual is the first IMF. If the criteria are not met then we go
back to step (2) and repeat the procedure until criteria fulfillment.
5) Repeat step (4) until the residual has no local maxima and minima.
6) Go back to step (1) and repeat the process for different versions of added white
noise.
7) Take the mean of the ensemble of all decompositions for each IMF as the final
output.
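
The steps above can be sketched in plain Python. This is a simplified illustration under loudly stated assumptions, not the implementation used in the paper: envelopes are built with piecewise-linear rather than cubic interpolation, sifting runs for a fixed number of iterations instead of checking the criteria of step (4), trials that yield fewer IMFs contribute zeros to the missing ones, and all function names and default parameters are hypothetical.

```python
import math
import random

def local_extrema(x):
    """Step 2: indices of strict interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima

def envelope(x, idx):
    """Step 3 (simplified): linear envelope through x[idx], flat at both ends."""
    knots = [0] + idx + [len(x) - 1]
    vals = [x[idx[0]]] + [x[i] for i in idx] + [x[idx[-1]]]
    out, k = [], 0
    for t in range(len(x)):
        while k < len(knots) - 2 and t > knots[k + 1]:
            k += 1
        w = (t - knots[k]) / (knots[k + 1] - knots[k])
        out.append(vals[k] + w * (vals[k + 1] - vals[k]))
    return out

def emd(x, max_imfs=6, sift_iters=10):
    """Steps 2-5: extract IMFs until the residual has no maxima or minima."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if not maxima or not minima:
            break
        h = list(residual)
        for _ in range(sift_iters):  # fixed-count sifting (assumption)
            maxima, minima = local_extrema(h)
            if not maxima or not minima:
                break
            upper, lower = envelope(h, maxima), envelope(h, minima)
            h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        residual = [r - v for r, v in zip(residual, h)]
    return imfs, residual

def eemd(x, n_trials=20, noise_std=0.1, max_imfs=6, seed=0):
    """Steps 1, 6 and 7: average the IMFs over noise-perturbed copies of x."""
    rng = random.Random(seed)
    sums = [[0.0] * len(x) for _ in range(max_imfs)]
    for _ in range(n_trials):
        noisy = [v + rng.gauss(0.0, noise_std) for v in x]
        imfs, _ = emd(noisy, max_imfs)
        for k, imf in enumerate(imfs):
            for t, v in enumerate(imf):
                sums[k][t] += v
    return [[s / n_trials for s in row] for row in sums]

# Hypothetical series: an oscillation plus a slow trend.
series = [math.sin(0.3 * t) + 0.01 * t for t in range(200)]
averaged_imfs = eemd(series)
```

By construction, the IMFs returned by `emd` plus its residual add back up to the input series, which is the property that allows a series to be split into a fluctuating part and a smoothed part and later recombined.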
