Evaluating Machine Learning Classification For Financial Trading
Keywords: Trading; Financial forecasting; Computer intelligence; Data mining; Machine learning; FOREX markets

Abstract

Technical and quantitative analysis in financial trading use mathematical and statistical tools to help investors decide on the optimum moment to initiate and close orders. While these traditional approaches have served their purpose to some extent, new techniques arising from the field of computational intelligence such as machine learning and data mining have emerged to analyse financial information. While the main financial engineering research has focused on complex computational models such as Neural Networks and Support Vector Machines, there are also simpler models that have demonstrated their usefulness in applications other than financial trading, and are worth considering to determine their advantages and inherent limitations when used as trading analysis tools. This paper analyses the role of simple machine learning models in achieving profitable trading through a series of trading simulations in the FOREX market. It assesses the performance of the models and how particular setups of the models produce systematic and consistent predictions for profitable trading. Due to the inherent complexities of financial time series, the roles of attribute selection, periodic retraining and training set size are discussed in order to obtain a combination of those parameters that is not only capable of generating positive cumulative returns for each one of the machine learning models, but that also demonstrates how simple algorithms traditionally precluded from financial forecasting for trading applications present similar performance to their more complex counterparts. The paper discusses how a combination of attributes, in addition to the technical indicators used as inputs to the machine learning-based predictors — such as price-related features, seasonality features and the lagged values used in classical time series analysis — is used to enhance the classification capabilities, which impacts directly on the final profitability.

© 2016 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.eswa.2016.01.018
194 E.A. Gerlein et al. / Expert Systems With Applications 54 (2016) 193–207
that they are well-suited for quantitative analysis within the financial industry, as their capabilities of finding hidden patterns in large amounts of financial data may help in derivatives pricing, risk management and financial forecasting. One of the most published projects that uses such techniques in financial applications is Standard & Poor's Neural Fair Value 25 portfolio (Smicklas, 2008), which selects on a weekly basis 25 stocks, from a total of 3000, using an artificial NN, attempting to outperform the market by calculating each stock's weekly fair value, relative to that of the S&P 500 index, based on fundamental analysis. Particularly for securities trading, the utility of complex models such as NN, SVM and hybrid models (Cai, Hu, & Lin, 2012) has been extensively studied and has led to promising results. Nevertheless, information regarding the incorporation of such methods into trading floor operations tends to remain hidden from the public, for commercial proprietary reasons (Yamazaki & Ozasa, n.d.; Duhigg, 2006; Patterson, 2016).

In terms of financial trading, analysts in the industry (usually referred to as "quants") have developed technical indicators that are used to identify the most suitable moments to open and close trades, and these are possibly the most popular tools currently used in technical trading. Published research that aims to incorporate Computational Intelligence (CI) in financial prediction shows how those technical indicators have been used as inputs to ML models to find the hidden patterns or relationships among them, in order to predict future prices, trends or a percentage of confidence in those predictions. With the possible exception of long term averages, those technical indicators are constructed using price information over short periods in the past, no more than 20–30 trading periods, in order to incorporate the historical behaviour into a single value. The selected trading period is part of the trading strategy and might vary from long frames of 1 day to small frames of 1 minute, or even smaller time windows as in the case of high frequency trading. The construction of such indicators can be seen as a process used in large scale time series data sets called dimension reduction (Wang et al., 2005), which attempts to transform the series to another domain, seeking a version of the data set that might be much simpler to analyse. In contrast to time series analysis, where the data set is seen as a whole entity, ML classification tasks construct independent instances that are representative examples of the concept to be learned.

Financial predictions that incorporate ML approaches construct the training, test and off-sample data sets as a collection of instances using popular technical indicators, as reported in a number of papers. Hence, an instance is usually created using the value of the current price and the instantaneous values of the mentioned indicators, generating a static picture of the situation of the market at the exact time that the instance is constructed. In this scenario, each instance — i.e. prices and their corresponding technical indicators used as attributes — becomes itself an independent example of the problem, which avoids the time dependence in the series, approaching the problem as a simple classification task rather than a time series analysis in the strict meaning of the word. The hypothesis in this case is that once a ML model is trained, it may be able to classify individual instances using the technical indicators as attributes, because those unseen instances represent in turn the invariant circumstances of the market at certain points in time, and the result of the classification task can then be interpreted as a trend forecast. The main implication of this hypothesis is that financial forecasting can benefit from the use of simpler ML techniques rather than complex time series analysis approaches, simplifying the use of computational resources while at the same time avoiding indexing and ordering issues in the data sets.

This paper addresses the question of the usefulness of low-complexity ML classifiers in financial trading, and in particular will demonstrate whether such low-complexity binary classification approaches are able to generate consistent profitable trading over an extensive period of time. The paper's main contribution resides in the fact that simple machine learning models that have traditionally been precluded from financial applications, as opposed to the more complex NN and SVM, can be used to generate profitable transactions in the long term with the correct combination of periodic retraining, training set size and attribute selection. The work is motivated mainly by the results reported in (Barbosa, 2011), which claims outstanding financial results using simple classifiers. In this case, a simple model is characterized by low computational requirements for both the training and the classifying process, due to the inherent simplicity of the learned model (instance-based classifiers, decision trees and rule-based learners).

While the main objective of ML classification is to maximise accuracy, this might not be the best metric to evaluate the performance of such systems when used in the context of financial trading. The most important metric when assessing trading strategies is undoubtedly profitability, reflected in this paper as cumulative return over a specific trading period. In this paper, an empirical comparison is developed between average accuracy and cumulative return as the main performance metrics of a set of six machine learning models (OneR, C4.5, JRip, Logistic Model Tree, KStar and Naïve Bayes). The models produce a binary classification used later to predict price movement (up or down in the next trading period) for the USDJPY currency pair using six hour time frames over a trading period of six years. The six hour time frame was selected to be able to validate and compare with the results reported in (Barbosa & Belo, 2008b), although the same approach used in the experiments can be applied to different time frames. A set of experiments was conducted where the results of modifying three variables were studied: training set size, period of retraining, and number of attributes for the training and test sets. The results show relatively low accuracy, only a few points over 50%, but at the same time very promising results in terms of profitability. Later, further experiments are conducted on simulated trades over the same period of time using the EURGBP and EURUSD currency pairs, and similar results are reported.

The remainder of the paper is organised as follows: Section 2 discusses related work using ML in financial forecasting applications. Section 3 presents the general experimental setup, describing the data sets and the attribute selection used to feed the models, and briefly describes the different ML algorithms used in the experiments as well as their particular parameter setup, in order to present comprehensive information for future experiment replication. Section 4 discusses the results, detailing each one of the simulated trading scenarios. Finally, Section 5 concludes the paper and explores future work.

2. Machine learning in financial forecasting

Within the financial trading chain, several areas can be identified where the use of ML techniques has reported particularly successful implementations: derivatives pricing, risk management and financial forecasting. Financial forecasting is possibly the most important application of ML for data mining in Capital Markets. ML techniques for forecasting include expert or rule-based systems, decision trees, NNs and genetic computing. Applications within the trading cycle such as Algorithmic Trading Engines¹ and

¹ Algorithmic trading engines on the buy side are essentially semi-automatic computer-aided systems that help retail investors to take the best financial decisions in terms of high returns at the lowest possible risk; by means of programming specific rules, the system is capable of transmitting pre- and post-trade data about quotes and trades to other market participants (Hendershott, 2003; Chan, 2008). The literature also reports the use of algorithmic trading engines on the sell
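As a concrete illustration of what "simple model" means here, consider OneR, one of the six classifiers evaluated later: it learns a single rule on the single most predictive attribute. The sketch below is a minimal pure-Python illustration on hypothetical toy instances, not the WEKA implementation used in the experiments; numeric attributes are discretised into equal-width bins and the attribute whose rule makes the fewest training errors is kept.

```python
from collections import Counter, defaultdict

def one_r(instances, labels, n_bins=3):
    """Train a OneR model: one rule on the single most predictive attribute.

    instances: list of equal-length numeric attribute vectors.
    labels:    list of class labels ("UP"/"DOWN").
    Returns (attribute_index, bin_edges, {bin -> majority label}).
    """
    n_attrs = len(instances[0])
    best = None  # (errors, attr_index, edges, rule)
    for a in range(n_attrs):
        values = [x[a] for x in instances]
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0          # guard against a constant attribute
        edges = [lo + width * k for k in range(1, n_bins)]
        bin_of = lambda v: sum(v > e for e in edges)
        per_bin = defaultdict(Counter)
        for x, y in zip(instances, labels):
            per_bin[bin_of(x[a])][y] += 1
        rule = {b: c.most_common(1)[0][0] for b, c in per_bin.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in per_bin.values())
        if best is None or errors < best[0]:
            best = (errors, a, edges, rule)
    _, a, edges, rule = best
    return a, edges, rule

def one_r_predict(model, x, default="UP"):
    a, edges, rule = model
    return rule.get(sum(x[a] > e for e in edges), default)

# Hypothetical toy instances: [ppc, rsi, williams_r] -> next-period direction.
X = [[0.4, 65, -25], [-0.3, 35, -80], [0.2, 55, -40], [-0.5, 28, -90],
     [0.1, 60, -30], [-0.2, 40, -70]]
y = ["UP", "DOWN", "UP", "DOWN", "UP", "DOWN"]
model = one_r(X, y)
print(one_r_predict(model, [0.3, 62, -20]))  # prints UP
```

Training and classification here are a handful of passes over small arrays, which is exactly the sense in which the paper's candidate models are "simple".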
Order Matching Engines² (Hendershott, 2003) have the potential to incorporate different levels of CI, and in particular ML techniques. The majority of existing ML-based methods for trading use technical indicators as part of the training attributes extracted from the financial series, instead of using the raw prices as a training vector. Maggini, Giles, and Horne (1997) pointed out that there is an inherent difficulty in generating statistically reliable technical indicators, because the rules inferred to produce accurate predictions change continually in financial time series, and it is even possible to observe a high number of contradictory instances in the training sets, given that market data exhibit statistical characteristics found in other types of time series. This situation is reflected in the large volume of papers (Chen & Shih, 2006; Eng, Li, Wang, & Lee, 2008; Kim, 2003; Lee, Park, O, Lee, & Hong, 2007; Li & Kuo, 2008; Tenti, 1996) that have reported accuracies under 60% with ML models which have shown impressive performance in areas other than financial prediction. According to Sewell and Yan (2008), for certain markets such as futures and FOREX, it may only be necessary to generate predictions with an accuracy marginally higher than that obtained by a random classifier in order to obtain profits, due to two main factors: low costs and leverage.

Artificial NNs are probably the most common method utilized in financial forecasting. Early works such as that of Tenti (1996) compared the performance of three recurrent neural networks based on their returns in simulated forecasts on currency futures. The inputs to the networks include technical indicators (average directional movement index, trend movement index and the rate of change). Tenti also takes into account trading costs, and reports positive returns in the trading simulation, demonstrating that NN techniques can indeed be used as forecasting tools. Lu and Wu (2009) show another example of stock market forecasting with artificial neural networks. The paper compared a NN model's performance against the ARIMA model, predicting the direction of future values of the S&P 500 Index. The experiments showed that the NN-based system outperformed the ARIMA model only in stable market conditions, since the system exhibits only a modest 23% level of accuracy against the ARIMA's 42% in more volatile scenarios. Kamruzzaman and Sarker (2003) compared the performance of the ARIMA model with several NN models when forecasting exchange rates of currency pairs in the FOREX market. The NNs were trained with back-propagation, scaled conjugate gradient and back-propagation with Bayesian regularization, using exchange rates in the previous period and moving averages as inputs. The accuracy of the prediction, as well as the normalized mean square error and the mean absolute error, were used to measure global performance, showing that all NN models outperformed the ARIMA model, with an accuracy of 80%. Taking into account that only the best results obtained were reported and that, in general, ML techniques do not present high levels of accuracy on unseen data that differ significantly from the training sets, these impressive results must be considered with care. McDonald, Coleman, McGinnity, Li, and Belatreche (2014) investigate the effectiveness of a number of machine learning algorithms and their combinations at generating one-step-ahead forecasts of a number of financial time series. The authors found that hybrid models, consisting of a linear statistical model and a non-linear machine learning algorithm, are effective at forecasting future values of the series, particularly in terms of the future direction of the series.

While artificial NNs are considered the most popular technique in financial forecasting, other reports also show promising results with different data mining models. Using SVM, Kim (2003) attempted to forecast daily price directions of the KOSPI stock index. The model used technical analysis indicators (momentum, Williams %R and commodity channel index) as inputs, and the best accuracy obtained after training several models with different parameters was 57.83%. The work also presented a comparison with a back propagation NN (54.76% accuracy) and a nearest-neighbour model (51.98% accuracy). This middle-range level of accuracy is expected due to the high volatility of financial time series, but several models needed to be trained to achieve it. This study concludes that no single model is perfectly suited to all market conditions and, even more importantly, that the models must be retrained frequently to keep the forecasts accurate. In another study, Tay and Cao (2001) also compared SVM with back propagation NN to forecast prices of five types of futures contracts. On average, the SVM approach obtained better accuracy than the back propagation NN, but also at middle-range levels: 47.7% for SVM against 45.0% for back propagation NN. SVM also outperformed a back propagation NN in the work presented by Chen and Shih (2006), where these techniques were used to predict the value of six Asian indices, obtaining 57.2% accuracy with SVM and 56.7% with NN models.

Apart from NN and SVM, in the study presented in (Maggini et al., 1997) the authors proposed a heuristic method to select different inputs for a non-linear machine learning algorithm, discarding the option of time series prediction and limiting the problem to classification of the class of price variation, although the paper does not specify whether the problem is restricted to binary (up/down) or multi-class (up/down/stable) classification. The selected method is K-nearest neighbours with a sliding-window dataset used to retrain the model at every time step. The metric selected to evaluate performance was mean square error. The paper concludes that it is impossible to predict price variation with enough accuracy, discouraging the use of this approach and attributing the poor results to the weakness of the model and a poor selection of the inputs that might affect the price movement. Nevertheless, the authors seem to be focused on accuracy and do not provide any financial results for the trading period used in the simulation. J. Li, Tsang, and Park (1999) predicted expected return in the Dow Jones Industrial Average using Financial Genetic Programming, and compared it with random decisions and C4.5 decision tree classification. They used some simple technical indicators such as short and long term moving averages and long and short term price filters. The authors did not focus only on accuracy of predictions but also on annualized returns and positive returns of a simulated set of investments following the predictions. They reported over 60% in positive returns and over 40% in annualized returns over a trading period of four years for the Genetic Programming model, and over 40% for the C4.5 decision tree. In both cases, the results represent an outstanding financial return even without taking into account trading costs, which suggests that technical indicators might generate profitable rule-based models to predict complex financial time series.

Barbosa and Belo presented several reports using single agents to execute algorithmic trading in the FOREX market (Barbosa & Belo, 2008b), a micro-society managing a hedge fund (Barbosa & Belo, 2010) and a multi-agent system for multiple-market trading (Barbosa & Belo, 2008a), focusing on profitability and maximum drawdown as performance metrics. The proposed architecture is divided into three modules in charge of (a) predicting the immediate next trend by means of an ensemble of binary classifiers, (b) a risk management module to decide how much to invest

side, brokers and investment banks, to manage the vast amount of daily orders from clients that usually arrive before market opening hours, deciding how and when to execute those orders taking into account existing regulations while, at the same time, minimizing the effect on prices [53].

² An order matching engine is a trading system that facilitates the exchange of financial instruments between multiple parties by means of a transaction algorithm that translates orders into trades, pairing buyers with sellers in terms of transaction prices and quantities (Hendershott, 2003).
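The point made by Sewell and Yan (2008) — that in futures and FOREX an accuracy only marginally above random can be enough to profit, because of low costs and leverage — can be made concrete with a back-of-the-envelope expected-value sketch. All numbers below (52% accuracy, average move size, round-trip cost, leverage factor) are hypothetical illustrations, not figures from the cited work.

```python
def expected_return_per_trade(accuracy, avg_move, cost, leverage):
    """Expected leveraged return of a single trade, as a fraction of margin.

    accuracy:  probability the up/down call is right
    avg_move:  average absolute price move per period (fraction, e.g. 0.002 = 0.2%)
    cost:      round-trip transaction cost (fraction)
    leverage:  position size divided by margin
    """
    # Win avg_move with probability `accuracy`, lose it otherwise.
    edge = accuracy * avg_move - (1 - accuracy) * avg_move
    return leverage * (edge - cost)

# Hypothetical numbers: 52% accuracy, 0.2% average 6-hour move,
# 0.002% round-trip cost, 50x leverage.
r = expected_return_per_trade(0.52, 0.002, 0.00002, 50)
print(f"{r:.4%} per trade")  # prints 0.3000% per trade
```

With zero leverage advantage the same 2% edge would be negligible; it is the combination of a thin edge, low cost and high leverage that makes marginal accuracy economically meaningful.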
in each trade by a case-based engine that analyses past trades, and (c) a rule-based system, where a set of rules resulting from human experience can be incorporated to enhance the trading decision and to add limit and stop-loss orders, and trading and closing policies. The system performs learning by means of an update to the weighted ensemble according to the results of the individual simple ML models, such as OneR, C4.5, JRip, Logistic Model Tree, KStar, NN, SVM and Naïve Bayes, and also retrains the classifiers at fixed periods to adapt to new market regimes. What is interesting in this development is that even though the classifier module only produced a not very impressive 52.74% accuracy, the complete system produced a success rate of 66.67% in profitability over the tested period, performing fewer but more profitable trades and, thanks to the combination of the different modules, avoiding trades that were expected to be unprofitable, with a high level of automation.

In all these works, there is a general agreement in favour of the use of ML models for performing financial forecasting. Most articles report positive results and may be seen as empirical evidence against the efficient market hypothesis³, demonstrating that there is some predictability of market prices based on historical data (Li et al., 1999). With the possible exception of the work presented by Barbosa and Belo, in general all the reviewed reports focused on the use of sophisticated data models such as NN, SVM and genetic programming approaches, using technical indicators as inputs. These approaches are shown to be well-suited for financial data modelling and forecasting, but there is still a need to study in depth the capabilities and limitations of these techniques for one of the most competitive industries in the world. The lack of reports exploring data mining approaches with an inherent simplicity of the learned model — such as instance-based classifiers, decision trees and rule learners — to generate consistent profitable trading suggests further exploration and discussion in this area, and is the main motivation for this paper. Additionally, it is important to establish the relationship between accuracy and profitability in ML-based trading, because financial trading incurs considerable costs, which must be included in the assessment of any new technique.

3. Experimental setup

In order to validate the assumption suggested by Barbosa and Belo (2008a) that low complexity ML models can be used to trade in a consistently profitable fashion, a multiagent system is implemented to carry out a series of trading simulation experiments over a two year period. A six hour time frame gives the opportunity to open and close trades four times per day, expecting greater price movements during the interval, and consequently opening the possibility of obtaining greater profit per trade on average compared to low range time frames such as 1 hour or less. At the same time, as suggested in (Barbosa & Belo, 2008a), selecting six-hour time frames starting at midnight guarantees that the trades do not coincide with the traditional times when major reports are released, such as the Nonfarm Payrolls Employment or the interest rates, avoiding the high volatility associated with those events, which impacts directly on slippage⁴.

The multiagent system is built using a JAVA-based multiagent framework called BESA (González, Avila, & Bustacara, 2003). The Organizational Approach for Agent Oriented Programming methodology proposed by González and Torres (2006) was followed. The machine learning models used in these experiments to predict price trends were integrated in the agents' forecasting modules using the WEKA toolbox (Witten et al., 2011). Given the "open-source" nature of the tool, WEKA-based classifiers are imported and instantiated as JAVA objects, providing a seamless integration with a custom application, great flexibility and a simplified implementation, focusing the efforts on the feature extraction task.

Fig. 1 depicts the general architecture of the system, where two agents are shown: a Market Agent and a Trading Agent. The role of the Market Agent is to encode financial information using Japanese Candlestick charts (open, low, high, close prices at the pre-defined 6 hour time frames) and a 32-bit UNIX timestamp. Additionally, the Market Agent sends the instrument information to the Trader Agent, and maintains an order book to keep track of the orders as well as the profits of the client (Trader Agent).

The Trader Agent receives information about the state of the market as raw price data in Japanese Candlestick format, and outputs market orders to open and close trades at every trading period. An internal pre-processing module is in charge of calculating different technical indicators as part of the feature extraction task, which serve as the input feed to one of the six selected classifiers. The classifier module in turn is in charge of generating a price trend forecast for the next trading period: "the price will increase in the next trading period" or "the price will decrease in the next trading period". Finally, using the result of the classifiers, the Trader Agent generates opening and closing orders and sends them to the Market Agent. The ML models inside the Trading Agent are set up to produce a binary classification output predicting the direction of the price of the financial instrument of interest for the next trading period. The classification results are used to decide whether the instrument should be bought (long trade) or sold (short trade). The first set of experiments included six different models trained with historical exchange data of the USDJPY currency pair. Details of the particular setup (attributes and training parameters) are discussed in the following sections.

3.1. Data set and attribute selection

From the raw price data, nine attributes are constructed off-line to build the attribute vectors that comprise the initial training set. The particular attribute selection is based on Barbosa and Belo (2008a), who suggest that integrating diverse types of features — seasonality features, lagged values and technical indicators such as moving averages, RSI and WR — into the construction of the models enhances the classification capabilities in both the training and prediction processes. The characteristics of the data and the selected attributes are shown in Table 1.

The nine attributes are calculated off-line for the training set, derived directly from the historical price information. The trading simulation is performed using a test set comprised of off-sample instances covering a two year period, as described in Table 1. For

³ The financial industry still debates the idea of accepting that there is no possibility to "beat the market" and, in consequence, that instrument prices are not predictable — in other words, that it is impossible to obtain a consistent return higher than the growth of an index tracked with a simple buy-and-hold strategy (Becket & Essen, 2010). The efficient market hypothesis claims that at any given point in time, an instrument's price always fully reflects all the information available. The random walk hypothesis says that stock prices follow a random walk model, i.e. the variations in price from one time step to the next are completely independent. The martingale hypothesis suggests that forecasting based on historical prices is ineffective (Becket & Essen, 2010). However, several famous investors (Ellis, 2001) have been successful for decades making profitable financial forecasts, something that should not be possible if asset prices were completely random.

⁴ Slippage is defined as the difference between the expected price for a trade and the effective price at which it is executed. In FOREX, slippage often occurs in high volatility periods where the prices exhibit unexpected movements, generally during news event releases, which makes it very difficult to execute an order at a specific price. Slippage in the trading of stocks is related to the spread between the ask and bid prices, usually at the execution of market orders at the time of a spread movement, generally caused by the presence of large orders executed when there are not enough buyers/sellers to fill the desired price level to maintain the expected price of the trade.
E.A. Gerlein et al. / Expert Systems With Applications 54 (2016) 193–207 197
Fig. 1. Trading engine – A multi-agent system composed of two agents. A trading agent in charge of predicting price trends and generating market orders. A market agent
is in charge of keeping track of an order book and sends market information using a historical data base.
Table 1
Description of the training and test sets – no-retrain experiments.
Currency: USDJPY
Training set 5191 Wednesday, 02 January 2002 00:00:00–Friday, 29 December 2006 18:00:00.
Test set (off-sample data): 2510 Thursday, 18 January 2007 12:00:00–Monday, 22 Jun 2009 00:00:00.
Attributes
Nine attributes Hour, day of the week, closing price, percentage of price change, lagged percentage of price change, lagged percentage of price
change moving average (10 periods), relative strength index, Williams %R, class.
the test set (off-sample data), the trader agent is in charge of calcu- Williams %R oscillator is below –80 or over –20, it means
lating the corresponding attributes every new trading period. The that the instrument is oversold or overbought respectively. The
particular details of the selected attributes are: Williams %R compares the closing price in the current period
with the lowest and highest prices in the last n periods and it
• HOUR: time of the day in when the instance is captured, e.g.,
is calculated as:
trading fixed 6 hour time frame. The possible values are: 0, 6,
12 and 18. wil l iams_R(n )i = ((closeP ricei −1
• DAY_OF_WEEK: nominal attribute represented as a list of nom- − highn )/(highn − lown ) ) × 100 (4)
inal labels, i.e., {Monday, Tuesday, Wednesday, Thursday, Friday,
Saturday, Sunday}. The last two attributes are selected taking
• RELATIVE_STRENGTH_INDEX: this attribute is also a technical
into account seasonality factors in the price change (Nawaz & indicator which compares the magnitude of recent gains to re-
Mirza, 2012; Maggini et al., 1997). cent losses in an attempt to determine overbought and oversold
• CLOSING_PRICE: numeric attribute that represents the price of conditions of an instrument. The relative strength index oscil-
the instrument at the end of the trading period. lates from 0 to 100 and indicates an overvalued asset when its
• PERCENTAGE_PRICE_CHANGE: this numeric attribute describes value reaches 70 and therefore a price drop is expected, or oth-
the relative change in the price during the current trading pe- erwise if the index reaches 30 it indicates that the asset may
riod given by Eq. (1). be undervalued and a price rise will occur. The relative strength
index is calculated:
ppci +1 = ( (closeP ricei +1 − closeP ricei )/closeP ricei ) × 100 (1)
relat ive_st rength_index(n )i = 100 − 100 × (1/(1 − RS (n )i )
• LAGGED_PERCENTAGE_PRICE_CHANGE: This numeric value is
used in time series analysis such as ARIMA to integrate to val- (5)
ues of the past values of the series. In this case, the lagged per- • RS(n)i indicates a ratio between the average of past n periods
centage price change will be described by the preceding period where the price increased and the average of n periods where
percentage price change as: the price decreased, given by Eqs. (6)–(8):
lagged_ ppci = ppci −1 = ((closeP ricei −1 RS (n )i = UP _avg(n )i /DOW N _avg(n )i (6)
− closeP ricei−2 )/closeP ricei−2 ) × 100 (2)
• LAGGED_PERCENTAGE_PRICE_CHANGE_MOVING_AVERAGE: this
numeric attribute is calculated constructing the average of UP _avg(n )i = (closePricek − openPricek ) /n (7)
prices changes in the last n periods, given by: k
• where k is a period characterized by the close being higher
n−1
lagged_ ppcMA(n )i = ppci + lagged_ ppci−k /n (3) than the previous close, and
k=1
• WILLIAMS_%R: this numeric attribute represents a technical DOW N _avg(n )i = openP rice j − closeP rice j /n (8)
indicator whose value oscillates between 0 and –100. When j
• where j is a period characterized by the open being higher than the previous opening price.
• CLASS: the class is defined as a nominal feature that the ML models attempt to classify. It indicates the direction that the price will take in the very next trading period. The labels for the class are UP and DOWN, describing whether the closing price is expected to rise or fall, respectively, compared to the opening price in the very next trading period.

198 E.A. Gerlein et al. / Expert Systems With Applications 54 (2016) 193–207

3.2. Performance metrics

Many different techniques have been proposed to evaluate the performance of classifiers; repeated cross-validation is probably the most popular choice, especially in limited-data situations. In theory, a cross-validation procedure delivers a more confident estimate of the future performance of a particular model on unseen data, while at the same time providing a consistent procedure to compare the classification capabilities of different models applied to the same problem. Initially, a 10-fold cross-validation was used to assess classification capabilities, measuring the accuracy (not the profitability) of the models in the retraining experiments. Every retraining procedure returns a rate of success from a 10-fold cross-validation routine using the current sliding window as a training set. Finally, all the individual values from each training are averaged to obtain an expected performance metric for the models. The average 10-fold cross-validation value for each experiment set-up is presented as an evaluation metric in the general results.

The financial prediction problem must also be assessed from a different perspective, due to problems that can invalidate the prediction results such as bias, variance and data snooping. One of the main issues in ML is to find a trade-off between bias and variance (Witten et al., 2011). Bias is understood as the error rate, i.e. the proportion of wrongly classified instances over a whole data set. Variance is the error that results from perfectly fitting an over-complex model to a finite data set, which might not reflect the whole set of patterns present in the universe of instances for a particular application. In these experiments the use of reduced training sets during retraining does not reflect patterns in a globally wider historical data set, but in turn locally minimises the bias, deliberately avoiding a variance minimisation mechanism during predictions made between retraining periods. Therefore, periodic retraining allows the adjustment of the bias of the models in the short term, while at the same time addressing variance in the long term, avoiding the need for a larger sample of the time series to build the models (Maggini et al., 1997). This situation is especially desirable in financial forecasting, where the time series are influenced by more factors than can possibly be captured by a comprehensive set of technical indicators (Maggini et al., 1997).

The approach followed in this paper uses the same data set at each iteration of parameters, a situation usually associated with data snooping. According to White (2000), this problem is virtually impossible to avoid in time series analysis, and especially in financial analysis, due to the existence of a unique set of historical data. The approach used in this paper is entirely empirical in the sense that the variation of parameters in each experiment attempts to observe the impact on the selected metrics, rather than being an attempt to prove the model's validity or performance.

To minimize the effects of data snooping, a modified leave-one-out cross-validation procedure is used. In the traditional leave-one-out approach, each instance in the dataset is left out once and the training is performed on all the remaining instances; the results of all classifications, one for each instance in the dataset, are averaged to obtain the error estimate. In these experiments a retraining procedure is performed at predefined periods of time. During each retraining, the first instances in the training set are replaced with the latest and more up-to-date instances, and the classifiers are discarded and rebuilt using the new training set. This can be seen as a leave-n-out over a sliding window, where n is the retraining period. This procedure ensures unbiased results because the training data is kept separated from the test data, and the classification results over the test data are never used for optimization, as happens in other ML applications where optimization is desirable as new data is processed. In this sense, the models receive unseen data to classify which in theory do not exist, because they represent points in the future. This brings the evaluation of the classification capabilities close to the usage that the models would experience in real-life trading, while at the same time avoiding the effect of data snooping, i.e. a forward-testing mechanism as opposed to a back-testing mechanism. Using the leave-n-out over a sliding window also allows the cumulative accuracy to be calculated along the trading period, which is used in this study to validate the different models and which in addition incorporates a temporal component into the assessment, as it evaluates performance over time.

It is important to note that for trading applications, higher accuracy in predictions does not always imply higher profits. If the returns obtained in a series of successive trades are not high enough to overcome the associated trading costs (among others: commissions, spreads and slippage), any trading strategy will eventually lose money, even though the strategy seemed profitable on paper. In this sense, the cumulative return over an extensive period of time represents a more adequate metric for this type of study. Cumulative return takes into account the underlying accuracy of the predictions that triggered the series of trades. In addition, it also represents a weighted average of the success rate, as the positively predicted trades are associated with the individual profit obtained. Financial trading strategies are to be evaluated by observing their historical trading track records, so the returns are mandatory, which in turn provides a clear and simple performance-of-investment indicator of interest for financial traders. The results of the trading simulations are assessed using the following metrics:

• Accuracy (%): percentage of accurate predictions for the entire trading period.
• UP_Accuracy (%): percentage of accurate UP predictions for the next trading period.
• DOWN_Accuracy (%): percentage of accurate DOWN predictions for the next trading period.
• Cumulative Return (%): percentage of return accumulated at the end of the trading cycle. The cumulative return for the ith period is calculated as:

CumulativeReturn_i = (1 + CumulativeReturn_{i−1}) × (1 + Return_i) − 1 (9)

• Maximum drawdown: the maximum drawdown measures the historical maximum peak-to-valley decline of an equity value or of a trading strategy, using the cumulative returns to keep track of the movements. In other words, the maximum drawdown indicates the maximum accumulated loss the trading agent experienced while trading.
• Average return per trade: indicates the average return obtained over all the trades at the end of the trading cycle.
• Long_Accuracy (%): long accuracy is tied to the UP_Accuracy, since an UP prediction will produce a buy order (long trade). The long accuracy will measure the percentage of those trades that actually presented positive profit, without taking into account trade costs.
• Short_Accuracy (%): the short accuracy is tied to the DOWN_Accuracy, since a DOWN prediction will produce a sell order (short trade). The short accuracy will measure the
percentage of those trades that actually presented positive profit, without taking into account trade costs.

3.3. Machine learning classifiers

From the extensive spectrum of data mining algorithms, six models are used in the study presented in this paper to generate the financial prediction capabilities. In addition to their inherent simplicity, the selection of the algorithms was made in a way that ensured that a representative algorithm from several approaches traditionally used in data mining was present in the experiments: instance-based classifiers (also called lazy models), decision trees and rule learners. The agents' decision modules are built using the WEKA library5 (Witten et al., 2011). Before discussing the experiments related to financial prediction, it is important to mention the basic characteristics and set-ups used for the classifiers in this work.

The naïve Bayes is an instance-based classifier whose generalization rules are derived from Bayes' theorem of conditional probability. For nominal or discrete attributes, the probability of an attribute belonging to a particular class is calculated by determining its relative frequency in the training set. Numeric attributes, on the other hand, can be discretized and treated similarly to nominal attributes, or otherwise it is assumed that the values of the attributes follow a particular probability distribution function. In this paper, the normal distribution was assumed to model the numeric attributes.

The K∗ model (Cleary & Trigg, 1995) is also an instance-based classifier: it classifies a new instance by determining the instances in the training set that are closest to it using a similarity measurement, and assigning the predominant class to the new instance. The distance between two instances in K∗ is calculated by measuring the "complexity" of transforming instance a into instance b using a sequence of predefined operations, and calculating the probability of the occurrence of that sequence if the operations are chosen randomly. An important parameter used in the calculation of the probabilities is the "blending parameter". Selecting the same blending parameter for all the attributes gives equal weighting to each one of them, and is usually the approach used when applying the K∗ algorithm. WEKA gives the option of setting the blend parameter for the experiments; in this work it was set at 20%. The experiments were conducted with the Blend setting modes parameter set to "spherical".

C4.5 (Quinlan, 1993) builds an internally structured tree in which each leaf represents a classification, while the branches that connect the leaf to the root node equate to conjunctions of conditions that lead to that classification. The C4.5 algorithm makes use of entropy to grow the tree iteratively from the training instances, separating them into different branches according to the values of a specific attribute, until all the instances in a branch belong to the same class or none of the potential splits results in an information gain. For the WEKA usage of the C4.5 classifier, the confidence threshold for the pruning parameter was set to 0.25, and the minimum number of instances accepted per leaf was set at 2.

The Logistic Model Tree, LMT (Landwehr, Hall, & Frank, 2005), is a classifier that combines two models for classification: linear logistic regression and tree induction. The WEKA implementation uses the LogitBoost algorithm as a numeric optimization tool to estimate the parameters needed in the logistic regression procedure. To find a root node, the LogitBoost algorithm is run over the training set to build a logistic regression model in a five-fold cross-validation iterative process. The data are then split using the C4.5 criterion discussed previously. The building of the model and the splitting of the data continue as long as at least the minimum number of instances (set as a parameter) is present at a node. For the experiments conducted in this paper, the parameters required by the WEKA API were: the number of iterations for LogitBoost set to 1, the minimum number of instances at which a node can be split set to 15, and the weight trimming for LogitBoost set to 0 (no weight trimming).

The Repeated Incremental Pruning To Produce Error Reduction – RIPPER – rule learner (Cohen, 1995) takes one class at a time and attempts to create a rule that covers as many instances of that class as possible, by iteratively adding conditions that apply only to the class being targeted. The method for picking the best attribute is based on maximization of the accuracy for the class under scrutiny. Rules are created by greedily adding antecedents to a rule until it becomes 100% accurate. In each iteration, the condition to be added is selected by testing every possible value of each attribute and picking the condition with the highest information gain. The parameters used in the WEKA implementation were: number of folds set to 3, the minimal weight of instances within a split set to 2, the number of optimization runs set to 2, and the seed for randomization set to 1.

4. Experimental results: trading simulations

A preliminary set of experiments was conducted where the Trading Agent trained a specific ML model and subsequently simulated a total of 2510 trades over the whole test set period shown in Table 1. The results in this scenario were poor in terms of both accuracy of the predictions and cumulative returns. A subsequent set of experiments was conducted where periodic retraining was used while investigating the performance effect of varying three variables: retraining set size, period of retraining and number of attributes. The inclusion of periodic retraining in the models has shown that profitable trading can be achieved by selecting a trade-off in these variables, which might be different for each ML model. To validate these experiments, the same procedure was used to simulate trades using the GBPUSD and EURUSD currency pairs.

4.1. Single training experiments

The first set of experiments consists of an initial training of the ML models using the entire training set described in Table 1, with a total of 5191 instances. Once the models are built, the Market Agent feeds the Trader Agent with the price information from the off-sample data. The system simulates the trading activity for each of the selected ML models and generates as output the final order book with the respective trades at the end of the off-sample period.

Table 2 presents the trading results for the six machine learning models and an additional random trader. As can be seen, the accuracy of the models is not higher than 51.5%. Intuitively, it is possible to think that an accuracy of approximately 50% in trading is no better than a random guess, as suggested by the cumulative returns, which are similar to the negative profits obtained by the random trader. Surprisingly, OneR, being the simplest classifier, obtained a positive cumulative return at the end of the trading cycle, as opposed to the results shown by the other classifiers using the same setup. OneR constructed its model based on the Williams %R oscillator, generating a complex set of more than 200 conditions for the generated prediction rule and creating fine-grained intervals within which the value of the indicator might fall when applied to unseen instances. The question that arises at this point is whether the OneR classifier is indeed able to identify the most meaningful attribute to predict the behaviour of the price for the next trading period, or whether in this particular case the good results are a consequence of mere luck.

5 WEKA information, API and source code are available at: http://www.cs.waikato.ac.nz/ml/weka/.
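OneR's behaviour can be made concrete with a minimal sketch (our own simplification, not WEKA's implementation: it operates on pre-discretized attribute values, whereas WEKA's OneR also constructs the value intervals itself):

```python
from collections import Counter, defaultdict

def one_r(instances, labels, n_attrs):
    """Pick the single attribute whose one-level rule (one predicted class
    per attribute value) makes the fewest errors on the training set."""
    best = None
    for a in range(n_attrs):
        by_value = defaultdict(list)
        for x, y in zip(instances, labels):
            by_value[x[a]].append(y)
        # rule: map each observed value of attribute a to its majority class
        rule = {v: Counter(ys).most_common(1)[0][0] for v, ys in by_value.items()}
        errors = sum(rule[x[a]] != y for x, y in zip(instances, labels))
        if best is None or errors < best[1]:
            best = (a, errors, rule)
    return best  # (attribute index, training errors, value -> class rule)

# toy data: the second attribute separates the classes perfectly
X = [("low", "A"), ("low", "B"), ("high", "A"), ("high", "B")]
y = ["UP", "DOWN", "UP", "DOWN"]
attr, errs, rule = one_r(X, y, 2)
print(attr, errs)  # attribute 1 is chosen with 0 training errors
```

On the toy data the first attribute misclassifies half the instances, so OneR keeps the second one; with many fine-grained intervals, as in the Williams %R rule above, such a model can fit the training set closely while generalizing poorly.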
Table 2
Simulation results of the USD/JPY trading agents, single training at inception.
Ticker: USDJPY
Training set size: 5242
Attributes: <hour>, <day>, <closing_price>, <ppc>, <lppc>, <lppcma>, <RSI>, <Williams%R>, <class> (9 attributes)
Model | Accuracy | DOWN accuracy | UP accuracy | SHORT accuracy | Long accuracy | MAXDD | Average ret/trade | Cumulative return | Trades
Fig. 2. Cumulative accuracy single training experiment, first 100 trading periods.
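The cumulative accuracy plotted in Fig. 2 is the running proportion of correct direction predictions after each trading period; a minimal sketch (the function name is ours):

```python
def cumulative_accuracy(predictions, actuals):
    """Running accuracy (%) after each trading period, as plotted in Fig. 2."""
    correct, series = 0, []
    for i, (p, a) in enumerate(zip(predictions, actuals), start=1):
        correct += (p == a)
        series.append(100.0 * correct / i)
    return series

series = cumulative_accuracy(["UP", "UP", "DOWN"], ["UP", "DOWN", "DOWN"])
print(series)  # -> [100.0, 50.0, 66.66666666666667]
```

Early values of the series are dominated by the first few trades, which is why the over-60% accuracies of the first 50 periods discussed below fade towards 50% as more trades accumulate.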
It is worth noting from Table 2 how comparable accuracies, e.g. between K∗ with 49.84% and JRip with 49.76%, lead to considerably different cumulative returns of –13.59% and –22.08%, respectively. The same is observed for C4.5 and OneR, which have comparable accuracies of 51.08% and 51.27% respectively, yet considerably different cumulative returns of 1.2% and 31.96%. Even though the final accuracies are comparable, the return per trade shows that K∗ was less prone to lose money than JRip in the first case. While it is true that OneR's accuracy is not significantly higher than that of the other models, what captures attention is its positive average return per trade of 0.0119%, significantly higher than the rest of the classifiers, which made the model profitable at the end of the trading cycle. Data not presented in Table 2 showed that the average return per trade is 0.06537% for the long trades and 0.2691% for the short trades, which means that the OneR classifier was extremely profitable in downward periods for this particular experiment. For the other classifiers, while prediction accuracies may be similar to those observed in the literature (Chen & Shih, 2006; Eng et al., 2008; Kim, 2003; Lee et al., 2007; Li & Kuo, 2008; Tenti, 1996), the average return per trade does not allow the models to achieve a positive return at the end of the trading period. These initial results suggest that it is possible to generate positive returns with modest mid-range accuracy. The premise in (Barbosa & Belo, 2008b) points out that a low accuracy, such as the ones observed in Table 2, is not necessarily a bad situation if the returns from the correctly predicted trades are profitable enough in the long term to overcome the losses of the incorrectly predicted trades. This situation is precisely what an investor would expect in real-life trading, where an individual cannot win all the time, but can expect to be profitable when trades go well. In these experiments the cumulative return is calculated as a relative metric that is updated every period with the profit of the last trade, as shown in Eq. (9).

Fig. 2 presents the detailed cumulative accuracy during the first 100 periods. As one can expect from the results of Table 2, all the models tend to reach steady values of approximately 50%, but an interesting feature can also be noted: the first 50 trading periods after the initial training showed accuracies over 60% for most of the models, specifically for Naïve Bayes, OneR, K∗ and JRipper. Consequently, it could be presumed that if a periodic retraining process is executed every n periods (with n approximately less than 50), the cumulative accuracy may behave as it did in the initial period and thus sustain a higher average than the 50% obtained in the single training case.

4.2. Retraining experiments

The hypothesis that periodic retraining may increase the average accuracy motivated the next set of experiments, in which the retraining period was varied while two different sets of attributes were used to create the instances, as shown in Table 3. While the previous experiment followed the set-up proposed in (Barbosa & Belo, 2008b), at the time of conducting the experiments
Table 3
Description of the training and test sets – retraining experiments. Two sets of attributes were used.
Currency: USDJPY
Training set: 5242 instances, Wednesday, 02 Jan 2002 00:00 to Thursday, 18 Jan 2007 06:00
Test set (off-sample data): 6442 instances, Thursday, 18 Jan 2007 12:00 to Wednesday, 17 Apr 2013 00:00
Attributes
Five attributes: hour, day of the week, lagged percentage of price change, percentage of price change moving average (10 periods), class.
Nine attributes: hour, day of the week, closing price, percentage of price change, lagged percentage of price change, lagged percentage of price change moving average (10 periods), relative strength index, Williams %R, class.
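The derived attributes listed in Table 3 can be computed from raw price series roughly as follows (our own sketch of Eqs. (1)–(3) and (5); for RS we use the common close-to-close gain/loss averages rather than the paper's open/close formulation of Eqs. (6)–(8), and Williams %R follows its standard high/low definition, since Eq. (4) is not reproduced here):

```python
def ppc(close, i):
    """Eq. (1): percentage price change of period i."""
    return (close[i] - close[i - 1]) / close[i - 1] * 100

def lagged_ppc_ma(close, i, n):
    """Eq. (3): average of the current and the n-1 preceding price changes."""
    return sum(ppc(close, i - k) for k in range(n)) / n

def rsi(close, i, n):
    """Eq. (5): RSI = 100 - 100/(1 + RS). RS here is the ratio of average
    close-to-close gains to losses over the past n periods (a common
    variant; the paper's Eqs. (6)-(8) use open/close differences)."""
    diffs = [close[k] - close[k - 1] for k in range(i - n + 1, i + 1)]
    up = sum(d for d in diffs if d > 0) / n
    down = -sum(d for d in diffs if d < 0) / n
    return 100.0 if down == 0 else 100 - 100 / (1 + up / down)

def williams_r(high, low, close, i, n):
    """Williams %R over the past n periods, standard definition (0 to -100)."""
    hh, ll = max(high[i - n + 1:i + 1]), min(low[i - n + 1:i + 1])
    return (hh - close[i]) / (hh - ll) * -100

close = [100.0, 101.0, 100.5, 102.0]
high = [101.0, 102.0, 101.5, 102.5]
low = [99.0, 100.0, 100.0, 100.2]
print(round(ppc(close, 1), 4), round(rsi(close, 3, 3), 4),
      round(williams_r(high, low, close, 3, 3), 4))
```

Each attribute vector for period i then concatenates these values with the hour, day of the week and the class label observed in period i + 1.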
Fig. 3. Cumulative accuracy for financial prediction of USDJPY, discriminated by machine learning model, over the first 150 trading periods, with periodic retraining every 50 periods and an incremental training set size. Accuracies for the different classifiers tend to flatten towards their final average value from early in the trading period, which consisted of a total of 6442 points in time.
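The retraining experiments that follow all share the leave-n-out sliding-window loop described in Section 3.2; it can be sketched as follows (a schematic of the experimental loop only; `train` and `predict` stand in for any of the six WEKA classifiers, and the majority-class stand-ins below are purely illustrative):

```python
def walk_forward(instances, labels, window, retrain_every, train, predict):
    """Retrain on the last `window` instances every `retrain_every` periods,
    then classify the unseen periods until the next retraining point."""
    preds, model = [], None
    for i in range(window, len(instances)):
        if (i - window) % retrain_every == 0:
            model = train(instances[i - window:i], labels[i - window:i])
        preds.append(predict(model, instances[i]))
    return preds

# trivial majority-class stand-ins for a real classifier
train = lambda X, y: max(set(y), key=y.count)
predict = lambda model, x: model

labels = ["UP"] * 6 + ["DOWN"] * 6
preds = walk_forward(list(range(12)), labels, window=3, retrain_every=2,
                     train=train, predict=predict)
print(preds)  # -> ['UP', 'UP', 'UP', 'UP', 'UP', 'UP', 'DOWN', 'DOWN', 'DOWN']
```

Note that every prediction is made on data the current model has never seen, which is what makes this a forward test rather than a back test.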
for this paper, several more years of data were available. Price information for four additional years was included in the test set.

4.2.1. Periodic retraining every 50 periods, incremental training set size, 9 attributes
A first periodic retraining experiment was executed, setting the retraining period to 50 trades. The training set was incrementally grown by incorporating the information of the past 50 trading periods as new training instances. As expected, the time consumed in retraining the models also increased with every new iteration. Contrary to expectations, the cumulative accuracy did not show the expected recovery at the retraining points; the tendency to flatten around 50% average accuracy persisted. Fig. 3 shows the cumulative accuracy for a retraining period of n = 50 and an incremental training set size since inception, for the first 150 trading periods.

Table 4 presents the results obtained when retraining the models every 50 trading periods with an incremental training set size. An increment of two points in the total average accuracy was reached for all the models, with the exception of OneR, whose average accuracy decreased. Additionally, C4.5, JRipper and LMT obtained positive returns, although not significant for a trading period of almost six years. Given the results of Table 4, it is clear that the incremental augmentation of the training set size did not have a major effect on the average accuracy. On the other hand, it had a significant effect on the cumulative return, reflected in the average returns per trade. This general increase in profits may be caused by the fact that the periodic retraining allowed the models to learn new unseen patterns in the price trend. This characteristic seems to be critical when trading during 2008 and 2009, years in which the financial system suffered erratic behaviour and high volatility.

The training set increases its size with every retraining procedure; thus it is also possible to infer that a large set of instances used for retraining might be introducing noisy information into the construction of the models, because patterns in the distant past might be poorly related to the current situation of the market, and in some cases opposite relationships might be encountered in the training set. The simple models used in these tests might not be able to generalize rules from such a large amount of data. The cumulative results obtained by Naïve Bayes, OneR and K∗ may be evidence of this.

4.2.2. Experiments with variable retraining set size (sliding window), retraining periods and number of attributes
Periodic retraining improved the average accuracy and cumulative profits, but the incremental growth of the training set size may be affecting the classifiers' performance negatively. While it is important to give the models the opportunity to learn unseen patterns in the price trends, at the same time it is desirable that they predict using a more up-to-date training set that represents a fresher situation of the market, without taking into account particular conditions located too far in the past. Thus, in further tests, four training set sizes were selected using the last n instances in a sliding-window approach, trying to cover a range of training set sizes. The selected values were n = [500, 1000, 2000, 4000]. The retraining period was also varied, every 5, 10, 15 and 20 trades, and the instances were
Table 4
Prediction results for USDJPY. Retraining period = 50, retraining set size = incremental since inception, 9 attributes.
Ticker: USDJPY
Retrain set size: incremental
Retrain periods: 50
Attributes: <hour>, <day>, <closing_price>, <ppc>, <lppc>, <lppcma>, <RSI>, <Williams%R>, <class> (9 attributes)

Model | Accuracy (%) | DOWN accuracy (%) | UP accuracy (%) | SHORT accuracy (%) | Long accuracy (%) | MAXDD (%) | Average ret/trade (%) | Cumulative return (%) | Long average ret/trade (%) | Short average ret/trade (%) | Trades
OneR | 49.86 | 51.19 | 48.59 | 48.61 | 48.71 | 31.59 | –0.0005 | –7.22 | –0.0032 | 0.1560 | 6441
C4.5 | 52.25 | 54.57 | 50.78 | 52.39 | 50.74 | 20.28 | 0.0104 | 87.50 | 0.0065 | 0.3000 | 6441
JRip | 50.81 | 52.17 | 49.54 | 51.34 | 51.05 | 23.13 | 0.0042 | 25.81 | 0.0015 | –0.0239 | 6441
LMT | 52.84 | 54.58 | 51.41 | 51.41 | 52.34 | 22.42 | 0.0048 | 30.98 | 0.0019 | 0.0213 | 6441
Kstar | 50.73 | 52.05 | 49.45 | 50.33 | 49.88 | 37.21 | –0.0002 | –5.17 | –0.0028 | –0.1077 | 6441
NaiveBayes | 52.59 | 53.42 | 51.51 | 50.98 | 51.26 | 41.07 | –0.0034 | –23.13 | –0.0070 | –0.1716 | 6441
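The Cumulative return and MAXDD columns of Table 4 follow Eq. (9) and the maximum drawdown definition of Section 3.2; a minimal sketch (function names are ours):

```python
def cumulative_return(returns):
    """Compound per-trade returns as in Eq. (9):
    CumulativeReturn_i = (1 + CumulativeReturn_{i-1}) * (1 + Return_i) - 1."""
    cum = 0.0
    for r in returns:
        cum = (1 + cum) * (1 + r) - 1
    return cum

def max_drawdown(returns):
    """Largest peak-to-valley decline of the compounded equity curve."""
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        mdd = max(mdd, (peak - equity) / peak)
    return mdd

# e.g. three trades of +1%, -0.5% and +2%
print(round(cumulative_return([0.01, -0.005, 0.02]), 6))  # -> 0.025049
```

Because returns compound multiplicatively, two models with the same arithmetic average return per trade can still end the cycle with different cumulative returns, which is part of the divergence between accuracy and profit discussed below.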
constructed using both sets of attributes shown in Table 3. It is important to clarify that the objective of varying the parameters was not optimization, but an empirical evaluation of their effects on the models' prediction performance when varying the retraining period and the number of attributes. A heuristic approach was used to vary the parameters. In total, 12 different setups were used to cover different combinations of the variables, in order to visualize the effects of these parameters on the model predictions and the subsequent simulated trades: small retraining set size with frequent retraining; medium training set size with a longer period, using both five and nine attributes; and large retraining set size with less frequent retraining. Table 5 presents a consolidated summary of the 12 different configurations used for the experiments. In all cases the average accuracy was not higher than 55% for the USDJPY data set, but on the other hand, the cumulative profit was greatly improved.

A group of experiments was conducted with a retraining set of size 500, setups 1–3 in Table 5. Retraining every five trading periods using five attributes presented effects similar to those obtained with a retraining period of 10. Average accuracies increased by one or two points compared to the results observed in Table 4, which used an incremental training set size, reaching values of approximately 52–53%, with the exception of OneR, which maintained an accuracy between 49% and 50% while drastically decreasing its cumulative return. What is notable is the substantial increase in the cumulative return for the rest of the models. When retraining occurs every 15 periods using nine attributes, the average accuracies experienced a slight decay, reflected as well in the cumulative profits at the end of the trading season. Even though OneR experienced an increment of one point in its accuracy and a substantial increase in its cumulative return, the latter is still negative. K∗ presents a reversed behaviour, with 49.7567% accuracy while maintaining a high positive cumulative return.

The next group of experiments was conducted using a retraining set of 1000 instances; the results are presented in setups 4–9 in Table 5. Setup 4, with a retraining period of 10 and five attributes, showed that the OneR classifier benefited from the increased size of the training set compared with the previous setup, presenting a slight upturn in accuracy; but on the downside, it experienced a reduction in its cumulative return, which might suggest that the rule learned from a smaller set of attributes is not adequate to generate profits. OneR (One Rule) selects the minimum-error attribute for prediction. For this classifier, a greater training set might include higher error values as the number of training instances increases, or likewise if the number of attributes is increased, because a single attribute may not exhibit enough correlation with the prediction of the class. K∗ did not seem to show substantial improvements over the previous setup, and again presented a reverse effect when compared with the case of using nine attributes while maintaining the retraining set size and the retraining period, although it continued to present positive returns for all the setups using 1000 instances for the training set. For the rest of the models, the increase in the training set, compared with size 500, as well as the reduction in the number of attributes, generated beneficial effects in the average accuracy and cumulative profit. What is starting to appear is the fact that higher accuracy may not imply higher return. This is the case for K∗, which showed an accuracy of 49.76% and a cumulative return of 62.47% in setup 3, but exhibited 49.30% accuracy with an impressive 130.74% cumulative return in setup 6. The opposite effect is observed with JRipper, which obtained 51.92% accuracy with a 41.86% cumulative return in setup 3, and 52.32% accuracy with a smaller 12.46% cumulative return in setup 7.

More frequent retraining seems to have a positive effect on general classifier performance, but at the same time, a bigger retraining set compared to the previous set-up may produce positive results as well. Naïve Bayes uses a normal distribution to estimate the weights of the numeric attributes and a conditional probability derived from the frequency of occurrences for the nominal attributes; thus a bigger training set size will imply wider normal distribution shapes, since longer training periods are more prone to exhibit different financial time series regimes. A smaller training set may produce values more closely related to the current trend in the prices, and thus the model can exhibit higher accuracy. A longer retraining period of 15 with nine attributes per instance, used in setup 6, also exhibits a slight increase in most of the accuracies. K∗ presented an impressive cumulative return of 174.64%, higher than the one obtained in the other setups, which may suggest that this model is better at generating profitable trades when using the set of nine attributes. Less frequent retraining points seem to affect the JRipper model, which was more capable of generating profitable trades when retraining every 10 periods instead of 15. LMT seems not to be greatly affected by the change in the number of attributes. Naïve Bayes and C4.5 tend to be more accurate and more profitable with the reduced set of five attributes in all the experiments so far.

A small group of experiments was conducted with a retraining set size of 2000 instances, shown in setups 10 and 11 in Table 5. When retraining every 15 periods using nine attributes, C4.5, JRipper and K∗ increased their accuracy with the increased training set. Nevertheless, K∗ again presented a reverse effect, increasing its accuracy but reducing its capability to generate profitable trades, as can be seen in the reduction of the cumulative profit from 174.63% (setup 6) to 137.52% (setup 11), although still
Table 5
Consolidated results for the experiments with variations in retraining set size, retraining period, and number of attributes.
Experiment setup | Retrain set size | Retrain periods | # of attributes | Metrics | Machine learning models
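The 12 setups consolidated in Table 5 were hand-picked heuristically from the parameter ranges described in Section 4.2.2; enumerating the full grid shows how much larger the exhaustive space is (an illustrative sketch, not the authors' selection code):

```python
from itertools import product

# parameter ranges explored in Section 4.2.2
window_sizes = [500, 1000, 2000, 4000]
retrain_periods = [5, 10, 15, 20]
attribute_sets = [5, 9]

grid = list(product(window_sizes, retrain_periods, attribute_sets))
print(len(grid))  # -> 32 combinations; the paper evaluates 12 of them
```

Restricting the evaluation to 12 configurations keeps the simulation time manageable, since each configuration requires hundreds of model rebuilds over the 6442-period test set.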
attractive from an investor point of view. OneR decreased its accuracy and cumulative return as the training set increased. Naïve Bayes also reduced its accuracy and cumulative profit, a situation that offers strong evidence for the premise that increasing the training set can negatively affect the performance of these two models. Although the LMT model decreased its performance, the results are still outstanding, obtaining a 137.54% cumulative return in setup 10. For LMT, higher values of accuracy and cumulative return were obtained with a smaller training set of 1000 when using nine attributes.

For a larger retraining set size of 4000 instances, a longer retraining period of 20 was set, using 9 attributes, as shown in setup 12 in Table 5. It should be noted that as the size of the training set increases, the time consumed in building the models is also
Table 6
Combination of parameters for (a) maximum cumulative return and (b) maximum average accuracy, by model. As seen, in some cases the two maximums do not reflect a direct correlation.

(a)
                        OneR    C4.5    Jrip    LMT     Kstar   NaiveBayes
(b)
Max accuracy (%)        51.53   54.03   54.01   53.84   52.10   53.58
Cumulative return (%)   4.49    147.86  137.54  156.45  137.52  136.94
Retrain set size        1000    1000    2000    1000    2000    1000
Retrain periods         5       15      15      10      15      5
# of attributes         9       5       5       9       9       5
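The walk-forward retraining scheme whose parameters are varied in Tables 5 and 6 (retraining set size and retraining period over a sliding window) can be sketched as a short loop. This is an illustrative sketch, not the paper's implementation: the experiments used WEKA classifiers (OneR, C4.5, JRip, LMT, K∗, Naïve Bayes), whereas here scikit-learn's GaussianNB stands in for Naïve Bayes and synthetic data replaces the FOREX attribute vectors.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def walk_forward(X, y, train_size=1000, retrain_period=10, model_cls=GaussianNB):
    """Retrain on the most recent `train_size` instances every
    `retrain_period` steps; predict each out-of-sample step in between."""
    preds = []
    model = None
    for t in range(train_size, len(X)):
        if model is None or (t - train_size) % retrain_period == 0:
            model = model_cls()
            # sliding window: only the most recent instances are used
            model.fit(X[t - train_size:t], y[t - train_size:t])
        preds.append(model.predict(X[t:t + 1])[0])
    return np.array(preds)

# toy demonstration with synthetic attributes and up/down labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 5))                       # five attributes, as in setup 4
y = (X[:, 0] + rng.normal(size=1200) > 0).astype(int)
p = walk_forward(X, y, train_size=1000, retrain_period=10)
print(len(p))  # one prediction per out-of-sample period: 200
```

A shorter retraining period keeps the model closer to current market conditions at the cost of more frequent model building, which is the trade-off the text notes for the larger training sets.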
greatly increased. In general, the models obtained positive cumulative returns, but not as high as those obtained with smaller training sets. 4000 instances represent 76% of the initial training set, so this situation may replicate the negative effects on prediction performance observed for the first two setups, when no retraining was implemented or an incremental training set was used.

4.3. Results analysis

It is possible to note from Table 5 that the results obtained by the average 10-fold cross-validation do not seem to exhibit a correlation with the cumulative accuracy and the cumulative returns. Taking for example Setup 1 and Setup 2 for the Naïve Bayes model, it can be noted that both the accuracy and the cumulative return decreased simultaneously, from 52.99% to 52.74% for the accuracy and from 142.89% to 123.55% for the cumulative profit, whilst the assessment obtained by the 10-fold cross-validation increased from 50.90% to 51.30%. Following Setup 3, also for the Naïve Bayes classifier, the 10-fold cross-validation rises even further, to 53.31%, but the accuracy and cumulative return diminished to 52.20% and 54.76%, respectively. For Setups 4 and 5 for the Naïve Bayes model, the 10-fold cross-validation follows the decreasing behaviour observed in the accuracy and cumulative return. Similar contradictory cases can be found throughout Table 5, suggesting that 10-fold cross-validation might not be the most appropriate evaluation method for this type of study. Traditional k-fold cross-validation divides the data set into training and validation sets, usually using a random approach. As discussed, the models do not generalize well using a large training set, due to the fact that similar market conditions, represented by particular values of the attributes in the instances, can present opposite classes. In those cases, classifiers such as the ones used in these experiments are unable to generalize rules. The rationale behind using smaller training sets relies on the fact that points in time closer to the trading period that needs to be predicted are more likely to exhibit similar market conditions, and the training procedure will therefore use them. A 10-fold cross-validation cannot ensure that the most recent points in the time series are kept in the training set during a particular fold, and the performance obtained over the training set might not reflect the future performance of the classifier on incoming unseen instances.

The models used in the experiments present high sensitivity to the size of the training set and the number of attributes, as can be seen in the consolidated results presented in Table 5. Table 6 presents the maximum values obtained by each model under two criteria: (a) maximum cumulative return and (b) maximum average accuracy. OneR, JRipper and K∗ presented a direct correlation between accuracy and cumulative profit. C4.5, LMT and Naïve Bayes share the number of attributes used to construct the models in both cases, but with varying retraining periods and retraining set sizes.

One of the best setups for the experiments consisted of a retraining set of 1000 instances, retrained every 10 periods, with five attributes, in which five out of six classifiers presented positive and high cumulative returns. Fig. 4 presents a comparison of the evolution of the cumulative return over the entire off-sample trading period for the classifiers. What is interesting to note in Fig. 4 is that the trading period between 2007 and 2009 does not present good performance, possibly as a result of the very volatile behaviour of the economy during that particular period of time. On the contrary, the plot shows a consistent increase in the cumulative return after 2010 for most of the classifiers, with the exception of OneR. This may suggest that the selection of simple ML models for trading may be useful in times of "normal" market behaviour, but that such models will not respond well to extreme events such as the economic crisis during 2008 and the subsequent period of recovery. Each model seems to respond to a particular semi-optimal combination of retraining set size and number of attributes, but the retraining period does not affect the final accuracies significantly.

Similar experiments were conducted with other currency pairs. Table 7 shows the experimental setups that produced maximum cumulative returns and maximum accuracies for the EURGBP and EURUSD currency pairs. While the cumulative returns obtained by the classifiers for these new currencies were not as impressive as those obtained for USDJPY, with an appropriate balance between the retraining set size, retraining period and number of attributes the models still managed to obtain positive returns. The lower returns for the EURGBP and EURUSD pairs might reflect the fact that those pairs are less volatile than USDJPY, i.e. the exchange rate moves only a few pips from one trading period to the next, limiting the possibility of obtaining high returns per trade.

From a managerial point of view, the trading strategy followed in this paper must be refined. As seen in Fig. 4, the models are subject to individual losses that were especially critical during 2008 and 2009, years in which the financial system suffered erratic behaviour and high volatility. Once a more stable situation in the markets appeared, the models tended to recover and follow a gaining trend in their cumulative returns. The year 2008 kept all the models except K∗ on negative profits. The trading simulation does not take into account trading costs and leveraged trading. If the return obtained in each trade is not high enough to cover the associated costs, the strategy will lose money even though the prediction is accurate. In this sense, the average profit per trade can be an important metric to evaluate how much the associated costs affect the strategy's performance. Additionally, when trading under
Fig. 4. Cumulative return for the experiment setup: retraining set size = 1000; retraining period = 10; number of attributes = 5.
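Given each classifier's directional predictions and the subsequent percentage price changes, a cumulative-return curve like those compared in Fig. 4 is a running sum of per-trade returns. The sketch below uses made-up numbers, assumes a long position on an "up" prediction and a short position on "down" (the paper's exact order logic may differ), and ignores costs and leverage, as the text notes the simulation does.

```python
import numpy as np

def cumulative_return(pred_direction, pct_change):
    """Running cumulative return of a strategy that goes long when the
    classifier predicts up (1) and short when it predicts down (0)."""
    position = np.where(pred_direction == 1, 1.0, -1.0)
    return np.cumsum(position * pct_change)

# illustrative data only: predictions and next-period percentage changes
preds = np.array([1, 1, 0, 1])
moves = np.array([0.5, -0.25, -0.5, 0.25])   # % change in the next period
curve = cumulative_return(preds, moves)
print(curve[-1])  # final cumulative return: 1.0 (percent)
```

The correct short call on the third period turns a negative price move into a positive contribution, which is why accuracy and cumulative return need not move together.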
Table 7
Experiment setups for (a) maximum cumulative returns for EURGBP, (b) maximum accuracies for EURGBP, (c) maximum cumulative returns for EURUSD and (d) maximum accuracies for EURUSD.
leverage, the profits can be multiplied extensively, but so are the losses, where a single trade can generate a margin call. In addition, each of the machine learning models acts as an individual trader. A more efficient strategy could be implemented by combining the predictions of the different models in a classification ensemble that generates trades according to a weighted voting mechanism based on the past accuracy of the individual models. A weighted decision, in addition to risk management strategies such as stop-loss and take-profit limits, must be considered if ML approaches are to be used in a real-life automated trading strategy.

4.3.1. Accuracy comparison

Even though the rationale behind this study is to evaluate the trading capabilities of ML models that exhibit a greater degree of simplicity when compared with the traditional NNs and SVMs, at this point in the discussion it is worth examining the current results in light of other applications that have focused on those complex models. The following analysis must be treated carefully, since the financial instruments that have been assessed are generally not the same, and neither are the selections of attributes and training strategies. The main objective of this section is to compare and illustrate the prediction capabilities of machine learning in trading applications.

In (Tenti, 1996) a comparison of the performance of three recurrent neural networks is presented, based on their returns in simulated forecasts on currency futures, reporting accuracies between 43% and 48.5%. Wu and Lu (2009) compared a NN model's performance against the ARIMA model, predicting the direction of future values of the S&P 500 Index. The experiments reported a modest 23% level of accuracy against the ARIMA's 42% in more
volatile scenarios. In turn, in (Kamruzzaman & Sarker, 2003) the authors compared the performance of the ARIMA model with several NN models when forecasting exchange rates of currency pairs in the FOREX market, showing that all NN models outperformed the ARIMA model with an accuracy of 80%. Taking into account that only the best results obtained were reported, and that in general ML techniques do not present high levels of accuracy on unseen data that differs significantly from the training sets, these impressive results must be considered with care. Using SVM, the work presented in (Kim, 2003) attempted to forecast daily price directions of the KOSPI stock index, using technical indicators (momentum, Williams %R and the commodity channel index) as inputs; the best accuracy obtained after training several models with different parameters was 57.83%. The work also presented a comparison with a back-propagation NN (54.76% accuracy) and a nearest-neighbour model (51.98% accuracy). In another study (Tay & Cao, 2001), a comparison between SVM and a back-propagation NN is presented to forecast the prices of five types of futures contracts. On average, the SVM approach obtained better accuracy than the back-propagation NN, but also at middle-range levels: 47.7% for SVM against 45.0% for the back-propagation NN. SVM also outperformed a back-propagation NN in the work presented by Chen and Shih (2006), where these techniques were used to predict the value of six Asian indices, obtaining a 57.2% level of accuracy with SVM and 56.7% with NN models. In addition to simple models, Barbosa and Belo (2008b) also analysed the performance of SVM in the FOREX market, reporting a cumulative accuracy of 53.4% with a positive cumulative return of 56.0% when trading a currency pair over a two-year period.

As seen, attempts to predict financial time series tend to exhibit modest accuracies, not far from the experiments shown in this paper. Machine learning approaches are still far from being optimal solutions for financial forecasting, due to the complex nature of financial time series, erratic market behaviour, and the possible presence of extreme events that can undermine any generalization or pattern found in them, as shown in the capabilities and limitations of the models observed in this paper and in other studies.

5. Conclusions and future work

This paper discusses the usefulness of simple ML models applied to trading scenarios, to verify whether it is possible to obtain consistently profitable returns while taking advantage of the computational simplicity of binary classification models. An extensive set of trading simulations in the FOREX market was conducted, in which six ML models were used to classify a set of instances constructed from historical price data for the USDJPY, EURUSD and EURGBP currency pairs. The result of the classification was interpreted as a prediction and used to simulate a series of orders over a trading period of six years. While the paper does not attempt to incorporate a new theoretical contribution in the model construction, it provides valuable insight into the use of machine learning classification to elaborate predictions in financial time series. To the best knowledge of the authors, other related papers focus on the construction of sophisticated models that are evaluated by accuracy and/or RMSE. A theoretical contribution in model construction exceeds the scope of this paper, but it opens the door to exploring the construction of machine learning models that are tuned for the particular conditions given by a set of non-discriminatory attributes, such as technical indicators and time-series-related indexes, in financial applications.

The selected models present a low computational cost in both the training phase and the classification procedure, which benefits their implementation in more frequent trading time frames. The use of simple machine learning allows a seamless upgrade if new attributes are selected. The results have shown that while it is possible to obtain profitability using simple classifiers, each model needs a particular setup taking into account variables such as the retraining period, the retraining set size and the number and type of attributes selected to construct the model. The complexities of the market require a particular combination of the parameters that might change under different market conditions and seasons for the same instrument. The models need to learn new patterns in order to cope with the dynamics of the market but, at the same time, must only use a training set comprised of recent values of the time series, following a sliding-window approach, to avoid noisy patterns that might not be related to the current market situation. A heuristic approach was used to find a pseudo-optimal combination of the variables mentioned, but a more systematic approach would be needed to find an optimal trade-off that maximizes both accuracies and cumulative profits, taking into account the particularities of different financial instruments. Most of the models exhibited positive returns for certain experimental setups and combinations of parameters; nevertheless, their middle-range accuracy might be seen as the main weakness of the approach used in this paper.

Attribute selection represents a critical aspect in the construction of the ML models, since a good set of attributes derived from the financial time series will ease the process of classification. In this paper a set of attributes was selected from diverse fields, including price-related features such as the price itself and the percentage of price change; seasonality features such as the day of the week and the hour of the day; lagged values used in classical time series analysis, such as the lagged percentage of price change and its moving average; and, of course, technical indicators, in this case represented by the Williams %R oscillator and the RSI, which relates recent gains and losses over past trading periods. The attributes must be used in conjunction because those values by themselves do not produce enough information about the future trend of the price to effectively separate the classes in both the training set and the off-sample data, due to the complexity of financial markets. For instance, it is possible to observe that technical indicators with an oscillatory behaviour will repeat their values over time, as will the seasonality features. Prices also typically move within certain ranges during short periods of time, and the percentage of price change is likewise comparable from one instance to another when reduced trading periods are selected, i.e. minutes and hours. These attributes have been shown not to be highly discriminative, as observed in instances with similar attributes that present contradictory classes under different market conditions. Thus the use of attributes coming from technical analysis and classic time series analysis provides a means to represent the state of the market at a particular point in time and, as has been demonstrated in this paper, allows profitable trading to be produced despite the fact that the obtained accuracy is not impressive.

Future work will include the use of an extended set of attributes that explores some of the extensive offering of technical indicators available in the trading literature for the construction of the classifiers. The main risk of a large group of characteristics is the presence of attributes that represent redundant information. To overcome this drawback, a feature selector can also be implemented, such as the one presented in (Cai, Hu, & Lin, 2012), which uses a Restricted Boltzmann Machine to extract features from technical indicators that in turn are the inputs for an SVM-based regression model. Future research will also include new ML models and other markets besides FOREX. Dynamic retraining windows and variable training set sizes can be investigated for particular ML models in order to determine the precise moment in time when a model needs to be replaced with a better setup of parameters.
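The attribute families described in this section — price, percentage of price change, seasonality, lagged values with a moving average, Williams %R and the RSI — can be assembled with a few vectorized pandas operations. The sketch below is illustrative: the 14-period windows and simple (rather than smoothed) moving averages are assumptions, not necessarily the exact indicator settings used in the experiments.

```python
import numpy as np
import pandas as pd

def build_attributes(prices: pd.Series, lag=1, window=14):
    """One attribute vector per period: price, % change, lagged % change and
    its moving average, day-of-week / hour seasonality, Williams %R, RSI."""
    pct = prices.pct_change() * 100
    high = prices.rolling(window).max()
    low = prices.rolling(window).min()
    williams_r = -100 * (high - prices) / (high - low)   # Williams %R in [-100, 0]
    delta = prices.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rsi = 100 - 100 / (1 + gain / loss)                  # simple-average RSI
    return pd.DataFrame({
        "price": prices,
        "pct_change": pct,
        "pct_change_lag": pct.shift(lag),
        "pct_change_ma": pct.rolling(window).mean(),
        "day_of_week": prices.index.dayofweek,           # seasonality features
        "hour": prices.index.hour,
        "williams_r": williams_r,
        "rsi": rsi,
    }).dropna()

# toy hourly series standing in for FOREX quotes
idx = pd.date_range("2010-01-04", periods=100, freq="h")
prices = pd.Series(90 + np.sin(np.arange(100) / 5), index=idx)
attrs = build_attributes(prices)
print(attrs.shape[1])  # 8 attributes per instance
```

Rows that fall inside the warm-up window of the rolling indicators are dropped, mirroring the fact that indicator-based attributes are only defined once enough history is available.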
Acknowledgements

Eduardo Gerlein is supported by a Vice-Chancellor Research Scholarship (VCRS) from the University of Ulster, as part of the Capital Markets Engineering project.

References

Barbosa, R. P. (2011). Agents in the market place: An exploratory study on using intelligent agents to trade financial instruments (Ph.D. thesis, p. 307). Universidade do Minho, Escola de Engenharia, Braga, Portugal.
Barbosa, R. P., & Belo, O. (2008). Algorithmic trading using intelligent agents. In H. R. Arabnia & Y. Mun (Eds.), Proceedings of the 2008 International Conference on Artificial Intelligence, ICAI 2008 (pp. 136–142). CSREA Press.
Barbosa, R. P., & Belo, O. (2008). Autonomous forex trading agents. In Proceedings of the 8th Industrial Conference on Advances in Data Mining: Medical Applications, E-Commerce, Marketing, and Theoretical Aspects, ICDM'08 (pp. 389–403). Springer-Verlag. doi:10.1007/978-3-540-70720-2_30.
Barbosa, R. P., & Belo, O. (2010). The agent-based hedge fund. In IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (pp. 449–452). IEEE. doi:10.1109/WI-IAT.2010.149.
Becket, M., & Essen, Y. (2010). How the stock market works (3rd ed., p. 209). London, U.K.: Kogan Page.
Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1994). Time series analysis: Forecasting and control (p. 598). Wiley.
Cai, X., Hu, S., & Lin, X. (2012). Feature extraction using restricted Boltzmann machine for stock price prediction. In 2012 IEEE International Conference on Computer Science and Automation Engineering (CSAE) (pp. 80–83). IEEE. doi:10.1109/CSAE.2012.6272913.
Chan, E. P. (2008). Quantitative trading (p. 208). John Wiley & Sons.
Chen, W., & Shih, J. (2006). Comparison of support-vector machines and back propagation neural networks in forecasting the six major Asian stock markets. International Journal of Electronic Finance, 1(1), 49–67. doi:10.1504/IJEF.2006.008837.
Cleary, J. G., & Trigg, L. E. (1995). K∗: An instance-based learner using an entropic distance measure. In Proceedings of the 12th International Conference on Machine Learning: Vol. 5 (pp. 1–14).
Cohen, W. W. (1995). Fast effective rule induction. In Proceedings of the 12th International Conference on Machine Learning (pp. 115–123). Morgan Kaufmann.
Di, M. (2007). A survey of machine learning in wireless sensor networks from networking and application perspectives. In 2007 6th International Conference on Information, Communications & Signal Processing (pp. 1–5). IEEE. doi:10.1109/ICICS.2007.4449882.
Duhigg, C. (2006). Artificial intelligence applied heavily to picking stocks. The New York Times. Retrieved September 26, 2014, from http://www.nytimes.com/2006/11/23/business/worldbusiness/23iht-trading.3647885.html?pagewanted=all&_r=0
Ellis, C. D. (2001). Wall street people: True stories of today's masters and moguls (p. 360). John Wiley & Sons.
Eng, M. H., Li, Y., Wang, Q.-G., & Lee, T. H. (2008). Forecast forex with ANN using fundamental data. In International Conference on Information Management, Innovation Management and Industrial Engineering (pp. 279–282). IEEE. doi:10.1109/ICIII.2008.302.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4), 987–1007.
Freed, M., & Lee, J. (2013). Application of support vector machines to the classification of galaxy morphologies. In 2013 International Conference on Computational and Information Sciences (pp. 322–325). IEEE. doi:10.1109/ICCIS.2013.92.
González, E., Avila, J., & Bustacara, C. (2003). BESA: Behavior-oriented, event-driven and social-based agent framework. In Parallel and Distributed Processing Techniques and Applications - PDPTA'03. CSREA Press.
González, E., & Torres, M. (2006). Organizational approach for agent oriented programming. In 8th International Conference on Enterprise Information Systems - ICEIS (pp. 75–80).
Hendershott, T. (2003). Electronic trading in financial markets. IT Professional, 5(4), 10–14. doi:10.1109/MITP.2003.1216227.
Kamruzzaman, J., & Sarker, R. A. (2003). Comparing ANN based models with ARIMA for prediction of forex rates. ASOR Bulletin, 22(2), 1–11.
Khan, K., Baharudin, B. B., Khan, A., & E-Malik, F. (2009). Mining opinion from text documents: A survey. In 3rd IEEE International Conference on Digital Ecosystems and Technologies (pp. 217–222). IEEE. doi:10.1109/DEST.2009.5276756.
Kim, K. (2003). Financial time series forecasting using support vector machines. Neurocomputing, 55(1-2), 307–319. doi:10.1016/S0925-2312(03)00372-2.
Landwehr, N., Hall, M., & Frank, E. (2005). Logistic model trees. Machine Learning, 59(1-2), 161–205. doi:10.1007/s10994-005-0466-3.
Lee, J. W., Park, J., O, J., Lee, J., & Hong, E. (2007). A multiagent approach to Q-learning for daily stock trading. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 37(6), 864–877. doi:10.1109/TSMCA.2007.904825.
Li, J., Tsang, E. P. K., & Park, W. (1999). Investment decision making using FGP: A case study. In Congress on Evolutionary Computation (CEC'99): Vol. 5 (pp. 6–9).
Li, S., & Kuo, S. (2008). Knowledge discovery in financial investment for forecasting and trading strategy through wavelet-based SOM networks. Expert Systems with Applications, 34(2), 935–951. doi:10.1016/j.eswa.2006.10.039.
Lu, C.-C., & Wu, C.-H. (2009). Support vector machine combined with GARCH models for call option price prediction. In International Conference on Artificial Intelligence and Computational Intelligence (pp. 35–40). IEEE. doi:10.1109/AICI.2009.464.
Maggini, M., Giles, C. L., & Horne, B. (1997). Financial time series forecasting using K-nearest neighbors classification. In Proceedings of the 1st Nonlinear Financial Forecasting Conf. (INFFC'97) (pp. 169–181).
McDonald, S., Coleman, S., McGinnity, T. M., Li, Y., & Belatreche, A. (2014). A comparison of forecasting approaches for capital markets. In Proceedings of 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr) (pp. 32–39). IEEE. doi:10.1109/CIFEr.2014.6924051.
Nawaz, S., & Mirza, N. (2012). Calendar anomalies and stock returns: A literature survey. Journal of Basic and Applied Scientific Research, 2(12), 12321–12329.
Nguyen, T., & Armitage, G. (2008). A survey of techniques for internet traffic classification using machine learning. IEEE Communications Surveys & Tutorials, 10(4), 56–76. doi:10.1109/SURV.2008.080406.
Patterson, S. Letting the machines decide. The Wall Street Journal [online], Europe Edition. Retrieved February 10, 2016, from http://online.wsj.com/article/SB10001424052748703834604575365310813948080.html
Qi, M., & Zhang, G. P. (2008). Trend time-series modeling and forecasting with neural networks. IEEE Transactions on Neural Networks, 19(5), 808–816. doi:10.1109/TNN.2007.912308.
Quinlan, R. (1993). C4.5: Programs for machine learning (p. 302). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Sewell, M. V., & Yan, W. (2008). Ultra high frequency financial data. In Proceedings of the Conference Companion on Genetic and Evolutionary Computation - GECCO '08 (p. 1847). ACM Press. doi:10.1145/1388969.1388988.
Smicklas, T. (2008). Retrieved September 24, 2014, from http://seekingalpha.com/article/105288-s-and-p-neural-fair-value-25-portfolio-good-source-for-investment-ideas
Tay, F. E., & Cao, L. (2001). Application of support vector machines in financial time series forecasting. Omega, 29(4), 309–317. doi:10.1016/S0305-0483(01)00026-3.
Tenti, P. (1996). Forecasting foreign exchange rates using recurrent neural networks. Applied Artificial Intelligence, 10(6), 567–582.
Wang, X., Smith, K. A., & Hyndman, R. J. (2005). Dimension reduction for clustering time series using global characteristics. In Proceedings of the 5th International Conference on Computational Science (ICCS'05) - Part III: Vol. 3516 (pp. 792–795). doi:10.1007/11428862_108.
Wernick, M., Yang, Y., Brankov, J., Yourganov, G., & Strother, S. (2010). Machine learning in medical imaging. IEEE Signal Processing Magazine, 27(4), 25–38. doi:10.1109/MSP.2010.936730.
White, H. (2000). A reality check for data snooping. Econometrica, 68(5), 1097–1126.
Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques (3rd ed., p. 664). Elsevier.
Yamazaki, T., & Ozasa, S. (n.d.). Ex-Goldman Sachs banker starts hedge fund analyzing Japanese blog traffic. Bloomberg. Retrieved November 08, 2012, from http://www.bloomberg.com/news/2011-04-21/ex-goldman-banker-starts-hedge-fund-analyzing-japanese-blogs.html
Yoo, P. D., Kim, M. H., & Jan, T. (2007). Machine learning techniques and use of event information for stock market prediction: A survey and evaluation. In International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06): Vol. 2 (pp. 835–841). IEEE. doi:10.1109/CIMCA.2005.1631572.
Zamani, M., & Kremer, S. C. (2011). Amino acid encoding schemes for machine learning methods. In 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) (pp. 327–333). IEEE. doi:10.1109/BIBMW.2011.6112394.