Reinforcement Learning for Trading (NIPS 1998)
Abstract
We propose to train trading systems by optimizing financial objec-
tive functions via reinforcement learning. The performance func-
tions that we consider are profit or wealth, the Sharpe ratio and
our recently proposed differential Sharpe ratio for online learn-
ing. In Moody & Wu (1997), we presented empirical results that
demonstrate the advantages of reinforcement learning relative to
supervised learning. Here we extend our previous work to com-
pare Q-Learning to our Recurrent Reinforcement Learning (RRL)
algorithm. We provide new simulation results that demonstrate
the presence of predictability in the monthly S&P 500 Stock Index
for the 25 year period 1970 through 1994, as well as a sensitivity
analysis that provides economic insight into the trader's structure.
Though much theoretical progress has been made in recent years in the area of rein-
forcement learning, there have been relatively few successful, practical applications
of the techniques. Notable examples include Neurogammon (Tesauro 1989), the
asset trader of Neuneier (1996), an elevator scheduler (Crites & Barto 1996) and a
space-shuttle payload scheduler (Zhang & Dietterich 1996).
In this paper we present results for reinforcement learning trading systems that
outperform the S&P 500 Stock Index over a 25-year test period, thus demonstrating
the presence of predictable structure in US stock prices. The reinforcement learning
algorithms compared here include our new recurrent reinforcement learning (RRL)
method (Moody & Wu 1997, Moody et al. 1998) and Q-Learning (Watkins 1989).
For a trader taking positions F_t \in \{-1, 0, 1\} in a single risky asset, the additive profit accumulated over T periods is

P_T = \sum_{t=1}^{T} R_t = \mu \sum_{t=1}^{T} \left\{ r_t^f + F_{t-1} (r_t - r_t^f) - \delta \, |F_t - F_{t-1}| \right\}   (1)

where r_t is the period-t return on the risky asset, r_t^f is the risk-free rate, \delta is the transaction cost rate and \mu is the fixed position size.
1 See Moody et al. (1998) for a detailed discussion of multiple asset portfolios.
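As a concrete illustration of equation (1), the following minimal Python sketch computes the cumulative additive profit from return and position sequences. The function name cumulative_profit and the default values of mu and delta are illustrative assumptions, not quantities taken from the paper.

    import numpy as np

    def cumulative_profit(r, rf, F, mu=1.0, delta=0.001):
        """Additive profit of Eq. (1).  r, rf, F are length-T arrays holding the
        asset returns r_t, risk-free returns r_t^f and positions F_t in {-1, 0, 1};
        the position before trading starts (F_0) is taken to be flat (0)."""
        F_prev = np.concatenate(([0.0], F[:-1]))                   # F_{t-1}
        R = rf + F_prev * (r - rf) - delta * np.abs(F - F_prev)    # per-period return R_t
        return mu * np.sum(R)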
The Sharpe ratio (Sharpe 1966) of the trading returns is

S_T = \frac{\text{Average}(R_t)}{\text{Standard Deviation}(R_t)}   (3)

where the average and standard deviation are estimated for periods t = \{1, \ldots, T\}.
Proper on-line learning requires that we compute the influence on the Sharpe ratio
of the return at time t. To accomplish this, we have derived a new objective func-
tion called the differential Sharpe ratio for on-line optimization of trading system
performance (Moody et al. 1998). It is obtained by considering exponential moving
averages of the returns and standard deviation of returns in (3), and expanding to
first order in the decay rate \eta: S_t \approx S_{t-1} + \eta \left. \frac{dS_t}{d\eta} \right|_{\eta=0} + O(\eta^2). Noting that only the
first order term in this expansion depends upon the return R_t at time t, we define
the differential Sharpe ratio as:

D_t \equiv \frac{dS_t}{d\eta} = \frac{B_{t-1} \, \Delta A_t - \frac{1}{2} A_{t-1} \, \Delta B_t}{\left( B_{t-1} - A_{t-1}^2 \right)^{3/2}}   (4)
where the quantities A_t and B_t are exponential moving estimates of the first and
second moments of R_t:

A_t = A_{t-1} + \eta \, \Delta A_t = A_{t-1} + \eta (R_t - A_{t-1}),
B_t = B_{t-1} + \eta \, \Delta B_t = B_{t-1} + \eta (R_t^2 - B_{t-1}).   (5)
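The update implied by equations (4) and (5) can be carried out recursively in constant time per period. The following minimal sketch illustrates one such update step; the function differential_sharpe_update, its argument names and the small eps guard against a zero denominator are illustrative assumptions rather than details given in the paper.

    def differential_sharpe_update(R_t, A_prev, B_prev, eta=0.01, eps=1e-12):
        """One on-line step: compute the differential Sharpe ratio D_t (Eq. 4) and the
        updated moving moment estimates A_t, B_t (Eq. 5) from the new return R_t."""
        dA = R_t - A_prev                         # Delta A_t
        dB = R_t ** 2 - B_prev                    # Delta B_t
        D_t = (B_prev * dA - 0.5 * A_prev * dB) / max((B_prev - A_prev ** 2) ** 1.5, eps)
        A_t = A_prev + eta * dA                   # exponential moving first moment
        B_t = B_prev + eta * dB                   # exponential moving second moment
        return D_t, A_t, B_t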
In the reinforcement learning framework, the trading system receives a reinforcement signal (a
reward) that provides information on whether its actions are good or bad. The
performance function at time T can be expressed as a function of the sequence of
trading returns: U_T = U(R_1, R_2, \ldots, R_T).
Given a trading system model F_t(\theta), the goal is to adjust the parameters \theta in
order to maximize U_T. This maximization for a complete sequence of T trades
can be done off-line using dynamic programming or batch versions of recurrent
reinforcement learning algorithms. Here we do the optimization on-line using a
reinforcement learning technique. This reinforcement learning algorithm is based
on stochastic gradient ascent. The gradient of U_T with respect to the parameters \theta
of the system after a sequence of T trades is
\frac{dU_T(\theta)}{d\theta} = \sum_{t=1}^{T} \frac{dU_T}{dR_t} \left\{ \frac{dR_t}{dF_t} \frac{dF_t}{d\theta} + \frac{dR_t}{dF_{t-1}} \frac{dF_{t-1}}{d\theta} \right\}   (6)
The parameters are then updated on-line using \Delta\theta_t = \rho \, dU_t(\theta_t)/d\theta_t. Because of the
recurrent structure of the problem (necessary when transaction costs are included),
we use a reinforcement learning algorithm based on real-time recurrent learning
(Williams & Zipser 1989). This approach, which we call recurrent reinforcement
learning (RRL), is described in (Moody & Wu 1997, Moody et al. 1998) along with
extensive simulation results.
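The sketch below illustrates one plausible on-line RRL trader of this kind: a single tanh unit with a recurrent input F_{t-1}, trained by stochastic gradient ascent on the differential Sharpe ratio, with the recurrent derivative dF_t/d\theta propagated as in real-time recurrent learning. The function rrl_train_online, the continuous position in [-1, 1] and all default hyperparameters are assumptions made for illustration; they are not necessarily the exact architecture or settings used in the experiments reported here.

    import numpy as np

    def rrl_train_online(r, rf, x, eta=0.01, rho=0.1, delta=0.001, seed=0):
        """On-line RRL sketch: position F_t = tanh(theta . [x_t, F_{t-1}, 1]),
        weights updated by gradient ascent on the differential Sharpe ratio."""
        rng = np.random.default_rng(seed)
        T, n = x.shape
        theta = rng.normal(scale=0.1, size=n + 2)   # weights for [x_t, F_{t-1}, bias]
        F_prev, dF_prev = 0.0, np.zeros(n + 2)      # F_{t-1} and dF_{t-1}/dtheta
        A, B = 0.0, 0.0                             # moving moment estimates
        positions = np.zeros(T)
        for t in range(T):
            z = np.concatenate((x[t], [F_prev, 1.0]))
            F = np.tanh(theta @ z)
            # per-period return (Eq. 1 with mu = 1) and its derivatives
            R = rf[t] + F_prev * (r[t] - rf[t]) - delta * abs(F - F_prev)
            dR_dF = -delta * np.sign(F - F_prev)
            dR_dFprev = (r[t] - rf[t]) + delta * np.sign(F - F_prev)
            # recurrent derivative dF_t/dtheta (real-time recurrent learning)
            dF = (1.0 - F ** 2) * (z + theta[-2] * dF_prev)
            # derivative of the differential Sharpe ratio with respect to R_t
            dD_dR = (B - A * R) / max((B - A ** 2) ** 1.5, 1e-12)
            # on-line update Delta theta = rho * dU_t/dtheta (one-step form of Eq. 6)
            theta += rho * dD_dR * (dR_dF * dF + dR_dFprev * dF_prev)
            A += eta * (R - A)                      # update moving moments (Eq. 5)
            B += eta * (R ** 2 - B)
            F_prev, dF_prev = F, dF
            positions[t] = F
        return theta, positions

Using a continuous position avoids the non-differentiability of a hard {-1, 0, 1} decision; a discrete trading signal could be recovered by thresholding the tanh output.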
To gain economic insight into the trader's structure, we measure the sensitivity of the trading system output F with respect to each input x_i, normalized by the largest sensitivity:

S_i = \left| \frac{dF}{dx_i} \right| \Big/ \max_j \left| \frac{dF}{dx_j} \right|   (8)
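Once the output gradients are available (for example from the same derivative computation used in training), equation (8) is a simple normalization. In this hypothetical helper, dF_dx stands for the derivatives of the trader output with respect to its inputs, averaged over time and over an ensemble of traders.

    import numpy as np

    def input_sensitivities(dF_dx):
        """Normalized sensitivities of Eq. (8): S_i = |dF/dx_i| / max_j |dF/dx_j|."""
        abs_grad = np.abs(np.asarray(dF_dx, dtype=float))
        return abs_grad / max(abs_grad.max(), 1e-12)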
Figure 1: Test results for ensembles of simulations using the S&P 500 stock in-
dex and 3-month Treasury Bill data over the 1970-1994 time period. The solid
curves correspond to the "RRL" voting system performance, dashed curves to the
"Qtrader" voting system and the dashed and dotted curves indicate the buy and
hold performance. The boxplots in (a) show the performance for the ensembles
of "RRL" and "Qtrader" trading systems. The horizontal lines indicate the per-
formance of the voting systems and the buy and hold strategy. Both systems
significantly outperform the buy and hold strategy. (b) shows the equity curves
associated with the voting systems and the buy and hold strategy, as well as the
voting trading signals produced by the systems. In both cases, the traders avoid
the dramatic losses that the buy and hold strategy incurred during 1974 and 1987.
Figure 2: Sensitivity traces for three of the inputs to the "RRL" trading system
averaged over the ensemble of traders. The nonstationary relationships typical
of economic variables are evident from the time-varying sensitivities.
References
Crites, R. H. & Barto, A. G. (1996), Improving elevator performance using reinforcement
learning, in D. S. Touretzky, M. C. Mozer & M. E. Hasselmo, eds, 'Advances in NIPS',
Vol. 8, pp. 1017-1023.
Moody, J. & Wu, L. (1997), Optimization of trading systems and portfolios, in Y. Abu-
Mostafa, A. N. Refenes & A. S. Weigend, eds, 'Decision Technologies for Financial
Engineering', World Scientific, London, pp. 23-35. This is a slightly revised version
of the original paper that appeared in the NNCM*96 Conference Record, published
by Caltech, Pasadena, 1996.
Moody, J., Wu, L., Liao, Y. & Saffell, M. (1998), 'Performance functions and reinforcement
learning for trading systems and portfolios', Journal of Forecasting 17, 441-470.
Neuneier, R. (1996), Optimal asset allocation using adaptive dynamic programming, in
D. S. Touretzky, M. C. Mozer & M. E. Hasselmo, eds, 'Advances in NIPS', Vol. 8,
pp. 952-958.
Sharpe, W. F. (1966), 'Mutual fund performance', Journal of Business, pp. 119-138.
Tesauro, G. (1989), 'Neurogammon wins the computer olympiad', Neural Computation 1, 321-323.
Watkins, C. J. C. H. (1989), Learning from Delayed Rewards, PhD thesis, Cambridge
University, Psychology Department.
Williams, R. J. & Zipser, D. (1989), 'A learning algorithm for continually running fully
recurrent neural networks', Neural Computation 1, 270-280.
Zhang, W. & Dietterich, T. G. (1996), High-performance job-shop scheduling with a time-
delay TD(λ) network, in D. S. Touretzky, M. C. Mozer & M. E. Hasselmo, eds, 'Ad-
vances in NIPS', Vol. 8, pp. 1024-1030.