Dynamic Portfolio Management With Transaction Costs
Alberto Suárez
Computer Science Department, Universidad Autónoma de Madrid
28049 Madrid, Spain
[email protected]
John Moody, Matthew Saffell
International Computer Science Institute
1947 Center Street, Suite 600, Berkeley, CA 94704, USA
[email protected], [email protected]
Abstract
We develop a recurrent reinforcement learning (RRL) system that directly induces portfolio management policies from time series of asset prices and indicators, while accounting for transaction costs. The RRL approach learns a direct mapping from indicator series to portfolio weights, bypassing the need to explicitly model the time series of price returns. The resulting policies dynamically optimize the portfolio Sharpe ratio, while incorporating changing conditions and transaction costs. A key problem with many portfolio optimization methods, including Markowitz's, is discovering corner solutions with weight concentrated on just a few assets. In a dynamic context, naive portfolio algorithms can exhibit switching behavior, particularly when transaction costs are ignored. In this work, we extend the RRL approach to produce better diversified portfolios and smoother asset allocations over time. The solutions we propose are to include realistic transaction costs and to shrink portfolio weights toward the prior portfolio. The methods are assessed on a global asset allocation problem consisting of the Pacific, North America and Europe MSCI International Equity Indices.
1 Introduction
The selection of optimal portfolios is a central problem of great interest in quantitative finance, one that still defies complete solution [1, 2, 3, 4, 5, 6, 7, 8, 9]. A drawback of the standard framework formulated by Markowitz [1] is that only one period is used in the evaluation of portfolio performance. In fact, no dynamics are explicitly considered. As in many other financial planning problems, the potential improvements from modifying the portfolio composition should be weighed against the costs of the reallocation of capital, taxes, market impact, and other state-dependent factors. The performance of an investment depends on a sequence of portfolio rebalancing decisions over several periods. This problem has been addressed using different techniques, such as dynamic programming [2, 5], stochastic network programming [3], tabu search [4], reinforcement learning [7] and Monte Carlo methods [8, 9]. A key problem with many portfolio optimization methods, including Markowitz's, is finding corner solutions with weight concentrated on just a few assets. In a dynamic context, naive portfolio algorithms can exhibit switching behavior, particularly when transaction costs are ignored. In this work, we address the asset management problem following the proposal of Moody et al. [6, 7], and use reinforcement learning to optimize objective functions, such as the Sharpe ratio, that directly measure the performance of the trading system. A recurrent softmax architecture learns a direct mapping from indicator series to portfolio weights, and the recurrence enables the incorporation of transaction costs. The softmax network parameters are optimized via the recurrent reinforcement learning (RRL) algorithm. We extend the RRL approach to produce more evenly diversified portfolios and smoother asset allocations over time. The solutions we propose are to incorporate realistic transaction costs and
Figure 1: Architecture for the reinforcement learning system. The system improves on previous proposals by directly considering, when $\gamma > 0$, the current portfolio composition in the determination of the new portfolio weights.
to shrink portfolio weights toward the prior portfolio. The methods are assessed on a global asset allocation problem consisting of the Pacic, North America and Europe MSCI International Equity Indices.
The composition of the portfolio at time t is determined as a convex combination of the output of the recurrent softmax network, $\mathbf{y}_t$, and the previously held portfolio weights:

$$\mathbf{w}_t = \gamma\, \mathbf{w}_{t-1} + (1-\gamma)\, \mathbf{y}_t \qquad (1)$$
The relative importance of these two terms in the final output is controlled by a hyperparameter $\gamma \in [0, 1]$. For $\gamma = 0$, the final prediction is directly the output of the softmax network. In the absence of transaction costs, a new portfolio can be created at no expense. In this case, the currently held portfolio need not be used as a reference, and $\gamma = 0$ should be used. If transaction costs are nonzero, it is necessary to ensure that the expected return from dynamically managing the investments outweighs the cost of modifying the composition of the portfolio. The costs are deterministic and can be calculated once the new makeup of the portfolio is established. By contrast, the returns expected from the investment are uncertain. If they are overestimated (e.g. when there is overfitting), the costs will dominate, and the dynamic management strategy that seeks to maximize returns by rebalancing will perform poorly. A value $\gamma > 0$ causes the composition of the portfolio to vary smoothly, which should lead to improved performance in the presence of transaction costs. The parameters of the RRL system are fixed by either directly maximizing the wealth accumulated over the training period or by optimizing an exponentially smoothed Sharpe ratio [6]. The training algorithm is a variant of gradient ascent with a learning rate parameter, extended to take into account the recurrent terms in Eq. (1). The hyperparameters of the learning system, such as $\gamma$ and the learning rate, can be determined by either holdout validation or cross-validation.
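As an illustration, the following Python sketch implements the weight update of Eq. (1), a transaction-cost charge, and the exponentially smoothed Sharpe ratio of Moody et al. [6]. It is a minimal sketch, not the implementation used in the experiments: the proportional cost model (a rate delta applied to the L1 change in weights), the smoothing rate eta, and all function names are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax; maps network scores to valid
    (non-negative, summing-to-one) portfolio weights."""
    e = np.exp(z - z.max())
    return e / e.sum()

def portfolio_weights(scores, w_prev, gamma):
    """Eq. (1): convex combination of the softmax output and the
    previously held portfolio; gamma in [0, 1] shrinks the new
    weights toward the prior portfolio."""
    return gamma * w_prev + (1.0 - gamma) * softmax(scores)

def net_return(w_new, w_prev, asset_returns, delta):
    """One-period portfolio return net of proportional transaction
    costs delta, ignoring the drift of the weights between
    rebalancing dates (an assumption of this sketch)."""
    cost = delta * np.abs(w_new - w_prev).sum()
    return float(np.dot(w_new, asset_returns)) - cost

def smoothed_sharpe(returns, eta=0.01):
    """Exponentially smoothed Sharpe ratio: the first and second
    moments of the return series are tracked with decay rate eta [6]."""
    A = B = 0.0
    for r in returns:
        A += eta * (r - A)       # smoothed mean return
        B += eta * (r ** 2 - B)  # smoothed second moment
    return A / np.sqrt(max(B - A ** 2, 1e-12))
```

During training, gradient ascent follows the derivative of this smoothed Sharpe ratio with respect to the network parameters; the dependence of the current weights on $\mathbf{w}_{t-1}$ in Eq. (1) is what makes the gradient computation recurrent.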
Table 1: Performance of the portfolios selected by the different strategies. The values displayed between parentheses are performance measures relative to the market portfolio: the ratio of the profit of the corresponding portfolio to the profit of the market portfolio, and the difference between the Sharpe ratio of the portfolio and the Sharpe ratio of the market portfolio.

Profit
Cost   Market    Markowitz           RRL
0%     2.9084    3.1889 ( 1.0965)    3.4507 ( 1.1865)
1%     2.9084    2.9094 ( 1.0003)    3.1825 ( 1.0942)
2%     2.9084    2.6539 ( 0.9125)    3.1749 ( 1.0916)
3%     2.9084    2.4205 ( 0.8322)    2.9176 ( 1.0031)
5%     2.9084    2.0125 ( 0.6920)    2.8342 ( 0.9745)

Sharpe ratio
Cost   Market    Markowitz           RRL
0%     0.4767    0.5147 ( 0.0381)    0.5456 ( 0.0689)
1%     0.4767    0.4804 ( 0.0037)    0.5110 ( 0.0350)
2%     0.4767    0.4457 (-0.0309)    0.5080 ( 0.0314)
3%     0.4767    0.4108 (-0.0658)    0.4793 ( 0.0027)
5%     0.4767    0.3405 (-0.1362)    0.4682 (-0.0084)

(The market portfolio is never rebalanced, so its profit and Sharpe ratio do not depend on the transaction cost.)
Table 1 compares the performance of the different strategies with that of the market portfolio. For zero transaction costs, both the Markowitz portfolio and the reinforcement learning strategy perform better than the market portfolio. Since the market portfolio is never rebalanced, there is no cost associated with holding it even when transaction costs are different from zero (other than the initial investment, no transactions are needed to implement this passive management strategy). In the presence of non-zero transaction costs, the performance of the Markowitz portfolio quickly deteriorates: only for small transaction costs (1%) does it remain marginally better than the market portfolio, in terms of both profit and Sharpe ratio. By contrast, the reinforcement learning strategy improves on the market portfolio even when higher transaction costs are considered (up to 3%). However, for sufficiently high transaction costs (5%), the market portfolio outperforms the dynamic investment strategies considered.
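The relative measures in Table 1 are straightforward to recompute; as a sanity check, the following snippet reproduces the parenthesized values for the 3% transaction-cost row (the variable names are ours):

```python
# Values read off Table 1 for transaction costs of 3%.
market_profit, market_sharpe = 2.9084, 0.4767
rrl_profit, rrl_sharpe = 2.9176, 0.4793

profit_ratio = rrl_profit / market_profit  # ~1.0032 (table: 1.0031, up to rounding)
sharpe_diff = rrl_sharpe - market_sharpe   # ~0.0026 (table: 0.0027, up to rounding)
```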
[Figure 2: four stacked panels of portfolio weights (0 to 1) for the EU, NA and PA indices versus t (months), spanning roughly 200 months; the top panel shows the market weights.]
Figure 2: Evolution of portfolio weights for the market portfolio (top) and for the reinforcement learning systems for different transaction costs (0%, 1% and 3%, from the top down). The figures show how the RRL learners vary their strategies with the level of transaction costs.

From the results obtained, several important observations can be made. As anticipated, the policy learned in the absence of transaction costs involves a large amount of portfolio rebalancing. At a given time, the investment is concentrated in the index that has had the best performance in the recent past. The switching observed in the portfolio weights is clearly undesirable in real markets, where transaction costs make this type of behavior suboptimal. By contrast, the policies learned by the RRL system when transaction costs are considered are smoother and require much less rebalancing (the turnover sketch below makes this difference concrete). Furthermore, the portfolios selected are well-diversified, which is in agreement with good financial practice. The use of the current portfolio composition as a reference in the reinforcement learning architecture considered (Fig. 1) is crucial for the identification of robust investment policies in the presence of transaction costs.

Current work includes extending the empirical investigation of the learning capabilities and limitations of the RRL system under different conditions. In particular, it is important to analyze its performance in the presence of correlations, autoregressive structure or heteroskedasticity in the series of asset prices. Furthermore, the reinforcement learning system is being extensively tested on different financial data, and its performance compared with alternative investment strategies [11, 12]. Finally, it is also necessary to consider the possibility of investing in a risk-free asset, so that strong decreases in profit can be avoided during periods in which all the portfolio constituents lose value.
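To make the rebalancing argument concrete, portfolio turnover, i.e. the cumulative L1 change in weights, which a proportional cost rate multiplies, can be compared between a switching policy and a smooth one. A minimal sketch with hypothetical weight paths over three indices (the paths and numbers below are illustrative, not taken from the experiments):

```python
import numpy as np

def total_turnover(W):
    """Sum of L1 weight changes along a path of portfolio weights W
    with shape (T, n_assets); proportional transaction costs amount
    to delta times this quantity."""
    return np.abs(np.diff(W, axis=0)).sum()

# A switching policy (all-in on the recent winner) versus a smooth,
# diversified one, over four months and three indices.
W_switch = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])
W_smooth = np.array([[.40, .30, .30], [.35, .35, .30], [.30, .40, .30], [.30, .35, .35]])
print(total_turnover(W_switch))  # 6.0
print(total_turnover(W_smooth))  # 0.3
```

Under a proportional cost rate delta, the switching path pays roughly twenty times more in costs than the smooth path, which is the effect that a value $\gamma > 0$ exploits.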
References
[1] Harry Markowitz. Portfolio selection. Journal of Finance, 7(1):77–91, 1952.
[2] Paul A. Samuelson. Lifetime portfolio selection by dynamic stochastic programming. The Review of Economics and Statistics, 51(3):239–246, 1969.
[3] J. M. Mulvey and H. Vladimirou. Stochastic network programming for financial planning problems. Management Science, 38:1642–1664, 1992.
[4] F. Glover, J. M. Mulvey, and K. Hoyland. Solving dynamic stochastic control problems in finance using tabu search with variable scaling. In I. H. Osman and J. P. Kelly, editors, Meta-Heuristics: Theory and Applications, pages 429–448. Kluwer Academic Publishers, 1996.
[5] Ralph Neuneier. Optimal asset allocation using adaptive dynamic programming. In David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 952–958. The MIT Press, 1996.
[6] John Moody, Lizhong Wu, Yuansong Liao, and Matthew Saffell. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17(1):441–470, 1998.
[7] John Moody and Matthew Saffell. Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4):875–889, 2001.
[8] J. B. Detemple, R. Garcia, and M. Rindisbacher. A Monte Carlo method for optimal portfolios. The Journal of Finance, 58(1):401–446, 2003.
[9] Michael W. Brandt, Amit Goyal, Pedro Santa-Clara, and Jonathan R. Stroud. A simulation approach to dynamic portfolio choice with an application to learning about return predictability. Review of Financial Studies, 18(3):831–873, 2005.
[10] MSCI Inc. http://www.mscibarra.com/products/indices/equity/.
[11] Allan Borodin, Ran El-Yaniv, and Vincent Gogan. Can we learn to beat the best stock. Journal of Artificial Intelligence Research, 21:579–594, 2004.
[12] Amit Agarwal, Elad Hazan, Satyen Kale, and Robert E. Schapire. Algorithms for portfolio management based on the Newton method. In Proceedings of the 23rd International Conference on Machine Learning, ICML 2006, pages 9–16, New York, NY, USA, 2006. ACM.