Portfolio Optimization With Return Prediction Using Deep Learning and Machine Learning


Expert Systems With Applications 165 (2021) 113973

journal homepage: www.elsevier.com/locate/eswa

Portfolio optimization with return prediction using deep learning and machine learning
Yilin Ma, Ruizhu Han ∗, Weizhong Wang
School of Economics and Management, Southeast University, 2 Southeast University Road, Jiangning District, Nanjing, 211189, China

ARTICLE INFO

Keywords: Financial trading; Return prediction; Portfolio optimization; Deep learning; Machine learning

ABSTRACT

Integrating the return predictions of traditional time series models into portfolio formation can improve the performance of the original portfolio optimization model. Since machine learning and deep learning models have shown clear superiority over time series models, this paper combines return prediction in portfolio formation with two machine learning models, i.e., random forest (RF) and support vector regression (SVR), and three deep learning models, i.e., LSTM neural network, deep multilayer perceptron (DMLP) and convolutional neural network (CNN). Specifically, this paper first applies these prediction models for stock preselection before portfolio formation. Then, it incorporates their predictive results to extend the mean–variance (MV) and omega portfolio optimization models. To demonstrate the superiority of these models, portfolio models using the return predictions of the autoregressive integrated moving average (ARIMA) model are used as benchmarks. Evaluation is based on 9 years of historical data, from 2007 to 2015, of the component stocks of the China Securities 100 Index. Experimental results show that the MV and omega models with RF return prediction, i.e., RF+MVF and RF+OF, outperform the other models. Further, RF+MVF is superior to RF+OF. Because of the high turnover of these two models, this paper also discusses their performance after deducting the transaction fees caused by turnover. Experiments show that RF+MVF still performs best among the MVF models, and that the omega model with SVR prediction (SVR+OF) performs best among the OF models. Moreover, RF+MVF performs better than SVR+OF, and high turnover erodes nearly half of the total returns, especially for RF+OF and RF+MVF. Therefore, this paper recommends that investors build MVF with RF return prediction for daily trading investment.

∗ Corresponding author.
E-mail addresses: [email protected] (Y. Ma), [email protected] (R. Han), [email protected] (W. Wang).

https://doi.org/10.1016/j.eswa.2020.113973
Received 4 June 2020; Received in revised form 1 September 2020; Accepted 3 September 2020
Available online 5 September 2020
0957-4174/© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

Stock market prediction is a challenging time series prediction problem, since the stock market is essentially a nonlinear, dynamic, noisy and chaotic system (Deboeck, 1994). In fact, stock prices are influenced by many factors, such as political events, company policies and news, economic conditions, interest rates and investor sentiment (Wang, Wang, Zhang, & Guo, 2011). Recently, many researchers have applied different kinds of machine learning models to stock market prediction and obtained satisfying results, for example support vector regression (SVR) (Emir, 2013; Lu, Lee, & Chiu, 2009; Matías & Reboredo, 2012; Rasel, Sultana, & Meesad, 2015) and random forest (RF) (Ballings, Poel, Hespeels, & Gryp, 2015; Patel, Shah, & Thakkar, 2015). Artificial neural networks (ANNs), as the core of deep learning technology, have also been widely used for stock market prediction (Chong, Han, & Park, 2017; Fischer & Krauss, 2018; Krauss, Do, & Huck, 2017; Oliveira, Cortez, & Areal, 2017; Pang, Zhou, Wang, Lin, & Chang, 2018; Sezer & Ozbayoglu, 2018; Singh & Srivastava, 2017; Zhang & Wu, 2009). Among deep learning technologies, the LSTM neural network, the deep multilayer perceptron (DMLP) and the convolutional neural network (CNN) are frequently used in financial time series forecasting (Sezer, Gudelek, & Ozbayoglu, 2020).

Given the superiority of machine learning and deep learning models in stock market prediction, many researchers apply these models in the stock preselection process before portfolio formation and obtain satisfying results (Huang, 2012; Krauss et al., 2017; Paiva, Cardoso, Hanaoka, & Duarte, 2019; Ta, Liu, & Addis, 2018; Ta, Liu, & Tadesse, 2020; Wang, Li, Zhang, & Liu, 2020). Indeed, high-quality stock preselection is crucial for the success of portfolio management (Wang et al., 2020). In the stock market, individual investors usually try to estimate the future returns of the stocks they invest in and then determine the optimal weight of each stock to build a portfolio (Zhang, Li, & Guo, 2018). Thus, after the stock preselection process, investors still need to calculate the optimal investment weight of each selected stock before trading. This procedure is mainly based on modern portfolio theory (Chen, Zhong, & Chen, 2020; Paiva et al., 2019; Wang et al., 2020). Modern portfolio theory contains different models for calculating the optimal portfolio weight of each asset. Given a set of assets, portfolio models optimize one or more objective functions under different constraints. By solving the portfolio optimization problem, the optimal investment weight of each asset is obtained.

The Markowitz mean–variance (MV) model, the starting point of modern portfolio theory, builds a portfolio optimization model by simultaneously maximizing the portfolio's expected return and minimizing its investment risk (Fabozzi, Gupta, & Markowitz, 2002). This model forms an efficient frontier, i.e., the set of portfolios that minimize total risk for each predetermined level of expected return; for each level of expected return, the efficient frontier thus gives the optimal investment strategy (Deng, Lin, & Lo, 2012). However, the MV model has many limitations in practical applications, such as restrictive hypotheses and high computational complexity for large numbers of assets. Thus, numerous models have been proposed to address these issues. For instance, Konno and Yamazaki (1991) develop the mean–absolute deviation (MAD) model, which uses absolute deviation instead of variance as the risk metric. Alexander and Baptista (2002) propose a mean–value at risk (VaR) model by combining the MV model with VaR, where VaR estimates the tolerated loss at a given confidence level (Jorion, 1997). Since the VaR metric generates many local minima and does not satisfy subadditivity in managing portfolio risk, Rockafellar and Uryasev (2000) propose the conditional value at risk (CVaR) model to overcome the limitations of VaR. Kapsos, Christofides, and Rustem (2014) use the omega model for portfolio optimization. Based on the asymmetric distribution of returns, this model maximizes the Omega ratio, which weighs the portfolio's returns above a critical value against its returns below that value, and thereby avoids the limitations of the Sharpe ratio.

Classic portfolio optimization models often use the mean historical return as the expected return, which acts as a low-pass filter on the stock market's behavior and thus yields inaccurate estimates of future short-term returns (Freitas, Souza, & Almeida, 2009). Also, since short-term stock prices are greatly affected by investor sentiment, it is not reasonable to use the mean historical return as the short-term expected return of an individual stock. Therefore, stock return prediction should be combined with portfolio optimization models in financial investment (Kolm, Tütüncü, & Fabozzi, 2014). In this regard, many scholars use the predicted return as the expected return when building portfolio optimization models (Freitas et al., 2009; Hao, Wang, & Xu, 2013; Zhu, 2013). Moreover, some studies incorporate more predictive results into the objective functions of portfolio optimization models and further improve the performance of the original models (Ustun & Kasimbeyli, 2012; Yu, Chiou, Lee, & Lin, 2020).

Specifically, Ustun and Kasimbeyli (2012) propose a generalized framework for incorporating stock forecasts into portfolio optimization. They develop a portfolio optimization model with eleven objective functions based on the mean–variance–skewness model. Subsequently, Yu et al. (2020) adopt this framework and incorporate the return forecasts of the autoregressive integrated moving average (ARIMA) model into the mean–variance (MV), mean absolute deviation (MAD), downside risk (DSR), linearized value-at-risk (LVaR), conditional value-at-risk (CVaR) and omega models. Experimental results show that these advanced portfolio optimization models can further improve the performance of the original models, and that the extended MV and omega models perform better than the other models. However, the ARIMA model is mainly based on the hypotheses of linearity and normal distribution, which may not be satisfied by stock return series. Machine learning models, which do not need these restrictive hypotheses, have shown better performance than the ARIMA model (Adebiyi, Adewumi, & Ayo, 2014; Hansen & Nelson, 2002). Also, deep learning models, as novel machine learning technologies, have shown promising performance in stock market prediction (Hiransha, Gopalakrishnan, Menon, & Soman, 2018; Long, Lu, & Cui, 2018; Moews, Herrmann, & Ibikunle, 2019). Thus, it is interesting and necessary to investigate incorporating the return predictions of machine learning and deep learning models into classic portfolio optimization models. As far as we know, no existing research focuses on this problem. Also, since RF, SVR, DMLP, LSTM neural network and CNN are frequently used and perform well in stock prediction, this paper extends portfolio optimization models with these models' predictive results following the framework of Yu et al. (2020), which makes full use of their advantages in return prediction.

The main purpose of this paper is to study the performance of portfolio optimization models extended with the return predictions of machine learning and deep learning models. In this regard, this study makes two contributions to fill gaps in existing research. First, this paper studies the performance of two machine learning models (i.e., SVR and RF) and three deep learning models (i.e., DMLP, LSTM neural network and CNN) in the stock preselection process before portfolio formation. These models clearly outperform traditional time series models, which helps ensure that high-quality stocks are selected before the portfolio optimization models are built. Also, these models need few hypotheses, which makes them more suitable for practical applications. Second, this paper incorporates the predictive results of these models into the classic MV and omega portfolio optimization models for the first time. These advanced portfolio optimization models not only have the advantages of machine learning and deep learning models in return prediction, but also retain the essence of the classical MV and omega models in portfolio optimization. Thus, these models can further improve the out-of-sample performance of existing models. In addition, this study uses the component stocks of the China Securities 100 Index as the asset universe. The study covers data from 2007 to 2015 and uses the last four years to measure the performance of the proposed investment models.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the models used. Section 4 presents the detailed experimental process. Section 5 shows the experimental results. Finally, Section 6 draws conclusions.

2. Literature review

There are many works on stock trading investment using different kinds of models. This paper only presents some related studies.

Huang (2012) developed a stock selection model using SVR and genetic algorithms (GAs). This model applied SVR to predict the future return of each stock, while a GA was employed to optimize the model parameters and input features. Then, the top-ranked stocks were equally weighted to build a portfolio. Empirical results showed that the proposed model performed better than the benchmarks. Krauss et al. (2017) analyzed the effectiveness of deep neural networks (DNN), gradient-boosted trees, RF and several ensembles of these methods in the context of statistical arbitrage. Each model was trained on lagged returns of all stocks in the S&P 500 after eliminating survivor bias. Experiments showed that the simple equal-weighted ensemble could generate satisfying results. Lee and Yoo (2018) compared three kinds of recurrent neural networks for stock return prediction, i.e., the simple recurrent neural network, the gated recurrent unit and the long short-term memory (LSTM) neural network. The experimental results showed that the LSTM neural network performed best among these models. They also built threshold-based portfolios from the predictions of the LSTM neural network. This approach was more data-driven than existing portfolio design methods, and experimental results showed that the resulting portfolio earned promising returns. Fischer and Krauss (2018) deployed LSTM neural networks to predict the directional movements of the constituent stocks of the S&P 500 from 1992 until 2015. They found that the LSTM-based portfolio outperformed portfolios based on memory-free classification methods, i.e., RF, DNN and logistic regression (LR).
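
Several of the studies above (e.g., Huang, 2012) form the final portfolio by ranking stocks on their predicted returns and weighting the top-ranked stocks equally. The Python sketch below illustrates only that selection-and-weighting step, assuming the per-stock return forecasts have already been produced by some prediction model (SVR, LSTM or another). The function name `equal_weight_top_k`, the tickers and the forecast values are hypothetical and chosen for illustration; they are not taken from the cited studies.

```python
from typing import Dict, Sequence


def equal_weight_top_k(
    tickers: Sequence[str],
    predicted_returns: Sequence[float],
    k: int,
) -> Dict[str, float]:
    """Hypothetical helper: keep the k stocks with the highest predicted
    return and give each of them the same weight 1/k.

    predicted_returns[i] is assumed to be a model's forecast for tickers[i];
    how that forecast is produced is outside the scope of this sketch.
    """
    if len(tickers) != len(predicted_returns):
        raise ValueError("tickers and predicted_returns must have the same length")
    if not 1 <= k <= len(tickers):
        raise ValueError("k must be between 1 and the number of stocks")
    # Rank stock indices by predicted return, highest first.
    # Python's sort stays stable with reverse=True, so ties keep input order.
    ranked = sorted(range(len(tickers)), key=lambda i: predicted_returns[i], reverse=True)
    # Each top-k stock gets weight 1/k; every other stock is left out.
    return {tickers[i]: 1.0 / k for i in ranked[:k]}


# Illustrative forecasts only (not results from the paper).
weights = equal_weight_top_k(["A", "B", "C", "D"], [0.012, -0.004, 0.020, 0.007], k=2)
print(weights)  # {'C': 0.5, 'A': 0.5}
```

Every selected stock receives the same weight 1/k regardless of its volatility or its correlation with the other selections, so this rule does not take portfolio risk into account.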

2
Y. Ma et al. Expert Systems With Applications 165 (2021) 113973

The common shortcoming of above models is that these models only Table 1
Parameters of DMLP.
apply simple method to build their portfolio, such as equal weighted
Parameter Value
and threshold based method. These portfolio construction methods do
not analyze the risk of each stock, which unbalances the portfolio’s Hidden nodes 5,10,15,20,25,30
Hidden layers 1,2,3, . . . ,10
expected return and risk.
Learning rate 0.0001, 0.001, 0.01, 0.1
Lin, Huang, Gen, and Tzeng (2006) proposed a dynamic portfolio Patient 0,5,10
optimization model. This model used Elman neural network to predict Batch size 50,100,200
future stock return, and then applied covariance matrix to measure risk. Loss function Mean absolute error
Optimizer SGD, RMSprop, Adam
Experimental results presented that this model outperformed vector
autoregression model in dynamic portfolio selection models. Alizadeh,
Rada, Jolai, and Fotoohi (2011) developed a portfolio optimization
model by using adaptive neuro-fuzzy inference system for portfolio return errors to form eleven objective functions, which comprehen-
return prediction and variance index for risk assessment. Experimental sively used predictive results in building portfolio optimization models.
results showed that this portfolio optimization model performed better Following on this approach, Yu et al. (2020) combined ARIMA model’s
than MV model, neural network and Sugeno–Yasukawa method. This forecasts in advancing six portfolio optimization models (i.e., MV,
research informed us that combining artificial intelligence technique MAD, DSR, LVaR, CVaR and omega models). They first used ARIMA
with modern portfolio optimization could generate better performance model to predict future stock return, then applied the predictive results
than each single model for trading investment. Deng and Min (2013) in extending these portfolio optimization models. Experimental results
applied linear regression model with ten variables to select stocks showed that the advanced portfolio optimization models with ARIMA
from US and global equities, and built portfolio based on MV model prediction outperformed single portfolio models, and extended MV and
with some practical constraints of risk tolerance, tracking error and omega models with ARIMA prediction performed the best among these
turnover. They found that the risk adjusted return of the proposed models.
model of global equity universe performed better than the domestic These models show us a promising direction to combine stock return
equity universe, and the portfolio return increased with the systematic prediction with portfolio optimization models, which fully applies the
tracking error and risk tolerance. Paiva et al. (2019) proposed a unique advantages of stock forecasts in building portfolio optimization models.
decision-making model for day trading investments on stock market, Therefore, this paper tries to follow this step to further research the
which was developed using a fusion approach of SVM and MV model performance of combining machine learning and deep learning models
for portfolio selection. The proposed model was compared with two with portfolio optimization models.
other models, i.e., SVM+ 1/N and Random+ MV. The experimental
evaluation was based on assets from Ibovespa stock market, which 3. Models
showed the proposed model performed the best. Wang et al. (2020)
developed a portfolio model by using LSTM neural network for stock This section first introduces the DMLP, LSTM neural network, CNN,
selection and MV for portfolio optimization. In this model, LSTM neural SVM and RF models, then their applied parameters are displayed. Also,
network first selected k stocks from total stock set, then the chosen the classical MV model and omega model are clarified.
k stocks are used to build MV portfolio model. They compared LSTM
3.1. Deep multilayer perceptron (DMLP)
neural network with SVM, RF and ARIMA model in stock selection
process, then used MV for portfolio optimization. Experiments’ result
DMLP is a classic ANN, which is different from multilayer percep-
showed that their proposed model outperformed the others. Ta et al.
tron (MLP) since it contains more hidden layers. Although in terms
(2020) built portfolios by using LSTM neural network and three portfo-
of mapping abilities, MLP is believed to be capable of approximating
lio optimization techniques, i.e., equal weighted method, Monte Carlo
arbitrary functions (Principe, Euliano, & Lefebvre, 1999), DMLP usually
simulation and MV model. Also, they applied linear regression and SVM
performs better than MLP with few hidden layers in practice (Ori-
as comparisons in stock selection process. Experimental results showed
moloye, Sung, Ma, & Johnson, 2020; Singh & Srivastava, 2017). DMLP
that LSTM neural network owned higher predictive accuracy than
model contains three parts, i.e., input layer, hidden layer and output
linear regression and SVM, and its constructed portfolios outperformed
layer. In this paper, stochastic gradient descent is used to train DMLP
the others.
and earlystopping technology is applied to avoid overfitting problem
These models apply different methods for stock selection, then
during the training process. The main hyperparameters of DMLP con-
build portfolio optimization models with selected stocks for trading tain hidden nodes, hidden layers, optimizer, learning rate, activation
investment. These methods show us a promising direction to build function, loss function, batch size and patient. As recommended by Ori-
portfolio models in practice. However, classic portfolio optimization moloye et al. (2020), relu function is adopted as activation function.
models are often inappropriate for short term practical investment. The considered values of the other hyperparameters are presented in
Thus, it is important to explore more efficient approach to combine Table 1. Grid research is used to discover the optimal hyperparameter.
return predictive results with portfolio optimization models. After many attempts, the specified topology of DMLP model is
Freitas et al. (2009) proposed a prediction-based portfolio optimiza- discovered. DMLP with 15 nodes each hidden layer, 6 hidden layers,
tion model by using autoregressive moving reference neural network 0.01 as learning rate, 0 as patient, 100 as batch size and Adam as
(AR-MRNN). This model first used AR-MRNN to predict future stock optimizer performs the best. Thus, this paper uses this DMLP model
return, then built portfolio optimization model using predictive results for stock return prediction.
of AR-MRNN. Experimental results showed that this model outper-
formed original portfolio optimization model and beat the market 3.2. Long short term memory (LSTM) neural network
index. Then, Hao et al. (2013) developed a similar portfolio optimiza-
tion model by using SVR. They compared their model with the model LSTM neural network is a kind of recurrent neural network, which
in Freitas et al. (2009). Experimental results showed that their model was proposed to overcome the limitation of recurrent neural network
owned better performance in trading investment. Ustun and Kasimbeyli and retain long term information (Graves & Schmidhuber, 2005). This
(2012) built a generalized approach for building portfolio optimization property is mainly based on the memory cells in hidden layer. LSTM
model using stock return prediction. They built an extended mean– neural network usually consists of input layer, hidden layer and output
variance–skewness model by using predictive returns and predictive layer. This paper uses stochastic gradient descent to train LSTM neural

3
Y. Ma et al. Expert Systems With Applications 165 (2021) 113973

Table 2 Table 4
Parameters of LSTM neural network. Parameters of RF.
Parameter Value Parameter Value
Hidden nodes 5,10,15,20,25,30 Max-depth 5,10,15,20,25,30
Hidden layers 1,2,3, . . . ,10 Min-samples-split 2,5,10,15,20,25,30
Learning rate 0.0001,0.001,0.01,0.1 Min-samples-leaf 1,5,10,15,20,25,30
Patient 0,5,10 Max-features 10,20,30,40,50
Batch size 50,100,200
Dropout rate 0.1, 0.2, . . . ,0.5
Recurrent dropout rate 0.1, 0.2, . . . ,0.5 Table 5
Loss function Mean absolute error Parameters of SVR.
Optimizer SGD, RMSprop, Adam Parameter Value
𝐶 20 , 21 , … , 25
Table 3 𝛾 2−5 , 2−4 , … , 20
Parameters of CNN.
Parameter Value
Filter numbers 2,4,8,16,32,64 patient, batch size, activation function and optimizer are assigned as
Convolutional layers 1,2,3, . . . ,10
0.001, 0, 100, relu and SGD respectively. This paper applies this CNN
Maxpooling layers 1,2,3, . . . ,10
Fully connected layers 1,2,3,4,5 model for stock return prediction in the following.
Fully connected layer nodes 2,4,8,16,32,64
Learning rate 0.0001,0.001,0.01 3.4. Random forest (RF)
Patient 0,5,10
Batch size 50,100,200
RF is a nonparametric and nonlinear model, which was first pro-
Activation function relu, tanh
Loss function Mean absolute error posed by Ho (1995). This model avoids the overfitting problem since it
Optimizer SGD, RMSprop, Adam always converges (Breiman, 2001). Due to the advantages of RF, it is
often used for stock prediction (Ballings et al., 2015; Booth, Gerding, &
Mcgroarty, 2014; Qin, 2014). The main parameters of RF are number of
decision tree, the maximum depth of the tree (max-depth for short), the
network and earlystopping technology is applied to reduce overfitting.
minimum number of samples required to split an internal node (min-
The considered hyperparameters of LSTM neural network contain hid-
samples-split for short), the minimum number of samples needed to be
den nodes, hidden layers, learning rate, batch size, patient, dropout
at a leaf node (min-samples-leaf for short) and the number of features to
rate, recurrent dropout rate, activation function, optimizer and loss
consider when looking for the best split (max-features for short). In this
function. According to Orimoloye et al. (2020), relu function is used
paper, the number of decision tree is set to 500 according to Breiman
as activation function. The investigated values of the other hyperpa- (2001). The considered values of the other parameters are presented in
rameters are presented in Table 2. Grid research is used to determine Table 4. Grid research is used to determine the optimal parameters.
the optimal hyperparameter. After many experiments, this paper sets max-depth, min-samples-
By trial and error, the optimal hyperparameters of LSTM neural split, min-samples-leaf and max-features as 20, 10, 10 and 40. Thus,
network are determined. The topology of LSTM neural network consists this paper uses this RF model for stock return prediction.
of 4 hidden layers and each layer contain 5 nodes. And, 0.01, 0, 100,
0.4, 0.3 and RMSprop are set for learning rate, patient, batch size, 3.5. Support vector regression (SVR)
dropout rate, recurrent dropout rate and optimizer respectively. In the
following, this paper uses this LSTM neural network model for return SVR is an classical machine learning model which has been widely
prediction. applied in stock market prediction (Emir, 2013; Lu et al., 2009; Matías
& Reboredo, 2012; Rasel et al., 2015). SVR uses Vapnik’s Structural Risk
3.3. Convolutional neural network (CNN) Minimization (SRM) principle to resolve different regression problems.
SVR originates from statistical learning theory, which is applied for how
CNN is a novel ANN, which was introduced by LeCun and Bengio to regulate generalization and discover the optimal trade off between
(1995). CNN is often used in computer vision and image process, complexity of model structure and empirical risk.
and obtains satisfying performance (Ji, Xu, Yang, & Yu, 2012; Long, In this paper, radial basis function is used as the kernel function of
Shelhamer, & Darrell, 2015). Recent years, some researchers have SVR, which is given as follows:
applied CNN in stock price prediction and generated promising re-
𝐾(𝑥𝑖 , 𝑥𝑗 ) = exp(−𝛾‖𝑥𝑖 − 𝑥𝑗 ‖2 ) (1)
sults (Hoseinzade & Haratizadeh, 2019; Sezer & Ozbayoglu, 2018).
CNN typically consists of many successive convolutional layers and where 𝛾 is the constant of radial basis function.
pooling layers, then followed by some fully connected layers. Since The parameters of SVR are composed of the regularization param-
stock return is time series, this paper applies one dimensional (1D) eter (𝐶) and 𝛾. Table 5 presents the used value of these parameters.
CNN for stock return prediction. Similarly, stochastic gradient descent Grid research is used to discover their optimal values in each training
method is used to train CNN and earlystopping technology is used process.
to avoid the overfitting problem in this paper. The hyperparameters
of CNN in this paper contain filter numbers, convolutional layers, 3.6. Autoregressive integrated moving average (ARIMA) model
maxpooling layers, fully connected layers, fully connected layer nodes,
learning rate, patient, batch size, activation function, optimizer and ARIMA is a classical statistical model, which is often used in stock
loss function. Their considered values are presented in Table 3. Grid prediction. The ARIMA model can be presented as follows:
research is applied to discover the optimal hyperparameter. ∑
𝑝 ∑
𝑞
After multiple trial and error, the topology of CNN is determined. (1 − 𝜙𝑖 𝐿𝑖 )(1 − 𝐿𝑖 )𝑑 𝑟𝑡 = 𝛿 + (1 − 𝜃𝑖 𝐿𝑖 )𝜀𝑡 (2)
The input layer is followed with one 1D convolutional layer (2 fil- 𝑖=1 𝑖=1

ters with 2 × 1 size), one 1D maxpooling layer (2 × 1 size), three where 𝑝, 𝑑, 𝑞, 𝐿, 𝜙𝑖 , 𝜃𝑖 and 𝜀𝑡 represent the number of autoregressive
fully connected layers (2 nodes) and output layer. And, learning rate, terms, the number of difference times, the number of moving average

4
Y. Ma et al. Expert Systems With Applications 165 (2021) 113973

terms, lag operator, autoregressive parameter, moving average param- where 𝜏 denotes the threshold of dividing returns into expected (rev-
eter and error term respectively (Yu et al., 2020). 𝑝, 𝑑, 𝑞 need to be enue) and unexpected (loss), and it is often decided by investors. 𝑦𝑖
determined before using ARIMA model. Since ARIMA model needs means random return of asset 𝑖. Since Omega ratio needs the probability
some restrict hypotheses such as stationarity test, this paper set the distribution of asset returns, the obtained solution turns into biased and
values of 𝑝, 𝑑, 𝑞 before each training process. overoptimistic when this probability distribution is imprecise (Kapsos
In this paper, we uses ARIMA model as benchmark in stock re- et al., 2014). Thus, Kapsos et al. (2014) introduce worst-case omega
turn prediction process. Also, its predictive results are combined in ratio (WCOR) to solve this problem, and modify omega model as
advancing MV and omega portfolio optimization models for further follows.
comparisons in trading simulation.
max 𝜓 (12)
3.7. Mean–variance with forecasting (MVF) model Subject to
𝑖
Markowitz (1959) as the forerunner of modern finance theory in- ∑𝑛
1 ∑ 𝑖
𝑇

troduced mean–variance (MV) model, which presented a mathematical 𝛿( 𝑥𝑗 𝑟̄𝑖𝑗 − 𝜏) − (1 − 𝛿) 𝑖 𝜂 ≥𝜓 (13)


𝑗=1
𝑇 𝑡=1 𝑡
solution to settle the trade-off between expected return maximization
and risk minimization. Following the frame proposed by Yu et al. ∑
𝑛
𝜂𝑡𝑖 ≥ − 𝑥𝑗 𝑟̄𝑖𝑗 + 𝜏 (14)
(2020), this paper combines the return predictive results in advancing 𝑗=1
MV model for building the MVF model.
𝜂𝑡𝑖 ≥ 0 (15)
The MVF model is actually a multi-objective optimization problem.
The following equations present the MVF model. ∑
𝑛
𝑥𝑗 = 1 (16)

𝑛 𝑗=1
min 𝑥𝑖 𝑥𝑗 𝜎𝑖𝑗 (3)
𝑖,𝑗=1 0 ≤ 𝑥𝑖 ≤ 1 (17)
∑ 𝑛
𝑡 = 1, 2, … , 𝑇 𝑖
𝑖 = 1, 2, … , 𝑙 𝑗 = 1, 2, … , 𝑛 (18)
max 𝑥𝑖 𝑟̂𝑖 (4)
𝑖=1 where 𝑥𝑖 means the proportion of asset 𝑖 in the portfolio, is an 𝜂𝑡𝑖

𝑛
auxiliary variable used to linearize this portfolio model, 𝛿 denotes the
max 𝑥𝑖 𝜀̄ 𝑖 (5)
risk return preference of this model, 𝑇 𝑖 means the sample period of the
𝑖=1
𝑖th distribution, 𝑟̄𝑖𝑗 represents the sample period’s mean return of the
Subject to
𝑖th distribution, 𝑙 is the number of distributions and 𝑛 is the number

𝑛
of assets in portfolio. This paper sets 𝛿, 𝑇 𝑖 , 𝜏, 𝑙 as 0.5, 20, 0 and 1
𝑥𝑖 = 1 (6)
respectively according to Yu et al. (2020).
$$0 \le x_i \le 1, \quad i = 1, 2, \ldots, n \qquad (7)$$

where $x_i$ is the proportion of asset $i$ in the portfolio, $n$ is the number of assets in the portfolio, $\sigma_{ij}$ is the covariance of assets $i$ and $j$, $\hat{r}_i$ denotes the predicted return of asset $i$, and $\bar{\varepsilon}_i$ represents the average predictive error of asset $i$ over the sample period. Following Yu et al. (2020), this paper sets the sample period to 20 trading days when building the MVF model. In other words, $\hat{r}_i$ is the predicted return of asset $i$ at time $t$, and $\bar{\varepsilon}_i$ is the average predictive error of asset $i$ over the past 20 trading days, i.e., times $t, t-1, \ldots, t-19$. The predictive error of asset $i$ at time $t$ is $\varepsilon_i = r_i - \hat{r}_i$, where $r_i$ is the actual return of asset $i$. Eqs. (4)–(5) maximize the expected portfolio return and the sample period's abnormal return, respectively.

The equal-weighted method is often used to convert the above multi-objective portfolio optimization into a single-objective model (Yu et al., 2020). The MVF model therefore takes the following form:

$$\min \; \sum_{i,j=1}^{n} x_i x_j \sigma_{ij} - \sum_{i=1}^{n} x_i \hat{r}_i - \sum_{i=1}^{n} x_i \bar{\varepsilon}_i \qquad (8)$$

Subject to

$$\sum_{i=1}^{n} x_i = 1 \qquad (9)$$

$$0 \le x_i \le 1, \quad i = 1, 2, \ldots, n \qquad (10)$$

3.8. Omega with forecasting (OF) model

The Omega ratio was first introduced by Keating and Shadwick (2002). It has since been widely used to build portfolios because it avoids the known limitations of the Sharpe ratio (Gilli, Schumann, di Tollo, & Cabej, 2011; Kane, Bartholomew-Biggs, Cross, & Dewar, 2009; Kapsos et al., 2014). The Omega ratio is defined as follows:

$$w = \frac{E(y_i) - \tau}{E[\tau - y_i]^{+}} + 1 \qquad (11)$$

The return predictions can then be combined with this model to build the OF model in the same way as the MVF model:

$$\max \; \psi, \qquad \max \; \sum_{i=1}^{n} x_i \hat{r}_i, \qquad \max \; \sum_{i=1}^{n} x_i \bar{\varepsilon}_i \qquad (19)$$

Subject to

$$0.5\left(\sum_{i=1}^{n} x_i \bar{r}_i\right) - \frac{0.5}{20}\sum_{t=1}^{20} \eta_t \ge \psi \qquad (20)$$

$$\eta_t \ge -\sum_{i=1}^{n} x_i \bar{r}_i \qquad (21)$$

$$\eta_t \ge 0 \qquad (22)$$

$$\sum_{i=1}^{n} x_i = 1 \qquad (23)$$

$$0 \le x_i \le 1 \qquad (24)$$

$$t = 1, 2, \ldots, 20, \quad i = 1, 2, \ldots, n \qquad (25)$$

In the same way, we convert this multi-objective optimization model into a single-objective model (Yu et al., 2020):

$$\min \; -\psi - \sum_{i=1}^{n} x_i \hat{r}_i - \sum_{i=1}^{n} x_i \bar{\varepsilon}_i \qquad (26)$$

Subject to

$$0.5\left(\sum_{i=1}^{n} x_i \bar{r}_i\right) - \frac{0.5}{20}\sum_{t=1}^{20} \eta_t \ge \psi \qquad (27)$$

$$\eta_t \ge -\sum_{i=1}^{n} x_i \bar{r}_i \qquad (28)$$


$$\eta_t \ge 0 \qquad (29)$$

$$\sum_{i=1}^{n} x_i = 1 \qquad (30)$$

$$0 \le x_i \le 1 \qquad (31)$$

$$t = 1, 2, \ldots, 20, \quad i = 1, 2, \ldots, n \qquad (32)$$

4. Experimental process

To evaluate the proposed methods, this paper uses historical data for the China Securities 100 Index component stocks as the experimental data set. The China Securities 100 Index is drawn from the sample stocks of the Shanghai and Shenzhen 300 Index and reflects the overall situation of the most influential large-capitalization companies in the Shanghai and Shenzhen stock markets. The experimental data run from January 4, 2007 to December 31, 2015. After removing stocks that were unlisted during this period or halted for a long time, 49 China Securities 100 Index component stocks remain. They are listed in Table 6.

Table 6
Selected stocks' tickers.
000001 000002 000063 000069 000538 000625
000651 000725 000858 000895 002024 300059
600000 600010 600011 600015 600016 600018
600019 600028 600030 600031 600036 600048
600050 600104 600111 600115 600150 600276
600340 600372 600398 600485 600518 600519
600585 600637 600690 600795 600837 600886
600887 600893 600900 601006 601111 601398
601988

For each stock, the daily returns of the past 60 days are used as input features to predict the next day's return. The fluctuation ranges of the input features differ markedly, so the features are processed before model training. Each feature series $\{d_i\}$ is first clipped as follows. Values inside the band are left unchanged:

$$d_i = \begin{cases} d_m + 5 d_{mm}, & \text{if } d_i \ge d_m + 5 d_{mm}, \\ d_m - 5 d_{mm}, & \text{if } d_i \le d_m - 5 d_{mm}, \end{cases} \qquad (33)$$

where $d_m$ and $d_{mm}$ are the medians of the series $\{d_i\}$ and $\{|d_i - d_m|\}$, respectively. Finally, each processed input feature is standardized to unify the fluctuation ranges before model training:

$$\hat{d}_i = \frac{d_i - \mu}{\sigma} \qquad (34)$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the series $\{d_i\}$, respectively.

The full data set covers 9 years. The experiment uses a sliding window. The first 4 years of data form the training set, the following year forms the validation set, and the year after that is used to test the models. The window is moved forward one year at a time, so the last four years of data (2012–2015) are used to test each model. The DMLP, LSTM neural network and CNN models are implemented with the Keras deep learning package. The SVR and RF models are implemented with the Scikit-learn machine learning package.

5. Results

This section first presents the predictive results of the different models over the whole test period. It then runs a trading simulation to compare the different MVF and OF models for daily trading investment without transaction fees. Finally, it presents the performance of these models when transaction fees are taken into account.

5.1. Stock return prediction

This paper applies five metrics to measure the performance of the different models in stock return prediction: mean squared error (MSE), mean absolute error (MAE), $HR$, $HR^{+}$ and $HR^{-}$. These metrics show predictive ability clearly, which is why they are widely used (Freitas et al., 2009; Gandhmal & Kumar, 2019; Wang et al., 2020). They are defined as follows:

$$MSE = \frac{1}{N}\sum_{t=1}^{N}(r_t - \hat{r}_t)^2 \qquad (35)$$

$$MAE = \frac{1}{N}\sum_{t=1}^{N}|r_t - \hat{r}_t| \qquad (36)$$

$$HR = \frac{\mathrm{Count}_{t=1}^{N}(r_t \hat{r}_t > 0)}{\mathrm{Count}_{t=1}^{N}(r_t \hat{r}_t \ne 0)} \qquad (37)$$

$$HR^{+} = \frac{\mathrm{Count}_{t=1}^{N}(r_t > 0 \text{ AND } \hat{r}_t > 0)}{\mathrm{Count}_{t=1}^{N}(\hat{r}_t > 0)} \qquad (38)$$

$$HR^{-} = \frac{\mathrm{Count}_{t=1}^{N}(r_t < 0 \text{ AND } \hat{r}_t < 0)}{\mathrm{Count}_{t=1}^{N}(\hat{r}_t < 0)} \qquad (39)$$

where $r_t$ and $\hat{r}_t$ are the actual and predicted returns at time $t$, respectively, and $N$ is the number of predictions. $HR$ denotes the total hit rate, $HR^{+}$ the accuracy of positive predictions and $HR^{-}$ the accuracy of negative predictions. This paper treats MAE and MSE as the key metrics because they play an important role in building portfolios with return prediction.

Table 7
The predictive performance of different models.
Model Statistic MAE MSE HR HR+ HR−
DMLP mean 2.21×10⁻² 1.04×10⁻³ 48.71% 48.89% 48.89%
DMLP SD 5.77×10⁻³ 5.35×10⁻⁴ 2.48×10⁻² 3.20×10⁻² 3.18×10⁻²
LSTM mean 2.02×10⁻² 8.86×10⁻⁴ 48.46% 48.64% 47.90%
LSTM SD 4.01×10⁻³ 3.20×10⁻⁴ 2.11×10⁻² 3.43×10⁻² 3.07×10⁻²
CNN mean 4.24×10⁻² 2.32×10⁻² 48.39% 48.27% 47.88%
CNN SD 4.79×10⁻² 7.68×10⁻² 2.50×10⁻² 6.62×10⁻² 1.17×10⁻¹
SVR mean 3.10×10⁻² 1.70×10⁻³ 47.94% 52.83% 47.69%
SVR SD 1.20×10⁻² 9.71×10⁻⁴ 2.38×10⁻² 1.47×10⁻¹ 2.36×10⁻²
RF mean 1.85×10⁻² 8.08×10⁻⁴ 48.69% 48.99% 48.28%
RF SD 3.92×10⁻³ 3.02×10⁻⁴ 2.15×10⁻² 3.25×10⁻² 2.35×10⁻²
ARIMA mean 2.94×10⁻² 2.12×10⁻³ 48.05% 48.33% 47.68%
ARIMA SD 7.00×10⁻³ 1.77×10⁻³ 1.96×10⁻² 2.95×10⁻² 2.40×10⁻²
SD means standard deviation.

First, the three deep learning models are compared. Table 7 shows that the LSTM neural network has the lowest MAE and MSE of the three, along with the lowest standard deviations. Second, between the two machine learning models, RF has a lower MAE and MSE than SVR, with much lower standard deviations. Third, RF and the LSTM neural network both have a lower MAE and MSE and smaller standard deviations than the ARIMA model. Both therefore outperform the traditional ARIMA model, which is consistent with the conclusions of Adebiyi et al. (2014) and Hiransha et al. (2018). In addition, the predictive errors of RF and the LSTM neural network (their MAE and MSE values) are lower than those reported in the existing literature (Sadaei, Enayatifar, Lee, & Mahmud, 2016; Wang et al., 2020; Weng, Lu, Wang, Megahed, & Martinez, 2018). Finally, comparing RF with the LSTM neural network: RF has the lowest MSE and MAE of all the models and the smallest standard deviations. Its $HR$ (48.69%) is second only to that of DMLP (48.71%).

In conclusion, RF outperforms the other models in stock return prediction, the LSTM neural network ranks second and DMLP ranks third. The traditional time series model ARIMA has a lower MAE than CNN and SVR, although SVR's MSE is lower than ARIMA's. CNN has the highest predictive error of all the models. This result is similar to the conclusion of Ballings et al. (2015).
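The input construction and preprocessing described in Section 4 are sketched below with NumPy. This covers the 60 lagged daily returns per sample, the median/MAD clipping of Eq. (33) and the standardization of Eq. (34). It is a minimal sketch under stated assumptions, not the authors' code. The function names are hypothetical, and each lag column is treated as one feature series. The clipping and standardization statistics are computed on whatever array is passed in. In practice they would presumably be fitted on the training window only, and using the population standard deviation in Eq. (34) is also an assumption.

```python
import numpy as np


def make_lagged_inputs(returns: np.ndarray, window: int = 60):
    """Pair each run of `window` past daily returns with the next day's return.

    Row j of X is returns[j : j + window]; the target y[j] is returns[j + window].
    """
    n_samples = len(returns) - window
    X = np.stack([returns[j:j + window] for j in range(n_samples)])
    y = returns[window:]
    return X, y


def clip_mad(d: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Eq. (33): clip a series to [d_m - k * d_mm, d_m + k * d_mm].

    d_m is the median of the series and d_mm the median of |d - d_m|;
    values inside the band are returned unchanged.
    """
    d_m = np.median(d)
    d_mm = np.median(np.abs(d - d_m))
    return np.clip(d, d_m - k * d_mm, d_m + k * d_mm)


def standardize(d: np.ndarray) -> np.ndarray:
    """Eq. (34): subtract the mean and divide by the population standard deviation."""
    return (d - d.mean()) / d.std()


def preprocess_features(X: np.ndarray) -> np.ndarray:
    """Apply Eq. (33) and then Eq. (34) to every column (feature series) of X."""
    return np.column_stack([standardize(clip_mad(X[:, c])) for c in range(X.shape[1])])


# Usage on synthetic returns (not the paper's data)
rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0, 0.02, size=300)
X, y = make_lagged_inputs(daily_returns)
X_proc = preprocess_features(X)
print(X.shape, y.shape)  # (240, 60) (240,)
```

Clipping before standardizing keeps a handful of extreme daily moves from inflating $\sigma$. Without clipping, those moves would compress the rest of each feature toward zero.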

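The yearly sliding window of Section 4 can be sketched as follows. Each window has four training years, one validation year and one test year, and it moves forward one year at a time over 2007–2015. The split helper follows the text. The random forest part is only an illustration using scikit-learn's `RandomForestRegressor`. Several details are assumptions, because the hyperparameter search is not described in this section: the `max_depth` grid, the tree count, and selecting among them by validation MSE. All function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rolling_splits(years, train_len=4, val_len=1, test_len=1):
    """Return (train_years, val_years, test_years) windows shifted one year at a time."""
    span = train_len + val_len + test_len
    splits = []
    for start in range(len(years) - span + 1):
        train = years[start:start + train_len]
        val = years[start + train_len:start + train_len + val_len]
        test = years[start + train_len + val_len:start + span]
        splits.append((train, val, test))
    return splits


def fit_rf_window(X, y, sample_years, train_years, val_years,
                  depth_grid=(3, 6, None), seed=0):
    """Fit RF on the training years; keep the max_depth with the lowest validation MSE."""
    train_mask = np.isin(sample_years, train_years)
    val_mask = np.isin(sample_years, val_years)
    best_model, best_mse = None, np.inf
    for depth in depth_grid:
        model = RandomForestRegressor(n_estimators=100, max_depth=depth, random_state=seed)
        model.fit(X[train_mask], y[train_mask])
        mse = np.mean((model.predict(X[val_mask]) - y[val_mask]) ** 2)
        if mse < best_mse:
            best_model, best_mse = model, mse
    return best_model


splits = rolling_splits(list(range(2007, 2016)))
for train, val, test in splits:
    print(train, val, test)
# first line:  [2007, 2008, 2009, 2010] [2011] [2012]
# last line:   [2010, 2011, 2012, 2013] [2014] [2015]
```

Four windows result, and their test years are exactly 2012, 2013, 2014 and 2015. This matches the four-year test period stated above.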

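The prediction metrics of Eqs. (35)–(39) translate directly into array operations. The sketch below computes them for one stock's actual and predicted returns; the function name is hypothetical. How per-stock values are aggregated into the means and standard deviations of Table 7 is not shown here. A hit-rate ratio is undefined (NumPy returns NaN with a warning) when its denominator count is zero, for example when a model never predicts a negative return.

```python
import numpy as np


def prediction_metrics(r, r_hat):
    """Eqs. (35)-(39): error and hit-rate metrics for actual r and predicted r_hat."""
    r = np.asarray(r, dtype=float)
    r_hat = np.asarray(r_hat, dtype=float)
    err = r - r_hat
    prod = r * r_hat
    return {
        "MSE": np.mean(err ** 2),
        "MAE": np.mean(np.abs(err)),
        # share of same-sign pairs among pairs whose product is non-zero
        "HR": np.sum(prod > 0) / np.sum(prod != 0),
        # accuracy of positive calls and of negative calls
        "HR+": np.sum((r > 0) & (r_hat > 0)) / np.sum(r_hat > 0),
        "HR-": np.sum((r < 0) & (r_hat < 0)) / np.sum(r_hat < 0),
    }


m = prediction_metrics([0.01, -0.02, 0.03, -0.01], [0.02, -0.01, -0.01, 0.01])
for name, value in m.items():
    print(name, round(float(value), 6))
# MSE 0.00055, MAE 0.02, HR 0.5, HR+ 0.5, HR- 0.5
```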
The ranking of the prediction models in Table 7 is mainly due to the input features used in this paper, which are limited to past daily returns. The LSTM neural network may outperform RF if more input features are included.

5.2. Model performance without transaction fee

After comparing the predictive performance of the different models, this paper runs a trading simulation to investigate the investment ability of the different portfolios. The simulation mimics the buying and selling behavior of a typical investor. Before each trading day, the investor buys or sells part of each stock to reach that stock's calculated proportion in the portfolio. For simplicity, dividends and taxes are ignored, and neither leverage nor short selling is allowed. Trading costs are also not considered in this section. The trading simulation covers the whole test period, which contains 970 samples.

This paper uses six metrics to evaluate the different portfolio models: excess return, standard deviation, information ratio, total return, maximum drawdown and turnover rate. Excess return is the monthly average excess return, and standard deviation measures the volatility of the monthly excess return. Total return is the whole profit over the test period. Information ratio, maximum drawdown and turnover rate are defined as follows:

$$\text{Information ratio} = \frac{\text{excess return}}{\text{standard deviation}} \qquad (40)$$

$$\text{Maximum drawdown} = \max_{l < s} \frac{Netv_l - Netv_s}{Netv_l} \qquad (41)$$

$$\text{Turnover rate} = \sum_{i=1}^{n} |x_{i,t} - x_{i,t-1}| \qquad (42)$$

where $Netv_l$ is the net value at time $l$, $x_{i,t}$ is the weight of stock $i$ in the portfolio at time $t$, and $n$ is the number of stocks in the portfolio. Excess return, information ratio and total return are used as the core metrics for comparing the models, because together they represent profitability thoroughly.

5.2.1. MVF models

This section presents the experimental results of the different MVF models. RF+MVF denotes the MVF model with RF forecasts, and the other model names are formed in the same way.

Table 8
The performance of different MVF models.
Model ER SD IR TOR MD TUR
DMLP+MVF 9.71% 0.7446 0.1304 7.88% 75.29% 54.21%
LSTM+MVF 46.66% 0.6685 0.6979 59.79% 74.75% 53.04%
CNN+MVF 39.39% 0.8541 0.4612 284.43% 52.99% 124.72%
SVR+MVF 97.34% 1.5395 0.6323 449.53% 69.79% 86.71%
RF+MVF 274.88% 3.3805 0.8131 1780.34% 59.61% 155.36%
ARIMA+MVF 20.70% 0.4202 0.4926 126.05% 64.54% 33.66%
ER, SD, IR, TOR, MD and TUR denote excess return, standard deviation, information ratio, total return, maximum drawdown and turnover rate, respectively.

Table 9
The performance of different OF models.
Model ER SD IR TOR MD TUR
DMLP+OF 14.76% 0.9467 0.1560 −8.42% 77.63% 57.10%
LSTM+OF 120.63% 1.6467 0.7325 56.14% 87.51% 51.91%
CNN+OF 46.68% 0.8330 0.5604 314.43% 51.02% 126.19%
SVR+OF 133.18% 2.0262 0.6573 612.29% 74.36% 79.50%
RF+OF 121.53% 1.3980 0.8693 679.36% 70.42% 149.72%
ARIMA+OF 7.90% 0.3912 0.2020 51.96% 65.78% 40.39%
ER, SD, IR, TOR, MD and TUR denote excess return, standard deviation, information ratio, total return, maximum drawdown and turnover rate, respectively.

Fig. 1. Net value of different MVF models.

First, this paper compares the three MVF models that use deep learning forecasts. In Table 8, LSTM+MVF has the highest excess return and information ratio, and CNN+MVF has the highest total return. CNN+MVF and LSTM+MVF therefore perform better than DMLP+MVF. Next, CNN+MVF and LSTM+MVF are compared with each other. Fig. 1 shows no apparent difference between these models. A Mann–Whitney test on their excess returns gives a $p$-value of 0.216, so the difference is not statistically significant. CNN+MVF and LSTM+MVF each have their own advantages, and neither can simply be ranked above the other.

Second, the two MVF models that use machine learning forecasts are compared. Table 8 shows that RF+MVF has a higher excess return, information ratio and total return than SVR+MVF. Fig. 1 also shows that the net value of RF+MVF is higher than that of SVR+MVF. RF+MVF therefore outperforms SVR+MVF.

Third, CNN+MVF, LSTM+MVF, RF+MVF and ARIMA+MVF are compared. In Table 8, RF+MVF has the highest excess return, information ratio and total return, and Fig. 1 shows that its net value is the largest among these models. RF+MVF is therefore the best choice among the MVF models.

The monthly excess returns of the different MVF models are presented in Figs. 2–5, which show each model's performance month by month. In 2012, RF+MVF and LSTM+MVF have the highest monthly excess returns, with little difference between them. From 2013 to 2015, RF+MVF's excess return is almost always the largest.

Based on this analysis, this paper concludes that the RF+MVF model outperforms the other MVF models. RF is therefore the most suitable predictor for building the MVF portfolio model.

5.2.2. OF models

This section discusses the performance of the different OF models. RF+OF denotes the OF model with RF forecasts, and the other names are formed in the same way.

First, this paper compares the OF models that use deep learning forecasts. Table 9 shows that LSTM+OF has the highest excess return and information ratio, and CNN+OF has the highest total return. LSTM+OF and CNN+OF therefore perform better than DMLP+OF. Next, LSTM+OF and CNN+OF are compared with each other. Fig. 6 shows that LSTM+OF's net value is larger than CNN+OF's until the second half of 2015. A Mann–Whitney test on their excess returns gives a $p$-value of 0.001, so the difference is statistically significant. LSTM+OF therefore outperforms CNN+OF.

Second, the two OF models that use machine learning forecasts are discussed. Table 9 shows that the difference between these models' excess
returns is small, while RF+OF's information ratio and total return are higher than those of SVR+OF. RF+OF is therefore the better choice for trading investment.

Fig. 2. Excess returns of different MVF models in 2012.

Fig. 3. Excess returns of different MVF models in 2013.

Fig. 4. Excess returns of different MVF models in 2014.

Fig. 5. Excess returns of different MVF models in 2015.

Fig. 6. Net value of different OF models.

Third, RF+OF, LSTM+OF and ARIMA+OF are compared. Table 9 shows that RF+OF has the highest excess return, information ratio and total return among these models, and Fig. 6 shows that RF+OF's net value is among the largest. RF+OF is therefore the best choice among these models.

The monthly excess returns of the different OF models are presented in Figs. 7–10. LSTM+OF performs best from 2012 to 2013 but is gradually overtaken by SVR+OF afterwards. RF+OF's performance is relatively stable and satisfactory throughout the test period.

Finally, RF+MVF and RF+OF are the best of the MVF and OF models, respectively, so they are compared directly. Tables 8–9 show that RF+MVF's excess return and total return are much higher than those of RF+OF, while its information ratio is slightly lower. RF+MVF is therefore superior to RF+OF.

In summary, RF+OF performs best among the OF models, and RF+MVF is superior to RF+OF. However, both also have the highest turnover rates in their respective groups. High turnover causes high transaction fees, so the next section deducts these fees and compares profitability again.

The relative performance of the MVF models generally matches that of the corresponding OF models. This is mainly because the return predictions enter the MV and omega models in the same way. RF+MVF and RF+OF perform best because RF has the best predictive ability. The performance of the other portfolios, however, does not exactly follow the accuracy of their predictive models. A possible reason is the predictive information used in Eqs. (4)–(5), which includes the average predictive error as well as the predicted return.

5.3. Model performance with transaction fee

The experiments above show that RF+MVF and RF+OF have the highest turnover rates among the MVF and OF models, respectively. High turnover increases transaction fees in real stock trading, so these models' real performance should be examined after the fees are deducted. This section therefore compares the profitability of the different models in the real stock market after deducting the transaction fee caused by turnover. For simplicity, this paper approximates the total transaction fee of real trading with a fee of 0.05% per unit of turnover. Excess return, information ratio and total return remain the key metrics, because together they show profitability comprehensively.
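The three derived performance metrics of Section 5.2, Eqs. (40)–(42), can be sketched as follows; the function names are hypothetical. Two conventions here are assumptions rather than statements of the paper. First, the standard deviation in Eq. (40) uses the sample (ddof = 1) estimator. Second, the drawdown also allows $l = s$, so a net value that never falls has a drawdown of 0 rather than a negative number. Eq. (42) gives the turnover for a single rebalancing date, so the function returns one value per date.

```python
import numpy as np


def information_ratio(monthly_excess):
    """Eq. (40): mean monthly excess return divided by its standard deviation."""
    x = np.asarray(monthly_excess, dtype=float)
    return x.mean() / x.std(ddof=1)


def maximum_drawdown(net_value):
    """Eq. (41): largest relative fall from an earlier peak, (Netv_l - Netv_s) / Netv_l."""
    v = np.asarray(net_value, dtype=float)
    running_peak = np.maximum.accumulate(v)  # max of Netv_l over l <= s
    return np.max((running_peak - v) / running_peak)


def turnover(weights):
    """Eq. (42): sum_i |x_{i,t} - x_{i,t-1}| for every rebalancing date t >= 1."""
    w = np.asarray(weights, dtype=float)
    return np.abs(np.diff(w, axis=0)).sum(axis=1)


print(information_ratio([0.1, 0.2, 0.3]))
print(maximum_drawdown([1.0, 1.2, 0.9, 1.1, 0.6, 1.3]))  # 0.5 (fall from 1.2 to 0.6)
print(turnover([[0.5, 0.5], [0.2, 0.8], [0.2, 0.8]]))
```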

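The fee rule of Section 5.3 (0.05% per unit of turnover) can be folded into a daily net-value simulation. The sketch below illustrates the rule under several assumptions and is not the paper's simulator. Target weights are assumed to be set before each trading day and held through it. Turnover on day $t$ is measured against the previous day's target weights, as in Eq. (42), so weight drift caused by price moves is ignored. The first day is measured against an all-cash start. The names `FEE_RATE` and `simulate_net_value` are hypothetical.

```python
import numpy as np

FEE_RATE = 0.0005  # 0.05% per unit of turnover, as in Section 5.3


def simulate_net_value(weights, asset_returns, fee_rate=FEE_RATE):
    """Daily net value of a long-only portfolio after deducting fee_rate * turnover.

    weights[t] are the target weights held on day t and asset_returns[t] the
    realised returns of the assets on day t (both shaped (days, assets)).
    """
    w = np.asarray(weights, dtype=float)
    r = np.asarray(asset_returns, dtype=float)
    previous = np.vstack([np.zeros(w.shape[1]), w[:-1]])  # all-cash start
    daily_turnover = np.abs(w - previous).sum(axis=1)      # Eq. (42) per day
    gross = (w * r).sum(axis=1)
    net = gross - fee_rate * daily_turnover
    return np.cumprod(1.0 + net), daily_turnover


nv, tov = simulate_net_value([[1.0, 0.0], [0.0, 1.0]], [[0.01, 0.0], [0.0, 0.02]])
# day 0: turnover 1, net return 0.01 - 0.0005; day 1: turnover 2, net return 0.02 - 0.001
print(tov, nv)
```

Switching the whole portfolio from one asset to another counts as two units of turnover: one unit sold and one bought. Models with high TUR values in Tables 8–9 therefore lose noticeably more to fees.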
Fig. 7. Excess returns of different OF models in 2012.

Fig. 8. Excess returns of different OF models in 2013.


Fig. 9. Excess returns of different OF models in 2014.

Fig. 10. Excess returns of different OF models in 2015.

Table 10
The performance of different MVF models with transaction fee.
Model ER SD IR TR MD
DMLP+MVF −0.10% 0.6312 −0.0015 −16.42% 76.15%
LSTM+MVF 31.60% 0.5617 0.5625 24.43% 75.73%
CNN+MVF 7.17% 0.5540 0.1295 114.07% 54.50%
SVR+MVF 58.34% 1.0327 0.5650 265.80% 70.71%
RF+MVF 134.76% 1.6835 0.8005 806.52% 63.10%
ARIMA+MVF 13.01% 0.3491 0.3726 93.01% 64.95%
ER, SD, IR, TR and MD denote excess return, standard deviation, information ratio, total return and maximum drawdown, respectively.

Fig. 11. Net value of different MVF models with transaction fee.

5.3.1. MVF models with transaction fee

This section discusses the performance of the different MVF models after transaction fees.

First, the three MVF models that use deep learning forecasts are compared. In Table 10, LSTM+MVF has the highest excess return and information ratio, and CNN+MVF has the largest total return. LSTM+MVF and CNN+MVF therefore perform better than DMLP+MVF. Next, LSTM+MVF and CNN+MVF are compared with each other. A Mann–Whitney test on their excess returns gives a $p$-value of 0.012, indicating a significant difference between them. LSTM+MVF therefore performs better than CNN+MVF.

Second, the two MVF models that use machine learning forecasts are compared. Table 10 shows that RF+MVF's excess return, information ratio and total return are higher than those of SVR+MVF. Fig. 11 also shows that RF+MVF's net value is considerably higher than SVR+MVF's. RF+MVF is therefore a better choice than SVR+MVF.

Third, LSTM+MVF, RF+MVF and ARIMA+MVF are compared. In Table 10, RF+MVF has the highest excess return, information ratio and total return, and Fig. 11 shows that its net value is the largest among these models. RF+MVF therefore performs best among these models.

Figs. 12–15 present the monthly excess returns of the different MVF models after deducting the transaction fees caused by turnover. These figures show that RF+MVF's advantage over the other models becomes even more pronounced. In conclusion, RF+MVF outperforms the other models, although high turnover erodes more than half of its total return (from 1780.34% in Table 8 to 806.52% in Table 10). This paper therefore recommends building the MVF model with RF forecasts. Transaction fees should be taken into account when testing the performance of the different models.

5.3.2. OF models with transaction fee

This section discusses the different OF models after deducting their transaction fees.
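Sections 5.2 and 5.3 compare pairs of models by applying a Mann–Whitney test to their excess returns. A minimal sketch with SciPy's `scipy.stats.mannwhitneyu` follows. The two-sided alternative and the 5% significance level are assumptions, since the text reports only the $p$-values. The wrapper name is hypothetical, and the inputs are synthetic rather than the paper's excess-return series.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_excess_returns(excess_a, excess_b, alpha=0.05):
    """Two-sided Mann-Whitney U test; returns (U statistic, p-value, significant?)."""
    result = mannwhitneyu(excess_a, excess_b, alternative="two-sided")
    return result.statistic, result.pvalue, result.pvalue < alpha


# Synthetic monthly excess returns for two hypothetical models (48 test months)
rng = np.random.default_rng(42)
model_a = rng.normal(0.03, 0.05, size=48)
model_b = rng.normal(0.00, 0.05, size=48)
u_stat, p_value, significant = compare_excess_returns(model_a, model_b)
print(f"U = {u_stat:.1f}, p = {p_value:.3f}, significant at 5%: {significant}")
```

Because the test is rank-based, it does not assume normally distributed excess returns. That is a reasonable property for heavy-tailed monthly return series.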


Fig. 12. Excess return of different MVF models with transaction fee in 2012.

Fig. 13. Excess return of different MVF models with transaction fee in 2013.

Fig. 14. Excess return of different MVF models with transaction fee in 2014.

Fig. 15. Excess return of different MVF models with transaction fee in 2015.


Table 11
The performance of different OF models with transaction fee.
Model ER SD IR TR MD
DMLP+OF 4.07% 0.8060 0.0505 −30.00% 78.69%
LSTM+OF 95.31% 1.3508 0.7056 22.21% 88.37%
CNN+OF 12.07% 0.5167 0.2335 129.23% 52.67%
SVR+OF 87.56% 1.4373 0.6092 390.46% 74.93%
RF+OF 50.87% 0.6532 0.7788 285.40% 72.82%
ARIMA+OF 0.54% 0.3433 0.0158 25.73% 66.34%
ER, SD, IR, TR and MD denote excess return, standard deviation, information ratio, total return and maximum drawdown, respectively.

Fig. 16. Net value of different OF models with transaction fee.

First, the three OF models that use deep learning forecasts are compared. In Table 11, LSTM+OF has the highest excess return and information ratio, and CNN+OF has the largest total return. LSTM+OF and CNN+OF therefore outperform DMLP+OF. Next, LSTM+OF and CNN+OF are compared with each other. Fig. 16 shows that LSTM+OF's net value is higher than CNN+OF's until the second half of 2015. A Mann–Whitney test on their excess returns gives a $p$-value reported as 0.000, indicating a significant difference between them. LSTM+OF therefore performs better than CNN+OF.

Second, the two OF models that use machine learning forecasts are discussed. Table 11 shows that SVR+OF has a higher excess return and total return, while RF+OF has a larger information ratio. The three core metrics alone therefore do not separate them. Fig. 16, however, shows that SVR+OF's net value is larger than RF+OF's most of the time, so SVR+OF performs better than RF+OF. RF+OF's turnover rate is nearly twice that of SVR+OF (149.72% versus 79.50% in Table 9), which explains its poor performance after transaction fees are deducted.

Third, LSTM+OF, SVR+OF and ARIMA+OF are compared. Table 11 shows that LSTM+OF has the highest excess return and information ratio, and SVR+OF has the largest total return. LSTM+OF and SVR+OF therefore outperform ARIMA+OF. Next, LSTM+OF and SVR+OF are compared with each other. Fig. 16 shows that LSTM+OF has the largest net value from 2012 until 2015, when it is overtaken by SVR+OF. SVR+OF's excess return and information ratio are comparable to LSTM+OF's, and its total return is much larger. SVR+OF is therefore a better choice than LSTM+OF.

The monthly excess returns of the different OF models are presented in Figs. 17–20. They show that the relative performance of the models does not change after transaction fees are deducted.

Finally, this paper compares RF+MVF with SVR+OF. Tables 10–11 show that RF+MVF has a higher excess return, information ratio and total return than SVR+OF. RF+MVF therefore outperforms SVR+OF.

In conclusion, SVR+OF performs best among the OF models after transaction fees are deducted. Turnover erodes more than half of RF+OF's total return (from 679.36% to 285.40%), which greatly reduces its profitability. This paper therefore suggests building the omega model with SVR predictions for trading investment.

6. Discussion and conclusion

6.1. Discussion of key findings

This study aims to extend the existing literature on portfolio construction with return prediction. Two machine learning models and three deep learning models are used to advance the MV and omega models, combining the advantages of both model families in portfolio formation. The test period runs from January 5, 2012 to December 31, 2015 and contains 970 days. The experiment focuses on the Chinese stock market, namely the China Securities 100 Index component stocks.

First, this paper compares the predictive abilities of RF, SVR, DMLP, the LSTM neural network and CNN in stock return prediction. The results show that RF outperforms the other models. Second, this paper evaluates the different MVF and OF models without transaction fees, using six metrics to measure their differences. The results show that RF+MVF and RF+OF perform best among these models, and that RF+MVF is superior to RF+OF. The main drawback of RF+MVF and RF+OF, however, is high turnover, which may erode considerable profit. This paper therefore also compares the models after deducting their transaction fees. With fees deducted, RF+MVF still outperforms the other MVF models, whereas RF+OF is surpassed by SVR+OF because RF+OF has a higher turnover rate. RF+MVF also performs better than SVR+OF. Turnover erodes a large share of the total returns, more than half in the case of RF+MVF and RF+OF. This paper therefore recommends building the MVF model with RF return forecasts for daily trading investment.
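The recommended RF+MVF pipeline can be made concrete with a sketch of the MVF model of Eqs. (8)–(10). Solved each day, it is a small quadratic program over the long-only simplex, and the example below uses SciPy's SLSQP solver. Several elements are assumptions: the choice of solver, the hypothetical function names, and supplying the covariance matrix as a ready-made input (its estimation, for example from a trailing window of returns, is left to the caller). The average prediction error $\bar{\varepsilon}_i$ is computed as the 20-day mean of $r_i - \hat{r}_i$, as defined for the MVF model.

```python
import numpy as np
from scipy.optimize import minimize


def average_prediction_error(actual, predicted, window=20):
    """eps_bar_i: mean of r_i - r_hat_i over the last `window` days (arrays shaped (days, n))."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return err[-window:].mean(axis=0)


def solve_mvf(cov, r_hat, eps_bar):
    """Eqs. (8)-(10): min x'Cx - x'r_hat - x'eps_bar s.t. sum(x) = 1, 0 <= x <= 1."""
    cov = np.asarray(cov, dtype=float)
    lin = np.asarray(r_hat, dtype=float) + np.asarray(eps_bar, dtype=float)
    n = len(lin)

    def objective(x):
        return x @ cov @ x - x @ lin

    def gradient(x):
        return 2.0 * cov @ x - lin  # valid because cov is symmetric

    result = minimize(
        objective,
        x0=np.full(n, 1.0 / n),  # start from equal weights
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda x: np.sum(x) - 1.0}],
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


# Toy example: three assets with equal variance; only asset 3 is predicted to rise
weights = solve_mvf(0.01 * np.eye(3), r_hat=[0.0, 0.0, 0.05], eps_bar=np.zeros(3))
print(np.round(weights, 3))
```

Because the return and error terms enter with unit weight while the variance term is small for daily data, the solution tends toward concentrated portfolios. This is consistent with the high turnover of the RF+MVF model.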

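The single-objective OF model of Eqs. (26)–(32) is a linear program once $\psi$ and the $\eta_t$ are treated as decision variables. The sketch below solves it with SciPy's `linprog` using the HiGHS method, and also evaluates the Omega ratio of Eq. (11) for a return sample. It rests on assumptions that should be checked against the full model definition:

- $\tau = 0$.
- $\bar{r}_i$ is the sample mean of asset $i$'s returns over the 20-day window.
- The shortfall constraint (28) is imposed day by day using the realised portfolio return $\sum_i x_i r_{it}$. This is the usual scenario form, in which $\eta_t$ is the shortfall on day $t$; Eq. (28) as printed uses $\bar{r}_i$ instead.

The parameter name `delta` (standing for the 0.5 weights) and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def omega_ratio(y, tau=0.0):
    """Eq. (11): (E[y] - tau) / E[(tau - y)^+] + 1 for a sample of returns y."""
    y = np.asarray(y, dtype=float)
    return (y.mean() - tau) / np.maximum(tau - y, 0.0).mean() + 1.0


def solve_of(R, r_hat, eps_bar, delta=0.5):
    """LP sketch of Eqs. (26)-(32) with tau = 0.

    R: (T, n) array of the last T daily returns (T = 20 in the paper).
    Decision vector z = [x_1..x_n, eta_1..eta_T, psi].
    """
    R = np.asarray(R, dtype=float)
    T, n = R.shape
    r_bar = R.mean(axis=0)
    lin = np.asarray(r_hat, dtype=float) + np.asarray(eps_bar, dtype=float)
    c = np.concatenate([-lin, np.zeros(T), [-1.0]])  # Eq. (26)

    # Eq. (27): psi - delta * r_bar'x + (1 - delta) / T * sum(eta) <= 0
    row_27 = np.concatenate([-delta * r_bar, np.full(T, (1.0 - delta) / T), [1.0]])
    # Eq. (28), imposed per day (assumption): -R[t]'x - eta_t <= 0
    rows_28 = np.hstack([-R, -np.eye(T), np.zeros((T, 1))])
    A_ub = np.vstack([row_27, rows_28])
    b_ub = np.zeros(T + 1)

    A_eq = np.concatenate([np.ones(n), np.zeros(T + 1)])[None, :]  # Eq. (30)
    bounds = [(0.0, 1.0)] * n + [(0.0, None)] * T + [(None, None)]  # Eqs. (29), (31)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[-1]


# Toy example: asset 1 earns +1% every day, asset 2 alternates +2% / -2%
T = 20
R = np.column_stack([np.full(T, 0.01), np.tile([0.02, -0.02], T // 2)])
x, psi = solve_of(R, r_hat=[0.01, 0.0], eps_bar=[0.0, 0.0])
print(np.round(x, 4), round(psi, 4))
print(omega_ratio([0.02, -0.01, 0.03, -0.02]))
```

In the toy example, the alternating asset has zero mean return and a 1% average shortfall. The LP therefore puts all weight on the steadily rising asset.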
Fig. 17. Excess return of different OF models with transaction fee in 2012.


Fig. 18. Excess return of different OF models with transaction fee in 2013.

Fig. 19. Excess return of different OF models with transaction fee in 2014.

Fig. 20. Excess return of different OF models with transaction fee in 2015.

6.2. Theoretical implications

This study enriches theoretical research on portfolio optimization with return prediction. First, this paper uses five models for stock return prediction, which helps ensure that high-quality stocks are selected before the portfolio optimization models are built. Specifically, RF, SVR, DMLP, the LSTM neural network and CNN are used for daily return prediction, with the ARIMA model as a benchmark to show their advantages. Second, two established portfolio optimization models are advanced with these models' return predictions, which fills a gap in the existing literature. This is the first time the MV and omega models have been extended with the predictions of deep learning and machine learning models, and portfolio optimization models with ARIMA return prediction serve as comparisons to show their advantages.

6.3. Limitations and future work

This study has limitations. Only simple historical returns are used as input features, to allow comparison with the benchmark model. Many studies have shown the value of technical indicators, news, exchange rates and economic indicators. Further studies can therefore use more informative input features to train the predictive models and improve the MVF and OF models for daily trading investment. In addition, high turnover remains a major challenge for these models, especially for RF+MVF, which remains profitable in practical trading.

CRediT authorship contribution statement

Yilin Ma: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing


- original draft, Writing - review & editing. Ruizhu Han: Supervision, Funding acquisition, Project administration, Resources, Writing - review & editing. Weizhong Wang: Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

Yilin Ma conceptualized the research, collected and analyzed the data, and conducted the experiment. Yilin Ma, Ruizhu Han and Weizhong Wang wrote, reviewed and edited the paper.

Funding

This research was supported by the National Natural Science Foundation of China (No. 71390335).

References

Adebiyi, A. A., Adewumi, A. O., & Ayo, C. K. (2014). Comparison of ARIMA and artificial Neural Networks models for Stock Price Prediction. Journal of Applied Mathematics, http://dx.doi.org/10.1155/2014/614342.

Alexander, G. J., & Baptista, A. M. (2002). Economic implications of using a mean-VaR model for portfolio selection: A comparison with mean–variance analysis. Journal of Economic Dynamics and Control, 26, 1159–1193.

Alizadeh, M., Rada, R., Jolai, F., & Fotoohi, E. (2011). An adaptive neuro-fuzzy system for stock portfolio analysis. International Journal of Intelligent Systems, 26, 99–114.

Ballings, M., Poel, D. V. D., Hespeels, N., & Gryp, R. (2015). Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications, 42, 7046–7056.

Booth, A., Gerding, E., & Mcgroarty, F. (2014). Automated trading with performance weighted random forests and seasonality. Expert Systems with Applications, 41, 3651–3661.

Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.

Chen, B. L., Zhong, J. D., & Chen, Y. Y. (2020). A hybrid approach for portfolio selection with higher-order moments: Empirical evidence from Shanghai Stock Exchange. Expert Systems with Applications, 145, Article 113104.

Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83, 187–205.

Deboeck, G. J. (1994). Trading on the edge: Neural, genetic, and fuzzy systems for chaotic financial markets. New York: Wiley.

Deng, G. F., Lin, W. T., & Lo, C. C. (2012). Markowitz-based portfolio selection with cardinality constraints using improved particle swarm optimization. Expert Systems with Applications, 39, 4558–4566.

Deng, S. J., & Min, X. Y. (2013). Applied optimization in Global Efficient Portfolio Construction using Earning Forecasts. The Journal of Investing, 22, 104–114.

Emir, Ş. (2013). Predicting the Istanbul Stock Exchange index return using Technical Indicators: A comparative study. International Journal of Finance & Banking Studies, 2(3), 111–117.

Fabozzi, F. J., Gupta, F., & Markowitz, H. M. (2002). The legacy of modern portfolio theory. The Journal of Investing, 11, 7–22.

Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270, 654–669.

Freitas, F. D., Souza, A. F. D., & Almeida, A. R. D. (2009). Prediction-based portfolio optimization model using neural networks. Neurocomputing, 72, 2155–2170.

Gandhmal, D. P., & Kumar, K. (2019). Systematic analysis and review of stock market prediction techniques. Computer Science Review, 34, Article 100190.

Gilli, M., Schumann, E., di Tollo, G., & Cabej, G. (2011). Constructing 130/30-portfolios with the Omega ratio. Journal of Asset Management, 12, 94–108.

Ho, T. K. (1995). Random decision forests. In Proceedings of the third international conference on document analysis and recognition (pp. 278–282).

Hoseinzade, E., & Haratizadeh, S. (2019). CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Systems with Applications, 129, 273–285.

Huang, C. F. (2012). A hybrid stock selection model using genetic algorithms and support vector regression. Applied Soft Computing, 12, 807–818.

Ji, S. W., Xu, W., Yang, M., & Yu, K. (2012). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 221–231.

Jorion, P. (1997). Value at risk: The new benchmark for controlling market risk. Chicago: McGraw-Hill.

Kane, S. J., Bartholomew-Biggs, M. C., Cross, M., & Dewar, M. (2009). Optimizing Omega. Journal of Global Optimization, 45, 153–167.

Kapsos, M., Christofides, N., & Rustem, B. (2014). Worst-case robust Omega ratio. European Journal of Operational Research, 234, 499–507.

Keating, C., & Shadwick, W. F. (2002). A universal performance measure. Journal of Performance Measurement, 6, 59–84.

Kolm, P. N., Tütüncü, R., & Fabozzi, F. J. (2014). 60 Years of portfolio optimisation: Practical challenges and current trends. European Journal of Operational Research, 234, 356–371.

Konno, H., & Yamazaki, H. (1991). Mean-absolute Deviation Portfolio Optimization model and its applications to Tokyo Stock Market. Management Science, 37(5), 519–531.

Krauss, C., Do, X. A., & Huck, N. (2017). Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500. European Journal of Operational Research, 259, 689–702.

LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time series. In The handbook of brain theory and neural networks, Vol. 3361.

Lee, S. I., & Yoo, S. J. (2018). Threshold-based Portfolio: The Role of the Threshold and its applications. The Journal of Supercomputing, http://dx.doi.org/10.1007/s11227-018-2577-1.

Lin, C. M., Huang, J. J., Gen, M., & Tzeng, G. H. (2006). Recurrent neural network for dynamic portfolio selection. Applied Mathematics and Computation, 175, 1139–1146.

Long, W., Lu, Z., & Cui, L. (2018). Deep learning-based feature engineering for stock price movement prediction. Knowledge-Based Systems, 164, 163–173.

Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431–3440).

Lu, C. J., Lee, T. S., & Chiu, C. C. (2009). Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems, 47, 115–125.

Markowitz, H. M. (1959). Portfolio selection: Efficient diversification of investments. New York: John Wiley & Sons Inc.

Matías, J. M., & Reboredo, J. C. (2012). Forecasting performance of nonlinear models for Intraday Stock Returns. Journal of Forecasting, 31, 172–188.

Moews, B., Herrmann, J. M., & Ibikunle, G. (2019). Lagged correlation-based deep learning for directional trend change prediction in financial time series. Expert Systems with Applications, 120, 197–206.

Oliveira, N., Cortez, P., & Areal, N. (2017). The impact of microblogging data for stock market prediction: Using Twitter to predict returns, volatility, trading volume and survey sentiment indices. Expert Systems with Applications, 73, 125–144.

Orimoloye, L. O., Sung, M. C., Ma, T. J., & Johnson, J. E. V. (2020). Comparing the effectiveness of deep feedforward neural networks and shallow architectures for predicting stock price indices. Expert Systems with Applications, 139, Article 112828.

Paiva, F. D., Cardoso, R. T. N., Hanaoka, G. P., & Duarte, W. M. (2019). Decision-making for financial trading: A fusion approach of machine learning and portfolio selection. Expert Systems with Applications, 115, 635–655.

Pang, X., Zhou, Y., Wang, P., Lin, W. W., & Chang, V. (2018). An innovative neural network approach for stock market prediction. The Journal of Supercomputing, http://dx.doi.org/10.1007/s11227-017-2228-y.

Patel, J., Shah, S., & Thakkar, P. (2015). Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning techniques. Expert Systems with Applications, 42, 259–268.

Principe, J. C., Euliano, N. R., & Lefebvre, W. C. (1999). Neural and adaptive systems: Fundamentals through simulations. New York: Wiley.

Qin, Q. (2014). Linear and nonlinear Trading models with Gradient Boosted random Forests and Application to Singapore Stock Market. Journal of Intelligent Learning Systems and Applications, 5, 1–10.

Rasel, R. I., Sultana, N., & Meesad, P. (2015). An efficient modelling approach
Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidi- for forecasting financial time series data using support vector regression and
rectional lstm and other neural network architectures. Neural Networks, 18, windowing operators. International Journal of Computational Intelligence Studies, 4(2),
602–610. 134–150.
Hansen, J. V., & Nelson, R. D. (2002). Data mining of time series using stacked Rockafellar, R. T., & Uryasev, S. (2000). Optimization of conditional value-at-risk.
generalizers. Neurocomputing, 43, 173–184. Journal of Risk, 2, 21–42.
Hao, C. Y., Wang, J. Q., & Xu, W. (2013). Prediction-based portfolio selection model Sadaei, H. J., Enayatifar, R., Lee, M. H., & Mahmud, M. (2016). A hybrid model based
using support vector machines. In Proceedings of sixth international conference on on differential fuzzy logic relationships and imperialist competitive algorithm for
business intelligence and financial engineering (pp. 567–571). stock market forecasting. Applied Soft Computing, 40, 132–149.
Hiransha, M., Gopalakrishnan, E. A., Menon, V. K., & Soman, K. P. (2018). NSE Stock Sezer, O. B., Gudelek, U., & Ozbayoglu, M. (2020). Financial time series forecasting with
Market Prediction using Deep-learning Models. Procedia Computer Science, 132, deep learning : A systematic literature review: 2005-2019. Applied Soft Computing,
1351–1362. 90, Article 106181.

Sezer, O. B., & Ozbayoglu, A. M. (2018). Algorithmic Financial Trading with deep Convolutional Neural Networks: Time Series to image Conversion approach. Applied Soft Computing, 70, 525–538.
Singh, R., & Srivastava, S. (2017). Stock prediction using deep learning. Multimedia Tools and Applications, 76, 18569–18584.
Ta, V. D., Liu, C. M., & Addis, D. (2018). Prediction and portfolio optimization in quantitative trading using machine learning techniques. In Proceedings of the ninth international symposium on information and communication technology (pp. 98–105).
Ta, V. D., Liu, C. M., & Tadesse, D. A. (2020). Portfolio optimization-based stock prediction using long-short term memory network in quantitative trading. Applied Sciences, 10, 437.
Ustun, O., & Kasimbeyli, R. (2012). Combined forecasts in portfolio optimization: A generalized approach. Computers & Operations Research, 39, 805–819.
Wang, J. Z., Wang, J. J., Zhang, Z. G., & Guo, S. P. (2011). Forecasting stock indices with back propagation neural network. Expert Systems with Applications, 38, 14346–14355.
Wang, W., Li, W., Zhang, N., & Liu, K. C. (2020). Portfolio formation with preselection using deep learning from long-term financial data. Expert Systems with Applications, 143, Article 113042.
Weng, B., Lu, L., Wang, X., Megahed, F. M., & Martinez, W. (2018). Predicting short-term stock prices using ensemble methods and online data sources. Expert Systems with Applications, 112, 258–273.
Yu, J. R., Chiou, W. J. P., Lee, W. Y., & Lin, S. J. (2020). Portfolio models with return forecasting and transaction costs. International Review of Economics & Finance, 66, 118–130.
Zhang, Y., Li, X., & Guo, S. (2018). Portfolio selection problems with Markowitz’s mean–variance framework: A review of literature. Fuzzy Optimization and Decision Making, 17, 125–158.
Zhang, Y., & Wu, L. (2009). Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network. Expert Systems with Applications, 36, 8849–8854.
Zhu, M. (2013). Return distribution predictability and its implications for portfolio selection. International Review of Economics & Finance, 27, 209–223.