Robust Portfolio Design and Stock Price Prediction Using An Optimized LSTM Model
Abstract— Accurate prediction of future stock prices is a difficult task. Even more challenging is designing an optimized portfolio whose weights are allocated to the stocks in a way that balances its return against its risk. This paper presents a systematic approach towards building two types of portfolios, optimum risk and eigen, for four critical economic sectors of India. The prices of the stocks are extracted from the web from Jan 1, 2016, to Dec 31, 2020. Sector-wise portfolios are built based on their ten most significant stocks. An LSTM model is also designed for predicting future stock prices. Six months after the construction of the portfolios, i.e., on Jul 1, 2021, the actual returns and the LSTM-predicted returns for the portfolios are computed. A comparison of the predicted and the actual returns indicates a high accuracy level of the LSTM model.

Keywords— Portfolio Optimization, Minimum Variance Portfolio, Optimum Risk Portfolio, Eigen Portfolio, Stock Price Prediction, LSTM, Sharpe Ratio, Prediction Accuracy.

I. INTRODUCTION

The task of designing optimum and robust portfolios has always been considered a research problem of intense interest among quantitative and statistical financial analysts and researchers. A portfolio is said to be optimal when it allocates weights to a set of stocks in such a way that the return and risk associated with the portfolio are traded off in the best possible manner. Markowitz, in his seminal work, proposed an approach called "the mean-variance optimization approach", which is based on the mean and covariance matrix of the returns [1]. Although identifying an optimal mean-variance portfolio becomes NP-hard once practical constraints such as cardinality limits are imposed, researchers have, following Markowitz's work, made several propositions for portfolio optimization and stock price prediction. Among the methods proposed in the literature for stock price prediction, multivariate regression, ARIMA, VAR, time series forecasting, and learning-based approaches are quite popular. However, since it is extremely difficult to accurately estimate the future prices of stocks, estimation of the expected returns of a stock from its historical prices is invariably error-prone. Hence, it is a popular practice to use either a minimum variance portfolio or an optimum risk portfolio with the maximum Sharpe Ratio as better proxies for the expected returns.

This paper presents a step-by-step approach towards designing robust and efficient portfolios by choosing stocks from four critical sectors of the National Stock Exchange (NSE) of India. Based on the NSE report of Jun 30, 2021, the ten most significant stocks of each of the four sectors are first identified [2]. Based on the historical prices of the forty stocks from Jan 1, 2016, to Dec 31, 2020, efficient portfolios are designed for the sectors by optimizing their risks and returns and by exploiting their principal components. To augment the process of portfolio construction, an LSTM model is designed for predicting future stock prices and future returns of the portfolios. The actual returns of the portfolios and the returns predicted by the LSTM model are compared six months after the construction of the portfolios to evaluate the accuracy of the predictive model. Further, the actual returns and volatilities reflect the current return and the risk associated with each sector studied in this work.

The contribution of this work is threefold. First, the work proposes two different approaches towards portfolio construction, the eigen portfolios and the optimum risk portfolios. These two methods of portfolio design are applied to stocks of four critical sectors listed in the NSE. These portfolios can serve as a guide for investors in making effective and profitable investment decisions. Second, it proposes an efficient and optimized design of an LSTM architecture for predicting future prices of stocks for designing robust portfolios. Finally, the actual returns of the portfolios indicate the current profitability and volatility of the four sectors.

The paper is organized as follows. In Section II, some of the existing works on portfolio management and stock price prediction are discussed briefly. Section III provides the details of the data used and the methodology followed. Section IV discusses the design of the LSTM regression model. Section V discusses the results of different portfolios and the predictions of the future stock prices made by the LSTM model. Section VI concludes the paper.

II. RELATED WORK

Due to the challenging nature of the problems of stock price prediction and robust portfolio design, and since these applications find impactful use cases in the real world, several propositions exist in the literature of these research areas. The use of predictive models built on learning algorithms and deep neural architectures for stock price prediction has been quite popular [3-6]. Hybrid models have been proposed that integrate learning-based algorithms and architectures with the sentiments in the unstructured data on the social web [7-9]. Researchers have proposed several adaptations of Markowitz's minimum variance approach, including purchase limit constraints and cardinality constraints. Generalized autoregressive conditional heteroscedasticity (GARCH) is a common approach for estimating the future volatilities of stocks and portfolios [10]. The use of metaheuristics in solving multi-objective optimization problems for portfolio design, eigen portfolios using principal component analysis, and linear and non-linear programming-based approaches are proposed by some researchers [11-13]. Further, fuzzy logic, genetic algorithms (GAs), and particle swarm optimization (PSO) are also some of the popular approaches for portfolio design [14-15].
The current work presents two intrinsically distinct methods of portfolio design, (i) the optimum risk portfolio and (ii) the eigen portfolio, for designing robust portfolios for four important economic sectors of India. Based on the stock prices from Jan 1, 2016, to Dec 31, 2020, eight portfolios are designed, two for each sector. An LSTM model is then built for predicting the future prices of the stocks of each portfolio. Six months after the portfolio construction, the actual return for each portfolio and the return predicted by the LSTM model are computed to analyze the profitability of each sector and the prediction accuracy of the LSTM model.

III. DATA AND METHODOLOGY

In Section I, it was pointed out that the primary objective of the work is to build robust portfolios for four critical sectors of the Indian economy. The second goal is to evaluate the accuracy of the proposed LSTM model in predicting the future stock prices and the future returns and risks associated with each portfolio. The return-risk analysis also provides insights into the current profitability and risk involved in each of the sectors analyzed. The Python programming language has been used in designing the portfolios and the LSTM model. The TensorFlow and Keras frameworks are also used. In the following, the seven-step approach followed in the design process is discussed.

A. Choosing the Sectors
Four important sectors are chosen from the NSE, India. The chosen sectors are the following: financial services, oil and gas, pharma, and public sector unit (PSU) banks. The ten most significant stocks are identified for each sector, based on their contributions to the derivation of the sectoral index. The significant stocks are identified based on the report published by NSE on Jun 30, 2021 [2].

B. Data Acquisition
For each sector, the prices of the top stocks are extracted using the DataReader function of the data sub-module of the pandas_datareader module in Python. The prices are extracted from the Yahoo Finance site, from Jan 1, 2016, to Jun 1, 2021. The stock price data from Jan 1, 2016, to Dec 31, 2020, are used for building the portfolios, while the portfolios are tested for their return on Jun 1, 2021. The current work is a univariate analysis, and hence, the variable close is chosen as the variable of interest, ignoring the remaining variables.

C. Deriving the Return and Volatility of the Stocks
The daily returns are computed for each stock as the percentage changes in the successive close values. The pandas function pct_change is used for computing the daily returns. The variance, and hence the standard deviation, of the daily return values is then derived to arrive at the daily volatility value for each stock. Assuming that there are 250 operational days in a calendar year, the annual volatilities for the stocks are computed by multiplying the daily volatilities by the square root of 250. The risk involved in a stock is manifested in its annual volatility figure.

D. Designing the Minimum Risk Portfolios
After computing the annual returns and risks (i.e., volatilities) of all the stocks, the minimum risk portfolios are designed for the four sectors. The minimum risk portfolio is the portfolio that has the minimum variance associated with its constituent stocks. To identify the minimum variance portfolio for a sector, the efficient frontier of a large number of candidate portfolios is first plotted. The efficient frontier depicts the contour or locus containing the portfolio points that yield the maximum return for a given value of risk or, equivalently, involve the minimum risk for a given value of the return. For an efficient frontier, the return and the risk are plotted along the y-axis and the x-axis, respectively. It is evident that the left-most point on a given efficient frontier depicts the minimum risk portfolio. In the current work, the efficient frontier for each sector is identified by randomly assigning weights to its constituent stocks and repeating such random assignment 10000 times in a loop so that 10000 such portfolios are designed. These portfolios are plotted on a two-dimensional space, and the left-most point along the x-axis is identified to determine the minimum risk (or minimum variance) portfolio. The return and risk for a portfolio are derived using (1) and (2).

$$Ret = w_1\,Ret(S_1) + w_2\,Ret(S_2) + \cdots + w_n\,Ret(S_n) \tag{1}$$

In (1), Ret represents the return of a portfolio consisting of n stocks, which are represented as S_1, S_2, ..., S_n, while the w_i's are their corresponding weights. The variance of a portfolio is given by (2).

$$V = \sum_{i=1}^{n} w_i^2 s_i^2 + 2\sum_{i<j} w_i\,w_j\,\mathrm{covar}(i,j) \tag{2}$$

In (2), V, w_i, and s_i represent the variance of a portfolio, the weight of the ith stock, and its standard deviation, respectively. The covariance between the returns of the ith and the jth stock is represented as covar(i, j).

E. Identifying the Optimum Risk Portfolios
Since the minimum risk portfolios are rarely adopted by investors in the real world due to the low returns that they usually yield, a trade-off between the risk and the return is carried out. This trade-off leads to the design of optimum risk portfolios. For optimizing the risk, a metric called the Sharpe Ratio (SR) is used. The SR of a portfolio is the ratio of the difference between its return and that of a risk-free portfolio to its standard deviation, as given by (3).

$$SR = \frac{Ret_{current} - Ret_{free}}{STD_{current}} \tag{3}$$

In (3), Ret_current, Ret_free, and STD_current represent the returns of the current and the risk-free portfolios, and the risk of the current portfolio, respectively. A risk-free portfolio is assumed to have a volatility of one percent. For a given set of stocks, the optimum risk portfolio is designed to maximize the Sharpe Ratio. Once the optimum risk portfolio is determined, the corresponding weights assigned to the individual stocks are available. The pandas function idxmax is used to identify the portfolio with the maximum SR value.

F. Building the Eigen Portfolios
Designing eigen portfolios involves the concept of principal component analysis (PCA), a well-known dimensionality reduction method based on unsupervised learning. PCA retains the intrinsic variance in the data while reducing the number of dimensions. The principal components in the training data of the stock prices are determined using the PCA function defined in the sklearn library of Python. To retain 80% of the variance in the original
stock price data, it is found that a minimum of five components is needed from the ten stocks. The components generated by the PCA function are orthogonal to each other, and their power of explanation of the variance in the data decreases with a higher component number. In other words, the first component explains the maximum percentage of the total variance. The component loadings of the five principal components on each of the ten stocks reflect the weights allocated to the stocks in building the candidate eigen portfolios. Finally, the portfolio yielding the maximum Sharpe Ratio among the candidates is selected as the best eigen portfolio. A Python function iterating over a loop is used for deriving the weights assigned to the five principal components and for identifying the best candidate eigen portfolio [12].

G. Predicted and Actual Returns and Risks of Portfolios
Using the training dataset of the stock prices from Jan 1, 2016, to Dec 31, 2020, two portfolios are designed for each sector, an optimal risk portfolio and an eigen portfolio. On Jan 1, 2021, a fictitious investor is created who invests an amount of 100,000 Indian Rupees (INR) in each sector based on the recommendation of the optimal risk portfolio structure for the corresponding sector. Note that the amount of INR 100,000 is for illustrative purposes only. Our analysis will not be affected either by the currency or by the amount. To compute the future values of the stock prices and hence to predict the future value of the portfolio, a regression model is designed based on the LSTM deep learning architecture. On May 31, 2021, using the LSTM model, the stock prices for June 1, 2021, are predicted (i.e., a forecast horizon of one day is used). Based on the predicted stock values, the predicted rate of return for each portfolio is determined. Finally, on June 1, 2021, when the actual prices of the stocks are known, the actual rates of return are derived. The predicted and actual rates of return for the portfolios are compared to evaluate the profitability of the portfolios and the accuracy of the LSTM model.

IV. THE LSTM MODEL

As explained in Section III, the stock prices are predicted with a forecast horizon of one day, using an LSTM deep learning model. This section presents the details of the architecture and the choice of various parameters in the model design. A very brief discussion on the fundamentals of LSTM networks and the effectiveness of these networks in interpreting sequential data is first presented before the details of the model design.

LSTM is an extended and advanced recurrent neural network (RNN) with a high capability of interpreting and predicting future values of sequential data like time series of stock prices or text [16]. LSTM networks maintain their state information in specially designed memory cells whose contents are regulated by gates. The forget gates control how much of the historical state is retained, and the networks aggregate this retained state with the current state information to compute the future state information. The information available at the current time slot is received at the input gates. Using the results of aggregation at the forget gates and the input gates, the network predicts the target variable's value for the next round. The predicted value is available at the output gates [16].

For predicting the stock prices for the next day, an LSTM model is designed and fine-tuned. The design of the model is exhibited in Fig. 1. The model uses the daily close prices of the stock of the past 50 days as the input. The input data of 50 days with a single feature (i.e., close values) is represented by the data shape of (50, 1). The input layer forwards the data to the first LSTM layer. The LSTM layer is composed of 256 nodes. The output from the LSTM layer has a shape of (50, 256). Thus, the LSTM layer extracts 256 features from every record in the input data. A dropout layer is used after the first LSTM layer that randomly switches off the output of thirty percent of the nodes in the LSTM layer to avoid model overfitting. Another LSTM layer with the same architecture as the previous one receives the output from the first and applies a dropout rate of thirty percent. A dense layer with 256 nodes receives the results from the second LSTM layer. The output of the dense layer produces the predicted close price. The forecast horizon may be adjusted to different values by changing a tunable parameter. A forecast horizon of one day is used so that a prediction for the following day is made. The model is trained using a batch size of 64, and 100 epochs are used. While the sigmoid function is used for activation at the final output layer, the ReLU activation is deployed at all other layers. The loss and the accuracy during training and validation are measured using the Huber loss function and the mean absolute error function, respectively. The hyperparameter values used in the network design are all chosen based on the grid search method.

Fig. 1. The schematic diagram of the LSTM model

V. EXPERIMENTAL RESULTS

This section presents the detailed results of the performances of the portfolios and their analysis. The four sectors of the Indian stock market we choose are (i) financial services, (ii) oil and gas, (iii) pharma, and (iv) public sector unit (PSU) banks. The portfolios and the LSTM models are implemented using Python and its associated libraries in TensorFlow and Keras. The models are trained and validated on the GPU environment of Google for faster execution and processing. The execution time of each epoch was approximately three seconds.

A. Financial Services Sector
The ten critical stocks of the financial services sector with their corresponding weights used for deriving the overall index of the sector, based on the NSE report published on Jun 30, 2021, are HDFC Bank (HDB): 24.41, Housing Development Finance Corporation (HDF): 16.167, ICICI Bank (ICB): 16.31, Kotak Mahindra Bank (KTB): 9.35, Axis Bank (AXB): 7.19, State Bank of India (SBI): 6.01, Bajaj Finance (BJF): 5.97, Bajaj Finserv (BFS): 2.73, HDFC Life Insurance (HLI): 2.12, and SBI Life Insurance (SLI): 1.66 [2]. We assume an imaginary investor who invested a total capital of INR 100000 on Jan 1, 2021. The six months' returns are computed on Jul 1, 2021, for the optimum risk and the eigen portfolios. The LSTM model is used for predicting the stock prices of Jul 1, and the predicted return is compared with the actual return of the optimum risk portfolio.
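As a minimal sketch of how such a six-month return can be computed, the following Python snippet buys fractional shares on Jan 1, 2021, with the invested amounts of Table I and values them at the Jul 1, 2021, prices. The variable names are illustrative and are not taken from the authors' code. The prices are the rounded values printed in Table I, so individual stock values may differ slightly from the table entries.

```python
# (ticker, amount invested in INR, price on Jan 1 2021, price on Jul 1 2021),
# copied from Table I (optimum risk portfolio, financial services sector).
holdings = [
    ("HDB", 22970, 1425, 1487),
    ("HDF", 1490, 2569, 2459),
    ("ICB", 9140, 528, 631),
    ("KTB", 4160, 1994, 1716),
    ("AXB", 6760, 624, 746),
    ("SBI", 860, 279, 420),
    ("BJF", 40100, 5280, 5967),
    ("BFS", 1490, 8870, 11816),
    ("HLI", 760, 678, 686),
    ("SLI", 12270, 895, 1007),
]

# Total capital committed on the purchase date.
initial_value = sum(amount for _, amount, _, _ in holdings)

# Each position holds amount / buy_price shares, valued later at sell_price.
final_value = sum(amount / buy * sell for _, amount, buy, sell in holdings)

# Six-month rate of return of the portfolio, in percent.
return_pct = (final_value - initial_value) / initial_value * 100
print(f"Initial: {initial_value}, final: {final_value:.0f}, return: {return_pct:.2f} %")
```

Because the return depends only on the ratio of the two prices for each stock, the result is unaffected by the currency or by the total amount invested, as noted in Section III.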
TABLE I. ACTUAL RETURN OF OPT RISK PORTFOLIO (FIN SERV SEC)

         Date: Jan 1, 2021                                Date: Jul 1, 2021
Stock    Wts      Amnt Invstd   Act Price   No of Stocks   Act Price   Act Value
HDB      0.2297   22970         1425        16.12          1487        23969
HDF      0.0149   1490          2569        0.58           2459        1426
ICB      0.0914   9140          528         17.31          631         10923
KTB      0.0416   4160          1994        2.09           1716        3580
AXB      0.0676   6760          624         10.83          746         8082
SBI      0.0086   860           279         3.08           420         1295
BJF      0.4010   40100         5280        7.59           5967        45318
BFS      0.0149   1490          8870        0.17           11816       1985
HLI      0.0075   760           678         1.11           686         761
SLI      0.1227   12270         895         13.71          1007        13806
Total             100000                                               111145
Actual Return: 11.15 %

TABLE II. ACTUAL RETURN OF EIGEN PORTFOLIO (FIN SERV SEC)

         Date: Jan 1, 2021                                Date: Jul 1, 2021
Stock    Wts      Amnt Invstd   Act Price   No of Stocks   Act Price   Act Value
HDB      0.1100   11000         1425        7.72           1487        11479
HDF      0.1100   11000         2569        4.28           2459        10529
ICB      0.1100   11000         528         20.83          631         13146
KTB      0.1000   10000         1994        5.02           1716        8606
AXB      0.1100   11000         624         17.63          746         13151
SBI      0.1100   11000         279         39.43          420         16559
BJF      0.1200   12000         5280        2.27           5967        13561
BFS      0.1200   12000         8870        1.35           11816       15986
HLI      0.0600   6000          678         8.85           686         6071
SLI      0.0500   5000          895         5.59           1007        5626
Total             100000                                               114714
Actual Return: 14.71 %

Fig. 2. The red and the green stars depict the min. risk portfolio and the opt. risk portfolio, respectively, for the financial services sector built on Jan 1, 2021. The x-axis and the y-axis plot the risk and the return, respectively.

Fig. 3. Actual vs. predicted values of the HDFC Bank (HDB) stock as predicted by the LSTM model (Period: Jan 1, 2021, to Jul 1, 2021)

Tables I-III depict the returns of the two portfolio design approaches, optimum risk and eigen, and the LSTM-predicted return for an investor who followed the recommendations of the optimum risk portfolio. Fig. 2 shows the efficient frontier, the minimum risk portfolio, and the optimum risk portfolio of the financial services sector. As an illustration, Fig. 3 depicts the plot of actual prices vs. corresponding predicted prices of the most significant stock in this sector, HDFC Bank, from Jan 1, 2021, to Jul 1, 2021.

B. Oil & Gas Sector

TABLE IV. ACTUAL RETURN OF OPT RISK PORTFOLIO (OIL & GAS SEC)

         Date: Jan 1, 2021                                Date: Jul 1, 2021
Stock    Wts      Amnt Invstd   Act Price   No of Stocks   Act Price   Act Value
RLI      0.2338   23380         1988        11.76          2098        24674
BPC      0.0443   4430          382         11.60          463         5369
ONG      0.0479   4790          93          51.51          119         6129
ATG      0.1771   17710         377         46.98          969         45520
IOC      0.0084   840           92          9.13           108         986
GAI      0.0272   2720          124         21.94          153         3356
IPG      0.1186   11860         507         23.39          571         13357
HPC      0.0441   4410          221         19.95          296         5907
PNL      0.0370   3700          250         14.80          223         3300
GJG      0.2615   26160         378         69.18          675         46697
Total             100000                                               155295
Actual Return: 55.30 %

The top ten stocks of the oil & gas sector and their weights (in percent) are Reliance Industries (RLI): 35.58, Bharat Petroleum Corporation (BPC), Oil & Natural Gas Corporation (ONG): 10.78, Adani Total Gas (ATG): 7.04, Indian Oil Corporation (IOC): 6.88, GAIL (GAI): 6.67, Indraprastha Gas (IPG): 4.90, Hindustan Petroleum Corporation (HPC): 4.70, Petronet LNG (PNL): 4.25, and Gujarat Gas (GJG): 2.85 [2]. Tables IV-VI show the returns of the two portfolios, optimum risk and eigen, and the return predicted by the LSTM model.
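The efficient frontier plotted for each sector follows the random-weight procedure of Sections III-D and III-E. A minimal sketch of that procedure is given below, under several assumptions: it uses NumPy on synthetic daily returns rather than the paper's price data, the risk-free return `risk_free` is a hypothetical value, and `argmin`/`argmax` stand in for the pandas `idxmin`/`idxmax` calls mentioned in the text. It is an illustration of (1)-(3), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns for 10 hypothetical stocks (NOT the paper's data).
n_days, n_stocks = 1250, 10
daily_returns = rng.normal(loc=0.0005, scale=0.015, size=(n_days, n_stocks))

# Annualize assuming 250 trading days per year (Section III-C).
annual_returns = daily_returns.mean(axis=0) * 250
annual_cov = np.cov(daily_returns, rowvar=False) * 250

# Draw 10000 random long-only weight vectors, each normalized to sum to one.
n_portfolios = 10_000
weights = rng.random((n_portfolios, n_stocks))
weights /= weights.sum(axis=1, keepdims=True)

# Portfolio return, Eq. (1): weighted sum of the stock returns.
port_return = weights @ annual_returns

# Portfolio risk: square root of the variance w' C w, i.e., of Eq. (2).
port_risk = np.sqrt(np.einsum("ij,jk,ik->i", weights, annual_cov, weights))

# Sharpe Ratio, Eq. (3); the risk-free return is an assumed placeholder.
risk_free = 0.01
sharpe = (port_return - risk_free) / port_risk

min_risk_idx = port_risk.argmin()   # left-most point: minimum risk portfolio
opt_risk_idx = sharpe.argmax()      # maximum Sharpe Ratio: optimum risk portfolio

print("Min risk weights:", weights[min_risk_idx].round(4))
print("Opt risk weights:", weights[opt_risk_idx].round(4))
```

Plotting `port_risk` on the x-axis against `port_return` on the y-axis gives the cloud of candidate portfolios whose upper-left boundary approximates the efficient frontier.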
Fig. 4 shows the efficient frontier, while Fig. 5 displays the actual and predicted prices of Reliance Industries (RLI), which is the leading stock of the oil & gas sector.

TABLE VI. PREDICTED RETURN BY THE LSTM MODEL (OIL & GAS SEC)

         Date: Jul 1, 2021
Stock    Pred Price   No of Stocks   Pred Value
RLI      2069         11.76          24331
BPC      471          11.60          5464
ONG      119          51.51          6130
ATG      1065         46.98          50034
IOC      109          9.13           995
GAI      151          21.94          3313
IPG      535          23.39          12514
HPC      297          19.95          5925
PNL      224          14.80          3315
GJG      645          69.18          44621
Total                                156642
Predicted Return: 56.64 %

(AKL): 3.90 [2]. Tables VII-IX depict the returns produced by the two portfolios, optimum risk and eigen, and the LSTM-predicted return. Fig. 6 plots the actual vs. predicted prices of the leading stock of the sector, Sun Pharmaceuticals (SPI).

TABLE VIII. ACTUAL RETURN OF EIGEN PORTFOLIO (PHARMA SEC)

         Date: Jan 1, 2021                                Date: Jul 1, 2021
Stock    Wts      Amnt Invstd   Act Price   No of Stocks   Act Price   Act Value
SPI      0.12     12000         596         20.13          684         13772
DRL      0.11     11000         5241        2.10           5559        11667
DVL      0.10     10000         3849        2.60           4436        11525
CPL      0.11     11000         827         13.30          978         13008
LPN      0.12     12000         1001        11.99          1146        13738
APH      0.12     12000         928         12.93          968         12517
BCN      0.10     10000         466         21.46          406         8712
CDH      0.11     11000         478         23.01          639         14706
TPH      0.08     8000          2795        2.86           2924        8369
AKL      0.03     3000          2951        1.02           3220        3274
Total             100000                                               111288
Actual Return: 11.29 %
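Eigen portfolio weights of the kind listed in Table VIII are derived from principal component loadings (Section III-F). The sketch below illustrates one possible way to do this and rests on stated assumptions: it replaces the sklearn PCA function with an equivalent eigen-decomposition of the covariance matrix in NumPy, it works on synthetic daily returns rather than the paper's price data, it scales each loading vector by the sum of its absolute values, and it uses a zero risk-free return in the Sharpe Ratio. None of these choices is confirmed by the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily returns: a common market factor plus stock-specific noise
# for 10 hypothetical stocks (NOT the paper's data).
n_days, n_stocks = 1250, 10
market = rng.normal(0.0005, 0.010, size=(n_days, 1))
daily_returns = market + rng.normal(0.0, 0.010, size=(n_days, n_stocks))

# Principal components = eigenvectors of the covariance matrix.
cov = np.cov(daily_returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # re-sort to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Smallest number of components retaining at least 80% of the variance.
explained = np.cumsum(eigvals) / eigvals.sum()
n_components = int(np.searchsorted(explained, 0.80) + 1)

annual_returns = daily_returns.mean(axis=0) * 250
annual_cov = cov * 250

best_sharpe, best_weights = -np.inf, None
for j in range(n_components):
    loading = eigvecs[:, j]
    if loading.sum() < 0:                   # eigenvector sign is arbitrary
        loading = -loading
    w = loading / np.abs(loading).sum()     # assumed scaling: gross exposure 1
    ret = w @ annual_returns
    risk = np.sqrt(w @ annual_cov @ w)
    sharpe = ret / risk                     # assumed zero risk-free return
    if sharpe > best_sharpe:
        best_sharpe, best_weights = sharpe, w

print("Components kept:", n_components)
print("Best eigen portfolio weights:", best_weights.round(4))
```

Components beyond the first typically carry loadings of mixed sign, so the corresponding candidates are long-short portfolios unless the weights are further constrained.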