Advancing Financial Risk Prediction Through Optimized LSTM Model Performance and Comparative Analysis
1 Columbia University, USA, [email protected]
2 Columbia University, USA, [email protected]
3 Independent Researcher, China, [email protected]
4 Rutgers University, USA, [email protected]
5 University of Connecticut Business School, USA, [email protected]
6 Stevens Institute of Technology, USA, [email protected]
Abstract: This paper focuses on the application and optimization of the LSTM model in financial risk prediction. The study begins with an overview of the LSTM architecture and its algorithmic foundations, then details the model training process and the hyperparameter tuning strategy, adjusting the network parameters through experiments to improve performance. Comparative experiments show that the optimized LSTM model offers significant advantages in the AUC metric over random forest, the BP neural network, and XGBoost, which verifies its efficiency and practicality in the field of financial risk prediction, in particular its ability to handle complex time-series data, and lays a solid foundation for applying the model in a real production environment.

Keywords: Financial Risk Prediction Model; Deep Learning; LSTM

I. INTRODUCTION

Risks in finance encompass credit, market, and operational aspects, which are core issues that financial institutions must face in their daily operations [1]. Traditional risk prediction models mostly rely on statistical methods and simple machine learning algorithms. Although these methods can identify risk factors to a certain extent, they have limitations in handling high-dimensional data, capturing nonlinear relationships, and coping with data noise. With the surge in financial trading volume and the growing complexity of data, the shortcomings of traditional models have become increasingly prominent, and they struggle to meet the modern financial market's demands for accuracy and speed in risk prediction [2].

Through its multi-layer neural network structure, deep learning can automatically learn complex feature representations from massive data. It can not only handle high-dimensional data effectively, but also capture deep nonlinear relationships between variables, which is particularly important for understanding the complex dynamics of financial markets [3]. In addition, deep learning models have good generalization ability, enabling effective risk prediction from limited samples while reducing the risk of overfitting [4]. Coupled with powerful parallel computing capabilities, the risk analysis process becomes faster, which helps financial institutions respond quickly to market changes and adjust risk management strategies in time.

In today's era of accelerated digital transformation, the financial industry is undergoing unprecedented changes. With the rapid development of big data technology and continuous breakthroughs in artificial intelligence, deep learning, as an important branch of artificial intelligence, has gradually penetrated all levels of financial risk management and become one of the key tools for predicting and controlling financial risks [5]. This paper aims to explore how to use deep learning technology to improve traditional risk
assessment methods, to introduce more efficient and accurate analysis tools into the field of financial risk prediction, to improve the accuracy and efficiency of financial risk prediction, and to provide more scientific and timely risk management strategies for financial institutions.

II. RELATED WORK

As artificial intelligence technology progresses swiftly, the output of academic literature has risen significantly. Deep learning has made remarkable achievements in fields such as medical diagnosis classification [6-9], computer vision [10-13], and disease prediction [14]. In addition, stock market trend prediction, financial risk quantification, and asset allocation optimization have become key problems awaiting solutions. In this wave of technology, deep learning, as a cutting-edge branch of machine learning, is leading the innovation. Compared with previous machine learning models, deep learning does not require manual preprocessing and feature selection steps. Instead, it relies on a multi-level nonlinear structure to perform feature extraction and transformation automatically. These hierarchical structures enable the network to capture the inherent complex nonlinear correlations in data and to extract learned high-level features directly from the raw input [15], which greatly enhances the model's ability to recognize and understand data patterns.

Zhou et al. [16] showed that, on the Hadoop cloud computing platform, a deep learning model integrated with a Convolutional Neural Network (CNN) can quickly and accurately identify fraudulent transactions in supply chain finance, emphasizing that combining cloud computing and deep learning for financial risk identification, especially in the supply chain field, is a highly promising direction. Stevenson et al. [17] used BERT, a deep learning technique for natural language processing, to mine rich information from the text of corporate financial reports, which helped alleviate the information asymmetry problem in SME financing. Facing the challenge of sample imbalance in credit default risk identification, Lin et al. [18] successfully used Generative Adversarial Networks (GAN) to improve the accuracy of credit default swap (CDS) risk assessment. In addition, deep learning has shown innovative applications in other aspects of financial risk management. For example, Liu et al. [19] developed a video surveillance system specifically for risk review, applying deep learning to computer vision to enhance stock trading compliance. The fuzzy deep learning model designed by Pena et al. [20] followed the Basel III guidelines and effectively predicted operational risks.

Adha and Nurrohmah adopted a multinomial logistic regression model to predict the proportion of banks' default losses and used maximum likelihood estimation to optimize the model parameters. Experimental results show that the accuracy of the model in classifying bank customer default losses reaches 95.3% [21]. On the other hand, Silvia and Paolo used a Bayesian model averaging strategy to construct a credit risk model and showed through empirical analysis on a real credit risk database that this method not only improved prediction accuracy but also delivered better overall performance than the basic regression model [22].

Yannis and Magdalene used the loan records of 1411 companies from a major commercial bank in Greece to compare two classification models, the support vector machine and the decision tree, and found that the support vector machine was slightly better than the decision tree in stability [23]. Alina and Simona applied Bayesian logistic regression and an artificial neural network model, respectively, to commercial default risk. Based on a sample of 3000 enterprises of a multinational bank in Romania, the study pointed out that the artificial neural network was superior to the Bayesian model in terms of prediction accuracy and effect [24].

Zhao Nan, Zhao Zheyun, and other researchers combined principal component analysis and BP neural network technology to provide effective approaches and practical plans for credit risk early warning at Internet financial institutions [25]. To cope with the challenges of financial big data, Yang Dejie and Zhang Ning et al. improved the stacked denoising autoencoder neural network model and introduced the Karhunen-Loeve expansion as a means of noise processing; experimental results show that this improvement raises the accuracy of the model by about 3% on the benchmark [26]. In addition, Yi Baiheng and Zhu Jianjun et al., aiming at the problem of unbalanced samples, proposed a synthetic data enhancement method applied only to misclassified samples, so as to correct the classification hyperplane bias of the support vector machine when dealing with such samples; they applied this enhancement to assess customer credit risk for small loan firms, outperforming other methods in accuracy [27].

III. METHOD

A. LSTM

LSTM (Long Short-Term Memory network), an advanced class of recurrent neural networks, is engineered to address the challenge of long-term dependencies in sequence data processing. This network employs an innovative gating mechanism comprising the input gate, forget gate, cell state, and output gate, facilitating efficient selective storage, forgetting, and output of information. Consequently, LSTMs have proven their superior capabilities in complex domains such as natural language processing [28], speech recognition [29], time series prediction [30], video analytics [31-33], and material detection [34]. Compared with the traditional RNN model, LSTM relies on sigmoid and tanh functions to finely regulate the information flow, significantly alleviating the phenomena of vanishing and exploding gradients [35], and ensuring that the model can retain and use historical information even when dealing with long sequences, thereby achieving more accurate prediction and content generation. LSTM is also frequently integrated with other cutting-edge technologies to continuously expand the frontier of sequence data processing [36]. To further explore the potential of LSTM in network structure design, this study develops a deep learning model featuring an input layer, three intermediate hidden layers, and a final output layer; its architecture is shown in Figure 1, and a code sketch of the corresponding layer stack is given below.

Figure 1. LSTM (Long Short-Term Memory) network structure
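The layer stack described above can be sketched as follows. This is a minimal illustration, assuming a Keras/TensorFlow implementation; the layer widths follow the configuration later summarized in Table I, the 45-feature input width follows Section III.B, and the remaining settings are placeholders rather than the authors' exact code.

```python
# Minimal sketch of the stacked LSTM architecture (assumed Keras/TensorFlow backend;
# layer widths follow Table I, other settings are illustrative placeholders).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Activation
from tensorflow.keras.metrics import AUC

n_features = 45  # one input node per feature, as described in Section III.B

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(1, n_features)),  # hidden layer 1
    LSTM(32, return_sequences=True),                                # hidden layer 2
    LSTM(16),                                                       # hidden layer 3
    Dense(1),                                                       # single output node
    Activation("sigmoid"),                                          # binary approve/reject score
])

# Mean squared error loss and the Adam optimizer, per the hyperparameter description
# in Section III.A.4; the AUC metric matches the evaluation used later in the paper.
model.compile(optimizer="adam", loss="mse", metrics=[AUC()])
model.summary()
```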
1. Algorithm flow

First, we analyze the forward propagation process of the LSTM, encompassing the entire pathway of information from the input layer to the output layer. During this propagation, the input gate, output gate, and forget gate determine whether to retain or discard data.

The input gate combines two sources of information: a weighted sum of the current input vector and a weighted sum of the previous time step's hidden states, as expressed in Equation (1).

\chi_\rho^t = \sum_{i=1}^{I} w_{i\rho} x_i^t + \sum_{h=1}^{H} w_{h\rho} y_h^{t-1}   (1)

The output generated by the input gate is obtained by applying the activation function, following Equation (2).

y_\rho^t = f(\chi_\rho^t)   (2)

The memory unit of the hidden layer processes information via a weighted combination of the current input vector and the prior hidden state, following Formula (3).

\chi_c^t = \sum_{i=1}^{I} w_{ic} x_i^t + \sum_{h=1}^{H} w_{hc} y_h^{t-1}   (3)

The state feedback value received by the input gate from the memory unit of the hidden layer is given in Formula (4).

s_c^t = s_c^{t-1} + y_\rho^t \, g(\chi_c^t)   (4)

The input and output formulas of the forget gate are shown in Formulas (5) and (6).

\chi_\varphi^t = \sum_{i=1}^{I} w_{i\varphi} x_i^t + \sum_{h=1}^{H} w_{h\varphi} y_h^{t-1}   (5)

y_\varphi^t = f(\chi_\varphi^t)   (6)

The state feedback received by the forget gate from the memory unit of the hidden layer is calculated according to Formula (7).

s_\varphi^t = s_\varphi^{t-1} + y_\rho^t \, g(\chi_\varphi^t)   (7)

The input received by the output gate is derived from the weighted sum of the memory unit inputs and the hidden-layer outputs, following Formula (8).

\chi_\pi^t = \sum_{i=1}^{I} w_{i\pi} x_i^t + \sum_{h=1}^{H} w_{h\pi} y_h^{t}   (8)

The final output of the memory unit of the hidden layer is determined according to Formula (9).

y_c^t = y_\pi^t \, h(s_c^t)   (9)
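As an illustration of how Equations (1) through (9) are evaluated in practice, the following is a minimal single-cell sketch in NumPy. It mirrors the paper's formulas directly; the random weights, the logistic choice for f, and tanh for g and h are assumptions made for the example, not values taken from the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, y_prev, s_c_prev, s_phi_prev, W):
    """One forward step of the gated memory cell described by Equations (1)-(9).

    x_t        : current input vector, shape (I,)
    y_prev     : hidden-layer outputs from the previous step, shape (H,)
    s_c_prev   : previous cell state (a scalar in this single-cell sketch)
    s_phi_prev : previous forget-gate state feedback (scalar)
    W          : dict mapping 'rho', 'c', 'phi', 'pi' to (input weights, recurrent weights)
    """
    f, g, h = sigmoid, np.tanh, np.tanh                     # assumed activation choices

    chi_rho = W['rho'][0] @ x_t + W['rho'][1] @ y_prev      # Eq. (1): input-gate net input
    y_rho   = f(chi_rho)                                    # Eq. (2): input-gate activation
    chi_c   = W['c'][0] @ x_t + W['c'][1] @ y_prev          # Eq. (3): memory-unit net input
    s_c     = s_c_prev + y_rho * g(chi_c)                   # Eq. (4): cell-state update
    chi_phi = W['phi'][0] @ x_t + W['phi'][1] @ y_prev      # Eq. (5): forget-gate net input
    y_phi   = f(chi_phi)                                    # Eq. (6): forget-gate activation
    s_phi   = s_phi_prev + y_rho * g(chi_phi)               # Eq. (7): forget-gate state feedback
    chi_pi  = W['pi'][0] @ x_t + W['pi'][1] @ y_prev        # Eq. (8): output-gate net input
    y_c     = f(chi_pi) * h(s_c)                            # Eq. (9): cell output
    # y_phi is computed for completeness; as in the paper's formulas, it does not
    # enter the state update directly.
    return y_c, s_c, s_phi

# Toy example: I = 4 input features, H = 3 recurrent units, random illustrative weights.
rng = np.random.default_rng(0)
W = {k: (rng.normal(size=4), rng.normal(size=3)) for k in ('rho', 'c', 'phi', 'pi')}
y_c, s_c, s_phi = lstm_step(rng.normal(size=4), np.zeros(3), 0.0, 0.0, W)
print(y_c, s_c, s_phi)
```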
2. VAR model

The vector autoregressive (VAR) model is a method for analyzing multivariate time series data. It captures the dynamic interaction between variables by regressing each variable on its own historical values and on the historical values of the other variables. Its model formula is given by Formula (10).

Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + \cdots + A_x Y_{t-x}   (10)

Here, Y represents the vector of endogenous variables, A_1, ..., A_x are coefficient matrices, and x is the lag order of the model.

One characteristic of the VAR model is that it depends relatively little on economic theory: it rests on only a few assumptions, focuses on the time series data themselves, and uses them as the core for describing how the economic system responds to various shocks. By analyzing the shock responses, the VAR model can reveal the stability characteristics and transmission effects of dynamic changes in the economic system. The model integrates theory with observed data, uses simple linear or nonlinear regression techniques to establish the relationships between variables, and constitutes a comprehensive analysis framework composed of multiple equations.

3. β coefficient of financial risk

The beta coefficient of financial risk is a key indicator for measuring the sensitivity of a single asset or portfolio to the overall volatility of the market, and it is widely used in the Capital Asset Pricing Model (CAPM). It quantifies systematic risk, that is, the risk that diversification cannot eliminate, by analyzing the proportional relationship between the return on an asset and the change in the return on a market index such as the S&P 500. The β coefficient signifies the market responsiveness of an asset: a value of 1 implies the asset moves in tandem with the market; a value above 1 suggests higher volatility than the market; a value below 1 denotes lower volatility; and the rare case of a value below 0 signifies that the asset moves counter to the market. In risk management practice, the β coefficient is an effective tool for adjusting the risk level of a portfolio and for building a portfolio that matches investors' risk preferences. At the same time, it measures only systematic risk, not the non-systematic risk of a specific company or industry. Estimated through statistical regression on historical data, the β coefficient not only helps predict the expected return of an asset but also serves as a key parameter for balancing risk exposure in hedging strategies, thus playing a core role in financial market analysis, investment decision-making, and risk management. The formula for calculating the β coefficient is shown in Equation (11).

\beta_i = \frac{\mathrm{Cov}(R_i, R_m)}{\mathrm{Var}(R_m)}   (11)

In this context, Cov(R_i, R_m) denotes the covariance between the returns of the i-th asset and the returns of the market portfolio, while Var(R_m) denotes the variance of the market portfolio's returns.

4. Description of LSTM model parameters

The parameters of a deep neural network model fall into two categories: manually set hyperparameters and the weights learned by the model itself. After the data are fed in, the model adjusts the weight of each connection during training; following the preset learning rate, the LSTM gradually optimizes the weights according to the prediction error until the preset upper limit of iterations is reached. Hyperparameters must be defined in advance and have a direct impact on prediction performance; they include the network depth, the activation function (this study uses the Sigmoid function typical of LSTM), the number of hidden-layer nodes (64, 32, and 16 in this case), the loss function (mean squared error), the optimization algorithm (the efficient Adam algorithm), the batch size (set to 100), and the total number of training iterations (1,000). An overview of the LSTM configuration is presented in Table I.

TABLE I. LSTM CONFIGURATION

Layer         Output Shape     Param
Lstm1         (None, 1, 64)    35072
Lstm2         (None, 1, 32)    12416
Lstm3         (None, 1, 16)    3136
Dense5        (None, 1)        17
Activation6   (None, 1)        0

B. Parameter tuning and optimization

1. Evaluation metrics

Classification model evaluation metrics serve as crucial instruments for gauging the predictive power of models, enabling assessment of their accuracy in categorizing unseen data.

1) Accuracy

Accuracy, a primary evaluation metric, represents the proportion of correctly classified instances relative to the total number of instances.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}   (12)

Here TP stands for true positives, TN for true negatives, FP for false positives, and FN for false negatives.

2) Precision and Recall

Precision measures the fraction of examples predicted positive by the model that are actually positive.

\mathrm{Precision} = \frac{TP}{TP + FP}   (13)

Recall measures the proportion of true positive instances detected by the model, that is, the fraction of all actual positive instances that are correctly identified.

\mathrm{Recall} = \frac{TP}{TP + FN}   (14)

3) F Score

The F score, a unified measure of precision and recall, is their harmonic mean.

F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}   (15)

4) ROC Curve and AUC

The Receiver Operating Characteristic (ROC) curve serves as a graphical tool for assessing binary classifiers, widely used in fields such as healthcare, statistics, and machine learning. The curve illustrates the trade-off between the true positive rate (TPR) and the false positive rate (FPR), thereby showing the classifier's performance across different threshold levels. The Area Under the Curve (AUC) provides a single metric summarizing the ROC curve's overall effectiveness, with scores ranging from 0 to 1. A score approaching 1 indicates an exceptional model proficient at distinguishing between positive and negative cases. An AUC of 0.5 means that the model has no discriminative power, equivalent to random guessing, and anything below 0.5 indicates poor model performance. Therefore, the ROC curve and the AUC value together constitute an important tool for evaluating a model's classification ability and its balance between sensitivity and specificity.
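A minimal sketch of how these metrics can be computed is shown below, assuming scikit-learn and a small set of hypothetical labels and scores rather than the paper's actual predictions.

```python
# Hedged example: computing Equations (12)-(15) and the AUC with scikit-learn.
# The labels and scores below are hypothetical and only illustrate the calls.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, roc_curve)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                    # ground-truth classes
y_score = [0.1, 0.4, 0.8, 0.3, 0.9, 0.2, 0.6, 0.7]    # predicted probabilities
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]     # thresholded decisions

print("Accuracy :", accuracy_score(y_true, y_pred))   # Eq. (12)
print("Precision:", precision_score(y_true, y_pred))  # Eq. (13)
print("Recall   :", recall_score(y_true, y_pred))     # Eq. (14)
print("F score  :", f1_score(y_true, y_pred))         # Eq. (15)
print("AUC      :", roc_auc_score(y_true, y_score))   # area under the ROC curve

fpr, tpr, thresholds = roc_curve(y_true, y_score)     # points of the ROC curve itself
```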
2. Optimizing Network Depth and Neuron Counts per Layer

The LSTM architecture comprises an input layer, hidden layers, and an output layer. The input layer matches the feature count, requiring 45 nodes to fit the dataset's structure. The output layer mirrors the classification targets; since the classification outcome is binary ("approved" or "rejected"), a single output node is employed.

1) Layer Count Optimization

In examining the network depth, the other hyperparameters were held constant: 20 hidden nodes per layer, Sigmoid activation, logarithmic loss function, the Adam optimizer, a batch size of 100, and 100 epochs. Four architectural setups were considered: one, two, three, and four hidden layers. Comparative analysis was conducted on the experimental datasets for these configurations, with the resulting test-set loss curves plotted in Figure 2.

Figure 2. Plot of loss for different numbers of layers

Figure 2 illustrates that the loss decreases to a minimal point with three network layers, and further layer increments yield negligible loss reduction. The terminal loss and AUC scores of the trained models are summarized in Table II.

TABLE II. LOSS AND AUC VALUES FOR DIFFERENT NUMBERS OF HIDDEN LAYERS

Number of        Training set          Test set
hidden layers    LOSS      AUC         LOSS      AUC
1                0.2773    0.8836      0.3394    0.5237
2                0.2020    0.9125      0.2402    0.5408
3                0.1758    0.9185      0.1845    0.5384
4                0.1792    0.9220      0.1994    0.5349

2) Node Count Optimization

In the experiment varying the hidden node counts, all other hyperparameters remained constant: three hidden layers, Sigmoid activation, logarithmic loss, Adam optimization, a batch size of 100, and 100 iterations. We examined four configurations of hidden-layer nodes: model 1 has 20 nodes per layer (20, 20, 20), model 2 has 60 nodes per layer (60, 60, 60), model 3 has 100 nodes per layer (100, 100, 100), and model 4 uses a decreasing number of nodes (64, 32, 16). These four models were tested on the data of this study, and their loss curves on the test set are plotted in Figure 3.

TABLE III. LOSS AND AUC VALUES FOR DIFFERENT NUMBERS OF NODES

Node number        Training set          Test set
                   LOSS      AUC         LOSS      AUC
(20, 20, 20)       0.1425    0.9315      0.2502    0.8005
(60, 60, 60)       0.1139    0.9527      0.2833    0.8348
(100, 100, 100)    0.0923    0.9618      0.3309    0.7964
(64, 32, 16)       0.0918    0.9577      0.2212    0.8031

The data in Table III show that when the hidden-layer structure is set to (64, 32, 16) nodes, the AUC of the model reaches a relatively good level. Therefore, based on the above analysis, a three-layer hidden structure with 64, 32, and 16 nodes per layer was determined to be the best configuration for the model in this study.

IV. APPLICATION OF MODEL

A. Data set and experimental environment

1. Data set

The experimental data are collected from the World Bank Open Data dataset, a comprehensive open data resource maintained by the World Bank. To process the experimental data collected from World Bank Open Data, we utilized a linked data method proposed in recent literature on enhancing data accessibility and interoperability [37]. This approach allowed us to effectively integrate and analyze the comprehensive range of indicators provided by World Bank Open Data. The dataset includes over 9,000 indicators spanning economic, social, environmental, and other dimensions, encompassing macroeconomic factors such as GDP, inflation, and employment, as well as social progress metrics such as education, health, and poverty. Additionally, it includes data on international trade, foreign direct investment, and environmental impacts. World Bank Open Data supports multiple languages and data formats, facilitating easy data filtering, search, and visualization through its user-friendly interface. This accessibility is crucial for stakeholders including policymakers, researchers, and NGOs, enabling enhanced data-driven decision making. By applying the linked data method, we were able to leverage the dataset's potential to the fullest, fostering a deeper understanding of global economic phenomena and contributing to the fields of financial risk prediction, market analysis, and sustainable development.

2. Experimental environment

The experimental environment adopted in this paper is shown in Table IV.

B. Experimental results and analysis

To assess the efficacy of our proposed LSTM deep learning model, comparative experiments are conducted against other prevalent models, including Random Forest [38], XGBoost [39], and a traditional backpropagation (BP) neural network.

By fixing all parameters related to pseudo-randomness and implementing a five-fold cross-validation strategy, this paper aims to enhance the stability of the algorithm and the reliability of the results. After the training of each model is completed, the performance comparison details are listed in Table V.
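The seeding and cross-validation protocol can be sketched as follows before presenting Table V. This is a simplified illustration assuming NumPy and scikit-learn; the feature matrix, labels, and the random-forest stand-in are placeholders, and the same loop applies to the LSTM and the other compared models.

```python
# Hedged sketch of the evaluation protocol: fixed random seeds plus five-fold
# cross-validation. X and y are placeholders; the random-forest baseline stands in
# for any of the compared models.
import random
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

SEED = 42
random.seed(SEED)
np.random.seed(SEED)                       # fix the sources of pseudo-randomness

X = np.random.rand(500, 45)                # placeholder feature matrix (45 features)
y = np.random.randint(0, 2, 500)           # placeholder binary labels

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
fold_auc = []
for train_idx, test_idx in cv.split(X, y):
    model = RandomForestClassifier(n_estimators=100, random_state=SEED)
    model.fit(X[train_idx], y[train_idx])
    scores = model.predict_proba(X[test_idx])[:, 1]    # predicted probability of class 1
    fold_auc.append(roc_auc_score(y[test_idx], scores))

print("mean AUC over 5 folds:", float(np.mean(fold_auc)))
```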
TABLE V. COMPARISON OF FINAL RESULTS OF DIFFERENT MODELS

Model                 Acc       Precision   Recall    F         AUC
SVM                   0.9102    0.8345      0.7729    0.8025    0.7924
XGBoost               0.9373    0.9109      0.8124    0.8588    0.8105
BP                    0.8846    0.7012      0.7283    0.7145    0.7538
Model of this paper   0.9731    0.8736      0.8426    0.8578    0.8522

The data in Table V show that the random forest model achieves a high accuracy of 0.9784, but given the sample-skew characteristics of financial risk control scenarios, a high accuracy alone is not enough to evaluate model performance comprehensively. It is worth noting that the XGBoost model performs best on the F-measure, while the LSTM model dominates on the AUC metric. Although XGBoost is currently widely used in production environments, LSTM shows higher potential and adaptability because it is well suited to time-series and class-imbalanced data, especially in sequence problems such as financial risk control, implying that the LSTM model may have better overall performance.

To further validate LSTM's efficacy as a classification model, Precision-Recall curves and ROC plots are used for comparison, as illustrated in Figure 4.

Future work could explore the integration of additional data sources and the use of ensemble techniques to further enhance predictive accuracy and robustness. Additionally, studies could assess the LSTM model's performance across different financial markets to generalize its applicability and effectiveness. Ultimately, this research contributes to the ongoing evolution of risk management strategies within the financial sector, supporting the shift toward more data-driven, precise, and timely decision-making processes. By continuing to refine these deep learning models, the financial industry can better anticipate and mitigate risks, safeguarding against potential crises and enhancing overall market stability.

REFERENCES